
Numerical Methods for Partial Differential Equations

Thomas Wick

Gottfried Wilhelm Leibniz Universität Hannover (LUH)


Institut für Angewandte Mathematik (IfAM)

AG Wissenschaftliches Rechnen (GWR)

Welfengarten 1, 30167 Hannover, Germany

https://www.ifam.uni-hannover.de/wick.html
https://thomaswick.org

[email protected]

Version 2, last update: Friday, January 21, 2022
DOI: https://doi.org/10.15488/11709

Version 1, Jan 28, 2020
DOI: https://doi.org/10.15488/9248

by Elisa Wick

Foreword
These notes accompany the lectures of the classes
Numerik partieller Differentialgleichungen (NumPDGL)
Numerics of partial differential equations (NumPDE)
An international open class (IntNumPDE)

at the Leibniz Universität Hannover (LUH). These classes are classical 4 + 2 lectures in the German university system (per week: two 90-minute lectures and one 90-minute exercise class). Usually, no programming or software development is done in these classes. Rather, they are accompanied by an extra 2-hour class
Finite-Elemente-Programmierpraktikum in C++ (FEMProg; finite element programming lab)

in which finite elements for solving partial differential equations are implemented by the students (Class 3b;
see next pages).
A brief summary of all topics in these notes is:
• Recapitulation of characteristic concepts of numerical mathematics
• Brief review of mathematical modeling of differential equations
• Discretization with finite differences (FD) for elliptic boundary value problems

• Discretization with finite elements (FEM) for elliptic boundary value problems
• A posteriori error estimation of goal functionals using duality arguments
(extra class; see Class 3c on the next pages)
• Numerical solution of the discretized problems

• A brief introduction to vector-valued problems (elasticity/Stokes) (if time permits)


• Time-dependent problems: Methods for parabolic and hyperbolic problems
• Numerical methods for nonlinear and coupled problems (Class 5a)

The prerequisites are lectures in calculus, linear algebra, and an introduction to numerics, as well as classes on the theory of ordinary (ODE) or partial (PDE) differential equations. Also, functional analysis (FA) is recommended, but not strictly mandatory. Classes in continuum mechanics may also be helpful, but are not mandatory for understanding these notes.
On the other hand, for the convenience of all of us, many results from linear algebra, analysis, and functional analysis are included (often without proofs, though) in order to achieve a more or less self-contained booklet. To fully enjoy these notes, the reader should have a taste for practical aspects and algorithms, and should be ready to dive into theoretical aspects of variational formulations and numerical analysis. In most sections, numerical tests complement the algorithms and theory. Snippets of programming code are provided at the end as well.

At this point I want to thank my students from the WS 2017/2018 and WS 2019/2020 classes, the Peruvian participants of the international open class WS 2020/2021, the participants of the in-person class WS 2021/2022, and my doctoral researchers and Dr. Maryam Parvizi for letting me know about mistakes and typos. Furthermore, thanks to Katrin Mang for some TikZ pictures and LaTeX improvements.
Enjoy reading and please let me know about mistakes via the email address indicated on the front page.

Hannover, January 2022 Thomas Wick

Specific structure and topics in a 4 + 2 lecture class
As previously mentioned, such a class is typical in the German system and usually has the following characteristics:
• 10 ECTS (European Credit Transfer and Accumulation System)
• 4 hours of lectures per week, i.e., two 90-minute lectures per week

• 2 hours of exercises per week, i.e., one 90-minute session per week


• In total, around 26-28 lecture meetings per semester (each, as said, 90 minutes)
• In total, around 13-14 exercise meetings per semester (each, as said, 90 minutes)

As general information: the German winter semester usually runs from mid-October until the end of January/beginning of February. The German summer semester usually runs from the beginning or middle of April until mid-July. Both usually comprise around 12-14 weeks of lectures.

Topics
The topics may be¹ (L = lecture):
1. L1: Recapitulation of characteristic concepts of numerical mathematics
Parts of Chapter 2

2. L2-L3: Brief review of mathematical modeling of differential equations


Parts of Chapter 4
3. L4: Classifications
Parts of Chapter 5
4. L5-L8: Finite differences (FD) for elliptic boundary value problems
Parts of Chapter 6
5. L9-L20: Finite elements (FEM) for elliptic boundary value problems
Major parts of Chapter 8 and partially Chapter 7
6. L21-L23: Numerical solution of the discretized problems
Parts of Chapter 10 with a focus on multigrid
7. L24-L26/27: Methods for parabolic and hyperbolic problems (time-dependent PDEs)
Chapter 12
8. L28: Recapitulation of contents
Section 18.1

¹ Remark: the entire lecture notes contain much more material than fits into a single class. The purpose of these lecture notes was not a booklet specifically tailored to one class, but rather a self-contained work that in particular also covers advanced topics such as error estimation, nonlinear problems, and coupled problems.

Structure of a possible numerical analysis/scientific computing cycle
1. Class 1: Introduction to numerical analysis (Numerik 1)

2. Class 1b: Algorithmisches Programmieren in C++ (algorithmic programming in C++)


3. Class 2: Numerical methods for ODEs and eigenvalue problems (Numerik 2)
4. Class 3: Numerical methods for PDEs (Numerik 3)
5. Class 3b: Finite-Elemente-Programmierpraktikum (C++)

6. Class 3c: Goal-oriented error estimation and adaptive finite elements


7. Class 4: Numerical methods for continuum mechanics
8. Special lectures:
• Class 5a: Numerical methods for nonlinear and coupled problems,
• Class 5b: Numerical methods for phase-field fracture,
• Class 5a/b: Numerical methods for coupled variational inequality systems,
• Class 5c: Scientific computing
• ...

Contents


1 Main literature and acknowledgements 15


1.1 Linear PDEs and finite elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2 Nonlinear and coupled PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3 Own materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 Online materials (WS 20/21) for IntNumPDE (an international open class) . . . . . . . . . . . 16
1.5 IntNumPDE: an international open class and computing future lectures . . . . . . . . . . . . . 17
1.6 Teaching funding acknowledgements (WS 20/21) . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2 Motivation 19
2.1 Scientific computing (English) / Wissenschaftliches Rechnen (German) . . . . . . . . . . 19
2.1.1 Addressing new problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Guiding questions in differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.1 Illustration of three errors (experimental, theory, numerical) with the clothesline problem 22
2.4.2 A priori and a posteriori error estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.1 Consequences for numerical mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 Concepts in numerical mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.7 Illustration of physical oscillations versus numerical instabilities . . . . . . . . . . . . . . . . . . 27
2.8 Well-posedness (in the sense of Hadamard 1923) . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.9 Numerical methods for solving PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.10 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3 Notation 31
3.1 Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Independent variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Function, vector and tensor notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 Partial derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 Chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.6 Multiindex notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.7 Gradient, divergence, trace, Laplace, rotation/curl . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.8 Some properties of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.8.1 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.8.2 Positive definite, eigenvalues and more . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.8.3 Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.9 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.10 Scalar product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.11 Normed spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.12 Linear mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.13 Little o and big O - the Landau symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.14 Taylor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.15 Transformation of integrals: substitution rule / change of variables . . . . . . . . . . . . . . . . 39
3.16 Gauss-Green theorem / Divergence theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.17 Integration by parts and Green’s formulae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.18 Main notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.19 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4 An extended introduction 43
4.1 Examples of differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1.1 Variational principles in mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1.2 Population growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.1.3 Laplace equation / Poisson problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.4 Mass conservation / First-order hyperbolic problem . . . . . . . . . . . . . . . . . . . . 47
4.1.5 Elasticity (Lamé-Navier) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1.6 The incompressible, isothermal Navier-Stokes equations (fluid mechanics) . . . . . . . . 50
4.1.7 Coupled: Biot (porous media flow) and Biot coupled to elasticity . . . . . . . . . . . . . 51
4.2 Intermediate brief summary: several PDEs in one view . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.1 Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.2 Heat equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.3 Wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.4 Biharmonic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.5 Eigenvalue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.6 Linear transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.7 Linearized elasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 Three important linear PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.1 Elliptic equations and Poisson problem/Laplace equation . . . . . . . . . . . . . . . . . 57
4.3.2 Parabolic equations and heat equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.3 Hyperbolic equations and the wave equation . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4 Nonconstant coefficients. Change of the PDE types elliptic, parabolic, hyperbolic . . . . . . . . 61
4.5 Boundary conditions and initial data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.1 Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.2 Initial conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.3 Example: temperature in a room . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6 The general definition of a differential equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6.1 Single-valued differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6.2 Vector-valued differential equations, PDE systems . . . . . . . . . . . . . . . . . . . . . 64
4.7 Classifications into linear and nonlinear PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.7.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.8 Further remarks on the classification into three important types . . . . . . . . . . . . . . . . . . 67
4.9 PDEs that are not elliptic, parabolic or hyperbolic . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.10 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5 Classifications 71
5.1 Physical conservation properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2 Order of a PDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.3 Single equations versus PDE systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.4 Linear versus nonlinear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.5 Decoupled versus coupled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.5.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.5.2 Coupling situations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.6 Variational equations versus variational inequalities . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.7 Coupled variational inequality systems (CVIS) . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.8 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

6 Finite differences (FD) for elliptic problems 77


6.1 A 1D model problem: Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Well-posedness of the continuous problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2.1 Green’s function and well-posedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2.2 Maximum principle on the continuous level . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2.3 Regularity of the solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3 Spatial discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.4 Solving the linear equation system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6.5 Well-posedness of the discrete problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.5.1 Existence and uniqueness of the discrete problem . . . . . . . . . . . . . . . . . . . . . . 82
6.5.2 Maximum principle on the discrete level . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.6 Numerical analysis: consistency, stability, and convergence . . . . . . . . . . . . . . . . . . . . . 84
6.6.1 Basics in error estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.6.2 Truncation error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.6.3 Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.6.4 Stability in L∞ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.6.5 Convergence L∞ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.6.6 Convergence L² . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.7 Numerical test: 1D Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.7.1 Homogeneous Dirichlet boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.7.2 Nonhomogeneous Dirichlet boundary conditions u(0) = u(1) = 1 . . . . . . . . . . . . . 90
6.7.3 Nonhomogeneous Dirichlet boundary conditions u(0) = 3 and u(1) = 2 . . . . . . . . . . 90
6.7.4 How can we improve the numerical solution? . . . . . . . . . . . . . . . . . . . . . . . . 90
6.8 Finite differences in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.8.1 Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.9 Finite differences for time-dependent problems: 1D space / 1D time . . . . . . . . . . . . . . . 92
6.9.1 Algorithmic aspects: temporal and spatial discretization . . . . . . . . . . . . . . . . . . 92
6.9.2 Consistency and precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.9.3 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.9.4 Stability in L∞ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.9.5 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.9.6 Extended exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.10 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

7 Functional analysis and approximation theory 99


7.1 Analysis, linear algebra, functional analysis and Sobolev spaces . . . . . . . . . . . . . . . . . . 99
7.1.1 Convergence and complete spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.1.2 Scalar products, orthogonality, Hilbert spaces in 1D . . . . . . . . . . . . . . . . . . . . 100
7.1.3 Lebesgue measures and integrals . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.1.4 Weak derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.1.5 The Hilbert spaces H¹ and H₀¹ . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.1.6 Hilbert spaces versus classical function spaces . . . . . . . . . . . . . . . . . . . . . . . . 106
7.1.7 Sobolev spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.1.8 Brief summary of L², H¹ and H² . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.2 Useful Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.2.1 Some compactness results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.3 Approximation theory emphasizing the projection theorem . . . . . . . . . . . . . . . . 109
7.4 Differentiation in Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

8 Theory and finite elements (FEM) for elliptic problems 119


8.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.1.1 Construction of an exact solution (only possible for simple cases!) . . . . . . . . . . . . 119
8.1.2 Equivalent formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
8.2 Derivation of a weak form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.3 Notation: writing the bilinear and linear forms in abstract notation . . . . . . . . . . . . . . . . 124
8.4 Finite elements in 1D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.4.1 The mesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.4.2 Linear finite elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
8.4.3 The process to construct the specific form of the shape functions . . . . . . . . . . . . . 127
8.4.4 The discrete weak form and employing linear combination for uₕ ∈ Vₕ . . . . . . 128
8.4.5 Evaluation of the integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.4.6 More comments on how to build-in boundary conditions . . . . . . . . . . . . . . . . . . 131
8.4.7 Definition of a finite element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

8.4.8 Properties of the system matrix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.4.9 Numerical test: 1D Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.5 Algorithmic details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.5.1 Assembling integrals on each cell K . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.5.2 Example in R⁵ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.5.3 On the indices of A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.5.4 Numerical quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
8.5.5 Details on the evaluation on the master element . . . . . . . . . . . . . . . . . . . . . . 138
8.5.6 Generalization to s elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.5.7 Example: Section 8.5.2 continued using Sections 8.5.5 and 8.5.6 . . . . . . . . . . . . . . 140
8.6 Quadratic finite elements: P2 elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.6.1 Algorithmic aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.6.2 Numerical test: 1D Poisson using quadratic FEM (P2 elements) . . . . . . . . . . . . . . 143
8.7 Galerkin orthogonality, a geometric interpretation of the FEM, and a first error estimate . . . . 144
8.8 Neumann & Robin problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.8.1 Robin boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.8.2 Neumann boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
8.9 Variational formulations of elliptic problems in higher dimensions . . . . . . . . . . . . . . . . . 149
8.9.1 Model problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
8.9.2 Comments on the domain Ω and its boundaries ∂Ω . . . . . . . . . . . . . . . . . . . . . 150
8.9.3 Integration by parts and Green’s formulae . . . . . . . . . . . . . . . . . . . . . . . . . . 150
8.9.4 Variational formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
8.10 The Lax-Milgram lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
8.10.1 The energy norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
8.10.2 The Poincaré inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
8.10.3 Trace theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.11 Theory of elliptic problems (in higher dimensions) . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.11.1 The formal procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.11.2 Poisson’s problem: homogeneous Dirichlet conditions . . . . . . . . . . . . . . . . . . . . 158
8.11.3 Nonhomogeneous Dirichlet conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.11.4 Neumann conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.11.5 Robin conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.12 Finite element spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.12.1 Example: a triangle in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.12.2 Example: a quadrilateral in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.12.3 Well-posedness of the discrete problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.13 Numerical analysis: error estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.13.1 The Céa lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.13.2 H¹ and L² estimates in 1D . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.13.3 An improved L² estimate - the Aubin-Nitsche trick . . . . . . . . . . . . . . . . 172
8.13.4 Analogy between finite differences and finite elements . . . . . . . . . . . . . . . . . . . 173
8.14 Numerical analysis in 2D and higher dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.14.1 The Bramble-Hilbert-Lemma and an interpolation estimate . . . . . . . . . . . . . . . . 174
8.14.2 Inverse estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.14.3 Error estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.15 Numerical analysis: influence of numerical quadrature, first Strang lemma . . . . . . . . . . . . 175
8.15.1 Approximate solution of A and b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.15.2 Interpolation with exact polynomial integration . . . . . . . . . . . . . . . . . . . . . . . 177
8.15.3 Integration with numerical quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.15.4 Numerical tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.16 Numerical tests and computational convergence analysis . . . . . . . . . . . . . . . . . . . . . . 189
8.16.1 2D Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
8.16.2 Numerical test: 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
8.16.3 Checking programming code and convergence analysis for linear and quadratic FEM . . 190

8.16.4 Convergence analysis for 1D Poisson using linear FEM . . . . . . . . . . . . . . . . . . . 192
8.16.5 Computing the error norms ‖u − uₕ‖ . . . . . . . . . . . . . . . . . . . . . . . 193
8.17 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

9 Goal-oriented a posteriori error estimation and adaptivity 195


9.1 Overview of the special class WS 19/20 on goal-oriented error estimation . . . . . . . . . . . . 195
9.2 Motivation (Class 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
9.2.1 Goal functionals in differential equations and solution systems . . . . . . . . . . . . . . 196
9.2.2 Abstract formulation of all these goal functionals . . . . . . . . . . . . . . . . . . . . . . 197
9.2.3 Error, estimates, accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
9.3 Difference between norm-based error estimation and goal functionals . . . . . . . . . . . . . . . 197
9.4 Principles of error estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
9.5 Efficiency, reliability, basic adaptive scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
9.6 Goal-oriented error estimation using dual-weighted residuals (DWR) . . . . . . . . . . . . . . . 199
9.6.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
9.6.2 Excursus: Variational principles and Lagrange multipliers in mechanics . . . . . . . . . 200
9.6.3 First-order optimality system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
9.6.4 Error estimation and adjoints in Rⁿ . . . . . . . . . . . . . . . . . . . . . . . 203
9.6.5 Duality-based error estimation - linear problems . . . . . . . . . . . . . . . . . . . . . . 203
9.6.6 Duality-based error estimation - nonlinear problems . . . . . . . . . . . . . . . . . . . . 204
9.7 Duality-based error estimation for ODEs - adaptivity in time . . . . . . . . . . . . . . . . . . . 204
9.8 Spatial error estimation for linear problems and linear goal functionals (Poisson) . . . . . . . . 205
9.8.1 Primal problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.8.2 Goal functional: some possible examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.8.3 Adjoint problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.8.4 Derivation of an error identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
9.9 Nonlinear problems and nonlinear goal functionals . . . . . . . . . . . . . . . . . . . . . . . . . 207
9.9.1 Approximation of the adjoint solution for the primal estimator ρ(uₕ)(·) . . . . . 208
9.9.2 Summary and alternatives for computing the adjoint . . . . . . . . . . . . . . . . . . . . 209
9.9.3 Approximation of the primal solution for the adjoint estimator ρ∗(zₕ)(·) . . . . . 209
9.10 Measuring the quality of the error estimator η . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
9.11 Rough structure of adjoint-based error estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 210
9.12 Localization techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
9.12.1 The classical way of error localization of the primal estimator for linear problems . . . . 211
9.12.2 The classical way for the combined estimator . . . . . . . . . . . . . . . . . . . . . . . . 212
9.12.3 A variational primal-based error estimator with PU localization . . . . . . . . . . . . . . 213
9.12.4 PU localization for the combined estimator . . . . . . . . . . . . . . . . . . . . . . . . . 213
9.13 Comments to adjoint-based error estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
9.14 Balancing discretization and iteration errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
9.14.1 Main theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
9.14.2 Adaptive fixed-point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
9.14.3 Adaptive Newton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
9.15 Effectivity of goal-oriented error estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
9.16 Efficiency estimates using a saturation assumption . . . . . . . . . . . . . . . . . . . . . . . . . 215
9.17 Principles of multi-goal-oriented error estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 215
9.17.1 The adaptive algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
9.18 Residual-based error estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
9.19 Mesh refinement strategies: averaging, fixed-rate, fixed-fraction . . . . . . . . . . . . . . . . . . 219
9.19.1 Averaging and general algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
9.19.2 Fixed-rate (fixed number) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.19.3 Fixed error reduction (fixed-fraction) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.19.4 How to refine marked cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.19.5 Convergence of adaptive algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
9.20 Numerical test: Poisson with mean value goal functional . . . . . . . . . . . . . . . . . . . . . . 221

9.21 Numerical test: L-shaped domain with Dirac rhs and Dirac goal functional . . . . . . . . . . . 224
9.22 Space-time goal-oriented error estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
9.23 Final comments to error estimation in numerical simulations . . . . . . . . . . . . . . . . . . . 226
9.24 Example: Stationary Navier-Stokes 2D-1 benchmark . . . . . . . . . . . . . . . . . . . . . . . . 226
9.24.1 Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
9.24.2 Functionals of interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.24.3 A duality-based a posteriori error estimator (primal part) . . . . . . . . . . . . . . . . . 229
9.24.4 2D-1 configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
9.24.5 Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
9.24.6 Parameters and right hand side data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
9.24.7 Step 1: Verification of benchmark values . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
9.24.8 Step 2 and Step 3: Computing J(u) on a fine mesh, J(u) − J(uₕ) and I_eff . . . 230
9.24.9 More findings - graphical solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
9.25 Example: Stationary fluid-structure interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
9.26 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

10 Numerical solution of the discretized problems 233


10.1 On the condition number of the system matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
10.2 Fixed-point schemes: Richardson, Jacobi, Gauss-Seidel . . . . . . . . . . . . . . . . . . . . . . . 233
10.2.1 On the Richardson iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
10.2.2 Jacobi and Gauss-Seidel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
10.3 Gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
10.4 Conjugate gradients (a Krylov space method) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
10.4.1 Formulation of the CG scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
10.4.2 Convergence analysis of the CG scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
10.5 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
10.6 GMRES - generalized minimal residual method . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
10.6.1 Pure GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
10.6.2 Left-preconditioned GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
10.6.3 Right-preconditioned GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
10.7 Geometric multigrid methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
10.7.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
10.7.2 Smoothing property of Richardson iteration . . . . . . . . . . . . . . . . . . . . . . . . . 249
10.7.3 Eigenvalues and eigenvectors for 1D Poisson and mesh independence . . . . . . . . . . . 250
10.7.4 Numerical example of eigenvalues and eigenvectors for 1D Poisson . . . . . . . . . . . . 251
10.7.5 Variational geometric multigrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
10.7.6 Usage of MG in practice and MG as a preconditioner . . . . . . . . . . . . . . . . . . . 257
10.7.7 Convergence analysis of the W cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
10.8 Numerical tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
10.9 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260

11 Applications in continuum mechanics: linearized elasticity and Stokes 261


11.1 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
11.2 Well-posedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
11.3 Finite element discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
11.4 Numerical tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
11.4.1 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
11.4.2 2D test with focus on the maximum principle . . . . . . . . . . . . . . . . . . . . . . . . 268
11.5 Stokes - FEM discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
11.6 Numerical test - 2D-1 benchmark without convection term (Stokes) . . . . . . . . . . . . . . . . 270
11.7 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

12 Methods for time-dependent PDEs: parabolic and hyperbolic problems 271


12.1 Principle procedures for discretizing time and space . . . . . . . . . . . . . . . . . . . . . . . . 271
12.2 Bochner spaces - space-time functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

12.3 Methods for parabolic problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
12.3.1 Problem statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
12.3.2 Temporal discretization via One-Step-θ schemes . . . . . . . . . . . . . . . . . . . . . . 274
12.3.3 Spatial discretization and an abstract One-Step-θ scheme . . . . . . . . . . . . . . . . . 275
12.3.4 Spatial discretization for the forward Euler scheme . . . . . . . . . . . . . . . . . . . . . 275
12.3.5 Evaluation of the integrals (1D in space) . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
12.3.6 Final algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
12.3.7 Recapitulation of ODE stability analysis using finite differences . . . . . . . . . . . . . . 278
12.3.8 Refinements of A-stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
12.3.9 Stiff problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
12.3.10 Numerical analysis: stability analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
12.3.11 Convergence results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
12.3.12 Numerical tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
12.3.13 Student project in the FEM-C++ course in the summer of 2018 . . . . . . . . . . 291
12.4 Time discretization by operator splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.4.1 Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.4.2 Peaceman-Rachford time discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.4.3 The Fractional-Step-θ scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
12.4.4 Modified Fractional-Step-θ schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
12.5 The discontinuous Galerkin (DG) method for temporal discretization . . . . . . . . . . . . . . . 300
12.6 Methods for second-order-in-time hyperbolic problems . . . . . . . . . . . . . . . . . . . . . . . 301
12.6.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
12.6.2 Variational formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
12.6.3 A space-time formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
12.6.4 Energy conservation and consequences for good time-stepping schemes . . . . . . . . . . 303
12.6.5 One-Step-θ time-stepping for the wave equation . . . . . . . . . . . . . . . . . 305
12.6.6 Stability analysis / energy conservation on the time-discrete level . . . . . . . . . . . . . 306
12.6.7 Convergence results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
12.6.8 Summary of stability, convergence and energy conservation . . . . . . . . . . . . . . . . 308
12.7 Numerical tests: scalar wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
12.8 Numerical tests: elastic wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
12.8.1 Constant force - stationary limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
12.8.2 Time-dependent force - Crank-Nicolson scheme . . . . . . . . . . . . . . . . . . . . . . . 312
12.8.3 Time-dependent force - backward Euler scheme . . . . . . . . . . . . . . . . . . . . . . . 313
12.9 Numerical tests: elastic wave equation in a flow field . . . . . . . . . . . . . . . . . . . . . . . . 314
12.10 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

13 Nonlinear problems 317


13.1 Nonlinear PDEs: strong forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
13.1.1 p-Laplace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
13.1.2 Semilinear equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
13.1.3 Incompressible Euler equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
13.1.4 Incompressible Navier-Stokes equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
13.1.5 A coupled nonlinear problem (I) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
13.1.6 A coupled nonlinear problem (II) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
13.1.7 A coupled nonlinear problem (III) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
13.2 Nonlinear PDEs: weak forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
13.3 A variational inequality I: Obstacle problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
13.4 Exercise: FEM for the penalized 1D obstacle problem . . . . . . . . . . . . . . . . . . . . . . . 321
13.4.1 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
13.5 A variational inequality II: Phase-field fracture . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
13.6 Differentiation of nonlinear operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
13.7 Linearization techniques: brief overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323

13.8 Fixed-point iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
13.8.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
13.8.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
13.8.3 Example of a 1D nonlinear equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
13.9 Linearization via time-lagging (example: NSE) . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
13.9.1 Stokes linearization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
13.9.2 Oseen linearization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
13.10 Newton's method in R - the Newton-Raphson method . . . . . . . . . . . . . . . . . . 327
13.10.1 Newton’s method: overview. Going from R to Banach spaces . . . . . . . . . . . . . . . 329
13.10.2 A basic algorithm for a residual-based Newton method . . . . . . . . . . . . . . . . . . . 329
13.11 Inexact Newton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
13.12 Newton's method for variational formulations . . . . . . . . . . . . . . . . . . . . . 330
13.13 Continuing Section 13.8.2, now with Newton . . . . . . . . . . . . . . . . . . . . . . 332
13.14 Continuing Section 13.8.3, now with Newton . . . . . . . . . . . . . . . . . . . . . . 332
13.15 The regularized p-Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
13.15.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
13.15.2 Augmented Lagrangian techniques as one option for the numerical solution . . . . . . . 334
13.15.3 Numerical test p-Laplacian w. Newton-CG nonlinear/linear solver w. GMG preconditioning . . 335
13.16 Temporal discretization of non-linear time-dependent problems . . . . . . . . . . . . 336
13.16.1 Classifying the semi-linear forms of PDE systems . . . . . . . . . . . . . . . . . . . . . . 336
13.16.2 Nonlinear time derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
13.16.3 One-Step-θ and Fractional-Step-θ schemes . . . . . . . . . . . . . . . . . . . . . . . . . . 337
13.17 Resulting time discretized problems . . . . . . . . . . . . . . . . . . . . . . . . . . 338
13.18 An academic example of finite-difference-in-time, Galerkin-FEM-in-space discretization and linearization in a Newton setting . . . . . . . . . . . . . . . . . . . . . . . . . . 338
13.19 Navier-Stokes - FEM discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
13.20 Newton solver performance for Poiseuille flow in a channel . . . . . . . . . . . . . . 343
13.20.1 Stokes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
13.20.2 Navier-Stokes with ν_f = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
13.20.3 Navier-Stokes with ν_f = 1e-2 . . . . . . . . . . . . . . . . . . . . . . . . . 343
13.21 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

14 Coupled problems 345


14.1 An interface-coupled problem in 1D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
14.2 Volume coupling of two PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
14.3 Partitioned versus monolithic coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
14.4 Iterative coupling (known as partitioned or staggered) schemes . . . . . . . . . . . . . . . . . . 348
14.5 Variational-monolithic coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
14.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
14.6.1 Two stationary elliptic PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
14.6.2 Two time-dependent parabolic PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
14.6.3 The Biot problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
14.7 Interface coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
14.7.1 Preliminaries: Interface-Tracking and Interface-Capturing . . . . . . . . . . . . . . . . . 350
14.7.2 Interface coupling of two PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
14.7.3 Variational-monolithic coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
14.8 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
14.8.1 Two stationary elliptic PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
14.8.2 The augmented Biot-Lamé-Navier problem . . . . . . . . . . . . . . . . . . . . . . . . . 353
14.9 Variational-monolithic fluid-structure interaction . . . . . . . . . . . . . . . . . . . . . . . . . . 353
14.9.1 A compact semi-linear form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
14.9.2 Preparing temporal discretization: grouping the FSI problem . . . . . . . . . . . . . . . 354
14.9.3 Implementation in ANS/deal.II and DOpElib; both based on C++ . . . . . . . . . . . . 355

14.10 Diffusive interface approximations . . . . . . . . . . . . . . . . . . . . . . . . . . 356
14.11 Sharp interface approximations in cut-elements with a locally modified FEM . . . . . . 356
14.12 On the nonlinear and linear numerical solutions . . . . . . . . . . . . . . . . . . . . 356
14.12.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
14.13 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

15 Public climate school (November 2019/2021): PDEs in atmospheric physics 359


15.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
15.2 Five governing equations in meteorology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
15.3 Some values and units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
15.4 Classifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
15.4.1 Conservation laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
15.4.2 Order, steady-state vs. nonstationary, single vs. vector-valued . . . . . . . . . . . . . . . 361
15.4.3 Coupling and nonlinearities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
15.4.4 Identification of similar structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
15.5 Weak formulations and solution algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
15.5.1 Examples of weak formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
15.5.2 Principle solution schemes: monolithic versus staggered . . . . . . . . . . . . . . . . . . 364
15.5.3 A fully monolithic formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
15.5.4 Staggered solution (time explicit coupling) . . . . . . . . . . . . . . . . . . . . . . . . . . 365
15.6 FEM discretization in 1D of ideal gas equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
15.7 Navier-Stokes discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
15.8 Boussinesq approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

16 Computational convergence analysis 367


16.1 Discretization error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
16.1.1 Relationship between h and N (DoFs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
16.1.2 Discretization error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
16.1.3 Computationally-obtained convergence order . . . . . . . . . . . . . . . . . . . . . . . . 369
16.1.4 Temporal order for FE, BE, CN in ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . 369
16.2 Spatial discretization error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
16.3 Temporal discretization error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
16.4 Extrapolation to the limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
16.5 Iteration error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373

17 Software and programming codes 375


17.1 Research software (C++) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
17.2 Programming codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
17.2.1 deal.II (C++) step-3 in a 1D version: clothesline / Poisson problem . . . . . . . . . . . 375
17.2.2 Poisson 1D in python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
17.2.3 Poisson 1D in python (short-elegant version) by Roth/Schröder . . . . . . . . . . . . . . 382
17.2.4 Poisson 2D in python with assembling and numerical quadrature by Roth/Schröder . . 384
17.2.5 Newton’s method for solving nonlinear problems: octave/MATLAB and C++ . . . . . . 387
17.2.6 Time-stepping schemes for ODEs in Octave . . . . . . . . . . . . . . . . . . . . . . . . . 390
17.2.7 deal.II (C++) PU-DWR implementation used in Section 9 . . . . . . . . . . . . . . . . 393
17.2.8 deal.II (C++) heat equation used in Section 12.3.12 . . . . . . . . . . . . . . . . . . . . 401
17.2.9 deal.II (C++) Navier-Stokes illustrating nonlinear problems, Chapter 13 . . . . . . . . . 414
17.3 Chapter summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417

18 Wrap-up 419
18.1 Quiz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
18.1.1 Basic techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
18.1.2 Advanced techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
18.2 Hints for presenting results in a talk (FEM-C++ practical class) . . . . . . . . . . . . . . . . . 422
18.2.1 Writing an abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

18.2.2 Some example pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
18.3 The end . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428

Bibliography 429

Index 437


1 Main literature and acknowledgements


1.1 Linear PDEs and finite elements
1. C. Johnson 1987; Numerical solution of partial differential equations by the finite element method [82]
(in English; original version in Swedish)
2. P. Ciarlet 1987: The finite element method for elliptic problems [35] (in English)
3. R. Rannacher 2017 (in German); Numerik partieller Differentialgleichungen [106]
4. K. Eriksson and D. Estep and P. Hansbo and C. Johnson: Computational Differential Equations 2009
[48] (in English)
5. Ch. Grossmann, H.-G. Roos, M. Stynes; Numerical treatment of partial differential equations [65] (in
English; original version in German).
6. T. Hughes 2000; The finite element method [78] (in English)
7. G. F. Carey and J. T. Oden 1984; Finite Elements. Volume III. Computational Aspects [30] (in English)
8. D. Braess 2007; Finite Elemente [24] (in German)
9. G. Allaire, F. Alouges: Analyse variationnelle des équations aux dérivées partielles [3] (in French)
10. S. Brenner and L.R. Scott 2008; The mathematical theory of finite element methods [26] (in English)
11. W. Hackbusch 1986: Theorie und Numerik elliptischer Differentialgleichungen [68] (in German; also
available in English)
12. H.R. Schwarz: Methode der finiten Elemente [120] (in German)
13. G. Allaire: Introduction to mathematical modelling, numerical simulation, and optimization [2] (in
English; original version in French).
14. P.G. Ciarlet: Linear and nonlinear functional analysis [36] (in English).

1.2 Nonlinear and coupled PDEs


(the order of the following list is purely arbitrary)
1. G. Duvaut, J.L. Lions: Inequalities in mechanics and physics, Springer, 1976 [43]
2. R. Glowinski; Numerical methods for nonlinear variational problems, 1982
3. R. Trémolières, J.L. Lions, R. Glowinski; Numerical Analysis of Variational Inequalities, 2011 (original version in French) [130]
4. R. Glowinski, P. Le Tallec; Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics [59]
5. F.-T. Suttmeier; Numerical solution of Variational Inequalities by Adaptive Finite Elements, Vieweg-
Teubner, 2008 [123]
6. T. Richter; Fluid-structure interactions: models, analysis, and finite elements, Springer, 2017 [112]
7. J. M. Ortega, W.C. Rheinboldt; Iterative solution of nonlinear equations in several variables, Academic
Press, New York, 1970
8. H. Schwetlick; Numerische Lösung nichtlinearer Gleichungen. Verlag der Wissenschaften, Berlin, 1978
9. N. Kikuchi, J.T. Oden; Contact problems in elasticity, SIAM, 1988. [83]
10. S. Bartels; Numerical methods for nonlinear PDEs, 2015 [12]
11. T. Wick; Multiphysics phase-field fracture, de Gruyter, 2020 [141]

1.3 Own materials
For the definition of the ‘Classes’, we refer to page 3.
1. Class 1: T. Richter, T. Wick; Einführung in die numerische Mathematik - Begriffe, Konzepte und
zahlreiche Anwendungsbeispiele, Springer, 2017 [114] (in German)
https://www.springer.com/fr/book/9783662541777
2. Class 1 (English version; École Polytechnique): S. Amstutz, T. Wick; Refresher course in maths and
a project on numerical modeling done in twos. Hannover : Institutionelles Repositorium der Leibniz
Universität Hannover, 2021-12-30, 276 S. [5] (in English)
https://doi.org/10.15488/11629
3. Class 1b: M.C. Steinbach, J.P. Thiele, T. Wick; Algorithmisches Programmieren (Numerische Algorithmen
mit C++), Hannover : Institutionelles Repositorium der Leibniz Universität Hannover, 2021, [122]
(in German)
https://doi.org/10.15488/11583
4. Class 2: L. Samolik, T. Wick; Numerische Methoden für gewöhnliche Differentialgleichungen und
Eigenwertprobleme, Mitschrift/Skriptum, SS 2019, Leibniz Universität Hannover, 2019 [117] (in German)
and
Class 2/3: T. Wick, P. Bastian: PeC3 Spring School on Introduction to Numerical Modeling with
Differential Equations. Hannover : Institutionelles Repositorium der Leibniz Universität Hannover, 2019.
DOI: https://doi.org/10.15488/6427 [143] (in English)
5. Classes 3, 3b, 3c: These lecture notes (in English)
6. Class 3c: J. Goldenstein, T. Wick; Goal-oriented a posteriori error estimation and adaptive finite
elements; Lecture notes WS 19/20, Leibniz Universität Hannover, Feb 2020 (in English)
7. Class 4: T. Wick; Modeling, Discretization, Optimization, and Simulation of Fluid-Structure Interaction;
Lecture notes at Heidelberg University, TU Munich, and JKU Linz, 2016, available on
https://www-m17.ma.tum.de/Lehrstuhl/LehreSoSe15NMFSIEn (in English)
8. Class 5a: These lecture notes (in English)
9. Class 5b: K. Mang, T. Wick; Numerical methods for variational phase-field fracture problems; Hannover
: Institutionelles Repositorium der Leibniz Universität Hannover, 2019,
DOI: https://doi.org/10.15488/5129 (in English)
10. Classes 4, 5a, 5b (additional literature): T. Wick; Multiphysics phase-field fracture, de Gruyter, 2020
[141], DOI: http://www.degruyter.com/books/978-3-11-049656-7

1.4 Online materials (WS 20/21) for IntNumPDE (an international open class)
Due to the COVID-19 online lectures, students of mine (Max Schröder and Julian Roth) complemented these
lecture notes with additional videos published on YouTube:
• PDEs and their classification: https://www.youtube.com/watch?v=afNZj5URZsc&feature=youtu.be
• Finite differences: https://youtu.be/YotrBNLFen0
• Finite elements: https://youtu.be/P4lBRuY7pC4
• Galerkin orthogonality and Céa’s lemma: https://www.youtube.com/watch?v=uPt5PIW1Rfo
• Lax-Milgram lemma: https://www.youtube.com/watch?v=w8QjyVug78M
All materials of this online class (handwritten notes, edited lecture notes, exercise sheets) can be found on:
http://www.thomaswick.org/num_pde_winter_2020_engl.html

1.5 IntNumPDE: an international open class and computing future lectures
Link to the webpage:

http://www.thomaswick.org/num_pde_winter_2020_engl.html

1.6 Teaching funding acknowledgements (WS 20/21)


Due to the COVID-19 global crisis, financial support for improving teaching lecture materials (such as these
notes and the previously mentioned videos) is gratefully acknowledged:

• DAAD (German Academic Exchange Service): PeCCC (Peruvian Competence Center of Scientific Com-
puting), Stärkung des Wissenschaftlichen Rechnens in der Lehre in Peru,
https://www.pec3.org/
• Leibniz University Hannover: International Office, South America support administered by Ms. Rhina
Colunge-Peters
• Leibniz University Hannover: Fonds für Internationalisierung


2 Motivation
These lecture notes are devoted to the numerical solution of partial differential equations (PDEs).
A PDE is an equation for an unknown function of at least two independent variables in which both derivative
information and function values may appear. PDEs play important roles in continuum mechanics for modeling
applications in physics, biology, chemistry, finance, and mechanical/civil/environmental engineering. The focus
in this course is on numerical modeling of PDEs, which allows us to discretize them for computer simulations.
We discuss various discretization schemes, the design of algorithms, and their rigorous numerical analysis.
Moreover, we discuss the numerical solution of the arising linear equation systems. This class is an important
part of scientific computing education in which mathematical modeling, numerical methods, and computer
implementations interact. In these notes, not only classical topics for linear PDEs, methods for nonlinear
PDEs, and coupled problems are addressed, but we also dive into current state-of-the-art techniques.

2.1 Scientific computing (english) / Wissenschaftliches Rechnen (german)


Developing and analyzing algorithms for solving PDEs with a computer is a part of numerical methods,
which itself is a part of scientific computing. Scientific computing comprises three main fields:
1. Mathematical modeling and analysis of physical, biological, chemical, economical, financial processes,
and so forth;

2. Development of reliable and efficient numerical methods and algorithms and their analysis;
3. Research software development: Implementation of the derived algorithms
All these steps work in a feedback manner and the different subtasks interact with each other. It is in fact
the third aspect above, namely software and computers, which helped to establish this third category of science.
Thus, a new branch of mathematics, computational science, has been established. This kind of mathematics
may become experimental, similar to experiments in physics/chemistry/biology.
A key task is the design and analysis of algorithms:
Definition 2.1 (Algorithm). An algorithm is an instruction for a schematic solution of a mathematical
problem statement. The main purpose of an algorithm is to formulate a scheme that can be implemented into
a computer to carry out so-called numerical simulations. Direct schemes solve the given problem up to round-off
errors (for instance Gaussian elimination). Iterative schemes approximate the solution up to a certain accuracy
(for instance Richardson iteration for solving linear equation systems, or fixed-point iterations). Algorithms
differ in terms of accuracy, robustness, and efficiency.
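To make the distinction between direct and iterative schemes concrete, the following small Python sketch contrasts Gaussian elimination with a damped Richardson iteration that is stopped by a residual-based criterion. The 2×2 system, the damping parameter, and the tolerance are illustrative choices made only for this sketch, not taken from these notes:

import numpy as np

# A small symmetric positive definite system (illustrative example)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Direct scheme: solves up to round-off errors (Gaussian elimination)
x_direct = np.linalg.solve(A, b)

# Iterative scheme: damped Richardson iteration x_{l+1} = x_l + omega*(b - A x_l)
omega = 0.2           # damping parameter; needs 0 < omega < 2/lambda_max(A) for SPD A
x = np.zeros_like(b)  # initial guess
for l in range(200):
    residual = b - A @ x
    if np.linalg.norm(residual) < 1e-10:  # stopping criterion (iteration error control)
        break
    x = x + omega * residual

print("direct:   ", x_direct)
print("iterative:", x, "after", l, "steps")

The direct solver returns the solution in one sweep, while the iterative scheme approximates it up to the prescribed accuracy, illustrating the iteration error discussed below.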
An important feedback task is to analyse (rigorously or computationally) these algorithms in order to
detect shortcomings and suggest improvements. These can be nowadays on the algorithmic side (classical
numerical analysis with convergence proofs) or the computational side (analysis of different discretization
levels, parameter variations by just running the code again) or coding improvements (re-organization of, e.g.,
for-loops in a finite element program), or hardware-specific aspects (e.g., CPU computing, GPU computing).

2.1.1 Addressing new problems


Computational science allows us to investigate research fields that have partially not been addressable in the
past. Why? On the one hand experiments are often too expensive, too far away (Mars, Moon, astronomy in
general), the scales are too small (nano-scale for example); or experiments are simply too dangerous. On the
other hand, mathematical theory or the explicit solution of an (ambitious) engineering problem in an analytical
manner is often impossible!
Nowadays, nonstationary, nonlinear, coupled PDE systems are addressed in which two or more
PDEs interact. Sometimes, these systems are subject to inequality constraints, resulting in CVIS (coupled
variational inequality systems) [141]. Mathematics helps to formalize and structure such systems. Hereupon,
established algorithms for the numerical discretization and numerical solution of known parts of the given
system shall be further extended to new problems and new applications.

2.2 Differential equations
Let us first define the meaning of a differential equation:

Definition 2.2. A differential equation is a mathematical equation that relates a function with its derivatives
such that the solution satisfies both the function and the derivatives.
Differential equations can be split into two classes:
Definition 2.3 (Ordinary differential equation (ODE) ). An ordinary differential equation (ODE) is an
equation (or equation system) involving an unknown function of one independent variable and certain of its
derivatives.
Definition 2.4 (Partial differential equation (PDE) ). A partial differential equation (PDE) is an equation (or
equation system) involving an unknown function of two or more variables and certain of its partial derivatives.
Such differential equations are posed in function spaces and analyzed therein. For the following, let us
denote:
• V function space of the (possibly unknown) exact solution u. We write u ∈ V .
• Ṽ function space of approximate solutions ũ. We write ũ ∈ Ṽ .

Numerical discretizations in space and time introduce parameters h and k in order to compute the approximate
solution, i.e., $\tilde u := u_{hk}$. We then aim for a discretization error that behaves as
$$\|u - u_{hk}\| \to 0 \quad \text{for } h \to 0,\ k \to 0.$$
Further errors may come into play as we will see after the next subsection. Moreover, the choice of good norms
is important to determine the accuracy.

2.3 Guiding questions in differential equations


1. What type of equations are we dealing with?

2. Are we dealing with a single PDE, coupled problems, nonlinear PDEs?


3. What are the physical conservation properties?
4. Are theoretical results known? Can the equations be related to (simplified) known PDEs?
5. What kind of discretization scheme(s) shall we use?

6. How do we design algorithms to compute the discrete solution $u_h$?


7. Can we prove that these algorithms really work?
8. Are they robust (stable with respect to parameter variations), accurate, and efficient?

9. Can we construct physics-based algorithms that maintain as best as possible conservation laws; in par-
ticular when several equations interact (couple)?
10. How far is $u_h$ away from $u$ in a certain (error) norm? Hint: comparing color figures gives a first impression,
but it is not science!
11. The discretized systems (to obtain $u_h$) are often large with a huge number of unknowns: how do we solve
these linear equation systems?
12. What is the computational cost?
13. How can we achieve more efficient algorithms? Hint: adaptivity, parallel computing, model order reduc-
tion.

14. Are analytical solutions or experimental results available to compare with?
15. How can we check that the solution is correct?
16. How do we present numerical results: figures, graphs, tables?
17. What parts of the solution are of interest? The entire solution? Only certain goal functionals?
18. What is my personal task or goal: theory, numerical analysis or implementation? For the latter, what
software is available on the market?

2.4 Errors
In order to realize algorithms, we go ahead and implement them in a programming language or software (for
instance Matlab/octave, python, fortran, C++) using a computer or cluster.
From the cycle over mathematical modeling, over the design of algorithms, up to their implementation, we
face several error sources:
1. The set of numbers that can be represented in a computer is finite; therefore, a numerical calculation is
limited by machine precision, which results in round-off errors.
2. The memory of a computer (or cluster) is finite and thus functions and equations can only be represented
through approximations. Thus, continuous information has to be represented through discrete information,
which results in investigating so-called discretization errors or, more generally, approximation
errors.
3. Interpolation errors appear when complicated functions are interpolated to simpler functions.
4. All further simplifications of a numerical algorithm (in order to solve the discrete problem), with the
final goal to reduce the computational time, result in iteration / truncation errors. One example is
the stopping criterion, which decides after how many steps an iteration is stopped. These can be further
divided into linear and nonlinear iteration errors.
5. Regularization errors appear when ill-posed models are adapted such that they become well-posed.
6. Homogenization errors arise when full-scale models are reduced (homogenized) such that they are
computationally simpler to solve.
7. Implementation errors, better known as bugs, appear when mistakes are made implementing algorithms
in a software. These errors can simply cause the program to abort. Then it is more or less
simple to find the place with the help of a debugger. But there are also subtle errors, which cannot be
easily found when the program nonetheless runs but shows strange results or behaviors.
8. Model errors: In order to make a ‘quick guess’ of a possible solution and to start the development
of an algorithm aimed at addressing at a later stage a difficult problem, often complicated (nonlinear)
differential equations are reduced to simple (in most cases linear) versions, which results in the so-called
model error.
9. Data errors and uncertainties: the data (e.g., input data, boundary conditions, parameters) are
obtained from experimental data and may be inaccurate themselves.
10. Inaccurate experimental setups that yield wrong reference solutions u to which numerical solutions
ũ are compared.
It is very important to understand and to accept that we can never avoid all these errors. The important
aspect is to control these errors and to provide answers whether these errors are too big, thus influencing the
interpretation of numerical simulations, or whether they can be assumed to be negligible. A big branch of
numerical mathematics is to derive error estimates that allow us to predict quantitative information about the
arising errors. One objective is to clarify whether different error parts are rather of theoretical value or whether
they are computable. Also, we need to study which error contribution dominates.

2.4.1 Illustration of three errors (experimental, theory, numerical) with the clothesline problem
You need to understand a bit of French for the words, but the figure should be self-explanatory.

Figure 1: Errors in experiments, theory and numerical modeling.

2.4.2 A priori and a posteriori error estimates


Error estimates can be classified into two categories:

• a priori estimates include the (unknown) exact solution u, such that η := η(u), and yield qualitative
convergence rates for asymptotic limits. They can be derived before (thus a priori) the approximation
is known.
• a posteriori error estimates are of the form η := η(ũ), i.e., they explicitly employ the approximation ũ
and therefore yield quantitative information with computable majorants (i.e., bounds), and can be further
utilized to design adaptive schemes (see Chapter 9).

2.5 Accuracy
According to the ISO norm 5725-1:1994, ‘accuracy is the closeness of an agreement between a test
result and an accepted reference value’. The justification of the reference value can be an issue in itself.
Closely related is the precision, which measures how closely two or more measurements agree with each other.
This is linked to reproducibility. Traditionally, precision rather comes from experimental observations, but
since numerical simulations are also ‘experimental’ in view of various parameters, precision may play a role
here as well.
Accuracy and precision can be quantified by employing error estimates. Mathematically, we need to frame
our problem statements in appropriate spaces that allow us to measure distances (i.e., the distance between
the test result and the reference value). A metric space is a set X with a metric on it. This metric associates
with any pair of elements (i.e., points), say a, b ∈ X, a distance. The concept of a distance is more general
than that of a norm.

Definition 2.5 (Metric space). A metric space is a pair (X, d) where X is a set and d is a metric (distance
function) on X. The function d is defined on X × X, where × denotes the Cartesian product of sets such that
X × X is the set of all ordered pairs of elements of X. For all x, y, z ∈ X, it holds

1. d is real-valued, finite and nonnegative: d ∈ [0, ∞)


2. d(x, y) = 0 if and only if x = y
3. d(x, y) = d(y, x)

4. d(x, y) ≤ d(x, z) + d(z, y)


Example 2.6. On the real line R the usual metric is defined by

d(x, y) = |x − y|.

The Euclidian plane R2 and the Euclidian space R3 are natural extensions. As another example, we obtain
the function space C[a, b] of continuous functions defined over the closed interval [a, b]. Further examples can
be found in any calculus (analysis) or functional analysis textbook.
The concretization of a metric known from the two-dimensional plane or the usual three-dimensional space in
which we live is the so-called norm. Thus, norms provide distance measures in more general spaces. This
leads to normed spaces. Such norms can now be employed to study the accuracy of mathematical models
and their numerical approximations.

Definition 2.7 (Normed space). A normed space X is a vector space with a norm. A norm on a real or
complex vector space X is a real-valued function on X whose value at x ∈ X is denoted by $\|x\|$ and which has
the following properties:
1. $\|x\| \ge 0$
2. $\|x\| = 0 \Leftrightarrow x = 0$
3. $\|\alpha x\| = |\alpha|\,\|x\|$
4. $\|x + y\| \le \|x\| + \|y\|$
where x, y ∈ X and α is any scalar from the underlying field R or C.
Definition 2.8 (Norm vs. metric). A norm on X induces a metric d on X, which is defined by
$$d(x, y) = \|x - y\|.$$
With the help of these definitions it is easy to infer that indeed a norm induces a metric. Therefore, normed
spaces are in particular metric spaces. It holds:
Proposition 2.9. The norm is continuous, that is $x \mapsto \|x\|$ is a continuous mapping of $(X, \|\cdot\|)$ into the field
R.
Distances, induced or not by a norm, can now be employed to study the accuracy of mathematical models
and their numerical approximations.

2.5.1 Consequences for numerical mathematics


Closing the loop to Section 2.2, one key task is often to design appropriate norms for measuring the distance
between a numerical approximation $u_h$ and the continuous (exact) solution $u$. These choices depend on the
geometry and on regularity properties of coefficients, functions, and right-hand sides. Such norms give quantitative
information about the (approximation) error for deciding when an approximation (here discretization) is
sufficiently close to our sought solution:
$$\|u - u_{hk}\| \to 0 \quad \text{for } h \to 0,\ k \to 0.$$
As another example, iteration error measurements allow us to decide when numerical algorithms can be
stopped:
$$\|u - u_l\| \to 0 \quad (l \to \infty),$$
where l indicates the iteration index. Very often, a mixture of both (and more situations) appears simultaneously.
A natural question is whether different norms yield similar answers. Here, one must distinguish between
finite-dimensional vector spaces and infinite-dimensional spaces. The latter results into the branch of functional
analysis. Therein, norms are in general not equivalent. Consequently, it does matter which norm is adopted.
In finite-dimensional spaces, it can be shown, however, that norms are indeed equivalent, but the involved
constants might influence the results.
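As a small numerical illustration of this last point, the following Python sketch (with an arbitrarily chosen test vector, purely for demonstration) evaluates the 1-, 2-, and maximum norms in R^n and checks the standard equivalence bounds; note how the equivalence constants grow with the dimension n:

import numpy as np

n = 1000
rng = np.random.default_rng(42)
x = rng.standard_normal(n)  # arbitrary test vector

norm_1 = np.linalg.norm(x, 1)
norm_2 = np.linalg.norm(x, 2)
norm_inf = np.linalg.norm(x, np.inf)
print(norm_1, norm_2, norm_inf)

# Standard equivalence bounds in R^n (constants depend on n):
assert norm_inf <= norm_2 <= np.sqrt(n) * norm_inf
assert norm_2 <= norm_1 <= np.sqrt(n) * norm_2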

2.6 Concepts in numerical mathematics


2.6.1 Definitions
In introductory classes on numerical methods, we deal with concepts that are very characteristic of numerical
modeling. In [114], we summarized them into seven points, which we will frequently encounter in the
forthcoming chapters:
1. Approximation: since analytical solutions are not possible to achieve as we just learned in the previous
section, solutions are obtained by numerical approximations.
2. Convergence: is a qualitative expression that tells us when members $a_n$ of a sequence $(a_n)_{n\in\mathbb{N}}$ are
sufficiently close to a limit $a$. In numerical mathematics this limit is often the solution that we are
looking for.
3. Order of convergence: While in analysis we are often interested in the convergence itself, in numerical
mathematics we must pay attention to how long it takes until a numerical solution has sufficient accuracy.
The longer a simulation takes, the more time and more energy (electricity to run the computer, air
conditioning of servers, etc.) are consumed. Therefore, we are heavily interested in developing fast
algorithms. In order to judge whether an algorithm is fast or not, we have to determine the order of
convergence.
4. Errors: Numerical mathematics can be considered as the branch ‘mathematics of errors’ as we discussed
in detail in Section 2.4.
5. Error estimation: This is one of the biggest and most classical branches in numerical mathematics.
We need to derive error formulae to judge the outcome of our numerical simulations and to measure
the difference of the numerical solution ũ and the (unknown) exact solution u in appropriate norms. A
crucial aspect is to find out which error contributions dominate the total error. Say we have
$$e_{\text{total}} = e_{\text{disc}} + e_{\text{model}}.$$
Just as an example, let us assume $e_{\text{disc}} \gg e_{\text{model}}$; then we can work with
$$e_{\text{total}} \approx e_{\text{disc}}.$$
It is at the very heart of numerics to derive such error estimates and to decide which terms dominate. It
may happen that in lucky cases very complicated terms do not need to be implemented because their
error contribution is negligible. In a book review [142], I review a work from Repin and Sauter (2020)
[109] in which I explain the purpose of error estimates for the general mathematical community and not
only for numerical specialists.
6. Efficiency: In general we can say: the higher the convergence order of an algorithm, the more efficient
the algorithm is. Therefore, we obtain the numerical solution to a given problem faster. But numerical
efficiency is not automatically related to resource-effective computing. For instance, developing a parallel
code using MPI (message passing interface), hardware optimization (CPU, GPU), or software optimizations
(ordering for-loops and arithmetic evaluations in some optimal way, etc.) can further reduce computational
costs.

7. Stability: Lastly, the robustness of algorithms and implementations with respect to parameter (model,
material, numerical) variations, boundary conditions, initial conditions, uncertainties must be studied.
Stability relates in the broadest sense to the third condition of Hadamard defined in Section 2.8.

2.6.2 Examples
We illustrate the previous concepts with the help of some examples.
1. Approximation: Two approximations of the clothesline problem:

2. Convergence: Converging approximations of the clothesline problem:

3. Order of convergence: Two different speeds of convergence:

4. Errors: We first refer the reader to Section 2.7 for a comparison of experimental, theoretical and
numerical situations. Second, we sharpen the sense of the influence of different errors. Not all errors

are equally important and sometimes, one might try to ‘optimize’ an error which has no significant
influence. Let us see this in more detail. Let the errors $e_{\text{Model}}, e_{\text{Numerics}}, e_{\text{Software}}$ enter. The total error
is defined as
$$e_{\text{Total}} = e_{\text{Model}} + e_{\text{Numerics}} + e_{\text{Software}}.$$
Let us assume that we have the numbers $e_{\text{Model}} = 1000$, $e_{\text{Numerics}} = 0.001$, $e_{\text{Software}} = 4$; the total error
is then given by
$$e_{\text{Total}} = 1000 + 0.001 + 4 = 1004.001.$$
Which error dominates? It is clearly $e_{\text{Model}} = 1000$. The relative influence is $e_{\text{Model}}/e_{\text{Total}} = 0.996$. So,
the other two error sources are negligible and would not need further attention in this specific example.
5. Error estimation: Error estimation is the process to obtain the concrete numbers 1000, 0.001, 4 in the
previous example. As discussed in Section 2.4.2, error estimates are classified into a priori estimates
(which include the unknown exact solution u and yield qualitative convergence rates) and a posteriori
estimates (which employ the computable approximation ũ and can be used to design adaptive schemes).

6. Efficiency: is more or less self-explanatory. A first answer is to look at CPU or wall time: how many
seconds, minutes, weeks, months does a program need to terminate and yield a result? A second answer
is to study ‘iteration numbers’ or arithmetic operations. The latter are often given in terms of the big O
notation. Having a linear equation system $Ax = b$ with $A \in \mathbb{R}^{n\times n}$ and $O(n^3)$ complexity means that we
need about $n^3$ (cubic in the number of unknowns n) arithmetic operations to calculate the result. For instance, for n = 100,
we need around 1 000 000 operations. Having another algorithm (yielding the same result of $Ax = b$)
with only $O(n)$ operations means that we only need around 100 operations, which is a great difference.
The development of efficient solvers for large linear equation systems is consequently a big branch in
numerics and scientific computing.
7. Stability: We finally come back to the clothesline problem and change a bit the left boundary condition:

2.7 Illustration of physical oscillations versus numerical instabilities
To illustrate the numerical concept No. 7, we study an example (taken from [135]) of periodic flow interacting
with an elastic beam (fluid-structure interaction).

Figure 2: Fluid flow (Navier-Stokes) interacts with an elastic beam. Due to a non-symmetry of the cylinder,
the beam starts oscillating. These oscillations are physical!

• Observe the tip of the elastic beam!


→ Physical oscillations! Shown in red color for a ‘good’ numerical scheme.

[Plot: drag at the beam tip over time t ∈ (2, 12); vertical axis ‘Drag’ ranging from about −200 to 800; two
time-stepping variants are compared: ‘Tangent CN (vw)’ and ‘Secant CN shifted (v)’.]

• The grey numerical scheme exhibits at some time around t ≈ 10 micro-oscillations which are due to
numerical instabilities. Finally, the grey numerical scheme has a blow-up and yields garbage solutions.

2.8 Well-posedness (in the sense of Hadamard 1923)
The concept of well-posedness is very general and in fact very simple.

Definition 2.10. Let A : X → Y be a mapping and X, Y topological spaces. Then, the problem

Ax = y (1)

is well posed if

1. for each y ∈ Y , Problem (1) has a solution x,


2. the solution x is unique,
3. the solution x depends continuously on the problem data.
The first condition is immediately clear. The second condition is also obvious but often difficult to meet; in
fact, many physical processes do not have unique solutions. The last condition says that if the input data
(right-hand side, boundary values, initial conditions) vary only a little bit, then the (unique) solution should
also vary only a little bit.
Remark 2.11. Problems in which one of the three conditions is violated are ill-posed.

Example 2.12 (Numerical differentiation). In view of the later chapters and since we shall deal with derivatives
throughout these notes, we consider the difference quotient $D_h g$ as approximation of the derivative $g'$ of a
function $g$. We know from introductory numerics:
$$D_h g(x) = \frac{g(x+h) - g(x)}{h}, \quad 0 \le x \le h/2,$$
$$D_h g(x) = \frac{g(x+h/2) - g(x-h/2)}{h}, \quad h/2 < x < 1 - h/2,$$
$$D_h g(x) = \frac{g(x) - g(x-h)}{h}, \quad 1 - h/2 \le x \le 1,$$
with a step size $h$. We know that
$$|g'(x) - D_h g(x)| = O(h^2) \quad \text{for the central difference quotient},$$
$$|g'(x) - D_h g(x)| = O(h) \quad \text{for the forward/backward difference quotients},$$
when $g$ is three times continuously differentiable. When $g$ is only of class $C^2[0,1]$ or when we only have
disturbed data $g_\varepsilon$ (e.g., data or model or round-off errors) with $\|g - g_\varepsilon\| \le \varepsilon$, then we have for the data error
$$|D_h(g_\varepsilon - g)(x)| \le \frac{2\varepsilon}{h}, \quad 0 \le x \le 1.$$
The total error is composed of the data error $D_h(g_\varepsilon - g)$ and the discretization error $g' - D_h g$:
$$D_h g_\varepsilon - g' = D_h(g_\varepsilon - g) + D_h g - g',$$
yielding the estimate
$$\|D_h g_\varepsilon - g'\| \le \frac{2\varepsilon}{h} + \frac{1}{2}\|g''\|_{C^\infty}\, h.$$
This means that for noisy data, i.e., $\varepsilon \neq 0$, the total error cannot become arbitrarily small as we would
normally hope. This problem is ill-posed, specifically for $h \ll \varepsilon$. The optimal step size is
$$h_\varepsilon = \frac{2}{\|g''\|_{C^\infty}^{1/2}}\, \varepsilon^{1/2},$$
if $g'' \neq 0$. In conclusion, the total error is of order $O(\varepsilon^{1/2})$.
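The following Python sketch illustrates this behavior; the choice g(x) = sin(x), the evaluation point, and the synthetic noise of amplitude ε are assumptions made purely for demonstration:

import numpy as np

rng = np.random.default_rng(0)
eps = 1e-8   # noise level in the data g_eps, i.e., |g_eps - g| <= eps
x0 = 1.0     # evaluation point; the exact derivative of sin is cos
for h in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7]:
    # forward difference quotient applied to the noisy data
    g_right = np.sin(x0 + h) + eps * rng.uniform(-1, 1)
    g_left = np.sin(x0) + eps * rng.uniform(-1, 1)
    Dh = (g_right - g_left) / h
    print(f"h = {h:.0e}   total error = {abs(Dh - np.cos(x0)):.3e}")

# Decreasing h first reduces the discretization error O(h), but below
# h_eps ~ 2 (eps/|g''|)^(1/2) ~ 2e-4 the amplified data error 2*eps/h takes over.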

Example 2.13 (An ill-posed elliptic equation from Hadamard). Let Ω := (−∞, ∞) × (0, δ). We consider the
Laplace equation (defined in Chapter 4). We define the following boundary value problem:
$$\Delta u = 0 \ \text{ in } \Omega, \qquad u(x, 0) = \varphi(x), \qquad \partial_y u(x, 0) = 0,$$
with the boundary function $\varphi(x) = \frac{\cos(nx)}{n}$. To this problem, we can compute the exact solution
$$u(x, y) = \frac{\cos(nx)\cosh(ny)}{n}.$$
For $n \to \infty$ we easily see the limits:
$$\lim_{n\to\infty} \varphi(x) \to 0, \qquad \lim_{n\to\infty} u(x, y) \to \infty$$
for the general case when we do not sit in a zero of $\cos(nx)$. Consequently, arbitrarily small boundary
data cause arbitrarily large solution deflections; the problem is therefore ill-posed.
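A small Python illustration (the strip width δ = 0.1 and the evaluation point are chosen here only for demonstration) shows the boundary data tending to zero while the solution value explodes:

import numpy as np

y = 0.1   # evaluate inside the strip (delta = 0.1)
x = 0.3   # arbitrary point with cos(n*x) != 0 for the chosen n
for n in [1, 10, 100, 1000]:
    phi = np.cos(n * x) / n                 # boundary data -> 0
    u = np.cos(n * x) * np.cosh(n * y) / n  # solution value -> infinity
    print(f"n = {n:5d}   |phi| = {abs(phi):.2e}   |u| = {abs(u):.2e}")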

2.9 Numerical methods for solving PDEs


There exist various methods for solving PDEs numerically:
• Finite differences: the derivatives in the differential equation are replaced by difference quotients. As
they are among the simplest methods and are still used frequently in engineering and industry, we
will provide a short introduction to them as well.
• Finite volume methods: they are based on local conservation properties of the underlying PDE.
• Variational methods such as Galerkin finite elements (FEM).

• Boundary element methods (BEM).


• Isogeometric analysis (IGA) [79].
• Spectral methods.
• Meshless methods.

2.10 Chapter summary and outlook


In this chapter, we recapitulated guiding questions and numerical concepts. We also defined in a brief way a
PDE. In the next chapter, we shall introduce the notation used throughout these lecture notes.


3 Notation
In this chapter, we collect most parts of the notation and some important results from linear algebra, analysis,
vector and tensor analysis, necessary for numerical mathematics.

3.1 Domains
We consider open, bounded domains Ω ⊂ Rd where d = 1, 2, 3 is the dimension. The boundary is denoted by
∂Ω. The outer normal vector with respect to (w.r.t.) to ∂Ω is n. We assume that Ω is sufficiently smooth
(i.e., a Lipschitz domain or domain with Lipschitz boundary) such that the normal n can be defined. What
also works for most of our theory are convex, polyhedral domains with finite corners. For specific definitions
of nonsmooth domains, we refer to the literature, e.g., [63].

3.2 Independent variables


A point in $\mathbb{R}^d$ is denoted by
$$x = (x_1, \ldots, x_d).$$
The variable for ‘time’ is denoted by t. The Euclidian scalar product is denoted by $(x, y) = x \cdot y = \sum_{i=1}^{d} x_i y_i$.

3.3 Function, vector and tensor notation


We start with
Definition 3.1. A real-valued or complex-valued function (of one variable) on a set X is a mapping f , which
associates in a unique fashion to each x ∈ X a number f (x). We write f : X → C or f : X → R and x 7→ f (x).
The set X is the definition space and f (X) := {f (x) ∈ C|x ∈ X} is the image set. The graph of f is the set

G(f ) := {(x, f (x))|x ∈ X} ⊂ X × C.

The variable x is the ‘independent’ variable or also called the argument. All x values form the set of definition
of f (x). The variable y is called ‘dependent’. All y values form the image space of f (x).

In these notes, functions are often denoted by

u := u(x)

if they only depend on the spatial variable x = (x1 , . . . , xd ). If they depend on time and space, they are
denoted by
u = u(t, x).
Usually in physics or engineering vector-valued and tensor-valued quantities are denoted in bold font size or
with the help of arrows. Unfortunately in mathematics, this notation is only sometimes adopted. We continue
this crime and do not distinguish scalar, vector, and tensor-valued functions. Thus for points in R3 we write:

x := (x, y, z) = x = ~x.

Similar for functions from a space u : R3 ⊇ U → R3 :

u := (ux , uy , uz ) = u = ~u.

And similarly for tensor-valued functions (which often have a bar or two bars under the tensor quantity), as
for example the Cauchy stress tensor $\sigma_f \in \mathbb{R}^{3\times 3}$ of a fluid:
$$\sigma_f := \underline{\underline{\sigma}}_f = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\ \sigma_{yx} & \sigma_{yy} & \sigma_{yz} \\ \sigma_{zx} & \sigma_{zy} & \sigma_{zz} \end{pmatrix}.$$

3.4 Partial derivatives
We frequently use:
$$\frac{\partial u}{\partial x} = \partial_x u, \qquad \frac{\partial u}{\partial t} = \partial_t u, \qquad \frac{\partial^2 u}{\partial t\,\partial t} = \partial_t^2 u, \qquad \frac{\partial^2 u}{\partial x\,\partial y} = \partial_{xy} u.$$

3.5 Chain rule


Let the functions g : (a, b) → R^m and f : R^m → R and their composition h = f(g) ∈ R be given, and specifically
g := (t, x) := (t, x_2, x_3, \ldots, x_m):
$$
\begin{aligned}
D_t h(x) = D_t f(g(x)) = D_t f(t, x) &= D_t f(t, x_2, \ldots, x_m) \\
&= \sum_{k=1}^{m} \partial_k f(g(x)) \cdot \partial_t g_k \\
&= \sum_{k=1}^{m} \partial_k f(t, x_2, \ldots, x_m) \cdot \partial_t x_k, \quad \text{where } x_1 := t \\
&= \partial_t f \cdot \partial_t t + \sum_{k=2}^{m} \partial_k f(t, x_2, \ldots, x_m) \cdot \partial_t x_k \\
&= \partial_t f + \nabla f \cdot (\partial_t x_2, \cdots, \partial_t x_m)^T.
\end{aligned}
$$
For instance, m = 4 means that we deal with a four-dimensional continuum (t, x, y, z).
Remark 3.2. See also [86][page 54 and page 93] for definitions of the chain rule.

3.6 Multiindex notation


For a general description of ODEs and PDEs the multiindex notation is commonly used.
• A multiindex is a vector α = (α_1, \ldots, α_n), where each component α_i ∈ N_0. The order is
$$|\alpha| = \alpha_1 + \ldots + \alpha_n.$$
• For a given multiindex we define the partial derivative:
$$D^\alpha u(x) := \partial_{x_1}^{\alpha_1} \cdots \partial_{x_n}^{\alpha_n} u.$$
• If k ∈ N_0, we define the set of all partial derivatives of order k:
$$D^k u(x) := \{ D^\alpha u(x) : |\alpha| = k \}.$$
Example 3.3. Let the problem dimension n = 3. Then, α = (α_1, α_2, α_3). For instance, let α = (2, 0, 1).
Then |α| = 3 and $D^\alpha u(x) = \partial_x^2 \partial_z^1 u(x)$.

3.7 Gradient, divergence, trace, Laplace, rotation/curl
Well-known in physics, it is convenient to work with the nabla-operator to define derivative expressions.
The gradient of a scalar-valued function v : R^n → R reads:
$$\nabla v = \begin{pmatrix} \partial_1 v \\ \vdots \\ \partial_n v \end{pmatrix}.$$
The gradient of a vector-valued function v : R^n → R^m is called the Jacobian matrix and reads:
$$\nabla v = \begin{pmatrix} \partial_1 v_1 & \ldots & \partial_n v_1 \\ \vdots & & \vdots \\ \partial_1 v_m & \ldots & \partial_n v_m \end{pmatrix}.$$
The divergence is defined for vector-valued functions v : R^n → R^n:
$$\operatorname{div} v := \nabla \cdot v := \nabla \cdot \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = \sum_{k=1}^{n} \partial_k v_k.$$
The divergence of a tensor σ ∈ R^{n×n} is defined row-wise as:
$$\nabla \cdot \sigma = \Bigl( \sum_{j=1}^{n} \frac{\partial \sigma_{ij}}{\partial x_j} \Bigr)_{1 \le i \le n}.$$
The trace of a matrix A ∈ R^{n×n} is defined as
$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$
Definition 3.4 (Laplace operator). The Laplace operator of a two-times continuously differentiable scalar-valued
function u : R^n → R is defined as
$$\Delta u = \sum_{k=1}^{n} \partial_{kk} u.$$
Definition 3.5. For a vector-valued function u : R^n → R^m, we define the Laplace operator component-wise
as
$$\Delta u = \Delta \begin{pmatrix} u_1 \\ \vdots \\ u_m \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^{n} \partial_{kk} u_1 \\ \vdots \\ \sum_{k=1}^{n} \partial_{kk} u_m \end{pmatrix}.$$
Let us also introduce the cross product of two vectors u, v ∈ R^3:
$$\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \times \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{pmatrix}.$$
With the help of the cross product, we can define the rotation or curl:
$$\operatorname{rot} v = \nabla \times v = \begin{pmatrix} \partial_x \\ \partial_y \\ \partial_z \end{pmatrix} \times \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} \partial_y v_3 - \partial_z v_2 \\ \partial_z v_1 - \partial_x v_3 \\ \partial_x v_2 - \partial_y v_1 \end{pmatrix}.$$
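As a quick sanity check of these differential operators, the following sympy sketch (an illustrative snippet with an arbitrarily chosen smooth vector field, not part of the original notes) computes the curl and verifies the classical identity div(rot v) = 0:

import sympy as sp

x, y, z = sp.symbols("x y z")
# An arbitrary smooth vector field v = (v1, v2, v3)
v1, v2, v3 = x**2 * y, sp.sin(y * z), sp.exp(x) * z

curl = (sp.diff(v3, y) - sp.diff(v2, z),
        sp.diff(v1, z) - sp.diff(v3, x),
        sp.diff(v2, x) - sp.diff(v1, y))

div_of_curl = sp.diff(curl[0], x) + sp.diff(curl[1], y) + sp.diff(curl[2], z)
print(sp.simplify(div_of_curl))  # prints 0: div(rot v) = 0 for smooth fields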

3.8 Some properties of matrices
3.8.1 Eigenvalues
The complex or real number λ is an eigenvalue of the square matrix A if there is a vector x ∈ C^n, x ≠ 0, such
that the eigenvalue equation holds:
$$Ax = \lambda x.$$
The vector x is called eigenvector corresponding to the eigenvalue λ. From linear algebra arguments, λ is an
eigenvalue of A if and only if
$$\det(A - \lambda I) = 0,$$
which is the so-called characteristic equation for A. The left-hand side is a polynomial of degree n and called
the characteristic polynomial of A.

3.8.2 Positive definite, eigenvalues and more


Let Q : R^n → R be a quadratic form with Q(x) = x^T A x. The quadratic form (and also its representing matrix
A) have certain names according to their properties:
• Q is positive definite, when Q(x) > 0 for all x ≠ 0
• Q is positive semi-definite, when Q(x) ≥ 0 for all x
• Q is negative definite, when Q(x) < 0 for all x ≠ 0
• Q is negative semi-definite, when Q(x) ≤ 0 for all x
• Q is indefinite, when Q has positive and negative values.
The relation to the eigenvalues is:
• Q > 0 if and only if all eigenvalues λ_i > 0
• Q ≥ 0 if and only if all eigenvalues λ_i ≥ 0
• Q < 0 if and only if all eigenvalues λ_i < 0
• Q ≤ 0 if and only if all eigenvalues λ_i ≤ 0
• Q indefinite if and only if eigenvalues λ_i < 0 and λ_i > 0 arise.

3.8.3 Invariants
The principal invariants of a matrix A are the coefficients of the characteristic polynomial det(A − λI). A
matrix A ∈ R^{3×3} has three principal invariants, namely i_A = (i_1(A), i_2(A), i_3(A)) with
$$\det(\lambda I - A) = \lambda^3 - i_1(A)\lambda^2 + i_2(A)\lambda - i_3(A).$$
Let λ_1, λ_2, λ_3 be the eigenvalues of A. Then we have
$$i_1(A) = \operatorname{tr}(A) = \lambda_1 + \lambda_2 + \lambda_3,$$
$$i_2(A) = \frac{1}{2}\bigl[(\operatorname{tr} A)^2 - \operatorname{tr}(A^2)\bigr] = \lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_2\lambda_3,$$
$$i_3(A) = \det(A) = \lambda_1\lambda_2\lambda_3.$$
Remark 3.6. If two different matrices have the same principal invariants, they also have the same eigenvalues.
Remark 3.7 (Cayley-Hamilton). Every second-order tensor (i.e., a matrix) satisfies its own characteristic
equation:
$$A^3 - i_1 A^2 + i_2 A - i_3 I = 0.$$
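The invariants and the Cayley-Hamilton theorem are easy to verify numerically. The following Python sketch (with a randomly generated matrix, purely for illustration) computes i_1(A), i_2(A), i_3(A) and checks that the characteristic-equation residual vanishes up to round-off:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))  # arbitrary 3x3 matrix

i1 = np.trace(A)
i2 = 0.5 * (np.trace(A)**2 - np.trace(A @ A))
i3 = np.linalg.det(A)

# Cayley-Hamilton: A^3 - i1*A^2 + i2*A - i3*I = 0
residual = A @ A @ A - i1 * (A @ A) + i2 * A - i3 * np.eye(3)
print(np.abs(residual).max())    # ~1e-15, i.e., zero up to round-off

# Cross-check against the eigenvalues:
lam = np.linalg.eigvals(A)
print(i1, lam.sum())             # i1 = lambda1 + lambda2 + lambda3
print(i3, np.prod(lam).real)     # i3 = lambda1 * lambda2 * lambda3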

3.9 Vector spaces
Let K = R. In fact, K = C would work as well, as would any general field. But we restrict our attention in the
entire lecture notes to the real numbers R.
Definition 3.8 (Vector space). A vector space or linear space over a field K is a nonempty set X (later
often denoted by V, U, or also W). The space X contains elements x_1, x_2, \ldots, which are the so-called vectors.
We define two algebraic operations:
• Vector addition: x + y for x, y ∈ X.

• Multiplication of vectors with scalars: αx for x ∈ X and α ∈ K.


These operations satisfy the usual laws that they are commutative, associative, and satisfy distributive laws.

3.10 Scalar product


Let V be a vector space over R (note C would work as well). A mapping (·, ·) : V × V → R is a scalar product
(or inner product) if
1. (u + v, w) = (u, w) + (v, w) for all u, v, w ∈ V

2. (αu, v) = α(u, v) for all u, v ∈ V and α ∈ R


3. (u, v) = (v, u) for all u, v ∈ V
4. (u, u) ≥ 0 for all u ∈ V
5. (u, u) = 0 if and only if when u = 0.

Moreover:
1. (u, v + w) = (u, v) + (u, w) for all u, v, w ∈ V
2. (u, αv) = ᾱ(u, v) for all u, v ∈ V and α ∈ R

For real-valued functions, the scalar product is bilinear and in the complex-valued case the scalar product is
called sesquilinear.
It holds:
Proposition 3.9 (Cauchy-Schwarz). Let V be a vector space with scalar product (·, ·) and induced norm
$\|u\| := (u, u)^{1/2}$. Then, it holds
$$|(u, v)| \le \|u\|\,\|v\| \quad \text{for all } u, v \in V.$$

3.11 Normed spaces


Let X be a linear space. The mapping || · || : X → R is a norm if
i) ||x|| ≥ 0 ∀x ∈ X (Positivity)
ii) ||x|| = 0 ⇔ x = 0 (Definiteness)
iii) ||αx|| = |α| ||x||, α∈K (Homogenity)
iv) ||x + y|| ≤ ||x|| + ||y|| (Triangle inequality)

A space X is a normed space when the norm properties are satisfied. If condition ii) is not satisfied, the
mapping is called a semi-norm and denoted by |x|X for x ∈ X.

Definition 3.10. Let k · k be a norm on X. Then {X, k · k} is called a (real) normed space.
Example 3.11. We provide some examples:

1. $\mathbb{R}^n$ with the Euclidian norm $\|x\| = \bigl(\sum_{i=1}^{n} x_i^2\bigr)^{1/2}$ is a normed space.
2. Let Ω := [a, b]. The space of continuous functions C(Ω) endowed with the maximum norm
$$\|u\|_{C(\Omega)} = \max_{x \in \Omega} |u(x)|$$
is a normed space.
3. The space $\{C(\Omega), \|\cdot\|_{L^2}\}$ with
$$\|u\|_{L^2} = \Bigl( \int u(x)^2\, dx \Bigr)^{1/2}$$
is a normed space.
Definition 3.12. Two norms are equivalent if converging sequences have the same limits.
Proposition 3.13. Two norms || · ||A , || · ||B on X are equivalent if and only if there exist two constants
C1 , C2 > 0 such that
C1 ||x||A ≤ ||x||B ≤ C2 ||x||A ∀x ∈ X
The limits are the same.

Proof. See e.g., [134].


Remark 3.14. These statements have indeed some immediate consequences. For instance, convergence
of an iterative scheme is often proven with the help of the Banach fixed-point theorem, in which the contraction
constant q must be smaller than 1; see for instance Section 10. It is important to note that not all norms may
satisfy q < 1, but when different norms are equivalent and we pick one that satisfies q < 1, we can prove convergence.

3.12 Linear mappings


Let $\{U, \|\cdot\|_U\}$ and $\{V, \|\cdot\|_V\}$ be normed spaces over R.
Definition 3.15 (Linear mappings). A mapping T : U → V is called linear or linear operator when

T (u) = T (au1 + bu2 ) = aT (u1 ) + bT (u2 ),

for u = au1 + bu2 and for a, b ∈ R.


Example 3.16. We discuss two examples:

1. Let T (u) = ∆u. Then:

T (au1 + bu2 ) = ∆(au1 + bu2 ) = a∆u1 + b∆u2 = aT (u1 ) + bT (u2 ).

Thus, T is linear.
2. Let T (u) = (u · ∇)u. Then:

$$T(au_1 + bu_2) = ((au_1 + bu_2) \cdot \nabla)(au_1 + bu_2) \neq a(u_1 \cdot \nabla)u_1 + b(u_2 \cdot \nabla)u_2 = aT(u_1) + bT(u_2).$$

Here, T is nonlinear.
Exercise 1. Classify T (u) = |u|.

Definition 3.17 (Linear functional). A mapping T : U → V with V = R is called linear functional.


Definition 3.18. A mapping T : U → V is called continuous when
$$\lim_{n\to\infty} u_n = u \ \Rightarrow\ \lim_{n\to\infty} T u_n = T u.$$

Definition 3.19. A linear operator T : U → V is called bounded when the following estimate holds true:
$$\|Tu\|_V \le c\,\|u\|_U.$$

Theorem 3.20. A linear operator is bounded if and only if it is continuous.


Proof. See [134], Satz II.1.2.
Definition 3.21. Let T : U → V be a linear and continuous operator. The norm of T is defined as
$$\|T\| = \sup_{\|u\|_U = 1} \|Tu\|_V.$$
Since a linear T is bounded (when continuous), it holds $\|Tu\|_V \le c\,\|u\|_U$ for all u ∈ U. The smallest number
for c is c = $\|T\|$.
Definition 3.22. The linear space of all linear and bounded operators from U to V is denoted by

L(U, V ).

The norm on L(U, V) is the operator norm $\|T\|$.


Definition 3.23 (Dual space). The linear space of all linear, bounded functionals on U (see Def. 3.17) is the
dual space, denoted by U*, i.e.,
$$U^* = L(U, \mathbb{R}).$$
For f ∈ U*, the norm is given by:
$$\|f\|_{U^*} = \sup_{\|u\|_U = 1} |f(u)|.$$

The dual space is always a Banach space; see again [134]. For more details on Banach spaces, we also refer
to Section 7.1 of these lecture notes.

3.13 Little o and big O - the Landau symbols


Definition 3.24 (Landau symbols). (i) Let g(n) be a function with g → ∞ for n → ∞. Then f ∈ O(g) if and
only if
$$\limsup_{n\to\infty} \frac{f(n)}{g(n)} < \infty,$$
and f ∈ o(g) if and only if
$$\lim_{n\to\infty} \frac{f(n)}{g(n)} = 0.$$
(ii) Let g(h) be a function with g(h) → 0 for h → 0. As before, we define f ∈ O(g) and f ∈ o(g):
$$\limsup_{h\to 0} \frac{f(h)}{g(h)} < \infty \ \Leftrightarrow\ f \in O(g), \qquad \lim_{h\to 0} \frac{f(h)}{g(h)} = 0 \ \Leftrightarrow\ f \in o(g).$$
(iii) Specifically:
$$\limsup_{h\to 0} f(h) < \infty \ \Leftrightarrow\ f \in O(1), \qquad \lim_{h\to 0} f(h) = 0 \ \Leftrightarrow\ f \in o(1).$$
Often, the notation f = O(g) is used rather than f ∈ O(g), and similarly f = o(g) rather than f ∈ o(g).
Example 3.25. Seven examples:

1. $\frac{1}{x} = o\bigl(\frac{1}{x^2}\bigr)$ $(x \to 0)$,
2. $\frac{1}{x^2} = o\bigl(\frac{1}{x}\bigr)$ $(|x| \to \infty)$,
3. $e^{-x} = o(x^{-26})$ $(x \to \infty)$.
4. Let ε → 0 and h → 0. We write
$$h = o(\varepsilon)$$
when
$$\frac{h}{\varepsilon} \to 0 \quad \text{for } h \to 0,\ \varepsilon \to 0,$$
which means that h tends faster to 0 than ε.
5. Let us assume that we have the error estimate (see the sections on numerical analysis)
$$\|y(t_n) - y_n\|_2 = O(k).$$
Here the O notation means nothing else than
$$\frac{\|y(t_n) - y_n\|_2}{k} \to C \quad \text{for } k \to 0.$$
6. Let
$$c_1 k^1 + c_2 k^2 + c_3 k^3 + \ldots$$
be a development in powers of k. We can shortly write:
$$c_1 k^1 + O(k^2).$$
7. Complexity of algorithms, e.g., the LR decomposition:
$$O(n^3) \quad \text{for } n \to \infty.$$
In the O(·) examples, the fraction converges to a constant C (and not necessarily to 0!), which illustrates that O(·) convergence is
weaker than o(·) convergence. On the other hand, this should not yield the wrong conclusion that $\|y(t_n) - y_n\|_2$
may not tend to zero. Since k → 0, also $\|y(t_n) - y_n\|_2 \to 0$ must hold necessarily.

3.14 Taylor expansion


Let f : R → R be of class C^n around some point a. Then, the Taylor-Young formula is defined as
$$T f(x) = \sum_{j=0}^{n} \frac{f^{(j)}(a)}{j!} (x - a)^j + o(|x - a|^n).$$
The remainder $o(|x - a|^n)$ (Landau notation) is a function such that
$$\lim_{x\to a} \frac{o(|x - a|^n)}{|x - a|^n} = 0.$$
Furthermore, we have the Taylor-Lagrange formula. Let f be of class C^{n+1}:
$$T f(x) = \sum_{j=0}^{n} \frac{f^{(j)}(a)}{j!} (x - a)^j + \frac{f^{(n+1)}(y)}{(n+1)!} (x - a)^{n+1},$$
for some intermediate point y between a and x.
Finally, we recall the Taylor power series. Let f : R → R be of class C^∞. The power series
$$T f(x) = \sum_{j=0}^{\infty} \frac{f^{(j)}(a)}{j!} (x - a)^j$$
is the Taylor series of f in the point a.
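As a small illustration (choosing f(x) = exp(x) and a = 0 purely for demonstration), the following Python sketch evaluates the truncated Taylor sum and shows that the remainder decreases with the order n, consistent with the Taylor-Lagrange formula:

import math

def taylor_exp(x, n, a=0.0):
    # Truncated Taylor sum of exp around a (all derivatives of exp equal exp)
    return sum(math.exp(a) * (x - a)**j / math.factorial(j) for j in range(n + 1))

x = 0.5
for n in range(1, 8):
    error = abs(math.exp(x) - taylor_exp(x, n))
    print(f"n = {n}   remainder = {error:.2e}")
# The remainder behaves like |x-a|^(n+1)/(n+1)!, as predicted above.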

3.15 Transformation of integrals: substitution rule / change of variables
One of the most important formulas in continuum mechanics and variational formulations is the substitution
rule, which allows us to transform integrals from one domain to another.
In 1D it holds:
Proposition 3.26 (Substitution rule in 1D). Let I = [a, b] be given. To transform this interval to a new
interval, we use a mapping T with T(I) = [α, β], T(a) = α and T(b) = β. If T ∈ C^1 (a continuously differentiable
mapping) and monotonically increasing (i.e., T' > 0), we have the transformation rule:
$$\int_\alpha^\beta f(y)\, dy = \int_{T(a)}^{T(b)} f(y)\, dy = \int_a^b f(T(x))\, T'(x)\, dx.$$
Proof. Any analysis book. Here Analysis 2, Rolf Rannacher, Heidelberg University [105].
Remark 3.27. In case that T' < 0 the previous proposition still holds true, but with a negative sign:
$$\int_\alpha^\beta f(y)\, dy = \int_{T(b)}^{T(a)} f(y)\, dy = -\int_{T(a)}^{T(b)} f(y)\, dy = \int_a^b f(T(x))\, (-T'(x))\, dx.$$
For both cases with T' ≠ 0 the formula works and finally yields:
Theorem 3.28. Let I = [a, b] be given. To transform this interval to a new interval [α, β], we employ a
mapping T. If T ∈ C^1 (a continuously differentiable mapping) and T' ≠ 0, it holds:
$$\int_{T(I)} f(y)\, dy := \int_\alpha^\beta f(y)\, dy = \int_a^b f(T(x))\, |T'(x)|\, dx =: \int_I f(T(x))\, |T'(x)|\, dx.$$
Proof. Any analysis book. Here Analysis 2, Rolf Rannacher, Heidelberg University [105].
Remark 3.29. We observe the relation between the integration increments:
$$dy = |T'(x)|\, dx.$$
Example 3.30. Let T be an affine linear transformation defined as
$$T(x) = ax + b.$$
Then,
$$dy = |a|\, dx.$$
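A quick numerical check of Theorem 3.28 (purely illustrative; the integrand f(y) = y² and the affine map T(x) = 2x + 1 are assumed choices) compares both sides of the substitution rule with a simple midpoint quadrature:

import numpy as np

f = lambda y: y**2
T = lambda x: 2.0 * x + 1.0   # affine map with T'(x) = 2
a, b = 0.0, 1.0               # I = [0, 1], hence T(I) = [1, 3]

def midpoint(g, lo, hi, n=100000):
    # composite midpoint quadrature rule
    h = (hi - lo) / n
    m = lo + h * (np.arange(n) + 0.5)
    return h * g(m).sum()

lhs = midpoint(f, T(a), T(b))                  # integral of f over T(I)
rhs = midpoint(lambda x: f(T(x)) * 2.0, a, b)  # integral of f(T(x)) |T'(x)| over I
print(lhs, rhs)  # both approximate (3^3 - 1^3)/3 = 26/3 ~ 8.6667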
In higher dimensions, we have the following result of the substitution rule (also known as change of
variables under the integral):
Theorem 3.31. Let Ω ⊂ R^n be an open, measurable domain. Let the mapping T : Ω → R^n be of class C^1,
one-to-one (injective), and Lipschitz continuous. Then:
• The domain Ω̂ := T(Ω) is measurable.
• The function $f(T(\cdot))\,|\det T'(\cdot)| : \Omega \to \mathbb{R}$ is (Riemann-)integrable.
• For all measurable subdomains M ⊂ Ω the substitution rule holds:
$$\int_{T(M)} f(y)\, dy = \int_M f(T(x))\, |\det T'(x)|\, dx,$$
and in particular as well for M = Ω.


Proof. Any analysis book. See e.g., [105] or [86][Chapter 9].
Remark 3.32. In continuum mechanics, T' is the so-called deformation gradient and J := det(T') the
volume ratio.

3.16 Gauss-Green theorem / Divergence theorem
The Gauss-Green theorem, often known as the divergence theorem, is one of the most useful formulas in
continuum mechanics and numerical analysis.
Let Ω ⊂ R^n be a bounded, open domain and ∂Ω of class C^1.
Theorem 3.33 (Gauss-Green theorem / Divergence theorem). Suppose that u := u(x) ∈ C^1(Ω̄) with x =
(x_1, \ldots, x_n). Then:
$$\int_\Omega u_{x_i}\, dx = \int_{\partial\Omega} u\, n_i\, ds, \quad \text{for } i = 1, \ldots, n.$$
In compact notation, we have
$$\int_\Omega \operatorname{div} u\, dx = \int_{\partial\Omega} u \cdot n\, ds$$
for each vector field u ∈ C^1(Ω̄; R^n).

Proof. The proof is nontrivial. See for example [86].

3.17 Integration by parts and Green’s formulae


One of the most important formulae in applied mathematics, physics and continuum mechanics is integration
by parts.
From the divergence Theorem 3.33, we obtain immediately:
Proposition 3.34 (Integration by parts). Let u, v ∈ C^1(Ω̄). Then:
$$\int_\Omega u_{x_i} v\, dx = -\int_\Omega u\, v_{x_i}\, dx + \int_{\partial\Omega} u\, v\, n_i\, ds, \quad \text{for } i = 1, \ldots, n.$$
In compact notation:
$$\int_\Omega \nabla u\, v\, dx = -\int_\Omega u\, \nabla v\, dx + \int_{\partial\Omega} u\, v\, n\, ds.$$
Proof. Apply the divergence theorem to uv. Exercise.
We obtain now some further results, which are very useful, but all are based directly on integration by
parts. For this reason, it is most important to know the divergence theorem and the integration by parts formula.
Proposition 3.35 (Green’s formulas). Let u, v ∈ C^2(Ω̄). Then:
$$\int_\Omega \Delta u\, dx = \int_{\partial\Omega} \partial_n u\, ds,$$
$$\int_\Omega \nabla u \cdot \nabla v\, dx = -\int_\Omega \Delta u\, v\, dx + \int_{\partial\Omega} v\, \partial_n u\, ds.$$
Proof. Apply integration by parts.
Proposition 3.36 (Green’s formula in 1D). Let Ω = (a, b). Let u, v ∈ C^2(Ω̄). Then:
$$\int_\Omega u'(x)\, v'(x)\, dx = -\int_\Omega u''(x)\, v(x)\, dx + [u'(x)\, v(x)]_{x=a}^{x=b}.$$
Proof. Apply integration by parts.
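Proposition 3.36 is also easy to verify numerically. In the following Python sketch, the choices u = sin, v(x) = x², and Ω = (0, 1) are assumptions made only for this test:

import numpy as np

a, b, n = 0.0, 1.0, 200000
h = (b - a) / n
x = a + h * (np.arange(n) + 0.5)  # midpoint rule nodes

u, du, ddu = np.sin, np.cos, lambda s: -np.sin(s)  # u, u', u''
v, dv = lambda s: s**2, lambda s: 2 * s            # v, v'

lhs = h * (du(x) * dv(x)).sum()
rhs = -h * (ddu(x) * v(x)).sum() + (du(b) * v(b) - du(a) * v(a))
print(lhs, rhs)  # both sides agree up to the quadrature error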

3.18 Main notation
Notation of scalar products, linear forms and bilinear forms. Let Ω be a domain in Rn with the usual properties
to ensure integrability and differentiability:
1. Let x, y ∈ V ⊂ R^n. Then the scalar product is denoted by:
$$(x, y) = \int x\, y\, dz, \quad x, y \in V.$$
2. Let X, Y ∈ V ⊂ R^{n×n}. Then the scalar product for matrices is denoted by:
$$(X, Y) = \int X : Y\, dz, \quad X, Y \in V.$$
Here, : stands for the Frobenius scalar product.

3. Often, we use: Find u ∈ V such that
$$a(u, \varphi) = l(\varphi) \quad \forall \varphi \in V.$$
Here $a(u, \varphi) : V \times V \to \mathbb{R}$ is a bilinear form and $l(\varphi) \in V^*$ is a linear form (linear functional).

4. For nonlinear problems, the form is nonlinear in the solution variable u ∈ V, while it is still linear in the
test function. Here we use the notation: Find u ∈ V such that
$$a(u)(\varphi) = l(\varphi) \quad \forall \varphi \in V.$$
Here, $a(u)(\varphi)$ is a so-called semi-linear form.

5. For (linear) PDE systems, my notation is: Find U ∈ X such that

A(U, Ψ) = F (Ψ) ∀Ψ ∈ X.

6. For nonlinear PDE systems, my notation is: Find U ∈ X such that

A(U )(Ψ) = F (Ψ) ∀Ψ ∈ X.

3.19 Chapter summary and outlook


Having the notation setup, we shall now begin with the specific topics promised in the title of these lecture
notes. To this end, we are going to have an extended introduction in which mathematical modeling of some
differential equations is carried out.


4 An extended introduction
In this introduction (despite being already in Chapter 4), we first start with modeling, namely how the Laplace
equation can be derived from fundamental physical principles.

4.1 Examples of differential equations


We start with some examples:

4.1.1 Variational principles in mechanics


Variational principles were developed in physics and more precisely in classical mechanics, e.g., [52, 61].
One of the first variational problems was posed by Johann Bernoulli in the year 1696: how does a mass
point travel from a point p_1 to another point p_2 in the shortest time T under gravitational forces? In consequence,
we seek a minimal time T along a curve u(x):
$$\min J(u) = \min(T)$$
with respect to the boundary conditions u(x_1) = u_1 and u(x_2) = u_2. To obtain the solution u(x), we start
from energy conservation, which yields for the kinetic and the potential energies:
$$\frac{m v^2}{2} = m g (u_1 - u),$$
where m is the mass, v the velocity of the mass, g the gravitational acceleration, u_1 the boundary condition, and
u = u(x) the sought solution curve. We have the functional:
$$J(u) = T = \int_1^2 \frac{ds}{v} = \int_{x_1}^{x_2} \sqrt{\frac{1 + u'(x)^2}{2 g (u_1 - u(x))}}\, dx.$$
Proposition 4.1. The solution to this problem is the so-called Brachistochrone, following the problem posed
by Johann Bernoulli in 1696 [52].
More generally, we formulate the unconstrained problem:

Formulation 4.2. Find u = u(x) such that
$$\min J(u)$$
with
$$J(u) = \int_{x_1}^{x_2} F(u, u', x)\, dx.$$
Here, we assume that the function F(u, u', x) and the boundary values u(x_1) = u_1 and u(x_2) = u_2 are known.
The idea is that we vary J(u + δu) with a small increment δu as we have formulated in Section 8.1.2. Then,
we arrive at the Euler-Lagrange equations:
Definition 4.3. The (strong form of the) Euler-Lagrange equations are obtained as a stationary point of the
functional J(u) and are nothing else but the PDE in differential form:
$$\frac{d}{dx} \frac{\partial F(u, u', x)}{\partial u'} = \frac{\partial F(u, u', x)}{\partial u}.$$
Remark 4.4 (Weak form of Euler-Lagrange equations). An equivalent statement is related to the weak form
of the Euler-Lagrange equations in Banach spaces. Here, the functional J(u) is differentiated w.r.t. u in the
direction φ, yielding $J'_u(u)(\varphi)$ as used in Section 8.1.2.

Example 4.5. To compute the length of a curve u(x) between two points (x_1, u(x_1)) and (x_2, u(x_2)), the
following functional is used:
$$J(u) = \int_1^2 ds = \int_{x_1}^{x_2} \sqrt{1 + (u')^2}\, dx.$$
This brings us to the question for which function u(x) the functional J(u) attains a minimum, i.e., measuring
the shortest distance between (x_1, u(x_1)) and (x_2, u(x_2)). This functional is the starting point to derive the
clothesline problem.
Moreover, we identify $F(u, u', x) = \sqrt{1 + (u')^2}$. The Euler-Lagrange equation is then given by
$$\frac{d}{dx} \frac{u'(x)}{\sqrt{1 + (u')^2}} = 0.$$
When no external forces act (the right-hand side is zero), we can immediately derive the solution:
$$\frac{d}{dx} \frac{u'(x)}{\sqrt{1 + (u')^2}} = 0 \ \Rightarrow\ \frac{d}{dx} u'(x) = 0 \ \Rightarrow\ u'(x) = \text{const} \ \Rightarrow\ u(x) = a x + c.$$
The boundary conditions (not specified here) will determine the constants a and c. Of course, the solution,
i.e., the shortest distance between two points, is a linear function. When external forces act (e.g., gravity), we
will arrive at a solution similar to that in Section 8.1.1.
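The Euler-Lagrange equation of this example can also be derived symbolically. The following sympy sketch (an illustration; the residual d/dx ∂F/∂u' − ∂F/∂u is formed by hand via an auxiliary symbol p for u') confirms that every linear function u(x) = ax + c satisfies it:

import sympy as sp

x, a, c, p = sp.symbols("x a c p")
u = sp.Function("u")

F = sp.sqrt(1 + p**2)      # F as a function of p = u'
dF_dp = sp.diff(F, p)      # dF/du' = p/sqrt(1+p^2)

# Euler-Lagrange residual d/dx (dF/du') - dF/du (here dF/du = 0)
EL = sp.diff(dF_dp.subs(p, u(x).diff(x)), x)
print(sp.simplify(EL))     # contains u''(x), as in the equation above

# Any linear function u(x) = a*x + c satisfies EL = 0:
print(sp.simplify(EL.subs(u(x), a * x + c).doit()))  # prints 0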

4.1.2 Population growth


Let us compute the growth of a species (for example human beings) with a very simple (and finally not that
realistic) model. But this shows that reality can be represented to some extend at least by simple models, but
that continuous comparisons with other data is also necessary. Furthermore, this also shows that mathematical
modeling often starts with a simple equation and is then continuously further augmented with further terms
and coefficients. In the final end we arrive at complicated formulae.

To get started, let us assume that the population number is y = y(t) at time t. Furthermore, we assume
constant growth and mortality rates g and m, respectively. In a short time frame dt we have a relative increase
of
dy/y = (g − m) dt
of the population. Re-arranging and taking the limit dt → 0 yields:
lim_{dt→0} dy/dt = (g − m)y
and thus
y 0 = (g − m)y.
This is the so-called Malthusian law of growth. For this ordinary differential equation (ODE), we can
even explicitly compute the solution:

y(t) = c exp((g − m)(t − t0 )).

This can be achieved with separation of variables:


y′ = dy/dt = (g − m)y   (2)
⇒ ∫ dy/y = ∫_{t0}^{t} (g − m) dt   (3)
⇒ ln|y| + C = (g − m)(t − t0)   (4)
⇒ y = exp[C] · exp[(g − m)(t − t0)]   (5)

In order to work with this ODE and to compute the future development of the species we need an initial value
at some starting point t0 :
y(t0 ) = y0 .
With this value, we can further work to determine the constant exp(C):
y(t0 ) = exp(C) exp[(g − m)(t0 − t0 )] = exp(C) = y0 . (6)
Let us say in the year t0 = 2011 there were two members of this species: y(2011) = 2. Supposing a
growth rate of 25 per cent per year yields g = 0.25. Let us say m = 0 - nobody will die. In the following we
compute two estimates of the future evolution of this species: for the year t = 2014 and t = 2022. We first
obtain:
y(2014) = 2 exp(0.25 · (2014 − 2011)) = 4.117 ≈ 4.
Thus, four members of this species exist after three years. Secondly, we want to give a ‘long term’ estimate
for the year t = 2022 and calculate:
y(2022) = 2 exp(0.25 · (2022 − 2011)) = 31.285 ≈ 31.
In fact, this species has an increase of 29 members within 11 years. If you translate this to human beings,
we observe that the formula works quite well for a short time range but becomes somewhat unrealistic for
long-term estimates though.
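The estimates above are easily reproduced; the following minimal C++ sketch (our own illustration) simply evaluates the closed-form solution from above:

#include <cmath>
#include <iostream>

int main() {
  const double t0 = 2011.0, y0 = 2.0, g = 0.25, m = 0.0;
  // Closed-form solution of the Malthusian law y' = (g - m) y, y(t0) = y0.
  auto y = [&](double t) { return y0 * std::exp((g - m) * (t - t0)); };
  std::cout << "y(2014) = " << y(2014.0) << std::endl; // ~ 4.117
  std::cout << "y(2022) = " << y(2022.0) << std::endl; // ~ 31.285
}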
Remark 4.6. An improvement of the basic population model is the logistic differential equation that was
proposed by Verhulst; see e.g., [25]. In equations, this reads:
y′ = ay − by²,  y(t0) = y0,
which is obviously a nonlinear equation. For population growth, usually a ≫ b. For instance, one example
for the growth of the world population is [25] (as of the year 1993):
t0 = 1965,  y0 = 3.34 × 10⁹,  a = 0.029,  b = 2.695 × 10⁻¹².
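Since the logistic equation is nonlinear, its solution is typically approximated numerically. A minimal forward (explicit) Euler sketch with the data above reads as follows; the step size and the final year are illustrative choices of ours:

#include <iostream>

int main() {
  // Verhulst data from above: y' = a y - b y^2, y(1965) = 3.34e9.
  const double a = 0.029, b = 2.695e-12;
  double t = 1965.0, y = 3.34e9;
  const double dt = 0.01; // step size in years (chosen for illustration)
  while (t < 2000.0) {
    y += dt * (a * y - b * y * y); // one explicit Euler step
    t += dt;
  }
  std::cout << "y(2000) ~ " << y << std::endl;
}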

4.1.3 Laplace equation / Poisson problem


This equation has several applications, e.g., Fick’s law of diffusion or heat conduction (temporal diffusion) or
the deflection of a solid membrane:
Formulation 4.7 (Laplace problem / Poisson problem). Let Ω be an open set. The Laplace equation reads:
−∆u = 0 in Ω.
The Poisson problem reads:
−∆u = f in Ω.
Definition 4.8. A C² function (C² means two times continuously differentiable) that satisfies the Laplace
equation is called a harmonic function.
The physical interpretation is as follows. Let u denote the density of some quantity, for instance concentration
or temperature, in equilibrium. If G is any smooth region G ⊂ Ω, the flux F of the quantity u through the
boundary ∂G is zero:
∫_{∂G} F · n ds = 0.   (7)
Here F denotes the flux density and n the outer normal vector. Gauss’ divergence theorem yields:
Formulation 4.9 (Integral conservation equation). Extending (7) to Ω we have
∫_{∂Ω} F · n ds = 0.

Employing Gauss’ divergence theorem, it holds


∫_Ω ∇ · F dx = 0.   (8)

This integral form of a conservation equation is very often the initial representation of physical laws.

The question is now how we come from the integral form to a pointwise differential equation. We first need
to assume that the integrand is sufficiently regular. Here, F should be continuously differentiable. Second, we
request that (8) holds for arbitrary G ⊂ Ω as previously mentioned. Then, it can be inferred from results in
analysis (calculus) [85, 86] that the integrand is pointwise zero:
Formulation 4.10 (Conservation equation in differential form).

∇ · F = 0 in Ω.   (9)

Constitutive laws Now we need a second assumption (or better a relation) between the flux and the quantity
u. Such relations often come from material properties and are so-called constitutive laws. In many
situations it is reasonable to assume that the flux F is proportional to the negative gradient −∇u of the
quantity u. This means that flow goes from regions with a higher concentration to lower concentration regions.
For instance, the rate at which energy ‘flows’ (or diffuses) as heat from a warm body to a colder body is a
function of the temperature difference. The larger the temperature difference, the larger the diffusion. We
consequently obtain as further relation:
F = −∇u.
Plugging into the Equation (9) yields:

∇ · F = ∇ · (−∇u) = −∇ · (∇u) = −∆u = 0.

This is the simplest derivation one can make. Adding more knowledge on the underlying material of the body,
a material parameter a > 0 can be added:

∇ · F = ∇ · (−a∇u) = −∇ · (a∇u) = −a∆u = 0.

And adding a nonconstant and spatially dependent material further yields:

∇ · F = ∇ · (−a(x)∇u) = −∇ · (a(x)∇u) = 0.

In this last equation, we no longer obtain the classical Laplace equation but a diffusion equation in
divergence form.

Figure 3: Solution of the Poisson problem on Ω = (−1, 1)2 with the right hand side f = 1.

4.1.4 Mass conservation / First-order hyperbolic problem
In this next example let us again consider some concentration, but now a time dependent situation, which
brings us to a first-order hyperbolic equation. The application might be transport of a species/concentration
in some fluid flow (e.g. water) or nutrient transport in blood flow. Let Ω ⊂ Rn be an open domain, x ∈ Ω the
spatial variable and t the time. Let ρ(x, t) be the density of some quantity (e.g., concentration) and let v(x, t)
be its velocity. Then the vector field
F = ρv in Rn
denotes the flux of this quantity. Let G be a subset of Ω. Then we have as in the previous case (Equation (7))
the definition:
∫_{∂G} F · n ds.
But this time we do not assume the ‘equilibrium state’ but ‘flow’. That is to say that the outward flow through
the boundary ∂G must coincide with the temporal decrease of the quantity:
∫_{∂G} F · n ds = − d/dt ∫_G ρ dx.

We then apply again Gauss’ divergence theorem to the left hand side and bring the resulting term to the right
hand side. We now encounter a principal difficulty: the domain G may depend on time and consequently
integration and differentiation do not commute. Therefore, in a first step we need to transform the integrand
of
d/dt ∫_G ρ dx
onto a fixed reference configuration Ĝ in which we can insert the time derivative under the integral sign. Then
we perform the calculation and transform lastly everything back to the physical domain G. Let the mapping
between Ĝ and G be denoted by T . Then, it holds:

x ∈ G : x = T (x̂, t), x̂ ∈ Ĝ.

Moreover, dx = J(x̂, t)dx̂, where J := det(∇T ). Using the substitution rule (change of variables) in higher
dimensions (see Section 3.15) yields:
d/dt ∫_G ρ(x, t) dx = d/dt ∫_Ĝ ρ(T(x̂, t), t) J(x̂, t) dx̂.

We eliminated the time dependence of the integration domain on the right hand side and thus differentiation
and integration commute now:
d/dt ∫_Ĝ ρ(T(x̂, t), t) J(x̂, t) dx̂ = ∫_Ĝ ( d/dt ρ(T(x̂, t), t) · J(x̂, t) + ρ(T(x̂, t), t) d/dt J(x̂, t) ) dx̂
Here, d/dt ρ(T(x̂, t), t) is the material time derivative of a spatial field (see e.g., [76]), which is not the same as
the partial time derivative! In the last step, we need the Eulerian expansion formula
dJ/dt = (∇ · v) J.

Then:
∫_Ĝ ( d/dt ρ(T(x̂, t), t) · J(x̂, t) + ρ(T(x̂, t), t) d/dt J(x̂, t) ) dx̂
= ∫_Ĝ ( d/dt ρ(T(x̂, t), t) · J(x̂, t) + ρ(T(x̂, t), t) (∇ · v) J ) dx̂
= ∫_Ĝ ( d/dt ρ(T(x̂, t), t) + ρ(T(x̂, t), t) ∇ · v ) J(x̂, t) dx̂
= ∫_G ( d/dt ρ(x, t) + ρ(x, t) ∇ · v ) dx
= ∫_G ( ∂t ρ(x, t) + ∇ρ(x, t) · v + ρ(x, t) ∇ · v ) dx
= ∫_G ( ∂t ρ(x, t) + ∇ · (ρv) ) dx.

Collecting the previous calculations brings us to:


∫_G ( ∂t ρ + ∇ · F ) dx = ∫_G ( ∂t ρ + ∇ · (ρv) ) dx = 0, where F = ρv as introduced before.

This is the so-called continuity equation (or mass conservation). Since G was arbitrary, we are allowed to write
the strong form:
∂t ρ + ∇ · (ρv) = 0 in Ω. (10)
If there are sources or sinks, denoted by f , inside Ω we obtain the more general formulation:

∂t ρ + ∇ · (ρv) = f in Ω.

On the other hand, various simplifications of Equation (10) can be made when certain requirements are fulfilled.
For instance, if ρ is not spatially varying, we obtain:

∂t ρ + ρ∇ · v = 0 in Ω.

If furthermore ρ is constant in time, we obtain:

∇ · v = 0 in Ω.

This is now the mass conservation law that appears for instance in the incompressible Navier-Stokes equations,
which are discussed a little bit later below.
In terms of the density ρ, we have shown in this section:

Theorem 4.11 (Reynolds’ transport theorem). Let Φ := Φ(x, t) be a smooth scalar field and Ω a (moving)
domain. It holds:
d/dt ∫_Ω Φ dx = ∫_{∂Ω} Φ v · n ds + ∫_Ω ∂Φ/∂t dx.

The first term on the right hand side represents the rate of transport (also known as the outward normal
flux) of the quantity Φv across the boundary surface ∂Ω. This contribution originates from the moving domain
Ω. The second contribution is the local time rate of change of the spatial scalar field Φ. If the domain Ω
does not move, the first term on the right hand side will of course vanish. Using Gauss’ divergence theorem it
holds furthermore:
d/dt ∫_Ω Φ dx = ∫_Ω ∇ · (Φ v) dx + ∫_Ω ∂Φ/∂t dx.

4.1.5 Elasticity (Lamé-Navier)
This example is already difficult because a system of nonlinear equations is considered:
Formulation 4.12. Let Ω̂s ⊂ Rⁿ, n = 3, with the boundary ∂Ω̂ := Γ̂D ∪ Γ̂N. Furthermore, let I := (0, T]
where T > 0 is the end time value. The equations for geometrically non-linear elastodynamics in the reference
configuration Ω̂s are given as follows: Find vector-valued displacements ûs := (ûs^(x), ûs^(y), ûs^(z)) : Ω̂s × I → Rⁿ
such that

ρ̂s ∂t² ûs − ∇̂ · (F̂ Σ̂) = 0   in Ω̂s × I,
ûs = 0   on Γ̂D × I,
F̂ Σ̂ · n̂s = ĥs   on Γ̂N × I,
ûs(0) = û0   in Ω̂s × {0},
v̂s(0) = v̂0   in Ω̂s × {0}.

We deal with two types of boundary conditions: Dirichlet and Neumann conditions. Furthermore, two initial
conditions on the displacements and the velocity are required. The constitutive law is given by the geometrically
nonlinear tensors (see e.g., Ciarlet [34]):

Σ̂ = Σ̂s(ûs) = 2µÊ + λ tr(Ê)I,   Ê = ½ (F̂ᵀ F̂ − I).   (11)

Here, µ and λ are the Lamé coefficients for the solid. The solid density is denoted by ρ̂s and the solid
deformation gradient is F̂ = Î + ∇̂ûs, where Î ∈ R³ˣ³ is the identity matrix. Furthermore, n̂s denotes the
normal vector.

Figure 4: Deformed elastic flag.

4.1.6 The incompressible, isothermal Navier-Stokes equations (fluid mechanics)
Flow equations in general are extremely important and have an incredible amount of possible applications
such as for example water (fluids), blood flow, wind, weather forecast, aerodynamics:
Formulation 4.13. Let Ωf ⊂ Rn , n = 3. Furthermore, let the boundary be split into ∂Ωf := Γin ∪Γout ∪ΓD ∪Γi .
The isothermal, incompressible (non-linear) Navier-Stokes equations read: Find vector-valued velocities vf :
Ωf × I → Rn and a scalar-valued pressure pf : Ωf × I → R such that

ρf ∂t vf + ρf vf · ∇vf − ∇ · σf (vf , pf ) = 0 in Ωf × I,
∇ · vf = 0 in Ωf × I,
vf = vin on Γin × I,
vf = 0 on ΓD × I, (12)
−pf nf + ρf νf ∇vf · nf = 0 on Γout × I,
vf = hf on Γi × I,
vf (0) = v0 in Ωf × {t = 0},

where the (symmetric) Cauchy stress is given by

σf (vf , pf ) := −pf I + ρf νf (∇vf + ∇vfᵀ),

with the density ρf and the kinematic viscosity νf . The normal vector is denoted by nf .

Figure 5: Prototype example of a fluid mechanics problem (isothermal, incompressible Navier-Stokes equa-
tions): the famous Karman vortex street. The setting is based on the benchmark setting [118] and
the code can be found in NonStat Example 1 in [62] www.dopelib.net.

Remark 4.14. The two Formulations 4.12 and 4.13 are very important in many applications and their coupling
results in fluid-structure interaction. Here we notice that fluid flows are usually modeled in Eulerian coordinates
and solid deformations in Lagrangian coordinates. In the case of small displacements, the two coordinate
systems can be identified, i.e., Ω̂ ≃ Ω. This is the reason why in many basic books - in particular basics of
PDE theory or basics of numerical algorithms - the ‘hat’ notation (or similar notation to distinguish coordinate
systems) is not used.
Exercise 2. Recapitulate (in case you have had classes on continuum mechanics) the differences between
Lagrangian and Eulerian coordinates.

Remark 4.15. In Section 9.24, we study in more detail the stationary Navier-Stokes equations (NSE) in
terms of a well-acknowledged benchmark problem.

4.1.7 Coupled: Biot (porous media flow) and Biot coupled to elasticity
In this section the Biot model is considered, which is a standard model for porous media applications. In
fact, the Biot system [20–22] is a multi-scale problem which is identified on the micro-scale as a fluid-structure
interaction problem (details on the interface law are found in Mikelić and Wheeler [95]). Through homoge-
nization, the Biot system is derived for the macro-scale level. This system is specifically suited for applications
in subsurface modeling for the poroelastic part, the so-called pay-zone. On the other hand, surrounding rock
(the non-pay zone) is modeled with the help of linear elasticity [34]. Therefore, the final configuration belongs
to a multiphysics problem in non-overlapping domains.
Remark 4.16. The Biot system allows for modeling of fluids and solids in porous media. In contrast to ‘true’
fluid-structure interaction, the displacements are assumed to be small and therefore, all equations are modeled
in Eulerian coordinates. Please see literature on solid mechanics [76] or fluid-structure interaction [112] in
which Lagrangian or Eulerian coordinates are explained.

4.1.7.1 Modeling We begin by describing the setting for a pure poroelastic setting, the so-called pay-zone.
Let ΩB the domain of interest and ∂ΩB its boundary with the partition:
∂ΩB = Γu ∪ Γt = Γp ∪ Γf ,
where Γu denotes the displacement boundary (Dirichlet), Γt the total stress or traction boundary (Neumann),
Γp the pore pressure boundary (Dirichlet), and Γf the fluid flux boundary (Neumann). Concretely, we have
for a material with the displacement variable u and its Cauchy stress tensor σ:
u = ū on Γu ,
σn = t̄ on Γt ,
for given ū and t̄, and the normal vector n. For the pressure with the permeability tensor K and fluid’s
viscosity ηf , we have the conditions:
p = p̄ on Γp ,
−(K/ηf )(∇p − ρf g) · n = q̄ on Γf ,
for given p̄ and q̄; and the density ρf and the gravity g. For the initial conditions at time t = 0, we prescribe
p(t = 0) = p0 in ΩB ,
σ(t = 0) = σ0 in ΩB .
The extension of the previous setting with some non-pay zone ΩS (e.g., rock modeled as an elastic medium) is
displayed in Figure 6. In this case (if ΩB is totally embedded in ΩS ) the boundary conditions on ∂ΩB reduce
to interface conditions on ∂ΩB =: Γi = ∂ΩB ∩ ∂ΩS .

Figure 6: Configuration of the coupled Biot-Lamé-Navier system with the Biot equations in the pay zone ΩB
and linear elasticity in the non-pay zone ΩS .

Let I := (0, T ) denote the time interval. The underlying Biot system reads:

Formulation 4.17. Find the pressure pB and displacement uB such that
∂t (cB pB + αB div uB ) − div( (1/ηf ) K(∇pB − ρf g) ) = q in ΩB × I,
−div σB^por (uB ) = fB in ΩB × I,

with the total poroelastic stress tensor


σB^por (uB ) := σB (uB ) − αB pB I,

with the elastic stress tensor


σB (uB ) := µB e(uB ) + λB (div uB ) I,
and the coefficients cB ≥ 0 (related to the compressibility of the fluid; cB = 0 means incompressible fluid), the
Biot-Willis constant αB ∈ [0, 1], (in fact, this constant relates to the amount of coupling between the flow part
and the elastic part) and the permeability tensor K, fluid’s viscosity and its density ηf and ρf , gravity g and
a volume source term q (i.e., usually, wells for oil production and fluid injection). In the second equation, the
Lamé coefficients are denoted by λB > 0 and µB > 0 and fB is a volume force.
The velocity vB in the porous medium is obtained with the help of Darcy’s law [38] and the Darcy equations
which are obtained through homogenization of Stokes’s equations [16]. It holds:
vB = −(1/ηf ) K(∇pB − ρf g).

Usually the non-pay zone is described in terms of linear elasticity:


Formulation 4.18. Find a displacement uS such that

−div σS (uS ) = fS in ΩS × I,

with

σS (uS ) := µS (∇uS + ∇uSᵀ) + λS (div uS ) I,

with the Lamé coefficients µS and λS and a volume force fS . On the boundary ∂ΩS := ΓD ∪ ΓN , the conditions

uS = ūS on ΓD , σS (uS )nS = t̄S on ΓN ,

are prescribed with given ūS and t̄S .

4.1.7.2 Interface conditions It finally remains to describe the interface conditions on Γi between the two
sub-systems:
uB = uS ,
σB (uB )nB − σS (uS )nS = α pB nB ,   (13)
−(1/ηf ) K(∇pB − ρf g) · nS = 0.
It is important to notice that the second condition in (13) requires careful implementation on the interface.

4.1.7.3 The coupled problem in variational form The coupled system of equations is formulated in terms
of a variational monolithically-coupled system:
• Domain-coupling: solution variables enter as linear terms into the other problem. Here coupling both
equations yields a linear overall problem.
• Interface-coupling on Γi between the poroelastic model and the linearized elastic problem.

We first define the function spaces using the boundary conditions from before:

VP := {p ∈ H¹(ΩB ) | p = 0 on Γr },
VS := {u ∈ H¹(ΩB ) | uy = 0 on Γb , ux = 0 on Γl }.

The definition of a weak formulation in a variational-monolithic setting reads:


Formulation 4.19 (Biot Problem in ΩB ). Find (p, u) ∈ VP × VS , with p(0) = p0 , such that for almost all
times t ∈ I, it holds
cB (∂t p, φp ) + αB (∇ · u, φp ) + (K/νF )(∇p, ∇φp ) − ρF (g, φp ) − (q, φp ) = 0 ∀φp ∈ VP ,
(σB , ∇φu ) − αB (pI, ∇φu ) − ∫_{Γtop} (t̄S − αpn)φu ds − (fB , φu ) = 0 ∀φu ∈ VS .

By augmenting the domain with an outer elastic part we must account for an additional elastic equation.
To this end, we redefine

VS := {u ∈ H¹(ΩB ∪ ΩS ) | uy = 0 on Γb , ux = 0 on Γl,B ∪ Γl,S },

in which we now implicitly have built-in the kinematic condition uB = uS on Γi . The dynamic coupling
condition from (13) is implicitly contained in the coupled PDE system.
Formulation 4.20 (Biot-Lamé-Navier Problem in ΩB ∪ ΩS ). Find (p, u) ∈ VP × VS , with the initial data
p(0) = p0 , such that for almost all times t ∈ I, it holds

cB (∂t p, φp ) + αB (∇ · u, φp ) + (K/νF )(∇p, ∇φp ) − ρF (g, φp ) − (q, φp ) = 0 ∀φp ∈ VP ,
(σB , ∇φu ) − αB (pI, ∇φu ) − (fB , φu ) = 0 ∀φu ∈ VS (ΩB ),
(σS , ∇φu ) − ∫_{Γtop} t̄S φu ds − (fS , φu ) = 0 ∀φu ∈ VS (ΩS ).

4.1.7.4 Mandel’s problem We now present the results of two test cases. Specifically, we consider the well-
known Mandel problem, which is an acknowledged benchmark in subsurface modeling. In the second example,
the augmented Mandel problem [56] is used as verification. The configuration and parameters are taken from
[55, 66].
Remark 4.21. The situation has very practical applications: consider a sponge filled with water that is
squeezed: first the fluid pressure inside increases, but with time, the water will flow out and the pressure
will decrease to zero.

4.1.7.4.1 Configuration The domain is [0, 100m] × [0, 20m]. The initial mesh is 4-times globally refined
corresponding to 256 cells and 2467 degrees of freedom.

4.1.7.4.2 Boundary conditions The boundary conditions are given as:

σB n − αB pB n = t̄ − αB pB n on Γtop ,
p = 0 on Γr ,
uy = 0 on Γb ,
ux = 0 on Γl .


Figure 7: Configuration of Mandel’s problem.

4.1.7.4.3 Parameters As parameters, we choose: MB = 2.5 · 10¹² Pa, cB = 1/MB Pa⁻¹, αB = 1.0, νF =
1.0 · 10⁻³ m²/s, KB = 100 md = 10⁻¹³ m², ρF = 1.0 kg/m³, t̄ = F = 1.0 · 10⁷ Pa·m. As elasticity parameters
in Biot's model, we use µS = 1.0 · 10⁸ Pa, νS = 0.2. The time step size is chosen as k = 1000 s. The final time is
5 · 10⁶ s (corresponding to computing 5000 time steps).

4.1.7.4.4 Discussion of our findings For different time steps t1 = 1 · 10³, t2 = 5 · 10³, t3 = 1 · 10⁴, t4 =
1 · 10⁵, t5 = 5 · 10⁵ and t10 = 5 · 10⁶ [s] (the numeration of time steps is taken from [55]) the solution of
the pressure and the x-displacement evaluated on the x-axis is shown in Figure 8. These values coincide
with the literature values displayed in [55, 66, 91, 92]. In particular, the well-known Mandel-Cryer effect
(non-monotonic pressure behavior over time) is identified.


Figure 8: Mandel’s problem: Pressure trace and x-displacement on the x-axis.

4.1.7.5 Augmented Mandel’s problem This second example adds an elastic zone above the porous media.
Such a modification is very useful for practical problems. Specifically, we deal now with a problem in non-
overlapping domains. The configuration is inspired by Girault et al. [56]. This is considered as a new
benchmark configuration that is prototypical for reservoir simulations with some overburden. In the same
manner, an underburden could have been added in addition. This leads, however, to no additional difficulties.
This second example can be computed by further modifying the programming code or can be immediately run
with another deal.II-based package DOpElib [62].

4.1.7.5.1 Configuration The computational domain is enlarged in height to [0, 100m] × [0, 40m]
and is sketched in Figure 9.

4.1.7.5.2 Boundary conditions As boundary conditions, we choose:


σB n = t̄ on Γtop ,
σB n = 0 on Γr,S ,
p = 0 on Γr,B ,
uy = 0 on Γb ,
ux = 0 on Γl,B ∪ Γl,S .


Figure 9: Configuration of augmented Mandel’s problem.

4.1.7.5.3 Parameters We use the same Biot-region parameters as in the previous example. In addition,
we use in the pure elastic zone the same Lamé coefficients, such that µB = µS and λB = λS .


Figure 10: Augmented Mandel’s problem: Pressure trace on the x-axis.

4.1.7.5.4 Discussion of our findings The augmented Mandel problem also shows the well-known Mandel-
Cryer effect, as illustrated in Figure 10. Here, it is important to carefully implement the interface conditions
on Γi . Otherwise the results are not correct from a physical point of view. Graphical solutions of the pressure
distribution surfaces are shown in Figure 11. Here, we detect the typical pressure behavior on the x-axis.

Figure 11: Augmented Mandel’s problem: Pressure surface for two time steps t1 and t4 .

4.2 Intermediate brief summary: several PDEs in one view
4.2.1 Poisson
Find u : Ω → R:

−∆u = f (14)

Properties: linear, stationary, scalar-valued.

4.2.2 Heat equation


Find u : Ω × I → R:

∂t u − ∆u = f (15)

Properties: linear, time-dependent, scalar-valued.

4.2.3 Wave equation


Find u : Ω × I → R:

∂t2 u − ∆u = f (16)

Properties: linear, time-dependent, scalar-valued.

4.2.4 Biharmonic
Find u : Ω → R:

∆2 u = f (17)

Properties: 4th order, linear, stationary, scalar-valued.

4.2.5 Eigenvalue
Find u : Ω → R:

−∆u = λu (18)

Properties: linear, stationary, scalar-valued.

4.2.6 Linear transport


Find u : Ω → R:

∂t u + b∇u = 0 (19)

Properties: linear, nonstationary, scalar-valued.

4.2.7 Linearized elasticity


Find u : Ω → Rn :

−∇ · σ(u) = f (20)

with σ = 2µe(u) + λ tr(e(u))I and e(u) = ½ (∇u + ∇uᵀ). Properties: linear, stationary, vector-valued.

4.3 Three important linear PDEs
From the previous considerations, we can extract three important types of linear PDEs. But we refrain
from giving the impression that all differential equations can be classified and then a recipe for solving them
applies. This is definitely not true - in particular for nonlinear PDEs. However, often ‘new’ equations can be
related or simplified to these three types. For this reason, it is important to know them.
They read:
• Poisson problem: −∆u = f is elliptic: second order in space and no time dependence.

• Heat equation: ∂t u − ∆u = f is parabolic: second order in space and first order in time.
• Wave equation: ∂t2 u − ∆u = f is hyperbolic: second order in space and second order in time.
All these three equations have in common that their spatial order is two (the highest derivative with respect
to spatial variables that occurs in the equation). With regard to time, there are differences: the heat equation
is of first order whereas the wave equation is second order in time.
Let us now consider these three types in a bit more detail.

4.3.1 Elliptic equations and Poisson problem/Laplace equation


The Poisson equation plus boundary conditions is a boundary-value problem and will be derived from the
general definition (very similar to the form before). Elliptic problems are second-order in space and have no
time dependence.

Formulation 4.22. Let f : Ω → R be given. Furthermore, Ω is an open, bounded set of Rn . We seek the
unknown function u : Ω̄ → R such that

Lu = f in Ω, (21)
u=0 on ∂Ω. (22)

Here, the linear second-order differential operator is defined by (divergence form):


Lu := − ∑_{i,j=1}^{n} ∂_{xj}(aij (x) ∂_{xi} u) + ∑_{i=1}^{n} bi (x) ∂_{xi} u + c(x)u,   u = u(x),   (23)

with the symmetry assumption aij = aji and given coefficient functions aij , bi , c. Alternatively we often use
the compact notation with derivatives defined in terms of the nabla-operator:

Lu := −∇ · (a∇u) + b∇u + cu.

In non-divergence form the above differential operator reads:


Lu := − ∑_{i,j=1}^{n} aij (x) ∂_{xj} ∂_{xi} u + ∑_{i=1}^{n} bi (x) ∂_{xi} u + c(x)u,   u = u(x),   (24)

If the coefficients aij are C¹ functions, then each formulation can be rewritten into the other.
Remark 4.23. Both versions have some advantages: the divergence form is more natural for energy methods
and formulating weak formulations (basis of Galerkin finite elements), while the non-divergence form will be
just used below for some theoretical aspects.
Finally we notice that the boundary condition (22) is called homogeneous Dirichlet condition.
Remark 4.24. Other possible boundary conditions are introduced in Section 4.5.
Remark 4.25. If the coefficient functions a, b, c also depend on the solution u we obtain a nonlinear PDE.

57
4. AN EXTENDED INTRODUCTION
Consider now the main part (from the non-divergence form):
L̃u := ∑_{i,j=1}^{n} aij (x) ∂_{ij} u

and we assume symmetry aij = aji . The operator L̃ (and also L of course) generates a quadratic form Σ,
which is defined as
Σ(ξ) := ∑_{i,j=1}^{n} aij (x) ξi ξj .

The properties of the form Σ (and consequently the classification of the underlying PDE) depend on the
eigenvalues of the matrix A = (aij )_{i,j=1}^{n}.

Definition 4.26 (Elliptic: sign of all eigenvalues is the same). At a given point x ∈ Ω, the differential
operator L is said to be elliptic if all eigenvalues (for the definition see Section 3.8.1) of A are non-zero and
have the same sign. This is the same as saying A is symmetric, positive definite.

Equivalently one can say:


Definition 4.27. A PDE operator L is (uniformly) elliptic if there exists a constant θ > 0 such that
∑_{i,j=1}^{n} aij (x) ξi ξj ≥ θ|ξ|²,

for a.e. x ∈ Ω and all ξ ∈ Rn .

Formulation 4.28 (Poisson problem). Setting in Formulation 4.22, aij = δij and bi = 0 and c = 0, we obtain
the Laplace operator. Let f : Ω → R be given. Furthermore, Ω is an open, bounded set of Rn . We seek the
unknown function u : Ω̄ → R such that

Lu = f in Ω, (25)
u=0 on ∂Ω. (26)

Here, the linear second-order differential operator is defined by:

Lu := −∇ · (∇u) = −∆u.

Some important physical laws are related to the Laplace operator (partially taken from [51]):
1. Fick’s law of chemical diffusion

2. Fourier’s law of heat conduction


3. Ohm’s law of electrical conduction
4. Small deformations in elasticity (e.g., membrane, clothesline).

In the following we discuss some characteristics of the solution of the Laplace problem, which can be generalized
to general elliptic problems. Solutions to Poisson’s equation attain their maximum on the boundary and not
in the interior of the domain. This property will also carry over to the discrete problem and can be used to
check the correctness of the corresponding numerical solution.
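The following minimal sketch (our own illustration; grid size, boundary data, and the simple Jacobi solver are ad-hoc choices) demonstrates this numerical check for the discrete Laplace equation on the unit square: the discrete maximum is attained on the boundary.

#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
  // Discrete Laplace equation (5-point stencil) on the unit square,
  // f = 0, Dirichlet data g >= 0, solved with Jacobi iterations.
  const int N = 33;
  const double h = 1.0 / (N - 1);
  const double pi = 3.14159265358979323846;
  std::vector<double> u(N * N, 0.0);
  auto idx = [N](int i, int j) { return j * N + i; };
  for (int i = 0; i < N; ++i) u[idx(i, 0)] = std::sin(pi * i * h); // g on bottom edge
  std::vector<double> v = u;
  for (int it = 0; it < 5000; ++it) {
    for (int j = 1; j < N - 1; ++j)
      for (int i = 1; i < N - 1; ++i)
        v[idx(i, j)] = 0.25 * (u[idx(i - 1, j)] + u[idx(i + 1, j)] +
                               u[idx(i, j - 1)] + u[idx(i, j + 1)]);
    u.swap(v); // boundary entries coincide in u and v, so they are preserved
  }
  double max_int = 0.0;
  for (int j = 1; j < N - 1; ++j)
    for (int i = 1; i < N - 1; ++i) max_int = std::max(max_int, u[idx(i, j)]);
  std::cout << "max on boundary = 1, max in interior = " << max_int << std::endl;
  // The interior maximum stays strictly below 1, and u > 0 inside (cf. (27) below).
}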
Definition 4.29 (Harmonic function). A C² function satisfying ∆u = 0 is called a harmonic function.

Theorem 4.30 (Strong maximum principle for the Laplace problem). Suppose u ∈ C²(Ω) ∩ C(Ω̄) is a harmonic
function. Then
max_{Ω̄} u = max_{∂Ω} u.

Moreover, if Ω is connected and there exists a point y ∈ Ω in which

u(y) = max_{Ω̄} u,

then u is constant within Ω. The same holds for −u, but then for minima.
Proof. See Evans [51].
Remark 4.31 (Maximum principle in practice). An illustration of the maximum principle in practice for a
1D setting is provided in Figure 13, where the maximum values are clearly obtained on the two boundary points
of the domain.
Corollary 4.32. For finding u ∈ C²(Ω) ∩ C(Ω̄), such that

∆u = 0 in Ω,
u=g on ∂Ω,

with g ≥ 0, it holds that

u > 0 everywhere in Ω if g > 0 on some part of ∂Ω.   (27)

This property also shows that the Laplace equation has an infinite propagation speed of perturbations: if we
change the solution by ε in one point, the solution in the entire domain will (slightly) change.
From the maximum principle we obtain immediately uniqueness of a solution:
Theorem 4.33 (Uniqueness of the Laplace problem). Let g ∈ C(∂Ω) and f ∈ C(Ω). Then there exists at
most one solution u ∈ C²(Ω) ∩ C(Ω̄) of the boundary-value problem:

−∆u = f in Ω, (28)
u=g on ∂Ω. (29)

Proof. If u1 and u2 both satisfy the equation, we apply the maximum principle to the harmonic functions

w = u1 − u2 , w = −(u1 − u2 ).

This yields u1 = u2 .

4.3.2 Parabolic equations and heat equation


We now study parabolic problems, which contain as the most famous example the heat equation. Parabolic
problems are second order in space and first order in time.
In this section, we assume Ω ⊂ Rn to be an open and bounded set. The time interval is given by I := (0, T ]
for some fixed end time value T > 0.
We consider the initial/boundary-value problem (IBVP):
Formulation 4.34. Let f : Ω × I → R and u0 : Ω → R be given. We seek the unknown function u : Ω̄ × I → R
such that2

∂t u + Lu = f in Ω × I, (30)
u=0 on ∂Ω × [0, T ], (31)
u = u0 on Ω × {t = 0}. (32)
2 For the heat equation, we should have better used the notation T, the temperature, for the variable rather than u. But to stay
consistent with the entire notation, we still use u.

Here, the linear second-order differential operator is defined by:
Lu := − ∑_{i,j=1}^{n} ∂_{xj}(aij (x, t) ∂_{xi} u) + ∑_{i=1}^{n} bi (x, t) ∂_{xi} u + c(x, t)u,   u = u(x, t)

for given (possibly spatial and time-dependent) coefficient functions aij , bi , c.


We have the following:
Definition 4.35. The PDE operator ∂t + L is (uniformly) parabolic if there exists a constant θ > 0 such that
∑_{i,j=1}^{n} aij (x, t) ξi ξj ≥ θ|ξ|²,

for all (x, t) ∈ Ω × I and all ξ ∈ Rn . In particular this operator is elliptic in the spatial variable x for each
fixed time 0 ≤ t ≤ T .
Definition 4.36 (Parabolic: one eigenvalue zero; all others have the same sign). For parabolic PDEs one
eigenvalue of the space-time coefficient matrix is zero whereas the others have the same sign.
Formulation 4.37 (Heat equation). Setting in Formulation 4.34, aij = δij and bi = 0 and c = 0, we obtain
the Laplace operator. Let f : Ω × I → R be given. Furthermore, Ω is an open, bounded set of Rⁿ. We seek the
unknown function u : Ω̄ × I → R such that

∂t u + Lu = f in Ω × I, (33)
u=0 on ∂Ω × [0, T ], (34)
u = u0 on Ω × {t = 0}. (35)

Here, the linear second-order differential operator is defined by:

Lu := −∇ · (∇u) = −∆u.

The heat equation has an infinite propagation speed for disturbances. If the initial temperature is non-
negative and is positive somewhere, the temperature at any later time is everywhere positive. For the heat
equation, a very similar maximum principle as for elliptic problems holds true. This principle is extended to
the parabolic boundary. We deal with a diffusion problem, and the flow points in the direction of the negative
gradient. Define Q := (0, T] × Ω as the space-time cylinder. The parabolic boundary is given by Γ := Q̄ − Q,
namely:
Γ = ([0, T] × ∂Ω) ∪ ({0} × Ω).

This definition contains the bottom and the vertical sides of the space-time cylinder Q, but not the top. For f = 0 the
maximum temperature u is obtained on the parabolic boundary. In other words, the maximum is taken at the
initial time step or on the boundary of the spatial domain.
Theorem 4.38 (Strong parabolic maximum principle for the heat equation). Suppose u ∈ C₁²(Ω × I) ∩ C(Ω̄ × Ī)
is a solution of the heat equation:
∂t u = ∆u in Ω × I.
Then
max_{Ω̄×Ī} u = max_Γ u.

In other words: the maximum u (or the minimum) is obtained either at t = 0 or on [0, T ] × ∂Ω. A numerical
test demonstrating the maximum principle is provided in Section 12.3.12.

Remark 4.39. As for elliptic problems, the maximum principle can be used to prove uniqueness for the
parabolic case.

4.3.3 Hyperbolic equations and the wave equation
In this section, we shall study hyperbolic problems, which are natural generalizations of the wave equation. As
for parabolic problems, let Ω ⊂ Rⁿ be an open and bounded set. The time interval is given by I := (0, T]
for some fixed end time value T > 0.
We consider the initial/boundary-value problem:
Formulation 4.40. Let f : Ω × I → R and u0 , v0 : Ω → R be given. We seek the unknown function
u : Ω̄ × I → R such that
∂t2 u + Lu = f in Ω × I, (36)
u=0 on ∂Ω × [0, T ], (37)
u = u0 on Ω × {t = 0}, (38)
∂t u = v0 on Ω × {t = 0}. (39)
In the last line, ∂t u = v can be identified as the velocity. Furthermore, the linear second-order differential
operator is defined by:
Lu := − ∑_{i,j=1}^{n} ∂_{xj}(aij (x, t) ∂_{xi} u) + ∑_{i=1}^{n} bi (x, t) ∂_{xi} u + c(x, t)u,   u = u(x, t)

for given (possibly spatial and time-dependent) coefficient functions aij , bi , c.


Remark 4.41. The wave equation is often written in terms of a first-order system in which the velocity is
introduced and a second-order time derivative is avoided. Then the previous equation reads: Find u : Ω̄×I → R
and v : Ω̄ × I → R such that
∂t v + Lu = f in Ω × I, (40)
∂t u = v in Ω × I, (41)
u=0 on ∂Ω × [0, T ], (42)
u = u0 on Ω × {t = 0}, (43)
v = v0 on Ω × {t = 0}. (44)
We have the following:
Definition 4.42. The PDE operator ∂t2 + L is (uniformly) hyperbolic if there exists a constant θ > 0 such
that
∑_{i,j=1}^{n} aij (x, t) ξi ξj ≥ θ|ξ|²,
for all (x, t) ∈ Ω × I and all ξ ∈ Rⁿ. In particular this operator is elliptic in the spatial variable x for each
fixed time 0 ≤ t ≤ T .
Definition 4.43 (Hyperbolic: one eigenvalue has a different sign than all others). A space-time differential
operator is called hyperbolic when one eigenvalue has a different sign than all the others.
And as before, setting the coefficient functions to trivial values, we obtain the original wave equation. In
contrast to parabolic problems, a strong maximum principle does not hold for hyperbolic equations. Moreover,
the propagation speed is finite. This means that a disturbance at some point x ∈ Ω at some
time t ∈ I is not immediately transported to every other point of Ω.
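This finite propagation speed can be observed numerically. The following minimal C++ sketch (our own illustration; mesh width, time step size, and initial data are ad-hoc choices) integrates the 1D wave equation ∂t²u − ∂xx u = 0 with the classical leapfrog scheme and monitors a point far away from the initial disturbance:

#include <cmath>
#include <iostream>
#include <vector>

int main() {
  // Leapfrog scheme for u_tt = u_xx on (0,1) with u = 0 on the boundary.
  const int N = 201;
  const double h = 1.0 / (N - 1), dt = 0.5 * h; // CFL condition: dt <= h
  const double pi = 3.14159265358979323846;
  std::vector<double> um(N, 0.0), u(N, 0.0), up(N, 0.0);
  for (int i = 0; i < N; ++i) { // compact bump around x = 0.2, zero initial velocity
    double x = i * h, c = std::cos(10.0 * pi * (x - 0.2));
    u[i] = (std::fabs(x - 0.2) < 0.05) ? c * c : 0.0;
    um[i] = u[i]; // v0 = 0: approximate u^{-1} by u^0 in the first step
  }
  const int probe = static_cast<int>(0.8 / h); // observation point x = 0.8
  for (int n = 1; n * dt <= 0.7; ++n) {
    for (int i = 1; i < N - 1; ++i)
      up[i] = 2.0 * u[i] - um[i] +
              (dt * dt) / (h * h) * (u[i + 1] - 2.0 * u[i] + u[i - 1]);
    um = u;
    u = up;
    if (n % 80 == 0)
      std::cout << "t = " << n * dt << "  u(0.8) = " << u[probe] << std::endl;
  }
  // u(0.8) stays ~0 until t ~ 0.55 (distance 0.6 minus bump radius 0.05 at
  // unit wave speed), illustrating the finite propagation speed.
}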

4.4 Nonconstant coefficients. Change of the PDE types elliptic, parabolic, hyperbolic
Even though we define separate types of PDEs, in many processes there is a mixture of these classes in one
single PDE - depending on the size of certain parameters. For instance, the Navier-Stokes equations (only
showing here the conservation of momentum):
∂t v + v · ∇v − (1/Re) ∆v + ∇p = f

for modeling fluid flow, vary between parabolic and hyperbolic type depending on the Reynolds number
Re ∼ νf⁻¹ (together with a characteristic velocity and a characteristic length). For Re → ∞, the viscous term
becomes negligible and we obtain a first-order hyperbolic equation:
∂t v + v · ∇v + ∇p = f,
while for small Re, the viscous term dominates and we obtain a parabolic type:
∂t v − (1/Re) ∆v + ∇p = f.
Relating this to the general definition of the differential operator L, we can say that the coefficients aij of the
differential operator L are non-constant and change with respect to space and time.

4.5 Boundary conditions and initial data


As seen in the previous sections, all PDEs are complemented with boundary conditions and in time-dependent
cases, also with initial conditions. Actually, these are crucial ingredients for solving differential equations.
Often, one has the (wrong) impression that only the PDE itself is of importance. But what happens on the
boundaries finally yields the ‘result’ of the computation. And this holds true for analytical (classical) as well
as computational solutions.

4.5.1 Boundary conditions


In Section 4.3, for simplicity, we have mainly dealt with homogeneous Dirichlet conditions of the form u = 0
on the boundary ∂Ω = ∂ΩD ∪ ∂ΩN ∪ ∂ΩR . In general three types of conditions are of importance:

• Dirichlet (or essential) boundary conditions: u = gD on ∂ΩD ; when gD = 0 we say ‘homogeneous’


boundary condition.
• Neumann (or natural) boundary conditions: ∂n u = gN on ∂ΩN ; when gN = 0 we say ‘homogeneous’
boundary condition.

• Robin (third type) boundary condition: au + b∂n u = gR on ∂ΩR ; when gR = 0 we say ‘homogeneous’
boundary condition.
On each boundary section, only one of the three conditions can be prescribed. In particular the natural
conditions generalize when dealing with more general operators L. Here in this section, we have assumed
L := −∆.

Remark 4.44. Going from differential equations to variational formulations, Dirichlet conditions are ex-
plicitly built into the function space; for this reason they are called essential boundary conditions. Neu-
mann conditions appear explicitly through integration by parts; for this reason they are called natural bound-
ary conditions.
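To illustrate this remark with the model operator L := −∆ (a standard computation, added here for completeness): multiplying −∆u = f by a test function φ with φ = 0 on ∂ΩD and integrating by parts gives

∫_Ω ∇u · ∇φ dx − ∫_{∂ΩN} ∂n u φ ds = ∫_Ω f φ dx.

Inserting ∂n u = gN turns the boundary term into the given datum ∫_{∂ΩN} gN φ ds, i.e., the Neumann condition enters the weak form 'naturally', whereas the Dirichlet condition u = gD must be imposed on the function space itself.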

4.5.2 Initial conditions


The number of initial conditions depends on the order of the time derivative. The usual notation is:

u = u0 in Ω × {t = 0}.

As examples, we have:
• ∂t u − ...: first order in time: one initial condition.

• ∂t2 u − ...: second order in time: two initial conditions.

4.5.3 Example: temperature in a room
Example 4.45. Let us compute the heat distribution in our room b302. The room volume is Ω. The window
wall is a Dirichlet boundary ∂D Ω and the remaining walls are Neumann boundaries ∂N Ω. Let K be the
thermal conductivity of the air.
We consider the heat equation: Find T : Ω × I → R such that
∂t T + (v · ∇)T − ∇ · (K∇T ) = f in Ω × I,
T = 18◦ C on ∂D Ω × I,
K∇T · n = 0 on ∂N Ω × I,

T (0) = 15◦ C in Ω × {0}.
The homogeneous Neumann condition means that there is no heat exchange on the respective walls (thus
neighboring rooms will have the same room temperature on the respective walls). The nonhomogeneous Dirichlet
condition states that there is a given temperature of 18◦ C, which is constant in time and space (but this
condition may be also non-constant in time and space). Possible heaters in the room can be modeled via the
right hand side f . The vector v : Ω → R3 denotes a given flow field yielding a convection of the heat, for
instance wind. We can assume v ≈ 0. Then the above equation is reduced to the original heat equation:
∂t T − ∇ · (K∇T ) = f .
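A one-dimensional analogue of this room example can be simulated in a few lines (a minimal sketch; the domain (0, 1), K = 1, and the time horizon are illustrative choices of ours). We discretize ∂t T − K ∂xx T = 0 with finite differences and explicit Euler time stepping, a Dirichlet value of 18 at the 'window wall' and a homogeneous Neumann condition at the opposite wall:

#include <iostream>
#include <vector>

int main() {
  const int N = 101;
  const double h = 1.0 / (N - 1), K = 1.0;
  const double dt = 0.4 * h * h / K; // explicit Euler stability: dt <= h^2/(2K)
  std::vector<double> T(N, 15.0);    // initial temperature
  T[0] = 18.0;                       // Dirichlet value at the window wall
  for (int n = 0; n * dt < 1.0; ++n) {
    std::vector<double> Told = T;
    for (int i = 1; i < N - 1; ++i)
      T[i] = Told[i] + dt * K * (Told[i + 1] - 2.0 * Told[i] + Told[i - 1]) / (h * h);
    T[N - 1] = T[N - 2]; // homogeneous Neumann condition: zero heat flux
  }
  std::cout << "T(0.5) = " << T[N / 2] << std::endl; // tends to 18 as t grows
}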

4.6 The general definition of a differential equation


After the previous examples, let us make the definition of a differential equation precise. In order to allow for the largest
possible generality we first need to introduce some notation. It is common to use the multiindex notation as
introduced in Section 3.6.

4.6.1 Single-valued differential equations


Definition 4.46 (Evans [51], single-valued PDE). Let Ω ⊂ Rn be open. Here, n denotes the total dimension
including time. Furthermore, let k ≥ 1 be an integer that denotes the order of the differential equation. Then, a
differential equation can be expressed as: Find u : Ω → R such that
F (D^k u, D^{k−1} u, . . . , D² u, Du, u, x) = 0,   x ∈ Ω,
where
F : R^{n^k} × R^{n^{k−1}} × . . . × R^{n²} × Rⁿ × R × Ω → R.
Example 4.47. We provide some examples. Let us assume the spatial dimension to be 2 and the temporal
dimension is 1. That is for a time-dependent ODE (ordinary differential equation) problem n = 1 and for
time-dependent PDE cases, n = 2 + 1 = 3, and for stationary PDE examples n = 2.
1. ODE model problem: F (Du, u) := u′ − au = 0 where F : R¹ × R → R. Here k = 1.
2. Laplace operator: F (D²u) := −∆u = 0 where F : R⁴ → R. That is k = 2 and lower derivatives of order
1 and 0 do not exist. We notice that in the general form it holds
D²u = ( ∂xx u  ∂yx u
        ∂xy u  ∂yy u )
and specifically now for the Laplacian (consider the sign as well!):
D²u = ( −∂xx u  0
         0  −∂yy u )

3. Heat equation: F (D²u, Du) = ∂t u − ∆u = 0 where F : R⁹ × R³ → R. Here k = 2 is the same as before,
but a lower-order derivative of order 1 in form of the time derivative does exist. We provide again details
on D²u:
D²u = ( ∂tt u  ∂xt u  ∂yt u
        ∂tx u  ∂xx u  ∂yx u
        ∂ty u  ∂xy u  ∂yy u ),    Du = ( ∂t u, ∂x u, ∂y u )ᵀ

and for the heat equation:
D²u = ( 0  0  0
        0  −∂xx u  0
        0  0  −∂yy u ),    Du = ( ∂t u, 0, 0 )ᵀ

4. Wave equation: F (D²u) = ∂t² u − ∆u = 0 where F : R⁹ → R. Here it holds specifically
D²u = ( ∂tt u  0  0
        0  −∂xx u  0
        0  0  −∂yy u ).

We finally compare the coefficient matrices and recall the type classifications. Let us denote the coefficient
matrix by A. Then:
D²u = ( −∂xx u  0        →  A = ( −1  0        →  Same sign: elliptic
         0  −∂yy u )              0  −1 )

D²u = ( 0  0  0          →  A = ( 0  0  0      →  One eigenvalue zero; others same sign: parabolic
        0  −∂xx u  0              0  −1  0
        0  0  −∂yy u )            0  0  −1 )

D²u = ( ∂tt u  0  0      →  A = ( 1  0  0      →  One eigenvalue has a different sign: hyperbolic
        0  −∂xx u  0              0  −1  0
        0  0  −∂yy u )            0  0  −1 )

4.6.2 Vector-valued differential equations, PDE systems


Definition 4.48 (Evans [51], PDE system). Let Ω ⊂ Rn be open. Here, n denotes the total dimension
including time. Furthermore, let k ≥ 1 be an integer that denotes the order of the differential equation. Then, a
vector-valued differential equation can be expressed as: Find u = (u1 , . . . , um ) : Ω → Rm such that

F (D^k u, D^{k−1} u, . . . , D² u, Du, u, x) = 0,   x ∈ Ω,

where
F : R^{mn^k} × R^{mn^{k−1}} × . . . × R^{mn²} × R^{mn} × R^m × Ω → R^m.

Example 4.49. We discuss two vector-valued PDEs:


1. Vector-Laplace:
−∆u = −∆ (u1 , . . . , um )ᵀ = (f1 , . . . , fm )ᵀ.
Say in n = 2 and m = 3 we have
−∆u = −∆ (u1 (x, y), u2 (x, y), u3 (x, y))ᵀ = (f1 , f2 , f3 )ᵀ.
Here,
F (D²u) := −∆u − f,
with
F : R^{3·2²} × Ω → R³.

2. Linearized elasticity (see also Chapter 11): Find u = (u1 , u2 , u3 ) : Ω → R3 such that

−∇ · σ(u) = f,

where f = (f1 , f2 , f3 ) and the (very simple!) stress tensor


σ(u) = ½ (∇u + ∇uᵀ).
In comparison to the previous vector Laplace, we have now cross terms in the off-diagonal blocks:
σ(u) = ½ ( ( ∂x ux  ∂y ux  ∂z ux        ( ∂x ux  ∂x uy  ∂x uz
             ∂x uy  ∂y uy  ∂z uy    +     ∂y ux  ∂y uy  ∂y uz
             ∂x uz  ∂y uz  ∂z uz )        ∂z ux  ∂z uy  ∂z uz ) )

Here,
F (D²u) := −∇ · σ(u) − f,
with (as for vector-Laplace)
F : R^{3·2²} × Ω → R³.

4.7 Classifications into linear and nonlinear PDEs


Based on the previous notation, we now provide classifications of linear and nonlinear differential equations.
Simply speaking: every differential equation that is not linear is called nonlinear. However, in the nonlinear
case a further refined classification can be undertaken.

Definition 4.50 (Evans [51]). Differential equations are divided into linear and nonlinear classes as follows:
1. A differential equation is called linear if it is of the form:
∑_{|α|≤k} aα (x) D^α u − f (x) = 0.

2. A differential equation is called semi-linear if it is of the form:


∑_{|α|=k} aα (x) D^α u + a0 (D^{k−1} u, . . . , D² u, Du, u, x) = 0.

Here, nonlinearities may appear in all terms of order |α| < k, but the highest order |α| = k is fully linear.
3. A differential equation is called quasi-linear if it is of the form:
∑_{|α|=k} aα (D^{k−1} u, . . . , D² u, Du, u, x) D^α u + a0 (D^{k−1} u, . . . , D² u, Du, u, x) = 0.

Here, full nonlinearities may appear in all terms of order |α| < k; in the highest order |α| = k, the coefficients
aα may depend nonlinearly on u and its derivatives up to order k − 1, but not on D^k u itself.
4. If none of the previous cases applies, a differential equation is called (fully) nonlinear.

Remark 4.51. In theory as well as in practice it holds: the more nonlinear an equation is, the more difficult
the analysis becomes and the more challenging is the design of good numerical methods.

4.7.1 Examples
How can we now check in practice whether a PDE is linear or nonlinear? The simplest way is to go term by
term and check the linearity condition from Definition 3.15.
Example 4.52. We provide again some examples:
1. All differential equations from Example 4.47 are linear. Check term by term Definition 3.15!
2. Euler equations (fluid dynamics, special case of Navier-Stokes with zero viscosity). Here, n = 2+1+1 = 4
(in two spatial dimensions) in which we have 2 velocity components, 1 pressure variable, and 1 temporal
variable. Let us consider the momentum part of the Euler equations:

∂t vf + vf · ∇vf + ∇pf = f.

Here the highest order is k = 1 (in the temporal variable as well as the spatial variable). But in front
of the spatial derivative, we multiply with the zero-order term vf . Consequently, the Euler equations are
quasi-linear because a lower-order term of the solution variable is multiplied with the highest derivative.
3. Navier-Stokes momentum equation:

∂t vf − ρf νf ∆vf + vf · ∇vf + ∇pf = f.

Here k = 2. But the coefficients in front of the highest order term, the Laplacian, do not depend on vf .
Consequently, the Navier-Stokes equations are neither fully nonlinear nor quasi-linear. However, we have
again the nonlinearity of the first-order convection term vf · ∇vf . Thus the Navier-Stokes equations are
semi-linear.
4. A fully nonlinear situation would be:

∂t vf − ρf νf (∆vf )2 + vf · ∇vf + ∇pf = f.

Example 4.53 (Development of numerical methods for nonlinear equations). In case you are given a nonlinear
IBVP (initial-boundary value problem) and want to start developing numerical methods for this specific PDE,
it is often much easier to start with appropriate simplifications in order to build and analyze step-by-step your
final method. Let us say you are given the nonlinear time-dependent PDE

∇u∂t2 u + u · ∇u − (∆u)2 = f

Then, you could tackle the problem as follows:


1. Consider the linear equation:
∂t2 u − ∆u = f
which is nothing else than the wave equation.
2. Add a slight nonlinearity to make the problem semi-linear:

∂t2 u + u · ∇u − ∆u = f

3. Add ∇u such that the problem becomes quasi-linear:

∇u∂t2 u + u · ∇u − ∆u = f

4. Make the problem fully nonlinear by considering (∆u)2 :

∇u∂t2 u + u · ∇u − (∆u)2 = f.

In each step, make sure that the corresponding numerical solution makes sense and that your developments so
far are correct. If yes, proceed to the next step.

Remark 4.54. The converse of the previous example works as well and should be kept in mind! If you have
implemented the full nonlinear PDE and recognize a difficulty, you can reduce the PDE term by term and make
it step by step simpler. The same procedure holds true for the mathematical analysis.
Exercise 3. Classify the following PDEs into linear and nonlinear PDEs (including a short justification):

−∇ · (|∇u|^{p−2} ∇u) = f
det(∇²u) = f
∂t² u + 2a ∂t u − ∂xx u = 0,  a > 0
∂t u + u ∂x u = 0
∂t² u − ∆u + m²u = 0,  m > 0.

How can we linearize nonlinear PDEs? Which terms need to be simplified such that we obtain a linear PDE?

4.8 Further remarks on the classification into three important types


The previous general definitions of elliptic, parabolic, and hyperbolic types can be made clearer when we
simplify the situation of (23) in R2 :

Lu := a11 ∂x² u + 2a12 ∂x ∂y u + a22 ∂y² u + a1 ∂x u + a2 ∂y u + au = f

with, for the moment, constant coefficients aij . Moreover, a11 , a12 , a22 should not be zero simultaneously. The
main part of L can be represented by the quadratic form:

q(x, y) = a11 x² + 2a12 xy + a22 y².

Solving q(x, y) = 0 yields conic sections in the xy plane. Specifically, we have a relation to geometry:
• Circle, ellipse: x² + y² = 1;
• Parabola: y = x²;
• Hyperbola: x² − y² = 1.
In order to apply this idea to L we write the main part of L in matrix-vector form:
L0 := a11 ∂x² + 2a12 ∂x ∂y + a22 ∂y² = (∂x , ∂y ) ( a11  a12
                                                   a21  a22 ) (∂x , ∂y )ᵀ.

The type of the PDE can now be determined by investigating the properties of the coefficient matrix:
Proposition 4.55. The differential operator L can be brought through a linear transformation to one of the
following forms:
1. Elliptic, if a12² < a11 a22 . Example: ∂x² u + ∂y² u + . . . = 0, e.g., ∆u = 0.
2. Parabolic, if a12² = a11 a22 . Example: ∂x² u + . . . = 0, e.g., ∂t u − ∆u = 0.
3. Hyperbolic, if a12² > a11 a22 . Example: ∂x² u − ∂y² u + . . . = 0, e.g., ∂t² u − ∆u = 0.
The . . . represent terms of lower order.
Remark 4.56. Recall that this corresponds exactly to our previous definitions of elliptic, parabolic, and hy-
perbolic cases.

Proof. We prove the result for the elliptic case. The other cases follow in a similar manner.
Without loss of generality, we assume a11 = 1, a1 = a2 = a0 = 0. Completing the square yields
0 = (∂x + a12 ∂y )² u + (a22 − a12²) ∂y² u = (∂x² + 2a12 ∂x ∂y + a12² ∂y²) u + (a22 − a12²) ∂y² u   (45)
under the assumption that a12² < a11 · a22 . Set b := √(a22 − a12²) > 0.
We now do a coordinate transformation

x = ξ, y = a12 · ξ + b · η (linear in ξ and η)

Recall how derivatives transform under coordinate transformations. We define

v(ξ, η) = u(x, y)

and interpret x and y as functions:

x(ξ, η), y(ξ, η) → v(ξ, η) = u(x(ξ, η), y(ξ, η)).

We compute (∂ξ v, ∂η v):


 
(∂ξ v, ∂η v) = (∂x u, ∂y u) ( ∂ξ x  ∂η x
                              ∂ξ y  ∂η y )
             = (∂x u, ∂y u) ( 1  0
                              a12  b )
             = (∂x u + a12 ∂y u, 0 · ∂x u + b · ∂y u).
This yields: ∂ξ = ∂x + a12 ∂y and ∂η = b · ∂y .
We plug the result into (45). The resulting equation is:

∂ξ² v + ∂η² v = 0,

i.e., the Laplace equation.

Remark 4.57. This definition also works for L that depends on space and time. Just replace the variable y
by the time t.
Example 4.58. Classify the following equations:
1. ∂xx u − 5∂xy u = 0
2. 4 ∂xx u − 12∂xy u + 9∂yy u + ∂y u = 0
3. 4 ∂xx u + 6∂xy u + 9∂yy u = 0
As previously discussed, we compute the discriminant D = a12² − a11 a22 in order to see which case is at
hand:
1. D = (−5/2)² − 1 · 0 = 25/4 > 0 → hyperbolic
2. D = (−6)² − 4 · 9 = 36 − 36 = 0 → parabolic
3. D = 3² − 4 · 9 = 9 − 36 = −25 < 0 → elliptic.
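This check is easily automated. The small C++ helper below (our own sketch; recall that 2a12 is the coefficient of ∂xy u, so a12 is half of it) reproduces the classification of the three examples:

#include <iostream>

// Classify L = a11 u_xx + 2 a12 u_xy + a22 u_yy via the discriminant
// D = a12^2 - a11 a22 (cf. Proposition 4.55).
const char* classify(double a11, double a12, double a22) {
  const double D = a12 * a12 - a11 * a22;
  if (D < 0.0) return "elliptic";
  if (D > 0.0) return "hyperbolic";
  return "parabolic"; // D == 0 (exact for the rational data below)
}

int main() {
  std::cout << classify(1.0, -2.5, 0.0) << std::endl; // example 1: hyperbolic
  std::cout << classify(4.0, -6.0, 9.0) << std::endl; // example 2: parabolic
  std::cout << classify(4.0, 3.0, 9.0) << std::endl;  // example 3: elliptic
}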
Example 4.59. Determine subdomains of the xy plane in which

y∂xx u − 2∂xy u + x∂yy u = 0

is elliptic, hyperbolic or parabolic. First, we have D = (−1)² − y · x = 1 − yx. The equation is parabolic on
the curve xy = 1 because then D = 0. The elliptic parts are the (convex) regions with xy > 1 because there
D = 1 − yx < 0. In the connected part xy < 1, the equation is hyperbolic.

4.9 PDEs that are not elliptic, parabolic or hyperbolic
We have intensively discussed in the previous sections the classification into elliptic, parabolic, and hyperbolic
types. It should be emphasized that these are convenient classifications for which well developed theories
exist. But this does not mean that any linear PDE needs to fit into these types. For instance,

∂t u − ∂xx u − ∂zz u = f

is also a PDE that is very similar to a parabolic one, but the term ∂yy u is missing. Consequently, the coefficient
matrix has two eigenvalues that are zero and two eigenvalues that are minus one. This is not simply a parabolic
PDE anymore, but still a PDE that we can investigate and numerically model. Also, mixed derivatives might
appear as well:
∂t² u − ∂xx u − ∂yz u − ∂zx u = f
This is a linear second order PDE, which is no longer parabolic or hyperbolic, but still a valid PDE
that we can try to treat numerically. PDEs can also be of higher order:
∂t² u + ∆² u = f

which is a linear fourth-order PDE. Or


∂t² u − ∂xxy u = f,
which is a third-order PDE. All these PDEs in this section are mathematically valid, but we do not claim that
they have specific physical applications though. However, fourth-order PDEs are well-known in mechanics for
plates and beams; see [24].

4.10 Chapter summary and outlook


In this extended introduction, we formulated several important PDEs and discussed prototype applications
and equations. Furthermore, we classified PDEs and introduced their level of linearity. In the next section,
we first state some characteristic classifications of differential equations.


5 Classifications
In this chapter we classify PDEs into different categories:
Definition 5.1 (Classification of differential equations). The following properties are distinguished:
1. Independent variables (x, y, z, t, . . .); if only one, then ODE (ordinary differential equation), otherwise
PDE;
2. Steady-state problems or time-dependent PDEs;
3. Physical conservation properties such as mass, momentum, diffusion, transport, energy;
4. Order of a PDE (in time, space, or both);
5. Single equations and PDE systems;
6. Nonlinear problems:
• Nonlinearities in the PDE through state variables,
• Nonlinear constitutive laws,
• Inequality constraints;
7. Coupled systems:
• coupling through model or material parameters,
• linear or nonlinear coupling terms,
• right-hand sides,
• interfaces.

5.1 Physical conservation properties


Although physical conservation properties are not of main interest in these lecture notes, but rather in Numerical
Methods for Continuum Mechanics, let us briefly comment on them.
numerical methods. One example is energy conservation (see also Chapter 12) for which time-stepping schemes
with minimal numerical diffusion must be chosen.

5.2 Order of a PDE


As we have already seen in the previous chapters, the first property that we can distinguish and which is quite
important is the order of a PDE. Most problems have order 2 in space. Examples are the Laplacian,
heat equation, wave equation, p-Laplace, Navier-Stokes. In time we deal mainly with orders 0 (Poisson) to 2
(wave equation). In elasticity, several equations are of order 4 in space. Examples are beams.

5.3 Single equations versus PDE systems


Definition 5.2 (Single equation). A single PDE consists of determining one solution variable, e.g.,

u : Ω ⊂ Rd → R.

Typical examples are Poisson’s problem, the heat equation, wave equation, Monge-Ampère equation, Hamilton-
Jacobi equation, p-Laplacian.
Definition 5.3 (PDE system). A PDE system determines a solution vector

u = (u1 , . . . , ud ) : Ω → Rd .

For each ui , i = 1, . . . , d, a PDE must be solved. Inside these PDEs, the solution variables may depend on each
other or not. Typical examples of PDE systems are linearized elasticity, nonlinear elasto-dynamics, Maxwell’s
equations.

5.4 Linear versus nonlinear
Simply speaking: every differential equation that is not linear is called nonlinear. However, in the nonlinear
case a further refined classification can be undertaken, as we have seen before.
When we deal with a nonlinear equation, we often write it in terms of the notation: Find u ∈ V such that

a(u)(φ) = l(φ) ∀φ ∈ V.

In the so-called semi-linear form a(u)(φ), the first argument u may enter nonlinearly while the test function φ
still enters linearly.
Remark 5.4 (Semi-linear form vs. semi-linear PDE). The wording semi-linear should not be confused. A semi-linear
form is a weak form of a nonlinear PDE in which the test function is still linear. Nonlinear equations can be
distinguished into semi-linear, quasi-linear, and fully nonlinear. Therefore, the notion of a semi-linear form does
not indicate with which level of nonlinearity we are dealing.

5.5 Decoupled versus coupled


We shall specify how decoupled PDE systems can be characterized.

5.5.1 Definitions
Definition 5.5 (Coupled PDEs). A coupled PDE (system) consists of at least two PDEs that interact with
each other. Namely, the solution of one PDE will influence the solution of the other PDE.
Remark 5.6. Coupled problems may also decouple by changing certain coefficients (e.g., Biot equations in
porous media) or designing solution algorithms based on partitioned procedures. Due to decoupling, the single
semi-linear forms may even become linear (e.g., Biot equations), but not always (e.g., fluid-structure interaction
in which the subproblems NSE and elasticity are still nonlinear themselves).

5.5.2 Coupling situations


The coupling can be of various types:
• Volume coupling (the PDEs live in the same domain) and exchange information via volume terms, right
hand sides, coefficients.
Example: Biot equations for modeling porous media flow, phase-field fracture, Cahn-Hilliard phase-field
models for incompressible flows.

• Interface coupling (the PDEs live in different domains) and the information of the PDEs is only exchanged
on the interface.
Example: fluid-structure interaction.
The coupling is first of all based on first physical principles. Based on this information, appropriate numerical
schemes can be derived. Specifically, interface-based coupling is not simple! Imagine a regular finite element
mesh and assume that the interface cuts through the mesh elements.
In principle two approaches are distinguished:
• Interface-tracking: the interface is resolved in an exact fashion;
• Interface-capturing: the interface moves through the mesh and is detected with the help of an additional function (level-set or phase-field).
The situation becomes even more challenging when the coupled problems are modeled in different coordinate
systems as it is the case for fluid-structure interaction. Fluids are described in Eulerian coordinates and solids
usually in the Lagrangian framework. Here we first have to decide in which coordinate system we want to
work. Four choices are possible:

• Eulerian (fluid) / Lagrangian (solid) - natural systems;

• Lagrangian/Eulerian (fluid) / Lagrangian (solid) - arbitrary Lagrangian-Eulerian;
• Lagrangian (fluid) / Eulerian (solid) - unnatural ! Not in use!
• Eulerian (fluid) / Eulerian (solid) - fully Eulerian.

5.6 Variational equations versus variational inequalities


Variational inequalities generalize weak forms such that constraints on the solution variables can be incorpo-
rated.
Let (V, k · k) be a real Hilbert space and K a nonempty, closed, convex subset of V . We define the mapping
A : K → V ∗ , where V ∗ is as usual the dual space of V . Then, we define the abstract problem:
Formulation 5.7 (Abstract variational inequality). Find u ∈ K such that

A(u)(ϕ − u) ≥ 0 ∀ϕ ∈ K.

If K is a linear subspace (and not only a set!) of V , then we have the linearity properties, see Section 3.12,
which allow us to take
ϕ := u ± w ∈ K
for each w ∈ K. Since the semi-linear form A(·)(·) is linear in the second argument, we obtain:

A(u)(w) ≥ 0 and A(u)(−w) ≥ 0

from which follows:


A(u)(w) = 0 ∀w ∈ K.
This shows that weak equations are contained in the more general setting of variational inequalities.
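As a classical illustration from the literature (a standard example, not specific to these notes), consider the obstacle problem: for a given obstacle ψ and right-hand side f, take
\[ K = \{ v \in H_0^1(\Omega) : v \ge \psi \text{ a.e. in } \Omega \}, \qquad A(u)(\varphi - u) = (\nabla u, \nabla(\varphi - u)) - (f, \varphi - u). \]
Then Formulation 5.7 asks for u ∈ K with (∇u, ∇(φ − u)) ≥ (f, φ − u) for all φ ∈ K. Without the obstacle (ψ ≡ −∞), K is the whole space H¹₀(Ω) and we recover the usual weak formulation of Poisson's problem.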

5.6.0.1 On the notation In the literature, the notation of the duality pairing is often found for describing
weak forms and variational inequalities. This notation highlights better in which spaces we are working. Here,
the previous statements read:
Formulation 5.8 (Abstract variational inequality - duality pairing notation). Find u ∈ K such that

hA(u), ϕ − ui ≥ 0 ∀ϕ ∈ K.

Here we see more easily that A(u) ∈ V ∗ and ϕ − u ∈ V .


If K is a linear subspace (and not only a set!) of V , then

ϕ := u ± w ∈ K

for each w ∈ K. Since the semi-linear form A(·)(·) is linear in the second argument, we obtain:

hA(u), wi ≥ 0 and hA(u), −wi ≥ 0

from which follows:


hA(u), wi = 0 ∀w ∈ K.

5.6.0.2 Minimization of functionals Let {V, ‖·‖} be a reflexive, real Hilbert space, K a nonempty closed convex subset of V, and F : K → R a real functional defined on K.
Formulation 5.9 (Minimization problem). Find u ∈ K such that
\[ F(u) = \inf_{v\in K} F(v), \]
or equivalently,
\[ F(u) \le F(v) \qquad \forall v \in K. \]

Definition 5.10 (Convexity). A functional F : K → R is convex if and only if
\[ F(\theta u + (1-\theta)v) \le \theta F(u) + (1-\theta)F(v) \qquad \text{for all } u, v \in K,\; 0 \le \theta \le 1. \]
The functional is strictly convex when strict inequality < holds (for u ≠ v and 0 < θ < 1).


Definition 5.11. Let
\[ \lim_{\varepsilon\to 0} \frac{d}{d\varepsilon} F(u + \varepsilon v) = F'(u)(v) = \langle F'(u), v\rangle \]
be the Gateaux derivative of F.
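For instance, for the quadratic energy functional built from a symmetric bilinear form a(·,·) and a linear form l(·), a direct computation gives
\[ F(u) = \frac{1}{2}a(u, u) - l(u) \quad\Longrightarrow\quad F'(u)(v) = a(u, v) - l(v), \]
so the characterizations in Theorem 5.12 below reproduce exactly the weak equations and variational inequalities from Section 5.6.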


Theorem 5.12. Let K be a subset of a normed linear space V and let F : K → R be Gateaux differentiable. If u is a minimizer of F in K, then u can be characterized as follows:
1. If K is a nonempty closed convex subset of V, then
\[ F'(u)(v - u) \ge 0 \qquad \forall v \in K. \]
2. If K is a nonempty closed convex subset of V and u ∈ int(K), then
\[ F'(u)(v) \ge 0 \qquad \forall v \in K. \]
3. If K is a linear variety with respect to w (i.e., K is a linear subspace translated by a vector w with respect to the origin), then
\[ F'(u)(v) = 0 \qquad \text{for all } v \in V \text{ with } v - w \in K. \]
4. If K is a linear subspace of V, then
\[ F'(u)(v) = 0 \qquad \forall v \in K. \]

Proof. Sketch.
1. The set K is convex. Thus w := u + θ(v − u) ∈ K for θ ∈ [0, 1] and for every u, v ∈ K. If u is a minimizer of F in K, then we have
\[ F(u) \le F(w) = F(u + \theta(v - u)) \qquad \forall v \in K. \]
Hence the difference quotient \((F(u + \theta(v-u)) - F(u))/\theta\) is non-negative for θ ∈ (0, 1], and letting θ → 0⁺ yields:
\[ \frac{d}{d\theta}\big[ F(u + \theta(v - u)) \big]\Big|_{\theta=0} = F'(u)(v - u) \ge 0. \]
2. Let u ∈ int(K) and let v ∈ K be arbitrary. Then there exists ε > 0 such that u + εv ∈ K. Therefore, we insert this element into the previous characterization and obtain:
\[ F'(u)(u + \varepsilon v - u) = F'(u)(\varepsilon v) \ge 0. \]
Since ε > 0 is arbitrary, this implies the assertion.
3. Let K = {w} + M, where M is a linear subspace of V. Then v ∈ K implies v − w ∈ M; the same holds for −(v − w) ∈ M. Then,
\[ F'(u)(v) = 0 \qquad \text{for } v \in V \text{ such that } v - w \in K. \]
4. Take w = 0. Then,
\[ F'(u)(v) = 0 \qquad \text{for } v \in K. \]

5.7 Coupled variational inequality systems (CVIS)
In multiphysics problems, several physical phenomena interact with each other, as for instance: solids, fluids,
heat transfer, chemical reactions, and electromagnetic forces. The governing equations in these lecture notes are
based on continuum mechanics. Such models often result in nonstationary, nonlinear, coupled systems of partial differential equations (PDEs), and they may be subject to inequality constraints. Using variational formulations, the resulting system is referred to in [141] as a coupled variational inequality system³ (CVIS). Characteristic features are:

1. At least two PDEs/VIs are involved;


2. Models in Lagrangian and Eulerian coordinates;
3. Different physical conservation laws such as balance of mass, momentum, angular momentum, and energy;
4. Second law of thermodynamics: entropy inequality principle characterizing the energy transfer direction
in order to distinguish irreversible or reversible processes;
5. Information exchange between different PDEs/CVISs via:
• coefficients,
• linear or nonlinear coupling terms,
• right-hand sides,
• interfaces (internal boundaries).

Remark 5.13 (Coupled problems versus multiphysics). Phase-field fracture modeling, a key ingredient of the multiphysics problems addressed here, consists of two interacting problems. However, phase-field fracture is not yet a multiphysics system because only one PDE describes physics, namely the displacements. The second problem is a VI, whose solution is an indicator function that 'only' locates a discontinuity, damage, or fracture. Therefore, phase-field fracture is simply a so-called coupled problem.

5.8 Chapter summary and outlook


In the next section, we introduce finite differences as a first class of numerical methods for elliptic equations.
In Section 8, we then discuss finite elements as another class of discretization methods.

³A variational inequality is referred to as VI.



6 Finite differences (FD) for elliptic problems


In this section, we address finite differences (FD) for boundary value problems (BVP). The key idea of FD is to approximate the derivatives in the PDE with the help of difference quotients. Although finite differences are very old, and other methods are superior in state-of-the-art approximations of PDEs (in terms of general domain shapes and material coefficients), they are still worth discussing:
• They are very simple to understand and implement, and they quickly yield a numerical solution and a first idea of how the solution may look;
• They are still used in (big) research codes in industry;
• Parts of them, such as difference quotients, serve as a basis in all kinds of software implementations of numerical methods for ODEs and PDEs in which derivatives need to be approximated.
First, we complete the missing properties of the continuous problem and show well-posedness for the one-dimensional Poisson problem with non-homogeneous Dirichlet boundary data. Then, we provide a few words on the maximum principle, come afterwards to the discretization, and finish with the numerical analysis.

6.1 A 1D model problem: Poisson


We study in this section a finite difference discretization for the one-dimensional Poisson problem:

Find u ∈ C²([0, L]) such that
\[ \begin{cases} -u''(x) = f(x) & \forall x \in (0, L), \\ u(0) = a, \\ u(L) = b, \end{cases} \qquad (46) \]
where f ∈ C⁰([0, L]) is a given right-hand side function and a, b ∈ R.


For f = −1, a = b = 0, and L = 1 we obtain a situation as sketched in Figure 12.


Figure 12: The clothesline problem: a uniform force f = −1 acts on a 1D line yielding a displacement u(x).

Remark 6.1. A derivation based on first principles in physics is provided in Section 9.6.2.

6.2 Well-posedness of the continuous problem


We investigate in the following well-posedness, the maximum principle and the regularity of the continuous
problem.

6.2.1 Green’s function and well-posedness
We start with Green’s function that represents the integral kernel of the Laplace operator (here the second-
order derivative in 1D). Using Green’s function, we obtain an explicit representation formula for the solution
u. We define Green’s function as:

\[ G(x, s) = \begin{cases} \dfrac{s(L-x)}{L} & \text{if } 0 \le s \le x \le L, \\[6pt] \dfrac{x(L-s)}{L} & \text{if } 0 \le x \le s \le L. \end{cases} \qquad (47) \]
We now integrate the differential equation (46) twice. For all x ∈ [0, L], we obtain
\[ u(x) = -\int_0^x \Big( \int_0^t f(s)\, ds \Big)\, dt + C_1 x + C_2, \qquad (48) \]
where C_1 and C_2 are integration constants. Using Fubini's theorem (see e.g., [86]), we obtain
\[ \int_0^x \Big( \int_0^t f(s)\, ds \Big)\, dt = \int_0^x \Big( \int_s^x f(s)\, dt \Big)\, ds = \int_0^x (x-s)\, f(s)\, ds \]
and therefore
\[ u(x) = -\int_0^x (x-s)\, f(s)\, ds + C_1 x + C_2. \]

Using the boundary conditions from (46), we can determine C1 and C2 :


\begin{align*}
u(x) &= -\int_0^x (x-s) f(s)\, ds + \frac{x}{L}\int_0^L (L-s) f(s)\, ds + a\,\frac{L-x}{L} + b\,\frac{x}{L} \\
&= \int_0^x \Big( -\frac{L(x-s)}{L} + \frac{x}{L}(L-s) \Big) f(s)\, ds + \frac{x}{L}\int_x^L (L-s) f(s)\, ds + a\,\frac{L-x}{L} + b\,\frac{x}{L} \qquad (49) \\
&= \int_0^L G(x,s) f(s)\, ds + a\,\frac{L-x}{L} + b\,\frac{x}{L},
\end{align*}
in which we employed Green’s function G defined in (47).
Now, we are able to investigate the well-posedness (see Section 2.8 for the meaning of well-posedness):
Proposition 6.2. Problem (46) has a unique solution and depends continuously on the problem data via the
following stability estimate:

\[ \|u\|_{C^2} := \max\big(\|u\|_{C^0},\, L\|u'\|_{C^0},\, L^2\|u''\|_{C^0}\big) \le C_1\|f\|_{C^0} + C_2|a| + C_3|b|, \]
where C_1, C_2, C_3 > 0 are constants. If a = b = 0 (homogeneous Dirichlet conditions), the estimate simplifies to
\[ \|u\|_{C^2} \le C_1\|f\|_{C^0}. \]
An analogous result for more general spaces is stated in Proposition 8.86.
Proof. Existence and uniqueness follow immediately from the representation of the solution as
\[ u(x) = \int_0^L G(x,s) f(s)\, ds + a\,\frac{L-x}{L} + b\,\frac{x}{L}. \qquad (50) \]
The stability estimate follows via the following arguments. Green's function G(x,s) is positive in (0, L) and we can estimate as follows:
\[ |u(x)| \le \|f\|_{C^0} \int_0^L G(x,s)\, ds + |a| + |b| \]

and ∀x ∈ [0, L], we have


\[ \int_0^L G(x, s)\, ds = \frac{L-x}{L}\int_0^x s\, ds + \frac{x}{L}\int_x^L (L-s)\, ds = \frac{(L-x)x^2}{2L} + \frac{x(L-x)^2}{2L} = \frac{(L-x)x}{2} \le \frac{L^2}{8}, \qquad (51) \]
where the maximum is attained at x = L/2, yielding
\[ \|u\|_{C^0} = \max_{x\in[0,L]} |u(x)| \le \frac{L^2}{8}\|f\|_{C^0} + |a| + |b|. \]
All values on the right hand side are given by the problem data and are therefore known. However, the estimate
is not yet optimal in the regularity of u. In the following we will see that we can even bound the second-order
derivative of u by the right hand side f .
Indeed, by differentiating the solution (50) (most easily seen from the split of G(x,s) as in (51)), we obtain for all x ∈ [0, L]:
\[ u'(x) = -\frac{1}{L}\int_0^x s f(s)\, ds + \frac{1}{L}\int_x^L (L-s) f(s)\, ds - \frac{a}{L} + \frac{b}{L}. \]
Since the integrand is still positive, we can again estimate:
\[ |u'(x)| \le \|f\|_{C^0}\Big( \frac{1}{L}\int_0^x s\, ds + \frac{1}{L}\int_x^L (L-s)\, ds \Big) + \frac{|a|}{L} + \frac{|b|}{L} \]
and for all x ∈ [0, L], the Green's function part yields:
\[ \frac{1}{L}\int_0^x s\, ds + \frac{1}{L}\int_x^L (L-s)\, ds = \frac{x^2}{2L} + \frac{(L-x)^2}{2L} \le \frac{L}{2}, \]
thus
\[ \|u'\|_{C^0} = \max_{x\in[0,L]} |u'(x)| \le \frac{L}{2}\|f\|_{C^0} + \frac{|a|}{L} + \frac{|b|}{L}. \]
Differentiating a second time, we obtain (of course) for all x ∈ [0, L]:
\[ u''(x) = -f(x), \]
i.e., our original PDE. Therefore, we obtain the third estimate as:
\[ \|u''\|_{C^0} = \max_{x\in[0,L]} |u''(x)| \le \|f\|_{C^0}. \]

Consequently, the solution depends continuously on the problem data and therefore the problem is well-
posed.
Exercise 4. Recapitulate and complete the above steps:
• Derive step-by-step the explicit form (48);

• Determine C1 and C2 and convince yourself that (49) holds true;


• Convince yourself by an explicit computation that u00 (x) = −f (x) follows from (50).
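The representation formula can also be checked numerically. The following Octave sketch (not part of the original exercise; the choices f = 1, a = b = 0, L = 1 are only for illustration) approximates (50) by quadrature and compares with the exact solution w(x) = x(1−x)/2:

% Numerical check of the representation formula (50), a sketch:
L = 1;
f = @(s) ones(size(s));
G = @(x,s) (s.*(L-x).*(s<=x) + x.*(L-s).*(s>x))/L;   % Green's function (47)
x = linspace(0, L, 11);
u = arrayfun(@(xi) quad(@(s) G(xi,s).*f(s), 0, L), x);
disp(max(abs(u - x.*(L-x)/2)))   % should be close to the quadrature tolerance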

6.2.2 Maximum principle on the continuous level


Since G(x, s) is positive in (0, L), the representation (50) yields immediately the maximum principle. That is
to say:
f ≥ 0, a ≥ 0, b ≥ 0 ⇒ u ≥ 0.

Remark 6.3. Intuitively this is immediately clear. Just consider the parabola u(x) = ½(−x² + x), which satisfies −u''(x) = 1, i.e., f = 1. The function is positive with its maximum at x = 0.5 and attains its minima on the boundaries 0 and 1.

Moreover, the maximum principle also yields uniqueness of the solution. Assume that u₁ and u₂ are two solutions to the same data f, a, b. Their difference w := u₁ − u₂ then satisfies −w''(x) = 0 with homogeneous boundary values. Applying the maximum principle to w (with data 0, 0, 0) yields w ≥ 0, and applying it to −w yields w ≤ 0. Consequently, w = 0 and therefore u₁ = u₂.

6.2.3 Regularity of the solution


The regularity in the case of Poisson's problem carries over from the right-hand side. If f ∈ C⁰([0, L]), we have shown that u ∈ C²([0, L]). For higher regularity k ≥ 1: if f ∈ C^k([0, L]), then for 1 ≤ k' ≤ k we have u^{(k'+2)} = −f^{(k')} ∈ C⁰([0, L]), and consequently u ∈ C^{k+2}([0, L]). The solution is always two orders more regular than the right-hand side data f.

6.3 Spatial discretization


After the previous theoretical investigations we address now a finite difference discretization for Poisson’s prob-
lem. As previously mentioned, using finite differences, derivatives are approximated via difference quotients.
For simplicity we consider homogeneous Dirichlet boundary values. We recall the corresponding problem:
Formulation 6.4. Find u ∈ C²([0, L]) such that
\[ \begin{cases} -u''(x) = f(x) & \forall x \in (0, L), \\ u(0) = 0, \\ u(L) = 0. \end{cases} \qquad (54) \]

In the first step, we need to discretize the domain by introducing discrete support points. Let n ∈ N, and let the step size (spatial discretization parameter) h (the distance between two support points) and the support points x_j be defined as
\[ h = \frac{L}{n+1}, \qquad x_j = jh \quad \text{for all } 0 \le j \le n+1. \]
By this construction, we have
• n inner points;

• n + 2 total points (including the two boundary points);


• n + 1 intervals (so-called mesh or grid elements).
The difference quotient for approximating a second-order derivative is given by (recall introductory classes on numerical methods, e.g., [114]):
\[ u''(x_j) \approx \frac{U_{j+1} - 2U_j + U_{j-1}}{h^2}, \]
where we use a capital letter U to distinguish the discrete solution from the exact solution u. In the following, be careful with the minus sign, since we approximate the negative second-order derivative of u:
\[ -u''(x_j) \approx \frac{-U_{j+1} + 2U_j - U_{j-1}}{h^2}. \]
Consequently, the approximated (discrete) PDE reads:
\[ \frac{-U_{j+1} + 2U_j - U_{j-1}}{h^2} = f(x_j), \qquad 1 \le j \le n, \]

with U_0 = U_{n+1} = 0. Going through j = 1, ..., n, we obtain a linear equation system
\[ \underbrace{\frac{1}{h^2} \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix}}_{=A \in \mathbb{R}^{n\times n}} \underbrace{\begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_{n-1} \\ U_n \end{pmatrix}}_{=U \in \mathbb{R}^n} = \underbrace{\begin{pmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_{n-1}) \\ f(x_n) \end{pmatrix}}_{=F \in \mathbb{R}^n} \]

In compact form:
AU = F.
The ‘only’ remaining task consists now in solving this linear equation system. The solution U vector will then
contain the discrete solution values on the corresponding support (inner) points.
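In Octave, assembly and solution of this system can be sketched in a few lines (a minimal sketch; the choices L = 1, n = 99, and f = 1 are only for illustration):

% Assemble and solve the 1D FD system AU = F (sketch, u(0) = u(L) = 0):
n = 99;                      % number of inner support points
h = 1/(n+1);                 % mesh size
x = (1:n)'*h;                % inner support points
f = @(x) ones(size(x));      % example right-hand side
e = ones(n,1);
A = spdiags([-e, 2*e, -e], -1:1, n, n)/h^2;   % tridiagonal FD matrix
U = A \ f(x);                % discrete solution at the inner points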

Remark 6.5. We could have formulated the above system for all points (including the two boundary points):
\[ \underbrace{\frac{1}{h^2}\begin{pmatrix} h^2 & 0 & \cdots & \cdots & & 0 \\ -1 & 2 & -1 & 0 & \cdots & 0 \\ & \ddots & \ddots & \ddots & & \\ 0 & \cdots & -1 & 2 & -1 & 0 \\ 0 & \cdots & 0 & -1 & 2 & -1 \\ 0 & 0 & \cdots & 0 & 0 & h^2 \end{pmatrix}}_{=A\in\mathbb{R}^{(n+2)\times(n+2)}} \underbrace{\begin{pmatrix} U_0 \\ U_1 \\ U_2 \\ \vdots \\ U_n \\ U_{n+1} \end{pmatrix}}_{=U\in\mathbb{R}^{n+2}} = \underbrace{\begin{pmatrix} 0 \\ f(x_1) \\ f(x_2) \\ \vdots \\ f(x_n) \\ 0 \end{pmatrix}}_{=F\in\mathbb{R}^{n+2}} \]
But we see immediately that we can avoid this slightly bigger system since the boundary values are already known.

6.4 Solving the linear equation system


Depending on the fineness of the discretization (the smaller h, the smaller the discretization error, as we will see later), we deal with large matrices A ∈ R^{n×n} for n ∼ 10⁶ and larger. For moderate sizes, a direct solution à la LU or Cholesky is competitive. For large n, due to
• computational cost,

• and memory restrictions,


we have to approximate U with the help of iterative solution schemes such as Jacobi, Gauss-Seidel, conjugate
gradient, multigrid, and so forth. We come back to this topic in Section 10 in which we provide a basic
introduction.

6.5 Well-posedness of the discrete problem


We now investigate well-posedness of the discrete problem.

6.5.1 Existence and uniqueness of the discrete problem
We first show that the matrix A is symmetric (A = AT , where AT is the transpose), positive definite. A direct
calculation for two vectors X, Y ∈ Rn yields
\begin{align*}
\langle AX, Y\rangle_n &= \sum_{i=1}^n \sum_{j=1}^n A_{ij} X_j Y_i \\
&= \frac{1}{h^2}\sum_{i=1}^n (-X_{i-1} + 2X_i - X_{i+1}) Y_i && \text{(apply entries of } A \text{ to } X\text{)} \\
&= \frac{1}{h^2}\sum_{i=1}^n (X_i - X_{i+1}) Y_i + \frac{1}{h^2}\sum_{i=1}^n (X_i - X_{i-1}) Y_i \\
&= \frac{1}{h^2}\sum_{i=0}^n (X_i - X_{i+1}) Y_i + \frac{1}{h^2}\sum_{i=1}^{n+1} (X_i - X_{i-1}) Y_i && \text{(index shift due to BC)} \\
&= \frac{1}{h^2}\sum_{i=1}^{n+1} (X_{i-1} - X_i) Y_{i-1} + \frac{1}{h^2}\sum_{i=1}^{n+1} (X_i - X_{i-1}) Y_i \\
&= \frac{1}{h^2}\sum_{i=1}^{n+1} (X_i - X_{i-1})(Y_i - Y_{i-1}) \\
&= \sum_{i=1}^{n+1} \frac{X_i - X_{i-1}}{h}\,\frac{Y_i - Y_{i-1}}{h} = \sum_{i=1}^{n+1} \frac{Y_i - Y_{i-1}}{h}\,\frac{X_i - X_{i-1}}{h} && \text{(switch factors)} \\
&= \ldots = \langle X, AY\rangle_n && \text{(do all steps backwards)}
\end{align*}

with X0 = Y0 = Xn+1 = Yn+1 = 0. This first calculation shows the symmetry. It remains to show that

\[ \langle AX, X\rangle_n \]
is positive definite. From the previous calculation we consider the term
\[ \langle AX, Y\rangle_n = \ldots = \frac{1}{h^2}\sum_{i=1}^{n+1} (X_i - X_{i-1})(Y_i - Y_{i-1}). \]
Setting Y = X yields:
\[ \langle AX, X\rangle_n = \sum_{i=1}^{n+1} \Big(\frac{X_i - X_{i-1}}{h}\Big)^2 \ge 0. \]

This shows the positivity. It remains to show the definiteness, i.e., that ⟨AX, X⟩ₙ = 0 implies X = 0. Indeed, ⟨AX, X⟩ₙ = 0 forces X_i − X_{i−1} = 0 for all 1 ≤ i ≤ n + 1, so that
\[ X_0 = X_1 = \dots = X_{i-1} = X_i = \dots = X_n = X_{n+1}, \]

and the zero boundary conditions finally yield X = 0. Consequently, the matrix A is symmetric positive
definite. The eigenvalues are in R and strictly positive. Therefore, A is regular (recall the results from linear
algebra!) and thus invertible, which finally yields:
Proposition 6.6 (Existence and uniqueness of the discrete problem). The discrete problem
\[ \frac{-U_{j+1} + 2U_j - U_{j-1}}{h^2} = f(x_j), \qquad 1 \le j \le n, \qquad (55) \]
with U_0 = U_{n+1} = 0, has one and only one solution for every F ∈ R^n.

6.5.2 Maximum principle on the discrete level
As previously stated, the maximum principle on the continuous level has a discrete counterpart. Let F be given such that F_j ≥ 0 for 1 ≤ j ≤ n. We perform an indirect proof and show that µ := min_{1≤j≤n} U_j ≥ 0. We suppose µ < 0 and take i ∈ {1, ..., n} as one index (at least one exists!) such that the minimum is attained: U_i = min_{1≤j≤n} U_j = µ. If 1 < i < n, and U is the solution of AU = F, the ith row can be written as:

\[ 2U_i - U_{i+1} - U_{i-1} = h^2 F_i \ge 0 \]
under the assumption of F being non-negative. Therefore, we split

0 ≤ (Ui − Ui+1 ) + (Ui − Ui−1 ) = (µ − Ui+1 ) + (µ − Ui−1 ) ≤ 0

because Ui = µ is the minimum value, which we assumed to be negative. To satisfy this inequality chain it
must hold:
Ui−1 = Ui = Ui+1 = µ,
which shows that also the neighboring indices i − 1 and i + 1 satisfy the minimum. Extending this idea to all
indices finally yields U_1 = ... = U_n = µ. Therefore, we can investigate the first (or the last) row of the matrix A:
\[ 0 \le h^2 F_1 = 2U_1 - U_2 = U_1 + (U_1 - U_2) \le U_1 = \mu < 0, \]
since U_1 − U_2 = µ − U_2 ≤ 0. But
\[ 0 \le U_1 = \mu < 0 \]
is a contradiction. Consequently, µ ≥ 0. Respecting the boundary conditions, we notice that for U_0 = U_{n+1} = 0
and Fj ≥ 0 for all 1 ≤ j ≤ n, we have the desired result

\[ \mu = \min_{0\le j\le n+1} U_j \ge 0, \]

showing that we have a discrete maximum principle.

Proposition 6.7 (Monotone inverse). The matrix A corresponding to the finite difference discretization is a
so-called M matrix (inverse-monotone), i.e., the inverse A−1 is element-wise non-negative, i.e.,

A−1 ≥ 0.

Proof. For 1 ≤ j ≤ n, we fix F^j = (0, ..., 1, ..., 0)^T = (δ_{ij})_{1≤i≤n}, where δ_{ij} is the Kronecker symbol, and let X^j be the solution of AX^j = F^j. By construction, X^j = A^{-1}F^j, and for 1 ≤ i ≤ n, the ith row can be written as
\[ X_i^j = \sum_{k=1}^n A^{-1}_{ik} F^j_k = \sum_{k=1}^n A^{-1}_{ik}\delta_{kj} = A^{-1}_{ij}. \qquad (56) \]

Using the maximum principle, we know that for 1 ≤ i, j ≤ n and F^j_i = δ_{ij} ≥ 0, the solution is non-negative, i.e.,
\[ X_i^j \ge 0 \]
(j indicates the jth solution and i the component of that solution). Thanks to (56), we have X_i^j = A^{-1}_{ij}, and we obtain
\[ A^{-1}_{ij} \ge 0, \]
which means that A is an M-matrix (its inverse is element-wise non-negative).

6.6 Numerical analysis: consistency, stability, and convergence
We finally investigate the convergence properties of the finite difference discretization. As we know from ODE
numerics, for linear schemes, consistency plus stability will yield convergence.

6.6.1 Basics in error estimation


We recall the basics. Let Āx̄ = b̄ be the continuous problem and Ax = b its discrete approximation. Then, we define:
• Approximation error: e := x̄ − x

• Truncation error: η := Ax̄ − b (exact solution into numerical scheme)


• Residual (defect): ρ := Āx − b̄ (discrete solution into continuous problem)
Furthermore, when the underlying numerical scheme is linear (see ODE lectures), we have:
• Consistency (i.e., control of the truncation error) plus stability yields convergence.
In the following, this is exactly the procedure that we use.

6.6.2 Truncation error


We first derive an expression for the truncation error η (the local discretization error), which is obtained by
plugging in the exact (unknown) solution into the numerical scheme.
Let
\[ e_j = U_j - \bar{U}_j \quad \text{for } 1 \le j \le n, \]
where \(\bar{U} = (\bar{U}_i)_{1\le i\le n} = (u(x_i))_{1\le i\le n} \in \mathbb{R}^n\). Then, we have for 1 ≤ i ≤ n:
\begin{align*}
(Ae)_i = \big(A(U - \bar{U})\big)_i = (F - A\bar{U})_i &= f(x_i) - (A\bar{U})_i \\
&= -u''(x_i) - \frac{-\bar{U}_{i+1} + 2\bar{U}_i - \bar{U}_{i-1}}{h^2} \\
&= \frac{u(x_{i+1}) - 2u(x_i) + u(x_{i-1})}{h^2} - u''(x_i),
\end{align*}
and thus
\[ Ae = \eta, \qquad (57) \]
where η ∈ R^n is the local truncation error that arises because the exact solution u does not satisfy the numerical scheme (only the discrete solution U solves the discrete scheme!). Consequently, for all 1 ≤ i ≤ n, we have
\[ \eta_i := \frac{u(x_{i+1}) - 2u(x_i) + u(x_{i-1})}{h^2} - u''(x_i). \]

6.6.3 Consistency
The consistency is based on estimates of the local truncation error. The usual method to work with is a Taylor
expansion (see Section 3.14). We have previously established that for f ∈ C 2 ([0, L]), we have u ∈ C 4 ([0, L]).
Consequently, we can employ a Taylor expansion up to order 4 at the point xi , 1 ≤ i ≤ n. This brings us to:

\begin{align*}
u(x_{i+1}) &= u(x_i) + h u'(x_i) + \frac{h^2}{2}u''(x_i) + \frac{h^3}{6}u^{(3)}(x_i) + \frac{h^4}{24}u^{(4)}(\tau_i^+), \\
u(x_{i-1}) &= u(x_i) - h u'(x_i) + \frac{h^2}{2}u''(x_i) - \frac{h^3}{6}u^{(3)}(x_i) + \frac{h^4}{24}u^{(4)}(\tau_i^-),
\end{align*}

with τ_i^+ ∈ [x_i, x_{i+1}] and τ_i^- ∈ [x_{i-1}, x_i]. It follows that for all 1 ≤ i ≤ n
\[ |\eta_i| = \frac{h^2}{24}\big| u^{(4)}(\tau_i^+) + u^{(4)}(\tau_i^-) \big| \le \frac{h^2}{12}\|u^{(4)}\|_{C^0} = \frac{h^2}{12}\|f''\|_{C^0}, \]
in which we used that u^{(4)} = −f''. As the final result, we obtain:


Proposition 6.8. The local truncation error of the finite difference approximation of Poisson's problem can be estimated as
\[ \|\eta\|_\infty := \max_{1\le i\le n}|\eta_i| \le C_f h^2, \qquad \text{with } C_f = \frac{1}{12}\|f''\|_{C^0}. \qquad (58) \]
Therefore, the scheme (55) is consistent of order 2.
Remark 6.9. We notice that the additional order of the scheme (namely order two and not one) has been
obtained since we work with a central difference quotient to approximate the second order derivative. Of course,
here it was necessary to assume sufficient regularity of the right hand side f and the solution u.
Remark 6.10. The requirements on the regularity of the solution u to show the above estimates are much higher than for the corresponding estimates for finite elements (see Section 8.13). In practice, such high regularity can only be ensured in a few cases, which is a drawback of finite differences in comparison to finite elements. Still, we can use the method in practice, but one has to be careful about whether the results remain robust and reliable.

6.6.4 Stability in L∞
We now investigate the stability of the finite difference scheme. We recapitulate the matrix norm ‖·‖_∞ (maximum row sum) for M ∈ R^{n×n}, defined by
\[ \|M\|_\infty = \max_{1\le i\le n}\sum_{j=1}^n |M_{ij}|. \]

Moreover, we denote by w(x) the exact solution for the problem in which f = 1 on [0, L]. The choice of f = 1
is only for simplicity. More general f work as well, but will require more work in the proof below.
Proposition 6.11. It holds:
\[ \|A^{-1}\|_\infty = \|\bar{W}\|_\infty \qquad \text{and} \qquad \|A^{-1}\|_\infty \le \frac{L^2}{8}. \]
Proof. For f = 1, the second derivative vanishes: f 00 = 0. From Section 6.6.3 with (58) we know that in this
case
kηk∞ = 0
and therefore
0 = Ae = A(W − W̄ )
where W ∈ Rn denotes the discrete solution corresponding to f = 1 (the full right hand side vector is denoted
by F1 in the following). Moreover, we denote W̄ = (w(xi ))1≤i≤n ∈ Rn the continuous solution. Consequently,

\[ A(W - \bar{W}) = 0 \;\Rightarrow\; AW = A\bar{W} \;\Rightarrow\; F_1 = A\bar{W} \;\Rightarrow\; \bar{W} = A^{-1}F_1. \]
The last step holds because we work with f = 1 and therefore F_1 = (1, ..., 1)^T. For 1 ≤ i ≤ n, we obtain
\[ \bar{W}_i = \sum_{j=1}^n A^{-1}_{ij}(F_1)_j = \sum_{j=1}^n A^{-1}_{ij} = \sum_{j=1}^n |A^{-1}_{ij}|. \]

The last step holds because A is an M -matrix. Otherwise, we would not be allowed to take absolute signs.
We finally obtain:
\[ \|A^{-1}\|_\infty = \max_{1\le i\le n}\sum_{j=1}^n |A^{-1}_{ij}| = \max_{1\le i\le n}|\bar{W}_i| = \|\bar{W}\|_\infty. \]

The exact solution to the problem −w''(x) = 1 on (0, L) with homogeneous Dirichlet conditions is
\[ w(x) = \frac{x(L-x)}{2}, \]
which attains its maximum at x = L/2. This yields max(w) = L²/8 and therefore
\[ \|A^{-1}\|_\infty \le \frac{L^2}{8}, \]
which shows the assertion.
Remark 6.12. The case f = 1 with η = Ae = 0 has a very practical implication: the discrete solution matches the continuous solution exactly in the support points x_i, 1 ≤ i ≤ n. This can be confirmed by numerical simulations; see e.g., Section 6.7. But be careful: this correspondence is in general only true for f'' = 0 and only in 1D.
Remark 6.13. The previous result can be extended to other right hand sides f , which are not necessarily
f = 1.

6.6.5 Convergence L∞
With the previous two results, we can now prove convergence of the finite difference scheme. We assume as before f ∈ C²([0, L]) and obtain:
Proposition 6.14. For the convergence of the finite difference scheme (55), it holds:
\[ \|e\|_\infty = \|A^{-1}\eta\|_\infty \le \|A^{-1}\|_\infty \|\eta\|_\infty \le \frac{L^2}{8}\,C_f h^2 = \frac{L^2}{96}\|f''\|_{C^0}\, h^2 = \mathcal{O}(h^2), \]
where we employed the stability and consistency estimates and (58). Consequently, we have quadratic convergence as h tends to zero.
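The quadratic convergence can be observed numerically. The following Octave sketch assumes the manufactured solution u(x) = sin(πx), i.e., f(x) = π² sin(πx), and prints the maximum nodal error for a sequence of mesh sizes:

% Convergence study (sketch): manufactured solution u(x) = sin(pi*x)
f   = @(x) pi^2*sin(pi*x);
uex = @(x) sin(pi*x);
for n = [10 20 40 80 160]
  h = 1/(n+1);  x = (1:n)'*h;  e = ones(n,1);
  A = spdiags([-e, 2*e, -e], -1:1, n, n)/h^2;
  U = A \ f(x);
  printf('h = %8.5f   max error = %.3e\n', h, max(abs(U - uex(x))));
end
% the error decreases by a factor of ~4 per halving of h, i.e., O(h^2)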

6.6.6 Convergence L2
In this final subsection, we prove a second convergence result, now in the L² norm. We define two norms:
\[ \|X\|_A = \Big( \sum_{j=1}^{n+1} \frac{1}{h}|X_j - X_{j-1}|^2 \Big)^{1/2}, \qquad \|X\|_2 = \Big( \sum_{j=1}^{n} h|X_j|^2 \Big)^{1/2}, \]
and notice that ⟨·,·⟩_A and ⟨·,·⟩_2 are the corresponding scalar products.
and notice that < . >A and < . >2 are the corresponding scalar products.
Proposition 6.15. For each X ∈ R^n it holds:
\[ \|X\|_2 \le L^{1/2}\|X\|_\infty \le L\|X\|_A. \]

Proof. Let X ∈ R^n and set as usual X_0 = X_{n+1} = 0. Then, for all 1 ≤ i ≤ n:
\begin{align*}
|X_i| = \Big| X_0 + \sum_{j=1}^i (X_j - X_{j-1}) \Big| = \Big| \sum_{j=1}^i h\,\frac{X_j - X_{j-1}}{h}\cdot 1 \Big|
&\le \Big( \sum_{j=1}^i h\Big|\frac{X_j - X_{j-1}}{h}\Big|^2 \Big)^{1/2} \Big( \sum_{j=1}^i h|1|^2 \Big)^{1/2} \\
&\le \|X\|_A\, L^{1/2},
\end{align*}
where we used the Cauchy-Schwarz inequality. It follows that

kXk∞ ≤ L1/2 kXkA .

Moreover,
\[ \|X\|_2^2 = \sum_{j=1}^n h|X_j|^2 \le \|X\|_\infty^2 \sum_{j=1}^n h \le \|X\|_\infty^2\, L. \]
We finally obtain
\[ \|X\|_2 \le L^{1/2}\|X\|_\infty \le L\|X\|_A. \]

Remark 6.16. We notice that ‖X‖₂ ≤ L‖X‖_A holds because X₀ = 0 and \(\sum_{j=1}^i h|1|^2\) is bounded. Otherwise the previous result would not be correct.
Proposition 6.17. Let X, Y ∈ R^n. For a symmetric positive definite matrix A, we can define the so-called A-norm; see also Section 10.4. It holds
\[ \langle X, Y\rangle_A = \langle AX, Y\rangle_2 \qquad \text{and} \qquad \|e\|_A \le L\|\eta\|_2. \]
Proof. We have already shown that for all X, Y ∈ R^n, it holds
\[ \langle AX, Y\rangle_n = \sum_{i=1}^{n+1} \frac{X_i - X_{i-1}}{h}\,\frac{Y_i - Y_{i-1}}{h}. \]
Therefore,
\[ \langle AX, Y\rangle_2 = h\,\langle AX, Y\rangle_n = h\sum_{i=1}^{n+1} \frac{X_i - X_{i-1}}{h}\,\frac{Y_i - Y_{i-1}}{h} = \langle X, Y\rangle_A, \]
from which follows
\[ \|e\|_A^2 = \langle e, e\rangle_A = \langle Ae, e\rangle_2 = \langle \eta, e\rangle_2 \le \|\eta\|_2\|e\|_2 \le L\|\eta\|_2\|e\|_A, \]
in which we used (57), Proposition 6.15, and again the Cauchy-Schwarz inequality. This shows the assertion.
Proposition 6.18. It holds:
\[ \|e\|_A \le L\|\eta\|_2 \le L^{3/2}\|\eta\|_\infty \le L^{3/2} C_f h^2 \qquad \text{and} \qquad \|e\|_2 \le L\|e\|_A \le L^{5/2} C_f h^2. \]
In other words: the scheme (55) converges in the L² norm and in the A-norm with order 2 as h tends to zero.

6.7 Numerical test: 1D Poisson


We finish this 1D section with a numerical test implemented in octave [44].

6.7.1 Homogeneous Dirichlet boundary conditions
Formulation 6.19. Let Ω = (0, 1). Find u ∈ C²(Ω) such that
\[ \begin{cases} -u''(x) = f & \text{in } \Omega, \\ u(0) = 0, \\ u(1) = 0, \end{cases} \qquad (59) \]
where f = −1. The continuous solution can be computed by hand (see Section 8.1.1) and is
\[ u(x) = \frac{1}{2}(x^2 - x). \]
For the discretization we choose n = 3 inner points (i.e., n + 1 = 4 subintervals). Thus we have the five support points
\[ x_0,\; x_1,\; x_2,\; x_3,\; x_4 \]
and h = 1/4 = 0.25.


With these settings we obtain for the matrix A (see Section 6.3), including boundary conditions:

1 0 0 0 0
-16 32 -16 0 0
0 -16 32 -16 0
0 0 -16 32 -16
0 0 0 0 1

and for the right hand side vector b:


0
-1
-1
-1
0
We use the famous backslash solution operator (in fact an LU decomposition):
U = A\b
and obtain as solution U = (U0 , U1 , U2 , U3 , U4 ):

0.00000
-0.09375
-0.12500
-0.09375
0.00000

A plot of the discrete solution is provided in Figure 13. Moreover, the continuous solution can be evaluated in the support points as well. Here, we see that both solutions match. This was already suggested by our numerical analysis; see in particular Remark 6.12.



Figure 13: Solution of the 1D Poisson problem with f = −1 using five support points with h = 0.25. We
observe that the approximation is rather coarse. A smaller h would yield more support points and
more solution values and therefore a more accurate solution.
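For completeness, a compact Octave script reproducing this test (a sketch consistent with the matrix A and vector b printed above) reads:

% 1D Poisson with f = -1, u(0) = u(1) = 0, five support points (sketch):
A = [  1   0   0   0   0;
     -16  32 -16   0   0;
       0 -16  32 -16   0;
       0   0 -16  32 -16;
       0   0   0   0   1];
b = [0; -1; -1; -1; 0];
U = A \ b;                      % matches the solution values listed above
plot(0:0.25:1, U, '-o');        % cf. Figure 13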

Remark 6.20 (Matrix A in FD vs. FEM). In the above matrix A and right-hand side b, we have implicitly the factors 1/h² and 1, respectively. Indeed,
\[ A_{FD} = \frac{1}{h^2}\begin{pmatrix} h^2 & 0 & \cdots & \cdots & & 0 \\ -1 & 2 & -1 & 0 & \cdots & 0 \\ & \ddots & \ddots & \ddots & & \\ 0 & \cdots & 0 & -1 & 2 & -1 \\ 0 & 0 & \cdots & 0 & 0 & h^2 \end{pmatrix}, \qquad b_{FD} = (0, 1, \dots, 1, 0)^T, \qquad \frac{1}{h^2} = 16. \]
The corresponding matrix using an FEM scheme (Section 8) looks slightly different, but is in fact the same. Here, we integrate only u', but multiplied with a test function φ'. On the right-hand side we also have φ. We obtain:
\[ A_{FEM} = \frac{1}{h}\begin{pmatrix} h & 0 & \cdots & \cdots & & 0 \\ -1 & 2 & -1 & 0 & \cdots & 0 \\ & \ddots & \ddots & \ddots & & \\ 0 & \cdots & 0 & -1 & 2 & -1 \\ 0 & 0 & \cdots & 0 & 0 & h \end{pmatrix}, \qquad b_{FEM} = h\,(0, 1, \dots, 1, 0)^T, \qquad \frac{1}{h} = 4. \]
Dividing A_{FEM} and b_{FEM} by h yields directly A_{FD} and b_{FD}.

6.7.2 Nonhomogeneous Dirichlet boundary conditions u(0) = u(1) = 1
We change the boundary conditions to
u(0) = u(1) = 1.
The right hand side then reads b = (1, −1, −1, −1, 1)T . We obtain as solution:

1.00000
0.90625
0.87500
0.90625
1.00000

6.7.3 Nonhomogeneous Dirichlet boundary conditions u(0) = 3 and u(1) = 2


We change the boundary conditions to
u(0) = 3 , u(1) = 2.
T
The right hand side then reads b = (3, −1, −1, −1, 2) .
We obtain as solution:
3.0000
2.6562
2.3750
2.1562
2.0000

Figure 14: Solution of the 1D Poisson problem with f = −1 and nonhomogeneous boundary conditions.

6.7.4 How can we improve the numerical solution?


An important question is how we can improve the numerical solution. The five-point approximation employed in Figure 13 is obviously not very accurate. After all previous subsections in this chapter, the answer is clear: increasing the number of support points, i.e., n → ∞, in the given domain Ω yields h → 0 (recall h = x_{j+1} − x_j). According to our findings in the numerical analysis section, the discretization error then becomes very small, and we therefore obtain an accurate approximation of the given problem. On the other hand, large n requires substantial computational resources. We finally have to find a compromise between the accuracy of a numerical solution and the computational cost (recall the concepts outlined in Section 2.6).

6.8 Finite differences in 2D


We extend the idea of finite differences now to two-dimensional problems. The goal is the construction of the
method. For a complete numerical analysis we refer to the literature, e.g., [65, 106].

In this section we follow (more or less) [111]. We consider

−∆u = f in (0, 1)2 , (60)


u = uD on ∂ΩD . (61)

6.8.1 Discretization
In extension to the previous section, we now construct a grid in two dimensions. The local grid size is h. We define
\[ \Omega_h := \{ x_{ij},\; i,j = 0,\dots,N \;|\; x_{ij} = (ih, jh),\; h = N^{-1} \}. \]
We seek a solution in all inner grid points
\[ \Omega_h^0 := \{ x_{ij},\; i,j = 1,\dots,N-1 \;|\; x_{ij} = (ih, jh),\; h = N^{-1} \}. \]
For the following, we introduce a grid function u_h := (u_{ij})_{i,j=1}^{N-1} and seek the unknown function as the solution of
\[ -\Delta_h u_h = F_h, \qquad x_{ij} \in \Omega_h^0. \]

Using a central finite difference operator of second order, we obtain
\[ -(\Delta_h u_h)_{ij} = \frac{1}{h^2}\big(4u_{ij} - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1}\big), \qquad (F_h)_{ij} := f(x_{ij}). \]
On the boundary, we set the values u_{0,j}, u_{N,j}, u_{i,0}, u_{i,N} according to the boundary data u_D (equal to zero in the homogeneous case).
This difference quotient is known as the five-point stencil:
\[ S_h = \frac{1}{h^2}\begin{pmatrix} & -1 & \\ -1 & 4 & -1 \\ & -1 & \end{pmatrix}. \]

In total we obtain the following linear equation system:
\[ A_h u_h = f_h \qquad (62) \]
with
\[ A_h := \frac{1}{h^2}\begin{pmatrix} B & -I & & \\ -I & B & -I & \\ & \ddots & \ddots & \ddots \end{pmatrix}, \qquad B := \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & -1 & \\ & \ddots & \ddots & \ddots \end{pmatrix}, \qquad (63) \]
where I denotes the identity matrix. The matrix A_h has as many rows as inner grid points, i.e., A_h ∈ R^{M×M} with M = (N−1)², and bandwidth 2(N−1)+1 (an Octave assembly sketch is given after Proposition 6.21). Further properties are:
Proposition 6.21. The matrix Ah has the following properties:
1. Ah has five entries per row;
2. it is symmetric;

3. weakly diagonal-dominant;
4. positive definite;
5. an M matrix.
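In Octave, this block structure can be assembled conveniently via Kronecker products, as in the following sketch (N = 50 and f = 1 are example choices only):

% Assemble the 2D five-point matrix via Kronecker products (sketch):
N = 50;  h = 1/N;  m = N-1;                 % m inner points per direction
e = ones(m,1);
T = spdiags([-e, 2*e, -e], -1:1, m, m);     % 1D second-difference matrix
I = speye(m);
Ah = (kron(I,T) + kron(T,I))/h^2;           % block structure of (63)
fh = ones(m^2,1);                           % e.g., f = 1
uh = Ah \ fh;                               % reshape(uh, m, m) gives the grid values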

It holds:

Proposition 6.22. The five-point stencil approximation of the Poisson problem in two dimensions satisfies the following a priori error estimate:
\[ \max_{ij} |u(x_{ij}) - u_{ij}| \le \frac{1}{96} h^2 \max_{\bar\Omega}\big(|\partial_x^4 u| + |\partial_y^4 u|\big). \]

Proof. Later.
The resulting linear equation system A_h u_h = F_h is sparse in the matrix A_h, but possibly large for large N (i.e., very small h and consequently high accuracy).
To determine the condition number (which is important for the numerical solution), we recall the eigenvalues and eigenvectors:
\[ \lambda_{kl} = h^{-2}\big(4 - 2(\cos(kh\pi) + \cos(lh\pi))\big), \qquad \omega_{kl} = \big(\sin(ikh\pi)\sin(jlh\pi)\big)_{i,j=1}^{N-1}. \]
The largest and smallest eigenvalues are, respectively:
\begin{align*}
\lambda_{\max} &= h^{-2}\big(4 - 4\cos((1-h)\pi)\big) = 8h^{-2} + \mathcal{O}(1), \\
\lambda_{\min} &= h^{-2}\big(4 - 4\cos(h\pi)\big) = h^{-2}\big(4 - 4(1 - \tfrac{1}{2}\pi^2 h^2) + \mathcal{O}(h^2)\big) = 2\pi^2 + \mathcal{O}(h^2).
\end{align*}
Therefore, the condition number is
\[ \operatorname{cond}_2(A_h) = \frac{\lambda_{\max}}{\lambda_{\min}} \approx \frac{4}{\pi^2 h^2}, \]
which is similar to the heat equation.
Remark 6.23 (on the condition number). The order O(h⁻²) of the condition number of the matrix A_h is determined by the differential equation and neither by the dimension of the problem (here 2) nor by the quadratic order of the consistency error. This statement holds true for finite differences as well as for finite elements; see also Section 10.1.

6.9 Finite differences for time-dependent problems: 1D space / 1D time


We extend our developments to time-dependent problems and restrict ourselves again to one-dimensional
spatial problems. As model problem we consider the heat equation. As spatial interval, we choose Ω = (0, 1).
The time interval is denoted by I := (0, T ) where T > 0 is the end time value. The problem reads:
Formulation 6.24 (Heat equation). Let ν > 0 be a material parameter. Find u := u(x, t) such that
\begin{align}
\partial_t u - \nu\,\partial_{xx} u &= 0 \quad \text{in } \Omega \times I && \text{PDE}, \tag{64} \\
u(0, t) &= u(1, t) \quad \text{on } \partial\Omega && \text{boundary condition}, \tag{65} \\
u(x, 0) &= u^0(x) \quad \text{in } \Omega \times \{0\} && \text{initial condition}. \tag{66}
\end{align}
The PDE is complemented with a boundary condition of Dirichlet type, here a 1-periodic boundary condition, and an initial condition at time t = 0.

6.9.1 Algorithmic aspects: temporal and spatial discretization


For spatial and temporal discretization, we introduce a spatial mesh parameter h and a temporal mesh parameter k:
\[ h = \frac{1}{N+1} > 0, \qquad k > 0. \]
We define the (mesh) nodes
\[ (t_n, x_j) = (nk, jh) \qquad \forall n \ge 0,\; j \in \{0, 1, \dots, N+1\}. \]
A discrete value that approximates the (unknown) exact solution in a point (t_n, x_j) is denoted by u_j^n. The initial value is discretized by
\[ u_j^0 = u^0(x_j), \qquad j \in \{0, 1, \dots, N+1\}. \]

As boundary values we have, in the discrete sense,
\[ u_0^n = u_{N+1}^n \qquad \forall n > 0. \]
At each time point, we thus compute the values
\[ (u_j^n)_{1\le j\le N} \in \mathbb{R}^N. \]
In the following, we introduce some schemes (i.e., algorithms) to compute these solution values. Each scheme yields a system of N equations (one for each spatial point x_j, 1 ≤ j ≤ N) for the N components of uⁿ.
Definition 6.25 (Explicit scheme). An explicit scheme (explicit in terms of n) is
\[ \frac{u_j^{n+1} - u_j^n}{k} + \nu\,\frac{-u_{j-1}^n + 2u_j^n - u_{j+1}^n}{h^2} = 0 \]
for n > 0 and j ∈ {1, ..., N}. This scheme is known from ordinary differential equations as the forward Euler scheme.
Definition 6.26 (Implicit scheme). An implicit scheme (implicit in terms of n, i.e., the solution is not immediately obtained) is
\[ \frac{u_j^{n+1} - u_j^n}{k} + \nu\,\frac{-u_{j-1}^{n+1} + 2u_j^{n+1} - u_{j+1}^{n+1}}{h^2} = 0 \]
for n > 0 and j ∈ {1, ..., N}. This scheme is the well-known backward Euler scheme.
It must be further justified that the implicit scheme can indeed be used to compute the values u_j^{n+1}. To this end, we write out explicitly all N equations, yielding a linear equation system, and check whether the resulting matrix is invertible. Specifically, we have
\[ \begin{pmatrix} 1+2c & -c & & & 0 \\ -c & 1+2c & -c & & \\ & \ddots & \ddots & \ddots & \\ & & -c & 1+2c & -c \\ 0 & & & -c & 1+2c \end{pmatrix} \]
with c := ν k / h².
A combination of the two previous schemes yields a famous scheme, the so-called θ-scheme:
Definition 6.27 (One-Step-θ scheme). The θ-scheme is defined as:
\[ \frac{u_j^{n+1} - u_j^n}{k} + \nu\theta\,\frac{-u_{j-1}^{n+1} + 2u_j^{n+1} - u_{j+1}^{n+1}}{h^2} + \nu(1-\theta)\,\frac{-u_{j-1}^n + 2u_j^n - u_{j+1}^n}{h^2} = 0 \]
for n > 0 and j ∈ {1, ..., N}. Here the classification (whether implicit or explicit) depends on the choice of θ ∈ [0, 1]. For θ = 0 the scheme is clearly explicit. For θ > 0 the scheme is implicit, but care has to be taken about the meaning. Indeed, for 0 ≤ θ < 0.5, it can be shown that the scheme has problems in robustness. For θ ≥ 0.5, we have better properties. For θ = 1, we obtain the very stable (implicit) backward Euler scheme. Finally, for θ = 0.5, the resulting scheme has similarities with the trapezoidal rule in numerical quadrature, and specifically in application to the heat equation it is known as the Crank-Nicolson scheme.
Remark 6.28. Further schemes can be obtained with the same rules known from ordinary differential equa-
tions.
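A compact Octave sketch of the One-Step-θ scheme for the heat equation (for simplicity with homogeneous Dirichlet instead of periodic boundary conditions; all parameter values are only examples):

% One-Step-theta scheme for the 1D heat equation (sketch):
nu = 1;  theta = 0.5;               % theta = 0: forward, 1: backward Euler
N = 99;  h = 1/(N+1);  k = 1e-3;  T = 0.1;
x = (1:N)'*h;
u = sin(pi*x);                      % initial condition u^0
e = ones(N,1);
B = spdiags([-e, 2*e, -e], -1:1, N, N)/h^2;   % discrete -d^2/dx^2
I = speye(N);
for t = k:k:T
  u = (I + nu*k*theta*B) \ ((I - nu*k*(1-theta)*B)*u);
end
% for theta = 0 and 2*nu*k > h^2, the instability predicted by the CFL
% condition in Section 6.9.4 can be observed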

6.9.2 Consistency and precision
As before, based on Taylor expansions, we now investigate the consistency of our numerical schemes and also their precision. In abstract form we write the PDE as (see also Definition 4.46):
\[ F(u) = 0. \]
In a general fashion, the corresponding FD scheme can be written as:
\[ F_{k,h}\big(u_{j+i}^{n+m}\big) = 0 \qquad \text{for } m_- \le m \le m_+,\; i_- \le i \le i_+. \]
The values m_-, m_+, i_-, i_+ define the size of the so-called stencil.


Definition 6.29 (Truncation error). The finite difference scheme F_{k,h}(u_{j+i}^{n+m}) = 0 is consistent with the PDE F(u) = 0 if, for all sufficiently smooth solutions u(t, x), the truncation error satisfies
\[ F_{k,h}\big(u(t + mk, x + ih)\big) \to 0 \qquad \text{for } h \to 0,\; k \to 0. \]
Furthermore, the scheme is consistent of order p in space and order q in time if
\[ F_{k,h}\big(u(t + mk, x + ih)\big) = \mathcal{O}(h^p + k^q). \]
Remark 6.30. The truncation error is always obtained by inserting the exact solution into the numerical
scheme.
Proposition 6.31. For the forward Euler scheme, it holds q = 1 (first order in time) and p = 2 (second order
in space).
Proof. We choose a function u ∈ C⁶ and perform a Taylor expansion. Then, we re-organize all terms such that we obtain the original PDE. The remainder terms constitute the order of the scheme.
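For illustration, the computation can be sketched as follows (a standard Taylor calculation, spelled out here for convenience): inserting a smooth solution u into the forward Euler scheme gives
\begin{align*}
\frac{u(t+k, x) - u(t, x)}{k} - \nu\,\frac{u(t, x+h) - 2u(t, x) + u(t, x-h)}{h^2}
&= \partial_t u + \frac{k}{2}\partial_{tt}u - \nu\,\partial_{xx}u - \nu\,\frac{h^2}{12}\partial_x^4 u + \mathcal{O}(k^2 + h^4) \\
&= \underbrace{\partial_t u - \nu\,\partial_{xx}u}_{=0} + \mathcal{O}(k + h^2),
\end{align*}
so that the truncation error is O(k + h²), i.e., q = 1 and p = 2.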

6.9.3 Stability
We now concentrate on stability. As outlined in Section 2.6, instability often results in non-physical oscillations of the numerical solution. The investigation of stability of finite difference schemes for PDEs simply follows the concepts learned for ODEs. For the stability, we need to define proper norms to work with. We first recall that uⁿ = (u_j^n)_{1≤j≤N} is our numerical solution. We define the following norm:
\[ \|u^n\|_p = \Big( \sum_{j=1}^N h|u_j^n|^p \Big)^{1/p} \]
for 1 ≤ p < ∞. For p = ∞, we use
\[ \|u^n\|_\infty = \max_{1\le j\le N} |u_j^n|. \]
Due to the weighting with h, the norm ‖uⁿ‖_p is identical with the L^p(0,1) norm of the piecewise constant function taking the value u_j^n on each half-open subinterval [x_j, x_{j+1}).


Definition 6.32 (Stability). A finite difference scheme is (unconditionally) stable with respect to the norm ‖·‖ defined before if there exists a constant K > 0, independent of the mesh sizes k and h, such that
\[ \|u^n\| \le K\|u^0\| \qquad \text{for } n \ge 0. \]
Here uⁿ is the numerical solution at time step t_n and u⁰ the initial value. A finite difference scheme is conditionally stable if the stability estimate only holds under certain conditions on h and k.
Definition 6.33 (Linear numerical schemes). A finite difference scheme is linear when F_{k,h}(u_{j+i}^{n+m}) is linear with respect to its arguments u_{j+i}^{n+m}.

For linear one-step schemes, the stability can be investigated in the equivalent formulation
\[ u^{n+1} = A u^n, \]
where A is a linear operator (a matrix) mapping R^N to R^N. For instance, for the backward Euler scheme we already know the matrix
\[ B := \begin{pmatrix} 1+2c & -c & & & 0 \\ -c & 1+2c & -c & & \\ & \ddots & \ddots & \ddots & \\ & & -c & 1+2c & -c \\ 0 & & & -c & 1+2c \end{pmatrix}, \]
so that here A is nothing else than the inverse of B, A = B⁻¹. For the forward Euler scheme, A can be written down directly:
\[ A = \begin{pmatrix} 1-2c & c & & & 0 \\ c & 1-2c & c & & \\ & \ddots & \ddots & \ddots & \\ & & c & 1-2c & c \\ 0 & & & c & 1-2c \end{pmatrix}. \]
If we now iterate, we obtain
\[ u^n = A^n u^0. \]
Consequently, the stability requirement becomes
\[ \|A^n u^0\| \le K\|u^0\| \qquad \text{for } n \ge 0,\; \forall u^0 \in \mathbb{R}^N. \]
From introductory classes on numerical analysis, we know the definition of the matrix norm:
\[ \|M\| = \sup_{u\in\mathbb{R}^N,\, u\neq 0} \frac{\|Mu\|}{\|u\|}. \]
Hence, the stability of the scheme is equivalent to
\[ \|A^n\| \le K, \qquad n \ge 0, \]

which means nothing else than that the powers of A are bounded.
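This boundedness can be inspected numerically. The following Octave sketch (parameter values are examples only) compares powers of the forward Euler matrix for c below and above the critical value 1/2 from Section 6.9.4:

% Powers of the forward Euler iteration matrix (sketch):
N = 49;  e = ones(N,1);
for c = [0.4, 0.6]                 % c = nu*k/h^2; stable iff 2c <= 1
  A = spdiags([c*e, (1-2*c)*e, c*e], -1:1, N, N);
  printf('c = %.1f:  ||A^100||_inf = %.3e\n', c, norm(full(A)^100, inf));
end
% for c = 0.4 the powers stay bounded, for c = 0.6 they blow up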

6.9.4 Stability in L∞
In the following, we investigate the stability in the L^∞ norm. This is linked to the discrete maximum principle.
Definition 6.34. A finite difference scheme satisfies the discrete maximum principle if for all n ≥ 0 and all 1 ≤ j ≤ N, it holds
\[ \min\big(0, \min_{0\le j\le N+1} u_j^0\big) \le u_j^n \le \max\big(0, \max_{0\le j\le N+1} u_j^0\big). \]
Using the maximum principle, we obtain the following result:


Proposition 6.35. The forward Euler scheme is stable in the L^∞ norm if and only if the CFL (Courant-Friedrichs-Lewy) condition
\[ 2\nu k \le h^2 \]
is satisfied. The implicit (backward) Euler scheme is unconditionally stable in the L^∞ norm.

6.9.5 Convergence
Having consistency and stability, we can now address the convergence of the scheme. The main theorem goes back to Lax, saying that for a linear numerical scheme, stability and consistency yield convergence. In practice, the theorem of Lax says that if we use a consistent scheme and do not observe any non-physical oscillations, then the numerical solution is 'close' to the sought (unknown) exact solution.
Theorem 6.36 (Lax). Let u(t, x) be a sufficiently smooth solution of the heat equation posed in Formulation 6.24. Let u_j^n be the discrete solution obtained by a finite difference scheme using the initial condition u_j^0 := u^0(x_j). We assume that the finite difference scheme is linear, a one-step scheme, consistent, and stable in the ‖·‖ norm. Then the scheme is convergent, and it holds for all T > 0
\[ \lim_{h,k\to 0} \Big( \sup_{t_n \le T} \|e^n\| \Big) = 0, \]
where e_j^n := u_j^n − u(t_n, x_j) is the error vector. Moreover, the order of convergence is p in space and q in time: for all T > 0, there exists a constant C_T > 0 such that
\[ \sup_{t_n \le T} \|e^n\| \le C_T\big(h^p + k^q\big). \]

Proof. We assume that we work with Dirichlet conditions. From before we know that
\[ u^{n+1} = A u^n. \]
Let u be an exact solution of Formulation 6.24. We denote vⁿ = (v_j^n)_{1≤j≤N} with v_j^n = u(t_n, x_j). Since the scheme is consistent, there exists a vector εⁿ such that
\[ v^{n+1} = A v^n + k\varepsilon^n \]
with lim_{h,k→0} ‖εⁿ‖ = 0, where the convergence is uniform for 0 ≤ t_n ≤ T. Furthermore,
\[ \|\varepsilon^n\| \le C(h^p + k^q). \]
We now use
\[ e_j^n = u_j^n - u(t_n, x_j). \]
Subtraction of the perturbed iteration and the exact iteration yields
\[ e^{n+1} = A e^n - k\varepsilon^n. \]
By recursion, we obtain
\[ e^n = A^n e^0 - k\sum_{i=1}^n A^{n-i}\varepsilon^{i-1}. \]
We now argue with the stability:
\[ \|u^n\| = \|A^n u^0\| \le K\|u^0\|, \quad \text{i.e.,} \quad \|A^n\| \le K. \]
Since e⁰ = 0, the recursion yields
\[ \|e^n\| \le k\sum_{i=1}^n \|A^{n-i}\|\,\|\varepsilon^{i-1}\| \le k\,n\,K\,C(h^p + k^q) \le T\,K\,C\,(h^p + k^q), \]
since nk = t_n ≤ T. This proves the assertion with C_T := TKC.

6.9.6 Extended exercise
Let T > 0 and ν > 0. We define the domains Ω := R and I := (0, T). Furthermore, let u⁰ ∈ C⁰(Ω) be a 1-periodic function, i.e., u⁰(x + 1) = u⁰(x). Find u(x, t) : Ω × I → R such that
\begin{align*}
\partial_t u(x,t) - \nu\,\partial_{xx} u(x,t) &= 0 && \text{in } \Omega \times I, \\
u(x, 0) &= u^0(x) && \text{in } \Omega \times \{0\}.
\end{align*}
We assume that this problem is well-posed. In the following, we are interested in:
1. Mass conservation: Show that for all t ∈ I, it holds
\[ \int_0^1 u(x, t)\, dx = \int_0^1 u^0(x)\, dx. \]
2. Energy dissipation: Show that for all t ∈ I, it holds
\[ \frac{1}{2}\int_0^1 |u(x, t)|^2\, dx \le \frac{1}{2}\int_0^1 |u^0(x)|^2\, dx. \]

3. Discretization with a θ-scheme in time (see also Section 12.3.2). Let θ ∈ [0, 1] and N ∈ N with N ≥ 1. We define the time step size as k = T/N. For all 0 ≤ n ≤ N, we introduce the discrete time points t_n = nk. For the spatial discretization, we work with J ∈ N, J ≥ 1. The spatial mesh size is defined by h = 1/J (since u is 1-periodic, we only consider the interval (0, 1)). We now introduce the discrete nodes x_j = jh for −1 ≤ j ≤ J + 1. The θ-discretization of the heat equation yields:
\[ \frac{U_j^{n+1} - U_j^n}{k} - \theta\nu\,\frac{U_{j+1}^{n+1} - 2U_j^{n+1} + U_{j-1}^{n+1}}{h^2} - (1-\theta)\nu\,\frac{U_{j+1}^n - 2U_j^n + U_{j-1}^n}{h^2} = 0 \]
for all 0 ≤ n ≤ N − 1 and for all 1 ≤ j ≤ J.
Task 1: Justify for yourself that the θ-discretization makes sense and corresponds to the finite differences known from classes introducing numerical methods.
We now add the boundary conditions. Because of the 1-periodicity, we have:
\[ \forall\, 0 \le n \le N-1: \qquad U_0^n = U_J^n, \qquad U_{J+1}^n = U_1^n. \]
It remains to pose the initial conditions at n = 0:
\[ \forall\, 1 \le j \le J: \qquad U_j^0 = u^0(x_j). \]
We further introduce
\[ C_d = \nu\,\frac{k}{h^2}. \]
For all 0 ≤ n ≤ N, we introduce the vector Uⁿ = (U_i^n)_{1≤i≤J} ∈ R^J.
Task 2: Write the θ-scheme in the following matrix form for 0 ≤ n ≤ N − 1:
\[ \big(B_I^\theta\big)^{-1} U^{n+1} = B_E^\theta\, U^n \]
with
\[ B_I^\theta = (I + \theta C_d B)^{-1}, \qquad B_E^\theta = I - (1-\theta) C_d B, \]
where I ∈ R^{J×J} is the identity matrix and B ∈ R^{J×J} a quadratic matrix (please explain its entries!).
Task 3: What happens to B_I^θ and B_E^θ when θ = 0 or θ = 1?
Task 4: Please justify furthermore the existence of B_I^θ.

4. Truncation error / convergence error. We introduce the vector eⁿ ∈ R^J with
\[ e_j^n = U_j^n - V_j^n, \]
where the exact solution evaluated at the discrete time and spatial points is denoted by Vⁿ = (V_i^n)_{1≤i≤J} = (u(x_i, t_n))_{1≤i≤J} ∈ R^J. Because of the boundary conditions, we have
\[ e_0^n = e_J^n \quad \text{and} \quad e_{J+1}^n = e_1^n \]
for 0 ≤ n ≤ N − 1, and e_j^0 = 0 for 1 ≤ j ≤ J. Show that for 0 ≤ n ≤ N − 1, it holds
\[ \big(B_I^\theta\big)^{-1} e^{n+1} = B_E^\theta\, e^n - k\,\eta^n. \]
Furthermore, please provide the details of the truncation error ηⁿ ∈ R^J.
5. Consistency. Show that the θ-scheme is consistent of order 1 in time and 2 in space. Please specify which regularity of the exact solution is necessary to obtain these (optimal) results. Finally, what happens for the choice θ = 0.5?

6.10 Chapter summary and outlook


In this chapter, we considered finite differences for elliptic problems (and, in Section 6.9, for time-dependent problems). Although finite elements and isogeometric analysis have become more important in general, finite differences still serve their purposes because they are simple to implement and still exist in large industrial codes. FD methods are nevertheless being further developed (mimetic finite differences, etc.), and are also often used in combination with other methods such as finite elements. For instance, a widely used discretization method for initial boundary value problems is to discretize in time with a finite difference scheme (e.g., One-Step-θ) and in space with a finite element method, which is the topic of Section 12. But let us first investigate tools from functional analysis, approximation theory, and FEM in the next two chapters.


7 Functional analysis and approximation theory


In this chapter, we gather the ingredients from analysis, functional analysis, and approximation theory needed to study PDEs from a theoretical point of view. Most results have immediate practical consequences for the development of good numerical schemes.

7.1 Analysis, linear algebra, functional analysis and Sobolev spaces


Functional analysis is the combination of classical analysis (i.e., calculus) and linear algebra. The latter is introduced for finite-dimensional spaces in first-semester classes. In functional analysis, infinite-dimensional vector spaces are considered, which require tools from analysis such as functions, operators, and convergence results. The extension to infinite dimensions leads to many nontrivial results. Classical books are [4, 28, 87, 134, 146].
We cannot provide a complete introduction, but only the most important concepts necessary for the further understanding of the present notes. At the beginning, on the next two pages, we partially follow the spirit of Tröltzsch [131][p. 17-19] since therein a nice, compact introduction to the basic tools is provided. The analysis results (in particular the Lebesgue integral) can be studied in more detail with the help of Königsberger [86]. Before we begin, we refer the reader to Section 3.11, in which the definition of a normed space is recapitulated.
7.1.1 Convergence and complete spaces


Definition 7.1 (Convergence). A sequence (u_n)_{n∈N} ⊂ {V, ‖·‖} converges if there exists a limit u ∈ V such that
\[ \lim_{n\to\infty} \|u_n - u\| = 0, \]
or in other notation,
\[ \lim_{n\to\infty} u_n = u. \]

Definition 7.2 (Cauchy sequence). A sequence is called a Cauchy sequence if for all ε > 0 there exists N ∈ N such that
\[ \|u_n - u_m\| \le \varepsilon \qquad \forall n > N,\; \forall m > N. \]
Every sequence that converges is a Cauchy sequence. The converse is in general not true.
Example 7.3. Let Ω = (0, 2). We consider the space {C(Ω), ‖·‖_{L²}} with
\[ u_n(x) = \min\{1, x^n\}. \]
We see (check yourself; one possible computation is sketched after this example) that
\[ \|u_n - u_m\|_{L^2}^2 = \int_0^2 (u_n - u_m)^2\, dx = \dots \le \frac{2}{2m+1} \]
for m ≤ n. It is clear that this is a Cauchy sequence. However, the limit u is not contained in the space {C(Ω), ‖·‖_{L²}} because it is not a continuous function:
{C(Ω), k · kL2 } because it is not a continuous function:

 0, 0 ≤ x ≤ 1

u(x) = lim un (x) =
n→∞
 1, 1 ≤ x ≤ 0

A very important class of function spaces are those in which all Cauchy sequences do converge:

Definition 7.4 (Complete space, Banach space). A normed space {V, k · kV } is a complete space, if all
Cauchy sequences converge (i.e., the Cauchy sequence has a limit that belongs to V ). A complete, normed
space is a so-called Banach space.
Example 7.5. We continue our examples:

1. The set Q of rational numbers is not complete (e.g., [87]).
2. R^n with the Euclidean norm \(|x| = \big(\sum_{i=1}^n x_i^2\big)^{1/2}\) is a Banach space.
3. C(Ω) endowed with the maximum norm
\[ \|u\|_{C(\Omega)} = \max_{x\in\Omega} |u(x)| \]
is a Banach space.
4. The space {C(Ω), ‖·‖_{L²}} with
\[ \|u\|_{L^2} = \Big( \int_\Omega u(x)^2\, dx \Big)^{1/2} \]
is not a Banach space; see the previous example, in which we have shown that the limit u does not belong to C(Ω) when we work with the L² norm.
Remark 7.6. The last example has several consequences for our variational formulations. So far, we have mainly worked and proven our results using classical function spaces C, C^k, C^∞. On the other hand, we work with integrals, e.g., \((\nabla u, \nabla\varphi) = \int_\Omega \nabla u\cdot\nabla\varphi\, dx\). Observing that {C(Ω), ‖·‖_{L²}} is not a Banach space and that limits may not be contained in C(Ω) will influence well-posedness and convergence results (recall that we later try to prove results similar to those for finite differences).

7.1.2 Scalar products, orthogonality, Hilbert spaces in 1D


In R^n a very important concept is orthogonality, which does not come automatically with the Banach-space property. Spaces in which a scalar product (inner product) can be defined allow the definition of orthogonality. These spaces play a crucial role in PDEs and functional analysis.
Definition 7.7 (Pre-Hilbert space). Let V be a linear space (see Section 3.9). A mapping (·,·) : V × V → R is called a scalar product if for all u, v, w ∈ V and λ ∈ R it holds:
\begin{align*}
&(u, u) \ge 0 \quad \text{and} \quad (u, u) = 0 \;\Leftrightarrow\; u = 0, \\
&(u, v) = (v, u), \\
&(u + v, w) = (u, w) + (v, w), \\
&(\lambda u, v) = \lambda (u, v).
\end{align*}
If a scalar product is defined on V, this space is called a pre-Hilbert space.


Definition 7.8 (Hilbert space). A complete space endowed with an inner product is called a Hilbert space. The norm is defined by
\[ \|u\| := \sqrt{(u, u)}. \]
Example 7.9. The space Rn from before has a scalar product and is complete, thus a Hilbert space. The
space {C(Ω), k · kL2 } has a scalar product, but is not complete, and therefore not a Hilbert space. The space
{C(Ω), k · kC(Ω) } is complete, but the norm is not induced by a scalar product and is therefore not a Hilbert
space, but only a Banach space.
Specifically, for studying PDEs we provide the following further examples:
Definition 7.10 (The L² space in 1D). Let Ω = (a, b) be an interval (recall the 1D Poisson problem). The space of square-integrable functions on Ω is defined by
\[ L^2(\Omega) = \Big\{ v : \int_\Omega v^2\, dx < \infty \Big\}. \]
The space L² is a Hilbert space equipped with the scalar product
\[ (v, w) = \int_\Omega v w\, dx \]

and the induced norm
\[ \|v\|_{L^2} := \sqrt{(v, v)}. \]
Using Cauchy's inequality
\[ |(v, w)| \le \|v\|_{L^2}\|w\|_{L^2}, \]
we observe that the scalar product is well-defined when v, w ∈ L². A mathematically fully rigorous definition must specify in which sense (Riemann or Lebesgue) the integral exists. In general, all L^p spaces are defined in the sense of the Lebesgue integral (see for instance [86][Chapter 7]).
Definition 7.11 (The H¹ space in 1D). We define the H¹(Ω) space with Ω = (a, b) as
\[ H^1(\Omega) = \{ v : v \text{ and } v' \text{ belong to } L^2 \}. \]
This space is equipped with the following scalar product:
\[ (v, w)_{H^1} = \int_\Omega (v w + v' w')\, dx \]
and the norm
\[ \|v\|_{H^1} := \sqrt{(v, v)_{H^1}}. \]
Definition 7.12 (The H01 space in 1D). We define the H01 (Ω) space with Ω = (a, b) as
H01 (Ω) = {v ∈ H 1 (Ω) : v(a) = v(b) = 0}.
The scalar product is the same as for the H 1 space.
Remark 7.13. With the help of the space H¹₀, we are now able to re-state the 1D Poisson problem. Rather than working with the space V (see (72)), we work with H¹₀. In fact, this is the largest space in which the 1D Poisson problem can be defined:
Find u ∈ H01 (Ω) : a(u, φ) = l(φ) ∀φ ∈ H01 (Ω)
with
a(u, φ) = (u0 , φ0 ), l(φ) = (f, φ).
We also recall Definition 3.23 of a dual space. For instance, the dual space of H¹₀ is the space H⁻¹.
Definition 7.14 (Notation). When there is no confusion with the domain, we often write
L2 := L2 (Ω), H 1 := H 1 (Ω)
and so forth.

7.1.3 Lebesgue measures and integrals


To formulate variational formulations, we have so far worked with function spaces, mostly denoted by V, that contain integral information and classical continuous functions.
However, to enjoy the full beauty of variational formulations, the Riemann integral is too limited (see textbooks on analysis such as [86][Chapter 7]), and the Lebesgue integral is usually used. The latter admits a larger class of integrable functions and yields completeness, which is the biggest advantage in comparison to the Riemann integral. In consequence, one also obtains deep results on exchanging limiting processes, such as passing to the limit with sequences under the integral (see Chapter 8 in [86]). For instance, when does
\[ \int f\, dx = \lim_{k\to\infty} \int f_k\, dx \qquad \text{for } f_k \to f \]
hold true?
Based on the Lebesgue integral, we can then formulate function spaces with derivative information, so-called Sobolev spaces, which become Hilbert spaces when a scalar product can be defined, i.e., when the basic underlying space is the L² space. The key feature of the Lebesgue integral is that Lebesgue-measurable functions are considered for integration [86][Chapter 7]. Here, sets of Lebesgue measure zero play an important role.
Definition 7.15 (Lebesgue measurable, [86], Chapter 7.5). A set Ω ⊂ R^n is Lebesgue measurable (or in
short measurable) when the characteristic function 1 can be integrated over Ω. The number

  V(Ω) := V^(n)(Ω) := ∫_Ω 1 dx = ∫_{R^n} 1_Ω dx

is the n-dimensional volume or Lebesgue measure of Ω. In the case n = 1, this is the length of the interval.
For n = 2 it is the area. For n = 3 it is the classical three-dimensional volume. The empty set has Lebesgue
measure zero.
Definition 7.16 (Zero set in the sense of Lebesgue, [86], Chapter 7.6). A set N ⊂ Rn is a Lebesgue-set of
measure zero if N is measurable with measure zero: V (N ) = 0.
Example 7.17. We give three examples:
• The set Q of rational numbers has Lebesgue measure zero in R (see for instance [86][Chapter 7.6]).
• In Rn all subsets Rs with dimension s < n have Lebesgue measure zero. (see for instance [86][Chapter
7.6]).
The second example has important consequences for PDEs: boundaries Γ ⊂ Rn−1 of domains Ω ⊂ Rn have
Lebesgue measure zero. To illustrate this in R1 , we have a third example
• Consider R^1 and Ω = [−1, 1] and define

    f(x) := 1 for x ≥ 0,  f(x) := 0 for x < 0,   and   g(x) := 1 for x > 0,  g(x) := 0 for x ≤ 0.

  The two functions f and g differ in only one point, {0}, which is a set of measure zero in the Lebesgue
  sense. They are viewed as representing the same function.
Definition 7.18 (Almost everywhere, a.e.). Let Ω ⊂ R^n be a Lebesgue-measurable open domain. Measurable
functions are defined almost everywhere, in short a.e., in Ω. When we change the function values on a
subset of Lebesgue measure zero, we do not change the function itself. In other words,

f (x) = g(x) almost everywhere in Ω

when there is T ⊂ Ω such that the Lebesgue measure of T is zero and

f (x) = g(x) for all x ∈ Ω \ T.

Short: Almost everywhere means except on a set of Lebesgue measure zero.


Definition 7.19 (L^2 space in R^n). Let Ω ⊂ R^n be a Lebesgue-measurable open domain. The space L^2 = L^2(Ω)
contains all square-integrable functions in Ω:

  L^2 = {v is Lebesgue measurable | ∫_Ω v^2 dx < ∞}.

The space L^2 is complete (each Cauchy sequence converges in the norm defined below) and thus a Banach
space. For a proof see for instance [87][Section 2.2-7] or [134][Kapitel I]. Moreover, we can define a scalar
product

  (u, v) = ∫_Ω u v dx

such that L^2 is even a Hilbert space with norm

  ‖u‖_{L^2} = √(u, u).

In the same way, one obtains all L^p spaces:

Definition 7.20 (L^p spaces). Let 1 ≤ p < ∞. Then,

  L^p = {v is Lebesgue measurable | ∫_Ω |v|^p dx < ∞},

equipped with the norm

  ‖v‖_{L^p} := ( ∫_Ω |v|^p dx )^{1/p}.

For p = ∞, we define the norm via

  ‖v‖_{L^∞} := ess sup |v|;

the space L^∞ contains the essentially bounded functions.
Proposition 7.21. Let Ω be bounded. Then:
L∞ ⊂ . . . ⊂ Lp ⊂ Lp−1 ⊂ . . . ⊂ L2 ⊂ L1 .
We recall several important results from analysis that have an important impact on variational formulations
of PDEs and numerics. The first result is:
Definition 7.22 (Density, e.g., [87]). A subset U of a metric space V is dense in V when

  Ū = V.

Here Ū denotes the closure of U. Every point of V is either in U or arbitrarily close to a member of U.
Example 7.23. Two examples:
• The set Q of rational numbers is dense in R (e.g., [87]).
• The space of polynomials is dense in C[a, b] (see lectures on Analysis 1, e.g., [85]). In other
words: for each f ∈ C[a, b], there exists a sequence (P_k)_{k∈N} of polynomials which converges uniformly
to f. This is well known as the Weierstrass approximation theorem.
Theorem 7.24 (Density result for L^2). The space C_c^∞(Ω) is dense in L^2(Ω). In other words: for all functions
f ∈ L^2, there exists a sequence (f_n)_{n∈N} ⊂ C_c^∞(Ω) such that

  lim_{n→∞} ‖f − f_n‖ = 0.

Remark 7.25. The previous result means that we can approximate any L2 function by a sequence of ‘very
nice’, smooth, functions defined in the usual classical sense.
Remark 7.26 (Density results). In analysis and functional analysis, density arguments are often adopted
since they allow one to work with simple functions in order to approximate 'abstract' and complicated functions.
As a second result, we generalize the fundamental lemma of calculus of variations:

Theorem 7.27 (Fundamental lemma of calculus of variations). Let f ∈ L^2(Ω). If

  ∫_Ω f φ dx = 0   ∀φ ∈ C_c^∞(Ω),

then f(x) = 0 almost everywhere in Ω.

Proof. For f ∈ L^2, according to the density result there exists a sequence (f_n)_{n∈N} ⊂ C_c^∞(Ω) with

  ‖f_n − f‖ → 0  (n → ∞).

This yields

  ‖f‖²_{L^2} = ∫_Ω f² dx = (f, f) = (f, f) − (f, f_n) = (f, f − f_n) ≤ ‖f‖_{L^2} ‖f − f_n‖_{L^2} → 0  for n → ∞,

where (f, f_n) = 0 by assumption, since f_n ∈ C_c^∞(Ω). Therefore, ‖f‖²_{L^2} = 0 and thus f = 0 a.e.

Remark 7.28 (Historical note). The first version of the fundamental lemma of calculus of variations appeared
around 1755/1756 when Lagrange found a solution to ∫_x^z F(x, y, y') dx = min and Euler requested a more
precise proof from him. Euler then gave in 1756 the name 'variational calculus' in honor of Lagrange.

7.1.4 Weak derivatives
The definition of the space L2 now allows us to define a more general derivative expression, the so-called
weak derivative. In fact, this is the form of derivatives that we have implicitly used so far in our variational
formulations.
Definition 7.29 (Weak derivative). Let Ω ⊂ R^n be an open, measurable domain. Let u ∈ L^2. We say
that u is weakly differentiable in L^2 when functions w_i ∈ L^2, i = 1, . . . , n, exist such that for all functions
φ ∈ C_c^∞(Ω) we have

  ∫_Ω u ∂_{x_i}φ dx = − ∫_Ω w_i(x) φ(x) dx.

In our usual compact form:

  (u, ∂_{x_i}φ) = −(w_i, φ).

Each w_i is the i-th partial derivative of u. Thus

  ∂_{x_i}u = w_i  in the weak sense.

We notice that there are no boundary terms since φ has compact support in Ω and therefore vanishes near the
boundary ∂Ω. We also recognize that the weak derivative is nothing else than integration by parts in disguise.
Remark 7.30 (Example). As the terminology indicates, the weak derivative is weaker than the usual (strong)
derivative expression. In classical function spaces both coincide.
Example 7.31 ([131]). Consider u(x) = |x| in Ω = (−1, 1). Then, the weak derivative is given by

  u'(x) := w(x) = −1 for x ∈ (−1, 0),   w(x) = +1 for x ∈ [0, 1).

Indeed, for all v ∈ C_c^∞(Ω) it holds

  ∫_{−1}^{1} |x| v'(x) dx = ∫_{−1}^{0} (−x) v'(x) dx + ∫_{0}^{1} x v'(x) dx = . . . = − ∫_{−1}^{1} w(x) v(x) dx.

We notice that it is not important which value u' takes at zero since {0} is a set of measure zero; see Example
7.17.
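The integration-by-parts identity in Example 7.31 can be checked symbolically. The following is a small
sketch (assuming SymPy is available); the test function φ(x) = (1 − x²)²(x + 2) is a hypothetical choice
that vanishes at x = ±1 and thereby mimics the compact support:

    # Sketch (assumption: SymPy is available): verify (u, phi') = -(w, phi)
    # for u(x) = |x| on (-1, 1) and the candidate weak derivative w = sign(x).
    import sympy as sp

    x = sp.symbols('x')
    phi = (1 - x**2)**2 * (x + 2)       # hypothetical test function, phi(+-1) = 0

    # Left-hand side: int u * phi' (split at 0 because u = |x|)
    lhs = sp.integrate(-x * sp.diff(phi, x), (x, -1, 0)) \
        + sp.integrate(x * sp.diff(phi, x), (x, 0, 1))

    # Right-hand side: -int w * phi with w = -1 on (-1, 0) and w = +1 on (0, 1)
    rhs = -(sp.integrate(-phi, (x, -1, 0)) + sp.integrate(phi, (x, 0, 1)))

    print(sp.simplify(lhs - rhs))       # prints 0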
Example 7.32 ([51]). Let n = 1 and Ω = (0, 2) and

  u(x) = x for 0 < x ≤ 1,   u(x) = 2 for 1 < x < 2.

The function u has no weak derivative! There does not exist any function v ∈ L^1_loc that satisfies the relation

  ∫_0^2 u φ' dx = − ∫_0^2 v φ dx   ∀φ ∈ C_c^∞(Ω).

We prove this by contradiction. Suppose the weak derivative v exists. Then for all φ:

  − ∫_0^2 v φ dx = ∫_0^2 u φ' dx = ∫_0^1 x φ' dx + 2 ∫_1^2 φ' dx = − ∫_0^1 φ dx − φ(1).

We choose a sequence of test functions (φ_m)_{m∈N} with

  0 ≤ φ_m ≤ 1,   φ_m(1) = 1,   φ_m(x) → 0

for all x ≠ 1. Taking φ := φ_m and letting m → ∞, we obtain

  1 = lim_{m→∞} φ_m(1) = lim_{m→∞} [ ∫_0^2 v φ_m dx − ∫_0^1 φ_m dx ] = 0.

This is a contradiction.

We can simply extend the weak derivative to higher-order derivatives. Here we remind the reader of the
multiindex notation presented in Section 3.6. It holds:

Definition 7.33. Let u ∈ C^k(Ω̄) and v ∈ C_c^k(Ω) and α a multiindex with |α| ≤ k. Then, employing
integration by parts multiple times, we obtain

  ∫_Ω u(x) D^α v(x) dx = (−1)^{|α|} ∫_Ω v(x) D^α u(x) dx.

Definition 7.34 (Higher-order weak derivative). Let Ω ⊂ R^n be an open, measurable domain. Let u be locally
integrable in the sense of Lebesgue and α be a multiindex. If there exists a locally integrable function w with

  ∫_Ω u(x) D^α v(x) dx = (−1)^{|α|} ∫_Ω v(x) w(x) dx   for all test functions v,

then w =: D^α u is the so-called weak derivative (of higher order).

7.1.5 The Hilbert spaces H 1 and H01


With the help of weak derivatives, we now define Lebesgue spaces with (weak) derivatives: Sobolev spaces.
When their norms are induced by a scalar product, namely for L^2-based spaces, a Sobolev space is a Hilbert
space.

Definition 7.35 (The space H^1). Let Ω ⊂ R^n be an open, measurable domain. The space H^1 := H^1(Ω) is
defined by

  H^1 := {v ∈ L^2(Ω) | ∂_{x_i}v ∈ L^2, i = 1, . . . , n}.

In compact form:

  H^1 := {v ∈ L^2(Ω) | ∇v ∈ L^2}.

The nabla operator has been defined in Section 3.7.

Remark 7.36. In physics and mechanics, the space H^1 is also called the energy space. The associated
norm is called the energy norm.

Proposition 7.37 (H^1 is a Hilbert space). We define the scalar product

  (u, v)_{H^1} := ∫_Ω (u v + ∇u · ∇v) dx,

which induces the norm

  ‖u‖_{H^1} = √(u, u)_{H^1}.

The space H^1 equipped with the norm ‖u‖_{H^1} is a Hilbert space.
Proof. It is trivial to see that (u, v)_{H^1} defines a scalar product. It remains to show that H^1 is complete.
Let (u_n)_{n∈N} be a Cauchy sequence in H^1. We need to show that this Cauchy sequence converges in H^1.
Specifically, (u_n)_{n∈N} and (∂_{x_i}u_n)_{n∈N} are Cauchy sequences in L^2. Since L^2 is complete, see
Definition 7.19, there exist limits u and w_i with

  u_n → u in L^2  for n → ∞,
  ∂_{x_i}u_n → w_i in L^2  for n → ∞.

We now employ the weak derivative:

  ∫_Ω u_n(x) ∂_{x_i}φ(x) dx = − ∫_Ω ∂_{x_i}u_n(x) φ(x) dx,   φ ∈ C_c^∞.

Passing to the limit n → ∞ yields

  ∫_Ω u(x) ∂_{x_i}φ(x) dx = − ∫_Ω w_i(x) φ(x) dx.

This shows that u has the weak derivatives ∂_{x_i}u = w_i. Therefore, u ∈ H^1 and u_n → u in H^1,
which shows the assertion.
Remark 7.38. The space H^1 is the closure of C^1(Ω̄) with respect to the H^1 norm. For Ω regular, open,
and bounded, it also holds that C_c^∞(Ω̄) is dense in H^1(Ω).

7.1.6 Hilbert spaces versus classical function spaces
Even though we can approximate Lebesgue-measurable functions in L^2 or H^1 by functions from C_c^∞(Ω), the
limit function may not have nice properties. The most well-known example is the following:
Proposition 7.39 (Singularities of H 1 functions). Let Ω ⊂ Rn be open and measurable. It holds:
• For n = 1, H 1 (Ω) ⊂ C(Ω).
• For n ≥ 2, functions in H 1 (Ω) are in general neither continuous nor bounded.
Proof. Let n = 1, Ω = [a, b], and v ∈ C^∞(Ω). Then, for |x − y| < δ and using Cauchy-Schwarz,

  |v(x) − v(y)| = | ∫_x^y v'(t) dt | ≤ ( ∫_x^y 1² dt )^{1/2} ( ∫_x^y |v'(t)|² dt )^{1/2} ≤ √δ ‖v‖_{H^1}.

Consequently, each Cauchy sequence in H^1 ∩ C^∞ is equicontinuous and bounded. Using Arzelà-Ascoli (for a
general version we refer to [134]), the limit function is continuous.
Now let n = 2. Let the unit disk be defined as

  D := {(x, y) ∈ R² | x² + y² < 1}.

We define the function

  u(x, y) = log log (2/r),

where r is the radius, r² = x² + y². It is obvious that u is unbounded at the origin. On the other hand,
u ∈ H^1 because

  ∫_0^{1/2} 1/(r log² r) dr < ∞.

For n ≥ 3, we may take

  u(x) = r^{−α},   0 < α < (n − 2)/2.

It holds u ∈ H^1 with a singularity at the origin. The higher n is, the more significant the singularities
of H^1 functions can be.
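The 2D counterexample can also be illustrated numerically: the Dirichlet energy of u = log log(2/r) on the
unit disk is finite although u itself blows up. A minimal sketch (assuming NumPy and SciPy are available;
the closed-form value 2π/log 2 follows from the substitution t = log(2/r)):

    # Sketch (assumptions: NumPy/SciPy available): the Dirichlet energy of
    # u = log(log(2/r)) on the unit disk is finite, although u blows up at r = 0.
    import numpy as np
    from scipy.integrate import quad

    # |grad u|^2 = 1/(r log(2/r))^2; in polar coordinates the energy is
    # 2*pi * int_0^1 r * |grad u|^2 dr = 2*pi * int_0^1 dr / (r log(2/r)^2).
    integrand = lambda r: 2.0 * np.pi / (r * np.log(2.0 / r) ** 2)

    val, err = quad(integrand, 0.0, 1.0)        # integrable endpoint singularity
    print(val, 2.0 * np.pi / np.log(2.0))       # both ~ 9.0647

    for r in (1e-2, 1e-6, 1e-12):               # the function itself is unbounded
        print(r, np.log(np.log(2.0 / r)))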
Definition 7.40 (H^1_0). The space H^1_0(Ω) contains the H^1 functions with vanishing values on the boundary
∂Ω. Precisely, H^1_0(Ω) is defined as the closure of C_c^∞(Ω) in H^1.
One often writes:
Definition 7.41 (H01 ). The space H01 (spoken: ‘H,1,0’ and not ‘H,0,1’) is defined by:

H01 (Ω) = {v ∈ H 1 | v = 0 on ∂Ω}.

7.1.7 Sobolev spaces


Lebesgue spaces L^p with p ≠ 2 do not have scalar products, but are of course still complete. Introducing
weak derivatives yields the so-called Sobolev spaces. For p = 2 a Sobolev space is a Hilbert space.

Definition 7.42 (Higher-order Sobolev spaces). Let 1 ≤ p < ∞ and k ∈ N. The normed space W^{k,p}(Ω)
contains all functions v ∈ L^p whose derivatives D^α v with |α| ≤ k also belong to L^p. The norm is defined by

  ‖v‖_{W^{k,p}} = ( Σ_{|α|≤k} ∫_Ω |D^α v(x)|^p dx )^{1/p}.

For p = ∞ we define

  ‖v‖_{W^{k,∞}} = max_{|α|≤k} ‖D^α v‖_{L^∞}.

The spaces W^{k,p} are Banach spaces. For p = 2, we obtain Hilbert spaces. For instance, choosing k = 1 and
p = 2 yields

  H^1 := W^{1,2}.

7.1.8 Brief summary of L^2, H^1 and H^2

Let Ω ⊂ R^n be a Lebesgue-measurable domain.

• L^2(Ω) space:
    L^2 = {v is Lebesgue measurable | ∫_Ω v² dx < ∞}.

• L^2 scalar product:
    (u, v)_{L^2} := ∫_Ω u · v dx.

• L^2 norm:
    ‖u‖_{L^2} := √(u, u)_{L^2} = ( ∫_Ω |u|² dx )^{1/2}.

• H^1(Ω) space:
    H^1 = {v ∈ L^2(Ω) | ∂_{x_i}v ∈ L^2(Ω), i = 1, . . . , n}.

• H^1 scalar product:
    (u, v)_{H^1} := ∫_Ω [u · v + ∇u · ∇v] dx.

• H^1 norm:
    ‖u‖_{H^1} := √(u, u)_{H^1} = ( ∫_Ω [|u|² + |∇u|²] dx )^{1/2}.

• H^1 seminorm (not a norm, because in general |u|_{H^1} = 0 does not imply u = 0):
    |u|_{H^1} := √(∇u, ∇u)_{L^2} = ( ∫_Ω |∇u|² dx )^{1/2}.

• H^2(Ω) space:
    H^2 = {v ∈ L^2(Ω) | ∂_{x_i}v ∈ L^2(Ω), ∂_{x_i x_j}v ∈ L^2(Ω), i, j = 1, . . . , n}.

• H^2 scalar product:
    (u, v)_{H^2} := ∫_Ω [u · v + ∇u · ∇v + ∇²u : ∇²v] dx.

• H^2 norm:
    ‖u‖_{H^2} := √(u, u)_{H^2} = ( ∫_Ω [|u|² + |∇u|² + |∇²u|²] dx )^{1/2}.

• H^2 seminorm (not a norm, because in general |u|_{H^2} = 0 does not imply u = 0):
    |u|_{H^2} := √(∇²u, ∇²u)_{L^2} = ( ∫_Ω |∇²u|² dx )^{1/2}.
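These norms are easily approximated numerically. A minimal sketch (assuming NumPy; the test function
u(x) = sin(πx) on Ω = (0, 1) is an illustrative choice with known values ‖u‖_{L^2} = 1/√2 and
|u|_{H^1} = π/√2):

    # Sketch (assumption: NumPy available): approximate the L2 norm, H1 norm and
    # H1 seminorm of u(x) = sin(pi x) on (0, 1) by trapezoidal quadrature.
    import numpy as np

    x = np.linspace(0.0, 1.0, 2001)
    u = np.sin(np.pi * x)
    du = np.pi * np.cos(np.pi * x)          # u' (known exactly for this test)

    l2 = np.sqrt(np.trapz(u**2, x))         # ||u||_{L2}   -> 1/sqrt(2)
    h1_semi = np.sqrt(np.trapz(du**2, x))   # |u|_{H1}     -> pi/sqrt(2)
    h1 = np.sqrt(l2**2 + h1_semi**2)        # ||u||_{H1}

    print(l2, h1_semi, h1)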

7.2 Useful Inequalities
The Hölder inequality reads:

Proposition 7.43 (Hölder's inequality). Let 1 ≤ p, q ≤ ∞ with 1/p + 1/q = 1 (with the convention 1/∞ = 0).
Let f ∈ L^p and g ∈ L^q. Then,

  f g ∈ L^1,

and

  ‖f g‖_{L^1} ≤ ‖f‖_{L^p} ‖g‖_{L^q}.

Proof. See Werner [134].

Corollary 7.44 (Cauchy-Schwarz inequality). Set p = q = 2. Then

  ‖f g‖_{L^1} ≤ ‖f‖_{L^2} ‖g‖_{L^2}.

Proposition 7.45 (Minkowski's inequality). Let 1 ≤ p ≤ ∞ and let f ∈ L^p and g ∈ L^p. Then,

  ‖f + g‖_{L^p} ≤ ‖f‖_{L^p} + ‖g‖_{L^p}.

Proposition 7.46 (Cauchy's inequality with ε). For a, b > 0 and ε > 0,

  ab ≤ ε a² + b²/(4ε).

For ε = 1/2, the original Cauchy inequality ab ≤ (a² + b²)/2 is obtained.

Proposition 7.47 (Young's inequality). Let 1 < p, q < ∞ and 1/p + 1/q = 1. Then,

  ab ≤ a^p/p + b^q/q,   a, b > 0.

7.2.1 Some compactness results


A fundamental result in proving well-posedness of PDEs in infinite-dimensional spaces (recall that bounded-
ness and closedness together do not imply compactness in infinite-dimensional spaces) is the following:
Theorem 7.48 (Rellich). Let Ω be an open, bounded domain of class C^1. The embedding of H^1 into L^2 is
compact. In other words: from each bounded sequence (u_n) ⊂ H^1, we can extract a subsequence (u_{n_l})
that converges in L^2.
Proof. See for instance Wloka [145].
We have seen that H^1 in dimension n ≥ 2 is not embedded into classical function spaces. However, higher-
order Sobolev spaces have continuous representatives. These are the so-called Sobolev embedding theorems.
One result is:

Proposition 7.49. Let ‖u‖_∞ be the supremum norm. Let Ω ⊂ R² be convex polygonal or with Lipschitz
boundary. Then:

  H^2(Ω) ⊂⊂ C(Ω̄),

and with c > 0 it holds

  ‖u‖_∞ ≤ c ‖u‖_{H^2}  for u ∈ H^2(Ω).

Moreover, for each open and connected domain Ω ⊂ R², it holds:

  H^2(Ω) ⊂⊂ C(Ω).

7.3 Approximation theory with emphasis on the projection theorem
The best approximation is based on findings from approximation theory, which we briefly recapitulate in
the following.
We recall the main ingredients of orthogonal projections as they yield the best approximation properties.

7.3.0.1 Best approximation


Definition 7.50. Let U ⊂ X be a subset of a normed space X and w ∈ X. An element v ∈ U is called a best
approximation to w in U if

  ‖w − v‖ = inf_{u∈U} ‖w − u‖,

i.e., v ∈ U has the smallest distance to w.


Proposition 7.51. Let U be a finite-dimensional subspace of a normed space X. Then, there exists for each
element in X a best approximation in U .
Proof. Let w ∈ X. We choose a minimizing sequence (u_n) from U to approximate w. This fulfills

  ‖w − u_n‖ → d  for n → ∞,

where d := inf_{u∈U} ‖w − u‖. Since

  ‖u_n‖ = ‖u_n‖ − ‖w‖ + ‖w‖ ≤ | ‖u_n‖ − ‖w‖ | + ‖w‖ ≤ ‖w − u_n‖ + ‖w‖,

the sequence (u_n) is bounded. Because U is a finite-dimensional normed space, there exists a convergent
subsequence (u_{n(l)}) with

  lim_{l→∞} u_{n(l)} = v ∈ U.

It follows that

  ‖w − v‖ = lim_{l→∞} ‖w − u_{n(l)}‖ = d.

Definition 7.52. A pre-Hilbert space is a linear space with a scalar product.


Proposition 7.53. Let U be a linear subspace of a pre-Hilbert space X. An element v ∈ U is a best approxi-
mation to w ∈ X if and only if

  (w − v, u) = 0  for all u ∈ U,

i.e., w − v ⊥ U. For each w ∈ X, there is at most one best approximation w.r.t. U.
Proof. " ⇐ ":
With ‖a‖ := √(a, a), it holds for all v, u ∈ U:

  ‖w − u‖² = ‖w − v + v − u‖²
           = ((w − v) + (v − u), (w − v) + (v − u))
           = ‖w − v‖² + 2 Re((w − v, v − u)) + ‖v − u‖².

Since U is a linear space, we have v − u ∈ U, and since we require (w − v, u) = 0 for all u ∈ U, it holds:

  ‖w − u‖² = ‖w − v‖² + ‖v − u‖²,

and hence ‖w − v‖ < ‖w − u‖ for all u ∈ U with u ≠ v.

Therefore,

  ‖w − v‖ = inf_{u∈U} ‖w − u‖,

and thus v is a best approximation to w in U.


" ⇒ ":
Let v be a best approximation to w in U. We assume that (w − v, u₀) ≠ 0 for some u₀ ∈ U. Let us assume
further that (w − v, u₀) ∈ R. We choose u = v + ((w − v, u₀)/‖u₀‖²) u₀ ∈ U, so that
v − u = −((w − v, u₀)/‖u₀‖²) u₀, and conclude

  ‖w − u‖² = ‖w − v‖² + 2 Re((w − v, v − u)) + ‖v − u‖²
           = ‖w − v‖² − 2 (w − v, u₀)²/‖u₀‖² + (w − v, u₀)²/‖u₀‖²
           = ‖w − v‖² − (w − v, u₀)²/‖u₀‖²
           < ‖w − v‖².

This contradicts the assumption that v is a best approximation to w in U.


Uniqueness:
Assume that v₁, v₂ are best approximations. Then, since v₁ − v₂ ∈ U,

  (w − v₁, v₁ − v₂) = 0 = (w − v₂, v₁ − v₂),

and hence

  (v₁, v₁ − v₂) = (v₂, v₁ − v₂)
  ⇒ (v₁ − v₂, v₁ − v₂) = 0
  ⇒ v₁ − v₂ = 0
  ⇒ v₁ = v₂.

Thus, the best approximation is unique.


Proposition 7.54 (Orthogonal projection). Let U be a complete linear subspace of a pre-Hilbert space X.
Then, there exists for each w ∈ X a unique best approximation in U. The operator P : X → U, which maps
each w ∈ X to its best approximation, is a bounded linear operator with the properties

  P² = P  and  ‖P‖ = 1.

It is called the orthogonal projection of X onto U.


Proof. We choose a sequence (u_n) with

  ‖w − u_n‖² ≤ d² + 1/n,  n ∈ N,

with d := inf_{u∈U} ‖w − u‖. Expanding the scalar products (parallelogram identity) gives

  ‖(w − u_n) + (w − u_m)‖² + ‖u_n − u_m‖² = 2‖w − u_n‖² + 2‖w − u_m‖² ≤ 4d² + 2/n + 2/m  for all n, m ∈ N.

Since (u_n + u_m)/2 ∈ U, it holds

  ‖w − (u_n + u_m)/2‖² ≥ d²,

and we can estimate ‖u_n − u_m‖² via

  ‖u_n − u_m‖² ≤ 4d² + 2/n + 2/m − ‖(w − u_n) + (w − u_m)‖²
              = 4d² + 2/n + 2/m − 4 ‖w − (u_n + u_m)/2‖²
              ≤ 2/n + 2/m.

Therefore, (u_n) is a Cauchy sequence.
Since U is complete, there exists v ∈ U with u_n → v, n → ∞. From

  ‖w − u_n‖² ≤ d² + 1/n

it follows that v = lim_{n→∞} u_n is a best approximation to w in U. Uniqueness follows from Proposition
7.53.
Next, we show P² = P. For u ∈ U it holds P(u) = u. With u := P(w) ∈ U it follows

  P²(w) = P(P(w)) = P(u) = u = P(w).
The linearity of P is shown as follows: let α ≠ 0; then

  P(αx) = v
  ⇔ (αx − v, u) = 0  ∀u ∈ U
  ⇔ α (x − v/α, u) = 0  ∀u ∈ U
  ⇔ P(x) = v/α
  ⇔ α P(x) = v.

Consequently, P(αx) = αP(x).


Let v₁, v₂ be the best approximations of x, y, i.e., P(x) = v₁, P(y) = v₂. Then, for all u ∈ U:

  (x − v₁, u) = 0  and  (y − v₂, u) = 0
  ⇒ (x − v₁ + y − v₂, u) = 0
  ⇒ ((x + y) − (v₁ + v₂), u) = 0.

Thus, P(x + y) = v₁ + v₂ = P(x) + P(y). It remains to show that ‖P‖ = 1. It holds:

  ‖w‖² = (P(w) + w − P(w), P(w) + w − P(w))
       = ‖P(w)‖² + ‖w − P(w)‖² + (P(w), w − P(w)) + (w − P(w), P(w))
       = ‖P(w)‖² + ‖w − P(w)‖²
       ≥ ‖P(w)‖²  ∀w ∈ X,

where the mixed terms vanish since w − P(w) ⊥ U. Thus, P is bounded with ‖P‖ ≤ 1. Because P² = P and
since for two linear operators A and B it holds ‖AB‖ ≤ ‖A‖‖B‖, it finally follows

  ‖P‖ = ‖P²‖ = ‖P P‖ ≤ ‖P‖‖P‖  ⇒  ‖P‖ ≥ 1  ⇒  ‖P‖ = 1.

7.3.0.2 The Fréchet-Riesz representation theorem An important principle in functional analysis is to gain
information of normed spaces with the help of functionals.

7.3.0.2.1 Direct sum


Definition 7.55 (Direct sum). A vector space X can be written as direct sum of two subspaces Y and Z:

X = Y ⊕ Z,

if each element x ∈ X has a unique representation of the form

  x = y + z,  y ∈ Y, z ∈ Z.

Remark 7.56. The space Z is the algebraic complement of Y in X.


In Hilbert space theory we are specifically interested in such representations. A general Hilbert space H can
be written as direct sum of a closed subspace Y and its orthogonal complement Y ⊥ .
Proposition 7.57 (Projection theorem, direct sum). Let Y be a closed subspace of a Hilbert space H. Then:

H = Y ⊕ Z, Z =Y⊥

7.3.0.2.2 The Fréchet-Riesz representation theorem


Definition 7.58. The space L = L(X, K) of the linear, bounded functionals on a normed space is called
the dual space of X. We often find the notations X 0 = L = L(X, K) or if X is a Hilbert space, we write
X ∗ = L = L(X, K).
One of the key theorems of Hilbert space theory is:
Proposition 7.59. Let X be a Hilbert space. For each linear, bounded functional F ∈ L, there exists a unique
element f ∈ X, so that
F (u) = (u, f ) ∀u ∈ X.
This defines a bijective, isometric, conjugate linear mapping with

  ‖f‖ = ‖F‖.

Proof. We divide the proof into several parts:

• Uniqueness
  Suppose f induces the functional F = 0. Then

    F(x) = (x, f) = 0  ∀x ∈ X,

  in particular (f, f) = 0, from which f = 0 follows.
  Thus, the mapping f ↦ F is injective because f = 0 is the only element which induces the zero functional
  F = 0.
• Equality of norms
  The Cauchy-Schwarz inequality yields

    |F(x)| = |(x, f)| ≤ ‖x‖ · ‖f‖  ∀x ∈ X,

  and hence, with

    ‖F‖ = sup_{x≠0} |F(x)| / ‖x‖,

  we obtain ‖F‖ ≤ ‖f‖. Using specifically x = f in F(x) = (x, f), we obtain

    ‖f‖² = |(f, f)| = |F(f)| ≤ ‖f‖ · ‖F‖,

  and therefore ‖f‖ ≤ ‖F‖. Combining both estimates, it follows that

    ‖f‖ = ‖F‖.

• Construction
  For F ∈ X* = L we need to construct f ∈ X.
  The kernel

    N(F) = {u ∈ X : F(u) = 0}

  is a closed, linear subspace of the Hilbert space X. If N = X, then f = 0 yields the desired result. If
  N ≠ X, then it holds

    X = N ⊕ N^⊥.

  We now choose w ∈ X with F(w) ≠ 0. From the projection theorem it follows that if v is the best
  approximation of w in the subspace N, then g := w − v satisfies

    g ⊥ N(F).

  It holds

    F(g)u − F(u)g ∈ N(F)  ∀u ∈ X,

  because

    F(F(g)u − F(u)g) = F(g)F(u) − F(u)F(g) = 0.

  Then:

    (F(g)u − F(u)g, g) = 0  ∀u ∈ X
    ⇒ F(g)(u, g) = F(u)(g, g)
    ⇒ F(u) = (u, F(g)g/‖g‖²).

  Thus f := F(g)g/‖g‖² is the desired element.

7.3.0.3 The Pythagorean theorem This section recapitulates and extends our findings from the previous
two sections. Let

  φ(a) = (u − av, u − av)                                        (67)
       = (u, u) + a² (v, v) − a [(u, v) + (v, u)],                (68)

where (u, v) + (v, u) = 2 Re(u, v). The smallest distance is the minimum of (68). A necessary condition is

  φ'(a) = 2a (v, v) − 2 Re(u, v) = 0.

The minimum is attained at

  a₀ = Re(u, v) / (v, v).

Hence, we have found a minimal point a₀, which fulfills 0 ≤ φ(a₀) ≤ φ(a). In detail:

(u, u) − a0 · 2Re(u, v) + a20 · (v, v) ≤ (u, u) + a2 · (v, v) − a2 · Re(u, v).

Furthermore
2Re(u, v) Re(u, v) |Re(u, v)|2
0 ≤ (u, u) − + (v, v)
(v, v) (v, v)2
|Re(u, v)|2
= (u, u) − .
(v, v)

It follows:
|Re(u, v)| ≤ (u, u) · (v, v)
We now define the projection of a vector w onto a vector v:

  P_{<v>}(w) = ((w, v)/(v, v)) · v.

For an illustration, we refer to Figure 15.
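In R^n this projection is a one-line computation. A minimal sketch (assuming NumPy) illustrating both the
orthogonality of w − P(w) to v and the Pythagorean identity derived below:

    # Sketch (assumption: NumPy available): projection onto span{v} and a check
    # of orthogonality and of the Pythagorean identity.
    import numpy as np

    v = np.array([1.0, 2.0, 2.0])
    w = np.array([3.0, 0.0, 4.0])

    P = (np.dot(w, v) / np.dot(v, v)) * v   # P_<v>(w) = (w,v)/(v,v) * v
    res = w - P                             # remainder w - P_<v>(w)

    print(np.dot(res, v))                   # ~ 0: the remainder is orthogonal to v
    print(np.dot(w, w), np.dot(P, P) + np.dot(res, res))   # equal (Pythagoras)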

7.3.0.4 Properties of the projection We investigate now the properties of the projection:

i) P_{<v>}² = P_{<v>} (idempotence);

ii) P_{<v>}* = P_{<v>} (self-adjointness).

The second property can be shown easily using the scalar product. It says that the projection operator can
be shifted into the second argument:

  (P_{<v>}(w), w') = (w, P_{<v>}(w'))  ⇔  P_{<v>}* = P_{<v>}.


Figure 15: Projection of w on v.

Proof. To i):
Let w' = P_{<v>}(w). We compute

  P_{<v>}²(w) = P_{<v>}(w') = P_{<v>}( ((w, v)/(v, v)) v ) = ((w, v)/(v, v)) P_{<v>}(v) = ((w, v)/(v, v)) v
              = P_{<v>}(w),

since P_{<v>}(v) = v.

To ii):
On the one hand, we have

  (P_{<v>}(w), w') = ((w, v)/(v, v)) (v, w').

On the other hand,

  (w, P_{<v>}(w')) = (w, ((w', v)/(v, v)) v) = ((v, w') (w, v))/(v, v).

Comparing both sides yields the assertion.

7.3.0.5 The projection theorem (the Pythagorean theorem) Using Figure 15, we can derive the Pythagorean
theorem. It holds

  ‖w‖² = ‖P_{<v>}(w) + (w − P_{<v>}(w))‖²
       = ‖P_{<v>}(w)‖² + ‖w − P_{<v>}(w)‖² + (P_{<v>}(w), w − P_{<v>}(w)) + (w − P_{<v>}(w), P_{<v>}(w)).

Furthermore, using P_{<v>}* P_{<v>} = P_{<v>}² = P_{<v>}, we obtain

  (P_{<v>}(w), w − P_{<v>}(w)) = (P_{<v>}(w), w) − (P_{<v>} P_{<v>}(w), w) = 0,

and hence

  ‖w‖² = ‖P_{<v>}(w)‖² + ‖w − P_{<v>}(w)‖².

Proposition 7.60 (Pythagorean theorem). It holds:

  ‖w‖² = ‖P_{<v>}(w)‖² + ‖w − P_{<v>}(w)‖².

In particular,

  ‖w‖² ≥ ‖P_{<v>}(w)‖²,                                          (69)

with equality if and only if w = P_{<v>}(w).
Remark 7.61. In terms of scalar products we can write

  (w, w) ≥ (|(w, v)|²/(v, v)²) · (v, v),

and obtain backwards the Cauchy-Schwarz inequality:

  (w, w)(v, v) ≥ |(w, v)|².

7.4 Differentiation in Banach spaces
We discuss in this section how to differentiate in Banach spaces.

Definition 7.62 (Directional derivative in a Banach space). Let V and W be normed vector spaces and let
U ⊂ V be non-empty. Let f : U → W be a given mapping. If the limit

  f'(v)(h) := lim_{ε→0} (f(v + εh) − f(v))/ε = (d/dε) f(v + εh)|_{ε=0},   v ∈ U, h ∈ V,

exists, then f'(v)(h) is called the directional derivative of the mapping f at v in direction h. If the directional
derivative exists for all h ∈ V, then f is called directionally differentiable at v.

Remark 7.63 (on the notation). Often, the direction h is denoted by δv in order to highlight that the direc-
tion is related to the variable v. This notation is useful when several solution variables exist and several
directional derivatives need to be computed.
Remark 7.64. The definition of the directional derivative in Banach spaces is in perfect agreement with the
definition of derivatives in R at x ∈ R (see [85]):

  f'(x) := lim_{ε→0} (f(x + ε) − f(x))/ε,

and in R^n we have (see [86]):

  f'(x)(h) := lim_{ε→0} (f(x + εh) − f(x))/ε.

A function is called differentiable when all directional derivatives exist in the point x ∈ R^n (similar to the
Gâteaux derivative). The derivatives in the directions e_i, i = 1, . . . , n, of the standard basis are the well-known
partial derivatives.
Example 7.65. Given f(u) = u³, we compute the directional derivative in direction h ∈ V:

  f'(u)(h) = 3u²h.
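The limit defining the directional derivative can be observed numerically with difference quotients. A minimal
plain-Python sketch with the illustrative scalar values u = 2 and h = 0.5:

    # Sketch: the difference quotient (f(u + eps*h) - f(u))/eps approaches
    # f'(u)(h) = 3 u^2 h as eps -> 0 (here for scalar u and h).
    f = lambda u: u**3

    u, h = 2.0, 0.5
    exact = 3 * u**2 * h                    # = 6.0

    for eps in (1e-1, 1e-3, 1e-6):
        dq = (f(u + eps * h) - f(u)) / eps
        print(eps, dq, abs(dq - exact))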

Definition 7.66 (Gâteaux derivative). Let the assumptions hold as in Definition 7.62. A directionally differ-
entiable mapping as defined in Definition 7.62 is called Gâteaux differentiable at v ∈ U if there exists a linear
continuous mapping A ∈ L(U, W) such that

  f'(v)(h) = A(h)

holds true for all h ∈ U. Then, A is the Gâteaux derivative of f at v and we write A = f'(v).
Remark 7.67. The definition of the directional derivative immediately carries over to bilinear forms, and for
this reason the derivatives in Example 9.26 are well-defined. Therein, the semilinear form a(u, z) is differen-
tiated with respect to the first variable u in the direction φ ∈ V.
Remark 7.68. The Gâteaux derivative is computed with the help of directional derivatives, and it holds
f'(v) ∈ L(U, W). If W = R, then f'(v) ∈ U*.
Definition 7.69 (Fréchet derivative). A mapping f : U → W is Fréchet differentiable at u ∈ U if there exists
an operator A ∈ L(U, W) and a mapping r(u, ·) : U → W such that for each h ∈ U with u + h ∈ U, it holds:

  f(u + h) = f(u) + A(h) + r(u, h)

with

  ‖r(u, h)‖_W / ‖h‖_U → 0  for ‖h‖_U → 0.

The operator A(·) is the Fréchet derivative of f at u and we write A = f'(u).

Example 7.70. Given f(u) = u³, we show that the derivative is a Fréchet derivative:

  f(u + h) = (u + h)³ = u³ + 3u²h + 3uh² + h³,

where u³ = f(u), A(h) = f'(u)(h) = 3u²h, and r(u, h) = 3uh² + h³. Clearly,

  ‖r(u, h)‖ / ‖h‖ → 0.
Example 7.71. The above bilinear form a(u, φ) = (∇u, ∇φ) is Fréchet differentiable in the first argument u
(of course also in the second argument, but u is the variable we are interested in):

  a(u + h, φ) = (∇(u + h), ∇φ) = (∇u, ∇φ) + (∇h, ∇φ) = a(u, φ) + a'_u(u, φ)(h).

Here the remainder term is zero, i.e., r(u, h) = 0, because the bilinear form is linear in u. Thus, the Fréchet
derivative of a(u, φ) = (∇u, ∇φ) is a'_u(u, φ)(h) = (∇h, ∇φ).
Example 7.72. Another example is a functional of the form J(u) = ∫ u² dx. Here:

  J(u + h) = ∫ (u + h)² dx = ∫ u² dx + ∫ 2uh dx + ∫ h² dx,

with J(u) = ∫ u² dx, A(h) = ∫ 2uh dx, and r(u, h) = ∫ h² dx. To see that J(u) is really Fréchet differentiable,
we need to check whether

  ‖r(u, h)‖_W / ‖h‖_U → 0  for ‖h‖_U → 0.

Here we have

  ∫ h² dx = ‖h‖²_V

and therefore

  ‖h‖²_V / ‖h‖_V = ‖h‖_V.

For h → 0 we clearly have ‖h‖_V → 0. Consequently, the derivative of J(u) = ∫ u² dx is

  J'(u)(h) = A(h) = ∫ 2uh dx.
Example 7.73. We present four further examples in which two variables couple. Such situations arise in
Chapter 14. It then matters whether both variables are unknowns simultaneously or whether they can be
decoupled by assuming that one variable is given and only one is actually unknown. We consider

1. Solve for u and v. Let f(U) = u + v with U = (u, v).

2. Solve for u and v. Let f(U) = uv with U = (u, v).

3. Given v, solve for u. Let f(U) = u + v with U = (u, v).

4. Given v, solve for u. Let f(U) = uv with U = (u, v).

The derivatives read:

1. f'(U)(δU) = δu + δv, because

  f'(U)(δU) = lim_{ε→0} ((u + εδu + v + εδv) − (u + v))/ε = δu + δv.

2. f'(U)(δU) = v δu + u δv, because

  f'(U)(δU) = lim_{ε→0} ((u + εδu)(v + εδv) − uv)/ε = lim_{ε→0} (εδu v + εδv u + ε²δu δv)/ε = δu v + δv u.

3. f'(U)(δu) = δu, because

  f'(U)(δu) = lim_{ε→0} ((u + εδu + v) − (u + v))/ε = δu.

4. f'(U)(δu) = v δu, because

  f'(U)(δu) = lim_{ε→0} ((u + εδu)v − (uv))/ε = v δu.


8 Theory and finite elements (FEM) for elliptic problems


In this section, we discuss the finite element method (FEM). Our strategy is to introduce the method first for
1D problems in which deeper mathematical results with respect to functional analysis and Sobolev theory are
not required for understanding. Later we also provide deeper mathematical results. The discretization parts
are again complemented with the corresponding numerical analysis as well as algorithmic aspects and prototype
numerical tests. As before, we augment from time to time with nonstandard formulations, such as for instance
formulating a prototype coupled problem. Historically, the work by R. Courant in 1943 [37] is considered to
be the first mathematical contribution to the finite element method. At the end of the 1960s, engineers used
finite elements, and from then on, fruitful studies in both disciplines, mathematics and engineering, appeared.

8.1 Preliminaries
8.1.1 Construction of an exact solution (only possible for simple cases!)
In this section, we construct again an exact solution first. In principle this section is the same as Section 6.2.1
except that we derive a very explicit representation of the solution (but we could have done this in Section
6.2.1 as well). We consider again the model problem (D)4 :
−u00 = f in Ω = (0, 1), (70)
u(0) = u(1) = 0. (71)
Please make yourself clear that this is nothing else than expressing Formulation 4.28 in 1D. Thus, we deal
with a second-order elliptic problem.
In 1D, we can derive the explicit solution of (70). First we have

  u'(x̃) = − ∫_0^{x̃} f(s) ds + C₁,

where C₁ is an integration constant. Further integration yields

  u(x) = ∫_0^x u'(x̃) dx̃ = − ∫_0^x ( ∫_0^{x̃} f(s) ds ) dx̃ + C₁x + C₂

with another integration constant C₂. We have two unknown constants C₁, C₂ but also two given boundary
conditions, which allows us to determine C₁ and C₂. From

  0 = u(0) = − ∫_0^0 ( ∫_0^{x̃} f(s) ds ) dx̃ + C₁ · 0 + C₂

we obtain C₂ = 0. For C₁, using u(1) = 0, we calculate:

  C₁ = ∫_0^1 ( ∫_0^{x̃} f(s) ds ) dx̃.

Thus we obtain the final solution:

  u(x) = − ∫_0^x ( ∫_0^{x̃} f(s) ds ) dx̃ + x ∫_0^1 ( ∫_0^{x̃} f(s) ds ) dx̃.

Thus, for a given right-hand side f we obtain an explicit expression. For instance, f = 1 yields:

  u(x) = (−x² + x)/2.

It is trivial to double-check that this solution also satisfies the boundary conditions u(0) = u(1) = 0. Fur-
thermore, we see by differentiation that the original equation is recovered:

  u'(x) = −x + 1/2   ⇒   −u''(x) = 1.
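The construction of the exact solution can be automated symbolically. A minimal sketch (assuming SymPy);
replacing f by another expression in s reproduces, e.g., Exercise 5 below:

    # Sketch (assumption: SymPy available): build the explicit solution of
    # -u'' = f, u(0) = u(1) = 0 from the double-integral formula above.
    import sympy as sp

    x, xt, s = sp.symbols('x xt s')
    f = sp.Integer(1)                        # right-hand side f(s); here f = 1

    inner = sp.integrate(f, (s, 0, xt))      # int_0^xt f(s) ds
    C1 = sp.integrate(inner, (xt, 0, 1))     # from the condition u(1) = 0
    u = -sp.integrate(inner, (xt, 0, x)) + C1 * x

    print(sp.simplify(u))                    # -> x/2 - x**2/2
    print(sp.simplify(-sp.diff(u, x, 2)))    # -> 1, i.e. -u'' = f
    print(u.subs(x, 0), sp.simplify(u.subs(x, 1)))  # boundary values: 0, 0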
4 (D) stands for differential problem. (M ) stands for minimization problem. (V ) stands for variational problem.

Exercise 5. Let f = −1.
• Compute C1 ;

• Compute u(x);
• Check that u(x) satisfies the boundary conditions;
• Check that u(x) satisfies the PDE.

8.1.2 Equivalent formulations


We first discuss that the solution of (70) (D) is also the solution of a minimization problem (M) and a
variational problem (V). To formulate these (equivalent) problems, we first introduce the scalar product

  (v, w) = ∫_0^1 v(x) w(x) dx.

Furthermore, we introduce the linear space

  V := {v | v ∈ C[0, 1], v' is piecewise continuous and bounded on [0, 1], v(0) = v(1) = 0}.   (72)

We also introduce the functional F : V → R,

  F(v) = (1/2)(v', v') − (f, v).

Recall the analogous quadratic functional from Numerik 1 [114][Section 7.5]:

  Q(x) = (1/2)(Ax, x)₂ − (b, x)₂.
We state

Definition 8.1 (Four equivalent relationships). We deal with four (equivalent) problems:
(D) Find u ∈ C 2 such that −u00 = f with u(0) = u(1) = 0; (strong form)
(M) Find u ∈ V such that F (u) ≤ F (v) for all v ∈ V ; (energy minimization)

(V) Find u ∈ V such that (u0 , v 0 ) = (f, v) for all v ∈ V ; (weak form).
Moreover, the conservation relation (see Section 4.1.3) is equivalent to these three problems:
(C) Find u ∈ C 2 such that (−u00 , 1) = (f, 1) with u(0) = u(1) = 0 (volume conservation law).
Actually, (C) is most often the starting point from a physical derivation using volume conservation prin-
ciples such as mass, momentum (angular momentum), and energy. In physics, the quantity F (v) stands for
the total potential energy of the underlying model; see also Section 9.6.2. Moreover, the first term in F (v)
denotes the internal elastic energy and (f, v) the load potential. Therefore, formulation (M) corresponds to the
fundamental principle of minimal potential energy and the variational problem (V) to the principle of
virtual work (e.g., [34]). The proofs of their equivalence will be provided in the following.

Remark 8.2 (Euler-Lagrange equations). When (V ) is derived from (M ), we also say that (V ) is the Euler-
Lagrange equation related to (M ).
Remark 8.3 (Numerical schemes inspired by these four equivalent relations). The previous four relations
inspire different discretization schemes:
(D) → Finite differences; Chapter 6.

(M) → Optimization algorithms

(V) → Galerkin finite elements; this chapter
(C) → Finite volume schemes
In the following we show that the four problems (D), (M ), (V ), (C) are equivalent.
Proposition 8.4. It holds
(D) ↔ (C).
Proof. First, (D) → (C) is trivial. The other way around, i.e. (C) → (D) has been discussed in Section 4.1.3,
namely in (C) the integrand must be sufficiently regular and the integral relation must hold for arbitrary
subsets U of Ω.
As next relationship, we have:
Proposition 8.5. It holds
(D) → (V ).
Proof. We multiply −u'' = f by an arbitrary function φ (a so-called test function) from the space V defined
in (72). Then we integrate over the interval Ω = (0, 1), yielding

  −u'' = f                                                       (73)
  ⇒ − ∫_Ω u'' φ dx = ∫_Ω f φ dx                                  (74)
  ⇒ ∫_Ω u' φ' dx − u'(1)φ(1) + u'(0)φ(0) = ∫_Ω f φ dx            (75)
  ⇒ ∫_Ω u' φ' dx = ∫_Ω f φ dx   ∀φ ∈ V.                          (76)

In (75), we used integration by parts. The boundary terms vanish because φ ∈ V. This shows that u
is a solution of (V).
Proposition 8.6. It holds
(V ) ↔ (M ).
Proof. We first assume that u is a solution to (V). Let φ ∈ V and set w = φ − u such that φ = u + w and
w ∈ V. We obtain

  F(φ) = F(u + w) = (1/2)(u' + w', u' + w') − (f, u + w)
       = (1/2)(u', u') − (f, u) + (u', w') − (f, w) + (1/2)(w', w') ≥ F(u).

Here we used the fact that (V) holds true, namely

  (u', w') − (f, w) = 0,

and also that (w', w') ≥ 0. Thus, we have shown that u is a solution to (M). We now show that (M) → (V)
holds true as well. For any φ ∈ V and ε ∈ R we have

  F(u) ≤ F(u + εφ),

because u + εφ ∈ V. We differentiate with respect to ε and show that (V) is a first-order necessary condition
for (M) with a minimum at ε = 0. To do so, we define

  g(ε) := F(u + εφ) = (1/2)(u', u') + ε(u', φ') + (ε²/2)(φ', φ') − (f, u) − ε(f, φ).

Thus

  g'(ε) = (u', φ') + ε(φ', φ') − (f, φ).

A minimum is obtained for ε = 0. Consequently, g'(0) = 0. In detail:

  (u', φ') − (f, φ) = 0,

which is nothing else than (V).
Proposition 8.7. The solution u to (V ) is unique.
Proof. Assume that u₁ and u₂ are solutions to (V) with

  (u₁', φ') = (f, φ)  ∀φ ∈ V,
  (u₂', φ') = (f, φ)  ∀φ ∈ V.

We subtract these equations and obtain:

  (u₁' − u₂', φ') = 0  ∀φ ∈ V.

We choose the test function φ = u₁ − u₂, so that φ' = u₁' − u₂', and obtain

  (u₁' − u₂', u₁' − u₂') = 0.

Thus:

  u₁'(x) − u₂'(x) = (u₁ − u₂)'(x) = 0.

Since the difference of the derivatives is zero, the difference of the functions themselves is constant:

  (u₁ − u₂)(x) = const.

Using the boundary conditions u(0) = u(1) = 0 yields

  (u₁ − u₂)(x) = 0.

Thus u₁ = u₂.
Remark 8.8. The last steps are related to the definiteness of the norm. It holds

  ‖u‖²_{L^2} = ∫ u² dx,

and the result follows directly because

  ‖u‖ = 0 ⇔ u = 0.
Proposition 8.9. It holds
(V ) → (D).
Proof. We assume that u is a solution to (V), i.e.,

  (u', φ') = (f, φ)  ∀φ ∈ V.

If we assume sufficient regularity of u (in particular u ∈ C²), then u'' exists and we can integrate by parts
backwards. Moreover, we use that φ(0) = φ(1) = 0 since φ ∈ V. Then:

  (−u'' − f, φ) = 0  ∀φ ∈ V.

Since we assumed sufficient regularity for u'' and f, the difference −u'' − f is continuous. We can now apply
the fundamental lemma (see Proposition 8.11):

  w ∈ C(Ω), ∫ wφ dx = 0 ∀φ  ⇒  w ≡ 0.

We prove this result below. With it, we obtain

  (−u'' − f, φ) = 0  ⇒  −u'' − f = 0.

This yields the PDE. The boundary conditions are recovered because u ∈ V, and therein we have u(0) =
u(1) = 0. Therefore, we obtain:

  −u'' = f,
  u(0) = u(1) = 0.

Since we know that (D) → (V) holds true, u has the assumed regularity properties and we have shown the
equivalence.
It remains to state and proof the fundamental lemma of calculus of variations. To this end, we introduce
Definition 8.10 (Continuous functions with compact support). Let Ω ⊂ Rn be an open domain.
• The set of continuous functions from Ω to R with compact support is denoted by Cc (Ω). Such functions
vanish on the boundary.
• The set of smooth functions (infinitely continuously differentiable) with compact support is denoted by
Cc∞ (Ω). Often, functions ϕ ∈ Cc∞ (Ω) are so-called test functions.
Proposition 8.11 (Fundamental lemma of calculus of variations). Let Ω = [a, b] be a compact interval and
let w ∈ C(Ω). Let φ ∈ C^∞ with φ(a) = φ(b) = 0, i.e., φ ∈ C_c^∞(Ω). If for all such φ it holds

  ∫_Ω w(x) φ(x) dx = 0,                                          (77)

then w ≡ 0 in Ω.
Proof. We perform an indirect proof. Suppose there exists a point x₀ ∈ Ω with w(x₀) ≠ 0. Without
loss of generality, we can assume w(x₀) > 0. Since w is continuous, there exists a small (open) neighborhood
ω ⊂ Ω with w(x) > 0 for all x ∈ ω. Now choose a test function φ with φ > 0 in ω and φ ≡ 0 in Ω \ ω
(recall that φ can be chosen arbitrarily). Since w and φ are continuous, the product wφ is continuous as well,
with w(x)φ(x) > 0 in ω and w(x)φ(x) = 0 in Ω \ ω. Using the hypothesis (77), we have

  0 = ∫_Ω w(x)φ(x) dx = ∫_ω w(x)φ(x) dx > 0.

This is a contradiction. Hence w(x₀) = 0, and since x₀ was arbitrary, w ≡ 0 in Ω. The case w(x₀) < 0 is
treated analogously.

8.2 Derivation of a weak form


We continue to consider the variational form in this section and provide more definitions and notation. We
recall the key idea, which is a two-step procedure:

• Step 1: Design a function space V that also includes the correct boundary conditions.

• Step 2: Multiply the strong form (D) with a test function from V and integrate.

The second operation 'weakens' the derivative information (therefore the name weak form): rather than
evaluating second-order derivatives, we only need to evaluate a first-order derivative of the trial function and
another first-order derivative of the test function.
For Poisson with homogeneous Dirichlet conditions, we obtain for Step 1: take the space

  V := {v | v ∈ C[0, 1], v' is pc. cont. and bound. on [0, 1], v(0) = v(1) = 0}

from before. We now address Step 2:

  −u'' = f                                                       (78)
  ⇒ − ∫_Ω u'' φ dx = ∫_Ω f φ dx                                  (79)
  ⇒ ∫_Ω u' φ' dx − ∫_{∂Ω} ∂_n u φ ds = ∫_Ω f φ dx                (80)
  ⇒ ∫_Ω u' φ' dx = ∫_Ω f φ dx.                                   (81)

To summarize, we have:

  ∫_Ω u' φ' dx = ∫_Ω f φ dx.                                     (82)

A common short-hand notation in mathematics is to use parentheses for L^2 scalar products, ∫ ab dx =: (a, b):

  (u', φ') = (f, φ).                                             (83)

A mathematically correct statement is:

Formulation 8.12. Find u ∈ V such that

  (u', φ') = (f, φ)  ∀φ ∈ V.                                     (84)

8.3 Notation: writing the bilinear and linear forms in abstract notation
In the following, a very common notation is introduced. First, we recall concepts known from linear algebra:
Definition 8.13 (Linear form and bilinear form). If V is a linear space, l is a linear form on V if l : V → R,
or in other words l(v) ∈ R for v ∈ V. Moreover, l is linear:

  l(av + bw) = a l(v) + b l(w)  ∀a, b ∈ R, ∀v, w ∈ V.

A mapping a is a bilinear form⁵ on V × V if a : V × V → R, i.e., a(v, w) ∈ R for v, w ∈ V, and a(v, w) is
linear in each argument, i.e., for α, β ∈ R and u, v, w ∈ V it holds

  a(u, αv + βw) = α a(u, v) + β a(u, w),
  a(αu + βv, w) = α a(u, w) + β a(v, w).

A bilinear form is said to be symmetric if

  a(u, v) = a(v, u).

A symmetric bilinear form a(·, ·) on V × V defines a scalar product on V if

  a(v, v) > 0  ∀v ∈ V, v ≠ 0.

The associated norm is denoted by ‖·‖_a and defined by

  ‖v‖_a := √a(v, v)  for v ∈ V.
Definition 8.14 (Frobenius scalar product). For vector-valued functions in second-order problems, we need
to work with a scalar product defined for matrices because their gradients are matrices. Here the natural choice
is the Frobenius scalar product, which is defined as:

  <A, B>_F = A : B = Σ_i Σ_j a_{ij} b_{ij}.

The Frobenius scalar product induces the Frobenius norm:

  ‖A‖_F = √<A, A>_F.
5 For nonlinear problems we deal with a semilinear form, which is linear only in one argument, namely the test function, while
the solution variable enters nonlinearly; see Chapter 13.

Remark 8.15. For vector-valued PDEs (such as elasticity, Stokes or Navier-Stokes), we formulate in compact
form:

  a(u, φ) = l(φ)

with some l(φ) and

  a(u, φ) = ∫_Ω ∇u : ∇φ dx,

where ∇u : ∇φ is evaluated in terms of the Frobenius scalar product.
Definition 8.16 (Bilinear and linear forms of the continuous Poisson problem). For the Poisson problem, we
define:

  a(u, φ) = (u', φ'),
  l(φ) = (f, φ).

We state the variational form of Poisson's problem in terms of a(·, ·) and l(·):

Formulation 8.17 (Variational Poisson problem on the continuous level). Find u ∈ V such that

  a(u, φ) = l(φ)  ∀φ ∈ V,                                        (85)

where a(·, ·) and l(·) are defined in Definition 8.16. The unknown function u is called the trial (or ansatz)
function, whereas φ is the so-called test function.

8.4 Finite elements in 1D


In the following, we want to concentrate on how to compute a discrete solution. As in the previous chapter,
this, in principle, allows us to address even more complicated situations and also higher spatial dimensions.
The principle of the FEM is rather simple:

1. Introduce a mesh T_h := ∪_i K_i (where the K_i denote the single mesh elements) of the given domain
Ω = (0, 1) with mesh size (diameter/length) parameter h;
2. Define on each mesh element Ki := [xi , xi+1 ], i = 0, . . . , n polynomials for trial and test functions. These
polynomials must form a basis in a space Vh and they should reflect certain conditions on the mesh
edges;
3. Use the variational (or weak) form of the given problem and derive a discrete version. Then, plug-in a
linear combination of basis functions from Vh ;
4. Evaluate the arising integrals;
5. Collect all contributions on all Ki leading to a linear equation system AU = B;
6. Solve this linear equation system; the solution vector U = (u0 , . . . , un )T contains the discrete solution at
the nodal points x0 , . . . , xn (for Lagrange FEM);
7. Verify the correctness of the solution U (experimentally, numerical/computational analysis, analytical
results).

8.4.1 The mesh


Let us start with the mesh. We introduce nodal points and divide Ω = (0, 1) into
x0 = 0 < x1 < x2 < . . . < xn < xn+1 = 1.
In particular, we can work with a uniform mesh in which all nodal points have equidistant distance:
1
xj = jh, h= , 0 ≤ j ≤ n + 1, h = xj+1 − xj .
n+1
Remark 8.18. An important research topic is to organize the points xj in certain non-uniform ways (keyword
is adaptivity!) in order to reduce the discrete error (see Section 8.13).

8.4.2 Linear finite elements
In the following, we denote by P_k the space that contains all polynomials up to order k with coefficients in R:

Definition 8.19.

  P_k := { Σ_{i=0}^{k} a_i x^i | a_i ∈ R }.

In particular, we will work with the space of linear polynomials

  P_1 := {a_0 + a_1 x | a_0, a_1 ∈ R}.
A finite element is now a function localized to an element K_i ∈ T_h and uniquely defined by the values in the
nodal points x_i, x_{i+1}.
We then define the space:

  V_h = V_h^{(1)} := {v ∈ C[0, 1] | v|_{K_i} ∈ P_1, K_i := [x_i, x_{i+1}], 0 ≤ i ≤ n, v(0) = v(1) = 0}.

The boundary conditions are built into the space through v(0) = v(1) = 0. This is an important concept:
Dirichlet boundary conditions will not appear explicitly later, but are contained in the function spaces.
All functions inside V_h are so-called shape functions and can be represented by so-called hat functions.
Hat functions are piecewise linear functions on each element K_i. Attaching them yields a hat in the
geometrical sense.
For j = 1, . . . , n we define:

  φ_j(x) = 0                              if x ∉ [x_{j−1}, x_{j+1}],
  φ_j(x) = (x − x_{j−1})/(x_j − x_{j−1})  if x ∈ [x_{j−1}, x_j],           (86)
  φ_j(x) = (x_{j+1} − x)/(x_{j+1} − x_j)  if x ∈ [x_j, x_{j+1}],

with the property

  φ_j(x_i) = 1 if i = j,   φ_j(x_i) = 0 if i ≠ j.                          (87)

Figure 16: Hat functions φ_{i−1}, φ_i, φ_{i+1} on the nodes x_{i−1}, x_i, x_{i+1}.

For a uniform step size h = x_j − x_{j−1} = x_{j+1} − x_j we obtain

  φ_j(x) = 0                    if x ∉ [x_{j−1}, x_{j+1}],
  φ_j(x) = (x − x_{j−1})/h      if x ∈ [x_{j−1}, x_j],
  φ_j(x) = (x_{j+1} − x)/h      if x ∈ [x_j, x_{j+1}],

and for its derivative:

  φ_j'(x) = 0        if x ∉ [x_{j−1}, x_{j+1}],
  φ_j'(x) = +1/h     if x ∈ (x_{j−1}, x_j),
  φ_j'(x) = −1/h     if x ∈ (x_j, x_{j+1}).
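These formulas translate directly into code. A minimal sketch (assuming NumPy) evaluating φ_j and φ_j' on
a uniform mesh x_j = jh:

    # Sketch (assumption: NumPy available): evaluate the hat function phi_j and
    # its derivative on the uniform mesh x_j = j*h, following (86).
    import numpy as np

    def hat(x, j, h):
        xj = j * h
        y = np.zeros_like(x)
        left = (x >= xj - h) & (x <= xj)
        right = (x > xj) & (x <= xj + h)
        y[left] = (x[left] - (xj - h)) / h      # rising part on [x_{j-1}, x_j]
        y[right] = ((xj + h) - x[right]) / h    # falling part on [x_j, x_{j+1}]
        return y

    def hat_prime(x, j, h):
        xj = j * h
        y = np.zeros_like(x)
        y[(x >= xj - h) & (x < xj)] = 1.0 / h   # +1/h left of x_j
        y[(x > xj) & (x <= xj + h)] = -1.0 / h  # -1/h right of x_j
        return y

    h = 0.25
    print(hat(np.array([0.5]), 2, h))           # phi_2(x_2) = 1 at x_2 = 0.5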

Lemma 8.20. The space V_h is a subspace of V := C[0, 1] and has dimension n (because we deal with n basis
functions). Thus the finite element method constructed in this way is a conforming method. Furthermore, each
function v_h ∈ V_h has a unique representation:

  v_h(x) = Σ_{j=1}^{n} v_{h,j} φ_j(x)   ∀x ∈ [0, 1],  v_{h,j} ∈ R.

Proof. Sketch: The unique representation is clear because at the nodal points it holds φ_j(x_i) = δ_{ij}, where
δ_{ij} is the Kronecker symbol with δ_{ij} = 1 for i = j and 0 otherwise.
The function v_h(x) interpolates the discrete values v_{h,j} ∈ R; in particular, the values between two support
points x_j and x_{j+1} can be evaluated.

Figure 17: The function v_h ∈ V_h on the nodes x_{i−1}, x_i, x_{i+1}.

Remark 8.21. The finite element method introduced above is a Lagrange method, since the basis functions
φj are defined only through its values at the nodal points without using derivative information (which would
result in Hermite polynomials).

8.4.3 The process to construct the specific form of the shape functions
In the previous construction, we have hidden the process of how to find the specific form of φ_j(x). In 1D it is
more or less clear, and we readily accept that φ_j(x) really has the form given in (86). In R^n this task is a bit
more work. To understand this procedure, we explain the process in detail. Here we first address the defining
properties of a finite element (see also Section 8.4.7):

• Intervals [xi , xi+1 ];


• A linear polynomial φ(x) = a0 + a1 x;
• Nodal values at xi and xi+1 (the so-called degrees of freedom).

Later, only these properties are stated (see also the general literature in Section 1), and one has to construct
the specific φ(x) as, for example, in (86). Thus, the main task consists in finding the unknown coefficients
a_0 and a_1 of the shape function. The key property is (87) (also valid in R^n in order to have a small support)
and therefore we obtain:

  φ_j(x_j) = a_0 + a_1 x_j = 1,
  φ_j(x_i) = a_0 + a_1 x_i = 0.

To determine a_0 and a_1, we have to solve a small linear equation system:

  ( 1  x_j ) ( a_0 )   ( 1 )
  ( 1  x_i ) ( a_1 ) = ( 0 ).

We obtain

  a_1 = −1/(x_i − x_j)   and   a_0 = x_i/(x_i − x_j).

Then:

  φ_j(x) = a_0 + a_1 x = (x_i − x)/(x_i − x_j).
At this stage we have to distinguish whether x_i := x_{j−1}, x_i := x_{j+1}, or |i − j| > 1, yielding the three
cases in (86).
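This small system can, of course, also be solved numerically. A minimal sketch (assuming NumPy) with the
example nodes x_j = 0.25 and x_i = 0.5, confirming the closed-form coefficients:

    # Sketch (assumption: NumPy available): determine a0, a1 of phi_j = a0 + a1*x
    # from phi_j(x_j) = 1 and phi_j(x_i) = 0 by solving the 2x2 system above.
    import numpy as np

    xj, xi = 0.25, 0.5                      # two neighboring nodes (example values)

    A = np.array([[1.0, xj],
                  [1.0, xi]])
    b = np.array([1.0, 0.0])
    a0, a1 = np.linalg.solve(A, b)

    print(a0, xi / (xi - xj))               # a0 = xi/(xi - xj) = 2
    print(a1, -1.0 / (xi - xj))             # a1 = -1/(xi - xj) = -4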
Remark 8.22. Of course, for higher-order polynomials and higher-order problems in Rn , the matrix system
to determining the coefficients becomes larger. However, in all state-of-the-art FEM software packages, the
shape functions are already implemented.
Remark 8.23. A very practical and detailed derivation of finite elements in different dimensions can be found
in [120].

8.4.4 The discrete weak form and employing linear combination for uh ∈ Vh
We have set up a mesh and local polynomial functions with a unique representation; all these developments
result in a 'finite element'. Now, we use the variational formulation and derive the discrete counterpart:

Formulation 8.24. Find u_h ∈ V_h such that

  (u_h', φ_h') = (f, φ_h)   ∀φ_h ∈ V_h.                          (88)

Or, in the previously introduced compact form:

Formulation 8.25 (Variational Poisson problem on the discrete level). Find u_h ∈ V_h such that

  a(u_h, φ_h) = l(φ_h)   ∀φ_h ∈ V_h,                             (89)

where a(·, ·) and l(·) are defined in Definition 8.16 by adding h as subindex for u and φ.

Remark 8.26 (Galerkin method). The process going from V to Vh using the variational formulation is called
Galerkin method. Here, it is not necessary that the bilinear form is symmetric. As further information:
not only is Galerkin’s method a numerical procedure, but it is also used in analysis when establishing existence
of the continuous problem. Here, one starts with a finite dimensional subspace and constructs a sequence of
finite dimensional subspaces Vh ⊂ V (namely passing with h → 0; that is to say: we add more and more basis
functions such that dim(Vh ) → ∞). The idea of numerics is the same: finally we are interested in small h
such that we obtain a discrete solution with sufficient accuracy.
Remark 8.27 (Ritz method). If we discretize the minimization problem (M ), the above process is called Ritz
method. In particular, the bilinear form of the variational problem is symmetric.

Remark 8.28 (Ritz-Galerkin method). For general bilinear forms (i.e., not necessarily symmetric) the dis-
cretization procedure is called Ritz-Galerkin method.
Remark 8.29 (Petrov-Galerkin method). In a Petrov-Galerkin method the trial and test spaces can be
different.
In order to proceed, we express u_h ∈ V_h with the help of the basis functions φ_j of V_h = span{φ_1, . . . , φ_n}:

  u_h = Σ_{j=1}^{n} u_j φ_j(x),  u_j ∈ R.

Since (88) holds for all φ_i ∈ V_h for 1 ≤ i ≤ n, it holds in particular for each i:

  (u_h', φ_i') = (f, φ_i)   for 1 ≤ i ≤ n.                       (90)

We now insert the representation of u_h in (90), yielding the Galerkin equations:

  Σ_{j=1}^{n} u_j (φ_j', φ_i') = (f, φ_i)   for 1 ≤ i ≤ n.       (91)

We have now extracted the coefficient vector of u_h, and only the shape functions φ_j and φ_i (i.e., their
derivatives, of course) remain in the integrals.
This yields a linear equation system of the form

  AU = B,

where

  U = (u_j)_{1≤j≤n} ∈ R^n,                                       (92)
  B = ((f, φ_i))_{1≤i≤n} ∈ R^n,                                  (93)
  A = (a_{ij})_{i,j=1}^{n} = ((φ_j', φ_i'))_{1≤i,j≤n} ∈ R^{n×n}. (94)

Thus the final solution vector is U, which contains the values u_j at the nodal points x_j of the mesh. Here we
remark that the values at x_0 and x_{n+1} are not solved for in the above system; they are determined by the
boundary conditions u(x_0) = u(0) = 0 and u(x_{n+1}) = u(1) = 0.
Remark 8.30 (Regularity of A). The question remains whether A is regular such that A^{-1} exists. We show
this rigorously using Assumption No. 3 in Definition 8.71 in Section 8.12.3.

Remark 8.31. We discuss in Section 8.5.3 why (a_{ij})_{i,j=1}^{n} = (φ_j', φ_i') holds in general, and not
(a_{ij})_{i,j=1}^{n} = (φ_i', φ_j').

8.4.5 Evaluation of the integrals


It remains to determine the specific entries of the system matrix (also called stiffness matrix) A and the right-
hand side vector B. Since each basis function has small support, namely on two neighboring elements (in fact,
that is one of the key features of the FEM), the resulting matrix A is sparse, i.e., it contains only a small
number of entries that are not equal to zero.
Let us now evaluate the integrals that form the entries a_{ij} of the matrix A = (a_{ij})_{1≤i,j≤n}. For the
diagonal elements we calculate:

  a_{ii} = ∫_Ω φ_i'(x) φ_i'(x) dx = ∫_{x_{i−1}}^{x_{i+1}} φ_i'(x) φ_i'(x) dx   (95)
         = ∫_{x_{i−1}}^{x_i} (1/h)² dx + ∫_{x_i}^{x_{i+1}} (−1/h)² dx          (96)
         = h/h² + h/h²                                                         (97)
         = 2/h.                                                                (98)

For the right off-diagonal we have:

  a_{i,i+1} = ∫_Ω φ_{i+1}'(x) φ_i'(x) dx = ∫_{x_i}^{x_{i+1}} (1/h) · (−1/h) dx = −1/h.   (99)

It is trivial to see that, by symmetry, a_{i+1,i} = a_{i,i+1}.


In compact form we summarize:

  a_{ij} = ∫_Ω φ_j'(x) φ_i'(x) dx =
      −1/(x_{j+1} − x_j)                          if j = i − 1,
      1/(x_j − x_{j−1}) + 1/(x_{j+1} − x_j)       if j = i,            (100)
      −1/(x_j − x_{j−1})                          if j = i + 1,
      0                                           otherwise.

For a uniform mesh (as we assume in this section) we can simplify the previous calculation since we know that
h = h_j = x_{j+1} − x_j:

  a_{ij} = ∫_Ω φ_j'(x) φ_i'(x) dx =
      −1/h    if j = i − 1,
      2/h     if j = i,                                                (101)
      −1/h    if j = i + 1,
      0       otherwise.

The resulting matrix A for the 'inner' points x_1, . . . , x_n then reads:

           (  2 −1            )
           ( −1  2 −1         )
  A = h⁻¹  (     ·  ·  ·      )  ∈ R^{n×n},
           (       −1  2 −1   )
           (          −1  2   )

i.e., A = h⁻¹ tridiag(−1, 2, −1).

Remark 8.32. Since the boundary values at x_0 = 0 and x_{n+1} = 1 are known to be u(0) = u(1) = 0, they
are not assembled into the matrix A above. Had we considered all support points x_0, x_1, . . . , x_n, x_{n+1},
we would have obtained:

           (  h  0               )
           ( −1  2 −1            )
  A = h⁻¹  (     ·  ·  ·         )  ∈ R^{(n+2)×(n+2)}.
           (        −1  2 −1     )
           (             0  h    )

Here the entries a_{00} = a_{n+1,n+1} = 1 (and not 2/h) because at the boundary only half of each of the two
corresponding test functions exists. Furthermore, working with this matrix A in the solution process below,
we have to modify the entries a_{0,1} = a_{1,0} = a_{n,n+1} = a_{n+1,n} from the value −1 to 0 such that
u_{h,0} = u_{h,n+1} = 0 can follow.
It remains to evaluate the integrals of the right-hand side vector b. Here the main difficulty is that the given
right-hand side f may be complicated. Of course it always holds for the entries of b:

    bi = ∫_Ω f (x)φi (x) dx   for 1 ≤ i ≤ n.

The right-hand side now depends explicitly on the values of f . Let us assume a constant f = 1 (thus the
original problem would be −u′′ = 1); then:

    bi = ∫_Ω 1 · φi (x) dx = 1 · h   for 1 ≤ i ≤ n.

Namely

    B = (h, . . . , h)T .
For non-uniform step sizes we obtain the two contributions

    (1/hi ) ∫_{Ki} φi (x) dx   and   (1/hi+1 ) ∫_{Ki+1} φi (x) dx

with hi = xi − xi−1 , hi+1 = xi+1 − xi and Ki = [xi−1 , xi ] and Ki+1 = [xi , xi+1 ].
Exercise 6. Derive the system matrix A by assuming non-uniform step sizes hj = xj+1 − xj .
Exercise 7. Derive the system matrix A for a P2 discretization, namely working with quadratic basis functions.
Sketch the basis functions.

8.4.6 More comments on how to build-in boundary conditions


We briefly elaborate a bit more on boundary conditions. If they are of Neumann type, they can be
immediately assembled as previously shown. Dirichlet conditions, on the other hand, must be carefully applied
to the matrix. Usually the entries are modified locally as previously mentioned. In higher dimensions, this is
more work, and a constraint matrix can be used. An alternative to the strong prescription of Dirichlet conditions
is a weak formulation. The latter reads for instance:

Formulation 8.33 (Weak prescription of boundary conditions). Let Ω be a domain and Γ its boundary
(sufficiently smooth). Let V := H 1 . Let g be the boundary function and γ > 0 a penalization parameter. Find
u ∈ V such that

    a(u, φ) + γ ∫_Γ (u − g)φ ds = l(φ)   ∀φ ∈ V.
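On the algebraic level, the penalty term is easy to realize. The following is a minimal 1D sketch (the function
name and the dense-matrix storage are our own illustration and not taken from any library): on Ω = (0, 1) the
boundary 'integral' degenerates to point evaluations at x = 0 and x = 1, so only the first and last rows of A and
b are touched. Calling this after the standard assembly of a(·,·) and l(·) enforces u(0) ≈ g0 and u(1) ≈ g1 ever
more strongly as γ grows.

// Minimal sketch (hypothetical helper, not part of the lecture code):
// weak/penalty prescription of Dirichlet data g0, g1 on Omega = (0,1).
#include <vector>

void add_penalty_boundary_terms(std::vector<std::vector<double>> &A,
                                std::vector<double> &b,
                                double gamma, double g0, double g1)
{
  const std::size_t N = b.size();      // DoFs including both boundary nodes
  A[0][0]         += gamma;            // gamma * u(0) * phi_0(0), phi_0(0) = 1
  b[0]            += gamma * g0;       // gamma * g(0) * phi_0(0)
  A[N - 1][N - 1] += gamma;            // gamma * u(1) * phi_{N-1}(1)
  b[N - 1]        += gamma * g1;       // gamma * g(1) * phi_{N-1}(1)
}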

8.4.7 Definition of a finite element


We briefly summarize the key ingredients that define a finite element. A finite element is a triple (K, PK , Σ)
where

• K is an element, i.e., a geometric object (in 1D an interval);


• PK is a finite-dimensional linear space of polynomials defined on K;
• Σ, not introduced so far, is a set of degrees of freedom (DoF), e.g., the values of the polynomial at the
vertices of K.

These three ingredients yield a uniquely determined polynomial on an element K.

8.4.8 Properties of the system matrix A


We first notice:
Remark 8.34. The system matrix A has a factor 1/h whereas the corresponding matrix of a finite
difference scheme has a factor 1/h². The reason is that we integrate the weak form using finite elements
and one h is hidden in the right-hand side vector b. Dividing by h, we obtain the same system for finite elements
as for finite differences.

Next, we show that A is symmetric and positive definite. It holds

    (φ′i , φ′j ) = (φ′j , φ′i ).

Using the representation

    u(x) = Σ_{j=1}^{n} uj,h φj,h (x),

we obtain

    Σ_{i,j=1}^{n} ui (φ′i , φ′j ) uj = ( Σ_{i=1}^{n} ui φ′i , Σ_{j=1}^{n} uj φ′j ) = (u′ , u′ ) ≥ 0.

We have

    (u′ , u′ ) = 0

only if u′ ≡ 0, i.e., u is constant; together with u(0) = 0 this yields u ≡ 0 and hence uj,h = 0 for
j = 1, . . . , n. We recall that a symmetric matrix B ∈ Rn×n is said to be positive definite if

    ξ · Bξ = Σ_{i,j=1}^{n} ξi bij ξj > 0   ∀ξ ∈ Rn , ξ ≠ 0.

Finally, it holds:
Proposition 8.35. A symmetric matrix B is positive definite if and only if the eigenvalues of B are
strictly positive. Furthermore, a positive definite matrix is regular and consequently, the linear system with
which B is associated has a unique solution.
Proof. Results from linear algebra.

8.4.9 Numerical test: 1D Poisson


We compute the same test as in Section 6.7, but now with finite elements. Since we can explicitly derive
all entries of the system matrix A as in the finite difference method, everything would result in the same
operations as in Section 6.7. Rather, we adopt deal.II [6, 9], an open-source C++ finite element library, to carry
out this computation. Even though in such packages many things are hidden and performed in the background,
deal.II leaves enough room to see the important points:
• Creating a domain Ω and decomposing this domain into elements;
• Setting up the vectors u, b and the matrix A with their correct lengths;

• Assembling (not yet solving!) the system Au = b. We compute locally on each mesh element the matrix
  entries and right-hand side entries on a master cell and insert the respective values at their global places
  in the matrix A;
• In the assembly, the integrals are evaluated using numerical quadrature in order to allow for more
  general expressions, for instance when the material parameter a is not constantly 1 or when higher-order
  polynomials need to be evaluated;
• Solution of the linear system Au = b (the implementation of the specific method is hidden though);
• Postprocessing of the solution: here, writing the solution data Uj , 1 ≤ j ≤ n, into a file that can be read
  by a graphical visualization program such as gnuplot, paraview, or visit.


[Plot: finite element solutions u(x) on Ω = (0, 1) for mesh sizes h = 0.5, 0.25, 0.125, 0.0625, 0.03125
(DoFs = 3, 5, 9, 17, 33), together with the line min u(x) = −0.125.]

Figure 18: Solution of the 1D Poisson problem with f = −1 using finite elements with various mesh sizes
           h. DoFs is the abbreviation for degrees of freedom; here, the number of support points xj . The
           dimension of the discrete space is the number of DoFs. For instance, for h = 0.5 we have 3 DoFs,
           thus dim(Vh ) = 3. The numerical solutions are computed with an adaptation of step-3
           in deal.II [6, 9]. Please notice that the picture norm is not a proof in the strict mathematical sense:
           that the purple and blue lines come closer and closer must be confirmed by error estimates
           as presented in Section 8.13, accompanied by numerical simulations in Section 8.16. Of course, for
           this 1D Poisson problem, we easily observe a limit case, but for more complicated equations it is
           often not visible whether the solutions do converge.

Level Elements DoFs


======================
1 2 3
2 4 5
3 8 9
4 16 17
5 32 33
======================

8.5 Algorithmic details
We provide some algorithmic details for implementing finite elements in a computer code. Of course, for
constant coefficients and a ‘nice’ domain Ω with uniform step lengths, we can immediately build the matrix,
right-hand side vector, and compute the solution. In practice, we are interested in more general formulations.
Again, we explain all details in 1D and partially follow Suttmeier’s lecture [124]. Further remarks and details
(in particular for higher dimensions) can be found in the literature provided in Section 1.

8.5.1 Assembling integrals on each cell K


In practice, one does not proceed as in Section 8.4.5 since this only makes sense for very simple problems.
A more general procedure is based on an element-wise consideration, which is motivated by:

    a(u, φ) = ∫_Ω u′ φ′ dx = Σ_{K∈Th} ∫_K u′ φ′ dx.

The basic procedure is:

Algorithm 8.36 (Basic assembling - robust, but partly inefficient). Let Ks , s = 0, . . . , n be an element and let
i and j be the indices of the degrees of freedom (namely the basis functions). The basic algorithm to compute
all entries of the system matrix and right-hand side vector is:

    for all elements Ks with s = 0, . . . , n
      for all DoFs i with i = 0, . . . , n + 1
        for all DoFs j with j = 0, . . . , n + 1

            aij += ∫_{Ks} φ′i (x) φ′j (x) dx

Here += means that entries with the same indices i, j are summed. For the right-hand side, we have

    for all elements Ks with s = 0, . . . , n
      for all DoFs i with i = 0, . . . , n + 1

          bi += ∫_{Ks} f (x)φi (x) dx.

Again, += means that only the bi with the same i are summed.
Remark 8.37. This algorithm is a bit inefficient since a lot of zeros are added. Knowing in advance the
polynomial degree of the shape functions allows one to add an if-condition so that only non-zero entries are
assembled.
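The following self-contained C++ sketch implements Algorithm 8.36 for linear elements on a uniform mesh of
Ω = (0, 1), using the exact element integrals behind (101). It is our own minimal illustration and not the
deal.II code used later; the boundary handling follows Remark 8.32.

// Sketch: Algorithm 8.36 for -u'' = 1 on (0,1) with P1 elements, uniform mesh.
#include <cstdio>
#include <vector>

int main()
{
  const int n_el  = 4;                  // elements K_0,...,K_3
  const int n_dof = n_el + 1;           // nodes x_0,...,x_4
  const double h  = 1.0 / n_el;

  std::vector<std::vector<double>> A(n_dof, std::vector<double>(n_dof, 0.0));
  std::vector<double> b(n_dof, 0.0);

  // Loop over elements; on K_s only phi_s and phi_{s+1} are non-zero.
  // Exact element integrals: local stiffness (1/h)[1 -1; -1 1],
  // local load for f = 1: h/2 per local node.
  for (int s = 0; s < n_el; ++s)
    for (int i = 0; i < 2; ++i)
    {
      b[s + i] += 0.5 * h;
      for (int j = 0; j < 2; ++j)
        A[s + i][s + j] += (i == j ? 1.0 : -1.0) / h;  // the '+=' of Algorithm 8.36
    }

  // Build in u(0) = u(1) = 0 as in Remark 8.32: clear boundary rows/columns.
  for (int k : {0, n_dof - 1})
  {
    for (int j = 0; j < n_dof; ++j)
      A[k][j] = A[j][k] = 0.0;
    A[k][k] = 1.0;
    b[k]    = 0.0;
  }

  for (int i = 0; i < n_dof; ++i)
  {
    for (int j = 0; j < n_dof; ++j)
      std::printf("%6.2f ", A[i][j]);
    std::printf("|  b[%d] = %5.3f\n", i, b[i]);
  }
  return 0;
}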

8.5.2 Example in R5
We illustrate the previous algorithm for a concrete example. Let us compute 1D Poisson on the five support
points xi , i = 0, 1, 2, 3, 4, equidistantly distributed, yielding a uniform mesh size h = xj − xj−1 . The
discrete space Vh is given by:

    Vh = {φ0 , φ1 , φ2 , φ3 , φ4 },   dim(Vh ) = 5.

The number of cells is #K = 4.


[Sketch: the five hat functions φ0 , . . . , φ4 with value 1 at their nodes, over the elements K0 , . . . , K3 with
nodes x0 , . . . , x4 .]
Figure 19: Basis functions φi , elements Ks and nodes xs used in Section 8.5.2.

It holds furthermore:
U ∈ R5 , A ∈ R5×5 , b ∈ R5 .

8.5.2.1 Element K0   We start with s = 0, namely K0 :

    a00^{s=0} = a00 = ∫_{K0} φ′0 φ′0 = 1/h ,     a01^{s=0} = a01 = ∫_{K0} φ′0 φ′1 = −1/h ,
    a02^{s=0} = a02 = ∫_{K0} φ′0 φ′2 = 0 ,       a03^{s=0} = a03 = ∫_{K0} φ′0 φ′3 = 0 ,
    a04^{s=0} = a04 = ∫_{K0} φ′0 φ′4 = 0 .

Similarly, we evaluate a1j , a2j , a3j , a4j , j = 0, . . . , 4. This yields the local matrix

    Ã^{s=0} = (1/h) [  1  −1
                      −1   1 ]

and the global matrix

    A^{s=0} = (1/h) [ +1  −1   0   0   0
                      −1  +1   0   0   0
                       0   0   0   0   0
                       0   0   0   0   0
                       0   0   0   0   0 ]

We also see that we add a lot of zeros when |i − j| > 1. For this reason, a good algorithm first designs the
sparsity pattern and determines the entries of A that are non-zero. This is clear due to the construction of
the hat functions, which only overlap on neighboring elements.

8.5.2.2 Element K1   Next, we increment to s = 1 and work on cell K1 . Here we again evaluate all aij and sum
them to the previously obtained values on K0 ; hence the += in the above algorithm.
This yields the local matrix

    Ã^{s=1} = (1/h) [  1  −1
                      −1   1 ]

and the global matrix

    A^{s=1} = (1/h) [ 0   0   0   0   0
                      0  +1  −1   0   0
                      0  −1  +1   0   0
                      0   0   0   0   0
                      0   0   0   0   0 ]

8.5.2.3 Assembling all elements   After having assembled the values on all four elements Ks , s = 0, 1, 2, 3,
we obtain the following system matrix:

    A = ( Σ_s aij^s )_{i,j=0,...,4} = Σ_{s=0}^{3} A^{(s)}
      = (1/h) [  1   −1     0      0     0
                −1   1+1   −1      0     0
                 0   −1    1+1    −1     0
                 0    0    −1     1+1   −1
                 0    0     0     −1     1 ]

To fix the homogeneous Dirichlet conditions, we can manipulate the matrix A directly or work with a ‘constraint
matrix’. We eliminate the entries of the rows and columns of the off-diagonals corresponding to the boundary
indices; here i = 0 and i = 4. Then:

    A = (1/h) [  h    0    0    0    0
                −1    2   −1    0    0
                 0   −1    2   −1    0
                 0    0   −1    2   −1
                 0    0    0    0    h ]

8.5.3 On the indices of A


We investigate whether

    aij = (∇φj , ∇φi ) = (∇φi , ∇φj ).

For (∇u, ∇φ) this result is true because of symmetry. For non-symmetric problems this result is no longer
true. Consider for instance

    (∇u, ∇φ) + (b · ∇u, φ),

where b ∈ Rd . Take now

    uh = Σ_{j=1}^{n} uj φj

and insert it for each test function φi , i = 1, . . . , n:

    Σ_{j=1}^{n} uj [(∇φj , ∇φ1 ) + (b · ∇φj , φ1 )],   i = 1,
    ...
    Σ_{j=1}^{n} uj [(∇φj , ∇φn ) + (b · ∇φj , φn )],   i = n.

As we already know and see here again, the test function determines the row in the system matrix A. For
this reason, in each entry the first index j determines the column and the second index i determines the row:

    aij = (∇φj , ∇φi ) + (b · ∇φj , φi ).

Since this problem is non-symmetric, we have aij ≠ aji because (b · ∇φj , φi ) ≠ (b · ∇φi , φj ). Here is a typical
code snippet from deal.II:

for (unsigned int i=0; i<dofs_per_cell; ++i)
  {
    ...
    for (unsigned int j=0; j<dofs_per_cell; ++j)
      {
        // test function index j determines the row, trial index i the column
        local_matrix(j,i) += (b * phi_grads_u[i] * phi_u[j] +
                              scalar_product(phi_grads_u[i], phi_grads_u[j])
                             ) * fe_values.JxW(q);
      }
  }

8.5.4 Numerical quadrature


As previously stated, the arising integrals may easily become difficult such that a direct integration is not
possible anymore:
• Non-constant right hand sides f (x) and non-constant coefficients α(x);
• Higher-order shape functions;
• Non-uniform step sizes, more general domains.
In modern FEM programs, Algorithm 8.36 is complemented by an alternative evaluation of the integrals using
numerical quadrature. The general formula reads:

    ∫_Ω g(x) dx ≈ Σ_{l=0}^{nq} ωl g(ql )

with quadrature weights ωl and quadrature points ql . The number of quadrature points is nq + 1.
Remark 8.38. The support points xi of the FE mesh and the quadrature points ql do not necessarily need to
be the same. For Gauss quadrature, they are indeed different. For lowest-order interpolatory quadrature rules
(box rule, trapezoidal rule) they coincide though.
We continue the above example by choosing the trapezoidal rule, which, in addition, should integrate the
arising integrals exactly:

    ∫_{Ks} g(x) dx ≈ hs Σ_{l=0}^{nq} ωl g(ql )

where hs is the length of the interval/element Ks , nq = 1 and ωl = 0.5. This brings us to:

    ∫_{Ks} g(x) dx ≈ hs (g(q0 ) + g(q1 ))/2 .

Applied to our matrix entries, we have on an element Ks :

    aii = ∫_{Ks} φ′i (x)φ′i (x) dx ≈ (hs /2) [φ′i (q0 )φ′i (q0 ) + φ′i (q1 )φ′i (q1 )] .

For the right-hand side, in the case f = 1, we can use for instance the mid-point rule:

    ∫_{Ki} φi (x) dx ≈ hi φi ((xi + xi−1 )/2) .

Remark 8.39. If f = f (x) with a dependency on x, we should use a quadrature formula that integrates the
function f (x)φi (x) as accurately as possible. The mid-point rule would read:

    ∫_{Ki} f (x)φi (x) dx ≈ hi f ((xi + xi−1 )/2) φi ((xi + xi−1 )/2) .

Remark 8.40. It is important to notice that the order of the quadrature formula must be sufficiently high
since otherwise the quadrature error dominates the convergence behavior of the FEM scheme.

We have now all ingredients to extend Algorithm 8.36:
Algorithm 8.41 (Assembling using the trapezoidal rule). Let Ks , s = 0, . . . , n be an element and let i and j be
the indices of the degrees of freedom (namely the basis functions). The basic algorithm to compute all entries
of the system matrix A and right-hand side vector b is:

    for all elements Ks with s = 0, . . . , n
      for all DoFs i with i = 0, . . . , n + 1
        for all DoFs j with j = 0, . . . , n + 1
          for all quad points l with l = 0, . . . , nq

              aij += (hs /2) φ′i (ql ) φ′j (ql )

where nq = 1. Here += means that entries with the same indices are summed. This is necessary because on
all cells Ks we assemble aij again.

    for all elements Ks with s = 0, . . . , n
      for all DoFs i with i = 0, . . . , n + 1
        for all quad points l with l = 0, . . . , nq

            bi += (hs /2) f (ql ) φi (ql )
Remark 8.42. In practice it is relatively easy to reduce the computational cost when a lot of zeros are
computed. Due to the choice of the finite element, we know which DoFs are active on a specific element and
can assemble only those shape functions. For linear elements, we could easily add an if-condition such that
only those aij with |i − j| ≤ 1 are assembled. We finally remark that these loops will definitely work, but there
is a second aspect that needs to be considered in practice, which is explained in Section 8.5.5.
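As a quick sanity check of Algorithm 8.41, the following sketch (our own illustration, with the quadrature
points q0 , q1 taken as the element endpoints) shows that the trapezoidal rule reproduces the exact stiffness
entries from (101), since the integrand φ′i φ′j is constant on each element:

// Sketch: trapezoidal rule on one element K_s for the stiffness entries.
// Since phi_i' phi_j' is constant on K_s, the rule is exact here.
#include <cstdio>

int main()
{
  const double h = 0.25;                      // element length h_s
  // derivatives of the two local hat functions on K_s (constant on K_s)
  const double dphi[2] = {-1.0 / h, 1.0 / h};

  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
    {
      // trapezoidal rule: (h_s/2) [g(q_0) + g(q_1)] with g = phi_i' phi_j'
      const double a_ij = 0.5 * h * (dphi[i] * dphi[j] + dphi[i] * dphi[j]);
      std::printf("a_%d%d = %8.4f  (exact: %8.4f)\n",
                  i, j, a_ij, (i == j ? 1.0 : -1.0) / h);
    }
  return 0;
}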

8.5.5 Details on the evaluation on the master element

In practice, all integrals are transformed onto a master element (or so-called reference element) and
evaluated there. This has the advantage that
• we need to evaluate all basis functions only once;
• numerical integration formulae are only required on the master element;
• independence of the coordinate system. For instance, quadrilateral elements in 2D change their form
  when the coordinate system is rotated [106].
The price to pay is to compute at each step a deformation gradient and a determinant, which is however easier
than evaluating all the integrals.
We consider the (physical) element Ki^{(hi)} = [xi , xi+1 ], i = 0, . . . , n, with the variable x ∈ Ki^{(hi)} and
hi = xi+1 − xi . Without loss of generality, we work in the following with the first element K0^{(h0)} = [x0 , x1 ]
and h = h0 = x1 − x0 , as shown in Figure 20. The generalization to s elements is briefly discussed in Section 8.5.6.

[Sketch: the interval from x0 to x1 .]

Figure 20: The mesh element K0^{(h0)} = [x0 , x1 ].

The element Ki^{(hi)} is transformed to the master element (i.e., the unit interval with mesh size h = 1)
K^{(1)} := [0, 1] with the local variable ξ ∈ [0, 1], as shown in Figure 21.
For the transformations, we work with the substitution rule (see Section 3.15), here in 1D, and in higher
dimensions with its analogon. We define the mapping

    Th : K^{(1)} → K0^{(h0)} ,
    ξ ↦ Th (ξ) = x = x0 + ξ · (x1 − x0 ) = x0 + ξh.


[Sketch: the interval from 0 to 1.]

Figure 21: The master element K^{(1)} = [0, 1].

While function values can be identified in both coordinate systems, i.e.,

    f (x) = f̂ (ξ),   f̂ defined on K^{(1)} ,

derivatives will be complemented by further terms due to the chain rule that we need to employ. Differentiating
x = Th (ξ) = x0 + ξ(x1 − x0 ) with respect to x yields

    1 = (x1 − x0 ) dξ/dx   ⇒   dx = (x1 − x0 ) dξ.

The volume (here in 1D: length) change can be represented by the determinant of the Jacobian of the trans-
formation:

    J := x1 − x0 = h.

These transformations follow exactly the way they are known in continuum mechanics. Thus for higher
dimensions we refer exemplarily to [76], where transformations between different domains can be constructed.
We now construct the inverse mapping

    Th−1 : K0^{(h0)} → K^{(1)} ,
    x ↦ Th−1 (x) = ξ = (x − x0 )/(x1 − x0 ) = (x − x0 )/h,

with the derivative

    ∂x Th−1 (x) = ξx = dξ/dx = 1/(x1 − x0 ).

A basis function φi^h on K0^{(h0)} reads:

    φi^h (x) := φi^1 (Th−1 (x)) = φi^1 (ξ)

and for the derivative we obtain with the chain rule:

    ∂x φi^h (x) = ∂ξ φi^1 (ξ) · ∂x Th−1 (x) = ∂ξ φi^1 (ξ) · ξx

with Th−1 (x) = ξ.
Example 8.43. We provide two examples. Firstly:

    ∫_{K h} f (x) φi^h (x) dx = ∫_{K^{(1)}} f (Th (ξ)) · φi^1 (ξ) · J dξ,                              (102)

and secondly,

    ∫_{K h} ∂x φi^h (x) · ∂x φj^h (x) dx = ∫_{K^{(1)}} (∂ξ φi^1 (ξ) · ξx ) · (∂ξ φj^1 (ξ) · ξx ) · J dξ.   (103)

We can now apply numerical integration using again the trapezoidal rule and obtain for the two previous
integrals:

    ∫_{K h} f (x) φi^h (x) dx ≈ Σ_{k=1}^{q} ωk f (Th (ξk )) φi^1 (ξk ) · J

and for the second example:

    ∫_{K h} ∂x φj^h (x) ∂x φi^h (x) dx ≈ Σ_{k=1}^{q} ωk (∂ξ φj^1 (ξk ) · ξx ) · (∂ξ φi^1 (ξk ) · ξx ) · J.

Remark 8.44. These final evaluations can again be realized by using Algorithm 8.41, but are now performed
on the unit cell K^{(1)} .
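The following self-contained sketch (our own illustration, not the deal.II code) carries out this procedure for
one element: the reference derivatives φ′0 = −1 and φ′1 = 1 are fixed once, and only ξx = 1/hs and Js = hs
depend on the element.

// Sketch: local stiffness of a physical element K_s = [x_s, x_{s+1}] evaluated
// on the master element K^(1) = [0,1]; quadrature points are the endpoints
// xi = 0, 1 (trapezoidal rule, exact for this constant integrand).
#include <cstdio>

int main()
{
  const double x_s = 0.3, x_s1 = 0.55;         // a sample (non-uniform) element
  const double h_s = x_s1 - x_s;

  const double omega[2]    = {0.5, 0.5};       // trapezoidal weights
  const double dphi_ref[2] = {-1.0, 1.0};      // phi_0' and phi_1' on K^(1)

  const double xi_x = 1.0 / h_s;               // d(xi)/dx, derivative of T_s^{-1}
  const double J    = h_s;                     // Jacobian determinant of T_s

  double a_loc[2][2] = {{0.0, 0.0}, {0.0, 0.0}};
  for (int q = 0; q < 2; ++q)                  // loop over quadrature points
    for (int i = 0; i < 2; ++i)
      for (int j = 0; j < 2; ++j)
        a_loc[i][j] += omega[q] * (dphi_ref[i] * xi_x)
                                * (dphi_ref[j] * xi_x) * J;

  // expected: (1/h_s) [ 1 -1 ; -1 1 ]
  std::printf("a_loc = [ %7.4f %7.4f ; %7.4f %7.4f ],  1/h_s = %7.4f\n",
              a_loc[0][0], a_loc[0][1], a_loc[1][0], a_loc[1][1], 1.0 / h_s);
  return 0;
}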

8.5.6 Generalization to s elements
We briefly set up the notation to evaluate the integrals for s elements:
• Let n be the index of the end point xn = b (b is the nodal point of the right boundary). Then, n is
  the number of elements (intervals in 1D), and n + 1 is the number of nodal points (degrees of freedom -
  DoFs) and the number of shape functions, respectively (see also Figure 19);
• Ks^{(hs)} = [xs , xs+1 ], s = 0, . . . , n − 1;
• hs = xs+1 − xs ;
• Ts : K^{(1)} → Ks^{(hs)} : ξ ↦ Ts (ξ) = xs + ξ(xs+1 − xs ) = xs + ξhs ;
• Ts−1 : Ks^{(hs)} → K^{(1)} : x ↦ Ts−1 (x) = (x − xs )/hs ;
• ∇Ts−1 (x) = ∂x Ts−1 (x) = 1/hs (in 1D);
• ∇Ts (ξ) = ∂ξ Ts (ξ) = (xs + ξhs )′ = hs (in 1D);
• Js := det(∇Ts (ξ)) = (xs + ξhs )′ = hs (in 1D).

8.5.7 Example: Section 8.5.2 continued using Sections 8.5.5 and 8.5.6
We continue Example 8.5.2 and build the system matrix A by using the master element. In the following, we
evaluate the Laplacian in weak form:

    aij^s = ∫_{Ks} φ′i (x) · φ′j (x) dx = ∫_{K^{(1)}} φ′i (ξ) ∂x Ts−1 (x) · φ′j (ξ) ∂x Ts−1 (x) Js dξ
          = ∫_{K^{(1)}} φ′i (ξ) (1/hs ) · φ′j (ξ) (1/hs ) hs dξ
          = (1/hs ) ∫_{K^{(1)}} φ′i (ξ) · φ′j (ξ) dξ.

We now compute once (and only once!) the required integrals on the unit element (i.e., the master element)
K^{(1)} . Here, ξ0 = 0 and ξ1 = 1 with h^{(1)} = ξ1 − ξ0 = 1, resulting in two shape functions:

    φ0 (ξ) = (ξ1 − ξ)/h^{(1)} ,    φ1 (ξ) = (ξ − ξ0 )/h^{(1)} .

The derivatives are given by:

    φ′0 (ξ) = −1/h^{(1)} = −1,    φ′1 (ξ) = 1/h^{(1)} = 1.

[Sketch: the two hat functions φ0 (ξ) and φ1 (ξ) on K^{(1)} with nodes ξ0 = 0 and ξ1 = 1.]

Figure 22: The master element K^{(1)} including nodal points and mesh size parameter h^{(1)} .

With these calculations, we compute all combinations on the master element required to evaluate the Lapla-
cian in 1D:

    a00^{(1)} = ∫_{K^{(1)}} φ′0 (ξ) · φ′0 (ξ) dξ = ∫_{K^{(1)}} (−1)² dξ = 1,
    a01^{(1)} = ∫_{K^{(1)}} φ′0 (ξ) · φ′1 (ξ) dξ = ∫_{K^{(1)}} (−1) · 1 dξ = −1,
    a10^{(1)} = ∫_{K^{(1)}} φ′1 (ξ) · φ′0 (ξ) dξ = ∫_{K^{(1)}} 1 · (−1) dξ = −1,
    a11^{(1)} = ∫_{K^{(1)}} φ′1 (ξ) · φ′1 (ξ) dξ = ∫_{K^{(1)}} 1 · 1 dξ = 1.

It is clear what happens on each element Ks using Algorithm 8.36:

    a00^s = (1/hs ) ∫_{K^{(1)}} φ′0 (ξ) · φ′0 (ξ) dξ = (1/hs ) ∫_{K^{(1)}} (−1)² dξ = 1/hs ,
    a01^s = (1/hs ) ∫_{K^{(1)}} φ′0 (ξ) · φ′1 (ξ) dξ = (1/hs ) ∫_{K^{(1)}} (−1) · 1 dξ = −1/hs ,
    a02^s = (1/hs ) ∫_{K^{(1)}} φ′0 (ξ) · φ′2 (ξ) dξ = 0,
    a03^s = (1/hs ) ∫_{K^{(1)}} φ′0 (ξ) · φ′3 (ξ) dξ = 0,
    a04^s = (1/hs ) ∫_{K^{(1)}} φ′0 (ξ) · φ′4 (ξ) dξ = 0.

Next, we increase i → i + 1, resulting in i = 1, and compute:

    a10^s , a11^s , a12^s , a13^s , a14^s .

We proceed until i = 4. This procedure is done for all elements Ks with s = 0, 1, 2, 3. As stated in Algorithm
8.36, the procedure is however inefficient since a lot of zeros are assembled. Knowing the structure of the
matrix (i.e., the sparsity pattern) upfront allows us to assemble only the non-zero entries.
Anyhow, the system matrix is composed of (the same as in Section 8.5.2):

    A = ( Σ_s aij^s )_{i,j=0,...,4}
      = [ 1/h0    −1/h0           0              0             0
          −1/h0   1/h0 + 1/h1    −1/h1           0             0
          0       −1/h1          1/h1 + 1/h2    −1/h2          0
          0        0             −1/h2          1/h2 + 1/h3   −1/h3
          0        0              0             −1/h3          1/h3  ]

This matrix is the same as if we had evaluated all shape functions on the physical elements with a possibly
non-uniform step size hs , s = 0, 1, 2, 3.

Remark 8.45. Good practical descriptions for higher dimensions can be found in [82][Chapter 12] and [120].

8.6 Quadratic finite elements: P2 elements
8.6.1 Algorithmic aspects
We explain in this section the idea of how to construct higher-order FEM. Specifically, we concentrate on P2
elements, which are quadratic on each element Ks . First we define the discrete space:

    Vh = {v ∈ C[0, 1] | v|Kj ∈ P2 }.

The space Vh is spanned by the basis functions:

    Vh = {φ0 , . . . , φn+1 , φ1/2 , . . . , φn+1/2 }.

The dimension of this space is dim(Vh ) = 2n + 3 (counting the two boundary functions). Here, we followed the
strategy of using the elements Ks as in the linear case and adding the mid-points in order to construct unique
parabolic functions on each Ks . The mid-points represent degrees of freedom just as the two edge points. For
instance, on each Kj = [xj , xj+1 ] we have as well xj+1/2 = xj + h/2, where h = xj+1 − xj . From this
construction it is plausible (not proven though!) that quadratic FEM have a higher accuracy than linear FEM.

[Sketch: the quadratic basis functions φi , φi+1/2 , φi+1 over [xi , xi+1 ] with nodes xi , xi+1/2 , xi+1 .]

The specific construction of shape functions can be done as shown in Section 8.4.3 or by using directly Lagrange
basis polynomials (see introductory lectures on numerical methods).
Definition 8.46 (P2 shape functions). On the master element K^{(1)} , we have

    φ0 (ξ) = 1 − 3ξ + 2ξ² ,
    φ1/2 (ξ) = 4ξ − 4ξ² ,
    φ1 (ξ) = −ξ + 2ξ² .

These basis functions fulfill the property:

    φi (ξj ) = 1 if i = j,   φi (ξj ) = 0 if i ≠ j,

for i, j = 0, 1/2, 1. On the master element, a function therefore has the representation:

    u(ξ) = Σ_{j=0}^{1} uj φj (ξ) + u1/2 φ1/2 (ξ).

This definition allows us to construct a global function from Vh :

Proposition 8.47. The space Vh is a subspace of V . Each function from Vh has a unique representation and
is defined by its nodal points:

    uh (x) = Σ_{j=0}^{n+1} uj φj (x) + Σ_{j=0}^{n} uj+1/2 φj+1/2 (x).

Remark 8.48. The assembling of A, u and b is done in a similar fashion as shown in detail for linear finite
elements.
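As a small check of Definition 8.46 (our own sketch), one can evaluate the three master-element shape
functions at the nodes ξ = 0, 1/2, 1 and verify both the Kronecker property and the partition of unity
φ0 + φ1/2 + φ1 ≡ 1:

// Sketch: verify the Kronecker property of the P2 master shape functions.
#include <cstdio>

double phi0 (double xi) { return 1.0 - 3.0 * xi + 2.0 * xi * xi; }
double phi12(double xi) { return 4.0 * xi - 4.0 * xi * xi; }
double phi1 (double xi) { return -xi + 2.0 * xi * xi; }

int main()
{
  const double nodes[3] = {0.0, 0.5, 1.0};     // xi_0, xi_{1/2}, xi_1
  for (double xi : nodes)
    std::printf("xi = %.1f : phi0 = %4.1f  phi1/2 = %4.1f  phi1 = %4.1f  sum = %4.1f\n",
                xi, phi0(xi), phi12(xi), phi1(xi),
                phi0(xi) + phi12(xi) + phi1(xi));
  return 0;
}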

8.6.2 Numerical test: 1D Poisson using quadratic FEM (P2 elements)
We continue Section 8.4.9.

[Plot: quadratic finite element solutions u(x) on Ω = (0, 1) for mesh sizes h = 0.5, 0.25, 0.125, 0.0625,
0.03125 (DoFs = 5, 9, 17, 33, 65), together with the line min u(x) = −0.125.]

Figure 24: Solution of the 1D Poisson problem with f = −1 using quadratic finite elements with various mesh
           sizes h. DoFs is the abbreviation for degrees of freedom; here, the number of support points xj . The
           dimension of the discrete space is the number of DoFs. For instance, for h = 0.5 we have 2 mesh
           elements and 5 DoFs, thus dim(Vh ) = 5. The numerical solutions are computed with an adaptation
           of step-3 in deal.II [6, 9]. Please notice that the picture norm is not a proof in the strict mathematical
           sense: that the purple and blue lines come closer and closer must be confirmed by error estimates as
           presented in Section 8.13, accompanied by numerical simulations in Section 8.16. Of course, for this
           1D Poisson problem, we easily observe a limit case, but for more complicated equations it is often not
           visible whether the solutions do converge.

Remark 8.49. Be careful when plotting the solution using higher-order FEM. Sometimes, the output data is
only written into the nodal values defining the physical elements. In this case, a quadratic function looks
visually the same as a linear function. Plotting in all degrees of freedom, on the other hand, would also show
visually that higher-order FEM are employed.

Level Elements DoFs


======================
1 2 5
2 4 9
3 8 17
4 16 33
5 32 65
======================

8.7 Galerkin orthogonality, a geometric interpretation of the FEM, and a first error
estimate
The finite element method has a remarkable geometric interpretation, which is finally the key to solving
and analyzing a given PDE posed in an infinite-dimensional vector space V using the finite-dimensional space
Vh that we have previously constructed from a practical point of view. The techniques are based on results
from functional analysis.
The goal is to study the accuracy of our FEM scheme in terms of the discretization error u − uh . Here,
u ∈ V is the exact (unknown) solution and uh ∈ Vh our finite element solution. The key is to use first best
approximation results and later (see the subsequent sections) interpolation estimates.
We have

(u0 , φ0 ) = (f, φ) ∀φ ∈ V,
(u0h , φ0h ) = (f, φh ) ∀φh ∈ Vh .

Taking in particular only discrete test functions from Vh ⊂ V and subtraction of both equations yields:
Proposition 8.50 (Galerkin orthogonality). It holds:

((u − uh )0 , φ0h ) = 0 ∀φh ∈ Vh ,

or in the more general notation:


a(u − uh , φh ) = 0 ∀φh ∈ Vh .
This also shows that we deal with a projection.
Proof. Taking φh ∈ Vh in both previous equations yields:

(u0 , φ0h ) − (u0h , φ0h ) = (f, φh ) − (f, φh ).

Taking both equations, in the discrete space Vh yields

(f, φh ) − (f, φh ) = 0.

[Sketch: the error u − uh standing orthogonal, in the a-inner product, on the subspace Vh containing uh ;
a(u − uh , φh ) = 0.]

Figure 25: Illustration of the Galerkin orthogonality.

Since (·, ·) is a scalar product (for the definition we refer the reader to Definition 7.7), here in the L2 sense,6
Galerkin orthogonality immediately yields a geometric interpretation of the finite element method. The error
(measured in terms of the first derivative) stands orthogonal to all elements φh of Vh . Therefore, the solution
uh is the best approximation since it has the smallest distance to u.
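To make the geometric interpretation tangible, the following sketch (our own illustration) checks the
orthogonality numerically for −u′′ = 1 on Ω = (0, 1) with homogeneous Dirichlet conditions: the exact solution
is u(x) = x(1 − x)/2, and, by a classical 1D fact, the P1 finite element solution coincides here with the nodal
interpolant of u, so u′h is the elementwise difference quotient; all products ((u − uh )′ , φ′i ) then vanish up to
round-off.

// Sketch: numerically verify Galerkin orthogonality ((u-u_h)', phi_i') = 0
// for -u'' = 1 on (0,1), u(0)=u(1)=0, exact u(x) = x(1-x)/2, P1 elements.
#include <cstdio>

double du(double x) { return 0.5 - x; }        // u'(x)

int main()
{
  const int n_el = 8;
  const double h = 1.0 / n_el;

  for (int i = 1; i < n_el; ++i)               // interior hat functions
  {
    double val = 0.0;
    // support of phi_i: the elements [x_{i-1}, x_i] and [x_i, x_{i+1}]
    for (int e = i - 1; e <= i; ++e)
    {
      const double xl = e * h, xr = xl + h;
      // u_h' on this element: difference quotient of the nodal interpolant
      const double uhp = (0.5 * xr * (1.0 - xr) - 0.5 * xl * (1.0 - xl)) / h;
      const double dphi = (e == i - 1) ? 1.0 / h : -1.0 / h;
      // midpoint rule, exact since the integrand is linear on each element
      const double xm = 0.5 * (xl + xr);
      val += (du(xm) - uhp) * dphi * h;
    }
    std::printf("((u-u_h)', phi_%d') = %9.2e\n", i, val);
  }
  return 0;
}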
6 We later define the L2 space more precisely. The reader can think of square-integrable functions defined on Ω with

    L2 (Ω) := {v | ∫ v² dx < ∞}.

Proposition 8.51 (Best approximation in finite-dimensional subspaces). Let U be a finite-dimensional sub-
space of V (for instance U := Vh , our finite element function space). Then, there exists for each element in V
a best approximation in U .
Proof. Let u ∈ V . Choosing a minimizing sequence7 (vn )n∈N from U to approximate u brings us to

    ku − vn k → d   for n → ∞,

where d := inf_{v∈U} ku − vk. Moreover, we can estimate as follows:

    kvn k = kvn k − kuk + kuk
          ≤ |kvn k − kuk| + kuk
          = |kuk − kvn k| + kuk
          ≤ ku − vn k + kuk.

Thus, the sequence (vn )n∈N is bounded. Since U is finite dimensional, there exists a converging subsequence
(vnl )l∈N (a result from functional analysis, e.g., [134]) with

    lim_{l→∞} vnl = v ∈ U.

It follows that

    ku − vk = lim_{l→∞} ku − vnl k = inf_{φ∈U} ku − φk,

which shows the assertion.


In the following, we use the fact that the L2 scalar product induces a norm (this is due to the fact that L2
is a Hilbert space; see Section 7.8):

    kwk = √(w, w) = ( ∫ w² dx )^{1/2} .

We recall Cauchy’s inequality, which we need in the following:

    |(v, w)| ≤ kvk kwk.

We show that uh is a best approximation to u.


Theorem 8.52 (Céa lemma - first version). For any vh ∈ Vh it holds

    k(u − uh )′ k ≤ k(u − vh )′ k.

Proof. We estimate as follows:

    k(u − uh )′ k² = (u′ − u′h , u′ − u′h )
                   = a(u − uh , u − uh )
                   = a(u − uh , u − vh + vh − uh )
                   = a(u − uh , u − vh ) + a(u − uh , vh − uh )    [second term = 0 by Galerkin orthogonality]
                   ≤ k(u − uh )′ k k(u − vh )′ k.

We can now divide by k(u − uh )′ k (if this term is zero, the assertion is trivial) and obtain

    k(u − uh )′ k ≤ k(u − vh )′ k,

which shows the assertion. We finally remark that this proof technique (using Galerkin orthogonality at some
point) is essential in finite element error analysis.
7 Working with minimizing sequences is an important concept in optimization and calculus of variations when minimizers to
functionals need to be established. If (vn ) is a minimizing sequence and the functional F (·) is bounded from below, we obtain
that F (vn ) < ∞ (assuming that F is proper) and limn F (vn ) = inf u F (u).

Remark 8.53. This theorem is a first version of Céa’s Lemma, which will later be augmented with certain
constants, yielding a more precise result. However, this result itself is extremely important as it shows the best
approximation property: we choose the best approximation uh from the space Vh . In particular, any inter-
polation error estimate is based on the best approximation property. Specifically, vh will be chosen as an
appropriate interpolant ih u ∈ Vh , from which we then can explicitly calculate the discrete functions and obtain
a quantitative error estimate in terms of the mesh size h. Secondly, we also see (trivial, but nevertheless worth
mentioning) that if u is already an element of Vh , the error is identically zero, which is clear from geometrical
considerations.

The previous result can be extended as follows: not only does the Galerkin orthogonality yield the best
approximation property, but the converse holds as well, namely that the best approximation property yields

    ((u − uh )′ , φ′h ) = 0   ∀φh ∈ Vh .
Proposition 8.54. Let U be a linear subspace of a Pre-Hilbert space V . An element v ∈ U is best approxi-
mation to u ∈ V if and only if
a(u − v, φ) = 0 ∀φ ∈ U.
For each u ∈ V there exists at most one best approximation with respect to U .

Proof. The backward part follows from Theorem 8.52. The forward direction follows from Section 7.3.

8.8 Neumann & Robin problems


8.8.1 Robin boundary conditions
We briefly introduce and discuss well-posedness of Neumann and Robin boundary value problems. Let Ω =
(0, 1) and let f ∈ C(Ω) and β > 0. We consider the following problem:
Formulation 8.55 ((D)). Find u ∈ C 2 (Ω̄) such that

    −u′′ = f   in Ω,
    βu(0) − u′ (0) = α0 ,                               (104)
    βu(1) + u′ (1) = α1 .

Remark 8.56. For β = 0 we would obtain a pure Neumann (or so-called traction) problem.

To derive a variational formulation, a first step is the definition of an appropriate function space V . We
recall that boundary conditions on function values (such as Dirichlet conditions or the terms u(0) and u(1) in
the Robin-type conditions) are built into V . If we change the problem statement (thus other boundary
conditions), the definition of V will change as well. Here, we define

    V = C 1 (Ω) ∩ C(Ω̄).

In the second step, we can state the variational form of (104):

Formulation 8.57 ((V)). Find u ∈ V such that

    a(u, φ) = l(φ)   ∀φ ∈ V

where

    a(u, φ) = ∫_Ω u′ φ′ dx + βu(1)φ(1) + βu(0)φ(0),
    l(φ) = ∫_Ω f φ dx + α1 φ(1) + α0 φ(0).              (105)

Proposition 8.58. For u ∈ C 2 (Ω̄), it holds


(D) ↔ (V ).

Proof. First, we show (D) → (V ). Multiplication with a test function, integration, and integration by
parts yield

    −u′′ = f
    ⇒ (−u′′ , φ) = (f, φ)
    ⇒ (u′ , φ′ ) − [u′ φ]₀¹ = (f, φ)
    ⇒ (u′ , φ′ ) − [u′ (1)φ(1) − u′ (0)φ(0)] = (f, φ)
    ⇒ (u′ , φ′ ) − (α1 − βu(1))φ(1) + (βu(0) − α0 )φ(0) = (f, φ)
    ⇒ (u′ , φ′ ) + βu(1)φ(1) + βu(0)φ(0) = (f, φ) + α1 φ(1) + α0 φ(0),

where the left-hand side is a(u, φ) and the right-hand side is l(φ).
We show the other direction (V ) → (D). Let u be a solution to the variational problem. After backward
integration by parts, we obtain:

    (−u′′ − f, φ) = −[u′ (x)φ(x)]_{x=0}^{x=1} + [α0 − βu(0)]φ(0) + [α1 − βu(1)]φ(1)
                  = [α0 + u′ (0) − βu(0)]φ(0) + [α1 − u′ (1) − βu(1)]φ(1).

We work in a two-step procedure and discuss first the domain integral terms and in a second step the boundary
terms. First, we work with the test space Cc∞ (see Definition 8.10). For φ ∈ Cc∞ it certainly follows that
φ ∈ C 1 ∩ C 0 . We note that Cc∞ is a dense subspace of V (see Definition 7.22) and therefore an admissible
choice as test space. Using Cc∞ , the boundary terms vanish and we have

    (−u′′ − f, φ) = 0 ∀φ ∈ Cc∞   ⇒   −u′′ − f = 0   ⇒   −u′′ = f.

Since the domain terms fulfill the equation, the boundary terms remain:

    [α0 + u′ (0) − βu(0)]φ(0) + [α1 − u′ (1) − βu(1)]φ(1) = 0.

We choose special test functions φ ∈ V with φ(0) = 1 and φ(1) = 0, yielding

    α0 + u′ (0) − βu(0) = 0

and thus the first boundary condition. Vice versa, we choose φ(0) = 0 and φ(1) = 1 and obtain

    α1 − u′ (1) − βu(1) = 0,

which is the second boundary condition. In summary, we have obtained the strong formulation and everything
is shown.
Proposition 8.59. For β > 0 the solutions to (D) and (V ) are unique.

Proof. We assume that there are two solutions u1 and u2 and we define the difference function w = u1 − u2 .
Thanks to linearity it holds:

    a(w, φ) = 0   ∀φ ∈ V.

We recall that a(·, ·) has been defined in (105). As test function, we choose φ := w:

    a(w, w) = 0.

In detail:

    ∫_Ω (w′ )² dx + βw(1)² + βw(0)² = 0.

Recalling that β > 0, the previous equation can only be satisfied when w(1) = w(0) = ∫_Ω (w′ )² dx = 0. In
particular the integral yields that

    w′ ≡ 0   ⇒   w = constant.

Since the solutions u1 and u2 are continuous (because we work with V - the reader may recall its definition!),
the difference function w is continuous as well. Since on the boundary we have w(0) = w(1) = 0, it follows
from the continuity of w that necessarily

    w ≡ 0   ⇒   u1 − u2 = 0.

8.8.2 Neumann boundary conditions
Working with Neumann boundary conditions, we have derivative information on the boundaries.

Formulation 8.60 ((D)). Let Ω = (0, 1). Find u ∈ C 2 (Ω̄) such that

    −u′′ = f      in Ω,
    −u′ (0) = α0 ,                                      (106)
    u′ (1) = α1 .

The minus sign in front of u′ (0) is only for convenience.

Here, the solution is not unique anymore and only determined up to a constant: if u is a solution, then so is

    u + c,   c ∈ R.

It is trivial to see that such a shifted function satisfies Formulation 8.60. One possibility is to add an
additional condition to fix the solution. A common choice is to prescribe the mean value:

    ∫_Ω u(x) dx = 0.                                    (107)

Figure 26: Sketch of possible solutions to the Poisson problem with Neumann boundary conditions. The red
           solution fulfills the normalization condition (107). But any other constant to fix the solution would
           work as well, such that one of the black solutions would be feasible.

A variational formulation of (106) on V = C 1 (Ω) ∩ C(Ω̄) reads:

Formulation 8.61 ((V)). Find u ∈ V such that

    a(u, φ) = l(φ)   ∀φ ∈ V

where

    a(u, φ) = ∫_Ω u′ φ′ dx,
    l(φ) = ∫_Ω f φ dx + α1 φ(1) + α0 φ(0).              (108)

Proposition 8.62. If Formulation 8.60 has a solution u ∈ C 2 (Ω̄), it is unique after applying the normalization
condition (107). Moreover, a compatibility condition, which is sufficient and necessary, must hold:

    ∫_0^1 f (x) dx + α0 + α1 = 0.

Proof. More a sketch rather than a rigorous result! Using integration by parts in 1D (the Gauss divergence
theorem, to be more precise), it holds necessarily

    ∫_Ω f dx = − ∫_Ω u′′ dx = − ∫_Ω u′′ · 1 dx = −[u′ (x)]₀¹ = −[u′ (1) − u′ (0)] = −α0 − α1 .

Consequently, we have shown the compatibility condition:

    ∫_Ω f dx + α0 + α1 = 0.

This condition is also sufficient to obtain a solution:

    ∫_Ω f dx + α0 + α1 = 0
    ⇔ ∫_Ω f · 1 dx + α0 · 1 + α1 · 1 = 0
    ⇔ ∫_Ω f φ dx + α0 φ(0) + α1 φ(1) = 0   ∀φ = const,

where the left-hand side is l(φ), and

    ∫_Ω u′ φ′ dx = 0 = a(u, φ)   ∀φ = const.

Therefore, a(u, φ) = l(φ).
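As a small illustration (our own sketch): for concrete data one can check the compatibility condition
numerically before attempting to solve. For f (x) = sin(πx) we have ∫_0^1 f dx = 2/π, so α0 + α1 must equal
−2/π; below we take α0 = α1 = −1/π.

// Sketch: check the Neumann compatibility condition int_0^1 f dx + a0 + a1 = 0
// numerically with the composite trapezoidal rule, here for f(x) = sin(pi x).
#include <cstdio>
#include <cmath>

int main()
{
  const double pi = 3.14159265358979323846;
  const double alpha0 = -1.0 / pi, alpha1 = -1.0 / pi; // chosen to satisfy the condition

  const int n = 1000;
  const double h = 1.0 / n;
  double integral = 0.0;
  for (int i = 0; i < n; ++i)
  {
    const double xl = i * h, xr = xl + h;
    integral += 0.5 * h * (std::sin(pi * xl) + std::sin(pi * xr));
  }

  // a value near zero indicates that the pure Neumann problem is solvable
  std::printf("int f dx + alpha0 + alpha1 = %9.2e\n", integral + alpha0 + alpha1);
  return 0;
}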

8.9 Variational formulations of elliptic problems in higher dimensions


In the previous sections of this chapter, we have restricted ourselves to 1D derivations in order to show the
principal mechanisms. In the forthcoming sections, we extend everything to Rn .
However, going from 1D to higher dimensions requires a deeper mathematical theory because the
solutions to elliptic problems (e.g., Poisson) can be irregular such that strong solutions in classical function
spaces may become infeasible. Consequently, we need to enlarge the sets of adequate solutions.
To this end, one often restricts oneself to finding only weak solutions. For the integrals, however, the classical
Riemann integral is again too restrictive. For this reason, the Lebesgue integral must be adopted. We
refer the reader to Section 7.1.3. A rich theory has been developed in the literature that is the subject of
classical lectures on the theory of PDEs and also functional analysis. We have gathered the most important
aspects in Chapter 7.
In all this, the principal challenges are:
1. Singularities in the domain (not a problem in 1D!), in the equations, or in the data become more severe in
   higher dimensions;
2. The natural function spaces (Sobolev spaces), here H 1 (Ω) with Ω ⊂ Rd , d ≥ 2, for solving the weak
   problem cannot be embedded into classical continuous spaces. This causes challenges in existence theories
   (for linear problems Lax-Milgram can be adopted) and also in the prescription of boundary and initial
   data. These have to be formulated and interpreted in the Lebesgue integral sense.

8.9.1 Model problem
The model problem is

Formulation 8.63 (Model problem). Let Ω ⊂ Rn . Find u ∈ C 2 (Ω) ∩ C(Ω̄) such that

    −∆u = f   in Ω,
    u = 0     on ∂Ω.

The Laplace operator ∆ has been previously defined in Definition 3.4.

8.9.2 Comments on the domain Ω and its boundaries ∂Ω


We recall the basic definitions of a domain and its boundaries. Let Ω ⊂ Rn be a domain (its properties are
specified in a minute) and ∂Ω its boundary. The closure of Ω is denoted by Ω̄. The boundary may be split
into non-overlapping parts
∂Ω = ∂ΩD ∪ ∂ΩN ∪ ∂ΩR ,
where ∂ΩD represents a Dirichlet boundary, ∂ΩN represents a Neumann boundary and ∂ΩR represents a Robin
boundary.

• By C(Ω) we denote the function space of continuous functions in Ω.


• Let k ≥ 0 (an integer). Then we denote by C k (Ω) the functions that are k-times continuously differen-
tiable in Ω.
• By C ∞ (Ω) we denote the space of infinitely differentiable functions.
• By Cc∞ (Ω) we denote the space of infinitely differentiable functions with compact support.

In order to be able to define a normal vector n, we assume that Ω is sufficiently regular; here, Ω is of class
C 1 (we also say ∂Ω ∈ C 1 , or Ω has a Lipschitz boundary). The normal vector points outward of the domain.
Remark 8.64. Depending on the properties of Ω (bounded or not), its regularity, and also the properties of its
boundary, one can obtain very delicate mathematical results. We refer exemplarily to Grisvard [63] or Wloka
[145]. We always assume in the remainder that Ω is open and bounded and sufficiently regular.

8.9.3 Integration by parts and Green’s formulae


From the divergence Theorem 3.33, we obtain immediately:
Proposition 8.65 (Integration by parts). Let u, v ∈ C 1 (Ω̄). Then:

    ∫_Ω uxi v dx = − ∫_Ω u vxi dx + ∫_{∂Ω} u v ni ds,   for i = 1, . . . , n.

In compact notation:

    ∫_Ω ∇u v dx = − ∫_Ω u ∇v dx + ∫_{∂Ω} u v n ds.

Proof. Apply the divergence theorem to uv. Exercise.


We now obtain some further results, which are very useful, but all are based directly on integration by parts.
For this reason, it is more important to know the divergence theorem and the integration by parts formula.
Proposition 8.66 (Green’s formulas). Let u, v ∈ C 2 (Ω̄). Then:

    ∫_Ω ∆u dx = ∫_{∂Ω} ∂n u ds,
    ∫_Ω ∇u · ∇v dx = − ∫_Ω ∆u v dx + ∫_{∂Ω} v ∂n u ds.

Proof. Apply integration by parts.

8.9.4 Variational formulations
We recall the details of the notation that we shall use in the following. In Rn , it holds:

    (∇u, ∇φ) = ∫_Ω ∇u · ∇φ dx = ∫_Ω (∂1 u ∂1 φ + . . . + ∂n u ∂n φ) dx.

We have
Proposition 8.67. Let u ∈ C 2 (Ω̄) and f ∈ C(Ω̄). Let V be defined by

    V = {φ ∈ C 1 (Ω̄) | φ = 0 on ∂Ω}.

The function u is a solution to (D) if and only if u ∈ V such that

    (∇u, ∇φ) = (f, φ)   ∀φ ∈ V.

Proof. The proof is the same as in the 1D case. Let u be the solution of (D). We multiply the PDE by a test
function φ ∈ V , integrate and perform integration by parts:

    −(∆u, φ) = − ∫_{∂Ω} ∂n u φ ds + (∇u, ∇φ),

where ∂n u = ∇u · n and n is the normal vector as before. Since φ = 0 on ∂Ω, we obtain

    −(∆u, φ) = (∇u, ∇φ),

thus

    (∇u, ∇φ) = (f, φ).

In the backward direction we show (V ) → (D):

    (∇u, ∇φ) = −(∆u, φ) + ∫_{∂Ω} ∂n u φ ds.

As before, φ = 0 on ∂Ω and we get

    (∇u, ∇φ) = −(∆u, φ),

thus

    −(∆u, φ) = (f, φ).

Consequently,

    (−∆u − f, φ) = 0.

Since −∆u − f is continuous on Ω, we can employ the fundamental lemma of calculus of variations in higher
dimensions, from which we obtain

    −∆u − f = 0, i.e., −∆u = f   ∀x ∈ Ω.

We recover the boundary conditions since we started from u ∈ V , which includes that u = 0 on ∂Ω.
Proposition 8.68 (Fundamental lemma of calculus of variations in Rn ). Let Ω ⊂ Rn be an open domain and
let w be a continuous function. If

    ∫_Ω w(x)φ(x) dx = 0   ∀φ ∈ Cc∞ (Ω),

then w ≡ 0 in Ω.
Proof. Similar to the 1D version. See also Theorem 6.3.2, p. 314 in [36].
Remark 8.69. We have not yet shown well-posedness of (D) and (V ); this is the topic of the next section. So
far we have only shown that if solutions to (D) and (V ) exist, then these solutions are equivalent.

8.10 The Lax-Milgram lemma
We present in this section a result that ensures well-posedness (existence, uniqueness, stability) of linear
variational problems.
Formulation 8.70 (Abstract model problem). Let V be a Hilbert space with norm || · ||V . Find u ∈ V such
that
a(u, φ) = l(φ) ∀φ ∈ V.
Definition 8.71 (Assumptions). We suppose:

1. l(·) is a bounded linear form:


|l(u)| ≤ Ckuk for all u ∈ V.

2. a(·, ·) is a bilinear form on V × V and continuous:

|a(u, v)| ≤ γkukV kvkV , γ > 0, ∀u, v ∈ V.

3. a(·, ·) is coercive (or V -elliptic):

a(u, u) ≥ αkuk2V , α > 0, ∀u ∈ V.

Lemma 8.72 (Lax-Milgram for convex sets). Let W be a closed and convex subset of a Hilbert space V . Let
a(·, ·) : V × V → R be a continuous, symmetric, V -elliptic bilinear form. Then, for each l ∈ V ∗ the variational
problem

    a(u, φ) = l(φ)   ∀φ ∈ V

has a unique solution u ∈ W . Moreover, we have the stability estimate

    kuk ≤ (1/α) klkV ∗ ,   with   klkV ∗ := sup_{φ≠0} |l(φ)| / kφkV .
Proof. The proof contains typical arguments often used in optimization and calculus of variations. Conse-
quently, we work with the minimization formulation (M) rather than (V) and finally transfer the result to (V)
as in the equivalent characterizations. The goal is to construct a sequence of solutions and to pass to the limit.
Then we must show that the limit is contained in the space V .
Existence of a minimum
We define J(·) := (1/2) a(·, ·) − l(·), which is possible since a(·, ·) is symmetric. We choose again a minimizing
sequence (un )n∈N in V (recall the best approximation proofs and the remarks regarding minimizing sequences
therein), characterized by the property

    lim_{n→∞} J(un ) = d := inf_{φ∈V} J(φ).

Hence

    2J(φ) = a(φ, φ) − 2l(φ) ≥ αkφk²V − 2klk kφkV ,                  (109)

where, in the last term, the operator norm |l(φ)| ≤ klk kφkV is used. In particular, the linear functional l is
bounded and consequently klkV ∗ < ∞. This shows d > −∞ because

    (αkφk²V − 2klk kφkV ) → ∞   for kφkV → ∞.

Cauchy property
It holds
a(u − ϕ, u − ϕ) + a(u + ϕ, u + ϕ) = 2a(u, u) + 2a(ϕ, ϕ). (110)

Our previous work is used in the next estimation. We begin with the V -ellipticity of a:

    αkun − um k²V ≤ a(un − um , un − um )
                  = 2a(un , un ) + 2a(um , um ) − a(un + um , un + um )          [by (110)]
                  = 2a(un , un ) + 2a(um , um ) − 4a((un + um )/2, (un + um )/2).

Adding zero in the form of

    −4l(un ) − 4l(um ) + 4l(un ) + 4l(um ) = −4l(un ) − 4l(um ) + 8l((un + um )/2)

yields

    αkun − um k²V ≤ 2a(un , un ) + 2a(um , um ) − 4a((un + um )/2, (un + um )/2)     (111)
                    − 4l(un ) − 4l(um ) + 8l((un + um )/2)                           (112)
                  = 2a(un , un ) − 4l(un ) + 2a(um , um ) − 4l(um )                  (113)
                    − 4a((un + um )/2, (un + um )/2) + 8l((un + um )/2)              (114)
                  = 4J(un ) + 4J(um ) − 8J((un + um )/2).                            (115)

Hence J((un + um )/2) ≥ d (since W is assumed convex here) and

    lim_{n→∞} J(un ) + lim_{m→∞} J(um ) = d + d = 2d.

Therefore, in the limit:

    αkun − um k²V ≤ 4d + 4d − 8d = 0,   n, m → ∞,

which shows the Cauchy property in V :

    αkun − um k²V → 0   (n, m → ∞).

Therefore, the limit u exists in V (recall that V is a Hilbert space and therefore complete - each Cauchy
sequence converges) and consequently in W since W is closed in V . It holds limn→∞ un = u. Since a(·, ·)
and l(·) are bounded (i.e., continuous), the functional J(·) is continuous as well, and it follows:

    lim_{n→∞} un = u   ⇒   J(u) = lim_{n→∞} J(un ) = inf_{v∈W} J(v) = d,

showing that the limit u exists and is an element of W .


Remark 8.73. Be careful when the functional J(·) is not of the above quadratic form anymore or when we
deal with norms; e.g., in optimal control problems. Here, the continuity of J(·) is not sufficient to prove
convergence. One needs to go to weak convergence and deeper results in functional analysis. A nice account
of quadratic optimization problems in Hilbert spaces, recapitulating the basic ingredients of functional analysis,
is provided in [131][Section 2.4 and Section 2.5].
(M) yields (V)
It remains to show that the minimum of (M) is equivalent to the solution of (V). The following strategy for
the minimization problem is used:

    J(u) = min_{φ∈V} J(φ)   ⇔   J(u) ≤ J(u + εφ)  ∀φ ∈ V, ε ∈ R.        (116)
A minimum is achieved by differentiating with respect to ε and setting the resulting equation to zero:

    (d/dε) J(u + εφ) |_{ε=0} = 0.                                       (117)

The assertion is shown with the help of the equivalence of the variational problem (V ) and the previous
minimization formulation (116). Thus,

    a(u, φ) = l(φ)   ∀φ ∈ V.                                            (118)

Uniqueness.
Consider two solutions u1 , u2 ∈ V of the variational equation (118), which imply

    a(u1 , φ) = l(φ)   and   a(u2 , φ) = l(φ)   for all φ ∈ V.

Subtraction yields

    a(u1 − u2 , φ) = 0   for all φ ∈ V.

Choosing φ = u1 − u2 and using the V -ellipticity gives us

    αku1 − u2 k²V ≤ a(u1 − u2 , u1 − u2 ) = 0   ⇔   u1 − u2 = 0   ⇔   u1 = u2 .

Stability.
Follows from consideration of

    αkuk²V ≤ a(u, u) = l(u) ≤ CkukV .

Hence

    kukV ≤ C/α   with   C = klkV ∗ = sup_{u≠0} |l(u)| / kukV .

All assertions were shown.


Remark 8.74. For a symmetric bilinear form, i.e.,

    a(u, φ) = a(φ, u),

the Riesz representation Theorem 8.75 can be applied. The key observation is that in this case a(·, ·) defines
a scalar product, and consequently u ↦ √a(u, u) defines a norm, because we work in a Hilbert space. Then,
there exists a unique element u ∈ V such that

    l(φ) = a(u, φ)   ∀φ ∈ V.

The Lax-Milgram lemma is in particular useful for non-symmetric bilinear forms.


Theorem 8.75 (Riesz representation theorem). Let ˆl : V → R be a continuous linear functional on a Hilbert
space V . Then there exists a unique element w ∈ V such that

(w, φ)V = ˆl(φ) ∀φ ∈ V.

Moreover, this operation is an isometry, that is to say, kˆlk∗ = kwkV .


Proof. For a proof of the Riesz representation theorem, we refer to [27].
Proposition 8.76. In the symmetric case, a(u, φ) = (∇u, ∇φ) defines a scalar product on the Hilbert space
V . Moreover, the right hand side functional l(·) is linear and continuous. Thus, the assumptions of the Riesz
representation theorem are fulfilled and there exists a unique solution to the Poisson problem.
Exercise 8. Show that the assumptions of the Lax-Milgram lemma are verified for the 1D Poisson model
problem. Thus, the variational formulation is well-posed (existence, uniqueness and stability of a solution).

Lemma 8.77 (Lax-Milgram - general version). Let a(·, ·) : V × V → R be a continuous, V -elliptic bilinear
form. Then, for each l ∈ V ∗ the variational problem

    a(u, φ) = l(φ)   ∀φ ∈ V

has a unique solution u ∈ V . Moreover, we have the stability estimate

    kuk ≤ (1/α) klkV ∗ ,   with   klkV ∗ := sup_{φ≠0} |l(φ)| / kφkV .

Proof. We recall upfront from before (Riesz): for all l ∈ V ∗ there exists u ∈ V such that

    (u, φ) = l(φ)   ∀φ ∈ V.

We work as follows.
Step 1 (Existence of an operator A)
For each fixed u ∈ V the mapping v ↦ a(u, v) is a bounded linear functional on V . Then with Riesz: there
exist Au ∈ V and f ∈ V such that

    a(u, φ) = (Au, φ)   ∀φ ∈ V,                          (119)
    l(φ) = (f, φ)       ∀φ ∈ V.                          (120)

Step 2 (Linearity and boundedness of A)
The mapping u ↦ Au yields a linear bounded operator A : V → V . It holds for the linearity:

    (A(λ1 u1 + λ2 u2 ), φ)V = a(λ1 u1 + λ2 u2 , φ) = . . . = (λ1 Au1 + λ2 Au2 , φ)V .

This holds for all φ ∈ V and consequently A is linear. The boundedness (i.e., continuity) we obtain as follows:

    kAuk²V = (Au, Au) = a(u, Au) ≤ γkuk kAuk
    ⇒ kAukV ≤ γkuk   ∀u ∈ V,

showing that A is bounded. For such operators, the notation A ∈ L(V, V ) is used in functional analysis. To
sum up:

    a(u, φ) = l(φ)   (variational form)                  (121)
    Au = f           (operator equation)                 (122)

Step 3 (Existence of a fixed point)
We show that a fixed point exists. Specifically, we establish that

    v ∈ V ↦ Tδ v := v − δ(Av − f ) ∈ V,

with δ > 0, is a contraction on V . Then,

    Tδ v = v

has a unique solution u ∈ V . This solution is then also a solution of (121) and (122) because of

    Tδ v − v = 0
    ⇔ δ(Av − f ) = 0
    ⇔ Av − f = 0
    ⇔ Av = f.

The contraction property now follows from

    Tδ v = v − δ(Av − f ) = v − δAv + δf.
The last term δf is constant with respect to v and does therefore not play a role in the contraction proof
(recall Banach’s fixed point theorem in Numerik 1 for iterative schemes for solving Ax = b). Then:

    kv − δAvk²V = kvk²V − 2δ a(v, v) + δ² kAvk²V .

We now use the coercivity of a(v, v). Then:

    kv − δAvk²V ≤ kvk²V − 2δαkvk²V + δ²γ²kvk²V = (1 − 2δα + δ²γ²)kvk²V .

For the choice of

    0 < δ < 2α/γ²

we have a contraction and thus a fixed point. Due to the equivalences of the formulations in this proof (weak
form, operator form, fixed point form), we have obtained a unique solution of

    u ∈ V :  a(u, φ) = l(φ)   ∀φ ∈ V.

Step 4
Uniqueness and stability are shown in the same way as in the previous first proof of the Lax-Milgram lemma.

8.10.1 The energy norm


As we have seen before, the (symmetric) bilinear form defines a scalar product from which we can define
another norm. The continuity and coercivity of the bilinear form yield the energy norm:

    kvk²a := a(v, v),  v ∈ V.

This norm is equivalent to the norm of the space V , i.e.,

    ckvkV ≤ kvka ≤ CkvkV   ∀v ∈ V,

with two positive constants c and C. We can even determine these two constants precisely:

    αkvk²V ≤ a(v, v) ≤ γkvk²V ,

yielding c = √α and C = √γ. The corresponding scalar product is defined by

    (v, w)a := a(v, w).

Finally, the best approximation property can be written in terms of the a-norm:

    (u − uh , v)a = 0   ∀v ∈ Vh .
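To connect the energy norm with computations, the following sketch (our own illustration) evaluates the
error k(u − uh )′ k elementwise for −u′′ = 1 on (0, 1) with P1 elements, where uh coincides with the nodal
interpolant of u(x) = x(1 − x)/2; one observes first-order convergence in h.

// Sketch: energy-norm error ||(u - u_h)'|| for -u'' = 1 on (0,1) with
// exact solution u = x(1-x)/2 and P1 elements. Expect O(h) convergence.
#include <cstdio>
#include <cmath>

int main()
{
  double prev = 0.0;
  for (int n_el = 4; n_el <= 64; n_el *= 2)
  {
    const double h = 1.0 / n_el;
    double err2 = 0.0;
    for (int e = 0; e < n_el; ++e)
    {
      const double xl = e * h, xr = xl + h;
      // u_h' on this element: difference quotient of the nodal interpolant
      const double uhp = (0.5 * xr * (1.0 - xr) - 0.5 * xl * (1.0 - xl)) / h;
      // 2-point Gauss rule, exact for the quadratic integrand (u' - u_h')^2
      const double g = 0.5 / std::sqrt(3.0);
      for (double xq : {0.5 * (xl + xr) - g * h, 0.5 * (xl + xr) + g * h})
      {
        const double d = (0.5 - xq) - uhp;     // u'(xq) - u_h'(xq)
        err2 += 0.5 * h * d * d;
      }
    }
    const double err = std::sqrt(err2);
    std::printf("n_el = %3d  ||(u-u_h)'|| = %9.3e  rate = %5.2f\n",
                n_el, err, prev > 0.0 ? std::log2(prev / err) : 0.0);
    prev = err;
  }
  return 0;
}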

8.10.2 The Poincaré inequality


In order to apply the Lax-Milgram lemma to partial differential equations complemented with homogeneous
boundary conditions, we need another essential tool, the so-called Poincaré inequality.
Proposition 8.78. Let Ω ⊂ Rn be a bounded, open domain. There exists a constant dΩ (depending on the
domain) such that

    ∫_Ω |v(x)|² dx ≤ dΩ ∫_Ω |∇v(x)|² dx

holds true for each v ∈ H01 (Ω).

Remark 8.79. It is clear that the Poincaré inequality specifically holds true for functions from H01 but not
from H 1 . A counterexample is v ≡ 1, which is in H 1 but violates the Poincaré inequality.

Proof. The proof can be found in all textbooks, e.g., Rannacher [106], Wloka [145] or Braess [24], page 29.

Corollary 8.80. Let Ω ⊂ Rn be an open, bounded domain. The semi-norm

    |v|H01 = ( ∫_Ω |∇v|² dx )^{1/2}

is a norm on H01 and equivalent to the usual H 1 norm.

Proof. Let v ∈ H01 . On the one hand, it holds (trivially):

    |v|H01 ≤ kvkH 1 .

On the other hand, using Poincaré’s inequality, we obtain:

    kvk²H 1 ≤ (dΩ + 1) ∫_Ω |∇v|² dx = (dΩ + 1)|v|²H01 ,

yielding

    |v|²H01 ≤ kvk²H 1 ≤ (dΩ + 1)|v|²H01 .

8.10.3 Trace theorems


We have seen that H 1 functions are not necessarily continuous when n ≥ 2. Consequently, we cannot define
pointwise values, and secondly, boundaries are sets of measure zero. Thus, it is a priori a bit difficult to define
boundary values in Sobolev spaces. We face the question in which sense boundary values can be defined when
working with Sobolev spaces.
We have:
Theorem 8.81 (Trace theorem). Let Ω be an open, bounded domain of class C 1 . We define the trace
γ0 : H 1 ∩ C(Ω̄) → L2 (∂Ω) ∩ C(∂Ω) as

    v ↦ γ0 (v) = v|∂Ω .

By continuity, γ0 can be extended to all of H 1 (Ω). It holds

    kvkL2 (∂Ω) ≤ CkvkH 1 (Ω) .

Proof. See Rannacher [106], page 30.


Theorem 8.82 (Trace theorem for derivatives). Let Ω be an open, bounded domain of class C 1 . We define
the trace γ1 : H 2 ∩ C 1 (Ω̄) → L2 (∂Ω) ∩ C(∂Ω) as

    v ↦ γ1 (v) = ∂n v|∂Ω .

By continuity, γ1 can be extended from H 2 (Ω) mapped into L2 (∂Ω). It holds

    k∂n vkL2 (∂Ω) ≤ CkvkH 2 (Ω) .

Proof. The existence of γ1 follows from the first trace theorem previously shown. Since v ∈ H 2 , we have
∇v ∈ (H 1 )n . Consequently, we can define the trace of ∇v on ∂Ω, which is then a trace in (L2 (∂Ω))n . Since
the normal vector is a bounded, continuous function on ∂Ω, we have ∇v · n ∈ L2 (∂Ω).
Remark 8.83. The second trace theorem has important consequences for the regularity of the boundary
functions. In general, for variational problems we can only prove that the solution u is in H 1 . In this case, the
normal derivative is only in H −1 (a space with very low regularity). Only under additional assumptions on
the data and the domain can we show u ∈ H 2 , which yields better regularity of the normal derivative
∇u · n on ∂Ω.

8.11 Theory of elliptic problems (in higher dimensions)


Gathering all results from the previous sections, we are now in a position to prove well-posedness of the
Poisson problem in higher dimensions.

8.11.1 The formal procedure
In most cases, one starts with the differential problem (D). In order to establish well-posedness, one can work
in three steps:
• Derive formally a variational problem (V).
• Prove well-posedness for the variational problem (here, for linear problems, using Lax-Milgram).
• If the solution enjoys enough regularity, go back and show (V ) → (D) (see also Section 8.1.2) using the
fundamental lemma of calculus of variations.

8.11.2 Poisson’s problem: homogeneous Dirichlet conditions


Proposition 8.84. Let Ω be a bounded domain of class C 1 and let f ∈ L2 . Let V := H01 . Then, the Poisson
problem has a unique solution u ∈ V and there exists a constant cp (independent of f ) such that the stability
estimate

    kukH 1 ≤ cp kf kL2

holds true.
Proof. We use the Lax-Milgram lemma and check its assumptions. Since H01 is a subspace of H 1 , we work
with the H 1 norm.
Continuity of a(·, ·)
Using Cauchy-Schwarz, we obtain:

    |a(u, φ)| = | ∫_Ω ∇u · ∇φ dx | ≤ ∫_Ω |∇u| |∇φ| dx ≤ ( ∫_Ω |∇u|² dx )^{1/2} ( ∫_Ω |∇φ|² dx )^{1/2}
              ≤ ( ∫_Ω (u² + |∇u|²) dx )^{1/2} ( ∫_Ω (φ² + |∇φ|²) dx )^{1/2} = kukH 1 kφkH 1 .

Coercivity of a(·, ·)
Using the Poincaré inequality, we obtain:

    a(u, u) = ∫_Ω |∇u|² dx = (1/2) ∫_Ω |∇u|² dx + (1/2) ∫_Ω |∇u|² dx
            ≥ (1/2) ∫_Ω |∇u|² dx + (1/(2dΩ)) ∫_Ω u² dx ≥ (1/2) min(1, 1/dΩ ) kuk²H 1 .

Continuity of l(·)
Using again Cauchy-Schwarz, we obtain:

    |l(φ)| = |(f, φ)L2 | ≤ kf kL2 kφkL2 ≤ kf kL2 kφkH 1 .

Stability estimate
From Lax-Milgram we know:

    kukH 1 ≤ (1/α) klkV ∗ = (1/α) sup_{φ≠0} |l(φ)| / kφkH 1 ≤ (1/α) kf kL2 .

Everything is shown now.

Remark 8.85 (On the dual norm). Previously we have had:

|l(φ)| = |(f, φ)| ≤ kf kL2 kφkL2 .

when f ∈ L2 . But this estimate also works for f ∈ H −1 , the dual space of H01 . Then:

|l(φ)| = |(f, φ)| ≤ kf kH −1 kφkH 1 .

The smallest possible choice for C = klkV ∗ = kf kH −1 (see Section 3.12) is

|(f, φ)|
C= sup ,
φ∈H01 ,φ6=0 kφkH 1

which is indeed a norm and in general denoted by kf kH −1 . In the L2 case we have

|(f, φ)|
kf kL2 = sup .
φ∈L2 ,φ6=0 kφkL2

Here, we take the supremum over a larger space; recall that H 1 ⊂ L2 . Consequently

kf kH −1 ≤ kf kL2

and therefore
L2 ⊂ H −1
and finally
H 1 ⊂ L2 ⊂ H −1 .

Under stronger conditions, we have the following regularity result:


Proposition 8.86 (Regularity of the exact solution). Let Ω be bounded, open and of class C 2 . Then, the
solution of the Poisson problem is u ∈ H 2 (Ω) and it holds the stability estimate:

kukH 2 ≤ Ckf kL2 .

Proof. See for instance [89].


Proposition 8.87 (Higher regularity of the exact solution). Let Ω be bounded, open and of class C k+2 (i.e.,
sufficiently smooth). Then, the solution of the Poisson problem is u ∈ H k+2 (Ω) and it holds the stability estimate:

kukH k+2 ≤ Ckf kH k .

If the boundary is not smooth enough (even when f is sufficiently smooth), this result does not hold true
anymore. This is the case for re-entrant corners (e.g., L-shaped domains).

8.11.3 Nonhomogeneous Dirichlet conditions


In this section, we investigate non-homogeneous Dirichlet boundary conditions. Although formally one only
needs to prescribe a value or a function for the boundary condition, the mathematical formulation is non-trivial.
We have
Formulation 8.88. Let Ω ⊂ Rn an open and bounded domain. Let f ∈ L2 (Ω). Find u such that

−∆u = f in Ω,
u=h on ∂Ω.

Since we seek u ∈ H 1 (Ω) we assume that there exists a function u0 ∈ H 1 such that

trace(u0 ) = h on ∂Ω.

We know that h ∈ L2 (∂Ω).


In the following, we concentrate first on the derivation of a weak formulation. The idea is to work with the
space H 1 by extracting the nonhomogeneous Dirichlet condition. We write

u = u0 + ũ, ũ ∈ H01 .

Then:
∫Ω ∇u · ∇φ dx = ∫Ω ∇(u0 + ũ) · ∇φ dx = ∫Ω (∇u0 · ∇φ + ∇ũ · ∇φ) dx.

This yields the variational problem:


Formulation 8.89. Find ũ ∈ H01 such that

(∇ũ, ∇φ) = (f, φ) − (∇u0 , ∇φ) ∀φ ∈ H01 .

Having found ũ, we obtain u ∈ H 1 via


u = u0 + ũ.
The equivalent formulation, often found in the literature, is:
Formulation 8.90. Find u ∈ {u0 + H01 } (with trace(u0 ) = h) such that

(∇u, ∇φ) = (f, φ) ∀φ ∈ H01 .

Example 8.91 (Poisson in 1D). Let

−u00 (x) = f,
h=4 on ∂Ω.

Here u0 = 4 in Ω. When we solve the Poisson problem with finite elements, we proceed as shown before and
obtain ũ ∈ H01 and finally simply
u = ũ + 4
to obtain the final u. For nonconstant u0 (thus nonconstant h on ∂Ω), please be careful in the variational
formulation since the term
(∇u0 , ∇φ)
will not vanish and needs to be assembled. For instance, let u(0) = 0 and u(1) = 1 in Ω = (0, 1). Then, u0 = x
and trace(u0 ) = h with h(0) = 0 and h(1) = 1. Thus we solve Formulation 8.89 to obtain ũ and finally:

u = ũ + u0 = ũ + x.
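To make this concrete, the following is a minimal, self-contained C++ sketch (our own illustration; it is not
taken from any particular FEM library, and all names are chosen freely). It assembles the P1 system for
−u″ = f on (0, 1) with u(0) = g0, u(1) = g1, computes the correction term (∇u0 , ∇φ) from the nodal
interpolant of the boundary lifting, and recovers u = u0 + ũ:

// Minimal 1D P1 sketch of Formulation 8.89 (illustrative code).
// Solve -u'' = f on (0,1) with u(0)=g0, u(1)=g1 via the lifting u = u0 + utilde.
#include <cstdio>
#include <vector>

int main() {
  const int n = 16;                 // number of elements
  const double h = 1.0 / n;
  const double g0 = 0.0, g1 = 1.0;  // Dirichlet boundary values
  auto f = [](double) { return 1.0; };

  // Nodal values of the lifting u0 (here: linear interpolation of the data).
  std::vector<double> u0(n + 1);
  for (int i = 0; i <= n; ++i)
    u0[i] = g0 + (g1 - g0) * i * h;

  // Stiffness matrix for the interior nodes: (1/h) * tridiag(-1, 2, -1).
  std::vector<double> diag(n - 1, 2.0 / h), off(n - 2, -1.0 / h), rhs(n - 1);

  // Right-hand side (f, phi_i) (midpoint rule) minus the lifting term (u0', phi_i').
  for (int i = 1; i < n; ++i) {
    const double dl = (u0[i] - u0[i - 1]) / h;  // u0' on the left element
    const double dr = (u0[i + 1] - u0[i]) / h;  // u0' on the right element
    rhs[i - 1] = h * f(i * h) - (dl - dr);      // (u0', phi_i') = dl - dr
  }

  // Thomas algorithm for the symmetric tridiagonal system.
  std::vector<double> c(n - 1, 0.0);
  c[0] = off[0] / diag[0];
  rhs[0] /= diag[0];
  for (int i = 1; i < n - 1; ++i) {
    const double m = diag[i] - off[i - 1] * c[i - 1];
    if (i < n - 2) c[i] = off[i] / m;
    rhs[i] = (rhs[i] - off[i - 1] * rhs[i - 1]) / m;
  }
  for (int i = n - 3; i >= 0; --i)
    rhs[i] -= c[i] * rhs[i + 1];

  // Final solution u = u0 + utilde (utilde vanishes on the boundary).
  for (int i = 0; i <= n; ++i) {
    const double ut = (i == 0 || i == n) ? 0.0 : rhs[i - 1];
    std::printf("x = %5.3f   u = %8.5f\n", i * h, u0[i] + ut);
  }
  return 0;
}

For the linear lifting u0 = x the correction dl − dr vanishes, exactly as discussed above; for a nonconstant,
nonlinear lifting (interpolated at the nodes) it contributes to the right-hand side.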

Proposition 8.92. Formulation 8.89 yields a unique solution.


Proof. The proof is similar to that of Proposition 8.84.

8.11.4 Neumann conditions


We have seen in the 1D case that the pure Neumann problem has solutions defined up to a constant value.
Formulation 8.93 (Pure Neumann problem). Let Ω ⊂ Rn be an open, bounded, connected domain and
f ∈ L2 (Ω) and g ∈ L2 (∂Ω). We consider

−∆u = f in Ω,
∂n u = g on ∂Ω.

Remark 8.94. We call the problem ‘pure’ Neumann problem because it is of course possible to have different
boundary conditions on different boundary parts. But once we have Dirichlet conditions on some part of the
boundary ∂ΩD with ∂ΩD 6= ∅ and on the remaining boundary ∂ΩN Neumann conditions, the problem of
non-uniqueness is gone since the solution is fixed on the Dirichlet part.
A solution exists only if the following compatibility condition is fulfilled:
∫Ω f (x) dx + ∫∂Ω g(x) ds = 0. (123)
The compatibility condition is necessary and sufficient. For instance, on the unit square Ω = (0, 1)² with
f ≡ 1, the Neumann data must satisfy ∫∂Ω g ds = −1; the constant choice g ≡ −1/4 fulfills (123). We have
the following result:
Proposition 8.95. Let Ω ⊂ Rn be an open, bounded, connected domain of class C 1 . Let f ∈ L2 (Ω) and
g ∈ L2 (∂Ω), which satisfy the compatibility condition (123). Then, there exists a weak solution u ∈ H 1 (Ω) of
Formulation 8.93, which is unique up to a constant value.
Proof. The variational formulation is obtained as usual:
(∇u, ∇φ) = (f, φ) + hg, φi for all admissible φ.
Here, we have the first difficulty, since the Hilbert space H 1 will not yield the coercivity of the corresponding
bilinear form (in fact this is related to the non-uniqueness). For this reason, we define the space:
V := {v ∈ H 1 | ∫Ω v dx = 0}.

Hence, the correct variational formulation is:
Find u ∈ V : (∇u, ∇φ) = (f, φ) + hg, φi ∀φ ∈ V.
As before, we adopt the Lax-Milgram Lemma to show existence and uniqueness of the solution. The continuity
of the bilinear form and right hand side are obvious; see the proof of Proposition 8.84.
The only delicate step is the coercivity. Here, Poincaré's inequality will not work and we have to adopt a
variant of the Friedrichs inequality,
kvkL2 ≤ c(|ṽ| + |v|H 1 ) for v ∈ H 1 ,
with c := c(Ω) and with ṽ = ∫Ω v dx / µ(Ω) being the mean value of v. For u ∈ V the mean value vanishes,
ũ = 0, hence kukL2 ≤ c|u|H 1 and kuk²H 1 = kuk²L2 + |u|²H 1 ≤ (1 + c²)|u|²H 1 . Thus, it holds
a(u, u) = ∫Ω |∇u|² dx = |u|²H 1 ≥ α kuk²H 1 with α = 1/(1 + c²).

Thus, the Lax-Milgram lemma yields a unique solution in the space V . We notice that this solution must
satisfy the compatibility condition (123) because Gauss’ divergence theorem yields
∫Ω f dx = − ∫Ω div(∇u) dx = − ∫∂Ω ∇u · n ds = − ∫∂Ω g ds.
Hence
∫Ω f dx + ∫∂Ω g ds = 0.
Interestingly, this condition is also sufficient. Lax-Milgram says that we obtain the solution u ∈ V via solving
a(u, φ) = (f, φ) + hg, φi ∀φ ∈ V.
Every test function ψ ∈ H 1 can be decomposed as ψ = φ + c with φ ∈ V and a constant c. For the constant
part we have a(u, c) = 0, and by the compatibility condition also
(f, c) + hg, ci = c ( ∫Ω f dx + ∫∂Ω g ds ) = 0.
Hence the variational equation a(u, ψ) = (f, ψ) + hg, ψi holds in fact for all ψ ∈ H 1 ; in particular it holds for
ψ = const. For more information see Braess [24].

8.11.5 Robin conditions


Given:
Formulation 8.96. Let Ω ⊂ Rn and ∂Ω its boundary. Furthermore, f ∈ L2 (Ω) and g ∈ L2 (∂Ω) and
c ∈ L∞ (Ω) and b ∈ L∞ (∂Ω). Then: Find u such that

−∆u + cu = f, in Ω,
∂n u + bu = g, on ∂Ω.

We first formulate a weak formulation. As function space, we use V := H 1 . The bilinear and linear forms
are given by:
a(u, φ) = ∫Ω ∇u · ∇φ dx + ∫Ω cuφ dx + ∫∂Ω buφ ds,
l(φ) = ∫Ω f φ dx + ∫∂Ω gφ ds.

We notice that the functional l is no longer represented by an L2 function alone, but it is a linear functional
on the space V . To prove coercivity, we need the following general form of the Friedrichs inequality:
Lemma 8.97 (Friedrichs inequality). Let Ω ⊂ Rn be a bounded domain of class C 1 and Γ ⊂ ∂Ω a measurable
set with |Γ| > 0. Then there exists a constant C (independent of u) such that
kuk²H 1 ≤ C ( |u|²H 1 + ( ∫Γ u ds )² ) ∀u ∈ H 1 (Ω).

Proof. For a proof see [145].


Proposition 8.98. Let Ω be a bounded domain of class C 1 and functions c ∈ L∞ (Ω) and b ∈ L∞ (∂Ω) almost
everywhere non-negative with the condition:
∫Ω c² dx + ∫∂Ω b² ds > 0.

Then, the variational Robin problem has for each f ∈ L2 , g ∈ L2 (∂Ω) a unique solution u ∈ H 1 =: V .
Furthermore, it holds the stability estimate:

kukH 1 ≤ C (kf kL2 + kgkL2 (∂Ω) ).

Proof. We verify again the assumptions of the Lax-Milgram lemma. We work with the Cauchy-Schwarz inequality, the generalization of the Friedrichs inequality and also the trace lemma, since we deal with functions on
the boundary ∂Ω.
Preliminaries

First:
| ∫Ω cuφ dx | ≤ kckL∞ kukL2 kφkL2 ≤ kckL∞ kukH 1 kφkH 1 .

With the trace lemma, we have:


| ∫∂Ω buφ ds | ≤ kbkL∞ (∂Ω) kukL2 (∂Ω) kφkL2 (∂Ω) ≤ C kbkL∞ (∂Ω) kukH 1 (Ω) kφkH 1 (Ω) .

Continuity of a(·, ·)

We establish the continuity:

|a(u, φ)| = |(∇u, ∇φ) + (cu, φ)L2 + (bu, φ)L2 (∂Ω) | ≤ CkukH 1 kφkH 1 .

In the last estimate we used the inequalities derived in the preliminaries.


Coercivity of a(·, ·)

a(u, u) ≥ ( min(1, C1 ) / (C2 max(1, |Γ|)) ) kuk²H 1 ,
where b(x) ≥ C1 almost everywhere on ∂Ω. We left out several steps, which can be verified by the reader.
Continuity of l(φ)

We estimate:
Z Z
|l(φ)| = | f φ dx + gφ ds| ≤ C(kf kL2 + kgkL2 (∂Ω) )kφkH 1 .
Ω ∂Ω

Stability estimate

From Lax-Milgram we know:

kukH 1 ≤ (1/α) klkV ∗ = (1/α) sup_{φ∈V, φ≠0} |l(φ)| / kφkV ≤ C (kf kL2 + kgkL2 (∂Ω) ).
This completes the proof.

8.12 Finite element spaces


We now have a theoretical basis of the finite element method, and for 1D problems we explained in Section 8.4
how to construct finite elements explicitly and how to implement them in a code.
It remains to give a flavour of the construction of finite elements in higher dimensions. Since the major
developments took place some time ago, there are nice books including all details:
• Johnson [82] provides an elementary introduction that is easily accessible.
• The most classical book is by Ciarlet [35].

• Another book including many mathematical aspects is by Rannacher [106].


Consequently, the aim of this chapter is only to provide the key ideas to get the reader started. For further
studies, we refer to the literature. In addition to the three above references, the reader may consult the books
mentioned in Chapter 1 of these lecture notes.
We follow chapter 3 in [82] to provide an elementary introduction. The generalization is nicely performed by
Ciarlet [35][Chapter 2]. Moreover, Ciarlet also introduces in detail rectangular finite elements and parametric
and isoparametric finite elements in order to approximate curved boundaries. Isoparametric elements are also
explained in detail by Rannacher [106].
Finite element spaces consist of piecewise polynomial functions on a decomposition (historically still called
triangulation), i.e., a mesh,
Th := ∪i Ki ,
of a bounded domain Ω ⊂ Rn , n = 1, 2, 3. The Ki are called mesh elements.
In particular:
• n = 1: The elements Ki are intervals;

• n = 2: The elements Ki are triangles or quadrilaterals;
• n = 3: The elements Ki are tetrahedra or hexahedra.
A conforming FEM scheme requires Vh ⊂ V with V := H 1 for instance. Since Vh consists of piecewise
polynomials, we have
Vh ⊂ H 1 (Ω) ⇔ Vh ⊂ C 0 (Ω̄).
As we have seen in the 1D case, to specify a finite element (see Section 8.4.7 - we remind that this section
holds not only for 1D, but also in higher dimensions), we need three ingredients:
• A mesh Th representing the domain Ω.
• Specification of v ∈ Vh on each element Ki .
• Parameters (degrees of freedom; DoFs) to describe v uniquely on Ki .

8.12.1 Example: a triangle in 2D


Let us follow [82][Chapter 3] and let us work in 2D. Let K be an element of Th . A P1 (K) function is defined
as
v(x) = a00 + a10 x1 + a01 x2 , x = (x1 , x2 ) ∈ K
and coefficients aij ∈ R.
Definition 8.99 (A basis of P1 (linear polynomials)). A basis of the space P1 is given by

P1 = {φ1 , φ2 , φ3 }

with the basis functions


φ1 ≡ 1, φ2 = x1 , φ3 = x2 .
The dimension is dim(P1 ) = 3. Clearly, any polynomial from P1 can be represented through the linear combi-
nation:
p(x) = a00 φ1 + a10 φ2 + a01 φ3 .
Definition 8.100 (A basis of P2 (quadratic polynomials)). A basis of the space P2 is given by

P2 = {φ1 , φ2 , φ3 , φ4 , φ5 , φ6 }

with the basis functions

φ1 ≡ 1, φ2 = x1 , φ3 = x2 , φ4 = x21 , φ5 = x22 , φ6 = x1 x2 .

The dimension is dim(P2 ) = 6. Again, any polynomial from P2 can be represented through the linear combi-
nation:
p(x) = a00 φ1 + a10 φ2 + a01 φ3 + a20 φ4 + a02 φ5 + a11 φ6 .
Definition 8.101 (A basis of Ps ). The space Ps is given by
Ps = {v | v(x) = Σ_{0≤i+j≤s} aij x1^i x2^j }
with the dimension
dim(Ps ) = (s + 1)(s + 2)/2.
In the following, we explicitly construct the space of linear functions on K. Let

Vh = {v ∈ C 0 (Ω̄)| v|K ∈ P1 (K), ∀K ∈ Th },

which we still know from our 1D constructions. To uniquely describe the functions in Vh we choose the global
degrees of freedom, here the values of the nodal points of Th .

Remark 8.102. Choosing the nodal points is the most common choice and is a Lagrangian scheme. Using
for instance derivative information in the nodal points, we arrive at Hermite polynomials.
We need to show that the values of the nodal points yield a unique polynomial in P1 (K).
Definition 8.103 (Local DoFs). Let n = 2 and let K be a triangle with vertices ai , i = 1, 2, 3. The function
values v(ai ), i = 1, 2, 3, are called local degrees of freedom.
Proposition 8.104. Let K ∈ Th be an element (a triangle) with vertices ai = (ai1 , ai2 ) ∈ K. A function on K
is uniquely determined by the local degrees of freedom from Definition 8.103. In other words, for given values
αi , there exists a unique function v ∈ P1 (K) such that
v(ai ) = αi , i = 1, 2, 3.
Proof. Recalling the definition of a P1 polynomial on K, we have
v(x) = a00 + a10 x1 + a01 x2 .
In the support points ai , we have:
v(ai ) = a00 + a10 ai1 + a01 ai2 = αi , i = 1, 2, 3.
This yields a linear system of three equations for three unknowns. Using matrix notation, we can plug
the coefficients into a system
Ba = b
with
B = ( 1 a11 a12 ; 1 a21 a22 ; 1 a31 a32 ),  a = (a00 , a10 , a01 )T ,  b = (α1 , α2 , α3 )T ,
where the rows of B are separated by semicolons.
The system Ba = b has a unique solution when det(B) 6= 0, which can be easily checked here and completes
the proof.
Algorithm 8.105 (Construction of P1 shape functions). For the explicit construction of such P1 shape func-
tion, we simply need to solve
v(ai ) = a00 + a10 ai1 + a01 ai2 = αi , i = 1, 2, 3,
for the unknown coefficients aij . This means, we need to construct three basis functions ψi ∈ P1 (K) with
ψi (aj ) = δij ,
where δij is the already introduced Kronecker delta. Thus, we have to solve the system Ba = b three times:

ψi (aj ) = ai00 + ai10 aj1 + ai01 aj2 = δij , i, j = 1, 2, 3.


Then, we can build the local basis functions via
ψi (x) = ai00 + ai10 x1 + ai01 x2
with all quantities explicitly given.
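The following self-contained C++ sketch (our own illustration; all names are hypothetical) carries out this
construction with Cramer's rule applied to the 3×3 system Ba = b:

// Sketch of Algorithm 8.105: construct the P1 shape functions
// psi_i(x) = c0 + c1*x1 + c2*x2 with psi_i(a_j) = delta_ij on a triangle K.
#include <array>
#include <cstdio>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

double det3(const Mat3& M) {
  return M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
       - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
       + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]);
}

int main() {
  // Vertices a_i = (a_i1, a_i2) of the triangle K (here: reference triangle).
  const double a[3][2] = {{0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}};
  const Mat3 B = {{{1.0, a[0][0], a[0][1]},
                   {1.0, a[1][0], a[1][1]},
                   {1.0, a[2][0], a[2][1]}}};
  const double dB = det3(B);  // |det B| = 2|K|, nonzero for non-degenerate K

  for (int i = 0; i < 3; ++i) {
    Vec3 c;  // coefficients of psi_i
    for (int k = 0; k < 3; ++k) {
      Mat3 Bk = B;  // Cramer's rule: replace column k by the unit vector e_i
      for (int j = 0; j < 3; ++j) Bk[j][k] = (j == i) ? 1.0 : 0.0;
      c[k] = det3(Bk) / dB;
    }
    std::printf("psi_%d(x) = %+.3f %+.3f*x1 %+.3f*x2\n", i + 1, c[0], c[1], c[2]);
  }
  return 0;
}

For the chosen vertices the program reproduces ψ1 = 1 − x1 − x2 , ψ2 = x1 , ψ3 = x2 ; since |det B| = 2|K|,
the system is solvable precisely for non-degenerate triangles.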
We finally check that the continuity at the nodes ai is sufficient to obtain a globally continuous function,
which is in particular also continuous at the edges between two elements. Let K1 and K2 be two elements in
Th with a common edge Γ and the end points b1 and b2 . Let vi = v|Ki ∈ P1 (Ki ) be the restriction of v to Ki .
Then, the function w = v1 − v2 defined on Γ vanishes at the end points b1 and b2 . Since w is linear on Γ the
entire function w must vanish on Γ. Consequently, v is continuous across Γ. Extending this argument to all
edges of Th , we conclude that v is a globally continuous function and that indeed v ∈ C 0 (Ω̄).
Definition 8.106 (C 0 elements). The class of finite elements that fulfills the continuity condition at the edges
are called C 0 elements.
Remark 8.107. The generalization to n-simplicies can be found in [35][Chapter 2].

8.12.2 Example: a quadrilateral in 2D
In this section, we briefly present how quadrilaterals can be constructed. The work program of this section is
the same as in Section 8.12.1 and is based on [82][Chapter 3].
Let K be an element with four vertices ai , i = 1, . . . , 4, and the sides parallel to the coordinate axes in R2 .
Definition 8.108 (A basis of Q1 (bilinear polynomials)). A basis of the space Q1 (K) on an element K is
given by
Q1 := Q1 (K) = {φ1 , φ2 , φ3 , φ4 }
with the basis functions
φ1 ≡ 1, φ2 = x1 , φ3 = x2 , φ4 = x1 x2 .
The dimension is dim(Q1 ) = 4. Clearly, any polynomial from Q1 can be represented through the linear combination:
p(x) = a00 φ1 + a10 φ2 + a01 φ3 + a11 φ4 .
Proposition 8.109. A Q1 function is uniquely determined by the values of the nodal points ai .
Proof. The same as for triangles.
Proposition 8.110. The definition of Q1 (K) yields globally continuous functions.
Proof. Same as for triangles.
Remark 8.111. As for triangles, we can easily define higher-order spaces.
Remark 8.112. Be careful when quadrilateral elements are rotated or sheared: the nature of the local
polynomials will change. This is one major reason why in practice we work with a master element from which all
other elements are obtained via transformations. In this way, we are not restricted to a specific coordinate
system, which leaves the desired freedom for deformations of the single elements. A lot of details can be found
in [106][Pages 105 ff ].
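As a small illustration of the master-element viewpoint, the following sketch evaluates the four Q1 shape
functions. It assumes the unit square [0, 1]² as master element (conventions differ between codes; this is only
one common choice):

// Q1 shape functions on the master element [0,1]^2 (illustrative sketch).
#include <cstdio>

int main() {
  auto N = [](int i, double x, double y) -> double {
    switch (i) {
      case 0:  return (1 - x) * (1 - y);  // node (0,0)
      case 1:  return x * (1 - y);        // node (1,0)
      case 2:  return (1 - x) * y;        // node (0,1)
      default: return x * y;              // node (1,1)
    }
  };
  double s = 0.0;
  for (int i = 0; i < 4; ++i) s += N(i, 0.3, 0.7);
  std::printf("partition of unity: sum = %f\n", s);  // equals 1 at any point
  return 0;
}

All physical quadrilaterals are then reached by transformations of this master element.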

8.12.3 Well-posedness of the discrete problem


With the help of the Lax-Milgram lemma, we obtain:
Proposition 8.113. Let V be a real Hilbert space and Vh ⊂ V . Let a(u, v) be a bilinear form and l(v) a linear
form. Then, the discrete problem
Find uh ∈ Vh : a(uh , φh ) = l(φh ) ∀φh ∈ Vh
has a unique solution. This discrete solution uh ∈ Vh is obtained by solving a linear equation system.
Proof. Existence and uniqueness are obtained from the Lax-Milgram lemma with the same arguments as for
the continuous problem. In particular, we have
AU = b
with
(A)ij = a(φj , φi ), (b)i = l(φi ), i, j = 1, . . . , n.
The coercivity yields the positive definiteness of the matrix A and therefore (see lectures on linear algebra), A
is invertible:
AU · U = a(uh , uh ) ≥ α k Σ_{j=1}^{n} uj φj k² ≥ C kU k².

In fact, we investigate the homogeneous system


a(uh , φh ) = 0
and show that this only yields the zero solution, which is immediately true using the coercivity.
Furthermore, the symmetry of a(u, φ) yields the symmetry of A. Linear algebra arguments for symmetric
positive definite matrices yield a unique solution U .

8.13 Numerical analysis: error estimates
In this section, we perform the numerical analysis of our finite element derivations. We work as before and
focus on elementary aspects working out 1D results in order to show the mechanism and to explain the principle
ideas. For higher dimensions, we only provide the results and refer to the literature for detailed proofs.

8.13.1 The Céa lemma


The Céa lemma is the only result in this section that also holds in higher dimensions. We investigate the best
approximation error (recall our 1D findings).
Proposition 8.114 (Céa lemma). Let V be a Hilbert space and Vh ⊂ V a finite dimensional subspace. Let the
assumptions, Def. 8.71, of the Lax-Milgram Lemma 8.72 hold true. Let u ∈ V and uh ∈ Vh be the solutions of
the variational problems. Then:
ku − uh kV ≤ (γ/α) inf_{φh ∈Vh } ku − φh kV .
Proof. It holds Galerkin orthogonality:

a(u − uh , wh ) = 0 ∀wh ∈ Vh .

We choose wh := uh − φh and we obtain:

αku − uh k2 ≤ a(u − uh , u − uh ) = a(u − uh , u − φh ) ≤ γku − uh kku − φh k.

This yields
ku − uh k ≤ (γ/α) ku − φh k ∀φh ∈ Vh .
Taking the infimum yields:
ku − uh k ≤ (γ/α) inf_{φh ∈Vh } ku − φh k.
Since Vh is closed, it even holds
ku − uh k ≤ (γ/α) min_{φh ∈Vh } ku − φh k.

Corollary 8.115. When a(·, ·) is symmetric, which is the case for Poisson, the improved estimate
ku − uh kV ≤ √(γ/α) inf_{φh ∈Vh } ku − φh kV
holds true.
Based on the Céa lemma, we have the first (still non-quantitative) error estimate. The key aspect is that
we work with a dense subspace U of V which contains simpler functions than V itself, from which we can
interpolate into the discrete space Vh . We have [3]:
Proposition 8.116. We assume that the hypotheses from before hold true. Furthermore, we assume that
U ⊂ V is dense. We construct an interpolation operator ih : U → Vh such that

lim_{h→0} kv − ih (v)k = 0 ∀v ∈ U

holds true. Then, for all u ∈ V and uh ∈ Vh :

lim_{h→0} ku − uh k = 0.

This result shows that the Galerkin solution uh ∈ Vh converges to the continuous solution u.

Proof. Let ε > 0. Thanks to the density, for each u ∈ V there exists a v ∈ U such that ku − vk ≤ ε. Moreover,
there is an h0 > 0, depending on the choice of ε, such that

kv − ih (v)k ≤ ε ∀h ≤ h0 .

The Céa lemma yields now:

ku − uh k ≤ Cku − ih (v)k = C(ku − v + v − ih (v)k) ≤ C(ku − vk + kv − ih (v)k) ≤ C(ε + ε) = 2Cε.

This proof employs a trick that is very often seen in similar calculations: at an appropriate place an
appropriate function, here v, is inserted, and the terms are split thanks to the triangle inequality. Afterwards,
the two separate terms can be estimated using the assumptions or other known results.

8.13.2 H 1 and L2 estimates in 1D


First, we need to construct an interpolation operator in order to approximate the continuous solution at certain
nodes.
Definition 8.117. Let Ω = (0, 1). Let Vh = span{ϕ0 , . . . , ϕn+1 } be a discrete function space. Moreover, let
v ∈ H 1 . A P1 interpolation operator ih : H 1 → Vh is defined by
(ih v)(x) = Σ_{j=0}^{n+1} v(xj ) ϕj (x).

Recapitulate for instance Lagrangian interpolation, e.g., [114][Chapter 8.1].


This definition makes sense since H 1 functions are continuous in 1D and are pointwise defined. The
interpolant ih v is a piecewise linear function that coincides at the support points xj with the given H 1 function.
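The interpolation error can be checked experimentally before we prove any estimates. The following sketch
(illustrative code, not part of the original lecture material) computes ku − ih ukL2 for u(x) = sin(πx) on a
sequence of uniform meshes; the observed rate is the O(h²) established in Lemma 8.121 below:

// Measure ||u - i_h u||_{L2} for u(x) = sin(pi x) and print the observed rate.
#include <cmath>
#include <cstdio>

int main() {
  const double pi = 3.14159265358979323846;
  auto u = [&](double x) { return std::sin(pi * x); };

  double prev = 0.0;
  for (int n = 8; n <= 256; n *= 2) {
    const double h = 1.0 / n;
    double err2 = 0.0;
    for (int j = 0; j < n; ++j) {  // loop over the elements K_j
      const double xl = j * h, xr = xl + h;
      // two-point Gauss rule on K_j (accurate enough to observe the rate)
      for (double g : {-1.0 / std::sqrt(3.0), 1.0 / std::sqrt(3.0)}) {
        const double x = 0.5 * (xl + xr) + 0.5 * h * g;
        const double ihu = u(xl) + (u(xr) - u(xl)) * (x - xl) / h;  // P1 interpolant
        err2 += 0.5 * h * (u(x) - ihu) * (u(x) - ihu);
      }
    }
    const double err = std::sqrt(err2);
    std::printf("h = %8.5f  error = %.3e  rate = %.2f\n", h, err,
                prev > 0 ? std::log2(prev / err) : 0.0);
    prev = err;
  }
  return 0;
}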

Remark 8.118. Since H 1 functions are no longer continuous in higher dimensions, we need more assumptions there.
The convergence of a finite element method in 1D relies on
Lemma 8.119. Let ih : H 1 → Vh be given. Then:

lim_{h→0} ku − ih ukH 1 = 0.

If u ∈ H 2 , there is a constant C such that

ku − ih ukH 1 ≤ Ch|u|H 2 .

Proof. Since
ku − ih uk2H 1 = ku − ih uk2L2 + |u − ih u|2H 1 ,
the result follows immediately from the next two lemmas; namely Lemma 8.121 and Lemma 8.122.
Definition 8.120. The global norm k · k is obtained by locally integrating and then summing up the results on
all elements. For instance, the L2 norm is defined by
kukL2 = ( Σ_{K∈Th } ∫K |u|² dx )^{1/2} .

Lemma 8.121. For a function u ∈ H 2 , it exists a constant C (independent of h) such that

ku − ih ukL2 ≤ Ch2 ku00 kL2 ,


|u − ih u|H 1 ≤ Chku00 kL2 .

Proof. We choose u ∈ Cc∞ (Ω) (the result will hold true for H 2 because of a density argument). The interpolant
ih u is a linear function on each element Kj = (xj , xj+1 ). We calculate:
u(x) − ih u(x) = u(x) − [ u(xj ) + (x − xj ) (u(xj+1 ) − u(xj ))/(xj+1 − xj ) ]
= u(x) − u(xj ) − (x − xj )/(xj+1 − xj ) · (u(xj+1 ) − u(xj ))
= ∫_{xj}^{x} u′(t) dt − (x − xj )/(xj+1 − xj ) ∫_{xj}^{xj+1} u′(t) dt.
Since u′ is continuous, we can apply the mean value theorem from calculus (see e.g., [85][Chapter 11.3]) and
further obtain:
u(x) − ih u(x) = (x − xj ) u′(xj + α) − (x − xj )/(xj+1 − xj ) · (xj+1 − xj ) u′(xj + β)
= (x − xj ) ∫_{xj +β}^{xj +α} u″(t) dt,

with 0 ≤ α ≤ x − xj and 0 ≤ β ≤ h, h = xj+1 − xj . We now take the square and apply the Cauchy-Schwarz
inequality (recall |(u, v)|2 ≤ kuk2 kvk2 ):
|u(x) − ih u(x)|² = (x − xj )² ( ∫_{xj +β}^{xj +α} u″(t) dt )²
≤ h² ( ∫_{xj +β}^{xj +α} u″(t) dt )²
≤ h² ( ∫_{xj +β}^{xj +α} 1² dt ) ( ∫_{xj +β}^{xj +α} |u″(t)|² dt )
≤ h² ( ∫_{xj}^{xj+1} 1 dt ) ( ∫_{xj}^{xj+1} |u″(t)|² dt )
= h³ ∫_{xj}^{xj+1} |u″(t)|² dt.

When we now integrate |u(x) − ih u(x)|² over Kj , we obtain the norm-like object we are after:
∫_{xj}^{xj+1} |u(x) − ih u(x)|² dx ≤ h⁴ ∫_{xj}^{xj+1} |u″(t)|² dt.
Summing up over all elements Kj = [xj , xj+1 ], j = 0, . . . , n, we obtain the global norm (see Definition 8.120):
Σ_j ∫_{xj}^{xj+1} |u(x) − ih u(x)|² dx = ku − ih uk²L2 ≤ h⁴ |u|²H 2 .

Taking the square root yields the desired result:


ku − ih ukL2 ≤ Ch2 ku00 kL2 for u ∈ Cc∞ (Ω).
By density the result also holds true for H 2 .
The second estimate is obtained by the same calculations. Here, the initial idea is:
u′(x) − (ih u)′(x) = u′(x) − (u(xj+1 ) − u(xj ))/h
= (1/h) ∫_{xj}^{xj+1} (u′(x) − u′(t)) dt
= (1/h) ∫_{xj}^{xj+1} ∫_t^x u″(s) ds dt.

As before, taking the square, then using Cauchy-Schwarz, then integrating over each element Kj , and
finally summing over all elements Kj yields the desired result
|u − ih u|H 1 ≤ Chku″kL2 .

Lemma 8.122. There exists a constant C (independent of h) such that for all u ∈ H 1 (Ω), it holds
kih ukH 1 ≤ CkukH 1
and
ku − ih ukL2 ≤ Ch|u|H 1 .
Moreover:
lim_{h→0} ku′ − (ih u)′kL2 = 0.

Remark 8.123. The generalization of the estimate of the interpolation operator is known as the Bramble-
Hilbert lemma from 1970 (see e.g. [24]).
Proof. The proofs use similar techniques as before. However, to get started, let u ∈ H 1 . In 1D we discussed
that H 1 functions are continuous. We therefore have:
kih ukL2 ≤ maxx |ih u(x)| ≤ maxx |u(x)| ≤ CkukH 1 .

Furthermore, we have on each element Kj , using Cauchy-Schwarz in the last step,
|ih u|²H 1 (Kj ) = ∫_{xj}^{xj+1} |(ih u)′(x)|² dx = h ( (u(xj+1 ) − u(xj ))/h )² = (1/h) ( ∫_{xj}^{xj+1} u′(x) dx )² ≤ ∫_{xj}^{xj+1} |u′(x)|² dx = |u|²H 1 (Kj ) .
As in Lemma 8.121, we sum over all elements and obtain, together with the L2 bound above,
kih ukH 1 ≤ CkukH 1 .
To obtain the second result, we use again (see the first part of the proof of Lemma 8.121):
u(x) − ih u(x) = u(x) − [ u(xj ) + (x − xj ) (u(xj+1 ) − u(xj ))/(xj+1 − xj ) ]
= ∫_{xj}^{x} u′(t) dt − (x − xj )/(xj+1 − xj ) ∫_{xj}^{xj+1} u′(t) dt,
hence
|u(x) − ih u(x)| ≤ 2 ∫_{xj}^{xj+1} |u′(t)| dt.
We then take the square, integrate, use Cauchy-Schwarz and finally sum over all elements to obtain
ku − ih ukL2 ≤ Ch|u|H 1 .
To establish the third result, let ε > 0. Since Cc∞ is dense in H 1 , for each u ∈ H 1 there exists v ∈ Cc∞ with
ku′ − v′kL2 ≤ ε.
With the stability estimate above we have:
k(ih u)′ − (ih v)′kL2 = |ih (u − v)|H 1 ≤ C|u − v|H 1 ≤ Cε.
For sufficiently small h we also have (Lemma 8.121, 2nd statement)
|v − ih v|H 1 ≤ ε.
These results can be used in our final estimate:
ku′ − (ih u)′kL2 ≤ ku′ − v′kL2 + kv′ − (ih v)′kL2 + k(ih v)′ − (ih u)′kL2 ≤ 3Cε,
which yields the desired result.

With the help of the previous three lemmas, we obtain the main result:
Theorem 8.124. Let u ∈ H01 and uh ∈ Vh be the solutions of the continuous and discrete Poisson problems.
Then, the finite element method using linear shape functions converges:

lim_{h→0} ku − uh kH 1 = 0.

Moreover, if u ∈ H 2 (for instance when f ∈ L2 and in higher dimensions when the domain is sufficiently
smooth or polygonal and convex), we have

ku − uh kH 1 ≤ Chku00 kL2 = Chkf kL2 .

Thus the convergence in the H 1 norm (the energy norm) is linear and depends continuously on the problem
data.

Proof. The first part is proven by using Lemma 8.119 in Proposition 8.116, which yields the first part of
the assertion. The estimate is based on the Céa lemma:

ku − uh kH 1 ≤ Cku − φh k ≤ Cku − ih ukH 1 ≤ Ch|u|H 2 = O(h).

In the last estimate, we used again Lemma 8.119.


Corollary 8.125. We have

ku − uh kL2 ≤ Chku00 kL2 = Chkf kL2 = O(h).

Here, weakening the norm to L2 does not seem to have any positive effect on the convergence order, even
though we managed to prove ku − ih ukL2 = O(h2 ) in Lemma 8.121, 1st statement.
Proof. Follows immediately from

ku − uh kH 1 ≤ Chku00 kL2 = Chkf kL2 .

Now we use the Poincaré inequality:


ku − uh kL2 ≤ Cku − uh kH 1
Thus:
ku − uh kL2 ≤ Cku − uh kH 1 ≤ Chku00 kL2 = Chkf kL2 = O(h).
Why does
ku − uh kL2 ≤ Cku − ih ukL2 ≤ Ch2 ku00 kL2
not work? The reason is that Céa's lemma only holds in the natural norm of the space V of the problem setting,
here V = H 1 . Thus, we cannot justify that

ku − uh kL2 ≤ Cku − ih ukL2

holds true. For this reason our final result is

ku − uh kL2 = O(h).

8.13.3 An improved L2 estimate - the Aubin-Nitsche trick
Carrying out numerical simulations, see for instance Section 8.16.3.2, we observe O(h2 ) in the L2 norm. This
is one order better than proved in Corollary 8.125. Actually, according to the L2 interpolation estimate, we
would have expected O(h2 ), but could not prove it there. In this section, we establish that not only the
interpolation error is O(h2 ), but indeed the approximation error ku − uh kL2 as well.
Proposition 8.126. Let u ∈ H01 and uh ∈ Vh be the solutions of the continuous and discrete Poisson problems.
Then:
ku − uh kL2 ≤ Ch2 ku00 kL2 = Ch2 kf kL2 = O(h2 ).
Proof. The goal is to obtain one order better than in the previous subsection. To this end, we define an
auxiliary problem (the Aubin-Nitsche trick) that considers the error between the exact solution u and the
discrete solution uh : Find w ∈ V such that
a(w, φ) = ∫Ω (u − uh )φ dx for all φ ∈ V.
We set φ = u − uh ∈ V (indeed, since u ∈ V and Vh ⊂ V , this φ is in V ) and obtain
a(w, u − uh ) = ∫Ω (u − uh )² dx = ku − uh k²L2 .

On the other hand we still have Galerkin orthogonality:


a(u − uh , φh ) = 0 ∀φh ∈ Vh .
Herein, we take the interpolant φh := ih w ∈ Vh and obtain
a(u − uh , ih w) = 0.
Combining these two findings, we get:
a(w, u − uh ) = a(w − ih w, u − uh ) = ku − uh k2L2 .
To get a better estimate we now proceed with the continuity of the bilinear form:
ku − uh k2L2 = a(w − ih w, u − uh ) ≤ γkw − ih wkH 1 ku − uh kH 1 . (124)
The previous interpolation estimate in Theorem 8.124 can now be applied to the auxiliary variable w ∈ V :
kw − ih wkH 1 ≤ Chkw00 kL2 = Chku − uh kL2 . (125)
The last estimate holds true thanks to the auxiliary problem, which reads in strong form (key word: funda-
mental lemma of calculus of variations!):
−w00 = u − uh in Ω, w = 0 on ∂Ω.
and in which we need again that Ω is sufficiently regular (which still holds from the original problem, of course)
and also that the right-hand side is in L2 . The latter is also satisfied since u ∈ V (in particular u ∈ L2 ) and
uh ∈ Vh ⊂ V ⊂ L2 .
Plugging (125) into (124) yields on the one hand:
ku − uh k2L2 ≤ γChku − uh kL2 ku − uh kH 1
therefore
ku − uh kL2 ≤ γChku − uh kH 1 .
On the other hand, we know from Theorem 8.124 that
ku − uh kH 1 ≤ Chkf kL2 .
Plugging the last equation into the previous equation brings us to:
ku − uh kL2 ≤ γChku − uh kH 1 ≤ C 2 h2 kf kL2 = O(h2 ).

8.13.4 Analogy between finite differences and finite elements
Having established the numerical analysis, we can draw a short comparison to finite differences:

• FD: convergence = consistency + stability


• FE: convergence = interpolation + coercivity

8.14 Numerical analysis in 2D and higher dimensions


The 2D and higher dimension results are much more involved and for the detailed proofs, we refer to the
literature [24, 35, 106].
Definition 8.127 ([24], Chapter 5). Let Ω be a polygonal domain in R2 such that we can decompose this
domain into quadrilaterals or triangles. A triangulation Th with M elements Ki , i = 1, . . . , M , is admissible
when
1. Ω̄ = ∪_{i=1}^{M} Ki .
2. If Ki ∩ Kj consists of one single point, it is a corner point of Ki and Kj .
3. If Ki ∩ Kj consists of more than one point, then Ki ∩ Kj is an edge (or face) of Ki and Kj .

Definition 8.128 (Quasi-uniform, shape regular). A family of triangulations Th is called quasi-uniform or
shape regular, when there exists a κ > 0 such that each Ki ∈ Th contains a circle with radius ρK with
ρK ≥ hK /κ,
where hK is half the diameter of the element K. The diameter of K is defined as the longest side of K.

Definition 8.129 (Uniform triangulation). A family of triangulations Th is called uniform, when there exists
a κ > 0 such that each Ki ∈ Th contains a circle with radius ρK with
ρK ≥ h/κ,
where h = maxK∈Th hK .
Remark 8.130. Geometrically speaking, the above conditions characterize a minimum angle condition,
which tells us that the largest angle must not approach 180◦ in order to avoid that triangles become too thin.
The constant κ is a measure for the smallest angle in K.

Remark 8.131. For adaptive mesh refinement (see Section 9.19.4) in higher dimensions, i.e., n ≥ 2, one has
to take care that the minimum angle condition is fulfilled when refining the mesh locally.
Definition 8.132. Let Th be a decomposition of Ω and m ≥ 1. Then, we define the norm
kukm := ( Σ_{Kj ∈Th } kuk²m,Kj )^{1/2} .

Remark 8.133. We recall that for m ≥ 2, the Sobolev space H m is compactly embedded into C 0 (Ω). Conse-
quently in these cases, we can construct a unique interpolation operator ih u. This allows us to define ih u and
to estimate the interpolation error ku − ih uk by higher-order norms.

8.14.1 The Bramble-Hilbert-Lemma and an interpolation estimate
Lemma 8.134 (Bramble-Hilbert-Lemma). Let us work in R2 in order to have the standard Lagrange interpolation; higher dimensions with other interpolation operators yield, however, the same results. Let now Ω ⊂ R2
be a domain with Lipschitz boundary. Let r ≥ 2 and let L be a bounded, linear mapping from H r to a normed
space Y . If Pr−1 ⊂ ker(L), then it holds with a constant c = c(Ω) kLk > 0:
kLuk ≤ c|u|H r ∀u ∈ H r (Ω).

Proof. Braess [24] p. 74.


Proposition 8.135 ([24], Chapter 6). Let r ≥ 2 and Th a quasi-uniform (i.e., shape-regular) triangulation of
Ω. Then the interpolation via piecewise polynomials of degree r − 1 yields

ku − ih ukH m ≤ chr−m |u|H r , u ∈ Hr, 0 ≤ m ≤ r.

Proof. See [24][Chapter 6]. Be aware that these are long and nontrivial proofs, from which, however, one
learns a lot.

8.14.2 Inverse estimates


Working in the continuous function spaces allows for the previous approximation results. Specifically, we have

ku − ih ukH m ≤ chr−m |u|H r ,

where m ≤ r. In other words: the approximation error is measured in a coarser norm than the function itself.
On the discrete level, we can show results the other way around: the finer norm of finite elements can be
estimated by coarser norms.
Proposition 8.136 (Inverse estimate). Let Vh := Vh^{(k)} be discrete function spaces based on uniform decompositions. Then, there exists a constant c := c(κ, k, r) such that for 0 ≤ m ≤ r:
kuh kr ≤ c h^{m−r} kuh km ∀uh ∈ Vh .
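A quick 1D illustration (our own computation, not from the cited literature): take the single hat function vh
on two elements of size h. Both norms can be evaluated in closed form, |vh |H 1 = (2/h)^{1/2} and
kvh kL2 = (2h/3)^{1/2} , so the ratio grows like √3/h, in agreement with the inverse estimate for r = 1, m = 0:

// Ratio |v|_{H1} / ||v||_{L2} for the P1 hat function on two elements of size h.
#include <cmath>
#include <cstdio>

int main() {
  for (double h : {0.1, 0.05, 0.025}) {
    const double semi = std::sqrt(2.0 / h);      // |v|_{H^1}
    const double l2 = std::sqrt(2.0 * h / 3.0);  // ||v||_{L^2}
    std::printf("h = %6.3f  ratio = %8.2f  h*ratio = %.3f\n",
                h, semi / l2, h * semi / l2);    // h*ratio stays constant
  }
  return 0;
}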

8.14.3 Error estimates


Definition 8.137. Let m ≥ 1 and H0m ⊂ V ⊂ H m . Let a(·, ·) be a V -elliptic bilinear form. The variational
problem, find u ∈ V such that
a(u, φ) = l(φ) ∀φ ∈ V
is H s regular if for each f ∈ H s−2m a solution u ∈ H s (Ω) exists. Furthermore, it holds

kukH s ≤ ckf kH s−2m

Here, we assume s ≥ 2m. For negative norms, this last condition can be omitted.
Theorem 8.138. Let Th be a quasi-uniform decomposition of Ω. Then it holds for uh ∈ Vh^{(k)} , k = 1, 2, 3,
and triangular or quadrilateral elements:

ku − uh kH 1 ≤ chkukH 2 ≤ chkf kL2 = O(h)

Using a duality argument (Aubin-Nitsche; see our 1D proofs), we obtain:


Theorem 8.139. Let the previous assumptions hold true, then:

ku − uh kL2 ≤ ch2 kf kL2 = O(h2 ).

8.15 Numerical analysis: influence of numerical quadrature, first Strang lemma
Previously, we studied the Céa lemma in terms of approximation theory. Now, a second error estimate for ũh
becomes important, which is a consequence of numerical quadrature; we call it the consistency error, and it
is handled by the first lemma of Strang.
Our focus is to combine both errors and to give some answers about the order of convergence in the H1-norm
|| · ||1 , that is,
||u − ũh ||1 ≤ ||u − uh ||1 + ||uh − ũh ||1 ,
where the first term on the right is treated by Céa and the second by Strang.

This investigation leads to practical guidelines for evaluating integrals with numerical integration: it tells us
which order of quadrature formula is necessary when integrating finite elements of a given degree.
The discussion covers two possible approaches. First, we study Lagrange interpolation of the data combined
with exact integration. The second approach is the approximation theory for numerical quadrature formulae,
the most common way of evaluating integrals in FEM.
Therefore, we study an elliptic variational problem in two dimensions. This includes the model problem
(Poisson problem). The results also hold for elliptic PDEs of second order in 1D.
Formulation 8.140. Let Ω ⊂ R2 be a bounded, convex, polygonal domain. The task is finding a weak solution
u ∈ H01 (Ω) for an elliptic boundary value problem in R2 with homogeneous Dirichlet conditions

u ∈ H01 (Ω) : a(u, v) = (f, v)0 for v ∈ H01 (Ω)

with
a(u, v) = Σ_{i,k=1}^{2} ∫Ω aik ∂k u ∂i v dx (126)

and coefficient functions aik := aik (x).


We assume a regular triangulation Th = {T } of Ω with the properties
S
i) Ω = i Ti

ii) Ti◦ ∩ Tν◦ = ∅ , ν 6= i


For more information see [24] p. 58.

8.15.1 Approximate solution of A and b


It is often impossible, or at least computationally very expensive, to compute the matrix A and the right-hand
side b exactly. The principal reasons are:
i) Even in the simplest case aij = δij in (126), the computation requires integration formulae for polynomials
of degree 2m − 2. Exact formulae for higher m require many evaluations, whereas convergence is already
ensured for quadrature rules of lower order.
ii) In general, the coefficient functions aij are not constant. Then, exact computation of the entries Aij and
bi is not possible.
iii) Numerical integration is the standard way of treating isoparametric elements.
The following ideas were introduced by Strang and generalize the Céa lemma. Already known is the
approximation error, and now a second estimate becomes important, called the consistency error.
Calculations are done on a reference element and subsequently affine-transformed to each cell (triangle).

8.15.1.1 Idea, deriving an approximate solution The matrix A = (Aij ), i, j = 1, . . . , N , and the right-hand
side b = (bi ), i = 1, . . . , N , lead to the linear equation system
Au = b.
Instead of solving this one, an approximate linear equation system is given by

Ãũ = b̃ (127)

with à = (Ãij )N
i,j=1 and b̃ = (b̃i ), i = 1, . . . , N . The solution of (127) brings us the approximate solution vector

N
X
ũh = α̃i wi ∈ Shm .
i=1

Now the error
eh = uh − ũh
arises. Next we give an idea for the corresponding variational formulation of (127). We work with two linear
combinations in Shm :
vh = Σ_{i=1}^{N} ξi wi and wh = Σ_{j=1}^{N} ηj wj .

These are necessary for the approximate bilinear form ã(vh , wh ) and the approximate right-hand side ˜l(vh ):
ã(vh , wh ) = Σ_{i,j=1}^{N} Ãij ξi ηj and ˜l(vh ) = Σ_{i=1}^{N} b̃i ξi . (128)

This leads to the variational formulation (Ṽ )

ũh ∈ Shm : ã(ũh , vh ) = ˜l(vh ) ∀vh ∈ Shm ⇔ Ãũ = b̃

The following lemma describes an error estimate for uh − ũh .


Lemma 8.141 (First lemma of Strang). Assume that the approximate bilinear form ã and the linear functional
˜l satisfy the condition
||vh ||²1 ≤ γ ã(vh ) (uniform ellipticity of ã) (129)
uniformly in 0 < h ≤ h0 , as well as the estimate
|(a − ã)(uh , vh )| + |(f, vh )0 − ˜l(vh )| ≤ ch^τ ||vh ||1 , vh ∈ Shm . (130)
Then the problems (Ṽ ) have unique solutions ũh ∈ Shm , and for the Ritz approximation uh ∈ Shm the
quantitative error estimate
||uh − ũh ||1 ≤ cγh^τ
holds.
Proof.
The unique solvability may be proved with the theorem of Lax-Milgram, see [24] p. 37 and [108] p. 41,
respectively. We derive the quantitative error estimate by the following calculation. Denote eh = vh = uh − ũh ,

ã(eh ) = ã(eh , eh ) = ã(uh − ũh , eh ) = ã(uh , eh ) − ã(ũh , eh )


= ã(uh , eh ) − a(uh , eh ) + a(uh , eh ) − ã(ũh , eh )
= (ã − a)(uh , eh ) + a(uh , eh ) − ã(ũh , eh )
= (ã − a)(uh , eh ) + (f, eh )0 − ˜l(eh )
≤ chτ ||eh ||1

The last estimate follows from the second assumption (130). Since we have (129), we can conclude:

||eh ||21 ≤ γã(eh ) ≤ γchτ ||eh ||1

Division by ||eh ||1 shows the assertion.

With the aid of (128) we have a corresponding formulation for the uniform ellipticity of ã:
ã(vh ) = Σ_{i,j} Ãij ξi ξj ≥ γ1 |ξ|² ∀ξ ∈ RN , γ1 > 0. (131)
Hence the coefficient matrix is uniformly positive definite.


We point out two different ways for an approximate determination of A and b. In this context, we assume
that the coefficient functions aij and the right-hand side f are sufficiently smooth,
aij , f ∈ L∞ , i, j = 1, 2.
Secondly, we work with an affine family of finite elements, so we have the polynomial space Pm (E) for the
reference element. Interpolation and quadrature formulae are derived on that reference element and later
transformed to each cell of Ω.

8.15.2 Interpolation with exact polynomial integration


Given the triangulation Th , on each triangle T ∈ Th the data aij and f are replaced by polynomial interpolants
ãij , f˜ ∈ Pr−1 (T ), i, j = 1, 2.
The error is determined by the interpolation remainder and we find the uniform estimate
||aij − ãij ||∞ + ||f − f˜||∞ = O(hr ). (132)
The hidden constant is independent of the mesh size h, and the estimate holds for functions aij , f ∈ C r (T ).
Higher differentiability does not lead to a better error estimate. See [69] p. 194ff.
We compute the elements of à and right-hand-side b̃ with
Z 2
X
Ãij = ãνµ ∂µ wi ∂µ wj dx, i, j = 1, . . . , N ,
Ω ν,µ=1
Z
and b̃i = f˜wi dx, i = 1, . . . , N .

The advantage of these elements is exact integration. Therefore we have to evaluate the following integrals,
Z
xβ dx, 0 ≤ |β| ≤ 2m + r − 3
T

Proof (sketch). We justify why the degree of the polynomials is bounded by 0 ≤ |β| ≤ 2m + r − 3.
By assumption, wi ∈ Shm , which means wi |T ∈ Pm (T ). The coefficient function ãνµ is a polynomial in Pr−1 (T ).
Then
ãνµ ∂ν wi ∂µ wj ∈ P(r−1)+2m−2 (T ),
which finishes the proof.

An error estimate is given by Lemma 8.141. The first task is to check its assumptions. Consider vh , wh ∈ Shm :
|(a − ã)(vh , wh )| = |a(vh , wh ) − ã(vh , wh )| (133)
= | ∫Ω Σ_{ν,µ=1}^{2} aνµ ∂µ vh ∂ν wh dx − ∫Ω Σ_{ν,µ=1}^{2} ãνµ ∂µ vh ∂ν wh dx | (134)
= | ∫Ω Σ_{ν,µ=1}^{2} (aνµ − ãνµ ) ∂µ vh ∂ν wh dx | (135)
≤ c ||a − ã||∞ · ∫Ω Σ_{ν,µ=1}^{2} |∂µ vh ∂ν wh | dx (136)
≤ c ||a − ã||∞ · ||vh ||1 · ||wh ||1 (Cauchy-Schwarz) (137)
≤ c h^r ||vh ||1 · ||wh ||1 . (by (132)) (138)

We prove the second part:
|(f, vh )0 − ˜l(vh )| = | ∫Ω f vh dx − ∫Ω f˜ vh dx | (139)
= | ∫Ω (f − f˜) vh dx | (140)
≤ c ||f − f˜||∞ · ||vh ||0 (141)
≤ c h^r · ||vh ||1 . (by (132)) (142)

We complete the proof by showing (129):
||vh ||²1 ≤ c a(vh ) (V-ellipticity) (143)
≤ c |a(vh ) − ã(vh ) + ã(vh )| (inserting ã(vh )) (144)
≤ c |a(vh ) − ã(vh )| + c |ã(vh )| (triangle inequality) (145)
= c |(a − ã)(vh )| + c |ã(vh )| (146)
≤ c h^r ||vh ||²1 + c ã(vh ). (set wh = vh in (138)) (147)
The last estimate uses ã(vh ) > 0. For sufficiently small step size 0 < h ≤ h0 we can conclude
||vh ||²1 ≤ c ã(vh ).
Now, Lemma 8.141 delivers the error of the approximate Ritz approximation:
||uh − ũh ||1 ≤ c h^r .

8.15.2.1 Total error We recall the roles of the different solutions:
• u: the original function which is to be approximated;
• the function u is approximated by the solution uh of the Ritz procedure;
• for lower computational cost, uh is itself approximated by ũh .
The triangle inequality shows
||u − ũh ||1 ≤ ||u − uh ||1 + ||uh − ũh ||1 = O(hm + hr ).
Optimal convergence requires r ≥ m. The error of the interpolated data can be neglected in comparison with
the discretization error if r > m.

8.15.3 Integration with numerical quadrature
In this context, numerical integration formulae are based on interpolated functions. The most common form is
fT (g) = ∫T g dx ≈ QT (g) = Σ_{i=1}^{L} ωi g(ξi ) (148)
with distinct quadrature points ξi ∈ T and quadrature weights ωi . For the construction of such formulae we
refer to the literature.

8.15.3.1 Affine transformation rules We define the unit triangle E as the reference element. Let σT : E → T
be the affine-linear mapping of E onto the triangle T . Then we have
x = σT (x̂) = BT x̂ + bT , x ∈ T, x̂ ∈ E,
with the Jacobian matrix BT .

Figure 27: Reference element E and arbitrary cell (triangle) T

Example 8.142. Let E = conv{â1 , â2 , â3 } with â1 = (0, 0)T , â2 = (1, 0)T , â3 = (0, 1)T . The triangle T has
the coordinates a1 = (3, 1/4), a2 = (4, −1/2), a3 = (17/4, 1).
A bijective mapping between E and a triangle T is given by σT : E → T with σT (x̂) = BT x̂ + b, where
b = a1 . The Jacobian matrix is given by
BT = (a2 − a1 , a3 − a1 ) = ( 1 5/4 ; −3/4 3/4 )
(rows separated by a semicolon). The determinant of BT is of order h2 , that means det BT = O(h2 ).
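A small sketch of this computation (illustrative code using the concrete numbers of Example 8.142):

// Affine map sigma_T(xh) = B_T xh + b_T for the triangle of Example 8.142.
#include <cstdio>

int main() {
  const double a1[2] = {3.0, 0.25}, a2[2] = {4.0, -0.5}, a3[2] = {4.25, 1.0};
  // B_T = (a2 - a1, a3 - a1) column-wise, b_T = a1:
  const double B[2][2] = {{a2[0] - a1[0], a3[0] - a1[0]},
                          {a2[1] - a1[1], a3[1] - a1[1]}};
  const double detB = B[0][0] * B[1][1] - B[0][1] * B[1][0];
  std::printf("det B_T = %f (= 2|T|)\n", detB);  // prints 27/16 = 1.6875

  // Map the reference barycenter (1/3,1/3); a weight w on E becomes |det B_T| * w.
  const double xh[2] = {1.0 / 3.0, 1.0 / 3.0};
  const double x[2] = {B[0][0] * xh[0] + B[0][1] * xh[1] + a1[0],
                       B[1][0] * xh[0] + B[1][1] * xh[1] + a1[1]};
  std::printf("sigma_T(1/3,1/3) = (%f, %f)\n", x[0], x[1]);
  return 0;
}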

We obtain the relation, also known as 'affine equivalence' between two Lagrange elements,
v̂(x̂) := v(x), x̂ = σT^{−1} x, x ∈ T,
between functions on E and T . The polynomial space for the reference element is characterized by Pm (E). A
(linear) quadrature rule on E satisfies
QE (g) = Σ_{i=1}^{L} ω̂i g(ξ̂i ), ω̂i := ∫E L̂i (x̂) dx̂, L̂i (x̂) = Π_{k≠i} (x̂ − x̂k )/(x̂i − x̂k ),
with quadrature points ξ̂i ∈ E and positive weights ω̂i , i = 1, . . . , L.
Affine transformation of QE from E gives quadrature rules on each element T :
QT (g) = Σ_{i=1}^{L} ωi g(ξi )
with quadrature points ξi = σT ξ̂i and weights ωi = |det BT | ω̂i , i = 1, . . . , L. Then
QT (g) := Σ_{i=1}^{L} ωi g(ξi ) = |det BT | Σ_{i=1}^{L} ω̂i ĝ(ξ̂i ) = |det BT | QE (ĝ).

We briefly recall the very important transformation rule for integrals; it generalizes the substitution rule in 1D.
Lemma 8.143. Let T1 and T2 be open subsets of Rn and σ : T1 → T2 a diffeomorphism. The function f on
T2 is integrable if and only if (f ◦ σ) · |det σ′| is integrable over T1 . It holds
∫T1 f (σ(x)) · |det σ′(x)| dx = ∫T2 f (y) dy.
Proof. See [86] p. 299.
For an affine mapping σ x̂ = B x̂ + b, with derivative D(σ x̂) = D(B x̂ + b) = B, it follows directly that
∫T v(x) dx = |det BT | ∫E v̂(x̂) dx̂.

The next two estimates can be derived directly from the geometry of our triangulation. Remember the use
of triangles in a regular triangulation: 'regular' means that
ρT ≥ c hT ∀h, T, c > 0,
where hT characterizes the maximum diameter of a triangle, that means the longest side, and ρT the inradius.
In general we have the following two estimates:
||BT ||2 ≤ hT /ρ̂ = c1 hT and ||BT^{−1} ||2 ≤ ĥ/ρT = c0 ρT^{−1} , (149)
where ĥ and ρ̂ denote the diameter and inradius of the reference element E. The symbol ||B||2 means
||B||2 = ( Σ_{i,k} |bik |² )^{1/2} .
Furthermore, there is an estimate for determinants,
||BT^{−1} ||2^{−n} ≤ |det BT | ≤ ||BT ||2^{n} ,
which may be proved with the Laplace expansion theorem. Then
c0 ρT^{n} ≤ |det BT | ≤ c1 hT^{n}
and
||BT ||2 ||BT^{−1} ||2 ≤ C h/ρ.
In 2D (n = 2) it follows that
C0 hT² ≤ ||BT^{−1} ||2^{−2} ≤ |det(BT )| ≤ ||BT ||2² ≤ C1 hT² . (150)
At last we mention
c0 |∇̂v̂(x̂)| ≤ hT |∇v(x)| ≤ c1 |∇̂v̂(x̂)|, x ∈ T, x̂ = σT^{−1} x. (151)

Proof (sketch). Estimate (151) follows from the chain rule:
∇̂v̂(x̂) = BT^T ∇v(x). (152)
We have
|∇v(x)| = |∇v(σ(x̂))| = |BT^{−T} ∇̂v̂(x̂)| ≤ ||BT^{−T} ||2 |∇̂v̂(x̂)|,
and on the other hand
|∇̂v̂(x̂)| = |BT^T ∇v(x)| ≤ ||BT^T ||2 |∇v(x)|.
Working with (149) shows (151).

8.15.3.2 Numerical quadrature We set QF as an abbreviation for quadrature formula.


Definition 8.144 (Order of a QF). A QF QT (·) on T is said to be of order r if it integrates exactly all
polynomials of degree r − 1:
QT (p) = ∫T p dx, p ∈ Pr−1 (T ).
Another equivalent formulation is given by
QT (g) = ∫T PT g dx, PT g ∈ Pr−1 (T ),
where PT is an interpolation operator for g. Then, QT (·) is called an interpolatory approximation of fT (·).
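The order in this sense is easy to test numerically. The following sketch (an illustration in 1D, anticipating the
midpoint rule used again in Section 8.15.4.1) confirms that the midpoint rule integrates polynomials of degree 1
exactly, but not of degree 2; hence it has order r = 2, i.e., degree of exactness 1. The generic lambda requires
C++14:

// Check the order of the midpoint rule in the sense of Definition 8.144.
#include <cmath>
#include <cstdio>

int main() {
  auto midpoint = [](double a, double b, auto f) { return (b - a) * f(0.5 * (a + b)); };
  // exact values: int_0^1 x dx = 1/2, int_0^1 x^2 dx = 1/3
  const double e1 = std::fabs(midpoint(0.0, 1.0, [](double x) { return x; }) - 0.5);
  const double e2 = std::fabs(midpoint(0.0, 1.0, [](double x) { return x * x; }) - 1.0 / 3.0);
  std::printf("error for x: %.2e (exact), for x^2: %.2e (not exact)\n", e1, e2);
  return 0;
}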
The first result gives an estimate for the error of numerical quadrature. Therefore we need the seminorm
|u|r,1,T = Σ_{|α|=r} ∫T |∂^α u| dx,
and we mention that C r (T ) is dense in H r,1 (T ).


Lemma 8.145 (Quantitative error of numerical quadrature). Assume QT (·) is a QF of order r ≥ 3 and let
g ∈ H r,1 (T ). Then
|fT (g) − QT (g)| ≤ chr |g|r,1,T ∀T (153)
for a constant c which is independent of T and h.


Proof. The requirement r ≥ 3 is needed for the Sobolev inequality, which is a consequence of the embedding
theorem. Note that the case r = 2 is also possible but more difficult to prove. The linear functional fT
is continuous with respect to the L1 -norm,
|fT (g)| = | ∫T g dx | ≤ ∫T |g| dx = |g|0,1,T , g ∈ L1 (T ). (154)
It follows for g ∈ H r,1 (T ) and all T :
|fT (g) − QT (g)| = |fT (g) − fT (PT g)| (interpolatory approximation of fT )
= |fT (g − PT g)|
≤ |g − PT g|0,1 (by (154))
≤ chr |g|r,1 . (interpolation estimate)
This finishes the proof.

Definition 8.146 (Admissibility of a QF). A QF QE (·) is admissible for Pm−1 (E) if each polynomial
q ∈ Pm−1 (E) has the property:
if q(ξ̂i ) = 0 for i = 1, . . . , L, then q ≡ 0. (155)
This property ensures that the Lagrange interpolation has a unique solution. A necessary condition for (155)
is that the number of quadrature points L is greater than or equal to dim Pm−1 .
We need two further tools: a norm on a quotient space and, after that definition, an estimate between QF
and integral.
Definition 8.147 (Norm on Pm−1 (E)/P0 (E)). On the finite-dimensional quotient space Pm−1 (E)/P0 (E) a
norm is defined by
|||q̂||| := ( Σ_{i=1}^{L} ω̂i Σ_{j=1}^{2} (∂j q)²(ξ̂i ) )^{1/2} .

Proof. One has to verify all properties of a norm; we only check the definiteness and leave the remaining
properties to the reader. Since the quadrature weights ω̂i are positive, it follows for q ∈ Pm−1 (E) that
Σ_{i=1}^{L} ω̂i Σ_{j=1}^{2} (∂j q)²(ξ̂i ) = 0 ⇒ ∂j q(ξ̂i ) = 0, 1 ≤ j ≤ 2, 1 ≤ i ≤ L.
Note that each ∂j q is an element of Pm−2 (E) ⊂ Pm−1 (E) because q ∈ Pm−1 (E). By the admissibility (155),
∂j q vanishes everywhere, i.e., ∂j q ≡ 0 for j = 1, 2, which shows the definiteness on the quotient space. The
reader may check
||q|| ≥ 0, ||αq|| = |α| ||q||, α ∈ R, ||p + q|| ≤ ||p|| + ||q||.

Lemma 8.148 (Ellipticity of QF). For some constant c which is independent of h we have
QT (|∇v|²) ≥ c ∫T |∇v|² dx ∀v ∈ Shm−1 ∀T. (156)

Proof. There is some work to do. First, we have the norm from Definition 8.147. A second norm on
Pm−1 (E)/P0 (E) is given by |q̂|1,E . Since Pm−1 (E)/P0 (E) is finite-dimensional, the two norms are equivalent:
there exists a constant Ĉ such that
Ĉ |q̂|²1,E ≤ |||q̂|||², q̂ ∈ Pm−1 (E)/P0 (E). (157)
We compute for p ∈ Shm−1 , with q̂ = p̂ on E:
QT (|∇p|²) = Σ_{i=1}^{L} ωi |∇p(ξi )|² (by (148))
= |det BT | Σ_{i=1}^{L} ω̂i |∇p(ξi )|² (with ωi = |det BT | ω̂i )
≥ c0 |det BT | h^{−2} Σ_{i=1}^{L} ω̂i |∇̂q̂(ξ̂i )|² (by (151))
= c0 |det BT | h^{−2} QE (|∇̂q̂|²) = c0 |det BT | h^{−2} |||q̂|||² .
We conclude with the aid of the norm equivalence (157):
QT (|∇p|²) ≥ c0 Ĉ |det BT | h^{−2} |q̂|²1,E = c0 Ĉ |det BT | h^{−2} ∫E |∇̂q̂|² dx̂
= c0 Ĉ |det BT | h^{−2} ∫E |BT^T ∇p|² dx̂ (by (152))
= c0 Ĉ h^{−2} ∫T |BT^T ∇p|² dx (transformation rule)
≥ c0 Ĉ h^{−2} ||BT^{−1} ||2^{−2} ∫T |∇p|² dx
≥ c0 Ĉ h^{−2} C0 h² ∫T |∇p|² dx (by (150))
= C ∫T |∇p|² dx
for p ∈ Shm−1 , which completes the proof.

8.15.3.3 Error estimate The central theorem leads to an error estimate of the FEM with numerical quadrature
formulae. We give the definitions of the approximate entries of the matrix, Ãij , and of the right-hand side, b̃i :
Ãij = Σ_T QT ( Σ_{ν,µ=1}^{2} aνµ ∂µ wi ∂ν wj ), i, j = 1, . . . , N,
and
b̃i = Σ_T QT (f wi ), i = 1, . . . , N.

The main theorem is a consequence of the first lemma of Strang (Lemma 8.141). First, we verify its conditions;
then we obtain the error estimate for uh − ũh .
Theorem 8.149. Assume a quadrature formula of order r ≥ 3,
QE (g) = Σ_{i=1}^{L} ω̂i g(ξ̂i ),
on E, which is admissible for Pm−1 (E), and let r ≥ m − 1. The degree of the finite elements is m − 1, that is,
uh , vh ∈ Shm−1 . The coefficient functions aνµ ∈ L∞ are sufficiently smooth. Then we have
|(a − ã)(uh , vh )| ≤ C h^{r−m+2} ||vh ||1 , vh ∈ Shm−1 ,
and
|(f, vh )0 − ˜l(vh )| ≤ C h^{r−m+2} ||vh ||1 , vh ∈ Shm−1 ,
and also the uniform ellipticity
c ||vh ||²1 ≤ ã(vh ), vh ∈ Shm−1 .
The constants c, C depend on u.

Proof. Set m′ := m − 1 and let vh ∈ Shm′ :
|(a − ã)(uh , vh )| = |a(uh , vh ) − ã(uh , vh )|
= | Σ_T ∫T Σ_{ν,µ=1}^{2} aνµ ∂µ uh ∂ν vh dx − Σ_T QT ( Σ_{ν,µ=1}^{2} aνµ ∂µ uh ∂ν vh ) |
= | Σ_T [ ∫T Σ_{ν,µ=1}^{2} aνµ ∂µ uh ∂ν vh dx − QT ( Σ_{ν,µ=1}^{2} aνµ ∂µ uh ∂ν vh ) ] |
≤ Σ_T | ∫T Σ_{ν,µ=1}^{2} aνµ ∂µ uh ∂ν vh dx − QT ( Σ_{ν,µ=1}^{2} aνµ ∂µ uh ∂ν vh ) | (triangle inequality)
≤ c h^r Σ_T | Σ_{ν,µ=1}^{2} aνµ ∂µ uh ∂ν vh |_{r,1,T} . (by (153))

The functions aνµ are bounded with ||aνµ ||∞ ≤ c, which implies
|(a − ã)(uh , vh )| ≤ c h^r Σ_T Σ_{ν,µ=1}^{2} |∂µ uh ∂ν vh |_{r,1,T} (158)
≤ c h^r Σ_T ( Σ_{µ=1}^{2} |∂µ uh |²_{r,1,T} )^{1/2} ( Σ_{ν=1}^{2} |∂ν vh |²_{r,1,T} )^{1/2} (Cauchy-Schwarz) (159)
≤ c h^r Σ_T ||uh ||_{r+1,T} · ||vh ||_{r+1,T} . (160)

Now, a function wh ∈ Shm′ is, on each triangle T ∈ Th , a polynomial in Pm′ (T ). Since all derivatives of order
larger than m′ vanish, we have
||wh ||_{r+1,T} = ||wh ||_{m′,T} for r ≥ m′ . (161)
The inverse estimate delivers
||wh ||_{m′,T} ≤ c h^{1−m′} ||wh ||_{1,T} . (162)
Using (161) and (162) for the function vh shows
||vh ||_{r+1,T} ≤ c h^{1−m′} ||vh ||_{1,T} . (163)
With (161) we obtain for uh
||uh ||_{r+1,T} = ||uh ||_{m′,T} . (164)
We insert the last two results and the Cauchy-Schwarz inequality into (160):
|(a − ã)(uh , vh )| ≤ c h^r Σ_T ||uh ||_{r+1,T} · ||vh ||_{r+1,T}
≤ c h^r ( Σ_T ||vh ||²_{r+1,T} )^{1/2} ( Σ_T ||uh ||²_{r+1,T} )^{1/2} (Cauchy-Schwarz)
≤ c h^{r−m′+1} ||vh ||1 ( Σ_T ||uh ||²_{m′,T} )^{1/2} (by (163), (164))
= c h^{r−m′+1} ||vh ||1 · ||uh ||_{m′} .

We estimate the last factor via the Ritz approximation uh . Taking an interpolation operator Ih u ∈ Shm′ for u,
we conclude
||uh ||_{m′,T} = ||uh − Ih u + Ih u − u + u||_{m′,T}
≤ ||uh − Ih u||_{m′,T} + ||Ih u − u||_{m′,T} + ||u||_{m′,T}
≤ c h^{1−m′} ||uh − Ih u||_{1,T} + ||Ih u − u||_{m′,T} + ||u||_{m′,T} (by (162))
= c h^{1−m′} ||uh − u + u − Ih u||_{1,T} + ||Ih u − u||_{m′,T} + ||u||_{m′,T}
≤ c h^{1−m′} ||uh − u||_{1,T} + c h^{1−m′} ||u − Ih u||_{1,T} + ||Ih u − u||_{m′,T} + ||u||_{m′,T} .
Remember the approximation result for Ih :
||u − Ih u||_{m′} ≤ c h^{t−m′} |u|_t ≤ c h^{t−m′} ||u||_t for u ∈ H t (Ω), 0 ≤ m′ ≤ t.
Then, we have
||uh ||_{m′,T} ≤ c h^{1−m′} ||uh − u||_{1,T} + c1 ||u||_{m′,T} + c2 ||u||_{m′,T} + ||u||_{m′,T}
≤ c h^{1−m′} ||uh − u||_{1,T} + C ||u||_{m′,T} .
The extension to Ω is
||uh ||_{m′} = ( Σ_T ||uh ||²_{m′,T} )^{1/2} ≤ c h^{1−m′} ||u − uh ||1 + c ||u||_{m′} .
Since ||u − uh ||1 = O(h^{m′} ), this gives
|(a − ã)(uh , vh )| ≤ c h^{r−m′+1} ||vh ||1 · ||uh ||_{m′}
≤ c h^{r−m′+1} ||vh ||1 ( c h^{1−m′} ||u − uh ||1 + c ||u||_{m′} )
= c h^{r−2m′+2} ||u − uh ||1 ||vh ||1 + c h^{r−m′+1} ||vh ||1 · ||u||_{m′}
= c h^{r−m′+2} ||vh ||1 + c(u) h^{r−m′+1} ||vh ||1
≤ c(u) h^{r−m′+1} ||vh ||1 .
We explain in words: the third line uses the order of convergence ||u − uh ||1 = O(h^{m′} ). The factor ||u||_{m′}
is not important for convergence, so it is absorbed into a constant c(u). The order h^{r−m′+1} is not as good
as h^{r−m′+2} , which explains the last line. Remark that r ≥ 3 and r ≥ m′ . Short summary of the result
(m′ = m − 1):
|(a − ã)(uh , vh )| ≤ c h^{r−m+2} ||vh ||1 , c = c(u).
An analogous computation verifies
|(f, vh )0 − ˜l(vh )| ≤ c h^{r−m+2} ||vh ||1 .
The last estimate to show is the uniform ellipticity of ã. Let vh ∈ Shm′ , which implies vh |T ∈ Pm′ (T ). Moreover,
numerical stability requires positive weights, and we recall that the bilinear form a is elliptic. Then
ã(vh ) = Σ_T QT ( Σ_{ν,µ=1}^{2} aνµ ∂µ vh ∂ν vh )
≥ c Σ_T QT (|∇vh |²) (uniform ellipticity of a; (128), (131))
≥ c Σ_T ∫T |∇vh |² dx (by (156))
= c |vh |²1
≥ c ||vh ||²1 . (Poincaré for functions in H01 (Ω))

The last estimate follows from the equivalence of || · ||1 and | · |1 on H01 .
The conditions of Lemma 8.141 are fulfilled and the proof is finished.

The previous theorem yields the order of convergence of the FEM with numerical integration.
Corollary 8.150. Let the conditions of Theorem 8.149 hold. Then, Lemma 8.141 leads to a quantitative
error estimate with τ = r − m + 2:
||uh − ũh ||1 = O(h^{r−m+2} ).
The total error is given by
||u − ũh ||1 ≤ ||u − uh ||1 + ||uh − ũh ||1 = O(h^{m−1} + h^{r−m+2} ).
Convergence is assured if r ≥ m − 1. In other words,
r′ ≥ m′ , (165)
where r′ denotes the degree of the quadrature rule and m′ the degree of the finite elements.
Example 8.151. For r = m − 1 we have
||u − ũh ||1 = O(h).
Optimal convergence is given if r ≥ 2m − 3. For example, for r = 2m − 3:
||u − ũh ||1 = O(h^{m−1} ).
The error of quadrature can be neglected if r > 2m − 3, because then the discretization error
||u − uh ||1 = O(h^{m−1} )
dominates the order of convergence.


Remark 8.152. The inequality (165) is the main assertion of this section. But keep in mind that r′ ≫ 2m′ − 1
(degree of the quadrature rule very large in comparison to the optimal order of convergence) is not recommended,
because the evaluation of the coefficient functions aik (x) and the gradients ∂µ wi ∂ν wj , i, j = 1, . . . , N ,
µ, ν = 1, 2, can become very expensive for higher-order quadrature rules.

8.15.3.4 Concluding remarks The central result is the first lemma of Strang (Lemma 8.141), which gives a
quantitative error estimate and leads to Theorem 8.149.
The main condition is the uniform ellipticity of the approximate bilinear form ã.
If the conditions are not satisfied, it may happen that the linear equation system Au = b is not solvable
because the matrix A is singular. We study one example for this situation.

8.15.4 Numerical tests


8.15.4.1 Example 1 We discuss one simple example of a situation where the system matrix becomes singular.
Recall that convergence requires r′ ≥ m′ , where r′ denotes the degree of the quadrature rule and m′ the degree
of the FEM polynomials; this condition is violated below.
We consider the one-dimensional Poisson problem, take quadratic finite elements and use the midpoint
rule as integration formula. So we solve
−u″ = f, u(0) = u(1) = 0,
and take quadratic basis functions on the interval [0, 1] with equidistant step size h = xi − xi−1 . One additional
point per cell Ti is required; we take the midpoint
xi−1/2 := (xi−1 + xi )/2.

Each quadratic function v is then represented as

    v(x) = Σ_{i=0}^{N} vi ψi(x) + Σ_{i=1}^{N} v_{i−1/2} ψ_{i−1/2}(x)

with the following properties:

  i)  ψi(xk) = δik,  ψi(x_{k−1/2}) = 0,
  ii) ψ_{i−1/2}(xk) = 0,  ψ_{i−1/2}(x_{k−1/2}) = δik.
We obtain the following quadratic basis functions:

    ψi(x) = (x − x_{i−1})(x − x_{i−1/2}) / [ (x_i − x_{i−1})(x_i − x_{i−1/2}) ]    for x ∈ Ti,
    ψi(x) = (x − x_{i+1})(x − x_{i+1/2}) / [ (x_i − x_{i+1})(x_i − x_{i+1/2}) ]    for x ∈ T_{i+1},
    ψi(x) = 0    otherwise,

and

    ψ_{i−1/2}(x) = (x − x_{i−1})(x − x_i) / [ (x_{i−1/2} − x_{i−1})(x_{i−1/2} − x_i) ]    for x ∈ Ti,
    ψ_{i−1/2}(x) = 0    otherwise.

The next task is to construct the derivatives and to state the variational formulation of the one-dimensional
Laplace problem. The arising integrals are then approximated with the midpoint rule,

    ∫_a^b f(x) dx = (b − a) f( (a + b)/2 ) + (1/24) f′′(ξ) (b − a)³.    (166)

The midpoint rule has degree 1. Using this quadrature formula for quadratic elements fails, and the resulting
system matrix A will contain a zero row and column, so it is singular. For a local element we obtain
    ∂x ψ_{i−1}(x)   = (2/h_i²) (2x − x_{i−1/2} − x_i),
    ∂x ψ_{i−1/2}(x) = (4/h_i²) (x_{i−1} + x_i − 2x),
    ∂x ψ_i(x)       = (2/h_i²) (2x − x_{i−1/2} − x_{i−1}),
and the following integrals over Ti = [x_{i−1}, x_i]:

    A11 = ∫_{x_{i−1}}^{x_i} (∂x ψ_{i−1})² dx
    A22 = ∫_{x_{i−1}}^{x_i} (∂x ψ_{i−1/2})² dx ≈ h_i ( ∂x ψ_{i−1/2}(x_{i−1/2}) )² = 0
    A33 = ∫_{x_{i−1}}^{x_i} (∂x ψ_i)² dx
    A12 = A21 = ∫_{x_{i−1}}^{x_i} ∂x ψ_{i−1} ∂x ψ_{i−1/2} dx ≈ h_i ( ∂x ψ_{i−1} · ∂x ψ_{i−1/2} )(x_{i−1/2}) = 0
    A13 = A31 = ∫_{x_{i−1}}^{x_i} ∂x ψ_{i−1} ∂x ψ_i dx
    A23 = A32 = ∫_{x_{i−1}}^{x_i} ∂x ψ_{i−1/2} ∂x ψ_i dx ≈ h_i ( ∂x ψ_{i−1/2} · ∂x ψ_i )(x_{i−1/2}) = 0

At last we write the results into the matrix,

        ( A11   0   A13 )
    A = (  0    0    0  )        (167)
        ( A31   0   A33 )
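The following small, self-contained C++ program (our own illustration, not part of the original example)
assembles this local 3×3 matrix for one cell, once with the midpoint rule and once with the two-point Gauss
rule (degree 3), and thereby reproduces the singular pattern (167):

#include <cstdio>
#include <cmath>

// Derivatives of the three local quadratic basis functions on T_i = [x0, x1].
double dpsi(int j, double x, double x0, double x1) {
  const double h = x1 - x0, xm = 0.5 * (x0 + x1);
  switch (j) {
    case 0:  return 2.0 / (h * h) * (2.0 * x - xm - x1); // psi_{i-1}
    case 1:  return 4.0 / (h * h) * (x0 + x1 - 2.0 * x); // psi_{i-1/2}
    default: return 2.0 / (h * h) * (2.0 * x - xm - x0); // psi_i
  }
}

int main() {
  const double x0 = 0.0, x1 = 1.0, h = x1 - x0, xm = 0.5 * (x0 + x1);
  // Two-point Gauss rule on [x0, x1] (degree 3, hence exact for these integrands).
  const double d = 0.5 * h / std::sqrt(3.0);
  const double gp[2] = {xm - d, xm + d}, gw[2] = {0.5 * h, 0.5 * h};

  double Amid[3][3], Agauss[3][3];
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) {
      Amid[i][j] = h * dpsi(i, xm, x0, x1) * dpsi(j, xm, x0, x1); // midpoint rule
      Agauss[i][j] = 0.0;
      for (int q = 0; q < 2; ++q)
        Agauss[i][j] += gw[q] * dpsi(i, gp[q], x0, x1) * dpsi(j, gp[q], x0, x1);
    }

  std::printf("midpoint rule (singular):      two-point Gauss (regular):\n");
  for (int i = 0; i < 3; ++i)
    std::printf("%6.2f %6.2f %6.2f        %6.2f %6.2f %6.2f\n",
                Amid[i][0], Amid[i][1], Amid[i][2],
                Agauss[i][0], Agauss[i][1], Agauss[i][2]);
  return 0;
}

The midpoint-rule matrix shows the zero second row and column of (167), whereas the two-point Gauss rule
(r′ = 3 ≥ m′ = 2) yields the regular exact element stiffness matrix.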

8.15.4.2 Example 2 The second example also deals with the Laplace equation, but in 2D, implemented
again in deal.II [6]. Our focus is on two lines in this program:
// m' : degree of the finite elements
fe (m'),

and
// Gauss formula, where n gives the number of nodes in each direction
QGauss<2> quadrature_formula(n);
The degree of the Gauss quadrature is given by r′ = 2n − 1. We made some calculations for different m′
and r′. The results are:

    m′ \ r′ |  1 (n=1)   3 (n=2)   5 (n=3)   7 (n=4)
    --------+---------------------------------------
       1    |    OK        OK        OK        OK
       2    |    F         OK        OK        OK
       3    |    F         ok        OK        OK
       4    |    F         F         ok        OK
       5    |    F         F         ok        ok
       6    |    F         F         F         ok
       7    |    F         F         F         ok
       8    |    F         F         F         F

We shortly explain the notation:

• ok: (normal) convergence takes place

• OK: optimal convergence, obtained for r′ ≥ 2m′ − 1

• F: no convergence

8.16 Numerical tests and computational convergence analysis
We finish this long chapter with several numerical tests in 2D and 3D.

8.16.1 2D Poisson

Figure 28: The graphical solution (left) and the corresponding mesh (right).

We compute the Poisson problem in 2D on Ω = (0, 1)² with homogeneous Dirichlet conditions on ∂Ω. The
force is f = 1. We use a quadrilateral mesh with Q1 elements (in an isoparametric FEM framework). The
number of mesh elements is 256 and the number of DoFs is 289. The linear solver needs 26 CG iterations
(see Section 10.4 for the development of the CG scheme), without preconditioning, to converge.

Figure 29: Q1 (bilinear) and Q2 (biquadratic) elements on a quadrilateral in 2D.

8.16.2 Numerical test: 3D
Number of elements: 4096
Number of DoFs: 4913
Number of iterations: 25 CG steps without preconditioning.

Figure 30: The graphical solution (left) and the corresponding mesh (right).

8.16.3 Checking programming code and convergence analysis for linear and quadratic FEM
We present a general algorithm and afterwards show 2D results.
Algorithm 8.153. Given a PDE problem. E.g. −∆u = f in Ω and u = 0 on the boundary ∂Ω.

1. Construct by hand a solution u that fulfills the boundary conditions.


2. Insert u into the PDE to determine f .
3. Use that f in the finite element simulation to compute numerically uh .

4. Compare u − uh in relevant norms (e.g., L2 , H 1 ).


5. Check whether the desired h powers can be obtained for small h.
We demonstrate the previous algorithm for a 2D case in Ω = (0, π)2 :

−∆u(x, y) = f in Ω,
u(x, y) = 0 on ∂Ω,

and we construct u(x, y) = sin(x) sin(y), which fulfills the boundary conditions (trivial to check! But please do
it!). Next, we compute the right-hand side f:

    −∆u = −(∂xx u + ∂yy u) = 2 sin(x) sin(y) = f(x, y).
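As a quick cross-check of steps 1 and 2 of Algorithm 8.153 (a minimal sketch of our own, independent of
the FEM program), one can verify −∆u = f numerically with central difference quotients:

#include <cstdio>
#include <cmath>

// Manufactured solution and the right-hand side derived from it.
double u(double x, double y) { return std::sin(x) * std::sin(y); }
double f(double x, double y) { return 2.0 * std::sin(x) * std::sin(y); }

int main() {
  const double x = 0.7, y = 1.3, h = 1e-4;
  // Second-order central differences for the Laplacian of u.
  const double lap = (u(x + h, y) - 2.0 * u(x, y) + u(x - h, y)) / (h * h)
                   + (u(x, y + h) - 2.0 * u(x, y) + u(x, y - h)) / (h * h);
  std::printf("-lap u = %.8f, f = %.8f, diff = %.2e\n",
              -lap, f(x, y), std::fabs(-lap - f(x, y)));
  return 0;
}

The printed difference is of size O(h²) ≈ 1e−8, confirming the computed f.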

8.16.3.1 Numerically-obtained reference values On a seven-times refined mesh (16 384 elements), we obtain
as point-value goal functionals using Q1c elements:
u(0.5, 0.5) = 0.0736749...
u(0.75, 0.75) = 0.0452887...

For Q2c , we obtain


u(0.5, 0.5) = 0.0736714...
u(0.75, 0.75) = 0.0452862...

These values can be compared with the manufactured solution constructed above, evaluated at the same
points. Moreover, such reference values are an easy way to quickly check one's own solver.
For a more detailed numerical analysis, we turn to the next paragraphs.

8.16.3.2 2D Poisson: Linear FEM with Q1c elements We then use f in the above program from Section
8.16.1 and evaluate the L2 and H 1 norms using linear FEM. The results are:
Level Elements DoFs (N) h L2 err H1 err
=============================================================================
2 16 25 1.11072 0.0955104 0.510388
3 64 81 0.55536 0.0238811 0.252645
4 256 289 0.27768 0.00597095 0.126015
5 1024 1089 0.13884 0.00149279 0.0629697
6 4096 4225 0.06942 0.0003732 0.0314801
7 16384 16641 0.03471 9.33001e-05 0.0157395
8 65536 66049 0.017355 2.3325e-05 0.00786965
9 262144 263169 0.00867751 5.83126e-06 0.00393482
10 1048576 1050625 0.00433875 1.45782e-06 0.00196741
11 4194304 4198401 0.00216938 3.64448e-07 0.000983703
=============================================================================
In this table we observe quadratic convergence in the L2 norm and linear convergence in the H1 norm. For
a precise (heuristic) computation, we also refer to Chapter 16 in which a formula for computing the
convergence orders α is derived.

8.16.3.3 2D Poisson: Quadratic FEM with Q2c elements We use again f in the above program from Section
8.16.1 and evaluate the L2 and H 1 norms using quadratic FEM. The results are:
Level Elements DoFs (N) h L2 err H1 err
=============================================================================
2 16 81 1.11072 0.00505661 0.0511714
3 64 289 0.55536 0.000643595 0.0127748
4 256 1089 0.27768 8.07932e-05 0.00319225
5 1024 4225 0.13884 1.01098e-05 0.000797969
6 4096 16641 0.06942 1.26405e-06 0.000199486
7 16384 66049 0.03471 1.58017e-07 4.98712e-05
8 65536 263169 0.017355 1.97524e-08 1.24678e-05
9 262144 1050625 0.00867751 2.46907e-09 3.11694e-06
10 1048576 4198401 0.00433875 3.08687e-10 7.79235e-07
11 4194304 16785409 0.00216938 6.14696e-11 1.94809e-07
=============================================================================
In this table we observe that we have cubic convergence O(h3 ) in the L2 norm and quadratic convergence
O(h2 ) in the H 1 norm. This confirms the theory; see for instance [24]. For a precise (heuristic) computation,
we also refer to Chapter 16 in which a formula for computing the convergence orders α is derived.

We next plot the DoFs versus the errors in Figure 31 in order to highlight the convergence orders. For the
relationship between h and DoFs versus the errors in different dimensions, we refer again to Chapter 16.

Figure 31: Plotting the DoFs versus the various errors for the 2D Poisson test using linear and quadratic FEM.
We confirm numerically the theory: we observe ||u − uh||_{L2} = O(h^{r+1}) and ||u − uh||_{H1} = O(h^{r}) for
r = 1, 2, where r is the FEM degree.

8.16.4 Convergence analysis for 1D Poisson using linear FEM


We continue Section 8.4.9. As manufactured solution we use

    u(x) = ½ (x² − x)

on Ω = (0, 1) with u(0) = u(1) = 0, which we derived in Section 8.1.1. At the midpoint, we have
u(0.5) = −0.125 (in theory and in the simulations). In the following table, we display the L2 and H1 error
norms, i.e., ||u − uh||_X with X = L2 and X = H1, respectively.

Level Elements DoFs (N) h L2 err H1 err


=============================================================================
1 2 3 0.5 0.0228218 0.146131
2 4 5 0.25 0.00570544 0.072394
3 8 9 0.125 0.00142636 0.0361126
4 16 17 0.0625 0.00035659 0.0180457
5 32 33 0.03125 8.91476e-05 0.00902154
6 64 65 0.015625 2.22869e-05 0.0045106
7 128 129 0.0078125 5.57172e-06 0.00225528
8 256 257 0.00390625 1.39293e-06 0.00112764
9 512 513 0.00195312 3.48233e-07 0.000563819
10 1024 1025 0.000976562 8.70582e-08 0.000281909
=============================================================================

Using the formulae from Chapter 16, we compute the convergence order. Taking for instance the L2 error,
we have

    P(h)   = 1.39293e−06,
    P(h/2) = 3.48233e−07,
    P(h/4) = 8.70582e−08.

Then:

    α = (1/log(2)) · log( |1.39293e−06 − 3.48233e−07| / |3.48233e−07 − 8.70582e−08| ) = 1.99999696186957,

which is optimal convergence and confirms the a priori error estimates from Section 8.13.3. The octave code
is:
alpha = 1 / log(2) * log(abs(1.39293e-06 - 3.48233e-07) / abs(3.48233e-07 - 8.70582e-08))
For the H 1 convergence order we obtain:
alpha = 1 / log(2) * log(abs(0.00112764 - 0.000563819) / abs(0.000563819 - 0.000281909))
= 1.00000255878430,
which is again optimal convergence; we refer to Section 8.13.2 for the theory.

8.16.5 Computing the error norms ||u − uh||


To evaluate the global error norms ||u − uh||_{L2(Ω)} and ||u − uh||_{H1(Ω)}, we proceed element-wise and define:
Definition 8.154 (See for example [24](Chapter 2, Section 6)). For a decomposition Th = {K1, . . . , KM} of
the domain Ω and m ≥ 1 we define

    ||u||_{H^m(Ω)} := ( Σ_{Kj} ||u||²_{H^m(Kj)} )^{1/2}.

Therefore we compute element-wise the error contributions:

    ||u||²_{H^m(Kj)} = Σ_{|α|≤m} ||∂^α u||²_{L2(Kj)},

where α is a multi-index representing the degree of the derivatives; see Section 3.6.
Example 8.155 (L2 and H1). For L2 it holds:

    ||u||²_{L2(Kj)} = ∫_{Kj} u² dx.

For H1 it holds:

    ||u||²_{H1(Kj)} = ∫_{Kj} ( u² + |∇u|² ) dx.

Consequently for the errors:


Example 8.156 (Errors in L2 and H1). For L2 it holds:

    ||u − uh||²_{L2(Kj)} = ∫_{Kj} (u − uh)² dx.

For H1 it holds:

    ||u − uh||²_{H1(Kj)} = ∫_{Kj} ( (u − uh)² + |∇u − ∇uh|² ) dx.

These expressions allow in principle a direct evaluation of the integrals; in practice, since uh only lives in the
discrete nodes (or quadrature points), we directly employ a quadrature rule.

Then we obtain:
Proposition 8.157. It holds:

    ||u − uh||_{L2(Ω)} = ( Σ_{Kj} ||u − uh||²_{L2(Kj)} )^{1/2},
    ||u − uh||_{H1(Ω)} = ( Σ_{Kj} ||u − uh||²_{H1(Kj)} )^{1/2}.
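A minimal 1D illustration of this element-wise, quadrature-based evaluation (our own sketch; here uh is
taken as the nodal P1 interpolant of a known u, which plays the role of the discrete solution):

#include <cstdio>
#include <cmath>

// Exact solution on [0,1]; its P1 interpolant serves as u_h.
double u_exact(double x) {
  const double pi = std::acos(-1.0);
  return std::sin(pi * x);
}

int main() {
  const int n_cells = 64;
  const double h = 1.0 / n_cells;
  // Two-point Gauss rule on the reference cell [0,1].
  const double qp[2] = {0.5 - 0.5 / std::sqrt(3.0), 0.5 + 0.5 / std::sqrt(3.0)};
  const double qw[2] = {0.5, 0.5};

  double err2 = 0.0;
  for (int k = 0; k < n_cells; ++k) {
    const double x0 = k * h;
    const double u0 = u_exact(x0), u1 = u_exact(x0 + h); // nodal values of u_h
    for (int q = 0; q < 2; ++q) {
      const double x = x0 + qp[q] * h;
      const double uh = (1.0 - qp[q]) * u0 + qp[q] * u1; // linear interpolant
      const double d = u_exact(x) - uh;
      err2 += qw[q] * h * d * d; // contribution ||u - u_h||^2_{L2(K_j)}
    }
  }
  std::printf("||u - u_h||_L2 = %.6e (expected order O(h^2))\n", std::sqrt(err2));
  return 0;
}

The H1 error is computed in the same loop by additionally accumulating the squared difference of the
derivatives.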

8.17 Chapter summary and outlook


In this key chapter, we have introduced the theoretical background of finite elements, namely variational
formulations. We discussed the properties and characteristic features of the finite element method in one
dimension and higher dimensions. The discretization is based on uniform meshes (grids). For large-scale
problems and practical applications, uniform meshes are, however, infeasible. In the next chapter, we introduce
a posteriori error estimation that can be used to extract localized error indicators, which are then used for
adaptive mesh refinement such that only ‘important’ parts of the numerical solution are resolved with a desired
accuracy. Less important solution parts can be still treated with relatively coarse meshes. In Chapter 10, we
discuss some numerical solution techniques such as the CG scheme, which has already been employed in the
current chapter.


9 Goal-oriented a posteriori error estimation and adaptivity


This chapter is devoted to goal-oriented adaptivity based on a posteriori error estimation. The error estimation
is carried out with the dual-weighted residual method (DWR) in which an adjoint problem must be solved. This
method is particularly suited for problems in which general error functionals, involving only parts of the
numerical solution uh, are of interest. In these cases, we need a high accuracy of the numerical solution, but
uniform (i.e., global) mesh refinement is very expensive for large problems and three-dimensional computations.
Here, memory requirements and CPU wall clock simulation times are too high to be justified. Mesh adaptivity
tries to refine only a few mesh elements such that the envisaged discretization error reduces below a given
tolerance. Simultaneously, mesh regions that are of no interest for the total accuracy may be coarsened.
Similarly, these concepts can be applied to the adaptive control of solution algorithms, for instance linear or
nonlinear solvers.

9.1 Overview of the special class WS 19/20 on goal-oriented error estimation


1. Class 1: Motivation and principles: why error estimation, why adaptivity, examples, efficiency/robustness
of error estimators, goal-functionals, Lagrangian (combining the PDE and the goal functional); see
Section 9.2
2. Class 2: Errors and adjoints in Rn for linear equation systems [11][Section 1.4]

3. Class 3: DWR in time for ODEs (I) [11][Chapter 2]


4. Class 4: DWR in time for ODEs (II) [11][Chapter 2]; algorithm for end-time error estimation in which
the initial value for the adjoint is numerically approximated
5. Class 5: Spatial DWR for linear Poisson (classical localization; integration by parts to obtain error
information from neighboring elements) and approximation of the adjoint problem (Section 9.12.1)
6. Class 6: Spatial DWR for linear Poisson with PU localization 9.12.3; effectivity of the classical localization
[113]
7. Class 7, Dec 3, 2019: Effectivity PU localization [113]; DWR for nonlinear spatial problems; adaptive
algorithms; balancing discretization and nonlinear iteration error [45, 107]
8. Class 8, Dec 5, 2019: Ex 3; algorithms fixed-point and Newton for balancing discretization and nonlinear
iteration error [45, 107]; two-sided error estimates [46] (Corr. 3.1, Lemma 3.2, Assumpt. 1, Theorem
3.3, Sec. 3.4 with Theorem 3.7, Part 1)
9. Class 9, Dec 12, 2019: Ex 4; Multiple goal functionals [47][Section 3]

10. Class 10, Dec 17, 2019: proof of enrichment for the combined goal functional [47][Prop. 3.1], adap-
tive algorithm for multiple goal functionals [47][Section 3.1]; practical aspects: refining elements (tri-
angles, quads, prisms, hex; hanging nodes); refinement strategies: averaging, fixed-rate, fixed-fraction
[11][Chapter 4]

11. Class 11, Jan 7, 2020: Space-time DWR [11][Chapter 9]; a primal-part error estimator for PU-DWR
space-time in which a(u)(ϕ) may be semi-linear
12. Class 12: Jan 14, 2020: Space-time DWR Besier (Schmich)/Vexler [119]. Further literature (not done):
Rannacher [19] and [18]
13. Class 13, Jan 16, 2020: Space-time DWR Besier (Schmich)/Vexler [119]. Further literature (not done):
Rannacher [19] and [18]
14. Class 14, Jan 28, 2020: Open questions, future possibilities, further applications

9.2 Motivation (Class 1)
Why adaptivity?

• Allows for flexible algorithms, efficient evaluations of numerical solutions or parts of it


• Decreasing the computational cost
• More important than ever because of big data, multiphysics applications, ...

How?
• Goal-oriented algorithms
• Approach motivated from optimization (Lagrange formalism)
• Adjoint equations

What?
• Different equations
• Related linear and nonlinear equations
• A posteriori error estimates

• Discretization error
• Nonlinear/linear iteration errors
Exercise 9. Do you have examples of such goal functionals or parts of the solution? A: Displacements, flow,
heat transfer, chemical reactions, adaptive choice of tolerances as stopping criteria in Ax = b or nonlinear
iterations such as Newton or fixed-point.

9.2.1 Goal functionals in differential equations and solution systems


Examples of goal functionals in differential equations:
• Given the operator equation A(y) = 0 with A(y) := y′(t) − ay(t). Goal: end-time value y(T) of, for
instance, a population

• Mean value distribution

    ∫_{t1}^{t2} y(t) dt

• L2 norms or point values in −u′′(x) = f. For instance

    ∫_{x1}^{x2} u(x) dx

or

    ( ∫_{x1}^{x2} u(x)² dx )^{1/2}

or point values u(xl) or

    max_{x∈Ω} |u(x)|

• Or, in higher dimensions, for −∆u = f as well

    ∫_Γ ∂n u ds

• Flow equations (Navier-Stokes):

    ∫ ∇ × v dx

or stresses

    ∫_Γ σ · n ds

• Multiphysics: possibly multiple goal functionals


• Solvers (Newton): Solve f(x) = 0. Then, we need |f(xk)| < TOL. Question: how to choose TOL?
Depending on the iteration k, the tolerance TOL can be chosen adaptively: when x1 is far away from the
solution, TOL may still be large; when xk is close to the solution, TOL should be small.

9.2.2 Abstract formulation of all these goal functionals


Definition 9.1 (Goal functional). A goal functional (quantity of physical interest; QoI) characterizes a part
of a (or the entire) mathematical solution to a given mathematical problem.
Definition 9.2 (Goal functional; math. definition). Let V be a function space. A goal functional is defined
as a mapping
J : V → R.
The result is simply a number in R. Often, the error is measured in terms of this goal functional and, here, the
result characterizes, e.g., the error in percent.
Example 9.3. We present some examples:
1. J(y) = 0.01 means that the absolute error of y, measured in terms of J, is 1 percent.
2. J(y) := y(T) is the end-time value; J(y) − J(ỹ) is then the end-time error.
3. J(y) := ∫_{t1}^{t2} y(t) dt.

9.2.3 Error, estimates, accuracy


Either way, with goal functionals or with usual norms, we are interested in the distance between numerical
approximations and reference values. We briefly recall from Section 2.4. For instance, given Ax = b and
Ah xh = bh, the question is about the distance between x and xh:

    ||x − xh|| = ?   (Iteration error)

Or, for differential equations, given −∆u = f and −∆uh = f, what is

    ||u − uh|| = ?   (Discretization error)

To address these errors, we work with a priori or a posteriori error estimation; see Section 9.4.
Definition 9.4. To estimate the previous errors, we work with an estimator η. See further in Section 9.5.

9.3 Difference between norm-based error estimation and goal functionals


Goal-functional error estimation is more general than norm-based error estimation. The main difference is that
in norm-based error estimation we have

    ||u − ũ|| = 0 ⇔ u − ũ = 0,

and ||u − ũ|| ≥ 0. For goal functionals, it can happen that

    J(u) − J(ũ) = 0   for u − ũ ≠ 0,

and in general J(u) − J(ũ) ∈ R, but not necessarily positive.

9.4 Principles of error estimation
In general, we distinguish between a priori and a posteriori error estimation. In the first one, which we
already have discussed for finite differences and finite elements, the (discretization) error is estimated before we
start a numerical simulation/computation. Often, there are however constants c and (unknown) higher-order
derivatives |u|_{H^m} of the (unknown) solution:

    ||u − uh|| ≤ c h^m |u|_{H^m}.

Such estimates yield


• the asymptotic information of the error for h → 0.
• In particular, they provide the expected order of convergence, which can be used to verify programming
codes and the findings of numerical simulations for (simple?!) model problems.
However, a priori estimates do not contain useful information during a computation. Specifically, the deter-
mination of better, reliable, adaptive step sizes h is nearly impossible (except for very simple cases).
On the other hand, a posteriori error estimation uses information of the already computed uh during a
simulation. Here, asymptotic information of u is not required:

    ||u − uh|| ≤ c η(uh)

where η(uh ) is a computable quantity, because, as said, they work with the known discrete solution uh . Such
estimates cannot in general predict the asymptotic behavior, the reason for which a priori estimates remain
important, but they can be evaluated during a computation with two main advantages:
• We can control the error during a computation;
• We can possibly localize the error estimator in order to obtain local error information on each element
Ki ∈ Th . The elements with the highest error information can be decomposed into smaller elements in
order to reduce the error.

9.5 Efficiency, reliability, basic adaptive scheme


Following Becker/Rannacher [14, 15], Bangerth/Rannacher [11], and Rannacher [104], we derive a posteriori
error estimates not only for norms, but for a general differentiable functional of interest J : V → R. This allows
in particular to estimate technical quantities arising from applications (mechanical and civil engineering, fluid
dynamics, solid dynamics, etc.). Of course, norm-based error estimation, ||u − uh||, can be expressed in terms
of J(·); for hints see Section 9.18.
When designing a posteriori error estimates, we are in principle interested in the following relationship:

C1 η ≤ |J(u) − J(uh )| ≤ C2 η (168)

where η := η(uh ) is the error estimator and C1 , C2 are positive constants. Moreover, J(u) − J(uh ) is the
true error.
Definition 9.5 (Efficiency and reliability). A good estimator η := η(uh) should satisfy two bounds:
1. An error estimator η of the form (168) is called efficient when

C1 η ≤ |J(u) − J(uh )|,

which means that the error estimator is bounded by the error itself.
2. An error estimator η of the form (168) is called reliable when

|J(u) − J(uh )| ≤ C2 η.

Here, the true error is bounded by the estimator.

Ideally, it should hold in the asymptotic limit:

    η / ( J(u) − J(uh) ) → 1.
This yields the so-called effectivity index; see also Section 9.10.
Remark 9.6. For general goal functionals J(·), it is much simpler to derive reliable estimators than to prove
their efficiency.
Remark 9.7 (AFEM). Finite element frameworks working with a posteriori error estimators applied to local
mesh adaptivity, are called adaptive FEM.
Definition 9.8 (Basic algorithm of AFEM). The basic algorithm for AFEM is always the same:
1. Solve the PDE on the current mesh Th ;
2. Estimate the error via a posteriori error estimation to obtain η;
3. Mark the elements by localizing the error estimator;
4. Refine/coarsen the elements with the highest/lowest error contributions using a certain refinement
strategy.
A prototype situation on three meshes is displayed in Figure 32.

Figure 32: Meshes, say, on level 0, level 1 and 2. The colored mesh elements indicate high local errors and are
marked for refinement using bisection and hanging nodes.
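A schematic C++ sketch of this solve-estimate-mark-refine loop (our own illustration; the mesh type, the
indicator values, and the refinement are placeholders that a real implementation would take from an FEM
library such as deal.II):

#include <cstdio>
#include <vector>

// Placeholder mesh type; in a real code this wraps the FEM library's mesh.
struct Mesh { int n_elements = 4; };

// Steps 1 and 2: solve the PDE on the current mesh and localize the estimator.
double solve_and_estimate(const Mesh& mesh, std::vector<double>& eta_K) {
  const int n = mesh.n_elements;
  eta_K.assign(n, 1.0 / (static_cast<double>(n) * n)); // dummy local indicators
  double eta = 0.0;
  for (double e : eta_K) eta += e; // here eta = 1/n, mimicking error reduction
  return eta;
}

// Steps 3 and 4: mark the elements with the largest indicators and refine them.
void mark_and_refine(Mesh& mesh, const std::vector<double>& eta_K) {
  mesh.n_elements *= 2; // dummy refinement
}

int main() {
  Mesh mesh;
  const double TOL = 1e-3;
  for (int level = 0; level < 20; ++level) {
    std::vector<double> eta_K;
    const double eta = solve_and_estimate(mesh, eta_K);
    std::printf("level %2d: %6d elements, eta = %.3e\n",
                level, mesh.n_elements, eta);
    if (eta < TOL) break; // estimated error below the prescribed tolerance
    mark_and_refine(mesh, eta_K);
  }
  return 0;
}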

9.6 Goal-oriented error estimation using dual-weighted residuals (DWR)


9.6.1 Motivation
The goal of this section is to derive an error estimator that is based on duality arguments and can be used for
norm-based error estimation (residual-based) as well as estimating more general error functionals J(·).
The key idea is based on numerical optimization, but goes back to classical mechanics in the 17th century.
From our previous considerations, our goal is to reduce the error in the functional of interest with respect to
a given PDE:
Formulation 9.9 (Goal functional evaluation as a constrained optimization problem). Let u ∈ V be the
continuous (unknown) solution and uh ∈ Vh the numerical approximation. For a given PDE in weak form,
a(u, φ) = l(φ), respectively its discrete form, a(uh, φh) = l(φh), we aim to minimize the goal functional error
J(u) − J(uh). This can be written as a constrained optimization problem

min J(u) − J(uh ) s.t. a(u, φ) = l(φ), (169)

where J(·) and a(u, φ) can be linear or nonlinear, but need to be differentiable (in Banach spaces).

Remark 9.10. Of course for linear functionals, we can write
J(u) − J(uh ) = J(u − uh ) = J(e), e = u − uh .
Such minimization problems as in (169) can be treated with the help of the so-called Lagrangian L :
V × V → R in which the functional J(·) is of main interest subject to the constraint a(·, ·) − l(φ) (here the
PDE in variational form). Here, we deal with the primal variable u ∈ V and a Lagrange multiplier z ∈ V ,
which is the so-called adjoint variable and which is assigned to the constraint a(u, z) − l(z) = 0. We then
obtain
Definition 9.11. The Lagrangian L : V × V → R is defined as

L(u, z) = J(u) − J(uh ) − a(u, z) + l(z).
The Lagrange multiplier z measures the sensitivity of the PDE with respect to the given goal functional J(u).
Before we proceed, we briefly recapitulate an important property of the Lagrangian. Let V and Z be two
Banach spaces. A Lagrangian is a mapping L(v, q) : U × Y → R, where U × Y ⊂ V × Z.
Definition 9.12 (Saddle-point). A point (u, z) ∈ U × Y is a saddle point of L on U × Y when
∀y ∈ Y : L(u, y) ≤ L(u, z) ≤ L(v, z) ∀v ∈ U.
A saddle-point is also known as min-max point. Geometrically one may think of a horse saddle.
Remark 9.13. One can show that a saddle point yields under certain (strong) conditions a global minimum
of u of J(·) in a subspace of U .
The Lagrangian has the following important property to clear away the constraint (recall the constraint is
the PDE!):
Lemma 9.14. The problem

    inf_{u∈V, −a(u,z)+l(z)=0} ( J(u) − J(uh) )

is equivalent to

    inf_{u∈V, −a(u,z)+l(z)=0} ( J(u) − J(uh) ) = inf_{u∈V} sup_{z∈V} L(u, z).

Proof. If −a(u, z) + l(z) = 0 for all z ∈ V, we clearly have J(u) − J(uh) = L(u, z) for all z ∈ V. If
−a(u, z) + l(z) ≠ 0 for some z ∈ V, then sup_{z∈V} L(u, z) = +∞, which shows the result.
Remark 9.15. In the previous lemma, we prefer to work with the infimum (inf) since it is not clear a priori
whether the minimum (min) is attained.

9.6.2 Excursus: Variational principles and Lagrange multipliers in mechanics


In this excursus we shall give a mechanics-based motivation of Lagrange multipliers and constrained opti-
mization problems as they form the background of the current chapter. Specifically, we address a physical
interpretation of Lagrange multipliers. We have motivated the Poisson problem as the clothesline problem
in Section 6.1. The derivation based on first principles in physics, namely the principle of minimal potential
energy, is as follows. Let a clothesline be subject to gravitational forces g. We seek the solution curve u = u(x).
A final equilibrium is achieved for minimal potential energy. This condition can be expressed as:
Formulation 9.16 (Minimal potential energy - an unconstrained variational problem). Find u such that

    min J(u)

with

    J(u) = Epot = ∫₁² g u dm = ρ g ∫_{x1}^{x2} u √(1 + (u′)²) dx,

where g is the gravity, u the sought solution, dm mass elements, ρ the mass density. By variations δu we
obtain solutions u + δu and seek the optimal solution such that J(u) is minimal. Of course, the boundary
conditions u1 and u2 are not varied.

In the following, we formulate a constrained minimization problem. We ask that the length L of the
clothesline is fixed:

    K(u) = L = ∫_{x1}^{x2} √(1 + (u′)²) dx = const.

Formulation 9.17 (A constrained minimization problem). Find u such that

min J(u) s.t. K(u) = const.

The question is how to address Formulation 9.17 in practice. We explain the derivation in terms of a 1D
situation in order to provide a basic understanding as usually done in these lecture notes.
The task is:
min J(x, u(x)) s.t. K(x, u(x)) = 0. (170)
Let us assume for a moment that we can explicitly compute u = uK(x) from K(x, u(x)) = 0 such that

    K(x, uK(x)) = 0.

The minimal value of J(x, u) on the curve uK(x) can be computed as the minimum of J(x, uK(x)). For this
reason, we use the first derivative to compute the stationary point. With the help of the chain rule, we obtain:

    0 = (d/dx) J(x, uK) = J′x(x, uK) + J′u(x, uK) u′K(x).    (171)

From this equation, we obtain the solution x1. The solution u1 is then obtained from u1 = uK(x1).
Using Lagrange multipliers, we avoid the explicit construction of uK (x) because such an expression is only
easy to obtain for simple model problems. To this end, we introduce the variable z as a Lagrange multiplier.
We build the Lagrangian
L(x, u, z) = J(x, u) − zK(x, u)
and consider the problem:
min L(x, u, z) s.t. K(x, u) = 0.
Again, to find the optimal points, we differentiate w.r.t. the three solution variables x, u, z:

    J′x(x, u) − z K′x(x, u) = 0,
    J′u(x, u) − z K′u(x, u) = 0,    (172)
    K(x, u) = 0,

to obtain a first-order optimality system for determining x, u, z.


Proposition 9.18. The formulations (171) and (172) are equivalent.
Proof. We show (172) yields (171). We assume that we know u = uK (x), but we do not need the explicit
construction of that uK (x). Then, K(x, u) = 0 is equivalent to

K(x, u) = u − uK (x) = 0.

We differentiate w.r.t. x and u and insert the resulting expressions into the first two equations of (172):

    J′x(x, u) + z u′K(x) = 0,    (173)
    J′u(x, u) − z = 0.    (174)

The second condition is nothing else but

    z = J′u(x, u).

Here, we easily see that the Lagrange multiplier measures the sensitivity (i.e., the variation) of the solution
curve u with respect to the functional J(x, u). Inserting z = J′u(x, u) into (173) yields (171). The backward
direction can be shown in a similar fashion.

Remark 9.19. The previous derivation has been done for a 1D problem in which u(x) with x ∈ R is unknown.
The method can be extended to Rⁿ and to Banach spaces, the latter being addressed in Section 9.6.3.
Remark 9.20. We emphasize again that the use of Lagrange multipliers seems more complicated, but avoids
the explicit construction of uK(x), which can be cumbersome. This is the main reason for the big success of
adjoint methods, working with the adjoint variable z, in physics and numerical optimization. Again, here,
the Lagrangian L(x, u, z) is minimized and the solution u = u(x, z) contains a parameter z. This parameter
is automatically determined such that the constraint K(x, u) = 0 is satisfied. The price to pay is a higher
computational cost, since more equations need to be solved in (172) in comparison to (171).
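To make the system (172) concrete, here is a small self-contained C++ toy example of our own (not from the
lecture): minimize J(x, u) = x² + u² subject to K(x, u) = x + u − 1 = 0. The optimality system (172) is
linear in this case, and the computed multiplier z indeed equals J′u(x, u) = 2u, as shown in the proof above:

#include <cstdio>

int main() {
  // First-order optimality system (172) for J = x^2 + u^2, K = x + u - 1:
  //   J'_x - z K'_x = 2x - z         = 0
  //   J'_u - z K'_u = 2u - z         = 0
  //   K             = x + u - 1      = 0
  // Written as a 3x3 linear system M (x, u, z)^T = b:
  double M[3][3] = {{2, 0, -1}, {0, 2, -1}, {1, 1, 0}};
  double b[3] = {0, 0, 1};

  // Gaussian elimination without pivoting (fine for this small system).
  for (int k = 0; k < 3; ++k)
    for (int i = k + 1; i < 3; ++i) {
      const double f = M[i][k] / M[k][k];
      for (int j = k; j < 3; ++j) M[i][j] -= f * M[k][j];
      b[i] -= f * b[k];
    }
  double s[3];
  for (int i = 2; i >= 0; --i) {
    s[i] = b[i];
    for (int j = i + 1; j < 3; ++j) s[i] -= M[i][j] * s[j];
    s[i] /= M[i][i];
  }
  // Expected: x = u = 0.5 and z = J'_u = 2u = 1.
  std::printf("x = %g, u = %g, z = %g\n", s[0], s[1], s[2]);
  return 0;
}

The multiplier z = 1 measures the sensitivity of the optimal value with respect to the constraint, exactly as
derived above.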

9.6.3 First-order optimality system


As motivated in the previous subsections, we seek minimal points and therefore we look at the first-order
necessary conditions. Now we work in Banach spaces rather than R. Differentiation with respect to u ∈ V
and z ∈ V yields the optimality system:
Proposition 9.21 (Optimality system: Primal and adjoint problems). Differentiating (see Section 7.4) the
Lagrangian from Definition 9.11

L(u, z) = J(u) − J(uh ) − a(u, z) + l(z),

yields:

L0u (u, z)(φ) = Ju0 (u)(φ) − a0u (u, z)(φ) ∀φ ∈ V,


L0z (u, z)(ψ) = −a0z (u, z)(ψ) + lz0 (ψ) ∀ψ ∈ V.

The first equation is called the adjoint problem and the second equation is nothing else than our PDE, the
so-called primal problem. We also observe that the trial and test functions switch in a natural way in the
adjoint problem.
Proof. Trivial with the methods presented in Section 7.4.
Remark 9.22 (on the notation). We abuse a bit the standard notation for semi-linear forms. Usually, all
linear and nonlinear arguments are distinguished, such that a′u(u, z)(φ) would read a′u(u)(z, φ) because u is
nonlinear and z and φ are linear. We use in these notes, however, a′u(u, z)(φ) in order to emphasize that u and
z are the main variables.
Corollary 9.23 (Primal and adjoint problems in the linear case). In the linear case, we obtain from the
general formulation:

L(φ, z) = J(φ) − a(φ, z)


L(u, ψ) = −a(u, ψ) + l(ψ).

Definition 9.24. When we discretize both problems using for example a finite element scheme (later more),
we define the residuals for uh ∈ Vh and zh ∈ Vh . The primal and adjoint residuals read, respectively:

ρ(uh , ·) = −a0z (uh , z)(·) + lz0 (·)


ρ∗ (zh , ·) = Ju0 (u)(·) − a0u (u, zh )(·).

In the linear case:

ρ(uh , ·) = −a(uh , ·) + l(·)


ρ∗ (zh , ·) = J(·) − a(·, zh ).

Proposition 9.25 (First-order optimality system). To determine the optimal points (u, z) ∈ V × V , we set
the first-order optimality conditions to zero:

0 = Ju0 (u)(φ) − a0u (u)(z, φ)


0 = −a0z (u)(z, ψ) + lz0 (ψ).

Here, we easily observe that the primal equation is nothing else than the bilinear form a(·, ·) we have worked
with so far. The adjoint problem is new (well known in optimization though) and yields sensitivity measures
z of the primal solution u with respect to the goal functional J(·).

Example 9.26 (Poisson problem). Let a(u, φ) = (∇u, ∇φ). Then: a(u, z) = (∇u, ∇z). Then,

a0u (u, z)(φ) = (∇φ, ∇z),

and
a0z (u, z)(ψ) = (∇u, ∇ψ).
These derivatives are computed with the help of directional derivatives (Gâteaux derivatives) in Banach spaces.
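To verify, for instance, the first formula in Example 9.26 by hand (an elementary check we add for
illustration), we compute the directional (Gâteaux) derivative in the direction φ:

    a′u(u, z)(φ) = lim_{ε→0} (1/ε) [ a(u + εφ, z) − a(u, z) ]
                 = lim_{ε→0} (1/ε) [ (∇u + ε∇φ, ∇z) − (∇u, ∇z) ]
                 = (∇φ, ∇z),

since the form is linear in its first argument; the derivative with respect to z follows in the same way.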

9.6.4 Error estimation and adjoints in Rn


In this section, we follow closely [11][Section 1.4]. For A, Ah ∈ Rn×n and vectors b, bh ∈ Rn, we consider
x, xh ∈ Rn, which are obtained from solving the continuous and approximated problems

Ax = b, Ah xh = bh .

As usual in these notes, h indicates the approximation (discretization) parameter. We recall from Numerik
1/2:

e := x − xh (approximation error)
τ := Ah x − bh (truncation error)
ρ := b − Axh (residual)

A priori error analysis is based on the truncation error:

    Ah e = Ah x − Ah xh = Ah x − bh = τ,

yielding

    e = Ah⁻¹ τ.

An error bound is obtained as follows:

    ||e|| ≤ cS,h ||τ||,   cS,h := ||Ah⁻¹||,

where cS,h is the discrete stability constant. In general, τ is difficult or impossible to compute since the
(unknown) exact solution x enters. On the other hand, a posteriori error analysis is based on the residual as
follows:

    Ae = Ax − Axh = b − Axh = ρ,

from which we obtain

    e = A⁻¹ ρ

and thus

    ||e|| ≤ cS ||ρ||,   cS := ||A⁻¹||.

The stability constant cS is of course not known exactly, but may be available from analytical arguments or
regularity theory.

9.6.5 Duality-based error estimation - linear problems


Let j ∈ Rn be given, defining the linear goal functional J(x) := (x, j). We have

    J(x) − J(xh) = J(e) = (e, j).

The adjoint solution z is determined by solving the adjoint problem:

    A∗ z = j.

Then, we obtain

    J(e) = (e, j) = (e, A∗z) = (Ae, z) = (ρ, z),

which is a representation of the error e, measured in the goal functional J, in terms of the residual ρ and the
adjoint solution z. Then:

    |J(e)| = |(e, j)| = |(ρ, z)| ≤ Σ_{i=1}^{n} |ρi| |zi|.

The local residual is denoted by ρi and the weights zi measure the influence of ρi on the target J(u).
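A small self-contained C++ demonstration of this identity (our own toy numbers): we choose A, b, a
perturbed pair Ah, bh and a goal vector j, solve the primal, perturbed, and adjoint systems, and observe that
J(e) = (ρ, z) holds up to round-off:

#include <cstdio>

// Solve a 2x2 system M s = r by Cramer's rule.
void solve2(const double M[2][2], const double r[2], double s[2]) {
  const double det = M[0][0] * M[1][1] - M[0][1] * M[1][0];
  s[0] = (r[0] * M[1][1] - M[0][1] * r[1]) / det;
  s[1] = (M[0][0] * r[1] - r[0] * M[1][0]) / det;
}

int main() {
  const double A[2][2] = {{4.0, 1.0}, {1.0, 3.0}};
  const double b[2] = {1.0, 2.0};
  // Perturbed ("discretized") problem A_h x_h = b_h.
  const double Ah[2][2] = {{4.0, 1.0}, {1.0, 3.1}};
  const double bh[2] = {1.0, 1.9};
  const double j[2] = {1.0, 0.0}; // goal functional J(x) = (x, j) = x_1

  double x[2], xh[2], z[2];
  solve2(A, b, x);
  solve2(Ah, bh, xh);
  // Adjoint problem A^T z = j (A is symmetric here, so A^T = A).
  const double AT[2][2] = {{A[0][0], A[1][0]}, {A[0][1], A[1][1]}};
  solve2(AT, j, z);

  const double Je = j[0] * (x[0] - xh[0]) + j[1] * (x[1] - xh[1]);
  const double rho[2] = {b[0] - (A[0][0] * xh[0] + A[0][1] * xh[1]),
                         b[1] - (A[1][0] * xh[0] + A[1][1] * xh[1])};
  const double rho_z = rho[0] * z[0] + rho[1] * z[1];
  std::printf("J(e) = %.12f, (rho, z) = %.12f\n", Je, rho_z);
  return 0;
}

Both printed numbers coincide, since Ae = b − Axh = ρ holds exactly for linear problems.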

9.6.6 Duality-based error estimation - nonlinear problems


We assume differentiable mappings A and Ah : Rn → Rn and the vectors b, bh ∈ Rn . The problems are given
by
A(x) = b, Ah (xh ) = bh .
Using the fundamental theorem of calculus, we have the following relation:

    A(x) − A(xh) = ∫₀¹ A′(xh + se) e ds,   e = x − xh.

Using the scalar product, we then have

    (A(x) − A(xh), y) = ∫₀¹ (A′(xh + se) e, y) ds = (Be, y),

where

    B := B(x, xh) := ∫₀¹ A′(xh + se) ds.
As in the linear case, we can define the adjoint problem

B∗z = j

yielding the error identity

J(e) = (e, j) = (e, B ∗ z) = (Be, z) = (A(x) − A(xh ), z) = (ρ, z)

and again

    |J(e)| ≤ Σ_{i=1}^{n} |ρi| |zi|.

In the nonlinear case, a crucial difficulty is that the unknown error e = x − xh is included in the matrix B.
One very simple possibility is the approximation

    B ≈ B̃ = ∫₀¹ A′(xh) ds = A′(xh).

9.7 Duality-based error estimation for ODEs - adaptivity in time


See [11][Section 2].

9.8 Spatial error estimation for linear problems and linear goal functionals (Poisson)
We explain our developments in terms of the linear Poisson problem and linear goal functionals.

9.8.1 Primal problem


We work with
Formulation 9.27 (Primal problem). Let f ∈ L2(Ω), and assume that the problem and domain are
sufficiently regular such that the trace theorem (see Theorem 8.82 and also the literature, e.g., [145]) holds
true, i.e., g ∈ H^{−1/2}(ΓN), and finally uD ∈ H^{1/2}(∂Ω). Find u ∈ {uD + V}:

    a(u, φ) = l(φ)   ∀φ ∈ V,

where

    a(u, φ) = (α∇u, ∇φ)

and

    l(φ) := ∫_Ω f φ dx + ∫_{ΓN} g φ ds,

and the diffusion coefficient α := α(x) ∈ L∞(Ω). In this setting, ∫_{ΓN} g φ ds has to be understood as a
duality product, as for instance in [65]. If g ∈ L2(ΓN), then it coincides with the integral.

9.8.2 Goal functional: some possible examples


As previously motivated, the aim is to compute a certain quantity of interest J(u) with a desired accuracy at
low computational cost.
Example 9.28. Examples of goal functionals are mean values, line integrals, or point values:

    J(u) = ∫_Ω u dx,   J(u) = ∫_Γ u ds,   J(u) = ∫_Γ ∂n u ds,   J(u) = u(x0, y0, z0).

The first goal functional is simply the mean value of the solution. The third and fourth goal functionals are a
priori not well defined. In case of the third functional, we know that ∇u ∈ [L2(Ω)]^d. Using the trace theorem
(see Theorem 8.82), we can deduce that the trace in normal direction belongs to H^{−1/2}(∂Ω). This, however,
leads to the problem that the third functional is not always well defined. Concerning the fourth functional, we
have previously shown in Section 7.1 that H1 functions in dimension d > 1 are in general not continuous, and
hence the point evaluation is not well defined. If the domain and boundaries are sufficiently regular in 2D, the
resulting solution is however H2-regular and thanks to Sobolev embedding theorems (e.g., [34, 50]) also
continuous.
Example 9.29. Let J(u) = ∫_Ω u dx. Then the Fréchet derivative is given by

    J(φ) = J′u(u)(φ) = ∫_Ω φ dx

and, of course, J′λ(u)(ψ) ≡ 0.


The above goal functionals are however computed with a numerical method leading to a discrete version
J(uh ). Thus the key goal is to control the error J(e) := J(u) − J(uh ) in terms of local residuals, which are
computable on each mesh cell Ki ∈ Th .

9.8.3 Adjoint problem


According to the previous sections, we have

Proposition 9.30 (Adjoint problem). Based on the optimality system, here Corollary 9.23, we seek the adjoint
variable z ∈ V :
a(φ, z) = J(φ) ∀φ ∈ V. (175)
Specifically, the adjoint bilinear form for the Poisson problem is given by

a(φ, z) = (α∇φ, ∇z).

For symmetric problems, the adjoint bilinear form a(·, ·) is the same as the original one, but differs for non-
symmetric problems like transport for example.
Proof. Apply the first-order necessary condition.
Remark 9.31. Existence and uniqueness of this adjoint solution follows by standard arguments provided
sufficient regularity of the goal functional and the domain are given. The regularity of z ∈ V depends on the
regularity of the functional J. For J ∈ H^{−1}(Ω) it holds z ∈ H1(Ω). Given a more regular functional like
the L2-error functional J(φ) = ||e||⁻¹ (e, φ) (where e := u − uh) with J ∈ L2(Ω)∗ (denoting the dual space), it
holds z ∈ H2(Ω) on suitable domains (convex polygonal or smooth boundary with C2-parameterization).

9.8.4 Derivation of an error identity


We now work with the techniques adopted in Section 8.13. Inserting the special test function ψ := u − uh ∈ V
(recall that uh ∈ Vh ⊂ V) into (175) yields:

    J(u − uh) = a(u − uh, z),

and therefore we have now a representation for the error in the goal functional.
Next, we use the Galerkin orthogonality a(u − uh, ψh) = 0 for all ψh ∈ Vh, and we obtain (compare to
Aubin-Nitsche in Section 8.13.3):

    a(u − uh, z) = a(u − uh, z) − a(u − uh, ψh) = a(u − uh, z − ψh) = J(u − uh).    (176)

The previous step allows us to choose ψh in such a way that z−ψh can be bounded using interpolation estimates.
Indeed, since ψh is an arbitrary discrete test function, we can for example use a projection ψh := ih z ∈ Vh in
(176), which is for instance the nodal interpolation.
Definition 9.32 (Error identity). Choosing ψh := ih z ∈ Vh in (176) yields:

J(u − uh ) = a(u − uh , z − ih z). (177)

Thus, the error in the functional J(u − uh ) can be expressed in terms of a residual, that is weighted by adjoint
sensitivity information z − ih z.
However, since z ∈ V is an unknown itself, we cannot yet simply evaluate the error identity because z is
only known analytically in very special cases. In general, z is evaluated with the help of a finite element
approximation yielding zh ∈ Vh .
However, this yields another difficulty since we inserted the interpolation ih : V → Vh for z ∈ V. When we
now approximate z by zh and use a linear or bilinear approximation, r = 1, i.e., zh ∈ Vh^(1), then the
interpolation ih does nothing: in fact, we interpolate a linear/bilinear function zh with a linear/bilinear
interpolation ih zh, which clearly gives

    zh − ih zh ≡ 0.

For this reason, we need to approximate zh with a scheme that results in a higher-order representation, here
at least of quadratic order: zh ∈ Vh^(2).

9.9 Nonlinear problems and nonlinear goal functionals
In the nonlinear case, the PDE may be nonlinear (e.g., p-Laplace, nonlinear elasticity, Navier-Stokes, and
Section 4.6) and also the goal functional may be nonlinear, e.g.,

    J(u) = ∫_Ω u² dx.
These nonlinear problems yield a semi-linear form a(u)(φ) (not bilinear any more), which is nonlinear in the
first variable u and linear in the test function φ. Both the semi-linear form and the goal functional J(·) are
assumed to be (Fréchet) differentiable.
We start from Proposition 9.21. We assume that both problems are discretized using finite elements (for
further hints on the adjoint solution, see Section 9.9.1) and that all problems have unique solutions. We define:
Definition 9.33 (Primal and adjoint residuals). We define the primal and adjoint residuals, respectively:

    ρ(uh)(φ) = l(φ) − a(uh)(φ)   ∀φ ∈ V,
    ρ∗(zh)(φ) = J′u(uh)(φ) − a′u(uh)(φ, zh)   ∀φ ∈ V.

It holds:
Theorem 9.34 ([15]). For the Galerkin approximation of the first-order necessary system Definition 9.21, we
have the combined a posteriori error representation:
1 1
J(u) − J(uh ) = η = min ρ(uh )(z − φh ) + min ρ∗ (zh )(u − φh ) + R.
2 φh ∈Vh 2 φh ∈Vh

The remainder term is of third order in J(·) and second order in a(·)(·). Thus, for linear a(·)(·) and quadratic
J(·) the remainder term R vanishes. In practice the remainder term is neglected anyway and assumed to be
small. However, in general this assumption should be justified for each nonlinear problem.
Proof. We refer the reader to [15].
Corollary 9.35 (Linear problems). In the case of linear problems, the two residuals coincide. Then, it is
sufficient to evaluate only the primal residual:

    J(u) − J(uh) = J(u − uh) = min_{φh∈Vh} ρ(uh)(z − φh)                                  (Primal)
                             = min_{φh∈Vh} ρ∗(zh)(u − φh)                                 (Adjoint)
                             = ½ min_{φh∈Vh} ρ(uh)(z − φh) + ½ min_{φh∈Vh} ρ∗(zh)(u − φh) (Combined)

Indeed, the errors are exactly the same for linear goal functionals using the primal, adjoint, or combined error
estimator. The only difference is the resulting meshes. Several examples are shown in Example 2 in [113].
Proof. It holds for the adjoint problem in the linear case:

    J(φ) = a(φ, z).

Then:

    J(u − uh) = J(e) = a(u − uh, z) = l(z) − a(uh, z) = ρ(uh, z),

and, on the other hand,

    a(e, z) = a(e, z − zh) = a(e, e∗) = a(u − uh, e∗) = a(u, e∗)
            = a(u, z) − a(u, zh) = J(u) − a(u, zh) = ρ∗(zh, u),

where e∗ := z − zh, and where a(e, z) = a(e, z − zh) and a(u − uh, e∗) = a(u, e∗) hold true thanks to Galerkin
orthogonality.
Definition 9.36 (A formal procedure to derive the adjoint problem for nonlinear equations). We summarize
the previous developments. Based on Proposition 9.21, we set up the adjoint problem as follows:
1. Given a PDE plus boundary conditions, and in time-dependent cases initial conditions

2. Derive weak form a(u)(φ)


3. Differentiate w.r.t. u ∈ V such that
a0u (u)(δu, φ)
with the direction δu ∈ V .
4. Switch the trial function δu ∈ V and the test function φ ∈ V such that:

a0u (u)(φ, δu)

5. Replace δu ∈ V by the adjoint solution z ∈ V :

a0u (u)(φ, z)

6. This procedure also shows that the primal variable u ∈ V enters the adjoint problem in the nonlinear case.
However, u ∈ V is now given data and z ∈ V is the sought unknown. This also means that in nonlinear
problems the primal solution needs to be stored in order to be accessed when solving the adjoint problem.
7. Construct the error identity.

8. Construct the error estimator with u ≈ uh^(2) and z ≈ zh^(2).


9. Possible localization (e.g., spatial refinement) of error estimator
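As a worked illustration of steps 2-5 (our own example, not from the text), consider a semi-linear form with a
cubic reaction term,

    a(u)(φ) = (∇u, ∇φ) + (u³, φ) − (f, φ).

Differentiation with respect to u in the direction δu gives

    a′u(u)(δu, φ) = (∇δu, ∇φ) + (3u² δu, φ).

Switching trial and test functions and replacing δu by z, the adjoint problem reads: find z ∈ V such that

    (∇φ, ∇z) + (3u² φ, z) = J′u(u)(φ)   ∀φ ∈ V,

in which the primal solution u enters as a coefficient.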

9.9.1 Approximation of the adjoint solution for the primal estimator ρ(uh )(·)
In order to obtain a computable error representation, z ∈ V is approximated through a finite element function
zh ∈ Vh , that is obtained from solving a discrete adjoint problem:

Formulation 9.37 (Discrete adjoint problem for linear problems).

a(ψh , zh ) = J(ψh ) ∀ψh ∈ Vh . (178)

Then the primal part of the error estimator reads:

    a(u − uh, zh − ih zh) = ρ(uh)(zh − ih zh) ≈ J(u) − J(uh).    (179)

The difficulty is that if we compute the adjoint problem with the same polynomial degree as the primal problem,
then zh − ih zh ≡ 0, and thus the whole error estimator defined in (179) would vanish, i.e., it would suggest
J(u) − J(uh) ≡ 0. This is clear from a theoretical standpoint and can be easily verified in numerical
computations.
In fact, normally an interpolation operator ih interpolates from the infinite-dimensional space V into a
finite-dimensional space Vh, or from higher-order spaces into low-order spaces.
Thus:

    ih : V → Vh^(1)          so far ...
    ih : Vh^(1) → Vh^(1)     first choice, but trivial result, useless
    ih : Vh^(2) → Vh^(1)     useful choice with nontrivial result

From these considerations it is also clear that

    ih : Vh^(1) → Vh^(2)

would be even worse (this would arise if we approximated the primal solution with uh ∈ Vh^(2) and zh ∈ Vh^(1)).

9.9.2 Summary and alternatives for computing the adjoint


In summary:
• the adjoint solution needs to be computed either with a global higher-order approximation (using a
higher order finite element of degree r + 1 when the primal problem is approximated with degree r),
• or a solution on a finer mesh,
• or local higher-order approximations using a patch-wise higher-order interpolation [11, 15, 113].
Clearly, the last possibility is the cheapest from the computational cost point of view, but needs some efforts to
be implemented. For the convenience of the reader we tacitly work with a global higher-order approximation
in the rest of this chapter.
We finally end up with the primal error estimator:
Definition 9.38 (Primal error estimator). The primal error estimator is given by:

    J(u) − J(uh) =: η ≈ a(u − uh, zh^(r+1) − ih zh^(r+1)) = ρ(uh)(zh^(r+1) − ih zh^(r+1)).

9.9.3 Approximation of the primal solution for the adjoint estimator ρ∗(zh)(·)
To evaluate the adjoint estimator ρ∗(zh)(·), we need to construct

    u − ih u

with ih : V → Vh for u ∈ V and uh ∈ Vh. Here, we encounter the opposite problem to the previous section:
we need to solve the primal problem with higher accuracy, using polynomials of degree r + 1, in order to
construct a useful interpolation, yielding

    u − ih u ≠ 0 a.e.

Then:

    J(u) − J(uh) ≈ η := ρ∗(zh)(uh^(r+1) − ih uh^(r+1)).

9.10 Measuring the quality of the error estimator η
As a quality measure of how well the estimator approximates the true error, we use the effectivity index Ieff:

Definition 9.39 (Effectivity index). Let η be an error estimator and J(u) − J(uh) the true error. The
effectivity index is defined as:

    Ieff := Ieff(uh, zh) = η / ( J(u) − J(uh) ).    (180)

Problems with a good Ieff satisfy asymptotically Ieff → 1 for h → 0. We say that
• for Ieff > 1, we have an overestimation of the error,
• for Ieff < 1, we have an underestimation of the error.

9.11 Rough structure of adjoint-based error estimation


We obtain as basic scheme:
1. Compute primal solution ũ

2. Compute adjoint solution z̃. For linear problems, both solutions are independent. For nonlinear PDEs,
the primal solution ũ enters into the adjoint problem. The latter follows naturally from the Lagrangian
formulation.
3. Having just computed ũ and z̃ construct error estimator η(ũ, z̃)

4. Evaluate η(ũ, z̃) and compare with true error e. The latter of course can only be constructed as long as
we deal with prototype problems for which we know the exact solution u.
5. Evaluate the quality of η by observing the effectivity index Ief f .
Using finer meshes (in order to obtain a higher approximation quality) for computing ũ and z̃, we then
repeat all previous steps and compare the different Ief f . The higher the approximation quality, the better
(hopefully!!!) Ief f becomes. This procedure can be carried out for both uniform mesh refinement and adaptive
mesh refinement. The latter is going to be explained in the following sections.

9.12 Localization techniques


In the previous sections, we have derived an error approximation η with the help of duality arguments that
lives on the entire domain Ω. In order to use the error estimator for mesh refinement we need to localize the
error estimator η to single mesh elements Kj ∈ Th or degrees of freedom (DoF) i. We present two techniques:
• A classical procedure using integration by parts resulting in an element-based indicator ηK ;

• A more recent way by employing a partition-of-unity (PU) yielding PU-DoF-based indicators ηi .


We recall that the primal estimator starts with z ∈ V from:

ρ(uh )(z − ih z) = a(u − uh , z − ih z) = a(u, z − ih z) − a(uh , z − ih z) = l(z − ih z) − a(uh , z − ih z),

and the adjoint estimator


ρ∗ (zh )(u − ih u) = J(u − ih u) − a(u − ih u, z).
According to Theorem 9.34, we have for linear problems
    J(u) − J(uh) = ½ ρ + ½ ρ∗ + R
                 = ½ ( l(z − ih z) − a(uh, z − ih z) ) + ½ ( J′(u − ih u) − a′(u − ih u, zh) ) + R,

where R is the remainder term. Furthermore, on the discrete level, we have (for linear problems):

    J(u) − J(uh) ≈ η := ½ ( l(zh − ih zh) − a(uh, zh − ih zh) ) + ½ ( J(uh − ih uh) − a(uh − ih uh, zh) ).  (181)

Here, the discrete solutions zh in the first part and uh in the second part have to be computed in terms of
higher-order approximations of degree r + 1.
In the following, we localize these error estimators on a single element Ki ∈ Th. Here, the influence of
neighboring elements Kj, j ≠ i, is important [32]. In order to achieve such an influence, we consider the error
estimator on each cell and then either integrate back into the strong form (the classical way) or keep the weak
form and introduce a partition-of-unity (a more recent way). Traditionally, there is also another way with the
weak form, proposed in [23], which has been analyzed theoretically in [113], but which we do not follow
further in these notes.
In the following both localization techniques we present, we start from:

J(u − uh ) = l(zh − ih zh ) − a(uh , zh − ih zh ) (182)

For the Poisson problem, we can specify as follows:

J(u − uh ) = (f, zh − ih zh ) − (∇uh , ∇(zh − ih zh )) (183)

The next step will be to localize both terms either on a cell K ∈ Th (Section 9.12.1) or on a degree of freedom
(Section 9.12.3).

9.12.1 The classical way of error localization of the primal estimator for linear problems
In the classical way, the error identity (177) is treated with integration by parts on every mesh element K ∈ Th ,
which yields:
Proposition 9.40. It holds:

    J(u − uh) ≈ η = Σ_{K∈Th} [ (f + ∇·(α∇uh), zh − ih zh)_K + (α∂n uh, zh − ih zh)_{∂K} ].    (184)

Proof. Idea: take (181), insert the specific form of the PDE, restrict to an element K, and integrate by parts.
Now in detail: let α = 1 for simplicity. We start from

    J(u − uh) = (f, zh − ih zh) − (∇uh, ∇(zh − ih zh))

and obtain further

    J(u − uh) = Σ_K [ (f, zh − ih zh)_K − (∇uh, ∇(zh − ih zh))_K ]
              = Σ_K [ (f, zh − ih zh)_K + (∆uh, zh − ih zh)_K − (∂n uh, zh − ih zh)_{∂K} ]
              = Σ_K [ (f + ∆uh, zh − ih zh)_K − ½ ([∂n uh]_K, zh − ih zh)_{∂K} ]

with [∂n uh ] := [∂n uh ]K = ∂n uh |K + ∂n0 uh |K 0 where K 0 is a neighbor cell of K. On the outer (Dirichlet)
boundary we set [∂n uh ]∂Ω = 2∂n uh .

With the notation from the proof, we can define local residuals:

RT := f + ∆uh ,
r∂K := −[∂n uh ]

Here, RT are element residuals that measure the 'correctness' of the PDE. The r∂K are so-called face
residuals that compute the jumps over element faces and consequently measure the smoothness of the discrete
solution uh.
Further estimates are obtained as follows:
    |J(u − uh)| ≤ Σ_{K∈Th} | ... |,

where we assume zh − φh = 0 on ∂Ω. With Cauchy-Schwarz we further obtain:

    |J(u − uh)| ≤ η = Σ_K [ ||f + ∆uh||_K ||zh − ih zh||_K + ½ ||[∂n uh]||_{∂K\∂Ω} ||zh − ih zh||_{∂K} ]
                   = Σ_K [ ρK(uh) ωK(zh) + ρ∂K(uh) ω∂K(zh) ].

In summary we have shown:


Proposition 9.41. We have:

    |J(u) − J(uh)| ≤ η := Σ_{K∈Th} ρK ωK,    (185)

with

    ρK := ||f + ∇·(α∇uh)||_K + ½ hK^{−1/2} ||[α∂n uh]||_{∂K},    (186)
    ωK := ||z − ih z||_K + ½ hK^{1/2} ||z − ih z||_{∂K},    (187)

where by [α∂n uh ] we denote the jump of the uh derivative in normal direction. The residual part ρK only
contains the discrete solution uh and the problem data. On Dirichlet boundaries ΓD , we set [α∂n uh ] = 0 and
on the Neumann part we evaluate α∂n uh = gN . Of course, we implicitly assume here that gN ∈ L2 (ΓN ) such
that these terms are well-defined.
Remark 9.42. In practice, this primal error estimator needs to be evaluated in the dual space. Here, we
proceed as follows:
• Prolongate the primal solution uh into the dual space;
• Next, we compute the interpolation ih zh^(r+1) ∈ Qr w.r.t. the primal space;
• Then, we compute zh^(r+1) − ih zh^(r+1) (here, ih zh^(r+1) is prolongated to Qr+1 in order to compute the
difference);
• Evaluate the duality product ⟨·, ·⟩ and the face terms.
Remark 9.43. When Vh = Vh^(1), then ∇·(∇uh) ≡ 0 element-wise. This also demonstrates heuristically that
the face terms are important.

9.12.2 The classical way for the combined estimator


The combined estimator reads:
Proposition 9.44. It holds:

    J(u) − J(uh) ≈ Σ_{K∈Th} ( ½ ηK + ½ ηK∗ )

with

    ηK  = ⟨f + ∇·(α∇uh), zh − ih zh⟩_K + ∫_{∂K} α∂n uh · (zh − ih zh) ds,
    ηK∗ = J(uh − ih uh) − ( ∫_K ... + ∫_{∂K} ... ).

9.12.3 A variational primal-based error estimator with PU localization
An alternative way is a DoF-based estimator, which is the first difference to before. The second difference to
the classical approach is that we continue to work in the variational form and do not integrate back into the
strong form. Such an estimator has been developed and analyzed in [113]. This idea adopts the approach
proposed in [23] (given in terms of variational residuals) with a very simple structure, which makes it
particularly interesting for coupled and nonlinear PDE systems (see further comments below).
Variational localizations are useful for nonlinear and coupled problems as we do not need to derive the strong
form.
To this end we need to introduce a partition-of-unity (PU), which can be realized in terms of another finite
element function. The procedure is therefore easy to realize in existing codes.
Definition 9.45 (PU - partition-of-unity). The PU is given by:

    VPU := {ψ1, . . . , ψM}

with dim(VPU) = M. The PU has the property

    Σ_{i=1}^{M} ψi ≡ 1.

Remark 9.46. The PU can simply be chosen as the lowest-order finite element space with linear or bilinear
elements, i.e.,

    VPU = Vh^(1).
To understand the idea, we recall that in the classical error estimator the face terms are essential since
they gather information from neighboring cells. When we work with the variational form, no integration by
parts (fortunately!) is necessary. Therefore, the information of the neighboring cells is missing. Using the
PU, we touch different cells per PU-node and consequently we gather now information from neighboring cells.
Therefore, the PU serves as localization technique.
In the following, we now describe how the PU enters into the global error identity (177):
Proposition 9.47 (Primal error estimator). For the finite element approximation of Formulation 9.27, we
have the a posteriori error estimate:

    |J(u) − J(uh)| ≤ η := Σ_{i=1}^{M} |ηi|,    (188)

where

    ηi = a(u − uh, (z − ih z)ψi) = l((z − ih z)ψi) − a(uh, (z − ih z)ψi),

and more specifically for the Poisson problem:

    ηi = ⟨f, (z − ih z)ψi⟩ − (α∇uh, ∇((z − ih z)ψi)).    (189)
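The following self-contained 1D sketch (our own illustration, not from the original text) evaluates (189) with
α = 1 for −u′′ = f on (0, 1) and the goal J(u) = ∫ u dx. We exploit two 1D-specific simplifications: the P1
Galerkin solution coincides with the nodal interpolant of u (so the interpolant can stand in for uh), and the
adjoint z of this goal functional is known exactly, z = x(1 − x)/2, with ih z its nodal interpolant:

#include <cstdio>
#include <cmath>
#include <vector>

// PU-DWR sketch in 1D for -u'' = f on (0,1), u(0)=u(1)=0, goal J(u) = int u dx.
const double PI = std::acos(-1.0);
double u(double x)  { return std::sin(PI * x); }
double f(double x)  { return PI * PI * std::sin(PI * x); }
double z(double x)  { return 0.5 * x * (1.0 - x); } // exact adjoint of -z'' = 1
double dz(double x) { return 0.5 - x; }

int main() {
  const int n = 32;                    // number of cells
  const double h = 1.0 / n;
  std::vector<double> eta(n + 1, 0.0); // one indicator per PU node

  // 3-point Gauss rule on [0,1].
  const double q[3] = {0.5 - std::sqrt(0.15), 0.5, 0.5 + std::sqrt(0.15)};
  const double w[3] = {5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0};

  for (int k = 0; k < n; ++k) {        // loop over elements
    const double x0 = k * h, x1 = x0 + h;
    const double duh = (u(x1) - u(x0)) / h; // (u_h)' on this cell
    const double dzh = (z(x1) - z(x0)) / h; // (i_h z)' on this cell
    for (int iq = 0; iq < 3; ++iq) {
      const double x = x0 + q[iq] * h, wq = w[iq] * h;
      const double s = q[iq];                                    // local coordinate
      const double zz  = z(x) - (z(x0) * (1 - s) + z(x1) * s);   // z - i_h z
      const double dzz = dz(x) - dzh;
      // PU hat functions on this cell: psi_k = 1-s, psi_{k+1} = s.
      const double psi[2] = {1 - s, s}, dpsi[2] = {-1 / h, 1 / h};
      for (int l = 0; l < 2; ++l) {
        const double wgt  = zz * psi[l];                 // (z - i_h z) psi_i
        const double dwgt = dzz * psi[l] + zz * dpsi[l]; // its derivative
        eta[k + l] += wq * (f(x) * wgt - duh * dwgt);    // (189) in 1D
      }
    }
  }
  double est = 0.0, sum = 0.0;
  for (double e : eta) { est += std::fabs(e); sum += e; }

  // True error J(u) - J(u_h): J(u) = 2/pi; J(u_h) by trapezoid (exact for P1).
  double Juh = 0.0;
  for (int k = 0; k < n; ++k) Juh += 0.5 * h * (u(k * h) + u((k + 1) * h));
  std::printf("sum eta_i = %.3e, true J(u)-J(u_h) = %.3e, eta = %.3e\n",
              sum, 2.0 / PI - Juh, est);
  return 0;
}

Up to (negligible) quadrature error, the printed sum Σi ηi coincides with the true error J(u) − J(uh), while
Σi |ηi| is the localized estimator η from (188).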

9.12.4 PU localization for the combined estimator


Proposition 9.48 (The combined primal-dual error estimator). The combined estimator reads:

|J(u) − J(u_h)| ≤ η := Σ_{i=1}^{M} ( ½ |η_i| + ½ |η*_i| )

with

η_i = l((z_h − i_h z_h)ψ_i) − a(u_h, (z_h − i_h z_h)ψ_i),

η*_i = J'_u((u_h − i_h u_h)ψ_i) − a'_u((u_h − i_h u_h)ψ_i, z_h).


9.13 Comments to adjoint-based error estimation
Adjoint-based error estimation allows us to measure specific functionals of interest J(u) precisely at low computational cost. However, the price to pay is:
• We must compute a second solution z ∈ V.
• This second solution inside the primal estimator must be of higher order, which means more computational cost in comparison to the primal problem.
• For the full error estimator in total we need to compute four problems.
• From the theoretical point of view, we cannot prove convergence of the adaptive scheme for general goal functionals.
For nonlinear problems, one has to say that the primal problem is subject to nonlinear iterations, whereas the adjoint problem is always a linearized problem. Here, the additional cost of computing the adjoint problem may become less significant. Nonetheless, there is no free lunch.

9.14 Balancing discretization and iteration errors


9.14.1 Main theorem
Recalling [45, 107] for the Galerkin case (U = V), we derive an error representation in the following theorem:

Theorem 9.49. Let us assume that A ∈ C³(U, V) and J ∈ C³(U, ℝ). If u solves the primal problem and z solves the adjoint problem for u ∈ U, then it holds for arbitrary fixed ũ ∈ U and z̃ ∈ V:

J(u) − J(ũ) = ½ ρ(ũ)(z − z̃) + ½ ρ*(ũ, z̃)(u − ũ) + ρ(ũ)(z̃) + R^(3),   (190)

where

ρ(ũ)(·) := −A(ũ)(·),   (191)

ρ*(ũ, z̃)(·) := J′(ũ)(·) − A′(ũ)(·, z̃),   (192)

and the remainder term

R^(3) := ½ ∫₀¹ [ J′′′(ũ + se)(e, e, e)   (193)
         − A′′′(ũ + se)(e, e, e, z̃ + se*) − 3A′′(ũ + se)(e, e, e*) ] s(s − 1) ds,   (194)

with e = u − ũ and e* = z − z̃.


Proof. For completeness of the presentation we add the proof below, which is very similar to [107]. First we define x := (u, z) ∈ X := U × V and x̃ := (ũ, z̃) ∈ X. By assuming that A ∈ C³(U, V) and J ∈ C³(U, ℝ), we know that the Lagrange function

L(x̂) := J(û) − A(û)(ẑ)   ∀(û, ẑ) =: x̂ ∈ X,

is in C³(X, ℝ). With this, it holds

L(x) − L(x̃) = ∫₀¹ L′(x̃ + s(x − x̃))(x − x̃) ds.

Using the trapezoidal rule [107]

∫₀¹ f(s) ds = ½ ( f(0) + f(1) ) + ½ ∫₀¹ f′′(s) s(s − 1) ds
with f(s) := L′(x̃ + s(x − x̃))(x − x̃), we conclude

L(x) − L(x̃) = ½ ( L′(x)(x − x̃) + L′(x̃)(x − x̃) ) + R^(3).

From the definition of L we observe that

J(u) − J(ũ) = L(x) − L(x̃) + A(u)(z) − A(ũ)(z̃)   (with A(u)(z) = 0)
            = ½ ( L′(x)(x − x̃) + L′(x̃)(x − x̃) ) − A(ũ)(z̃) + R^(3).

It remains to show that ½ ( L′(x)(x − x̃) + L′(x̃)(x − x̃) ) = ½ ρ(ũ)(z − z̃) + ½ ρ*(ũ, z̃)(u − ũ). But this is true since

L′(x)(x − x̃) + L′(x̃)(x − x̃) = [ J′(u)(e) − A′(u)(e, z) ] − A(u)(e*) + [ J′(ũ)(e) − A′(ũ)(e, z̃) ] − A(ũ)(e*),

where J′(u)(e) − A′(u)(e, z) = 0 (adjoint problem), A(u)(e*) = 0 (primal problem), J′(ũ)(e) − A′(ũ)(e, z̃) = ρ*(ũ, z̃)(u − ũ), and −A(ũ)(e*) = ρ(ũ)(z − z̃).

9.14.2 Adaptive fixed-point


See [107][Section 3.2].

9.14.3 Adaptive Newton


See [107][Section 3.3].

9.15 Effectivity of goal-oriented error estimation


Since J(·) is only a functional, in general not a norm, we can easily have

|J(u) − J(u_h)| = 0

while u ≠ u_h. The objective in this section is to show that the error estimator η and J(u) − J(u_h) have a common upper bound. If this is the case, we call the error estimator effective; see [113]. To show effectivity of the classical error estimator, we refer the reader to [113][Section 4.1]. The PU localization is treated in Section 4.3 of the same paper.

9.16 Efficiency estimates using a saturation assumption


An improvement of the previous section was made in [46]. The price to pay is a saturation assumption that is required in the corresponding efficiency proofs.

9.17 Principles of multi-goal-oriented error estimation


In the previous section we recapitulated the DWR method for computing a single goal functional. Now we assume that we are given N linear functionals, N ∈ ℕ. Let J be defined as J := {J_0, . . . , J_{N−1}}. We can use (188) for each J_i ∈ J to compute the node-wise contributions of the error. But to do so we would have to solve N adjoint problems. Therefore we seek a method to avoid these computations. We follow the idea from [71, 72] of combining the functionals in J. We create a linear combination of the goal functionals to one functional J̃_c, resulting in

J̃_c(ψ) := Σ_{i=0}^{N−1} w_i J_i(ψ)   ∀ψ ∈ V,   (195)

where w_i ∈ ℝ. We call J̃_c the combined functional. Now we have to find out how to choose the weights w_i. One crucial aspect is the sign of w_i, because a wrong sign may lead to error cancellation. Furthermore, we are interested in having similar relative errors in our functional evaluations. One idea (as in [71, 72]) is to choose w_i as

w_i := sign(J_i(u) − J_i(u_h)) ω_i / |J_i(u_h)|,   (196)

where the ω_i are some self-chosen, but positive, weights. This choice avoids error cancellation and also makes the relative errors similar (if the weights ω_i are almost equal). The previous choices are reasonable because in the end we are interested in the error

J̃_c(u − u_h) = Σ_{i=0}^{N−1} [ sign(J_i(u − u_h)) ω_i / |J_i(u_h)| ] J_i(u − u_h).

Unfortunately, we do not know J_i(u). Hence, we have to find a way to get sign(J_i(u) − J_i(u_h)). To do so, we consider the adjoint to the adjoint problem (which is akin to saying a discrete error problem) [71, 72]:

Formulation 9.50 (Adjoint to the adjoint problem). Find the error function e such that

a(e, ψ) = ⟨R_{u_h}, ψ⟩   ∀ψ ∈ V,   (197)

where ⟨R_{u_h}, ψ⟩ := ⟨f, ψ⟩ − a(u_h, ψ).

By solving this problem, we obtain e = u − u_h and therefore we can compute J_i(u) − J_i(u_h). The adjoint to the adjoint problem provides information with respect to the error in the goal functionals J_i(u), but it does not yield the local error information which is required for mesh refinement. Thus the solution is to solve both the adjoint to the adjoint problem and another adjoint problem, leading to two additional problems. In summary, using this approach for multiple goal functionals, three problems need to be solved: primal, adjoint, adjoint to the adjoint.
For treating the adjoint to the adjoint problem, we again have to solve a PDE discretized by finite elements. For this problem we have to use a discrete subspace V_h^{(r+1)} ⊂ V which fulfills V_h ⊊ V_h^{(r+1)}, because otherwise ⟨R_{u_h}, ψ_h⟩ = 0 for all ψ_h ∈ V_h, which must be avoided. Moreover, we have to solve the primal problem to compute u_h and then compute e as the solution of the adjoint to the adjoint problem, so we have to solve two systems sequentially.
The dependence of the adjoint and adjoint to the adjoint problems on the primal solution slightly limits the possibility to further reduce the computational cost. Therefore, we suggest the following alternative in case we approximate the solution in our discrete subspace V_h^{(r+1)}, (r ≥ 1):
Proposition 9.51. Let a : V × V → ℝ be a bilinear form and f ∈ V*, where V* is the dual space of V, fulfilling the assumptions of Lax–Milgram (e.g., [51, 65]), and let u_h, u_h^{(2)}, e_h^{(2)} be the solutions of the following problems, respectively: Find u_h ∈ V_h and e_h^{(2)}, u_h^{(2)} ∈ V_h^{(2)} such that

a(u_h, ψ_h) = ⟨f, ψ_h⟩   ∀ψ_h ∈ V_h,   (198)

a(u_h^{(2)}, ψ_h^{(2)}) = ⟨f, ψ_h^{(2)}⟩   ∀ψ_h^{(2)} ∈ V_h^{(2)},   (199)

and

a(e_h^{(2)}, ψ_h^{(2)}) = ⟨R_{u_h}, ψ_h^{(2)}⟩   ∀ψ_h^{(2)} ∈ V_h^{(2)},   (200)

where V_h ⊂ V, V_h^{(2)} ⊂ V and ⟨R_{u_h}, ·⟩ is defined as in Formulation 9.50. Then there exists a projection P_{[V_h^{(2)}]} : V → V_h^{(2)} such that

e_h^{(2)} = u_h^{(2)} − P_{[V_h^{(2)}]} u_h.   (201)

Specifically, if V_h ⊆ V_h^{(2)}, it holds

e_h^{(2)} = u_h^{(2)} − u_h.   (202)

Proof. Let u_h be the solution of (198); then there exists a unique f_{u_h} ∈ V* such that

a(u_h, ψ) = ⟨f_{u_h}, ψ⟩   ∀ψ ∈ V.   (203)

If we want to approximate the solution of (203) on the finite element space V_h^{(2)}, we obtain the approximation u_{u_h} of u_h, which is given by the unique solution of the problem: Find u_{u_h} ∈ V_h^{(2)} such that

a(u_{u_h}, ψ_h^{(2)}) = ⟨f_{u_h}, ψ_h^{(2)}⟩   ∀ψ_h^{(2)} ∈ V_h^{(2)}.

It can be shown that the mapping u_h ↦ u_{u_h} is a projection, which is denoted by P_{[V_h^{(2)}]}. For this projection it holds:

a(u_h − P_{[V_h^{(2)}]} u_h, ψ_h^{(2)}) = ⟨f_{u_h}, ψ_h^{(2)}⟩ − ⟨f_{u_h}, ψ_h^{(2)}⟩ = 0   ∀ψ_h^{(2)} ∈ V_h^{(2)}.

A simple calculation shows

a(e_h^{(2)}, ψ_h^{(2)}) = ⟨R_{u_h}, ψ_h^{(2)}⟩ = ⟨f, ψ_h^{(2)}⟩ − a(u_h, ψ_h^{(2)})
                       = a(u_h^{(2)}, ψ_h^{(2)}) − a(P_{[V_h^{(2)}]} u_h, ψ_h^{(2)})
                       = a(u_h^{(2)} − P_{[V_h^{(2)}]} u_h, ψ_h^{(2)})   ∀ψ_h^{(2)} ∈ V_h^{(2)}.

The Lax–Milgram lemma yields a unique solution in the space V_h^{(2)} and we can conclude that e_h^{(2)} = u_h^{(2)} − P_{[V_h^{(2)}]} u_h, hence

a(e_h^{(2)}, ψ_h^{(2)}) = ⟨R_{u_h}, ψ_h^{(2)}⟩ = a(u_h^{(2)} − P_{[V_h^{(2)}]} u_h, ψ_h^{(2)})   ∀ψ_h^{(2)} ∈ V_h^{(2)}.

This shows the first statement (201). If V_h ⊆ V_h^{(2)} holds, then P_{[V_h^{(2)}]} u_h = u_h because u_h ∈ V_h ⊆ V_h^{(2)}, and hence (202) has been shown.
Remark 9.52. The assumptions of the Lax–Milgram lemma can be relaxed to any condition which guarantees that a(u, ψ) = ⟨f, ψ⟩ for all ψ ∈ V and that (198), (199) and (200) have unique solutions for all f ∈ V*.

Remark 9.53. Furthermore, we notice that our previous theory holds not only for V_h^{(1)} ⊂ V_h^{(r+1)}, but for general spaces which are not necessarily subspaces of V_h^{(r+1)}.
Corollary 9.54. From Proposition 9.51 we obtain that if we work on the spaces V_h and V_h^{(r+1)}, the error can simply be computed as:

e_h = u_h^{(r+1)} − u_h.

In particular, the two subproblems for obtaining u_h and u_h^{(r+1)} can be computed in parallel without communication, using the spaces V_h and V_h^{(r+1)}.
Furthermore, to avoid problems with prolongation operators in programming, we can immediately compute J_i(u_h) and just communicate this value. With the help of Proposition 9.51, the combined functional J̃_c is approximated by J_c with

J_c(ψ) := Σ_{i=0}^{N−1} [ sign(J_i(u_h^{(r+1)}) − J_i(u_h)) ω_i / |J_i(u_h)| ] J_i(ψ)   ∀ψ ∈ V,   (204)

for some self-chosen but positive weights ωi ∈ R. Now we can use the PU approach for the functional Jc in a
similar way as discussed in Section 9.12.3, resulting in

Proposition 9.55. For the finite element approximation of Formulation 9.27, and considering N goal functionals J_i(·), we have the a posteriori error estimate:

|J_c(u) − J_c(u_h)| ≤ η := Σ_{i=1}^{M} |η_i|,   (205)

where

η_i = ⟨f, (z − i_h z)ψ_i⟩ − (α∇u_h, ∇((z − i_h z)ψ_i)).   (206)

Specifically, the adjoint problem is given by: Find z ∈ V such that

a(ψ, z) = J_c(ψ)   ∀ψ ∈ V,

for which J_c has been constructed by (204).

Remark 9.56. Since in practice we are in general not able to compute the exact adjoint solution z (see also Section 9.9.1), we approximate z by z_h, solving the problem: Find z_h ∈ V_h^{(r+1)} such that

a(ψ_h, z_h) = J_c(ψ_h)   ∀ψ_h ∈ V_h^{(r+1)},

for which J_c has been constructed by (204).


Remark 9.57. An advantage of this approach is that we have to solve (as in [71, 72]) only 2 linear systems
instead of N , and furthermore we do not lose the possibility of parallelization.
Remark 9.58. If we use the same finite element space for the second primal problem and the adjoint problem,
then we just have to assemble one matrix instead of the two system matrices Aprimal , Aadjoint since it holds
Aadjoint = ATprimal .

9.17.1 The adaptive algorithm


Let error tolerances TOL_i be given for each J_i ∈ J, i = 0, . . . , N − 1, where N is the number of functionals of interest. Mesh adaptation is realized by extracting local error indicators η_i from the a posteriori error estimate in Proposition 9.55, in which the adjoint solution has been approximated as explained in Remark 9.56. To this end, we can adapt the mesh using the following strategy (a sketch of Steps 2 and 4 is given after the list):

1. Solve two primal problems: Compute the primal solutions u_h and u_h^{(r+1)} for two finite element spaces, respectively. This can be done completely in parallel.

2. Construct combined functional: Construct J_c as in (204).

3. Solve the adjoint problem: Compute the adjoint solution z_h^{(r+1)} by solving the adjoint problem a(ψ_h, z_h) = J_c(ψ_h) on a larger FE space than for u_h.

4. Estimate:
   • Determine the indicator η_i at each node i by (206).
   • Compute the sum of all indicators η := Σ_i |η_i|.
   • Check if the stopping criterion is satisfied:

     η < TOL_c,   where TOL_c := inf_{i} { ω_i TOL_i / |J_i(u_h)| }.

     If this criterion is satisfied, stop the computation since J_c(u_h) has been computed with the desired accuracy. Otherwise, proceed to the following step.

5. Mark all cells K_i that touch nodes having values η_i above the average αη/N (where N here denotes the total number of cells of the mesh T_h and α ≈ 1).

6. Refine the mesh.

7. Go back to 1.
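To make Steps 2 and 4 concrete, the following minimal C++ sketch computes the weights of the combined functional (204) and the stopping tolerance TOL_c from given functional values. The interface (setup_combined, accept_solution and the plain vector arguments) is purely illustrative and not tied to any particular FEM library.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Weights w_i of the combined functional J_c, cf. (204), and the stopping
// tolerance TOL_c = inf_i { omega_i * TOL_i / |J_i(u_h)| }.
struct CombinedFunctional {
  std::vector<double> w;
  double tol_c = std::numeric_limits<double>::max();
};

CombinedFunctional setup_combined(const std::vector<double> &J_low,   // J_i(u_h)
                                  const std::vector<double> &J_high,  // J_i(u_h^(r+1))
                                  const std::vector<double> &omega,   // positive weights
                                  const std::vector<double> &tol)     // tolerances TOL_i
{
  CombinedFunctional c;
  for (std::size_t i = 0; i < J_low.size(); ++i) {
    const double sgn = (J_high[i] - J_low[i] >= 0.0) ? 1.0 : -1.0;
    c.w.push_back(sgn * omega[i] / std::abs(J_low[i]));
    c.tol_c = std::min(c.tol_c, omega[i] * tol[i] / std::abs(J_low[i]));
  }
  return c;
}

// Step 4: sum up the PU indicators eta_i and check the stopping criterion.
bool accept_solution(const std::vector<double> &eta_i, double tol_c) {
  double eta = 0.0;
  for (double e : eta_i) eta += std::abs(e);
  return eta < tol_c;
}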

Remark 9.59. The reason for the special choice of TOL_c is to ensure that |J_i(u) − J_i(u_h)| < TOL_i holds for all functionals J_i, in which we tacitly assume that the exact adjoint solution is known.

Remark 9.60. Although we formulated our algorithm for r and r + 1, we notice that we could also have worked with the same polynomial degree but with locally finer meshes to obtain the second space. This is very similar to the various options that we have to approximate the adjoint problem itself, as described in Section 9.9.1.

9.18 Residual-based error estimation


Setting

J(φ) = ‖∇e_h‖^{−1} (∇φ, ∇e_h),

we obtain an estimator for the energy norm, and setting

J(φ) = ‖e_h‖^{−1} (φ, e_h),

we obtain an estimator for the L² norm. For details we refer to [15] and [106].

9.19 Mesh refinement strategies: averaging, fixed-rate, fixed-fraction


We now have an error value on each element K_j ∈ T_h or at each PU-DoF i. It remains to set up a strategy deciding which elements shall be refined to enhance the accuracy in terms of the goal functional J(·).
Let an error tolerance TOL be given. Mesh adaptation is realized using extracted local error indicators from the a posteriori error estimate on the mesh T_h. A cell-wise assembling reads:

|J(u) − J(u_h)| ≤ η := Σ_{K∈T_h} η_K,  summing over all cells K ∈ T_h.

Alternatively, the PU allows for a DoF-wise assembling:

|J(u) − J(u_h)| ≤ η := Σ_i η_i,  summing over all DoFs i of the PU.

9.19.1 Averaging and general algorithm


This information is used to adapt the mesh using the following strategy (a minimal sketch of the marking step follows the list):

1. Compute the primal solution u_h and the adjoint solution z_h on the present mesh T_h.

2. Determine the cell indicator η_K at each cell K. Alternatively, determine the DoF-indicator η_i at each PU-DoF i.

3. Compute the sum of all indicators η := Σ_{K∈T_h} η_K. Alternatively, η := Σ_i η_i.

4. Check if the stopping criterion is satisfied: |J(u) − J(u_h)| ≤ η ≤ TOL; then accept u_h within the tolerance TOL. Otherwise, proceed to the following step.

5. Mark all cells K_i that have values η_{K_i} above the average αη/N (where N denotes the total number of cells of the mesh T_h and α ≈ 1). Alternatively, all PU-DoFs above the average αη/N are marked.
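A minimal C++ sketch of the averaging strategy (Steps 3 and 5), assuming the indicators are available in a plain vector; all names are illustrative:

#include <cstddef>
#include <vector>

// Averaging strategy: mark all cells whose indicator exceeds
// alpha * eta / N, where eta is the sum of all indicators, N the
// number of cells, and alpha of order one.
std::vector<bool> mark_by_average(const std::vector<double> &eta_K,
                                  double alpha = 1.0) {
  double eta = 0.0;
  for (double e : eta_K) eta += e;
  const double threshold = alpha * eta / static_cast<double>(eta_K.size());
  std::vector<bool> refine(eta_K.size(), false);
  for (std::size_t K = 0; K < eta_K.size(); ++K)
    if (eta_K[K] > threshold) refine[K] = true;
  return refine;
}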

Remark 9.61. We emphasize that the tolerance T OL should be well above the tolerances of the numerical
solvers. Just recall T OL = 0.01 would mean that the goal functional is measured up to a tolerance of 1%.

When the DoF-based estimator is adopted, the error indicators η_i are node-wise contributions of the error. Mesh adaptivity can be carried out in two ways:
• in a node-wise fashion: if a node i is picked for refinement, all elements touching this node will be refined;

• alternatively, one could first assemble element-wise indicators for each K ∈ T_h by summing up all indicators belonging to the nodes of this element and then carry out adaptivity in the usual element-wise way.
On adaptive meshes with hanging nodes, the evaluation of the PU indicator is straightforward: First, the PU indicator is assembled via (189), employing the basis functions ψ_i ∈ V_PU for i = 1, . . . , M. In a second step, the contributions belonging to hanging nodes are condensed in the usual way by distribution to the neighboring indicators.

9.19.2 Fixed-rate (fixed number)


See for instance [11, 15]. Here, a fixed fraction (number) of elements is refined/coarsened. All elements are ordered with respect to their error values:

η_1 ≥ η_2 ≥ . . . ≥ η_N.

Let X and Y be fractions with 1 − X > Y. Then the X·N elements with the largest error values are refined and the Y·N elements with the smallest values are coarsened; for instance, X = 0.3 and Y = 0.02. A minimal sketch is given below.
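A minimal C++ sketch of the fixed-rate marking, assuming the element indicators are stored in a plain vector (the interface is illustrative):

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Fixed-rate marking: order the elements by descending error value,
// refine the X*N largest and coarsen the Y*N smallest (1 - X > Y).
void mark_fixed_rate(const std::vector<double> &eta, double X, double Y,
                     std::vector<bool> &refine, std::vector<bool> &coarsen) {
  const std::size_t N = eta.size();
  std::vector<std::size_t> idx(N);
  std::iota(idx.begin(), idx.end(), 0);
  // sort element indices such that eta_1 >= eta_2 >= ... >= eta_N
  std::sort(idx.begin(), idx.end(),
            [&eta](std::size_t a, std::size_t b) { return eta[a] > eta[b]; });
  refine.assign(N, false);
  coarsen.assign(N, false);
  const std::size_t n_ref = static_cast<std::size_t>(X * N);
  const std::size_t n_coa = static_cast<std::size_t>(Y * N);
  for (std::size_t k = 0; k < n_ref; ++k) refine[idx[k]] = true;
  for (std::size_t k = N - n_coa; k < N; ++k) coarsen[idx[k]] = true;
}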

9.19.3 Fixed error reduction (fixed-fraction)


Refining/coarsening according to a reduction of the error estimate (also known as the bulk criterion or Dörfler marking [42]). Here, the error values are summed up such that a prescribed fraction of the total error is reduced. All elements that contribute to this fraction are refined. Let again X and Y be fractions with 1 − X > Y. Determine indices N_l, N_u ∈ {1, . . . , N}, with N being the actual size of the system, such that

Σ_{i=1}^{N_u} η_i ≈ X η,   Σ_{i=N_l}^{N} η_i ≈ Y η.

Then, the elements K_1, . . . , K_{N_u} are refined and K_{N_l}, . . . , K_N are coarsened; for instance, X = 0.2 and Y = 0.1. A sketch follows below.
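A minimal C++ sketch of the fixed-fraction (Dörfler) marking under the same illustrative assumptions as above:

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Doerfler (bulk) marking: refine the smallest set of elements whose
// indicators sum to a fraction X of the total error; coarsen from the
// tail a set carrying the fraction Y.
void mark_fixed_fraction(const std::vector<double> &eta_K, double X, double Y,
                         std::vector<bool> &refine, std::vector<bool> &coarsen) {
  const std::size_t N = eta_K.size();
  std::vector<std::size_t> idx(N);
  std::iota(idx.begin(), idx.end(), 0);
  std::sort(idx.begin(), idx.end(), [&eta_K](std::size_t a, std::size_t b) {
    return eta_K[a] > eta_K[b];
  });
  double eta = 0.0;
  for (double e : eta_K) eta += e;
  refine.assign(N, false);
  coarsen.assign(N, false);
  double acc = 0.0;
  std::size_t k = 0;
  for (; k < N && acc < X * eta; ++k) {  // elements K_1, ..., K_{N_u}
    acc += eta_K[idx[k]];
    refine[idx[k]] = true;
  }
  acc = 0.0;
  for (std::size_t j = N; j > k; --j) {  // elements K_{N_l}, ..., K_N
    if (acc >= Y * eta) break;
    acc += eta_K[idx[j - 1]];
    coarsen[idx[j - 1]] = true;
  }
}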

9.19.4 How to refine marked cells


It remains to explain how marked cells are finally refined.

9.19.4.1 Quads and hexes Using quadrilateral or hexahedral meshes, bisection can simply be used. Here, a cell is cut in the middle and split into 4 (in 2d) or 8 (in 3d) sub-elements. When the neighboring cell is not
refined, we end up with so-called hanging nodes. These are degrees of freedom on the refined cell, but on the
coarse neighboring cell, these nodes lie on the middle point of faces or edges and do not represent true degrees
of freedom. Their values are obtained by interpolation of the neighboring DoFs. Consequently, condition No.
3 in Definition 8.127 is weakened in the presence of hanging nodes. For more details, we refer to Carey and
Oden [30].

9.19.4.2 Triangles and prisms For triangles or prisms, we have various possibilities how to split elements into sub-elements. Common ways are red (one triangle is split into four triangles) and green (one triangle is split into two triangles) refinement strategies. A typical strategy is to use red refinement and to apply green refinement where hanging nodes would otherwise occur in neighboring elements.

9.19.4.3 Conditions on the geometry While refining the mesh locally for problems in dimension n ≥ 2, we need to take care that the minimum angle condition (see Section 8.14) is fulfilled.

9.19.5 Convergence of adaptive algorithms
The first convergence result for an adaptive algorithm was shown in [42] for the Poisson problem. The convergence of adaptive algorithms for more general problems is subject to current research. Axioms of adaptivity have recently been formulated in [31].

9.20 Numerical test: Poisson with mean value goal functional


We consider the following test first: Let Ω = (0, 1)² and find u ∈ H¹₀(Ω) such that

(∇u, ∇φ) = (1, φ)   ∀φ ∈ H¹₀(Ω).

The goal functional is

J(u) = (1/|Ω|) ∫_Ω u dx.

The reference value has been computed on a very fine mesh (although here a manufactured solution would have been easy to derive) and is given by

J_ref(u) = 3.5144241236e−02.

The above goal functional yields the adjoint problem

a(φ, z) = J(φ),

and in particular (note that |Ω| = 1):

(∇φ, ∇z) = ∫_Ω φ dx = (1, φ).

The parameter α in Section 9.19 is chosen as α = 4. The solution of the adjoint problem is the same as that of the primal problem, as also illustrated in Figure 33.

Figure 33: Section 9.20: Solutions of the primal and adjoint problems.

The first goal in our computations is to investigate I_eff using the PU localization from Proposition 9.47; again in short form:

|J(u) − J(u_h)| ≤ η := Σ_{i=1}^{M} | ⟨f, (z_h^{(2)} − i_h z_h^{(2)})ψ_i⟩ − (∇u_h, ∇((z_h^{(2)} − i_h z_h^{(2)})ψ_i)) |.   (207)

A second goal is to show that a pure weak form, without partial integration and without PU, yields an over-estimation of the error:

|J(u) − J(u_h)| ≤ η := Σ_{i=1}^{M} | ⟨f, z_h^{(2)} − i_h z_h^{(2)}⟩ − (∇u_h, ∇(z_h^{(2)} − i_h z_h^{(2)})) |.   (208)

Findings using PU localization
Dofs True error Estimated error Effectivity
=======================================================
25 3.17e-03 3.14e-03 9.92e-01
69 1.66e-03 2.23e-03 1.35e+00
153 7.15e-04 1.02e-03 1.43e+00
329 2.01e-04 2.19e-04 1.09e+00
829 1.07e-04 1.43e-04 1.34e+00
1545 4.83e-05 5.62e-05 1.16e+00
4049 1.68e-05 2.28e-05 1.35e+00
6673 1.18e-05 1.35e-05 1.15e+00
15893 4.15e-06 5.35e-06 1.29e+00
24853 2.96e-06 3.29e-06 1.11e+00
62325 1.05e-06 1.38e-06 1.31e+00
96953 7.34e-07 8.21e-07 1.12e+00
=======================================================

Findings using a pure weak form without partial integration or PU


Dofs True error Estimated error Effectivity
=======================================================
25 3.17e-03 1.26e-02 3.97e+00
69 1.66e-03 8.79e-03 5.31e+00
165 6.92e-04 4.12e-03 5.95e+00
313 3.47e-04 2.42e-03 6.98e+00
629 1.77e-04 1.02e-03 5.73e+00
1397 8.36e-05 5.64e-04 6.75e+00
2277 4.57e-05 2.37e-04 5.19e+00
4949 2.14e-05 1.18e-04 5.54e+00
8921 1.04e-05 5.10e-05 4.90e+00
22469 4.71e-06 2.73e-05 5.78e+00
37377 2.66e-06 1.36e-05 5.12e+00
80129 1.83e-06 8.85e-06 4.83e+00
=======================================================

In the second table, we clearly observe an over-estimation of the error by a factor of about 5. In the case of the PU localization, we see I_eff around 1, which is what we expected due to the regularity of the problem.

Figure 34: Section 9.20: Locally adapted meshes using PU localization for the refinement levels 7 and 8.


Figure 35: Section 9.20: Locally adapted meshes using a pure weak form without partial integration nor PU
localization for the refinement levels 7 and 8.

9.21 Numerical test: L-shaped domain with Dirac rhs and Dirac goal functional
We redo Example 2 from [113]. Specifically, we consider Poisson's equation on an L-shaped domain Ω_L = (−1, 1)² \ (−1, 0)² with boundary ∂Ω_L, where the right-hand side is given by a Dirac function in x_0 = (−0.5, 0.5):

−∆u = δ_{x_0} in Ω_L,   u = 0 on ∂Ω_L.   (209)

As goal functional, we take the point evaluation in x_1 = (0.5, −0.5) such that J(φ) = φ(x_1). The adjoint problem corresponds to solving Poisson's equation −∆z = δ_{x_1} with a Dirac right-hand side in x_1. Both the primal problem and the adjoint problem lack the required minimal regularity for the standard finite element theory (see the previous chapter), such that a regularization by averaging over a small subdomain is required, e.g.,

J_ε(φ) = (1/(2πε²)) ∫_{|x−x_1|<ε} φ(x) dx,   (210)

where ε > 0 is a small parameter not depending on h. As reference functional quantity we compute the value

J_ref = 2.134929 · 10⁻³ ± 10⁻⁷   (211)

obtained on a sufficiently globally refined mesh.

Dofs True error Estimated error Effectivity


=======================================================
21 1.24e-03 1.41e-03 1.14e+00
65 2.74e-04 2.45e-04 8.97e-01
225 7.50e-05 6.32e-05 8.42e-01
677 2.65e-05 2.11e-05 7.95e-01
2021 9.93e-06 7.70e-06 7.75e-01
6089 3.59e-06 2.80e-06 7.81e-01
20157 1.38e-06 1.18e-06 8.51e-01
61905 5.16e-07 5.20e-07 1.01e+00
177221 1.34e-07 2.10e-07 1.56e+00
=======================================================


Figure 36: Section 9.21: Poisson problem with Dirac right hand side (left figure on top) and Dirac goal func-
tional (right figure on top) on the L-shaped domain. For mesh refinement the primal error estimator
with PU localization is used. The primal problem is discretized with Q1 elements and the adjoint
problem with Q2 elements. The PU is based on Q1 elements. The mesh is refined around the
two Dirac functions and in the corner of the domain (bottom figure left). In order to highlight the
various mesh levels within one computation, the different element areas are colorized (bottom figure
right).

9.22 Space-time goal-oriented error estimation
See [11][Section 3 and Section 9] and [19].

9.23 Final comments to error estimation in numerical simulations


We give some final comments to error estimation. One should keep in mind several aspects. First:
• Numerical mathematics is its own discipline and we should aim to develop methods with best accuracy
as possible.
• On the other hand, numerical mathematics is a very useful tool for many other disciplines (engineering, natural sciences, etc.). Here, an improvement of a process (i.e., of an approximation) by several percent is already very important, even though the total error may still be large and, in the strict sense of numerical mathematics, the result is not yet satisfying.

Second (w.r.t. the first bullet point): A posteriori error estimation, verification of the developed methods, and
code validation are achieved in three steps (with increasing level of difficulty) after having constructed an
a posteriori error estimator η that can be localized and used for local mesh adaptivity. These steps hold true
for classical norm-based error estimation ‖u − u_h‖ as well as goal-oriented error estimation J(u) − J(u_h).

1. Does the solution make sense?


• If possible test your code with an acknowledged benchmark configuration and verify whether J(uh )
matches the benchmark values in a given range and on a sequence of at least three meshes.
This first step can be performed with uniform and adaptive mesh refinement. In time-dependent
problems, please compute at least with three different time step sizes.
• If no benchmark configuration is available, study a simple prototype configuration and observe whether the solution makes sense.

2. Do the true error J(u) − J(u_h) and the error estimator η decrease their values under mesh refinement?
• Compute J(u) either on a uniformly-refined super-fine mesh or even analytically (i.e., a manufac-
tured solution). Compute the error J(u) − J(uh ) and observe whether the error is decreasing.
• If a priori estimates are available, see if the orders of convergence are as expected. But be careful,
often goal functionals are nonlinear, for which rigorous a priori error estimates are often not available.

3. Compare η and J(u)−J(uh ) in terms of the effectivity index Ief f . Do we asymptotically obtain something
around Ief f ≈ 1? For nonlinear problems, one easily observes 0.5 ≤ Ief f ≤ 10, which might be still okay
depending on the problem.

In the very sense of numerical mathematics we must go for all three steps, but in particular the last step, No. 3, is often very difficult when not all theoretical requirements (smoothness of data and boundary conditions, regularity of the goal functional, smoothness of domains and boundaries) are fulfilled. Also keep in mind whether all parts of the error estimator are actually implemented; the remainder term R may still play a role. For instance, in practice, when working with the DWR method, very often only the primal error estimator is implemented and the adjoint part is neglected; see the following section.

9.24 Example: Stationary Navier-Stokes 2D-1 benchmark


We demonstrate the developments of this chapter with a well-known benchmark in fluid mechanics: the 2D-1 benchmark [118]. The underlying equations are the stationary incompressible Navier-Stokes equations [57, 102, 126], which are vector-valued (more details in Section 11) and nonlinear (more details in Section 13).

The goals are the accurate measurements of drag, lift and a pressure difference.

9.24.1 Equations
Formulation 9.62. Let Ω ⊂ R2 with boundary ∂Ω = Γwall ∪ Γin ∪ Γout . Find vector-valued velocities
v : Ω → R2 and a scalar-valued pressure p : Ω → R such that

ρ(v · ∇)v − div(σ) = ρf in Ω,


div(v) = 0 in Ω,
v=0 on Γwall ,
v=g on Γin ,
ν∂n v − pn = 0 on Γout ,

where f : Ω → R2 is a force, g : Γ → R2 a Dirichlet inflow profile, n the normal vector, ρ the density and

σ = −pI + ρν(∇v + ∇v T ), [σ] = P a = N/m2

is the Cauchy stress tensor for a Newtonian fluid. Moreover, I is the identity matrix and ν the kinematic
viscosity.
The Reynolds number Re is given by the dimensionless expression

Re = L U / ν,

where L is a characteristic length and U a characteristic velocity.
Remark 9.63 (Stokes). Neglecting the convection term (v · ∇)v for viscous flow, we obtain the linear Stokes
equations.

Figure 37: 2D-1 benchmark configuration: the channel Ω = (0, 2.2) × (0, 0.41) with the inflow boundary Γin on the left, the outflow boundary Γout on the right, and Γwall on the top and bottom.

For a variational formulation, we define the spaces (for details see [57, 102, 126]):

Vv := {v ∈ H 1 (Ω)2 | v = 0 on Γin ∪ Γwall },


Vp = L2 (Ω)/R.

Here, Vv is a vector-valued space. The pressure is only defined up to a constant. For existence of the pressure,
the inf-sup condition is adopted [57, 102, 126].
On the outflow boundary Γout , we apply a natural boundary condition:

ρν∂n v − pn = 0.

This type of condition implicitly normalizes the pressure [75, 101] to a unique solution. Thus, the pressure
space can be written as
Vp = L2 (Ω).

Remark 9.64. Applying ρν∂_n v − pn = 0 on Γout is not consistent with the total stress σ·n = (−pI + ρν(∇v + ∇v^T))·n. The symmetric part ρν∇v^T·n needs to be subtracted on Γout; for more details see [75, 138]. To keep the focus on the main aspects of this section, we neglect this term in the following.

As variational formulation, we obtain:


Formulation 9.65. Find v ∈ {g + Vv } and p ∈ Vp such that

(ρ(v · ∇)v, φ) + (σ, ∇φ) = (ρf, φ) ∀φ ∈ Vv ,


(div(v), ψ) = 0 ∀ψ ∈ Vp .

For the prescription of non-homogeneous Dirichlet conditions, we refer to Section 8.11.3.

For the finite element spaces we have to satisfy a discrete inf-sup condition [57]. Basically speaking, the
finite element space for the velocities must be large enough. A stable FE pair is the so-called Taylor-Hood
element Q2 × Q1 ; here defined on quadrilaterals. The velocities are bi-quadratic and the pressure bi-linear.
Hence we seek (vh , ph ) ∈ Vh2 × Vh1 with

Vhs := {uh ∈ C(Ω̄)| uh |K ∈ Qs (K), ∀K ∈ Th }.

This is a conforming finite element space, i.e., Vhs ⊂ H 1 (Ω).


Formulation 9.66. Find vh ∈ {g + Vh2 } and ph ∈ Vh1 such that

(ρ(vh · ∇)vh , φh ) + (σ, ∇φh ) = (ρf, φh ) ∀φh ∈ Vh2 ,


(div(vh ), ψh ) = 0 ∀ψh ∈ Vh1 .

A compact semi-linear form is formulated as follows:


Formulation 9.67. Find U_h := (v_h, p_h) ∈ {g + V_h²} × V_h¹ such that

a(U_h)(Ψ_h) = 0   ∀Ψ_h = (φ_h, ψ_h) ∈ V_h² × V_h¹,

where the semi-linear form is given by

a(U_h)(Ψ_h) := (ρ(v_h·∇)v_h, φ_h) + (σ, ∇φ_h) − (ρf, φ_h) + (div(v_h), ψ_h).

9.24.2 Functionals of interest


The goal functionals are drag, lift, and pressure difference, respectively:

J_1(U) = ∫_S (σ·n)·e_1 ds,   J_2(U) = ∫_S (σ·n)·e_2 ds,   J_3(U) = p(x_0, y_0) − p(x_1, y_1)

with (x_0, y_0), (x_1, y_1) ∈ Ω and where e_1 and e_2 are the unit vectors in x and y direction. We are interested in the drag and lift coefficients

c_D = 2J_1 / (ρV²D),   c_L = 2J_2 / (ρV²D)

with D = 0.1 m (the cylinder diameter) and the mean velocity

V = (2/3) v_x(0, H/2),

where v_x is defined below and with the height of the channel H = 0.41 m.

9.24.3 A duality-based a posteriori error estimator (primal part)
We develop an error estimator based only on the primal part. Since the problem is nonlinear, the adjoint part should better be evaluated as well. On the other hand, the adjoint part is expensive, for which reason it might be justified to work only with the primal error estimator.
The adjoint bilinear form is given by:

Formulation 9.68. From the optimality system, we obtain:

a′_U(U_h)(Ψ_h, Z_h) = (ρ(φ_h·∇)v_h + ρ(v_h·∇)φ_h, z_h^v) + (−ψ_h I + νρ(∇φ_h + ∇φ_h^T), ∇z_h^v) + (div(φ_h), z_h^p).

Proposition 9.69 (Adjoint problem). The adjoint problem depends on the specific goal functional J_i(U), i = 1, 2, 3. For example, for the drag it holds: Find Z_h = (z_h^v, z_h^p) ∈ V_h³ × V_h² such that

a′_U(U_h)(Ψ_h, Z_h) = J_1(Ψ_h)   ∀Ψ_h = (φ_h, ψ_h) ∈ V_h³ × V_h²

with a′_U(U_h)(Ψ_h, Z_h) defined in Formulation 9.68 and

J_1(Ψ_h) = ∫_S ((−ψ_h I + ρν(∇φ_h + ∇φ_h^T))·n)·e_1 ds,   Ψ_h = (φ_h, ψ_h).

Recall that the adjoint solution must be approximated with a higher-order method. Therefore, we choose the space V_h³ × V_h².
The primal PU-DWR error estimator reads:

Proposition 9.70. It holds:

|J(U) − J(U_h)| ≤ η := Σ_i |η_i|

with

η_i = −a(U_h)((Z_h − i_h Z_h)χ_i),   Z_h = (z_h^v, z_h^p),   χ_i ∈ V_PU,

where the PU is realized as V_PU := V_h¹ and a(U_h)(·) has been defined in Formulation 9.67.
Remark 9.71. In the error estimator the non-homogeneous Dirichlet data g on Γin are neglected and assumed
to be small. Also, the outflow correction condition, see Remark 9.64, is neglected.

9.24.4 2D-1 configuration


The configuration is displayed in [118][Figure 1].

9.24.5 Boundary conditions


As boundary conditions, we specified all boundaries previously except the inflow:

g = (g_1, g_2)^T = (v_x(0, y), v_y)^T = ( 4 v_m y (H − y)/H², 0 )^T

with v_m = 0.3 m/s. The resulting Reynolds number is Re = 20 (using D and V as defined above).

9.24.6 Parameters and right hand side data


We use ρ = 1kg/m3 , ν = 10−3 m2 /s and f = 0.

9.24.7 Step 1: Verification of benchmark values
In Table 3 in [118], the following bounds are given:

• Drag, cD : 5.57 − 5.59.


• Lift, cL : 0.0104 − 0.0110.
• Pressure difference, ∆p: 0.1172 − 0.1176.

We obtain on level 3 with 1828 elements and with 15868 velocity DoFs and 2044 pressure DoFs:
Functional Value
======================================
Drag: 5.5754236905876873e+00
Lift: 1.0970590684824442e-02
Pressure diff: 1.1745258793930979e-01
======================================

9.24.8 Step 2 and Step 3: Computing J(u) on a fine mesh, J(u) − J(u_h) and I_eff
The reference value J_1(U) for the drag is

5.5787294556197073e+00

and was obtained on a four times (Level 4) uniformly refined mesh.


We obtain:
Level Exact err Est err I_eff
==================================================
0 3.51e-01 1.08e-01 3.07e-01
1 9.25e-02 2.08e-02 2.25e-01
2 1.94e-02 6.07e-03 3.12e-01
3 3.31e-03 2.45e-03 7.40e-01
==================================================
We observe that the true error and the estimated error η both decrease. The effectivity indices are less than 1 and indicate an underestimation of the true error (clearly seen in columns No. 2 and 3 as well). Because of our several assumptions and the relatively coarse meshes, the results are more or less satisfying w.r.t. Step 3. Regarding Steps 1 and 2, our findings are excellent.

9.24.9 More findings - graphical solutions

Figure 38: 2D-1 fluid benchmark: adaptively refined mesh with respect to J1 (U ), the pressure solution, x-
velocity, y-velocity.


Figure 39: 2D-1 fluid benchmark: adjoint solutions of the pressure solution, x-velocity, y-velocity.

9.25 Example: Stationary fluid-structure interaction


The previous example can be extended to fluid-structure interaction. Details can be found:

• online on github https://github.com/tommeswick/goal-oriented-fsi


• a preprint https://arxiv.org/abs/2105.11145 for details on the problem statement and algorithms

9.26 Chapter summary and outlook


In this chapter, we introduced goal-oriented a posteriori error estimation to control the error during a finite
element simulation a posteriori. Furthermore, these estimates can be localized on each mesh element and then
be used for mesh refinement to obtain a higher accuracy of the numerical solution with low computational
cost. The discrete systems in the present and the previous chapter lead to large linear equation systems that
can in general not be solved anymore with a direct decomposition (e.g., LU). We shall briefly introduce some
iterative methods in the next chapter.


10 Numerical solution of the discretized problems


In this chapter, we provide some ideas how to solve the arising linear systems

Ax = b,   where A ∈ ℝ^{n×n}, x = (u_1, . . . , u_n)^T ∈ ℝⁿ, b ∈ ℝⁿ,

when discretizing a PDE using finite differences or finite elements. We notice that, to be consistent with the previous notation, we assume that the boundary points x_0 and x_{n+1} are not assembled.
For a moderate number of degrees of freedom, direct solvers such as Gaussian elimination, LU or Cholesky
(for symmetric A) can be used.
More efficient schemes for large problems in terms of
• computational cost (CPU run time);
• and memory consumption
are iterative solvers. Illustrative examples of floating point operations and CPU times are provided in
[114][Pages 68-69, Tables 3.1 and 3.2].
We notice that most of the material in this chapter is taken from [114][Chapter 7] and translated from
German into English.

10.1 On the condition number of the system matrix


Proposition 10.1. Let T_h be a quasi-uniform triangulation. The condition number χ(A) of the system matrix A and the condition number χ(M) of a mass matrix M can be estimated by:

χ(A) = O(h⁻²),   χ(M) = O(1),   h → 0.

Here

A = (a_ij),   a_ij = a(φ_j, φ_i) = ∫ ∇φ_j · ∇φ_i dx,

M = (m_ij),   m_ij = m(φ_j, φ_i) = ∫ φ_j φ_i dx.

Proof. See [106], Section 3.5 or also [82], Section 7.3.

10.2 Fixed-point schemes: Richardson, Jacobi, Gauss-Seidel


A large class of schemes is based on so-called fixed-point methods:

g(x) = x.

We provide in the following a brief introduction based on [114]. Starting from

Ax = b,

we write

0 = b − Ax

and therefore

x = x + (b − Ax) =: g(x).

Introducing a scaling matrix C (in fact, C is a preconditioner) and an iteration, we arrive at

x^k = x^{k−1} + C(b − Ax^{k−1}).

Summarizing, we have

Definition 10.2. Let A ∈ ℝ^{n×n}, b ∈ ℝⁿ and C ∈ ℝ^{n×n}. To solve

Ax = b

we choose an initial guess x⁰ ∈ ℝⁿ and iterate for k = 1, 2, . . .:

x^k = x^{k−1} + C(b − Ax^{k−1}).

Please note that k does not denote a power here, but the current iteration index. Furthermore, we introduce

B := I − CA and c := Cb.

Then:

x^k = Bx^{k−1} + c.

Thanks to the construction of

g(x) = Bx + c = x + C(b − Ax),

it is easy to see that, in the limit k → ∞, the fixed point g(x) = x is attained by the solution x of Ax = b.
Remark 10.3. Thanks to Banach's fixed-point theorem (see again [114]), we can investigate under which conditions the above scheme converges. We must ensure that

‖g(x) − g(y)‖ ≤ ‖B‖ ‖x − y‖

with ‖B‖ < 1. In fact we have

g(x) = Bx + c,   g(y) = By + c,

from which the above estimate is easily seen. A critical issue (which is also true in the ODE case) is that different norms may predict different results. For instance it may happen that

‖B‖₂ < 1 but ‖B‖_∞ > 1.

For this reason, one often works with the spectral radius spr(B). More details can be found for instance in [114]. That we have a chance to achieve ‖B‖ < 1 is the reason why we introduced the matrix C.
We concentrate now on the algorithmic aspects. The two fundamental requirements for the matrix C (defined above) are:
• It should hold C ≈ A⁻¹ and therefore ‖I − CA‖ ≪ 1;
• It should be simple to construct C.
Of course, these two requirements are conflicting. As always in numerics, we need to find a trade-off that is satisfying for the developer and the computer.

10.2.1 On the Richardson iteration


We investigate the Richardson iteration in more detail.
Definition 10.4 (Richardson iteration). The simplest choice of C is a (scaled) identity matrix, i.e.,

C = ωI.

Then, we obtain the Richardson iteration

x^k = x^{k−1} + ω(b − Ax^{k−1})

with a relaxation parameter ω > 0.
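As a first illustration, a minimal self-contained C++ sketch of the Richardson iteration, applied to the 1D finite-difference Laplacian whose eigenvalues λ_i = 2 − 2cos(iπ/(n+1)) are known, so that the optimal relaxation parameter derived in Proposition 10.5 below can be used; all names and the dense storage are illustrative.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense row-wise storage, for illustration

static Vec matvec(const Mat &A, const Vec &x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  return y;
}

// Richardson iteration x^k = x^{k-1} + omega * (b - A x^{k-1}).
Vec richardson(const Mat &A, const Vec &b, double omega,
               int max_iter = 10000, double tol = 1e-10) {
  Vec x(b.size(), 0.0);  // initial guess x^0 = 0
  for (int k = 0; k < max_iter; ++k) {
    Vec r = matvec(A, x);
    double rnorm = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i) {
      r[i] = b[i] - r[i];  // residual b - A x^{k-1}
      rnorm += r[i] * r[i];
    }
    if (std::sqrt(rnorm) < tol) break;
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += omega * r[i];
  }
  return x;
}

int main() {
  // 1D finite-difference Laplacian: tridiag(-1, 2, -1).
  const int n = 10;
  Mat A(n, Vec(n, 0.0));
  for (int i = 0; i < n; ++i) {
    A[i][i] = 2.0;
    if (i > 0) A[i][i - 1] = -1.0;
    if (i + 1 < n) A[i][i + 1] = -1.0;
  }
  const Vec b(n, 1.0);
  const double pi = std::acos(-1.0);
  const double l1 = 2.0 - 2.0 * std::cos(pi / (n + 1));       // lambda_min
  const double ln = 2.0 - 2.0 * std::cos(n * pi / (n + 1));   // lambda_max
  const Vec x = richardson(A, b, 2.0 / (l1 + ln));  // omega_opt, Prop. 10.5
  std::printf("x[0] = %.8f\n", x[0]);
  return 0;
}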

Let B = I − ωA be the iteration matrix of Richardson's method. Its eigenvalues are

1 − ωλ_1, . . . , 1 − ωλ_n.

Thus the spectral radius is given by

ρ(B) = max_{i=1,...,n} |1 − ωλ_i|.

The question is for which ω we obtain min ρ(B). For A s.p.d. (with ordered eigenvalues, λ_1 = λ_min, λ_n = λ_max), we have

ρ(B) = max( |1 − ωλ_1|, |1 − ωλ_n| ).   (212)

Proposition 10.5 (Convergence of Richardson's method). For A s.p.d. with ordered (real, positive) eigenvalues, Richardson's method converges if and only if

0 < ω < 2/λ_n.   (213)

The optimal relaxation parameter is then given by

ω_opt = 2/(λ_1 + λ_n)

with the spectral radius

ρ(B) = (λ_n − λ_1)/(λ_1 + λ_n),

where B = I − ω_opt A.
Proof. Let 0 < ω < 2/λ_n hold true. Then

−1 < 1 − ωλ_n ≤ 1 − ωλ_1 < 1.

With (212), we obtain ρ(B) < 1 and therefore convergence. Let us now assume convergence; we want to show (213). In the case of convergence, it holds with (212)

1 > ρ(B) ≥ |1 − ωλ_n| ≥ 1 − ωλ_n.

Hence ωλ_n > 0 and thus ω > 0, since λ_n > 0. On the other hand,

−1 < −ρ(B) ≤ −|1 − ωλ_n| ≤ 1 − ωλ_n,

showing ωλ_n < 2, i.e., ω < 2/λ_n, yielding (213). For the last part, consider the intersection of 1 − ωλ_1 and ωλ_n − 1, yielding the optimal value ω_opt.

10.2.2 Jacobi and Gauss-Seidel


Further schemes require more work and we need to decompose the matrix A first:

A = L + D + R.

Here, L is a strictly lower-triangular matrix, D a diagonal matrix, and R a strictly upper-triangular matrix. In more detail:

A = \underbrace{\begin{pmatrix} 0 & & & \\ a_{21} & \ddots & & \\ \vdots & \ddots & \ddots & \\ a_{n1} & \cdots & a_{n,n-1} & 0 \end{pmatrix}}_{=:L} + \underbrace{\begin{pmatrix} a_{11} & & \\ & \ddots & \\ & & a_{nn} \end{pmatrix}}_{=:D} + \underbrace{\begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ & \ddots & \ddots & \vdots \\ & & \ddots & a_{n-1,n} \\ & & & 0 \end{pmatrix}}_{=:R}

With this, we can now define two very important schemes:

Definition 10.6 (Jacobi method). To solve Ax = b with A = L + D + R, let x⁰ ∈ ℝⁿ be an initial guess. We iterate for k = 1, 2, . . .

x^k = x^{k−1} + D⁻¹(b − Ax^{k−1}),

or in other words, with J := −D⁻¹(L + R):

x^k = Jx^{k−1} + D⁻¹b.

Definition 10.7 (Gauss-Seidel method). To solve Ax = b with A = L + D + R, let x⁰ ∈ ℝⁿ be an initial guess. We iterate for k = 1, 2, . . .

x^k = x^{k−1} + (D + L)⁻¹(b − Ax^{k−1}),

or in other words, with H := −(D + L)⁻¹R:

x^k = Hx^{k−1} + (D + L)⁻¹b.

To implement these two schemes, we provide the presentation in index notation:

Theorem 10.8 (Index notation of the Jacobi and Gauss-Seidel methods). One step of the Jacobi method and of the Gauss-Seidel method, respectively, can be carried out in n² + O(n) operations. In index notation, each entry of one Jacobi step reads:

x_i^k = (1/a_ii) ( b_i − Σ_{j=1, j≠i}^{n} a_ij x_j^{k−1} ),   i = 1, . . . , n,

and for the Gauss-Seidel method:

x_i^k = (1/a_ii) ( b_i − Σ_{j<i} a_ij x_j^k − Σ_{j>i} a_ij x_j^{k−1} ),   i = 1, . . . , n.
10.3 Gradient descent


An alternative class of methods is based on so-called descent or gradient methods, which further improve the previously introduced schemes. So far, we have

x^{k+1} = x^k + d^k,   k = 1, 2, 3, . . . ,

where d^k denotes the direction in which we go at each step. For instance,

d^k = D⁻¹(b − Ax^k),   d^k = (D + L)⁻¹(b − Ax^k)

for the Jacobi and Gauss-Seidel methods, respectively. To improve these kinds of iterations, we have two possibilities:

• introducing a relaxation (or so-called damping) parameter ω^k > 0 (possibly adapted at each step) such that x^{k+1} = x^k + ω^k d^k;

• improving the search direction d^k such that we reduce the error as much as possible.

We restrict our attention to positive definite matrices as they appear in the discretization of the elliptic PDEs studied previously in this section. A key point is another view on the problem: we regard it as a minimization problem for which Ax = b is the first-order necessary condition and consequently the sought solution. Imagine for simplicity that we want to minimize f(x) = ½ax² − bx. The first-order necessary condition is nothing else than the derivative f′(x) = ax − b. We find a possible minimum via f′(x) = 0, namely

ax − b = 0 ⇒ x = a⁻¹b, if a ≠ 0.
That is exactly how we would solve a linear matrix system Ax = b. By regarding it as a minimization problem, we understand better the purpose of our derivations. How does minimizing a function f(x) work in terms of an iteration? We try to decrease f at each step k:

f(x⁰) > f(x¹) > . . . > f(x^k).

This means that the direction d^k (to determine x^{k+1} = x^k + ω^k d^k) should be a descent direction. This idea can be applied to solving linear equation systems. We first define the quadratic form

Q(y) = ½ (Ay, y)₂ − (b, y)₂,

where (·, ·)₂ is the Euclidean scalar product. Then, we can define:

Algorithm 10.9 (Descent method - basic idea). Let A ∈ ℝ^{n×n} be positive definite and x⁰, b ∈ ℝⁿ. Then for k = 0, 1, 2, . . .:

• Compute d^k;

• Determine ω^k as the minimizer ω^k = argmin_ω Q(x^k + ω d^k);

• Update x^{k+1} = x^k + ω^k d^k.

For instance, d^k can be determined via the Jacobi or Gauss-Seidel methods.

Another possibility is to use the gradient itself to obtain the search directions d^k. This brings us to the gradient method:

Algorithm 10.10 (Gradient descent). Let A ∈ ℝ^{n×n} be positive definite and let the right-hand side b ∈ ℝⁿ be given. Let the initial guess be x⁰ ∈ ℝⁿ with the initial search direction d⁰ = b − Ax⁰. Then for k = 0, 1, 2, . . .:

• Compute the vector r^k = Ad^k;

• Compute the relaxation

  ω^k = ‖d^k‖₂² / (r^k, d^k)₂;

• Update the solution vector x^{k+1} = x^k + ω^k d^k;

• Update the search direction vector d^{k+1} = d^k − ω^k r^k.

One can show that the gradient method converges to the solution of the linear equation system Ax = b (see for
instance [114]).
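A minimal self-contained C++ sketch of Algorithm 10.10 with dense storage; the interface is illustrative.

#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

static double dot(const Vec &a, const Vec &b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

static Vec matvec(const Mat &A, const Vec &x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  return y;
}

// Gradient descent (Algorithm 10.10) for a positive definite matrix A.
// Note that d^k coincides with the residual b - A x^k, so a small ||d^k||
// serves as the stopping criterion.
Vec gradient_descent(const Mat &A, const Vec &b, int max_iter = 10000,
                     double tol = 1e-12) {
  Vec x(b.size(), 0.0);  // x^0 = 0
  Vec d = b;             // d^0 = b - A x^0
  for (int k = 0; k < max_iter; ++k) {
    const double dd = dot(d, d);
    if (std::sqrt(dd) < tol) break;
    const Vec r = matvec(A, d);           // r^k = A d^k
    const double omega = dd / dot(r, d);  // omega^k = ||d^k||^2 / (r^k, d^k)
    for (std::size_t i = 0; i < x.size(); ++i) {
      x[i] += omega * d[i];               // x^{k+1} = x^k + omega^k d^k
      d[i] -= omega * r[i];               // d^{k+1} = d^k - omega^k r^k
    }
  }
  return x;
}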
Proposition 10.11 (Descent directions). Let A ∈ ℝ^{n×n} be symmetric positive definite. Then, two subsequent search directions d^k and d^{k+1} of the gradient descent scheme are orthogonal, i.e., (d^{k+1}, d^k) = 0.

Proof. We have

d^{k+1} = d^k − ω^k r^k = d^k − ( (d^k, d^k) / (Ad^k, d^k) ) Ad^k.

Therefore,

(d^{k+1}, d^k) = (d^k, d^k) − ( (d^k, d^k) / (Ad^k, d^k) ) (Ad^k, d^k) = (d^k, d^k) − (d^k, d^k) = 0.
10.4 Conjugate gradients (a Krylov space method)
This section closely follows [114][Section 7.8], translated from German into English. The relationship in Proposition 10.11 only holds for subsequent pairs of descent directions. In general, we have d^k ⊥̸ d^{k+2}. For this reason, the gradient descent scheme converges slowly in most cases.

10.4.1 Formulation of the CG scheme


In order to enhance the performance of gradient descent, the conjugate gradient (CG) scheme was developed. Here, the search directions {d⁰, . . . , d^{k−1}} are pairwise orthogonal. The orthogonality is achieved with respect to the A scalar product:

(Ad^r, d^s) = 0   ∀r ≠ s.

At step k, we seek the approximation x^k = x⁰ + Σ_{i=0}^{k−1} α_i d^i as the minimizer over all α = (α_0, . . . , α_{k−1}) with respect to Q(x^k):

min_{α∈ℝ^k} Q( x⁰ + Σ_{i=0}^{k−1} α_i d^i ) = min_{α∈ℝ^k} { ½ ( Ax⁰ + Σ_{i=0}^{k−1} α_i Ad^i, x⁰ + Σ_{i=0}^{k−1} α_i d^i ) − ( b, x⁰ + Σ_{i=0}^{k−1} α_i d^i ) }.

The stationary point is given by

0 = ∂Q(x^k)/∂α_j = ( Ax⁰ + Σ_{i=0}^{k−1} α_i Ad^i, d^j ) − (b, d^j) = −( b − Ax^k, d^j ),   j = 0, . . . , k − 1.

Therefore, the new residual b − Ax^k is perpendicular to all search directions d^j for j = 0, . . . , k − 1. The resulting linear equation system

(b − Ax^k, d^j) = 0   ∀j = 0, . . . , k − 1   (214)

has the feature of Galerkin orthogonality, which we know as a property of FEM schemes.
While constructing the CG method, new search directions should be linearly independent of the current d^j. Otherwise, the space would not become larger and, consequently, the approximation could not be improved.
Definition 10.12 (Krylov space). Choosing an initial approximation x⁰ ∈ ℝⁿ with d⁰ := b − Ax⁰, the Krylov space K_k(d⁰, A) is defined by

K_k(d⁰, A) := span{d⁰, Ad⁰, . . . , A^{k−1}d⁰}.

Here, A^k means the k-th power of A.
It holds:

Lemma 10.13. Let A^k d⁰ ∈ K_k. Then, the solution x ∈ ℝⁿ of Ax = b is an element of x⁰ + K_k(d⁰, A).

Proof. Let K_k be given and let x^k ∈ x⁰ + K_k be the best approximation, which fulfills the Galerkin equation (214). Let r^k := b − Ax^k. Since

r^k = b − Ax^k = (b − Ax⁰) + A(x⁰ − x^k) ∈ d⁰ + AK_k,

it holds r^k ∈ K_{k+1}. From the assumption A^k d⁰ ∈ K_k we obtain K_{k+1} ⊂ K_k and thus r^k ∈ K_k. The Galerkin equation yields r^k ⊥ K_k, from which we obtain r^k = 0 and Ax^k = b.
If the CG scheme stops because it cannot find new search directions, the solution has thus been found. Assuming that the A-orthogonal search directions {d⁰, d¹, . . . , d^{k−1}} have been found, we can compute the next CG approximation using the basis representation x^k = x⁰ + Σ α_i d^i and employing the Galerkin equation:

( b − Ax⁰ − Σ_{i=0}^{k−1} α_i Ad^i, d^j ) = 0   ⇒   (b − Ax⁰, d^j) = α_j (Ad^j, d^j)   ⇒   α_j = (d⁰, d^j) / (Ad^j, d^j).

The A-orthogonal basis {d⁰, . . . , d^{k−1}} of the Krylov space K_k(d⁰, A) can be computed with the Gram-Schmidt procedure. However, this procedure has a high computational cost; see e.g. [114]. A better procedure is a two-step recursion formula, which is efficient and stable:

Lemma 10.14 (Two-step recursion formula). Let A ∈ ℝ^{n×n} be symmetric positive definite, x⁰ ∈ ℝⁿ and d⁰ := b − Ax⁰. Then, for k = 1, 2, . . ., the iteration

r^k := b − Ax^k,   β_{k−1} := − (r^k, Ad^{k−1}) / (d^{k−1}, Ad^{k−1}),   d^k := r^k − β_{k−1} d^{k−1}

constructs an A-orthogonal basis with (Ad^r, d^s) = 0 for r ≠ s. Here, x^k in step k is the Galerkin solution defined by (b − Ax^k, d^j) = 0 for j = 0, . . . , k − 1.

Proof. See [114].

We collect now all ingredients to construct the CG scheme. Let x⁰ be an initial guess and d⁰ := b − Ax⁰ the resulting defect. Suppose that K_k := span{d⁰, . . . , d^{k−1}}, x^k ∈ x⁰ + K_k and r^k = b − Ax^k have been computed. Then, we can compute the next search direction d^k according to Lemma 10.14:

β_{k−1} = − (r^k, Ad^{k−1}) / (d^{k−1}, Ad^{k−1}),   d^k = r^k − β_{k−1} d^{k−1}.   (215)

For the new coefficient α_k in x^{k+1} = x⁰ + Σ_{i=0}^{k} α_i d^i, testing the Galerkin equation (214) with d^k yields

( b − Ax⁰ − Σ_{i=0}^{k} α_i Ad^i, d^k ) = (b − Ax⁰, d^k) − α_k (Ad^k, d^k) = ( b − Ax⁰ + A(x⁰ − x^k), d^k ) − α_k (Ad^k, d^k) = (r^k, d^k) − α_k (Ad^k, d^k) = 0,

where we used the A-orthogonality of the search directions and x⁰ − x^k ∈ K_k. That is,

α_k = (r^k, d^k) / (Ad^k, d^k),   x^{k+1} = x^k + α_k d^k.   (216)

This allows us to compute the new defect r^{k+1}:

r^{k+1} = b − Ax^{k+1} = b − Ax^k − α_k Ad^k = r^k − α_k Ad^k.   (217)

We summarize (215)–(217) and formulate the classical CG scheme:

Algorithm 10.15. Let A ∈ ℝ^{n×n} be symmetric positive definite, and let x⁰ ∈ ℝⁿ and r⁰ = d⁰ = b − Ax⁰ be given. Iterate for k = 0, 1, . . .:

1. α_k = (r^k, d^k) / (Ad^k, d^k)
2. x^{k+1} = x^k + α_k d^k
3. r^{k+1} = r^k − α_k Ad^k
4. β_k = (r^{k+1}, Ad^k) / (d^k, Ad^k)
5. d^{k+1} = r^{k+1} − β_k d^k
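A minimal self-contained C++ sketch of Algorithm 10.15; in practice, the loop is additionally stopped once the residual norm falls below a tolerance, as discussed below. Names and the dense storage are illustrative.

#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

static double dot(const Vec &a, const Vec &b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

static Vec matvec(const Mat &A, const Vec &x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  return y;
}

// Classical CG (Algorithm 10.15) for s.p.d. A; in exact arithmetic it
// terminates after at most n steps.
Vec cg(const Mat &A, const Vec &b, int max_iter, double tol = 1e-12) {
  Vec x(b.size(), 0.0);
  Vec r = b;  // r^0 = b - A x^0 with x^0 = 0
  Vec d = r;  // d^0 = r^0
  for (int k = 0; k < max_iter; ++k) {
    const Vec Ad = matvec(A, d);
    const double alpha = dot(r, d) / dot(Ad, d);  // step 1
    for (std::size_t i = 0; i < x.size(); ++i) {
      x[i] += alpha * d[i];                       // step 2
      r[i] -= alpha * Ad[i];                      // step 3
    }
    if (std::sqrt(dot(r, r)) < tol) break;        // early stopping
    const double beta = dot(r, Ad) / dot(d, Ad);  // step 4 (new r^{k+1})
    for (std::size_t i = 0; i < d.size(); ++i)
      d[i] = r[i] - beta * d[i];                  // step 5
  }
  return x;
}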
Without round-off errors, the CG scheme yields after (at most) n steps the solution of an n-dimensional problem and is in this sense a direct method rather than an iterative scheme. However, in practice for huge n, the CG scheme is usually stopped earlier, yielding an approximate solution.

Proposition 10.16 (CG as a direct method). Let x⁰ ∈ ℝⁿ be any initial guess. Assuming no round-off errors, the CG scheme terminates after (at most) n steps with x^n = x. At each step, we have:

Q(x^k) = min_{α∈ℝ} Q(x^{k−1} + αd^{k−1}) = min_{y∈x⁰+K_k} Q(y),

i.e.,

‖b − Ax^k‖_{A⁻¹} = min_{y∈x⁰+K_k} ‖b − Ay‖_{A⁻¹}

with the norm

‖x‖_{A⁻¹} = (A⁻¹x, x)₂^{1/2}.
Proof. That the CG scheme is a direct scheme follows from Lemma 10.13. The iterate is given by

Q(x^k) = min_{y∈x⁰+K_k} Q(y),

which is equivalent to (214). The ansatz

x^k = x⁰ + Σ_{i=0}^{k−1} α_i d^i = y^{k−1} + α_{k−1} d^{k−1},   with y^{k−1} := x⁰ + Σ_{i=0}^{k−2} α_i d^i ∈ x⁰ + K_{k−1},

yields

(b − Ax^k, d^j) = (b − Ay^{k−1}, d^j) − α_{k−1}(Ad^{k−1}, d^j) = 0   ∀j = 0, . . . , k − 1,

that is,

(b − Ay^{k−1}, d^j) = 0   ∀j = 0, . . . , k − 2,

and therefore y^{k−1} = x^{k−1} and

Q(x^k) = min_{α∈ℝ} Q(x^{k−1} + αd^{k−1}).

Finally, employing the symmetry A = A^T, we obtain:

‖b − Ay‖²_{A⁻¹} = (A⁻¹[b − Ay], b − Ay)₂ = (A⁻¹b, b)₂ − 2(b, y)₂ + (Ay, y)₂ = 2Q(y) + (A⁻¹b, b)₂,

i.e., ‖b − Ay‖²_{A⁻¹} and 2Q(y) differ only by a constant, so minimizing ‖b − Ay‖_{A⁻¹} over y ∈ x⁰ + K_k is equivalent to minimizing Q(y).
i.e., the relationship kb − Ayk2A−1 = 2Q(y).


Remark 10.17 (The CG scheme as a direct method). In Section 10.8 we solve the linear equation system resulting from Poisson's problem with several schemes. Using the CG scheme without preconditioning, we see that it terminates with an iteration number M < N = DoFs. For instance, for N = 25 DoFs, we have M = 3 iterations. For larger N it is of course interesting to terminate the CG scheme earlier than at machine precision, which - as previously stated - is the main interest in general.

Remark 10.18 (The CG scheme as an iterative scheme). As previously mentioned, in practice the CG scheme is (always) used as an iterative method rather than a direct method. Due to round-off errors, the search directions are never 100% orthogonal.

10.4.2 Convergence analysis of the CG scheme


We now turn our attention to the convergence analysis, which is a nontrivial task. The key is the following characterization of one iterate x^k ∈ x⁰ + K_k by

x^k = x⁰ + p_{k−1}(A)d⁰,

where p_{k−1} ∈ P_{k−1} is a polynomial in A:

p_{k−1}(A) = Σ_{i=0}^{k−1} α_i A^i.

The characterization as a minimization problem in Proposition 10.16 can be written as:

‖b − Ax^k‖_{A⁻¹} = min_{y∈x⁰+K_k} ‖b − Ay‖_{A⁻¹} = min_{q∈P_{k−1}} ‖b − Ax⁰ − Aq(A)d⁰‖_{A⁻¹}.

When we employ the ‖·‖_A norm, we obtain with d⁰ = b − Ax⁰ = A(x − x⁰)

‖b − Ax^k‖_{A⁻¹} = ‖x − x^k‖_A = min_{q∈P_{k−1}} ‖(x − x⁰) − q(A)A(x − x⁰)‖_A,

that is

‖x − x^k‖_A = min_{q∈P_{k−1}} ‖[I − q(A)A](x − x⁰)‖_A.

In the sense of the best approximation property (recall e.g. Section 8.7), we can formulate this task as: find p ∈ P_{k−1} such that

‖[I − p(A)A](x − x⁰)‖_A = min_{q∈P_{k−1}} ‖[I − q(A)A](x − x⁰)‖_A.   (218)

The characterization as best approximation is key in the convergence analysis of the CG scheme. Note that I − q(A)A ∈ P_k(A) takes the value 1 at the origin. We thus seek a polynomial q ∈ P_k with q(0) = 1 such that

‖x^k − x‖_A ≤ min_{q∈P_k, q(0)=1} ‖q(A)‖_A ‖x − x⁰‖_A.   (219)

The convergence of the CG method is related to the question whether we can construct a polynomial q ∈ P_k with q(0) = 1 whose A norm is as small as possible. First, we have:
Lemma 10.19 (Bounds for matrix polynomials). Let A ∈ ℝ^{n×n} be symmetric positive definite with the eigenvalues 0 < λ_1 ≤ · · · ≤ λ_n, and let p ∈ P_k be a polynomial with p(0) = 1. Then:

‖p(A)‖_A ≤ M,   M := min_{p∈P_k, p(0)=1} sup_{λ∈[λ_1,λ_n]} |p(λ)|.

Proof. See [114].


Employing the previous result and the error estimate (219), we can now derive a convergence result for the CG scheme.

Proposition 10.20 (Convergence of the CG scheme). Let A ∈ ℝ^{n×n} be symmetric positive definite, b ∈ ℝⁿ a right-hand side vector, and x⁰ ∈ ℝⁿ an initial guess. Then:

‖x^k − x‖_A ≤ 2 ( (1 − 1/√κ) / (1 + 1/√κ) )^k ‖x⁰ − x‖_A,   k ≥ 0,

with the spectral condition number κ = cond₂(A) of the matrix A.

Remark 10.21. We see immediately that a large condition number κ ≫ 1 yields

(1 − 1/√κ) / (1 + 1/√κ) → 1

and deteriorates significantly the convergence rate of the CG scheme. This is the key reason why preconditioners of the form P⁻¹ ≈ A⁻¹ are introduced that re-scale the system (see Section 10.5):

P⁻¹A x = P⁻¹b,   with P⁻¹A ≈ I.

Computations to substantiate these findings are provided in Section 10.8.

Proof (of Proposition 10.20). From the previous lemma and the estimate (219) it follows that

‖x^k − x‖_A ≤ M ‖x⁰ − x‖_A,   M = min_{q∈P_k, q(0)=1} max_{λ∈[λ_1,λ_n]} |q(λ)|.

We have to find a sharp estimate of the size of M. That is to say, we seek a polynomial q ∈ P_k which takes the value 1 at the origin, i.e., q(0) = 1, and which simultaneously has values near 0 in the maximum norm on the interval [λ_1, λ_n]. To this end, we work with Chebyshev polynomials (see e.g. [114] and references therein for the original references): we seek a best approximation p ∈ P_k of the zero function on [λ_1, λ_n] with the normalization p(0) = 1 (for this reason, the trivial solution p = 0 is not valid). A Chebyshev polynomial reads

T_k(x) = cos( k arccos(x) )

and has the property

2^{1−k} max_{x∈[−1,1]} |T_k(x)| = min_{α_0,...,α_{k−1}} max_{x∈[−1,1]} | x^k + Σ_{i=0}^{k−1} α_i x^i |.

We choose now the transformation

t ↦ x = (λ_n + λ_1 − 2t) / (λ_n − λ_1)

and obtain with

p(t) = T_k( (λ_n + λ_1 − 2t) / (λ_n − λ_1) ) · T_k( (λ_n + λ_1) / (λ_n − λ_1) )^{−1}

a polynomial of degree k which is minimal on [λ_1, λ_n] and normalized by p(0) = 1. It holds:

sup_{t∈[λ_1,λ_n]} |p(t)| = T_k( (λ_n + λ_1) / (λ_n − λ_1) )^{−1} = T_k( (κ + 1) / (κ − 1) )^{−1}   (220)

with the spectral condition number

κ := λ_n / λ_1.

We now employ the representation of the Chebyshev polynomials outside of [−1, 1]:

T_n(x) = ½ [ (x + √(x² − 1))ⁿ + (x − √(x² − 1))ⁿ ].

For x = (κ + 1)/(κ − 1), it holds

(κ + 1)/(κ − 1) + √( ((κ + 1)/(κ − 1))² − 1 ) = (κ + 2√κ + 1)/(κ − 1) = (√κ + 1)/(√κ − 1)

and therefore

(κ + 1)/(κ − 1) − √( ((κ + 1)/(κ − 1))² − 1 ) = (√κ − 1)/(√κ + 1).

Using these relationships, we can estimate (220):

T_k( (κ + 1)/(κ − 1) ) = ½ [ ((√κ + 1)/(√κ − 1))^k + ((√κ − 1)/(√κ + 1))^k ] ≥ ½ ((√κ + 1)/(√κ − 1))^k.

It follows that

sup_{t∈[λ_1,λ_n]} |p(t)| ≤ 2 ( (√κ − 1)/(√κ + 1) )^k = 2 ( (1 − 1/√κ)/(1 + 1/√κ) )^k.

This finishes the proof.

10.5 Preconditioning
This section is mostly copy and paste from [114, Section 7.8.1] and a translation from German into English.
The rate of convergence of iterative schemes depends on the condition number of the system matrix. For
instance, we recall that for second-order operators (such as Laplace) we have a dependence on the mesh size
O(h−2 ) = O(N ) (in 2D) (see Section 16 for the relation between h and N ). For the CG scheme it holds:

ρCG = (1 − 1/√κ) / (1 + 1/√κ) = 1 − 2/√κ + O(1/κ).

Preconditioning reformulates the original system with the goal of obtaining a moderate condition number for
the modified system. To this end, we seek a matrix P ∈ Rn×n such that

P −1 ≈ A−1 .

On the other hand,

P ≈ I,

such that the construction and application of P are not too costly. Obviously, these are two conflicting requirements.
When A is symmetric (for instance Poisson problem), P can be composed of two matrices:

P = KK T .

Then:

Ax = b   ⇔   K −1 A(K T )−1 · K T x = K −1 b,

which, with à := K −1 A(K T )−1 , x̃ := K T x, and b̃ := K −1 b, reads

Ãx̃ = b̃.
In the case of
cond2 (Ã)  cond2 (A)
−1
and if the application of K is cheap, then the consideration of a preconditioned system Ãx̃ = b̃ yields a
much faster solution of the iterative scheme. The condition P = KK T is necessary such that the matrix Ã
keeps its symmetry.
The preconditioned CG scheme (PCG) can be formulated as:
Algorithm 10.22. Let A ∈ Rn×n symmetric positive definite and P = KK T a symmetric preconditioner.
Choosing an initial guess x0 ∈ Rn yields:
1. r0 = b − Ax0
2. P p0 = r0
3. d0 = p0
4. For k = 0, 1, . . .
a) αk = (rk , dk ) / (Adk , dk )
b) xk+1 = xk + αk dk
c) rk+1 = rk − αk Adk
d) P pk+1 = rk+1
e) βk = (rk+1 , pk+1 ) / (rk , pk )
f) dk+1 = pk+1 + βk dk
At each step, we have as additional cost the application of the preconditioner P . We recall that P allows
the decomposition into K and K T even if these factors are not explicitly used. A sketch implementing this
algorithm is given below.
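The following Python/NumPy sketch implements Algorithm 10.22. The Jacobi choice P = D in the usage example and the dense matrices are simplifications made for these notes; any s.p.d. P with a cheap solve can be passed in via P_solve:

import numpy as np

def pcg(A, b, P_solve, x0, tol=1e-12, maxit=1000):
    """Preconditioned CG (Algorithm 10.22); P_solve(r) applies P^{-1} to r."""
    x = x0.copy()
    r = b - A @ x                 # 1. initial residual
    p = P_solve(r)                # 2. solve P p0 = r0
    d = p.copy()                  # 3. first search direction
    for k in range(maxit):
        Ad = A @ d
        alpha = (r @ d) / (Ad @ d)
        x = x + alpha * d
        r_new = r - alpha * Ad
        if np.linalg.norm(r_new) < tol:
            return x, k + 1
        p_new = P_solve(r_new)    # solve P p_{k+1} = r_{k+1}
        beta = (r_new @ p_new) / (r @ p)
        d = p_new + beta * d
        r, p = r_new, p_new
    return x, maxit

# usage example: Jacobi preconditioner P = D for the 1D Poisson matrix
n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
diag = np.diag(A)
x, its = pcg(A, b, lambda r: r / diag, np.zeros(n))
print("PCG iterations:", its)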
Typical preconditioners are:

• Jacobi preconditioning
We choose P = D, where D is the diagonal part of A. It holds

D = D^{1/2} (D^{1/2} )T ,

which means that for Dii > 0, this preconditioner is admissible. For the preconditioned matrix, it holds

à = D^{−1/2} A D^{−1/2}   ⇒   ãii = 1.

• SSOR preconditioning

The SSOR scheme is a symmetric variant of the SOR method (successive over-relaxation) and is based
on the decomposition:

P = (D + ωL) D^{−1} (D + ωR) = (D^{1/2} + ωLD^{−1/2} ) (D^{1/2} + ωD^{−1/2} R),

with K := D^{1/2} + ωLD^{−1/2} and K T := D^{1/2} + ωD^{−1/2} R.

For instance, for the Poisson problem, we can find an optimal ω (which is a non-trivial task) such that

cond2 (Ã) = √(cond2 (A))

can be shown. Here, the convergence improves significantly. The number of necessary steps to achieve a
given error reduction by a factor of ε improves to

tCG (ε) = log(ε) / log(1 − κ^{−1/2} ) ≈ −log(ε) √κ ,    t̃CG (ε) = log(ε) / log(1 − κ^{−1/4} ) ≈ −log(ε) κ^{1/4} .

Rather than having 100 steps, we only need 10 steps for instance, in case an optimal ω can be found. A sketch
of applying the SSOR preconditioner within the PCG iteration is given below.
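As an illustration, the following sketch (assumptions made for these notes: dense NumPy matrices, SciPy triangular solves, and ω simply fixed to 1.2 as in the tests of Section 10.8 rather than the optimal value) applies P = (D + ωL) D⁻¹ (D + ωR) via two triangular solves; the returned function can be plugged into the pcg sketch above:

import numpy as np
from scipy.linalg import solve_triangular

def ssor_solve(A, omega=1.2):
    """Return r -> P^{-1} r for P = (D + omega*L) D^{-1} (D + omega*R),
    where L/R are the strict lower/upper triangles of A."""
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    R = np.triu(A, k=1)
    M1 = D + omega * L            # lower triangular factor
    M2 = D + omega * R            # upper triangular factor
    d = np.diag(A)
    def apply(r):
        y = solve_triangular(M1, r, lower=True)      # (D + wL) y = r
        y = d * y                                    # multiply by D, undoing D^{-1}
        return solve_triangular(M2, y, lower=False)  # (D + wR) p = D y
    return apply

# usage with the pcg sketch above:
# x, its = pcg(A, b, ssor_solve(A, omega=1.2), np.zeros(n))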

10.6 GMRES - generalized minimal residual method


The iterative GMRES algorithm [115, 116] is suited for solving nonsymmetric linear systems. Let A ∈ Rn×n
be a regular matrix, but not necessarily symmetric. We demonstrate first an option, which turns out to be
not feasible when n is large. A symmetric version of the problem

Ax = b

can be achieved by multiplication with AT :

AT Ax = AT b.

The matrix B = AT A is positive definite (A is regular) since

(Bx, x)2 = (AT Ax, x)2 = (Ax, Ax)2 = kAxk22 > 0    for x 6= 0.

In principle, we could now apply the CG scheme to AT A. Instead of one matrix-vector multiplication, we
would need two such multiplications per step. However, using AT A, the convergence rate will deteriorate since

κ(B) = cond2 (AT A) = cond2 (A)2 .

For this reason, the CG scheme is not really an option.

10.6.1 Pure GMRES


Let us now prepare the ingredients for the GMRES scheme. The main part is Arnoldi's method [7], in which
the (modified) Gram-Schmidt method (see Numerik 1, e.g., [114]; for the modified Gram-Schmidt see e.g., [115])
is used for orthonormalization of the basis {v1 , . . . , vk } of the Krylov space Kk = span{v1 , Av1 , . . . , Ak−1 v1 }.

Algorithm 10.23 (Arnoldi). Choose some initial vector v1 with kv1 k = 1. Iterate for j = 1, 2, . . . :
1. hi,j = (Avj , vi )2 , i = 1, 2, . . . , j
2. v̂j+1 = Avj − Σ_{i=1}^{j} hi,j vi
3. hj+1,j = kv̂j+1 k
4. vj+1 = v̂j+1 /hj+1,j
We then define
Vk = {v1 , . . . , vk } ∈ Rn×k
and Hk = VkT AVk ∈ Rk×k is the upper Hessenberg matrix, with entries hij just computed by Arnoldi’s
algorithm.
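A compact NumPy sketch of Arnoldi's method (written for these notes; it returns V_{k+1} and the (k+1)×k Hessenberg matrix H̄_k used below, and does not handle a breakdown h_{j+1,j} = 0):

import numpy as np

def arnoldi(A, v1, k):
    """k steps of Arnoldi (Algorithm 10.23): A V_k = V_{k+1} Hbar_k."""
    n = len(v1)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):          # modified Gram-Schmidt orthogonalization
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H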
Recall the task: Solve
Ax = b, A ∈ Rn×n
Using a Galerkin method with an l2 -orthonormal basis (produced by Arnoldi’s algorithm), we seek xk of the
form
xk = x0 + zk
where x0 is some initial guess and zk comes from the Krylov space Kk = span{r0 , Ar0 , . . . , Ak−1 r0 } with the
usual initial residual r0 = b − Ax0 .
In order to prepare for GMRES, we assume that after k steps of Arnoldi’s algorithm, we have
• an orthonormal system Vk+1 = {v1 , . . . , vk+1 }
• a matrix H̄k ∈ R(k+1)×k with the non-zero elements hij .
We have
AVk = Vk+1 H̄k . (221)
With this, we are interested in solving the following least squares problem:

min_{z∈Kk} kb − A[x0 + z]k = min_{z∈Kk} kr0 − Azk.    (222)

Introducing the variable z = Vk y, we obtain the following minimization problem:

min J(y) = min_{y∈Rk} kβv1 − AVk yk,    β = kr0 k.

We notice that (using (221))

b − Ax = b − A(x0 + Vk y) = r0 − AVk y = βv1 − Vk+1 H̄k y = Vk+1 (βe1 − H̄k y)

This brings us to

J(y) = kVk+1 [βe1 − H̄k y]k

with the first unit vector e1 ∈ Rk+1 . Since Vk+1 is orthonormal we have

J(y) = kβe1 − H̄k yk, y ∈ Rk .

The solution to (222) is given by

xk = x0 + Vk yk ,

where yk minimizes J(y). This minimizer yk is inexpensive to compute since it requires solving a (k + 1) × k
least-squares problem, where k is typically small.
Algorithm 10.24. GMRES:
1. Choose the initial guess x0 . Compute r0 = b − Ax0 and v1 = r0 /kr0 k.
2. Iterate: For j = 1, 2, . . . , k, . . . until some stopping criterion:

a) hi,j = (Avj , vi )2 , i = 1, 2, . . . , j
b) v̂j+1 = Avj − Σ_{i=1}^{j} hi,j vi
c) hj+1,j = kv̂j+1 k. If hj+1,j = 0 set k := j and go to Step 3 (Hessenberg matrix)
d) vj+1 = v̂j+1 /hj+1,j
3. Define the (k + 1) × k Hessenberg matrix H̄k = {hij }1≤i≤k+1,1≤j≤k ;
4. Compute the minimizer yk of J(y);
5. Set as approximate solution xk = x0 + Vk yk .
Unfortunately, for increasing k the storage of the basis vectors becomes an issue and grows like k. Moreover, the
arithmetic cost increases as k 2 n/2. Consequently, there exists a restarted version, say after m steps, denoted
by GMRES(m):
Algorithm 10.25. GMRES(m):
1. Choose the initial guess x0 . Compute r0 = b − Ax0 and v1 = r0 /kr0 k.
2. Iterate: For j = 1, 2, . . . , m do:
   a) hi,j = (Avj , vi )2 , i = 1, 2, . . . , j
   b) v̂j+1 = Avj − Σ_{i=1}^{j} hi,j vi
   c) hj+1,j = kv̂j+1 k
   d) vj+1 = v̂j+1 /hj+1,j
3. Set as approximate solution xm = x0 + Vm ym where ym minimizes kβe1 − H̄m yk for y ∈ Rm .
4. Restart:
   • Compute rm = b − Axm ; if satisfied, stop
   • else set x0 := xm , v1 := rm /krm k and go to step 2.
Remark 10.26. Of course in both versions, still the minimization problem J(y) must be solved, which can be
done by the QR method. For more details, we refer the reader to [115].
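A minimal GMRES sketch building on the arnoldi function from above (illustration for these notes only: the small least-squares problem is solved with numpy's lstsq instead of the incremental QR update mentioned in Remark 10.26, and no restart logic is included; the test matrix is an arbitrary nonsymmetric perturbation chosen here for demonstration):

import numpy as np

def gmres(A, b, x0, k):
    """k steps of GMRES (Algorithm 10.24), least squares via lstsq."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V, H = arnoldi(A, r0, k)       # V in R^{n x (k+1)}, H in R^{(k+1) x k}
    e1 = np.zeros(k + 1)
    e1[0] = beta                   # right-hand side beta * e_1
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return x0 + V[:, :k] @ y

# usage on a nonsymmetric system:
n = 100
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1) \
    + 0.3 * np.triu(np.random.rand(n, n), 1) / n
b = np.ones(n)
x = gmres(A, b, np.zeros(n), 40)
print("residual:", np.linalg.norm(b - A @ x))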

10.6.2 Left-preconditioned GMRES


The left-preconditioned scheme reads
P −1 Ax = P −1 b.
Then, we have
Algorithm 10.27. Left-preconditioned GMRES:
1. Compute r0 = P −1 (b − Ax0 ), β = kr0 k2 , and v1 = r0 /β
2. For j = 1, . . . , k do
a) Compute w := P −1 Avj
b) For i = 1, . . . , j do
i. hi,j := (w, vi )
ii. w := w − hi,j vi
c) Compute hj+1,j = kwk2 and vj+1 = w/hj+1,j
3. Define Vk := {v1 , . . . , vk } and H̄k = {hi,j }1≤i≤j+1,1≤j≤k
4. Compute yk = argminy kβe1 − H̄k yk2
5. Set xk = x0 + Vk yk
6. If satisfied stop (solution found), else set x0 := xk and go to Step 1.

10.6.3 Right-preconditioned GMRES
The right-preconditioned scheme reads

AP −1 u = b, u = P x.

Then, we have
Algorithm 10.28. Right-preconditioned GMRES:

1. Compute r0 = b − Ax0 , β = kr0 k2 , and v1 = r0 /β


2. For j = 1, . . . , k do
a) Compute w := AP −1 vj
b) For i = 1, . . . , j do
i. hi,j := (w, vi )
ii. w := w − hi,j vi
c) Compute hj+1,j = kwk2 and vj+1 = w/hj+1,j
3. Define Vk := {v1 , . . . , vk } and H̄k = {hi,j }1≤i≤j+1,1≤j≤k

4. Compute yk = argminy kβe1 − H̄k yk2


5. Set xk = x0 + P −1 Vk yk
6. If satisfied stop (solution found), else set x0 := xk and go to Step 1.
The difference to the left-preconditioned version is that the residual norm is taken with respect to the initial
system b − Axk (without P −1 ). The reason is that the algorithm obtains the residual implicitly b − Axk =
b − AP −1 uk . Moreover, the right-preconditioned version can be extended to FGMRES (where ‘F’ stands for
‘flexible’) in which the preconditioner P −1 can be changed at each iteration step [115].

10.7 Geometric multigrid methods
In this section, we follow closely [13][Chapter 10]. Excellent descriptions including historical notes can be
found in Braess [24] and the famous book from Hackbusch [67]. Also [115] is recommended.

10.7.1 Motivation
For elliptic problems, the linear systems are large, sparse and symmetric positive definite (s.p.d.). These
properties were already what we needed to use the CG method for the iterative solution. However, the convergence
properties depend on the condition number. We recall again that the usual stiffness matrix behaves as (see
Section 10.1)

κ(A) = λmax /λmin = O(h−2 ).
Example 10.29. Referring forward to Section 10.7.3, we obtain for instance for h = 0.5, 0.25, 0.125:

κh=0.5 (A) = 8
κh=0.25 (A) = 54
κh=0.125 (A) = 246

reflecting that the condition number grows with O(h−2 ).


The work amount in terms of arithmetic operations for often-used methods is listed in Table 1. Therein, we
already see that the CG method may perform extremely well; in particular in 3d.

Table 1: Operations for solving Az = b with A ∈ Rn×n being large, sparse, and s.p.d.
Scheme d=2 d=3

Gauss (direct) n3 n3
Banded-Gauss (direct) n2 n7/3
Jacobi (iterative) n2 n5/3
Gauss-Seidel (iterative) n2 n5/3
CG n3/2 n4/3
SOR with opt. ω n3/2 n4/3
Multigrid n n

Multigrid methods can solve linear systems with optimal complexity; namely n. Less than n is not possible
since each entry has to be looked up at least once.

Algorithm 10.30 (Basic steps of multigrid). The basic procedure of geometric multigrid is:
• Solve a given PDE on a hierarchy of meshes (grids);
• Use on each mesh (grid) a (simple) iterative solver, e.g., Jacobi or Gauss-Seidel, the so-called smoother.
The smoother will damp quickly high oscillatory parts of errors on each respective mesh (grid) level;

• Transfer (restriction) solutions from fine grids to coarser grids and collect finally (prolongation) all
parts to a final solution on the finest grid.

10.7.2 Smoothing property of Richardson iteration
In this first section, we explain the motivation why to solve on a hierarchy of meshes using a simple iterative
solver. Let
Ax = b
be given. Richardson iteration (Numerik 1 [114]) then reads:

xk+1 = xk + ω(b − Axk )

with a relaxation parameter ω. We know that Richardson converges when 0 < ω < 2/λmax (A) (see Section
10.2.1). Since A is s.p.d. the spectrum of eigenvalues can be ordered as

σ(A) = {λmin (A) = λ1 , λ2 , . . . , λn = λmax (A)}

with
0 < λ1 ≤ λ2 ≤ . . . ≤ λn .
We can now write the iteration error ek+1 = x − xk+1 . Specifically:

ek+1 = x − xk+1
     = x − xk − ω(b − Axk )    (Richardson iteration function)
     = (I − ωA)(x − xk )
     = (I − ωA)ek
     = M ek .

The matrix M = (I − ωA) is the so-called iteration matrix. In Numerik 1, we have analyzed under which
conditions on M we obtain convergence. Since A is s.p.d., we can perform an eigenvalue decomposition.
Let (λi , zi ) be an eigenpair of A, where λi is the eigenvalue and zi the eigenvector. We recall that Azi = λi zi is
the eigenvalue equation. We apply this idea to M ek :

M zi = (I − ωA)zi = (1 − ωλi )zi .

We recall from linear algebra that the zi , i = 1, . . . , n, form a basis of Rn . Consequently, any vector ek can be
written as a linear combination of the basis vectors. Thus:
M ek = M Σ_{i=1}^{n} ai zi = Σ_{i=1}^{n} ai (1 − ωλi )zi ,

where ai ∈ R are the coefficients. Now we have a relation between the error e and the reduction factor
η := (1 − ωλi ). It is clear that the error depends on the size of (1 − ωλi ). For the special choice ω = 1/λn < 2/λn ,
we observe the reduction factor:

1 − ωλi = 1 − λi /λn → 0    when i is large, i.e., λi ≈ λn ,
1 − ωλi = 1 − λi /λn → 1    when i is small, i.e., λi  λn .
We remember from Numerik 1 that the first situation is much better, because we have a fast error reduction.
What does this observation mean? First, error components belonging to large eigenvalues λi ≈ λn are damped
(smoothed) quickly in the iteration. Second, components belonging to smaller eigenvalues are not damped quickly.
However, smaller eigenvalues become larger on coarser grids (smaller n). And here we are: we use a hierarchy
of meshes in order to damp all eigenvalue components efficiently. A small numerical illustration follows below.
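The following small NumPy experiment (written for these notes) makes the smoothing property visible: it applies three Richardson steps with ω = 1/λmax to the 1D Poisson matrix of Section 10.7.3 and prints how strongly a low-frequency and a high-frequency error component are reduced:

import numpy as np

n = 63
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson, tridiag(-1,2,-1)
lam = np.linalg.eigvalsh(A)
omega = 1.0 / lam[-1]                                # omega = 1/lambda_max

j = np.arange(1, n + 1)
z_low = np.sin(1 * np.pi * j / (n + 1))              # eigenvector i = 1 (low frequency)
z_high = np.sin(n * np.pi * j / (n + 1))             # eigenvector i = n (high frequency)

def richardson_reduction(e, steps):
    # error propagation e^{k+1} = (I - omega*A) e^k  (b = 0, exact solution x = 0)
    e0 = np.linalg.norm(e)
    for _ in range(steps):
        e = e - omega * (A @ e)
    return np.linalg.norm(e) / e0

print("low-frequency reduction: ", richardson_reduction(z_low, 3))    # close to 1
print("high-frequency reduction:", richardson_reduction(z_high, 3))   # close to 0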

10.7.2.1 Practical procedure when λn not known   In practice λn is often not known. In this case, an upper
bound can be obtained with Gershgorin circles (see Numerik 2; e.g., [117]), such that we have λn ≤ γ. We
then set ω = 1/γ. Clearly, we then still have convergence since

1/γ ≤ 1/λn < 2/λn .

Then,

η = (1 − ωλi ) = (1 − λi /γ).    (223)
Slowest convergence is obtained for i = 1 with λ1 /γ  1 and for i = n we have fast convergence with λn /γ ≈ 1.

10.7.3 Eigenvalues and eigenvectors for 1D Poisson and mesh independence


For Poisson in 1D,

−u00 (x) = f    in Ω = (0, 1),
u(0) = u(1) = 0.

This problem is discretized on the mesh Th = {xi }i=0,...,n+1 . Thus, we have n + 2 DoFs including the boundary
DoFs x0 and xn+1 and n inner degrees of freedom x1 , . . . , xn . The mesh size is h = 1/(n + 1). We have our
well-known matrix (by having h2 on the right hand side and excluding the boundary points):

A = tridiag(−1, 2, −1) ∈ Rn×n ,    b = h2 ( f (x1 ), . . . , f (xn ) )T ∈ Rn .
We recall the eigenvalue equation

Aw = λw.

The eigenvalues are

λi = 4 sin2 (θi /2),    θi = iπ/(n + 1) = iπh,    i = 1, . . . , n,

and the corresponding eigenvectors:

wi = ( sin(θi ), sin(2θi ), . . . , sin(nθi ) )T ∈ Rn ,    i = 1, . . . , n.

We now use (223) and estimate with Gershgorin circles that γ = 4 (clear since |x − 2| ≤ | − 1| + | − 1| = 2).
Then,

η = (1 − λi /4) = 1 − sin2 (θi /2) = 1 − sin2 (iπh/2).    (224)

• For i = 1 (smallest eigenvalue), we obtain

1 − sin2 (πh/2) ≈ 1 − (πh/2)2 = 1 − O(h2 ).

It is clear that for h → 0, we obtain

1 − O(h2 ) ≈ 1.

In particular, this result is h-dependent: the smaller h, the worse.

• For i = n (largest eigenvalue), we have

1 − sin2 (nπh/2) = cos2 ( nπ/(2(n + 1)) ) ≤ 1/2,

which holds for any n = 1, 2, . . . and therefore the bound 1/2 is independent of n and therefore of h.

• For i > n/2:

1 − sin2 (iπh/2) = cos2 ( iπ/(2(n + 1)) ) ≤ 1/2,

independently of n (thus h).
Low frequency eigenvectors are obtained for small i. High frequency eigenvectors are obtained for i ≈ n. Whether an
eigenvector is of low frequency or not depends on the mesh size h (and consequently on n). This means:
• High frequency errors with i > n/2 are damped quickly with Richardson;
• Low frequency errors with i ≤ n/2 are damped slowly with Richardson.

Consequently, if we transfer low frequency errors to coarser meshes, they become high frequency and we can
damp them again with Richardson. The realization consequently requires a hierarchy of meshes.

10.7.4 Numerical example of eigenvalues and eigenvectors for 1D Poisson


We demonstrate our previous descriptions with a numerical example on three different (hierarchical) meshes.
The starting point is Section 6.7 with the matrix A = (1/h2 )(. . .). We use octave and specifically the function

[V,lambda] = eig(A)

We use on Ω = (0, 1) three mesh sizes h = 0.5, 0.25, 0.125 corresponding to n = 1, 3, 7 (excluding the boundary
values; thus each time x0 , . . . , xn+1 , i.e., n + 2 nodal points), respectively8 . Starting on the finest mesh, we
obtain:

A2 =

1 0 0 0 0 0 0 0 0
-64 128 -64 0 0 0 0 0 0
0 -64 128 -64 0 0 0 0 0
0 0 -64 128 -64 0 0 0 0
0 0 0 -64 128 -64 0 0 0
0 0 0 0 -64 128 -64 0 0
0 0 0 0 0 -64 128 -64 0
0 0 0 0 0 0 -64 128 -64
0 0 0 0 0 0 0 0 1

V =

0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.52743 0.00000


0.19134 -0.35355 0.46194 0.50000 -0.19134 -0.35355 0.46194 0.48112 0.07816
-0.35355 0.50000 -0.35355 -0.00000 -0.35355 -0.50000 0.35355 0.42729 0.15511
0.46194 -0.35355 -0.19134 -0.50000 -0.46194 -0.35355 -0.19134 0.36679 0.22962
-0.50000 -0.00000 0.50000 -0.00000 -0.50000 0.00000 -0.50000 0.30055 0.30055
0.46194 0.35355 -0.19134 0.50000 -0.46194 0.35355 -0.19134 0.22962 0.36679
-0.35355 -0.50000 -0.35355 0.00000 -0.35355 0.50000 0.35355 0.15511 0.42729
0.19134 0.35355 0.46194 -0.50000 -0.19134 0.35355 0.46194 0.07816 0.48112
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.52743
8 For the correspondence of the number of nodal points n and the mesh size h, we refer the reader to Section 16.1.1.


lambda =

246.2566 0 0 0 0 0 0 0 0
0 218.5097 0 0 0 0 0 0 0
0 0 176.9835 0 0 0 0 0 0
0 0 0 128.0000 0 0 0 0 0
0 0 0 0 9.7434 0 0 0 0
0 0 0 0 0 37.4903 0 0 0
0 0 0 0 0 0 79.0165 0 0
0 0 0 0 0 0 0 1.0000 0
0 0 0 0 0 0 0 0 1.0000

We see that the lower eigenvalues λi = 79, 37, 9, 1, 1 for i ≤ n/2 are relatively small in comparison to λn = 246.
Now we coarsen the mesh using h = 0.25 and obtain:
A1 =

1 0 0 0 0
-16 32 -16 0 0
0 -16 32 -16 0
0 0 -16 32 -16
0 0 0 0 1

V =

0.00000 0.00000 0.00000 0.69531 0.00000


-0.50000 0.70711 0.50000 0.56348 0.20461
0.70711 0.00000 0.70711 0.39644 0.39644
-0.50000 -0.70711 0.50000 0.20461 0.56348
0.00000 0.00000 0.00000 0.00000 0.69531

lambda =

54.6274 0 0 0 0
0 32.0000 0 0 0
0 0 9.3726 0 0
0 0 0 1.0000 0
0 0 0 0 1.0000
We see that the lower eigenvalues λi = 1, 1 for i < n/2 are relatively small in comparison to λn = 54. Therefore,
a relatively small eigenvalue λ5 = 79 in A2 corresponds to the biggest eigenvalue λ5 = 54.6 on the coarser grid. Now
we coarsen the mesh again using h = 0.5 and obtain:
A0 =

1 0 0
-4 8 -4
0 0 1

V =

0.00000 0.86824 0.00000

1.00000 0.49614 0.49614
0.00000 0.00000 0.86824

lambda =

8 0 0
0 1 0
0 0 1

And again, a small eigenvalue in A1 became the biggest eigenvalue λ3 = 8 on the coarsest mesh. Of course, this
behavior of the eigenvalues and eigenvectors is due to the structure of the linear equation system and therefore
due to the problem statement; here Poisson.

10.7.5 Variational geometric multigrid


Usually the systematic procedure for multigrid is first explained for a two-grid method. This idea is then
extended to multiple meshes.
Let Vh = {ϕ1 , . . . , ϕn }, dim(Vh ) = n, and we consider the well-known abstract problem: Find uh ∈ Vh such
that
a(uh , ϕh ) = l(ϕh ) ∀ϕh ∈ Vh .
We know for vh ∈ Vh :

vh = Σ_{i=1}^{n} xi ϕi

with x = (x1 , . . . , xn ) ∈ Rn being the coefficient vector. Consequently, we have the correspondence:

x ∈ Rn  ↔  vh ∈ Vh    (225)

with the relation dim(Vh ) = n. We now consider hierarchic refinement and ask that the finite element spaces
are nested:

V2h ⊂ Vh

Figure 40: Hierarchical refinement and nested FEM spaces V2h ⊂ Vh .

Remark 10.31. Such hierarchical refinements can also be used for adjoint-based mesh refinement in Chapter 9.

Formulation 10.32 (Two-grid method). A two-grid method can be formulated as:

1. Let ukh ∈ Vh be given. Compute u^{k,1}_h ∈ Vh (on the fine grid Th ) by iterating ν times Richardson (usually
   only 1 ≤ ν ≤ 3). This means (using (225)):

   Set x^{k,0}_h = x^k_h ,
   Iterate x^{k,i/ν}_h = x^{k,(i−1)/ν}_h + ω(bh − Ah x^{k,(i−1)/ν}_h ),    i = 1, . . . , ν.

   For i = ν, we have obtained x^{k,1}_h ∈ R^{dim(Vh )} .

2. Coarse grid correction: we now solve the problem on the coarser grid T2h with V2h . Find w ∈ V2h such
   that

   a(u^{k,1}_h |T2h + w, v) = l(v)    ∀v ∈ V2h ,    (226)

   where u^{k,1}_h |T2h is now restricted to the coarser mesh.

3. Set

   u^{k+1}_h = u^{k,1}_h + w|Th ,

   where w is now prolongated back to the fine grid.


We explain the second step now in more detail. We define the bases

Vh = {ϕ^h_1 , . . . , ϕ^h_n },    dim(Vh ) = n,
V2h = {ϕ^{2h}_1 , . . . , ϕ^{2h}_m },    dim(V2h ) = m.

Clearly, m < n. Since V2h ⊂ Vh , there exist coefficients rij to express the coarse grid basis functions in the fine
grid basis:

ϕ^{2h}_i = Σ_{j=1}^{n} rij ϕ^h_j ,    1 ≤ i ≤ m.

Figure 41: Illustration of ϕ^{2h}_i = Σ_{j=1}^{n} rij ϕ^h_j .

The question is how to compute the restriction operator (matrix) R = (rij )? Using the above relation, this
can be done explicitly. For instance, observing Figure 41, we see

ϕ^{2h}_1 = r11 ϕ^h_1 + r12 ϕ^h_2 + . . . + r15 ϕ^h_5 .

In more detail, ϕ^{2h}_1 can be constructed as:

ϕ^{2h}_1 = 1 · ϕ^h_1 + 0.5 · ϕ^h_2 ,

yielding r11 = 1, r12 = 0.5, r1j = 0 for j = 3, 4, 5. Extending to ϕ^{2h}_2 and ϕ^{2h}_3 , we obtain:

            ( 1   0.5   0    0    0 )
R^h_{2h} =  ( 0   0.5   1   0.5   0 )   ∈ R^{3×5} .
            ( 0    0    0   0.5   1 )

Remark 10.33. As usual, in the actual implementation, care has to be taken how the boundary conditions
are treated in the FEM code and possibly the indices corresponding to boundary values may be modified.

We now insert this basis representation into (226):

a(u^{k,1}_h + w, ϕ^{2h} ) = l(ϕ^{2h} )    ∀ϕ^{2h} ∈ V2h
⇔ a(w, ϕ^{2h} ) = l(ϕ^{2h} ) − a(u^{k,1}_h , ϕ^{2h} )    ∀ϕ^{2h} ∈ V2h
⇔ Σ_{j=1}^{m} yj a(ϕ^{2h}_j , ϕ^{2h}_i ) = l(ϕ^{2h}_i ) − a( Σ_{s=1}^{n} x^{k,1}_s ϕ^h_s , ϕ^{2h}_i ),    i = 1, . . . , m
⇔ Σ_{j=1}^{m} yj a(ϕ^{2h}_j , ϕ^{2h}_i ) = Σ_{s=1}^{n} ris [ l(ϕ^h_s ) − a( Σ_{t=1}^{n} x^{k,1}_t ϕ^h_t , ϕ^h_s ) ],    i = 1, . . . , m
⇔ A2h y2h = R^h_{2h} (bh − Ah x^{k,1}_h )

with the dimensions

A2h ∈ Rm×m ,  y2h ∈ Rm ,  R^h_{2h} ∈ Rm×n ,  bh ∈ Rn ,  Ah ∈ Rn×n ,  x^{k,1}_h ∈ Rn .

Specifically, the rectangular restriction matrix is determined by (R^h_{2h} )ij = rij .
Algorithm 10.34 (Two-grid method (TGM) - algebraic version). We denote the linear equation systems by
Ah xh = bh and A2h x2h = b2h with Ah ∈ Rn×n and A2h ∈ Rm×m with m < n. Let the current iterate x^k_h ∈ Rn
be given. The following method computes the next iterate x^{k+1}_h ∈ Rn :

T GM (x^k_h )
{
  x^{k,0}_h = x^k_h
  for (i = 1, . . . , ν)
    x^{k,i/ν}_h = x^{k,(i−1)/ν}_h + ω(bh − Ah x^{k,(i−1)/ν}_h )    Pre-smoothing
  dh = bh − Ah x^{k,1}_h          Defect on fine grid
  d2h = R^h_{2h} dh               Restriction to coarse grid
  x2h = A2h^{−1} d2h              Coarse grid solve
  yh = (R^h_{2h} )T x2h           Prolongation to fine grid
  x^{k,2}_h = x^{k,1}_h + yh      Coarse grid correction applied to fine grid solution
  return x^{k,2}_h
}

We see in this algorithm that the two-grid method can be extended to a multigrid method by applying the
idea recursively to the coarse grid solve x2h = A2h^{−1} d2h .

Remark 10.35 (Solve on the coarsest grid). If A2h is sufficiently small (i.e., T2h sufficiently coarse), then
x2h = A2h^{−1} d2h is often solved in practice with a direct solver, e.g., LU.
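A compact sketch of Algorithm 10.34 (assumptions made for these notes: dense matrices, Richardson smoothing with ω = 1/λmax, the Galerkin coarse operator A2h = R Ah Rᵀ suggested by the variational derivation above, and a direct solve on the coarse level as in Remark 10.35):

import numpy as np

def two_grid(Ah, bh, x, R, nu=2):
    """One iteration of the two-grid method (Algorithm 10.34)."""
    omega = 1.0 / np.linalg.eigvalsh(Ah)[-1]
    for _ in range(nu):                      # pre-smoothing (Richardson)
        x = x + omega * (bh - Ah @ x)
    d2h = R @ (bh - Ah @ x)                  # restrict the defect
    A2h = R @ Ah @ R.T                       # Galerkin coarse grid operator
    x2h = np.linalg.solve(A2h, d2h)          # coarse grid solve (direct)
    return x + R.T @ x2h                     # prolongate and correct

# usage with the 1D Poisson matrix and the restriction sketched above:
m = 16
n = 2 * m - 1
Ah = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
bh = np.ones(n)
R = restriction_1d(m)
x = np.zeros(n)
for k in range(20):
    x = two_grid(Ah, bh, x, R)
print("residual:", np.linalg.norm(bh - Ah @ x))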

Algorithm 10.36 (Geometric multigrid (GMG) method). Let j ∈ N be the mesh level. For instance, level j
corresponds to h, level j − 1 corresponds to 2h, and so forth. Let ν1 , ν2 ∈ N be given parameters for the number
of Richardson iterations. Moreover, let γ ∈ N be the cycle form parameter (often γ = 1 or γ = 2). The following
method computes the new iterate x^{k+1}_j on mesh level j:


GM G(j, x^k_j , bj )
{
  if (j == 0)
  {
    x^k_j = Aj^{−1} bj            Solve on coarsest mesh, with a direct solver for instance
    return x^k_j
  }
  x^{k,0}_j = x^k_j
  for (i = 1, . . . , ν1 )
    x^{k,i/ν1}_j = x^{k,(i−1)/ν1}_j + ω(bj − Aj x^{k,(i−1)/ν1}_j )    Pre-smoothing
  dj = bj − Aj x^{k,1}_j          Defect on fine grid
  dj−1 = R^j_{j−1} dj             Restriction to coarse grid
  xj−1 = 0                        Initial value for coarse grid solution
  // Coarse grid solution now with multigrid
  if (j == 1)
    γ̄ = 1
  else
    γ̄ = γ
  for (i = 1, . . . , γ̄)
    xj−1 = GM G(j − 1, xj−1 , dj−1 )
  yj = (R^j_{j−1} )T xj−1         Prolongation to fine grid
  x^{k,2}_j = x^{k,1}_j + yj      Coarse grid correction applied to fine grid solution
  for (i = 1, . . . , ν2 )
    x^{k,2+i/ν2}_j = x^{k,2+(i−1)/ν2}_j + ω(bj − Aj x^{k,2+(i−1)/ν2}_j )    Post-smoothing
  return x^{k,3}_j
}
The choice of γ gives the GMG method specific names. For instance, γ = 1 results in a so-called V cycle.
The choice γ = 2 is known as W cycle. Here, we do one V cycle from j to 0 and back to j − 1 and do a second
V cycle from j − 1 to 0 back to j.

Schematic (227), the V cycle: starting on level j, perform ν1 pre-smoothing steps and restrict (R) level by
level down to level 0; solve there directly with A0^{−1} ; then prolongate (P) back up level by level with ν2
post-smoothing steps on each level.


Schematic (228), the W cycle: as in the V cycle, with ν1 pre-smoothing steps and restriction on the way down
and ν2 post-smoothing steps and prolongation on the way up, but on each level the coarse grid correction is
invoked γ = 2 times, i.e., the coarser levels (including the direct solve with A0^{−1} ) are visited twice before
ascending.

10.7.6 Usage of MG in practice and MG as a preconditioner


In practice, multigrid solvers can be used as solvers for Ax = b themselves. Or, which happens more often,
multigrid methods are used as a preconditioner (Section 10.5) for other iterative methods, e.g., CG or GMRES.
The advantages are twofold:
• There is no need to solve with MG up to a tolerance;
• In a given programming code with some good basic solver, only the preconditioner needs to be replaced.

The performance of MG as a preconditioner is shown in Section 10.8, in which CG is used as linear solver,
preconditioned with MG and with SOR (successive over-relaxation) as smoother.

10.7.7 Convergence analysis of the W cycle


For full proofs, we refer to [13, 24, 67]. Still following [13], we concentrate on the W cycle because the proofs
are simpler. The proofs for the V cycle are more involved, requiring sufficiently many smoothing steps and H 2 regularity.
Let j be the mesh index and k the iteration index. The goal is to derive an estimate with the multigrid
convergence rate ρj on j + 1 meshes:

ku^{j,k+1} − uj k ≤ ρj ku^{j,k} − uj k,

where uj ∈ Vj is the discrete solution on mesh level j and u^{j,k+1} is the (k + 1)-th iterate. Specifically, the
multigrid method is convergent when ρj < 1 and ρj is independent of the mesh size h (see Section
10.7.3 for 1D Poisson and Richardson as a smoother). Then, the computational cost does not increase while
refining the mesh. The factor ρj is the convergence rate. Of course, the smaller ρj  1, the faster the convergence.
Error recursion for the smoothing step
We have:

e^{k,1} = xh − x^{k,1}_h = (Ih − ωAh )^{ν1} (xh − x^k_h ) = S_h^{ν1} e^k .

Coarse grid correction step
Then:

e^{k+1} = xh − x^{k+1}_h
        = xh − ( x^{k,1}_h + (R^h_{2h} )T A2h^{−1} R^h_{2h} (bh − Ah x^{k,1}_h ) )
        = e^{k,1} − (R^h_{2h} )T A2h^{−1} R^h_{2h} Ah e^{k,1}
        = (Ih − (R^h_{2h} )T A2h^{−1} R^h_{2h} Ah ) e^{k,1} .

With this, we obtain for the iteration matrix:

e^{k+1} = (Ih − (R^h_{2h} )T A2h^{−1} R^h_{2h} Ah ) S_h^{ν1} e^k = (Ah^{−1} − (R^h_{2h} )T A2h^{−1} R^h_{2h} ) Ah S_h^{ν1} e^k .

This allows us to split the error into two parts:

ke^{k+1} k ≤ kAh^{−1} − (R^h_{2h} )T A2h^{−1} R^h_{2h} k · kAh S_h^{ν1} k · ke^k k,

where the first factor is the approximation part and the second the smoothing part.
The appropriate norm is the Euclidean norm, as it will turn out.

10.7.7.1 Norms   We recall that A is s.p.d. (only necessary for s = 1 in the following). Thus we define (similar
to the CG method):

kxks := √( (x, A^s x) ),    s = 0, 1, 2.

In detail:

s = 0 :  kxk0 = √((x, x))      Euclidean norm
s = 1 :  kxk1 = √((x, Ax))     Energy norm
s = 2 :  kxk2 = √((Ax, Ax))    Defect norm

These definitions can be further extended to s ∈ R by using a diagonalization of A.
The Sobolev norms k · kL2 and k · kH 1 can be related to k · k0 and k · k1 , respectively. We have:

kxk21 = (x, Ax) = a(uh , uh ).
As in the Lax-Milgram lemma, cf. Section 8.10, we obtain:
α kuh k2H 1 ≤ kxk21 = a(uh , uh ) ≤ γ kuh k2H 1 .
As seen in the stability estimate of the proof of the Lax-Milgram lemma, these estimates are independent of
the basis of Vh and only use approximation properties.
Lemma 10.37. Let {ϕ1 , . . . , ϕN } be the Lagrange basis of Vh on a family of uniform and shape regular
triangulations. Let x be the coefficient vector of vh ∈ Vh . Then, there exist c1 , c2 > 0 independent of h, but
dependent on the overall mesh Th , such that

c1 h^{n/2} kxk0 ≤ kvh kL2 ≤ c2 h^{n/2} kxk0 ,

where n is the space dimension.

10.7.7.2 Approximation properties


Lemma 10.38 (Approximation property). Let Ah xh = bh be the discrete problem on the finest grid. Moreover,
let x^{k,1}_h denote the iterate after smoothing and x^{k,2}_h the iterate after the coarse grid correction. If the
mesh is uniform and shape-regular and the variational problem is H 2 regular, we have

kxh − x^{k,2}_h k0 ≤ c h^{2−n} kxh − x^{k,1}_h k2 .

As before, n is the space dimension. We see that this estimate is not robust w.r.t. n.

10.7.7.3 Smoothing properties


Lemma 10.39 (Smoothing property). Let Ah be s.p.d. The Richardson iteration

x^{k+1}_h = x^k_h + ω(bh − Ah x^k_h )

with ω = (λmax (Ah ))−1 yields the following estimate:

kxh − x^ν_h k2 ≤ (λmax (Ah )/ν) kxh − x^0_h k0 .

10.7.7.4 Two- and multigrid convergence results
Theorem 10.40. With all assumptions from before, the two-grid method with ν steps of Richardson iteration
as smoother satisfies

kxh − x^{k+1}_h k0 ≤ (c/ν) kxh − x^k_h k0 .

Hence, for sufficiently large ν, the two-grid method converges independently of the mesh size h.

Lemma 10.41 (Recursion; [13] or [24, Chapter V.3]). Let ρ1 be the convergence rate of the two-grid method
and ρl the convergence rate of a multigrid method with the cycle parameter γ and l ≥ 2 levels. Then, the
following recursion holds true:

ρl ≤ ρ1 + (1 + ρ1 ) ρ^γ_{l−1} .

Theorem 10.42 (Convergence rate ρl of the W cycle). If ρ1 ≤ 1/5 for the two-grid method, then it holds for
the W cycle of the multigrid method:

ρl ≤ (5/3) ρ1 ≤ 1/3    for l = 2, 3, . . .
Lemma 10.43 (Optimal complexity of the multigrid method; [13]). Let Nl be the number of unknowns on level
l using P1 finite elements. Then, the amount of arithmetic operations Al on level l is

Al = O(Nl ).

10.8 Numerical tests
We continue the numerical tests from Section 8.16 and consider again the Poisson problem in 2D and 3D on
the unit square and with force f = 1 and homogeneous Dirichlet conditions. We use as solvers:
• CG
• PCG with SSOR preconditioning and ω = 1.2
• PCG with geometric multigrid preconditioning and SOR smoother

• GMRES
• GMRES with SSOR preconditioning and ω = 1.2
• BiCGStab

• BiCGStab with SSOR preconditioning and ω = 1.2


The tolerance is chosen as T OL = 1.0e − 12. We also run on different mesh levels in order to show the
dependency on n.

Dimension Elements DoFs(N) CG PCG PCG-GMG GMRES P-GMRES BiCGStab P-BiCGStab


====================================================================================
2 16 25 3 5
2 64 81 10 6
2 256 289 23 19 6 23 18 16 12
2 1024 1089 47 33 6 83 35 33 21
2 4096 4225 94 60 6 420 78 66 44
2 16384 16641 186 6
2 65536 66049 370 6
2 262144 263169 731 5
2 1048576 1050625 1400 5
2 4194304 4198401 2780 5
------------------------------------------------------------------------------------
3 4096 4913 25 19 25 21 16 11
3 32768 35937 51 32 77 38 40 23
3 262144 274625 98 57 307 83 69 46
====================================================================================

Four main messages:

• GMRES performs worst.


• PCG performs better than pure CG, but less significantly than one would wish.
• BiCGStab performs very well - a bit surprising.

• Geometric multigrid preconditioning yields mesh-independent, constant iteration numbers of the outer
  CG iteration.

10.9 Chapter summary and outlook


In extension to Numerik 1, we introduced in this chapter the numerical solution of the discretized problems
with geometric multigrid methods. As smoothers, however, iterative methods from Numerik 1 are needed. This
shows a nice combination of different classes in numerical methods.


11 Applications in continuum mechanics: linearized elasticity and Stokes
In this chapter, we undertake a short excursus to important prototype applications: linearized elasticity (solid
mechanics) and the Stokes problem (fluid mechanics). Both models lead to systems of equations since they
are vector-valued, but are still linear and consequently fit into the lines of our previous developments. We try
to provide an introduction that touches the aspects which can be understood with the help of our previous
developments. For instance, linearized elasticity is still of elliptic type and we show how the well-posedness can
be proven using Lax-Milgram (i.e., Riesz, since the problem is symmetric). On the other hand, the maximum
principle does not hold anymore. These notes cannot provide all possible details since they fill entire books
such as [34] and [125]. With regard to fluid mechanics, the Navier-Stokes equations have been introduced in
Section 9.24 to demonstrate mesh adaptivity with respect to drag forces as goal functional.

11.1 Modeling
Since linearized elasticity is vector-valued, we design now the space H 1 for three solution components ûx , ûy , ûz :

û = (ûx , ûy , ûz ) ∈ [H 1 (Ω̂)]3 .

Specifically, we define:

V̂s0 := [H01 (Ω̂)]3 .

Remark 11.1. In linearized elasticity we deal with small displacements and consequently the two principal
coordinate systems, Lagrangian and Eulerian, are identified; we also refer the reader to Remark 4.14. For large
displacements, one needs to distinguish between a reference configuration, in which the actual computations
are carried out, and the current configuration. The reference configuration is indicated in a 'hat' notation,
i.e., Ω̂. Via a transformation, the solution is mapped to the current (i.e., physical or deformed) configuration Ω.

Problem 11.2 (Stationary linearized elasticity). Let Ω̂ ⊂ Rd , d = 3. Given fˆ : Ω̂ → Rd , find a vector-valued
displacement û ∈ V̂s0 = [H01 (Ω̂)]3 such that

(Σ̂lin , ∇̂ϕ̂) = (fˆ, ϕ̂)    ∀ϕ̂ ∈ V̂s0 ,

where

Σ̂lin = 2µ Êlin + λ tr(Êlin ) Î.

Here, Σ̂lin is a matrix and specified below. Furthermore, the gradient ∇̂ϕ̂ is a matrix. Consequently, the scalar
product is based on the Frobenius Definition 8.14. The material parameters µ, λ > 0 are the so-called Lamé
parameters. Finally, tr(·) is the trace operator. For further details on the entire model, we refer to Ciarlet [34].

Definition 11.3 (Green-Lagrange strain tensor Ê).

Ê = (1/2)(Ĉ − Î) = (1/2)(F̂ T F̂ − Î) = (1/2)(∇̂û + ∇̂û T + ∇̂û T · ∇̂û),

which is again symmetric and positive definite for all x̂ ∈ Ω̂ since Ĉ and, of course, Î have these properties.

Performing a geometric linearization, which is reasonable for example when k∇̂ûk  1, we can work with the
linearized Green-Lagrange strain tensor:

Definition 11.4 (Linearized Green-Lagrange strain tensor Êlin ).

Êlin = (1/2)(∇̂û + ∇̂û T ).


Proposition 11.5. Let the following linear boundary value problem be given:

−div(Σ̂lin ) = fˆ    in Ω̂,        (229)
û = 0               on ∂ Ω̂D ,    (230)
Σ̂lin n̂ = ĝ         on ∂ Ω̂N .    (231)

Finding a solution û of this strong form is formally equivalent to finding a solution of Problem 11.2.
Proof. We employ Green's formula: for any sufficiently smooth tensor field Σ̂ and vector field ϕ̂ we obtain

∫_Ω̂ (∇̂ · Σ̂) · ϕ̂ dx̂ = − ∫_Ω̂ Σ̂ : ∇̂ϕ̂ dx̂ + ∫_{∂Ω̂N} Σ̂n̂ · ϕ̂ dŝ = − ∫_Ω̂ Σ̂ : Êlin (ϕ̂) dx̂ + ∫_{∂Ω̂N} Σ̂n̂ · ϕ̂ dŝ.

The last equal sign is justified with linear algebra arguments, i.e., for a symmetric matrix A it holds:

Definition 11.6 (Proof is homework). For the Frobenius scalar product, and A = AT , it holds

A : B = A : (1/2)(B + B T ).

Furthermore, on the boundary part ∂ Ω̂D the vector field ϕ̂ vanishes by definition. Thus, the first direction
is shown. Conversely, we now assume that the variational equations are satisfied; namely,

∫_Ω̂ Σ̂ : ∇̂ϕ̂ dx̂ = ∫_Ω̂ fˆ · ϕ̂ dx̂

if ϕ̂ = 0 on all of ∂ Ω̂. By Green's formula we now obtain

∫_Ω̂ Σ̂ : ∇̂ϕ̂ dx̂ = − ∫_Ω̂ (∇̂ · Σ̂) · ϕ̂ dx̂.

Putting both pieces together and taking into account that the integrals hold on arbitrary volumes yields

−∇̂ · Σ̂ = fˆ    in Ω̂.

Considering now the Neumann boundary, we use again Green's formula to see

∫_{∂Ω̂N} Σ̂n̂ · ϕ̂ dŝ = ∫_{∂Ω̂N} ĝ · ϕ̂ dŝ.

This holds for arbitrary boundary parts ∂ Ω̂N such that Σ̂n̂ = ĝ can be inferred, which concludes the second
part of the proof.

11.2 Well-posedness
Even though linearized elasticity is an elliptic problem and has much in common with the scalar-valued Poisson
problem, establishing existence and uniqueness for the stationary version is a non-trivial task. Why? Let us
work with Problem 11.2 in 3D; since this problem is linear, we aim to apply the Riesz representation
theorem9 (or the Lax-Milgram lemma - see Theorem 8.72) and need to check the assumptions:
• Determine in which space V̂ we want to work;
• The form A(û, ϕ̂) is symmetric or non-symmetric;
• The form A(û, ϕ̂) is continuous bilinear w.r.t. the norm of V̂ ;
• The bilinear form A(û, ϕ̂) is V -elliptic;

9 Here, A(û, ϕ̂) = (Σ̂lin , ∇̂ϕ̂) is symmetric and the Riesz representation Theorem 8.75 is sufficient for existence and uniqueness!

• The right-hand side functional fˆ is a continuous linear form.
The ingredients to check are:

• From which space must fˆ be chosen such that it is continuous? (This leads to Sobolev embedding
theorems [34, 50]);
• How do we check the V -ellipticity? (This requires the famous Korn inequality as it is stated in a minute
below).

For proving existence for linearized elasticity, we need to check as second condition the V -ellipticity, which is
'easy for scalar-valued problems' but non-trivial in the case of vector-valued elasticity. The resulting inequalities
are named after Korn (1906):
Theorem 11.7 (1st Korn inequality). Let us assume displacement Dirichlet conditions on the entire boundary
∂ Ω̂. For vector fields û ∈ [H01 (Ω̂)]3 it holds:

k∇̂ûk ≤ cKorn kÊlin (û)k,

where Êlin (û) := (1/2)(∇̂û + ∇̂û T ) ∈ L2 (Ω̂).

Proof. See for example [24, 34, 102].


The more general form is as follows:

Theorem 11.8 (2nd Korn inequality). Let a part of the boundary be of Neumann type, i.e., ∂ Ω̂N 6= ∅, and let
the entire boundary be of class C 1 , or a polygon, or of Lipschitz-type. For vector fields ûs ∈ [H 1 (Ω̂s )]3 with
Êlin ∈ L2 it holds

kûs kH 1 (Ω̂s ) ≤ CKorn ( |ûs |2L2 (Ω̂s ) + |Êlin |2L2 (Ω̂s ) )^{1/2}    ∀ûs ∈ H 1 (Ω̂s ),

with a positive constant CKorn , and secondly,

û 7→ ( |ûs |2L2 (Ω̂s ) + |Êlin |2L2 (Ω̂s ) )^{1/2}

is a norm that is equivalent to k · kH 1 .


Theorem 11.9 (Existence of a weak solution of linearized elasticity). Let Ω̂ ⊂ R3 , and let Γ̂D and Γ̂N (with
|Γ̂D | > 0 and |Γ̂N | > 0) be the Dirichlet and Neumann boundaries, respectively. Let the Lamé constants be
µ > 0 and λ > 0, and let fˆ ∈ L6/5 (Ω̂) and ĝ ∈ L4/3 (Γ̂N ) be given. Furthermore, let

V̂ := {ϕ̂ ∈ H 1 (Ω̂) | ϕ̂ = 0 on Γ̂D }.

Then, there exists a unique element û ∈ V̂ such that

A(û, ϕ̂) = F (ϕ̂)    ∀ϕ̂ ∈ V̂ .

Here, the bilinear form and the right hand side functional are given by

A(û, ϕ̂) = (Σ̂lin , ∇̂ϕ̂),    and    F (ϕ̂) = (fˆ, ϕ̂) + hĝ, ϕ̂i,    (232)

where the linearized STVK model is given by

Σ̂lin := 2µ Êlin + λ tr(Êlin ) Î.

The function ĝ is a prescribed traction condition on Γ̂N ; namely:

[2µ Êlin + λ tr(Êlin ) Î] n̂ = ĝ.

Proof. The proof is as follows:

• Continuity of the right hand side functional F (·). In order to apply Hölder's inequality, we first
  check the correct Sobolev embedding:

  H 1 = W 1,2 ,→ Lp*    with    1/p* = 1/p − m/d

  for m < d/p. Since we work in 3D, d = 3. Furthermore, m = 1, p = 2 in W m,p . It can be inferred that
  consequently p* = 6. For the continuity of F , we calculate with Hölder's inequality:

  |(fˆ, ϕ̂)| ≤ kfˆkp kϕ̂kq=6 ,

  where 1/p + 1/q = 1. Since q = 6, this yields p = 6/5, i.e., fˆ ∈ L6/5 . In the same spirit, we obtain ĝ ∈ L4/3
  while additionally employing the trace inequality. Let us verify this claim: From trace theorems, e.g.,
  [34], p. 280, we know that for 1 ≤ p < ∞ we have

  tr ∈ L(W 1,p , Lp* )    ⇔    k · kLp* ≤ c k · kW 1,p ,

  with 1/p* = 1/p − (1/(d − 1)) ((p − 1)/p) if 1 ≤ p < d. As before and still, d = 3. This means 1 ≤ p ≤ 2. Of
  course we want to work again with W 1,2 , p = 2. Then,

  1/p* = 1/p − (1/(d − 1)) ((p − 1)/p) = 1/2 − (1/2)(1/2) = 1/4.

  Consequently, p* = 4. Let us estimate with Hölder's inequality and the trace theorem:

  ∫_{Γ̂N} ĝ ϕ̂ dŝ ≤ kĝkLq (Γ̂N ) kϕ̂kL4 (Γ̂N ) ≤ c kĝkLq (Γ̂N ) kϕ̂kW 1,2 (Ω̂) .

  In order to be able to apply Hölder's inequality we need to find the correct q. This is now easy with

  1/p* + 1/q = 1    ⇒    q = 4/3.

  Consequently, ĝ ∈ L4/3 .
• Symmetry of A(û, ϕ̂). Let us now verify that A(û, ϕ̂) is symmetric:

  A(û, ϕ̂) = (Σ̂lin , ∇̂ϕ̂) = (2µ Êlin (û) + λ tr(Êlin (û)) Î, Êlin (ϕ̂))
          = 2µ (Êlin (û), Êlin (ϕ̂)) + λ (tr(Êlin (û)) Î, Êlin (ϕ̂))
          = 2µ (Êlin (ϕ̂), Êlin (û)) + λ (tr(Êlin (ϕ̂)) Î, Êlin (û))
          = (2µ Êlin (ϕ̂) + λ tr(Êlin (ϕ̂)) Î, Êlin (û))
          = A(ϕ̂, û).

Here, we used the relation defined in Definition 11.6.


• Continuity of A(û, ϕ̂). We need to show that

  |A(û, ϕ̂)| ≤ c kûkH 1 kϕ̂kH 1    ∀û, ϕ̂ ∈ V̂ .

  Let us start with the definition and apply first Cauchy's inequality:

  |A(û, ϕ̂)| = |(Σ̂lin , ∇̂ϕ̂)| = |(Σ̂lin , Êlin (ϕ̂))|
            ≤ ( ∫ |Σ̂lin |2 )^{1/2} ( ∫ |Êlin (ϕ̂)|2 )^{1/2}
            = ( ∫ |2µ Êlin (û) + λ tr(Êlin (û)) Î|2 )^{1/2} ( ∫ |Êlin (ϕ̂)|2 )^{1/2} .

  We use k tr(Êlin (û)) Îk ≤ c kÊlin (û)k and |Êlin (û)|L2 ≤ c kûkH 1 , where the latter specifically holds for
  all û ∈ H 1 . Then,

  |A(û, ϕ̂)| ≤ c(µ, λ) ( ∫ |Êlin (û)|2 )^{1/2} ( ∫ |Êlin (ϕ̂)|2 )^{1/2}
            = c(µ, λ) |Êlin (û)|L2 |Êlin (ϕ̂)|L2
            ≤ c1 kûkH 1 kϕ̂kH 1 .

• V -ellipticity. First, we observe that

  A(û, û) = (2µ Êlin (û) + λ tr(Êlin (û)) Î, Êlin (û)) ≥ (2µ Êlin (û), Êlin (û)) = 2µ |Êlin (û)|2L2 ,

  because µ, λ > 0. The V -ellipticity can be inferred if we can show that on the space V̂s0 , the semi-norm

  û 7→ |Êlin (û)|L2

  is a norm that is equivalent to k · kH 1 . To do so, we proceed in two steps. We employ the 2nd Korn
  inequality (first step), see Theorem 11.8. In detail: if |Γ̂D | > 0, there exists a constant c > 0 such that (the
  second step)

  c−1 kûk1 ≤ |Êlin (û)|L2 ≤ c kûk1    ∀û ∈ H 1 (Ω̂s ),

  i.e., on V̂ , the mapping û 7→ |Êlin (û)|L2 is norm equivalent to kûk1 . (This technicality requires a separate
  proof!)
• Apply Riesz. Having proven that A is symmetric, continuous and V -elliptic as well as that F is linear
continuous, we have checked all assumptions of the Riesz representation Theorem 8.75 and consequently,
a unique solution û ∈ V̂ does exist.
For further discussions, the reader might consult Ciarlet [34].
Remark 11.10. Since this elasticity problem is symmetric, the problem can be also interpreted from the
energy minimization standpoint as outlined in Theorem 8.75. In this respect, the weak form considered here
corresponds to the first-order necessary condition, i.e., the weak form of the Euler-Lagrange equations.
Remark 11.11. For linearized elasticity, even though it is of elliptic type, the maximum principle does not
hold anymore.

11.3 Finite element discretization


Problem 11.12 (Discretized stationary linearized elasticity). Find vector-valued displacements ûh ∈ V̂h0 such
that

(Σ̂h,lin , ∇̂ϕ̂h ) = (fˆ, ϕ̂h )    ∀ϕ̂h ∈ V̂h0 ,

where

Σ̂h,lin = 2µ Êh,lin + λ tr(Êh,lin ) Î

and

Êh,lin = (1/2)(∇̂ûh + ∇̂ûhT ).

As previously discussed, the spatial discretization is based upon a space V̂h0 with basis {ϕ̂1 , . . . , ϕ̂N } where
N = dim(V̂h0 ). We recall that in 3D elasticity, we have three solution components such that the total dimension
of [V̂h0 ]d , d = 3, is 3N .
We solve the problem:

Formulation 11.13. Find ûh ∈ V̂h such that

A(ûh , ϕ̂h ) = F (ϕ̂h )    ∀ϕ̂h ∈ V̂h ,

with

A(ûh , ϕ̂h ) := (Σ̂h,lin , ∇̂ϕ̂h ),    F (ϕ̂h ) := (fˆ, ϕ̂h ).

This relation does in particular hold true for each test function ϕ̂ih , i = 1, . . . , N :

A(ûh , ϕ̂ih ) = F (ϕ̂ih )    ∀ϕ̂ih , i = 1, . . . , N.

The solution ûh we are seeking is a linear combination of the basis functions, i.e., ûh = Σ_{j=1}^{N} uj ϕ̂jh .
Inserting this relation into the bilinear form A(·, ·) yields

Σ_{j=1}^{N} A(ϕ̂jh , ϕ̂ih ) uj = F (ϕ̂ih ),    i = 1, . . . , N.

It follows for the ingredients of the linear equation system:

u = (uj )_{j=1,...,N} ∈ RN ,    b = (Fi )_{i=1,...,N} ∈ RN ,    B = (Bij ) with Bij = A(ϕ̂jh , ϕ̂ih ).

The resulting linear equation system reads:

Bu = b.
Remark 11.14. In the matrix B the rule is always as follows: the ϕ̂ih test function determines the row and the
trial function ϕ̂jh the column. This does not play a role for symmetric problems (e.g., Poisson’s problem) but
becomes important for nonsymmetric problems such as for example Navier-Stokes (because of the convection
term).
Remark 11.15. In the matrix, the degrees of freedom that belong to Dirichlet conditions (here the displacements)
are strongly enforced by replacing the corresponding rows and columns as usual in a finite element code.

Example 11.16 (Laplacian in 3D). In this vector-valued problem, we have 3N test functions since the solution
vector is an element of R3 : uh = (u(1)h , u(2)h , u(3)h ). Thus, in the boundary value problem

Find uh ∈ Vh :  (∇uh , ∇ϕh ) = (f, ϕh )    ∀ϕh ∈ Vh ,

the bilinear form is tensor-valued:

aLaplacian (ϕjh , ϕih ) = ∫_Ω ∇ϕjh : ∇ϕih dx,

where ∇ϕjh and ∇ϕih are the 3 × 3 matrices with entries ∂k ϕ(m),jh and ∂k ϕ(m),ih , m, k = 1, 2, 3, and ":" is
the Frobenius scalar product of these gradient matrices.
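A two-line NumPy illustration of the Frobenius scalar product of two such gradient matrices (the 3×3 arrays below are hypothetical stand-ins for ∇ϕjh and ∇ϕih at one quadrature point, chosen here only for demonstration):

import numpy as np

G1 = np.arange(9.0).reshape(3, 3)      # stand-in for the gradient of phi_j
G2 = np.ones((3, 3))                   # stand-in for the gradient of phi_i
frob = np.sum(G1 * G2)                 # A : B = sum over m,k of A_mk * B_mk
print(frob)                            # equals np.tensordot(G1, G2)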

11.4 Numerical tests
11.4.1 3D
We use the above laws given in Problem 11.12 and implement them again in deal.II [6, 9].

11.4.1.1 Configuration   The domain Ω is spanned by (0, 0, 0) and (5, 0.5, 2), resulting in an elastic beam (see
Figure 42). The vertical axis is the y-direction (see again Figure 42). The initial domain is four times globally
refined, resulting in 4096 elements and in total 14739 DoFs (i.e., 4913 DoFs in each solution component).

11.4.1.2 Boundary conditions The specimen has six boundary planes. On the boundary Γ = {(x, y, z)|x =
0} the elastic beam is fixed and can move at the other boundaries:

u = (ux , uy , uz ) = 0 on Γ (Homogeneous Dirichlet),


Σlin · n = 0 on ∂Ω \ Γ (Homogeneous Neumann).

11.4.1.3 Material parameters and given data The material parameters are:
lame_coefficient_mu = 1.0e+6;
poisson_ratio_nu = 0.4;

lame_coefficient_lambda = (2 * poisson_ratio_nu * lame_coefficient_mu)/


(1.0 - 2 * poisson_ratio_nu);
and the force vector is prescribed as
f (x, y, z) = (0, −9.81, 0)T .

11.4.1.4 Simulation results With a PCG scheme (see Section 10.8), we need 788 PCG iterations to obtain
convergence of the linear solver. The resulting solution is displayed in Figure 42.

Figure 42: 3D linearized elasticity: uy solution and mesh. The elastic beam is attached at x = 0 in the
y − z plane by ux = uy = uz = 0. All other boundaries are traction free (homogeneous Neumann
conditions).

11.4.2 2D test with focus on the maximum principle
In the corresponding 2D test (because it is easier to show), we demonstrate the violation of the maximum
principle.

Figure 43: 2D linearized elasticity. The same test as for 3D. At left, the ux solution is displayed. At right, the
uy solution is displayed.

Figure 44: 2D linearized elasticity: all boundaries are subject to homogeneous Dirichlet conditions. In the ux
solution (left), we observe that the minimum and maximum are obtained inside the domain, which
is a violation of the maximum principle.

11.5 Stokes - FEM discretization
In Section 9.24, we have formulated the stationary NSE system. Without convection term, we arrive at the
Stokes system for viscous flow (such as honey for instance). Without going into any theoretical details, we
briefly present the finite element scheme in the following. A classical reference to NSE FEM discretizations is
[57].
Let Uh = {vh , ph }, Xh := Vh × Lh = H01 × L0 and Ψ = {ψ v , ψ p }. The problem reads:

Find Uh ∈ Xh such that:  A(Uh , Ψh ) = F (Ψh )    ∀Ψh ∈ Xh ,

where

A(Uh , Ψh ) = (∇vh , ∇ψhv ) − (ph , ∇ · ψhv ) + (∇ · vh , ψhp ),
F (Ψh ) = (f, ψhv ).

Let us choose as bases:

Vh = {ψhv,i , i = 1, . . . , NV := dim Vh },
Lh = {ψhp,i , i = 1, . . . , NP := dim Lh }.

Please pay attention that Vh := (Vh )d is a vector-valued space with dimension d. It follows:

(∇vh , ∇ψhv,i ) − (ph , ∇ · ψhv,i ),    i = 1, . . . , NV ,
(∇ · vh , ψhp,i ),    i = 1, . . . , NP .

Setting:

vh = Σ_{j=1}^{NV} vj ψhv,j ,    ph = Σ_{j=1}^{NP} pj ψhp,j

yields the discrete equations:

Σ_{j=1}^{NV} (∇ψhv,j , ∇ψhv,i ) vj − Σ_{j=1}^{NP} (ψhp,j , ∇ · ψhv,i ) pj ,    i = 1, . . . , NV ,
Σ_{j=1}^{NV} (∇ · ψhv,j , ψhp,i ) vj ,    i = 1, . . . , NP .

With this, we obtain the following matrices:

A := ( (∇ψhv,j , ∇ψhv,i ) )_{i,j=1,...,NV} ,    B := ( −(ψhp,j , ∇ · ψhv,i ) )_{i=1,...,NV ; j=1,...,NP} ,
−B T = ( (∇ · ψhv,j , ψhp,i ) )_{i=1,...,NP ; j=1,...,NV} .

These matrices form the block system:

(  A    B ) (v)   (f)
( −B T  0 ) (p) = (0) ,    (233)

where v = (vi )_{i=1,...,NV} and p = (pi )_{i=1,...,NP} .
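To illustrate the structure of (233): assuming the blocks A (s.p.d.) and B have already been assembled, the saddle point system can be solved, e.g., via the pressure Schur complement S = Bᵀ A⁻¹ B. The dense NumPy sketch below is for exposition only; in practice one works with sparse matrices and iterative solvers:

import numpy as np

def stokes_schur_solve(A, B, f):
    """Solve the block system (233): A v + B p = f, -B^T v = 0,
    via the Schur complement S = B^T A^{-1} B (dense, for illustration)."""
    Ainv_f = np.linalg.solve(A, f)
    Ainv_B = np.linalg.solve(A, B)
    S = B.T @ Ainv_B                        # Schur complement
    p = np.linalg.solve(S, B.T @ Ainv_f)    # pressure: S p = B^T A^{-1} f
    v = Ainv_f - Ainv_B @ p                 # velocity: v = A^{-1}(f - B p)
    return v, p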

11.6 Numerical test - 2D-1 benchmark without convection term (Stokes)
We take the same material parameters as in Section 9.24, but now without the convection term. The results
are displayed in Figure 45.

Figure 45: Stokes 2D-1 test: x velocity profile and pressure distribution.

The functional values on a three times uniformly-refined mesh are:


==================================
P-Diff: 4.5559514861737337e-02
P-front: 6.3063021215059634e-02
P-back: 1.7503506353322297e-02
Drag: 3.1419886302140707e+00
Lift: 3.0188981539007183e-02
==================================

11.7 Chapter summary and outlook


In this chapter, we extended from scalar-valued elliptic problems to vector-valued elliptic problems. The well-
posedness analysis follows the same lines as in the scalar case. We notice that the maximum principle does not
hold anymore in the vector-valued case (not proven here, but shown in a numerical example). The finite
element formulation is in principle the same as in the scalar case, but we have to work with vector-valued
function spaces now. The formal procedure is the same as before, but the computational cost increases
significantly since now three variables need to be discretized (in 3D) rather than only one variable.


12 Methods for time-dependent PDEs: parabolic and hyperbolic problems
By now, we have considered stationary, linear PDE problems. However, more realistic, and most notably
fascinating, modeling is
• nonstationary,
• and nonlinear.
We investigate the former one in the present chapter and give a brief introduction to nonlinear problems in
Chapter 13.
Therefore, the focus of this chapter is on
• time-dependent linear PDEs.
We concentrate first on parabolic PDEs and later study second-order-in-time hyperbolic PDEs. For first-order
hyperbolic PDEs, we refer to [82].
A prominent example of a parabolic PDE is the heat equation that we investigate in this section. In the
next section, we consider the second-order hyperbolic wave equation.

12.1 Principal procedures for discretizing time and space


Time-dependent PDE problems require discretization in space and time. One can mainly distinguish
three procedures:
• Vertical method of lines (method of lines): first space, then time.
• Horizontal method of lines (Rothe method): first time, then space.
• A full space-time Galerkin method.
Using one of the first two methods, and this is also the procedure we shall follow here, temporal discretization
is based on FD whereas spatial discretization is based on the FEM. In the following we concentrate on the
Rothe method and we briefly provide reasons why we want to work with the Rothe method. In fact, the
traditional way of discretizing time-dependent problems is the (vertical) method of lines. The advantage of
the method of lines is that we have simple data structures and matrix assembly. This leads to a (large) ODE
system, which can be treated by (well-known) standard methods from ODE analysis (ordinary differential
equations) can be employed for time discretization. The major disadvantage is that the spatial mesh is fixed
(since we first discretize in space) and it is then difficult to represent or compute time-varying features such
as certain target functionals (e.g., drag or lift).
In contrast, the Rothe method allows for dynamic spatial mesh adaptation with the price that data structures
and matrix assembly are more costly.

12.2 Bochner spaces - space-time functions


For the correct function spaces for formulating time-dependent variational forms, we define the Bochner
integral. Let I := (0, T ) with 0 < T < ∞ be a bounded time interval with end time value T . For any Banach
space X and 1 ≤ p ≤ ∞, the space

Lp (I, X)

denotes the space of Lp integrable functions f from the time interval I into X. This is a Banach space, the
so-called Bochner space, with the norm, see [145],

kvkLp (I,X) := ( ∫_I kv(t)kpX dt )^{1/p} ,
kvkL∞ (I,X) := ess sup_{t∈I} kv(t)kX .

For the definition of the norms of the spatial spaces, i.e., kv(t)kX with X = L2 or X = H 1 , we refer to Section
7.1.7.

Example 12.1. For instance, we can define an H 1 space in time:

H 1 (I, X) = { v ∈ L2 (I, X) | ∂t v ∈ L2 (I, X) }.

Functions that are even continuous in time, i.e., u : I → X, are contained in spaces like

C(I; X)

with

kukC(I;X) := max_{0≤t≤T} ku(t)k < ∞.

Definition 12.2 (Weak derivative of space-time functions). Let u ∈ L1 (I; X). A function v ∈ L1 (I; X) is the
weak derivative of u, denoted as ∂t u = v, if

∫_0^T ∂t ϕ(t) u(t) dt = − ∫_0^T ϕ(t) v(t) dt

for all test functions ϕ ∈ Cc∞ (I).


In particular, the following result holds:
Theorem 12.3 ([50]). Assume v ∈ L2 (I, H01 ) and ∂t v ∈ L2 (I, H −1 ). Then, v is continuous in time, i.e.,

v ∈ C(I, L2 )

(after possibly being redefined on a set of measure zero). Furthermore, the mapping

t 7→ kv(t)k2L2

is absolutely continuous with

(d/dt) kv(t)k2L2 = 2 h∂t v(t), v(t)i

for a.e. 0 ≤ t ≤ T .
Proof. See Evans [50], Theorem 3 in Section 5.9.2.
The importance of this theorem lies in the fact that now the point-wise prescription of initial conditions
does make sense in weak formulations.
Remark 12.4. More details of these spaces by means of the Bochner integral can be found in [39, 144] and
also [50].

12.3 Methods for parabolic problems


To illustrate the concepts of discretization, we work in 1D (one spatial dimension) in the following.

12.3.1 Problem statements

Let us consider the following abstract problem:

Formulation 12.5. Consider the following parabolic problem: Find u such that:

∂t u + A(u) = f    in Ω × I,
u(x, t) = uD       on ∂Ω × I,
u(x, 0) = u0       in Ω × {0},

where A is a linear elliptic operator.

A concrete realization is the following situation: Let Ω be an open, bounded subset of Rd , d = 1, and
I := (0, T ] where T > 0 is the end time value. The IBVP (initial boundary-value problem) reads:

Formulation 12.6. Find u := u(x, t) : Ω × I → R such that

ρ ∂t u − ∇ · (α∇u) = f    in Ω × I,
u = a                     on ∂Ω × [0, T ],
u(0) = g                  in Ω × {t = 0},

where f : Ω × I → R and g : Ω → R, α ∈ R and ρ are material parameters, and a ≥ 0 is a Dirichlet
boundary condition. More precisely, g is the initial temperature, a is the wall temperature, and f is some
heat source.
For the variational formulation of the abstract form, let H and V be Hilbert spaces with V ∗ being the dual
space of V . A Gelfand triple is denoted by
V ,→ H ,→ V ∗
Definition 12.7 (Inner products). The spatial inner product on H is denoted by

(u, ϕ)H = ∫_Ω u · ϕ dx.

The temporal inner product on a space-time space X is denoted by

(u, ϕ) := (u, ϕ)X = ∫_0^T (u, ϕ)H dt.

Let us now define the function spaces. In space, we choose our well-known candidates:

V := H01 (Ω),    H := L2 (Ω).

On the time interval I, we introduce the Hilbert space X := W (0, T ) with

W (0, T ) := {v | v ∈ L2 (I, V ), ∂t v ∈ L2 (I, V ∗ )}.

The notation L2 (I, V ) means that v is square-integrable in time using values from I and mapping to the image
space V . The space X is embedded in C(Ī, H), which allows to work with well-defined (continuous) initial
conditions u0 .
Definition 12.8 (Bilinear forms). We define the spatial bilinear form ā : V × V → R as usual. The
time-dependent semi-linear form a : X × X → R is as follows:

a(u, ϕ) := ∫_0^T ā(u(t), ϕ(t)) dt.

Formulation 12.9 (Space-time weak form). Then the space-time parabolic problem is given by: Find u ∈ X
such that

(∂t u, ϕ) + a(u, ϕ) = (f, ϕ) ∀ϕ ∈ X


0
u(0) = u

with f ∈ L2 (I, V ∗ ) and u0 ∈ H.

12.3.2 Temporal discretization via One-Step-θ schemes
We first discretize in time and create a time grid on the time domain Ī = [0, T] with N_T intervals and (for a uniform grid) the time step size k = T/N_T:

0 = t_0 < t_1 < . . . < t_{N_T} = T.

Furthermore we set

I_n = [t_{n−1}, t_n],   k_n = t_n − t_{n−1},   k := max_{1≤n≤N_T} k_n.

Moreover, we denote u^n := u(t_n).


The simplest discretizations are explicit and implicit Euler:
Formulation 12.10 (Explicit and implicit Euler). Given u^{n−1}, find u^n such that

(u^n − u^{n−1})/k − ∇·(α∇u^{n−1}) = f^{n−1}

and for the implicit Euler scheme:

(u^n − u^{n−1})/k − ∇·(α∇u^n) = f^n.
Here we use finite differences and more specifically we introduce the so-called One-Step-θ scheme, which
allows for a compact notation for three major finite difference schemes: forward Euler (θ = 0), backward Euler
(θ = 1), and Crank-Nicolson (θ = 0.5) well known from numerical methods for ODEs.
Definition 12.11 (Choice of θ). By the choice of θ, we obtain the following time-stepping schemes:
• θ = 0: 1st order explicit Euler time stepping;

• θ = 0.5: 2nd order Crank-Nicolson (trapezoidal rule) time stepping;


• θ = 0.5 + kn : 2nd order shifted Crank-Nicolson which is shifted by the time step size kn = tn − tn−1
towards the implicit side;
• θ = 1: 1st order implicit Euler time stepping.

Good stability properties (namely A-stability) of the time-stepping scheme are important for the temporal discretization of partial differential equations. Often (as here for the heat equation) we deal with second-order operators in space; such PDE problems are generically (very) stiff with stiffness of order O(h^{-2}), where h is the spatial discretization parameter.
After these preliminary considerations, let us now discretize in time the above problem:

Formulation 12.12 (One-Step-θ for the heat equation). The one-step-θ scheme for the heat equation reads:
Given the continuous PDE problem:

∂_t u − ∇·(α∇u) = f,

we obtain

(u^n − u^{n−1})/k − θ∇·(α∇u^n) − (1 − θ)∇·(α∇u^{n−1}) = θf^n + (1 − θ)f^{n−1}.

Re-organizing into unknown and known terms, we obtain:

u^n − kθ∇·(α∇u^n) = u^{n−1} + k(1 − θ)∇·(α∇u^{n−1}) + kθf^n + k(1 − θ)f^{n−1},

where the left-hand side collects the unknown terms and the right-hand side the known terms, and where u^n := u(t_n), u^{n−1} := u(t_{n−1}), f^n := f(t_n), and f^{n−1} := f(t_{n−1}).
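Before discretizing in space, it may help to see the One-Step-θ update in its simplest possible setting. The following minimal C++ sketch performs one θ-step for the scalar model ODE ∂_t u = λu + f(t), which mimics the heat equation after replacing −∇·(α∇·) by a scalar λ; the function name and its arguments are illustrative, not part of any lecture code.

#include <cstdio>
#include <functional>

// One step of the One-Step-theta scheme for the scalar model ODE
//   du/dt = lambda*u + f(t),
// discretized as
//   (u_n - u_{n-1})/k = theta*(lambda*u_n + f_n) + (1 - theta)*(lambda*u_{n-1} + f_{n-1}).
// theta = 0: forward Euler, theta = 1: backward Euler, theta = 0.5: Crank-Nicolson.
double one_step_theta(double u_old, double lambda, double k, double theta,
                      const std::function<double(double)>& f, double t_old)
{
  const double rhs = u_old + k * (1.0 - theta) * (lambda * u_old + f(t_old))
                           + k * theta * f(t_old + k);
  return rhs / (1.0 - k * theta * lambda); // for theta = 0 no 'solve' is needed
}

int main()
{
  const auto f = [](double) { return 0.0; }; // homogeneous right-hand side
  double u = 1.0;                            // u(0)
  for (int n = 0; n < 10; ++n)
    u = one_step_theta(u, -1.0, 0.1, 0.5, f, 0.1 * n);
  std::printf("u(1) = %.6f (exact exp(-1) = %.6f)\n", u, 0.367879);
  return 0;
}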

12.3.3 Spatial discretization and an abstract One-Step-θ scheme
We take the temporally discretized problem and use a Galerkin finite element scheme to discretize in space
as explained in Section 8. That is, we multiply with a test function and integrate by parts in the diffusion term. We then obtain: Find u_h^n ∈ {a + V_h} such that for n = 1, 2, . . . , N_T:

(u_h^n, ϕ_h) + kθ(α∇u_h^n, ∇ϕ_h) = (u_h^{n−1}, ϕ_h) − k(1 − θ)(α∇u_h^{n−1}, ∇ϕ_h) + kθ(f^n, ϕ_h) + k(1 − θ)(f^{n−1}, ϕ_h)

for all ϕ_h ∈ V_h.
Definition 12.13 (Specification of the problem data). For simplicity, let us assume that there is no heat source, i.e., f = 0, and that the material coefficients are specified by α = 1 and ρ = 1.
With the help of the semi-linear forms, we can formulate an abstract time-stepping scheme.
Formulation 12.14 (Abstract one-step-θ scheme). The one-step-θ scheme can be formulated in the following
abstract way: Given un−1 ∈ V , find un ∈ V such that

aT (un,k , ϕ) + θaE (un , ϕ) = −(1 − θ)aE (un−1 , ϕ) + θln (ϕ) + (1 − θ)ln−1 (ϕ).

Here:

a_T(u^{n,k}, ϕ) = (1/k)(u^n − u^{n−1}, ϕ),
a_E(u^n, ϕ) = (α∇u^n, ∇ϕ),
l^n(ϕ) = (f^n, ϕ).

12.3.4 Spatial discretization for the forward Euler scheme


Let us work with the explicit Euler scheme, θ = 0. Then we obtain:

(u_h^n, ϕ_h) = (u_h^{n−1}, ϕ_h) − k(∇u_h^{n−1}, ∇ϕ_h).

The structure of this equation, namely

y n = y n−1 + kf (tn−1 , y n−1 ),

is the same as we know from ODEs. On the other hand, the individual terms in the weak form are akin to our derivations in Section 8. This means that we seek at each time t_n a finite element solution u_h^n on the spatial domain Ω. With the help of the basis functions ϕ_{h,j} we can represent the spatial solution:

u_h^n(x) := Σ_{j=1}^{N_x} u_{h,j}^n ϕ_{h,j}(x),   u_{h,j}^n ∈ R.

Then:

Σ_{j=1}^{N_x} u_{h,j}^n (ϕ_{h,j}, ϕ_{h,i}) = (u_h^{n−1}, ϕ_{h,i}) − k(∇u_h^{n−1}, ∇ϕ_{h,i}),   i = 1, . . . , N_x,

which results in a linear equation system M U = B. Here

(M_{ij})_{i,j=1}^{N_x} := (ϕ_{h,j}, ϕ_{h,i}),
(U_j)_{j=1}^{N_x} := (u_{h,1}^n, . . . , u_{h,N_x}^n)^T,
(B_i)_{i=1}^{N_x} := (u_h^{n−1}, ϕ_{h,i}) − k(∇u_h^{n−1}, ∇ϕ_{h,i}).

Furthermore, the old time step solution can be written in matrix form as well:

(u_h^{n−1}, ϕ_{h,i}) = Σ_{j=1}^{N_x} u_{h,j}^{n−1} (ϕ_{h,j}, ϕ_{h,i}) =: (M u_h^{n−1})_i,

and similarly for the Laplacian term:

(∇u_h^{n−1}, ∇ϕ_{h,i}) = Σ_{j=1}^{N_x} u_{h,j}^{n−1} (∇ϕ_{h,j}, ∇ϕ_{h,i}) =: (K u_h^{n−1})_i.

The mass matrix M (also known as the Gramian matrix) is always the same for fixed h and can be computed explicitly.
Proposition 12.15. Formally we arrive at: Given u_h^{n−1}, find u_h^n such that

M u_h^n = M u_h^{n−1} − kK u_h^{n−1}   ⇒   u_h^n = u_h^{n−1} − kM^{-1}K u_h^{n−1}

for n = 1, . . . , N_T.
Remark 12.16. We emphasize that the forward Euler scheme is not recommended as time-stepping scheme for solving PDEs. The reason is that the stiffness matrix is of order 1/h² and one is in general interested in h → 0. Thus the coefficients become very large for h → 0, resulting in a stiff system. For stiff systems, as we learned before, one should rather use implicit schemes; otherwise the time step size k has to be chosen very small in order to obtain stable numerical results; see the numerical analysis in Section 12.3.10.

12.3.5 Evaluation of the integrals (1D in space)


We need to evaluate the integrals for the stiffness matrix K:

K = (K_{ij})_{i,j=1}^{N_x},   K_{ij} = ∫_Ω ϕ'_{h,j}(x) ϕ'_{h,i}(x) dx = ∫_{x_{j−1}}^{x_{j+1}} ϕ'_{h,j}(x) ϕ'_{h,i}(x) dx,

resulting in the tridiagonal matrix

K = h^{-1} tridiag(−1, 2, −1),

i.e., the matrix with 2 on the diagonal and −1 on both off-diagonals.
Be careful and do not forget the material parameter α in case it is not α = 1.
For the mass matrix M we obtain:
M = (M_{ij})_{i,j=1}^{N_x},   M_{ij} = ∫_Ω ϕ_{h,j}(x) ϕ_{h,i}(x) dx = ∫_{x_{j−1}}^{x_{j+1}} ϕ_{h,j}(x) ϕ_{h,i}(x) dx.

Specifically, on the diagonal we calculate:

M_{ii} = ∫_Ω ϕ_{h,i}(x)² dx = ∫_{x_{i−1}}^{x_i} ((x − x_{i−1})/h)² dx + ∫_{x_i}^{x_{i+1}} ((x_{i+1} − x)/h)² dx
       = (1/h²) ∫_{x_{i−1}}^{x_i} (x − x_{i−1})² dx + (1/h²) ∫_{x_i}^{x_{i+1}} (x_{i+1} − x)² dx
       = h/3 + h/3 = 2h/3.
For the right off-diagonal, we have

M_{i,i+1} = ∫_Ω ϕ_{h,i+1}(x) ϕ_{h,i}(x) dx = ∫_{x_i}^{x_{i+1}} ((x − x_i)/h) · ((x_{i+1} − x)/h) dx = h/6.

By symmetry, M_{i,i+1} = M_{i+1,i}, and all off-diagonal entries coincide. Summarizing all entries results in

M = (h/6) tridiag(1, 4, 1).
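For later use, the two matrices can be assembled directly from these formulas. The following C++ sketch stores K = h^{-1} tridiag(−1, 2, −1) and M = (h/6) tridiag(1, 4, 1) for a uniform mesh; the simple three-diagonal storage scheme is only an illustrative choice, not a prescribed data structure of these notes.

#include <cstdio>
#include <vector>

// Sub-, main and superdiagonal of a tridiagonal matrix.
struct Tridiag {
  std::vector<double> sub, diag, sup;
};

// Assemble the 1D P1 stiffness matrix K and mass matrix M on a uniform mesh
// of size h with Nx interior degrees of freedom (homogeneous Dirichlet data).
void assemble_1d_heat_matrices(int Nx, double h, Tridiag& K, Tridiag& M)
{
  K.sub.assign(Nx, -1.0 / h);
  K.diag.assign(Nx, 2.0 / h);
  K.sup.assign(Nx, -1.0 / h);

  M.sub.assign(Nx, h / 6.0);
  M.diag.assign(Nx, 4.0 * h / 6.0);
  M.sup.assign(Nx, h / 6.0);
}

int main()
{
  Tridiag K, M;
  assemble_1d_heat_matrices(9, 0.1, K, M);
  std::printf("K diagonal entry: %g, M diagonal entry: %g\n", K.diag[0], M.diag[0]);
  return 0;
}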

12.3.6 Final algorithms


We first set up a sequence of discrete time points:

0 = t_0 < t_1 < . . . < t_{N_T} = T.

Furthermore we set

I_n = [t_{n−1}, t_n],   k_n = t_n − t_{n−1},   k := max_{1≤n≤N_T} k_n.

Forward (explicit) Euler:

Algorithm 12.17. Given the initial condition g, solve for n = 1, 2, 3, . . . , N_T:

M u_h^n = M u_h^{n−1} − kK u_h^{n−1}   ⇒   u_h^n = u_h^{n−1} − kM^{-1}K u_h^{n−1},

where u_h^n, u_h^{n−1} ∈ R^{N_x}.

Algorithm 12.18. The backward (implicit) Euler scheme reads:

M u_h^n + kK u_h^n = M u_h^{n−1}.

An example of a pseudo C++ code is

final_loop()
{
// Initialization of model parameters, material parameters, and so forth
set_runtime_parameters();

// Make a mesh - decompose the domain into elements


create_mesh();

apply_initial_conditions();

// Timestep loop
do
{
std::cout << "Timestep " << timestep_number << " (" << time_stepping_scheme
<< ")" << ": " << time << " (" << timestep << ")"
<< "\n==================================================================="
<< std::endl;

std::cout << std::endl;

// Solve for next time step: Assign previous time step solution u^{n-1}
old_timestep_solution = solution;

// Assemble FEM matrix A and right hand side f


assemble_system ();


// Solve the linear equation system Ax=b


solve ();

// Update time
time += timestep;

// Write solution into a file or similar


output_results (timestep_number);

// Increment n->n+1
++timestep_number;
}
while (timestep_number <= max_no_timesteps);
}
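To complement the skeleton above, here is a self-contained C++ sketch realizing Algorithm 12.18 for the 1D heat equation on Ω = (0, π) with u^0 = sin(x) and homogeneous Dirichlet conditions; the exact solution is e^{-t} sin(x), so the printed value can be compared. The tridiagonal system with matrix M + kK is solved by the Thomas algorithm. This is a didactic sketch, not the program used for the figures in Section 12.3.12.

#include <cmath>
#include <cstdio>
#include <vector>

// Thomas algorithm for a tridiagonal system; a = subdiagonal, b = diagonal,
// c = superdiagonal, d = right-hand side (all of length n; a[0], c[n-1] unused).
std::vector<double> thomas(std::vector<double> a, std::vector<double> b,
                           std::vector<double> c, std::vector<double> d)
{
  const int n = static_cast<int>(b.size());
  for (int i = 1; i < n; ++i) {
    const double m = a[i] / b[i - 1];
    b[i] -= m * c[i - 1];
    d[i] -= m * d[i - 1];
  }
  std::vector<double> x(n);
  x[n - 1] = d[n - 1] / b[n - 1];
  for (int i = n - 2; i >= 0; --i)
    x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
  return x;
}

int main()
{
  const double pi = 3.14159265358979323846;
  const int Nx = 99;                 // interior nodes
  const double h = pi / (Nx + 1);    // mesh size on (0, pi)
  const double k = 0.1, T = 1.0;     // time step size and end time

  // Initial condition u0(x) = sin(x) at the interior nodes.
  std::vector<double> u(Nx);
  for (int i = 0; i < Nx; ++i) u[i] = std::sin((i + 1) * h);

  const int NT = static_cast<int>(T / k + 0.5);
  for (int n = 1; n <= NT; ++n) {
    // Right-hand side M u^{n-1} (tridiagonal mass matrix times vector).
    std::vector<double> rhs(Nx);
    for (int i = 0; i < Nx; ++i) {
      rhs[i] = 4.0 * h / 6.0 * u[i];
      if (i > 0)      rhs[i] += h / 6.0 * u[i - 1];
      if (i < Nx - 1) rhs[i] += h / 6.0 * u[i + 1];
    }
    // System matrix M + kK, stored by its three diagonals.
    std::vector<double> a(Nx, h / 6.0 - k / h);
    std::vector<double> b(Nx, 4.0 * h / 6.0 + 2.0 * k / h);
    std::vector<double> c(Nx, h / 6.0 - k / h);
    u = thomas(a, b, c, rhs);
  }
  // The continuous solution is u(x,t) = exp(-t) sin(x).
  std::printf("u(pi/2, T) = %.6f, continuous solution = %.6f\n",
              u[Nx / 2], std::exp(-T));
  return 0;
}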

12.3.7 Recapitulation of ODE stability analysis using finite differences


This section is optional. We recall the stability analysis for ODEs. The principal ideas are exactly the same as in Section 6.

12.3.7.1 The Euler method The Euler method is the simplest scheme. It is an explicit scheme and also known as the forward Euler method.
We consider again the IVP from before and the right hand side satisfies again a Lipschitz condition. Ac-
cording to ODE theory (e.g., [103]), there exists a unique solution for all t ≥ 0.
For a numerical approximation of the IVP, we first select a sequence of discrete (time) points:

t0 < t1 < . . . < tN = t0 + T.

Furthermore we set

I_n = [t_{n−1}, t_n],   k_n = t_n − t_{n−1},   k := max_{1≤n≤N} k_n.

The derivation of the Euler method is as follows: approximate the derivative with a forward difference quotient (we sit at the time point t_{n−1} and look forward in time):

y'(t_{n−1}) ≈ (y_n − y_{n−1})/k_n.

Moreover, y'(t_{n−1}) = f(t_{n−1}, y(t_{n−1})). Then the ODE can be approximated as:

(y_n − y_{n−1})/k_n ≈ f(t_{n−1}, y(t_{n−1})).
Thus we obtain the scheme:
Algorithm 12.19 (Euler method). For a given starting point y_0 := y(0) ∈ R^n, the Euler method generates a sequence {y_n}_{n∈N} through

y_n = y_{n−1} + k_n f(t_{n−1}, y_{n−1}),   n = 1, . . . , N,

where y_n := y(t_n).

Remark 12.20 (Notation). The chosen notation is not optimal in the sense that y_n denotes the discrete solution obtained by the numerical scheme and y(t_n) the corresponding (unknown) exact solution. However, in the literature one often abbreviates y_n := y(t_n), which would then denote the exact solution as well. One could add another index y_n^k (k for the discrete solution with step size k) to distinguish explicitly between the discrete and exact solutions. In these notes, we hope that the reader will not confuse the notation, and we use y_n for the discrete solution and y(t_n) for the exact solution.

12.3.7.2 Implicit schemes With the same notation as introduced in Section 12.3.7.1, we define two further
schemes. Beside the Euler method, low-order simple schemes are implicit Euler and the trapezoidal rule. The
main difference is that in general a nonlinear equation system needs to be solved in order to compute the
solution. On the other hand we have better numerical stability properties in particular for stiff problems (an
analysis will be undertaken in Section 12.3.7.3).
The backward Euler method is derived as follows: approximate the derivative with a backward difference quotient (we sit at t_n and look back to t_{n−1}):

y'(t_n) ≈ (y_n − y_{n−1})/k_n.

Consequently, we take the right-hand side f at the current time step, y'(t_n) = f(t_n, y(t_n)), and obtain as approximation

(y_n − y_{n−1})/k_n = f(t_n, y_n).
Consequently, we obtain a scheme in which the right hand side is unknown itself, which leads to a formulation
of a nonlinear system:
Algorithm 12.21 (Implicit (or backward) Euler). The implicit Euler scheme is defined as:

y0 := y(0),
yn − kn f (tn , yn ) = yn−1 , n = 1, . . . , N

In contrast to the Euler method, the ‘right hand side’ function f now depends on the unknown solution yn .
Thus the computational cost is (much) higher than for the (forward) Euler method. But on the contrary, the
method does not require a time step restriction as we shall see in Section 12.3.7.4.
To derive the trapezoidal rule, we take again the difference quotient on the left-hand side but approximate the right-hand side through its mean value:

(y_n − y_{n−1})/k_n = (1/2)(f(t_n, y_n) + f(t_{n−1}, y_{n−1})),
which yields:
Algorithm 12.22 (Trapezoidal rule (Crank-Nicolson)). The trapezoidal rule reads:

y_0 := y(0),
y_n = y_{n−1} + (k_n/2)(f(t_n, y_n) + f(t_{n−1}, y_{n−1})),   n = 1, . . . , N.

It can be shown that the trapezoidal rule is of second order, which means that halving the step size k_n leads to an error that is four times smaller.

12.3.7.3 Numerical analysis In the previous sections, we have constructed algorithms that yield a sequence of discrete solutions {y_n}_{n∈N}. In the numerical analysis our goal is to derive a convergence result of the form

‖y_n − y(t_n)‖ ≤ Ck^α,

where α is the order of the scheme. Such a result tells us that the discrete solution y_n really approximates the exact solution y, and at which rate. To derive error estimates we work with the model problem:

y' = λy,   y(0) = y_0,   y_0, λ ∈ R.   (234)

For linear numerical schemes, convergence follows from stability plus consistency. First of all, we have from the previous section that y_n is obtained for the forward Euler method as:

yn = (1 + kλ)yn−1 , (235)
= BE yn−1 , BE := (1 + kλ). (236)

Let us write the error at each time point t_n as

e_n := y_n − y(t_n) for 1 ≤ n ≤ N.

It holds:

e_n = y_n − y(t_n)
    = B_E y_{n−1} − y(t_n)
    = B_E (e_{n−1} + y(t_{n−1})) − y(t_n)
    = B_E e_{n−1} + B_E y(t_{n−1}) − y(t_n)
    = B_E e_{n−1} − k (y(t_n) − B_E y(t_{n−1}))/k
    = B_E e_{n−1} − k η_{n−1},   with η_{n−1} := (y(t_n) − B_E y(t_{n−1}))/k.

Therefore, the error can be split into two parts:

Definition 12.23 (Error splitting of the model problem). The error at step n can be decomposed as

e_n := B_E e_{n−1} − k η_{n−1}.   (237)

The first term, namely the stability, provides an idea how the previous error en−1 is propagated from tn−1 to tn .
The second term ηn−1 is the so-called truncation error (or local discretization error), which arises because the
exact solution does not satisfy the numerical scheme and represents the consistency of the numerical scheme.
Moreover, ηn−1 yields the speed of convergence of the numerical scheme.
In fact, for the forward Euler scheme in (237), we observe for the truncation error:

η_{n−1} = (y(t_n) − B_E y(t_{n−1}))/k = (y(t_n) − (1 + kλ)y(t_{n−1}))/k   (238)
        = (y(t_n) − y(t_{n−1}))/k − λy(t_{n−1})   (239)
        = (y(t_n) − y(t_{n−1}))/k − y'(t_{n−1}).   (240)

Thus,

y'(t_{n−1}) = (y(t_n) − y(t_{n−1}))/k − η_{n−1},   (241)
which is nothing else than the approximation of the first-order derivative with the help of a difference quotient
plus the truncation error. We investigate these terms further in Section 12.3.7.5 and concentrate first on the
stability estimates in the very next section.

12.3.7.4 Stability The goal of this section is to control the term BE = (1 + kλ). Specifically, we will justify
why |BE | ≤ 1 should hold. The stability is often related to non-physical oscillations of the numerical solution.
We recapitulate (absolute) stability and A-stability. From the model problem

y'(t) = λy(t),   y(t_0) = y_0,   λ ∈ C,

we know the solution y(t) = y_0 exp(λt). For t → ∞ the solution is characterized by the sign of Re λ:

Re λ < 0 ⇒ |y(t)| = |y_0| exp(t Re λ) → 0,
Re λ = 0 ⇒ |y(t)| = |y_0| exp(t Re λ) = |y_0|,
Re λ > 0 ⇒ |y(t)| = |y_0| exp(t Re λ) → ∞.
In view of a good numerical scheme, the first case is particularly interesting: can the scheme produce a bounded discrete solution when the continuous solution has this property?

Definition 12.24 ((Absolute) stability). A (one-step) method is absolutely stable for λk ≠ 0 if its application to the model problem produces, in the case Re λ ≤ 0, a sequence of bounded discrete solutions: sup_{n≥0} |y_n| < ∞.

To find the stability region, we work with the stability function R(z), where z = λk. The region of absolute stability is defined as:

SR = {z = λk ∈ C : |R(z)| ≤ 1}.
Remark 12.25. Recall that R(z) := BE .
The stability functions of the explicit Euler, the implicit Euler and the trapezoidal rule are given by:

Proposition 12.26. For the simplest time-stepping schemes, forward Euler, backward Euler and the trapezoidal rule, the stability functions R(z) read, respectively:

R(z) = 1 + z,
R(z) = 1/(1 − z),
R(z) = (1 + z/2)/(1 − z/2).

Proof. We take again the model problem y' = λy. Let us discretize it with the forward Euler method:

(y_n − y_{n−1})/k = λy_{n−1}
⇒ y_n = (1 + λk)y_{n−1} = (1 + z)y_{n−1} = R(z)y_{n−1}.

For the implicit Euler method we obtain:

(y_n − y_{n−1})/k = λy_n
⇒ (1 − λk)y_n = y_{n−1}
⇒ y_n = (1/(1 − z)) y_{n−1} =: R(z)y_{n−1}.

The procedure for the trapezoidal rule is analogous.
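The stability functions can also be checked numerically. The following C++ sketch evaluates |R(z)| for the three schemes at a few points of the left half plane and at z = −i (see Section 12.3.8 below for the role of the imaginary axis); all names are illustrative.

#include <complex>
#include <cstdio>

using C = std::complex<double>;

// Stability functions of forward Euler, backward Euler and the trapezoidal rule.
C R_forward(C z)  { return 1.0 + z; }
C R_backward(C z) { return 1.0 / (1.0 - z); }
C R_cn(C z)       { return (1.0 + 0.5 * z) / (1.0 - 0.5 * z); }

int main()
{
  const C pts[] = {C(-0.5, 0.0), C(-4.0, 0.0), C(0.0, -1.0)};
  for (const C& z : pts)
    std::printf("z = (%5.1f, %5.1f): |R_FE| = %.4f, |R_BE| = %.4f, |R_CN| = %.4f\n",
                z.real(), z.imag(),
                std::abs(R_forward(z)), std::abs(R_backward(z)), std::abs(R_cn(z)));
  return 0;
}

For z = −4 the forward Euler value exceeds 1 (conditional stability), while backward Euler and Crank-Nicolson stay below 1.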


Definition 12.27 (A-stability). A difference method is A-stable if its stability region is part of the absolute
stability region:
{z ∈ C : Re z ≤ 0} ⊂ SR,
where Re denotes the real part of the complex number z. A brief introduction to complex numbers can be found in any calculus lecture dealing with those or in the book [114].
In other words:

Definition 12.28 (A-stability). Let {yn }n the sequence of solutions of a difference method for solving the ODE
model problem. Then, this method is A-stable if for arbitrary λ ∈ C− = {λ : Re(λ) ≤ 0} the approximate
solutions are bounded (or even contractive) for arbitrary, but fixed, step size k. That is to say:

|yn+1 | ≤ |yn | < ∞ for n = 1, 2, 3, . . .

Remark 12.29. A-stability is attractive since in particular for stiff problems we can compute with arbitrary
step sizes k and do not need any step size restriction.
Proposition 12.30. The explicit Euler scheme cannot be A-stable.
Proof. For the forward Euler scheme, it is R(z) = 1 + z. For |z| → ∞ it holds |R(z)| → ∞, which violates the definition of A-stability.
Remark 12.31. More generally, explicit schemes can never be A-stable.
Example 12.32. We illustrate the previous statements.
1. In Proposition 12.26 we have seen that for the forward Euler method it holds:

yn = R(z)yn−1 ,

where R(z) = 1 + z. Thus, according to Definitions 12.27 and 12.28, we obtain stability when the sequence {y_n} is contracting:

|R(z)| = |1 + z| ≤ 1.   (251)

Thus, if the modulus of λ (in z = λk, with Re λ ≤ 0) is very big, we must choose a very small time step k in order to achieve |1 + λk| ≤ 1. Otherwise the sequence {y_n}_n will increase and thus diverge (recall that stability is defined with respect to decreasing parts of functions: the continuous solution is bounded and consequently the numerical approximation should be bounded, too). In conclusion, the forward Euler scheme is only conditionally stable, i.e., it is stable provided that (251) is fulfilled.
2. For the implicit Euler scheme, we see that a large λ and a large k even both help to stabilize the iteration scheme (but be careful: the implicit Euler scheme actually stabilizes too much, because it computes contracting sequences also for cases where the continuous solution would grow). Thus, no time step restriction is required. Consequently, the implicit Euler scheme is well suited for stiff problems with large parameters/coefficients λ.
Remark 12.33. The previous example shows that a careful design of the appropriate discretization scheme requires some work: there is no a priori best scheme. Some schemes require time step size restrictions in the case of large coefficients (explicit Euler). On the other hand, the implicit Euler scheme does not need step restrictions but may, in certain cases, have too much damping. Which scheme should be employed depends in the end on the problem itself and must be considered carefully for each problem anew.

12.3.7.5 Consistency / local discretization error - convergence order We address now the second ‘error source’ in (237). The consistency determines the precision of the scheme and finally carries the local rate of consistency over to the global rate of convergence. To determine the consistency, we assume sufficient regularity of the exact solution such that we can apply Taylor expansion. The idea is to insert the Taylor expansion of the exact solution into the discrete scheme; the lowest-order remainder term then determines the local consistency of the scheme.

We briefly recall the (formal) Taylor expansion. For a function f(x) we develop at a point a ≠ x the Taylor series:

T(f(x)) = Σ_{j=0}^∞ (f^{(j)}(a)/j!)(x − a)^j.

Let us continue with the forward Euler scheme and let us now specify the truncation error η_{n−1} in (241):

y'(t_{n−1}) + η_{n−1} = (y(t_n) − y(t_{n−1}))/k.

To this end, we need information about the solution at the old time step t_{n−1} in order to eliminate y(t_n). Thus we use Taylor and develop y(t_n) at the time point t_{n−1}:

y(t_n) = y(t_{n−1}) + y'(t_{n−1})k + (1/2)y''(τ^{n−1})k².
We obtain the difference quotient of forward Euler by the following manipulation:

(y(t_n) − y(t_{n−1}))/k = y'(t_{n−1}) + (1/2)y''(τ^{n−1})k.

We observe that the first term corresponds to (240). Thus the remainder term is

(1/2)y''(τ^{n−1})k,

and therefore the truncation error η_{n−1} can be estimated as

‖η_{n−1}‖ ≤ max_{t∈[0,T]} (1/2)‖y''(t)‖ k = O(k).
Therefore, the convergence order is k (namely linear convergence speed).

12.3.7.6 Convergence With the help of the two previous subsections, we can easily show the following error
estimates:
Theorem 12.34 (Convergence of implicit/explicit Euler). We have

max_{t_n∈I} |y_n − y(t_n)| ≤ c(T, y) k = O(k),

where k := max_n k_n.
Proof. The proof holds for both schemes, except that when we plug in the stability estimate one should recall that the backward Euler scheme is unconditionally stable, whereas the forward Euler scheme is only stable when the step size k is sufficiently small. It holds for 1 ≤ n ≤ N:

|y_n − y(t_n)| = ‖e_n‖ = ‖k Σ_{j=0}^{n−1} B_E^{n−j} η_j‖
 ≤ k Σ_{j=0}^{n−1} ‖B_E^{n−j} η_j‖   (triangle inequality)
 ≤ k Σ_{j=0}^{n−1} ‖B_E^{n−j}‖ ‖η_j‖
 ≤ k Σ_{j=0}^{n−1} ‖B_E^{n−j}‖ Ck   (consistency)
 ≤ k Σ_{j=0}^{n−1} 1 · Ck   (stability)
 = kNCk
 = TCk,   where we used k = T/N,
 = C(T, y)k
 = O(k).
Theorem 12.35 (Convergence of the trapezoidal rule). We have

max_{t_n∈I} |y_n − y(t_n)| ≤ c(T, y) k² = O(k²).
The main message is that both Euler schemes converge with order O(k) (which is slow), while the trapezoidal rule converges quadratically, i.e., with order O(k²).
Let us justify the convergence order for the forward Euler scheme in more detail now.

Theorem 12.36. Let I := [0, T] be the time interval and f : I × R^d → R^d continuously differentiable and globally Lipschitz-continuous with respect to y:

‖f(t, y) − f(t, z)‖_2 ≤ L‖y − z‖_2

for all t ∈ I and y, z ∈ R^d. Let y be the solution to

y' = f(t, y),   y(0) = y_0.

Furthermore, let y_n, n = 1, . . . , N, be the approximations obtained with the Euler scheme at the nodal points t_n ∈ I. Then it holds

‖y(t_n) − y_n‖_2 ≤ ((1 + Lk)^n − 1)/(2L) ‖y''‖ k ≤ (e^{LT} − 1)/(2L) ‖y''‖ k = c(T, y)k = O(k)

for n = 0, . . . , N, where ‖y''‖ := max_{t∈Ī} ‖y''(t)‖_2.

Proof. The proof follows [70], but consists in working out the steps shown at the beginning of Section 12.3.7.3,
Section 12.3.7.4, and Section 12.3.7.5.
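The orders stated in Theorems 12.34-12.36 are easy to observe numerically. The following C++ sketch integrates the model problem y' = λy, y(0) = 1, with the three schemes, halves the step size, and prints the observed order log₂(e_k/e_{k/2}); names and parameter values are illustrative.

#include <cmath>
#include <cstdio>

// Error at time T of forward Euler, backward Euler, trapezoidal rule for y' = lambda*y.
double error_at_T(int scheme, double lambda, double T, int N)
{
  const double k = T / N;
  double y = 1.0;
  for (int n = 0; n < N; ++n) {
    if (scheme == 0)      y = (1.0 + k * lambda) * y;                                  // forward Euler
    else if (scheme == 1) y = y / (1.0 - k * lambda);                                  // backward Euler
    else                  y = y * (1.0 + 0.5 * k * lambda) / (1.0 - 0.5 * k * lambda); // trapezoidal
  }
  return std::fabs(y - std::exp(lambda * T));
}

int main()
{
  const double lambda = -1.0, T = 1.0;
  const char* name[] = {"forward Euler", "backward Euler", "trapezoidal"};
  for (int s = 0; s < 3; ++s) {
    double e_prev = 0.0;
    for (int N = 10; N <= 160; N *= 2) {
      const double e = error_at_T(s, lambda, T, N);
      if (e_prev > 0.0)
        std::printf("%-15s N = %4d, error = %.3e, observed order = %.2f\n",
                    name[s], N, e, std::log2(e_prev / e));
      e_prev = e;
    }
  }
  return 0;
}

The printed orders approach 1 for both Euler schemes and 2 for the trapezoidal rule.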

12.3.8 Refinements of A-stability


We extend the terminology of A-stability [106]. Moreover, the previous stability functions shall also be used
to represent physical oscillations, which are related to the imaginary unit i. To this end, we have:

1. A-stability:
|R(z)| ≤ 1, z≤0
This ensures that the discrete solution remains bounded.
2. Strict A-stability:
|R(z)| ≤ 1 − ck,   z ≤ −1.
The discrete solution is bounded for inhomogeneous right-hand sides or irregular initial data.
3. Strong A-stability:
|R(z)| ≤ κ < 1, z ≤ −1.
Damping of high-frequency errors and robust against local errors. See for instance Section 2.7 for an
illustration.
4. Numerical dissipation. For physical oscillations, the numerically introduced dissipation should be as
small as possible. This means:
R(±i) ∼ 1.

It is clear that only implicit schemes can be A-stable. Thus, explicit schemes play only a minor role in solving
PDEs numerically. Let us provide more examples. For the implicit Euler scheme, it holds

R(z) = 1/(1 − z),

and at z = −i:

|R(−i)| = |1/(1 + i)| = 1/√2 < 1.
We nicely see that the damping in the implicit Euler scheme is too strong and will damp out physical oscillations. Therefore, this scheme is not suited for wave equations. Now, let us consider the Crank-Nicolson scheme:

lim_{z→∞} (1 + z/2)/(1 − z/2) = −1,

which means A-stability. Also we have for z = −i:

|R(−i)| = |(1 − i/2)/(1 + i/2)| = 1.

Thus, physical oscillations can be represented perfectly. However, small disturbances (e.g., even round-off errors) can lead to a blow-up of the solution; see again Section 2.7. One possibility is a k-shift towards the implicit side [100]:

θ = 0.5 + k.

This scheme is strictly A-stable and still of second order, as proven in [100].

12.3.9 Stiff problems


An essential difficulty in developing stable numerical schemes for temporal discretization using finite differences
is associated with stiffness, which we shall define in the following. Stiffness is very important in both ODE
and (time-dependent) PDE applications. The latter situation will be discussed in Section 12.3.10.
Definition 12.37. An IVP is called stiff (along a solution y(t)) if the eigenvalues λ(t) of the Jacobian f'(t, y(t)) yield a large stiffness ratio:

κ(t) := max_{Re λ(t)<0} |Re λ(t)| / min_{Re λ(t)<0} |Re λ(t)| ≫ 1.
In the above model problem, the eigenvalue corresponds directly to the coefficient λ.

It is, however, not quite correct to classify every ODE with large coefficients |λ| ≫ 1 as a stiff problem: there, the Lipschitz constant is already large and thus the discretization error alone asks for relatively small time steps. Rather, stiff problems are characterized by solutions containing various components with (significantly) different evolutions over time; that certainly appears in ODE systems in which we seek y(t) ∈ R^n rather than y(t) ∈ R.

For PDEs, we know that the stiffness matrix of the Laplacian satisfies K ∼ 1/h², and therefore the condition number behaves like κ ∼ 1/h² (see Section 10.1). Thus, problems involving the Laplacian are always stiff.

12.3.10 Numerical analysis: stability analysis


We apply the concepts of the previous sections now to the heat equation and simply replace y by u. The heat equation is a good example of a stiff problem (for the definition we refer back to Section 12.3.9), and therefore implicit methods are preferred in order to avoid the very small time steps that would otherwise be needed for stability. We substantiate this claim in the current section.
The model problem is

∂t u − ∆u = 0 in Ω × I
u = 0 on ∂Ω × I,
u(0) = u0 in Ω × {0}.

Spatial discretization yields:

(∂_t u_h, φ_h) + (∇u_h, ∇φ_h) = 0,
(u_h^0, φ_h) = (u^0, φ_h).

Backward Euler time discretization yields:

(u_h^n, φ_h) + k(∇u_h^n, ∇φ_h) = (u_h^{n−1}, φ_h).

It holds
Proposition 12.38. A basic stability estimate for the above problem is:

kunh k ≤ ku0h k ≤ ku0 k (252)

Proof. For the stability estimate we proceed as earlier and make a special choice for the test function: φ_h := u_h^n. We then obtain:

‖u_h^n‖_{L^2}^2 + k‖∇u_h^n‖_{L^2}^2 = (u_h^{n−1}, u_h^n) ≤ (1/2)‖u_h^{n−1}‖_{L^2}^2 + (1/2)‖u_h^n‖_{L^2}^2,

which yields:

(1/2)‖u_h^n‖_{L^2}^2 − (1/2)‖u_h^{n−1}‖_{L^2}^2 + k‖∇u_h^n‖_{L^2}^2 ≤ 0,   ∀n = 1, . . . , N.

Of course it further holds:

(1/2)‖u_h^n‖_{L^2}^2 − (1/2)‖u_h^{n−1}‖_{L^2}^2 ≤ 0   ⇔   ‖u_h^n‖_{L^2} ≤ ‖u_h^{n−1}‖_{L^2},   ∀n = 1, . . . , N.

Applying this estimate recursively back to n = 0 yields

‖u_h^n‖_{L^2} ≤ ‖u_h^0‖_{L^2}.

It remains to show the second estimate, ‖u_h^0‖ ≤ ‖u^0‖. In the initial condition we choose the test function ϕ_h := u_h^0 and obtain with the Cauchy-Schwarz inequality:

‖u_h^0‖² = (u_h^0, u_h^0) = (u^0, u_h^0) ≤ ‖u^0‖ ‖u_h^0‖,

which shows the assertion.

12.3.10.1 Stability analysis for backward Euler A more detailed stability analysis that brings in the spatial discretization properties is based on the matrix notation and reads as follows. We recapitulate and start from (see Section 12.3.6):

M u_h^n + kK u_h^n = M u_h^{n−1}.

Then:

M u_h^n + kK u_h^n = M u_h^{n−1}
⇒ u_h^n + kM^{-1}K u_h^n = u_h^{n−1}
⇒ (I + kM^{-1}K) u_h^n = u_h^{n−1}.

Compare this to the ODE notation:

(1 + kλ) y^n = y^{n−1}.

Thus, the role of λ is taken by M^{-1}K. We know for the condition numbers:

κ(M) = O(1),   κ(K) = O(h^{-2}),   h → 0.

We proceed similarly to the ODE case and obtain

u_h^n = (I + kM^{-1}K)^{-1} u_h^{n−1}.

The goal is now to show that

|(I + kM^{-1}K)^{-1}| ≤ 1.

To this end, let µ_j, j = 1, . . . , M = dim(V_h), be the (positive) eigenvalues of the matrix M^{-1}K, ranging from the smallest eigenvalue µ_1 ∼ O(1) to the largest eigenvalue µ_M ∼ O(h^{-2}); for a proof, we refer the reader to [82][Section 7.7]. Then:

|(I + kM^{-1}K)^{-1}| = max_j 1/(1 + kµ_j) = 1/(1 + kµ_1).

To have a stable scheme, we must have

max_j 1/(1 + kµ_j) < 1,

which is satisfied for arbitrary k > 0 since all µ_j > 0. We then have again, as in (252):

‖u_h^n‖ ≤ ‖u_h^{n−1}‖   ⇒   ‖u_h^n‖ ≤ ‖u_h^0‖.

Furthermore, we see that the largest eigenvalue µ_M increases as O(h^{-2}) when h tends to zero. For the backward Euler (or Crank-Nicolson) scheme this is not a problem, and both schemes are unconditionally stable.

12.3.10.2 Stability analysis for explicit Euler Here, the scheme reads:

u_h^n = (I − kM^{-1}K) u_h^{n−1},

yielding

|(I − kM^{-1}K)| = max_j |1 − kµ_j|.

As before, µ_M = O(h^{-2}). To establish a stability estimate of the form ‖u_h^n‖ ≤ ‖u_h^{n−1}‖, it is required in particular that

|1 − kµ_M| ≤ 1.

Resolving this estimate, we obtain

kµ_M ≤ 2   ⇔   k ≤ 2/µ_M = O(h²),

thus

k ≤ Ch²,   C > 0.   (253)

Here, we only have conditional stability in time: the time step size k depends on the spatial discretization parameter h. Recall that in terms of accuracy and discretization errors we want to work with small h; then the time step size has to be chosen extremely small in order to have a stable scheme. In practice this is in most cases not attractive at all, and for this reason the forward Euler scheme plays hardly any role for PDEs, even though a single explicit step is cheaper than one step of an implicit scheme. A numerical sketch estimating the critical step size is given below.
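The critical step size k = 2/µ_M can be estimated numerically by a power iteration on M^{-1}K, using the 1D matrices from Section 12.3.5. The sketch below does this; the reference value 12/h², stated here for orientation, is the known asymptotic largest eigenvalue of M^{-1}K for 1D linear elements (an assumption not derived in these notes).

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Power iteration for the largest eigenvalue mu_M of M^{-1}K (uniform mesh
// size h, Nx interior nodes). Each iteration applies K and solves with M.
int main()
{
  const int Nx = 99;
  const double h = 1.0 / (Nx + 1);
  std::vector<double> v(Nx, 1.0), w(Nx);
  double mu = 1.0;
  for (int it = 0; it < 200; ++it) {
    // w = K v with K = h^{-1} tridiag(-1, 2, -1)
    for (int i = 0; i < Nx; ++i) {
      w[i] = 2.0 / h * v[i];
      if (i > 0)      w[i] -= v[i - 1] / h;
      if (i < Nx - 1) w[i] -= v[i + 1] / h;
    }
    // Solve M v = w with M = (h/6) tridiag(1, 4, 1) (Thomas algorithm in place).
    std::vector<double> b(Nx, 4.0 * h / 6.0);
    for (int i = 1; i < Nx; ++i) {
      const double m = (h / 6.0) / b[i - 1];
      b[i] -= m * (h / 6.0);
      w[i] -= m * w[i - 1];
    }
    v[Nx - 1] = w[Nx - 1] / b[Nx - 1];
    for (int i = Nx - 2; i >= 0; --i)
      v[i] = (w[i] - (h / 6.0) * v[i + 1]) / b[i];
    // The max-norm growth factor converges to mu_M; normalize for the next step.
    mu = 0.0;
    for (double vi : v) mu = std::max(mu, std::fabs(vi));
    for (double& vi : v) vi /= mu;
  }
  std::printf("h = %.4f, mu_M approx %.1f (12/h^2 = %.1f), k_crit = 2/mu_M = %.2e\n",
              h, mu, 12.0 / (h * h), 2.0 / mu);
  return 0;
}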

Exercise 10. Derive stability estimates for the Crank-Nicolson scheme.

12.3.11 Convergence results


We follow [99]. To this end, we consider a slightly more general problem statement than in the previous sections of this chapter.
Formulation 12.39. Let Ω ⊂ Rn be a bounded domain with n = 1, 2, 3 and all given data sufficiently smooth.
Find u : Ω × (0, ∞) → R such that

∂t u + A(u) = f in Ω × (0, ∞)
u(0) = u0 in Ω × {0}
u(x, t) = uD on ∂ΩD × (0, ∞)
α∂n u + δu = uR on ∂ΩR × (0, ∞)

with the operator (see also Section 4.3.2)

A = −∇ · (α∇) + β · ∇ + γ.

Then, discretize this problem in space with a Galerkin FEM method of order r ≥ 2. Be careful, linear FEM
have the order r = 2 (not starting with 1!).

287
12. METHODS FOR TIME-DEPENDENT PDES: PARABOLIC AND HYPERBOLIC PROBLEMS
Theorem 12.40 (Convergence order; e.g., [93]; summarized in [99]). Let all given data be sufficiently regular and the spatial discretization appropriate. Let the orders for spatial and temporal discretization be r ≥ 2 and s ≥ 1, respectively. Then, we have optimal convergence of u_{kh}^n to u(t_n) for k, h → 0, uniformly on bounded time intervals, i.e.,

max_{0≤t_n≤T} ‖u(t_n) − u_{kh}^n‖_{L^2(Ω)} = O(h^r + k^s).

In the case of irregular data, the convergence rate may be reduced to o(1) in the worst case.
Remark 12.41. In several papers in the ’80s, Rannacher proved how the optimal convergence order can be retained in the case of rough or irregular initial data: [99, 100].
Remark 12.42. Some simple convergence analysis computations to show computationally the convergence
order are performed in Section 12.3.12.1.

12.3.12 Numerical tests
The series of numerical tests in this section has three goals:

• to show some figures to get an impression about simulation results;


• to show computationally that the stability results are indeed satisfied or violated;
• to show computationally satisfaction of the (discrete) maximum principle.

We consider the heat equation

∂t u − ∆u = 0, in Ω × I
u = 0 on ∂Ω × I,
u(x, 0) = sin(x) sin(y) in Ω × {0}.

We compute 5 time steps, i.e., T = 5s, with time step size k = 1s. The computational domain is Ω = (0, π)2 .
We use a One-Step-θ scheme with θ = 0 (forward Euler) and θ = 1 (backward Euler). The full programming
code can be found in Section 17.2.8.
The graphical results are displayed in Figures 46 - 49. Due to stability reasons, namely violation of the condition k ≤ ch², the forward Euler scheme is unstable (Figures 48 - 49). By reducing the time step size to k = 0.01s (the critical time step value could have been computed; here, the value k = 0.01s was simply found by trial and error), we obtain stable results for the forward Euler scheme. These findings are displayed in Figure 50. However, to reach T = 5s, we then need to compute 500 time steps, which is in the end more expensive, despite being an explicit scheme, than the implicit Euler method.

Figure 46: Heat equation with θ = 1 (backward Euler) at T = 0, 1, 2, 5. The solution is stable and satisfies
the parabolic maximum principle. The color scale is adapted at each time to the minimum and
maximum values of the solution.

Figure 47: Heat equation with θ = 1 (backward Euler) at T = 0, 1, 2, 5. The solution is stable and satisfies the
parabolic maximum principle. The color scale is fixed between 0 and 1.


Figure 48: Heat equation with θ = 0 (forward Euler) at T = 0, 1, 2, 5. The solution is unstable, showing non-
physical oscillations, because the time step restriction (253) is violated. The color scale is adapted
at each time to the minimum and maximum values of the solution.

Figure 49: Heat equation with θ = 0 (forward Euler) at T = 0, 1, 2, 5. The solution is unstable, showing
non-physical oscillations, because the time step restriction (253) is violated. The color scale is fixed
between 0 and 1.

Figure 50: Heat equation with θ = 0 at T = 0, 1, 2, 5 and time step size k = 0.01s. Here the results are stable
since the time step size k has been chosen sufficiently small to satisfy the stability estimate (253).
Of course, to obtain results at the same times T as before, we now need to compute more solutions,
i.e., N = 500, rather than 5 as in the other tests. The color scale is adapted at each time to the
minimum and maximum values of the solution.

12.3.12.1 Computational convergence analysis With the help of Chapter 16, we perform a brief temporal
convergence analysis for θ = 1 and θ = 0.5. In space, the mesh is five times refined. As time step sizes, we
choose k = 1, 0.5, 0.25. As goal functional, we choose the spatial L2 norm at the end time value T = 25s:
J(u_{kh}(T)) = ‖u_{kh}(T)‖_{L^2(Ω)}.
Then, for the backward Euler scheme, we obtain:
alpha = 1/log(2) * log((6.644876e-02 - 1.236463e-01) /(1.236463e-01 - 2.485528e-01) )
ans = abs(alpha) = 1.1268
and for the Crank-Nicolson scheme:
alpha = 1/log(2) * log((2.352296e-05 - 4.243702e-02)/(4.243702e-02 - 1.660507e-01))
ans = abs(alpha) = 1.5434
For the latter, the value is lower than the expected second order, which might be due to the fact that the spatial mesh is still too coarse.
See the hints in Section 16.3.
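The extrapolation formula α = log((J_k − J_{k/2})/(J_{k/2} − J_{k/4}))/log 2 used above can be wrapped into a small helper; the following C++ sketch reproduces the rates from the goal-functional values listed above (inserted coarsest step size first). The function name is an illustrative choice.

#include <cmath>
#include <cstdio>

// Observed convergence order from three goal-functional values computed
// with step sizes k, k/2 and k/4.
double observed_order(double J_k, double J_k2, double J_k4)
{
  return std::log(std::fabs((J_k - J_k2) / (J_k2 - J_k4))) / std::log(2.0);
}

int main()
{
  std::printf("backward Euler: %.4f\n",
              observed_order(2.485528e-01, 1.236463e-01, 6.644876e-02));
  std::printf("Crank-Nicolson: %.4f\n",
              observed_order(1.660507e-01, 4.243702e-02, 2.352296e-05));
  return 0;
}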

12.3.13 Student project in the FEM-C++ course in the summer 2018


We consider another numerical example with the heat equation inspired from control engineering. This test
and the corresponding C++ code were designed from scratch in the FEM-C++ class in the summer of 2018 at LUH by J. Goldenstein and J. Grashorn [60]. The graphical output was done with MATLAB using the MEX functionality; specifically, the C++ code delivers csv files that can be read by MATLAB.
We consider the following heat equation. Let Ω ⊂ R2 and find u = u(x, t) : Ω × I → R:
∂t u − div(α∇u) = fs (x, t), in Ω × I,
u=0 on ∂Ω × I,
u(x, 0) = 0 in Ω × {0}.
Here α describes the heat conductivity and fs the heat energy. In the numerical tests, α = 10 is chosen.
Besides code development, one main goal of the simulations was to compare again different time-stepping schemes with respect to their stability:

• θ = 0: explicit Euler (simulation results displayed in Figure 53);
• θ = 0.5 + k: shifted Crank-Nicolson (simulation results displayed in Figure 54).
The initial heat sources were described as sketched in Figure 51 [left].
In particular the right hand side fs (x, t) is itself determined by solving another ordinary differential equation
for the heat energy:
∂t Θ = Θ0 + ω(ΘH,C − Θ(t)),
with the evolution shown in Figure 51 [right]. Here ω = 0.6. We set:
fs (x, t) = Θ(t).


Figure 51: Left: heat sources fs (x, t) := Θ(t) in the domain Ω. At right, the evolution of Θ(t) is shown.
Figures provided by [60].

The parameters for the simulations are: the time step size k = 0.1s, the end time value Tmax = 5, and
furthermore Θ0 = 0 and ΘH = 1 and ΘC = −1. In the spatial discretization 20 nodes in each direction are
chosen. The resulting force data fs (x, t) are plotted in Figure 52 and the simulations results of solving the
final heat equation are displayed in Figure 53 and Figure 54.

Figure 52: Evolution of fs (x, t). Figures provided by [60].


Figure 53: Numerical simulations using the explicit Euler scheme. Solution plots at t = 0, 0.2, 0.5, 1, 2, 5 are
displayed. We observe blow-up since the CFL-condition k ≤ ch2 for numerical stability in time is
violated. Figures provided by [60].


Figure 54: Numerical simulations using the shifted Crank-Nicolson scheme. Solution plots at t =
0, 0.2, 0.5, 1, 2, 5 are displayed. Here, we observe stable results. Figures provided by [60].

12.4 Time discretization by operator splitting
Originally, time discretization schemes with operator splitting were introduced mainly for the Navier-Stokes equations, in order to decouple the difficulties associated with the nonlinearity (the convection term) and the incompressibility condition. However, also for other types of problems, the Fractional-Step-θ scheme, the most prominent technique, has proven to be very advantageous. Its most striking features are its little numerical dissipation and its strong A-stability, such that we have the full smoothing property.

12.4.1 Derivation
Let V be a real Hilbert space and let A : V → V . We consider as model problem the initial value problem:

∂t u + A(u) = f, u(0) = u0 .

In the following we split A into two operators A1 and A2 such that

A = A_1 + A_2.

As before, let k be the time step size.

12.4.2 Peaceman-Rachford time discretization


12.4.2.1 Basic scheme A very old and famous scheme is the Peaceman-Rachford scheme:
Formulation 12.43 (Peaceman-Rachford time discretization). Let u^0 = u_0 be the discrete initial value at n = 0. For n ≥ 0, let u^n be the previous time step solution. We compute u^{n+1} in a two-step procedure via the intermediate value u^{n+1/2}:

(u^{n+1/2} − u^n)/(k/2) + A_1(u^{n+1/2}) + A_2(u^n) = f^{n+1/2},
(u^{n+1} − u^{n+1/2})/(k/2) + A_1(u^{n+1/2}) + A_2(u^{n+1}) = f^{n+1},

where u^{n+α} := u(t_{n+α}) and f^{n+α} := f(t_{n+α}).

12.4.2.2 Numerical analysis: stability and convergence We follow [29], but also refer to the references cited
therein.
Let us work in R^n. Let f = 0 and u_0 ∈ R^n, and let A ∈ R^{n×n} be symmetric and positive definite. Furthermore,

A_1 = αA,   A_2 = βA,

with α + β = 1 and 0 < α, β < 1.

From ODE theory it is well known that the matrix exponential yields the solution of the above model problem:

u(t) = exp(−tA) u_0.

12.4.2.2.1 Stability The goal is now to show a stability estimate of the form

u^{n+1} = R(z) u^n

with |R(z)| ≤ 1, similar to Section 12.3.7.4.

To derive R(z) for a splitting scheme, we combine the two equations into one single equation and then proceed as in Section 12.3.7.4. Using the splitting à la Peaceman-Rachford and the definitions of A_1 and A_2, we obtain:

u^{n+1} = (I + β(k/2)A)^{-1}(I − α(k/2)A)(I + α(k/2)A)^{-1}(I − β(k/2)A) u^n.

Since A is symmetric positive definite, there exists a basis of eigenvectors. Consequently, it holds componentwise

u_i^{n+1} = (1 + β(k/2)λ_i)^{-1}(1 − α(k/2)λ_i)(1 + α(k/2)λ_i)^{-1}(1 − β(k/2)λ_i) u_i^n,

where λ_i > 0 for i = 1, . . . , N denotes the i-th eigenvalue of the matrix A. We assume that the eigenvalues are ordered as

λ_1 ≤ λ_2 ≤ . . . ≤ λ_N.
In order to analyze further, let us consider the rational function (with z = kλ_i)

R(z) = ((1 − (α/2)z)(1 − (β/2)z)) / ((1 + (α/2)z)(1 + (β/2)z)).

Careful inspection shows that this stability function shares, at first view, some similarities with the Crank-Nicolson scheme. Indeed, the Peaceman-Rachford scheme is unconditionally stable when

|R(z)| < 1 for all z > 0,

which is the case for the rational function under consideration. However, we have

lim_{z→∞} |R(z)| = 1,

and therefore the scheme is only A-stable but not strongly A-stable. For stiff problems with λ_N/λ_1 ≫ 1, the different components associated with small and large eigenvalues of A are not simultaneously damped, and the scheme may possibly suffer from irregular initial data or numerical noise in long-term simulations.
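The claimed behavior of R(z) can be verified numerically; the following C++ sketch evaluates the Peaceman-Rachford stability function for α = β = 1/2 at increasing z > 0 and shows |R(z)| < 1 with |R(z)| → 1.

#include <cmath>
#include <cstdio>

// Peaceman-Rachford stability function for the model problem,
// R(z) = (1 - a z/2)(1 - b z/2) / ((1 + a z/2)(1 + b z/2)).
double R_pr(double z, double a, double b)
{
  return (1.0 - 0.5 * a * z) * (1.0 - 0.5 * b * z)
       / ((1.0 + 0.5 * a * z) * (1.0 + 0.5 * b * z));
}

int main()
{
  const double a = 0.5, b = 0.5;
  for (double z = 1.0; z <= 1.0e6; z *= 100.0)
    std::printf("z = %8.0f: |R(z)| = %.6f\n", z, std::fabs(R_pr(z, a, b)));
  return 0;
}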

12.4.2.2.2 Consistency and convergence To derive the order of the scheme, we expand the analytical solution exp(−tA) via Taylor. Let us do this for z = tA:

e^{−z} = 1 − z + (1/2)z² + z² b(z).

On the other hand, the rational function can be rewritten as:

R(z) = 1 − z + (1/2)z² + z² c(z).

For the higher-order terms it holds

lim_{z→0} b(z) = lim_{z→0} c(z) = 0.

Since the first three terms coincide up to z², the scheme is second-order accurate for the model problem. The convergence follows with similar arguments as shown in Section 12.3.7.6.
The convergence follows with similar arguments as shown in Section 12.3.7.6.
Remark 12.44. For α = β = 1/2, the two linear systems in the splitting scheme have the same matrix I + (k/4)A.

12.4.2.3 Application to Navier-Stokes


Formulation 12.45 (Peaceman-Rachford time discretization applied to NSE). Let v^0 = v_0 be the discrete initial value at n = 0. For n ≥ 0, let v^n be the previous time step solution of the velocities. From v^n, we compute v^{n+1/2}, p^{n+1/2}, v^{n+1} via

Step 1: (v^{n+1/2} − v^n)/(k/2) − (ν/2)∆v^{n+1/2} + ∇p^{n+1/2} = f^{n+1/2} + (ν/2)∆v^n − (v^n · ∇)v^n,
        ∇ · v^{n+1/2} = 0,
Step 2: (v^{n+1} − v^{n+1/2})/(k/2) − (ν/2)∆v^{n+1} + (v^{n+1} · ∇)v^{n+1} = f^{n+1} + (ν/2)∆v^{n+1/2} − ∇p^{n+1/2}.

Remark 12.46. In Step 1, we treat the incompressibility constraint and take the previous convection term
time discretization. In Step 2, we then consider the convection term for the current time step. Therefore:
A1 (v) = ∇ · v,
A2 (v) = (v · ∇)v.

12.4.3 The Fractional-Step-θ scheme
12.4.3.1 Basic scheme The idea is to split the current time interval [tn , tn+1 ] into three subintervals. First,
we take A1 implicitly and A2 explicitly, in the second step, we switch the roles, and the third step is the same
as the first.
Formulation 12.47 (Fractional-Step-θ). Let θ ∈ (0, 1/2). Set u^0 = u_0 and n ≥ 0. We compute u^{n+1} in a three-step procedure via the intermediate values u^{n+θ}, u^{n+1−θ} as follows:

(u^{n+θ} − u^n)/(θk) + A_1(u^{n+θ}) + A_2(u^n) = f^{n+θ},
(u^{n+1−θ} − u^{n+θ})/((1 − 2θ)k) + A_1(u^{n+θ}) + A_2(u^{n+1−θ}) = f^{n+1−θ},
(u^{n+1} − u^{n+1−θ})/(θk) + A_1(u^{n+1}) + A_2(u^{n+1−θ}) = f^{n+1}.

Remark 12.48. The Fractional-Step-θ scheme is 2nd order for a special choice of θ and strongly A-stable
(in contrast to Crank-Nicolson or the Peaceman-Rachford scheme). Consequently, the scheme has the full
smoothing property, which is important for rough initial data, rough boundary values or accumulation of other
numerical errors (e.g., quadrature) for long-time computations. In addition, the FS-θ scheme contains little
numerical dissipation, which is very desirable when dealing with elastodynamics (wave equations) and flow
simulations with nonenforced temporal oscillations (e.g., flow benchmarks [118] or FSI benchmarks [77]).

12.4.3.2 Numerical analysis: stability and convergence Let us now analyze the FS-θ scheme. Again we follow closely the original work [29]. We set θ′ = 1 − 2θ. We have

u^{n+1} = (I + αθkA)^{-1}(I − βθkA)(I + βθ′kA)^{-1}(I − αθ′kA)(I + αθkA)^{-1}(I − βθkA) u^n.

12.4.3.2.1 Stability Proceeding as for the Peaceman-Rachford scheme, we use the eigenvalues for each component i:

u_i^{n+1} = ((1 − βθkλ_i)²(1 − αθ′kλ_i)) / ((1 + αθkλ_i)²(1 + βθ′kλ_i)) u_i^n.

For simplicity of presentation, we consider again the rational function R(z), here with z = kλ_i, defined as

R(z) = ((1 − βθz)²(1 − αθ′z)) / ((1 + αθz)²(1 + βθ′z)).
We have in particular

lim_{z→∞} |R(z)| = β/α,

such that we need α ≥ β in order to have unconditional stability for the largest eigenvalues of A.

12.4.3.2.2 Consistency We know for the model problem:

e^{−z} = 1 − z + (1/2)z² + z² b(z).

Thus, we have to see up to which order the FS-θ scheme can match this expansion. First, we rewrite R(z) such that

R(z) = 1 − z + (1/2)z² [1 + (β² − α²)(2θ² − 4θ + 1)] + z² c(z).
The scheme is (only) first-order accurate when

(β² − α²) ≠ 0 and (2θ² − 4θ + 1) ≠ 0.

The scheme is second-order accurate when

(β² − α²) = 0 or (2θ² − 4θ + 1) = 0.

To have second-order accuracy we specifically need

α = β.

Because of α + β = 1 it then follows α = β = 1/2. Or we need

2θ² − 4θ + 1 = 0, i.e., θ = 1 − √2/2

(taking the root with θ < 1/2).
The question is whether both choices are admissible. They are, but for α = β = 1/2 we have

lim_{z→∞} |R(z)| = β/α = 1.

Therefore, this choice is not strongly A-stable.
In order to check the second choice we need to express α and β in terms of θ. We begin with the idea that the system matrices should be the same for all three substeps of the FS-θ scheme. This means:

αθ = β(1 − 2θ).

Together with α + β = 1, it can be inferred that

α = (1 − 2θ)/(1 − θ),   β = θ/(1 − θ).
The requirement α ≥ β then yields

0 < θ < 1/3.

In fact,

θ = 1/3   ⇒   α = β = 1/2.

Therefore, this choice is again only a variant of the Peaceman-Rachford scheme. For θ strictly smaller than 1/3, we have

lim_{z→∞} |R(z)| = β/α = θ/(1 − 2θ) < 1.
For this reason, the scheme is unconditionally stable for this choice of θ and has good smoothing properties for large eigenvalues of A. The most famous choice for θ (verified numerically in the sketch below) is

• θ = 1 − √2/2,
• α = (1 − 2θ)/(1 − θ) = 2 − √2,
• β = θ/(1 − θ) = √2 − 1.
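These relations are easy to verify numerically; the following C++ sketch computes θ, α, β and checks the substep-matrix condition αθ = β(1 − 2θ) as well as the damping limit β/α < 1.

#include <cmath>
#include <cstdio>

int main()
{
  const double theta = 1.0 - std::sqrt(2.0) / 2.0;          // approx 0.2929
  const double alpha = (1.0 - 2.0 * theta) / (1.0 - theta); // = 2 - sqrt(2)
  const double beta  = theta / (1.0 - theta);               // = sqrt(2) - 1
  std::printf("theta = %.6f, alpha = %.6f, beta = %.6f\n", theta, alpha, beta);
  std::printf("alpha*theta = %.6f, beta*(1-2*theta) = %.6f (equal substep matrices)\n",
              alpha * theta, beta * (1.0 - 2.0 * theta));
  std::printf("lim |R(z)| = beta/alpha = %.6f < 1 (strong A-stability)\n",
              beta / alpha);
  return 0;
}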

12.4.3.3 Application to Navier-Stokes


Formulation 12.49 (Fractional-Step-θ time discretization applied to NSE). Let v^0 = v_0 be the discrete initial value at n = 0. For n ≥ 0, let v^n be the previous time step solution of the velocities. From v^n, we solve

Step 1: (v^{n+θ} − v^n)/(θk) − αν∆v^{n+θ} + ∇p^{n+θ} = f^{n+θ} + βν∆v^n − (v^n · ∇)v^n,
        ∇ · v^{n+θ} = 0,
Step 2: (v^{n+1−θ} − v^{n+θ})/((1 − 2θ)k) − βν∆v^{n+1−θ} + (v^{n+1−θ} · ∇)v^{n+1−θ} = f^{n+1−θ} + αν∆v^{n+θ} − ∇p^{n+θ},
Step 3: (v^{n+1} − v^{n+1−θ})/(θk) − αν∆v^{n+1} + ∇p^{n+1} = f^{n+1} + βν∆v^{n+1−θ} − (v^{n+1−θ} · ∇)v^{n+1−θ},
        ∇ · v^{n+1} = 0.
12.4.4 Modified Fractional-Step-θ schemes
We present two recently proposed modifications of the original FS-θ scheme. Here, the second step is replaced
by an extrapolation rather than solving another system.

12.4.4.1 Basic scheme The macro time step is k = t_{n+1} − t_n. Let θ = 1 − 1/√2, let u^0 = u_0 be the initial value, n ≥ 0, and let u^n be the previous time step solution. Then, we solve:

(u^{n+θ} − u^n)/(θk) = f(u^{n+θ}, t_{n+θ}),   (254)

u^{n+1−θ} = ((1 − θ)/θ) u^{n+θ} + ((2θ − 1)/θ) u^n,   (u^{n+1} − u^{n+1−θ})/(θk) = f(u^{n+1}, t_{n+1}).   (255)
Remark 12.50. This scheme is fully implicit, strongly A-stable, 2nd-order accurate.

12.4.4.2 Application to Navier-Stokes (I)

Formulation 12.51.

Step 1: (v^{n+θ} − v^n)/(θk) + [−ν∆v^{n+θ} + v^{n+θ} · ∇v^{n+θ}] + ∇p^{n+θ} = f^{n+θ},   (256)
        ∇ · v^{n+θ} = 0,   (257)
Step 2: v^{n+1−θ} = ((1 − θ)/θ) v^{n+θ} + ((2θ − 1)/θ) v^n,   (258)
Step 3: (v^{n+1} − v^{n+1−θ})/(θk) + [−ν∆v^{n+1} + v^{n+1} · ∇v^{n+1}] + ∇p̃^{n+1} = f^{n+1},   (259)
        ∇ · v^{n+1} = 0,   (260)
Step 4: p^{n+1} = (1 − θ)p^{n+θ} + θp̃^{n+1}.   (261)

12.4.4.3 Application to Navier-Stokes (II)

Formulation 12.52.

Step 1: (v^{n+θ} − v^n)/(θk) + [−ν∆v^{n+θ} + v^{n+θ} · ∇v^{n+θ}] + ∇p^{n+θ} = f^{n+θ},   (262)
        ∇ · v^{n+θ} = 0,   (263)
Step 2: v^{n+1−θ} = ((1 − θ)/θ) v^{n+θ} + ((2θ − 1)/θ) v^n,   (264)
Step 3: (v^{n+1} − v^{n+1−θ})/(θk) + [−ν∆v^{n+1} + v^{n+1} · ∇v^{n+1}] + ∇p^{n+1} = f^{n+1},   (265)
        ∇ · v^{n+1} = 0.   (266)

Remark 12.53. Here we simply take the last pressure solution and do not work with a tilde pressure.

12.5 The discontinuous Galerkin (DG) method for temporal discretization
We now introduce a DG discretization in time. The approach is very similar to the class Numerik 2 [117].

Definition 12.54. Let q ∈ N_0 and I = (0, T). Then, we define the space

X_h^k = {v : I → V_h : v|_{I_n} ∈ P_q(I_n), n = 1, . . . , N},

where

P_q(I_n) = {v : I_n → V_h : v(t) = Σ_{i=0}^q v_i t^i, v_i ∈ V_h}.
In words: the space Xhk is the space of functions on the time interval I with values in the FE space Vh . On
each time interval In the functions vary as polynomials of degree at most q.
We recall from Numerik 2 [117] that the functions v ∈ X_h^k may be discontinuous in time at the discrete time levels t_n. We use the standard notation

v_+^n = lim_{ε→0+} v(t_n + ε),   v_−^n = lim_{ε→0+} v(t_n − ε),   [v^n] = v_+^n − v_−^n.

The third definition is the jump of v at t_n. We formulate the DG method for the heat equation as follows:
Formulation 12.55. Find u_{hk} ∈ X_h^k such that

ā(u_{hk}, ϕ_{hk}) = l̄(ϕ_{hk}) ∀ϕ_{hk} ∈ X_h^k,

with

ā(u_{hk}, ϕ_{hk}) = Σ_{n=1}^N ∫_{I_n} ((∂_t u_{hk}, ϕ_{hk}) + a(u_{hk}, ϕ_{hk})) dt + Σ_{n=2}^N ([u_{hk}^{n−1}], ϕ_+^{n−1}) + (u_{hk,+}^0, ϕ_+^0)

and the right-hand side functional

l̄(ϕ_{hk}) = ∫_I (f, ϕ_{hk}) dt + (u^0, ϕ_+^0).

Because v ∈ X_h^k varies independently on each subinterval I_n, the previous formulation decouples and we obtain:

Formulation 12.56. For n = 1, . . . , N, given u_{hk,−}^{n−1}, find u_{hk} = u_{hk}|_{I_n} ∈ P_q(I_n) such that

∫_{I_n} ((∂_t u_{hk}, ϕ) + a(u_{hk}, ϕ)) dt + (u_{hk,+}^{n−1}, ϕ_+^{n−1}) = ∫_{I_n} (f, ϕ) dt + (u_{hk,−}^{n−1}, ϕ_+^{n−1})   ∀ϕ ∈ P_q(I_n),

where u_{hk,−}^0 = u^0.

Using specific values for q, we obtain concrete time-stepping schemes.


Formulation 12.57 (DG(0)). For q = 0, with u_{hk}^n = u_{hk,−}^n = u_{hk,+}^{n−1}, we obtain for n = 1, . . . , N: Find u_{hk}^n ∈ V_h such that

(u_{hk}^n − u_{hk}^{n−1}, ϕ) + k_n a(u_{hk}^n, ϕ) = ∫_{I_n} (f, ϕ) dt   ∀ϕ ∈ V_h,

with u_{hk}^0 = u^0. This scheme is close to the backward Euler scheme. The only difference is the right-hand side, which is here an average over I_n rather than the evaluation of f at t_n.
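For a scalar model problem u' + λu = f(t), the DG(0) step reduces to (1 + k_nλ)u^n = u^{n−1} + ∫_{I_n} f dt. The following C++ sketch contrasts this with backward Euler, approximating the time integral of f by the midpoint rule; the function names are illustrative choices.

#include <cstdio>
#include <functional>

// One DG(0) step for u' + lambda*u = f(t) on I_n = [t_old, t_old + k]:
//   (1 + k*lambda) u_n = u_{n-1} + \int_{I_n} f dt  (midpoint quadrature here).
double dg0_step(double u_old, double lambda, double k, double t_old,
                const std::function<double(double)>& f)
{
  const double int_f = k * f(t_old + 0.5 * k);
  return (u_old + int_f) / (1.0 + k * lambda);
}

// Backward Euler uses the point value f(t_n) instead of the average.
double backward_euler_step(double u_old, double lambda, double k, double t_old,
                           const std::function<double(double)>& f)
{
  return (u_old + k * f(t_old + k)) / (1.0 + k * lambda);
}

int main()
{
  const auto f = [](double t) { return t; };
  double u_dg = 1.0, u_be = 1.0;
  const double lambda = 1.0, k = 0.1;
  for (int n = 0; n < 10; ++n) {
    u_dg = dg0_step(u_dg, lambda, k, n * k, f);
    u_be = backward_euler_step(u_be, lambda, k, n * k, f);
  }
  std::printf("DG(0): %.6f, backward Euler: %.6f\n", u_dg, u_be);
  return 0;
}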
Formulation 12.58 (DG(1)). For q = 1, we obtain a system of equations. First, we write

u_{hk}(t) = v_0 + v_1 (t − t_{n−1})/k_n,   t ∈ I_n,   v_i ∈ V_h.

Then, we obtain:

(v_0, ϕ) + k_n a(v_0, ϕ) + (v_1, ϕ) + (k_n/2) a(v_1, ϕ) = (u_{hk,−}^{n−1}, ϕ) + ∫_{I_n} (f(s), ϕ) ds   ∀ϕ ∈ V_h,

(k_n/2) a(v_0, ϕ) + (1/2)(v_1, ϕ) + (k_n/3) a(v_1, ϕ) = (1/k_n) ∫_{I_n} (s − t_{n−1})(f(s), ϕ) ds   ∀ϕ ∈ V_h.

12.6 Methods for second-order-in-time hyperbolic problems
For second-order (in time) hyperbolic problems (prototype is the wave equation) the situation is a bit more
tricky since we have a second-order in time derivative.

12.6.1 Problem statement


The equation is given by (see also Section 4.3):

ρ∂_t² u − ∇ · (∇u) = f in Ω × I,
u = 0 on ∂Ω × I,
u(x, 0) = u_0(x) in Ω × {0},
∂_t u(x, 0) = v_0(x) in Ω × {0}.

A novel feature in comparison to parabolic problems from the previous sections is energy conservation.
So far we were concerned with
• Stability;
• Convergence.
Now we also need
• Energy conservation.
Consequently, we need time-discretizations that have the boundary of their stability region on the imaginary
axis in order to conserve the energy on the time-discrete level; see Section 12.3.8. As we will see, this further
reduces the choices of ‘good’ time-stepping schemes. The wave equation is important in many applications
and for this reason a zoo of time-stepping schemes (the most prominent being the Newmark scheme) has been
proposed. Finite difference schemes discretizing directly the second-order time derivative have some problems
in the energy conservation. On the other hand, discretizing the wave equation directly with a One-Step-θ
scheme is not possible because of the second-order time derivative. Therefore, a common procedure is to
re-write the equation into a first-order mixed system. We introduce a second solution variable (the velocity),
which also needs to be discretized in space and results in a higher computational cost.

12.6.2 Variational formulations


A formal derivation of a variational form reads:

(∂_t² u, φ) + (∇u, ∇φ) = (f, φ)   ∀φ ∈ H_0^1.

From this variational formulation, we seek

u ∈ L^2(I, H_0^1),   du/dt ∈ L^2(I, L^2),   d²u/dt² ∈ L^2(I, H^{-1}).
Proposition 12.59. Let f ∈ L^2(I, L^2) and let the initial data u_0 ∈ H_0^1 and v_0 ∈ L^2 be given. Then the variational problem has a unique solution

u ∈ L^2(I, H_0^1),   du/dt ∈ L^2(I, L^2).

Moreover, the mapping

{f, u_0, v_0} → {u, v},   v = du/dt,

from

L^2(I, L^2) × H_0^1 × L^2 → L^2(I, H_0^1) × L^2(I, H^{-1})

is linear and continuous.
Proof. See Wloka [145].

We also state the promised first-order mixed system. We recall

ρ∂_t² u − ∇ · (∇u) = f,

yielding

∂_t u = v,
ρ∂_t v − ∇ · (∇u) = f.


Formulation 12.60. Find (u, v) ∈ L2 (I, H01 ) × L2 (I, L2 ) with u0 ∈ H01 and v0 ∈ L2 such that

(∂t u, ψ) = (v, ψ) ∀ψ ∈ L2 ,
(ρ∂t v, φ) + (∇u, ∇φ) = (f, φ) ∀φ ∈ H01 .

12.6.3 A space-time formulation


We use the previous formulations to derive space-time variational formulations for the wave equation. We
briefly introduce the concept in this section.
Let us consider the second-order hyperbolic PDE:
Formulation 12.61. Let Ω ⊂ R^n be open and let I := (0, T) with T > 0. Find u : Ω × I → R and v := ∂_t u : Ω × I → R such that

ρ∂_t² u − ∇ · (a∇u) = f in Ω × I,
u = 0 on ∂Ω_D × I,
a∂_n u = 0 on ∂Ω_N × I,
u = u_0 in Ω × {0},
v = v_0 in Ω × {0}.

We now prepare the function spaces required for the weak formulation. Let us denote by L^2 and H^1 the usual Hilbert spaces and by H^{-1} the dual space of H^1. For the initial functions we assume:

• u_0 ∈ H_0^1(Ω)^n;
• v_0 ∈ L^2(Ω)^n.

For the right-hand side source term we assume:

• f ∈ L^2(I, H^{-1}(Ω)), where L^2(·, ·) is a Bochner space; see Section 12.2. Specifically, we remind the reader that initial values are a priori not well defined in such L^2-type spaces. However, Theorem 12.3 ensures that the solution is continuous in time and therefore the initial data are well defined.
We introduce the following short-hand notation:
• H := L2 (Ω)n ;
• V := H01 (Ω)n ;
• V ∗ is the dual space to V .
• H̄ := L2 (I, H);
• V̄ := {v ∈ L2 (I, V )| ∂t v ∈ H̄}.
Theorem 12.62. If the operator A := −∇ · (a∇·) satisfies the coercivity estimate

(Au, u) ≥ β‖u‖_1²,   u ∈ V,   β > 0,

then there exists a unique weak solution with the properties:

then there exists a unique weak solution with the properties:

• u ∈ V̄ ∩ C(Ī, V);
• ∂_t u ∈ H̄ ∩ C(Ī, H);
• ∂_t² u ∈ L^2(I, V^*).
Proof. See Lions and Magenes, Lions or Wloka.
Definition 12.63. The previous derivations allow us to define a compact space-time function space for the wave equation:

X := {v ∈ V̄ | v ∈ C(Ī, V), ∂_t v ∈ C(Ī, H), ∂_t² v ∈ L^2(I, V^*)}.

To design a Galerkin time discretization, we need to get rid of the second-order in time derivative and
therefore usually the wave equation is re-written in terms of a mixed first-order system:
Formulation 12.64. Let Ω ⊂ Rn be open and let I := (0, T ) with T > 0. Find u : Ω × I → R and
∂t u = v : Ω × I → R such that

ρ∂t u − ρv = 0 in Ω × I,
ρ∂t v − ∇ · (a∇u) = f in Ω × I,
u=0 on ∂ΩD × I,
a∂n u = 0 on ∂ΩN × I,
u = u0 in Ω × {0},
v = v0 in Ω × {0}.

To derive a space-time formulation, we first integrate formally in space:

A^u(U)(ψ^u) = (ρ∂_t u, ψ^u) − (ρv, ψ^u),
A^v(U)(ψ^v) = (ρ∂_t v, ψ^v) + (a∇u, ∇ψ^v) − (f, ψ^v),

and then in time:

Ā^u(U)(ψ^u) = ∫_I [(ρ∂_t u, ψ^u) − (ρv, ψ^u)] dt + (u(0) − u_0, ψ^u(0)),

Ā^v(U)(ψ^v) = ∫_I [(ρ∂_t v, ψ^v) + (a∇u, ∇ψ^v) − (f, ψ^v)] dt + (v(0) − v_0, ψ^v(0)).

The total problem reads:

Formulation 12.65. Find U = (u, v) ∈ X_u × X_v, with X_u = X and X_v := {w ∈ H̄ | w ∈ C(Ī, H), ∂_t w ∈ L^2(I, V^*)}, such that

Ā(U)(Ψ) = 0   ∀Ψ = (ψ^u, ψ^v) ∈ X_u × X_v,

where

Ā(U)(Ψ) := Ā^v(U)(ψ^v) + Ā^u(U)(ψ^u).
Such a formulation is the starting point for the discretization: either in space-time or using a sequential
time-stepping scheme and for instance FEM in space.

12.6.4 Energy conservation and consequences for good time-stepping schemes


Another feature of the wave equation is energy conservation. In the ideal case (f = 0, ρ = 1, and no dissipation), we build the variational form and obtain:

(∂_t² u, φ) + (∇u, ∇φ) = 0.

Using φ = ∂_t u as test function, we obtain:

(∂_t² u, ∂_t u) + (∇u, ∇∂_t u) = 0,
which yields:

(1/2)(d/dt)(∂_t u, ∂_t u) + (1/2)(d/dt)(∇u, ∇u) = 0,

which is

(1/2)(d/dt)‖∂_t u‖² + (1/2)(d/dt)‖∇u‖² = 0.
Remark 12.66. It holds:

(1/2)(d/dt) ∫_Ω ∂_t u · ∂_t u dx = (1/2) ∫_Ω ∂_t(∂_t u · ∂_t u) dx = (1/2) ∫_Ω 2∂_t² u · ∂_t u dx = ∫_Ω ∂_t² u · ∂_t u dx = (∂_t² u, ∂_t u).

Thus the time variation is zero, and integration in time yields

(1/2) ∫₀ᵗ (d/dt ‖∂t u‖²) dt + (1/2) ∫₀ᵗ (d/dt ‖∇u‖²) dt = 0.
Thus,

‖∂t u(t)‖² + ‖∇u(t)‖² = const = ‖v0‖² + ‖∇u0‖²,

where the two terms on the left are the kinetic and potential energies E_kin(t) and E_pot(t), summing to E_tot(t), and the terms on the right are E_kin(0) and E_pot(0), summing to E_tot(0). This is energy conservation:

E_tot(t) = E_tot(0)

at all times t ∈ I.
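To see this energy statement at work numerically, the following minimal C++ sketch (an illustration added here, not part of the classical derivation) applies the One-Step-θ idea of Section 12.6.5 below to the simplest wave model, a single Fourier mode u″ = −u written as the first-order system u′ = v, v′ = −u. All parameter values (k, T, initial data) are illustrative choices.

#include <cstdio>

// One-Step-theta scheme for the oscillator u' = v, v' = -u (one mode of the
// wave equation). Each step solves the 2x2 linear system
//   u^n - k*theta*v^n = u^{n-1} + k*(1-theta)*v^{n-1}
//   v^n + k*theta*u^n = v^{n-1} - k*(1-theta)*u^{n-1}
// in closed form.
int main()
{
  const double k = 0.05, T = 50.0;          // time step size and end time
  const double thetas[2] = {0.5, 1.0};      // Crank-Nicolson and backward Euler

  for (double theta : thetas)
  {
    double u = 1.0, v = 0.0;                // initial data u0 = 1, v0 = 0
    const double a = k * theta;
    const double det = 1.0 + a * a;         // determinant of the 2x2 system
    for (double t = 0.0; t < T; t += k)
    {
      const double ru = u + k * (1.0 - theta) * v;
      const double rv = v - k * (1.0 - theta) * u;
      const double un = (ru + a * rv) / det;
      const double vn = (rv - a * ru) / det;
      u = un; v = vn;
    }
    // discrete total energy E_tot = 1/2 (v^2 + u^2); exact value: 0.5
    printf("theta = %4.2f : E_tot(T) = %8.6f (exact 0.5)\n",
           theta, 0.5 * (v * v + u * u));
  }
  return 0;
}

Running this sketch shows E_tot(T) ≈ 0.5 for θ = 0.5 (energy conserved), while for θ = 1 the energy is visibly damped, anticipating the analysis in Section 12.6.6.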
Consequently, the construction of good time-stepping schemes becomes more involved than for the parabolic case. From the ODE analysis, we recall that the backward Euler and Crank-Nicolson schemes are both implicit and unconditionally stable. However, the backward Euler scheme is very dissipative and may introduce too much artificial diffusion, altering the energy conservation on the discrete time level. Here, the Crank-Nicolson scheme is much better suited. We recall, in other words, from Section 12.3.8:
Definition 12.67 (Stability and properties of time-stepping schemes). A good time-stepping scheme for time-discretizing partial differential equations should satisfy:

• A-stability (local convergence): an example is the Crank-Nicolson scheme (trapezoidal rule);
• strict A-stability (global convergence): an example is the shifted Crank-Nicolson scheme;
• strong A-stability (smoothing property): examples are backward Euler (in particular even L-stable¹⁰) and Fractional-Step-θ schemes;
• small dissipation (energy conservation).
Let us now describe the properties of the previous schemes in more detail:

• The explicit Euler scheme is cheap in computational cost since no equation system needs to be solved. However, in order to be numerically stable, (very, very) small time steps must be employed, which makes this scheme infeasible for most PDE problems.

• A classical scheme for problems with a stationary limit is the (implicit) backward Euler scheme (BE), which is strongly A-stable (but only of first order), robust, and dissipative. It is used in numerical examples where a stationary limit must be achieved. However, due to its dissipativity, this scheme damps high-frequency temporal components of the solution too much, and for this reason it is not recommended for nonstationary PDE problems.
¹⁰ A time-stepping scheme is L-stable if it is A-stable and the stability function satisfies |R(z)| → 0 for Re(z) → −∞. In this respect, the Crank-Nicolson scheme is not L-stable, since |R(z)| → 1 for z → −∞.
• In contrast, the (implicit) Crank-Nicolson scheme is of second order, A-stable, and has very little dissipation, but suffers from case-to-case instabilities caused by rough initial and/or boundary data. These properties are due to weak stability (it is not strongly A-stable). A variant, the so-called shifted Crank-Nicolson scheme, is analyzed in Rannacher et al. [74, 100] and allows for global stability of the solution¹¹. In particular, Rannacher analyzed in [99] the shifted Crank-Nicolson scheme for linear evolution equations and how it can be modified in order to make it suitable for long-term computations (without reducing the second-order accuracy!).
• The fourth scheme combines the advantages of the previous two and is known as the Fractional-Step-θ scheme for computing unsteady-state simulations [58]. Roughly speaking, it consists of three θ-substeps per time step and therefore has the same accuracy and computational cost as the Crank-Nicolson scheme. However, it is more robust: it is strongly A-stable (as backward Euler) but has second-order accuracy (as Crank-Nicolson), and is therefore well suited for computing solutions with rough initial data and for long-term computations. This property also holds for ALE-transformed fluid equations, which is demonstrated in a numerical test below. We also refer the reader to a modification of the Fractional-Step-θ scheme [133].
12.6.5 One-Step-θ time-stepping for the wave equation
To finish, we aim to apply a One-Step-θ scheme and therefore need to work with the first-order mixed system. We apply the One-Step-θ idea to Formulation 12.60 to discretize in time: Find v^n and u^n such that:

(ρ (v^n − v^{n−1})/k, φ) + θ(∇u^n, ∇φ) + (1 − θ)(∇u^{n−1}, ∇φ) = θ(f^n, φ) + (1 − θ)(f^{n−1}, φ),
((u^n − u^{n−1})/k, ψ) = θ(v^n, ψ) + (1 − θ)(v^{n−1}, ψ).
And finally, we discretize in space using a finite element scheme: Find v_h^n and u_h^n such that:

(ρ (v_h^n − v_h^{n−1})/k, φ_h) + θ(∇u_h^n, ∇φ_h) + (1 − θ)(∇u_h^{n−1}, ∇φ_h) = θ(f^n, φ_h) + (1 − θ)(f^{n−1}, φ_h),
((u_h^n − u_h^{n−1})/k, ψ_h) = θ(v_h^n, ψ_h) + (1 − θ)(v_h^{n−1}, ψ_h).
Rearranging the given data terms to the right-hand side yields:

k⁻¹(ρ v_h^n, φ_h) + θ(∇u_h^n, ∇φ_h) = k⁻¹(ρ v_h^{n−1}, φ_h) + θ(f^n, φ_h) + (1 − θ)(f^{n−1}, φ_h) − (1 − θ)(∇u_h^{n−1}, ∇φ_h),
k⁻¹(u_h^n, ψ_h) − θ(v_h^n, ψ_h) = k⁻¹(u_h^{n−1}, ψ_h) + (1 − θ)(v_h^{n−1}, ψ_h).
To obtain the system matrix and the linear system, we proceed as in the parabolic case, taking into account that we now seek two solutions u_h^n and v_h^n. With the previous discretizations, we can now define:

Algorithm 12.68. Given the initial guesses u_h^0 and v_h^0, and given at each time the right-hand-side forces f^n := f(t_n) and k_n = t_n − t_{n−1}, we solve for each n = 1, 2, 3, ..., N: Find v_h^n and u_h^n such that

k_n⁻¹(ρ v_h^n, φ_h) + θ(∇u_h^n, ∇φ_h) = k_n⁻¹(ρ v_h^{n−1}, φ_h) + θ(f^n, φ_h) + (1 − θ)(f^{n−1}, φ_h) − (1 − θ)(∇u_h^{n−1}, ∇φ_h),
k_n⁻¹(u_h^n, ψ_h) − θ(v_h^n, ψ_h) = k_n⁻¹(u_h^{n−1}, ψ_h) + (1 − θ)(v_h^{n−1}, ψ_h),

for (φ_h, ψ_h) ∈ V_h × W_h with V_h × W_h ⊂ H₀¹ × L².
Remark 12.69. An abstract generalization of the One-Step-θ scheme has been formulated in [135, Chapter 5].

¹¹ A famous alternative to the shifted Crank-Nicolson scheme is Rannacher's time-marching technique, in which the unshifted Crank-Nicolson scheme is augmented with backward Euler steps for stabilization, in particular for irregular initial data [99]. Adding backward Euler steps within Crank-Nicolson also stabilizes long-term computations [74, 99].
We now use the FEM representations at time t_n:

u_h^n = Σ_{j=1}^{N₁} a_j φ_j,   v_h^n = Σ_{j=N₁+1}^{N₂} b_j ψ_j,

and obtain:

Σ_j b_j k_n⁻¹(ρψ_j, φ_i) + Σ_j a_j θ(∇φ_j, ∇φ_i) = k_n⁻¹(ρ v_h^{n−1}, φ_i) + θ(f^n, φ_i) + (1 − θ)(f^{n−1}, φ_i) − (1 − θ)(∇u_h^{n−1}, ∇φ_i),
Σ_j a_j k_n⁻¹(φ_j, ψ_i) − Σ_j b_j θ(ψ_j, ψ_i) = k_n⁻¹(u_h^{n−1}, ψ_i) + (1 − θ)(v_h^{n−1}, ψ_i),
resulting in the following block system (whose blocks are additionally weighted by k, θ, and ρ):

[ A_uu  B_uv ] [ a ]   [ F ]
[ B_vu  M_vv ] [ b ] = [ G ]

with

A_uu = ((∇φ_j, ∇φ_i))_ij,  B_uv = ((ψ_j, φ_i))_ij,  B_vu = ((φ_j, ψ_i))_ij,  M_vv = ((ψ_j, ψ_i))_ij,

and

a = (a_j)_{j=1}^{N₁},  b = (b_j)_{j=1}^{N₂}.
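As an illustration of how this block system can be set up in code, the following C++ sketch (illustrative only; the mesh, the parameters, the choice ρ = 1, and the use of identical P1 bases φ = ψ for u and v are my assumptions) assembles the dense 2n × 2n system matrix for 1D P1 elements on a uniform mesh of (0, 1) with homogeneous Dirichlet conditions:

#include <cstdio>
#include <vector>

// Assemble the block system of Algorithm 12.68 for 1D P1 elements,
// identical bases for u and v, rho = 1. n = number of interior nodes.
int main()
{
  const int n = 4;                   // interior nodes (tiny, for printing)
  const double h = 1.0 / (n + 1);    // mesh size
  const double k = 0.1, theta = 0.5; // time step size, Crank-Nicolson

  std::vector<std::vector<double>> K(n, std::vector<double>(n, 0.0)); // stiffness
  std::vector<std::vector<double>> M(n, std::vector<double>(n, 0.0)); // mass
  for (int i = 0; i < n; ++i)
  {
    K[i][i] = 2.0 / h; M[i][i] = 4.0 * h / 6.0;
    if (i + 1 < n) { K[i][i+1] = K[i+1][i] = -1.0 / h;
                     M[i][i+1] = M[i+1][i] = h / 6.0; }
  }

  // Block system  [ theta*K     (1/k)*M ] [a]   [F]
  //               [ (1/k)*M    -theta*M ] [b] = [G]
  std::vector<std::vector<double>> S(2 * n, std::vector<double>(2 * n, 0.0));
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
    {
      S[i][j]         = theta * K[i][j];     // A_uu block (weighted by theta)
      S[i][n + j]     = M[i][j] / k;         // B_uv block (weighted by 1/k)
      S[n + i][j]     = M[i][j] / k;         // B_vu block (weighted by 1/k)
      S[n + i][n + j] = -theta * M[i][j];    // M_vv block (weighted by -theta)
    }

  for (int i = 0; i < 2 * n; ++i)
  {
    for (int j = 0; j < 2 * n; ++j) printf("%8.3f ", S[i][j]);
    printf("\n");
  }
  return 0;
}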
12.6.6 Stability analysis / energy conservation on the time-discrete level

Similar to parabolic problems, we perform a stability analysis, but this time with a focus on energy conservation on the time-discrete level.
We set
0 = t0 < t1 < . . . < tN = T, k = tn − tn−1
The model problem is:

Formulation 12.70. Let V = H₀¹ and W = L². Given u^{n−1} ∈ V and v^{n−1} ∈ W, we solve for n = 1, ..., N using the Crank-Nicolson scheme: Find (u^n, v^n) ∈ V × W such that

(v^n − v^{n−1}, φ) + (k/2)(∇u^n + ∇u^{n−1}, ∇φ) = (k/2)(f^n + f^{n−1}, φ) ∀φ ∈ V,
(u^n − u^{n−1}, ψ) − (k/2)(v^n + v^{n−1}, ψ) = 0 ∀ψ ∈ W.
Proposition 12.71. Setting f^n = 0 for all n = 1, ..., N in the previous formulation, we have

‖∇u^n‖² + ‖v^n‖² = ‖∇u^{n−1}‖² + ‖v^{n−1}‖²,

i.e., E_pot^n + E_kin^n = E_pot^{n−1} + E_kin^{n−1}, and therefore

E_tot^n = E_tot^{n−1}.
Proof. We start from Formulation 12.70 and choose once again clever test functions:

φ = u^n − u^{n−1},  ψ = v^n − v^{n−1}.

Then:

(v^n − v^{n−1}, u^n − u^{n−1}) + (k/2)(∇u^n + ∇u^{n−1}, ∇(u^n − u^{n−1})) = 0,
(u^n − u^{n−1}, v^n − v^{n−1}) − (k/2)(v^n + v^{n−1}, v^n − v^{n−1}) = 0.
We subtract the second equation from the first. Due to the symmetry of the inner product, the first terms cancel each other:

(k/2)(∇u^n + ∇u^{n−1}, ∇(u^n − u^{n−1})) + (k/2)(v^n + v^{n−1}, v^n − v^{n−1}) = 0.

Next, the cross terms cancel due to different signs, and it remains:

(k/2)(∇u^n, ∇u^n) + (k/2)(v^n, v^n) = (k/2)(∇u^{n−1}, ∇u^{n−1}) + (k/2)(v^{n−1}, v^{n−1}),

from which we easily see

‖∇u^n‖² + ‖v^n‖² = ‖∇u^{n−1}‖² + ‖v^{n−1}‖².

Voilà! We are finished.
We next show that the implicit Euler scheme indeed introduces artificial viscosity and loses energy over time:
Formulation 12.72. Let V = H₀¹ and W = L². Given u^{n−1} ∈ V and v^{n−1} ∈ W, we solve for n = 1, ..., N using the implicit Euler scheme: Find (v^n, u^n) ∈ W × V such that

(v^n − v^{n−1}, φ) + k(∇u^n, ∇φ) = k(f^n, φ) ∀φ ∈ V,
(u^n − u^{n−1}, ψ) − k(v^n, ψ) = 0 ∀ψ ∈ W.
Proposition 12.73. Setting f^n = 0 for all n = 1, ..., N in the previous formulation, we have

‖∇u^n‖² + ‖v^n‖² ≤ ‖∇u^{n−1}‖² + ‖v^{n−1}‖²,

i.e., E_pot^n + E_kin^n ≤ E_pot^{n−1} + E_kin^{n−1}, and therefore

E_tot^n ≤ E_tot^{n−1}.
Therefore, energy is dissipated due to the ‘wrong’ numerical scheme.
Proof. The proof is basically the same as before. We start from Formulation 12.72 and choose the same test functions as before:

φ = u^n − u^{n−1},  ψ = v^n − v^{n−1}.

Then:

(v^n − v^{n−1}, u^n − u^{n−1}) + k(∇u^n, ∇(u^n − u^{n−1})) = 0,
(u^n − u^{n−1}, v^n − v^{n−1}) − k(v^n, v^n − v^{n−1}) = 0.

We subtract the second equation from the first. Due to the symmetry of the inner product, the first terms cancel each other:

k(∇u^n, ∇(u^n − u^{n−1})) + k(v^n, v^n − v^{n−1}) = 0.

Next,

k(∇u^n, ∇u^n) − k(∇u^n, ∇u^{n−1}) + k(v^n, v^n) − k(v^n, v^{n−1}) = 0.

Here, we bring the cross terms to the right-hand side and apply Young's inequality:

k‖∇u^n‖² + k‖v^n‖² ≤ (k/2)‖∇u^n‖² + (k/2)‖∇u^{n−1}‖² + (k/2)‖v^n‖² + (k/2)‖v^{n−1}‖².

Rearranging terms yields the assertion.
12.6.7 Convergence results

A priori and a posteriori error estimates for the linear second-order-in-time wave equation using the backward Euler scheme can be found in [17]. Specifically, adaptive time step sizes can be chosen therein. Other results for uniform time steps, but including the Crank-Nicolson scheme, are proven in [64].

12.6.8 Summary of stability, convergence and energy conservation


We summarize our findings (see for more details in [8, 17, 64]). Our first conclusions is: In extension to elliptic
and parabolic problems, the numerical discretization of the wave equation asks for temporal schemes that
satisfy energy conservation.
Summarizing everything yields:
• Stability in the L²-norm: the One-Step-θ scheme is unconditionally stable, i.e., there is no time step restriction on k, if and only if θ ∈ [1/2, 1].

• Convergence: it holds (and can be proven using tools from ODE numerical analysis):
  – θ ∈ [0, 0.5) ∪ (0.5, 1]: linear convergence O(k);
  – θ = 0.5: quadratic convergence O(k²).

• Energy conservation: the One-Step-θ scheme preserves energy only for the choice θ = 1/2. For θ > 1/2 (e.g., the implicit Euler scheme for the choice θ = 1), the scheme dissipates energy, as shown in the previous subsection. For θ < 1/2, the energy behavior depends on the time step size, and the schemes are at best conditionally stable.
Consequently, the Crank-Nicolson scheme is an optimal time-stepping scheme for hyperbolic equations.
When explicit schemes are taken, possible restrictions with respect to the time-step size are weaker for
hyperbolic problems than for parabolic differential equations [64]; namely
Definition 12.74 (Courant-Friedrichs-Lewy - CFL). A necessary condition for explicit time-stepping schemes is the Courant-Friedrichs-Lewy (CFL) condition, i.e., the time step size k depends on material parameters and the spatial discretization parameter h:

• Parabolic problems: k ≤ (1/2) c h²;
• Hyperbolic problems: k ≤ a⁻¹ h, where a is of the order of the elasticity constant in the wave operator.
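A tiny worked example of these two restrictions (the constants c = a = 1 are hypothetical model parameters chosen for illustration):

#include <cstdio>
#include <cmath>

// Maximal admissible explicit time steps according to the conditions above.
int main()
{
  const double c = 1.0, a = 1.0;
  for (int e = 1; e <= 4; ++e)
  {
    const double h = std::pow(10.0, -e);
    printf("h = %7.1e :  parabolic k <= %9.3e,  hyperbolic k <= %9.3e\n",
           h, 0.5 * c * h * h, h / a);
  }
  return 0;
}

One sees that the parabolic restriction shrinks quadratically in h, while the hyperbolic one shrinks only linearly.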
12.7 Numerical tests: scalar wave equation

We provide some impressions of the scalar-valued wave equation. The setting is similar to Section 8.16.1. We take T = 10s and k = 1s using the Crank-Nicolson scheme. The initial solutions u0 and v0 are taken to be zero. In Figures 55 and 56, we show the evolution of the displacements and the velocities.
Figure 55: Scalar wave equation: Evolution of displacements (left) and velocities (right).

Figure 56: Scalar wave equation: Evolution of displacements (left) and velocities (right).
12.8 Numerical tests: elastic wave equation

We extend the scalar wave equation to the elastic wave equation. Specifically, the linearized elasticity system is augmented with the acceleration term:

Problem 12.75 (Discretized nonstationary linearized elasticity / elastodynamics). Given the initial conditions u0 and v0 and a vector-valued right-hand side f̂ = (f_x, f_y), find (û_h, v̂_h) ∈ V̂_h^0 × Ŵ_h such that

(ρ̂ ∂t v̂_h, φ̂_h) + (Σ̂_h,lin, ∇̂φ̂_h) = (f̂, φ̂_h) ∀φ̂_h ∈ V̂_h^0,
(∂t û_h, ψ̂_h) − (v̂_h, ψ̂_h) = 0 ∀ψ̂_h ∈ Ŵ_h,

where V̂_h^0 ⊂ V := {u ∈ H¹ | u = 0 on Γ_left}, with Γ_left the left boundary of the specimen, and Ŵ_h ⊂ L², and

Σ̂_h,lin = 2µ Ê_h,lin + λ tr(Ê_h,lin) Î,

with the identity matrix Î and the linearized strain tensor

Ê_h,lin = (1/2)(∇̂û_h + ∇̂û_h^T).
12.8.1 Constant force - stationary limit

First, we redo the 2D test from Section 11.4, in which the left boundary is fixed, ρ̂ = 1, and (f_x, f_y) = (0, −9.81). The findings are displayed in Figure 57. As time-stepping scheme, the Crank-Nicolson scheme (θ = 0.5) is used. The time step size is k = 1s and the total time is T = 10s. The initial solutions are u0 = 0 and v0 = 0.
Figure 57: 2D nonstationary linearized elasticity with a constant right-hand side (f_x, f_y) = (0, −9.81). At left, the u_x solution is displayed. At right, the u_y solution is displayed.
12.8.2 Time-dependent force - Crank-Nicolson scheme

In the second test, we prescribe the right-hand side as

(f_x^n, f_y^n) = (0, sin(t))^T,

with the uniform time step size k = 0.02s and N = 314 time steps. The total time is T = k · N = 6.28s. The results at N = 2, 157, 314 are displayed in Figure 58.
Figure 58: 2D nonstationary linearized elasticity with (f_x^n, f_y^n) = (0, sin(t))^T. The u_y solution is displayed at N = 2, 157, 314. The displacements are amplified by a factor of 1000 such that a visible deformation of the elastic beam can be seen. Here, the Crank-Nicolson scheme was used.
12.8.3 Time-dependent force - backward Euler scheme

In the third test, we use the backward Euler scheme. From the theory, we know that this scheme is too dissipative and introduces artificial numerical damping. Indeed, observing Figure 59, we see that the displacements at N = 157 and N = 314 are much smaller than in the corresponding Crank-Nicolson test case.
Figure 59: 2D nonstationary linearized elasticity with (f_x^n, f_y^n) = (0, sin(t))^T. The u_y solution is displayed at N = 2, 157, 314. The displacements are amplified by a factor of 1000 such that a visible deformation of the elastic beam can be seen. Here, the backward Euler scheme was used.
12.9 Numerical tests: elastic wave equation in a flow field

We now turn to [140]. The problem is fluid-structure interaction (Section 14.9), but this is not important here. What is important is the fact that the elastic part is described by a (nonlinear) wave equation (again Section 14.9). The configuration is a cylinder that is shifted off-center such that the elastic beam (should!) start oscillating.

Figure 60: Left: Flow around a cylinder with an elastic beam described by a nonlinear wave equation [77]. Right: Snapshot of the numerical solution of the velocity field. Both figures copied from [140].
We learned previously that the Crank-Nicolson scheme should be an ideal candidate for problems with wave phenomena. First, we recall from 2.7 (taken from the author's PhD thesis [135]) that the pure Crank-Nicolson scheme with θ = 0.5 will be subject to stability issues after some time. Let us now work with the shifted variant θ = 0.5 + k and investigate different k towards the implicit side. The largest choice will be k = 0.1, i.e., θ = 0.5 + 0.1 = 0.6. The results are displayed in Figure 61. We nicely see that from k = 0.05 (i.e., θ = 0.55) onwards, the numerical dissipation is already too high and significantly damps the (physical!) oscillations of the elastic beam. For θ = 0.6, the beam does not even start to move! Thus, θ ≥ 0.6, and specifically θ = 1 (backward Euler), are useless schemes for these situations because the numerical dissipation is too high and energy is not conserved on the time-discrete level.
Figure 61: Taken from [140]: Comparison of u_y (tip of the elastic beam in y direction) on mesh level 1 and different time step sizes k using the shifted Crank-Nicolson (CNs) scheme (left) and a comparison with the classical Crank-Nicolson (CN) method (right). First, thanks to the monolithic formulation and implicit time-stepping, we are able to use large time steps, e.g., k = 0.2s. Second, too large a time step is no longer sufficient to produce the correct oscillations of the elastic beam. In particular, care must be taken with the correct choice of the damping factor when using the shifted version. From this figure, we infer that the largest time step size is around k ∼ 0.01s in order to obtain the correct amplitude of oscillations.
12.10 Chapter summary and outlook

In this chapter, we have investigated time-dependent problems. In the next chapter, we leave time-dependence and go back to stationary problems, but now include nonlinear phenomena. These lead to fascinating research topics for which theory and discretizations have only been partially developed; they therefore also constitute current research directions.
13 Nonlinear problems
Most problems are nonlinear, and such settings are fascinating to study. For this reason, a first class on numerical methods for partial differential equations, at the risk of being dense, should at least outline how nonlinear problems can be approached. This chapter and Chapter 14 have been partially re-written and are both complementary to my book on multiphysics phase-field fracture [141], in which nonlinear problems and coupled variational inequality systems (CVIS) are studied in great detail.
13.1 Nonlinear PDEs: strong forms

We introduce nonlinear problems in terms of the so-called p-Laplace problem, which has important applications, but reduces to the Poisson problem for p = 2. This setting has no time dependence.

13.1.1 p-Laplace

Find u : Ω → R:

−∇ · ((|∇u|²)^{(p−2)/2} ∇u) = f (267)

Properties: nonlinear (quasilinear), stationary, scalar-valued. We need W^{k,p} Sobolev spaces (Section 7.1.7).
13.1.2 Semilinear equation

Find u : Ω → R:

−∆u + u² = f (268)

Properties: nonlinear (semilinear), stationary, scalar-valued.
13.1.3 Incompressible Euler equations

Find v : Ω → Rⁿ and p : Ω → R:

∂t v + (v · ∇)v + ∇p = f,  ∇ · v = 0 (269)

Properties: nonlinear (quasilinear), nonstationary, vector-valued, PDE system.
13.1.4 Incompressible Navier-Stokes equations

Find v : Ω → Rⁿ and p : Ω → R:

∂t v + (v · ∇)v − (1/Re)∆v + ∇p = f,  ∇ · v = 0 (270)

with Re being the Reynolds number. For Re → ∞, we formally obtain the Euler equations. Properties: nonlinear (quasilinear), nonstationary, vector-valued, PDE system.
13.1.5 A coupled nonlinear problem (I)

Find u : Ω → R and ϕ : Ω → R:

−∇ · (a(ϕ)∇u) = f, (271)
a(ϕ)|∇u|² − ∆ϕ = g (272)

Properties: nonlinear, coupled problem via coefficients, stationary. The equations become linear when the solution variables are fixed in the respective other equation.
13.1.6 A coupled nonlinear problem (II)

Find u : Ω → R and ϕ : Ω → R:

−∆u = f(ϕ), (273)
|∇u|² − ∆ϕ = g(u) (274)

Properties: nonlinear, coupled problem via right-hand sides, stationary. The equations become linear when the solution variables are fixed in the respective other equation.
13.1.7 A coupled nonlinear problem (III)

Let Ω₁ and Ω₂ with Ω₁ ∩ Ω₂ = ∅ and Ω̄₁ ∩ Ω̄₂ = Γ and Ω̄₁ ∪ Ω̄₂ = Ω̄. Find u₁ : Ω₁ → R and u₂ : Ω₂ → R:

−∆u₁ = f₁ in Ω₁, (275)
−∆u₂ = f₂ in Ω₂, (276)
u₁ = u₂ on Γ, (277)
∂n u₁ = ∂n u₂ on Γ. (278)

Properties: linear, coupled problem via interface conditions, stationary.
13.2 Nonlinear PDEs: weak forms

The notation for weak forms is via semi-linear forms:

a(u)(φ),

where the form may be nonlinear in the (solution) variable u and is linear in the test function φ. The nonlinear problem statement in abstract form reads:

Formulation 13.1. Find u ∈ V such that

a(u)(φ) = l(φ) ∀φ ∈ V,

with the right-hand-side linear form l(φ) = (f, φ) with a given force f.
Due to the nonlinearity in u, simply inserting

u_h = Σ_{j=1}^n u_j φ_j

yields a nonlinear equation system. Consider as example

(∇u, ∇φ) + (u², φ) = (f, φ).

Then, the discrete form reads

(∇u_h, ∇φ_h) + (u_h², φ_h) = (f, φ_h),

and inserting the linear combination yields

Σ_{j=1}^n u_j (∇φ_j, ∇φ_i) + ((Σ_{j=1}^n u_j φ_j)², φ_i) = (f, φ_i)

for i = 1, ..., n. For this reason, we need to linearize, as we show in Section 13.7.
Figure 62: Obstacle problem −u″ = −1 in Ω and u(0) = u(1) = 0: deformation u of a line that is constrained by the obstacle g.
13.3 A variational inequality I: Obstacle problem

The obstacle problem is the prototype of a variational inequality; it is very classical and intensively investigated in [83] and [84]. We consider an elastic membrane in Rⁿ with n = 2, or simpler n = 1, resulting in the clothesline problem: we want to avoid that the wet clothes touch the flowers, as illustrated in Figure 62.
Let Ω ⊂ R (all developments in this section also hold for Ω ⊂ Rⁿ) be open and u : Ω → R and f : Ω → R. The (nonlinear) potential energy is defined as

E(u) = ∫_Ω ( µ(√(1 + |∇u|²) − 1) − f u ) dx,

where µ > 0 is a material parameter. Taylor linearization (√(1 + x) ≈ 1 + x/2 for small x) yields:

E(u) = ∫_Ω ( (1/2) µ|∇u|² − f u ) dx.

The physical state without constraints is given by

min_{u∈V} E(u)

with V = {v ∈ H¹(Ω) | v = u0 on ∂Ω}.
We introduce the constraint (for instance, a table that blocks the further deformation of the membrane):

u ≥ g,  g ∈ L².

Then:

min_{u∈V, u≥g} E(u).

The admissible space is the convex set

K = {v ∈ H¹ | v = u0 on ∂Ω, v ≥ g in Ω}.

We recall the definition of a convex set: for all u, v ∈ K and θ ∈ [0, 1] it holds θu + (1 − θ)v ∈ K.

Formulation 13.2. If u ∈ K is a minimum, it holds:

E(u) = min_{v∈K} E(v).
13.3.0.1 Variational formulation  We now derive a variational formulation. Let θv + (1 − θ)u = u + θ(v − u) ∈ K for θ ∈ [0, 1]. Then it clearly holds:

E(u + θ(v − u)) ≥ E(u).

We now derive the first-order optimality condition (i.e., the PDE in weak form):

• differentiate w.r.t. θ;
• set θ = 0.

Remark 13.3. For differentiation in Banach spaces or Rⁿ, we refer the reader to Section 7.4.

Then, since θ = 0 minimizes θ ↦ E(u + θ(v − u)) on [0, 1]:

d/dθ E(u + θ(v − u))|_{θ=0} ≥ 0,

and computing this derivative yields

∫_Ω µ∇u · ∇(v − u) dx − ∫_Ω f (v − u) dx ≥ 0

for all v ∈ K.
In summary:

Formulation 13.4 (Obstacle problem: variational formulation). Find u ∈ K such that

(µ∇u, ∇(v − u)) ≥ (f, v − u) ∀v ∈ K,

where K = {v ∈ H¹ | v = u0 on ∂Ω, v ≥ g in Ω}.
13.3.0.2 Strong formulation  Using the fundamental lemma of the calculus of variations, we derive as usual the strong form:

−∇ · (µ∇u) ≥ f in Ω,
u ≥ g in Ω,
(∇ · (µ∇u) + f)(u − g) = 0 in Ω,
u = u0 on ∂Ω,
u = g on Γ,
µ∇u · n = µ∇g · n on Γ.

The boundary Γ is a so-called free boundary. The third condition is the so-called complementarity condition that links the PDE and the obstacle condition.
We define the subregions as follows:

• A = Ω \ N: active set or contact zone (we sit on the obstacle), defined by A = {x ∈ Ω : u = g};
• N: the inactive set (where we solve the PDE);
• Γ = ∂A ∩ ∂N: the free boundary.
Remark 13.5. We notice that the two inner conditions of the free boundary are exactly of the same nature
as the interface conditions of coupled problems in different subdomains; see Section 14.1. Therefore, we have
a natural link between the obstacle problem and surface-coupled PDEs.
13.3.0.3 Lagrange multiplier formulation  We can also explicitly introduce a Lagrange multiplier p ∈ L² with

p = 0 if x ∈ N,
p = −∇ · (µ∇u) − f if x ∈ A.

With this definition, it also becomes clear that the physical meaning of the Lagrange multiplier is a force. This force acts against the PDE in order to fulfill the constraint.
Then we can re-formulate the strong form as:
Formulation 13.6. Find u : Ω → R and p : Ω → R

−∇ · (µ∇u) − p = f in Ω,
u≥g in Ω,
p≥0 in Ω,
p(u − g) = 0 in Ω,
u = u0 on ∂Ω
u=g on Γ
µ∇u · n = µ∇g · n on Γ.
13.3.0.4 Brief introduction to one possibility to treat the obstacle constraint  In practice, we face the question of how to realize the obstacle constraint. One possibility (in its basic form not the best!) is penalization, which is a well-known technique in nonlinear programming, e.g., [129, 130]. The idea is to fulfill the constraints asymptotically by including an additional term that acts against the optimization goal if the constraints are violated.

We introduce the penalty parameter ρ > 0 (and have ρ → ∞ in mind for later). We have from before: Find u ∈ K:

(µ∇u, ∇(v − u)) ≥ (f, v − u) ∀v ∈ K.

A penalized version reads: Find u_ρ ∈ H₀¹:

(µ∇u_ρ, ∇v) − ρ ∫_Ω [g − u_ρ]₊ v dx = (f, v) ∀v ∈ H₀¹.

Here, [x]₊ = max(x, 0).
Indeed, we have:

• u_ρ ≥ g yields g − u_ρ ≤ 0, thus [g − u_ρ]₊ = 0;
• u_ρ < g yields g − u_ρ > 0, thus [g − u_ρ]₊ > 0.

Remark 13.7. For large ρ (which is necessary to enforce the constraint as well as possible), the system matrix becomes ill-conditioned, and thus the linear system can no longer be solved reliably. Here, one has to find a trade-off between sufficiently small and large ρ. Extensions are augmented Lagrangian methods or active set methods.
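To illustrate the penalization idea numerically before turning to the exercise below, the following C++ sketch solves the 1D model problem of Formulation 13.8 with finite differences (a finite-difference stand-in for the FEM scheme asked for in Section 13.4; the nonsmooth term [g − u]₊ is linearized by freezing the active set {u < g} of the previous iterate, and ρ = 10⁷ is an illustrative choice):

#include <cstdio>
#include <vector>

// Penalized 1D obstacle problem  -u'' - rho*[g - u]_+ = f,  u(0) = u(1) = 0,
// with f = -1 and g = -0.075. The penalty is linearized via the active set
// of the previous iterate; each step solves a tridiagonal system.
int main()
{
  const int n = 99;                       // interior grid points
  const double h = 1.0 / (n + 1);
  const double f = -1.0, g = -0.075, rho = 1e7;

  std::vector<double> u(n, 0.0);
  std::vector<bool> active(n, false);

  for (int it = 1; it <= 50; ++it)
  {
    // tridiagonal system: sub-diagonal a, diagonal d, super-diagonal c, rhs b
    std::vector<double> a(n, -1.0 / (h * h)), d(n, 2.0 / (h * h)),
                        c(n, -1.0 / (h * h)), b(n, f);
    for (int i = 0; i < n; ++i)
      if (active[i]) { d[i] += rho; b[i] += rho * g; }  // penalty contribution

    // Thomas algorithm (forward elimination, back substitution)
    for (int i = 1; i < n; ++i)
    {
      const double m = a[i] / d[i - 1];
      d[i] -= m * c[i - 1];
      b[i] -= m * b[i - 1];
    }
    u[n - 1] = b[n - 1] / d[n - 1];
    for (int i = n - 2; i >= 0; --i) u[i] = (b[i] - c[i] * u[i + 1]) / d[i];

    // update the active set A = {x : u < g}; stop when it no longer changes
    bool changed = false;
    double umin = u[0];
    for (int i = 0; i < n; ++i)
    {
      if (u[i] < umin) umin = u[i];
      const bool act = (u[i] < g);
      if (act != active[i]) { active[i] = act; changed = true; }
    }
    printf("iter %2d: min u = %9.6f (obstacle g = %g)\n", it, umin, g);
    if (!changed) break;
  }
  return 0;
}

In the spirit of Remark 13.7, ρ must be chosen large enough to enforce u ≳ g, but not so large that the tridiagonal system becomes ill-conditioned.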
13.4 Exercise: FEM for the penalized 1D obstacle problem

We derive explicitly the final linear equation system for the 1D situation presented in Section 13.3.0.4:

Formulation 13.8 (1D obstacle problem). Let Ω = (0, 1) and f = −1. The obstacle is given by g = −0.075 for all x ∈ Ω. Find u : Ω → R such that

−u″ ≥ f in Ω,
u ≥ g in Ω,
(f + u″)(g − u) = 0 in Ω,
u(0) = u(1) = 0.
13.4.1 Tasks

We derive in detail the linear equation system for the 1D obstacle problem and adopt the following steps:

1. Write down the 1D obstacle problem in weak formulation (variational inequality).
2. Penalize the inequality constraint using a penalization parameter ρ in order to obtain a variational formulation.
3. Since the problem is nonlinear, employ a Newton or fixed-point method (see Chapter 13) to formulate an iterative scheme for the numerical solution.
4. Discretize in space by choosing linear FEM with Vh = span{φ₁, ..., φₙ}, dim(Vh) < ∞, and represent u_h as usual via a linear combination of the basis functions.
5. State the resulting linear equation system AU = b with U = (u₁, ..., uₙ)ᵀ ∈ Rⁿ.
6. Try to justify that a large penalty parameter ρ yields an ill-conditioned system.
7. Make at least a guess (better: a detailed justification) how ρ could be chosen in accordance with the spatial discretization parameter h such that ill-conditioning is avoided and we obtain asymptotically the correct scheme.
8. Formulate a saddle-point formulation representing the constraint with the help of a Lagrange multiplier p.
9. Propose (in less detail than for the penalized version) an FEM scheme to solve the saddle-point system.
13.5 A variational inequality II: Phase-field fracture

Fracture or damage models allow to simulate discontinuities in solids. A regularized phase-field model reads [96, 139]:

Formulation 13.9. Find u : B → Rⁿ such that

−∇ · ( ((1 − κ)ϕ² + κ) σ ) = 0 in B,
u = 0 on ∂B.

The phase-field system consists of three parts: the partial differential equation, the inequality constraint, and a compatibility condition (which is called Rice condition in the presence of fractures [110]): Find ϕ : B → [0, 1] such that

(1 − κ) σ : e(u) ϕ − G_c ε∆ϕ − (G_c/ε)(1 − ϕ) ≤ 0 in B,
∂t ϕ ≤ 0 in B,
[ (1 − κ) σ : e(u) ϕ − G_c ε∆ϕ − (G_c/ε)(1 − ϕ) ] · ∂t ϕ = 0 in B,
∂n ϕ = 0 on ∂B.
We explain all symbols in the following:

• ∇: nabla operator;
• κ > 0: regularization parameter ensuring that the system remains solvable when ϕ → 0;
• ϕ: phase-field variable;
• σ: stress tensor, e.g., σ = 2µe(u) + λ tr(e) I;
• u: displacement field;
• µ, λ > 0: Lamé parameters;
• e(u) = (1/2)(∇u + ∇uᵀ): strain tensor;
• G_c: critical energy release rate;
• ε: phase-field regularization parameter; it determines the thickness of the transition layer in which ϕ changes from 0 to 1 and vice versa.
Remark 13.10. As in the obstacle problem, we deal with a variational inequality that satisfies a compatibility condition.
13.6 Differentiation of nonlinear operators

We have learned about differentiation in Banach spaces in Section 7.4. Examples are:

T(u) = u²,

for which

T′(u)(h) = 2u · h.

Or, for semi-linear forms:

a(u)(φ) = (u², φ),

which we want to differentiate in the first argument:

a′_u(u)(h, φ) = (2u · h, φ).
13.7 Linearization techniques: brief overview

We discuss linearization techniques in the following subsections. The idea is to provide algorithmic frameworks that serve for the implementation. Concerning Newton's method for general problems, there is not that much theory; see, e.g., [40]. In general, one can say that in many problems the theoretical assumptions are not met, but nevertheless Newton's method works well in practice.

Our list is as follows:

• fixed-point iteration;
• linearization via time-lagging;
• Newton's method.

A nice online overview is given by Langtangen: http://hplgit.github.io/num-methods-for-PDEs/doc/pub/nonlin/html/slides_nonlin-1.html#nonlin:timediscrete:logistic:Picard
13.8 Fixed-point iteration

13.8.1 Idea

In ODE computations, applying a fixed-point theorem, namely the Banach fixed-point theorem, leads to the so-called Picard iteration. The basic idea is to introduce an iteration using an index k and to linearize the nonlinear terms by taking these terms from the previous iteration k − 1.
13.8.2 Example

This is best illustrated in terms of an example. Let

−∆u + u² = f.

The variational formulation reads:

(∇u, ∇φ) + (u², φ) = (f, φ) ∀φ ∈ V.

An iterative scheme is constructed as follows:

Algorithm 13.11 (Fixed-point iterative scheme). For k = 1, 2, 3, ..., given u^{k−1}, we seek u^k ∈ V such that

(∇u^k, ∇φ) + (u^k u^{k−1}, φ) = (f, φ) ∀φ ∈ V,

until a stopping criterion is fulfilled (choose one out of four):

• Error criterion:
  ‖u^k − u^{k−1}‖ < TOL (absolute),
  ‖u^k − u^{k−1}‖ < TOL ‖u^k‖ (relative).

• Residual criterion:
  ‖(∇u^k, ∇φ_i) + ([u²]^k, φ_i) − (f, φ_i)‖ < TOL (absolute),
  ‖(∇u^k, ∇φ_i) + ([u²]^k, φ_i) − (f, φ_i)‖ < TOL ‖(f, φ_i)‖ (relative),

for all i = 1, ..., dim(Vh).
Algorithm 13.12 (Fixed-point iterative scheme with relaxation (I)). Given a damping parameter λ > 0. For k = 1, 2, 3, ..., given u^{k−1}, we seek u^k ∈ V such that

(∇u^k, ∇φ) + λ(u^k u^{k−1}, φ) = (f, φ) ∀φ ∈ V,

until a stopping criterion is fulfilled.

Algorithm 13.13 (Fixed-point iterative scheme with relaxation (II)). For k = 1, 2, 3, ..., given u^{k−1}:

1. We seek û^k ∈ V such that
   (∇û^k, ∇φ) + (û^k u^{k−1}, φ) = (f, φ) ∀φ ∈ V;
2. set u^k := λ û^k + (1 − λ) u^{k−1} with λ ∈ (0, 1];
3. repeat until a stopping criterion is fulfilled.
We briefly provide some details on the linearized part. We have

(u^k u^{k−1}, φ) = ∫_Ω u^{k−1}(x) u^k(x) φ(x) dx.

Using the FEM representation of u^k, we obtain

u^k = Σ_{j=1}^M a_j φ_j,

where M = dim(Vh), Vh = span{φ₁, ..., φ_M}, and the a_j are, as usual, the unknown solution coefficients. Plugging this in, we obtain

Σ_j a_j (u^{k−1} φ_j, φ_i) = Σ_j a_j ∫_Ω u^{k−1}(x) φ_j(x) φ_i(x) dx.

We recall that u^{k−1} ∈ Vh is the old iterate from step k − 1. Working with Vh := Vh^(1) means that u^{k−1} is a linear function on each element K_l. In principle, the previous integral can be evaluated by building the anti-derivative. The more convenient way is to use numerical quadrature. To this end, we obtain on the element K_l of our triangulation the integral

∫_{K_l} u^{k−1}(x) φ_j(x) φ_i(x) dx ≈ Σ_{q=1}^{N_q} u^{k−1}(x_q) φ_j(x_q) φ_i(x_q) ω(x_q),

where N_q is the number of quadrature points, x_q are the quadrature points on the element K_l, and ω(x_q) are the quadrature weights. Specifically, u^{k−1}(x_q) is the evaluation of the previous FEM solution from step k − 1 at the point x_q.
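A minimal C++ sketch of this elementwise quadrature (assuming linear shape functions on an interval element [0, h] and hypothetical nodal values for u^{k−1}; the two-point Gauss rule integrates the cubic integrand exactly):

#include <cstdio>
#include <cmath>

// Local contribution (u^{k-1} phi_j, phi_i) on an element K_l = [0, h] for
// linear shape functions; uold[] holds the nodal values of u^{k-1}.
int main()
{
  const double h = 0.1;
  const double uold[2] = {0.3, 0.7};                  // nodal values of u^{k-1}
  const double qp[2] = {0.5 - 0.5 / std::sqrt(3.0),   // Gauss points on [0,1]
                        0.5 + 0.5 / std::sqrt(3.0)};

  double local[2][2] = {{0.0, 0.0}, {0.0, 0.0}};
  for (int q = 0; q < 2; ++q)
  {
    const double s = qp[q];
    const double phi[2] = {1.0 - s, s};               // shape functions at x_q
    const double uk = uold[0] * phi[0] + uold[1] * phi[1]; // u^{k-1}(x_q)
    for (int i = 0; i < 2; ++i)
      for (int j = 0; j < 2; ++j)
        local[i][j] += uk * phi[j] * phi[i] * 0.5 * h; // weight omega(x_q) = h/2
  }
  printf("%10.6f %10.6f\n%10.6f %10.6f\n",
         local[0][0], local[0][1], local[1][0], local[1][1]);
  return 0;
}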
Algorithm 13.14 (Fixed-point iterative scheme; stronger linearization). Given a damping parameter λ > 0. A stronger linearization would be as follows: for k = 1, 2, 3, ..., given u^{k−1}, we seek u^k ∈ V such that

(∇u^k, ∇φ) + λ((u^{k−1})², φ) = (f, φ) ∀φ ∈ V,

which means that (for λ = 1) the entire nonlinearity goes to the right-hand side:

(∇u^k, ∇φ) = (f, φ) − ((u^{k−1})², φ) ∀φ ∈ V.
Remark 13.15. For time-dependent PDEs, a common (simple) linearization is

(u², φ) → (u u^{n−1}, φ),

where u^{n−1} := u(t_{n−1}) is the previous time step solution. In this case, no additional fixed-point iteration needs to be constructed, because the linearization is part of the temporal discretization. An even stronger linearization would be

(u², φ) → ((u^{n−1})², φ).

For instance, for the nonlinear parabolic PDE

∂t u − ∆u + u² = f,

we obtain the semi-time-discrete (implicit Euler) variational forms:

Find u^n ∈ V : (u^n, φ) + k(∇u^n, ∇φ) + k(u^n u^{n−1}, φ) = k(f^n, φ) + (u^{n−1}, φ) ∀φ ∈ V,

and

Find u^n ∈ V : (u^n, φ) + k(∇u^n, ∇φ) = k(f^n, φ) − k((u^{n−1})², φ) + (u^{n−1}, φ) ∀φ ∈ V,

respectively.
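The convenience of this time-lagging can already be seen in a zero-dimensional analogue: for the ODE u′ + u² = f (my own illustrative reduction of the above PDE), implicit Euler with the linearization u² → uⁿuⁿ⁻¹ turns every time step into a single explicit formula, so no inner iteration is needed:

#include <cstdio>

// Implicit Euler for u' + u^2 = f with the time-lagged linearization
// u^2 -> u^n * u^{n-1}: every step reduces to one explicit formula.
// With f = 1, the exact steady state is u = 1.
int main()
{
  const double k = 0.1, f = 1.0;
  double u = 0.0;                       // initial value u(0) = 0
  for (int n = 1; n <= 100; ++n)
  {
    u = (u + k * f) / (1.0 + k * u);    // (u^n - u^{n-1})/k + u^n u^{n-1} = f
    if (n % 20 == 0) printf("t = %4.1f : u = %8.6f\n", n * k, u);
  }
  return 0;
}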
13.8.3 Example of a 1D nonlinear equation

Given the nonlinear problem

−(u u′)′ = f in Ω = (0, 1),
u(0) = u(1) = 0.

Thus, we have a nonlinear coefficient a(u) = u in front of the highest-order derivative. The weak formulation reads: Find u ∈ H₀¹(Ω) such that

(u u′, φ′) = (f, φ) ∀φ ∈ H₀¹(Ω).
A fixed-point linearization would be: given some initial guess u⁰, iterate for k = 1, 2, 3, ... such that, in weak formulation:

(u^{k−1} (u^k)′, φ′) = (f, φ).

The nonlinear coefficient u^{k−1} is taken at the previous iteration step k − 1. Now we can take as discrete representation in Vh = span{φ₁, ..., φₙ}:

u^k = Σ_{j=1}^n a_j φ_j.

The solution coefficients are denoted by a_j (we did not take u_j in order to avoid any confusion with the iteration index). Then:

Σ_{j=1}^n a_j (u^{k−1} φ_j′, φ_i′) = (f, φ_i) for i = 1, ..., n.

Therefore, we obtain similar results to Section 8.4, with the modification that we need to evaluate the previous FEM solution u^{k−1} in each integral (u^{k−1} φ_j′, φ_i′).
13.9 Linearization via time-lagging (example: NSE)

For time-dependent PDEs, a common linearization is

(u², φ) → (u u^{n−1}, φ),

where u^{n−1} := u(t_{n−1}) is the previous time step solution. In this case, no additional fixed-point iteration needs to be constructed.

Let us now briefly apply this idea to the Navier-Stokes equations to point out an important property. Setting N(v) := v · ∇v − ∇ · (∇v + ∇vᵀ), let

∂t v + N(v) + ∇p = f,  ∇ · v = 0

be given.
13.9.1 Stokes linearization

Using the Stokes linearization (for small Reynolds numbers), the nonlinearity can be treated fully explicitly:

v + k∇p = v^{n−1} − kN(v^{n−1}) + kθf^n + k(1 − θ)f^{n−1},  ∇ · v = 0.

This simplifies the problem, as it is now linear, and in each step the symmetric and positive Stokes operator must be inverted.
13.9.2 Oseen linearization

For higher Reynolds numbers, the so-called Oseen linearization can be consulted:

v + kθÑ(v) + k∇p = v^{n−1} − k(1 − θ)Ñ(v^{n−1}) + kθf^n + k(1 − θ)f^{n−1},  ∇ · v = 0,

where

Ñ(v) = ṽ · ∇v,

and where, for instance, a constant extrapolation ṽ = v^{n−1} or a linear extrapolation ṽ = 2v^{n−1} − v^{n−2} can be employed; see also the next section. Here, at each step, a (linear) nonsymmetric diffusion-transport operator must be inverted.
13.10 Newton's method in R - the Newton-Raphson method

Let f ∈ C¹[a, b] with at least one root f(x) = 0, and let x₀ ∈ [a, b] be a so-called initial guess. The task is to find x ∈ R such that

f(x) = 0.

In most cases it is impossible to calculate x explicitly. Rather, we construct a sequence of iterates (x_k)_{k∈N} and hopefully reach at some point

|f(x_k)| < TOL, where TOL is small, e.g., TOL = 10⁻¹⁰.

What is true for all Newton derivations in the literature is that one has to start with a Taylor expansion. In this lecture, we do this as follows. Let us assume that we are at x_k and can evaluate f(x_k). Now we want to compute the next iterate x_{k+1} with the unknown value f(x_{k+1}). Taylor gives us:

f(x_{k+1}) = f(x_k) + f′(x_k)(x_{k+1} − x_k) + O((x_{k+1} − x_k)²).

We assume that f(x_{k+1}) = 0 (or very close to zero, f(x_{k+1}) ≈ 0). Then, x_{k+1} is the sought root, and neglecting the higher-order terms we obtain:

0 = f(x_k) + f′(x_k)(x_{k+1} − x_k).

Thus:

x_{k+1} = x_k − f(x_k)/f′(x_k),  k = 0, 1, 2, .... (280)

This iteration is possible as long as f′(x_k) ≠ 0.
Remark 13.16 (Relation to Section 10.3). We see that Newton's method can be written as

x_{k+1} = x_k + d_k,  k = 0, 1, 2, ...,

where the search direction is

d_k = −f(x_k)/f′(x_k).

The iteration (280) terminates if a stopping criterion

|x_{k+1} − x_k| / |x_k| < TOL,  or  |x_{k+1} − x_k| < TOL, (281)

or

|f(x_{k+1})| < TOL (282)

is fulfilled. All these tolerances do not need to be the same, but they should be sufficiently small and larger than machine precision.

Remark 13.17. Newton's method belongs to the fixed-point iteration schemes, with the iteration function

F(x) := x − f(x)/f′(x). (283)

For a fixed point x̂ = F(x̂) it holds: f(x̂) = 0. Compare again to Section 10.3.
The main result is given by:

Theorem 13.18 (Newton's method). The function f ∈ C²[a, b] has a root x̂ in the interval [a, b], and let

m := min_{a≤x≤b} |f′(x)| > 0,  M := max_{a≤x≤b} |f″(x)|.

Let ρ > 0 be such that

q := (M/(2m)) ρ < 1,  K_ρ(x̂) := {x ∈ R : |x − x̂| ≤ ρ} ⊂ [a, b].
Figure 63: Geometrical interpretation of Newton's method.

Then, for any starting point x0 ∈ Kρ (x̂), the sequence of iterations xk ∈ Kρ (x̂) converges to the root x̂.
Furthermore, we have the a priori estimate
2m 2k
|xk − x̂| ≤ q , k ∈ N,
M
and a posteriori estimate
1 M
|xk − x̂| ≤ |f (xk )| ≤ |xk − xk+1 |2 , k ∈ N.
m 2m
Often, Newton’s method is formulated in terms of a defect-correction scheme.

Definition 13.19 (Defect). Let x̃ ∈ R an approximation of the solution f (x) = y. The defect (or similarly
the residual) is defined as
d(x̃) = y − f (x̃).
Definition 13.20 (Newton’s method as defect-correction scheme).

f 0 (xk )δx = dk , dk := y − f (xk ),


xk+1 = xk + δx, k = 0, 1, 2, . . . .

The iteration is finished with the same stopping criterion as for the classical scheme. To compute the update
δx we need to invert f 0 (xk ):
δx = (f 0 (xk ))−1 dk .
This step seems trivial but is the most critical one if we deal with problems in Rn with n > 1 or in function
spaces. Because here, the derivative becomes a matrix. Therefore, the problem results in solving a linear
equation system of the type Aδx = b and computing the inverse matrix A−1 is an expensive operation. 

Remark 13.21. These previous forms of Newton's method are already very close to the schemes that are used in research. One simply extends from R¹ to higher-dimensional cases such as nonlinear PDEs or optimization. The 'only' aspects, which are however big research topics, are the choice of

• good initial Newton guesses;
• globalization techniques.

Two very good books on these topics, including further material, are [40, 97].
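A minimal C++ realization of iteration (280) with stopping criterion (282), here for the illustrative example f(x) = x² − 2 (so that the computed root is √2):

#include <cstdio>
#include <cmath>

// Newton-Raphson iteration x_{k+1} = x_k - f(x_k)/f'(x_k), cf. (280),
// demonstrated for f(x) = x^2 - 2 with root sqrt(2).
double f(double x)  { return x * x - 2.0; }
double fp(double x) { return 2.0 * x; }

int main()
{
  const double TOL = 1e-10;
  double x = 2.0;                                  // initial guess x0
  for (int k = 0; k < 50; ++k)
  {
    printf("k = %2d : x_k = %.12f, |f(x_k)| = %.2e\n", k, x, std::fabs(f(x)));
    if (std::fabs(f(x)) < TOL) break;              // stopping criterion (282)
    x = x - f(x) / fp(x);                          // requires f'(x_k) != 0
  }
  return 0;
}

The output exhibits the quadratic convergence typical of Newton's method: the number of correct digits roughly doubles in each step.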
13.10.1 Newton's method: overview. Going from R to Banach spaces

Overview:

• Newton-Raphson (1D): find x ∈ R via iterating k = 0, 1, 2, ... such that x_k ≈ x via:
  Find δx ∈ R : f′(x_k) δx = −f(x_k),
  Update: x_{k+1} = x_k + δx.

• Newton in Rⁿ: find x ∈ Rⁿ via iterating k = 0, 1, 2, ... such that x_k ≈ x via:
  Find δx ∈ Rⁿ : F′(x_k) δx = −F(x_k),
  Update: x_{k+1} = x_k + δx.
  Here we need to solve a linear equation system to compute the update δx ∈ Rⁿ.

• Banach spaces: find u ∈ V, with dim(V) = ∞, via iterating k = 0, 1, 2, ... such that u_k ≈ u via:
  Find δu ∈ V : F′(u_k) δu = −F(u_k),
  Update: u_{k+1} = u_k + δu.
  Such a problem needs to be discretized and results again in solving a linear equation system in the defect step.

• Banach spaces, applied to variational formulations: find u ∈ V, with dim(V) = ∞, via iterating k = 0, 1, 2, ... such that u_k ≈ u via:
  Find δu ∈ V : a′(u_k)(δu, φ) = −a(u_k)(φ),
  Update: u_{k+1} = u_k + δu.
  As before, the infinite-dimensional problem is discretized, resulting in solving a linear equation system in the defect step.
13.10.2 A basic algorithm for a residual-based Newton method

In this type of method, the main criterion is a decrease of the residual in each step:

Algorithm 13.22 (Residual-based Newton's method). Given an initial guess x₀. Iterate for k = 0, 1, 2, ... such that

Find δx_k ∈ Rⁿ : F′(x_k) δx_k = −F(x_k),
Update: x_{k+1} = x_k + λ_k δx_k,

with λ_k ∈ (0, 1] (see the next sections for how λ_k can be determined). A full Newton step corresponds to λ_k = 1. The criterion for convergence is the contraction of the residuals measured in terms of a discrete vector norm:

‖F(x_{k+1})‖ < ‖F(x_k)‖.

In order to save some computational cost, close to the solution x∗, intermediate simplified Newton steps can be used. In the case of λ_k = 1 we observe

θ_k = ‖F(x_{k+1})‖ / ‖F(x_k)‖ < 1.

If θ_k < θ_max, e.g., θ_max = 0.1, then the old Jacobian F′(x_k) is kept and used again in the next step k + 1. Otherwise, if θ_k > θ_max, the Jacobian is re-assembled. Finally, the stopping criterion is one of the following (the relative one is preferred!):

‖F(x_{k+1})‖ ≤ TOL_N (absolute),
‖F(x_{k+1})‖ ≤ TOL_N ‖F(x₀)‖ (relative).

If fulfilled, set x∗ := x_{k+1}, and the (approximate) root x∗ of the problem F(x) = 0 is found.

Remark 13.23 (C++ Newton implementation). A C++ implementation of such a basic Newton scheme can be found in [122].
13.11 Inexact Newton

For large-scale nonlinear PDEs, the inner linear systems are usually not solved with a direct method (e.g., LU), but with iterative methods (CG, GMRES, multigrid). Using such iterative methods yields an inexact Newton scheme. Here, we deal with two iterations:

• the outer nonlinear Newton iteration;
• the inner linear iteration.

Algorithm 13.24 (Inexact Newton method). Given an initial guess x₀. Iterate for k = 0, 1, 2, ... such that

Formulate δx_k ∈ Rⁿ : F′(x_k) δx_k = −F(x_k),
Solve with an iterative linear method: F′(x_k) δx_k^i = −F(x_k),
Check the linear stopping criterion: ‖δx_k^i − δx_k^{i−1}‖ < TOL_Lin,
Update: x_{k+1} = x_k + λ_k δx_k^i,
Check the nonlinear stopping criterion, e.g., ‖x_{k+1} − x_k‖ < TOL_New,

with λ_k ∈ (0, 1].
13.12 Newton's method for variational formulations

In this section, we describe in detail a Newton method for solving nonlinear nonstationary PDE problems in variational form. That is, we have the following indices:

• h for the spatial discretization;
• n for the current time step solution;
• j for the Newton iteration;
• l (optional) for the line search iterations.

Therefore, we deal with the following problem on the discrete level:

a(u_h^n)(φ) = l(φ) ∀φ ∈ Vh,

which is solved with a Newton-like method. We can express this relation in terms of the defect:

d(φ) := l(φ) − a(u_h^n)(φ) = 0 ∀φ ∈ Vh.
Algorithm 13.25 (Basic Newton method as defect-correction problem). Given an initial Newton guess u_h^{n,0} ∈ Vh, find for j = 0, 1, 2, ... the update δu_h^n ∈ Vh of the linear defect-correction problem

a′(u_h^{n,j})(δu_h^n, φ) = −a(u_h^{n,j})(φ) + l(φ), (284)
u_h^{n,j+1} = u_h^{n,j} + λ δu_h^n, (285)
check ‖−a(u_h^{n,j})(φ) + l(φ)‖ < TOL_N, (286)
or check ‖−a(u_h^{n,j})(φ) + l(φ)‖ < TOL_N ‖−a(u_h^{n,0})(φ) + l(φ)‖, (287)

with a line search damping parameter λ ∈ (0, 1] and a Newton tolerance TOL_N, e.g., TOL_N = 10⁻¹⁰. For λ = 1, we deal with a full Newton step. Adapting λ and choosing 0 < λ < 1 helps to achieve convergence of Newton's method for certain nonlinear problems when a full Newton step does not work.
Remark 13.26 (Initial Newton guess for time-dependent problems). In time-dependent problems, the 'best' initial Newton guess is the previous time step solution, namely

u_h^{n,0} := u_h^{n−1}.

Remark 13.27 (Initial Newton guess in general). When possible, the solution should be obtained in nested iterations. We start on a coarse mesh with mesh size h₀ and there set u_{h₀}^0 = 0 when nothing better is available. Then, we compute the solution u_{h₀}. After Newton has converged, we project u_{h₀} onto the next mesh, use u_{h₁}^0 := u_{h₀} as initial Newton guess, and repeat the entire procedure.

Remark 13.28 (Dirichlet boundary conditions). In Newton's method, non-homogeneous Dirichlet boundary conditions are only prescribed on the initial guess u_h^{n,0}; in all further updates, non-homogeneous Dirichlet conditions are replaced by homogeneous Dirichlet conditions. The reason is that we only need to prescribe these conditions once per Newton iteration and not several times, which would lead to a wrong solution.
In the following, we present a slightly more sophisticated algorithm with two additional features:

• a backtracking line search to choose λ;
• simplified Newton steps in order to avoid too many assemblies of the Jacobian matrix a′(u_h^{n,j})(δu_h^n, φ).

The second point is a trade-off: the fewer assemblies we perform, the more Newton's performance deteriorates; quadratic convergence is lost, resulting in a fixed-point-like scheme in which many more nonlinear iterations j are required to achieve the tolerance. On the other hand, when we do not re-construct the Jacobian, we save the assembly and can keep working with the previous matrix. For details, we refer, for instance, to [40].
Algorithm 13.29 (Backtracking line search). A simple strategy is to modify the update step in (284) as follows: for given λ ∈ (0, 1), determine the minimal l∗ ∈ N via l = 0, 1, ..., N_l such that

‖R(u_{h,l}^{n,j+1})‖_∞ < ‖R(u_h^{n,j})‖_∞,
u_{h,l}^{n,j+1} = u_h^{n,j} + λ^l δu_h^n.

For the minimal l, we set

u_h^{n,j+1} := u_{h,l∗}^{n,j+1}.

In this context, the nonlinear residual R(·) is defined as

R(u_h^{n,j})(φ_i) := a(u_h^{n,j})(φ_i) − l(φ_i) ∀φ_i ∈ Vh,

and

‖R(u_h^{n,j})‖_∞ := max_{i∈[1,dim(Vh)]} |R(u_h^{n,j})(φ_i)|.
The final full Newton algorithm based on a contraction of the residuals reads:

Algorithm 13.30 (Residual-based Newton's method with backtracking line search and simplified Newton steps). Given an initial Newton guess u_h^{n,0} ∈ Vh. For the iteration steps j = 0, 1, 2, 3, ...:

1. Find δu_h^n ∈ Vh such that

   a′(u_h^{n,j})(δu_h^n, φ) = −a(u_h^{n,j})(φ) + l(φ) ∀φ ∈ Vh, (288)
   u_h^{n,j+1} = u_h^{n,j} + λ_j δu_h^n, (289)

   for λ_j = 1.

2. The criterion for convergence is the contraction of the residuals:

   ‖R(u_h^{n,j+1})‖_∞ < ‖R(u_h^{n,j})‖_∞. (290)

3. If (290) is violated, re-compute u_{h,l}^{n,j+1} in (289) by halving the damping parameter, λ_j^l = 0.5^l, and compute for l = 1, ..., l_M (e.g., l_M = 5) a new solution

   u_h^{n,j+1} = u_h^{n,j} + λ_j^l δu_h^n,

   until (290) is fulfilled for an l∗ < l_M or l_M is reached. In the latter case, no convergence is obtained and the program aborts.

4. In case of l∗ < l_M, we next check the stopping criterion:

   ‖R(u_h^{n,j+1})‖_∞ ≤ TOL_N (absolute),
   ‖R(u_h^{n,j+1})‖_∞ ≤ TOL_N ‖R(u_h^{n,0})‖_∞ (relative).

   If this criterion is fulfilled, set u_h^n := u_h^{n,j+1}. Otherwise, we increment j → j + 1 and go to Step 1.
Remark 13.31 (On using simplified Newton steps). Usually, when the Newton reduction rate

θ_k = ‖R(u_h^{n,j+1})‖ / ‖R(u_h^{n,j})‖

is sufficiently good, e.g., θ_k ≤ θ_max < 1 (where, e.g., θ_max ≈ 0.1), a common strategy is to work with the 'old' Jacobian matrix, but with a new right-hand side.
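The following compact C++ sketch mimics Algorithm 13.30 on a small algebraic system (an illustrative stand-in, chosen by me, for the discretized PDE residual; in the PDE setting, the 2x2 Jacobian solve is replaced by a large sparse linear solve). Full Newton steps λ = 1 are tried first, and λ is halved only when the residual contraction (290) fails; the simplified-Newton reuse of the Jacobian is omitted for brevity:

#include <cstdio>
#include <cmath>

// Residual-based Newton with backtracking line search for the illustrative
// system F(x) = (x0^2 + x1^2 - 4, x0*x1 - 1) = 0.
static void F(const double x[2], double r[2])
{ r[0] = x[0]*x[0] + x[1]*x[1] - 4.0; r[1] = x[0]*x[1] - 1.0; }

static double norm_inf(const double r[2])
{ return std::fmax(std::fabs(r[0]), std::fabs(r[1])); }

int main()
{
  const double TOL = 1e-12;
  const int lM = 5;                        // maximal number of line search steps
  double x[2] = {2.0, 0.0};                // initial guess
  double r[2]; F(x, r);

  for (int j = 0; j < 30 && norm_inf(r) > TOL; ++j)
  {
    // Jacobian F'(x) and direct solve of F'(x) dx = -F(x) (2x2 Cramer rule)
    const double J[2][2] = {{2.0*x[0], 2.0*x[1]}, {x[1], x[0]}};
    const double det = J[0][0]*J[1][1] - J[0][1]*J[1][0];
    const double dx0 = (-r[0]*J[1][1] + r[1]*J[0][1]) / det;
    const double dx1 = (-r[1]*J[0][0] + r[0]*J[1][0]) / det;

    // backtracking: lambda = 1, 1/2, 1/4, ... until the residual contracts
    const double res_old = norm_inf(r);
    double lambda = 1.0, xt[2], rt[2];
    for (int l = 0; l < lM; ++l)
    {
      lambda = std::pow(0.5, l);
      xt[0] = x[0] + lambda*dx0; xt[1] = x[1] + lambda*dx1;
      F(xt, rt);
      if (norm_inf(rt) < res_old) break;   // contraction (290) fulfilled
    }
    x[0] = xt[0]; x[1] = xt[1]; r[0] = rt[0]; r[1] = rt[1];
    printf("j = %2d : lambda = %5.3f, ||F||_inf = %.3e\n", j, lambda, norm_inf(r));
  }
  return 0;
}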
13.13 Continuing Section 13.8.2, now with Newton

Given from Section 13.8.2:

(∇u, ∇φ) + (u², φ) = (f, φ) ∀φ ∈ V.

Let us now formulate a Newton scheme: Find δu ∈ V such that

(∇δu, ∇φ) + (2u^k δu, φ) = −(∇u^k, ∇φ) − ((u^k)², φ) + (f, φ) ∀φ ∈ V,
u^{k+1} = u^k + δu.
13.14 Continuing Section 13.8.3, now with Newton

Taking the problem from Section 13.8.3, we first formulate the semi-linear form (we could have done this therein as well):

a(u)(φ) = (u u′, φ′) − (f, φ).

Thus we want to solve:

a(u)(φ) = 0.

For this, we need (see, e.g., Section 13.10.1):

Find δu ∈ V : a′(u^k)(δu, φ) = −a(u^k)(φ),
Update: u^{k+1} = u^k + δu.

Thus, we need to clarify the meanings of u^k, δu, φ:

• u^k is the previous iterate;
• δu is our unknown and therefore the trial function;
• φ is, as usual, the test function.

As the derivative of a(u)(φ), we obtain with the help of Section 7.4:

a′(u)(δu, φ) = (δu u′ + u δu′, φ′).

Then, the defect step of Newton's method at iteration k reads, for the unknown δu:

(δu (u^k)′ + u^k δu′, φ′) = −(u^k (u^k)′, φ′) + (f, φ).

We need to represent δu as an FEM function:

δu = Σ_{j=1}^n a_j φ_j.
Since the left-hand side is linear in δu and the right-hand side only contains given data, we obtain:

Σ_{j=1}^n a_j (φ_j (u^k)′ + u^k φ_j′, φ_i′) = −(u^k (u^k)′, φ_i′) + (f, φ_i),  i = 1, ..., n.

Similar to Section 13.8.3, we are now back to known topics; see Section 8.4. In a bit more detail, we need to assemble a matrix in which both φ_j and φ_j′ arise, together with a standard stiffness matrix with coefficients depending on the previous solution:

Σ_{j=1}^n a_j [ (φ_j (u^k)′, φ_i′) + (u^k φ_j′, φ_i′) ] = −(u^k (u^k)′, φ_i′) + (f, φ_i).

Thus:

A U = B

with

U = (a_j)_j ∈ Rⁿ,  A = ( (φ_j (u^k)′, φ_i′) + (u^k φ_j′, φ_i′) )_{ij} ∈ Rⁿˣⁿ,  B = ( −(u^k (u^k)′, φ_i′) + (f, φ_i) )_i ∈ Rⁿ.
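This derivation translates almost line by line into code. The following C++ sketch assembles the tridiagonal Newton system A δU = B for P1 elements on a uniform mesh using midpoint quadrature and solves it with the Thomas algorithm; the manufactured data f(x) = −1 + 6x − 6x², with exact solution u(x) = x(1 − x), and the initial guess are my own illustrative choices:

#include <cstdio>
#include <cmath>
#include <vector>

// Newton's method for -(u u')' = f, u(0) = u(1) = 0, with P1 FEM on a uniform
// mesh and midpoint quadrature per element.
int main()
{
  const int n = 49;                          // interior nodes
  const double h = 1.0 / (n + 1);
  const double PI = 3.14159265358979323846;
  std::vector<double> U(n + 2, 0.0);         // nodal values incl. boundary zeros
  for (int i = 1; i <= n; ++i)
    U[i] = 0.5 * std::sin(PI * i * h);       // positive initial bump

  for (int it = 1; it <= 20; ++it)
  {
    // tridiagonal Jacobian (sub a, diag d, super c) and Newton residual b
    std::vector<double> a(n, 0.0), d(n, 0.0), c(n, 0.0), b(n, 0.0);
    for (int e = 0; e <= n; ++e)             // elements [x_e, x_{e+1}]
    {
      const double xm = (e + 0.5) * h;                  // element midpoint
      const double ubar = 0.5 * (U[e] + U[e + 1]);      // u_k(xm)
      const double s = (U[e + 1] - U[e]) / h;           // u_k' on the element
      const double fm = -1.0 + 6.0 * xm - 6.0 * xm * xm;
      const double sgn[2] = {-1.0, 1.0};                // h * phi' on element
      for (int li = 0; li < 2; ++li)
      {
        const int i = e + li;
        if (i < 1 || i > n) continue;                   // Dirichlet rows skipped
        b[i - 1] += -ubar * s * sgn[li] + 0.5 * h * fm; // -a(u_k)(phi_i)+(f,phi_i)
        for (int lj = 0; lj < 2; ++lj)
        {
          const int j = e + lj;
          if (j < 1 || j > n) continue;                 // homogeneous updates
          const double Aij = 0.5 * s * sgn[li]          // (phi_j u_k', phi_i')
                     + ubar / h * sgn[li] * sgn[lj];    // (u_k phi_j', phi_i')
          if (j == i)          d[i - 1] += Aij;
          else if (j == i + 1) c[i - 1] += Aij;
          else                 a[i - 1] += Aij;
        }
      }
    }
    // Thomas algorithm for the Newton update dU
    for (int i = 1; i < n; ++i)
    {
      const double m = a[i] / d[i - 1];
      d[i] -= m * c[i - 1]; b[i] -= m * b[i - 1];
    }
    std::vector<double> dU(n);
    dU[n - 1] = b[n - 1] / d[n - 1];
    for (int i = n - 2; i >= 0; --i) dU[i] = (b[i] - c[i] * dU[i + 1]) / d[i];

    double nrm = 0.0, err = 0.0;
    for (int i = 1; i <= n; ++i)
    {
      U[i] += dU[i - 1];
      nrm = std::fmax(nrm, std::fabs(dU[i - 1]));
      err = std::fmax(err, std::fabs(U[i] - i * h * (1.0 - i * h)));
    }
    printf("it %2d: ||dU||_inf = %.3e, ||U - u||_inf = %.3e\n", it, nrm, err);
    if (nrm < 1e-10) break;
  }
  return 0;
}

With this setup, one typically observes the fast decrease of ‖δU‖∞ over the Newton iterations, while the error ‖U − u‖∞ stagnates at the mesh-dependent discretization error.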
13.15 The regularized p-Laplacian

As previously characterized, the p-Laplacian is an example of a quasi-linear PDE. In this section, we state the complete problem, present an error analysis, and then a numerical test. In this example, we confirm the theoretical error analysis and also discuss the iteration numbers of the nonlinear and linear solvers. For the linear solver, we use a geometric multigrid method as preconditioner.
13.15.1 Problem statement

We assume Ω to be a bounded polygonal domain in Rᵈ, with d = 2, and Γ_D = ∂Ω. We consider the following scalar p-type problem:

−div A(∇u) = f in Ω,  u = u_D on Γ_D, (291)

where f : Ω → R and u_D : Γ_D → R are given smooth functions. The operator A : R² → R² has the following p-power-law form:

A(∇u) = (ε² + |∇u|²)^{(p−2)/2} ∇u, (292)

where p ∈ (1, ∞) and ε > 0 are model parameters and |·|² = (·, ·). The function a(∇u) = (ε² + |∇u|²)^{(p−2)/2} is the diffusivity term of (291). As we can observe from (292), the nonlinear nature of the problem is due to the appearance of |∇u| in the diffusivity function, and this poses numerical challenges. We introduce the function F : R² → R², closely related to the operator A, by

F(a) = (ε² + |a|²)^{(p−2)/4} a. (293)
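For concreteness, a small stand-alone C++ evaluation of the diffusivity a(∇u), the operator A(∇u), and the function F from (292)-(293) (the gradient values are hypothetical; p and ε are chosen as in the numerical test of Section 13.15.3):

#include <cstdio>
#include <cmath>

// Evaluate a(grad u), A(grad u), and F(grad u) of (292)-(293) for a
// given gradient g = (g0, g1).
int main()
{
  const double p = 1.01, eps = 0.1;          // model parameters
  const double g[2] = {0.5, -0.25};          // hypothetical gradient values

  const double g2 = g[0] * g[0] + g[1] * g[1];
  const double a  = std::pow(eps * eps + g2, 0.5 * (p - 2.0));  // diffusivity
  const double fF = std::pow(eps * eps + g2, 0.25 * (p - 2.0)); // F-factor

  printf("a(grad u) = %g\n", a);
  printf("A(grad u) = (%g, %g)\n", a * g[0], a * g[1]);
  printf("F(grad u) = (%g, %g)\n", fF * g[0], fF * g[1]);
  return 0;
}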

For the mathematical setting, let 1 ≤ p ≤ ∞ be fixed and l be a non-negative integer. As usual, Lp (Ω)
1
denotes Lebesgue spaces for which Ω |u(x)|p dx < ∞, endowed with the norm kukLp (Ω) = Ω |u(x)|p dx p ,
R R

and W l,p (Ω) is the Sobolev space, which consists of the functions φ : Ω → R such that their weak derivatives
Dα φ with |α| ≤ l belong to Lp (Ω). If φ ∈ W l,p (Ω), then its norm is defined by
X 1
kφkW l,p (Ω) = kDα φkpLp (Ω) p and kφkW l,∞ (Ω) = max kDα φkL∞ (Ω) ,
0≤|α|≤l
0≤|α|≤l

for 1 ≤ p < ∞ and p = ∞, respectively. We refer the reader to [1] for more details about Sobolev spaces.
Furthermore, we define the spaces
l,p
WD := {u ∈ W l,p (Ω) : u|∂Ω = uD }, and W0l,p := {u ∈ W l,p (Ω) : u|∂Ω = 0}. (294)
In what follows, positive constants c and C appearing in the inequalities are generic constants which do not depend on the mesh size h. We indicate what the constants may depend on for a better understanding of the proofs. Frequently, we will write a ∼ b, meaning that ca ≤ b ≤ Ca, with c and C independent of the mesh size.

The weak formulation of (291) reads as follows:

Formulation 13.32. Find u ∈ W_D^{1,p} such that

B(u, φ) = l_f(φ) ∀φ ∈ W_0^{1,p}(Ω),  where  B(u, φ) = ∫_Ω A(∇u) · ∇φ dx,  and  l_f(φ) = ∫_Ω f φ dx. (295a)

Depending on the form of A and on the range of p, the well-posedness has been examined by means of monotone operators in several works; see, e.g., [33, 35].
Problem (295) is equivalent to the minimization problem:

Formulation 13.33. Find u ∈ W_D^{1,p} such that J(u) ≤ J(φ) ∀φ ∈ W_D^{1,p}, (296)

where J : W_D^{1,p} → R is defined by

J(φ) = (1/p) ∫_Ω (ε² + |∇φ|²)^{p/2} dx − ∫_Ω f φ dx. (297)
Furthermore, one can show that the Gateaux derivative of B is given by

B′(u)(v, w) = ∫_Ω (ε² + |∇u|²)^{(p−2)/2} ∇v · ∇w dx (298)
            + (p − 2) ∫_Ω (ε² + |∇u|²)^{(p−4)/2} (∇u · ∇v)(∇u · ∇w) dx,  for u, v, w ∈ W_D^{1,p}. (299)
Hypothesis 13.34. Let l ≥ 2 be an integer, p ∈ (1, ∞), and d = 2. We assume that the solution u of (295) belongs to V := W_D^{l,p}(Ω), where either (l − 1)p > d and p > 1, or (l − 1)p < d and p > 2d/(d + 1). Further, we assume that ∇F(∇u) ∈ L²(Ω).

Theorem 13.35. Let u ∈ V be the solution of (295) under Hypothesis 13.34, and let u_h ∈ V_{D,h}^{(k)} be the solution of the discrete p-Laplacian. Then there exists C ≥ 0, independent of the grid size h, such that

∫_Ω |F(∇u) − F(∇u_h)|² dx ≤ C h^{2(l−1)} ‖u‖²_{W^{l,p}(Ω)}. (300)

Proof. See [128].
13.15.2 Augmented Lagrangian techniques as one option for the numerical solution

In the following two paragraphs, we transform the original problem (291) into a saddle-point problem using augmented Lagrangian techniques. Let q be the conjugate exponent of p, that is, 1/p + 1/q = 1, and let us define the space W ⊂ W_D^{1,p} × (Lᵖ(Ω))² by

W = {(v, q) | (v, q) ∈ W_D^{1,p} × (Lᵖ(Ω))² : ∇v − q = 0}. (301)

Following [59], we introduce the augmented Lagrangian L_r̂ defined, for r̂ > 0, by

L_r̂(v, q, λ) = (1/p) ∫_Ω (ε² + |q|²)^{p/2} dx − ∫_Ω f v dx + (r̂/2) ∫_Ω |∇v − q|² dx + ∫_Ω λ · (∇v − q) dx, (302)

and the following saddle-point problem associated with (302):

Find {u, q, λ} ∈ W_D^{1,p} × (Lᵖ(Ω))² × (L^q(Ω))² such that (303a)
L_r̂(u, q, µ) ≤ L_r̂(u, q, λ) ≤ L_r̂(v, w, λ), (303b)
∀{v, w, µ} ∈ W_D^{1,p} × (Lᵖ(Ω))² × (L^q(Ω))². (303c)

Here, the Lagrangian has very similar properties to the Lagrangian defined in Section 9.6.
13.15.3 Numerical test: p-Laplacian with Newton-CG nonlinear/linear solver and GMG preconditioning

We take Example No. 2 from [128] and implement as linear solver the CG method with geometric multigrid preconditioning. The solver configurations are as in Section 10.8, namely TOL_Lin = 1e−12. For the Newton solver, we choose the tolerance TOL_New = 1e−10. A plot of the solution is presented in Figure 64.

Figure 64: Solution of the p-Laplace problem. Configuration taken from [128], Example No. 2.

We obtain the following numerical results:
FE degree eps p TOL (LinSolve) TOL(Newton)
===========================================================
1 0.1 1.01 1e-12 1e-10
-----------------------------------------------------------
Cells DoFs h F-norm err ConvRate Min/MaxLinIter Newton iter
===========================================================================================
4096 4225 2.20971e-02 5.43340e-02 1.04683e+00 18/28 5
16384 16641 1.10485e-02 2.03360e-02 1.41782e+00 27/37 5
65536 66049 5.52427e-03 1.01906e-02 9.96804e-01 25/33 4
262144 263169 2.76214e-03 5.09674e-03 9.99587e-01 25/30 3
1048576 1050625 1.38107e-03 2.54855e-03 9.99899e-01 24/28 3
4194304 4198401 6.90534e-04 1.27430e-03 9.99975e-01 23/25 3
===========================================================================================
Observations:

• The convergence rate measured in the F-norm is close to 1, as expected from the theory stated above and proven in [128].
• The number of linear iterations is (nearly) mesh-independent and asymptotically stable. Therefore, the geometric multigrid preconditioner works very well.
• The number of nonlinear Newton iterations is mesh-independent and asymptotically stable as well.
13.16 Temporal discretization of nonlinear time-dependent problems

In the domain Ω and the time interval I = (0, T), we consider the abstract problem

Ā(U)(Ψ) = F̄(Ψ),

where

Ā(U)(Ψ) = ∫₀ᵀ A(U)(Ψ) dt,  F̄(Ψ) = ∫₀ᵀ F(Ψ) dt.

The abstract problem can either be treated by a full space-time Galerkin formulation, which was investigated previously for fluid problems in Besier et al. [18, 19, 119]; alternatively, the Rothe method or the method of lines can be used, in which temporal and spatial discretization are treated separately.

For reasons explained in Chapter 12, we follow the Rothe method. After semi-discretization in time, we obtain a sequence of generalized steady-state problems that are completed by appropriate boundary values at every time step. Let us now go into detail and let

I = {0} ∪ I₁ ∪ ... ∪ I_N

be a partition of the time interval I = [0, T] into half-open subintervals I_n := (t_{n−1}, t_n] of (time step) size k_n := t_n − t_{n−1} with

0 = t₀ < ... < t_N = T.
Example 13.36 (Variational space-time formulation of Navier-Stokes). For Navier-Stokes, we have
$$\mathcal{A}(U)(\Psi) := \int_I A(U)(\Psi) \, dt$$
with
$$A(U)(\Psi) := (\partial_t v, \psi^v) + (v \cdot \nabla v, \psi^v) + (\sigma, \nabla \psi^v) + (\nabla \cdot v, \psi^p).$$
Moreover,
$$\mathcal{F}(\Psi) := \int_I (f, \psi^v) \, dt.$$
The full variational form reads:
Formulation 13.37 (Space-time formulation NSE). Find U := (v, p) ∈ X such that
$$\mathcal{A}(U)(\Psi) + (v(0) - v^0, \psi^v(0)) = \mathcal{F}(\Psi) \quad \forall \Psi := (\psi^v, \psi^p) \in X$$
with
$$X := \{U = (v, p) \,|\, v \in L^2(I, H_0^1(\Omega)^n), \ \partial_t v \in L^2(I, L^2(\Omega)^n), \ p \in L^2(I, L^2(\Omega)/\mathbb{R})\}.$$
Since X can be continuously embedded into $C(\bar I, L^2(\Omega)^n) \times L^2(I, L^2(\Omega)/\mathbb{R})$, the expression of the initial data v(0) makes sense for functions U ∈ X.
Remark 13.38. The pressure is only defined up to a constant when only Dirichlet data are considered.
Working for instance with the do-nothing condition implicitly normalizes the pressure on the outflow boundary
and no pressure filter is necessary.

13.16.1 Classifying the semi-linear forms of PDE systems


When we work with PDE systems or coupled problems, several PDEs arise. Here, it is useful to classify the semi-linear forms a priori into different categories.
Definition 13.39 (Arranging the semi-linear form into groups). We formally define the following semi-linear forms and group them into four categories:
$$\text{Stationary terms:} \quad A_k \Rightarrow A_S, \qquad (304)$$
$$\text{Time-dependent terms:} \quad A_k \Rightarrow A_T, A_E, A_I, \qquad (305)$$
where $A_S$ are stationary terms, $A_T$ terms with time derivatives, $A_E$ terms that are treated explicitly in a time-stepping scheme, and $A_I$ terms that are treated implicitly in a time-stepping scheme (e.g., the pressure in Navier-Stokes).

We explain the term $A_T$ in a bit more detail in the following. Since $A_T$ contains the continuous time derivative, i.e.,
$$A_T(U)(\Psi) = (\partial_t(\ldots), \Psi),$$
we first need to discretize via a backward difference quotient:
$$A_T(U)(\Psi) \approx A_T(U^{n,k})(\Psi) = \Big(\frac{1}{k_n}(\ldots), \Psi\Big).$$

13.16.2 Nonlinear time derivatives


When the time derivative is fully nonlinear, we first apply the product rule and then apply the difference quotient with a θ-weighting of the nonlinear coefficient. We explain this procedure with the help of an example.
Example 13.40. Let u and v be solution variables of a nonlinear system. Define J := J(u) and let us consider:
$$\partial_t(Jv) = J \partial_t v + v \partial_t J.$$
Applying the difference quotient and the θ-approximation for the terms in front of the time derivatives yields
$$J^\theta \Big(\frac{v - v^{n-1}}{k}\Big) + v^\theta \Big(\frac{J - J^{n-1}}{k}\Big)$$
with
$$J^\theta = \theta J^n + (1-\theta)J^{n-1}, \qquad v^\theta = \theta v^n + (1-\theta)v^{n-1}.$$
In the notation of the previous grouping, we thus obtain
$$\hat A_T(\hat U^{n,k})(\hat\Psi) = \Big(J^\theta \frac{v - v^{n-1}}{k}, \psi\Big) + \Big(v^\theta \frac{J - J^{n-1}}{k}, \psi\Big). \qquad (306)$$

Remark 13.41 (Convenience of the splitting). The splitting has no advantage except a clean notation. In more detail, it allows for a very compact notation when applying One-Step-θ and Fractional-Step-θ schemes [29] to nonlinear coupled PDE systems and multiphysics problems in which some terms are stationary and others are nonstationary, and in which some terms are treated explicitly and other terms fully implicitly. For instance, for the temporal discretization of the Navier-Stokes equations, the pressure should be treated fully implicitly; thus no terms such as $(1-\theta)\hat p_f^{n-1}$ should appear. Separating all terms in the above way immediately yields the correct temporal scheme as used, for instance, in [136]. Using this notational idea was one key part that greatly facilitated the implementation of the further extension to FSI plus PFF. Of course, it makes no difference in the code itself (except for its readability) whether such a splitting is used or not. However, the readability and sustainability of programming code is extremely important. The goal should be to have a code that can be immediately used again by yourself and extended after some time (let's say three years of non-usage), or a code that is usable immediately by others; here as well, the author has made very positive experiences with the code in [137]. For general comments on code development and its treatment, we also refer the reader to [90].

13.16.3 One-Step-θ and Fractional-Step-θ schemes

Definition 13.42. Let the previous time step solution $\hat U^{n-1}$ and the time step $k := k_n = t_n - t_{n-1}$ be given. Find $\hat U^n$ such that
$$\hat A_T(\hat U^{n,k})(\hat\Psi) + \theta \hat A_E(\hat U^n)(\hat\Psi) + \hat A_P(\hat U^n)(\hat\Psi) + \hat A_I(\hat U^n)(\hat\Psi) = -(1-\theta)\hat A_E(\hat U^{n-1})(\hat\Psi) + \theta \hat F^n(\hat\Psi) + (1-\theta)\hat F^{n-1}(\hat\Psi), \qquad (307)-(309)$$
where $\hat F^n(\hat\Psi) = (\hat\rho_s \hat f_s^n, \hat\psi_s^v)_{\hat\Omega_s}$ with $\hat f_s^n := \hat f_s(t_n)$. The concrete scheme depends on the choice of the parameter θ as defined in Definition 12.11.


Definition 13.43. Let us choose $\theta = 1 - \frac{\sqrt{2}}{2}$, $\theta' = 1 - 2\theta$, and $\alpha = \frac{1-2\theta}{1-\theta}$, $\beta = 1 - \alpha$. The time step is split into three consecutive sub-time steps. Let $\hat U^{n-1}$ and the time step $k := k_n = t_n - t_{n-1}$ be given. Find $\hat U^n$ such that
$$\hat A_T(\hat U^{n-1+\theta,k})(\hat\Psi) + \alpha\theta \hat A_E(\hat U^{n-1+\theta})(\hat\Psi) + \theta \hat A_P(\hat U^{n-1+\theta})(\hat\Psi) + \hat A_I(\hat U^{n-1+\theta})(\hat\Psi) = -\beta\theta \hat A_E(\hat U^{n-1})(\hat\Psi) + \theta \hat F^{n-1}(\hat\Psi), \qquad (310)-(312)$$
$$\hat A_T(\hat U^{n-\theta,k})(\hat\Psi) + \beta\theta' \hat A_E(\hat U^{n-\theta})(\hat\Psi) + \theta' \hat A_P(\hat U^{n-\theta})(\hat\Psi) + \hat A_I(\hat U^{n-\theta})(\hat\Psi) = -\alpha\theta' \hat A_E(\hat U^{n-1+\theta})(\hat\Psi) + \theta' \hat F^{n-\theta}(\hat\Psi), \qquad (313)-(315)$$
$$\hat A_T(\hat U^{n,k})(\hat\Psi) + \alpha\theta \hat A_E(\hat U^n)(\hat\Psi) + \theta \hat A_P(\hat U^n)(\hat\Psi) + \hat A_I(\hat U^n)(\hat\Psi) = -\beta\theta \hat A_E(\hat U^{n-\theta})(\hat\Psi) + \theta \hat F^{n-\theta}(\hat\Psi). \qquad (316)-(317)$$
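The following self-contained C++ sketch illustrates the One-Step-θ scheme of Definition 13.42 for the scalar toy problem u' + u³ = s(t) (our own example, not taken from the references): the explicit group A_E(u) = u³ and the right-hand side are θ-weighted between the two time levels, and each time step is solved by a scalar Newton iteration, anticipating Section 13.18.

#include <cmath>
#include <cstdio>

// Minimal sketch of the One-Step-theta scheme for u' + u^3 = s(t):
// (u^n - u^{n-1})/k + theta*(u^n)^3 + (1-theta)*(u^{n-1})^3
//   = theta*s(t_n) + (1-theta)*s(t_{n-1}).
int main() {
  const double theta = 0.5;          // 0: forward Euler, 1: backward Euler,
                                     // 0.5: Crank-Nicolson
  const double T = 1.0, k = 0.01;
  auto s = [](double t) { return std::cos(t); };

  double u = 1.0, t = 0.0;           // initial condition u(0) = 1
  while (t < T - 1e-12) {
    const double un1 = u, tn = t + k;
    // Contribution of the old time level (explicitly theta-weighted terms).
    double rhs_old = (1.0 - theta) * (s(t) - un1 * un1 * un1);
    double x = un1;                  // Newton initial guess: old solution
    for (int it = 0; it < 20; ++it) {
      double R  = (x - un1) / k + theta * (x * x * x - s(tn)) - rhs_old;
      double dR = 1.0 / k + 3.0 * theta * x * x;   // Jacobian R'(x)
      double dx = -R / dR;
      x += dx;
      if (std::fabs(dx) < 1e-14) break;
    }
    u = x;
    t = tn;
  }
  std::printf("u(%g) = %.8f\n", T, u);
  return 0;
}

The three sub-steps of the Fractional-Step-θ scheme in Definition 13.43 are implemented analogously, each sub-step being one such nonlinear solve with the corresponding weights αθ, βθ', etc.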

13.17 Resulting time discretized problems


With the help of the previous considerations, we formulate a statement for the time-discretized equations:
Formulation 13.44. Let the semi-linear form $\hat A(\cdot)(\cdot)$ be formulated in terms of the previous arrangement, such that
$$\hat A(\hat U)(\hat\Psi) := \hat A_T(\hat U)(\hat\Psi) + \hat A_I(\hat U)(\hat\Psi) + \hat A_E(\hat U)(\hat\Psi) + \hat A_P(\hat U)(\hat\Psi).$$
After time discretization, let the time derivatives be approximated by
$$\hat A_T(\hat U)(\hat\Psi) \approx \hat A_T(\hat U^{n,k})(\hat\Psi),$$
such that the time-discretized semi-linear form reads
$$\hat A(\hat U^n)(\hat\Psi) := \hat A_T(\hat U^{n,k})(\hat\Psi) + \hat A_I(\hat U^n)(\hat\Psi) + \hat A_E(\hat U^n)(\hat\Psi) + \hat A_P(\hat U^n)(\hat\Psi).$$
Then, we aim to find $\hat U^n \in \hat X_D^0$ for all n = 1, 2, ..., N such that
$$\hat A(\hat U^n)(\hat\Psi) = \hat F(\hat\Psi) \quad \forall \hat\Psi \in \hat X,$$
where this equation is treated with one specific time-stepping scheme as introduced previously.

13.18 An academic example of finite-difference-in-time, Galerkin-FEM-in-space discretization and linearization in a Newton setting
We collect ingredients from most other sections to discuss a final PDE, which is time-dependent and nonlinear. This specific PDE has no use in either theory or practice and is purely academic, but it is inspired by the conservation of momentum of a nonstationary, nonlinear, fluid-structure interaction problem [136].
We are given the following example:
Problem 13.45. Let Ω = (0, 1) and I = (0, T) with the end time value T > 0. Let the following PDE be given: Find v : Ω × I → R and u : Ω × I → R, for almost all times, such that
$$\partial_t^2 u - J \nabla \cdot (\sigma(v) F^{-T}) = f \quad \text{in } \Omega, \quad \text{plus bc. and initial cond.,}$$
where J := J(u), F := F(u), and $\sigma(v) = \nabla v + \nabla v^T$.

13.18.0.1 Time discretization We aim to apply a One-Step-θ scheme to the mixed problem:
$$\partial_t v - J \nabla \cdot (\sigma(v) F^{-T}) = f,$$
$$\partial_t u - v = 0.$$
One-Step-θ discretization with time step size k and θ ∈ [0, 1] leads to
$$\frac{v - v^{n-1}}{k} - \theta J \nabla \cdot (\sigma(v) F^{-T}) - (1-\theta) J^{n-1} \nabla \cdot (\sigma(v^{n-1})(F^{-T})^{n-1}) = \theta f + (1-\theta) f^{n-1},$$
$$\frac{u - u^{n-1}}{k} - \theta v - (1-\theta) v^{n-1} = 0.$$

13.18.0.2 Spatial pre-discretization: weak form on the continuous level We multiply by the time step k, test with functions from suitable spaces V and W, and obtain the weak formulations
$$(v - v^{n-1}, \varphi) + k\theta (J\sigma(v)F^{-T}, \nabla\varphi) + k(1-\theta)(J^{n-1}\sigma(v^{n-1})(F^{-T})^{n-1}, \nabla\varphi) = k\theta(f, \varphi) + k(1-\theta)(f^{n-1}, \varphi) \quad \forall \varphi \in V,$$
$$(u - u^{n-1}, \psi) - k\theta(v, \psi) - k(1-\theta)(v^{n-1}, \psi) = 0 \quad \forall \psi \in W.$$
Sorting terms onto the left- and right-hand sides:
$$(v, \varphi) + k\theta(J\sigma(v)F^{-T}, \nabla\varphi) = (v^{n-1}, \varphi) - k(1-\theta)(J^{n-1}\sigma(v^{n-1})(F^{-T})^{n-1}, \nabla\varphi) + k\theta(f, \varphi) + k(1-\theta)(f^{n-1}, \varphi) \quad \forall \varphi \in V,$$
$$(u, \psi) - k\theta(v, \psi) = (u^{n-1}, \psi) + k(1-\theta)(v^{n-1}, \psi) \quad \forall \psi \in W.$$

13.18.0.3 A single semi-linear form - first step towards the Newton solver Now, we build a single semi-linear form A(·)(·) and right-hand side F(·). Let12 U := {v, u} ∈ V × W and Ψ := {ϕ, ψ} ∈ V × W. Find U ∈ V × W such that A(U)(Ψ) = F(Ψ) with
$$A(U)(\Psi) = (v, \varphi) + k\theta(J\sigma(v)F^{-T}, \nabla\varphi) + (u, \psi) - k\theta(v, \psi),$$
$$F(\Psi) = (v^{n-1}, \varphi) - k(1-\theta)(J^{n-1}\sigma(v^{n-1})(F^{-T})^{n-1}, \nabla\varphi) + k\theta(f, \varphi) + k(1-\theta)(f^{n-1}, \varphi) + (u^{n-1}, \psi) + k(1-\theta)(v^{n-1}, \psi)$$
for all Ψ ∈ V × W.

13.18.0.4 Evaluation of directional derivatives - second step for the Newton solver Let δU := {δv, δu} ∈ V × W. Then the directional derivative of A(U)(Ψ) is given by:
$$A'(U)(\delta U, \Psi) = (\delta v, \varphi) + k\theta\big(J'(\delta u)\sigma(v)F^{-T} + J\sigma'(\delta v)F^{-T} + J\sigma(v)(F^{-T})'(\delta u), \nabla\varphi\big) + (\delta u, \psi) - k\theta(\delta v, \psi),$$
where we applied the product and chain rules to the term $J\sigma(v)F^{-T}$. In the 'non-prime' terms of the nonlinear part, namely J, $F^{-T}$, and σ(v), the previous Newton iterate is inserted. We now have all ingredients to perform the Newton step (284).

12 Of course, the solution spaces for ansatz- and test functions might differ.

13.18.0.5 Spatial discretization in finite-dimensional spaces In the final step, we assume conforming finite-dimensional subspaces $V_h \subset V$ and $W_h \subset W$ with $V_h := \mathrm{span}\{\varphi_1, \ldots, \varphi_N\}$ and $W_h := \mathrm{span}\{\psi_1, \ldots, \psi_M\}$. Then, the update solution variables in each Newton step are represented as
$$\delta v_h := \sum_{j=1}^N \delta v_j \varphi_j, \qquad \delta u_h := \sum_{j=1}^M \delta u_j \psi_j.$$
For the linearized semi-linear form we evaluate $A'(U_h)(\delta U_h, \Psi_h)$, and with this ($M_{ij}$ representing the entries of the Jacobian, $M_{ij} := A'(U_h)(\Psi_j, \Psi_i)$), the i-th row of the linear system reads
$$(M\,\delta U)_i = A'(U_h)(\delta U_h, \Psi_i) = \sum_{j=1}^N \delta v_j (\varphi_j, \varphi_i) + k\theta\Big(\sum_{j=1}^M \delta u_j J'(\psi_j)\sigma(v)F^{-T} + J\,\sigma'\Big(\sum_{j=1}^N \delta v_j \varphi_j\Big)F^{-T} + J\sigma(v)\sum_{j=1}^M \delta u_j (F^{-T})'(\psi_j), \nabla\varphi_i\Big) + \sum_{j=1}^M \delta u_j (\psi_j, \psi_i) - k\theta \sum_{j=1}^N \delta v_j (\varphi_j, \psi_i)$$
for all test functions running through i = 1, ..., N, N+1, ..., N+M.

13.18.0.6 Newton’s method and the resulting linear system We recall:

A0 (Uhn,j )(δUhn , φ) = −A(Uhn,j )(φ) + F (φ),


Uhn,j+1 = Uhn,j + λδUhn .

The linear equation system reads in matrix form:

M δU = B (318)

where M has been defined before, B is the discretization of the residual:

B ∼ A(un,j
h )(φ) + F (φ)

and the solution vector δU is given by

δU = (δv1 , . . . , δvN , δu1 , . . . , δuM )T .

Since we dealt originally with two PDEs (now somewhat hidden in the semilinear form), it is often desirable
to write (318) in block form:     
Mvv Mvu δv Bv
=
Muv Muu δu Bu
where Bv and Bu are the residual parts corresponding to the first and second PDEs, respectively. Since in
general such matrix systems are solved with iterative solvers, the block form allows a better view on the
structure and construction of preconditioners. For instance, starting again from (318), a preconditioner is a
matrix P −1 such that
P −1 M δU = P −1 B
so that the condition number of P −1 M is moderate. Obviously, the ideal preconditioner would be the inverse
of A: P −1 = A−1 . In practice one tries to build P −1 in such a way that
 
−1 I ∗
P M=
0 I

and where $P^{-1}$ is a lower triangular block matrix:
$$P^{-1} = \begin{pmatrix} P_1^{-1} & 0 \\ P_3^{-1} & P_4^{-1} \end{pmatrix}.$$
The procedure is as follows (see lectures on linear algebra in which such inverses are constructed explicitly). We factorize M into a lower block triangular factor P and a unit upper block triangular remainder:
$$M = \begin{pmatrix} M_{vv} & M_{vu} \\ M_{uv} & M_{uu} \end{pmatrix} = \underbrace{\begin{pmatrix} M_{vv} & 0 \\ M_{uv} & S \end{pmatrix}}_{=P} \begin{pmatrix} I & M_{vv}^{-1} M_{vu} \\ 0 & I \end{pmatrix},$$
where $S = M_{uu} - M_{uv} M_{vv}^{-1} M_{vu}$ is the so-called Schur complement. Inverting the block lower triangular factor yields
$$P^{-1} = \begin{pmatrix} M_{vv}^{-1} & 0 \\ -S^{-1} M_{uv} M_{vv}^{-1} & S^{-1} \end{pmatrix}.$$
The matrix $P^{-1}$ is used as (exact) preconditioner for M. Indeed, we double-check:
$$\begin{pmatrix} M_{vv}^{-1} & 0 \\ -S^{-1} M_{uv} M_{vv}^{-1} & S^{-1} \end{pmatrix} \begin{pmatrix} M_{vv} & M_{vu} \\ M_{uv} & M_{uu} \end{pmatrix} = \begin{pmatrix} I & M_{vv}^{-1} M_{vu} \\ 0 & I \end{pmatrix}.$$
Tacitly, we assumed in the entire procedure that S and $M_{vv}$ are invertible.
Using $P^{-1}$ in a Krylov method, we only have to perform matrix-vector multiplications such as
$$\begin{pmatrix} X_{new} \\ Y_{new} \end{pmatrix} = \begin{pmatrix} P_1^{-1} & 0 \\ P_3^{-1} & P_4^{-1} \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}.$$
We thus obtain:
$$X_{new} = P_1^{-1} X, \qquad (319)$$
$$Y_{new} = P_3^{-1} X + P_4^{-1} Y. \qquad (320)$$

Remark 13.46. Be careful: we deal with two iterative procedures in this example. Newton's method computes the nonlinear solution iteratively; inside Newton's method, we solve the linear equation systems with a Krylov space method, which is also an iterative method.
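The following small, dense C++ sketch (using the Eigen library, which is our own choice here and not prescribed by the text) applies $P^{-1}$ via (319)-(320). Note that $Y_{new} = S^{-1}(Y - M_{uv} M_{vv}^{-1} X)$ realizes $P_3^{-1}X + P_4^{-1}Y$ without forming $P_3^{-1}$ explicitly; the check confirms that $P^{-1}M$ acts as the identity on the second block, as derived above.

#include <Eigen/Dense>
#include <iostream>

// Dense sketch of the exact block preconditioner: apply P^{-1} via (319)-(320),
// i.e., Xnew = Mvv^{-1} X and Ynew = S^{-1}(Y - Muv * Xnew), which equals
// P3^{-1} X + P4^{-1} Y with P3^{-1} = -S^{-1} Muv Mvv^{-1} and P4^{-1} = S^{-1}.
int main() {
  const int n = 4, m = 3;
  Eigen::MatrixXd Mvv = Eigen::MatrixXd::Random(n, n) + 5.0 * Eigen::MatrixXd::Identity(n, n);
  Eigen::MatrixXd Mvu = Eigen::MatrixXd::Random(n, m);
  Eigen::MatrixXd Muv = Eigen::MatrixXd::Random(m, n);
  Eigen::MatrixXd Muu = Eigen::MatrixXd::Random(m, m) + 5.0 * Eigen::MatrixXd::Identity(m, m);

  // Schur complement S = Muu - Muv Mvv^{-1} Mvu.
  Eigen::MatrixXd S = Muu - Muv * Mvv.lu().solve(Mvu);

  Eigen::VectorXd X = Eigen::VectorXd::Random(n), Y = Eigen::VectorXd::Random(m);
  Eigen::VectorXd Xnew = Mvv.lu().solve(X);              // (319)
  Eigen::VectorXd Ynew = S.lu().solve(Y - Muv * Xnew);   // (320)

  // Check: P^{-1} M is block upper triangular with identity diagonal, i.e.,
  // applying P^{-1} to M*(X;Y) returns (X + Mvv^{-1} Mvu Y; Y).
  Eigen::VectorXd MX = Mvv * X + Mvu * Y, MY = Muv * X + Muu * Y;
  Eigen::VectorXd X2 = Mvv.lu().solve(MX);
  Eigen::VectorXd Y2 = S.lu().solve(MY - Muv * X2);
  std::cout << "second block error of P^-1 M: " << (Y2 - Y).norm() << std::endl;
  return 0;
}

In practice, the exact inverses $M_{vv}^{-1}$ and $S^{-1}$ are of course replaced by inexact inner solves (e.g., multigrid cycles), which turns the exact preconditioner into a practical one.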

13.19 Navier-Stokes - FEM discretization
In a similar way to Section 11.5, we briefly present the FEM discretization of the NSE. As we previously discussed, rather than solving directly for $v_h$ and $p_h$, we now solve for the (Newton) updates $\delta v_h$ and $\delta p_h$ - more on this below. The problem reads:
$$\text{Find } U_h \in X_h \text{ such that:} \quad A(U_h)(\Psi_h) = F(\Psi_h) \quad \forall \Psi_h \in X_h,$$
where
$$A(U_h)(\Psi_h) = (v_h \cdot \nabla v_h, \psi_h^v) + \frac{1}{Re}(\nabla v_h, \nabla \psi_h^v) - (p_h, \nabla \cdot \psi_h^v) + (\nabla \cdot v_h, \psi_h^p),$$
$$F(\Psi_h) = (f, \psi_h^v).$$
Here, we added the dimensionless Reynolds number Re in order to be aware of how the equations change their type if Re is either small or large.
Remark 13.47 (Notations for bilinear and semi-linear forms). If the PDE is linear, we use the notation of a bilinear form: $A(U_h, \Psi_h)$. If the PDE is nonlinear, this is indicated in the notation by writing $A(U_h)(\Psi_h)$.

In the following, we apply Newton's method. Given an initial guess $U_h^0 := \{v_h^0, p_h^0\}$, we must solve the problem:
$$\text{Find } \delta U_h := \{\delta v_h, \delta p_h\} \in X_h \text{ such that:} \quad A'(U_h^l)(\delta U_h, \Psi_h) = -A(U_h^l)(\Psi_h) + F(\Psi_h), \qquad U_h^{l+1} = U_h^l + \delta U_h.$$
Here,
$$A'(U_h^l)(\delta U_h, \Psi_h) = (\delta v_h \cdot \nabla v_h^l + v_h^l \cdot \nabla \delta v_h, \psi_h^v) + \frac{1}{Re}(\nabla \delta v_h, \nabla \psi_h^v) - (\delta p_h, \nabla \cdot \psi_h^v) + (\nabla \cdot \delta v_h, \psi_h^p),$$
$$F(\Psi_h) = (f, \psi_h^v),$$
$$A(U_h^l)(\Psi_h) = (v_h^l \cdot \nabla v_h^l, \psi_h^v) + \frac{1}{Re}(\nabla v_h^l, \nabla \psi_h^v) - (p_h^l, \nabla \cdot \psi_h^v) + (\nabla \cdot v_h^l, \psi_h^p).$$
As explained, we now solve for the updates and their representation with the help of the shape functions:
$$\delta v_h = \sum_{j=1}^{N_V} \delta v_j \psi_h^{v,j}, \qquad \delta p_h = \sum_{j=1}^{N_P} \delta p_j \psi_h^{p,j}.$$
With that, the discrete, linearized convection operator reads:
$$L := \big( (\psi_h^{v,j} \cdot \nabla v_h^l + v_h^l \cdot \nabla \psi_h^{v,j}, \psi_h^{v,i}) \big)_{i,j=1}^{N_V}.$$
The block structure reads:
$$\begin{pmatrix} L + \frac{1}{Re} A & B \\ -B^T & 0 \end{pmatrix} \begin{pmatrix} \delta v \\ \delta p \end{pmatrix} = \begin{pmatrix} f - [L + \frac{1}{Re} A + B]^l \\ 0 - [-B^T]^l \end{pmatrix},$$

where we have added on the right hand side the vectors of Newton’s residual.
Remark 13.48. A numerical test is presented in Section 9.24.

13.20 Newton solver performance for Poiseuille flow in a channel
We take DOpElib, www.dopelib.net, and go to dopelib/Examples/PDE/StatPDE/Example1. The original code solves the (linear) Stokes equations. Since we always use a nonlinear solver, we should obtain convergence in one single Newton step.

Figure 65: Poiseuille flow in a channel: x-velocity and pressure p.

13.20.1 Stokes
First, the absolute residual value is normalized to a relative value. Then, the problem converges in one Newton step.

Newton step: 0 Residual (abs.): 8.83333e-01


Newton step: 0 Residual (rel.): 1.00000e+00
Newton step: 1 Residual (rel.): 2.84364e-15

13.20.2 Navier-Stokes with $\nu_f = 1$
With our previous formulation, we have here $\nu_f = \frac{1}{Re}$. We obtain:
Newton step: 0 Residual (abs.): 8.99844e-01
Newton step: 0 Residual (rel.): 1.00000e+00
Newton step: 1 Residual (rel.): 4.24110e-02
Newton step: 2 Residual (rel.): 5.92363e-07
Newton step: 3 Residual (rel.): 1.30338e-10
Newton step: 4 Residual (rel.): 1.72731e-14
This is quadratic convergence and is considered an 'optimal' Newton convergence for such problems.

13.20.3 Navier-Stokes with $\nu_f = 10^{-2}$
We shift towards convection-dominated flow and now obtain:
Newton step: 0 Residual (abs.): 1.03895e-01
Newton step: 0 Residual (rel.): 1.00000e+00
Newton step: 1 Residual (rel.): 8.47208e-01 LineSearch {3}
Newton step: 2 Residual (rel.): 1.78723e-01 LineSearch {0}
Newton step: 3 Residual (rel.): 2.95891e-02 LineSearch {0}
Newton step: 4 Residual (rel.): 7.34315e-04 LineSearch {0}
Newton step: 5 Residual (rel.): 2.07151e-06 LineSearch {0}
Newton step: 6 Residual (rel.): 1.56055e-08 LineSearch {0}
Newton step: 7 Residual (rel.): 9.93743e-11 LineSearch {0}
In particular at the beginning, the Newton solver now has more difficulties and even needs some backtracking line search steps in order to converge. In steps 4 - 7, the convergence is fast, but no longer quadratic.
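The backtracking line search indicated by the 'LineSearch {·}' entries can be sketched as follows (one common residual-based variant; the actual DOpElib implementation may differ in its details). We use the scalar model problem R(x) = arctan(x) = 0, our own classical example in which full Newton steps overshoot and damping is mandatory:

#include <cmath>
#include <cstdio>

// Residual-based backtracking line search inside Newton's method: the Newton
// correction dx is damped by lambda = 1, 1/2, 1/4, ... until the residual
// decreases. The counter ls mirrors the "LineSearch {ls}" output above.
int main() {
  auto R  = [](double x) { return std::atan(x); };
  auto dR = [](double x) { return 1.0 / (1.0 + x * x); };

  double x = 10.0;                       // deliberately bad initial guess
  double res0 = std::fabs(R(x));
  for (int step = 0; step < 30 && std::fabs(R(x)) > 1e-12 * res0; ++step) {
    double dx = -R(x) / dR(x);           // full Newton correction
    double lambda = 1.0;
    int ls = 0;
    while (std::fabs(R(x + lambda * dx)) >= std::fabs(R(x)) && ls < 10) {
      lambda *= 0.5;                     // backtracking: halve the step
      ++ls;
    }
    x += lambda * dx;
    std::printf("Newton step: %2d  Residual (rel.): %.5e  LineSearch {%d}\n",
                step + 1, std::fabs(R(x)) / res0, ls);
  }
  return 0;
}

As in the Navier-Stokes run above, damping is needed only in the first steps; once the iterate is close to the solution, full steps (LineSearch {0}) are accepted and fast convergence sets in.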

13.21 Chapter summary and outlook


In this chapter, we provided an introduction to nonlinear problems. Most importantly, we introduced numerical solution techniques such as fixed-point schemes and Newton methods. These were illustrated with several examples; finally, we addressed descriptions for solving nonlinear, nonstationary, multiphysics problems. In the next chapter, we focus on situations in which two (or more) PDEs couple and interact with each other.


14 Coupled problems
In this chapter, we explain how two or more PDEs can be coupled on a variational level. In practice, the coupling of different PDEs, or of PDEs with different coefficients in different subdomains, is a timely research topic (the keyword is multiphysics problems13).
Before we start, we refer in particular to:
• Section 4.1.7, in which an example of a linear coupled problem is given;
• Section 5.5, in which an explanation in words only of coupled PDEs is given.
As in the previous chapter, we also refer the reader to my book [141].

14.1 An interface-coupled problem in 1D


In this section, we explain the basic features of an interface-coupled problem in 1D.
Formulation 14.1. Let Ω = (a, b) be an open domain and k := k(x) ∈ R a material coefficient, which is specified as
$$k(x) = \begin{cases} k_1, & \text{for } x \in \Omega_1 = (a, c), \\ k_2, & \text{for } x \in \Omega_2 = (c, b), \end{cases}$$
for a < c < b. The solution is split into two parts:
$$u(x) = \begin{cases} u_1(x), & \text{for } x \in \Omega_1, \\ u_2(x), & \text{for } x \in \Omega_2, \end{cases}$$
which is obtained by solving
$$\begin{cases} -k_1 u_1'' = f & \text{in } \Omega_1, \\ -k_2 u_2'' = f & \text{in } \Omega_2, \\ u_1(a) = u_2(b) = 0, \\ u_1(c) = u_2(c), \\ k_1 u_1'(c) = k_2 u_2'(c). \end{cases} \qquad (321)$$
The third line in (321) is well known to us and denotes homogeneous Dirichlet conditions on the outer boundaries. The last two lines are two new conditions, so-called interface or coupling conditions, which are very characteristic of coupled problems.

Remark 14.2. One of the most famous examples of a coupled (multiphysics) problem is fluid-structure interaction [112, 138], in which the Navier-Stokes equations (fluid flow) are coupled to elasticity (solid mechanics).
In the following, we derive a variational formulation. Again, the first step is to formulate a function space. To this end, we define:
$$V = \{u \in C^0(\bar\Omega) : u|_{\Omega_1} \in C^1(\Omega_1), \ u|_{\Omega_2} \in C^1(\Omega_2) \text{ and } u(a) = u(b) = 0\}.$$
Remark 14.3 (On the Dirichlet coupling condition). Since we seek $u \in C^0(\bar\Omega)$, the first coupling condition $u_1(c) = u_2(c)$ is implicitly contained. In general, the same rule as before holds: Dirichlet conditions need to be set explicitly in the function space, whereas Neumann conditions, i.e., the second coupling condition, appear naturally in the equations through integration by parts.

It holds:

13 The word multiphysics is self-explanatory: here, multiple physical phenomena interact. Examples are the interaction of solids with temperature, of solids with fluids, of fluids with chemical reactions, and of solids with electromagnetic waves.

Proposition 14.4 (Variational formulation of the coupled problem). Find u ∈ V such that
$$a(u, \phi) = l(\phi) \quad \forall \phi \in V$$
with
$$a(u, \phi) = \int_a^b k(x) u'(x) \phi'(x) \, dx, \qquad (322)$$
$$l(\phi) = \int_a^b f(x) \phi(x) \, dx. \qquad (323)$$
Proof. As in the previous sections, we first multiply by a test function from V and integrate:
$$-\int_a^c k_1 u_1''(x) \phi_1 \, dx - \int_c^b k_2 u_2''(x) \phi_2 \, dx = \int_a^b f(x) \phi \, dx. \qquad (324)$$
We perform integration by parts:
$$\int_a^c k_1 u_1'(x) \phi_1'(x) \, dx - k_1 \big[ u_1'(c)\phi_1(c) - u_1'(a)\phi_1(a) \big] + \int_c^b k_2 u_2'(x) \phi_2'(x) \, dx - k_2 \big[ u_2'(b)\phi_2(b) - u_2'(c)\phi_2(c) \big] = \int_a^b f(x)\phi(x) \, dx. \qquad (325)-(326)$$
Using the continuity (first coupling condition) of φ at the interface c, we obtain:
$$\int_a^b k(x) u'(x) \phi'(x) \, dx - \underbrace{\big[ k_1 u_1'(c) - k_2 u_2'(c) \big]}_{=0} \phi(c) + k_1 u_1'(a) \underbrace{\phi_1(a)}_{=0} - k_2 u_2'(b) \underbrace{\phi_2(b)}_{=0} = \int_a^b f(x)\phi(x) \, dx. \qquad (327)$$
On the outer boundaries, the boundary terms vanish because the test functions are zero there. On the interface, the term vanishes as well, since we can employ the second coupling condition. Consequently, we obtain (322) with u ∈ V.
We next show the equivalence of solutions between the strong form and the variational form:
Proposition 14.5. Let u ∈ V with $u|_{\Omega_i} \in C^2(\Omega_i)$, i = 1, 2. Then, u is a solution of the strong form (321) if and only if u is a solution of the variational problem.
Proof. As usual, we first show (D) → (V). Let u be a solution of (321). Moreover, u has sufficient regularity such that the previous calculations (multiplication with a test function, integration, integration by parts) hold true, and u ∈ V. Therefore, u is a solution of (V).
We now discuss (V) → (D) and consider u ∈ V to be a solution of (V), plus the assumption that u is twice differentiable in each subdomain $\Omega_i$. Consequently, we can integrate by parts backwards. Let φ ∈ V. Then:
$$-\int_a^c k_1 u_1''(x)\phi_1(x) \, dx + k_1\big[u_1'(c)\phi_1(c) - u_1'(a)\phi_1(a)\big] - \int_c^b k_2 u_2''(x)\phi_2(x)\,dx + k_2\big[u_2'(b)\phi_2(b) - u_2'(c)\phi_2(c)\big] = \int_a^b f(x)\phi(x)\,dx. \qquad (328)-(329)$$
On the outer boundary, the test functions in V vanish, and we obtain
$$-\int_a^c k_1 u_1''(x)\phi_1(x)\,dx + k_1 u_1'(c)\phi_1(c) - \int_c^b k_2 u_2''(x)\phi_2(x)\,dx - k_2 u_2'(c)\phi_2(c) = \int_a^b f(x)\phi(x)\,dx. \qquad (330)$$
In the following, we discuss term by term on each interval:

• We choose a subspace of V such that $\phi_2$ is zero and $\phi_1$ has compact support in $\Omega_1$ (yielding specifically that $\phi_1(a) = \phi_1(c) = 0$). Such a choice indeed yields a subspace of V, since its elements are continuous at c and satisfy $\phi(a) = \phi(b) = 0$. Consequently, we obtain
$$-k_1 u_1''(x) = f(x), \quad a < x < c.$$
Analogously, exchanging the roles of $\phi_1$ and $\phi_2$, we get
$$-k_2 u_2''(x) = f(x), \quad c < x < b.$$
Consequently, we have recovered the two differential equations inside $\Omega_1$ and $\Omega_2$.
• We now discuss the second interface condition. In particular, inserting the previous findings into (330) and eliminating the volume terms gives us, for φ ∈ V:
$$k_1 u_1'(c)\phi_1(c) - k_2 u_2'(c)\phi_2(c) = 0.$$
The continuity of φ at c now becomes essential, such that we can write:
$$\big[k_1 u_1'(c) - k_2 u_2'(c)\big]\phi(c) = 0 \quad \forall \phi \in V.$$
We now construct φ ∈ V such that φ(c) = 1 (take, for instance, piecewise linear functions on each subdomain). This yields the transmission condition:
$$k_1 u_1'(c) - k_2 u_2'(c) = 0.$$
• Finally, we also obtain the missing conditions. On the outer boundary, u ∈ V yields
$$u(a) = u(b) = 0.$$
Moreover, since u ∈ V, i.e., $u \in C^0(\bar\Omega)$, we obtain
$$u_1(c) = u_2(c).$$
In summary, u is a solution of (321), which shows that the variational formulation yields the classical solution.

Exercise 11. Exercise 2.8 in [82].
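To close this section, the following self-contained C++ program solves the interface problem (321) with P1 finite elements on a mesh that has a node exactly at the interface c (interface tracking, cf. Section 14.7.1 below), using the Thomas algorithm for the tridiagonal system; all parameter values are our own choice. The one-sided difference quotients verify the Neumann coupling condition $k_1 u_1'(c) = k_2 u_2'(c)$ discretely (up to O(h)).

#include <cstdio>
#include <vector>

// P1 FEM for -(k u')' = f on (0,1), k = k1 on (0,c), k = k2 on (c,1),
// u(0) = u(1) = 0, with a mesh node placed exactly at the interface c.
int main() {
  const int n = 100;                   // elements (even, so c = 0.5 is a node)
  const double h = 1.0 / n, c = 0.5, k1 = 1.0, k2 = 10.0, f = 1.0;
  const int N = n - 1;                 // interior unknowns
  std::vector<double> low(N, 0), dia(N, 0), upp(N, 0), rhs(N, 0);

  for (int e = 0; e < n; ++e) {        // assemble element by element
    double xm = (e + 0.5) * h;         // element midpoint
    double ke = (xm < c) ? k1 : k2;    // piecewise constant coefficient
    int i = e - 1, j = e;              // interior indices of the two nodes
    if (i >= 0) { dia[i] += ke / h; rhs[i] += f * h / 2.0; }
    if (j < N)  { dia[j] += ke / h; rhs[j] += f * h / 2.0; }
    if (i >= 0 && j < N) { upp[i] -= ke / h; low[j] -= ke / h; }
  }

  // Thomas algorithm: forward elimination, then back substitution.
  for (int i = 1; i < N; ++i) {
    double w = low[i] / dia[i - 1];
    dia[i] -= w * upp[i - 1];
    rhs[i] -= w * rhs[i - 1];
  }
  std::vector<double> u(N);
  u[N - 1] = rhs[N - 1] / dia[N - 1];
  for (int i = N - 2; i >= 0; --i) u[i] = (rhs[i] - upp[i] * u[i + 1]) / dia[i];

  // Check the coupling condition k1 u1'(c) = k2 u2'(c) with one-sided quotients.
  int m = n / 2 - 1;                   // interior index of the node at x = c
  double flux_left  = k1 * (u[m] - u[m - 1]) / h;
  double flux_right = k2 * (u[m + 1] - u[m]) / h;
  std::printf("k1 u1'(c) = %.6f,  k2 u2'(c) = %.6f\n", flux_left, flux_right);
  return 0;
}

Note that the flux continuity is not imposed anywhere in the code: it emerges naturally from the variational formulation, exactly as shown in the proof of Proposition 14.4.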

14.2 Volume coupling of two PDEs


Let Ω ⊂ R^n with a sufficiently smooth boundary ∂Ω. Let $V_1$ and $V_2$ be Hilbert or Sobolev spaces that are appropriate for each single PDE. For simplicity, we work with homogeneous Dirichlet conditions on ∂Ω. In volume coupling, both equations are defined in the same domain Ω (this is important here!).
In a formal way, we write for the two variational forms:
$$\text{Find } u_1 \in V_1: \quad A_1(\{u_1, u_2\})(\varphi_1) = F_1(\varphi_1) \quad \forall \varphi_1 \in V_1, \qquad (331)$$
$$\text{Find } u_2 \in V_2: \quad A_2(\{u_1, u_2\})(\varphi_2) = F_2(\varphi_2) \quad \forall \varphi_2 \in V_2, \qquad (332)$$
where $A_1$ and $A_2$ are semi-linear forms representing the partial differential operators. Specifically, the solution variable of the other problem may appear in each equation.

14.3 Partitioned versus monolithic coupling

Figure 66: Monolithic and partitioned coupling schemes. In monolithic coupling, the coupled system is solved all-at-once, whereas in the partitioned scheme subiterations are required to enforce the force balance on the interface.

14.4 Iterative coupling (known as partitioned or staggered) schemes


For coupled problems, partitioned (also so-called staggered) schemes are very often employed. The idea is to iterate between the different equations per time step.
Find U = (u, v) ∈ V × W such that:
$$A_u(U)(\psi^u) = F(\psi^u), \qquad (333)$$
$$A_v(U)(\psi^v) = F(\psi^v). \qquad (334)$$
The idea is to iterate between both PDEs (a code sketch follows the algorithm):
Algorithm 14.6. Given initial iteration values $(u^0, v^0) \in V \times W$, for k = 1, 2, 3, ... iterate:
$$\text{Given } v^{k-1}, \text{ find } u^k: \quad A_u(u^k, v^{k-1})(\psi^u) = F(\psi^u), \qquad (335)$$
$$\text{Given } u^k, \text{ find } v^k: \quad A_v(u^k, v^k)(\psi^v) = F(\psi^v). \qquad (336)$$
Check the stopping criterion:
$$\max(\|u^k - u^{k-1}\|, \|v^k - v^{k-1}\|) < TOL.$$
If the stopping criterion is fulfilled, stop. If not, increment k → k + 1.
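The following C++ sketch mirrors Algorithm 14.6 for a toy coupled 2×2 algebraic system standing in for the two PDEs (our own example; in practice, each line of the loop is a full PDE solve):

#include <algorithm>
#include <cmath>
#include <cstdio>

// Partitioned (staggered) iteration of Algorithm 14.6 for the toy system
//   "PDE 1":  2 u + 0.5 v = 1   (solve for u, v frozen)
//   "PDE 2":  0.3 u + 3 v = 2   (solve for v, u frozen)
int main() {
  const double TOL = 1e-12;
  double u = 0.0, v = 0.0;                   // initial iteration values
  for (int k = 1; k <= 100; ++k) {
    double u_old = u, v_old = v;
    u = (1.0 - 0.5 * v_old) / 2.0;           // given v^{k-1}, find u^k
    v = (2.0 - 0.3 * u) / 3.0;               // given u^k, find v^k
    double change = std::max(std::fabs(u - u_old), std::fabs(v - v_old));
    std::printf("k = %3d   increment = %.3e\n", k, change);
    if (change < TOL) break;                 // stopping criterion
  }
  std::printf("u = %.12f, v = %.12f\n", u, v);
  return 0;
}

This is nothing other than a block Gauss-Seidel iteration; whether (and how fast) it converges depends on the strength of the coupling, which is exactly the practical weakness of partitioned schemes for strongly coupled problems.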

14.5 Variational-monolithic coupling


One possibility is variational-monolithic coupling (in short, often just monolithic), in which we first define the space
$$X = V_1 \times V_2.$$
This allows us to define a compact semi-linear form containing both solution variables. To this end, let $U := (u_1, u_2) \in X$. We then sum up both equations such that
$$\text{Find } U \in X: \quad A(U)(\Psi) = F(\Psi) \quad \forall \Psi \in X,$$
where $\Psi = (\varphi_1, \varphi_2) \in X$ and
$$A(U)(\Psi) := A_1(\{u_1, u_2\})(\varphi_1) + A_2(\{u_1, u_2\})(\varphi_2).$$

If the problem is linear, A(U)(Ψ) can be treated with a linear solver (direct, CG, GMRES, MG, ...) with appropriate preconditioners. In fact, the development of a good preconditioner for coupled problems is very often the most challenging part.
If A(U)(Ψ) is nonlinear (which is the case in most coupled problems), then A(U)(Ψ) can be solved with the techniques explained in Chapter 13.
Remark 14.7. This concept can easily be extended to n PDEs rather than only two.

14.6 Examples
14.6.1 Two stationary elliptic PDEs
We couple two elliptic PDEs. The first describes some displacements, while the second is a heat transfer problem.
Find u : Ω → R such that
$$-\nabla \cdot (T \nabla u) = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega,$$
and find T : Ω → R such that
$$-\nabla \cdot (\alpha \nabla T) = g(u) \ \text{in } \Omega, \qquad T = 0 \ \text{on } \partial\Omega.$$
The function spaces read:
$$V_1 := H_0^1(\Omega), \qquad V_2 := H_0^1(\Omega).$$
The product space reads:
$$X := V_1 \times V_2.$$
This allows us to define U := (u, T) ∈ X and the following compact semi-linear form (note that, since the problem is stationary, no time derivatives appear):
$$A(U)(\Psi) := (T\nabla u, \nabla\psi_1) + (\alpha\nabla T, \nabla\psi_2) - (g(u), \psi_2)$$
and the right-hand side:
$$F(\Psi) := (f, \psi_1).$$
Then, we obtain:
Formulation 14.8. Find U ∈ X such that
$$A(U)(\Psi) = F(\Psi) \quad \forall \Psi \in X.$$
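To illustrate the monolithic idea concretely, the following self-contained C++ sketch solves a 1D finite-difference analogue of Formulation 14.8 with ONE residual for U = (u, T) and a monolithic Newton method. The configuration is our own toy choice: the diffusion coefficient is taken as 1 + T (our own regularization so that the operator stays elliptic), g(u) = u², the Jacobian is approximated by finite differences, and Eigen (our own choice of linear algebra library) provides the dense solve.

#include <Eigen/Dense>
#include <cstdio>

// Monolithic Newton solve of -((1+T) u')' = f, -alpha T'' = u^2 on (0,1),
// u = T = 0 on the boundary; both equations form one residual R(U).
static const int N = 19;                   // interior nodes, h = 1/(N+1)
static const double h = 1.0 / (N + 1), alpha = 1.0, f = 1.0;

Eigen::VectorXd residual(const Eigen::VectorXd& U) {
  Eigen::VectorXd R(2 * N);
  auto u = [&](int i) { return (i < 1 || i > N) ? 0.0 : U(i - 1); };
  auto T = [&](int i) { return (i < 1 || i > N) ? 0.0 : U(N + i - 1); };
  for (int i = 1; i <= N; ++i) {
    double kp = 1.0 + 0.5 * (T(i) + T(i + 1));   // coefficient at x_{i+1/2}
    double km = 1.0 + 0.5 * (T(i) + T(i - 1));   // coefficient at x_{i-1/2}
    R(i - 1) = -(kp * (u(i + 1) - u(i)) - km * (u(i) - u(i - 1))) / (h * h) - f;
    R(N + i - 1) = -alpha * (T(i + 1) - 2.0 * T(i) + T(i - 1)) / (h * h)
                   - u(i) * u(i);                // g(u) = u^2 couples back
  }
  return R;
}

int main() {
  Eigen::VectorXd U = Eigen::VectorXd::Zero(2 * N);
  for (int step = 0; step < 20; ++step) {
    Eigen::VectorXd R = residual(U);
    std::printf("Newton step %2d, |R| = %.3e\n", step, R.norm());
    if (R.norm() < 1e-10) break;
    // Finite-difference Jacobian J_ij = (R_i(U + eps e_j) - R_i(U)) / eps.
    Eigen::MatrixXd J(2 * N, 2 * N);
    const double eps = 1e-7;
    for (int j = 0; j < 2 * N; ++j) {
      Eigen::VectorXd Up = U;
      Up(j) += eps;
      J.col(j) = (residual(Up) - R) / eps;
    }
    U += J.lu().solve(-R);                 // monolithic (all-at-once) update
  }
  std::printf("u(0.5) = %.6f, T(0.5) = %.6f\n", U(N / 2), U(N + N / 2));
  return 0;
}

The off-diagonal blocks of J (∂R_u/∂T and ∂R_T/∂u) carry exactly the coupling terms; in a partitioned scheme, these blocks would be ignored and replaced by subiterations.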

14.6.2 Two time-dependent parabolic PDEs
We now extend the previous equations to a time-dependent situation and couple two parabolic equations.
Find u : Ω × I → R such that
$$\partial_t u - \nabla \cdot (T\nabla u) = f \ \text{in } \Omega \times I, \qquad u = 0 \ \text{on } \partial\Omega \times I, \qquad u(0) = u_0 \ \text{in } \Omega \times \{0\},$$
and find T : Ω × I → R such that
$$\partial_t T - \nabla \cdot (\alpha\nabla T) = g(u) \ \text{in } \Omega \times I, \qquad T = 0 \ \text{on } \partial\Omega \times I, \qquad T(0) = T_0 \ \text{in } \Omega \times \{0\}.$$
The Bochner function spaces read:
$$V_1 := L^2(I, H_0^1(\Omega)), \qquad V_2 := L^2(I, H_0^1(\Omega)).$$
The product space reads:
$$X := V_1 \times V_2.$$
This allows us to define U := (u, T) ∈ X and the following compact semi-linear form:
$$A(U)(\Psi) := \int_I (\partial_t u, \psi_1) + (T\nabla u, \nabla\psi_1) \, dt + (u(0) - u_0, \psi_1(0)) + \int_I (\partial_t T, \psi_2) + (\alpha\nabla T, \nabla\psi_2) - (g(u), \psi_2) \, dt + (T(0) - T_0, \psi_2(0))$$
and the right-hand side:
$$F(\Psi) := \int_I (f, \psi_1) \, dt.$$
Then, we obtain:
Formulation 14.9. Find U ∈ X such that
$$A(U)(\Psi) = F(\Psi) \quad \forall \Psi \in X.$$

14.6.3 The Biot problem


The standard Biot problem in Section 4.1.7 is an example of a volume-coupled problem.

14.7 Interface coupling


14.7.1 Preliminaries: Interface-Tracking and Interface-Capturing
Interface coupling is more challenging than volume coupling since the PDEs live in different subdomains. Two basic methods can be distinguished:
• Interface-tracking
• Interface-capturing
In methods where the domain is decomposed into elements or cells (finite volumes, finite elements, or isogeometric analysis), interface-tracking aligns mesh edges with the interface. These are so-called fitted methods. For moving interfaces, the mesh elements need to be moved as well; see Figure 67 (right). However, mesh elements may become too distorted, such that the approach fails unless (expensive) re-meshing is performed in a proper way.
In interface-capturing methods (unfitted methods), the domain and consequently the single elements stay fixed; see Figure 67 (left). Here, the interface can move freely through the domain. Mesh degeneration is not a problem, but capturing the interface is difficult. In this approach, a further classification can be made:
• Lower-dimensional approaches
• Diffusive techniques
The first class comprises extended/generalized finite elements, cut-cell methods, finite cell methods, and locally modified finite elements. Diffusive methods are the famous level-set method or phase-field methods.

Short summary Finally, choosing between interface-tracking and interface-capturing approaches is a compromise between computational and implementation effort and the accuracy of the desired interface approximation. While interface-capturing methods are in general easier to implement and can deal more easily with moving and evolving interfaces, the accuracy for the same number of degrees of freedom is lower than for comparable interface-tracking approaches. The latter are, however, more challenging when interfaces are moving or propagating, in particular in 3D. To this end, as said just before: it is a compromise, as many things in life.


Figure 67: Left: the mesh is fixed and the interface must be captured. Right: interface-tracking in which the
interface is located on mesh edges.

14.7.2 Interface coupling of two PDEs
We now address interface coupling in more detail. Here, both (or multiple) PDEs live in different subdomains. Let $\Omega_1 \subset R^n$ and $\Omega_2 \subset R^n$ with $\bar\Omega := \bar\Omega_1 \cup \bar\Omega_2$. The interface is described by
$$\Gamma := \bar\Omega_1 \cap \bar\Omega_2.$$
Let both the outer boundary ∂Ω and the interface Γ be sufficiently smooth. Let $V_1(\Omega_1)$ and $V_2(\Omega_2)$ be Hilbert or Sobolev spaces that are appropriate for each single PDE. For simplicity, we work with homogeneous Dirichlet conditions.
Let $u_1 : \Omega_1 \to R$ and $u_2 : \Omega_2 \to R$. For second-order equations (for instance, two Poisson problems) we need to impose two interface conditions on Γ:
$$u_1 = u_2,$$
$$\partial_{n_1} u_1 + \partial_{n_2} u_2 = 0,$$
with the relation of the normal vectors $n_1 = -n_2$ (such that the second condition expresses the continuity of the normal derivative $\partial_{n_1} u_1 = \partial_{n_1} u_2$ across Γ). The first condition is known as the kinematic condition, and the second is the continuity of normal fluxes (forces). The first is a Dirichlet-type condition, which needs to be imposed in the function space. The second one is a Neumann-type condition that will vanish in the weak formulation when a variational-monolithic coupling algorithm is used.
In a formal way, we write for the two variational forms:
$$\text{Find } u_1 \in V_1: \quad A_1(\{u_1, u_2\})(\varphi_1) = F_1(\varphi_1) \quad \forall \varphi_1 \in V_1,$$
$$\text{Find } u_2 \in V_2: \quad A_2(\{u_1, u_2\})(\varphi_2) = F_2(\varphi_2) \quad \forall \varphi_2 \in V_2.$$

14.7.3 Variational-monolithic coupling
One possibility is variational-monolithic coupling (in short, often monolithic), in which we first define the space
$$X = \{v \in H_0^1(\Omega) \,|\, v_1 = v_2 \text{ on } \Gamma\}.$$
Here, the function v is defined over the entire domain Ω, with $v_i := v|_{\Omega_i}$; the kinematic coupling condition is built into X.
This allows us to define a compact semi-linear form containing both solution variables. To this end, let $U := (u_1, u_2) \in X$. Summing up both equations, we obtain:
Proposition 14.10.
$$\text{Find } U \in X: \quad A(U)(\Psi) = F(\Psi) \quad \forall \Psi \in X,$$
where $\Psi = (\varphi_1, \varphi_2) \in X$ and
$$A(U)(\Psi) := A_1(\{u_1, u_2\})(\varphi_1) + A_2(\{u_1, u_2\})(\varphi_2).$$

Proof. After integration by parts of the strong forms, the two subproblems read
$$A_1(\{u_1, u_2\})(\varphi_1) + \int_\Gamma \partial_{n_1} u_1 \varphi_1 \, ds \quad \text{and} \quad A_2(\{u_1, u_2\})(\varphi_2) + \int_\Gamma \partial_{n_2} u_2 \varphi_2 \, ds.$$
We sum up:
$$A_1(\{u_1, u_2\})(\varphi_1) + A_2(\{u_1, u_2\})(\varphi_2) + \int_\Gamma (\partial_{n_1} u_1 \varphi_1 + \partial_{n_2} u_2 \varphi_2) \, ds.$$
We use that $\Psi = (\varphi_1, \varphi_2) \in X$; in particular, $\varphi_1 = \varphi_2$ on Γ. Also recall that $n_1 = -n_2$ on Γ. Then:
$$\int_\Gamma (\partial_{n_1} u_1 \varphi_1 + \partial_{n_2} u_2 \varphi_2) \, ds = \int_\Gamma (\partial_{n_1} u_1 + \partial_{n_2} u_2) \varphi_1 \, ds.$$
Our second coupling condition $\partial_{n_1} u_1 + \partial_{n_2} u_2 = 0$ now comes into play, and therefore the integral on Γ vanishes:
$$\int_\Gamma (\partial_{n_1} u_1 + \partial_{n_2} u_2) \varphi_1 \, ds = 0.$$
Therefore, the sum of both semi-linear forms does not contain any terms on the interface Γ, and consequently we have
$$A(U)(\Psi) := A_1(\{u_1, u_2\})(\varphi_1) + A_2(\{u_1, u_2\})(\varphi_2).$$

14.8 Examples
14.8.1 Two stationary elliptic PDEs
We study two elliptic PDEs in different domains.
Find $u_1 : \Omega_1 \to R$ such that
$$-\nabla \cdot (u_2 \nabla u_1) = f \ \text{in } \Omega_1, \qquad u_1 = 0 \ \text{on } \partial\Omega_1 \setminus \Gamma,$$
and find $u_2 : \Omega_2 \to R$ such that
$$-\nabla \cdot (\alpha \nabla u_2) = g(u_1) \ \text{in } \Omega_2, \qquad u_2 = 0 \ \text{on } \partial\Omega_2 \setminus \Gamma,$$
where α > 0 is a diffusion coefficient.
We impose the following coupling conditions on Γ:
$$u_1 = u_2,$$
$$u_2 \nabla u_1 \cdot n_1 + \alpha \nabla u_2 \cdot n_2 = 0.$$
Using monolithic coupling, the joint function space reads:
$$X = \{v \in H_0^1(\Omega) \,|\, v_1 = v_2 \text{ on } \Gamma\}.$$
This allows us to define $U := (u_1, u_2) \in X$ and the following compact semi-linear form (stationary, hence without time derivatives):
$$A(U)(\Psi) := (u_2 \nabla u_1, \nabla\psi_1)_{\Omega_1} + (\alpha \nabla u_2, \nabla\psi_2)_{\Omega_2} - (g(u_1), \psi_2)_{\Omega_2}$$
and the right-hand side:
$$F(\Psi) := (f, \psi_1)_{\Omega_1}.$$
Then, we obtain:
Formulation 14.11. Find U ∈ X such that
$$A(U)(\Psi) = F(\Psi) \quad \forall \Psi \in X.$$
This formulation looks identical to volume coupling, but it hides the fact that $u_1$ and $u_2$ live on different subdomains.

14.8.2 The augmented Biot-Lamé-Navier problem
The Biot problem coupled to elasticity in Section 4.1.7 is an example in which a volume-coupled problem is coupled via an interface to another problem. Here, the interface conditions must be described and implemented very carefully.

14.9 Variational-monolithic fluid-structure interaction
In this section, we present an example of a nonlinear, nonstationary, interface-coupled problem: fluid-structure interaction (FSI). FSI can be written in terms of semi-linear forms (see the previous sections of this chapter) that allow later for compact notation:
Problem 14.12 (ALE_fx FSI with harmonic and linear-elastic mesh motion). Find $\hat U := \{\hat v_f, \hat v_s, \hat u_f, \hat u_s, \hat p_f, \hat p_s\} \in \hat X_D^0 := \{\hat v_f^D + \hat V_{f,\hat v}^0\} \times \hat L_s \times \{\hat u_f^D + \hat V_{f,\hat u}^0\} \times \{\hat u_s^D + \hat V_s^0\} \times \hat L_f^0 \times \hat L_s^0$, such that $\hat v_f(0) = \hat v_f^0$, $\hat v_s(0) = \hat v_s^0$, $\hat u_f(0) = \hat u_f^0$, and $\hat u_s(0) = \hat u_s^0$ are satisfied, and for almost all times $t \in I$ it holds:
Fluid momentum:
$$\hat A_1(\hat U)(\hat\psi^v) = (\hat J \hat\rho_f \partial_t \hat v_f, \hat\psi^v)_{\hat\Omega_f} + (\hat\rho_f \hat J (\hat F^{-1}(\hat v_f - \hat w) \cdot \hat\nabla) \hat v_f, \hat\psi^v)_{\hat\Omega_f} + (\hat J \hat\sigma_f \hat F^{-T}, \hat\nabla \hat\psi^v)_{\hat\Omega_f} - \langle \hat g_f, \hat\psi^v \rangle_{\hat\Gamma_N} - (\hat\rho_f \hat J \hat f_f, \hat\psi^v)_{\hat\Omega_f} = 0 \quad \forall \hat\psi^v \in \hat V_{f,\hat\Gamma_i}^0,$$
Solid momentum, 1st equation:
$$\hat A_2(\hat U)(\hat\psi^v) = (\hat\rho_s \partial_t \hat v_s, \hat\psi^v)_{\hat\Omega_s} + (\hat F \hat\Sigma, \hat\nabla \hat\psi^v)_{\hat\Omega_s} + \gamma_w (\hat v_s, \hat\psi^v)_{\hat\Omega_s} + \gamma_s (\hat\varepsilon(\hat v_s), \hat\nabla \hat\psi^v)_{\hat\Omega_s} - (\hat\rho_s \hat f_s, \hat\psi^v)_{\hat\Omega_s} = 0 \quad \forall \hat\psi^v \in \hat V_s^0,$$
Fluid mesh motion:
$$\hat A_3(\hat U)(\hat\psi^u) = (\hat\sigma_{mesh}, \hat\nabla \hat\psi^u)_{\hat\Omega_f} = 0 \quad \forall \hat\psi^u \in \hat V_{f,\hat u,\hat\Gamma_i}^0,$$
Solid momentum, 2nd equation:
$$\hat A_4(\hat U)(\hat\psi^u) = \hat\rho_s (\partial_t \hat u_s - \hat v_s, \hat\psi^u)_{\hat\Omega_s} = 0 \quad \forall \hat\psi^u \in \hat L_s,$$
Fluid mass conservation:
$$\hat A_5(\hat U)(\hat\psi^p) = (\widehat{\mathrm{div}}(\hat J \hat F^{-1} \hat v_f), \hat\psi^p)_{\hat\Omega_f} = 0 \quad \forall \hat\psi^p \in \hat L_f^0,$$
Solid volume conservation:
$$\hat A_6(\hat U)(\hat\psi^p) = (\hat P_s, \hat\psi^p)_{\hat\Omega_s} = 0 \quad \forall \hat\psi^p \in \hat L_s^0.$$
The Neumann coupling conditions on $\hat\Gamma_i$ are fulfilled in a variational way and cancel in monolithic modeling. Thus, the condition
$$\langle \hat J \hat\sigma_f \hat F^{-T} \hat n_f, \hat\psi_f^v \rangle_{\hat\Gamma_i} + \langle [\hat F \hat\Sigma] \hat n_s, \hat\psi_s^v \rangle_{\hat\Gamma_i} = 0 \quad \forall \hat\psi_f^v \in H^1(\hat\Omega_f), \ \forall \hat\psi_s^v \in H^1(\hat\Omega_s), \ \hat\psi_f^v = \hat\psi_s^v \text{ on } \hat\Gamma_i, \qquad (337)$$
is implicitly contained in the above system.

14.9.1 A compact semi-linear form
Consequently, we can write in short form:
Problem 14.13. Find $\hat U \in \hat X_D^0$ such that $\hat v_f(0) = \hat v_f^0$, $\hat v_s(0) = \hat v_s^0$, $\hat u_f(0) = \hat u_f^0$, and $\hat u_s(0) = \hat u_s^0$ are satisfied, and for almost all times $t \in I$ it holds:
$$\hat A(\hat U)(\hat\Psi) = 0 \quad \text{for all } \hat\Psi \in \hat X \qquad (338)$$
with $\hat\Psi = \{\hat\psi_f^v, \hat\psi_s^v, \hat\psi_f^u, \hat\psi_s^u, \hat\psi_f^p, \hat\psi_s^p\}$ and $\hat X = \hat V_{f,\hat v}^0 \times \hat L_f \times \hat V_{f,\hat u,\hat\Gamma_i}^0 \times \hat V_s^0 \times \hat L_f^0 \times \hat L_s^0$ and
$$\hat A(\hat U)(\hat\Psi) := \hat A_1(\hat U)(\hat\psi^v) + \hat A_2(\hat U)(\hat\psi^v) + \hat A_3(\hat U)(\hat\psi^u) + \hat A_4(\hat U)(\hat\psi^u) + \hat A_5(\hat U)(\hat\psi^p) + \hat A_6(\hat U)(\hat\psi^p).$$

We apply the previous ideas to fluid-structure interaction. Here, the abstract space-time problem is formally given by:
Definition 14.14. Find $\hat U = \{\hat v_f, \hat v_s, \hat u_f, \hat u_s, \hat p_f, \hat p_s\} \in \hat X_D^0$, where $\hat X_D^0 := \{\hat v_f^D + \hat V_{f,\hat v}^0\} \times \hat L_s \times \{\hat u_f^D + \hat V_{f,\hat u}^0\} \times \{\hat u_s^D + \hat V_s^0\} \times \hat L_f^0 \times \hat L_s^0$, such that
$$\hat{\mathcal{A}}(\hat U)(\hat\Psi) = \int_0^T \hat A(\hat U)(\hat\Psi) \, dt = 0 \quad \forall \hat\Psi \in \hat X, \qquad (339)$$
where $\hat\Psi = \{\hat\psi_f^v, \hat\psi_s^v, \hat\psi_f^u, \hat\psi_s^u, \hat\psi_f^p, \hat\psi_s^p\}$ and $\hat X = \hat V_{f,\hat v}^0 \times \hat L_f \times \hat V_{f,\hat u,\hat\Gamma_i}^0 \times \hat V_s^0 \times \hat L_f^0 \times \hat L_s^0$ and
$$\hat A(\hat U)(\hat\Psi) := \hat A_1(\hat U)(\hat\psi^v) + \hat A_2(\hat U)(\hat\psi^v) + \hat A_3(\hat U)(\hat\psi^u) + \hat A_4(\hat U)(\hat\psi^u) + \hat A_5(\hat U)(\hat\psi^p) + \hat A_6(\hat U)(\hat\psi^p),$$
as defined in Problem 14.12.

14.9.2 Preparing temporal discretization: grouping the FSI problem
Definition 14.15 (Arranging the FSI semi-linear form into groups). We formally define the following semi-linear forms and group them into four categories:
• time equation terms (including the time derivatives);
• implicit terms (e.g., the incompressibility of the fluid);
• pressure terms;
• all remaining terms (stress terms, convection, damping, etc.);
such that
$$\hat A_T(\hat U)(\hat\Psi) = (\hat J \hat\rho_f \partial_t \hat v_f, \hat\psi_f^v)_{\hat\Omega_f} - (\hat\rho_f \hat J (\hat F^{-1}\hat w \cdot \hat\nabla)\hat v_f, \hat\psi_f^v)_{\hat\Omega_f} + (\hat\rho_s \partial_t \hat v_s, \hat\psi_s^v)_{\hat\Omega_s} + (\hat\rho_s \partial_t \hat u_s, \hat\psi_s^u)_{\hat\Omega_s}, \qquad (340)$$
$$\hat A_I(\hat U)(\hat\Psi) = (\alpha_u \hat\nabla \hat u_f, \hat\nabla \hat\psi_f^u)_{\hat\Omega_f} + (\widehat{\mathrm{div}}(\hat J \hat F^{-1} \hat v_f), \hat\psi_f^p)_{\hat\Omega_f} + (\hat P_s, \hat\psi_s^p)_{\hat\Omega_s}, \qquad (341)$$
$$\hat A_E(\hat U)(\hat\Psi) = (\hat\rho_f \hat J (\hat F^{-1}\hat v_f \cdot \hat\nabla)\hat v_f, \hat\psi_f^v)_{\hat\Omega_f} + (\hat J \hat\sigma_{f,vu} \hat F^{-T}, \hat\nabla \hat\psi_f^v)_{\hat\Omega_f} - \langle \rho_f \nu_f \hat F^{-T} \hat\nabla \hat v_f^T \hat n, \hat\psi_f^v \rangle_{\hat\Gamma_{out}} + (\hat F \hat\Sigma, \hat\nabla \hat\psi_s^v)_{\hat\Omega_s} - (\hat\rho_s \hat v_s, \hat\psi_s^u)_{\hat\Omega_s}, \qquad (342)-(343)$$
$$\hat A_P(\hat U)(\hat\Psi) = (\hat J \hat\sigma_{f,p} \hat F^{-T}, \hat\nabla \hat\psi_f^v)_{\hat\Omega_f} + (\hat J \hat\sigma_{s,p} \hat F^{-T}, \hat\nabla \hat\psi_s^v)_{\hat\Omega_s}, \qquad (344)$$
where the reduced stress tensors $\hat\sigma_{f,vu}$, $\hat\sigma_{f,p}$, and $\hat\sigma_{s,p}$ are defined as:
$$\hat\sigma_{f,p} = -\hat p_f \hat I, \qquad \hat\sigma_{f,vu} = \rho_f \nu_f (\hat\nabla \hat v_f \hat F^{-1} + \hat F^{-T} \hat\nabla \hat v_f^T), \qquad (345)$$
$$\hat\sigma_{s,p} = -\hat p_s \hat I \quad \text{(if we deal with the INH or IMR material)}, \qquad (346)$$
and $\hat\Sigma$ denotes, as usual, the structure tensor of the INH, IMR, or STVK material. The time derivative in $\hat A_T(\hat U)(\hat\Psi)$ is approximated by a backward difference quotient. For the time step $t_n \in I$, $n = 1, 2, \ldots, N$ ($N \in \mathbb{N}$), we compute $\hat v_i := \hat v_i^n$, $\hat u_i := \hat u_i^n$ (i = f, s) via
$$\hat A_T(\hat U^{n,k})(\hat\Psi) \approx \frac{1}{k}\big(\hat\rho_f \hat J^{n,\theta}(\hat v_f - \hat v_f^{n-1}), \hat\psi^v\big)_{\hat\Omega_f} - \frac{1}{k}\big(\hat\rho_f (\hat J \hat F^{-1}(\hat u_f - \hat u_f^{n-1}) \cdot \hat\nabla)\hat v_f, \hat\psi^v\big)_{\hat\Omega_f} + \frac{1}{k}\big(\hat\rho_s(\hat v_s - \hat v_s^{n-1}), \hat\psi^v\big)_{\hat\Omega_s} + \frac{1}{k}\big(\hat\rho_s(\hat u_s - \hat u_s^{n-1}), \hat\psi^u\big)_{\hat\Omega_s}, \qquad (347)-(348)$$
where we introduce a parameter θ, which is clarified below. Furthermore, we use
$$\hat J^{n,\theta} = \theta \hat J^n + (1-\theta)\hat J^{n-1},$$
and $\hat u_i^n := \hat u_i(t_n)$, $\hat v_i^n := \hat v_i(t_n)$, and $\hat J := \hat J^n := \hat J(t_n)$. The former time step solution is given by $\hat v_i^{n-1}$, etc., for i = f, s.
Remark 14.16. When working with damped solid equations, one needs to add two terms such that:
$$\hat A_E(\hat U)(\hat\Psi) = \ldots + \gamma_w (\hat v_s, \hat\psi_s^v)_{\hat\Omega_s} + \gamma_s (\hat\varepsilon(\hat v_s), \hat\nabla \hat\psi_s^v)_{\hat\Omega_s}.$$

14.9.3 Implementation in ANS/deal.II and DOpElib; both based on C++
To supplement our algorithmic discussion with some practical aspects, the reader is invited to test some of the concepts presented so far him-/herself. Implementations are available in the two software packages deal.II [10] (source code [137] in ANS14) and DOpElib [41, 62]:
• ALE-FSI in deal.II (ANS) with biharmonic mesh motion
https://journals.ub.uni-heidelberg.de/index.php/ans/issue/view/1244 [137] solving FSI benchmark problems [77]
• Nonstationary Navier-Stokes benchmark problems [118] at
http://www.dopelib.uni-hamburg.de/ Examples/PDE/InstatPDE/Example1
• ALE-FSI benchmark problems [77] with biharmonic mesh motion at
http://www.dopelib.uni-hamburg.de/ Examples/PDE/InstatPDE/Example2
• Biot-Lame-Navier system: augmented Mandel's benchmark at
http://www.dopelib.uni-hamburg.de/ Examples/PDE/InstatPDE/Example6
• Stationary FSI optimization at
http://www.dopelib.uni-hamburg.de/ Examples/OPT/StatPDE/Example9
We notice that the implementation is performed in a practical monolithic way, as described in this chapter, which rests on two assumptions: displacements and velocities are taken from globally defined Sobolev spaces rather than restricting them to the subdomains, and one (major) limitation remains: the development of an iterative linear solver and preconditioners might be challenging, since we do not distinguish the subproblems in the system matrix. Putting these two limitations aside, the idea results in a fascinatingly easy way to implement multiphysics problems on non-overlapping domains.
Remark 14.17. It has turned out that the basic program structure could be easily generalized and extended to other complex multiphysics problems such as moving boundary problems with chemical reactions [88] as well as phase-field fracture propagation [94, 139].

Figure 68: Reference [137] at https://media.archnumsoft.org/10305/


14 ANS = Archive of Numerical Software, http://www.archnumsoft.org/

The structure of such a program for time-dependent nonlinear problems is as follows (a schematic code skeleton is given after the list):
• Input of all data: geometry, material parameters, right hand side values
• Sorting and associating degrees of freedom
• Assembling the Jacobian
• Assembling the right hand side residual
• Newton’s method
• Solution of linear equations
• Postprocessing (output, a posteriori error estimation, mesh refinement)
• Main routine including time step loop
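The schematic C++ skeleton below mirrors the structure above. All types and functions are hypothetical stubs of our own and NOT the deal.II or DOpElib API; in a real code, each stub corresponds to a substantial module.

#include <cstdio>

// Schematic driver for a time-dependent nonlinear FEM code. The stubs stand
// for the building blocks listed above and are problem-specific in practice.
struct Parameters { double T_end = 1.0, k = 0.1; int max_newton = 10; double tol = 1e-8; };
struct Mesh {}; struct DoFHandler {}; struct Vector {}; struct Matrix {};

Parameters read_input()                    { return Parameters{}; }  // input of all data
Mesh       create_mesh(const Parameters&) { return Mesh{}; }
DoFHandler distribute_dofs(const Mesh&)   { return DoFHandler{}; }   // sort/associate DoFs
void   assemble_jacobian(const DoFHandler&, const Vector&, Matrix&) {}
double assemble_residual(const DoFHandler&, const Vector&, Vector&)  { return 0.0; }
void   solve_linear(const Matrix&, const Vector&, Vector&) {}        // linear solver
void   apply_update(Vector&, const Vector&) {}                       // U += (damped) update
void   postprocess(const DoFHandler&, const Vector&, double t)       // output, error est.
{ std::printf("t = %.2f: write output / estimate error / refine\n", t); }

int main() {
  Parameters prm = read_input();
  Mesh mesh = create_mesh(prm);
  DoFHandler dofs = distribute_dofs(mesh);
  Vector U, U_old, res_vec, update;
  Matrix jac;

  for (double t = prm.k; t <= prm.T_end + 1e-12; t += prm.k) {  // time step loop
    U_old = U;                                   // previous time step solution
    for (int it = 0; it < prm.max_newton; ++it) {               // Newton's method
      double res = assemble_residual(dofs, U, res_vec);
      if (res < prm.tol) break;
      assemble_jacobian(dofs, U, jac);
      solve_linear(jac, res_vec, update);
      apply_update(U, update);                   // possibly damped, cf. Section 13.20
    }
    postprocess(dofs, U, t);
  }
  return 0;
}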

14.10 Diffusive interface approximations
Level-set method: Osher S., Sethian J.A.: Fronts Propagating with Curvature-Dependent Speed: Algorithms Based on Hamilton-Jacobi Formulations, Journal of Computational Physics, 79(1), pages 12-49, 1988.
Phase-field methods: Cahn-Hilliard, potential-based approaches, ...

14.11 Sharp interface approximations in cut-elements with a locally modified FEM


This approach was originally proposed in [53, 54], with an open-source code implementation in deal.II at https://doi.org/10.5281/zenodo.1457758.
In this method, the interface can cut the elements, but it is reconstructed not in a diffusive manner but in an exact fashion. The key trick is not to introduce new degrees of freedom on the cuts, but to design a patch and to move the existing degrees of freedom to the cuts. With this, the inner structure of the system matrix and vectors remains the same and does not change at each time step in a time-dependent computation. Thus, the setup of numerical solvers is greatly facilitated.

14.12 On the nonlinear and linear numerical solutions


Coupled problems can be solved with the methods discussed in Chapter 13. While fixed-point or Newton methods can be applied relatively easily, the crucial point is the linear equation systems inside Newton's method. For multiple PDEs with many degrees of freedom, direct solvers such as UMFPACK or MUMPS are too costly and memory-demanding (see the lectures on Numerik 1). Consequently, we need iterative solvers, which (nearly) always require a preconditioner. Constructing such preconditioners is problem-specific and must be re-done for each coupled PDE system. Therefore, this is a lot of work and very time-consuming. However, for an efficient numerical solution, there are only a few alternatives. Black-box preconditioners may work in certain cases. Another possibility is to decouple the PDEs, to iterate (staggered/partitioned approach), and to use well-known solvers for the subproblems.
Either way, in coupled problems, material and model parameters often differ by several orders of magnitude. Therefore, nonlinear and linear solvers may suffer from ill-conditioning, and it must be proven (numerical analysis!) that they are robust under parameter variations. This equally belongs to the development of good solvers.

14.12.1 Example
Our example of an iterative solver, namely GMRES with Schur complements and multigrid preconditioning
was developed and presented in
• Monolithic solver with emphasis on Schur complements [81]
• Parallel monolithic solver [80]
In the latter paper, an extensive literature overview for solvers for fluid-structure interaction as of the year
2018 is provided.

14.13 Chapter summary and outlook
We introduced concepts to couple two (or more) PDEs. Many parts of this chapter are also current research topics. Many open problems can be found in numerical analysis (convergence order), well-posedness, robust and efficient numerical methods, efficient solvers, and so forth.


15 Public climate school (November 2019/2021): PDEs in atmospheric physics
The underlying main literature is the textbook [49]. The introduction to weather models in [121][Chapter 2] is highly recommended, too. In meteorology, mainly six PDEs are coupled, which can be simplified by certain assumptions; for the latter, we refer to the corresponding literature [49, 121] and the references cited therein. We mainly follow [49]. In the following, we first formulate the PDEs, then classify them, and lastly focus on the Navier-Stokes equations.
Disclaimer: The following models are stated in such a way that they still make sense from a physics point of view and serve for our numerical explanations (the main goal in this class!). Concerning their 'correct' behavior in practice and further modifications and/or simplifications, we do not claim full knowledge and refer to the previously given references.

15.1 Notation
Let Ω ⊂ R^d, d = 3, be the spatial domain and I := (0, T] the time interval.

15.2 Five governing equations in meteorology


To model meteorological events and air flow, basically five variables are important:
• air pressure p : Ω × I → R;
• air density ρ : Ω × I → R;
• air humidity w : Ω × I → R;
• air temperature T : Ω × I → R;
• wind velocity v : Ω × I → R^d.
For these variables, we need five governing equations. These are described in the following.
Formulation 15.1 (Ideal gas law for p). Let R > 0 be the gas constant (see, e.g., [49][Chapter 2]), ρ : Ω × I → R the air density, and T : Ω × I → R the temperature. Find p : Ω × I → R such that
$$p = \rho R T \ \text{in } \Omega \times I,$$
plus bc on ∂Ω × I.
Formulation 15.2 (Continuity equation / mass conservation for ρ). Let v : Ω × I → R^d be the air velocity. Find ρ : Ω × I → R such that
$$\partial_t \rho + \nabla \cdot (\rho v) = 0 \ \text{in } \Omega \times I,$$
plus bc on ∂Ω × I, and ρ(0) = ρ_0 in Ω × {t = 0}.
Remark 15.3. Compare this to the mass conservation law in the incompressible NSE from the previous chapters. Specifically, when ρ = const, we obtain ∇ · v = 0.
The next law is also a continuity equation, but now for the water vapor in the air; see [49][Chapter 4] or [121][Eq. (2.5)], describing a single phase concentration (e.g., the gaseous water phase):
Formulation 15.4 (Humidity w; concentration of the gaseous water phase). Let W := W(T, w, ρ) ∈ R quantify the processes that govern phase transitions. Let v : Ω × I → R^d be the air velocity. Find w : Ω × I → R such that
$$\partial_t w + \nabla \cdot (w v) = W \ \text{in } \Omega \times I,$$
plus bc on ∂Ω × I, and w(0) = w_0 in Ω × {t = 0}.

For the temperature, we adopt the first law of thermodynamics (see [49][Chapter 3]):
$$dE = dQ - dA,$$
which is the conservation of energy. Here, dE is the inner energy, dQ the heat applied to the system, and dA the work performed by the system. One possible realization in terms of continuum mechanics is then [49][page 52, (5.3)]: let m be the mass, $c_p > 0$ the specific heat capacity, v the specific volume, and $\frac{d}{dt}\delta Q$ the net heat flow (energy flow). Find T : Ω × I → R such that
$$m c_p \,\partial_t T = m v \,\partial_t p + \frac{d}{dt}\delta Q \ \text{in } \Omega \times I.$$
After some further manipulations, one arrives at [121]:


Formulation 15.5 (First law of thermodynamics for T). Given the specific heat capacity $c_p > 0$ and the thermal conductivity k > 0, let the density ρ, the pressure p, and the velocity v be determined by the continuity equation, the ideal gas law, and the Navier-Stokes equations, respectively. Let F be the net radiative flux (amount of power radiated through a given area) and $\frac{d}{dt}\delta Q := \frac{d}{dt}\delta Q(T, p, \rho)$, as before, the rate of internal heating/cooling, for instance due to phase changes in atmospheric moisture. Find T : Ω × I → R such that
$$\rho c_p \,\partial_t T + p \nabla \cdot v - \nabla \cdot (k \nabla T) = -\nabla \cdot F + \rho \frac{d}{dt}\delta Q \ \text{in } \Omega \times I,$$
$$T = T_D \ \text{on } \partial\Omega \times I, \qquad T(0) = T_0, \quad p(0) = p_0 \ \text{in } \Omega \times \{t = 0\},$$
where the Dirichlet conditions $T = T_D$ are only an example; Neumann conditions are equally important.
Finally, we describe the air flow. Since the earth rotates, we have to include the Coriolis force (a virtual force due to the earth's rotation). Consequently, the equations look a bit different from those of our previous chapters. We simply take [49][Chapter 18, page 259, (18.10)]:
Formulation 15.6 (Navier-Stokes for v). Given the angular velocity ω, the Coriolis force 2ω × v, the gravitational potential Φ, the density ρ, the pressure p, and the dynamic viscosity μ, find v : Ω × I → R^d such that
$$\partial_t v + (v \cdot \nabla)v - \frac{\mu}{\rho}\Big[\Delta v + \frac{1}{3}\nabla(\nabla \cdot v)\Big] + 2\omega \times v + \frac{1}{\rho}\nabla p = -\nabla\Phi \ \text{in } \Omega \times I,$$
$$v = v_D \ \text{on } \partial\Omega \times I, \qquad v(0) = v_0 \ \text{in } \Omega \times \{t = 0\},$$
where $v_D$ (Dirichlet conditions) is just an example; Neumann conditions are equally important.
where vD (Dirichlet conditions) are just an example. Neumann conditions are equally important.
Remark 15.7. As a comparison, we briefly state from Section 13.1.4 a classical flow model using the isothermal, incompressible Navier-Stokes formulation: Find v : Ω × I → R^n and p : Ω × I → R such that
$$\rho \partial_t v + \rho(v \cdot \nabla)v - \mu\Delta v + \nabla p = \rho f \quad \text{(related to Formulation 15.6)},$$
$$\nabla \cdot v = 0 \quad \text{(related to Formulation 15.2)},$$
with μ = ρν, where ν is the kinematic viscosity.

15.3 Some values and units
We illustrate the previous variables with some specific values. For the unit of the pressure, we have
$$[p] = \mathrm{Pa} = \frac{\mathrm{N}}{\mathrm{m}^2} = \frac{\mathrm{kg}}{\mathrm{m\,s}^2} = \frac{\mathrm{J}}{\mathrm{m}^3}.$$
The atmospheric air pressure close to the earth's surface is
$$p_0 = 101\,325\,\mathrm{Pa} = 1.01325\,\mathrm{bar}.$$
Example 15.8. In the daily weather forecast we see high-pressure regions (Hochdruckgebiet) and low-pressure regions (Tiefdruckgebiet). They range between 1030 hPa for high pressure and 970 hPa for low pressure.
Example 15.9. On the moon, the atmospheric air pressure close to the surface is $3 \cdot 10^{-10}$ hPa, and the pressure in a bike tire is between 4 - 8 bar = 4000 - 8000 hPa.
The specific gas constant (of dry air) is
$$R = 287.058\,\frac{\mathrm{J}}{\mathrm{kg \cdot K}},$$
where K stands for Kelvin: K = Celsius + 273.15.
The unit of the air density is
$$[\rho] = \frac{\mathrm{kg}}{\mathrm{m}^3}.$$
The temperature is measured in [T] = K, and the velocity in [v] = m/s. Usually, the humidity is measured as
$$[w] = \frac{\mathrm{g}}{\mathrm{m}^3},$$
which, however, is not an SI unit; consider 1 g = $10^{-3}$ kg.
Energy:
$$[dE] = \mathrm{J} = \mathrm{N \cdot m}.$$
Energy flow or power (mechanical work of 1 J per second):
$$[d/dt\,\delta Q] = 1\,\mathrm{J\,s^{-1}} = 1\,\mathrm{W} = \frac{\mathrm{N\,m}}{\mathrm{s}}.$$
Net radiative flux:
$$[F] = \mathrm{W/m^2}.$$
Example 15.10. In the first half of the year 2021, the total power of wind energy in Germany was 55 772 MW, approx. 56 GW, where W = Watt.

15.4 Classifications
For the terminology, we refer to Chapter 5.
15.4.1 Conservation laws
We derived the previous equations based on the physical conservation laws of mass, momentum, and energy. Furthermore, conservation of water in its solid, liquid, and gaseous phases may enter [121][Section 2.1.1]. Constituent aerosol concentrations can also be quantified by conservation relations [121][Section 2.1.1]. Therefore (we repeat ourselves), physics-based discretization schemes in time and space are important.
15.4.2 Order, steady-state vs. nonstationary, single vs. vector-valued
• Pressure: zeroth order (algebraic equation), single equation
• Density: first order in space and time, single equation
• Water vapor (humidity): first order in space and time, single equation
• Temperature: second order in space, first order in time, single equation
• Velocities: second order in space, first order in time, vector-valued equation

15.4.3 Coupling and nonlinearities
The previous governing equations are clearly coupled: the variables of the single equations enter the other ones. It is obvious that:
• The Navier-Stokes equations are nonlinear (semi-linear) because of the convection term (v · ∇)v;
• All other equations are linear when the solution variables from the other equations are assumed to be given.
• If two equations are solved simultaneously (monolithically), then they may become nonlinear because of their coupling. For instance, if equations No. 2 and 5 are solved simultaneously, then ρ and v are both solution variables. Consequently, equation No. 2 is nonlinear due to the product term ∇ · (ρv).
• Not all solution variables enter into all other equations.

15.4.4 Identification of similar structures


Overall, the previous classifications allow us to identify similar structures. For instance, the first law of thermodynamics and the Navier-Stokes equations share similar structures that allow for similar numerical discretizations:
$$\rho c_p \partial_t T \ \text{and} \ \rho\partial_t v \quad \text{(temporal derivatives)},$$
$$-\nabla \cdot (k\nabla T) \ \text{and} \ -\nabla \cdot (\nu\nabla v) \quad \text{(diffusive terms)},$$
$$\nabla \cdot (\rho v) \ \text{and} \ \nabla \cdot (w v) \quad \text{(divergence terms)}.$$
Likewise, the continuity equation (mass conservation) has the same type of time derivative and also a convective behavior:
$$\partial_t \rho, \quad \nabla \cdot (\rho v),$$
but no diffusion (which is an important difference, resulting in a different order of the PDE). Finally, the Navier-Stokes equations and the temperature equation are basically of parabolic type and can be related to the prototype heat equation $\partial_t u - \Delta u = f$.

15.5 Weak formulations and solution algorithms


15.5.1 Examples of weak formulations
First, construct the function spaces, for instance $V^\rho$. Then, for the continuity equation:
$$\partial_t \rho + \nabla \cdot (\rho v) = 0$$
$$\Rightarrow \quad (\partial_t \rho, \psi^\rho) + (\nabla \cdot (\rho v), \psi^\rho) =: A(\rho)(\psi^\rho).$$
For the Navier-Stokes equations, we obtain:
$$\partial_t v + (v \cdot \nabla)v - \frac{\mu}{\rho}\Big[\Delta v + \frac{1}{3}\nabla(\nabla \cdot v)\Big] + 2\omega \times v = -\frac{1}{\rho}\nabla p - \nabla\Phi,$$
$$\Rightarrow \quad \underbrace{(\partial_t v, \psi^v) + ((v \cdot \nabla)v, \psi^v) + \Big(\frac{\mu}{\rho}\nabla v, \nabla\psi^v\Big) + \Big(\frac{\mu}{\rho}\frac{1}{3}(\nabla \cdot v)I, \nabla\psi^v\Big) + (2\omega \times v, \psi^v)}_{=A(v)(\psi^v)} = \underbrace{-\Big(\frac{1}{\rho}\nabla p, \psi^v\Big) - (\nabla\Phi, \psi^v)}_{=F(\psi^v)}.$$
Here, in the second-order spatial terms, we applied integration by parts. Specifically, we want to mention:
$$-\nabla \cdot (\nabla v) \ \to \ (-\nabla \cdot (\nabla v), \psi) = (\nabla v, \nabla\psi) \quad \text{(if } \psi = 0 \text{ on } \partial\Omega\text{; Green's formula; well known)},$$
$$-\nabla(\nabla \cdot v) \ \to \ (-\nabla(\nabla \cdot v), \psi) = ((\nabla \cdot v)I, \nabla\psi) \quad \text{(if } \psi = 0 \text{ on } \partial\Omega\text{; a bit less well known)}.$$


We briefly verify the second identity component-wise. For vector fields v and ψ with ψ = 0 on ∂Ω, integration by parts yields
$$(-\nabla(\nabla \cdot v), \psi) = -\sum_{i} \int_\Omega \partial_i (\nabla \cdot v)\, \psi_i \, dx = \sum_{i} \int_\Omega (\nabla \cdot v)\, \partial_i \psi_i \, dx = \int_\Omega (\nabla \cdot v)(\nabla \cdot \psi) \, dx.$$
On the other hand, with the Frobenius inner product of matrices,
$$((\nabla \cdot v) I, \nabla\psi) = \int_\Omega (\nabla \cdot v)\, I : \nabla\psi \, dx = \int_\Omega (\nabla \cdot v)\, \mathrm{tr}(\nabla\psi) \, dx = \int_\Omega (\nabla \cdot v)(\nabla \cdot \psi) \, dx,$$
such that both expressions coincide.
15.5.2 Principle solution schemes: monolithic versus staggered

In a monolithic scheme, all equations are assembled into one big matrix (system). This system is costly to solve, but there is no iteration error between the subproblems.
The alternative is a staggered scheme, in which the subproblems are solved in sequential order with smaller matrices. The smaller systems are an advantage, but (possibly many) subiterations are required, and/or the scheme is only conditionally stable; compare with explicit schemes for ODEs.
15.5.3 A fully monolithic formulation
If all equations were solved monolithically, all-at-once (see Chapter 14), the resulting system would be nonlinear and of the form:
Formulation 15.11. Let X be the product function space, i.e., $X := V^p \times V^\rho \times V^w \times V^T \times V^v$, where the spaces contain the possible Dirichlet boundary conditions. Find U := (v, p, T, ρ, w) ∈ X such that $U(0) = U_0$ and, for almost all t ∈ I, it holds:
$$A(U)(\Psi) = F(\Psi) \quad \forall \Psi \in X_0$$
with $\Psi := (\psi^v, \psi^p, \psi^T, \psi^\rho, \psi^w)$ and $X_0$ being the space in which all Dirichlet bc are set to zero. Furthermore, the compact semi-linear form is composed as
$$A(U)(\Psi) := A_p(p)(\psi^p) + A_\rho(\rho)(\psi^\rho) + A_w(w)(\psi^w) + A_T(T)(\psi^T) + A_v(v)(\psi^v),$$
where the single semi-linear forms represent the weak formulations of the single subproblems. Formally, a fixed-point or Newton-type solver from Chapter 13 can be applied.
15.5.4 Staggered solution (time-explicit coupling)
If the equations are solved sequentially, or only partially fully-coupled, then iterative schemes must be constructed as explained in Section 14.4.
Algorithm 15.12. Given the previous time step solution $U^{n-1} = (p^{n-1}, \rho^{n-1}, w^{n-1}, T^{n-1}, v^{n-1}) \in X$, for the time step index n = 1, ..., N find $U^n = (p^n, \rho^n, w^n, T^n, v^n) \in X$ as follows:
1. Given $\rho^{n-1}, w^{n-1}, T^{n-1}, v^{n-1}$, find $p^n \in V^p$: $A_p(p)(\psi^p) = F_p(\psi^p)$ for all $\psi^p \in V^p$
2. Given $w^{n-1}, T^{n-1}, v^{n-1}, p^n$, find $\rho^n \in V^\rho$: $A_\rho(\rho)(\psi^\rho) = F_\rho(\psi^\rho)$ for all $\psi^\rho \in V^\rho$
3. Given $T^{n-1}, v^{n-1}, p^n, \rho^n$, find $w^n \in V^w$: $A_w(w)(\psi^w) = F_w(\psi^w)$ for all $\psi^w \in V^w$
4. Given $v^{n-1}, p^n, \rho^n, w^n$, find $T^n \in V^T$: $A_T(T)(\psi^T) = F_T(\psi^T)$ for all $\psi^T \in V^T$
5. Given $p^n, \rho^n, w^n, T^n$, find $v^n \in V^v$: $A_v(v)(\psi^v) = F_v(\psi^v)$ for all $\psi^v \in V^v$

15.6 FEM discretization in 1D of the ideal gas equation
We derive the FEM discretization in 1D for Formulation 15.1. The strong form is given by
$$p = \rho R T \ \text{in } \Omega,$$
where we notice that ρ and T are given, for instance as $\rho := \rho^{n-1}$ and $T := T^{n-1}$. We keep the boundary conditions open and simply choose $V^p := L^2(\Omega)$ as function space. Now we derive the weak form:
$$\int_\Omega p \psi^p \, dx = \int_\Omega \rho R T \psi^p \, dx \quad \Leftrightarrow \quad (p, \psi^p) = (\rho R T, \psi^p).$$
For the discretization, we choose the finite-dimensional space $V_h^p = \mathrm{span}\{\psi_1^p, \ldots, \psi_S^p\}$ with $\dim(V_h^p) = S$. Then we have: Find $p_h \in V_h^p$ such that
$$(p_h, \psi_h^p) = (\rho_h R T_h, \psi_h^p) \quad \forall \psi_h^p \in V_h^p,$$
or, using the abstract notation: Find $p_h \in V_h^p$ such that
$$a(p_h, \psi_h^p) = l(\psi_h^p) \quad \forall \psi_h^p \in V_h^p.$$
We now choose the linear combination
$$p_h = \sum_{i=1}^S \chi_i^p \psi_i^p$$

and insert
S
X
χpi (ψip , ψhp ) = l(ψhp ) ∀ψhp ∈ Vhp .
i=1

Since this holds for all test functions, we have


S
X
χpi (ψip , ψjp ) = l(ψjp ) ∀ψjp ∈ Vjp .
i=1

Finally, we employ P1 hat functions (see Section 8.4) and obtain:

    j = i :      (ψ_i^p , ψ_i^p)     = ∫_{x_{i−1}}^{x_{i+1}} ψ_i^p ψ_i^p dx     = (2/3) h,
    j = i − 1 :  (ψ_i^p , ψ_{i−1}^p) = ∫_{x_{i−1}}^{x_i} ψ_i^p ψ_{i−1}^p dx     = (1/6) h,
    j = i + 1 :  (ψ_i^p , ψ_{i+1}^p) = ∫_{x_i}^{x_{i+1}} ψ_i^p ψ_{i+1}^p dx     = (1/6) h,
    |i − j| ≥ 2 : (ψ_i^p , ψ_j^p)    = 0.

Consequently, the system matrix (the mass matrix) is the tridiagonal matrix

              ( 4  1            0 )
              ( 1  4  1           )
    M = (h/6) (    ⋱  ⋱  ⋱        )
              (       1  4  1     )
              ( 0           1  4  )
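A minimal python sketch (our own illustration, assuming an equidistant mesh with S nodal points and given nodal values for ρ and T; the variable names are not part of the original notes) assembling this mass matrix and solving the discrete problem reads:

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

R = 287.0                  # specific gas constant of dry air (assumption for illustration)
S = 11                     # number of nodal points
h = 1.0 / (S - 1)          # equidistant mesh size
rho = np.ones(S)           # given density, e.g., rho := rho^{n-1}
T = 280.0 * np.ones(S)     # given temperature, e.g., T := T^{n-1}

# Mass matrix M = (h/6) * tridiag(1, 4, 1)
# (boundary rows kept as in the notes for simplicity)
M = (h / 6.0) * diags([1, 4, 1], [-1, 0, 1], shape=(S, S), format="csr")

# Right hand side l(psi_j) = (rho*R*T, psi_j), evaluated here by applying
# the same mass matrix to the nodal values of rho*R*T
F = M @ (rho * R * T)

chi_p = spsolve(M, F)      # nodal values of p_h (here simply rho*R*T)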

15.7 Navier-Stokes discretization


For details on stationary Navier-Stokes, we refer the reader to Section 13.19. In the nonstationary case, we first
need to discretize time and then obtain a sequence of quasi-stationary problems; see Section 12.4. Moreover,
we refer to the textbook [132].

15.8 Boussinesq approximation


This is an extension of the pure incompressible Navier-Stokes equations used in meteorology. We refer the reader to [49, Chapter 12.7]. In the presence of turbulence phenomena, we again refer to [49] or as well to [127].


16 Computational convergence analysis


We provide some tools to perform a computational convergence analysis. In these notes we faced two situations
of ‘convergence’:
• Discretization error: Convergence of the discrete solution uh towards the (unknown) exact solution
u;
• Iteration error: Convergence of an iterative scheme to approximate the discrete solution u_h through a sequence of approximate solutions u_h^{(k)} , k = 1, 2, . . ..
In the following we further illustrate the terminologies ‘first order convergence’, ‘convergence of order two’, ‘quadratic convergence’, ‘linear convergence’, etc.

16.1 Discretization error


Before we go into detail, we discuss the relationship between the degrees of freedom (DoFs) N and the mesh
size parameter h. In most cases the discretization error is measured in terms of h and all a priori and a
posteriori error estimates are stated in a form
    ‖u − u_h‖ = O(h^α),  α > 0.
In some situations it is, however, better to create convergence plots in terms of DoFs versus the error. One example is when adaptive schemes are employed: the mesh then contains cells with different h, and it is not clear with respect to which h the convergence plot should be drawn. Simply counting the total number of DoFs, on the other hand, poses no problem.

16.1.1 Relationship between h and N (DoFs)


The relationship of h and N depends on the basis functions (linear, quadratic), whether a Lagrange method
(only nodal points) or Hermite-type method (with derivative information) is employed. Moreover, the dimen-
sion of the problem plays a role.
We illustrate the relationship for a Lagrange method with linear basis functions in 1D,2D,3D:
Proposition 16.1. Let d be the dimension of the problem: d = 1, 2, 3. It holds

    N = (1/h + 1)^d ,

where h is the mesh size parameter (length of an element, or its diameter in higher dimensions, for instance), and N is the number of DoFs.
Proof. Sketch. No strict mathematical proof. We initialize as follows:
• 1D: 2 values per line;
• 2D: 4 values per quadrilaterals;
• 3D: 8 values per hexahedra.
Of course, for triangles or prisms, we have different values in 2D and 3D. We work on the unit cell with h = 1; all other h can be realized by simple rescaling. By simply counting the nodal values, we have in 1D
h N
=======
1 2
1/2 3
1/4 5
1/8 9
1/16 17
1/32 33
...
=======

We have in 2D
h N
=======
1 4
1/2 9
1/4 25
1/8 81
1/16 289
1/32 1089
...
=======
We have in 3D
h N
=======
1 8
1/2 27
1/4 125
...
=======
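These values can be verified quickly, for instance with the following small python snippet (our own illustration of Proposition 16.1):

# Verify N = (1/h + 1)^d for linear Lagrange elements on the unit cell
for d in (1, 2, 3):
    for k in range(6):
        h = 1.0 / 2**k
        N = int(round((1.0 / h + 1) ** d))
        print(f"d={d}, h=1/{2**k}: N={N}")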

16.1.2 Discretization error


With the previous considerations, we have now a relationship between h and N that we can use to display the
discretization error.
Proposition 16.2. In the asymptotic limit it holds:

    N ∼ (1/h)^d ,

yielding

    h ∼ 1 / N^{1/d} .
These relationships allow us to replace h in error estimates by N .
Proposition 16.3 (Linear and quadratic convergence in 1D). When we say a scheme has linear or quadratic convergence in 1D (i.e., d = 1), respectively, we mean:

    O(h) = O(1/N)   or   O(h^2) = O(1/N^2).

In a linear scheme, the error is divided by a factor of 2 when the mesh size h is divided by 2; with quadratic convergence, the error decreases by a factor of 4.
Proposition 16.4 (Linear and quadratic convergence in 2D). When we say a scheme has linear or quadratic convergence in 2D (i.e., d = 2), respectively, we mean:

    O(h) = O(1/√N)   or   O(h^2) = O(1/N).

16.1.3 Computationally-obtained convergence order
In order to calculate the convergence order α from numerical results, we make the following derivation. Let
P (k) → P for k → 0 be a converging process and assume that
P (k) − P̃ = O(k α ).
Here P̃ is either the exact limit P (in case it is known) or some ‘good’ approximation to it. Let us assume
that three numerical solutions are known (this is the minimum number if the limit P is not known). That is
P (k), P (k/2), P (k/4).
Then, the convergence order can be calculated via the formal approach P (k) − P̃ = ck α with the following
formula:
Proposition 16.5 (Computationally-obtained convergence order). Given three numerically-obtained values
P (k), P (k/2) and P (k/4), the convergence order can be estimated as:
    α = (1/log(2)) log( (P(k) − P(k/2)) / (P(k/2) − P(k/4)) ).    (349)
The order α is an estimate and heuristic, because we assumed a priori a given order, which strictly speaking we would have to prove first.
Proof. We make the formal ansatz

    P(k) − P(k/2) = c k^α ,
    P(k/2) − P(k/4) = c (k/2)^α .

First, we have

    P(k/2) − P(k/4) = c (k/2)^α = (1/2^α) c k^α = (1/2^α) (P(k) − P(k/2)).

We simply re-arrange:

    2^α = (P(k) − P(k/2)) / (P(k/2) − P(k/4))

    ⇒ α = (1/log(2)) log( (P(k) − P(k/2)) / (P(k/2) − P(k/4)) ).
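A small python helper (our own sketch) evaluating formula (349) reads:

import math

def convergence_order(Pk, Pk2, Pk4):
    # Estimate alpha from P(k), P(k/2), P(k/4) via formula (349)
    return math.log((Pk - Pk2) / (Pk2 - Pk4)) / math.log(2.0)

# Example: backward Euler end time values from the table in Section 16.1.4.2 below
print(convergence_order(4.3959, 4.3115, 4.2720))   # approximately 1.10

Note that the exact limit P cancels in the differences, so three numerical values suffice.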

16.1.4 Temporal order for FE, BE, CN in ODEs


In this section we demonstrate our algorithmic developments in terms of a numerical example. The programming code is written in octave (the open-source sister of MATLAB) and presented in Section 17.2.6.

16.1.4.1 Problem statement We solve our ODE model problem numerically. Let a = g − m with a = 0.25 (test 1), a = −0.25 (test 2), or a = −10 (test 3). The IVP is given by:

    y′ = ay,  y(t0 = 2011) = 2.
The end time value is T = 2014. The tasks are:
• Use the forward Euler (FE), backward Euler (BE), and trapezoidal rule (CN) for the numerical approx-
imation.
• Please observe the accuracy in terms of the discretization error.
• Observe, for (stiff) equations with a large negative coefficient a = −10 (i.e., |a| ≫ 1), the behavior of the three
schemes.

16.1.4.2 Discussion of the results for test 1 a = 0.25 In the following, we present our results for the end
time value y(T = 2014) for test case 1 (a = 0.25) on three mesh levels:
Scheme #steps(N) k y(T)
===========================================
FE 8 0.37500 4.0961
BE 8 0.37500 4.3959
CN 8 0.37500 4.2363
-------------------------------------------
FE 16 0.18750 4.1624
BE 16 0.18750 4.3115
CN 16 0.18750 4.2346
-------------------------------------------
FE 32 0.093750 4.1975
BE 32 0.093750 4.2720
CN 32 0.093750 4.2341
===========================================

• In the second column, i.e., 8, 16, 32, the number of steps (= number of intervals, i.e., so-called mesh cells, speaking in PDE terminology) is given. The third column shows the step size k and the last column the computed end time value y(T), from which the errors |y(T) − y_N| are formed.
• In order to compute numerically the convergence order α with the help of formula (349), we work
with k = kmax = 0.375. Then we identify in the above table that P (kmax ) = P (0.375) = |y(T ) −
y8 |, P (kmax /2) = P (0.1875) = |y(T ) − y16 | and P (kmax /4) = P (0.09375) = |y(T ) − y32 |.

• We monitor that doubling the number of intervals (i.e., halving the step size k) reduces the error in
the forward and backward Euler scheme by a factor of 2. This is (almost) linear convergence, which is
confirmed by using Formula (349) yielding α = 1.0921. The trapezoidal rule is much more accurate (for
instance, using N = 8 the error is 0.2% rather than 13–16%) and we observe that the error is reduced
by a factor of 4. Thus quadratic convergence is detected. Here the ‘exact’ order on these three mesh
levels is α = 2.0022.
• A further observation is that the forward Euler scheme is unstable for N = 16 and a = −10 and shows a zig-zag curve, whereas the other two schemes follow the exact solution, the decreasing exponential function. But for sufficiently small step sizes, the forward Euler scheme is also stable, which we know from our A-stability calculations. These step sizes can be explicitly determined for this ODE model problem, as shown below.

Figure 69: Example 16.1.4.1: on the left, the solution to test 1 is shown. In the middle, test 2 is plotted.
On the right, the solution of test 3 with N = 16 (number of intervals) is shown. Here, N = 16
corresponds to a step size k = 0.18 which is slightly below the critical step size for convergence (see
Section 16.1.4.3). Thus we observe the unstable behavior of the forward Euler method, but also see
slow convergence towards the continuous solution.

16.1.4.3 Investigating the instability of the forward Euler method: test 3 a = −10 With the help of
Example 12.32 let us understand how to choose stable step sizes k for the forward Euler method. The
convergence interval reads:
|1 + z| ≤ 1 ⇒ |1 + ak| ≤ 1
In test 3, a = −10, which yields:
|1 + z| ≤ 1 ⇒ |1 − 10k| ≤ 1
Thus, we need to choose a k that fulfills the previous relation. In this case, k < 0.2 is obtained.
This means that for all k < 0.2 we should have convergence of the forward Euler method and for k ≥ 0.2
non-convergence (and in particular no stability!). We perform the following additional tests:
• Test 3a: N = 10, yielding k = 0.3;
• Test 3b: N = 15, yielding k = 0.2; exactly the boundary of the stability interval;
• Test 3c: N = 16, yielding k = 0.1875; from before;

• Test 3d: N = 20, yielding k = 0.15.


The results of tests 3a, 3b, and 3d are provided in Figure 70 and visualize very nicely the theoretically predicted
behavior.

Figure 70: Example 16.1.4.1: tests 3a,3b,3d: Blow-up, constant zig-zag non-convergence, and convergence of
the forward Euler method.
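The stability restriction can also be checked directly by evaluating the amplification factor |1 + ak| for the four tests (a small python sketch of our own):

a = -10.0
t0, T = 2011.0, 2014.0
for N in (10, 15, 16, 20):          # tests 3a, 3b, 3c, 3d
    k = (T - t0) / N
    print(f"N={N}: k={k:.4f}, |1+ak| = {abs(1.0 + a * k):.4f}")
# |1+ak| > 1: blow-up; |1+ak| = 1: constant zig-zag; |1+ak| < 1: convergence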

Exercise 12. Consider the ODE-IVP:

y 0 (t) = ay(t), y(t0 ) = 7,

where t0 = 5 and T = 11.

1. Implement the backward Euler scheme and set the model parameter to a = 2.
2. Run a second test with a = −2.
3. Implement the forward Euler scheme. What is the critical step size? (Hint: determine the stability interval.)

4. Implement the trapezoidal scheme.


5. Perform a computational analysis and detect the convergence order.
Exercise 13. Implement the predator-prey system and study the solution behavior for different time steps and
the convergence order. Details will be given in the lecture.

Remark 16.6. Another example where the previous formula is used can be found in Section 8.16.4.

16.2 Spatial discretization error
We now simply use h and then

    α = (1/log(2)) log( (P(h) − P(h/2)) / (P(h/2) − P(h/4)) )    (350)

in order to obtain the computational convergence order α. Here,

    P(h) := u_h ,  P(h/2) := u_{h/2} ,  P(h/4) := u_{h/4} ,

where u_h is the discrete PDE solution on the mesh with size h (in practice, a norm or goal functional value computed from it).

16.3 Temporal discretization error


If no analytical solution can be constructed for a time-dependent PDE, the convergence order can again be determined purely numerically. However, we now have influence from both spatial and temporal components. Often, the spatial error is more influential than the temporal one. This requires that a numerical reference solution be computed for sufficiently small h and k. If ‘coarse’ temporal solutions are adopted (k coarse!), the spatial mesh must be sufficiently fine (h small!), because otherwise we see significant influences from the spatial components. Then, we perform the procedure from the previous sections: bisection of k and observation of the decrease in the corresponding norm or goal functional. That is to say, we observe for a discrete PDE solution u:

    u_{k, h_fixed,small} , u_{k/2, h_fixed,small} , u_{k/4, h_fixed,small} , . . . , u_{k_ref, h_fixed,small}

With this, we can set:

    P̃ := u_{k_ref, h_fixed,small} ,  P(k) = u_{k, h_fixed,small} ,  P(k/2) = u_{k/2, h_fixed,small} .

Of course, using P(k/4) is also an option. But keep in mind that h must always be sufficiently small and fixed while the time step size is varied.
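A small python sketch of our own, computing the temporal order from two step sizes and the reference value P̃ via the ansatz P(k) − P̃ = c k^α:

import math

def temporal_order(Pk, Pk2, Pref):
    # Estimate alpha from P(k), P(k/2) and a reference value Pref (= P~)
    return math.log(abs(Pk - Pref) / abs(Pk2 - Pref)) / math.log(2.0)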

16.4 Extrapolation to the limit


In case there is no manufactured solution, the numerical reference value can be improved by extrapolation:

Formulation 16.7. Let h1 and h2 be two mesh sizes with h2 < h1 . The functionals of interest are denoted by Jh1 and Jh2 . Then, the (second-order Richardson) extrapolation formula reads:

    J_extra = (h1² J_{h2} − h2² J_{h1}) / (h1² − h2²).

The value J_extra can then be used as an approximation for P̃ from the previous subsection.
A specific example from [114] realized in octave is:
% Octave code for the extrapolation to the limit
format long

h1 = 1/2^3 % Mesh size h1


h2 = 1/2^4 % Mesh size h2

% Richter/Wick, Springer, 2017, page 383


p1 = 0.784969295 % Functional value J on h1
p2 = 0.786079344 % Functional value J on h2

% Formula MHB, page 335 (Romberg/Richardson)


% See also Quarteroni, Saleri, Ger. 2014, Springer, extrapolation

pMaHB = p2 + (p2 - p1) / ((h1*h1)/(h2*h2) - 1)

% Formula Richter/Wick for extrapolated value


pRiWi = (h1*h1 * p2 - h2*h2 * p1) / (h1*h1 - h2*h2)

16.5 Iteration error
Iterative schemes are used to approximate the discrete solution uh . This has a priori nothing to do with the
discretization error. The main interest is how fast can we get a good approximation of the discrete solution
uh . One example can be found for solving implicit methods for ODEs in which Newton’s method is used to
compute the discrete solutions of the backward Euler scheme.
To speak about convergence, we compare two subsequent iterations:
Proposition 16.8. Let us assume that we have an iterative scheme to compute a root z. The iteration converges with order p when

    ‖x_k − z‖ ≤ c ‖x_{k−1} − z‖^p ,  k = 1, 2, 3, . . .

with p ≥ 1 and c = const. In more detail:

• Linear convergence: c ∈ (0, 1) and p = 1;

• Superlinear convergence: c := c_k → 0 (k → ∞) and p = 1;

• Quadratic convergence: c ∈ R and p = 2.

Cubic and higher orders of convergence are defined analogously with the respective p.
Remark 16.9 (Other characterizations of superlinear and quadratic convergence). Other (but equivalent) formulations for superlinear and quadratic convergence, respectively, in the case z ≠ x_k for all k, are:

    lim_{k→∞} ‖x_k − z‖ / ‖x_{k−1} − z‖ = 0,

    limsup_{k→∞} ‖x_k − z‖ / ‖x_{k−1} − z‖² < ∞.

Corollary 16.10 (Rule of thumb). A rule of thumb for quadratic convergence is: the number of correct digits doubles at each step. For instance, a Newton scheme to compute the root √2 of f(x) = x² − 2 = 0 yields the following results:

Iter x f(x)
==============================
0 3.000000e+00 7.000000e+00
1 1.833333e+00 1.361111e+00
2 1.462121e+00 1.377984e-01
3 1.414998e+00 2.220557e-03
4 1.414214e+00 6.156754e-07
5 1.414214e+00 4.751755e-14
==============================
The programming code to this example can be found in [122].
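A minimal python sketch of our own that reproduces this iteration reads:

x = 3.0                       # initial guess
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
for it in range(6):
    print(f"{it} {x:e} {f(x):e}")
    x = x - f(x) / df(x)      # Newton update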


17 Software and programming codes


There are many software packages around for simulating ODEs and PDEs. A Google search with the term ‘List of finite element software packages’ in October 2020 yielded 44 entries listed on the Wikipedia page https://en.wikipedia.org/wiki/List_of_finite_element_software_packages

17.1 Research software (C++)


Our own software and implementations are based on:
• deal.II www.dealii.org : Poisson’s problem (basic version): step-3 https://www.dealii.org/current/
doxygen/deal.II/step_3.html
• DOpElib www.dopelib.org : Poisson’s problem (basic version) : PDE/StatPDE/Example 4 (2D) and
PDE/StatPDE/Example 6 (3D)

17.2 Programming codes


17.2.1 deal.II (C++) step-3 in a 1D version: clothesline / Poisson problem
// Thomas Wick
// Nov 2019
// Slight adaptation of deal.II www.dealii.org step-3
#include <deal.II/grid/tria.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/grid/tria_accessor.h>
#include <deal.II/grid/tria_iterator.h>
#include <deal.II/dofs/dof_accessor.h>
#include <deal.II/fe/fe_q.h>
#include <deal.II/dofs/dof_tools.h>
#include <deal.II/fe/fe_values.h>
#include <deal.II/base/quadrature_lib.h>
#include <deal.II/base/function.h>
#include <deal.II/numerics/vector_tools.h>
#include <deal.II/numerics/matrix_tools.h>
#include <deal.II/lac/vector.h>
#include <deal.II/lac/full_matrix.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/dynamic_sparsity_pattern.h>
#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/precondition.h>
#include <deal.II/numerics/data_out.h>
#include <fstream>
#include <iostream>

using namespace dealii;

class Step3
{
public:
Step3 ();
void run ();

private:
void make_grid ();

void setup_system ();
void assemble_system ();
void solve ();
void output_results (unsigned int ref_level) const;

Triangulation<1> triangulation;
FE_Q<1> fe;
DoFHandler<1> dof_handler;

SparsityPattern sparsity_pattern;
SparseMatrix<double> system_matrix;

Vector<double> solution;
Vector<double> system_rhs;
};

Step3::Step3 ()
:
fe (2),
dof_handler (triangulation)
{}

void Step3::make_grid ()
{
GridGenerator::hyper_cube (triangulation, 0, 1);
triangulation.refine_global (1);

std::cout << "Number of active cells: "


<< triangulation.n_active_cells()
<< std::endl;
}

void Step3::setup_system ()
{
dof_handler.distribute_dofs (fe);
std::cout << "Number of degrees of freedom: "
<< dof_handler.n_dofs()
<< std::endl;

DynamicSparsityPattern dsp(dof_handler.n_dofs());
DoFTools::make_sparsity_pattern (dof_handler, dsp);
sparsity_pattern.copy_from(dsp);

system_matrix.reinit (sparsity_pattern);

solution.reinit (dof_handler.n_dofs());
system_rhs.reinit (dof_handler.n_dofs());
}

void Step3::assemble_system ()
{
QGauss<1> quadrature_formula(2);
FEValues<1> fe_values (fe, quadrature_formula,

update_values | update_gradients | update_JxW_values);

const unsigned int dofs_per_cell = fe.dofs_per_cell;


const unsigned int n_q_points = quadrature_formula.size();

FullMatrix<double> cell_matrix (dofs_per_cell, dofs_per_cell);


Vector<double> cell_rhs (dofs_per_cell);

std::vector<types::global_dof_index> local_dof_indices (dofs_per_cell);

DoFHandler<1>::active_cell_iterator cell = dof_handler.begin_active();


DoFHandler<1>::active_cell_iterator endc = dof_handler.end();
for (; cell!=endc; ++cell)
{
fe_values.reinit (cell);

cell_matrix = 0;
cell_rhs = 0;

for (unsigned int q_index=0; q_index<n_q_points; ++q_index)


{
for (unsigned int i=0; i<dofs_per_cell; ++i)
for (unsigned int j=0; j<dofs_per_cell; ++j)
cell_matrix(i,j) += (fe_values.shape_grad (i, q_index) *
fe_values.shape_grad (j, q_index) *
fe_values.JxW (q_index));

for (unsigned int i=0; i<dofs_per_cell; ++i)


cell_rhs(i) += (fe_values.shape_value (i, q_index) *
(-1.0) *
fe_values.JxW (q_index));
}
cell->get_dof_indices (local_dof_indices);

for (unsigned int i=0; i<dofs_per_cell; ++i)


for (unsigned int j=0; j<dofs_per_cell; ++j)
system_matrix.add (local_dof_indices[i],
local_dof_indices[j],
cell_matrix(i,j));

for (unsigned int i=0; i<dofs_per_cell; ++i)


system_rhs(local_dof_indices[i]) += cell_rhs(i);
}

std::map<types::global_dof_index,double> boundary_values;
VectorTools::interpolate_boundary_values (dof_handler,
0,
ZeroFunction<1>(),
boundary_values);

VectorTools::interpolate_boundary_values (dof_handler,
1,
ZeroFunction<1>(),

boundary_values);

MatrixTools::apply_boundary_values (boundary_values,
system_matrix,
solution,
system_rhs);
}

void Step3::solve ()
{
SolverControl solver_control (1000, 1e-12);
SolverCG<> solver (solver_control);

solver.solve (system_matrix, solution, system_rhs,


PreconditionIdentity());
}

void Step3::output_results (unsigned int ref_level) const


{
DataOut<1> data_out;
data_out.attach_dof_handler (dof_handler);
data_out.add_data_vector (solution, "solution");
data_out.build_patches (2);

//std::ofstream output ("solution.gpl");

std::string filename_basis;
filename_basis = "solution_";

std::ostringstream filename;
filename << filename_basis
<< Utilities::int_to_string (ref_level, 4)
<< ".gpl";

std::ofstream output (filename.str().c_str());

data_out.write_gnuplot (output);
}

void Step3::run ()
{
make_grid ();

unsigned int max_ref = 5;


for (unsigned int level=0; level<max_ref; level++)
{
if (level > 0)
triangulation.refine_global(1);

setup_system ();
assemble_system ();
solve ();
output_results (level);
}

}

int main ()
{
deallog.depth_console (2);

Step3 laplace_problem;
laplace_problem.run ();

return 0;
}

17.2.2 Poisson 1D in python


Python is a programming language that currently enjoys a lot of popularity in scientific computing because
of artificial intelligence and machine learning. A website to get started and further information is https:
//www.python.org/about/gettingstarted/.
The programming code is based on python version 2.7. In order to perform scientific computing with python
some additional packages might have to be installed first (on some computers they are pre-installed though):
• numpy
• scipy
• matplotlib
The full code to obtain the previous results is:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Thomas Wick
# Ecole Polytechnique
# MAP 502
# Winter 2016/2017

# Execution of *.py files


# Possiblity 1: in terminal
# terminal> python3 file.py # here file.py = poisson.py

# Possiblity 2: executing in an python environment


# such as spyder.

# This program has been tested with python


# version 2.7 on a SUSE linux system version 13.2

# Problem statement
# Numerical solution using finite
# elements of the Poisson problem:
#
# Find u such that
#
# -d/dx (k d/dx) u = f in (0,1)
# u(0) = u(1) = 0
#
# with f=-1

########################################################
# Load packages (need to be installed first if
# not yet done - but is not difficult)
import numpy as np
import matplotlib.pyplot as plt # for plot functons
plt.switch_backend('tkAgg') # necessary for OS SUSE 13.1 version,
# otherwise, the plt.show() function will not display any window

import scipy as sp
from scipy import sparse
from scipy.sparse import linalg

import scipy.linalg # for the linear solution with Cholesky

########################################################
# Setup of discretization and material parameters
N = 200 # Number of points in the discretization
x = np.linspace(0,1,N) # Initialize nodal points of the mesh

# Spatial discretization parameter (step length) : h_i=x_{i+1}-x_i


hx = np.zeros(len(x))
hx[:len(x)-1]=x[1:]-x[:len(x)-1]

h = max(hx) # Maximal element diameter

# Sanity check
#print h

# Material parameter
k= 1.0 * np.ones(len(x))

#########################################################
# Construct and plot exact solution and right hand side f

Nddl = N-2
xddl = x.copy()
xddl = xddl[1:len(x)-1]

# Initialize and construct exact solution


uexact = np.zeros(len(x))
uexact[1:len(x)-1] = 0.5 * (xddl * xddl - xddl)

# Construct right hand side vector


f = (-1) * np.ones(len(x)-2)

#########################################################
# Build system matrix and right hand side
Kjj = k[:N-2]/hx[:N-2] + k[1:N-1]/hx[1:N-1] # the diagonal of Kh
Kjjp = - k[1:N-2]/hx[1:N-2] # the two off-diagonals of Kh


# System matrix (also commonly known as stiffness matrix)


# First argument: the entries
# Second argument: in which diagonals shall the entries be sorted
Ah =sp.sparse.diags([Kjjp,Kjj,Kjjp],[-1,0,1],shape=(Nddl,Nddl))

# Right hand side vector


Fh= (hx[:N-2] + hx[1:N-1]) * 0.5 * f

#########################################################
# Solve system: Ah uh = Fh => uh = Ah^{-1}Fh
utilde = sp.sparse.linalg.spsolve(Ah,Fh)

# Add boundary conditions to uh


# The first and last entry are known to be zero.
# Therefore, we include them explicitely:
uh = np.zeros(len(uexact))
uh[1:len(x)-1]=utilde

#########################################################
# Plot solution
plt.clf()
plt.plot(x,uexact,'r', label='Exact solution $u$')
plt.plot(x,uh,'b+',label='Discrete FE solution $u_h$')
plt.legend(loc='best')
plt.xlim([-0.1,1.1])
plt.ylim([-0.15,0.01])
plt.show()


Figure 71: Solution of the 1D Poisson problem.

17.2.3 Poisson 1D in python (short-elegant version) by Roth/Schröder


This code is based on python3. Here are some hints assuming you have python3 installed, but some packages
are still missing.
# pip is the installer for python
# pip3 is the installer for python3
# Find out current versions
> pip3 --version
> python3 --version

# Numerical and visualization packages


> pip3 install --user scipy
> pip3 install --user matplotlib

# Graphical user interface (GUI):


> pip3 install --user PyQt5==5.9.2

# Running the program. Here the CODE FROM BELOW has to


# be copied into the file poisson.py
> python3 poisson.py

Here is now the code copied from the file poisson.py:

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

import matplotlib.pyplot as plt

N = 10
h = 1 / N
x = np.linspace(0, 1, N + 1)
xFine = np.linspace(0, 1, 5 * N)

print(f"There are {N+1} nodal points and the mesh size is {h}.")

# assemble system
systemMatrix = (1 / h) * diags(
[-1, 2, -1], [-1, 0, 1], shape=(N + 1, N + 1), format="csr"
)
rightHandSide = -h * np.ones(N + 1)

# enforce homogeneous Dirichlet boundary conditions


systemMatrix[0, 0] = 1.0
systemMatrix[0, 1] = 0.0
systemMatrix[N, N] = 1.0
systemMatrix[N, N - 1] = 0.0

rightHandSide[0] = 0.0
rightHandSide[N] = 0.0

# solve linear system


solution = spsolve(systemMatrix, rightHandSide)
analyticSolution = 0.5 * xFine * (xFine - 1)

plt.plot(xFine, analyticSolution, "r", label="Analytic solution $u$")


plt.plot(x, solution, "b+", label="FEM solution $u_h$")
plt.legend(loc="best")
plt.show()

Figure 72: Solution of the 1D Poisson problem.

17.2.4 Poisson 2D in python with assembling and numerical quadrature by Roth/Schröder
In this code, we show also in detail the assembling with test functions and numerical quadrature. The results
are analyzed with a figure and also the evaluation of a goal functional.

import numpy as np
from scipy.sparse import dok_matrix
from scipy.sparse.linalg import spsolve
import matplotlib.pyplot as plt

# right-hand side
f = -1.0

# quadrature points and weights


quad = simpson = [(0.0, 1 / 6), (0.5, 4 / 6), (1.0, 1 / 6)] # simpson rule

# basis functions in 1D
hat0 = {"eval": lambda x: x, "nabla": lambda x: 1}
hat1 = {"eval": lambda x: 1 - x, "nabla": lambda x: -1}
hatFunction = [hat0, hat1]

# basis functions in 2D = tensor product of 1D basis functions


class Phi2D:
def __init__(self, x0, y0):
self.xBasis = hatFunction[x0]
self.yBasis = hatFunction[y0]

# function evaluation
def eval(self, x, y):
return self.xBasis["eval"](x) * self.yBasis["eval"](y)

# derivative
def nabla(self, x, y):
return np.array(
[
self.xBasis["nabla"](x) * self.yBasis["eval"](y),
self.xBasis["eval"](x) * self.yBasis["nabla"](y),
]
)

# linear basis functions in 2D


basisFunction = {0: Phi2D(0, 0), 1: Phi2D(1, 0), 2: Phi2D(0, 1), 3: Phi2D(1, 1)}

class Cell:
def __init__(self, origin, right, up, upRight):
self.dofs = [origin, right, up, upRight]

class DoF:
def __init__(self, x, y, ind=-1):
self.x, self.y = x, y
self.ind = ind


class Grid:
def __init__(self, xMin=0.0, xMax=1.0, yMin=0.0, yMax=1.0, stepSize=0.5):
self.xMin, self.xMax = xMin, xMax
self.yMin, self.yMax = yMin, yMax
self.h = stepSize
self.dofs, self.cells = [], []

def decompose(self):
# create mesh and transform it into a list of x and y coordinates
xRange = np.arange(self.xMin, self.xMax, self.h)
xRange = np.append(xRange, [self.xMax])
self.xNum = len(xRange)
yRange = np.arange(self.yMin, self.yMax, self.h)
yRange = np.append(yRange, [self.yMax])
self.yNum = len(yRange)
self.xCoord, self.yCoord = np.meshgrid(xRange, yRange)
xList, yList = self.xCoord.ravel(), self.yCoord.ravel()

# create DoFs
for i, (x, y) in enumerate(zip(xList, yList)):
self.dofs.append(DoF(x, y, ind=i))

# create cells
for i, dof in enumerate(self.dofs):
if dof.x != self.xMax and dof.y != self.yMax:
self.cells.append(
Cell(
dof,
self.dofs[i + 1],
self.dofs[i + self.xNum],
self.dofs[i + self.xNum + 1],
)
)

def assembleSystem(self):
# system matrix
A = dok_matrix((len(self.dofs), len(self.dofs)), dtype=np.float32)
# system right hand side
F = np.zeros(len(self.dofs), dtype=np.float32)

for cell in self.cells:


for x, y, quadWeight in [(x, y, wX * wY) for (x, wX) in quad for (y, wY) in quad]:
for j, dof_j in enumerate(cell.dofs):
# assemble rhs
F[dof_j.ind] += (
quadWeight * f * basisFunction[j].eval(x, y) * self.h ** 2
)
for i, dof_i in enumerate(cell.dofs):
# assemble matrix
A[dof_i.ind, dof_j.ind] += quadWeight * basisFunction[i].nabla(x, y).dot(basisFunction[j].nabla(x, y))

# apply homogeneous Dirichlet boundary conditions


for dof in self.dofs:
if dof.x in [self.xMin, self.xMax] or dof.y in [self.yMin, self.yMax]:
_, nonZeroColumns = A[dof.ind, :].nonzero()
for j in nonZeroColumns:
A[dof.ind, j] = 0.0
A[dof.ind, dof.ind] = 1.0
F[dof.ind] = 0.0

return A.tocsr(), F

def plotSolution(self, solution):


# 2D contour plot
plt.contourf(
self.xCoord,
self.yCoord,
solution.reshape(self.yNum, self.xNum),
20,
cmap="viridis",
)
plt.colorbar()
plt.show()

def printSolutionAtCenter(self,solution):
#prints the solution at center of the domain
xCenter = (self.xMax + self.xMin)/2
yCenter = (self.yMax + self.yMin)/2
for dof in self.dofs:
if abs(dof.x - xCenter) < 1e-10 and abs(dof.y - yCenter) < 1e-10:
print(f'The Solution at the center x={xCenter},y={yCenter} '
      f'has the value {solution[dof.ind]}.')
break
else:
print(f'The center point x={xCenter},y={yCenter} is not on the grid.')

if __name__ == "__main__":
grid = Grid(stepSize=0.1)
grid.decompose()
A, F = grid.assembleSystem()
solution = spsolve(A, F) # U = A^{-1}F
grid.printSolutionAtCenter(solution)
grid.plotSolution(solution)

Goal functional evaluation:


The Solution at the center x=0.5,y=0.5 has the value -0.07426007837057114.


Figure 73: Solution of the 2D Poisson problem.

17.2.5 Newton’s method for solving nonlinear problems: octave/MATLAB and C++
We present three variants:
1. Root-finding with a standard Newton method for a real-valued function (octave)
2. Root-finding with a defect-correction Newton method for a real-valued function (octave)
3. Root-finding (solving!) a nonlinear PDE (C++ with deal.II)

17.2.5.1 Code for a straightforward implementation of Newton’s method The programming code for
standard Newton’s method is as follows:
% Script file: step_2.m
% Author: Thomas Wick, CMAP, Ecole Polytechnique, 2016
% We seek a root of a nonlinear problem

format long

% Define nonlinear function of which


% we want to find the root
f = @(x) x^2 - 2;

% Define derivative of the nonlinear function


df = @(x) 2*x;

% We need an initial guess


x = 3;

% Define function
y = f(x);

% Set iteration counter


it_counter = 0;
printf('%i %e %e\n',it_counter,x,y);


% Set stopping tolerance


TOL = 1e-10;

% Perform Newton loop


while abs(y) > TOL
dy = df(x);
x = x - y/dy;
y = f(x);
it_counter = it_counter + 1;
printf('%i %e %e\n',it_counter,x,y);
end

17.2.5.2 Code for a Newton defect-correction scheme including line search As second implementation
we present Newton’s method as defect correction scheme as introduced in Definition 13.20. Additionally we
introduce a line search parameter ω ∈ [0, 1]. If the initial Newton guess is not good enough we can enlarge the
convergence radius of Newton’s method by choosing ω < 1. The octave code reads:
% Script file: step_3.m
% Author: Thomas Wick, CMAP, Ecole Polytechnique, 2016
% We seek a root of a nonlinear problem

format long

% Define nonlinear function of which


% we want to find the root
f = @(x) x^2 - 2;

% Define derivative of the nonlinear function


df = @(x) 2*x;

% We need an initial guess


x = 3;

% Define function
y = f(x);

% Set iteration counter


it_counter = 0;
printf('%i %e %e\n',it_counter,x,y);

% Set stopping tolerance


TOL = 1e-10;

% Line search parameter


% omega \in [0,1]
% where omega = 1 corresponds to full Newton
omega = 1.0;

% Newtons method as defect correction


% and line search parameter omega
while abs(y) > TOL
dy = df(x);
% Defect step
dx = - y/dy;

% Correction step
x = x + omega * dx;
y = f(x);
it_counter = it_counter + 1;
printf('%i %e %e\n',it_counter,x,y);
end

17.2.5.3 Pseudo C++ code of a Newton implementation with line search An example of a pseudo
C++ code demonstrating the algorithm from Section 13.10.2 was implemented in [137]; see https://media.
archnumsoft.org/10305/doc/step-fsi_8cc_source.html and https://media.archnumsoft.org/10305/
This piece of code has been the work horse of the author’s own research since the year 2011. In [73] (see
https://github.com/tjhei/cracks), this code was the basis for a parallel version.
newton_iteration ()
{
const double lower_bound_newton_residual = 1.0e-10;
const unsigned int max_no_newton_steps = 20;

// Decision whether the system matrix should be build at each Newton step
const double nonlinear_theta = 0.1;

// Line search parameters


unsigned int line_search_step;
const unsigned int max_no_line_search_steps = 10;
const double line_search_damping = 0.6;
double new_newton_residual;

// Application of nonhomogeneous Dirichlet boundary conditions to the variational equations:


set_nonhomo_Dirichlet_bc ();
// Evaluate the right hand side residual
assemble_system_rhs();

double newton_residual = system_rhs.linfty_norm();


double old_newton_residual = newton_residual;
unsigned int newton_step = 1;

if (newton_residual < lower_bound_newton_residual)


std::cout << '\t' << std::scientific << newton_residual << std::endl;

while (newton_residual > lower_bound_newton_residual && newton_step < max_no_newton_steps)


{
old_newton_residual = newton_residual;

assemble_system_rhs();
newton_residual = system_rhs.linfty_norm();

if (newton_residual < lower_bound_newton_residual)


{
std::cout << '\t'
<< std::scientific
<< newton_residual << std::endl;
break;
}

// Simplified Newton steps


if (newton_residual/old_newton_residual > nonlinear_theta)
assemble_system_matrix ();

// Solve linear equation system Ax = b


solve ();

line_search_step = 0;
for ( ; line_search_step < max_no_line_search_steps; ++line_search_step)
{
solution += newton_update;

assemble_system_rhs ();
new_newton_residual = system_rhs.linfty_norm();

if (new_newton_residual < newton_residual)


break;
else
solution -= newton_update;

newton_update *= line_search_damping;
}

// Output to the terminal for the user


std::cout << std::setprecision(5) << newton_step << '\t' << std::scientific << newton_residual << '\t'
<< std::scientific << newton_residual/old_newton_residual << '\t' ;
if (newton_residual/old_newton_residual > nonlinear_theta)
std::cout << "r" << '\t' ;
else
std::cout << " " << '\t' ;
std::cout << line_search_step << '\t' << std::endl;

// Goto next newton iteration, increment j->j+1


newton_step++;
}
}

17.2.6 Time-stepping schemes for ODEs in Octave


In this section, we present the code for Example 16.1.4. Although this is only an ODE example, it is also useful for PDEs, because time-dependent PDEs are very often solved in the temporal variable with finite difference schemes as presented here.
The package MATLAB is widely used for scientific computations. MATLAB is a trademark of The MathWorks Inc., 24 Prime Park Way, Natick, MA 01760; the website is www.mathworks.com. The open-source sister language with similar commands is Octave (in detail: GNU Octave) with the website www.octave.org. We use the latter in these notes. For both languages, an excellent introduction can be found in the textbook by Quarteroni et al. [98].
We first define the three numerical schemes. These are defined in terms of octave functions, which must be in the same directory from which you later run octave and call them. Thus, the content of the next
three functions is written into files with the names:
euler_forward_2.m euler_backward_model_problem.m trapezoidal_model_problem.m
Inside these m-files we have the following statements. Forward Euler:

function [x,y] = euler_forward_2(f,xinit,yinit,xfinal,n)


% Forward Euler scheme
% Author: Thomas Wick, CMAP, Ecole Polytechnique, 2016

h = (xfinal - xinit)/n;

x = [xinit zeros(1,n)];
y = [yinit zeros(1,n)];

for k = 1:n
x(k+1) = x(k)+h;
y(k+1) = y(k) + h*f(x(k),y(k));
end
end
Backward Euler (here the right hand side is also unknown). For simplicity we explicitly manipulate the scheme
for this specific ODE. In the general case, the right hand side is solved in terms of a nonlinear solver.

function [x,y] = euler_backward_model_problem(xinit,yinit,xfinal,n,a)
% Backward Euler scheme
% Author: Thomas Wick, CMAP, Ecole Polytechnique, 2016

% Calculate step size


h = (xfinal - xinit)/n;

% Initialize x and y as column vectors


x = [xinit zeros(1,n)];
y = [yinit zeros(1,n)];

% Implement method
for k = 1:n
x(k+1) = x(k)+h;
y(k+1) = 1./(1.0-h.*a) .*y(k);
end
end
And finally the trapezoidal rule (Crank-Nicolson):
function [x,y] = trapezoidal_model_problem(xinit,yinit,xfinal,n,a)
% Trapezoidal / Crank-Nicolson
% Author: Thomas Wick, CMAP, Ecole Polytechnique, 2016

% Calculate step size


h = (xfinal - xinit)/n;

% Initialize x and y as column vectors


x = [xinit zeros(1,n)];
y = [yinit zeros(1,n)];

% Implement method
for k = 1:n
x(k+1) = x(k)+h;
y(k+1) = 1./(1 - 0.5.*h.*a) .*(1 + 0.5.*h.*a).*y(k);
end
end
In order to have a re-usable code in which we can easily change further aspects, we do not work in the command
line but define an octave-script; here with the name
step_1.m
that contains initialization and function calls, visualization, and error estimation:
% Script file: step_1.m
% Author: Thomas Wick, CMAP, Ecole Polytechnique, 2016

% We solve three test scenarios


% Test 1: coeff_a = 0.25 as in the lecture notes.
% Test 2: coeff_a = -0.1
% Test 3: coeff_a = -10.0 to demonstrate
% numerical instabilities using explicit schemes

% Define mathematical model, i.e., ODE


% coeff_a = g - m
coeff_a = 0.25;

f=@(t,y) coeff_a.*y;

% Initialization
t0 = 2011;
tN = 2014;
num_steps = 16;
y0 = 2;

% Implement exact solution, which is


% available in this example
g=@(t) y0 .* exp(coeff_a .* (t-t0));
te=[2011:0.01:2014];
ye=[g(te)];

% Call forward and backward Euler methods


[t1,y1] = euler_forward_2(f,t0,y0,tN,num_steps);
[t2,y2] = euler_backward_model_problem(t0,y0,tN,num_steps,coeff_a);
[t3,y3] = trapezoidal_model_problem(t0,y0,tN,num_steps,coeff_a);

% Plot solution
plot (te,ye,t1,y1,t2,y2,t3,y3)
xlabel('t')
ylabel('y')
legend('Exact','FE','BE','CN')
axis([2011 2014 0 6])

% Estimate relative discretization error


errorFE=['FE err.: ' num2str((ye(end)-y1(end))/ye(end))]
errorBE=['BE err.: ' num2str((ye(end)-y2(end))/ye(end))]
errorCN=['CN err.: ' num2str((ye(end)-y3(end))/ye(end))]

% Estimate absolute discretization error


errorABS_FE=['FE err.: ' num2str((ye(end)-y1(end)))]
errorABS_BE=['BE err.: ' num2str((ye(end)-y2(end)))]
errorABS_CN=['CN err.: ' num2str((ye(end)-y3(end)))]
Exercise 14. Implement the backward Euler scheme and trapezoidal rule in a more general way that avoids
the explicit manipulation with the right hand side f (t, y) = ay(t). Hint: A possibility is (assuming that the
function f is smooth enough) to formulate the above schemes as fixed-point equations and then either use a
fixed-point method or Newton’s (see Section 17.2.6.1) method to solve them. Apply these ideas to Example 12.

17.2.6.1 Background: Solving initial value problems with implicit schemes In Section 12.3.7.2, we introduced implicit schemes to solve ODEs. In specific cases, one can rearrange the terms in such a way that the implicit scheme admits an explicit update formula. This was the procedure for the ODE model problem in Section 17.2.6. However, such an explicit representation is not always possible. Rather, we have to solve a nonlinear problem inside the time-stepping process. This nonlinear problem will be formulated in terms of a Newton scheme.
We recall:

    y′(t) = f(t, y(t)),

which reads after backward Euler discretization:

    (y_n − y_{n−1}) / k_n = f(t_n , y_n).

After re-arranging some terms we obtain the root-finding problem:

    g(y_n) := y_n − y_{n−1} − k_n f(t_n , y_n) = 0.

Taylor expansion yields (where the Newton iteration index is denoted by ‘l’):

    0 = g(y_n^l) + g′(y_n^l)(y_n^{l+1} − y_n^l) + O((y_n^{l+1} − y_n^l)^2),

where the Newton matrix (the Jacobian) is given by

    g′(y_n^l) := I − k_n f_y′(t_n , y_n^l).

The derivative f_y′ is taken with respect to the unknown y_n .


In defect correction notation, we have
Algorithm 17.1. Given a starting value, e.g., y_n^0 , we have for l = 1, 2, 3, . . .:

    g′(y_n^l) δy = −g(y_n^l),    (351)

    y_n^{l+1} = y_n^l + δy.    (352)

Stop if ‖g(y_n^{l+1})‖ < TOL_N , with e.g., TOL_N = 10^{−8} ; then set y_n := y_n^{l+1} .
The difficulty here is to understand the mechanism of the two different indices n and l. As before, we want to compute for a new time point t_{n+1} a solution y_{n+1}. However, in most cases y_{n+1} cannot be constructed explicitly as done for example in Section 4.1.2. Rather, we need to approximate y_{n+1} by another numerical scheme. Here, Newton’s method is one efficient possibility, in which we start with a rough approximation, the initial Newton guess y_{n+1}^0 ≈ y_{n+1}. In order to improve this initial guess, we start a Newton iteration labeled by l and hope to find better approximations

    y_{n+1}^1 , y_{n+1}^2 , . . .

for the sought value y_{n+1}. Mathematically speaking, we hope that

    |y_{n+1}^l − y_{n+1}| → 0  for l → ∞.

To make this procedure clear:

• we are interested in the true value y(t_{n+1}), which however in most cases cannot be computed in an explicit way;

• thus, we use a finite difference scheme (e.g., backward Euler) to approximate y(t_{n+1}) by y_{n+1};

• however, in many cases this finite difference scheme (precisely speaking: an implicit scheme) is still not enough to obtain y_{n+1} directly. Therefore, we need a second scheme (here: Newton) to approximate y_{n+1} by y_{n+1}^l .
Remark 17.2. Avoiding a second numerical scheme is one reason why explicit schemes are nevertheless very popular in practice: there, the inner Newton scheme can be omitted.
Remark 17.3. On the other hand, this example is characteristic of many numerical algorithms: very often, we have several algorithms that interact with each other. One important question is how to order these algorithms such that the overall numerical scheme is robust and efficient.
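A minimal python sketch of our own illustrating this interplay of the two indices n and l for the scalar case (where the Jacobian reduces to g′(y) = 1 − k f_y(t, y)) reads:

def backward_euler_newton(f, fy, y0, t0, T, N, tol=1.0e-8):
    # Backward Euler: each step solves g(y) = y - y_old - k*f(t,y) = 0 by Newton
    k = (T - t0) / N
    t, y = t0, y0
    for n in range(N):
        t += k
        y_old = y
        y_new = y_old                     # initial Newton guess y_n^0
        for l in range(20):
            g = y_new - y_old - k * f(t, y_new)
            if abs(g) < tol:
                break
            dg = 1.0 - k * fy(t, y_new)   # Jacobian g'(y_n^l)
            y_new += -g / dg              # defect correction step
        y = y_new
    return y

# Example: y' = a*y with a = -2, y(5) = 7, T = 11 (cf. Exercise 12)
a = -2.0
print(backward_euler_newton(lambda t, y: a * y, lambda t, y: a, 7.0, 5.0, 11.0, 100))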

17.2.7 deal.II (C++) PU-DWR implementation used in Section 9


In this section, we show the key code snippet for the PU-DWR implementation. The full code can be found
here https://thomaswick.org/Codes/step_nse_Nov_9_2021.cc

////////////////////////////////////////////////////////////////////////////////////////////
// PU-DWR Error estimator

// Implementing a weak form of DWR localization using a partition-of-unity

// as it has been proposed in
//
// T. Richter, T. Wick;
// Variational Localizations of the Dual-Weighted Residual Estimator,
// Journal of Computational and Applied Mathematics,
// Vol. 279 (2015), pp. 192-208
//
// Some routines have been taken from step-14 in deal.II
template<int dim>
double NSE_PU_DWR_Problem<dim>::compute_error_indicators_a_la_PU_DWR (const unsigned int refinement_cycle)
{

// First, we re-initialize the error indicator vector that


// has the length of the space dimension of the PU-FE.
// Therein we store the local errors at all degrees of freedom.
// This is in contrast to usual procedures, where the error
// in general is stored cell-wise.
dof_handler_pou.distribute_dofs (fe_pou);
error_indicators.reinit (dof_handler_pou.n_dofs());

// Block 1 (building the dual weights):


// In the following the very specific
// part (z-I_hz) of DWR is implemented.
// This part is the same for classical error estimation
// and PU error estimation.
std::vector<unsigned int> block_component (3,0);
block_component[dim] = 1;

DoFRenumbering::component_wise (dof_handler_adjoint, block_component);

// Implement the interpolation operator


// (z-z_h)=(z-I_hz)
ConstraintMatrix dual_hanging_node_constraints;
DoFTools::make_hanging_node_constraints (dof_handler_adjoint,
dual_hanging_node_constraints);
dual_hanging_node_constraints.close();

ConstraintMatrix primal_hanging_node_constraints;
DoFTools::make_hanging_node_constraints (dof_handler_primal,
primal_hanging_node_constraints);
primal_hanging_node_constraints.close();

// TODO double-check (3), 0 , 1, 2


// Construct a local primal solution that
// has the length of the adjoint vector
std::vector<unsigned int> dofs_per_block (2);
DoFTools::count_dofs_per_block (dof_handler_adjoint,
dofs_per_block, block_component);
const unsigned int n_v = dofs_per_block[0];
const unsigned int n_p = dofs_per_block[1];


BlockVector<double> solution_primal_of_adjoint_length;
solution_primal_of_adjoint_length.reinit(2);
solution_primal_of_adjoint_length.block(0).reinit(n_v);
solution_primal_of_adjoint_length.block(1).reinit(n_p);
solution_primal_of_adjoint_length.collect_sizes ();

// Main function 1: Interpolate cell-wise the


// primal solution into the dual FE space.
// This rescaled primal solution is called
// ** solution_primal_of_adjoint_length **
FETools::interpolate (dof_handler_primal,
solution_primal,
dof_handler_adjoint,
dual_hanging_node_constraints,
solution_primal_of_adjoint_length);

// Local vectors of dual weights obtained


// from the adjoint solution
BlockVector<double> dual_weights;
dual_weights.reinit(2);
dual_weights.block(0).reinit(n_v);
dual_weights.block(1).reinit(n_p);
dual_weights.collect_sizes ();

// Main function 2: Execute (z-I_hz) (in the dual space),


// yielding the adjoint weights for error estimation.
FETools::interpolation_difference (dof_handler_adjoint,
dual_hanging_node_constraints,
solution_adjoint,
dof_handler_primal,
primal_hanging_node_constraints,
dual_weights);

// end Block 1

// Block 2 (evaluating the PU-DWR):


// The following function has a loop inside that
// goes over all cells to collect the error contributions,
// and is the ‘heart’ of the DWR method. Therein
// the specific equation of the error estimator is implemented.

// Info: must be sufficiently high for adjoint evaluations


QGauss<dim> quadrature_formula(5);

FEValues<dim> fe_values_pou (fe_pou, quadrature_formula,


update_values |
update_quadrature_points |
update_JxW_values |
update_gradients);


FEValues<dim> fe_values_adjoint (fe_adjoint, quadrature_formula,


update_values |
update_quadrature_points |
update_JxW_values |
update_gradients);

const unsigned int dofs_per_cell = fe_values_pou.dofs_per_cell;


const unsigned int n_q_points = fe_values_pou.n_quadrature_points;

Vector<double> local_err_ind (dofs_per_cell);

std::vector<unsigned int> local_dof_indices (dofs_per_cell);

const FEValuesExtractors::Vector velocities (0); // 0


const FEValuesExtractors::Scalar pressure (dim); // 2

const FEValuesExtractors::Scalar pou_extract (0);

std::vector<Vector<double> > primal_cell_values (n_q_points,


Vector<double>(dim+1));

std::vector<std::vector<Tensor<1,dim> > > primal_cell_gradients (n_q_points,


std::vector<Tensor<1,dim> > (dim+1));

std::vector<Vector<double> > dual_weights_values(n_q_points,


Vector<double>(dim+1));

std::vector<std::vector<Tensor<1,dim> > > dual_weights_gradients (n_q_points,


std::vector<Tensor<1,dim> > (dim+1));

typename DoFHandler<dim>::active_cell_iterator
cell = dof_handler_pou.begin_active(),
endc = dof_handler_pou.end();

typename DoFHandler<dim>::active_cell_iterator
cell_adjoint = dof_handler_adjoint.begin_active();

for ( ; cell!=endc; ++cell, ++cell_adjoint)


{
fe_values_pou.reinit (cell);
fe_values_adjoint.reinit (cell_adjoint);

local_err_ind = 0;

// primal solution (cell residuals)


// But we use the adjoint FE since we previously enlarged the
// primal solution to the length of the adjoint vector.

fe_values_adjoint.get_function_values (solution_primal_of_adjoint_length,
primal_cell_values);

fe_values_adjoint.get_function_gradients (solution_primal_of_adjoint_length,
primal_cell_gradients);

// adjoint weights
fe_values_adjoint.get_function_values (dual_weights,
dual_weights_values);

fe_values_adjoint.get_function_gradients (dual_weights,
dual_weights_gradients);

// Gather local error indicators while running


// of the degrees of freedom of the partition of unity
// and corresponding quadrature points.
for (unsigned int q=0; q<n_q_points; ++q)
{
// Right hand side
Tensor<1,dim> fluid_force;
fluid_force.clear();
fluid_force[0] = density_fluid * force_fluid_x;
fluid_force[1] = density_fluid * force_fluid_y;

// Primal cell values


Tensor<1,2> v;
v.clear();
v[0] = primal_cell_values[q](0);
v[1] = primal_cell_values[q](1);

Tensor<2,dim> grad_v;
grad_v[0][0] = primal_cell_gradients[q][0][0];
grad_v[0][1] = primal_cell_gradients[q][0][1];
grad_v[1][0] = primal_cell_gradients[q][1][0];
grad_v[1][1] = primal_cell_gradients[q][1][1];

Tensor<2,dim> pI;
pI[0][0] = primal_cell_values[q](2);
pI[1][1] = primal_cell_values[q](2);

// Adjoint weights
Tensor<1,dim> dw_v;
dw_v[0] = dual_weights_values[q](0);
dw_v[1] = dual_weights_values[q](1);

double dw_p = dual_weights_values[q](2);

Tensor<2,dim> grad_dw_v;
grad_dw_v[0][0] = dual_weights_gradients[q][0][0];
grad_dw_v[0][1] = dual_weights_gradients[q][0][1];
grad_dw_v[1][0] = dual_weights_gradients[q][1][0];

grad_dw_v[1][1] = dual_weights_gradients[q][1][1];

Tensor<2,dim> stress_term = - pI + density_fluid * viscosity * (grad_v + transpose(grad_v));

double divergence_v = grad_v[0][0] + grad_v[1][1];

Tensor<1,2> convection_fluid = density_fluid * grad_v * v;

// Run over all PU degrees of freedom per cell (namely 4 DoFs for Q1 FE-PU)
for (unsigned int i=0; i<dofs_per_cell; ++i)
{

Tensor<2,dim> grad_phi_psi;
// Product rule: grad(psi * phi_PU) = phi_PU * grad(psi) + psi (x) grad(phi_PU).
// (These four lines were truncated in the printed listing; cf. the full code linked above.)
grad_phi_psi[0][0] = fe_values_pou[pou_extract].value(i,q) * grad_dw_v[0][0] + dw_v[0] * fe_values_pou[pou_extract].gradient(i,q)[0];
grad_phi_psi[0][1] = fe_values_pou[pou_extract].value(i,q) * grad_dw_v[0][1] + dw_v[0] * fe_values_pou[pou_extract].gradient(i,q)[1];
grad_phi_psi[1][0] = fe_values_pou[pou_extract].value(i,q) * grad_dw_v[1][0] + dw_v[1] * fe_values_pou[pou_extract].gradient(i,q)[0];
grad_phi_psi[1][1] = fe_values_pou[pou_extract].value(i,q) * grad_dw_v[1][1] + dw_v[1] * fe_values_pou[pou_extract].gradient(i,q)[1];

// Implement the error estimator


// J(u) - J(u_h) \approx \eta := (f,...) - (\nabla u, ...)
//
// First part: (f,...)
local_err_ind(i) += (fluid_force * dw_v * fe_values_pou[pou_extract].value(i,q)
) * fe_values_pou.JxW (q);

// Second part: - (\nabla u, ...)


local_err_ind(i) -= (scalar_product(stress_term, grad_phi_psi)
+ convection_fluid * dw_v * fe_values_pou[pou_extract].value(i,q)
+ divergence_v * dw_p * fe_values_pou[pou_extract].value(i,q)
) * fe_values_pou.JxW (q);

} // end i loop over PU DoFs
} // end q_points

// Write all error contributions


// in their respective places in the global error vector.
cell->get_dof_indices (local_dof_indices);
for (unsigned int i=0; i<dofs_per_cell; ++i)
error_indicators(local_dof_indices[i]) += local_err_ind(i);

} // end cell loop for PU FE elements

// Finally, we eliminate and distribute hanging nodes in the error estimator


ConstraintMatrix dual_hanging_node_constraints_pou;
DoFTools::make_hanging_node_constraints (dof_handler_pou,

dual_hanging_node_constraints_pou);
dual_hanging_node_constraints_pou.close();

// Distributing the hanging nodes


dual_hanging_node_constraints_pou.condense(error_indicators);

// Averaging (making the ’solution’ continuous)


dual_hanging_node_constraints_pou.distribute(error_indicators);

// end Block 2

// Block 3 (data and terminal print out)


DataOut<dim> data_out;
data_out.attach_dof_handler (dof_handler_pou);
data_out.add_data_vector (error_indicators, "error_ind");
data_out.build_patches ();

std::ostringstream filename;
filename << "solution_error_indicators_"
<< refinement_cycle
<< ".vtk"
<< std::ends;

std::ofstream out (filename.str().c_str());


data_out.write_vtk (out);

// Print out on terminal


std::cout << "------------------" << std::endl;
std::cout << std::setiosflags(std::ios::scientific) << std::setprecision(2);
std::cout << " Dofs: " << dof_handler_primal.n_dofs()<< std::endl;
std::cout << " Exact error: " << exact_error_local << std::endl;
double total_estimated_error = 0.0;
for (unsigned int k=0;k<error_indicators.size();k++)
total_estimated_error += error_indicators(k);

// Take the absolute of the estimated error.


// However, we might check if the signs
// of the exact error and the estimated error are the same.
total_estimated_error = std::abs(total_estimated_error);

std::cout << " Estimated error (prim): " << total_estimated_error << std::endl;

// From the JCAM paper: compute indicator indices to check


// effectivity of error estimator.
double total_estimated_error_absolute_values = 0.0;
for (unsigned int k=0;k<error_indicators.size();k++)
total_estimated_error_absolute_values += std::abs(error_indicators(k));

// "ind" things were mainly for paper with Thomas Richter (Richter/Wick; JCAM, 2015)
// std::cout << " Estimated error (ind): " << total_estimated_error_absolute_values << std::end

std::cout << " Ieff: " << total_estimated_error/exact_error_local << std::endl;

//std::cout << " Iind: " << total_estimated_error_absolute_values/exact_error_l

// Write everything into a file


//file.precision(3);
file << std::setiosflags(std::ios::scientific) << std::setprecision(2);
file << dof_handler_primal.n_dofs() << "\t";
file << exact_error_local << "\t";
file << total_estimated_error << "\t";
file << total_estimated_error_absolute_values << "\t";
file << total_estimated_error/exact_error_local << "\t";
file << total_estimated_error_absolute_values/exact_error_local << "\n";
file.flush();

// Write everything into a file gnuplot


file_gnuplot << std::setiosflags(std::ios::scientific) << std::setprecision(2);
//file_gnuplot.precision(3);
file_gnuplot << dof_handler_primal.n_dofs() << "\t";
file_gnuplot << exact_error_local << "\t";
file_gnuplot << total_estimated_error << "\t";
file_gnuplot << total_estimated_error_absolute_values << "\n";
file_gnuplot.flush();

// end Block 3

// Block 4
return total_estimated_error;
// end Block 4
}

// Refinement strategy and carrying out the actual refinement.


template<int dim>
double NSE_PU_DWR_Problem<dim>::refine_average_with_PU_DWR (const unsigned int refinement_cycle)
{
// Step 1:
// Obtain error indicators from PU DWR estimator
double estimated_DWR_error = compute_error_indicators_a_la_PU_DWR (refinement_cycle);

// Step 2: Choosing refinement strategy


// Here: averaged refinement
// Alternatives are in deal.II:
// refine_and_coarsen_fixed_fraction for example
for (Vector<float>::iterator i=error_indicators.begin();
i != error_indicators.end(); ++i)
*i = std::fabs (*i);

const unsigned int dofs_per_cell_pou = fe_pou.dofs_per_cell;
std::vector< unsigned int > local_dof_indices(dofs_per_cell_pou);

// Refining all cells that have values above the mean value
double error_indicator_mean_value = error_indicators.mean_value();

// Pou cell later


typename DoFHandler<dim>::active_cell_iterator
cell = dof_handler_pou.begin_active(),
endc = dof_handler_pou.end();

double error_ind = 0.0;


//1.1; for drag and lift and none mesh smoothing
//5.0; for pressure difference and maximal mesh smoothing
double alpha = 1.1;

for (; cell!=endc; ++cell)


{
error_ind = 0.0;
cell->get_dof_indices(local_dof_indices);

for (unsigned int i=0; i<dofs_per_cell_pou; ++i)


{
error_ind += error_indicators(local_dof_indices[i]);
}

// For uniform (global) mesh refinement,


// just comment the following line
if (error_ind > alpha * error_indicator_mean_value)
cell->set_refine_flag();
}

triangulation.execute_coarsening_and_refinement ();

return estimated_DWR_error;
}

17.2.8 deal.II (C++) heat equation used in Section 12.3.12


In this section, we provide the programming code for Section 12.3.12. In general, copying an entire code into
a written document and printing it is of limited value. Sometimes, however, it is useful, specifically for
carefully explaining the basic steps, which is the case here. For this reason, the code is on the one hand kept
as short as possible, with only the necessary documentation, while remaining (hopefully) self-contained and
self-explanatory.
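To fix notation for reading the listing: with the mass matrix M, the stiffness (diffusion) matrix A, the time
step size k, and the parameter θ, the assembly routine below realizes one step of the one-step θ scheme for
the heat equation (with right hand side f = 0, as chosen in the listing):

(M + kθA) U^n = (M − k(1 − θ)A) U^(n−1),    n = 1, 2, . . . ,

where θ = 1 yields the backward Euler scheme (BE), θ = 0.5 the Crank-Nicolson scheme (CN), and θ = 0 the
forward Euler scheme (FE).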

/*
* Author: Thomas Wick (2014)
* ICES, UT Austin, USA
*
* Features of this program:
* -------------------------
* Computes the nonstationary heat equation in 2d and 3d

* A) Constant and/or nonconstant diffusion coefficient
* B) Constant and/or nonconstant right hand side f
* C) Homogeneous or nonhomogeneous Dirichlet conditions
* D) Projection of initial data
* E) Declaration of the 2d and 3d classes
* F) Possible extension: implement different timestepping schemes
* with a suitable choice of theta
*
* Leibniz University Hannover
* November 04, 2021
* Running under deal.II version dealii-9.1.1
*
* This program is originally based on the deal.II steps 4 and 5
* in version 8.1.0
* for implementing the Poisson problem
* and we refer for Author and License information to those tutorial steps.
* The time step loop implementation is a straightforward
* extension; here it is based on the ideas presented in
* T. Wick; ANS, Vol. 1, No. 1, pp. 1-19.
*
* License of this program: GNU Lesser General
* Public License as published by the Free Software Foundation
*
*
*/

////////////////////////////////////////////////
// For explanations of the includes, we refer
// to the deal.II documentation
#include <deal.II/grid/tria.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/grid/tria_accessor.h>
#include <deal.II/grid/tria_iterator.h>
#include <deal.II/dofs/dof_accessor.h>
#include <deal.II/fe/fe_q.h>
#include <deal.II/dofs/dof_tools.h>
#include <deal.II/fe/fe_values.h>
#include <deal.II/base/quadrature_lib.h>
#include <deal.II/base/function.h>
#include <deal.II/numerics/vector_tools.h>
#include <deal.II/numerics/matrix_tools.h>
#include <deal.II/lac/vector.h>
#include <deal.II/lac/full_matrix.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/dynamic_sparsity_pattern.h>
#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/precondition.h>
#include <deal.II/grid/grid_refinement.h>
#include <deal.II/numerics/error_estimator.h>

#include <deal.II/numerics/data_out.h>
#include <fstream>
#include <iostream>

#include <deal.II/base/logstream.h>

using namespace dealii;

//////////////////////////////////////////////////////
// Main class
template <int dim>
class Step_Heat
{
public:
Step_Heat ();
void run ();

private:
void set_runtime_parameters ();
void make_grid ();
void setup_system();
void assemble_system ();
void solve ();
void output_results (const unsigned int cycle) const;

void compute_functionals();

Triangulation<dim> triangulation;
FE_Q<dim> fe;
DoFHandler<dim> dof_handler;

SparsityPattern sparsity_pattern;
SparseMatrix<double> system_matrix;

Vector<double> solution, old_timestep_solution;
Vector<double> system_rhs;

double timestep, theta, time;
std::string time_stepping_scheme;
unsigned int max_no_timesteps, timestep_number;

};

////////////////////////////////////////////////////////
// 1. Initial values
// 2. Boundary values
// 3. Right hand side f
// 4. Function for coefficients

////////////////////////////////////////////////////////
// Initial values

template <int dim>
class InitialValues : public Function<dim>
{
public:
InitialValues () : Function<dim>(1) {}


virtual double value (const Point<dim> &p,
                      const unsigned int component = 0) const;

virtual void vector_value (const Point<dim> &p,
                           Vector<double> &values) const;

};

template <int dim>
double
InitialValues<dim>::value (const Point<dim> &p,
const unsigned int component) const
{
// Choice 1
//return 1.0;

// Choice 2 (2d only; TODO: generalize for 3d)
return (std::sin(p[0]) * std::sin(p[1]));
}

template <int dim>
void
InitialValues<dim>::vector_value (const Point<dim> &p,
Vector<double> &values) const
{
for (unsigned int comp=0; comp<this->n_components; ++comp)
values (comp) = InitialValues<dim>::value (p, comp);
}

///////////////////////////////////////////////////////////
// Boundary values
// Info: only used for nonhomogeneous Dirichlet conditions
// (see the commented alternative in the assembly)
template <int dim>
class BoundaryValues : public Function<dim>
{
public:
BoundaryValues () : Function<dim>() {}

virtual double value (const Point<dim> &p,
                      const unsigned int component = 0) const;
};

template <int dim>
double BoundaryValues<dim>::value (const Point<dim> &p,
const unsigned int /*component*/) const
{
return p.square();
}


///////////////////////////////////////////////////////////
// Right hand side
// Info: not fully implemented, but rather directly in the assembly
template <int dim>
class RightHandSide : public Function<dim>
{
public:
RightHandSide () : Function<dim>() {}

virtual double value (const Point<dim> &p,
                      const unsigned int component = 0) const;
};

template <int dim>
double RightHandSide<dim>::value (const Point<dim> &p,
const unsigned int /*component*/) const
{
double return_value = 0;
for (unsigned int i=0; i<dim; ++i)
return_value += 4*std::pow(p(i), 4);

return return_value;
}

///////////////////////////////////////////////////////////
// Non-constant coefficient values for Laplacian
template <int dim>
class Coefficient : public Function<dim>
{
public:
Coefficient () : Function<dim>() {}

virtual double value (const Point<dim> &p,
                      const unsigned int component = 0) const;

virtual void value_list (const std::vector<Point<dim> > &points,
                         std::vector<double> &values,
                         const unsigned int component = 0) const;
};

template <int dim>
double Coefficient<dim>::value (const Point<dim> &p,
const unsigned int /*component*/) const
{
if (p[1] >= 0.0)
return 20;
else

return 1;
}

template <int dim>
void Coefficient<dim>::value_list (const std::vector<Point<dim> > &points,
std::vector<double> &values,
const unsigned int component) const
{
Assert (values.size() == points.size(),
ExcDimensionMismatch (values.size(), points.size()));

Assert (component == 0,
ExcIndexRange (component, 0, 1));

const unsigned int n_points = points.size();

for (unsigned int i=0; i<n_points; ++i)
{
if (points[i][1] >= 0.0)
values[i] = 20;
else
values[i] = 1;
}
}

///////////////////////////////////////////////////////////
// Constructor
template <int dim>
Step_Heat<dim>::Step_Heat ()
:
fe (1),
dof_handler (triangulation)
{}

///////////////////////////////////////////////////////////
// Defining several model parameters
template <int dim>
void Step_Heat<dim>::set_runtime_parameters()
{
// Time-stepping schemes: BE, FE, CN.
// F) The label is not yet set automatically; choose the string
// consistently with theta below.
time_stepping_scheme = "CN";

// theta = 1.0: BE (backward Euler)
// theta = 0.0: FE (forward Euler)
// theta = 0.5: CN (Crank-Nicolson)
theta = 0.5;

// Timestep size:

timestep = 0.125;

// Maximum number of timesteps:
max_no_timesteps = 200;

// A variable to count the number of time steps
timestep_number = 0;

// Counts the total time
time = 0;

// Goal functionals
// BE:
// L2 norm: 2.500000e+01 6.644876e-02
// L2 norm: 2.500000e+01 1.236463e-01
// L2 norm: 2.500000e+01 2.485528e-01

// CN:
// L2 norm: 2.500000e+01 2.352296e-05
// L2 norm: 2.500000e+01 4.243702e-02
// L2 norm: 2.500000e+01 1.660507e-01

// Ref 6, CN
//L2 norm: 2.500000e+01 5.975624e-06
//L2 norm: 2.500000e+01 4.251649e-02
//L2 norm: 2.500000e+01 1.662468e-01
//L2 norm: 2.500000e+01 4.058831e-01
}

///////////////////////////////////////////////////////////
// Initialize the mesh
template <int dim>
void Step_Heat<dim>::make_grid ()
{
GridGenerator::hyper_cube (triangulation, 0, M_PI);

// Alternatively, a *.inp grid can be read from a file
// (commented out in the following):
/*
std::string grid_name;
grid_name = "unit_square_1.inp";

GridIn<dim> grid_in;
grid_in.attach_triangulation (triangulation);
std::ifstream input_file(grid_name.c_str());
Assert (dim==2, ExcInternalError());
grid_in.read_ucd (input_file);
*/

triangulation.refine_global (6);

std::cout << " Number of active cells: "

407
17. SOFTWARE AND PROGRAMMING CODES
<< triangulation.n_active_cells()
<< std::endl
<< " Total number of cells: "
<< triangulation.n_cells()
<< std::endl;
}

///////////////////////////////////////////////////////////
// Setting up sparsity pattern for system matrix, rhs, solution vector
template <int dim>
void Step_Heat<dim>::setup_system ()
{
dof_handler.distribute_dofs (fe);

std::cout << " Number of degrees of freedom: "


<< dof_handler.n_dofs()
<< std::endl;

//CompressedSparsityPattern c_sparsity(dof_handler.n_dofs());
DynamicSparsityPattern c_sparsity(dof_handler.n_dofs());
DoFTools::make_sparsity_pattern (dof_handler, c_sparsity);
sparsity_pattern.copy_from(c_sparsity);

system_matrix.reinit (sparsity_pattern);

solution.reinit (dof_handler.n_dofs());
old_timestep_solution.reinit (dof_handler.n_dofs());
system_rhs.reinit (dof_handler.n_dofs());
}

///////////////////////////////////////////////////////////
// This is the heart in which the
// discretized equations are locally (per element) assembled.
// Therein, local integrals are evaluated with Gauss quadrature.
template <int dim>
void Step_Heat<dim>::assemble_system ()
{
QGauss<dim> quadrature_formula(2);

FEValues<dim> fe_values (fe, quadrature_formula,
                         update_values | update_gradients |
                         update_quadrature_points | update_JxW_values);

const unsigned int dofs_per_cell = fe.dofs_per_cell;
const unsigned int n_q_points = quadrature_formula.size();

FullMatrix<double> cell_matrix (dofs_per_cell, dofs_per_cell);
Vector<double> cell_rhs (dofs_per_cell);

std::vector<types::global_dof_index> local_dof_indices (dofs_per_cell);

// For previous time step solutions
std::vector<double> old_timestep_solution_values (n_q_points);


std::vector<std::vector<Tensor<1,dim> > >
  old_timestep_solution_grads (n_q_points, std::vector<Tensor<1,dim> > (1));

// Right hand side
const RightHandSide<dim> right_hand_side;

// Class for nonconstant coefficients
const Coefficient<dim> coefficient;
std::vector<double> coefficient_values (n_q_points);

double density = 1.0;

// Next, we again have to loop over all elements and assemble local
// contributions. Note, that an element is a quadrilateral in two space
// dimensions, but a hexahedron in 3D.
typename DoFHandler<dim>::active_cell_iterator
cell = dof_handler.begin_active(),
endc = dof_handler.end();

for (; cell!=endc; ++cell)
{
fe_values.reinit (cell);
cell_matrix = 0;
cell_rhs = 0;

// Old_timestep_solution values are interpolated on the present cell
fe_values.get_function_values (old_timestep_solution, old_timestep_solution_values);
fe_values.get_function_gradients (old_timestep_solution, old_timestep_solution_grads);

// Update of nonconstant coefficients
coefficient.value_list (fe_values.get_quadrature_points(),
                        coefficient_values);

// Loop over all quadrature points (evaluating integrals numerically)
for (unsigned int q_point=0; q_point<n_q_points; ++q_point)
{
// Get FE-solution values for previous time step at present quadrature point
const double old_timestep_u = old_timestep_solution_values[q_point];

Tensor<1,dim> old_timestep_grad_u;
old_timestep_grad_u[0] = old_timestep_solution_grads[q_point][0][0];
old_timestep_grad_u[1] = old_timestep_solution_grads[q_point][0][1];

for (unsigned int i=0; i<dofs_per_cell; ++i)
{
for (unsigned int j=0; j<dofs_per_cell; ++j)
{
// Mass term from the time derivative
// Change indices such that (j,i) because test
// function determines the row in system matrix A
// See lecture notes in Chapter 6

cell_matrix(j,i) += density * (fe_values.shape_value (i, q_point) *
fe_values.shape_value (j, q_point) *
fe_values.JxW (q_point));

cell_matrix(j,i) += timestep * theta *
  (// A) Constant diffusion a=1:
1.0 *
// Nonconstant diffusion coefficient
//coefficient_values[q_point] *
fe_values.shape_grad (i, q_point) *
fe_values.shape_grad (j, q_point) *
fe_values.JxW (q_point));
} // end j (the right hand side only depends on the test function i)

cell_rhs(i) += (old_timestep_u * fe_values.shape_value (i, q_point)
                + fe_values.shape_value (i, q_point) *
                // B) Constant right hand side f=0:
                0.0
                // Nonconstant right hand side:
                //right_hand_side.value (fe_values.quadrature_point (q_point))
               ) * fe_values.JxW (q_point);

// Old timestep Laplace operator
cell_rhs(i) -= timestep * (1.0 - theta) * 1.0 *
(old_timestep_grad_u * fe_values.shape_grad (i, q_point)
) * fe_values.JxW (q_point);

} // end i
} // end q_point

// Putting local element contributions into the global system matrix A
// and the global right hand side vector b
cell->get_dof_indices (local_dof_indices);
for (unsigned int i=0; i<dofs_per_cell; ++i)
{
for (unsigned int j=0; j<dofs_per_cell; ++j)
system_matrix.add (local_dof_indices[i],
local_dof_indices[j],
cell_matrix(i,j));

system_rhs(local_dof_indices[i]) += cell_rhs(i);
}
}

// Apply Dirichlet boundary conditions
std::map<types::global_dof_index,double> boundary_values;
VectorTools::interpolate_boundary_values (dof_handler,
0,

// C) Homogeneous Dirichlet conditions:
ZeroFunction<dim>(),
// Non-homogeneous Dirichlet conditions:
//BoundaryValues<dim>(),
boundary_values);
MatrixTools::apply_boundary_values (boundary_values,
system_matrix,
solution,
system_rhs);
}

////////////////////////////////////////////////////////
// Linear solution using the conjugate gradient
// method without preconditioning.
// See deal.II tutorials for extensions with preconditioners
// See also my lecture notes for extensions towards geometric multigrid
template <int dim>
void Step_Heat<dim>::solve ()
{
SolverControl solver_control (1000, 1e-12);
SolverCG<> solver (solver_control);
solver.solve (system_matrix, solution, system_rhs,
PreconditionIdentity());

std::cout << " " << solver_control.last_step()


<< " CG iterations needed to obtain convergence."
<< std::endl;
}

/////////////////////////////////////////////////////////
// Goal functional evaluations
// -> correctness of solution
// -> possible interest in terms of applications
template <int dim>
void Step_Heat<dim>::compute_functionals ()
{
// Compute global L2 norm
Vector<float> difference_per_cell(triangulation.n_active_cells());
VectorTools::integrate_difference(dof_handler,
solution,
//Solution<dim>(),
ZeroFunction<dim>(),
difference_per_cell,
QGauss<dim>(4),
VectorTools::L2_norm);
const double L2_error =
VectorTools::compute_global_error(triangulation,
difference_per_cell,
VectorTools::L2_norm);
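// Note: since the difference is taken with respect to ZeroFunction, the
// quantity computed above is the L2 norm of the discrete solution itself,
// i.e., ||u_h||_{L2}, and not a discretization error.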

// Compute a point value (the point is chosen for the 2d case)
Point<dim> p = Point<dim>(M_PI/2.0, M_PI/2.0);

unsigned int component = 0;
Vector<double> tmp_vector(1);
VectorTools::point_value (dof_handler,
solution,
p,
tmp_vector);

// Output
std::cout << "L2 norm: " << time << " " << std::scientific << L2_error << std::endl;
std::cout << "Point value: " << time << " " << p << " " << tmp_vector(component) << std::endl;
std::cout << std::endl;
}

/////////////////////////////////////////////////////////
// Graphical output: here *.vtk
template <int dim>
void Step_Heat<dim>::output_results (const unsigned int cycle) const
{
DataOut<dim> data_out;

data_out.attach_dof_handler (dof_handler);
data_out.add_data_vector (solution, "solution");

data_out.build_patches ();

std::string filename_basis = "solution_";


std::ostringstream filename;
std::cout << "------------------" << std::endl;
std::cout << "Write solution" << std::endl;
std::cout << "------------------" << std::endl;
std::cout << std::endl;

if (dim == 2)
{
filename << filename_basis
<< "2d_"
<< Utilities::int_to_string (cycle, 2)
<< ".vtk";
}
else
{
filename << filename_basis
<< "3d_"
<< Utilities::int_to_string (cycle, 2)
<< ".vtk";
}

std::ofstream output (filename.str().c_str());
data_out.write_vtk (output);
}

/////////////////////////////////////////////////////////
// Final method to run the program and designing the
// time step loop
template <int dim>
void Step_Heat<dim>::run ()
{
std::cout << "Solving problem in " << dim << " space dimensions." << std::endl;

set_runtime_parameters();
make_grid();
setup_system ();

// D) Project (possibly nonhomogeneous) initial values
{
ConstraintMatrix constraints;
constraints.close();

VectorTools::project (dof_handler,
constraints,
QGauss<dim>(3),
InitialValues<dim>(),
solution
);

output_results (0);
}

// Timestep loop
do
{
std::cout << "Timestep " << timestep_number
<< " (" << time_stepping_scheme
<< ")" << ": " << time
<< " (" << timestep << ")"
<< "\n=============================="
<< "====================================="
<< std::endl;

std::cout << std::endl;

// Solve for next time step
old_timestep_solution = solution;
assemble_system ();
solve ();

// Write solutions
output_results (timestep_number+1);

// Compute goal functionals
compute_functionals();

time += timestep;
++timestep_number;

}
while (timestep_number <= max_no_timesteps);
}

/////////////////////////////////////////////////////////
// Final method to start the program
int main ()
{
deallog.depth_console (0);

// E) Computes first the 2d and then the 3d solution
// (the 3d run is commented out below)
{
Step_Heat<2> laplace_problem_2d;
laplace_problem_2d.run ();
}

// {
// Step_Heat<3> laplace_problem_3d;
// laplace_problem_3d.run ();
// }

return 0;
}

17.2.9 deal.II (C++) Navier-Stokes illustrating nonlinear problems, Chapter 13


In this section, we show how a coupled nonlinear problem is assembled in a monolithic fashion in deal.II. The
numerical solution is based on Newton’s method, Section 13.12, and the Navier-Stokes FEM discretization from
Section 13.19. The following code snippets are taken from https://thomaswick.org/Codes/step_nse_Nov_9_2021.cc
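To put the two snippets into context, recall from Section 13.12 the defect-correction structure of Newton’s
method: in every step, the linearized problem is assembled and solved for the update, and the iterate is
corrected until the residual (defect) is sufficiently small. The first snippet below assembles the Jacobian (the
system matrix), the second one the residual (the right hand side). As a minimal, self-contained illustration of
this loop structure (not part of the quoted program; the equation f(x) = x^3 + x - 1 = 0 and all names are
chosen only for this sketch), consider Newton’s method for a scalar equation:

#include <cmath>
#include <iostream>

// Illustration only: solve f(x) = x^3 + x - 1 = 0 by Newton's method.
// The loop mirrors the structure of the PDE code: evaluate the residual,
// evaluate the Jacobian, solve the linearized problem, update the iterate,
// and stop once the defect is small enough.
int main()
{
  auto f  = [](double x) { return x * x * x + x - 1.0; }; // residual
  auto df = [](double x) { return 3.0 * x * x + 1.0; };   // Jacobian

  double x = 1.0;                  // initial guess
  const double TOL = 1.0e-12;      // tolerance for the defect
  const unsigned int max_steps = 30;

  for (unsigned int step = 0; step < max_steps; ++step)
  {
    const double residual = f(x);
    std::cout << "Step " << step << ": x = " << x
              << "  |f(x)| = " << std::abs(residual) << std::endl;
    if (std::abs(residual) < TOL)
      break;

    // Solve the (here scalar) linearized system J * delta_x = -residual
    const double delta_x = -residual / df(x);
    x += delta_x; // full Newton step (damping parameter lambda = 1)
  }
  return 0;
}

In the Navier-Stokes program, the scalar division is replaced by assembling the matrix from the first snippet
and the right hand side from the second one and solving the resulting linear system; in addition, a line search
(damping) parameter may scale the update.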

// Looping over all elements
for (; cell!=endc; ++cell)
{
fe_values.reinit (cell);
local_matrix = 0;

// Previous Newton iteration values
fe_values.get_function_values (solution_primal, old_solution_values);
fe_values.get_function_gradients (solution_primal, old_solution_grads);

for (unsigned int q=0; q<n_q_points; ++q)
{
for (unsigned int k=0; k<dofs_per_cell; ++k)
{
phi_i_v[k] = fe_values[velocities].value (k, q);
phi_i_grads_v[k] = fe_values[velocities].gradient (k, q);
phi_i_p[k] = fe_values[pressure].value (k, q);
}


// We build values, vectors, and tensors
// from information of the previous Newton step.
// Required for convection term
Tensor<1,dim> v;
v[0] = old_solution_values[q](0);
v[1] = old_solution_values[q](1);

Tensor<2,dim> grad_v;
grad_v[0][0] = old_solution_grads[q][0][0];
grad_v[0][1] = old_solution_grads[q][0][1];
grad_v[1][0] = old_solution_grads[q][1][0];
grad_v[1][1] = old_solution_grads[q][1][1];

// Outer loop for dofs
for (unsigned int i=0; i<dofs_per_cell; ++i)
{
Tensor<2, 2> pI_LinP;
pI_LinP.clear();
pI_LinP[0][0] = phi_i_p[i];
pI_LinP[1][1] = phi_i_p[i];

Tensor<2,dim> grad_v_LinV;
grad_v_LinV[0][0] = phi_i_grads_v[i][0][0];
grad_v_LinV[0][1] = phi_i_grads_v[i][0][1];
grad_v_LinV[1][0] = phi_i_grads_v[i][1][0];
grad_v_LinV[1][1] = phi_i_grads_v[i][1][1];

const Tensor<2,dim> stress_fluid_LinAll = -pI_LinP
  + density_fluid * viscosity * (phi_i_grads_v[i] + transpose(phi_i_grads_v[i]));

const Tensor<1,dim> convection_fluid_LinAll = density_fluid * (phi_i_grads_v[i] * v + grad_v * phi_i_v[i]);

const double incompressibility_LinAll = phi_i_grads_v[i][0][0] + phi_i_grads_v[i][1][1];

// Inner loop for dofs
for (unsigned int j=0; j<dofs_per_cell; ++j)
{
// Fluid, NSE
const unsigned int comp_j = fe_primal.system_to_component_index(j).first;
if (comp_j == 0 || comp_j == 1)
{
local_matrix(j,i) += (convection_fluid_LinAll * phi_i_v[j] +
scalar_product(stress_fluid_LinAll, phi_i_grads_v[j])
) * fe_values.JxW(q);
}
else if (comp_j == 2)
{
local_matrix(j,i) += (incompressibility_LinAll * phi_i_p[j]
) * fe_values.JxW(q);
}
// end j dofs

}
// end i dofs
}
// end n_q_points
}

// This is the same as discussed in step-22:
cell->get_dof_indices (local_dof_indices);
constraints_primal.distribute_local_to_global (local_matrix, local_dof_indices,
system_matrix_primal);

} // end cell

///////////////////////////////////////////////////////
...
///////////////////////////////////////////////////////

for (; cell!=endc; ++cell)
{
fe_values.reinit (cell);
local_rhs = 0;

// Previous Newton iteration
fe_values.get_function_values (solution_primal, old_solution_values);
fe_values.get_function_gradients (solution_primal, old_solution_grads);

for (unsigned int q=0; q<n_q_points; ++q)
{
Tensor<2, 2> pI;
pI.clear();
pI[0][0] = old_solution_values[q](dim);
pI[1][1] = old_solution_values[q](dim);

Tensor<1,dim> v;
v[0] = old_solution_values[q](0);
v[1] = old_solution_values[q](1);

Tensor<2,dim> grad_v;
grad_v[0][0] = old_solution_grads[q][0][0];
grad_v[0][1] = old_solution_grads[q][0][1];
grad_v[1][0] = old_solution_grads[q][1][0];
grad_v[1][1] = old_solution_grads[q][1][1];

Tensor<2,dim> sigma_fluid;
sigma_fluid.clear();
sigma_fluid = -pI + density_fluid * viscosity * (grad_v + transpose(grad_v));

// Divergence of the fluid velocity (incompressibility)
const double incompressibility_fluid = grad_v[0][0] + grad_v[1][1];


// Convection term of the fluid
Tensor<1,dim> convection_fluid;
convection_fluid.clear();
convection_fluid = density_fluid * grad_v * v;

for (unsigned int i=0; i<dofs_per_cell; ++i)
{
// Fluid, NSE
const unsigned int comp_i = fe_primal.system_to_component_index(i).first;
if (comp_i == 0 || comp_i == 1)
{
const Tensor<1,dim> phi_i_v = fe_values[velocities].value (i, q);
const Tensor<2,dim> phi_i_grads_v = fe_values[velocities].gradient (i, q);

local_rhs(i) -= (convection_fluid * phi_i_v +
                 scalar_product(sigma_fluid, phi_i_grads_v)
) * fe_values.JxW(q);

}
else if (comp_i == 2)
{
const double phi_i_p = fe_values[pressure].value (i, q);
local_rhs(i) -= (incompressibility_fluid * phi_i_p) * fe_values.JxW(q);
}
// end i dofs
}
// close n_q_points
}

cell->get_dof_indices (local_dof_indices);
constraints_primal.distribute_local_to_global (local_rhs, local_dof_indices,
system_rhs_primal);

} // end cell

17.3 Chapter summary and outlook


We finish this booklet in the next chapter by posing some central questions that help to recapitulate the
important topics of this lecture.


18 Wrap-up
18.1 Quiz
18.1.1 Basic techniques
1. Define the Laplace operator.

2. Write down the formula for integration by parts in higher dimensions!


3. State the Poisson problem with homogeneous Dirichlet conditions. Write this problem in a variational
form using the correct function spaces.
4. Define the H^1 space.

5. Define the H^1_0 space.

6. What are difficulties of H^1 functions in higher dimensions n ≥ 2?
7. What is a Hilbert space?
8. What are possible boundary conditions for second-order boundary value problems?

9. What is the idea of the FEM?


10. Why do we need numerical quadrature in finite element calculations?
11. What are the three characterizing properties of a finite element?

12. Coming from the variational form, give the discrete form and construct a finite element scheme of the
Poisson problem up to the final linear equation system.
13. Explain the idea of Finite Differences (FD).
14. Discretize the 1D Poisson problem with FD.

15. What is the difference of FD and the FEM?


16. What is the discretization error?
17. What is the maximum principle?
18. What is an M matrix?

19. Why is the M-matrix property important?


20. How can we characterize elliptic, parabolic, and hyperbolic problems?
21. What is the difference between a priori and a posteriori error estimates?

22. Let h → 0. What does this mean for ‖u − u_h‖ (FEM) and for the solution of Ax = b, where A is the
usual stiffness matrix of Poisson’s problem?
23. Formulate Poisson’s problem with Dirichlet, Neumann, Robin conditions, respectively.
24. How is well-posedness of problems defined?

25. What is the Poisson problem? How do we obtain existence, uniqueness, and continuous dependence on
the right hand side (Hint: weak form)?
26. What is the Lax-Milgram lemma? What are the assumptions?
27. Outline the key ideas of the proof of the Lax-Milgram lemma.

28. Give a physical interpretation of the Poisson problem.
29. Why are all integrals evaluated on a master element in FEM?
30. Write down the loops to assemble a finite element system matrix.
31. Why is it sufficient to use the Riesz lemma to prove existence of the Poisson problem rather than using
the Lax-Milgram lemma?
32. What are the challenges when investigating PDEs in higher dimensions d ≥ 2?
33. What is a best approximation?
34. What is Galerkin orthogonality? And what is its geometrical interpretation?
35. What are interpolation error estimates?
36. What are the ingredients (type of error estimates) to obtain convergence results in FEM?
37. How do we get an improved estimate for the L^2 norm for the Poisson problem using finite elements?
38. Which result is better: ‖u − u_h‖_{L^2} = O(h^2) or ‖u − u_h‖_{H^1} = O(h)?
39. What is the Aubin-Nitsche trick?
40. What does the Céa lemma say?
41. How can we solve the linear equation systems Ax = b?
42. What is a preconditioner?
43. Give a stopping criterion for an iterative scheme!
44. Why is the condition number important?
45. What is the condition number of the system matrix of the Poisson problem?
46. Describe the basic idea of multigrid
47. What is a V cycle?
48. What is a smoother?
49. What is the optimal arithmetic complexity of a geometric multigrid method?
50. Formulate the heat / wave equation and discretize in space and time.
51. Give examples of different time-stepping schemes. What schemes exist and what are their principal
differences?
52. Explain the key ideas to obtain stability estimates for parabolic problems.
53. What is the principal idea to prove energy conservation of the wave equation on the time-discrete level?
54. Why are implicit schemes mostly preferred in time-dependent problems?
55. What can cause instabilities in time-dependent problems and how can they be cured?
56. Why are Crank-Nicolson-type schemes (nearly) optimal schemes for wave-like problems?
57. Are explicit time-stepping schemes conditionally or unconditionally stable?
58. What is the Rothe method?
59. What is the method of lines?
60. What is the idea of a full space-time Galerkin discretization?

18.1.2 Advanced techniques
1. Why is it useful to employ goal-oriented error estimation? What is a goal functional?

2. What is the adjoint?


3. Why is the adjoint our friend?
4. Formulate Newton’s method in R.

5. Why do we need a posteriori error estimation?


6. Why do we need mesh adaptivity?
7. Define the directional derivative!
8. Formulate Newton’s method for variational problems

9. What are examples of vector-valued problems? That is: what is a vector-valued problem?


10. How are linear and quadratic convergence characterized in practice?
11. Give an example of a nonlinear equation.
12. What are numerical possibilities to solve nonlinear equations?

13. Explain a fixed-point iteration for solving nonlinear problems.


14. What is a coupled problem?
15. Which coupling versions exist?

16. How can we decouple a coupled problem? What are the advantages and shortcomings?
17. Why are interface couplings more difficult than volume couplings?
18. What are numerical methods for interface coupling?
19. Given PDE1 and PDE2, formulate a compact semi-linear form for a monolithic version.

20. Formulate Newton’s method for a variational-monolithic coupled problem


21. What are the advantages or shortcomings of staggered approaches?
22. What is the crucial point in terms of the numerical solution, when several PDEs are coupled in a
monolithic fashion?

18.2 Hints for presenting results in a talk (FEM-C++ practical class)
We collect some hints for giving good presentations using, for instance, LaTeX beamer or PowerPoint:

1. Structure your talk clearly: what is the goal? How did you manage to reach this goal? What are your
results? What is your interpretation? Why did you choose specific methods (advantages / shortcomings
in comparison to other existing techniques)?
2. Title page: title, author names, co-authors (if applicable), your institute/company, and the name of the
event at which you are speaking.

3. Page 2: list of contents


4. Conclusions at the end: a few take-home messages
5. Rule of thumb: one minute per slide. For a 20-minute presentation, about 20 slides is a good average.

6. A big risk of beamer/PowerPoint presentations is that we rush over the slides. You are the expert and
know everything, but your audience does not. Give the audience time to read the slides!
7. Tailor your talk to the audience!
8. Put a title on each slide!

9. In plots and figures: make lines, the legend, and the axes big enough such that people in the back can
still read your plots. A bad example (a no-go) is my Figure 14, in which the legend is by far too small.
Do not do that! A good example is Figure 31.
10. Tables with numbers: if you create columns of numbers, do not display all of them. If you have many,
you may mark the most important ones in a different color to highlight them.
11. Do not cling to your slides. Even though everything is written there, try to explain things in free speech.
12. Less is more:
a) Do not put too much text on each slide!
b) Do not write full sentences, but use bullet points!
c) Do not use overly fancy graphics, charts, plots, etc., where everything is moving.
d) Do not try to pack everything you have learned into one presentation. Everybody believes that you
know many, many things. The idea of a presentation is to present a piece of your work in a clear way!

13. As always: the first sentence is the most difficult one. Practice some nice welcoming words for yourself
to get started smoothly.
14. Practice your final presentation ahead of time either alone, with friends or colleagues.
15. Just be authentic during the presentation.

16. If you easily become nervous, avoid coffee before the presentation.

18.2.1 Writing an abstract
Before presenting scientific results at workshops or conferences, usually an abstract is required, which
summarizes your results in a concise way. An example of my own is provided in this section.15

15 Thanks to Sebastian Kinnewig, who had the idea to ask for such abstracts in our courses on FEM-C++ project work.

18.2.2 Some example pages

Figure 74: Title page. In the footline, the name, topic, and page number are displayed. Page numbers help the
audience to point more precisely to a specific page when the talk is open for questions.

Figure 75: Page 2.


Figure 76: Some page with precise statements; not overpacked; bullet points; colors may be used if they are
not too exhausting, ...

Figure 77: When switching to a new section, you may want to display the table of contents again to focus on
this new section.


Figure 78: Some page representing the configuration, goals, and references of a numerical example.

Figure 79: Some page with results. If a table has too many lines, please use “. . . ” (dots). A reference to
published literature can be added as well.


Figure 80: Some other page.

Figure 81: Conclusions with short and precise take-home messages. If applicable, you may want to add ongoing
or future work.

18.3 The end
Eduard Mörike (1804 - 1875),
German narrative writer, lyricist, and poet:
Septembermorgen (September Morning)

Im Nebel ruhet noch die Welt,


noch träumen Wald und Wiesen;
bald siehst du, wenn der Schleier fällt,
den blauen Himmel unverstellt,
herbstkräftig die gedämpfte Welt
in warmem Golde fließen.

Gottfried Wilhelm Leibniz (1646 - 1716):


“Es ist unwürdig, die Zeit von hervorragenden Leuten mit knechtischen Rechenarbeiten zu
verschwenden, weil bei Einsatz einer Maschine auch der Einfältigste die Ergebnisse sicher
hinschreiben kann.”
(In free translation: “It is unworthy to waste the time of excellent people on servile computational work,
because with the use of a machine even the simplest person can write down the results reliably.”)

Gottfried Wilhelm Leibniz invented the stepped-drum (Staffelwalze) principle, which was a great achievement
in the construction of mechanical calculating machines and which, in a sense, paved the way for our present-day
computing machines: the computers, which play a central role in modern numerical mathematics and scientific
computing.

References
[1] R. A. Adams. Sobolev Spaces. Academic Press, 1975.

[2] G. Allaire. An Introduction to Mathematical Modelling and Numerical Simulation. Oxford University
Press, 2007.
[3] G. Allaire and F. Alouges. Analyse variationnelle des équations aux dérivées partielles. MAP 431:
Lecture notes at Ecole Polytechnique, 2016.

[4] H. W. Alt. Lineare Funktionalanalysis. Springer, Berlin-Heidelberg, 5th edition, 2006.


[5] S. Amstutz and T. Wick. Refresher course in maths and a project on numerical modeling
done in twos. Hannover: Institutionelles Repositorium der Leibniz Universität Hannover, DOI:
https://doi.org/10.15488/11629, December 2021.
[6] D. Arndt, W. Bangerth, D. Davydov, T. Heister, L. Heltai, M. Kronbichler, M. Maier, J.-P. Pelteret,
B. Turcksin, and D. Wells. The deal.II library, version 8.5. Journal of Numerical Mathematics, 2017.
[7] W. Arnoldi. The principle of minimized iteration in the solution of the matrix eigenvalue problem. Quart.
Appl. Math., 9:17–29, 1951.
[8] W. Bangerth, M. Geiger, and R. Rannacher. Adaptive Galerkin finite element methods for the wave
equation. Comput. Methods Appl. Math., 10:3–48, 2010.
[9] W. Bangerth, R. Hartmann, and G. Kanschat. deal.II – a general purpose object oriented finite element
library. ACM Trans. Math. Softw., 33(4):24/1–24/27, 2007.
[10] W. Bangerth, T. Heister, and G. Kanschat. Differential Equations Analysis Library, 2012.

[11] W. Bangerth and R. Rannacher. Adaptive Finite Element Methods for Differential Equations. Birkhäuser,
Lectures in Mathematics, ETH Zürich, 2003.
[12] S. Bartels. Numerical Methods for Nonlinear Partial Differential Equations. Springer, 2015.
[13] P. Bastian. Lecture notes on scientific computing with partial differential equations. Vorlesungsskriptum,
2014.

[14] R. Becker and R. Rannacher. A feed-back approach to error control in finite element methods: basic
analysis and examples. East-West J. Numer. Math., 4:237–264, 1996.
[15] R. Becker and R. Rannacher. An optimal control approach to a posteriori error estimation in finite
element methods. Acta Numerica, Cambridge University Press, pages 1–102, 2001.

[16] A. Bensoussan, J. Lions, and G. Papanicolaou. Asymptotic Analysis for Periodic Structures. North-
Holland, 1978.
[17] C. Bernardi and E. Süli. Time and space adaptivity for the second-order wave equation. Math. Models
Methods Appl. Sci., 15(2):199–225, 2005.

[18] M. Besier. Adaptive Finite Element methods for computing nonstationary incompressible Flows. PhD
thesis, University of Heidelberg, 2009.
[19] M. Besier and R. Rannacher. Goal-oriented space-time adaptivity in the finite element galerkin method
for the computation of nonstationary incompressible flow. Int. J. Num. Meth. Fluids, 70:1139–1166,
2012.

[20] M. Biot. Consolidation settlement under a rectangular load distribution. J. Appl. Phys., 12(5):426–430,
1941.
[21] M. Biot. General theory of three-dimensional consolidation. J. Appl. Phys., 12(2):155–164, 1941.

[22] M. Biot. Theory of elasticity and consolidation for a porous anisotropic solid. J. Appl. Phys., 25:182–185,
1955.
[23] M. Braack and A. Ern. A posteriori control of modeling errors and discretization errors. Multiscale
Model. Simul., 1(2):221–238, 2003.
[24] D. Braess. Finite Elemente. Springer-Verlag Berlin Heidelberg, Berlin, Heidelberg, vierte, überarbeitete
und erweiterte edition, 2007.
[25] M. Braun. Differential equations and their applications. Springer, 1993.

[26] S. C. Brenner and L. R. Scott. The mathematical theory of finite element methods. Number 15 in Texts
in applied mathematics ; 15 ; Texts in applied mathematics. Springer, New York, NY, 3. ed. edition,
2008.
[27] H. Brezis. Analyse Fonctionnelle, Theorie et Applications. Masson Paris, 1983.

[28] H. Brezis. Functional analysis, Sobolev spaces and partial differential equations. Springer New York,
Dordrecht, Heidelberg, London, 2011.
[29] M. Bristeau, R. Glowinski, and J. Periaux. Numerical methods for the Navier-Stokes equations. Comput.
Phys. Rep., 6:73–187, 1987.
[30] G. F. Carey and J. T. Oden. Finite Elements. Volume III. Compuational Aspects. The Texas Finite
Element Series, Prentice-Hall, Inc., Englewood Cliffs, 1984.
[31] C. Carstensen, M. Feischl, M. Page, and D. Praetorius. Axioms of adaptivity. Computers and Mathe-
matics with Applications, 67(6):1195 – 1253, 2014.
[32] C. Carstensen and R. Verfürth. Edge residuals dominate a posteriori error estimates for low order finite
element methods. SIAM J. Numer. Anal., 36(5):1571–1587, 1999.
[33] S.-S. Chow. Finite element error estimates for non-linear elliptic equations of monotone type. Numer.
Math., 54:373–393, 1989.
[34] P. G. Ciarlet. Mathematical Elasticity. Volume 1: Three Dimensional Elasticity. North-Holland, 1984.

[35] P. G. Ciarlet. The finite element method for elliptic problems. North-Holland, Amsterdam [u.a.], 2. pr.
edition, 1987.
[36] P. G. Ciarlet. Linear and Nonlinear Functional Analysis with Applications. SIAM, 2013.
[37] R. Courant. Variational methods for the solution of problems of equilibrium and vibrations. Bull. Amer.
Math. Soc., 49:1–23, 1943.

[38] H. Darcy. Les Fontaines de la Ville de Dijon. Dalmont, Paris, 1856.


[39] R. Dautray and J.-L. Lions. Mathematical Analysis and Numerical Methods for Science and Technology,
volume 5. Springer-Verlag, Berlin-Heidelberg, 2000.
[40] P. Deuflhard. Newton Methods for Nonlinear Problems, volume 35 of Springer Series in Computational
Mathematics. Springer Berlin Heidelberg, 2011.
[41] The Differential Equation and Optimization Environment: DOpElib. http://www.dopelib.net.
[42] W. Dörfler. A convergent adaptive algorithm for Poisson’s equation. SIAM J. Numer. Anal., 33(3):1106–
1124, 1996.

[43] G. Duvaut and J. L. Lions. Inequalities in Mechanics and Physics. Springer, Berlin-Heidelberg-New
York., 1976.

[44] J. W. Eaton, D. Bateman, S. Hauberg, and R. Wehbring. GNU Octave version 3.8.1 manual: a high-level
interactive language for numerical computations. CreateSpace Independent Publishing Platform, 2014.
ISBN 1441413006.

[45] B. Endtmayer, U. Langer, and T. Wick. Multigoal-Oriented Error Estimates for Non-linear Problems.
Journal of Numerical Mathematics, 27(4):215–236, 2019.
[46] B. Endtmayer, U. Langer, and T. Wick. Two-Side a Posteriori Error Estimates for the Dual-Weighted
Residual Method. SIAM J. Sci. Comput., 42(1):A371–A394, 2020.

[47] B. Endtmayer and T. Wick. A partition-of-unity dual-weighted residual approach for multi-objective goal
functional error estimation applied to elliptic problems. Computational Methods in Applied Mathematics,
17(2):575–599, 2017.
[48] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson. Computational Differential Equations. Cambridge
University Press, 2009. http://www.csc.kth.se/~jjan/private/cde.pdf.

[49] D. Etling. Theoretische Meteorologie. Springer, 2008.


[50] L. C. Evans. Partial differential equations. American Mathematical Society, 2000.
[51] L. C. Evans. Partial differential equations. American Mathematical Society, 2010.

[52] T. Fliessbach. Mechanik. Spektrum Akademischer Verlag, 2007.


[53] S. Frei. Eulerian finite element methods for interface problems and fluid-structure interactions. PhD
thesis, Heidelberg University, 2016.
[54] S. Frei and T. Richter. A locally modified parametric finite element method for interface problems.
SIAM Journal on Numerical Analysis, 52(5):2315–2334, 2014.

[55] X. Gai. A coupled geomechanics and reservoir flow model on parallel computers. PhD thesis, The
University of Texas at Austin, 2004.
[56] V. Girault, G. Pencheva, M. F. Wheeler, and T. Wildey. Domain decomposition for poroelasticity and
elasticity with dg jumps and mortars. Mathematical Models and Methods in Applied Sciences, 21(1):169–
213, 2011.
[57] V. Girault and P.-A. Raviart. Finite Element method for the Navier-Stokes equations. Number 5 in
Computer Series in Computational Mathematics. Springer-Verlag, 1986.
[58] R. Glowinski and J. Periaux. Numerical methods for nonlinear problems in fluid dynamics. In Proc.
Intern. Seminar on Scientific Supercomputers. North Holland, Feb. 2-6 1987.

[59] R. Glowinski and P. L. Tallec. Augmented Lagrangian and Operator-Splitting Methods in Nonlinear
Mechanics. SIAM Stud. Appl. Math. 9. SIAM, Philadelphia, 1989.
[60] J. Goldenstein and J. Grashorn. FEM-Implementierung der instationären Wärmeleitungsgleichung in
C++. Project in the FEM-C++ class, Institut für Angewandte Mathematik, Leibniz Universität Han-
nover, July 2018.
[61] H. Goldstein, C. P. P. Jr., and J. L. S. Sr. Klassische Mechanik. Wiley-VCH, 3. auflage edition, 2006.
[62] C. Goll, T. Wick, and W. Wollner. DOpElib: Differential equations and optimization environment; A
goal oriented software library for solving pdes and optimization problems with pdes. Archive of Numerical
Software, 5(2):1–14, 2017.

[63] P. Grisvard. Elliptic Problems in Nonsmooth Domains, volume 24. Pitman Advanced Publishing Pro-
gram, Boston, 1985.

[64] C. Großmann and H.-G. Roos. Numerische Behandlung partieller Differentialgleichungen. Teubner-
Studienbücher Mathematik ; Lehrbuch Mathematik. Teubner, Wiesbaden, 3., völlig überarb. und erw.
aufl. edition, 2005.

[65] C. Großmann, H.-G. Roos, and M. Stynes. Numerical Treatment of Partial Differential Equations.
Springer, 2007.
[66] H. Guzman. Domain Decomposition Methods in Geomechanics. PhD thesis, The University of Texas at
Austin, 2012.

[67] W. Hackbusch. Multi-Grid Methods and Applications. Springer, 1985.


[68] W. Hackbusch. Theorie und Numerik elliptischer Differentialgleichungen. Vieweg+Teubner Verlag, 1986.
[69] G. Hämmerlin and K.-H. Hoffmann. Numerische Mathematik. Springer Verlag, 1992.
[70] M. Hanke-Bourgeois. Grundlagen der numerischen Mathematik und des Wissenschaftlichen Rechnens.
Vieweg-Teubner Verlag, 2009.
[71] R. Hartmann. Multitarget error estimation and adaptivity in aerodynamic flow simulations. SIAM
Journal on Scientific Computing, 31(1):708–731, 2008.
[72] R. Hartmann and P. Houston. Goal-oriented a posteriori error estimation for multiple target functionals.
In T. Hou and E. Tadmor, editors, Hyperbolic Problems: Theory, Numerics, Applications, pages 579–588.
Springer Berlin Heidelberg, 2003.
[73] T. Heister, M. F. Wheeler, and T. Wick. A primal-dual active set method and predictor-corrector mesh
adaptivity for computing fracture propagation using a phase-field approach. Comp. Meth. Appl. Mech.
Engrg., 290:466 – 495, 2015.

[74] J. G. Heywood and R. Rannacher. Finite-element approximation of the nonstationary Navier-Stokes
problem part IV: error analysis for second-order time discretization. SIAM Journal on Numerical Anal-
ysis, 27(2):353–384, 1990.
[75] J. G. Heywood, R. Rannacher, and S. Turek. Artificial boundaries and flux and pressure conditions
for the incompressible Navier-Stokes equations. International Journal of Numerical Methods in Fluids,
22:325–352, 1996.
[76] G. Holzapfel. Nonlinear Solid Mechanics: A continuum approach for engineering. John Wiley and Sons,
LTD, 2000.
[77] J. Hron and S. Turek. Proposal for numerical benchmarking of fluid-structure interaction between an
elastic object and laminar incompressible flow, volume 53, pages 146 – 170. Springer-Verlag, 2006.

[78] T. Hughes. The finite element method. Dover Publications, 2000.


[79] T. Hughes, J. Cottrell, and Y. Bazilevs. Isogeometric analysis: Cad, finite elements, nurbs, exact
geometry and mesh refinement. Computer Methods in Applied Mechanics and Engineering, 194(39-
41):4135 – 4195, 2005.

[80] D. Jodlbauer, U. Langer, and T. Wick. Parallel block-preconditioned monolithic solvers for fluid-structure
interaction problems. Int. J. Num. Meth. Eng., 117(6):623–643, 2019.
[81] D. Jodlbauer and T. Wick. A monolithic fsi solver applied to the fsi 1,2,3 benchmarks. In S. Frei, B. Holm,
T. Richter, T. Wick, and H. Yang, editors, Fluid-Structure Interaction: Modeling, Adaptive Discretization
and Solvers, Radon Series on Computational and Applied Mathematics. Walter de Gruyter, Berlin, 2017.

[82] C. Johnson. Numerical solution of partial differential equations by the finite element method. Cambridge
University Press, Cambridge, 1987.

[83] N. Kikuchi and J. Oden. Contact problems in elasticity. Studies in Applied Mathematics. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1988.
[84] D. Kinderlehrer and G. Stampacchia. An Introduction to Variational Inequalities and Their Applications.
Classics in Applied Mathematics. Society for Industrial and Applied Mathematics, 2000.
[85] K. Königsberger. Analysis 1. Springer Lehrbuch. Springer, Berlin – Heidelberg – New York, 6. auflage
edition, 2004.
[86] K. Königsberger. Analysis 2. Springer Lehrbuch. Springer, Berlin – Heidelberg – New York, 5. auflage
edition, 2004.
[87] E. Kreyszig. Introductory functional analysis with applications. Wiley, 1989.
[88] K. Kumar, M. Wheeler, and T. Wick. Reactive flow and reaction-induced boundary movement in a thin
channel. SIAM J. Sci. Comput., 35(6):B1235–B1266, 2013.

[89] O. Ladyzhenskaya. The boundary value problems of mathematical physics. Springer, New York, 1985.
[90] R. J. LeVeque. Top ten reasons to not share your code (and why you should anyway).
http://faculty.washington.edu/rjl/pubs/topten/topten.pdf, Dec 2012.
[91] R. Liu. Discontinuous Galerkin Finite Element Solution for Poromechanics. PhD thesis, The University
of Texas at Austin, 2004.

[92] R. Liu, M. Wheeler, C. Dawson, and R. Dean. A fast convergent rate preserving discontinuous Galerkin
framework for rate-independent plasticity problems. Comp. Methods Appl. Mech. Engrg., 199:3213–3226,
2010.
[93] M. Luskin and R. Rannacher. On the smoothing property of the Crank-Nicolson scheme. Applicable
Analysis, 14(2):117 – 135, 1982.
[94] A. Mikelić, M. Wheeler, and T. Wick. A phase-field approach to the fluid filled fracture surrounded by
a poroelastic medium. ICES Report 13-15, Jun 2013.
[95] A. Mikelić and M. F. Wheeler. On the interface law between a deformable porous medium containing a
viscous fluid and an elastic body. M3AS, 22(11):32, 2012.

[96] A. Mikelić, M. F. Wheeler, and T. Wick. A quasi-static phase-field approach to pressurized fractures.
Nonlinearity, 28(5):1371–1399, 2015.
[97] J. Nocedal and S. J. Wright. Numerical optimization. Springer Ser. Oper. Res. Financial Engrg., 2006.
[98] A. Quarteroni, F. Saleri, and P. Gervasio. Scientific computing with MATLAB and Octave. Texts in
computational science and engineering. Springer, 2014.
[99] R. Rannacher. Finite element solution of diffusion problems with irregular data. Numer. Math., 43:309–
327, 1984.
[100] R. Rannacher. On the stabilization of the Crank-Nicolson scheme for long time calculations. Preprint,
August 1986.
[101] R. Rannacher. Finite element methods for the incompressible Navier-Stokes equations. Lecture notes,
August 1999.
[102] R. Rannacher. Numerische Methoden der Kontinuumsmechanik (Numerische Mathematik 3). Vor-
lesungsskriptum, 2001.

[103] R. Rannacher. Numerische Methoden für gewöhnliche Differentialgleichungen (Numerische Mathematik
1). Vorlesungsskriptum, 2002.

[104] R. Rannacher. Numerische Methoden für partielle Differentialgleichungen (Numerische Mathematik 2).
Vorlesungsskriptum, 2006.
[105] R. Rannacher. Analysis 2. Vorlesungsskriptum, 2010.

[106] R. Rannacher. Numerik partieller Differentialgleichungen. Heidelberg University Publishing, 2017.


[107] R. Rannacher and J. Vihharev. Adaptive finite element analysis of nonlinear problems: balancing of
discretization and iteration errors. Journal of Numerical Mathematics, 21(1):23–61, 2013.
[108] H.-J. Reinhardt. Die methode der finiten elemente. Vorlesungsskriptum, Universität Siegen, 1996.

[109] S. I. Repin and S. A. Sauter. Accuracy of Mathematical Models: Dimension Reduction, Homogenization,
and Simplification. European Mathematical Society, 2020.
[110] J. Rice. Mathematical analysis in the mechanics of fracture, pages 191–311. Academic Press New York,
chapter 3 of fracture: an advanced treatise edition, 1968.

[111] T. Richter. Numerische methoden für gewöhnliche und partielle differentialgleichungen. Lecture notes,
Heidelberg University, 2011.
[112] T. Richter. Fluid-structure interactions: models, analysis, and finite elements. Springer, 2017.
[113] T. Richter and T. Wick. Variational localizations of the dual weighted residual estimator. Journal of
Computational and Applied Mathematics, 279(0):192 – 208, 2015.
[114] T. Richter and T. Wick. Einführung in die numerische Mathematik - Begriffe, Konzepte und zahlreiche
Anwendungsbeispiele. Springer, 2017.
[115] Y. Saad. Iterative methods for sparse linear systems. SIAM, 2003.

[116] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric
linear systems. SIAM J. Sci. Stat. Comput., 7(3), 1986.
[117] L. Samolik and T. Wick. Numerische Mathematik II (Numerik GDGL, Eigenwerte). Institute for Applied
Mathematics, Leibniz Universität Hannover, Germany, 2019.

[118] M. Schäfer and S. Turek. Flow Simulation with High-Performance Computer II, volume 52 of Notes
on Numerical Fluid Mechanics, chapter Benchmark Computations of laminar flow around a cylinder.
Vieweg, Braunschweig Wiesbaden, 1996.
[119] M. Schmich and B. Vexler. Adaptivity with dynamic meshes for space-time finite element discretizations
of parabolic equations. SIAM J. Sci. Comput., 30(1):369 – 393, 2008.

[120] H. R. Schwarz. Methode der finiten Elemente. Teubner Studienbuecher Mathematik, 1989.
[121] R. C. Smith. Uncertainty Quantification: Theory, Implementation, and Applications. SIAM, 2014.
[122] M. Steinbach, J. Thiele, and T. Wick. Algorithmisches Programmieren (Numerische Algorithmen
mit C++). Hannover: Institutionelles Repositorium der Leibniz Universität Hannover, DOI:
https://doi.org/10.15488/11583, December 2021.
[123] F. Suttmeier. Numerical solution of Variational Inequalities by Adaptive Finite Elements.
Vieweg+Teubner, 2008.
[124] F. T. Suttmeier and T. Wick. Numerische Methoden für partielle Differentialgleichungen (in German).
Notes (Vorlesungsmitschrift) at University of Siegen, 2006.

[125] R. Temam. Navier-Stokes Equations: Theory and Numerical Analysis. AMS Chelsea Publishing, Provi-
dence, Rhode Island, 2001.

[126] T. Tezduyar. Finite element methods for flow problems with moving boundaries and interfaces. Archives
of Computational Methods in Engineering, 8(2):83–130, 2001.
[127] J.-P. Thiele. Development and test of a nesting method for an embedded Lagrangian particle model.
Master’s thesis, Leibniz University Hannover, Sep 2017.
[128] I. Toulopoulos and T. Wick. Numerical methods for power-law diffusion problems. SIAM Journal on
Scientific Computing, 39(3):A681–A710, 2017.
[129] R. Trémolières, J. Lions, and R. Glowinski. Analyse numérique des inéquations variationnelles. Dunod,
1976.
[130] R. Trémolières, J. Lions, and R. Glowinski. Numerical Analysis of Variational Inequalities. Studies in
Mathematics and its Applications. Elsevier Science, 2011.
[131] F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen - Theorie, Verfahren und Anwen-
dungen. Vieweg und Teubner, Wiesbaden, 2nd edition, 2009.

[132] S. Turek. Efficient solvers for incompressible flow problems. Springer-Verlag, 1999.
[133] S. Turek, L. Rivkind, J. Hron, and R. Glowinski. Numerical analysis of a new time-stepping θ-scheme
for incompressible flow simulations. Technical report, TU Dortmund and University of Houston, 2005.
Dedicated to David Gottlieb on the occasion of his 60th anniversary.

[134] D. Werner. Funktionalanalysis. Springer, 2004.


[135] T. Wick. Adaptive Finite Element Simulation of Fluid-Structure Interaction with Application to Heart-
Valve Dynamics. PhD thesis, University of Heidelberg, 2011.
[136] T. Wick. Fluid-structure interactions using different mesh motion techniques. Computers and Structures,
89(13-14):1456–1467, 2011.
[137] T. Wick. Solving monolithic fluid-structure interaction problems in arbitrary Lagrangian Eulerian coor-
dinates with the deal.II library. Archive of Numerical Software, 1:1–19, 2013.
[138] T. Wick. Modeling, discretization, optimization, and simulation of fluid-structure interaction.
Lecture notes at Heidelberg University, TU Munich, and JKU Linz, available at
https://www-m17.ma.tum.de/Lehrstuhl/LehreSoSe15NMFSIEn, 2015.
[139] T. Wick. Modified Newton methods for solving fully monolithic phase-field quasi-static brittle fracture
propagation. Computer Methods in Applied Mechanics and Engineering, 325:577 – 611, 2017.
[140] T. Wick. Variational-Monolithic ALE Fluid-Structure Interaction: Comparison of Computational Cost
and Mesh Regularity Using Different Mesh Motion Techniques, pages 261–275. Springer International
Publishing, Cham, 2017.
[141] T. Wick. Multiphysics Phase-Field Fracture: Modeling, Adaptive Discretizations, and Solvers. De
Gruyter, Berlin, Boston, 2020.
[142] T. Wick. Sergey I. Repin, Stefan A. Sauter: “Accuracy of Mathematical Models”. Jahresbericht der
Deutschen Mathematiker-Vereinigung, 2020.
[143] T. Wick and P. Bastian. Pec3 spring school on introduction to numerical modeling with differential
equations. Hannover: Institutionelles Repositorium der Leibniz Universität Hannover, DOI:
https://doi.org/10.15488/6427, 2019.

[144] J. Wloka. Partielle Differentialgleichungen. B. G. Teubner Verlag, Stuttgart, 1982.


[145] J. Wloka. Partial differential equations. Cambridge University Press, 1987.
[146] E. Zeidler. Nonlinear functional analysis - nonlinear monotone operators. Springer-Verlag, 1990.

Index
H 1 , 101, 105 Scalar product, 124
H 1 norm, 193 Biot problem, 51, 350
H 1 seminorm, 107 Bochner integral, 271
H 2 , 107 Boundary conditions, 62
H 2 norm, 107 Boundary value problem, 45
H 2 seminorm, 107 Bounded operator, 37
H01 , 101, 106 Brachistochrone, 43
L2 , 102 BVP, 45
L2 norm, 193
V cycle, 256 C++
W cycle, 257 Newton, 329
2D-1 benchmark, 226 Céa lemma, 167
First version, 145
A norm, 87 Cauchy sequence, 99, 111
A posteriori error estimation, 195, 198 Cauchy’s inequality, 108
A priori error estimation, 198 Cauchy-Schwarz inequality, 108
A-stability, 281, 284 Chain rule, 32
Strict, 284 Change of variables, 39
Strong, 284 Characteristic equation, 34
a.e., 102 Characteristic polynomial, 34
Accuracy, 19, 22 Classifications, 71, 361
Adaptive FEM, 199 Compact support, 123
Adaptive finite elements, 199 Compatiblity condition, 149
Adaptivity Complementarity condition, 320
Fluid-structure interaction, 232 Complete space, 99
Adjoint methods, 202 Completeness, 99
Adjoint problem, 206 Computational convergence, 193, 367
Adjoint variable, 200, 202 Computational convergence analysis, 189, 193, 367
AFEM, 199 Computing error norms, 193
Basic algorithm, 199 Concepts in numerics, 24
Algorithm, 19 Condition number, 92
Analysis, 19 Conditional stability, 287
Almost everywhere, 102 Conforming FEM, 164
almost everywhere, 102 Conservation principles, 120
ANS, 355 Consistency, 84, 279, 282
Ansatz function, 125 Constitutive laws, 46
Approximation theory, 109 Continuity equation, 359
Assembling in FEM, 134 Continuous mapping, 36
Assembling integrals, 134 Convergence, 283
Atmospheric physics, 359 Definition, 99
Aubin-Nitsche trick, 172 Parabolic problems, 287
Wave equation, 308
Backtracking line search, 331 Convergence order, 193
Backward Euler scheme, 93, 313 Computationally obtained, 369
Banach space, 99 Coupled PDEs, 72, 345
Benchmark, 226 Interface coupling, 350
Best approximation, 146, 167 Iterative coupling, 348
Big O, 37 Variational monolithic coupling, 348
Bilinear, 35 Volume coupling, 347
Bilinear form, 41, 124 Coupled problems, 75, 345
Norm, 124 Coupled variational inequality system, 75

437
Index
Coupling information, 75
Crank-Nicolson, 93
Crank-Nicolson scheme, 306, 312
Cross product, 33
curl, 33
Current configuration, 261
CVIS, 19, 75

Damping parameter, 236
deal.II, 375, 401, 414
Decoupled vs. coupled, 72
Defect, 328
Deformation gradient, 39
Deformed configuration, 261
Degrees of freedom, 140
Dense subspace, 103
Density, 103
Derivative
  Directional, 116
  Fréchet, 116
  Gâteaux, 116
descent method, 237
DG method, 300
Differential equation
  Classification, 65
  General definition, 63
  Linear vs. nonlinear, 65
Differential equations
  Examples, 43
Differential operator
  Divergence form, 57
  Nondivergence form, 57
Differential problem, 120
Differentiation in Banach spaces, 116
Differentiation under the integral sign, 48
Diffusion, 45
Diffusive interface approaches, 350
Dirac goal functional, 224
Dirac rhs, 224
Directional derivative, 116
Dirichlet boundary conditions
  in Newton's method, 331
Dirichlet condition, 62
Discontinuous Galerkin, 300
Discretization
  Finite differences, 80
  Time, 271
Discretization error, 21, 368
Distance, 22
Divergence, 33
Divergence theorem, 40
DoFs, 140
DOpElib, 375
Dual-weighted residual method, 199
Duality arguments, 199
DWR, 199, 393
  FSI, 232

Efficiency, 19
Efficient error estimator, 198
Eigenvalues, 34, 58, 250
  Poisson, 250
Eigenvectors, 34, 250
  Poisson, 250
Elastic wave equation, 311
Elasticity, 49, 261
  Linearized, 65
Elastodynamics, 311
Element residual, 212
Elements, 163
Elliptic operator, 58
Elliptic PDE, 57
Energy conservation, 303
  Time-discrete level, 306
  Wave equation, 304
Energy norm, 105
Energy space, 105
Error, 21
  a posteriori in FEM, 195
  a priori in FEM, 167
  Goal-oriented, 195
  Heat equation, 285
  ODE, 279
Error estimation
  Goal-oriented, 199
  Residual-based, 219
Error localization, 210
Error of consistency, 175
Errors in numerical mathematics, 21
Essential boundary conditions, 62
Euler
  Backward, 279
  Implicit, 279
Euler method, 278
Euler-Lagrange equations, 43, 120
Existence
  Finite differences, 82
  Linearized elasticity, 263
Explicit schemes, 278
Extrapolation to the limit, 372

Face residuals, 212
FD, 77
FEM, 119
  2D simulations, 189
  3D simulations, 190
  Computational convergence analysis, 190
  Quadratic, 142
  See also index Finite Elements, 119

Finite differences, 77
Finite element
  Definition, 131
Finite element method, 119
Finite elements, 119, 125
  Adaptivity, 199
  Computational convergence analysis, 190
  Construction of manufactured solution, 119
  Construction of right hand side, 190
  Error analysis, 167, 190
  Error analysis in simulations, 190
  Manufactured solution, 190
  See also index FEM, 119
  Conforming, 127
  Evaluation of integrals, 129
  Linear, 126
  Mesh, 125
  Weak form, 128
First Strang lemma, 175
Fitted methods, 350
Fixed point methods, 233
Fixed-point iteration, 323
Fluid flow equations, 50
Fluid mechanics, 226
Fluid-Structure Interaction, 353, 355
Fluid-structure interaction, 232
Forward Euler method, 278
Forward Euler scheme, 93
Fréchet derivative, 116
Fractional-Step-θ, 297
FSI, 232, 353
Free boundary, 320
Friedrichs inequality, 162
Frobenius scalar product, 124
Functional of interest, 195
Fundamental lemma of calculus of variations, 103, 123

Gâteaux derivative, 116
Galerkin equations, 129
Galerkin method, 128
Galerkin orthogonality, 144, 238
Gauss-Green theorem, 40
Gauß-Seidel method, 236
Generalized minimal residual, 244
Geometric multigrid, 253
Gershgorin, 250
Global error norms, 193
GMRES, 244
Goal functional, 195, 197
Goal-oriented error estimation, 195
Gottfried Wilhelm Leibniz, 428
Gradient, 33
Gradient descent, 237
Green's formulae, 40
Green's function, 78
Green-Lagrange strain tensor, 261
Growth of a species, 44

Hölder's inequality, 108
Hadamard, 28
Hanging nodes, 220
Harmonic function, 58
Hat function, 126
Heat equation, 57, 59, 291
  deal.II, 401
  Implementation, 289, 401
  Numerical tests, 289
  Programming code, 401
Higher-order FEM, 142
Hilbert space, 100, 105
Hilbert spaces, 105
Homogeneous Dirichlet condition, 57
Hyperbolic PDE, 57
Hyperbolic problems, 271, 301

IBVP, 59, 273
Ill-posed problem, 28
Implementation
  C++, 329
  Clothesline, 375
  deal.II, 375
  Heat equation, 401
  Navier-Stokes, 414
  Newton, 329
  Poisson, 375
  PU-DWR, 393
Implementation of FSI, 355
Implementation of multiphysics, 355
Implicit schemes, 279
Inequalities
  Cauchy, 108
  Cauchy-Schwarz, 108
  Hölder, 108
  Minkowski, 108
  Young, 108
Inner product, 100
Integration by parts, 40, 104, 121, 150
Interface coupling, 350, 351
Iteration
  Fixed-point, 323
  Newton, 327
  Picard, 323
Iteration error, 373
Iterative coupling, 348
Iterative solvers, 233

Jacobi method, 236
Jacobian, 33
Jumping coefficients, 345

Korn's inequality
  1st, 263
  2nd, 263
Krylov space, 238

L-shaped domain, 224
Lagrange multiplier, 200, 201
Lagrange multipliers, 200
Lagrangian, 200
Lamé parameters, 261
Landau symbols, 37
Laplace
  Vector, 64
Laplace equation, 57
Lax-Milgram lemma, 152, 262
Lebesgue integral, 101
Lebesgue integration, 149
Lebesgue measures, 101
Leibniz, 428
Lemma
  Lax-Milgram, 152
Line search, 331
Linear form, 124
Linear functional, 36
Linear operator, 36
Linear PDE, 65
Linear vs. nonlinear, 72
Linearization by Newton, 327
Linearized elasticity, 65, 261
Little o, 37
Local degrees of freedom, 165
Local truncation error, 84
Logistic equation, 45
Lower-dimensional interface approaches, 350

M matrix, 83
Malthusian law of growth, 44
Mass conservation, 359
Mass matrix, 276
Master element, 138, 140
Mathematical modeling, 19
MATLAB, 390
Matrix
  Eigenvalues, 34
  Invariants, 34
  Properties, 34
Maximum norm, 36
Maximum principle
  Continuous level, 79
  Discrete, 83
Mesh adaptivity, 195
Mesh elements, 163
Meteorology, 359
Method of lines, 271
Metric space, 22
MG
  Mesh independence, 257
  Preconditioner, 257
  Solver, 257
Minimization problem, 120
Minimum angle condition, 173
Minkowski's inequality, 108
Monolithic formulation, 365
Multigrid, 253
  Key idea, 248
  Mesh independence, 251, 257
  Preconditioner, 257
  Prolongation, 248
  Restriction, 248
  Smoother, 248
  Solver, 257
Multiindex notation, 32

Nabla operator, 33
Natural boundary conditions, 62
Navier-Stokes, 226, 343, 414
  FEM discretization, 342
Navier-Stokes equations, 50, 360
Neumann condition, 62
Newton's method, 327, 330
  Performance tests, 343
  Defect-correction, 328
  Octave code, 387
  overview, 329
  Root finding problem, 387
Nitsche, 172
Nonlinear PDE, 65
Nonlinear problems, 317
  Implementation, 414
Nonstationary linearized elasticity, 311
Norm, 35, 193
Normal vector, 31
Normed space, 23, 35
Notation
  Bilinear forms, 41
  Linear forms, 41
  Scalar products, 41
NSE, 360
Numerical analysis
  Finite differences, 84
  Heat equation, 285
  ODE, 279
Numerical concepts, 24
Numerical integration, 137
Numerical methods, 19

Numerical quadrature, 137, 175

Obstacle problem, 321
  Variational formulation, 320
Octave, 390
octave, 87, 193, 375
Octave code, 390
ODE, 20
One-Step-theta scheme, 274, 275
Operator splitting, 295
Order of a PDE, 71
Ordinary differential equation, 20
Orthogonal projection, 110
Orthogonality, 100

Parabolic equation, 272
Parabolic PDE, 57
Parabolic problems, 271, 272
Partial differential equation, 20
Partial integration, 40, 150
Partitioned schemes, 348
PDE, 20
  Coupled, 72
  Single-valued, 63
  Vector-valued, 64
  Elliptic, 57
  Hyperbolic, 57
  Parabolic, 57
PDE system, 19, 64, 71
PDEs in atmospheric physics, 359
PDEs in meteorology, 359
Penalization, 321
Petrov-Galerkin method, 129
Physical configuration, 261
Picard iteration, 323
Poincaré inequality, 156
Poisson
  1D simulations, 87, 132, 143
  2D simulations, 189
  3D simulations, 190
  Implementation, 375
Poisson equation, 57
Poisson problem, 45, 77
Population, 44
Population growth, 44
Positive definite, 34
Pre-Hilbert space, 100, 109
Preconditioner, 243
Primal variable, 200
Principal invariants of a matrix, 34
Principle of minimum potential energy, 120
Principle of virtual work, 120
Programming code
  C++, 329
  Newton, 329
Programming codes, 375
Projection, 114
PU-DWR, 393

Quadratic FEM, 142
Questions to these lecture notes, 419
Quiz, 419

Rannacher time stepping, 305
Reference configuration, 261
Reference element, 138, 140
Regularity
  Poisson 1D, 80
Relationship mesh size and DoFs, 367
Relaxation parameter, 236
Reliability, 198
Reliable error estimator, 198
Research software, 19
Reynolds' transport theorem, 48
Richardson, 249
Richardson extrapolation, 372
Richardson iteration, 234
Riesz representation theorem, 262
Ritz method, 128
Ritz-Galerkin method, 129
Robin condition, 62
Robustness, 19
Rotation, 33
Rothe method, 271

Saddle point, 200
Scalar product, 35, 41, 100, 124
Schur complement, 341
Scientific computing, 19
Search direction, 236
Semi-linear form, 41
Semi-norm, 35
Separation of variables, 44
Sesquilinear, 35
Shape functions, 126, 129
Simplified Newton steps, 331
Single-valued PDE, 63
Singularities of H^1 functions, 106
Smooth functions, 123
Smoother, 248, 249
  Richardson, 249
SOR, 257
Sobolev spaces, 101, 105, 106, 149
Software, 19, 375
Solid mechanics equation, 49
Space-time formulation, 273, 302
Space-time Galerkin, 271
Space-time parabolic problem, 273

Stability, 85, 279, 280
  Numerical tests, 27
stability analysis, 285
Staggered schemes, 348
Stencil, 94
Stiff PDEs, 274
stiff problem, 285
Stiffness, 285
Stiffness matrix, 276
Stokes, 343
Stokes problem, 227
Strang
  First lemma, 175
Strict A-stability, 284
Structures
  Mathematical, 362
Substitution rule, 39

Taylor expansion, 84, 282
Test function, 121, 123, 125
Time discretization, 271, 274
Time stepping schemes
  A-stability, 304
  Crank-Nicolson, 304
  Explicit Euler, 304
  Fractional-Step-θ, 297, 305
  Shifted Crank-Nicolson, 305
Total potential energy, 120
Trace, 33, 157
Transformation of integrals, 39
Transmission problems, 345
Trial function, 125
Truncation error, 84
Two grid method, 255

Unconditional stability, 94
Unconditionally stable, 287
Unfitted methods, 350
Uniqueness
  Finite differences, 82

V-cycle, 256
V-ellipticity, 265
Variational inequality, 75
Variational multigrid, 253
Variational problem, 120
Vector Laplace, 64
Vector space, 35
Vector-valued PDE, 64
Vector-valued problems, 226
VI, 75
Volume ratio, 39

W-cycle, 256
Wave equation, 57, 61
Weak derivative, 104
Weak formulation
  Fluid-structure interaction with elastic MM-PDE, 353
  Fluid-structure interaction with harmonic MM-PDE, 353
Well-posedness, 28
  Finite differences, 81
  Poisson in 1D, 78
Weather models, 359

Young's inequality, 108
