Notes NMCE

This document provides notes on numerical methods for civil engineering. It is organized into five parts covering: 1) elliptic problems and their weak formulations and the Galerkin finite element method (GFEM), 2) applications of the GFEM to elliptic problems in 1D and 2D, 3) applications of the GFEM to linear elasticity problems in 3D and 2D, 4) sample examination problems and solutions, and 5) appendices reviewing relevant mathematics. The document was produced by Riccardo Sacco for a graduate course in numerical methods for civil engineering taught over 10 years at Politecnico di Milano. All computations were performed using Matlab code developed by Marco Restelli and modified by Riccardo Sacco

Numerical Methods

for Civil Engineering

– Notes of the Course–

Riccardo Sacco

September 3, 2016

To my Family and Friends

Preface
This text contains the notes of the course “Numerical Methods in Civil Engineering”, which I have taught over the last ten years for the MSc in Civil Engineering at Politecnico di Milano. The material is organized into 5 parts:

• Part I: Elliptic Problems (Chapts. 1 and 2)

• Part II: The GFEM for Elliptic Problems in 1D and 2D (Chapts. 3 and 4)

• Part III: The GFEM for Linear Elasticity (Chapts. 5, 6 and 7)

• Part IV: Examination Problems with Solution (Chapt. 8)

• Part V: Appendices (A, B, C, D, E and F).

Specifically:

• Chapts. 1 and 2 illustrate the weak formulation of a boundary value problem and its numerical approximation using the Galerkin Finite Element Method (GFEM).

• Chapts. 3 and 4 address the numerical study of 1D model problems and the implementation in 2D of the finite element model for an advection-diffusion-reaction boundary value problem.

• Chapt. 5 illustrates the basic foundations of Continuum Mechanics for a linear, isotropic and elastic material in a general 3D setting. Chapt. 6 addresses the study of 2D problems (plane strain/plane stress conditions) as well as their weak formulation and GFE approximation. Chapt. 7 deals with the case of incompressible elasticity and its two-field formulation based on the introduction of the pressure parameter.

• Chapt. 8 illustrates the complete solution of exercises recently proposed in class exams at the end of the Course.

• Appendices A, B and C give a short review of the essentials in Differential Calculus, Linear Algebra and Functional Analysis that are extensively used throughout the text.

• Appendices D, E and F provide an introduction to Numerical Mathematics, Numerical Linear Algebra and Approximation Theory.

All computations have been performed using Matlab on a standard laptop running
under Linux OS. Methodologies and algorithms have been implemented in 1D
and 2D codes by Marco Restelli and subsequently modified by Riccardo Sacco
and Aurelio Giancarlo Mauri. The codes can be downloaded at the official web
page of the Course:
⇒ https://beep.metid.polimi.it/

to which the reader is kindly referred for any further information.

Milano, October 2013/December 2015

Riccardo Sacco

Acknowledgements
It is a great pleasure for me to acknowledge here the fundamental contribution given by Marco Restelli and Luca Dedè through their computer lab class teaching and through the development of the numerical software and tutorial exercises that constitute the supporting backbone of the theoretical part of the course. I also wish to gratefully thank Carlo de Falco for his teaching assistance in computer lab classes during the year 2010, Chiara Lelli for her assistance during the years 2012/2013, when for the first time the teaching language switched from Italian to English, and Francesca Malgaroli for teaching in the year 2014. Last, but certainly not least, my thanks go also to Paola Causin, for fruitful discussions and ten years of common hard work in the development, analysis and implementation of finite element methodologies in Continuum Mechanics, and to Prof. Maurizio Verri, for his fantastically critical proof-reading of the first draft of these notes.
Contents

I Elliptic Problems

1 Weak Formulations
  1.1 Elliptic operators
  1.2 Elliptic boundary value problems
  1.3 Weak solution of a BVP: the 1D case
  1.4 Weak solution of a BVP: the 2D case
    1.4.1 Non-homogeneous Dirichlet problem in 2D
    1.4.2 Non-homogeneous Neumann problem in 2D
    1.4.3 Mixed problem in 2D
  1.5 Well-posedness analysis: the Lax-Milgram Lemma

2 The GFEM
  2.1 The Galerkin method
  2.2 The Galerkin Finite Element Method
    2.2.1 Error analysis
  2.3 Experimental convergence study of the GFEM
    2.3.1 BVP with smooth solution
    2.3.2 BVP with a non-smooth solution

II The GFEM for Elliptic Problems in 1D and 2D

3 Elliptic problems: theory and FEs
  3.1 Reaction-diffusion model problem
    3.1.1 Weak formulation
    3.1.2 Galerkin finite element approximation
    3.1.3 The linear system and the discrete maximum principle
    3.1.4 Stabilization: the method of lumping of the reaction matrix
  3.2 Advection-diffusion model problem
    3.2.1 Weak formulation
    3.2.2 Galerkin finite element approximation
    3.2.3 The linear system and the discrete maximum principle
    3.2.4 Stabilization: the method of artificial diffusion

4 2D Implementation of the GFEM
  4.1 Weak formulation
  4.2 Geometrical discretization
  4.3 Polynomial spaces in 2D
  4.4 The approximation space
  4.5 GFE approximation
  4.6 The linear system
  4.7 The assembly phase
  4.8 2D Experimental convergence study of the GFEM
    4.8.1 The homogeneous problem for the Laplacian
    4.8.2 An advection-diffusion-reaction BVP

III The GFEM for Linear Elasticity

5 3D Compressible elasticity
  5.1 Essentials of solid mechanics
    5.1.1 The relation σ − ε
    5.1.2 Linear isotropic elasticity
  5.2 The Navier-Lamé Model
  5.3 The weak formulation
  5.4 Existence and uniqueness of the weak solution

6 2D elasticity
  6.1 Two-dimensional models in elasticity
    6.1.1 Plane stress
    6.1.2 Plane strain
  6.2 The GFE approximation in the 2D case
    6.2.1 The Galerkin FE problem
    6.2.2 Local approximations and matrices
    6.2.3 Convergence analysis
  6.3 Numerical examples
    6.3.1 Example 1: patch test (constant stress)
    6.3.2 Example 2: experimental convergence analysis

7 Incompressible elasticity
  7.1 The incompressible regime in three spatial dimensions
  7.2 The incompressible regime in two spatial dimensions
  7.3 Volumetric locking: examples
  7.4 Two-field model for linear elasticity
  7.5 Weak formulation of the two-field model
  7.6 Abstract form of the Herrmann system
  7.7 Well-posedness analysis of the two-field model
  7.8 Energy formulation of incompressible elasticity
  7.9 GFE approximation of the two-field model
  7.10 Unique solvability and error analysis
  7.11 The discrete inf-sup condition
  7.12 Finite elements for incompressible elasticity
    7.12.1 Discontinuous pressures
    7.12.2 Continuous pressures
  7.13 Locking and pressure spurious modes
    7.13.1 Discontinuous pressure FE space
    7.13.2 Continuous pressure FE space
  7.14 Convergence analysis

IV Examination Problems with Solution

8 Solved problems
  8.1 Examination of July 09, 2012
    8.1.1 Solution of Exercise 1
    8.1.2 Solution of Exercise 2
    8.1.3 Solution of Exercise 3

V Appendices

A Differential Calculus
  A.1 Differential operators, useful formulas and properties
    A.1.1 First-order operators
    A.1.2 Second-order operators
    A.1.3 Green’s formula
    A.1.4 Maximum principle for 2nd order elliptic operators
  A.2 Tensors
    A.2.1 Operations/operators on tensors

B Linear Algebra
  B.1 Vector spaces
  B.2 Vector and matrix norms
  B.3 Matrices

C Functional Analysis
  C.1 Metric spaces
  C.2 Complete metric spaces
  C.3 Normed spaces
  C.4 Banach spaces
  C.5 Hilbert spaces
  C.6 Sobolev spaces in 1D
  C.7 Sobolev spaces in R^d, d ≥ 2

D Numerical Mathematics
  D.1 The continuous model
  D.2 The numerical model
  D.3 The chain of errors
  D.4 Errors and error analysis
  D.5 Floating-point numbers

E Numerical Linear Algebra
  E.1 Linear algebraic systems
  E.2 Direct methods for linear systems
  E.3 Stability analysis

F Approximation Theory
  F.1 Interpolation
    F.1.1 Basis functions
    F.1.2 Finite element interpolation and error analysis
  F.2 Quadrature

Part I

Elliptic Problems


This part illustrates the weak formulation of elliptic model problems and their
numerical approximation using the Galerkin Finite Element Method.
Chapter 1

Weak Formulation of Elliptic Boundary Value Problems

Abstract
In this chapter, we introduce the concept of weak solution of an elliptic boundary
value problem and we illustrate the Lax-Milgram Lemma, a theoretical tool for
the analysis of the well-posedness of the associated weak formulation.

1.1 Elliptic operators

Let $\Omega$ be a bounded set of $\mathbb{R}^d$, $d \le 3$, with Lipschitz boundary $\partial\Omega$ on which a unit normal vector $n = n(x)$ is defined almost everywhere, $x = (x_1, \ldots, x_d)^T$ being the coordinate position vector. Let us consider the second-order linear differential operator
\[ Lu := \mathrm{div}(-A\nabla u) + \mathrm{div}(b\,u) + c\,u : \Omega \to \mathbb{R} \tag{1.1} \]
where $A$, $b$, $c$ and $u$ are sufficiently regular given functions.

Definition 1.1.1 (Ellipticity). The operator $L$ is said to be elliptic in $\Omega$ if there exists a positive constant $\alpha_0$ such that
\[ \xi^T A(x)\,\xi \ge \alpha_0 |\xi|^2 \]
for every $\xi = [\xi_1, \ldots, \xi_d]^T \in \mathbb{R}^d$ and for almost every $x \in \Omega$.


Remark 1.1.2. Def. 1.1.1 is satisfied if $A$ is a scalar function that is strictly positive almost everywhere (a.e.) in $\Omega$ and bounded away from zero. In such a case we can take $\alpha_0 = \inf_{x \in \Omega} A(x)$. In the more general case where $A$ is a real-valued $d \times d$ matrix function, Def. 1.1.1 is satisfied if $A$ is positive definite a.e. in $\Omega$, uniformly with respect to $x$. In this case, we can take $\alpha_0 = \inf_{x \in \Omega} \lambda_{\min}(x)$, provided this infimum is strictly positive, where $\lambda_{\min}(x)$ is the minimum eigenvalue of the matrix $A$ evaluated at the spatial position $x$ in the domain $\Omega$.
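The recipe of Rem. 1.1.2 can be tried numerically. The following is a minimal sketch, assuming a symmetric $2 \times 2$ diffusion tensor on the unit square; the tensor `diffusion_matrix` and the helper names are hypothetical and serve only as an illustration. Sampling a grid only estimates the infimum; it does not prove ellipticity.

```python
import math


def min_eigenvalue_sym2(a11, a12, a22):
    """Smallest eigenvalue of the symmetric matrix [[a11, a12], [a12, a22]]."""
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    return mean - radius


def diffusion_matrix(x, y):
    """Hypothetical symmetric diffusion tensor A(x, y), returned as (a11, a12, a22)."""
    return (2.0 + x, 0.5, 1.0 + y * y)


def estimate_alpha0(n=50):
    """Sample lambda_min on an (n+1) x (n+1) grid of the unit square, keep the minimum."""
    alpha0 = math.inf
    for i in range(n + 1):
        for j in range(n + 1):
            alpha0 = min(alpha0, min_eigenvalue_sym2(*diffusion_matrix(i / n, j / n)))
    return alpha0


alpha0 = estimate_alpha0()
print(f"estimated alpha_0 = {alpha0:.4f}")
```

For this particular tensor the smallest eigenvalue is attained at the corner $(0, 0)$, so the sampled minimum coincides with the true infimum; in general a finer grid or an analytical bound would be needed.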
Example 1.1.3 (The Laplace operator). The simplest, and most important, example of a linear elliptic operator is the Laplace operator (in short, the Laplacian), defined as the second-order differential operator obtained by applying the divergence operator to the vector field $\nabla u$, for a given twice-differentiable function $u$, so that
\[ \Delta u := \mathrm{div}\,\nabla u = \sum_{i=1}^{d} \frac{\partial^2 u}{\partial x_i^2}. \]
Setting $A = 1$, $b = 0$ and $c = 0$ in (1.1) gives $Lu = -\Delta u$, that is, the Laplacian up to the sign.

1.2 Elliptic boundary value problems

Let us start the mathematical study of an elliptic boundary value problem (BVP) of the form: find $u = u(x) : \Omega \to \mathbb{R}$ such that
\[ \begin{cases} Lu = f & \text{in } \Omega \\ Bu = g & \text{on } \partial\Omega, \end{cases} \tag{P} \]
where $L$ is the linear differential elliptic operator defined in (1.1), $f$ is a given function and $B$ is a linear operator associating the value of $u$ and/or of its normal derivative $\partial u/\partial n$ on the boundary $\partial\Omega$ with a given boundary datum $g$.
Example 1.2.1 (The Dirichlet problem for the Laplace operator). The simplest example of problem (P) corresponds to taking $Lu = -\Delta u$, $Bu = u$ (identity operator) and $g = 0$. The resulting elliptic problem is the so-called Dirichlet BVP associated with the Poisson equation with homogeneous boundary conditions:
\[ \begin{cases} -\Delta u = f & \text{in } \Omega \\ u = 0 & \text{on } \partial\Omega. \end{cases} \tag{1.2} \]
The BVP (1.2), in the 2D case ($d = 2$), represents in Mechanics the model of an elastic membrane clamped at the boundary and subject to the action of its own weight $f$. The solution $u(x, y)$ is the vertical displacement of any point $P = (x, y)^T$ of the membrane.
More generally, problem (1.2) is the mathematical model for a wide range of physical applications, including:
• Electrostatics: $u$ is the electrostatic potential, $f$ is the electric charge density (normalized to the dielectric permittivity);
• Thermal Physics: $u$ is the spatial distribution of temperature in a body subject to a thermal source $f$ (normalized to the thermal conductivity);
• Hydraulics: $u$ is the piezometric head in a porous medium subject to a piezometric load $f$ (normalized to the hydraulic conductivity).
Definition 1.2.2 (Classical solution). A classical (or strong) solution of the BVP (P) is any function $u \in C^2(\Omega)$ satisfying (P)$_1$ in the interior of $\Omega$ and the boundary conditions (P)$_2$ on $\partial\Omega$.
In realistic applications (like those mentioned in Ex. 1.2.1), Def. 1.2.2 is often not the most appropriate notion of solution, as shown in the following example.
Example 1.2.3 (Weak solution). Let us consider the homogeneous Dirichlet BVP (1.2) in the case $d = 1$ where $\Omega = (0, 1)$ and $f(x) = -P\,\delta(x - 1/2)$, $P$ being a given positive constant and $\delta(x - x_0)$ being the Dirac delta centered at $x = x_0$. The differential problem at hand represents the mathematical model of the transversal deformation of an elastic rod fixed at the endpoints and subject to a load applied at its center (think of a clothesline with a heavy garment hanging at its midpoint).
The deformed configuration, shown in Fig. 1.1, is certainly not a twice-differentiable function; rather, it is only piecewise linear and continuous, so that its derivative is piecewise constant over $(0, 1)$ and its second derivative is the Dirac delta at $x = 1/2$ multiplied by $+P$. In this sense, comparing the smoothness of $u$ in this case with that of a strong solution, we can speak, in a natural manner, of a “weak” solution.
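The piecewise linear shape described in Ex. 1.2.3 can be written down explicitly. The sketch below assumes the closed form $u(x) = -\tfrac{P}{2}x$ for $x \le 1/2$ and $u(x) = -\tfrac{P}{2}(1 - x)$ for $x > 1/2$, which is not stated in the text but follows from requiring $u(0) = u(1) = 0$, continuity at $x = 1/2$, and a jump of $u'$ equal to $+P$ there. The function names and the test function $v(x) = \sin(\pi x)$ are illustrative choices. The code checks the jump of $u'$ and the integral identity $\int_0^1 u' v'\,dx = -P\,v(1/2)$.

```python
import math


def point_load_solution(x, P=1.0):
    """Piecewise-linear solution of -u'' = -P*delta(x - 1/2), u(0) = u(1) = 0."""
    return -0.5 * P * x if x <= 0.5 else -0.5 * P * (1.0 - x)


def slope(x, P=1.0):
    """Derivative u'(x), defined away from the kink at x = 1/2."""
    return -0.5 * P if x < 0.5 else 0.5 * P


def weak_lhs(P, n=1000):
    """Midpoint-rule value of int_0^1 u' v' dx with v(x) = sin(pi x); n must be even."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        xm = (i + 0.5) * h  # midpoints never coincide with the kink when n is even
        total += slope(xm, P) * math.pi * math.cos(math.pi * xm) * h
    return total


P = 2.0
jump = slope(0.75, P) - slope(0.25, P)        # jump of u' across x = 1/2
print("u(1/2) =", point_load_solution(0.5, P))
print("jump of u' =", jump)
print("int u'v' dx =", round(weak_lhs(P), 6), " vs  -P v(1/2) =", -P * math.sin(math.pi / 2))
```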

1.3 Weak solution of a BVP: the 1D case

Figure 1.1: Deformed configuration of an elastic rod. [Sketch: rod fixed at x = 0 and x = 1, load P applied at x = 1/2, deflection u(x).]

To account for cases like that in Ex. 1.2.3, we need to extend the notion of a solution of a BVP given in Def. 1.2.2. With this aim, we consider problem (1.2) in the 1D case ($d = 1$): find $u : \Omega = (0, 1) \to \mathbb{R}$ such that
\[ \begin{cases} -u'' = f & \text{in } \Omega = (0, 1) \\ u(0) = u(1) = 0 \end{cases} \tag{1.3} \]

where $f$ is a given function in $L^2(0, 1)$. Then, we proceed by multiplying both sides of equation (1.3)$_1$ by an arbitrary function $v = v(x)$, not identically zero (called test function), and by integrating over $(0, 1)$, obtaining
\[ \int_0^1 -u''\,v\,dx = \int_0^1 f\,v\,dx \qquad \forall v. \]
We use the formula of integration by parts (A.1) applied to $w = -u'$ to transform the above integral identity into
\[ -\int_{\partial\Omega} v\,u'\,n\,d\sigma + \int_0^1 u'\,v'\,dx = \int_0^1 f\,v\,dx \qquad \forall v. \]
The boundary $\partial\Omega$ is the set made of the two end-points $x = 0$ and $x = 1$, while the outward unit normal vector $n$ is given by $n(0) = -1$ and $n(1) = +1$, so that the previous identity becomes
\[ -\left[ v(1)\,u'(1) - v(0)\,u'(0) \right] + \int_0^1 u'\,v'\,dx = \int_0^1 f\,v\,dx \qquad \forall v. \]
To get rid of the boundary term in the previous relation, we choose $v$ such that
\[ v(0) = v(1) = 0 \]
exactly as $u$ does in (1.3)$_2$.

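Theorem 1.3.1. Let $\varphi$, $\psi$ be two functions in $L^2(0, 1)$, and set $g := \varphi \cdot \psi$. Then, $g \in L^1(0, 1)$.

Proof. By definition of norm in the space $L^1(0, 1)$, we have
\[ \|g\|_{L^1(0,1)} = \int_0^1 |g|\,dx = \int_0^1 |\varphi\,\psi|\,dx. \]
Then, the Cauchy-Schwarz inequality (C.16) gives $\int_0^1 |\varphi\,\psi|\,dx \le \|\varphi\|_{L^2(0,1)}\,\|\psi\|_{L^2(0,1)} < +\infty$, which proves the result.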
Using Thm. 1.3.1 we immediately see that every integral in the formal procedure described above is well defined, and that the procedure can be concisely written as: find $u \in V$ such that
\[ \int_0^1 u'\,v'\,dx = \int_0^1 f\,v\,dx \qquad \forall v \in V \tag{1.4} \]
where
\[ V = H^1_0(\Omega) = \left\{ v \in L^2(0, 1),\ v' \in L^2(0, 1) \text{ such that } v(0) = v(1) = 0 \right\}. \tag{1.5} \]
By virtue of Poincaré’s inequality (C.26) the space $V$ is a Hilbert space endowed with the norm
\[ \|v\|_V := \|v'\|_{L^2(0,1)} \qquad v \in V. \tag{1.6} \]
Comparing (1.4)-(1.5) with the original BVP (1.3), we can make the following considerations:

1. relation (1.4) holds in an integral sense and not in a pointwise sense, unlike (1.3)$_1$;

2. however, to determine the solution of (1.4)-(1.5) we need to test against infinitely many functions $v \in V$;

3. the solution $u$ of (1.4)-(1.5) has lower differentiability requirements than a classical solution. For this reason, we qualify problem (1.4)-(1.5) as the weak formulation of the BVP.
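The integral identity (1.4) can be checked numerically on a case where the solution is known. The sketch below assumes $f = 1$, for which $u(x) = x(1 - x)/2$ solves (1.3), and tests (1.4) against the sample functions $v_k(x) = \sin(k\pi x)$, which vanish at both endpoints. The quadrature rule and all names are illustrative choices, not part of the notes; a finite set of test functions can of course only illustrate, not prove, the identity.

```python
import math


def midpoint(g, n=2000):
    """Composite midpoint rule for the integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


def du(x):
    """u'(x) for u(x) = x (1 - x) / 2, the solution of -u'' = 1, u(0) = u(1) = 0."""
    return 0.5 - x


def f(x):
    return 1.0


def weak_residual(k):
    """int u' v' dx - int f v dx for the test function v(x) = sin(k pi x)."""
    v = lambda x: math.sin(k * math.pi * x)
    dv = lambda x: k * math.pi * math.cos(k * math.pi * x)
    return midpoint(lambda x: du(x) * dv(x)) - midpoint(lambda x: f(x) * v(x))


for k in (1, 2, 3):
    print(f"k = {k}: residual of (1.4) = {weak_residual(k):.2e}")
```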

Remark 1.3.2. To appreciate the extraordinary enlargement of the solution space from $C^2_0$ (classical solution) to $H^1_0$ (weak solution), it is useful to go back to Ex. C.6.5 and Fig. C.7.

Theorem 1.3.3 (Variational formulation of (1.3)). The minimization problem: find $u \in V$ such that
\[ J(u) \le J(v) \qquad \forall v \in V \tag{1.7} \]
where
\[ J(v) := \int_0^1 \frac{1}{2}\,(v')^2\,dx - \int_0^1 f\,v\,dx \qquad \forall v \in V, \tag{1.8} \]
is equivalent to the weak problem (1.4)-(1.5).

Figure 1.2: Minimization of the total potential energy. The black bullet is $J(u)$, while the black square is $J(v) \ge J(u)$ for all $v \in V$.

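Proof. Let us prove that (1.4)-(1.5) implies (1.7). For each $\varepsilon \in \mathbb{R}$ and for any arbitrary function $v \in V$ we have
\[ \begin{aligned}
J(u + \varepsilon v) &= \int_0^1 \frac{1}{2}\,\big((u + \varepsilon v)'\big)^2\,dx - \int_0^1 f\,(u + \varepsilon v)\,dx \\
&= \int_0^1 \frac{1}{2}\,(u')^2\,dx + \varepsilon \int_0^1 u'\,v'\,dx + \varepsilon^2 \int_0^1 \frac{1}{2}\,(v')^2\,dx - \int_0^1 f\,u\,dx - \varepsilon \int_0^1 f\,v\,dx \\
&= J(u) + \varepsilon \left[ \int_0^1 u'\,v'\,dx - \int_0^1 f\,v\,dx \right] + \varepsilon^2 \int_0^1 \frac{1}{2}\,(v')^2\,dx.
\end{aligned} \]
Then, if $u$ is a solution of (1.4)-(1.5), the term in square brackets vanishes and we obtain that
\[ J(u + \varepsilon v) = J(u) + \varepsilon^2 \int_0^1 \frac{1}{2}\,(v')^2\,dx \ge J(u) \qquad \forall \varepsilon \in \mathbb{R},\ \forall v \in V, \]
that is, the weak solution $u$ is also a minimizer of (1.8). Let us now prove that solving (1.7) in the space (1.5) implies (1.4). For this purpose, we introduce the directional derivative of the functional $J$ at $u \in V$ in the direction $v$ (also called Fréchet derivative of the functional $J$ at $u \in V$), defined as
\[ \left. \frac{\partial J(u + \varepsilon v)}{\partial \varepsilon} \right|_{\varepsilon = 0} := \lim_{\varepsilon \to 0} \frac{J(u + \varepsilon v) - J(u)}{\varepsilon}. \tag{1.9} \]
Then, to express the stationarity of $J$ at $u$ we need to enforce the so-called Euler condition
\[ \left. \frac{\partial J(u + \varepsilon v)}{\partial \varepsilon} \right|_{\varepsilon = 0} = 0. \tag{1.10} \]
Computing the numerator of (1.9) yields
\[ J(u + \varepsilon v) - J(u) = \varepsilon \left[ \int_0^1 u'\,v'\,dx - \int_0^1 f\,v\,dx \right] + \varepsilon^2 \int_0^1 \frac{1}{2}\,(v')^2\,dx \]
and substituting the above result into (1.10) we get
\[ \begin{aligned}
0 &= \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \left\{ \varepsilon \left[ \int_0^1 u'\,v'\,dx - \int_0^1 f\,v\,dx \right] + \varepsilon^2 \int_0^1 \frac{1}{2}\,(v')^2\,dx \right\} \\
&= \lim_{\varepsilon \to 0} \left\{ \int_0^1 u'\,v'\,dx - \int_0^1 f\,v\,dx + \varepsilon \int_0^1 \frac{1}{2}\,(v')^2\,dx \right\} \\
&= \int_0^1 u'\,v'\,dx - \int_0^1 f\,v\,dx,
\end{aligned} \]
which yields
\[ \int_0^1 u'\,v'\,dx = \int_0^1 f\,v\,dx \qquad \forall v \in V, \]
that is, a minimizer $u$ of (1.8) is also a solution of the weak problem (1.4)-(1.5). This concludes the proof.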
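The key identity of the proof, $J(u + \varepsilon v) - J(u) = \varepsilon^2 \int_0^1 \tfrac{1}{2}(v')^2\,dx$ when $u$ is the weak solution, can be observed numerically. The sketch below assumes $f = 1$, so that $u(x) = x(1 - x)/2$, and the perturbation direction $v(x) = \sin(\pi x)$, for which $\int_0^1 \tfrac{1}{2}(v')^2\,dx = \pi^2/4$; these choices and the function names are illustrative only.

```python
import math


def midpoint(g, n=2000):
    """Composite midpoint rule for the integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


def energy(w, dw, f=lambda x: 1.0):
    """J(w) = int_0^1 (1/2) (w')^2 dx - int_0^1 f w dx, as in (1.8)."""
    return midpoint(lambda x: 0.5 * dw(x) ** 2 - f(x) * w(x))


# Weak solution for f = 1 and a hypothetical perturbation direction v in V.
u, du = (lambda x: 0.5 * x * (1.0 - x)), (lambda x: 0.5 - x)
v, dv = (lambda x: math.sin(math.pi * x)), (lambda x: math.pi * math.cos(math.pi * x))


def energy_gap(eps):
    """J(u + eps v) - J(u); the proof predicts eps^2 * int_0^1 (1/2) (v')^2 dx."""
    perturbed = energy(lambda x: u(x) + eps * v(x), lambda x: du(x) + eps * dv(x))
    return perturbed - energy(u, du)


for eps in (-0.5, 0.1, 0.5):
    predicted = eps ** 2 * math.pi ** 2 / 4.0
    print(f"eps = {eps:+.1f}: gap = {energy_gap(eps):.6f}, predicted = {predicted:.6f}")
```

The gap is nonnegative and symmetric in $\varepsilon$, which is the numerical counterpart of the statement that the term linear in $\varepsilon$ vanishes at the weak solution.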
Remark 1.3.4. In the language of Continuum Mechanics, problem (1.7)-(1.8) is referred to as the principle of minimization of the total potential energy. In particular, the term
\[ \int_0^1 \frac{1}{2}\,(v')^2\,dx \]
represents the total elastic strain energy stored in the elastic body in correspondence to a generic deformed configuration $v = v(x)$, while
\[ -\int_0^1 f\,v\,dx \]
represents the work done by the deformed system against the external body forces.
The weak formulation (1.4)-(1.5), instead, is the counterpart of the principle of virtual work. In particular, the term
\[ \int_0^1 u'\,v'\,dx \]
represents the scalar product between the stress $\sigma(u) = u'$ associated with the actual body deformation and the strain $\varepsilon(v) = v'$ associated with the virtual displacement $v = v(x)$, while
\[ \int_0^1 f\,v\,dx \]
represents the work done by the external forces in correspondence to a virtual displacement $v$.
So far, we have shown that u is a weak solution of the BVP iff it is also a min-
imizer of the total potential energy (1.8). By construction, if u solves (1.3), then
it is also a solution of the weak problem (1.4). To close this chain of implications,
we need to show that a weak solution is also a strong solution of the BVP.

Theorem 1.3.5 (Regularity of a weak solution). Let $u$ be a solution of (1.4)-(1.5). Assume also that
\[ u'' \in L^2(0, 1). \tag{1.11} \]
Then, $u$ is also a solution of the BVP (1.3) a.e. in $(0, 1)$.

Proof. Let us apply Green’s formula (A.1) to (1.4), to obtain
\[ \left[ v(1)\,w(1) - v(0)\,w(0) \right] - \int_0^1 v\,w'\,dx = \int_0^1 f\,v\,dx \qquad \forall v \in V, \]
where $w = u'$. Since $v(0) = v(1) = 0$, we get
\[ \int_0^1 -(u'' + f)\,v\,dx = 0 \qquad \forall v \in V. \tag{1.12} \]
Two remarks are in order about (1.12). The first remark is that the function $g := u'' \cdot v$ belongs to $L^1(0, 1)$ due to Thm. 1.3.1, so that each term in the integral is well defined. The second remark is that, using the definition (C.18) of scalar product in the Hilbert space $L^2(0, 1)$, the integral equation (1.12) tells us that the element $-(u'' + f) \in L^2(0, 1)$ is orthogonal to the whole $H^1_0(0, 1)$. Since $H^1_0(0, 1)$ is dense in $L^2(0, 1)$, it follows that the element $-(u'' + f) \in L^2(0, 1)$ is orthogonal to the whole $L^2(0, 1)$ space. Therefore, $-(u'' + f)$ coincides with the null function a.e. in $(0, 1)$, which concludes the proof.

Remark 1.3.6. The conclusion of Thm. 1.3.5 is that the additional regularity assumption (1.11) is required on the solution of the weak problem (1.4)-(1.5) in order to prove that it is actually a solution of the original BVP (1.3). Such extra regularity could, however, have been inferred from (1.3)$_1$, by identifying $u''$ with the element $-f \in L^2(0, 1)$ in the Lebesgue sense (i.e., almost everywhere in $(0, 1)$). Therefore, we see a posteriori that the solution of the weak problem (1.4)-(1.5) belongs to $H^2(0, 1) \cap H^1_0(0, 1)$.

1.4 Weak solution of a BVP: the 2D case

In this section we extend the construction and characterization of the weak formulation of the 1D BVP (1.3) to the corresponding BVP (1.2) where the computational domain $\Omega$ is an open subset of $\mathbb{R}^2$ with Lipschitz boundary $\Gamma := \partial\Omega$ on which a unit outward normal vector $n$ is defined almost everywhere.

Figure 1.3: Computational domain in 2D.

Proceeding formally as in the 1D case, we multiply (1.2)$_1$ by an arbitrary test function $v$ and integrate both sides over $\Omega$. Then, we apply the formula of integration by parts (A.1) to the vector field $w = \nabla u$ to obtain
\[ -\int_{\partial\Omega} v\,\nabla u \cdot n\,d\Sigma + \int_\Omega \nabla u \cdot \nabla v\,d\Omega = \int_\Omega f\,v\,d\Omega \qquad \forall v. \]
The boundary term identically vanishes if we choose $v$ in the space $V = H^1_0(\Omega)$ defined in (C.34). By doing so, the weak formulation of the BVP (1.2) is: find $u \in V$ such that
\[ \int_\Omega \nabla u \cdot \nabla v\,d\Omega = \int_\Omega f\,v\,d\Omega \qquad \forall v \in V \tag{1.13} \]
where
\[ V = H^1_0(\Omega) = \left\{ v \in L^2(\Omega),\ \nabla v \in (L^2(\Omega))^2 \text{ such that } v = 0 \text{ on } \partial\Omega \right\}. \tag{1.14} \]
As in the 1D case, by virtue of Poincaré’s inequality (C.35) the space $V$ is a Hilbert space endowed with the norm
\[ \|v\|_V := \|\nabla v\|_{L^2(\Omega)} \qquad v \in V. \tag{1.15} \]
All the considerations and properties discussed in Sect. 1.3 can be immediately extended to the 2D case. In particular, we have the following results.
Theorem 1.4.1 (Variational formulation of (1.2) in 2D). The minimization problem: find $u \in V$ such that
\[ J(u) \le J(v) \qquad \forall v \in V \tag{1.16} \]
where
\[ J(v) := \int_\Omega \frac{1}{2}\,|\nabla v|^2\,d\Omega - \int_\Omega f\,v\,d\Omega \qquad \forall v \in V, \tag{1.17} \]
is equivalent to the weak problem (1.13)-(1.14).
Theorem 1.4.2 (Regularity of a weak solution). Let $u$ be a solution of (1.13)-(1.14). Assume also that $\Omega$ is a convex open bounded set of $\mathbb{R}^2$, and that
\[ D^\alpha u \in L^2(\Omega) \quad \text{with } |\alpha| = 2. \tag{1.18} \]
Then, $u$ is also a solution of the BVP (1.2) in 2D a.e. in $\Omega$, and we have $u \in H^2(\Omega) \cap H^1_0(\Omega)$ exactly as in the one-dimensional weak problem (1.4)-(1.5).
Example 1.4.3 (BVP in an L-shaped domain). This example serves to clarify the importance of the convexity of $\Omega$ in order to ensure the global $H^2$-regularity of the solution. Consider the homogeneous Dirichlet problem (1.2) with $d = 2$, $f = 10$ and $\Omega$ given by the L-shaped domain of Fig. 1.4(a). It can be proved that the unique weak solution $u$, shown in Fig. 1.4(b), belongs only to the space $H^{5/3}(\Omega)$ because of the reentrant corner at $(0, 0)$.
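As a hedged sketch of where the exponent $5/3$ comes from (the argument below is standard corner-singularity reasoning and is not spelled out in these notes; the symbols $r$, $\theta$, $\omega$ and $C$ are introduced here only for illustration): in polar coordinates $(r, \theta)$ centered at a corner of interior angle $\omega$, the leading singular part of the solution of the homogeneous Dirichlet problem for the Laplacian typically behaves like the first line below, and the regularity threshold is set by the exponent of $r$.

```latex
\[
  u(r, \theta) \simeq C\, r^{\pi/\omega} \sin\!\left( \frac{\pi \theta}{\omega} \right),
  \qquad 0 < \theta < \omega ,
\]
\[
  \text{regularity threshold: } s = 1 + \frac{\pi}{\omega},
  \qquad
  \omega = \frac{3\pi}{2}
  \;\Longrightarrow\;
  \frac{\pi}{\omega} = \frac{2}{3},
  \quad
  s = 1 + \frac{2}{3} = \frac{5}{3} < 2 .
\]
```

For a convex corner, $\omega < \pi$ gives $\pi/\omega > 1$, so the threshold exceeds 2 and this term does not obstruct $H^2$-regularity, in agreement with the convexity assumption of Thm. 1.4.2.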

Figure 1.4: Dirichlet problem in an L-shaped domain: (a) Ω and finite element mesh; (b) solution u. Geometrical discretization and numerical solution have been performed using the Matlab function pdetool.

1.4.1 Non-homogeneous Dirichlet problem in 2D

We consider here the following non-homogeneous Dirichlet problem for the Laplace operator:
\[ \begin{cases} -\Delta u = f & \text{in } \Omega \\ u = g & \text{on } \Gamma \equiv \partial\Omega, \end{cases} \tag{1.19} \]
where $\Omega$ is a 2D bounded Lipschitz domain and where $g$ is a given function in the trace space $H^{1/2}(\Gamma)$ (see Def. C.7.4). Clearly, the BVP studied in Sect. 1.4 is a special case of (1.19) when $g$ is identically equal to zero. To repeat the steps followed previously to obtain the weak formulation of the problem, we only need to transform (1.19) into an equivalent problem with homogeneous boundary conditions. This is necessary because the set
\[ V_g := \left\{ v \in H^1(\Omega) \text{ such that } \gamma_0 v = g \right\} \]
is not a linear space: if $v_1, v_2 \in V_g$, then $\gamma_0(v_1 + v_2) = 2g$, so that $v_1 + v_2 \notin V_g$ unless $g = 0$. For this purpose, let us pick a function $R_g \in H^1(\Omega)$ such that $\gamma_0 R_g = g$. We call $R_g$ a lifting of $g$ to all $\Omega$. Then, we assume that the solution of (1.19) can be written as
\[ u = u_0 + R_g \tag{1.20} \]
with $\gamma_0 u_0 = 0$. Having done this, we multiply (1.19)$_1$ by a test function $v \in V = H^1_0(\Omega)$, integrate over all $\Omega$ and apply Green’s formula, to get
\[ \int_\Omega \nabla u \cdot \nabla v\,d\Omega = \int_\Omega f\,v\,d\Omega \qquad \forall v \in V. \]
Using (1.20), we obtain the weak formulation of the BVP (1.19): find $u_0 \in V$ such that
\[ \int_\Omega \nabla u_0 \cdot \nabla v\,d\Omega = \int_\Omega f\,v\,d\Omega - \int_\Omega \nabla R_g \cdot \nabla v\,d\Omega \qquad \forall v \in V. \tag{1.21} \]
Remark 1.4.4 (Physical interpretation). The BVP (1.19) represents in Thermal Physics the model of a plate, whose temperature $u$ is fixed at the boundary and equal to $g$, and subject to a thermal source $f$ (normalized to the thermal conductivity of the plate).
Remark 1.4.5. The weak problem (1.21) is of the same form as (1.13), except for a modification of the right-hand side. The extra term
\[ -\int_\Omega \nabla R_g \cdot \nabla v\,d\Omega \]
is well defined because of Thm. 1.3.1.
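The lifting construction can be illustrated on a 1D analogue, which is an assumption of this sketch and not a case treated in the text: $-u'' = 1$ on $(0, 1)$ with $u(0) = a$, $u(1) = b$. The lifting $R_g(x) = a + (b - a)x^2$ is one hypothetical choice among infinitely many; the code checks that $u_0 = u - R_g$ vanishes at the endpoints and satisfies the 1D counterpart of (1.21) against the test functions $\sin(k\pi x)$.

```python
import math

A_LEFT, B_RIGHT = 1.0, 3.0   # hypothetical boundary data u(0) = a, u(1) = b
DIFF = B_RIGHT - A_LEFT


def midpoint(g, n=4000):
    """Composite midpoint rule for the integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


def u_exact(x):
    """Solution of -u'' = 1 with u(0) = a, u(1) = b."""
    return 0.5 * x * (1.0 - x) + A_LEFT + DIFF * x


def lifting(x):
    """One possible lifting R_g: any H^1 function matching the boundary data works."""
    return A_LEFT + DIFF * x * x


def u0(x):
    """Homogeneous part u_0 = u - R_g of the decomposition (1.20)."""
    return u_exact(x) - lifting(x)


def d_u0(x):
    return (0.5 - x + DIFF) - 2.0 * DIFF * x


def lifted_residual(k):
    """int u0' v' dx - int f v dx + int R_g' v' dx for v(x) = sin(k pi x), f = 1."""
    v = lambda x: math.sin(k * math.pi * x)
    dv = lambda x: k * math.pi * math.cos(k * math.pi * x)
    lhs = midpoint(lambda x: d_u0(x) * dv(x))
    rhs = midpoint(v) - midpoint(lambda x: 2.0 * DIFF * x * dv(x))
    return lhs - rhs


print("u0(0), u0(1) =", u0(0.0), u0(1.0))
for k in (1, 2, 3):
    print(f"k = {k}: residual of the lifted weak form = {lifted_residual(k):.2e}")
```

Choosing a different lifting changes $u_0$ and the right-hand side, but not the sum $u = u_0 + R_g$.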

1.4.2 Non-homogeneous Neumann problem in 2D

Let us now consider the Neumann problem for the Laplace operator:
\[ \begin{cases} -\Delta u = f & \text{in } \Omega \\ \nabla u \cdot n = h & \text{on } \Gamma \equiv \partial\Omega, \end{cases} \tag{1.22} \]
where $h$ is a given function in the trace space $H^{-1/2}(\Gamma)$ (see Def. C.7.10).
Remark 1.4.6 (Existence of a solution). Multiplying (1.22)$_1$ by $v = 1$, integrating over $\Omega$ and applying Green’s formula (A.1) with $w = -\nabla u$, yields
\[ \int_\Omega f\,d\Omega + \int_\Gamma h\,d\Sigma = 0. \tag{1.23} \]
This condition must be satisfied by the data in order for problem (1.22) to admit a solution, and for this reason it is often called the compatibility condition on the data of the Neumann problem.
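The compatibility condition (1.23) can be verified on manufactured data. The sketch below assumes $\Omega = (0, 1)^2$ and the hypothetical solution $u = x^2 + y^2$, so that $f = -\Delta u = -4$ and $h = \nabla u \cdot n$ on each edge; the domain, the solution and the function names are illustrative choices, not taken from the notes.

```python
def f(x, y):
    """Right-hand side f = -Laplacian(u) for the manufactured solution u = x^2 + y^2."""
    return -4.0


def grad_u(x, y):
    """Gradient of u = x^2 + y^2."""
    return (2.0 * x, 2.0 * y)


# Edges of the unit square: (parametrization t -> point, outward unit normal).
EDGES = [
    (lambda t: (1.0, t), (1.0, 0.0)),
    (lambda t: (0.0, t), (-1.0, 0.0)),
    (lambda t: (t, 1.0), (0.0, 1.0)),
    (lambda t: (t, 0.0), (0.0, -1.0)),
]


def domain_integral(n=100):
    """Midpoint rule for the integral of f over the unit square."""
    dt = 1.0 / n
    return sum(f((i + 0.5) * dt, (j + 0.5) * dt) for i in range(n) for j in range(n)) * dt * dt


def boundary_integral(n=100):
    """Midpoint rule for the integral of h = grad(u) . n over the four edges."""
    dt = 1.0 / n
    total = 0.0
    for point, (nx, ny) in EDGES:
        for i in range(n):
            gx, gy = grad_u(*point((i + 0.5) * dt))
            total += (gx * nx + gy * ny) * dt
    return total


residual = domain_integral() + boundary_integral()
print("int f + int h =", residual)
```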
Remark 1.4.7 (Uniqueness of the solution). If $u$ is a solution of the BVP (1.22), then also $u + K$ is a solution, $K$ being an arbitrary constant. Thus, to ensure unique solvability of the Neumann problem, we need to fix this additive constant in some way. This can be done, for instance, by introducing the further request that $u$ has null mean integral value
\[ \int_\Omega u\,d\Omega = 0. \]

To derive the weak formulation of the BVP (1.22) we proceed as usual and obtain: find $u \in V$ such that
\[ \int_\Omega \nabla u \cdot \nabla v\,d\Omega = \int_\Omega f\,v\,d\Omega + \int_\Gamma h\,v\,d\Sigma \qquad \forall v \in V \tag{1.24} \]
where
\[ V = H^1(\Omega) \setminus \mathbb{R} = \left\{ v \in H^1(\Omega) \text{ such that } v \text{ is not a constant} \right\}. \tag{1.25} \]
The space $V$ is a Hilbert space endowed with the norm
\[ \|v\|_V := \|\nabla v\|_{L^2(\Omega)} \qquad v \in V. \tag{1.26} \]
Remark 1.4.8 (Physical interpretation). The BVP (1.22) represents in Electrostatics the Gauss law for a dielectric medium whose electric charge density (normalized to the dielectric permittivity of the medium) is $f$, and whose outward flux of the electric field across the boundary is equal to $-h$.

1.4.3 Mixed problem in 2D

We conclude the presentation of BVPs associated with the Laplace operator in a 2D domain by considering the mixed problem:
\[ \begin{cases} -\Delta u = f & \text{in } \Omega \\ u = g & \text{on } \Gamma_D \\ \nabla u \cdot n = h & \text{on } \Gamma_N, \end{cases} \tag{1.27} \]
where $\Gamma_D$ and $\Gamma_N$ are mutually disjoint portions of the boundary $\Gamma$ such that $\Gamma = \Gamma_D \cup \Gamma_N$, while $g$ and $h$ are given data in the trace spaces $H^{1/2}(\Gamma_D)$ and $H^{-1/2}(\Gamma_N)$, respectively. The definitions of these latter spaces are the obvious generalization of Defs. C.7.4 and C.7.10, with $\Gamma$ replaced by $\Gamma_D$ and $\Gamma_N$, respectively. Clearly, BVPs (1.19) and (1.22) can be recovered as special cases of (1.27) by setting $\Gamma_N = \emptyset$ and $\Gamma_D = \emptyset$, respectively. In what follows, we assume that $\mathrm{meas}(\Gamma_D) > 0$. This avoids the need to enforce an extra condition on $u$ to ensure uniqueness of the solution (see Rem. 1.4.7).
To construct the weak formulation of (1.27) we replicate the steps followed in the previous cases. In particular, we introduce a lifting $R_g \in H^1(\Omega)$ of $g$, such that $R_g|_{\Gamma_D} = g$, and decompose the solution $u$ as
\[ u = u_0 + R_g \]
where $u_0$ is such that $u_0|_{\Gamma_D} = 0$, and then we obtain: find $u_0 \in V$ such that
\[ \int_\Omega \nabla u_0 \cdot \nabla v\,d\Omega = \int_\Omega f\,v\,d\Omega + \int_{\Gamma_N} h\,v\,d\Sigma - \int_\Omega \nabla R_g \cdot \nabla v\,d\Omega \qquad \forall v \in V \tag{1.28} \]
where
\[ V = H^1_{0,\Gamma_D}(\Omega) = \left\{ v \in H^1(\Omega) \text{ such that } v|_{\Gamma_D} = 0 \right\}. \tag{1.29} \]
The space $V$ is a Hilbert space endowed with the norm (C.36) by virtue of Rem. C.7.8.
Remark 1.4.9 (Physical interpretation). The BVP (1.27) represents in Thermal
Physics the Fourier law for thermal diffusion in a medium under the action of
a distributed thermal source f (normalized to the thermal conductivity of the
medium), of a fixed temperature g on the boundary portion ΓD and of a given
thermal flux −h on the boundary portion ΓN , respectively.

1.5 Well-posedness analysis: the Lax-Milgram Lemma

All of the weak formulations considered in the previous sections can be written in
the following abstract form:
find u ∈ V such that
B(u, v) = F(v) ∀v ∈ V, (1.30)
where:

• V is a given Hilbert space endowed with norm k · kV ;

• B(·, ·) : V × V → R is a given bilinear form, i.e., a real-valued functional
on V × V that is linear in each of its two arguments:
\[ B(\lambda u + \mu w, v) = \lambda B(u, v) + \mu B(w, v), \]
\[ B(u, \lambda v + \mu w) = \lambda B(u, v) + \mu B(u, w), \]
for every u, v, w ∈ V and for all real numbers λ and µ;

• F(·) : V → R is a given linear form, i.e., a linear functional over the space
V such that
F(λ v + µw) = λ F(v) + µF(w).

The following result provides a sufficient condition for problem (1.30) to admit
a unique solution.
Theorem 1.5.1 (Lax-Milgram (LM) Lemma). Assume that there exist three positive
constants M, β and Λ such that:
\begin{align}
|B(u, v)| &\le M \|u\|_V \|v\|_V && \forall u, v \in V \tag{1.31a} \\
B(u, u) &\ge \beta \|u\|_V^2 && \forall u \in V \tag{1.31b} \\
|F(v)| &\le \Lambda \|v\|_V && \forall v \in V. \tag{1.31c}
\end{align}
Then, problem (1.30) admits a unique solution u ∈ V and the following stability
estimate holds
\[ \|u\|_V \le \frac{\Lambda}{\beta}. \tag{1.32} \]
Proof. The proof of existence and uniqueness requires the use of advanced tools
of Functional Analysis and for this reason is omitted here. To prove (1.32), we
simply need to take v = u in (1.30). This yields, using (1.31c),
\[ B(u, u) = F(u) \le |F(u)| \le \Lambda \|u\|_V, \]
from which, using (1.31b),
\[ \beta \|u\|_V^2 \le B(u, u) \le \Lambda \|u\|_V, \]
which gives (1.32) after dividing by $\|u\|_V$ (the case u = 0 being trivial).

Remark 1.5.2 (Meaning of the conditions in the LM Lemma). Conditions (1.31a)
and (1.31c) express the continuity of the bilinear form B and of the linear form
F, respectively. Continuity has the same meaning as the concept of continuous
dependence on the data that was introduced in Def. D.1.4. Therefore, it is clearly
desirable for M and Λ to be as small as possible, so that perturbations
of the problem data are not amplified. Condition (1.31b) expresses the
coercivity of the bilinear form B. Roughly speaking, it tells us that the energy of
the system (represented by the quantity $\|u\|_V^2$) is uniformly bounded from below.
Therefore, it is clearly desirable for β to be as large as possible, so as to
ensure a good control of the energy of the solution.
Tab. 1.1 gathers in a synthetic format all the considered BVPs for the Laplace
operator in a two-dimensional domain Ω with Lipschitz boundary Γ. In all cases,
the norm for V is given by (C.36).

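BVP                  | V                                  | B                                                  | F
Dirichlet (hom.)     | $H^1_0(\Omega)$                    | $\int_\Omega \nabla u \cdot \nabla v \, d\Omega$   | $\int_\Omega f v \, d\Omega$
Dirichlet (non-hom.) | $H^1_0(\Omega)$                    | $\int_\Omega \nabla u_0 \cdot \nabla v \, d\Omega$ | $\int_\Omega f v \, d\Omega - a(R_g, v)$
Neumann (non-hom.)   | $H^1(\Omega) \setminus \mathbb{R}$ | $\int_\Omega \nabla u \cdot \nabla v \, d\Omega$   | $\int_\Omega f v \, d\Omega + \int_\Gamma h v \, d\Sigma$
Mixed                | $H^1_{0,\Gamma_D}(\Omega)$         | $\int_\Omega \nabla u_0 \cdot \nabla v \, d\Omega$ | $\int_\Omega f v \, d\Omega + \int_{\Gamma_N} h v \, d\Sigma - a(R_g, v)$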

Table 1.1: Examples of BVPs for the Laplace operator.

Example 1.5.3 (Verification of the LM Lemma). We use here the LM Lemma to
verify that the weak problem (1.13) is well-posed.
• Continuity of B: we have
\[ |B(u, v)| = \left| \int_\Omega \nabla u \cdot \nabla v \, d\Omega \right| \le \int_\Omega |\nabla u| \, |\nabla v| \, d\Omega \le \|u\|_V \|v\|_V \]
because of the CS inequality (C.16) and (C.36). Therefore, we have M = 1.
• Coercivity of B: we immediately have
\[ B(u, u) = \int_\Omega |\nabla u|^2 \, d\Omega = \|u\|_V^2 \]
so that β = 1.
• Continuity of F: we have
\[ |F(v)| = \left| \int_\Omega f v \, d\Omega \right| \le \int_\Omega |f| \, |v| \, d\Omega \le \|f\|_{L^2(\Omega)} \|v\|_{L^2(\Omega)} \le C_P \|f\|_{L^2(\Omega)} \|v\|_V \]
having used again (C.16) and the Poincaré inequality (C.35). Therefore, we
have $\Lambda = C_P \|f\|_{L^2(\Omega)}$, and the stability estimate for the solution of the
Dirichlet problem for the Laplace operator is
\[ \|u\|_V \le C_P \|f\|_{L^2(\Omega)}. \tag{1.33} \]
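The estimate (1.33) can be checked by hand on a simple case. The following worked computation is only a sketch under stated assumptions: it uses the one-dimensional analogue $-u'' = 1$ in $(0,1)$ with $u(0) = u(1) = 0$ (an illustrative choice, not the 2D problem (1.13)), the norm $\|v\|_V = \|v'\|_{L^2(0,1)}$, and the sharp Poincaré constant $C_P = 1/\pi$ of the unit interval.

```latex
\begin{aligned}
u(x) &= \tfrac12\, x(1-x), \qquad u'(x) = \tfrac12 - x, \\
\|u\|_V^2 &= \int_0^1 \left(\tfrac12 - x\right)^2 dx = \tfrac{1}{12}
\quad\Longrightarrow\quad \|u\|_V = \tfrac{1}{\sqrt{12}} \approx 0.289, \\
C_P \, \|f\|_{L^2(0,1)} &= \tfrac{1}{\pi} \cdot 1 \approx 0.318 \;\ge\; \|u\|_V .
\end{aligned}
```

In this case the bound is respected and overestimates the norm of the solution by about 10%.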

Remark 1.5.4 (Mechanical interpretation). The estimate (1.33) tells us that the
membrane deformation energy is proportional to the energy of the vertical load
(its weight, for instance) in agreement with Hooke’s law of Elasticity.
Chapter 2

Galerkin Finite Element Approximation of Elliptic Boundary Value Problems

Abstract
In this chapter, we construct the numerical approximation of an elliptic boundary
value problem in weak form by means of the Galerkin Finite Element Method.
In particular, we study the consistency, stability and convergence properties of
the associated discrete formulation, and we illustrate the numerical performance
of the method in the study of simple two-point boundary value problems with
smooth and non-smooth solutions.

2.1 The Galerkin method

The numerical approximation of the weak formulation (1.30) of a BVP like those
studied in Chapter 1 consists of the following steps:

1. construct a family of subspaces $V_h \subset V$ that depend on a discretization
parameter h > 0 and such that
\[ \dim V_h = N_h < +\infty; \]
2. seek the solution of the weak problem (1.30) within $V_h$, that is:
find $u_h \in V_h$ such that
\[ B(u_h, v_h) = F(v_h) \qquad \forall v_h \in V_h. \tag{2.1} \]

Definition 2.1.1 (Nomenclature). The equation (2.1) is called the Galerkin Prob-
lem associated with the weak problem (1.30), and uh is the Galerkin approxima-
tion of the solution u ∈ V of (1.30).

Since $V_h$ is a finite-dimensional space, we have
\[ V_h = \mathrm{span}\,\{\varphi_i\}_{i=1}^{N_h}, \]
where the functions $\varphi_i = \varphi_i(x)$ constitute a basis for $V_h$ (cf. Sect. B.1). Of course,
we can write
\[ u_h(x) = \sum_{j=1}^{N_h} u_j \varphi_j(x), \tag{2.2} \]
where the $N_h$ real numbers $u_j$, $j = 1, \dots, N_h$, are the degrees of freedom (dofs) of
$u_h$ with respect to the considered basis of $V_h$. Substituting (2.2) into (2.1) and using
the linearity of B(·, ·) with respect to its first argument yields the problem:
find $u_j$, $j = 1, \dots, N_h$, such that
\[ \sum_{j=1}^{N_h} u_j B(\varphi_j, v_h) = F(v_h) \qquad \forall v_h \in V_h. \tag{2.3} \]
Since both sides of (2.3) are linear in $v_h$ and every $v_h \in V_h$ is a linear combination of
the basis functions, the requirement "$\forall v_h \in V_h$" is equivalent to "$\forall \varphi_i$, $i = 1, \dots, N_h$", so that
the Galerkin problem (2.3) can be written in explicit form as:
find $u_j$, $j = 1, \dots, N_h$, such that
\[ \begin{cases}
B(\varphi_1, \varphi_1) u_1 + B(\varphi_2, \varphi_1) u_2 + \cdots + B(\varphi_{N_h}, \varphi_1) u_{N_h} = F(\varphi_1) \\
B(\varphi_1, \varphi_2) u_1 + B(\varphi_2, \varphi_2) u_2 + \cdots + B(\varphi_{N_h}, \varphi_2) u_{N_h} = F(\varphi_2) \\
\qquad \vdots \\
B(\varphi_1, \varphi_{N_h}) u_1 + B(\varphi_2, \varphi_{N_h}) u_2 + \cdots + B(\varphi_{N_h}, \varphi_{N_h}) u_{N_h} = F(\varphi_{N_h}).
\end{cases} \]
Thus, the Galerkin problem amounts to solving a linear algebraic system that can
be written in matrix form as
\[ B \mathbf{u} = \mathbf{f}, \tag{2.4} \]
where:

• B is a square matrix of dimension $N_h$, called the stiffness matrix, whose entries
are defined as
\[ B_{ij} = B(\varphi_j, \varphi_i), \qquad i, j = 1, \dots, N_h; \tag{2.5} \]
• $\mathbf{u} = [u_1, \dots, u_{N_h}]^T$ is the column vector of dimension $N_h$ containing the
unknown dofs of $u_h$;
• $\mathbf{f}$ is a column vector of dimension $N_h$, called the load vector, whose
components are defined as
\[ f_i = F(\varphi_i), \qquad i = 1, \dots, N_h. \tag{2.6} \]
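The assembly of the stiffness matrix and of the load vector can be sketched in a few lines of code. The example below is a minimal illustration under stated assumptions, not the EF1D implementation: it takes the 1D problem $-u'' = 1$ in $(0,1)$ with homogeneous Dirichlet conditions, so that $B(u,v) = \int_0^1 u'v'\,dx$ and $F(v) = \int_0^1 v\,dx$, and a hypothetical two-dimensional space spanned by the polynomials $\varphi_1 = x - x^2$ and $\varphi_2 = x^2 - x^3$. All function names are illustrative, and Python with exact rational arithmetic is used instead of Matlab so that the result can be checked exactly.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials stored as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    """Derivative of a polynomial stored as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def poly_int01(p):
    """Exact integral of a polynomial over (0, 1)."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(p))

# Hypothetical basis of V_h; both functions vanish at x = 0 and x = 1:
# phi_1 = x - x^2, phi_2 = x^2 - x^3.
basis = [[Fraction(0), Fraction(1), Fraction(-1)],
         [Fraction(0), Fraction(0), Fraction(1), Fraction(-1)]]

def bilinear(u, v):
    """B(u, v) = int_0^1 u' v' dx."""
    return poly_int01(poly_mul(poly_diff(u), poly_diff(v)))

def linear(v):
    """F(v) = int_0^1 f v dx with f = 1."""
    return poly_int01(v)

n = len(basis)
# Stiffness matrix B_ij = B(phi_j, phi_i) as in (2.5), load vector f_i = F(phi_i) as in (2.6).
B = [[bilinear(basis[j], basis[i]) for j in range(n)] for i in range(n)]
f = [linear(basis[i]) for i in range(n)]

# Solve the 2x2 system B u = f by Cramer's rule.
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
dofs = [(f[0] * B[1][1] - B[0][1] * f[1]) / det,
        (B[0][0] * f[1] - B[1][0] * f[0]) / det]

print("B =", B)
print("f =", f)
print("dofs =", dofs)
```

The computed dofs are 1/2 and 0, i.e. $u_h = \tfrac12 x(1-x)$, which coincides with the exact solution because the latter belongs to the chosen space.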

In conclusion, once the basis $\{\varphi_i\}_{i=1}^{N_h}$ is properly selected, we only have to use
the methodologies illustrated in Appendix E to efficiently and accurately solve
system (2.4), and we are done. Of course, it remains to prove that:
1. the solution $u_h$ exists and is unique;
2. the solution $u_h$ is a good approximation of u, that is, using the terminology
of Sect. D.2, $u_h$ converges to u, i.e.
\[ \lim_{h \to 0} \|u - u_h\|_V = 0. \tag{2.7} \]

The answer to the first question is given by the following result.

Theorem 2.1.2. The stiffness matrix B is positive definite, so that the Galerkin
problem (2.1) (equivalently, the linear system (2.4)) is uniquely solvable.
Proof. We prove the result in the simple case $N_h = 2$, leaving to an exercise the
extension to $N_h > 2$. Computing explicitly the quantity $B(u_h, u_h)$ we get
\begin{align*}
B(u_h, u_h) &= B(u_1 \varphi_1 + u_2 \varphi_2, \, u_1 \varphi_1 + u_2 \varphi_2) \\
&= u_1 B(\varphi_1, u_1 \varphi_1 + u_2 \varphi_2) + u_2 B(\varphi_2, u_1 \varphi_1 + u_2 \varphi_2) \\
&= u_1^2 B(\varphi_1, \varphi_1) + u_1 u_2 B(\varphi_1, \varphi_2) + u_2 u_1 B(\varphi_2, \varphi_1) + u_2^2 B(\varphi_2, \varphi_2) \\
&= u_1 \left[ B(\varphi_1, \varphi_1) u_1 + B(\varphi_1, \varphi_2) u_2 \right] + u_2 \left[ B(\varphi_2, \varphi_1) u_1 + B(\varphi_2, \varphi_2) u_2 \right] \\
&= \mathbf{u}^T B \mathbf{u}.
\end{align*}
Using the coercivity of B, we have
\[ \mathbf{u}^T B \mathbf{u} = B(u_h, u_h) \ge \beta \|u_h\|_V^2 > 0 \qquad \forall u_h \in V_h, \; u_h \ne 0, \]
that is, B is positive definite (see Sect. B.3 for the definition).
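For a general dimension $N_h$ the same argument can be written compactly with sums. The following lines are a sketch of the computation, using only definition (2.5) and the bilinearity of B.

```latex
\mathbf{u}^T B \mathbf{u}
  = \sum_{i=1}^{N_h} \sum_{j=1}^{N_h} u_i \, B_{ij} \, u_j
  = \sum_{i=1}^{N_h} \sum_{j=1}^{N_h} u_j \, u_i \, B(\varphi_j, \varphi_i)
  = B\Big( \sum_{j=1}^{N_h} u_j \varphi_j , \; \sum_{i=1}^{N_h} u_i \varphi_i \Big)
  = B(u_h, u_h) \;\ge\; \beta \, \|u_h\|_V^2 .
```

Since the functions $\varphi_i$ are linearly independent, $\mathbf{u} \ne \mathbf{0}$ implies $u_h \ne 0$, so that the right-hand side is strictly positive.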

Theorem 2.1.3 (A priori estimate for $u_h$). The unique solution $u_h \in V_h$ of (2.1)
satisfies the following a priori estimate
\[ \|u_h\|_V \le \frac{\Lambda}{\beta}. \tag{2.8} \]
Proof. Note that Vh ⊂ V ; then, apply the Lax-Milgram lemma to (2.1).
To answer the second question, we need to check that the Galerkin formula-
tion (2.1) is stable and consistent (cf. Thm. D.2.2).
Definition 2.1.4 (Discretization error). The discretization error introduced by the
Galerkin method is defined as

eh := u − uh .

Theorem 2.1.5 (Stability). The Galerkin method is stable, i.e., $e_h$ satisfies the
estimate
\[ B(e_h, e_h) \ge \beta \|e_h\|_V^2, \]
where β is the coercivity constant appearing in the Lax-Milgram Lemma 1.5.1.
Proof. Since Vh ⊂ V , the function eh ∈ V , and then we can apply the coercivity
assumption on B to get the result.

Figure 2.1: The property of orthogonality of the Galerkin method.

Theorem 2.1.6 (Consistency). The Galerkin method is consistent, i.e., the
discretization error satisfies the relation
\[ B(e_h, v_h) = 0 \qquad \forall v_h \in V_h. \tag{2.9} \]
Proof. Since $V_h \subset V$, we can choose $v = v_h \in V_h$ in (1.30). Then, subtracting (2.1)
from (1.30), we get (2.9).
Remark 2.1.7 (Galerkin orthogonality). Thm. 2.1.6 has an interesting geometrical
interpretation in the case where the bilinear form B(·, ·) is a scalar product on the
Hilbert space V (cf. Def. C.5.1). In such an event, relation (2.9) tells us that the
error is orthogonal to the approximation space $V_h$ with respect to the metric induced by
B(·, ·). This means, in particular, that the discrete solution $u_h$ is the orthogonal
projection of the exact solution u onto $V_h$. A pictorial representation of this latter
statement is given in Fig. 2.1.
Theorem 2.1.8 (Céa's Lemma). The discretization error satisfies the following
estimate
\[ \|u - u_h\|_V \le \frac{M}{\beta} \inf_{v_h \in V_h} \|u - v_h\|_V. \tag{2.10} \]
Proof. Using the coercivity of B (equivalently, the stability of the Galerkin method),
we get
\[ B(u - u_h, u - u_h) \ge \beta \|u - u_h\|_V^2. \]
On the other hand, using the consistency of the Galerkin method, we also have,
for every $v_h \in V_h$,
\[ B(u - u_h, u - u_h) = B(u - u_h, u - v_h) + \underbrace{B(u - u_h, v_h - u_h)}_{= 0 \text{ because of (2.9)}}, \]
where the last term vanishes because $v_h - u_h \in V_h$.
Thus, using the continuity of B and combining the previous two relations, we get
\[ \beta \|u - u_h\|_V^2 \le M \|u - u_h\|_V \|u - v_h\|_V \qquad \forall v_h \in V_h, \]
from which we get
\[ \|u - u_h\|_V \le \frac{M}{\beta} \|u - v_h\|_V \qquad \forall v_h \in V_h. \]
Since the last inequality holds for every choice of $v_h$ in $V_h$, it also holds for
the infimum of $\|u - v_h\|_V$ over the space $V_h$, which is (2.10).
Remark 2.1.9. Céa's Lemma has a remarkable conceptual interpretation:
discretization error ≤ amplification factor × approximation error.
The amplification factor M/β is the effect of the continuous problem on its
corresponding approximation. This, a posteriori, supports the importance of having a
"small" M and a "big" β in the Lax-Milgram Lemma.

Theorem 2.1.10 (Convergence). Let v ∈ V be an arbitrary element of the function
space V. Assume that the approximation space $V_h$ is chosen in such a way that
\[ \lim_{h \to 0} \, \inf_{v_h \in V_h} \|v - v_h\|_V = 0. \tag{2.11} \]
Then, the Galerkin method is convergent.

Proof. It suffices to use (2.11) in (2.10) with v = u and apply the definition of
convergence (2.7).

2.2 The Galerkin Finite Element Method

In this section, we specify one particular choice of Vh that satisfies (2.11). To
make things simple, we assume here that Ω = (0, 1) and V = H01 (Ω), and denote
from now on by C a positive constant, independent of h, not necessarily having
the same value in all its occurrences.
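Let $\mathcal{T}_h$ denote a family of partitions of Ω into $M_h \ge 2$ subintervals
$K_i := [x_{i-1}, x_i]$, $i = 1, \dots, M_h$, of length $h_i = x_i - x_{i-1}$, with
$h := \max_{K_i \in \mathcal{T}_h} h_i$. For a given $r \ge 1$, we set
\[ V_h := X^r_{h,0}(\mathcal{T}_h) = \left\{ v_h \in X^r_h \text{ such that } v_h(0) = v_h(1) = 0 \right\}, \tag{2.12} \]
where $X^r_h$ is the piecewise polynomial space introduced in Sect. F.1. The number
of linearly independent basis functions spanning $V_h$, denoted by $N_h(r)$, gives the
dimension of $V_h$ and reads
\[ N_h(r) = M_h (r + 1) - (M_h - 1) - 2 = M_h r - 1. \tag{2.13} \]
Here each of the $M_h$ elements carries r + 1 nodes, the $M_h - 1$ interior mesh nodes are
shared by two neighbouring elements, and the two boundary dofs are fixed by the
boundary conditions.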

The Galerkin method with Vh as in (2.12) is called Galerkin Finite Element Method
of order r (GFEM), or, shortly, the Finite Element Method (FEM).
Remark 2.2.1 (Sparsity pattern of B). An important consequence of the choice of
$V_h$ is that the stiffness matrix B is sparse. This means that only a few of the $N_h^2$
entries $B_{ij}$ are actually non-vanishing. In the 1D case, it is easy to characterize the
sparsity pattern of B, i.e., the subset of index pairs (i, j) such that $B_{ij} \ne 0$. As a matter
of fact, looking at the structure of Lagrangian basis functions (see Sect. F.1.1), we
see that for each row $i \in [1, N_h(r)]$, the column indices j(i) belonging to the pattern
of i are those such that $j(i) \in [i - r, i + r]$. For example, if r = 1, for each row i,
we have (a priori) at most three nonzero entries: $B_{i,i-1}$, $B_{i,i}$ and $B_{i,i+1}$, so that B
is a tridiagonal matrix. If r = 2, the sparsity pattern is larger, and we have a priori
at most five nonzero entries per row: $B_{i,i-2}$, $B_{i,i-1}$, $B_{i,i}$, $B_{i,i+1}$ and $B_{i,i+2}$, and
so on. Fig. 2.2 shows these two sparsity patterns on a mesh of $M_h = 10$ elements.
The property of generating a sparse matrix structure maintains its validity also
in multi-dimensional problems, and represents one of the strongest advantages of
using the GFEM. Sparsity allows one to minimize storage occupation and to optimize
computing resources in the solution of the linear algebraic system (2.4).

(a) r = 1 (b) r = 2

Figure 2.2: Sparsity patterns in the cases r = 1 and r = 2, with $M_h = 10$. The
graphs have been obtained using the Matlab command spy(B); the command nnz(B)
returns the number of nonzero entries.
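The counts behind the dimension formula (2.13) and the band structure described in Rem. 2.2.1 can be reproduced with a short script. This is only a sketch: the function names are illustrative, and the band count is the a priori upper bound $|i - j| \le r$ stated in the remark, not the exact number of nonzero entries of an assembled matrix.

```python
def dim_vh(num_elements, degree):
    """Dimension N_h(r) = M_h * r - 1 of V_h, see (2.13)."""
    return num_elements * degree - 1

def band_pattern_size(n, degree):
    """Number of index pairs (i, j), 1 <= i, j <= n, with |i - j| <= degree."""
    return sum(min(n, i + degree) - max(1, i - degree) + 1
               for i in range(1, n + 1))

for r in (1, 2):
    n = dim_vh(10, r)  # M_h = 10 elements, as in Fig. 2.2
    print(f"r = {r}: N_h = {n}, band entries <= {band_pattern_size(n, r)}, "
          f"full matrix entries = {n * n}")
```

With $M_h = 10$ the band contains at most 25 of 81 entries for r = 1 and at most 89 of 361 entries for r = 2, and the ratio decreases further as the mesh is refined.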

2.2.1 Error analysis

The following interpolation error estimates generalize (F.8) and (F.9) to the case
of the Sobolev norm (C.36).

Theorem 2.2.2 (Interpolation error in the $L^2$ norm). Let f be a given function in
$H^{r+1}(\Omega)$ and $\Pi^r_h f$ the $V_h$-interpolant of f. Then, we have
\[ \|f - \Pi^r_h f\|_{L^2(\Omega)} \le C h^{r+1} \|f\|_{H^{r+1}(\Omega)}. \tag{2.14} \]
Moreover, we also have
\[ \|(f - \Pi^r_h f)'\|_{L^2(\Omega)} \le C h^{r} \|f\|_{H^{r+1}(\Omega)}. \tag{2.15} \]


Remark 2.2.3. Thm. 2.2.2 contains two important pieces of information. The first
is that the finite element interpolant $\Pi^r_h f$ converges to f (in the topology of
$L^2$) with order r + 1 with respect to the discretization parameter h (cf. Def. D.4.2).
The second is that the order of convergence of the derivative $(\Pi^r_h f)'$
to the derivative $f'$ decreases by one with respect to the order of convergence of
$\Pi^r_h f$ to the function f.

Theorem 2.2.4 (GFEM Convergence (1)). Let u and $u_h$ denote the solutions
of (1.30) and (2.1), respectively. Assume also that $u \in H^{r+1}(\Omega)$. Then, we have
\[ \|u - u_h\|_V \le C \frac{M}{\beta} h^r \|u\|_{H^{r+1}(\Omega)}. \tag{2.16} \]
Under the same assumptions, we also have
\[ \|u - u_h\|_{L^2(\Omega)} \le C h^{r+1} \|u\|_{H^{r+1}(\Omega)}. \tag{2.17} \]

Proof. Let us consider (2.10). Instead of determining the "optimal" $v_h$, i.e., the
function $v_h$ for which the approximation error attains its
infimum, we pick the "special" function $v_h := \Pi^r_h u$. By doing so, we obtain
\[ \|u - u_h\|_V \le \frac{M}{\beta} \|u - \Pi^r_h u\|_V, \]
from which, using (2.14) and (2.15) with f ≡ u, we get (2.16). The proof of (2.17)
requires the use of sophisticated functional arguments (duality analysis,
Aubin-Nitsche trick), and for this reason is omitted here.
Remark 2.2.5. The above proof reveals another remarkable conceptual
interpretation of Céa's Lemma:

discretization error ≤ amplification factor × interpolation error.

Thus, the error analysis of the GFEM is reduced to a simple interpolation
error analysis.
The error analysis of the GFEM has been carried out under the very restrictive
assumption that the exact solution u of (1.30) has an arbitrarily high regularity (as
a matter of fact, we have postulated that, for any given r, the function u belongs to
$H^{r+1}$!). This is clearly not true, in general, so that we have to modify Thm. 2.2.4
to account for the realistic case.

Theorem 2.2.6 (GFEM Convergence (2)). Let u and $u_h$ denote the solutions
of (1.30) and (2.1), respectively. Assume that $u \in H^s(\Omega)$, $s \ge 2$ being the "true"
regularity of the solution of the weak problem. Then, we have
\[ \|u - u_h\|_V \le C \frac{M}{\beta} h^l \|u\|_{H^{l+1}(\Omega)}, \tag{2.18} \]
where
\[ l := \min\{r, s - 1\} \]
is called regularity threshold. Under the same assumptions, we also have
\[ \|u - u_h\|_{L^2(\Omega)} \le C h^{l+1} \|u\|_{H^{l+1}(\Omega)}. \tag{2.19} \]

Remark 2.2.7 (h-and-r refinement strategies). Thm. 2.2.6 shows that there is no
advantage in using high-order polynomials (r-refinement) to increase
approximation accuracy when the regularity of the exact solution is low. Rather, it is
better to employ the maximum value of r that is allowed by the regularity
threshold, and then reduce the discretization parameter h (h-refinement). This general
philosophical statement is summarized in Tab. 2.1.

r    s = 1    s = 2     s = 3       s = 4       s = 5
1    conv.    [O(h)]    O(h)        O(h)        O(h)
2    conv.    O(h)      [O(h^2)]    O(h^2)      O(h^2)
3    conv.    O(h)      O(h^2)      [O(h^3)]    O(h^3)
4    conv.    O(h)      O(h^2)      O(h^3)      [O(h^4)]

Table 2.1: Order of convergence of the GFEM as a function of r and of
s. The lower triangular part of the table corresponds to sub-optimal convergence
rates of the method (equal to s − 1) because the order of convergence is limited
by the regularity threshold. The diagonal part of the table (boxed terms, shown here
in square brackets) corresponds to the best trade-off between accuracy and
computational effort for a given solution regularity. The upper triangular part of
the table corresponds to optimal convergence rates of the method (equal to r) when
the solution is sufficiently smooth.
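The entries of Tab. 2.1 follow mechanically from the regularity threshold $l = \min\{r, s - 1\}$ of Thm. 2.2.6. The short script below is a sketch that regenerates them; the function names are illustrative, and the label "conv." is used, as in the table, for the case l = 0 in which no rate is predicted.

```python
def regularity_threshold(r, s):
    """Regularity threshold l = min(r, s - 1) of Thm. 2.2.6."""
    return min(r, s - 1)

def rate_label(r, s):
    """Convergence rate in the V-norm predicted by (2.18), as a string."""
    l = regularity_threshold(r, s)
    if l == 0:
        return "conv."
    return "O(h)" if l == 1 else f"O(h^{l})"

# Rows: polynomial degree r = 1..4; columns: regularity s = 1..5.
table = {r: [rate_label(r, s) for s in range(1, 6)] for r in range(1, 5)}

print("r  " + "  ".join(f"s={s:<6}" for s in range(1, 6)))
for r, row in table.items():
    print(f"{r}  " + "  ".join(f"{entry:<8}" for entry in row))
```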

2.3 Experimental convergence study of the GFEM

In this section, we briefly verify the numerical performance of the GFEM in the
study of two BVPs in 1D, characterized by a smooth and a non-smooth solution,
respectively. All computations have been performed using the Matlab finite
element package EF1D developed by Marco Restelli and available at the link:

⇒ https://beep.metid.polimi.it/

2.3.1 BVP with smooth solution

This example serves to verify Thm. 2.2.6 in the numerical solution of the
following BVP:
\[ \begin{cases} -u'' = \sin(x) & \text{in } (0, 10) \\ u(0) = 0 \\ u(10) = \sin(10). \end{cases} \tag{2.20} \]
The exact solution is u(x) = sin(x), so that s = +∞ and Thm. 2.2.6 gives us the
optimal result l = r for every r. This prediction is confirmed by the plots in Fig. 2.3,
which show that the orders of convergence in the $L^2$ and $H^1$ norms are equal to
r + 1 and r, respectively. The mesh size is uniform and equal to $h = 10/M_h$,
$M_h = [10, 20, 40, 80, 160, 320]^T$.

(a) r = 1 (b) r = 2 (c) r = 3

Figure 2.3: BVP in 1D with smooth solution. Circles refer to the error in L2 ,
asterisks refer to the error in H 1 .
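The experiment can be reproduced in its simplest form (r = 1) with a few lines of code. The sketch below is an independent illustration, not the EF1D package: it is written in Python, the function names are hypothetical, the load vector and the error norms are computed with a 3-point Gauss rule, the $H^1$ error is measured in the seminorm $\|(u - u_h)'\|_{L^2}$, and the experimental order is estimated as $\log(e_k/e_{k+1}) / \log(h_k/h_{k+1})$, which is assumed here to correspond to the formula used in the text.

```python
import math

# 3-point Gauss-Legendre rule on (-1, 1): (node, weight) pairs
GAUSS = [(-math.sqrt(0.6), 5 / 9), (0.0, 8 / 9), (math.sqrt(0.6), 5 / 9)]

def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal matrix with constant diagonals."""
    n = len(rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = sup / diag, rhs[0] / diag
    for i in range(1, n):
        m = diag - sub * cp[i - 1]
        cp[i] = sup / m
        dp[i] = (rhs[i] - sub * dp[i - 1]) / m
    x = dp[:]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def p1_errors(num_elements, length=10.0):
    """P1 Galerkin solution of (2.20); returns (h, L2 error, H1-seminorm error)."""
    h = length / num_elements
    g = math.sin(length)                  # Dirichlet datum at x = length
    rhs = [0.0] * (num_elements - 1)      # one entry per interior node
    for e in range(num_elements):         # element [e*h, (e+1)*h]
        for t, w in GAUSS:
            x = (e + 0.5 + 0.5 * t) * h
            fw = math.sin(x) * w * 0.5 * h
            if e >= 1:                    # hat function of the left node
                rhs[e - 1] += fw * ((e + 1) * h - x) / h
            if e <= num_elements - 2:     # hat function of the right node
                rhs[e] += fw * (x - e * h) / h
    rhs[-1] += g / h                      # lifting of the boundary value
    nodal = [0.0] + solve_tridiagonal(-1 / h, 2 / h, -1 / h, rhs) + [g]
    err_l2 = err_h1 = 0.0
    for e in range(num_elements):
        slope = (nodal[e + 1] - nodal[e]) / h
        for t, w in GAUSS:
            x = (e + 0.5 + 0.5 * t) * h
            uh = nodal[e] + slope * (x - e * h)
            err_l2 += w * 0.5 * h * (math.sin(x) - uh) ** 2
            err_h1 += w * 0.5 * h * (math.cos(x) - slope) ** 2
    return h, math.sqrt(err_l2), math.sqrt(err_h1)

results = [p1_errors(m) for m in (40, 80, 160)]
orders = [(math.log(e0 / e1) / math.log(h0 / h1),
           math.log(d0 / d1) / math.log(h0 / h1))
          for (h0, e0, d0), (h1, e1, d1) in zip(results, results[1:])]
for p_l2, p_h1 in orders:
    print(f"order in L2: {p_l2:.3f}   order in H1 seminorm: {p_h1:.3f}")
```

The estimated orders should approach r + 1 = 2 and r = 1, in agreement with (2.19) and (2.18) for a smooth solution.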

2.3.2 BVP with a non-smooth solution

This example serves to verify Thm. 2.2.6 in the numerical solution of the
following BVP:
\[ \begin{cases} -u'' = f(x; \lambda) & \text{in } (0, 2) \\ u(0) = 0 \\ u(2) = \sin(2) + 1, \end{cases} \tag{2.21} \]
where λ is a real parameter and
\[ f(x; \lambda) = \begin{cases} \sin(x) & 0 \le x \le 1 \\ \sin(x) - \lambda (\lambda - 1)(x - 1)^{\lambda - 2} & 1 < x \le 2. \end{cases} \]
The exact solution is
\[ u(x; \lambda) = \begin{cases} \sin(x) & 0 \le x \le 1 \\ \sin(x) + (x - 1)^{\lambda} & 1 < x \le 2. \end{cases} \]
For a given $s \ge 2$, the source term f belongs to $L^s(0, 2)$ iff
\[ \int_1^2 (x - 1)^{(\lambda - 2) s} \, dx < +\infty. \]
Setting ξ := (x − 1) and κ := (λ − 2)s, the above condition becomes
\[ \int_0^1 \xi^{\kappa} \, d\xi < +\infty. \]
The integral equals 1/(κ + 1) if κ > −1 and diverges otherwise,
which shows that $f \in L^s(0, 2) \Leftrightarrow \kappa + 1 > 0$, that is, iff λ > 2 − 1/s. Under this
condition, the solution u(x; λ) of (2.21) belongs to the Sobolev space $H^s(0, 2)$ so
that Thm. 2.2.6 tells us that (in the worst scenario) the order of convergence of the
GFEM is equal to $l = \min\{r, s - 1\}$. Setting s = 3, for example, we should have a
convergence order $p_{H^1}$ in the $H^1$-norm equal to 1 and 2 (optimal) if r is equal to
1 and 2, respectively, while $p_{H^1} = s - 1 = 2$ for every $r \ge 3$ (sub-optimal). This
prediction is fully confirmed by Tab. 2.2, which shows the order of convergence
computed experimentally using (D.9) on a triangulation $\mathcal{T}_h$ with uniform mesh
size equal to $h = 2/M_h$, $M_h = [7, 23, 59, 131, 277, 551]^T$.

Mh pH 1 (r = 1) pH 1 (r = 2) pH 1 (r = 3) pH 1 (r = 4)
23 9.9434e-01 1.9833e+00 2.1384e+00 2.0032e+00
59 9.9938e-01 1.9947e+00 2.0559e+00 2.0010e+00
131 9.9990e-01 1.9978e+00 2.0247e+00 2.0004e+00
277 9.9998e-01 1.9990e+00 2.0116e+00 2.0002e+00
551 9.9999e-01 1.9995e+00 2.0057e+00 1.9998e+00
Table 2.2: Order of convergence of the GFE method as a function of r in the case
of a solution belonging to H 3 (0, 2).
Part II

The GFEM for Elliptic Problems in 1D and 2D

This part illustrates the weak formulation and the numerical approximation using
the Galerkin Finite Element Method (GFEM) of reaction-diffusion and advection-
diffusion problems in the 1D case. Then, the application of the GFEM to elliptic
problems in the 2D case is addressed, and the principal issues that characterize the
implementation of the method in a computational algorithm are illustrated.
Chapter 3

Elliptic Boundary Value Problems in 1D: Theory and Finite Element Approximation

Abstract

In this chapter, we apply the general machinery of weak formulation and Galerkin
Finite Element (GFE) approximation to two particular, and significant, classes
of elliptic boundary value problems, namely, a reaction-diffusion equation and
an advection-diffusion equation. To keep the presentation as simple as possible,
we consider the 1D case, with constant coefficients and homogeneous Dirichlet
boundary conditions. For both model problems, the standard GFE approximation
is first presented, and then an appropriate stabilized form of the method is pro-
posed to cure the occurrence of numerical instabilities in the computed solution
when the continuous problem is dominated by reaction or advection terms. Com-
putational examples (run with the 1D finite element code EF1D written by Marco
Restelli) are included to show how things can go wrong and how to fix problems.
The code has been developed by Marco Restelli and subsequently modified by
Riccardo Sacco and Aurelio Giancarlo Mauri, and is available at the link:

⇒ https://beep.metid.polimi.it/


3.1 Reaction-diffusion model problem

Let us consider the following two-point BVP:
\[ \begin{cases} -\mu u'' + \sigma u = f & \text{in } \Omega = (0, 1) \\ u(0) = 0 \\ u(1) = 0, \end{cases} \tag{3.1} \]
where µ > 0 is the diffusion coefficient, σ > 0 is the reaction coefficient, and
the right-hand side f is a given production term. The system (3.1) is
commonly referred to as a reaction-diffusion (RD) problem, and represents a
simple model of a chemical substance whose concentration is u, that diffuses in the
environment (a fluid, the air) with diffusion coefficient µ and reacts with the
environment according to a net reaction mechanism given by R := f − σu. Setting
for simplicity f = 1, the exact solution of the problem is
\[ u(x) = \frac{1}{\sigma} \left[ 1 + e^{\alpha x} \, \frac{1 - e^{-\alpha}}{e^{-\alpha} - e^{\alpha}} + e^{-\alpha x} \, \frac{e^{\alpha} - 1}{e^{-\alpha} - e^{\alpha}} \right], \tag{3.2} \]
where $\alpha := \sqrt{\sigma / \mu}$. A plot of u is reported in Fig. 3.1 for three increasing values
of α, corresponding to σ = 1 and $\mu = 1, 10^{-1}, 10^{-3}$.

Figure 3.1: Plot of the solution of the reaction-diffusion model problem as a func-
tion of α.
Fig. 3.1 shows that u tends to a constant, equal to 1/σ, as α → ∞. As a matter
of fact, assuming $\alpha \gg 1$ and expanding the exponential terms in (3.2), we get
\[ u(x) \simeq \frac{1}{\sigma} \left[ 1 - e^{\alpha (x - 1)} - e^{-\alpha x} \right]. \]
The solution behaves, asymptotically, as the sum of three contributions: a
constant and two exponential terms that become significant as x gets closer to 1 and
0, respectively. These two exponential terms are called boundary layer terms
because they are responsible for the variation of u from the constant value 1/σ
to the boundary values u(1) = 0 and u(0) = 0. The subinterval of Ω within
which such a variation occurs is called boundary layer, and its width is denoted by δ. It can
be proved that $\delta = O(\alpha^{-1})$; therefore the layer gets thinner and thinner as reaction
increasingly dominates diffusion in the model (3.1).
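The asymptotic form of the solution can be obtained from (3.2) in two lines. The computation below is a sketch of the missing step: it rewrites the two coefficients of the exponentials exactly and then neglects $e^{-\alpha}$ with respect to 1.

```latex
\frac{1 - e^{-\alpha}}{e^{-\alpha} - e^{\alpha}}
  = -\,\frac{e^{-\alpha}}{1 + e^{-\alpha}} \;\simeq\; -e^{-\alpha},
\qquad
\frac{e^{\alpha} - 1}{e^{-\alpha} - e^{\alpha}}
  = -\,\frac{1}{1 + e^{-\alpha}} \;\simeq\; -1
\qquad (\alpha \gg 1),
```

```latex
u(x) \;\simeq\; \frac{1}{\sigma} \left[ 1 - e^{-\alpha} e^{\alpha x} - e^{-\alpha x} \right]
     \;=\; \frac{1}{\sigma} \left[ 1 - e^{\alpha (x - 1)} - e^{-\alpha x} \right].
```

Both identities follow from $e^{-\alpha} - e^{\alpha} = -e^{\alpha} (1 - e^{-\alpha})(1 + e^{-\alpha})$.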

Definition 3.1.1. Model (3.1) is said to be reaction-dominated if $\alpha \gg 1$, otherwise
it is called diffusion-dominated.
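Matlab coding. The Matlab script for computing and plotting u is reported below.
% Exact solution (3.2) of the RD problem with f = 1, for three values of alpha
x = 0:0.0001:1;                 % evaluation points in [0,1]
s = 1; mu = [1, 1e-1, 1e-3];    % reaction coefficient and diffusion coefficients
a = sqrt(s./mu);                % alpha = sqrt(sigma/mu)
u = zeros(numel(a), numel(x));  % one row of u for each value of alpha
for k = 1:numel(a)
    u(k,:) = 1/s*(1 + exp(a(k)*x)*(1-exp(-a(k)))/(exp(-a(k))-exp(a(k))) + ...
        exp(-a(k)*x)*(exp(a(k))-1)/(exp(-a(k))-exp(a(k))));
end
plot(x, u)
xlabel('x')
legend('\alpha=1', '\alpha=3.16', '\alpha=31.6')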

3.1.1 Weak formulation

Let $V := H^1_0(0, 1)$ with norm given by (C.28). Assume also that f is a given
function in $L^2(0, 1)$. Multiplying (3.1)$_1$ by an arbitrary test function v ∈ V and
integrating by parts the term $\int_0^1 -\mu u'' v \, dx$, we get the following weak formulation
of the reaction-diffusion BVP:

find u ∈ V such that
\[ \underbrace{\int_0^1 \mu u' v' \, dx + \int_0^1 \sigma u v \, dx}_{B(u, v)} = \underbrace{\int_0^1 f v \, dx}_{F(v)} \qquad \forall v \in V. \tag{3.3} \]

Theorem 3.1.2. Problem (3.3) is uniquely solvable and u satisfies the estimate
\[ \|u\|_V \le \frac{C_P \|f\|_{L^2(0,1)}}{\mu}. \tag{3.4} \]
Proof. Let us check that B and F satisfy the conditions required by the
Lax-Milgram Lemma.

• continuity of B:
\begin{align*}
|B(u, v)| &\le \mu \int_0^1 |u'| \, |v'| \, dx + \sigma \int_0^1 |u| \, |v| \, dx \\
&\le \mu \|u\|_V \|v\|_V + \sigma \|u\|_{L^2(0,1)} \|v\|_{L^2(0,1)} \\
&\le (\mu + \sigma C_P^2) \|u\|_V \|v\|_V \qquad \forall u, v \in V,
\end{align*}
from which $M = \mu + \sigma C_P^2$;

• coercivity of B:
\[ B(u, u) = \mu \|u\|_V^2 + \sigma \|u\|_{L^2(0,1)}^2 \ge \mu \|u\|_V^2 \qquad \forall u \in V \]
because σ > 0, from which β = µ;

• continuity of F:
\begin{align*}
|F(v)| &\le \int_0^1 |f| \, |v| \, dx \le \|f\|_{L^2(0,1)} \|v\|_{L^2(0,1)} \\
&\le C_P \|f\|_{L^2(0,1)} \|v\|_V \qquad \forall v \in V,
\end{align*}
from which $\Lambda = C_P \|f\|_{L^2(0,1)}$.

Since all the assumptions of the Lax-Milgram Lemma are verified, the solution u
of (3.3) exists and is unique in V, and satisfies (3.4).

3.1.2 Galerkin finite element approximation

The GFE discretization of (3.1) is (2.1) and the corresponding matrix form is (2.4)
with
\[ B_{ij} = \int_0^1 \mu \varphi_j' \varphi_i' \, dx + \int_0^1 \sigma \varphi_j \varphi_i \, dx, \qquad i, j = 1, \dots, N_h, \]
\[ F_i = \int_0^1 f \varphi_i \, dx, \qquad i = 1, \dots, N_h, \]
where the approximation space $V_h$ is the finite element space of degree r defined
in (2.12), and $\{\varphi_i\}_{i=1}^{N_h}$ is the corresponding Lagrange basis introduced in Sect. F.1.
The stiffness matrix B is symmetric and positive definite, so that the discrete
problem (2.1) admits a unique solution (see Thm. 2.1.2). Moreover, Céa's Lemma tells
us that
\[ \|u - u_h\|_V \le C \left( 1 + \frac{\sigma}{\mu} \right) h^r \|u\|_{H^{r+1}(0,1)}, \tag{3.5} \]
because the exact solution $u \in C^\infty([0, 1])$ so that l = r, while $C_P = 1$ for Ω = (0, 1).
Thus, we see that $u_h$ converges to u, in the $H^1$-norm, as h → 0, and that accuracy
improves as the polynomial order is increased, because of the infinite regularity
of the exact solution.

Remark 3.1.3 (The reaction-dominated case). It is interesting to take a closer look
at the convergence estimate (3.5) and try to analyze it as a function of the model
parameters µ and σ. For this, we set r = 1 (piecewise linear finite elements) and
fix a sufficiently small tolerance ε. Then, we see that in order for the error to be ≃ ε,
we need that
\[ h \simeq \frac{\varepsilon}{C (1 + \alpha^2) \|u\|_{H^2(0,1)}}. \]
Therefore, if the reaction-diffusion problem is reaction-dominated ($\alpha \gg 1$), the
mesh size needs to be much smaller than the tolerance ε, in accordance with
Fig. 3.1, because boundary layer effects become increasingly significant as α gets
larger and the RD problem becomes "tougher" to solve. In conclusion, we
expect possible difficulties to occur in the GFE approximation of the RD two-point
boundary value problem (3.1) when $\mu \ll \sigma$.

3.1.3 The linear system and the discrete maximum principle

We now examine in more detail the GFE problem for the RD equation, and consider
the case r = 1 with a uniform partition of [0, 1] into $M_h \ge 2$ elements of size
$h = 1/M_h$. Matrix B can be written as the sum of a diffusion matrix
\[ B^d_{ij} = \int_0^1 \mu \varphi_j' \varphi_i' \, dx = \begin{cases} -\dfrac{\mu}{h} & j = i - 1 \\[2mm] \dfrac{2\mu}{h} & j = i \\[2mm] -\dfrac{\mu}{h} & j = i + 1 \\[2mm] 0 & \text{otherwise} \end{cases} \]
and of a reaction matrix
\[ B^r_{ij} = \int_0^1 \sigma \varphi_j \varphi_i \, dx = \begin{cases} \dfrac{\sigma h}{6} & j = i - 1 \\[2mm] \dfrac{4 \sigma h}{6} & j = i \\[2mm] \dfrac{\sigma h}{6} & j = i + 1 \\[2mm] 0 & \text{otherwise.} \end{cases} \]
As for the load vector, in the simple case f = 1, we have
\[ F_i = \int_0^1 \varphi_i \, dx = h \qquad \forall i = 1, \dots, N_h. \]
In conclusion, the explicit form of the linear algebraic system (2.4) in the case
r = 1, $h = 1/M_h$ and f = 1 is
\[ \begin{bmatrix}
\dfrac{2\mu}{h} + \dfrac{4\sigma h}{6} & -\dfrac{\mu}{h} + \dfrac{\sigma h}{6} & & 0 \\
-\dfrac{\mu}{h} + \dfrac{\sigma h}{6} & \dfrac{2\mu}{h} + \dfrac{4\sigma h}{6} & \ddots & \\
& \ddots & \ddots & -\dfrac{\mu}{h} + \dfrac{\sigma h}{6} \\
0 & & -\dfrac{\mu}{h} + \dfrac{\sigma h}{6} & \dfrac{2\mu}{h} + \dfrac{4\sigma h}{6}
\end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N_h} \end{bmatrix}
= h \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}. \tag{3.6} \]
Theorem 3.1.4 (Monotonicity and maximum principle for the RD problem). Let
$Lw := -\mu w'' + \sigma w = 1$. Then, L is inverse-monotone (see (A.2)) so that the
solution of (3.1) satisfies the condition
\[ u(x) \ge 0 \qquad \forall x \in [0, 1]. \tag{3.7} \]
Moreover, L satisfies the comparison principle A.1.3 with φ(x) = 1/σ, so that the
solution of (3.1) satisfies the maximum principle (MP)
\[ 0 \le u(x) \le \max_{x \in [0,1]} \phi(x) = \frac{1}{\sigma} \qquad \forall x \in [0, 1]. \tag{3.8} \]

Having established the monotonicity of L and the consequent MP property for
the exact solution of (3.1), we would clearly like the corresponding finite element
approximate solution $u_h$ to inherit such a property. Should this occur, we say
that the GFEM is a monotone scheme. Moreover, should the inequalities (3.8)
hold also for the discrete solution $u_h$, we say that $u_h$ satisfies a discrete maximum
principle (DMP). To this end, the following definitions turn out to be useful.
Definition 3.1.5 (Inverse monotone matrix). An invertible square matrix A is said
to be inverse monotone if
\[ A^{-1} \ge 0, \]
the inequality being understood in the element-wise sense. Should A be
inverse-monotone, then
\[ A\mathbf{w} \le A\mathbf{z} \;\Rightarrow\; \mathbf{w} \le \mathbf{z} \]
(always in the element-wise sense), w, z being two vectors of $\mathbb{R}^{N_h}$.
Definition 3.1.6 (M-matrix). An invertible square matrix A is an M-matrix if:
• $A_{ij} \le 0$ for $i \ne j$;
• A is inverse-monotone.
Then, we have the following important result.
Theorem 3.1.7 (Sufficient condition for DMP). If the stiffness matrix $B = B^d + B^r$
is an M-matrix, then $u_h$ satisfies (3.7) so that the Galerkin finite element method
applied to the RD problem (3.1) is a monotone scheme. Moreover, the solution $u_h$
satisfies the DMP
\[ 0 \le u_h(x) \le \frac{1}{\sigma} \qquad \forall x \in [0, 1]. \tag{3.9} \]
Verifying the property of being an M-matrix directly from Def. 3.1.6 is, in
general, prohibitive. The following (necessary and sufficient) condition is useful.
Theorem 3.1.8 (Discrete comparison principle). Let A be a matrix with
non-positive off-diagonal entries ($A_{ij} \le 0$ for $i \ne j$). Then, A is an M-matrix iff there
exists a positive vector e such that Ae > 0 (in the component-wise sense).
Remark 3.1.9. A first choice to try for the test vector in Thm. 3.1.8 is
\[ \mathbf{e} = [1, 1, \dots, 1]^T \in \mathbb{R}^{N_h}. \]
Thus, computing the matrix-vector product Ae amounts to computing the row sum
for each row of A.

To apply Thm. 3.1.8 we need as a first step to ascertain that $B_{ij} \le 0$ for $i \ne j$. This is
true iff
\[ -\frac{\mu}{h} + \frac{\sigma h}{6} \le 0, \]
which holds, in particular, if the following (more conservative) condition is satisfied
\[ Pe_{rd} < 1, \tag{3.10} \]
where
\[ Pe_{rd} := \frac{\sigma h^2}{6 \mu} \tag{3.11} \]
is the non-dimensional Péclet number associated with the reaction-diffusion
problem.

Theorem 3.1.10 (DMP principle). Assume that (3.10) is satisfied. Then $B = B^d + B^r$
is an M-matrix and $u_h$ satisfies the DMP (3.9).

Proof. It suffices to apply the discrete comparison principle with
$\mathbf{e} = [1, 1, \ldots, 1]^T \in \mathbb{R}^{N_h}$ and then apply Thm. 3.1.7.
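
The check used in the proof can be reproduced numerically. The following Matlab sketch assembles $B$ for $r = 1$ on a uniform mesh and tests the two hypotheses of Thm. 3.1.8 with $\mathbf{e} = [1, \ldots, 1]^T$. It assumes that $B^r$ is the exactly integrated (consistent) reaction matrix, with entries $2\sigma h/3$ on the diagonal and $\sigma h/6$ off the diagonal, in agreement with the off-diagonal entry $-\mu/h + \sigma h/6$ used above. The data ($\mu = 10^{-4}$, $\sigma = 1$) and all variable names are illustrative choices.

```matlab
% Sketch: test the hypotheses of Thm. 3.1.8 for B = Bd + Br (r = 1, uniform mesh).
% Assumption: Br is the consistent P1 reaction matrix (sigma*h/6)*tridiag(1,4,1).
mu = 1e-4; sigma = 1;
Mh_list = [10 41];
isM = false(size(Mh_list));
for k = 1:numel(Mh_list)
    Mh = Mh_list(k); h = 1/Mh; N = Mh - 1;     % N internal nodes
    e   = ones(N,1);
    off = -mu/h + sigma*h/6;                   % off-diagonal entries B_{i,i-1}, B_{i,i+1}
    dg  = 2*mu/h + 2*sigma*h/3;                % diagonal entries B_{ii}
    B   = full(spdiags([off*e, dg*e, off*e], -1:1, N, N));
    sign_ok = (off <= 0);                      % B_ij <= 0 for i ~= j
    rows_ok = all(B*e > 0);                    % row sums, i.e. B*e > 0
    isM(k)  = sign_ok && rows_ok;
    fprintf('Mh = %3d  Pe_rd = %8.4f  M-matrix: %d\n', Mh, sigma*h^2/(6*mu), isM(k));
end
```

With these data the sign condition fails for $M_h = 10$ ($\mathrm{Pe}_{rd} > 1$) and holds for $M_h = 41$ ($\mathrm{Pe}_{rd} < 1$), for which the inverse of $B$ can also be checked to be non-negative.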

Example 3.1.11 (Reaction-dominated problem). The upper bound on the Péclet
number (3.10) is equivalent to enforcing an upper limiting value for the mesh size
$h$, that is
$$h < \sqrt{\frac{6\mu}{\sigma}} := h_{max}. \tag{3.12}$$
The corresponding minimum number of (uniform) mesh elements is
$$M_{h,min} = \mathrm{round}\left(\frac{1}{h_{max}}\right) = \mathrm{round}\left(\sqrt{\frac{\sigma}{6\mu}}\right). \tag{3.13}$$

If $h$ does not satisfy (3.12), then spurious oscillations are likely to occur in the
computed solution $u_h$, preventing (3.9) from being verified. To see this, we study
BVP (3.1) in the case where $\mu = 10^{-4}$, $\sigma = 1$ and $f = 1$, and start by taking a
uniform partition of $[0, 1]$ into $M_h = 10$ elements, i.e., $h = 1/10 = 0.1$. Condition (3.12)
would require $h < 2.45 \cdot 10^{-2}$, i.e., $M_h \geq 41$, to guarantee a DMP for $u_h$; therefore, we expect
numerical difficulties to occur. This is exactly what is shown in Fig. 3.2. Wild
oscillations arise in the neighbourhood of the boundary layers, at $x = 0$ and $x = 1$,
but some small over- and undershoots are also present in the remainder of the
computational domain.

Figure 3.2: Plot of exact and approximate solutions (r = 1) in the case Mh = 10.

Things change drastically if we take $M_h = 41$. The result is shown in Fig. 3.3.

Oscillations have disappeared, and it can be checked that $\max_{x \in [0,1]} u_h(x) = 1$
(recall that $\sigma = 1$), so that the DMP is satisfied. However, a closer look at $u_h$
reveals that in the neighbourhood of the layers the accuracy of the approximation
is not very good, because $h$ is not sufficiently small to “capture” the steep gradient
of $u$. On the other hand, it is quite clear that the mesh is excessively refined
outside the layers, where the exact solution is almost constant.
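
The behaviour described in Ex. 3.1.11 can be checked with a few lines of Matlab. The sketch below is not the code used to produce the figures: it assumes that BVP (3.1) reads $-\mu u'' + \sigma u = f$ with $u(0) = u(1) = 0$, that $B^r$ is the consistent P1 reaction matrix, and it uses the closed-form solution $u(x) = \sigma^{-1}\left(1 - \cosh(\lambda(x - 1/2))/\cosh(\lambda/2)\right)$, $\lambda = \sqrt{\sigma/\mu}$, which follows from these assumptions for $f = 1$. Variable names are illustrative.

```matlab
% Sketch: P1 Galerkin FE solution of -mu*u'' + sigma*u = 1, u(0) = u(1) = 0,
% on uniform meshes with Mh = 10 and Mh = 41 elements.
mu = 1e-4; sigma = 1; lam = sqrt(sigma/mu);
uex = @(x) (1 - cosh(lam*(x - 0.5))/cosh(lam/2))/sigma;   % exact solution for f = 1
Mh_list = [10 41];
umax = zeros(size(Mh_list)); err = zeros(size(Mh_list));
for k = 1:numel(Mh_list)
    Mh = Mh_list(k); h = 1/Mh; N = Mh - 1;
    e  = ones(N,1); x = (1:N)'*h;                         % internal nodes
    off = -mu/h + sigma*h/6;  dg = 2*mu/h + 2*sigma*h/3;
    B  = spdiags([off*e, dg*e, off*e], -1:1, N, N);       % B = Bd + Br
    uh = B \ (h*e);                                       % load vector F_i = h for f = 1
    umax(k) = max(uh);                                    % the DMP requires umax <= 1/sigma
    err(k)  = max(abs(uex(x) - uh));                      % discrete maximum norm of the error
    fprintf('Mh = %3d  max(uh) = %.4f  error = %.4g\n', Mh, umax(k), err(k));
end
```

For $M_h = 10$ the computed maximum exceeds $1/\sigma$, so the DMP is violated, while for $M_h = 41$ the bound holds.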
Definition 3.1.12 (Discrete maximum norm). For any grid function
$$\eta_h : \{x_i\}_{i=1}^{N_h} \to \mathbb{R},$$
we define the discrete maximum norm as
$$\|\eta_h\|_{\infty,h} := \max_{i=1,\ldots,N_h} |\eta_i|, \tag{3.14}$$
where the quantities $\eta_i$ are the nodal values of $\eta_h$. The norm (3.14) is the discrete
version of the maximum norm (F.7) and coincides with the maximum norm for a
vector (B.2) upon setting $\mathbf{x} := [\eta_1, \eta_2, \ldots, \eta_{N_h}]^T$.

Figure 3.3: Plot of exact and approximate solutions (r = 1) in the case Mh = 41.

Computing the discrete maximum norm of the error $u - u_h$ in the case of
Fig. 3.3 yields $\|u - u_h\|_{\infty,h} = 0.086$, and it can be checked that the maximum
nodal error occurs at the very first internal node. A trade-off
between numerical stability and accuracy may be obtained using a non-uniform
partition $\mathcal{T}_h$, where the mesh size is smaller close to the boundary layers and
larger elsewhere. The result of this strategy is shown in
Fig. 3.4, where the number of intervals ($M_h = 41$) is the same as in Fig. 3.3,
but with a different distribution of the element size. In particular, $h$ is equal to
$0.1/15$ in $[0, 0.1]$ and $[0.9, 1]$, and equal to $0.8/11$ in $[0.1, 0.9]$. With this choice
of the grid, the accuracy is greatly improved, and the maximum nodal error is
$\|u - u_h\|_{\infty,h} = 0.0067632$, i.e., more than ten times smaller than in the case of a
uniform mesh.
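
A minimal Matlab sketch of the non-uniform strategy is given below. It is an illustration under stated assumptions, not the code used for Fig. 3.4: BVP (3.1) is taken as $-\mu u'' + \sigma u = 1$ with homogeneous Dirichlet data, the reaction matrix is integrated exactly, and the standard P1 element matrices $\frac{\mu}{h_K}\begin{bmatrix} 1 & -1 \\ -1 & 1\end{bmatrix}$ and $\frac{\sigma h_K}{6}\begin{bmatrix} 2 & 1 \\ 1 & 2\end{bmatrix}$ are assembled element by element.

```matlab
% Sketch: P1 Galerkin FE on the non-uniform mesh with 15 + 11 + 15 elements.
mu = 1e-4; sigma = 1; lam = sqrt(sigma/mu);
xa = linspace(0, 0.1, 16); xb = linspace(0.1, 0.9, 12); xc = linspace(0.9, 1, 16);
xn = [xa, xb(2:end), xc(2:end)]';               % mesh nodes, boundary included
Mh = numel(xn) - 1;                             % number of elements
B = sparse(Mh+1, Mh+1); F = zeros(Mh+1, 1);
for k = 1:Mh
    hk = xn(k+1) - xn(k); idx = [k, k+1];
    B(idx,idx) = B(idx,idx) + mu/hk*[1 -1; -1 1] + sigma*hk/6*[2 1; 1 2];
    F(idx)     = F(idx) + hk/2*[1; 1];          % f = 1
end
in = 2:Mh;                                      % internal nodes (Dirichlet nodes removed)
uh = zeros(Mh+1, 1); uh(in) = B(in,in) \ F(in);
uex = (1 - cosh(lam*(xn - 0.5))/cosh(lam/2))/sigma;
err = max(abs(uex - uh));                       % discrete maximum norm of the error
fprintf('Mh = %d, nodal error = %.5g\n', Mh, err);
```

The computed nodal error can be compared with the value 0.0067632 reported above.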

Figure 3.4: Plot of exact and approximate solutions (r = 1) in the case Mh = 41
and where the finite element mesh is non-uniform.

3.1.4 Stabilization: the method of lumping of the reaction matrix
Ex. 3.1.11 has shown how to properly cope with a reaction-dominated problem.
Basically, oscillations are removed by taking a sufficiently small mesh size $h$.
In the case of a uniform mesh, however, this restriction may become too severe.
For instance, assuming $\sigma = 1$, if $\mu = 10^{-4}$ then condition (3.12) yields
$h_{max} = 2.45 \cdot 10^{-2}$, i.e., $M_{h,min} = 41$, while if $\mu = 10^{-6}$ we have $h_{max} = 2.45 \cdot 10^{-3}$,
i.e., $M_{h,min} = 409$, and if $\mu = 10^{-8}$ we have $h_{max} = 2.45 \cdot 10^{-4}$, i.e., $M_{h,min} = 4083$.
This trend may become even worse if the reaction-diffusion problem is to be
solved in two or three spatial dimensions, because in such a case condition (3.12)
has to be satisfied in all spatial directions, giving rise to an explosion of the total
number of mesh nodes, approximately as $M_{h,min}^d$, $d = 2, 3$.
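
The figures quoted above can be reproduced with the short Matlab sketch below. As an assumption, the number of elements is obtained by rounding $1/h_{max}$ up (`ceil`), which enforces the strict inequality in (3.12) and returns exactly the values 41, 409 and 4083 given in the text. Variable names are illustrative.

```matlab
% Sketch: mesh-size restriction (3.12) for decreasing diffusion coefficients.
sigma = 1;
mu    = [1e-4 1e-6 1e-8];
hmax  = sqrt(6*mu/sigma);        % upper limit on h from Pe_rd < 1
Mmin  = ceil(1./hmax);           % smallest uniform mesh with h < hmax (rounding up: assumption)
for k = 1:numel(mu)
    fprintf('mu = %g: hmax = %.3g, Mh_min = %d, nodes in 2D ~ %.3g, in 3D ~ %.3g\n', ...
            mu(k), hmax(k), Mmin(k), Mmin(k)^2, Mmin(k)^3);
end
```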
In order to ensure a numerically stable solution and, at the same time, keep
the computational effort at an acceptable level, a possible alternative is to
introduce a modification to the standard GFE approximation (2.1) of (3.1). The
modification belongs to the more general technique of stabilization, and consists,
in the specific case of the reaction-diffusion BVP, of the following modified
discrete problem:
find $u_h \in V_h$ such that
$$B_h(u_h, v_h) = F(v_h) \qquad \forall v_h \in V_h, \tag{3.15}$$
where $V_h$ is the finite element space $X^1_{h,0}$ and the modified bilinear form $B_h$ is
defined as
$$B_h(u_h, v_h) = \int_0^1 \mu u_h' v_h' \, dx + I_h^1(\sigma u_h v_h) \qquad \forall u_h, v_h \in V_h. \tag{3.16}$$
The proposed modification is represented by the use of the trapezoidal quadrature
rule (cf. (F.10) with $r = 1$ and $[a, b] = [0, 1]$) to compute in an approximate manner
the integral
$$\int_0^1 \sigma u_h v_h \, dx.$$

Figure 3.5: Plot of exact and approximate solutions (r = 1) in the case Mh = 10.
The lumping stabilization has been used.

The effect of the use of the trapezoidal numerical quadrature can be seen in
Fig. 3.5, which shows the computed solution uh over the same mesh as in Fig. 3.2.
No spurious oscillations are present, uh (x) satisfies the DMP and is always bounded
from above by the exact solution u(x), unlike the case of Fig. 3.3 (with Mh = 41).
Also, the maximum nodal error is equal to 0.0097595, which is far better than that
of Fig. 3.3.
Remark 3.1.13 (The linear system). The use of trapezoidal quadrature is called
reduced integration of the reaction matrix, because instead of computing exactly
the entries $B^r_{ij}$, we are deliberately introducing a quadrature error, given by (F.11)$_2$.
The advantage of such reduced integration is that the modified reaction matrix,
denoted $\widetilde{B}^r$, is diagonal and positive
$$\widetilde{B}^r_{ij} = I_h^1(\sigma \varphi_j \varphi_i) = \begin{cases} \sigma\left(\dfrac{h}{2} + \dfrac{h}{2}\right) = \sigma h & j = i \\ 0 & \text{otherwise,} \end{cases}$$
so that the explicit form of the linear algebraic system (2.4) in the (stabilized) case
is ($r = 1$, $f = 1$ and $h = 1/M_h$)
$$\begin{bmatrix} \dfrac{2\mu}{h} + \sigma h & -\dfrac{\mu}{h} & & 0 \\ -\dfrac{\mu}{h} & \dfrac{2\mu}{h} + \sigma h & \ddots & \\ & \ddots & \ddots & -\dfrac{\mu}{h} \\ 0 & & -\dfrac{\mu}{h} & \dfrac{2\mu}{h} + \sigma h \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N_h} \end{bmatrix} = h \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}. \tag{3.17}$$

Theorem 3.1.14 (DMP principle (revisited)). Problem (3.15) admits a unique solution
that satisfies the DMP (3.9) for every value of the Péclet number $\mathrm{Pe}_{rd}$.

Proof. By inspection, the modified stiffness matrix $\widetilde{B} := B^d + \widetilde{B}^r$ is a symmetric
positive definite matrix, because it is the sum of a symmetric positive definite matrix
and a positive diagonal matrix. Moreover, it is a strictly diagonally dominant matrix,
with off-diagonal entries $\leq 0$. Thus, the comparison principle shows that $\widetilde{B}$ is an
M-matrix irrespective of the Péclet number. Thm. 3.1.7 then ensures that the DMP holds.
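
The lumped system (3.17) is straightforward to assemble and solve. The Matlab sketch below does so on the coarse mesh $M_h = 10$ with the data of Ex. 3.1.11 and checks the DMP and the nodal error. The exact solution used for comparison is the closed-form expression that follows from assuming that (3.1) reads $-\mu u'' + \sigma u = 1$ with $u(0) = u(1) = 0$; variable names are illustrative.

```matlab
% Sketch: lumped (trapezoidal) reaction matrix, system (3.17), Mh = 10.
mu = 1e-4; sigma = 1; lam = sqrt(sigma/mu);
Mh = 10; h = 1/Mh; N = Mh - 1;
e  = ones(N,1); x = (1:N)'*h;                              % internal nodes
Bt = spdiags([-mu/h*e, (2*mu/h + sigma*h)*e, -mu/h*e], -1:1, N, N);
uh = Bt \ (h*e);                                           % right-hand side h*[1,...,1]^T
uex = (1 - cosh(lam*(x - 0.5))/cosh(lam/2))/sigma;         % exact solution for f = 1
dmp_ok = all(uh >= 0) && all(uh <= 1/sigma + 1e-12);       % DMP (3.9)
err = max(abs(uex - uh));                                  % discrete maximum norm
fprintf('DMP satisfied: %d, nodal error = %.5g\n', dmp_ok, err);
```

The nodal error obtained in this way agrees with the value 0.0097595 quoted for Fig. 3.5.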

Remark 3.1.15 (The lumping procedure). The diagonalization of the reaction matrix
is often called lumping because the entries $\widetilde{B}^r_{ii}$ can be interpreted as obtained
by summing by row the matrix $B^r$. This is equivalent to “lumping” the weight of the
reaction term into each mesh node $x_i$. Such an approach can be extended to
higher-degree finite elements ($r \geq 2$), but no theoretical proof of the DMP can be easily
established.
Remark 3.1.16 (The error). The integration error (F.11)$_2$ associated with the use
of the trapezoidal quadrature formula is $O(h^2)$. This allows one to show (using a
generalized form of Céa’s Lemma known as Strang’s Lemma) that for $r = 1$
$$\begin{aligned} \|u - u_h\|_V &\leq C h \|u\|_{H^2(\Omega)} \\ \|u - u_h\|_{L^2(\Omega)} &\leq C h^2 \|u\|_{H^2(\Omega)} \end{aligned} \tag{3.18}$$
provided $u \in H^2(\Omega) \cap H_0^1(\Omega)$.

3.2 Advection-diffusion model problem

Let us consider the following two-point BVP:
$$\begin{cases} -\mu u'' + a u' = f & \text{in } \Omega = (0, 1) \\ u(0) = 0 \\ u(1) = 0, \end{cases} \tag{3.19}$$
where $a$ is the advection coefficient. Throughout the section, unless otherwise
stated, we assume $a > 0$, although completely similar analysis and results hold
in the case $a < 0$. The equation system (3.19) is commonly referred to as an
advection-diffusion (AD) problem, and represents a simple model of a chemical
substance whose concentration is $u$, that diffuses in a fluid with diffusion coefficient
$\mu$ and velocity $a$, and interacts with the environment according to a net
production term given by $f$. Setting for simplicity $f = 1$, the exact solution of the
problem is
$$u(x) = \frac{1}{a}\left(x - \frac{e^{\alpha x} - 1}{e^{\alpha} - 1}\right) \tag{3.20}$$
where $\alpha := a/\mu$. A plot of $u$ is reported in Fig. 3.6 for three increasing values of
$\alpha$, corresponding to $a = 1$ and $\mu = 1, 10^{-1}, 10^{-2}$ (the values used in the Matlab script reported below).
Figure 3.6: Plot of the solution of the advection-diffusion model problem as a
function of α.

Fig. 3.6 shows that $u$ tends to the linearly varying function $y(x) = x/a$, as
$\alpha \to \infty$, except for an interval close to $x = 1$, denoted boundary layer, where a
steep variation of $u$ occurs to satisfy the boundary condition $u(1) = 0$. As a matter
of fact, assuming $\alpha \gg 1$ and expanding the exponential terms in (3.20), we get
$$u(x) \simeq \frac{1}{a}\left(x - e^{\alpha(x-1)}\right).$$
The solution behaves, asymptotically, as the sum of two contributions: a linear
function and an exponential term that becomes significant as $x$ gets closer to 1.
The exponential term is called boundary layer term because it is responsible for
the variation of $u$ from the value $1/a$ to the boundary condition $u(1) = 0$. As in the
reaction-diffusion case, it can be proved that the width $\delta$ of the boundary layer is
of the order of $\alpha^{-1}$, therefore it gets smaller and smaller if diffusion is dominated
by advection in the model (3.19).
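
The asymptotic form of (3.20) can be justified with one line of algebra. The following worked step is a sketch of the argument, written under the only assumption $\alpha \gg 1$:

```latex
\frac{e^{\alpha x} - 1}{e^{\alpha} - 1}
  = e^{\alpha (x - 1)} \, \frac{1 - e^{-\alpha x}}{1 - e^{-\alpha}}
  \simeq e^{\alpha (x - 1)}
  \qquad (\alpha \gg 1),
\qquad\text{so that}\qquad
u(x) \simeq \frac{1}{a}\left( x - e^{\alpha (x - 1)} \right).
```

The factor $(1 - e^{-\alpha x})/(1 - e^{-\alpha})$ differs appreciably from 1 only for $x = O(\alpha^{-1})$, where $e^{\alpha(x-1)}$ is itself of order $e^{-\alpha}$ and hence negligible. Moreover, $e^{\alpha(x-1)}$ decays by a factor $e^{-1}$ over a distance $1 - x = \alpha^{-1}$ from the outflow point, which is consistent with a layer width $\delta$ of order $\alpha^{-1}$.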

Definition 3.2.1. Model (3.19) is said to be advection-dominated if $\alpha \gg 1$,
otherwise it is called diffusion-dominated.

Remark 3.2.2 (Inflow/outflow boundaries). The solution of the advection-diffusion

model problem is a non symmetric function of x. This is due to the fact that the
advection term a introduces a preferential direction to the motion of u. In this
case, a is positive so that the barycenter of the front u is drifted from left to right,
as α increases. We distinguish between inflow boundary (the point x = 0), where

we have a · n < 0 (here, n = −1) and outflow boundary (the point x = 1), where
we have a · n > 0 (here, n = +1), having denoted by n the outward unit normal on
∂ Ω.
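Matlab coding. The Matlab script for computing and plotting u is reported below.
close all
x = 0:0.0001:1;                 % evaluation points in [0,1]
b = 1; mu = [1, 1e-1, 1e-2];    % advection coefficient (a in the text) and diffusion values
a = b./mu;                      % alpha = a/mu, one value for each mu
u = zeros(numel(a), numel(x));  % one row of u per value of alpha
for k = 1:numel(a)
    % exact solution (3.20)
    u(k,:) = 1/b*(x - (exp(a(k)*x)-1)/(exp(a(k))-1));
end
plot(x,u(1,:),'k-.',x,u(2,:),'k--',x,u(3,:),'k-')
xlabel('x')
legend('\alpha=1','\alpha=10','\alpha=100')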

Remark 3.2.3 (The conservative form of the AD problem). In many applications,
the BVP (3.19) is written in the following form of a first-order system:
$$\begin{cases} (J(u))' = f & \text{in } \Omega = (0, 1) \\ J(u) = -\mu u' + a u & \text{in } \Omega \\ u(0) = 0 \\ u(1) = 0. \end{cases} \tag{3.21}$$
The above system is called the conservative form of the AD problem. In particular,
Eq. (3.21)$_1$ represents the law of conservation of the advective-diffusive flux $J(u)$
defined in Eq. (3.21)$_2$. The two formulations (3.19) and (3.21) are completely
equivalent in the case where $\mu$ and $a$ are constant coefficients.

3.2.1 Weak formulation

Let $V := H_0^1(0, 1)$ with norm given by (C.28). Assume also that $f$ is a given
function in $L^2(0, 1)$. Multiplying (3.19)$_1$ by an arbitrary test function $v \in V$ and
integrating by parts the term $\int_0^1 -\mu u'' v \, dx$, we get the following weak formulation
of the advection-diffusion BVP:

find $u \in V$ such that
$$\underbrace{\int_0^1 \mu u' v' \, dx + \int_0^1 a u' v \, dx}_{B(u,v)} = \underbrace{\int_0^1 f v \, dx}_{F(v)} \qquad \forall v \in V. \tag{3.22}$$

Theorem 3.2.4. Problem (3.22) is uniquely solvable and $u$ satisfies the same
estimate (3.4) as for the RD problem.

Proof. Let us check that $B$ satisfies the conditions required by the Lax-Milgram
Lemma, because $F(\cdot)$ is the same as in the case of the RD problem.

• continuity of $B$:
$$\begin{aligned} |B(u, v)| &\leq \mu \int_0^1 |u'| \, |v'| \, dx + a \int_0^1 |u'| \, |v| \, dx \\ &\leq \mu \|u\|_V \|v\|_V + a \|u\|_V \|v\|_{L^2(0,1)} \\ &\leq (\mu + a C_P) \|u\|_V \|v\|_V \qquad \forall u, v \in V, \end{aligned}$$
from which $M = \mu + a C_P$;

• coercivity of $B$:
$$B(u, u) = \mu \|u\|_V^2 + a \int_0^1 u' u \, dx \qquad \forall u \in V.$$
We have
$$u' u = \frac{1}{2} (u^2)'$$
so that
$$\int_0^1 u' u \, dx = \frac{1}{2} \int_0^1 (u^2)' \, dx = 0$$
because $u \in H_0^1(0, 1)$. Thus $\beta = \mu$.

Since all the assumptions of the Lax-Milgram Lemma are verified, the solution $u$
of (3.22) exists and is unique in $V$, and satisfies (3.4).

3.2.2 Galerkin finite element approximation

The GFE discretization of (3.19) is (2.1) and the corresponding matrix form is (2.4)
with
$$B_{ij} = \int_0^1 \mu \varphi_j' \varphi_i' \, dx + \int_0^1 a \varphi_j' \varphi_i \, dx \qquad i, j = 1, \ldots, N_h$$
$$F_i = \int_0^1 f \varphi_i \, dx \qquad i = 1, \ldots, N_h.$$

The stiffness matrix $B$ is positive definite by virtue of Thm. 2.1.2, so that the
discrete problem (2.1) admits a unique solution. Moreover, Céa’s Lemma tells us
that, assuming $u \in H^{r+1}(0, 1)$,
$$\|u - u_h\|_V \leq C \left(1 + \frac{a}{\mu}\right) h^r \|u\|_{H^{r+1}(0,1)}. \tag{3.23}$$
Thus, we see that $u_h$ converges to $u$, in the $H^1$-norm, as $h \to 0$, and that accuracy
improves as the polynomial order is increased, because of the optimal regularity
of the exact solution.
Remark 3.2.5 (The advection-dominated case). Let us examine the estimate (3.23)
as a function of the model parameters $\mu$ and $a$. For this, we set $r = 1$ (piecewise linear
finite elements) and fix a sufficiently small tolerance $\varepsilon$. Then, we see that in order
for the error to be $\simeq \varepsilon$, we need that
$$h \simeq \frac{\varepsilon}{C(1 + \alpha)\|u\|_{H^2(0,1)}}.$$
Therefore, if the advection-diffusion problem is advection-dominated ($\alpha \gg 1$),
the mesh size needs to be much smaller than the tolerance $\varepsilon$, in accordance with
Fig. 3.6, because the boundary layer effect at $x = 1$ becomes increasingly significant
as $\alpha$ gets larger and the AD problem becomes “tougher” to solve. In
conclusion, we expect possible difficulties to occur in the GFE approximation of
the AD two-point boundary value problem (3.19) when $\mu \ll |a|$.

3.2.3 The linear system and the discrete maximum principle

We now examine in more detail the GFE problem for the AD equation, and consider
the case $r = 1$ with a uniform partition of $[0, 1]$ into $M_h \geq 2$ elements of size
$h = 1/M_h$. Matrix $B$ can be written as the sum of the diffusion matrix $B^d$ and of
the advection matrix
$$B^a_{ij} = \int_0^1 a \varphi_j' \varphi_i \, dx = \begin{cases} -\dfrac{a}{2} & j = i - 1 \\ 0 & j = i \\ +\dfrac{a}{2} & j = i + 1 \\ 0 & \text{otherwise.} \end{cases}$$

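In conclusion, the explicit form of the linear algebraic system (2.4) in the case
$r = 1$, $h = 1/M_h$ and $f = 1$ is
$$\begin{bmatrix} \dfrac{2\mu}{h} & -\dfrac{\mu}{h} + \dfrac{a}{2} & & 0 \\ -\dfrac{\mu}{h} - \dfrac{a}{2} & \dfrac{2\mu}{h} & \ddots & \\ & \ddots & \ddots & -\dfrac{\mu}{h} + \dfrac{a}{2} \\ 0 & & -\dfrac{\mu}{h} - \dfrac{a}{2} & \dfrac{2\mu}{h} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N_h} \end{bmatrix} = h \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}. \tag{3.24}$$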
We notice that in the case of the AD problem the stiffness matrix B is no longer
symmetric as in the case of the RD problem. As a matter of fact, the diffusion
matrix Bd is symmetric (and positive definite), while the advection matrix Ba is
not symmetric.
In the case f = 1, we immediately see that a barrier function for the AD prob-
lem is φ (x) = x/a, so that the application of the comparison principle A.1.3 allows
us to conclude the following result.

Theorem 3.2.6 (Maximum principle for the AD problem). The solution of (3.19)
satisfies the maximum principle (MP)
$$0 \leq u(x) \leq \frac{1}{a} \qquad \forall x \in [0, 1]. \tag{3.25}$$
Remark 3.2.7. The analysis of Thm. 3.2.6 is confirmed by Fig. 3.6 where a =
1. The conclusions of Rem. A.1.5 apply obviously also to the case of the AD
problem.
Let us now investigate under which conditions the finite element approximate
solution $u_h$ of (3.19) enjoys a DMP, by applying the discrete comparison principle
(Thm. 3.1.8). As in the RD case, we need to ascertain that $B_{ij} \leq 0$ for $i \neq j$. The entries $B_{i,i-1}$
are $\leq 0$ because $a > 0$, while the coefficients $B_{i,i+1}$ are nonpositive iff
$$-\frac{\mu}{h} + \frac{a}{2} \leq 0,$$
that is, if the following (more conservative) condition is satisfied
$$\mathrm{Pe}_{ad} < 1, \tag{3.26}$$

where
$$\mathrm{Pe}_{ad} := \frac{a h}{2\mu} \tag{3.27}$$
is the non-dimensional Péclet number associated with the advection-diffusion
problem.
Remark 3.2.8 (Definition of Péclet number). If $a < 0$, the definition of the Péclet
number becomes
$$\mathrm{Pe}_{ad} := \frac{|a| h}{2\mu}. \tag{3.28}$$
Theorem 3.2.9 (DMP principle). Assume that (3.26) is satisfied. Then $B = B^d + B^a$
is an M-matrix and $u_h$ satisfies the DMP
$$0 \leq u_h(x) \leq \frac{1}{a} \qquad \forall x \in [0, 1]. \tag{3.29}$$
Proof. It suffices to apply the discrete comparison principle with
$\mathbf{e} = [1, 1, \ldots, 1]^T \in \mathbb{R}^{N_h}$ and then apply Thm. 3.1.7.
Example 3.2.10 (Advection-dominated problem). The upper bound on the Péclet
number (3.26) is equivalent to enforcing an upper limiting value for the mesh size
$h$, that is
$$h < \frac{2\mu}{a} := h_{max}. \tag{3.30}$$
The corresponding minimum number of (uniform) mesh elements is
$$M_{h,min} = \mathrm{round}\left(\frac{1}{h_{max}}\right) = \mathrm{round}\left(\frac{a}{2\mu}\right). \tag{3.31}$$
If $h$ does not satisfy (3.30), then spurious oscillations are likely to occur in the
computed solution $u_h$, preventing the DMP from being verified. To see this, we
study BVP (3.19) in the case where $\mu = 5 \cdot 10^{-3}$, $a = 1$ and $f = 1$, and start by
taking a uniform partition of $[0, 1]$ into $M_h = 10$ elements, i.e., $h = 1/10 = 0.1$. Condition (3.30)
would require $h < 10^{-2}$, i.e., $M_h \geq 100$, to guarantee a DMP for $u_h$; therefore, we
expect numerical difficulties to occur. This is exactly what is shown in Fig. 3.7.
Wild oscillations arise in the neighbourhood of the boundary layer, at $x = 1$, and
propagate throughout the whole computational domain, completely polluting the
correct behaviour of the exact solution.
Figure 3.7: Plot of exact and approximate solutions (r = 1) in the case µ = 5·10−3
and Mh = 10.

Things change drastically if we take $M_h = M_{h,min} = 100$. The result is shown in
Fig. 3.8. Oscillations have disappeared, and it can be checked that
$\max_{x \in [0,1]} u_h(x) = 0.99$, in agreement with Thm. 3.2.9 which predicts
$u_h(x) \leq 1/a = 1$ for all $x \in [0, 1]$, so that the DMP is satisfied.
Figure 3.8: Plot of exact and approximate solutions (r = 1) in the case µ = 5·10−3
and Mh = 100.

A closer look at $u_h$ reveals that in the neighbourhood of the outflow boundary
layer the accuracy of the approximation is not very good, because $h$ is not
sufficiently small to “capture” the steep gradient of $u$. On the other hand, it is
quite clear that the mesh is excessively refined outside the layer, where the
exact solution is practically linear. To clarify this issue in a quantitative manner,
we compute the discrete maximum norm of the error $u - u_h$ in the case of Fig. 3.8,
obtaining $\|u - u_h\|_{\infty,h} = 0.13534$, and it can be checked that the maximum nodal
error occurs at the last internal node. A trade-off between numerical
stability and accuracy may be obtained using a non-uniform partition $\mathcal{T}_h$,
where the mesh size is smaller close to the boundary layer and larger elsewhere.
The result of this strategy is shown in Fig. 3.9, where the
number of intervals ($M_h = 100$) is the same as in Fig. 3.8, but with a different
distribution of the element size. In particular, $h$ is equal to 0.1 in $[0, 0.9]$ and equal
to $0.1/90$ in $[0.9, 1]$. With this choice of the grid, the accuracy is greatly improved,
and the maximum nodal error is $\|u - u_h\|_{\infty,h} = 0.001513$, i.e., almost 100 times
smaller than in the case of a uniform mesh.
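
The uniform-mesh results of Ex. 3.2.10 can be reproduced with the Matlab sketch below, which assembles system (3.24) for $M_h = 10$ and $M_h = 100$ and compares the nodal values with the exact solution (3.20). It is an illustration with hypothetical variable names, not the code used to produce Figs. 3.7 and 3.8.

```matlab
% Sketch: standard P1 Galerkin FE for -mu*u'' + a*u' = 1, u(0) = u(1) = 0.
mu = 5e-3; a = 1; alpha = a/mu;
uex = @(x) (x - (exp(alpha*x) - 1)/(exp(alpha) - 1))/a;    % exact solution (3.20)
Mh_list = [10 100];
umax = zeros(size(Mh_list)); err = zeros(size(Mh_list));
for k = 1:numel(Mh_list)
    Mh = Mh_list(k); h = 1/Mh; N = Mh - 1;
    e  = ones(N,1); x = (1:N)'*h;                          % internal nodes
    B  = spdiags([(-mu/h - a/2)*e, (2*mu/h)*e, (-mu/h + a/2)*e], -1:1, N, N);  % (3.24)
    uh = B \ (h*e);
    umax(k) = max(uh);                                     % the DMP requires umax <= 1/a
    err(k)  = max(abs(uex(x) - uh));                       % discrete maximum norm
    fprintf('Mh = %3d  Pe_ad = %5.2f  max(uh) = %.4f  error = %.5g\n', ...
            Mh, a*h/(2*mu), umax(k), err(k));
end
```

For $M_h = 10$ the computed maximum exceeds $1/a$, while for $M_h = 100$ the values quoted in the text (maximum 0.99, nodal error 0.13534) are recovered.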

3.2.4 Stabilization: the method of artificial diffusion

Ex. 3.2.10 has shown how to properly cope with an advection-dominated problem.
Basically, oscillations are removed by taking a sufficiently small mesh size
$h$. In the case of a uniform mesh, however, this restriction may become too severe.
For instance, assuming $a = 1$, if $\mu = 5 \cdot 10^{-4}$ then condition (3.30) yields
$h_{max} = 10^{-3}$, i.e., $M_{h,min} = 1000$, while if $\mu = 5 \cdot 10^{-6}$ we have $h_{max} = 10^{-5}$, i.e.,
$M_{h,min} = 100000$, and if $\mu = 5 \cdot 10^{-8}$ we have $h_{max} = 10^{-7}$, i.e., $M_{h,min} = 10^7$
(ten million elements!). This trend may become even worse if the advection-diffusion
problem is to be solved in two or three spatial dimensions, as already
commented in the case of the RD equation.

Figure 3.9: Plot of exact and approximate solutions (r = 1) in the case µ = 5·10−3
and Mh = 100, where the finite element mesh is non-uniform.

In order to ensure a numerically stable solution and, at the same time, keep
the computational effort at an acceptable level, a possible alternative is to resort
to a stabilized form of the GFEM as in (3.15), where the modified bilinear form
$B_h$ is defined as
$$B_h(u_h, v_h) = B(u_h, v_h) + \underbrace{\int_0^1 \mu \Phi(\mathrm{Pe}_{ad}) u_h' v_h' \, dx}_{b_h(u_h, v_h)} \qquad \forall u_h, v_h \in V_h. \tag{3.32}$$

The additional contribution $b_h$ represents the discrete weak form of the artificial
diffusion term
$$-\mu \Phi(\mathrm{Pe}_{ad}) u''$$
so that the generalized GFEM (3.15), with $B_h$ as in (3.32), can be regarded as the
standard GFE approximation of the modified advection-diffusion two-point BVP:
$$\begin{cases} -\mu (1 + \Phi(\mathrm{Pe}_{ad})) u'' + a u' = f & \text{in } \Omega = (0, 1) \\ u(0) = 0 \\ u(1) = 0. \end{cases} \tag{3.33}$$

Clearly, in order to ensure numerical stability, the stabilization function $\Phi$ has to
be designed with care. In particular, it is required that:
$$\Phi(t) \geq 0 \quad \text{if } t \geq 0; \tag{3.34a}$$
$$\lim_{t \to 0^+} \Phi(t) = 0. \tag{3.34b}$$
Condition (3.34a) amounts to requiring that if no advection term is present in the
model ($a = 0$) then $\Phi = 0$, so that no artificial diffusion is introduced.
Condition (3.34b) tells us that the amount of artificial diffusion has to be small when the
advective field strength $|a|$ is small, so that the modified AD problem (3.33) is
a small perturbation of the original problem (3.19) and, consequently, the
computed solution $u_h$ is a sufficiently accurate approximation of the exact solution
of (3.19).

Definition 3.2.11 (Péclet number of the modified AD problem). Let
$$\mu_h := \mu (1 + \Phi(\mathrm{Pe}_{ad})) \tag{3.35}$$
denote the modified diffusion coefficient. Then, the Péclet number of the modified
AD problem is
$$\widetilde{\mathrm{Pe}}_{ad} := \frac{a h}{2\mu_h} = \frac{a h}{2\mu (1 + \Phi(\mathrm{Pe}_{ad}))} = \frac{\mathrm{Pe}_{ad}}{1 + \Phi(\mathrm{Pe}_{ad})}. \tag{3.36}$$

Proposition 3.2.12 (Choice of Φ). Let $\mathrm{Pe}_{ad}$ be the Péclet number associated with
the AD problem (3.19) on a given triangulation $\mathcal{T}_h$. Then, we have
$$\Phi(\mathrm{Pe}_{ad}) > \mathrm{Pe}_{ad} - 1 \;\Rightarrow\; \widetilde{\mathrm{Pe}}_{ad} < 1. \tag{3.37}$$

Using the above proposition we have the following result.

Theorem 3.2.13 (Stability of the GFEM with artificial diffusion). If the stabiliza-
tion function Φ satisfies (3.37), then the solution uh computed by the GFEM with
artificial diffusion satisfies the DMP for every value of Pead .

Among the possible choices for $\Phi$, we consider here two alternatives:

• Upwind (UP) stabilization function
$$\Phi_{UP}(t) := t \qquad t \geq 0 \tag{3.38}$$

• Scharfetter-Gummel (SG) stabilization function
$$\Phi_{SG}(t) := t - 1 + B(2t) \qquad t \geq 0 \tag{3.39}$$
where
$$B(t) := \frac{t}{e^t - 1}$$
is the inverse of the Bernoulli function, such that $B(-t) = t + B(t)$. Expanding
$e^t$ in Maclaurin series, we see that $B(0) = 1$. Moreover, we have
the following asymptotic behavior as $|t| \to \infty$:
$$B(t) \simeq \begin{cases} 0 & t > 0 \\ -t & t < 0 \end{cases} \qquad\qquad B(-t) \simeq \begin{cases} t & t > 0 \\ 0 & t < 0. \end{cases}$$

A plot of $B(t)$ and $B(-t)$ is reported in Fig. 3.10.

Figure 3.10: Plot of B(x).
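
The two stabilization functions are easy to evaluate numerically. The Matlab sketch below defines $B(t)$, $\Phi_{UP}$ and $\Phi_{SG}$ as anonymous functions and tabulates the modified Péclet number (3.36) for a few values of $t = \mathrm{Pe}_{ad}$. The use of `expm1` and the explicit treatment of $t = 0$ are implementation choices of this sketch (not taken from the text), made to avoid cancellation and the indeterminate form $0/0$. The comparison value $t^2/3$ is the leading term obtained from the Maclaurin series $B(x) = 1 - x/2 + x^2/12 - \ldots$

```matlab
% Sketch: Bernoulli-type function and the UP / SG stabilization functions.
bern  = @(t) (t == 0) + (t ~= 0).*t./expm1(t + (t == 0));  % B(t) = t/(e^t - 1), B(0) = 1
phiUP = @(t) t;                                            % upwind, (3.38)
phiSG = @(t) t - 1 + bern(2*t);                            % Scharfetter-Gummel, (3.39)
t = [0 0.1 1 10];                                          % sample Peclet numbers
PeUP = t./(1 + phiUP(t));                                  % modified Peclet number (3.36)
PeSG = t./(1 + phiSG(t));
disp([t; phiUP(t); phiSG(t); PeUP; PeSG]')                 % columns: t, Phi_UP, Phi_SG, Pe~_UP, Pe~_SG
fprintf('Phi_SG(0.1) = %.6f,  0.1^2/3 = %.6f\n', phiSG(0.1), 0.1^2/3);
```

In all the sampled cases the modified Péclet number stays below 1, in agreement with (3.37), while for small $t$ the SG function introduces much less artificial diffusion than the upwind one.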

Theorem 3.2.14 (DMP principle (revisited)). Problem (3.15) with the UP and SG
stabilizations admits a unique solution that satisfies the DMP (3.29) for every value
of the Péclet number $\mathrm{Pe}_{ad}$.
Proof. The stiffness matrix $\widetilde{B} = \widetilde{B}^d + B^a$ is positive definite, so that (3.15) is
uniquely solvable. From definitions (3.38) and (3.39), we see that both stabilization
functions satisfy Prop. 3.2.12, so that the UP and SG generalized GFE
formulations both satisfy the DMP due to Thm. 3.2.13.

Using the asymptotic properties of (3.39), we can also see that $\Phi_{SG}(\mathrm{Pe}_{ad}) \simeq
\Phi_{UP}(\mathrm{Pe}_{ad})$ for large values of the Péclet number, so that the two schemes produce
the same approximation. Things behave very differently if $\mathrm{Pe}_{ad}$ is small. As a
matter of fact, for $t \to 0$ we have
$$\Phi_{SG}(t) \simeq t - 1 + \frac{2t}{1 + 2t + \dfrac{4t^2}{2} - 1} = t - 1 + \frac{1}{1 + t} = \frac{t^2}{1 + t} = O(t^2),$$
which shows that the amount of artificial diffusion introduced by the SG stabilization
is far smaller than that of the UP stabilization, for which we have
$$\Phi_{UP}(t) = t \qquad t \to 0.$$
Therefore, in a moderately to weakly advection-dominated regime, we expect the
SG and UP stabilized GFEMs to produce substantially different approximations,
the one computed by the SG method being more accurate than
that of the UP method. As a matter of fact we have the following result.
Theorem 3.2.15 (Convergence in the discrete maximum norm). Assume that $\mu$, $a$
and $f$ are given continuous functions on $[0, 1]$, and let $h$ be sufficiently small. Then,
the UP and SG stabilized GFEMs are both convergent in the discrete maximum
norm and we have
$$\|u - u_h\|_{\infty,h} \leq C h \qquad \text{Upwind stabilization,} \tag{3.40a}$$
$$\|u - u_h\|_{\infty,h} \leq C h^2 \qquad \text{SG stabilization.} \tag{3.40b}$$

A more striking result holds for the SG stabilization.
Theorem 3.2.16 (Nodal exactness). Let $r = 1$. Assume that $\mu > 0$, $a$ and $f$ are
given constants (with $f = 1$), and that $\mathcal{T}_h$ is a uniform partition of $\Omega$. Then the
solution $u_h \in V_h = X^1_{h,0}$ of the SG GFEM satisfies the following relation
$$u_h(x_i) = u(x_i) = \frac{1}{a}\left(x_i - \frac{e^{\alpha x_i} - 1}{e^{\alpha} - 1}\right) \qquad i = 1, \ldots, N_h. \tag{3.41}$$
Remark 3.2.17. Relation (3.41) expresses the property of nodal exactness of the
SG GFEM, and tells us that the approximate solution $u_h$ computed by the SG
stabilized formulation coincides with the $\Pi_h^1$ interpolant of the exact solution of
the AD BVP (3.19). Because of this exceptional accuracy, the SG method is also
known as the exponential fitting method or the optimal
artificial diffusion method.

The convergence performance in the $V$-norm of the stabilized formulations is
illustrated by the following result.

Theorem 3.2.18 (Convergence in the V-norm). Let $u \in H^{r+1}(\Omega)$ be the solution
of (3.19) and $u_h \in V_h = X^r_{h,0}$ be the corresponding approximation computed by the
generalized GFEM (3.15) with artificial diffusion. Then, for $h$ sufficiently small,
we have
$$\|u - u_h\|_V \leq C \frac{h^r}{\mu (1 + \Phi(\mathrm{Pe}_{ad}))} \|u\|_{H^{r+1}(\Omega)} + \frac{\Phi(\mathrm{Pe}_{ad})}{1 + \Phi(\mathrm{Pe}_{ad})} \|u\|_{H^{r+1}(\Omega)}. \tag{3.42}$$

Figure 3.11: Plot of exact and approximate solutions (r = 1) in the case µ = 5·10−3
and Mh = 10. Left (a): upwind stabilization, Φ = ΦUP. Right (b): SG stabilization,
Φ = ΦSG. The SG scheme is nodally exact in the special case of constant
coefficients and uniform grid.

Remark 3.2.19. Three important conclusions can be drawn from (3.42):

1. the stabilized GFEM converges to the exact solution of (3.19) because of
property (3.34b);

2. the stabilized GFEM with $\Phi = \Phi_{UP}$ has order of convergence equal to 1
irrespective of the polynomial degree $r$;

3. the choice $\Phi = \Phi_{SG}$ is optimal for $r = 1, 2$, while it is sub-optimal if $r \geq 3$.


Fig. 3.11 shows the numerical solutions computed on a very coarse grid by the
UP and SG methods. We see that both approximations satisfy the DMP, the UP
approximation being not very accurate in the layer region while the SG
approximation is nodally exact. The $L^2$ error, the $H^1$ error and the error in the discrete
maximum norm are equal to 0.17371, 3.2014 and 0.047619 for the UP method,
while they are equal to 0.16871, 3.2559 and 7.7716e-16 (round-off error) for the
SG method.
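
The nodal errors reported above can be checked with the following Matlab sketch, which solves the stabilized problem on the coarse uniform mesh $M_h = 10$ by replacing $\mu$ with the modified coefficient $\mu_h$ of (3.35) in system (3.24). It only computes the error in the discrete maximum norm, and the variable names are illustrative.

```matlab
% Sketch: GFEM with artificial diffusion (UP and SG), mu = 5e-3, a = 1, Mh = 10.
mu = 5e-3; a = 1; alpha = a/mu;
Mh = 10; h = 1/Mh; N = Mh - 1;
e  = ones(N,1); x = (1:N)'*h;                              % internal nodes
uex  = (x - (exp(alpha*x) - 1)/(exp(alpha) - 1))/a;        % exact solution (3.20)
bern = @(t) t./expm1(t);                                   % B(t), valid for t ~= 0
Pe   = a*h/(2*mu);                                         % Peclet number (3.27)
Phi  = [Pe, Pe - 1 + bern(2*Pe)];                          % [Phi_UP, Phi_SG]
err  = zeros(1,2);
for k = 1:2
    muh = mu*(1 + Phi(k));                                 % modified diffusion (3.35)
    B   = spdiags([(-muh/h - a/2)*e, (2*muh/h)*e, (-muh/h + a/2)*e], -1:1, N, N);
    uh  = B \ (h*e);
    err(k) = max(abs(uex - uh));                           % discrete maximum norm
end
fprintf('nodal error: UP = %.6f, SG = %.3g\n', err(1), err(2));
```

The upwind error matches the value 0.047619 quoted above, while the SG error is at round-off level, as predicted by Thm. 3.2.16.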
Chapter 4

2D Implementation of the GFEM

Abstract
In this chapter, we describe the principal issues that characterize the implementation
of the GFEM in a computational algorithm. For ease of presentation, we
restrict the analysis to the case where the domain $\Omega$ is a 2D polygon, with Lipschitz
boundary $\Gamma = \partial\Omega$ and outward unit normal vector $\mathbf{n}$. The model boundary
value problem considered in this chapter is:
$$\begin{cases} Lu := \mathrm{div}(-\mu \nabla u) + \mathbf{a} \cdot \nabla u + \sigma u = f & \text{in } \Omega \\ u = 0 & \text{on } \Gamma, \end{cases} \tag{4.1}$$
where $\mu$, $\sigma$ and $f$ are continuous functions in $\Omega$, $\mathbf{a}$ is a continuously differentiable
vector field over $\Omega$ and $g$ is a continuous function over $\Gamma$. The BVP (4.1) is
a representative example of a diffusion-advection-reaction differential problem.
The advection field $\mathbf{a}$ is a given two-dimensional vector field with components
$a_x = a_x(x, y)$, $a_y = a_y(x, y)$, having denoted with $\mathbf{x} = (x, y)^T$ the position vector in
$\Omega$. The reaction coefficient $\sigma$ is $\geq 0$ over $\Omega$ while the diffusion coefficient $\mu$ is
such that $0 < \mu_{min} \leq \mu(\mathbf{x}) \leq \mu_{max}$ for all $\mathbf{x} \in \Omega$. Dirichlet boundary conditions
are assumed only for ease of presentation of the method. For more general types
of boundary conditions we refer to Chapt. 1 and to the documentation of the 2D
finite element code EF2D that has been used to show the performance of the 2D
GFEM on test cases for which the exact solution is available. The code, developed
by Marco Restelli, can be downloaded at the link:

⇒ https://beep.metid.polimi.it/

4.1 Weak formulation

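Assume for simplicity $g = 0$ and set $V := H_0^1(\Omega)$, endowed with the norm $\|\cdot\|_V$
given by (C.36). Then, the weak formulation of (4.1) is:
find $u \in V$ such that
$$\underbrace{\int_\Omega \left( \mu \nabla u \cdot \nabla v + (\mathbf{a} \cdot \nabla u) v + \sigma u v \right) d\Omega}_{B(u,v)} = \underbrace{\int_\Omega f v \, d\Omega}_{F(v)} \qquad \forall v \in V. \tag{4.2}$$

Then, under the following (sufficient) condition
$$\sigma(\mathbf{x}) - \frac{1}{2} \mathrm{div}\, \mathbf{a}(\mathbf{x}) \geq 0 \qquad \forall \mathbf{x} \in \Omega, \tag{4.3}$$
it is immediate, using the Lax-Milgram Lemma, to check that (4.2) admits a
unique (weak) solution such that
$$\|u\|_V \leq \frac{C_P |\Omega|^{1/2} \|f\|_{C^0(\Omega)}}{\mu_{min}} \tag{4.4}$$
where $|\Omega|$ denotes the area of the domain $\Omega$.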

4.2 Geometrical discretization

Let us introduce a geometrical partition of $\Omega$ into a family of triangulations $\mathcal{T}_h$,
$h > 0$, made of triangular elements $K$, such that $h_K = \mathrm{diam}(K)$ and $h = \max_{K \in \mathcal{T}_h} h_K$.
For each $K \in \mathcal{T}_h$, we denote by $\rho_K$ the “sphericity” of $K$, i.e., the diameter of the
largest circle that can be inscribed within $K$ (see Fig. 4.1(a)).

Definition 4.2.1 (Aspect ratio). For each element $K \in \mathcal{T}_h$, we define the aspect
ratio as
$$R_K := \frac{h_K}{\rho_K}. \tag{4.5}$$
Definition 4.2.2 (Regular triangulation). A triangulation $\mathcal{T}_h$ is said to be regular
if there exists a positive constant $\kappa$, independent of $h$, such that
$$R_K \leq \kappa \qquad \forall K \in \mathcal{T}_h. \tag{4.6}$$

(a) Regular triangle (b) Triangle with large aspect ratio

Figure 4.1: Geometrical notation of a triangulation.

Remark 4.2.3. The regularity condition practically means that no triangle of $\mathcal{T}_h$
can have an arbitrarily large aspect ratio, as would be the case, for example,
for the triangle shown in Fig. 4.1(b), where $R_K \to \infty$ as the rightmost vertex is
moved further to the right. Equivalently, condition (4.6) prevents the minimum angle of
$\mathcal{T}_h$ from being too small. From now on, we assume that each member of the family
$\{\mathcal{T}_h\}$ is a regular triangulation.
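
The aspect ratio (4.5) can be computed directly from the vertex coordinates. The Matlab sketch below uses two facts from elementary geometry: the diameter of a triangle is its longest edge, and the inscribed circle has radius $2|K|/P_K$, $|K|$ and $P_K$ being the area and the perimeter of $K$. The two triangles are hypothetical examples (a right isosceles triangle and a strongly stretched one), not those of Fig. 4.1.

```matlab
% Sketch: aspect ratio R_K = h_K / rho_K of a triangle, Eq. (4.5).
tri = {[0 0; 1 0; 0 1], [0 0; 1 0; 10 0.1]};     % vertices stored row-wise (examples)
RK  = zeros(1, numel(tri));
for k = 1:numel(tri)
    V    = tri{k};
    E    = V([2 3 1],:) - V;                      % edge vectors
    len  = sqrt(sum(E.^2, 2));                    % edge lengths
    hK   = max(len);                              % diameter of K
    area = 0.5*abs(E(1,1)*E(2,2) - E(1,2)*E(2,1));
    rhoK = 4*area/sum(len);                       % diameter of the inscribed circle
    RK(k) = hK/rhoK;
    fprintf('triangle %d: hK = %.4f, rhoK = %.4f, RK = %.4g\n', k, hK, rhoK, RK(k));
end
```

For the right isosceles triangle one finds $R_K = 1 + \sqrt{2}$, while the stretched triangle has an aspect ratio of the order of $10^3$.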

4.3 Polynomial spaces in 2D

In this section, we introduce the space of polynomials that will be used in the GFE
approximation of (4.1).
Definition 4.3.1 (The space $P_r$ in 2D). Let $r \geq 0$ be a fixed integer. Then, the
space of polynomials of degree $\leq r$ with respect to the independent variables $x, y$
is defined as
$$P_r = \mathrm{span}\left\{ x^p y^q \right\}_{p,q = 0, \ldots, r}^{0 \leq p + q \leq r}. \tag{4.7}$$
Example 4.3.2. In the case $r = 0$, we only have $P_0 = \mathrm{span}\{1\}$. In the case $r = 1$,
Def. (4.7) yields $P_1 = \mathrm{span}\{1, x, y\}$; if $r = 2$, we have $P_2 = \mathrm{span}\{1, x, y, x^2, xy, y^2\}$,
and, finally, if $r = 3$, we have $P_3 = \mathrm{span}\{1, x, y, x^2, xy, y^2, x^3, x^2 y, x y^2, y^3\}$.
Definition 4.3.3 (The space $P_r(K)$). We set
$$P_r(K) := \left\{ p = p(x, y) \in P_r, \; (x, y) \in K \right\}. \tag{4.8}$$

A simple count of dofs yields the following result.

Proposition 4.3.4 (Dimension of $P_r(K)$). Let $r \geq 0$ be a fixed integer. Then
$$\dim(P_r(K)) = \frac{(r + 1)(r + 2)}{2} \equiv N_r(K) \qquad \forall K \in \mathcal{T}_h. \tag{4.9}$$
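
The count leading to (4.9) can be made explicit by enumerating the exponent pairs $(p, q)$ of the monomials in (4.7). The following Matlab lines are a small sketch of this enumeration for $r = 0, \ldots, 3$; the variable names are illustrative.

```matlab
% Sketch: count the monomials x^p*y^q with 0 <= p+q <= r and compare with (4.9).
rmax  = 3;
dimPr = zeros(1, rmax + 1);
for r = 0:rmax
    [p, q] = meshgrid(0:r, 0:r);              % all exponent pairs with p, q = 0,...,r
    dimPr(r+1) = nnz(p + q <= r);             % keep those with total degree <= r
    fprintf('r = %d: %d monomials, (r+1)(r+2)/2 = %d\n', r, dimPr(r+1), (r+1)*(r+2)/2);
end
```

The counts 1, 3, 6 and 10 coincide with the number of basis monomials listed in Ex. 4.3.2.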

Figure 4.2: Black bullets denote the dofs of Pr (K).

The monomial basis used in (4.7) is, in general, not the most preferable for
computations. An equivalent choice is represented by the set of Lagrangian (local)
basis functions ϕi = ϕi (x), with ϕi ∈ Pr (K) for i = 1, . . . , Nr (K), such that

ϕi (x j ) = δi j i, j = 1, . . . , Nr (K), (4.10)

where xi , i = 1, . . . , Nr (K) are the coordinates of the nodes over the element K.
The nodal positions of the dofs for Pr (K) in the cases r = 1, 2, 3 are illustrated in
Fig. 4.2. We see that in the case r = 3, dofs are also located in the interior of K
and not only along its boundary ∂ K.
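As an illustration of property (4.10) in the simplest case r = 1, the three Lagrangian basis functions of P1(K) can be computed by solving a small linear system. The sketch below assumes an arbitrary sample triangle (the vertex coordinates are made up for the example) and is not taken from the course code.

```matlab
% Vertices of a sample triangle K, one vertex per row (illustrative values).
V = [0 0; 2 0; 0.5 1.5];
% Each basis function has the form phi_i(x,y) = c(1,i) + c(2,i)*x + c(3,i)*y.
% Imposing phi_i(x_j) = delta_ij at the three vertices gives A*c = I.
A = [ones(3,1), V(:,1), V(:,2)];
c = A \ eye(3);
phi = @(x, y) [1, x, y] * c;   % row vector [phi_1, phi_2, phi_3] at (x,y)
% Entry (j,i) of this matrix is phi_i evaluated at node x_j.
Phi_at_nodes = A * c;
% Values of the basis functions at the barycentre of K.
xb = mean(V(:,1));  yb = mean(V(:,2));
phi_b = phi(xb, yb);
```

Here Phi_at_nodes should be the identity matrix, which is (4.10), and the three values in phi_b should sum to one, since the functions ϕi form a partition of unity on K.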

4.4 The approximation space

From now on, we always assume, unless otherwise stated, that the polynomial degree
r is a fixed integer ≥ 1. The finite dimensional space Vh ⊂ V that is used in the
GFE approximation of (4.1) is the two-dimensional generalization of the space
$X^r_{h,0}$ introduced in (2.12) in the 1D case:

\[ V_h := X^r_{h,0}(\mathcal{T}_h) = \left\{ v_h \in C^0(\Omega), \ v_h|_K \in P_r(K) \ \forall K \in \mathcal{T}_h, \ v_h = 0 \text{ on } \Gamma \right\}. \tag{4.11} \]

By definition of vector space, any function vh ∈ Vh can be written as

\[ v_h(\mathbf{x}) = \sum_{i=1}^{N_h(r)} v_i \, \varphi_i(\mathbf{x}), \tag{4.12} \]

where Nh (r) is the dimension of Vh and represents the number of linearly indepen-
dent basis functions spanning Vh . Such functions, denoted by ϕi , i = 1, . . . , Nh (r),
are the global basis functions whose restriction over each element K is one of
the Nr (K) local basis functions introduced in (4.10), while the real numbers vi ,
i = 1, . . . , Nh (r), are the nodal values of vh (see Fig. 4.3).

Figure 4.3: Global and local basis functions (r = 1).

Remark 4.4.1 (Dimension of Vh ). The count of Nh (r) proceeds in the same manner
as in the 1D case. We first need to determine the number of dofs Nr,i (K) that are
internal to each element K. We have
\[ N_r(K) = \frac{r^2 + 3r + 2}{2} = \underbrace{3}_{\text{vertices}} + \underbrace{3(r-1)}_{\text{edges}} + \underbrace{N_{r,i}(K)}_{\text{internal}} \]

from which we obtain

\[ N_{r,i}(K) = \frac{r^2 + 3r + 2}{2} - 3r = \frac{r^2 - 3r + 2}{2}, \qquad r \ge 1. \]
Then, we distinguish between:
• number of internal vertices: Nv,i ;
• number of internal edges: Ne,i ;
• number of elements: NK .
Finally, using the fact that any function belonging to Vh is continuous across every
internal edge, we get
\[ N_h(r) = N_{v,i} + (r-1)\, N_{e,i} + \frac{r^2 - 3r + 2}{2}\, N_K. \tag{4.13} \]

Relation (4.13) is the generalization of (2.13) to the 2D case and to triangular
grids.
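A quick way to get a feeling for formula (4.13) is to evaluate it on a very small mesh. The sketch below assumes a hypothetical triangulation of the unit square obtained by cutting it along its two diagonals (4 triangles, 1 internal vertex, 4 internal edges); these mesh data are chosen here only for illustration.

```matlab
% Hypothetical mesh: unit square split into 4 triangles by its two diagonals.
N_v_i = 1;   % internal vertices (the centre of the square)
N_e_i = 4;   % internal edges (the four half-diagonals)
N_K   = 4;   % triangles
% Dimension of V_h according to (4.13), as a function of the degree r.
N_h = @(r) N_v_i + (r - 1) * N_e_i + (r^2 - 3*r + 2) / 2 * N_K;
dims = arrayfun(N_h, 1:3);   % dimensions for r = 1, 2, 3
disp(dims);
```

For r = 1 only the centre node carries a dof; for r = 2 each internal edge adds its midpoint; for r = 3 each edge carries two nodes and each triangle one internal node.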

4.5 GFE approximation

The GFE approximation of (4.1) is:
find uh ∈ Vh such that
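\[ \underbrace{\int_\Omega \left( \mu \nabla u_h \cdot \nabla v_h + (\mathbf{a} \cdot \nabla u_h)\, v_h + \sigma u_h v_h \right) d\Omega}_{B(u_h, v_h)} = \underbrace{\int_\Omega f v_h \, d\Omega}_{F(v_h)} \qquad \forall v_h \in V_h. \tag{4.14} \]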

Under assumption (4.3), the Lax-Milgram Lemma can be used to check that (4.14)
admits a unique solution uh that satisfies the same a priori estimate (4.4) valid for
the solution u of (4.2). Moreover, assuming that $u \in H^s(\Omega) \cap H^1_0(\Omega)$, s ≥ 2 being
a given quantity, the discretization error satisfies the estimate (2.18), which in the
present case takes the following form

\[ \| u - u_h \|_V \le C \, \frac{\mu_{\max} + C_P \|\mathbf{a}\|_{C^0(\Omega)} + C_P^2 \|\sigma\|_{C^0(\Omega)}}{\mu_{\min}} \, h^l \, \| u \|_{H^{l+1}(\Omega)} \tag{4.15} \]

where l := min{r, s − 1}. Under the same assumptions, the following error
estimate in the L² norm can also be proved

\[ \| u - u_h \|_{L^2(\Omega)} \le C h^{l+1} \| u \|_{H^{l+1}(\Omega)}. \tag{4.16} \]

In both estimates (4.15) and (4.16), the quantity C is a positive constant that is
independent of h and depends on the mesh regularity constant κ.
Remark 4.5.1. Estimate (4.15) tells us that the accuracy of the numerical solution
of problem (4.14) may be negatively affected by three possible sources: 1) large
variation of the diffusion coefficient; 2) strong dominance of advection; 3) strong
dominance of reaction. In each of these three cases, the mesh size h needed to
ensure a small error may be much smaller than in situations where the problem
coefficients vary more gently over Ω. In case 1) we have $\mu_{\max}/\mu_{\min} \gg 1$,
in case 2) we have $\|\mathbf{a}\|_{C^0(\Omega)}/\mu_{\min} \gg 1$, and in case 3) we
have $\|\sigma\|_{C^0(\Omega)}/\mu_{\min} \gg 1$. In all such cases, suitable stabilization
techniques (generalizing those analyzed in Chapt. 3) have to be adopted.

4.6 The linear system

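Problem (4.14) can be written in the form of a linear algebraic system as

\[ B \mathbf{u} = \mathbf{f}, \tag{4.17} \]

where the entries of the stiffness matrix B are

\[ B_{ij} = \int_\Omega \left( \mu \nabla \varphi_j \cdot \nabla \varphi_i + (\mathbf{a} \cdot \nabla \varphi_j)\, \varphi_i + \sigma \varphi_j \varphi_i \right) d\Omega \qquad i, j = 1, \dots, N_h(r) \]

and the entries of the load vector f are given by

\[ F_i = \int_\Omega f \varphi_i \, d\Omega \qquad i = 1, \dots, N_h(r). \]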

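Remark 4.6.1. The stiffness matrix B is positive definite but, because of the
advective term, it is in general not symmetric.

Using the additivity of the integral, we can write

\[ \begin{aligned}
B_{ij} &= \sum_{K \in \mathcal{T}_h} \underbrace{\int_K \left( \mu \nabla \varphi_j \cdot \nabla \varphi_i + (\mathbf{a} \cdot \nabla \varphi_j)\, \varphi_i + \sigma \varphi_j \varphi_i \right) dK}_{B^K_{ij}} & i, j &= 1, \dots, N_h(r) \\
F_i &= \sum_{K \in \mathcal{T}_h} \underbrace{\int_K f \varphi_i \, dK}_{f^K_i} & i &= 1, \dots, N_h(r).
\end{aligned} \tag{4.18} \]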
Relations (4.18) tell us that the computation of the global stiffness matrix and
of the global load vector is reduced to a for-loop over each element K ∈ Th that
computes:
• a local stiffness matrix $B^K \in \mathbb{R}^{N_r(K) \times N_r(K)}$;

• a local load vector $\mathbf{f}^K \in \mathbb{R}^{N_r(K)}$.

4.7 The assembly phase

For each K ∈ Th, let us introduce:
• the local bilinear form

\[ B_K(\varphi_j, \varphi_i) = B(\varphi_j|_K, \varphi_i|_K), \qquad i, j = 1, \dots, N_h(r); \]

• the local linear form

\[ F_K(\varphi_i) = F(\varphi_i|_K), \qquad i = 1, \dots, N_h(r). \]

BK and FK are nothing but B and F evaluated on the restrictions of the global
basis functions ϕi and ϕj to the element K. We notice that two kinds
of indexing quantities are involved in the assembly phase:
• global node numbering: i, j = 1, . . . , Nh (r);
• global element numbering: K ∈ Th .
These two global numberings cannot be independent: they have to be
mutually related in order to distribute the contributions coming from each K ∈ Th
into the right global row and column indices i and j. To this end, it is convenient
to introduce the so-called connectivity matrix T, of size Nr(K) × NK, whose generic
entry $T_{I,J}$ contains, for each mesh element J = 1, . . . , NK, the global number of the
local node I = 1, . . . , Nr(K), having adopted the convention that local nodes are
numbered counterclockwise. In the example of Fig. 4.4, we would have

\[ T_{1,53} = 78, \qquad T_{2,53} = 900, \qquad T_{3,53} = 1213. \]

Figure 4.4: Local and global numbering of nodes and elements (r = 1).

Matlab coding. A Matlab script for assembling B and f is reported below. The
functions, variables and parameters that it uses are the following:

• get_local_coordinates_dofs is the function that recovers the x and y
coordinates of the Nr(K) dofs over the element K; these coordinates are
contained in the two arrays X_dofs_K and Y_dofs_K;

• compute_local_stiffness_matrix is the function that computes the local
stiffness matrix BK using a suitable 2D quadrature formula;

• compute_local_load_vector is the function that computes the local load
vector fK using a suitable 2D quadrature formula;

• N_nodes_tot represents the total number of nodes (i.e., the interior nodes
Nh(r) plus the boundary nodes);

• N_dofs_loc is Nr(K);

• the input parameters mu, a, sigma and f are function pointers to compute
problem coefficients.
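% f_fun is the function handle of the source term (the input parameter f of
% the list above); it is kept distinct from the global load vector f so that
% the vector does not overwrite the handle.
B = sparse(N_nodes_tot,N_nodes_tot);   % global stiffness matrix (all nodes)
f = sparse(N_nodes_tot,1);             % global load vector (all nodes)
for K=1:N_K
    % coordinates of the dofs of element K
    [X_dofs_K, Y_dofs_K] = get_local_coordinates_dofs(X,Y,K);
    % local stiffness matrix and local load vector
    B_K = compute_local_stiffness_matrix(mu,a,sigma,X_dofs_K,Y_dofs_K);
    f_K = compute_local_load_vector(f_fun,X_dofs_K,Y_dofs_K);
    % add the local contributions to the global rows and columns given by T
    for i_loc = 1:N_dofs_loc
        row = T(i_loc,K);
        for j_loc = 1:N_dofs_loc
            col = T(j_loc,K);
            B(row,col) = B(row,col) + B_K(i_loc,j_loc);
        end
        f(row) = f(row) + f_K(i_loc);
    end
end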
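To make the assembly loop concrete, the following self-contained sketch assembles the stiffness matrix of the Laplace operator (µ = 1, a = 0, σ = 0) with r = 1 on a tiny hand-made mesh of the unit square. Everything in it is an assumption made for illustration: the mesh data, the closed-form P1 local matrix used in place of the course quadrature routines, and the choice of collecting the contributions as triplets and calling sparse once, a common Matlab idiom that avoids updating a sparse matrix entry by entry.

```matlab
% Mesh of the unit square: 4 corner nodes plus the centre, 4 triangles.
X = [0 1 1 0 0.5];
Y = [0 0 1 1 0.5];
T = [1 2 3 4;     % connectivity matrix: column K lists the global nodes
     2 3 4 1;     % of element K in counterclockwise order
     5 5 5 5];
N_nodes_tot = numel(X);  N_K = size(T, 2);  N_dofs_loc = 3;

I = zeros(9*N_K, 1);  J = I;  S = I;   % triplets (row, column, value)
n = 0;
for K = 1:N_K
    xk = X(T(:,K));  yk = Y(T(:,K));
    area = 0.5 * abs(det([1 1 1; xk; yk]));
    % Gradients of the three P1 basis functions (constant on K), one per row.
    G = [yk([2 3 1]) - yk([3 1 2]); xk([3 1 2]) - xk([2 3 1])]' / (2*area);
    B_K = area * (G * G');             % local stiffness matrix for mu = 1
    for i_loc = 1:N_dofs_loc
        for j_loc = 1:N_dofs_loc
            n = n + 1;
            I(n) = T(i_loc,K);  J(n) = T(j_loc,K);  S(n) = B_K(i_loc,j_loc);
        end
    end
end
% sparse sums the values of repeated (row, column) pairs.
B = sparse(I, J, S, N_nodes_tot, N_nodes_tot);
```

Since a = 0 here, the assembled matrix is symmetric, and each of its rows sums to zero because constant functions have zero gradient. Only node 5 is interior, so after removing the boundary nodes the system reduces to the single entry B(5,5).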

Remark 4.7.1 (The boundary conditions). To account for the homogeneous bound-
ary conditions u|Γ = 0, we need to eliminate all the rows and columns of B and all
the rows of f that correspond to nodes belonging to Γ.
Matlab coding. The Matlab commands for enforcing homogeneous Dirichlet
boundary conditions are reported below. The variable Dir_nodes contains the
list of nodes of Th which belong to Γ. Rows and columns are deleted in two
separate steps because Matlab accepts only one non-colon index in a deletion.
>> B(Dir_nodes,:) = [];
>> B(:,Dir_nodes) = [];
>> f(Dir_nodes) = [];

Once the assembly loop has been performed and the boundary conditions have been
enforced, system (4.17) is ready to be solved using a direct or iterative method
as described in Appendix E.

4.8 2D Experimental convergence study of the GFEM

In this section, we briefly verify the numerical performance of the GFEM in the
study of two test cases for which the exact solution is available. All computations
have been performed using the Matlab finite element package EF2D developed by
Marco Restelli and available at the link:

⇒ https://beep.metid.polimi.it/

4.8.1 The homogeneous problem for the Laplacian

In this first example we set Ω = (0, 1) × (0, 1) and µ = 1, a = [0, 0]T , σ = 0,
g = 0 and f (x, y) = 32(y − y2 + x − x2 ) in such a way that the exact solution is the
function u(x, y) = 16 xy(x − 1)(y − 1) (whose value at the center of the unit square
is 1). With these data, the equation system (4.1) reduces to the homogeneous
Dirichlet problem for the Laplace operator (1.2). Computations are performed on
five progressively refined unstructured triangulations Th , with an average mesh
size h approximately equal to 0.5, 0.025, 0.0125, 0.00625 and 0.00312. A plot of
the discrete solution uh (with r = 1) computed over the grid of average mesh size
of 0.00312 is shown in Fig. 4.5.

Figure 4.5: Numerical solution of the homogeneous Dirichlet problem for the
Laplace operator using r = 1.

Tables 4.1, 4.2 and 4.3 show the errors measured in the H¹ and L² norms as a
function of the mesh size h and of the approximation polynomial degree r, for
r = 1, 2, 3.

The obtained results closely agree with the expected theoretical estimates (4.15)
and (4.16) in the optimal case l = r. As a matter of fact, in the case r = 1, we
see that the H¹ error (approximately) halves at each mesh refinement step (linear
convergence rate) while the L² error (approximately) reduces by a factor of 4
(quadratic convergence rate). Similarly, in the case r = 2, we see that the H¹ error
(approximately) reduces by a factor of 4 at each mesh refinement step while the L²
error (approximately) reduces by a factor of 8. Finally, in the case r = 3 the error
reduction in the H¹ norm is (approximately) equal to 8 at each mesh refinement
step while the corresponding error reduction in the L² norm is (approximately)
equal to 16.

h          ‖u − uh‖_V      ‖u − uh‖_L2
0.5        9.78837e-01     1.12889e-01
0.025      5.90571e-01     4.03202e-02
0.0125     3.03473e-01     1.03154e-02
0.00625    1.54287e-01     2.67608e-03
0.00312    7.64671e-02     6.52262e-04
Table 4.1: Convergence analysis for the GFE method with r = 1 in the case of the
homogeneous Dirichlet problem for the Laplace operator.

h          ‖u − uh‖_V      ‖u − uh‖_L2
0.5        2.96266e-01     1.63666e-02
0.025      7.70054e-02     2.29500e-03
0.0125     1.70161e-02     2.40015e-04
0.00625    4.14263e-03     2.86345e-05
0.00312    1.01035e-03     3.38010e-06
Table 4.2: Convergence analysis for the GFE method with r = 2 in the case of the
homogeneous Dirichlet problem for the Laplace operator.

The approximation errors in the H¹ and L² norms are also displayed in a log-log
plot as a function of h and of r in Fig. 4.6. The slopes of the various curves
provide the information about the order of convergence of the method.

h          ‖u − uh‖_V      ‖u − uh‖_L2
0.5        2.28341e-02     8.22032e-04
0.025      3.74118e-03     7.12394e-05
0.0125     4.13323e-04     3.70496e-06
0.00625    5.27815e-05     2.39387e-07
0.00312    6.53857e-06     1.44786e-08
Table 4.3: Convergence analysis for the GFE method with r = 3 in the case of the
homogeneous Dirichlet problem for the Laplace operator.

(a) ku − uh kV (b) ku − uh kL2

Figure 4.6: Log-log plots of the errors in the case of the homogeneous Dirichlet
problem for the Laplace operator. Left: H 1 error. Right: L2 error.
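The orders of convergence read off the log-log plots can also be estimated directly from the tabulated errors. The sketch below uses the values of Table 4.1 (r = 1) and assumes that the mesh size is halved at each refinement step, as suggested by the discussion of the error reduction factors; under this assumption the observed order between two consecutive meshes is the base-2 logarithm of the ratio of the errors.

```matlab
% Errors copied from Table 4.1 (r = 1).
err_H1 = [9.78837e-01 5.90571e-01 3.03473e-01 1.54287e-01 7.64671e-02];
err_L2 = [1.12889e-01 4.03202e-02 1.03154e-02 2.67608e-03 6.52262e-04];
% Assumption: h is halved between consecutive meshes, so that the observed
% order is p = log2( e(h) / e(h/2) ).
p_H1 = log2(err_H1(1:end-1) ./ err_H1(2:end));
p_L2 = log2(err_L2(1:end-1) ./ err_L2(2:end));
disp([p_H1; p_L2]);
```

On the finest meshes the two estimated orders should approach the values 1 and 2 predicted by (4.15) and (4.16) with l = r = 1; the same lines can be reused with the data of Tables 4.2 and 4.3.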

4.8.2 An advection-diffusion-reaction BVP

In this second example we set Ω = (0, a) × (0, b) with a = 4, b = 3 and take the
following values of the coefficients: µ and σ are two positive constants,

\[ \mathbf{a} = \left[ -\frac{a}{\sqrt{a^2 + b^2}}, \ -\frac{b}{\sqrt{a^2 + b^2}} \right]^T, \qquad f = \sigma \exp(\mathbf{a} \cdot \mathbf{x}/\mu), \qquad g = \exp(\mathbf{a} \cdot \mathbf{x}/\mu), \]

in such a way that the exact solution of equation (4.1)₁ is

\[ u(x, y) = \exp(\mathbf{a} \cdot \mathbf{x}/\mu) = \exp\left( -\frac{a x + b y}{\mu \sqrt{a^2 + b^2}} \right) \qquad (x, y) \in (0, a) \times (0, b). \]

With these data, the equation system (4.1) is a diffusion-advection-reaction problem
with a constant advection field of unit modulus directed along the main diagonal
of the rectangle, from the upper right corner to the bottom left corner. To
perform numerical computations, we use a grid made of right-angled triangles
having horizontal side hx = a/Nx and vertical side hy = b/Ny, respectively, created
using the following sequence of Matlab commands:

>> Nx = 10; Ny = Nx;
>> [p,e,t] = poimesh('squareg',Nx,Ny);
>> [x,y] = mapquad2rect(p(1,:),p(2,:),0,4,0,3);
>> p(1,:) = x;
>> p(2,:) = y;
>> pdemesh(p,e,t)

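where the function mapquad2rect computes the map that transforms the reference
square [−1, 1]² into the rectangle [α, β] × [γ, δ] (the four bounds α, β, γ, δ
are passed as the input arguments a, b, c, d):

function [x,y] = mapquad2rect(X,Y,a,b,c,d)
% affine map of the reference square [-1,1]^2 onto [a,b] x [c,d]
x = (a+b)/2+(b-a)/2*X;
y = (c+d)/2+(d-c)/2*Y;
return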

(a) $Pe^{2D}_{ad} = 0.25$ (b) $Pe^{2D}_{ad} = 25$

Figure 4.7: Computed solution in the case of the diffusion-advection-reaction
model problem. Left: diffusion-dominated regime. Right: advection-dominated
regime.

It is interesting to check the performance of the GFEM as a function of the
dimensionless number

\[ Pe^{2D}_{ad} := \frac{|\mathbf{a}| h}{2 \mu} \tag{4.19} \]

where $h := \sqrt{h_x^2 + h_y^2}$ is the maximum edge length of each triangle in the grid
(i.e., the diameter of each element). The quantity $Pe^{2D}_{ad}$ introduced in (4.19) may be
regarded as the two-dimensional extension of the Péclet number (3.27) introduced
in the case of the 1D advection-diffusion model problem.
As such, should boundary layers occur in the exact solution, the GFEM might
run into numerical trouble depending on the value of $Pe^{2D}_{ad}$. More precisely, as
discussed in the case of the 1D advection-diffusion model problem in Sect. 3.2,
if $Pe^{2D}_{ad} > 1$ we expect the possible onset of spurious oscillations affecting the
behaviour of the discrete solution uh in the neighbourhood of the layer.
The above analysis is completely confirmed by the two graphs in Fig. 4.7.
We have set a = 4, b = 3, Nx = Ny = 10 so that $h = \sqrt{a^2 + b^2}/10 = 5/10 = 0.5$.
Fig. 4.7(a) shows the computed FE solution uh using r = 1 in the case µ = σ = 1.
Using (4.19) we get $Pe^{2D}_{ad} = 0.25$ and the computed solution is a very good
approximation of the exact solution over the entire computational domain. Fig. 4.7(b)
shows the computed FE solution uh using r = 1 in the case µ = 0.01 and σ = 1.
Using (4.19) we get $Pe^{2D}_{ad} = 25$ and the computed solution exhibits an unstable
behaviour in the neighbourhood of the internal boundary layer (at [0, 0]ᵀ) which
is transported by the diffusion mechanism throughout the remainder of the domain,
making the overall quality of the approximation completely unsatisfactory.
As anticipated before, to remedy this latter negative performance of the GFEM
we need either to reduce the mesh size h (which is prohibitive in this example!)
or to introduce a stabilization procedure that extends to the present 2D case those
proposed in the case of the 1D advection-diffusion problem in Sect. 3.2.4.
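The two values of the Péclet number quoted above can be reproduced with a few lines of Matlab. The sketch below uses the data of the example; the variable names, and the final estimate of how many subdivisions per side would be needed to bring the Péclet number down to 1 when µ = 0.01, are additions made here for illustration.

```matlab
% Data of the example (names chosen here for illustration).
a_len = 4;  b_len = 3;  Nx = 10;  Ny = 10;
hx = a_len / Nx;  hy = b_len / Ny;
h  = sqrt(hx^2 + hy^2);                 % element diameter
adv = -[a_len; b_len] / sqrt(a_len^2 + b_len^2);   % advection field
Pe = @(mu) norm(adv) * h ./ (2 * mu);   % definition (4.19)
Pe_diff = Pe(1);      % diffusion-dominated case, mu = 1
Pe_adv  = Pe(0.01);   % advection-dominated case, mu = 0.01
% Largest h giving Pe <= 1 for mu = 0.01, and the corresponding number of
% subdivisions per side when Nx = Ny.
h_max = 2 * 0.01 / norm(adv);
N_min = ceil(sqrt(a_len^2 + b_len^2) / h_max);
```

With these data the mesh would have to be refined from 10 to roughly 250 subdivisions per side before the condition on the Péclet number is met, which gives an idea of the cost of the first remedy compared with stabilization.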
Part III

The GFEM for Linear Elasticity


This part addresses the study of the Navier-Lamé equations for linear elasticity
in the compressible and incompressible regimes. Theory is carried out in a general
three-dimensional setting while numerical approximation using the GFEM is
addressed in two spatial dimensions (plane stress/plane strain).
Chapter 5

Compressible Linear Elasticity: Theory and Finite Element Approximation

Abstract
In this chapter we illustrate the Navier-Lamé equations of linear elasticity in the
general three-dimensional case and in the compressible regime.

5.1 Essentials of solid mechanics

Let B be a material body which occupies a volume portion Ω ⊂ R³. The surface
of the body, denoted by Γ, is divided into two mutually disjoint parts, ΓD and ΓN,
and a unit outward normal vector n = [n1, n2, n3]ᵀ is defined almost everywhere
(a.e.) on Γ. On ΓD, the body is constrained to ground, while on ΓN a force per
unit area g = [g1, g2, g3]ᵀ is applied. Moreover, at each point of the volume Ω, a
body force per unit volume f = [f1, f2, f3]ᵀ is defined (see Fig. 5.1).
The displacement of a material point P = [x1, x2, x3]ᵀ of B, because of the
deformation consequent to the action of the loads, is expressed by the vector

\[ \mathbf{r}' - \mathbf{r} := \mathbf{u} = \begin{bmatrix} x_1' - x_1 \\ x_2' - x_2 \\ x_3' - x_3 \end{bmatrix}, \]

Figure 5.1: Elastic body: geometrical notation; undeformed and deformed con-
figurations; loads and constraints.

where the coordinates of the point P in the deformed state are denoted with a prime
superscript (·)′. The local measure of the deformation of the body is given by the
Green-Lagrange deformation tensor

\[ U_{ik} := \frac{1}{2} \left( \frac{\partial u_i}{\partial x_k} + \frac{\partial u_k}{\partial x_i} + \frac{\partial u_l}{\partial x_i} \frac{\partial u_l}{\partial x_k} \right) \tag{5.1} \]

where the Einstein convention of summation over repeated indices has been used
in the third term at the right-hand side.

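Definition 5.1.1 (The strain tensor). Assume that deformations are small, i.e.

\[ \left| \frac{\partial u_l}{\partial x_i} \right| \ll 1. \tag{5.2} \]

Then, the Green-Lagrange tensor can be approximated by the (small) deformation
tensor or strain tensor, defined as

\[ \varepsilon_{ik} := \frac{1}{2} \left( \frac{\partial u_i}{\partial x_k} + \frac{\partial u_k}{\partial x_i} \right). \tag{5.3} \]

Using (A.6) in definition (5.3), we see that

\[ \varepsilon = \frac{1}{2} \left( \nabla \mathbf{u} + (\nabla \mathbf{u})^T \right) \equiv \nabla_S \mathbf{u} \]

where $\nabla_S \mathbf{u}$ denotes the symmetric gradient of the displacement u. The strain
tensor is symmetric and its trace is given by

\[ \mathrm{Tr}(\varepsilon) = \sum_{i=1}^{3} \varepsilon_{ii} = \mathrm{div}\, \mathbf{u} = \mathrm{Tr}(\nabla \mathbf{u}), \tag{5.4} \]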

having used (A.7) in the last equality. The last identity is consistent with the fact
that
Tr(∇u) = Tr(∇S u) + Tr(∇SS u) = Tr(∇S u) = Tr(ε),
because the trace of a skew-symmetric tensor is identically zero, by definition.
Remark 5.1.2 (Small deformations/small displacements). The assumption of small
deformations does not necessarily imply that also displacements are small. To see
this, consider the case of a long thin rod under large deflection conditions, in which
the motion of the end points of the rod may be non-negligible but the extensions
and compressions in the rod are small. In practice, however, the displacement
vector u for a three-dimensional body under small deformation is itself small.
This amounts to adding, from now on, to (5.2) the so-called small displacement
assumption
\[ |u_i| \ll \ell, \tag{5.5} \]
where ℓ is a characteristic length of the 3D body.
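The relation between the Green-Lagrange tensor (5.1), the strain tensor (5.3) and the trace identity (5.4) can be checked numerically on a sample displacement gradient. In the sketch below the entries of G, with G(i,j) = ∂ui/∂xj, are arbitrary small numbers chosen only for illustration.

```matlab
% Illustrative displacement gradient, G(i,j) = d u_i / d x_j (small entries).
G = 1e-3 * [ 2 -1  0;
             3  1  4;
            -2  0 -1];
eps_small = 0.5 * (G + G');               % strain tensor (5.3)
U_green   = eps_small + 0.5 * (G' * G);   % Green-Lagrange tensor (5.1)
skew_part = 0.5 * (G - G');               % skew-symmetric part of grad u
% Relative size of the quadratic term neglected under assumption (5.2).
rel_diff  = norm(U_green - eps_small) / norm(eps_small);
disp([trace(G), trace(eps_small), trace(skew_part), rel_diff]);
```

The traces of G and of its symmetric part coincide, the trace of the skew-symmetric part vanishes, and the relative difference between the two deformation measures is of the order of the entries of G, which is why the quadratic term can be dropped when (5.2) holds.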

5.1.1 The relation σ − ε

Consider the elastic body B under the uniaxial loading condition of Fig. 5.2.

Figure 5.2: Elastic body subject to a uniaxial load.

Let

\[ \varepsilon := \frac{\ell - \ell_0}{\ell_0}, \qquad \varepsilon_t := \frac{d - d_0}{d_0} \]

denote the longitudinal and transversal deformation, respectively. Define also the
normal stress (in N/mm², or MPa)
\[ \sigma := \frac{F}{A_0} \]
where A0 is the cross-sectional area of the body, and the transversal contraction
coefficient
\[ \nu := -\frac{\varepsilon_t}{\varepsilon}. \]
Definition 5.1.3 (Reversibility). The relation σ − ε is reversible whenever, if the
applied force is released, the deformation is completely recovered, i.e., the body
returns to the original (undeformed) configuration. In mathematical terms, this
corresponds to assuming that the relation σ = σ (ε) is invertible.
Definition 5.1.4 (The linear relation: Hooke’s law). Within a certain deformation
range, i.e., if we have
0 ≤ |ε| ≤ εmax ,
then, the relation σ − ε is linear, i.e., we have

σ = Eε (5.6)

where E is the so-called Young modulus (in MPa). Relation (5.6) is universally
known as Hooke’s law (“Ut tensio, sic vis”). In the general 3D case, (5.6) becomes
\[ \sigma_{ij} = C_{ijkl}\, \varepsilon_{kl} \qquad i, j, k, l = 1, 2, 3 \tag{5.7} \]
or, in tensor notation
\[ \sigma = C \varepsilon. \tag{5.8} \]
The fourth-order tensor C is called the elastic matrix of the material and the
3⁴ = 81 quantities $C_{ijkl}$ are the elastic constants of the material. Relation (5.7) is
the so-called generalized Hooke’s law.
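The constraint of the stress and deformation tensors to be symmetric

\[ \sigma_{ij} = \sigma_{ji}, \qquad \varepsilon_{kl} = \varepsilon_{lk} \]

and the fact that the body energy functional is a symmetric and positive definite
quadratic form
\[ \frac{\partial \sigma_{ij}}{\partial \varepsilon_{kl}} = \frac{\partial \sigma_{kl}}{\partial \varepsilon_{ij}}, \]
allow us to conclude that the elastic tensor C is symmetric, so that the 81 constants
reduce to (36 − 6)/2 + 6 = 21, and the matrix form of (5.7) reads

\[ \underbrace{\begin{bmatrix} \sigma_{11} \\ \sigma_{22} \\ \sigma_{33} \\ \sigma_{12} \\ \sigma_{23} \\ \sigma_{31} \end{bmatrix}}_{\sigma}
= \underbrace{\begin{bmatrix}
C_{1111} & C_{1122} & C_{1133} & C_{1112} & C_{1123} & C_{1131} \\
 & C_{2222} & C_{2233} & C_{2212} & C_{2223} & C_{2231} \\
 & & C_{3333} & C_{3312} & C_{3323} & C_{3331} \\
 & & & C_{1212} & C_{1223} & C_{1231} \\
 & \text{Sym} & & & C_{2323} & C_{2331} \\
 & & & & & C_{3131}
\end{bmatrix}}_{C}
\underbrace{\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{22} \\ \varepsilon_{33} \\ 2\varepsilon_{12} \\ 2\varepsilon_{23} \\ 2\varepsilon_{31} \end{bmatrix}}_{\varepsilon}. \tag{5.9} \]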

Remark 5.1.5 (Voigt notation). Eq. (5.9) adopts a matrix format to express the
generalized Hooke law and strongly simplifies the representation of the stress ten-
sor. This approach is widely used in both theoretical and numerical treatment of
solid mechanics problems and is well-known as Voigt notation.

5.1.2 Linear isotropic elasticity

Definition 5.1.6 (The linear isotropic elastic regime). A linear elastic material
body B in which the mechanical response is independent of the coordinate system
is said to be a linear isotropic elastic material.
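Under this condition, the σ − ε relation takes the following form
\[ \sigma_{ij} = \lambda \delta_{ij} \delta_{kl} \varepsilon_{kl} + \mu (\delta_{ik} \delta_{jl} + \delta_{il} \delta_{jk}) \varepsilon_{kl} \qquad i, j, k, l = 1, 2, 3, \tag{5.10} \]
or, more simply
\[ \sigma_{ij} = 2 \mu \varepsilon_{ij} + \lambda \varepsilon_{kk} \delta_{ij} \qquad i, j = 1, 2, 3. \tag{5.11} \]
Example 5.1.7. Relation (5.11) can be easily verified. Take, for instance, i = 1,
j = 2 in (5.10). Using the fact that δ12 = 0 and that ε12 = ε21, we obtain
\[ \begin{aligned}
\sigma_{12} &= \lambda \delta_{12} \left[ \delta_{11} \varepsilon_{11} + \delta_{22} \varepsilon_{22} + \delta_{33} \varepsilon_{33} \right] + \mu \left[ \delta_{11} \delta_{22} \varepsilon_{12} + \delta_{11} \delta_{22} \varepsilon_{21} \right] \\
&= \lambda\, \mathrm{div}\, \mathbf{u}\; \delta_{12} + 2 \mu \varepsilon_{12} = 2 \mu \varepsilon_{12}
\end{aligned} \]
that is, relation (5.11) in the case i = 1, j = 2.
The two quantities, λ and µ, are known as the Lamé constants of the material
and have the following definitions in terms of the material mechanical parameters
E (Young modulus) and ν (Poisson modulus)
\[ \lambda := \frac{\nu E}{(1 + \nu)(1 - 2\nu)}, \qquad \mu := \frac{E}{2(1 + \nu)}. \tag{5.12} \]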

Remark 5.1.8 (Limits for E and ν). The Young modulus E is a strictly positive
quantity. The Poisson modulus, in principle, satisfies

−1 < ν < 0.5.

In almost all cases of practical interest, it is customary to assume

0 < ν < 0.5. (5.13)

The limiting case ν = 0.5 is critical, as can be seen from definition (5.12) because
λ becomes infinite. In this situation, the elastic material is said to be incompress-
ible and B can deform under the condition that its volume remains constant. This
case will be studied in detail in Chapter 7.
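Definition (5.12) and the behaviour of λ in the incompressible limit can be illustrated with a short computation. The values of E and ν used below are only indicative (they are of the order of those of structural steel) and are an assumption made for this example; the two inverse relations used as a cross-check are the standard ones obtained by solving (5.12) for E and ν.

```matlab
% Illustrative material data (assumed values, E in MPa).
E = 210000;  nu = 0.3;
lambda = nu * E / ((1 + nu) * (1 - 2*nu));   % first relation in (5.12)
mu     = E / (2 * (1 + nu));                 % second relation in (5.12)
% Cross-check: recover E and nu from the Lame constants.
E_back  = mu * (3*lambda + 2*mu) / (lambda + mu);
nu_back = lambda / (2 * (lambda + mu));
% Ratio lambda/mu = 2*nu/(1 - 2*nu) as nu approaches the limit value 0.5.
nu_list = [0.3 0.45 0.49 0.499 0.4999];
ratio   = 2 * nu_list ./ (1 - 2 * nu_list);
disp([nu_list; ratio]);
```

The ratio λ/µ grows without bound as ν approaches 0.5, which is the numerical counterpart of the statement that λ becomes infinite for an incompressible material.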
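Using (5.10) and (5.11), the stress-strain relation in matrix form reads

\[ \underbrace{\begin{bmatrix} \sigma_{11} \\ \sigma_{22} \\ \sigma_{33} \\ \sigma_{12} \\ \sigma_{23} \\ \sigma_{31} \end{bmatrix}}_{\sigma}
= \underbrace{\begin{bmatrix}
\lambda + 2\mu & \lambda & \lambda & 0 & 0 & 0 \\
 & \lambda + 2\mu & \lambda & 0 & 0 & 0 \\
 & & \lambda + 2\mu & 0 & 0 & 0 \\
 & \text{Sym} & & \mu & 0 & 0 \\
 & & & & \mu & 0 \\
 & & & & & \mu
\end{bmatrix}}_{C}
\underbrace{\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{22} \\ \varepsilon_{33} \\ 2\varepsilon_{12} \\ 2\varepsilon_{23} \\ 2\varepsilon_{31} \end{bmatrix}}_{\varepsilon} \tag{5.14} \]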

or, in compact tensor notation

σ = 2µε + λ (Trε) δ = 2µε + λ divu δ , (5.15)

where δ is the identity tensor and (5.4) has been used.
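The equivalence between the matrix form (5.14) and the tensor form (5.15) can be verified on a sample strain state. The sketch below uses arbitrary values of the Lamé constants and of the strain components, chosen only for illustration, and the ordering 11, 22, 33, 12, 23, 31 of the Voigt notation introduced in (5.9).

```matlab
lambda = 2;  mu = 1;               % illustrative Lame constants
% Elastic matrix of (5.14).
C = [lambda*ones(3) + 2*mu*eye(3), zeros(3);
     zeros(3),                     mu*eye(3)];
% A symmetric strain tensor with arbitrary entries.
e = [ 1.0  0.2 -0.3;
      0.2 -0.5  0.4;
     -0.3  0.4  2.0];
% Voigt strain vector: normal strains, then doubled shear strains.
e_voigt = [e(1,1); e(2,2); e(3,3); 2*e(1,2); 2*e(2,3); 2*e(3,1)];
s_voigt = C * e_voigt;             % stress from the matrix form (5.14)
% Stress from the tensor form (5.15), rearranged in the same order.
s = 2*mu*e + lambda*trace(e)*eye(3);
s_check = [s(1,1); s(2,2); s(3,3); s(1,2); s(2,3); s(3,1)];
disp([s_voigt, s_check]);
```

The two columns displayed at the end should coincide. Note that the factor 2 in front of the shear strains is what allows the shear block of C to contain µ rather than 2µ.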

5.2 Mathematical model of isotropic linear elasticity

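Let Ω ⊂ R³ be the volume of a linear elastic isotropic body B before deformation
occurs (see Fig. 5.1 for notation). Let f : Ω → R³ and g : ΓN → R³ denote two
given vector fields representing forces per unit volume and unit surface,
respectively. Finally, let E > 0 and ν ∈ (0, 1/2) denote the Young modulus and Poisson
coefficient of the material constituting the body, respectively. The problem of
linear isotropic elasticity (shortly, elasticity, from now on) reads:

find the displacement u : Ω → R³ such that:

\begin{align}
\mathrm{div}\, \sigma(\mathbf{u}) + \mathbf{f} &= \mathbf{0} && \text{in } \Omega \tag{5.16a} \\
\sigma(\mathbf{u}) = C \varepsilon(\mathbf{u}) &= 2 \mu \nabla_S \mathbf{u} + \lambda\, \mathrm{div}\, \mathbf{u}\; \delta && \text{in } \Omega \tag{5.16b} \\
\mathbf{u} &= \mathbf{0} && \text{on } \Gamma_D \tag{5.16c} \\
\sigma(\mathbf{u})\, \mathbf{n} &= \mathbf{g} && \text{on } \Gamma_N, \tag{5.16d}
\end{align}

or, using Einstein’s notation:

find ui : Ω → R, i = 1, 2, 3, such that:

\begin{align}
\sigma_{ij,j} + f_i &= 0 && \text{in } \Omega \tag{5.17a} \\
\sigma_{ij} = C_{ijkl}\, \varepsilon_{kl} &= 2 \mu \varepsilon_{ij} + \lambda \varepsilon_{kk} \delta_{ij} && \text{in } \Omega \tag{5.17b} \\
\varepsilon_{ij} &= \frac{1}{2} (u_{i,j} + u_{j,i}) && \text{in } \Omega \tag{5.17c} \\
u_i &= 0 && \text{on } \Gamma_D \tag{5.17d} \\
\sigma_{ij} n_j &= g_i && \text{on } \Gamma_N. \tag{5.17e}
\end{align}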

• (5.16a): indefinite equilibrium equation (force balance);

• (5.16b): constitutive relation (stress-strain);

• (5.16c): homogeneous Dirichlet boundary condition (the body is constrained
to ground on a part of its boundary);

• (5.16d): Neumann boundary condition (the body is subject to external forces
per unit area on a part of its boundary).

Remark 5.2.1. The mathematical model (5.16) is usually referred to as the Navier-Lamé
system for linear isotropic elasticity. It is clearly possible to eliminate the
strain and stress fields using the constitutive laws (5.17b)-(5.17c) in favor of the
sole displacement field u to obtain the following form of the Navier-Lamé system:
find u : Ω → R³ such that:

\begin{align}
-\mathrm{div} \left( 2 \mu \nabla_S \mathbf{u} + \lambda\, \mathrm{div}\, \mathbf{u}\; \delta \right) &= \mathbf{f} && \text{in } \Omega \tag{5.18a} \\
\mathbf{u} &= \mathbf{0} && \text{on } \Gamma_D \tag{5.18b} \\
\left( 2 \mu \nabla_S \mathbf{u} + \lambda\, \mathrm{div}\, \mathbf{u}\; \delta \right) \mathbf{n} &= \mathbf{g} && \text{on } \Gamma_N. \tag{5.18c}
\end{align}

The BVP (5.18) is called displacement formulation of elasticity because the only
remaining dependent variable is the displacement field u.

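Remark 5.2.2 (Reaction forces). Integrating (5.16a) over the body volume yields
\[ \int_\Omega (\mathrm{div}\, \sigma + \mathbf{f}) \, d\Omega = \mathbf{0}. \]

Using Green’s formula to treat the first contribution, we obtain

\[ \int_{\Gamma_D} \sigma \mathbf{n} \, d\Gamma + \int_{\Gamma_N} \sigma \mathbf{n} \, d\Gamma + \int_\Omega \mathbf{f} \, d\Omega = \mathbf{0}. \]

Enforcing the Neumann boundary condition (5.16d), we obtain the following
relation that expresses the global balance of forces that act on the material body

\[ \mathbf{R}_D + \mathbf{F}_{tot,N} + \mathbf{F}_{tot,\Omega} = \mathbf{0} \tag{5.19} \]

where:

• (reaction forces, to be determined in order to satisfy global equilibrium)
\[ \mathbf{R}_D := \int_{\Gamma_D} \sigma \mathbf{n} \, d\Gamma; \]

• (total surface forces)
\[ \mathbf{F}_{tot,N} := \int_{\Gamma_N} \mathbf{g} \, d\Gamma; \]

• (total volume forces)
\[ \mathbf{F}_{tot,\Omega} := \int_\Omega \mathbf{f} \, d\Omega. \]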

5.3 The weak formulation

Let us now apply the machinery of the weak formulation of a BVP to the elasticity
problem (5.16). To this purpose, from now on we assume that f ∈ (L²(Ω))³ and
g ∈ (L²(ΓN))³, respectively. Then, we set
\[ V := (H^1_{0,\Gamma_D}(\Omega))^3 \tag{5.20} \]

and multiply (5.16a) by a vector-valued test function v ∈ V to obtain

\[ \int_\Omega (\mathrm{div}\, \sigma(\mathbf{u}) + \mathbf{f}) \cdot \mathbf{v} \, d\Omega = 0 \qquad \forall \mathbf{v} \in V. \]

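Using Green’s formula to treat the first term, enforcing the Neumann boundary
conditions on ΓN and the Dirichlet boundary conditions for v on ΓD, we obtain:
find u ∈ V such that
\[ \int_\Omega \sigma(\mathbf{u}) : \nabla \mathbf{v} \, d\Omega = \int_\Omega \mathbf{f} \cdot \mathbf{v} \, d\Omega + \int_{\Gamma_N} \mathbf{g} \cdot \mathbf{v} \, d\Gamma \qquad \forall \mathbf{v} \in V. \]

Proposition 5.3.1 (Scalar product with a symmetric tensor). Let q ∈ R³ˣ³ be a
given tensor. For every symmetric tensor τ ∈ R³ˣ³, we have
\[ \tau : q = \tau : (q_S + q_{SS}) = \tau : q_S. \tag{5.21} \]
Applying Prop. 5.3.1 to the previous integral equality, where τ ≡ σ(u) and
q ≡ ∇v, and using the generalized Hooke’s law (5.16b), we finally obtain the
weak formulation of the elastic BVP:
find u ∈ V such that
\[ \underbrace{\int_\Omega C \varepsilon(\mathbf{u}) : \varepsilon(\mathbf{v}) \, d\Omega}_{B(\mathbf{u}, \mathbf{v})} = \underbrace{\int_\Omega \mathbf{f} \cdot \mathbf{v} \, d\Omega + \int_{\Gamma_N} \mathbf{g} \cdot \mathbf{v} \, d\Gamma}_{F(\mathbf{v})} \qquad \forall \mathbf{v} \in V. \tag{5.22} \]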

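Theorem 5.3.2 (Variational formulation of (5.22)). The minimization problem:

find u ∈ V such that
\[ J(\mathbf{u}) \le J(\mathbf{v}) \qquad \forall \mathbf{v} \in V \tag{5.23} \]
where
\[ \begin{aligned}
J(\mathbf{v}) &= \underbrace{\frac{1}{2} \int_\Omega \sigma(\mathbf{v}) : \varepsilon(\mathbf{v}) \, d\Omega}_{\text{strain energy}} \ \underbrace{- \int_\Omega \mathbf{f} \cdot \mathbf{v} \, d\Omega - \int_{\Gamma_N} \mathbf{g} \cdot \mathbf{v} \, d\Gamma}_{\text{work against external forces}} \\
&= \int_\Omega \mu\, \varepsilon(\mathbf{v}) : \varepsilon(\mathbf{v}) \, d\Omega + \frac{1}{2} \int_\Omega \lambda (\mathrm{div}\, \mathbf{v})^2 \, d\Omega - \int_\Omega \mathbf{f} \cdot \mathbf{v} \, d\Omega - \int_{\Gamma_N} \mathbf{g} \cdot \mathbf{v} \, d\Gamma \qquad \forall \mathbf{v} \in V,
\end{aligned} \tag{5.24} \]

is equivalent to solving the weak problem (5.22).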

Remark 5.3.3 (Principle of minimum of Potential Energy and Principle of Virtual
Works). The functional (5.24) is the total potential energy stored in the elastic
body B. Thus, the minimization problem (5.23) is the principle of minimum of
the potential energy while the weak formulation (5.22) is the principle of virtual
works. In this regard, the test function v has the mechanical meaning of virtual
displacement.

5.4 Existence and uniqueness of the weak solution

In this section, we apply the theoretical tool of the Lax-Milgram Lemma to verify
that (5.22) admits a unique solution depending continuously on the data of the
elastic problem.

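Theorem 5.4.1 (Korn’s inequality). There exists a positive constant CK = CK(Ω)
such that
\[ \| \varepsilon(\mathbf{v}) \|^2_{L^2(\Omega)} \ge C_K \| \nabla \mathbf{v} \|^2_{L^2(\Omega)} \qquad \forall \mathbf{v} \in V \tag{5.25} \]
where
\[ \| \varepsilon(\mathbf{v}) \|^2_{L^2(\Omega)} = \int_\Omega \varepsilon(\mathbf{v}) : \varepsilon(\mathbf{v}) \, d\Omega = \sum_{i,j=1}^{3} \int_\Omega (\varepsilon_{ij}(\mathbf{v}))^2 \, d\Omega \qquad \forall \mathbf{v} \in V. \tag{5.26} \]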

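Proposition 5.4.2 (Rigid body motions). Let

\[ \mathcal{R} := \left\{ \mathbf{v} : \Omega \to \mathbb{R}^3 \text{ of the form } \mathbf{v} = \mathbf{a} + \mathbf{b} \times \mathbf{x}, \text{ with } \mathbf{a}, \mathbf{b} \text{ constant vectors} \right\} \]

be the space of rigid body motions. Then, we have

\[ \| \varepsilon(\mathbf{v}) \|_{L^2(\Omega)} = 0 \ \Leftrightarrow \ \mathbf{v} = \mathbf{0}, \qquad \forall \mathbf{v} \in V. \tag{5.27} \]

Proof. By definition of rigid body motion, we have

\[ \varepsilon(\mathbf{v}) = 0 \qquad \forall \mathbf{v} \in \mathcal{R}. \]

Applying the homogeneous Dirichlet boundary conditions (5.16c) yields

\[ \mathbf{a} + \mathbf{b} \times \mathbf{x} = \mathbf{0} \qquad \forall \mathbf{x} \in \Gamma_D \]

from which, necessarily, we must have a = b = 0. This proves that the only
admissible rigid body motion for the elasticity problem is v(x) = 0 for all x ∈ Ω,
which is (5.27).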
The combined use of Korn’s inequality (5.25), of Prop. 5.4.2 and of the
Cauchy-Schwarz inequality allows us to prove the following result.

Proposition 5.4.3. For every v ∈ V we have

\[ \min \left\{ C_K^{1/2}, 1 \right\} \| \nabla \mathbf{v} \|_{L^2(\Omega)} \le \| \varepsilon(\mathbf{v}) \|_{L^2(\Omega)} \le \max \left\{ C_K^{1/2}, 1 \right\} \| \nabla \mathbf{v} \|_{L^2(\Omega)}. \tag{5.28} \]

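Proof. For every v ∈ V, we have (omitting the explicit dependence on v for
notational clarity)

\[ \| \varepsilon \|^2_{L^2(\Omega)} = \sum_{i,j=1}^{3} \| \varepsilon_{ij} \|^2_{L^2(\Omega)} = \sum_{i=1}^{3} \| \varepsilon_{ii} \|^2_{L^2(\Omega)} + 2 \| \varepsilon_{12} \|^2_{L^2(\Omega)} + 2 \| \varepsilon_{13} \|^2_{L^2(\Omega)} + 2 \| \varepsilon_{23} \|^2_{L^2(\Omega)}. \tag{5.29} \]

The sum of the first three terms on the right-hand side yields

\[ \sum_{i=1}^{3} \| \varepsilon_{ii} \|^2_{L^2(\Omega)} = \sum_{i=1}^{3} \left\| \frac{\partial v_i}{\partial x_i} \right\|^2_{L^2(\Omega)}. \tag{5.30} \]

As for the sum of the other three terms, we have, for the first contribution

\[ \begin{aligned}
2 \| \varepsilon_{12} \|^2_{L^2(\Omega)} &= 2 \int_\Omega \left[ \frac{1}{2} \left( \frac{\partial v_1}{\partial x_2} + \frac{\partial v_2}{\partial x_1} \right) \right]^2 d\Omega \\
&\le \frac{1}{2} \left( \left\| \frac{\partial v_1}{\partial x_2} \right\|^2_{L^2(\Omega)} + \left\| \frac{\partial v_2}{\partial x_1} \right\|^2_{L^2(\Omega)} + 2 \left\| \frac{\partial v_1}{\partial x_2} \right\|_{L^2(\Omega)} \left\| \frac{\partial v_2}{\partial x_1} \right\|_{L^2(\Omega)} \right),
\end{aligned} \]

and similarly for the other two contributions. Noting that for every a, b ∈ R we
have 2ab ≤ a² + b², the previous inequality yields

\[ 2 \| \varepsilon_{12} \|^2_{L^2(\Omega)} \le \left\| \frac{\partial v_1}{\partial x_2} \right\|^2_{L^2(\Omega)} + \left\| \frac{\partial v_2}{\partial x_1} \right\|^2_{L^2(\Omega)}. \tag{5.31} \]

Replacing (5.30) and (5.31) into (5.29), we obtain

\[ \begin{aligned}
\| \varepsilon \|^2_{L^2(\Omega)} \le{} & \left\| \frac{\partial v_1}{\partial x_1} \right\|^2_{L^2(\Omega)} + \left\| \frac{\partial v_2}{\partial x_2} \right\|^2_{L^2(\Omega)} + \left\| \frac{\partial v_3}{\partial x_3} \right\|^2_{L^2(\Omega)} \\
& + \left\| \frac{\partial v_1}{\partial x_2} \right\|^2_{L^2(\Omega)} + \left\| \frac{\partial v_2}{\partial x_1} \right\|^2_{L^2(\Omega)} \\
& + \left\| \frac{\partial v_1}{\partial x_3} \right\|^2_{L^2(\Omega)} + \left\| \frac{\partial v_3}{\partial x_1} \right\|^2_{L^2(\Omega)} \\
& + \left\| \frac{\partial v_2}{\partial x_3} \right\|^2_{L^2(\Omega)} + \left\| \frac{\partial v_3}{\partial x_2} \right\|^2_{L^2(\Omega)} \\
={} & \| \nabla v_1 \|^2_{L^2(\Omega)} + \| \nabla v_2 \|^2_{L^2(\Omega)} + \| \nabla v_3 \|^2_{L^2(\Omega)} = \| \nabla \mathbf{v} \|^2_{L^2(\Omega)}.
\end{aligned} \]

Combining this latter inequality with Korn’s inequality (5.25), we obtain (5.28).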

Theorem 5.4.4. The quantity

\[ \| \varepsilon(\mathbf{v}) \|_{L^2(\Omega)} : V \to \mathbb{R}^+ \tag{5.32} \]

is an equivalent norm on $V = (H^1_{0,\Gamma_D}(\Omega))^3$.

Proof. Relation (5.27) shows that (5.32) is a norm. Then, applying Def. C.3.2
to the inequality (5.28), with $K_1 \equiv \min\{C_K^{1/2}, 1\}$ and $K_2 \equiv \max\{C_K^{1/2}, 1\}$,
completes the proof.
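Having introduced an appropriate norm on V, we are ready to proceed.
• Continuity of B(·, ·): we have
\[ \begin{aligned}
|B(\mathbf{u}, \mathbf{v})| &= \left| \int_\Omega C \varepsilon(\mathbf{u}) : \varepsilon(\mathbf{v}) \, d\Omega \right| \le \| C \|_\infty \int_\Omega |\varepsilon(\mathbf{u})|\, |\varepsilon(\mathbf{v})| \, d\Omega \\
&\le \| C \|_\infty \| \mathbf{u} \|_V \| \mathbf{v} \|_V \qquad \forall \mathbf{u}, \mathbf{v} \in V
\end{aligned} \]

where $\| C \|_\infty$ denotes the maximum norm of matrix C (see (B.5)). Applying
this latter definition to (5.14) we see that

\[ \| C \|_\infty = 3 \lambda + 2 \mu. \]

Thus, B(·, ·) is continuous with

\[ M = 3 \lambda + 2 \mu. \tag{5.33} \]

• Coercivity of B(·, ·): we have

\[ B(\mathbf{u}, \mathbf{u}) = \int_\Omega C \varepsilon(\mathbf{u}) : \varepsilon(\mathbf{u}) \, d\Omega = \int_\Omega (2 \mu \varepsilon(\mathbf{u}) + \lambda\, \mathrm{div}\, \mathbf{u}\; \delta) : \varepsilon(\mathbf{u}) \, d\Omega \qquad \forall \mathbf{u} \in V, \]

from which
\[ \begin{aligned}
B(\mathbf{u}, \mathbf{u}) &= 2 \mu \| \mathbf{u} \|_V^2 + \int_\Omega \lambda\, \mathrm{div}\, \mathbf{u}\; \delta : \varepsilon(\mathbf{u}) \, d\Omega \\
&= 2 \mu \| \mathbf{u} \|_V^2 + \int_\Omega \lambda (\mathrm{div}\, \mathbf{u})^2 \, d\Omega \ge 2 \mu \| \mathbf{u} \|_V^2 \qquad \forall \mathbf{u} \in V.
\end{aligned} \]

Thus, B(·, ·) is coercive with

\[ \beta = 2 \mu. \tag{5.34} \]

• Continuity of F(·): we have

\[ \begin{aligned}
|F(\mathbf{v})| &\le \int_\Omega |\mathbf{f}|\, |\mathbf{v}| \, d\Omega + \int_{\Gamma_N} |\mathbf{g}|\, |\mathbf{v}| \, d\Gamma \\
&\le \| \mathbf{f} \|_{L^2(\Omega)} \| \mathbf{v} \|_{L^2(\Omega)} + \| \mathbf{g} \|_{L^2(\Gamma_N)} \| \mathbf{v} \|_{L^2(\Gamma_N)} \qquad \forall \mathbf{v} \in V.
\end{aligned} \]

Using Poincaré’s inequality, the Trace Theorem C.7.7 and Thm. 5.4.4, we
finally get

\[ |F(\mathbf{v})| \le \left( C_P \| \mathbf{f} \|_{L^2(\Omega)} + C_\Gamma \| \mathbf{g} \|_{L^2(\Gamma_N)} \right) \| \mathbf{v} \|_V \qquad \forall \mathbf{v} \in V. \]

Thus, F(·) is continuous with

\[ \Lambda = C_P \| \mathbf{f} \|_{L^2(\Omega)} + C_\Gamma \| \mathbf{g} \|_{L^2(\Gamma_N)}. \tag{5.35} \]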
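The constants (5.33) and (5.34) have a simple pointwise counterpart that can be checked numerically. In the sketch below, the Lamé constants and the sample strain tensor are arbitrary values chosen for illustration: the first check computes the maximum norm of the matrix in (5.14), the second verifies the inequality Cε : ε ≥ 2µ ε : ε that underlies the coercivity estimate.

```matlab
lambda = 2;  mu = 1;               % illustrative Lame constants
% Elastic matrix of (5.14) and its maximum (row-sum) norm.
C = [lambda*ones(3) + 2*mu*eye(3), zeros(3);
     zeros(3),                     mu*eye(3)];
M = norm(C, inf);                  % should equal 3*lambda + 2*mu
% Pointwise coercivity check on a sample symmetric strain tensor.
e = [ 1.0  0.2 -0.3;
      0.2 -0.5  0.4;
     -0.3  0.4  2.0];
s = 2*mu*e + lambda*trace(e)*eye(3);     % stress from (5.15)
energy      = sum(sum(s .* e));          % C eps : eps
lower_bound = 2*mu * sum(sum(e .* e));   % 2 mu eps : eps
disp([M, energy, lower_bound]);
```

The difference between energy and lower_bound is exactly λ (Tr ε)², which is non-negative for λ ≥ 0; this is the term dropped in the last step of the coercivity estimate.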

Theorem 5.4.5 (Well posedness of the elasticity problem). Problem (5.22) admits
a unique solution u ∈ V that satisfies the a priori estimate

\[ \| \mathbf{u} \|_V \le \frac{C_P \| \mathbf{f} \|_{L^2(\Omega)} + C_\Gamma \| \mathbf{g} \|_{L^2(\Gamma_N)}}{2 \mu}. \tag{5.36} \]

Proof. To prove (5.36), notice that

\[ B(\mathbf{u}, \mathbf{u}) = F(\mathbf{u}) \]

and then use the coercivity of B, the continuity of F and relations (5.34) and (5.35).
Chapter 6

2D Linear Elasticity: Theory and Finite Element Approximation

Abstract
In this chapter we consider the Navier-Lamé equations of linear elasticity in the
two-dimensional (2D) case (plane stress and plane strain conditions). We first
apply the general machinery of weak formulation and then we construct the Galerkin
Finite Element approximation of the mechanical problem. Computational tests to
validate the numerical performance of the method are run in Matlab using the
code EF2D_el_H developed by Marco Restelli and modified by Riccardo Sacco and
Aurelio Giancarlo Mauri. The code is available at the link:

⇒ https://beep.metid.polimi.it/

6.1 Two-dimensional models in elasticity

In many practical applications of Continuum Mechanics, it is not really neces-
sary to solve the general three-dimensional model (5.16) in order to determine the
deformed configuration of the elastic body. As a matter of fact, under suitable
mechanical assumptions, it is possible to deduce from (5.16) two simpler formu-
lations that are appropriate for the study of problems in two spatial dimensions.
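These formulations are known as plane stress and plane strain models for
linear elasticity, and the stress-strain relation can be written in the
following general form
\[
\boldsymbol{\sigma} = \mathcal{C}\,\boldsymbol{\varepsilon} \tag{6.1}
\]
where $\mathcal{C} \in \mathbb{R}^{3\times 3}$ is the elastic matrix, $\boldsymbol{\sigma}$ is the 2D stress
tensor (in vector form)
\[
\boldsymbol{\sigma} := \begin{bmatrix} \sigma_{xx} \\ \sigma_{yy} \\ \tau_{xy} \end{bmatrix},
\]
and where, denoting by $\mathbf{u} = [u, v]^T$ the 2D displacement field, the 2D strain tensor
(in vector form) is defined as
\[
\boldsymbol{\varepsilon} := \begin{bmatrix} \varepsilon_{xx} \\ \varepsilon_{yy} \\ \gamma_{xy} \end{bmatrix}
= \begin{bmatrix} \partial u/\partial x \\ \partial v/\partial y \\ \partial u/\partial y + \partial v/\partial x \end{bmatrix}. \tag{6.2}
\]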

6.1.1 Plane stress

The characteristic assumptions for this case are (see Fig. 6.1):

• the thickness of the body is small compared to the dimensions of the body
in the other two directions (example: plates);

• the acting loading forces are applied in the symmetry plane of the thickness
and do not have other transversal components with respect to such a plane.

Figure 6.1: Plane stress (axes x, y, z; in-plane dimension L >> thickness t).


Assuming that the xy plane coincides with the mid-plane of the body thickness
and that the z-axis is perpendicular to the xy plane, the plane stress problem is
mathematically identified by the following conditions
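\[
\sigma_{zz} = 0, \qquad \tau_{zx} = \tau_{zy} = 0. \tag{6.3}
\]
The corresponding elastic matrix is
\[
\mathcal{C} = \frac{E}{1-\nu^2}
\begin{bmatrix}
1 & \nu & 0 \\
\nu & 1 & 0 \\
0 & 0 & (1-\nu)/2
\end{bmatrix}. \tag{6.4}
\]
Remark 6.1.1. The strain component $\varepsilon_{zz}$ is, in general, nonzero, and can be
post-computed as
\[
\varepsilon_{zz} = -\frac{\nu}{1-\nu} (\varepsilon_{xx} + \varepsilon_{yy}).
\]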

6.1.2 Plane strain

The characteristic assumptions for this case are (see Fig. 6.2):
• the body has a dimension much larger than the other two (example: long
thin beam, cylinders, drive axles);
• the displacement field does not have components along the axis of the beam
(taken equal to z) but only in directions perpendicular to z;
• the acting loading forces do not depend on the z coordinate and have null
component with respect to the z axis.
Assuming that the z axis coincides with the longitudinal axis of the body, the
plane strain problem is mathematically identified by the following conditions
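\[
\varepsilon_{zz} = 0, \qquad \gamma_{zx} = \gamma_{zy} = 0. \tag{6.5}
\]
The corresponding elastic matrix is
\[
\mathcal{C} = \frac{E}{(1+\nu)(1-2\nu)}
\begin{bmatrix}
1-\nu & \nu & 0 \\
\nu & 1-\nu & 0 \\
0 & 0 & (1-2\nu)/2
\end{bmatrix}. \tag{6.6}
\]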
The elastic matrix (6.6) is the same as in the general 3D case (Eq. (5.11)), except
for the elimination of the rows and columns in (5.14) corresponding to the null
components of the strain tensor.

Figure 6.2: Plane strain (axes x, y, z; longitudinal dimension L >> h).

Remark 6.1.2. The stress component $\sigma_{zz}$ is, in general, nonzero, and can be
post-computed as
\[
\sigma_{zz} = \nu (\sigma_{xx} + \sigma_{yy}).
\]
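
The two elastic matrices (6.4) and (6.6), together with the post-computations
of Remarks 6.1.1 and 6.1.2, can be illustrated with a minimal Python sketch.
The function names `elastic_matrix` and `stress`, as well as the sample strain
values, are assumptions introduced only for this illustration and are not part
of the course code.

```python
def elastic_matrix(E, nu, regime):
    """Return the 3x3 elastic matrix of (6.4) or (6.6) as nested lists."""
    if regime == "plane_stress":
        f = E / (1.0 - nu**2)
        return [[f, f * nu, 0.0],
                [f * nu, f, 0.0],
                [0.0, 0.0, f * (1.0 - nu) / 2.0]]
    if regime == "plane_strain":
        f = E / ((1.0 + nu) * (1.0 - 2.0 * nu))
        return [[f * (1.0 - nu), f * nu, 0.0],
                [f * nu, f * (1.0 - nu), 0.0],
                [0.0, 0.0, f * (1.0 - 2.0 * nu) / 2.0]]
    raise ValueError("regime must be 'plane_stress' or 'plane_strain'")


def stress(C, eps):
    """sigma = C eps, with eps = [eps_xx, eps_yy, gamma_xy] as in (6.1)-(6.2)."""
    return [sum(C[i][j] * eps[j] for j in range(3)) for i in range(3)]


E, nu = 1.0, 0.3
eps = [1.0e-3, -2.0e-4, 5.0e-4]   # arbitrary sample strain (assumption)

# Plane stress: post-compute eps_zz as in Remark 6.1.1
sig_ps = stress(elastic_matrix(E, nu, "plane_stress"), eps)
eps_zz = -nu / (1.0 - nu) * (eps[0] + eps[1])

# Plane strain: post-compute sigma_zz as in Remark 6.1.2
sig_pe = stress(elastic_matrix(E, nu, "plane_strain"), eps)
sigma_zz = nu * (sig_pe[0] + sig_pe[1])

print("plane stress:", sig_ps, "eps_zz =", eps_zz)
print("plane strain:", sig_pe, "sigma_zz =", sigma_zz)
```

In both regimes the shear entry of the matrix equals $\mu = E/(2(1+\nu))$, which
is a quick sanity check on the two formulas.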

6.2 The GFE approximation in the 2D case

In this section, we briefly describe the use of the GFEM for the numerical approx-
imation of the elasticity equations (5.16) in the plane stress/plane strain regimes
discussed in Sects. 6.1.1 and 6.1.2. To this purpose, we refer to Chapt. 4, for the
main general concepts and implementation issues. In what follows, we assume
that $\mathcal{T}_h$ is a given regular triangulation of the computational domain $\Omega$ into
triangles $K$ of diameter $h_K$ and area $|K|$ and, for the sake of simplicity of
presentation, we restrict ourselves to the special, and widely used, case $r = 1$,
corresponding to the finite element space
\[
V_h = \left\{ \mathbf{v}_h \in (C^0(\Omega))^2 \;|\; \mathbf{v}_h|_K \in (\mathbb{P}_1(K))^2 \;\; \forall K \in \mathcal{T}_h, \;\; \mathbf{v}_h = \mathbf{0} \text{ on } \Gamma_D \right\}. \tag{6.7}
\]
The above choice of $V_h \subset V$ amounts to representing each component of the
discrete displacement field by a linearly varying function on each mesh element,
continuous across interelement edges and vanishing on the Dirichlet portion of
the boundary where the elastic body is fixed to ground. For a given $\mathcal{T}_h$, we
denote by $N_h + N_h = 2N_h$ the dimension of $V_h$, so that any vector-valued function
$\mathbf{v}_h = [v_{h,x}, v_{h,y}]^T \in V_h$ can be represented as
\[
v_{h,x}(x,y) = \sum_{j=1}^{N_h} v_j^x \, \varphi_j(x,y), \qquad
v_{h,y}(x,y) = \sum_{j=1}^{N_h} v_j^y \, \varphi_j(x,y)
\]
where the real numbers $\{v_j^x\}_{j=1}^{N_h}$, $\{v_j^y\}_{j=1}^{N_h}$ are the nodal dofs of the x- and
y-components of the displacement $\mathbf{v}_h$, while the $N_h$ functions $\{\varphi_j(x,y)\}_{j=1}^{N_h}$ are
the global pyramid-like basis functions associated with the triangulation $\mathcal{T}_h$ (see
Fig. 4.3).

6.2.1 The Galerkin FE problem

The Galerkin finite element problem associated with the weak formulation (5.22)
of the elasticity problem in two spatial dimensions reads:
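find $\mathbf{u}_h \in V_h$ such that
\[
B(\mathbf{u}_h, \mathbf{v}_h) = F(\mathbf{v}_h) \qquad \forall \mathbf{v}_h \in V_h. \tag{6.8}
\]
In compact matrix form, the above problem translates into the solution of the
following linear algebraic system
\[
\mathbf{K}\mathbf{u} = \mathbf{f} \tag{6.9}
\]
where the entries of the global stiffness matrix $\mathbf{K}$ and of the global load vector $\mathbf{f}$
are given by:
\[
K_{ij} = B(\varphi_j, \varphi_i), \quad i, j = 1, \ldots, N_h, \qquad
f_i = F(\varphi_i), \quad i = 1, \ldots, N_h.
\]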
The application of the Lax-Milgram lemma to (6.8) allows us to conclude the
following result.
Theorem 6.2.1 (Well posedness of the discrete elasticity problem). Problem (6.8)
admits a unique solution uh ∈ Vh (equivalently, system (6.9) is uniquely solvable).
Moreover, uh satisfies the a priori estimate (5.36).
Remark 6.2.2. Coercivity of B(·, ·) implies that K is a s.p.d. matrix.

6.2.2 Local approximations and matrices

Let $K$ be a given triangle in $\mathcal{T}_h$. We denote from now on by $u_h = u_h(x,y)$ and
$v_h = v_h(x,y)$ the two components of the local discrete displacement field
$\mathbf{u}_h^K \equiv \mathbf{u}_h|_K$ in the x and y directions, respectively, so that for each
$K \in \mathcal{T}_h$ we have
\[
u_h(x,y) = \sum_{i=1}^{3} u_i \, \varphi_i^K(x,y), \qquad
v_h(x,y) = \sum_{i=1}^{3} v_i \, \varphi_i^K(x,y),
\]
where, for $i = 1, 2, 3$:

• $\varphi_i^K$ is the restriction over $K$ of the corresponding global basis function
defined over $\mathcal{T}_h$ (see Fig. 4.3);

• $u_i$, $v_i$ are the dofs of the FE approximation over $K$, corresponding to the
nodal values of $u_h$ and $v_h$, respectively (see Fig. 6.3).

Figure 6.3: Local dofs for the CST in Linear Elasticity.

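Denoting by $\mathbf{d}^K = [u_1, v_1, u_2, v_2, u_3, v_3]^T$ the vector of the local nodal
displacements (numbered according to a counterclockwise orientation along the
triangle boundary $\partial K$), we have in compact matrix form
\[
\mathbf{U}^K(x,y) = \mathbf{N}^K(x,y) \, \mathbf{d}^K,
\]
where $\mathbf{U}^K(x,y) := [u_h(x,y), v_h(x,y)]^T$ and
\[
\mathbf{N}^K(x,y) :=
\begin{bmatrix}
\varphi_1^K(x,y) & 0 & \varphi_2^K(x,y) & 0 & \varphi_3^K(x,y) & 0 \\
0 & \varphi_1^K(x,y) & 0 & \varphi_2^K(x,y) & 0 & \varphi_3^K(x,y)
\end{bmatrix}
\]
is the so-called local matrix of the shape functions. Using the kinematic
relation (5.3) we obtain the following expression for the discrete strain tensor
over the element $K$
\[
\boldsymbol{\varepsilon}^K(x,y) = \mathbf{B}^K(x,y) \, \mathbf{d}^K
\]
where
\[
\boldsymbol{\varepsilon}^K(x,y) := \begin{bmatrix} \varepsilon_{11}(x,y) \\ \varepsilon_{22}(x,y) \\ \gamma_{12}(x,y) \end{bmatrix}
\]
and the local strain matrix is
\[
\mathbf{B}^K(x,y) :=
\begin{bmatrix}
\dfrac{\partial \varphi_1^K}{\partial x} & 0 & \dfrac{\partial \varphi_2^K}{\partial x} & 0 & \dfrac{\partial \varphi_3^K}{\partial x} & 0 \\[2mm]
0 & \dfrac{\partial \varphi_1^K}{\partial y} & 0 & \dfrac{\partial \varphi_2^K}{\partial y} & 0 & \dfrac{\partial \varphi_3^K}{\partial y} \\[2mm]
\dfrac{\partial \varphi_1^K}{\partial y} & \dfrac{\partial \varphi_1^K}{\partial x} & \dfrac{\partial \varphi_2^K}{\partial y} & \dfrac{\partial \varphi_2^K}{\partial x} & \dfrac{\partial \varphi_3^K}{\partial y} & \dfrac{\partial \varphi_3^K}{\partial x}
\end{bmatrix}.
\]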

Remark 6.2.3 (CST). Clearly, since the shape functions ϕiK are linear in x and y,
it turns out that the local strain matrix BK (x, y) is constant, and consequently also
ε K (x, y) is a constant tensor. For this reason, the finite element corresponding to
the choice r = 1 is well-known as CST or Constant Strain Triangle.
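
To make the constant-strain property concrete, the following Python sketch
builds the local strain matrix of a P1 triangle from its vertex coordinates and
applies it to the nodal values of a linear displacement field. The function name
`cst_strain_matrix`, the sample triangle and the coefficients `Kx`, `Ky` are
assumptions chosen for illustration; the gradient formulas are the standard ones
for linear shape functions on a counterclockwise-oriented triangle.

```python
def cst_strain_matrix(xy):
    """Return (B, area) for a P1 triangle with counterclockwise vertices xy."""
    (x1, y1), (x2, y2), (x3, y3) = xy
    two_area = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    # Constant partial derivatives of the three linear shape functions
    dx = [(y2 - y3) / two_area, (y3 - y1) / two_area, (y1 - y2) / two_area]
    dy = [(x3 - x2) / two_area, (x1 - x3) / two_area, (x2 - x1) / two_area]
    B = [[dx[0], 0.0, dx[1], 0.0, dx[2], 0.0],
         [0.0, dy[0], 0.0, dy[1], 0.0, dy[2]],
         [dy[0], dx[0], dy[1], dx[1], dy[2], dx[2]]]
    return B, 0.5 * two_area


# Sample triangle (assumption) and linear field u = Kx*x, v = Ky*y
xy = [(0.0, 0.0), (2.0, 0.5), (0.5, 1.5)]
Kx, Ky = 2.0, -1.0
d = []
for x, y in xy:
    d += [Kx * x, Ky * y]          # local dof ordering [u1, v1, u2, v2, u3, v3]

B, area = cst_strain_matrix(xy)
strain = [sum(B[i][j] * d[j] for j in range(6)) for i in range(3)]
print("area =", area)
print("[eps_11, eps_22, gamma_12] =", strain)
```

Since the entries of the matrix do not depend on (x, y), the computed strain is
the same at every point of the element; for the linear field used here it
reproduces the exact strain $[K_x, K_y, 0]^T$ up to round-off.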
Remark 6.2.4 (Higher-order FE spaces). Higher-order approximations of
displacement and strain can be obtained by taking $r > 1$ (see Fig. 4.2). All the
formulas written above continue to apply provided that 3 is replaced by Ndofs, where
\[
\text{Ndofs} = \frac{1}{2}(r+1)(r+2)
\]
is the number of dofs of the finite element space $\mathbb{P}_r(K)$.

As already described in Chapt. 4, the assembly phase of the global coefficient
matrix is based on the computation of:

• a local stiffness matrix;

• a local load vector.


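To construct the local stiffness matrix, we simply need to define the local bilinear
form associated with the global bilinear form $B(\mathbf{u}_h, \mathbf{v}_h)$. We have
\[
B^K(\mathbf{u}_h, \mathbf{v}_h)
:= \int_K \mathcal{C}\,\boldsymbol{\varepsilon}(\mathbf{u}_h) : \boldsymbol{\varepsilon}(\mathbf{v}_h) \, dK
= \int_K (\mathcal{C}\,\mathbf{B}^K \mathbf{d}^K)^T (\mathbf{B}^K \mathbf{v}^K) \, dK
= (\mathbf{d}^K)^T \left( \int_K (\mathbf{B}^K)^T \mathcal{C}\,\mathbf{B}^K \, dK \right) \mathbf{v}^K
\qquad \forall K \in \mathcal{T}_h,
\]
where $\mathcal{C}$ is the symmetric matrix (i.e., such that $\mathcal{C}^T = \mathcal{C}$) given by (6.4) in plane
stress conditions or by (6.6) in plane strain conditions. From the above relation,
we identify
\[
\mathbf{K}^K := \int_K (\mathbf{B}^K)^T \mathcal{C}\,\mathbf{B}^K \, dK \in \mathbb{R}^{6\times 6}
\]
as the local stiffness matrix associated with element $K$, so that we can write in a
synthetic notation
\[
B^K(\mathbf{u}_h, \mathbf{v}_h) = (\mathbf{d}^K)^T \mathbf{K}^K \mathbf{v}^K.
\]
The vector $\mathbf{v}^K \in \mathbb{R}^6$ contains cyclically the dofs associated with the test function
$\mathbf{v}_h$, and is taken equal to $\mathbf{e}_i = [0, 0, \ldots, 1, 0, \ldots, 0]^T$, $i = 1, \ldots, 6$, the entry 1
being in the i-th position.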
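
For the CST the integrand above is constant, so the local stiffness matrix
reduces to $|K| \, (\mathbf{B}^K)^T \mathcal{C} \, \mathbf{B}^K$. The sketch below, in plain Python, computes it
for one triangle in plane stress and checks two expected properties: symmetry,
and the fact that rigid-body motions produce no elastic forces. The function
name `cst_local_stiffness`, the sample triangle and the material data are
assumptions made for the example.

```python
def cst_local_stiffness(xy, C):
    """Local stiffness |K| * B^T C B of a P1 triangle (counterclockwise xy)."""
    (x1, y1), (x2, y2), (x3, y3) = xy
    two_area = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    b = [y2 - y3, y3 - y1, y1 - y2]      # 2|K| * d(phi_i)/dx
    c = [x3 - x2, x1 - x3, x2 - x1]      # 2|K| * d(phi_i)/dy
    B = [[0.0] * 6 for _ in range(3)]
    for i in range(3):
        B[0][2 * i] = b[i] / two_area
        B[1][2 * i + 1] = c[i] / two_area
        B[2][2 * i] = c[i] / two_area
        B[2][2 * i + 1] = b[i] / two_area
    area = 0.5 * two_area
    CB = [[sum(C[r][k] * B[k][j] for k in range(3)) for j in range(6)]
          for r in range(3)]
    return [[area * sum(B[k][i] * CB[k][j] for k in range(3))
             for j in range(6)] for i in range(6)]


# Plane stress matrix (6.4) with sample data E = 1, nu = 0.3 (assumption)
E, nu = 1.0, 0.3
f = E / (1.0 - nu**2)
C = [[f, f * nu, 0.0], [f * nu, f, 0.0], [0.0, 0.0, f * (1.0 - nu) / 2.0]]

xy = [(0.0, 0.0), (2.0, 0.5), (0.5, 1.5)]
KK = cst_local_stiffness(xy, C)

# Rigid-body modes in the dof ordering [u1, v1, u2, v2, u3, v3]
translation_x = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
rotation = []
for x, y in xy:
    rotation += [-y, x]                  # infinitesimal rotation u = -y, v = x

for name, mode in (("translation", translation_x), ("rotation", rotation)):
    forces = [sum(KK[i][j] * mode[j] for j in range(6)) for i in range(6)]
    print(name, "max |K d| =", max(abs(v) for v in forces))
```

The null space spanned by the rigid-body modes is the reason why Dirichlet
conditions must be enforced before the global system (6.9) can be solved.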
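To construct the local load vector, we define the local linear form associated
with the global form $F(\mathbf{v}_h)$. We have
\[
\begin{aligned}
F^K(\mathbf{v}_h)
&:= \int_K \mathbf{f} \cdot \mathbf{v}_h \, dK + \int_{\partial K \cap \Gamma_N} \mathbf{g} \cdot \mathbf{v}_h \, d\sigma \\
&= \int_K \mathbf{f} \cdot \mathbf{N}^K \mathbf{v}^K \, dK + \int_{\partial K \cap \Gamma_N} \mathbf{g} \cdot \mathbf{N}^K \mathbf{v}^K \, d\sigma \\
&= \int_K (\mathbf{N}^K \mathbf{v}^K)^T \mathbf{f} \, dK + \int_{\partial K \cap \Gamma_N} (\mathbf{N}^K \mathbf{v}^K)^T \mathbf{g} \, d\sigma \\
&= (\mathbf{v}^K)^T \left( \int_K (\mathbf{N}^K)^T \mathbf{f} \, dK + \int_{\partial K \cap \Gamma_N} (\mathbf{N}^K)^T \mathbf{g} \, d\sigma \right)
\qquad \forall K \in \mathcal{T}_h.
\end{aligned}
\]
From the above relation, we identify
\[
\mathbf{F}^K := \int_K (\mathbf{N}^K)^T \mathbf{f} \, dK + \int_{\partial K \cap \Gamma_N} (\mathbf{N}^K)^T \mathbf{g} \, d\sigma \in \mathbb{R}^{6\times 1}
\]
as the local load vector associated with element $K$, so that we can write in a
synthetic manner
\[
F^K(\mathbf{v}_h) = (\mathbf{v}^K)^T \mathbf{F}^K = (\mathbf{F}^K)^T \mathbf{v}^K.
\]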

The vectors f = f(x, y) = [ fx (x, y), fy (x, y)]T and g = g(x, y) = [gx (x, y), gy (x, y)]T
contain the functions describing the external volume and surface applied loads.
As discussed in Sect. 4.7, to evaluate $\mathbf{F}^K$ a suitable quadrature formula is needed.
To compute the contribution associated with $\mathbf{f}$ in the case of linear finite elements,
a simple and appropriately accurate formula is the two-dimensional extension of
the trapezoidal rule (see Sect. F.2) given by
\[
\int_K \eta(x,y) \, dK \simeq \sum_{i=1}^{3} \eta(x_i, y_i) \, \frac{|K|}{3}
\]
for every continuous function $\eta : K \to \mathbb{R}$, where $(x_i, y_i)^T$, $i = 1, 2, 3$, are the
coordinates of the vertices of $K$, numbered in counterclockwise sense (see Fig. 6.3),
while $|K|$ denotes the area of triangle $K$. Notice that the above formula is exact
for every $\eta \in \mathbb{P}_1(K)$ and satisfies the error estimate
\[
\left| \int_K \eta(x,y) \, dK - \sum_{i=1}^{3} \eta(x_i, y_i) \, \frac{|K|}{3} \right| \le C \, h_K^3, \qquad \eta \in C^2(K),
\]
$C > 0$ being a constant depending on $\eta$ but independent of $h$. A similar quadrature
formula may be adopted to evaluate the contribution associated with $\mathbf{g}$ on the
boundary portion $\partial K \cap \Gamma_N$.
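
The vertex quadrature rule and its use for the volume part of the local load
vector can be sketched in a few lines of Python. The names `vertex_rule` and
`local_load_vector` are illustrative assumptions; the load entries follow from
applying the rule to each component of $(\mathbf{N}^K)^T \mathbf{f}$ and using the fact that a
shape function equals 1 at its own vertex and 0 at the other two.

```python
def triangle_area(xy):
    (x1, y1), (x2, y2), (x3, y3) = xy
    return 0.5 * ((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))


def vertex_rule(eta, xy):
    """Approximate the integral of eta over K by |K|/3 * sum of vertex values."""
    return triangle_area(xy) / 3.0 * sum(eta(x, y) for x, y in xy)


def local_load_vector(f, xy):
    """Volume part of F^K with the vertex rule: entries |K|/3 * f(x_i, y_i)."""
    w = triangle_area(xy) / 3.0
    F = []
    for x, y in xy:
        fx, fy = f(x, y)
        F += [w * fx, w * fy]            # ordering [F1x, F1y, F2x, F2y, F3x, F3y]
    return F


unit_triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# The rule is exact on linear functions: the exact integral here is 4/3
approx = vertex_rule(lambda x, y: 1.0 + 2.0 * x + 3.0 * y, unit_triangle)
print("vertex rule:", approx)

# Body force of Example 2 in Sect. 6.3.2, used only as sample data
FK = local_load_vector(lambda x, y: (-9.0 * y / 20.0, -3.0 * x / 10.0),
                       unit_triangle)
print("local load vector:", FK)
```

The boundary contribution of $\mathbf{g}$ is not covered by this sketch; it would need the
analogous one-dimensional rule on each edge lying on $\Gamma_N$.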

6.2.3 Convergence analysis

Let us consider the two-dimensional elasticity problem (plane stress
or plane strain conditions) and assume that $\Omega \subset \mathbb{R}^2$ is a polygonal domain
covered by a family of regular triangulations $\{\mathcal{T}_h\}_{h>0}$ (see (4.6)). Assume also
that the finite dimensional subspace of $V$ is the space of (vector-valued) finite
elements of degree $\le r$, $r \ge 1$ being a given integer
\[
V_h = \left\{ \mathbf{v}_h \in (C^0(\Omega))^2 \;|\; \mathbf{v}_h|_K \in (\mathbb{P}_r(K))^2 \;\; \forall K \in \mathcal{T}_h, \;\; \mathbf{v}_h = \mathbf{0} \text{ on } \Gamma_D \right\}. \tag{6.10}
\]

Theorem 6.2.5 (Error estimate for the GFEM applied to Linear Elasticity). Let us
denote by $\mathbf{u}$ and $\mathbf{u}_h \in V_h$ the solution of (5.22) and of the GFE approximation (6.8),
respectively. Assume also that $\mathbf{u} \in (H^s(\Omega))^2 \cap V$, $s \ge 2$ being a given quantity
representing the regularity of the exact solution. Then, the following a priori
error estimate holds
\[
\|\mathbf{u} - \mathbf{u}_h\|_V \le C_{\mathcal{T}_h} \frac{M^{2D}}{\beta^{2D}} \, h^{\ell} \, \|\mathbf{u}\|_{(H^{\ell+1}(\Omega))^2} \tag{6.11}
\]
where $\ell := \min\{r, s-1\}$ is the usual regularity threshold, $C_{\mathcal{T}_h}$ is a positive
constant depending on $\kappa$ but not on the discretization parameter $h$ and $M^{2D}$, $\beta^{2D}$
are the continuity and coercivity constants of the bilinear form $B(\cdot,\cdot)$ in the plane
stress/plane strain regimes, given by
\[
M^{2D} =
\begin{cases}
\dfrac{E}{1-\nu} & \text{plane stress} \\[3mm]
2(\lambda + \mu) = \dfrac{E}{(1+\nu)(1-2\nu)} & \text{plane strain}
\end{cases}
\tag{6.12}
\]
\[
\beta^{2D} = 2\mu = \frac{E}{1+\nu} \qquad \text{plane stress and plane strain.}
\]
Proof. The estimate (6.11) is the result of the straightforward application of Céa's
Lemma (cf. Chapters 2 and 4). To compute the values of the constants $M^{2D}$ and
$\beta^{2D}$ we follow the same steps as in Sect. 5.4 by setting
\[
\boldsymbol{\varepsilon}(x,y) := \begin{bmatrix} \varepsilon_{xx}(x,y) \\ \varepsilon_{yy}(x,y) \\ \varepsilon_{xy}(x,y) \end{bmatrix}
\qquad \text{and} \qquad
\mathcal{C}(x,y) = \begin{bmatrix} a & b & 0 \\ b & a & 0 \\ 0 & 0 & c \end{bmatrix},
\]
where in the case of plane stress conditions $a = E/(1-\nu^2)$, $b = \nu a$ and
$c = 2\mu = E/(1+\nu)$, while in the case of plane strain conditions $a = \lambda + 2\mu$,
$b = \lambda$ and $c = 2\mu$.
Concerning continuity of $B(\cdot,\cdot)$ we immediately get
\[
|B(\mathbf{u},\mathbf{v})| \le \|\mathcal{C}\|_\infty \|\boldsymbol{\varepsilon}(\mathbf{u})\|_{(L^2(\Omega))^2} \|\boldsymbol{\varepsilon}(\mathbf{v})\|_{(L^2(\Omega))^2}
\le (a+b) \|\mathbf{u}\|_V \|\mathbf{v}\|_V \qquad \mathbf{u}, \mathbf{v} \in V,
\]
from which we obtain the expressions for $M^{2D}$ in (6.12), since $a + b = E/(1-\nu)$
in plane stress and $a + b = 2(\lambda + \mu)$ in plane strain.

Concerning coercivity, the following result is useful.
Proposition 6.2.6 (Energy norm induced by $\mathcal{C}$). The positive number
\[
\left( \int_\Omega \mathcal{C}\,\boldsymbol{\varepsilon}(\mathbf{u}) \cdot \boldsymbol{\varepsilon}(\mathbf{u}) \, d\Omega \right)^{1/2} \qquad \mathbf{u} \in V
\]
is a norm in $V$, denoted henceforth by $\|\mathbf{u}\|_{\mathcal{C}}$. Moreover, we have
\[
B(\mathbf{u},\mathbf{u}) = \|\mathbf{u}\|_{\mathcal{C}}^2 \ge \min\{a - b, c\} \, \|\mathbf{u}\|_V^2 \qquad \forall \mathbf{u} \in V. \tag{6.13}
\]
Using Prop. 6.2.6 and noting that $a - b = c = 2\mu$ in both regimes, we immediately
obtain that $B(\cdot,\cdot)$ is a coercive bilinear form with $\beta$ given by the expression
$\beta^{2D}$ in (6.12).

Remark 6.2.7 (The dependence on $\nu$). In the plane stress regime the ratio
$M^{2D}/\beta^{2D}$ is equal to $(1+\nu)/(1-\nu)$, which stays between 1 and 3 for
$\nu \in [0, 1/2]$. Therefore, we can conclude that the choice of the mesh size in
order to obtain a small error is determined essentially by the regularity of the
exact solution.
Instead, in the plane strain regime the ratio $M^{2D}/\beta^{2D}$ is equal to
$1/(1-2\nu)$, which shows that $M^{2D}/\beta^{2D}$ is a bounded quantity (of the order of
unity) as long as $\nu$ is sufficiently less than 0.5 (for example, $\nu = 0.33$, as in the
case of steel). In such a situation, the choice of the mesh size in order to obtain
a small error is determined only by the regularity of the exact solution. This is
no longer true when $\nu$ becomes close to 0.5, i.e., as the elastic material becomes
incompressible. In such a situation, the ratio $M^{2D}/\beta^{2D} \to +\infty$ and the choice of
the mesh size in order to obtain a small error is further penalized by the
incompressibility condition. The practical result of this analysis is that the
computed deformation is severely affected by the so-called volumetric locking,
unless the mesh size is extremely small. This introduces a strong limitation in the
use of the Navier-Lamé model equations (5.16) in the incompressible limit, and an
alternative approach to overcome such a limitation will be the object of Chapt. 7.

Remark 6.2.8 (The 3D case). Applying Céa's Lemma as done in the 2D case, one
can prove a 3D analogue of Thm. 6.2.5 in the case where $\{\mathcal{T}_h\}_{h>0}$ is a family of
regular triangulations made of tetrahedral elements, provided that $M^{2D}$ and
$\beta^{2D}$ are replaced by the continuity and coercivity constants $M$ and $\beta$ determined
in Sect. 5.4 and given by (5.33) and (5.34), respectively. Since the resulting ratio
$M/\beta = 1 + 3\lambda/(2\mu)$ blows up as $\nu \to 0.5$, exactly as in the 2D plane strain
problem, the same considerations valid for this regime can also be applied to the
3D elastic problem.

6.3 Numerical examples

In this section we study two simple problems for which the exact solution is avail-
able, in order to verify the numerical performance of the GFEM. The CST is
adopted in the choice of Vh .

6.3.1 Example 1: patch test (constant stress)

In this case we consider the unit square plate in plane stress conditions shown in
Fig. 6.4. The plate is partially constrained to ground along the sides y = 0 and
x = 0 and is subject to a boundary traction gN along the sides y = 1 and x = 1. The
elastic parameters are E = 1 and ν = 0.3, the volume force f is equal to zero.

Figure 6.4: Constant stress patch test: geometry, boundary conditions and applied
loads having set Kx = Ky = 1 in (6.14), (6.15) and (6.16).

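Data are chosen in such a way that the exact solution of the 2D plane stress
elasticity problem is the displacement field
\[
u(x,y) = K_x x, \qquad v(x,y) = K_y y \tag{6.14}
\]
where $K_x$ and $K_y$ are given constants. Using (6.2) we obtain
\[
\varepsilon_{xx} = K_x, \qquad \varepsilon_{yy} = K_y, \qquad \gamma_{xy} = 0, \tag{6.15}
\]
while using (6.1) and (6.4) we get
\[
\sigma_{xx} = \frac{E}{1-\nu^2} K_x + \frac{\nu E}{1-\nu^2} K_y, \qquad
\sigma_{yy} = \frac{\nu E}{1-\nu^2} K_x + \frac{E}{1-\nu^2} K_y, \qquad
\tau_{xy} = 0. \tag{6.16}
\]
The boundary traction force $\mathbf{g}_N$ can be computed as
\[
\mathbf{g}_N = \boldsymbol{\sigma}\,\mathbf{n} = (\mathcal{C}\,\boldsymbol{\varepsilon})\,\mathbf{n} =
\begin{cases}
\left[ 0, \; \dfrac{\nu E}{1-\nu^2}\,\varepsilon_{xx} + \dfrac{E}{1-\nu^2}\,\varepsilon_{yy} \right]^T & \text{on } y = 1 \\[3mm]
\left[ \dfrac{E}{1-\nu^2}\,\varepsilon_{xx} + \dfrac{\nu E}{1-\nu^2}\,\varepsilon_{yy}, \; 0 \right]^T & \text{on } x = 1,
\end{cases}
\]
where $\mathbf{n} = [0, 1]^T$ on $y = 1$ and $\mathbf{n} = [1, 0]^T$ on $x = 1$.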

Since u is linear and strain/stress are constant, we expect the GFEM to be
nodally exact (up to machine precision, of course) in this elementary case. This
particular application is a consistency test of the numerical method, as it aims at
verifying the ability of the finite element subspace to represent a solution that
actually belongs to it. In the computational mechanics literature, such a test
problem is called a patch test, because the numerical solution should be able to
capture exactly the solution of the continuous problem at the mesh nodes,
irrespective of the choice of the triangulation. The result of the computation is
illustrated in
Fig. 6.5 which shows the deformed configuration of the plate in the case where
Kx = Ky = 1.

Figure 6.5: Constant stress patch test: deformed configuration.
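
The data of the patch test can be reproduced by hand. The Python sketch below
evaluates (6.15), (6.16) and the boundary traction for the values used in the
figures (E = 1, ν = 0.3, Kx = Ky = 1); the helper name `traction` is an
assumption introduced for the example.

```python
E, nu = 1.0, 0.3
Kx, Ky = 1.0, 1.0

# Strains of the exact solution, see (6.15)
eps_xx, eps_yy, gamma_xy = Kx, Ky, 0.0

# Plane stress law (6.4), written out as in (6.16)
f = E / (1.0 - nu**2)
sigma_xx = f * (eps_xx + nu * eps_yy)
sigma_yy = f * (nu * eps_xx + eps_yy)
tau_xy = f * (1.0 - nu) / 2.0 * gamma_xy


def traction(n):
    """Boundary traction g_N = sigma n for a unit normal n = (nx, ny)."""
    nx, ny = n
    return (sigma_xx * nx + tau_xy * ny, tau_xy * nx + sigma_yy * ny)


g_top = traction((0.0, 1.0))     # side y = 1
g_right = traction((1.0, 0.0))   # side x = 1
print("sigma_xx, sigma_yy, tau_xy =", sigma_xx, sigma_yy, tau_xy)
print("g_N on y = 1:", g_top)
print("g_N on x = 1:", g_right)
```

For Kx = Ky the two normal stresses coincide and equal E Kx/(1 − ν), so the
plate is loaded by the same uniform normal traction on both free sides.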


Matlab coding. The following Matlab commands can be used to compute the
mechanical response of the elastic body and the $H^1$ and $L^2$ norms of the
discretization error $\mathbf{u} - \mathbf{u}_h$.
>> EF2D_el_H

* test case ----> patchtest *

reading problem data

matrix assembly
enforcing Neumann BCs
enforcing Dirichlet BCs
solving the linear system
plotting the solution

Error norms:
||u-u_h||_{H^1} --> 3.70702e-15
||u-u_h||_{L^2} --> 2.90620e-16
||p-p_h||_{L^2}--> 5.93407e-01

The results clearly demonstrate that the solution is nodally exact (up to ma-
chine precision), as expected.
Matlab coding. The following Matlab commands can be used to evaluate the
maximum displacement after application of the external loads.
>> max(abs(U_ux))

ans =

>> max(abs(U_uy))

ans =

We see that the maximum deformation is 100%, in accordance with the
exact solution (6.14).

6.3.2 Example 2: experimental convergence analysis

In this case we consider the unit square section of a beam (plane strain conditions)
shown in Fig. 6.6, with E = 1, ν = 0.33 and body force f = [−9y/20, −3x/10]T .
On the right and top sides of the beam, a force g is applied while the displacement
u is constrained on the other remaining sides.

Figure 6.6: Cross-section of a beam: geometry, boundary conditions and applied
loads.

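The exact solution of the elasticity problem is the displacement field
\[
u = \frac{y^3}{10}, \qquad v = \frac{x y^2}{10}
\]
while the corresponding stress field is
\[
\sigma_{xx} = \frac{\lambda x y}{5}, \qquad
\sigma_{yy} = \frac{(\lambda + 2\mu) x y}{5}, \qquad
\tau_{xy} = \frac{2\mu y^2}{5}.
\]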
The Lamé parameters $\lambda$ and $\mu$ in the previous expressions can be computed as a
function of the elastic material constants using relations (5.12). Since $\mathbf{u}$ is a cubic
function of $(x, y)$, we expect the GFEM (with $r = 1$) to introduce a discretization
error. Fig. 6.7 shows the deformed configuration of the beam computed by the
GFEM on a mesh of right-angled triangles of size $h = 1/5$.
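
As a consistency check on the data of this example, the body force can be
recovered from the stress field through the equilibrium equation
div σ + f = 0, which gives f = [−(λ + 4µ) y/5, −(λ + 2µ) x/5]^T. The Python
sketch below evaluates the two coefficients. Relations (5.12) are not reproduced
in this chapter, so the standard expressions of λ and µ in terms of E and ν
(the same used in Sect. 7.1) are assumed; the function names are illustrative.

```python
def lame(E, nu):
    """Lame parameters from Young modulus and Poisson ratio (assumed (5.12))."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return lam, mu


def body_force_coeffs(E, nu):
    """Coefficients (cx, cy) such that f = [cx * y, cy * x]^T = -div(sigma)."""
    lam, mu = lame(E, nu)
    # d(sigma_xx)/dx + d(tau_xy)/dy   = (lam + 4 mu) y / 5
    # d(tau_xy)/dx   + d(sigma_yy)/dy = (lam + 2 mu) x / 5
    return -(lam + 4.0 * mu) / 5.0, -(lam + 2.0 * mu) / 5.0


cx_third, cy_third = body_force_coeffs(1.0, 1.0 / 3.0)
cx_033, cy_033 = body_force_coeffs(1.0, 0.33)
print("nu = 1/3 :", cx_third, cy_third)   # compare with -9/20 and -3/10
print("nu = 0.33:", cx_033, cy_033)
```

Under these assumptions the coefficients −9/20 and −3/10 are reproduced
exactly for ν = 1/3, while ν = 0.33 gives slightly different values; this
suggests that 0.33 is to be read as the rounded value of 1/3.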
Matlab coding. The following Matlab script generates the computational
mesh of Fig. 6.7. The command pdemesh(p,e,t) displays the grid.
N=5;                              % subdivisions per side, h = 1/N
[p,e,t]=poimesh('squareg',N,N);   % regular mesh of the square [-1,1]^2
p(1,:)=0.5*(p(1,:)+1);            % map x from [-1,1] to [0,1]
p(2,:)=0.5*(p(2,:)+1);            % map y from [-1,1] to [0,1]
pdemesh(p,e,t)                    % plot the mesh

An experimental convergence analysis as a function of the mesh size $h$ makes it
possible to verify the theoretical conclusions of Thm. 6.2.5. To this purpose, we
use the above Matlab coding to generate 5 grids of right-angled triangles of
successively refined size $h = 1/N$, where $N = [5, 10, 20, 40, 80]^T$.

Figure 6.7: Cross-section of a beam: deformed configuration.

Tab. 6.1 reports the computed errors in the $H^1$ and $L^2$ norms as a function of
$h$. Results clearly indicate a linear reduction of $\|\mathbf{u} - \mathbf{u}_h\|_V$ (for each row, the next
error is halved with respect to that of the considered row), while $\|\mathbf{u} - \mathbf{u}_h\|_{(L^2(\Omega))^2}$
decreases quadratically vs. $h$ (for each row, the next error is divided by a factor of
$2^2 = 4$ with respect to that of the considered row). These results agree with (6.11)
(in the case $r = 1$ and $s = +\infty$) and with Thm. 2.2.6. Fig. 6.8 shows the log-log
plot of the error $\mathbf{u} - \mathbf{u}_h$, measured in the $H^1$ and $L^2$ norms, respectively.

  h         ||u - u_h||_V     ||u - u_h||_{(L^2(Ω))^2}
  0.2       2.71368e-02       2.16674e-03
  0.1       1.33554e-02       5.63020e-04
  0.05      6.61045e-03       1.42246e-04
  0.025     3.28992e-03       3.56057e-05
  0.0125    1.64201e-03       8.89275e-06

Table 6.1: Discretization error in the $H^1$ and $L^2$ norms as a function of $h$.
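
The observed orders of convergence can be extracted from Tab. 6.1 by comparing
consecutive rows: if the error behaves like $C h^p$, then
$p \simeq \log(e_i/e_{i+1}) / \log(h_i/h_{i+1})$. The following Python sketch applies this
formula to the tabulated values; the function name `rates` is an assumption made
for the example.

```python
import math

# Data copied from Tab. 6.1
h = [0.2, 0.1, 0.05, 0.025, 0.0125]
err_H1 = [2.71368e-02, 1.33554e-02, 6.61045e-03, 3.28992e-03, 1.64201e-03]
err_L2 = [2.16674e-03, 5.63020e-04, 1.42246e-04, 3.56057e-05, 8.89275e-06]


def rates(h, err):
    """Experimental order of convergence between consecutive mesh sizes."""
    return [math.log(err[i] / err[i + 1]) / math.log(h[i] / h[i + 1])
            for i in range(len(h) - 1)]


rates_H1 = rates(h, err_H1)
rates_L2 = rates(h, err_L2)
for i in range(len(h) - 1):
    print("h = %-7g -> %-7g  p_H1 = %.3f  p_L2 = %.3f"
          % (h[i], h[i + 1], rates_H1[i], rates_L2[i]))
```

The computed orders approach 1 in the energy norm and 2 in the $L^2$ norm as
the mesh is refined, in agreement with the slopes of the log-log plot.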


Figure 6.8: Cross-section of a beam: convergence analysis.

Chapter 7

Incompressible Linear Elasticity: Theory and Finite Element Approximation

Abstract

In this chapter, we apply the general machinery of weak formulation and Galerkin
Finite Element approximation to the case of the Navier-Lamé equations of linear
elasticity in the incompressible regime. To deal with the incompressibility
condition (and avoid the volumetric locking phenomenon), we introduce the
so-called Herrmann formulation of the elasticity problem, through the introduction
of an additional unknown, the pressure parameter, and we discuss the compatibility
condition that needs to be satisfied in order for the resulting discrete scheme to
be stable. Several choices of stable and unstable pairs of finite element spaces (for
displacement and pressure, respectively) are considered and analyzed in the 2D
case. Computational tests to validate the numerical performance of the method
are run in Matlab using the code EF2D_el_H developed by Marco Restelli and
modified by Riccardo Sacco and Aurelio Giancarlo Mauri. The code is available
at the link:

⇒ https://beep.metid.polimi.it/


7.1 The incompressible regime in three spatial dimensions

Rubber, polymers, water: these materials have in common one special feature.
Whenever subject to an external load, they deform without changing their volume,
that is to say, they are incompressible. This corresponds to the physical evidence
that producing a change of volume in a (nearly) incompressible material requires
an extremely high amount of energy (water is a limit example!).

Definition 7.1.1 (Incompressibility (mathematical definition)). A material is said
to be incompressible if its displacement field is solenoidal, i.e.
\[
\mathrm{div}\,\mathbf{u} = 0. \tag{7.1}
\]
In the case of a fluid, the above condition means that the fluid velocity $\mathbf{u}$ is
solenoidal.

Recalling (A.7), we see that an alternative definition of incompressibility is
\[
\mathrm{Tr}\,\boldsymbol{\varepsilon}(\mathbf{u}) = \frac{\Delta V}{V} = 0, \tag{7.2}
\]
where $\Delta V / V$ represents the volumetric strain occurring in the three-dimensional
(3D) elastic body due to the applied stress.
From a mechanical point of view, it is useful to express condition (7.2) in
terms of the representative parameters of the material. Using definitions (5.12),
we first introduce the shear modulus
\[
G := \frac{E}{2(1+\nu)} \equiv \mu
\]
and the bulk modulus
\[
K := \lambda + \frac{2}{3}\mu = \frac{E\nu}{(1+\nu)(1-2\nu)} + \frac{2}{3}\mu.
\]
Then, we introduce the following ratio
\[
\rho := \frac{K}{G} = \frac{2}{3} + \frac{\lambda}{\mu} \tag{7.3}
\]
which expresses the resistance of the material to volume variations. Replacing
in (7.3) the definitions of $G$ and $K$ we get
\[
\rho = \frac{2}{3} + \frac{\lambda}{\mu}
= \frac{2}{3} + \frac{\dfrac{E\nu}{(1+\nu)(1-2\nu)}}{\dfrac{E}{2(1+\nu)}}
= \frac{2}{3} + \frac{2\nu}{1-2\nu}
= \frac{2}{3}\,\frac{1+\nu}{1-2\nu}
\]
which shows that
\[
\lim_{\nu \to 0.5^-} \rho = \lim_{\nu \to 0.5^-} \frac{1}{1-2\nu} = +\infty. \tag{7.4}
\]
Therefore, consistently with physical expectation, as the Poisson ratio gets
closer and closer to the limiting value 0.5, resistance to a variation of body volume
increases without bound. Noting that
\[
\lambda = \frac{E}{1+\nu}\,\frac{\nu}{1-2\nu} = 2\mu\,\frac{\nu}{1-2\nu},
\]
it immediately follows that
\[
\lim_{\nu \to 0.5^-} \lambda = +\infty. \tag{7.5}
\]
In conclusion, we have the following alternative definition of incompressibility.

Definition 7.1.2 (Incompressibility (mechanical definition)). A material is said to
be incompressible if condition (7.4) (equivalently, (7.5)) is satisfied.
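
To give a feeling for how fast the ratio grows, the closed form
$\rho = \frac{2}{3}\frac{1+\nu}{1-2\nu}$ can be evaluated for a few sample values of
the Poisson ratio (the values of $\nu$ below are chosen only for illustration):

\[
\begin{aligned}
\nu = 0.3:   \quad & \rho = \tfrac{2}{3} \cdot \tfrac{1.3}{0.4} \simeq 2.17, \\
\nu = 0.49:  \quad & \rho = \tfrac{2}{3} \cdot \tfrac{1.49}{0.02} \simeq 49.7, \\
\nu = 0.499: \quad & \rho = \tfrac{2}{3} \cdot \tfrac{1.499}{0.002} \simeq 499.7.
\end{aligned}
\]

Each additional digit 9 in $\nu$ multiplies $\rho$ by roughly a factor of 10, in
agreement with the $1/(1-2\nu)$ behaviour in (7.4).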

7.2 The incompressible regime in two spatial dimensions

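In order to characterize the notion of material incompressibility in the case of
plane linear elasticity we need to express the stress-strain relation (6.1) as a
function of the material parameters $E$ and $\nu$. In plane stress conditions we have:
\[
\begin{aligned}
\sigma_{xx} &= \frac{E}{1-\nu^2}\,\varepsilon_{xx} + \frac{E\nu}{1-\nu^2}\,\varepsilon_{yy} \\
\sigma_{yy} &= \frac{E\nu}{1-\nu^2}\,\varepsilon_{xx} + \frac{E}{1-\nu^2}\,\varepsilon_{yy} \\
\tau_{xy} &= \frac{E}{2(1+\nu)}\,\gamma_{xy}.
\end{aligned}
\tag{7.6}
\]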
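Adding and subtracting the quantity $E\nu\,\varepsilon_{xx}/(1-\nu^2)$ in the expression for $\sigma_{xx}$
we get
\[
\sigma_{xx} = \left( \frac{E}{1-\nu^2} - \frac{E\nu}{1-\nu^2} \right) \varepsilon_{xx} + \frac{E\nu}{1-\nu^2}\,\mathrm{div}\,\mathbf{u}
= 2\mu\,\varepsilon_{xx} + \frac{E\nu}{1-\nu^2}\,\mathrm{div}\,\mathbf{u}.
\]
Proceeding similarly in the expression for $\sigma_{yy}$ we get
\[
\sigma_{yy} = \left( \frac{E}{1-\nu^2} - \frac{E\nu}{1-\nu^2} \right) \varepsilon_{yy} + \frac{E\nu}{1-\nu^2}\,\mathrm{div}\,\mathbf{u}
= 2\mu\,\varepsilon_{yy} + \frac{E\nu}{1-\nu^2}\,\mathrm{div}\,\mathbf{u}.
\]
The tangential stress, instead, is
\[
\tau_{xy} = \frac{E}{2(1+\nu)}\,\gamma_{xy} = \mu \, 2\varepsilon_{xy}.
\]
Thus, in summary we can write the generalized Hooke's law in the two-dimensional
plane stress condition as
\[
\boldsymbol{\sigma} = 2\mu\,\boldsymbol{\varepsilon} + \lambda^{2D}_{\text{PlaneStress}} \, \mathrm{div}\,\mathbf{u} \, \boldsymbol{\delta}, \tag{7.7}
\]
where
\[
\lambda^{2D}_{\text{PlaneStress}} := \frac{E\nu}{1-\nu^2} = \frac{E\nu}{(1+\nu)(1-\nu)}. \tag{7.8}
\]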
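In plane strain conditions we have:
\[
\begin{aligned}
\sigma_{xx} &= \frac{E(1-\nu)}{(1+\nu)(1-2\nu)}\,\varepsilon_{xx} + \frac{E\nu}{(1+\nu)(1-2\nu)}\,\varepsilon_{yy} \\
\sigma_{yy} &= \frac{E\nu}{(1+\nu)(1-2\nu)}\,\varepsilon_{xx} + \frac{E(1-\nu)}{(1+\nu)(1-2\nu)}\,\varepsilon_{yy} \\
\tau_{xy} &= \frac{E}{2(1+\nu)}\,\gamma_{xy}.
\end{aligned}
\tag{7.9}
\]
It is immediate to verify that the above expressions can be written compactly as
the generalized Hooke's law in the two-dimensional plane strain condition
\[
\boldsymbol{\sigma} = 2\mu\,\boldsymbol{\varepsilon} + \lambda^{2D}_{\text{PlaneStrain}} \, \mathrm{div}\,\mathbf{u} \, \boldsymbol{\delta}, \tag{7.10}
\]
where
\[
\lambda^{2D}_{\text{PlaneStrain}} := \frac{E\nu}{(1+\nu)(1-2\nu)}. \tag{7.11}
\]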
Thus, consistently with the different forms assumed by the elastic matrix $\mathcal{C}$ in
plane stress and plane strain conditions, also the generalized Hooke's law assumes
different forms in the two regimes. These forms are characterized by two different
expressions of the second Lamé parameter $\lambda$. In particular, we see that under the
assumption $\nu > 0$, volumetric locking may occur only in plane strain conditions. If
$\nu$ is allowed to vary in the admissible range $-1 < \nu < 1/2$ we see that volumetric
locking may occur also in plane stress conditions because $\lambda^{2D}_{\text{PlaneStress}}$ becomes
unbounded in absolute value:
\[
\lim_{\nu \to -1^+} \left| \lambda^{2D}_{\text{PlaneStress}} \right| = +\infty.
\]
Unless otherwise specified, from now on we always assume that $\nu \in [0, 1/2]$ and
we denote the second Lamé parameter as $\lambda$, the distinction between plane stress
and plane strain conditions being understood.
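
The different behaviour of the two coefficients (7.8) and (7.11) as ν approaches
0.5 can be seen numerically with the short Python sketch below. The function
names and the sample values of ν are assumptions made for the illustration.

```python
def lam_plane_stress(E, nu):
    """Second Lame parameter in plane stress, see (7.8)."""
    return E * nu / ((1.0 + nu) * (1.0 - nu))


def lam_plane_strain(E, nu):
    """Second Lame parameter in plane strain, see (7.11)."""
    return E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))


E = 1.0
for nu in (0.3, 0.45, 0.49, 0.499, 0.4999):
    print("nu = %-6g  plane stress: %8.4f   plane strain: %12.4f"
          % (nu, lam_plane_stress(E, nu), lam_plane_strain(E, nu)))
```

The plane stress coefficient remains bounded (it tends to 2E/3 as ν tends to
0.5), whereas the plane strain coefficient grows like 1/(1 − 2ν), which is the
source of locking in that regime.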

7.3 Volumetric locking: examples

Consider the study of the deformation of the 2D cross-section of a long thin beam
(plane strain conditions) shown in Fig. 7.1.

Figure 7.1: 2D cross-section of a beam: geometrical notation, loads and
constraints.

The solution of the elastic problem (5.16) is a function of the elastic Lamé
parameters $\lambda$ and $\mu$, so that we can write
\[
\mathbf{u} = \mathbf{u}(\lambda, \mu).
\]
To the purpose of analysis, let us assume that $\mu = 1$ and perform a study as a
function of the sole remaining parameter $\lambda \in [0, +\infty]$. A computation of the
solution of the elastic problem yields the results reported in Tab. 7.1, which
indicate that as the material becomes incompressible, the deformation tends to a
limit (nonzero) value of almost 18%.

  λ        maximum deformation
  10       0.21662
  10^4     0.17676
  10^7     0.17716

Table 7.1: Maximum deformation as a function of λ.

For these data, we compute the approximate solution of problem (5.16) using the
GFEM with $r = 1$ as explained in detail in Chapt. 6. The obtained results are
reported in Tab. 7.2.

  λ        maximum deformation    h
  10       0.212                  1/32
  10^7     0.0003                 1/64

Table 7.2: Maximum approximate deformation as a function of λ and for given
values of h.

The numerically computed values of the deformation indicate that the approximate
deformation accurately represents the exact one only when $\lambda$ is small.
The percentage error is in such a case
\[
\varepsilon_{rel} = \frac{|0.21662 - 0.212|}{0.21662} \times 100\,\% \simeq 2\,\%.
\]
Things are very different, however, when the material approaches the incompressible
regime ($\lambda = 10^7$). As a matter of fact, although the mesh size is halved with
respect to the case $\lambda = 10$, the maximum deformation is practically zero, and
$\varepsilon_{rel} \simeq 100\,\%$. The fact that the material does not exhibit an appreciable
deformation is a manifestation of the so-called volumetric locking.
The reasons for such an unphysical performance of the numerical formulation
were anticipated in Rem. 6.2.7. As a matter of fact, in the case where $\lambda = 10^7$
and $\mu = 1$, we have $M^{2D}/\beta^{2D} = 1 + \lambda/\mu = O(10^7)$ in the a priori error
estimate (6.11). This implies that to have a small error, we need to take $h$ much
smaller than 1/64, at least of the order of $h_{min} = 10^{-7}$, if we make the (rough)
assumption that $\|\mathbf{u}\|_{(H^2(\Omega))^2} = O(1)$. Should $h$ be chosen significantly larger than
$h_{min}$ (as in the present example), it is no surprise that the corresponding error is
very large and volumetric locking occurs.
Example 7.3.1 (Kinematic interpretation of locking). From the results of the
previous example, one is tempted to draw the conclusion that the only discrete
solution in the space $V_h$ (with $r = 1$) that is also compatible with the
incompressibility constraint (7.1), is
\[
\mathbf{u}_h(x,y) = \mathbf{0} \qquad \forall (x,y) \in \Omega. \tag{7.12}
\]
To check the correctness of this claim, let us consider the geometrical
representation of Fig. 7.2 which shows a (very coarse) triangulation of the domain
$\Omega$, and assume $\lambda = +\infty$ (exact incompressibility).

Figure 7.2: Kinematic interpretation of volumetric locking.

Since the body has to deform at constant volume, the area of all mesh triangles
must remain the same. Referring in particular to triangle 1, the constraint of
constant area prevents node A from moving along the y direction, and allows only
for a horizontal motion. At the same time, node A belongs also to triangle 2, so
that, in order to maintain the area constant, no horizontal motion is permitted to
node A, but only vertical. This, however, is prohibited by the previous analysis,
so that the conclusion is that the only admissible motion for node A is $\mathbf{u}_A = \mathbf{0}$.
This argument applies also to node B and to all the remaining nodes, from
which it follows that the only piecewise linear continuous deformed configuration
compatible with (7.1) is the null function of (7.12).
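
The area argument can be reproduced numerically on a small configuration. The
geometry below is hypothetical (the mesh of Fig. 7.2 is not available here): two
triangles share the free node A = (1, 1), triangle 1 has its other two nodes
fixed on the x axis and triangle 2 has its other two nodes fixed on the y axis.
The Python sketch measures how the two areas change when A is displaced.

```python
def signed_area(p1, p2, p3):
    """Signed area of the triangle p1-p2-p3 (positive if counterclockwise)."""
    return 0.5 * ((p2[0] - p1[0]) * (p3[1] - p1[1])
                  - (p3[0] - p1[0]) * (p2[1] - p1[1]))


# Hypothetical nodes: P, Q, R are constrained, A is the only free node
P, Q, R = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
A0 = (1.0, 1.0)


def area_changes(dx, dy):
    """Area variations of triangle 1 (P, Q, A) and triangle 2 (P, A, R)."""
    A = (A0[0] + dx, A0[1] + dy)
    d1 = signed_area(P, Q, A) - signed_area(P, Q, A0)
    d2 = signed_area(P, A, R) - signed_area(P, A0, R)
    return d1, d2


print("horizontal move:", area_changes(0.1, 0.0))   # only triangle 2 changes
print("vertical move:  ", area_changes(0.0, 0.1))   # only triangle 1 changes
print("no move:        ", area_changes(0.0, 0.0))
```

In this configuration the area of triangle 1 depends only on the vertical
displacement of A and the area of triangle 2 only on the horizontal one, so both
areas are preserved only if A does not move at all.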

7.4 A two-field model for compressible and incompressible elasticity

In this section, we introduce a unified mathematical formulation of the elasticity
problem in both compressible and incompressible regimes. To this purpose, we
start with the following definition.
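Definition 7.4.1 (The hydrostatic pressure). The hydrostatic pressure $\wp$ is defined
as
\[
\wp := -\frac{1}{3}\,\mathrm{Tr}\,\boldsymbol{\sigma} = -\frac{1}{3}(\sigma_{11} + \sigma_{22} + \sigma_{33}). \tag{7.13}
\]
Using the Navier-Lamé stress-strain relation (5.15), we have
\[
\wp = -\frac{1}{3}\,\mathrm{Tr}\,\boldsymbol{\sigma}
= -\frac{2\mu}{3}\,\mathrm{Tr}\,\boldsymbol{\varepsilon} - \lambda\,\mathrm{div}\,\mathbf{u}
= -\lambda \left( 1 + \frac{2\mu}{3\lambda} \right) \mathrm{div}\,\mathbf{u}. \tag{7.14}
\]
On the other hand, the generalized Hooke's law can be written in the following
equivalent form
\[
\boldsymbol{\sigma} = \boldsymbol{\sigma} + \wp\,\boldsymbol{\delta} - \wp\,\boldsymbol{\delta}
= 2\mu\,\boldsymbol{\varepsilon} + (\lambda\,\mathrm{div}\,\mathbf{u} + \wp)\,\boldsymbol{\delta} - \wp\,\boldsymbol{\delta}
\]
which, using (7.14) and recalling that $\mathrm{div}\,\mathbf{u} = \mathrm{Tr}\,\boldsymbol{\varepsilon}$, becomes
\[
\boldsymbol{\sigma} = 2\mu \left( \boldsymbol{\varepsilon} - \frac{1}{3}\,\mathrm{Tr}\,\boldsymbol{\varepsilon}\,\boldsymbol{\delta} \right) - \wp\,\boldsymbol{\delta} := \mathbf{S} - \wp\,\boldsymbol{\delta}, \tag{7.15}
\]
where
\[
\mathbf{S} := 2\mu \left( \boldsymbol{\varepsilon} - \frac{1}{3}\,\mathrm{Tr}\,\boldsymbol{\varepsilon}\,\boldsymbol{\delta} \right) \tag{7.16}
\]
is the deviatoric part of the stress tensor. By construction, we have
\[
\mathrm{Tr}\,\mathbf{S} = S_{11} + S_{22} + S_{33} = 2\mu (\mathrm{Tr}\,\boldsymbol{\varepsilon} - \mathrm{Tr}\,\boldsymbol{\varepsilon}) = 0.
\]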
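
The decomposition (7.15)-(7.16) can be verified numerically on a sample strain
tensor. The Python sketch below uses nested lists for the 3×3 tensors; the
function name `decompose`, the Lamé values and the strain entries are
assumptions chosen only for the check, and the stress-strain law used is the
form σ = 2µε + λ div u δ recalled above.

```python
def decompose(eps, lam, mu):
    """Return (sigma, pressure, S) for a symmetric 3x3 strain tensor eps."""
    tr = eps[0][0] + eps[1][1] + eps[2][2]            # Tr(eps) = div u
    delta = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    sigma = [[2.0 * mu * eps[i][j] + lam * tr * delta[i][j]
              for j in range(3)] for i in range(3)]
    pressure = -(sigma[0][0] + sigma[1][1] + sigma[2][2]) / 3.0   # (7.13)
    S = [[2.0 * mu * (eps[i][j] - tr / 3.0 * delta[i][j])         # (7.16)
          for j in range(3)] for i in range(3)]
    return sigma, pressure, S


lam, mu = 10.0, 1.0                                   # sample values
eps = [[1.0e-3, 2.0e-4, 0.0],
       [2.0e-4, -5.0e-4, 1.0e-4],
       [0.0, 1.0e-4, 3.0e-4]]

sigma, pressure, S = decompose(eps, lam, mu)
trace_S = S[0][0] + S[1][1] + S[2][2]
residual = max(abs(sigma[i][j] - (S[i][j] - (pressure if i == j else 0.0)))
               for i in range(3) for j in range(3))
print("hydrostatic pressure =", pressure)
print("Tr S =", trace_S, "  max |sigma - (S - p delta)| =", residual)
```

Both printed residuals are zero up to round-off, confirming that the deviatoric
part carries no trace and that the hydrostatic part accounts for all of it.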
The representation (7.15)-(7.16) has the important mechanical effect of
decomposing the state of stress in a material into a deviatoric part and a
hydrostatic part. From the mathematical point of view, the decomposition (7.15),
unlike (5.15), does not fail in the incompressible case, provided that the pressure
is bounded. This observation can be profitably exploited to construct a mathematical
formulation of the linear elasticity problem that is robust with respect to the
compressibility parameter $\lambda$. With this aim, let us introduce the new dependent
variable
\[
p := -\lambda\,\mathrm{div}\,\mathbf{u}. \tag{7.17}
\]
Comparing (7.17) and (7.14), we see that
\[
\wp = \left( 1 + \frac{2\mu}{3\lambda} \right) p,
\]
from which we can express $p$ as a function of $\wp$ as
\[
p = \frac{3\lambda}{2\mu + 3\lambda}\,\wp. \tag{7.18}
\]
The new variable $p$ is referred to as the pressure parameter, because (7.18) shows
that
\[
\lim_{\lambda \to +\infty} p = \wp, \tag{7.19}
\]
that is, in the incompressible limit the pressure parameter coincides with the
hydrostatic pressure.
Remark 7.4.2. For brevity, we shall refer to the pressure parameter p as the pres-
sure. As noted above, there should not be confusion between pressure and hydro-
static pressure, these two quantities being related through (7.18).
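
To get a feeling for how quickly the pressure parameter approaches the
hydrostatic pressure, one can tabulate the ratio p/℘ given by (7.18) for
increasing λ. This is only a sketch: the helper name pressure_parameter_ratio and
the value µ = 1 are assumptions made for illustration.

```python
def pressure_parameter_ratio(lam, mu):
    """Ratio p / hydrostatic pressure from (7.18): 3*lam / (2*mu + 3*lam)."""
    return 3.0 * lam / (2.0 * mu + 3.0 * lam)

mu = 1.0  # illustrative shear modulus
for lam in (1.0, 1e2, 1e4, 1e6):
    ratio = pressure_parameter_ratio(lam, mu)
    print(f"lambda = {lam:8.1e}   p / hydrostatic pressure = {ratio:.6f}")
```

The ratio increases monotonically towards 1 as λ grows, in agreement with (7.19).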
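Substituting (7.17) into the Navier-Lamé model (5.16), we obtain, for λ > 0, the
following novel formulation of the linear elasticity problem:
find the displacement u : Ω → R^3 and the pressure p : Ω → R such that:
\begin{align}
\mathrm{div}\,\sigma(u,p) + f &= 0 && \text{in } \Omega \tag{7.20a}\\
\sigma(u,p) &= 2\mu\,\nabla_S u - p\,\delta && \text{in } \Omega \tag{7.20b}\\
\frac{p}{\lambda} + \mathrm{div}\,u &= 0 && \text{in } \Omega \tag{7.20c}\\
u &= 0 && \text{on } \Gamma_D \tag{7.20d}\\
\sigma(u,p)\,n &= g && \text{on } \Gamma_N. \tag{7.20e}
\end{align}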

Remark 7.4.3 (Two-field model). The Navier-Lamé system (5.16) is a displacement
formulation of elasticity (cf. Rem. 5.2.1), while the novel system (7.20) is
a two-field formulation (displacement-pressure). System (7.20) is also known as
the Herrmann formulation of linear elasticity, because of the introduction of the
pressure variable. Comparing (7.20b) and (7.15), we see that in the incompressible
limit (λ → +∞) the deviatoric stress S tends to coincide with 2µ∇_S u, the pressure
parameter p tends to ℘ by (7.19), and the two stress-strain relations become identical.

Remark 7.4.4 (A unified model). Except for the special case λ = 0 (corresponding
to a material that does not exhibit transverse contraction in its mechanical response
and that can be treated by using the standard Navier-Lamé model), system (7.20)
is well defined for every λ ∈ (0, +∞]. For this reason, its use is by no means
restricted to the (nearly) incompressible regime, but can be adopted also for the
study of compressible materials. In this sense, system (7.20) can be regarded as a
unified formulation of linear elasticity.

7.5 Weak formulation of the two-field model

The weak formulation of the two-field system (7.20) is the natural extension of
that considered in Sect. 5.3 for the Navier-Lamé model. Let V and Q be the
function spaces for displacement and pressure, respectively. The definition of V is
the same as in (5.20), so that we proceed by multiplying (7.20a) by a test function
v ∈ V and integrating by parts the term
\[
\int_\Omega \mathrm{div}\,\sigma(u,p) \cdot v \, d\Omega \qquad \forall v \in V
\]
to obtain
\[
\int_\Gamma v \cdot \sigma(u,p)\,n \, d\Gamma - \int_\Omega \sigma(u,p) : \varepsilon(v) \, d\Omega \qquad \forall v \in V.
\]

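The boundary integral can be treated as done in Sect. 5.3. The volume integral
becomes
\begin{align*}
\int_\Omega \sigma(u,p) : \varepsilon(v) \, d\Omega
 &= \int_\Omega (2\mu\,\varepsilon(u) - p\,\delta) : \varepsilon(v) \, d\Omega \\
 &= \int_\Omega 2\mu\,\varepsilon(u) : \varepsilon(v) \, d\Omega - \int_\Omega p \,\mathrm{Tr}\,\varepsilon(v) \, d\Omega \\
 &= \int_\Omega 2\mu\,\varepsilon(u) : \varepsilon(v) \, d\Omega - \int_\Omega p \,\mathrm{div}\,v \, d\Omega \qquad \forall v \in V.
\end{align*}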

The fact that v ∈ V implies that the function div v belongs to L^2(Ω). Thus, using
Thm. 1.3.1, we have that a sufficient condition for the function p div v to belong
to L^1(Ω) is that p belongs to L^2(Ω). In conclusion, the appropriate function space
in which to seek the pressure is
\[
Q := L^2(\Omega), \tag{7.21}
\]
and the weak formulation of the Herrmann model for elasticity reads:
find u ∈ V and p ∈ Q such that, for all v ∈ V and for all q ∈ Q we have:
\[
\begin{cases}
\displaystyle \int_\Omega 2\mu\,\varepsilon(u) : \varepsilon(v) \, d\Omega - \int_\Omega p \,\mathrm{div}\,v \, d\Omega
 = \int_\Omega f \cdot v \, d\Omega + \int_{\Gamma_N} g \cdot v \, d\Gamma \\[2mm]
\displaystyle -\int_\Omega q \,\mathrm{div}\,u \, d\Omega - \frac{1}{\lambda} \int_\Omega p\,q \, d\Omega = 0.
\end{cases}
\tag{7.22}
\]

7.6 Abstract form of the Herrmann system

The weak form of the Herrmann formulation can be written compactly
as the following two-by-two block abstract system:
find u ∈ V and p ∈ Q such that:
\[
\begin{cases}
a(u,v) + b(v,p) = F(v) & \forall v \in V \\
b(u,q) + d(p,q) = 0 & \forall q \in Q
\end{cases}
\tag{7.23}
\]
where a(·, ·) : V × V → R is the bilinear form defined as
\[
a(u,v) = \int_\Omega 2\mu\,\varepsilon(u) : \varepsilon(v) \, d\Omega,
\]
b(·, ·) : V × Q → R is the bilinear form defined as
\[
b(v,q) = -\int_\Omega q \,\mathrm{div}\,v \, d\Omega,
\]
d(·, ·) : Q × Q → R is the bilinear form defined as
\[
d(p,q) = -\frac{1}{\lambda} \int_\Omega p\,q \, d\Omega,
\]
while F(·) : V → R is the linear functional defined as
\[
F(v) = \int_\Omega f \cdot v \, d\Omega + \int_{\Gamma_N} g \cdot v \, d\Gamma.
\]
System (7.23) is the generalization to the Herrmann elasticity formulation (7.20)
of the abstract problem (5.22) associated with the Navier-Lamé BVP (5.16).

Theorem 7.6.1 (Elimination of the pressure). If λ < +∞, the abstract prob-
lem (7.23) is equivalent to the abstract problem (5.22).
Proof. If λ < +∞, the second equation of (7.23), multiplied by −λ, gives
\[
\int_\Omega (p + \lambda\,\mathrm{div}\,u)\, q \, d\Omega = 0 \qquad \forall q \in Q,
\]
which tells us that the element p + λ div u of the space Q = L^2(Ω) is orthogonal
to every element q of Q with respect to the usual scalar product in L^2(Ω). Thus,
p + λ div u is the null element of Q, that is
\[
p = -\lambda\,\mathrm{div}\,u \qquad \text{a.e. in } \Omega.
\]
Substituting the above relation into the first equation of (7.23) immediately gives
back the abstract problem (5.22).
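
The elimination argument has a direct algebraic analogue. The sketch below is
not part of the course code: it uses random matrices A, B and M as hypothetical
stand-ins for discrete versions of a(·,·), b(·,·) and the L^2(Ω) inner product on
Q, and checks that solving the two-by-two block system gives the same
displacement as the condensed, displacement-only system.

```python
import numpy as np

rng = np.random.default_rng(1)
n_u, n_p, lam = 6, 3, 50.0   # illustrative sizes and finite lambda

G = rng.standard_normal((n_u, n_u))
A = G @ G.T + n_u * np.eye(n_u)          # SPD stand-in for a(.,.)
B = rng.standard_normal((n_p, n_u))      # stand-in for b(.,.)
H = rng.standard_normal((n_p, n_p))
M = H @ H.T + n_p * np.eye(n_p)          # SPD stand-in for the L2 product on Q
F = rng.standard_normal(n_u)

# Two-field system: [A  B^T; B  -(1/lam) M] [u; p] = [F; 0]
K = np.block([[A, B.T], [B, -M / lam]])
sol = np.linalg.solve(K, np.concatenate([F, np.zeros(n_p)]))
u_two_field, p_two_field = sol[:n_u], sol[n_u:]

# The second block row gives p = lam * M^{-1} B u; substituting it into the
# first block row leaves a system in the displacement only.
A_reduced = A + lam * B.T @ np.linalg.solve(M, B)
u_reduced = np.linalg.solve(A_reduced, F)
p_reduced = lam * np.linalg.solve(M, B @ u_reduced)

print("same displacement:", np.allclose(u_two_field, u_reduced))
print("same pressure    :", np.allclose(p_two_field, p_reduced))
```

Note that the condensed matrix contains the factor λ explicitly, so the
elimination is only available for λ < +∞, as stated in the theorem.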
Remark 7.6.2 (The role of the pressure in the incompressible case). From the
proof of Thm. 7.6.1, we immediately see that in the exactly incompressible case
(λ = +∞), the elimination of the pressure from (7.23)2 is no longer possible,
so that the pressure has to be determined by solving the full two-by-two coupled
system (7.23). Therefore, while in the compressible regime the pressure is an aux-
iliary variable (not strictly necessary for the solution of the elasticity problem), in
the incompressible limit the pressure is necessary for the solution of the problem.
To overcome this fundamental difficulty of the exactly incompressible regime, a
popular approach in Computational Mechanics consists of introducing a (small)
additional compressibility to the mechanical behavior of the solid, in order to end
up with a modified problem of the form (7.23) to which Thm. 7.6.1 can be applied.
This approach is known as the B-bar method.

7.7 Well-posedness analysis of the two-field model

Let us now address the issue of assessing whether the two-field formulation (7.22)
is well-posed. To this purpose, we set λ = +∞ (incompressible regime) and
slightly simplify the problem by setting Γ_N = ∅, so that the elasticity problem
becomes:
find u : Ω → R^3 and p : Ω → R such that:
\begin{align}
-2\mu\,\mathrm{div}\,\varepsilon(u) + \nabla p &= f && \text{in } \Omega \tag{7.24a}\\
\mathrm{div}\,u &= 0 && \text{in } \Omega \tag{7.24b}\\
u &= 0 && \text{on } \Gamma. \tag{7.24c}
\end{align}

Remark 7.7.1. From (7.24a) and the (mechanical) fact that no information is given
on the normal stress (and thus, on p) on the boundary, it immediately follows that
if p∗ is a solution, also p∗ +C is a solution, C being an arbitrary constant.
Because of (7.24c), the function space in which to seek the displacement is
\[
V = (H^1_0(\Omega))^3.
\]
Because of Rem. 7.7.1, an appropriate choice for the function space in which to seek
the pressure is
\[
Q = L^2_0(\Omega) := \left\{ q \in L^2(\Omega) \;\Big|\; \int_\Omega q(x) \, d\Omega = 0 \right\}.
\]
Adding the further requirement that the pressure have zero mean over Ω allows us to
fix in a unique manner the arbitrary constant C introduced in Rem. 7.7.1. This
approach closely resembles what was done in the case of the non-homogeneous
Neumann problem (see Rem. 1.4.7).
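The weak problem associated with (7.24) is:
find u ∈ V and p ∈ Q such that, for all v ∈ V and for all q ∈ Q we have:
\[
\begin{cases}
\displaystyle \int_\Omega 2\mu\,\varepsilon(u) : \varepsilon(v) \, d\Omega - \int_\Omega p \,\mathrm{div}\,v \, d\Omega = \int_\Omega f \cdot v \, d\Omega \\[2mm]
\displaystyle -\int_\Omega q \,\mathrm{div}\,u \, d\Omega = 0.
\end{cases}
\tag{7.25}
\]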

The following result is crucial in the analysis of well-posedness of problem (7.25).

Theorem 7.7.2 (Compatibility condition relating V and Q). There exists a positive
constant γ such that, for every q ∈ Q, there exists v_q ∈ V, with v_q ≠ 0, such that
\[
\int_\Omega q \,\mathrm{div}\,v_q \, d\Omega \ge \gamma\,\|q\|_Q\,\|v_q\|_V. \tag{7.26}
\]

Remark 7.7.3 (Inf-sup condition). The inequality (7.26) expresses a compatibility
condition, through the action of the bilinear form b(·, ·), introduced in Sect. 7.6,
between the two function spaces V and Q. It is also called inf-sup condition because
it can be reformulated as
\[
\underbrace{\inf_{q \in Q}}_{\text{for all } q \in Q}\;
\underbrace{\sup_{v \in V}}_{\text{there exists } v \in V}\;
\frac{\displaystyle \int_\Omega q \,\mathrm{div}\,v \, d\Omega}{\|q\|_Q\,\|v\|_V} \ge \gamma. \tag{7.27}
\]
The inf-sup condition is also known as LBB condition, from the names of the
mathematicians who, over the last 40 years, have given fundamental contributions
to its development and analysis: Olga Ladyzhenskaya, Franco Brezzi
and Ivo Babuška.
Remark 7.7.4 (Weak coercivity). The inf-sup condition can be regarded as a prop-
erty of weak coercivity of the bilinear form b(·, ·). The constant γ plays here the
role of β in the Lax-Milgram Lemma 1.5.1.
Theorem 7.7.5 (Existence, uniqueness and a priori estimate). Let f be a given
function in (L^2(Ω))^3, and assume that the compatibility condition (7.26) is
satisfied. Then, the weak problem (7.25) admits a unique solution pair (u, p) ∈
(V × Q), such that
\begin{align}
\|u\|_V &\le \frac{C_P}{2\mu}\,\|f\|_{(L^2(\Omega))^3} \tag{7.28a}\\
\|p\|_Q &\le \frac{2C_P}{\gamma}\,\|f\|_{(L^2(\Omega))^3}. \tag{7.28b}
\end{align}
Proof. To prove the uniqueness of the solution pair (u, p), we proceed by assuming
that there exist two distinct pairs that satisfy (7.25), namely, (u_1, p_1) and
(u_2, p_2), and then show that, necessarily, one must have u_1 = u_2 and p_1 = p_2.
Subtracting the two weak formulations corresponding to the two solution pairs
we obtain
\[
\begin{cases}
\displaystyle \int_\Omega 2\mu\,\varepsilon(u_1 - u_2) : \varepsilon(v) \, d\Omega - \int_\Omega (p_1 - p_2)\,\mathrm{div}\,v \, d\Omega = 0 & \forall v \in V \\[2mm]
\displaystyle -\int_\Omega q \,\mathrm{div}\,(u_1 - u_2) \, d\Omega = 0 & \forall q \in Q.
\end{cases}
\tag{7.29}
\]
Now, take v = u_1 − u_2 and q = −(p_1 − p_2) and sum (7.29)_1 and (7.29)_2, to get
\[
\int_\Omega 2\mu\,\varepsilon(u_1 - u_2) : \varepsilon(u_1 - u_2) \, d\Omega = 0,
\]
that is
\[
\|u_1 - u_2\|_V^2 = 0 \quad \Rightarrow \quad u_1 = u_2.
\]
This proves that the displacement is unique.
To prove now that the pressure is unique, set P := p_1 − p_2. Then, from (7.29)_1,
we get (recall that u_1 = u_2!)
\[
\int_\Omega P \,\mathrm{div}\,v \, d\Omega = 0 \qquad \forall v \in V.
\]
Using (7.26), we have
\[
0 = \int_\Omega P \,\mathrm{div}\,v^* \, d\Omega \ge \gamma\,\|P\|_Q\,\|v^*\|_V
\]
for at least one particular v^* ∈ V such that v^* ≠ 0. Thus, we get
\[
\|P\|_Q = 0 \quad \Rightarrow \quad p_1 = p_2.
\]
This proves uniqueness also of the pressure.

Let us now prove the a-priori estimate (7.28). To this purpose, take v = u
in (7.25)_1 and q = p in (7.25)_2, and then subtract the second equation from the
first, obtaining
\[
\int_\Omega 2\mu\,\varepsilon(u) : \varepsilon(u) \, d\Omega = \int_\Omega f \cdot u \, d\Omega.
\]
Using the Cauchy-Schwarz inequality and the Poincaré inequality to treat the
right-hand side, and the definition of equivalent norm in V to treat the left-hand
side, we immediately get (7.28a).
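Let us now come to the a-priori bound for the pressure. From (7.25)_1, we find
\[
\int_\Omega p \,\mathrm{div}\,v \, d\Omega = \int_\Omega 2\mu\,\varepsilon(u) : \varepsilon(v) \, d\Omega - \int_\Omega f \cdot v \, d\Omega \qquad \forall v \in V,
\]
from which, using the Cauchy-Schwarz inequality, the Poincaré inequality and (7.28a), we obtain
\begin{align*}
\int_\Omega p \,\mathrm{div}\,v \, d\Omega
 &\le 2\mu\,\|u\|_V\,\|v\|_V + \|f\|_{(L^2(\Omega))^3}\,\|v\|_{(L^2(\Omega))^3} \\
 &\le 2\mu\,\frac{C_P\,\|f\|_{(L^2(\Omega))^3}}{2\mu}\,\|v\|_V + \|f\|_{(L^2(\Omega))^3}\,C_P\,\|v\|_V \\
 &= 2C_P\,\|f\|_{(L^2(\Omega))^3}\,\|v\|_V \qquad \forall v \in V.
\end{align*}
Thus, using (7.26) with q = p, we have, for at least one particular v^* ∈ V such
that v^* ≠ 0,
\[
\gamma\,\|v^*\|_V\,\|p\|_Q \le \int_\Omega p \,\mathrm{div}\,v^* \, d\Omega \le 2C_P\,\|f\|_{(L^2(\Omega))^3}\,\|v^*\|_V,
\]
from which (7.28b) immediately follows.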

7.8 Energy formulation of incompressible elasticity

Thm. 5.3.2 expresses the relation between the weak solution of the Navier-Lamé
equations of linear elasticity (Principle of Virtual Work) and the minimizer of
the energy functional
\begin{align*}
J(v) &= \int_\Omega \frac{1}{2}\,\sigma(v) : \varepsilon(v) \, d\Omega - \int_\Omega f \cdot v \, d\Omega - \int_{\Gamma_N} g \cdot v \, d\Gamma \\
     &= \int_\Omega \mu\,\varepsilon(v) : \varepsilon(v) \, d\Omega + \frac{1}{2} \int_\Omega \lambda\,(\mathrm{div}\,v)^2 \, d\Omega - \int_\Omega f \cdot v \, d\Omega - \int_{\Gamma_N} g \cdot v \, d\Gamma
\end{align*}
(Principle of Minimum of the Potential Energy). Such a relation breaks down in
the incompressible limit, because of the presence in J(v) of the term
\[
\frac{1}{2} \int_\Omega \lambda\,(\mathrm{div}\,v)^2 \, d\Omega. \tag{7.30}
\]
As a matter of fact, in the incompressible regime, we have, simultaneously,
λ → +∞ and div v → 0, so that the integrand in (7.30) gives rise to an indeterminate
form. A possible way out of this dead end is suggested by the following result.
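Proposition 7.8.1. We have
\[
\frac{1}{2} \int_\Omega \lambda\,(\mathrm{div}\,v)^2 \, d\Omega = \sup_{q \in Q} \Phi_v(q) \tag{7.31}
\]
where
\[
\Phi_v(q) := -\frac{1}{2\lambda} \int_\Omega q^2 \, d\Omega - \int_\Omega q \,\mathrm{div}\,v \, d\Omega \qquad v \in V. \tag{7.32}
\]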

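Proof. Enforcing the stationarity condition on Φ_v
\[
\frac{\partial \Phi_v(q)}{\partial q} = \lim_{\delta \to 0} \frac{\Phi_v(q + \delta\eta) - \Phi_v(q)}{\delta} = 0 \qquad \forall \eta \in Q
\]
yields
\[
-\int_\Omega \left( \frac{1}{\lambda}\,q + \mathrm{div}\,v \right) \eta \, d\Omega = 0 \qquad \forall \eta \in Q,
\]
from which we conclude that the element
\[
z := \frac{1}{\lambda}\,q + \mathrm{div}\,v \in Q
\]
is orthogonal (with respect to the inner product in L^2(Ω)) to all elements of Q, i.e.
\[
z = 0 \ \text{a.e. in } \Omega \quad \Rightarrow \quad q = -\lambda\,\mathrm{div}\,v \equiv q^* \ \text{a.e. in } \Omega.
\]
To check whether q^* is a minimizer or a maximizer of Φ_v, we have to compute the
second partial derivative of Φ_v with respect to q, obtaining
\[
\frac{\partial^2 \Phi_v(q)}{\partial q^2} = -\frac{1}{\lambda} \int_\Omega \eta^2 \, d\Omega < 0 \qquad \forall \eta \in Q, \ \eta \ne 0.
\]
This shows that Φ_v(q) is maximized at q = q^*, i.e.
\begin{align*}
\sup_{q \in Q} \Phi_v(q) = \Phi_v(q^*) &= -\frac{1}{2\lambda} \int_\Omega (-\lambda\,\mathrm{div}\,v)^2 \, d\Omega - \int_\Omega (-\lambda\,\mathrm{div}\,v)\,\mathrm{div}\,v \, d\Omega \\
 &= \frac{1}{2} \int_\Omega \lambda\,(\mathrm{div}\,v)^2 \, d\Omega
\end{align*}
which is (7.31).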
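
The maximization in the proof can be verified pointwise, since the integrand of
(7.32) is a concave parabola in q for each fixed value of div v. The following
sketch uses assumed sample values for λ and for d = div v at a point, and a brute
force search over a grid of q values; the names phi, q_grid and q_star are
illustrative.

```python
import numpy as np

def phi(q, d, lam):
    """Pointwise integrand of (7.32): -q^2 / (2*lam) - q*d, with d = div v."""
    return -q**2 / (2.0 * lam) - q * d

lam, d = 10.0, 0.3                            # assumed sample values
q_grid = np.linspace(-20.0, 20.0, 400001)     # grid step 1e-4
q_best = q_grid[np.argmax(phi(q_grid, d, lam))]

q_star = -lam * d                             # stationary point found in the proof
print("grid maximizer      :", q_best)
print("q* = -lam * div v   :", q_star)
print("phi(q*)             :", phi(q_star, d, lam))
print("lam * (div v)^2 / 2 :", 0.5 * lam * d**2)
```

The grid maximizer agrees with q^* to within the grid step, and the maximum
value equals the integrand of (7.30).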
Using Prop. 7.8.1, we can prove the following result.
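Theorem 7.8.2 (Saddle-point formulation of (7.22)). Let V and Q be defined as
in (5.20) and (7.21), respectively. Then, the saddle-point problem:
find the pair (u, p) ∈ (V × Q) such that
\[
\Pi_\lambda(u,q) \le \Pi_\lambda(u,p) \le \Pi_\lambda(v,p) \qquad \forall (v,q) \in (V \times Q) \tag{7.33}
\]
where
\[
\begin{aligned}
\Pi_\lambda(v,q) = {} & \int_\Omega \mu\,\varepsilon(v) : \varepsilon(v) \, d\Omega - \frac{1}{2\lambda} \int_\Omega q^2 \, d\Omega \\
 & - \int_\Omega q \,\mathrm{div}\,v \, d\Omega - \int_\Omega f \cdot v \, d\Omega - \int_{\Gamma_N} g \cdot v \, d\Gamma
\end{aligned}
\tag{7.34}
\]
is completely equivalent to the weak problem (7.22).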

Remark 7.8.3. Thm. 7.8.2 is the counterpart of Thm. 5.3.2 in the incompressible
limit. For a given λ ∈ (0, +∞], the functional Π_λ(v, q) : V × Q → R is the
Lagrangian associated with the elasticity problem. Notice that, unlike in the
variational principle of Thm. 5.3.2 (search of the minimizer of the total potential energy
J(v) stored in the elastic body), here we are looking for the saddle-point of the
Lagrangian Π_λ(v, q). This makes the nature of problem (7.33) very different from
that of (5.23). The main disadvantages of the saddle-point
formulation are the introduction of an auxiliary unknown (the pressure parameter),
which makes the method more computationally expensive, and, above all, the
need to satisfy the inf-sup condition (7.26) also on the discrete level (this issue
will be thoroughly addressed in Sect. 7.9). The main advantage provided by the
use of the saddle-point formulation compared to the minimum energy principle
(or the B-bar method) is that in the incompressible limit
Π_λ(v, q) does not break down, unlike J(v); rather, it tends to the limiting value
\[
\Pi_{+\infty}(v,q) = \int_\Omega \mu\,\varepsilon(v) : \varepsilon(v) \, d\Omega - \int_\Omega q \,\mathrm{div}\,v \, d\Omega - \int_\Omega f \cdot v \, d\Omega - \int_{\Gamma_N} g \cdot v \, d\Gamma.
\]
In this sense, the saddle-point formulation is robust with respect to the
compressibility parameter λ over the whole range λ ∈ (0, +∞].

7.9 GFE approximation of the two-field model

Throughout this section, we assume that the computational domain Ω representing
the elastic body is a polygon in R^2 and we set λ = +∞ (exactly incompressible
case). The abstract space for the displacement is V = (H^1_{0,Γ_D}(Ω))^2 while that for
the pressure is Q = L^2(Ω). To address the numerical approximation of (7.25)
using the GFEM, we introduce a family {T_h}_{h>0} of triangulations made of
triangular elements K and, for a given member of the family, we consider two finite
dimensional subspaces of piecewise polynomial functions
\[
V_h \subset V, \qquad Q_h \subset Q.
\]

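Then, the GFE approximation of (7.25) is:
find u_h ∈ V_h and p_h ∈ Q_h such that, for all v_h ∈ V_h and for all q_h ∈ Q_h we have:
\[
\begin{cases}
\displaystyle \int_\Omega 2\mu\,\varepsilon(u_h) : \varepsilon(v_h) \, d\Omega - \int_\Omega p_h \,\mathrm{div}\,v_h \, d\Omega = \int_\Omega f \cdot v_h \, d\Omega \\[2mm]
\displaystyle -\int_\Omega q_h \,\mathrm{div}\,u_h \, d\Omega = 0.
\end{cases}
\tag{7.35}
\]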

To characterize the linear system of algebraic equations emanating from (7.35),
we introduce two sets of polynomial basis functions, φ_j, j = 1, . . . , N_h, and ψ_k,
k = 1, . . . , M_h, such that
\[
V_h = \mathrm{span}\{\varphi_j\}_{j=1}^{N_h} \times \mathrm{span}\{\varphi_j\}_{j=1}^{N_h}, \qquad
Q_h = \mathrm{span}\{\psi_k\}_{k=1}^{M_h},
\]
in such a way that dim V_h = 2N_h, dim Q_h = M_h and the approximations u_h ∈ V_h
and p_h ∈ Q_h to u ∈ V and p ∈ Q, respectively, can be written as
\[
u_h(x) = \begin{bmatrix} \sum_{j=1}^{N_h} u_j\,\varphi_j(x) \\[1mm] \sum_{j=1}^{N_h} v_j\,\varphi_j(x) \end{bmatrix} \in V_h,
\qquad
p_h(x) = \sum_{j=1}^{M_h} p_j\,\psi_j(x) \in Q_h,
\tag{7.36}
\]
where {u_1, u_2, . . . , u_{N_h}, v_1, v_2, . . . , v_{N_h}}^T and {p_j}_{j=1}^{M_h} are the sets of dofs for u_h
and p_h, respectively. Substituting the expressions (7.36) into (7.35) and taking v_h =
[φ_i, 0]^T and v_h = [0, φ_i]^T, respectively, for i = 1, . . . , N_h, in (7.35)_1, and q_h = ψ_i,
i = 1, . . . , M_h in (7.35)_2, yields the following linear algebraic system
\[
\mathcal{K}\,\mathcal{U} = \mathcal{F} \tag{7.37}
\]
where
\[
\mathcal{K} = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix}, \qquad
\mathcal{U} = \begin{bmatrix} u \\ p \end{bmatrix}, \qquad
\mathcal{F} = \begin{bmatrix} F \\ 0 \end{bmatrix}. \tag{7.38}
\]
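The matrices A and B have the following block-form structure
\[
A = \begin{bmatrix} A^{uu} & A^{uv} \\ A^{vu} & A^{vv} \end{bmatrix}, \qquad
B = \begin{bmatrix} B^{qu} & B^{qv} \end{bmatrix}
\]
where A^{uu}, A^{uv}, A^{vu} and A^{vv} are square matrices each of size N_h, while B^{qu} and
B^{qv} are rectangular matrices each of size M_h × N_h. The entries of each sub-block
matrix in A are given by
\begin{align*}
A^{uu}_{ij} &= \int_\Omega 2\mu\,\varepsilon([\varphi_j, 0]^T) : \varepsilon([\varphi_i, 0]^T) \, d\Omega & i, j &= 1, \ldots, N_h \\
A^{uv}_{ij} &= \int_\Omega 2\mu\,\varepsilon([0, \varphi_j]^T) : \varepsilon([\varphi_i, 0]^T) \, d\Omega & i, j &= 1, \ldots, N_h \\
A^{vu}_{ij} &= \int_\Omega 2\mu\,\varepsilon([\varphi_j, 0]^T) : \varepsilon([0, \varphi_i]^T) \, d\Omega & i, j &= 1, \ldots, N_h \\
A^{vv}_{ij} &= \int_\Omega 2\mu\,\varepsilon([0, \varphi_j]^T) : \varepsilon([0, \varphi_i]^T) \, d\Omega & i, j &= 1, \ldots, N_h
\end{align*}
while those in matrix B are given by
\begin{align*}
B^{qu}_{ij} &= -\int_\Omega \psi_i\,\frac{\partial \varphi_j}{\partial x} \, d\Omega & i &= 1, \ldots, M_h, \ j = 1, \ldots, N_h \\
B^{qv}_{ij} &= -\int_\Omega \psi_i\,\frac{\partial \varphi_j}{\partial y} \, d\Omega & i &= 1, \ldots, M_h, \ j = 1, \ldots, N_h.
\end{align*}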
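A similar structure represents the load vector F
\[
F = \begin{bmatrix} F^u \\ F^v \end{bmatrix},
\]
where the entries of each sub-block vector are given by
\begin{align*}
F^u_i &= \int_\Omega f \cdot [\varphi_i, 0]^T \, d\Omega + \int_{\Gamma_N} g \cdot [\varphi_i, 0]^T \, d\Gamma \\
      &= \int_\Omega f_x\,\varphi_i \, d\Omega + \int_{\Gamma_N} g_x\,\varphi_i \, d\Gamma & i &= 1, \ldots, N_h \\
F^v_i &= \int_\Omega f \cdot [0, \varphi_i]^T \, d\Omega + \int_{\Gamma_N} g \cdot [0, \varphi_i]^T \, d\Gamma \\
      &= \int_\Omega f_y\,\varphi_i \, d\Omega + \int_{\Gamma_N} g_y\,\varphi_i \, d\Gamma & i &= 1, \ldots, N_h.
\end{align*}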

Matrix A is symmetric and positive definite, and has 2Nh rows and 2Nh columns,
while matrix B is rectangular and has Mh rows and 2Nh columns. The right-hand
side F is a vector with 2Nh rows. The construction of these matrices and vector
proceeds in the same fashion as described in Sect. 6.2.1.
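
Once the sub-blocks are available, the block system can be put together in a
few lines. The sketch below is an illustration only: it does not perform any
finite element assembly, the function name assemble_saddle_point is hypothetical,
and the sub-blocks are filled with random data having the sizes and the
symmetry stated above.

```python
import numpy as np

def assemble_saddle_point(Auu, Auv, Avu, Avv, Bqu, Bqv, Fu, Fv):
    """Build the block matrix [A B^T; B 0] and the right-hand side [F; 0]."""
    A = np.block([[Auu, Auv], [Avu, Avv]])            # 2Nh x 2Nh
    B = np.hstack([Bqu, Bqv])                          # Mh x 2Nh
    Mh = B.shape[0]
    K = np.block([[A, B.T], [B, np.zeros((Mh, Mh))]])
    rhs = np.concatenate([Fu, Fv, np.zeros(Mh)])
    return K, rhs

# Illustrative sizes and random data (not an actual FE discretization).
Nh, Mh = 4, 3
rng = np.random.default_rng(2)
G = rng.standard_normal((2 * Nh, 2 * Nh))
A_full = G @ G.T + 2 * Nh * np.eye(2 * Nh)             # symmetric positive definite
Bqu = rng.standard_normal((Mh, Nh))
Bqv = rng.standard_normal((Mh, Nh))
Fu, Fv = rng.standard_normal(Nh), rng.standard_normal(Nh)

K, rhs = assemble_saddle_point(A_full[:Nh, :Nh], A_full[:Nh, Nh:],
                               A_full[Nh:, :Nh], A_full[Nh:, Nh:],
                               Bqu, Bqv, Fu, Fv)
sol = np.linalg.solve(K, rhs)
u_dofs, v_dofs, p_dofs = sol[:Nh], sol[Nh:2 * Nh], sol[2 * Nh:]
print("size of the block system:", K.shape)
print("symmetric:", np.allclose(K, K.T))
```

The unknown vector is ordered as in (7.36): first the x-components of the
displacement, then the y-components, then the pressure dofs.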

7.10 Unique solvability and error analysis

The linear algebraic system (7.37) emanating from the two-field formulation of
linear elasticity has a remarkably different structure compared to the linear
algebraic system (6.9) emanating from the classical displacement-based approach.
The difference is twofold. First, the stiffness matrix K of (6.9) is symmetric and positive
definite, while the corresponding (generalized) stiffness matrix 𝒦 of (7.37) is symmetric
but indefinite, because of the zero block in the (2,2) position. Second, the rank of the
rectangular matrix B is not determined until we specify the degree of the
polynomial basis functions of Vh and Qh. The consequences of these two
differences are:
• system (6.9) admits a unique solution that can be efficiently computed by
using one of the methods (direct/iterative) illustrated in
Sect. E;
• system (7.37) is not guaranteed to be uniquely solvable and, even when it is,
its solution is, in general, more difficult than that of (6.9) because of
the larger size of 𝒦 compared to K.
In this sense, the displacement-based formulation is “better” than the
displacement-pressure formulation. However, we have already seen that the latter
approach is indispensable whenever a robust treatment of the incompressibility
constraint is in order, so that its adoption is beyond question in such a case.
In conclusion, we need now to characterize the choice of the finite element
spaces Vh and Qh in such a way that system (7.37) admits a unique solution. To
this purpose, we need a discrete counterpart of the inf-sup condition (7.26).
Proposition 7.10.1 (Discrete inf-sup condition). The two spaces Vh and Qh are
said to be compatible (or, LBB compatible) if there exists a positive constant γ^*
independent of h, such that for every q_h ∈ Q_h, there exists v_h ∈ V_h such that
\[
\int_\Omega q_h \,\mathrm{div}\,v_h \, d\Omega \ge \gamma^*\,\|q_h\|_Q\,\|v_h\|_V. \tag{7.39}
\]
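
For a given pair of discrete spaces, the best constant in (7.39) can be estimated
numerically from the matrices of the discretization. The sketch below relies on
a standard linear algebra fact, not stated in these notes: if A_V and M_Q are the
symmetric positive definite matrices representing the norms of V and Q on the
chosen bases, the discrete inf-sup constant is the square root of the smallest
eigenvalue of the generalized problem (B A_V^{-1} B^T) q = µ M_Q q. The function
name inf_sup_constant and the small test matrices are assumptions made for
illustration.

```python
import numpy as np

def inf_sup_constant(B, A_V, M_Q):
    """Square root of the smallest eigenvalue of (B A_V^{-1} B^T) q = mu M_Q q."""
    S = B @ np.linalg.solve(A_V, B.T)          # pressure Schur complement
    L = np.linalg.cholesky(M_Q)                # M_Q = L L^T
    Linv_S = np.linalg.solve(L, S)             # L^{-1} S
    T = np.linalg.solve(L, Linv_S.T).T         # L^{-1} S L^{-T}
    eigs = np.linalg.eigvalsh(0.5 * (T + T.T)) # ascending eigenvalues
    return float(np.sqrt(max(eigs[0], 0.0)))

A_V = np.eye(3)                                 # illustrative norm matrices
M_Q = np.eye(2)
B_good = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])            # full row rank
B_bad = np.array([[1.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0]])             # linearly dependent rows

print("constant for B_good:", inf_sup_constant(B_good, A_V, M_Q))
print("constant for B_bad :", inf_sup_constant(B_bad, A_V, M_Q))
```

In practice one would monitor this quantity on a sequence of refined meshes: a
value that stays bounded away from zero as h decreases is consistent with (7.39),
while a value that is zero or decays with h signals an incompatible pair.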

The discrete inf-sup condition has an important algebraic interpretation.

Theorem 7.10.2 (Rank condition). Assume that (7.39) is satisfied. Then matrix B
has full rank given by
\[
\mathrm{rank}(B) = \dim Q_h = M_h. \tag{7.40}
\]
By Def. B.3.2, a necessary condition for (7.40) to hold is that
\[
\underbrace{\dim Q_h}_{M_h} \le \underbrace{\dim V_h}_{2N_h}. \tag{7.41}
\]

Remark 7.10.3 (Equal order interpolation). Since Thm. 7.10.2 tells us that sat-
isfying the discrete inf-sup condition is equivalent to stating that matrix B has
full rank, the discrete inf-sup condition is often referred to as the rank condition.
From (7.41), it turns out that the rank condition introduces the constraint for the
approximation space of the displacement field of being richer than that of the
pressure field. A consequence of (7.41) is that the use of equal-order polynomial
spaces for both Vh and Qh is in general not allowed unless suitable stabilization
techniques are adopted.
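
The rank condition (7.40) is easy to test on an assembled matrix B. The
following sketch uses random matrices as stand-ins for B (they are not produced
by any finite element pair) and a hypothetical helper has_full_row_rank; it
shows that when M_h > 2N_h, i.e. when (7.41) is violated, the rank condition
cannot hold.

```python
import numpy as np

def has_full_row_rank(B, tol=1e-10):
    """True if rank(B) equals the number of rows of B, i.e. (7.40) holds."""
    return int(np.linalg.matrix_rank(B, tol=tol)) == B.shape[0]

rng = np.random.default_rng(3)
B_ok = rng.standard_normal((3, 8))        # Mh = 3 <= 2Nh = 8
B_too_rich = rng.standard_normal((9, 8))  # Mh = 9 >  2Nh = 8, violates (7.41)

print("Mh = 3, 2Nh = 8 -> full row rank:", has_full_row_rank(B_ok))
print("Mh = 9, 2Nh = 8 -> full row rank:", has_full_row_rank(B_too_rich))
```

Keep in mind that (7.41) is only necessary: a matrix B with M_h ≤ 2N_h may still
be rank deficient, so the rank has to be checked for the specific FE pair.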

Having introduced the discrete version of the inf-sup condition, we can pro-
ceed as in Thm. 7.7.5, to prove the following result.
Theorem 7.10.4 (Unique solvability of the discrete Herrmann formulation). Assume
that Vh and Qh satisfy the discrete inf-sup condition (7.39). Then,
problem (7.35) admits a unique solution pair (u_h, p_h) ∈ (V_h × Q_h), such that
\begin{align}
\|u_h\|_V &\le \frac{C_P}{2\mu}\,\|f\|_{(L^2(\Omega))^3} \tag{7.42a}\\
\|p_h\|_Q &\le \frac{2C_P}{\gamma^*}\,\|f\|_{(L^2(\Omega))^3}. \tag{7.42b}
\end{align}
We can also prove the following result, which represents the extension of Céa’s
Lemma to the two-field formulation.
Theorem 7.10.5 (Céa’s Lemma for the Herrmann formulation). Assume that Vh
and Qh satisfy the discrete inf-sup condition (7.39). Then, there exists a positive
constant C independent of h such that
\[
\|u - u_h\|_V + \|p - p_h\|_Q \le C \left( \inf_{v_h \in V_h} \|u - v_h\|_V + \inf_{q_h \in Q_h} \|p - q_h\|_Q \right) \tag{7.43}
\]
where (u, p) and (u_h, p_h) are the solution pairs of problems (7.25) and (7.35),
respectively.
The previous result tells us that the discretization error (left-hand side) is
bounded by the approximation error (right-hand side). Let:
\[
\begin{aligned}
U_h &:= \inf_{v_h \in V_h} \|v - v_h\|_V \qquad \forall v \in V \\
P_h &:= \inf_{q_h \in Q_h} \|q - q_h\|_Q \qquad \forall q \in Q.
\end{aligned}
\tag{7.44}
\]

Theorem 7.10.6 (Convergence). Assume that Vh and Qh satisfy the discrete
inf-sup condition (7.39) and that the property of “good approximation” holds, i.e.
\[
\lim_{h \to 0} U_h = 0, \qquad \lim_{h \to 0} P_h = 0. \tag{7.45}
\]
Then, the approximate pair (u_h, p_h) converges to the exact solution (u, p) of (7.25),
i.e.
\[
\lim_{h \to 0} \left( \|u - u_h\|_V + \|p - p_h\|_Q \right) = 0. \tag{7.46}
\]
Moreover, if
\[
U_h = O(h^p), \qquad P_h = O(h^p) \qquad \text{as } h \to 0 \tag{7.47}
\]
for a certain p ≥ 1, then
\[
\|u - u_h\|_V + \|p - p_h\|_Q \le C h^p \tag{7.48}
\]
and the convergence of the GFE approximation is said to be optimal.

Remark 7.10.7 (The order of convergence). In Thm. 7.10.6 we have assumed both
approximation errors U_h and P_h to be infinitesimals of the same order p with
respect to the discretization parameter h. This property implies the optimality of
the error estimate for the FE approximation of the Herrmann formulation. Let us
assume now that the finite element spaces for displacement and pressure are such
that:
\[
U_h = O(h^{p_u}), \qquad P_h = O(h^{p_p}) \qquad \text{as } h \to 0 \tag{7.49}
\]
for certain p_u and p_p such that p_u, p_p ≥ 1 with p_u ≠ p_p. According to Thm. 7.10.2
and to Rem. 7.10.3, it turns out that p_p < p_u, so that, using (7.49) in the
generalized Céa’s Lemma (7.43) we get the following error estimate
\[
\|u - u_h\|_V + \|p - p_h\|_Q \le C h^{p_p}. \tag{7.50}
\]
Comparing (7.50) with (7.48) we conclude that if the finite element space for the
pressure is too poor with respect to that for the displacement, the convergence of
the method is sub-optimal and the accuracy of the computation of the displacement
may be (negatively) affected by the accuracy of the computation of the pressure.
This undesired phenomenon is the effect of the coupling between the two
variables through the incompressibility condition.
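
In numerical experiments the order of convergence appearing in (7.48) or (7.50)
is usually estimated from the errors measured on two meshes. The sketch below
assumes an error behaving like C h^p and uses synthetic error values generated
from that formula; the function name observed_order and all numbers are
illustrative and are not results of an actual computation.

```python
import math

def observed_order(h_coarse, e_coarse, h_fine, e_fine):
    """Estimate p from two (h, error) pairs, assuming error ~ C * h**p."""
    return math.log(e_coarse / e_fine) / math.log(h_coarse / h_fine)

# Synthetic errors generated with C = 3 and p = 2 (illustrative values only).
C, p_exact = 3.0, 2.0
h1, h2 = 0.1, 0.05
e1, e2 = C * h1**p_exact, C * h2**p_exact

print("estimated order:", observed_order(h1, e1, h2, e2))
```

If the measured order for the sum of the displacement and pressure errors
matches the smaller exponent p_p rather than p_u, the behaviour described in the
remark above is being observed.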

7.11 The importance of satisfying the discrete inf-sup condition: spurious pressure modes

In this section, we motivate the importance of satisfying the discrete
inf-sup condition in the FE approximation of the two-field model (7.25).

Definition 7.11.1 (Spurious pressure modes). A function p^*_h ∈ Q_h, with p^*_h ≠ 0, is
called a spurious pressure mode if
\[
\int_\Omega p^*_h \,\mathrm{div}\,v_h \, d\Omega = 0 \qquad \forall v_h \in V_h. \tag{7.51}
\]
Should such a function p^*_h exist, then this would immediately imply that if p_h is a
solution of (7.35), then also p_h + p^*_h is a solution. In other words, there would be
no way of filtering out (from the correct solution) the presence of a parasitic (i.e.,
unphysical) solution component that adds to the right one. This is the reason for
calling p^*_h a “spurious pressure mode”.

Theorem 7.11.2. Let Vh and Qh be such that the discrete inf-sup condition (7.39)
is satisfied. Then, every p^*_h ∈ Q_h satisfying (7.51) is such that p^*_h = 0, i.e., no
spurious mode can arise in the solution of (7.35).

Proof. We proceed by contradiction, and assume that a function p^*_h, with p^*_h ≠ 0,
exists such that (7.51) holds. Then, taking in (7.39) q_h = p^*_h, we have
\[
0 = \int_\Omega p^*_h \,\mathrm{div}\,v_h \, d\Omega \ge \gamma^*\,\|p^*_h\|_Q\,\|v_h\|_V \qquad \text{with } v_h \ne 0
\]
which implies \|p^*_h\|_Q = 0 and thus, by definition of norm, p^*_h = 0, a contradiction.

Remark 7.11.3. The previous result tells us that satisfying the discrete inf-sup con-
dition is an automatic guarantee of avoiding the occurrence of spurious pressure
modes in the numerical solution of incompressible elasticity problems.
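
In algebraic terms, and using the matrix B introduced in Sect. 7.9, condition
(7.51) says that the vector of dofs of a spurious mode belongs to the null space
of B^T. The following sketch computes a basis of that null space through the
singular value decomposition; the function name spurious_pressure_modes and the
small matrix B are hypothetical and chosen so that the third row is the sum of
the first two.

```python
import numpy as np

def spurious_pressure_modes(B, tol=1e-10):
    """Columns spanning {p : B^T p = 0}, i.e. the left null space of B."""
    U, s, _ = np.linalg.svd(B, full_matrices=True)
    rank = int(np.sum(s > tol))
    return U[:, rank:]

# Illustrative matrix with Mh = 3 rows; row 3 = row 1 + row 2.
B = np.array([[1.0, -1.0,  0.0, 0.0],
              [0.0,  1.0, -1.0, 0.0],
              [1.0,  0.0, -1.0, 0.0]])

modes = spurious_pressure_modes(B)
print("number of spurious modes:", modes.shape[1])
print("B^T p* = 0:", np.allclose(B.T @ modes, 0.0))
```

An empty result (zero columns) corresponds to the situation guaranteed by
Thm. 7.11.2.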

7.12 Finite elements for incompressible elasticity

In this section, we provide a short overview of the FE pairs most commonly em-
ployed for the approximation spaces Vh and Qh . The two-dimensional case is
considered here, on a triangular partition Th of the polygon Ω into triangles K. In
the characterization of the finite element spaces for the displacement, enforcement
of the Dirichlet boundary conditions is understood for notational clarity.
Two main classes of finite elements are available, depending on the choice for Qh :

• discontinuous pressures on Th ;

• continuous pressures on Th .
These two different approaches are allowed because Qh is a subspace of Q = L^2(Ω)
and therefore no continuity requirement between two neighbouring elements K1,
K2 needs to be enforced a-priori in the construction of the pressure space. The same
possibility is, instead, not allowed in the construction of Vh, because the latter
is a subspace of V ⊂ (H^1(Ω))^2 and therefore the discrete functions of Vh must be
continuous across neighbouring elements.
In the error estimates that are discussed in the following, C denotes a positive
constant, independent of h, not taking in general the same value at each
occurrence, and the exact solutions u and p of (7.22) are assumed to be sufficiently
regular for the norms appearing in the right-hand side of each considered
estimate to make sense. In all figures of this section, a black bullet identifies a dof for
a scalar-valued function, while a black square identifies a dof for a vector-valued
function.

7.12.1 Discontinuous pressures

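The basic FE pair is the so-called P1 − P0 element, where
\[
V_h = \left\{ v_h \in (C^0(\Omega))^2 \;|\; v_h|_K \in (P_1(K))^2 \quad \forall K \in \mathcal{T}_h \right\}
\]
and
\[
Q_h = \left\{ q_h \in L^2(\Omega) \;|\; q_h|_K \in P_0(K) \quad \forall K \in \mathcal{T}_h \right\}.
\]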

The dofs for Vh and Qh over each element K ∈ Th are shown in Fig. 7.3.

Figure 7.3: Dofs for the P1 − P0 FE pair over K. Left: dofs for the displacement.
Right: dof for the pressure.

The P1 − P0 FE pair does not satisfy the discrete inf-sup condition and is
typically affected by severe locking phenomena. To see this, let us consider the
incompressibility constraint
\[
\int_\Omega q_h \,\mathrm{div}\,u_h \, d\Omega = 0 \qquad \forall q_h \in Q_h.
\]
From the definition of the space Qh, this condition amounts to requiring
\[
\int_K \mathrm{div}\,u_h \, dK = 0 \qquad \forall K \in \mathcal{T}_h.
\]
Since u_h is linear over K, div u_h is constant over K and the above equation yields
the strong condition
\[
\mathrm{div}\,u_h|_K = 0 \qquad \forall K \in \mathcal{T}_h.
\]
Going back to Ex. 7.3.1 (cf. Fig. 7.3.1), it is immediate to see that this latter
condition implies that each triangle has to deform maintaining a constant area, so
that u_h = 0 in Ω and the structure locks completely.
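
A simple count of unknowns and constraints makes the locking of the P1 − P0 pair
plausible. The sketch below assumes a particular setting that is not taken from
Ex. 7.3.1: the unit square divided into n × n equal squares, each split into two
triangles, with u_h = 0 on the whole boundary. One constraint is subtracted
because, with this boundary condition, the sum over all triangles of the
integrals of div u_h vanishes automatically by the divergence theorem. The
function name p1_p0_counts is illustrative.

```python
def p1_p0_counts(n):
    """Displacement dofs and independent constraints for P1-P0.

    Assumed mesh: unit square, n x n squares each split into two triangles,
    homogeneous Dirichlet condition on the whole boundary.
    """
    interior_nodes = (n - 1) ** 2
    displacement_dofs = 2 * interior_nodes       # two components per interior node
    triangles = 2 * n * n                        # one pressure dof per triangle
    constraints = triangles - 1                  # one constraint is redundant
    return displacement_dofs, constraints

for n in (2, 4, 8, 16):
    dofs, constraints = p1_p0_counts(n)
    print(f"n = {n:2d}   displacement dofs = {dofs:4d}   constraints = {constraints:4d}")
```

On this family of meshes the number of constraints always exceeds the number
of displacement dofs, so the necessary condition (7.41) fails.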
A variant of the P1 − P0 element that passes the discrete inf-sup condition
(thus avoiding the locking problem) is the so-called (cross − grid − P1 ) − P0 FE
pair, whose dofs are depicted in Fig. 7.4.

Figure 7.4: Dofs for the (cross − grid − P1 ) − P0 FE pair over K. Left: dofs for
the displacement. Right: dof for the pressure.

The approximation space for the displacement consists of functions that are
linear over the sub-elements K1, K2 and K3, and are continuous across interelement
edges. In this case, the incompressibility constraint becomes
\[
\sum_{i=1}^{3} \mathrm{div}\,u_h|_{K_i}\,|K_i| = 0 \qquad \forall K \in \mathcal{T}_h,
\]
which does not admit as unique solution the displacement field u_h = 0, so that
the solid body is free to deform, maintaining the area of K constant as required.
The (cross − grid − P1) − P0 FE pair satisfies the discrete inf-sup condition and
the following optimal error estimate can be proved to hold
\[
\|u - u_h\|_V + \|p - p_h\|_Q \le C h \left( \|u\|_{(H^2(\Omega))^2} + \|p\|_{H^1(\Omega)} \right).
\]

Passing to higher-order elements, we have the P2 − P1^disc FE pair, where
\[
V_h = \left\{ v_h \in (C^0(\Omega))^2 \;|\; v_h|_K \in (P_2(K))^2 \quad \forall K \in \mathcal{T}_h \right\}
\]
and
\[
Q_h = \left\{ q_h \in L^2(\Omega) \;|\; q_h|_K \in P_1(K) \quad \forall K \in \mathcal{T}_h \right\}.
\]
The dofs for Vh and Qh over each element K ∈ Th are shown in Fig. 7.5.

Figure 7.5: Dofs for the P2 − P1^disc FE pair over K. Left: dofs for the displacement.
Right: dof for the pressure.

The P2 − P1^disc FE pair does not satisfy the discrete inf-sup condition. To
overcome this problem, it is enough to enrich the space of displacements by adding
an extra (vectorial) degree of freedom at the center of gravity of the element K, as
depicted in Fig. 7.6.

Figure 7.6: Dofs for the (P2 ⊕ bK) − P1^disc FE pair over K. Left: dofs for the
displacement. Right: dof for the pressure.

The basis function bK that is added to those of the space (P2(K))^2 is a cubic
polynomial called bubble function, because it vanishes along the whole boundary ∂K
of the element and is identically equal to zero outside K. The locality of bK can
be advantageously exploited in the computer implementation of the scheme, by
eliminating each interior dof corresponding to bK in favor of those located on the
boundary of K. Such a procedure is called static condensation.
The (P2 ⊕ bK) − P1^disc FE pair was proposed by Crouzeix and Raviart in 1973.
This element satisfies the discrete inf-sup condition and the following optimal
error estimate can be proved
\[
\|u - u_h\|_V + \|p - p_h\|_Q \le C h^2 \left( \|u\|_{(H^3(\Omega))^2} + \|p\|_{H^2(\Omega)} \right).
\]

Remark 7.12.1 (Enriching the displacement FE space). The above presentation
shows that the approach used to transform a FE pair that does not satisfy the
discrete inf-sup condition into a pair that is LBB-stable consists in enriching the
space Vh with respect to the space Qh. In both cases (linear and quadratic elements
for the displacement), the remedy consists in adding an internal degree of freedom
to u_h which has the mechanical role of introducing an additional “flexibility” to the
geometrically discretized structure, thus allowing it to deform maintaining the
volume (area, in the 2D case) constant. This strategy can be extended to
higher-order elements by introducing the (Pk ⊕ B_{k+1}) − P_{k−1}^disc FE pair, for k ≥ 2, where
\[
B_{k+1}(K) := \left\{ v \in P_{k+1}(K) \;|\; v = p_{k-2}\,b_K, \ p_{k-2} \in P_{k-2}(K) \right\} \qquad k \ge 2 \tag{7.52}
\]
is the space of bubble functions of degree less than or equal to k − 2 + 3 = k + 1
on K. The (P2 ⊕ bK) − P1^disc pair is the lowest order element of the above family,
corresponding to setting k = 2. The (Pk ⊕ B_{k+1}) − P_{k−1}^disc FE pair satisfies the discrete
inf-sup condition and the following optimal error estimate can be proved
\[
\|u - u_h\|_V + \|p - p_h\|_Q \le C h^k \left( \|u\|_{(H^{k+1}(\Omega))^2} + \|p\|_{H^k(\Omega)} \right) \qquad k \ge 2.
\]

7.12.2 Continuous pressures

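When using continuous pressures, the natural choice is the Pk − Pk^cont FE pair,
k ≥ 1, where
\[
V_h = \left\{ v_h \in (C^0(\Omega))^2 \;|\; v_h|_K \in (P_k(K))^2 \quad \forall K \in \mathcal{T}_h \right\}
\]
and
\[
Q_h = \left\{ q_h \in C^0(\Omega) \;|\; q_h|_K \in P_k(K) \quad \forall K \in \mathcal{T}_h \right\}.
\]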

The above choice for Vh and Qh goes under the name equal-order interpolation
and has, in principle, the considerable advantage of allowing the use of the same
shape functions for displacement and pressure in coding. Unfortunately, it leads
to elements that do not satisfy the discrete inf-sup condition.
7.12. FINITE ELEMENTS FOR INCOMPRESSIBLE ELASTICITY 143

To construct FE pairs with continuous pressures that are LBB-stable, we need

to use the Pk − Pcont
r FE pair, with k ≥ 2 and 1 ≤ r ≤ k − 1, where

Vh = vh ∈ (C0 (Ω))2 | vh |K ∈ (Pk (K))2 ∀K ∈ Th

k≥2

and
Qh = qh ∈ C0 (Ω) | qh |K ∈ Pr (K) ∀K ∈ Th

1 ≤ r ≤ k − 1.
The lowest order element of the family, corresponding to k = 2 and r = 1, is well-
known as the Taylor-Hood element, and its dofs are depicted in Fig. 7.7.

Figure 7.7: Dofs for the $\mathbb{P}_2 - \mathbb{P}_1^{cont}$ FE pair over $K$. Left: dofs for the displacement. Right: dofs for the pressure.

Notice that in the case of the Taylor-Hood element the dofs for the pressure are located at the vertices of $K$, so that they are single-valued over the whole triangulation, while in the case of the $\mathbb{P}_2 - \mathbb{P}_1^{disc}$ FE pair of Fig. 7.5 the dofs for the pressure are not single-valued at each vertex of $\mathcal{T}_h$ (see Fig. 7.8).

Figure 7.8: Dofs for the pressure on $\mathcal{T}_h$. Left: $\mathbb{P}_2 - \mathbb{P}_1^{cont}$; right: $\mathbb{P}_2 - \mathbb{P}_1^{disc}$.

The $\mathbb{P}_k - \mathbb{P}_r^{cont}$ FE pair satisfies the discrete inf-sup condition and, for $r = k - 1$, the following optimal error estimate can be proved

$$\|u - u_h\|_V + \|p - p_h\|_Q \le C h^k \left( \|u\|_{(H^{k+1}(\Omega))^2} + \|p\|_{H^k(\Omega)} \right), \qquad k \ge 2, \ r = k - 1.$$


Remark 7.12.2 (How to enrich the $\mathbb{P}_k - \mathbb{P}_k$ FE pair). The enrichment strategy discussed in the case of discontinuous pressure spaces can also be used in the case where $Q_h$ is made of continuous functions over $\Omega$. To see this, we consider the lowest order case $k = 1$, and introduce the so-called $(\mathbb{P}_1 \oplus B_3) - \mathbb{P}_1^{cont}$ FE pair, where

$$V_h = \{ v_h \in (C^0(\Omega))^2 \;|\; v_h|_K \in (\mathbb{P}_1(K))^2 \oplus B_3 \ \ \forall K \in \mathcal{T}_h \}$$

and

$$Q_h = \{ q_h \in C^0(\Omega) \;|\; q_h|_K \in \mathbb{P}_1(K) \ \ \forall K \in \mathcal{T}_h \},$$

the space $B_3$ being that defined in (7.52) in the case $k = 2$. The $(\mathbb{P}_1 \oplus B_3) - \mathbb{P}_1^{cont}$ FE pair is also well known as the Mini element, and was proposed by Arnold, Brezzi and Fortin in 1984. Its dofs are depicted in Fig. 7.9. The name Mini is due to the fact that the $(\mathbb{P}_1 \oplus B_3) - \mathbb{P}_1^{cont}$ FE pair is the most economical LBB-stable element with continuous pressures. The following optimal error estimate can be proved to hold for the Mini element

$$\|u - u_h\|_V + \|p - p_h\|_Q \le C h \left( \|u\|_{(H^2(\Omega))^2} + \|p\|_{H^2(\Omega)} \right).$$

Figure 7.9: Dofs for the Mini element over K. Left: dofs for the displacement.
Right: dof for the pressure.

Another possibility to enrich the $\mathbb{P}_1 - \mathbb{P}_1^{cont}$ FE pair is the so-called $(\mathbb{P}_1\text{-iso-}\mathbb{P}_2) - \mathbb{P}_1^{cont}$ element, proposed by Bercovier and Pironneau in 1979 and whose dofs are shown in Fig. 7.10. The name "iso" is due to the fact that the geometrical location of the dofs is the same as for a standard $\mathbb{P}_2$ element for the displacement, but the functions are linear over each sub-triangle $K_i$, $i = 1, \ldots, 4$. An error estimate similar to that for the Mini element can be proved to hold also for the $(\mathbb{P}_1\text{-iso-}\mathbb{P}_2) - \mathbb{P}_1^{cont}$ FE pair.

Figure 7.10: Dofs for the $(\mathbb{P}_1\text{-iso-}\mathbb{P}_2) - \mathbb{P}_1^{cont}$ FE pair over $K$. Left: dofs for the displacement. Right: dofs for the pressure.

7.13 Numerical example: locking and pressure spurious modes
In this section, we investigate the possible occurrence of locking and pressure spurious modes in the numerical solution of the incompressible elastic problem (7.22) in the unit square $\Omega = (0, 1)^2$ and in plane strain conditions. We assume that the body is constrained to ground along the sides $y = 0$ and $x = 0$ and that no-stress conditions are applied on the rest of the boundary. Volume forces are set equal to $f = [0, -1]^T$, while the Young modulus is set equal to 1. For $h > 0$, we denote by $\{\mathcal{T}_h\}_{h>0}$ a family of finite element triangulations, each member of which is a uniform partition of the unit square into $2N^2$ right-angled triangles with legs of length $h = 1/N$. An example of $\mathcal{T}_h$ is shown in Fig. 7.11 in the case $N = 4$.

Figure 7.11: Finite element mesh.

In general, for a fixed value of $N$ (equivalently, of $h$), the partition consists of a number of:
• $N_E = 2N^2$ elements;

• $N_v = (N + 1)^2$ vertices;

• $N_{vD} = 2(N + 1) - 1 = 2N + 1$ vertices on $\Gamma_D$;

• $N_{vN} = 2(N - 1) + 1 = 2N - 1$ vertices on $\Gamma_N$;

• $N_{vi} = N_v - (N_{vD} + N_{vN}) = (N - 1)^2$ vertices in the interior of $\Omega$;

• $N_{eD} = 2N$ edges on $\Gamma_D$;

• $N_{eN} = 2N$ edges on $\Gamma_N$;

• $N_{ei} = N^2 + 2N(N - 1) = 3N^2 - 2N$ edges in the interior of $\Omega$;

• $N_e = N_{ei} + N_{eD} + N_{eN} = 3N^2 + 2N$ edges.
In the example of Fig. 7.11 ($N = 4$), we have $N_E = 32$, $N_v = 25$, $N_{vD} = 9$, $N_{vN} = 7$ and $N_{vi} = 9$, while $N_e = 56$, $N_{eD} = N_{eN} = 8$ and $N_{ei} = 3N^2 - 2N = 40$.
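The counting formulas listed above can be evaluated and cross-checked with a short Matlab sketch. The variable names are hypothetical, and the Euler relation (vertices − edges + triangles = 1 for a triangulation of a simply connected planar domain) is used here only as an independent consistency check; it is not part of the original text.

```matlab
% Mesh counts for the uniform triangulation of the unit square (Fig. 7.11).
N = 4;                        % subdivisions per side

NE   = 2*N^2;                 % triangles
Nv   = (N + 1)^2;             % vertices
Nv_D = 2*(N + 1) - 1;         % vertices on Gamma_D (sides x = 0 and y = 0)
Nv_N = 2*(N - 1) + 1;         % vertices on Gamma_N not belonging to Gamma_D
Nv_i = Nv - (Nv_D + Nv_N);    % interior vertices
Ne_D = 2*N;                   % edges on Gamma_D
Ne_N = 2*N;                   % edges on Gamma_N
Ne_i = N^2 + 2*N*(N - 1);     % interior edges (diagonals + horizontal + vertical)
Ne   = Ne_i + Ne_D + Ne_N;    % total number of edges

% Consistency check: Euler relation for a simply connected triangulation.
euler = Nv - Ne + NE;

fprintf('NE = %d, Nv = %d, Nv_D = %d, Nv_N = %d, Nv_i = %d\n', ...
        NE, Nv, Nv_D, Nv_N, Nv_i);
fprintf('Ne = %d, Ne_D = %d, Ne_N = %d, Ne_i = %d\n', Ne, Ne_D, Ne_N, Ne_i);
fprintf('Nv - Ne + NE = %d\n', euler);
```

Changing `N` allows the same check to be repeated on any member of the family of triangulations.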

7.13.1 Discontinuous pressure FE space

We start by using the simplest choice for $V_h$ and $Q_h$, the FE pair $\mathbb{P}_1 - \mathbb{P}_0$. We have

$$\dim V_h = 2N_h = 2(N_{vi} + N_{vN}) = 2((N - 1)^2 + 2N - 1) = 2N^2$$

and

$$\dim Q_h = N_E = 2N^2.$$

Thus, $M_h = 2N_h$ and inequality (7.41) becomes in this special case an equality. The computed deformed structure and pressure field are shown in Figs. 7.12(a) and 7.12(b).
Results clearly indicate:
• no deformation of the structure (locking);

• unphysical oscillations in the pressure (spurious pressure mode).

Matlab coding. The following Matlab commands extract from the computed so-
lution of system (7.37) the maximum horizontal and vertical displacement com-
ponents.

(a) Deformed structure (b) Pressure

Figure 7.12: Computed solution with the $\mathbb{P}_1 - \mathbb{P}_0$ FE pair that does not satisfy the discrete inf-sup condition.

>> EF2D_elinc

* test case ----> Locking *

reading data
assembling stiffness matrix and load vector
Neumann boundary conditions
Dirichlet boundary conditions
solving the linear system
plot of solution
>> figure; pdesurf(griglia_conn.p,griglia_conn.t,U_p')
>> xlabel('x')
>> ylabel('y')
>> title('Pressure')
>> max(abs(U_ux))

ans =

>> max(abs(U_uy))

ans =

To avoid the occurrence of locking, we increase the polynomial degree of the approximation for the displacement. For this, we use the FE pair $\mathbb{P}_2 - \mathbb{P}_0$. In this case we have

$$\dim V_h = 2N_h = 2(N_{vi} + N_{vN}) + 2(N_{ei} + N_{eN}) = 8N^2.$$

Thus, $M_h < 2N_h$. The computed deformed structure and pressure field are shown in Figs. 7.13(a) and 7.13(b). Results clearly indicate the absence of any locking or spurious oscillations in the pressure distribution.

(a) Deformed structure (b) Pressure

Figure 7.13: Computed solution with the P2 − P0 FE pair that satisfies the discrete
inf-sup condition.

Matlab coding. The following Matlab commands extract from the computed so-
lution of system (7.37) the maximum horizontal and vertical displacement com-
ponents.
>> EF2D_elinc

* test case ----> Locking *

ans =

0.2708

>> max(abs(U_uy))

ans =

0.2642

7.13.2 Continuous pressure FE space

We now consider the use of a continuous pressure approximation space. The simplest element is thus the $\mathbb{P}_1 - \mathbb{P}_1^{cont}$ FE pair. We have

$$\dim Q_h = M_h = N_v = (N + 1)^2$$

so that condition (7.41) is satisfied for $N \ge 3$. The computed deformed structure and pressure field are shown in Figs. 7.14(a) and 7.14(b). Results clearly indicate the absence of locking and the presence of strong spurious oscillations in the pressure distribution, in accordance with the fact that the $\mathbb{P}_1 - \mathbb{P}_1^{cont}$ FE pair does not satisfy the discrete inf-sup condition.

(a) Deformed structure (b) Pressure

Figure 7.14: Computed solution with the $\mathbb{P}_1 - \mathbb{P}_1^{cont}$ FE pair that does not satisfy the discrete inf-sup condition.

Matlab coding. The following Matlab commands extract from the computed solution of system (7.37) the maximum horizontal and vertical displacement components. Notice the warning message, which signals that the matrix of the linear algebraic system (7.37) is numerically singular.
>> EF2D_elinc

* test case ----> Locking *

reading data
assembling stiffness matrix and load vector
Neumann boundary conditions
Dirichlet boundary conditions
solving the linear system
Warning: Matrix is close to singular or badly scaled.

Results may be inaccurate. RCOND = 6.938894e-18.

> In EF2D_elinc at 399
plot of solution
>> figure; pdesurf(griglia_conn.p,griglia_conn.t,U_p)
>> max(abs(U_ux))

ans =

0.2246

>> max(abs(U_uy))

ans =

0.2055

To overcome the instability of the $\mathbb{P}_1 - \mathbb{P}_1$ pair, we adopt the simplest variant of such an element that satisfies the discrete inf-sup condition, the so-called Mini element, that is, the $(\mathbb{P}_1 \oplus B_3) - \mathbb{P}_1^{cont}$ FE pair. In this case, we have

$$\dim V_h = 2N_h = 2(N_{vi} + N_{vN} + N_E) = 2(N^2 + 2N^2) = 6N^2$$

from which it follows that condition (7.41) is satisfied for every $N \ge 1$. The computed deformed structure and pressure field are shown in Figs. 7.15(a) and 7.15(b). Results clearly indicate the absence of locking and spurious oscillations, in accordance with the fact that the Mini element satisfies the discrete inf-sup condition.
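The dimension counts used in this section for the four FE pairs can be collected in a small Matlab sketch. It assumes, consistently with the way (7.41) is used in the text, that the condition reads $\dim Q_h \le \dim V_h$; all variable names are hypothetical.

```matlab
% Dimension count dim Q_h <= dim V_h for the FE pairs tested in Sect. 7.13.
N  = 4;                        % subdivisions per side
NE = 2*N^2;                    % triangles
Nv = (N + 1)^2;                % vertices
n_free_vertices = N^2;         % Nv_i + Nv_N (vertices not on Gamma_D)
n_free_edges    = 3*N^2;       % Ne_i + Ne_N (edges not on Gamma_D)

names = {'P1-P0', 'P2-P0', 'P1-P1cont', 'Mini'};
dimV  = [2*n_free_vertices, ...
         2*(n_free_vertices + n_free_edges), ...
         2*n_free_vertices, ...
         2*(n_free_vertices + NE)];
dimQ  = [NE, NE, Nv, Nv];

for i = 1:numel(names)
    fprintf('%-10s dim V_h = %4d   dim Q_h = %4d   dim Q_h <= dim V_h: %d\n', ...
            names{i}, dimV(i), dimQ(i), dimQ(i) <= dimV(i));
end
```

The count is only a necessary condition: the $\mathbb{P}_1 - \mathbb{P}_1^{cont}$ pair passes it for $N \ge 3$ and is nevertheless unstable, as the computed pressure field shows.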

(a) Deformed structure (b) Pressure

Figure 7.15: Computed solution with the Mini-element that satisfies the discrete
inf-sup condition.

Matlab coding. The following Matlab commands extract from the computed solution of system (7.37) the maximum horizontal and vertical displacement components. Notice how the latter are very close to those computed by the unstable pair $\mathbb{P}_1 - \mathbb{P}_1^{cont}$, confirming that failing to satisfy the discrete inf-sup condition does not prevent, in general, the displacement variable from being approximated correctly. It is the pressure, instead, that turns out to be completely inaccurate.
>> EF2D_elinc

* test case ----> Locking *

reading data
assembling stiffness matrix and load vector
Neumann boundary conditions
Dirichlet boundary conditions
solving the linear system
plot of solution
>> figure; pdesurf(griglia_conn.p,griglia_conn.t,U_p)
>> xlabel('x')
>> ylabel('y')
>> title('Pressure')
>> max(abs(U_ux))

ans =

0.2241

>> max(abs(U_uy))

ans =

0.2173

7.14 Numerical example: convergence analysis

In this section, we illustrate the convergence performance of the GFEM in the solution of a problem in incompressible elasticity (plane strain conditions) with available exact solution. The computational domain is the square $\Omega = (0, \pi)^2$, obtained as the image of the reference domain $\widehat{\Omega} = (-1, 1)^2$ through the affine map

$$x := \frac{\pi}{2}(\xi + 1), \quad \xi \in [-1, 1], \qquad y := \frac{\pi}{2}(\eta + 1), \quad \eta \in [-1, 1].$$

The value of the Young modulus is $E = 3$ and the Poisson modulus is $\nu = 0.5$.
The solid body is constrained to ground on three of the four sides, as shown in Fig. 7.16, while the remaining lateral side is subject to a normal stress condition

$$h_N = \left[ 0, \ -(\sin(y))^2 \right]^T.$$

The volume force is

$$f = \frac{\pi}{2} \left[ \sin(2y), \ \sin(2x)(3 - 8(\sin(y))^2) \right]^T,$$

in such a way that the exact displacement field is

$$u = \frac{1}{\pi} \left[ (\sin(x))^2 \sin(2y), \ -\sin(2x)(\sin(y))^2 \right]^T.$$
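As a quick sanity check, the exact displacement field given above can be verified numerically to be divergence-free, as required by the incompressibility constraint. The following Matlab lines are a minimal sketch using central finite differences; the step `d`, the sampling grid and the function handles are illustrative choices, not part of the original code.

```matlab
% Exact displacement components on Omega = (0,pi)^2.
ux = @(x, y)  (sin(x).^2) .* sin(2*y) / pi;
uy = @(x, y) -sin(2*x) .* (sin(y).^2) / pi;

% Sampling grid and finite-difference step (illustrative choices).
[X, Y] = meshgrid(linspace(0, pi, 21));
d = 1e-5;

% Central-difference approximation of div u = d(ux)/dx + d(uy)/dy.
dux_dx = (ux(X + d, Y) - ux(X - d, Y)) / (2*d);
duy_dy = (uy(X, Y + d) - uy(X, Y - d)) / (2*d);
max_div = max(abs(dux_dx(:) + duy_dy(:)));

fprintf('max |div u| on the sampling grid: %g\n', max_div);
```

The computed value is zero up to finite-difference and round-off errors, since $\partial_x u_x = \frac{1}{\pi}\sin(2x)\sin(2y) = -\partial_y u_y$.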

Figure 7.16: Computational domain $\widehat{\Omega}$, volume forces and boundary conditions.

A family of five uniformly refined triangulations $\{\mathcal{T}_h\}_{h>0}$, with $h = \pi/N$, is used in the numerical computations, with $N = [5, 10, 20, 40, 80]^T$. For the sake of comparison, the following FE pairs are considered: Mini element, $\mathbb{P}_2 - \mathbb{P}_1^{cont}$, $\mathbb{P}_3 - \mathbb{P}_2^{cont}$ and $\mathbb{P}_3 - \mathbb{P}_1^{cont}$. The symbols used to identify the results obtained with each FE pair are (in the same order): circles, asterisks, squares and diamonds. Fig. 7.17 shows the experimental error curves for the displacement variable only, measured in the $H^1$ norm and plotted in log-log scale as a function of $h$ for each of the considered finite element spaces. The obtained results are in complete agreement with the theoretical conclusions of Sect. 7.12.2. Namely, we see that the error in $H^1$ decreases as $h^k$ for the Mini element ($k = 1$), the $\mathbb{P}_2 - \mathbb{P}_1^{cont}$ element ($k = 2$, $r = 1$) and the $\mathbb{P}_3 - \mathbb{P}_2^{cont}$ element ($k = 3$, $r = 2$). For the same FE pairs, the error in $L^2$ decreases as $h^{k+1}$, in agreement with Thm. 2.2.4.

Figure 7.17: Convergence analysis: $\|u - u_h\|_V$.

A different asymptotic behavior as a function of $h$ is registered for the $\mathbb{P}_3 - \mathbb{P}_1^{cont}$ FE pair ($k = 3$, $r = 1$). In such a case, we still have convergence, because the element is LBB-stable. However, the value of $r$ is not equal to $k - 1$, so that the estimate in this case is not optimal. In accordance with the notation of Rem. 7.10.7, we have $p_u = k = 3$ and $p_p = r + 1 = 2 < p_u$, so that the estimate (7.50) predicts the total discretization error to be of the order of $h^2$ and not of the order of $h^3$. This estimate agrees with the results of Fig. 7.17, which show a quadratic convergence rate for $\|u - u_h\|_V$ in the case of the $\mathbb{P}_3 - \mathbb{P}_1^{cont}$ FE pair. Notably, because of the coupling between $u_h$ and $p_h$ in the exactly incompressible problem, the accuracy of the approximation of the displacement variable is limited by the lower degree of the pressure finite element space.
The deformed structure and the pressure distribution in the elastic body are shown in Figs. 7.18 and 7.19, as computed by using the $\mathbb{P}_2 - \mathbb{P}_1^{cont}$ pair on a mesh with $N = 5$ (deformed structure) and $N = 10$ (pressure). It can be seen that the Dirichlet boundary conditions are enforced in an essential manner, i.e., the displacement of the nodes belonging to $\Gamma_D$ is equal to zero. It can also be seen very clearly that the structure deforms in a counterclockwise sense, in accordance with the applied volume forces and boundary tractions, and that the area of the elastic body remains constant, in accordance with the incompressibility constraint.

Figure 7.18: Computed solution with the $\mathbb{P}_2 - \mathbb{P}_1^{cont}$ FE pair: deformed structure.

Figure 7.19: Computed solution with the $\mathbb{P}_2 - \mathbb{P}_1^{cont}$ FE pair: pressure field.
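The convergence rates discussed in this section can be summarized with a few Matlab lines. The sketch assumes, as the discussion of the $\mathbb{P}_3 - \mathbb{P}_1^{cont}$ pair suggests, that estimate (7.50) yields an overall order equal to the minimum between $p_u = k$ and $p_p = r + 1$; the variable names are hypothetical.

```matlab
% Predicted order of convergence of the total discretization error,
% assumed here to be min(p_u, p_p) with p_u = k and p_p = r + 1.
pairs = {'Mini', 'P2-P1cont', 'P3-P2cont', 'P3-P1cont'};
k = [1 2 3 3];     % polynomial degree of the displacement space
r = [1 1 2 1];     % polynomial degree of the pressure space

p_u  = k;
p_p  = r + 1;
rate = min(p_u, p_p);

for i = 1:numel(pairs)
    fprintf('%-10s k = %d, r = %d  ->  predicted order h^%d\n', ...
            pairs{i}, k(i), r(i), rate(i));
end
```

Only for the last pair is the predicted order lower than $k$, which is the loss of optimality described in the text.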
Part IV

Examination Problems with Solution


This part contains several examination problems with complete solution, includ-
ing theoretical questions, numerical tests and Matlab coding implementation.
Chapter 8

Problems with Solution

Abstract
In this chapter, we deal with the detailed solution of several examination problems, including theoretical and numerical questions, as well as their coding using the Matlab software environment. During each exam, the time allotted for solving the three exercises is typically three and a half hours, and each exercise is assigned a value in "points" to allow the student a preliminary self-evaluation before exam completion.


8.1 Examination of July 09, 2012

Exercise 1 (11 points). Consider the linear elasticity problem in the incompressible regime ($\nu = 0.5$) and in plane strain conditions. The computational domain is shown in Fig. 8.1: the two sides at the bottom are constrained to ground, the lateral and top sides are subject to a normal stress $h$ as indicated in the figure, while on the remaining sides a stress-free condition is applied. The Young modulus $E$ is equal to 30. Solve the problem with the code EF2D_elinc using the Mini element on a triangulation with average mesh size equal to 0.05, and address the following specific questions:

Figure 8.1: Computational domain, boundary conditions and applied loads.

(a) plot the configuration of the deformed structure superposed on the original undeformed configuration, and comment on the obtained result based on the symmetry of the problem;
(b) compute the resultant of ground reacting forces and check that global force equilibrium is satisfied;
(c) compute the maximum horizontal and vertical displacements;
(d) plot the qualitative behavior of the computed pressure and give a mechanical comment on the obtained result;
(e) repeat points (a), (b), (c) and (d) using the $\mathbb{P}_1$-$\mathbb{P}_1$ FE pair, comparing the obtained results with those of the Mini element. Which of the two variables $u$ and $p$ turns out to be correctly approximated in the two cases? Why?

Exercise 2 (11 points). Consider the following two-point BVP:

$$(P) \qquad \begin{cases} -\mu u'' + \sigma u = f & \text{in } \Omega = (0, 1) \\ u(0) = u_0, \quad u(1) = u_1, \end{cases}$$

where $\mu$ and $\sigma$ are positive constants, $u_0$ and $u_1$ are constants, and $f$ is a given function in $L^2(0, 1)$.

(a) Let $R$ denote the lifting of boundary data. Write the weak formulation (W) of problem (P), specifying the space $V$ and the corresponding norm $\|\cdot\|_V$.

(b) Use the Lax-Milgram Lemma to verify that (W) admits a unique solution and determine the a-priori estimate for $\|u\|_{H^1}$.

(c) Assuming that the solution of (P) belongs to $H^s$, with $s \ge 2$ given, write the error estimate for the GFEM of degree $r \ge 1$ applied to the numerical approximation of (P) and discuss the estimate as a function of the relation between $r$ and $s$ and of the ratio $\sigma/\mu$.

Exercise 3 (11 points). Set $u_0 = 1$, $u_1 = 0$, $\sigma = 1$ and $f(x) = 1$, so that the exact solution of (P) is

$$u(x) = 1 + C \left( e^{-\alpha x} - e^{+\alpha x} \right),$$

with $\alpha := 1/\sqrt{\mu}$ and $C := 1/(e^{\alpha} - e^{-\alpha})$.

(a) Set $\mu = 10^{-1}$. Solve (P) with the GFEM using the code EF1D with $r = 1$, 2 and 3 on a family of uniform grids with $N = [10, 20, 40, 80, 160]$ elements. Set $h := 1/N$, report in a table the errors in the $H^1$ and $L^2$ norms as functions of $h$, and determine experimentally the order of convergence $p$ for each considered degree $r$. Comment on the obtained results based on the conclusions drawn at point (c) of Exercise 2.

(b) Set now $\mu = 10^{-4}$, $N = 10$ and $r = 1$. Compute the solution $u_h$ with these data and plot it superposed on the exact solution $u$. Compute the error in the $L^2$ norm and the maximum absolute nodal error, $e_{max}$, and check whether $u_h$ satisfies the DMP, justifying the answer based on the value of the local Péclet number.

(d) Determine the minimum number of intervals $N_{min}$ that are needed to make $u_h$ satisfy the DMP, repeat point (b) with $\mu = 10^{-4}$, $N = N_{min}$ and $r = 1$ (no lumping) and compare the obtained results.

8.1.1 Solution of Exercise 1

Matlab coding. The following Matlab functions are called by the script EF2D_elinc to solve Exercise 1(a). This script is the main program that implements the GFEM in 2D for the numerical treatment of the Herrmann formulation for linear elasticity.
%%%%%%%%%%%%%%
% Ex. 1
%%%%%%%%%%%%%%
function [griglia,dati_problema,soluzione] = Ex1()
% problem definition
griglia = struct('file_griglia','griglia09072012.mat', ...
                 'ku',1.5, ... % Mini element
                 'kp',1);
% names of the functions providing coefficients and boundary data
dati_problema = struct('coeff','Ex1_coeff', ...
                       'tipo',1, ... % 1 plane strain, 2 plane stress
                       'tipo_bc',[ 1  2  3  4  5  6  7  8;
                                  -2 -2 -2 -2 -2 -2 -1 -1], ...
                       'g','Ex1_g', ...
                       'h','Ex1_h');
soluzione = struct('u_es',[]);
return

function [E,nu,f] = Ex1_coeff(x,y)
% problem coefficients/volume force
E = 30*ones(size(x));
nu = 0.5*ones(size(x));
f(1,:) = 0*x;
f(2,:) = 0*x;
return

function g = Ex1_g(x,y)
% Dirichlet bcs
g = zeros(2,size(x,2));
return

function h = Ex1_h(x,y,marker)
% Neumann bcs (marker identifies the boundary side)
switch(marker)
    case(1)
        h(1,:) = y;
        h(2,:) = 0*x;
    case(2)
        h(1,:) = 0*x;
        h(2,:) = -1;
    case(3)
        h(1,:) = -y;
        h(2,:) = 0*x;
    case(4)
        h(1,:) = 0*x;
        h(2,:) = 0*x;
    case(5)
        h(1,:) = 0*x;
        h(2,:) = 0*x;
    case(6)
        h(1,:) = 0*x;
        h(2,:) = 0*x;
    otherwise
        error('Error in Neumann bcs!!')
end
return

(a) The finite element mesh used to solve numerically the problem is generated
using the pdetool toolbox available in the Matlab software environment
and is shown in Fig. 8.2.

Figure 8.2: Computational mesh with average mesh size h = 0.1.

Matlab coding. Running the script EF2D_elinc yields the following output. The other Matlab commands give access to the dimensions of the data structures and of the problem unknowns.

>> EF2D_elinc

* test case ----> ex1 *


reading data
assembling stiffness matrix and load vector
Neumann boundary conditions
Dirichlet boundary conditions
solving the linear system
plot of solution

>> size(griglia_conn.p)

ans =

2 1107

>> size(griglia_conn.t)

ans =

4 2000

>> size(U_ux), size(U_uy), size(U_p)

ans =

3107 1

ans =

3107 1

ans =

1107 1

The number of triangles is equal to 2000 and the number of vertices is 1107. The vector storing each component of the displacement $u_h$ has 3107 entries, which equals the sum of 2000 (one internal dof per mesh triangle) plus 1107 (one entry per mesh vertex, including the vertices on $\Gamma_D$ where the displacement is prescribed). The total number of dofs for $p_h$ is equal to the number of vertices.
The deformed configuration computed by the numerical scheme using the Mini element is shown in Fig. 8.3. We can see that the structure deforms in a symmetric manner, consistently with the symmetry of the geometry and of the applied external loads. In particular, the two lateral sides are subject to a compressive state that produces a significant deformation. The incompressibility constraint is responsible for the strong deflection of the internal vertical sides located at $x = -0.5$ and $x = 0.5$, respectively. The horizontal side located at $y = 1.5$ is subject to a vertical force that produces a negative displacement of the side itself along the $y$ direction.

Figure 8.3: Computed deformed configuration.

(b) Enforcing global equilibrium of horizontal and vertical applied forces with the reaction forces $R$ yields

$$R_x + 2 \cdot 2/2 - 2 \cdot 2/2 = 0 \ \Rightarrow \ R_x = 0$$
$$R_y + (-1) \cdot 1 = 0 \ \Rightarrow \ R_y = 1.$$

Matlab coding. The following Matlab commands give access to the program output variables containing the components of the reaction forces.

>> R_tot

R_tot =

-0.0000
1.0000

Results are in agreement with physical expectation.

(c)
Matlab coding. The following Matlab commands compute the required quantities.

>> [Uxmax, Ix] = max(abs(U_ux))

Uxmax =

0.3433

Ix =

118

>> griglia_conn.p(:,118)

ans =

-0.5888
1.0656

>> [Uymax, Iy] = max(abs(U_uy))

Uymax =

0.1164

Iy =

>> griglia_conn.p(:,76)

ans =

-0.6591
1.3636

The maximum horizontal and vertical displacements are 0.3433 and 0.1164, respectively, and are attained at the points that, in the original undeformed structure, have coordinates $P = [-0.5888, 1.0656]^T$ and $Q = [-0.6591, 1.3636]^T$.

(d)
Matlab coding. The following Matlab commands plot the computed pressure field in 3D.

>> figure; pdesurf(griglia_conn.p,griglia_conn.t,U_p)

>> colorbar; colormap('jet'); xlabel('x'); ylabel('y'); title('Pressure')

The resulting picture of the pressure is displayed in Fig. 8.4. Consistently with the applied external loads and the geometry of the structure, we see that the points $[-0.5, 1.5]^T$ and $[0.5, 1.5]^T$ are subject to the highest compressive state. Concerning the part of the structure that is constrained to ground, the point $[-1, 0]^T$ (symmetrically, $[1, 0]^T$) is subject to the highest tensile stress, while the point $[-0.5, 0]^T$ (symmetrically, $[0.5, 0]^T$) is subject to the highest compressive state, in agreement with the clockwise (counterclockwise) torque exerted by the horizontal linearly varying pressure force distributed along the lateral side of the structure.

Figure 8.4: 3D plot of the pressure field.

Matlab coding. The following Matlab commands compute the maximum and minimum values of $p_h$ over the mesh and the corresponding nodal coordinates in the structure.
>> [Pmax,Ip]=max(U_p)

Pmax =

19.8730

Ip =

5

>> griglia_conn.p(:,5)

ans =

-0.5000
1.5000

>> [Pmin,Ip]=min(U_p)

Pmin =

-12.0233

Ip =

>> griglia_conn.p(:,1)

ans =

-1
0

(e) Repeating the analysis previously done with the Mini element, but using the $\mathbb{P}_1 - \mathbb{P}_1$ FE pair, yields a similar result for the deformed structure configuration, with maximum horizontal and vertical displacements of 0.3383 and 0.1152, respectively (to be compared with 0.3433 and 0.1164 in the case of the Mini element). Computed reaction forces are again consistent with the theoretical value $R = [0, 1]^T$. However, as expected, the computed pressure field is affected by numerical instabilities (oscillations) because the $\mathbb{P}_1 - \mathbb{P}_1$ FE pair does not satisfy the LBB compatibility condition. These oscillations (wiggles) are clearly visible in Fig. 8.5, from which we can also see that the maximum variation of the pressure is much higher than in the case of the solution computed by the Mini element.
Matlab coding. The following Matlab commands compute the maximum and minimum values of $p_h$ over the mesh and the corresponding nodal coordinates in the structure.

>> [Pmax,Ip]=max(U_p)

Pmax =

40.2087

Ip =

Figure 8.5: 3D plot of the pressure field.

>> griglia_conn.p(:,6)

ans =

0.5000
1.5000

>> [Pmin,Ip]=min(U_p)

Pmin =

-26.1788

Ip =

>> griglia_conn.p(:,1)

ans =

-1

8.1.2 Solution of Exercise 2

The two-point BVP is an example of a reaction-diffusion equation with non-homogeneous Dirichlet boundary conditions.
(a) We first need to reduce problem (P) to an equivalent problem with homogeneous boundary conditions at $x = 0$ and $x = 1$. To do this, we introduce a function $R : \Omega \to \mathbb{R}$ such that $R \in H^1(\Omega)$, $R(0) = u_0$ and $R(1) = u_1$, the so-called lifting of boundary data. Such a function certainly exists (and is not unique); the simplest example is

$$R(x) = u_0 + (u_1 - u_0)x.$$

Then, according to the linear superposition principle, we decompose the exact solution $u$ as

$$u(x) = u^0(x) + R(x), \qquad (8.1)$$

so that the new problem unknown $u^0$ is such that

$$u^0(0) = u^0(1) = 0.$$

After doing this, we set

$$V := H^1_0(\Omega).$$

Using Poincaré's inequality (C.26), we conclude that the equivalent norm in $V$ can be taken as

$$\|\phi\|_V := \|\phi'\|_{L^2(0,1)} = \left( \int_0^1 (\phi')^2 \, dx \right)^{1/2} \qquad \phi \in V.$$

Defining

$$a(w, v) := \mu \int_0^1 w' v' \, dx + \sigma \int_0^1 w\, v \, dx \qquad w, v \in V$$

$$F(v) := \int_0^1 f\, v \, dx - a(R, v) \qquad v \in V,$$

the weak formulation of (P) reads:

(W)   find $u^0 \in V$ such that $a(u^0, v) = F(v)$ for all $v \in V$.
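For completeness, the steps leading from (P) to (W) can be written out explicitly. The derivation below is a standard sketch, in which $u^0 := u - R$ denotes the new unknown with homogeneous boundary values and $v$ is a generic test function in $V = H^1_0(\Omega)$.

```latex
% Multiply the equation by v and integrate over (0,1):
-\mu \int_0^1 u''\, v \, dx + \sigma \int_0^1 u\, v \, dx = \int_0^1 f\, v \, dx .
% Integrate the first term by parts; the boundary term vanishes since v(0) = v(1) = 0:
-\mu \int_0^1 u''\, v \, dx
   = \mu \int_0^1 u'\, v' \, dx - \mu \big[ u'\, v \big]_0^1
   = \mu \int_0^1 u'\, v' \, dx .
% Hence a(u,v) = \int_0^1 f v dx. Substituting u = u^0 + R and using the
% linearity of a(.,.) in its first argument:
a(u^0, v) = \int_0^1 f\, v \, dx - a(R, v) = F(v) \qquad \forall v \in V .
```

Note that $a(R, v)$ is well defined because $R \in H^1(\Omega)$, even though $R$ does not belong to $V$.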

(b) In the solution of this point of the exercise, $U$ and $W$ are two generic functions in $V$.

Continuity of $a(\cdot, \cdot)$:

$$|a(U, W)| \le \mu \left| \int_0^1 U' W' \, dx \right| + \sigma \left| \int_0^1 U W \, dx \right| \le \mu \|U\|_V \|W\|_V + \sigma C_P^2 \|U\|_V \|W\|_V$$

so that the continuity constant is $M := \mu + \sigma C_P^2$.

Coercivity of $a(\cdot, \cdot)$:

$$a(U, U) = \mu \int_0^1 (U')^2 \, dx + \sigma \int_0^1 U^2 \, dx = \mu \|U\|_V^2 + \sigma \|U\|^2_{L^2(0,1)} \ge \mu \|U\|_V^2$$

so that the coercivity constant is $\beta := \mu$.

Continuity of $F(\cdot)$:

$$|F(W)| \le \left| \int_0^1 f W \, dx \right| + |a(R, W)| \le \|f\|_{L^2(0,1)} \|W\|_{L^2(0,1)} + M \|R\|_{H^1(0,1)} \|W\|_V \le C_P \|f\|_{L^2(0,1)} \|W\|_V + M \|R\|_{H^1(0,1)} \|W\|_V$$

so that the continuity constant of $F$ is $\Lambda := C_P \|f\|_{L^2(0,1)} + M \|R\|_{H^1(0,1)}$.

Since the assumptions (1.31) of the Lax-Milgram Lemma are all satisfied, we conclude that problem (W) admits a unique solution $u^0 \in V$, such that

$$\|u^0\|_V \le \frac{\Lambda}{\beta}.$$
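Thus, using the triangle inequality and Poincaré's inequality, we get the following a-priori estimate for the solution $u$

$$\begin{aligned}
\|u\|_{H^1(0,1)} &= \|u^0 + R\|_{H^1(0,1)} \le \|u^0\|_{H^1(0,1)} + \|R\|_{H^1(0,1)} \\
&\le \left( \|(u^0)'\|^2_{L^2(0,1)} + \|u^0\|^2_{L^2(0,1)} \right)^{1/2} + \|R\|_{H^1(0,1)} \\
&\le \left( \|u^0\|_V^2 + C_P^2 \|u^0\|_V^2 \right)^{1/2} + \|R\|_{H^1(0,1)} \\
&\le (1 + C_P^2)^{1/2} \|u^0\|_V + \|R\|_{H^1(0,1)} \\
&\le (1 + C_P^2)^{1/2} \frac{\Lambda}{\beta} + \|R\|_{H^1(0,1)} \\
&\le \frac{1}{\mu} (1 + C_P^2)^{1/2} \left( C_P \|f\|_{L^2(0,1)} + M \|R\|_{H^1(0,1)} \right) + \|R\|_{H^1(0,1)} \\
&\le \frac{1}{\mu} (1 + C_P^2)^{1/2} C_P \|f\|_{L^2(0,1)} + \left[ (1 + C_P^2)^{1/2} \left( 1 + \frac{\sigma}{\mu} C_P^2 \right) + 1 \right] \|R\|_{H^1(0,1)} .
\end{aligned}$$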
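To get a feeling for the size of the constants, the a-priori bound can be evaluated for the data of Exercise 3(a). The Matlab lines below are a minimal sketch: the value $C_P = 1$ is the one used in these notes for the interval (0,1), the lifting is the linear one, $R(x) = u_0 + (u_1 - u_0)x$, and the variable names are hypothetical.

```matlab
% Data of Exercise 3(a): mu = 0.1, sigma = 1, f = 1, u0 = 1, u1 = 0.
mu = 1e-1;  sigma = 1;
u0 = 1;     u1 = 0;
CP = 1;                 % Poincare constant used in the notes for (0,1)
norm_f = 1;             % L2(0,1) norm of f(x) = 1

% H1(0,1) norm of the linear lifting R(x) = u0 + (u1 - u0)*x:
% int_0^1 R^2 dx = (u0^2 + u0*u1 + u1^2)/3,  int_0^1 (R')^2 dx = (u1 - u0)^2.
norm_R = sqrt((u0^2 + u0*u1 + u1^2)/3 + (u1 - u0)^2);

% Constants of the Lax-Milgram Lemma.
M      = mu + sigma*CP^2;          % continuity constant of a(.,.)
beta   = mu;                       % coercivity constant of a(.,.)
Lambda = CP*norm_f + M*norm_R;     % continuity constant of F(.)

% A-priori bound for ||u||_{H1(0,1)}.
bound = sqrt(1 + CP^2)*Lambda/beta + norm_R;

fprintf('M = %g, beta = %g, Lambda = %g\n', M, beta, Lambda);
fprintf('a-priori bound for ||u||_H1: %g\n', bound);
```

The factor $1/\beta = 1/\mu$ makes the bound grow as $\mu$ decreases, which anticipates the role played by the ratio $\sigma/\mu$ in the error estimate.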

(c) Céa's Lemma, combined with the interpolation error estimate, tells us that

$$\|u - u_h\|_{H^1(0,1)} \le C_{\mathcal{T}_h} \frac{M}{\beta} h^l \|u\|_{H^{l+1}(0,1)} \qquad (8.2)$$

where $C_{\mathcal{T}_h}$ is a positive constant, independent of $h$, and related to mesh regularity, while

$$l := \min \{r, s - 1\}$$

is the so-called regularity threshold.
Let us comment on the dependence of the error on the relation between $r$ and $s$. For a given value of $s$, $s \ge 2$, the optimal value of the degree of GFE approximation is $r = s - 1$. In such a case, the discretization error tends to zero, as $h$ tends to zero, as $h^r$ and there is an optimal balance between regularity of the solution and choice of the approximation space (see Rem. 2.2.7 and Tab. 2.1). In particular, estimate (8.2) tells us that there is no gain in accuracy in using $r \ge s$, i.e., in resorting to higher-order polynomials; it is better to refine the mesh (i.e., for $r = s - 1$, take a smaller value of $h$).
Let us now comment on the dependence of the error on the ratio $\sigma/\mu =: \mathcal{R}$. We have

$$\frac{M}{\beta} = \frac{\mu + \sigma C_P^2}{\mu} = 1 + C_P^2 \mathcal{R}.$$

In the case of the interval (0, 1), we have $C_P = 1$, so that (8.2) yields

$$\|u - u_h\|_{H^1(0,1)} \le C_{\mathcal{T}_h} (1 + \mathcal{R}) h^l \|u\|_{H^{l+1}(0,1)}. \qquad (8.3)$$

The case $\mathcal{R} \ll 1$: in this situation, problem (P) is heavily diffusion-dominated and the choice of $h$ to obtain a small discretization error is only determined by the regularity of the solution. To make an example, assume $s = r + 1$ for every value of the polynomial degree of the GFE approximation. Assume also that $\mathcal{T}_h$ is uniform, so that it is reasonable to have $C_{\mathcal{T}_h} = O(1)$. In this case, to get $\|u - u_h\|_{H^1(0,1)} \simeq \varepsilon$, for a given (small) tolerance $\varepsilon$, we need to take

$$h \simeq \left( \frac{\varepsilon}{\|u\|_{H^{r+1}(0,1)}} \right)^{1/r}. \qquad (8.4)$$

The case $\mathcal{R} \gg 1$: in this situation, problem (P) is heavily reaction-dominated and the choice of $h$ is now determined in a more substantial manner by the nature of the continuous problem itself. As a matter of fact, under the same assumptions as in the previously analyzed case, to get $\|u - u_h\|_{H^1(0,1)} \simeq \varepsilon$, for a given (small) tolerance $\varepsilon$, we need to take

$$h \simeq \left( \frac{\varepsilon}{\mathcal{R} \|u\|_{H^{r+1}(0,1)}} \right)^{1/r}. \qquad (8.5)$$

The value of $h$ predicted by (8.5) is smaller than that predicted by (8.4) by a factor of $\mathcal{R}^{-1/r}$. If $r = 1$ and $\mathcal{R} = 10^4$, this means that the mesh size in the reaction-dominated case has to be ten thousand times smaller than in the diffusion-dominated case, with a very negative impact on the computational effort associated with the GFEM.
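The comparison between (8.4) and (8.5) can be made concrete with a few Matlab lines. The tolerance and the norm of the solution used below are purely illustrative values (assumptions, not data of the exercise), and the variable names are hypothetical.

```matlab
% Mesh sizes predicted by (8.4) and (8.5) for a given tolerance.
tol    = 1e-2;     % target error in the H1 norm (illustrative value)
norm_u = 1;        % ||u||_{H^{r+1}(0,1)} (illustrative value)
r      = 1;        % polynomial degree
Rratio = 1e4;      % ratio sigma/mu in the reaction-dominated case

h_diff  = (tol / norm_u)^(1/r);             % estimate (8.4)
h_react = (tol / (Rratio*norm_u))^(1/r);    % estimate (8.5)
ratio   = h_diff / h_react;                 % equals Rratio^(1/r)

fprintf('h (diffusion-dominated): %g\n', h_diff);
fprintf('h (reaction-dominated):  %g\n', h_react);
fprintf('ratio between the two mesh sizes: %g\n', ratio);
```

Increasing `r` reduces the penalty, since the ratio between the two mesh sizes scales as $\mathcal{R}^{1/r}$, provided the solution is regular enough for the higher-order estimate to hold.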

8.1.3 Solution of Exercise 3

(a)
Matlab coding. The following Matlab script is used to solve Exercise 3(a).
%%%%%%%%%%%%%%
% Ex. 3 (a)
%%%%%%%%%%%%%%
close all; clear all
global m
m = 1e-1;
% nr. of elements
N = [10 20 40 80 160];
% degree of FEs (set to 1, 2 or 3)
deg = 1;
% initialize errors
El2 = [];
Eh1 = [];

for ii=1:length(N)
    % uniform mesh nodes
    xnod = [0 : 1/(deg*N(ii)) : 1];
    % mesh
    griglia = struttura_griglia(xnod,deg);
    % local basis functions
    base = struttura_base(deg);
    % system assembling phase
    K = termine_diffusione(griglia,base,'coeff_mu');
    S = termine_reazione(griglia,base,'coeff_s');
    bv = termine_noto(griglia,base,'coeff_f');
    % boundary conditions
    dati_bordo = struct( ...
        'bc'    , [1 1] , ... % type of condition
        'gamma' , []    , ... % parameter for Robin bc
        'r'     , []      ... % boundary datum
        );
    A = K+S;
    b = bv;
    % Dirichlet bcs and matrix system partitioning
    u_incognite = [2 : 1 : griglia.dim-1];
    u_note = [1; griglia.dim];
    A11 = A(u_incognite,u_incognite);
    A12 = A(u_incognite,u_note );
    A21 = A(u_note ,u_incognite);
    A22 = A(u_note ,u_note );
    b1 = b(u_incognite);
    x2 = [u_ex(0); u_ex(1)]; % boundary values for u_h
    % system solution
    x1 = A11\(b1-A12*x2);
    % inclusion of boundary values in the computed solution
    x = zeros(griglia.dim,1); % reset the solution vector on each mesh
    x(u_incognite,1) = x1;
    x(u_note ,1) = x2;
    % post-processing/plot
    [x_plot y_plot] = visualizza_soluzione(griglia,base,x,20);
    plot(x_plot,u_ex(x_plot),'m')
    % errors
    [eL2,eH1] = norme_errore(griglia,base,x,'u_ex','grad_u_ex');
    disp(['L2 Error : ',num2str(eL2)]);
    disp(['H1 Error : ',num2str(eH1)]);
    El2 = [El2, eL2];
    Eh1 = [Eh1, eH1];
end
% asymptotic convergence orders
ph1 = log(Eh1(1:end-1)./Eh1(2:end))/log(2)
pl2 = log(El2(1:end-1)./El2(2:end))/log(2)

Matlab coding. The Matlab functions called by the previous script are listed below.

%%%%%%%%%%%%%%%%%%%%%%%
% Ex. 3: coefficients
%%%%%%%%%%%%%%%%%%%%%%%
function mu = coeff_mu(x)
% diffusion coefficient
global m
mu = m*ones(size(x));
return
%%%%%%%%%%%%%%%%%%%%%%%
function s = coeff_s(x)
% reaction coefficient
s = ones(size(x));
return
%%%%%%%%%%%%%%%%%%%%%%%
function f = coeff_f(x)
% source term
f = ones(size(x));
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ex. 3: solution/gradient of solution
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function u = u_ex(x)
% exact solution
global m
a = sqrt(1/m);
C = (exp(a)-exp(-a))^(-1);
u = 1 + C*(exp(-a*x)-exp(a*x));
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function gradu = grad_u_ex(x)
% gradient of exact solution
global m
a = sqrt(1/m);
C = (exp(a)-exp(-a))^(-1);
gradu = -a*C*(exp(-a*x)+exp(a*x));
return
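As a sanity check on the functions above, one can verify numerically that the exact solution satisfies the differential equation and the boundary conditions of (P). The following Matlab lines are a self-contained sketch: they use a local variable `m` in place of the global one, and a second-order finite difference with an illustrative step `d` to approximate $u''$.

```matlab
% Check that u(x) = 1 + C*(exp(-a*x) - exp(a*x)), a = sqrt(1/m),
% solves -m*u'' + u = 1 in (0,1) with u(0) = 1 and u(1) = 0.
m = 1e-1;                         % diffusion coefficient mu
a = sqrt(1/m);
C = 1/(exp(a) - exp(-a));
u = @(x) 1 + C*(exp(-a*x) - exp(a*x));

% Residual of the equation at some interior points (central differences).
x = linspace(0.1, 0.9, 9);
d = 1e-4;                         % finite-difference step (illustrative)
upp = (u(x + d) - 2*u(x) + u(x - d)) / d^2;
residual = -m*upp + u(x) - 1;
max_res = max(abs(residual));

% Boundary values.
bc = [u(0), u(1)];

fprintf('max residual of the ODE: %g\n', max_res);
fprintf('u(0) = %g, u(1) = %g\n', bc(1), bc(2));
```

The check confirms that the exponent must be $\alpha = 1/\sqrt{\mu}$, as coded in `u_ex` and `grad_u_ex`.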

Running the Matlab codes shown before, we obtain the errors listed in
Tab. 8.1. Errors in the $H^1$-norm are indicated by eh1 and errors in the
$L^2$-norm are indicated by el2. Subscripts refer to the polynomial degree $r$
that is used. The orders of convergence, estimated using (D.9), are listed in
Tab. 8.2. All the obtained results are in complete agreement with the
conclusions drawn at point (c) of Exercise 2. As a matter of fact, the present
case corresponds to a diffusion-dominated problem ($R = 1$), so that, since
$s = +\infty$, we expect a convergence of order $h^r$ in $H^1$ and $h^{r+1}$
in $L^2$, respectively.
N     eh1_1    eh1_2        eh1_3        el2_1        el2_2        el2_3
10    0.1132   4.7197e-03   1.2319e-04   2.9958e-03   7.2713e-05   1.2967e-06
20    0.0568   1.1851e-03   1.5478e-05   7.5190e-04   9.1398e-06   8.1548e-08
40    0.0284   2.9661e-04   1.9373e-06   1.8816e-04   1.1441e-06   5.1047e-09
80    0.0142   7.4173e-05   2.4224e-07   4.7052e-05   1.4306e-07   3.1917e-10
160   0.0071   1.8544e-05   3.0282e-08   1.1764e-05   1.7884e-08   1.9950e-11

Table 8.1: $H^1$-norm of the error (denoted by eh1) and $L^2$-norm of the error (denoted by el2) as functions of $h = 1/N$ and $r$.

N     p_1      p_2      p_3      q_1      q_2      q_3
20    0.9954   1.9937   2.9926   1.9943   2.9920   3.9911
40    0.9989   1.9984   2.9981   1.9986   2.9980   3.9978
80    0.9997   1.9996   2.9995   1.9996   2.9995   3.9994
160   0.9999   1.9999   2.9999   1.9999   2.9999   3.9998

Table 8.2: Estimated orders of convergence in the $H^1$ and $L^2$ norms (denoted by p and q, resp.) as functions of $h = 1/N$ and $r$.

(b)
Matlab coding. The following Matlab script solves Exercise 3(b).

%%%%%%%%%%%%%%
% Ex. 3 (b)
%%%%%%%%%%%%%%
close all; clear all
global m

setfonts
m = 1e-4;
% nr. of elements
N = 10;
% degree of FEs
deg = 1;
% mesh size
h = 1/N;
% uniform mesh nodes
xnod = [0 : h: 1];
% mesh
griglia = struttura_griglia(xnod,deg);
% local basis functions
base = struttura_base(deg);
% system assembling phase
K = termine_diffusione(griglia,base,'coeff_mu');
S = termine_reazione(griglia,base,'coeff_s');
bv = termine_noto(griglia,base,'coeff_f');
% boundary conditions
dati_bordo = struct( ...
'bc' , [1 1] , ... % type of condition
'gamma' , [] , ... % parameter for Robin bc
'r' , [] ... % boundary datum
);
A = K+S;
b = bv;
% Dirichlet bcs and matrix system partitioning
u_incognite = [2 : 1 : griglia.dim-1];
u_note = [1; griglia.dim];
A11 = A(u_incognite,u_incognite);
A12 = A(u_incognite,u_note );
A21 = A(u_note ,u_incognite);
A22 = A(u_note ,u_note );
b1 = b(u_incognite);
x2 = [u_ex(0); u_ex(1)]; % boundary values for u_h
% system solution
x1 = A11\(b1-A12*x2);
% inclusion of boundary values in the computed solution
x(u_incognite,1) = x1;
x(u_note ,1) = x2;
% post-processing/plot
XX = [0:0.001:1]';
Uex = u_ex(XX);
plot(XX,Uex,'k-',xnod',x,'o--');
legend('exact solution','FE solution')
xlabel('x')
% errors
[eL2,eH1] = norme_errore(griglia,base,x,'u_ex','grad_u_ex');
emax = norm(u_ex(xnod)'-x,'inf');
Pe_loc = h^2/(6*m);
disp(['L2 Error : ',num2str(eL2)])
disp(['H1 Error : ',num2str(eH1)])
disp(['Err max : ',num2str(emax)])
disp(['Peclet : ',num2str(Pe_loc)])

Running the Matlab code shown before, we obtain the solution shown in
Fig. 8.6. The BVP is in this case strongly reaction-dominated, and the exact
solution tends to behave as the solution of the “reduced” problem, obtained
by setting $\mu = 0$, that is, the function $u_{red}(x) = 1$. However, the
boundary condition at $x = 1$ forces the solution to be equal to zero, so
that a steep boundary layer arises. The function $u_h$ exhibits marked
spurious oscillations in the neighbourhood of $x = 1$, and does not satisfy
the DMP. This is due to the fact that the local Péclet number is
\[
Pe = \frac{\sigma h^2}{6\mu} = 16.6667 \gg 1.
\]

The $L^2$ error is equal to 0.12698 while the maximum nodal error is 0.2415
and clearly occurs at the node located immediately before $x = 1$.
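The value of the local Péclet number quoted above, and the mesh size at which it drops below one, can be checked with a few lines. This is a sketch under the assumptions σ = 1 and µ = 10^-4 (the values set in the coefficient functions and in the script); the names sigma, mu and Pe_min are illustrative.

```matlab
sigma = 1;        % reaction coefficient (coeff_s returns ones)
mu    = 1e-4;     % diffusion coefficient (global m in the script)
N     = 10;       % number of elements used in point (b)
h     = 1/N;
Pe    = sigma*h^2/(6*mu);              % local Peclet number on the coarse mesh
% smallest number of uniform elements for which Pe does not exceed 1
Nmin   = ceil(sqrt(sigma/(6*mu)));
Pe_min = sigma*(1/Nmin)^2/(6*mu);
disp([Pe, Nmin, Pe_min]);
```

With these data the coarse mesh gives Pe of about 16.7, while 41 elements are needed to bring the Péclet number just below one.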

Figure 8.6: Computed solution and exact solution.

(c)
Matlab coding. The following Matlab script solves Exercise 3(c).

%%%%%%%%%%%%%%
% Ex. 3 (c)
%%%%%%%%%%%%%%
close all; clear all
global m

setfonts
m = 1e-4;
% nr. of elements
N = 10;
% degree of FEs
deg = 1;
% mesh size
h = 1/N;
% uniform mesh nodes
xnod = [0 : h: 1];
% mesh
griglia = struttura_griglia(xnod,deg);
% local basis functions
base = struttura_base(deg);
% system assembling phase
K = termine_diffusione(griglia,base,'coeff_mu');
S = termine_reazione_lumping(griglia,base,'coeff_s');
bv = termine_noto(griglia,base,'coeff_f');
% boundary conditions
dati_bordo = struct( ...
'bc' , [1 1] , ... % type of condition
'gamma' , [] , ... % parameter for Robin bc
'r' , [] ... % boundary datum
);
A = K+S;
b = bv;
% Dirichlet bcs and matrix system partitioning
u_incognite = [2 : 1 : griglia.dim-1];
u_note = [1; griglia.dim];
A11 = A(u_incognite,u_incognite);
A12 = A(u_incognite,u_note );
A21 = A(u_note ,u_incognite);
A22 = A(u_note ,u_note );
b1 = b(u_incognite);
x2 = [u_ex(0); u_ex(1)]; % boundary values for u_h
% system solution
x1 = A11\(b1-A12*x2);
% inclusion of boundary values in the computed solution
x(u_incognite,1) = x1;
x(u_note ,1) = x2;
% post-processing/plot
XX = [0:0.001:1]';
Uex = u_ex(XX);
plot(XX,Uex,'k-',xnod',x,'o--');
legend('exact solution','FE solution')
xlabel('x')
% errors
[eL2,eH1] = norme_errore(griglia,base,x,'u_ex','grad_u_ex');
emax = norm(u_ex(xnod)'-x,'inf');
disp(['L2 Error : ',num2str(eL2)])
disp(['H1 Error : ',num2str(eH1)])
disp(['Err max : ',num2str(emax)])

Running the Matlab code shown before, we obtain the solution shown in
Fig. 8.7. In this case, using the lumping stabilization procedure, the
computed solution $u_h$ is no longer affected by numerical instabilities, and
satisfies the DMP. The $L^2$ error is equal to 0.14244 while the maximum nodal
error is 0.0097595, considerably smaller than in the non-stabilized case.
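The effect of lumping can be reproduced without the course library. The following self-contained sketch is not the author's implementation (it does not call termine_reazione or termine_reazione_lumping): it assumes piecewise linear elements on a uniform mesh, σ = f = 1, and the boundary values u(0) = 1, u(1) = 0 implied by u_ex, and it writes the interior rows of the diffusion, consistent reaction and lumped reaction matrices by hand. All variable names are illustrative.

```matlab
mu = 1e-4; sigma = 1; N = 10; h = 1/N;
n  = N + 1;                                   % number of nodes
o  = ones(n-1,1);
% interior rows of the P1 matrices on a uniform mesh
% (the first and last rows are not used, since Dirichlet data are imposed)
K  = (mu/h)   *(2*eye(n) - diag(o,1) - diag(o,-1));   % diffusion
Sc = (sigma*h/6)*(4*eye(n) + diag(o,1) + diag(o,-1)); % consistent reaction
Sl = (sigma*h)*eye(n);                                % lumped reaction (row sums)
b  = h*ones(n,1);                                     % load vector for f = 1
xnod = (0:h:1)';
a = sqrt(sigma/mu); C = 1/(exp(a)-exp(-a));
uex = 1 + C*(exp(-a*xnod) - exp(a*xnod));             % exact solution at the nodes
in = 2:n-1; bd = [1 n]; g = [1; 0];                   % unknowns / Dirichlet nodes and data
Ac = K + Sc;  Al = K + Sl;
uc = zeros(n,1); uc(bd) = g; uc(in) = Ac(in,in)\(b(in) - Ac(in,bd)*g);
ul = zeros(n,1); ul(bd) = g; ul(in) = Al(in,in)\(b(in) - Al(in,bd)*g);
err_c = norm(uc - uex, inf);                          % consistent reaction matrix
err_l = norm(ul - uex, inf);                          % lumped reaction matrix
disp([max(uc), err_c; max(ul), err_l]);
```

With the consistent matrix the off-diagonal entries -µ/h + σh/6 are positive when Pe > 1, and the computed solution overshoots the value 1 near x = 1; with lumping the off-diagonal entries reduce to -µ/h < 0 and the nodal values stay between 0 and 1.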

Figure 8.7: Computed solution and exact solution.

(d)
Matlab coding. The following Matlab script solves Exercise 3(d).

%%%%%%%%%%%%%%
% Ex. 3 (d)
%%%%%%%%%%%%%%
close all; clear all
global m

setfonts
m = 1e-4;
% number of elements needed to ensure Peclet < 1
% (ceil, unlike round, never goes below the threshold 1/sqrt(6*m))
Nmin = ceil(1/sqrt(6*m))
% nr. of elements
N = Nmin;
% degree of FEs
deg = 1;
% mesh size
h = 1/N;
% uniform mesh nodes
xnod = [0 : h: 1];
% mesh
griglia = struttura_griglia(xnod,deg);
% local basis functions
base = struttura_base(deg);
% system assembling phase
K = termine_diffusione(griglia,base,'coeff_mu');
S = termine_reazione(griglia,base,'coeff_s');
bv = termine_noto(griglia,base,'coeff_f');
% boundary conditions
dati_bordo = struct( ...
'bc' , [1 1] , ... % type of condition
'gamma' , [] , ... % parameter for Robin bc
'r' , [] ... % boundary datum
);
A = K+S;
b = bv;
% Dirichlet bcs and matrix system partitioning
u_incognite = [2 : 1 : griglia.dim-1];
u_note = [1; griglia.dim];
A11 = A(u_incognite,u_incognite);
A12 = A(u_incognite,u_note );
A21 = A(u_note ,u_incognite);
A22 = A(u_note ,u_note );
b1 = b(u_incognite);
x2 = [u_ex(0); u_ex(1)]; % boundary values for u_h
% system solution
x1 = A11\(b1-A12*x2);
% inclusion of boundary values in the computed solution
x(u_incognite,1) = x1;
x(u_note ,1) = x2;
% post-processing/plot
XX = [0:0.001:1]';
Uex = u_ex(XX);
plot(XX,Uex,'k-',xnod',x,'o--');
legend('exact solution','FE solution')
xlabel('x')
% errors
[eL2,eH1] = norme_errore(griglia,base,x,'u_ex','grad_u_ex');
emax = norm(u_ex(xnod)'-x,'inf');
Pe_loc = h^2/(6*m);
disp(['L2 Error : ',num2str(eL2)])
disp(['H1 Error : ',num2str(eH1)])
disp(['Err max : ',num2str(emax)])
disp(['Peclet : ',num2str(Pe_loc)])

Running the Matlab code shown before, we obtain the solution shown in
Fig. 8.8. In this case, the mesh is more refined than in points (b) and (c) of
this exercise, since we have $N_{min} = 41$. The corresponding Péclet number
is 0.99147 (i.e., slightly less than 1), the $L^2$ error is 0.024579 and the
maximum nodal error is 0.085817. With this combination of the parameters, the
GFEM performs better than in case (c) as far as the $L^2$ error is concerned
(0.024579 against 0.14244, about a factor of six smaller), while the maximum
nodal error is about a factor of nine larger (0.085817 against 0.0097595).
All in all, if computational cost is the major constraint, the method with
lumping of the reaction matrix is the best compromise; otherwise, an
alternative choice might be to refine the mesh only close to the boundary
layer.

Figure 8.8: Computed solution and exact solution.

Part V

Appendices


This part contains six appendices. The first three appendices are devoted to a
review of essential aspects of Differential Calculus, Linear Algebra and
Functional Analysis, which are extensively used in the text. The remaining
three appendices are devoted to illustrating the basic concepts of Numerical
Mathematics, Numerical Linear Algebra and Approximation Theory.
Appendix A

Differential Calculus

In this appendix, we gather some basic notions of Differential Calculus and related
formulas that are extensively used throughout the text.

A.1 Differential operators, useful formulas and properties

In this section, we give a short treatment of the differential operators of
first and second order that are used throughout the text, applied to scalar-
and vector-valued functions, with some useful formulas and properties. In the
following, $\Omega$ is a bounded set of $\mathbb{R}^d$, $d = 1, 2, 3$, with
Lipschitz boundary $\partial\Omega$ on which a unit normal vector
$\mathbf{n} = \mathbf{n}(\mathbf{x})$ is defined almost everywhere,
$\mathbf{x} = (x_1, \ldots, x_d)^T$ being the coordinate position vector.

A.1.1 First-order operators

• given a differentiable function $u : \Omega \to \mathbb{R}$, we define the
gradient of $u$ as the first-order operator transforming $u$ into a vector,
$\nabla u : \Omega \to \mathbb{R}^d$, such that
\[
\nabla u = \left[ \frac{\partial u}{\partial x_1}, \ldots, \frac{\partial u}{\partial x_d} \right]^T;
\]

• given a differentiable vector field
$\mathbf{v} = [v_1, \ldots, v_d]^T : \Omega \to \mathbb{R}^d$, we define the
divergence of $\mathbf{v}$ as the operator transforming $\mathbf{v}$ into a
scalar, $\mathrm{div}\,\mathbf{v} : \Omega \to \mathbb{R}$, such that
\[
\mathrm{div}\,\mathbf{v} = \sum_{i=1}^{d} \frac{\partial v_i}{\partial x_i}.
\]

A.1.2 Second-order operators

The Laplace operator (shortly, Laplacian) is the second-order differential
operator obtained by application of the divergence operator to the vector
field $\nabla u$, for a given twice-differentiable function $u$, such that
\[
\Delta u := \mathrm{div}\,\nabla u = \sum_{i=1}^{d} \frac{\partial^2 u}{\partial x_i^2}.
\]

More generally, for a given differentiable function
\[
A = A(\mathbf{x}) : \Omega \to \mathbb{R},
\]
we define the second-order linear differential operator in divergence form as
\[
\mathrm{div}(A \nabla u) = A \Delta u + \nabla A \cdot \nabla u
\]
where the symbol $\cdot$ denotes the scalar product between two vectors.

A.1.3 Green’s formula

The following result holds.

Theorem A.1.1 (Green’s formula). For sufficiently smooth given scalar and
vector functions $v$ and $\mathbf{w}$, we have
\[
\int_{\Omega} v \, \mathrm{div}\,\mathbf{w} \, d\Omega
= - \int_{\Omega} \nabla v \cdot \mathbf{w} \, d\Omega
+ \int_{\partial\Omega} v \, \mathbf{w} \cdot \mathbf{n} \, d\Sigma
\tag{A.1}
\]
where $\mathbf{n}\, d\Sigma$ denotes the oriented differential area element
centered around the point $\mathbf{x}$ over the $(d-1)$-dimensional manifold
$\partial\Omega$.

Identity (A.1) also goes under the name of integration by parts formula, and
plays a crucial role in the weak formulation of BVPs.
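A one-dimensional instance of (A.1) can be verified numerically. In the sketch below, Ω = (0,1), so that the boundary integral reduces to v(1)w(1) - v(0)w(0) (the outward normal is +1 at x = 1 and -1 at x = 0); the functions v(x) = x^2 and w(x) = sin(x) are arbitrary choices made only for illustration.

```matlab
v  = @(x) x.^2;     dv = @(x) 2*x;      % scalar function and its derivative
w  = @(x) sin(x);   dw = @(x) cos(x);   % one-component field and its divergence
% left-hand side of (A.1): integral of v*div(w) over (0,1)
lhs = integral(@(x) v(x).*dw(x), 0, 1);
% right-hand side: minus integral of grad(v)*w, plus the boundary term
rhs = -integral(@(x) dv(x).*w(x), 0, 1) + (v(1)*w(1) - v(0)*w(0));
disp(abs(lhs - rhs));                   % difference at quadrature-error level
```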

A.1.4 Maximum principle for 2nd order elliptic operators

In this section, the differential operator $L$ is defined as in (1.1). We
assume that $c(\mathbf{x}) \ge 0$ for all $\mathbf{x} \in \Omega$.

Definition A.1.2 (Inverse-monotonicity). Let
$w \in C^2(\Omega) \cap C^0(\overline{\Omega})$. Suppose that, whenever the
inequalities
\[
Lw(\mathbf{x}) \ge 0 \ \text{for all } \mathbf{x} \in \Omega
\quad \text{and} \quad
w(\mathbf{x}) \ge 0 \ \text{for all } \mathbf{x} \in \partial\Omega
\tag{A.2a}
\]
are satisfied, we have
\[
w(\mathbf{x}) \ge 0 \ \text{for all } \mathbf{x} \in \Omega.
\tag{A.2b}
\]
In such a case, we say that $L$ is inverse-monotone and condition (A.2) is
referred to as the monotonicity property.

Theorem A.1.3 (Comparison principle). Suppose that there exists a function
$\varphi \in C^2(\Omega) \cap C^0(\overline{\Omega})$ such that
\[
Lu(\mathbf{x}) \le L\varphi(\mathbf{x}) \quad \forall \mathbf{x} \in \Omega,
\qquad
u(\mathbf{x}) \le \varphi(\mathbf{x}) \quad \forall \mathbf{x} \in \partial\Omega.
\]
Then, we have
\[
u(\mathbf{x}) \le \varphi(\mathbf{x}) \quad \forall \mathbf{x} \in \Omega
\]
and we say that $\varphi$ is a barrier function for $u$.

Combining the previous two properties, we obtain the following result, which
is a very useful tool in the approximation process of a BVP associated with
the second-order elliptic operator $L$.

Theorem A.1.4 (Maximum principle). Suppose that $L$ is inverse-monotone and
that the comparison principle holds for a suitable barrier function
$\varphi$. Then, setting
$M_\varphi := \max_{\mathbf{x} \in \overline{\Omega}} \varphi(\mathbf{x})$,
we have
\[
0 \le u(\mathbf{x}) \le M_\varphi \quad \forall \mathbf{x} \in \Omega
\]
and we say that $u$ satisfies a maximum principle (MP).

Remark A.1.5 (Physical interpretation of the maximum principle). The maximum
principle has an important physical significance, because it mathematically
expresses the obvious fact that the dependent variable of the problem, say, a
concentration, a temperature or a mass density, cannot take negative values.

Example A.1.6. The reaction-diffusion and advection-diffusion operators
considered in Chapt. 3 satisfy the maximum principle taking
$\varphi(x) = 1/\sigma$ and $\varphi(x) = x/a$, respectively.

A.2 Tensors
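Let $\mathbf{x} = (x_1, x_2, x_3)^T$ denote the position vector in
$\mathbb{R}^3$. A tensor $\tau = \tau(\mathbf{x})$, with entries
$\tau_{ij} = \tau_{ij}(\mathbf{x})$, is a $3 \times 3$ real-valued matrix
function of the form
\[
\tau =
\begin{bmatrix}
\tau_{11} & \tau_{12} & \tau_{13} \\
\tau_{21} & \tau_{22} & \tau_{23} \\
\tau_{31} & \tau_{32} & \tau_{33}
\end{bmatrix}
=
\begin{bmatrix}
\boldsymbol{\tau}_1^T \\ \boldsymbol{\tau}_2^T \\ \boldsymbol{\tau}_3^T
\end{bmatrix}
\]
where $\boldsymbol{\tau}_i = [\tau_{i1}, \tau_{i2}, \tau_{i3}]^T$,
$i = 1, 2, 3$. The trace of $\tau$ is the scalar quantity defined as the sum
of the diagonal entries of $\tau$, i.e.
\[
\mathrm{Tr}\,\tau = \sum_{i=1}^{3} \tau_{ii}.
\]

A tensor is symmetric if $\tau_{ji} = \tau_{ij}$, $i, j = 1, 2, 3$, while if
$\tau_{ji} = -\tau_{ij}$, we say that $\tau$ is skew-symmetric; the symmetric
part of a given tensor $\tau$ is defined as
\[
\tau_S := \frac{\tau + \tau^T}{2}
\]
while the skew-symmetric part of $\tau$ is
\[
\tau_{SS} := \frac{\tau - \tau^T}{2}.
\]

Clearly, for every tensor $\tau$, we have
\[
\tau = \tau_S + \tau_{SS}. \tag{A.3}
\]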
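The decomposition (A.3) is easily checked numerically. The sketch below uses magic(3) as an arbitrary constant tensor; the variable names are illustrative.

```matlab
tau    = magic(3);            % sample 3x3 tensor (arbitrary choice)
tau_S  = (tau + tau')/2;      % symmetric part
tau_SS = (tau - tau')/2;      % skew-symmetric part
disp(isequal(tau_S, tau_S'));          % symmetry of tau_S
disp(isequal(tau_SS, -tau_SS'));       % skew-symmetry of tau_SS
disp(isequal(tau_S + tau_SS, tau));    % decomposition (A.3)
disp(trace(tau_SS));                   % a skew-symmetric tensor has zero trace
```

Since the diagonal of the skew-symmetric part vanishes, the trace of a tensor coincides with the trace of its symmetric part.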

A.2.1 Operations/operators on tensors

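The product between a tensor $\tau \in \mathbb{R}^{3 \times 3}$ and a vector
$\mathbf{v} \in \mathbb{R}^3$ is a vector $\mathbf{w} \in \mathbb{R}^3$ that
is defined as
\[
w_i = \sum_{j=1}^{3} \tau_{ij} v_j = \boldsymbol{\tau}_i^T \mathbf{v},
\qquad i = 1, 2, 3. \tag{A.4}
\]

The divergence of a tensor $\tau$ is a vector field defined as
\[
\mathrm{div}\,\tau =
\begin{bmatrix}
\dfrac{\partial \tau_{11}}{\partial x_1} + \dfrac{\partial \tau_{12}}{\partial x_2} + \dfrac{\partial \tau_{13}}{\partial x_3} \\[2mm]
\dfrac{\partial \tau_{21}}{\partial x_1} + \dfrac{\partial \tau_{22}}{\partial x_2} + \dfrac{\partial \tau_{23}}{\partial x_3} \\[2mm]
\dfrac{\partial \tau_{31}}{\partial x_1} + \dfrac{\partial \tau_{32}}{\partial x_2} + \dfrac{\partial \tau_{33}}{\partial x_3}
\end{bmatrix}
=
\begin{bmatrix}
\mathrm{div}\,\boldsymbol{\tau}_1 \\ \mathrm{div}\,\boldsymbol{\tau}_2 \\ \mathrm{div}\,\boldsymbol{\tau}_3
\end{bmatrix}.
\tag{A.5}
\]

Let $\mathbf{U} = [u_1, u_2, u_3]^T$ be a given vector field whose components
$u_i = u_i(\mathbf{x})$ are sufficiently smooth scalar functions,
$i = 1, 2, 3$. Then, the gradient of $\mathbf{U}$ is the tensor defined as
\[
\nabla \mathbf{U} =
\begin{bmatrix}
\dfrac{\partial u_1}{\partial x_1} & \dfrac{\partial u_1}{\partial x_2} & \dfrac{\partial u_1}{\partial x_3} \\[2mm]
\dfrac{\partial u_2}{\partial x_1} & \dfrac{\partial u_2}{\partial x_2} & \dfrac{\partial u_2}{\partial x_3} \\[2mm]
\dfrac{\partial u_3}{\partial x_1} & \dfrac{\partial u_3}{\partial x_2} & \dfrac{\partial u_3}{\partial x_3}
\end{bmatrix}
=
\begin{bmatrix}
(\nabla u_1)^T \\ (\nabla u_2)^T \\ (\nabla u_3)^T
\end{bmatrix}.
\tag{A.6}
\]
We notice that
\[
\mathrm{Tr}(\nabla \mathbf{U}) := \sum_{i=1}^{3} (\nabla \mathbf{U})_{ii}
= \sum_{i=1}^{3} \frac{\partial u_i}{\partial x_i} = \mathrm{div}\,\mathbf{U}.
\tag{A.7}
\]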
Appendix B

Linear Algebra

In this appendix, we gather some basic notions of Linear Algebra, that are exten-
sively used throughout the text.

B.1 Vector spaces

We start with some basic definitions.
Definition B.1.1 (Vector space). A vector (or linear) space V with respect to the
field Λ (Λ = R or Λ = C) is a nonempty set containing elements (vectors) for
which the following operations are defined:
• the sum;
• the multiplication by a scalar λ ∈ Λ.
Definition B.1.2 (Linear independence in $V$). A system of vectors
$\{v_1, v_2, \ldots, v_n\} \subset V$ is linearly independent if the following
condition is satisfied
\[
\sum_{i=1}^{n} \lambda_i v_i = 0 \ \Leftrightarrow \
\lambda_1 = \lambda_2 = \ldots = \lambda_n = 0,
\qquad \lambda_i \in \Lambda, \ i = 1, \ldots, n.
\]

Definition B.1.3 (Basis in $V$). Any set of linearly independent vectors
$u_i$, $i = 1, \ldots, n$, defines a basis in $V$ if any given element
$v \in V$ can be written as
\[
v = \sum_{i=1}^{n} v_i u_i.
\]

The quantities $v_i \in \Lambda$ are called the coordinates of $v$ with
respect to the basis $\{u_i\}$.


Theorem B.1.4 (Theorem of dimension). Let $V$ be a vector space for which a
basis of $n$ vectors exists. Then, any system of linearly independent vectors
in $V$ has, at most, $n$ elements, and the same holds for any other basis in
$V$. In such a case the dimension of $V$ is equal to $n$, that is
\[
\dim(V) = n.
\]

If, instead, for every $n$ there always exists a system of $n$ linearly
independent vectors of $V$, then $V$ has infinite dimension, that is
\[
\dim(V) = +\infty.
\]

Example B.1.5. $V = \mathbb{R}^n$. We have $\dim(V) = n$ and the canonical
basis is represented by the unit vectors
\[
\mathbf{e}_i = [0, 0, \ldots, 1, 0, \ldots, 0]^T \in \mathbb{R}^n,
\qquad i = 1, \ldots, n
\]
where the only non-zero component of $\mathbf{e}_i$ occupies the $i$-th
position in $\mathbf{e}_i$, that is, $(\mathbf{e}_i)_j = \delta_{ij}$, where
\[
\delta_{ij} =
\begin{cases}
1 & \text{if } i = j \\
0 & \text{if } i \ne j
\end{cases}
\]
is the Kronecker symbol. On the contrary, $V = C^0([a, b])$, the space of
continuous functions over the interval $[a, b]$, has infinite dimension. To
see this, it suffices to consider the Fourier series expansion of a periodic
function over $[0, 2\pi]$.

B.2 Vector and matrix norms

Given a vector space $V$, it is useful to introduce a mathematical tool that
allows us to measure quantitatively any element of $V$.

Definition B.2.1 (Norm). Let $v$ be an arbitrary element of the vector space
$V$. The “norm” is a real functional $\| \cdot \|_V : V \to \mathbb{R}$
satisfying the following properties $\forall v, w \in V$ and
$\forall \lambda \in \Lambda$:

1. $\|v\|_V \ge 0$ and $\|v\|_V = 0$ iff $v = 0$;

2. $\|\lambda v\|_V = |\lambda| \, \|v\|_V$;

3. $\|v + w\|_V \le \|v\|_V + \|w\|_V$ (triangle inequality).

Figure B.1: Triangle inequality.

Example B.2.2 (Vector norms). Let $V = \mathbb{R}^n$. For $p \in [1, \infty)$,
the Hölder (or $p$) norm of a vector $\mathbf{v} \in V$ is defined as
\[
\|\mathbf{v}\|_p := \left( \sum_{i=1}^{n} |v_i|^p \right)^{1/p}. \tag{B.1}
\]

• $p = 1$: we obtain the 1-norm of a vector;

• $p = 2$: we obtain the euclidean norm of $\mathbf{v}$ (corresponding to
Pythagoras’ theorem in $\mathbb{R}^n$);

• $p \to +\infty$: we obtain the so-called maximum norm of a vector
\[
\|\mathbf{v}\|_\infty = \max_{i \in [1, n]} |v_i|. \tag{B.2}
\]

Matlab coding. The Matlab source code for computing the norm of a vector is
reported below.
>> v = [-10, 2, 1]';
>> norm(v,1)

ans =

    13

>> norm(v,2)

ans =

   10.2470

>> norm(v,'inf')

ans =

    10
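The limit p → +∞ in (B.2) can also be observed numerically. The following sketch evaluates the p-norm of the same vector for increasing values of p (an arbitrary selection) and compares it with the maximum norm; variable names are illustrative.

```matlab
v  = [-10, 2, 1]';
p  = [1 2 4 10 100];                  % increasing exponents (arbitrary selection)
np = zeros(size(p));
for k = 1:numel(p)
    np(k) = norm(v, p(k));            % Hoelder p-norm (B.1)
end
gap = np - norm(v, inf);              % distance from the maximum norm (B.2)
disp([p; np; gap]);
```

The p-norm decreases monotonically with p and approaches the maximum norm from above.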

Example B.2.3 (Matrix norms). Let $V = \mathbb{R}^{m \times n}$ be the space
of real-valued matrices having $m$ rows and $n$ columns. For
$p \in [1, \infty]$, we define the so-called induced $p$-norm of a matrix
$A \in V$ (or natural $p$-norm of a matrix) as
\[
\|A\|_p := \sup_{\mathbf{x} \ne \mathbf{0}} \frac{\|A\mathbf{x}\|_p}{\|\mathbf{x}\|_p},
\qquad \mathbf{x} \in \mathbb{R}^n. \tag{B.3}
\]

• $p = 1$: we obtain the 1-norm of a matrix, corresponding to taking the
maximum absolute column sum of $A$
\[
\|A\|_1 = \max_{j=1,\ldots,n} \sum_{i=1}^{m} |a_{ij}|;
\]

• $p = 2$: we obtain the 2-norm of a matrix (also known as spectral norm)
\[
\|A\|_2 = \sqrt{\rho(A^T A)}
\]
where $A^T$ is the transpose of $A$ and, for any square matrix $M$ of size
$n$,
\[
\rho(M) := \max_{i=1,\ldots,n} |\lambda_i(M)| \tag{B.4}
\]
is the spectral radius of $M$, $\lambda_i$ being the eigenvalues of $M$;

• $p \to +\infty$: we obtain the so-called maximum norm of a matrix,
corresponding to taking the maximum absolute row sum of $A$
\[
\|A\|_\infty = \max_{i=1,\ldots,m} \sum_{j=1}^{n} |a_{ij}|. \tag{B.5}
\]

Proposition B.2.4. Let $A \in \mathbb{R}^{n \times n}$ be a given square
matrix. Then
\[
\rho(A) \le \|A\|_p \qquad \forall p \in [1, \infty].
\]

Matlab coding. The Matlab source code for computing the norm of a matrix is
reported below. Notice that the computed value of the spectral radius of $A$
agrees with Prop. B.2.4.
>> A = rand(3)

A =

0.8147 0.9134 0.2785
0.9058 0.6324 0.5469
0.1270 0.0975 0.9575

>> norm(A,1)

ans =

1.8475

>> norm(A,2)

ans =

1.8168

>> norm(A,'inf')

ans =

2.0850

>> max(abs(eig(A)))

ans =

1.7527

B.3 Matrices
We start with an introduction to the basic nomenclature, definitions and
properties of a matrix.
Let $A \in \mathbb{R}^{m \times n}$ denote a real-valued matrix with $m$ rows
and $n$ columns. Each entry of $A$ is denoted by $a_{ij}$, $i = 1, \ldots, m$,
$j = 1, \ldots, n$. If $m = n$ then $A$ is a square matrix, otherwise it is
rectangular. Unless otherwise stated, we assume henceforth $m = n$.

Definition B.3.1 (Regularity of a matrix). $A$ is invertible (regular) iff
$\exists B \in \mathbb{R}^{n \times n}$ such that
\[
AB = BA = I
\]
where $I = (\delta_{ij})$ is the identity matrix of order $n$. $B$ is called
the inverse of $A$ and, by definition, we set $B = A^{-1}$. If $A$ is not
invertible, then we say that $A$ is singular.

Definition B.3.2 (Rank of a matrix). The rank of $A$ is the maximum number of
linearly independent rows (or columns) of $A$, and is denoted
$\mathrm{rank}(A)$. Matrix $A$ is said to have maximum (or full) rank if
\[
\mathrm{rank}(A) = \min\{m, n\}.
\]


Theorem B.3.3 (Invertibility of a matrix). Matrix $A$ is invertible iff
$\det(A) \ne 0$. In an equivalent manner, $A$ is invertible iff
$\mathrm{rank}(A) = n$, that is, iff $A$ has maximum rank. If $A$ is singular,
then we have $\det(A) = 0$ and, by definition, the $p$-condition number
$K_p(A)$ is equal to $+\infty$.

Nomenclature and matrix properties:

• $A$ is diagonal if $a_{ij} = 0$ for $i \ne j$;

• $A$ is tridiagonal if $a_{ij} = 0$ for $j > i + 1$ or $j < i - 1$;

• $A$ is lower triangular if $a_{ij} = 0$ for $j > i$, upper triangular if
$a_{ij} = 0$ for $i > j$;

• $A$ is symmetric if $a_{ij} = a_{ji}$, $i, j = 1, \ldots, n$, i.e., if
$A = A^T$;

• $A$ is positive definite (p.d.) if the real number
$\mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in \mathbb{R}^n$, is $> 0$ for
every $\mathbf{x} \ne \mathbf{0}$, and it is $= 0$ iff
$\mathbf{x} = \mathbf{0}$.

Proposition B.3.4. Let $A = A^T$. Then, $A$ is (symmetric and) positive
definite (s.p.d.) iff one of the following equivalent properties is satisfied:

• $\mathbf{x}^T A \mathbf{x} > 0$ for all $\mathbf{x} \ne \mathbf{0}$;

• $\lambda_i(A) > 0$, $i = 1, \ldots, n$ (positive eigenvalues);

• $\det(A_i) > 0$, $i = 1, \ldots, n$, where $A_i$ is the submatrix formed by
the first $i$ rows and $i$ columns of $A$ (Sylvester criterion);

• $\exists H \in \mathbb{R}^{n \times n}$ such that $A = H^T H$, with
$\det(H) \ne 0$.

Matlab coding. The Matlab sequence of commands to construct Ai is reported
below.
>> A=magic(3)

A =

8 1 6
3 5 7
4 9 2

>> for i=1:3, A_i=A(1:i,1:i), end

A_i =

8

A_i =

8 1
3 5

A_i =

8 1 6
3 5 7
4 9 2
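The equivalent characterizations of Prop. B.3.4 can be tested on a small example. The sketch below uses the symmetric tridiagonal matrix with 2 on the diagonal and -1 on the off-diagonals (an illustrative choice, not taken from the text) and relies on the built-in Cholesky factorization chol, whose second output is zero when the factorization succeeds.

```matlab
A = [2 -1 0; -1 2 -1; 0 -1 2];      % symmetric test matrix (illustrative choice)
lam = eig(A);                        % eigenvalues: all positive for an s.p.d. matrix
minors = zeros(3,1);
for i = 1:3
    minors(i) = det(A(1:i,1:i));     % leading principal minors (Sylvester criterion)
end
[H, flag] = chol(A);                 % upper triangular H with A = H'*H; flag = 0 on success
disp(lam'); disp(minors'); disp(flag);
disp(norm(H'*H - A));                % residual of the factorization
```

By contrast, the matrix magic(3) used above is not symmetric, so Prop. B.3.4 does not apply to it.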

Remark B.3.5. It can be checked that if $A$ is p.d., then $a_{ii} > 0$,
$i = 1, \ldots, n$. To see this, it suffices to take
$\mathbf{x} = \mathbf{e}_i$, $i = 1, \ldots, n$.

Remark B.3.6. If $A$ is p.d., then it has positive eigenvalues, and thus, as a
consequence, $\det(A) = \prod_{i=1}^{n} \lambda_i(A) > 0$, so that $A$ is
non-singular.

Definition B.3.7 (Diagonally dominant matrices). $A$ is diagonally dominant by
rows if
\[
|a_{ii}| \ge \sum_{j=1, j \ne i}^{n} |a_{ij}| \qquad i = 1, \ldots, n
\]
while it is diagonally dominant by columns if
\[
|a_{ii}| \ge \sum_{j=1, j \ne i}^{n} |a_{ji}| \qquad i = 1, \ldots, n.
\]

Should the $>$ operator hold in the above inequalities in place of $\ge$,
then $A$ is said to be strictly diagonally dominant.

Definition B.3.8 (M-matrix property). A nonsingular matrix $A$ is an M-matrix
if $a_{ij} \le 0$ for $i \ne j$ and if $(A^{-1})_{ij} \ge 0$.

Definition B.3.9. Let $\mathbf{x}$ be a given vector in $\mathbb{R}^n$. The
inequality
\[
\mathbf{x} \ge (\le)\, \mathbf{0}
\]
means that $x_i \ge (\le)\, 0$ for each $i = 1, \ldots, n$. The same holds if
$\mathbf{x}$ is replaced by a matrix $A \in \mathbb{R}^{n \times n}$.

The next result is a direct consequence of Def. B.3.8.

Theorem B.3.10 (Discrete maximum principle). Let $A$ be an M-matrix such that
$A\mathbf{x} \le \mathbf{0}$, $\mathbf{x} \in \mathbb{R}^n$. Then,
$\mathbf{x} \le \mathbf{0}$.

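Proof. Set $\mathbf{y} := A\mathbf{x}$ and assume that
$\mathbf{y} \le \mathbf{0}$. Since $A$ is an M-matrix, it is invertible and we
have
\[
x_i = \sum_{j=1}^{n} (A^{-1})_{ij} \, y_j \le 0 \qquad i = 1, \ldots, n,
\]
because $(A^{-1})_{ij} \ge 0$ and $y_j \le 0$. Thus, we have proved that
$\mathbf{x} \le \mathbf{0}$. We proceed in a similar manner if
$\mathbf{y} \ge \mathbf{0}$.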
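The discrete maximum principle can be illustrated on a small M-matrix. The sketch below uses the tridiagonal matrix with entries (-1, 2, -1), an illustrative choice, checks the two conditions of Def. B.3.8 and then solves a system with a non-positive right-hand side; the vector y is arbitrary.

```matlab
A    = [2 -1 0; -1 2 -1; 0 -1 2];            % candidate M-matrix (illustrative choice)
Ainv = inv(A);                                % explicit inverse, only to inspect its sign
offdiag_ok = all(all(A - diag(diag(A)) <= 0));  % off-diagonal entries are <= 0
inv_ok     = all(Ainv(:) >= 0);                 % entries of the inverse are >= 0
y = [-1; 0; -2];                              % right-hand side with y <= 0
x = A\y;                                      % solution of A*x = y
disp(Ainv); disp(x');
```

Every component of x is non-positive, as predicted by Thm. B.3.10.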
Appendix C

Functional Analysis

In this appendix, we review the basic foundations of Functional Analysis that
are used throughout the text. We introduce the basic concepts of function
spaces, associated topology and norms. We start from the notion of metric
space, and we give the definition of complete metric space and of normed
space. Then, we define the principal category of complete normed spaces, the
so-called Banach spaces. We close this brief review by introducing a notable
member of the family of Banach spaces, the so-called Hilbert function spaces,
and discuss the most relevant examples that will be used in the weak
formulation of the boundary value problems considered in this text. Several
examples are provided to support the theoretical presentation and definitions.

C.1 Metric spaces

We start by giving the following general definition.
Definition C.1.1 (Metric space). $V$ is a metric space if, for every pair
$u, v \in V$, we can define a functional
$\delta(u, v) : V \times V \to \mathbb{R}$, called distance, satisfying the
following properties:
1. $\delta(u, v) \ge 0$ and $\delta(u, v) = 0$ iff $u = v$;
2. $\delta(u, v) = \delta(v, u)$ (symmetry);
3. $\delta(u, v) \le \delta(u, w) + \delta(v, w)$, where $w \in V$ (triangle
inequality).
Definition C.1.1 introduces a metric on $V$, that is, a quantitative way to
“measure” the distance between two functions.


Definition C.1.2 (Ball of radius $\rho$). Given $u \in V$ and
$\rho \in \mathbb{R}^+$, we define the ball $B_\rho(u)$ as the subset of
elements $v \in V$ such that $\delta(u, v) \le \rho$.

(a) Triangle inequality (b) Ball in $V$

Figure C.1: Geometrical representation of a metric in $V$.

Example C.1.3. Let $V = \mathbb{R}$. It is immediate to see that the euclidean
metric
\[
\delta(x, y) := |x - y| \qquad \forall x, y \in V \tag{C.1}
\]
satisfies the three requirements in Def. C.1.1.

Figure C.2: Ball $B_\rho$ in $V = C^0(\Omega)$.

Example C.1.4. Let $V = C^0([a, b])$, where $[a, b] \subset \mathbb{R}$.
Setting
\[
\delta(u, v) := \max_{x \in [a, b]} |u(x) - v(x)| \qquad \forall u, v \in V \tag{C.2}
\]
it can be checked that the three requirements in Def. C.1.1 are satisfied.
Metric (C.2) is called Lagrangian of order zero (cf. (C.12) in the case
$k = 0$), and the ball $B_\rho$ is represented in Fig. C.2.

Having introduced the notion of metric, we can define the limit in $V$.

Definition C.1.5 (Limit in $V$). Let $\{u_n\}$ be a sequence of functions in a
metric space $V$, and $z$ be a given element of $V$. We say that
\[
\lim_{n \to \infty} u_n = z \tag{C.3}
\]
if the following property holds
\[
\lim_{n \to \infty} \delta(u_n, z) = 0. \tag{C.4}
\]

In such a case, we say that $\{u_n\}$ converges to $z$ (or that $\{u_n\}$ is
$\delta$-convergent to $z$).

Remark C.1.6. If $V = C^0([a, b])$, Def. C.1.5 is equivalent to the notion of
uniform convergence.

Proposition C.1.7 (Necessary condition for $\delta$-convergence). Assume that
(C.4) holds (i.e., $\{u_n\}$ is $\delta$-convergent). Then,
\[
\lim_{m, n \to \infty} \delta(u_m, u_n) = 0. \tag{C.5}
\]

Proof. The triangle inequality yields
\[
\delta(u_m, u_n) \le \delta(u_m, z) + \delta(u_n, z)
\]
from which, using (C.4) for both terms on the right-hand side, we get (C.5).

Remark C.1.8. Relation (C.5) is the extension to metric spaces of the
necessary Cauchy condition for the existence of the limit for a sequence of
numbers.

Example C.1.9. Let $V = C^0([0, 1])$ and take $u_n(x) = x e^{-nx}$, $n \ge 1$.
Let also $z(x) = 0$ for all $x \in [0, 1]$. Then, we have
\[
\lim_{n \to \infty} \delta(u_n, z)
= \lim_{n \to \infty} \max_{x \in [0, 1]} |x e^{-nx} - 0|
= \lim_{n \to \infty} \max_{x \in [0, 1]} x e^{-nx} = 0,
\]
so that $u_n$ converges uniformly to zero in $[0, 1]$. Indeed, the maximum of
$x e^{-nx}$ over $[0, 1]$ is attained at $x = 1/n$ and is equal to $1/(ne)$.
Fig. C.3(a) shows a plot of $u_n(x)$ for $n = 1, 2, 3, 5$ and $10$, while
Fig. C.3(b) shows a logarithmic plot of $\delta(u_n, 0)$ as $n \to \infty$.
This latter picture reveals that uniform convergence of $u_n$ to the function
$z(x) = 0$ is very slow, since $\delta(u_n, 0) = 1/(ne)$ decays only like
$1/n$.
(a) $u_n$ (b) Maximum error

Figure C.3: Uniform convergence in $C^0([0, 1])$.

Matlab coding. The Matlab script for generating Fig. C.3(a) is reported below.

x=[0:0.0001:1];
n=[1,2,3,5,10];
figure;
for i=1:numel(n), un=x.*exp(-n(i)*x); plot(x,un); hold on; end
xlabel('x');
legend('n=1','n=2','n=3','n=5','n=10');

Matlab coding. The Matlab script for generating Fig. C.3(b) is reported below.
x=[0:0.0001:1];
n=[1:100];
for i=1:numel(n), un=x.*exp(-n(i)*x); err(i)=norm(un-0,'inf'); end
figure;
semilogy(n,err);
xlabel('n');
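The slow decay of the maximum error in Example C.1.9 can be quantified: setting the derivative of x e^{-nx} to zero gives the maximum point x = 1/n and the maximum value 1/(ne). The sketch below, which reuses the grid of the previous script and otherwise illustrative names, compares the computed distance with this closed-form value.

```matlab
x = 0:0.0001:1;
n = 1:100;
err = zeros(size(n));
for i = 1:numel(n)
    err(i) = max(x.*exp(-n(i)*x));          % discrete approximation of delta(u_n,0)
end
exact   = 1./(n*exp(1));                    % maximum of x*exp(-n*x), attained at x = 1/n
rel_gap = abs(err - exact)./exact;          % small, since the grid nearly hits x = 1/n
disp(max(rel_gap));
disp(err(end));                             % distance at n = 100, approximately 1/(100*e)
```

Since the distance behaves like 1/n, increasing n by a factor of ten gains only one digit of accuracy.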

C.2 Complete metric spaces

We start with the following definition.

Definition C.2.1 (Cauchy sequence). $\{u_n\}$ is a Cauchy sequence if
\[
\lim_{n, m \to \infty} \delta(u_n, u_m) = 0. \tag{C.6}
\]

Prop. C.1.7 immediately implies that

Proposition C.2.2. Every convergent sequence in a metric space $V$ is a Cauchy
sequence.

The following example shows that the condition in Prop. C.1.7 (equivalently,
relation (C.6)) is not, in general, a sufficient condition for $\{u_n\}$ to
satisfy (C.4) (i.e. for $\{u_n\}$ to be $\delta$-convergent).

Example C.2.3 (The number $e$). Let $\mathbb{Q}$ denote the set of rational
numbers, and let $u_n := (1 + 1/n)^n$ for $n \ge 1$. Basic Calculus tells us
that the numerical sequence $u_n$ converges, as $n \to \infty$, to Napier’s
number $e \simeq 2.718281828459$, i.e.
\[
\lim_{n \to \infty} |u_n - e| = 0.
\]

Thus, $u_n$ converges to the finite limit $e$ with respect to the euclidean
metric, but the limit is not a rational number (i.e., it cannot be written in
the form $p/q$, $p$ and $q$ being integers with $q \ne 0$), and therefore it
does not belong to the space $\mathbb{Q}$.

To remedy a problem like that occurring in Ex. C.2.3, the space $V$ must
satisfy the property of completeness.

Definition C.2.4 (Completeness). $V$ is a complete metric space if
\[
\lim_{n, m \to \infty} \delta(u_n, u_m) = 0
\ \Rightarrow \
\exists z \in V \text{ such that } \lim_{n \to \infty} u_n = z. \tag{C.7}
\]

Thus, in a complete metric space, the condition in Prop. C.1.7 is also
sufficient for $\{u_n\}$ to be $\delta$-convergent.

Example C.2.5 ($\mathbb{R}$ as the completion of $\mathbb{Q}$). Taking
$V = \mathbb{R}$, instead of $\mathbb{Q}$, in Ex. C.2.3, we conclude that the
numerical sequence $u_n = (1 + 1/n)^n$, $n \ge 1$, is $\delta$-convergent to
$e$ with respect to the euclidean metric (C.1). This shows that the real field
is the completion of the set of rational numbers with respect to the
metric (C.1).

Proposition C.2.6. The space $V = C^0([0, 1])$ is complete with respect to the
Lagrangian metric (C.2).

A natural question arises about the consequence of a change of metric on the
topological properties of a function space. In the case of $V = C^0([a, b])$,
we can introduce the following novel metric (called integral metric)
\[
\delta(u, v) := \int_a^b |u(x) - v(x)| \, dx \equiv \delta_1(u, v). \tag{C.8}
\]

Remark C.2.7. The space $V = C^0([a, b])$ is not complete with respect to
$\delta_1(\cdot, \cdot)$. In other words, there exist sequences of continuous
functions $\{u_n(x)\}$ for which we have
\[
\lim_{m, n \to \infty} \delta_1(u_m, u_n)
= \lim_{m, n \to \infty} \int_a^b |u_m(x) - u_n(x)| \, dx = 0
\]
even if there is no continuous function $z = z(x)$ such that
\[
\lim_{n \to \infty} \delta_1(u_n, z) = 0.
\]

Example C.2.8. Let $[a, b] = [0, 1]$, and
\[
u_n(x) =
\begin{cases}
\dfrac{1}{\sqrt{x}} & \dfrac{1}{n} \le x \le 1 \\[2mm]
\sqrt{n} & 0 \le x \le \dfrac{1}{n}.
\end{cases}
\]

Figure C.4: Plot of $u_n(x)$ in the case $n = 3$.

Each function of the sequence is continuous, and we have
\[
\lim_{n \to \infty} \delta_1\!\left(u_n, \frac{1}{\sqrt{x}}\right)
= \lim_{n \to \infty} \int_0^{1/n} \left( \frac{1}{\sqrt{x}} - \sqrt{n} \right) dx
= \lim_{n \to \infty} \left( \left[ 2\sqrt{x} \right]_0^{1/n} - \sqrt{n}\,\frac{1}{n} \right)
= \lim_{n \to \infty} \frac{1}{\sqrt{n}} = 0.
\]

This shows that the sequence $u_n$ converges, in the metric (C.8), to the
function $z(x) = 1/\sqrt{x}$, so that, by the triangle inequality used in the
proof of Prop. C.1.7, $\{u_n\}$ is a Cauchy sequence (Def. C.2.1) with respect
to the integral metric. However, the limit function $z$ does not belong to
$C^0([0, 1])$, so that $C^0([0, 1])$ is not complete with respect to the
integral metric.

Matlab coding. The Matlab script for generating Fig. C.4 is reported below.
x=[0:0.0001:1];
figure;
n=3;
% build un piecewise: evaluating 1./sqrt(x) at x=0 would give Inf*0 = NaN
un=sqrt(n)*ones(size(x));
idx=(x>1/n);
un(idx)=1./sqrt(x(idx));
plot(x,un);
xlabel('x');
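The distance between u_n and the limit function of Example C.2.8 can also be evaluated by numerical quadrature and compared with the closed-form value 1/sqrt(n). The sketch below assumes that the built-in function integral copes with the integrable singularity of 1/sqrt(x) at x = 0 (the integrand is never evaluated at the endpoint); the selected values of n are arbitrary.

```matlab
n  = [1 4 16 100];                     % sample indices of the sequence (arbitrary)
d1 = zeros(size(n));
for i = 1:numel(n)
    % u_n and z = 1/sqrt(x) differ only on (0,1/n), where z >= u_n = sqrt(n)
    d1(i) = integral(@(x) 1./sqrt(x) - sqrt(n(i)), 0, 1/n(i));
end
disp([d1; 1./sqrt(n)]);                % computed distance vs. closed-form value
```

The computed distances decrease like 1/sqrt(n), in agreement with the limit evaluated in the example.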

In the previous example, the function $z$ is not defined at $x = 0$. More
generally, to treat the case where a function is not defined at a finite
number of points, the following definition allows us to extend the notion of
continuity and the corresponding metric.

Definition C.2.9. Let $v : [a, b] \to \mathbb{R}$ be a measurable function.
The space of summable functions over the closed interval $[a, b]$ is defined
as
\[
L^1([a, b]) := \left\{ v : [a, b] \to \mathbb{R} \ \text{such that} \
\int_a^b |v(x)| \, dx < +\infty \right\}. \tag{C.9}
\]

$L^1([a, b])$ is also called the space of Lebesgue integrable functions over
the closed interval $[a, b]$.
$L^1([a, b])$ does not contain only continuous functions.

Example C.2.10. Let $v(x) = 1/\sqrt{x}$. We have $v \notin C^0([0, 1])$
because $\lim_{x \to 0^+} x^{-1/2} = +\infty$, but $v \in L^1([0, 1])$. As a
matter of fact
\[
\int_0^1 x^{-1/2} \, dx = \left[ 2 x^{1/2} \right]_0^1 = 2.
\]

Theorem C.2.11 (Fischer-Riesz theorem). $L^1([a, b])$ is complete with respect
to the integral metric (C.8).

Obviously, we have $C^0([a, b]) \subset L^1([a, b])$, exactly as
$\mathbb{Q} \subset \mathbb{R}$. Moreover, the integral metric $\delta_1$ is,
in general, less fine than the Lagrangian metric (C.2) in the space
$C^0([a, b])$, as shown in the next example.
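Example C.2.12. Let $u_n(x) = x^n$ and $z(x) = 0$, $x \in [0, 1]$, $n \ge 1$.
Then
\[
\lim_{n \to \infty} \delta_1(u_n, z)
= \lim_{n \to \infty} \int_0^1 |x^n - 0| \, dx
= \lim_{n \to \infty} \frac{1}{n + 1} = 0,
\]
while
\[
\max_{x \in [0, 1]} |x^n - 0| = 1 \qquad \forall n \ge 1.
\]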
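The different behaviour of the two metrics in Example C.2.12 can be reproduced with a short script. This is a sketch with arbitrary sample values of n; the integral metric is computed with the built-in function integral and the Lagrangian metric is approximated by the maximum over a grid containing the endpoint x = 1.

```matlab
x = 0:0.001:1;                          % grid containing the endpoint x = 1
n = [1 5 20 100];                       % sample exponents (arbitrary)
d1   = zeros(size(n));
dinf = zeros(size(n));
for i = 1:numel(n)
    d1(i)   = integral(@(t) t.^n(i), 0, 1);   % integral metric (C.8): 1/(n+1)
    dinf(i) = max(abs(x.^n(i)));              % Lagrangian metric (C.2): always 1
end
disp([n; d1; dinf]);
```

The sequence converges to zero in the integral metric, whereas its distance from zero in the Lagrangian metric remains equal to one.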

C.3 Normed spaces

Def. B.2.1 extends to abstract function spaces the classical notion of the
euclidean norm of a vector in $\mathbb{R}^d$.

Definition C.3.1 (Normed vector space). A real vector space $V$ on which a
norm $\| \cdot \|_V$ is introduced according to Def. B.2.1 can be made a
metric space by setting
\[
\delta(u, v) := \|u - v\|_V \qquad \forall u, v \in V.
\]

Again, the above definition is a consistent extension of the way of measuring
the distance between two vectors in $\mathbb{R}^d$.

Definition C.3.2 (Equivalence of norms). Let $\| \cdot \|$ and
$||| \cdot |||$ be two norms on $V$. We say that $\| \cdot \|$ and
$||| \cdot |||$ are equivalent if there exist two positive constants $K_1$
and $K_2$, with $K_1 \le K_2$, such that
\[
K_1 \, |||v||| \le \|v\| \le K_2 \, |||v||| \qquad \forall v \in V. \tag{C.10}
\]

C.4 Banach spaces

Gathering the properties of a vector space of being both normed and complete
gives us the important category of Banach spaces.

Definition C.4.1 (Banach space). $V$ is called a Banach space if for every
sequence $\{u_n\} \subset V$ we have
\[
\lim_{m, n \to \infty} \|u_m - u_n\|_V = 0
\ \Leftrightarrow \
\exists z \in V \text{ such that } \lim_{n \to \infty} \|u_n - z\|_V = 0.
\]

We notice that the above definition is nothing but the usual notion of
completeness of a metric space in the special case where the distance
$\delta(u, v)$ is taken as the norm of $u - v$.

Before proceeding, we introduce some notation that is useful in the
presentation of function spaces suitable for the analysis of partial
differential equations. Let $\Omega$ be an open bounded set of $\mathbb{R}^d$,
$d \ge 1$, and $\mathbf{x} = (x_1, \ldots, x_d)^T$ be the position vector in
$\Omega$. To denote in a compact manner partial differentiation with respect
to the $i$-th coordinate $x_i$, we introduce the non-negative multi-index
$\alpha := (\alpha_1, \alpha_2, \ldots, \alpha_d)^T$ such that, for any
sufficiently smooth function $v : \Omega \to \mathbb{R}$, we set
\[
D^\alpha v := \frac{\partial^{|\alpha|} v}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}
\]
where $|\alpha| := \alpha_1 + \alpha_2 + \ldots + \alpha_d$ is the length of
the vector $\alpha$. For example, let $d = 2$ and $|\alpha| = 2$. Then, we
have the three possible cases: $\alpha = [2, 0]^T$, $\alpha = [1, 1]^T$ and
$\alpha = [0, 2]^T$, with which we can associate the following derivatives
\[
D^\alpha u \in \left\{ \frac{\partial^2 u}{\partial x_1^2}, \
\frac{\partial^2 u}{\partial x_1 \partial x_2}, \
\frac{\partial^2 u}{\partial x_2^2} \right\}.
\]
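Example C.4.2 (The space of continuously differentiable functions of order k). For $k \geq 0$, let us denote by
\[
C^k(\overline{\Omega}) = \left\{ v : \Omega \to \mathbb{R} \, : \, D^\alpha v \in C^0(\overline{\Omega}) \text{ for each } \alpha \text{ such that } 0 \leq |\alpha| \leq k \right\} \tag{C.11}
\]
the space of functions having all derivatives of order $\leq k$ continuous in $\Omega$ up to the boundary $\partial\Omega$. Then, $V := C^k(\overline{\Omega})$ is a Banach space with respect to the norm
\[
\|v\|_V := \sum_{j=0}^{k} \sup_{\Omega} \left( \sup_{|\alpha| = j} |D^\alpha v| \right). \tag{C.12}
\]
For example, in the case $d = 2$ and $k = 2$, we have
\[
\|u\|_{C^2(\overline{\Omega})} = \sup_{\Omega} |u| + \sup_{\Omega} \left( \sup \left\{ \left|\frac{\partial u}{\partial x_1}\right|, \left|\frac{\partial u}{\partial x_2}\right| \right\} \right) + \sup_{\Omega} \left( \sup \left\{ \left|\frac{\partial^2 u}{\partial x_1^2}\right|, \left|\frac{\partial^2 u}{\partial x_1 \partial x_2}\right|, \left|\frac{\partial^2 u}{\partial x_2^2}\right| \right\} \right).
\]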

Example C.4.3 (The space of p-summable functions). For $p \in [1, \infty)$, let $V = L^p(\Omega)$ denote the set of measurable functions $u : \Omega \to \mathbb{R}$ of summable power $p$, that is, such that
\[
\int_\Omega |u(x)|^p \, d\Omega < +\infty.
\]
Let
\[
\|u\|_{L^p} := \left( \int_\Omega |u(x)|^p \, d\Omega \right)^{1/p}, \tag{C.13}
\]
where $d\Omega := dx_1 \cdots dx_d$ is the infinitesimal volume element centered at point $x \in \Omega$. Then, $L^p$ is a Banach space with respect to the metric
\[
\delta(u,v) := \|u - v\|_{L^p} = \left( \int_\Omega |u(x) - v(x)|^p \, d\Omega \right)^{1/p}.
\]

The special case $p = \infty$ deserves a separate treatment. A function $u : \Omega \to \mathbb{R}$ is said to be essentially bounded on $\Omega$ if there exists a set $\Omega_0$ having zero measure such that the restriction of $u$ to $\Omega \setminus \Omega_0$ is bounded. The norm on the space $L^\infty(\Omega)$ is defined as follows. Let $u$ be essentially bounded on $\Omega$, and let $[\Omega_0]$ be the family of sets $\Omega_0$ having zero measure such that the restriction of $u$ to $\Omega \setminus \Omega_0$ is bounded. Setting
\[
\mu_{\Omega_0} := \sup_{x \in \Omega \setminus \Omega_0} |u(x)|,
\]
we have that $\mu_{\Omega_0}$ is a real-valued functional over $[\Omega_0]$. Then,
\[
\|u\|_{L^\infty} := \inf_{\Omega_0 \in [\Omega_0]} \mu_{\Omega_0} \equiv \operatorname{ess\,sup}_{x \in \Omega} |u(x)|. \tag{C.14}
\]

Again, $L^\infty$ is a Banach space with respect to the metric
\[
\delta(u,v) := \|u - v\|_{L^\infty} = \operatorname{ess\,sup}_{x \in \Omega} |u(x) - v(x)|.
\]

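Theorem C.4.4 (Hölder inequality). Let $f \in L^p(\Omega)$, $g \in L^q(\Omega)$, with $p \in [1, \infty]$ and $q := p/(p-1)$ (conjugate exponent of $p$, with the convention $q = \infty$ if $p = 1$ and $q = 1$ if $p = \infty$). Then, the following Hölder inequality holds
\[
\left| \int_\Omega f g \, d\Omega \right| \leq \int_\Omega |f| \, |g| \, d\Omega \leq \|f\|_{L^p} \|g\|_{L^q}. \tag{C.15}
\]
If $p = q = 2$, (C.15) becomes
\[
\left| \int_\Omega f g \, d\Omega \right| \leq \int_\Omega |f| \, |g| \, d\Omega \leq \|f\|_{L^2} \|g\|_{L^2} \qquad \forall f, g \in L^2(\Omega). \tag{C.16}
\]

Relation (C.16) is well-known as the Cauchy-Schwarz (CS) inequality.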

C.5 Hilbert spaces

Hilbert spaces are certainly the most notable members of the wider family of Banach spaces. This prominence is due to the fact that the definition of norm in a Hilbert space is induced by the notion of scalar product. This allows us to extend in a natural manner familiar concepts of vector analysis in $\mathbb{R}^d$, in particular, orthogonality among elements of a space and the definition of energy of a function. These structural properties make Hilbert spaces the right candidates for being the ambient function spaces for problems arising from physical applications.
Definition C.5.1 (Scalar product). A real vector space $V$ is called pre-hilbertian if a functional $(\cdot,\cdot)_V : V \times V \to \mathbb{R}$, called scalar product, is defined in such a way as to satisfy the following properties:
1. $(u,u)_V \geq 0$ and $(u,u)_V = 0$ iff $u = 0$;
2. $(u,v)_V = (v,u)_V$, for all $u, v \in V$;
3. $(\lambda u + \mu v, z)_V = \lambda (u,z)_V + \mu (v,z)_V$, for all $u, v, z \in V$ and $\lambda, \mu \in \mathbb{R}$.
We see that the quantity $\sqrt{(u,u)_V}$ satisfies all the properties required by Def. B.2.1, so that it is admissible to set by definition
\[
\|u\|_V := \sqrt{(u,u)_V} \qquad \forall u \in V. \tag{C.17}
\]
Thus, a pre-hilbertian function space $V$ is made a normed space through (C.17), showing that the norm is directly induced by the introduction of a scalar product in $V$. The next step is to ensure that the pre-hilbertian space is also complete with respect to the norm (C.17). Should this happen, then $V$ is called a Hilbert space.
Example C.5.2. Let $V = L^2(\Omega)$. In this case, $V$ is a Hilbert space with respect to the scalar product
\[
(u,v)_{L^2} := \int_\Omega u \, v \, d\Omega. \tag{C.18}
\]
It is important to notice that $C^0(\Omega)$ is pre-hilbertian with respect to (C.18), but not complete, as the following example shows.
Example C.5.3. Let $\Omega = [-1,1]$ and
\[
u_n(x) = \begin{cases} 0 & x \leq 0 \\ nx & 0 < x \leq 1/n \\ 1 & x > 1/n. \end{cases}
\]
Clearly, $u_n \in C^0([-1,1])$ for all $n \geq 1$. Moreover, with $m < n$ we have
\[
\|u_m - u_n\|^2_{L^2} = \int_0^{1/n} (mx - nx)^2 \, dx + \int_{1/n}^{1/m} (1 - mx)^2 \, dx
= \frac{1}{3m} + \frac{1}{3n}\left( \frac{m}{n} - 2 \right) < \frac{1}{3m} - \frac{1}{3n} < \frac{1}{3m}
\]

(a) u3 (b) Limit function as n → ∞

Figure C.5: The space C0 ([−1, 1]) is not complete with respect to the metrics of
L2 .

so that
\[
\lim_{m,n\to\infty} \|u_m - u_n\|^2_{L^2} = 0.
\]
This shows that $\{u_n\}$ is a Cauchy sequence with respect to the metric associated with (C.17). Let
\[
H(x) := \begin{cases} 0 & -1 \leq x < 0 \\ 1 & 0 < x \leq 1 \end{cases}
\]
denote the so-called Heaviside function. The function $H$ has a jump discontinuity at $x = 0$ and for this reason it is also known as step function. We can check that for $n \to \infty$, the sequence $u_n$ converges to $H(x)$ for almost all $x \in [-1,1]$. As a matter of fact, we have
\[
\lim_{n\to\infty} \|u_n - H(x)\|^2_{L^2} = \lim_{n\to\infty} \int_0^{1/n} (nx - 1)^2 \, dx = \lim_{n\to\infty} \left[ \frac{(nx-1)^3}{3n} \right]_{x=0}^{x=1/n} = \lim_{n\to\infty} \frac{1}{3n} = 0.
\]

Assume now that there exists a function $v \in C^0([-1,1])$ such that $\lim_{n\to\infty} \|u_n - v\|_{L^2} = 0$. Then, applying the triangle inequality we get
\[
\|v - H(x)\|_{L^2} \leq \|v - u_n\|_{L^2} + \|u_n - H(x)\|_{L^2}
\]
from which, passing to the limit,
\[
\|v - H(x)\|_{L^2} \leq \lim_{n\to\infty} \|v - u_n\|_{L^2} + \lim_{n\to\infty} \|u_n - H(x)\|_{L^2} = 0 + 0 = 0.
\]
This implies $v(x) = H(x)$ for almost every $x$, which is impossible for a continuous function because $H$ has a jump at $x = 0$; this is in contradiction with the starting assumption $v \in C^0([-1,1])$. We have thus proved that $u_n$ is a Cauchy sequence in the space $C^0([-1,1])$, distances being measured with respect to the metric of $L^2$, even if there is no continuous function $z = z(x)$ such that
\[
\lim_{n\to\infty} \|u_n - z\|_{L^2} = 0.
\]

The limit function is the discontinuous function $H(x)$. This shows that the space $C^0([-1,1])$ is not complete with respect to the metric of $L^2$.
Matlab coding. The Matlab script for generating Figs. C.5(a) and C.5(b) is reported below.
% grid on [-1,1]
x = -1:0.0001:1;
n = 3;
% piecewise definition of u_n through logical masks
un = 0.*(x<=0) + n*x.*((x>0) & (x<=1/n)) + 1.*(x>1/n);
figure, plot(x,un);
xlabel('x');
% limit function (Heaviside step)
ulimit = 0.*(x<=0) + 1.*(x>0);
figure, plot(x,ulimit);
xlabel('x');
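Matlab coding. As a complement, the value $\|u_n - H\|^2_{L^2} = 1/(3n)$ found in Example C.5.3 can be checked numerically. The following script is a sketch under stated assumptions: the grid and the trapezoidal quadrature (trapz) are illustrative choices, and the jump of H at x = 0 limits the accuracy to about one grid spacing.

```matlab
% Numerical check of ||u_n - H||^2 in L^2(-1,1) against the exact value 1/(3n).
x = linspace(-1, 1, 200001);         % grid spacing 1e-5 (illustrative choice)
H = double(x > 0);                   % Heaviside step sampled on the grid
n = [1 10 100];
err2 = zeros(size(n));
for k = 1:numel(n)
    un      = n(k)*x.*((x>0) & (x<=1/n(k))) + 1.*(x>1/n(k));
    err2(k) = trapz(x, (un - H).^2); % squared L^2 distance
end
% rows: n, computed squared distance, exact value 1/(3n)
disp([n; err2; 1./(3*n)]);
```

The squared distance tends to zero as n grows, although no continuous function can be the limit.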

C.6 The Sobolev space of order m in one spatial dimension

Let $\Omega := (a,b)$ be an open bounded interval. For any given nonnegative integer $m$, we let
\[
H^m(\Omega) = \left\{ v : \Omega \to \mathbb{R} \, : \, \frac{\partial^\alpha v}{\partial x^\alpha} \in L^2(\Omega), \ \alpha = 0, 1, \ldots, m \right\} \tag{C.19}
\]
be the space of functions whose derivatives up to the m-th order are square integrable in the Lebesgue sense over $\Omega$. The space $L^2(\Omega)$ corresponds to the special case $m = 0$. The space $H^m$ is known as the Sobolev space of order $m$, and is a Hilbert space upon the following definition of the scalar product
\[
(u,v)_{H^m(\Omega)} = \sum_{\alpha=0}^{m} \int_a^b \frac{\partial^\alpha u}{\partial x^\alpha} \frac{\partial^\alpha v}{\partial x^\alpha} \, dx. \tag{C.20}
\]
The natural norm on $H^m$ is then
\[
\|u\|_{H^m(\Omega)} = \sqrt{(u,u)_{H^m(\Omega)}} = \sqrt{ \sum_{\alpha=0}^{m} \left\| \frac{\partial^\alpha u}{\partial x^\alpha} \right\|^2_{L^2(\Omega)} }. \tag{C.21}
\]

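Theorem C.6.1 (Sobolev imbedding in 1D). Let $\Omega = (a,b) \subset \mathbb{R}$. Then
\[
H^m(\Omega) \subset C^{m-1}(\overline{\Omega}) \qquad \forall m \geq 1. \tag{C.22}
\]
An important member of the family of spaces (C.19) is that corresponding to $m = 1$. In this case, we have
\[
(u,v)_{H^1(\Omega)} = \int_a^b u v \, dx + \int_a^b \frac{\partial u}{\partial x} \frac{\partial v}{\partial x} \, dx, \tag{C.23}
\]
and
\[
\|u\|_{H^1(\Omega)} = \sqrt{ \|u\|^2_{L^2(\Omega)} + \left\| \frac{\partial u}{\partial x} \right\|^2_{L^2(\Omega)} }. \tag{C.24}
\]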
Remark C.6.2. Sobolev embedding (C.22) tells us that functions in H 1 (a, b) are
continuous over [a, b]. This property does not hold, in general, when Ω ⊂ Rd ,
d ≥ 2.
A useful subspace of $H^1$ that will be used many times in the analysis of boundary value problems is that consisting of all functions belonging to $H^1$ and vanishing on the boundary $\partial\Omega = \{a, b\}$
\[
H^1_0(\Omega) = \left\{ v \in H^1(\Omega) \, : \, v(a) = v(b) = 0 \right\}. \tag{C.25}
\]
The following important property holds.
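Theorem C.6.3 (Poincaré inequality). Given the bounded interval $\Omega = (a,b)$, there exists a constant $C_P > 0$ depending only on the size of $\Omega$ such that
\[
\|v\|_{L^2(\Omega)} \leq C_P \left\| \frac{\partial v}{\partial x} \right\|_{L^2(\Omega)} \qquad \forall v \in H^1_0(\Omega). \tag{C.26}
\]
Proof. Let $v$ denote any function in $H^1_0(\Omega)$. Then, noting that $v(a) = 0$ and using the Cauchy-Schwarz inequality (C.16), we have $\forall x \in [a,b]$
\[
|v(x)| = |v(x) - v(a)| = \left| \int_a^x \frac{\partial v}{\partial t} \, dt \right| \leq \int_a^b |1| \left| \frac{\partial v}{\partial t} \right| dt
\leq \left\| \frac{\partial v}{\partial x} \right\|_{L^2(\Omega)} \|1\|_{L^2(\Omega)} = \sqrt{b-a} \, \left\| \frac{\partial v}{\partial x} \right\|_{L^2(\Omega)},
\]
from which, squaring both sides of the inequality and integrating over $(a,b)$, we get
\[
\int_a^b |v(x)|^2 \, dx \leq (b-a)^2 \left\| \frac{\partial v}{\partial x} \right\|^2_{L^2(\Omega)},
\]
from which we finally obtain (C.26) with $C_P = b - a$.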
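Matlab coding. The Poincaré inequality (C.26) with $C_P = b - a$ can be checked numerically on a few functions of $H^1_0(a,b)$. The script below is only a sketch: the interval, the test functions $v_k(x) = \sin(k\pi(x-a)/(b-a))$ and the quadrature by trapz are assumptions made for illustration.

```matlab
% Check ||v||_{L^2} <= (b-a) ||v'||_{L^2} on sine functions vanishing at a and b.
a = 0;  b = 3;                       % illustrative interval
x = linspace(a, b, 200001);
CP = b - a;                          % constant obtained in the proof
ratio = zeros(1, 4);
for k = 1:4
    v  = sin(k*pi*(x - a)/(b - a));                  % v(a) = v(b) = 0
    dv = (k*pi/(b - a))*cos(k*pi*(x - a)/(b - a));   % exact derivative
    ratio(k) = sqrt(trapz(x, v.^2)) / sqrt(trapz(x, dv.^2));
end
% for these functions the exact ratio is (b-a)/(k*pi), well below CP
disp([ratio; (b - a)./((1:4)*pi)]);
```

For these test functions the ratio equals $(b-a)/(k\pi)$, which suggests that $C_P = b - a$ is a valid but not sharp constant.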
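Remark C.6.4 (Equivalent norm in $H^1_0$). Using (C.26) in (C.24) for each $v \in H^1_0(\Omega)$, we have
\[
\|v\|^2_{H^1(\Omega)} \leq (1 + C_P^2) \left\| \frac{\partial v}{\partial x} \right\|^2_{L^2(\Omega)}.
\]
On the other hand, from definition (C.24), we trivially have
\[
\|v\|^2_{H^1(\Omega)} \geq \left\| \frac{\partial v}{\partial x} \right\|^2_{L^2(\Omega)},
\]
so that we can conclude that
\[
\left\| \frac{\partial v}{\partial x} \right\|^2_{L^2(\Omega)} \leq \|v\|^2_{H^1(\Omega)} \leq (1 + C_P^2) \left\| \frac{\partial v}{\partial x} \right\|^2_{L^2(\Omega)} \qquad \forall v \in H^1_0(\Omega). \tag{C.27}
\]
Applying Def. C.3.2 to (C.27), we see that
\[
\|u\|_{H^1_0(\Omega)} = \left\| \frac{\partial u}{\partial x} \right\|_{L^2(\Omega)} \qquad u \in H^1_0(\Omega) \tag{C.28}
\]
is an equivalent norm on $H^1_0(\Omega)$, with $K_1 = 1$ and $K_2 = (1 + C_P^2)^{1/2}$.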

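Example C.6.5 (Completion of $C^2 \cap C^0_0$ into $H^1_0$). Let
\[
C^0_0([-1,1]) := \left\{ v \in C^0([-1,1]) \text{ such that } v(-1) = v(1) = 0 \right\}
\]
and introduce the space
\[
W = \left\{ v \in C^2([-1,1]) \text{ such that } v(-1) = v(1) = 0 \right\} = C^2([-1,1]) \cap C^0_0([-1,1]).
\]
For $n \geq 1$, let us consider the following sequence of functions in $W$
\[
u_n(x) = \begin{cases} 1 - \dfrac{1}{n} + \dfrac{2}{n\pi} \cos\left( \dfrac{\pi n x}{2} \right) & |x| < \dfrac{1}{n} \\[2mm] 1 - |x| & \dfrac{1}{n} \leq |x| \leq 1 \end{cases}
\]
whose first and second derivatives are given by
\[
u_n'(x) = \begin{cases} 1 & -1 \leq x \leq -\dfrac{1}{n} \\[2mm] -\sin\left( \dfrac{\pi n x}{2} \right) & |x| < \dfrac{1}{n} \\[2mm] -1 & \dfrac{1}{n} \leq x \leq 1 \end{cases}
\]
and
\[
u_n''(x) = \begin{cases} 0 & -1 \leq x \leq -\dfrac{1}{n} \\[2mm] -\dfrac{\pi n}{2} \cos\left( \dfrac{\pi n x}{2} \right) & |x| < \dfrac{1}{n} \\[2mm] 0 & \dfrac{1}{n} \leq x \leq 1. \end{cases}
\]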
A plot of $u_n$, $u_n'$ and $u_n''$ for $n = 1$, $n = 6$ and $n = 20$, is shown in Fig. C.6.

(a) $u_n$ (b) $u_n'$

Figure C.6: A sequence of functions belonging to $C^2 \cap C^0_0$ whose limit is a function in $H^1_0$.

Proceeding as in Ex. C.5.3, we can check that
\[
\lim_{m,n\to\infty} \|u_m - u_n\|_{H^1_0(-1,1)} = 0.
\]
This shows that $\{u_n\}$ is a Cauchy sequence with respect to the metric associated with (C.28). We can also see that for $n \to \infty$, the sequence $u_n$ converges to $z(x) := 1 - |x| \in C^0_0([-1,1])$, i.e.
\[
\lim_{n\to\infty} \|u_n - z(x)\|_{H^1_0(-1,1)} = 0.
\]
Then, proceeding by contradiction, we show that there is no function $v \in W$ such that
\[
\lim_{n\to\infty} \|u_n - v\|_{H^1_0(-1,1)} = 0.
\]
The only possible limit is the continuous function $z = 1 - |x|$, which does not belong to $W$ because it is not differentiable at $x = 0$. This shows that the space $C^2([-1,1]) \cap C^0_0([-1,1])$ is not complete with respect to the metric of $H^1_0(-1,1)$. The limit function $z$ belongs to $H^1_0(-1,1) \equiv V$ because $z \in L^2(-1,1)$, $z(-1) = z(1) = 0$ and the derivative $z'$ is the Heaviside-like step function
\[
z'(x) = \begin{cases} 1 & -1 \leq x < 0 \\ -1 & 0 < x \leq 1, \end{cases}
\]
which belongs to $L^2(-1,1)$.

Taking a closer look, we see that every member $u_n$ of the sequence (which belongs to $W$) actually belongs to $V$. However, $V$ contains functions that do not belong to $W$, as is the case for the limit $z$. This fact is graphically documented in Fig. C.7.

Figure C.7: The process of completion of the space W = C2 ([−1, 1])∩C00 ([−1, 1]).
Arrows pointing from W to V = H01 (−1, 1) represent the limit for n → ∞.

Matlab coding. The Matlab script for generating Fig. C.6 is reported below.
x = -1:0.0001:1;
n = [1,6,20];
% one row per value of n
un_0 = zeros(numel(n),numel(x));   % u_n
un_1 = zeros(numel(n),numel(x));   % first derivative of u_n
un_2 = zeros(numel(n),numel(x));   % second derivative of u_n
for i=1:numel(n)
    un_0(i,:) = (1-1/n(i)+2/(n(i)*pi).*cos(pi*n(i)*x/2)).*(abs(x)<1/n(i))+...
        (1-abs(x)).*((abs(x)>=1/n(i))&(abs(x)<=1));
    un_1(i,:) = 1.*((x>=-1)&(x<=-1/n(i)))+(-sin(n(i)*pi*x/2)).*...
        (abs(x)<1/n(i))+(-1).*((x>=1/n(i))&(x<=1));
    un_2(i,:) = 0.*((x>=-1)&(x<=-1/n(i)))+ 0.*((x>=1/n(i))&(x<=1)) ...
        -pi/2*n(i)*cos(pi*n(i)*x/2).*(abs(x)<1/n(i));
end
figure, plot(x,un_0); axis('square')
xlabel('x');
legend('n=1','n=6','n=20');
figure, plot(x,un_1); axis('square')
xlabel('x');
legend('n=1','n=6','n=20');
figure, plot(x,un_2); axis('square')
xlabel('x');
legend('n=1','n=6','n=20');

C.7 The Sobolev space of order m in Rd , d ≥ 2

In the study of elliptic boundary value problems, it is important to consider the more general (and realistic) situation where the domain $\Omega$ is a bounded open set of $\mathbb{R}^d$, $d \geq 2$, whose boundary $\partial\Omega$ is Lipschitz continuous. This latter property ensures that a unit outward normal vector $n = n(x)$ is defined for almost every $x \in \partial\Omega$.
Then, analogously to (C.19), for $m \geq 0$ we set
\[
H^m(\Omega) = \left\{ v : \Omega \to \mathbb{R} \, : \, D^\alpha v \in L^2(\Omega) \text{ for each } \alpha \text{ such that } |\alpha| \leq m \right\}. \tag{C.29}
\]
The scalar product is
\[
(u,v)_{H^m(\Omega)} = \sum_{|\alpha|=0}^{m} \int_\Omega D^\alpha u \cdot D^\alpha v \, d\Omega \tag{C.30}
\]
and the corresponding norm is
\[
\|u\|_{H^m(\Omega)} = \sqrt{(u,u)_{H^m(\Omega)}} = \sqrt{ \sum_{|\alpha|=0}^{m} \|D^\alpha u\|^2_{L^2(\Omega)} }. \tag{C.31}
\]
Again, the space $L^2(\Omega)$ corresponds to the case $m = 0$, while in the case $m = 1$, we have the space $H^1(\Omega)$
\[
H^1(\Omega) = \left\{ v \in L^2(\Omega) \text{ such that } \frac{\partial v}{\partial x_i} \in L^2(\Omega), \ i = 1, \ldots, d \right\}. \tag{C.32}
\]
Remark C.7.1. In the applications, the quantity $\int_\Omega |\nabla v|^2 \, d\Omega$ usually has the meaning of an energy. For this reason, it is customary to say that a function $v$ belonging to $H^1(\Omega)$ has finite energy.
The following example shows that functions in $H^1(\Omega)$ are not necessarily continuous when $d > 1$, as anticipated in Rem. C.6.2.
Example C.7.2. Let
\[
\Omega = \left\{ (x_1, x_2) \, : \, \rho \equiv \sqrt{x_1^2 + x_2^2} \leq \frac{1}{2} \right\},
\]
and $u(x_1, x_2) = (-\log(\rho))^\beta$, $\beta \in (0, 1/2)$. We have $\lim_{\rho \to 0^+} u = +\infty$ despite the fact that $u \in H^1(\Omega)$.
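The claim of Example C.7.2 can be made plausible by a short formal computation in polar coordinates. This is a sketch added for illustration, not taken from the original notes; it computes the gradient away from the origin and assumes, without proof, that this coincides with the weak gradient.

```latex
% For rho > 0:  |grad u| = |du/d rho| = beta (-log rho)^{beta-1} / rho.
% With the substitution t = -log(rho), dt = -d rho / rho:
\[
\int_\Omega |\nabla u|^2 \, d\Omega
 = 2\pi \int_0^{1/2} \beta^2 \, \frac{(-\log \rho)^{2\beta - 2}}{\rho^2} \, \rho \, d\rho
 = 2\pi \beta^2 \int_{\log 2}^{+\infty} t^{2\beta - 2} \, dt
 = \frac{2\pi \beta^2 \, (\log 2)^{2\beta - 1}}{1 - 2\beta},
\]
% which is finite exactly when 2 beta - 2 < -1, i.e. beta < 1/2.
% Moreover u is square integrable, since (-log rho)^{2 beta} rho -> 0 as rho -> 0+:
\[
\int_\Omega u^2 \, d\Omega = 2\pi \int_0^{1/2} (-\log \rho)^{2\beta} \, \rho \, d\rho < +\infty .
\]
```

The restriction $\beta < 1/2$ in the example is thus what keeps the energy finite, even though $u$ is unbounded at the origin.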

Figure C.8: A function with logarithmic singularity at the origin, which belongs
to H 1 (Ω) (in this case, β = 1/3).

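Matlab coding. The Matlab script for generating Fig. C.8 is reported below.
beta = 1/3;
% grid on the square [0,1/2]x[0,1/2] (first quadrant only)
[X,Y] = meshgrid(0:0.001:1/2, 0:0.001:1/2);
rho = sqrt(X.^2+Y.^2);
% u = (-log(rho))^beta; the value at the origin (rho = 0) is Inf
Z = (-log(rho)).^(beta);
surf(X,Y,Z);
% hide the mesh edges, which would otherwise darken such a dense surface
shading interp;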

The following definitions extend to the functions of $H^1(\Omega)$ the notion of value of a function on the boundary of $\Omega$ that is valid in the case $d = 1$.

Definition C.7.3 (Trace operator). Given a function $v \in H^1(\Omega)$, we define the trace operator $\gamma_0$, whose image $\gamma_0 v : \partial\Omega \to \mathbb{R}$ is
\[
\gamma_0 v := v|_{\partial\Omega}.
\]
Definition C.7.4 (Trace space). The space of traces of all functions of $H^1(\Omega)$ is denoted $H^{1/2}(\partial\Omega)$ and is defined as
\[
H^{1/2}(\partial\Omega) = \left\{ \gamma_0 v, \ v \in H^1(\Omega) \right\}.
\]
The space $H^{1/2}(\partial\Omega)$ is a Hilbert space by endowing it with the norm
\[
\|g\|_{H^{1/2}(\partial\Omega)} := \inf_{\substack{v \in H^1(\Omega) \\ \gamma_0 v = g}} \|v\|_{H^1(\Omega)} \qquad g \in H^{1/2}(\partial\Omega). \tag{C.33}
\]
The subset of $H^1(\Omega)$ of functions having vanishing trace on $\partial\Omega$ is
\[
H^1_0(\Omega) = \left\{ v \in H^1(\Omega) \text{ such that } v|_{\partial\Omega} = 0 \right\}. \tag{C.34}
\]
As in the one-dimensional case, we have the following results.
Theorem C.7.5 (Poincaré inequality in $\mathbb{R}^d$). There exists a positive constant $C_P$ depending only on the domain $\Omega$ such that
\[
\|v\|_{L^2(\Omega)} \leq C_P \|\nabla v\|_{L^2(\Omega)} \qquad \forall v \in H^1_0(\Omega). \tag{C.35}
\]
Theorem C.7.6 (Equivalent norm in $H^1_0$). The quantity $\|\nabla v\|_{L^2}$ is an equivalent norm in $H^1_0(\Omega)$, and we set
\[
\|v\|_{H^1_0(\Omega)} = \|\nabla v\|_{L^2(\Omega)} \qquad v \in H^1_0(\Omega). \tag{C.36}
\]
It is often useful to relate the norm of a function belonging to $H^1(\Omega)$ to the norm of the restriction of such a function to the boundary $\partial\Omega$. The following result holds.
Theorem C.7.7 (Trace theorem). There exists a positive constant $C_{\partial\Omega}$, depending only on $\Omega$, such that for every $v \in H^1(\Omega)$
\[
\|v|_{\partial\Omega}\|_{L^2(\partial\Omega)} \leq C_{\partial\Omega} \|v\|_{H^1(\Omega)},
\]
where
\[
\|v|_{\partial\Omega}\|_{L^2(\partial\Omega)} = \left( \int_{\partial\Omega} |v(x)|^2 \, d\Gamma \right)^{1/2}.
\]

Figure C.9: A two-dimensional domain. Γ ⊆ ∂ Ω is the portion of the boundary

on which the trace of a function vanishes.

Remark C.7.8 (Poincaré-Friedrichs inequality). The Poincaré inequality and the trace theorem continue to hold if $\partial\Omega$ is replaced with a subset $\Gamma \subseteq \partial\Omega$ of positive measure. In such a case, $H^1_0(\Omega)$ is replaced by the space
\[
H^1_{0,\Gamma}(\Omega) := \left\{ v \in H^1(\Omega) \text{ such that } v|_\Gamma = 0 \right\}
\]
and the Poincaré inequality takes the name of Poincaré-Friedrichs inequality.
Having defined the space of traces of a scalar function of $H^1(\Omega)$, it is also possible to introduce the space of the traces of a vector function.
Definition C.7.9. Let $v : \Omega \to \mathbb{R}^d$ be a given vector-valued function. A space of frequent use is
\[
H(\mathrm{div}; \Omega) := \left\{ v \in (L^2(\Omega))^d \text{ such that } \mathrm{div}\, v \in L^2(\Omega) \right\}. \tag{C.37}
\]
The space $H(\mathrm{div}; \Omega)$ is a Hilbert space by endowing it with the norm
\[
\|v\|_{H(\mathrm{div};\Omega)} := \left( \|v\|^2_{L^2(\Omega)} + \|\mathrm{div}\, v\|^2_{L^2(\Omega)} \right)^{1/2}. \tag{C.38}
\]
Definitions (C.37) and (C.38) tell us that functions belonging to $H(\mathrm{div}; \Omega)$ have a regularity that is intermediate between $L^2$ and $H^1$.
Definition C.7.10 (Trace space of vector functions). The space of traces of all functions of $H(\mathrm{div}; \Omega)$ is denoted $H^{-1/2}(\partial\Omega)$ and is defined as
\[
H^{-1/2}(\partial\Omega) = \left\{ \gamma_0 v = (v \cdot n)|_{\partial\Omega}, \ v \in H(\mathrm{div}; \Omega) \right\}.
\]
The space $H^{-1/2}(\partial\Omega)$ is a Hilbert space by endowing it with the norm
\[
\|h\|_{H^{-1/2}(\partial\Omega)} := \inf_{\substack{v \in H(\mathrm{div};\Omega) \\ v \cdot n = h}} \|v\|_{H(\mathrm{div};\Omega)} \qquad h \in H^{-1/2}(\partial\Omega). \tag{C.39}
\]
Appendix D

Essentials of Numerical
Mathematics

Abstract
In this appendix, we review the basic foundations of Numerical Mathematics that are used throughout the text. In particular, several elements of Linear Algebra and Numerical Analysis will be addressed, including solution methods for linear systems and function approximation using polynomials.

D.1 The continuous model

In this section, we introduce a general approach to the study of a mathematical
representation of a given physical problem. Such a representation is referred to,
from now on, as the continuous model or, the continuous problem, and takes the
following form:
\[
(P) \qquad \left\{ \begin{array}{l} \text{given } d \in D, \text{ find } x \in V \text{ such that} \\ F(x, d) = 0 \end{array} \right.
\]

where: d are the data, D is the set of admissible data (i.e., the data for which (P)
admits a solution), x is the solution, to be sought in a space V (see Def. B.1.1),
while F is the functional relation between d and x.

226 APPENDIX D. NUMERICAL MATHEMATICS
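Example D.1.1. Compute $I_f := \int_a^b f(t)\,dt$ with $f \in C^0([a,b])$:
\[
d = \{f, a, b\}, \qquad x = I_f, \qquad F(x,d) = \int_a^b f(t)\,dt - x, \qquad V = \mathbb{R}.
\]
Example D.1.2. Solve the linear algebraic system $A\mathbf{x} = \mathbf{b}$:
\[
d = \{a_{ij}, b_i\}, \ i, j = 1, \ldots, n, \qquad x = \mathbf{x}, \qquad F(x,d) = \mathbf{b} - A\mathbf{x}, \qquad V = \mathbb{R}^n.
\]
Example D.1.3. Solve the Cauchy problem:
\[
\begin{cases} y'(t) = f(t, y(t)) & t \in (t_0, t_0 + T) \\ y(t_0) = y_0 \end{cases}
\]
where:
\[
d = \{f, t_0, T, y_0\}, \qquad x = y(t), \qquad
F(x,d) = \begin{cases} f(t, y(t)) - y'(t) & t \in (t_0, t_0 + T) \\ y_0 - y(t_0) & t = t_0 \end{cases}, \qquad V = C^1(t_0, t_0 + T).
\]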

Definition D.1.4 (Continuous dependence on data). Problem (P) is said to be

well-posed (or stable) if it admits a unique solution x continuously depending on
the data. Should (P) not enjoy such a property, it is said to be not well-posed (or
unstable).

Example D.1.5 (Unstable problem). Let $a$ be a real parameter. Let $n = n(a)$ denote the number of real roots (counted with their multiplicity) of the polynomial $p(t) = t^4 - t^2(2a - 1) + a(a - 1)$. Since $p(t) = (t^2 - a)(t^2 - (a - 1))$, it is easy to see that:
\[
n(a) = \begin{cases} 0 & a < 0 \\ 2 & 0 \leq a < 1 \\ 4 & a \geq 1. \end{cases}
\]
Therefore, the problem of determining $n(a)$ is not well-posed because $n$ is not continuous at $a = 0$ and $a = 1$.
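Matlab coding. The jumps of $n(a)$ can be observed numerically by counting the real roots returned by the built-in function roots. The script below is only a sketch: the sample values of $a$ and the tolerance tol used to decide whether a computed root is real are illustrative assumptions.

```matlab
% Count the real roots of p(t) = t^4 - (2a-1) t^2 + a(a-1) for a few values of a.
a_values = [-1, 0.5, 2];             % one sample in each of the three regimes
n_real   = zeros(size(a_values));
tol      = 1e-8;                     % threshold on the imaginary part (assumption)
for k = 1:numel(a_values)
    a = a_values(k);
    r = roots([1, 0, -(2*a - 1), 0, a*(a - 1)]);   % coefficients in descending powers
    n_real(k) = sum(abs(imag(r)) < tol);
end
disp([a_values; n_real]);
```

Exactly at $a = 0$ and $a = 1$ the polynomial has a multiple root, so a numerical count of this kind becomes sensitive to rounding and to the choice of tol, which is another symptom of the lack of continuous dependence on the data.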

Figure D.1: Continuous problem with perturbation on data and solution.

To characterize mathematically the concept of stability of (P), we consider a perturbation of the latter (cf. Fig. D.1):
\[
(P_\delta) \qquad \left\{ \begin{array}{l} \text{given } (d + \delta d) \in D, \text{ find } (x + \delta x) \in V \text{ such that} \\ F(x + \delta x, d + \delta d) = 0 \end{array} \right.
\]
where $\delta d$ is a perturbation of the data $d$ such that $(P_\delta)$ is still well posed, and $\delta x$ is the corresponding perturbation of the solution.
Definition D.1.6 (Stability of (P)). We say that (P) is stable if
\[
\exists \eta_0 = \eta_0(d), \ \exists K_0 = K_0(d) \text{ such that } \quad \|\delta d\|_D \leq \eta_0 \ \Rightarrow \ \|\delta x\|_V \leq K_0 \|\delta d\|_D, \tag{D.1}
\]
where $\|\cdot\|_D$ and $\|\cdot\|_V$ are appropriate norms for $D$ and $V$, respectively (see Def. B.2.1).
The above definition tells us that, provided a limiting threshold η0 is assumed
for the admissible perturbations, the corresponding perturbation δ x on the solution
can be controlled by η0 , modulo an amplification constant K0 . Notice that both
η0 and K0 depend, in general, on the reference data d. In this sense, the stability
emanating from Def. D.1.6 is “local”. To get a “global” estimate of the stability of
(P), it is clear that we need to “explore” the whole admissible set D. This leads
to the next definition.
Definition D.1.7 (Condition number of (P)). The relative condition number associated with problem (P) is
\[
K(d) = \sup_{\delta d \neq 0} \frac{\|\delta x\|_V / \|x\|_V}{\|\delta d\|_D / \|d\|_D} \qquad (d + \delta d) \in D. \tag{D.2}
\]
If $x = 0$ or $d = 0$, the relative condition number is replaced by the absolute condition number associated with problem (P)
\[
K_{abs}(d) = \sup_{\delta d \neq 0} \frac{\|\delta x\|_V}{\|\delta d\|_D} \qquad (d + \delta d) \in D. \tag{D.3}
\]
Roughly speaking, we say that (P) is "well-conditioned" whenever $K(d)$ is "small", while (P) is "ill-conditioned" whenever $K(d)$ is "large".
Example D.1.8 (Condition number of a matrix A). In this example, we compute an estimate of the relative condition number in the case of Ex. D.1.2. We use the p-norm (B.1) for $x$, $\delta x$, $b$ and $\delta b$, and the induced p-norm (B.3) for $A$ and $\delta A$. Moreover, we assume that $\delta A = 0$, so that only the right-hand side $b$ is subject to a perturbation $\delta b$. Then, since $\delta x = A^{-1} \delta b$ and $b = Ax$, relation (D.2) yields
\[
K(d) = \sup_{\delta b \neq 0} \frac{\|\delta x\|_p / \|x\|_p}{\|\delta b\|_p / \|b\|_p}
= \sup_{\delta b \neq 0} \frac{\|A^{-1} \delta b\|_p \, \|Ax\|_p}{\|x\|_p \, \|\delta b\|_p}
\leq \frac{\|A^{-1}\|_p \|\delta b\|_p \, \|A\|_p \|x\|_p}{\|x\|_p \, \|\delta b\|_p} \equiv K_p(A)
\]
where
\[
K_p(A) := \|A\|_p \|A^{-1}\|_p \tag{D.4}
\]
is the p-condition number of matrix $A$. It is easy to see that
\[
K_p(A) \geq 1
\]
and that $K_p(I) = 1$, $I$ being the identity matrix of order $n$. By definition, $K_p(A) = +\infty$ if $A$ is singular. In the important case where $A$ is symmetric and positive definite, we have
\[
K_2(A) = \|A\|_2 \|A^{-1}\|_2 = \sqrt{\rho(A A^T)} \sqrt{\rho(A^{-1} A^{-T})} = \rho(A) \rho(A^{-1}) = \frac{\lambda_{max}(A)}{\lambda_{min}(A)}
\]
where $\lambda_{max}(A)$ and $\lambda_{min}(A)$ are the extreme eigenvalues of $A$ and $\rho(A)$ is the spectral radius of $A$ (see (B.4)).
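Matlab coding. The identities of Example D.1.8 can be verified on a small test case. The script below is a sketch: the matrix A, the vector x and the perturbation db are arbitrary illustrative choices, not data taken from the notes.

```matlab
% Check K_2(A) = lambda_max/lambda_min for a symmetric positive definite matrix
% and the bound (relative error on x) <= K_2(A) * (relative error on b).
A = [4 1 0; 1 3 1; 0 1 2];           % symmetric, strictly diagonally dominant => SPD
lambda  = eig(A);
K2_eig  = max(lambda)/min(lambda);   % ratio of the extreme eigenvalues
K2_norm = norm(A, 2)*norm(inv(A), 2);% definition (D.4) with p = 2

x  = [1; 2; 3];                      % reference solution (illustrative)
b  = A*x;
db = 1e-6*[1; -1; 1];                % perturbation of the right-hand side
dx = A\(b + db) - x;                 % corresponding perturbation of the solution
amplification = (norm(dx)/norm(x)) / (norm(db)/norm(b));
disp([K2_eig, K2_norm, cond(A, 2), amplification]);
```

The observed amplification factor depends on the direction of db and is in general smaller than $K_2(A)$, which is a worst-case bound.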

D.2 The numerical model

Solving explicitly the continuous problem (P) is in general a difficult, if not impossible, task. For this reason, we introduce a family of numerical problems depending on the discretization parameter $h > 0$:
\[
(P)_h \qquad \left\{ \begin{array}{l} \text{given } d_h \in D_h, \text{ find } x_h \in V_h \text{ such that} \\ F_h(x_h, d_h) = 0 \end{array} \right.
\]
where $d_h$ and $x_h$ are the approximate data and solution, to be sought in suitable subspaces $D_h \subseteq D$ and $V_h \subseteq V$. The main difference between $(P)_h$ and its continuous counterpart (P) is that the former is constructed in such a way that $x_h$ is a computable quantity, unlike $x$. For this reason, we refer to $(P)_h$ as the numerical method. Our request is that
\[
\lim_{h \to 0} x_h = x, \tag{D.5}
\]
that is, we want the numerical solution to converge to the solution of the continuous problem, as the discretization parameter becomes arbitrarily small. In order for convergence to occur, we necessarily need:
1. the approximate data to be convergent, i.e.
\[
\lim_{h \to 0} d_h = d;
\]
2. $(P)_h$ to be consistent, i.e.
\[
\lim_{h \to 0} \underbrace{\left( F_h(x, d) - F(x, d) \right)}_{:= R_h(x,d)} = 0 \tag{D.6}
\]
$R_h(x, d)$ being the residual of problem $(P)_h$, obtained by forcing into the numerical problem the solution and data of the continuous problem.
Example D.2.1 (Numerical quadrature). For $f \in C^0([a,b])$, let $x = I_f$, $x_h = I_{f,h} \simeq I_f$ and
\[
F_h(x_h, d_h) = I_{f,h} - x_h := \left( \sum_{k=1}^{N_h} h f(\bar{t}_k) \right) - x_h
\]
where $N_h \geq 1$ is the number of subintervals into which $[a,b]$ is divided, each of width $h = (b-a)/N_h$, while $\bar{t}_k := (t_{k-1} + t_k)/2$, $k = 1, \ldots, N_h$, is the midpoint of each subinterval. Notice that $h \to 0 \Leftrightarrow N_h \to \infty$, but the product $h N_h$ remains constant and equal to $b - a$. The approximate formula $I_{f,h}$ to compute $I_f$ is known as the composite midpoint quadrature rule and its geometrical representation is given in Fig. D.2.

Figure D.2: The composite midpoint quadrature rule. The shaded staircase region is the approximation of $I_f$.

Let us check that $(P)_h$ is consistent. Assuming $f \in C^2([a,b])$, it can be shown that
\[
R_h(x, d) = \left( \sum_{k=1}^{N_h} h f(\bar{t}_k) - x \right) - \left( \int_a^b f(t)\,dt - x \right)
= \sum_{k=1}^{N_h} h f(\bar{t}_k) - \int_a^b f(t)\,dt = -\frac{b-a}{24} h^2 f''(\xi)
\]
where $\xi \in (a,b)$, so that (D.6) is satisfied. As for convergence, we notice that the midpoint quadrature rule $I_{f,h}$ coincides with a Riemann sum whose sample points are the midpoints of the subintervals, so that, by definition of integral of a function between $a$ and $b$ we have
\[
\lim_{h \to 0} x_h = I_f = x \qquad \forall f \in C^0([a,b])
\]
and thus $x_h$ converges to $x$.
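Matlab coding. The composite midpoint rule of Example D.2.1 and its error estimate can be checked with a few lines of code. The script below is a sketch under stated assumptions: the integrand $f(t) = e^t$ on $[0,1]$ and the values of $N_h$ are illustrative choices.

```matlab
% Composite midpoint rule for f(t) = exp(t) on [0,1] and comparison with the
% error bound (b-a)/24 * h^2 * max|f''|.
f = @(t) exp(t);  a = 0;  b = 1;     % illustrative test case
I_exact = exp(1) - 1;
Nh  = [10 20 40 80];                 % number of subintervals
h   = (b - a)./Nh;
err = zeros(size(Nh));
for k = 1:numel(Nh)
    tmid   = a + h(k)*((1:Nh(k)) - 0.5);   % midpoints of the subintervals
    I_h    = h(k)*sum(f(tmid));            % composite midpoint approximation
    err(k) = I_exact - I_h;                % equals (b-a)/24*h^2*f''(xi), positive here
end
bound = (b - a)/24 * h.^2 * exp(1);  % max of |f''| on [0,1] is e
disp([h; err; bound]);
```

Halving $h$ reduces the error by a factor close to 4, in agreement with the $h^2$ behaviour of the residual.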
The following general result is a cornerstone of Numerical Analysis.
Theorem D.2.2 (Equivalence theorem (Lax-Richtmyer)). The numerical method $(P)_h$ is convergent iff it is consistent and stable. Consistency is expressed by (D.6), while stability is expressed by (D.1), provided that $d$, $x$, $D$ and $V$ are replaced with $d_h$, $x_h$, $D_h$ and $V_h$.

Figure D.3: The Lax-Richtmyer paradigm.

D.3 The chain of errors

So far, we have introduced the notion of error as that of being, in general, the
difference between the solution of the continuous problem, which we call from
now on, the exact solution, and the solution of the numerical problem, which we
call the approximate, or, numerical solution. As a matter of fact, our definition
of error is not completely precise, because we are neglecting, at least, two other
important sources of error, namely, the so-called modeling error and rounding
error.
The modeling error is associated with a possible inaccuracy of the mathe-
matical model describing the real physical application at hand, and/or a possible
inaccuracy in the data entering the model formulation (due, for instance, to mea-
surement machine tolerances or statistical fluctuations of the phenomena under
investigation). As such, the modeling error is somewhat outside our possibility of
control and, consequently, reduction.
The rounding error is, instead, inherently related to the computational process that is implemented in a computer algorithm, i.e., a sequence of deterministic machine operations that are run using a software environment (e.g., Matlab) on a PC characterized by a specific hardware (e.g., an Intel Pentium IV processor) and a specific OS (e.g., Linux or Windows 7). Depending on the machine arithmetic being used, the rounding error can be monitored and accurately estimated, but not completely eliminated.
In mathematical terms, we have:
• $e_m := x_{ph} - x$: modeling error. It is the difference between the solution $x_{ph}$ of the real physical problem and the solution $x$ of the mathematical model;

• $e_h := x - x_h$: numerical error. It is the difference between the solution $x$ of the mathematical model and the solution $x_h$ of the numerical problem;

• $e_r := x_h - \hat{x}_h$: rounding error. It is the difference between the solution $x_h$ of the numerical model and the solution $\hat{x}_h$ that is actually produced by the computational process.

Figure D.4: Sources of error in a computational process.

Letting $e_c := e_h + e_r$ be the computational error, we define the global error as
\[
e = x_{ph} - \hat{x}_h = \underbrace{x_{ph} - x}_{e_m} + \underbrace{x - \hat{x}_h}_{e_c} = x_{ph} - x + ( \underbrace{x - x_h}_{e_h} + \underbrace{x_h - \hat{x}_h}_{e_r} ). \tag{D.7}
\]
The whole chain of errors is pictorially represented in Fig. D.4. It is important to notice that the only quantity that is available to the user is $\hat{x}_h$; all the other solutions of the various intermediate steps of the process are virtually inaccessible.
In particular, throughout the remainder of this text, we shall always neglect the
modeling error in the analysis of the numerical methods of processes. Moreover,
for ease of notation, we shall write xh instead of xbh , unless otherwise specified,
keeping the presence of the rounding error always understood.

D.4 Errors and error analysis

Definition D.4.1 (Absolute and relative errors). For an appropriate norm $\|\cdot\|$, we have
\[
E_{abs}(x_h) := \|x - x_h\| \quad \text{(absolute error)}, \qquad
E_{rel}(x_h) := \frac{\|x - x_h\|}{\|x\|} \quad \text{(relative error, } x \neq 0\text{)}. \tag{D.8}
\]
Definition D.4.2 (Order of convergence). We say that $\hat{x}_h$ converges to $x$ with order $p > 0$ with respect to the discretization parameter $h$, if there exists a positive constant $C$, independent of $h$, such that
\[
\|e_c\| \leq C h^p. \tag{D.9}
\]

Figure D.5: Error in log-log scale.

Remark D.4.3. Using logarithmic scale, we have
\[
\underbrace{\log \|e_c\|}_{Y} \simeq \underbrace{\log(C)}_{q} + \underbrace{p}_{m} \, \underbrace{\log(h)}_{X}
\]
that is, the log-log plot of the error is a straight line in the X-Y plane, whose slope $m$ gives us immediately the order of convergence of the method. In the case of Ex. D.2.1, $p = 2$ and $C = \dfrac{b-a}{24} \max_{t \in [a,b]} |f''(t)|$.
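Matlab coding. The slope of the error line in log-log scale can be estimated by a least-squares fit. The script below is a sketch: it applies the composite midpoint rule to the illustrative integrand $f(t) = \sin(t)$ on $[0, \pi]$ (exact integral equal to 2) and uses polyfit to fit a straight line; these choices are assumptions made for the example.

```matlab
% Estimate the order of convergence p and the constant C from log-log error data.
f = @(t) sin(t);  a = 0;  b = pi;  I_exact = 2;    % illustrative test case
Nh  = 2.^(3:8);
h   = (b - a)./Nh;
err = zeros(size(h));
for k = 1:numel(h)
    tmid   = a + h(k)*((1:Nh(k)) - 0.5);           % midpoints of the subintervals
    err(k) = abs(I_exact - h(k)*sum(f(tmid)));     % absolute error
end
coef  = polyfit(log(h), log(err), 1);              % fit Y = m*X + q
p_est = coef(1);                                   % slope m, estimated order
C_est = exp(coef(2));                              % intercept q = log(C)
fprintf('estimated order p = %.3f, estimated constant C = %.4f\n', p_est, C_est);
loglog(h, err, 'ko-'); xlabel('h'); ylabel('error');
```

The fitted slope should be close to 2, and the fitted constant should not exceed the theoretical value $(b-a)/24 \cdot \max|f''|$ of Remark D.4.3.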

D.5 Machine representation of numbers and rounding error

When working on a computer, any real number $x$ is represented in the hardware by another real number, called floating-point number and denoted by $fl(x)$, equal to
\[
fl(x) = (-1)^s \cdot (0.\underbrace{a_1 a_2 \ldots a_t}_{m}) \cdot \beta^e \tag{D.10}
\]
where:

• $\beta$ is the machine base;

• $t$ is the number of significant digits;

• $m$ is the mantissa;

• $e$ is the exponent;

• $s$ is the "sign" bit, with $s = 0$ if $x \geq 0$ and $s = 1$ if $x < 0$.

Typically, we have $\beta = 2$ and $L \leq e \leq U$, with $L < 0$ and $U > 0$. Using double precision for number representation (which means 8 bytes of memory, corresponding to a machine word of 64 bits for storing $fl(x)$), we have usually $t = 53$, $L = -1021$ and $U = 1024$. From (D.10), it turns out that the range of numbers that can be represented in a computer is finite, and in particular it can be seen that
\[
|fl(x)| \in [ \underbrace{\beta^{L-1}}_{x_{min}}, \ \underbrace{\beta^U (1 - \beta^{-t})}_{x_{max}} ].
\]

Matlab coding. The quantities xmin and xmax are stored in the Matlab variables
realmin and realmax.
>> realmin, realmax

ans =

2.2251e-308

ans =

1.7977e+308

Floating-point numbers are distributed in a discrete manner along the real axis.
In particular, the power of resolution of a computer machine is characterized by
the following quantity.

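Definition D.5.1 (Machine epsilon). The machine epsilon $\varepsilon_M = \beta^{1-t}$ is the distance between 1 and the smallest floating-point number greater than 1; in other words, $1 + \varepsilon_M$ is the floating-point number that immediately follows 1.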
Matlab coding. The quantity εM is stored in the Matlab variable eps.
>> eps

ans =

2.2204e-16
Remark D.5.2 (Computing realmax in Matlab). Inserting the definition of $\varepsilon_M$ into the definition of $x_{max}$, we can write the latter quantity in the following equivalent manner
\[
x_{max} = \beta^U (1 - \beta^{-t}) = \beta^U (1 - \varepsilon_M \beta^{-1}) = \beta^{U-1} (\beta - \varepsilon_M).
\]
This expression is implemented in the Matlab environment to compute the variable realmax without incurring overflow problems.
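Matlab coding. The formulas for $x_{min}$, $x_{max}$ and $\varepsilon_M$ can be compared with the built-in variables realmin, realmax and eps. The script below is a sketch assuming IEEE double precision with the parameters $\beta = 2$, $t = 53$, $L = -1021$ and $U = 1024$ quoted in the text; the variable names are illustrative.

```matlab
% Compare the floating-point parameters of the text with Matlab's built-ins.
beta = 2;  t = 53;  L = -1021;  U = 1024;
xmin  = beta^(L - 1);                 % smallest normalized positive number
xmax  = beta^(U - 1)*(beta - eps);    % overflow-free form of Remark D.5.2
naive = beta^U*(1 - beta^(-t));       % beta^U overflows, so this evaluates to Inf
epsM  = beta^(1 - t);                 % machine epsilon
fprintf('xmin == realmin: %d\n', xmin == realmin);
fprintf('xmax == realmax: %d\n', xmax == realmax);
fprintf('epsM == eps:     %d\n', epsM == eps);
fprintf('naive formula overflows: %d\n', isinf(naive));
```

The direct evaluation of $\beta^U (1 - \beta^{-t})$ fails because the intermediate quantity $\beta^U$ already exceeds the largest representable number.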
In general, f l(x) does not coincide with x, as stated by the following result.

Proposition D.5.3 (Round-off error). Let $x \in \mathbb{R}$ be a given number. If $x_{min} \leq |x| \leq x_{max}$, then we have
\[
fl(x) = x(1 + \delta) \quad \text{with } |\delta| \leq u, \tag{D.11}
\]
where
\[
u = \frac{1}{2} \beta^{1-t} \equiv \frac{1}{2} \varepsilon_M \tag{D.12}
\]
is the roundoff unit (or machine precision).
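Matlab coding. Relation (D.11) can be illustrated by rounding a double precision number to single precision, so that the double value plays the role of the "exact" $x$. The script below is only a sketch: the choice $x = 0.1$ and the use of single precision as the coarser arithmetic are assumptions made for the example.

```matlab
% Illustrate the roundoff unit u = eps/2 and the bound |delta| <= u.
u = eps/2;                            % roundoff unit in double precision
is_one   = (1 + u == 1);              % 1 + u is rounded back to 1
is_above = (1 + eps > 1);             % 1 + eps is the next number after 1

x        = 0.1;                       % taken as the reference ("exact") value
fl_x     = double(single(x));         % x rounded to single precision
delta    = (fl_x - x)/x;              % relative rounding error, see (D.11)
u_single = eps('single')/2;           % roundoff unit of single precision
fprintf('|delta| = %.3e, u_single = %.3e\n', abs(delta), u_single);
```

The relative error of the rounded value stays below the roundoff unit of the arithmetic in which the rounding is performed.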

Appendix E

Essentials of Numerical Linear
Algebra

Abstract
In this appendix, we review the basic foundations of Numerical Linear Algebra that are used throughout the text. Solution methods for linear systems based on the LU factorization technique will be illustrated.

E.1 Linear algebraic systems

The mathematical problem of the solution of a linear algebraic system consists of finding $x \in \mathbb{R}^n$ such that
\[
Ax = b \tag{E.1}
\]
where $A \in \mathbb{R}^{n \times n}$ is a given real-valued matrix and $b \in \mathbb{R}^n$ is a given right-hand side vector.

Theorem E.1.1. Problem (E.1) admits a unique solution iff $A$ is nonsingular. In such a case, Cramer's rule can be applied to yield
\[
x_i = \frac{\det(A_i)}{\det(A)} \qquad i = 1, \ldots, n, \tag{E.2}
\]
where $A_i$ is the matrix obtained from $A$ by replacing its i-th column with $b$.

At first sight, this approach to problem (E.1) looks fully satisfactory: under a reasonable condition (matrix invertibility), there is an explicit formula to compute the solution $x$. A closer look at the situation shows that this is not completely true. As a matter of fact, the application of formula (E.2) requires the evaluation of $(n+1)$ determinants. Assuming to work with a machine capable of performing $10^{11}$ floating-point operations (flops) per second (100 Gflops), the cost of Cramer's rule is about 3.5 minutes of CPU time if $n = 15$, 16 years if $n = 20$ and $10^{141}$ years if $n = 100$. More generally, the asymptotic cost as a function of matrix size $n$ increases as $(n+1)!$ (in mathematical terms, $O((n+1)!)$, see Fig. E.1).

Figure E.1: Cost of Cramer’s rule (CPU time in years) as a function of n on a

machine performing 100Gflops/sec.

Matlab coding. The following Matlab script can be used to compute the cost of Cramer's rule as a function of n.
SecYear = 365*24*3600;            % seconds in one year
n = [5, 10, 15, 20, 50, 100];
Clock = 100*1e9;                  % machine speed [flops/sec]
Cost = factorial(n+1)/Clock;      % CPU time [sec], assuming (n+1)! flops
CostYear = Cost/SecYear;          % CPU time [years]
loglog(n, CostYear,'ko-')
xlabel('n'); ylabel('CPU Time [Years]'); title('Cost of Cramer''s Rule')

The computational effort required by Cramer's rule is clearly not affordable,
so that a remedy is urgently in order. Numerical linear algebra comes to the rescue,
providing two main kinds of techniques for working around the problem associated
with Cramer's rule: direct methods and iterative methods. In essence, direct
methods compute the solution of (E.1) in a finite number of steps, while iterative
methods compute the solution of (E.1) as the limit of a sequence, therefore
requiring (theoretically) an infinite number of steps. The first approach is discussed
in Sect. E.2, while for the second approach we refer the interested reader to a text
(course) of Numerical Analysis.

E.2 Direct methods for linear systems

Assume that there exist a lower triangular matrix $L$ and an upper triangular matrix $U$
such that
$$A = L \cdot U. \tag{E.3}$$
Relation (E.3) is referred to as the LU factorization of $A$.

Figure E.2: LU factorization of a matrix.

Since
$$\det(A) = \det(L) \cdot \det(U)$$
it immediately turns out that $l_{ii} \ne 0$ and $u_{ii} \ne 0$, $i = 1, \dots, n$, because $A$ is nonsingular.
The LU factorization can be used as a solution method for (E.1) as follows.
From (E.3) we have
(L ·U)x = b
so that solving (E.1) amounts to solving the two linear triangular systems:

Ly = b (E.4a)

Ux = y (E.4b)

Matlab coding. The Matlab coding of (E.4a) is reported below and takes the name
of forward substitution algorithm.

function y = forward_substitution(L,b)
n = length(b);
y = zeros(n,1);
y(1) = b(1)/L(1,1);
for i=2:n
y(i) = (b(i)-L(i,1:i-1)*y(1:i-1))/L(i,i);
end

Matlab coding. The Matlab coding of (E.4b) is reported below and takes the
name of backward substitution algorithm.
function x = backward_substitution(U,y)
n = length(y);
x = zeros(n,1);
x(n) = y(n)/U(n,n);
for i=n-1:-1:1
x(i) = (y(i)-U(i,i+1:n)*x(i+1:n))/U(i,i);
end

Both forward and backward substitution algorithms have a computational cost
of $n^2$ flops. To compute the triangular matrices $L$ and $U$ we need to solve the
nonlinear system (E.3) for the $2(n^2 - (n^2 - n)/2) = n^2 + n$ unknown coefficients
$l_{ij}$ and $u_{ij}$, which reads
$$\sum_{k=1}^{\min(i,j)} l_{ik} u_{kj} = a_{ij}, \qquad i, j = 1, \dots, n. \tag{E.5}$$

However, the number of equations (E.5) is only $n^2$, so that we need $n$ additional
equations to close the problem. These latter are found by enforcing the conditions

$$l_{ii} = 1, \qquad i = 1, \dots, n. \tag{E.6}$$

Then, the remaining coefficients can be efficiently computed using the Gauss
algorithm, with a cost of about $2n^3/3$ flops. Summarizing, the total cost of the
solution of the linear system (E.1) using the method of LU factorization of matrix $A$
is equal to $2n^3/3 + 2n^2$ flops.
We can draw two relevant conclusions from this result. The first conclusion is
that the LU factorization is a computationally efficient alternative to Cramer’s rule.
The second conclusion is that, as n increases, the cost of forward and backward
system solving becomes less important than the cost of the factorization itself.
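Matlab coding. The two conclusions can be made quantitative with the short sketch below, which tabulates the flop counts quoted above. The chosen values of n are illustrative, and the (n + 1)! cost model for Cramer's rule is the one used in the script of Sect. E.1.

```matlab
% Flop counts: LU-based solution versus Cramer's rule.
n = [5 10 15 20];                  % illustrative matrix sizes (assumption)
costLU = 2*n.^3/3 + 2*n.^2;        % factorization + two triangular solves
costSub = 2*n.^2;                  % forward + backward substitution only
costCramer = factorial(n+1);       % cost model used for Cramer's rule
fracSub = costSub./costLU;         % share of the substitutions in the LU total
% one row per value of n: n, LU cost, Cramer cost, substitution share
fprintf('%4d  %10.0f  %10.3e  %6.3f\n', [n; costLU; costCramer; fracSub]);
```

The last column equals 1/(n/3 + 1), so the share of the triangular solves decreases as n grows.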
Matlab coding. The Matlab coding of the solution of the nonlinear system (E.5)
through the Gauss algorithm is reported below.
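function [L,U] = lufact(A,n)
% LU factorization of A (without pivoting) through the Gauss algorithm.
for k=1:n-1
    W(k+1:n)=A(k,k+1:n);             % copy of the pivot row
    for i=k+1:n
        A(i,k)=A(i,k)/A(k,k);        % multiplier l_ik, stored in place
        for j=k+1:n
            A(i,j)=A(i,j)-A(i,k)*W(j);
        end
    end
end
L=tril(A,-1)+eye(n);                 % unit lower triangular factor
U=triu(A);                           % upper triangular factor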
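To see what the algorithm computes in the simplest case, the sketch below works out (E.5)-(E.6) by hand for $n = 2$, under the assumption $a_{11} \ne 0$ (added here so that the division is legitimate). With $l_{11} = l_{22} = 1$, the four equations can be solved one after the other:

```latex
\begin{aligned}
(i,j)=(1,1):&\quad l_{11}u_{11} = a_{11} &&\Rightarrow\ u_{11} = a_{11},\\
(i,j)=(1,2):&\quad l_{11}u_{12} = a_{12} &&\Rightarrow\ u_{12} = a_{12},\\
(i,j)=(2,1):&\quad l_{21}u_{11} = a_{21} &&\Rightarrow\ l_{21} = a_{21}/a_{11},\\
(i,j)=(2,2):&\quad l_{21}u_{12} + l_{22}u_{22} = a_{22} &&\Rightarrow\ u_{22} = a_{22} - \frac{a_{21}a_{12}}{a_{11}}.
\end{aligned}
```

For instance, for the illustrative matrix with rows $(2, 1)$ and $(4, 5)$ one gets $l_{21} = 2$ and $u_{22} = 5 - 2 = 3$, and the product of $L$ (rows $(1, 0)$ and $(2, 1)$) with $U$ (rows $(2, 1)$ and $(0, 3)$) gives back the matrix.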

Example E.2.1. Let us solve (E.1) using the LU factorization in the case where
$A$ is the “magic” matrix of order $n = 3$ (a matrix constructed from the integers 1
to $n^2$ with equal row, column, and diagonal sums), and $b$ is chosen in such a way
that $x = [1, 1, 1]^T$.
Matlab coding. The sequence of Matlab commands for the present example is
reported below.
>> A=magic(3)

A =

8 1 6
3 5 7
4 9 2

>> b=A*ones(3,1)

b =

15
15
15

>> [L,U]=lufact(A,3)

L =

1.0000 0 0
0.3750 1.0000 0
0.5000 1.8378 1.0000

U =

8.0000 1.0000 6.0000
0 4.6250 4.7500
0 0 -9.7297

>> y=forward_substitution(L,b)

y =

15.0000
9.3750
-9.7297

>> x=backward_substitution(U,y)

x =

1
1
1

The following result gives a necessary and sufficient condition for the existence
and uniqueness of the LU factorization.

Theorem E.2.2. Given a matrix $A \in \mathbb{R}^{n\times n}$, its LU factorization with $l_{ii} = 1$, $i = 1, \dots, n$, exists and is unique iff $\det(A_i) \ne 0$ for $i = 1, \dots, n - 1$, where $A_i$ denotes the principal submatrix of $A$ of order $i$.

The previous theorem is of little practical use. The following sufficient conditions
are more helpful.

Proposition E.2.3. If $A$ is s.p.d. or if $A$ is strictly diagonally dominant, then its LU
factorization with $l_{ii} = 1$, $i = 1, \dots, n$, exists and is unique. In the first case, the
factorization can be written in the symmetric form $A = H H^T$, where $H$ is lower
triangular with positive diagonal entries, and its cost is about $n^3/3$ flops instead of
$2n^3/3$ (reduced by one half). This symmetric factorization takes the name of Cholesky
factorization.
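Matlab coding. A minimal sketch of the Cholesky factorization is reported below, using the built-in command chol, which returns an upper triangular factor R such that A = R'*R. The test matrix is a hypothetical example chosen here (symmetric, strictly diagonally dominant with positive diagonal, hence s.p.d.).

```matlab
% Cholesky factorization of an s.p.d. matrix with the built-in chol.
% A is an illustrative test matrix (assumption).
A = [4 1 0; 1 3 1; 0 1 2];
R = chol(A);                   % upper triangular, A = R'*R
H = R';                        % lower triangular factor, A = H*H'
err = norm(A - H*H', inf);     % reconstruction error
posDiag = all(diag(H) > 0);    % the diagonal of H is positive
```

Only one triangular factor is computed and stored, which is where the saving with respect to the general LU factorization comes from.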

It is easy to find cases for which the conditions of Thm. E.2.2 are not satisfied.
Example E.2.4. Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 7 & 8 & 9 \end{bmatrix}.$$
We have $\det(A_1) = 1$, $\det(A_2) = 0$ and $\det(A_3) = \det(A) = -6$, so that $A$ is
nonsingular but fails to satisfy the condition $\det(A_2) \ne 0$ that is required to admit the LU
factorization.
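Matlab coding. The determinants quoted in Ex. E.2.4 can be checked with the following sketch, which loops over the principal submatrices A(1:i,1:i). Up to round-off, the vector d should reproduce the values given in the example.

```matlab
% Determinants of the principal submatrices of the matrix of Ex. E.2.4.
A = [1 2 3; 2 4 5; 7 8 9];
n = size(A,1);
d = zeros(1,n);
for i = 1:n
    d(i) = det(A(1:i,1:i));    % principal submatrix of order i
end
disp(d)
```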
To overcome the difficulty highlighted by the previous example, the LU
factorization is modified as follows. Instead of (E.3), we assume that there exist a
lower triangular matrix $L$, an upper triangular matrix $U$ and a permutation matrix $P$ such
that
$$P \cdot A = L \cdot U. \tag{E.7}$$
Relation (E.7) is referred to as the LU factorization of $A$ with pivoting. The role of
matrix $P$ is just to perform a suitable exchange of certain rows of $A$ in such a way
that all the principal minors up to order $n - 1$ of $P \cdot A$ turn out to be nonzero,
as required by Thm. E.2.2. Going back to Ex. E.2.4, the choice
$$P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$
has the effect of transforming $A$ into
$$P \cdot A = \begin{bmatrix} 7 & 8 & 9 \\ 2 & 4 & 5 \\ 1 & 2 & 3 \end{bmatrix} \equiv \widetilde{A},$$
that is, the first and third rows have been exchanged. Now, we have $\det(\widetilde{A}_1) = 7$
and $\det(\widetilde{A}_2) = 12$, so that $\widetilde{A}$ satisfies the conditions required by Thm. E.2.2 and thus
admits a unique LU factorization (notice that $\det(\widetilde{A}_3) = \det(\widetilde{A}) = \det(P)\det(A) = +6$,
so that $\widetilde{A}$ is nonsingular, as $A$ was). According to (E.7), we have
$$(L \cdot U)x = P \cdot b$$
so that solving (E.1) amounts to solving the two linear triangular systems:
$$Ly = P \cdot b \tag{E.8a}$$
$$Ux = y. \tag{E.8b}$$
In other words, everything remains the same as in the case of LU factorization,
except for the fact that the right-hand side is subject to a reordering which corresponds
to the exchange of rows of $A$ during factorization through the Gauss algorithm.
Matlab coding. The Matlab command for computing L, U and P is reported below.
>> [L,U,P]=lu(A);
More synthetically, the solution of (E.1) using the LU factorization with piv-
oting can be obtained in Matlab with the following command.
>> x = A \ b;
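Matlab coding. The two-step solution (E.8a)-(E.8b) can be sketched as follows on the matrix of Ex. E.2.4. The right-hand side is built here so that the exact solution is the vector of ones (an illustrative choice), and the backslash operator applied to a triangular matrix performs the corresponding substitution.

```matlab
% Solution of A x = b through the LU factorization with pivoting.
A = [1 2 3; 2 4 5; 7 8 9];       % matrix of Ex. E.2.4
xex = ones(3,1);                 % chosen exact solution (assumption)
b = A*xex;
[L,U,P] = lu(A);                 % P*A = L*U
y = L\(P*b);                     % forward substitution, (E.8a)
x = U\y;                         % backward substitution, (E.8b)
resFact = norm(P*A - L*U, inf);  % check of the factorization
errSol = norm(x - xex, inf);     % check of the solution
```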

Remark E.2.5 (Inverse of a matrix). The LU factorization with pivoting (E.7) and
the triangular systems (E.8a)-(E.8b) can be used as an efficient tool for computing
the inverse of a given matrix by solving the $n$ systems
$$A x_i = e_i, \qquad i = 1, \dots, n,$$
where $e_i$ is the $i$-th vector of the canonical basis of $\mathbb{R}^n$ and the $i$-th unknown vector $x_i$ represents the $i$-th column of $A^{-1}$.
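Matlab coding. A minimal sketch of the procedure described in Remark E.2.5 is reported below. The matrix of Ex. E.2.4 is reused as illustrative data; the factorization is computed once and then used for all the n right-hand sides.

```matlab
% Inverse of A column by column, reusing a single LU factorization.
A = [1 2 3; 2 4 5; 7 8 9];     % illustrative data (matrix of Ex. E.2.4)
n = size(A,1);
[L,U,P] = lu(A);               % computed once
I = eye(n);
Ainv = zeros(n);
for i = 1:n
    y = L\(P*I(:,i));          % (E.8a) with b = e_i
    Ainv(:,i) = U\y;           % (E.8b): i-th column of the inverse
end
err = norm(A*Ainv - I, inf);
```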

E.3 Stability analysis

In this section, we study the effect of round-off error in the numerical solution of
problem (E.1).

Theorem E.3.1. Let $\delta A$ and $\delta b$ denote two perturbations of the problem data ($A$
and $b$, respectively), so that the perturbed system associated with (E.1)

$$(A + \delta A)(x + \delta x) = b + \delta b \tag{E.9}$$

is still uniquely solvable. Then, we have

$$\frac{\|\delta x\|}{\|x\|} \le \frac{K(A)}{1 - K(A)\dfrac{|||\delta A|||}{|||A|||}} \left( \frac{\|\delta b\|}{\|b\|} + \frac{|||\delta A|||}{|||A|||} \right), \tag{E.10}$$

for a matrix norm $|||\cdot|||$ induced by a vector norm $\|\cdot\|$ and where $K(A) := |||A|||\,|||A^{-1}|||$.

The estimate (E.10) tells us that the relative error on the exact solution is
bounded by the sum of the relative errors on the data, modulo the effect of finite
machine precision which is represented by the amplification constant

$$\frac{K(A)}{1 - K(A)\dfrac{|||\delta A|||}{|||A|||}}.$$

Assuming for simplicity $\delta A = 0$, we see that if $A$ is ill-conditioned ($K(A) \gg 1$),
then the solution $\widehat{x}_h$ that is actually produced by the computational process can
be affected by a serious inaccuracy. A refinement of (E.10) is provided by the
following result.

Theorem E.3.2. Assume that $|||\delta A||| \le \gamma |||A|||$ and $\|\delta b\| \le \gamma \|b\|$, for a positive
constant $\gamma = O(u)$, and that $\gamma K(A) < 1$. Then, we have

$$\frac{\|\delta x\|}{\|x\|} \le \frac{2\gamma}{1 - \gamma K(A)} K(A). \tag{E.11}$$
The estimate (E.11) tells us that the round-off error which inevitably affects
every operation on a computer (cf. Sect. D.5) is amplified by the condition
number of matrix $A$. To obtain more quantitative information on the error, take
$\gamma = u \simeq \beta^{1-t}$, $\beta = 10$, $t = 16$ and $\|\cdot\| = \|\cdot\|_\infty$. Assume also $K_\infty(A) = 10^m$, for a
nonnegative integer $m$, with $\gamma K_\infty(A) \le 1/2$. Then, (E.11) yields

$$\frac{\|\delta x\|_\infty}{\|x\|_\infty} \simeq u K_\infty(A) = 10^{m-16}. \tag{E.12}$$

Roughly speaking, the above estimate tells us that the expected accuracy of $\widehat{x}_h$
with respect to the solution $x$ of (E.1) is, at least, of $16 - m$ exact decimal digits.
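Matlab coding. The amplification predicted by (E.10) in the case δA = 0 can be observed with the sketch below. The matrix (a small Hilbert matrix), the exact solution and the perturbation db are hypothetical data chosen here for illustration; the Euclidean norm is used for vectors, consistently with the default of the command cond.

```matlab
% Effect of a perturbation of b on the solution of A x = b (dA = 0).
% A, x and db are illustrative data (assumption).
A = hilb(4);
x = ones(4,1);                         % chosen exact solution
b = A*x;
db = 1e-6*[1; -1; 1; -1];              % small perturbation of the right-hand side
xp = A\(b + db);                       % solution of the perturbed system
relErrX = norm(xp - x)/norm(x);        % relative error on the solution
relErrB = norm(db)/norm(b);            % relative error on the data
K = cond(A);                           % condition number in the 2-norm
fprintf('relErrB = %.2e, relErrX = %.2e, K*relErrB = %.2e\n', ...
        relErrB, relErrX, K*relErrB);
```

The relative error on the solution is much larger than the relative error on the data, but it remains below the bound K(A)·‖δb‖/‖b‖ given by (E.10).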
Example E.3.3 (Hilbert matrix). In this example, we study the disastrous effects
of the combined occurrence of matrix ill-conditioning and machine round-off in
the solution of a linear system. The reader should regard this case not as a
general paradigm, but rather as a warning against “blind-faith” computations.
For $n \ge 1$, let $A \in \mathbb{R}^{n\times n}$ be the Hilbert matrix of order $n$, defined as
$$a_{ij} = \frac{1}{i + j - 1}, \qquad i, j = 1, \dots, n.$$
Matlab coding. Matrix A is symmetric and positive definite for every n. To examine
the stability with respect to perturbations in the solution of the linear system
Ax = b, we compute with the following Matlab script the condition number of A
as a function of n.
n=[1:20];
for i=1:numel(n), A=hilb(n(i)); K(i)=cond(A); end
semilogy(n, K)
xlabel('n')

Fig. E.3(a) shows that $K(A)$ grows exponentially as a function of $n$, in such a
way that, even for a matrix of small size ($n = 13$), the value of $K$ is exceedingly
large ($10^{15}$ and more). According to estimate (E.12), the expected number of exact
digits becomes null, making the solution on the computer of the linear system
irremediably useless.

Figure E.3: Log-plot of the condition number of the Hilbert matrix of order n as a
function of n. Panels: (a) K(A); (b) percentage error.
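Matlab coding. The loss of accuracy discussed in Ex. E.3.3 can be measured directly with the sketch below, which solves the Hilbert system for increasing n with a right-hand side built so that the exact solution is the vector of ones (an illustrative choice made here). The quantity u*K is plotted only as a rough indicator in the spirit of (E.12), with u taken as eps/2 (IEEE double precision assumed).

```matlab
% Relative error in the solution of the Hilbert system versus n.
n = 1:13;
u = eps/2;                               % roundoff unit (double precision assumed)
K = zeros(size(n));
relErr = zeros(size(n));
for i = 1:numel(n)
    A = hilb(n(i));
    xex = ones(n(i),1);                  % chosen exact solution (assumption)
    b = A*xex;
    x = A\b;                             % Matlab may warn about ill-conditioning
    K(i) = cond(A);
    relErr(i) = norm(x - xex, inf)/norm(xex, inf);
end
semilogy(n, K, 'ko-', n, relErr, 'bs-', n, u*K, 'r--')
xlabel('n'); legend('K(A)', 'relative error', 'u K(A)', 'Location', 'northwest')
```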
Appendix F

Essentials of Numerical Approximation

Abstract
In this appendix, we review the basic elements of approximation theory of a given
continuous function f : Ω = [a, b] → R using piecewise polynomials of degree
r ≥ 1.

F.1 Interpolation
Let us introduce a partition $\mathcal{T}_h$ of $\Omega$ into a number $M_h \ge 1$ of 1-simplices (intervals)
$K_i = [x_i, x_{i+1}]$, $i = 1, \dots, M_h$, in such a way that $x_1 := a$ and $x_{M_h+1} := b$. We denote
by $h_i := x_{i+1} - x_i$ the length of each interval and set $h := \max_{K_i \in \mathcal{T}_h} h_i$. The partition
$\mathcal{T}_h$ takes the name of triangulation of the domain $\Omega$. Each $K_i$ is an element of
the triangulation, while the quantities $x_i$, $i = 1, \dots, M_h + 1$, are the vertices of the
triangulation. The same terminology is adopted when $\Omega$ is a bounded set of $\mathbb{R}^d$,
$d \ge 2$, and in such a case 2-simplices are triangular elements ($d = 2$) while
3-simplices are tetrahedral elements ($d = 3$). We associate with $\mathcal{T}_h$ the following
space of functions

$$X_h^r(\mathcal{T}_h) := \left\{ v_h \in C^0(\Omega) : v_h|_{K_i} \in \mathbb{P}_r(K_i) \ \ \forall K_i \in \mathcal{T}_h \right\}. \tag{F.1}$$

Unless strictly necessary, we write $X_h^r$ instead of $X_h^r(\mathcal{T}_h)$. The space $X_h^r$ is the set of
continuous piecewise polynomials over $\Omega$, of degree $\le r$ over each element $K_i$ of
$\mathcal{T}_h$, $i = 1, \dots, M_h$. $X_h^r$ is called the finite element space of degree $r$ associated with
$\mathcal{T}_h$, and its dimension depends on both $h$ and $r$. Counting the $r + 1$ degrees of freedom
of each element and subtracting one continuity constraint for each of the $M_h - 1$ internal vertices yields

$$\dim(X_h^r) = M_h (r + 1) - (M_h + 1 - 2) = r M_h + 1 \equiv N_h.$$

Fig. F.1 shows an example of $\mathcal{T}_h$ (with $M_h = 4$) and of a function $v_h$ of the finite element space in the cases $r = 1$ and $r = 2$.

Figure F.1: Triangulation in 1D and finite element functions (r = 1 and r = 2).

Black bullets denote nodal values, while black ticks and white squares are the
positions of the degrees of freedom of $X_h^r$. In the case r = 1 the vertices and the
nodes coincide, while in the case r = 2 there are more nodes than vertices.

Remark F.1.1. We notice that it is necessary to introduce an increasingly larger
number of degrees of freedom when the polynomial degree becomes larger than
1. In general, for a given $r$, we need $r - 1$ internal nodes for each element in
order to construct the restriction of the finite element function $v_h|_{K_i}$. For example,
in the case $r = 2$, we can choose as internal node the midpoint of $K_i$ for each
$i = 1, \dots, M_h$, and so on, for $r \ge 3$.

F.1.1 Basis functions

Since $X_h^r$ is a vector space of finite dimension $N_h$, any function $v_h \in X_h^r$ can be written in the following
form
$$v_h(x) = \sum_{j=1}^{N_h} v_j \varphi_j(x) \tag{F.2}$$

where:

• $\varphi_j$, $j = 1, \dots, N_h$: basis functions of $X_h^r$;

• $v_j$: degrees of freedom of $v_h$, that is, the coordinates of $v_h$ with respect to
the basis $\{\varphi_j\}$.

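A particularly interesting choice for the basis of $X_h^r$ is given by the so-called Lagrangian
basis functions, such that

$$\varphi_i(x_j) = \delta_{ij}, \qquad i, j = 1, \dots, N_h, \tag{F.3}$$

where $x_j$, $j = 1, \dots, N_h$, are the nodes and $\delta_{ij}$ is the Kronecker symbol. This property
allows us to interpret the quantities $v_i$ as the nodal values of $v_h$ at the nodes of $\mathcal{T}_h$, i.e.

$$v_h(x_i) = \sum_{j=1}^{N_h} v_j \varphi_j(x_i) = \sum_{j=1}^{N_h} v_j \delta_{ji} = v_i, \qquad i = 1, \dots, N_h.$$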
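Matlab coding. For r = 1, property (F.3) can be verified numerically with the sketch below, in which each Lagrangian ("hat") basis function is obtained by interpolating linearly one column of the identity matrix. The interval [0, 1] and the value Mh = 4 are illustrative choices made here.

```matlab
% Lagrangian basis functions for r = 1 on a uniform triangulation.
a = 0; b = 1; Mh = 4;                  % illustrative data (assumption)
xn = linspace(a, b, Mh+1);             % nodes (= vertices for r = 1)
Nh = numel(xn);
Phi = interp1(xn, eye(Nh), xn);        % Phi(j,i) = phi_i(x_j)
xx = linspace(a, b, 201);
PhiFine = interp1(xn, eye(Nh), xx);    % column i samples phi_i on a fine grid
plot(xx, PhiFine); xlabel('x');
errDelta = norm(Phi - eye(Nh), inf);   % deviation from the Kronecker symbol
```

Since interp1 treats each column of a matrix of values separately, column i of PhiFine contains the samples of the i-th basis function.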

Example F.1.2 (Basis for r = 1 and r = 2). In the case r = 1, two basis functions
associated with the triangulation Th are plotted in Fig. F.2 (top), while Fig. F.2
(bottom) refers to the case r = 2.
Remark F.1.3 (Support of basis functions). For any basis function $\varphi_i$, we define
the support of $\varphi_i$ as the subset of $\Omega$ on which the function is nonvanishing. We
notice that in the case r = 1 the support of a basis function associated with an
internal node $x_i$ is given by the union of $K_{i-1}$ and $K_i$. In the case r = 2, instead,
we have two kinds of basis functions, those associated with the midpoint of each
element (white square) and those associated with a vertex (black bullet). The first
kind of function (called “bubble” function) is localized within the element and
its support is given by $K_i$, while the second kind of function has a support given by the
union of $K_{i-1}$ and $K_i$.

Figure F.2: Basis functions. Top: r = 1, bottom: r = 2. In the first case, $\dim(X_h^r) = 5$, while in the second case we have $\dim(X_h^r) = 9$.

F.1.2 Finite element interpolation and error analysis

We are now ready to use the finite element basis functions of $X_h^r$ to construct a
piecewise polynomial approximation of a given continuous function $f$. Precisely,
let us introduce the interpolation operator $\Pi_h^r : C^0(\Omega) \to X_h^r$ defined as

$$\Pi_h^r f(x) := \sum_{j=1}^{N_h} f(x_j) \varphi_j(x) \tag{F.4}$$

and such that

$$\Pi_h^r f(x_i) = \sum_{j=1}^{N_h} f(x_j) \varphi_j(x_i) = \sum_{j=1}^{N_h} f(x_j) \delta_{ji} = f(x_i), \qquad i = 1, \dots, N_h. \tag{F.5}$$

The $N_h$ relations (F.5) are the interpolation conditions that allow us to uniquely
characterize the interpolant $\Pi_h^r f \in X_h^r$ associated with the given function $f$.
Example F.1.4. Let $f(x) = \sin(x)$, $[a, b] = [0, 10]$, $r = 1$ and $M_h = 10$.
Matlab coding. The following Matlab commands can be used to compute $\Pi_h^r f$
and to plot it together with the function $f$.
xn = 0:10; yn = sin(xn);                  % nodes and nodal values
xi = 0:.25:10; yi = interp1(xn,yn,xi);    % piecewise linear interpolant
xx =0:0.0001:10; yy = sin(xx);            % fine sampling of f
plot(xn,yn,'*',xi,yi,xx,yy)
xlabel('x');
legend('f(x_i)','\Pi_h^1 f(x)','f(x)')

Figure F.3: Piecewise linear continuous interpolation of the function f (x) = sin(x)
over [0, 10].

The interpolation error is defined as

$$E_h^r f(x) := f(x) - \Pi_h^r f(x). \tag{F.6}$$
By definition, we have
$$E_h^r f(x_i) = 0, \qquad i = 1, \dots, N_h.$$
The next question is to characterize the accuracy of the piecewise polynomial
approximation over the whole domain $\Omega$ as a function of $h$. For any continuous
function $g = g(x)$, we define the maximum norm as (cf. (C.12) in the case k = 0)
$$\|g\|_\infty := \max_{x \in [a,b]} |g(x)|. \tag{F.7}$$

Theorem F.1.5 (Interpolation error). Let us assume that $f \in C^{r+1}(\Omega)$, $r \ge 1$.
Then, there exists a positive constant $C$ independent of $h$ such that
$$\|E_h^r f\|_\infty \le C h^{r+1} \|f^{(r+1)}\|_\infty, \tag{F.8}$$
where $f^{(s)}$ denotes the $s$-th derivative of $f$ with respect to $x$, $s \ge 0$.
According to Def. D.9, Thm. F.1.5 tells us that $\Pi_h^r f$ converges uniformly to $f$
with order $r + 1$ with respect to the discretization parameter $h$, provided that $f$ is
sufficiently smooth. Moreover, the following corollary holds.
Corollary F.1.6 (Interpolation error for the derivative of f). Under the same
regularity assumption as in Thm. F.1.5, there exists a positive constant $C$ independent
of $h$ such that
$$\|(E_h^r f)'\|_\infty \le C h^r \|f^{(r+1)}\|_\infty. \tag{F.9}$$

This result tells us that the interpolation operator can be used also for the
approximation of the derivative of a function f, but that, in such a case, the order of
convergence is decreased by one with respect to the approximation of the function.
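Matlab coding. The order of convergence stated by Thm. F.1.5 can be estimated experimentally in the case r = 1 with the sketch below. The function f(x) = sin(x) on [0, 10] is the one of Ex. F.1.4, while the sequence of values of Mh and the fine sampling grid used to approximate the maximum norm are choices made here for illustration.

```matlab
% Experimental order of convergence of piecewise linear interpolation.
f = @(x) sin(x);
a = 0; b = 10;
Mh = [10 20 40 80 160];                % number of elements (assumption)
h = (b - a)./Mh;
xx = linspace(a, b, 20001);            % fine grid approximating the max norm
err = zeros(size(Mh));
for k = 1:numel(Mh)
    xn = linspace(a, b, Mh(k)+1);      % nodes of the uniform triangulation
    err(k) = max(abs(f(xx) - interp1(xn, f(xn), xx)));
end
order = log2(err(1:end-1)./err(2:end)); % estimated order, halving h each time
disp([h' err' [NaN; order']])
```

Since h is halved at each step, an order r + 1 = 2 corresponds to the error being reduced by a factor of about 4 from one row to the next.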

F.2 Quadrature
The interpolation polynomial can be profitably used for the approximate evaluation
of the integral $I(f) := \int_a^b f(x)\,dx$, obtaining the following quadrature formula

$$I_h^r(f) = \int_a^b \Pi_h^r f(x)\,dx \simeq I(f). \tag{F.10}$$

As computing the integral of a polynomial is easy, the above relation gives an
explicit formula for the approximation of $I(f)$. For simplicity, we assume that the triangulation of $[a, b]$
is uniform, i.e., $h = (b - a)/M_h$. By doing so, depending on the polynomial degree
$r$, formula (F.10) yields the so-called Newton-Cotes quadrature rules.
Example F.2.1. The lowest-order case r = 1 corresponds to the trapezoidal rule.
The approximate area computed by the quadrature rule is geometrically represented
by the sum of the areas of the trapezoids $S_i$, $i = 1, \dots, M_h$ (see
Fig. F.4).

Figure F.4: Trapezoidal quadrature rule.

In the case r = 2, the resulting formula is called Cavalieri-Simpson quadrature rule.
It is also interesting to consider the case of a zero-order approximation (r =
0). In such a case, the resulting formula is called midpoint quadrature and is
geometrically represented in Fig. D.2.
In view of the error analysis of (F.10), some definitions are in order.

Definition F.2.2 (Quadrature error). If the error $e$ associated with a quadrature
formula can be bounded in the form
$$|e| \le C h^p \|f^{(q)}\|_\infty,$$
then we say that:
• the error is infinitesimal of order $p$ with respect to the discretization parameter $h$;
• the formula has a degree of exactness (or precision) equal to $s := q - 1$. As
a matter of fact, if $f \in \mathbb{P}_{q-1}$ (i.e., if $f$ is a polynomial of degree $\le q - 1$),
then $f^{(q)}(x) = 0$ for all $x \in [a, b]$ and $e = 0$. This means that the formula is
exact if applied to polynomials of degree up to $q - 1$.
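Theorem F.2.3 (Quadrature error). Let $f \in C^2([a, b])$. Then
$$\begin{aligned}
|I(f) - I_h^0(f)| &\le \frac{b - a}{24} h^2 \|f^{(2)}\|_\infty &&\Rightarrow\ p = 2,\ s = 1,\\
|I(f) - I_h^1(f)| &\le \frac{b - a}{12} h^2 \|f^{(2)}\|_\infty &&\Rightarrow\ p = 2,\ s = 1.
\end{aligned} \tag{F.11}$$
If $f \in C^4([a, b])$, then
$$|I(f) - I_h^2(f)| \le \frac{b - a}{90 \cdot 2^5} h^4 \|f^{(4)}\|_\infty \ \Rightarrow\ p = 4,\ s = 3. \tag{F.12}$$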
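Matlab coding. The estimates (F.11)-(F.12) can be checked experimentally with the sketch below, which applies the composite midpoint, trapezoidal and Cavalieri-Simpson rules on a uniform triangulation. The integrand exp(x) on [0, 1] and the values of Mh are illustrative choices made here; for this integrand the maximum norm of every derivative equals exp(1).

```matlab
% Composite midpoint (r=0), trapezoidal (r=1) and Cavalieri-Simpson (r=2) rules.
f = @(x) exp(x); a = 0; b = 1;         % illustrative integrand (assumption)
Iex = exp(1) - 1;                      % exact integral
Mh = [4 8 16 32];
hvec = (b - a)./Mh;
E = zeros(3, numel(Mh));
for k = 1:numel(Mh)
    h = hvec(k);
    x = linspace(a, b, Mh(k)+1);       % vertices
    xm = (x(1:end-1) + x(2:end))/2;    % element midpoints
    Imp = h*sum(f(xm));
    Itr = h*(sum(f(x)) - (f(a) + f(b))/2);
    Isi = (h/6)*(f(a) + f(b) + 2*sum(f(x(2:end-1))) + 4*sum(f(xm)));
    E(:,k) = abs(Iex - [Imp; Itr; Isi]);
end
p = log2(E(:,1:end-1)./E(:,2:end));    % estimated orders, one row per rule
bound = exp(1)*(b - a)*[hvec.^2/24; hvec.^2/12; hvec.^4/(90*2^5)];
disp(p)
```

Each row of p refers to one rule (midpoint, trapezoidal, Cavalieri-Simpson), while bound collects the corresponding right-hand sides of (F.11)-(F.12).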
Remark F.2.4 (Gaussian quadratures). More generally, a quadrature formula for
the approximate evaluation of $I(f)$ can be written as a weighted sum
$$I_h(f) = \sum_{i=1}^{N_h} w_i f(x_i)$$

where $x_i$ and $w_i$ are called nodes and weights of the considered quadrature formula.
The optimal choice of such nodes and weights is a classical (and nontrivial)
problem. If the number of nodes over each element $K_i$ is a fixed quantity, say
equal to $n + 1$, for $n \ge 0$ (not too large), then it is possible to uniquely determine
the nodes $x_j \in K_i$ in such a way that $s = 2n + 1$, i.e., maximum degree of exactness.
The resulting formulae are called Gaussian quadrature rules. An important property
of Gaussian quadratures is that $w_i > 0$, $i = 1, \dots, N_h$, which has the remarkable
consequence of yielding numerically stable formulae. This property does not hold for
Newton-Cotes formulae of high order, for which some weights $w_i$ turn out to be
negative when $n$ is large enough (for closed formulae, this first occurs for $n = 8$).

Example F.2.5. Two simple examples of Gaussian quadratures are those with:

• $n = 0$ (one quadrature node for each $K_i$), for which $N_h = M_h$, $p = 2$ and $s = 1$;

• $n = 1$ (two quadrature nodes for each $K_i$), for which $N_h = 2M_h$, $p = 4$ and $s = 3$.
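Matlab coding. The degrees of exactness quoted in Ex. F.2.5 can be checked with the sketch below. For n = 1 the two nodes on each element are taken as the images of the Gauss-Legendre points ±1/√3 of the reference interval [−1, 1], each with weight h/2; the interval, the number of elements and the test polynomials are illustrative choices made here.

```matlab
% Composite Gaussian quadratures with n = 0 and n = 1 nodes per element.
a = 0; b = 1; Mh = 3;                    % illustrative data (assumption)
x = linspace(a, b, Mh+1);
h = diff(x);                             % element lengths
xm = (x(1:end-1) + x(2:end))/2;          % element midpoints
xi = 1/sqrt(3);                          % Gauss-Legendre point on [-1,1]
gauss1 = @(f) sum(h.*f(xm));                                      % n = 0
gauss2 = @(f) sum((h/2).*(f(xm - xi*h/2) + f(xm + xi*h/2)));      % n = 1
lin = @(x) 2*x + 1;   Ilin = 2;          % polynomial of degree 1
cub = @(x) x.^3;      Icub = 1/4;        % polynomial of degree 3
errLin1 = abs(gauss1(lin) - Ilin);       % n = 0 on degree 1
errCub1 = abs(gauss1(cub) - Icub);       % n = 0 on degree 3
errCub2 = abs(gauss2(cub) - Icub);       % n = 1 on degree 3
fprintf('%.2e  %.2e  %.2e\n', errLin1, errCub1, errCub2);
```

Up to round-off, the one-node formula integrates the linear polynomial exactly but not the cubic one, whereas the two-node formula is exact also for the cubic, in agreement with s = 1 and s = 3.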
