Introduction To Applied Mathematics Math 46
Department of Mathematics
Dartmouth College, NH, USA
Email: [email protected]
Script for
Math 46 (Spring 2020)
Introduction to Applied Mathematics
This course provides an introduction to the field of Applied Mathematics. In particular, the emphasis
is upon mathematical tools to describe and analyze real-world phenomena, which play a central role,
for instance, in the applied and natural sciences. However, the focus will lie 'only' on models resulting
in ordinary differential and integral equations. Still, this course requires you (the student or any other
reader) to already be familiar with basic concepts from ordinary differential equations as well as linear
algebra. I would like to point out that this script as well as the corresponding lectures are mainly based
on the first four chapters of the third edition of Logan's monograph [Log13]. Moreover, for the second
chapter of this script (Dimensional Analysis and Scaling), I also draw inspiration from the lecture notes
[Her19] of my former colleague Michael Herrmann from TU Braunschweig in Germany.
Table of Contents
3 Asymptotic analysis
3.1 The Bachmann-Landau notation
4 Perturbation Methods
4.1 Regular Perturbation
4.2 Pitfalls of Regular Perturbation
7 Calculus of Variations
7.1 Variational Problems
7.2 Necessary conditions for extrema
7.3 The Euler–Lagrange equation
7.4 Some Special Cases
7.5 Outlook on Possible Generalizations
7.5.1 Higher Derivatives
7.5.2 Several Functions
7.5.3 Natural Boundary Condition
8 Orthogonal Expansions
8.1 Best Approximations in Inner Product Spaces
8.2 The Generalized Fourier Series
8.3 Mean-Square Convergence
8.4 Classical Fourier Series
9 Sturm–Liouville Problem
Bibliography
Chapter 1
Introduction - What is Applied Mathematics?
Applied mathematics is a broad subject area dealing with the description, analysis, and prediction of
real-world phenomena. It is more than a set of methods that can be used to solve equations that
come from physics, engineering, and other applied sciences. In fact, applied mathematics is also about
mathematical modeling and an entire process that intertwines with physical reality. In this course, by
a mathematical model (sometimes just called a model) we usually refer to an equation that describes
some physical problem or phenomenon. Moreover, by mathematical modeling we mean the process
of formulating, analyzing, and potentially refining a mathematical model. In particular, this process
includes the following steps:
1. Formulating a mathematical model for the underlying physical problem
2. Analyzing and solving the model
3. Comparing the solution of the model to real-world data and interpreting the results
4. If necessary, revising the model until it describes the underlying physical problem sufficiently
accurately
Hence, mathematical modeling involves physical intuition, formulation of equations, solution methods,
and analysis. Please note that solution methods can be of either analytical or numerical nature. In
this course, however, we will restrict ourselves to analytical methods (the ones involving the use of pen
and paper) rather than using numerical methods (the ones involving the use of computers). Moreover,
you (the student) are expected to already be familiar with basic solution methods for ordinary differen-
tial equations, which will be the class of mathematical models we will focus on in this course. Finally,
let us — at least loosely — agree on what we demand from a 'good' model:
• It should be predictive
Here, the last criterion means that a model should not just reflect all known features of a real-
world process but should also be able to predict previously unknown mechanisms or phenomena. A
prominent example is gravitational waves, which were predicted as early as
1916 by Einstein [Ein05, Ein18] on the basis of his general theory of relativity (a mathematical model)
and directly observed only in 2016 by the LIGO and Virgo Scientific Collaboration (resulting in the
2017 Nobel Prize in Physics). Another example of current significance is the spread of diseases, which
can often be modeled by reaction-diffusion equations. In particular, applied mathematics can therefore
help us determine which parameters and processes are important (e. g. washing hands) and which are
unimportant (e. g. buying toilet paper).
¹ Of course, the model needs to be sufficiently complex to include all relevant mechanisms of the underlying real-world
process. Yet, when we have two different models at hand which both describe the same process equally well, we should
always prefer the simpler model. This principle is often referred to as Ockham's razor [Ari76].
Chapter 2
Dimensional Analysis and Scaling
2.1 Physical Dimensions and Units

As the name indicates, basis dimensions are independent or fundamental dimensions, from
which other dimensions can be obtained. Table 2.1 lists the seven basis dimensions as well as their
corresponding symbols and units. We should note that there can be many possible units for measuring
a certain dimension. For instance, length can be measured not just by the unit meter but also by
millimeter, kilometer, inches, foot, mile, and many more. In this course, however, we will only use SI
units (more commonly referred to as metric units) from the International System of Units (also Le
Système International d'Unités) [oWMTT01]. The SI units corresponding to the seven basis dimensions
can be found in Table 2.1 as well.
All other dimensions can be derived as combinations (powers and products) of these seven basis
dimensions and are therefore referred to as derived dimensions. By way of example, all of you will
already know that speed is measured, for instance, by dividing a certain number of meters (or any other
unit for length, e. g. miles) by a certain number of seconds (or any other unit for time, e. g. hours). In
the same manner, many more dimensions can be derived from the seven basis dimensions. Some of the
more important ones, together with their corresponding SI units and symbols, can be found in Table 2.2.
Table 2.2 also lists how the respective dimension is derived from the basis dimensions or some other
derived dimensions.
Henceforth, given a quantity x, we denote by [x] the dimension of x.
Note that often a mathematical model, x = y, only makes sense if both sides of the equation have the
same dimensions, that is [x] = [y]. Speaking plainly, we might not want to accidentally compare apples
and oranges!
Basis dimensions
dimension                             SI unit         symbol
mass                                  kg (kilogram)   m (sometimes M)
length                                m (meter)       L (sometimes l)
time                                  s (second)      t (sometimes T)
temperature                           K (kelvin)      T (sometimes θ)
electric current                      A (ampere)      I
amount of light (luminous intensity)  cd (candela)    C (sometimes I)
amount of matter                      mol (mole)      n (sometimes µ)
real numbers                          none            1 (dimensionless)
Table 2.1: The seven basis dimensions as well as their corresponding SI units and symbols. Note that
real numbers are dimensionless.
Derived dimensions
dimension      SI unit                          symbol
speed          m/s (meter per second)           s = L/t
acceleration   m/s² (meter per square second)   a = L/t²
force          N (Newton)                       F = m·L/t²
pressure       Pa (Pascal)                      p = F/L²
Table 2.2: Some derived dimensions as well as their corresponding SI units and symbols. This list might
grow throughout the course!
2.2 Dimensional Analysis
Example 2.2.1

To demonstrate the flavor of the concepts discussed in this section, let us consider a calculation
made by the British applied mathematician Taylor in the late 1940s. After the first atomic bomb
test, and viewing photographs of the spread of its fireball, Taylor wanted to compute its yield (energy
released). From the photographs it becomes clear that such an explosion results in a spherical blast
wave front. Taylor then argued that there is a relation between the radius r of this blast wave,
time t, the initial air density ρ, and the energy released E. Hence, he assumed that there should
be a simple physical law of the form

r^α t^β ρ^γ E^δ = C,    (2.1)
for some C ∈ R. All quantities as well as their corresponding dimensions can be found in Table
2.3. Note that we have a dimensionless quantity at the right hand side of (2.1) (remember that
[C] = 1)! Thus, for (2.1) to make sense, also the dimensions of the quantities on the left hand
side have to cancel each other out, i. e.

[r^α t^β ρ^γ E^δ] = 1.
Consulting Tables 2.1 and 2.2 from Chapter 2.1, this results in

L^α t^β (m L^−3)^γ (m L² t^−2)^δ = 1.

Next, since all dimensions have to cancel out, this provides us with a system of linear equations,

α − 3γ + 2δ = 0   (for L),
β − 2δ = 0        (for t),
γ + δ = 0         (for m).

Choosing δ = 1, this system is solved by α = −5, β = 2, and γ = −1, so that the physical law (2.1)
becomes

t² E / (r⁵ ρ) = C,    (2.2)
for some C ∈ R. In particular, we get

r = C (E t² / ρ)^(1/5).    (2.3)
That is, just from dimensional reasoning (and the physical assumption (2.1)), it can be shown
that the radius of the blast wave depends on the two-fifths power of time. It should be pointed
out that in this example the constant C depends on the dimensionless ratio of specific heats.ᵃ
Moreover, ρ is usually known as well. Finally, the initial energy yield can be calculated by fitting
the curve (2.3) to experimental data of r versus t.
ᵃ Taylor used the value C = 1.
quantity                  dimension
r (blast wave radius)     L
t (time)                  t
ρ (initial air density)   m · L⁻³
E (energy released)       m · L² · t⁻²
Table 2.3: The quantities and corresponding dimensions used in Example 2.2.1.
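The exponents in Taylor's ansatz can also be found mechanically: as discussed in the remainder of this section, dimensionless monomials correspond to null-space vectors of the transposed dimension matrix. The following is a minimal sketch, assuming SymPy is available; it recovers the exponents of Example 2.2.1 up to a common multiple, and all variable names are our own illustration.

```python
# Sketch (SymPy assumed): recover the exponents in Taylor's ansatz
# r^alpha t^beta rho^gamma E^delta = C from the dimension vectors.
from sympy import Matrix

# Rows: dimension vectors of r, t, rho, E with respect to (m, L, t).
A = Matrix([
    [0, 1, 0],    # [r]   = L
    [0, 0, 1],    # [t]   = t
    [1, -3, 0],   # [rho] = m * L^-3
    [1, 2, -2],   # [E]   = m * L^2 * t^-2
])

# A monomial with exponent vector beta is dimensionless iff beta*A = 0,
# i.e. iff beta lies in the null space of A^T.
for v in A.T.nullspace():
    print(list(v))  # a multiple of [-5, 2, -1, 1], i.e. t^2*E/(r^5*rho)
```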
We call a vector of dimension symbols (E1, . . . , Em) independent if

E1^α1 · · · Em^αm = 1

holds only for the trivial vector α = (α1, . . . , αm) = (0, . . . , 0) ∈ R^m.
Examples 2.2.3

(a) Let us consider the 3-tuple (m, L, s) containing the dimension symbols for mass, length, and
speed. Consulting Table 2.2 we can note that

s = L · t⁻¹

and therefore

m^α1 L^α2 s^α3 = m^α1 L^(α2+α3) t^(−α3).

Obviously, only the trivial vector α = (0, 0, 0) satisfies m^α1 L^α2 s^α3 = 1. Hence, (m, L, s)
is independent.

(b) Next, let us consider the 3-tuple (m, L, ρ) containing the dimension symbols for mass, length,
and density. Consulting Table 2.3 we note that

ρ = m · L⁻³

and therefore

m^α1 L^α2 ρ^α3 = m^(α1+α3) L^(α2−3α3).

It is easy to note that there are non-trivial vectors α = (α1, α2, α3) satisfying m^α1 L^α2 ρ^α3 = 1.
These are given by the infinitely many solutions α = (−α3, 3α3, α3) with α3 ∈ R of the system
of linear equations

α1 + α3 = 0    (for m),
α2 − 3α3 = 0   (for L).

Hence, (m, L, ρ) is not independent.
We say that the quantity q can be represented with respect to the vector of dimension symbols
(E1, . . . , Em) if

[q] = E1^α1 · · · Em^αm

for some vector α = (α1, . . . , αm) ∈ R^m. The vector α is often called the dimension vector of q
with respect to (E1, . . . , Em).
Example 2.2.5

We note that there can be different representations of quantities. Let us consider the quantity q
measuring energy, i. e., [q] = E. Consulting Table 2.2, q can be represented with respect to (F, L),
since

[q] = E = F · L (force times length).    (2.4)

The corresponding dimension vector is given by α = (1, 1). Yet, at the same time q can be
represented with respect to (m, t, L). Note that force can be derived by

F = m · L · t⁻²

and therefore

[q] = E = m · t⁻² · L².

This time, the corresponding dimension vector is given by α = (1, −2, 2).
Let q be a quantity which can be represented with respect to the vector of dimension symbols
(E1, . . . , Em). The dimension vector α = (α1, . . . , αm) is unique if and only if (E1, . . . , Em) is
independent.
Let q1, . . . , qn be quantities that can be represented with respect to (E1, . . . , Em), and let
αi = (αi,1, . . . , αi,m) denote the dimension vector of qi. The matrix

A := ⎛ α1,1 · · · α1,m ⎞
     ⎜  ⋮          ⋮  ⎟ ∈ R^(n×m)
     ⎝ αn,1 · · · αn,m ⎠

is called the dimension matrix of (q1, . . . , qn) with respect to (E1, . . . , Em). The rank of A,
rank(A) = maximal number of linearly independent columns, is referred to as the dimension rank.
Example 2.2.8

Revisiting Example 2.2.1, let us consider the vector of quantities (ρ, E) containing the density ρ
and the energy E. Consulting Table 2.3, these can be represented with respect to the independent
dimension symbols (m, L, t), denoting the dimensions mass, length, and time:

[ρ] = m¹ · L⁻³ · t⁰,
[E] = m¹ · L² · t⁻².

Hence, the dimension matrix of the vector of quantities (ρ, E) with respect to (m, L, t) is given
by

A = ⎛ 1 −3  0 ⎞
    ⎝ 1  2 −2 ⎠

and the dimension rank is rank(A) = 2.
Lemma 2.2.10

Let q1, . . . , qn be quantities that can be represented with respect to (E1, . . . , Em), with dimension
matrix A. Then every generalized monomial

y = q1^β1 · · · qn^βn,  β = (β1, . . . , βn) ∈ R^n,

has the dimension vector βA. In particular, y is dimensionless ([y] = 1) if and only if

β1 α1,j + · · · + βn αn,j = 0

for all j = 1, . . . , m.

Proof. Since [y] = [q1]^β1 · · · [qn]^βn, the exponent of Ej in [y] is β1 α1,j + · · · + βn αn,j. Hence,
denoting the dimension vector of y with respect to (E1, . . . , Em) by γ = (γ1, . . . , γm), we imme-
diately get

γ = (β1, . . . , βn) A = βA

and therefore the assertion.
Despite its simplicity, Lemma 2.2.10 comes with some strong consequences. In particular, the di-
mension of the dimension matrix's kernel tells us how many dimensionless quantities can be constructed
as generalized monomials of the physical quantities (q1, . . . , qn). Moreover, this also tells us how many
constants, which need to be determined by actual measurements, will appear in a physical law. This is
summarized in the following theorem.
Theorem 2.2.11

Let q1, . . . , qn be quantities that can be represented with respect to (E1, . . . , Em), let A denote the
corresponding dimension matrix with dimension rank r = rank(A), and let k = n − r. Then, for every
generalized monomial f of (q1, . . . , qn), there are

(a) r indices (i1, . . . , ir) and

(b) k dimensionless quantities (π1, . . . , πk) that can be formed from (q1, . . . , qn),

as well as generalized monomials g : R^r → R and h : R^k → R, such that

f (q1, . . . , qn) = g(qi1 , . . . , qir ) · h(π1, . . . , πk)    (2.5)

holds.
Proof. Without loss of generality, let 0 < r < n and 0 < k < n. The cases r = 0 (k = n) and r = n
(k = 0) can be treated analogously. The proof is done in three steps:

1. From linear algebra we know that the row space of A, i. e. the span of the dimension vectors, is an
r-dimensional subspace of R^m. Thus, there are r indices n1, . . . , nr such that

span{α1, . . . , αn} = span{αn1 , . . . , αnr },

where αi = (αi,1, . . . , αi,m) denotes the dimension vector of qi with respect to (E1, . . . , Em).
Without loss of generality, let us assume that

n1 = 1, . . . , nr = r.

Otherwise, we could just change the order of the qi.
2. Next, note that every dimension vector αr+j , j = 1, . . . , k, can be written as a linear combination
of α1, . . . , αr:

αr+j = γj,1 α1 + · · · + γj,r αr.

In particular, this implies

[qr+j] = E1^αr+j,1 · · · Em^αr+j,m
       = E1^(γj,1 α1,1 + ··· + γj,r αr,1) · · · Em^(γj,1 α1,m + ··· + γj,r αr,m)
       = (E1^α1,1 · · · Em^α1,m)^γj,1 · · · (E1^αr,1 · · · Em^αr,m)^γj,r
       = [q1^γj,1 · · · qr^γj,r].
3. Hence, for every j = 1, . . . , k, the quantity

πj := qr+j / (q1^γj,1 · · · qr^γj,r)

is dimensionless. In fact, these are the k dimensionless quantities predicted by (b) in Theorem
2.2.11. Writing the generalized monomial f as f (q1, . . . , qn) = q1^β1 · · · qn^βn and regrouping,
we obtain

f (q1, . . . , qn) = (q1^δ1 · · · qr^δr) · (π1^βr+1 · · · πk^βr+k),

where δi = βi + (γ1,i βr+1 + · · · + γk,i βr+k) for i = 1, . . . , r. Finally, the generalized monomials
g : R^r → R and h : R^k → R are therefore given by

g(q1, . . . , qr) := q1^δ1 · · · qr^δr   and   h(π1, . . . , πk) := π1^βr+1 · · · πk^βr+k.
Remark 2.2.12
(a) The indices (i1 , . . . , ir ) and the dimensionless quantities (π1 , . . . , πk ) might not be unique.
However, their numbers, rank(A) and k = n − r, are.
(b) For fixed (i1 , . . . , ir ) and (π1 , . . . , πk ), the generalized monomials g : Rr → R and h : Rk → R
are uniquely determined up to a multiplicative constant.
The dimensional arguments occurring in the proof of Theorem 2.2.11 are often used in the natural
and engineering sciences. There, they are not just applied to generalized monomials f, however, but also
to much more general physical laws. In particular, we note the following principle.
Theorem 2.2.13: Buckingham's Pi theorem

Suppose n physical quantities (q1, . . . , qn) with dimension matrix A of rank r are related by a
physical law

f (q1, . . . , qn) = 0,

and let k = n − r. Then there are

(a) k dimensionless quantities (π1, . . . , πk) that can be formed from (q1, . . . , qn) and

(b) a function F : R^k → R

such that the above physical law is equivalent to the reduced law

F (π1, . . . , πk) = 0.
Proof. The proof of Theorem 2.2.13 does not only involve mathematical but also physical arguments
and is omitted.

Like every good physical principle, Buckingham's principle is both easy to understand and
powerful. To demonstrate this, let us consider an example.
Example 2.2.14

Let us consider a very thin and one-sided infinite metal bar with constant temperature zero. At
time t0 = 0 an amount of temperature U is assumed to be concentrated at the finite end of the
bar, let's say at x = 0 with x ∈ [0, ∞). You could imagine the bar being pushed against a heater.
Then, the problem is to determine the temperature u as a function of the distance to the source
x, time t, the source's temperature U, and a constant κ describing the thermal diffusivity of the
metal bar. All involved quantities can be found in Table 2.4. Of course, the problem could be
formulated as a partial differential equation, namely the heat equation ∂t u − κ ∂x² u = 0 with zero
initial condition and boundary condition u = U at x = 0. Yet, we want to use this example to
demonstrate that already dimensional arguments, in particular the Pi theorem, can provide us
with surprisingly deep insights into the above described physical process. We conjecture a physical
law of the form

f (u, t, x, U, κ) = 0,    (2.6)

which relates the five physical quantities u, t, x, U, and κ. Next, for our dimensional analysis of
(2.6), we proceed in three steps:

1. Find a minimal set of independent dimensions (E1, . . . , Em) by which the physical quantities
can be represented.

On 1. Consulting Table 2.4, a suitable choice are the three dimensions temperature (T), time (t),
and length (L). For this choice, (E1, . . . , Em) = (T, t, L), we have

[u] = T,  [t] = t,  [x] = L,  [U] = T,  [κ] = L² · t⁻¹.
quantity                    dimension
u (temperature)             T
t (time)                    t
x (distance to the source)  L
U (source temperature)      T
κ (thermal diffusivity)     L² · t⁻¹
Table 2.4: The quantities and corresponding dimensions used in Example 2.2.14.
2.3 Scaling
So far, dimensional analysis — especially Buckingham's principle — allowed us to reduce mathematical
models, potentially involving many physical quantities q1, . . . , qn, to an equivalent model relating a usually
smaller number of dimensionless quantities π1, . . . , πk. Essentially, this tells us which quantities to use
in a model. Scaling (also referred to as non-dimensionalization) is of a similar flavor. The goal is to find
appropriate dimensionless scales for the involved variables.
Suppose that time t is a quantity in a given model. If this model describes, for instance, the
motion of a glacier, clearly the unit of seconds is too fast. Significant changes in the glacier could
not be observed on the order of seconds. On the other hand, if the problem involved a nuclear
reaction, the unit of seconds would be too slow. This time, all of the important actions would be
over before the first second has ticked.
Evidently, every problem has an intrinsic time scale, or characteristic quantity tc, which is the
shortest time for which discernible changes can be observed in the physical quantities. Some processes
might even have multiple time scales. Once a characteristic time tc has been identified, at least for a
part of the process, a new dimensionless variable t̄ can be defined by

t̄ := t / tc.

If tc is chosen correctly, the dimensionless time t̄ is neither too large nor too small. After characteristic
quantities have been chosen, the model can then be reformulated in terms of the new dimensionless
quantities. The result will be a model in dimensionless form, where all the variables and parameters
are dimensionless and roughly of the same magnitude. Moreover, the number of parameters is usually
reduced. This process is usually called scaling or non-dimensionalization.
Remark 2.3.2
Scaling is not a strict mathematical theory but rather a technique, which can be learned through
practice. Besides technical skills (application of integral and differential transformations), it is
important to have a physical understanding of the intrinsic scales of a problem, e. g. characteristic
times, lengths, and so on.
In what follows, we consider two examples (population growth as well as the projectile problem)
which are supposed to demonstrate the idea and steps behind scaling in greater detail. At the end of
this chapter, you should try to apply scaling to some other problems by yourself (see the corresponding
exercises).
Example 2.3.3: Population growth

Let p = p(t) be the population of an animal species living in a fixed region at time t. The
simplest model to describe population growth is the classical Malthus model

dp/dt = rp,  p(0) = p0;

see [Mal72, MWJ92]. Here, r > 0 is a parameter called the growth rate, given in dimensions of
inverse time. In particular, the Malthus model indicates p(t) = p0 e^(rt), that is, exponential growth.
Unfortunately, this model does not capture the important effect of competition. As a population
grows, intraspecific competition for food, living space, and natural resources limits the growth.
This is reflected in the logistic model

dp/dt = rp (1 − p/K),  p(0) = p0,    (2.7)

which was introduced in a series of three papers by the Belgian mathematician Verhulst between
1838 and 1847; see [Cra04]. Here, K > 0 is an additional parameter referred to as the carrying
capacity. This is the number of individuals that the ecosystem can sustain. Note that the logistic
model (2.7) has a total number of two variables (t and p) and three parameters (p0, r, and K); also
see Table 2.5. We now demonstrate how scaling can be used to reduce the number of parameters
in a model. This is done by introducing new dimensionless variables for time and population,
formed from the parameters p0, r, and K. Of these, only r contains a dimension of time ([r] = t⁻¹).
Hence, we use tc = r⁻¹ as the characteristic quantity for time and introduce

t̄ = rt.

For the characteristic quantity of the population, pc, there are two choices, K as well as p0. Either
will work, but here we choose pc = K, yielding

p̄ = p/K.

Reformulating (2.7) with respect to the new dimensionless variables t̄ and p̄, we get

dp̄/dt̄ = p̄(1 − p̄),  p̄(0) = α,    (2.8)

with α := p0/K. It should be stressed that the scaled model (2.8) only contains a single
parameter α. This is a significant simplification over the original logistic model (2.7), which
relied on three parameters. Finally, (2.8) can be solved by separating variables to obtain

p̄(t̄) = α / (α + (1 − α) e^(−t̄)).

It is clear that p̄(t̄) → 1 for t̄ → ∞ if α > 0, confirming that the limiting population p is equal to
the carrying capacity K.
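The scaling result can easily be checked numerically: populations with very different r, K, and p0 but equal α = p0/K collapse onto the same dimensionless curve. Below is a short sketch assuming NumPy and SciPy are available; all parameter values are merely illustrative.

```python
# Sketch (NumPy/SciPy assumed): the scaled logistic model (2.8) depends on
# the single parameter alpha = p0/K, so rescaled solutions coincide.
import numpy as np
from scipy.integrate import solve_ivp

def logistic(t, p, r, K):
    return r * p * (1.0 - p / K)

alpha = 0.1
tbar = np.linspace(0.0, 10.0, 201)                # dimensionless time grid
exact = alpha / (alpha + (1 - alpha) * np.exp(-tbar))

for r, K in [(0.5, 100.0), (2.0, 1e6)]:           # illustrative parameter sets
    p0 = alpha * K
    sol = solve_ivp(logistic, (0.0, 10.0 / r), [p0], args=(r, K),
                    t_eval=tbar / r, rtol=1e-10, atol=1e-12)
    pbar = sol.y[0] / K                           # rescaled population
    print(np.max(np.abs(pbar - exact)))           # small for both sets
```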
The following example describes the motion of a projectile thrust vertically upward from the surface
of the earth. It was first pointed out by Lin and Segel [LS88] and demonstrates the importance of
choosing correct scales (characteristic quantities), especially when it is desired to make simplifications.
variable / parameter      dimension
t (time)                  t
p (population)            number of individuals
p0 (initial population)   number of individuals
r (growth rate)           t⁻¹
K (carrying capacity)     number of individuals
Table 2.5: Variables, parameters, and corresponding dimensions used in Example 2.3.3.
Example 2.3.4: The projectile problem

At time t = 0 on the surface of the earth, with radius R and mass M, an object of mass m is given
a vertical upward velocity of magnitude V. Here, we want to determine the height h = h(t) above
the earth's surface that the object reaches at time t. Forces on the object are the gravitational
force and the force due to air resistance. Yet, we assume that the force due to air resistance
can be neglected in this particular problem. Then, Newton's second law provides us with the
mathematical model

d²h/dt² = −g R²/(h + R)²,  h(0) = 0,  dh/dt(0) = V,    (2.9)

where g denotes the local acceleration of free fall and is assumed to be constant (since h/R ≈ 0).
All involved quantities can be found in Table 2.6. Next, to scale model (2.9), let us introduce new
dimensionless variables for time and the object height:

t̄ = t/tc  and  h̄ = h/hc.

Of course, the question remains of how to choose the characteristic quantities tc and hc. We start
by noting that tc and hc are formed by taking combinations of the parameters in the problem, i. e.

tc = R^α1 V^α2 g^α3,  hc = R^β1 V^β2 g^β3.
Comparing dimensions ([tc] = t and [hc] = L), we find

[tc] = L^α1 (L t⁻¹)^α2 (L t⁻²)^α3 = L^(α1+α2+α3) t^(−α2−2α3),
[hc] = L^β1 (L t⁻¹)^β2 (L t⁻²)^β3 = L^(β1+β2+β3) t^(−β2−2β3),

and therefore the two systems of linear equations

α1 + α2 + α3 = 0,
−α2 − 2α3 = 1,

and

β1 + β2 + β3 = 1,
−β2 − 2β3 = 0.

Their solutions are given by

α = (1 + α3, −1 − 2α3, α3),  α3 ∈ R,
β = (1 + β3, −2β3, β3),  β3 ∈ R.
Unfortunately, not all choices for α and β will result in equal success later on. This is where
we also need to weigh in some physical intuition. To demonstrate this, let us consider different
choices:

t̄ = t/(R V⁻¹),    h̄ = h/R,          (for α3 = 0, β3 = 0),
t̄ = t/√(R g⁻¹),   h̄ = h/R,          (for α3 = −0.5, β3 = 0),
t̄ = t/(V g⁻¹),    h̄ = h/(V² g⁻¹),   (for α3 = −1, β3 = −1),

which respectively yield the scaled models

ε d²h̄/dt̄² = −1/(1 + h̄)²,  h̄(0) = 0,  dh̄/dt̄(0) = 1,    (2.10)

d²h̄/dt̄² = −1/(1 + h̄)²,  h̄(0) = 0,  dh̄/dt̄(0) = √ε,    (2.11)

d²h̄/dt̄² = −1/(1 + εh̄)²,  h̄(0) = 0,  dh̄/dt̄(0) = 1,    (2.12)

where

ε = V²/(gR)

is a dimensionless parameter.
To illustrate how a clumsy choice of the characteristic quantities tc and hc can lead to difficulties,
let us simplify the scaled models (2.10), (2.11), and (2.12) in the case that ε is known to be a small
parameter, which can be neglected. Then, the scaled model (2.10) becomes

(1 + h̄)⁻² = 0,  h̄(0) = 0,  dh̄/dt̄(0) = 1,

which has no solution. At the same time, the scaled model (2.11) becomes

d²h̄/dt̄² = −1/(1 + h̄)²,  h̄(0) = 0,  dh̄/dt̄(0) = 0.

This model, in fact, has a solution. Unfortunately, this solution cannot be considered as physically
reasonable, since h̄(t̄) ≤ 0. Hence, in the scaled models (2.10) and (2.11) it is not possible to
neglect small parameters, which is unfortunate, since this kind of technique is common practice in
making approximations in applied problems. What went wrong is that (2.10) and (2.11) represent
incorrectly scaled models. In these, terms that may appear small may not in fact be small. For
instance, in the term ε d²h̄/dt̄², the parameter ε may be small but d²h̄/dt̄² may be large, and therefore
the whole term may not be negligible. If, on the other hand, the term εh̄ is neglected in the last
scaled model (2.12), we get

d²h̄/dt̄² = −1,  h̄(0) = 0,  dh̄/dt̄(0) = 1,

and therefore

h̄(t̄) = t̄ − t̄²/2,

or, in the original variables,

h(t) = −(1/2) g t² + V t.
Hence, in fact, we have obtained an approximate solution that is consistent with our experience
with falling bodies close to the earth. In this case, we were able to neglect the small term εh̄ and
obtain a valid approximation because the scaling is correct. Actually, the choices

tc = V g⁻¹,  hc = V² g⁻¹

can be argued physically. If V is small, then the body will be acted on by an essentially constant
gravitational field. Thus, launched with speed V, it will uniformly decelerate and reach its maximum height
at time V/g, which therefore is the characteristic time. Moreover, the body will travel a distance
of about V/g times its average velocity V/2. Hence, V²/g is revealed as a good choice for the
characteristic height. In contrast, measuring the height relative to the radius of the earth, as in
(2.10) and (2.11), is not a good choice!
variable / parameter            dimension
t (time)                        t
h (height above the surface)    L
R (radius of the earth)         L
M (mass of the earth)           m
m (mass of the object)          m
V (initial velocity)            L · t⁻¹
g (acceleration of free fall)   L · t⁻²
Table 2.6: Variables, parameters, and corresponding dimensions used in Example 2.3.4.
In general, if a correct scaling is chosen, terms in the equation that appear small are indeed small
and may therefore be safely neglected.
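This can again be verified numerically. The following sketch (NumPy and SciPy assumed; the value of ε is merely illustrative) integrates the correctly scaled model (2.12) and compares it with the approximation h̄(t̄) = t̄ − t̄²/2 obtained by neglecting the ε-term; the error is of order ε.

```python
# Sketch (NumPy/SciPy assumed): correctly scaled projectile model (2.12)
# versus the approximation obtained by neglecting the eps-term.
import numpy as np
from scipy.integrate import solve_ivp

eps = 0.01  # illustrative value of V^2/(g*R)

def rhs(tbar, y, eps):
    h, v = y                                  # h = scaled height, v = dh/dtbar
    return [v, -1.0 / (1.0 + eps * h) ** 2]

tbar = np.linspace(0.0, 2.0, 201)
sol = solve_ivp(rhs, (0.0, 2.0), [0.0, 1.0], args=(eps,), t_eval=tbar,
                rtol=1e-10, atol=1e-12)
approx = tbar - tbar ** 2 / 2                 # solution with eps neglected
print(np.max(np.abs(sol.y[0] - approx)))      # of order eps
```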
Chapter 3
Asymptotic analysis
In the later chapters of this course we will often try to find approximate solutions to otherwise unsolvable
mathematical models. Yet, before digging deeper into these techniques, it is convenient to first collect some
fundamental definitions and results from asymptotic analysis. Asymptotic analysis is concerned
with the behavior of functions f(ε) for ε → 0.
3.1 The Bachmann-Landau notation

Definition 3.1.1

Let (X, ‖·‖) be a normed vector space, let xε ∈ X for ε > 0, and let (pε)ε>0 be positive real
numbers. We write

xε = O‖·‖(pε) for ε → 0 if lim sup_{ε→0} qε < ∞,
xε = o‖·‖(pε) for ε → 0 if lim_{ε→0} qε = 0,

where qε := ‖xε‖/pε. Sometimes, pε is referred to as a gauge function. When it is clear from the
context which norm we are using, we just write xε = O(pε) and xε = o(pε) instead of O‖·‖(pε)
and o‖·‖(pε).

Strictly speaking, xε above denotes the value of a function

x : R⁺ → X,  ε ↦ xε,

for the argument ε ∈ R⁺. Yet, it is a common convention in asymptotic analysis to also denote the
whole function x —and not just specific function values— by xε. We follow this convention here.
Remark 3.1.2

Note that if X is a finite-dimensional linear space, all norms on X are equivalent. Hence, Definition
3.1.1 is independent of the norm ‖·‖ in this case. However, if X is not a finite-dimensional linear
space, there will be norms which are not equivalent to each other.ᵃ Thus, in this case, Definition
3.1.1 does depend on the chosen norm ‖·‖.

ᵃ In fact, it is a well-known result in functional analysis that a linear space has finite dimension if and only if
all norms on this space are equivalent.
The following lemma provides some useful calculation rules for the Bachmann-Landau notation; the
limits involved are understood in the sense of the Euclidean space (R, |·|). All of the calculation rules,
except for (a), also hold true if we replace big O by little o.
Proof. We verify the rules one by one.

(b) xε = o(pε), C ∈ R
⟹ lim_{ε→0} ‖Cxε‖/pε = |C| lim_{ε→0} ‖xε‖/pε = 0
⟹ Cxε = o(pε)

(c.1) xε = O(pε), yε = O(rε)
⟹ lim sup_{ε→0} ‖xε + yε‖/max{pε, rε} ≤ lim sup_{ε→0} ‖xε‖/pε + lim sup_{ε→0} ‖yε‖/rε < ∞
⟹ xε + yε = O(max{pε, rε})

(c.2) xε = o(pε), yε = o(rε)
⟹ 0 ≤ lim_{ε→0} ‖xε + yε‖/max{pε, rε} ≤ lim_{ε→0} ‖xε‖/pε + lim_{ε→0} ‖yε‖/rε = 0
⟹ lim_{ε→0} ‖xε + yε‖/max{pε, rε} = 0
⟹ xε + yε = o(max{pε, rε})

(d.1) xε = O(pε), pε ≤ rε
⟹ lim sup_{ε→0} ‖xε‖/rε ≤ lim sup_{ε→0} ‖xε‖/pε < ∞
⟹ xε = O(rε)

(d.2) xε = o(pε), pε ≤ rε
⟹ 0 ≤ lim_{ε→0} ‖xε‖/rε ≤ lim_{ε→0} ‖xε‖/pε = 0
⟹ xε = o(rε)

(e.1) yε = O(rε), ‖xε‖ ≤ ‖yε‖ ∀ε > 0
⟹ lim sup_{ε→0} ‖xε‖/rε ≤ lim sup_{ε→0} ‖yε‖/rε < ∞
⟹ xε = O(rε)

(e.2) yε = o(rε), ‖xε‖ ≤ ‖yε‖ ∀ε > 0
⟹ 0 ≤ lim_{ε→0} ‖xε‖/rε ≤ lim_{ε→0} ‖yε‖/rε = 0
⟹ xε = o(rε)

(f.1) xε = O(pε), yε = O(rε)
⟹ lim sup_{ε→0} |⟨xε, yε⟩|/(pε rε) ≤ lim sup_{ε→0} ‖xε‖/pε · lim sup_{ε→0} ‖yε‖/rε < ∞
⟹ ⟨xε, yε⟩ = O(pε rε)

(f.2) xε = o(pε), yε = o(rε)
⟹ 0 ≤ lim_{ε→0} |⟨xε, yε⟩|/(pε rε) ≤ lim_{ε→0} ‖xε‖/pε · lim_{ε→0} ‖yε‖/rε = 0
⟹ ⟨xε, yε⟩ = o(pε rε)
Example 3.1.4

Let X = R and ‖·‖ = |·|.

(a) Let us consider xε = ε² ln(ε). Since

lim_{ε→0} |ε² ln(ε)|/ε = lim_{ε→0} ε |ln(ε)| = 0,

which can be verified, for instance, using l'Hôpital's rule, we have

ε² ln(ε) = o(ε).

(b) Sometimes also the mean value theorem is of practical use in asymptotic analysis. This is
demonstrated by considering xε = sin(ε). This time, we want to verify that

sin(ε) = O(ε).

Here, the mean value theorem provides us with a ξ between 0 and ε such that

(sin(ε) − sin(0))/(ε − 0) = cos(ξ).

Thus, we have

lim sup_{ε→0} |sin(ε)|/ε = lim sup_{ε→0} |cos(ξ)| = 1 < ∞,

since ξ → 0 for ε → 0. At the same time, we have sin(ε) ≠ o(ε) but sin(ε) = o(ε^α) for all
α < 1.
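Both statements of Example 3.1.4 can be illustrated numerically. In the following small sketch (NumPy assumed), the first ratio tends to 0, reflecting ε² ln(ε) = o(ε), while the second stays bounded and tends to 1, reflecting sin(ε) = O(ε) but sin(ε) ≠ o(ε).

```python
# Sketch (NumPy assumed): the ratios defining little-o and big-O.
import numpy as np

for eps in [1e-1, 1e-3, 1e-6]:
    print(eps,
          abs(eps**2 * np.log(eps)) / eps,   # -> 0:  eps^2 ln(eps) = o(eps)
          abs(np.sin(eps)) / eps)            # -> 1:  sin(eps) = O(eps), not o(eps)
```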
Note that in Example 3.1.4(a) we have shown that ε² ln(ε) = o(ε). Obviously, ε² ln(ε) = O(ε) holds
as well. In fact, this observation is true in general: for every gauge function pε,

xε = o(pε) ⟹ xε = O(pε)

holds.
At the same time, the reversed statement (O(pε) ⊂ o(pε)) is not true in general. This has already
been demonstrated in Example 3.1.4(b). We end this excursion into the world of asymptotic analysis with
an example in which, for fixed ε, xε is a function itself and not just a real number.

Example 3.1.6

For ε > 0, let xε ∈ C([0, ∞)) be given by

xε(t) := exp(−t/ε).

With respect to the maximum norm, ‖xε‖∞ = sup_{t≥0} exp(−t/ε) = 1 for every ε > 0. In particular,
we have

xε = O(1) and xε ≠ o(1).

Again, this demonstrates that xε = o(pε) is a stronger property than xε = O(pε).
Chapter 4
Perturbation Methods
Often, mathematical models cannot be solved in exact form and their solution therefore has to be
approximated. Perturbation methods are one such approximation technique for
the case that the model includes very small terms. Such terms arise when the underlying physical
process has only small effects. For instance, let us consider an ordinary differential equation

F(t, y, y′, y″, ε) = 0,  t ∈ I,    (4.1)

where t denotes time and I the time interval, y = y(t) the dependent variable, and ε a small parameter.
Perturbation methods try to find an approximation to the original problem, in this case (4.1), by starting
from the exact solution of a simplified version of the problem, in this case

F(t, y, y′, y″, 0) = 0,  t ∈ I.    (4.2)

Usually, we refer to (4.1) as the perturbed problem and to (4.2) as the unperturbed problem. The
approximation of the perturbed problem is then expressed in terms of a formal power series in the small
parameter ε,

pε(t) = Σ_{k=0}^∞ ε^k y_k(t),    (4.3)
called the perturbation series. Thereby, the leading term y0 is the solution of the (exactly solvable)
unperturbed problem (4.2). Further terms describe the deviation in the solution of the perturbed problem
due to the deviation from the unperturbed problem. In particular, we therefore would like to know if the
perturbation series (4.3) converges to the solution of the perturbed problem (4.1). Such investigations
are not just important in the context of ordinary differential equations but also for partial differential
equations, algebraic equations, integral equations, and many other types of equations we encounter
in Applied Mathematics. In fact, perturbation methods are closely related to numerical analysis and
their earliest uses include the otherwise unsolvable mathematical problems of celestial mechanics. In
particular, perturbation methods were used to describe and predict the orbit of the moon, which moves
noticeably differently from a simple Keplerian ellipse, because of the competing gravitation of the Earth
and the Sun.

4.1 Regular Perturbation
In the regular perturbation method, one substitutes the perturbation series (4.3) directly into the
perturbed problem. The leading term
y0 will be the solution of the unperturbed problem (4.2). The subsequent terms εy1, ε²y2, . . . are therefore
regarded as higher-order correction terms that are expected to be small. Often, no more than the first
two or three terms are used. The resulting truncated perturbation series,

pε,K(t) = Σ_{k=0}^K ε^k y_k(t),
is called a perturbation approximation. In what follows, we illustrate the idea of regular perturbation
methods by some examples.
Example 4.1.1

Let us consider the quadratic equation

F(x, ε) := x² + 2εx − 3 = 0    (4.4)

as a perturbed problem, where ε > 0 is a small parameter. Note that the corresponding unper-
turbed problem F(x, 0) = 0 is given by

x² − 3 = 0.

We make the ansatz of a perturbation series pε = x0 + εx1 + ε²x2 + O(ε³).
Here, O(ε³) denotes a term r = r(ε) with r = O(ε³). Substituting the perturbation series pε into
the perturbed problem (4.4) gives us

(x0 + εx1 + ε²x2 + O(ε³))² + 2ε (x0 + εx1 + ε²x2 + O(ε³)) − 3 = 0.

Expanding out, collecting the terms in the different orders of ε, and comparing their coefficients
yields

x0² − 3 = 0,
2x0(x1 + 1) = 0,
x1² + 2x0x2 + 2x1 = 0.

Solving these equations successively gives x0 = ±√3, x1 = −1, and x2 = ±1/(2√3). On the other
hand, the two exact solutions of (4.4) are x = −ε + √(3 + ε²) and x̃ = −ε − √(3 + ε²), or

x = −ε + √3 + ε²/(2√3) + O(ε⁴),  x̃ = −ε − √3 − ε²/(2√3) + O(ε⁴),

by using the generalized binomial theorem. Hence, by comparing our perturbation approximations
with the exact solutions, we find

x − pε,2 = O(ε⁴),  x̃ − p̃ε,2 = O(ε⁴).
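The coefficient comparison in Example 4.1.1 can also be automated with a computer algebra system. Below is a small sketch assuming SymPy is available; the variable names are our own.

```python
# Sketch (SymPy assumed): regular perturbation of x^2 + 2*eps*x - 3 = 0.
from sympy import symbols, sqrt, series, solve

eps = symbols('epsilon', positive=True)
x0, x1, x2 = symbols('x0 x1 x2')

p = x0 + eps * x1 + eps**2 * x2                 # truncated perturbation series
expr = (p**2 + 2 * eps * p - 3).expand()

# Compare coefficients of eps^0, eps^1, eps^2 and solve successively.
eqs = [expr.coeff(eps, k) for k in range(3)]
print(solve(eqs, [x0, x1, x2], dict=True))      # x0 = ±sqrt(3), x1 = -1, ...

# Check against the exact root -eps + sqrt(3 + eps^2):
print(series(-eps + sqrt(3 + eps**2), eps, 0, 4))
```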
In the last example, the regular perturbation method led to a satisfactory result. In the next example,
the procedure is the same, but the result does not turn out favorably. This is the first signal that we
will need to modify the regular method later on.
Example 4.1.2

Let us consider the (dimensionless) initial value problem

ü + u + εu³ = 0,  u(0) = 1,  u̇(0) = 0,    (4.5)

where

ε := aA²/k ≪ 1

is a small dimensionless parameter. Furthermore, ü denotes the second derivative of u with respect
to t, i. e. ü = d²u/dt². Equation (4.5) is known as the Duffing equation.

Next, let us consider the regular perturbation method in the context of the Duffing equation (4.5).
In this case, the perturbation series looks like

pε(t) = Σ_{k=0}^∞ ε^k u_k(t).
For the sake of simplicity, however, we just focus on the perturbation approximation including the
first two terms,

pε,1(t) = u0(t) + εu1(t).

Once more, the coefficients u0 and u1 are determined by substituting pε,1 into the perturbed
problem (4.5), resulting in

(ü0 + εü1) + (u0 + εu1) + ε(u0 + εu1)³ = 0.

Expanding out, collecting the terms in the different orders of ε, and comparing their coefficients
yields the following sequence of linear initial value problems:

ü0 + u0 = 0,  u0(0) = 1,  u̇0(0) = 0,
ü1 + u1 = −u0³,  u1(0) = 0,  u̇1(0) = 0.

The first problem is solved by

u0(t) = cos t.

Substituting this into the second problem and solving yields

u1(t) = (1/32)(cos 3t − cos t) − (3/8) t sin t,

so that the perturbation approximation contains the term ε(3/8) t sin t.
For a fixed time t this term goes to zero as ε → 0. Yet, if t itself is of order ε⁻¹ or larger as
ε → 0, then the term ε(3/8) t sin t has an increasingly large amplitude. Such terms in a perturbation
approximation, which are not ensured to converge to zero as ε → 0, are called secular terms.
Unfortunately, this behavior is not consistent with the physical situation of the underlying per-
turbation problem (4.5).

We summarize that for this example the correction term cannot be made arbitrarily small for
t ∈ [0, ∞) by choosing ε small enough. Note that it is also not possible to repair this by including
further correction terms, e. g. of order ε² or ε³. Only if we restrict the time to a finite interval
t ∈ [0, T] for some fixed T > 0 can the correction term be made arbitrarily small.
In fact, another way to remedy this type of singular behavior is the Poincaré–Lindstedt method.
The key idea of this method is to introduce a distorted time scale in the perturbation series. In case of
Example 4.1.2, this would yield

pε(τ) = Σ_{k=0}^∞ ε^k u_k(τ)

with τ = ωt and

ω = Σ_{k=0}^K ε^k ω_k.

For more details on the Poincaré–Lindstedt method we refer to Chapter 2.1.3 in [Log13].
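The secular growth described in Example 4.1.2 is easy to observe numerically. The following sketch (NumPy and SciPy assumed; the value of ε and the time horizon are illustrative) integrates the Duffing equation and compares it with the two-term approximation pε,1 = u0 + εu1 derived above.

```python
# Sketch (NumPy/SciPy assumed): the regular perturbation approximation of
# the Duffing problem drifts away from the true solution once t is of
# order 1/eps, reflecting the secular term (3/8)*eps*t*sin(t).
import numpy as np
from scipy.integrate import solve_ivp

eps = 0.05

def duffing(t, y, eps):
    u, v = y
    return [v, -u - eps * u**3]

t = np.linspace(0.0, 60.0, 2001)
sol = solve_ivp(duffing, (0.0, 60.0), [1.0, 0.0], args=(eps,), t_eval=t,
                rtol=1e-10, atol=1e-12)

u0 = np.cos(t)
u1 = (np.cos(3 * t) - np.cos(t)) / 32 - 3 / 8 * t * np.sin(t)
approx = u0 + eps * u1
print(np.max(np.abs(sol.y[0] - approx)))  # grows with the time horizon
```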
4.2 Pitfalls of Regular Perturbation

Example 4.2.1

Let us consider the quadratic equation

εx² + 2x + 1 = 0    (4.8)

as a perturbed problem, where ε > 0 is a small parameter. Since (4.8) is quadratic, its two exact
solutions are known:

x = (−1 ± √(1 − ε))/ε.    (4.9)

Yet, our goal is to illustrate the failure of the regular perturbation method for this example. If
we attempt regular perturbation by substituting the perturbation series

pε = x0 + εx1 + ε²x2 + . . .

into the perturbed problem (4.8), we get, after comparing the coefficients of the different orders
of ε, the following sequence of equations:

2x0 + 1 = 0,
x0² + 2x1 = 0,
2x1x0 + 2x2 = 0,  . . .

Solving these successively, we observe the following pitfall:
• Regular perturbation just gives us a single (approximate) solution, where there should be
two.
The reason for this failure is that the unperturbed problem

2x + 1 = 0,

whose solution is given by x0 = −1/2, is not a reasonable simplification of the perturbed problem
(4.8) for the second solution x2. Even though ε is a small parameter and might be neglected,
the product εx2² cannot be, since x2 increases with decreasing ε; see (4.9). This is a problem we
have already encountered in the context of scaling: A term that may appear small, εx2², in fact
is not small at all! Yet, this observation also points the way towards a solution of this problem.
Motivated by scaling, let us introduce a new variable y of order 1 defined by

y = εx.

Substituting x = y/ε into (4.8) and multiplying by ε, we obtain the rescaled equation

y² + 2y + ε = 0.

Substituting the perturbation series

p̃ε = y0 + εy1 + ε²y2 + . . .

into this equation and comparing coefficients, we find that the coefficients satisfy

y0² + 2y0 = 0,
2y0y1 + 2y1 + 1 = 0,  . . .

In particular, the choice y0 = −2 now also provides us with a perturbation approximation for the
second solution, x2 ≈ (−2 + ε/2)/ε.
The approach used in the above example, namely considering the perturbed equation as well as an
appropriately scaled version of it to find perturbation series for solutions of different orders, is called
dominant balancing.
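A quick numerical comparison (NumPy assumed) illustrates dominant balancing for Example 4.2.1: the regular perturbation series approximates one root of (4.8), while the rescaled series captures the second, large root.

```python
# Sketch (NumPy assumed): eps*x^2 + 2x + 1 = 0, exact roots versus the
# regular and the rescaled perturbation approximations.
import numpy as np

eps = 1e-3
roots = np.sort(np.roots([eps, 2.0, 1.0]))   # exact roots of the quadratic
x_regular = -0.5 - eps / 8                   # from p_eps = x0 + eps*x1
x_singular = (-2.0 + eps / 2) / eps          # from y = eps*x, y ~ y0 + eps*y1
print(roots, x_regular, x_singular)
```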
The next example of a second order boundary value problem is supposed to illustrate another pitfall
of regular perturbation methods: Sometimes it is not even possible to calculate the leading term. This
happens, for instance, when the unperturbed problem is not well defined.
Example 4.2.2

Let us consider the boundary value problem

εy″ + y′ + y = 0,  0 < x < 1,
y(0) = 0,  y(1) = 1,    (4.10)

where ε > 0 is a small parameter. If we were to substitute a regular perturbation series

pε = y0 + εy1 + ε²y2 + . . . ,

where the coefficients yk, k ∈ N0, are computed by substituting pε into (4.10), the leading term
y0 would be given as the solution of the unperturbed problem

y′ + y = 0,  0 < x < 1,
y(0) = 0,  y(1) = 1.    (4.11)

However, it is easy to note that (4.11) does not have a solution. The general solution of y′ + y = 0
is given by

y(x) = ce⁻ˣ.

Yet, the boundary condition y(0) = 0 yields c = 0 and therefore

y(x) = 0,

which violates the second boundary condition y(1) = 1. Conversely, the boundary condition
y(1) = 1 yields c = e and therefore

y(x) = e^(1−x),

which violates y(0) = 0.
Generally speaking, it is always a bad sign when the order of an ordinary differential equation
is lower than the number of initial/boundary conditions. Once more, we observe the regular
perturbation method to fail. In the following list, we summarize some of the several indicators
that often suggest failure of the regular perturbation method:

1. When the small parameter is multiplied with the highest derivative in the problem.

2. More generally, when setting the small parameter to zero changes the character of the
problem. This includes partial differential equations changing type (e. g., from elliptic to
parabolic), or an algebraic equation changing degree.

4. When the equations that model physical processes have multiple time or spatial scales.
Such problems, resulting in the pitfalls of the regular perturbation method described above, fall into
the general class of singular perturbation problems. Different techniques to adapt perturbation methods
to some of the above singular perturbation problems, and therefore overcome their failure in
these cases, can be found in boundary and initial layer analysis; see Chapters 2.3 and 2.4 in
[Log13]. Another noteworthy approach is the Wentzel–Kramers–Brillouin perturbation method.
The interested reader may find further information in Chapter 2.5 of [Log13].
Chapter 5
Asymptotic Expansion of Integrals
Many physical quantities can be described by integrals. In particular, the solutions of differential equations
often yield formulas involving integrals. Unfortunately, in many cases, these integrals cannot be evaluated
in closed form. For example, the initial value problem

y″ + 2λty′ = 0,  y(0) = 0,  y′(0) = 1,

has the solution y(t) = ∫₀ᵗ e^(−λs²) ds, which cannot be expressed in terms of elementary functions.
In this chapter, we therefore study the asymptotic behavior, as λ → ∞, of integrals of the form

I(λ) = ∫ₐᵇ f(t) e^(−λ g(t)) dt    (5.1)
with λ ≫ 1 and strictly increasing g ∈ C¹([a, b]). A prominent example is the Laplace transform

L{f}(λ) = ∫₀^∞ f(t) e^(−λt) dt,

which is an often used tool to transform differential equations into algebraic equations.

5.1 Laplace Integrals

Note that to study the whole class of integrals (5.1) it actually suffices to consider so-called Laplace
integrals. We call an integral of the form

I(λ) = ∫₀ᵇ h(t) e^(−λt) dt,  b > 0,    (5.2)

a Laplace integral.
To observe that every integral of the form (5.1) can be represented as a Laplace integral, we can
make the change of variables s = g(t) − g(a). This yields

∫ₐᵇ f(t) e^(−λg(t)) dt = e^(−λg(a)) ∫₀^(g(b)−g(a)) (f(t(s)) / g′(t(s))) e^(−λs) ds

and therefore a Laplace integral. Here, t = t(s) denotes the solution of the equation s = g(t) − g(a).
Unfortunately, in many cases we are not able to compute Laplace integrals exactly. Hence, we will
look for approximations to (5.2) next. The fundamental idea we will follow is to determine which part of
the integration domain gives the dominant contribution to the integral. Note that the function e^(−λt)
is rapidly decaying for t > 0 if λ ≫ 1. Thus, assuming that f does not grow too fast at infinity and is
reasonably well behaved at t = 0, the dominant contribution comes from a neighborhood of t = 0. This
approach is often referred to as Laplace's method, and we start by illustrating it with an example.
Example 5.1.2

Let us consider the integral

I(λ) = ∫₀^∞ (sin t / t) e^(−λt) dt,

which we split into two parts,

I(λ) = I₁(λ, T) + I₂(λ, T) := ∫₀ᵀ (sin t / t) e^(−λt) dt + ∫ᵀ^∞ (sin t / t) e^(−λt) dt,

for T > 0. Note that the second integral is an exponentially small term (EST); that is,
I₂(λ, T) = O(λ⁻¹ e^(−λT)) as λ → ∞. This can be quickly observed from

|I₂(λ, T)| ≤ ∫ᵀ^∞ |sin t / t| e^(−λt) dt ≤ ∫ᵀ^∞ e^(−λt) dt = λ⁻¹ e^(−λT).

Addressing the first integral next, in the finite interval [0, T] we can replace sin t / t by its Taylor
series around t = 0,

sin t / t = 1 − t²/3! + t⁴/5! ∓ . . . .
This yields

I₁(λ, T) = ∫₀ᵀ (1 − t²/3! + t⁴/5! ∓ . . . ) e^(−λt) dt,

and the change of variable u = λt provides us with

I₁(λ, T) = (1/λ) ∫₀^(λT) (1 − u²/(3!λ²) + u⁴/(5!λ⁴) ∓ . . . ) e^(−u) du.

But now, the upper limit λT can be replaced by ∞. The error introduced by doing this is
exponentially small. Consequently, we have

I₁(λ, T) = (1/λ) ∫₀^∞ (1 − u²/(3!λ²) + u⁴/(5!λ⁴) ∓ . . . ) e^(−u) du + O(e^(−λT)).
Recalling that ∫₀^∞ uⁿ e^(−u) du = n!, we obtain

I₁(λ, T) = 1/λ − 1/(3λ³) + 1/(5λ⁵) ∓ . . . + O(e^(−λT))
         = 1/λ − 1/(3λ³) + O(λ⁻⁵) + O(e^(−λT))

and therefore

I(λ) = 1/λ − 1/(3λ³) + O(λ⁻⁵) + O(e^(−λT)) + O(λ⁻¹ e^(−λT)).

Since this equation holds for any T > 0, it implies

I(λ) = 1/λ − 1/(3λ³) + O(λ⁻⁵).
A general result for Laplace integrals of this flavor can be found in the subsequent Theorem 5.1.6.
Yet, before addressing this result, we introduce some additional notation.
Definition 5.1.3

The gamma function Γ : R⁺ → R is defined by

Γ(x) := ∫₀^∞ t^(x−1) e^(−t) dt,  x > 0.

The gamma function is considered as a generalization of the factorial function. This is because of
the following properties of the gamma function.

Let x ∈ R⁺ and n ∈ N. The following properties hold for the gamma function Γ:

(a) Γ(x + 1) = x Γ(x),
(b) Γ(n + 1) = n!.

Proof. (a) The assertion follows from Definition 5.1.3 by applying integration by parts. (b) then
follows from (a) by induction, together with Γ(1) = ∫₀^∞ e^(−t) dt = 1.
Definition 5.1.5

Let (gn(t, λ))n∈N0 be a sequence of gauge functions (see Definition 3.1.1). We call (gn(t, λ))n∈N0
an asymptotic sequence if

(a) gn+1(t, λ) = o(gn(t, λ)) as λ → ∞

for all n ∈ N0. Moreover, we say that a function y has the asymptotic expansion

y(t, λ) ∼ Σ_{n=0}^∞ an gn(t, λ) as λ → ∞

with respect to this sequence if, for every N ∈ N0,

(b) y(t, λ) − Σ_{n=0}^N an gn(t, λ) = o(gN(t, λ)) as λ → ∞,

or, equivalently,

y(t, λ) − Σ_{n=0}^N an gn(t, λ) = O(gN+1(t, λ)) as λ → ∞.

Note that (a) in Definition 5.1.5 means that every element gn+1(t, λ) of the sequence converges faster
to zero than its predecessor. Similarly, (b) in Definition 5.1.5 means that the remainder of any partial
sum is little oh, o, of the last term. We are now ready to formulate a general result regarding the
asymptotic behavior of Laplace integrals.
Theorem 5.1.6: Watson's Lemma

Let b > 0 (possibly b = ∞) and

I(λ) = ∫₀ᵇ t^α h(t) e^(−λt) dt,

where α > −1. Let h satisfy |h(t)| ≤ C e^(kt), 0 < t < b, for some positive constants C and k, and let
h have a Taylor series expansion around t = 0. Then,

I(λ) ∼ Σ_{n=0}^∞ h⁽ⁿ⁾(0) Γ(α + n + 1) / (n! λ^(α+n+1)),  λ → ∞.
Proof. Here, we only outline the proof of Watson's Lemma in a rough manner. A more detailed discussion
can be found, for instance, in [Mur12]. The idea behind the proof is to follow the same argument as in
Example 5.1.2: let T ∈ (0, b) lie within the radius of convergence of the Taylor series of h around t = 0
and split the integral into I(λ) = I₁(λ, T) + I₂(λ, T), where I₁ integrates over [0, T] and I₂ over [T, b).
Note that the condition |h(t)| ≤ C e^(kt), 0 < t < b, ensures that the second integral satisfies

I₂(λ, T) = O(e^(−(λ−k)T))
and is therefore an exponentially small term. Next, replacing h by its Taylor series around t = 0 in
I₁(λ, T), we have

I₁(λ, T) = ∫₀ᵀ t^α ( Σ_{n=0}^∞ tⁿ h⁽ⁿ⁾(0)/n! ) e^(−λt) dt = Σ_{n=0}^∞ (h⁽ⁿ⁾(0)/n!) ∫₀ᵀ t^(n+α) e^(−λt) dt,
where the second equality follows from Lebesgue's dominated convergence theorem. Furthermore, making
the substitution u = λt and replacing the upper limit of integration λT by ∞ yields

I₁(λ, T) = Σ_{n=0}^∞ (h⁽ⁿ⁾(0) / (n! λ^(n+α+1))) ( ∫₀^∞ u^(n+α) e^(−u) du + O(e^(−λT)) ).
Note that replacing the upper limit of integration λT by ∞ only introduces an error that is exponentially
small as λ → ∞, which is reflected by the term O(e^(−λT)). Finally, remembering the definition of the
gamma function, we obtain

I₁(λ, T) = Σ_{n=0}^∞ h⁽ⁿ⁾(0) Γ(α + n + 1) / (n! λ^(n+α+1)) + O(e^(−λT)),

which results in the assertion.
5.2 Integration by Parts

Another technique to derive asymptotic expansions of integrals is integration by parts. We demonstrate
it for Euler's integral

I(λ) = ∫₀^∞ e^(−t) / (1 + t/λ) dt.    (5.4)

Using integration by parts, we prove that

I(λ) ∼ Σ_{n=0}^∞ (−1)ⁿ n! λ⁻ⁿ.    (5.5)
Lemma 5.2.1

For Euler's integral (5.4), the inequality

|I(λ) − Σ_{n=0}^N (−1)ⁿ n! λ⁻ⁿ| ≤ (N + 1)! λ^(−(N+1))

holds for all N ∈ N0 and λ > 0.
Proof. Integrating by parts once, we obtain

I(λ) = 1 − λ⁻¹ ∫₀^∞ e^(−t) / (1 + λ⁻¹t)² dt.

Repeating this procedure, after N + 1 integrations by parts we arrive at

I(λ) = Σ_{n=0}^N (−1)ⁿ n! λ⁻ⁿ + r_{N+1}(λ)  with
r_{N+1}(λ) = (−1)^(N+1) (N + 1)! λ^(−(N+1)) ∫₀^∞ e^(−t) / (1 + λ⁻¹t)^(N+2) dt.

Since 0 < (1 + λ⁻¹t)^(−(N+2)) ≤ 1 for t ≥ 0, we have |r_{N+1}(λ)| ≤ (N + 1)! λ^(−(N+1)), which
proves Lemma 5.2.1.
Then, (5.5) follows by observing that for the sequence of gauge functions (gn(λ))n∈N0 with gn(λ) = λ⁻ⁿ
the relation

lim_{λ→∞} |I(λ) − Σ_{n=0}^N (−1)ⁿ n! λ⁻ⁿ| / gN(λ) ≤ lim_{λ→∞} (N + 1)! λ^(−(N+1)) / gN(λ) = 0

holds true and therefore

I(λ) − Σ_{n=0}^N (−1)ⁿ n! gn(λ) = o(gN(λ)),  λ → ∞.
Chapter 6
Functional Analysis - A Crash Course
This chapter is supposed to provide you with some basic concepts from functional analysis. Broadly
speaking, functional analysis is a branch of mathematical analysis that is concerned with vector spaces
endowed with a topological structure (e. g. a norm or an inner product) and with linear maps
between such vector spaces. In particular, such maps can be differential operators mapping functions
from the vector space C¹([0, 1]) to the vector space C⁰([0, 1]), or integral operators. Many of the concepts
discussed here will also be useful in the subsequent Chapter 7 on variational methods.
6.1 Normed Vector Spaces

Definition 6.1.1

Let X be a vector space over R. A function

‖·‖ : X → R₀⁺,  x ↦ ‖x‖,

is called a norm on X if it is

(N1) positive definite: ‖x‖ = 0 if and only if x = 0,

(N2) absolutely homogeneous: ‖λx‖ = |λ| ‖x‖ for all λ ∈ R and x ∈ X,

(N3) subadditive: ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ X (triangle inequality).

The pair (X, ‖·‖) is referred to as a normed vector space.
Example 6.1.2

Let X = C([0, 1]) be the vector space of continuous functions on [0, 1]. Then, the maximum norm

‖f‖∞ := max_{t∈[0,1]} |f(t)|,  f ∈ X,

is a norm on X.
Lemma 6.1.3

Let 1 ≤ p < ∞. Then

‖f‖p := ( ∫₀¹ |f(t)|^p dt )^(1/p),  f ∈ C([0, 1]),

is a norm on C([0, 1]).
Essentially, a norm allows us to measure the length of vectors, which often is quite useful. Yet, an even
more powerful tool is the inner product. Inner products additionally allow us to measure the angle
between two vectors.
Definition 6.1.4

Let X be a vector space over R. A function

X × X → R,  (x, y) ↦ ⟨x, y⟩,

is called an inner product on X if it is

(IP1) positive definite: ⟨x, x⟩ ≥ 0 for all x ∈ X and ⟨x, x⟩ = 0 if and only if x = 0,

(IP2) bilinear,

(IP3) symmetric: ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ X.

The pair (X, ⟨·, ·⟩) is referred to as an inner product space (also pre-Hilbert space).

In the above definition, ⟨·, ·⟩ being bilinear means that it is linear in both arguments. That is,

⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩  and  ⟨z, αx + βy⟩ = α⟨z, x⟩ + β⟨z, y⟩

for all α, β ∈ R and x, y, z ∈ X.
Remark 6.1.5
Here, we only consider vector spaces over R. While the definition for norms does not change if we
go over to vector spaces over C it should be stressed that the definition of inner products does! In
the case of X being a vector space over C, the function h·, ·i is required to be sesquilinear (linear
in the first argument and conjugate linear in the second argument) instead of bilinear. Moreover,
the property of h·, ·i being symmetric is replaced by h·, ·i being conjugate symmetric in this case.
Example 6.1.6

The standard inner products on X = Rⁿ and X = C([0, 1]) are given by

⟨x, y⟩ := Σ_{i=1}^n x_i y_i  and  ⟨f, g⟩ := ∫₀¹ f(t) g(t) dt,

respectively. It is easy to note that both functions are bilinear and nonnegative. Moreover, both
functions are symmetric due to R being commutative with respect to multiplication (ab = ba for all
a, b ∈ R). Finally, it is also not hard to verify that both functions are positive definite. It is clear that

(a) x = (0, . . . , 0)ᵀ ⟹ ⟨x, x⟩ = Σ_{i=1}^n 0 = 0,

(b) f ≡ 0 ⟹ ⟨f, f⟩ = ∫₀¹ 0 dt = 0.

On the other hand, the reverse implication can be shown by contraposition (A ⟹ B is
equivalent to ¬B ⟹ ¬A).

(a) Let x = (x₁, . . . , xₙ)ᵀ ≠ (0, . . . , 0)ᵀ. Then, there exists a j ∈ {1, . . . , n} such that xⱼ ≠ 0.
Hence,

⟨x, x⟩ = Σ_{i=1}^n x_i² ≥ x_j² > 0.

(b) Let f ≢ 0. Then, there exists a t₀ ∈ [0, 1] such that f(t₀) ≠ 0. Since f ∈ C([0, 1]), we can
find a whole neighborhood Uε(t₀) = (t₀ − ε, t₀ + ε) such that f(t) ≠ 0 for all t ∈ Uε(t₀).
Thus, we have

⟨f, f⟩ = ∫₀¹ f²(t) dt ≥ ∫_{t₀−ε}^{t₀+ε} f²(t) dt > 0.
As noted before, inner products can be considered as a much more powerful tool than norms. In
what follows, we summarize some of the most important and far reaching results for inner products. We
start by showing that, in particular, every inner product induces a norm.

Lemma 6.1.7

Let (X, ⟨·, ·⟩) be an inner product space over R. Then,

‖x‖ = √⟨x, x⟩,  x ∈ X,

is a norm on X.

Proof. The above defined function ‖·‖ clearly is nonnegative. Hence, it remains to check the properties
(N1)–(N3).

(N1) ‖·‖ being positive definite is ensured by ⟨·, ·⟩ being positive definite, due to (IP1).

(N2) ‖·‖ being absolutely homogeneous is ensured by ⟨·, ·⟩ being bilinear:

‖λx‖² = ⟨λx, λx⟩ = λ²⟨x, x⟩ = (|λ| ‖x‖)².

(N3) Finally, the triangle inequality for ‖·‖ follows from ⟨·, ·⟩ being bilinear and symmetric (IP3):

‖x + y‖² = ⟨x + y, x + y⟩
         = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
         = ‖x‖² + 2⟨x, y⟩ + ‖y‖²
         ≤ ‖x‖² + 2|⟨x, y⟩| + ‖y‖².

Thus, by utilizing the Cauchy–Schwarz inequality,

|⟨x, y⟩| ≤ ‖x‖ ‖y‖,  x, y ∈ X,    (6.1)

we get

‖x + y‖² ≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)²

and therefore the triangle inequality.
Note that we have utilized the Cauchy–Schwarz inequality (6.1) in the above proof. To not leave a
gap in the proof, we need to verify (6.1) next (without using the fact that ‖·‖ = √⟨·, ·⟩ satisfies the
triangle inequality; otherwise we would be stuck in a logical loop).
Lemma 6.1.8: Cauchy–Schwarz inequality

Let (X, ⟨·, ·⟩) be an inner product space over R and let ‖·‖ : X → R₀⁺ be given by

‖x‖ = √⟨x, x⟩,  x ∈ X.

Then, |⟨x, y⟩| ≤ ‖x‖ ‖y‖ holds for all x, y ∈ X.

Proof. The assertion is trivial for y = 0. For y ≠ 0 and λ ∈ R, observe

0 ≤ ‖x − λy‖²
  = ⟨x, x⟩ − ⟨x, λy⟩ − ⟨λy, x⟩ + ⟨λy, λy⟩
  = ‖x‖² − 2λ⟨x, y⟩ + λ²‖y‖².

Choosing λ = ⟨x, y⟩/‖y‖² yields 0 ≤ ‖x‖² − ⟨x, y⟩²/‖y‖² and therefore the assertion.
Another fundamental —yet far reaching— result for inner product spaces is the parallelogram law.

Lemma 6.1.9: Parallelogram law

Let (X, ⟨·, ·⟩) be an inner product space over R and ‖·‖ the induced norm. Then,

2(‖x‖² + ‖y‖²) = ‖x + y‖² + ‖x − y‖²

holds for all x, y ∈ X.

Proof. The parallelogram law is easily established using the properties of the inner product:

‖x + y‖² + ‖x − y‖² = ⟨x + y, x + y⟩ + ⟨x − y, x − y⟩
                     = (‖x‖² + 2⟨x, y⟩ + ‖y‖²) + (‖x‖² − 2⟨x, y⟩ + ‖y‖²)
                     = 2(‖x‖² + ‖y‖²).
In particular, the parallelogram law allows us to generalize Pythagoras’ theorem to general inner
product spaces. This is also related to inner products being able to assign angle-like relations between
two vectors. For instance, we can check if two vectors are perpendicular by the concept of orthogonality.
Let (X, ⟨·, ·⟩) be an inner product space. We say that two vectors x, y ∈ X are orthogonal if

⟨x, y⟩ = 0

holds.
Similarly to Pythagoras' theorem in R², we can now formulate the following generalization for or-
thogonal vectors in inner product spaces.

Let (X, ⟨·, ·⟩) be an inner product space over R, let x, y ∈ X, and let x and y be orthogonal. Then,

‖x + y‖² = ‖x‖² + ‖y‖².

Proof. The assertion follows immediately from the proof of the parallelogram law when utilizing
⟨x, y⟩ = ⟨y, x⟩ = 0.
As we can note from the proof of Pythagoras' theorem, the parallelogram law allows some impressive
conclusions. Perhaps most formidably, it even allows us to characterize inner product spaces among
normed vector spaces.

Let (X, ‖·‖) be a normed space. Then, there exists an inner product ⟨·, ·⟩ on X such that
‖x‖² = ⟨x, x⟩ for all x ∈ X (‖·‖ is induced by ⟨·, ·⟩) if and only if the parallelogram law

2(‖x‖² + ‖y‖²) = ‖x + y‖² + ‖x − y‖²    (6.2)

holds for all x, y ∈ X.

Proof. We have already established the parallelogram law to hold in inner product spaces in Lemma
6.1.9. Hence, it only remains to show that the norm ‖·‖ is induced by an inner product if the parallelogram
law holds in (X, ‖·‖). This is discussed in the subsequent lemma.
Let (X, ‖·‖) be a normed space over R. If the parallelogram law (6.2) holds, then there is an
inner product ⟨·, ·⟩ such that ‖x‖² = ⟨x, x⟩ for all x ∈ X. Moreover, the inner product is uniquely
given by

⟨x, y⟩ = (1/4) ( ‖x + y‖² − ‖x − y‖² )

for x, y ∈ X.
Proof. The assertion follows from some lengthy and tiresome (but basic) computations.
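The characterization above is easy to probe numerically: the Euclidean norm on R² satisfies the parallelogram law (6.2), whereas the maximum norm does not, so the latter cannot be induced by an inner product. A tiny sketch, assuming NumPy is available:

```python
# Sketch (NumPy assumed): testing the parallelogram law for two norms on R^2.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)

for norm in (np.linalg.norm, lambda v: np.max(np.abs(v))):
    lhs = 2 * (norm(x)**2 + norm(y)**2)
    rhs = norm(x + y)**2 + norm(x - y)**2
    print(lhs - rhs)   # ~0 for the Euclidean norm, generally not for max-norm
```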
6.2 Convergence and Cauchy Sequences

Definition 6.2.1

Let (xn)n∈N be a sequence in a normed vector space (X, ‖·‖). We say (xn)n∈N converges to a
vector x ∈ X if

∀ε > 0 ∃N ∈ N ∀n ≥ N : ‖x − xn‖ < ε.    (6.3)

We denote this by

lim_{n→∞} xn = x  or  xn → x for n → ∞.
Definition 6.2.2

Let (xn)n∈N be a sequence in a normed vector space (X, ‖·‖). We call (xn)n∈N a Cauchy
sequence if

∀ε > 0 ∃N ∈ N ∀n, m ≥ N : ‖xm − xn‖ < ε.    (6.4)
In many situations, we are used to convergence of a sequence holding if and only if the sequence is a
Cauchy sequence. In this case it is often more convenient to prove the Cauchy property (6.4) instead of
directly showing convergence, since (6.4) does not assume any prior knowledge of the potentially existing
(or non-existing) limit x. Indeed, part of the equivalence between convergence and Cauchy sequences
also holds in general normed vector spaces; namely, every convergent sequence is also a Cauchy sequence.
This is noted in the following lemma.
Lemma 6.2.3

Let (xn)n∈N be a sequence in a normed vector space (X, ‖·‖). If (xn)n∈N converges, then (xn)n∈N
is a Cauchy sequence.

Proof. Let x denote the limit of the convergent sequence (xn)n∈N and let ε > 0. We have to show that
there is an N ∈ N such that

‖xm − xn‖ < ε  ∀n, m ≥ N.

To this end, we note that

‖xm − xn‖ ≤ ‖xm − x‖ + ‖x − xn‖.

Hence, since xn → x for n → ∞, there exists an N ∈ N such that

‖xm − x‖, ‖x − xn‖ < ε/2  ∀n, m ≥ N.

For this N, we therefore have

‖xm − xn‖ < ε  ∀n, m ≥ N,

which results in the assertion.
The above lemma tells us that also in general normed vector spaces, every convergent sequence is a
Cauchy sequence. From real analysis (X = R or X = Rⁿ or even X = C) we are accustomed to the
reversed implication holding as well. Unfortunately, in general normed vector spaces Cauchy sequences
do not necessarily converge. This is, among other things, demonstrated in the subsequent example.

Example 6.2.4

Let us consider the normed vector space (C([0, 2]), ‖·‖₂) and, for instance, the piecewise linear
functions fn ∈ C([0, 2]) given by

fn(t) := min{1, max{0, n(t − 1)}},  n ∈ N.

• The sequence (fn)n∈N is a Cauchy sequence with respect to ‖·‖₂: fm and fn differ only on an
interval of length at most 1/min{n, m}, on which their difference is bounded by one in absolute
value, so that

‖fm − fn‖₂² ≤ 1/min{n, m} < ε²

for all n, m ≥ N with N ∈ N chosen sufficiently large.

• Yet, at the same time, (fn)n∈N does not converge! To show this, let us assume that (fn)n∈N
would converge to a limit f ∈ C([0, 2]). It is easy to show that

f(t) = 0 if 0 ≤ t < 1,  f(t) = 1 if 1 < t ≤ 2

needs to hold then. Yet, this would be a contradiction to f being continuous. Hence, there
can be no limit in (C([0, 2]), ‖·‖₂).
We have just seen that, while every convergent sequence in a normed vector space is a Cauchy sequence,
the reversed statement does not hold in general. In fact, every Cauchy sequence also being convergent
is an intrinsic property of a normed or inner product space and has received its own name.

• A normed space (X, ‖·‖) in which every Cauchy sequence converges is said to be complete
and is then referred to as a Banach space.

• An inner product space (X, ⟨·, ·⟩) —also called pre-Hilbert space— in which every Cauchy
sequence converges with respect to the induced norm is also said to be complete and is then
referred to as a Hilbert space.
6.3 Linear Operators and Functionals

Let X and Y be two vector spaces over R. We call a map L : X → Y a linear operator if

L(αx + βy) = αL(x) + βL(y)

holds for all α, β ∈ R and x, y ∈ X. In the case of Y = R, (linear) operators are usually referred to
as (linear) functionals. Note that, in particular, every linear operator satisfies

L(0) = 0.
We have already seen that many collections of important objects in mathematics can be seen as vector
spaces, e. g. different function spaces. For the same reason, (linear) operators are considered such a
powerful tool: many important operations, for instance differentiation and integral operators, can all be
interpreted as linear operators between certain function spaces.
Example 6.3.2

(a) The differential operator d/dx can be considered as a linear operator L defined by

L : C¹([a, b]) → C([a, b]),  Lu = u′.

(b) Integration of a function f, let's say over [a, b], can be considered as applying a linear
functional F given by

F : C([a, b]) → R,  F(u) = ∫ₐᵇ u(t) dt.
Next, we note a property of operators which is strongly connected to continuity, yet often much
easier to check.

Definition 6.3.3

Let (X, ‖·‖X) and (Y, ‖·‖Y) be normed vector spaces. An operator L : X → Y is called bounded
if

∃C > 0 ∀x ∈ X : ‖Lx‖Y ≤ C‖x‖X

holds. The number

‖L‖ := sup_{x∈X\{0}} ‖Lx‖Y / ‖x‖X

is called the norm of L.
One of the most pleasant properties of linear operators is the following equivalence between their
continuity and boundedness. This equivalence allows us to further investigate boundedness of a linear
operator, which is often considerably easier, instead of directly addressing continuity.

Theorem 6.3.4

Let (X, ‖·‖X) and (Y, ‖·‖Y) be normed vector spaces and let L : X → Y be a linear operator.
The following properties are equivalent:

(i) L is bounded,

(ii) L is continuous,

(iii) L is continuous in 0.

Proof. We only sketch the implication (iii) ⟹ (i): If L is continuous in 0, then there exists a δ > 0
such that

‖x‖X ≤ δ ⟹ ‖Lx‖Y ≤ 1.

By the linearity of L and the absolute homogeneity of the norms, this already implies

‖Lx‖Y ≤ δ⁻¹ ‖x‖X

for all x ∈ X.
Example 6.3.5

(a) Let us revisit the integration functional F : (C([a, b]), ‖·‖∞) → R from Example 6.3.2. For
every u ∈ C([a, b]) we can estimate

|Fu| = |∫ₐᵇ u(t) dt| ≤ (b − a) ‖u‖∞,

so that F is bounded.
This observation also suggests that the norm of F is given by ‖F‖ = b − a. It is obvious that
‖F‖ ≤ b − a, but we could also have estimated |Fu| ≤ 2(b − a)‖u‖∞ and therefore concluded
that ‖F‖ ≤ 2(b − a). Thus, so far, we just know that b − a is an upper bound for ‖F‖. To
prove that not just ‖F‖ ≤ b − a but also ‖F‖ ≥ b − a (and therefore ‖F‖ = b − a), we have
to show that there exists a u ∈ C([a, b]) with ‖u‖∞ = 1 such that |Fu| ≥ b − a. To this end,
let us consider the simple example u ≡ 1, yielding |Fu| = b − a. Hence,

‖F‖ = b − a.
(b) Next, let us consider, for instance, the linear functional

F : (C¹([0, 1]), ‖·‖∞) → R,  Fu = u′(0).

In fact, this linear functional is not bounded. To show this, let us consider functions of the
form

un(t) = sin(nπt),  n ∈ N.

For these, we have ‖un‖∞ = 1 and u′n(t) = nπ cos(nπt), and therefore

|Fun| = nπ = nπ ‖un‖∞.

Since this relation holds for every n ∈ N, there can be no C > 0 such that

|Fu| ≤ C‖u‖∞

holds for all u ∈ C¹([0, 1]). Hence, F is not bounded.
Chapter
7
Calculus of Variations
The calculus of variations is concerned with optimizing, i. e. minimizing or maximizing, functionals over some admissible class of functions.
From calculus and real analysis, we are familiar with this question in the case that M ⊂ R. There, we look for local minima or maxima by checking the necessary condition
F'(x) = 0.
Yet, in many cases M is not simply given by a subset of the real numbers but, for instance, by
a class of functions. A few examples for this —with important applications in mechanics, optics, and
quantum mechanics— are provided below:
1. Dido's problem: Maximize the area of the region enclosed by a closed curve of fixed length. That is, for a closed curve u = (u_1, u_2) : [0, 1] → R^2 with
A(u) = ∫_0^1 u_2(t) u_1'(t) dt, l(u) = ∫_0^1 |u'(t)| dt,
maximize A(u) under the constraint that l(u) = 1.
2. The Brachistochrone (curve of fastest descent): Given are two points A = (0, 0) and B = (B_1, B_2) with B_2 < 0. Along which curve does an object (frictionless, under the influence of gravity) get from A to B the fastest? That is, for which curve u : [0, 1] → R^2 connecting A and B is the (suitably rescaled) time of descent
T(u) = ∫_0^1 ‖u'(t)‖_2 / √(−u_2(t)) dt
minimal?
3. Geodesics: Curves representing — in some sense — the shortest path between two points on a surface or, more generally, in a Riemannian manifold.
Moreover, if δ_x F is linear and continuous (or bounded), it is called the Gâteaux derivative of F in x and F is said to be Gâteaux differentiable in x.
Remark 7.2.2
Note that if F : R^n → R^m is Gâteaux differentiable in x, then F is also partially differentiable in x with
δ_x F(h) = J_F(x) h ∀h ∈ R^n,
where J_F(x) ∈ R^{m×n} denotes the Jacobian matrix of F at x.
Here, we are only interested in functionals and therefore in the case Y = R. Note that in this case,
for a fixed h ∈ X, the Gâteaux derivative δ_x F(h) essentially is the derivative of the real-valued function
R → R, t ↦ F(x + th).
Note that the variation of a functional can be considered as an approximation of the local behavior of that functional. If adm(F, x) = X, the function δ_x F(h) is exactly the Gâteaux derivative of F at x. Hence, variations of F can be interpreted as special directional derivatives. Next, let us summarize some elementary properties of the first variation.
(b) Let G : A → R be a functional such that δ_x G(h) exists. Then, the first variation is additive:
δ_x (F + G)(h) = δ_x F(h) + δ_x G(h).
We can now devote our attention to extrema of functionals and how these can be found using the first variation.
Definition 7.2.5: Extrema
Let (X, ‖·‖) be a normed vector space. The functional F : X ⊃ A → R is said to have a (local) minimum at x ∈ A if there exists a neighborhood U around x such that
F(ξ) ≥ F(x)
holds for all ξ ∈ U ∩ A. Local maxima are defined analogously with the reversed inequality; in both cases, x is called a (local) extremum of F.
Remark 7.2.6
Whether a point x is an extremum of F depends, in particular, on which sets U ⊂ X are neighborhoods around x (x ∈ U and U is open). In general, this will also depend on the norm ‖·‖ on X.
Following a well-known argument from real analysis, we note that if F has an extremum at x, then for h ∈ adm(F, x), the real-valued function
R → R, t ↦ F(x + th)
has an extremum at t = 0. This already yields the following necessary condition for extrema of functionals.
Theorem 7.2.7: A necessary condition for extrema
Let the functional F : X ⊃ A → R have a (local) extremum at x ∈ A. Then, for every h ∈ adm(F, x),
δ_x F(h) = 0
holds.
Remark 7.2.8
The converse statement of Theorem 7.2.7 does not hold. That is, δ_x F(h) = 0 is not a sufficient condition for x being an extremum. This is already familiar from the real-valued case: f(t) = t^3 satisfies f'(0) = 0, although 0 is not an extremum of f.
Definition 7.2.9
Let F : X ⊃ A → R be a functional, x ∈ A, and V ⊂ adm(F, x). If δx F (h) = 0 holds for all
h ∈ V , we call x an extremal of F (with respect to V ).
Note that since Theorem 7.2.7 only provides a necessary condition, we are not guaranteed that
extremals will actually provide extrema. Yet, we consider extremals as the candidates for extrema. To
actually find extrema of a variational problem, it is often convenient to follow this strategy:
• Determine the extremals and investigate which of these are actually extrema.
Example 7.2.10
and therefore
δ_x F(h) = d/dt F(x + th) |_{t=0} = 0.
This shows that x(s) = s is an extremal of F .
and therefore
d/dt F(x + th) = ∫_a^b ∂/∂t L(s, x + th, x' + th') ds
              = ∫_a^b L_x(s, x + th, x' + th') h + L_{x'}(s, x + th, x' + th') h' ds,
Lemma 7.3.1
A necessary condition for x being a local minimum (or maximum) of F given by (7.2) is
∫_a^b L_x(s, x, x') h + L_{x'}(s, x, x') h' ds = 0 (7.3)
for all h ∈ C^2([a, b]) with h(a) = h(b) = 0.
Unfortunately, condition (7.3) is just of limited use for determining x. Yet, using the fact that it has
to hold for all h, we can simplify the condition. Note that applying integration by parts to (7.3) yields
∫_a^b ( L_x(s, x, x') − d/ds L_{x'}(s, x, x') ) h ds + [ L_{x'}(s, x, x') h ]_{s=a}^{s=b} = 0
to hold for all h ∈ C 2 ([a, b]) with h(a) = h(b) = 0 as a necessary condition for x to be a local minimum
of F . The following lemma, which goes back to Lagrange, provides the final step in deriving a practical
necessary condition.
Lemma 7.3.2
Let f ∈ C([a, b]). If
∫_a^b f(s) h(s) ds = 0
holds for all h ∈ C 2 ([a, b]) with h(a) = h(b) = 0, then f (s) = 0 for all s ∈ [a, b].
Proof. We prove this by contradiction. Assume there exists an s_0 ∈ (a, b) such that f(s_0) ≠ 0, where we assume f(s_0) > 0 without loss of generality. Since f is continuous, we can find ε > 0 and δ > 0 such that f(s) > ε for s_0 − δ ≤ s ≤ s_0 + δ. Hence, for s_1 = s_0 − δ, s_2 = s_0 + δ, and
h(s) = (s − s_1)^3 (s_2 − s)^3 if s_1 ≤ s ≤ s_2, and h(s) = 0 otherwise,
we have
∫_a^b f(s) h(s) ds ≥ ε ∫_{s_1}^{s_2} h(s) ds > 0,
which is a contradiction. Hence, f(s) = 0 for all s ∈ [a, b].
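The smoothness claim for the bump function h used in the proof is easy to verify symbolically; the following SymPy sketch confirms that h and its first two derivatives vanish at s_1 and s_2, so the piecewise definition is indeed C^2.

```python
import sympy as sp

s, s1, s2 = sp.symbols('s s1 s2', real=True)
h = (s - s1)**3 * (s2 - s)**3

# h and its first two derivatives vanish at s1 and s2, so extending h by
# zero outside [s1, s2] gives a C^2 function on all of [a, b]
for k in range(3):
    dk = sp.diff(h, s, k)
    print(k, sp.simplify(dk.subs(s, s1)), sp.simplify(dk.subs(s, s2)))
# prints: 0 0 0 / 1 0 0 / 2 0 0
```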
In fact, the above lemma is just a special version of what is called the fundamental lemma of the
calculus of variations (CoV).
Theorem 7.3.4: The Euler–Lagrange equation
Let L : [a, b] × R × R → R be a twice continuously differentiable function. If a function x ∈ C^2([a, b]) with x(a) = x_a and x(b) = x_b provides a local minimum (or maximum) for the functional
F(x) = ∫_a^b L(s, x, x') ds,
then x satisfies the Euler–Lagrange equation
L_x(s, x, x') − d/ds L_{x'}(s, x, x') = 0. (7.5)
Note that the Euler–Lagrange equation is a second order equation and potentially nonlinear. This
can be seen by writing the derivatives in (7.5) using the chain rule:
Lx (s, x, x0 ) − Lx0 s (s, x, x0 ) − Lx0 x (s, x, x0 )x0 − Lx0 x0 (s, x, x0 )x00 = 0.
It represents a necessary condition for a local minimum and it is analogous to the derivative condition
f 0 (x) = 0 in differential calculus. Thus, its solutions are not necessarily local minima. Yet, the solutions
of the Euler–Lagrange equation (7.5) are extremals, i. e. candidates for local minima. Next, let us
consider two examples to demonstrate the application of the Euler–Lagrange equation.
Example 7.3.5
with
A = { x ∈ C^2([0, 1]) | x(0) = 0, x(1) = 1 }.
We would like to find the extremals of F. To this end, we can utilize Theorem 7.3.4, which tells us that the extremals of F are given as solutions of the corresponding Euler–Lagrange equation (7.5). Adapting the notation of Theorem 7.3.4, we have
Example 7.3.6
Given is the arc length functional
F(x) = ∫_a^b √(1 + x'(s)^2) ds,
where
A = { x ∈ C^2([a, b]) | x(a) = x_a, x(b) = x_b }.
Again, applying Theorem 7.3.4 and adapting its notation, we have
L(s, x, x') = √(1 + x'(s)^2),
L_x(s, x, x') = 0,
L_{x'}(s, x, x') = x'(s) / √(1 + x'(s)^2)
and a necessary condition for x ∈ A being a local minimum (or maximum) of F is given by the
corresponding Euler–Lagrange equation
d/ds ( x'(s) / √(1 + x'(s)^2) ) = 0.
Integration yields
x'(s) / √(1 + x'(s)^2) = C
for some constant C ∈ R. Next, note that
x'(s)/√(1 + x'(s)^2) = C ⇐⇒ x'(s)^2 = C^2 + C^2 x'(s)^2
⇐⇒ x'(s)^2 = C^2 / (1 − C^2)
⇐⇒ x'(s) = B
with constant B = C/√(1 − C^2) ∈ R. Hence, integration yields
x(s) = sB + B̃
for some constant B̃ ∈ R. The boundary conditions x(a) = x_a and x(b) = x_b determine
B = (x_b − x_a)/(b − a), B̃ = (b x_a − a x_b)/(b − a)
and therefore the unique extremal
x(s) = ( s(x_b − x_a) + (b x_a − a x_b) ) / (b − a).
Note that this is the straight line connecting (a, x_a) and (b, x_b), in accordance with the well-known geometric fact that the shortest connection between two points is a straight line.
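The computation above can be reproduced symbolically. The following sketch uses SymPy's euler_equations to form (7.5) for the arc length Lagrangian and then checks that straight lines satisfy it.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

s, B, Bt = sp.symbols('s B Bt')
x = sp.Function('x')

# Lagrangian of the arc length functional from Example 7.3.6
L = sp.sqrt(1 + x(s).diff(s)**2)

# Euler-Lagrange equation L_x - d/ds L_{x'} = 0
eq = euler_equations(L, [x(s)], [s])[0]
print(eq)   # equivalent to x''(s) = 0

# Every straight line x(s) = B*s + Bt satisfies the equation
residual = eq.lhs.subs(x(s), B * s + Bt).doit()
print(sp.simplify(residual))   # 0
```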
Lemma 7.4.1
Given is a functional F of the form (7.2) with Lagrangian L = L(s, x, x') and the corresponding Euler–Lagrange equation (7.5).
(a) If L does not depend on x, i. e. L = L(s, x'), then the Euler–Lagrange equation reduces to the first integral
L_{x'}(s, x') = C
for some constant C ∈ R.
(b) If L does not depend on s, i. e. L = L(x, x'), then the Euler–Lagrange equation reduces to the first integral
L(x, x') − x' L_{x'}(x, x') = C
for some constant C ∈ R.
Proof. (a) In this case, we have
L_x(s, x, x') = 0,
L_{x'}(s, x, x') = L_{x'}(s, x'),
d/ds L_{x'}(s, x, x') = d/ds L_{x'}(s, x')
and the Euler–Lagrange equation (7.5) therefore reduces to
0 = d/ds L_{x'}(s, x').
Integration yields
L_{x'}(s, x') = C
for some C ∈ R.
(b) In this case, we have
L_x(s, x, x') = L_x(x, x'),
L_{x'}(s, x, x') = L_{x'}(x, x'),
d/ds L_{x'}(s, x, x') = d/ds L_{x'}(x, x')
and the Euler–Lagrange equation (7.5) therefore reads
L_x(x, x') = d/ds L_{x'}(x, x').
This yields
d/ds ( L(x, x') − x' L_{x'}(x, x') )
 = x' L_x(x, x') + x'' L_{x'}(x, x') − x'' L_{x'}(x, x') − x' d/ds L_{x'}(x, x')
 = x' ( L_x(x, x') − d/ds L_{x'}(x, x') )
 = 0
and integration gives L(x, x') − x' L_{x'}(x, x') = C for some C ∈ R.
In this example, we determine the extremals of the Brachistochrone (curve of fastest descent) problem.
Problem statement
A bead of mass m with initial velocity zero slides with no friction under the force of gravity g
from a point (0, b) to a point (a, 0) along a wire defined by a curve y = y(x) in the xy-plane. We
would like to figure out which curve connecting (0, b) and (a, 0) yields the fastest time of descent.
To formulate this problem analytically, we first compute the time of descent T for a fixed curve y:
T(y) = ∫_0^a √(1 + y'(x)^2) / v(x) dx.
Here, v = v(x) denotes the velocity of the bead along the curve y. Using the fact that the energy
is conserved, we have
(kinetic energy at t > 0) + (potential energy at t > 0)
= (kinetic energy at t = 0) + (potential energy at t = 0).
Expressed in our notation, this means
(1/2) m v^2 + m g y = 0 + m g b,
which yields
v(x) = √(2g(b − y(x))).
Thus, the time required for the bead to descend is
T(y) = ∫_0^a √(1 + y'(x)^2) / √(2g(b − y(x))) dx. (7.8)
Note that we can now reformulate the Brachistochrone problem as finding the minimum of the functional
T : A → R_0^+, y ↦ T(y)
under the restriction y(0) = b and y(a) = 0; that is,
A = { y ∈ C^2([0, a]) | y(0) = b, y(a) = 0 }.
Finding extremals
Next, let us determine the extremals of the functional T : A → R_0^+ given by (7.8). Once more, note that by Theorem 7.3.4 the extremals are given as solutions of the Euler–Lagrange equation (7.5) with Lagrangian
L(x, y, y') = √(1 + y'(x)^2) / √(2g(b − y(x))).
This Lagrangian does not depend on x, i. e. L = L(y, y'), and Lemma 7.4.1 therefore tells us that the corresponding Euler–Lagrange equation reduces to
C = L(y, y') − y' L_{y'}(y, y')
  = √(1 + y'(x)^2)/√(2g(b − y(x))) − y'(x)^2 / ( √(2g(b − y(x))) √(1 + y'(x)^2) )
  = 1 / ( √(1 + y'(x)^2) √(2g(b − y(x))) ).
The above equation is equivalent to
(y')^2 = ( 1 − C̃(b − y) ) / ( C̃(b − y) )
with constant C̃ := 2gC^2 ∈ R. Taking the square root of both sides and separating variables gives
dx = − ( √(b − y) / √(C_1 − (b − y)) ) dy, C_1 = C̃^{−1},
where the minus sign is taken because dy/dx < 0. The last equation can be integrated by making
the trigonometric substitution
b − y = C_1 sin^2(φ/2). (7.9)
Then, one obtains
dx = C_1 sin^2(φ/2) dφ = (C_1/2)(1 − cos φ) dφ,
which yields
x(φ) = (C_1/2)(φ − sin φ) + C_2. (7.10)
Equations (7.9) and (7.10) are parametric equations for a cycloid. Here, in contrast to the problem
of finding the curve of shortest length between two points (see Example 7.3.6), it is not clear that
the cycloids just obtained actually minimize the functional T : A → R. Further calculations would
be required for confirmation.
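The fact that the cycloid satisfies the reduced Euler–Lagrange equation can be double-checked symbolically. The sketch below verifies that, along the parametrization (7.9)–(7.10), the first integral E = L − y' L_{y'} is constant (its square equals 1/(2gC_1)); the auxiliary symbols p and w standing for y' and b − y are introduced purely for convenience.

```python
import sympy as sp

phi, C1, g, p, w = sp.symbols('phi C1 g p w', positive=True)

# Lagrangian L(y, y') of (7.8), with p standing for y' and w for b - y
L = sp.sqrt(1 + p**2) / sp.sqrt(2 * g * w)

# First integral from Lemma 7.4.1(b): E = L - y' * L_{y'}
E = sp.simplify(L - p * sp.diff(L, p))

# Cycloid (7.9)-(7.10): b - y = C1*sin(phi/2)^2, x = (C1/2)(phi - sin(phi))
w_c = C1 * sp.sin(phi / 2)**2
x_c = C1 / 2 * (phi - sp.sin(phi))
p_c = sp.diff(-w_c, phi) / sp.diff(x_c, phi)   # y' = (dy/dphi)/(dx/dphi)

# Along the cycloid, E^2 reduces to the constant 1/(2*g*C1)
E_c = E.subs({w: w_c, p: p_c})
print(sp.simplify(E_c**2 - 1 / (2 * g * C1)))  # 0, so E is constant
```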
Consider now functionals of the form
F(x) = ∫_a^b L(s, x, x', x'') ds, x ∈ A,
where
A = { x ∈ C^4([a, b]) | x(a) = A_1, x'(a) = A_2, x(b) = B_1, x'(b) = B_2 }.
Moreover, the function L : [a, b] × R^3 → R is assumed to be twice continuously differentiable in each of its arguments. Even though we have now included the second derivative x'' in the Lagrangian, we can proceed quite similarly to Chapter 7.3. Again, our goal is to reformulate the necessary condition δ_x F(h) = 0 for all h ∈ adm(F, x) provided by Theorem 7.2.7 in a more practical manner:
Let x ∈ A be a local extremum of F and let h ∈ C^4([a, b]) with
h(a) = h'(a) = h(b) = h'(b) = 0. (7.11)
Then, x + th ∈ A and
F(x + th) = ∫_a^b L(s, x + th, x' + th', x'' + th'') ds.
Hence, we have
d/dt F(x + th) = ∫_a^b ∂/∂t L(s, x + th, x' + th', x'' + th'') ds
             = ∫_a^b L_x(s, x + th, x' + th', x'' + th'') h + L_{x'}(s, x + th, x' + th', x'' + th'') h'
                   + L_{x''}(s, x + th, x' + th', x'' + th'') h'' ds.
Setting t = 0 and integrating by parts (once for the h' term and twice for the h'' term), all boundary terms vanish since h(a) = h'(a) = h(b) = h'(b) = 0. Thus, by Theorem 7.2.7, we can note that
∫_a^b ( L_x − d/ds L_{x'} + d^2/ds^2 L_{x''} ) h ds = 0
holds for all h ∈ C 4 ([a, b]) satisfying (7.11) if x is a (local) extremum. The fundamental lemma of the
CoV (Theorem 7.3.3) therefore yields the following result.
Theorem 7.5.1
If x ∈ A provides a local extremum (minimum or maximum) of F, then x satisfies the Euler–Lagrange equation
L_x(s, x, x', x'') − d/ds L_{x'}(s, x, x', x'') + d^2/ds^2 L_{x''}(s, x, x', x'') = 0.
x1 (a) = A1 , x1 (b) = B1 ,
x2 (a) = A2 , x2 (b) = B2 .
Once more, we can derive a system of Euler–Lagrange equations whose solutions are the extremals of F by following similar arguments as presented in Chapters 7.2 and 7.3:
Let (x_1, x_2) be a local extremum of F and let h_1, h_2 ∈ C^2([a, b]) be such that
h_1(a) = h_1(b) = h_2(a) = h_2(b) = 0. (7.13)
Then, the pair of functions (x_1 + th_1, x_2 + th_2) is an admissible argument of F and we have
d/dt F(x_1 + th_1, x_2 + th_2) = ∫_a^b ∂/∂t L(s, x_1 + th_1, x_2 + th_2, x_1' + th_1', x_2' + th_2') ds
 = ∫_a^b L_{x_1}(s, x_1 + th_1, x_2 + th_2, x_1' + th_1', x_2' + th_2') h_1
       + L_{x_2}(s, x_1 + th_1, x_2 + th_2, x_1' + th_1', x_2' + th_2') h_2
       + L_{x_1'}(s, x_1 + th_1, x_2 + th_2, x_1' + th_1', x_2' + th_2') h_1'
       + L_{x_2'}(s, x_1 + th_1, x_2 + th_2, x_1' + th_1', x_2' + th_2') h_2' ds
and therefore
δ_{x_1,x_2} F(h) = d/dt F(x_1 + th_1, x_2 + th_2) |_{t=0}
               = ∫_a^b L_{x_1} h_1 + L_{x_2} h_2 + L_{x_1'} h_1' + L_{x_2'} h_2' ds.
By Theorem 7.2.7 and integration by parts, the condition
∫_a^b ( L_{x_1} − d/ds L_{x_1'} ) h_1 + ( L_{x_2} − d/ds L_{x_2'} ) h_2 ds = 0
has to hold for all h_1, h_2 ∈ C^2([a, b]) satisfying (7.13). The fundamental lemma of the CoV (Theorem 7.3.3) therefore yields the following result.
Theorem 7.5.2
If (x_1, x_2) provides a local extremum of F, then x_1 and x_2 satisfy the system of Euler–Lagrange equations
L_{x_1}(s, x_1, x_2, x_1', x_2') = d/ds L_{x_1'}(s, x_1, x_2, x_1', x_2'),
L_{x_2}(s, x_1, x_2, x_1', x_2') = d/ds L_{x_2'}(s, x_1, x_2, x_1', x_2').
More generally, for functionals F depending on n functions x_1, . . . , x_n, where x_i ∈ C^2([a, b]) and x_i(a) = A_i, x_i(b) = B_i for all i = 1, . . . , n, a necessary condition for the n-tuple (x_1, . . . , x_n) to provide a local extremum of F is given by the system of Euler–Lagrange equations
L_{x_i} = d/ds L_{x_i'}
for i = 1, . . . , n.
A river with parallel straight banks b units apart has a stream velocity given by
v(x, y) = ( 0, v(x) )^T.
Assuming that one of the banks is the y-axis and that the point (0, 0) is the point of departure, what route should a boat take to reach the opposite bank in the shortest possible time? Assume that the speed of the boat in still water is c ∈ R^+ with c > v(x) for all x. This problem differs from those in earlier sections in that the right-hand endpoint, the point of arrival on the line x = b, is not specified. Instead, it must be determined as part of the solution. It can be shown that the time required for the boat to cross the river along a given path y = y(x) is again given by an integral functional of the form (7.2). Such a problem is referred to as a free endpoint problem, and if y is an extremal, then a certain condition must hold at x = b.
Generally speaking, conditions of the above type are called natural boundary conditions. In fact, just as common as free endpoint problems are problems where both the start point and the endpoint are unspecified. To outline the treatment of such variational problems, let us consider the problem
F(y) = ∫_a^b L(x, y, y') dx,
where y ∈ C^2([a, b]) with
y(a) = y_a, y(b) free.
Let y be a local extremum of F and let h ∈ C 2 ([a, b]) such that
h(a) = 0. (7.14)
Then, y + th is an admissible function and we have
δ_y F(h) = d/dt F(y + th) |_{t=0} = ∫_a^b L_y h + L_{y'} h' dx.
Note that this time we only have h(a) = 0 and integration by parts therefore yields
δ_y F(h) = ∫_a^b ( L_y − d/dx L_{y'} ) h dx + L_{y'}(b, y(b), y'(b)) h(b).
Thus, if y is a local extremum of F , then
∫_a^b ( L_y − d/dx L_{y'} ) h dx + L_{y'}(b, y(b), y'(b)) h(b) = 0 (7.15)
has to hold for all h ∈ C 2 ([a, b]) with h(a) = 0. Yet, in particular, (7.15) has to hold for all h ∈ C 2 ([a, b])
with h(a) = h(b) = 0. Hence, by the fundamental lemma of the CoV, y must satisfy the Euler–Lagrange
equation
L_y(x, y, y') = d/dx L_{y'}(x, y, y').
In addition, however,
Ly0 (b, y(b), y 0 (b)) = 0
has to hold as well. Then, (7.15) is ensured to hold for all h ∈ C 2 ([a, b]) with h(a) = 0. This observation
is summarized in the following theorem.
Theorem 7.5.4
Let L : [a, b]×R×R → R be a twice continuously differentiable function. If a function y ∈ C 2 ([a, b])
with
y(a) = ya , y(b) free
provides a local extremum for the functional
F(y) = ∫_a^b L(x, y, y') dx,
then y satisfies the Euler–Lagrange equation
L_y(x, y, y') = d/dx L_{y'}(x, y, y') (7.16)
together with the condition
L_{y'}(b, y(b), y'(b)) = 0. (7.17)
While (7.16) is the usual Euler–Lagrange equation, (7.17) is called the natural boundary condition. By similar arguments, if the left endpoint y(a) is unspecified, then the natural boundary condition is given by L_{y'}(a, y(a), y'(a)) = 0.
Example 7.5.5
For the river-crossing problem described above, the natural boundary condition (7.17) can be shown to take the form
c^2 y'(b) / √( c^2 [1 + y'(b)^2] − v(b)^2 ) − v(b) = 0.
Chapter
8
Orthogonal Expansions
Definition 8.1.1
Let (X, ‖·‖) be a normed vector space, V ⊂ X a linear subspace, and f ∈ X. We call v* ∈ V a best approximation of f from V if
‖f − v*‖ ≤ ‖f − v‖
holds for all v ∈ V.
Example 8.1.2
Given is the linear space (R^2, ‖·‖_∞) with ‖(x, y)^T‖_∞ = max{|x|, |y|}, the element f = (0, 1)^T, and the linear subspace V = R × {0}, which corresponds to the x-axis of the xy-plane. Then, every element
v_r* = (r, 0)^T ∈ V, r ∈ [−1, 1],
is a best approximation of f = (0, 1)^T from V with respect to ‖·‖_∞. This can be noted by observing that
‖f − v_r*‖_∞ = ‖(−r, 1)^T‖_∞ = max{|r|, 1} = 1
for r ∈ [−1, 1] and
‖f − v‖_∞ > 1
for all other elements from V.
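A small numerical sketch of this non-uniqueness, and of how it disappears for the Euclidean norm (cf. Theorem 8.1.3 below):

```python
import numpy as np

# f = (0, 1); candidates v = (r, 0) on the x-axis subspace V
f = np.array([0.0, 1.0])
rs = np.linspace(-2, 2, 401)

# Distances from f to (r, 0) in the max-norm and the Euclidean norm
d_inf = np.array([np.max(np.abs(f - np.array([r, 0.0]))) for r in rs])
d_two = np.array([np.linalg.norm(f - np.array([r, 0.0])) for r in rs])

# Every r in [-1, 1] attains the minimal max-norm distance 1 ...
print(rs[np.isclose(d_inf, d_inf.min())])
# ... while only r = 0 attains the minimal Euclidean distance
print(rs[np.isclose(d_two, d_two.min())])
```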
While providing an illustration of best approximations in a well-known setting, Example 8.1.2 also shows that, in general, best approximations are not unique. This changes, however, if we restrict ourselves to inner product spaces. Moreover, it is convenient to assume V to be a finite dimensional linear subspace.¹
Theorem 8.1.3
Let (X, ⟨·, ·⟩) be an inner product space and V ⊂ X be a finite dimensional linear subspace. Then, for every f ∈ X, there exists a unique best approximation v* ∈ V of f with respect to the induced norm ‖x‖ = √⟨x, x⟩.
Proof. The details can be found, for instance, in [Gla20]. Yet, we note that the uniqueness follows from ‖·‖ being strictly convex, which, in turn, follows from the parallelogram law.
Hence, we can establish uniqueness of the best approximation if we restrict ourselves to norms which
are induced by an inner product. This is also demonstrated in the following example.
Example 8.1.4
Again, let us consider the linear space R^2 with linear subspace V = R × {0} and f = (0, 1)^T. This
¹ Generally speaking, uniqueness is already ensured once we restrict ourselves to finite dimensional linear subspaces V and strictly convex norms. Hence, also the usual p-norms with 1 < p < ∞ would yield uniqueness for finite dimensional V. See, for instance, [Gla20, Chapter 3.1.1] and references therein.
time, however, we equip the linear space with the Euclidean norm
‖(x, y)^T‖_2 = √(x^2 + y^2),
which is induced by the standard inner product. Then, in accordance with Theorem 8.1.3, there is only a single best approximation:
v* = (0, 0)^T.
Indeed, ‖f − v*‖_2 = 1 and
‖f − v‖_2^2 = ‖(−x, 1)^T‖_2^2 = x^2 + 1 > 1
for all v = (x, 0)^T ∈ V with x ≠ 0.
We have observed that inner product spaces provide us with a unique best approximation from a finite dimensional linear subspace for any element. Yet, the actual advantage of inner product spaces is the concept of orthogonality. Looking back at our earlier crash course in functional analysis — more precisely Definition 6.1.10 — we call two vectors x, y ∈ X orthogonal if
⟨x, y⟩ = 0.
Utilizing the concept of orthogonality, we can note the following characterization of the best approxima-
tion, which can be used for explicit and practical computations.
Theorem 8.1.5
Let (X, ⟨·, ·⟩) be an inner product space, f ∈ X, and V ⊂ X be a finite dimensional linear subspace. Then, v* ∈ V is the best approximation of f from V if and only if
⟨f − v*, v⟩ = 0 ∀v ∈ V (8.1)
holds.
Proof. Let v ∗ ∈ V . First, we show that (8.1) is a necessary condition for v ∗ being a best approximation.
Afterwards, we prove that (8.1) also is a sufficient condition.
=⇒ : Let us assume that there exists a v ∈ V (with v ≠ 0) such that (8.1) is violated:
α := ⟨f − v*, v⟩ ≠ 0.
Then,
‖f − (v* + λv)‖^2 = ‖f − v*‖^2 − 2λ⟨f − v*, v⟩ + λ^2 ‖v‖^2
and by choosing λ = α/‖v‖^2, we get
‖f − (v* + λv)‖^2 = ‖f − v*‖^2 − α^2/‖v‖^2 < ‖f − v*‖^2.
Hence, v* cannot be the best approximation.
⇐= : Now assume that (8.1) holds and let v ∈ V be arbitrary. Then,
‖f − v*‖^2 = ⟨f − v*, f − v + v − v*⟩ = ⟨f − v*, f − v⟩ + ⟨f − v*, v − v*⟩.
Since v − v* ∈ V, (8.1) yields ⟨f − v*, v − v*⟩ = 0, and the Cauchy–Schwarz inequality gives
‖f − v*‖^2 = ⟨f − v*, f − v⟩ ≤ ‖f − v*‖ ‖f − v‖.
Hence, ‖f − v*‖ ≤ ‖f − v‖ for all v ∈ V, i. e., v* is the best approximation.
Note that the characterization (8.1) essentially means that the error between f and its best approximation v* is orthogonal to the linear subspace V; that is, f − v* ⊥ V. This shows that the best approximation v* is obtained by the orthogonal projection of f onto V.
Remark 8.2.1
The characterization of the best approximation provided by Theorem 8.1.5 allows — at least
formally — an easy computation of the best approximation. The procedure is described below.
Let {v_k}_{k=1}^N be a basis of the finite dimensional linear subspace V ⊂ X. Then, the best approximation v* of f ∈ X from V has a unique representation as a linear combination of the basis elements:
v* = Σ_{k=1}^N α_k v_k. (8.2)
Here, the scalar coefficients α_k are elements of the underlying field, in this case R. These coefficients can be determined by consulting (8.1). Note that, because of the bilinearity of the inner product, it is sufficient to check (8.1) for the basis elements v_1, . . . , v_N. Thus, using the representation (8.2), we get a system of linear equations,
Σ_{k=1}^N α_k ⟨v_k, v_l⟩ = ⟨f, v_l⟩, l = 1, . . . , N.
Introducing
G = ( ⟨v_1, v_1⟩  · · ·  ⟨v_N, v_1⟩
       ⋮                 ⋮
      ⟨v_1, v_N⟩  · · ·  ⟨v_N, v_N⟩ ),
α = (α_1, . . . , α_N)^T, and b = (⟨f, v_1⟩, . . . , ⟨f, v_N⟩)^T, the above system of linear equations can be written in matrix vector notation as
Gα = b. (8.3)
The matrix G is called a Gram matrix; see [HJ12, Chapter 7.2]. It is symmetric (G = G^T) and positive definite, since
β^T G β = Σ_{k,l=1}^N β_k β_l ⟨v_k, v_l⟩
        = ⟨ Σ_{k=1}^N β_k v_k, Σ_{l=1}^N β_l v_l ⟩
        = ‖ Σ_{k=1}^N β_k v_k ‖^2
        > 0
for β ∈ R^N with β ≠ 0. Note that the last estimate follows from the basis elements v_1, . . . , v_N being linearly independent. In particular, the Gram matrix G is therefore regular (invertible) and (8.3) can be solved uniquely for the coefficients α of the best approximation v*. This can be done, for instance, by Gaussian elimination [TBI97, Lecture 20]. Yet, the real beauty of best approximations in inner product spaces is revealed when we choose {v_k}_{k=1}^N to be an orthogonal basis.
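As an illustration of this procedure, the following sketch computes a best approximation by assembling and solving the Gram system (8.3) numerically; the target function f(x) = e^x and the monomial basis are illustrative choices, not taken from the lecture.

```python
import numpy as np

# Best approximation of f(x) = exp(x) from V = span{1, x, x^2} in L^2([0, 1])
x = np.linspace(0.0, 1.0, 20001)
basis = [np.ones_like(x), x, x**2]
f = np.exp(x)

def inner(u, v):
    # L^2 inner product, approximated by the trapezoidal rule
    w = u * v
    return float(np.sum((w[1:] + w[:-1]) * np.diff(x)) / 2)

N = len(basis)
G = np.array([[inner(basis[k], basis[l]) for k in range(N)] for l in range(N)])
b = np.array([inner(f, basis[l]) for l in range(N)])

alpha = np.linalg.solve(G, b)               # coefficients of v*
v_star = sum(a * vk for a, vk in zip(alpha, basis))

# The error f - v* is (numerically) orthogonal to V, in line with (8.1)
print([round(inner(f - v_star, vk), 8) for vk in basis])
```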
Let {v_k}_{k=1}^N be a basis of the linear space V. We say that {v_k}_{k=1}^N is an orthogonal basis if
⟨v_k, v_l⟩ = ‖v_k‖^2 δ_{kl}
holds for all k, l = 1, . . . , N. Here, δ_{kl} denotes the usual Kronecker delta defined by
δ_{kl} = 1 if k = l, and δ_{kl} = 0 if k ≠ l.
If, in addition,
‖v_k‖ = 1
holds for all k = 1, . . . , N, we call {v_k}_{k=1}^N an orthonormal basis.
When using such an orthogonal basis, the Gram matrix G reduces to a diagonal matrix and (8.3)
consists of N independent equations
α_k ‖v_k‖^2 = ⟨f, v_k⟩, k = 1, . . . , N.
Hence, we obtain
α_k = ⟨f, v_k⟩ / ‖v_k‖^2, k = 1, . . . , N,
in this case.
in this case. Note that, when using an orthonormal basis, the Gram matrix even reduces to the identity
matrix and we get
α_k = ⟨f, v_k⟩, k = 1, . . . , N,
for the coefficients. We summarize our above observations in the following theorem.
Let (X, ⟨·, ·⟩) be an inner product space, V ⊂ X be a finite dimensional subspace, and {v_k}_{k=1}^N be an orthogonal basis of V. Then, the best approximation v* of f from V with respect to ‖·‖ = √⟨·, ·⟩ is uniquely given by
v* = Σ_{k=1}^N ( ⟨f, v_k⟩ / ‖v_k‖^2 ) v_k.
The above form of a best approximation already is what is usually referred to as truncated generalized
Fourier series, at least in the case of X = L2 ([a, b]). Here, L2 ([a, b]) denotes the linear space of all square
integrable real-valued functions.
A function f : [a, b] → R is said to be square integrable on [a, b], for which we write f ∈ L^2([a, b]), if
∫_a^b |f(x)|^2 dx < ∞
holds.
Equipped with the inner product
⟨f, g⟩ = ∫_a^b f(x) g(x) dx,
the function space L^2([a, b]) is a (complete) inner product space.² The induced norm,
‖f‖^2 = ∫_a^b |f(x)|^2 dx, (8.5)
is usually referred to either as the L^2 norm or the mean-square norm. In this setting, we define the generalized Fourier series as follows.
Let F = {f_k}_{k=0}^∞ with f_k ≢ 0 be a set of pairwise orthogonal functions in (L^2([a, b]), ⟨·, ·⟩). The (generalized) Fourier series of f ∈ L^2([a, b]) with respect to F is given by
F[f](x) = Σ_{k=0}^∞ c_k f_k(x) (8.6)
with (generalized) Fourier coefficients
c_k = ⟨f, f_k⟩ / ‖f_k‖^2. (8.7)
Sometimes we write F [f ] ∼ f to denote that F [f ] is the (generalized) Fourier series of f . Moreover,
F_N[f](x) = Σ_{k=0}^N c_k f_k(x) (8.8)
is referred to as the N -th truncated (generalized) Fourier series.
² Strictly speaking, we have to partition the square integrable functions into equivalence classes first. Otherwise, the inner product is not even positive definite. We say that two square integrable functions are equivalent if they are equal almost everywhere.
Example 8.2.6
Given is the set F = {f_n}_{n=0}^∞ with f_n(x) = cos(nπx) of pairwise orthogonal functions in (L^2([0, 1]), ⟨·, ·⟩). Let f(x) = 1 − x. We determine the Fourier coefficients c_n of the generalized Fourier series
F_N[f](x) = Σ_{n=0}^N c_n cos(nπx).
First, we note that ∫_0^1 cos(nπx)^2 dx = 1/2 for n ≥ 1 and therefore
‖f_n‖^2 = 1/2, n ≥ 1,
while ‖f_0‖^2 = 1.
Next, we observe that, for n ≥ 1,
⟨f, f_n⟩ = ∫_0^1 (1 − x) cos(nπx) dx
        = −∫_0^1 x cos(nπx) dx
        = [−x sin(nπx)/(nπ)]_0^1 + (1/(nπ)) ∫_0^1 sin(nπx) dx     (integration by parts)
        = −(1/(nπ)^2) [cos(nπx)]_0^1     (the boundary term vanishes since sin(nπ) = 0)
        = −(1/(nπ)^2) (cos(nπ) − 1).
Thus, since
cos(nπ) = +1 if n is even and cos(nπ) = −1 if n is odd,
we have
⟨f, f_n⟩ = 0 if n ≥ 2 is even, and ⟨f, f_n⟩ = 2/(nπ)^2 if n is odd.
We therefore get
c_n = ⟨f, f_n⟩/‖f_n‖^2 = 0 if n ≥ 2 is even, and c_n = 4/(nπ)^2 if n is odd,
while c_0 = ⟨f, f_0⟩/‖f_0‖^2 = ∫_0^1 (1 − x) dx = 1/2.
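The coefficients just derived can be checked numerically; the following sketch evaluates the truncated series F_N[f] (including the constant term c_0 = 1/2) and confirms that the mean-square error decreases with N.

```python
import numpy as np

# Truncated Fourier series of f(x) = 1 - x w.r.t. {cos(n*pi*x)}, using
# c_0 = 1/2 and c_n = 4/(n*pi)^2 for odd n, cf. Example 8.2.6
x = np.linspace(0.0, 1.0, 2001)
f = 1.0 - x

def F_N(N):
    s = 0.5 * np.ones_like(x)
    for n in range(1, N + 1, 2):           # only odd n contribute
        s += 4.0 / (n * np.pi)**2 * np.cos(n * np.pi * x)
    return s

for N in (1, 3, 7, 15):
    e = (f - F_N(N))**2
    err = np.sqrt(np.sum((e[1:] + e[:-1]) * np.diff(x)) / 2)  # mean-square error
    print(N, err)   # decreases towards zero as N grows
```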
In the previous discussion, we have observed that the truncated Fourier series F_N[f] is exactly the best approximation in (L^2([a, b]), ⟨·, ·⟩). This is summarized in the following theorem.
Theorem 8.2.7
Let F = {f_k}_{k=0}^∞ with f_k ≢ 0 be a set of pairwise orthogonal functions in (L^2([a, b]), ⟨·, ·⟩) and let V_N = span{f_0, . . . , f_N} be the linear subspace spanned by the first N + 1 elements of F. For every f ∈ L^2([a, b]), the truncated Fourier series F_N[f] is the best approximation of f from V_N; that is,
‖f − F_N[f]‖ ≤ ‖f − v‖
for all v ∈ V_N.
Hence, the truncated Fourier series is constructed in such a way that no linear combination of f_0, . . . , f_N is able to yield a better approximation to f than F_N[f].
Lemma 8.3.1
Let F = {f_k}_{k=0}^∞ be an orthonormal system in (L^2([a, b]), ⟨·, ·⟩) and let f ∈ L^2([a, b]) with Fourier coefficients c_k = ⟨f, f_k⟩. Then,
‖f − F_N[f]‖^2 = ‖f‖^2 − Σ_{k=0}^N c_k^2. (8.9)
Proof. A direct computation gives
‖f − F_N[f]‖^2 = ⟨f − F_N[f], f − F_N[f]⟩ = ‖f‖^2 − 2⟨f, F_N[f]⟩ + ‖F_N[f]‖^2.
Since F is orthonormal, ⟨f, F_N[f]⟩ = Σ_{k=0}^N c_k^2 as well as ‖F_N[f]‖^2 = Σ_{k=0}^N c_k^2, which yields (8.9).
Because the left-hand side of (8.9) is nonnegative, we moreover obtain
Σ_{k=0}^N c_k^2 ≤ ‖f‖^2.
Since this inequality holds for every N ∈ N, we can conclude Bessel's inequality,
Σ_{k=0}^∞ c_k^2 ≤ ‖f‖^2.
Furthermore, from Lemma 8.3.1 it immediately becomes clear that the Fourier series F[f] converges to f ∈ L^2([a, b]) in the mean-square sense, i. e.
F[f] = f,
if and only if
Σ_{k=0}^∞ c_k^2 = ‖f‖^2 (8.10)
holds. Equation (8.10) is known as Parseval's equality. This identity, in turn, holds for every f ∈ L^2([a, b]) if and only if we restrict ourselves to complete orthonormal systems F.
We call an orthonormal system F = {f_k}_{k=0}^∞ complete in L^2([a, b]) if
⟨f, f_k⟩ = 0 ∀f_k ∈ F =⇒ f ≡ 0.
Thus, an orthonormal system F = {f_k}_{k=0}^∞ is complete if and only if the only function having all its Fourier coefficients vanish is the zero function. Sometimes it is difficult to show completeness and we shall usually just state whether a given orthonormal system is complete. Next, it is proven that completeness is equivalent to equality holding in Bessel's inequality, resulting in Parseval's equality.
Theorem 8.3.4
Let F = {f_k}_{k=0}^∞ be an orthonormal system in L^2([a, b]). F is complete if and only if Parseval's equality,
Σ_{k=0}^∞ c_k^2 = ‖f‖^2,
holds for all f ∈ L^2([a, b]). Here, the c_k's are the Fourier coefficients of f.
Proof. First, we show that F being complete is sufficient for Parseval's equality to hold and afterwards that it is also a necessary condition.
=⇒ : Let F be complete and let f ∈ L^2([a, b]) with Fourier coefficients c_k. By Bessel's inequality and the completeness of the space L^2([a, b]), the Fourier series converges to some function f̃ ∈ L^2([a, b]) which has the same Fourier coefficients as f. Hence, all Fourier coefficients of f − f̃ vanish, and the completeness of F yields f = f̃. Then, we have
‖f‖^2 − ‖f̃‖^2 ≤ ‖f − f̃‖^2 = 0
and, since ‖f̃‖^2 = Σ_{k=0}^∞ c_k^2, in combination with Bessel's inequality, we get
Σ_{k=0}^∞ c_k^2 = ‖f‖^2.
⇐= : Let us assume that F is not complete. That is, there exists an f ∈ L^2([a, b]) with f ≢ 0 such that
⟨f, f_k⟩ = 0 ∀f_k ∈ F.
Then, all Fourier coefficients c_k of f vanish and Parseval's equality would give ‖f‖^2 = Σ_{k=0}^∞ c_k^2 = 0. This, however, implies that f ≡ 0, which is a contradiction to our initial assumption. Hence, Parseval's equality implies completeness.
Finally, using Theorem 8.3.4, we can note that the orthonormal system F being complete is also
equivalent to the Fourier series converging to f ∈ L2 ([a, b]) in the mean-square sense,
lim_{N→∞} F_N[f] = f or F[f] = f in (L^2([a, b]), ‖·‖).
Theorem 8.3.5
Let F = {f_k}_{k=0}^∞ be an orthonormal system in L^2([a, b]) and let f ∈ L^2([a, b]). The Fourier series of f with respect to F converges to f in the mean-square sense if and only if F is complete.
Proof. The assertion follows from Lemma 8.3.1 and Theorem 8.3.4.
Example 8.4.1
Given is the square-integrable function f (x) = x on [−1, 1]. The Fourier coefficients of f ’s Fourier
series are easily computed to be
a_n = ∫_{−1}^{1} f(x) cos(nπx) dx = 0, n = 0, 1, 2, . . . ,
b_n = ∫_{−1}^{1} f(x) sin(nπx) dx = (2/(nπ)) (−1)^{n+1}, n = 1, 2, . . . .
Thus, the Fourier series of f is given by
F[f] = (2/π) Σ_{n=1}^∞ ( (−1)^{n+1}/n ) sin(nπx)
and Theorem 8.3.5 tells us that F[f] converges to f in the mean-square sense. That is,
F[f] = f in (L^2([−1, 1]), ‖·‖) or ‖F[f] − f‖ = 0.
It should be stressed that this equality only holds in the mean-square sense and, for instance, not necessarily in the pointwise sense (F[f](x) = f(x) ∀x ∈ [−1, 1])! In fact, we can note that
F[f](±1) = 0 ≠ ±1 = f(±1).
Hence, the Fourier series does not converge to f at the boundary points. It does, however, converge pointwise to f in (−1, 1).
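Both Parseval's equality and the pointwise behavior discussed above can be observed numerically; here is a small sketch for f(x) = x on [−1, 1].

```python
import numpy as np

# Fourier series of f(x) = x on [-1, 1]: b_n = 2*(-1)**(n+1)/(n*pi)
N = 20000
n = np.arange(1, N + 1)
bn = 2.0 * (-1.0)**(n + 1) / (n * np.pi)

# Parseval: sum of b_n^2 * ||sin(n*pi*x)||^2 (= 1 on [-1, 1]) approaches
# ||f||^2 = int_{-1}^{1} x^2 dx = 2/3
print(np.sum(bn**2), 2.0 / 3.0)

# Pointwise: the partial sums vanish at x = 1 although f(1) = 1 ...
print(np.sum(bn * np.sin(n * np.pi * 1.0)))   # 0 for every N

# ... while at an interior point, e.g. x = 0.5, they approach f(0.5) = 0.5
print(np.sum(bn * np.sin(n * np.pi * 0.5)))
```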
The above example shows that, while we can expect the Fourier series to converge in the mean-square
sense for every f ∈ L2 ([a, b]), pointwise convergence does not hold in general. Yet, we can establish such
a result if we restrict ourselves to piecewise smooth periodic functions.
Definition 8.4.2
Let f : [a, b] → R be a function. We say that f is piecewise continuous if there exists a partition
a = x_0 < x_1 < · · · < x_{N+1} = b
such that
f restricted to (x_n, x_{n+1}) is continuous ∀n = 0, . . . , N
and
f(x_n^+) := lim_{x↘x_n} f(x), f(x_n^−) := lim_{x↗x_n} f(x) ∈ R ∀n = 1, . . . , N.
The second condition means that both one-sided limits of f exist and are finite. Moreover, we say that f is piecewise smooth if both f and its derivative f' are piecewise continuous.
For this class of functions, indeed, we can note the following convergence result.
Let f : [−L, L] → R be periodic and piecewise smooth. Then, its classical Fourier series (8.12) converges pointwise for all x ∈ [−L, L] to the value f(x) if f is continuous at x, and to the average value of its left and right limits at x, namely (1/2)[f(x^+) + f(x^−)], if f is discontinuous at x.
To get stronger convergence results, such as uniform convergence, additional smoothness conditions
on f are required. Observe that continuity of f is not enough to guarantee pointwise convergence.
Chapter
9
Sturm–Liouville Problem
In Chapter 8, we have discussed generalized Fourier series to represent or at least approximate square integrable functions. These assume a (complete) orthogonal system, however. If we already have a basis of L^2([a, b]) at hand, we can construct a complete orthogonal system by the Gram–Schmidt procedure. Here, we discuss another technique to construct complete orthogonal systems without prior knowledge of a basis. This technique builds on ordinary differential equations (ODEs) of Sturm–Liouville type, i. e. ODEs of the form
−[p(x) y']' + q(x) y = λ y (9.1)
for x ∈ [a, b], where λ ∈ R is a constant.
For the bounded interval [a, b], equation (9.1) is usually accompanied by boundary conditions of the
form
α_1 y(a) + α_2 y'(a) = 0, β_1 y(b) + β_2 y'(b) = 0. (9.2)
Here, the pairs of constants α_1, α_2 and β_1, β_2, respectively, are not allowed to both be zero. Otherwise, the boundary condition would collapse at that boundary ("0 = 0"). Two important special cases of (9.2) are (homogeneous) Dirichlet boundary conditions
y(a) = 0, y(b) = 0
and (homogeneous) Neumann boundary conditions
y'(a) = 0, y'(b) = 0.
It is convenient to also distinguish between certain cases regarding the functions p and q in (9.1).
We call the ODE (9.1) together with a boundary condition (9.2) a Sturm–Liouville problem (SLP). Furthermore, if the interval [a, b] is bounded, p ∈ C^1([a, b]), q ∈ C^0([a, b]), and p is never zero in [a, b], we say that the SLP is regular. Otherwise, it is referred to as singular.
It should be stressed that a regular SLP might not have a nontrivial solution (y ≢ 0) for every value of the constant λ.
Remark 9.0.3
The SLP (9.1), (9.2) can be interpreted as an eigenvalue problem for certain linear operators. Let us consider the differential operator
Ly = −[p(x) y']' + q(x) y.
Then, (9.1) can be written as the eigenvalue problem
Ly = λy,
where we are only interested in eigenvectors y ≢ 0 (also called eigenfunctions) that additionally satisfy the boundary conditions (9.2). From this interpretation, we can note that any constant multiple cy of a solution (eigenfunction) y gives another — but not independent — solution (eigenfunction).
The really interesting fact about regular SLPs is that they have an infinite number of eigenvalues λ and the corresponding eigenfunctions form a complete orthogonal system of L^2([a, b]). This, again, allows us to work with orthogonal expansions, a key idea in many applications.
Example 9.0.4
Consider the SLP consisting of the ODE
−y'' = λy (9.3)
on the interval [0, π] together with the homogeneous Dirichlet boundary conditions y(0) = 0 and y(π) = 0. We would like to determine all eigenvalues λ and the corresponding eigenfunctions. One way to find these is to separately consider the cases λ = 0, λ < 0, and λ > 0. (We show later that the eigenvalue λ cannot be complex.)
λ = 0: Then, the ODE (9.3) becomes
y'' = 0
with general solution
y(x) = ax + b.
Yet, the BCs yield
y ≡ 0,
which is not an admissible eigenfunction. Note that eigenvectors (eigenfunctions) are not allowed to be the zero function; otherwise, every scalar would be an eigenvalue. Hence, λ = 0 cannot be an eigenvalue.
λ < 0: Let us express λ as λ = −k^2 for a suitable k ∈ R^+. Then, the ODE (9.3) becomes
y'' − k^2 y = 0
with general solution
y(x) = a e^{kx} + b e^{−kx}.
The BCs yield a = b = 0 and therefore
y ≡ 0.
Thus, there can be no negative eigenvalue.
λ > 0: Let us express λ as λ = k^2 for a suitable k ∈ R^+. Then, the ODE (9.3) becomes
y'' + k^2 y = 0
with general solution y(x) = a cos(kx) + b sin(kx). The BC y(0) = 0 yields a = 0, while the BC y(π) = 0 becomes
b sin(kπ) = 0. (9.4)
For
λ_k = k^2, k ∈ N, (9.5)
equation (9.4) does not pose any restriction on b and we obtain a family of solutions
y_k(x) = b sin(kx), k ∈ N.
Normalizing b = 1, the eigenvalues and eigenfunctions of the SLP are thus given by
λ_k = k^2, y_k(x) = sin(kx), k ∈ N.
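The orthogonality of the eigenfunctions y_k(x) = sin(kx) in L^2([0, π]) can be checked directly; a small numerical sketch:

```python
import numpy as np

# Numerical check that the eigenfunctions y_k(x) = sin(kx) from Example 9.0.4
# are pairwise orthogonal in L^2([0, pi])
x = np.linspace(0.0, np.pi, 100001)

def inner(u, v):
    # L^2 inner product, approximated by the trapezoidal rule
    w = u * v
    return float(np.sum((w[1:] + w[:-1]) * np.diff(x)) / 2)

for k in range(1, 4):
    for l in range(k + 1, 5):
        print(k, l, round(inner(np.sin(k * x), np.sin(l * x)), 10))  # all ~0
```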
In Example 9.0.4, by investigating different cases, we saw that only positive real eigenvalues occurred and that the corresponding eigenfunctions form a complete orthogonal system of L^2([0, π]). The following theorem shows that, in fact, this holds for regular SLPs in general.
Theorem 9.0.5
The regular SLP (9.1), (9.2) has infinitely many eigenvalues λ_k, k ∈ N. The eigenvalues are real and lim_{k→∞} |λ_k| = ∞. Furthermore, the corresponding eigenfunctions y_k form a complete
orthogonal system in L^2([a, b]). That is, every f ∈ L^2([a, b]) can be expanded as
f(x) = Σ_{k=1}^∞ c_k y_k(x)
with coefficients c_k = ⟨f, y_k⟩/‖y_k‖^2, where the series converges in the mean-square sense.
Proof. We only sketch the proof of Theorem 9.0.5; for instance, regarding the existence of the eigenvalues, we refer to the literature [BR78]. Here we just show that the eigenfunctions corresponding to distinct eigenvalues are orthogonal. Let λ_1 ≠ λ_2 be two eigenvalues with corresponding eigenfunctions
y1 and y2 . In particular, these satisfy the ODEs
−[p y_1']' + q y_1 = λ_1 y_1,
−[p y_2']' + q y_2 = λ_2 y_2,
then. Multiplying the first equation by y_2, multiplying the second equation by y_1, subtracting, and integrating over [a, b] yields
(λ_1 − λ_2) ⟨y_1, y_2⟩ = [ p(x) ( y_1(x) y_2'(x) − y_2(x) y_1'(x) ) ]_{x=a}^{x=b}.
Note that y_1 and y_2 satisfy the same boundary conditions (9.2). Thus, y_1(x) y_2'(x) = y_2(x) y_1'(x) for x = a, b and
(λ_1 − λ_2) ⟨y_1, y_2⟩ = 0.
Finally, since λ_1 ≠ λ_2, this proves ⟨y_1, y_2⟩ = 0 (orthogonality of y_1 and y_2).
Another issue concerns the sign of eigenvalues. The following energy argument is sometimes useful
in showing that the eigenvalues are all of the same sign.
Consider, for instance, the SLP
−y'' + q(x) y = λ y, y(a) = 0, y(b) = 0,
where q is a positive continuous function. Multiplying the ODE by y and integrating it over the interval [a, b] gives
−∫_a^b y'' y dx + ∫_a^b q y^2 dx = λ ∫_a^b y^2 dx.
The first integral can be treated by integration by parts, yielding
[−y y']_a^b + ∫_a^b (y')^2 dx + ∫_a^b q y^2 dx = λ ∫_a^b y^2 dx.
Next, the homogeneous Dirichlet boundary conditions cause the boundary terms to vanish. Hence, we have
λ ∫_a^b y^2 dx = ∫_a^b (y')^2 + q y^2 dx.
Note that, for an eigenfunction y ≢ 0, the integrals on both sides are positive. Thus, the eigenvalues λ have to be positive.
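The conclusion of the energy argument can also be observed numerically. The sketch below discretizes the SLP by finite differences (the interval [0, π] and the positive function q(x) = 1 + x^2 are illustrative assumptions) and confirms that all computed eigenvalues are positive.

```python
import numpy as np

# Finite-difference sketch: discretize -y'' + q(x) y = lambda y with
# homogeneous Dirichlet BCs on [0, pi] and inspect the spectrum
n = 500
x, h = np.linspace(0.0, np.pi, n + 2, retstep=True)
xi = x[1:-1]                     # interior grid points
q = 1.0 + xi**2                  # some positive continuous function q

# Symmetric tridiagonal discretization of -d^2/dx^2 + q
A = (np.diag(2.0 / h**2 + q)
     - np.diag(np.ones(n - 1) / h**2, 1)
     - np.diag(np.ones(n - 1) / h**2, -1))

lam = np.linalg.eigvalsh(A)
print(lam[:5])               # smallest eigenvalues -- all positive
print(bool(lam.min() > 0))   # True, in line with the energy argument
```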
Bibliography
[Ari76] R. Ariew. Ockham’s razor: A historical and philosophical analysis of Ockham’s principle
of parsimony. 1976.
[Cra04] J. S. Cramer. The early origins of the logit model. Studies in History and Philosophy of
Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences,
35(4):613–626, 2004.
[Ein05] A. Einstein. Näherungsweise Integration der Feldgleichungen der Gravitation. Albert Ein-
stein: Akademie-Vorträge: Sitzungsberichte der Preußischen Akademie der Wissenschaften
1914–1932, pages 99–108, 2005.
[Gla20] J. Glaubitz. Shock capturing and high-order methods for hyperbolic conservation laws.
Logos Verlag Berlin, 2020.
[HJ12] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, 2012.
[LS88] C.-C. Lin and L. A. Segel. Mathematics applied to deterministic problems in the natural sciences, volume 1. SIAM, 1988.
[Mur12] J. D. Murray. Asymptotic analysis, volume 48. Springer Science & Business Media, 2012.
[MWJ92] T. R. Malthus, D. Winch, and P. James. Malthus: 'An Essay on the Principle of Population'. Cambridge University Press, 1992.
[TBI97] L. N. Trefethen and D. Bau III. Numerical linear algebra, volume 50. SIAM, 1997.