
Lecture notes for “Introduction to Mathematical Modeling” — Freie Universität Berlin, Winter semester 2017/2018

Carsten Hartmann, Nikki Vercauteren, Ralf Banisch

November 28, 2017

Contents

1 Introduction
2 Arguments from scale
  2.1 Dimensional analysis
  2.2 A historical example
3 Arguments from data
  3.1 Linear regression models
  3.2 Least squares and maximum likelihood principles
4 Population models in biology
  4.1 Lotka-Volterra model
  4.2 Stability of fixed points
5 Basic principles of control theory
  5.1 Fishery management based on the logistic model
  5.2 Optimal control
6 Basic principles of bifurcation theory
  6.1 Solar radiation
  6.2 Energy balance models
  6.3 Bifurcation theory
7 Modelling of chemical reactions
  7.1 The chemical master equation
  7.2 Stochastic simulation algorithm
8 Modelling of traffic flow
  8.1 From individual vehicles to vehicle densities
  8.2 Traffic jams and propagation of perturbations
    8.2.1 Numerical solution
  8.3 Flow modeling: macroscopic modeling of traffic flows
  8.4 Traffic flow when the light turns green
  8.5 Some properties of traffic flow from a red light
9 Formal justice
  9.1 Functional equations
  9.2 Criticism and possible extensions

References
1 Introduction
Mathematical tools & concepts: basic ODE
Suggested references: [Ari94, Ben00]

By “mathematical model” we mean anything that can be expressed in terms of mathematical formulae and that is amenable to mathematical analysis. Typical mathematical models involve (list non-exhaustive, non-disjoint):

• deterministic ODE models (e.g. population growth, chemical reactions, mechanical systems and analogues, climate, ...)

• stochastic models (e.g. growth of small populations, chemical reactions in cells, asset prices, weather prediction, ...)

• optimality principles (e.g. principles of utility, properties of materials, minimization of energy consumption, trading strategies, ...)

• discrete or continuous flow models (e.g. queuing problems, traffic, logistics, load balancing in parallel computers, ...)

• statistical models (e.g. distribution of votes, change in the precipitation rate, wage justice, criminal statistics, Google PageRank, ...)

Of course, you can combine any of the aforementioned modelling approaches to obtain what is called a hybrid model.

Building a model: divide et impera. From the modelling viewpoint, the world is divided into things whose effects are neglected (e.g. planetary configurations in a model of traffic flow on a highway), things whose behaviour the model is designed to study, so-called observables or states (e.g. the average number of commuters in a traffic flow model), and things that affect the model, but that are not within the scope of the model, called boundary conditions (e.g., weather conditions in a model of traffic flows).

The standard way to build a mathematical model then involves the following four steps:

1. Formulate the modelling problem: What is the question to be answered? What type of model is appropriate to answer the question?

2. Outline the model: Which effects should be included in the model, which are negligible? Write down relations between the states.

3. Practicability check: Is the model “solvable”, either by analytical methods or numerical simulation? Do I have access to all the parameters in the model? Can the model be used to make predictions?

4. Reality check: Make predictions of known phenomena and compare with available data (qualitatively or quantitatively).

Note that there is a trade-off between simplicity of a model (easier to analyze and interpret) and “realism” or accuracy of the model (potentially more complicated, analysis may be through computer simulations only).

year population in millions
1790 3.93
1810 7.24
1830 12.87
1850 23.19
1870 39.82
1890 62.95
1910 91.97
1930 122.78
1950 150.70
1970 208.00
1990 248.14
2010 308.19

Table 1: U.S. population between 1790 and 2010 (see http://www.census.gov/).

Illustrative example: population growth. Suppose we want to describe the long term growth of a population, specifically we want to predict the growth of, say, the U.S. population over several generations. Here are some numbers from the U.S. Census Bureau (see Table 1).

Now let γ ∈ R be the net reproduction rate per individual (birth rate minus death rate), and N (t) the size of the population at time t ≥ 0. Then, by definition of γ, we have

γ(t) = lim_{∆t→0} (1/N (t)) · (N (t + ∆t) − N (t))/∆t .

(Note that this assumes that the limit exists, which is nonsense given the annual
census data.) This suggests the following model for N as a function of t:

Ṅ (t) = γ(t)N (t) , N (0) = N0 (1.1)

where the dot means differentiation with respect to t. This completes Steps 1
and 2 above. Now suppose that γ is independent of t. The solution of (1.1)
then is
N (t) = N0 eγt , (1.2)
which, depending on the sign of γ, means that the population will either grow (γ positive), die out (γ negative) or stay constant (γ = 0). Sounds okay! So let us skip Step 3 and the question of how to get γ, and directly proceed with Step 4: assuming γ > 0 it holds that

lim_{t→∞} N (t) = ∞ , (1.3)

which cannot be true. (Make sure you understand why the model must be rejected on the basis of this prediction.) So let’s go back to Step 2 and take into account that the growth rate of a population will depend on its size due to limited resources, food supply etc. Specifically, let

γ̃ : [0, ∞) → R , N ↦ γ̃(N )

be strictly decreasing for sufficiently large N , with γ̃(N ) → −∞ as N → ∞, so as to make sure that the reproduction rate becomes negative once a certain population size is exceeded. Getting the precise census data to estimate γ̃(N ) will be difficult, but we may be happy with a rough estimate; the simplest possible scenario is

γ̃(N ) = γ(1 − N/K), γ, K > 0 (1.4)

where γ, K must be determined from data. The resulting differential equation,

Ṅ (t) = γN (t)(1 − N (t)/K) , N (0) = N0 , (1.5)

is called the logistic growth model. It is clear that when N grows, the right
hand side of the equation will become negative and so will the reproduction
rate. This guarantees that N remains finite for all t. It can be shown that

N (t) = KN0 eγt / (K + N0 (eγt − 1)) , (1.6)

which implies that

lim_{t→∞} N (t) = K .

For obvious reasons K is called the system’s capacity. This looks much better than before, and we may now see how well this model fits the data given in Table 1. Clearly, the model could be extended in various ways, e.g., by splitting the population into subpopulations according to sex and age, or by incorporating additional external factors, such as war, immigration etc.
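The fit against Table 1 can be sketched numerically. The following is a minimal illustration (an addition to these notes, not part of the original text) that estimates γ and K by a crude grid search over the closed-form solution (1.6); the grid bounds are ad hoc choices, and a proper treatment of parameter estimation follows in Section 3.

```python
import math

# U.S. census data from Table 1 (year, population in millions)
data = [(1790, 3.93), (1810, 7.24), (1830, 12.87), (1850, 23.19),
        (1870, 39.82), (1890, 62.95), (1910, 91.97), (1930, 122.78),
        (1950, 150.70), (1970, 208.00), (1990, 248.14), (2010, 308.19)]

N0 = data[0][1]  # population in 1790

def logistic(t, gamma, K):
    """Closed-form solution (1.6), with t measured in years since 1790."""
    e = math.exp(gamma * t)
    return K * N0 * e / (K + N0 * (e - 1.0))

def sse(gamma, K):
    """Sum of squared deviations of the model from the census data."""
    return sum((logistic(year - 1790, gamma, K) - pop) ** 2
               for year, pop in data)

# Crude grid search over plausible parameter ranges (a stand-in for a
# proper nonlinear least-squares fit; the ranges are ad hoc).
best = min(((g / 1000.0, K)
            for g in range(10, 60)          # gamma in 0.010 .. 0.059 per year
            for K in range(200, 800, 10)),  # K in 200 .. 790 million
           key=lambda p: sse(*p))
gamma_hat, K_hat = best
print(f"gamma ≈ {gamma_hat:.3f} per year, K ≈ {K_hat} million")
```

Plotting the fitted curve against Table 1 (Exercise 1.1) then shows where the model deviates from the data.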

Problems
Exercise 1.1. Consider the logistic growth model from above.
a) Discuss the issue of parameter estimation and model validation: How could
the unknown parameters γ, K be computed? How well does the model fit
the data? How would you judge the predictive power of the model?
b) Would you trust the model, if N0 was, say, 2 or 3? If not, explain why
and discuss possible ways to improve the model. In case you do trust the
model, interpret the role of the parameter γ.

Figure 2.1: The classical pendulum. The radial position at time t is given by
the arclength s(t) = Lθ(t), hence the radial force on the mass is −mLθ̈(t).

2 Arguments from scale

Mathematical tools & concepts: basic ODE, linear algebra
Suggested references: [Ben00, IBM+ 05]

Let us start with some motivation and look at the classical pendulum (see Fig. 2.1). The governing equation of motion for the angle θ as a function of t is

Lθ̈(t) = −g sin θ(t) (2.1)

with g the acceleration due to gravity and L the length of the pendulum. When θ is small, sin θ ≈ θ and we may replace the last equation by

θ̈(t) = −ω²θ(t) (2.2)

with ω = √(g/L). The solution of (2.2) is

θ(t) = A sin(ωt) + B cos(ωt) (2.3)

with A, B depending on the initial conditions θ(0) and θ̇(0). Since sine and cosine have a period of 2π, we find that the pendulum has period

T = 2π/ω = 2π √(L/g) , (2.4)

which is independent of the mass m of the pendulum and which does not depend on the initial position θ(0) = θ0 .
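As a quick numerical cross-check (an addition to these notes), one can integrate the nonlinear equation (2.1) and verify that the measured period is close to 2π√(L/g) for small initial angles; the step size and the zero-crossing-based period measurement below are ad hoc choices.

```python
import math

def pendulum_period(theta0: float, L: float = 1.0, g: float = 9.81,
                    dt: float = 1e-4) -> float:
    """Integrate L*theta'' = -g*sin(theta) (eq. 2.1) with classical RK4 and
    return the period, measured as twice the time between two successive
    zero crossings of theta."""
    def f(state):
        theta, omega = state
        return (omega, -(g / L) * math.sin(theta))

    theta, omega = theta0, 0.0
    t, crossings = 0.0, []
    while len(crossings) < 2:
        k1 = f((theta, omega))
        k2 = f((theta + 0.5 * dt * k1[0], omega + 0.5 * dt * k1[1]))
        k3 = f((theta + 0.5 * dt * k2[0], omega + 0.5 * dt * k2[1]))
        k4 = f((theta + dt * k3[0], omega + dt * k3[1]))
        new_theta = theta + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        new_omega = omega + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if theta > 0 >= new_theta or theta < 0 <= new_theta:
            crossings.append(t)
        theta, omega, t = new_theta, new_omega, t + dt
    return 2 * (crossings[1] - crossings[0])

L, g = 1.0, 9.81
print(pendulum_period(0.1, L, g), 2 * math.pi * math.sqrt(L / g))
```

For θ0 = 0.1 the two printed numbers agree to about a tenth of a percent; for larger θ0 the nonlinear period becomes visibly longer, which is exactly the θ-dependence that the scale argument below cannot resolve.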

Derivation from scale arguments. Let us now derive the essential dependence of T on L and g without using any differential equations. To this end we conjecture that there exists a function f such that

T = f (θ, L, g, m) . (2.5)

We denote the physical units (a.k.a. dimensions) of the variables θ, L, g, m by square brackets. Specifically,

[T ] = s , [θ] = dimensionless , [L] = m , [g] = m s⁻² , [m] = kg .

In an equation the physical units must match, so the idea is to combine θ, L, g, m in such a way that the physical units of the formula have the unit of time T . This excludes transcendental functions, such as log or tan, for the variables carrying physical units. Using the ansatz

f (θ, L, g, m) ∝ L^{α1} g^{α2} m^{α3} (2.6)

with unknowns α1 , α2 , α3 ∈ R that must be chosen such that

s = m^{α1 +α2} s^{−2α2} kg^{α3} .

Note that we have ignored θ as it does not carry any physical units. By comparison of coefficients we then find

α1 = 1/2 , α2 = −1/2 , α3 = 0 ,

which yields

T ∝ √(L/g) (2.7)

and which is consistent with (2.4). Note, however, that we cannot say anything about a possible dependence of T on the dimensionless angle variable θ. (The unknown dependence on the angle is hidden in the constant prefactor 2π.)

2.1 Dimensional analysis

Let y, x1 , . . . , xn be physical (measurable) scalar quantities, out of which we want to build a model. The quantities y, xi come with fundamental physical units L1 , . . . , Lm with m ≤ n. Now the model consists in assuming that there is an a priori unknown function f such that

y = f (x1 , . . . , xn ) . (2.8)

In the SI system there are exactly seven fundamental physical units: mass (L1 = kg), length (L2 = m), time (L3 = s), electric current (L4 = A), temperature (L5 = K), amount of substance (L6 = mol) and luminous intensity (L7 = cd), and we postulate that the physical dimension of any measurable scalar quantity can be expressed as a product of powers of the L1 , . . . , L7 .

Example 2.1. It holds that

[Energy] = kg m²/s² = L1 L2² L3⁻² .

Here, the number of fundamental physical units is m = 3.

Step 1: Remove redundancies from the model. If the unknown function f is a function of n variables x1 , . . . , xn with m ≤ n fundamental physical units, using the strategy from the pendulum example may lead to an underdetermined system of equations (there we had 4 variables with only 3 fundamental units).

To remove such redundancies, it is helpful to translate the problem into the language of linear algebra: Let L1 , . . . , Lm be our fundamental physical units and identify Li with the i-th canonical basis vector

ei = (0, . . . , 0, 1, 0, . . . , 0)T

of Rm , with the entry 1 in position i. Now pick a subset {p1 , . . . , pm } of {x1 , . . . , xn } so that p1 , . . . , pm are linearly independent in the sense that no [pi ] can be expressed in terms of the [p1 ], . . . , [pi−1 ], [pi+1 ], . . . , [pm ]. With this correspondence, each [pi ] is a linear combination of the canonical basis vectors e1 , . . . , em , and so, by construction, there exist αi,1 , . . . , αi,m ∈ R with

[pi ] = L1^{αi,1} L2^{αi,2} · · · Lm^{αi,m} , (2.9)

such that the vectors

vi = (αi,1 , . . . , αi,m ) ∈ Rm , i = 1, . . . , m ,

are linearly independent and therefore form a basis of Rm .


Example 2.2 (Cont’d). The dimensional unit of energy has the canonical basis representation

(1, 2, −2)T = e1 + 2e2 − 2e3

if considered as a vector in R3 .

We call the set {p1 , . . . , pm } the set of primary variables or primary quantities.1 The secondary variables are then defined as the set

{s1 , . . . , sn−m } = {x1 , . . . , xn } \ {p1 , . . . , pm } . (2.10)

By construction, the secondary variables are expressible as linear combinations of the primary variables. In terms of primary and secondary variables our postulated model (2.8) reads (with an abuse of notation)

y = f (p1 , . . . , pm , s1 , . . . , sn−m ) . (2.11)

Step 2: Construct dimensionless quantities. Having refined our model according to Step 1 above, we construct a quantity z with [z] = [y] such that

z = p1^{α1} · · · pm^{αm} , (2.12)

with uniquely defined coefficients α1 , . . . , αm ∈ R. Now call Π the dimensionless quantity given by Π = y/z, in other words

Π = f (p1 , . . . , pm , s1 , . . . , sn−m ) / (p1^{α1} · · · pm^{αm}) . (2.13)

1 We assume that the set of primary variables exists, otherwise we have to rethink our postulated model f . (It is important that you understand this reasoning.)

We want to express Π solely as a function of the primary variables. To this end note that we can write

[sj ] = [p1 ]^{αj,1} · · · [pm ]^{αj,m}

for suitable coefficients αj,1 , . . . , αj,m ; this can be done for all the sj . Along the lines of the previous considerations we introduce zj with [zj ] = [sj ] by

zj = p1^{αj,1} · · · pm^{αj,m}

and define the dimensionless quantity Πj = sj /zj . Note that, by the rank-nullity theorem, there are exactly n − m such quantities, where n − m is the dimension of the nullspace of the matrix formed by the dimension vectors of the x1 , . . . , xn . Replacing all the sj by zj Πj , we can recast (2.13) as

Π = F (p1 , . . . , pm , Π1 , . . . , Πn−m ) , (2.14)

with the shorthand

F (p1 , . . . , pm , Π1 , . . . , Πn−m ) := f (p1 , . . . , pm , z1 Π1 , . . . , zn−m Πn−m ) / (p1^{α1} · · · pm^{αm}) . (2.15)

This suggests that we regard F as a function F : P → R of the primary variables p1 , . . . , pm where P = span{p1 , . . . , pm } ⊂ Rm . Note, however, that Π and Πj are dimensionless (and so is F ). Hence F is even independent of the primary variables, for otherwise we could rescale, say, p1 , by which none of p2 , . . . , pm or any of the dimensionless quantities Πj change (they are dimensionless); as a consequence F is a homogeneous function of degree 0 in p1 , and the same is true for any of the other pj . Therefore F is independent of p1 , . . . , pm .

Step 3: Find y up to a multiplicative constant. The last statement can be rephrased by saying that y can be expressed in terms of a relation between dimensionless parameters. The surprising implication is that the unknown quantity y that we want to model has the functional form of z in (2.12), namely,

y = Π p1^{α1} · · · pm^{αm} . (2.16)

No trigonometric functions, no logarithms or anything like this appear here. We summarize our findings by the famous Buckingham Π Theorem (see [Buc14]).

Theorem 2.3 (Buckingham, 1914). Any complete physical relation of the form y = f (x1 , . . . , xn ) can be reduced to a relation between the associated dimensionless quantities, where the number of independent dimensionless quantities is equal to n − m, the difference between the number of physical quantities x1 , . . . , xn and the number of fundamental dimensional units. That is, there exists a function Φ : Rn−m → R such that

y = z Φ(Π1 , . . . , Πn−m ) (2.17)

or, in other words,

y = z Φ(s1 /z1 , . . . , sn−m /zn−m ) . (2.18)

2.2 A historical example

To appreciate the power and usefulness of Buckingham’s theorem we have to see it in action. The following example is taken from [IBM+ 05]; see also [Tay50] for the original article. When the U.S. tested the atomic bomb “Trinity” at Los Alamos in 1945, the British physicist and mathematician Sir Geoffrey Taylor2 could quite accurately estimate the mass of the bomb based on the dimensional analysis of the radius of the shock wave as a function of time, using only film footage of the explosion. (The data was still classified then.) Taylor assumed that the radius R of the expanding shock wave due to the explosion could be expressed as

R = f (t, E, ρ, p) (2.19)

where t is time, E the released energy (that is a function of the mass of the bomb), ρ is the density of the ambient air and p denotes air pressure. The corresponding physical units are (cf. Example 2.1)

[R] = L2 , [t] = L3 , [E] = L1 L2² L3⁻² , [ρ] = L1 L2⁻³ , [p] = L1 L2⁻¹ L3⁻² ,

with the three fundamental physical units L1 (mass), L2 (length) and L3 (time). The latter implies that there are three primary variables where, without loss of generality, we pick t, E, ρ. Then

[t] = (0, 0, 1)T , [E] = (1, 2, −2)T , [ρ] = (1, −3, 0)T . (2.20)

(Remember that Taylor wanted to find out how large E was, so our choice was only about 67 percent arbitrary.) Expressing [R] in terms of the chosen basis then leads to the linear system of equations

Ax = b (2.21)

with unknown x = (α1 , α2 , α3 ) and coefficients

A = [ 0 1 1 ; 0 2 −3 ; 1 −2 0 ] , b = (0, 1, 0)T . (2.22)

By construction the matrix A has maximum rank and so the unique solution of (2.21)–(2.22) is x = (2/5, 1/5, −1/5)T , from which we find

[R] = [(t² E/ρ)^{1/5}] . (2.23)
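As a sanity check (an addition to the notes), exact rational arithmetic confirms that the quoted vector solves (2.21)–(2.22):

```python
from fractions import Fraction as F

A = [[0, 1, 1], [0, 2, -3], [1, -2, 0]]   # coefficient matrix (2.22)
b = [0, 1, 0]
x = [F(2, 5), F(1, 5), F(-1, 5)]          # solution quoted in the text

# Compute the residual Ax - b row by row; all entries should vanish.
residual = [sum(aij * xj for aij, xj in zip(row, x)) - bi
            for row, bi in zip(A, b)]
print(residual == [0, 0, 0])  # True
```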

Following (2.16) we can thus form a dimensionless quantity by setting

Π = R (t² E/ρ)^{−1/5} . (2.24)

2 Sir Geoffrey Ingram Taylor F.R.S. (1886–1975) was a British physicist and mathematician, expert on fluid dynamics and part of the British delegation to the Manhattan project between 1944 and 1945. The famous Taylor-Couette instability is named after him.

time (in milliseconds) radius (in meters)
0.10 11.1
0.24 19.9
0.38 25.4
0.52 28.8
0.66 31.9
... ...
3.53 61.1
3.80 62.9
4.07 64.3
4.34 65.6
4.61 67.4
... ...

Table 2: Radius of the shock wave as a function of time (from [Tay50]).

Analogously, we find for the remaining linearly dependent (secondary) quantity

Π1 = p (E² ρ³/t⁶)^{−1/5} . (2.25)

Now combining (2.24)–(2.25), Buckingham’s theorem tells us that

R = (t² E/ρ)^{1/5} Φ( p (E² ρ³/t⁶)^{−1/5} ) . (2.26)

Here Φ(·) is a yet unknown function of Π1 that must be determined from appropriate data.3 In Taylor’s case a reasonable approximation was Φ(Π1 ) ≈ Φ(0), simply because t was small whereas E was fairly large, in other words:

p ≪ (E² ρ³/t⁶)^{1/5} ,

assuming that the numerical values of p and ρ were of order 1. Taylor did experiments with small explosives and found out that Φ(0) ≈ 1, which led him to conclude that [Tay50]

R = (t² E/ρ)^{1/5} (2.27)

describes the radius of the shock wave as a function of time t and the parameters E, ρ. Given measurement data (t, R(t)) and the value of the density of air ρ = 1.25 kg/m³, it is then possible to estimate E and hence the mass of the nuclear bomb. Taylor had the data shown in Table 2.

Taking the log on both sides of (2.27) yields

log R = (2/5) log t + b , b = (1/5) log E − (1/5) log ρ , (2.28)

3 More about this in the next section.

from which E ≈ 8.05 · 10¹³ Joules can be obtained by a least squares fit of the data. (See the next section.) Using the conversion factor 1 kiloton = 4.186 · 10¹² Joules, Taylor estimated the yield of the nuclear bomb Trinity as 19.2 kilotons. The true yield of the bomb was about 21 kilotons, which was revealed much later. Thus Taylor’s estimate proved indeed quite accurate.
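The fit can be reproduced approximately from the rows of Table 2 shown above (a sketch, not part of the original notes). Since the elided rows of the table are unavailable here, the result differs slightly from Taylor's full-data value E ≈ 8.05 · 10¹³ J.

```python
import math

# Rows of Table 2 reproduced in the text: (time in ms, radius in m).
data_ms = [(0.10, 11.1), (0.24, 19.9), (0.38, 25.4), (0.52, 28.8),
           (0.66, 31.9), (3.53, 61.1), (3.80, 62.9), (4.07, 64.3),
           (4.34, 65.6), (4.61, 67.4)]
rho = 1.25  # density of air in kg/m^3

# Least squares fit of log R = (2/5) log t + b with the slope fixed at 2/5
# reduces to averaging b_i = log R_i - (2/5) log t_i over the data.
b = sum(math.log(R) - 0.4 * math.log(t * 1e-3)   # convert ms -> s
        for t, R in data_ms) / len(data_ms)
E = rho * math.exp(5 * b)        # invert b = (1/5) log E - (1/5) log rho
kilotons = E / 4.186e12
print(f"E ≈ {E:.2e} J, i.e. about {kilotons:.1f} kilotons of TNT")
```

The early-time rows lie slightly off the late-time asymptote, which is one reason the partial-data estimate comes out a bit below Taylor's.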

Problems

Exercise 2.4. Explain the statement p ≪ (E² ρ³/t⁶)^{1/5} below equation (2.26). Why would a statement like t⁶ ≪ E² or t ≈ 0 be meaningless?
Exercise 2.5. Prove that the α1 , . . . , αm in (2.12) are unique.
Exercise 2.6. A recurrent theme in both U.S. kitchens and books on mathematical modelling is the question how to cook a turkey. Cookbooks sometimes give directions of the form: “Set the oven to T0 = 180◦ C and put the turkey in the oven for 20 minutes per pound of weight.” Analyse (and criticise) this rule of thumb based on the following modelling assumptions:

a) A piece of meat is cooked when its minimum internal temperature has reached a certain value Tmin that may depend, e.g., on the type of meat.

b) The cooking time t is a function of the difference ∆T between the temperatures of the oven and the raw meat, the thermal conductivity κ of the meat, its average density ρ and the characteristic size (length) l of the piece of meat.

c) Most mammals and birds obey the law of elastic similarity that says that larger animals have relatively thicker trunks so as to ensure a certain stability against external stress while being efficient regarding the use of material (e.g. bones) [Bie05]. Maximum efficiency is achieved when the vertical thickness t of a body (trunk) scales with its length l as t ∝ l3/2 . Together with the fact that volume is proportional to lA, with A ∝ t2 being the cross-sectional area, the elastic similarity principle implies that the mass of the body of a bird or mammal is proportional to lt2 ∝ l4 .

d) Temperature is measured in units of energy per volume (and so are temperature differences), and the thermal conductivity measures the amount of energy crossing a unit cross-sectional area per second divided by the temperature gradient perpendicular to this area, i.e.,

[κ] = ([energy] × [length]) / ([area] × [time] × [temperature]) .

3 Arguments from data
Mathematical tools & concepts: linear algebra, random variables
Suggested reference: [BTF+ 99]

Recall the problem of Section 2.2: Determine the size of a nuclear bomb
from measurement data {(ti , R(ti )) : i = 1, 2, . . . , N } based on the model (2.27).
The equivalent logarithmic representation (2.28) is an equation of the form
y(t) = αx(t) + β ,
with the new variables y = log R and x = log t, known parameter α and un-
known parameter β. If the measurement data and the model were exact, it
would be possible to estimate the unknown coefficient β from a single measure-
ment (x(t1 ), y(t1 )) = (log t1 , log R(t1 )).
If we take into account that measurement data are subject to measurement errors coming, e.g., from the measurement apparatus or from other sources of error that are not part of the measurement model, then an apparently more realistic model could be an equation of the form

Y (t) = αX(t) + β + ε(t) , (3.1)

where ε is a (typically stationary Gaussian) stochastic process that represents the measurement or, more generally, statistical noise.4

3.1 Linear regression models

Generally, we consider linear models of the form

Y (t) = Σ_{i=1}^{n} αi Xi (t) + β + ε(t) . (3.2)

Here α = (α1 , . . . , αn ) ∈ Rn and β are the a priori unknown model parameters, Y denotes the dependent variable, which for simplicity is assumed to be scalar and which depends on the n independent variables X1 , . . . , Xn . We assume that ε is stationary with zero mean and variance σ 2 , i.e. for all t it holds that

E[ε(t)] = 0 , E[ε(t)2 ] = σ 2 . (3.3)

Note that the model is called linear because of the linear dependence on the parameters; it does not matter whether the function is linear or nonlinear in the Xi , for we can always redefine the Xi by a suitable nonlinear transformation, such as Xi ↦ log Xi , to obtain an equation of the form (3.2).
Example 3.1 (Cobb-Douglas production function). Another model of the above form is the popular Cobb-Douglas model that is used to express the amount of output of a production as a function of capital and labor [Dou76]. In its most standard form for production of a single good with two factors, the function is

Y = κ L^{α1} C^{α2} e^{ε} , (3.4)

4 That is, ε(t) = ε(t; ω) is a random variable ε(t; ·) : Ω → R for each fixed t. We adopt the convention that capital letters, such as X, Y , are used to distinguish real-valued random variables (i.e. measurable functions X, Y : Ω → R on some probability space (Ω, E, P ), with E being a σ-algebra over the set Ω) from their values x, y ∈ R.

with Y the total production (the real value of all goods produced in a year), L the labor input (the total number of person-hours worked in a year), C the capital input (the real value of all machinery, equipment and buildings), κ the productivity, and ε the statistical error. The coefficients αi are called the output elasticities; they are a measure for the susceptibility of the output to a change in the levels of either labor or capital used. If α1 + α2 = 1, then doubling the usage of capital C and labor L will also double the output Y .

Taking the logarithm on both sides of (3.4), we have

log Y = α1 log L + α2 log C + log κ + ε , (3.5)

where the right hand side of the equation is an affine function of the form

f (x) = αT x + β + ε , (3.6)

with the coefficients α = (α1 , α2 )T and β = log κ, and independent variables (X1 , X2 ) = (log L, log C). It is commonly assumed that ε is a zero-mean Gaussian random variable with variance σ 2 that is independent of C and L.5

Measurements. Without loss of generality we can assume that β = 0. This is so, because we can always treat β as one of the αi for Xi (t) = 1. (See Exercise 3.13 below.) We now want to estimate the unknown model parameters α = (α1 , . . . , αn ) from, say, N noisy measurements

M := {(x(tj ), y(tj )) : j = 1, . . . , N } (3.7)

of the measurement vector (x, y) = (x1 , . . . , xn , y). In other words, we have N observational equations of the form

y(tj ) = Σ_{i=1}^{n} αi xi (tj ) + ε(tj ) , j = 1, . . . , N . (3.8)

It is convenient to arrange the data as the matrix

X = ( xi (tj ) )_{j=1,...,N ; i=1,...,n} ∈ R^{N ×n} , (3.9)

whose j-th row is (x1 (tj ), . . . , xn (tj )), and to define the vectors

Y = (y(t1 ), . . . , y(tN ))T , ε := (ε(t1 ), . . . , ε(tN ))T . (3.10)

Thus (3.8) reads

Y = X α + ε , (3.11)

where α is interpreted as a column vector. At this stage it does not matter anymore whether the data are interpreted as N i.i.d. realizations of a random variable, or as a time series with N independent data points.6 In the following we will regard X as a measurement of the deterministic variables X = (X1 , . . . , Xn ) where the error in the dependent variable Y comes from the additive noise ε.

5 The Gaussianity is a consequence of the Central Limit Theorem. Can you explain why?
6 Here “i.i.d.” stands for “independent and identically distributed”.

Figure 3.1: Linear regression for n = 1: Find the straight line that minimizes
the sum of squared deviations from the data points.

3.2 Least squares and maximum likelihood principles

We want to estimate the unknown coefficients α from (independent) observations stored in the data matrices (X , Y).

Assumption 3.2. We implement the following standing assumptions for (3.11):

a) The vector ε ∈ RN is a random variable with mean and covariance

E[ε] = 0 , E[εεT ] = σ 2 IN ×N .

b) The matrix X has full column rank n ≤ N .

Least squares. A general procedure to determine the unknown parameters is to choose α such that the error ε is small in a suitable norm ‖ · ‖. A convenient choice is to minimize ε in the ℓ2 norm, in other words, we seek α such that the sum of squared deviations

Σ_{i=1}^{N} |ε(ti )|2 = (Y − X α)T (Y − X α)

is minimized (cf. Figure 3.1). To this end we define the function

SN : Rn → R , α ↦ (Y − X α)T (Y − X α) . (3.12)

Under the above assumptions SN is differentiable and strictly convex, hence it has a unique minimum. (See Exercise 3.11 below.)

Definition 3.3. We call the minimizer

α∗ = argmin_{α∈Rn} SN (α) (3.13)

the least squares estimator (LSE) of α given the data (X , Y).

Theorem 3.4 (Least squares estimator). The LSE is given by

α∗ = (X T X )−1 X T Y . (3.14)

Proof. The first and second derivatives of SN with respect to the parameter vector α are given by

∇SN (α) = 2X T X α − 2X T Y , ∇2 SN (α) = 2X T X .

Equating the first derivative to zero we obtain the normal equation

X T X α − X T Y = 0 ,

which, under the assumption that the data matrix X ∈ RN ×n has maximum rank n, has the unique solution

α∗ = (X T X )−1 X T Y .

Since, by the same argument, ∇2 SN (α) is positive definite and independent of α, it follows that SN (α∗ ) is the unique minimum of SN .
Remark 3.5. The LSE can be interpreted as the best approximation of the solution to the linear equation

X α = Y (3.15)

in the Euclidean norm for given X ∈ RN ×n , Y ∈ RN . If N > n the linear system of equations (3.15) does not in general have a solution, and by construction, the LSE satisfies the best-approximation property

α∗ = argmin_{α∈Rn} ‖X α − Y‖2 ,

with ‖ · ‖ denoting the Euclidean norm. In other words, X α∗ is the orthogonal projection of Y onto the column space of X .

Remark 3.6. It can be shown (see, e.g., [BTF+ 99, Sec. 3.4]) that, given Assumption 3.2, the LSE is the linear estimator of α that has minimum variance among all linear estimators (best linear estimator).
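A minimal numerical illustration of the normal equations (an addition to these notes): for the scalar model Y = aX + b + ε with the intercept folded into the design matrix as a constant column, (3.14) reduces to a 2×2 system. All parameter values below are arbitrary choices for the simulation.

```python
import random

# Synthetic data for y = a*x + b + noise.
random.seed(0)
a_true, b_true, sigma = 2.0, -1.0, 0.1
N = 1000
xs = [i / N for i in range(N)]
ys = [a_true * x + b_true + random.gauss(0.0, sigma) for x in xs]

# Normal equations (X^T X) alpha = X^T Y for the design matrix with
# columns [x, 1] reduce to a 2x2 system in (a, b); solve it by Cramer's rule.
Sxx = sum(x * x for x in xs); Sx = sum(xs)
Sxy = sum(x * y for x, y in zip(xs, ys)); Sy = sum(ys)
det = Sxx * N - Sx * Sx
a_hat = (N * Sxy - Sx * Sy) / det
b_hat = (Sxx * Sy - Sx * Sxy) / det
print(a_hat, b_hat)  # close to the true values 2.0 and -1.0
```

Increasing N shrinks the estimation error, which is the point of Figure 3.2.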

Maximum likelihood principle. So far we have viewed (3.8) as a fitting problem for the unknown parameter vector α without making any assumptions on the very nature of the measurement error ε or on the distributions of (X, Y ). From now on we assume that X is deterministic and that all the ε(ti ) are uncorrelated Gaussian random variables.7

Assumption 3.7. The vector ε ∈ RN is an N -dimensional Gaussian random variable with zero mean and covariance

E[εεT ] = σ 2 IN ×N .

7 Thanks to the central limit theorem, this is often a reasonable choice.

Then in the linear regression model (3.8), the dependent variables Y are
Gaussian with
Y ∼ N (X α, σ 2 ) . (3.16)
We define the likelihood function of Y as the Gaussian density of Y , but con-
sidered as a function of the parameters α and σ 2 , i.e.
|y − αT x|2
 
L(α, σ 2 ; x, y) = (2πσ 2 )−1/2 exp − . (3.17)
2σ 2
By Assumption 3.7, the likelihood function of the data vector Y then is

L(α, σ 2 ; X , Y) = ∏_{i=1}^{N} L(α, σ 2 ; x(ti ), y(ti ))
                 = (2πσ 2 )−N/2 exp( −(Y − X α)T (Y − X α)/(2σ 2 ) ) . (3.18)
The idea of the maximum likelihood principle now is to find the parameters α, σ 2 that best explain the given data (X , Y). (Remember that we assume that X is deterministic.) That is, we seek α, σ 2 such that the joint normal density f (X , Y; α, σ 2 ) of the data (X , Y) attains its maximum. Considering this density as a function of the unknown parameters (over which we want to maximize), we recover the likelihood function

L(α, σ 2 ; X , Y) = f (X , Y; α, σ 2 ) . (3.19)

Definition 3.8. We call

(α̂, σ̂ 2 ) = argmax_{(α,σ 2 )∈Rn ×R+} L(α, σ 2 ; X , Y) (3.20)

the maximum likelihood estimator (MLE) of (α, σ 2 ) given the data (X , Y).
The logarithm is monotonic, and it is often more convenient to maximize the log-likelihood

log L(α, σ 2 ; X , Y) = −(N/2) log(2πσ 2 ) − (1/(2σ 2 )) (Y − X α)T (Y − X α) (3.21)

rather than the likelihood function itself. It is readily seen that the maximizer of the log-likelihood function is the MLE. The proof of the next theorem is left as an exercise to the reader.
Theorem 3.9 (Maximum likelihood estimator). The MLE of (α, σ 2 ) is given by

α̂ = (X T X )−1 X T Y
σ̂ 2 = (1/N ) (Y − X α̂)T (Y − X α̂) (3.22)

Proof. Exercise.
Note that α̂ = α∗ , i.e., the MLE of α agrees with the LSE. It should be
stressed that both LSE and MLE are linear transformations of the random
observation Y, hence they are both random variables [BTF+ 99].

Figure 3.2: LSE and MLE of the one-dimensional model (3.23) with N = 10^2 and N = 10^5 data points (red: exact model, green: estimated model).

Remark 3.10. The MLE σ̂ 2 = σ̂ 2 (N ) for the variance σ 2 of the measurement error ε is asymptotically unbiased, i.e., it holds that

σ 2 = lim_{N →∞} E[σ̂ 2 (N )] .

Example. As an illustrative example we consider the model

Y (t) = (1/2) X(t) + ε(t) (3.23)

with ε ∼ N (0, σ 2 ). We have generated a random realization of ε with N data points, from which we generated observation data

x(ti ) = i/N , y(ti ) = 0.5 x(ti ) + ε(ti ) , i = 1, . . . , N .

We then computed the LSE (equivalently: the MLE) for the given data. Figure 3.2 shows two typical realizations for N1 = 10^2 and N2 = 10^5 . The corresponding estimates are α1∗ = 0.5572 for N1 = 10^2 and α2∗ = 0.5000 for N2 = 10^5 . The
fact that the estimator is linear in Y , together with the fact that the noise has
mean zero, implies that the LSE and MLE estimators are unbiased with

E[α∗ ] = E[α̂] = α .

This is to say that both MLE and LSE estimators will always fluctuate around
the true value α, no matter how small N is. By the law of large numbers, their
empirical mean converges to α when the estimation is repeated infinitely often.8
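The example above can be reproduced in a few lines. The following sketch is in Python/NumPy rather than Matlab; the seed, the noise level σ = 0.2 and the sample sizes are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

def lse_mle(N, alpha=0.5, sigma=0.2):
    """Draw N samples of the model (3.23) and return the LSE/MLE of alpha
    together with the MLE of the noise variance, cf. (3.22)."""
    x = np.arange(1, N + 1) / N                 # x(t_i) = i/N
    y = alpha * x + rng.normal(0.0, sigma, N)   # y(t_i) = 0.5 x(t_i) + eps(t_i)
    X = x[:, None]                              # data matrix, here n = 1
    alpha_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations
    resid = y - X @ alpha_hat
    return alpha_hat[0], resid @ resid / N      # (alpha_hat, sigma2_hat)

a_small, _ = lse_mle(10**2)
a_large, _ = lse_mle(10**5)
# both estimates fluctuate around the true value 0.5; the fluctuations
# shrink as N grows
```

With the (arbitrary) seed above one observes the same qualitative behaviour as in Figure 3.2: the estimate for N = 10^2 is typically off by a few percent, while the estimate for N = 10^5 agrees with α = 0.5 to several digits.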

Generalized linear models (whitening). The linear regression model under Assumption 3.2 is a special case of a model with error covariance

E[εεT ] = σ 2 W ,
8 Clearly the estimator converges faster when N is larger, because by the central limit

theorem its variance decreases with 1/N .

with W ∈ RN ×N being a symmetric and positive definite (s.p.d.) matrix. A noticeable feature of this so-called generalized linear model is that there are N (N + 1)/2 additional unknowns in the game if W is not known a priori. Therefore it is impossible to estimate both α and σ 2 W without further assumptions on the measurement error if the sample size N is fixed.
For simplicity we assume that W is given. In this case it is possible to reduce
the parameter estimation problem for the generalized model to the estimation
problem for (3.11). To this end, recall that both W and its inverse W −1 have
s.p.d. square roots, i.e., there exist s.p.d. matrices Q, R, such that

W = QQ , W −1 = RR .

Multiplying (3.11) with R = W −1/2 from the left, we obtain

RY = RX α + Rε , (3.24)

which upon redefining

Ỹ = RY , X̃ = RX , ε̃ = Rε

can be recast as

Ỹ = X̃ α + ε̃ . (3.25)

Rescaling the observation variables by the square root of the inverse error covariance is known by the name of whitening, because now

E[ε̃ε̃T ] = σ 2 IN ×N .

As a consequence, the last equation is again a standard linear system satisfying Assumption 3.2, and we can apply Theorem 3.4 or Theorem 3.9 to find that the best linear estimator for the unknown parameters α reads

α̃∗ = (X T W −1 X )−1 X T W −1 Y . (3.26)

The assertion that α̃∗ is unbiased and indeed the best linear estimator for the parameters in the generalized linear model (3.24) is the content of the Gauss–Markov theorem; the interested reader is referred to [BTF+ 99, Thm. 4.4] for details.

Problems
Exercise 3.11. For the linear model (3.1) with unknown scalar coefficients (α, β), we define the LSE (α∗ , β ∗ ) as the minimizer of the function

SN (α, β) = Σ_{i=1}^{N} (y(ti ) − αx(ti ) − β)2 .

Prove that SN is convex, and strictly convex if

Σ_{i=1}^{N} (x(ti ) − x̄)2 > 0 , where x̄ = (1/N ) Σ_{i=1}^{N} x(ti ) .

Exercise 3.12. If we drop the assumption that the data matrix X ∈ RN ×n has
full rank n ≤ N , the LSE α∗ is given by any solution of the normal equations

X T X α − X T Y = 0.

In this case α∗ is no longer unique. Show that SN (α) as defined in (3.12) attains
its minimum for any solution of the normal equations.
Exercise 3.13. Consider the linear model (3.1) with unknown scalar coefficients
(α, β). Compute LSE and MLE of the parameters (α, β); cf. exercise 3.11.

Figure 4.1: Italian Front 1915–1917 (source: History Department of the US
Military Academy).

4 Population models in biology


Mathematical tools & concepts: ODE systems, eigenvalues
Suggested reference: [IBM+ 05]

A famous model that describes how biological populations evolve in time is the Lotka–Volterra predator–prey model, a variant of the logistic equation (1.5). It was derived independently by Alfred J. Lotka in 1910 in the context of chemical reactions [Lot10] and by Vito Volterra in 1926 in order to explain an apparent paradox in the fish catches in the Adriatic Sea after World War I [Vol26]. The paradox is an interesting one: In 1915, when Italy declared war on Austria and both countries were afraid of being invaded via their sea ports, they set mines in the Adriatic Sea to prevent the other party from reaching their harbours. As a consequence, fishing was suspended in the Adriatic Sea during World War I. After the war, when the mines had been removed, the fishermen expected an enormous fish catch, as the fish populations had had more than three years to recover. However, the opposite was true, and it was Volterra who first came up with an explanation using a mathematical model that is nowadays known as the Lotka-Volterra equation.

4.1 Lotka-Volterra model

The Lotka-Volterra equation is the logistic equation for a biological population with two species, one being the predator, the other one being the prey, for example a fish predator and its prey. Let P (t) be the number of predators and N (t) the number of prey at time t. We assume that there is enough food available for the prey (e.g. plankton), but they are eaten by the predator. The rate of change of the prey per capita can be modelled by

Ṅ (t)/N (t) = a − bP (t) , (4.1)

which describes exponential growth of the prey with effective growth rate a − bP (t). The reproduction rate of the predator population depends on whether there is enough for them to eat; they die without prey. Letting cN (t) − d be the effective growth rate, the size of the predator population is governed by

Ṗ (t)/P (t) = −d + cN (t) . (4.2)

We assume that a (prey reproduction rate), b (the rate of predation upon the prey), c (growth rate of the predator population) and d (predator mortality) are all strictly positive and that (4.1)–(4.2) are equipped with suitable initial conditions N (0) = N0 > 0 and P (0) = P0 > 0.
It is customary to rescale the free variable t and the dependent variables N, P to recast the equations in dimensionless form.9 To this end we define

τ = at , u = (c/d) N , v = (b/a) P ,

in terms of which the Lotka-Volterra equations read (see Exercise 4.6)

du/dτ = u(1 − v) , u(0) = u0
dv/dτ = µv(u − 1) , v(0) = v0 , (4.3)

with µ = d/a.
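The dimensionless system (4.3) is easy to integrate numerically. The following sketch uses Python/NumPy with a hand-rolled Runge-Kutta step (the notes use Matlab solvers instead); µ = 1 and (u0 , v0 ) = (1.2, 1.2) are arbitrary illustration values:

```python
import numpy as np

def f(z, mu):
    """Right hand side of the dimensionless Lotka-Volterra system (4.3)."""
    u, v = z
    return np.array([u * (1.0 - v), mu * v * (u - 1.0)])

def rk4(z0, mu, h, steps):
    """Classical 4th-order Runge-Kutta integration of the autonomous system."""
    traj = [np.asarray(z0, dtype=float)]
    for _ in range(steps):
        z = traj[-1]
        k1 = f(z, mu)
        k2 = f(z + 0.5 * h * k1, mu)
        k3 = f(z + 0.5 * h * k2, mu)
        k4 = f(z + h * k3, mu)
        traj.append(z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(traj)

traj = rk4([1.2, 1.2], mu=1.0, h=0.01, steps=2000)   # tau in [0, 20]
u, v = traj[:, 0], traj[:, 1]
# both components oscillate around the equilibrium (1, 1)
```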

Vector field and fixed points. Even though there is an explicit solution
to (4.3), we will not take advantage of this fact, but rather try to get some
qualitative insight into the dynamics of the Lotka-Volterra system by studying
the underlying autonomous (i.e. time-independent) vector field. Let

Fµ : R+ × R+ → R2 , (u, v) ↦ (u(1 − v), µv(u − 1)) (4.4)

be the family of vector fields associated with the Lotka-Volterra system, i.e. the
right hand side of (4.3) parametrized by µ > 0. The Lotka-Volterra vector field
is depicted in Figure 4.2 for µ = 1. Since Fµ is locally Lipschitz, the Picard-
Lindelöf existence and uniqueness theorem for initial value problems [Tes12]
implies that (4.3) has a unique solution. The solutions then are the integral
curves of Fµ , i.e., for every (u0 , v0 ) ∈ R+ × R+ a differentiable curve

γ : D → R+ × R+ , τ ↦ (u(τ ), v(τ )) (4.5)

with γ(0) = (u0 , v0 ) is a solution of (4.3) if

dγ(τ )/dτ = Fµ (γ(τ )) , τ ∈ D ⊂ R. (4.6)
9 You should be able to explain the rationale behind the dimensionless scaling.

Figure 4.2: Vector field of the Lotka-Volterra system for µ = 1.

In other words, the solution trajectories are everywhere tangential to the vector field. This gives us some idea of what a typical solution of (4.3) might look like. An important property of any vector field is the set of its critical points:
Definition 4.1. A point (ueq , veq ) ∈ R+ × R+ is called a critical point, equilibrium or fixed point of (4.6) if Fµ (ueq , veq ) = 0.
By definition, a solution that goes through a critical point is constant, hence
the names “equilibrium” or “fixed point”. The Lotka-Volterra system has only
two critical points in the positive orthant, namely
(ueq , veq ) = (0, 0) and (ueq , veq ) = (1, 1) . (4.7)
The dynamics when one of the populations is absent at time τ = 0 is relatively easy to understand: if u0 = 0 and v0 > 0, the first equation in (4.3) entails u(τ ) = u0 which, together with the second equation, implies that

v(τ ) = e−µτ v0 . (4.8)

Thus the predators are bound to die out. Conversely, if v0 = 0 and u0 > 0 it follows by the analogous argument that

u(τ ) = eτ u0 , (4.9)

assuming that the prey population has infinite resources available. (They will die out later when they have eaten all the plankton.) If, however, both u0 and v0 are different from zero but small, then u0 ≫ u0 v0 and v0 ≫ u0 v0 , which suggests to neglect the bilinear terms in (4.3) and employ the approximation

du/dτ ≈ u , u(0) = u0
dv/dτ ≈ −µv , v(0) = v0 . (4.10)
This means that the prey population will still grow even though there is a small predator population, while the number of predators will decrease initially; they do not have enough to eat, so they die before they can reproduce. Clearly, once the prey population grows so that u(τ )v(τ ) is no longer small compared to u(τ ), the approximation behind (4.10) is no longer valid.
Now let us consider the other equilibrium (ueq , veq ) = (1, 1). To linearize Fµ about the point (1, 1), it is convenient to introduce new coordinates by ξ = u − 1 and η = v − 1, in terms of which (4.3) reads

dξ/dτ = −η(1 + ξ) , ξ(0) = u0 − 1
dη/dτ = µξ(η + 1) , η(0) = v0 − 1 . (4.11)

Upon noting that u, v ≈ 1 is equivalent to ξ, η ≈ 0, this leads to the following linearized system of differential equations

dξ/dτ ≈ −η , ξ(0) = u0 − 1
dη/dτ ≈ µξ , η(0) = v0 − 1 . (4.12)

Upon replacing the “≈” in the linearized equation by equality signs, the latter is equivalent to the differential equation of the pendulum,

d2 ξ/dτ 2 = −µξ(τ ) ,

from page 6, with solution

ξ(τ ) = A sin(µ1/2 τ ) + B cos(µ1/2 τ ) . (4.13)

The unknown constants A, B ∈ R depend only on the initial conditions. Now, since η = −dξ/dτ , we may suspect that (ξ(τ ), η(τ )) and hence (u(τ ), v(τ )) are periodic in the neighbourhood of the equilibrium, with period given by

T = 2πµ−1/2 . (4.14)

We will come back to the validity of these kinds of arguments that are based on
linearization later on.

Integral curves and periodic orbits. The above reasoning suggests that
the solutions of the Lotka-Volterra equation are periodic, at least in the neigh-
bourhood of the critical point (1, 1). We will now show that all nonstationary
solutions of (4.3), i.e. all solutions away from the critical points are indeed pe-
riodic. For this purpose we need the following definition.
Definition 4.2. A function I : R2 → R is called a first integral (also: constant of motion or conserved quantity) of (4.6) if

I(γ(τ )) = I(γ(0)) ∀τ ∈ D

for every solution γ of (4.6).

First integrals of an ODE, such as (4.3), are useful in either finding explicit
solutions or in finding periodic orbits. Specifically, if an ODE has an integral
with compact level sets, then these level sets are candidates for periodic orbits.

Theorem 4.3. Let u0 , v0 > 0, (u0 , v0 ) ≠ (1, 1). Then τ ↦ (u(τ ), v(τ )) is periodic, i.e., there exists a positive number T ∈ (0, ∞), such that

(u(τ + T ), v(τ + T )) = (u(τ ), v(τ )) ∀τ ≥ 0 .

Proof. Suppose u 6= 0 and v 6= 1. If we divide the second equation of (4.3) by


the first equation and switch to u as the free variable, we obtain

dv v(1 − u)
= −µ , v(u0 ) = v0 ,
du u(1 − v)

which is a separable equation for v = v(u). Separating variables and integrating,

∫_{v0}^{v} (1 − ṽ)/ṽ dṽ = −µ ∫_{u0}^{u} (1 − ũ)/ũ dũ ,

yields

log v − v + C1 = µ(u − log u) + C2 ,

with integration constants C1 = v0 − log v0 and C2 = µ(log u0 − u0 ). Defining C = C1 − C2 and switching back to the free variable τ , it follows that

C = µu(τ ) + v(τ ) − log(u(τ )µ v(τ )) ∀τ .

The function

I : X → R , (u, v) ↦ µu + v − log(uµ v)

with X = R+ × R+ is strictly convex with compact level curves

I −1 (C) = {(u, v) ∈ X : I(u, v) = C} ⊂ X

for all

C ≥ min_{(u,v)∈X} I(u, v) = µ + 1 .

I is strictly convex, hence it has a unique minimum I(1, 1) = µ + 1. It then follows that the solutions of (4.3) with strictly positive initial conditions u0 , v0 > 0, (u0 , v0 ) ≠ (1, 1) lie on the contours I −1 (·), all of which have positive length. Since ‖Fµ ‖ is finite and bounded away from zero on each contour, it follows that every solution returns to its initial value in finite time 0 < T < ∞.
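The two properties of I used in the proof, its minimum value µ + 1 at the equilibrium and its growth away from (1, 1), are easy to check numerically; the grid and the choice µ = 1 below are arbitrary:

```python
import numpy as np

def first_integral(u, v, mu=1.0):
    """I(u, v) = mu*u + v - log(u^mu * v), constant along solutions of (4.3)."""
    return mu * u + v - np.log(u ** mu * v)

mu = 1.0
I_eq = first_integral(1.0, 1.0, mu)       # should equal mu + 1
u = np.linspace(0.2, 3.0, 200)
I_ray = first_integral(u, 1.0, mu)        # I along the ray v = 1
# I_ray >= I_eq everywhere, with equality only at u = 1, so the level
# sets are closed curves around the equilibrium
```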

The left panel of Figure 4.3 shows the numerically computed solution for
three different initial values that have been computed with the Matlab function
ode15s. Due to finite precision of the numerical solver, the numerical solution
is not exactly periodic and spirals inwards (see the right panel of the figure).
We can say more about the oscillations around the equilibrium (ueq , veq ) =
(1, 1): Their running mean is equal to the equilibrium value.

Lemma 4.4. It holds that

(1/T ) ∫_0^T u(t) dt = (1/T ) ∫_0^T v(t) dt = 1

for all solutions of (4.3) with period T = T (u0 , v0 ).

Figure 4.3: Left panel: Solutions of (4.3) for different initial conditions (u0 , v0 ) = (1.2, 1.2) (red), (u0 , v0 ) = (0.5, 0.5) (orange) and (u0 , v0 ) = (1, 3) (green). Right panel: Numerically computed green solution trajectory for 500 time periods.

Proof. Let us prove the rightmost part of the above equality and consider the equation for u only:

du/dt = u(1 − v) .

By Theorem 4.3 there exists T ∈ (0, ∞), such that u(T ) = u(0). Separating variables and integrating from 0 to T yields

∫_{u(0)}^{u(T )} du/u = ∫_0^T (1 − v(t)) dt ,

which, using that the upper and lower limits of the integral on the left side of the equality coincide, can be recast as

T = ∫_0^T v(t) dt ⇔ (1/T ) ∫_0^T v(t) dt = 1 .

The other part of the assertion can be proved in exactly the same way by solving
the equation for v.

The effect of fishing We now want to model the effect that fishing has on
fish predators and their prey. For the sake of simplicity we assume that the
reduction rate of predator and prey population due to fishing pressure is given
by a single parameter δ > 0. The unscaled equations (4.1)–(4.2) then become

dN δ /dt = N δ (a − bP δ − δ) , N δ (0) = N0
dP δ /dt = −P δ (d − cN δ + δ) , P δ (0) = P0 . (4.15)

In the unscaled form, the nontrivial equilibrium is

(Neq^δ , Peq^δ ) = ((d + δ)/c, (a − δ)/b) , (4.16)

Figure 4.4: The effect of fishing for δ/a = δ/d = 0.3: typical solutions without (red) and with fishing (blue).

which reduces to the known equilibrium for δ = 0, namely,

(ueq , veq ) = (1, 1) ⇔ (Neq^0 , Peq^0 ) = (d/c, a/b) . (4.17)

We observe that the fishing shifts the equilibrium towards smaller values of the predator population, but larger values of its prey. In other words, the fishing pressure on the predator is higher than on the prey (because its food basis is reduced as well). According to Lemma 4.4 the average populations equal the equilibrium values (cf. Exercise 4.8),

Neq^δ = (1/T ) ∫_0^T N δ (t) dt , Peq^δ = (1/T ) ∫_0^T P δ (t) dt . (4.18)
As a consequence the total average catch is given by

δ (Neq^δ + Peq^δ ) = δ (d/c + a/b) + δ (δ/c − δ/b) . (4.19)

Assuming that the effect of the fishing kicks in immediately whereas the equilibria represent long term properties of the ecosystem, the average catch after the recovery phase is smaller than it was before, if and only if

b > c , (4.20)

which is the case when the reduction of the predator population due to fishing leads to a relatively higher survival rate of its prey.
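The direction of the equilibrium shift in (4.16) can be checked with a quick sketch; the rate constants below are arbitrary illustration values:

```python
a, b, c, d = 1.0, 0.5, 0.4, 0.8   # arbitrary rates, chosen so that a > delta

def equilibrium(delta):
    """Nontrivial equilibrium (4.16) under fishing pressure delta."""
    return (d + delta) / c, (a - delta) / b

N_nofish, P_nofish = equilibrium(0.0)
N_fish, P_fish = equilibrium(0.3)
# fishing raises the prey equilibrium and lowers the predator equilibrium
```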

4.2 Stability of fixed points


In the last section we have analysed the local behaviour of the nonlinear Lotka-Volterra model in the neighbourhood of its critical points, based on a linearization about these points. As we have seen, the linear model shares some features of the nonlinear model, e.g., periodic oscillations around the critical point
(ueq , veq ) = (1, 1), or the exponential growth of the prey population close to the
origin. The idea behind this kind of analysis is that the solution of the original
nonlinear system and its linearization should behave similarly in a small neigh-
bourhood of the fixed point. Under certain assumptions this idea can indeed be
justified, and we will give precise statements below.10

Linearization about a critical point. We confine our attention to the case of two-dimensional systems. Consider the autonomous ODE

dx/dt = F (x) (4.21)

where F : R2 → R2 is any smooth (e.g. Lipschitz continuous and continuously differentiable) vector field with an isolated fixed point x∗ ∈ R2 satisfying F (x∗ ) = 0. By Taylor's theorem, we can expand F about the point x∗ :

F (x) = F (x∗ ) + ∇F (x∗ )(x − x∗ ) + o(‖x − x∗ ‖) , (4.22)

with the matrix

∇F (x∗ ) = [ ∂F1 /∂x1   ∂F1 /∂x2
            ∂F2 /∂x1   ∂F2 /∂x2 ]

denoting the 2×2 Jacobian of the function F , evaluated at x = x∗ . We moreover used the Landau notation o(‖x − x∗ ‖) to indicate that the remainder goes to zero faster than ‖x − x∗ ‖ as x → x∗ , i.e.,

r(x) = o(‖x − x∗ ‖) ⇔ lim_{x→x∗} r(x)/‖x − x∗ ‖ = 0 .

Setting y = x − x∗ , the linearization of (4.21) thus reads

dy/dt = Ay , A = ∇F (x∗ ) , (4.23)

which is hopefully a good approximation to (4.21) whenever ‖y‖ is small. We will specify what “good approximation” means later on.

Classification of equilibria. Consider now the initial value problem

dy/dt = Ay , y(0) = y0 , (4.24)

for a regular matrix A ∈ R2×2 . The solution of (4.24) is given by

y(t) = exp(At)y0 , (4.25)

where the exponential of a matrix (called matrix exponential ) is defined by

exp(B) = Σ_{k=0}^{∞} B k /k! . (4.26)
10 For the case of the critical point (u , v ) = (1, 1) this is not the case, unfortunately, even
eq eq
though the linearized system showed the same qualitative behaviour as the nonlinear model.

Note that this is in accordance with the usual exponential series

exp(z) = Σ_{k=0}^{∞} z k /k!

for a real number z ∈ R. Now suppose that A can be diagonalized by solving the corresponding eigenvalue problem

Av = λv

for some λ ∈ C. It is easy to see that if A has two distinct eigenvalues λ1 , λ2 , then the corresponding eigenvectors v1 , v2 are linearly independent. To see this, assume the contrary: then there exists an α ∈ C, α ≠ 0, such that

v1 − αv2 = 0 .

But then, since λ2 αv2 = λ2 v1 , this implies

0 = A(v1 − αv2 ) = λ1 v1 − λ2 αv2 = (λ1 − λ2 )v1 ≠ 0 ,

which proves that the assumption that v1 and v2 are linearly dependent must be wrong. Let V = (v1 , v2 ) be the 2 × 2 matrix that diagonalizes A, i.e. V −1 AV = Λ with Λ = diag(λ1 , λ2 ). Then, by definition of the matrix exponential, we have

exp(At) = V exp(Λt)V −1 , (4.27)

where exp(Λt) is a diagonal matrix with entries exp(λi t). Depending on whether the real part of λi is positive, negative or zero, the exponential eλi t will either grow, decay or oscillate. As a consequence we can analyse the solution of (4.24) in terms of the eigenvalues of the matrix A.
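Formula (4.27) can be checked against the series definition (4.26); the matrix below, which generates a rotation, and the truncation order are arbitrary test choices:

```python
import numpy as np

def expm_series(B, terms=30):
    """Truncation of the matrix exponential series (4.26)."""
    out = np.eye(B.shape[0])
    term = np.eye(B.shape[0])
    for k in range(1, terms):
        term = term @ B / k          # accumulates B^k / k!
        out = out + term
    return out

A = np.array([[0.0, -1.0], [1.0, 0.0]])   # eigenvalues ±i
t = 1.3
lam, V = np.linalg.eig(A)
expAt = (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)).real   # formula (4.27)
# for this A, exp(At) is the rotation matrix [[cos t, -sin t], [sin t, cos t]]
```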
Depending on whether the eigenvalues are real or complex, we can distinguish 5 cases, as illustrated in Figure 4.5; the case when the two eigenvalues λ1,2 = ±iω are purely imaginary, in which case the solution of (4.24) is a linear combination of sines and cosines, is not listed separately.11 Critical points with the property that the real parts of the eigenvalues λ1,2 of the Jacobian matrix are zero are called elliptic; otherwise the critical point is called hyperbolic. The following theorem, which is stated in an informal way, guarantees that linearization preserves the qualitative properties of hyperbolic equilibria.12

Theorem 4.5. Let dy/dt = Ay be the linearization of dx/dt = F (x) at a critical


point x∗ . If no eigenvalue of A = ∇F (x∗ ) has real part zero then there exists
a small neighbourhood of x∗ , in which the flow map of the linearized system is
topologically conjugate to the flow map of the nonlinear system.
Informally the theorem states that the solution of a linearized ODE close
to a hyperbolic equilibrium is basically a distorted, but otherwise qualitatively
equivalent version of the exact solution. In particular, the stability of hyperbolic
equilibria is preserved under linearization; for details see [Tes12, Sec. 9.3].
11 This is a consequence of the famous Euler formula eiωt = cos(ωt) + i sin(ωt). Note that

complex eigenvalues of a 2 × 2 matrix always come in conjugate pairs λ1,2 = α ± iω


12 The theorem is due to the Russian mathematician David Grobman and the U.S. mathe-

matician Philip Hartman who proved it independently.

Figure 4.5: Classification of equilibria of dy/dt = Ay in terms of the eigenvalues of A ∈ R2×2 . The eigenvalues are given by λ1,2 = τ /2 ± (τ 2 /4 − ∆)1/2 where ∆ = det A and τ = trace A (figure taken from: [Izh07]).

Stability analysis of the Lotka-Volterra model. As an example to illustrate the above, we analyse the behaviour of the dimensionless Lotka-Volterra system close to the origin, i.e., we revisit (4.10). The Jacobian at (0, 0),

A = [ 1   0
      0  −µ ] , (4.28)

has the two real eigenvalues λ1 = −µ < 0 and λ2 = 1. Hence the origin is a saddle or semistable equilibrium. As a second example we consider the linearization (4.12) around the fixed point (1, 1); this equilibrium is neutrally stable, because solutions are periodic, which means that any solution in a sufficiently small neighbourhood of (1, 1) stays within this neighbourhood without approaching the fixed point. As a consequence the linearization cannot be informative, for otherwise the critical point would be hyperbolic (e.g. asymptotically stable) rather than elliptic. The Jacobian matrix at (1, 1) reads

A = [ 0  −1
      µ   0 ] , (4.29)

and indeed the two eigenvalues of A are λ1,2 = ±i µ1/2 . Note that even though the linearized system is again neutrally stable, Theorem 4.5 does not apply.13
13 Note further that elliptic and hyperbolic equilibria do not exhaust all possibilities, and the fact that both eigenvalues are purely imaginary does not imply that the nonlinear system has an elliptic fixed point too. For instance, it can happen that the nonlinear system has a critical point that is unstable in one direction and neutrally stable in the other direction, whereas the linearized system has an elliptic fixed point.
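The two eigenvalue computations above can be verified directly; µ = 1 is an arbitrary choice:

```python
import numpy as np

mu = 1.0
J_origin = np.array([[1.0, 0.0], [0.0, -mu]])   # Jacobian (4.28) at (0, 0)
J_center = np.array([[0.0, -1.0], [mu, 0.0]])   # Jacobian (4.29) at (1, 1)

lam_origin = np.linalg.eigvals(J_origin)   # real with opposite signs: saddle
lam_center = np.linalg.eigvals(J_center)   # purely imaginary pair: elliptic
```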

Problems
Exercise 4.6. Show that (4.1)–(4.2) and (4.3) are equivalent under the substitutions

τ = at , u = (c/d) N , v = (b/a) P .
Exercise 4.7. Consider the linear ODE system

dx/dt = y , x(0) = x0
dy/dt = −x , y(0) = y0 .

a) Show that I(x, y) = x2 + y 2 is a constant of motion.

b) Consider the forward Euler scheme

xn+1 = xn + hyn
yn+1 = yn − hxn , n = 0, 1, 2, . . .

for sufficiently small step size h > 0 and show that dn (x0 , y0 ) = xn 2 + yn 2 is strictly increasing for all (x0 , y0 ) ∈ R2 \ {(0, 0)}, with

lim_{n→∞} dn = ∞ .

Exercise 4.8. Prove (4.18).


Exercise 4.9. Consider the family of linear systems

dx/dt = y , x(0) = x0
dy/dt = −x − µy , y(0) = y0

for a scalar parameter µ ∈ R.

a) Characterize the stability of the unique fixed point (xeq , yeq ) = (0, 0) as a function of the parameter µ. Plot the eigenvalue(s) of the Jacobian matrix over the range −3 ≤ µ ≤ 3 in steps of 0.1.
b) Solve the equation in Matlab over the time interval I = [0, 10] with initial
condition (x0 , y0 ) = (1, 0) and the parameter values
µ ∈ {−2, −1, 0, 1, 2} ,
using the function ode23t. Plot the solutions (x(t; x0 , y0 ), y(t; x0 , y0 )) in
the x-y-plane and explain your observation.
(Hint: Use the Matlab command help ode23t and modify the example
accordingly. It is recommended to use function handles “@” to define the
right hand side of the ODE; for details type doc function_handle.)
Exercise 4.10. Consider the following modification of the Lotka-Volterra model:

dN ε /dt = N ε (1 − P ε − εN ε ) , N ε (0) = N0
dP ε /dt = −P ε (1 − N ε ) , P ε (0) = P0 ,

where ε > 0. Compute all fixed points and analyse their stability.

Exercise 4.11. The following plots show linear 2-dimensional vector fields along with some typical solution trajectories (shown in red). Classify the stability of the associated fixed points according to the eigenvalues of the Jacobian:

[Six vector-field plots in the y1 -y2 plane, arranged in three rows of two, each over the range −3 ≤ y1 , y2 ≤ 3.]

5 Basic principles of control theory
Mathematical tools & concepts: ODE, optimization
Suggested reference: [Whi96]

Recall the considerations about the effect of fishing on a population of two


species on page 27. We will now modify the question a little bit and ask whether
there is an optimal harvesting strategy that maximizes the sustainable catch or
that maximizes the profit on a time-horizon of several years or tens of years.

5.1 Fishery management based on the logistic model


Our considerations will be based on the logistic population model for a single
species [IBM+ 05]. In other words, we do not take into account the interaction
between different species as in the previous section. Our model must be reason-
ably simple so that it is amenable to mathematical analysis, but still useful for
the given purpose. To this end we introduce the functions

x(·) ∈ R , b(·) ∈ R , h(·) ∈ R , (5.1)

where x(t) denotes the fish population at time t, b(t) the number of boats
operating at time t and h(t) the harvesting rate at time t. For simplicity,
we assume that all functions can take real values, even though the number
of boats will be an integer number. Our harvesting strategy will be based
on controlling the number of boats that are used for the fishing; we call b the
control variable, even though, strictly speaking, it is a (e.g. piecewise continuous)
function b : [0, ∞) → R.
There are clearly other players in the game of finding an optimal harvesting strategy that come in the form of parameters or boundary conditions, such as legal requirements, wages or overhead costs of maintaining a fishing fleet. Specifically, we consider the following parameters: cB > 0 the overhead cost per boat and unit of time, n the number of fishermen per boat, w their salary per unit of time, p the market price of one unit of fish. The boundary conditions and available parameters determine what a good harvesting strategy is. For example, maximizing the sustainable catch is different from maximizing the long-term profit, which may be different from maximizing the short-term profit.14

Constitutive relations, equations of motion and admissible controls


It is time to set up the model. The first step consists in relating the harvest rate h to the number of fish, x, and the number of boats, b. A static relation of this kind between variables, as opposed to, say, the dynamic relation between different species in the predator-prey model, is called a constitutive relation. Another example of a constitutive relation is Hooke's law, which is a kinematic relation between the force exerted by a spring and its extension, in contrast to Newton's law, which expresses a dynamical relation between force and acceleration. Here we assume that the harvesting rate is proportional to both the number of fish and the number of boats, i.e., we assume the following relation

h(t) = qx(t)b(t) , (5.2)


14 “The answer depends on the question”.

where q > 0 is a proportionality constant that depends on the efficacy of the fishing boats (e.g. the nets used etc.). The harvesting rate is the rate by which the growth rate of a fish population is reduced as an effect of fishing; we assume that the fish population evolves according to the logistic equation:

dx/dt = γx (1 − x/K) − h , x(0) = x0 > 0 (5.3)

where γ > 0 is the initial growth rate of the population when x is small and K > 0 is the capacity of the ecosystem without fishing (cf. p. 5). Maximizing any given objective, such as sustainable catch or profit, under the constraint that the fish population evolves according to the dynamics (5.3) is not possible without further specifying what the admissible controls b(·) are. Here we assume that the only admissible strategies are of the form

b : [0, ∞) → R , b(t) = { 0 for t ≤ t∗ ; b0 for t > t∗ } , (5.4)

with the two adjustable, but a priori unknown parameters t∗ ≥ 0 and b0 > 0. As a consequence our harvesting strategy can be controlled by choosing the right time t∗ at which fishing is started or resumed and the corresponding number b0 of boats. The resulting logistic model then is a switched ODE of the form

dx/dt = { γx (1 − x/K) for t ≤ t∗ ; γx (1 − qb0 /γ − x/K) for t > t∗ } , (5.5)

where we have used the constitutive relation (5.2) in the second equation.
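The switched model (5.5) is easily integrated with a forward Euler scheme. The following Python sketch uses the parameter values quoted in Figure 5.1; the step size is an arbitrary choice:

```python
import numpy as np

def switched_logistic(gamma=0.25, K=30.0, q=0.025, b0=2.0,
                      t_star=60.0, x0=2.0, h=0.01, T=100.0):
    """Forward-Euler integration of the switched logistic model (5.5)."""
    steps = int(T / h)
    t = np.linspace(0.0, T, steps + 1)
    x = np.empty(steps + 1)
    x[0] = x0
    for i in range(steps):
        b = b0 if t[i] > t_star else 0.0        # harvesting strategy (5.4)
        x[i + 1] = x[i] + h * gamma * x[i] * (1.0 - q * b / gamma - x[i] / K)
    return t, x

t, x = switched_logistic()
# before t* = 60 the population approaches K = 30; after the switch it
# relaxes to the reduced capacity (1 - q*b0/gamma) * K = 24
```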

Maximizing the sustainable catch Suppose we want to choose b0 so that


the average long-term catch is maximized. This requires that we do not overfish,
for otherwise the fish population goes extinct and hence the average long-term
catch is zero. For the average long term catch, it does not matter how t∗ is
chosen, so we can set it equal to zero and ignore it in what follows.
We identify the sustainable population under fishing with the asymptotically stable equilibrium of the system for b0 > 0; asymptotic stability is essential for long-term catchability, because it guarantees that the equilibrium is robust under small perturbations; in other words, the population returns to its equilibrium size after a small perturbation that may be due, e.g., to fluctuating environmental conditions. If one is fishing at an unstable equilibrium instead, the fluctuations may cause the population to drift away from its equilibrium and eventually go extinct.
Lemma 5.1. Let γ > qb0 . Then

x∗ = (1 − qb0 /γ) K

is the unique stable equilibrium.
Proof. The proof is left as an exercise.
Remark 5.2. The assumption γ > qb0 makes sure that the fish population,
growing with rate γ when sufficiently far away from the capacity limit, is not
eaten up by the fishing. For γ = qb0 the single stable equilibrium is x∗ = 0.

Figure 5.1: Solution of the logistic equation (5.5) with parameters γ = 0.25, K = 30, q = 0.025, b0 = 2, t∗ = 60 and initial value x0 = 2 (shown: the fish population over time together with the capacities K with and without fishing).

Bear in mind that the solution to the logistic equation for b0 = 0 satisfies

        lim_{t→∞} x(t; b0 = 0) = K .

That is, the fishing reduces the capacity of the ecosystem by the factor 1 − qb0 /γ.
A representative solution of (5.5) is shown in Figure 5.1. We now define the
average long-term catch as

        J0 (b0 ) = lim_{T →∞} (1/T ) ∫_0^T h(t) dt ,          (5.6)

where the expression for the associated sustainable catch rate follows from (5.2):

h(t) = qx∗ b0 . (5.7)

Hence, with Lemma 5.1,

        J0 (b0 ) = qb0 K (1 − qb0 /γ) .          (5.8)
The function J0 (·) is strictly concave, which implies that it has a unique maximum. The maximizer b∗0 = argmax J0 (b0 ) is given by

        b∗0 = γ/(2q) ,          (5.9)

which, rounded to the nearest integer, determines the optimal fleet size. (Here
in the example, b∗0 = 5.) The corresponding optimal sustainable population is

        x∗ = K/2 .          (5.10)

We observe that the maximum sustainable catch is independent of the efficacy
q, which seems counterintuitive, but is understandable if we realize that b∗0 is
inversely proportional to q, which makes the optimal harvesting rate independent
of q. Roughly speaking, a lower efficacy requires using more boats and vice
versa: with too many boats the fish population is depleted too much, which
results in a lower catch; the same happens when too few boats are at work,
which conserves the fish population, but is suboptimal in terms of the catch.
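A quick numerical sanity check of (5.8)–(5.9) with the parameters of Figure 5.1 (the grid resolution is our choice):

```python
# Grid check that J0(b0) = q*b0*K*(1 - q*b0/gamma) from (5.8) is maximized at
# b0* = gamma/(2q); parameters from Figure 5.1.
gamma, K, q = 0.25, 30.0, 0.025

def J0(b0):
    return q * b0 * K * (1 - q * b0 / gamma)

grid = [i * 0.01 for i in range(0, 1001)]   # b0 in [0, 10]
b_best = max(grid, key=J0)
print(b_best, gamma / (2 * q))              # both equal 5.0
```
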

5.2 Optimal control


The function J0 is symmetric about its maximum, so if the optimal number of
boats was, say, b∗0 = 4.6, the sustainable catch with b0 = 5 boats would be
slightly higher than with b0 = 4. If, however, we take into account that fishing
boats are costly, b0 = 4 will probably be the more reasonable choice.

Objective functional: maximizing profit We will now seek to maximize
profit rather than catch, which requires taking into account the costs of maintaining
a fishing fleet, the market price of fish, etc. To this end we define profit
as revenue minus the total costs; accordingly the profit rate is the revenue rate
minus the rate of the total costs. The revenue is the catch times the
market price of fish, whereas the total cost is the sum of the overhead costs and
the salaries of the fishermen. That is:

P (t) = ph(t) − (cB + nw)b(t) . (5.11)

The total profit until time t = T is then obtained by integrating the profit
rate from 0 to T . To simplify matters we assume that T = ∞ and we discount
the future profit with a discount rate δ > 0. Together with the constitutive
relation (5.2) the overall profit as a function of b turns out to be

        J(b) = ∫_0^∞ e−δt b(t) (pqx(t) − c) dt ,          (5.12)

with the shorthand c = cB + nw. The discount rate δ accounts for inflation,
interest rates or the fact that future rewards are less profitable than immediate
rewards; it also ensures that J is finite for our choice of admissible controls b(·).

Extremum principle We want to maximize (5.12) over all admissible harvesting
strategies, i.e. over the switching time t∗ and the number of boats b0 .
Since the population x(t) depends on this choice, our optimal harvesting problem
is of the form of a maximization problem with a constraint:

        max_{b(·)} J(b)          (5.13)

over the set of admissible control strategies defined by (5.4) and subject to

        dx/dt = γx (1 − x/K)                  for t ≤ t∗
        dx/dt = γx (1 − qb0 /γ − x/K)         for t > t∗ .          (5.14)

Generally, problems of the form (5.13)–(5.14) can be solved by the method of
Lagrange multipliers or by eliminating the constraint; see [Whi96] for further

Figure 5.2: Solution of the switched logistic equation. The solution at the switching
point t∗ is continuous but non-differentiable, because the control variable
has a jump discontinuity at t∗ , jumping from b(t∗ ) = 0 to b(t∗ +) = b0 .

details and alternative methods for solving optimal control problems. Note that
        J(b) = ∫_0^{t∗} e−δt b(t) (pqx(t) − c) dt + ∫_{t∗}^∞ e−δt b(t) (pqx(t) − c) dt
             = ∫_{t∗}^∞ e−δt b0 (pqx(t) − c) dt .

As a consequence we can solve (5.13)–(5.14) by first determining the optimal
switching time t∗ , which allows for solving (5.14) analytically, and then plugging
the solution x(t) into (5.13), which eliminates the constraint and allows us to
compute the optimal number of boats.
1. As a first step, we maximize over the switching time t∗ . Clearly the optimal
switching time will depend on the initial value x0 : If x0 is larger than
the maximum capacity under fishing, then it pays off to resume fishing from
the very beginning; if, however, the initial fish population is below the
capacity, then one should wait and resume fishing once the fish population
has reached the fishable capacity; waiting longer to further increase
the population does not pay off, in particular since future profits are discounted;
see Figure 5.2. Let us assume that x0 < x∗ and recall that

        x(t) = K / ( 1 + (K/x0 − 1) exp(−γt) ) ,   t ∈ [0, t∗ ]          (5.15)

is the solution to the logistic equation in the initial period without fishing.
The optimal switching time is then determined by the condition

        x(t∗ ) = x∗ .          (5.16)

Solving the equation for t∗ yields

        t∗ = γ−1 [ log(K/x0 − 1) + log(γ/(qb0 ) − 1) ] ,          (5.17)

which determines the optimal switching time t∗ = t∗ (b0 ) as a function of
the number of boats (via the capacity that is a function of b0 ).
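The formulas (5.15)–(5.17) can be checked against each other numerically; the parameters below are again those of Figure 5.1.

```python
import math

# Check that the switching time (5.17) satisfies x(t*) = x* from (5.16),
# using the logistic solution (5.15); parameters as in Figure 5.1.
gamma, K, q, b0, x0 = 0.25, 30.0, 0.025, 2, 2.0

x_star = (1 - q * b0 / gamma) * K                       # fishable capacity
t_star = (math.log(K / x0 - 1) + math.log(gamma / (q * b0) - 1)) / gamma

def x(t):                                               # solution (5.15)
    return K / (1 + (K / x0 - 1) * math.exp(-gamma * t))

print(t_star, x(t_star))    # x(t_star) equals x_star = 24
```
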
2. As a next step we eliminate the constraint from J, by noting that

        x(t) = x∗   for all t ≥ t∗ .          (5.18)

Hence

        J(b) = ∫_{t∗}^∞ e−δt b0 (pqx(t) − c) dt
             = b0 ∫_{t∗(b0)}^∞ e−δt ( pqK(1 − qb0 /γ) − c ) dt          (5.19)
             = (b0 /δ) ( pqK(1 − qb0 /γ) − c ) e−δt∗(b0) .
The profit function is nonnegative when pqK(1 − qb0 /γ) > c, with c denoting
the total cost per boat. Then for

        0 ≤ b0 ≤ (γ/q) (1 − c/(pqK))          (5.20)

the function J is bounded from below by zero and has a unique maximum
by Rolle’s theorem. (Recall that the optimal fleet size for the maximum
sustainable catch was b∗0 = γ/(2q).) An example with the parameters

        γ = 0.25, K = 30, q = 0.025, p = 10, δ = 0.2, x0 = 2 ,          (5.21)

is shown in Figure 5.3.
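The profit (5.19) can also be evaluated numerically. Note that the cost rate c = cB + nw is not listed among the parameters (5.21), so c = 1 below is a placeholder assumption; the absolute numbers therefore need not match Figure 5.3.

```python
import math

# Profit J(b0) from (5.19) with t*(b0) from (5.17). Parameters follow (5.21);
# the per-boat cost rate c is NOT given there, so c = 1 is an assumed value.
gamma, K, q, p, delta, x0, c = 0.25, 30.0, 0.025, 10.0, 0.2, 2.0, 1.0

def t_star(b0):
    return (math.log(K / x0 - 1) + math.log(gamma / (q * b0) - 1)) / gamma

def J(b0):
    return (b0 / delta) * (p * q * K * (1 - q * b0 / gamma) - c) \
           * math.exp(-delta * t_star(b0))

b_max = (gamma / q) * (1 - c / (p * q * K))     # upper bound (5.20)
grid = [i * 0.01 for i in range(1, int(100 * b_max))]
b_best = max(grid, key=J)
print(b_best, J(b_best))
```
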

Problems
Exercise 5.3. Prove Lemma 5.1.
Exercise 5.4. Consider the time-discrete logistic model with seasonal fishing

        x∗_{n+1} = xn + γ̃ xn (1 − xn /K)
        x_{n+1} = (1 − q̃ b0 ) x∗_{n+1}

that can be interpreted as the forward Euler discretization of (5.5) with γ̃ = γ∆t1
and q̃ = q∆t2 . Compute the maximum sustainable average catch as a function
of ∆t1 (recovery period), ∆t2 (fishing season) and b0 (number of boats).
Exercise 5.5. Compute the optimal fleet size for the parameters (5.21) to obtain
a) the maximum sustainable catch,
b) the maximum long-term profit,
as described in the text. Plot the profit functions in both cases, compare the
results and discuss the role of the discount parameter δ.


Figure 5.3: Profit as a function of the number of ships operating at time t ≥ t∗ .

6 Basic principles of bifurcation theory


Mathematical tools & concepts: ODE, bifurcations, hysteresis, attractors.
Suggested reference: [KaEn]

An important concept when studying ecological or climate systems is the
concept of resilience. Resilience can be defined as the capacity of a system to
recover from stress, i.e. to respond to a perturbation by resisting damage and
recovering quickly. Disturbances of sufficient magnitude or duration can profoundly
affect a system and may force it to reach a threshold, or a tipping point,
and enter a different regime. In the framework of dynamical systems theory,
the question is then whether abrupt changes in the dynamics will occur and,
if yes, at which parameter values those changes will occur. An abrupt change
in model dynamics with a (slight) change in parameters is called a bifurcation.
The change of a steady state from a stable node into a saddle is an example
of such a bifurcation. Bifurcation theory is the part of dynamical systems
theory that deals with classifying, ordering and studying the regularity in
these changes. A famous example is the Earth’s radiation budget or global energy
balance model (EBM). A global EBM summarises the state of the Earth’s
climate in a single variable, namely the temperature averaged over the entire
globe.
In 1964, Brian Harland at Cambridge University postulated that the Earth
had experienced an ice age with global scale glaciation. He pointed out that
sedimentary glacial deposits, similar in type to those found in Svalbard or Greenland,
are widely distributed on virtually every continent. Climate physicists
were just developing mathematical models of the Earth’s climate, providing a
new perspective on the limits to glaciation.

Figure 6.1: Detailed radiative energy balance (source: Trenberth et al., Bulletin
of the American Meteorological Society, 2009).

6.1 Solar radiation


The Earth’s climate is fundamentally controlled by the way that solar radiation
interacts with the Earth’s surface and atmosphere. We receive around 343
watts per square meter of radiation from the Sun. Some of this is reflected
back to space by clouds and by the Earth’s surface, but approximately two
thirds is absorbed by the Earth’s surface and atmosphere, increasing the average
temperature. Earth’s surface emits radiation at longer wavelengths (infrared),
balancing the energy of the radiation that has been absorbed. If more of the solar
radiation were reflected back to space, then less radiation would be absorbed at
the surface and the Earth’s temperature would decrease. The surface albedo is
a measure of how much radiation is reflected; snow has a high albedo (∼ 0.8),
seawater has a low albedo (∼ 0.1), and land surfaces have intermediate values
that vary widely depending mainly on the types and distribution of vegetation.
When snow falls on land or ice forms at sea, the increase in the albedo causes
greater cooling, stabilizing the snow and ice. This is called ice-albedo feedback,
and it is an important factor in the waxing (and waning) of ice sheets.

6.2 Energy balance models


A simple observation is that the global average temperature at the Earth’s sur-
face increases if the amount of energy reaching the Earth from the Sun exceeds
the amount of energy emitted by the Earth and released into the stratosphere,
and decreases if the converse is true. To consider the simplest scenario, we think
of the Earth as a solid sphere and ignore all spatial differences to characterize
the state of the system by a single variable, namely the global mean surface tem-
perature T . We are interested in the evolution of T over time t. We merge all
components that can exchange heat with outer space into one single element to
obtain the heat capacity C of the system. This is the energy needed to raise the

temperature T (t) by one kelvin (the heat capacity varies a lot between land, wa-
ter, etc, but we consider again the global average value C). The energy needed
to increase T by an amount ∆T after a time ∆t, i.e. T (t + ∆t) = T (t) + ∆T is
thus AC∆T , where A is the surface area of the planet.
Now let Ein be the average amount of solar energy reaching one square
meter of the Earth’s surface per unit time, and Eout be the average amount of
energy emitted by one square meter of the Earth’s surface and released into the
stratosphere per unit time. Then we have

AC∆T = A(Ein − Eout )∆t .

Letting ∆t tend to zero, we obtain the global energy balance model describing
the evolution of T ,
dT
C = Ein − Eout . (6.1)
dt
We assume that no forcing modifies the solar energy or radiative properties; Ein
and Eout thus do not depend explicitly on time (but they depend on T ). If the
incoming energy balances the outgoing energy, the Earth’s temperature remains
constant and the planet is said to be in thermal equilibrium. To specify Ein
and Eout , we consider the following:
• Viewed from the Sun, the Earth is a disk of area πR2 , where R is the
radius of the Earth.

• The energy flux density is S0 .


• The amount of energy flowing through the disk (i.e. reaching the Earth)
is Ein = (1 − α)πR2 S0 , where α is the average albedo of the Earth.
• All bodies radiate energy in the form of electromagnetic radiation. The
amount of energy radiated (black body radiation) depends on temperature
according to the Stefan-Boltzmann law,

FSB (T ) = σT 4 .

Here σ is Stefan’s constant, σ = 5.67 · 10−8 Wm−2 K−4 .

• The amount of energy radiated out by the Earth is distributed uniformly


across the globe (area 4πR2 ), such that

Eout (T ) = 4πR2 σT 4 .

With these expressions, the differential equation (6.1) becomes

        C dT /dt = (1 − α)Q − σT 4 ,          (6.2)

where we have used Q = S0 /4.


Figure 6.2: Albedo dependence on temperature.

Greenhouse effect We can now calculate the steady state temperature

        T ∗ = ( (1 − α)Q / σ )^{1/4} .

Considering a typical albedo value of α = 0.3, the solar energy flux den-
sity S0 = 1368Wm−2 (thus Q = 342Wm−2 ), the equilibrium temperature is
T ∗ = 254.8K. However, the actual value of the surface average temperature is
T ∗ = 287.7K. The difference is largely explained by the greenhouse effect of
the Earth’s atmosphere, that is, the effect of gases like CO2 , water vapor and
methane. The greenhouse gases only affect the infrared (long wavelengths) part
of the energy spectrum, and thus only the energy radiated by the Earth. We
include this effect through a factor 0 < ε < 1 which reduces the outgoing energy,
leading to

        C dT /dt = (1 − α)Q − εσT 4 .          (6.3)
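A small sketch evaluating the equilibrium temperature from (6.3), both for ε = 1 (no greenhouse) and for the effective emissivity that reproduces the observed mean temperature:

```python
# Equilibrium temperature T* = ((1 - alpha) * Q / (eps * sigma))**0.25 from
# (6.3). With eps = 1 we recover T* ≈ 254.8 K; solving for the eps that gives
# the observed 287.7 K yields an effective emissivity of roughly 0.62.
sigma, Q, alpha = 5.67e-8, 342.0, 0.3

def T_star(eps):
    return ((1 - alpha) * Q / (eps * sigma)) ** 0.25

eps_fit = (1 - alpha) * Q / (sigma * 287.7 ** 4)
print(T_star(1.0), eps_fit)
```
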

Albedo dependence on temperature The albedo depends on the amount
of ice and snow cover and therefore on the temperature, α = α(T ). The energy
balance equation should therefore be written as

        C dT /dt = (1 − α(T ))Q − εσT 4 .          (6.4)

It is reasonable to assume that the albedo has a low value at high temperature
(no ice), a high value at low temperature (Earth completely frozen), and some
continuous variation in between (Fig. 6.2):

        α(T ) = 0.5 − 0.2 · tanh( (T − 265)/10 ) .


Figure 6.3: Incoming and outgoing energy as a function of globally averaged
temperature.

Qualitative solution Steady states or fixed points are obtained when the right
hand side of (6.4) vanishes. There are three equilibria. Two of them are stable
and one is unstable. The leftmost is a snowball solution, the rightmost an
ice-free solution.

Bifurcation and hysteresis As the strength of the solar input S0 or the


emissivity  is changed, the two curves of Figure 6.3 move with respect to each
other. As a result, the number of crossing points, and thus of fixed points,
changes. If the greenhouse effect increases, the emissivity becomes lower and
the outgoing energy curve moves down. The same happens if the solar input be-
comes larger. The two lower equilibria representing cold and moderate climates
disappear, leaving only the warmer climate solution. In the opposite scenario
where the outgoing energy curve moves up, only the cold (snowball) climate is
possible. The number and character of the solutions as a function of such a bifurcation
parameter may be represented schematically in a bifurcation diagram
as in Fig. 6.4. Notice that the bifurcation diagram is shown using the dimensionless
bifurcation parameter Q/Q0 (Why?). It is common practice to represent the
stable solutions by solid curves and unstable solutions by dashed curves. The
bifurcation diagram shows that, with decreasing solar input, the mean temperature
decreases until it reaches a tipping point. The climate system then transitions
to the lower branch, the planet freezes over, and the temperature equilibrates
at its frozen stable equilibrium. A reverse scenario is also possible; however,
the transition from frozen to unfrozen equilibrium occurs at a different tipping
point. Since the paths for increasing and decreasing values of the bifurcation
parameter are distinct, we see that hysteresis occurs in our climate model.


Figure 6.4: Mean surface temperature at equilibrium as a function of the solar
constant, normalized by its present value.

6.3 Bifurcation theory


In general, the notion of a bifurcation refers to a qualitative change in the behaviour
of a dynamical system as some parameter on which the system depends
varies continuously,

        ẋ = f (λ, x)          (6.5)

where f is a smooth function which depends on x and on one or more real
parameters λ1 , . . . , λm . The qualitative dynamical behaviour of a dynamical
system is determined by its equilibria and their stability, so all bifurcations are
associated with bifurcations of equilibria. A possible definition is

Definition 6.1. A point (x∗ , λ0 ) is a bifurcation point of equilibria for (6.5) if
the number of solutions x of the equation f (λ, x) = 0 in every neighbourhood
of (x∗ , λ0 ) is not a constant independent of λ.
In the following we will present some typical examples of bifurcations. In
those examples, the vector field is a simple, low order (quadratic or cubic)
polynomial function with one or two real parameters. These examples are quite
generic despite their simplicity, in the sense that often, when the nonlinearity of
the vector field is more complicated, it can be approximately described by one
of the following examples. The higher the order of the polynomial, the more
complicated the bifurcation gets, but also the less likely.

Transcritical bifurcation Consider the ODE


ẋ = λx − x2 . (6.6)
This has two equilibria at x = 0 and x = λ, which coincide if λ = 0. For
f (λ, x) = λx − x2 , we have

        ∂f /∂x (λ; x) = λ − 2x ,   ∂f /∂x (λ; 0) = λ ,   ∂f /∂x (λ; λ) = −λ .

Figure 6.5: Bifurcation diagram for example bifurcations

Thus, the equilibrium x = 0 is stable for λ < 0 and unstable for λ > 0, while the
equilibrium x = λ is unstable for λ < 0 and stable for λ > 0. This transcritical
bifurcation arises in systems where there is some "trivial" solution branch (here
corresponding to x = 0), which exists for all values of the parameter λ. There is
a second branch x = λ that crosses the first one at the bifurcation point (x, λ) =
(0, 0). When the branches cross, one solution goes from stable to unstable while
the other goes from unstable to stable (Fig. 6.5).
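A minimal sketch that classifies the two branches of (6.6) by the sign of ∂f/∂x, confirming the exchange of stability at λ = 0:

```python
# Stability of the two branches x = 0 and x = lambda of the transcritical
# normal form (6.6), read off from the sign of df/dx = lambda - 2x.
def stability(lam):
    dfdx = lambda x: lam - 2 * x
    return {0.0: "stable" if dfdx(0.0) < 0 else "unstable",
            lam: "stable" if dfdx(lam) < 0 else "unstable"}

print(stability(-0.5))   # x = 0 stable, x = lambda unstable
print(stability(+0.5))   # roles exchanged after the bifurcation
```
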

Saddle-node bifurcation Consider the ODE

ẋ = λ − x2 . (6.7)

If λ < 0, the equation has no equilibrium solution. If λ > 0, it has two equilibrium
solutions x = ±√λ, one stable and one unstable. These two solutions
coincide if λ = 0, and the single trivial solution is unstable. This bifurcation is
called a saddle-node bifurcation. A pair of hyperbolic equilibria, one stable and
one unstable, emerge out of nowhere (Fig. 6.5).

Hysteresis Consider the ODE

ẋ = λ + x − x3 . (6.8)

The equation f (λ, x) = 0 has one solution for λ < −λ∗ , three solutions for
−λ∗ < λ < λ∗ , and one solution for λ > λ∗ , where λ∗ = 2/(3√3). For λ = ±λ∗ ,
there are two solutions.
The dynamics of 6.8 have a particularity: suppose we have the capability to
continuously vary the value of the parameter λ, for example by changing some
of the physics (e.g. emitting CO2 in the atmosphere, releasing large amounts of
freshwater into the ocean by melting the polar ice cap, etc. ). If we start with a
large negative value of λ, the system will eventually reach an equilibrium on the
lower branch of the bifurcation diagram. As we keep increasing λ, the system
will ”slide” along the stable branch of the bifurcation diagram until it exceeds
the value λ∗ , where it will transition quickly to the upper stable branch of the
bifurcation diagram (why quickly?). As we further increase λ, the system will
now slide along the upper branch. The reverse change from the upper to the
lower branch will however necessitate a large decrease of λ, since it will occur
at the value −λ∗ . That is, the parameter value at which the transition occurs

45
depends on the direction in which the parameter is varied. This phenomenon is
called a hysteresis.
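The hysteresis loop can be reproduced by slowly sweeping λ up and then back down while the state relaxes under (6.8); the sweep speed, step size and initial condition below are our choices:

```python
# Hysteresis in (6.8): sweep lambda slowly while the state relaxes by forward
# Euler. The state observed at lambda = 0 differs between the two sweep
# directions, because the jumps occur at +/- lambda*.
def sweep(lams, x, dt=0.05, relax=40):
    out = {}
    for lam in lams:
        for _ in range(relax):               # let x relax at this lambda
            x += dt * (lam + x - x ** 3)
        out[round(lam, 3)] = x
    return out

lams_up = [i * 0.01 for i in range(-100, 101)]
up = sweep(lams_up, x=-1.3)                  # start on the lower branch
down = sweep(list(reversed(lams_up)), x=up[1.0])
print(up[0.0], down[0.0])                    # lower branch up, upper branch down
```
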

Pitchfork bifurcation Consider the ODE

ẋ = λx − x3 . (6.9)

For λ ≤ 0, the system has one stable equilibrium point at x = 0. At λ = 0, a
bifurcation occurs. For λ > 0, the system has three equilibrium states at x = 0
(unstable) and x = ±√λ (stable). Thus the stable equilibrium 0 loses stability
at the bifurcation point, and two new stable equilibria appear. The pitchfork-shaped
bifurcation diagram gives this bifurcation its name (Fig. 6.5). The pitchfork bifurcation
in which a stable solution bifurcates into two new stable solutions is called a
supercritical pitchfork bifurcation. Up to a change in the signs of x and λ, the
other possibility is the subcritical pitchfork bifurcation, described by

        ẋ = λx + x3 .          (6.10)

In this case, we have three equilibria x = 0 (stable) and x = ±√(−λ) (unstable)
for λ < 0, and one unstable equilibrium x = 0 for λ > 0. A supercritical
pitchfork bifurcation leads to a "soft" loss of stability, in which the system can
go to nearby stable equilibria x = ±√λ when the equilibrium at x = 0 loses
stability as λ passes through 0. On the other hand, a subcritical pitchfork
bifurcation leads to a "hard" loss of stability, in which there are no nearby
equilibria and the system goes to some far-off dynamics (perhaps to infinity)
when the equilibrium at x = 0 loses stability.

Finding bifurcation points A solution branch is a set of equilibrium points
x∗ which can be written as a smooth function of the bifurcation parameter λ,
i.e. f (λ, x∗ ) = 0. The equilibrium points will depend smoothly on λ as long as

        ∂f /∂x (λ, x∗ ) ≠ 0 .

This is a consequence of the implicit function theorem. Therefore, solution
branches are expected to meet at points where f (λ, x∗ ) = 0 and ∂f /∂x (λ, x∗ ) = 0.
These are candidates for bifurcation points.
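For the cubic (6.8), the two conditions f = 0 and ∂f/∂x = 0 can be solved by hand; a few lines verify the resulting fold value λ∗ = 2/(3√3):

```python
# Candidate bifurcation points of (6.8): f(lam, x) = lam + x - x**3 = 0 and
# df/dx = 1 - 3*x**2 = 0. The second condition gives x = +/- 1/sqrt(3);
# inserting into the first gives lam = -/+ 2/(3*sqrt(3)).
x_c = 1 / 3 ** 0.5
lam_star = -(x_c - x_c ** 3)          # = -2/(3*sqrt(3)), about -0.385
print(lam_star)
```
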

7 Modelling of chemical reactions
Mathematical tools & concepts: conditional probabilities, ODE
Suggested reference: [Hig08]

The main theme of this section will be the stochastic modelling of chemical
reactions. Nonetheless the reader may replace chemical reaction by evolutionary
game or the like. To begin with, we mention two prototypical examples:

Enzyme kinetics Enzyme-catalysed reactions with single-substrate mechanisms
due to Michaelis and Menten are systematically written as

        S + E ⇌ ES → E + P

with rate constants k1 (forward) and k−1 (backward) for the first, reversible
reaction and k2 for the second, where it is assumed that the back reaction
E + P → ES is negligible. If the concentration of the substrate S is high,
the enzyme E is entirely saturated and only exists in its complex form ES.
This entails that, after a short relaxation time depending upon the initial
conditions, the concentrations of both the enzyme and the complex quickly
converge to a steady-state.

Depletion of ozone in the stratosphere An important environmental process
is the catalytic destruction of ozone by atomic halogens, the main source of
which is photodissociation of halocarbon refrigerants and foam-blowing agents,
such as CFCs or freons. One such example is the cyclic reaction

        Cl + O3 → ClO + O2        (rate constant k1 )

and

        ClO + O3 → Cl + 2O2 .     (rate constant k2 )

The second reaction recreates the original chlorine atom, which can repeat the
first reaction and continue to destroy ozone (i.e., chlorine acts as a catalyst).

7.1 The chemical master equation


Generally, we consider a situation with N different molecular species A, B, C, . . .
that can interact via M different reactions, such as A + B → C. We call

        X(t) = (X1 (t), . . . , XN (t)) ∈ N_0^N          (7.1)

the state vector, with Xi (t) ∈ {0, 1, 2, 3, . . .} being the number of molecules at
time t ≥ 0. If any of the M reactions fires at time t, say, the j-th reaction, the
state vector is updated according to the rule

        X(t) 7→ X(t) + νj .          (7.2)

The vector νj ∈ Z^N is called the stoichiometric vector of the j-th reaction.

Figure 7.1: Chemical reaction catalysed by an enzyme.

Probability of a reaction We assume that the system of chemical species is
spatially homogeneous (“well-stirred”), so the chemical species abundances do
not vary in space. When two molecules collide they can react with a certain
probability, depending on the boundary conditions like temperature or pressure;
by the above homogeneity assumption the probability that the reagents collide
is linear in the number of each molecular species involved in a particular reaction
(cf. Exercise 7.9). Specifically, we define the probability that the j-th reaction
fires over the infinitesimal time interval [t, t + dt) given that X(t) = x as

        P (j-th reaction fires over [t, t + dt) | X(t) = x) = aj (x)dt .          (7.3)

The function aj : N_0^N → R+ is called the propensity of the reaction. The exact
functional form of aj depends on the type of the reaction.
Example 7.1. As an example that will guide us through this section consider
three species A, B, C with the single binary reaction

        A + B → C        (rate constant c)

Since the reaction turns one A and one B into one C, the stoichiometric vector
is ν1 = (−1, −1, 1). Now suppose that we start from an initial mixture
consisting of four molecules of type A, three molecules of type B and zero C
molecules, i.e. X(0) = (4, 3, 0). Then, since the total number of particles is
finite, the set of possible states at time t > 0 is

        S = {(4, 3, 0), (3, 2, 1), (2, 1, 2), (1, 0, 3)} .

Note that the state X(t) = (1, 0, 3) is a fixed point, also called absorbing state,
since all the B molecules are eaten up and no further reaction can happen.
The propensity of the reaction results from the consideration that, in a well-stirred
system, the probability of a reaction happening per unit of time must be
proportional to the number of both A and B molecules, which implies that

        a1 (x1 , x2 , x3 ) = cx1 x2 .
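The state space S of Example 7.1 can be generated mechanically by applying the stoichiometric vector until the propensity vanishes:

```python
# Reachable states of Example 7.1: starting from X(0) = (4, 3, 0), repeatedly
# apply nu1 = (-1, -1, 1) while the propensity a1(x) = c*x1*x2 is positive.
nu = (-1, -1, 1)
state = (4, 3, 0)
S = [state]
while state[0] * state[1] > 0:              # a1(x) > 0 iff x1*x2 > 0
    state = tuple(s + n for s, n in zip(state, nu))
    S.append(state)
print(S)   # [(4, 3, 0), (3, 2, 1), (2, 1, 2), (1, 0, 3)]
```
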

It is unrealistic to assume that the number of molecules in a test tube can
be counted. Hence we seek a more coarse-grained description of the number
of molecules at time t. The idea is to derive a differential equation for the
probability to have x = (x1 , . . . , xN ) molecules at time t,

        ρ(x, t) = P (X(t) = x) ,          (7.4)

given that we know the probability distribution of states at time t = 0. (Note
that this includes the situation that X(0) is known exactly.)

Interlude: Basic probability theory.


Definition 7.2 (probability space). A probability space (Ω, E, P ) consists of
• A non-empty set Ω. This is the sample space or the space of possible
outcomes.
• A σ-algebra E ⊂ 2Ω . E is a set of subsets of Ω and models the ’event space’
(i.e. the things we can assign probabilities to).
• A probability measure P : E → [0, 1] which satisfies P (Ω) = 1 and the
so-called σ-additivity property:

        P ( ∪_{i=1}^∞ Ai ) = Σ_{i=1}^∞ P (Ai )

if the Ai ∈ E are pairwise disjoint.


Example 7.3. The probability space (Ω, E, P ) that models a fair six-sided die
consists of
• Ω = {1, 2, 3, 4, 5, 6}
• E = 2^Ω = {∅, {1}, {2}, . . . , {1, 2}, {1, 3}, . . . , Ω}
• P ({1}) = . . . = P ({6}) = 1/6 .
Probabilities of other events can be computed using the σ-additivity property, for
example P ({1, 2}) = P ({1}) + P ({2}) = 1/3 .
We need two more definitions:
Definition 7.4 (conditional probability). Let (Ω, E, P ) be a probability space
and A, B ∈ E with P (B) > 0. The conditional probability of A given B is
defined as

        P (A|B) = P (A ∩ B) / P (B) .
Definition 7.5 (partitions). The collection {B0 , . . . , BM } is called a partition of
Ω if Bi ⊂ Ω and Bi ∩ Bj = ∅ for all i ≠ j, i, j = 0, . . . , M , and B0 ∪ B1 ∪ . . . ∪ BM = Ω.
Now we are ready to state and prove a Lemma that we will need later to
derive the chemical master equation:
Lemma 7.6 (Law of total probability). Let (Ω, E, P ) be a probability space
and let {B0 , B1 , . . . , BM } ⊂ E be a partition of Ω such that P (Bj ) > 0 for all
j = 0, . . . , M . Then, for any A ∈ E, it holds that

        P (A) = Σ_{j=0}^M P (A|Bj )P (Bj ) .

Proof. Since the {Bj }j=0,...,M are a partition of Ω, we can write A ⊂ Ω as

        A = A ∩ ( ∪_{j=0}^M Bj ) = ∪_{j=0}^M (A ∩ Bj ) ,

where we have used the distributivity of intersection over union in the second
equality. Since any probability measure P is countably additive (σ-additive)
and all the A ∩ Bj are disjoint, we have

        P (A) = P ( ∪_{j=0}^M (A ∩ Bj ) )
              = Σ_{j=0}^M P (A ∩ Bj )
              = Σ_{j=0}^M P (A|Bj ) P (Bj ) .

The last equality is a direct consequence of the definition P (A|B) = P (A ∩ B)/P (B)
of conditional probabilities.

Recurrence equation for the state probability We will now derive an
equation for the distribution ρ(x, t + dt) of the molecular state vector X(t + dt) at time
t + dt, assuming that we know the distribution ρ(x, t) of X(t) at time t. To
compute ρ(x, t + dt) from ρ(x, t), it is helpful to realize that, in a chemical
system with M possible reactions there are exactly M + 1 different scenarios
that can lead to the situation X(t + dt) = x:
• X(t) = x and no reaction fired over [t, t + dt),
• X(t) = x − νj for any j = 1, . . . , M and the j-th reaction fired.
Here we assume that dt is sufficiently small (in fact: infinitesimally small), so
that at most one reaction can occur between t and t + dt.
Now let A denote the event {X(t + dt) = x}, so that P (A) = ρ(x, t + dt) is
exactly the probability that we want to compute.15 Then, with Bj = {X(t) =
x − νj } the conditional probability P (A|Bj ) is exactly the probability that the
j-th reaction fires over [t, t + dt), given that X(t) is x − νj at time t. Moreover,
P (A|B0 ) is one minus the probability that any of the M reactions fires over
[t, t + dt) given that X(t) = x. By definition of the propensities, we thus have
• P (A|B0 ) = 1 − Σ_{j=1}^M aj (x)dt,
• P (A|Bj ) = aj (x − νj )dt .
Further noting that P (Bj ) = ρ(x − νj , t) for j = 1, . . . , M , the law of total
probability, Lemma 7.6, implies that

        ρ(x, t + dt) = ( 1 − Σ_{j=1}^M aj (x)dt ) ρ(x, t) + Σ_{j=1}^M aj (x − νj )dt ρ(x − νj , t) .          (7.5)

15 More precisely, we define A = {ω ∈ Ω : X(t + dt, ω) = x} ⊂ Ω where X(t, ·) : Ω → N_0^N is
a stochastic process with sample paths (realizations) X(·, ω) = {X(t, ω) : t ≥ 0}.

Rearranging the terms and dividing by dt yields

        ( ρ(x, t + dt) − ρ(x, t) ) / dt = Σ_{j=1}^M aj (x − νj )ρ(x − νj , t) − Σ_{j=1}^M aj (x)ρ(x, t) .          (7.6)

Chemical master equation We take advantage of the fact that the right
hand side of (7.6) is independent of dt and that the expression on the left
is a finite-difference approximation of the partial derivative with respect to t:
Letting dt → 0, we obtain the chemical master equation (CME)16

        ∂ρ(x, t)/∂t = Σ_{j=1}^M aj (x − νj )ρ(x − νj , t) − Σ_{j=1}^M aj (x)ρ(x, t) .          (7.7)

The CME is a discrete linear partial differential equation on a countable spatial
domain S ⊂ N_0^N , whose size depends on whether the number of molecules is
finite or not (cf. Example 7.1). If we introduce the (possibly infinite) vector
u(t) = ρ(·, t) with entries (ux )x∈N_0^N , the CME is equivalent to a linear
system of ordinary differential equations of the form

        du/dt = Au(t) ,   u(0) = ρ(·, 0) ,          (7.8)

where the entries of the square matrix A are the propensities (cf. Exercise 7.10). If
the initial value X(0) = x0 is known then ρ(x, 0) = δ(x − x0 ) and the CME
yields the distribution of X(t) for any t > 0.
Example 7.7 (Example 7.1, cont’d). Recalling that a(x) = cx1 x2 and ν =
(−1, −1, 1), the CME is readily found to be

        ∂ρ(x1 , x2 , x3 , t)/∂t = c(x1 + 1)(x2 + 1)ρ(x1 + 1, x2 + 1, x3 − 1, t)
                                  − cx1 x2 ρ(x1 , x2 , x3 , t) .

To solve it, it must be endowed with the initial condition

ρ(x1 , x2 , x3 , 0) = δ(x1 − 4)δ(x2 − 3)δ(x3 ) ,

with (x1 , x2 , x3 ) being any state from the state space

S = {(4, 3, 0), (3, 2, 1), (2, 1, 2), (1, 0, 3)} .

To see that the CME is indeed equivalent to a linear ODE system, let us define
the vector u = (u1 , . . . , u4 ) with 0 ≤ ui ≤ 1 given by

u1 (t) = ρ(4, 3, 0, t) , u2 = ρ(3, 2, 1, t) , u3 = . . . .


16 The reader may be surprised that the right hand side of (7.6) does not depend on dt at

all. The reason is that the propensities were only defined in terms of the infinitesimal time
increment dt, i.e., the right hand side of (7.5) is already a linearization in dt, which is why
after dividing by the time increment it becomes constant (i.e. independent of dt).

In terms of the new state vector u the CME can be recast as

$$\dot{u}_1 = -12c\,u_1 \,, \quad \dot{u}_2 = 12c\,u_1 - 6c\,u_2 \,, \quad \dot{u}_3 = 6c\,u_2 - 2c\,u_3 \,, \quad \dot{u}_4 = 2c\,u_3 \,,$$

where the dot denotes the derivative with respect to t. In other words, we have
rewritten the CME as the linear system of equations

$$u'(t) = Au(t) \,, \quad A = \begin{pmatrix} -12c & 0 & 0 & 0 \\ 12c & -6c & 0 & 0 \\ 0 & 6c & -2c & 0 \\ 0 & 0 & 2c & 0 \end{pmatrix}$$

with real eigenvalues λ ∈ {0, −2c, −6c, −12c}. The simple eigenvalue λ_1 = 0
corresponds to the asymptotically stable equilibrium point of the CME, which is
the stationary probability of the absorbing state x* = (1, 0, 3), i.e.,

$$\lim_{t \to \infty} u(t) = (0, 0, 0, 1) \iff \lim_{t \to \infty} \rho(x^*, t) = 1 \,.$$
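To make the passage from the CME to the linear ODE system concrete, here is a minimal numerical sketch of (7.8) for this example. The rate constant c = 1 and the Euler step size are assumed illustrative values, not taken from the text:

```python
c = 1.0  # assumed rate constant
# Generator-transpose matrix A of the CME from Example 7.7
A = [[-12*c,  0.0,  0.0, 0.0],
     [ 12*c, -6*c,  0.0, 0.0],
     [  0.0,  6*c, -2*c, 0.0],
     [  0.0,  0.0,  2*c, 0.0]]

u = [1.0, 0.0, 0.0, 0.0]   # u(0): X(0) = (4, 3, 0) with certainty
h, T = 1e-4, 10.0
for _ in range(int(T / h)):
    Au = [sum(A[i][k] * u[k] for k in range(4)) for i in range(4)]
    u = [u[i] + h * Au[i] for i in range(4)]   # forward Euler for u' = Au

# The column sums of A vanish, so sum(u) = 1 is preserved; for large T the
# probability mass concentrates on the absorbing state (1, 0, 3).
print(u)
```

For large T the output approaches (0, 0, 0, 1), in agreement with the limit stated above.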

7.2 Stochastic simulation algorithm


We have shown that the CME can be solved in pretty much the same way as one
would solve a linear ODE. Note, however, that the ODE system thus obtained
can be fairly large, possibly even infinitely large.
We will now present an alternative method to simulate the CME in terms
of the continuous-time Markov jump process (X(t))t≥0 that is furnished by
the CME. In fact, the matrix A that is obtained by rewriting the CME as an
equivalent ODE system is the transpose of the generator matrix of this Markov
jump process. This property entails that the sum over all row entries of A^T is
zero, which implies that A has at least one eigenvalue λ = 0 corresponding to
an equilibrium of the chemical reaction system; for details see [And11].
The idea of the stochastic simulation algorithm (SSA), also known as the Gillespie
algorithm, is to simulate the random reaction events by

• drawing a random time until the next reaction occurs,


• drawing randomly one of the M reactions to occur.
Once a reaction has fired, the number of molecules is updated according to the
corresponding stoichiometric vector.

Time until next reaction To make this intuitive idea precise, let X(t) = x
and consider the time τ until the next reaction fires. Call

p0 (τ ; x, t) = P (no reaction fires in [t, t + τ )|X(t) = x) (7.9)

the probability that no reaction happens in the finite interval [t, t + τ ) with
τ > 0. Further let us suppose that whatever happens in [t, t + τ ) is independent
of what happens in [t + τ, t + τ + s) for all s > 0, in other words, the system

is memoryless. Then, by independence, the probability that no reaction fires in
[t, t + τ + dτ ) can be written as

$$p_0(\tau + d\tau; x, t) = p_0(\tau; x, t)\left(1 - \sum_{j=1}^{M} a_j(x)\,d\tau\right) \,, \quad (7.10)$$

where, by definition of the propensities, the term in the parenthesis is the prob-
ability of no reaction between t + τ and t + τ + dτ , given that X(t + τ ) = x.
Rearranging terms, dividing by dτ and letting dτ → 0, it follows that

$$\frac{dp_0}{d\tau} = -a_{\mathrm{tot}}\, p_0 \,, \quad (7.11)$$

with

$$a_{\mathrm{tot}}(x) = \sum_{j=1}^{M} a_j(x) \,. \quad (7.12)$$

Solving (7.11) with the initial condition p0 (τ = 0; x, t) = 1 it follows that

$$p_0(\tau; x, t) = \exp(-a_{\mathrm{tot}}(x)\tau) \,, \quad (7.13)$$

which is to say that τ is an exponential waiting time with parameter atot .17
This implies that the average waiting time between two reactions is

E(τ |X(t) = x) = 1/atot (x) . (7.14)

Next reaction index To determine the next reaction, we define p1 (τ, j; x, t)dτ
to be the probability that no reaction happens in the interval [t, t+τ ) and the j-
th reaction fires in [t+τ, t+τ +dτ ), given that X(t) = x. Then, by independence
and definition of the propensities, we have

p1 (τ, j; x, t) = p0 (τ ; x, t)aj (x) . (7.15)

Using (7.13), the latter can be recast as the product of two probability densities:

$$p_1(\tau, j; x, t) = \frac{a_j(x)}{a_{\mathrm{tot}}(x)} \left(a_{\mathrm{tot}}(x) \exp(-a_{\mathrm{tot}}(x)\tau)\right) \,. \quad (7.16)$$

We notice that p1 is the joint probability density of the time until the next reaction
τ and the reaction index j. The (conditional) probability of the reaction index j
is proportional to aj with atot as normalization constant. Since p1 is a product
density, both random variables τ and j are independent and can be drawn
independently. The following algorithm goes back to [Gil77].
Algorithm 1 Stochastic Simulation Algorithm
    Given X(0) = x, define a_tot(x) = Σ_j a_j(x).
    while t < T do
        Generate exponential waiting time τ ∼ Exp(a_tot(X(t))).
        Pick reaction index j randomly with probability a_j(X(t))/a_tot(X(t)).
        Set t ↦ t + τ and update the state vector X(t) ↦ X(t) + ν_j.
    end while

The following lemma is helpful for generating exponentially distributed random
variables from a uniformly distributed random variable.

Lemma 7.8. Let Z be a random variable with cumulative distribution function
F(z) = P(Z ≤ z). If U ∈ [0, 1] is a uniformly distributed random variable, then

$$\tilde{Z} = F^{-1}(U)$$

is distributed according to F, where

$$F^{-1}(u) = \inf\{z \in \mathbb{R} : F(z) \geq u\}$$

is the generalized inverse of F. In particular,

$$\tau = -\frac{\log(1 - U)}{a_{\mathrm{tot}}}$$

is exponentially distributed with parameter a_tot > 0.

^17 Waiting times are memoryless iff they are exponentially distributed.
Proof. Let the random variable Z̃ : [0, 1] → R be defined by U 7→ F −1 (U ). Then

P (Z̃ ≤ z) = P ({U ∈ [0, 1] : Z̃(U ) ≤ z})


= P ({U ∈ [0, 1] : F −1 (U ) ≤ z})
= P ({U ∈ [0, 1] : U ≤ F (z)})
= F (z) ,

where we have used the monotonicity of F in the third equality and the fact
that u ∈ [0, 1] is uniformly distributed in the last equality. This shows that the
distribution function of Z̃ is F . The rest of the proof is left as an exercise.
If the cumulative distribution function is a continuous monotonic function,
then the generalized inverse agrees with the standard inverse function.
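Combining Algorithm 1 with Lemma 7.8 gives a very short simulation. The sketch below applies it to the reaction A + B → C of Example 7.1; since there is only one reaction, the index-selection step is trivial. The rate constant c, the horizon T and the random seed are assumed values:

```python
import math
import random

def ssa(x0, c, T, rng):
    """Gillespie SSA for the single reaction A + B -> C of Example 7.1,
    with propensity a(x) = c*x1*x2 and stoichiometric vector (-1, -1, 1)."""
    t, x = 0.0, list(x0)
    while t < T:
        a_tot = c * x[0] * x[1]
        if a_tot == 0.0:             # absorbing state reached
            break
        # exponential waiting time via the generalized inverse (Lemma 7.8)
        t += -math.log(1.0 - rng.random()) / a_tot
        if t >= T:
            break
        x[0] -= 1; x[1] -= 1; x[2] += 1   # fire the reaction: x -> x + nu
    return x

rng = random.Random(0)
final = ssa((4, 3, 0), c=1.0, T=50.0, rng=rng)
print(final)   # for large T the chain is absorbed in (1, 0, 3)
```

Note how the two molecule-number conservation laws (x1 + x3 and x2 + x3 are constant) hold along every realization.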

Problems
Exercise 7.9. Let x = (x1 , x2 , . . .) be the state vector of a system with species
A, B, . . .. Construct propensity functions a(x) for the following reactions.
a) zeroth-order reactions: $\emptyset \xrightarrow{c_0} A$

b) first-order reactions: $A \xrightarrow{c_1} B$

c) dimerization: $A + A \xrightarrow{c_2} B$

Justify your choice in each case.


Exercise 7.10. Consider a reversible reaction between three species of the form

$$A + B \underset{c_-}{\overset{c_+}{\rightleftharpoons}} C \,,$$
with rate constants c± > 0. Let X(t) = (XA (t), XB (t), XC (t)) be the state
vector of the system at time t ≥ 0.

a) Derive the CME for the probability density function (pdf )

ρ(x, t) = P (X(t) = x) .

b) Let X(0) = (3, 2, 1). Recast the CME as an equivalent system of linear
ODEs and compute its equilibrium state as a function of c±. Explain the
meaning of the equilibrium (Hint: use that ρ(x, t) is a pdf).
Exercise 7.11. Consider the Michaelis-Menten system

$$S + E \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} SE \xrightarrow{k_2} E + P$$

with initial values

XS (0) = 5 · 10−7 NA vol , XE (0) = 2 · 10−7 NA vol , XSE (0) = XP (0) = 0 .

and kinetic parameters

k1 = 106 /(NA vol) , k−1 = 10−4 , k2 = 10−1 , vol = 10−15 liters .

Here NA = 6.023 · 1023 is Avogadro’s number.

a) Derive the propensities and stoichiometric vectors.


b) Simulate the Michaelis-Menten system using Gillespie’s SSA from [Hig08].
Generate additional realizations using 10 times smaller and 10 times larger
initial values. Plot typical realizations and explain your observation.

8 Modelling of traffic flow
Mathematical tools & concepts: delay differential equations, Euler method,
scalar conservation laws, partial differential equations.
Suggested reference: [MeG07, BCD02]

We will discuss two different ways to model traffic flows: a microscopic


approach that is based on the dynamics of single cars and a mean field approach
that employs an analysis on the level of fluxes and densities of vehicles.

8.1 From individual vehicles to vehicle densities


Suppose there are N vehicles in one traffic lane, all of equal length l and mass m.
The vehicles are labelled j = 1, . . . , N where j = 1 corresponds to the leading
vehicle. Let x_j(t) denote the distance of vehicle j from the beginning of the road
at time t ≥ 0. We assume that there is only one traffic lane, so vehicles cannot
overtake each other.

A delay differential equation for the vehicle positions On a busy road


the vehicles have to brake depending on both the distance between vehicles and
their relative velocities.
Suppose that the average values of |xj+1 (t)−xj (t)| are relatively small for all
j = 1, . . . , N − 1, i.e., we consider a busy road, and vehicles avoid collisions by
braking when they come too close. It is reasonable to assume that the braking
force of, say, vehicle j +1 will be higher the smaller the distance |xj+1 (t)−xj (t)|
to the j-th vehicle and the faster it approaches the j-th vehicle, i.e., the larger the
relative velocity d/dt(xj+1 (t) − xj (t)). Let us further assume that the response
of the driver of vehicle j + 1 is delayed by τ > 0, where for simplicity we assume
that the reaction time τ is constant for all drivers. Letting Fj+1 denote the
braking force, the simplest model to account for this situation is

$$F_{j+1}(t + \tau) = k\, \frac{\dot{x}_{j+1}(t) - \dot{x}_j(t)}{|x_{j+1}(t) - x_j(t)|} \,, \quad (8.1)$$

where, as usual, the dot denotes the derivative with respect to t and k > 0 is
constant. Using Newton’s law, F = ma, equation (8.1) implies that

$$m\, \frac{d^2}{dt^2} x_{j+1}(t + \tau) = k\, \frac{\dot{x}_{j+1}(t) - \dot{x}_j(t)}{|x_{j+1}(t) - x_j(t)|} = k\, \frac{d}{dt} \log|x_{j+1}(t) - x_j(t)| \,, \quad (8.2)$$

which can be integrated to yield

$$\frac{d}{dt} x_{j+1}(t + \tau) = \frac{k}{m} \log|x_{j+1}(t) - x_j(t)| + a_{j+1} \,, \quad (8.3)$$
for j = 1, . . . , N − 1. Equation (8.3) is a system of N − 1 delay differential
equations (DDE) where the positions x1 (t) (and the velocities) of the first vehicle
are given. There is no way to solve (8.3) analytically, but we can find a numerical
solution as we will discuss later on in this section (cf. Exercise 8.8). However
there is some hope that the DDE will admit steady state solutions or equilibria,

Figure 8.1: Density of vehicles and micro-macro passage.

for, in the real world, nothing can grow or decay forever. One special case of the
DDE is its Markovian limit τ → 0, in which case we obtain a nonlinear system
of N − 1 ordinary time-inhomogeneous differential equations

$$\dot{x}_1(t) = \phi(t) \,, \quad \dot{x}_{j+1}(t) = \frac{k}{m} \log|x_{j+1}(t) - x_j(t)| + a_{j+1} \,, \quad j = 1, \ldots, N - 1 \,. \quad (8.4)$$

A micro-macro passage: densities and fluxes It is known that the velocities
of cars decrease when their density increases. To arrive at a description of
(8.3) in terms of densities and fluxes, consider a street section of length 2s ≫ l
and define the density of vehicles at x at time t to be

$$\rho(x, t) = \frac{\#\text{ vehicles in } (x - s, x + s) \text{ at time } t}{2s} \,, \quad (8.5)$$
where we assume that the street section is symmetric around the position x ∈ R.
We regard ρ as a macroscopic variable that replaces the detailed microscopic
description in terms of the positions of single vehicles by a coarse-grained de-
scription in terms of (average) numbers of cars per street section; clearly ρ
depends on the length of the street section over which we average, but it can
be shown that ρ = ρ_{N,s} converges to a limit as N → ∞ and l ≪ 2s → 0
with N l → const [BCD02]. Here we assume the bird’s-eye perspective (e.g.
seen from a traffic surveillance helicopter) and busy traffic conditions and, as a
consequence, we may safely ignore the dependence of ρ on N, s.
Example 8.1. One situation in which ρ is independent of N < ∞ and s is
when all vehicles are at equal distance d at any time (which implies that they
are all moving at the same speed), in which case

ρ(x, t) = (d + l)−1 (8.6)

(cf. Exercise 8.9). As the vehicle length is equal to l, the denominator is bounded
from below by l; therefore the maximum achievable density is the density of
bumper-to-bumper traffic at constant speed, with

$$\rho(x, t) = l^{-1} \,. \quad (8.7)$$

Figure 8.2: Fundamental diagram of pedestrian flows (from: [BMR11]).

We want to analyse the maximum capacity of the traffic lane under equilib-
rium conditions. To this end we assume that the observed speed v of vehicles
at (x, t) depends only on the density ρ. In an abuse of notation, we write

v(x, t) = v(ρ(x, t)) . (8.8)

It is known from empirical data of traffic flows that there exists a critical density
ρcrit , below which the vehicles move at the maximum possible speed vmax , and
that there is a maximum density ρmax , at which the flow stops. From Example
8.1 it readily follows that ρmax ≤ 1/l. From the critical to the maximum density,
v decays towards zero where it is also known from experimental data that v is
a decreasing function of the density, i.e.

v 0 (ρ) ≤ 0 . (8.9)

Figure 8.2 shows experimental and simulation data of pedestrian flows under
various environmental conditions that show the universal signature of almost
all traffic flows; the graphical relation v(ρ) is called the fundamental diagram.

Steady state and equilibrium flow We suppose that all vehicles (cars,
pedestrians, . . . ) are separated by a distance d > 0 and move at the same
constant speed v. The equilibrium density corresponding to this situation is (cf.
Example 8.1)

$$\rho(x, t) = (d + l)^{-1} \,, \quad (x, t) \in \mathbb{R} \times [0, \infty) \,. \quad (8.10)$$

In equilibrium, all vehicles move at the same speed v_j = dx_j/dt, hence together
with the DDE (8.3) it follows that

$$v = \lambda \log(d + l) + a \,, \quad (8.11)$$
where we have introduced the shorthands λ = k/m and a = a_{j+1} for j =
1, . . . , N − 1. Combining the last two equations, we find the fundamental equi-
librium relation between the speed v and density ρ:

$$v = -\lambda \log \rho + a \,, \quad (8.12)$$

with the yet unknown parameters a and λ that must be determined from data;
by definition of ρ_max, it holds that v(ρ_max) = 0, which is equivalent to

$$a = \lambda \log \rho_{\max} \,. \quad (8.13)$$

Hence

$$v = -\lambda \log\left(\frac{\rho}{\rho_{\max}}\right) \,. \quad (8.14)$$

An expression for λ is easily obtained by requiring that v be continuous as a
function of ρ. Setting v_max = v(ρ_crit), the last equation entails

$$\lambda = v_{\max} \left(\log \frac{\rho_{\max}}{\rho_{\mathrm{crit}}}\right)^{-1} \,, \quad (8.15)$$

which, together with the empirical finding that v(ρ) equals v_max below the
critical vehicle density, yields the surprisingly general relation

$$v(\rho) = \begin{cases} v_{\max} \,, & \rho \leq \rho_{\mathrm{crit}} \\[2pt] v_{\max} \left\{\log \frac{\rho_{\max}}{\rho_{\mathrm{crit}}}\right\}^{-1} \log\left(\frac{\rho_{\max}}{\rho}\right) \,, & \rho > \rho_{\mathrm{crit}} \,. \end{cases} \quad (8.16)$$

Maximum traffic flux at equilibrium We now define the instantaneous
traffic flux J as the number of vehicles passing through a street sector [x, x + ∆x)
in the time interval [t, t + ∆t), in other words,

$$J = \left(\frac{\#\text{ vehicles in } [x, x + \Delta x) \text{ at time } t}{\Delta x}\right) \left(\frac{\Delta x}{\Delta t}\right) \,.$$

In mathematical terms, letting ∆x and ∆t tend to zero, we have the following

Definition 8.2 (Density flux). We define the flux J to be the functional

$$J(\rho) = \rho\, v(\rho) \,.$$

With (8.16) it readily follows that

$$J(\rho) = \begin{cases} \rho\, v_{\max} \,, & \rho \leq \rho_{\mathrm{crit}} \\[2pt] \rho\, v_{\max} \left\{\log \frac{\rho_{\max}}{\rho_{\mathrm{crit}}}\right\}^{-1} \log\left(\frac{\rho_{\max}}{\rho}\right) \,, & \rho > \rho_{\mathrm{crit}} \,, \end{cases} \quad (8.17)$$

which, provided that ρ_max ≥ e · ρ_crit, attains its unique maximum at

$$\rho^* = \frac{\rho_{\max}}{e} \,, \quad (8.18)$$

with e = 2.71828 . . . being the base of the natural logarithm (see Exercise 8.10).
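A quick numerical sanity check of (8.17)–(8.18) is possible; the values of v_max, ρ_crit and ρ_max below are assumed illustrative choices satisfying ρ_max > e·ρ_crit. The flux is continuous at ρ_crit, and a grid search locates its maximizer near ρ_max/e:

```python
import math

# assumed illustrative values with rho_max > e * rho_crit
v_max, rho_crit, rho_max = 1.0, 0.2, 1.0

def J(rho):
    """Equilibrium flux (8.17)."""
    if rho <= rho_crit:
        return rho * v_max
    return rho * v_max * math.log(rho_max / rho) / math.log(rho_max / rho_crit)

# J is continuous at the critical density ...
left, right = J(rho_crit), J(rho_crit + 1e-12)
# ... and its maximum over a fine grid sits at rho_max / e
grid = [i / 100000 for i in range(1, 100000)]
rho_star = max(grid, key=J)
print(rho_star, rho_max / math.e)
```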

8.2 Traffic jams and propagation of perturbations
We want to study what happens when the first vehicle brakes, i.e., we want to
study the effect of a perturbation of the lead vehicle on the pursuing vehicles,
when the traffic flows close to the maximum flux point.
To this end, let us go back to the microscopic picture again and consider a
platoon of vehicles under maximum flux conditions as described in the previous
section. We suppose that all vehicles move at constant speed
$$v(\rho^*) = v_{\max} \left(\log \frac{\rho_{\max}}{\rho_{\mathrm{crit}}}\right)^{-1} \,, \quad (8.19)$$

where we have used (8.16) with ρ* = ρ_max/e and have tacitly assumed that
ρ* > ρ_crit. Let us further assume that we can extend the time t ≥ 0 to the
whole real axis, t ∈ R, and that the lead vehicle crosses the origin x = 0 at time
t = 0, i.e. x_1(0) = 0. With the sign convention

$$x_{j-1} - x_j \geq l > 0 \quad (8.20)$$

and the shorthand v* = v(ρ*), equation (8.3) becomes

$$\begin{aligned} \frac{d}{dt} x_{j+1}(t + \tau) &= \lambda \log|x_{j+1}(t) - x_j(t)| + a \\ &= v^* \log(x_j(t) - x_{j+1}(t)) + v^* \log \rho_{\max} \\ &= v^* \log\left(\rho_{\max}\,(x_j(t) - x_{j+1}(t))\right) \,, \end{aligned} \quad (8.21)$$

where we have used that v* = λ, which follows from (8.15) and (8.19), that the
absolute value can be dropped by (8.20), and which together with (8.13) entails
the relation a = v* log ρ_max.

Braking of the lead vehicle and perturbation of the pursuing vehicles


For t > 0, we consider the DDE system

$$\frac{d}{dt} x_1(t) = \phi(t) \,, \quad \frac{d}{dt} x_j(t + \tau) = v^* \log\left(\rho_{\max}(x_{j-1}(t) - x_j(t))\right) \,, \quad j = 2, \ldots, N \,, \quad (8.22)$$

where we assume that the system is in equilibrium for t ≤ 0:^18

$$x_j(t) = v^* t - (j - 1)(d + l) \,, \quad j = 1, \ldots, N \,. \quad (8.23)$$

Let us assume that the first vehicle with position x_1(t) brakes at time t = 0 and
releases the brake after a short time t_b > 0. Specifically,

$$\phi(t) = \begin{cases} v^* \,, & t \leq 0 \\ v^*(1 - b(t)) \,, & t > 0 \,, \end{cases} \quad (8.24)$$

where we set b(t) = kt exp((t_b − t)/t_b). Solving the ODE for x_1(t), using (8.22)–
(8.24) we find

$$x_1(t) = \int_0^t \phi(s)\, ds = v^* t - v^* \int_0^t b(s)\, ds \,, \quad t > 0 \,, \quad (8.25)$$

with

$$\int_0^t b(s)\, ds = e\, k\, t_b \left(t_b - (t + t_b) e^{-t/t_b}\right) \,. \quad (8.26)$$

^18 The reader should think of v* as a model parameter, rather than the instantaneous velocity
of individual vehicles that is given by v_j = dx_j/dt.
We call y_j(t) the hypothetical position of the j-th vehicle if the lead vehicle
had not braked, i.e. without the perturbation. We further call

$$z_j(t) = x_j(t) - y_j(t) \,, \quad j = 1, \ldots, N \,, \quad (8.27)$$

the perturbation displacement caused by the perturbed motion of the lead
vehicle. The perturbation displacement of the first vehicle then is

$$z_1(t) = -v^* \int_0^t b(s)\, ds \,, \quad t > 0 \,. \quad (8.28)$$

By (8.23), it follows for the pursuing vehicles with indices j = 2, . . . , N that

$$z_j(t) = x_j(t) - v^* t + (j - 1)(d + l) \,, \quad t > 0 \,. \quad (8.29)$$

Note that z_j(t) = 0 for t ≤ 0 and all j = 1, . . . , N . Further note that the
non-collision constraint

$$x_{j-1}(t) - x_j(t) \geq l \quad \forall t \in \mathbb{R} \quad (8.30)$$

entails

$$z_j(t) - z_{j-1}(t) \leq d \quad \forall t \in \mathbb{R} \,. \quad (8.31)$$

The latter follows from (8.29), together with

$$l \leq x_{j-1}(t) - x_j(t) = z_{j-1}(t) - z_j(t) + d + l \quad \forall t \in \mathbb{R} \,. \quad (8.32)$$

Reaction time and the onset of a traffic jam Equations (8.27)–(8.31) allow
us to recast (8.22)–(8.23) as a DDE for the perturbation displacement. Bearing
in mind that d + l = e/ρ_max holds under maximum flow conditions, the
perturbation displacement (8.29) of the pursuing vehicles reads

$$z_j(t) = x_j(t) - v^* t + \frac{(j - 1)e}{\rho_{\max}} \,, \quad t > 0 \,. \quad (8.33)$$

Plugging the last equation into (8.22) and using (8.32), we obtain a closed
DDE system for the perturbation displacement for t > 0:

$$\frac{d}{dt} z_j(t + \tau) = v^* \log\left(\rho_{\max}\left(\frac{e}{\rho_{\max}} + z_{j-1}(t) - z_j(t)\right)\right) - v^* \,, \quad (8.34)$$

with j = 2, . . . , N , the lead vehicle displacement

$$z_1(t) = -v^* \int_0^t b(s)\, ds \quad (8.35)$$

and the initial conditions

$$z_j(0) = 0 \,, \quad j = 2, \ldots, N \,. \quad (8.36)$$

Note that (8.34) is equivalent to

$$\frac{d}{dt} z_j(t) = v^* \log\left\{1 + \frac{\rho_{\max}}{e}\left(z_{j-1}(t - \tau) - z_j(t - \tau)\right)\right\} \,, \quad (8.37)$$

which follows from shifting the independent variable t according to t ↦ t − τ
and moving the rightmost term −v* in (8.34) under the logarithm.
Figure 8.3 shows a simulation of (8.34)–(8.36) for different reaction times τ;
cf. Exercise 8.8 for the parameter values used and for further details.

Figure 8.3: The left panel shows how a perturbation propagates in case of short
reaction time (no accident), the right panel shows the case of a too long reaction
time; the 11th car cannot brake anymore and crashes into the 12th car.

8.2.1 Numerical solution


First vehicle. The equation for the displacement of the first vehicle

$$\frac{d}{dt} z_1(t) = -v^* b(t) \quad (8.38)$$

is an ODE. The easiest way to solve it numerically is to use the forward Euler
scheme, which approximates an ODE of the form ẋ = f(x, t) with the iteration

$$\tilde{x}(t + h) = \tilde{x}(t) + h f(\tilde{x}(t), t) \,, \quad \tilde{x}(0) = x_0 \,.$$

The scheme is derived by approximating d/dt x(t) with the finite difference
h^{-1}(x(t + h) − x(t)) for some finite step size h > 0, which is a parameter of the
algorithm and must be chosen by the user. Here the tilde denotes the numerical
approximation. The forward Euler scheme applied to (8.38) gives

z̃1 (t + h) = z̃1 (t) − hv ∗ b(t), (8.39)

an equation we can iterate N times (starting from t = 0) in order to obtain


the numerical solution vector (z̃1 (0), z̃1 (h), . . . , z̃1 (hN )). Here z̃1 (nh) is the
numerical approximation to the true solution z1 (t) for t = nh.

2nd vehicle. The equations (8.37) for vehicles 2 to N are DDEs, and care
must be taken with their numerical integration.
If we apply the forward Euler scheme to (8.37) for j = 2, we obtain

$$\tilde{z}_2(t + h) = \tilde{z}_2(t) + h v^* \log\left\{1 + \frac{\rho_{\max}}{e}\left(\tilde{z}_1(t - \tau) - \tilde{z}_2(t - \tau)\right)\right\} \,. \quad (8.40)$$
The initial condition z̃2 (0) = 0 is not enough to solve (8.40) due to the presence
of the time delay τ > 0: In order to compute the first iterate z̃2 (h), we need
z̃1 (−τ ) and z̃2 (−τ ). In general, the initial condition z̃1 (t) = z̃2 (t) = 0 for
t ∈ [−τ, 0) (which we luckily have) is needed to iterate (8.40) forward in time.

In addition, z̃1 (t) from equation (8.39) is needed for t = 0, h, . . . , hN in order to
compute z̃2 (t) for t = 0, h, . . . , hN using (8.40). Thus we can solve the equation
for the 2nd vehicle after we solved the equation for the 1st.
Iterating this argument, we can solve the equation for the (j + 1)st vehicle
after we solved the equation for the jth vehicle using forward Euler. This even-
tually leads to the numerical solution (z̃j (0), z̃j (h), . . . , z̃j (hN )) being available
for all j = 1, . . . , N .
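The scheme described above can be sketched in a few lines. All parameter values below (v*, ρ_max, τ, t_b, the braking amplitude and the step size) are illustrative assumptions, not the values behind Figure 8.3 (those are deferred to Exercise 8.8):

```python
import math

v_star   = 10.0          # assumed equilibrium speed v* [m/s]
rho_max  = 0.2           # assumed maximum density [1/m] (one car per 5 m)
tau      = 0.5           # assumed reaction time [s]
t_b, k_b = 2.0, 0.2      # assumed braking duration and amplitude in b(t)
h, T, N  = 0.01, 40.0, 5 # Euler step, time horizon, number of vehicles

steps = int(T / h)
delay = int(round(tau / h))          # delay expressed in Euler steps

def b(t):
    """Braking profile b(t) = k t exp((t_b - t)/t_b) for t > 0."""
    return k_b * t * math.exp((t_b - t) / t_b) if t > 0 else 0.0

# z[j][n] approximates z_{j+1}(n h); the zero history on [-tau, 0) is
# encoded by treating pre-initial indices as zero.
z = [[0.0] * (steps + 1) for _ in range(N)]
for n in range(steps):
    z[0][n + 1] = z[0][n] - h * v_star * b(n * h)          # (8.39), lead car
    for j in range(1, N):
        zl = z[j - 1][n - delay] if n >= delay else 0.0    # z_{j-1}(t - tau)
        zj = z[j][n - delay] if n >= delay else 0.0        # z_j(t - tau)
        arg = 1.0 + (rho_max / math.e) * (zl - zj)
        z[j][n + 1] = z[j][n] + h * v_star * math.log(arg)  # (8.40)

print([round(z[j][-1], 2) for j in range(N)])
```

With this short reaction time the perturbation propagates down the platoon without a crash: every car ends up displaced by roughly the same amount as the lead car.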

8.3 Flow modeling: macroscopic modeling of traffic flows


In the microscopic model, each driver reacts only to the car in front. This is
realistic in a tunnel, or in very dense traffic. However on open road, the driver
will probably look further ahead and react according to changes in density. We
want to model the overall flow in this setting, using a so-called continuum model
of traffic flow.
Again we consider only one traffic lane, without entrances or exits. If we
select some stretch of the road between two points denoted x1 and x2 , the
total number of cars to be found between x1 and x2 will depend on the time
t. Specifically, if more cars flow into the segment [x1 , x2 ] than flow out of
it, the number of cars in the segment will increase. This can be expressed
mathematically through the conservation of total number of cars in [x1 , x2 ]:

Rate of change of traffic = Traffic inflow − Traffic outflow

Scalar conservation laws In the following, we will make use of variables


such as the density ρ = ρ(x, t), the flux J = J(x, t) and the average speed
v = v(x, t). Recall that the three are related by the simple identity J = vρ.
Considering our stretch of road in space and time, we have

J(x, t) := v(x, t)ρ(x, t)

where v is the observed speed at location x and time t. We assume that J and
ρ are nonnegative functions. We also make the simplistic assumption that the
speed is a function of density alone, i.e. v = v(ρ). In the following, we will keep
track of the total number of cars in [x1 , x2 ] during a time [t1 , t2 ].
The number of cars entering [x_1, x_2] through the point x_1 during the time
interval [t_1, t_2] is given by $\int_{t_1}^{t_2} J(x_1, t)\,dt$, and the number of cars leaving [x_1, x_2]
through the point x_2 is $\int_{t_1}^{t_2} J(x_2, t)\,dt$. The number of cars to be found in the
space interval [x_1, x_2] at time t_1 is given by $\int_{x_1}^{x_2} \rho(x, t_1)\,dx$ and the number at
time t_2 is given by $\int_{x_1}^{x_2} \rho(x, t_2)\,dx$.
Since the total number of cars in [x_1, x_2] during a time interval [t_1, t_2] is conserved
(no cars disappear or appear), we have

$$\int_{x_1}^{x_2} \rho(x, t_2)\, dx - \int_{x_1}^{x_2} \rho(x, t_1)\, dx = \int_{t_1}^{t_2} J(x_1, t)\, dt - \int_{t_1}^{t_2} J(x_2, t)\, dt \,. \quad (8.41)$$

Supposing that ρ and J are continuously differentiable with respect to x and t,
the left-hand side of (8.41) can be expressed as

$$\int_{x_1}^{x_2} \left[\rho(x, t_2) - \rho(x, t_1)\right] dx = \int_{x_1}^{x_2} \int_{t_1}^{t_2} \frac{\partial}{\partial t} \rho(x, t)\, dt\, dx \,. \quad (8.42)$$

The right-hand side can be rewritten similarly, leading to

$$\int_{x_1}^{x_2} \int_{t_1}^{t_2} \frac{\partial}{\partial t} \rho(x, t)\, dt\, dx = -\int_{t_1}^{t_2} \int_{x_1}^{x_2} \frac{\partial}{\partial x} J(x, t)\, dx\, dt \,.$$

The latter is equivalent to

$$\int_{x_1}^{x_2} \int_{t_1}^{t_2} \left(\frac{\partial}{\partial t} \rho(x, t) + \frac{\partial}{\partial x} J(x, t)\right) dt\, dx = 0 \,, \quad (8.43)$$

which is true for any choice of rectangle [x_1, x_2] × [t_1, t_2]. By the Fundamental
Theorem of the Calculus of Variations (see the following lemma for a simple
version) this implies that

$$\frac{\partial}{\partial t} \rho(x, t) + \frac{\partial}{\partial x} J(x, t) = 0 \,, \quad (8.44)$$

which is a first-order conservation law.
Lemma 8.3. If f(x, y) is a continuous function defined on R² such that

$$\iint_R f(x, y)\, dx\, dy = 0$$

for each rectangle R ⊆ R², then f(x, y) ≡ 0 for all (x, y).


Proof. Suppose that there exists a pair of coordinates (x_0, y_0) such that f(x_0, y_0) ≠ 0.
Without loss of generality assume that f(x_0, y_0) > 0. Since f is continuous,
there is a δ > 0 such that f (x, y) > f (x0 , y0 )/2 whenever |x − x0 | < δ and
|y − y0 | < δ. Therefore if we let

Rδ := {(x, y) : |x − x0 | < δ and |y − y0 | < δ} ,

then

$$\iint_{R_\delta} f(x, y)\, dx\, dy \geq \frac{1}{2} \iint_{R_\delta} f(x_0, y_0)\, dx\, dy = 2\delta^2 f(x_0, y_0) \,.$$

By assumption the left-hand side is zero, consequently we obtain

$$0 \geq 2\delta^2 f(x_0, y_0) \,,$$

which is a contradiction. Thus f(x, y) ≡ 0.

Simplifying the scalar conservation law The conservation law (8.44) is a
first-order partial differential equation with two unknowns. To be able to solve
it, we need one more equation, a state equation, relating the unknowns ρ and
J. For this, recall that the microscopic analysis provided us with an expression
for the traffic flux J(ρ) in equilibrium conditions (8.17):

$$J(\rho) = \begin{cases} \rho\, v_{\max} \,, & \rho \leq \rho_{\mathrm{crit}} \\[2pt] \rho\, v_{\max} \left\{\log \frac{\rho_{\max}}{\rho_{\mathrm{crit}}}\right\}^{-1} \log\left(\frac{\rho_{\max}}{\rho}\right) \,, & \rho > \rho_{\mathrm{crit}} \,. \end{cases}$$

The state equation has the right behavior, i.e. the traffic flux J increases linearly
for small density ρ, levels off until a maximum is reached and then decreases

until J becomes zero at bumper-to-bumper traffic. However, the derivative has
a jump discontinuity at ρ = ρcrit , which is not so realistic. Moreover, it was
derived under equilibrium conditions which will not necessarily be satisfied.
We will now create a similar state equation for J = J(ρ) that is differentiable
for all admissible ρ. A simple choice is J(ρ) = aρ(b − ρ) with a parameter a > 0
and b = ρmax . When doing this, we assume that J = J(ρ) even outside of
equilibrium conditions, meaning that the flux adjusts smoothly to a change in
density.
With this choice, the conservation law (8.44) takes the form

$$\frac{\partial}{\partial t} \rho(x, t) + J'(\rho) \frac{\partial}{\partial x} \rho(x, t) = 0 \,. \quad (8.45)$$

This is a first-order partial differential equation (PDE) for ρ. Together with
initial and boundary conditions, this model is solvable using the method of
characteristics.
Remark 8.4. From the definition of the density flux, we have that J(ρ) = v(ρ)ρ.
Therefore the derivative J'(ρ) has the dimension of a velocity. We will see later
that the PDE expresses that "traffic waves" propagate with a velocity given by
J'(ρ).

Linear traffic waves Before going to the method of characteristics, we will
consider the simpler case of linear traffic waves. Our PDE (8.44) will be examined
in more detail using the relation J(ρ) = v_max ρ (1 − ρ/ρ_max). This is a simple
quadratic function of ρ.
We will use the form of the equation given by (8.45) to investigate the
propagation of linear traffic waves.
Let us suppose that ρ = ρ_0 + δρ in (8.45), where δρ ≪ ρ_0. In other words,
we consider a case where the traffic density is slightly perturbed from a constant
density ρ_0. To put this into (8.45), we can use the Taylor expansion

$$J'(\rho_0 + \delta\rho) = J'(\rho_0) + J''(\rho_0)\,\delta\rho + \cdots \,,$$

but we see that the terms in δρ can be dropped since both partial derivatives in
(8.45) are already of order δρ. Thus we obtain the linearized form of the PDE

$$\frac{\partial}{\partial t} \rho(x, t) + J'(\rho_0) \frac{\partial}{\partial x} \rho(x, t) = 0 \,. \quad (8.46)$$
Notice here that J'(ρ_0) is a constant and has the dimension of a velocity. We can
call it v_0 and get

$$\frac{\partial}{\partial t} \rho(x, t) + v_0 \frac{\partial}{\partial x} \rho(x, t) = 0 \,. \quad (8.47)$$

Now by substitution, one can easily see that ρ = f(x − v_0 t) is a solution for any
differentiable f(x). Note that ρ = f(x − v_0 t) describes a wave moving with
velocity v_0. For v_0 > 0 the wave moves to the right, for the opposite sign it
moves to the left.
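A quick finite-difference check confirms that ρ = f(x − v_0 t) indeed solves (8.47); the profile f, the speed v_0 and the test point are arbitrary assumed choices:

```python
import math

v0 = 0.7                      # assumed constant wave speed J'(rho_0)
f = lambda s: math.sin(s)     # any differentiable profile works
rho = lambda x, t: f(x - v0 * t)

eps = 1e-6
x, t = 0.3, 1.2               # arbitrary test point
d_t = (rho(x, t + eps) - rho(x, t - eps)) / (2 * eps)   # d(rho)/dt
d_x = (rho(x + eps, t) - rho(x - eps, t)) / (2 * eps)   # d(rho)/dx
residual = d_t + v0 * d_x
print(residual)               # vanishes up to finite-difference error
```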
Example 8.5. If f(x) = sin x, then ρ = sin(x − v_0 t); the point (x, t) such that
x − v_0 t = π/2 is at the crest of the wave, and it moves in the x–t plane along
the straight line x = v_0 t + π/2. Thus the solutions of (8.46) represent linear
traffic waves. The velocity v_0 is given by

$$v_0 = J'(\rho_0) = v_{\max}\left(1 - \frac{2\rho_0}{\rho_{\max}}\right) \,.$$

It is important to realise that this velocity is relative to the road surface. Note
that when ρ_0 ≈ 0 we have v_0 ≈ v_max. This says that density changes propagate
with the velocity of the cars when there are few cars, which is reasonable. It also
means that traffic waves move with the traffic (again reasonable for light traffic).
At the other extreme, when ρ_0 ≈ ρ_max, we have v_0 ≈ −v_max. In this case cars
are moving slowly, and the waves move backward relative to the cars' motion at
the high speed v_max. This happens when cars move slowly in packed traffic and
one car suddenly stops. The wave of red brake lights caused by many cars
braking can then move towards a driver quickly.

Method of characteristics We will now learn how to solve initial value


problems for first-order PDEs using the method of characteristic curves. The
idea of the method is to discover curves (the characteristic curves) along which
the PDE becomes an ODE. Once the ODE is found, it can be solved along the
characteristic curves and transformed into a solution for the PDE. Consider a
function f(x, t) satisfying a first-order linear PDE of the form

$$\frac{\partial}{\partial t} f(x, t) + v(x, t) \frac{\partial}{\partial x} f(x, t) = 0 \,. \quad (8.48)$$

We will view this equation as saying that f is not changing along a curve x =
x(t), which means

$$\frac{d}{dt} f(x(t), t) = 0 \,. \quad (8.49)$$

Using the chain rule we get

$$0 = \frac{\partial f}{\partial t} + \frac{dx}{dt} \frac{\partial f}{\partial x} \,. \quad (8.50)$$

By virtue of (8.48) and (8.50) we must have

$$\frac{dx}{dt} = v(x, t) \,, \quad (8.51)$$

which is an ODE for x(t). A solution φ(x(t), t) satisfies

$$\frac{d}{dt} \varphi(x(t), t) = \frac{\partial \varphi}{\partial t} + x'(t) \frac{\partial \varphi}{\partial x} = \frac{\partial \varphi}{\partial t} + v(x, t) \frac{\partial \varphi}{\partial x} = 0 \,, \quad (8.52)$$

which implies that φ(x(t), t) = φ_0(x_0). Each value of x_0 determines a unique
characteristic base curve if v is such that the initial value problems for the ODE
(8.51) are uniquely solvable (we assume v smooth enough for that). On any of
the integral curves φ(x(t), t) = φ_0(x_0), f will also be constant (see (8.49) and
(8.52)). Since the curves of constant φ and constant f coincide, f has to be a
function of φ alone:

$$f(x(t), t) = F(\varphi(x, t)) \,. \quad (8.53)$$

We will consider for example an initial condition f(x, 0) = f_0(x) such that
f(x, 0) = F(φ(x, 0)). This equation can be solved for x, which then leads to
f(x, t) = f_0(x(φ(x, t))).

Figure 8.4: The left panel shows the characteristic base curves for Example 8.6,
the right panel shows the corresponding solution f(x, t).

Example 8.6. Consider the initial value problem

$$\frac{\partial f}{\partial t} + x \sin(t) \frac{\partial f}{\partial x} = 0 \,, \qquad f_0(x) = 1 + \frac{1}{1 + x^2} \,.$$

Here we have v(x, t) = x sin t. Characteristic base curves for this problem
are solutions of

$$\frac{dx}{dt} = x \sin t \,, \qquad x(0) = x_0 \,.$$

By separation of variables we get

$$\int \frac{1}{x}\, dx = \int \sin t\, dt \,.$$

Hence

$$\ln x = -\cos t + c \,,$$

and using the initial condition

$$x(t) = x_0\, e^{1 - \cos t} \,.$$

The function f is preserved along the characteristic base curves:

$$f(x(t), t) = f_0(x_0) \,, \qquad x_0 = x(t)\, e^{-1 + \cos t} \,.$$

Since we know that

$$f_0(x_0) = 1 + \frac{1}{1 + x_0^2} \,,$$

we find that

$$f(x, t) = 1 + \frac{1}{1 + x^2 e^{-2 + 2\cos t}} \,.$$

The characteristic base curves and the solution f(x, t) are illustrated in Figure
8.4.
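The closed-form solution of Example 8.6 can be checked numerically: it should satisfy the PDE (up to finite-difference error) and reduce to f_0 at t = 0. The test point below is an arbitrary assumed choice:

```python
import math

# closed-form solution found via characteristics, and the initial datum
f = lambda x, t: 1.0 + 1.0 / (1.0 + x**2 * math.exp(-2.0 + 2.0 * math.cos(t)))
f0 = lambda x: 1.0 + 1.0 / (1.0 + x**2)

eps = 1e-6
x, t = 1.5, 0.8               # arbitrary test point
f_t = (f(x, t + eps) - f(x, t - eps)) / (2 * eps)
f_x = (f(x + eps, t) - f(x - eps, t)) / (2 * eps)
residual = f_t + x * math.sin(t) * f_x    # should vanish by (8.48)
print(residual, abs(f(x, 0.0) - f0(x)))
```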

Back to nonlinear scalar conservation laws The method of characteristics
can be used for nonlinear conservation equations like our traffic model.
Consider the nonlinear scalar conservation law given by

$$\frac{\partial \rho}{\partial t} + J'(\rho) \frac{\partial \rho}{\partial x} = 0 \,, \quad (8.54)$$

with characteristic base curves such that

$$\frac{dx}{dt} = J'(\rho(x, t)) \,, \qquad x(0) = x_0 \,.$$

Assuming a smooth enough ρ, we get a solution x(t) and can rewrite the con-
servation law as

$$\frac{d}{dt} \rho(x(t), t) = 0 \,.$$

Thus as before, ρ(x(t), t) = ρ_0(x_0), meaning that ρ is constant along character-
istic curves. The corresponding characteristic ODE is

$$\frac{dx}{dt} = J'(\rho(x(t), t)) = J'(\rho_0(x_0)) \,,$$

and since ρ_0(x_0) is constant we can integrate to get

$$x(t) = x_0 + J'(\rho_0(x_0))\, t \,.$$

In conclusion, the PDE (8.54) has characteristic base curves that are straight
lines and explicitly computable. In the (x, t) plane each line has a slope of
[J'(ρ_0(x_0))]^{-1}, corresponding to a propagation speed for the density of
J'(ρ_0(x_0)).

8.4 Traffic flow when the light turns green


We want to study what happens to the traffic flow when a light turns green.
That is, there is a (red) traffic light at x = 0, where cars are stopped and
standing bumper to bumper behind the traffic light (for x ≤ 0), the road ahead
of the light is empty and the light turns green at time 0. Mathematically at
time zero we have

ρ0(x) = ρmax for x ≤ 0 ,   ρ0(x) = 0 for x > 0 .
For simplicity we assume the normalisation ρmax = 1 and can write
J(ρ) = ρ(1 − ρ) for ρ ∈ [0, 1] ,   J(ρ) = 0 for ρ > 1 .   (8.55)

At time t = 0, the light turns green and thus


ρ0(x) = 1 for x ≤ 0 ,   ρ0(x) = 0 for x > 0 .

The characteristic base lines satisfy


x′(t) = J′(ρ0(x0)) = 1 − 2ρ0(x0) = −1 for x0 ≤ 0 ,   +1 for x0 > 0 ,


Figure 8.5: Characteristics for J(ρ) from (8.55).

and are thus (Figure 8.5)


x(t) = −t + x0 for x0 ≤ 0 ,   x(t) = t + x0 for x0 > 0 .

The figure reveals a gap region that is not reached by any characteristic base line. This inadequacy of our approach originates from the discontinuous jump in the initial density, which we now correct by modifying the problem so that the density changes continuously.
Let us define a modified initial density

1 ,
 x ≤ −
 1 x
ρ (x, 0) = 2 − 2 , − < x ≤  (8.56)

0, x>

The characteristic base curves satisfy



−1 , x0 ≤ −

0 0 
x (t) = J (ρ0 (x0 )) = x0 , − < x0 ≤ 

1, x0 > 

The set of these characteristic base curves is known as the rarefaction fan and
is shown in Figure 8.6.
Now in the transition region we get
 
x(t) = (1 − 2ρ0(x0)) t + x0 = (1 − 2(1/2 − x0/(2ε))) t + x0 = x0 (1 + t/ε) .

Solving for x0 we get


x0 = x / (1 + t/ε) .
With x0 we get the density in the transition zone
 
ρε(x, t) = ρ0(x0) = 1/2 − x/(2ε(1 + t/ε)) = (1/2) (1 − x/(ε + t)) ,

for which we can take the limit as ε → 0:

lim_{ε→0} ρε(x, t) = ρ(x, t) = (1/2) (1 − x/t) ,   t > 0 ,   (8.57)


Figure 8.6: The expansion fan fills in the gap of Figure 8.5

where t > 0 is taken as a fixed value. The limit function is linear in x inside the rarefaction fan: for x = −t we have ρ = 1 and for x = t we have ρ = 0, as it should be, with a smooth variation in between.
This function tells us how the density varies smoothly from 1 (or ρmax ) to 0
as the cars accelerate when the light turns green.
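The limit density (8.57) can be checked directly: it matches the boundary values of the fan and satisfies ρ_t + (1 − 2ρ) ρ_x = 0 inside it. A small Python sketch of this check (finite differences, illustrative only):

```python
def rho(x, t):
    # Rarefaction-fan density (8.57), valid for -t <= x <= t, t > 0
    return 0.5 * (1.0 - x / t)

t = 2.0
assert rho(-t, t) == 1.0 and rho(t, t) == 0.0   # boundary values of the fan

# finite-difference check of rho_t + (1 - 2*rho)*rho_x = 0 inside the fan
h = 1e-6
x = 0.7
rt = (rho(x, t + h) - rho(x, t - h)) / (2.0 * h)
rx = (rho(x + h, t) - rho(x - h, t)) / (2.0 * h)
assert abs(rt + (1.0 - 2.0 * rho(x, t)) * rx) < 1e-8
```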

8.5 Some properties of traffic flow from a red light


We will now ask questions associated with motion from a red light. The ques-
tions are, how long do you have to wait before you start moving? What is the
path of your car once you begin to move? How close do you have to be to go
through the light in one cycle?
To understand the answers better, it is easier to restore units for the next analyses. Therefore we go back to J(ρ) = ρ vmax (1 − ρ/ρmax) .

Waiting time The left end of the expansion fan is the traffic wave corre-
sponding to ρmax and has a velocity of −vmax . If we consider a car at a distance
D behind the light, the time until the wave reaches the car is thus t = D/vmax .
In city traffic vmax might be 50 km/h. If we assume a car spacing of 6 m, the waiting time per car is t = (6/1000) · (3600/50) ≈ 0.43 seconds. In practice, the typically much larger human reaction time has to be added to this value.
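The waiting-time computation is a one-liner; the sketch below repeats it in Python with the illustrative values D = 6 m and vmax = 50 km/h from above.

```python
def waiting_time(D_m, vmax_kmh):
    # Time (in seconds) until the rho_max wave, travelling at -vmax,
    # reaches a car standing D_m metres behind the light: t = D / vmax.
    vmax_ms = vmax_kmh * 1000.0 / 3600.0   # convert km/h to m/s
    return D_m / vmax_ms

t = waiting_time(6.0, 50.0)   # one car spacing at a city speed limit
assert abs(t - 0.432) < 1e-9  # approximately 0.43 seconds per car
```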

Vehicle path Once the car located at a distance D encounters the traffic wave
(which moves with velocity vmax ), it will begin to move. The car’s movement
after that is completely independent of the traffic wave and can be calculated.
The density in the expansion fan is
 
ρ(x, t) = ρmax (vmax t − x) / (2 vmax t) .   (8.58)

We already know that the velocity of the car is v = vmax (1 − ρ/ρmax ). Inserting
(8.58) in this expression, we obtain the velocity of a car as a function of x and
t. That gives us
  
dx/dt = vmax (1 − (1/ρmax) · ρmax (vmax t − x)/(2 vmax t)) .   (8.59)

By integrating (8.59), we get the path of a car after it starts moving.
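Integrating (8.59) with the initial condition x(D/vmax) = −D (the car starts moving when the wave reaches it) gives the closed-form path x(t) = vmax t − 2 (vmax D t)^{1/2} for t ≥ D/vmax. The Python sketch below checks this candidate path against the ODE numerically; the values of vmax and D are arbitrary illustrative choices.

```python
import math

vmax, D = 14.0, 30.0      # illustrative speed (m/s) and start distance (m)
t0 = D / vmax             # the expansion wave reaches the car at t0 = D/vmax

def x(t):
    # Candidate closed-form path obtained by integrating (8.59)
    return vmax * t - 2.0 * math.sqrt(vmax * D * t)

def rhs(xv, t):
    # Right-hand side of (8.59); it simplifies to (vmax*t + x)/(2t)
    return (vmax * t + xv) / (2.0 * t)

assert abs(x(t0) + D) < 1e-9   # the path starts at x = -D

# central-difference check that dx/dt matches (8.59) after the start
h = 1e-6
for t in (2.0 * t0, 5.0 * t0):
    dxdt = (x(t + h) - x(t - h)) / (2.0 * h)
    assert abs(dxdt - rhs(x(t), t)) < 1e-4
```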

Which cars get through Let us say that the light stays green for a time tG .
The last car to get through the light is the one starting from a distance Dlast
such that its position at time tG is xlast (tG ) = 0.
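Assuming the path x(t) = vmax t − 2 (vmax D t)^{1/2} obtained by integrating (8.59) with x(D/vmax) = −D, the condition xlast(tG) = 0 can be solved by hand: vmax tG = 2 (vmax Dlast tG)^{1/2} gives Dlast = vmax tG / 4. A minimal Python check with illustrative values:

```python
import math

def d_last(vmax, tG):
    # Solving vmax*tG = 2*sqrt(vmax*D*tG) for D gives D_last = vmax*tG/4
    return vmax * tG / 4.0

vmax, tG = 14.0, 30.0   # illustrative speed (m/s) and green phase (s)
D = d_last(vmax, tG)
# the car that starts at distance D_last indeed reaches x = 0 at time tG
assert abs(vmax * tG - 2.0 * math.sqrt(vmax * D * tG)) < 1e-9
```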

Problems
Exercise 8.7. Simulate the Markovian approximation (8.4) of the DDE (8.3)
for pedestrians in Matlab using forward Euler.

a) Obtain a rough estimate of the equilibrium parameters λ = k/m and a = aj+1 from Figure 8.2. Use reasonable assumptions about the size of adults and the legroom necessary for walking to estimate d and l.
b) Simulate (8.4) starting from random initial conditions x1 (0), . . . , xN (0),
with N = 10 and various different choices of the motion x1 (t) of the
first pedestrian in the row (e.g. constant speed, slowing down). Let it run
sufficiently long and explain your observation.
Think about a sensible visualization of the simulation data.
Exercise 8.8. Simulate the DDE (8.34)–(8.36) for the perturbation displace-
ment, with the parameters v ∗ = 28 m/s, ρmax = 40 cars per kilometer, tb = 1 s,
k = 0.2 s−1 and d = 19 m, and the braking law

b(t) = kt exp((tb − t)/tb ) .

a) Implement an iterative solver in Matlab using forward Euler, with time step h = 0.01 s and τ = Rh, R ∈ N, a multiple of the time step.
b) Simulate the dynamics and estimate the maximum reaction time before an
accident occurs when the number of cars in the platoon is N = 40. Think
about a sensible data visualization.

Exercise 8.9. The macroscopic density of N vehicles at x at time t is defined as
ρN,s(x, t) = (# vehicles in (x − s, x + s) at time t) / (2s) .   (8.60)
Consider a situation where all N vehicles are at equal distance at any time.
Show that then ρN,s is independent of N and s and given by
ρ(x, t) = |xj+1(t) − xj(t)|^{−1} .

Why is the right-hand side of this equation independent of j?


Exercise 8.10. Compute the density ρ∗ that maximizes the flux (8.15). Read
off the values for ρcrit, vmax and ρmax from Figure 7.2 and plot J(ρ) against
v(ρ), using (8.15). Explain your findings. For what density does the maximum
flux occur if ρ∗ > ρcrit ? And if ρ∗ < ρcrit ? If the density is ρmax /e, what is the
approximate distance between cars? Would drivers want to drive the speed limit
then?

9 Formal justice
Mathematical tools & concepts: functional equations
Suggested reference: [Ill90]

In this section, we will give a mathematical analysis of the Aristotelian dic-


tum that being just means treating equals equally and unequals unequally. The
formal concept of justice means the consistent and continuous application of the
same norms and rules to each and every member of a population. Mathematically, formal justice means that there is a relation between, e.g., merit or qualification and the compensation that professionals receive for their work.

9.1 Functional equations


A functional equation is an equation that, just like a differential equation, defines
a function in implicit form. Here these implicit function definitions follow from
abstract considerations about equality and inequality. The Aristotelian idea of
proportionality, that says that any member of a population must be treated
proportionately according to merit or excellence, refers precisely to this kind of
formal concept of equality and inequality.
In more mathematical terms, this means that, if we can measure or express, e.g., merit by a single non-negative real number x, then the compensation should be a function m : [0, ∞) → [0, ∞) of x. If we agree with Aristotle that
compensation (e.g., wage) and merit (e.g., professional qualification) should be
proportional, then the function m should satisfy the functional equation

x/y = m(x)/m(y) ,   x, y ≥ 0 .   (9.1)

It is easy to see that (9.1) has the solution

m(x) = cx (9.2)

for some constant c ≥ 0, where the non-negativity of c follows from the fact that m takes only non-negative values. To see that m(x) is indeed proportional to
x, set y = 1 which implies that m(x) = m(1)x. As we will show below, this is
in fact the only solution of the functional equation (9.1).

Formalizing justice Aristotle’s concept of proportional justice is rather re-


strictive and has obvious shortcomings. For example, when applied to crime and
punishment it implies that someone who has robbed 10 banks should receive two
times the punishment of someone who has robbed “only” 5 banks.19
However let us stick to the problem of just wages. Following [Ill90], we shall
call a wage system formally just if
a) there is a (mathematical) relation between the group of people involved
and the set of possible wages,
b) the system is reliable, in that it is invariant with respect to the user,
19 Think about other examples and discuss the shortcomings of the concept of formal justice.

c) the system is accurate, i.e., relations between objects (people) are reflected
by the relations between the assigned numbers (wages), and
d) there is an accurate inverse in the sense that for any two different wages there exist well-defined prototypes of people who qualify for these wages.
Aristotle’s concept of proportional justice meets the first two requirements.
Translating the third requirement, condition c), into mathematics, it entails that the function m should be a homomorphism, i.e. a structure-preserving map between relations between people and wage relations. This homomorphism, by condition d) on its inverse, is specified to be invertible, hence an isomorphism. We stipulate that the function m should be a homomorphism with respect to the ratio scale; in other words, we measure relations between two people with qualification x > 0 and y > 0 by the ratio x/y. This is to say that we seek a
function m : [0, ∞) → [0, ∞) that satisfies the equation
 
m(x/y) = m(x)/m(y) ,   x, y ≥ 0 .   (9.3)
Theorem 9.1. Let m : [0, ∞) → [0, ∞) satisfy the functional equation
 
m(x/y) = m(x)/m(y) ,   x, y ≥ 0 .
If m is continuous at some z ∈ [0, ∞), then m is of the form m(x) = x^s for some s ∈ R.
Proof. It is clear that m(x) = x^s solves (9.3); however, we have to prove the converse statement, namely that all solutions of (9.3) that have a point of continuity are of the form m(x) = x^s. The proof proceeds in three steps:
1. We first observe that (9.3) is equivalent to
m(xy) = m(x)m(y) ,   x, y ≥ 0 ,   (9.4)

which follows from the fact that

m(xy) = m(x/(1/y)) = m(x)/m(1/y) = m(x) m(y)/m(1) = m(x) m(y) ,

with

m(1) = m(x/x) = m(x)/m(x) = 1 .
2. Define the function h : R → R by h(u) = log(m(eu )). This function solves
Cauchy’s functional equation
h(u + v) = h(u) + h(v) , (9.5)
as can be seen by noting that
h(u + v) = log(m(eu ev ))
= log(m(eu )m(ev ))
= log(m(eu )) + log(m(ev ))
= h(u) + h(v) .

Obviously, (9.5) is satisfied by any linear function

h(u) = cu , c ∈ R. (9.6)

We will show that Cauchy's equation has no other solutions. By induction, h(qu) = qh(u) for all q ∈ Z; in particular h(q) = qh(1), and we set c = h(1). Then, with h(1) = h(p/p) = ph(1/p), p ∈ N, it follows that

h(q/p) = q h(1/p) = c q/p ,   q ∈ Z, p ∈ N ,

which proves that (9.6) holds for all rational arguments u = q/p. By
the above assumptions m is continuous at z ≥ 0, as a consequence h is
continuous at w = log z. But then, since

h(ε) = h(w + ε − w) = h(w + ε) − h(w) → 0

as ε → 0, we can conclude that h is continuous at u = 0. Iterating the last


argument, it follows that h is continuous everywhere on its domain. Hence
all solutions of Cauchy’s functional equation (9.5) that have at least one
point of continuity are linear functions of the form (9.6).
3. All we have to do now is to undo the substitution h(u) = log(m(e^u)): using x = e^u and (9.6), we get

m(x) = exp(h(log x)) = exp(log(x^c)) = x^c .

But since c ∈ R is arbitrary, the assertion follows.
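The statement of Theorem 9.1 is easy to check numerically for a fixed exponent; a small Python sketch (s = 1.7 is an arbitrary illustrative choice):

```python
def m(x, s):
    # Continuous solution m(x) = x**s of the equation m(x/y) = m(x)/m(y)
    return x ** s

s = 1.7
for x, y in ((2.0, 3.0), (5.0, 0.5)):
    assert abs(m(x / y, s) - m(x, s) / m(y, s)) < 1e-12
```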

Remark 9.2. There are other solutions to Cauchy’s functional equation (9.5),
but they represent pathological cases; in particular they are nowhere continuous;
for details we refer to [Kuc09].

Testing whether wages are formally just The solution to the functional
equation does not say anything about how the wages should grow with the
qualification of an employee, because (9.3) does not tell us what s is (or what
it should be in an ideal world). Nevertheless we can use it to test whether a
wage system is consistent, i.e. whether all employees in a company or in a country
receive payment that follows the same “law”. Accordingly (9.3) can be used as
a means for decision making, for example, when negotiating the salary with a
potential future employee or when considering to cap bankers’ bonuses by law.
We suppose that a fair wage scale is given by

m(x) = r x^s ,   (9.7)

with r > 0 being a scaling factor that accounts for, e.g., the currency in which
wages are paid; cf. (9.9) below. Figure 9.1 shows possible qualification-wage
curves for different values of s. Note that the function m(x) = r x^s with s < 0 meets the requirement of formal justice; however, paying the more qualified candidate the lower salary does not appear to be just by any sensible standard.
As an illustration of how our model of formal justice can be used to test wage
scales consider the situation of three employees working for company X, who

Figure 9.1: Continuous solutions m(x) = x^s of the functional equation m(x/y) = m(x)/m(y) for s = 1, s > 1, 0 < s < 1, and s < 0.

are paid according to their seniority: Alice has been working for her company for 25 years and makes €2,000,000 per year, Bob who joined the company 16 years ago earns €60,000 per year, and Carol after only 3 years gets €40,000.
We do a least squares fit of the linear model

M = sX + b , with M = log m, X = log x, b = log r . (9.8)

On a logarithmic scale a fair wage scale is a straight line with slope s and, when the payment is fair, all data points should lie roughly on this line.20 Figure 9.2, which shows the least squares fit of the data, suggests (not very surprisingly) that Alice is significantly overpaid, whereas Bob is underpaid.
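The fit behind Figure 9.2 can be reproduced in a few lines. The following Python sketch performs the ordinary least squares fit of the linear model (9.8) to the three data points (the notes' exercises use Matlab; this is only an illustration):

```python
import math

# seniority (years) and annual salary (EUR) for Alice, Bob and Carol
years  = [25.0, 16.0, 3.0]
salary = [2_000_000.0, 60_000.0, 40_000.0]

X = [math.log(x) for x in years]    # X = log x
M = [math.log(m) for m in salary]   # M = log m

# ordinary least squares for M = s*X + b
n = len(X)
mx, mm = sum(X) / n, sum(M) / n
s = sum((xi - mx) * (mi - mm) for xi, mi in zip(X, M)) \
    / sum((xi - mx) ** 2 for xi in X)
b = mm - s * mx

assert abs(s - 1.4) < 0.01   # exponent of the fitted wage law, cf. Figure 9.2
```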

9.2 Criticism and possible extensions


Note that there is no rigorous (mathematical or other) argument for using ratios,
rather than any other relation between people’s qualifications and their wages,
e.g., differences such as in

m(x − y) = m(x) − m(y) , x, y ≥ 0 .

(You may think of other sensible choices. By the proof of Theorem 9.1, all
solutions of this functional equation are linear.) Nonetheless using ratios to
compare measurable quantities is a well-established approach in sociology and
quantitative research, hence we will stick to the ratio scale; moreover it is in
line with the historical notion of proportional justice.
20 Clearly three points do not give sufficient statistics, but the example is just meant to

illustrate the idea.


Figure 9.2: Least square fit of a wage system (log scale), with exponent s ≈ 1.4.

Scaling invariance A more severe drawback of (9.3) is that the equation is


not scale invariant. If we change the payment from, say, EUR to CHF, then
m scales according to m ↦ rm, with r > 0 being the currency exchange rate
between Euros and Swiss francs. Specifically, calling m̃ = rm, we have
 
m̃(x/y) = r m̃(x)/m̃(y) ,   x, y ≥ 0 ,   (9.9)

which is different from our original functional equation (9.3). We can, however,
account for this lack of scale invariance by simply replacing our model (9.3) by
the rescaled model (9.9), which then has solutions of the form

m̃(x) = r x^s ,   (9.10)

with r > 0 now being a general scale-dependent prefactor.

Formal justice with multiple objectives One may argue that the previ-
ous concept of formal justice is “too one-dimensional”, in that it tacitly assumes
that qualification or merit can be measured by a single parameter. It is more
reasonable to assume that the regular payment that an employee receives de-
pends on various independent parameters x, y, z . . ., such as formal degree of
education, seniority, extra professional qualifications and so on, which means
that the compensation will be a function m(x, y, z, . . .).
As an example we consider the case m = m(x, y). The idea is that the rule
of formal justice, i.e. (9.3) or (9.9) should apply to each qualification measure
separately. That is, we require that m : [0, ∞) × [0, ∞) → [0, ∞) solves the

following system of two coupled functional equations

m(x, y1/y2) = r1(x) m(x, y1)/m(x, y2) ,   ∀ y1, y2 ≥ 0 ,
m(x1/x2, y) = r2(y) m(x1, y)/m(x2, y) ,   ∀ x1, x2 ≥ 0 ,      (9.11)

for each combination of possible qualifications x, y ≥ 0. Here r1(x) and r2(y) are qualification-dependent scaling factors, similar to the factor r in the modified equation (9.9). It can be shown (see [Ill90] for details) that the only continuous functions m that admit joint representations of the form

m(x, y) = r1(x) y^{u(x)} ,
m(x, y) = r2(y) x^{v(y)} ,      (9.12)

are given by

m(x, y) = r x^u y^v e^{w log x log y} ,   u, v, w ∈ R .      (9.13)

Note that on a logarithmic scale, the multi-variable model of formal justice turns into a bilinear model, rather than a linear one:

log m(x, y) = log r + u log x + v log y + w log x log y ,   u, v, w ∈ R .      (9.14)

Problems
Exercise 9.3. Let h : R → R solve the functional equation h(t + s) = h(t) + h(s) for all s, t ∈ R. Prove that
a) h(−t) = −h(t), and
b) h(t − s) = h(t) − h(s) for all s, t ∈ R.
Exercise 9.4. Salaries for headteachers in England and Wales (excluding London) range from £42,803 to £106,148 based upon a performance group index that involves, e.g., school leadership, management or pupil progress. The 2013 pay ranges for headteachers are recorded in the following table:

group   annual salary in £
1       42,803 – 57,520
2       44,971 – 61,901
3       48,505 – 66,623
4       52,131 – 71,701
5       57,520 – 79,081
6       61,901 – 87,229
7       66,623 – 96,166
8       73,480 – 106,148

For comparison, the following table shows the 2003 base salaries of players in the U.S. National Football League, depending on their match experience:

group       annual salary in $
Rookies     225,000
2 yrs       300,000
3 yrs       375,000
4 – 6 yrs   450,000
7 – 9 yrs   655,000
10 yrs      755,000

Compare the two salary scales, and explain the rationale behind your comparison. Would you rate any of the above salary scales as fair?
(Hint: Determine the exponent s in the qualification-salary relation m(x) = c x^s by a least squares fit of the data given.)

References
[And11] D.F. Anderson and T.G. Kurtz. Continuous Time Markov Chain Models for
Chemical Reaction Networks. In: Design and Analysis of Biomolecular Circuits,
H. Koeppl, G. Setti, M. di Bernardo, and D. Densmore (eds.), pp. 3–42, Springer,
New York, 2011.
[Ari94] R. Aris. Mathematical Modelling Techniques. Dover, Mineola, 1994.
[BMR11] A.L. Ballinas-Hernández, A. Muñoz-Meléndez, A. Rangel-Huerta. Multiagent
System Applied to the Modeling and Simulation of Pedestrian Traffic in Coun-
terflow. J. Artif. Soc. Soc. Simulat. 14(3), 2, 2011.
[BCD02] N. Bellomo, V. Coscia, M. Delitala. On the Mathematical Theory of Vehicular
Traffic Flow I. Fluid Dynamic and Kinetic Modelling. Math. Mod. Meth. App.
Sc. 12, 1801–1843, 2002.
[Ben00] E.A. Bender. An Introduction to Mathematical Modeling. Dover, Mineola, 2000.
[Bie05] A.A. Biewener. Biomechanical consequences of scaling. J. Exp. Biol. 208, 1665–
1676, 2005.
[Buc14] E. Buckingham. On Physically Similar Systems; Illustrations of the Use of Di-
mensional Equations. Phys. Rev. 4, 345–376, 1914.
[Dou76] P.H. Douglas. The Cobb-Douglas Production Function Once Again: Its History,
Its Testing, and Some New Empirical Values. J. Polit. Econ. 84, 903–916, 1976.
[Gil77] D.T. Gillespie. Exact Stochastic Simulation of Coupled Chemical Reactions. J.
Phys. Chem. 81, 2340–2361, 1977.
[Hig08] D.J. Higham. Modeling and Simulating Chemical Reactions. SIAM Review 50,
347–368, 2008.
[Ill90] R. Illner. Formal justice and functional equations. Technical Reports (Mathe-
matics and Statistics), University of Victoria DMS-541-IR, 1990.
[IBM+ 05] R. Illner, C.S. Bohun, S. McCollum, and T. van Roode. Mathematical Modelling:
A Case Studies Approach. AMS, Providence, 2005.
[Izh07] E.M. Izhikevich. Dynamical Systems in Neuroscience: The Geometry of Ex-
citability and Bursting. MIT Press, Cambridge, 2007.
[KaEn] H. Kaper and H. Engler. Mathematics and Climate. Society for Industrial and
Applied Mathematics (SIAM), Philadelphia, Pennsylvania (2013).
[Kuc09] M. Kuczma. An introduction to the theory of functional equations and inequal-
ities. Cauchy’s equation and Jensen’s inequality. Birkhäuser, Basel, 2009.
[Lot10] A.J. Lotka. Contribution to the Theory of Periodic Reactions. J. Chem. Phys.
14, 271–274, 1910.
[MeG07] M. Mesterton-Gibbons. A Concrete Approach to Mathematical Modelling. Wiley,
Hoboken, 2007.
[Met87] N. Metropolis. The beginning of the Monte Carlo method. Los Alamos Science
15(584), 125–130, 1987.
[BTF+ 99] C.R. Rao, H. Toutenburg, A. Fieger, C. Heumann, T. Nittner, and S. Scheid.
Linear Models: Least Squares and Alternatives. Springer, Berlin, 1999.
[Tay50] G.I. Taylor. The formation of a blast wave by a very intense explosion II: The
atomic explosion of 1945. Proc. Roy. Soc. A 201, 175–186, 1950.

[Tes12] G. Teschl. Ordinary Differential Equations and Dynamical Systems. AMS,
Providence, 2012.
[Vol26] V. Volterra. Variazioni e fluttuazioni del numero d’individui in specie animali
conviventi. Mem. Acad. Lincei Roma 2, 31–113, 1926.
[Whi96] P. Whittle. Optimal control: Basics and beyond. Wiley & Sons, Chichester,
1996.
