Lecture Notes On Numerical Methods For Engineering (?) : Pedro Fortuny Ayuso
Contents

Introduction
1. Some minor comments
Chapter 1. Arithmetic and error analysis
1. Exponential notation
2. Error, basic definitions
3. Bounding the error
Chapter 2. Numerical Solutions to Non-linear Equations
1. Introduction
2. The Bisection Algorithm
3. Newton-Raphson's Algorithm
4. The Secant Algorithm
5. Fixed Points
6. Annex: Matlab/Octave Code
Chapter 3. Numerical Solutions to Linear Systems of Equations
1. Gauss Algorithm and LU Factorization
2. Condition Number: behavior of the relative error
3. Fixed Point Algorithms
4. Annex: Matlab/Octave Code
Chapter 4. Interpolation
1. Linear (piecewise) interpolation
2. Can parabolas be used for this?
3. Cubic Splines: Continuous Curvature
4. The Lagrange Interpolating Polynomial
5. Approximate Interpolation
6. Code for some of the Algorithms
Chapter 5. Numerical Differentiation and Integration
1. Numerical Differentiation
2. Numerical Integration: Quadrature Formulas
Chapter 6. Differential Equations
1. Introduction
2. The Basics
3. Discretization
4. Sources of error: truncation and rounding
5. Quadratures and Integration
6. Euler's Method: Integrate Using the Left Endpoint
7. Modified Euler: the Midpoint Rule
8. Heun's Method: the Trapezoidal Rule
Chapter 7. Multivariate and higher order ODEs
1. A two-variable example
2. Multivariate equations: Euler and Heun's methods
3. From greater order to order one
CHAPTER 1

Arithmetic and error analysis
1. Exponential notation
Real numbers can have a finite or infinite number of digits. When working with them, they have to be approximated using a finite (and usually small) number of digits. Moreover, they should be expressed in a standard way, so that their magnitude and significant digits are easily noticed at first sight and so that sharing them among machines is a deterministic and efficient process. These requirements have led to what is known as scientific or exponential notation. We shall explain it roughly; what we intend is to transmit the underlying idea more than to detail all the rules (they have lots of particularities and exceptions). We shall use, throughout these notes, the following expressions:
DEFINITION 1. An exponential notation of a real number is an expression of the form

±A.B × 10^C
±10^C × A.B
±A.BeC

where A, B and C are natural numbers (possibly 0) and ± is a sign (which may be elided if +). Any of those expressions refers to the real number ±(A + 0.B) × 10^C (where 0.B is "nought dot B. . . ").
For example:
The number 3.123 is the same as 3.123 × 10^0.
The number 0.01e−7 is 0.000000001 (eight zeroes after the dot and before the 1).
When bounding the error, one seeks a value (the bound) which is larger than the error, in order to assess, with certainty, how far the real value can be from the computed one.
In what follows, an exact value x is assumed (a constant, a datum, the exact solution to a problem. . . ) and an approximation to it will be denoted x̃.
x = a₁a₂ . . . aᵣ.aᵣ₊₁ . . . aₙ . . . ;

notice that there are r digits to the left of the decimal point. Define:
The problem with rounding is that all the digits of the rounded number can be different from those of the original value (think of 0.9999 rounded to 3 significant digits, which gives 1.00). The great advantage is that the error incurred when rounding is less than the one incurred when truncating (it can even be as little as half of it).
CHAPTER 2

Numerical Solutions to Non-linear Equations

1. Introduction
Computing roots of functions, and especially of polynomials, is one of the classical problems of Mathematics. It used to be believed that any polynomial could be explicitly solved, as the quadratic equation is, via a formula involving radicals (roots of numbers). Galois Theory, developed at the beginning of the nineteenth century, proved that this is not the case at all and that, as a matter of fact, most polynomials of degree greater than 4 are not solvable using radicals.
However, the search for a closed formula for solving equations is just a way of putting off the real problem. In the end, and this is what matters:

The only computations that can always be performed exactly are addition, subtraction and multiplication¹.
¹One could argue that using rational numbers solves this problem but, again, there is a point at which decimal expansions are needed.
3. Newton-Raphson's Algorithm
A classical geometric idea (which also appears in approximation theory) is to use the best³ linear approximation to f in order to compute an approximate solution to f(x) = 0. This best linear approximation is, of course, the tangent line to f, which is computed using the derivative f′(x). So, instead of trying to solve f(x) = 0 directly, one draws the tangent to the graph of f at (x, f(x)) and finds the meeting point of this line with the OX axis. Obviously, this will most likely not be a root of f but, in sufficiently general conditions, it is expected to approach one. If the process is repeated enough times, one gets nearer and nearer to a root of f. This is the idea of Newton-Raphson's method.

³From the point of view of infinitesimal analysis and polynomial approximation.
Recall that the equation of the line passing through (x₀, y₀) with slope b is:

Y = b(X − x₀) + y₀,

so that the equation of the tangent line to the graph of f at (x₀, f(x₀)) is (assuming f has a derivative at x₀):

Y = f′(x₀)(X − x₀) + f(x₀).

The meeting point between this line and OX is

(x₁, y₁) = (x₀ − f(x₀)/f′(x₀), 0),

assuming it exists (i.e. f′(x₀) ≠ 0).
If x₁ is not an approximate root of f with the desired precision, one proceeds in the same way at the point (x₁, f(x₁)). After having performed n steps, the next point xₙ₊₁ takes the form:

xₙ₊₁ = xₙ − f(xₙ)/f′(xₙ).

One carries on until the desired precision or a bound on the number of iterations is reached. This is Algorithm 4. In its formal expression, we only indicate one place where an error may occur, in order to keep it clear, but one has to take into account that each time a computation is performed a floating-point error might occur.
EXAMPLE 8. Take the function f(x) = eˣ + x, which is easily seen to have a unique zero on the whole real line (why is this so and why is it easy to see?). Its derivative is f′(x) = eˣ + 1. Let us use x₀ = 1 as the seed. An implementation of Newton-Raphson's algorithm in Octave might show the following steps:
octave> c=NewtonF(f, fp, 1, .0001, 5)
xn = 0
xn = -0.500000000000000
xn = -0.566311003197218
xn = -0.567143165034862
xn = -0.567143290409781
xn = -0.567143290409784
c = -0.567143290409784
octave> f(c)
ans = -1.11022302462516e-16
Algorithm 4 Newton-Raphson.
Input: A differentiable function f(x), a seed x₀ ∈ R, a tolerance ε > 0 and a limit for the number of iterations N > 0
Output: Either an error message or a real number c such that |f(c)| < ε (i.e. an approximate root)
★ START
i ← 0
while |f(xᵢ)| ≥ ε and i ≤ N do
  xᵢ₊₁ ← xᵢ − f(xᵢ)/f′(xᵢ)
  i ← i + 1
end while
if i > N then
  return ERROR
end if
c ← xᵢ
return c
5. Fixed Points
Fixed point algorithms (which, as we shall see later, indirectly include the previous ones) are based on the notion of contractivity, which reflects the idea that a function (a transformation) may map pairs of points in such a way that the images are always nearer to each other than the original two (i.e. the function shrinks, contracts the initial space). This idea, linked to differentiability, leads to that of fixed point of an iteration and, by means of a small trick, to the approximate solution of general equations using just iterations of a function.
5.1. Contractivity and equations g(x) = x. Let g be a real-valued function of one real variable, differentiable at c. That is, for any infinitesimal o there exists another one o₁ such that

g(c + o) = g(c) + g′(c)o + o·o₁,

which means that, near c, the function g is very similar to its linear approximation.
Assume that o is the width of a small interval centered at c. Removing the supralinear error (the term o·o₁) for the sake of approximation, one might think that (c − o, c + o) is mapped into (g(c) − g′(c)o, g(c) + g′(c)o): that is, an interval of radius o is mapped into one of radius |g′(c)|o (it dilates or shrinks by a factor of |g′(c)|). This is essentially what motivates the Jacobian Theorem in integration.
Assuming both checks have been performed, the method for finding a fixed point of g : [a, b] → [a, b] is stated in Algorithm 6.
EXAMPLE 13. For the fixed point algorithm we shall use the same function as for the bisection method, that is f(x) = cos(eˣ). In order to find a root of this function, we need to turn the equation

cos(eˣ) = 0

into a fixed-point problem. This is always done in the same (or a similar) way: the above equation is obviously equivalent to

cos(eˣ) + x = x,

which is a fixed-point problem. Let us call g(x) = cos(eˣ) + x, whose fixed point we shall try to find.
In order to apply the fixed-point theorem, one needs an interval [a, b] which is mapped into itself. It is easily shown that g(x) decreases near x = 0.5 and, as a matter of fact, that it does so in the interval I = [0.4, 0.5]. Moreover, g(0.4) ≃ 0.478+ while g(0.5) ≃ 0.4221+, which implies that the interval I is mapped into itself by g (this is probably the most difficult part of a fixed-point problem: finding an appropriate interval which gets mapped into itself). The derivative of g is g′(x) = −eˣ sin(eˣ) + 1, whose absolute value is, in that interval, less than 0.8 (it is less than 1 in the whole interval [0, 1], as a matter of fact). This is the second condition to be verified in order to apply the theorem.
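A minimal Octave sketch of the iteration for this example (the stopping rule and names are ours, not the annex's):

g = @(x) cos(exp(x)) + x;    % the iteration function
x = 0.45;                    % any seed inside [0.4, 0.5] works
for n = 1:50
  xnew = g(x);
  if (abs(xnew - x) < 1e-8)  % iterates have stabilized
    break
  end
  x = xnew;
end
x   % approximately 0.4516, and cos(exp(x)) is approximately 0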
The convergence speed does not look too good, which is usual in
fixed-point problems.
REMARK (Skip on first reading). The fixed point algorithm can be used, as explained in 5.2, for finding roots of general equations using a suitable factor; to this end, if the equation is f(x) = 0, one can use any function g(x) of the form

g(x) = x − k f(x),

where k ∈ R is an adequate number. This transformation is performed so that the derivative of g is around 0 near the root and so that, if c is the (unknown) root, then g defines a contractive map on an interval of the form [c − δ, c + δ] (which will be the [a, b] used in the algorithm).
The hard part is to check that g is a contractive map of [a, b] onto itself.
5.4. Convergence Speed of the Fixed Point Method. If the absolute value of the derivative is bounded by λ < 1, the convergence speed of the fixed point algorithm is easily bounded, because [a, b] is mapped onto a sub-interval of width at most λ(b − a) and this contraction is repeated at each step. So, after i iterations, the width of the image set is at most λⁱ(b − a). If λ < 10⁻¹ and (b − a) < 1, for example, then one can guarantee that each iteration gives an extra digit of precision in the root. However, λ tends to be quite large (like 0.9) and the process is usually slow.
EXAMPLE 14. If [a, b] = [0, 1] and |g′(x)| < 0.1, then after each iteration x gains one exact decimal digit as an approximation to the fixed point c, regardless of the initial value of the seed x.
% Bisection (the listing in the notes starts mid-function; the header
% line and the initialization of the counter N are reconstructed):
function c = bisec(f, a, b, epsilon, n)
N = 0;
fa = f(a);
fb = f(b);
if(fa == 0)
  c = a;
  return
end
if(fb == 0)
  c = b;
  return
end
c = (a+b)/2;
fc = f(c);
while(abs(fc) >= epsilon && N < n)
  N = N + 1;
  % multiply SIGNS, not values
  if(sign(fc)*sign(fa) < 0)
    b = c;
    fb = fc;
  else
    a = c;
    fa = fc;
  end
  % An error might happen here
  c = (a+b)/2;
  fc = f(c);
end
if(N >= n)
  warning("Tolerance not reached.")
end
end
% Newton-Raphson implementation
function [z, n] = NewtonF(f, fp, x0, epsilon = eps, N = 50)
  n = 0;
  xn = x0;
  % Both f and fp are anonymous functions
  fn = f(xn);
  while(abs(fn) >= epsilon && n <= N)
    n = n + 1;
    xn = xn - fn/fp(xn);  % an exception might take place here
    fn = f(xn);           % memorize to prevent recomputing in the test
  end
  z = xn;
  if(n > N)               % was n == N, which misses the overrun case
    warning("Tolerance not reached.");
  end
end
(2) Ax = b
One transforms the system so that the solution of the new system is the same as that of the original. To this end, only the following operation is permitted:

Any equation Eᵢ (the i-th row of A together with the i-th element of b) may be substituted by a linear combination of the form Eᵢ + λEₖ for some k < i and λ ∈ R. In this case, bᵢ is substituted with bᵢ + λbₖ.

The fact that Eᵢ appears with coefficient 1 in the substituting expression (Eᵢ + λEₖ) is what guarantees that the new system has the same solution as the original one.
Let Ā be the augmented matrix Ā = (A|b).
(If a swap is needed on Ā at step i, then the rows i and j are swapped, but only the columns 1 to i − 1 are involved.) Also, the P computed up to step i has to be multiplied on the left by Pᵢⱼ (the permutation matrix for i and j).
This can be stated as Algorithm 8.
x = A⁻¹b,

so that, taking sizes (i.e. norms, which are denoted with ‖·‖), we get

‖x‖ = ‖A⁻¹b‖.

Recall that we are trying to assess the relative displacement, not the absolute one. To this end we need to include ‖x‖ in the left hand side.
‖A‖∞ = maxᵢ Σⱼ |aᵢⱼ|,

that is, the maximum of the sums of absolute values of each row.
The following result relates the infinity norms of matrices and vectors:

LEMMA 4. The infinity norm is such that, for any vector x, ‖Ax‖∞ ≤ ‖A‖∞ ‖x‖∞, where ‖x‖∞ is the norm given by the maximum of the absolute values of the coordinates of x.
This means that, if one measures the size of a vector by its largest coordinate (in absolute value), and one calls it ‖x‖∞, then

‖δx‖∞/‖x‖∞ ≤ ‖A‖∞ ‖A⁻¹‖∞ · ‖δb‖∞/‖b‖∞.

The product ‖A‖∞ ‖A⁻¹‖∞ is called the condition number of A for the infinity norm, is denoted κ∞(A) and is a bound for the maximum possible displacement of the solution when the initial vector gets displaced. The greater the condition number, the greater (it is to be expected) the displacement of the solution when the initial condition (independent term) changes a little.
The condition number also bounds from below the relative displacement:

LEMMA 5. Let A be a nonsingular matrix of order n and x a solution of Ax = b. Let δb be a displacement of the initial conditions and δx the corresponding displacement in the solution. Then:

(1/κ∞(A)) · ‖δb‖/‖b‖ ≤ ‖δx‖/‖x‖ ≤ κ∞(A) · ‖δb‖/‖b‖,

so that the relative displacement (or error) can be bounded using the relative residue (the number ‖δb‖/‖b‖).
The following example shows how large condition numbers are usually an indicator that solutions may be strongly dependent on the initial values.

EXAMPLE 17. Consider the system

(5)  0.853x + 0.667y = 0.169
     0.333x + 0.266y = 0.067

whose condition number for the infinity norm is 376.59, so that a relative change of a thousandth of a unit in the initial conditions (vector b) may be expected to give rise to a relative change of more than 37% in the solution. The exact solution of (5) is x = 0.055+, y = 0.182+.
However, the size of the condition number is a tell-tale sign that a small perturbation of the system will modify the solutions greatly. If, instead of b = (0.169, 0.067), one uses b̃ = (0.167, 0.067) (which is a relative displacement of just 1.1% in the first coordinate), the new solution is x = −0.0557+, y = 0.321+, for which x has not even the same sign as in the original problem and y is displaced 76% from its original value. This is clearly unacceptable. If the equations describe a static system, for example, and the coefficients have been measured with only this precision, the computed solution cannot be trusted at all.
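A quick Octave check of this example (the session is ours, not the notes'):

A = [0.853 0.667; 0.333 0.266];
cond(A, inf)          % approx. 376.59: the condition number used above
A \ [0.169; 0.067]    % approx. (0.0554, 0.1826)
A \ [0.167; 0.067]    % approx. (-0.0558, 0.3217): note the sign change in x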
Writing A = N − P for a suitable splitting, the system Ax = b becomes

(N − P)x = b ⟺ Nx = b + Px ⟺ x = N⁻¹b + N⁻¹Px.

If one calls c = N⁻¹b and M = N⁻¹P, then one obtains the following fixed point problem:

x = Mx + c,

which can be solved (if at all) in the very same way as in Chapter 2: start with a seed x₀ and iterate

xₙ = Mxₙ₋₁ + c

until a sufficient precision is reached.
In what follows, the infinity norm ‖·‖∞ is assumed whenever the concept of convergence appears.
One needs the following results:

THEOREM 4. Assume M is a matrix of order n and that ‖M‖∞ < 1. Then the equation x = Mx + c has a unique solution for any c, and the iteration xₙ = Mxₙ₋₁ + c converges to it for any initial value x₀.
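A minimal sketch of this iteration in Octave (the function name and stopping criterion are ours; we assume ‖M‖∞ < 1, as in Theorem 4):

function x = iterlinear(M, c, x0, epsilon, N)
  x = x0;
  for k = 1:N
    xnew = M*x + c;
    if (norm(xnew - x, inf) < epsilon)  % iterates have stabilized
      x = xnew;
      return
    end
    x = xnew;
  end
  warning("Tolerance not reached.");
end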
% LUP factorization (the listing in the notes starts mid-function;
% the header and the dimension check are reconstructed):
function [L, At, P, bt] = LUP(A, b)
n = length(b);
if(size(A,1) != n || size(A,2) != n)
  warning("Dimension mismatch");
  return;
end
At=A;
bt=b;
L=eye(n);
P=eye(n);
i=1;
while (i<n)
  j=i+1;
  % beware nomenclature:
  % L(j,i) is ROW j, COLUMN i
  % the pivot with greatest absolute value is sought
  p = abs(At(i,i));
  pos = i;
  for c=j:n
    u = abs(At(c,i));
    if(u>p)
      pos = c;
      p = u;
    end
  end
  if(p == 0)  % was u == 0: the selected pivot is p, not u
    warning("Singular system");
    return;
  end
  % Swap rows i and pos if i != pos
  % in At and swap left part of L
  % This is quite easy in Matlab, there is no need
  % for temporal storage
  P([i pos],:) = P([pos i], :);
  if(i != pos)  % was i = pos, an assignment
    At([i pos], :) = At([pos i], :);
    L([i pos], 1:i-1) = L([pos i], 1:i-1);
    bt([i pos]) = bt([pos i]);  % was b: the permuted vector is bt
  end
  while(j<=n)
    L(j,i)=At(j,i)/At(i,i);
    % Combining these rows is easy:
    % they are 0 up to column i
    At(j,i:n) = [0 At(j,i+1:n) - L(j,i)*At(i,i+1:n)];
    bt(j)=bt(j)-bt(i)*L(j,i);
    j=j+1;
  end
  i=i+1;
end
end
LISTING 3.2. LUP Factorization
CHAPTER 4
Interpolation
[Figure: linear (piecewise) interpolation of a data cloud.]
It is continuous. That is why it is frequently used for plotting functions (it is what Matlab does by default): if the data cloud is dense, the segments are short and corners will not be noticeable on a plot.
The main drawback of this technique is, precisely, the corners, which appear wherever the cloud of points does not correspond to a straight line. Notice also that this method (and the splines which we shall explain next) is an interpolation method only, not suitable for extrapolation: it is used to approximate values between the endpoints x₀, xₙ, never outside that interval.
[Figure: quadratic spline interpolation of the same data cloud.]

[Figure: cubic spline interpolation of the same data cloud.]
bottom 0. From these n equations, one computes all the cᵢ and, using (11) and (12), one gets all the bᵢ and dᵢ.
The above system (15), which has nonzero elements only on the diagonal and the two adjacent diagonals, is called tridiagonal. These systems are easily solved using LU factorization, and one can even compute the solution directly, solving the cᵢ in terms of the rest of the data (see the sketch below). One might as well use iterative methods but, for these very simple systems, LU factorization is fast enough.
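A minimal sketch of such a direct solution (a Thomas-style elimination; the names and the exact system layout are ours, not taken from Algorithm 9):

% Solve a tridiagonal system: low = subdiagonal (low(1) unused),
% d = diagonal, up = superdiagonal, b = right hand side.
function x = tridiag(low, d, up, b)
  n = length(d);
  for i = 2:n                      % forward elimination
    w = low(i)/d(i-1);
    d(i) = d(i) - w*up(i-1);
    b(i) = b(i) - w*b(i-1);
  end
  x = zeros(n, 1);
  x(n) = b(n)/d(n);
  for i = n-1:-1:1                 % back substitution
    x(i) = (b(i) - up(i)*x(i+1))/d(i);
  end
end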
3.2. The Algorithm. We can now state the algorithm for computing the interpolating cubic spline for a data list x, y of length n + 1, x = (x₀, . . . , xₙ), y = (y₀, . . . , yₙ), in which xᵢ < xᵢ₊₁ (so that all the values in x are different). This is Algorithm 9.
3.3. Bounding the Error. The fact that the cubic spline is graphically satisfactory does not mean that it is technically useful. As a matter of fact, it is much more useful than it might seem. If a function is well behaved up to the fourth derivative, then the cubic spline is a very good approximation to it (and the smaller the intervals, the better the approximation). Specifically, for clamped cubic splines, we have:

THEOREM 6. Let f : [a, b] → R be a 4 times differentiable function with |f⁽⁴⁾(x)| ≤ M for x ∈ [a, b]. Let h be the maximum of xᵢ − xᵢ₋₁ for i = 1, . . . , n. If s(x) is the clamped cubic spline for (xᵢ, f(xᵢ)), then

|s(x) − f(x)| ≤ (5M/384) h⁴.
This result can be most useful for computing integrals and bounding their error, or for bounding the error when interpolating values of solutions of differential equations. Notice that the clamped cubic spline for a function f is such that s′(x₀) = f′(x₀) and s′(xₙ) = f′(xₙ), that is, the first derivative at the endpoints is given by the first derivative of the interpolated function.
Notice that this implies, for example, that if h = 0.1 and M < 60 (which is rather common: a fourth derivative greater than 60 is huge), then the distance at any point between the original function f and the interpolating cubic spline is less than 10⁻⁴.
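A hedged numerical check in Octave (using the built-in spline, which returns the clamped spline when end slopes are supplied; the test function is our choice):

f  = @(x) sin(x); fp = @(x) cos(x);    % |f''''(x)| <= 1, so take M = 1
xs = linspace(0, pi, 32);              % h = pi/31, roughly 0.1
pp = spline(xs, [fp(xs(1)), f(xs), fp(xs(end))]);
t  = linspace(0, pi, 1001);
max(abs(ppval(pp, t) - f(t)))          % stays below 5*M*(pi/31)^4/384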
3.4. General Definition of Spline. We promised to give the general definition of spline:

DEFINITION 15. Given a data list as in (6), an interpolating spline of degree m for it (for m > 0) is a function f : [x₀, xₙ] → R such that
(16)  pᵢ(x) = ∏_{j≠i} (x − xⱼ) / ∏_{j≠i} (xᵢ − xⱼ).

These polynomials p₀(x), . . . , pₙ(x) are called the Lagrange basis polynomials (there are n + 1 of them, one for each i = 0, . . . , n). The collection {p₀(x), . . . , pₙ(x)} can be viewed as a basis of the vector space of polynomials of degree at most n, which is isomorphic to Rⁿ⁺¹. From this point of view, the interpolating polynomial P(x) is just the vector whose coordinates in this basis are (y₀, y₁, . . . , yₙ).
[Figure 4: the cubic spline and the Lagrange interpolating polynomial for the same data cloud.]
One verifies easily that this P(x) passes through all the points (xᵢ, yᵢ) for i = 0, . . . , n.
The fact that there is only one polynomial of degree at most n satisfying that condition can be proved as follows: if there existed another such polynomial Q(x) passing through all those points, the difference P(x) − Q(x) would be a polynomial of degree at most n with n + 1 zeros and hence would be identically 0, which implies that Q(x) equals P(x).
Compare the cubic spline interpolation with the Lagrange interpolating polynomial for the same function as before in Figure 4.
The main drawbacks of Lagrange's interpolating polynomial are:
Very small denominators may appear, which may give rise to (large) rounding errors.
It is too twisted (it tends to oscillate between the nodes).
[Figure: the Lagrange interpolating polynomial for f(x) = 1/(1 + 12x²) on [−1, 1].]
R(x) = (x − x₀)(x − x₁) · · · (x − xₙ),
[Figure: the same example, f(x) = 1/(1 + 12x²) and its Lagrange interpolating polynomial on [−1, 1].]
The least squares linear interpolation problem consists in finding, given the cloud of points (xᵢ, yᵢ) and a family of functions, the function among those in the family which minimizes the total quadratic error. This family is assumed to be a finite-dimensional vector space (whence the term linear interpolation).
We shall give an example of a nonlinear interpolation problem, just to show the difficulties inherent in the lack of vector space structure.
Writing

ȳⱼ = Σ_{i=1}^{N} fⱼ(xᵢ) yᵢ

and stating the equations in matrix form, one gets the system

(18)
\[
\begin{pmatrix}
f_1(x_1) & f_1(x_2) & \cdots & f_1(x_N)\\
f_2(x_1) & f_2(x_2) & \cdots & f_2(x_N)\\
\vdots   & \vdots   & \ddots & \vdots\\
f_n(x_1) & f_n(x_2) & \cdots & f_n(x_N)
\end{pmatrix}
\begin{pmatrix}
f_1(x_1) & f_2(x_1) & \cdots & f_n(x_1)\\
f_1(x_2) & f_2(x_2) & \cdots & f_n(x_2)\\
\vdots   & \vdots   & \ddots & \vdots\\
f_1(x_N) & f_2(x_N) & \cdots & f_n(x_N)
\end{pmatrix}
\begin{pmatrix}
a_1\\ a_2\\ \vdots\\ a_n
\end{pmatrix}
=
\begin{pmatrix}
f_1(x_1) & f_1(x_2) & \cdots & f_1(x_N)\\
f_2(x_1) & f_2(x_2) & \cdots & f_2(x_N)\\
\vdots   & \vdots   & \ddots & \vdots\\
f_n(x_1) & f_n(x_2) & \cdots & f_n(x_N)
\end{pmatrix}
\begin{pmatrix}
y_1\\ y_2\\ \vdots\\ y_N
\end{pmatrix}
=
\begin{pmatrix}
\bar y_1\\ \bar y_2\\ \vdots\\ \bar y_n
\end{pmatrix},
\]
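A small Octave sketch solving (18) for a concrete family and data (everything here, including the choice of the family {1, x, x²} and the data, is ours):

xs = [-2 -1 0 1 2];                % nodes x_1, ..., x_N
ys = [ 3  1  0  1  3];             % data y_1, ..., y_N (made up)
F  = [ones(size(xs)); xs; xs.^2];  % F(j,i) = f_j(x_i)
a  = (F*F') \ (F*ys')              % coefficients a_1, a_2, a_3 of (18)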
[Figure: a data cloud, the function 3e^{2x} and the least squares linear interpolation, on [−2, 2].]
What is at stake here is, essentially, the fact that log(x) is not a linear function of x, that is: log(a + b) ≠ log(a) + log(b). Hence, performing a linear computation on the logarithms and then computing the exponential is not the same as performing the linear operation on the initial data.
CHAPTER 5

Numerical Differentiation and Integration

1. Numerical Differentiation
Sometimes (for example, when approximating the solution of a differential equation) one has to approximate the value of the derivative of a function at a point. A symbolic approach may be unavailable (either because of the software or because of its computational cost) and a numerical recipe may be required. Formulas for approximating the derivative of a function at a point are available (and also for higher order derivatives) but one also needs, in general, bounds for the error incurred. In these notes we shall only show the symmetric rule and explain why the naive approximation is suboptimal.
1.1. The Symmetric Formula for the Derivative. The first (simple) idea for approximating the derivative of f at x is to use the definition of derivative as a limit, that is, the following formula:

(20)  f′(x) ≃ (f(x + h) − f(x))/h,

where h is a small increment of the x variable.
However, the very expression in formula (20) shows its weakness: should one take h positive or negative? This is not irrelevant. Assume f(x) = 1/x and try to compute its derivative at x = 2. We shall take |h| = .01. Obviously f′(2) = −0.25. Using the naive approximation one has, for h > 0:

(f(x + .01) − f(x))/.01 = (1/2.01 − 1/2)/.01 = −0.248756+
[Figure: the symmetric formula uses the values f(x₀ − h) and f(x₀ + h), on both sides of x₀.]

The symmetric formula uses both sides of x:

f′(x) ≃ (f(x + h) − f(x − h))/(2h),

whose error behaves like h² instead of like the h of formula (20).
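A quick Octave comparison of both formulas on the example above (the session is ours):

f = @(x) 1./x; h = 0.01; x = 2;
(f(x + h) - f(x))/h           % -0.24875..., one-sided: error of order h
(f(x + h) - f(x - h))/(2*h)   % -0.2500062..., symmetric: error of order h^2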
[Figure: the midpoint rule: the integral of f on an interval is approximated by the area of the rectangle whose height is the value of f at the midpoint.]
2.1. The Midpoint Rule. A coarse but quite natural way to approximate an integral is to multiply the value of the function at the midpoint by the width of the interval. This is the midpoint formula:

DEFINITION 18. The midpoint quadrature formula corresponds to x₁ = (a + b)/2 and a₁ = (b − a). That is, the approximation

∫ₐᵇ f(x) dx ≃ (b − a) f((a + b)/2),

given by the area of the rectangle having a horizontal side at height f((a + b)/2).

One checks easily that the midpoint rule is of order 1: it is exact for linear polynomials but not for quadratic ones.
2.3. Simpson's Rule. The next natural step involves 3 points instead of 2, and using a parabola instead of a straight line. This method is remarkably precise (it has order 3) and is widely used. It is called Simpson's Rule.

DEFINITION 20. Simpson's rule is the quadrature formula corresponding to the nodes x₁ = a, x₂ = (a + b)/2 and x₃ = b, and the weights corresponding to the correct interpolation of a degree 2 polynomial. That is¹, a₁ = (b − a)/6, a₂ = 4(b − a)/6 and a₃ = (b − a)/6. Hence, it is the approximation of the integral of f by

∫ₐᵇ f(x) dx ≃ ((b − a)/6) (f(a) + 4 f((a + b)/2) + f(b)),

which is a weighted mean of the areas of three intermediate rectangles. This rule must be memorized: one sixth of the length times the values of the function at the endpoints and midpoint, with weights 1, 4, 1.

[Figure: Simpson's rule interpolates f by a parabola through the endpoints and the midpoint.]
The remarkable property of Simpson's rule is that it has order 3: even though a parabola is used, the rule correctly integrates polynomials of degree up to 3. Notice that one does not need to know the equation of the parabola: the values of f and the weights (1/6, 4/6, 1/6) are enough.
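This can be checked numerically; a short Octave session (ours) on [0, 1]:

simpson = @(f, a, b) (b - a)/6 * (f(a) + 4*f((a + b)/2) + f(b));
simpson(@(x) x.^3, 0, 1)   % returns 0.25, the exact value of the integral
simpson(@(x) x.^4, 0, 1)   % returns 0.20833..., while the exact value is 0.2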
2.4. Composite Formulas. Composite quadrature formulas are nothing more than the application of the simple ones on subintervals. For example, instead of using the trapezoidal rule for approximating an integral on the whole of [a, b], like

∫ₐᵇ f(x) dx ≃ ((b − a)/2) (f(a) + f(b)),

one subdivides [a, b] into subintervals and performs the approximation on each of these.
¹This is an easy computation which the reader should perform by himself.
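A minimal composite-Simpson sketch in Octave (the function name and interface are ours):

% Apply Simpson's rule on each of m equal subintervals of [a, b].
function q = compositesimpson(f, a, b, m)
  x = linspace(a, b, m + 1);
  q = 0;
  for i = 1:m
    u = x(i); v = x(i+1);
    q = q + (v - u)/6 * (f(u) + 4*f((u + v)/2) + f(v));
  end
end
% Example: compositesimpson(@(x) 1./x, 1, 2, 10) approximates log(2).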
[Figures: composite quadrature rules on [2, 5], with the approximation improving as the subdivision gets finer.]
CHAPTER 6

Differential Equations

1. Introduction
A differential equation is a special kind of equation: one in which one of the unknowns is a function. We have already studied some: any integral is a differential equation (it is the simplest kind). For example,

y′ = x

is an equation in which one seeks a function y(x) whose derivative is x. It is well known that the solution is not unique: there is an integration constant and the general solution is written as

y = x²/2 + C.
This integration constant can be better understood graphically. When computing the primitive of a function, one is trying to find a function whose derivative is known. A function can be thought of as a graph on the X, Y plane. The integration constant specifies at which height the graph is. This does not change the derivative, obviously. On the other hand, if one is given the specific problem

y′ = x, y(3) = 28,

then one is trying to find a function whose derivative is x, with a condition at a point: that its value at 3 be 28. Once this value is fixed, there is only one graph having that shape and passing through (3, 28). The condition y(3) = 28 is called an initial condition. One imposes that the graph of f passes through a point and then there is only one f solving the integration problem, which means there is only one suitable constant C. As a matter of fact, C can be computed by substitution:

28 = 3²/2 + C ⟹ C = 47/2.
The same idea gives rise to the term initial condition for a differential equation.
Consider the equation

y′ = y

(whose general solution should be known). This equation means: find a function y(x) whose derivative is equal to the same function y(x) at every point. One tends to think of the solution as y(x) = eˣ but. . . is this the only possible solution? A geometrical approach may be more useful; the equation asks for a function y(x) whose derivative is equal to the height y(x) at each point. From this point of view it seems obvious that there must be more than one solution to the problem: at each point one should be able to draw the corresponding tangent, move a little to the right and do the same. There is nothing special about the points (x, eˣ) for them to give the only solution to the problem. Certainly, the general solution to the equation is

y(x) = Ceˣ,

where C is an integration constant. If one also specifies an initial condition, say y(x₀) = y₀, then necessarily

y₀ = Ce^{x₀}, so that C = y₀e^{−x₀}.
2. The Basics
The first definition is that of differential equation:

DEFINITION 23. An ordinary differential equation is an equality A = B in which the only unknown is a function of one variable whose derivative of some order appears explicitly.

The adjective ordinary refers to the condition that the unknown be a function of a single variable (there are no partial derivatives).
EXAMPLE 18. We have shown some examples above. Differential equations can take many forms:

y′ = sin(x)
xy = y′ − 1
(y′)² − 2y″ + x²y = 0
(y′/y) − xy = cos(y)

etc.
In this chapter, the unknown in the equation will always be denoted by the letter y. The variable on which it depends will usually be either x or t.

DEFINITION 24. A differential equation is of order n if n is the highest order of a derivative of y appearing in it.

The specific kind of equations we shall study in this chapter are the solved ones (which does not mean that they are already solved, but that they are written in a specific way):

y′ = f(x, y).

DEFINITION 25. An initial value problem is a differential equation together with an initial condition of the form y(x₀) = y₀, where x₀, y₀ ∈ R.

DEFINITION 26. The general solution to a differential equation E is a family of functions f(x, c), where c is one (or several) constants, such that:
Any solution of E has the form f(x, c) for some c.
Any expression f(x, c) is a solution of E, except possibly for a finite number of values of c.
3. Discretization
We shall assume a two-variable function f(x, y) is given, which is defined on a region x ∈ [x₀, xₙ], y ∈ [a, b], and which satisfies the following condition (which the reader is encouraged to forget):

DEFINITION 27. A function f(x, y) defined on a set X ⊂ R² satisfies Lipschitz's condition if there exists K > 0 such that

|f(x₁) − f(x₂)| ≤ K |x₁ − x₂|

for any x₁, x₂ ∈ X, where | · | denotes the absolute value of a number.

This is a kind of strong continuity condition (i.e. it is easier for a function to be continuous than to be Lipschitz). What matters is that this condition has a very important consequence for differential equations: it guarantees the uniqueness of the solution. Let X be a set [x₀, xₙ] × [a, b] (a strip, or a rectangle) and f(x, y) : X → R a function on X which satisfies Lipschitz's condition. Then

THEOREM 9 (Cauchy-Kovalevsky). Under the conditions above, any differential equation y′ = f(x, y) with an initial condition y(x₀) = y₀ for y₀ ∈ (a, b) has a unique solution y = y(x) defined on [x₀, x₀ + t] for some t ∈ R greater than 0.

Lipschitz's condition is not so strange. As a matter of fact, polynomials and all the analytic functions (exponentials, logarithms, trigonometric functions, etc. . . ) and their inverses (where they are defined and continuous) satisfy it. An example which does not is f(x) = √x on an interval containing 0, because f has a vertical tangent line at that point. The reader should not worry about this condition (only if he sees a derivative becoming infinite or a point of discontinuity, but we shall not discuss these in these notes). We give just an example:
3.1. The derivative as an arrow. One is usually told that the derivative of a function of a real variable is the slope of its graph at the corresponding point. However, a more useful idea for the present chapter is to think of it as the Y coordinate of the velocity vector of the graph.
When plotting a function, one should imagine that one is drawing a curve with constant horizontal speed (because the OX axis is homogeneous, one goes from left to right at uniform speed). This way, the graph of f(x) is actually the plane curve (x, f(x)). Its tangent vector at any point is (1, f′(x)): the derivative f′(x) of f(x) is the vertical component of this vector.
From this point of view, a differential equation in solved form y′ = f(x, y) can be interpreted as the statement "find a curve (x, y(x)) such that the velocity vector at each point is (1, f(x, y))". One can then draw the family of velocity vectors on the plane (x, y) given by (1, f(x, y)) (the function f(x, y) is known, remember). This visualization, like in Figure 1, already gives an idea of the shape of the solution.
Given the arrows (the vectors (1, f(x, y))) on a plane, drawing a curve whose tangents are those arrows should not be too hard. Even more, if what one needs is just an approximation, instead of drawing a curve, one could draw little segments going in the direction of the arrows. If these segments have a very small x increment, one reckons that an approximation to the solution will be obtained.
This is exactly Euler's idea.
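A minimal sketch of that idea in Octave (the function name is ours; the notes give their own implementations later):

% Follow the arrows (1, f(x, y)) in n steps of width h from (x0, y0).
function [x, y] = eulerode(f, x0, y0, h, n)
  x = x0 + (0:n)*h;
  y = zeros(1, n + 1);
  y(1) = y0;
  for i = 1:n
    y(i+1) = y(i) + h*f(x(i), y(i));   % move along the tangent direction
  end
end
% Example: [x, y] = eulerode(@(x, y) y, 0, 1, 0.01, 100); y(end) is near e.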
which grows both when h becomes larger and when it gets smaller. The minimum E(h) is attained around h ≃ ε, which means that there is no point in using intervals of width less than ε in the Euler method (actually, ever). Taking smaller intervals may perfectly well lead to huge errors.
This explanation about truncation and rounding errors is relevant for any method, not just Euler's. One needs to bound both for every method and know how to choose the best h. There are even problems for which a single h is not useful and it has to be modified during the execution of the algorithm. We shall not deal with these problems here (the interested reader should look for the term stiff differential equation).
which is not a primitive but looks like one. In this case, there is no way to approximate the integral using the values of f at intermediate points, because one does not know the value of y(t). But one can take a similar approach. That is:

∫_{xᵢ}^{xᵢ₊₁} f(t, y(t)) dt ≃ (xᵢ₊₁ − xᵢ) f(xᵢ, yᵢ) = h f(xᵢ, yᵢ).

If f(x, y) were independent of y, then one would be performing the following approximation (for an interval [a, b]):

∫ₐᵇ f(t) dt ≃ (b − a) f(a),

which, for lack of a better name, could be called the left endpoint rule: the integral is approximated by the area of the rectangle of height f(a) and width (b − a).
One might try (as an exercise) to solve the problem using the right endpoint: this gives rise to what are called the implicit methods, which we shall not study (but which usually perform better than the explicit ones we are going to explain).
7. Modified Euler: the Midpoint Rule
Instead of using the left endpoint of [xᵢ, xᵢ₊₁] for integrating and computing yᵢ₊₁, one might use (and this would be better, as the reader should verify) the midpoint rule somehow. As there is no way to know the intermediate values of y(t) beyond x₀, some kind of guesswork has to be done. The method goes as follows (a code sketch is given after the list):
Use a point near Pᵢ = (xᵢ, yᵢ) whose x coordinate is the midpoint of [xᵢ, xᵢ₊₁].
For lack of a better point, the first approximation is done using Euler's algorithm and one takes the point Qᵢ = (x_{i+1/2}, yᵢ + (h/2) f(xᵢ, yᵢ)), where x_{i+1/2} = xᵢ + h/2.
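A hedged sketch of one such step in Octave (the names are ours):

% One modified-Euler (midpoint) step: sample the slope at Q_i.
function ynext = midpointstep(f, x, y, h)
  k1 = f(x, y);
  k2 = f(x + h/2, y + (h/2)*k1);   % slope at the auxiliary point Q_i
  ynext = y + h*k2;
end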
[Figure: the modified Euler (midpoint) step: the slope is sampled at the midpoint xᵢ + h/2.]

[Figure: Heun's (trapezoidal) step: the slopes k₁ at xᵢ and k₂ at xᵢ₊₁ are averaged.]
[Figure: Euler and modified Euler approximations against the exact solution.]

[Figure: Euler, modified Euler and Heun approximations against the exact solution.]
CHAPTER 7

Multivariate and higher order ODEs

1. A two-variable example
Let us consider a problem in two variables. Usually (and we shall follow the convention in this chapter) the independent variable is called t, reflecting the common trait of ODEs of being equations which define a motion in terms of time.
Consider a dimensionless body B (a mass point) moving with velocity v in the plane. Hence, if the coordinates of B at time t are B(t) = (x(t), y(t)), we shall denote v = (ẋ(t), ẏ(t)). Assume we know, for whatever reason, that the velocity vector satisfies some condition F which depends on the position of B in the plane. This can be expressed as v = F(x, y). The function F is then a vector function of two variables, F(x, y) = (F₁(x, y), F₂(x, y)), and we can write the condition on v as:

(21)  ẋ(t) = F₁(x(t), y(t))
      ẏ(t) = F₂(x(t), y(t))

which means, exactly, that the x component of the velocity at time t equals the value of F₁ at the current position, and that the y component equals the value of F₂. If we want, writing B(t) = (x(t), y(t)) as above, expression (21) can be written in a compact way as

v(t) = F(B(t)),
which reflects the idea that what we are doing is just the same as in Chapter 6, only with more coordinates. This is something that has to be clear from the beginning: the only added difficulty is the number of computations to be carried out.
Notice that in the example above, F depends only on x and y, not on t. However, it might as well depend on t (because the behaviour of the system may depend on time), so that in general we should write

v(t) = F(t, B(t)).

Just for completeness: if F does not depend on t, the system is called autonomous, whereas if it does depend on t, it is called non-autonomous.
It is easy to check that x(t) = cos(t), y(t) = sin(t) verify the conditions above. That is, the solution to the initial value problem (23) is

(24)  x(t) = cos(t)
      y(t) = sin(t)

which, as the reader will have already realized, is a circular trajectory around the origin, passing through (1, 0) at time t = 0. Any other initial condition (x(0), y(0)) = (a, b) gives rise to a circular trajectory starting at (a, b).
However, our purpose, as we have stated repeatedly in these notes, is not to find a symbolic solution to any problem, but to approximate it using the tools at hand. In this specific case, which is of dimension two, one can easily describe the generalization of Euler's method of Chapter 6 to the problem under study (a sketch in Octave follows the list). Let us fix a discretization of t, say in steps of size h. Then, a rough approximation to a solution would be:
(1) We start with x₀ = 1, y₀ = 0.
(2) At that point, the differential equation means that the velocity v(t) = (ẋ(t), ẏ(t)) is

ẋ(0) = 0, ẏ(0) = 1.

(3) Because t moves in steps of size h, the trajectory (x(t), y(t)) can only be approximated by moving as much as the vector at t = 0 says, multiplied by the timespan h; hence the next point of the approximation is (x₁, y₁) = (x₀, y₀) + h(ẋ(0), ẏ(0)) = (1, h).
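A sketch of this two-variable Euler iteration in Octave (we assume the system behind (23) is ẋ = −y, ẏ = x, which the solution (24) satisfies; the names are ours):

F = @(B) [-B(2); B(1)];    % velocity field: v = F(B)
h = 0.01; n = 1000;
B = zeros(2, n + 1);
B(:, 1) = [1; 0];          % initial condition (x0, y0) = (1, 0)
for k = 1:n
  B(:, k+1) = B(:, k) + h*F(B(:, k));  % same formula as in one variable
end
% plot(B(1,:), B(2,:)) draws an approximate circle (it slowly spirals
% outwards: Euler's error accumulates).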
in coordinates:

ẋ₁(t) = F₁(x₁(t), . . . , xₙ(t))
ẋ₂(t) = F₂(x₁(t), . . . , xₙ(t))
  ⋮
ẋₙ(t) = Fₙ(x₁(t), . . . , xₙ(t))
ẋ₁ = u₁^1, u̇₁^1 = u₁^2, . . . , u̇₁^(k−2) = u₁^(k−1),
  ⋮
ẋₙ = uₙ^1, u̇ₙ^1 = uₙ^2, . . . , u̇ₙ^(k−2) = uₙ^(k−1)
(28)  (ẍ₁, ÿ₁) = G · m₂/((x₂ − x₁)² + (y₂ − y₁)²)^{3/2} · (x₂ − x₁, y₂ − y₁)
      (ẍ₂, ÿ₂) = G · m₁/((x₂ − x₁)² + (y₂ − y₁)²)^{3/2} · (x₁ − x₂, y₁ − y₂)
  %plot(x1, y1, "r", x2, y2, "b");
  %axis([-2 2 -2 2]);
  %filename = sprintf("pngs/%05d.png", k);
  %print(filename);
  %clf;
end
endfunction
% interesting values:
% r=twobody(4,400,[-1,0,1,0,0,14,0,-0.1],.01,40000); (pretty singular)
% r=twobody(1,1,[-1,0,1,0,0,0.3,0,-0.5],.01,10000); (strange -> loss of precision!)
% r=twobody(1,900,[-30,0,1,0,0,2.25,0,0],.01,10000); (like sun-earth)
% r=twobody(1, 333000, [149600,0,0,0,0,1,0,0], 100, 3650); (earth-sun)...
LISTING 7.1. An implementation of the two-body problem with G = 1 using Heun's method.
In Figure 4, a plot of a run of the function twobody for masses 4, 400 and initial conditions (−1, 0), (1, 0), (0, 14), (0, −0.1) is given, using h = 0.01 and 40000 steps. The difference in mass is what makes the trajectories so different from one another, more than the difference in initial speed. Notice that the blue particle turns elliptically around the red one.
Finally, in Figure 5, we see both the plot of the motion of two par-
ticles and, next to it, the relative motion of one of them with respect
to the other, in order to give an idea of:
The fact that trajectories of one body with respect to the
other are ellipses.
The error incurred in the approximation (if the solution were
exact, the plot on the right would be a closed ellipse, not the
open trajectory shown).