CT2 Notes - All Chapters
© Department of Engineering Science, University of Auckland, New Zealand. All rights reserved.
Contents
1 Linear Equations
  1.1 Revision
    1.1.1 System of Equations
    1.1.2 LU Factorisation
    1.1.3 Computational Issues
    1.1.4 Partial Pivoting
    1.1.5 Total Pivoting
    1.1.6 Applications
  1.2 Solution of Linear Systems
    1.2.1 Matrix Inversion
    1.2.2 Gaussian Elimination
  1.3 LU Factorisation Methods
    1.3.1 Crout Factorisation
    1.3.2 Cholesky Factorisation
    1.3.3 Iterative Improvement
  1.4 Iterative Solution Methods
    1.4.1 Jacobi Iterative Method
    1.4.2 Gauss-Seidel Iterative Method
3 Nonlinear Equations
  3.1 Revision
    3.1.1 The Problem of Nonlinear Root-finding
    3.1.2 Rate of Convergence
    3.1.3 Termination Criteria
    3.1.4 Bisection Method
    3.1.5 Secant Method
    3.1.6 Regula Falsi
    3.1.7 Newton's Method
    3.1.8 Examples
  3.2 Combining Methods
  3.3 Laguerre's Method
    3.3.1 Example
    3.3.2 Horner's method for evaluating a polynomial and its derivatives
    3.3.3 Example
    3.3.4 Deflation
  3.4 Systems of Nonlinear Equations
    3.4.1 Example 1
    3.4.2 Example 2
4 Univariate Minimisation
  4.1 The Problem of Univariate Minimisation
  4.2 Non-derivative Methods
    4.2.1 Definitions
    4.2.2 Uniform Search
    4.2.3 Dichotomous Search
    4.2.4 Fibonacci Search
    4.2.5 Golden Section Search
    4.2.6 Brent's Method
  4.3 Derivative Methods
    4.3.1 Bisection Search Method
    4.3.2 Newton's Method
6 Eigenproblems
  6.1 Eigenvalues and Eigenvectors
    6.1.1 Finding eigenvectors and eigenvalues
  6.2 Power method
    6.2.1 An example of the power method
    6.2.2 Shifting
    6.2.3 Deflation
  6.3 Inverse Iteration
    6.3.1 An example of inverse iteration
  6.4 Similarity Transformations
  6.5 Applications
    6.5.1 Uncoupling systems of linear equations
    6.5.2 Solving a system of ODEs
    6.5.3 Powers of a matrix
Chapter 1
Linear Equations
1.1 Revision
1.1.1 System of Equations
One of the most common computational problems for engineers is to solve a set of simultaneous
equations (e.g., the systems that arise in finite difference and finite element methods). In many
cases it may be necessary to solve very large systems of equations (e.g., in aircraft design). The
accurate and efficient solution of linear systems of equations is therefore a very important
engineering problem.

A linear system of m equations in n unknowns (x1, ..., xn) can be written in the form Ax = b,
where A is an m × n matrix and x and b are column vectors. For example, the system of four
equations in four unknowns

 2x1 + 3x2 − 4x3 + 2x4 =  4
−4x1 − 5x2 + 6x3 − 3x4 = −8
 2x1 + 2x2 +  x3       =  9
−6x1 − 7x2 + 14x3 − 4x4 =  6

can be written as Ax = b with the matrix A and right-hand side b used in the examples that
follow.
1.1.2 LU Factorisation
The standard steps of the LU factorisation approach are as follows:

1. Factorise A = LU, where L is unit lower-triangular and U is upper-triangular.
2. Solve Ly = b by forward substitution.
3. Solve Ux = y by back substitution.
Doolittle Factorisation
Perhaps the most commonly used LU factorisation method is a variation on Gaussian elimination
and is called Doolittle factorisation. In this method Gaussian elimination is used to determine U ,
and L is then constructed from the subtraction multipliers used in the factorisation process.
For example to factorise the matrix in Section 1.1.1
2 3 −4 2 Pivot Row = r1
−4 −5 6 −3 r2 − (−2)r1 l21 = −2
A=
2 2 1 0 r3 − r1 l31 = 1
−6 −7 14 −4 r4 − (−3)r1 l41 = −3
2 3 −4 2 Pivot Row = r2
0 1 −2 1
−→
0 −1 5 −2 r3 − (−1)r2 l32 = −1
0 2 2 2 r4 − 2r2 l42 = 2
2 3 −4 2 Pivot Row = r3
0 1 −2 1
−→ 0 0 3 −1
0 0 6 0 r4 − 2r3 l43 = 2
2 3 −4 2
0 1 −2 1
−→ 0 0 3 −1 = U
0 0 0 2
The L matrix is a lower-triangular matrix storing the pivot operations
1 0 0 0 1 0 0 0
l21 1 0 0 −2 1 0 0
L= l31 l32 1 0 = 1 −1 1 0
(1.4)
l41 l42 l43 1 −3 2 2 1
Notes:
1. The pivots are stored on the diagonal of U .
2. The method fails when we encounter a zero pivot, in which case we need to reorder the
rows of the matrix. If all the values in the column at and below the pivot position are zero,
the matrix is singular and a solution cannot be found with the LU factorisation method.
Once the factorisation has been found, the solution of the system of equations is a simple matter
of forward and back substitution. For example, consider the right-hand side vector given in
Section 1.1.1. First we solve Ly = b for y by forward substitution:

[  1   0   0   0 ] [ 4 ]   [  4 ]
[ -2   1   0   0 ] [ 0 ] = [ -8 ]        giving y = [4, 0, 5, 8]ᵀ
[  1  -1   1   0 ] [ 5 ]   [  9 ]
[ -3   2   2   1 ] [ 8 ]   [  6 ]

Next, we find x = A⁻¹b by solving Ux = y with back substitution:

[ 2   3  -4   2 ] [ 1 ]   [ 4 ]
[ 0   1  -2   1 ] [ 2 ] = [ 0 ]        giving x = [1, 2, 3, 4]ᵀ        (1.5)
[ 0   0   3  -1 ] [ 3 ]   [ 5 ]
[ 0   0   0   2 ] [ 4 ]   [ 8 ]
In practice, as the Doolittle process proceeds, the elements of L are stored below the diagonal
of the A matrix and the elements of U are stored on and above the diagonal:

     [ u11  u12  ...        u1n ]
A →  [ l21  u22  ...        u2n ]        (1.6)
     [ ...  ...  ...        ... ]
     [ ln1  ...  ln,(n−1)   unn ]
For the example considered above,

     [  2   3  -4   2 ]
A →  [ -2   1  -2   1 ]        (1.7)
     [  1  -1   3  -1 ]
     [ -3   2   2   2 ]
This improves the computational efficiency of the LU method as it is only necessary to store one
matrix.
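The compact Doolittle process maps directly to code. Below is a minimal Python sketch
(illustrative, not from the original notes; the helper names doolittle_lu and lu_solve are ours)
that factorises the example matrix in place and then solves the system by forward and back
substitution:

import numpy as np

def doolittle_lu(A):
    # Doolittle LU factorisation without pivoting: returns a single matrix
    # with U on and above the diagonal and the multipliers of L below it.
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 1):
        if A[k, k] == 0.0:
            raise ZeroDivisionError("zero pivot; rows must be reordered")
        for i in range(k + 1, n):
            A[i, k] /= A[k, k]                     # multiplier l_ik stored in place
            A[i, k+1:] -= A[i, k] * A[k, k+1:]     # eliminate row i
    return A

def lu_solve(LU, b):
    # forward substitution with the unit lower part, then back substitution
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - LU[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - LU[i, i+1:] @ x[i+1:]) / LU[i, i]
    return x

A = np.array([[2, 3, -4, 2], [-4, -5, 6, -3], [2, 2, 1, 0], [-6, -7, 14, -4]])
b = np.array([4.0, -8.0, 9.0, 6.0])
print(lu_solve(doolittle_lu(A), b))   # [1. 2. 3. 4.]

Running this prints [1. 2. 3. 4.], matching the worked example above.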
1.1.3 Computational Issues

Consider solving the system

0.0004 x1 + 1.402 x2 = 1.406
0.4003 x1 − 1.502 x2 = 2.501

using four decimal floating point arithmetic. The correct solution is x1 = 10 and x2 = 1. Using
standard LU, we would pivot on the first column, giving l21 = 0.4003/0.0004 = 1001. The
equation for x2 is therefore

−1405 x2 = −1404,
which yields x2 = 0.9993 and x1 = (1.406 − 1.402x2 )/0.0004 = 0.005/0.0004 = 12.5. This
problem arises due to the pivot being small in relation to the other members of the column. A
computer inevitably introduces some rounding error into calculations which can lead, in cases
such as above, to substantial solution error.
1.1.4 Partial Pivoting

This potential for numerical error can be reduced by using partial pivoting. Instead of always
using the first available row as the pivot, we swap rows of A before each elimination step to
ensure that the pivot has a larger magnitude than any of the coefficients in the column below it.
Such an approach avoids problems with zero or very small pivots and hence improves
computational accuracy. The cost associated with partial pivoting is that the pivoting strategy
must be recorded in a vector ρ so that the same row swaps can later be applied to right-hand
side vectors. An example of partial pivoting follows.
Example:

    [  2   3  -4   2 ]    The largest magnitude in the first column is −6
A = [ -4  -5   6  -3 ]    ⇒ swap r4 with r1
    [  2   2   1   0 ]    ⇒ ρ1 = 4
    [ -6  -7  14  -4 ]

    [ -6  -7  14  -4 ]
 →  [ -4  -5   6  -3 ]    r2 − (2/3)r1  : l21 = 2/3
    [  2   2   1   0 ]    r3 − (−1/3)r1 : l31 = −1/3
    [  2   3  -4   2 ]    r4 − (−1/3)r1 : l41 = −1/3

    [ -6    -7     14    -4   ]    The largest magnitude in the second column is 2/3
 →  [ 2/3   -1/3  -10/3  -1/3 ]    ⇒ swap r2 with r4
    [ -1/3  -1/3   17/3  -4/3 ]    ⇒ ρ2 = 4
    [ -1/3   2/3    2/3   2/3 ]

(The stored multipliers lij, written in the first columns, are carried along with their rows
when rows are swapped.)

    [ -6    -7     14    -4   ]
 →  [ -1/3   2/3    2/3   2/3 ]
    [ -1/3  -1/3   17/3  -4/3 ]    r3 − (−1/2)r2 : l32 = −1/2
    [ 2/3   -1/3  -10/3  -1/3 ]    r4 − (−1/2)r2 : l42 = −1/2

    [ -6    -7    14   -4  ]    The largest magnitude in the third column is 6,
 →  [ -1/3   2/3   2/3  2/3 ]    already in the pivot position
    [ -1/3  -1/2   6   -1   ]    ⇒ ρ3 = 3
    [ 2/3   -1/2  -3    0   ]    r4 − (−1/2)r3 : l43 = −1/2

    [ -6    -7    14    -4   ]
 →  [ -1/3   2/3   2/3   2/3 ]    ⇒ ρ4 = 4
    [ -1/3  -1/2   6    -1   ]
    [ 2/3   -1/2  -1/2  -1/2 ]

Therefore

    [  1     0     0    0 ]        [ -6  -7   14   -4   ]        [ 4 ]
L = [ -1/3   1     0    0 ]    U = [  0  2/3  2/3  2/3  ]    ρ = [ 4 ]        (1.9)
    [ -1/3  -1/2   1    0 ]        [  0  0    6   -1    ]        [ 3 ]
    [ 2/3   -1/2  -1/2  1 ]        [  0  0    0   -1/2  ]        [ 4 ]
To keep the system consistent we need to perform the same pivot operations on b to get b′:

    [  4 ]           [  6 ]           [  6 ]           [  6 ]
b = [ -8 ]     →     [ -8 ]     →     [  4 ]     →     [  4 ]  = b′        (1.10)
    [  9 ]  ρ1 = 4   [  9 ]  ρ2 = 4   [  9 ]  ρ3 = 3   [  9 ]
    [  6 ]           [  4 ]           [ -8 ]           [ -8 ]

Alternatively the pivot operations can be stored using a permutation matrix P. Initially P is
the identity; whenever a pivot operation is performed, the corresponding rows of P are also
swapped. b′ can then be determined from b′ = Pb. For the problem considered here

    [ 0 0 0 1 ]
P = [ 1 0 0 0 ]        (1.11)
    [ 0 0 1 0 ]
    [ 0 1 0 0 ]

and

          [ 0 0 0 1 ] [  4 ]   [  6 ]
b′ = Pb = [ 1 0 0 0 ] [ -8 ] = [  4 ]        (1.12)
          [ 0 0 1 0 ] [  9 ]   [  9 ]
          [ 0 1 0 0 ] [  6 ]   [ -8 ]
The solution process thus consists of the following steps:

1. Apply the recorded row swaps to the right-hand side: b′ = Pb.
2. Solve Ly = b′ by forward substitution.
3. Solve Ux = y by back substitution.

For example,
          [  1     0     0    0 ] [  6 ]   [  6 ]
Ly = b′:  [ -1/3   1     0    0 ] [  6 ] = [  4 ]        (1.13)
          [ -1/3  -1/2   1    0 ] [ 14 ]   [  9 ]
          [ 2/3   -1/2  -1/2  1 ] [ -2 ]   [ -8 ]

giving y = [6, 6, 14, −2]ᵀ, and

          [ -6  -7   14   -4   ] [ 1 ]   [  6 ]
Ux = y:   [  0  2/3  2/3  2/3  ] [ 2 ] = [  6 ]        (1.14)
          [  0  0    6   -1    ] [ 3 ]   [ 14 ]
          [  0  0    0   -1/2  ] [ 4 ]   [ -2 ]

giving the solution x = [1, 2, 3, 4]ᵀ.
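A Python sketch of LU factorisation with partial pivoting follows (illustrative;
lu_partial_pivot and solve are hypothetical helper names, not from the notes). It reproduces
the pivot vector ρ = [4, 4, 3, 4] and the solution x = [1, 2, 3, 4]ᵀ of the worked example:

import numpy as np

def lu_partial_pivot(A):
    # compact-storage LU with partial pivoting; rho records the row chosen
    # as pivot at each step, as in the worked example above
    LU = A.astype(float).copy()
    n = LU.shape[0]
    rho = np.zeros(n, dtype=int)
    for k in range(n):
        p = k + np.argmax(np.abs(LU[k:, k]))   # row of largest magnitude in column k
        rho[k] = p
        LU[[k, p]] = LU[[p, k]]                # swap rows (stored multipliers move too)
        for i in range(k + 1, n):
            LU[i, k] /= LU[k, k]
            LU[i, k+1:] -= LU[i, k] * LU[k, k+1:]
    return LU, rho

def solve(LU, rho, b):
    bp = b.astype(float).copy()
    n = len(bp)
    for k in range(n):                         # apply the stored swaps to b
        bp[[k, rho[k]]] = bp[[rho[k], k]]
    for i in range(n):                         # forward substitution
        bp[i] -= LU[i, :i] @ bp[:i]
    for i in range(n - 1, -1, -1):             # back substitution
        bp[i] = (bp[i] - LU[i, i+1:] @ bp[i+1:]) / LU[i, i]
    return bp

A = np.array([[2, 3, -4, 2], [-4, -5, 6, -3], [2, 2, 1, 0], [-6, -7, 14, -4]])
b = np.array([4, -8, 9, 6])
LU, rho = lu_partial_pivot(A)
print(rho + 1, solve(LU, rho, b))   # [4 4 3 4] (1-based), x = [1. 2. 3. 4.]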
1.1.6 Applications
Calculating an Inverse Matrix
As mentioned before, calculating the inverse of a matrix directly is a computationally expensive
process. We can use LU factorisation to find the inverse of a matrix with a considerable
computational saving over the direct method. Consider the observation that AA⁻¹ = I. This
implies

A [x1  x2  x3  x4] = I

or

Ax1 = [1, 0, 0, 0]ᵀ,  Ax2 = [0, 1, 0, 0]ᵀ,  Ax3 = [0, 0, 1, 0]ᵀ,  Ax4 = [0, 0, 0, 1]ᵀ,

where x1, x2, x3, x4 are the columns of A⁻¹.

Therefore the inverse matrix A⁻¹ can be calculated easily from an LU factorisation.
Algorithm

1. Factorise PA = LU (P is the permutation matrix).
2. Solve Ly_i = P e_i for each column e_i of the identity matrix → y_1, ..., y_n.
3. Solve Ux_i = y_i for each x_i → x_1, ..., x_n.
4. A⁻¹ = [x1  x2  x3  ···  xn].
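Assuming SciPy is available, this algorithm can be sketched in a few lines of Python
(illustrative only): scipy.linalg.lu_factor performs the single O(n³) factorisation and
scipy.linalg.lu_solve performs one O(n²) solve per column.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2., 3., -4., 2.],
              [-4., -5., 6., -3.],
              [2., 2., 1., 0.],
              [-6., -7., 14., -4.]])
lu, piv = lu_factor(A)                           # one factorisation
Ainv = np.column_stack([lu_solve((lu, piv), e)   # one cheap solve per identity column
                        for e in np.eye(4)])
print(np.allclose(A @ Ainv, np.eye(4)))          # True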
1.2.2 Gaussian Elimination

As an example, consider the linear system in Section 1.1.1, which in augmented form is given by

[  2   3  -4   2 |  4 ]
[ -4  -5   6  -3 | -8 ]        (1.15)
[  2   2   1   0 |  9 ]
[ -6  -7  14  -4 |  6 ]

Using row reduction we convert this augmented matrix to upper-triangular form, i.e.,

[ 2   3  -4   2 | 4 ]
[ 0   1  -2   1 | 0 ]        (1.16)
[ 0   0   3  -1 | 5 ]
[ 0   0   0   2 | 8 ]
1.3 LU Factorisation Methods

1.3.1 Crout Factorisation

The Crout factorisation writes A = L̄Ū, where the pivots are stored on the diagonal of L̄ and
Ū has a unit diagonal; storing the pivots in this way gives better control of round-off error.
L̄ can be constructed from L by multiplying the columns of L by the pivots, and Ū can be
constructed from U by dividing the rows of U by the pivots. For example,

          [  2   0   0   0 ] [ 1  3/2  -2    1   ]
A = L̄Ū = [ -4   1   0   0 ] [ 0   1   -2    1   ]        (1.17)
          [  2  -1   3   0 ] [ 0   0    1  -1/3  ]
          [ -6   2   6   2 ] [ 0   0    0    1   ]
1.3.3 Iterative Improvement

Suppose that our numerical solution of

Ax = b        (1.21)

returns x + δx, where x is the true solution and δx is the unknown error. Substituting this
solution back into our system of equations will give a slight discrepancy in the right-hand
side vector, i.e.,

A(x + δx) = b + δb,        (1.22)
where δb is termed the residual in the solution. Rearranging Equation (1.22) we obtain the
following expression for the residual:

δb = A(x + δx) − b.        (1.23)

Note that δb must be calculated in double precision, since there will be a lot of cancellation
in the subtraction of b.

Subtracting Equation (1.21) from Equation (1.22) we obtain

A δx = δb.        (1.24)

Thus we can calculate the error in our solution, δx, from the residual, δb, and hence correct
our solution to improve its accuracy by subtracting this error, i.e.,

x = (x + δx) − δx.        (1.25)

Notes:
1. We can use our previous LU factorisation of A to solve Equation (1.24) so this process can
be iterated efficiently.
2. The calculation of the residual, δb, requires the original A matrix which is normally over-
written by the LU decomposition. Thus if iterative improvement is used the original matrix
must be copied before the factorisation process.
3. For iterative improvement to be most successful, the residual should be computed with a
higher degree of accuracy than the matrices and the right-hand side. This is often difficult
to do in practice if we are already using 8-byte real numbers.
The iterative improvement algorithm is:

• Set x0 = x.
• Do k = 1, ..., numits
  – Compute the residual δb = A x_{k−1} − b (in higher precision).
  – Solve Ly = P δb for y.
  – Solve U δx = y for δx.
  – Set x_k = x_{k−1} − δx.
• Enddo
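A minimal Python sketch of this loop follows (illustrative; SciPy's lu_factor/lu_solve stand
in for the factorisation above, and numpy's longdouble supplies the higher-precision residual):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_improvement(A, b, numits=3):
    # the original A must be kept: the residual needs it, while the single
    # factorisation is reused for every correction step
    lu, piv = lu_factor(A)
    x = lu_solve((lu, piv), b)
    for _ in range(numits):
        # residual db = A(x + dx) - b, computed in extended precision
        db = np.array(A, dtype=np.longdouble) @ x - b
        dx = lu_solve((lu, piv), db.astype(float))   # solve A dx = db
        x = x - dx                                   # corrected solution
    return x

A = np.array([[2., 3., -4., 2.], [-4., -5., 6., -3.],
              [2., 2., 1., 0.], [-6., -7., 14., -4.]])
b = np.array([4., -8., 9., 6.])
print(iterative_improvement(A, b))   # [1. 2. 3. 4.]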
1.4 Iterative Solution Methods

1.4.1 Jacobi Iterative Method

An iterative scheme for Ax = b can be written in the general form

x = Eb − Fx.        (1.26)

After an initial guess, x^(0), is selected, the approximation to the solution is generated by
calculating

x^(k) = Eb − Fx^(k−1).        (1.27)

If we write the matrix A as A = D + L + U, where D = diag(A) and L and U are the strictly
lower and upper triangular parts, then the Jacobi method takes E = D⁻¹ and F = D⁻¹(L + U)
in Equation (1.27). We can then write the Jacobi method of Equation (1.27) as

x^(k) = D⁻¹(b − (L + U)x^(k−1)).        (1.28)
For simple systems, we can implement this matrix equation as a series of steps as follows:

1. Rearrange the system so that each variable is the subject of one of the equations.
2. Set the variables on the RHS of each equation to the old (iteration k − 1) values.
3. Set the variables on the LHS of each equation to the new (iteration k) values.

For the example system used below (a 4 × 4 system Ax = b whose last equation is
x1 − x2 + x3 + 6x4 = 10), executing Step 1 forms four such equations, one for each unknown.
c Department of Engineering Science, University of Auckland, New Zealand. All rights reserved. August 19, 2014
1.4 I TERATIVE S OLUTION M ETHODS 11
We (hopefully) calculate successively better x^(k) by repeatedly using the old solution,
x^(k−1), to get new approximations. We continue doing this until the current approximation,
x^(k), has converged, i.e. it is close enough, in some sense, to the exact solution. One
commonly used stopping criterion, known as the relative change criterion, is to iterate until

|x^(k) − x^(k−1)| / |x^(k)| < tolerance.        (1.29)
Iteration   x1     x2     x3     x4
0           0       0      0      0
1           2.43   -1.44   1.50   1.67
2           1.80   -0.86   0.85   0.77
3           2.06   -1.05   1.06   1.08
4           1.98   -0.98   0.98   0.97
5           2.01   -1.01   1.01   1.01
6           2.00   -1.00   1.00   1.00

The table above shows the sequence of iterates when applying the Jacobi method to the system
Ax = b. The solution after 6 iterations is given by x = [2, −1, 1, 1]ᵀ.
1.4.2 Gauss-Seidel Iterative Method

Applying the Gauss-Seidel method to the same system results in update equations in which each
new value is used as soon as it has been computed.
In succession, each xi is updated, resulting in the k th iterate. The iterations proceed until the
iterates converge to the solution.
Iteration   x1     x2     x3     x4
0           0       0      0      0
1           2.43   -1.17   1.01   0.90
2           1.95   -0.99   1.02   1.01
3           2.00   -0.99   1.00   1.00
4           2.00   -1.00   1.00   1.00

Therefore, the solution to this problem, after 4 iterations, is given by x = [2, −1, 1, 1]ᵀ.
Note how this method reaches the same solution as the Jacobi method, but in two fewer
iterations.
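The two methods can be sketched in Python as follows. The full example system is not reproduced
in these notes, so the test matrix below is an illustrative diagonally dominant stand-in
constructed to have the same solution x = [2, −1, 1, 1]ᵀ:

import numpy as np

def jacobi(A, b, tol=1e-6, maxits=100):
    D = np.diag(A)
    R = A - np.diagflat(D)                 # L + U
    x = np.zeros_like(b, dtype=float)
    for k in range(maxits):
        x_new = (b - R @ x) / D            # x^(k) = D^{-1}(b - (L+U)x^(k-1))
        if np.linalg.norm(x_new - x) / np.linalg.norm(x_new) < tol:
            return x_new, k + 1
        x = x_new
    return x, maxits

def gauss_seidel(A, b, tol=1e-6, maxits=100):
    n = len(b)
    x = np.zeros(n)
    for k in range(maxits):
        x_old = x.copy()
        for i in range(n):                 # use each updated value immediately
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        if np.linalg.norm(x - x_old) / np.linalg.norm(x) < tol:
            return x, k + 1
    return x, maxits

# illustrative diagonally dominant system (not the one from the notes)
A = np.array([[7., 1., -1., 2.], [1., 8., 0., -2.],
              [-1., 0., 4., -1.], [1., -1., 1., 6.]])
b = np.array([14., -8., 1., 10.])
print(jacobi(A, b))         # converges to [2, -1, 1, 1]
print(gauss_seidel(A, b))   # same solution, in fewer iterations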
Chapter 2

Finite Differences for Partial Differential Equations

2.1 Introduction
This chapter considers finite difference approximations to partial differential operators. The finite
difference approximations are discrete models of the continuous finite difference operators. They
are used to give a discrete model of a partial differential equation. The solutions are then a discrete
approximation of the continuous solution to the partial differential equation.
Models exist at many levels in engineering. One possible categorisation is to think of
mathematical, numerical and computational models. An example of a mathematical model is the
simple diffusion equation:

∂f/∂t + D ∂²f/∂x² = b(x, t)
Mathematical models are derived from first principles, subject to necessary assumptions and ap-
proximations. Many mathematical models cannot be solved analytically for a realistically com-
plicated solution domain, e.g. a harbour or a complicated heat sink, and the differential operators
must be approximated in some way. This approximation to the mathematical model can be called a
numerical model and usually gives a discrete numerical approximation to the continuous solution
of the partial differential equation. The third level of approximation or modelling arises in the
choice of coefficients (such as the diffusion coefficient, D, in the above equation) and the number
of discrete points at which to solve the equation. These choices comprise what can be referred
to as the computational model. A specific set of results is obtained from the computational
model, which is based on the numerical model of the mathematical model of the real process. In
this chapter we are interested in a particular class of numerical models called finite differences.
An illustration of a numerical model follows. Consider for example the representation of a
function f formed by fitting a quadratic to three values of f , each a distance of ∆x apart. This is
shown in Figure 2.1.
The approximation to the function f is:

f(x) ≈ ax² + bx + c
     ≈ [(f_{i−1} − 2f_i + f_{i+1})/∆x²] x²/2 + [(f_{i+1} − f_{i−1})/(2∆x)] x + f_i
Figure 2.1: A quadratic fitted through the values f_{i−1}, f_i and f_{i+1} at nodes i−1, i
and i+1, a distance ∆x apart.
This is identical to what would be obtained from using a second order Lagrange interpolation.
Using this quadratic approximation to the function f at x = 0 (index i), models of the first
and second derivatives of f(x) are:

df/dx ≈ (f_{i+1} − f_{i−1})/(2∆x),    d²f/dx² ≈ (f_{i−1} − 2f_i + f_{i+1})/∆x²
These sorts of models can be directly substituted into a partial differential equation like the diffu-
sion equation to give a numerical model.
The discrete points i − 1, i, i + 1 etc. can be referred to as nodes. This chapter will look
at deriving models of differential operators using a more rigorous foundation that gives formal
estimates of e.g., the truncation error involved in a model. The aim is to determine a (usually
linear) discrete equation for each node that relates function values at surrounding nodes, e.g.,
∂²f(x, t)/∂x² = b(x, t)
⇒ f_{i−1}(t) − 2f_i(t) + f_{i+1}(t) ≈ ∆x² b(x_i, t)
Note that the function value at each node depends only on time. The equations for the nodes
can be assembled into a system of equations, i.e., Ax = b, that can be solved (subject to the
appropriate boundary conditions):

[ ...                  ] [ f_{i−2}(t) ]         [ b(x_{i−2}, t) ]
[ ... 1 −2  1  0 ...   ] [ f_{i−1}(t) ]  = ∆x²  [ b(x_{i−1}, t) ]
[ ... 0  1 −2  1 0 ... ] [ f_i(t)     ]         [ b(x_i, t)     ]
[                  ... ] [ f_{i+1}(t) ]         [ b(x_{i+1}, t) ]
                         [    ...     ]         [     ...       ]

An important part of the process of deriving models of partial differential operators is an
understanding of the nature of the partial differential equations themselves.
2.2 Classification of Partial Differential Equations

One class of problems is equilibrium problems, in which the solution on a domain is determined
entirely by the differential equation together with conditions on the boundary of the domain
(Figure 2.2).

Figure 2.2: An equilibrium problem: a differential equation on a domain, with conditions
specified on the boundary.
These sorts of problems are also known as boundary value problems. Typical examples include
steady viscous flow, steady state temperature distributions, equilibrium stresses in elastic structures
and steady state voltage distributions.
Common boundary condition types include:
Dirichlet conditions: values of the dependent function, u, are specified on the boundary.

Neumann conditions: values of the normal derivative, ∂u/∂n, are specified on the boundary.

Mixed or Robin conditions: a linear combination of values and normal derivatives is specified
on the boundary.
Only one type of boundary condition can be specified at each point on the boundary.
Figure 2.3: An initial boundary value problem: a differential equation D(u) = f on an open
domain Ω, boundary conditions B(u) = g on the boundary ∂Ω, and initial conditions I(u) = h;
the domain extends in space and is open in time.
These problems are known as initial boundary value problems, or just initial value problems.
Typical examples include the propagation of pressure waves in fluid, propagation of heat, propaga-
tion of stress and displacement in elastic structures and the propagation of electromagnetic waves.
2.3 Finite Differences

A finite difference model approximates a derivative by a weighted sum of discrete function
evaluations (e.g., just like interpolation or quadrature). The shorthand
u(x_{i+k}) = u(x_i + k∆x) will be used in the remainder of these notes.
Consider the Taylor series expansion of u(x_{i+1}) about x_i:

u(x_{i+1}) = u(x_i) + ∆x du/dx|_{x_i} + ((∆x)²/2!) d²u/dx²|_{x_i} + ···

Setting n = 1 in Equation (2.1) we can get a finite difference approximation to du/dx|_{x_i},
i.e.,

du/dx|_{x_i} = (1/∆x)(−u(x_i) + u(x_{i+1})) − (∆x/2) d²u/dx²|_{x_i}        (2.2)

This is a two point forward difference approximation to the first derivative. The approximation
is called a two point scheme as it involves two known points (u(x_i) and u(x_{i+1}) in this
case) and a forward scheme as all points are 'forward' of x_i, the point at which the
derivative is approximated. The term

−(∆x/2) d²u/dx²|_{x_i} = du/dx|_{x_i} − (1/∆x)(−u(x_i) + u(x_{i+1}))

is called the truncation error. It is a measure of the error in the finite difference
approximation to the derivative. It can be seen in Equation (2.2) that the truncation error for
the two point forward difference approximation to the first derivative is proportional to ∆x.
Consider now the Taylor series expansion of u(x_{i−1}) about x_i:

u(x_{i−1}) = u(x_i) − ∆x du/dx|_{x_i} + ((∆x)²/2!) d²u/dx²|_{x_i} − ···
             + ((−1)ⁿ(∆x)ⁿ/n!) dⁿu/dxⁿ|_{x_i} + ((−1)ⁿ⁺¹(∆x)ⁿ⁺¹/(n+1)!) dⁿ⁺¹u/dxⁿ⁺¹|_{x_i}        (2.3)

From Equation (2.3) we can obtain the two point backward difference approximation to
du/dx|_{x_i}, i.e.,

du/dx|_{x_i} = (1/∆x)(−u(x_{i−1}) + u(x_i)) + (∆x/2) d²u/dx²|_{x_i}        (2.4)

The truncation error for the two point backward difference approximation to the first
derivative is also proportional to ∆x.
If we subtract the Taylor series expansion of u(x_{i−1}) from that of u(x_{i+1}) we obtain the
two point central difference approximation to du/dx|_{x_i}, i.e.,

u(x_{i+1}) − u(x_{i−1}) = [u(x_i) + ∆x du/dx|_{x_i} + ((∆x)²/2!) d²u/dx²|_{x_i} + ((∆x)³/3!) d³u/dx³|_{x_i}]
                        − [u(x_i) − ∆x du/dx|_{x_i} + ((∆x)²/2!) d²u/dx²|_{x_i} − ((∆x)³/3!) d³u/dx³|_{x_i}]        (2.5)

du/dx|_{x_i} = (1/(2∆x))(−u(x_i − ∆x) + u(x_i + ∆x)) − ((∆x)²/6) d³u/dx³|_{x_i}

Note that the truncation error for the two point central difference approximation to the first
derivative is proportional to (∆x)², an improvement over the two point forward and backward
differences where the error is proportional to ∆x.
Consider now the problem of finding an approximation to du/dx|_{x_i} in terms of u(x_i),
u(x_{i+1}) and u(x_{i+2}), i.e., a three point difference. Specifically we seek b0, b1 and b2
such that

du/dx|_{x_i} ≈ b0 u(x_i) + b1 u(x_{i+1}) + b2 u(x_{i+2})

Let u(x) be expressed in terms of a polynomial,

u(x) = a0 + a1 x + a2 x² + a3 x³ + ···

Then

du/dx = a1 + 2a2 x + 3a3 x² + ···

and we require

a1 + 2a2 x_i + 3a3 x_i² + ··· = b0 (a0 + a1 x_i + a2 x_i² + ···)
                              + b1 (a0 + a1 x_{i+1} + a2 x_{i+1}² + ···)
                              + b2 (a0 + a1 x_{i+2} + a2 x_{i+2}² + ···)

Thus we can obtain three linear equations in b0, b1 and b2 by equating terms involving a0, a1
and a2, i.e.,

[ 1      1          1        ] [ b0 ]   [ 0    ]
[ x_i    x_{i+1}    x_{i+2}  ] [ b1 ] = [ 1    ]
[ x_i²   x_{i+1}²   x_{i+2}² ] [ b2 ]   [ 2x_i ]

Solving this system gives

b0 = −3/(2∆x),  b1 = 4/(2∆x)  and  b2 = −1/(2∆x)

Thus the three point forward difference approximation is

du/dx|_{x_i} = (1/(2∆x))(−3u(x_i) + 4u(x_{i+1}) − u(x_{i+2})) + ((∆x)²/3) d³u/dx³|_{x_i}        (2.6)
The truncation error of the three point forward difference approximation to the first
derivative is twice that of the two point central difference approximation and is proportional
to (∆x)².

A table of common finite difference approximations to du/dx|_{x_i} is as follows:

Two point forward     (1/∆x)(−u(x_i) + u(x_{i+1}))                        error ∝ ∆x
Two point backward    (1/∆x)(−u(x_{i−1}) + u(x_i))                        error ∝ ∆x
Two point central     (1/(2∆x))(−u(x_{i−1}) + u(x_{i+1}))                 error ∝ (∆x)²
Three point forward   (1/(2∆x))(−3u(x_i) + 4u(x_{i+1}) − u(x_{i+2}))      error ∝ (∆x)²
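These orders are easy to confirm numerically. The short Python check below (illustrative)
halves ∆x and watches the error in each approximation for u(x) = sin x at x = 1; halving ∆x
halves the first error and roughly quarters the other two:

import numpy as np

u, du = np.sin, np.cos(1.0)
for dx in [0.1, 0.05, 0.025]:
    fwd2 = (u(1.0 + dx) - u(1.0)) / dx                            # O(dx)
    cen2 = (u(1.0 + dx) - u(1.0 - dx)) / (2 * dx)                 # O(dx^2)
    fwd3 = (-3*u(1.0) + 4*u(1.0 + dx) - u(1.0 + 2*dx)) / (2*dx)   # O(dx^2)
    print(dx, abs(fwd2 - du), abs(cen2 - du), abs(fwd3 - du))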
Adding the Taylor series expansions of u(x_{i−1}) and u(x_{i+1}) eliminates the odd derivative
terms:

u(x_{i−1}) + u(x_{i+1}) = 2u(x_i) + (∆x)² d²u/dx²|_{x_i} + ((∆x)⁴/12) d⁴u/dx⁴|_{x_i}

Rearranging, we can obtain a three point central difference approximation for d²u/dx²|_{x_i},
i.e.,

d²u/dx²|_{x_i} = (1/(∆x)²)(u(x_{i−1}) − 2u(x_i) + u(x_{i+1})) − ((∆x)²/12) d⁴u/dx⁴|_{x_i}        (2.7)
Consider now the Taylor series expansions

u(x_{i+1}) = u(x_i) + ∆x du/dx|_{x_i} + ((∆x)²/2) d²u/dx²|_{x_i} + ((∆x)³/6) d³u/dx³|_{x_i}

u(x_{i+2}) = u(x_i) + 2∆x du/dx|_{x_i} + 2(∆x)² d²u/dx²|_{x_i} + (4(∆x)³/3) d³u/dx³|_{x_i}

We can eliminate the first derivative term by adding −2u(x_{i+1}) to u(x_{i+2}):

−2u(x_{i+1}) + u(x_{i+2}) = −u(x_i) + (∆x)² d²u/dx²|_{x_i} + (∆x)³ d³u/dx³|_{x_i}

Rearranging this yields a three point forward approximation to d²u/dx²|_{x_i} in terms of
u(x_i), u(x_{i+1}) and u(x_{i+2}), i.e.,

d²u/dx²|_{x_i} = (1/(∆x)²)(u(x_i) − 2u(x_{i+1}) + u(x_{i+2})) − ∆x d³u/dx³|_{x_i}        (2.8)
A table of common finite difference approximations to d²u/dx²|_{x_i} is as follows:

Three point central   (1/(∆x)²)(u(x_{i−1}) − 2u(x_i) + u(x_{i+1}))    error ∝ (∆x)²
Three point forward   (1/(∆x)²)(u(x_i) − 2u(x_{i+1}) + u(x_{i+2}))    error ∝ ∆x

2.4 Difference Equations for Equilibrium Problems

Consider the equation ∇²u = f on a two-dimensional grid with u_{ij} = u(x_i, y_j). The second
derivatives are approximated by

∂²u/∂x²|_{(x_i,y_j)} ≈ (1/(∆x)²)(u_{i−1,j} − 2u_{ij} + u_{i+1,j})

∂²u/∂y²|_{(x_i,y_j)} ≈ (1/(∆y)²)(u_{i,j−1} − 2u_{ij} + u_{i,j+1})
Adding these approximations and setting ∆x = ∆y = ∆, the equation ∇²u = f at node (i, j)
becomes

u_{i−1,j} + u_{i,j−1} − 4u_{ij} + u_{i+1,j} + u_{i,j+1} = ∆² f(x_i, y_j)

Thus we can write the finite difference approximation of ∇²u = f in terms of a five point
expression with the coefficient pattern

        [     1      ]
(1/∆²)  [  1  −4  1  ]  = f        (2.9)
        [     1      ]

This is known as a finite difference molecule for ∇²u = f, centred at the point (x_i, y_j).
The graphical representation is given in Figure 2.4, where ◦ are the finite difference points
in the scheme. This approximation has a local truncation error of (∆²/6) ∇⁴u(x, y).

Figure 2.4: Laplacian finite difference molecule, on the grid x_{i±1} = x_i ± ∆x,
y_{j±1} = y_j ± ∆y.
As an example, consider the steady state temperature distribution in a rectangular plate
(Figure 2.5), discretised with a finite difference grid of spacing ∆ = 1 (Figure 2.6).

Figure 2.5: Steady state temperature distribution problem: a 4 × 3 plate with boundary
conditions u = 4 − x along the top edge (y = 3), u = 16 − 4y along the left edge (x = 0),
u = 0 along the right edge (x = 4), and u = (4 − x)² along the bottom edge (y = 0).

Figure 2.6: A finite difference grid laid over the plate, with nodes u_{ij}; the bottom row of
nodes is u_{00}, u_{10}, u_{20}, u_{30}, u_{40}.
c Department of Engineering Science, University of Auckland, New Zealand. All rights reserved. August 19, 2014
2.4 D IFFERENCE E QUATIONS FOR E QUILIBRIUM P ROBLEMS 23
The equation centred on node 11 is given by the five point approximation to the Laplacian in
∇²u = 0, i.e.,

(1/∆²)(u_{10} + u_{01} − 4u_{11} + u_{21} + u_{12}) = 0

but ∆ = 1, so

u_{10} + u_{01} − 4u_{11} + u_{21} + u_{12} = 0

Similarly for the equations centred on nodes 21, 31, 12, 22 and 32:

u_{20} + u_{11} − 4u_{21} + u_{31} + u_{22} = 0
u_{30} + u_{21} − 4u_{31} + u_{41} + u_{32} = 0
u_{11} + u_{02} − 4u_{12} + u_{22} + u_{13} = 0
u_{21} + u_{12} − 4u_{22} + u_{32} + u_{23} = 0
u_{31} + u_{22} − 4u_{32} + u_{42} + u_{33} = 0
The set of linear equations, together with the known boundary values, may be expressed as a
single matrix equation Au = b for the vector of nodal values u = [u_{00}, u_{10}, ..., u_{43}]ᵀ.
Each boundary node contributes an identity row (its value is fixed) and each interior node
contributes a five point row, with coefficients 1, 1, −4, 1, 1 in the columns of its neighbours
and itself. The right-hand side collects the boundary values,

u_{00} = 16, u_{10} = 9, u_{20} = 4, u_{30} = 1, u_{40} = 0,
u_{01} = 12, u_{41} = 0, u_{02} = 8, u_{42} = 0,
u_{03} = 4, u_{13} = 3, u_{23} = 2, u_{33} = 1, u_{43} = 0,

with zeros in the interior rows. Solving the system gives the interior values

u_{11} = 7.66, u_{21} = 4.15, u_{31} = 1.66,
u_{12} = 5.48, u_{22} = 3.28, u_{32} = 1.48.
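A Python sketch of this assembly and solve follows (illustrative; the node numbering
idx(i, j) = j·nx + i is our choice). It recovers the interior values quoted above:

import numpy as np

nx, ny = 5, 4                          # nodes i = 0..4, j = 0..3, spacing 1
N = nx * ny

def idx(i, j):
    return j * nx + i

A = np.zeros((N, N))
b = np.zeros(N)
for j in range(ny):
    for i in range(nx):
        k = idx(i, j)
        if i in (0, nx - 1) or j in (0, ny - 1):    # boundary node: u fixed
            A[k, k] = 1.0
            x, y = float(i), float(j)
            if j == 0:        b[k] = (4 - x) ** 2   # bottom edge
            elif j == ny - 1: b[k] = 4 - x          # top edge
            elif i == 0:      b[k] = 16 - 4 * y     # left edge
            else:             b[k] = 0.0            # right edge
        else:                                        # interior: five point molecule
            A[k, [idx(i-1, j), idx(i, j-1), idx(i+1, j), idx(i, j+1)]] = 1.0
            A[k, k] = -4.0

u = np.linalg.solve(A, b)
print(round(u[idx(1, 1)], 2), round(u[idx(2, 1)], 2), round(u[idx(3, 1)], 2))
# 7.66 4.15 1.66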
Figure 2.7: A finite difference grid with N₁ + 1 nodes in the x direction,
u_{00}, u_{10}, u_{20}, ..., u_{N₁0}.

Numbering the nodes row by row gives the system a block structure. Here I is the
(N₁ + 1) × (N₁ + 1) identity matrix, and ν_i and β_i are (N₁ + 1) subvectors given by

ν_i = [u_{i0}, u_{i1}, u_{i2}, ..., u_{i(N₁−1)}, u_{iN₁}]ᵀ,
β_i = [b_{i0}, b_{i1}, b_{i2}, ..., b_{i(N₁−1)}, b_{iN₁}]ᵀ.
The local truncation error of this (two point) approximation is proportional to ∆x. In order
to maintain a solution accuracy proportional to (∆x)², three point difference approximations
to the boundary first derivatives must be used, e.g., a three point backward difference

(1/(2∆x))(u_{i−2,j} − 4u_{i−1,j} + 3u_{ij}) = c.
2.5 Difference Equations for Propagation Problems

Consider the one-dimensional heat equation

∂²u/∂x² − a ∂u/∂t = 0

where a is the reciprocal of the thermal diffusivity. The equation for u(x, t) is usually
defined over an open domain, i.e., over a closed spatial interval, X₀ ≤ x ≤ X₁, and an open
temporal interval, t ≥ 0. At time t = 0 an initial state is prescribed, u(x, 0) = h(x). For
time t > 0 boundary conditions are prescribed at x = X₀ and x = X₁, i.e., u(X₀, t) = g₀(t)
and u(X₁, t) = g₁(t). This is shown graphically in Figure 2.8.

Figure 2.8: The heat equation ∂²u/∂x² − a ∂u/∂t = 0 on X₀ ≤ x ≤ X₁, with initial state h(x)
and boundary values g₀(t) and g₁(t).
At a node (x_i, t_n), approximating the spatial derivative at time t_n and the time derivative
by a forward difference, the equation is approximated as

∂²u/∂x² − a ∂u/∂t = 0 ≈ (1/(∆x)²)(uⁿ_{i−1} − 2uⁿ_i + uⁿ_{i+1}) − a (1/∆t)(−uⁿ_i + uⁿ⁺¹_i)

We now have the unknown uⁿ⁺¹_i expressed in terms of known values at t = t_n, i.e., uⁿ_{i−1},
uⁿ_i and uⁿ_{i+1}. Rearranging we have

uⁿ⁺¹_i = uⁿ_i + (∆t/(a(∆x)²))(uⁿ_{i−1} − 2uⁿ_i + uⁿ_{i+1})

Setting r = ∆t/(a(∆x)²) and rearranging, we can obtain the molecule for
∂²u/∂x² − a ∂u/∂t = 0 centred on (x_i, t_n), i.e.,

[        −1         ]
[  r   1−2r    r    ]  = 0        (2.10)

where the top row refers to time level t_n + ∆t and the bottom row to t_n.
Figure 2.9 shows this molecule graphically, where ◦ are known values and • are unknown values.

Figure 2.9: The explicit molecule: known values at x₀ − ∆x, x₀, x₀ + ∆x at time t₀, and a
single unknown at (x₀, t₀ + ∆t).
A formula, such as this, which expresses one unknown value in terms of known values is called
an explicit formula and the corresponding finite difference scheme is called an explicit
scheme.

It will be shown (in Section 2.6) that the explicit method is only stable for
0 < ∆t/(a(∆x)²) ≤ 1/2. When ∆t/(a(∆x)²) > 1/2 the explicit scheme yields oscillatory
solutions whose magnitudes increase exponentially with time. Although the explicit method is
computationally simple, it requires a small time step: since ∆t/(a(∆x)²) ≤ 1/2, then
∆t ≤ a(∆x)²/2, and since we would like ∆x to be small, ∆t must be kept very small.
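A minimal Python sketch of the explicit scheme follows (illustrative; the initial state
sin(πx) and the zero boundary values are our choices, not from the notes):

import numpy as np

def explicit_heat(u0, g0, g1, r, nsteps):
    # march u_i^{n+1} = r u_{i-1}^n + (1 - 2r) u_i^n + r u_{i+1}^n;
    # r = dt / (a dx^2) must satisfy r <= 1/2 for stability
    u = u0.copy()
    for n in range(nsteps):
        u[1:-1] = r * u[:-2] + (1 - 2 * r) * u[1:-1] + r * u[2:]
        u[0], u[-1] = g0, g1          # Dirichlet boundary values
    return u

x = np.linspace(0.0, 1.0, 21)
u0 = np.sin(np.pi * x)                # illustrative initial state h(x)
print(explicit_heat(u0, 0.0, 0.0, r=0.5, nsteps=50).max())   # decays smoothly

Re-running with r > 0.5 produces the growing oscillations described above.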
An implicit alternative replaces the finite difference approximation for ∂²u/∂x²|_{(x_i,t_n)}
by the average of its finite difference representations at t_n and t_{n+1} = t_n + ∆t, i.e.,

∂²u/∂x² − a ∂u/∂t = 0 ≈ (1/(2(∆x)²))[(uⁿ_{i−1} − 2uⁿ_i + uⁿ_{i+1}) + (uⁿ⁺¹_{i−1} − 2uⁿ⁺¹_i + uⁿ⁺¹_{i+1})]
                        − a (1/∆t)(−uⁿ_i + uⁿ⁺¹_i)

Setting r = ∆t/(a(∆x)²) and rearranging gives

(r/2) uⁿ⁺¹_{i−1} − (1 + r) uⁿ⁺¹_i + (r/2) uⁿ⁺¹_{i+1} = −(r/2) uⁿ_{i−1} − (1 − r) uⁿ_i − (r/2) uⁿ_{i+1}

The left hand side contains three unknown values expressed in terms of three known values on
the right hand side. In terms of the molecule for ∂²u/∂x² − a ∂u/∂t = 0 centred on (x_i, t_n)
we have

[ r/2   −(1+r)   r/2 ]
[ r/2    1−r     r/2 ]  = 0        (2.11)

where the top row again refers to t_n + ∆t.
Graphically this is shown in Figure 2.10, where ◦ are known values and • are unknown values.

Figure 2.10: The implicit molecule: three unknown values at t₀ + ∆t and three known values
at t₀.
If there are N internal spatial mesh points along each time row we obtain N simultaneous
linear equations to solve for the N unknown values at each time step. A method such as this where
the calculation of unknown values requires the solution of a set of simultaneous equations is called
an implicit scheme.
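One such implicit step amounts to assembling and solving a tridiagonal system. A Python
sketch follows (illustrative; it assumes fixed Dirichlet end values and uses
scipy.linalg.solve_banded for the tridiagonal solve):

import numpy as np
from scipy.linalg import solve_banded

def implicit_step(u, r):
    # one step of the averaged (Crank-Nicolson type) scheme:
    # (r/2)u_{i-1}^{n+1} - (1+r)u_i^{n+1} + (r/2)u_{i+1}^{n+1}
    #     = -(r/2)u_{i-1}^n - (1-r)u_i^n - (r/2)u_{i+1}^n
    n = len(u) - 2                        # number of internal mesh points
    ab = np.zeros((3, n))                 # banded storage for solve_banded
    ab[0, 1:] = r / 2                     # superdiagonal
    ab[1, :] = -(1 + r)                   # diagonal
    ab[2, :-1] = r / 2                    # subdiagonal
    rhs = -(r / 2) * u[:-2] - (1 - r) * u[1:-1] - (r / 2) * u[2:]
    rhs[0] -= (r / 2) * u[0]              # move the known boundary values across
    rhs[-1] -= (r / 2) * u[-1]
    unew = u.copy()
    unew[1:-1] = solve_banded((1, 1), ab, rhs)
    return unew

x = np.linspace(0.0, 1.0, 11)
u = np.sin(np.pi * x)
print(implicit_step(u, r=2.0))            # the implicit scheme tolerates r > 1/2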
We can generalise this method of approximating the spatial derivative by the average of the
finite difference representations of the spatial derivative at tn and tn + ∆t in terms of a general
weighting of the tn and tn + ∆t representations. Consider approximating the spatial derivative by
∂²u/∂x² − a ∂u/∂t = 0 ≈ (1/(∆x)²)[(1 − θ)(uⁿ_{i−1} − 2uⁿ_i + uⁿ_{i+1}) + θ(uⁿ⁺¹_{i−1} − 2uⁿ⁺¹_i + uⁿ⁺¹_{i+1})]
                        − a (1/∆t)(−uⁿ_i + uⁿ⁺¹_i)

The molecule for ∂²u/∂x² − a ∂u/∂t = 0 centred at (x_i, t_n) is:

[ θr          −(1+2θr)       θr      ]
[ (1−θ)r     1−2(1−θ)r    (1−θ)r     ]  = 0        (2.12)

Comparing the molecule in Equation (2.12) with the molecule in Equation (2.11) we can see
that the molecule in Equation (2.11) corresponds to θ = 1/2. This θ = 1/2 scheme is a special
case known as the Crank-Nicolson implicit method. Other special values of θ are θ = 0, which
recovers the fully explicit scheme of Equation (2.10), and θ = 1, the fully implicit scheme.
2.6 Von Neumann Stability Analysis for Propagation Problems

In a von Neumann analysis the error at time level n and spatial node j is expanded in Fourier
modes of the form

εⁿ_j = ξⁿ e^{iβj∆x}        (2.13)

where ξ is the amplification factor of the mode with wavenumber β. It can be seen then that
the mode of error will grow exponentially as time increases (n → ∞) if |ξ| > 1. Hence the
eigenmode will be stable only if

|ξ| ≤ 1        (2.14)
This stability criterion can be used to investigate the stability of a finite difference
scheme. Consider now the fully explicit finite difference scheme for the heat equation as
given in Equation (2.10), i.e.,

uⁿ⁺¹_j = r uⁿ_{j−1} + (1 − 2r) uⁿ_j + r uⁿ_{j+1}        (2.15)

where r = ∆t/(a(∆x)²) and uⁿ_j is the finite difference solution at spatial location j and
temporal location n. Substituting Equation (2.13) into Equation (2.15) (i.e., εⁿ_j in place
of uⁿ_j) we have

ξⁿ⁺¹ e^{iβj∆x} = r ξⁿ e^{iβ(j−1)∆x} + (1 − 2r) ξⁿ e^{iβj∆x} + r ξⁿ e^{iβ(j+1)∆x}

or

ξ = 1 − 2r + 2r cos(β∆x) = 1 − 4r sin²(β∆x/2).
Now the error will be stable if |ξ| ≤ 1 and, since r > 0 (as ∆x > 0 and ∆t > 0), we have
ξ ≤ 1 for every mode; the explicit difference scheme will therefore be unstable if ξ < −1
for some mode, i.e.,

1 − 4r sin²(β∆x/2) < −1

4r sin²(β∆x/2) > 2

r > 1/2   (taking the worst case, sin²(β∆x/2) = 1).

Hence the explicit finite difference scheme will be unstable if r = ∆t/(a(∆x)²) > 1/2.
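The amplification factor is easy to tabulate. The short Python check below (illustrative)
confirms that |ξ| exceeds 1 somewhere in 0 ≤ β∆x ≤ π exactly when r > 1/2:

import numpy as np

beta_dx = np.linspace(0, np.pi, 50)
for r in (0.25, 0.5, 0.6):
    xi = 1 - 4 * r * np.sin(beta_dx / 2) ** 2   # amplification factor
    print(r, np.abs(xi).max() <= 1)             # True, True, False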
Chapter 3
Nonlinear Equations
3.1 Revision
3.1.1 The Problem of Nonlinear Root-finding
In this module we consider the problem of using numerical techniques to find the roots of nonlinear
equations, f (x) = 0. Initially we examine the case where the nonlinear equations are a scalar
function of a single independent variable, x. Later, we shall consider the more difficult problem
where we have a system of n nonlinear equations in n independent variables, f (x) = 0.
Since the roots of a nonlinear equation, f (x) = 0, cannot in general be expressed in closed
form, we must use approximate methods to solve the problem. Usually, iterative techniques are
used to start from an initial approximation to a root, and produce a sequence of approximations
x^(0), x^(1), x^(2), ...,
which converge toward a root. Convergence to a root is usually possible provided that the function
f (x) is sufficiently smooth and the initial approximation is close enough to the root.
3.1.2 Rate of Convergence

Let x^(0), x^(1), x^(2), ... be a sequence which converges to a root χ, and let
ε^(k) = x^(k) − χ. If there exists a number p and a non-zero constant c such that

lim_{k→∞} |ε^(k+1)| / |ε^(k)|^p = c,        (3.1)

then p is called the order of convergence of the sequence. For p = 1, 2, 3 the order is said
to be linear, quadratic and cubic respectively.
Rounding errors place a limit, δ, on the accuracy with which we can evaluate the nonlinear
function near a root. This limiting accuracy of the function imposes a limit on the accuracy,
ε, of the root. The accuracy to which the root may be found is given by:

ε = δ / |f′(χ)|.        (3.2)

This is the best error bound for any root finding method. Note that ε is large when the first
derivative of the function at the root, |f′(χ)|, is small. In this case the problem of finding
the root is ill-conditioned. This is shown graphically in Figure 3.1.
Figure 3.1: Conditioning of root finding. The function is known only to within f(x) ± δ.
Where f′(χ₁) is small, the uncertainty 2δ produces a large uncertainty 2ε₁ in the root
(ill-conditioned); where f′(χ₂) is large, ε₂ is small (well-conditioned).
3.1.3 Termination Criteria

If the sequence x^(0), x^(1), ... converges to the root, χ, then the differences
|x^(k) − x^(k−1)| will decrease until x^(k) − χ ≈ ε. With further iterations, rounding errors
will dominate and the differences will vary irregularly. The iterations should be terminated
and x^(k) accepted as the estimate for the root when the following two conditions are
satisfied simultaneously:

1. |x^(k+1) − x^(k)| ≥ |x^(k) − x^(k−1)|        (3.3)

2. |x^(k) − x^(k−1)| / (1 + |x^(k)|) < ∆.        (3.4)

∆ is some coarse tolerance to prevent the iterations from being terminated before x^(k) is
close to χ, i.e. before the step size x^(k) − x^(k−1) becomes "small". Condition (3.4) tests
the relative offset sensibly both when |x^(k)| is much larger than 1 and when it is much
smaller than 1. In practice, these conditions are used in conjunction with the condition that
the number of iterations not exceed some user defined limit.
3.1.4 Bisection Method

Suppose the interval [x^(0), x^(1)] brackets a root, i.e. f(x^(0)) and f(x^(1)) have opposite
signs, and let

x^(2) = (x^(0) + x^(1)) / 2

be the mid-point of the interval. Three mutually exclusive possibilities exist:

• if f(x^(2)) = 0 then the root has been found;
• if f(x^(2)) has the same sign as f(x^(0)) then the root is in the interval [x^(2), x^(1)];
• if f(x^(2)) has the same sign as f(x^(1)) then the root is in the interval [x^(0), x^(2)].

In the last two cases, the size of the interval bracketing the root has decreased by a factor
of two. The next iteration is performed by evaluating the function at the mid-point of the new
interval. After k iterations the size of the interval bracketing the root has decreased to:

(x^(1) − x^(0)) / 2^k
The process is shown graphically in Figure 3.2. The bisection method is guaranteed to converge
to a root. If the initial interval brackets more than one root, then the bisection method will find one
of them.
Figure 3.2: The bisection method: successive mid-points x^(2), x^(3), x^(4), x^(5) halve the
interval bracketing the root between x^(0) and x^(1).
Since the interval size is reduced by a factor of two at each iteration, it is simple to
calculate in advance the number of iterations, k, required to achieve a given tolerance, ε₀,
in the solution:

k = log₂(|x^(1) − x^(0)| / ε₀).
The bisection method has a relatively slow linear convergence.
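A Python sketch of the bisection method follows (illustrative; for brevity the termination
test is the relative-change criterion (3.4) alone rather than the full pair of conditions):

def bisect(f, x0, x1, delta=1e-4, maxits=100):
    # x0, x1 must bracket a root: f(x0) and f(x1) have opposite signs
    f0 = f(x0)
    for k in range(maxits):
        x2 = 0.5 * (x0 + x1)               # mid-point of the current interval
        f2 = f(x2)
        if f2 == 0 or abs(x1 - x0) / (1 + abs(x2)) < delta:
            return x2, k + 1
        if f2 * f0 > 0:                    # root lies in [x2, x1]
            x0, f0 = x2, f2
        else:                              # root lies in [x0, x2]
            x1 = x2
    return x2, maxits

print(bisect(lambda x: x**2 - 1, 0.0, 3.0))   # converges to the root x = 1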
3.1.5 Secant Method

Suppose that |f(x^(k))| < |f(x^(k−1))|. Then we would expect the root to lie closer to x^(k)
than x^(k−1). Instead of choosing the new estimate to lie at the midpoint of the current
interval, as is the case with the bisection method, the secant method chooses the x-intercept
of the secant line to the curve, the line through (x^(k−1), f(x^(k−1))) and (x^(k), f(x^(k))).
This places the new estimate closer to the endpoint for which f(x) has the smallest absolute
value. The new estimate is:

x^(k+1) = x^(k) − f(x^(k))(x^(k) − x^(k−1)) / (f(x^(k)) − f(x^(k−1))).        (3.6)
Note that the secant method requires two initial function evaluations, but only one new
function evaluation is made at each iteration. The secant method is shown graphically in
Figure 3.3.

The secant method does not have the root bracketing property of the bisection method, since
the new estimate, x^(k+1), of the root need not lie within the bounds defined by x^(k−1) and
x^(k). As a consequence, the secant method does not always converge, but when it does it
usually does so faster than the bisection method. It can be shown that the order of
convergence of the secant method is:

(1 + √5)/2 ≈ 1.618.
Figure 3.3: The secant method: the new estimate x^(k+1) is the x-intercept of the line through
(x^(k−1), f(x^(k−1))) and (x^(k), f(x^(k))).
3.1.6 Regula Falsi

The regula falsi (false position) method is similar to the secant method, but retains a
bracket on the root: the new estimate is the x-intercept of the chord through the two current
bracketing points, and it replaces the bracketing point whose function value has the same
sign.

Figure 3.4: The regula falsi method: x^(2) and x^(3) are the x-intercepts of chords through
the bracketing points, starting from x^(0) and x^(1).
The advantage of regula falsi is that like the bisection method, it is always convergent. How-
ever, like the bisection method, it has only linear convergence. Examples where the regula falsi
method is slow to converge are not hard to find. One example is shown in Figure 3.5.
c Department of Engineering Science, University of Auckland, New Zealand. All rights reserved. August 19, 2014
36 N ONLINEAR E QUATIONS
Figure 3.5: An example where the regula falsi method converges slowly: one bracketing
endpoint remains fixed while the other creeps toward the root.
3.1.7 Newton's Method

Newton's method uses the function value and its derivative at the current estimate to produce
the new estimate:

x^(k+1) = x^(k) − f(x^(k)) / f′(x^(k)).        (3.7)

Figure 3.6: Newton's method: the new estimate is the x-intercept of the tangent to the curve
at the current estimate.
Newton's method may be derived from the Taylor series expansion of the function:

f(x + ∆x) = f(x) + f′(x)∆x + f″(x)(∆x)²/2 + ...        (3.8)

For a smooth function and small values of ∆x, the function is approximated well by the first
two terms. Thus f(x + ∆x) = 0 implies that ∆x = −f(x)/f′(x). Far from a root the higher order
terms are significant and Newton's method can give highly inaccurate corrections. In such
cases the Newton iterations may never converge to a root. In order to achieve convergence the
starting values must be reasonably close to a root. An example of divergence using Newton's
method is given in Figure 3.7.
Figure 3.7: An example of divergence using Newton's method: the tangent at x^(0) sends the
iterate to x^(1), further from the root.
Newton's method exhibits quadratic convergence. Thus, near a root, the number of significant
digits doubles with each iteration. The strong convergence makes Newton's method attractive in
cases where the derivative can be evaluated efficiently and is continuous and non-zero in the
neighbourhood of the root (conditions which fail at multiple roots, where f′(χ) = 0).
Whether the secant method should be used in preference to Newton’s method depends upon
the relative work required to compute the first derivative of the function. If the work required to
evaluate the first derivative is greater than 0.44 times the work required to evaluate the function,
then use the secant method, otherwise use Newton’s method.
It is easy to circumvent the poor global convergence properties of Newton’s method by com-
bining it with the bisection method. This hybrid method uses a bisection step whenever the Newton
method takes the solution outside the bisection bracket. Global convergence is thus assured while
retaining quadratic convergence near the root. Line searches of the Newton step from x(k) to x(k+1)
are another method for achieving better global convergence properties of Newton’s method.
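A sketch of such a hybrid in Python follows (illustrative; newton_bisect is our name, and the
fallback rule is one reasonable choice among several):

def newton_bisect(f, df, x0, x1, tol=1e-10, maxits=50):
    # globally convergent hybrid: take a Newton step when it stays inside
    # the current bracket [x0, x1], otherwise bisect
    f0 = f(x0)
    x = 0.5 * (x0 + x1)
    for k in range(maxits):
        fx, dfx = f(x), df(x)
        if dfx != 0:
            xn = x - fx / dfx                  # Newton step
        if dfx == 0 or not (x0 < xn < x1):
            xn = 0.5 * (x0 + x1)               # fall back to bisection
        if fx * f0 > 0:                        # maintain the bracket
            x0, f0 = x, fx
        else:
            x1 = x
        if abs(xn - x) / (1 + abs(xn)) < tol:
            return xn, k + 1
        x = xn
    return x, maxits

print(newton_bisect(lambda x: x**2 - 1, lambda x: 2 * x, 0.0, 3.0))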
3.1.8 Examples

In the following examples, the bisection, secant, regula falsi and Newton's methods are
applied to find the root of the nonlinear function f(x) = x² − 1 in [0, 3]. x_L and x_R are
the left and right bracketing values and x_new is the new value determined at each iteration;
ε is the distance from the true solution, and nf is the number of function evaluations
required at each iteration. The bisection method is terminated when conditions (3.3) and (3.4)
are simultaneously satisfied for ∆ = 1 × 10⁻⁴. The remaining methods determine the solution to
the same level of accuracy as the bisection method.
The examples show the linear convergence rate of the bisection and regula falsi methods, the
better than linear convergence rate of the secant method and the quadratic convergence rate of
Newton’s method. It is also seen that although Newton’s method converges in 5 iterations, 5
function and 5 derivative evaluations, to give a total of 10 evaluations, are required. This contrasts
to the secant method where for the 8 iterations, 9 function evaluations are made.
Secant Method - f(x) = x² − 1
k   x1        x2        f(x1)       f(x2)       ε         nf   |xk−xk−1|/(1+|xk|)
0   0.000000  3.000000  -1.000000    8.000000   2.000000  2
1   3.000000  0.333333   8.000000   -0.888889   0.666667  1    2.000000
2   0.333333  0.600000  -0.888889   -0.640000   0.400000  1    0.166667
3   0.600000  1.285714  -0.640000    0.653061   0.285714  1    0.300000
4   1.285714  0.939394   0.653061   -0.117539   0.060606  1    0.178571
5   0.939394  0.992218  -0.117539   -0.015504   0.007782  1    0.026515
6   0.992218  1.000244  -0.015504    0.000488   0.000244  1    0.004013
7   1.000244  0.999999   0.000488   -0.000002   0.000001  1    0.000123
8   0.999999  1.000000   0.000000               0.000000
Total function evaluations: 9

Regula Falsi Method - f(x) = x² − 1
k   x1        x2        f(x1)      f(x2)     xnew      f(xnew)     ε         nf   |xk−xk−1|/(1+|xk|)
0   0.000000  3.000000  -1.000000  8.000000  0.333333  -0.888889   0.666667  3
1   0.333333  3.000000  -0.888889  8.000000  0.600000  -0.640000   0.400000  1    0.166667
2   0.600000  3.000000  -0.640000  8.000000  0.777778  -0.395062   0.222222  1    0.100000
3   0.777778  3.000000  -0.395062  8.000000  0.882353  -0.221453   0.117647  1    0.055556
4   0.882353  3.000000  -0.221453  8.000000  0.939394  -0.117539   0.060606  1    0.029412
5   0.939394  3.000000  -0.117539  8.000000  0.969231  -0.060592   0.030769  1    0.015152
6   0.969231  3.000000  -0.060592  8.000000  0.984496  -0.030767   0.015504  1    0.007692
7   0.984496  3.000000  -0.030767  8.000000  0.992218  -0.015504   0.007782  1    0.003876
8   0.992218  3.000000  -0.015504  8.000000  0.996101  -0.007782   0.003899  1    0.001946
9   0.996101  3.000000  -0.007782  8.000000  0.998049  -0.003899   0.001951  1    0.000975
10  0.998049  3.000000  -0.003899  8.000000  0.999024  -0.001951   0.000976  1    0.000488
11  0.999024  3.000000  -0.001951  8.000000  0.999512  -0.000976   0.000488  1    0.000244
12  0.999512  3.000000  -0.000976  8.000000  0.999756  -0.000488   0.000244  1    0.000122
13  0.999756  3.000000  -0.000488  8.000000  0.999878   0.000122   0.000061
Total function evaluations: 15

Newton's Method - f(x) = x² − 1; f′(x) = 2x
k   x         f(x)      f′(x)     ε         nf   nf′  |xk−xk−1|/(1+|xk|)
0   3.000000  8.000000  6.000000  2.000000  1    1
1   1.666667  1.777778  3.333333  0.666667  1    1    0.500000
2   1.133333  0.284444  2.266667  0.133333  1    1    0.250000
3   1.007843  0.015748  2.015686  0.007843  1    1    0.062500
4   1.000031  0.000061  2.000061  0.000031  1    1    0.003906
5   1.000000  0.000000            0.000015
Total evaluations: 5 function, 5 derivative

Figure 3.8: Iteration histories of the secant, regula falsi and Newton methods for
f(x) = x² − 1.
3.3 Laguerre's Method

Laguerre's method finds the roots of an nth order polynomial. Writing
Pₙ(x) = (x − x₁)(x − x₂)···(x − xₙ), let:

G = d ln|Pₙ(x)|/dx        (3.11)
  = 1/(x − x₁) + 1/(x − x₂) + ... + 1/(x − xₙ)        (3.12)
  = P′ₙ(x)/Pₙ(x)        (3.13)

and

H = −d² ln|Pₙ(x)|/dx²        (3.14)
  = 1/(x − x₁)² + 1/(x − x₂)² + ... + 1/(x − xₙ)²        (3.15)
  = (P′ₙ(x)/Pₙ(x))² − P″ₙ(x)/Pₙ(x).        (3.16)

If we make the assumption that the root x₁ is located a distance α from our current estimate,
x^(k), and all other roots are located at a distance β, then:

x^(k) − x₁ = α        (3.17)
x^(k) − xᵢ = β   for i = 2, 3, ..., n.        (3.18)

Under this assumption, G = 1/α + (n − 1)/β and H = 1/α² + (n − 1)/β². Eliminating β and
solving for α gives

α = n / (G ± √((n − 1)(nH − G²))),

where the sign is chosen to give the denominator of largest magnitude; the new estimate is
x^(k+1) = x^(k) − α.
c Department of Engineering Science, University of Auckland, New Zealand. All rights reserved. August 19, 2014
40 N ONLINEAR E QUATIONS
The method is cubic in its convergence for real or complex simple roots. Laguerre’s method will
converge to all types of roots of polynomials, real, complex, single or multiple. It requires complex
arithmetic even when converging to a real root. However, for polynomials with all real roots, it is
guaranteed to converge to a root from any starting point.
After a root, x₁, of the nth order polynomial, Pₙ(x), has been found, the polynomial should be
divided by the factor (x − x₁) to yield a polynomial of order (n − 1). This process is called
deflation. It saves computation when estimating further roots and ensures that subsequent
iterations do not converge to roots already found.
3.3.1 Example

Following is a simple example that illustrates the use of Laguerre's method. Consider the
second order polynomial:

P₂(x) = x² − 4x + 3
P′₂(x) = 2x − 4
P″₂(x) = 2

1. Let the initial estimate of the first root be x₁^(0) = 0.5. Then

G = (2x₁^(0) − 4) / ((x₁^(0))² − 4x₁^(0) + 3) = −3/1.25 = −2.4

H = G² − 2 / ((x₁^(0))² − 4x₁^(0) + 3) = 5.76 − 1.6 = 4.16

α = 2 / (G ± √((2 − 1)(2H − G²))) = 2/(−2.4 ± 1.6) = −0.5
    (choosing −4 as the largest denominator)

so x₁^(1) = x₁^(0) − α = 0.5 − (−0.5) = 1.0, which is an exact root of P₂.
3.3.2 Horner's method for evaluating a polynomial and its derivatives

Horner's method evaluates a polynomial, together with its first and second derivatives, in a
single pass through the coefficients, accumulating P, P′ and P″ as each coefficient is
absorbed.

3.3.3 Example

Following is an example of how Horner's method can be used to evaluate

P₃(x) = 1 + 2x − 3x² + x³

and its derivatives.

1. Start with the leading coefficient: P = 1, P′ = 0 and P″ = 0.
2. Absorb the x² coefficient: P″ = 0 + 0, P′ = 0 + 1 and P = x − 3.
3. Absorb the x coefficient: P″ = 0 + 1, P′ = x + (x − 3) and P = x(x − 3) + 2.
4. Absorb the constant: P″ = x + (2x − 3), P′ = x(2x − 3) + x(x − 3) + 2 and
   P = x(x(x − 3) + 2) + 1.
5. Finally, P″ = 2(3x − 3).
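The two pieces fit together naturally in code. Below is an illustrative Python sketch in which
horner returns P, P′ and P″ in one pass, and laguerre applies the update
α = n/(G ± √((n−1)(nH − G²))); complex arithmetic is used, as noted above. On
P₂(x) = x² − 4x + 3 from x = 0.5 it reproduces the worked example, reaching the root 1 in one
step:

import cmath

def horner(a, x):
    # evaluate P(x), P'(x), P''(x); a = [a0, a1, ..., an], lowest order first
    p, dp, ddp = a[-1], 0.0, 0.0
    for c in reversed(a[:-1]):
        ddp = ddp * x + dp        # accumulates P''/2
        dp = dp * x + p
        p = p * x + c
    return p, dp, 2.0 * ddp

def laguerre(a, x, tol=1e-12, maxits=100):
    # one root of the polynomial with coefficients a; x may become complex
    n = len(a) - 1
    for _ in range(maxits):
        p, dp, ddp = horner(a, x)
        if abs(p) < tol:
            return x
        G = dp / p
        H = G * G - ddp / p
        root = cmath.sqrt((n - 1) * (n * H - G * G))
        d = G + root if abs(G + root) > abs(G - root) else G - root
        x = x - n / d             # alpha = n / d, largest-magnitude denominator
    return x

print(laguerre([3.0, -4.0, 1.0], 0.5))   # x^2 - 4x + 3: converges to the root 1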
3.3.4 Deflation

Dividing a polynomial of order n by a factor (x − x₁) may be performed using the following
algorithm:

r ← aₙ;
aₙ ← 0.0;
for i in n − 1 ... 0 loop
    q ← aᵢ;
    aᵢ ← r;
    r ← x₁ * r + q;
end loop;

The coefficients of the new polynomial are stored in the array aᵢ, and the remainder in r.
3.4 Systems of Nonlinear Equations

We now consider solving a system of n nonlinear equations in n unknowns,

fᵢ(x₁, x₂, ..., xₙ) = 0,   i = 1, 2, ..., n.

Newton's method requires the Jacobian matrix of first partial derivatives,

f′ᵢⱼ(x) = ∂fᵢ(x)/∂xⱼ.        (3.23)

If these gradients are approximately calculated, the method is called a quasi-Newton method.
One way to approximately form the gradients is to calculate fᵢ(x − ∆xⱼ) and fᵢ(x + ∆xⱼ) and
then form the finite difference approximation:

f′ᵢⱼ(x) ≈ (fᵢ(x + ∆xⱼ) − fᵢ(x − ∆xⱼ)) / (2∆xⱼ)        (3.24)

Linearising f about x^(k) and setting the linear model to zero gives, in component form,

f′ᵢⱼ(x^(k)) δⱼ^(k+1) = −fᵢ(x^(k))        (3.26)
c Department of Engineering Science, University of Auckland, New Zealand. All rights reserved. August 19, 2014
3.4 S YSTEMS OF N ONLINEAR E QUATIONS 43
where δⱼ^(k+1) = xⱼ^(k+1) − xⱼ^(k) is the vector of updates. Equation (3.26) is a set of n
linear equations in the n unknowns δⱼ^(k+1). If the Jacobian is non-singular, then the system
of linear equations can be solved to provide the Newton update:

xⱼ^(k+1) = xⱼ^(k) + δⱼ^(k+1).        (3.27)

Each step of Newton's method requires the solution of a set of linear equations. For small n
the set of linear equations may be solved using LU decomposition. For large n, alternative
iterative methods may be required. As with the one-dimensional version, Newton's method
converges quadratically if the initial estimate, x^(0), is sufficiently close to a root.
Newton's method in multiple dimensions suffers from the same global convergence problems as
its one-dimensional counterpart.
3.4.1 Example 1

Consider finding x and y such that the following are true:

f₁(x, y) = x² − 2x + y² − 2y − 2xy + 1 = 0
f₂(x, y) = x² + 2x + y² + 2y + 2xy + 1 = 0

This is a contrived problem, but it serves to illustrate the application of Newton's method in
multiple dimensions. Solving for the simultaneous roots of these equations is equivalent to
finding the roots of a single factored equation (3.30), whose surface is plotted in
Figure 3.9. Inspection shows that this equation has the root (0, −1).

The first step in applying Newton's method is to determine the form of the Jacobian and the
right hand side function vector for calculating the update vector at each iteration. These
are:

[ ∂f₁(x^(k), y^(k))/∂x   ∂f₁(x^(k), y^(k))/∂y ] [ δx^(k+1) ]   [ −f₁(x^(k), y^(k)) ]
[ ∂f₂(x^(k), y^(k))/∂x   ∂f₂(x^(k), y^(k))/∂y ] [ δy^(k+1) ] = [ −f₂(x^(k), y^(k)) ]        (3.31)

[ 2(x − y − 1)   2(y − x − 1) ] [ δx^(k+1) ]   [ −x² + 2x − y² + 2y + 2xy − 1 ]
[ 2(x + y + 1)   2(x + y + 1) ] [ δy^(k+1) ] = [ −x² − 2x − y² − 2y − 2xy − 1 ]        (3.32)
Figure 3.9: Sequence of Newton steps plotted on the surface defined by Equation (3.30).
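The Newton loop itself is short. The sketch below (illustrative) applies it to a stand-in
2 × 2 system, x² + y² = 4 and xy = 1, rather than the partially reproduced system of this
example:

import numpy as np

def newton_system(f, jac, x0, tol=1e-12, maxits=50):
    # multivariate Newton: solve J(x^k) delta = -f(x^k), then update x
    x = np.asarray(x0, dtype=float)
    for k in range(maxits):
        delta = np.linalg.solve(jac(x), -f(x))   # Eq. (3.26): one LU solve per step
        x = x + delta
        if np.linalg.norm(delta) < tol:
            return x, k + 1
    return x, maxits

# illustrative system (not the one from the notes): x^2 + y^2 = 4, xy = 1
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
jac = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])
print(newton_system(f, jac, [2.0, 0.5]))   # quadratic convergence near the root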
3.4.2 Example 2

Newton's method for finding roots in multiple dimensions becomes very useful when finding
approximate solutions to non-linear partial differential equations. Consider the diffusion
equation in one dimension, where the diffusivity, D = Ae^c, is an exponential function of the
concentration:

∂c/∂t − Ae^c ∂²c/∂x² = 0        (3.33)
Figure 3.10: Convergence histories of the Newton iterations for Example 1 from the starting
points (−1, −3), (2, 0) and (2, −3).
Discretising the spatial derivative and the time derivative gives, at each internal node i, a
nonlinear function fᵢ of the unknown nodal concentrations at the new time step. The
corresponding right hand side vector entry for the Newton update is −fᵢ. Note that, in
contrast to the case of the linear diffusion equation, the system of discrete finite
difference equations is now time varying and must be constructed and factorised at each time
step.

If Dirichlet boundary conditions of 1 and 0 are applied at nodes 1 and n, respectively, then
the functions whose roots are to be found are:

f₁ = c₁ⁿ⁺¹ − 1        (3.37)
fₙ = cₙⁿ⁺¹        (3.38)
The right hand side vector entries for the Newton update are 1 − c₁ⁿ⁺¹ and −cₙⁿ⁺¹,
respectively.

Note that the unknown dependent variables for this non-linear problem are cⁿ⁺¹. Let these
unknowns be referred to simply as c, so that the kth Newton iterate becomes c^(k). The
algorithm for solving this problem is similar to that of Example 1:
for t in 0 ··· T loop
    k ← 0;
    ∆ ← (a value > ε);
    specify starting point c^(k); (this will be the solution from the previous time step)
    while ∆ > ε and k < N loop
        δ^(k+1) ← −f′⁻¹ f; (LU factorisation and solution)
        c^(k+1) ← c^(k) + δ^(k+1);
        ∆ ← ‖δ^(k+1) − δ^(k)‖₂;
        k ← k + 1;
    end loop;
end loop;
The coefficient A was chosen such that D = 1 m² s⁻¹ at c = 1 kg m⁻². The linear solutions
were calculated using D = 0.632 m² s⁻¹; this value was chosen as the integrated mean value of
D(c) from c = 0 kg m⁻² to c = 1 kg m⁻². The variation of the diffusion coefficient with
concentration is shown in Figure 3.12. The non-linear diffusion coefficient is smaller than
the constant diffusion coefficient at lower concentrations. This is observed in the solutions:
the concentration profile for the non-linear diffusion does not advance as far as for constant
diffusion over the same interval of time.
Figure 3.11: Comparative solutions for constant (linear) diffusion and non-linear diffusion:
concentration profiles against x (m) at t = 5 s, 10 s and 50 s, and concentration against
time (s).
Figure 3.12: Variation of the diffusion coefficient with concentration: D = 0.368 exp(c)
compared with the constant value D = 0.632.
Solving this example problem produced the following non-linear iteration information over the
course of the first time step:
Time: 1 second
Iteration: 1 L2Norm(F) = 0.1000E+01 Delta=0.1026E+01
Iteration: 2 L2Norm(F) = 0.5549E-01 Delta=0.9876E+00
Iteration: 3 L2Norm(F) = 0.8958E-03 Delta=0.3755E-01
Iteration: 4 L2Norm(F) = 0.2278E-06 Delta=0.5565E-03
Iteration: 5 L2Norm(F) = 0.1481E-13 Delta=0.1438E-06
Note that F is the right hand side vector, so its magnitude (or L2 norm) may be thought of as an
indication of how good the solution at that iteration is. The solution is accurate when F is very
small. Delta is a measure of how much the solution is changing from one iteration to the next.
The Newton iterations are clearly converging.
Chapter 4
Univariate Minimisation
FIGURE 4.1 [sketch of a function f(x)]
A function f(x) is quasiconvex in [a, b] if there exists a unique x* ∈ [a, b] such that, given α, β ∈ [a, b] where α < β:
• if f(α) > f(β) then f(γ) > f(β) for all γ ∈ [a, α);
• if f(α) < f(β) then f(γ) > f(α) for all γ ∈ (β, b].
If f (x) is quasiconvex in [a, b], it is possible to reduce the interval of uncertainty by comparing the
values of f (x) at two interior points.
• If f(α) > f(β) then the new interval of uncertainty is [α, b].
• If f(α) < f(β) then the new interval of uncertainty is [a, β].
FIGURE 4.2 [the two interval-reduction cases, with interior points α and β in [a, b]]
FIGURE 4.3 [uniform search: the new interval of uncertainty is [a + (κ − 1)δ, a + (κ + 1)δ]]
The uniform search method requires that the total number of function evaluations, K, be chosen
a priori. The interval of uncertainty is reduced after K function evaluations to:
2δ = 2(b − a)/(K − 1)

i.e.

K = (b − a)/δ + 1.
Thus, if we require a small interval of uncertainty, a large number of function evaluations must be
made.
FIGURE 4.4 [the two Fibonacci interval-reduction cases, with interior points αₖ and βₖ in [aₖ, bₖ]]
The Fibonacci search method is based on the Fibonacci numbers, defined by the recurrence

Fₖ = Fₖ₋₁ + Fₖ₋₂

where F₀ = F₁ = 1, i.e. the first few values are: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . .
Let K be the total number of function evaluations to be made. At iteration k, let the interval of
uncertainty be [ak , bk ] and define:
αₖ = aₖ + (F_{K−k−1}/F_{K−k+1})(bₖ − aₖ)
βₖ = aₖ + (F_{K−k}/F_{K−k+1})(bₖ − aₖ).
• If f (αk ) > f (βk ) then the new interval of uncertainty is given by [ak+1 , bk+1 ] = [αk , bk ].
• If f (αk ) < f (βk ) then the new interval of uncertainty is given by [ak+1 , bk+1 ] = [ak , βk ].
In both cases, the interval of uncertainty is reduced by the factor F_{K−k}/F_{K−k+1}.
Once (K − 2) iterations are complete, the final function evaluation needs to be made. This will halve the final uncertainty interval, and follows the same idea as in the Dichotomous Search: a final evaluation is performed at a point near the current α, i.e. f(α + δ). Based on this final function evaluation we can choose our final interval to be the one spanning the smallest value.
The Fibonacci method requires that the total number of function evaluations, K, be chosen a
priori. After (K − 1) iterations requiring K function evaluations (including the final step), the
level of uncertainty is (b − a)/F_K.
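A minimal Python sketch of the search, assuming a quasiconvex objective f on [a, b]; the final tie-breaking evaluation at f(α + δ) is omitted for brevity, so the interval returned has width 2(b − a)/F_K rather than (b − a)/F_K.

    def fibonacci_search(f, a, b, K):
        # Fibonacci search with K function evaluations (K >= 4) on a
        # quasiconvex f over [a, b]; returns the final interval.
        F = [1, 1]                              # F_0 = F_1 = 1
        while len(F) < K + 1:
            F.append(F[-1] + F[-2])
        k = 1
        alpha = a + F[K - k - 1] / F[K - k + 1] * (b - a)
        beta = a + F[K - k] / F[K - k + 1] * (b - a)
        f_alpha, f_beta = f(alpha), f(beta)
        while k < K - 2:
            k += 1
            if f_alpha > f_beta:                # minimum lies in [alpha, b]
                a, alpha, f_alpha = alpha, beta, f_beta
                beta = a + F[K - k] / F[K - k + 1] * (b - a)
                f_beta = f(beta)
            else:                               # minimum lies in [a, beta]
                b, beta, f_beta = beta, alpha, f_alpha
                alpha = a + F[K - k - 1] / F[K - k + 1] * (b - a)
                f_alpha = f(alpha)
        return a, b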
FIGURE 4.5 [final step on the interval [0, 3]: the function values at 0, 1, 2 and 3 are known, and the extra evaluation is made at 1 + δ]
Since the function value at the midpoint of each interval is already known, only one function
evaluation needs to be made at x = 1 + δ, reducing the interval of uncertainty to 1.
The golden section method chooses the interior points αₖ and βₖ so that:
• the interval of uncertainty does not depend on the outcome of the kᵗʰ iteration:

bₖ − αₖ = βₖ − aₖ

so

bₖ₊₁ − aₖ₊₁ = τ(bₖ − aₖ)

where τ ∈ [0, 1];
• either αₖ₊₁ coincides with βₖ or βₖ₊₁ coincides with αₖ, so that at iteration (k + 1) only one function evaluation is required.
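Together these two requirements force τ to satisfy τ² = 1 − τ, i.e. τ = (√5 − 1)/2 ≈ 0.618 (the golden section). A short Python sketch of the resulting search; because τ² = 1 − τ, one interior point is always reused:

    import math

    def golden_section(f, a, b, tol=1e-8):
        # Golden section search for the minimum of a quasiconvex f on [a, b].
        tau = (math.sqrt(5) - 1) / 2          # ~0.618, the reduction factor
        alpha = b - tau * (b - a)             # interior points: alpha < beta
        beta = a + tau * (b - a)
        f_alpha, f_beta = f(alpha), f(beta)
        while b - a > tol:
            if f_alpha > f_beta:              # minimum lies in [alpha, b]
                a, alpha, f_alpha = alpha, beta, f_beta
                beta = a + tau * (b - a)      # the single new evaluation
                f_beta = f(beta)
            else:                             # minimum lies in [a, beta]
                b, beta, f_beta = beta, alpha, f_alpha
                alpha = b - tau * (b - a)
                f_alpha = f(alpha)
        return (a + b) / 2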
[Figure: golden section points at 1 − τ and τ on the unit interval, i.e. interior points αₖ and βₖ in [aₖ, bₖ].]
Let the objective function be evaluated at three points: f(â), f(b̂) and f(ĉ). The extremum of the quadratic approximation through these points is given by:

x = b̂ − (1/2) [ (b̂ − â)²(f(b̂) − f(ĉ)) − (b̂ − ĉ)²(f(b̂) − f(â)) ] / [ (b̂ − â)(f(b̂) − f(ĉ)) − (b̂ − ĉ)(f(b̂) − f(â)) ]

At each iteration the new approximation to the minimum, x, replaces the abscissa (â, b̂ or ĉ) that has the largest objective function value, i.e.

x replaces â if f(â) = max(f(â), f(b̂), f(ĉ)),
x replaces b̂ if f(b̂) = max(f(â), f(b̂), f(ĉ)),    (4.4)
x replaces ĉ if f(ĉ) = max(f(â), f(b̂), f(ĉ)).
An example is given in Figure 4.7. The minimum is bracketed between points 1 and 3. A further
point 2 is defined which is at the mid-point of 1 and 3. The minimum of the parabola fitted through
1, 2, and 3 lies at point 4. The largest value of the objective function occurs at point 3 so it is
discarded. A new parabola is fitted through points 1, 4 and 2 with a minimum at point 5.
FIGURE 4.7 [successive parabolic fits to f(x): the minimum is bracketed by points 1, 2 and 3; the first fitted parabola has its minimum at point 4, and the refit through points 1, 4 and 2 has its minimum at point 5]
Brent’s method fails when f(â), f(b̂) and f(ĉ) are collinear. In this case the denominator will be zero and the new approximation to the minimum will be located infinitely far away. It should also be noted that the above update locates the extremum of the quadratic approximation; this extremum may, in fact, be a maximum rather than a minimum.
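A single interpolation step can be written directly from the formula above. This sketch returns None when the denominator (nearly) vanishes, i.e. the collinear failure mode just described:

    def parabolic_step(a, fa, b, fb, c, fc, eps=1e-12):
        # Abscissa of the extremum of the parabola through (a, fa),
        # (b, fb), (c, fc); None if the points are (nearly) collinear.
        p = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
        q = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        if abs(q) < eps:          # fall back to a safer method in practice
            return None
        return b - 0.5 * p / q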
Brent’s method is usually combined with a safer method, like golden search, to guard against
the above problems. When Brent’s method predicts an update lying outside some bracket, or
predicts a large step, the scheme switches to the golden search method for that iteration.
A typical ending for Brent’s method is that the minimum has been isolated to a fractional precision of approximately ε. Effectively this means that a and b are about 2xε apart, with x at the midpoint of a and b, i.e. x = (a + b)/2.
4.3 Derivative Methods

If the derivative f′(x) is available, the interval of uncertainty can be reduced by evaluating f′ at the midpoint of the current interval:

αₖ = (aₖ + bₖ)/2.    (4.5)
• If f′(αₖ) = 0 then by pseudoconvexity of f(x), αₖ is a minimum point;
• if f′(αₖ) > 0 then the minimum lies to the left of αₖ, and the new interval of uncertainty is [aₖ, αₖ];
• if f′(αₖ) < 0 then the minimum lies to the right of αₖ, and the new interval of uncertainty is [αₖ, bₖ].

[Figure: the two interval-reduction cases, determined by the sign of f′(αₖ).]
At each iteration only one derivative evaluation is required and the interval of uncertainty is
reduced by a factor of 1/2.
xₖ₊₁ = xₖ − f′(xₖ)/f″(xₖ).    (4.7)

Note that f(x) must be twice differentiable and f″(x) ≠ 0. Newton’s method will converge quadratically provided the starting point, x₀, is sufficiently close to a stationary point.
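A minimal sketch of this iteration; the example function is an arbitrary illustration, not one taken from the notes:

    def newton_minimise(f1, f2, x0, tol=1e-10, max_iter=50):
        # Newton's method for a stationary point, given f' (f1) and f'' (f2).
        x = x0
        for _ in range(max_iter):
            step = f1(x) / f2(x)      # requires f''(x) != 0
            x -= step
            if abs(step) < tol:
                break
        return x

    # Example: f(x) = x**4 - 3*x**2 + x, so f'(x) = 4x^3 - 6x + 1
    x_min = newton_minimise(lambda x: 4*x**3 - 6*x + 1,
                            lambda x: 12*x**2 - 6, x0=-2.0)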
Chapter 5

Numerical Methods for Ordinary Differential Equations
5.1 Revision
5.1.1 Euler’s Method
Euler’s method¹ is the simplest numerical method for solving ODEs. The method only approximates the first derivative of the function; this is accomplished by using a simple first-order finite difference approximation of the derivative (or slope).
FIGURE 5.1: Steps in Euler’s method. [Successive approximations y₀, y₁, y₂ at x₀, x₁, x₂, with slope ∆y/∆x over each step of width ∆x.]
dy/dx ≈ ∆y/∆x = (yᵏ⁺¹ − yᵏ)/h = f(xᵏ, yᵏ).    (5.1)

¹Leonhard Euler (1707–1783)
FIGURE 5.2: Euler prediction for the improved Euler method. [An Euler step from (x₀, y₀) at point A predicts the value y_B at x₀ + h; the slope at B supplies the correction.]
The Improved Euler Method uses this approach to approximate the solution at yᵏ⁺¹ such that

yᵏ⁺¹ = yᵏ + (h/2)[ f(xᵏ, yᵏ) + f(xᵏ⁺¹, yᵏ + hf(xᵏ, yᵏ)) ].    (5.4)
The Improved Euler Method is an example of a predictor-corrector method. In the first step the
value yB is predicted using an Euler step, and in the second step, this extra slope information is
used to provide a corrected estimate of y k+1 .
Note that the Improved Euler Method is a second-order method, i.e., its truncation error is ∼ O(h³), although it is more computationally expensive than Euler’s Method.
As an example, consider the equation

dy/dx = (1 + xy)², with y(0) = 1.
Solving this equation using the Improved Euler Method with step length h = 0.1 yields the following sequence of iterates:

k | xᵏ  | yᵏ     | fᵏ     | y_Eᵏ⁺¹ | f_Eᵏ⁺¹ = (1 + xᵏ⁺¹y_Eᵏ⁺¹)² | yᵏ⁺¹ = yᵏ + (h/2)(fᵏ + f_Eᵏ⁺¹)
0 | 0   | 1      | 1      | 1.1    | (1 + 0.1(1.1))² = 1.2321    | 1 + (0.1/2)(1 + 1.2321) = 1.1116
1 | 0.1 | 1.1116 | 1.2347 | 1.2351 | 1.5551                      | 1.2511
2 | 0.2 | 1.2511 | etc.   |        |                             |

given that fᵏ = (1 + xᵏyᵏ)² and y_Eᵏ⁺¹ = yᵏ + hfᵏ.
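The worked example translates directly into code; a short sketch that reproduces the table above:

    def improved_euler(f, x0, y0, h, n_steps):
        # Improved Euler (predictor-corrector): predict with an Euler step,
        # then correct using the average of the slopes at both ends.
        x, y = x0, y0
        for _ in range(n_steps):
            f_k = f(x, y)
            y_E = y + h * f_k               # Euler predictor
            f_E = f(x + h, y_E)             # slope at the predicted point
            y = y + h / 2 * (f_k + f_E)     # corrector
            x += h
        return y

    # dy/dx = (1 + x*y)**2 with y(0) = 1 and h = 0.1 gives y1 ~ 1.1116
    y1 = improved_euler(lambda x, y: (1 + x * y) ** 2, 0.0, 1.0, 0.1, 1)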
5.2 Stability
5.2.1 Euler’s Method
Euler’s method is only a first-order accurate method, and in practice it is generally necessary to use small step lengths to achieve accurate solutions. For example, consider the equation y′ = ay. The solution to this equation is y = y⁰eᵃˣ, which is a decaying exponential for a < 0. Applying Euler’s method to this equation gives:
y¹ = (1 + ha)y⁰
y² = (1 + ha)y¹ = (1 + ha)²y⁰
⋮
yᵏ = (1 + ha)ᵏy⁰
If |1 + ha| > 1 then yᵏ will increase in value. But for a < 0 we want yᵏ to decrease. So for a < 0 we require −1 < 1 + ha < 1 → 1 + ha > −1 → ha > −2 → h < −2/a.

The criterion h < −2/a models the decaying behaviour of the exponential when a < 0 and is necessary but not sufficient. If h is chosen to be as large as possible but smaller than −2/a, then 1 + ha < 0, causing (1 + ha)ᵏ and thus yᵏ to take negative values for odd values of k. To avoid this oscillatory behaviour we require 0 < 1 + ha → ha > −1 → h < −1/a. This is a stronger condition on h since −1/a < −2/a. Therefore the criterion for Euler’s method to give stable, non-oscillating solutions for this problem is h < −1/a.
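A small numerical check of these thresholds, with a = −10 chosen purely for illustration (so the bounds are h < 0.2 for stability and h < 0.1 to also avoid oscillation):

    a, y0, n = -10.0, 1.0, 20
    for h in (0.25, 0.15, 0.05):
        y, ys = y0, []
        for _ in range(n):
            y = (1 + h * a) * y        # one Euler step for y' = a*y
            ys.append(y)
        print(f"h = {h}: first steps {ys[:4]}")
    # h = 0.25: 1 + ha = -1.5 -> grows in magnitude (unstable)
    # h = 0.15: 1 + ha = -0.5 -> decays but oscillates in sign
    # h = 0.05: 1 + ha =  0.5 -> smooth decay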
Applying the Improved Euler method to the same equation gives:

y¹ = (1 + ha + h²a²/2)y⁰
y² = (1 + ha + h²a²/2)y¹ = (1 + ha + h²a²/2)²y⁰
⋮
yᵏ = (1 + ha + h²a²/2)ᵏy⁰

Therefore if |1 + ha + h²a²/2| > 1 then yᵏ will increase in value. But if a < 0 we want yᵏ to decrease and not to oscillate in sign. This implies that (for a < 0) we require 0 < 1 + ha + h²a²/2 < 1. Thus, if 1 + ha + h²a²/2 < 1 → ha(1 + ha/2) < 0 → h < −2/a. Similarly, if 1 + ha + h²a²/2 > 0 → 2 + 2ha + h²a² > 0 → (1 + ha)² > −1, which is always true since the left-hand side is non-negative and the right-hand side is negative for any choice of h. Therefore the criterion for the Improved Euler method to give stable, non-oscillating solutions for this problem is h < −2/a.
Euler’s method extends directly to systems of first-order ODEs by applying the same update to each variable:

yᵏ⁺¹ = yᵏ + hfᵏ.    (5.5)
For example, if there are two variables in the system form of the ODE, y₁ and y₂, i.e.,

dy₁/dx = f₁(x, y₁, y₂)
dy₂/dx = f₂(x, y₁, y₂),    (5.6)
then Euler’s method becomes

y₁Eᵏ⁺¹ = y₁ᵏ + hf₁(xᵏ, y₁ᵏ, y₂ᵏ)
y₂Eᵏ⁺¹ = y₂ᵏ + hf₂(xᵏ, y₁ᵏ, y₂ᵏ).    (5.7)
For the Improved Euler method applied to a system, the Euler predictor values y₁Eᵏ⁺¹ and y₂Eᵏ⁺¹ are calculated from Equation (5.7). Once these values are calculated then

y₁ᵏ⁺¹ = y₁ᵏ + (h/2)(f₁ᵏ + f₁Eᵏ⁺¹)
y₂ᵏ⁺¹ = y₂ᵏ + (h/2)(f₂ᵏ + f₂Eᵏ⁺¹),    (5.10)

where f₁Eᵏ⁺¹ = f₁(xᵏ⁺¹, y₁Eᵏ⁺¹, y₂Eᵏ⁺¹) and similarly for f₂Eᵏ⁺¹.
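In vector form this is the same predictor-corrector as before. A brief Python sketch, with a hypothetical example system:

    import numpy as np

    def improved_euler_system(f, x0, y0, h, n_steps):
        # Improved Euler for a system dy/dx = f(x, y), y a vector.
        x, y = x0, np.asarray(y0, dtype=float)
        for _ in range(n_steps):
            f_k = f(x, y)                  # slopes at the current point
            y_E = y + h * f_k              # Euler predictor, all variables
            y = y + h / 2 * (f_k + f(x + h, y_E))
            x += h
        return y

    # Example system: y1' = y2, y2' = -y1 (simple harmonic motion)
    y_end = improved_euler_system(lambda x, y: np.array([y[1], -y[0]]),
                                  0.0, [1.0, 0.0], 0.01, 100)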
5.4 Runge-Kutta Methods

Differentiating dy/dx = f(x, y) with respect to x gives

d²y/dx² = Df/Dx = ∂f/∂x + ∂f/∂y · dy/dx.    (5.11)

Using Taylor’s series

y(x + h) = y + h dy/dx + (h²/2!) d²y/dx² + O(h³)    (5.12)

and substituting in Equation (5.11) we get

y(x + h) = y + hf₀ + (h²/2!)( ∂f/∂x|₀ + f₀ ∂f/∂y|₀ ) + O(h³).    (5.13)
A general two-stage scheme has the form

y(x + h) = y + h(α₀f₀ + α₁f₁)    (5.14)

where f₀ = f(x, y) and f₁ = f(x + βh, y + γhf₀). To define the scheme we need to determine the constants α₀, α₁, β and γ. Expanding f₁ about (x, y) using Taylor’s series yields

f₁ = f₀ + βh ∂f/∂x|₀ + γhf₀ ∂f/∂y|₀ + O(h²).    (5.15)
Comparing this equation with the Taylor’s series expansion, Equation (5.13), implies that

α₀ + α₁ = 1
α₁β = 1/2    (5.17)
α₁γ = 1/2.
This system consists of three equations with four unknowns, so if we specify one of the parameters, the other three can be determined. For example, if we choose β = 1, then α₀ = α₁ = 1/2 and γ = 1. Therefore Equation (5.14) becomes y(x + h) = y + (h/2)(f₀ + f₁), where f₀ = f(x, y) and f₁ = f(x + h, y + hf₀). This is the formulation for the Improved Euler method discussed earlier. Therefore the Improved Euler is a second-order Runge-Kutta scheme and will therefore have truncation error ∼ O(h³).
Other second-order Runge-Kutta schemes can be derived by prescribing different parameter values. For example, choosing α₁ = 1 gives α₀ = 0, β = 1/2 and γ = 1/2, leading to the scheme defined by

y(x + h) = y + hf(x + h/2, y + (h/2)f₀).
A third-order Runge-Kutta scheme is given by

f₀ = f(xᵏ, yᵏ)
f₁ = f(xᵏ + h/2, yᵏ + (h/2)f₀)
f₂ = f(xᵏ + h, yᵏ − hf₀ + 2hf₁)    (5.18)
yᵏ⁺¹ = yᵏ + (h/6)(f₀ + 4f₁ + f₂).
The classical fourth-order Runge-Kutta scheme is

f₀ = f(xᵏ, yᵏ)
f₁ = f(xᵏ + h/2, yᵏ + (h/2)f₀)
f₂ = f(xᵏ + h/2, yᵏ + (h/2)f₁)    (5.19)
f₃ = f(xᵏ + h, yᵏ + hf₂)
yᵏ⁺¹ = yᵏ + (h/6)(f₀ + 2f₁ + 2f₂ + f₃),

with a truncation error ∼ O(h⁵).
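A direct transcription of (5.19); the sketch works for scalar y or a numpy vector:

    def rk4_step(f, x, y, h):
        # One step of the classical fourth-order Runge-Kutta scheme (5.19).
        f0 = f(x, y)
        f1 = f(x + h / 2, y + h / 2 * f0)
        f2 = f(x + h / 2, y + h / 2 * f1)
        f3 = f(x + h, y + h * f2)
        return y + h / 6 * (f0 + 2 * f1 + 2 * f2 + f3)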
Therefore the error in one step is O(h⁵), implying that the truncation error involves terms which are proportional to h⁵ or higher powers. If we halve the step length then the error per step is proportional to (h/2)⁵ = h⁵/32, but to get to the same point we must take twice as many steps, which means the error at the final point is E ∝ 2(h/2)⁵ = h⁵/16. Therefore by halving the step length we reduce the error by a factor of 16. Similarly, if we double the step length we increase the error by a factor of 16. The global error is therefore proportional to h⁴, which is why the fourth-order Runge-Kutta method is termed a fourth-order method. In general, if a method is said to be n-th order, it will have a truncation error of O(hⁿ⁺¹).
Two ways of incorporating adaptive step-size control into single-step methods like those above
are step-doubling (also called step-halving) and embedded Runge-Kutta methods. Step-doubling
involves taking each step twice, once as a full step and then as two half steps. The difference in the
two results gives an estimate of the truncation error. Embedded Runge-Kutta methods estimate the
truncation error to be the difference between two predictions using different-order Runge-Kutta
methods. While step-doubling is simple to implement, embedded Runge-Kutta methods are more
commonly used because they are more efficient.
Although it requires more effort at each step, the use of an adaptive scheme can lead to dramatic improvements in both computational efficiency and accuracy. Clearly an adaptive scheme will lead to improved accuracy, as inaccurate steps will be rejected. The computational efficiency will improve in cases where the function is rapidly changing in some regions and smooth in others: accuracy can be maintained in the rapidly changing regions by using small steps, while in the smooth regions the adaptive scheme will increase the step length so that they can be traversed rapidly.
5.6.1 Step-doubling
As noted above, step-doubling takes each step twice: once as a full step of length 2h and again as two half steps of length h. The difference between the two results gives an estimate of the truncation error.
Consider implementing step-doubling for a fourth-order Runge-Kutta scheme (error ∼ O (h5 )).
The calculations required are shown in Figure 5.3.
FIGURE 5.3: Step doubling in a 4th order Runge-Kutta method. Points where the derivative is evaluated are shown as filled circles. The open circle represents the same derivatives as the filled circle immediately above it. [One big step of 2h compared with two small steps of length h.]
We can see in Figure 5.3 that doing each step twice requires a total of 11 evaluations. This is to be compared with 8 evaluations from just doing the two small steps. Therefore the adaptive method is 11/8 = 1.375 times more expensive. So why does this method justify the extra computational expense at each step?
Let y(x + 2h) be the exact solution when advancing from x to x + 2h, let y₁ be the approximate solution determined by taking one big step of 2h, and let y₂ be the approximate solution determined by taking two smaller steps of length h. Then we have

y(x + 2h) = y₁ + (2h)⁵φ + O(h⁶)    (5.20)
y(x + 2h) = y₂ + 2h⁵φ + O(h⁶),    (5.21)
where, to order h⁵, φ is a number which remains constant over the step and, from Taylor’s series, will be of order of magnitude (1/5!) d⁵y/dx⁵.
The advantage of using step-doubling is that we can use ∆ = y₂ − y₁ as an indicator of the truncation error. ∆ can be used to control the error and hence the step size. If step length h₁ gives the error estimate ∆₁ and the desired accuracy is ∆₀, then the step length required to achieve the desired accuracy can be estimated as

h₀ = h₁(∆₀/∆₁)^{1/5}

(the exponent 1/5 arises because the error is proportional to the fifth power of h). Therefore:

If |∆₁| > |∆₀| then calculate h₀ = h₁(∆₀/∆₁)^{1/5} and redo the step with this smaller step length.
If |∆₁| < |∆₀| then use h₀ = h₁(∆₀/∆₁)^{1/5} as a larger step length for the next step.
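A sketch of this control loop around the rk4_step function given earlier, for a scalar problem; growth of h is left uncapped here, which a production implementation would limit:

    def adaptive_rk4(f, x, y, h, x_end, tol):
        # Step-doubling: compare one step of 2h with two steps of h.
        while x < x_end:
            y_big = rk4_step(f, x, y, 2 * h)            # one big step
            y_two = rk4_step(f, x + h, rk4_step(f, x, y, h), h)
            delta = abs(y_two - y_big)                  # error indicator
            if delta > tol:
                h *= (tol / delta) ** 0.2               # shrink and redo
                continue
            x, y = x + 2 * h, y_two                     # accept the step
            if delta > 0:
                h *= (tol / delta) ** 0.2               # grow for next step
        return y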
An example of an embedded Runge-Kutta scheme is

f₀ = f(xᵏ, yᵏ)
f₁ = f(xᵏ + h/2, yᵏ + (h/2)f₀)
f₂ = f(xᵏ + 3h/4, yᵏ + (3h/4)f₁)    (5.22)
yᵏ⁺¹ = yᵏ + (h/9)(2f₀ + 3f₁ + 4f₂).

The error is estimated as

∆ = (h/72)(−5f₀ + 6f₁ + 8f₂ − 9f₃)    (5.23)
where f₃ = f(xᵏ⁺¹, yᵏ⁺¹). Although there appear to be four function evaluations, there are really only three because, after the first step, the f₀ for the present step will be the f₃ from the previous step.
Similarly, using fourth- and fifth-order Runge-Kutta methods in tandem amounts to a total of
10 function evaluations per step. However, by deriving a fifth-order method that employs most of
the same function evaluations as the fourth-order method, an estimate of the truncation error can
be determined using only six function evaluations.
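A sketch of the embedded pair (5.22)-(5.23); passing f₃ back in as f₀ on the next call realises the three-evaluations-per-step saving:

    def rk23_step(f, x, y, h, f0=None):
        # One step of (5.22) with the error estimate (5.23).
        # Returns (y_new, delta, f3); pass f3 back as f0 on the next call.
        if f0 is None:
            f0 = f(x, y)
        f1 = f(x + h / 2, y + h / 2 * f0)
        f2 = f(x + 3 * h / 4, y + 3 * h / 4 * f1)
        y_new = y + h / 9 * (2 * f0 + 3 * f1 + 4 * f2)
        f3 = f(x + h, y_new)
        delta = h / 72 * (-5 * f0 + 6 * f1 + 8 * f2 - 9 * f3)
        return y_new, delta, f3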
Chapter 6
Eigenproblems
6.1 Eigenvalues and Eigenvectors

Consider the transformation b = Ax of a vector x by the matrix A = [1 4; 2 3]. For a general choice of x, the transformed vector b points in a different direction from x.

FIGURE 6.1 [Ax for a general vector x]

Now let us try x = [1; 1]; then

Ax = [1 4; 2 3][1; 1] = [5; 5] = 5[1; 1] = 5x
FIGURE 6.2 [Ax parallel to x for x = [1; 1]]
As a third example, try x = [−2; 1]; then

Ax = [1 4; 2 3][−2; 1] = [2; −1] = −1[−2; 1] = −1x
The transformation of the last two x vectors produced a vector b that was in the same direction (give or take a negative sign) as the original x vector. In other words, the x direction vector is scaled by some amount, λ, to give the b vector, i.e. b = Ax = λx. These special x directions are called eigenvectors, and the scale factor λ is called an eigenvalue. These eigenvectors and eigenvalues are properties of the matrix A. For the above example the matrix A = [1 4; 2 3] has an eigenvector x = [1; 1] with eigenvalue λ = 5 and another eigenvector x = [−2; 1] with eigenvalue λ = −1.
In general,

Ax = λx    (6.1)

or

(A − λI)x = 0,

that is

[1 − λ  4; 2  3 − λ][x₁; x₂] = [0; 0].

This has the trivial solution x = 0, or a nontrivial solution if det[A − λI] = 0, that is

det[1 − λ  4; 2  3 − λ] = (1 − λ)(3 − λ) − 2 × 4 = 0.
Therefore

3 − λ − 3λ + λ² − 8 = 0
λ² − 4λ − 5 = 0
(λ − 5)(λ + 1) = 0,

so the eigenvalues are λ = 5 and λ = −1. Substituting λ = 5 gives

[1 − 5  4; 2  3 − 5][x₁; x₂] = [−4  4; 2  −2][x₁; x₂] = [0; 0].
The first equation gives −4x₁ + 4x₂ = 0 or x₁ = x₂. The second equation gives the same result, 2x₁ − 2x₂ = 0 or x₁ = x₂. Note that the determinant is zero, as required. Thus the eigenvector is [1; 1] or [k; k]: only direction matters, not magnitude. Similarly, when λ = −1,

[1 − (−1)  4; 2  3 − (−1)][x₁; x₂] = [0; 0]  or  [2  4; 2  4][x₁; x₂] = [0; 0],

which gives x₁ = −2x₂. Thus the eigenvector is [−2; 1].
Now let s₁ = [1; 1] and s₂ = [−2; 1] be the eigenvectors; then we can write

As₁ = λ₁s₁ and As₂ = λ₂s₂  ⇒  A[s₁ s₂] = [s₁ s₂][λ₁ 0; 0 λ₂]

or

AS = SΛ

where

Λ = [λ₁ 0; 0 λ₂] and S = [s₁ s₂],

i.e., the iᵗʰ column of the matrix S is the iᵗʰ eigenvector. Hence we have

S⁻¹AS = Λ    (6.2)

and therefore

S⁻¹AS = [1/3 2/3; −1/3 1/3][1 4; 2 3][1 −2; 1 1] = [1/3 2/3; −1/3 1/3][5 2; 5 −1] = [5 0; 0 −1] = Λ.
6.2 Power Method

Write an arbitrary starting vector y as a linear combination of the eigenvectors x₁, . . . , xₙ of A:

y = c₁x₁ + c₂x₂ + · · · + cₙxₙ

FIGURE 6.3 [y and Ay]
If we assume that λ₁ is the eigenvalue with the largest absolute value, we can obtain

Ay = λ₁( c₁x₁ + (λ₂/λ₁)c₂x₂ + · · · + (λₙ/λ₁)cₙxₙ ).

After m premultiplications:

Aᵐy = λ₁ᵐ( c₁x₁ + (λ₂/λ₁)ᵐc₂x₂ + · · · + (λₙ/λ₁)ᵐcₙxₙ ),

where each bracketed ratio (λᵢ/λ₁)ᵐ, i ≥ 2, tends to zero.
As m → ∞, the other terms drop out, and the vector Am y points in the direction of x1 . The
graphical representation of this is shown in Figure 6.4.
FIGURE 6.4 [Ay and A²y rotating towards the dominant eigenvector x₁]
This has converged to the first significant figure. The eigenvalue is given by the normalising factor, i.e. √(5.261² + (−3.390)² + 0.640²) = 6.291, and thus:

Aŷᵐ = [5.261; −3.390; 0.640] = 6.291[0.836; −0.539; 0.102] = λ₁ŷᵐ.
Therefore the eigenvalue is λ₁ = 6.291, with eigenvector s = [0.836; −0.539; 0.102]. If λ₂ is the second largest eigenvalue, then the speed of convergence depends on how quickly (λ₂/λ₁)ᵐ → 0: if λ₂ is close to λ₁, convergence is slow, and the method fails if λ₂ = λ₁.
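A brief Python sketch of the power method. The example matrix A = [5 −2 0; −2 3 −1; 0 −1 1] is inferred from the shifted matrix B given in the next subsection; with it, the sketch reproduces λ₁ ≈ 6.291. The convergence test assumes the dominant eigenvalue is positive:

    import numpy as np

    def power_method(A, y, tol=1e-6, max_iter=100):
        # Repeatedly apply A and normalise; the normalising factor tends
        # to |lambda_1| and y to the dominant eigenvector.
        y = y / np.linalg.norm(y)
        for _ in range(max_iter):
            Ay = A @ y
            lam = np.linalg.norm(Ay)
            y_new = Ay / lam
            if np.linalg.norm(y_new - y) < tol:
                break
            y = y_new
        return lam, y_new

    A = np.array([[5., -2., 0.], [-2., 3., -1.], [0., -1., 1.]])
    lam1, s1 = power_method(A, np.array([1., 0., 0.]))
    # lam1 ~ 6.291, s1 ~ [0.836, -0.539, 0.102]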
6.2.2 Shifting
The eigenvalues and eigenvectors of A satisfy Equation (6.1). We can rewrite this equation by
shifting the eigenvalues of A by an arbitrary amount p, i.e.
Ax = λx = (λ − p) x + px
or
(A − pI) x = (λ − p) x.
Thus,
Bx = µx
where
B = A − pI and µ = λ − p
i.e. the eigenvectors of B are the same as A, but the eigenvalues are shifted by p.
If we choose p = λ1 , where λ1 has been determined from the method shown above, we will
shift the largest eigenvalue of A to zero and the most negative eigenvalue of A into prominence as
the eigenvalue with the largest absolute value. Consider now applying the power method to the B
matrix with p = λ1 = 6.291.
B = [5 − p  −2  0; −2  3 − p  −1; 0  −1  1 − p] = [−1.291  −2  0; −2  −3.291  −1; 0  −1  −5.291]
The eigenvector for the µ of largest absolute value is the same as that for the smallest λ (but not the smallest in absolute value), and λ_smallest = µ_largest + p.
Start with y = [0; 0.5; 1] and calculate:

y¹ = By = [−1.291 −2 0; −2 −3.291 −1; 0 −1 −5.291][0; 0.5; 1] = [−1; −2.646; −5.791].

Normalise y¹, i.e. divide by 6.445, to give ŷ¹ = [−0.155; −0.411; −0.899]. Iterating on this process as per the power method yields µ_largest and hence λ_smallest = µ_largest + p.
6.2.3 Deflation
We can find other eigenvalues by “removing” the highest eigenvalue and corresponding eigenvector
from A and reapplying the power method. One method, called deflation, for removing the highest
eigenvalue produces a new matrix with the same eigenvalues as the original matrix except for λ1
which is now zero. Hence λ2 will become the largest eigenvalue which the power method will find.
This new matrix, B, is found by

B = A − λ₁x₁x₁ᵀ

where λ₁ is the highest eigenvalue and x₁ is the corresponding eigenvector of unit length. Note that x₁x₁ᵀ is the outer or matrix product of the two vectors (as opposed to the inner or dot product x₁ᵀx₁, which is a scalar).
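A two-line sketch of the deflation step; the formula behaves as described when A is symmetric, so that its eigenvectors are orthogonal (true of the example matrix used in this chapter):

    import numpy as np

    def deflate(A, lam1, x1):
        # B = A - lambda_1 * x1 x1^T with x1 of unit length: lambda_1 -> 0.
        x1 = x1 / np.linalg.norm(x1)
        return A - lam1 * np.outer(x1, x1)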
6.3 Inverse Iteration

The inverse iteration method is based on solving the linear system

(A − τI)y = b

where b is some random vector, y is the solution to this linear system, and τ is close to some eigenvalue λ of A. Then y will be close to the eigenvector corresponding to λ. We can iterate on this method by replacing b by y and solving for a new y, which will be even closer to the true eigenvector x.
Suppose that at iteration i we have some approximate value of the eigenvalue, λᵢ, and some approximate eigenvector, xᵢ. Then we are solving the equation (by the LU method or similar)

(A − λᵢI)y = xᵢ.    (6.3)

Now the vector y in Equation (6.3) is a better approximation to x than xᵢ. We can therefore improve our guess at the eigenvector by setting xᵢ₊₁ to the normalised value of y, i.e.

xᵢ₊₁ = y/‖y‖₂.    (6.4)
To improve our guess of the eigenvalue we note that the exact eigenvalue and eigenvector will
solve
Ax = λx
and hence
(A − λi I) x = (λ − λi ) x. (6.5)
Substituting our new guess at the eigenvector, y, into Equation (6.5) we get
(A − λi I) y = (λ − λi ) y
From Equation (6.3) we can replace the left hand side of the above equation by xi , i.e.
xi = (λ − λi ) y
If we take the dot product of each side with xᵢ and replace λ by our new guess λᵢ₊₁ we obtain

xᵢ · xᵢ = (λᵢ₊₁ − λᵢ)xᵢ · y,

so that

λᵢ₊₁ = λᵢ + ‖xᵢ‖₂²/(xᵢ · y).    (6.6)
There are some practical points that should be noted about the inverse iteration method.
• If τ (the initial guess or approximation to the eigenvalue) is equal to λ (the actual eigenvalue)
then (A − τ I) is singular, and the system cannot be solved. In any case for τ close to λ (as
desired) the system (A − τ I) will be nearly singular and a special LU scheme for coping
with zero or close to zero pivots must be used.
• In practice the approximation to the eigenvalue, λᵢ, is not updated at every iteration. As most of the work in this method is spent solving Equation (6.3), computational savings can be made by using forward and backward substitution on the factored matrix (A − λᵢI). This, however, can only happen if λᵢ is constant between iterations. The question then becomes how many iterations λᵢ should be used for before it is updated and the matrix (A − λᵢI) refactorised.
• The method will only find one eigenvector for repeated eigenvalues.
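With those caveats in mind, a compact Python sketch of the iteration (updating λᵢ at every step for simplicity, and relying on numpy's general solver rather than a reusable LU factorisation):

    import numpy as np

    def inverse_iteration(A, tau, n_iter=10):
        # Eigenpair of A with eigenvalue nearest tau.
        n = A.shape[0]
        lam = tau
        x = np.random.rand(n)
        x /= np.linalg.norm(x)
        for _ in range(n_iter):
            y = np.linalg.solve(A - lam * np.eye(n), x)   # Eq. (6.3)
            lam = lam + np.dot(x, x) / np.dot(x, y)       # Eq. (6.6)
            x = y / np.linalg.norm(y)                     # Eq. (6.4)
        return lam, x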
Update λ:

x₃ · y₃ = −56.3349 ⇒ λ₄ = λ₃ + 1/(x₃ · y₃) = 6.2899.

Solve again using x₄ and λ₄:

y₄ = [−2.223; 1.434; −0.271] × 10⁵, ‖y₄‖₂ = 2.6590 × 10⁵ ⇒ x₅ = y₄/‖y₄‖₂ = [−0.836; 0.539; −0.102]

Update λ:

x₄ · y₄ = 2.6590 × 10⁵ ⇒ λ₅ = λ₄ + 1/(x₄ · y₄) = 6.2899.
Both eigenvector and eigenvalue have converged to 3 significant figures.
If this continues until the resultant product of A and the Pᵢ's is in diagonal form, then, from Λ = S⁻¹AS, the eigenvalues are the values on the diagonal and the eigenvectors are the columns of the accumulated transformations, i.e.

S = P₁P₂P₃ . . .
If only eigenvalues are required, then transforming to an upper (or lower) triangular matrix is
sufficient, and the eigenvalues are the diagonal values (as can be seen by taking a determinant).
There are several methods for performing these transformations, which will be covered next year. Two methods zero the off-diagonal elements: Jacobi works element by element, and Householder works column by column. These two methods are generally used to produce only a tridiagonal matrix; further algorithms exist for finding eigenvalues and eigenvectors of tridiagonal matrices. Another important method is the QR method, based on Gram-Schmidt orthogonalisation.
6.5 Applications
6.5.1 Uncoupling systems of linear equations
Consider solving a system of equations, Ax = b. We rotate these equations to a new system of coordinates based on the eigenvectors of A by letting x = Sx* and b = Sb*. Our system of equations now becomes

Ax = b ⇒ ASx* = Sb* ⇒ S⁻¹ASx* = b*,

or

Λx* = b*.
This is a diagonal system of equations which is easy to solve, e.g. the first equation is simply

λ₁x₁* = b₁*.

A diagonal system of equations like this is said to be uncoupled, as each variable xᵢ* does not affect the value of the other variables xⱼ*.
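A short sketch of this procedure, using numpy's eigendecomposition in place of the hand calculation above (valid when A has a full set of eigenvectors and no zero eigenvalues):

    import numpy as np

    def solve_by_uncoupling(A, b):
        # With x = S x* and b = S b*, Ax = b becomes Lambda x* = b*.
        lam, S = np.linalg.eig(A)         # eigenvalues, eigenvector matrix
        b_star = np.linalg.solve(S, b)    # b* = S^{-1} b
        x_star = b_star / lam             # uncoupled: x*_i = b*_i / lambda_i
        return S @ x_star                 # back to original coordinates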
Powers of A can be computed in the same way: from A = SΛS⁻¹, we have A² = SΛS⁻¹SΛS⁻¹ = SΛ²S⁻¹, since

ΛΛ = diag(λ₁, . . . , λₙ) diag(λ₁, . . . , λₙ) = diag(λ₁², . . . , λₙ²).