
ENGSCI 331: Computational Techniques II

Course Note Set

August 19, 2014

© Department of Engineering Science, University of Auckland, New Zealand. All rights reserved.
Contents

1 Linear Equations
  1.1 Revision
    1.1.1 System of Equations
    1.1.2 LU Factorisation
    1.1.3 Computational Issues
    1.1.4 Partial Pivoting
    1.1.5 Total Pivoting
    1.1.6 Applications
  1.2 Solution of Linear Systems
    1.2.1 Matrix Inversion
    1.2.2 Gaussian Elimination
  1.3 LU Factorisation Methods
    1.3.1 Crout Factorisation
    1.3.2 Cholesky Factorisation
    1.3.3 Iterative Improvement
  1.4 Iterative Solution Methods
    1.4.1 Jacobi Iterative Method
    1.4.2 Gauss-Seidel Iterative Method

2 Finite Differences for Partial Differential Equations
  2.1 Introduction
  2.2 Classification of Partial Differential Equations
    2.2.1 Equilibrium problems
    2.2.2 Propagation problems
  2.3 Finite Differences
    2.3.1 First derivative finite differences
    2.3.2 Second derivative finite differences
  2.4 Difference Equations for Equilibrium Problems
    2.4.1 Dirichlet boundary condition problem
    2.4.2 Derivative boundary conditions
  2.5 Difference Equations for Propagation Problems
    2.5.1 Explicit formulations
    2.5.2 Implicit formulations
  2.6 Von Neumann Stability Analysis for Propagation Problems

3 Nonlinear Equations
  3.1 Revision
    3.1.1 The Problem of Nonlinear Root-finding
    3.1.2 Rate of Convergence
    3.1.3 Termination Criteria
    3.1.4 Bisection Method
    3.1.5 Secant Method
    3.1.6 Regula Falsi
    3.1.7 Newton's Method
    3.1.8 Examples
  3.2 Combining Methods
  3.3 Laguerre's Method
    3.3.1 Example
    3.3.2 Horner's method for evaluating a polynomial and its derivatives
    3.3.3 Example
    3.3.4 Deflation
  3.4 Systems of Nonlinear Equations
    3.4.1 Example 1
    3.4.2 Example 2

4 Univariate Minimisation
  4.1 The Problem of Univariate Minimisation
  4.2 Non-derivative Methods
    4.2.1 Definitions
    4.2.2 Uniform Search
    4.2.3 Dichotomous Search
    4.2.4 Fibonacci Search
    4.2.5 Golden Section Search
    4.2.6 Brent's Method
  4.3 Derivative Methods
    4.3.1 Bisection Search Method
    4.3.2 Newton's Method

5 Numerical Methods for Ordinary Differential Equations
  5.1 Revision
    5.1.1 Euler's Method
    5.1.2 The Improved Euler Method
  5.2 Stability
    5.2.1 Euler's Method
    5.2.2 Improved Euler Method
  5.3 Systems of ODEs
    5.3.1 Using Euler's Method to Solve Systems of ODEs
    5.3.2 Using the Improved Euler Method to Solve Systems of ODEs
  5.4 Runge-Kutta Methods
    5.4.1 Second-order Runge-Kutta Scheme
    5.4.2 Higher-order Runge-Kutta schemes
  5.5 Error estimation for fourth-order Runge-Kutta
  5.6 Adaptive Step-Size Control
    5.6.1 Step-doubling
    5.6.2 Embedded Runge-Kutta methods

6 Eigenproblems
  6.1 Eigenvalues and Eigenvectors
    6.1.1 Finding eigenvectors and eigenvalues
  6.2 Power method
    6.2.1 An example of the power method
    6.2.2 Shifting
    6.2.3 Deflation
  6.3 Inverse Iteration
    6.3.1 An example of inverse iteration
  6.4 Similarity Transformations
  6.5 Applications
    6.5.1 Uncoupling systems of linear equations
    6.5.2 Solving a system of ODEs
    6.5.3 Powers of a matrix
Chapter 1

Linear Equations

1.1 Revision
1.1.1 System of Equations
One of the most common computational problems for engineers is to solve a set of simultaneous
equations (e.g., the matrix systems that arise in finite difference and finite element methods). In many cases it
may be necessary to solve very large systems of equations (e.g., aircraft design). The accurate and
efficient solution of linear systems of equations is therefore a very important engineering problem.
A linear system of m equations with n unknowns (x1 , . . . , xn ) can be written in the form
Ax = b where A is an m × n matrix and x and b are column vectors. For example, the system of
four equations in four unknowns

\[
\begin{aligned}
2x_1 + 3x_2 - 4x_3 + 2x_4 &= 4 \\
-4x_1 - 5x_2 + 6x_3 - 3x_4 &= -8 \\
2x_1 + 2x_2 + x_3 &= 9 \\
-6x_1 - 7x_2 + 14x_3 - 4x_4 &= 6
\end{aligned} \tag{1.1}
\]

can be written in the form


Ax = b, (1.2)
where
\[
A = \begin{bmatrix} 2 & 3 & -4 & 2 \\ -4 & -5 & 6 & -3 \\ 2 & 2 & 1 & 0 \\ -6 & -7 & 14 & -4 \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}, \quad
b = \begin{bmatrix} 4 \\ -8 \\ 9 \\ 6 \end{bmatrix} \tag{1.3}
\]
In this module we will only consider square systems (i.e., m = n), in which case there are the
same number of equations as unknowns.

1.1.2 LU Factorisation
Once the factorisation A = LU has been found, the standard steps of the LU factorisation solution are as follows:

1. Solve Ly = b by forward substitution.


2. Solve U x = y by backward substitution.


The advantages of LU factorisation over Gaussian elimination are:
1. LU operates only on A (b is not required until the solution stage).
2. We do not have to redo the factorisation if b changes (In most numerical methods, A relates
to geometry and material parameters whereas b depends on the boundary conditions of the
problem. If the boundary conditions are changed, it is not necessary to re-factorise the
solution system).

Doolittle Factorisation
Perhaps the most commonly used LU factorisation method is a variation on Gaussian elimination
and is called Doolittle factorisation. In this method Gaussian elimination is used to determine U ,
and L is then constructed from the subtraction multipliers used in the factorisation process.
For example to factorise the matrix in Section 1.1.1
 
\[
A = \begin{bmatrix} 2 & 3 & -4 & 2 \\ -4 & -5 & 6 & -3 \\ 2 & 2 & 1 & 0 \\ -6 & -7 & 14 & -4 \end{bmatrix}
\begin{array}{l} \text{Pivot row} = r_1 \\ r_2 - (-2)r_1,\; l_{21} = -2 \\ r_3 - (1)r_1,\; l_{31} = 1 \\ r_4 - (-3)r_1,\; l_{41} = -3 \end{array}
\longrightarrow
\begin{bmatrix} 2 & 3 & -4 & 2 \\ 0 & 1 & -2 & 1 \\ 0 & -1 & 5 & -2 \\ 0 & 2 & 2 & 2 \end{bmatrix}
\begin{array}{l} \text{Pivot row} = r_2 \\ \\ r_3 - (-1)r_2,\; l_{32} = -1 \\ r_4 - (2)r_2,\; l_{42} = 2 \end{array}
\]
\[
\longrightarrow
\begin{bmatrix} 2 & 3 & -4 & 2 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 3 & -1 \\ 0 & 0 & 6 & 0 \end{bmatrix}
\begin{array}{l} \text{Pivot row} = r_3 \\ \\ \\ r_4 - (2)r_3,\; l_{43} = 2 \end{array}
\longrightarrow
\begin{bmatrix} 2 & 3 & -4 & 2 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 3 & -1 \\ 0 & 0 & 0 & 2 \end{bmatrix} = U
\]
The L matrix is a lower-triangular matrix storing the pivot operations
\[
L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ l_{21} & 1 & 0 & 0 \\ l_{31} & l_{32} & 1 & 0 \\ l_{41} & l_{42} & l_{43} & 1 \end{bmatrix}
  = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 1 & -1 & 1 & 0 \\ -3 & 2 & 2 & 1 \end{bmatrix} \tag{1.4}
\]
Notes:
1. The pivots are stored on the diagonal of U .
2. This method fails when we encounter a zero pivot, in which case, we need to reorder the
rows of the matrix. If all the values in a column are zero, the matrix is singular and a
solution cannot be found with the LU factorisation method.


Once the factorisation has been found, the solution of the system of equations is a simple
case of forward and back substitution. For example, consider the right-hand side vector given in
Section 1.1.1. First we solve for the y vector by forward substitution:
\[
Ly = b \iff \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 1 & -1 & 1 & 0 \\ -3 & 2 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 4 \\ 0 \\ 5 \\ 8 \end{bmatrix} = \begin{bmatrix} 4 \\ -8 \\ 9 \\ 6 \end{bmatrix}
\]
The middle vector is the solution, y = [4, 0, 5, 8]^T. Next, we find x = A^{-1}b by back substitution:
\[
Ux = y \iff \begin{bmatrix} 2 & 3 & -4 & 2 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 3 & -1 \\ 0 & 0 & 0 & 2 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 4 \\ 0 \\ 5 \\ 8 \end{bmatrix} \tag{1.5}
\]

In practice, as the Doolittle process proceeds the elements of L are stored below the diagonal
in the A matrix and the elements of U are stored on and above the diagonal.
\[
A \longrightarrow \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ l_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ l_{n1} & \cdots & l_{n,(n-1)} & u_{nn} \end{bmatrix} \tag{1.6}
\]
For the example considered above
\[
A \longrightarrow \begin{bmatrix} 2 & 3 & -4 & 2 \\ -2 & 1 & -2 & 1 \\ 1 & -1 & 3 & -1 \\ -3 & 2 & 2 & 2 \end{bmatrix} \tag{1.7}
\]

This improves the computational efficiency of the LU method as it is only necessary to store one
matrix.
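As an illustration only (not part of the original notes), the following Python sketch implements the Doolittle factorisation with this combined L/U storage, together with the forward and back substitutions; the function names and the use of NumPy are my own assumptions.

```python
import numpy as np

def doolittle_lu(A):
    """Doolittle LU factorisation: L is stored below the diagonal and U on and
    above it, in a single matrix.  No pivoting, so a zero pivot raises an error."""
    LU = A.astype(float).copy()
    n = LU.shape[0]
    for k in range(n - 1):                     # pivot row k
        if LU[k, k] == 0.0:
            raise ZeroDivisionError("zero pivot: reorder the rows")
        for i in range(k + 1, n):
            LU[i, k] /= LU[k, k]               # subtraction multiplier l_ik
            LU[i, k+1:] -= LU[i, k] * LU[k, k+1:]
    return LU

def lu_solve_combined(LU, b):
    """Solve Ax = b given the combined L/U storage from doolittle_lu."""
    n = LU.shape[0]
    y = np.zeros(n)
    for i in range(n):                         # forward substitution, Ly = b
        y[i] = b[i] - LU[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):             # back substitution, Ux = y
        x[i] = (y[i] - LU[i, i+1:] @ x[i+1:]) / LU[i, i]
    return x

A = np.array([[2, 3, -4, 2], [-4, -5, 6, -3], [2, 2, 1, 0], [-6, -7, 14, -4]])
b = np.array([4, -8, 9, 6])
print(lu_solve_combined(doolittle_lu(A), b))   # expect [1, 2, 3, 4]
```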

1.1.3 Computational Issues


1.1.4 Partial Pivoting
In some cases, using the standard LU approach can lead to numerical instabilities. For example,
consider the problem (from Kreyszig)

\[
\begin{aligned}
0.0004\,x_1 + 1.402\,x_2 &= 1.406 \\
0.4003\,x_1 - 1.502\,x_2 &= 2.501
\end{aligned} \tag{1.8}
\]

using four decimal floating arithmetic. The correct solution is x1 = 10 and x2 = 1. Using standard
LU, we would pivot on the first column, giving l21 = 0.4003/0.0004 = 1001. The equation for x2
is therefore
−1405x2 = −1404,


which yields x2 = 0.9993 and x1 = (1.406 − 1.402x2 )/0.0004 = 0.005/0.0004 = 12.5. This
problem arises due to the pivot being small in relation to the other members of the column. A
computer inevitably introduces some rounding error into calculations which can lead, in cases
such as above, to substantial solution error.
This potential for numerical error can be reduced by using partial pivoting. Instead of using
the first row as the pivot, we swap rows of A before the elimination step to ensure that the pivot
value has a larger magnitude than any of the coefficients in the column below it. Such an approach
avoids problems with zero or very small pivots, and hence, improves computational accuracy. The
cost associated with partial pivoting is the need to store the pivoting strategy, in a vector ρ, so that it can
also be applied to right-hand sides. An example of using partial pivoting follows:
Example:
\[
A = \begin{bmatrix} 2 & 3 & -4 & 2 \\ -4 & -5 & 6 & -3 \\ 2 & 2 & 1 & 0 \\ -6 & -7 & 14 & -4 \end{bmatrix}
\begin{array}{l} \text{The largest magnitude in the first column is } -6 \\ \Rightarrow \text{swap } r_4 \text{ with } r_1 \\ \Rightarrow \rho_1 = 4 \end{array}
\]
\[
\Longrightarrow
\begin{bmatrix} -6 & -7 & 14 & -4 \\ -4 & -5 & 6 & -3 \\ 2 & 2 & 1 & 0 \\ 2 & 3 & -4 & 2 \end{bmatrix}
\begin{array}{l} \\ r_2 - \tfrac{2}{3} r_1,\; l_{21} = \tfrac{2}{3} \\ r_3 - \left(-\tfrac{1}{3}\right) r_1,\; l_{31} = -\tfrac{1}{3} \\ r_4 - \left(-\tfrac{1}{3}\right) r_1,\; l_{41} = -\tfrac{1}{3} \end{array}
\longrightarrow
\begin{bmatrix} -6 & -7 & 14 & -4 \\ \tfrac{2}{3} & -\tfrac{1}{3} & -\tfrac{10}{3} & -\tfrac{1}{3} \\ -\tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{17}{3} & -\tfrac{4}{3} \\ -\tfrac{1}{3} & \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{2}{3} \end{bmatrix}
\begin{array}{l} \text{The largest magnitude in the second column is } \tfrac{2}{3} \\ \Rightarrow \text{swap } r_2 \text{ with } r_4 \\ \Rightarrow \rho_2 = 4 \end{array}
\]
\[
\longrightarrow
\begin{bmatrix} -6 & -7 & 14 & -4 \\ -\tfrac{1}{3} & \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ -\tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{17}{3} & -\tfrac{4}{3} \\ \tfrac{2}{3} & -\tfrac{1}{3} & -\tfrac{10}{3} & -\tfrac{1}{3} \end{bmatrix}
\begin{array}{l} \\ \\ r_3 - \left(-\tfrac{1}{2}\right) r_2,\; l_{32} = -\tfrac{1}{2} \\ r_4 - \left(-\tfrac{1}{2}\right) r_2,\; l_{42} = -\tfrac{1}{2} \end{array}
\longrightarrow
\begin{bmatrix} -6 & -7 & 14 & -4 \\ -\tfrac{1}{3} & \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ -\tfrac{1}{3} & -\tfrac{1}{2} & 6 & -1 \\ \tfrac{2}{3} & -\tfrac{1}{2} & -3 & 0 \end{bmatrix}
\begin{array}{l} \text{The largest magnitude in the third column is } 6 \Rightarrow \rho_3 = 3 \\ \\ \\ r_4 - \left(-\tfrac{1}{2}\right) r_3,\; l_{43} = -\tfrac{1}{2} \end{array}
\]
\[
\longrightarrow
\begin{bmatrix} -6 & -7 & 14 & -4 \\ -\tfrac{1}{3} & \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ -\tfrac{1}{3} & -\tfrac{1}{2} & 6 & -1 \\ \tfrac{2}{3} & -\tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix}
\;\Rightarrow\; \rho_4 = 4
\]
Therefore
\[
L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -\tfrac{1}{3} & 1 & 0 & 0 \\ -\tfrac{1}{3} & -\tfrac{1}{2} & 1 & 0 \\ \tfrac{2}{3} & -\tfrac{1}{2} & -\tfrac{1}{2} & 1 \end{bmatrix}, \quad
U = \begin{bmatrix} -6 & -7 & 14 & -4 \\ 0 & \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ 0 & 0 & 6 & -1 \\ 0 & 0 & 0 & -\tfrac{1}{2} \end{bmatrix}, \quad
\rho = \begin{bmatrix} 4 \\ 4 \\ 3 \\ 4 \end{bmatrix} \tag{1.9}
\]


To keep the system consistent we need to perform the pivot operations on b to get b′:
\[
b = \begin{bmatrix} 4 \\ -8 \\ 9 \\ 6 \end{bmatrix}
\;\xrightarrow{\;\rho_1 = 4\;}\;
\begin{bmatrix} 6 \\ -8 \\ 9 \\ 4 \end{bmatrix}
\;\xrightarrow{\;\rho_2 = 4\;}\;
\begin{bmatrix} 6 \\ 4 \\ 9 \\ -8 \end{bmatrix}
\;\xrightarrow{\;\rho_3 = 3\;}\;
\begin{bmatrix} 6 \\ 4 \\ 9 \\ -8 \end{bmatrix} = b' \tag{1.10}
\]
Alternatively the pivot operations can be stored using a permutation matrix P. Originally this is
the identity; if a pivot operation is performed, the corresponding rows in P are also swapped. b′
can then be determined from b′ = P b. For example, for the problem considered here
\[
P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \tag{1.11}
\]
and
\[
b' = Pb = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 4 \\ -8 \\ 9 \\ 6 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 9 \\ -8 \end{bmatrix} \tag{1.12}
\]
The solution process thus consists of the following steps:

1. Find L, U and P such that P A = LU. Then P Ax = P b → LU x = P b = b′.

2. Find y = U x from Ly = b′ by forward substitution.

3. Find x from U x = y by backward substitution.

For example,
\[
Ly = b' \iff \begin{bmatrix} 1 & 0 & 0 & 0 \\ -\tfrac{1}{3} & 1 & 0 & 0 \\ -\tfrac{1}{3} & -\tfrac{1}{2} & 1 & 0 \\ \tfrac{2}{3} & -\tfrac{1}{2} & -\tfrac{1}{2} & 1 \end{bmatrix}
\begin{bmatrix} 6 \\ 6 \\ 14 \\ -2 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 9 \\ -8 \end{bmatrix} \tag{1.13}
\]
\[
Ux = y \iff \begin{bmatrix} -6 & -7 & 14 & -4 \\ 0 & \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ 0 & 0 & 6 & -1 \\ 0 & 0 & 0 & -\tfrac{1}{2} \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \\ 14 \\ -2 \end{bmatrix}. \tag{1.14}
\]
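The partial pivoting scheme can be sketched in code as follows; this is an illustrative Python version only (the function names and the NumPy dependency are my own assumptions, and the pivoting strategy is recorded as a final row ordering rather than the step-by-step ρ vector used above).

```python
import numpy as np

def lu_partial_pivot(A):
    """LU factorisation with partial pivoting.  Returns the combined L/U
    storage and a permutation vector perm such that A[perm] = L U."""
    LU = A.astype(float).copy()
    n = LU.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(LU[k:, k]))   # row with largest pivot magnitude
        if p != k:                             # swap rows and record the swap
            LU[[k, p]] = LU[[p, k]]
            perm[[k, p]] = perm[[p, k]]
        for i in range(k + 1, n):
            LU[i, k] /= LU[k, k]
            LU[i, k+1:] -= LU[i, k] * LU[k, k+1:]
    return LU, perm

def pivoted_solve(LU, perm, b):
    n = LU.shape[0]
    bp = np.asarray(b, dtype=float)[perm]      # b' = P b
    y = np.zeros(n)
    for i in range(n):                         # forward substitution
        y[i] = bp[i] - LU[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):             # back substitution
        x[i] = (y[i] - LU[i, i+1:] @ x[i+1:]) / LU[i, i]
    return x

A = np.array([[2, 3, -4, 2], [-4, -5, 6, -3], [2, 2, 1, 0], [-6, -7, 14, -4]])
b = np.array([4, -8, 9, 6])
LU, perm = lu_partial_pivot(A)
print(pivoted_solve(LU, perm, b))              # expect [1, 2, 3, 4]
```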

1.1.5 Total Pivoting


Instead of choosing the largest pivot in each column we could choose the coefficient of largest
absolute value in the entire system and use this as the pivot. This further minimises the potential for
significant numerical error. However to find the largest pivot in a matrix requires m × n operations
whereas finding the largest pivot in a column requires m operations. Because of this additional
computational expense total pivoting is rarely used.


1.1.6 Applications
Calculating an Inverse Matrix
As mentioned before calculating the inverse of a matrix directly is a computationally expensive
process. We can use LU factorisation to find the inverse of a matrix with a considerable compu-
tational saving over the direct method. Consider the observation that AA−1 = I. This of course
implies
\[
A\begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ x_1 & x_2 & x_3 & x_4 \\ \vdots & \vdots & \vdots & \vdots \end{bmatrix}
 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
or
\[
Ax_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
Ax_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
Ax_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad
Ax_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix},
\]
where x1 , x2 , x3 , x4 are the columns of A−1 .
Therefore the inverse matrix A−1 can be calculated easily from an LU factorisation.

Algorithm
1. Factorise P A = LU (P is the permutation matrix).
2. Solve Ly_i = P e_i for each column e_i of the identity matrix −→ y_1, . . . , y_n
3. Solve U x_i = y_i for x_i −→ x_1, . . . , x_n
4.
\[
A^{-1} = \begin{bmatrix} \vdots & \vdots & \vdots & & \vdots \\ x_1 & x_2 & x_3 & \cdots & x_n \\ \vdots & \vdots & \vdots & & \vdots \end{bmatrix}
\]
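A hedged Python sketch of this algorithm (not from the notes); it assumes SciPy's lu_factor/lu_solve are available to provide the P A = LU factorisation and the triangular solves, and the function name is illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve   # LAPACK-based P A = L U

def inverse_via_lu(A):
    """Build A^{-1} column by column from a single LU factorisation."""
    n = A.shape[0]
    lu_piv = lu_factor(A)                      # factorise P A = L U once
    X = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0                             # i-th column of the identity
        X[:, i] = lu_solve(lu_piv, e)          # forward + back substitution
    return X

A = np.array([[2., 3, -4, 2], [-4, -5, 6, -3], [2, 2, 1, 0], [-6, -7, 14, -4]])
print(np.allclose(inverse_via_lu(A) @ A, np.eye(4)))   # True
```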

1.2 Solution of Linear Systems


1.2.1 Matrix Inversion
One way to solve the linear system is to pre-multiply the system by the inverse of the A matrix i.e.,
A−1 Ax = A−1 b
Ix = A−1 b
x = A−1 b.
However, it is not recommended to solve this system using matrix inversion because
1. it is computationally expensive to calculate A−1 explicitly.
2. the matrix A may be ill-conditioned, implying that significant numerical error may be intro-
duced through the calculation of A−1 leading to inaccurate solutions.


1.2.2 Gaussian Elimination


Another way to solve a system of linear equations is to use the well-known method of Gaussian
elimination (named after Karl Friedrich Gauss, 1777-1855). The steps involved in Gaussian elimination are

1. Write system as an augmented matrix.

2. Use row reduction operations to convert the augmented matrix to an upper-triangular or


echelon form. The row used to eliminate an unknown from a column is termed the pivot row
and the weights used in this process are termed the row subtraction multipliers.

3. Solve the resultant system using back substitution.

As an example, consider the linear system in Section 1.1.1, which in augmented form is given by
\[
\left[\begin{array}{cccc|c} 2 & 3 & -4 & 2 & 4 \\ -4 & -5 & 6 & -3 & -8 \\ 2 & 2 & 1 & 0 & 9 \\ -6 & -7 & 14 & -4 & 6 \end{array}\right] \tag{1.15}
\]
Using row reduction we convert this augmented matrix to an upper-triangular form, i.e.,
\[
\left[\begin{array}{cccc|c} 2 & 3 & -4 & 2 & 4 \\ 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 3 & -1 & 5 \\ 0 & 0 & 0 & 2 & 8 \end{array}\right] \tag{1.16}
\]

Using back-substitution we obtain the solution xT = [1, 2, 3, 4].


Gaussian elimination suffers from one main drawback. If we need to solve systems of linear
equations with multiple right-hand side vectors, we must form the augmented matrix and row
reduce to upper-triangular form for each of the right-hand side vectors. This is computationally
expensive.

1.3 LU Factorisation Methods


There are many methods of LU Factorisation. We have already discussed the Doolittle method;
other methods include the Crout and Cholesky methods.

1.3.1 Crout Factorisation


Doolittle factorisation gives A = LU where the L matrix has 1’s on the diagonal and the U matrix
has the pivots on the diagonal. Crout factorisation is an alternative LU factorisation which gives
A = L̄Ū where L̄ has the pivots on the diagonal and Ū has 1’s on the diagonal. This approach
gives better control of round-off error. L̄ can be constructed from L by multiplying the columns
of L by the pivots. Ū can be constructed from U by dividing the rows of U by the pivots. For
example
\[
A = \bar{L}\bar{U} = \begin{bmatrix} 2 & 0 & 0 & 0 \\ -4 & 1 & 0 & 0 \\ 2 & -1 & 3 & 0 \\ -6 & 2 & 6 & 2 \end{bmatrix}
\begin{bmatrix} 1 & \tfrac{3}{2} & -2 & 1 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & -\tfrac{1}{3} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{1.17}
\]

1.3.2 Cholesky Factorisation


If A is symmetric, then the computational advantages of symmetry are lost by using an LU fac-
torisation. To maintain symmetry, a Cholesky factorisation can be used. For example, using a
Doolittle factorisation gives
    
16 4 8 1 0 0 16 4 8
A =  4 5 −4 =  41 1 0  0 4 −6 (1.18)
1 3
8 −4 22 2
−2 1 0 0 9
Alternatively, using a Cholesky factorisation gives
16 0 0 1 41 21
   
1 0 0
A = LDLT =  41 1 0  0 4 0 0 1 − 23  (1.19)
1
2
− 32 1 0 0 9 0 0 1
Provided all the pivots are positive (i.e., the matrix A is positive definite) then
  
4 0 0 4 0 0
A = L 0 2 0 0 2 0 LT
0 0 3 0 0 3
  
4 0 0 4 1 2 (1.20)
= 1 2 0 0 2 −3
2 −3 3 0 0 3
T
= L̄L̄
T
Therefore, A = L̄L̄ is the Cholesky factorisation for symmetric positive definite matrices.
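A small illustrative Python implementation of the Cholesky factorisation A = L̄L̄^T (not from the notes; it assumes A is symmetric and raises an error if a non-positive pivot is encountered).

```python
import numpy as np

def cholesky_lower(A):
    """Cholesky factorisation A = L L^T for a symmetric positive definite A."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(i + 1):
            s = A[i, j] - L[i, :j] @ L[j, :j]
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i, i] = np.sqrt(s)
            else:
                L[i, j] = s / L[j, j]
    return L

A = np.array([[16., 4, 8], [4, 5, -4], [8, -4, 22]])
L = cholesky_lower(A)
print(L)                          # expect [[4,0,0],[1,2,0],[2,-3,3]]
print(np.allclose(L @ L.T, A))    # True
```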

1.3.3 Iterative Improvement


In general, all floating point operations will introduce an error into a calculation. Thus the solution
we obtain from Ax = b will contain an error. If solving

Ax = b (1.21)

returns x + δx, where x is the true solution and δx is the unknown error, then substituting this
solution back into our system of equations will give a slight discrepancy in the right hand side
vector i.e.,
A (x + δx) = b + δb, (1.22)


where δb is termed the residual in the solution. Rearranging Equation (1.22) we can obtain the
following expression for the residual:

δb = A (x + δx) − b. (1.23)

Note that δb must be calculated in double precision since there will be a lot of cancellation in the
subtraction of b.
Subtracting Equation (1.21) from Equation (1.22) we obtain

Aδx = δb. (1.24)

Thus we can calculate the error in our solution, δx, from our residual, δb, and hence correct our
solution to improve its accuracy by subtracting this error i.e.,

xnew = (x + δx) − δx. (1.25)

Note the following:

1. We can use our previous LU factorisation of A to solve Equation (1.24) so this process can
be iterated efficiently.

2. The calculation of the residual, δb, requires the original A matrix which is normally over-
written by the LU decomposition. Thus if iterative improvement is used the original matrix
must be copied before the factorisation process.

3. For iterative improvement to be most successful the residual should be computed with a
higher degree of accuracy than the matrices and the right-hand side. This is often
difficult to do in practice if we are already using 8-byte real numbers.

Iterative Improvement Algorithm


• Factorise P A = LU using partial pivoting.

• Solve LU x = P b to obtain an estimate for x (this estimate is really x + δx).

• Set x(0) = x.

Do k = 1, ...., numits

• Compute the residual δb = Ax(k−1) − b.

• Solve Ly = P δb for y.

• Solve U δx = y for δx.

• Compute x(k) = x(k−1) − δx

Enddo
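An illustrative Python sketch of this loop (not from the notes); SciPy's lu_factor/lu_solve stand in for the factorisation and triangular solves, and in a real implementation the residual would be accumulated in higher precision than the rest of the calculation.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_improvement(A, b, numits=2):
    """Refine the LU solution of Ax = b by repeatedly solving A dx = db."""
    lu_piv = lu_factor(A)            # P A = L U, factorised once
    x = lu_solve(lu_piv, b)          # initial estimate (really x + dx)
    for _ in range(numits):
        db = A @ x - b               # residual db = A(x + dx) - b
        dx = lu_solve(lu_piv, db)    # solve A dx = db re-using the factors
        x = x - dx                   # corrected solution
    return x

A = np.array([[2., 3, -4, 2], [-4, -5, 6, -3], [2, 2, 1, 0], [-6, -7, 14, -4]])
b = np.array([4., -8, 9, 6])
print(iterative_improvement(A, b))   # ~[1, 2, 3, 4]
```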


1.4 Iterative Solution Methods


Direct methods become inefficient, due to round-off errors, when they are applied to large systems
(large systems arise in the solution of partial differential equations). In these cases, an iterative
method is preferable, as there are fewer round-off errors. Also, for sparse matrices, which by
definition consist largely of zeros, the amount of storage space required for iterative solutions on a
computer is far less than the amount of storage space required for direct solutions.

1.4.1 Jacobi Iterative Method


The Jacobi algorithm consists of converting a system Ax = b into

x = Eb − F x. (1.26)

After an initial guess, x(0) , is selected, the approximation for the solution is generated by calculat-
ing
x(k) = Eb − F x(k−1) . (1.27)
If we write the matrix A as A = D + L + U , where D = diag(A) and L and U are the strictly lower
and strictly upper triangular parts, then E = D −1 and F = D −1 (L + U ) in Equation (1.27). We then
write the Jacobi method in Equation (1.27) as

x(k) = D −1 b − D −1 (L + U )x(k−1) . (1.28)

For simple systems, we can replace this matrix equation as a series of steps as follows:

1. Rearrange the system so that each variable is the subject of one of the equations.

2. Set the variables on the RHS of the equation to be the old variables.

3. Set the variables on the LHS of the equation to be the new variables.

4. Iterate from the initial guess until an accurate solution is found.

For example, consider the system of equations given by


    
\[
Ax = b \iff \begin{bmatrix} 7 & -2 & 1 & 0 \\ 1 & -9 & 3 & -1 \\ 2 & 0 & 10 & 1 \\ 1 & -1 & 1 & 6 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 17 \\ 13 \\ 15 \\ 10 \end{bmatrix}.
\]
For this system, we execute Step 1 and form the following four equations:

\[
\begin{aligned}
x_1^{k} &= \left(17 + 2x_2^{k-1} - x_3^{k-1}\right)/7 \\
x_2^{k} &= \left(-13 + x_1^{k-1} + 3x_3^{k-1} - x_4^{k-1}\right)/9 \\
x_3^{k} &= \left(15 - 2x_1^{k-1} - x_4^{k-1}\right)/10 \\
x_4^{k} &= \left(10 - x_1^{k-1} + x_2^{k-1} - x_3^{k-1}\right)/6
\end{aligned}
\]


We (hopefully) calculate successively better xk by repeatedly using the old solution, xk−1 , to
get new approximations. We continue doing this until the current approximation, xk , is converged,
i.e. it is close enough, in some sense, to the exact solution. One commonly used stopping criterion,
known as the relative change criterion, is to iterate until
\[
\frac{\left\| x^{(k)} - x^{(k-1)} \right\|}{\left\| x^{(k)} \right\|} < \text{tolerance}. \tag{1.29}
\]
Iteration x1 x2 x3 x4
0 0 0 0 0
1 2.43 -1.44 1.50 1.67
2 1.80 -0.86 0.85 0.77
3 2.06 -1.05 1.06 1.08
4 1.98 -0.98 0.98 0.97
5 2.01 -1.01 1.01 1.01
6 2.00 -1.00 1.00 1.00
In the above table we observe the sequence of iterates obtained when applying the Jacobi method to the above
system Ax = b. The solution after 6 iterations is given by x = [2, −1, 1, 1]T .
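A minimal Python sketch of the Jacobi iteration with the relative change stopping criterion, applied to the example system; the function name and tolerance are illustrative assumptions, not from the notes.

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-6, maxits=100):
    """Jacobi iteration x^(k) = D^{-1} b - D^{-1}(L+U) x^(k-1), component-wise."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(1, maxits + 1):
        x_new = np.empty(n)
        for i in range(n):
            s = A[i, :] @ x - A[i, i] * x[i]       # sum of off-diagonal terms
            x_new[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x_new - x) < tol * np.linalg.norm(x_new):
            return x_new, k                        # relative change criterion met
        x = x_new
    return x, maxits

A = np.array([[7., -2, 1, 0], [1, -9, 3, -1], [2, 0, 10, 1], [1, -1, 1, 6]])
b = np.array([17., 13, 15, 10])
print(jacobi(A, b))     # converges to [2, -1, 1, 1]
```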

1.4.2 Gauss-Seidel Iterative Method


The algorithm for the Gauss-Seidel method is a derivative of the algorithm for the Jacobi method.
The difference is that in the Gauss-Seidel algorithm, each new xi -value is calculated using the
most recent approximation to the values of the other xi variables. Again, writing the matrix A as
A = D + L + U , we can express the Gauss-Seidel method as
x(k) = (D + L)−1 b − (D + L)−1 U x(k−1) . (1.30)
For simple systems, we can replace this matrix equation as a series of steps as follows:
1. Rearrange the system so that each variable is the subject of one of the equations.
2. Set the variables on the RHS of the equation, that have not been calculated yet, to be the old
variables.
3. Set the variables on the RHS of the equation, that have been calculated, to be the new vari-
ables.
4. Set the variables on the LHS of the equation to be the new variables.
5. Iterate from the initial guess until an accurate solution is found.
For example, consider the system of equations given by
    
\[
Ax = b \iff \begin{bmatrix} 7 & -2 & 1 & 0 \\ 1 & -9 & 3 & -1 \\ 2 & 0 & 10 & 1 \\ 1 & -1 & 1 & 6 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 17 \\ 13 \\ 15 \\ 10 \end{bmatrix}.
\]
Use of the Gauss-Seidel method results in the equations


\[
\begin{aligned}
x_1^{k} &= \left(17 + 2x_2^{k-1} - x_3^{k-1}\right)/7 \\
x_2^{k} &= \left(-13 + x_1^{k} + 3x_3^{k-1} - x_4^{k-1}\right)/9 \\
x_3^{k} &= \left(15 - 2x_1^{k} - x_4^{k-1}\right)/10 \\
x_4^{k} &= \left(10 - x_1^{k} + x_2^{k} - x_3^{k}\right)/6.
\end{aligned}
\]

In succession, each xi is updated, resulting in the k th iterate. The iterations proceed until the
iterates converge to the solution.

Iteration x1 x2 x3 x4
0 0 0 0 0
1 2.43 -1.17 1.01 0.90
2 1.95 -0.99 1.02 1.01
3 2.00 -0.99 1.00 1.00
4 2.00 -1.00 1.00 1.00

Therefore, the solution to this problem, after 4 iterations, is given by x = [2, −1, 1, 1]T . Note how
this method reaches the same solution as the Jacobi method, but in two fewer iterations.
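The corresponding Gauss-Seidel sketch differs only in that updated components are used as soon as they are available (again an illustrative Python version, not from the notes).

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-6, maxits=100):
    """Gauss-Seidel iteration: like Jacobi, but each new x_i is used immediately."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(1, maxits + 1):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :] @ x - A[i, i] * x[i]   # already uses new values for j < i
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) < tol * np.linalg.norm(x):
            return x, k
    return x, maxits

A = np.array([[7., -2, 1, 0], [1, -9, 3, -1], [2, 0, 10, 1], [1, -1, 1, 6]])
b = np.array([17., 13, 15, 10])
print(gauss_seidel(A, b))   # reaches [2, -1, 1, 1] in fewer iterations than Jacobi
```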

Chapter 2

Finite Differences for Partial Differential


Equations

2.1 Introduction
This chapter considers finite difference approximations to partial differential operators. The finite
difference approximations are discrete models of the continuous differential operators. They
are used to give a discrete model of a partial differential equation. The solutions are then a discrete
approximation of the continuous solution to the partial differential equation.
Models exist at many levels in engineering. One possible categorisation is to think of mathe-
matical, numerical and computational models. An example of a mathematical model is the simple
diffusion equation:
\[
\frac{\partial f}{\partial t} + D\frac{\partial^2 f}{\partial x^2} = b(x, t)
\]
Mathematical models are derived from first principles, subject to necessary assumptions and ap-
proximations. Many mathematical models cannot be solved analytically for a realistically com-
plicated solution domain, e.g. a harbour or a complicated heat sink, and the differential operators
must be approximated in some way. This approximation to the mathematical model can be called a
numerical model and usually gives a discrete numerical approximation to the continuous solution
of the partial differential equation. The third level of approximation or modelling arises in the
choice of coefficients (such as the diffusion coefficient, D, in the above equation) and the number
of discrete points at which to solve the equation. These choices comprise what can be re-
ferred to as the computational model. A specific set of results is obtained from the computational
model, which is based on the numerical model of the mathematical model of the real process. In
this chapter we are interested in a particular class of numerical models called finite differences.
An illustration of a numerical model follows. Consider for example the representation of a
function f formed by fitting a quadratic to three values of f , each a distance of ∆x apart. This is
shown in Figure 2.1.
The approximation to the function f is:

\[
f(x) \approx ax^2 + bx + c
     \approx \left(\frac{f_{i-1} - 2f_i + f_{i+1}}{\Delta x^2}\right)\frac{x^2}{2}
           + \left(\frac{f_{i+1} - f_{i-1}}{2\Delta x}\right)x + f_i
\]


[Figure 2.1: a quadratic fitted through the function values f_{i−1}, f_i and f_{i+1} at x = −∆x, 0 and +∆x (nodes i − 1, i and i + 1).]

This is identical to what would be obtained from using a second order Lagrange interpolation.
Using this quadratic approximation to the function, f , at x = 0 (index i), models of the first
and second derivatives of f (x) are:

\[
\left.\frac{\partial f}{\partial x}\right|_{i} \approx \frac{f_{i+1} - f_{i-1}}{2\Delta x}, \qquad
\left.\frac{\partial^2 f}{\partial x^2}\right|_{i} \approx \frac{f_{i-1} - 2f_i + f_{i+1}}{\Delta x^2}
\]

These sorts of models can be directly substituted into a partial differential equation like the diffu-
sion equation to give a numerical model.
The discrete points i − 1, i, i + 1 etc. can be referred to as nodes. This chapter will look
at deriving models of differential operators using a more rigorous foundation that gives formal
estimates of e.g., the truncation error involved in a model. The aim is to determine a (usually
linear) discrete equation for each node that relates function values at surrounding nodes, e.g.,

\[
\frac{\partial^2 f(x,t)}{\partial x^2} = b(x,t)
\;\Rightarrow\; f_{i-1}(t) - 2f_i(t) + f_{i+1}(t) \approx \Delta x^2\, b(x_i, t)
\]

Note that the function value at each node is only dependent on time. The equation for each node
can be put together to form a system of equations, i.e., Ax = b that can be solved (subject to the
appropriate boundary conditions):
 .   .. 
.. .
...
 
fi−2 (t) b(xi−2 , t)
   
. . . 0 1 −2 1 0 . . .  fi−1 (t) 2 b(xi−1 , t)
    
 = ∆x 
. . . 0 1 −2 1 0 . . .  fi (t)   b(xi , t) 
  

.. 
. fi+1 (t) b(xi+1 , t)
  
.. ..
. .

An important part of the process of deriving models of partial differential operators is an un-
derstanding of the nature of the partial differential equations themselves.


2.2 Classification of Partial Differential Equations


A partial differential equation is called quasilinear if it is linear in its highest derivatives. A general
second-order quasilinear equation in two independent variables x and y can be written as
\[
a\frac{\partial^2 u}{\partial x^2} + 2b\frac{\partial^2 u}{\partial x\,\partial y} + c\frac{\partial^2 u}{\partial y^2}
 = F\!\left(x, y, u, \frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right)
\]
where u is the unknown function. This type of equation can be classified into one of three types
based on the equation's characteristics, that is

elliptic type if ac − b² > 0 (example: Laplace equation)
parabolic type if ac − b² = 0 (example: heat equation)
hyperbolic type if ac − b² < 0 (example: wave equation)
Generally elliptic equations govern equilibrium problems whereas parabolic and hyperbolic
equations govern propagation problems.

2.2.1 Equilibrium problems


Equilibrium problems are steady state problems in which the equilibrium configuration, u, is de-
termined by solving a differential equation, D (u) = f , within a domain, Ω, subject to boundary
conditions, B (u) = g, on the boundary, ∂Ω. This is shown in Figure 2.2.

[Figure 2.2: an equilibrium problem: the differential equation D(u) = f holds on the closed domain Ω, with boundary conditions B(u) = g on the boundary ∂Ω.]

These sorts of problems are also known as boundary value problems. Typical examples include
steady viscous flow, steady state temperature distributions, equilibrium stresses in elastic structures
and steady state voltage distributions.
Common boundary condition types include:
Dirichlet conditions: values of the dependent function, u, are specified on the boundary.
Neumann conditions: values of the normal derivative, ∂u/∂n, are specified on the boundary.
Mixed or Robin conditions: a linear combination of values and normal derivatives is specified on the boundary.
Only one type of boundary condition can be specified at each point on the boundary.


2.2.2 Propagation problems


Propagation problems have an unsteady or transient nature. Given an initial state, I (u) = h,
and prescribed boundary conditions, B (u) = g, the subsequent behaviour of the system is deter-
mined by solving the differential equation, D (u) = f , on the open domain, Ω. This is shown in
Figure 2.3.

[Figure 2.3: a propagation problem: the differential equation D(u) = f holds on the open (space-time) domain Ω, with boundary conditions B(u) = g on the boundary ∂Ω and initial conditions I(u) = h prescribed at the initial time.]

These problems are known as initial boundary value problems, or just initial value problems.
Typical examples include the propagation of pressure waves in fluid, propagation of heat, propaga-
tion of stress and displacement in elastic structures and the propagation of electromagnetic waves.

2.3 Finite Differences


Finite differences involve the approximation of the derivatives of a function, e.g., du/dx, at a point,
x_i, using the function values, u(x), near the point x_i. These approximations are usually made from
function values at a finite number of equally spaced points, i.e., ..., u(x_i − 2∆x), u(x_i − ∆x),
u(x_i), u(x_i + ∆x), u(x_i + 2∆x), ..., for a spacing of ∆x. Hence
\[
\left.\frac{du}{dx}\right|_{x=x_i} \approx \sum_{k=K_0}^{K_1} a_k\, u(x_i + k\Delta x) = \sum_{k=K_0}^{K_1} a_k\, u(x_{i+k}),
\]

i.e., the derivative will be approximated by a weighted sum of discrete function evaluations (e.g.,
just like interpolation or quadrature). The shorthand, u (xi+k ) = u (xi + k∆x), will be used in the
remainder of these notes.


2.3.1 First derivative finite differences


Consider the n term Taylor series expansion of u (xi+1 ) about xi :

\[
u(x_{i+1}) = u(x_i) + \Delta x\left.\frac{du}{dx}\right|_{x=x_i}
 + \frac{(\Delta x)^2}{2!}\left.\frac{d^2u}{dx^2}\right|_{x=x_i} + \cdots
 + \frac{(\Delta x)^n}{n!}\left.\frac{d^nu}{dx^n}\right|_{x=x_i}
 + \frac{(\Delta x)^{n+1}}{(n+1)!}\left.\frac{d^{n+1}u}{dx^{n+1}}\right|_{x=x_i}. \tag{2.1}
\]
Setting n = 1 in Equation (2.1) we can get a finite difference approximation to du/dx|_{x=x_i}, i.e.,
\[
\left.\frac{du}{dx}\right|_{x=x_i} = \frac{1}{\Delta x}\left(-u(x_i) + u(x_{i+1})\right)
 - \frac{\Delta x}{2}\left.\frac{d^2u}{dx^2}\right|_{x=x_i} \tag{2.2}
\]

This is a two point forward difference approximation to the first derivative. The approximation
is called a two point scheme as it involves two known points (u (xi ) and u (xi+1 ) in this case) and a
forward scheme as all points are ’forward’ of xi , the point at which the derivative is approximated.
The term
\[
-\frac{\Delta x}{2}\left.\frac{d^2u}{dx^2}\right|_{x=x_i}
 = \left.\frac{du}{dx}\right|_{x=x_i} - \frac{1}{\Delta x}\left(-u(x_i) + u(x_{i+1})\right)
\]
is called the truncation error. It is a measure of the error in the finite difference approximation to
the derivative. It can be seen in Equation (2.2) that the truncation error for the two point forward
difference approximation to the first derivative is proportional to ∆x.
Consider now the Taylor series expansion of u(x_{i−1}) about x_i:
\[
u(x_{i-1}) = u(x_i) - \Delta x\left.\frac{du}{dx}\right|_{x=x_i}
 + \frac{(\Delta x)^2}{2!}\left.\frac{d^2u}{dx^2}\right|_{x=x_i} + \cdots
 + \frac{(-1)^n(\Delta x)^n}{n!}\left.\frac{d^nu}{dx^n}\right|_{x=x_i}
 + \frac{(-1)^{n+1}(\Delta x)^{n+1}}{(n+1)!}\left.\frac{d^{n+1}u}{dx^{n+1}}\right|_{x=x_i} \tag{2.3}
\]
From Equation (2.3) we can obtain the two point backward difference approximation to du/dx|_{x=x_i}, i.e.,
\[
\left.\frac{du}{dx}\right|_{x=x_i} = \frac{1}{\Delta x}\left(-u(x_{i-1}) + u(x_i)\right)
 + \frac{\Delta x}{2}\left.\frac{d^2u}{dx^2}\right|_{x=x_i} \tag{2.4}
\]

The truncation error for the two point backward difference approximation to the first derivative
is also proportional to ∆x.
If we subtract the Taylor series expansions of u (xi−1 ) from u (xi+1 ) we obtain the two point


central difference approximation to du/dx|_{x=x_i}, i.e.,
\[
\begin{aligned}
u(x_{i+1}) - u(x_{i-1}) &= u(x_i) + \Delta x\left.\frac{du}{dx}\right|_{x=x_i}
 + \frac{(\Delta x)^2}{2!}\left.\frac{d^2u}{dx^2}\right|_{x=x_i}
 + \frac{(\Delta x)^3}{3!}\left.\frac{d^3u}{dx^3}\right|_{x=x_i} \\
&\quad - u(x_i) + \Delta x\left.\frac{du}{dx}\right|_{x=x_i}
 - \frac{(\Delta x)^2}{2!}\left.\frac{d^2u}{dx^2}\right|_{x=x_i}
 + \frac{(\Delta x)^3}{3!}\left.\frac{d^3u}{dx^3}\right|_{x=x_i}
\end{aligned} \tag{2.5}
\]
\[
\left.\frac{du}{dx}\right|_{x=x_i} = \frac{1}{2\Delta x}\left(-u(x_i - \Delta x) + u(x_i + \Delta x)\right)
 - \frac{(\Delta x)^2}{6}\left.\frac{d^3u}{dx^3}\right|_{x=x_i}
\]

Note that the truncation error for the two point central difference approximation to the first
derivative is proportional to (∆x)2 , an improvement over the two point forward and backward
differences where the error is proportional to ∆x.
Consider now the problem of finding an expansion for du/dx|_{x=x_i} in terms of u(x_i), u(x_{i+1}) and
u(x_{i+2}), i.e., a three point difference. Specifically we would like the approximation to du/dx|_{x=x_i} in
terms of a linear combination of u(x_i), u(x_{i+1}) and u(x_{i+2}). Thus we need to find b_0, b_1 and b_2
such that
\[
\left.\frac{du}{dx}\right|_{x=x_i} \approx b_0\, u(x_i) + b_1\, u(x_{i+1}) + b_2\, u(x_{i+2})
\]
Let u(x) be expressed in terms of a polynomial,
\[
u(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots.
\]
Then
\[
\frac{du}{dx} = a_1 + 2a_2 x + 3a_3 x^2 + \cdots,
\]
where
\[
a_1 + 2a_2 x_i + 3a_3 x_i^2 + \cdots
 = b_0\left(a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 + \cdots\right)
 + b_1\left(a_0 + a_1 x_{i+1} + a_2 x_{i+1}^2 + a_3 x_{i+1}^3 + \cdots\right)
 + b_2\left(a_0 + a_1 x_{i+2} + a_2 x_{i+2}^2 + a_3 x_{i+2}^3 + \cdots\right).
\]
Thus, we can obtain three linear equations in b_0, b_1 and b_2 by equating terms involving a_0, a_1 and
a_2, i.e.,
\[
\begin{bmatrix} 1 & 1 & 1 \\ x_i & x_{i+1} & x_{i+2} \\ x_i^2 & x_{i+1}^2 & x_{i+2}^2 \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 2x_i \end{bmatrix}
\]
Solving this system gives
\[
b_0 = \frac{-3}{2\Delta x}, \quad b_1 = \frac{4}{2\Delta x} \quad \text{and} \quad b_2 = \frac{-1}{2\Delta x}
\]
Thus the three point forward difference approximation is
\[
\left.\frac{du}{dx}\right|_{x=x_i} = \frac{1}{2\Delta x}\left(-3u(x_i) + 4u(x_{i+1}) - u(x_{i+2})\right)
 + \frac{(\Delta x)^2}{3}\left.\frac{d^3u}{dx^3}\right|_{x=x_i} \tag{2.6}
\]


The truncation error of the three point forward difference approximation to the first derivative
is twice that of the two point central difference approximation and is proportional to (∆x)2 .
A table of common finite difference approximations to du/dx|_{x=x_i} is as follows.

Coefficient   u(x_{i-2})  u(x_{i-1})  u(x_i)  u(x_{i+1})  u(x_{i+2})   Error
1/∆x                                   -1        1                     -(∆x/2) d²u/dx²
1/∆x                        -1          1                              +(∆x/2) d²u/dx²
1/(2∆x)                     -1          0         1                    -((∆x)²/6) d³u/dx³
1/(2∆x)                                -3         4         -1         +((∆x)²/3) d³u/dx³
1/(2∆x)         1           -4          3                              +((∆x)²/3) d³u/dx³
1/(6∆x)                     -2         -3         6         -1         +((∆x)³/12) d⁴u/dx⁴
1/(6∆x)         1           -6          3         2                    -((∆x)³/12) d⁴u/dx⁴
1/(24∆x)        2          -16          0        16         -2         +((∆x)⁴/30) d⁵u/dx⁵
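A small numerical check (not from the notes) of the quoted error orders: halving ∆x should roughly halve the two point forward difference error and quarter the errors of the second-order formulas. The test function and step sizes are arbitrary choices.

```python
import numpy as np

def forward_diff(u, x, dx):
    return (u(x + dx) - u(x)) / dx                         # error O(dx)

def central_diff(u, x, dx):
    return (u(x + dx) - u(x - dx)) / (2 * dx)              # error O(dx^2)

def three_pt_forward(u, x, dx):
    return (-3*u(x) + 4*u(x + dx) - u(x + 2*dx)) / (2*dx)  # error O(dx^2)

u, x = np.sin, 1.0
exact = np.cos(1.0)
for dx in (0.1, 0.05, 0.025):
    print(dx,
          abs(forward_diff(u, x, dx) - exact),
          abs(central_diff(u, x, dx) - exact),
          abs(three_pt_forward(u, x, dx) - exact))
```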

2.3.2 Second derivative finite differences


Consider now the finite difference approximations to the second derivative at a point. If we add the
Taylor series expansions of u (xi−1 ) to u (xi+1 ) we get

\[
u(x_{i-1}) + u(x_{i+1}) = 2u(x_i) + (\Delta x)^2\left.\frac{d^2u}{dx^2}\right|_{x=x_i}
 + \frac{(\Delta x)^4}{12}\left.\frac{d^4u}{dx^4}\right|_{x=x_i}.
\]
Rearranging we can obtain a three point central difference approximation for d²u/dx²|_{x=x_i}, i.e.,
\[
\left.\frac{d^2u}{dx^2}\right|_{x=x_i} = \frac{1}{(\Delta x)^2}\left(u(x_{i-1}) - 2u(x_i) + u(x_{i+1})\right)
 - \frac{(\Delta x)^2}{12}\left.\frac{d^4u}{dx^4}\right|_{x=x_i} \tag{2.7}
\]

which has a truncation error proportional to (∆x)2 .


Consider now the Taylor series expansions of u (xi+1 ) and u (xi+2 ):

\[
\begin{aligned}
u(x_{i+1}) &= u(x_i) + \Delta x\left.\frac{du}{dx}\right|_{x=x_i}
 + \frac{(\Delta x)^2}{2}\left.\frac{d^2u}{dx^2}\right|_{x=x_i}
 + \frac{(\Delta x)^3}{6}\left.\frac{d^3u}{dx^3}\right|_{x=x_i} \\
u(x_{i+2}) &= u(x_i) + 2\Delta x\left.\frac{du}{dx}\right|_{x=x_i}
 + 2(\Delta x)^2\left.\frac{d^2u}{dx^2}\right|_{x=x_i}
 + \frac{4(\Delta x)^3}{3}\left.\frac{d^3u}{dx^3}\right|_{x=x_i}.
\end{aligned}
\]


We can eliminate the first derivative term by adding −2u (xi+1 ) to u (xi+2 ).

\[
-2u(x_{i+1}) + u(x_{i+2}) = -u(x_i) + (\Delta x)^2\left.\frac{d^2u}{dx^2}\right|_{x=x_i}
 + (\Delta x)^3\left.\frac{d^3u}{dx^3}\right|_{x=x_i}
\]
Rearranging this yields a three point forward approximation to d²u/dx²|_{x=x_i} in terms of u(x_i), u(x_{i+1})
and u(x_{i+2}), i.e.,
\[
\left.\frac{d^2u}{dx^2}\right|_{x=x_i} = \frac{1}{(\Delta x)^2}\left(u(x_i) - 2u(x_{i+1}) + u(x_{i+2})\right)
 - \Delta x\left.\frac{d^3u}{dx^3}\right|_{x=x_i}. \tag{2.8}
\]

A table of common finite difference approximations to d²u/dx²|_{x=x_i} is as follows.

Coefficient     u(x_{i-2})  u(x_{i-1})  u(x_i)  u(x_{i+1})  u(x_{i+2})   Error
1/(∆x)²                       1          -2        1                     -((∆x)²/12) d⁴u/dx⁴
1/(∆x)²                                   1       -2          1          -∆x d³u/dx³
1/(∆x)²           1          -2           1                              +∆x d³u/dx³
1/(12(∆x)²)      -1          16         -30       16         -1          -((∆x)⁴/90) d⁶u/dx⁶

2.4 Difference Equations for Equilibrium Problems


Consider the Poisson equation, ∇2 u = f , expressed in two-dimensional rectangular Cartesian
coordinates
\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y)
\]
This equation can be discretely approximated by using the three point second derivative central
finite difference approximations from the previous section. We can streamline our notation further
by using: uij = u (xi , yj ), ui−1j = u (xi−1 , yj ), ui+1j = u (xi+1 , yj ) etc.. Note here that the index
i is used in the x-direction and the index j is used in the y-direction.

\[
\left.\frac{\partial^2 u}{\partial x^2}\right|_{(x_i,y_j)} \approx \frac{1}{(\Delta x)^2}\left(u_{i-1j} - 2u_{ij} + u_{i+1j}\right), \qquad
\left.\frac{\partial^2 u}{\partial y^2}\right|_{(x_i,y_j)} \approx \frac{1}{(\Delta y)^2}\left(u_{ij-1} - 2u_{ij} + u_{ij+1}\right)
\]

If we set ∆x = ∆y = ∆ and substitute the difference approximations into Poisson's equation, we get
\[
\left(u_{i-1j} - 2u_{ij} + u_{i+1j}\right) + \left(u_{ij-1} - 2u_{ij} + u_{ij+1}\right) = \Delta^2 f(x_i, y_j)
\]


or
ui−1j + uij−1 − 4uij + ui+1j + uij+1 = ∆2 f (xi , yj )
Thus we can write the finite difference approximation of ∇2 u = f in terms of a five point expres-
sion with the coefficient pattern
\[
\frac{1}{\Delta^2}\begin{Bmatrix} & 1 & \\ 1 & -4 & 1 \\ & 1 & \end{Bmatrix} = f \tag{2.9}
\]
This is known as a finite difference molecule for ∇2 u = f centred at the point (xi , yj ) .
The graphical representation is given in Figure 2.4, where ◦ are the finite difference points in
the scheme. This approximation has a local truncation error of (∆²/6) ∇⁴u(x, y).

[Figure 2.4: Laplacian finite difference molecule, centred at (x_i, y_j), with neighbouring points at x_{i±1} = x_i ± ∆x and y_{j±1} = y_j ± ∆y.]

2.4.1 Dirichlet boundary condition problem


Consider a 4m by 3m rectangular plate made of a homogeneous material. What is the steady state
temperature distribution within the plate if the sides are maintained at the indicated prescribed
values? The steady state temperature is governed by Laplace's equation, ∇2 u = 0. The plate can
be partitioned into a 1m by 1m grid of nodes with numbering as shown in Figure 2.6. Since
the temperatures are prescribed on all the boundary nodes only the internal node temperatures
must be determined – u11 , u21 , u31 , u12 , u22 , u32 . Using the five point approximation to ∇2 u in
Equation (2.9) we can construct a matrix of 20 linear equations in the 20 nodal values.
The equations associated with the boundary nodes are simple i.e.,

u03 = 4, u13 = 3, u23 = 2, u33 = 1, u43 = 0,


u02 = 8, u42 = 0,
u01 = 12, u41 = 0,
u00 = 16, u10 = 9, u20 = 4, u30 = 1, u40 = 0


[Figure 2.5: Steady state temperature distribution problem. The plate occupies 0 ≤ x ≤ 4, 0 ≤ y ≤ 3, with boundary temperatures u = 16 − 4y on the left edge (x = 0), u = 0 on the right edge (x = 4), u = 4 − x on the top edge (y = 3) and u = (4 − x)² on the bottom edge (y = 0).]

u03 u13 u23 u33 u43
u02 u12 u22 u32 u42
u01 u11 u21 u31 u41
u00 u10 u20 u30 u40

Figure 2.6: A finite difference grid laid over the plate.


The equation centred on node 11 is given by the five point approximation to the Laplacian in
∇2 u = 0 i.e.,
\[
\frac{1}{\Delta^2}\left(u_{10} + u_{01} - 4u_{11} + u_{21} + u_{12}\right) = 0
\]
but ∆ = 1 so
u10 + u01 − 4u11 + u21 + u12 = 0
Similarly for the equations centred on nodes 21, 31, 12, 22, and 32:
u20 + u11 − 4u21 + u31 + u22 =0
u30 + u21 − 4u31 + u41 + u32 =0
u11 + u02 − 4u12 + u22 + u13 =0
u21 + u12 − 4u22 + u32 + u23 =0
u31 + u22 − 4u32 + u42 + u33 =0
The set of linear equations may be expressed as a matrix equation Au = b in the 20 nodal values,
ordered u00, u10, ..., u40, u01, ..., u41, u02, ..., u42, u03, ..., u43. Each boundary node contributes a
trivial row: a 1 on the diagonal with the prescribed boundary temperature on the right-hand side.
Each internal node contributes a row containing the five point pattern: 1 on the entries of its four
neighbours, −4 on its own entry, and a zero on the right-hand side (e.g., for node 11,
u10 + u01 − 4u11 + u21 + u12 = 0). Solving this 20 × 20 system gives the prescribed boundary values
together with the internal temperatures
\[
u_{11} = 7.66, \quad u_{21} = 4.15, \quad u_{31} = 1.66, \quad u_{12} = 5.48, \quad u_{22} = 3.28, \quad u_{32} = 1.48.
\]

The solution for u is accurate to within 2% of the true solution.
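The plate problem can also be assembled and solved directly in a few lines; the sketch below is illustrative only (the grid ordering, helper names and the use of NumPy are my own assumptions), and it reproduces the interior values quoted above.

```python
import numpy as np

# Assemble and solve the plate problem with the five-point molecule on the
# 5 x 4 grid of unit spacing described above.
nx, ny = 5, 4                        # nodes in the x and y directions
N = nx * ny
idx = lambda i, j: j * nx + i        # flatten (i, j) to an equation number

A = np.zeros((N, N))
b = np.zeros(N)
for j in range(ny):
    for i in range(nx):
        k = idx(i, j)
        x, y = float(i), float(j)
        if i in (0, nx - 1) or j in (0, ny - 1):        # boundary node
            A[k, k] = 1.0
            if i == 0:      b[k] = 16 - 4 * y           # left edge
            elif i == nx-1: b[k] = 0.0                  # right edge
            elif j == 0:    b[k] = (4 - x) ** 2         # bottom edge
            else:           b[k] = 4 - x                # top edge
        else:                                           # interior: five-point stencil
            A[k, [idx(i-1, j), idx(i+1, j), idx(i, j-1), idx(i, j+1)]] = 1.0
            A[k, k] = -4.0

u = np.linalg.solve(A, b).reshape(ny, nx)
print(np.round(u[1:3, 1:4], 2))      # interior values, cf. 7.66, 4.15, 1.66, ...
```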


This can be generalised for larger finite difference grids over rectangular domains. Consider
now a rectangular grid of size (N1 + 1) by (N2 + 1), shown graphically in Figure 2.7. Using a five
point approximation to the Laplacian we obtain the following set of (N1 + 1) by (N2 + 1) linear
equations in (N1 + 1) × (N2 + 1) variables, Au = b.
\[
\begin{bmatrix}
I & & & & & \\
J & \alpha & J & & & \\
 & J & \alpha & J & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & J & \alpha & J \\
 & & & & & I
\end{bmatrix}
\begin{bmatrix} \nu_0 \\ \nu_1 \\ \nu_2 \\ \vdots \\ \nu_{N_2-1} \\ \nu_{N_2} \end{bmatrix}
=
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_{N_2-1} \\ \beta_{N_2} \end{bmatrix}
\]
Here α and J are (N1 + 1) × (N1 + 1) submatrices given by
\[
\alpha = \frac{1}{\Delta^2}\begin{bmatrix}
\Delta^2 & & & & \\
1 & -4 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & -4 & 1 \\
 & & & & \Delta^2
\end{bmatrix}, \qquad
J = \frac{1}{\Delta^2}\begin{bmatrix}
0 & & & & \\
 & 1 & & & \\
 & & \ddots & & \\
 & & & 1 & \\
 & & & & 0
\end{bmatrix}
\]

[Figure 2.7: the (N1 + 1) by (N2 + 1) rectangular finite difference grid, with nodes u00, u10, ..., u_{N1 0} along the bottom row through to u_{0N2}, u_{1N2}, ..., u_{N1 N2} along the top row.]

and I is the (N1 + 1) × (N1 + 1) identity matrix, and ν_i and β_i are (N1 + 1) subvectors given by
\[
\nu_i = \begin{bmatrix} u_{i0} \\ u_{i1} \\ u_{i2} \\ \vdots \\ u_{i(N_1-1)} \\ u_{iN_1} \end{bmatrix}, \qquad
\beta_i = \begin{bmatrix} b_{i0} \\ b_{i1} \\ b_{i2} \\ \vdots \\ b_{i(N_1-1)} \\ b_{iN_1} \end{bmatrix}.
\]

2.4.2 Derivative boundary conditions


For Neumann problems the boundary derivatives normal to the boundary are prescribed. We can
approximate the boundary derivatives by finite differences and place them into the system of linear
equations Au = b. For example, if the derivative at a boundary point (x_i, y_j) is prescribed as
\[
\left.\frac{\partial u}{\partial x}\right|_{(x_i,y_j)} = c,
\]
then using a two point backward difference approximation to this derivative we get the linear equation
\[
\frac{1}{\Delta x}\left(-u_{i-1j} + u_{ij}\right) = c.
\]
The truncation error of the finite difference approximation to the boundary derivative should
match the truncation error of the finite difference approximation to the differential equation or the
overall solution accuracy will suffer. For example, if three point central difference approximations
are used for second derivatives in the differential equation, then these approximations will have a
local truncation error proportional to (∆x)2 . In this case if two point forward difference approxi-
mations to the boundary first derivative were used the overall solution accuracy would suffer since


the local truncation error of this approximation is proportional to ∆x. In order to maintain a so-
lution accuracy proportional to (∆x)2 three point difference approximations to the boundary first
derivatives must be used, e.g., a three point backward difference
\[
\frac{1}{2\Delta x}\left(u_{i-2j} - 4u_{i-1j} + 3u_{ij}\right) = c.
\]

2.5 Difference Equations for Propagation Problems


Consider the one-dimensional diffusion equation which models heat flow

\[
\frac{\partial^2 u}{\partial x^2} - a\frac{\partial u}{\partial t} = 0
\]
where a is the reciprocal of the thermal diffusivity. The equation for u (x, t) is usually defined over
an open domain, i.e., over a closed spatial interval, X0 ≤ x ≤ X1 , and an open temporal interval,
t ≥ 0. At time t = 0 an initial state is prescribed, u (x, 0) = h (x). For time t > 0 boundary
conditions are prescribed at x = X0 and x = X1 i.e., u (X0 , t) = g0 (t) and u (X1 , t) = g1 (t).
This is shown graphically in Figure 2.8.

[Figure 2.8: the domain for the heat equation ∂²u/∂x² − a ∂u/∂t = 0: the closed spatial interval X0 ≤ x ≤ X1, with boundary conditions g0(t) at x = X0 and g1(t) at x = X1, and initial condition h(x) at t = 0.]

2.5.1 Explicit formulations


Consider using a three point central difference for ∂²u/∂x²|_{(x_i,t_n)} and a two point forward
difference for ∂u/∂t|_{(x_i,t_n)}. We will use the shorthand notation u(x_i, t_n) = u_i^n etc. The heat
equation is approximated as
\[
\frac{\partial^2 u}{\partial x^2} - a\frac{\partial u}{\partial t} = 0
 \approx \frac{1}{(\Delta x)^2}\left(u_{i-1}^n - 2u_i^n + u_{i+1}^n\right)
 - a\,\frac{1}{\Delta t}\left(-u_i^n + u_i^{n+1}\right)
\]
We now have the unknown u_i^{n+1} expressed in terms of known values at t = t_n, i.e., u_{i−1}^n, u_i^n and
u_{i+1}^n. Rearranging we have
\[
u_i^{n+1} = u_i^n + \frac{\Delta t}{a(\Delta x)^2}\left(u_{i-1}^n - 2u_i^n + u_{i+1}^n\right)
\]
Setting r = ∆t/(a(∆x)²) and rearranging we can obtain the molecule for ∂²u/∂x² − a ∂u/∂t = 0 centred
on (x_i, t_n), i.e.,
\[
\begin{Bmatrix} & -1 & \\ r & 1 - 2r & r \end{Bmatrix} = 0 \tag{2.10}
\]
Figure 2.9 shows this molecule graphically, where ◦ are known values and • are unknown values.

[Figure 2.9: the explicit molecule: known values (◦) at time t0 and positions x0 − ∆x, x0, x0 + ∆x determine the single unknown value (•) at time t0 + ∆t.]

A formula, such as this, which expresses one unknown value in terms of known values is called
an explicit formula and the corresponding finite difference scheme is called an explicit scheme.
It will be shown (in Section 2.6) that the explicit method is only stable for 0 < ∆t/(a(∆x)²) ≤ 1/2.
When ∆t/(a(∆x)²) > 1/2 the explicit scheme yields oscillatory solutions whose magnitudes increase
exponentially with time. Although the explicit method is computationally simple it requires a
small time step, i.e., since ∆t/(a(∆x)²) ≤ 1/2 then ∆t ≤ a(∆x)²/2, and since we would like ∆x to be
small, ∆t must be kept very small.
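The following illustrative Python sketch (not from the notes) applies the explicit molecule (2.10) to an initial temperature spike and shows the solution staying bounded for r ≤ 1/2 but growing rapidly for r > 1/2; the grid size and initial condition are arbitrary choices.

```python
import numpy as np

def explicit_heat_step(u, r):
    """One explicit step of u_xx - a u_t = 0 using molecule (2.10):
    u_i^{n+1} = r u_{i-1}^n + (1 - 2r) u_i^n + r u_{i+1}^n.
    The end values u[0] and u[-1] are held fixed (Dirichlet)."""
    un = u.copy()
    un[1:-1] = r * u[:-2] + (1 - 2 * r) * u[1:-1] + r * u[2:]
    return un

nx, a = 21, 1.0
dx = 1.0 / (nx - 1)
u = np.zeros(nx)
u[nx // 2] = 1.0                     # initial temperature spike
for r in (0.4, 0.6):                 # r <= 1/2 stable, r > 1/2 unstable
    dt = r * a * dx**2
    v = u.copy()
    for _ in range(50):
        v = explicit_heat_step(v, r)
    print(f"r = {r}: max |u| after 50 steps = {np.abs(v).max():.3g}")
```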


2.5.2 Implicit formulations


The restriction of 0 < ∆t/(a(∆x)²) ≤ 1/2 for explicit schemes can be avoided if we replace the approxi-
mation for ∂²u/∂x²|_{(x_i,t_n)} by the average of its finite difference representations at t_n and t_{n+1} = t_n + ∆t,
i.e.,
\[
\frac{\partial^2 u}{\partial x^2} - a\frac{\partial u}{\partial t} = 0
 \approx \frac{1}{2(\Delta x)^2}\left(u_{i-1}^n - 2u_i^n + u_{i+1}^n + u_{i-1}^{n+1} - 2u_i^{n+1} + u_{i+1}^{n+1}\right)
 - a\,\frac{1}{\Delta t}\left(-u_i^n + u_i^{n+1}\right)
\]
Setting r = ∆t/(a(∆x)²) and rearranging gives
\[
\frac{r}{2}u_{i-1}^{n+1} - (1 + r)u_i^{n+1} + \frac{r}{2}u_{i+1}^{n+1}
 = -\frac{r}{2}u_{i-1}^{n} - (1 - r)u_i^{n} - \frac{r}{2}u_{i+1}^{n}
\]
The left hand side contains three unknown values expressed in terms of three known values on the
right hand side. In terms of the molecule for ∂²u/∂x² − a ∂u/∂t = 0 centred on (x_i, t_n) we have
\[
\begin{Bmatrix} \dfrac{r}{2} & -(1 + r) & \dfrac{r}{2} \\[4pt] \dfrac{r}{2} & 1 - r & \dfrac{r}{2} \end{Bmatrix} = 0 \tag{2.11}
\]
Graphically this is shown in Figure 2.10, where ◦ are known values and • are unknown values.

[Figure 2.10: the Crank-Nicolson molecule: three known values (◦) at time t0 and three unknown values (•) at time t0 + ∆t, at positions x0 − ∆x, x0, x0 + ∆x.]

If there are N internal spatial mesh points along each time row we obtain N simultaneous
linear equations to solve for the N unknown values at each time step. A method such as this where
the calculation of unknown values requires the solution of a set of simultaneous equations is called
an implicit scheme.
We can generalise this method of approximating the spatial derivative by the average of the
finite difference representations of the spatial derivative at tn and tn + ∆t in terms of a general
weighting of the tn and tn + ∆t representations. Consider approximating the spatial derivative by


θ (0 ≤ θ ≤ 1) times its representation at t_n + ∆t and (1 − θ) times its representation at t_n. The
finite difference representation of the heat equation would then become:
\[
\frac{\partial^2 u}{\partial x^2} - a\frac{\partial u}{\partial t} = 0
 \approx \frac{1}{(\Delta x)^2}\left[(1-\theta)\left(u_{i-1}^n - 2u_i^n + u_{i+1}^n\right)
 + \theta\left(u_{i-1}^{n+1} - 2u_i^{n+1} + u_{i+1}^{n+1}\right)\right]
 - a\,\frac{1}{\Delta t}\left(-u_i^n + u_i^{n+1}\right)
\]
The molecule for ∂²u/∂x² − a ∂u/∂t = 0 centred at (x_i, t_n) is:
\[
\begin{Bmatrix} \theta r & -(1 + 2\theta r) & \theta r \\ (1-\theta)r & 1 - 2(1-\theta)r & (1-\theta)r \end{Bmatrix} = 0. \tag{2.12}
\]

Comparing the molecule in Equation (2.12) with the molecule in Equation (2.11) we can see
that the molecule in Equation (2.11) corresponds to θ = 1/2. This θ = 1/2 is a special case known as
the Crank-Nicolson implicit method. The names for other special values of θ are:

• θ = 0 : Fully explicit scheme - Equation (2.10).

• θ = 1/2 : Crank-Nicolson implicit scheme - Equation (2.11).

• θ = 2/3 : Galerkin implicit scheme.

• θ = 1 : Fully implicit scheme.
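An illustrative Python sketch of one step of the general θ scheme (2.12) is given below (not from the notes). A dense solve is used for clarity where a tridiagonal solver would normally be preferred, and all names are my own assumptions; setting θ = 1/2 gives Crank-Nicolson, which stays bounded even for r well above 1/2.

```python
import numpy as np

def theta_step(u, r, theta):
    """One step of the theta scheme for u_xx - a u_t = 0 with fixed end values."""
    n = len(u)
    A = np.eye(n)                      # left-hand side (unknown values at n+1)
    rhs = u.copy()                     # right-hand side (known values at n)
    for i in range(1, n - 1):
        A[i, i-1], A[i, i], A[i, i+1] = -theta*r, 1 + 2*theta*r, -theta*r
        rhs[i] = ((1-theta)*r*u[i-1]
                  + (1 - 2*(1-theta)*r)*u[i]
                  + (1-theta)*r*u[i+1])
    return np.linalg.solve(A, rhs)

nx = 21
u = np.zeros(nx)
u[nx // 2] = 1.0                       # initial spike
for _ in range(50):
    u = theta_step(u, r=2.0, theta=0.5)   # Crank-Nicolson, r = 2
print(np.abs(u).max())                 # stays bounded despite r > 1/2
```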

2.6 Von Neumann Stability Analysis for Propagation Problems


Von Neumann stability analysis, also known as Fourier stability analysis, expresses the errors in
the finite difference solutions in terms of a Fourier series and then looks at the ’growth’ of this
Fourier series over time.
If we let nj be the error in the finite difference solution of a propagation problem at a spatial
location1 j and a temporal location n, then we can express this error in terms of the eigenmodes of
the problem, i.e.,
X∞
n
j = Ak enαk ∆t eijβk ∆x
k=1
√ kπ
where i = −1, Ak are constants, αk are some arbitrary complex numbers and βk = where
J∆x
αk ∆t
J∆x = l, is the length of the solution domain. If we let ξ = e and just consider one mode
(k = 1) of the Fourier series we have

nj = ξ n eijβ∆x (2.13)


¹ Note: because we need to use $i$ to represent the complex number, i.e., $i = \sqrt{-1}$, we will now represent the spatial index using $j$.


It can be seen then that this mode of the error will grow exponentially as time increases (n → ∞) if
|ξ| > 1. Hence the eigenmode will be stable only if

|ξ| ≤ 1 (2.14)

This stability criterion can be used to investigate the stability of a finite difference scheme.
Consider now the fully explicit finite difference scheme for the heat equation as given in Equa-
tion (2.10), i.e.,
$$u_j^{n+1} = r u_{j-1}^{n} + (1-2r)\,u_j^{n} + r u_{j+1}^{n} \qquad (2.15)$$

where $r = \dfrac{\Delta t}{a(\Delta x)^2}$ and $u_j^n$ is the finite difference solution at spatial location $j$ and temporal location $n$. Substituting Equation (2.13) into Equation (2.15) (i.e., $\epsilon_j^n$ in place of $u_j^n$) we have

$$\xi^{n+1} e^{ij\beta\Delta x} = r\xi^n e^{i(j-1)\beta\Delta x} + (1-2r)\,\xi^n e^{ij\beta\Delta x} + r\xi^n e^{i(j+1)\beta\Delta x}$$

or

$$\xi^n e^{ij\beta\Delta x}\,\xi = r\xi^n e^{ij\beta\Delta x}e^{-i\beta\Delta x} + (1-2r)\,\xi^n e^{ij\beta\Delta x} + r\xi^n e^{ij\beta\Delta x}e^{i\beta\Delta x} = \xi^n e^{ij\beta\Delta x}\left[re^{-i\beta\Delta x} + (1-2r) + re^{i\beta\Delta x}\right]$$

Eliminating $\xi^n e^{ij\beta\Delta x}$ from both sides we have

$$\begin{aligned} \xi &= re^{-i\beta\Delta x} + (1-2r) + re^{i\beta\Delta x} \\ &= (1-2r) + 2r\cos(\beta\Delta x) \\ &= 1 + 2r\left(\cos\beta\Delta x - 1\right) \\ &= 1 - 4r\sin^2\left(\frac{\beta\Delta x}{2}\right) \end{aligned}$$

Now the error will be stable if $|\xi| \le 1$ and, since $r > 0$ (as $\Delta x > 0$ and $\Delta t > 0$), the explicit difference scheme will be unstable if $\xi < -1$, i.e.,

$$1 - 4r\sin^2\left(\frac{\beta\Delta x}{2}\right) < -1 \quad\Longrightarrow\quad 4r\sin^2\left(\frac{\beta\Delta x}{2}\right) > 2 \quad\Longrightarrow\quad r > \frac{1}{2}.$$
Hence the explicit finite difference scheme will be unstable if $r = \dfrac{\Delta t}{a(\Delta x)^2} > \dfrac{1}{2}$.
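A quick numerical check of this result (an illustration added here, not from the notes) evaluates the amplification factor $\xi = 1 - 4r\sin^2(\beta\Delta x/2)$ over a range of modes and confirms that $|\xi| \le 1$ for every mode exactly when $r \le 1/2$:

import numpy as np

def explicit_scheme_stable(r, n_modes=100):
    # xi = 1 - 4 r sin^2(beta*dx/2) is the amplification factor of the
    # fully explicit scheme; the scheme is stable if |xi| <= 1 for all modes.
    half_angles = np.linspace(0.0, np.pi / 2.0, n_modes)
    xi = 1.0 - 4.0 * r * np.sin(half_angles) ** 2
    return bool(np.all(np.abs(xi) <= 1.0))

print(explicit_scheme_stable(0.4))   # True  (r <= 1/2)
print(explicit_scheme_stable(0.6))   # False (r > 1/2)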

Chapter 3

Nonlinear Equations

3.1 Revision
3.1.1 The Problem of Nonlinear Root-finding
In this module we consider the problem of using numerical techniques to find the roots of nonlinear
equations, f (x) = 0. Initially we examine the case where the nonlinear equations are a scalar
function of a single independent variable, x. Later, we shall consider the more difficult problem
where we have a system of n nonlinear equations in n independent variables, f (x) = 0.
Since the roots of a nonlinear equation, f (x) = 0, cannot in general be expressed in closed
form, we must use approximate methods to solve the problem. Usually, iterative techniques are
used to start from an initial approximation to a root, and produce a sequence of approximations $x^{(0)}, x^{(1)}, x^{(2)}, \ldots,$

which converge toward a root. Convergence to a root is usually possible provided that the function
f (x) is sufficiently smooth and the initial approximation is close enough to the root.

3.1.2 Rate of Convergence


Let
x(0) , x(1) , x(2) , . . .


be a sequence which converges to a root $\chi$, and let $\epsilon^{(k)} = x^{(k)} - \chi$. If there exists a number $p$ and a non-zero constant $c$ such that

$$\lim_{k\to\infty} \frac{\left|\epsilon^{(k+1)}\right|}{\left|\epsilon^{(k)}\right|^{p}} = c, \qquad (3.1)$$

then p is called the order of convergence of the sequence. For p = 1, 2, 3 the order is said to be
linear, quadratic and cubic respectively.

3.1.3 Termination Criteria


Since numerical computations are performed using floating point arithmetic, there will always be
a finite precision to which a function can be evaluated. Let δ denote the limiting accuracy of


the nonlinear function near a root. This limiting accuracy of the function imposes a limit on the accuracy, $\epsilon$, of the root. The accuracy to which the root may be found is given by:

$$\epsilon = \frac{\delta}{|f'(\chi)|}. \qquad (3.2)$$

This is the best error bound for any root finding method. Note that $\epsilon$ is large when the first derivative of the function at the root, $|f'(\chi)|$, is small. In this case the problem of finding the root is ill-conditioned. This is shown graphically in Figure 3.1.

FIGURE 3.1: A function band $f(x) \pm \delta$ and the resulting root uncertainties $2\epsilon_1$ and $2\epsilon_2$; at $\chi_1$ the derivative is small so $\epsilon_1$ is large (ill-conditioned), while at $\chi_2$ the derivative is large so $\epsilon_2$ is small (well-conditioned).

When the sequence


x(0) , x(1) , x(2) , . . . ,


converges to the root, $\chi$, then the differences $\left|x^{(k)} - x^{(k-1)}\right|$ will decrease until $\left|x^{(k)} - \chi\right| \approx \epsilon$. With further iterations, rounding errors will dominate and the differences will vary irregularly. The iterations should be terminated and $x^{(k)}$ be accepted as the estimate for the root when the following two conditions are satisfied simultaneously:

1. $\left|x^{(k+1)} - x^{(k)}\right| \ge \left|x^{(k)} - x^{(k-1)}\right|$ and (3.3)

2. $\dfrac{\left|x^{(k)} - x^{(k-1)}\right|}{1 + \left|x^{(k)}\right|} < \Delta$. (3.4)

$\Delta$ is some coarse tolerance to prevent the iterations from being terminated before $x^{(k)}$ is close to $\chi$, i.e. before the step size $\left|x^{(k)} - x^{(k-1)}\right|$ becomes "small". The condition (3.4) tests the relative offset whether $x^{(k)}$ is much larger than 1 or much smaller than 1. In practice, these conditions are used in conjunction with the condition that the number of iterations not exceed some user defined limit.
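A minimal sketch of this termination test in Python (the variable names are illustrative, not from the notes):

def accept_root(x_km1, x_k, x_kp1, coarse_tol=1e-4):
    # Conditions (3.3) and (3.4): accept x_k as the root estimate when the
    # step size has stopped decreasing and the previous relative step is
    # below the coarse tolerance Delta (here coarse_tol).
    cond_33 = abs(x_kp1 - x_k) >= abs(x_k - x_km1)
    cond_34 = abs(x_k - x_km1) / (1.0 + abs(x_k)) < coarse_tol
    return cond_33 and cond_34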

3.1.4 Bisection Method


Suppose that $f(x)$ is continuous on an interval $\left[x^{(0)}, x^{(1)}\right]$. A root is bracketed on the interval $\left[x^{(0)}, x^{(1)}\right]$ if $f\left(x^{(0)}\right)$ and $f\left(x^{(1)}\right)$ have opposite sign.
Let

$$x^{(2)} = x^{(0)} + \frac{x^{(1)} - x^{(0)}}{2} \qquad (3.5)$$



be the mid-point of the interval $\left[x^{(0)}, x^{(1)}\right]$. Three mutually exclusive possibilities exist:

• if $f\left(x^{(2)}\right) = 0$ then the root has been found;

• if $f\left(x^{(2)}\right)$ has the same sign as $f\left(x^{(0)}\right)$ then the root is in the interval $\left[x^{(2)}, x^{(1)}\right]$;

• if $f\left(x^{(2)}\right)$ has the same sign as $f\left(x^{(1)}\right)$ then the root is in the interval $\left[x^{(0)}, x^{(2)}\right]$.

In the last two cases, the size of the interval bracketing the root has decreased by a factor of two.
The next iteration is performed by evaluating the function at the mid-point of the new interval.
After k iterations the size of the interval bracketing the root has decreased to:

$$\frac{x^{(1)} - x^{(0)}}{2^{k}}$$


The process is shown graphically in Figure 3.2. The bisection method is guaranteed to converge
to a root. If the initial interval brackets more than one root, then the bisection method will find one
of them.
FIGURE 3.2: Successive bisection estimates $x^{(2)}, x^{(3)}, x^{(4)}, x^{(5)}$ within the initial bracket $\left[x^{(0)}, x^{(1)}\right]$.

Since the interval size is reduced by a factor of two at each iteration, it is simple to calculate in advance the number of iterations, $k$, required to achieve a given tolerance, $\epsilon_0$, in the solution:

$$k = \log_2\left(\frac{x^{(1)} - x^{(0)}}{\epsilon_0}\right).$$
The bisection method has a relatively slow linear convergence.
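For reference, a straightforward implementation of the bisection iteration (a sketch, not the notes' own code) is:

def bisection(f, a, b, tol=1e-6, max_iter=100):
    # Bisection method: repeatedly halve a bracketing interval [a, b].
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = a + (b - a) / 2.0
        fmid = f(mid)
        if fmid == 0.0 or (b - a) / 2.0 < tol:
            return mid
        if fmid * fa > 0:      # root lies in [mid, b]
            a, fa = mid, fmid
        else:                  # root lies in [a, mid]
            b, fb = mid, fmid
    return a + (b - a) / 2.0

# the root of f(x) = x^2 - 1 in [0, 3], as used in Section 3.1.8
print(bisection(lambda x: x**2 - 1, 0.0, 3.0))   # approximately 1.0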

3.1.5 Secant Method


The bisection method uses no information about the function values, f (x), apart from whether
they are positive or negative at certain values of x. Suppose that
$$\left|f\left(x^{(k)}\right)\right| < \left|f\left(x^{(k-1)}\right)\right|.$$
Then we would expect the root to lie closer to x(k) than x(k−1) . Instead of choosing the new estimate
to lie at the midpoint of the current interval, as is the case with the bisection method, the secant method chooses the x-intercept of the secant line to the curve, the line through $\left(x^{(k-1)}, f\left(x^{(k-1)}\right)\right)$ and $\left(x^{(k)}, f\left(x^{(k)}\right)\right)$. This places the new estimate closer to the endpoint for which $f(x)$ has the smallest absolute value. The new estimate is:
 
$$x^{(k+1)} = x^{(k)} - \frac{f\left(x^{(k)}\right)\left(x^{(k)} - x^{(k-1)}\right)}{f\left(x^{(k)}\right) - f\left(x^{(k-1)}\right)}. \qquad (3.6)$$
Note that the secant method requires two initial function evaluations but only one new function
evaluation is made at each iteration. The secant method is shown graphically in Figure 3.3.
The secant method does not have the root bracketing property of the bisection method since
the new estimate, x(k+1) , of the root need not lie within the bounds defined by x(k−1) and x(k) . As
a consequence, the secant method does not always converge, but when it does so it usually does
so faster than the bisection method. It can be shown that the order of convergence of the secant
method is:

$$\frac{1+\sqrt{5}}{2} \approx 1.618.$$


FIGURE 3.3: Secant iterates $x^{(2)}$ and $x^{(3)}$ generated from the initial points $x^{(0)}$ and $x^{(1)}$.
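A compact implementation of Equation (3.6) (a sketch; the termination test is the relative-step criterion of Section 3.1.3):

def secant(f, x0, x1, tol=1e-6, max_iter=50):
    # Secant method: reuse the previous function value so only one new
    # evaluation is needed per iteration.
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            raise ZeroDivisionError("flat secant: f(x_k) equals f(x_{k-1})")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) / (1.0 + abs(x2)) < tol:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1

print(secant(lambda x: x**2 - 1, 0.0, 3.0))   # approximately 1.0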

3.1.6 Regula Falsi


Regula falsi is a variant of the secant method. The difference between the secant method and
regula falsi lies in the choice of points used to form the secant. While the secant method uses
the two most recent function evaluations to form the secant line through $\left(x^{(k-1)}, f\left(x^{(k-1)}\right)\right)$ and $\left(x^{(k)}, f\left(x^{(k)}\right)\right)$, regula falsi forms the secant line through the most recent points that bracket the
root. The regula falsi steps are shown graphically in Figure 3.4. In this example, the point x(0)
remains active for many steps.

FIGURE 3.4: Regula falsi steps; the endpoint $x^{(0)}$ remains an active bracket for many iterations.

The advantage of regula falsi is that like the bisection method, it is always convergent. How-
ever, like the bisection method, it has only linear convergence. Examples where the regula falsi
method is slow to converge are not hard to find. One example is shown in Figure 3.5.


FIGURE 3.5: An example where the regula falsi method converges slowly.

3.1.7 Newton’s Method


The methods discussed so far have required only function values to compute the new estimate
of the root. Newton’s method requires that both the function value and the first derivative be
evaluated. Geometrically, Newton’s method proceeds by extending the tangent line to the function
at the current point until it crosses the x-axis. The new estimate of the root is taken as the abscissa
of the zero crossing. Newton’s method is defined by the following iteration:
$$x^{(k+1)} = x^{(k)} - \frac{f\left(x^{(k)}\right)}{f'\left(x^{(k)}\right)}. \qquad (3.7)$$
The update is shown graphically in Figure 3.6.

FIGURE 3.6: Newton updates $x^{(1)}$ and $x^{(2)}$ obtained from tangent lines at successive estimates.

Newton’s method may be derived from the Taylor series expansion of the function:
$$f(x+\Delta x) = f(x) + f'(x)\,\Delta x + f''(x)\,\frac{(\Delta x)^2}{2} + \ldots \qquad (3.8)$$

For a smooth function and small values of $\Delta x$, the function is approximated well by the first two terms. Thus $f(x+\Delta x) = 0$ implies that $\Delta x = -\dfrac{f(x)}{f'(x)}$. Far from a root the higher order terms are
significant and Newton’s method can give highly inaccurate corrections. In such cases the Newton
iterations may never converge to a root. In order to achieve convergence the starting values must be
reasonably close to a root. An example of divergence using Newton’s method is given in Figure 3.7.


FIGURE 3.7: An example of divergence of Newton's method.

Newton's method exhibits quadratic convergence. Thus, near a root the number of significant digits doubles with each iteration. The strong convergence makes Newton's method attractive in cases where the derivative can be evaluated efficiently and is continuous and non-zero in the neighbourhood of the root (which is not the case at a multiple root, where $f'$ vanishes).
Whether the secant method should be used in preference to Newton’s method depends upon
the relative work required to compute the first derivative of the function. If the work required to
evaluate the first derivative is greater than 0.44 times the work required to evaluate the function,
then use the secant method, otherwise use Newton’s method.
It is easy to circumvent the poor global convergence properties of Newton’s method by com-
bining it with the bisection method. This hybrid method uses a bisection step whenever the Newton
method takes the solution outside the bisection bracket. Global convergence is thus assured while
retaining quadratic convergence near the root. Line searches of the Newton step from x(k) to x(k+1)
are another method for achieving better global convergence properties of Newton’s method.
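As a sketch of this hybrid idea (an illustration, not a verbatim algorithm from the notes), a Newton iteration that falls back to a bisection step whenever the Newton update leaves the current bracket could look like:

def newton_bisection(f, dfdx, a, b, tol=1e-10, max_iter=100):
    # Globally convergent hybrid: take a Newton step when it stays inside
    # the bracket [a, b], otherwise bisect; the bracket is updated each step.
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("the initial interval must bracket a root")
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        fx, dfx = f(x), dfdx(x)
        if dfx != 0.0:
            x_new = x - fx / dfx
            if not (a < x_new < b):
                x_new = 0.5 * (a + b)    # Newton step left the bracket: bisect
        else:
            x_new = 0.5 * (a + b)
        f_new = f(x_new)
        if f_new * fa > 0:               # keep the sign change inside [a, b]
            a, fa = x_new, f_new
        else:
            b, fb = x_new, f_new
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(newton_bisection(lambda x: x**2 - 1, lambda x: 2 * x, 0.0, 3.0))   # ~1.0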

3.1.8 Examples
In the following examples, the bisection, secant, regula falsi and Newton’s methods are applied to
find the root of the non-linear function $f(x) = x^2 - 1$ between $[0, 3]$. $x_L$ and $x_R$ are the left and right bracketing values and $x_{new}$ is the new value determined at each iteration. $\epsilon$ is the distance
from the true solution, nf is the number of function evaluations required at each iteration. The
bisection method is terminated when conditions (3.3) and (3.4) are simultaneously satisfied for
∆ = 1 × 10−4 . The remaining methods determine the solution to the same level of accuracy as the
bisection method.
The examples show the linear convergence rate of the bisection and regula falsi methods, the
better than linear convergence rate of the secant method and the quadratic convergence rate of
Newton’s method. It is also seen that although Newton’s method converges in 5 iterations, 5
function and 5 derivative evaluations, to give a total of 10 evaluations, are required. This contrasts
to the secant method where for the 8 iterations, 9 function evaluations are made.


Bisection Method - f(x) = x² - 1


k xL xR xnew fL(x) fR(x) f(xnew) ε nf |xk+1-xk| |xk-xk-1| |xk-xk-1|/1+|xk|
0 0.000000 3.000000 1.500000 -1.000000 8.000000 1.250000 0.500000 3
1 0.000000 1.500000 0.750000 -1.000000 1.250000 -0.437500 0.250000 1
2 0.750000 1.500000 1.125000 -0.437500 1.250000 0.265625 0.125000 1 0.375000 0.750000 0.428571
3 0.750000 1.125000 0.937500 -0.437500 0.265625 -0.121094 0.062500 1 0.187500 0.375000 0.176471
4 0.937500 1.125000 1.031250 -0.121094 0.265625 0.063477 0.031250 1 0.093750 0.187500 0.096774
5 0.937500 1.031250 0.984375 -0.121094 0.063477 -0.031006 0.015625 1 0.046875 0.093750 0.046154
6 0.984375 1.031250 1.007813 -0.031006 0.063477 0.015686 0.007813 1 0.023438 0.046875 0.023622
7 0.984375 1.007813 0.996094 -0.031006 0.015686 -0.007797 0.003906 1 0.011719 0.023438 0.011673
8 0.996094 1.007813 1.001953 -0.007797 0.015686 0.003910 0.001953 1 0.005859 0.011719 0.005871
9 0.996094 1.001953 0.999023 -0.007797 0.003910 -0.001952 0.000977 1 0.002930 0.005859 0.002927
10 0.999023 1.001953 1.000488 -0.001952 0.003910 0.000977 0.000488 1 0.001465 0.002930 0.001466
11 0.999023 1.000488 0.999756 -0.001952 0.000977 -0.000488 0.000244 1 0.000732 0.001465 0.000732
12 0.999756 1.000488 1.000122 -0.000488 0.000977 0.000244 0.000122 1 0.000366 0.000732 0.000366
13 0.999756 1.000122 0.999939 -0.000488 0.000244 -0.000122 0.000061 1 0.000183 0.000366 0.000183
14 0.999939 1.000122 1.000031 0.000031 0.000092 0.000183 0.000092
16

Secant Method - f(x) = x² - 1

k   x1         x2         f(x1)       f(x2)       ε          nf   |xk-xk-1|/1+|xk|
0   0.000000   3.000000   -1.000000    8.000000   2.000000   2
1   3.000000   0.333333    8.000000   -0.888889   0.666667   1    2.000000
2   0.333333   0.600000   -0.888889   -0.640000   0.400000   1    0.166667
3   0.600000   1.285714   -0.640000    0.653061   0.285714   1    0.300000
4   1.285714   0.939394    0.653061   -0.117539   0.060606   1    0.178571
5   0.939394   0.992218   -0.117539   -0.015504   0.007782   1    0.026515
6   0.992218   1.000244   -0.015504    0.000488   0.000244   1    0.004013
7   1.000244   0.999999    0.000488   -0.000002   0.000001   1    0.000123
8   0.999999   1.000000    0.000000                               0.000000
                                                             9

Regula Falsi Method - f(x) = x² - 1
k x1 x2 f(x1) f(x2) xnew f(xnew) ε nf |xk-xk-1|/1+|xk|
0 0.000000 3.000000 -1.000000 8.000000 0.333333 -0.888889 0.666667 3
1 0.333333 3.000000 -0.888889 8.000000 0.600000 -0.640000 0.400000 1 0.166667
2 0.600000 3.000000 -0.640000 8.000000 0.777778 -0.395062 0.222222 1 0.100000
3 0.777778 3.000000 -0.395062 8.000000 0.882353 -0.221453 0.117647 1 0.055556
4 0.882353 3.000000 -0.221453 8.000000 0.939394 -0.117539 0.060606 1 0.029412
5 0.939394 3.000000 -0.117539 8.000000 0.969231 -0.060592 0.030769 1 0.015152
6 0.969231 3.000000 -0.060592 8.000000 0.984496 -0.030767 0.015504 1 0.007692
7 0.984496 3.000000 -0.030767 8.000000 0.992218 -0.015504 0.007782 1 0.003876
8 0.992218 3.000000 -0.015504 8.000000 0.996101 -0.007782 0.003899 1 0.001946
9 0.996101 3.000000 -0.007782 8.000000 0.998049 -0.003899 0.001951 1 0.000975
10 0.998049 3.000000 -0.003899 8.000000 0.999024 -0.001951 0.000976 1 0.000488
11 0.999024 3.000000 -0.001951 8.000000 0.999512 -0.000976 0.000488 1 0.000244
12 0.999512 3.000000 -0.000976 8.000000 0.999756 -0.000488 0.000244 1 0.000122
13 0.999756 3.000000 -0.000488 8.000000 0.999878 0.000122 0.000061
15

Newton's Method - f(x) = x² - 1; f'(x) = 2x
k x f(x) f’(x) ε nf nf’ |xk-xk-1|/1+|xk|
0 3.000000 8.000000 6.000000 2.000000 1 1
1 1.666667 1.777778 3.333333 0.666667 1 1 0.500000
2 1.133333 0.284444 2.266667 0.133333 1 1 0.250000
3 1.007843 0.015748 2.015686 0.007843 1 1 0.062500
4 1.000031 0.000061 2.000061 0.000031 1 1 0.003906
5 1.000000 0.000000 0.000015
5 5

FIGURE 3.8: Iteration histories for the bisection, secant, regula falsi and Newton methods applied to f(x) = x² - 1 on [0, 3].


3.2 Combining Methods


Newton's method and the secant method do not always converge to the desired root; they can converge to a different root outside the initial bracket, or simply diverge away from the desired root. In these cases we would want to switch to the regula falsi method until Newton's method or the secant method starts to converge.

We may also want to combine methods when using the regula falsi method. The regula falsi method can have a convergence rate that is slower than bisection if the function is one sided around the root.

3.3 Laguerre’s Method


In the previous sections we have investigated methods for finding the roots of general functions.
These techniques may sometimes work for finding roots of polynomials, but often problems arise
because of multiple roots or complex roots. Better methods are available for nonlinear functions
with these types of roots. One such technique is Laguerre’s method.
Consider the nth order polynomial:

$$P_n(x) = (x - x_1)(x - x_2)\ldots(x - x_n). \qquad (3.9)$$

Taking the logarithm of both sides gives:

ln |Pn (x)| = ln |x − x1 | + ln |x − x2 | + . . . + ln |x − xn | . (3.10)

Let:
$$G = \frac{d \ln |P_n(x)|}{dx} \qquad (3.11)$$
$$\;\;= \frac{1}{x-x_1} + \frac{1}{x-x_2} + \ldots + \frac{1}{x-x_n} \qquad (3.12)$$
$$\;\;= \frac{P_n'(x)}{P_n(x)} \qquad (3.13)$$
and
$$H = -\frac{d^2 \ln |P_n(x)|}{dx^2} \qquad (3.14)$$
$$\;\;= \frac{1}{(x-x_1)^2} + \frac{1}{(x-x_2)^2} + \ldots + \frac{1}{(x-x_n)^2} \qquad (3.15)$$
$$\;\;= \left(\frac{P_n'(x)}{P_n(x)}\right)^2 - \frac{P_n''(x)}{P_n(x)}. \qquad (3.16)$$
If we make the assumption that the root $x_1$ is located a distance $\alpha$ from our current estimate, $x^{(k)}$, and all other roots are located at a distance $\beta$, then:

$$x^{(k)} - x_1 = \alpha \qquad (3.17)$$
$$x^{(k)} - x_i = \beta \quad \text{for } i = 2, 3, \ldots, n. \qquad (3.18)$$


Equations (3.13) and (3.16) can now be expressed as:


$$G = \frac{1}{\alpha} + \frac{n-1}{\beta} \qquad (3.19)$$
$$H = \frac{1}{\alpha^2} + \frac{n-1}{\beta^2}. \qquad (3.20)$$

Solving for $\alpha$ gives:

$$\alpha = \frac{n}{G \pm \sqrt{(n-1)(nH - G^2)}} \qquad (3.21)$$

where the sign should be taken to yield the largest magnitude for the denominator. The new estimate of the root is obtained from the old using the update:

$$x^{(k+1)} = x^{(k)} - \alpha. \qquad (3.22)$$
In general, α will be complex.
Laguerre's method requires that $P_n\left(x^{(k)}\right)$, $P_n'\left(x^{(k)}\right)$ and $P_n''\left(x^{(k)}\right)$ be computed at each step.
The method is cubic in its convergence for real or complex simple roots. Laguerre’s method will
converge to all types of roots of polynomials, real, complex, single or multiple. It requires complex
arithmetic even when converging to a real root. However, for polynomials with all real roots, it is
guaranteed to converge to a root from any starting point.
After a root, x1 , of the nth order polynomial, Pn (x), has been found, the polynomial should
be divided by the quotient, (x − x1 ), to yield an (n − 1) order polynomial. This process is called
deflation. It saves computation when estimating further roots and ensures that subsequent iterations
do not converge to roots already found.
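A compact sketch of the Laguerre iteration built from Equations (3.19)-(3.22) (an illustration written for these notes, using Horner's method from Section 3.3.2 below and complex arithmetic throughout; the tolerance is arbitrary):

import cmath

def laguerre_root(coeffs, x0, tol=1e-12, max_iter=100):
    # Find one root of P(x) = coeffs[0] + coeffs[1]*x + ... + coeffs[n]*x**n.
    n = len(coeffs) - 1
    x = complex(x0)
    for _ in range(max_iter):
        # evaluate P, P' and P'' at x in a single Horner pass
        p, dp, d2p = coeffs[n], 0.0, 0.0
        for a in reversed(coeffs[:-1]):
            d2p = x * d2p + dp
            dp = x * dp + p
            p = x * p + a
        d2p *= 2.0
        if abs(p) < tol:
            return x
        G = dp / p
        H = G * G - d2p / p
        root = cmath.sqrt((n - 1) * (n * H - G * G))
        denom = G + root if abs(G + root) > abs(G - root) else G - root
        x = x - n / denom                      # x^{(k+1)} = x^{(k)} - alpha
    return x

# the example of Section 3.3.1: P2(x) = x^2 - 4x + 3, starting from x = 0.5
print(laguerre_root([3.0, -4.0, 1.0], 0.5))    # converges to 1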

3.3.1 Example
Following is a simple example that illustrates the use of Laguerre's method. Consider the second order polynomial:

$$P_2(x) = x^2 - 4x + 3, \qquad P_2'(x) = 2x - 4, \qquad P_2''(x) = 2.$$
1. Let the initial estimate of the first root be:
$$x_1^{(0)} = 0.5$$

then

$$G = \frac{2x_1^{(0)} - 4}{\left(x_1^{(0)}\right)^2 - 4x_1^{(0)} + 3} = -2.4$$

$$H = G^2 - \frac{2}{\left(x_1^{(0)}\right)^2 - 4x_1^{(0)} + 3} = 4.16$$

$$\alpha = \frac{2}{G \pm \sqrt{(2-1)(2H - G^2)}} = -0.5 \qquad \text{(choosing $-4$ as the largest denominator)}$$


The first root is then:


$$x^{(1)} = x^{(0)} - \alpha = 0.5 - (-0.5) = 1.$$

2. The second root is found by deflating $P_2$ by the factor $(x - 1)$. The trivial result is that $x_2 = 3$.

3.3.2 Horner’s method for evaluating a polynomial and its derivatives


The polynomial $P_n(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n$ should not be calculated by evaluating the powers of $x$ individually. It is much more efficient to calculate the polynomial using Horner's method by noting that $P_n(x) = a_0 + x\left(a_1 + x\left(a_2 + \ldots + x a_n\right)\right)$. Using this approach the poly-
nomial value and the first and second derivatives may be evaluated efficiently using the following
algorithm:
p ← an ;
p0 ← 0.0;
p00 ← 0.0;
for i in n − 1 . . . 0 loop
p00 ← x ∗ p00 + p0 ;
p0 ← x ∗ p0 + p;
p ← x ∗ p + ai ;
end loop;
p00 ← 2 ∗ p00 ;
If the polynomial and its derivatives are evaluated using the individual powers of $x$, $(n^2+3n)/2$ operations are required for the value, $(n^2+n-2)/2$ for the first derivative and $(n^2-n-2)/2$ for the second derivative, i.e. a total of $(3n^2+3n-4)/2$ operations. Horner's method requires $6n+1$ operations to evaluate the polynomial and both derivatives and thus requires fewer operations when $n > 3$. If just the function evaluation is required, then Horner's method will require fewer operations for all $n$.
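A direct Python transliteration of this algorithm (a sketch; the function name and list-based coefficient storage are our own choices) is:

def horner_eval(coeffs, x):
    # Evaluate P(x), P'(x) and P''(x) in a single pass, where
    # coeffs = [a0, a1, ..., an] gives P(x) = a0 + a1*x + ... + an*x**n.
    p, dp, d2p = coeffs[-1], 0.0, 0.0
    for a in reversed(coeffs[:-1]):
        d2p = x * d2p + dp
        dp = x * dp + p
        p = x * p + a
    return p, dp, 2.0 * d2p

# the example of Section 3.3.3: P3(x) = 1 + 2x - 3x^2 + x^3 evaluated at x = 2
print(horner_eval([1.0, 2.0, -3.0, 1.0], 2.0))   # (1.0, 2.0, 6.0)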

3.3.3 Example
Following is an example of how Horner's method can be used to evaluate

$$P_3(x) = 1 + 2x - 3x^2 + x^3$$

and its derivatives.

1. $P = 1$, $P' = 0$ and $P'' = 0$.
2. $P'' = 0 + 0$, $P' = 0 + 1$ and $P = x - 3$.
3. $P'' = 0 + 1$, $P' = x + x - 3$ and $P = x(x-3) + 2$.
4. $P'' = x + 2x - 3$, $P' = x(2x-3) + x(x-3) + 2$ and $P = x(x(x-3)+2) + 1$.
5. $P'' = 2(3x - 3)$.


3.3.4 Deflation
Dividing a polynomial of order n by a factor (x − x1 ) may be performed using the following
algorithm:
r ← an ;
an ← 0.0;
for i in n − 1 . . . 0 loop
q ← ai ;
ai ← r;
r ← x1 ∗ r + q;
end loop;
The coefficients of the new polynomial are stored in the array ai , and the remainder in r.
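In Python this synthetic division might be sketched as follows (again an illustration rather than the notes' code):

def deflate(coeffs, x1):
    # Divide P(x) = coeffs[0] + ... + coeffs[n]*x**n by (x - x1);
    # return the coefficients of the degree n-1 quotient and the remainder.
    r = coeffs[-1]
    quotient = []
    for a in reversed(coeffs[:-1]):
        quotient.append(r)
        r = x1 * r + a
    quotient.reverse()           # quotient[k] is the coefficient of x**k
    return quotient, r

# deflating P2(x) = x^2 - 4x + 3 by the root x1 = 1 leaves (x - 3)
print(deflate([3.0, -4.0, 1.0], 1.0))   # ([-3.0, 1.0], 0.0)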

3.4 Systems of Nonlinear Equations


Consider a general system of n nonlinear equations in n unknowns:

fi (x1 , x2 , . . . , xn ) = 0, i = 1, 2, . . . , n.

Newton's method can be generalised to $n$ dimensions by examining Taylor's series in $n$ dimensions:

$$f_i\left(\mathbf{x}^{(k+1)}\right) = f_i\left(\mathbf{x}^{(k)}\right) + f'_{ij}\left(\mathbf{x}^{(k)}\right)\left(x_j^{(k+1)} - x_j^{(k)}\right) + \ldots$$

where $f'_{ij}\left(\mathbf{x}^{(k)}\right)$ is an $n \times n$ matrix called the Jacobian, having elements:

$$f'_{ij}(\mathbf{x}) = \frac{\partial f_i(\mathbf{x})}{\partial x_j}. \qquad (3.23)$$
If these gradients are approximately calculated, the method is called a quasi-Newton method. One
way to approximately form the gradients is to calculate fi (x − ∆xj ) and fi (x + ∆xj ) and then
form the finite difference approximation:
$$f'_{ij}(\mathbf{x}) \approx \frac{f_i(\mathbf{x} + \boldsymbol{\Delta x}_j) - f_i(\mathbf{x} - \boldsymbol{\Delta x}_j)}{2\Delta x_j} \qquad (3.24)$$

The vector $\boldsymbol{\Delta x}_j$ is a vector of zeros, except for a small perturbation in the $j$th position.

Setting $f_i\left(\mathbf{x}^{(k+1)}\right) = 0$ gives Newton's formula in $n$ dimensions:

$$f'_{ij}\left(\mathbf{x}^{(k)}\right)\left(x_j^{(k+1)} - x_j^{(k)}\right) = -f_i\left(\mathbf{x}^{(k)}\right) \qquad (3.25)$$

or,

$$f'_{ij}\left(\mathbf{x}^{(k)}\right)\,\delta_j^{(k+1)} = -f_i\left(\mathbf{x}^{(k)}\right) \qquad (3.26)$$


 
where $\delta_j^{(k+1)} = x_j^{(k+1)} - x_j^{(k)}$ is the vector of updates. Equation (3.26) is a set of $n$ linear equations in $n$ unknowns, $\delta_j^{(k+1)}$. If the Jacobian is non-singular, then the system of linear equations can be solved to provide the Newton update:

$$x_j^{(k+1)} = x_j^{(k)} + \delta_j^{(k+1)}. \qquad (3.27)$$

Each step of Newton’s method requires the solution of a set of linear equations. For small n the
set of linear equations may be solved using LU decomposition. For large n, alternative iterative
methods may be required. As with the one-dimensional version, Newton’s method converges
quadratically if the initial estimate, x(0) , is sufficiently close to a root. Newton’s method in multiple
dimensions suffers from the same global convergence problems as its one-dimensional counterpart.
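The sketch below implements the multidimensional iteration with the central-difference Jacobian of Equation (3.24) and numpy's dense solver; the test system, perturbation size h and tolerances are illustrative assumptions, not part of the notes:

import numpy as np

def newton_system(F, x0, tol=1e-8, max_iter=50, h=1e-7):
    # Solve F(x) = 0 for n equations in n unknowns; the Jacobian is
    # approximated by the central differences of Equation (3.24).
    x = np.asarray(x0, dtype=float)
    n = len(x)
    for _ in range(max_iter):
        Fx = np.asarray(F(x), dtype=float)
        J = np.empty((n, n))
        for j in range(n):
            dx = np.zeros(n); dx[j] = h
            J[:, j] = (np.asarray(F(x + dx)) - np.asarray(F(x - dx))) / (2.0 * h)
        delta = np.linalg.solve(J, -Fx)    # Equation (3.26)
        x = x + delta                      # Equation (3.27)
        if np.linalg.norm(delta) < tol:
            break
    return x

# a small test system (not from the notes): x^2 + y^2 = 4 and xy = 1
F = lambda v: [v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0]
print(newton_system(F, [2.0, 0.5]))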

3.4.1 Example 1
Consider finding x and y such that the following are true:

$$f_1(x,y) = x^2 - 2x + y^2 - 2y - 2xy + 1 = 0 \qquad (3.28)$$

$$f_2(x,y) = x^2 + 2x + y^2 + 2y + 2xy + 1 = 0 \qquad (3.29)$$

This is a contrived problem, but it serves to illustrate the application of Newton's method in multiple dimensions. Solving for the simultaneous roots of these equations is equivalent to finding the roots of the single factored equation:

$$f(x,y) = (x+y+1)^2 - (x-y-1)^2 \qquad (3.30)$$

Inspection shows that this equation has the root $(0,-1)$.
The first step in applying Newton’s method is to determine the form of the Jacobian and the
right hand side function vector for calculating the update vector for each iteration. These are:
 
$$\begin{bmatrix} \dfrac{\partial f_1(x^{(k)},y^{(k)})}{\partial x} & \dfrac{\partial f_1(x^{(k)},y^{(k)})}{\partial y} \\[6pt] \dfrac{\partial f_2(x^{(k)},y^{(k)})}{\partial x} & \dfrac{\partial f_2(x^{(k)},y^{(k)})}{\partial y} \end{bmatrix} \begin{bmatrix} \delta_x^{(k+1)} \\ \delta_y^{(k+1)} \end{bmatrix} = \begin{bmatrix} -f_1(x^{(k)},y^{(k)}) \\ -f_2(x^{(k)},y^{(k)}) \end{bmatrix} \qquad (3.31)$$

$$\begin{bmatrix} 2(x-y-1) & 2(y-x-1) \\ 2(x+y+1) & 2(x+y+1) \end{bmatrix} \begin{bmatrix} \delta_x^{(k+1)} \\ \delta_y^{(k+1)} \end{bmatrix} = \begin{bmatrix} -x^2+2x-y^2+2y+2xy-1 \\ -x^2-2x-y^2-2y-2xy-1 \end{bmatrix} \qquad (3.32)$$


The algorithm for solving this problem is:


k ← 0;
Δ ← (some value > ε);
specify starting point: (x(k), y(k));
while Δ > ε and k < N
δ(k+1) ← −(f′)⁻¹ f; (LU factorisation and solution)
x(k+1) ← x(k) + δ(k+1);
Δ ← ‖δ(k+1)‖₂ − ‖δ(k)‖₂;
k ← k + 1;
end loop;

where N is some specified maximum number of iterations and ε is some specified convergence tolerance.
Sequences of Newton steps converging toward the root (0, −1) for the starting points (−1, −3),
(2, −3) and (2, 0) are shown in figures (3.9) and (3.10).

FIGURE 3.9: Sequence of Newton steps plotted on the surface defined by Equation (3.30).

3.4.2 Example 2
Newton’s method for finding roots in multiple dimensions becomes very useful when finding ap-
proximate solutions to non-linear partial differential equations. Consider the diffusion equation in
one-dimension, where the diffusivity, D, is an exponential function of the concentration:
$$\frac{\partial c}{\partial t} - Ae^{c}\frac{\partial^2 c}{\partial x^2} = 0 \qquad (3.33)$$


FIGURE 3.10: Convergent behaviour of Equation (3.30) with Newton iteration, for the starting points (−1,−3), (2,−3) and (2,0), plotted against the number of iterations.

This is a non-linear partial differential equation.


A finite difference discretisation of this equation for the ith node is:
$$f_i = \frac{c_i^{n+1} - c_i^{n}}{\Delta t} - \frac{Ae^{(1-\theta)c_i^{n}+\theta c_i^{n+1}}}{\Delta x^2}\left[(1-\theta)\left(c_{i-1}^{n} - 2c_i^{n} + c_{i+1}^{n}\right) + \theta\left(c_{i-1}^{n+1} - 2c_i^{n+1} + c_{i+1}^{n+1}\right)\right] \qquad (3.34)$$

Notice that this is a non-linear equation whose roots are the unknown concentrations $c_{i-1}^{n+1}$, $c_i^{n+1}$ and $c_{i+1}^{n+1}$. If there are $m$ discrete nodes in our problem, then there are $m$ equations and $m$ unknown concentrations, so Newton's method for multiple dimensions can be applied.

A row in the Jacobian for the internal node $i$ will be:

$$\begin{bmatrix} 0 & \cdots & \dfrac{\partial f_i}{\partial c_{i-1}^{n+1}} & \dfrac{\partial f_i}{\partial c_i^{n+1}} & \dfrac{\partial f_i}{\partial c_{i+1}^{n+1}} & \cdots & 0 \end{bmatrix} \qquad (3.35)$$

or

$$\begin{bmatrix} 0 & \cdots & -\theta\dfrac{Ae^{(1-\theta)c_i^{n}+\theta c_i^{n+1}}}{\Delta x^2} & \;\; \dfrac{1}{\Delta t} - \dfrac{Ae^{(1-\theta)c_i^{n}+\theta c_i^{n+1}}}{\Delta x^2}\Big[\theta(1-\theta)\left(c_{i-1}^{n} - 2c_i^{n} + c_{i+1}^{n}\right) + \theta^2\left(c_{i-1}^{n+1} - 2c_i^{n+1} + c_{i+1}^{n+1}\right) - 2\theta\Big] & \;\; -\theta\dfrac{Ae^{(1-\theta)c_i^{n}+\theta c_i^{n+1}}}{\Delta x^2} & \cdots & 0 \end{bmatrix} \qquad (3.36)$$

The corresponding right hand side vector entry for the Newton update is $-f_i$. Note that in contrast to the case of the linear diffusion equation, the system of discrete finite difference equations is now time varying and must be constructed and factorised for each time step.

If Dirichlet boundary conditions of 1 and 0 are applied at nodes 1 and $n$, respectively, then the functions whose roots are to be found are:

$$f_1 = c_1^{n+1} - 1 \qquad (3.37)$$
$$f_n = c_n^{n+1} \qquad (3.38)$$


The corresponding rows in the Jacobian will be:


 
$$\begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \qquad (3.39)$$
$$\begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix} \qquad (3.40)$$

The right hand side vector entries for the Newton update are $1 - c_1^{n+1}$ and $-c_n^{n+1}$, respectively.
Note that the unknown dependent variables for this non-linear problem are $c^{n+1}$. Let these unknowns be referred to simply as $c$ so that the $k$th Newton update becomes $c^{(k)}$. The algorithm for solving this problem is similar to that of Example 1:

for t in 0 · · · T loop

k ← 0;
Δ ← (some value > ε);
specify starting point: c(k); (This will be the solution from the previous time step)
while Δ > ε and k < N
δ(k+1) ← −(f′)⁻¹ f; (LU factorisation and solution)
c(k+1) ← c(k) + δ(k+1);
Δ ← ‖δ(k+1)‖₂ − ‖δ(k)‖₂;
k ← k + 1;
end loop;

end loop;

Results for a problem of length 10 m with ∆x = 1 m, ∆t = 1 s and θ = 1 are shown in


Figure 3.11. The non-linear solutions were calculated using:

$$D(c) = 0.368e^{c} \qquad (3.41)$$

The coefficient was chosen such that $D = 1\ \mathrm{m^2\,s^{-1}}$ at $c = 1\ \mathrm{kg\,m^{-2}}$. The linear solutions were calculated using $D = 0.632\ \mathrm{m^2\,s^{-1}}$. This value was chosen as the integrated mean value of $D(c)$ from $c = 0\ \mathrm{kg\,m^{-2}}$ to $c = 1\ \mathrm{kg\,m^{-2}}$. The variation of the diffusion coefficient with concentration is shown in Figure 3.12. The non-linear diffusion coefficient is smaller than the constant diffusion coefficient at lower concentrations. This is observed in the solutions, as the concentration profile for the non-linear diffusion does not advance as far as for constant diffusion over the same interval of time.


FIGURE 3.11: Comparative solutions for constant diffusion and non-linear diffusion. (a) Concentration profiles at 5 s, 10 s and 50 s; (b) time history of concentration at x = 5 m.

FIGURE 3.12: Non-linear ($D = 0.368e^{c}$) and constant ($D = 0.632$) diffusion functions plotted against concentration.

Solving this example problem produced the following non-linear iteration information over the
course of the first time step:
Time: 1 second
Iteration: 1 L2Norm(F) = 0.1000E+01 Delta=0.1026E+01
Iteration: 2 L2Norm(F) = 0.5549E-01 Delta=0.9876E+00
Iteration: 3 L2Norm(F) = 0.8958E-03 Delta=0.3755E-01
Iteration: 4 L2Norm(F) = 0.2278E-06 Delta=0.5565E-03
Iteration: 5 L2Norm(F) = 0.1481E-13 Delta=0.1438E-06

Note that F is the right hand side vector, so its magnitude (or L2 norm) may be thought of as an
indication of how good the solution at that iteration is. The solution is accurate when F is very
small. Delta is a measure of how much the solution is changing from one iteration to the next.
The Newton iterations are clearly converging.

Chapter 4

Univariate Minimisation

4.1 The Problem of Univariate Minimisation


This module will consider search procedures to minimise a function, f (x), of one variable, x,
over a closed bounded interval, a ≤ x ≤ b. The function f (x) is called the objective function.
[a, b] is called the interval of uncertainty since the location of the minimum, x∗ , within the interval
is unknown. Search procedures reduce the interval of uncertainty by excluding portions of the
interval that do not contain the minimum.
Optimality of a solution x∗ may be defined by its relationship with neighbouring solutions.
We let x∗ be a feasible solution and then define N (x∗ , δ) to be the set of feasible solutions in a
δ-neighbourhood of x∗ .
The solution x∗ is a strong local minimum if there exists δ > 0 such that:
• f (x) is defined on N (x∗ , δ); and
• f (x∗ ) < f (x) for all x ∈ N (x∗ , δ), x 6= x∗ .
The solution x∗ is a weak local minimum if there exists δ > 0 such that:
• f (x) is defined on N (x∗ , δ); and
• f (x∗ ) ≤ f (x) for all x ∈ N (x∗ , δ), x 6= x∗ ; and
• x∗ is not a strong local minimum.
Thus, x∗ is not a local minimum if every neighbourhood of x∗ contains at least one point with
a strictly lower objective function value. If f (x∗ ) ≤ f (x) for all x ∈ Rn , then x∗ is called a
global minimum . These situations are illustrated in Figure 4.1. This module will consider both
non-derivative and derivative methods for finding x∗ .

4.2 Non-derivative Methods


4.2.1 Definitions
When only function values are available, we need to place conditions on f (x) over [a, b] to ensure
that x∗ may be found.


FIGURE 4.1: Strong local, weak local and global minima of f(x).

A function f (x) is quasiconvex in [a, b] if there exists a unique x∗ ∈ [a, b] such that, given
α, β ∈ [a, b] where α < β:

• if α > x∗ then f (α) < f (β);

• if β < x∗ then f (α) > f (β).

Let a function f (x) be quasiconvex over [a, b]. Let α, β ∈ [a, b] such that α < β.

• If f (α) > f (β) then f (γ) > f (β) for all γ ∈ [a, α).

• If f (α) < f (β) then f (γ) > f (α) for all γ ∈ (β, b].

If f (x) is quasiconvex in [a, b], it is possible to reduce the interval of uncertainty by comparing the
values of f (x) at two interior points.

• If f (α) > f (β) then the new interval of uncertainty is [α, b].

• If f (α) ≤ f (β) then the new interval of uncertainty is [a, β].

The interval of uncertainty is shown in Figure 4.2.

FIGURE 4.2: The new interval of uncertainty is [α, b] when f(α) > f(β) and [a, β] when f(α) ≤ f(β).


4.2.2 Uniform Search


Let the interval of uncertainty be divided into (K − 1) subintervals:
[a, a + δ], [a + δ, a + 2δ], [a + 2δ, a + 3δ], . . . , [a + (K − 2)δ, b]
where b = a + (K − 1)δ. The function is evaluated at the K interval boundaries.
If κ is the boundary with the smallest function value (count from 0) and if f (x) is quasiconvex
in [a, b], then the minimum x∗ lies in the interval [a + (κ − 1)δ, a + (κ + 1)δ]. This is shown in
Figure 4.3.

FIGURE 4.3: The new interval of uncertainty [a + (κ−1)δ, a + (κ+1)δ] around the boundary κ with the smallest function value.

The uniform search method requires that the total number of function evaluations, K, be chosen
a priori. The interval of uncertainty is reduced after K function evaluations to:
$$2\delta = \frac{2(b-a)}{K-1}$$

i.e.

$$K = \frac{b-a}{\delta} + 1.$$
Thus, if we require a small interval of uncertainty, a large number of function evaluations must be
made.

4.2.3 Dichotomous Search


Consider f (x) to be minimised over the interval [a, b]. If f (x) is strictly quasiconvex, at least two
function evaluations are required to reduce the interval of uncertainty. Refer to Figure 4.2.
The length of the new interval of uncertainty is either (β − a) or (b − α). We do not know a
priori whether f (α) > f (β) or f (α) < f (β).
How can the position of α and β be chosen to reduce the new interval of uncertainty as much as
possible? If α and β are placed at the midpoint of [a, b] then the new level of uncertainty is reduced
by 1/2. In this case, however, f (x) would only be evaluated at one point. If:
$$\alpha = \frac{b+a}{2} - \delta \quad\text{and}\quad \beta = \frac{b+a}{2} + \delta,$$

where $\delta > 0$ is small, then the new level of uncertainty will be close to the optimal value of $\dfrac{b-a}{2}$.
After $K$ function evaluations the new level of uncertainty is:

$$\frac{b-a}{2^{K/2}}.$$


FIGURE 4.4: Dichotomous search — the new interval [a_{k+1}, b_{k+1}] for the cases f(α_k) < f(β_k) and f(α_k) > f(β_k).

4.2.4 Fibonacci Search


The Fibonacci search method requires two function evaluations at the first iteration and only one
at subsequent iterations. The points at which the functions are evaluated are chosen so that they
may be reused at subsequent iterations. These points are defined by the Fibonacci numbers {Fk }:

Fk = Fk−1 + Fk−2

where F0 = F1 = 1, i.e. the first few values are: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . ..
Let K be the total number of function evaluations to be made. At iteration k, let the interval of
uncertainty be [ak , bk ] and define:
$$\alpha_k = a_k + \frac{F_{K-k-1}}{F_{K-k+1}}\,(b_k - a_k)$$
$$\beta_k = a_k + \frac{F_{K-k}}{F_{K-k+1}}\,(b_k - a_k).$$

• If f (αk ) > f (βk ) then the new interval of uncertainty is given by [ak+1 , bk+1 ] = [αk , bk ].

• If f (αk ) < f (βk ) then the new interval of uncertainty is given by [ak+1 , bk+1 ] = [ak , βk ].
In both cases, the interval of uncertainty is reduced by the factor

$$\frac{F_{K-k}}{F_{K-k+1}}.$$
Once (K − 2) iterations are complete, the final function evaluation needs to be made. This
will halve the final uncertainty interval, and follows the same idea as in the Dichotomous Search
by performing a final evaluation at a point near the current α, ie: f (α + δ). Based on this final
function evaluation we can choose our final interval to be that spanning the smallest value.
The Fibonacci method requires that the total number of function evaluations, K, be chosen a
priori. After (K − 1) iterations requiring K function evaluations (including the final step), the
level of uncertainty is:
$$\frac{b-a}{F_K}.$$

Example of Fibonacci search for K=3


The original interval of uncertainty is [0,3]. The function is evaluated at x = 1 and x = 2.
• If f (1) > f (2) the new level of uncertainty is [1, 3].


FIGURE 4.5: Fibonacci search with K = 3 on [0, 3]; evaluations at x = 1, x = 2 and finally x = 1 + δ.

• If f (1) < f (2) the new level of uncertainty is [0, 2].

Since the function value at the midpoint of each interval is already known, only one function
evaluation needs to be made at x = 1 + δ, reducing the interval of uncertainty to 1.

4.2.5 Golden Section Search


The golden section search does not require that the total number of function evaluations, K, be
chosen a priori.
At iteration k, let the interval of uncertainty be [ak , bk ]. Let the function be evaluated at two
points αk , βk ∈ [ak , bk ]. The new interval of uncertainty [ak+1 , bk+1 ] is given by [αk , bk ] if f (αk ) >
f (βk ) and [ak , βk ] if f (αk ) < f (βk ). The points αk and βk are selected so that:

• the interval of uncertainty does not depend on the outcome of the kth iteration:

bk − αk = βk − ak

so
bk+1 − ak+1 = τ (bk − ak )
where τ ∈ [0, 1];

• either αk+1 coincides with βk or βk+1 coincides with αk so that at iteration (k + 1), only one
function evaluation is required.

These conditions are satisfied when:


$$\tau = \frac{2}{1+\sqrt{5}} \approx 0.618034. \qquad (4.1)$$
At each iteration, only one function evaluation is required and the interval of uncertainty is reduced by a factor of $\dfrac{2}{1+\sqrt{5}}$. Note that:

$$\lim_{k\to\infty} \frac{F_k}{F_{k+1}} = \frac{2}{1+\sqrt{5}}. \qquad (4.2)$$
Thus, the golden section search may be considered a limiting case of the Fibonacci search. The
golden section search steps are shown in Figure 4.6.
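A sketch of the golden section iteration (an illustration; the convergence tolerance is arbitrary) is:

import math

def golden_section(f, a, b, tol=1e-6):
    # Golden section search for the minimum of a quasiconvex f on [a, b].
    tau = 2.0 / (1.0 + math.sqrt(5.0))          # approximately 0.618034
    alpha = b - tau * (b - a)
    beta = a + tau * (b - a)
    f_alpha, f_beta = f(alpha), f(beta)
    while (b - a) > tol:
        if f_alpha > f_beta:                     # minimum lies in [alpha, b]
            a, alpha, f_alpha = alpha, beta, f_beta
            beta = a + tau * (b - a)
            f_beta = f(beta)
        else:                                    # minimum lies in [a, beta]
            b, beta, f_beta = beta, alpha, f_alpha
            alpha = b - tau * (b - a)
            f_alpha = f(alpha)
    return 0.5 * (a + b)

print(golden_section(lambda x: (x - 2.0)**2, 0.0, 5.0))   # approximately 2.0

Only one of the two interior points has to be re-evaluated at each pass of the loop, because the other is reused from the previous iteration, as required by the conditions listed above.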

4.2.6 Brent’s Method


Brent’s method makes the assumption that the objective function, f (x), is well approximated by a
parabola. If this assumption is valid, a parabola fitted through three points of f (x) should have a
minimum near the minimum of the objective function.


FIGURE 4.6: Golden section search — interior points at 1−τ and τ; the new interval [a_{k+1}, b_{k+1}] is [a_k, β_k] if f(α_k) < f(β_k) and [α_k, b_k] if f(α_k) > f(β_k).

Let the objective function be evaluated at three points, f (â), f (b̂) and f (ĉ). The extremum of
the quadratic approximation is given by:

$$x = \hat{b} - \frac{1}{2}\,\frac{(\hat{b}-\hat{a})^2\left(f(\hat{b})-f(\hat{c})\right) - (\hat{b}-\hat{c})^2\left(f(\hat{b})-f(\hat{a})\right)}{(\hat{b}-\hat{a})\left(f(\hat{b})-f(\hat{c})\right) - (\hat{b}-\hat{c})\left(f(\hat{b})-f(\hat{a})\right)}. \qquad (4.3)$$

At each iteration the new approximation to the minimum, $x$, replaces the abscissa $\hat{a}$, $\hat{b}$ or $\hat{c}$ that has the largest objective function value, i.e.

$$x \text{ replaces } \begin{cases} \hat{a} & \text{if } f(\hat{a}) = \max\left(f(\hat{a}), f(\hat{b}), f(\hat{c})\right) \\ \hat{b} & \text{if } f(\hat{b}) = \max\left(f(\hat{a}), f(\hat{b}), f(\hat{c})\right) \\ \hat{c} & \text{if } f(\hat{c}) = \max\left(f(\hat{a}), f(\hat{b}), f(\hat{c})\right) \end{cases} \qquad (4.4)$$

An example is given in Figure 4.7. The minimum is bracketed between points 1 and 3. A further
point 2 is defined which is at the mid-point of 1 and 3. The minimum of the parabola fitted through
1, 2, and 3 lies at point 4. The largest value of the objective function occurs at point 3 so it is
discarded. A new parabola is fitted through points 1, 4 and 2 with a minimum at point 5.

FIGURE 4.7: Successive parabolic fits in Brent's method; point 4 is the minimum of the parabola through points 1, 2 and 3, and point 5 the minimum of the parabola through points 1, 4 and 2.

Brent's method fails when f (â), f (b̂) and f (ĉ) are collinear. In this case the denominator will
be zero and the new approximation to the minimum will be located infinitely far away. It should
be also noted that the above update will locate the extremum of the quadratic approximation. This
extremum may, in fact, be a maximum rather than a minimum.


Brent’s method is usually combined with a safer method, like golden search, to guard against
the above problems. When Brent’s method predicts an update lying outside some bracket, or
predicts a large step, the scheme switches to the golden search method for that iteration.
A typical ending for Brent's method is that the minimum has been isolated to a fractional precision of approximately $\epsilon$. Effectively this means that $a$ and $b$ are $2\epsilon x$ apart and $x$ is the midpoint of $a$ and $b$, i.e. $x = (a+b)/2$.
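The parabolic step of Equation (4.3) on its own can be sketched as follows (the safeguards against collinear points and large steps described above are omitted for brevity):

def parabolic_step(a, b, c, fa, fb, fc):
    # Abscissa of the extremum of the parabola through (a,fa), (b,fb), (c,fc).
    num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den == 0.0:
        raise ZeroDivisionError("the three points are collinear")
    return b - 0.5 * num / den

# a parabola fitted to f(x) = (x - 2)^2 recovers the minimum exactly
f = lambda x: (x - 2.0)**2
print(parabolic_step(0.0, 1.0, 3.0, f(0.0), f(1.0), f(3.0)))   # 2.0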

4.3 Derivative Methods


A function $f(x)$ is pseudoconvex in $[a,b]$ if for each $\alpha, \beta \in [a,b]$ with $\dfrac{\partial f(\alpha)}{\partial x}(\beta - \alpha) \ge 0$ we must have $f(\alpha) \le f(\beta)$.

If $f(x)$ is pseudoconvex in $[a,b]$, it is possible to reduce the interval of uncertainty by evaluating the derivative, $f'(x)$, at a single interior point, $\alpha$.

• If $f'(\alpha) = 0$ then by pseudoconvexity of $f(x)$, $\alpha$ is a minimum point;

• if $f'(\alpha) > 0$ then the new interval of uncertainty is $[a, \alpha]$;

• if $f'(\alpha) < 0$ then the new interval of uncertainty is $[\alpha, b]$.

4.3.1 Bisection Search Method


Let f (x) be pseudoconvex. At iteration k let the interval of uncertainty be [ak , bk ]. Let:

$$\alpha_k = \frac{a_k + b_k}{2}. \qquad (4.5)$$
• If $f'(\alpha_k) = 0$ then by pseudoconvexity of $f(x)$, $\alpha_k$ is a minimum point;

• if $f'(\alpha_k) > 0$ then the new interval of uncertainty is $[a_k, \alpha_k]$;

• if $f'(\alpha_k) < 0$ then the new interval of uncertainty is $[\alpha_k, b_k]$.

This is shown in Figure 4.8.

FIGURE 4.8: Bisection search for a minimum — the new interval for the cases f′(α_k) > 0 and f′(α_k) < 0.

At each iteration only one derivative evaluation is required and the interval of uncertainty is
reduced by a factor of 1/2.


4.3.2 Newton’s Method


Newton’s method approximates f (x) at a point by a quadratic (a truncation of the appropriate
Taylor series):
$$q(x) = f(x_k) + (x - x_k)f'(x_k) + \frac{(x-x_k)^2}{2}f''(x_k). \qquad (4.6)$$
$x_{k+1}$ is taken as the point where $q'(x) = 0$:

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}. \qquad (4.7)$$

Note that $f(x)$ must be twice differentiable and $f''(x) \ne 0$. Newton's method will converge quadratically provided the starting point, $x_0$, is sufficiently close to a stationary point.

Chapter 5

Numerical Methods for Ordinary


Differential Equations

5.1 Revision
5.1.1 Euler’s Method
Euler’s method1 is the simplest numerical method for solving ODEs. The method only approxi-
mates the first derivative of the function; this is accomplished by using a simple first-order finite
difference approximation of the derivative (or slope).

FIGURE 5.1: Steps in Euler's method.

Recall the finite difference approximation to the derivative is given by

$$\frac{dy}{dx} \approx \frac{\Delta y}{\Delta x} = \frac{y^{k+1} - y^{k}}{h} = f\left(x^k, y^k\right). \qquad (5.1)$$
¹ Leonhard Euler (1707-1783)


This can be rearranged to give


$$y^{k+1} = y^k + h f\left(x^k, y^k\right). \qquad (5.2)$$
As a simple representation of Euler’s method, consider solving the equation
$$\frac{dy}{dx} = (1+xy)^2, \quad \text{with } y(0) = 1$$
using a step length h = 0.1. This yields the following sequence of iterates:

k   x^k   y^k       f(x^k, y^k) = (1 + x^k y^k)^2      y^{k+1} = y^k + h f(x^k, y^k)
0   0     1         (1 + (0)(1))^2 = 1                 1 + 0.1 = 1.1
1   0.1   1.1       (1 + (0.1)(1.1))^2 = 1.2321        1.1 + (0.1)(1.2321) = 1.22321
2   0.2   1.22321   (1 + (0.2)(1.22321))^2 = 1.5491    1.22321 + (0.1)(1.5491) = 1.3781
3   0.3   1.3781    (1 + (0.3)(1.3781))^2 = 1.9978     1.3781 + (0.1)(1.9978) = 1.5779
4   0.4   1.5779    etc.
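The same calculation can be scripted directly from Equation (5.2); the following sketch (not from the notes) reproduces the table above:

def euler(f, x0, y0, h, n_steps):
    # Euler's method: y^{k+1} = y^k + h f(x^k, y^k).
    x, y = x0, y0
    history = [(x, y)]
    for _ in range(n_steps):
        y = y + h * f(x, y)
        x = x + h
        history.append((x, y))
    return history

# dy/dx = (1 + x*y)^2, y(0) = 1, h = 0.1
for x, y in euler(lambda x, y: (1.0 + x * y)**2, 0.0, 1.0, 0.1, 4):
    print(f"x = {x:.1f}, y = {y:.5f}")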

5.1.2 The Improved Euler Method


Euler’s method uses the slope (i.e., the value of dy/dx) at y k to predict y k+1 . If the slope is
changing significantly over the step length, then Euler’s method will give an inaccurate solution.
An improved method can be devised by averaging the slopes at y k and y k+1 .
FIGURE 5.2: Euler prediction for the improved Euler method.

Consider Figure 5.2 and the standard differential equation given by


$$\frac{dy}{dx} = f(x,y), \quad \text{with } y(x_0) = y_0.$$
At point A, the slope is given by $dy/dx = f(x_0, y^0)$. At point B, the slope can be estimated as $dy/dx = f\left(x_0 + h, y^B\right)$, where $y^B = y^0 + hf(x_0, y^0)$ is obtained using Euler's method. Averaging the slopes at point A and point B yields

$$y^1 = y^0 + h \times \text{average slope} = y^0 + \frac{h}{2}\left[f\left(x_0, y^0\right) + f\left(x_0 + h, y^B\right)\right]. \qquad (5.3)$$


The Improved Euler Method uses this approach to approximate the solution at y k+1 such that
$$y^{k+1} = y^k + \frac{h}{2}\left[f\left(x^k, y^k\right) + f\left(x^{k+1},\, y^k + hf\left(x^k, y^k\right)\right)\right]. \qquad (5.4)$$
The Improved Euler Method is an example of a predictor-corrector method. In the first step the
value yB is predicted using an Euler step, and in the second step, this extra slope information is
used to provide a corrected estimate of y k+1 .
Note that the Improved Euler Method is a second-order method, i.e., its truncation error ∼
O (h3 ), but the Improved Euler Method is more computationally expensive than Euler’s Method.
As an example, consider the equation
$$\frac{dy}{dx} = (1+xy)^2, \quad \text{with } y(0) = 1.$$
Solving this equation using the Improved Euler Method with step length h = 0.1 yields the fol-
lowing sequence of iterates:
k   x^k   y^k      f^k      y_E^{k+1}   f_E^{k+1} = (1 + x^{k+1} y_E^{k+1})^2   y^{k+1} = y^k + (h/2)(f^k + f_E^{k+1})
0   0     1        1        1.1         (1 + 0.1(1.1))^2 = 1.2321               1 + (0.1/2)(1 + 1.2321) = 1.1116
1   0.1   1.1116   1.2347   1.2351      1.5551                                  1.2511
2   0.2   1.2511   etc.

given that $f^k = \left(1 + x^k y^k\right)^2$ and $y_E^{k+1} = y^k + hf^k$.

5.2 Stability
5.2.1 Euler’s Method
Euler’s method is only a first-order accurate method and in practice it is generally necessary to use
small step lengths to achieve accurate solutions. For example, consider the equation y 0 = ay. The
solution to this equation is y = eax which will be a decaying exponential for a < 0. Applying
Euler’s method to this equation gives:
$$y^1 = (1+ha)\,y^0$$
$$y^2 = (1+ha)\,y^1 = (1+ha)^2 y^0$$
$$\vdots$$
$$y^k = (1+ha)^k y^0$$
If $|1+ha| > 1$ then $y^k$ will increase in value. But for $a < 0$ we want $y^k$ to decrease. So for $a < 0$ we require $-1 < 1+ha < 1 \;\Rightarrow\; 1+ha > -1 \;\Rightarrow\; ha > -2 \;\Rightarrow\; h < -\tfrac{2}{a}$.

The criterion $h < -\tfrac{2}{a}$ models the decaying behaviour of the exponential when $a < 0$ and is necessary but not sufficient. If $h$ is chosen to be as large as possible but smaller than $-\tfrac{2}{a}$, then $1+ha < 0$, causing $(1+ha)^k$ and thus $y^k$ to take negative values for odd values of $k$. To avoid this oscillatory behaviour we require $0 < 1+ha \;\Rightarrow\; ha > -1 \;\Rightarrow\; h < -\tfrac{1}{a}$. This is a stronger condition on $h$ since $-\tfrac{1}{a} < -\tfrac{2}{a}$. Therefore the criterion for Euler's method to give stable, non-oscillating solutions for this problem is $h < -\tfrac{1}{a}$.


5.2.2 Improved Euler Method


The Improved Euler method is a second-order accurate method and therefore we would expect it to provide more accurate solutions than Euler's method for the same choice of step size. Indeed, by carrying out the same analysis as we did above we find that the stability criterion for the Improved Euler method admits larger step sizes.

Applying the Improved Euler method to the equation $y' = ay$ gives:

$$y^1 = \left(1 + ha + \tfrac{h^2}{2}a^2\right)y^0$$
$$y^2 = \left(1 + ha + \tfrac{h^2}{2}a^2\right)y^1 = \left(1 + ha + \tfrac{h^2}{2}a^2\right)^2 y^0$$
$$\vdots$$
$$y^k = \left(1 + ha + \tfrac{h^2}{2}a^2\right)^k y^0$$

Therefore if $\left|1 + ha + \tfrac{h^2}{2}a^2\right| > 1$ then $y^k$ will increase in value. But if $a < 0$ we want $y^k$ to decrease and not to oscillate in sign. This implies that (for $a < 0$) we require $0 < 1 + ha + \tfrac{h^2}{2}a^2 < 1$. Thus, $1 + ha + \tfrac{h^2}{2}a^2 < 1 \;\Rightarrow\; ha\left(1 + \tfrac{h}{2}a\right) < 0 \;\Rightarrow\; h < -\tfrac{2}{a}$. Similarly, $1 + ha + \tfrac{h^2}{2}a^2 > 0 \;\Rightarrow\; 2 + 2ha + h^2a^2 > 0 \;\Rightarrow\; (1+ha)^2 > -1$, which is always true since the left-hand side is non-negative and the right-hand side is negative for any choice of $h$. Therefore the criterion for the Improved Euler method to give stable, non-oscillating solutions for this problem is $h < -\tfrac{2}{a}$.

5.3 Systems of ODEs


5.3.1 Using Euler’s Method to Solve Systems of ODEs
Euler's method can be extended to allow the solution of systems of ODEs of the form $d\mathbf{y}/dx = \mathbf{f}(x, \mathbf{y})$ with initial values $\mathbf{y}(x_0) = \mathbf{y}^0$. This extension is important as it allows Euler's method to
be extended to higher-order ODEs as almost any higher-order ODE can be written as a system of
first-order ODEs.
Euler’s method in system form is

$$\mathbf{y}^{k+1} = \mathbf{y}^k + h\,\mathbf{f}^k. \qquad (5.5)$$

For example, if there are two variables in the system form of the ODE, y1 and y2 , i.e.,
$$\frac{dy_1}{dx} = f_1(x, y_1, y_2) \qquad\qquad \frac{dy_2}{dx} = f_2(x, y_1, y_2), \qquad (5.6)$$

then Euler's method becomes

$$y_{1E}^{k+1} = y_1^k + hf_1\left(x^k, y_1^k, y_2^k\right) \qquad\qquad y_{2E}^{k+1} = y_2^k + hf_2\left(x^k, y_1^k, y_2^k\right) \qquad (5.7)$$


5.3.2 Using the Improved Euler Method to Solve Systems of ODEs


Improved Euler’s method in system form is
$$\mathbf{y}^{k+1} = \mathbf{y}^k + \frac{h}{2}\left(\mathbf{f}^k + \mathbf{f}_E^{k+1}\right), \qquad (5.8)$$

where $\mathbf{f}_E^{k+1}$ is calculated using the predicted values from Euler's method. For example if there are two variables in the system, then

$$f_{1E}^{k+1} = f_1\left(x^{k+1}, y_{1E}^{k+1}, y_{2E}^{k+1}\right) \qquad\qquad f_{2E}^{k+1} = f_2\left(x^{k+1}, y_{1E}^{k+1}, y_{2E}^{k+1}\right), \qquad (5.9)$$

where $y_{1E}^{k+1}$ and $y_{2E}^{k+1}$ are calculated from Equation (5.7). Once these values are calculated then

$$y_1^{k+1} = y_1^k + \frac{h}{2}\left(f_1^k + f_{1E}^{k+1}\right) \qquad\qquad y_2^{k+1} = y_2^k + \frac{h}{2}\left(f_2^k + f_{2E}^{k+1}\right). \qquad (5.10)$$

5.4 Runge-Kutta Methods


The Improved Euler method is the first member of a class of single-step methods called Runge-
Kutta methods2 . Runge-Kutta methods use several intermediate points and special slope averaging
schemes to determine y k+1 i.e., y k+1 = y k + α0 hf0 + α1 hf1 + α2 hf2 + . . . where f1 , f2, . . . are
slopes evaluated at intermediate points and α1 , α2 , . . . are constants.
Using dy/dx = f (x, y) and the chain rule, we can show that

$$\frac{d^2y}{dx^2} = \frac{Df}{Dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}f. \qquad (5.11)$$
Using Taylor’s series
$$y(x+h) = y + h\frac{dy}{dx} + \frac{h^2}{2!}\frac{d^2y}{dx^2} + O\left(h^3\right) \qquad (5.12)$$
and substituting in Equation (5.11) we get

$$y(x+h) = y + hf_0 + \frac{h^2}{2!}\left(\left.\frac{\partial f}{\partial x}\right|_0 + f_0\left.\frac{\partial f}{\partial y}\right|_0\right) + O\left(h^3\right). \qquad (5.13)$$

5.4.1 Second-order Runge-Kutta Scheme


A second-order Runge-Kutta scheme will be of the form

y (x + h) = y + α0 hf0 + α1 hf1 (5.14)


² Carl Runge (1856-1927), M.W. Kutta (1867-1944)


where $f_0 = f(x,y)$ and $f_1 = f(x + \beta h, y + \gamma h f_0)$. To define the scheme we need to determine the constants $\alpha_0$, $\alpha_1$, $\beta$ and $\gamma$. Expanding $f_1$ about $(x,y)$ using Taylor's series yields

$$f_1 = f_0 + \beta h\left.\frac{\partial f}{\partial x}\right|_0 + \gamma h f_0\left.\frac{\partial f}{\partial y}\right|_0 + O\left(h^2\right) \qquad (5.15)$$

Substituting Equation (5.15) into Equation (5.14) gives

$$y(x+h) = y + \alpha_0 hf_0 + \alpha_1 h\left(f_0 + \beta h\left.\frac{\partial f}{\partial x}\right|_0 + \gamma h f_0\left.\frac{\partial f}{\partial y}\right|_0\right) + O\left(h^3\right)$$
$$\phantom{y(x+h)} = y + h(\alpha_0 + \alpha_1)f_0 + h^2\left(\alpha_1\beta\left.\frac{\partial f}{\partial x}\right|_0 + \alpha_1\gamma f_0\left.\frac{\partial f}{\partial y}\right|_0\right) + O\left(h^3\right). \qquad (5.16)$$

Comparing this equation with the Taylor's series expansion Equation (5.13) implies that

$$\alpha_0 + \alpha_1 = 1, \qquad \alpha_1\beta = \frac{1}{2}, \qquad \alpha_1\gamma = \frac{1}{2}. \qquad (5.17)$$
This system consists of three equations with four unknowns. Yet, if we specify one of the parameters, the other three can be determined. For example, if we choose $\beta = 1$, then $\alpha_0 = \alpha_1 = \frac{1}{2}$ and $\gamma = 1$. Therefore Equation (5.14) becomes $y(x+h) = y + \frac{h}{2}(f_0 + f_1)$ where $f_0 = f(x,y)$ and $f_1 = f(x+h,\, y+hf_0)$. This is the formulation for the Improved Euler method discussed earlier. Therefore the Improved Euler is a second-order Runge-Kutta scheme and will therefore have truncation error $\sim O(h^3)$.

Other second-order Runge-Kutta schemes can be derived by prescribing different parameter values. For example, choosing $\alpha_1 = 1$ gives $\alpha_0 = 0$, $\beta = \frac{1}{2}$ and $\gamma = \frac{1}{2}$, leading to the scheme defined by

$$y(x+h) = y + hf\left(x + \tfrac{1}{2}h,\; y + \tfrac{1}{2}hf_0\right).$$

5.4.2 Higher-order Runge-Kutta schemes


A similar procedure can be used to derive higher-order Runge-Kutta schemes. For example, a
third-order Runge-Kutta scheme is given by

$$\begin{aligned} f_0 &= f\left(x^k, y^k\right) \\ f_1 &= f\left(x^k + \tfrac{h}{2},\; y^k + \tfrac{h}{2}f_0\right) \\ f_2 &= f\left(x^k + h,\; y^k - hf_0 + 2hf_1\right) \\ y^{k+1} &= y^k + \frac{h}{6}\left(f_0 + 4f_1 + f_2\right), \end{aligned} \qquad (5.18)$$


which has a truncation error $\sim O\left(h^4\right)$. Similarly, a fourth-order Runge-Kutta scheme is

$$\begin{aligned} f_0 &= f\left(x^k, y^k\right) \\ f_1 &= f\left(x^k + \tfrac{h}{2},\; y^k + \tfrac{h}{2}f_0\right) \\ f_2 &= f\left(x^k + \tfrac{h}{2},\; y^k + \tfrac{h}{2}f_1\right) \\ f_3 &= f\left(x^k + h,\; y^k + hf_2\right) \\ y^{k+1} &= y^k + \frac{h}{6}\left(f_0 + 2f_1 + 2f_2 + f_3\right), \end{aligned} \qquad (5.19)$$

with a truncation error $\sim O\left(h^5\right)$.
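A single step of this scheme translates directly into code; the sketch below (an illustration, not the notes' code) applies it to the example equation used earlier in this chapter:

def rk4_step(f, x, y, h):
    # One step of the classical fourth-order Runge-Kutta scheme (5.19).
    f0 = f(x, y)
    f1 = f(x + 0.5 * h, y + 0.5 * h * f0)
    f2 = f(x + 0.5 * h, y + 0.5 * h * f1)
    f3 = f(x + h, y + h * f2)
    return y + (h / 6.0) * (f0 + 2.0 * f1 + 2.0 * f2 + f3)

# dy/dx = (1 + x*y)^2 with y(0) = 1, stepped with h = 0.1
x, y, h = 0.0, 1.0, 0.1
for _ in range(4):
    y = rk4_step(lambda x, y: (1.0 + x * y)**2, x, y, h)
    x += h
print(x, y)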

5.5 Error estimation for fourth-order Runge-Kutta


Each of these methods matches the Taylor’s expansion to some order. The omission of higher-
order Taylor’s series terms in the approximation means that each method will have an associated
truncation error. If y (x0 + h) is the exact value at x0 + h then, by Taylor’s series3 :

h2 00 0  h3 000 0  h4 0000 0  h5 00000 0 


y x0 + h = y x0 + hy 0 x0 +
  
y x + y x + y x + y x + ...
(2!) (3!) (4!) (5!)
= fourth-order Runge-Kutta Method estimate + O h5


Therefore the error in one step is $O(h^5)$, implying that the truncation error involves terms which are proportional to $h^5$ or higher powers.

If we halve the step length then the error per step is proportional to $(h/2)^5 = h^5/32$, but to get to the same point we must do twice as many steps, which means the error at the final point is $E \propto 2(h/2)^5 = h^5/16$. Therefore by halving the step length we reduce the error by a factor of 16. Similarly, if we double the step length we increase the error by a factor of 16. This is why the fourth-order Runge-Kutta method is termed a fourth-order method. In general if a method is said to be $n$-th order, it will have a truncation error of $O(h^{n+1})$.

5.6 Adaptive Step-Size Control


When the function is changing rapidly it is necessary to take very small steps to maintain solution
accuracy. But if the function is smooth we don’t want to be taking small steps because it becomes
expensive. Therefore we want an adaptive method which will take small steps if necessary but
will step rapidly through smooth regions. To achieve this we need a method which will provide a
measure of the truncation error at each step and will feed this back to the routine to estimate the
next step length.
³ Brook Taylor (1685-1731)


Two ways of incorporating adaptive step-size control into single-step methods like those above
are step-doubling (also called step-halving) and embedded Runge-Kutta methods. Step-doubling
involves taking each step twice, once as a full step and then as two half steps. The difference in the
two results gives an estimate of the truncation error. Embedded Runge-Kutta methods estimate the
truncation error to be the difference between two predictions using different-order Runge-Kutta
methods. While step-doubling is simple to implement, embedded Runge-Kutta methods are more
commonly used because they are more efficient.
Although it requires more effort at each step the use of an adaptive scheme can lead to dramatic
improvements in both computational efficiency and accuracy. Clearly an adaptive scheme will
lead to improved accuracy as inaccurate steps will be rejected. The computational efficiency will
improve in cases where the function is rapidly changing in some areas and is smooth in others.
Using an adaptive scheme accuracy can be maintained in the rapidly changing region by using
small steps. In the smooth region an adaptive scheme will increase the step length so that these
regions can be traversed rapidly, leading to improved computational efficiency.

5.6.1 Step-doubling
Step-doubling involves taking each step twice, once as a full step and then as two half steps. The
difference in the two results gives an estimate of the truncation error.
Consider implementing step-doubling for a fourth-order Runge-Kutta scheme (error ∼ O (h5 )).
The calculations required are shown in Figure 5.3.

[Figure: one big step of length 2h compared with two small steps of length h along the x axis.]
FIGURE 5.3: Step doubling in a 4th order Runge-Kutta method. Points where the derivative is evaluated are shown as filled circles. The open circle represents the same derivatives as the filled circle immediately above it.

We can see in Figure 5.3 that doing each step twice requires a total of 11 evaluations, compared with 8 evaluations from just doing the two small steps. The adaptive method is therefore 11/8 = 1.375 times more expensive per step. So what justifies this extra computational expense?
Let y(x + 2h) be the exact solution when advancing from x to x + 2h, let y_1 be the approximate solution determined by taking one big step of 2h, and let y_2 be the approximate solution determined by taking two smaller steps of length h. Then using a big step we have

y(x + 2h) = y_1 + (2h)^5 φ + O(h^6)                                   (5.20)

and using two smaller steps we have

y(x + 2h) = y_2 + 2 h^5 φ + O(h^6),                                   (5.21)


where, to order h^5, φ is a number which remains constant over the step and, from Taylor's series, will be of order of magnitude (1/5!) d^5y/dx^5.
The advantage of using step-doubling is that we can use ∆ = y_2 − y_1 as an indicator of the truncation error. ∆ can be used to control the error and hence the step size. If step h_1 gives the error estimate ∆_1 and the desired accuracy is ∆_0, then the step length required to achieve the desired accuracy can be estimated as

h_0 = h_1 |∆_0/∆_1|^{1/5}

(the exponent arises because the error is proportional to the fifth power of h). Therefore:

If |∆_1| > |∆_0| then calculate h_0 = h_1 |∆_0/∆_1|^{1/5} and redo the time-step with this smaller step length h_0.

If |∆_1| < |∆_0| then use h_0 = h_1 |∆_0/∆_1|^{1/5} as a larger step length for the next step.
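A sketch of step-doubling control, built on the rk4_step sketch given earlier; the 0.9 safety factor, the floor on |∆| and the recursive retry are illustrative choices, not part of the notes.

def step_doubling_step(f, x, y, h, tol):
    """Take one adaptive step: compare a full step of h with two half steps of h/2."""
    y_big = rk4_step(f, x, y, h)                        # one big step
    y_mid = rk4_step(f, x, y, h / 2)                    # first of the two small steps
    y_small = rk4_step(f, x + h / 2, y_mid, h / 2)      # second small step
    delta = y_small - y_big                             # error indicator, Delta = y2 - y1
    # Step length predicted by h0 = h1 |Delta0/Delta1|^(1/5), with a safety factor
    h_new = 0.9 * h * (tol / max(abs(delta), 1e-30)) ** 0.2
    if abs(delta) > tol:
        return step_doubling_step(f, x, y, h_new, tol)  # redo the step with the smaller h
    return x + h, y_small, h_new                        # accept, and suggest h_new next time

In practice the recursion would be replaced by a loop with a bound on the number of retries.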

5.6.2 Embedded Runge-Kutta methods


Embedded Runge-Kutta methods estimate the truncation error to be the difference between two
predictions using different-order Runge-Kutta methods. As was the case with step-doubling, the
idea of using two predictions might seem too computationally expensive.
For example, using second- (two evaluations) and third-order (three evaluations) Runge-Kutta
methods in tandem amounts to a total of five function evaluations per step. However, by deriving a
third-order method that employs most of the same function evaluations as the second-order method,
an estimate of the truncation error can be determined using only three function evaluations. Such
a second- and third-order embedded Runge-Kutta method is

f 0 = f xk , y k

 
k h k h
f1 = f x + , y + f0
2 2
 
k 3h k 3h (5.22)
f2 = f x + , y + f1
4 4

h
y k+1 = y k + (2f0 + 3f1 + 4f2 ) .
9
The error is estimated as

∆ = (h/72)(−5 f_0 + 6 f_1 + 8 f_2 − 9 f_3)                            (5.23)

where f_3 = f(x^{k+1}, y^{k+1}). Although there appear to be four function evaluations, there are really
only three because after the first step, the f0 for the present step will be the f3 from the previous
step.
Similarly, using fourth- and fifth-order Runge-Kutta methods in tandem amounts to a total of
10 function evaluations per step. However, by deriving a fifth-order method that employs most of
the same function evaluations as the fourth-order method, an estimate of the truncation error can
be determined using only six function evaluations.
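A sketch of the embedded second/third-order pair (5.22)–(5.23) for a scalar ODE; reusing f_3 as the next step's f_0, as described above, is omitted here to keep the step function stateless.

def embedded_rk23_step(f, x, y, h):
    """One step of the embedded scheme (5.22), returning the update and the error estimate (5.23)."""
    f0 = f(x, y)
    f1 = f(x + h / 2, y + h / 2 * f0)
    f2 = f(x + 3 * h / 4, y + 3 * h / 4 * f1)
    y_new = y + h / 9 * (2 * f0 + 3 * f1 + 4 * f2)          # third-order update
    f3 = f(x + h, y_new)
    delta = h / 72 * (-5 * f0 + 6 * f1 + 8 * f2 - 9 * f3)   # truncation error estimate
    return y_new, delta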

Chapter 6

Eigenproblems

6.1 Eigenvalues and Eigenvectors


A linear transformation Ax = b will (for a given matrix A) transform a given vector x, by stretching and rotating it, into the vector b. For example, consider the matrix

A = [1 4; 2 3]

Now let us choose some arbitrary vector x, say x = [2, 1]^T; then

Ax = [1 4; 2 3][2, 1]^T = [6, 7]^T

This is represented graphically in Figure 6.1.

FIGURE 6.1: [the vector x and the transformed vector Ax]
 
Now let us try x = [1, 1]^T; then

Ax = [1 4; 2 3][1, 1]^T = [5, 5]^T = 5 [1, 1]^T = 5x

This example is shown graphically in Figure 6.2.


FIGURE 6.2: [the vector x and the transformed vector Ax = 5x]

 
As a third example, try x = [−2, 1]^T; then

Ax = [1 4; 2 3][−2, 1]^T = [2, −1]^T = −1 [−2, 1]^T = −1x

The transformation of the last two x vectors produced a vector b that was in the same direction (give or take a negative sign) as the original x vector. In other words, the x direction vector is scaled by some amount, λ, to give the b vector, i.e. b = Ax = λx. These special x directions are called eigenvectors, and the scale factor λ is called an eigenvalue. These eigenvectors and eigenvalues are properties of the matrix A. For the above example the matrix A = [1 4; 2 3] has an eigenvector x = [1, 1]^T with eigenvalue λ = 5 and another eigenvector x = [−2, 1]^T with eigenvalue λ = −1.

6.1.1 Finding eigenvectors and eigenvalues


To find eigenvectors and eigenvalues of A we need to find vectors x and scalars λ such that

Ax = λx                                                               (6.1)

or

(A − λI) x = 0,

that is

[1−λ 4; 2 3−λ][x_1, x_2]^T = [0, 0]^T

This has the trivial solution x = 0, or a nontrivial solution if det[A − λI] = 0, that is

det[1−λ 4; 2 3−λ] = (1 − λ)(3 − λ) − 2 × 4 = 0

Therefore

3 − λ − 3λ + λ^2 − 8 = 0
λ^2 − 4λ − 5 = 0


This has the solution

λ = (4 ± √(16 + 20))/2 = 2 ± 3 = 5, −1

which are the eigenvalues. To find the eigenvectors, when λ = 5:

[1−5 4; 2 3−5][x_1, x_2]^T = [0, 0]^T    or    [−4 4; 2 −2][x_1, x_2]^T = [0, 0]^T

The first equation gives −4x_1 + 4x_2 = 0 or x_1 = x_2. The second equation gives the same result, 2x_1 − 2x_2 = 0 or x_1 = x_2. Note that the determinant is zero, as required. Thus the eigenvector is [1, 1]^T or [k, k]^T — only the direction matters, not the magnitude. Similarly, when λ = −1,

[1−(−1) 4; 2 3−(−1)][x_1, x_2]^T = [0, 0]^T    or    [2 4; 2 4][x_1, x_2]^T = [0, 0]^T

which gives x_1 = −2x_2. Thus the eigenvector is [−2, 1]^T.

Now let s_1 = [1, 1]^T and s_2 = [−2, 1]^T be the eigenvectors; then we can write

As_1 = λ_1 s_1,  As_2 = λ_2 s_2    ⇒    A[s_1 s_2] = [s_1 s_2][λ_1 0; 0 λ_2]

or

AS = SΛ

where

Λ = [λ_1 0; 0 λ_2]

and

S = [s_1 s_2],

i.e., the ith column of the matrix S is the ith eigenvector. Hence we have

S^{−1} A S = Λ                                                        (6.2)

With the above example,

S = [1 −2; 1 1]    and    S^{−1} = [1/3 2/3; −1/3 1/3]

and therefore

S^{−1} A S = [1/3 2/3; −1/3 1/3][1 4; 2 3][1 −2; 1 1] = [1/3 2/3; −1/3 1/3][5 2; 5 −1] = [5 0; 0 −1] = Λ.
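The hand calculation above can be checked numerically; the short sketch below uses numpy (the eigenvalues may be returned in a different order, and the eigenvector columns may be scaled differently, from the hand-derived S).

import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 3.0]])
lam, S = np.linalg.eig(A)            # eigenvalues and eigenvector columns of A
print(lam)                           # 5 and -1 (in some order)
print(np.linalg.inv(S) @ A @ S)      # the diagonal matrix Lambda, up to round-off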


6.2 Power method


The above method of finding eigenvalues and eigenvectors works well for matrices up to about
3 × 3, but it becomes unsuitable for larger matrices. The power method (also called iterated
multiplication or Stodola method) is used to find the eigenvalue with largest absolute magnitude
(often we only need the largest or smallest eigenvalue) and then a process of “shifting” is used to
find the others.
Consider an arbitrary vector y expressed in terms of a basis of unknown eigenvectors of A (x_1, x_2, …, x_n) — assuming the eigenvectors span the n-dimensional space we can write

y = c_1 x_1 + c_2 x_2 + ⋯ + c_n x_n

Now premultiply by A. This is shown graphically in Figure 6.3.

FIGURE 6.3: [the vectors y and Ay]

Now we can use Ax_i = λ_i x_i to get

Ay = c_1 Ax_1 + c_2 Ax_2 + ⋯ + c_n Ax_n
   = c_1 λ_1 x_1 + c_2 λ_2 x_2 + ⋯ + c_n λ_n x_n

If we assume that λ_1 is the eigenvalue with the largest absolute value we can obtain

Ay = λ_1 [ c_1 x_1 + c_2 (λ_2/λ_1) x_2 + ⋯ + c_n (λ_n/λ_1) x_n ]

This result can be premultiplied by A to give:

A^2 y = λ_1^2 [ c_1 x_1 + c_2 (λ_2/λ_1)^2 x_2 + ⋯ + c_n (λ_n/λ_1)^2 x_n ].

After m premultiplications:

A^m y = λ_1^m [ c_1 x_1 + c_2 (λ_2/λ_1)^m x_2 + ⋯ + c_n (λ_n/λ_1)^m x_n ],

where each ratio (λ_i/λ_1)^m → 0 for i ≥ 2. As m → ∞, the other terms drop out, and the vector A^m y points in the direction of x_1. The graphical representation of this is shown in Figure 6.4.


FIGURE 6.4: [y, Ay and A^2 y rotating towards the direction of the dominant eigenvector x_1]

6.2.1 An example of the power method


   
To determine the eigenvalues of A = [5 −2 0; −2 3 −1; 0 −1 1], begin with y = [1, 0, 0]^T and calculate:

y^1 = Ay = [5 −2 0; −2 3 −1; 0 −1 1][1, 0, 0]^T = [5, −2, 0]^T.

Now normalise y^1, i.e. divide by √(5^2 + (−2)^2 + 0^2) = √29, to give ŷ^1 = [0.928, −0.371, 0.000]^T. Iterating on this process we have:
      
ŷ^1 = [0.928, −0.371, 0.000]^T    ∴ y^2 = Aŷ^1 = [5.382, −2.969, 0.371]^T
ŷ^2 = [0.874, −0.482, 0.060]^T    ⇒ y^3 = [5.334, −3.254, 0.542]^T
ŷ^3 = [0.850, −0.519, 0.086]^T    ⇒ y^4 = [5.288, −3.343, 0.705]^T
ŷ^4 = [0.840, −0.531, 0.112]^T    ⇒ y^5 = [5.262, −3.385, 0.643]^T
ŷ^5 = [0.837, −0.538, 0.102]^T    ⇒ y^6 = [5.261, −3.390, 0.640]^T

This has converged to the first significant figure. The eigenvalue is given by the normalising factor, i.e. √(5.261^2 + (−3.390)^2 + 0.640^2) = 6.291, and thus:

Aŷ^m = [5.261, −3.390, 0.640]^T = 6.291 [0.836, −0.539, 0.102]^T = λ_1 ŷ^m.


 
Therefore the eigenvalue is λ_1 = 6.291, with eigenvector s = [0.836, −0.539, 0.102]^T. If λ_2 is the second largest eigenvalue, then the speed of convergence depends on how quickly (λ_2/λ_1)^m → 0, i.e. if λ_2 is close to λ_1 convergence is slow, and the method fails if λ_2 = λ_1.
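A Python sketch of the iteration used in this example; the Rayleigh quotient is used here both for the convergence test and to recover the sign of the eigenvalue, and the tolerance and iteration cap are illustrative choices.

import numpy as np

def power_method(A, y0, tol=1e-6, max_iter=200):
    """Estimate the eigenvalue of A with largest absolute value and its eigenvector."""
    y_hat = y0 / np.linalg.norm(y0)
    lam_old = np.inf
    lam = 0.0
    for _ in range(max_iter):
        y = A @ y_hat                       # premultiply by A
        y_hat = y / np.linalg.norm(y)       # normalise
        lam = y_hat @ A @ y_hat             # Rayleigh quotient (signed eigenvalue estimate)
        if abs(lam - lam_old) < tol:
            break
        lam_old = lam
    return lam, y_hat

A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 3.0, -1.0],
              [0.0, -1.0, 1.0]])
lam1, x1 = power_method(A, np.array([1.0, 0.0, 0.0]))
print(lam1, x1)    # approximately 6.29 and [0.836, -0.539, 0.102]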

6.2.2 Shifting
The eigenvalues and eigenvectors of A satisfy Equation (6.1). We can rewrite this equation by
shifting the eigenvalues of A by an arbitrary amount p, i.e.

Ax = λx = (λ − p) x + px

or
(A − pI) x = (λ − p) x.
Thus,
Bx = µx
where
B = A − pI and µ = λ − p
i.e. the eigenvectors of B are the same as A, but the eigenvalues are shifted by p.
If we choose p = λ_1, where λ_1 has been determined from the method shown above, we will shift the largest eigenvalue of A to zero and bring the smallest eigenvalue of A into prominence as the eigenvalue of B with the largest absolute value. Consider now applying the power method to the B matrix with p = λ_1 = 6.291.
   
B = [5−p −2 0; −2 3−p −1; 0 −1 1−p] = [−1.291 −2 0; −2 −3.291 −1; 0 −1 −5.291]

The eigenvector for the µ of largest absolute value is the same as that for the smallest λ (but not the λ of smallest absolute value), and λ_smallest = µ_largest + p.
Start with y = [0, 0.5, 1]^T and calculate:

y^1 = By = [−1.291 −2 0; −2 −3.291 −1; 0 −1 −5.291][0, 0.5, 1]^T = [−1, −2.646, −5.791]^T.
 
Normalise y^1, i.e. divide by 6.445, to give ŷ^1 = [−0.155, −0.411, −0.899]^T. Iterating on this process as per the


unshifted case gives:

ŷ^1 = [−0.155, −0.411, −0.899]^T    ∴ Bŷ^1 = y^2 = [1.021, 2.560, 5.165]^T,    ‖y^2‖_2 = 5.854
ŷ^2 = [0.175, 0.437, 0.882]^T       ⇒ y^3 = [−1.100, −2.620, −5.105]^T,        ‖y^3‖_2 = 5.865
ŷ^3 = [−0.188, −0.455, −0.870]^T    ⇒ y^4 = [1.153, 2.744, 5.061]^T,           ‖y^4‖_2 = 5.871
ŷ^4 = [0.196, 0.468, 0.862]^T       ⇒ y^5 = [−1.188, −2.793, −5.028]^T,        ‖y^5‖_2 = 5.873
ŷ^5 = [−0.202, −0.476, −0.856]^T    ⇒ y^6 = [1.212, 2.826, 5.005]^T,           ‖y^6‖_2 = 5.874.

Following further iterations:

ŷ^m = [−0.215, −0.493, −0.843]^T    ⇒ y^m = [1.263, 2.894, 4.955]^T,           ‖y^m‖_2 = 5.875.
This has converged to the first three significant figures. The normalising factor is 5.875, and

B ŷ^m = [1.263, 2.894, 4.955]^T = −5.875 [−0.215, −0.493, −0.843]^T = µ_1 ŷ^m.

The µ of largest absolute value is therefore µ_1 = −5.875. Notice that the alternating sign of y is due to the fact that the eigenvalue is negative. This gives the smallest λ value as λ_3 = µ_1 + 6.291 = 0.416. If we substitute this eigenvalue and eigenvector into the initial equation involving A, we find that

A y^m = [5 −2 0; −2 3 −1; 0 −1 1][1.263, 2.894, 4.955]^T = [0.525, 1.204, 2.061]^T = 0.416 [1.263, 2.894, 4.955]^T = λ_3 y^m,

as desired.
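Assuming the power_method sketch and the matrix A defined in Section 6.2.1, the shifted calculation above can be reproduced in a few lines.

p = 6.291                                  # dominant eigenvalue of A found earlier
B = A - p * np.eye(3)                      # shift the spectrum by p
mu, x = power_method(B, np.array([0.0, 0.5, 1.0]))
print(mu + p, x)                           # approximately 0.416 and the corresponding eigenvector (up to sign)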

6.2.3 Deflation
We can find other eigenvalues by “removing” the highest eigenvalue and corresponding eigenvector
from A and reapplying the power method. One method, called deflation, for removing the highest
eigenvalue produces a new matrix with the same eigenvalues as the original matrix except for λ1
which is now zero. Hence λ2 will become the largest eigenvalue which the power method will find.
This new matrix, B, is found by

B = A − λ_1 x_1 x_1^T

where λ_1 is the highest eigenvalue and x_1 is the corresponding eigenvector of unit length. Note that x_1 x_1^T is the outer (matrix) product of the two vectors.
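A sketch of one deflation step, reusing the power_method function and matrix A from the earlier sketch; note that this simple form removes λ_1 cleanly when the eigenvectors are orthogonal, which holds for the symmetric matrices used in these examples.

lam1, x1 = power_method(A, np.array([1.0, 0.0, 0.0]))   # dominant eigenvalue and unit eigenvector
B = A - lam1 * np.outer(x1, x1)                         # deflate: lambda_1 is replaced by zero
lam2, x2 = power_method(B, np.array([0.0, 1.0, 0.0]))
print(lam2)                                             # next-largest eigenvalue of A (about 2.29)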


6.3 Inverse Iteration


The inverse iteration method is usually used in conjunction with some other method which finds
(approximate) eigenvalues. Given these approximate eigenvalues, this method will iterate to improve the eigenvalues and compute the associated eigenvectors.
Consider the linear system of equations

(A − τ I) y = b

where b is some random vector, y is the solution to this linear system, and τ is close to some
eigenvalue λ of A. Then y will be close to the eigenvector corresponding to λ. We can iterate
on this method by replacing b by y and solving for a new y, which will be even closer to the true
eigenvector x.
Suppose that at iteration i, we have some approximate value of the eigenvalue λi and some
approximate eigenvector xi . Then we are solving the equation (by the LU method or similar)

(A − λi I) y = xi (6.3)

Now the vector y in Equation (6.3) is a better approximation to x than x_i. We can therefore improve our guess at the eigenvector by setting x_{i+1} to the normalised value of y, i.e.

x_{i+1} = y / ‖y‖_2                                                   (6.4)

To improve our guess of the eigenvalue we note that the exact eigenvalue and eigenvector will
solve
Ax = λx
and hence
(A − λi I) x = (λ − λi ) x. (6.5)
Substituting our new guess at the eigenvector, y, into Equation (6.5) we get

(A − λi I) y = (λ − λi ) y

From Equation (6.3) we can replace the left hand side of the above equation by xi , i.e.

xi = (λ − λi ) y

If we take the dot product of each side with xi and replace λ by our new guess λi+1 we obtain

xi · xi = (λi+1 − λi ) xi · y.

Recall that x_i · x_i = ‖x_i‖_2^2. Rearranging, we obtain

λ_{i+1} = λ_i + ‖x_i‖_2^2 / (x_i · y)                                 (6.6)
There are some practical points that should be noted about the inverse iteration method.


• If τ (the initial guess or approximation to the eigenvalue) is equal to λ (the actual eigenvalue)
then (A − τ I) is singular, and the system cannot be solved. In any case for τ close to λ (as
desired) the system (A − τ I) will be nearly singular and a special LU scheme for coping
with zero or close to zero pivots must be used.
• In practice the approximation to the eigenvalue, λi , is not updated at every iteration. As
most of the work in this method is spent solving Equation (6.3) computational savings can
be made using forward and backward substitution on the factored matrix (A − λi I). This,
however, can only happen if λ_i is held constant between iterations. The question then becomes how many iterations λ_i should be used for before it is updated and the matrix (A − λ_i I) refactorised.
• The method will only find one eigenvector for repeated eigenvalues.
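A sketch of inverse iteration following Equations (6.3), (6.4) and (6.6). For simplicity it refactorises (A − λ_i I) at every solve rather than reusing an LU factorisation, and it updates λ on every iteration after the first; the iteration count is an illustrative choice.

import numpy as np

def inverse_iteration(A, lam, x, n_iter=8):
    """Refine an approximate eigenvalue lam and eigenvector x of A."""
    n = A.shape[0]
    x = x / np.linalg.norm(x)
    for i in range(n_iter):
        y = np.linalg.solve(A - lam * np.eye(n), x)    # Equation (6.3)
        if i > 0:                                      # first pass only improves the vector
            lam = lam + (x @ x) / (x @ y)              # Equation (6.6)
        x = y / np.linalg.norm(y)                      # Equation (6.4)
    return lam, x

A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 3.0, -1.0],
              [0.0, -1.0, 1.0]])
lam, x = inverse_iteration(A, 6.0, np.array([1.0, 1.0, 1.0]))
print(lam, x)      # approximately 6.2899 and the corresponding eigenvector (up to sign)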

6.3.1 An example of inverse iteration


Consider the same matrix as in the previous examples, i.e.

A = [5 −2 0; −2 3 −1; 0 −1 1],

and suppose that we have determined λ to be approximately 6. Begin with λ_1 = 6 and x_1 = [1, 1, 1]^T.
For each iteration solve
(A − λi I) y i = xi
and then update the xi vector and (if necessary) λi .
We do the first iteration without updating λ_i so that the random starting vector will be updated to be close to the eigenvector. Using Gaussian elimination or LU factorisation of (A − λ_1 I) we obtain

y^1 = [1, −1, 0]^T,    ‖y^1‖_2 = 1.4142    ⇒    x_2 = y^1/‖y^1‖_2 = [0.707, −0.707, 0.000]^T
and solve again using x_2 and λ_2 = λ_1 to give

y^2 = [2.828, −1.770, 0.354]^T,    ‖y^2‖_2 = 3.3541    ⇒    x_3 = y^2/‖y^2‖_2 = [0.843, −0.527, 0.105]^T.
Now update λ:

x_2 · y^2 = 3.25    ⇒    λ_3 = λ_2 + 1/(x_2 · y^2) = 6.3077
Solve again using x_3 and λ_3:

y^3 = [−47.102, 30.376, −5.743]^T,    ‖y^3‖_2 = 56.3408    ⇒    x_4 = y^3/‖y^3‖_2 = [−0.836, 0.539, −0.102]^T.


Update λ:

x_3 · y^3 = −56.3349    ⇒    λ_4 = λ_3 + 1/(x_3 · y^3) = 6.2899.
Solve again using x_4 and λ_4:

y^4 = [−2.223, 1.434, −0.271]^T × 10^5,    ‖y^4‖_2 = 2.6590 × 10^5    ⇒    x_5 = y^4/‖y^4‖_2 = [−0.836, 0.539, −0.102]^T
Update λ:

x_4 · y^4 = 2.6590 × 10^5    ⇒    λ_5 = λ_4 + 1/(x_4 · y^4) = 6.2899.
Both eigenvector and eigenvalue have converged to 3 significant figures.

6.4 Similarity Transformations


Consider the similarity transform of A for some transformation matrix Z defined by

A → Z^{−1} A Z.

Consider now the eigenvalues of the transformed matrix:

det[Z^{−1} A Z − λI] = det[Z^{−1} (A − λI) Z]
                     = det[Z^{−1}] det[A − λI] det[Z]
                     = det[Z^{−1}] det[Z] det[A − λI]        (det[Z^{−1}] det[Z] = 1)
                     = det[A − λI].
The eigenvalues of the transformed matrix are unchanged.
Similarity transforms are used to transform A towards a diagonal form by some sequence of
transformations, e.g.
A → P_1^{−1} A P_1 → P_2^{−1} P_1^{−1} A P_1 P_2 → P_3^{−1} P_2^{−1} P_1^{−1} A P_1 P_2 P_3 → etc.

If this continues until the resultant product of A and the P_i's is in a diagonal form, then from Λ = S^{−1}AS, the eigenvalues are the values on the diagonal, and the eigenvectors are the columns of the accumulated transformations, i.e.

S = P_1 P_2 P_3 …
If only eigenvalues are required, then transforming to an upper (or lower) triangular matrix is
sufficient, and the eigenvalues are the diagonal values (as can be seen by taking a determinant).
There are several methods for doing these transformations which will be covered next year.
There are two methods which zero the off-diagonal elements. Jacobi works element by element,
and Householder works column by column. These two methods are generally used to produce
only a tridiagonal matrix. Further algorithms exist for finding eigenvalues and eigenvectors of
tridiagonal matrices. Another important method is called the QR method, or Gram Schmidt or-
thogonalisation.


6.5 Applications
6.5.1 Uncoupling systems of linear equations
Consider solving a system of equations, Ax = b. We rotate these equations to a new system of coordinates based on the eigenvectors of A by letting x = Sx^* and b = Sb^*. Our system of equations now becomes

Ax = b    ⇒    ASx^* = Sb^*    ⇒    S^{−1}ASx^* = b^*,

or

Λx^* = b^*.

This is a diagonal system of equations which is easy to solve, e.g.

λ_1 x_1^* = b_1^*

A diagonal system of equations like this is said to be uncoupled, as each variable x_i^* does not affect the value of the other x_j^* variables.
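A numerical sketch of this uncoupling, using the symmetric 3 × 3 matrix from the power method example and an illustrative right-hand side; it assumes none of the eigenvalues is zero.

import numpy as np

A = np.array([[5.0, -2.0, 0.0],
              [-2.0, 3.0, -1.0],
              [0.0, -1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

lam, S = np.linalg.eig(A)          # columns of S are the eigenvectors of A
b_star = np.linalg.solve(S, b)     # b = S b*   =>   b* = S^{-1} b
x_star = b_star / lam              # uncoupled equations: lambda_i x*_i = b*_i
x = S @ x_star                     # rotate back: x = S x*
print(np.allclose(A @ x, b))       # True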

6.5.2 Solving a system of ODEs


Consider a system of ODEs

dy/dt = Ay + b,

where A is a real symmetric n × n matrix. We can uncouple this system using the same method as above. Let

y = Sz = z_1 s_1 + z_2 s_2 + ⋯ + z_n s_n

and therefore dy/dt = Ay + b becomes:

S dz/dt = ASz + b
dz/dt = S^{−1}ASz + S^{−1}b = Λz + u,    where u = S^{−1}b,

d/dt [z_1, z_2, …, z_n]^T = diag(λ_1, λ_2, …, λ_n) [z_1, z_2, …, z_n]^T + [u_1, u_2, …, u_n]^T.

This has uncoupled the system of equations to

dz_1/dt = λ_1 z_1 + u_1
dz_2/dt = λ_2 z_2 + u_2
⋮
dz_n/dt = λ_n z_n + u_n.


The solution of this system is easily found to be

z_1 = a_1 e^{λ_1 t} − u_1/λ_1
z_2 = a_2 e^{λ_2 t} − u_2/λ_2
⋮
z_n = a_n e^{λ_n t} − u_n/λ_n.
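A sketch that evaluates this solution numerically; it assumes A is symmetric with no zero eigenvalues, and fixes the constants a_i from an initial condition y(0) = y0, an addition to the notes that is spelled out in the comments. The 2 × 2 matrix used here is illustrative only.

import numpy as np

def solve_uncoupled_ode(A, b, y0, t):
    """Evaluate y(t) for dy/dt = A y + b, with A real symmetric and nonzero eigenvalues."""
    lam, S = np.linalg.eigh(A)              # eigenvalues and orthonormal eigenvectors of A
    u = np.linalg.solve(S, b)               # u = S^{-1} b
    z0 = np.linalg.solve(S, y0)             # modal initial conditions z(0) = S^{-1} y0
    a = z0 + u / lam                        # from z_i(0) = a_i - u_i / lambda_i
    z = a * np.exp(lam * t) - u / lam       # modal solutions z_i(t)
    return S @ z                            # transform back: y = S z

A = np.array([[-5.0, 2.0],                  # an illustrative symmetric matrix
              [2.0, -3.0]])
b = np.array([1.0, 0.0])
y0 = np.zeros(2)
print(solve_uncoupled_ode(A, b, y0, 1.0))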

6.5.3 Powers of a matrix


Consider calculating a matrix to some power, e.g. A^{100}. Using A = SΛS^{−1} we can write

A·A·A ⋯ A = SΛ(S^{−1}S)Λ(S^{−1}S)Λ ⋯ ΛS^{−1}
          = SΛ^{100}S^{−1}
          = S diag(λ_1^{100}, …, λ_n^{100}) S^{−1}

since each bracketed product S^{−1}S is the identity, and

ΛΛ = diag(λ_1, …, λ_n) diag(λ_1, …, λ_n) = diag(λ_1^2, …, λ_n^2).
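A short numerical sketch of this identity for the 2 × 2 example matrix used earlier in the chapter, checked against repeated multiplication.

import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 3.0]])
lam, S = np.linalg.eig(A)
A100 = S @ np.diag(lam ** 100) @ np.linalg.inv(S)           # A^100 = S Lambda^100 S^{-1}
print(np.allclose(A100, np.linalg.matrix_power(A, 100)))    # True, up to round-off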
