
Introduction to Numerical Methods

with examples in Javascript

D.V. Fedorov
Department of Physics and Astronomy
Aarhus University
8000 Aarhus C, Denmark

© 2010 Dmitri V. Fedorov


Permission is granted to copy and redistribute this work under the terms of
either the GNU General Public License¹, version 3 or later, as published by the
Free Software Foundation, or the Creative Commons Attribution Share Alike
License², version 3 or later, as published by the Creative Commons corporation.

This work is distributed in the hope that it will be useful, but without any
warranty. No responsibility is assumed by the author and the publisher for any
damage from any use of any methods, instructions or ideas contained in the
material herein.

1 http://en.wikipedia.org/wiki/GPL
2 http://en.wikipedia.org/wiki/CC-BY-SA

Preface
This book evolved from lecture notes developed over several years of teaching
numerical methods at the University of Aarhus. It contains short descriptions of
the most common numerical methods together with program examples written
in Javascript. The latter was chosen simply because it seems concise and
intuitive to me. The program examples are not tested or optimized in any way
other than to fit on one page of the book.
The text of the book is free as in freedom. You are permitted to copy and
redistribute the book in original or modified form, either gratis or for a fee.
However, you must attribute the original author(s) and pass the same freedoms
on to all recipients of your copies³.

2010
Dmitri Fedorov

3 see the GPL or CC-BY-SA licenses for more details.


Contents

1 Linear equations
  1.1 Triangular systems and back-substitution
  1.2 Reduction to triangular form
    1.2.1 LU decomposition
    1.2.2 QR decomposition
  1.3 Determinant of a matrix
  1.4 Matrix inverse
  1.5 JavaScript implementations

2 Interpolation
  2.1 Polynomial interpolation
  2.2 Spline interpolation
    2.2.1 Linear interpolation
    2.2.2 Quadratic spline
    2.2.3 Cubic spline
  2.3 Other forms of interpolation

3 Linear least squares
  3.1 Linear least-squares problem
  3.2 Solution via QR-decomposition
  3.3 Ordinary least-squares curve fitting
    3.3.1 Variances and correlations of fitting parameters
  3.4 JavaScript implementation

4 Numerical integration
  4.1 Classical quadratures with equally spaced abscissas
  4.2 Quadratures with optimized abscissas
  4.3 Reducing the error by subdividing the interval
  4.4 Adaptive quadratures
  4.5 Gauss-Kronrod quadratures
  4.6 Integrals over infinite intervals
    4.6.1 Infinite intervals
    4.6.2 Half-infinite intervals

5 Monte Carlo integration
  5.1 Multi-dimensional integration
  5.2 Plain Monte Carlo sampling
  5.3 Importance sampling
  5.4 Stratified sampling
  5.5 Quasi-random (low-discrepancy) sampling
    5.5.1 Lattice rules

6 Ordinary differential equations
  6.1 Introduction
  6.2 Runge-Kutta methods
  6.3 Multistep methods
    6.3.1 A two-step method
  6.4 Predictor-corrector methods
  6.5 Step size control
    6.5.1 Error estimate
    6.5.2 Adaptive step size control

7 Nonlinear equations
  7.1 Introduction
  7.2 Newton's method
  7.3 Broyden's quasi-Newton method
  7.4 Javascript implementation

8 Optimization
  8.1 Downhill simplex method
  8.2 Javascript implementation

9 Eigenvalues and eigenvectors
  9.1 Introduction
  9.2 Similarity transformations
    9.2.1 Jacobi eigenvalue algorithm
  9.3 Power iteration methods
    9.3.1 Power method
    9.3.2 Inverse power method
    9.3.3 Inverse iteration method
  9.4 JavaScript implementation

10 Power method and Krylov subspaces
  10.1 Introduction
  10.2 Arnoldi iteration
  10.3 Lanczos iteration
  10.4 Generalised minimum residual (GMRES)

11 Fast Fourier transform
  11.1 Discrete Fourier Transform
    11.1.1 Applications
  11.2 Cooley-Tukey algorithm
  11.3 Multidimensional DFT
  11.4 C implementation
Chapter 1

Linear equations

A system of linear equations is a set of linear algebraic equations generally
written in the form

    \sum_{j=1}^{n} A_{ij} x_j = b_i ,   i = 1 \ldots m ,   (1.1)

where x_1, x_2, ..., x_n are the unknown variables, A_11, A_12, ..., A_mn are the (con-
stant) coefficients of the system, and b_1, b_2, ..., b_m are the (constant) right-hand
side terms.
The system can be written in matrix form as

    Ax = b ,   (1.2)

where A \doteq \{A_{ij}\} is the m × n matrix of the coefficients, x \doteq \{x_j\} is the size-n
column-vector of the unknown variables, and b \doteq \{b_i\} is the size-m column-
vector of right-hand side terms.
Systems of linear equations occur regularly in applied mathematics. There-
fore the computational algorithms for finding solutions of linear systems are an
important part of numerical methods.
A system of non-linear equations can often be approximated by a linear
system, a helpful technique (called linearization) for creating a mathematical
model of an otherwise more complex system.
If m = n, the matrix A is called square. A square system has a unique
solution if A is invertible.

1.1 Triangular systems and back-substitution


An efficient algorithm to solve a square system of linear equations numerically
is to transform the original system into an equivalent triangular system,
Ty = c , (1.3)
where T is a triangular matrix: a special kind of square matrix where the matrix
elements either below or above the main diagonal are zero.
An upper triangular system can be readily solved by back substitution:

    y_i = \frac{1}{T_{ii}} \left( c_i - \sum_{k=i+1}^{n} T_{ik} y_k \right) ,   i = n, \ldots, 1 .   (1.4)


For the lower triangular system the equivalent procedure is called forward sub-
stitution.
Note that a diagonal matrix – that is, a square matrix in which the elements
outside the main diagonal are all zero – is also a triangular matrix.
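
As an illustration of formula (1.4), here is a minimal back-substitution sketch
(not one of the book's listings); it assumes the triangular matrix T is stored as
an array of rows, T[i][k] = T_ik:

function backsub(T,c){ // solves the upper triangular system T y = c, formula (1.4)
  var n=c.length, y=new Array(n);
  for(var i=n-1;i>=0;i--){ // bottom row first
    var s=c[i];
    for(var k=i+1;k<n;k++) s-=T[i][k]*y[k];
    y[i]=s/T[i][i];
  }
  return y;
}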

1.2 Reduction to triangular form


Popular algorithms for transforming a square system to triangular form are LU
decomposition and QR decomposition.

1.2.1 LU decomposition
LU decomposition is a factorization of a square matrix into a product of a lower
triangular matrix L and an upper triangular matrix U ,

A = LU . (1.5)

The linear system Ax = b after LU-decomposition of the matrix A becomes


LU x = b and can be solved by first solving Ly = b for y and then U x = y for
x with two runs of forward and backward substitutions.
If A is an n × n matrix, the condition (1.5) is a set of n² equations,

    \sum_{k=1}^{n} L_{ik} U_{kj} = A_{ij} ,   (1.6)

for the n² + n unknown elements of the triangular matrices L and U. The decom-
position is thus not unique.
Usually the decomposition is made unique by providing n extra conditions,
e.g. by the requirement that the elements of the main diagonal of the matrix L
equal one, L_ii = 1, i = 1...n. The system (1.6) can then be easily solved
row after row using e.g. the Doolittle algorithm:

    for i = 1 to n :
        L_{ii} = 1
        for j = 1 to i-1 :
            L_{ij} = ( A_{ij} - \sum_{k<j} L_{ik} U_{kj} ) / U_{jj}
        for j = i to n :
            U_{ij} = A_{ij} - \sum_{k<i} L_{ik} U_{kj}
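
A minimal JavaScript sketch of the Doolittle algorithm above (an illustration,
not a listing from the book); it assumes the matrices are stored as arrays of
rows, A[i][j] = A_ij:

function ludec(A){ // LU-decomposition A = LU with the Doolittle algorithm
  var n=A.length, L=[], U=[];
  for(var i=0;i<n;i++){ L.push(new Array(n).fill(0)); U.push(new Array(n).fill(0)); }
  for(var i=0;i<n;i++){
    L[i][i]=1; // unit diagonal of L
    for(var j=0;j<i;j++){ // row i of L, left of the diagonal
      var s=0; for(var k=0;k<j;k++) s+=L[i][k]*U[k][j];
      L[i][j]=(A[i][j]-s)/U[j][j];
    }
    for(var j=i;j<n;j++){ // row i of U, on and right of the diagonal
      var s=0; for(var k=0;k<i;k++) s+=L[i][k]*U[k][j];
      U[i][j]=A[i][j]-s;
    }
  }
  return [L,U];
}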

1.2.2 QR decomposition
QR decomposition is a factorization of a matrix into a product of an orthogonal
matrix Q, such that Q^T Q = 1 (where T denotes transposition), and a right
triangular matrix R,

    A = QR .   (1.7)

QR-decomposition can be used to convert the linear system Ax = b into the
triangular form

    Rx = Q^T b ,   (1.8)

which can be solved directly by back-substitution.

QR-decomposition can also be performed on non-square matrices with few
long columns. Generally speaking, a rectangular n × m matrix A can be repre-
sented as a product, A = QR, of an orthogonal n × m matrix Q, Q^T Q = 1, and
a right-triangular m × m matrix R.
QR decomposition of a matrix can be computed using several methods, such
as Gram-Schmidt orthogonalization, Householder transformations, or Givens
rotations.

Gram-Schmidt orthogonalization
Gram-Schmidt orthogonalization is an algorithm for orthogonalization of a set
of vectors in a given inner product space. It takes a linearly independent set
of vectors A = {a1 , . . . , am } and generates an orthogonal set Q = {q1 , . . . , qm }
which spans the same subspace as A. The algorithm is given as
    for i = 1 to m
        q_i ← a_i / ‖a_i‖              (normalization)
        for j = i+1 to m
            a_j ← a_j − ⟨a_j, q_i⟩ q_i   (orthogonalization)

where ⟨a, b⟩ is the inner product of two vectors, and ‖a‖ = √⟨a, a⟩ is the
vector's norm. This variant of the algorithm, where all remaining vectors a_j are
made orthogonal to q_i as soon as the latter is calculated, is considered to be
numerically stable and is referred to as stabilized or modified.
Stabilized Gram-Schmidt orthogonalization can be used to compute QR de-
composition of a matrix A by orthogonalization of its column-vectors ai with
the inner product

    \langle a, b \rangle = a^T b \equiv \sum_{k=1}^{n} (a)_k (b)_k ,   (1.9)

where n is the length of the column-vectors a and b, and (a)_k is the kth element of
the column-vector.
    input  : matrix A = {a_1, ..., a_m}  (destroyed)
    output : matrices R, Q = {q_1, ..., q_m} such that A = QR
    for i = 1 ... m
        R_{ii} = (a_i^T a_i)^{1/2}
        q_i = a_i / R_{ii}
        for j = i+1 ... m
            R_{ij} = q_i^T a_j
            a_j = a_j − q_i R_{ij}

The factorization is unique under the requirement that the diagonal elements of
R are positive. For an n × m matrix the complexity of the algorithm is O(m²n).

1.3 Determinant of a matrix


LU- and QR-decompositions allow O(n³) calculation of the determinant of a
square matrix. Indeed, for the LU-decomposition,

    \det A = \det(LU) = \det L \det U = \det U = \prod_{i=1}^{n} U_{ii} .   (1.10)

For the QR-decomposition,

    \det A = \det(QR) = \det Q \det R .   (1.11)

Since Q is an orthogonal matrix, (det Q)² = 1 and therefore

    |\det A| = |\det R| = \prod_{i=1}^{n} R_{ii} .   (1.12)

1.4 Matrix inverse


The inverse A^{-1} of a square n × n matrix A can be calculated by solving n
linear systems Ax_i = z_i, i = 1...n, where z_i is a column-vector whose elements
are all equal to zero except for element number i, which equals one. The matrix
built of the columns x_i is then the inverse of A.

1.5 JavaScript implementations


function qrdec(A){ // QR-decomposition A=QR of matrix A
  var m=A.length, dot=function(a,b){
    var s=0; for(var i in a) s+=a[i]*b[i]; return s;}
  var R=[[0 for(i in A)] for(j in A)];
  var Q=[[A[i][j] for(j in A[0])] for(i in A)]; // Q is a copy of A
  for(var i=0;i<m;i++){
    var e=Q[i], r=Math.sqrt(dot(e,e));
    if(r==0) throw "qrdec: singular matrix"
    R[i][i]=r;
    for(var k in e) e[k]/=r;              // normalization
    for(var j=i+1;j<m;j++){
      var q=Q[j], s=dot(e,q);
      for(var k in q) q[k]-=s*e[k];       // orthogonalization
      R[j][i]=s; } }
  return [Q,R]; } // end qrdec

function qrback(Q,R,b){ // QR-backsubstitution
// input: matrices Q,R, array b; output: array x such that QRx=b
  var m=Q.length, c=new Array(m), x=new Array(m);
  for(var i in Q){ // c = Q^T b
    c[i]=0; for(var k in b) c[i]+=Q[i][k]*b[k]; }
  for(var i=m-1;i>=0;i--){ // back-substitution
    for(var s=0,k=i+1;k<m;k++) s+=R[k][i]*x[k];
    x[i]=(c[i]-s)/R[i][i]; }
  return x; } // end qrback

function inverse(A){ // calculates inverse of matrix A
  var [Q,R]=qrdec(A);
  return [qrback(Q,R,[(k==i?1:0) for(k in A)]) for(i in A)]; } // end inverse
Chapter 2

Interpolation

In practice one often meets a situation where the function of interest, f (x), is
only given as a discrete set of n tabulated points, {xi , yi = f (xi ) | i = 1 . . . n},
as obtained for example by sampling, experimentation, or expensive numerical
calculations.
Interpolation means constructing a (smooth) function, called interpolating
function, which passes exactly through the given points and hopefully approx-
imates the tabulated function in between the tabulated points. Interpolation
is a specific case of curve fitting in which the fitting function must go exactly
through the data points.
The interpolating function can be used for different practical needs like esti-
mating the tabulated function between the tabulated points and estimating the
derivatives and integrals involving the tabulated function.

2.1 Polynomial interpolation


Polynomial interpolation uses a polynomial as the interpolating function. Given
a table of n points, {xi , yi }, one can construct a polynomial P (n−1) (x) of the
order n − 1 which passes exactly through the points. This polynomial can be
intuitively written in the Lagrange form,
    P^{(n-1)}(x) = \sum_{i=1}^{n} y_i \prod_{k \ne i} \frac{x - x_k}{x_i - x_k} .   (2.1)

function pinterp(x,y,z){ // Lagrange polynomial interpolation at the point z
  for(var s=0,i=0;i<x.length;i++){     // sum over the data points
    for(var p=1,k=0;k<x.length;k++)    // build the i-th Lagrange basis polynomial
      if(k!=i) p*=(z-x[k])/(x[i]-x[k]);
    s+=y[i]*p;}
  return s;}
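
For example, one could interpolate a (made-up, purely illustrative) table at
the point x = 1.5 as

var xs=[0,1,2,3], ys=[1,2,0,5];   // tabulated points
var y15=pinterp(xs,ys,1.5);       // value of the interpolating polynomial at x=1.5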

Higher order interpolating polynomials are susceptible to the Runge phe-
nomenon – erratic oscillations close to the end-points of the interval, as illus-
trated on Fig. 2.1. This problem can be avoided by using only the nearest few
points instead of all the points in the table (local interpolation) or by using
spline interpolation.


[Figure 2.1: Lagrange interpolating polynomial (solid line) showing the Runge
phenomenon: large oscillations at the end-points. The dashed line shows a
quadratic spline.]

2.2 Spline interpolation


Spline interpolation uses a piecewise polynomial S(x), called spline, as the in-
terpolating function,

    S(x) = S_i(x)   if   x \in [x_i, x_{i+1}] ,   (2.2)

where S_i(x) is a polynomial of a given order k.
The spline of order k ≥ 1 can be made continuous at the tabulated
points,

    S_i(x_i) = y_i ,         i = 1 \ldots n-1 ,
    S_i(x_{i+1}) = y_{i+1} ,   i = 1 \ldots n-1 ,   (2.3)

together with its k − 1 derivatives,

    S_i'(x_{i+1}) = S_{i+1}'(x_{i+1}) ,    i = 1 \ldots n-2 ,
    S_i''(x_{i+1}) = S_{i+1}''(x_{i+1}) ,   i = 1 \ldots n-2 ,
    \ldots   (2.4)

Continuity conditions (2.3) and (2.4) make kn + n − 2k linear equations for
the (n − 1)(k + 1) = kn + n − k − 1 coefficients in the n − 1 polynomials (2.2) of
order k. The missing k − 1 conditions can be chosen (reasonably) arbitrarily.
The most popular is the cubic spline, where the polynomials Si (x) are of
third order. The cubic spline is a continuous function together with its first
and second derivatives. The cubic spline also has a nice feature that it (sort
of) minimizes the total curvature of the interpolating function. This makes the
cubic splines look good.
Quadratic spline, which is continuous together with its first derivative, is
not nearly as good as the cubic spline in most respects. Particularly it might
oscillate unpleasantly when a quick change in the tabulated function is followed
by a period where the function is nearly a constant. The cubic spline is less
susceptible to such oscillations.
Linear spline is simply a polygon drawn through the tabulated points.

2.2.1 Linear interpolation


If the spline polynomials are linear the spline is called linear interpolation. The
continuity conditions (2.3) can be satisfied by choosing the spline as
    S_i(x) = y_i + \frac{\Delta y_i}{\Delta x_i} (x - x_i) ,   (2.5)

where

    \Delta y_i \equiv y_{i+1} - y_i ,   \Delta x_i \equiv x_{i+1} - x_i .   (2.6)
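
A minimal JavaScript sketch of linear interpolation (2.5), given here as an
illustration: the interval containing the point z is located by bisection and the
formula is then applied,

function linterp(x,y,z){ // linear spline S(z) through the points {x[i],y[i]}
  var i=0, j=x.length-1;
  while(j-i>1){ var m=(i+j)>>1; if(z>x[m]) i=m; else j=m; } // bisection: x[i] <= z <= x[j]
  return y[i]+(y[i+1]-y[i])/(x[i+1]-x[i])*(z-x[i]);         // formula (2.5)
}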

2.2.2 Quadratic spline


Quadratic splines are made of second order polynomials, conveniently chosen in
the form

    S_i(x) = y_i + \frac{\Delta y_i}{\Delta x_i} (x - x_i) + a_i (x - x_i)(x - x_{i+1}) ,   (2.7)

which identically satisfies the continuity conditions (2.3).
Substituting (2.7) into the continuity condition for the first derivative (2.4)
gives n − 2 equations for the n − 1 unknown coefficients a_i,

    \frac{\Delta y_i}{\Delta x_i} + a_i \Delta x_i = \frac{\Delta y_{i+1}}{\Delta x_{i+1}} - a_{i+1} \Delta x_{i+1} .   (2.8)

One coefficient can be chosen arbitrarily, for example a_1 = 0. The other
coefficients can now be calculated recursively,

    a_{i+1} = \frac{1}{\Delta x_{i+1}} \left( \frac{\Delta y_{i+1}}{\Delta x_{i+1}} - \frac{\Delta y_i}{\Delta x_i} - a_i \Delta x_i \right) .   (2.9)

Alternatively, one can choose a_{n-1} = 0 and make the inverse recursion,

    a_i = \frac{1}{\Delta x_i} \left( \frac{\Delta y_{i+1}}{\Delta x_{i+1}} - \frac{\Delta y_i}{\Delta x_i} - a_{i+1} \Delta x_{i+1} \right) .   (2.10)

In practice, unless you know what your a_1 (or a_{n-1}) is, it is better to run
both recursions and then average the resulting a's.
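
A minimal sketch (an illustration, not one of the book's listings) of building the
quadratic-spline coefficients a_i by running the forward recursion (2.9), the
backward recursion (2.10), and averaging:

function qspline(x,y){ // returns the coefficients a[i] of the quadratic spline (2.7)
  var n=x.length, dx=[], p=[];
  for(var i=0;i<n-1;i++){ dx[i]=x[i+1]-x[i]; p[i]=(y[i+1]-y[i])/dx[i]; } // slopes
  var fwd=new Array(n-1), bwd=new Array(n-1);
  fwd[0]=0;                                   // forward recursion (2.9) with a_1 = 0
  for(var i=0;i<n-2;i++) fwd[i+1]=(p[i+1]-p[i]-fwd[i]*dx[i])/dx[i+1];
  bwd[n-2]=0;                                 // backward recursion (2.10) with a_{n-1} = 0
  for(var i=n-3;i>=0;i--) bwd[i]=(p[i+1]-p[i]-bwd[i+1]*dx[i+1])/dx[i];
  var a=[]; for(var i=0;i<n-1;i++) a[i]=(fwd[i]+bwd[i])/2; // average the two
  return a;
}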

2.2.3 Cubic spline


Cubic splines are made of third order polynomials, written e.g. in the form

    S_i(x) = y_i + b_i (x - x_i) + c_i (x - x_i)^2 + d_i (x - x_i)^3 ,   (2.11)

which automatically satisfies the upper half of continuity conditions (2.3). The
other half of continuity conditions (2.3) and the continuity of the first and second
derivatives (2.4) give

    y_i + b_i h_i + c_i h_i^2 + d_i h_i^3 = y_{i+1} ,   i = 1, \ldots, n-1 ,
    b_i + 2 c_i h_i + 3 d_i h_i^2 = b_{i+1} ,           i = 1, \ldots, n-2 ,
    2 c_i + 6 d_i h_i = 2 c_{i+1} ,                     i = 1, \ldots, n-2 ,   (2.12)

where

    h_i = x_{i+1} - x_i .   (2.13)

The set of equations (2.12) is a set of 3n − 5 linear equations for the 3(n − 1)
unknown coefficients {b_i, c_i, d_i | i = 1, ..., n − 1}. Therefore two more equations
should be added to the set to find the coefficients. If the two extra equations
are also linear, the total system is linear and can be easily solved.
The spline is called natural if the extra conditions are vanishing second
derivatives at the end-points,

    S''(x_1) = S''(x_n) = 0 ,   (2.14)

which gives

    c_1 = 0 ,
    c_{n-1} + 3 d_{n-1} h_{n-1} = 0 .   (2.15)

Solving the first two equations in (2.12) for c_i and d_i gives¹

    c_i h_i = -2 b_i - b_{i+1} + 3 p_i ,
    d_i h_i^2 = b_i + b_{i+1} - 2 p_i ,   (2.16)

where p_i \equiv \Delta y_i / h_i. The natural conditions (2.15) and the third equation in (2.12)
then produce the following tridiagonal system of n linear equations for the n
coefficients b_i,

    2 b_1 + b_2 = 3 p_1 ,
    b_i + \left( 2\frac{h_i}{h_{i+1}} + 2 \right) b_{i+1} + \frac{h_i}{h_{i+1}} b_{i+2} = 3 \left( p_i + p_{i+1} \frac{h_i}{h_{i+1}} \right) ,   i = 1, \ldots, n-2 ,
    b_{n-1} + 2 b_n = 3 p_{n-1} ,   (2.17)

or, in matrix form,

    \begin{pmatrix}
    D_1 & Q_1 & 0   & 0   & \cdots \\
    1   & D_2 & Q_2 & 0   & \cdots \\
    0   & 1   & D_3 & Q_3 & \cdots \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \cdots & \cdots & 0 & 1 & D_n
    \end{pmatrix}
    \begin{pmatrix} b_1 \\ \vdots \\ \vdots \\ b_n \end{pmatrix}
    =
    \begin{pmatrix} B_1 \\ \vdots \\ \vdots \\ B_n \end{pmatrix} ,   (2.18)

where the elements D_i on the main diagonal are

    D_1 = 2 ;   D_{i+1} = 2\frac{h_i}{h_{i+1}} + 2 ,   i = 1, \ldots, n-2 ;   D_n = 2 ,   (2.19)

the elements Q_i on the above-main diagonal are

    Q_1 = 1 ;   Q_{i+1} = \frac{h_i}{h_{i+1}} ,   i = 1, \ldots, n-2 ,   (2.20)

and the right-hand side terms B_i are

    B_1 = 3 p_1 ;   B_{i+1} = 3 \left( p_i + p_{i+1}\frac{h_i}{h_{i+1}} \right) ,   i = 1, \ldots, n-2 ;   B_n = 3 p_{n-1} .   (2.21)

1 introducing an auxiliary coefficient b_n

This system can be solved by one run of Gauss elimination and then a run
of back-substitution. After the run of Gaussian elimination the system becomes

    \begin{pmatrix}
    \tilde D_1 & Q_1 & 0 & 0 & \cdots \\
    0 & \tilde D_2 & Q_2 & 0 & \cdots \\
    0 & 0 & \tilde D_3 & Q_3 & \cdots \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \cdots & \cdots & 0 & 0 & \tilde D_n
    \end{pmatrix}
    \begin{pmatrix} b_1 \\ \vdots \\ \vdots \\ b_n \end{pmatrix}
    =
    \begin{pmatrix} \tilde B_1 \\ \vdots \\ \vdots \\ \tilde B_n \end{pmatrix} ,   (2.22)

where

    \tilde D_1 = D_1 ;   \tilde D_i = D_i - Q_{i-1}/\tilde D_{i-1} ,   i = 2, \ldots, n ,   (2.23)

and

    \tilde B_1 = B_1 ;   \tilde B_i = B_i - \tilde B_{i-1}/\tilde D_{i-1} ,   i = 2, \ldots, n .   (2.24)

The triangular system (2.22) can be solved by a run of back-substitution,

    b_n = \frac{\tilde B_n}{\tilde D_n} ;   b_i = \frac{1}{\tilde D_i} \left( \tilde B_i - Q_i b_{i+1} \right) ,   i = n-1, \ldots, 1 .   (2.25)
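
A minimal sketch (an illustration, not one of the book's listings) of the natural
cubic spline: it builds the tridiagonal system (2.17)-(2.21), performs the
elimination (2.23)-(2.24), back-substitutes as in (2.25) and recovers c_i and d_i
from (2.16):

function cspline(x,y){ // natural cubic spline: returns the coefficients {b,c,d} of (2.11)
  var n=x.length, h=[], p=[];
  for(var i=0;i<n-1;i++){ h[i]=x[i+1]-x[i]; p[i]=(y[i+1]-y[i])/h[i]; }
  var D=new Array(n), Q=new Array(n-1), B=new Array(n);
  D[0]=2; Q[0]=1; B[0]=3*p[0];                       // first equation of (2.17)
  for(var i=0;i<n-2;i++){                            // middle equations
    D[i+1]=2*h[i]/h[i+1]+2;
    Q[i+1]=h[i]/h[i+1];
    B[i+1]=3*(p[i]+p[i+1]*h[i]/h[i+1]); }
  D[n-1]=2; B[n-1]=3*p[n-2];                         // last equation
  for(var i=1;i<n;i++){ D[i]-=Q[i-1]/D[i-1]; B[i]-=B[i-1]/D[i-1]; } // elimination (2.23)-(2.24)
  var b=new Array(n); b[n-1]=B[n-1]/D[n-1];          // back-substitution (2.25)
  for(var i=n-2;i>=0;i--) b[i]=(B[i]-Q[i]*b[i+1])/D[i];
  var c=[], d=[];
  for(var i=0;i<n-1;i++){                            // coefficients from (2.16)
    c[i]=(-2*b[i]-b[i+1]+3*p[i])/h[i];
    d[i]=( b[i]+b[i+1]-2*p[i])/(h[i]*h[i]); }
  return {b:b, c:c, d:d};
}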

2.3 Other forms of interpolation


Other forms of interpolation can be constructed by choosing a different class
of interpolating functions, for example, rational function interpolation, trigono-
metric interpolation, wavelet interpolation etc.
Sometimes not only the value of the function is given at the tabulated points,
but also the derivative. This extra information can be taken advantage of when
constructing the interpolation function.
Interpolation of a function in more than one dimension is called multivariate
interpolation. In two dimensions one of the easiest methods is bi-linear
interpolation, where the function in each tabulated rectangle is approximated as
a product of two linear functions,

    f(x, y) \approx (ax + b)(cy + d) ,   (2.26)

where the constants a, b, c, d are obtained from the condition that the inter-
polating function equals the tabulated values at the four nearest tabulated
points.
Chapter 3

Linear least squares

A system of linear equations is considered overdetermined if there are more
equations than unknown variables. If all equations of an overdetermined system
are linearly independent, the system has no exact solution.
A linear least-squares problem is the problem of finding an approximate so-
lution to an overdetermined system. It often arises in applications where a
theoretical model is fitted to experimental data.

3.1 Linear least-squares problem


Consider a linear system
Ac = b , (3.1)
where A is an n × m matrix, c is an m-component vector of unknown variables
and b is an n-component vector of the right-hand side terms. If the number of
equations n is larger than the number of unknowns m, the system is overdeter-
mined and generally has no solution.
However, it is still possible to find an approximate solution – the one where
Ac is only approximately equal to b, in the sense that the Euclidean norm of the
difference between Ac and b is minimized,

    \min_{c} \| Ac - b \|^2 .   (3.2)

The problem (3.2) is called the linear least-squares problem and the vector c
that minimizes ‖Ac − b‖² is called the least-squares solution.

3.2 Solution via QR-decomposition


The linear least-squares problem can be solved by QR-decomposition. The
matrix A is factorized as A = QR, where Q is an n × m matrix with orthogonal
columns, Q^T Q = 1, and R is an m × m upper triangular matrix. The Euclidean
norm ‖Ac − b‖² can then be rewritten as

    \| Ac - b \|^2 = \| QRc - b \|^2 = \| Rc - Q^T b \|^2 + \| (1 - QQ^T) b \|^2 \ge \| (1 - QQ^T) b \|^2 .   (3.3)

The term ‖(1 − QQ^T)b‖² is independent of the variables c and can not be
reduced by their variations. However, the term ‖Rc − Q^T b‖² can be reduced
down to zero by solving the m × m system of linear equations

    Rc - Q^T b = 0 .   (3.4)

The system is right-triangular and can be readily solved by back-substitution.
Thus the solution to the linear least-squares problem (3.2) is given by the
solution of the triangular system (3.4).

3.3 Ordinary least-squares curve fitting


Ordinary (or linear) least-squares curve fitting is a problem of fitting n (exper-
imental) data points {x_i, y_i ± Δy_i}, where Δy_i are the experimental errors, by a
linear combination of m functions,

    F(x) = \sum_{k=1}^{m} c_k f_k(x) .   (3.5)

The objective of the least-squares fit is to minimize the squared deviation, called
χ², between the fitting function and the experimental data,

    \chi^2 = \sum_{i=1}^{n} \left( \frac{F(x_i) - y_i}{\Delta y_i} \right)^2 .   (3.6)

Individual deviations from the experimental points are weighted with their inverse
errors in order to promote contributions from the more precise measurements.
Minimization of χ² with respect to the coefficients c_k in (3.5) is apparently
equivalent to the least-squares problem (3.2) where

    A_{ik} = \frac{f_k(x_i)}{\Delta y_i} ,   b_i = \frac{y_i}{\Delta y_i} .   (3.7)

If QR = A is the QR-decomposition of the matrix A, the formal least-squares
solution is

    c = R^{-1} Q^T b .   (3.8)

However, in practice it is better to back-substitute the system Rc = Q^T b.

3.3.1 Variances and correlations of fitting parameters


Suppose δy_i is a (small) deviation of the measured value of the physical ob-
servable from its exact value. The corresponding deviation δc_k of the fitting
coefficient is then given as

    \delta c_k = \sum_i \frac{\partial c_k}{\partial y_i} \delta y_i .   (3.9)

In a good experiment the deviations δy_i are statistically independent and dis-
tributed normally with the standard deviations Δy_i. The deviations (3.9) are
then also distributed normally, with variances

    \langle \delta c_k \delta c_k \rangle = \sum_i \left( \frac{\partial c_k}{\partial y_i} \Delta y_i \right)^2 = \sum_i \left( \frac{\partial c_k}{\partial b_i} \right)^2 .   (3.10)

The standard errors in the fitting coefficients are then given as the square roots
of the variances,

    \Delta c_k = \sqrt{ \langle \delta c_k \delta c_k \rangle } = \sqrt{ \sum_i \left( \frac{\partial c_k}{\partial b_i} \right)^2 } .   (3.11)

The variances are the diagonal elements of the covariance matrix, Σ, made of
covariances,

    \Sigma_{kq} \equiv \langle \delta c_k \delta c_q \rangle = \sum_i \frac{\partial c_k}{\partial b_i} \frac{\partial c_q}{\partial b_i} .   (3.12)

Covariances ⟨δc_k δc_q⟩ are measures of to what extent the coefficients c_k and c_q
change together if the measured values y_i are varied. The normalized covari-
ances,

    \frac{ \langle \delta c_k \delta c_q \rangle }{ \sqrt{ \langle \delta c_k \delta c_k \rangle \langle \delta c_q \delta c_q \rangle } } ,   (3.13)

are called correlations.
Using (3.12) and (3.8) the covariance matrix can be calculated as

    \Sigma = \frac{\partial c}{\partial b} \left( \frac{\partial c}{\partial b} \right)^T = R^{-1} (R^{-1})^T = (R^T R)^{-1} = (A^T A)^{-1} .   (3.14)

The square roots of the diagonal elements of this matrix provide the estimates
of the errors of the fitting coefficients, and the (normalized) off-diagonal elements
are the estimates of their correlations.

3.4 JavaScript implementation


function lsfit(xs,ys,dys,funs){ // Linear least-squares fit
// uses: qrdec, qrback, inverse
// input: data points {x,y,dy}; functions {funs}
// output: fitting coefficients c and covariance matrix S
  var dot = function(a,b)      // a.b
    {let s=0; for(let i in a) s+=a[i]*b[i]; return s}
  var ttimes = function(A,B)   // A^T*B
    [[dot(A[r],B[c]) for(r in A)] for(c in B)];
  var A=[[funs[k](xs[i])/dys[i] for(i in xs)] for(k in funs)];
  var b=[ys[i]/dys[i] for(i in ys)];
  var [Q,R]=qrdec(A);
  var c=qrback(Q,R,b);
  var S=inverse(ttimes(R,R));
  return [c,S];
}
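
For example, a straight-line fit F(x) = c_1 + c_2 x could be set up as follows
(the data here are made up for illustration only):

var xs =[1,2,3,4,5];
var ys =[1.9,4.1,6.1,7.9,10.2];
var dys=[0.1,0.1,0.1,0.1,0.1];
var funs=[function(x){return 1;}, function(x){return x;}];
var [c,S]=lsfit(xs,ys,dys,funs); // c: fitting coefficients, S: covariance matrix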
Chapter 4

Numerical integration

Numerical integration, also called quadrature for one-dimensional integrals and
cubature for multi-dimensional integrals, is an algorithm to compute an approx-
imation to a definite integral in the form of a finite sum,

    \int_a^b f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i) .   (4.1)

The abscissas x_i and the weights w_i in (4.1) are chosen such that the quadrature
is particularly well suited for the given class of functions to integrate. Dif-
ferent quadratures use different strategies of choosing the abscissas and weights.

4.1 Classical quadratures with equally spaced abscissas
Classical quadratures use predefined equally-spaced abscissas. A quadrature
is called closed if the abscissas include the end-points of the interval or the
mid-point (which becomes end-point after halving the interval). Otherwise it is
called open. If the integrand is diverging at the end-points (or at the mid-point
of the interval) the closed quadratures generally can not be used.
For an n-point classical quadrature the n free parameters w_i can be chosen
such that the quadrature integrates exactly a set of n (linearly independent)
functions {φ_1(x), ..., φ_n(x)} whose integrals

    I_k \equiv \int_a^b \phi_k(x)\,dx   (4.2)

are known. This gives a set of equations, linear in w_i,

    \sum_{i=1}^{n} w_i \phi_k(x_i) = I_k ,   k = 1 \ldots n .   (4.3)

The weights w_i can then be determined by solving the linear system (4.3).
If the functions to be integrated exactly are chosen as polynomials {1, x, x², ..., x^{n-1}},
the quadrature is called a Newton-Cotes quadrature. An n-point Newton-Cotes
quadrature can integrate exactly the first n terms of the function's Taylor ex-
pansion¹

    f(a + t) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} t^k .   (4.4)

The nth order term \frac{f^{(n)}(a)}{n!} t^n will not be integrated exactly by an n-point quadra-
ture and will then result in the quadrature's error²

    \epsilon_n \approx \int_0^h \frac{f^{(n)}(a)}{n!} t^n\,dt = \frac{f^{(n)}(a)}{n!(n+1)} h^{n+1} .   (4.6)

If the function is smooth and the interval h is small enough, the Newton-Cotes
quadrature can give a good approximation.
Here are several examples of closed and open classical quadratures:

    \int_0^h f(x)\,dx \approx \frac{h}{2} \left[ f(0) + f(h) \right] ,   (4.7)

    \int_0^h f(x)\,dx \approx \frac{h}{6} \left[ f(0) + 4 f(\tfrac{1}{2}h) + f(h) \right] ,   (4.8)

    \int_0^h f(x)\,dx \approx \frac{h}{2} \left[ f(\tfrac{1}{3}h) + f(\tfrac{2}{3}h) \right] ,   (4.9)

    \int_0^h f(x)\,dx \approx \frac{h}{6} \left[ 2 f(\tfrac{1}{6}h) + f(\tfrac{2}{6}h) + f(\tfrac{4}{6}h) + 2 f(\tfrac{5}{6}h) \right] .   (4.10)

4.2 Quadratures with optimized abscissas


In quadratures with optimal abscissas, called Gaussian quadratures, not only
the weights w_i but also the abscissas x_i are chosen optimally. The number of free
parameters is thus 2n (n optimal abscissas and n weights) and one can choose
2n functions {φ_1(x), ..., φ_{2n}(x)} to be integrated exactly. This gives a system
of 2n equations, linear in w_i and non-linear in x_i,

    \sum_{i=1}^{n} w_i f_k(x_i) = I_k ,   k = 1, \ldots, 2n ,   (4.11)

where I_k = \int_a^b f_k(x)\,dx. The weights and abscissas can be determined by solving
this system of equations.
Here is, for example, the two-point Gauss-Legendre quadrature rule³,

    \int_{-1}^{+1} f(x)\,dx \approx f\!\left( -\sqrt{\tfrac{1}{3}} \right) + f\!\left( +\sqrt{\tfrac{1}{3}} \right) .   (4.13)

1 Assuming that the integral is rescaled as
    \int_a^b f(x)\,dx = \int_0^{h=b-a} f(a + t)\,dt .   (4.5)
2 Actually the error is often one order in h higher due to the symmetry of the polynomials
t^k with respect to reflections about the origin.
3 Assuming that the integral is rescaled as
    \int_a^b f(x)\,dx = \int_{-1}^{1} f\!\left( \frac{a+b}{2} + \frac{b-a}{2} t \right) \frac{b-a}{2}\,dt .   (4.12)

The Gaussian quadratures are of order 2n − 1 compared to order n − 1 for
non-optimal abscissas. However, the optimal points generally can not be reused
at the next iteration in an adaptive algorithm.
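
As an illustration (a sketch, not one of the book's listings), the rescaled two-point
Gauss-Legendre rule (4.12)-(4.13) for an arbitrary interval [a, b] reads in
JavaScript:

function gauss2(f,a,b){ // two-point Gauss-Legendre rule on [a,b]
  var c=(a+b)/2, d=(b-a)/2;  // rescaling (4.12): x = c + d*t
  var t=Math.sqrt(1/3);      // the optimal abscissas are t = +-sqrt(1/3)
  return d*(f(c-d*t)+f(c+d*t));
}
// e.g. gauss2(Math.sin,0,Math.PI) gives about 1.94 (the exact value is 2)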

4.3 Reducing the error by subdividing the interval

The higher order quadratures, say n > 10, suffer from round-off errors, as the
weights w_i generally have alternating signs. Again, using high order polyno-
mials is dangerous as they typically oscillate wildly and may lead to the Runge
phenomenon. Therefore, if the error of the quadrature is still too big even for a
sufficiently large n, the best strategy is to subdivide the interval in two
and then use the quadrature on the half-intervals. Indeed, if the error is of the
order h^k, the subdivision leads to a reduced error, 2(h/2)^k < h^k, if k > 1.

4.4 Adaptive quadratures


Adaptive quadrature is an algorithm where the integration interval is subdivided
into adaptively refined subintervals until the given accuracy goal is reached.
Adaptive algorithms are usually built on pairs of quadrature rules (preferably
using the same points), a higher order rule (e.g. 4-point-open) and a lower
order rule (e.g. 2-point-open). The higher order rule is used to compute the
approximation, Q, to the integral. The difference between the higher order rule
and the lower order rule gives an estimate of the error, δQ. The integration
result is accepted if

    \delta Q < \delta + \epsilon |Q| ,   (4.14)

where δ is the absolute accuracy goal and ε is the relative accuracy goal of the
integration.
Otherwise the interval is subdivided into two half-intervals and the procedure
is applied recursively to each subinterval with the same relative accuracy goal ε
and the rescaled absolute accuracy goal δ/√2.
The reuse of the function evaluations made at the previous step of adaptive
integration is very important for the efficiency of the algorithm. The equally-
spaced abscissas naturally provide for such a reuse.

4.5 Gauss-Kronrod quadratures


Gauss-Kronrod quadratures represent a compromise between equally spaced
abscissas and optimal abscissas: n points are reused from the previous iteration
(n weights as free parameters) and then m optimal points are added (m abscissas
and m weights as free parameters). Thus the accuracy of the method is n +
2m − 1. There are several special variants of these quadratures fit for particular
types of the integrands.

Table 4.1: Recursive adaptive integrator based on open-2/4 quadratures.

function adapt(f,a,b,acc,eps,oldfs){ // adaptive integrator
  var x=[1/6,2/6,4/6,5/6]; // abscissas
  var w=[2/6,1/6,1/6,2/6]; // weights of the higher order quadrature
  var v=[1/4,1/4,1/4,1/4]; // weights of the lower order quadrature
  var p=[1,0,0,1];         // marks the new points at each recursion
  var n=x.length, h=b-a, fs;
  if(typeof(oldfs)=="undefined")     // first call?
    fs=[f(a+x[i]*h) for(i in x)];    // first call: evaluate all points
  else{ // recursive call: oldfs are given
    fs=new Array(n);
    for(var k=0,i=0;i<n;i++){
      if(p[i]) fs[i]=f(a+x[i]*h);    // new points
      else fs[i]=oldfs[k++];}}       // reuse of old points
  for(var q4=q2=i=0;i<n;i++){
    q4+=w[i]*fs[i]*h;  // higher order estimate
    q2+=v[i]*fs[i]*h;} // lower order estimate
  var tol=acc+eps*Math.abs(q4)       // required tolerance
  var err=Math.abs(q4-q2)/3          // error estimate
  if(err<tol)            // are we done?
    return [q4,err]      // yes, return integral and error
  else{ // error too big, preparing the recursion
    acc/=Math.sqrt(2.)   // rescale the absolute accuracy goal
    var mid=(a+b)/2
    var left=[fs[i] for(i in fs) if(i< n/2)]  // store the left points
    var rght=[fs[i] for(i in fs) if(i>=n/2)]  // store the right points
    var [ql,el]=adapt(f,a,mid,acc,eps,left)   // dispatch two recursive calls
    var [qr,er]=adapt(f,mid,b,acc,eps,rght)
    return [ql+qr,Math.sqrt(el*el+er*er)]     // return the grand estimates
  }
}

4.6 Integrals over infinite intervals


4.6.1 Infinite intervals
One way to calculate an integral over an infinite interval is to transform it into an
integral over a finite interval,

    \int_{-\infty}^{+\infty} f(x)\,dx = \int_{-1}^{+1} f\!\left( \frac{t}{1-t^2} \right) \frac{1+t^2}{(1-t^2)^2}\,dt ,   (4.15)

by the variable substitution

    x = \frac{t}{1-t^2} ,\quad dx = \frac{1+t^2}{(1-t^2)^2}\,dt ,\quad t = \frac{\sqrt{1+4x^2}-1}{2x} .   (4.16)

The integral over the finite interval can then be evaluated by ordinary integration
methods.
Alternatively,

    \int_{-\infty}^{+\infty} f(x)\,dx = \int_{0}^{1} \frac{dt}{t^2} \left[ f\!\left( \frac{1-t}{t} \right) + f\!\left( -\frac{1-t}{t} \right) \right] .   (4.17)
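
As an illustration (a sketch, assuming the adapt routine of Table 4.1), the
transformation (4.15) can be wrapped around an ordinary integrator; the open
quadrature never evaluates the end-points t = ±1, where the weight factor
diverges:

function infadapt(f,acc,eps){ // integral of f over (-inf,+inf) via (4.15)-(4.16)
  var g=function(t){
    var w=1-t*t;
    return f(t/w)*(1+t*t)/(w*w); // transformed integrand
  };
  return adapt(g,-1,1,acc,eps);  // [integral, error estimate]
}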

4.6.2 Half-infinite intervals


An integral over a half-infinite interval can be transformed into an integral over
a finite interval,

    \int_{a}^{+\infty} f(x)\,dx = \int_{0}^{1} f\!\left( a + \frac{t}{1-t} \right) \frac{1}{(1-t)^2}\,dt ,   (4.18)

by the variable substitution

    x = a + \frac{t}{1-t} ,\quad dx = \frac{1}{(1-t)^2}\,dt ,\quad t = \frac{x-a}{1+(x-a)} .   (4.19)

Similarly,

    \int_{-\infty}^{a} f(x)\,dx = \int_{-1}^{0} f\!\left( a + \frac{t}{1+t} \right) \frac{1}{(1+t)^2}\,dt .   (4.20)

Alternatively,

    \int_{a}^{+\infty} f(x)\,dx = \int_{0}^{1} f\!\left( a + \frac{1-t}{t} \right) \frac{dt}{t^2} ,   (4.21)

    \int_{-\infty}^{b} f(x)\,dx = \int_{0}^{1} f\!\left( b - \frac{1-t}{t} \right) \frac{dt}{t^2} .   (4.22)
Chapter 5

Monte Carlo integration

Monte Carlo integration is a cubature where the points, at which the integrand
is evaluated, are chosen randomly. Typically no assumptions are made about
the smoothness of the integrand, not even that it is continuous.
Plain Monte Carlo algorithm distributes points uniformly throughout the
integration region using either uncorrelated pseudo-random or correlated quasi-
random sequences of points.
Adaptive algorithms, such as VEGAS and MISER, distribute points non-
uniformly in an attempt to reduce the integration error; they use, correspondingly,
importance sampling and stratified sampling.

5.1 Multi-dimensional integration


One of the problems in multi-dimensional integration is that the integration
region Ω is often quite complicated, with a boundary that is not easily described by
simple functions. However, it is usually much easier to find out whether a given
point lies within the integration region or not. Therefore a popular strategy
is to create an auxiliary rectangular volume V which contains the integration
volume Ω, and an auxiliary function F which coincides with the integrand inside
the volume Ω and equals zero outside. The integral of the auxiliary
function over the (simple rectangular) auxiliary volume is then equal to the original
integral.
Unfortunately, the auxiliary function is generally discontinuous at the
boundary; thus ordinary quadratures, which assume a continuous inte-
grand, will fail badly here, while Monte Carlo quadratures will do just as
well (or as badly) as with a continuous integrand.

5.2 Plain Monte Carlo sampling


Plain Monte Carlo is a quadrature with random abscissas and equal weights,

    \int_V f(x)\,dV \approx w \sum_{i=1}^{N} f(x_i) ,   (5.1)

where x is a point in the multi-dimensional integration space. The one free parameter,
w, allows one condition to be satisfied: the quadrature has to integrate exactly
a constant function. This gives w = V/N,

    \int_V f(x)\,dV \approx \frac{V}{N} \sum_{i=1}^{N} f(x_i) = V \langle f \rangle .   (5.2)

According to the central limit theorem the error estimate ε is close to

    \epsilon = V \frac{\sigma}{\sqrt{N}} ,   (5.3)

where σ² is the variance of the sample,

    \sigma^2 = \langle f^2 \rangle - \langle f \rangle^2 .   (5.4)

The 1/√N convergence of the error, typical for a random process, is quite
slow.

Table 5.1: Plain Monte Carlo integrator

function plainmc(fun,a,b,N){
  var randomx = function(a,b) // throws a random point inside the integration volume
    [a[i]+Math.random()*(b[i]-a[i]) for(i in a)];
  var V=1; for(var i in a) V*=b[i]-a[i]; // V = integration volume
  for(var sum=0,sum2=0,i=0;i<N;i++){     // main loop
    var f=fun(randomx(a,b));             // sampling the function
    sum+=f; sum2+=f*f}                   // accumulating statistics
  var average =sum/N;
  var variance=sum2/N-average*average;
  var integral=V*average;                // integral
  var error=V*Math.sqrt(variance/N);     // error
  return [integral,error];
} // end plainmc
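
A hypothetical usage example (with a made-up integrand): estimating the
integral of f(x, y) = x y over the unit square, whose exact value is 1/4,

var f=function(p){ return p[0]*p[1]; };     // integrand f(x,y)=x*y
var [Q,dQ]=plainmc(f,[0,0],[1,1],100000);   // estimate and its error estimate
// Q should be close to 0.25 within a few dQ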

5.3 Importance sampling


Suppose that the points are distributed not uniformly but with some density
ρ(x): the number of points Δn in the volume ΔV around the point x is given as

    \Delta n = \frac{N}{V} \rho \Delta V ,   (5.5)

where ρ is normalised such that \int_V \rho\,dV = V.
The estimate of the integral is then given as

    \int_V f(x)\,dV \approx \sum_{i=1}^{N} f(x_i) \Delta V_i = \sum_{i=1}^{N} f(x_i) \frac{V}{N \rho(x_i)} = V \left\langle \frac{f}{\rho} \right\rangle ,   (5.6)

where

    \Delta V_i = \frac{V}{N \rho(x_i)}   (5.7)

[Figure 5.1: Stratified sample of a discontinuous function,
f(x, y) = (x² + y² < 0.8²) ? 1 : 0.]

is the “volume per point” at the point x_i.
The corresponding variance is now given by

    \sigma^2 = \left\langle \left( \frac{f}{\rho} \right)^2 \right\rangle - \left\langle \frac{f}{\rho} \right\rangle^2 .   (5.8)

Apparently, if the ratio f/ρ is close to a constant, the variance is reduced.
It is tempting to take ρ = |f| and sample directly from the function to
be integrated. However, in practice it is typically expensive to evaluate the
integrand. Therefore a better strategy is to build an approximate density in
the product form, ρ(x, y, ..., z) = ρ_x(x) ρ_y(y) ... ρ_z(z), and then sample from
this approximate density. A popular routine of this sort is called VEGAS. The
sampling from a given function can be done using the Metropolis algorithm,
which we shall not discuss here.

5.4 Stratified sampling


Stratified sampling is a generalisation of the recursive adaptive integration al-
gorithm to random quadratures in multi-dimensional spaces.
The ordinary “dividing by two” strategy does not work in many dimensions
as the number of sub-volumes grows far too fast to keep track of. Instead one
estimates along which dimension a subdivision should bring the most dividends
and subdivides only along this dimension. Such a strategy is called recursive
stratified sampling. A simple variant of this algorithm is given in Table 5.2.
In a stratified sample the points are concentrated in the regions where the
variance of the function is largest, as illustrated in Fig. 5.1.

Table 5.2: Recursive stratified sampling

sample N random points with plain Monte Carlo;
estimate the average and the error;
if the error is acceptable:
    return the average and the error;
else:
    for each dimension:
        subdivide the volume in two along the dimension;
        estimate the sub-variances in the two sub-volumes;
    pick the dimension with the largest sub-variance;
    subdivide the volume in two along this dimension;
    dispatch two recursive calls to each of the sub-volumes;
    estimate the grand average and grand error;
    return the grand average and grand error;

[Figure 5.2: Typical distributions of pseudo-random (left) and quasi-random
(right) points in two dimensions.]

5.5 Quasi-random (low-discrepancy) sampling


Pseudo-random sampling has high discrepancy¹: it typically creates regions
with a high density of points and other regions with a low density of points, as
illustrated in Fig. 5.2. With pseudo-random sampling there is a finite probability
that all N points fall into one half of the region and none into the
other half.
Quasi-random sequences avoid this phenomenon by distributing points in
a highly correlated manner with a specific requirement of low discrepancy; see
Fig. 5.2 for an example. Quasi-random sampling is like a computation on a
grid where the grid constant need not be known in advance, as the grid is
gradually refined and the points are always distributed uniformly over the
region. The computation can be stopped at any time.
By placing points more evenly than at random, the quasi-random sequences
try to improve on the 1/√N convergence rate of pseudo-random sampling.
The central limit theorem does not apply in this case as the points are not
statistically independent. Therefore the variance can not be used as an estimate
of the error; the error estimation is actually not trivial. In practice one can
employ two different sequences and use their difference as the error estimate.
Quasi-random sequences can be roughly divided into lattice rules and digital
nets (see e.g. arXiv:1003.4785 [math.NA] and references therein).

1 Discrepancy is a measure of how unevenly the points are distributed over the region.

5.5.1 Lattice rules


In the simplest incarnation a lattice rule can be defined as follows.
Let α_i, i = 1, ..., d (where d is the dimension of the integration space) be
a set of cleverly chosen irrational numbers, like square roots of prime numbers.
Then the kth point (in the unit volume) of the sequence is given as

    x^{(k)} = \{ \mathrm{frac}(k\alpha_1), \ldots, \mathrm{frac}(k\alpha_d) \} ,   (5.9)

where frac(x) is the fractional part of x.
A problem with this method is that high accuracy arithmetic (e.g. long
double) might be needed in order to generate a reasonable amount of quasi-
random numbers.
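
A minimal sketch of a lattice-rule generator per (5.9), given as an illustration
and using square roots of the first primes as the irrational numbers α_i:

function lattice(d){ // generator of quasi-random points in the d-dimensional unit cube
  var primes=[2,3,5,7,11,13,17,19,23,29];      // enough for d <= 10 in this sketch
  var alpha=[]; for(var i=0;i<d;i++) alpha[i]=Math.sqrt(primes[i]);
  var k=0;
  return function(){                           // the k-th point of the sequence (5.9)
    k++;
    var x=[];
    for(var i=0;i<d;i++){ var v=k*alpha[i]; x[i]=v-Math.floor(v); } // frac(k*alpha_i)
    return x;
  };
}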
Chapter 6

Ordinary differential equations

6.1 Introduction
Many scientific problems can be formulated in terms of a system of ordinary
differential equations (ODE),

    y'(x) = f(x, y) ,   (6.1)

with an initial condition

    y(x_0) = y_0 ,   (6.2)

where y' ≡ dy/dx, and the boldface variables y and f(x, y) are generally un-
derstood as column-vectors.

6.2 Runge-Kutta methods


Runge-Kutta methods are one-step methods for the numerical integration of ODEs
(6.1). The solution y is advanced from the point x_0 to x_1 = x_0 + h using a
one-step formula,

    y_1 = y_0 + h k ,   (6.3)

where y_1 is the approximation to y(x_1), and k is a cleverly chosen (vector)
constant. The Runge-Kutta methods are distinguished by their order: a method
has order p if it can integrate exactly an ODE whose solution is a polynomial
of order p; in other words, if the error of the method is O(h^{p+1}) for small h.
The first order Runge-Kutta method is Euler's method,

    k = f(x_0, y_0) .   (6.4)

Second order Runge-Kutta methods advance the solution by an auxiliary
evaluation of the derivative, e.g. the mid-point method,

    k_0 = f(x_0, y_0) ,
    k_{1/2} = f(x_0 + \tfrac{1}{2}h, y_0 + \tfrac{1}{2}h k_0) ,
    k = k_{1/2} ,   (6.5)

or the two-point method,

    k_0 = f(x_0, y_0) ,
    k_1 = f(x_0 + h, y_0 + h k_0) ,
    k = \tfrac{1}{2}(k_0 + k_1) .   (6.6)

These two methods can be combined into a third order method,

    k = \tfrac{1}{6} k_0 + \tfrac{4}{6} k_{1/2} + \tfrac{1}{6} k_1 .   (6.7)

The most common is the fourth-order method, which is called RK4 or simply
the Runge-Kutta method,

    k_0 = f(x_0, y_0) ,
    k_1 = f(x_0 + \tfrac{1}{2}h, y_0 + \tfrac{1}{2}h k_0) ,
    k_2 = f(x_0 + \tfrac{1}{2}h, y_0 + \tfrac{1}{2}h k_1) ,
    k_3 = f(x_0 + h, y_0 + h k_2) ,
    k = \tfrac{1}{6}(k_0 + 2k_1 + 2k_2 + k_3) .   (6.8)

Higher order Runge-Kutta methods have been devised, with the most fa-
mous being the Runge-Kutta-Fehlberg fourth/fifth order method, RKF45, im-
plemented in the renowned rkf45.f Fortran routine.
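
A minimal sketch (not one of the book's listings) of a single RK4 step (6.8) for
a system of ODEs, with the solution stored as a plain array of components as
in the rkstep routine of Table 6.1:

function rk4step(f,x,y,h){ // one step of the fourth order Runge-Kutta method (6.8)
  var add=function(y,k,c){ var z=[]; for(var i=0;i<y.length;i++) z[i]=y[i]+c*k[i]; return z; };
  var k0=f(x,y);
  var k1=f(x+h/2, add(y,k0,h/2));
  var k2=f(x+h/2, add(y,k1,h/2));
  var k3=f(x+h,   add(y,k2,h));
  var y1=[];
  for(var i=0;i<y.length;i++) y1[i]=y[i]+h*(k0[i]+2*k1[i]+2*k2[i]+k3[i])/6;
  return y1; // the approximation to y(x+h)
}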

6.3 Multistep methods


Multistep methods try to use the information about the function gathered at
the previous steps. They are generally not self-starting as there are no previous
points at the start of the integration.

6.3.1 A two-step method


Given the previous point, (x−1 , y−1 ), in addition to the current point (x0 , y0 ),
the sought function y can be approximated in the vicinity of the point x0 as
    \bar y(x) = y_0 + y_0' \cdot (x - x_0) + c \cdot (x - x_0)^2 ,   (6.9)

where y_0' = f(x_0, y_0) and the coefficient c is found from the condition \bar y(x_{-1}) = y_{-1},

    c = \frac{ y_{-1} - y_0 + y_0' \cdot (x_0 - x_{-1}) }{ (x_0 - x_{-1})^2 } .   (6.10)

The value of the function at the next point, x_1, can now be estimated as \bar y(x_1)
from (6.9).

6.4 Predictor-corrector methods


Predictor-corrector methods use extra iterations to improve the solution. For
example, the two-point Runge-Kutta method (6.6) is actually a predictor-
corrector method, as it first calculates the prediction ỹ_1 for y(x_1),

    \tilde y_1 = y_0 + h f(x_0, y_0) ,   (6.11)

and then uses this prediction in a correction step,

    \tilde{\tilde y}_1 = y_0 + h \tfrac{1}{2} \left[ f(x_0, y_0) + f(x_1, \tilde y_1) \right] .   (6.12)

Similarly, one can use the two-step approximation (6.9) as a predictor, and
then improve it by one order with a correction step, namely

    \bar{\bar y}(x) = \bar y(x) + d \cdot (x - x_0)^2 (x - x_{-1}) .   (6.13)

The coefficient d can be found from the condition \bar{\bar y}'(x_1) = \bar f_1, where \bar f_1 =
f(x_1, \bar y(x_1)),

    d = \frac{ \bar f_1 - y_0' - 2c \cdot (x_1 - x_0) }{ 2(x_1 - x_0)(x_1 - x_{-1}) + (x_1 - x_0)^2 } .   (6.14)

Equation (6.13) gives a better estimate, y_1 = \bar{\bar y}(x_1), of the function at the point
x_1.
In this context the formula (6.9) is referred to as the predictor, and (6.13) as the
corrector. The difference between the two gives an estimate of the error.

6.5 Step size control


6.5.1 Error estimate
The error δy of the integration step for a given method can be estimated e.g. by
comparing the solutions for a full step and two half-steps (the Runge principle),

    \delta y \approx \frac{ y_{\text{two half steps}} - y_{\text{full step}} }{ 2^p - 1 } ,   (6.15)

where p is the order of the algorithm used. It is better to pick formulas where
the full-step and two half-step calculations share the evaluations of the function
f(x, y).
Another possibility is to make the same step with two methods of different
orders, the difference between the solutions providing an estimate of the error.
In a predictor-corrector method the correction itself can serve as the estimate
of the error.

Table 6.1: Runge-Kutta mid-point stepper with error estimate.

function rkstep(f,x,y,h){ // Runge-Kutta midpoint step with error estimate
  var k0 = f(x,y)                                // derivatives at x0
  var y12 = [y[i]+k0[i]*h/2 for(i in y)]         // half-step
  var k12 = f(x+h/2,y12)                         // derivatives at the half-step
  var y1 = [y[i]+k12[i]*h for(i in y)]           // full step
  var deltay = [(k12[i]-k0[i])*h/2 for(i in y)]  // error estimate
  return [y1,deltay]
}

6.5.2 Adaptive step size control


Let the tolerance τ be the maximal accepted error consistent with the required
absolute, δ, and relative, ε, accuracies to be achieved in the integration of an
ODE,

    \tau = \epsilon \|y\| + \delta ,   (6.16)

where ‖y‖ is the “norm” of the column-vector y.
Suppose the integration is done in n steps of size h_i such that \sum_{i=1}^{n} h_i =
b − a. Under the assumption that the errors at the integration steps are random
and independent, the step tolerance τ_i for the step i has to scale as the square
root of the step size,

    \tau_i = \tau \sqrt{\frac{h_i}{b-a}} .   (6.17)

Then, if the error e_i on the step i is less than the step tolerance, e_i ≤ τ_i, the
total error E will be consistent with the total tolerance τ,

    E \approx \sqrt{ \sum_{i=1}^{n} e_i^2 } \le \sqrt{ \sum_{i=1}^{n} \tau_i^2 } = \tau \sqrt{ \sum_{i=1}^{n} \frac{h_i}{b-a} } = \tau .   (6.18)

In practice one uses the current values of the function y in the estimate of
the tolerance,

    \tau_i = (\epsilon \|y_i\| + \delta) \sqrt{\frac{h_i}{b-a}} .   (6.19)

The step is accepted if the error is smaller than the tolerance. The next step-size
can be estimated according to the empirical prescription

    h_{\text{new}} = h_{\text{old}} \times \left( \frac{\tau}{e} \right)^{\text{Power}} \times \text{Safety} ,   (6.20)

where Power ≈ 0.25 and Safety ≈ 0.95. If the error e_i is larger than the tolerance τ_i,
the step is rejected and a new step with the new step size (6.20) is attempted.

Table 6.2: An ODE driver with adaptive step size control.

function rkdrive(f,a,b,y0,acc,eps,h){ // ODE driver:
// integrates y'=f(x,y) with absolute accuracy acc and relative accuracy eps
// from a to b with initial condition y0 and initial step h,
// storing the results in arrays xlist and ylist
  var norm=function(v) Math.sqrt(v.reduce(function(a,b) a+b*b,0));
  var x=a, y=y0, xlist=[a], ylist=[y0];
  while(x<b){
    if(x+h>b) h=b-x // the last step has to land on "b"
    var [y1,dy]=rkstep(f,x,y,h);
    var err=norm(dy), tol=(norm(y1)*eps+acc)*Math.sqrt(h/(b-a));
    if(err<tol){x+=h; y=y1; xlist.push(x); ylist.push(y);} // accept the step
    if(err>0) h*=Math.pow(tol/err,0.25)*0.95; else h*=2;   // new step size
  } // end while
  return [xlist,ylist];
} // end rkdrive
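
A hypothetical usage example: integrating the harmonic oscillator u'' = -u,
rewritten as the system y = {u, u'}, from 0 to π,

var f=function(x,y){ return [y[1],-y[0]]; };            // y0'=y1 , y1'=-y0
var [xs,ys]=rkdrive(f,0,Math.PI,[0,1],1e-6,1e-6,0.1);   // lists of x and y values
// the last element of ys should be close to [0,-1], i.e. [sin(pi),cos(pi)]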
Chapter 7

Nonlinear equations

7.1 Introduction
Non-linear equations or root-finding is a problem of finding a set of n variables
{x1 , . . . , xn } which satisfy n equations
fi (x1 , ..., xn ) = 0 , i = 1, . . . , n , (7.1)
where the functions fi are generally non-linear.

7.2 Newton’s method


Newton’s method (also referred to as the Newton-Raphson method, after Isaac New-
ton and Joseph Raphson) is a root-finding algorithm that uses the first term of
the Taylor series of the functions f_i to linearise the system (7.1) in the vicinity
of a suspected root. It is one of the oldest and best known methods and is the
basis of a number of more refined methods.
Suppose that the point x ≡ {x_1, ..., x_n} is close to the root. Newton’s
algorithm tries to find the step Δx which would move the point towards the
root, such that

    f_i(x + \Delta x) = 0 ,   i = 1, \ldots, n .   (7.2)

The first order Taylor expansion of (7.2) gives a system of linear equations,

    f_i(x) + \sum_{k=1}^{n} \frac{\partial f_i}{\partial x_k} \Delta x_k = 0 ,   i = 1, \ldots, n ,   (7.3)

or, in matrix form,

    J \Delta x = -f(x) ,   (7.4)

where f(x) ≡ {f_1(x), ..., f_n(x)} and J is the matrix of partial derivatives¹,

    J_{ik} \equiv \frac{\partial f_i}{\partial x_k} ,   (7.5)

called the Jacobian matrix.
The solution Δx of the linear system (7.4) gives the approximate direction
and the step-size towards the solution.

1 In practice, if the derivatives are not available analytically, one uses the finite differences
    \frac{\partial f_i}{\partial x_k} \approx \frac{ f_i(x_1, \ldots, x_k + \delta x, \ldots, x_n) - f_i(x_1, \ldots, x_k, \ldots, x_n) }{ \delta x } ,
with \delta x \ll s, where s is the typical scale of the problem at hand.
The Newton’s method converges quadratically if sufficiently close to the
solution. Otherwise the full Newton’s step ∆x might actually diverge from
the solution. Therefore in practice a more conservative step λ∆x with λ < 1 is
usually taken. The strategy of finding the optimal λ is referred to as line search.
It is typically not worth the effort to find the λ which minimizes ‖f(x + λΔx)‖
exactly, since Δx is only an approximate direction towards the root. Instead an
inexact but quick minimization strategy is usually used, like the backtracking
line search where one first attempts the full step, λ = 1, and then backtracks,
λ ← λ/2, until either the condition

    \| f(x + \lambda \Delta x) \| < \left( 1 - \frac{\lambda}{2} \right) \| f(x) \|   (7.6)

is satisfied, or λ becomes too small.

7.3 Broyden’s quasi-Newton method


The Newton’s method requires calculation of the Jacobian at every iteration.
This is generally an expensive operation. Quasi-Newton methods avoid calcu-
lation of the Jacobian matrix at the new point x + δx, instead trying to use
certain approximations, typically rank-1 updates.
Broyden’s algorithm estimates the Jacobian J + δJ at the point x + δx using
the finite-difference approximation,

(J + δJ)δx = δf , (7.7)

where δf ≡ f (x + δx) − f (x) and J is the Jacobian at the point x.


The matrix equation (7.7) is under-determined in more than one dimension, as it contains only n equations to determine the n² matrix elements of δJ. Broyden suggested choosing δJ as a rank-1 update, linear in δx,

δJ = c δxT , (7.8)

where the unknown vector c can be found by substituting (7.8) into (7.7), which
gives
δJ = ((δf − Jδx) / ‖δx‖²) δxT .    (7.9)
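The update (7.9) is cheap: it costs only O(n²) operations, compared with the n extra evaluations of the vector function needed to rebuild the Jacobian from finite differences. A minimal sketch of the update in plain JavaScript is shown below; the function name broydenUpdate and the row-wise storage convention J[i][k] = ∂fi/∂xk are illustrative assumptions rather than part of the library used elsewhere in this book.

function broydenUpdate(J,dx,df){ // rank-1 Broyden update J <- J + c dxT, eqs. (7.8)-(7.9)
   // J : current Jacobian approximation with J[i][k] = dfi/dxk (array of rows)
   // dx : the step just taken ; df : f(x+dx)-f(x)
   var n=dx.length, dx2=0;
   for(var k=0;k<n;k++) dx2+=dx[k]*dx[k];            // ||dx||^2
   for(var i=0;i<n;i++){
      var Jdx=0;                                     // (J dx)_i
      for(var k=0;k<n;k++) Jdx+=J[i][k]*dx[k];
      var ci=(df[i]-Jdx)/dx2;                        // i-th component of the vector c
      for(var k=0;k<n;k++) J[i][k]+=ci*dx[k];        // add c dxT to J
   }
   return J;
}

In a quasi-Newton driver one would apply such an update after every accepted step and fall back to a finite-difference Jacobian only when the updated matrix stops producing useful steps.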

7.4 Javascript implementation


load('../linear/qrdec.js'); load('../linear/qrback.js');

function newton(fs,x,acc,dx){ // Newton's root-finding method
   var norm=function(v) Math.sqrt(v.reduce(function(s,e) s+e*e,0));
   if(acc==undefined) acc=1e-6
   if(dx==undefined) dx=1e-3
   var J=[[0 for(i in x)] for(j in x)]
   var minusfx=[-fs[i](x) for(i in x)]
   do{
      for(i in x)for(k in x){ // calculate Jacobian
         x[k]+=dx
         J[k][i]=(fs[i](x)+minusfx[i])/dx
         x[k]-=dx }
      var [Q,R]=qrdec(J), Dx=qrback(Q,R,minusfx) // Newton's step
      var s=2
      do{ // simple backtracking line search
         s=s/2;
         var z=[x[i]+s*Dx[i] for(i in x)]
         var minusfz=[-fs[i](z) for(i in x)]
      }while(norm(minusfz)>(1-s/2)*norm(minusfx) && s>1./128)
      minusfx=minusfz; x=z; // step done
   }while(norm(minusfx)>acc)
   return x;
} // end newton
Chapter 8

Optimization

Optimization is a problem of finding the minimum (or the maximum) of a given real (non-linear) function F(p) of an n-dimensional argument p ≡ {x1, . . . , xn}.

8.1 Downhill simplex method


The downhill simplex method (also called Nelder-Mead method or amoeba method)
is a commonly used nonlinear optimization algorithm. The minimum of a func-
tion in an n-dimensional space is found by transforming a simplex (a polytope
of n+1 vertexes) according to the function values at the vertexes, moving it
downhill until it converges towards the minimum.
To introduce the algorithm we need the following definitions:
• Simplex: a figure (polytope) represented by n+1 points, called vertexes, {p1, . . . , pn+1} (where each point pk is an n-dimensional vector).
• Highest point: the vertex, phi, with the largest value of the function: f(phi) = max_k f(pk).
• Lowest point: the vertex, plo, with the smallest value of the function: f(plo) = min_k f(pk).
• Centroid: the center of gravity of all points except the highest: pce = (1/n) Σ_{k≠hi} pk.

The simplex is moved downhill by a combination of the following elementary


operations:
1. Reflection: the highest point is reflected against the centroid, phi → pre = pce + (pce − phi).
2. Expansion: the highest point reflects and then doubles its distance from the centroid, phi → pex = pce + 2(pce − phi).
3. Contraction: the highest point halves its distance from the centroid, phi → pco = pce + ½(phi − pce).
4. Reduction: all points except the lowest move towards the lowest point, halving their distance to it: pk≠lo → ½(pk + plo).


Finally, here is a possible algorithm for the downhill simplex method:


repeat :
   find highest, lowest, and centroid points
   try reflection
   if f(reflected) < f(highest) :
      accept reflection
      if f(reflected) < f(lowest) :
         try expansion
         if f(expanded) < f(reflected) :
            accept expansion
   else :
      try contraction
      if f(contracted) < f(highest) :
         accept contraction
      else :
         do reduction
until converged (e.g. size(simplex) < tolerance)

8.2 Javascript implementation


function amoeba(F,s,acc){ // s: initial simplex, F: function to minimize
   var sum =function(xs) xs.reduce(function(s,x) s+x,0)
   var norm=function(xs) Math.sqrt(xs.reduce(function(s,x) s+x*x,0))
   var dist=function(as,bs) norm([(as[k]-bs[k]) for(k in as)])
   var size=function(s) norm([dist(s[i],s[0]) for(i in s) if(i>0)])
   var p=s[0], n=p.length, fs=[F(s[i]) for(i in s)] // function values at the vertexes
   while(size(s)>acc){
      var h=0, l=0
      for(var i in fs){ // finding the high and low points
         if(fs[i]>fs[h]) h=i
         if(fs[i]<fs[l]) l=i }
      var pce=[sum([s[i][k] for(i in s) if(i!=h)])/n for(k in p)]   // p_centroid
      var pre=[pce[k]+(pce[k]-s[h][k]) for(k in p)], Fre=F(pre)     // p_reflected
      var pex=[pce[k]+2*(pce[k]-s[h][k]) for(k in p)]               // p_expanded
      if(Fre<fs[h]){ // accept reflection
         for(var k in p) s[h][k]=pre[k]; fs[h]=Fre
         if(Fre<fs[l]){
            var Fex=F(pex)
            if(Fex<Fre){ // expansion
               for(var k in p) s[h][k]=pex[k]; fs[h]=Fex }}}
      else{
         var pco=[pce[k]+.5*(pce[k]-s[h][k]) for(k in p)], Fco=F(pco)
         if(Fco<fs[h]){ // contraction
            for(var k in p) s[h][k]=pco[k]; fs[h]=Fco }
         else{ // reduction
            for(var i in s) if(i!=l){
               for(var k in p) s[i][k]=.5*(s[i][k]+s[l][k])
               fs[i]=F(s[i]) }}}
   } // end while
   return s[l]
} // end amoeba
Chapter 9

Eigenvalues and
eigenvectors

9.1 Introduction
A non-zero column-vector v is called an eigenvector of a matrix A with an
eigenvalue λ, if
Av = λv . (9.1)
If an n × n matrix A is real and symmetric, AT = A, then it has n real
eigenvalues λ1 , . . . , λn , and its (orthogonalized) eigenvectors V = {v1 , . . . , vn }
form a full basis,
V V T = V TV = 1 , (9.2)
in which the matrix is diagonal,

V T AV = diag(λ1, λ2, . . . , λn) .    (9.3)

Matrix diagonalization means finding all eigenvalues and (optionally) eigenvec-


tors of a matrix.
Eigenvalues and eigenvectors enjoy a multitude of applications in different
branches of science and technology.

9.2 Similarity transformations


Orthogonal transformations,

A → QT AQ , (9.4)

where QT Q = 1, and, generally, similarity transformations,

A → S −1 AS , (9.5)


preserve eigenvalues and eigenvectors. Therefore one of the strategies to diago-


nalize a matrix is to apply a sequence of similarity transformations (also called
rotations) which (iteratively) turn the matrix into diagonal form.

9.2.1 Jacobi eigenvalue algorithm


The Jacobi eigenvalue algorithm is an iterative method to calculate the eigenvalues and eigenvectors of a real symmetric matrix by a sequence of Jacobi rotations.
A Jacobi rotation is an orthogonal transformation which zeroes a pair of off-diagonal elements of a (real symmetric) matrix A,

A → A′ = J(p, q)T AJ(p, q) : A′pq = A′qp = 0 . (9.6)

The orthogonal matrix J(p, q) which eliminates the element Apq is called the
Jacobi rotation matrix. It is equal to the identity matrix except for the four elements
with indices pp, pq, qp, and qq,

           ⎛ 1                            ⎞
           ⎜    ⋱                    0    ⎟
           ⎜      cos φ  · · ·  sin φ     ⎟  ← row p
J(p, q) =  ⎜        ⋮      ⋱       ⋮      ⎟                 (9.7)
           ⎜     − sin φ  · · ·  cos φ    ⎟  ← row q
           ⎜    0                   ⋱     ⎟
           ⎝                           1  ⎠

Or explicitly,

J(p, q)ij = δij   ∀ ij ∉ {pq, qp, pp, qq} ;
J(p, q)pp = cos φ = J(p, q)qq ;
J(p, q)pq = sin φ = −J(p, q)qp .    (9.8)

After a Jacobi rotation, A → A′ = J T AJ, the matrix elements of A′ become

A′ij = Aij   ∀ i ≠ p, q ∧ j ≠ p, q ;
A′pi = A′ip = cApi − sAqi   ∀ i ≠ p, q ;
A′qi = A′iq = sApi + cAqi   ∀ i ≠ p, q ;
A′pp = c²App − 2scApq + s²Aqq ;
A′qq = s²App + 2scApq + c²Aqq ;
A′pq = A′qp = sc(App − Aqq) + (c² − s²)Apq ,    (9.9)

where c ≡ cos φ, s ≡ sin φ. The angle φ is chosen such that after rotation the
matrix element A′pq is zeroed,

cot(2φ) = (Aqq − App) / (2Apq)   ⇒   A′pq = 0 .    (9.10)

A side effect of zeroing a given off-diagonal element Apq by a Jacobi rotation is that other off-diagonal elements are changed as well, namely the elements of the rows and columns with indices p and q. However, after the Jacobi rotation the sum of squares of all off-diagonal elements is reduced. The algorithm repeatedly performs rotations until the off-diagonal elements become sufficiently small.
The convergence of the Jacobi method can be proved for two strategies for
choosing the order in which the elements are zeroed:

1. Classical method: with each rotation the largest of the remaining off-
diagonal elements is zeroed.

2. Cyclic method: the off-diagonal elements are zeroed in strict order, e.g.
row after row.

Although the classical method allows the least number of rotations, it is


typically slower than the cyclic method since searching for the largest element
is an O(n2 ) operation. The count can be reduced by keeping an additional array
with indexes of the largest elements in each row. Updating this array after each
rotation is only an O(n) operation.
A sweep is a sequence of Jacobi rotations applied to all non-diagonal ele-
ments. Typically the method converges after a small number of sweeps. The
operation count is O(n) for a Jacobi rotation and O(n3 ) for a sweep.
The typical convergence criterion is that the sum of absolute values of the off-diagonal elements is small, Σ_{i<j} |Aij| < ε, where ε is the required accuracy. Other criteria can also be used, for example that the largest off-diagonal element is small, max_{i<j} |Aij| < ε, or that the diagonal elements have not changed after a sweep.
The eigenvectors can be calculated as V = 1J1 J2 ..., where Ji are the suc-
cessive Jacobi matrices. At each stage the transformation is

Vij → Vij , j 6= p, q
Vip → cVip − sViq (9.11)
Viq → sVip + cViq

Alternatively, if only one (or a few) eigenvectors vk are needed, one can instead solve the (singular) system (A − λk)v = 0.

9.3 Power iteration methods


9.3.1 Power method
The power method is an iterative method to calculate an eigenvalue and the corresponding eigenvector using the iteration

xi+1 = Axi . (9.12)

The iteration converges to the eigenvector of the largest (in absolute value) eigenvalue. The eigenvalue can be estimated using the Rayleigh quotient

λ[xi] = (xiT A xi) / (xiT xi) = (xi+1T xi) / (xiT xi) .    (9.13)
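A minimal sketch of the power iteration in plain JavaScript is shown below. The function name powerMethod, the fixed iteration count, and the row-wise storage A[i][k] = Aik are illustrative assumptions (for the symmetric matrices of this chapter the storage convention does not matter); a practical routine would rather iterate until the Rayleigh quotient stops changing.

function powerMethod(A,iterations){ // largest (in magnitude) eigenvalue and its eigenvector
   var n=A.length, x=[], lambda=0;
   for(var i=0;i<n;i++) x.push(Math.random());      // random start vector
   for(var iter=0;iter<iterations;iter++){
      var y=new Array(n);                           // y = A x
      for(var i=0;i<n;i++){
         y[i]=0;
         for(var k=0;k<n;k++) y[i]+=A[i][k]*x[k]; }
      var xAx=0, xx=0;                              // Rayleigh quotient xT A x / xT x
      for(var i=0;i<n;i++){ xAx+=x[i]*y[i]; xx+=x[i]*x[i]; }
      lambda=xAx/xx;
      var ynorm=0;                                  // normalise y and use it as the next x
      for(var i=0;i<n;i++) ynorm+=y[i]*y[i];
      ynorm=Math.sqrt(ynorm);
      for(var i=0;i<n;i++) x[i]=y[i]/ynorm;
   }
   return [lambda,x];
}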

9.3.2 Inverse power method


The iteration with the inverse matrix

xi+1 = A−1 xi (9.14)

converges to the eigenvector of the smallest (in absolute value) eigenvalue of the matrix A. Alternatively, the iteration

xi+1 = (A − s)−1 xi (9.15)

converges to the eigenvector of the eigenvalue closest to the given number s.

9.3.3 Inverse iteration method


The inverse iteration method is a refinement of the inverse power method where the trick is not to invert the matrix in (9.15) but rather to solve the linear system

(A − λ)xi+1 = xi (9.16)

using e.g. QR decomposition.


One can update the estimate for the eigenvalue using the Rayleigh quotient
λ[xi ] after each iteration and get faster convergence for the price of O(n3 ) op-
erations per QR-decomposition; or one can instead make more iterations (with
O(n2 ) operations per iteration) using the same matrix (A − λ). The optimal
strategy is probably an update after several iterations.
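A minimal sketch of inverse iteration with a fixed shift s in plain JavaScript could look as follows. It assumes that the qrdec and qrback routines from the chapter on linear equations are loaded (as in the Newton's method implementation above) and that A is symmetric, so that the row/column storage convention does not matter; the function name inverseIteration is illustrative.

function inverseIteration(A,s,iterations){ // eigenvalue closest to the shift s
   var n=A.length, x=[];
   for(var i=0;i<n;i++) x.push(Math.random());      // random start vector
   var M=[];                                        // M = A - s*1
   for(var i=0;i<n;i++){ M.push(A[i].slice()); M[i][i]-=s; }
   var QR=qrdec(M), Q=QR[0], R=QR[1];               // factorise once, reuse every iteration
   for(var iter=0;iter<iterations;iter++){
      var y=qrback(Q,R,x);                          // solve (A - s) y = x
      var ynorm=0;
      for(var i=0;i<n;i++) ynorm+=y[i]*y[i];
      ynorm=Math.sqrt(ynorm);
      for(var i=0;i<n;i++) x[i]=y[i]/ynorm;         // normalised next vector
   }
   var xAx=0;                                       // Rayleigh quotient (x is normalised)
   for(var i=0;i<n;i++)
      for(var k=0;k<n;k++) xAx+=x[i]*A[i][k]*x[k];
   return [xAx,x];                                  // eigenvalue estimate and eigenvector
}

Updating the shift s with the current Rayleigh quotient would give faster convergence, at the price of a new QR-decomposition each time the shift changes.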

9.4 JavaScript implementation


function jacobi(M){ // Jacobi diagonalization
   // input: matrix M[][]; output: eigenvalues E[], eigenvectors V[][]
   var V=[[(i==j?1:0) for(i in M)] for(j in M)]
   var A=M // in-place diagonalization, the right triangle of M is destroyed
   var eps=1e-12, rotated, sweeps=0;
   do{ rotated=0;
      for(var r=0;r<M.length;r++)for(var c=r+1;c<M.length;c++){ // sweep
         if(Math.abs(A[c][r])>eps*(Math.abs(A[c][c])+Math.abs(A[r][r]))){
            rotated=1; rotate(r,c,A,V); }
      } sweeps++; // end sweep
   }while(rotated==1); // end do
   var E=[A[i][i] for(i in A)];
   return [E,V,sweeps];
} // end jacobi

function rotate(p,q,A,V){ // Jacobi rotation eliminating A_pq.
   // Only the upper triangle of A is updated.
   // The matrix of eigenvectors V is also updated.
   if(q<p) [p,q]=[q,p]
   var n=A.length, app=A[p][p], aqq=A[q][q], apq=A[q][p];
   var phi=0.5*Math.atan2(2*apq,aqq-app); // could be done better
   var c=Math.cos(phi), s=Math.sin(phi);
   A[p][p] = c*c*app + s*s*aqq - 2*s*c*apq;
   A[q][q] = s*s*app + c*c*aqq + 2*s*c*apq;
   A[q][p] = 0;
   for(var i=0;i<p;i++){
      var aip=A[p][i], aiq=A[q][i];
      A[p][i]=c*aip-s*aiq; A[q][i]=c*aiq+s*aip; }
   for(var i=p+1;i<q;i++){
      var api=A[i][p], aiq=A[q][i];
      A[i][p]=c*api-s*aiq; A[q][i]=c*aiq+s*api; }
   for(var i=q+1;i<n;i++){
      var api=A[i][p], aqi=A[i][q];
      A[i][p]=c*api-s*aqi; A[i][q]=c*aqi+s*api; }
   if(V!=undefined) // update the eigenvectors
      for(var i=0;i<n;i++){
         var vip=V[p][i], viq=V[q][i];
         V[p][i]=c*vip-s*viq; V[q][i]=c*viq+s*vip; }
} // end rotate
Chapter 10

Power method and Krylov


subspaces

10.1 Introduction
When calculating an eigenvalue of a matrix A using the power method, one starts
with an initial random vector b and then computes iteratively the sequence
Ab, A2 b, . . . , An−1 b normalising and storing the result in b on each iteration.
The sequence converges to the eigenvector of the largest eigenvalue of A.
The set of vectors

Kn = {b, Ab, A²b, . . . , Aⁿ⁻¹b} ,    (10.1)

where n < rank(A), is called the order-n Krylov matrix, and the subspace
spanned by these vectors is called the order-n Krylov subspace. The vectors are
not orthogonal but can be made so e.g. by Gram-Schmidt orthogonalisation.
For the same reason that An−1 b approximates the dominant eigenvector one
can expect that the other orthogonalised vectors approximate the eigenvectors
of the n largest eigenvalues.
Krylov subspaces are the basis of several successful iterative methods in
numerical linear algebra, in particular: Arnoldi and Lanczos methods for finding
one (or a few) eigenvalues of a matrix; and GMRES (Generalised Minimum
RESidual) method for solving systems of linear equations.
These methods are particularly suitable for large sparse matrices as they
avoid matrix-matrix operations but rather multiply vectors by matrices and
work with the resulting vectors and matrices in Krylov subspaces of modest
sizes.

10.2 Arnoldi iteration


Arnoldi iteration is an algorithm where the order-n orthogonalised Krylov matrix Qn of a matrix A is built using the stabilised Gram-Schmidt process:
• start with a set Q = {q1 } of one random normalised vector q1
• repeat for k = 2 to n :


– make a new vector qk = Aqk−1
– orthogonalise qk to all vectors qi ∈ Q, storing qi†qk → hi,k−1
– normalise qk, storing ‖qk‖ → hk,k−1
– add qk to the set Q
By construction the matrix Hn made of the elements hjk is an upper Hessenberg matrix,

      ⎛ h1,1  h1,2  h1,3  · · ·  h1,n ⎞
      ⎜ h2,1  h2,2  h2,3  · · ·  h2,n ⎟
Hn =  ⎜  0    h3,2  h3,3  · · ·  h3,n ⎟ ,    (10.2)
      ⎜  ⋮     ⋱     ⋱      ⋱     ⋮   ⎟
      ⎝  0    · · ·   0  hn,n−1  hn,n ⎠

which is a partial orthogonal reduction of A into Hessenberg form,

Hn = Qn† A Qn .    (10.3)
The matrix Hn can be viewed as a representation of A in the Krylov subspace
Kn . The eigenvalues and eigenvectors of the matrix Hn approximate the largest
eigenvalues of matrix A.
Since Hn is a Hessenberg matrix of modest size its eigenvalues can be rela-
tively easily computed with standard algorithms.
In practice if the size n of the Krylov subspace becomes too large the method
is restarted.
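A minimal sketch of the Arnoldi iteration for a real matrix in plain JavaScript is given below. The names arnoldi and Amul are illustrative: the matrix is passed as a function that multiplies a vector and returns a fresh array, which is the natural interface for large sparse matrices. The routine returns the orthonormal vectors q1, . . . , qn and the columns of the Hessenberg matrix; no handling of breakdown (a vanishing hk+1,k) is attempted.

function arnoldi(Amul,b,n){ // stabilised Gram-Schmidt build of the Krylov basis
   var N=b.length;
   var dot=function(u,v){ var s=0; for(var i=0;i<u.length;i++) s+=u[i]*v[i]; return s; };
   var bnorm=Math.sqrt(dot(b,b));
   var Q=[b.map(function(e){ return e/bnorm; })];   // q1 = b/|b|
   var H=[];                                        // H[k-1] = [h(1,k),...,h(k+1,k)]
   for(var k=1;k<n;k++){
      var q=Amul(Q[k-1]);                           // new vector A q_(k-1)
      var h=[];
      for(var i=0;i<k;i++){                         // orthogonalise against q1..qk
         h.push(dot(Q[i],q));
         for(var j=0;j<N;j++) q[j]-=h[i]*Q[i][j]; }
      var qnorm=Math.sqrt(dot(q,q));
      h.push(qnorm);                                // h(k+1,k)
      for(var j=0;j<N;j++) q[j]/=qnorm;             // normalise
      Q.push(q); H.push(h);                         // store q_(k+1) and the k-th column of H
   }
   return [Q,H];
}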

10.3 Lanczos iteration


Lanczos iteration is Arnoldi iteration for Hermitian matrices, in which case the
Hessenberg matrix Hn of Arnoldi method becomes a tridiagonal matrix Tn .
The Lanczos algorithm thus reduces the original Hermitian N × N matrix A into a smaller n × n tridiagonal matrix Tn by an orthogonal projection onto
the order-n Krylov subspace. The eigenvalues and eigenvectors of a tridiagonal
matrix of a modest size can be easily found by e.g. the QR-diagonalisation
method.
In practice the Lanczos method is not very stable due to round-off errors
leading to quick loss of orthogonality. The eigenvalues of the resulting tridi-
agonal matrix may then not be a good approximation to the original matrix.
Library implementations fight the stability issues by trying to prevent the loss
of orthogonality and/or to recover the orthogonality after the basis is generated.

10.4 Generalised minimum residual (GMRES)


GMRES is an iterative method for the numerical solution of a system of linear
equations,
Ax = b , (10.4)
where the exact solution x is approximated by the vector xn ∈ Kn that minimises the residual Axn − b in the Krylov subspace Kn of matrix A,

x ≈ xn ← min_{x∈Kn} ‖Ax − b‖ .    (10.5)
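As an illustration only, the sketch below minimises the residual over the Krylov subspace in the most naive way: it forms the normal equations on the unorthogonalised Krylov vectors and solves the resulting small n × n system with the qrdec/qrback routines from the chapter on linear equations (assumed to be loaded). The function name naiveGmres is illustrative; a real GMRES implementation works with the orthonormal Arnoldi basis and a least-squares solve, which is far better conditioned.

function naiveGmres(Amul,b,n){ // minimise |A x - b| over x in K_n = span{b, Ab, ..., A^(n-1) b}
   var N=b.length;
   var dot=function(u,v){ var s=0; for(var i=0;i<N;i++) s+=u[i]*v[i]; return s; };
   var K=[b.slice()];                               // the Krylov vectors K_1..K_n
   for(var k=1;k<n;k++) K.push(Amul(K[k-1]));
   var AK=K.map(function(v){ return Amul(v); });    // the vectors A K_k
   var G=[], rhs=[];                                // normal equations (AK)^T (AK) y = (AK)^T b
   for(var i=0;i<n;i++){
      var row=[];
      for(var j=0;j<n;j++) row.push(dot(AK[i],AK[j]));
      G.push(row); rhs.push(dot(AK[i],b)); }
   var QR=qrdec(G), y=qrback(QR[0],QR[1],rhs);      // coefficients of x in the Krylov basis
   var x=new Array(N);                              // assemble x = sum_k y_k K_k
   for(var i=0;i<N;i++){
      x[i]=0;
      for(var k=0;k<n;k++) x[i]+=y[k]*K[k][i]; }
   return x;
}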
Chapter 11

Fast Fourier transform

Fast Fourier transform (FFT) is an efficient algorithm to compute the discrete


Fourier transform (DFT).
Computing DFT of N points in the naive way, using the definition, takes
O(N 2 ) arithmetic operations, while an FFT can compute the same result in only
O(N log N ) operations. The difference in speed can be substantial, especially for
large data sets. This improvement made many DFT-based algorithms practical.
Since the inverse of a DFT is also a DFT, any FFT algorithm can be used for the inverse DFT as well.
The most well known FFT algorithms, like the Cooley-Tukey algorithm,
depend upon the factorization of N . However, there are FFTs with O(N log N )
complexity for all N , even for prime N .

11.1 Discrete Fourier Transform


For a set of complex numbers xn, n = 0, . . . , N − 1, the DFT is defined as a set of complex numbers ck,

ck = Σ_{n=0}^{N−1} xn e^{−2πi nk/N} ,    k = 0, . . . , N − 1 .    (11.1)

The inverse DFT is given by

xn = (1/N) Σ_{k=0}^{N−1} ck e^{+2πi nk/N} .    (11.2)

These transformations can be viewed as an expansion of the vector xn in terms of the orthogonal basis of vectors e^{2πi kn/N},

Σ_{n=0}^{N−1} e^{2πi kn/N} e^{−2πi k′n/N} = N δkk′ .    (11.3)

The DFT represents the amplitudes and phases of the different sinusoidal components in the input data xn.
The DFT is widely used in different fields, like spectral analysis, data com-
pression, solution of partial differential equations and others.


11.1.1 Applications
Data compression
Several lossy (that is, with certain loss of data) image and sound compression
methods employ DFT as an approximation for the Fourier series. The signal
is discretized and transformed, and then the Fourier coefficients of high/low
frequencies, which are assumed to be unnoticeable, are discarded. The decom-
pressor computes the inverse transform based on this reduced number of Fourier
coefficients.

Partial differential equations


Discrete Fourier transforms are often used to solve partial differential equations,
where the DFT is used as an approximation for the Fourier series (which is re-
covered in the limit of infinite N ). The advantage of this approach is that it
expands the signal in complex exponentials e^{inx}, which are eigenfunctions of differentiation: (d/dx) e^{inx} = in e^{inx}. Thus, in the Fourier representation, differentiation is simply multiplication by in.
A linear differential equation with constant coefficients is transformed into an
easily solvable algebraic equation. One then uses the inverse DFT to transform
the result back into the ordinary spatial representation. Such an approach is
called a spectral method.

Convolution and Deconvolution


The DFT can be used to efficiently compute convolutions of two sequences. A convolution is a sum of pairwise products of elements from two different sequences, such as arises in multiplying two polynomials or multiplying two long integers.
Another example comes from data acquisition processes where the detector
introduces certain (typically Gaussian) blurring to the sampled signal. A recon-
struction of the original signal can be obtained by deconvoluting the acquired
signal with the detector’s blurring function.

11.2 Cooley-Tukey algorithm


In its simplest incarnation this algorithm re-expresses the DFT of size N = 2M in terms of two DFTs of size M,

ck = Σ_{n=0}^{N−1} xn e^{−2πi nk/N}
   = Σ_{m=0}^{M−1} x2m e^{−2πi mk/M} + e^{−2πi k/N} Σ_{m=0}^{M−1} x2m+1 e^{−2πi mk/M}
   = ck(even) + e^{−2πi k/N} ck(odd)              for k < M ,
   = ck−M(even) − e^{−2πi (k−M)/N} ck−M(odd)      for k ≥ M ,     (11.4)

where c(even) and c(odd) are the DFTs of the even- and odd-numbered sub-sets
of x.

This re-expression of a size-N DFT as two size-N/2 DFTs is sometimes called the Danielson-Lanczos lemma. The exponents e^{−2πi k/N} are called twiddle factors. The operation count by application of the lemma is reduced from the original N² down to 2(N/2)² + N/2 = N²/2 + N/2 < N².
For N = 2^p the Danielson-Lanczos lemma can be applied recursively until the
data sets are reduced to one datum each. The number of operations is then
reduced to O(N ln N ) compared to the original O(N 2 ). The established library
FFT routines, like FFTW and GSL, further reduce the operation count (by a
constant factor) using advanced programming techniques like precomputing the
twiddle factors, effective memory management and others.

11.3 Multidimensional DFT


The DFT is readily generalised to more than one dimension. For example, a two-dimensional set of data xn1n2, n1 = 0, . . . , N1 − 1, n2 = 0, . . . , N2 − 1, has the discrete Fourier transform

ck1k2 = Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} xn1n2 e^{−2πi n1k1/N1} e^{−2πi n2k2/N2} .    (11.5)

11.4 C implementation
#include <complex.h>
#include <tgmath.h>
#define PI 3.14159265358979323846264338327950288
void dft(int N, complex* x, complex* c, int sign){
   complex w = cexp(sign*2*PI*I/N);
   for(int k=0;k<N;k++){
      complex sum=0; for(int n=0;n<N;n++) sum+=x[n]*cpow(w,n*k);
      c[k]=sum/sqrt(N); }
}
void fft(int N, complex* x, complex* c, int sign){
   if(N%2==0){
      complex w = exp(sign*2*PI*I/N);
      int M=N/2;
      complex xo[M],co[M],xe[M],ce[M]; // VLA: compile with -std=c99
      for(int m=0;m<M;m++){ xo[m]=x[2*m+1]; xe[m]=x[2*m]; }
      fft(M,xo,co,sign); fft(M,xe,ce,sign);
      for(int k=0; k<M; k++) c[k]=(ce[k]+cpow(w,k)*co[k])/sqrt(2);
      for(int k=M; k<N; k++) c[k]=(ce[k-M]+cpow(w,k)*co[k-M])/sqrt(2);
   }
   else dft(N,x,c,sign);
}
Index

Arnoldi iteration, 43

back substitution, 1
backtracking line search, 32

cubature, 15

Danielson-Lanczos lemma, 47

eigenvalue, 37
eigenvector, 37

forward substitution, 2

GMRES, 44

Jacobi rotation, 38
Jacobian matrix, 32

Krylov matrix, 43
Krylov subspace, 43

Lanczos iteration, 44
line search, 32
LU decomposition, 2

matrix diagonalization, 37

Newton-Cotes quadrature, 15

orthogonal transformation, 37

QR decomposition, 2
quadrature, 15

Rayleigh quotient, 40
root-finding, 31

similarity transformation, 37

triangular system, 1

