
Lecture notes: CE 33500, Computational Methods in Civil Engineering

Nir Krakauer
September 26, 2018

This work is licensed under a Creative Commons "Attribution-ShareAlike 4.0 International" license.

1 Introduction: Mathematical models, numerical methods, and Taylor series
• Engineering problem → Mathematical model → Numerical method
There are approximations/uncertainties in each step (build in factors
of safety and conservative assumptions to make design robust to ‘small’
errors)
Example: Beam bending under loading
Example: Contraction of steel by cooling (Kaw 01.01)
• “Numerical methods” solve math problems, often approximately, using
arithmetic operations (add, subtract, multiply, divide) and logic opera-
tions (such as checking whether one quantity x is greater than another
quantity y – returns true/false). They therefore permit math problems
relevant to engineering to be easily handled by computers, which can carry
out arithmetic and logic operations very fast.
On the other hand, “analytical methods”, like those learned in calcu-
lus, give exact solutions, but are not available for many math problems of
engineering interest (such as complicated integrals and differential equa-
tions that don’t have integration formulas). Computers can help with
using such methods also.
• The term “computational methods” may include numerical methods plus
setting up the mathematical model as well as representing the model and
any needed numerical methods for solving it as a computer program

• Many numerical methods are based on approximating arbitrary functions
with polynomials. A common method for doing this is based on Taylor's
theorem: If f is a function that has k + 1 continuous derivatives over the
interval between a and x, then

  f(x) = f(a) + (x − a) f'(a) + (x − a)^2/2 f''(a) + . . . + (x − a)^k/k! f^(k)(a) + R_{k+1},

where the "remainder term" R_{k+1} has the form

  R_{k+1} = (x − a)^(k+1)/(k + 1)! f^(k+1)(ζ)

for some ζ between a and x.
We typically don't know the value of R_{k+1} exactly, since the exact
value of ζ isn't easily determined. R_{k+1} represents the error in approximating
f(x) by the first k + 1 terms on the right-hand side, which are
known. But if the difference x − a is small enough, its (k + 1)th power will
also be small, and we might therefore find that the remainder term is small
enough to be negligible.
Taylor's theorem can also be written in a series form:

  f(x) = Σ_{i=0}^{k} (x − a)^i/i! f^(i)(a) + R_{k+1}.

If f is infinitely differentiable between a and x, we can write

  f(x) = Σ_{i=0}^{∞} (x − a)^i/i! f^(i)(a)

with an infinite series and no remainder term.


The Taylor series when a = 0 is also known as the Maclaurin series.
If we define h ≡ x − a, we can also write

  f(a + h) = f(a) + h f'(a) + h^2/2 f''(a) + . . . + h^k/k! f^(k)(a) + R_{k+1}

with

  R_{k+1} = h^(k+1)/(k + 1)! f^(k+1)(ζ)

for some ζ between a and a + h, or more compactly

  f(a + h) = Σ_{i=0}^{k} h^i/i! f^(i)(a) + R_{k+1}

or as an infinite series

  f(a + h) = Σ_{i=0}^{∞} h^i/i! f^(i)(a).
• Taylor’s theorem can be applied to find series expansions for functions in
terms of polynomials:

  e^x = 1 + x + x^2/2 + x^3/6 + x^4/24 + . . .   (converges for all x)

  sin(x) = x − x^3/6 + x^5/120 − . . .

  cos(x) = 1 − x^2/2 + x^4/24 − . . .

  1/(1 − x) = 1 + x + x^2 + x^3 + . . .   (converges for |x| < 1)

  log(x + 1) = x − x^2/2 + x^3/3 − x^4/4 + . . .   (converges for |x| < 1)
(In this class, trigonometric functions are always for the argument x
in radians, and log means the natural logarithm [base e].)
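For instance, a minimal Octave/Matlab sketch (the value x = 1.5 and the truncation indices are arbitrary choices, not from the notes) comparing truncated Maclaurin series for e^x with the built-in exp:

  % Truncated Maclaurin series for e^x, compared with exp(x)
  x = 1.5;
  for k = [2 4 8 16]
      approx = sum(x .^ (0:k) ./ factorial(0:k));  % first k+1 terms
      fprintf('k = %2d: series = %.12f, error = %.2e\n', ...
              k, approx, abs(approx - exp(x)))
  end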
• Several important numerical methods can be directly derived from Tay-
lor’s theorem, including Newton’s method, finite differences, and Euler’s
method:
• Newton’s method for estimating x where some function f is equal to 0
can be derived from Taylor’s theorem:

  f(x) = f(a) + (x − a) f'(a) + R_2 →

  0 = f(a) + (x − a) f'(a) + R_2 → x = a − f(a)/f'(a) − R_2/f'(a) ≈ a − f(a)/f'(a),

with the approximation good if R_2 is small (small h or small second derivative f'')
Example: Apply Newton's method to estimate the square root of
26 iteratively. Set f(x) = x^2 − 26 = 0, start with a_0 = 5, get
a_1 = a_0 − f(a_0)/f'(a_0) = 5.1, a_2 = a_1 − f(a_1)/f'(a_1) = 5.0990196078 . . . ,
a_3 = a_2 − f(a_2)/f'(a_2) = 5.09901951359 . . . , giving a series of increasingly
accurate numerical estimates of √26.
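A minimal Octave/Matlab sketch of this iteration for the same example (the number of iterations shown is an arbitrary choice):

  % Newton's method for f(x) = x^2 - 26 = 0
  f  = @(x) x.^2 - 26;
  fp = @(x) 2*x;          % derivative f'(x)
  a = 5;                  % starting guess a_0
  for i = 1:4
      a = a - f(a)/fp(a); % Newton update
      fprintf('a%d = %.10f\n', i, a)
  end
  % converges toward sqrt(26) = 5.0990195135...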
• Centered finite difference for estimating the derivative of some function f
at x :
  f(x + h) = f(x) + h f'(x) + h^2/2 f''(x) + R_{3,+}

  f(x − h) = f(x) − h f'(x) + h^2/2 f''(x) − R_{3,−}

  f(x + h) − f(x − h) = 2h f'(x) + R_{3,+} + R_{3,−}

  f'(x) = [f(x + h) − f(x − h)]/(2h) − (R_{3,+} + R_{3,−})/(2h) ≈ [f(x + h) − f(x − h)]/(2h)

where

  R_{3,+} = h^3/6 f'''(ζ_+),   R_{3,−} = h^3/6 f'''(ζ_−)

for some ζ_+ between x and x + h and ζ_− between x − h and x.
Example: To estimate the derivative of f(x) = e^x at x = 1, we can
use the approximation derived here with h = 0.1: f'(1) ≈ [f(1.1) − f(0.9)]/(2·0.1) =
2.72281456 . . .
The finite-difference approximation generally becomes more accurate
as we make h closer to zero, because the remainder terms get smaller
• Euler's method for numerically approximating the value of y(x) given the
differential equation y'(x) = g(x, y(x)) and the initial value y(a) = y_0:

  y(x) = y(a) + (x − a) y'(a) + R_2 = y(a) + (x − a) g(a, y_0) + R_2 ≈ y(a) + (x − a) g(a, y_0)

Example: If we know that y(0) = 1, y'(x) = y, we can estimate y(0.4)
as y(0.4) ≈ 1 + 0.4·1 = 1.4. To get a more accurate estimate, we can reduce
the size of the left-out remainder terms by solving the problem in two steps:
y(0.2) ≈ y_1 = 1 + 0.2·1 = 1.2 and y(0.4) ≈ y_2 = 1.2 + 0.2·1.2 = 1.44,
or even in four steps: y(0.1) ≈ y_1 = 1 + 0.1·1 = 1.1, y(0.2) ≈ y_2 =
1.1 + 0.1·1.1 = 1.21, y(0.3) ≈ y_3 = 1.21 + 0.1·1.21 = 1.331, y(0.4) ≈ y_4 =
1.331 + 0.1·1.331 = 1.4641. In this case, the true answer can be found
analytically to be e^0.4 ≈ 1.4918.
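A short Octave/Matlab sketch of these calculations (the step counts match the example above; the function-handle name g is illustrative):

  % Explicit Euler for y' = y, y(0) = 1, estimating y(0.4)
  g = @(t, y) y;            % right-hand side g(t, y)
  for N = [1 2 4 8]         % number of steps
      h = 0.4 / N;
      t = 0; y = 1;
      for i = 1:N
          y = y + h * g(t, y);
          t = t + h;
      end
      fprintf('N = %d: y(0.4) ~ %.4f\n', N, y)
  end
  % exact answer: exp(0.4) = 1.4918...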

2 Error sources and control


• To solve important problems reliably, need to be able to identify, control,
and estimate sources of error
• Size of error: If the true answer is x∗ and the numerical answer is x,
Absolute error: |x − x∗|
Fractional error: |x − x∗| / |x∗|
Usually for an engineering calculation we want the fractional error
introduced during the solution step to be small, say < 10−6 ; sometimes,
we also want the absolute error to be under some threshold.
• Approximate (estimated) error: needed in practice, because we generally
don’t know the exact answer x∗
There are a few ways to estimate approximately how big the error
might be. For example, if we have two different numerical approximations
x1 , x2 of x∗ , we can write
Approximate absolute error: |x_1 − x_2|
Approximate fractional error: |x_1 − x_2| / |x_+|
where x_+ is the best available approximation (whichever of x_1 or x_2
is believed to be more accurate than the other, or their average if both
are believed to be equally accurate)
• Error types in solving math problems: gross error, roundoff error, trunca-
tion error
• Gross error : Entering the wrong number, using the wrong commands or
syntax, incorrect unit conversion . . .
This does happen in engineering practice (but isn’t looked on kindly).
Example: Mars Climate Orbiter spacecraft lost in space in 1999 be-
cause Lockheed Martin calculated rocket thrust in lb while NASA thought
the results were in N
Detection methods: Know what answer to expect and check if what
you got makes sense; try to find the answer for a few simple cases where you
know what it should be (programs to be used to solve important problems
should have a test suite to do this); compare answers with others who
worked independently
• Roundoff error in floating-point computation: results from the fact that
computers only carry a finite number of decimal places – about 16 for
Matlab's default IEEE double-precision format (machine epsilon [eps] or
unit roundoff ε ≈ 10^−16)
Example: 0.3/0.1 - 3 is nonzero in Matlab, but is of order ε
The double-precision format uses 64 bits (binary digits) to represent
each number – 1 bit for the sign, 11 bits for the base-2 exponent (which
can range between −1022 and 1023), and 52 bits for the significand or
mantissa, which is interpreted as binary 1.bbbb. . . . It can represent num-
ber magnitudes between about 10^−308 and 10^308 (beyond that, numbers
overflow to infinity or underflow to zero)

Multiplication and division under roundoff are subject to maximum
fractional error of ε
Addition of two similar numbers also has maximum roundoff error
similar to ε, but subtraction of two numbers that are almost the same can
have roundoff error much bigger than ε (cancellation of significant digits)
A non-obvious example of subtractive cancellation: computing e^x
using the first several terms of the Taylor series expansion e^x = Σ_{i=0}^{∞} x^i/i!
(in Matlab, sum(x .^ (0:imax) ./ factorial(0:imax))) when x is a
large negative number, say −20 (a short sketch of this appears at the end of
this bullet)
Another example: Approximating derivatives with finite difference
formulas with small increments h
In general, math operations which incur large roundoff error (like
subtracting similar numbers) tend to be those that are ill-conditioned,
meaning that small fractional changes in the numerical values used can
change the output by a large fractional amount
Examples of ill-conditioned problems: solving linear systems when
the coefficient matrix has a large condition number; trigonometric opera-
tions with a large argument (say, sin(10^100)); the quadratic formula when
b is much larger than a and c; finding a polynomial that interpolates a
large number of given points
Mitigation: Use extended precision; reformulate problems to avoid
subtracting numbers that are very close together
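A minimal Octave/Matlab sketch of the e^x cancellation example mentioned above (the truncation index imax = 100 is an arbitrary choice):

  % Summing the Taylor series of e^x term by term for x = -20:
  % the huge alternating terms cancel, leaving mostly roundoff error
  x = -20; imax = 100;
  series = sum(x .^ (0:imax) ./ factorial(0:imax));
  fprintf('series: %.6e   exp(x): %.6e\n', series, exp(x))
  % exp(-20) is about 2.06e-9; the series result is dominated by roundoff
  % and is typically wrong in most or all of its digits.
  % Computing exp(20) and taking the reciprocal avoids the cancellation.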
• Truncation error : Running an iterative numerical algorithm for only a
few steps, whereas convergence to the exact answer requires theoretically
an infinite number of steps
Often can be thought of as only considering the first few terms in the
Taylor series
“Steps” can be terms in the Taylor series, iterations of Newton’s
method or bisection for root finding, number of subdivisions in the com-
posite trapezoid rule or Simpson rule for numerical integration, etc.
Detection: Estimate truncation error (and roundoff error) by com-
paring the results of different numerical methods, or the same method run
for different numbers of steps on the same problem
Mitigation: Run for more steps (at the cost of more computations);
use a more accurate numerical method, if available
• Non-mathematical error sources should also be considered in formulating
and interpreting a problem, including measurement errors, variations in
material properties, uncertainties as to the loadings that will be faced,
simplifications in constructing math models . . . . They may have large
effects, but will be discussed more in other courses
Models are always only approximations (like maps) that hopefully
represent the main aspects of interest in an engineering problem

3 Linear systems
3.1 Properties
• Linear systems arise directly from engineering problems (stresses, circuits,
pipes, traffic networks . . . ) as well as indirectly via numerical methods, for
example finite difference and finite element methods for solving differential
equations
• Any system of m linear equations in n unknowns x1 , x2 , x3 , . . . xn (where
each equation looks like a1 x1 + a2 x2 + a3 x3 + . . . + an xn = b, with different
a and b coefficients) can be written in a standard matrix form Ax = b,
where
A is the m × n matrix with each row containing the (known) coeffi-
cients of the unknowns in one linear equation,
x is the n × 1 vector of unknowns,
b is the m × 1 vector of (known) constant terms in the equations.
Can also write the system as an m × (n + 1) “augmented matrix” A|b
(with x implied)
• Does a solution exist? For square systems (m = n), there is a unique
solution equal to A−1 b if A has an inverse.
If A has no inverse (is singular ), then there will be either no solution
or infinitely many solutions.
A has no inverse when the equations of a linear system with A as the
coefficient matrix are not linearly independent of each other. E.g.: the 2
given equations in 2 unknowns are x_1 + 2x_2 = 3, 2x_1 + 4x_2 = 6, so that the
coefficient matrix is

  [ 1  2
    2  4 ]

• Solution accuracy measures
Error size: ||x − x∗||, where x∗ is the true solution and x is computed;
Residual size: ||Ax − b|| (usually easier to calculate than the error)
• Note: norms, denoted by ||x||, measure the size of a vector or matrix
(analogous to absolute value, |x|, for a scalar)
2-norm of a vector v:

  ||v||_2 = sqrt( Σ_i v_i^2 )

2-norm of a matrix A: ||A||_2 is defined as the maximum ||Av||_2 over
all vectors v such that ||v||_2 = 1.
Similar in magnitude to the largest element of the matrix

• If a matrix A has an inverse but is very close to a non-invertible matrix,
then
In exact math, any linear system Ax = b has a unique solution
However, a small change in A or b can change the solution x by a lot
(ill-conditioning)
The equations can become linearly dependent if we change the coef-
ficients slightly
Another way of putting this is that there are vectors x far from the
true solution x∗ (large error compared to x∗ ) where Ax is nevertheless
close to b (small residual).
Roundoff errors in the computation have an effect similar to perturb-
ing the coefficient matrix A or b slightly, potentially giving a very wrong
solution (large error), although it will generally have a small residual.
We measure how close a matrix is to a singular matrix by the condi-
tion number, defined as cond(A) = ||A|| × ||A−1 ||, which ranges from 1
to infinity. A matrix with no inverse has infinite condition number. The
condition number gives the factor by which small changes to the coeffi-
cients, due to measurement or roundoff error, can multiply to give a larger
error in x. A large condition number means that linear systems with this
coefficient matrix will be ill-conditioned – changing the numerical values
by a small fraction could change the solution by a large percentage.

3.2 Solution methods


• An upper triangular matrix has only zero entries for columns less than the
row number, i.e. Ai,j = 0 whenever j < i. (Ai,j [or Aij ] here refers to the
element in row i and column j of the matrix A.)
• If the coefficient matrix of a square linear system A is upper triangular,
then it can generally be solved for the unknown x_i by back substitution:
x_n = b_n / A_{n,n}
for i = n − 1, n − 2, . . . to 1
    x_i = ( b_i − Σ_{j=i+1}^{n} A_{i,j} x_j ) / A_{i,i}

• Similarly, a lower triangular matrix has only zero entries for columns more
than the row number, i.e. Ai,j = 0 whenever j > i.
• If the coefficient matrix of a square linear system A is lower triangular,
then it can generally be solved for the unknown x_i by forward substitution:
x_1 = b_1 / A_{1,1}
for i = 2, 3, . . . to n
    x_i = ( b_i − Σ_{j=1}^{i−1} A_{i,j} x_j ) / A_{i,i}

• Any square linear system can generally be transformed into an upper
triangular one with the same solution through Gauss elimination:
for j = 1, 2, . . . to n
p = Aj,j (pivot element; j is the pivot row or pivot equation)
for i = j + 1, j + 2, . . . to n
M = Ai,j /p (multiplier)
Ai,: = Ai,: − M × Aj,:
bi = bi − M × bj
• Example: to solve

  [ −1  2  2 ]       [ 8 ]
  [  1  1  1 ] x  =  [ 1 ]
  [  1  3  2 ]       [ 4 ]

for x:

  [ −1  2  2 |  8 ]                         [ −1  2  2 |  8 ]      [ −1  2  2 |  8 ]
  [  1  1  1 |  1 ]  (augmented matrix)  →  [  0  3  3 |  9 ]  →   [  0  3  3 |  9 ]
  [  1  3  2 |  4 ]                         [  0  5  4 | 12 ]      [  0  0 −1 | −3 ]

and employ back substitution to get

  x = [ −2 ]
      [  0 ]
      [  3 ]
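A compact Octave/Matlab sketch of Gauss elimination followed by back substitution, without row pivoting, applied to this example (illustrative only, not robust code):

  % Gauss elimination (no pivoting) and back substitution
  A = [-1 2 2; 1 1 1; 1 3 2];  b = [8; 1; 4];
  n = length(b);
  for j = 1:n-1                      % loop over pivot rows
      for i = j+1:n
          M = A(i,j) / A(j,j);       % multiplier
          A(i,:) = A(i,:) - M*A(j,:);
          b(i)   = b(i)   - M*b(j);
      end
  end
  x = zeros(n,1);                    % back substitution
  x(n) = b(n) / A(n,n);
  for i = n-1:-1:1
      x(i) = (b(i) - A(i,i+1:n)*x(i+1:n)) / A(i,i);
  end
  disp(x')   % approximately [-2 0 3]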

• Gauss elimination requires a total of about (1/3)n^3 each of subtraction and multiplication
operations; forward and back substitution require only some n^2.
• Gauss elimination would fail if any of the pivots p is zero, even when the
linear system actually has a unique solution. If a pivot is not zero but
very small, the Gauss elimination wouldn’t fail in exact math, but any
roundoff error from previous steps could become large when we divide by
the very small p.
• Solution: row pivoting – rearrange the rows (equations) so that the pivot
is as large as possible (in absolute value). This avoids dividing by a pivot
element that is zero or small in absolute value.
• Algorithm for Gauss elimination with row pivoting:
for j = 1, 2, . . . to n
Row pivoting:

Find row l such that |Al,j | is the maximum of all the elements in
column j that are at or below row j.
Interchange rows l and j in the matrix A. (This is the same as
multiplying A (and then b) by a permutation matrix Q, which is an
identity matrix with rows l and j interchanged.)
Interchange elements l and j in the vector b.

p = Aj,j (pivot element; j is the pivot row)


for i = j + 1, j + 2, . . . to n
M = Ai,j /p (multiplier)
Ai,: = Ai,: − M × Aj,:
bi = bi − M × bj
• Example:

  [ 0  2  2 | −2 ]      [ 4 −2  4 |  4 ]      [ 4   −2    4 |  4 ]      [ 4 −2    4 |    4 ]
  [ 1 −2 −1 |  2 ]  →   [ 1 −2 −1 |  2 ]  →   [ 0    2    2 | −2 ]  →   [ 0  2    2 |   −2 ]
  [ 4 −2  4 |  4 ]      [ 0  2  2 | −2 ]      [ 0 −3/2   −2 |  1 ]      [ 0  0 −1/2 | −1/2 ]

Giving

  x = [ −1 ]
      [ −2 ]
      [  1 ]

• LU (=lower/upper triangular) decomposition (or factorization): A matrix


A can be written as the matrix product LU, where L is unit lower trian-
gular (unit meaning that all elements on the main diagonal – where the
row and column numbers are the same – are equal to 1) and U is upper
triangular.
• The LU decomposition can be obtained based on Gauss elimination, as
follows:
for j = 1, 2, . . . to n
p = Aj,j (pivot element; j is the pivot row)
for i = j + 1, j + 2, . . . to n
M = Ai,j /p (multiplier)
Save M as Li,j
Ai,: = Ai,: − M × Aj,:

Then U is the transformed A (which is upper triangular) and L is
the matrix built up from the multipliers M , with ones added along the
main diagonal.
Example:

  [ −1  2  2 ]      [   −1    2   2 ]      [   −1         2    2 ]
  [  1  1  1 ]  →   [ (−1)|   3   3 ]  →   [ (−1)|        3    3 ]
  [  1  3  2 ]      [ (−1)|   5   4 ]      [ (−1) (5/3)|      −1 ]

This gives us the factors in the form (L\U), which can be expanded to

  L = [  1    0   0 ]        U = [ −1  2   2 ]
      [ −1    1   0 ]  ,         [  0  3   3 ]
      [ −1  5/3   1 ]            [  0  0  −1 ]

You can then check that LU is in fact equal to the original matrix.
• LU decomposition is useful in solving linear systems because once we have
the decomposition of a matrix A, we can solve any system with the coef-
ficient matrix A using forward and back substitution and only around n2
operations instead of n3 :
Given Ax = b with unknown x and the LU decomposition A = LU,
Solve the lower triangular system Ly = b for y.
Now that y is known, solve the upper triangular system Ux = y for
the unknown x.
• Thus, if we need to solve several problems with the same coefficient matrix
A and different vectors b, it’s more efficient to find the LU decomposition
of A and use that to solve for the different unknown vectors, instead of
repeating the Gauss elimination for each problem.
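For instance, in Octave/Matlab the built-in lu function returns such factors (with row pivoting, discussed next), and they can then be reused for several right-hand sides; a sketch using the matrix from the earlier example:

  % Factor once, then solve for several right-hand sides
  A = [-1 2 2; 1 1 1; 1 3 2];
  [L, U, P] = lu(A);          % P*A = L*U
  b1 = [8; 1; 4];  b2 = [1; 0; 0];
  x1 = U \ (L \ (P*b1));      % forward then back substitution
  x2 = U \ (L \ (P*b2));
  disp(x1')                   % approximately [-2 0 3]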
• What about row pivoting?
If we include row pivoting in the LU decomposition above, what we
will get is a permuted LU decomposition with factors L and U such that
PA = LU, where P is a permutation matrix that represents the row
interchanges:
Start with P = I, an n × n identity matrix.
for j = 1, 2, . . . to n
Row pivoting:
Find row l such that |Al,j | is the maximum of all the ele-
ments in column j that are at or below row j.
Interchange rows l and j in the matrix A. (This is the same
as multiplying A by a permutation matrix Q, which is an identity matrix
with rows l and j interchanged.)

Update P ← QP. (Interchange rows l and j of P)
p = Aj,j (pivot element; j is the pivot row)
for i = j + 1, j + 2, . . . to n
M = Ai,j /p (multiplier)
Save M as Li,j
Ai,: = Ai,: − M × Aj,:
Then U is the transformed A (which is upper triangular) and L is
the matrix built up from the multipliers M , with ones added along the
main diagonal.

• Example:

  [ 0  2  2 ] (1)      [ 4 −2  4 ] (3)      [     4      −2    4 ] (3)
  [ 1 −2 −1 ] (2)  →   [ 1 −2 −1 ] (2)  →   [ (1/4)|   −3/2   −2 ] (2)  →
  [ 4 −2  4 ] (3)      [ 0  2  2 ] (1)      [     0|      2    2 ] (1)

  [     4      −2    4 ] (3)      [     4              −2      4 ] (3)
  [     0|      2    2 ] (1)  →   [     0|              2      2 ] (1)
  [ (1/4)|   −3/2   −2 ] (2)      [ (1/4) (−3/4)|           −1/2 ] (2)

Giving the factors

  L = [   1     0   0 ]        U = [ 4  −2     4 ]
      [   0     1   0 ]  ,         [ 0   2     2 ]
      [ 1/4  −3/4   1 ]            [ 0   0  −1/2 ]

The product LU is equal to the original matrix with the rows permuted
(switched). In this case the permutation matrix P is

  [ 0  0  1 ]
  [ 1  0  0 ]
  [ 0  1  0 ]

• To solve a linear system given a permuted LU decomposition PA = LU:
Given Ax = b with unknown x and the LU decomposition PA = LU
(note that PAx = Pb, so LUx = Pb):
Define y ≡ Ux.
Solve the lower triangular system Ly = Pb for the unknown y.
Now that y is known, solve the upper triangular system Ux = y for
the unknown x.

• In addition to the LU decomposition, a symmetric matrix (one that's the
same as its transpose) may also have a Cholesky decomposition, for which
the upper triangular factor is the transpose of the lower triangular factor
(specifically, the Cholesky decomposition of a symmetric matrix exists if
the matrix is positive definite). This decomposition can be computed with
about half as many arithmetic operations as LU decomposition, and can
be used to help solve linear systems with this coefficient matrix just like the
LU decomposition can.
Given a symmetric positive definite matrix A, its lower triangular
Cholesky factor L (A = LL^T) can be computed as:
for j = 1, 2, . . . to n

    L_{j,j} = sqrt( A_{j,j} − Σ_{k=1}^{j−1} L_{j,k}^2 )

    for i = j + 1, j + 2, . . . to n

        L_{i,j} = ( A_{i,j} − Σ_{k=1}^{j−1} L_{i,k} L_{j,k} ) / L_{j,j}

Example:

  If A = [ 1  2   3 ]        then L = [ 1  0  0 ]
         [ 2  5   7 ]  ,              [ 2  1  0 ]
         [ 3  7  14 ]                 [ 3  1  2 ]
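A direct Octave/Matlab transcription of this loop (an illustrative sketch, checked here against the small example above):

  % Cholesky factorization A = L*L' for symmetric positive definite A
  A = [1 2 3; 2 5 7; 3 7 14];
  n = size(A, 1);
  L = zeros(n);
  for j = 1:n
      L(j,j) = sqrt(A(j,j) - L(j,1:j-1)*L(j,1:j-1)');
      for i = j+1:n
          L(i,j) = (A(i,j) - L(i,1:j-1)*L(j,1:j-1)') / L(j,j);
      end
  end
  disp(L)                 % lower triangular factor
  disp(norm(L*L' - A))    % should be (near) zero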

4 Eigenvalues and eigenvectors


• A nonzero vector v is an eigenvector of a matrix A if Av = λv for some
scalar λ. λ is then an eigenvalue of A.
• Newton’s second law for a system with forces linear in the displacements
(as in ideal springs connecting n different masses, or a discretized ap-
proximation to a linear beam or to a multistory building) can be written
as
  Mx'' = −Kx (where x'' stands for the acceleration, d^2x/dt^2), or

  x'' = −Ax,

where A ≡ M^−1 K.
For this system, if Av = λv, then x(t) = v sin(√λ t) and x(t) = v cos(√λ t)
are solutions.

Since this is a system of linear differential equations, any linear combination
of solutions is also a solution. (There are normally n eigenvalue-eigenvector
pairs λ_i, v_i of A, and we need 2n initial conditions [i.e. the
values of x(0) and x'(0)] to find a unique solution x(t).)
The general solution is therefore Σ_{i=1}^{n} [ c_i v_i sin(√λ_i t) + d_i v_i cos(√λ_i t) ],
or equivalently Σ_{i=1}^{n} [ a_i v_i e^{i√λ_i t} + b_i v_i e^{−i√λ_i t} ], where c_i, d_i or a_i, b_i can be
determined from the initial conditions
The eigenvectors are modes of oscillation for the system, and the
eigenvalues are the squared frequencies for each mode.
In analyzing vibrational systems, the first few modes (the ones with
the lowest frequencies/eigenvalues) are usually the most important be-
cause they are the most likely to be excited and the slowest to damp. The
fundamental mode is the one with lowest frequency.
Modes of e.g. a beam or structure can be found experimentally by
measuring the responses induced by vibrations with different frequencies
(modal analysis).
• How do we find eigenvalues and eigenvectors?
For diagonal or (upper/lower) triangular matrices, the eigenvalues are
just the diagonal elements
For general 2 × 2 matrices, we can solve a quadratic equation for the
eigenvalues λ – generally, there will be two (they may be complex). For
each eigenvalue, we can then solve a linear system to get the eigenvector.
Example:
If

  A = [ −1  2 ]
      [  1  1 ] ,

eigenvalues λ and eigenvectors v must be solutions to
Av = λv, or (A − λI)v = 0.
Assuming that v isn't a zero vector, this implies that (A − λI) is not invertible,
so its determinant must be zero. But the determinant of (A − λI)
is (−1 − λ)(1 − λ) − 2, so we have the characteristic polynomial

  λ^2 − 3 = 0 → λ = ±√3.

To find the eigenvectors corresponding to each λ, we solve the linear system
to find v. We have Av = ±√3 v, or

  [ −1 ∓ √3       2     ]
  [     1      1 ∓ √3   ] v = 0,

where the second row is a multiple of the first, so there is not a unique
solution. We have

  v = [      1      ]
      [ (1 ± √3)/2  ]

or any multiples thereof.
• In general, an n×n matrix has n (complex) eigenvalues, which are the roots
of a characteristic polynomial of order n. We could find the eigenvalues by
writing and solving for this characteristic polynomial, but more efficient
numerical methods exist, for example based on finding a QR factorization
of the matrix (which we won’t cover in this class).
• A conceptually simple numerical method for finding the largest (in absolute
value) eigenvalue of any given square matrix is the power method. It
involves the iteration:
Start with an n × 1 vector v
Do until convergence:
    v ← Av
    v ← v / ||v||
Example: If

  A = [ 1  2 ]        v_0 = [ 1 ]
      [ 1  1 ] ,             [ 1 ] ,

then (using the vector ∞-norm ||v||_∞ ≡ max(|v_i|)) successive iterations
produce for v

  [1; 2/3], [1; 5/7], [1; 12/17], [1; 29/41], . . .

converging toward the eigenvector [1; √2/2], which has the eigenvalue
√2 + 1.
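A minimal Octave/Matlab sketch of this example (the ∞-norm is used as above; the fixed iteration count and the Rayleigh-quotient eigenvalue estimate are choices made here, not part of the notes):

  % Power method for the largest-magnitude eigenvalue of A
  A = [1 2; 1 1];
  v = [1; 1];
  for k = 1:20
      v = A*v;
      v = v / norm(v, inf);       % normalize by the infinity norm
  end
  lambda = (v' * A * v) / (v' * v);   % Rayleigh quotient estimate
  disp(v')        % approaches [1, sqrt(2)/2]
  disp(lambda)    % approaches 1 + sqrt(2) = 2.4142...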

• A symmetric real matrix will have only real eigenvalues. Otherwise, eigen-
values of a real matrix, being the roots of a polynomial, may also come in
complex conjugate pairs.
A symmetric matrix is positive definite if (and only if) all its eigen-
values are positive.

• For any square matrix,


The product of the eigenvalues is equal to the determinant
Hence, if any eigenvalue is zero, the matrix is singular (has no inverse).
If no eigenvalues are zero, the matrix has an inverse.
The sum of the eigenvalues is equal to the sum of the elements on the
matrix main diagonal (called the trace)
• A matrix has the same eigenvalues as its transpose

• The eigenvalues of the inverse of a matrix (if the inverse exists) are the
reciprocals of the eigenvalues of the matrix, while the eigenvectors of both
are the same.
• For any matrix A, the square root of the ratio of the largest to smallest
eigenvalue of AAT is equal to the (2-norm) condition number of A

• If a matrix is diagonal or upper or lower triangular, its eigenvalues are the


elements on its main diagonal.
• A linear system with damping, Mx'' + Cx' + Kx = 0, where C is a matrix of
damping coefficients, has the general solution x(t) = Σ_{i=1}^{2n} c_i v_i e^{λ_i t}, where
λ_i, v_i are generalized eigenvalue-eigenvector pairs that solve the quadratic
eigenvalue problem (λ^2 M + λC + K)v = 0 and the coefficients c_i can be
set to match the initial conditions x(0), x'(0).

5 Differentiation
• Finite difference (centered) to approximate f'(x_0):

  f'(x_0) ≈ [f(x_0 + ∆x) − f(x_0 − ∆x)] / (2∆x)

This approximation is 'second-order accurate,' meaning that truncation
error is proportional to (∆x)^2 (and to f'''(x)) (derived previously
from Taylor's theorem)
However, can't make ∆x very small because roundoff error will increase
• Richardson extrapolation
Starting with some fairly large ∆_0, define D_i^0 to be the centered finite-difference
estimate obtained with ∆x = ∆_0 / 2^i:

  D_i^0(f, x_0, ∆_0) ≡ [f(x_0 + ∆x) − f(x_0 − ∆x)] / (2∆x)

Then D_i^1 ≡ (4/3) D_{i+1}^0 − (1/3) D_i^0 will typically be much more accurate than
D_{i+1}^0
Can extend to higher j ≥ 1 up to some j_max:

  D_i^j ≡ 4^j/(4^j − 1) D_{i+1}^{j−1} − 1/(4^j − 1) D_i^{j−1}

The difference between two estimates (at a given j_max, |D_0^{j_max} − D_1^{j_max−1}|)
can give an estimate of uncertainty, which may be used as a criterion for
convergence. j_max often doesn't need to be large to get a very accurate
estimate, which makes roundoff error less of a problem.
Example of estimates obtained:

(f (x) = sin(x), x0 = 0.5, ∆0 = 0.2, jmax = 3)

j=0 j=1 j=2 j=3


i = 0 0.871743701440879 0.877579640095606 0.877582561716377 0.877582561890374
i = 1 0.876120655431924 0.877582379115078 0.877582561887656
i = 2 0.877216948194290 0.877582550464370
i = 3 0.877491149896850
where the answer is actually cos(0.5) = 0.8775825618903727 . . .
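The table above can be reproduced with a short Octave/Matlab sketch along these lines (array indices are shifted by one because Matlab indexing starts at 1):

  % Richardson extrapolation of the centered difference for f = sin, x0 = 0.5
  f = @sin;  x0 = 0.5;  d0 = 0.2;  jmax = 3;
  D = zeros(jmax+1);                 % D(i+1, j+1) holds D_i^j
  for i = 0:jmax
      dx = d0 / 2^i;
      D(i+1, 1) = (f(x0 + dx) - f(x0 - dx)) / (2*dx);
  end
  for j = 1:jmax
      for i = 0:(jmax - j)
          D(i+1, j+1) = (4^j * D(i+2, j) - D(i+1, j)) / (4^j - 1);
      end
  end
  disp(D(1, jmax+1))   % best estimate; cos(0.5) = 0.87758256...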

• Second-order accurate finite-difference approximations to higher derivatives
(which can also be derived from Taylor's theorem) are

  f''(x) ≈ [f(x + ∆x) − 2f(x) + f(x − ∆x)] / (∆x)^2

  f'''(x) ≈ [f(x + 2∆x) − 2f(x + ∆x) + 2f(x − ∆x) − f(x − 2∆x)] / (2(∆x)^3)

  f^(4)(x) ≈ [f(x + 2∆x) − 4f(x + ∆x) + 6f(x) − 4f(x − ∆x) + f(x − 2∆x)] / (∆x)^4

• Non-centered (forward or backward) finite-difference approximations can
be derived which are useful for estimating a derivative at the edge of a
function's range. For example, a second-order accurate forward finite-difference
approximation for the first derivative is

  f'(x) ≈ [−f(x + 2∆x) + 4f(x + ∆x) − 3f(x)] / (2∆x)
• Cubic spline interpolation [see under Interpolation]
Good when only some given function values (not necessarily equally
spaced) are available
Fit an interpolating cubic spline S(x) to the given points, and esti-
mate f 0 (x) as S 0 (x)

6 Integration
6.1 Introduction
• Some applications of integrals
Average function value between a and b: 1/(b − a) ∫_a^b f(x) dx
Center of mass (in 1-D): ∫_a^b x ρ(x) dx / ∫_a^b ρ(x) dx   (ρ = density)
Moment of inertia about x = x_0 (in 1-D): ∫_a^b (x − x_0)^2 ρ(x) dx
Net force produced by a distributed loading: ∫_a^b w(x) dx (w = force
per unit length)
Net moment about x = x_0 produced by a distributed loading: ∫_a^b (x − x_0) w(x) dx

• Typical situations where we need to approximate an integral I = ∫_a^b f(x) dx
numerically:
The function f doesn't have an analytic integral
No mathematical expression for the function is available – we can only
measure values or get them from a computation.

6.2 Basic integration rules

• Trapezoid rule: I = ∫_a^b f(x) dx ≈ T = (b − a) [f(a) + f(b)]/2 (exact if f(x) is a
straight line [first-degree polynomial])
• Midpoint rule: I = ∫_a^b f(x) dx ≈ M = (b − a) f((a + b)/2) (also exact if f(x) is
a straight line [first-degree polynomial])
• Simpson's rule: combines the trapezoid and midpoint rules in such a way
as to be exact if f(x) is up to a third-degree polynomial.

  S = (T + 2M)/3 = (b − a)/6 [ f(a) + 4f((a + b)/2) + f(b) ]

  (1-4-1 weighting of the function values)

6.3 Composite rules
• For numerical approximations of an integral, we can take advantage of the
fact that integrals are additive – if we divide the integration interval [a, b]
into [a, c] and [c, b], the integral I = ∫_a^b f(x) dx is equal to I_1 = ∫_a^c f(x) dx
plus I_2 = ∫_c^b f(x) dx.
Composite forms of the trapezoid, midpoint, or Simpson rules di-
vide [a, b] up into n subintervals, apply the rule to each subinterval, and
then add up the results to get an approximation for the whole integral.
Generally, this improves the accuracy of the approximation compared to
applying the rule to the whole interval.
For the composite Simpson’s rule when the interval has equally spaced
points a = x0 , x1 , x2 , ...xn = b, we get a 1-4-2-4-2-. . . 4-2-4-1 weighting (For
the trapezoid rule, it’s 1-2-2-2-. . . 2-1)
Adaptive subdivision can be used to achieve high accuracy for numer-
ical integration with fewer computations.
An analogue to Taylor's theorem can be used to derive absolute error
bounds for the composite rules with equally spaced evaluation points for
a function f that is smooth in [a, b]: (b − a)^3 K_2 / (12 n^2) for the trapezoid rule,
(b − a)^3 K_2 / (24 n^2) for the midpoint rule, and (b − a)^5 K_4 / (180 n^4) for Simpson's rule,
where K_2 is the maximum of |f''(x)| in (a, b) and K_4 is the maximum of |f''''(x)|
in (a, b). Although we often don't know the maximum value of derivatives
of f, these bounds are nevertheless useful for estimating how the error will
decrease as a result of increasing the number of intervals n.
• To integrate functions whose values are only available at certain points
(which may be unequally spaced), there are a few options:
Composite trapezoid rule works even for unequally spaced intervals
Can interpolate the points with an easily integrable function (such as
a polynomial or cubic spline) and integrate the interpolating function.
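A short Octave/Matlab sketch of the composite trapezoid rule on equally spaced points (illustrative; the test function anticipates the Romberg example below, and the built-in trapz gives the same result):

  % Composite trapezoid rule for the integral of f from a to b with n subintervals
  f = @(x) exp(x) - 4*x;   a = 0;  b = 1;
  for n = [4 16 64]
      x = linspace(a, b, n+1);
      y = f(x);
      T = (b - a)/n * (sum(y) - (y(1) + y(end))/2);  % 1-2-2-...-2-1 weighting
      fprintf('n = %3d: T = %.6f\n', n, T)
  end
  % exact value: e - 3 = -0.281718...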

6.4 More complex numerical methods


• Romberg integration
Let R_i^0, i = 0, 1, 2, . . . be the estimated integrals obtained by applying
the composite trapezoid rule with 2^i equal-width subintervals:

  R_i^0 ≡ (b − a)/2^i [ (f(a) + f(b))/2 + Σ_{j=1}^{2^i − 1} f(a + j (b − a)/2^i) ]

       = R_{i−1}^0 / 2 + (b − a)/2^i Σ_{j=1, j odd}^{2^i − 1} f(a + j (b − a)/2^i)   [if i > 0]
Because the error from the composite trapezoid rule decreases as the
number of subintervals squared, the error of R_{i+1}^0 is expected to be about
1/4 that of R_i^0, and in the same direction.
We exploit this by coming up with a generally more accurate estimate
R_i^1 ≡ (4/3) R_{i+1}^0 − (1/3) R_i^0.
Can continue, with R_i^j ≡ 4^j/(4^j − 1) R_{i+1}^{j−1} − 1/(4^j − 1) R_i^{j−1} for any j ≥ 1, to
obtain generally even more accurate estimates.
Difference between two estimates can give an estimate of uncertainty,
which may be used as a criterion for convergence. For smooth functions,
i often doesn't need to be large to get a very accurate estimate.
Algorithm:
for j = 0, 1, . . . j_max:
    Evaluate the function at 2^j + 1 equally spaced points, including the
    endpoints a and b (giving 2^j equal-width subintervals), and obtain R_j^0
    Find R_{j−i}^i, i = 1, . . . j using the formula R_i^j ≡ 4^j/(4^j − 1) R_{i+1}^{j−1} − 1/(4^j − 1) R_i^{j−1}
    For j ≥ 1, if |R_0^j − R_1^{j−1}| is less than our required absolute error
    tolerance, can stop (convergence reached)
As an example, consider f(x) = e^x − 4x, a = 0, b = 1, j_max = 3:

        j=0        j=1        j=2        j=3
i=0   −0.14086   −0.28114   −0.28172   −0.28172
i=1   −0.24607   −0.28168   −0.28172
i=2   −0.27278   −0.28172
i=3   −0.27948

The best estimate from this is R_0^3 or −0.28172.
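A sketch of this algorithm in Octave/Matlab (indices shifted by one relative to the notation above; illustrative only):

  % Romberg integration of f(x) = exp(x) - 4x on [0, 1]
  f = @(x) exp(x) - 4*x;   a = 0;  b = 1;  jmax = 3;
  R = zeros(jmax+1);                % R(i+1, j+1) holds R_i^j
  for i = 0:jmax
      n = 2^i;                      % number of subintervals
      x = linspace(a, b, n+1);
      y = f(x);
      R(i+1, 1) = (b - a)/n * (sum(y) - (y(1) + y(end))/2);
  end
  for j = 1:jmax
      for i = 0:(jmax - j)
          R(i+1, j+1) = (4^j * R(i+2, j) - R(i+1, j)) / (4^j - 1);
      end
  end
  disp(R(1, jmax+1))   % best estimate R_0^jmax; exact value is e - 3 = -0.28172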

• Gauss quadrature
Estimate an integral based on the function value at specific non-
equally spaced points within the interval (more points closer to the edges)
Select the sample points and weights based on approximating the
function as a polynomial of degree 2n − 1, where n is the number of points
In practice, tabulated values of the sample points xi and weights wi
for the standard integration interval [−1, 1] are available
To approximate I = ∫_{−1}^{1} f(x) dx, use G_n = Σ_{i=1}^{n} w_i f(x_i)
To approximate I = ∫_a^b f(x) dx, use

  G_n = (b − a)/2 Σ_{i=1}^{n} w_i · f( a + (b − a)/2 (x_i + 1) )
Can give very accurate numerical integral estimates with few function
evaluations (small n).
With given n, could divide the integration interval into parts and
apply Gauss quadrature to each one in order to get increased accuracy
• Both Romberg integration and Gauss quadrature are only applicable if
we can find the function value at the desired points. Also, they may
not be more accurate than simpler methods if the function is not smooth
(e.g. has discontinuities). Essentially this is because they both rely on
approximating the function by the first terms in its Taylor series.

7 Ordinary differential equations: Initial value problems
• Initial-value ODE first-order problem: Given dy/dt = f(y, t) and y(a) =
y_0, find y(b) – this is a generalization of the definite integral problem
considered previously
• Most numerical methods for ODE IVPs consider a sequence of points a =
t_0, t_1, t_2, . . . t_N = b, similar to the idea of composite integration rules, and
construct estimates of y at those points: y(a) = y_0, y_1, y_2, . . . , y_N ≈ y(b)
• Euler method (explicit): y_{i+1} = y_i + h f(y_i, t_i), where h is the step size
t_{i+1} − t_i (iterate to get from t_0 = a to t_N = b)
Approximates the average slope f̄ over the interval [t_i, t_i + h] with the
value of f at the beginning of the interval
• Initial-value problem for an ODE first-order system: Given dy_i/dt = f_i(y_1, y_2, . . . , y_n, t)
and y_i(a) = y_{i,0} for i = 1, 2, . . . n, find all y_i(b)
In vector notation: Given dy/dt = f(y, t) and y(a) = y_0, where y(t) is
an n × 1 vector and f is a function that returns an n × 1 vector, find y(b)
In this notation, Euler's method for a system can be written compactly
as y_{i+1} = y_i + h f(y_i, t_i)
• Any ODE system (even one with higher derivatives) can be converted
to this first-order form by setting the derivatives of lower order than the
highest one that appears as additional variables in the system
Example: pendulum motion (with friction and a driving force),

  d^2θ/dt^2 + c dθ/dt + (g/L) sin(θ) = a sin(Ωt)

(second-order equation: highest-order derivative of θ is 2)
Can be written as

  dy_1/dt = y_2
  dy_2/dt = −c y_2 − (g/L) sin(y_1) + a sin(Ωt)

where y_1 = θ, y_2 = dθ/dt.

• Implicit Euler method: y_{i+1} = y_i + h f(y_{i+1}, t_{i+1}) – implicit because the
unknown y_{i+1} appears on both sides of the equation.
Solving for this y_{i+1} may require a numerical nonlinear equation solving
(root-finding) method, depending on how f depends on y.
• Crank-Nicholson method (uses the average of the slopes from the original (explicit)
and implicit Euler methods to estimate f̄, and so should be more accurate
than either one – analogous to the trapezoid rule):
y_{i+1} = y_i + h ( f(y_i, t_i) + f(y_{i+1}, t_{i+1}) ) / 2
Again, an implicit method, which makes each step substantially more
complicated than in the original (explicit) Euler method (unless, for example,
f is a linear function of y)
• Modified (second-order accurate) Euler method is a 2nd-order ‘Runge-
Kutta’ method (RK2; also known as Heun’s method); it’s analogous to
trapezoid rule, but unlike Crank-Nicholson method it’s explicit, so no
root-finding is required:
Two stages at each timestep, based on estimating the derivative at
the end as well as the beginning of each subinterval:
  y_{i+1} = y_i + (K_1 + K_2)/2,

with K_1 = h f(y_i, t_i) (as in the Euler method) and K_2 = h f(y_i + K_1, t_i + h)

• The classic Runge-Kutta method of order 4 (RK4) is one that is often
used in practice for solving ODEs. Each step involves four stages, and can
be written as:

  y(t_{i+1}) = y(t_i) + (1/6)(K_1 + 2K_2 + 2K_3 + K_4),

with K_1 = h f(y(t_i), t_i) (as in the Euler method), K_2 = h f(y(t_i) + K_1/2, t_i + h/2),
K_3 = h f(y(t_i) + K_2/2, t_i + h/2), K_4 = h f(y(t_i) + K_3, t_i + h). (Notice
that the 1-2-2-1 weighting of the stages, with the two middle stages both
taken at the midpoint, is similar to the 1-4-1 weighting of the Simpson rule.)
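A minimal Octave/Matlab sketch of this step loop, applied to the earlier scalar test problem y' = y, y(0) = 1 (the function handle and step size are illustrative choices):

  % Classic RK4 for dy/dt = f(y, t), here f = y, from t = 0 to 0.4
  f = @(y, t) y;
  h = 0.1;  t = 0;  y = 1;
  for i = 1:4
      K1 = h * f(y, t);
      K2 = h * f(y + K1/2, t + h/2);
      K3 = h * f(y + K2/2, t + h/2);
      K4 = h * f(y + K3, t + h);
      y = y + (K1 + 2*K2 + 2*K3 + K4) / 6;
      t = t + h;
  end
  fprintf('y(0.4) ~ %.6f (exact: %.6f)\n', y, exp(0.4))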
• Local truncation error of a numerical method: error of each step of length
h

For Euler's method, the local error is bounded by (h^2/2) K_2,
where K_2 is the maximum of |y''| between t and t + h
• Global truncation error: error for far end of interval, after n = |b − a|/h
steps
The global error may depend in a complicated way on the local errors
at each step, but usually can be roughly approximated as the number of
steps times the local truncation error for each step
For Euler’s method, the estimated global error using this approxima-
tion is |b − a| · h/2 · y''(c) for some c between a and b. Thus, the estimated
global error is proportional to h (first-order accuracy).
The RK2 local truncation error is bounded by (h^3/12) K_3,
where K_3 is the maximum of |y'''| between t and t + h (same as for
the trapezoid rule in integration). Thus, the estimated global error is
proportional to h^2 (second order).
The RK4 local truncation error is bounded by (h^5/2880) K_5,
where K_5 is the maximum of |y^(5)| between t and t + h (same as for Simpson's
rule in integration). Thus, the estimated global error is proportional
to h^4 (fourth order). Usually, it will be much more accurate than first-
or second-order numerical methods (such as Euler or RK2) for the same
step size h.
• The Euler method as well as the other explicit and implicit methods can be
extended readily to systems of ODEs – just run through all the equations
in the system at each timestep to go from y(t) to y(t + h) (but for the
implicit methods, will need to solve a system of equations at each timestep)
• In practice, how do we estimate the local truncation error (since we don’t
generally know the values of higher derivatives of y)? The usual method
is to compare two estimates of y(t + h), obtained by different numerical
methods or different step sizes.
The estimated error can be used to adjust the step size h adap-
tively so that the global truncation error is within the given tolerance: if
the estimated error is very small the step size is increased so that fewer
computations are needed, while if the estimated error is too large the step
size is decreased so that the answer is accurate enough.

8 Ordinary differential equations: Boundary value
problems
• In initial-value ODE problems (which is the type we’ve been doing so far),
the conditions are all given at one value of the independent variable t, say
t = a. By contrast, in boundary-value problems, the conditions are spread
out between different values of t. Therefore, we can’t use methods that
start at the known initial condition and take steps away from it, as in the
Euler method and Runge-Kutta methods, to find a numerical solution.
• Example of a boundary-value problem: A beam supported at both ends
with nonlinear deflection, so that

  y'' / (1 + (y')^2)^(3/2) − (T/(EI)) y = w x (L − x) / (2EI),

where T is the tension in the beam, E is the beam modulus of elasticity,
I is the beam moment of inertia, and w is the uniform loading intensity
on the beam. The boundary conditions are y(0) = 0 and y(L) = 0.
Linearized form:

  y'' − (T/(EI)) y = w x (L − x) / (2EI).

• Another example (linear): Steady-state 1-D groundwater flow

  d/dx ( K dy/dx ) = 0,

where y(x) is the groundwater head and K the hydraulic conductivity, with
upstream and downstream boundary conditions y(x_u) = y_u, y(x_d) = y_d.
• Finite-difference is one method of numerically solving boundary value
problems. The idea is to find y on an equally spaced grid, in the beam
example case between x = 0 and x = L, where the points on the grid are
designated 0 = x0 , x1 , x2 , . . . xn = L and the corresponding y values are
y0 , y1 , y2 , . . . yn . We get one equation at each xi . For the beam example,
this is
  y''(x_i) / (1 + (y'(x_i))^2)^(3/2) − (T/(EI)) y_i = w x_i (L − x_i) / (2EI).
To continue we need expressions for the derivatives of y(x) at each xi in
terms of the xi and yi . We approximate these by finite-difference formulas,
for example (all these formulas are second-order-accurate):

  y'(x_i) ≈ (y_{i+1} − y_{i−1}) / (2h)

  y''(x_i) ≈ (y_{i+1} − 2y_i + y_{i−1}) / h^2

  y'''(x_i) ≈ (y_{i+2}/2 − y_{i+1} + y_{i−1} − y_{i−2}/2) / h^3

  y''''(x_i) ≈ (y_{i+2} − 4y_{i+1} + 6y_i − 4y_{i−1} + y_{i−2}) / h^4
Applying these for i from 1 to n − 1, this gives us n − 1 generally
nonlinear equations for the n − 1 unknown yi (from the boundary condi-
tions, we already know that y0 = y(0) = 0 and yn = y(L) = 0). If the
ODE system is linear (as in the linearized form of the beam example, or
in the groundwater example), then we actually have n − 1 linear equations
in the n − 1 unknowns (or n + 1 if we include the boundary conditions as
additional equations), which we can solve using Gauss elimination or LU
decomposition to find the unknown yi , and hence (in this example) the
shape of the deflected beam.
• Note: For the finite-difference method, there’s no need to rewrite higher-
order ODEs as a system of first-order ODEs.
• Non-centered finite-difference formulas could be used for conditions near
the boundaries. Alternatively, fictitious points beyond the boundaries,
such as y−1 and yn+1 , could be added to the system to allow using the
centered formulas.
• Example: Use finite differences to approximate y(x) given
y'' + y' = x, y(0) = 1, y'(15) = 2:
Start by establishing a grid of points to solve at that spans the domain
over which there are boundary conditions: say n = 5 equally spaced
intervals, with endpoints indexed 0, 1, . . . 5 (h = 3)
Next, write the differential equation for each interior grid point, plus
the boundary conditions, with each derivative replaced by a finite-difference
approximation (here, ones with second-order accuracy are used):

  y_0 = 1
  (1/h^2) y_0 − (2/h^2) y_1 + (1/h^2) y_2 − (1/(2h)) y_0 + (1/(2h)) y_2 = 3
  (1/h^2) y_1 − (2/h^2) y_2 + (1/h^2) y_3 − (1/(2h)) y_1 + (1/(2h)) y_3 = 6
  (1/h^2) y_2 − (2/h^2) y_3 + (1/h^2) y_4 − (1/(2h)) y_2 + (1/(2h)) y_4 = 9
  (1/h^2) y_3 − (2/h^2) y_4 + (1/h^2) y_5 − (1/(2h)) y_3 + (1/(2h)) y_5 = 12
  (1/(2h)) y_3 − (2/h) y_4 + (3/(2h)) y_5 = 2   [backward finite-difference approximation
                                                 for the first derivative at the upper boundary]

The resulting system of algebraic equations for the approximate y
values at the grid points is

  [    1       0       0       0       0      0   ] [ y_0 ]   [  1 ]
  [ −1/18   −2/9    5/18      0       0      0   ] [ y_1 ]   [  3 ]
  [    0    −1/18   −2/9    5/18      0      0   ] [ y_2 ] = [  6 ]
  [    0       0   −1/18    −2/9    5/18     0   ] [ y_3 ]   [  9 ]
  [    0       0       0   −1/18    −2/9   5/18  ] [ y_4 ]   [ 12 ]
  [    0       0       0     1/6    −2/3    1/2  ] [ y_5 ]   [  2 ]

which gives, approximately,

  y = [   1    ]
      [ 325.32 ]
      [ 270.85 ]
      [ 173.22 ]
      [ 116.80 ]
      [  102   ]
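As a sketch of how such a finite-difference system can be assembled and solved in Octave/Matlab, here is a simpler linear problem with Dirichlet conditions at both ends, y'' = −1 with y(0) = y(1) = 0 (not the beam or the example above; the exact solution x(1 − x)/2 is used only as a check):

  % Finite differences for y'' = -1, y(0) = y(1) = 0
  n = 10;  h = 1/n;  x = (0:n)' * h;
  A = zeros(n+1);  b = zeros(n+1, 1);
  A(1,1) = 1;        b(1)   = 0;        % boundary condition y(0) = 0
  A(n+1,n+1) = 1;    b(n+1) = 0;        % boundary condition y(1) = 0
  for i = 2:n                           % interior grid points
      A(i, i-1:i+1) = [1 -2 1] / h^2;   % centered y'' approximation
      b(i) = -1;                        % right-hand side of the ODE
  end
  y = A \ b;
  disp(max(abs(y - x.*(1-x)/2)))        % compare with the exact solution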

9 Interpolation
• Interpolation: fitting a function of a given type, say f , through some given
points (xi , yi ) so that for all the given points, f (xi ) = yi .
• Why interpolate data?
We may have measured responses at a few points, need responses at
intermediate points.
We may have evaluated a complicated function (such as the solution
to a differential equation) at a few points, and want estimates for its value
at many other points.
We may want to estimate the average value, integral, derivative, . . . of
a measured quantity or difficult-to-evaluate function whose value is known
only at some points.
• Polynomial interpolation
For any set of n points (xi , yi ) with distinct xi , there’s a unique
polynomial p of degree n − 1 such that p(xi ) = yi .
For n > 5 or so, polynomial interpolation tends in most cases to give
oscillations around the given points, so the interpolating polynomial often
looks unrealistic.
Polynomial interpolation is quite nonlocal and ill-conditioned, espe-
cially for larger n: if you change slightly one of the points to interpolate,
the whole curve will often change substantially.
Finding the interpolating polynomial through given points:

Lagrange form:

  Σ_{i=1}^{n} y_i Π_{j=1, j≠i}^{n} (x − x_j)/(x_i − x_j)

Newton form: P_n, where P_1 = y_1 and

  P_i = P_{i−1} + (y_i − P_{i−1}(x_i)) Π_{j=1}^{i−1} (x − x_j)/(x_i − x_j)

for i = 2, . . . n
Example: if x = [0 1 -2 2 -1]' and y = [-3 -2 1 -4 1]', the
Lagrange form of the interpolating polynomial is

  P_L(x) = −(3/4)(x − 1)(x + 2)(x − 2)(x + 1) + (1/3) x(x + 2)(x − 2)(x + 1)
           + (1/24) x(x − 1)(x − 2)(x + 1) − (1/6) x(x − 1)(x + 2)(x + 1)
           − (1/6) x(x − 1)(x + 2)(x − 2)

The Newton form is

  P_N(x) = −3 + x + x(x − 1) − (5/8) x(x − 1)(x + 2) − (17/24) x(x − 1)(x + 2)(x − 2)

The two forms are equivalent; thus, both give P(1/2) = −387/128, and both will
expand to P(x) = −(17/24) x^4 + (1/12) x^3 + (77/24) x^2 − (19/12) x − 3
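In Octave/Matlab, the same interpolating polynomial can be obtained by fitting a degree n − 1 polynomial through the n points, e.g. with polyfit (a sketch for the example above):

  % Interpolating polynomial through the 5 example points
  x = [0 1 -2 2 -1]';
  y = [-3 -2 1 -4 1]';
  p = polyfit(x, y, length(x) - 1);   % degree n-1 = 4
  disp(p)                   % [-17/24  1/12  77/24  -19/12  -3]
  disp(polyval(p, 0.5))     % -387/128 = -3.0234...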
• Spline interpolation
Idea: interpolating function S(x) is a piecewise polynomial of some
low degree d; the derivative of order d is discontinuous at nodes or knots,
which are the boundaries between the polynomial pieces (for our purposes,
these are set to be at the data points).
Linear spline: d = 1 – connects points by straight lines (as when
plotting data points joined by line segments) – first derivative isn't continuous.
Quadratic spline: d = 2 – connects lines by segments of parabolas –
second derivative isn’t continuous (but first derivative is).
Cubic spline: d = 3 – commonly used – minimizes curvature out of
all the possible interpolating functions with continuous second derivatives
(third derivative is discontinuous)
Uses n−1 cubic functions to interpolate n points (4n−4 coefficients
total)
Need 2 additional conditions to specify coefficients uniquely. Com-
monly, we use natural boundary conditions, where the second derivative is
zero at the endpoints. Another possibility is not-a-knot boundary condi-
tions, where the first two cubic functions are the same and last two cubic
functions are the same.

The spline coefficients that interpolate a given set of points can be
found by solving a linear system. For a cubic spline:
Let the n + 1 given points be (x_i, y_i), for i = 0, 1, . . . n
Each piecewise cubic polynomial (which interpolates over the interval
[x_{i−1}, x_i]) can be written as S_i(x) = a_i(x − x_{i−1})^3 + b_i(x − x_{i−1})^2 +
c_i(x − x_{i−1}) + d_i, where i = 1, 2, . . . n
The 4n − 2 conditions for the piecewise polynomials to form a cubic
spline that interpolates the given points are
  S_i(x_{i−1}) = y_{i−1}, i = 1, 2, . . . n
  S_i(x_i) = y_i, i = 1, 2, . . . n
  S_i'(x_i) = S_{i+1}'(x_i), i = 1, 2, . . . n − 1
  S_i''(x_i) = S_{i+1}''(x_i), i = 1, 2, . . . n − 1
We can find the b_i by solving a linear system that includes the
following n − 2 equations, plus two more from the boundary conditions:
  h_{i−1} b_{i−1} + 2(h_{i−1} + h_i) b_i + h_i b_{i+1} = 3(∆_i − ∆_{i−1}), for i = 2, 3, . . . n − 1
where h_i ≡ x_i − x_{i−1}, ∆_i ≡ (y_i − y_{i−1})/h_i
For natural boundary conditions, the two additional equations are:
  b_1 = 0
  h_{n−1} b_{n−1} + 2(h_{n−1} + h_n) b_n = 3(∆_n − ∆_{n−1})
For not-a-knot boundary conditions, we can introduce an additional
unknown b_{n+1}, and the three additional equations needed are:
  h_2 b_1 − (h_1 + h_2) b_2 + h_1 b_3 = 0
  h_n b_{n−1} − (h_{n−1} + h_n) b_n + h_{n−1} b_{n+1} = 0
  h_{n−1} b_{n−1} + 2(h_{n−1} + h_n) b_n + h_n b_{n+1} = 3(∆_n − ∆_{n−1})
Once the b coefficients are found, we can find the a coefficients as
  a_i = (b_{i+1} − b_i)/(3h_i), i = 1, 2, . . . n − 1
and a_n = −b_n/(3h_n) (natural boundary conditions) or a_n = a_{n−1} (not-a-knot)
For the remaining coefficients,
  c_i = ∆_i − h_i b_i − h_i^2 a_i,
  d_i = y_{i−1}.
For example, with (x, y) pairs (1, 3), (2, 4), (3, 6), (4, 7),
the cubic spline with natural boundary conditions has the pieces
  S_1(x) = (1/3)(x − 1)^3 + (2/3)(x − 1) + 3
  S_2(x) = −(2/3)(x − 2)^3 + (x − 2)^2 + (5/3)(x − 2) + 4
  S_3(x) = (1/3)(x − 3)^3 − (x − 3)^2 + (5/3)(x − 3) + 6.
The cubic spline with not-a-knot boundary conditions has the pieces
  S_1(x) = −(1/3)(x − 1)^3 + (3/2)(x − 1)^2 − (1/6)(x − 1) + 3
  S_2(x) = −(1/3)(x − 2)^3 + (1/2)(x − 2)^2 + (11/6)(x − 2) + 4
  S_3(x) = −(1/3)(x − 3)^3 − (1/2)(x − 3)^2 + (11/6)(x − 3) + 6
(In fact, when, as in this example, n = 3, the cubic spline with not-a-knot
boundary conditions is the same as the interpolating polynomial.)
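In Octave/Matlab, the built-in spline function (which uses not-a-knot end conditions) builds and evaluates such an interpolant; a sketch for the example points above:

  % Cubic spline interpolation of the example points
  x = 1:4;
  y = [3 4 6 7];
  xx = linspace(1, 4, 61);
  yy = spline(x, y, xx);         % not-a-knot end conditions
  plot(x, y, 'o', xx, yy, '-')
  disp(spline(x, y, 2.5))        % value between the given points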
• Piecewise cubic Hermite polynomial interpolation (also known as mono-
tone cubic interpolation)

Piecewise cubic (like cubic spline)
Differences from cubic spline:
Second derivative not continuous (so less smooth)
Extreme points only at the nodes (so never goes above or below the
range formed by the two adjacent points – good when the values should
stay within an acceptable range)

10 Regression
• Least squares fitting is a kind of regression
Regression: fit a function of a given type, say f , approximately through
some given points (xi , yi ) so that for the given points, f (xi ) ≈ yi .
If the points are given as n × 1 vectors x, y, the residual vector of the
fitted function is r = y − f (x) (i.e. ri = yi − f (xi ))
The least squares criterion: Out of all the functions in the given type,
minimize the residual sum of squares, RSS = r^T r = Σ_{i=1}^{n} r_i^2
• Suppose the function type is such that we can write f (x) = Aβ, where A
is a known n × m design matrix for the given x while β is an unknown
m×1 vector of ‘parameters’. That is, f (xi ) = Ai,1 β1 +Ai,2 β2 +. . . Ai,m βm .
Example: the function type is a straight line, f (x) = ax + b. Then
row i of A consists of (xi , 1), and β is [a, b]T (m = 2).
Example: the function type is a quadratic with zero intercept, f (x) =
cx2 . Then row i of A consists of (x2i ), and β is [c] (m = 1).
• In that case, the residual sum of squares (RSS) r^T r = ||r||_2^2 is equal to
(Aβ − y)T (Aβ − y). Under least squares, we want to choose β so that
this quantity is as small as possible.
To find a minimum of the residual sum of squares, we take its deriva-
tive with respect to β, which works out to be

  2A^T (Aβ − y).

If we set this equal to zero, we then get for β the m × m linear system

  A^T A β = A^T y

Solving this linear system of normal equations gives us the parameters
β that, for the given function type and data points, give the best fit (the
smallest residual sum of squares).
Minimizing RSS is the same as maximizing the coefficient of determination
R^2 = 1 − RSS/TSS, where the total sum of squares TSS is (y − ȳ)^T (y − ȳ),
with ȳ the average value of y
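A small Octave/Matlab sketch of a straight-line least-squares fit via the normal equations (the data here are synthetic, made up for illustration; in practice the backslash operator A\y solves the same problem more stably):

  % Least-squares straight line f(x) = a*x + b through noisy points
  x = (0:9)';
  y = 2*x + 1 + 0.5*randn(size(x));   % synthetic data (slope 2, intercept 1)
  A = [x, ones(size(x))];             % design matrix, one row per point
  beta = (A'*A) \ (A'*y);             % normal equations A'A beta = A'y
  r = y - A*beta;                     % residual vector
  RSS = r'*r;
  fprintf('a = %.3f, b = %.3f, RSS = %.3f\n', beta(1), beta(2), RSS)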

• An important question in regression is which function class we should
choose to fit given data, for example what degree polynomial to use. In
general, function classes with many parameters (like high-degree polyno-
mials) can fit any given data better (smaller RSS), but are not always
better at predicting new data that were not included in the fit. Some
ways to decide which function class to choose are:
Graphical inspection: Plot each fitted function (as a line) together
with the given points (as a scatter). Select the function class with fewest
unknown parameters that seems to follow the overall trend of the data.
Out-of-sample validation: Divide the available data into “training”
and “validation” subsets. Fit the parameters for each model using only
the training data. Choose the model with lowest RSS for the validation
data.
Numerical criteria:
Adjusted R^2:

  R_a^2 = 1 − (n − 1)/(n − m) · RSS/TSS,
where n is the number of data points and m is the number of unknown
parameters in each model. Each model is fitted to the data and Ra2 is
computed, and the model with highest Ra2 is chosen as likely to be best
at predicting the values of new points.
Another commonly used rule is the Akaike information criterion
(AIC), which can be given as follows for linear least squares:

AIC = n log(RSS/n) + 2mn/(n − m − 1)

(log designates natural logarithm). Each model is fitted to the data and
AIC is computed, and the model with lowest AIC is chosen as likely to be
best at predicting the values of new points.
• In nonlinear least squares fitting, the function form is such that finding the
least squares parameter values for it requires solving a system of nonlinear
equations. In general this is a more difficult numerical problem.

11 Root finding (solving nonlinear equations)


• General form of a nonlinear equation in x: g(x) = h(x), e.g. x^3 + x = 3.
Unlike non-singular linear systems, such equations can have many (or no)
solutions
• Standard form of a nonlinear equation: f(x) = 0, where f(x) = g(x) −
h(x), e.g. x^3 + x − 3 = 0. Any solution to this equation is called a 'root'
of f.

• Newton's method: The iteration

  x_{i+1} = x_i − f(x_i)/f'(x_i)

converges to a root x∗ of f if the initial value x_0 is close to x∗.
e.g. the above example with x_0 = 1
Only needs one iteration to find x∗ if f(x) is linear (straight line)
Sometimes doesn't converge if f(x) is strongly nonlinear (curved)
between x_0 and x∗. Most likely to converge if x_0 is close to x∗.
Like many numerical methods, Newton’s method (stopped after finitely
many iterations) will usually only give an approximate answer, but is eas-
ily implementable on a computer.
• One possible “stopping criterion” would be to check when the (absolute or
relative) difference between xn and xn−1 is small enough, as an estimate
of the error relative to the unknown true value x∗ .
• Another possible stopping criterion would be for f(x_n) to be close enough
to f(x∗) = 0. This difference is called the "residual" and is another measure
of the error in our estimate of x∗ after n iterations.
• Newton's method can be generalized for a function of a vector, f(x), where
the "Jacobian" matrix of partial derivatives J(x) has the elements J_{ij}(x) = ∂f_i/∂x_j:

  x_{i+1} = x_i − J^−1(x_i) f(x_i)

• Disadvantages of Newton’s method are that it requires the derivative of f


and may not converge if the function is too nonlinear (so that it has ‘flat’
regions where the derivative is much closer to zero than elsewhere).
• Bisection is another iterative (approximate) numerical root-finding method.
It is slower to converge (more iterations typically required for high accu-
racy) but very reliable.
• Algorithm:
Start by finding two values a and b on either side of the root, such
that f (a) < 0 and f (b) > 0.
For each iteration:
Find f (c), where c is halfway between a and b (c ← (a + b)/2)
If f (c) = 0, return c as the root
If f (c) > 0, set b ← c
If f (c) < 0, set a ← c
Assuming that the exact root is not found, each iteration makes the
interval [a, b] half as wide as before
Possible stopping criteria (for some specified error tolerances tol, ε):
Number of iterations
Uncertainty in the root: |b − a|/2 < tol (absolute uncertainty) or
|b − a|/|b + a| < tol (fractional uncertainty)
Small residual, |f(c)| < ε
• Example for f(x) = x^2 − 2:
Initialize: a ← 1, b ← 2
Iteration 1: c ← 1.5, f (c) > 0, b ← c
Iteration 2: c ← 1.25, f (c) < 0, a ← c
Iteration 3: c ← 1.375, f (c) < 0, a ← c
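A minimal Octave/Matlab sketch of bisection for this example (the tolerance is an arbitrary choice; f(a) < 0 < f(b) is assumed, as in the algorithm above):

  % Bisection for f(x) = x^2 - 2 with bracket [1, 2]
  f = @(x) x.^2 - 2;
  a = 1;  b = 2;  tol = 1e-8;
  while (b - a)/2 > tol
      c = (a + b)/2;
      if f(c) == 0
          break                % found the exact root
      elseif f(c) > 0
          b = c;               % root is in [a, c]
      else
          a = c;               % root is in [c, b]
      end
  end
  disp((a + b)/2)              % approximates sqrt(2) = 1.41421356...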
• The secant method is similar to Newton’s method in that it’s based on
locally approximating the function as a straight line, but it doesn’t require
us to be able to compute the derivative, instead estimating it from the
difference in function value between two points.
Like Newton’s method, this method converges fast (few iterations)
if the function f is locally pretty close to a straight line, but may not
converge at all if the function is very nonlinear.
• Algorithm:
Start with two initial points a and b close to the root (can be the
same as the starting points for bisection)
For each iteration:
Find f (a) and f (b)
Estimate the function's slope (first derivative) as

  s = (f(b) − f(a)) / (b − a)

Compute c ← b − f (b)/s
Set a ← b, b ← c.
Possible stopping criteria:
Number of iterations
Estimated uncertainty in the root, |f (b)/s| < tol
Small residual, |f(c)| < ε
• Example for f(x) = x^2 − 2:
Initialize: a ← 1, b ← 2
Iteration 1: s = 3, c ← 1.333, a ← b, b ← c
Iteration 2: s = 3.333, c ← 1.4, a ← b, b ← c
Iteration 3: s = 2.733, c ← 1.4146, a ← b, b ← c

• The false position method chooses c in each iteration the same way as in
the secant method, but then finds a new bracketing interval for the next
iteration as in bisection. This is more reliable than the secant method in
that it should converge even for functions with ‘flat’ parts.
• Algorithm:
Start by finding two points a and b on either side of the root, such
that f (a) < 0 and f (b) > 0 (as in bisection). Set c ← (a + b)/2.
For each iteration:
Find f (a) and f (b)
Estimate the function's slope (first derivative) as

  s = (f(b) − f(a)) / (b − a)

Compute c ← a − f (a)/s
Find f (c)
If f (c) = 0, return c as the root
If f (c) > 0, set b ← c
If f (c) < 0, set a ← c
Possible stopping criteria: As in bisection, or if c changes by less than
some tolerance between iterations.
• Example for f(x) = x^2 − 2:
Initialize: a ← 1, b ← 2
Iteration 1: s = 3, c ← 1.4167, f (c) > 0, b ← c
Iteration 2: s = 2.417, c ← 1.4138, f (c) < 0, a ← c
Iteration 3: s = 2.831, c ← 1.4142, f (c) < 0, a ← c

12 Optimization
• Choose x to maximize or minimize a function f (x); examples: minimize
f(x) = −x^2 + 2x; maximize f(θ) = 4 sin(θ)(1 + cos(θ))
max f (x) is equivalent to min −f (x)

• Local vs. global optimums (maximums or minimums)


• First step should always be to graph the function and assess where the
optimum might be

• Some optimization methods:
1) Golden section search (similar in robustness and convergence rate
to bisection; finds a local optimum) – named after the golden section ratio
φ = (√5 + 1)/2 – here given for finding a local minimum
Start with a bracketing interval [a, b] that contains a minimum of
f (x), and with c ← a + (φ − 1)(b − a). Find f (a), f (b), f (c)
Iterate the following until convergence (e.g. until |b − a| < tol,
where tol is the maximum allowable error):
Find d ← a + (φ − 1)(c − a) and f (d)
If f (d) < f (c), set b ← c and c ← d
Otherwise, set a ← b and b ← d (so that c stays the same for the
next iteration)
Example: Find minimum of f (x) = − sin(x)(1 + cos(x)) for x in
[−π π] :

# a b c d f (c) f (d)
1 −π π 0.74162942 −0.74162942 −1.17357581 1.17357581
2 π −0.74162942 0.74162942 1.65833381 −1.17357581 −0.90908007
3 −0.74162942 1.65833381 0.74162942 0.17507496 −1.17357581 −0.34570127
4 1.65833381 0.17507496 0.74162942 1.09177934 −1.17357581 −1.29647964
5 1.65833381 0.74162942 1.09177934 1.30818389 −1.29647964 −1.21641886
6 0.74162942 1.30818389 1.09177934 0.95803397 −1.29647964 −1.28855421
7 1.30818389 0.95803397 1.09177934 −1.29647964

So after 6 iterations, the location of the minimum is narrowed down to
[0.95 1.31] and the minimum value is under −1.296
2) Root finding for derivative, either analytically or with any numer-
ical method (requires f to be differentiable)
Example: For f(x) = −sin(x)(1 + cos(x)), we have f'(x) = −(cos(x) +
cos(2x)) and f''(x) = sin(x) + 2 sin(2x). Applying Newton's method to find
where f'(x) = 0 beginning with x_0 = 1 gives the sequence 1, 1.04667383,
1.04719747, 1.04719755, 1.04719755 (converging to the minimum, where
f(x) = −1.29903811).
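A minimal Octave/Matlab sketch of this approach (Newton's method applied to f'; the iteration count is arbitrary):

  % Newton's method on f'(x) to locate the minimum of f(x) = -sin(x)(1+cos(x))
  fp  = @(x) -(cos(x) + cos(2*x));    % f'(x)
  fpp = @(x) sin(x) + 2*sin(2*x);     % f''(x)
  x = 1;
  for k = 1:5
      x = x - fp(x)/fpp(x);
  end
  fprintf('x = %.8f, f(x) = %.8f\n', x, -sin(x)*(1 + cos(x)))
  % converges to pi/3 = 1.04719755, where f = -1.29903811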
