Lecture Notes: CE 33500, Computational Methods in Civil Engineering
Taylor's theorem: If f is a function that has k + 1 continuous derivatives over the
interval between a and x, then
$$f(x) = f(a) + (x-a) f'(a) + \frac{(x-a)^2}{2} f''(a) + \ldots + \frac{(x-a)^k}{k!} f^{(k)}(a) + R_{k+1},$$
where
$$R_{k+1} = \frac{(x-a)^{k+1}}{(k+1)!} f^{(k+1)}(\zeta)$$
for some ζ between a and x.
We typically don’t know the value of Rk+1 exactly, since the exact
value of ζ isn’t easily determined. Rk+1 represents the error in approx-
imating f (x) by the first k + 1 terms in the right-hand side, which are
known. But if the difference x − a is small enough, its (k + 1)th power will
also be small, and we might therefore find that the remainder term is small
enough to be negligible.
Taylor’s theorem can also be written in a series form:
$$f(x) = \sum_{i=0}^{k} \frac{(x-a)^i}{i!} f^{(i)}(a) + R_{k+1}.$$
Setting h = x − a, Taylor's theorem can also be written as
$$f(a + h) = f(a) + h f'(a) + \frac{h^2}{2} f''(a) + \ldots + \frac{h^k}{k!} f^{(k)}(a) + R_{k+1}$$
with
$$R_{k+1} = \frac{h^{k+1}}{(k+1)!} f^{(k+1)}(\zeta)$$
for some ζ between a and a + h, or more compactly
$$f(a + h) = \sum_{i=0}^{k} \frac{h^i}{i!} f^{(i)}(a) + R_{k+1}$$
or as an infinite series
$$f(a + h) = \sum_{i=0}^{\infty} \frac{h^i}{i!} f^{(i)}(a).$$
• Taylor’s theorem can be applied to find series expansions for functions in
terms of polynomials:
$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \ldots$$
(converges for all x)
$$\sin(x) = x - \frac{x^3}{6} + \frac{x^5}{120} - \ldots$$
$$\cos(x) = 1 - \frac{x^2}{2} + \frac{x^4}{24} - \ldots$$
$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \ldots$$
(converges for |x| < 1)
$$\log(x + 1) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \ldots$$
(converges for |x| < 1)
(In this class, trigonometric functions are always for the argument x
in radians, and log means the natural logarithm [base e].)
• Several important numerical methods can be directly derived from Tay-
lor’s theorem, including Newton’s method, finite differences, and Euler’s
method:
• Newton’s method for estimating x where some function f is equal to 0
can be derived from Taylor’s theorem:
$$0 = f(a) + (x - a) f'(a) + R_2 \;\rightarrow\; x = a - \frac{f(a)}{f'(a)} - \frac{R_2}{f'(a)} \approx a - \frac{f(a)}{f'(a)},$$
with the approximation good if $R_2$ is small (small h or small second derivative $f^{(2)}$)
Example: Apply Newton's method to estimate the square root of 26 iteratively.
Set $f(x) = x^2 - 26 = 0$, start with $a_0 = 5$, and get
$a_1 = a_0 - \frac{f(a_0)}{f'(a_0)} = 5.1$,
$a_2 = a_1 - \frac{f(a_1)}{f'(a_1)} = 5.0990196078\ldots$,
$a_3 = a_2 - \frac{f(a_2)}{f'(a_2)} = 5.09901951359\ldots$,
giving a series of increasingly accurate numerical estimates of $\sqrt{26}$
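A minimal Matlab sketch of this iteration (the number of iterations shown is arbitrary):
% Newton's method for f(x) = x^2 - 26 = 0, i.e. estimating sqrt(26)
f  = @(x) x.^2 - 26;      % function whose root we want
fp = @(x) 2*x;            % its derivative
a = 5;                    % initial guess a_0
for k = 1:4
    a = a - f(a)/fp(a);   % Newton update
    fprintf('a_%d = %.11f\n', k, a);
end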
• Centered finite difference for estimating the derivative of some function f
at x :
$$f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + R_{3,+}$$
$$f(x - h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - R_{3,-}$$
$$f(x + h) - f(x - h) = 2h f'(x) + R_{3,+} + R_{3,-}$$
$$f'(x) = \frac{f(x + h) - f(x - h)}{2h} - \frac{R_{3,+} + R_{3,-}}{2h} \approx \frac{f(x + h) - f(x - h)}{2h}$$
where
$$R_{3,+} = \frac{h^3}{6} f^{(3)}(\zeta_+), \qquad R_{3,-} = \frac{h^3}{6} f^{(3)}(\zeta_-)$$
for some ζ+ between x and x + h and ζ− between x − h and x.
Example: To estimate the derivative of $f(x) = e^x$ at x = 1, we can
use the approximation derived here with h = 0.1:
$f'(1) \approx \frac{f(1.1) - f(0.9)}{2 \cdot 0.1} = 2.72281456\ldots$
The finite-difference approximation generally becomes more accurate
as we make h closer to zero, because the remainder terms get smaller
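A minimal Matlab sketch of this estimate, also showing the effect of shrinking h (the list of h values is illustrative):
% Centered finite-difference estimate of f'(1) for f(x) = exp(x)
f = @(x) exp(x);
x = 1;
for h = [0.1 0.01 0.001]
    dfdx = (f(x + h) - f(x - h)) / (2*h);   % centered difference
    fprintf('h = %g: estimate = %.8f, error = %.2e\n', h, dfdx, abs(dfdx - exp(1)));
end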
• Euler's method for numerically approximating the value of y(x) given the
differential equation y'(x) = g(x, y(x)) and the initial value y(a) = y0: starting
from x = a, take repeated steps of size h using the first-order Taylor approximation
$y(x + h) \approx y(x) + h\,g(x, y(x))$.
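A minimal Matlab sketch of Euler's method for a scalar test equation (the equation, initial value, and step size are illustrative):
% Euler's method for y'(x) = g(x, y), y(a) = y0, stepping from x = a to x = b
g = @(x, y) -2*y + x;              % illustrative right-hand side
a = 0;  b = 2;  y = 1;  h = 0.01;
nsteps = round((b - a)/h);
x = a;
for k = 1:nsteps
    y = y + h * g(x, y);           % Euler update: y(x+h) ~ y(x) + h*g(x, y(x))
    x = x + h;
end
fprintf('y(%g) is approximately %.6f\n', b, y);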
Usually for an engineering calculation we want the fractional error
introduced during the solution step to be small, say < 10−6 ; sometimes,
we also want the absolute error to be under some threshold.
• Approximate (estimated) error: needed in practice, because we generally
don’t know the exact answer x∗
There are a few ways to estimate approximately how big the error
might be. For example, if we have two different numerical approximations
x1 , x2 of x∗ , we can write
Approximate absolute error: |x1 − x2 |
Approximate fractional error: $\frac{|x_1 - x_2|}{|x_1|}$
Multiplication and division under roundoff are subject to a maximum
fractional error of about the machine epsilon ε (roughly $2 \times 10^{-16}$ for double precision)
Addition of two similar numbers also has maximum roundoff error
similar to ε, but subtraction of two numbers that are almost the same can
have roundoff error much bigger than ε (cancellation of significant digits)
A non-obvious example of subtractive cancellation: computing $e^x$
using the first terms (up to i = imax) of the Taylor series expansion $e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$
(in Matlab, sum(x .^ (0:imax) ./ factorial(0:imax))) when x is a
large negative number, say −20
Another example: Approximating derivatives with finite difference
formulas with small increments h
In general, math operations which incur large roundoff error (like
subtracting similar numbers) tend to be those that are ill-conditioned,
meaning that small fractional changes in the numerical values used can
change the output by a large fractional amount
Examples of ill-conditioned problems: solving linear systems when
the coefficient matrix has a large condition number; trigonometric opera-
tions with a large argument (say, sin(10100 )); the quadratic formula when
b is much larger than a and c; finding a polynomial that interpolates a
large number of given points
Mitigation: Use extended precision; reformulate problems to avoid
subtracting numbers that are very close together
• Truncation error : Running an iterative numerical algorithm for only a
few steps, whereas convergence to the exact answer requires theoretically
an infinite number of steps
Often can be thought of as only considering the first few terms in the
Taylor series
“Steps” can be terms in the Taylor series, iterations of Newton’s
method or bisection for root finding, number of subdivisions in the com-
posite trapezoid rule or Simpson rule for numerical integration, etc.
Detection: Estimate truncation error (and roundoff error) by com-
paring the results of different numerical methods, or the same method run
for different numbers of steps on the same problem
Mitigation: Run for more steps (at the cost of more computations);
use a more accurate numerical method, if available
• Non-mathematical error sources should also be considered in formulating
and interpreting a problem, including measurement errors, variations in
material properties, uncertainties as to the loadings that will be faced,
simplifications in constructing math models . . . . They may have large
effects, but will be discussed more in other courses
Models are always only approximations (like maps) that hopefully
represent the main aspects of interest in an engineering problem
3 Linear systems
3.1 Properties
• Linear systems arise directly from engineering problems (stresses, circuits,
pipes, traffic networks . . . ) as well as indirectly via numerical methods, for
example finite difference and finite element methods for solving differential
equations
• Any system of m linear equations in n unknowns x1 , x2 , x3 , . . . xn (where
each equation looks like a1 x1 + a2 x2 + a3 x3 + . . . + an xn = b, with different
a and b coefficients) can be written in a standard matrix form Ax = b,
where
A is the m × n matrix with each row containing the (known) coeffi-
cients of the unknowns in one linear equation,
x is the n × 1 vector of unknowns,
b is the m × 1 vector of (known) constant terms in the equations.
Can also write the system as an m × (n + 1) “augmented matrix” A|b
(with x implied)
• Does a solution exist? For square systems (m = n), there is a unique
solution equal to A−1 b if A has an inverse.
If A has no inverse (is singular ), then there will be either no solution
or infinitely many solutions.
A has no inverse when the equations of a linear system with A as the
coefficient matrix are not linearly independent of each other. E.g.: the 2
given equations in 2 unknowns are x1 + 2x2 = 3, 2x1 + 4x2 = 6 so that the
coefficient matrix is
$$\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$$
• If a matrix A has an inverse but is very close to a non-invertible matrix,
then
In exact math, any linear system Ax = b has a unique solution
However, a small change in A or b can change the solution x by a lot
(ill-conditioning)
The equations can become linearly dependent if we change the coef-
ficients slightly
Another way of putting this is that there are vectors x far from the
true solution x∗ (large error compared to x∗ ) where Ax is nevertheless
close to b (small residual).
Roundoff errors in the computation have an effect similar to perturb-
ing the coefficient matrix A or b slightly, potentially giving a very wrong
solution (large error), although it will generally have a small residual.
We measure how close a matrix is to a singular matrix by the condi-
tion number, defined as cond(A) = ||A|| × ||A−1 ||, which ranges from 1
to infinity. A matrix with no inverse has infinite condition number. The
condition number gives the factor by which small changes to the coeffi-
cients, due to measurement or roundoff error, can multiply to give a larger
error in x. A large condition number means that linear systems with this
coefficient matrix will be ill-conditioned – changing the numerical values
by a small fraction could change the solution by a large percentage.
• Similarly, a lower triangular matrix has only zero entries for columns more
than the row number, i.e. Ai,j = 0 whenever j > i.
• If the coefficient matrix of a square linear system A is lower triangular,
then it can generally be solved for the unknown xi by forward substitution:
$x_1 = b_1 / A_{1,1}$
for i = 2, 3, . . . to n
    $x_i = \left( b_i - \sum_{j=1}^{i-1} A_{i,j} x_j \right) / A_{i,i}$
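A minimal Matlab sketch of forward substitution (written as a function, e.g. saved as forward_sub.m; assumes nonzero diagonal entries):
% Forward substitution for a lower triangular system A*x = b
function x = forward_sub(A, b)
    n = length(b);
    x = zeros(n, 1);
    x(1) = b(1) / A(1,1);
    for i = 2:n
        x(i) = (b(i) - A(i,1:i-1) * x(1:i-1)) / A(i,i);
    end
end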
• Any square linear system can generally be transformed into an upper
triangular one with the same solution through Gauss elimination:
for j = 1, 2, . . . to n
p = Aj,j (pivot element; j is the pivot row or pivot equation)
for i = j + 1, j + 2, . . . to n
M = Ai,j /p (multiplier)
Ai,: = Ai,: − M × Aj,:
bi = bi − M × bj
• Example: to solve
$$\begin{pmatrix} -1 & 2 & 2 \\ 1 & 1 & 1 \\ 1 & 3 & 2 \end{pmatrix} x = \begin{pmatrix} 8 \\ 1 \\ 4 \end{pmatrix}$$
for x:
$$\left(\begin{array}{ccc|c} -1 & 2 & 2 & 8 \\ 1 & 1 & 1 & 1 \\ 1 & 3 & 2 & 4 \end{array}\right) \text{(augmented matrix)} \rightarrow \left(\begin{array}{ccc|c} -1 & 2 & 2 & 8 \\ 0 & 3 & 3 & 9 \\ 0 & 5 & 4 & 12 \end{array}\right) \rightarrow \left(\begin{array}{ccc|c} -1 & 2 & 2 & 8 \\ 0 & 3 & 3 & 9 \\ 0 & 0 & -1 & -3 \end{array}\right)$$
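A minimal Matlab sketch of Gauss elimination followed by back substitution, applied to this example:
% Gauss elimination (no pivoting) and back substitution for the example system
A = [-1 2 2; 1 1 1; 1 3 2];
b = [8; 1; 4];
n = length(b);
for j = 1:n-1
    for i = j+1:n
        M = A(i,j) / A(j,j);           % multiplier
        A(i,:) = A(i,:) - M * A(j,:);  % eliminate the entry below the pivot
        b(i)   = b(i)   - M * b(j);
    end
end
x = zeros(n, 1);                        % back substitution on the triangular system
for i = n:-1:1
    x(i) = (b(i) - A(i,i+1:n) * x(i+1:n)) / A(i,i);
end
disp(x)    % solution of the original system ([-2; 0; 3] here)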
Find row l such that |Al,j | is the maximum of all the elements in
column j that are at or below row j.
Interchange rows l and j in the matrix A. (This is the same as
multiplying A (and then b) by a permutation matrix Q, which is an
identity matrix with rows l and j interchanged.)
Interchange elements l and j in the vector b.
$$\left(\begin{array}{ccc|c} 0 & 2 & 2 & -2 \\ 1 & -2 & -1 & 2 \\ 4 & -2 & 4 & 4 \end{array}\right) \rightarrow \left(\begin{array}{ccc|c} 4 & -2 & 4 & 4 \\ 1 & -2 & -1 & 2 \\ 0 & 2 & 2 & -2 \end{array}\right) \rightarrow \left(\begin{array}{ccc|c} 4 & -2 & 4 & 4 \\ 0 & 2 & 2 & -2 \\ 0 & -\frac{3}{2} & -2 & 1 \end{array}\right) \rightarrow \left(\begin{array}{ccc|c} 4 & -2 & 4 & 4 \\ 0 & 2 & 2 & -2 \\ 0 & 0 & -\frac{1}{2} & -\frac{1}{2} \end{array}\right)$$
Giving
$$x = \begin{pmatrix} -1 \\ -2 \\ 1 \end{pmatrix}$$
Then U is the transformed A (which is upper triangular) and L is
the matrix built up from the multipliers M , with ones added along the
main diagonal.
Example:
$$\begin{pmatrix} -1 & 2 & 2 \\ 1 & 1 & 1 \\ 1 & 3 & 2 \end{pmatrix} \rightarrow \begin{pmatrix} -1 & 2 & 2 \\ -1| & 3 & 3 \\ -1| & 5 & 4 \end{pmatrix} \rightarrow \begin{pmatrix} -1 & 2 & 2 \\ -1| & 3 & 3 \\ -1 & 5/3| & -1 \end{pmatrix}$$
This gives us the factors in the form (L\U), which can be expanded to
$$L = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 5/3 & 1 \end{pmatrix}, \quad U = \begin{pmatrix} -1 & 2 & 2 \\ 0 & 3 & 3 \\ 0 & 0 & -1 \end{pmatrix}$$
You can then check that LU is in fact equal to the original matrix.
• LU decomposition is useful in solving linear systems because once we have
the decomposition of a matrix A, we can solve any system with the coef-
ficient matrix A using forward and back substitution and only around n2
operations instead of n3 :
Given Ax = b with unknown x and the LU decomposition A = LU,
Solve the lower triangular system Ly = b for y.
Now that y is known, solve the upper triangular system Ux = y for
the unknown x.
• Thus, if we need to solve several problems with the same coefficient matrix
A and different vectors b, it’s more efficient to find the LU decomposition
of A and use that to solve for the different unknown vectors, instead of
repeating the Gauss elimination for each problem.
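A minimal Matlab sketch of this reuse, using the built-in lu function (which includes row pivoting, so it returns P with PA = LU as discussed next); the second right-hand side is an illustrative value:
% Reuse one LU factorization to solve A*x = b for several right-hand sides
A = [-1 2 2; 1 1 1; 1 3 2];
[L, U, P] = lu(A);            % P*A = L*U
B = [8 1; 1 0; 4 2];          % each column is one right-hand-side vector b
X = zeros(size(B));
for k = 1:size(B, 2)
    y = L \ (P * B(:,k));     % forward substitution (lower triangular solve)
    X(:,k) = U \ y;           % back substitution (upper triangular solve)
end
disp(X)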
• What about row pivoting?
If we include row pivoting in the LU decomposition above, what we
will get is a permuted LU decomposition with factors L and U such that
PA = LU, where P is a permutation matrix that represents the row
interchanges:
Start with P = I, an n × n identity matrix.
for j = 1, 2, . . . to n
Row pivoting:
Find row l such that |Al,j | is the maximum of all the ele-
ments in column j that are at or below row j.
Interchange rows l and j in the matrix A. (This is the same
as multiplying A by a permutation matrix Q, which is an identity matrix
with rows l and j interchanged.)
Update P ← QP. (Interchange rows l and j of P)
p = Aj,j (pivot element; j is the pivot row)
for i = j + 1, j + 2, . . . to n
M = Ai,j /p (multiplier)
Save M as Li,j
Ai,: = Ai,: − M × Aj,:
Then U is the transformed A (which is upper triangular) and L is
the matrix built up from the multipliers M , with ones added along the
main diagonal.
• Example:
$$\begin{pmatrix} 0 & 2 & 2 & (1) \\ 1 & -2 & -1 & (2) \\ 4 & -2 & 4 & (3) \end{pmatrix} \rightarrow \begin{pmatrix} 4 & -2 & 4 & (3) \\ 1 & -2 & -1 & (2) \\ 0 & 2 & 2 & (1) \end{pmatrix} \rightarrow \begin{pmatrix} 4 & -2 & 4 & (3) \\ \frac{1}{4}| & -\frac{3}{2} & -2 & (2) \\ 0| & 2 & 2 & (1) \end{pmatrix} \rightarrow$$
$$\begin{pmatrix} 4 & -2 & 4 & (3) \\ 0| & 2 & 2 & (1) \\ \frac{1}{4}| & -\frac{3}{2} & -2 & (2) \end{pmatrix} \rightarrow \begin{pmatrix} 4 & -2 & 4 & (3) \\ 0| & 2 & 2 & (1) \\ \frac{1}{4} & -\frac{3}{4}| & -\frac{1}{2} & (2) \end{pmatrix}$$
The product LU is equal to the original matrix with the rows permuted
(switched). In this case the permutation matrix P is
$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$
• In addition to the LU decomposition, a symmetric matrix (one that's the
same as its transpose) may also have a Cholesky decomposition, for which
the upper triangular factor is the transpose of the lower triangular factor
(specifically, the Cholesky decomposition of a symmetric matrix exists if
the matrix is positive definite). This decomposition can be computed with
about half as many arithmetic operations as LU decomposition, and can
be used to help solve linear systems with this coefficient matrix just like the
LU decomposition can.
Given a symmetric positive definite matrix A, its lower triangular
Cholesky factor L (A = LLT ) can be computed as:
for j = 1, 2, . . . to n
    $$L_{j,j} = \sqrt{A_{j,j} - \sum_{k=1}^{j-1} L_{j,k}^2}$$
    for i = j + 1, j + 2, . . . to n
        $$L_{i,j} = \frac{A_{i,j} - \sum_{k=1}^{j-1} L_{i,k} L_{j,k}}{L_{j,j}}$$
Example:
If $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 7 \\ 3 & 7 & 14 \end{pmatrix}$, then $L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 1 & 2 \end{pmatrix}$
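A minimal Matlab sketch of this algorithm applied to the example matrix (the result can be checked against the built-in chol(A, 'lower')):
% Cholesky factorization (lower triangular factor) of a symmetric
% positive definite matrix, following the algorithm above
A = [1 2 3; 2 5 7; 3 7 14];
n = size(A, 1);
L = zeros(n);
for j = 1:n
    L(j,j) = sqrt(A(j,j) - sum(L(j,1:j-1).^2));
    for i = j+1:n
        L(i,j) = (A(i,j) - L(i,1:j-1) * L(j,1:j-1)') / L(j,j);
    end
end
disp(L)    % should match the example: [1 0 0; 2 1 0; 3 1 2]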
$$x'' = -Ax,$$
where $A \equiv M^{-1} K$.
For this system, if $Av = \lambda v$, then $x(t) = v \sin(\sqrt{\lambda}\, t)$ and $x(t) = v \cos(\sqrt{\lambda}\, t)$ are solutions.
Since this is a system of linear differential equations, any linear com-
bination of solutions is also a solution. (There are normally n eigenvalue-
eigenvector pairs λi , vi of A, and we need 2n initial conditions [i.e. the
values of x(0) and x0 (0)] to find a unique solution x(t).)
The general solution is therefore $\sum_{i=1}^{n} c_i v_i \sin(\sqrt{\lambda_i}\, t) + d_i v_i \cos(\sqrt{\lambda_i}\, t)$,
or equivalently $\sum_{i=1}^{n} a_i v_i e^{i\sqrt{\lambda_i}\, t} + b_i v_i e^{-i\sqrt{\lambda_i}\, t}$, where $c_i, d_i$ or $a_i, b_i$ can be
determined from the initial conditions
The eigenvectors are modes of oscillation for the system, and the
eigenvalues are the squared frequencies for each mode.
In analyzing vibrational systems, the first few modes (the ones with
the lowest frequencies/eigenvalues) are usually the most important be-
cause they are the most likely to be excited and the slowest to damp. The
fundamental mode is the one with lowest frequency.
Modes of e.g. a beam or structure can be found experimentally by
measuring the responses induced by vibrations with different frequencies
(modal analysis).
• How do we find eigenvalues and eigenvectors?
For diagonal or (upper/lower) triangular matrices, the eigenvalues are
just the diagonal elements
For general 2 × 2 matrices, we can solve a quadratic equation for the
eigenvalues λ – generally, there will be two (they may be complex). For
each eigenvalue, we can then solve a linear system to get the eigenvector.
Example:
If
$$A = \begin{pmatrix} -1 & 2 \\ 1 & 1 \end{pmatrix},$$
eigenvalues λ and eigenvectors v must be solutions to
$$Av = \lambda v, \text{ or } (A - \lambda I)v = 0.$$
Assuming that v isn't a zero vector, this implies that (A − λI) is not in-
vertible, so its determinant must be zero. But the determinant of (A − λI)
is (−1 − λ)(1 − λ) − 2, so we have the characteristic polynomial
$$\lambda^2 - 3 = 0 \;\rightarrow\; \lambda = \pm\sqrt{3}.$$
To find the eigenvectors corresponding to each λ, we solve the linear system
to find v. We have $Av = \pm\sqrt{3}\, v$, or
$$\begin{pmatrix} -1 \mp \sqrt{3} & 2 \\ 1 & 1 \mp \sqrt{3} \end{pmatrix} v = 0,$$
where the second row is a multiple of the first, so there is not a unique
solution. We have
$$v = \begin{pmatrix} 1 \\ (1 \pm \sqrt{3})/2 \end{pmatrix}$$
or any multiples thereof.
• In general, an n×n matrix has n (complex) eigenvalues, which are the roots
of a characteristic polynomial of order n. We could find the eigenvalues by
writing and solving for this characteristic polynomial, but more efficient
numerical methods exist, for example based on finding a QR factorization
of the matrix (which we won’t cover in this class).
• A conceptually simple numerical method for finding the largest (in abso-
lute value) eigenvalue of any given square matrix is the power method. It
involves the iteration:
Start with an n × 1 vector v
Do until convergence:
$v \leftarrow Av$
$v \leftarrow \frac{v}{\|v\|}$
Example: If
$$A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}, \quad v_0 = \begin{pmatrix} 1 \\ 1 \end{pmatrix},$$
then (using the vector ∞-norm $\|v\|_\infty \equiv \max(|v_i|)$) successive iterations
produce for v
$$\begin{pmatrix} 1 \\ 2/3 \end{pmatrix}, \begin{pmatrix} 1 \\ 5/7 \end{pmatrix}, \begin{pmatrix} 1 \\ 12/17 \end{pmatrix}, \begin{pmatrix} 1 \\ 29/41 \end{pmatrix}, \cdots$$
converging toward the eigenvector $\begin{pmatrix} 1 \\ \sqrt{2}/2 \end{pmatrix}$, which has the eigenvalue $\sqrt{2} + 1$.
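A minimal Matlab sketch of the power method for this example (the iteration count is arbitrary):
% Power method for the dominant eigenvalue/eigenvector of the example matrix,
% normalizing with the infinity norm as in the notes
A = [1 2; 1 1];
v = [1; 1];
for k = 1:50
    w = A * v;
    lambda = max(abs(w));   % infinity norm of A*v, an estimate of the eigenvalue magnitude
    v = w / lambda;         % normalize
end
disp(v)         % approaches [1; sqrt(2)/2]
disp(lambda)    % approaches 1 + sqrt(2)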
• A symmetric real matrix will have only real eigenvalues. Otherwise, eigen-
values of a real matrix, being the roots of a polynomial, may also come in
complex conjugate pairs.
A symmetric matrix is positive definite if (and only if) all its eigen-
values are positive.
• The eigenvalues of the inverse of a matrix (if the inverse exists) are the
reciprocals of the eigenvalues of the matrix, while the eigenvectors of both
are the same.
• For any matrix A, the square root of the ratio of the largest to smallest
eigenvalue of AAT is equal to the (2-norm) condition number of A
5 Differentiation
• Finite difference (centered) to approximate f 0 (x0 ) :
Richardson extrapolation combines estimates $D_i^{j-1}$ computed with successively
halved step sizes into generally more accurate estimates
$$D_i^j \equiv \frac{4^j}{4^j - 1} D_{i+1}^{j-1} - \frac{1}{4^j - 1} D_i^{j-1}$$
The difference between successive estimates can give an estimate of uncertainty,
which may be used as a criterion for convergence. jmax often doesn't need to be
large to get a very accurate estimate, which makes roundoff error less of a problem.
Example of estimates obtained:
6 Integration
6.1 Introduction
• Some applications of integrals
Average function value between a and b: $\frac{1}{b-a}\int_a^b f(x)\,dx$
Center of mass (in 1-D): $\frac{\int_a^b x\,\rho(x)\,dx}{\int_a^b \rho(x)\,dx}$ (ρ = density)
Moment of inertia about x = x0 (in 1-D): $\int_a^b (x - x_0)^2\,\rho(x)\,dx$
Net force produced by a distributed loading: $\int_a^b w(x)\,dx$ (w = force per unit length)
Net moment about x = x0 produced by a distributed loading: $\int_a^b (x - x_0)\,w(x)\,dx$
• Typical situations where we need to approximate an integral $I = \int_a^b f(x)\,dx$
numerically:
The function f doesn’t have an analytic integral
No mathematical expression for function is available – we can only
measure values or get them from a computation.
6.3 Composite rules
• For numerical approximations of an integral, we can take advantage of the
fact that integrals are additive – if we divide the integration interval [a, b]
into [a, c] and [c, b], the integral $I = \int_a^b f(x)\,dx$ is equal to $I_1 = \int_a^c f(x)\,dx$
plus $I_2 = \int_c^b f(x)\,dx$.
Composite forms of the trapezoid, midpoint, or Simpson rules di-
vide [a, b] up into n subintervals, apply the rule to each subinterval, and
then add up the results to get an approximation for the whole integral.
Generally, this improves the accuracy of the approximation compared to
applying the rule to the whole interval.
For the composite Simpson’s rule when the interval has equally spaced
points a = x0 , x1 , x2 , ...xn = b, we get a 1-4-2-4-2-. . . 4-2-4-1 weighting (For
the trapezoid rule, it’s 1-2-2-2-. . . 2-1)
Adaptive subdivision can be used to achieve high accuracy for numer-
ical integration with fewer computations.
An analogue to Taylor's theorem can be used to derive absolute error
bounds for the composite rules with equally spaced evaluation points for
a function f that is smooth in [a, b]: $\frac{(b-a)^3}{12}\frac{K_2}{n^2}$ for the trapezoid rule,
$\frac{(b-a)^3}{24}\frac{K_2}{n^2}$ for the midpoint rule, and $\frac{(b-a)^5}{180}\frac{K_4}{n^4}$ for Simpson's rule,
where $K_2$ and $K_4$ are the maximum absolute values of $f''$ and $f^{(4)}$
in (a, b). Although we often don't know the maximum value of derivatives
of f , these bounds are nevertheless useful for estimating how the error will
decrease as a result of increasing the number of intervals n.
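A minimal Matlab sketch of the composite trapezoid and Simpson rules on n equal subintervals (the integrand and n are illustrative; n must be even for Simpson):
% Composite trapezoid and Simpson approximations of the integral of f over [a, b]
f = @(x) exp(-x.^2);  a = 0;  b = 1;  n = 8;
x = linspace(a, b, n+1);  y = f(x);  h = (b - a)/n;
I_trap = h * (y(1)/2 + sum(y(2:end-1)) + y(end)/2);        % 1-2-2-...-2-1 weights
I_simp = h/3 * (y(1) + 4*sum(y(2:2:end-1)) + ...
                2*sum(y(3:2:end-2)) + y(end));             % 1-4-2-...-2-4-1 weights
fprintf('trapezoid: %.8f   Simpson: %.8f\n', I_trap, I_simp);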
• To integrate functions whose values are only available at certain points
(which may be unequally spaced), there are a few options:
Composite trapezoid rule works even for unequally spaced intervals
Can interpolate the points with an easily integrable function (such as
a polynomial or cubic spline) and integrate the interpolating function.
Because the error from the composite trapezoid rule decreases as the
number of subintervals squared, the error of $R_{i+1}^0$ is expected to be about
1/4 that of $R_i^0$, and in the same direction
We exploit this by coming up with a generally more accurate estimate
$R_i^1 \equiv \frac{4}{3} R_{i+1}^0 - \frac{1}{3} R_i^0$.
Can continue, with $R_i^j \equiv \frac{4^j}{4^j - 1} R_{i+1}^{j-1} - \frac{1}{4^j - 1} R_i^{j-1}$ for any j ≥ 1, to
obtain generally even more accurate estimates.
Difference between two estimates can give an estimate of uncertainty,
which may be used as a criterion for convergence. For smooth functions,
i often doesn’t need to be large to get a very accurate estimate.
Algorithm:
for j = 0, 1, . . . jmax:
    Evaluate the function at $2^j + 1$ equally spaced points, including the
endpoints a and b (giving $2^j$ equal-width subintervals), and obtain the composite
trapezoid estimate $R_j^0$
    Find $R_{j-i}^i$, i = 1, . . . j, using the formula $R_i^j \equiv \frac{4^j}{4^j - 1} R_{i+1}^{j-1} - \frac{1}{4^j - 1} R_i^{j-1}$
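A minimal Matlab sketch of Romberg integration following this algorithm (the test integrand and jmax are illustrative):
% Romberg integration: composite trapezoid estimates on 2^i subintervals,
% refined by repeated Richardson extrapolation
f = @(x) exp(-x.^2);  a = 0;  b = 1;  jmax = 4;
R = zeros(jmax+1);                      % R(i+1, j+1) stores R_i^j
for i = 0:jmax
    n = 2^i;  x = linspace(a, b, n+1);  y = f(x);  h = (b - a)/n;
    R(i+1, 1) = h * (y(1)/2 + sum(y(2:end-1)) + y(end)/2);   % R_i^0 (trapezoid)
end
for j = 1:jmax
    for i = 0:jmax-j
        R(i+1, j+1) = 4^j/(4^j - 1) * R(i+2, j) - 1/(4^j - 1) * R(i+1, j);
    end
end
fprintf('Romberg estimate: %.10f\n', R(1, jmax+1));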
• Gauss quadrature
Estimate an integral based on the function value at specific non-
equally spaced points within the interval (more points closer to the edges)
Select the sample points and weights based on approximating the
function as a polynomial of degree 2n − 1, where n is the number of points
In practice, tabulated values of the sample points xi and weights wi
for the standard integration interval [−1, 1] are available
To approximate $I = \int_{-1}^{1} f(x)\,dx$, use $G_n = \sum_{i=1}^{n} w_i f(x_i)$
To approximate $I = \int_a^b f(x)\,dx$, use
$$G_n = \frac{b-a}{2} \sum_{i=1}^{n} w_i \cdot f\left(a + \frac{b-a}{2}(x_i + 1)\right)$$
Can give very accurate numerical integral estimates with few function
evaluations (small n).
With given n, could divide the integration interval into parts and
apply Gauss quadrature to each one in order to get increased accuracy
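A minimal Matlab sketch of 2-point Gauss quadrature on [a, b], using the tabulated sample points ±1/√3 and weights 1, 1 for [−1, 1] (the integrand is illustrative):
% Two-point Gauss quadrature for the integral of f over [a, b]
f = @(x) exp(-x.^2);  a = 0;  b = 1;
xi = [-1/sqrt(3), 1/sqrt(3)];   % tabulated sample points for [-1, 1]
wi = [1, 1];                    % corresponding weights
G = (b - a)/2 * sum(wi .* f(a + (b - a)/2 * (xi + 1)));
fprintf('2-point Gauss estimate: %.8f\n', G);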
• Both Romberg integration and Gauss quadrature are only applicable if
we can find the function value at the desired points. Also, they may
not be more accurate than simpler methods if the function is not smooth
(e.g. has discontinuities). Essentially this is because they both rely on
approximating the function by the first terms in its Taylor series.
$$\frac{d^2\theta}{dt^2} + c\frac{d\theta}{dt} + \frac{g}{L}\sin(\theta) = a\sin(\Omega t)$$
(second-order equation: highest-order derivative of θ is 2)
Can be written as
$$\frac{dy_1}{dt} = y_2$$
$$\frac{dy_2}{dt} = -c y_2 - \frac{g}{L}\sin(y_1) + a\sin(\Omega t)$$
where $y_1 = \theta$, $y_2 = \frac{d\theta}{dt}$.
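A minimal Matlab sketch of this first-order system stepped with Euler's method (the parameter values, initial conditions, and step size are illustrative):
% Forced, damped pendulum written as a first-order system and stepped with Euler
c = 0.2;  g = 9.81;  L = 1;  a = 0.5;  Omega = 2;
gfun = @(t, y) [y(2);
                -c*y(2) - (g/L)*sin(y(1)) + a*sin(Omega*t)];
h = 0.01;  t = 0;  y = [0.1; 0];    % y = [theta; dtheta/dt]
for k = 1:1000
    y = y + h * gfun(t, y);         % Euler update applied to the whole system
    t = t + h;
end
fprintf('theta(%.2f) is approximately %.4f\n', t, y(1));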
For Euler's method, the local error is bounded by
$$\frac{h^2}{2} K_2,$$
where $K_2$ is the maximum of $|y''|$ between t and t + h
• Global truncation error: error for far end of interval, after n = |b − a|/h
steps
The global error may depend in a complicated way on the local errors
at each step, but usually can be roughly approximated as the number of
steps times the local truncation error for each step
For Euler's method, the estimated global error using this approximation
is $|b - a| \cdot \frac{h}{2} \cdot y''(c)$ for some c between a and b. Thus, the estimated
global error is proportional to h (first-order accuracy).
The RK2 local truncation error is bounded by
$$\frac{h^3}{12} K_3,$$
where $K_3$ is the maximum of $|y^{(3)}|$ between t and t + h (same as for
the trapezoid rule in integration). Thus, the estimated global error is
proportional to $h^2$ (second order).
The RK4 local truncation error is bounded by
$$\frac{h^5}{2880} K_5,$$
where $K_5$ is the maximum of $|y^{(5)}|$ between t and t + h (same as for
Simpson's rule in integration). Thus, the estimated global error is proportional
to $h^4$ (fourth order). Usually, it will be much more accurate than first-
or second-order numerical methods (such as Euler or RK2) for the same
step size h.
• The Euler method as well as the other explicit and implicit methods can be
extended readily to systems of ODEs – just run through all the equations
in the system at each timestep to go from y(t) to y(t + h) (but for the
implicit methods, will need to solve a system of equations at each timestep)
• In practice, how do we estimate the local truncation error (since we don’t
generally know the values of higher derivatives of y)? The usual method
is to compare two estimates of y(t + h), obtained by different numerical
methods or different step sizes.
The estimated error can be used to adjust the step size h adap-
tively so that the global truncation error is within the given tolerance: if
the estimated error is very small the step size is increased so that fewer
computations are needed, while if the estimated error is too large the step
size is decreased so that the answer is accurate enough.
8 Ordinary differential equations: Boundary value
problems
• In initial-value ODE problems (which is the type we’ve been doing so far),
the conditions are all given at one value of the independent variable t, say
t = a. By contrast, in boundary-value problems, the conditions are spread
out between different values of t. Therefore, we can’t use methods that
start at the known initial condition and take steps away from it, as in the
Euler method and Runge-Kutta methods, to find a numerical solution.
• Example of a boundary-value problem: A beam supported at both ends
with nonlinear deflection, so that
$$\frac{y''}{(1 + (y')^2)^{3/2}} - \frac{T}{EI} y = \frac{w x (L - x)}{2EI},$$
where y(x) is the deflection of the beam, T the axial tension, EI the flexural
rigidity, and w the distributed load per unit length, with boundary conditions
y(0) = 0 and y(L) = 0 at the two supports.
• Finite-difference is one method of numerically solving boundary value
problems. The idea is to find y on an equally spaced grid, in the beam
example case between x = 0 and x = L, where the points on the grid are
designated 0 = x0 , x1 , x2 , . . . xn = L and the corresponding y values are
y0 , y1 , y2 , . . . yn . We get one equation at each xi . For the beam example,
this is
$$\frac{y''(x_i)}{(1 + (y'(x_i))^2)^{3/2}} - \frac{T}{EI} y_i = \frac{w x_i (L - x_i)}{2EI}.$$
To continue we need expressions for the derivatives of y(x) at each xi in
terms of the xi and yi . We approximate these by finite-difference formulas,
for example (all these formulas are second-order-accurate):
$$y'(x_i) \approx \frac{y_{i+1} - y_{i-1}}{2h}$$
$$y''(x_i) \approx \frac{y_{i+1} - 2y_i + y_{i-1}}{h^2}$$
The resulting system of algebraic equations for the approximate y
values at the grid points is a linear system with one equation per grid point,
which can be solved for the approximate values y0, y1, . . . , y5 at the grid points.
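As an illustration of the approach (using a simpler, linear test problem rather than the example above), a minimal Matlab sketch:
% Finite-difference solution of the linear test BVP y'' = -y + x on [0, 1]
% with boundary conditions y(0) = 0 and y(1) = 0 (grid size n is illustrative)
n = 10;  h = 1/n;  x = (0:n)' * h;
A = zeros(n+1);  b = zeros(n+1, 1);
A(1,1) = 1;        b(1)   = 0;         % boundary condition y(0) = 0
A(n+1,n+1) = 1;    b(n+1) = 0;         % boundary condition y(1) = 0
for i = 2:n                             % interior grid points
    A(i, i-1) = 1/h^2;                  % from the centered y'' formula
    A(i, i)   = -2/h^2 + 1;             % y'' + y = x  ->  coefficient of y_i
    A(i, i+1) = 1/h^2;
    b(i) = x(i);
end
y = A \ b;                              % solve the linear system for the grid values
plot(x, y, 'o-'), xlabel('x'), ylabel('y')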
9 Interpolation
• Interpolation: fitting a function of a given type, say f , through some given
points (xi , yi ) so that for all the given points, f (xi ) = yi .
• Why interpolate data?
We may have measured responses at a few points, need responses at
intermediate points.
We may have evaluated a complicated function (such as the solution
to a differential equation) at a few points, and want estimates for its value
at many other points.
We may want to estimate the average value, integral, derivative, . . . of
a measured quantity or difficult-to-evaluate function whose value is known
only at some points.
• Polynomial interpolation
For any set of n points (xi , yi ) with distinct xi , there’s a unique
polynomial p of degree n − 1 such that p(xi ) = yi .
For n > 5 or so, polynomial interpolation tends to in most cases give
oscillations around the given points, so the interpolating polynomial often
looks unrealistic.
Polynomial interpolation is quite nonlocal and ill-conditioned, espe-
cially for larger n: if you change slightly one of the points to interpolate,
the whole curve will often change substantially.
Finding the interpolating polynomial through given points:
Lagrange form:
$$\sum_{i=1}^{n} y_i \prod_{\substack{j=1 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}$$
Example: if x = [0 1 -2 2 -1]’ and y = [-3 -2 1 -4 1]’, the
Lagrange form of the interpolating polynomial is
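A minimal Matlab sketch that evaluates this Lagrange form numerically for the given points (the query grid is illustrative):
% Evaluate the Lagrange interpolating polynomial through the example points
xi = [0 1 -2 2 -1]';   yi = [-3 -2 1 -4 1]';
n  = length(xi);
xq = linspace(-2, 2, 101)';             % points at which to evaluate the polynomial
p  = zeros(size(xq));
for i = 1:n
    Li = ones(size(xq));                % i-th Lagrange basis polynomial
    for j = [1:i-1, i+1:n]
        Li = Li .* (xq - xi(j)) / (xi(i) - xi(j));
    end
    p = p + yi(i) * Li;
end
plot(xq, p, '-', xi, yi, 'o')           % the curve passes through all given points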
The spline coefficients that interpolate a given set of points can be
found by solving a linear system. For a cubic spline:
Let the n + 1 given points be (xi , yi ), for i = 0, 1, . . . n
Each piecewise cubic polynomial (which interpolates over the in-
terval [xi−1 , xi ]) can be written as Si (x) = ai (x − xi−1 )3 + bi (x − xi−1 )2 +
ci (x − xi−1 ) + di , where i = 1, 2, . . . n
The 4n − 2 conditions for the piecewise polynomials to form a cubic
spline that interpolates the given points are
Si (xi−1 ) = yi−1 , i = 1, 2, . . . n
Si (xi ) = yi , i = 1, 2, . . . n
$S_i'(x_i) = S_{i+1}'(x_i)$, i = 1, 2, . . . n − 1
$S_i''(x_i) = S_{i+1}''(x_i)$, i = 1, 2, . . . n − 1
We can find the bi by solving a linear system that includes the
following n − 2 equations, plus two more from the boundary conditions:
$h_{i-1} b_{i-1} + 2(h_{i-1} + h_i) b_i + h_i b_{i+1} = 3(\Delta_i - \Delta_{i-1})$, for i = 2, 3, . . . n − 1
where $h_i \equiv x_i - x_{i-1}$, $\Delta_i \equiv \frac{y_i - y_{i-1}}{h_i}$
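Rather than building this system by hand, a minimal Matlab sketch using the built-in spline function (which sets up and solves the corresponding system internally, with not-a-knot end conditions by default):
% Cubic spline interpolation of the earlier example points using Matlab's spline
xi = [0 1 -2 2 -1];   yi = [-3 -2 1 -4 1];
[xs, order] = sort(xi);   ys = yi(order);   % spline expects increasing x values
xq = linspace(min(xs), max(xs), 200);
yq = spline(xs, ys, xq);                    % evaluate the interpolating spline
plot(xq, yq, '-', xs, ys, 'o')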
Piecewise cubic (like cubic spline)
Differences from cubic spline:
Second derivative not continuous (so less smooth)
Extreme points only at the nodes (so never goes above or below the
range formed by the two adjacent points – good when the values should
stay within an acceptable range)
10 Regression
• Least squares fitting is a kind of regression
Regression: fit a function of a given type, say f , approximately through
some given points (xi , yi ) so that for the given points, f (xi ) ≈ yi .
If the points are given as n × 1 vectors x, y, the residual vector of the
fitted function is r = y − f (x) (i.e. ri = yi − f (xi ))
The least squares criterion: Out of all the functions in the given type,
minimize the residual sum of squares, $\mathrm{RSS} = r^T r = \sum_{i=1}^{n} r_i^2$
• Suppose the function type is such that we can write f (x) = Aβ, where A
is a known n × m design matrix for the given x while β is an unknown
m×1 vector of ‘parameters’. That is, f (xi ) = Ai,1 β1 +Ai,2 β2 +. . . Ai,m βm .
Example: the function type is a straight line, f (x) = ax + b. Then
row i of A consists of (xi , 1), and β is [a, b]T (m = 2).
Example: the function type is a quadratic with zero intercept, $f(x) = cx^2$.
Then row i of A consists of $(x_i^2)$, and β is [c] (m = 1).
• In that case, the residual sum of squares (RSS) $r^T r = \|r\|_2^2$ is equal to
$(A\beta - y)^T (A\beta - y)$. Under least squares, we want to choose β so that
this quantity is as small as possible.
To find a minimum of the residual sum of squares, we take its derivative
with respect to β, which works out to be $2A^T(A\beta - y)$.
If we set this equal to zero, we then get for β the m × m linear system
$$A^T A \beta = A^T y$$
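A minimal Matlab sketch of a straight-line least squares fit via the normal equations (the data values are illustrative):
% Linear least squares fit of f(x) = a*x + b using the normal equations
x = [0 1 2 3 4]';   y = [1.1 1.9 3.2 3.8 5.1]';
A = [x, ones(size(x))];        % design matrix: row i is (x_i, 1)
beta = (A'*A) \ (A'*y);        % solve A'A*beta = A'y for [a; b]
r = y - A*beta;                % residual vector
RSS = r' * r;                  % residual sum of squares
fprintf('a = %.4f, b = %.4f, RSS = %.4f\n', beta(1), beta(2), RSS);
% (In practice, beta = A \ y computes the same least squares fit more stably.)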
• An important question in regression is which function class we should
choose to fit given data, for example what degree polynomial to use. In
general, function classes with many parameters (like high-degree polyno-
mials) can fit any given data better (smaller RSS), but are not always
better at predicting new data that were not included in the fit. Some
ways to decide which function class to choose are:
Graphical inspection: Plot each fitted function (as a line) together
with the given points (as a scatter). Select the function class with fewest
unknown parameters that seems to follow the overall trend of the data.
Out-of-sample validation: Divide the available data into “training”
and “validation” subsets. Fit the parameters for each model using only
the training data. Choose the model with lowest RSS for the validation
data.
Numerical criteria:
Adjusted $R^2$:
$$R_a^2 = 1 - \frac{n-1}{n-m}\,\frac{\mathrm{RSS}}{\mathrm{TSS}}$$
(TSS = total sum of squares of the y values about their mean),
where n is the number of data points and m is the number of unknown
parameters in each model. Each model is fitted to the data and Ra2 is
computed, and the model with highest Ra2 is chosen as likely to be best
at predicting the values of new points.
Another commonly used rule is the Akaike information criterion
(AIC), which can be given as follows for linear least squares:
$$\mathrm{AIC} = n \log(\mathrm{RSS}/n) + 2m$$
(log designates natural logarithm). Each model is fitted to the data and
AIC is computed, and the model with lowest AIC is chosen as likely to be
best at predicting the values of new points.
• In nonlinear least squares fitting, the function form is such that finding the
least squares parameter values for it requires solving a system of nonlinear
equations. In general this is a more difficult numerical problem.
• Newton's method: The iteration
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$
converges to a root $x^*$ of f if the initial value $x_0$ is close to $x^*$.
e.g. the above example with x0 = 1
Only needs one iteration to find x∗ if f (x) is linear (straight line)
Sometimes doesn’t converge if f (x) is strongly nonlinear (curved)
between x0 and x∗. Most likely to converge if x0 is close to x∗.
Like many numerical methods, Newton’s method (stopped after finitely
many iterations) will usually only give an approximate answer, but is eas-
ily implementable on a computer.
• One possible “stopping criterion” would be to check when the (absolute or
relative) difference between xn and xn−1 is small enough, as an estimate
of the error relative to the unknown true value x∗ .
• Another possible stopping criterion would be for f(xn) to be close enough
to f(x∗) = 0, i.e. for |f(xn)| to be small enough. This difference is called the
"residual" and is another measure of the error in our estimate of x∗ after n iterations.
• Newton's method can be generalized for a function of a vector, f(x), where
the "Jacobian" matrix of partial derivatives J(x) has the elements $J_{ij}(x) = \frac{\partial f_i}{\partial x_j}$:
$$x_{i+1} = x_i - J^{-1}(x_i)\, f(x_i)$$
Number of iterations
Uncertainty in the root: $\frac{|b-a|}{2} < \mathrm{tol}$ (absolute uncertainty) or
$\frac{|b-a|}{|b+a|} < \mathrm{tol}$ (fractional uncertainty)
Small residual: |f(c)| less than some specified tolerance
• Example for f (x) = x2 − 2:
Initialize: a ← 1, b ← 2
Iteration 1: c ← 1.5, f (c) > 0, b ← c
Iteration 2: c ← 1.25, f (c) < 0, a ← c
Iteration 3: c ← 1.375, f (c) < 0, a ← c
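A minimal Matlab sketch of bisection for this example (the tolerance is illustrative):
% Bisection for f(x) = x^2 - 2 starting from the bracket [1, 2]
f = @(x) x.^2 - 2;
a = 1;  b = 2;  tol = 1e-8;
while (b - a)/2 > tol
    c = (a + b)/2;
    if f(c) == 0
        break                          % hit the root exactly
    elseif sign(f(c)) == sign(f(a))
        a = c;                         % root lies in [c, b]
    else
        b = c;                         % root lies in [a, c]
    end
end
fprintf('root estimate: %.8f\n', (a + b)/2);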
• The secant method is similar to Newton’s method in that it’s based on
locally approximating the function as a straight line, but it doesn’t require
us to be able to compute the derivative, instead estimating it from the
difference in function value between two points.
Like Newton’s method, this method converges fast (few iterations)
if the function f is locally pretty close to a straight line, but may not
converge at all if the function is very nonlinear.
• Algorithm:
Start with two initial points a and b close to the root (can be the
same as the starting points for bisection)
For each iteration:
Find f (a) and f (b)
Estimate the function’s slope (first derivative) as
$$s = \frac{f(b) - f(a)}{b - a}$$
Compute c ← b − f (b)/s
Set a ← b, b ← c.
Possible stopping criteria:
Number of iterations
Estimated uncertainty in the root, |f (b)/s| < tol
Small residual: |f(c)| less than some specified tolerance
• Example for f (x) = x2 − 2:
Initialize: a ← 1, b ← 2
Iteration 1: s = 3, c ← 1.333, a ← b, b ← c
Iteration 2: s = 3.333, c ← 1.4, a ← b, b ← c
Iteration 3: s = 2.733, c ← 1.4146, a ← b, b ← c
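A minimal Matlab sketch of the secant method for this example (the iteration count is arbitrary):
% Secant method for f(x) = x^2 - 2, starting from a = 1, b = 2
f = @(x) x.^2 - 2;
a = 1;  b = 2;
for k = 1:8
    s = (f(b) - f(a)) / (b - a);   % estimated slope
    c = b - f(b)/s;                % secant update
    a = b;  b = c;
end
fprintf('root estimate: %.10f\n', b);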
• The false position method chooses c in each iteration the same way as in
the secant method, but then finds a new bracketing interval for the next
iteration as in bisection. This is more reliable than the secant method in
that it should converge even for functions with ‘flat’ parts.
• Algorithm:
Start by finding two points a and b on either side of the root, such
that f (a) < 0 and f (b) > 0 (as in bisection). Set c ← (a + b)/2.
For each iteration:
Find f (a) and f (b)
Estimate the function’s slope (first derivative) as
$$s = \frac{f(b) - f(a)}{b - a}$$
Compute c ← c − f (c)/s
Find f (c)
If f (c) = 0, return c as the root
If f (c) > 0, set b ← c
If f (c) < 0, set a ← c
Possible stopping criteria: As in bisection, or if c changes by less than
some tolerance between iterations.
• Example for f (x) = x2 − 2:
Initialize: a ← 1, b ← 2
Iteration 1: s = 3, c ← 1.4167, f (c) > 0, b ← c
Iteration 2: s = 2.417, c ← 1.4138, f (c) < 0, a ← c
Iteration 3: s = 2.831, c ← 1.4142, f (c) < 0, a ← c
12 Optimization
• Choose x to maximize or minimize a function f (x); examples: minimize
f (x) = −x2 + 2x; maximize f (θ) = 4 sin(θ)(1 + cos(θ))
max f (x) is equivalent to min −f (x)
• Some optimization methods:
1) Golden section search (similar in robustness and convergence rate
to bisection; finds a local optimum) – named after the golden section ratio
$\phi = \frac{\sqrt{5}+1}{2}$ – here given for finding a local minimum
Start with a bracketing interval [a, b] that contains a minimum of
f (x), and with c ← a + (φ − 1)(b − a). Find f (a), f (b), f (c)
Iterate the following until convergence (e.g. until |b − a| < tol,
where tol is the maximum allowable error):
Find d ← a + (φ − 1)(c − a) and f (d)
If f (d) < f (c), set b ← c and c ← d
Otherwise, set a ← b and b ← d (so that c stays the same for the
next iteration)
Example: Find minimum of f (x) = − sin(x)(1 + cos(x)) for x in
[−π π] :
# a b c d f (c) f (d)
1 −π π 0.74162942 −0.74162942 −1.17357581 1.17357581
2 π −0.74162942 0.74162942 1.65833381 −1.17357581 −0.90908007
3 −0.74162942 1.65833381 0.74162942 0.17507496 −1.17357581 −0.34570127
4 1.65833381 0.17507496 0.74162942 1.09177934 −1.17357581 −1.29647964
5 1.65833381 0.74162942 1.09177934 1.30818389 −1.29647964 −1.21641886
6 0.74162942 1.30818389 1.09177934 0.95803397 −1.29647964 −1.28855421
7 1.30818389 0.95803397 1.09177934 −1.29647964
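A minimal Matlab sketch of this search for the example function (the tolerance is illustrative):
% Golden section search for a local minimum of f(x) = -sin(x)*(1 + cos(x))
% starting from the bracketing interval [-pi, pi]
f = @(x) -sin(x).*(1 + cos(x));
phi = (sqrt(5) + 1)/2;
a = -pi;  b = pi;  tol = 1e-6;
c = a + (phi - 1)*(b - a);
while abs(b - a) > tol
    d = a + (phi - 1)*(c - a);
    if f(d) < f(c)
        b = c;  c = d;        % keep d as the new best point
    else
        a = b;  b = d;        % c stays the same for the next iteration
    end
end
fprintf('minimum near x = %.6f, f(x) = %.6f\n', c, f(c));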