Numerical Analysis Lecture Notes
Endre Süli
Mathematical Institute
University of Oxford
2011
E.g.: n = 1: the interpolant is the straight line through the two data points (so it is linear, or constant if f0 = f1).
Note: degree ≤ 1 =⇒ pn ∈ Πn .
Theorem. ∃pn ∈ Πn such that pn (xi) = fi for i = 0, 1, . . . , n.
Proof. Consider, for k = 0, 1, . . . , n,
Ln,k(x) = [ (x − x0) · · · (x − xk−1)(x − xk+1) · · · (x − xn) ] / [ (xk − x0) · · · (xk − xk−1)(xk − xk+1) · · · (xk − xn) ] ∈ Πn .   (1)
Then Ln,k(xk) = 1 and Ln,k(xi) = 0 for i ≠ k. So now define
pn(x) = Σ_{k=0}^n fk Ln,k(x) ∈ Πn   (2)
=⇒ pn(xi) = Σ_{k=0}^n fk Ln,k(xi) = fi for i = 0, 1, . . . , n. □
Matlab:
% matlab
>> help lagrange
LAGRANGE Plots the Lagrange polynomial interpolant for the
given DATA at the given KNOTS
>> lagrange([1,1.2,1.3,1.4],[4,3.5,3,0]);
[Figure: the cubic interpolant through the four data points (1, 4), (1.2, 3.5), (1.3, 3), (1.4, 0).]
>> lagrange([0,2.3,3.5,3.6,4.7,5.9],[0,0,0,1,1,1]);
[Figure: the degree-5 interpolant through the six data points; it oscillates between roughly −40 and 60 on [0, 6].]
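The lagrange routine above comes from the course website and is not reproduced in these notes; the following is only a rough Matlab sketch of how an interpolant of the form (2) might be evaluated and plotted (the name lagrange_sketch and its interface are illustrative, not the course code).

% lagrange_sketch.m -- evaluate and plot the Lagrange interpolant (2)
% through the data (knots, data); a sketch only, not the course lagrange.m
function lagrange_sketch(knots, data)
  t = linspace(min(knots), max(knots), 400);      % plotting points
  p = zeros(size(t));
  n = length(knots);
  for k = 1:n
    L = ones(size(t));                            % build L_{n,k} as in (1)
    for i = [1:k-1, k+1:n]
      L = L .* (t - knots(i)) / (knots(k) - knots(i));
    end
    p = p + data(k) * L;                          % accumulate f_k L_{n,k}
  end
  plot(t, p, knots, data, 'o')
end

For instance, lagrange_sketch([1,1.2,1.3,1.4],[4,3.5,3,0]) would produce a plot like the first one above.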
Data from an underlying smooth function: Suppose that f (x) has
at least n + 1 smooth derivatives in the interval (x0, xn). Let fk = f (xk ) for
k = 0, 1, . . . , n, and let pn be the Lagrange interpolating polynomial for the
data (xk , fk ), k = 0, 1, . . . , n.
Error: how large can the error f (x) − pn (x) be on the interval [x0, xn]?
Theorem. For every x ∈ [x0, xn] there exists ξ = ξ(x) ∈ (x0, xn) such that
e(x) := f(x) − pn(x) = (x − x0)(x − x1) · · · (x − xn) f^{(n+1)}(ξ)/(n + 1)! ,
where f^{(n+1)} is the (n + 1)-st derivative of f .
Proof. Trivial for x = xk , k = 0, 1, . . . , n, as e(x) = 0 by construction. So suppose x ≠ xk . Let
φ(t) := e(t) − ( e(x)/π(x) ) π(t),
where
π(t) := (t − x0)(t − x1) · · · (t − xn) = t^{n+1} − ( Σ_{i=0}^n xi ) t^n + · · · + (−1)^{n+1} x0 x1 · · · xn ∈ Πn+1 .
Now note that φ vanishes at n + 2 points, x and xk , k = 0, 1, . . . , n. =⇒
φ′ vanishes at n + 1 points ξ0 , . . . , ξn between these points =⇒ φ″ vanishes
at n points between these new points, and so on until φ^{(n+1)} vanishes at an
(unknown) point ξ in (x0, xn). But
φ^{(n+1)}(t) = e^{(n+1)}(t) − ( e(x)/π(x) ) π^{(n+1)}(t) = f^{(n+1)}(t) − ( e(x)/π(x) ) (n + 1)!
since pn^{(n+1)}(t) ≡ 0 and because π(t) is a monic polynomial of degree n + 1.
The result then follows immediately from this identity since φ^{(n+1)}(ξ) = 0. □
This shows the important fact that the error can be large at the end
points—there is a famous example due to Runge, where the error from the
interpolating polynomial approximation to f (x) = (1 + x2)−1 for n + 1
equally-spaced points on [−5, 5] diverges near ±5 as n tends to infinity: try
runge from the website in Matlab.
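The runge script is likewise from the website; a minimal sketch of the experiment it is assumed to perform (here using Matlab's polyfit/polyval rather than formula (2), and only a few moderate degrees) is:

% runge_sketch.m -- interpolate f(x) = 1/(1+x^2) at n+1 equally spaced
% points on [-5,5] and compare with f itself
f = @(x) 1./(1 + x.^2);
t = linspace(-5, 5, 1000);
hold on, plot(t, f(t), 'r')
for n = [4 8 12]
  x = linspace(-5, 5, n+1);            % n+1 equally spaced knots
  c = polyfit(x, f(x), n);             % coefficients of p_n
  plot(t, polyval(c, t))               % oscillations grow near +-5
end                                    % (Matlab may warn the fit is ill-conditioned)
hold off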
Building Lagrange interpolating polynomials from lower degree
ones.
Notation: Let Qi,j be the Lagrange interpolating polynomial at xk , k =
i, . . . , j.
Theorem.
Qi,j(x) = [ (x − xi) Qi+1,j(x) − (x − xj) Qi,j−1(x) ] / (xj − xi) .   (3)
Proof. Let s(x) denote the right-hand side of (3). Because of uniqueness,
we simply wish to show that s(xk ) = fk . For k = i+1, . . . , j−1, Qi+1,j (xk ) =
fk = Qi,j−1(xk ), and hence
s(xk) = [ (xk − xi) Qi+1,j(xk) − (xk − xj) Qi,j−1(xk) ] / (xj − xi) = fk .
We also have that Qi+1,j (xj ) = fj and Qi,j−1(xi) = fi, and hence
s(xi) = Qi,j−1(xi) = fi and s(xj ) = Qi+1,j (xj ) = fj .
□
Then
p2n+1(x) = Σ_{k=0}^n [ fk Hn,k(x) + gk Kn,k(x) ]   (4)
interpolates the data as required. The polynomial (4) is called the Hermite
interpolating polynomial.
Theorem. Let p2n+1 be the Hermite interpolating polynomial in the case
where fi = f(xi) and gi = f′(xi) and f has at least 2n + 2 smooth derivatives.
Then, for every x ∈ [x0, xn],
f(x) − p2n+1(x) = [ (x − x0)(x − x1) · · · (x − xn) ]² f^{(2n+2)}(ξ)/(2n + 2)!
for some ξ = ξ(x) ∈ (x0, xn).
Numerical Analysis Hilary Term 2011.
Lecture 2: Newton–Cotes Quadrature.
Proof.
∫_{x0}^{x1} p1(x) dx = f(x0) ∫_{x0}^{x1} (x − x1)/(x0 − x1) dx + f(x1) ∫_{x0}^{x1} (x − x0)/(x1 − x0) dx
                     = f(x0) (x1 − x0)/2 + f(x1) (x1 − x0)/2 ,
the two integrands being the Lagrange basis functions L1,0(x) and L1,1(x).
Simpson’s Rule: n = 2:
∫_{x0}^{x2} f(x) dx ≈ (h/3) [ f(x0) + 4f(x1) + f(x2) ]
[Figure: f and the interpolating quadratic p2 on [x0, x2], with x1 = x0 + h and x2 = x0 + 2h.]
In fact, we can prove a tighter result using the Integral Mean-Value Theorem¹:
Theorem. ∫_{x0}^{x1} f(x) dx − ((x1 − x0)/2) [ f(x0) + f(x1) ] = −((x1 − x0)³/12) f″(ξ) for some ξ ∈ (x0, x1).
Proof. See problem sheet. □
¹ Integral Mean-Value Theorem: if f and g are continuous on [a, b] and g(x) ≥ 0 on this interval,
then there exists an η ∈ (a, b) for which ∫_a^b f(x)g(x) dx = f(η) ∫_a^b g(x) dx (see problem sheet).
For n > 1, (4) gives pessimistic bounds. But one can prove better results
such as:
Theorem. Error in Simpson’s Rule: if f′′′′ is continuous on (x0, x2), then
| ∫_{x0}^{x2} f(x) dx − ((x2 − x0)/6) [ f(x0) + 4f(x1) + f(x2) ] | ≤ ((x2 − x0)⁵/720) max_{ξ∈[x0,x2]} |f′′′′(ξ)| .
Proof. Recall ∫_{x0}^{x2} p2(x) dx = (1/3) h [ f(x0) + 4f(x1) + f(x2) ], where h = x2 − x1 = x1 − x0. Consider f(x0) − 2f(x1) + f(x2) = f(x1 − h) − 2f(x1) + f(x1 + h).
Then, by Taylor’s Theorem,
f(x1 − h) = f(x1) − h f′(x1) + (1/2) h² f″(x1) − (1/6) h³ f‴(x1) + (1/24) h⁴ f′′′′(ξ1)
f(x1 + h) = f(x1) + h f′(x1) + (1/2) h² f″(x1) + (1/6) h³ f‴(x1) + (1/24) h⁴ f′′′′(ξ2)
for some ξ1 ∈ (x0, x1) and ξ2 ∈ (x1, x2), and hence
f(x0) − 2f(x1) + f(x2) = h² f″(x1) + (1/24) h⁴ [ f′′′′(ξ1) + f′′′′(ξ2) ] = h² f″(x1) + (1/12) h⁴ f′′′′(ξ3),   (5)
the last result following from the Intermediate-Value Theorem² for some ξ3 ∈ (ξ1, ξ2) ⊂ (x0, x2). Now for any x ∈ [x0, x2], we may use Taylor’s Theorem again to deduce
∫_{x0}^{x2} f(x) dx = f(x1) ∫_{x1−h}^{x1+h} dx + f′(x1) ∫_{x1−h}^{x1+h} (x − x1) dx + (1/2) f″(x1) ∫_{x1−h}^{x1+h} (x − x1)² dx
                      + (1/6) f‴(x1) ∫_{x1−h}^{x1+h} (x − x1)³ dx + (1/24) ∫_{x1−h}^{x1+h} f′′′′(η1(x)) (x − x1)⁴ dx
  = 2h f(x1) + (1/3) h³ f″(x1) + (1/60) h⁵ f′′′′(η2)
  = (1/3) h [ f(x0) + 4f(x1) + f(x2) ] + (1/60) h⁵ f′′′′(η2) − (1/36) h⁵ f′′′′(ξ3)
  = ∫_{x0}^{x2} p2(x) dx + (1/180) ((x2 − x0)/2)⁵ ( 3f′′′′(η2) − 5f′′′′(ξ3) ),
where η1(x) and η2 ∈ (x0, x2), using the Integral Mean-Value Theorem and (5). Thus, taking moduli,
| ∫_{x0}^{x2} [ f(x) − p2(x) ] dx | ≤ ( 8/(2⁵ · 180) ) (x2 − x0)⁵ max_{ξ∈[x0,x2]} |f′′′′(ξ)|
as required. □
² Intermediate-Value Theorem: if f is continuous on a closed interval [a, b], and c is any number
between f(a) and f(b) inclusive, then there is at least one number ξ in the closed interval such that
f(ξ) = c. In particular, since c = (d f(a) + e f(b))/(d + e) lies between f(a) and f(b) for any positive d and
e, there is a value ξ in the closed interval for which d · f(a) + e · f(b) = (d + e) · f(ξ).
Numerical Analysis Hilary Term 2011.
Lecture 3: Newton-Cotes Quadrature (continued).
Another composite rule: if [a, b] = [x0, x2n],
∫_a^b f(x) dx = ∫_{x0}^{x2n} f(x) dx = Σ_{i=1}^n ∫_{x2i−2}^{x2i} f(x) dx,
in which each ∫_{x2i−2}^{x2i} f(x) dx is approximated by quadrature.
Simpson’s Rule:
∫_{x2i−2}^{x2i} f(x) dx = (h/3) [ f(x2i−2) + 4f(x2i−1) + f(x2i) ] − ((2h)⁵/2880) f′′′′(ξi)
for some ξi ∈ (x2i−2, x2i).
Composite Simpson’s Rule:
∫_{x0}^{x2n} f(x) dx = Σ_{i=1}^n { (h/3) [ f(x2i−2) + 4f(x2i−1) + f(x2i) ] − ((2h)⁵/2880) f′′′′(ξi) }
  = (h/3) [ f(x0) + 4f(x1) + 2f(x2) + 4f(x3) + 2f(x4) + · · · + 2f(x2n−2) + 4f(x2n−1) + f(x2n) ] + e_h^S ,
where ξi ∈ (x2i−2, x2i) and h = xi − xi−1 = (x2n − x0)/2n = (b − a)/2n, and the error e_h^S is given by
e_h^S = −((2h)⁵/2880) Σ_{i=1}^n f′′′′(ξi) = −( n(2h)⁵/2880 ) f′′′′(ξ) = −(b − a)(h⁴/180) f′′′′(ξ)
for some ξ ∈ (a, b), using the Intermediate-Value Theorem n times. Note that if we halve the stepsize h by introducing a new point half way between each current pair (xi−1, xi), the factor h⁴ in the error will decrease by a factor of sixteen.
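A minimal Matlab sketch of the Composite Simpson's Rule exactly as written above, for a function handle f (which should accept a vector argument) on [a, b] with 2n equal steps; the name composite_simpson is illustrative:

% composite_simpson.m -- Composite Simpson's Rule with 2n equal steps
function S = composite_simpson(f, a, b, n)
  h = (b - a) / (2*n);
  x = a + (0:2*n)*h;                  % x_0, x_1, ..., x_{2n}
  fx = f(x);
  S = (h/3) * ( fx(1) + fx(end) ...
      + 4*sum(fx(2:2:end-1)) ...      % odd-indexed interior points
      + 2*sum(fx(3:2:end-2)) );       % even-indexed interior points
end

For instance, composite_simpson(@(x) 1./x, 1, 2, 4) returns a value close to ln 2 = 0.6931 . . .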
Adaptive procedure: if Sh is the value given by Simpson’s rule with a stepsize h, then
Sh − S_{h/2} ≈ −(15/16) e_h^S .
This suggests that if we wish to compute ∫_a^b f(x) dx with an absolute error ε, we should compute the sequence Sh, S_{h/2}, S_{h/4}, . . . and stop when the difference, in absolute value, between two consecutive values is smaller than (15/16) ε. That will ensure that (approximately) |e_h^S| ≤ ε.
Matlab:
% matlab
>> help adaptive_simpson
ADAPTIVE_SIMPSON Adaptive Simpson’s rule.
S = ADAPTIVE_SIMPSON(F,A,B,NMAX,TOL) computes an approximation
to the integral of F on the interval [A,B]. It will take a
maximum of NMAX steps and will attempt to determine the
integral to a tolerance of TOL.
[Figure: plot of the integrand g on [0, π].]
>> adaptive_simpson(g,0,pi,8,1.0e-7);
Step 1 integral is 1.7623727094, with error estimate 1.7624.
Step 2 integral is 1.8011896009, with error estimate 0.038817.
Step 3 integral is 1.7870879453, with error estimate 0.014102.
Step 4 integral is 1.7865214631, with error estimate 0.00056648.
Step 5 integral is 1.7864895607, with error estimate 3.1902e-05.
Step 6 integral is 1.7864876112, with error estimate 1.9495e-06.
Step 7 integral is 1.7864874900, with error estimate 1.2118e-07.
Successful termination at iteration 8:
The integral is 1.7864874825, with error estimate 7.5634e-09.
Numerical Analysis Hilary Term 2011.
Lecture 4: Gaussian Elimination.
This works if, and only if, aii ≠ 0 for each i. The procedure is known as
forward substitution.
Computational work estimate: one floating-point operation (flop) is
one multiply (or divide) and possibly an add (or subtract), as in y = a ∗ x + b,
where a, x, b and y are computer representations of real scalars. Hence the
work in forward substitution is 1 flop to compute x1 plus 2 flops to compute
x2 plus . . . plus i flops to compute xi plus . . . plus n flops to compute xn , or
in total
Σ_{i=1}^n i = (1/2) n(n + 1) = (1/2) n² + lower order terms
flops. We sometimes write this as (1/2) n² + O(n) flops or more crudely O(n²) flops.
Upper-triangular matrices: the matrix A is upper triangular if aij =
0 for all 1 ≤ j < i ≤ n. Once again, the system (1) is easy to solve if A is
upper triangular.
aii xi + · · · + ai,n−1 xn−1 + ai,n xn = bi  =⇒  xi = ( bi − Σ_{j=i+1}^n aij xj ) / aii
  ⋮
an−1,n−1 xn−1 + an−1,n xn = bn−1  =⇒  xn−1 = ( bn−1 − an−1,n xn ) / an−1,n−1
ann xn = bn  =⇒  xn = bn / ann ,
the equations being solved from the last one upwards.
Again, this works if, and only if, aii ≠ 0 for each i. The procedure is known
as backward or back substitution. This also takes approximately (1/2) n²
flops.
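A minimal Matlab sketch of back substitution as just described (forward substitution for a lower-triangular system is entirely analogous); the name backsub is illustrative:

% backsub.m -- solve U*x = b by back substitution (U upper triangular,
% nonzero diagonal assumed)
function x = backsub(U, b)
  n = length(b);
  x = zeros(n, 1);
  x(n) = b(n) / U(n,n);
  for i = n-1:-1:1
    % x_i = (b_i - sum_{j>i} u_ij x_j) / u_ii
    x(i) = (b(i) - U(i, i+1:n) * x(i+1:n)) / U(i,i);
  end
end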
For computation, we need a reliable, systematic technique for reducing
Ax = b to U x = c with the same solution x but with U (upper) trian-
gular =⇒ Gauss elimination.
Example
[ 3  −1 ] [ x1 ]   [ 12 ]
[ 1   2 ] [ x2 ] = [ 11 ] .
Multiply the first equation by 1/3 and subtract from the second =⇒
[ 3  −1  ] [ x1 ]   [ 12 ]
[ 0  7/3 ] [ x2 ] = [  7 ] .
for columns j = 1, 2, . . . , n − 1
    for rows i = j + 1, j + 2, . . . , n
        row i ← row i − (aij/ajj) ∗ row j
        bi ← bi − (aij/ajj) ∗ bj
    end
end
Example.
[ 3 −1  2 ] [ x1 ]   [ 12 ]                     [ 3 −1  2 | 12 ]
[ 1  2  3 ] [ x2 ] = [ 11 ] : represent as      [ 1  2  3 | 11 ]
[ 2 −2 −1 ] [ x3 ]   [  2 ]                     [ 2 −2 −1 |  2 ]
=⇒ row 2 ← row 2 − (1/3) row 1, row 3 ← row 3 − (2/3) row 1:
[ 3  −1    2  | 12 ]
[ 0  7/3  7/3 |  7 ]
[ 0 −4/3 −7/3 | −6 ]
=⇒ row 3 ← row 3 + (4/7) row 2:
[ 3  −1   2  | 12 ]
[ 0  7/3 7/3 |  7 ]
[ 0   0  −1  | −2 ]
Back substitution:
x3 = 2
x2 = ( 7 − (7/3)(2) ) / (7/3) = 1
x1 = ( 12 − (−1)(1) − 2(2) ) / 3 = 3.
Cost of Gaussian Elimination: note that row i ← row i − (aij/ajj) ∗ row j is
for columns k = j + 1, j + 2, . . . , n
    aik ← aik − (aij/ajj) ajk
end
This is approximately n − j flops, as the multiplier aij/ajj is calculated
with just one flop; ajj is called the pivot. Overall, therefore, the cost of GE
is approximately
Σ_{j=1}^{n−1} (n − j)² = Σ_{l=1}^{n−1} l² = n(n − 1)(2n − 1)/6 = (1/3) n³ + O(n²)
flops. The calculations involving b are
Σ_{j=1}^{n−1} (n − j) = Σ_{l=1}^{n−1} l = n(n − 1)/2 = (1/2) n² + O(n)
flops, just as for the triangular substitution.
Numerical Analysis Hilary Term 2011.
Lecture 5: LU Factorization.
The basic operation of Gaussian Elimination, row i ← row i + λ ∗ row j, can
be achieved by pre-multiplication by a special lower-triangular matrix
M(i, j, λ), which is the identity matrix I with one additional entry: the value λ in row i, column j.
Example: n = 4,
              [ 1 0 0 0 ]                      [ a ]   [   a    ]
              [ 0 1 0 0 ]                      [ b ]   [   b    ]
M(3, 2, λ) =  [ 0 λ 1 0 ]    and   M(3, 2, λ)  [ c ] = [ λb + c ] ,
              [ 0 0 0 1 ]                      [ d ]   [   d    ]
i.e., M(3, 2, λ)A performs: row 3 of A ← row 3 of A + λ∗ row 2 of A, and
similarly M(i, j, λ)A performs: row i of A ← row i of A + λ∗ row j of A.
So GE for, e.g., n = 3 is
M(3, 2, −l32) · M(3, 1, −l31) · M(2, 1, −l21) · A = U (upper triangular),
where
l21 = a21/a11 ,   l31 = a31/a11 ,   l32 = a32/a22 .
The lij are the multipliers.
Be careful: each multiplier lij uses the data aij and ajj that result from the
transformations already applied, not data from the original matrix. So l32
uses the a32 and a22 that result from the previous transformations M(2, 1, −l21)
and M(3, 1, −l31).
Lemma. If i ≠ j, (M(i, j, λ))⁻¹ = M(i, j, −λ).
Proof. Exercise.
Outcome: for n = 3, A = M(2, 1, l21) · M(3, 1, l31) · M(3, 2, l32) · U , where
                                            [ 1   0   0 ]
M(2, 1, l21) · M(3, 1, l31) · M(3, 2, l32) = [ l21 1   0 ] = L (unit lower triangular).
                                            [ l31 l32 1 ]
This is true for general n:
Theorem. For any dimension n, GE can be expressed as A = LU , where
U is the upper triangular matrix resulting from GE, and L is unit lower
triangular (lower triangular with ones on the diagonal) with lij = the multiplier
used to create the zero in the (i, j)th position.
Most implementations of GE therefore, rather than doing GE as above, compute and store the factors L and U , and then solve Ax = b via the two triangular systems Ly = b (forward substitution) and U x = y (back substitution).
Note: this is much more efficient if we have many different right-hand sides
b but the same A.
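A short illustration of this point using Matlab's built-in lu: factorize once, then each further right-hand side costs only two O(n²) triangular solves.

% matlab
A = rand(5,5);  [L,U,P] = lu(A);     % one O(n^3) factorization
b1 = rand(5,1); b2 = rand(5,1);      % two different right-hand sides
x1 = U \ (L \ (P*b1));               % forward then back substitution
x2 = U \ (L \ (P*b2));               % (each solve is only O(n^2))
norm(A*x1 - b1), norm(A*x2 - b2)     % both residuals ~ machine precision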
Pivoting: GE or LU can fail if the pivot aii = 0, e.g., if
A = [ 0 1 ]
    [ 1 0 ] ,
GE will fail at the first step. However, we are free to reorder the equations
(i.e., the rows) into any order we like, e.g., the equations
0 · x1 + 1 · x2 = 1              1 · x1 + 0 · x2 = 2
1 · x1 + 0 · x2 = 2    and       0 · x1 + 1 · x2 = 1
have had their rows reordered: GE fails for the first but succeeds for the
second =⇒ better to interchange the rows and then apply GE.
Partial pivoting: when creating the zeros in the j-th column, find |akj | =
max(|ajj |, |aj+1j |, . . . , |anj |), then swap (interchange) rows j and k
2
e.g.,
a11 · a1j−1 a1j · · · a1n
a11 · a1j−1 a1j · · · a1n
0 · · · · · · · 0 · · · · · · ·
0 · aj−1j−1 aj−1j · · · aj−1n 0 · aj−1j−1 aj−1j · · · aj−1n
0 · 0 ajj · · · ajn
0 · 0 akj · · · akn
→
0 · 0 · · · · · 0 · 0 · · · · ·
0 · 0 akj · · · akn 0 · 0 ajj · · · ajn
0 · 0 · · · · ·
0 · 0 · · · · ·
0 · 0 anj · · · ann 0 · 0 anj · · · ann
For example, with the permutation matrix
P = [ 0 1 0 ]
    [ 0 0 1 ]
    [ 1 0 0 ] ,
the product P A just has the 2nd row of A first, the 3rd row of A second and the 1st row of A last.
% matlab
>> A = rand(6,6)
A =
0.8462 0.6813 0.3046 0.1509 0.4966 0.3420
0.5252 0.3795 0.1897 0.6979 0.8998 0.2897
0.2026 0.8318 0.1934 0.3784 0.8216 0.3412
0.6721 0.5028 0.6822 0.8600 0.6449 0.5341
0.8381 0.7095 0.3028 0.8537 0.8180 0.7271
0.0196 0.4289 0.5417 0.5936 0.6602 0.3093
>> b = A*ones(6,1)
b =
2.8215
2.9817
2.7691
3.8962
4.2491
2.5533
>> x = A \ b
x =
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
>> [L,U,P] = lu(A)
L =
1.0000 0 0 0 0 0
0.6206 -0.0648 0.0183 0.8969 1.0000 0
0.2395 1.0000 0 0 0 0
0.7943 -0.0573 0.9718 0.5673 -0.2248 1.0000
0.9904 0.0519 -0.0113 1.0000 0 0
0.0232 0.6178 1.0000 0 0 0
U =
0.8462 0.6813 0.3046 0.1509 0.4966 0.3420
0 0.6686 0.1205 0.3422 0.7027 0.2593
0 0 0.4602 0.3786 0.2146 0.1412
0 0 0 0.6907 0.2921 0.3765
0 0 0 0 0.3712 -0.2460
0 0 0 0 0 -0.1288
4
P =
1 0 0 0 0 0
0 0 1 0 0 0
0 0 0 0 0 1
0 0 0 0 1 0
0 1 0 0 0 0
0 0 0 1 0 0
>> P*P’
ans =
1 0 0 0 0 0
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 1 0
0 0 0 0 0 1
Numerical Analysis Hilary Term 2011.
Lecture 6: QR Factorization.
Note that the columns of an orthogonal matrix are of length 1 as qiᵀqi = 1, so
they form an orthonormal set ⇐⇒ they are linearly independent (check!)
=⇒ they form an orthonormal basis for Rⁿ as there are n of them. □
For a nonzero w ∈ Rⁿ, the Householder reflector H(w) = I − (2/(wᵀw)) wwᵀ is symmetric and satisfies H(w)H(w)ᵀ = H(w)² = I, so it is orthogonal. Given u, v ∈ Rⁿ with uᵀu = vᵀv, a w can be chosen so that H(w)u = v; in particular v may be taken as (α, 0, . . . , 0)ᵀ,
say, where α = ±√(uᵀu).
Proof. Take w = γ(u − v), where γ ≠ 0. Recall that, since H(w) is
orthogonal, uᵀu = vᵀv. Then
wᵀw = γ²(u − v)ᵀ(u − v) = γ²(uᵀu − 2uᵀv + vᵀv) = γ²(uᵀu − 2uᵀv + uᵀu) = 2γuᵀ(γ(u − v)) = 2γwᵀu.
So
H(w)u = ( I − (2/(wᵀw)) wwᵀ ) u = u − (2wᵀu/(wᵀw)) w = u − (1/γ) w = u − (u − v) = v. □
Thus if we continue in this manner for the n − 1 steps, we obtain
H(wn−1) · · · H(w3)H(w2)H(w) A = [ α × · · · × ]
                                 [ 0 β · · · × ]
                                 [ ⋮ ⋮  ⋱   ⋮ ]
                                 [ 0 0 · · · γ ] = R,
and the product of Householder matrices on the left is Qᵀ, so that
A = QR
with Q orthogonal and R upper triangular.
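A minimal Matlab sketch of this Householder QR factorization (working column by column, choosing the sign of α to avoid cancellation); Matlab's built-in qr does the same job far more carefully:

% house_qr.m -- sketch of QR by Householder reflections
function [Q, R] = house_qr(A)
  [m, n] = size(A);  Q = eye(m);  R = A;
  for j = 1:min(m-1, n)
    u = R(j:m, j);
    alpha = -norm(u); if u(1) < 0, alpha = norm(u); end  % sign choice avoids cancellation
    v = zeros(size(u));  v(1) = alpha;                   % target column (alpha,0,...,0)'
    w = u - v;
    if norm(w) > 0
      H = eye(m-j+1) - 2*(w*w')/(w'*w);                  % H(w) = I - 2ww'/(w'w)
      R(j:m, :) = H*R(j:m, :);                           % create zeros below R(j,j)
      Q(:, j:m) = Q(:, j:m)*H;                           % accumulate Q
    end
  end
end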
where c = cos θ and s = sin θ.
Exercise: Prove that J(i, j, θ)J(i, j, θ)T = I— obvious though, since the
columns form an orthonormal basis.
Note that if x = (x1, x2 , . . . , xn)T and y = J(i, j, θ)x, then
yk = xk for k ≠ i, j
yi = cxi + sxj
yj = −sxi + cxj
Numerical Analysis Hilary Term 2011.
Lecture 7: Matrix Eigenvalues.
(akk − λ) xk = − Σ_{j=1, j≠k}^n akj xj .
Dividing by xk (which, we know, is ≠ 0) and taking absolute values,
|akk − λ| = | Σ_{j=1, j≠k}^n akj xj/xk | ≤ Σ_{j=1, j≠k}^n |akj| |xj|/|xk| ≤ Σ_{j=1, j≠k}^n |akj|
by (1). □
Example.
A = [  9  1  2 ]
    [ −3  1  1 ]
    [  1  2 −1 ]
[Figure: the three Gershgorin discs of A in the complex plane, with centres 9, 1 and −1 and radii 3, 4 and 3 respectively.]
must contain exactly ℓ eigenvalues (as they can’t jump!). 2
Notation: for x ∈ Rⁿ, ‖x‖ = √(xᵀx) is the (Euclidean) length of x.
Power Iteration: a simple method for calculating a single (largest) eigenvalue
of a square matrix A is: for arbitrary y ∈ Rⁿ, set x0 = y/‖y‖ to
calculate an initial vector, and then for k = 0, 1, . . .
compute yk = Axk and set xk+1 = yk/‖yk‖.
This is the Power Method or Iteration, and computes unit vectors in
the direction of x0, Ax0, A²x0, A³x0, . . . , Aᵏx0.
Suppose that A is diagonalizable so that there is a basis of eigenvectors of
A:
{v1 , v2, . . . , vn}
with Avi = λi vi and ‖vi‖ = 1, i = 1, 2, . . . , n, and assume that
|λ1| > |λ2| ≥ · · · ≥ |λn|; write x0 = Σ_{i=1}^n αi vi and assume α1 ≠ 0.
More usefully, the Power Iteration may be seen to compute yk = βk Aᵏx0
for some βk. Then, from the above,
xk+1 = yk/‖yk‖ = (βk/|βk|) · Aᵏx0/‖Aᵏx0‖ → ±v1 .
Similarly, yk−1 = βk−1 A^{k−1}x0 for some βk−1. Thus
xk = (βk−1/|βk−1|) · A^{k−1}x0/‖A^{k−1}x0‖   and hence   yk = Axk = (βk−1/|βk−1|) · Aᵏx0/‖A^{k−1}x0‖ .
Therefore, as above,
‖yk‖ = ‖Aᵏx0‖/‖A^{k−1}x0‖ ≈ |λ1| ,
and the sign of λ1 may be identified by looking at, e.g., (xk+1)1/(xk)1 .
Hence the largest eigenvalue (and its eigenvector) can be found.
Note: it is possible for a chosen vector x0 that α1 = 0, but rounding errors
in the computation generally introduce a small component in v1, so that in
practice this is not a concern!
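A minimal Matlab sketch of the Power Iteration as described above (here the sign of λ1 is recovered from the Rayleigh quotient xᵀAx rather than from a component ratio):

% power_iteration.m -- largest (in modulus) eigenvalue of A, a sketch
function [lambda, x] = power_iteration(A, kmax)
  x = rand(size(A,1), 1);  x = x / norm(x);   % x_0
  for k = 1:kmax
    y = A * x;                                % y_k = A x_k
    lambda = norm(y);                         % ||y_k|| ~ |lambda_1|
    x = y / lambda;                           % x_{k+1}
  end
  lambda = sign(x' * A * x) * lambda;         % recover the sign of lambda_1
end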
This simplified method for eigenvalue computation is the basis for effective
methods, but the current state of the art is the QR Algorithm, which we
consider only in the case when A is symmetric.
Numerical Analysis Hilary Term 2011.
Lectures 8–9: The Symmetric QR Algorithm.
If
H(w)A = [ × × · · · × ]
        [ 0 × · · · × ]
        [ ⋮ ⋮       ⋮ ]
        [ 0 × · · · × ] ,
then H(w)AH(w)ᵀ is generally full, i.e., all zeros created by pre-multiplication
are destroyed by the post-multiplication. However, if
A = [ γ  uᵀ ]
    [ u  C  ]
(as A = Aᵀ) and
w = [ 0 ]    where H(ŵ)u = (α, 0, . . . , 0)ᵀ ,
    [ ŵ ]
it follows that
H(w)A = [ γ  uᵀ        ]
        [ α  × · · · × ]
        [ ⋮  ⋮       ⋮ ]
        [ 0  × · · · × ] ,
i.e., the uᵀ part of the first row of A is unchanged. However, then
H(w)AH(w)⁻¹ = H(w)AH(w)ᵀ = H(w)AH(w) = [ γ  α  0 · · · 0 ]
                                        [ α               ]
                                        [ 0       B       ]
                                        [ ⋮               ]
                                        [ 0               ] ,
where B = H(ŵ)CH(ŵ)ᵀ, as uᵀH(ŵ)ᵀ = (α, 0, · · · , 0); note that
H(w)AH(w)ᵀ is symmetric as A is.
Now we inductively apply this to the smaller matrix B, as described for the
QR factorization but using post- as well as pre-multiplications. The result
of n − 2 such Householder similarity transformations is the matrix
H(wn−2) · · · H(w2)H(w) A H(w)H(w2) · · · H(wn−2),
which is tridiagonal.
The QR factorization of a tridiagonal matrix can now easily be achieved
with n − 1 Givens rotations: if A is tridiagonal,
J(n − 1, n) · · · J(2, 3)J(1, 2) A = R, upper triangular,
and the product of rotations on the left is Qᵀ. Precisely, R has a diagonal and 2 super-diagonals,
R = [ × × × 0 0 0 · · · 0 ]
    [ 0 × × × 0 0 · · · 0 ]
    [ 0 0 × × × 0 · · · 0 ]
    [        ⋱ ⋱ ⋱        ]
    [ 0 0 0 0 × × × 0 ]
    [ 0 0 0 0 0 × × × ]
    [ 0 0 0 0 0 0 × × ]
    [ 0 0 0 0 0 0 0 × ]
(exercise: check!). In the QR algorithm, the next matrix in the sequence is
RQ.
Lemma. In the QR algorithm applied to a tridiagonal matrix, the tridiag-
onal form is preserved when Givens rotations are used.
Proof. If Ak = QR = J(1, 2)TJ(2, 3)T · · · J(n − 1, n)TR is tridiagonal,
then Ak+1 = RQ = RJ(1, 2)TJ(2, 3)T · · · J(n − 1, n)T. Recall that post-
multiplication of a matrix by J(i, i + 1)T replaces columns i and i + 1
by linear combinations of the pair of columns, while leaving columns j =
1, 2, . . . , i − 1, i + 2, . . . , n alone. Thus, since R is upper triangular, the
only subdiagonal entry in RJ(1, 2)T is in position (2, 1). Similarly, the
only subdiagonal entries in RJ(1, 2)TJ(2, 3)T = (RJ(1, 2)T)J(2, 3)T are in
positions (2, 1) and (3, 2). Inductively, the only subdiagonal entries in
RJ(1, 2)TJ(2, 3)T · · · J(i − 2, i − 1)TJ(i − 1, i)T
= (RJ(1, 2)TJ(2, 3)T · · · J(i − 2, i − 1)T)J(i − 1, i)T
are in positions (j, j − 1), j = 2, . . . , i. So, the lower triangular part of Ak+1
only has nonzeros on its first subdiagonal. However, then since Ak+1 is
symmetric, it must be tridiagonal. 2
Using shifts. One further and final step in making an efficient algorithm
is the use of shifts:
for k = 1, 2, . . .
form the QR factorization of Ak − µk I = Qk Rk
and set Ak+1 = Rk Qk + µk I
end
For any chosen sequence of values of µk ∈ R, the matrices {Ak}_{k=1}^∞ are symmetric and
tridiagonal if A1 has these properties, and similar to A1 .
The simplest shift to use is an,n , which leads rapidly in almost all cases to
Ak = [ Tk  0 ]
     [ 0ᵀ  λ ] ,
Numerical Analysis Hilary Term 2011.
Lecture 10: Best Approximation in Inner-Product Spaces.
is the L∞ - or ∞-norm.
3. For integrable functions on (a, b),
‖f‖ ≡ ‖f‖1 = ∫_a^b |f(x)| dx
is the L1 - or one-norm, and
‖f‖ ≡ ‖f‖2 = ( ∫_a^b w(x)[f(x)]² dx )^{1/2}
is the L2 - or two-norm — the space L2(a, b) is a common abbreviation for
L2_w(a, b) for the case w(x) ≡ 1.
Note: ‖f‖2 = 0 =⇒ f = 0 almost everywhere on [a, b]. We say that a certain property
P holds almost everywhere (a.e.) on [a, b] if property P holds at each point of [a, b] except
perhaps on a subset S ⊂ [a, b] of zero measure. We say that a set S ⊂ R has zero measure (or
that it is of measure zero) if for any ε > 0 there exists a sequence {(αi, βi)}_{i=1}^∞ of subintervals
of R such that S ⊂ ∪_{i=1}^∞ (αi, βi) and Σ_{i=1}^∞ (βi − αi) < ε. Trivially, the empty set ∅ (⊂ R) has
zero measure. Any finite subset of R has zero measure. Any countable subset of R, such as
the set of all natural numbers N, the set of all integers Z, or the set of all rational numbers Q,
is of measure zero.
Least-squares polynomial approximation: aim to find the best polynomial
approximation to f ∈ L2_w(a, b), i.e., find pn ∈ Πn for which
‖f − pn‖2 ≤ ‖f − q‖2   ∀q ∈ Πn .
Seeking pn in the form pn(x) = Σ_{k=0}^n αk xᵏ then results in the minimization problem
min_{(α0,...,αn)} ∫_a^b w(x) [ f(x) − Σ_{k=0}^n αk xᵏ ]² dx.
The unique minimizer can be found from the (linear) system
∂/∂αj ∫_a^b w(x) [ f(x) − Σ_{k=0}^n αk xᵏ ]² dx = 0   for each j = 0, 1, . . . , n,
Examples: 1. V = Rⁿ,
⟨x, y⟩ = xᵀy = Σ_{i=1}^n xi yi ,
where x = (x1, . . . , xn)ᵀ and y = (y1, . . . , yn)ᵀ.
2. V = L2_w(a, b) = { f : (a, b) → R | ∫_a^b w(x)[f(x)]² dx < ∞ },
⟨f, g⟩ = ∫_a^b w(x)f(x)g(x) dx,
4. The Cauchy–Schwarz inequality: Suppose that V is an inner-product
space with inner product ⟨·, ·⟩ and norm ‖ · ‖ defined by this inner product.
For any u, v ∈ V ,
|⟨u, v⟩| ≤ ‖u‖ ‖v‖.
Proof. For every λ ∈ R,
0 ≤ ‖u + λv‖² = ‖u‖² + 2λ⟨u, v⟩ + λ²‖v‖²;
since this quadratic in λ is never negative, its discriminant must be nonpositive, i.e., 4⟨u, v⟩² − 4‖u‖²‖v‖² ≤ 0, which is the required inequality. □
5. The triangle inequality: for any u, v ∈ V ,
‖u + v‖ ≤ ‖u‖ + ‖v‖.
Note: The function ‖ · ‖ : V → R defined by ‖v‖ := ⟨v, v⟩^{1/2} on the inner-product
space V , with inner product ⟨·, ·⟩, trivially satisfies the first two
axioms of norm on V ; this is a consequence of ⟨·, ·⟩ being an inner product
on V . Result 5 above implies that ‖ · ‖ also satisfies the third axiom of
norm, the triangle inequality.
Numerical Analysis Hilary Term 2011.
Lecture 11: Least-Squares Approximation.
For the problem of least-squares approximation, ⟨f, g⟩ = ∫_a^b w(x)f(x)g(x) dx
and ‖f‖2² = ⟨f, f⟩, where w(x) > 0 on (a, b).
Theorem. If f ∈ L2w (a, b) and pn ∈ Πn is such that
hf − pn , ri = 0 ∀r ∈ Πn, (1)
then
kf − pn k2 ≤ kf − rk2 ∀r ∈ Πn ,
i.e., pn is a best (weighted) least-squares approximation to f on [a, b].
Proof.
‖f − pn‖2² = ⟨f − pn, f − pn⟩ = ⟨f − pn, f − r⟩ + ⟨f − pn, r − pn⟩   ∀r ∈ Πn .
Since r − pn ∈ Πn , the assumption (1) implies that ⟨f − pn, r − pn⟩ = 0, so
‖f − pn‖2² = ⟨f − pn, f − r⟩ ≤ ‖f − pn‖2 ‖f − r‖2   by the Cauchy–Schwarz inequality.
Dividing both sides by ‖f − pn‖2 gives the required result. □
which is the component-wise statement of a matrix equation
Aα = ϕ, (3)
For example, take w(x) ≡ 1, [a, b] = [0, 1], f(x) = eˣ and n = 1. Then the system reads
α0 ∫_0^1 1 dx + α1 ∫_0^1 x dx = ∫_0^1 eˣ dx
α0 ∫_0^1 x dx + α1 ∫_0^1 x² dx = ∫_0^1 eˣ x dx,
i.e.,
[ 1    1/2 ] [ α0 ]   [ e − 1 ]
[ 1/2  1/3 ] [ α1 ] = [   1   ]
=⇒ α0 = 4e − 10 and α1 = 18 − 6e, so p1(x) := (18 − 6e)x + (4e − 10) is
the best approximation.
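These coefficients are easily checked numerically; a short Matlab sketch, using integral for the right-hand sides:

% matlab -- check the best linear L2 approximation to exp on [0,1]
A     = [1, 1/2; 1/2, 1/3];                       % entries int_0^1 x^(j+k) dx
phi   = [integral(@(x) exp(x), 0, 1); ...
         integral(@(x) x.*exp(x), 0, 1)];         % int_0^1 exp(x) x^j dx, j = 0, 1
alpha = A \ phi;                                  % should equal [4e-10; 18-6e]
[alpha, [4*exp(1)-10; 18-6*exp(1)]]               % compare the two columns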
Proof that the coefficient matrix A is nonsingular will now establish exis-
tence and uniqueness of (weighted) k · k2 best-approximation.
Theorem. The coefficient matrix A is nonsingular.
Proof. Suppose not =⇒ ∃α ≠ 0 with Aα = 0 =⇒ αᵀAα = 0
⇐⇒ Σ_{i=0}^n αi (Aα)i = 0 ⇐⇒ Σ_{i=0}^n Σ_{k=0}^n αi aik αk = 0,
and using the definition aik = ∫_a^b w(x) xᵏ xⁱ dx,
⇐⇒ Σ_{i=0}^n Σ_{k=0}^n αi ( ∫_a^b w(x) xᵏ xⁱ dx ) αk = 0.
Rearranging gives
∫_a^b w(x) ( Σ_{i=0}^n αi xⁱ ) ( Σ_{k=0}^n αk xᵏ ) dx = 0   or   ∫_a^b w(x) ( Σ_{i=0}^n αi xⁱ )² dx = 0,
which implies that Σ_{i=0}^n αi xⁱ ≡ 0 and thus αi = 0 for i = 0, 1, . . . , n. This
contradicts the initial supposition, and thus A is nonsingular. □
Numerical Analysis Hilary Term 2011.
Lecture 12: Orthogonal Polynomials.
i.e.,
Aβ = ϕ,   (1)
where β = (β0, β1, . . . , βn)ᵀ, ϕ = (f0, f1, . . . , fn)ᵀ and now
ai,k = ∫_a^b w(x) φk(x) φi(x) dx   and   fi = ∫_a^b w(x) f(x) φi(x) dx.
So A is diagonal if
⟨φi, φk⟩ = ∫_a^b w(x) φi(x) φk(x) dx   is   = 0 for i ≠ k   and   ≠ 0 for i = k.
We can create such a set of orthogonal polynomials
{φ0 , φ1, . . . , φn , . . .},
with φi ∈ Πi for each i, by the Gram–Schmidt procedure, which is based
on the following lemma.
Lemma. Suppose that φ0, φ1, . . . , φk , with φi ∈ Πi for each i, are orthogonal
with respect to the inner product ⟨f, g⟩ = ∫_a^b w(x)f(x)g(x) dx. Then,
φk+1(x) = x^{k+1} − Σ_{i=0}^k λi φi(x)
satisfies
⟨φk+1, φj⟩ = ∫_a^b w(x) φk+1(x) φj(x) dx = 0,   j = 0, 1, . . . , k,
when
λj = ⟨x^{k+1}, φj⟩ / ⟨φj, φj⟩ ,   j = 0, 1, . . . , k.
Proof. For any j, 0 ≤ j ≤ k,
⟨φk+1, φj⟩ = ⟨x^{k+1}, φj⟩ − Σ_{i=0}^k λi ⟨φi, φj⟩ = ⟨x^{k+1}, φj⟩ − λj ⟨φj, φj⟩
(by the orthogonality of φi and φj , i ≠ j) = 0 by definition of λj . □
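The Gram–Schmidt construction in the lemma is easy to carry out numerically; a Matlab sketch for w(x) ≡ 1 on [−1, 1] (which produces multiples of the Legendre polynomials), holding each φk as a coefficient vector and computing the λj with integral:

% gram_schmidt_polys.m -- orthogonal polynomials phi_0,...,phi_N, w = 1 on [-1,1]
N = 4;
ip = @(p, q) integral(@(x) polyval(p, x).*polyval(q, x), -1, 1);  % <p,q>
phi = {1};                                   % phi_0(x) = 1
for k = 0:N-1
  xk1 = [1, zeros(1, k+1)];                  % the monomial x^(k+1)
  new = xk1;
  for j = 0:k
    lam = ip(xk1, phi{j+1}) / ip(phi{j+1}, phi{j+1});
    % subtract lambda_j * phi_j (pad with zeros to align the degrees)
    new = new - lam * [zeros(1, length(new)-length(phi{j+1})), phi{j+1}];
  end
  phi{k+2} = new;                            % phi_{k+1}
end
phi{:}                                       % e.g. phi_2 = x^2 - 1/3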
3. The inner product
⟨f, g⟩ = ∫_0^∞ e^{−x} f(x)g(x) dx
holds.
Proof. The polynomial xφk ∈ Πk+1, so there exist real numbers σk,0, σk,1, . . . , σk,k+1
such that
xφk(x) = Σ_{i=0}^{k+1} σk,i φi(x)
as {φ0 , φ1, . . . , φk+1} is a basis for Πk+1. Now take the inner product on
both sides with φj , and note that xφj ∈ Πk−1 if j ≤ k − 2. Thus
⟨xφk, φj⟩ = ∫_a^b w(x) x φk(x) φj(x) dx = ∫_a^b w(x) φk(x) · xφj(x) dx = ⟨φk, xφj⟩ = 0
Matlab:
% cat hermite_polys.m
x=linspace(-2.2,2.2,200);
oldH=ones(1,200); plot(x,oldH), hold on        % H_0(x) = 1
newH=2*x; plot(x,newH)                         % H_1(x) = 2x
for n=1:2                                      % three-term recurrence:
  newnewH=2*x.*newH-2*n*oldH; plot(x,newnewH)  %   H_{n+1} = 2x H_n - 2n H_{n-1}
  oldH=newH;newH=newnewH;
end
% matlab
>> hermite_polys
[Figure: the Hermite polynomials H0, H1, H2 and H3 plotted on [−2.2, 2.2].]
Numerical Analysis Hilary Term 2011.
Lecture 13: Gaussian Quadrature.
implies that ∫_a^b w(x)φk(x) dx = 0 with w(x) > 0, x ∈ (a, b). Thus φk(x)
must change sign in (a, b), i.e., φk has at least one root in (a, b).
Suppose that there are ℓ points a < r1 < r2 < · · · < rℓ < b where φk changes
sign, for some 1 ≤ ℓ ≤ k. Then
q(x) = ( Π_{j=1}^ℓ (x − rj) ) × the sign of φk on (rℓ, b)
has the same sign as φk throughout (a, b), so that ⟨q, φk⟩ > 0,
and thus it follows from the previous lemma that q (which is of degree ℓ)
must be of degree ≥ k, i.e., ℓ ≥ k. Therefore ℓ = k, and φk has k distinct
roots in (a, b). □
so that the rule is exact for polynomials of degree as high as possible? (The
case w(x) ≡ 1 is the most common.)
Recall: the Lagrange interpolating polynomial
pn = Σ_{j=0}^n f(xj) Ln,j ∈ Πn
satisfies ∫_a^b w(x)pn(x) dx = Σ_{j=0}^n wj f(xj), where
wj = ∫_a^b w(x)Ln,j(x) dx,   (2)
so a rule with these weights integrates every f ∈ Πn exactly!
Theorem. Suppose that x0 < x1 < · · · < xn are the roots of the (n + 1)-st
degree orthogonal polynomial φn+1 with respect to the inner product
⟨g, h⟩ = ∫_a^b w(x)g(x)h(x) dx.
Then, the quadrature formula (1) with weights satisfying (2) is exact whenever f ∈ Π2n+1.
Proof. Let p ∈ Π2n+1; then, by the Division Algorithm, p(x) = q(x)φn+1(x) + r(x) with q, r ∈ Πn . So
∫_a^b w(x)p(x) dx = ∫_a^b w(x)q(x)φn+1(x) dx + ∫_a^b w(x)r(x) dx = Σ_{j=0}^n wj r(xj)   (3)
since the integral involving q ∈ Πn is zero by the lemma above and the other
is integrated exactly since r ∈ Πn . Finally p(xj) = q(xj)φn+1(xj) + r(xj) =
r(xj) for j = 0, 1, . . . , n, as the xj are the roots of φn+1. So (3) gives
∫_a^b w(x)p(x) dx = Σ_{j=0}^n wj p(xj),
where wj is given by (2), whenever p ∈ Π2n+1. □
Note that the Trapezium Rule (also two evaluations of the integrand) gives
∫_1^2 (1/x) dx ≃ (1/2)(1 + 1/2) = 0.75,
whereas ∫_1^2 (1/x) dx = ln 2 = 0.6931472 . . . .
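For comparison, a Matlab sketch of the two-point Gauss rule for the same integral (nodes ±1/√3 on [−1, 1], from the tables below, mapped to [1, 2]):

% matlab -- two-point Gauss vs Trapezium for int_1^2 dx/x = ln 2
f  = @(x) 1./x;
t  = [-1, 1]/sqrt(3);                  % Gauss nodes on [-1,1], weights 1
x  = 3/2 + t/2;                        % nodes mapped to [1,2]
gauss = (1/2) * sum(f(x))              % ~ 0.6923, error ~ 8e-4
trap  = (1/2) * (f(1) + f(2))          % = 0.75,   error ~ 5.7e-2
log(2)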
Theorem. Error in Gaussian Quadrature: suppose that f^{(2n+2)} is continuous on (a, b). Then,
∫_a^b w(x)f(x) dx = Σ_{j=0}^n wj f(xj) + ( f^{(2n+2)}(η)/(2n + 2)! ) ∫_a^b w(x) Π_{j=0}^n (x − xj)² dx
for some η ∈ (a, b).
Proof. Let p2n+1 ∈ Π2n+1 be the Hermite interpolating polynomial with p2n+1(xj) = f(xj) and p′2n+1(xj) = f′(xj), j = 0, 1, . . . , n; the rule integrates p2n+1 exactly by the previous theorem, and the Hermite interpolation error formula gives
f(x) − p2n+1(x) = ( f^{(2n+2)}(ξ(x))/(2n + 2)! ) Π_{j=0}^n (x − xj)²,
and hence the required result follows from the Integral Mean Value Theorem
as w(x) Π_{j=0}^n (x − xj)² ≥ 0. □
any cubic polynomial exactly, so
2 = ∫_{−1}^1 1 dx = w0 + w1   (4)
0 = ∫_{−1}^1 x dx = w0 x0 + w1 x1   (5)
2/3 = ∫_{−1}^1 x² dx = w0 x0² + w1 x1²   (6)
0 = ∫_{−1}^1 x³ dx = w0 x0³ + w1 x1³.   (7)
These give w0 = w1 = 1 and
x0 = −1/√3 and x1 = 1/√3.
Table 1: Abscissas xj (zeros of Legendre polynomials) and weight factors wj for Gaussian
Quadrature: ∫_{−1}^1 f(x) dx ≃ Σ_{j=0}^n wj f(xj) for n = 0 to 6.
        xj                          wj
n=0 0.000000000000000e+0 2.000000000000000e+0
n=1 5.773502691896258e−1 1.000000000000000e+0
−5.773502691896258e−1 1.000000000000000e+0
7.745966692414834e−1 5.555555555555556e−1
n=2 0.000000000000000e+0 8.888888888888889e−1
−7.745966692414834e−1 5.555555555555556e−1
8.611363115940526e−1 3.478548451374539e−1
n=3 3.399810435848563e−1 6.521451548625461e−1
−3.399810435848563e−1 6.521451548625461e−1
−8.611363115940526e−1 3.478548451374539e−1
9.061798459386640e−1 2.369268850561891e−1
5.384693101056831e−1 4.786286704993665e−1
n=4 0.000000000000000e+0 5.688888888888889e−1
−5.384693101056831e−1 4.786286704993665e−1
−9.061798459386640e−1 2.369268850561891e−1
9.324695142031520e−1 1.713244923791703e−1
6.612093864662645e−1 3.607615730481386e−1
n=5 2.386191860831969e−1 4.679139345726910e−1
−2.386191860831969e−1 4.679139345726910e−1
−6.612093864662645e−1 3.607615730481386e−1
−9.324695142031520e−1 1.713244923791703e−1
9.491079123427585e−1 1.294849661688697e−1
7.415311855993944e−1 2.797053914892767e−1
4.058451513773972e−1 3.818300505051189e−1
n=6 0.000000000000000e+0 4.179591836734694e−1
−4.058451513773972e−1 3.818300505051189e−1
−7.415311855993944e−1 2.797053914892767e−1
−9.491079123427585e−1 1.294849661688697e−1
Table 2: Abscissas xj (zeros of Legendre polynomials) and weight factors wj for Gaussian
Quadrature: ∫_{−1}^1 f(x) dx ≃ Σ_{j=0}^n wj f(xj) for n = 7 to 9.
        xj                          wj
9.602898564975362e−1 1.012285362903763e−1
7.966664774136267e−1 2.223810344533745e−1
5.255324099163290e−1 3.137066458778873e−1
n=7 1.834346424956498e−1 3.626837833783620e−1
−1.834346424956498e−1 3.626837833783620e−1
−5.255324099163290e−1 3.137066458778873e−1
−7.966664774136267e−1 2.223810344533745e−1
−9.602898564975362e−1 1.012285362903763e−1
9.681602395076261e−1 8.127438836157441e−2
8.360311073266358e−1 1.806481606948574e−1
6.133714327005904e−1 2.606106964029355e−1
3.242534234038089e−1 3.123470770400028e−1
n=8 0.000000000000000e+0 3.302393550012598e−1
−3.242534234038089e−1 3.123470770400028e−1
−6.133714327005904e−1 2.606106964029355e−1
−8.360311073266358e−1 1.806481606948574e−1
−9.681602395076261e−1 8.127438836157441e−2
9.739065285171717e−1 6.667134430868814e−2
8.650633666889845e−1 1.494513491505806e−1
6.794095682990244e−1 2.190863625159820e−1
4.333953941292472e−1 2.692667193099964e−1
n=9 1.488743389816312e−1 2.955242247147529e−1
−1.488743389816312e−1 2.955242247147529e−1
−4.333953941292472e−1 2.692667193099964e−1
−6.794095682990244e−1 2.190863625159820e−1
−8.650633666889845e−1 1.494513491505806e−1
−9.739065285171717e−1 6.667134430868814e−2
Numerical Analysis Hilary Term 2011.
Lectures 14–15: Piecewise Polynomial Interpolation: Splines.
On the left the Lagrange Interpolant p7 ‘wiggles’ through the points, while
on the right a piecewise linear interpolant (‘join the dots’), or linear spline
interpolant, s appears to represent the data better.
Remark: for any given data s clearly exists and is unique.
Suppose that a = x0 < x1 < · · · < xn = b. Then, s is linear on each interval
[xi−1, xi] for i = 1, . . . , n and continuous on [a, b]. The xi, i = 0, 1, . . . , n,
are called the knots of the linear spline.
Notation: f ∈ Ck[a, b] if f, f′, . . . , f^{(k)} exist and are continuous on [a, b].
Theorem. Suppose that f ∈ C2[a, b]. Then,
‖f − s‖∞ ≤ (1/8) h² ‖f″‖∞ ,
where h = max_{1≤i≤n} (xi − xi−1) and ‖f″‖∞ = max_{x∈[a,b]} |f″(x)|.
Proof. For x ∈ [xi−1, xi], the error from linear interpolation is
f(x) − s(x) = (1/2) f″(η)(x − xi−1)(x − xi),
where η = η(x) ∈ (xi−1, xi). However, |(x − xi−1)(x − xi)| = (x − xi−1)(xi −
x) = −x² + x(xi−1 + xi) − xi−1 xi, which has its maximum value when
2x = xi + xi−1, i.e., when x − xi−1 = xi − x = (1/2)(xi − xi−1). Thus, for any
x ∈ [xi−1, xi], i = 1, 2, . . . , n, we have
|f(x) − s(x)| ≤ (1/2) ‖f″‖∞ max_{x∈[xi−1,xi]} |(x − xi−1)(x − xi)| = (1/8) h² ‖f″‖∞ . □
Note that s may have discontinuous derivatives, but is a locally defined
approximation, since changing the value of one data point affects the ap-
proximation in only two intervals.
To get greater smoothness but retain some ‘locality’, we can define cubic
splines s ∈ C2[a, b]. For a given ‘partition’, a = x0 < x1 < · · · < xn = b,
there are (generally different!) cubic polynomials in each interval (xi−1, xi),
i = 1, . . . , n, which are ’joined’ at each knot so as to have continuity of s and
continuity of s′ and s″. Interpolating cubic splines also satisfy s(xi) = fi for
given data fi, i = 0, 1, . . . , n.
Remark: if there are n intervals, there are 4n free coefficients (four for
each cubic ‘piece’), but 2n interpolation conditions (one each at the ends of
each interval), n − 1 derivative continuity conditions (at x1, . . . , xn−1) and
n − 1 second derivative continuity conditions (at the same points), giving a
total of 4n−2 conditions (which are linear in the free coefficients). Thus the
spline is not unique. So we need to add two extra conditions to generate a
spline that might be unique. There are three common ways of doing this:
(a) specify the end slopes, s′(x0) = f′(x0) and s′(xn) = f′(xn);
(b) specify s′′(x0) = 0 = s′′(xn) — this gives a natural cubic spline; or
(c) enforce continuity of s′′′ at x1 and xn−1 (which implies that the first
two pieces are the same cubic spline, i.e., on [x0, x2], and similarly for
the last two pieces, i.e., on [xn−2, xn], from which it follows that x1 and
xn−1 are not knots! — this is usually described as the ‘not a knot’
end-conditions).
and overall as
s(x) = { si(x)   for x ∈ [x0, xn] \ {x0, x1, . . . , xn}, with i such that x ∈ (xi−1, xi),
       { f(xi)   for x = xi, i = 0, 1, . . . , n.
The 4n linear conditions for an interpolating cubic spline s are:
s1(x0) = f(x0),   si(xi⁻) = f(xi) and si+1(xi⁺) = f(xi) for i = 1, . . . , n − 1,   sn(xn) = f(xn),
s′i(xi) − s′i+1(xi) = 0 and s″i(xi) − s″i+1(xi) = 0 for i = 1, . . . , n − 1,
together with either
s′1(x0) = f′(x0) and s′n(xn) = f′(xn)   (a),   or   s″1(x0) = 0 and s″n(xn) = 0   (b),   (1)
and the various entries of g are f(xi), i = 0, 1, . . . , n, and f′(x0), f′(xn) and
zeros for (a), and zeros for (b).
So if A is nonsingular, this implies that y = A−1g, that is there is a unique
set of coefficients {a1 , b1, c1 , d1, a2 , . . . , dn−1, an , bn, cn , dn}. We now prove
that if Ay = 0 then y = 0, and thus that A is nonsingular for cases (a) and
(b) — it is also possible, but more complicated, to show this for case (c).
Theorem. If f (xi) = 0 at the knots xi, i = 1, . . . , n, and additionally
f ′ (x0) = 0 = f ′ (xn) for case (a), then s(x) = 0 for all x ∈ [x0, xn].
Proof. Consider
∫_{x0}^{xn} (s″(x))² dx = Σ_{i=1}^n ∫_{xi−1}^{xi} (s″i(x))² dx
  = Σ_{i=1}^n [ s′i(x) s″i(x) ]_{xi−1}^{xi} − Σ_{i=1}^n ∫_{xi−1}^{xi} s′i(x) s‴i(x) dx,
since s‴i(x) is constant on the interval (xi−1, xi) and si(xi−1) = 0 = si(xi).
Thus, matching first and second derivatives at the knots, telescopic cancellation gives
∫_{x0}^{xn} (s″(x))² dx = Σ_{i=1}^n [ s′i(x) s″i(x) ]_{xi−1}^{xi}
  = s′1(x1)s″1(x1) − s′1(x0)s″1(x0)
  + s′2(x2)s″2(x2) − s′2(x1)s″2(x1) + · · ·
  + s′n−1(xn−1)s″n−1(xn−1) − s′n−1(xn−2)s″n−1(xn−2)
  + s′n(xn)s″n(xn) − s′n(xn−1)s″n(xn−1)
  = s′n(xn)s″n(xn) − s′1(x0)s″1(x0).
However, in case (a), f′(x0) = 0 = f′(xn) =⇒ s′1(x0) = 0 = s′n(xn), while in
case (b) s″1(x0) = 0 = s″n(xn). Thus
∫_{x0}^{xn} (s″(x))² dx = 0,
which implies that s″i(x) ≡ 0 and thus si(x) = ci x + di. Since s(xi−1) = 0 =
s(xi), s(x) is identically zero on [x0, xn]. □
is the interpolatory natural cubic spline to f . These have the disadvantage
that if any xi is changed, all of the Ci's change, which is clear from writing
down Ci(x) explicitly:
Ci(x) = [ (x − x1) · · · (x − xi−1)(x − xi+1) · · · (x − xn) ] / [ (xi − x1) · · · (xi − xi−1)(xi − xi+1) · · · (xi − xn) ] .
Preferred are the B-splines (locally) defined by Bi (xi) = 1 for i =
2, 3, . . . , n − 2, Bi(x) ≡ 0 for x ∉ (xi−2, xi+2), Bi a cubic spline with knots
xj , j = 0, 1, . . . , n, with special definitions for B0 , B1 , Bn−1 and Bn .
Example/construction: Cubic B-spline with knots 0, 1, 2, 3, 4. On [0, 1],
B(x) = ax³
for some a (so that B, B′ and B″ all vanish at x = 0). Continuity of B, B′ and B″ at x = 1 then gives, on [1, 2],
B(x) = a + 3a(x − 1) + 3a(x − 1)² + β(x − 1)³
for some β, and since we require B(2) = 1, then β = 1 − 7a. Now, in order
to continue, by symmetry, we must have B′(2) = 0, i.e.,
3a + 6a + 3β = 9a + 3(1 − 7a) = 0,
and hence a = 1/4. So
B(x) = 0                                                         for x < 0
       (1/4) x³                                                  for x ∈ [0, 1]
       −(3/4)(x − 1)³ + (3/4)(x − 1)² + (3/4)(x − 1) + 1/4       for x ∈ [1, 2]
       −(3/4)(3 − x)³ + (3/4)(3 − x)² + (3/4)(3 − x) + 1/4       for x ∈ [2, 3]
       (1/4)(4 − x)³                                             for x ∈ [3, 4]
       0                                                         for x > 4.
More generally: B-spline on the equally spaced knots xi = a + ih, where h = (b − a)/n:
Bi(x) = 0                                                                                     for x < xi−2
        (x − xi−2)³/(4h³)                                                                     for x ∈ [xi−2, xi−1]
        −3(x − xi−1)³/(4h³) + 3(x − xi−1)²/(4h²) + 3(x − xi−1)/(4h) + 1/4                     for x ∈ [xi−1, xi]
        −3(xi+1 − x)³/(4h³) + 3(xi+1 − x)²/(4h²) + 3(xi+1 − x)/(4h) + 1/4                     for x ∈ [xi, xi+1]
        (xi+2 − x)³/(4h³)                                                                     for x ∈ [xi+1, xi+2]
        0                                                                                     for x > xi+2.
fi = (1/4) ci−1 + ci + (1/4) ci+1 ,
i.e.,
[ 1    1/4                        ] [ c0   ]   [ f0   ]
[ 1/4  1    1/4                   ] [ c1   ]   [ f1   ]
[      ⋱    ⋱     ⋱               ] [  ⋮   ] = [  ⋮   ]
[           1/4   1     1/4       ] [ cn−1 ]   [ fn−1 ]
[                 1/4   1         ] [ cn   ]   [ fn   ]
For linear splines, a similar local basis of ‘hat functions’ or linear B-splines
φi(x) exists:
φi(x) = (x − xi−1)/(xi − xi−1)   for x ∈ (xi−1, xi)
        (x − xi+1)/(xi − xi+1)   for x ∈ (xi, xi+1)
        0                        for x ∉ (xi−1, xi+1)
% interpolation point - this is one of the end-point choices available with
% the matlab command spline (and is what is called option (a) in lectures)
% (f’ = -2x/(1+x^2)^2, so f’(-5) = 10/26^2 and f’(5) = -10/26^2)
s=spline(x,y);
% to see the function (in red) and the spline interpolant (in blue) on the
% same figure
hold on
plot(fine,f,’r’),pause
% matlab
>> spline_example
[Figure: the underlying function (red) and its cubic spline interpolant (blue).]
The result then follows directly since the discriminant of this quadratic must
be nonpositive. 2
However, the first required inequality then follows since, for x ∈ [xj−1, xj],
∫_c^x dt ≤ h and because the previous theorem gives that
∫_c^x [e″(t)]² dt ≤ ∫_c^x [f″(t)]² dt ≤ ∫_{x0}^{xn} [f″(x)]² dx.
Similar bounds exist for natural cubic splines and splines satisfying end-
condition (c).
Numerical Analysis Hilary Term 2011.
Lecture 16: Richardson Extrapolation.
for which
T^{(j)} = Th + O(h^{2j}),
so long as there are high enough order terms in the error series.
Example: approximation of π by inscribed polygons in the unit circle. For a
regular n-gon, the circumference = 2n sin(π/n) ≤ 2π, so let cn = n sin(π/n) ≤ π, or, if we put h = 1/n,
cn = (1/h) sin(πh) = π − (π³/6) h² + (π⁵/120) h⁴ − · · ·
so that we can use Richardson Extrapolation. Indeed c2 = 2 and
c2n = 2n sin(π/2n) = 2n √( (1/2)(1 − cos(π/n)) ) = 2n √( (1/2)(1 − √(1 − sin²(π/n))) ) = 2n √( (1/2)(1 − √(1 − (cn/n)²)) ) .
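A Matlab sketch of this calculation: generate c4, c8, c16, . . . from the recurrence above (no direct evaluation of sin) and apply one step of Richardson Extrapolation to remove the h² term.

% matlab -- Archimedes' c_n and one Richardson extrapolation step
c = 2;  n = 2;                                       % c_2 = 2
for k = 1:5
  cnew = 2*n*sqrt( (1 - sqrt(1 - (c/n)^2)) / 2 );    % c_{2n} from c_n
  % eliminate the h^2 term: c_n ~ pi - (pi^3/6) h^2 with h = 1/n
  extrap = (4*cnew - c) / 3;
  fprintf('c_%-3d = %.10f   extrapolated = %.10f\n', 2*n, cnew, extrap)
  c = cnew;  n = 2*n;                                % (the inner 1 - sqrt(...) suffers
end                                                  %  cancellation for very large n)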
There is such a series: the Euler–Maclaurin formula
∫_a^b f(x) dx − Th = − Σ_{k=1}^r ( B2k/(2k)! ) h^{2k} [ f^{(2k−1)}(b) − f^{(2k−1)}(a) ] − (b − a) ( B2r+2/(2r + 2)! ) h^{2r+2} f^{(2r+2)}(ξ),
where ξ ∈ (a, b) and the B2k are called the Bernoulli numbers, defined by
x/(eˣ − 1) = Σ_{ℓ=0}^∞ Bℓ xℓ/ℓ! ,
so that B2 = 1/6, B4 = −1/30, etc.
Romberg Integration is composite Trapezium for n = 0, 1, 2, 3, . . . and
the repeated application of Richardson Extrapolation. Changing notation
(Th → Tn, h = stepsize, 2ⁿ = number of composite steps), we have
T0 = ((b − a)/2) [ f(a) + f(b) ] = R0,0
T1 = ((b − a)/4) [ f(a) + f(b) + 2f( (1/2)(a + b) ) ]
   = (1/2) [ R0,0 + (b − a) f( (1/2)(a + b) ) ] = R1,0 .
now with error O(h⁶). At the ith stage
Ti = Ri,0 = (1/2) [ Ri−1,0 + ( (b − a)/2^{i−1} ) Σ_{j=1}^{2^{i−1}} f( a + (j − 1/2)(b − a)/2^{i−1} ) ],
the sum being over evaluations at the new interlacing points. Extrapolate:
Ri,j = ( 4ʲ Ri,j−1 − Ri−1,j−1 ) / ( 4ʲ − 1 )   for j = 1, 2, . . .
This builds a triangular table:
R0,0
R1,0  R1,1
R2,0  R2,1  R2,2
 ⋮     ⋮     ⋮    ⋱
Ri,0  Ri,1  Ri,2  · · ·  Ri,i
(the first column, Ri,0, consists of Composite Trapezium values and the second column, Ri,1, of Composite Simpson values).
Notes 1. The integrand must have enough derivatives for the Euler–
Maclaurin series to exist (the whole procedure is based on this!).
2. Rn,n → ∫_a^b f(x) dx in general much faster than Rn,0 → ∫_a^b f(x) dx.
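A minimal Matlab sketch of the Romberg table Ri,j built exactly as above (first column composite Trapezium, then repeated extrapolation); f should accept a vector argument:

% romberg.m -- Romberg integration of f on [a,b], sketch
function R = romberg(f, a, b, m)
  R = zeros(m+1, m+1);
  R(1,1) = (b - a)/2 * (f(a) + f(b));                    % R_{0,0}
  for i = 1:m
    hprev = (b - a) / 2^(i-1);
    newpts = a + ((1:2^(i-1)) - 1/2) * hprev;            % new interlacing points
    R(i+1,1) = ( R(i,1) + hprev * sum(f(newpts)) ) / 2;  % R_{i,0}
    for j = 1:i
      R(i+1,j+1) = (4^j * R(i+1,j) - R(i,j)) / (4^j - 1);% extrapolate
    end
  end
end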
A final observation: because of the Euler–Maclaurin series, if f ∈ C^{2n+2}[a, b]
and is periodic of period b − a, then f^{(j)}(a) = f^{(j)}(b) for j = 0, 1, . . . , 2n − 1, so
∫_a^b f(x) dx − Th = −(b − a) ( B2n+2/(2n + 2)! ) h^{2n+2} f^{(2n+2)}(ξ),
c.f.
∫_a^b f(x) dx − Th = −(b − a) (h²/12) f″(ξ)
for nonperiodic functions! That is, the Composite Trapezium Rule is extremely
accurate for the integration of periodic functions. If f ∈ C∞[a, b],
then Th → ∫_a^b f(x) dx faster than any power of h.
Example: the circumference of an ellipse with semi-axes A and B is
∫_0^{2π} √( A² sin²φ + B² cos²φ ) dφ.
For A = 1 and B = 1/4: T8 = 4.2533, T16 = 4.2878, T32 = 4.2892 = T64 = · · ·.
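A Matlab sketch reproducing these numbers with the Composite Trapezium Rule on [0, 2π] (the integrand is periodic, so convergence is very fast, as predicted above); Tn here denotes the rule with n equal subintervals.

% matlab -- composite Trapezium for the ellipse perimeter, A = 1, B = 1/4
g = @(phi) sqrt(sin(phi).^2 + cos(phi).^2 / 16);
for n = [8 16 32 64]
  phi = linspace(0, 2*pi, n+1);
  h = 2*pi / n;
  T = h * ( sum(g(phi)) - (g(phi(1)) + g(phi(end)))/2 );  % trapezium rule
  fprintf('T_%-3d = %.4f\n', n, T)
end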