5 Linear Equations

We wish to solve a system of n linear equations in n unknowns,

a_{11} x_1 + a_{12} x_2 + · · · + a_{1n} x_n = b_1,
a_{21} x_1 + a_{22} x_2 + · · · + a_{2n} x_n = b_2,
   ⋮
a_{n1} x_1 + a_{n2} x_2 + · · · + a_{nn} x_n = b_n,

or in matrix form Ax = b, where A is the n × n matrix of coefficients, x the vector of unknowns and b the right-hand side.
To estimate the computational cost of forward substitution, we can count the number of
floating-point operations (+, −, ×, ÷).
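To make the count concrete, here is a minimal Python sketch (an illustration, not part of the notes) of forward substitution for a lower triangular system Ly = b, tallying the +, −, ×, ÷ operations as it goes; the total comes to exactly n^2 operations.

import numpy as np

def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L, counting the floating-point operations."""
    n = len(b)
    y = np.zeros(n)
    ops = 0
    for i in range(n):
        s = b[i]
        for j in range(i):
            s -= L[i, j] * y[j]   # one multiplication and one subtraction
            ops += 2
        y[i] = s / L[i, i]        # one division
        ops += 1
    return y, ops

L = np.array([[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [4.0, -1.0, 5.0]])
b = np.array([2.0, 5.0, 8.0])
y, ops = forward_substitution(L, b)
print(y, ops)   # ops = n^2 = 9 for n = 3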
If our matrix A is not triangular, we can try to transform it to triangular form. Gaussian elimina-
tion uses elementary row operations to transform the system to upper triangular form U x = y.
Elementary row operations include swapping rows and adding multiples of one row to another.
They won’t change the solution x, but will change the matrix A and the right-hand side b.
Stage 1. Subtract 1 times equation 1 from equation 2, and 2 times equation 1 from equation 3,
so as to eliminate x 1 from equations 2 and 3:
x_1 + 2x_2 + x_3 = 0,
     −4x_2 + x_3 = 4,
      8x_2 − 4x_3 = 4,
so that
A^{(2)} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & −4 & 1 \\ 0 & 8 & −4 \end{pmatrix},   b^{(2)} = \begin{pmatrix} 0 \\ 4 \\ 4 \end{pmatrix},   m_{21} = 1,  m_{31} = 2.
Stage 2. Subtract −2 times equation 2 from equation 3, to eliminate x_2 from equation 3:
x_1 + 2x_2 + x_3 = 0,
     −4x_2 + x_3 = 4,
          −2x_3 = 12,
so that
A^{(3)} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & −4 & 1 \\ 0 & 0 & −2 \end{pmatrix},   b^{(3)} = \begin{pmatrix} 0 \\ 4 \\ 12 \end{pmatrix},   m_{32} = −2.
Now the system is upper triangular, and back substitution gives x_1 = 11, x_2 = −5/2, x_3 = −6.
We can write the general algorithm as follows.
Algorithm 5.1 (Gaussian elimination). Let A(1) = A and b (1) = b. Then for each k from 1 to
n − 1, compute a new matrix A(k+1) and right-hand side b (k+1) by the following procedure:
1. Define the row multipliers
   m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)},   i = k + 1, . . . , n.
2. Use these to remove the unknown x_k from equations k + 1 to n, leaving
   a_{ij}^{(k+1)} = a_{ij}^{(k)} − m_{ik} a_{kj}^{(k)},   b_i^{(k+1)} = b_i^{(k)} − m_{ik} b_k^{(k)},   i, j = k + 1, . . . , n,
   with the remaining rows unchanged.
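A minimal Python sketch of Algorithm 5.1 (not the notes' code), assuming every pivot a_{kk}^{(k)} is non-zero; on the worked example above it reproduces A^{(3)} and b^{(3)}.

import numpy as np

def gaussian_elimination(A, b):
    """Reduce Ax = b to upper triangular form Ux = y (Algorithm 5.1, no pivoting)."""
    U = A.astype(float).copy()
    y = b.astype(float).copy()
    n = len(y)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i, k] / U[k, k]        # row multiplier m_ik
            U[i, k:] -= m * U[k, k:]     # eliminate x_k from equation i
            y[i] -= m * y[k]
    return U, y

A = np.array([[1, 2, 1], [1, -2, 2], [2, 12, -2]])
b = np.array([0, 4, 4])
U, y = gaussian_elimination(A, b)
print(U, y)                    # U = A^(3), y = b^(3) from the worked example
print(np.linalg.solve(U, y))   # solving the triangular system gives [11, -2.5, -6]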
Recalling that
\sum_{k=1}^{n} k = \tfrac{1}{2} n(n + 1),   \sum_{k=1}^{n} k^2 = \tfrac{1}{6} n(n + 1)(2n + 1),
we find that the total number of floating-point operations for Gaussian elimination grows like n^3 for large n (approximately 2n^3/3 operations), compared with n^2 for a triangular solve.
5.3 LU decomposition
In Gaussian elimination, both the final matrix U and the sequence of row operations are de-
termined solely by A, and do not depend on b. We will see that the sequence of row operations
that transforms A to U is equivalent to left-multiplying by a matrix F , so that
FA = U , U x = Fb. (5.8)
To see this, note that step k of Gaussian elimination can be written in the form
F^{(1)} A = \begin{pmatrix} 1 & 0 & 0 \\ −1 & 1 & 0 \\ −2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 1 & −2 & 2 \\ 2 & 12 & −2 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 \\ 1−1(1) & −2−1(2) & 2−1(1) \\ 2−2(1) & 12−2(2) & −2−2(1) \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & −4 & 1 \\ 0 & 8 & −4 \end{pmatrix} = A^{(2)},
and
F^{(2)} A^{(2)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 0 & −4 & 1 \\ 0 & 8 & −4 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & −4 & 1 \\ 0 & 8+2(−4) & −4+2(1) \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & −4 & 1 \\ 0 & 0 & −2 \end{pmatrix} = A^{(3)} = U.
It follows that
U = A(n) = F (n−1) F (n−2) · · · F (1) A. (5.10)
Now the F^{(k)} are invertible, and the inverse is just given by adding rows instead of subtracting. The product of these inverses is the unit lower triangular matrix
L = (F^{(1)})^{−1} (F^{(2)})^{−1} \cdots (F^{(n−1)})^{−1} = \begin{pmatrix} 1 & & & \\ m_{21} & 1 & & \\ \vdots & \ddots & \ddots & \\ m_{n1} & \cdots & m_{n,n−1} & 1 \end{pmatrix},   (5.13)
whose entries below the diagonal are precisely the multipliers m_{ik}.
Theorem 5.2 (LU decomposition). Let U be the upper triangular matrix from Gaussian elimi-
nation of A (without pivoting), and let L be the unit lower triangular matrix (5.13). Then
A = LU .
Example → Consider again the system
\begin{pmatrix} 1 & 2 & 1 \\ 1 & −2 & 2 \\ 2 & 12 & −2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ 4 \end{pmatrix}.
We apply Gaussian elimination as before, but ignore b (for now), leading to
U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & −4 & 1 \\ 0 & 0 & −2 \end{pmatrix}.
As we apply the elimination, we record the multipliers so as to construct the matrix
L = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & −2 & 1 \end{pmatrix}.
Thus we have the factorisation/decomposition
\begin{pmatrix} 1 & 2 & 1 \\ 1 & −2 & 2 \\ 2 & 12 & −2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & −2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 0 & −4 & 1 \\ 0 & 0 & −2 \end{pmatrix}.
With the matrices L and U , we can readily solve for any right-hand side b. We illustrate for
our particular b. Firstly, solve Ly = b:
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & −2 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ 4 \end{pmatrix}  =⇒  y_1 = 0,  y_2 = 4 − y_1 = 4,  y_3 = 4 − 2y_1 + 2y_2 = 12.
Notice that y is the right-hand side b^{(3)} constructed earlier. Then, solve Ux = y:
\begin{pmatrix} 1 & 2 & 1 \\ 0 & −4 & 1 \\ 0 & 0 & −2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ 12 \end{pmatrix}  =⇒  x_3 = −6,  x_2 = −\tfrac{1}{4}(4 − x_3) = −\tfrac{5}{2},  x_1 = −2x_2 − x_3 = 11.
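The two triangular solves are easy to code; the following Python sketch (illustrative, not from the notes) reproduces y = (0, 4, 12)^T and x = (11, −5/2, −6)^T for the L, U and b above.

import numpy as np

def solve_lu(L, U, b):
    """Solve Ax = b given A = LU: forward substitution for Ly = b, then back substitution for Ux = y."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                      # forward substitution
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):          # back substitution
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, -2.0, 1.0]])
U = np.array([[1.0, 2.0, 1.0], [0.0, -4.0, 1.0], [0.0, 0.0, -2.0]])
b = np.array([0.0, 4.0, 4.0])
print(solve_lu(L, U, b))   # [11. , -2.5, -6. ]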
Gaussian elimination and LU factorisation will both fail if we ever hit a zero on the diagonal.
But this does not mean that the matrix A is singular.
Swapping rows or columns is called pivoting. It is needed if the “pivot” element is zero, as in
the above example. But it is also used to reduce rounding error.
m_{21} = −10^4,   a_{22}^{(2)} = 2 + 10^4,   b_2^{(2)} = 1 + 10^4.
and
x_2 = fl(1/1) = 1,   x_1 = fl(−[1 − 2(1)]) = 1.
Now both x 1 and x 2 are correct to 3 significant figures.
So pivoting is used to avoid large multipliers m_{ik}. A common strategy is partial pivoting, where we interchange rows at the kth stage of Gaussian elimination to bring the element a_{ik}^{(k)} of largest absolute value (over i = k, . . . , n) into the pivot position.
Ly = Pb  =⇒  y = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix},   Ux = y  =⇒  \begin{pmatrix} 2x_1 \\ 3x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}  =⇒  x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
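A sketch of LU factorisation with partial pivoting in Python (an illustration under the convention PA = LU, not the notes' algorithm); the permutation matrix P records the row swaps, so the system is then solved via Ly = Pb and Ux = y as above.

import numpy as np

def lu_partial_pivoting(A):
    """Factorise PA = LU, choosing the largest |a_ik^(k)| in column k as the pivot."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))   # row of the largest entry in column k
        if p != k:                            # swap rows of U, P and the computed part of L
            U[[k, p], k:] = U[[p, k], k:]
            P[[k, p]] = P[[p, k]]
            L[[k, p], :k] = L[[p, k], :k]
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]       # multiplier m_ik
            U[i, k:] -= L[i, k] * U[k, k:]
    return P, L, U

A = np.array([[0.0, 1.0, 2.0], [2.0, 1.0, 0.0], [1.0, 2.0, 1.0]])   # zero pivot in position (1,1)
P, L, U = lu_partial_pivoting(A)
print(np.allclose(P @ A, L @ U))   # True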
To measure the error when the solution is a vector, as opposed to a scalar, we usually summa-
rize the error in a single number called a norm.
A vector norm on R^n is a real-valued function ‖·‖ that satisfies
(N1) ‖x‖ ≥ 0, with ‖x‖ = 0 if and only if x = 0;
(N2) ‖αx‖ = |α| ‖x‖ for all α ∈ R, x ∈ R^n;
(N3) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^n (the triangle inequality).
1. The ℓ2-norm
   ‖x‖_2 := \left( \sum_{k=1}^{n} x_k^2 \right)^{1/2} = \sqrt{x^T x}.
We also use norms to measure the “size” of matrices. Since the set Rn×n of n × n matrices with
real entries is a vector space, we could just use a vector norm on this space. But usually we
add an additional axiom.
A matrix norm is a real-valued function ‖·‖ on R^{n×n} that satisfies (N1)–(N3) above, together with the additional axiom ‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ R^{n×n}.
Example → If we treat a matrix as a big vector with n^2 components, then the ℓ2-norm is called the Frobenius norm of the matrix:
‖A‖_F = \left( \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2}.
This norm is rarely used in numerical analysis because it is not induced by any vector norm
(as we are about to define).
The matrix norm induced by the vector norm ‖·‖_p is defined by
‖A‖_p := \sup_{x ≠ 0} ‖Ax‖_p / ‖x‖_p = \max_{‖x‖_p = 1} ‖Ax‖_p.   (5.15)
To see that the two definitions here are equivalent, use the fact that ‖·‖_p is a vector norm. So by (N2) we have
\sup_{x ≠ 0} ‖Ax‖_p / ‖x‖_p = \sup_{x ≠ 0} ‖A(x/‖x‖_p)‖_p = \sup_{‖y‖_p = 1} ‖Ay‖_p = \max_{‖y‖_p = 1} ‖Ay‖_p.   (5.16)
I Usually we use the same notation for the induced matrix norm as for the original vector
norm. The meaning should be clear from the context.
Example → Let
A = \begin{pmatrix} 0 & 1 \\ 3 & 0 \end{pmatrix}.
In the ℓ2-norm, a unit vector in R^2 has the form x = (cos θ, sin θ)^T, so the image of the unit circle is
Ax = \begin{pmatrix} sin θ \\ 3 cos θ \end{pmatrix}.
The image of the unit circle is an ellipse with semi-axes 1 and 3.
The induced matrix norm is the maximum stretching of this unit circle, which is
‖A‖_2 = \max_{‖x‖_2 = 1} ‖Ax‖_2 = \max_θ (sin^2 θ + 9 cos^2 θ)^{1/2} = \max_θ (1 + 8 cos^2 θ)^{1/2} = 3.
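A quick numerical check of this example (an illustrative NumPy sketch): sample unit vectors x = (cos θ, sin θ)^T and compare the largest stretch with numpy.linalg.norm(A, 2).

import numpy as np

A = np.array([[0.0, 1.0], [3.0, 0.0]])
theta = np.linspace(0.0, 2 * np.pi, 10001)
X = np.vstack([np.cos(theta), np.sin(theta)])   # unit vectors on the circle
stretch = np.linalg.norm(A @ X, axis=0)         # ||Ax||_2 for each sample
print(stretch.max())                            # approximately 3
print(np.linalg.norm(A, 2))                     # the induced 2-norm, exactly 3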
Theorem 5.3. The induced norm corresponding to any vector norm is a matrix norm, and the
two norms satisfy ‖Ax‖ ≤ ‖A‖ ‖x‖ for any matrix A ∈ R^{n×n} and any vector x ∈ R^n.
It is cumbersome to compute the induced norms from their definition, but fortunately there
are some very useful alternative formulae.
Theorem 5.4. The matrix norms induced by the `1 -norm and `∞ -norm satisfy
‖A‖_1 = \max_{j=1,...,n} \sum_{i=1}^{n} |a_{ij}|   (maximum column sum),
‖A‖_∞ = \max_{i=1,...,n} \sum_{j=1}^{n} |a_{ij}|   (maximum row sum).
Proof. We will prove the result for the `1 -norm and leave the `∞ -norm to the problem sheet.
Starting from the definition of the `1 vector norm, we have
‖Ax‖_1 = \sum_{i=1}^{n} \left| \sum_{j=1}^{n} a_{ij} x_j \right| ≤ \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}| |x_j| = \sum_{j=1}^{n} |x_j| \sum_{i=1}^{n} |a_{ij}|.   (5.20)
If we let
c = \max_{j=1,...,n} \sum_{i=1}^{n} |a_{ij}|,   (5.21)
then
‖Ax‖_1 ≤ c ‖x‖_1  =⇒  ‖A‖_1 ≤ c.   (5.22)
Now let m be the column where the maximum sum is attained. If we choose y to be the vector with components y_k = δ_{km}, then we have ‖Ay‖_1 = c. Since ‖y‖_1 = 1, we must have ‖A‖_1 ≥ c, and hence ‖A‖_1 = c.
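These formulae are easy to apply directly; the following NumPy sketch (illustrative, with an arbitrarily chosen matrix) compares them with numpy.linalg.norm.

import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [4.0,  0.0, -1.0],
              [2.0,  5.0, -6.0]])

col_sums = np.abs(A).sum(axis=0)                  # column sums of |a_ij|
row_sums = np.abs(A).sum(axis=1)                  # row sums of |a_ij|
print(col_sums.max(), np.linalg.norm(A, 1))       # both give 10.0 (maximum column sum)
print(row_sums.max(), np.linalg.norm(A, np.inf))  # both give 13.0 (maximum row sum)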
What about the matrix norm induced by the `2 -norm? This turns out to be related to the
eigenvalues of A. Recall that λ ∈ C is an eigenvalue of A with associated eigenvector u if
Au = λu. (5.24)
We define the spectral radius ρ (A) of A to be the maximum |λ| over all eigenvalues λ of A.
Theorem 5.5. The matrix norm induced by the `2 -norm satisfies
‖A‖_2 = \sqrt{ρ(A^T A)}.
I As a result this is sometimes known as the spectral norm.
Some linear systems are inherently more difficult to solve than others, because the solution is
sensitive to small perturbations in the input.
The condition number of A with respect to a norm ‖·‖_* is κ_*(A) := ‖A‖_* ‖A^{−1}‖_*. If κ_*(A) is large, then the solution will be sensitive to errors in b, at least for some b. A large condition number means that the matrix is close to being non-invertible (i.e. two rows are close to being linearly dependent).
Example → Return to our earlier examples and consider the condition numbers in the 1-norm.
We have (assuming 0 < ϵ ≪ 1) that
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}  =⇒  A^{−1} = \begin{pmatrix} 1 & −1 \\ 0 & 1 \end{pmatrix}  =⇒  ‖A‖_1 = ‖A^{−1}‖_1 = 2  =⇒  κ_1(A) = 4,
B = \begin{pmatrix} ϵ & 1 \\ 0 & 1 \end{pmatrix}  =⇒  B^{−1} = \frac{1}{ϵ} \begin{pmatrix} 1 & −1 \\ 0 & ϵ \end{pmatrix}  =⇒  ‖B‖_1 = 2,  ‖B^{−1}‖_1 = \frac{1 + ϵ}{ϵ}  =⇒  κ_1(B) = \frac{2(1 + ϵ)}{ϵ}.
For matrix B, κ 1 (B) → ∞ as ϵ → 0, showing that the matrix B is ill-conditioned.
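This blow-up is easy to observe numerically; an illustrative NumPy sketch comparing numpy.linalg.cond(B, 1) with the formula 2(1 + ϵ)/ϵ:

import numpy as np

for eps in [1e-2, 1e-6, 1e-12]:
    B = np.array([[eps, 1.0], [0.0, 1.0]])
    kappa = np.linalg.cond(B, 1)              # kappa_1(B) = ||B||_1 ||B^{-1}||_1
    print(eps, kappa, 2 * (1 + eps) / eps)    # the two values agree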
For large systems, the O(n3 ) cost of Gaussian elimination is prohibitive. Fortunately many
such systems that arise in practice are sparse, meaning that most of the entries of the matrix
A are zero. In this case, we can often use iterative algorithms to do better than O(n3 ).
In this course, we will only study algorithms for symmetric positive definite matrices. A matrix
A is called symmetric positive definite (or SPD) if it is symmetric and x^T Ax > 0 for every vector x ≠ 0.
I Recall that a symmetric matrix has real eigenvalues. It is positive definite iff all of its eigen-
values are positive.
x^T Ax = 3x_1^2 + 4x_2^2 + 5x_3^2 + 2x_1 x_2 + 4x_2 x_3 − 2x_1 x_3
       = x_1^2 + x_2^2 + 2x_3^2 + (x_1 + x_2)^2 + (x_1 − x_3)^2 + 2(x_2 + x_3)^2.
This is positive for any non-zero vector x ∈ R3 , so A is SPD (eigenvalues 1.29, 4.14 and 6.57).
If A is SPD, then solving Ax = b is equivalent to minimizing the quadratic functional
f(x) = \tfrac{1}{2} x^T Ax − b^T x.   (5.31)
When A is SPD, this functional behaves like a U-shaped parabola, and has a unique finite global minimizer x^* such that f(x^*) < f(x) for all x ∈ R^n, x ≠ x^*. To find x^*, we need to set ∇f = 0.
We have
f(x) = \tfrac{1}{2} \sum_{i=1}^{n} x_i \left( \sum_{j=1}^{n} a_{ij} x_j \right) − \sum_{j=1}^{n} b_j x_j,   (5.32)
so
\frac{∂f}{∂x_k} = \tfrac{1}{2} \left( \sum_{i=1}^{n} x_i a_{ik} + \sum_{j=1}^{n} a_{kj} x_j \right) − b_k = \tfrac{1}{2} \left( \sum_{i=1}^{n} a_{ki} x_i + \sum_{j=1}^{n} a_{kj} x_j \right) − b_k = \sum_{j=1}^{n} a_{kj} x_j − b_k.   (5.33)
In the penultimate step we used the symmetry of A to write aik = aki . It follows that
∇f = Ax − b, (5.34)
A line search method seeks the minimizer by taking steps
x_{k+1} = x_k + α_k d_k,   (5.35)
where d_k is a search direction and α_k a step size.
The step size αk is chosen by minimizing f (x ) along the line x = x k + αd k . For our functional
(5.31), we have
f(x_k + αd_k) = \tfrac{1}{2} d_k^T A d_k α^2 + \left( \tfrac{1}{2} d_k^T A x_k + \tfrac{1}{2} x_k^T A d_k − b^T d_k \right) α + \tfrac{1}{2} x_k^T A x_k − b^T x_k.   (5.36)
Since A is symmetric, we have x_k^T A d_k = x_k^T A^T d_k = (A x_k)^T d_k = d_k^T A x_k and b^T d_k = d_k^T b, so we can simplify to
f(x_k + αd_k) = \tfrac{1}{2} d_k^T A d_k α^2 + d_k^T (A x_k − b) α + \tfrac{1}{2} x_k^T A x_k − b^T x_k.   (5.37)
This is a quadratic in α with positive leading coefficient (since A is SPD), so it is minimized where its derivative with respect to α vanishes. Writing r_k := A x_k − b for the residual, this gives
α_k = − \frac{d_k^T r_k}{d_k^T A d_k}.   (5.39)
Different line search methods differ in how the search direction d k is chosen at each iteration.
For example, the method of steepest descent sets
d k = −∇f (x k ) = −r k , (5.40)
Continuing the iteration, x k proceeds towards the solution (2, −2) > as illustrated below. The
coloured contours show the value of f (x 1 , x 2 ).
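A minimal steepest-descent sketch in Python (the SPD matrix A and right-hand side b below are illustrative choices, not the example plotted in the notes), using the exact line search (5.39) with d_k = −r_k:

import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Minimise f(x) = 0.5 x^T A x - b^T x for SPD A by steepest descent."""
    x = x0.astype(float).copy()
    for k in range(max_iter):
        r = A @ x - b                        # residual r_k, equal to the gradient of f
        if np.linalg.norm(r) < tol:
            break
        d = -r                               # steepest descent direction
        alpha = -(d @ r) / (d @ (A @ d))     # exact line search, eq (5.39)
        x += alpha * d
    return x, k

A = np.array([[3.0, 1.0], [1.0, 2.0]])       # illustrative SPD matrix
b = np.array([1.0, 1.0])
x, its = steepest_descent(A, b, np.zeros(2))
print(x, its, np.linalg.solve(A, b))         # x agrees with the direct solve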
Unfortunately, the method of steepest descent can be slow to converge. In the conjugate gra-
dient method, we still take d 0 = −r 0 , but subsequent search directions d k are chosen to be
A-conjugate, meaning that
d_{k+1}^T A d_k = 0.   (5.41)
This means that minimization in one direction does not undo the previous minimizations.
In particular, we construct d_{k+1} by writing
d_{k+1} = −r_{k+1} + β_k d_k,
and choosing β_k so that the conjugacy condition (5.41) holds.
Theorem 5.7. The residuals r_k := A x_k − b at each stage of the conjugate gradient method are mutually orthogonal, meaning r_j^T r_k = 0 for j = 0, . . . , k − 1.
After n iterations, the only residual vector that can be orthogonal to all of the previous ones is
r n = 0, so x n must be the exact solution.
I In practice, conjugate gradients is not competitive as a direct method. It is computationally
intensive, and rounding errors can destroy the orthogonality, meaning that more than n iter-
ations may be required. Instead, its main use is for large sparse systems. For suitable matrices
(perhaps after preconditioning), it can converge very rapidly.
I We can save computation by using the alternative formulae
r_{k+1} = r_k + α_k A d_k,   α_k = \frac{r_k^T r_k}{d_k^T A d_k},   β_k = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}.
With these formulae, each iteration requires only one matrix-vector product, two vector-
vector products, and three vector additions. Compare this to Algorithm 5.6 which requires
two matrix-vector products, four vector-vector products and three vector additions.
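Putting these formulae together gives a compact implementation; the following Python sketch is illustrative and assumes A is SPD.

import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Solve Ax = b for SPD A by the conjugate gradient method."""
    n = len(b)
    x = np.zeros(n)
    r = A @ x - b                      # residual r_0 = Ax_0 - b
    d = -r                             # initial search direction d_0 = -r_0
    for k in range(n):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d                     # the only matrix-vector product per iteration
        alpha = (r @ r) / (d @ Ad)
        x = x + alpha * d
        r_new = r + alpha * Ad         # r_{k+1} = r_k + alpha_k A d_k
        beta = (r_new @ r_new) / (r @ r)
        d = -r_new + beta * d          # next A-conjugate search direction
        r = r_new
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # illustrative SPD matrix
b = np.array([1.0, 2.0, 3.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))   # agree (up to rounding) after n = 3 steps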
6.1 Orthogonality
Recall that the inner product between two column vectors x, y ∈ Rn is defined as
x · y = x^T y = \sum_{k=1}^{n} x_k y_k.   (6.1)
This is related to the ℓ2-norm since ‖x‖_2 = \sqrt{x^T x}. The angle θ between x and y is given by
x^T y = ‖x‖_2 ‖y‖_2 cos θ.
Two vectors x and y are orthogonal if x^T y = 0 (i.e. they lie at right angles in R^n).
Let S = {x_1, x_2, . . . , x_n} be a set of n vectors. Then S is called orthogonal if x_i^T x_j = 0 for all i, j ∈ {1, 2, . . . , n} with i ≠ j.
Theorem 6.1. An orthogonal set of n non-zero vectors in R^n forms a basis for R^n.
Proof. We know that a set of n vectors is a basis for R^n if the vectors are linearly independent.
If this is not the case, then some x k ∈ S could be expressed as a linear combination of the other
members,
x_k = \sum_{i=1, i≠k}^{n} c_i x_i.   (6.2)
Taking the inner product of both sides with x_k would give
x_k^T x_k = \sum_{i≠k} c_i x_k^T x_i = 0,
where we used orthogonality. Since x_k ≠ 0, this would be a contradiction, so we conclude that the vectors
in an orthogonal set are linearly independent.
I Many of the best algorithms for numerical analysis are based on the idea of orthogonality.
We will see some examples this term.
An orthonormal set is an orthogonal set where all of the vectors have unit norm. Given
an orthogonal set S = {x_1, x_2, . . . , x_n}, we can always construct an orthonormal set S′ = {x′_1, x′_2, . . . , x′_n} by normalisation, meaning
x′_i = x_i / ‖x_i‖.   (6.4)
If m = n, then such a Q is called an orthogonal matrix. For m , n, it is just called a matrix with
orthonormal columns.
Therefore S forms a basis for R2 . If x is a vector with components x 1 , x 2 in the standard basis
{ (1, 0) > , (0, 1) > }, then the components of x in the basis given by S are
\begin{pmatrix} 2/\sqrt{5} & 1/\sqrt{5} \\ 1/\sqrt{5} & −2/\sqrt{5} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
I Inner products are preserved under multiplication by orthogonal matrices, since (Qx ) >Qy =
x > (Q >Q )y = x >y. This means that angles between vectors and the lengths of vectors are
preserved. Multiplication by an orthogonal matrix corresponds to a rigid rotation (if det(Q ) =
1) or a reflection (if det(Q ) = −1).
The discrete least squares problem: find x that minimizes the ℓ2-norm of the residual, ‖Ax − b‖_2.
Given data f(x_1), . . . , f(x_m) at m points, we seek the coefficients of
p_n(x) = c_0 + c_1 x + . . . + c_n x^n
from the (generally overdetermined) linear system
\begin{pmatrix} 1 & x_1 & \cdots & x_1^n \\ 1 & x_2 & \cdots & x_2^n \\ \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^n \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{pmatrix} = \begin{pmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_m) \end{pmatrix}.
The least-squares solution minimizes
\sum_{i=1}^{m} \left( p(x_i) − f(x_i) \right)^2.
Theorem 6.3. The matrix A>A is invertible iff the columns of A are linearly independent, in
which case Ax = b has a unique least-squares solution x = (A>A) −1A>b.
Proof. If A>A is singular (non-invertible), then A>Ax = 0 for some non-zero vector x, implying
that
x^T A^T A x = 0  =⇒  ‖Ax‖_2^2 = 0  =⇒  Ax = 0.   (6.8)
This implies that A is rank-deficient (i.e. its columns are linearly dependent).
Conversely, if A is rank-deficient, then Ax = 0 for some x , 0, implying A>Ax = 0 and hence
that A>A is singular.
Example → Fit a least-squares straight line to the data f (−3) = f (0) = 0, f (6) = 2.
Here n = 1 (fitting a straight line) and m = 2 (3 data points), so x 0 = −3, x 1 = 0 and x 2 = 6.
The overdetermined system is
\begin{pmatrix} 1 & −3 \\ 1 & 0 \\ 1 & 6 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix},
and the normal equations have the form
\begin{pmatrix} 3 & x_0 + x_1 + x_2 \\ x_0 + x_1 + x_2 & x_0^2 + x_1^2 + x_2^2 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} f(x_0) + f(x_1) + f(x_2) \\ x_0 f(x_0) + x_1 f(x_1) + x_2 f(x_2) \end{pmatrix}
⇐⇒  \begin{pmatrix} 3 & 3 \\ 3 & 45 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 2 \\ 12 \end{pmatrix}  =⇒  \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 3/7 \\ 5/21 \end{pmatrix}.
So the least-squares approximation by a straight line is p_1(x) = \tfrac{3}{7} + \tfrac{5}{21} x.
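An illustrative NumPy sketch that forms and solves the normal equations for this data set:

import numpy as np

x = np.array([-3.0, 0.0, 6.0])          # data abscissae x_0, x_1, x_2
f = np.array([0.0, 0.0, 2.0])           # data values
A = np.vander(x, 2, increasing=True)    # columns [1, x] for the straight line c_0 + c_1 x
c = np.linalg.solve(A.T @ A, A.T @ f)   # normal equations (A^T A) c = A^T f
print(c)                                # [3/7, 5/21] = [0.4286..., 0.2381...]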
In practice the normal matrix A>A can often be ill-conditioned. A better method is based on
another matrix factorization.
Theorem 6.4 (QR decomposition). Any real m × n matrix A, with m ≥ n, can be written in the
form A = QR, where Q is an m ×n matrix with orthonormal columns and R is an upper-triangular
n × n matrix.
Example → Use the Gram-Schmidt process to construct an orthonormal basis forW = Span{u 1 , u 2 },
where u 1 = (3, 6, 0) > and u 2 = (1, 2, 2) > .
We take
p_1 = u_1 = \begin{pmatrix} 3 \\ 6 \\ 0 \end{pmatrix}  =⇒  q_1 = \frac{p_1}{‖p_1‖_2} = \frac{1}{\sqrt{45}} \begin{pmatrix} 3 \\ 6 \\ 0 \end{pmatrix}.
Then
p_2 = u_2 − (u_2^T q_1) q_1 = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} − \frac{15}{45} \begin{pmatrix} 3 \\ 6 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix}  =⇒  q_2 = \frac{p_2}{‖p_2‖_2} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
The set {q 1 , q 2 } is orthonormal and spans W .
How do we use this to construct our QR decomposition of A? We simply apply Gram-Schmidt
to the set of columns of A, {a 1 , . . . , an }, which are vectors in Rm . This produces a set of
orthogonal vectors qi ∈ Rm . Moreover, we have
‖p_k‖_2 q_k = a_k − \sum_{i=1}^{k−1} (a_k^T q_i) q_i  =⇒  a_k = ‖p_k‖_2 q_k + \sum_{i=1}^{k−1} (a_k^T q_i) q_i.   (6.9)
Taking the inner product with q_k and using orthonormality of the q_i shows that ‖p_k‖_2 = a_k^T q_k, so we can write the columns of A as
a_k = \sum_{i=1}^{k} (a_k^T q_i) q_i.   (6.10)
In matrix form, this says
A = \underbrace{\begin{pmatrix} q_1 & q_2 & \cdots & q_n \end{pmatrix}}_{Q \ (m × n)} \underbrace{\begin{pmatrix} a_1^T q_1 & a_2^T q_1 & \cdots & a_n^T q_1 \\ 0 & a_2^T q_2 & \cdots & a_n^T q_2 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_n^T q_n \end{pmatrix}}_{R \ (n × n)}   (6.11)
I What we have done here is express each column of A in the orthonormal basis given by the
columns of Q. The coefficients in the new basis are stored in R, which will be non-singular if
A has full column rank.
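A classical Gram–Schmidt QR sketch in Python (illustrative; in practice modified Gram–Schmidt or Householder reflections are usually preferred for better numerical stability). On the 3 × 2 matrix of the least-squares example it reproduces the factors found below.

import numpy as np

def gram_schmidt_qr(A):
    """Compute A = QR by classical Gram-Schmidt on the columns of A (assumed full column rank)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        p = A[:, k].astype(float)
        for i in range(k):
            R[i, k] = A[:, k] @ Q[:, i]    # coefficient a_k^T q_i
            p -= R[i, k] * Q[:, i]         # subtract the projection onto q_i
        R[k, k] = np.linalg.norm(p)        # ||p_k||_2 = a_k^T q_k
        Q[:, k] = p / R[k, k]
    return Q, R

A = np.array([[1.0, -3.0], [1.0, 0.0], [1.0, 6.0]])
Q, R = gram_schmidt_qr(A)
print(R)                        # [[sqrt(3), sqrt(3)], [0, sqrt(42)]]
print(np.allclose(Q @ R, A))    # True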
How does this help in least squares? If A = QR then the normal equations become
A^T A x = A^T b  ⇐⇒  R^T Q^T Q R x = R^T Q^T b  ⇐⇒  R^T R x = R^T Q^T b  ⇐⇒  R x = Q^T b.
In the last step we assumed that R is invertible. We see that the problem is reduced to an
upper-triangular system, which may be solved by back-substitution.
Example → Use QR decomposition to find our earlier least-squares straight line, where
\begin{pmatrix} 1 & −3 \\ 1 & 0 \\ 1 & 6 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix}.
The columns of A are a 1 = (1, 1, 1) > and a 2 = (−3, 0, 6) > . So applying Gram-Schmidt gives
p_1 = a_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}  =⇒  q_1 = \frac{p_1}{‖p_1‖_2} = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},
and
p_2 = a_2 − (a_2^T q_1) q_1 = \begin{pmatrix} −3 \\ 0 \\ 6 \end{pmatrix} − \frac{3}{\sqrt{3}} \cdot \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} −4 \\ −1 \\ 5 \end{pmatrix}  =⇒  q_2 = \frac{p_2}{‖p_2‖_2} = \frac{1}{\sqrt{42}} \begin{pmatrix} −4 \\ −1 \\ 5 \end{pmatrix}.
Therefore A = QR with
Q = \begin{pmatrix} 1/\sqrt{3} & −4/\sqrt{42} \\ 1/\sqrt{3} & −1/\sqrt{42} \\ 1/\sqrt{3} & 5/\sqrt{42} \end{pmatrix},   R = \begin{pmatrix} a_1^T q_1 & a_2^T q_1 \\ 0 & a_2^T q_2 \end{pmatrix} = \begin{pmatrix} \sqrt{3} & \sqrt{3} \\ 0 & \sqrt{42} \end{pmatrix}.
The normal equations may then be written
R x = Q^T b  =⇒  \begin{pmatrix} \sqrt{3} & \sqrt{3} \\ 0 & \sqrt{42} \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 2/\sqrt{3} \\ 10/\sqrt{42} \end{pmatrix}  =⇒  \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 3/7 \\ 5/21 \end{pmatrix},
the same coefficients as before.
For the continuous least squares problem we work with the weighted inner product
(f, g) := \int_a^b f(x) g(x) w(x) \, dx,   (6.16)
for any choice of weight function w(x) that is positive, continuous, and integrable on (a, b).
I The purpose of the weight function will be to assign varying degrees of importance to errors
on different portions of the interval.
Since (6.16) is an inner product, we can define a norm satisfying (N1), (N2), (N3) by
‖f‖ := \sqrt{(f, f)} = \left( \int_a^b |f(x)|^2 w(x) \, dx \right)^{1/2}.   (6.17)
The continuous least squares problem is to find the polynomial p_n ∈ P_n that minimizes ‖p_n − f‖ in a given inner product. The analogue of the normal equations is the following.
Theorem 6.6 (Continuous least squares). Given f ∈ C[a, b], the polynomial p_n ∈ P_n minimizes ‖q_n − f‖ among all q_n ∈ P_n if and only if
(p_n − f, q_n) = 0   for all q_n ∈ P_n.
I Notice the analogy with discrete least squares: we are again setting the “error” orthogonal
to the space of “possible functions” Pn .
I The theorem holds more generally for best approximations in any subspace of an inner
product space. It is an important result in the subject of Approximation Theory.
Just as with discrete least squares, we can make our life easier by working in an orthonormal
basis for Pn .
A family of orthogonal polynomials associated with the inner product (6.16) is a set {ϕ 0 , ϕ 1 ,
ϕ 2 , . . .} where each ϕk is a polynomial of degree exactly k and the polynomials satisfy the
orthogonality condition
(ϕ_j, ϕ_k) = 0,   k ≠ j.   (6.21)
I This condition implies that each ϕk is orthogonal to all polynomials of degree less than k.
The condition (6.21) determines the family uniquely up to normalisation, since multiplying
each ϕk by a constant factor does not change their orthogonality. There are three common
choices of normalisation:
1. Require each ϕk to be monic (leading coefficient 1).
2. Require orthonormality, (ϕ j , ϕk ) = δ jk .
3. Require ϕk (1) = 1 for all k.
I The final one is the standard normalisation for Chebyshev and Legendre polynomials.
As in Theorem 6.1, a set of orthogonal polynomials {ϕ 0 , ϕ 1 , . . . , ϕn } will form a basis for Pn .
Since this is a basis, the least-squares solution pn ∈ Pn may be written
pn (x ) = c 0ϕ 0 (x ) + c 1ϕ 1 (x ) + . . . + cnϕn (x ) (6.22)
where c 0 , . . . , cn are the unknown coefficients to be found. Then according to Theorem 6.6 we
can find these coefficients by requiring
(pn − f , ϕk ) = 0 for k = 0, . . . , n, (6.23)
⇐⇒  c_0 (ϕ_0, ϕ_k) + c_1 (ϕ_1, ϕ_k) + . . . + c_n (ϕ_n, ϕ_k) = (f, ϕ_k)   for k = 0, . . . , n,   (6.24)
⇐⇒  c_k = \frac{(f, ϕ_k)}{(ϕ_k, ϕ_k)}   for k = 0, . . . , n.   (6.25)
So compared to the natural basis, the number of integrals required is greatly reduced (at least
once you have the orthogonal polynomials).
We can construct an orthogonal basis using the same Gram-Schmidt algorithm as in the dis-
crete case. For simplicity, we will construct a set of monic orthogonal polynomials. Start with
the monic polynomial of degree 0,
ϕ 0 (x ) = 1. (6.26)
Then construct ϕ 1 (x ) from x by subtracting the orthogonal projection of x on ϕ 0 , giving
ϕ_1(x) = x − \frac{(xϕ_0, ϕ_0)}{(ϕ_0, ϕ_0)} ϕ_0(x) = x − \frac{(x, 1)}{(1, 1)}.   (6.27)
In general, given the orthogonal set {ϕ 0 , ϕ 1 , . . . , ϕk }, we construct ϕk+1 (x ) by starting with
xϕk (x ) and subtracting its orthogonal projections on ϕ 0 , ϕ 1 , . . . , ϕk . Thus
ϕ_{k+1}(x) = xϕ_k(x) − \frac{(xϕ_k, ϕ_0)}{(ϕ_0, ϕ_0)} ϕ_0(x) − \frac{(xϕ_k, ϕ_1)}{(ϕ_1, ϕ_1)} ϕ_1(x) − . . . − \frac{(xϕ_k, ϕ_{k−1})}{(ϕ_{k−1}, ϕ_{k−1})} ϕ_{k−1}(x) − \frac{(xϕ_k, ϕ_k)}{(ϕ_k, ϕ_k)} ϕ_k(x).   (6.28)
Theorem 6.7 (Three-term recurrence). The set of monic orthogonal polynomials under the inner
product (6.16) satisfy the recurrence relation
ϕ_0(x) = 1,   ϕ_1(x) = x − \frac{(x, 1)}{(1, 1)},
ϕ_{k+1}(x) = xϕ_k(x) − \frac{(xϕ_k, ϕ_k)}{(ϕ_k, ϕ_k)} ϕ_k(x) − \frac{(ϕ_k, ϕ_k)}{(ϕ_{k−1}, ϕ_{k−1})} ϕ_{k−1}(x)   for k ≥ 1.
I Traditionally, the Legendre polynomials are then normalised so that ϕk (1) = 1 for all k. In
that case, the recurrence relation reduces to
ϕ_{k+1}(x) = \frac{(2k + 1) x ϕ_k(x) − k ϕ_{k−1}(x)}{k + 1}.
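An illustrative Python sketch generating the Legendre polynomials (weight w = 1 on [−1, 1], normalised so that ϕ_k(1) = 1) from this recurrence, using numpy.polynomial.Polynomial:

import numpy as np
from numpy.polynomial import Polynomial

def legendre(n):
    """Return the Legendre polynomials phi_0, ..., phi_n from the three-term recurrence."""
    x = Polynomial([0.0, 1.0])
    phi = [Polynomial([1.0]), x]           # phi_0 = 1 and phi_1 = x for w = 1 on [-1, 1]
    for k in range(1, n):
        phi.append(((2 * k + 1) * x * phi[k] - k * phi[k - 1]) / (k + 1))
    return phi[: n + 1]

for p in legendre(3):
    print(p.coef)       # 1;  x;  (3x^2 - 1)/2;  (5x^3 - 3x)/2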
Example → Use a basis of orthogonal polynomials to find the least squares polynomial p1 =
c 0 +c 1x that approximates f (x ) = sin(πx ) on the interval [0, 1] with weight function w (x ) = 1.
Starting with ϕ 0 (x ) = 1, we compute
ϕ_1(x) = x − \frac{\int_0^1 x \, dx}{\int_0^1 dx} = x − \tfrac{1}{2}.
In other words,
I_n(f) := \sum_{k=0}^{n} σ_k f(x_k),   where   σ_k = \int_a^b ℓ_k(x) \, dx.   (7.5)
When the nodes are equidistant, this is called a Newton-Cotes formula. If x 0 = a and xn = b, it
is called a closed Newton-Cotes formula.
I An open Newton-Cotes formula has nodes xi = a + (i + 1)h for h = (b − a)/(n + 2).
Theorem 7.1. Let f be continuous on [a, b] with n + 1 continuous derivatives on (a, b). Then the
Newton-Cotes formula (7.5) satisfies the error bound
|I(f) − I_n(f)| ≤ \frac{\max_{ξ∈[a,b]} |f^{(n+1)}(ξ)|}{(n + 1)!} \int_a^b |(x − x_0)(x − x_1) \cdots (x − x_n)| \, dx.
Proof. We have
I(f) − I_n(f) = \int_a^b f(x) \, dx − \int_a^b p_n(x) \, dx = \int_a^b \left[ f(x) − p_n(x) \right] dx,   (7.6)
so that
|I(f) − I_n(f)| ≤ \int_a^b |f(x) − p_n(x)| \, dx.   (7.7)
Now recall Theorem 2.6, which says that, for each x ∈ [a, b], we can write
f(x) − p_n(x) = \frac{f^{(n+1)}(ξ)}{(n + 1)!} (x − x_0)(x − x_1) \cdots (x − x_n)   (7.8)
for some ξ ∈ (a, b). The theorem simply follows by inserting this into inequality (7.7).
For the trapezium rule (n = 1), writing M_2 = \max_{ξ∈[a,b]} |f''(ξ)|, the bound becomes
|I(f) − I_1(f)| ≤ \frac{M_2}{2!} \int_a^b (x − a)(b − x) \, dx = \frac{(b − a)^3}{12} M_2.
For our earlier example with a = 0, b = 2, f(x) = e^x, the estimate gives
|I(f) − I_1(f)| ≤ \tfrac{1}{12} (2^3) e^2 ≈ 4.926.
On each subinterval [x_{i−1}, x_i] of width h, the same argument bounds the error by \frac{M_2}{2!} \int_{x_{i−1}}^{x_i} (x − x_{i−1})(x_i − x) \, dx. Note that
\int_{x_{i−1}}^{x_i} (x − x_{i−1})(x_i − x) \, dx = \int_{x_{i−1}}^{x_i} \left[ −x^2 + (x_{i−1} + x_i)x − x_{i−1} x_i \right] dx = \frac{h^3}{6},
so summing over the m subintervals bounds the total error by m \, \frac{M_2 h^3}{12} = \frac{(b − a) M_2 h^2}{12}.
As long as f is sufficiently smooth, this shows that the composite trapezium rule will converge
as m → ∞. Moreover, the convergence will be O(h 2 ).
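An illustrative Python sketch checking the O(h^2) convergence on the earlier example \int_0^2 e^x dx: halving h reduces the error by roughly a factor of four.

import numpy as np

def composite_trapezium(f, a, b, m):
    """Composite trapezium rule with m subintervals of width h = (b - a)/m."""
    x = np.linspace(a, b, m + 1)
    h = (b - a) / m
    return h * (0.5 * f(x[0]) + f(x[1:-1]).sum() + 0.5 * f(x[-1]))

exact = np.exp(2) - 1.0                       # integral of e^x over [0, 2]
for m in [4, 8, 16, 32]:
    err = abs(composite_trapezium(np.exp, 0.0, 2.0, m) - exact)
    print(m, err)                             # error decreases by about 4x each time h is halved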
7.3 Exactness
From Theorem 7.1, we see that the Newton-Cotes formula In ( f ) will give the exact answer if
f (n+1) = 0. In other words, it will be exact if f ∈ Pn .
The degree of exactness of a quadrature formula is the largest integer n for which the formula
is exact for all polynomials in Pn .
To check whether a quadrature formula has degree of exactness n, it suffices to check whether
it is exact for the basis 1, x, x 2 , . . . , x n .
The idea of Gaussian quadrature is to choose not only the weights σk but also the nodes xk , in
order to achieve the highest possible degree of exactness.
Firstly, we will illustrate the brute force method of undetermined coefficients.
Example → Gaussian quadrature formula G_1(f) = \sum_{k=0}^{1} σ_k f(x_k) on the interval [−1, 1].
Here we have four unknowns x 0 , x 1 , σ0 and σ1 , so we can impose four conditions:
G_1(1) = I(1)   =⇒   σ_0 + σ_1 = \int_{−1}^{1} dx = 2,
G_1(x) = I(x)   =⇒   σ_0 x_0 + σ_1 x_1 = \int_{−1}^{1} x \, dx = 0,
G_1(x^2) = I(x^2)   =⇒   σ_0 x_0^2 + σ_1 x_1^2 = \int_{−1}^{1} x^2 \, dx = \tfrac{2}{3},
G_1(x^3) = I(x^3)   =⇒   σ_0 x_0^3 + σ_1 x_1^3 = \int_{−1}^{1} x^3 \, dx = 0.
To solve this system, the symmetry suggests that x_1 = −x_0 and σ_0 = σ_1. This will automatically satisfy the equations for x and x^3, leaving the two equations
2σ_0 = 2,   2σ_0 x_0^2 = \tfrac{2}{3},
so that σ_0 = σ_1 = 1 and x_1 = −x_0 = 1/\sqrt{3}. The resulting Gaussian quadrature formula is
G_1(f) = f\left( −\tfrac{1}{\sqrt{3}} \right) + f\left( \tfrac{1}{\sqrt{3}} \right).
This formula has degree of exactness 3.
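A quick numerical check (illustrative Python) that G_1 integrates x^k exactly for k = 0, 1, 2, 3 but not for k = 4:

import numpy as np

nodes = np.array([-1.0, 1.0]) / np.sqrt(3)   # Gauss nodes on [-1, 1]
weights = np.array([1.0, 1.0])

for k in range(5):
    G1 = weights @ nodes**k                  # quadrature value for f(x) = x^k
    exact = (1 - (-1)**(k + 1)) / (k + 1)    # integral of x^k over [-1, 1]
    print(k, G1, exact)                      # agree for k <= 3, differ for k = 4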
In general, the Gaussian quadrature formula with n + 1 nodes will have degree of exactness 2n + 1.
The method of undetermined coefficients becomes unworkable for larger numbers of nodes,
because of the nonlinearity of the equations. A much more elegant method uses orthogonal
polynomials. In addition to what we learned before, we will need the following result.
Lemma 7.2. If {ϕ 0 , ϕ 1 , . . . , ϕn } is a set of orthogonal polynomials on [a, b] under the inner prod-
uct (6.16) and ϕk is of degree k for each k = 0, 1, . . . , n, then ϕk has k distinct real roots, and these
roots lie in the interval [a, b].
Proof. Let x 1 , . . . , x j be the points where ϕk (x ) changes sign in [a, b]. If j = k then we are done.
Otherwise, suppose j < k, and consider the polynomial
q_j(x) = (x − x_1)(x − x_2) \cdots (x − x_j).   (7.10)
Then ϕ_k(x) q_j(x) does not change sign on [a, b], so (ϕ_k, q_j) ≠ 0. But q_j has degree j < k, so orthogonality gives (ϕ_k, q_j) = 0, which is a contradiction.
Remarkably, these roots are precisely the optimum choice of nodes for a quadrature formula
to approximate the (weighted) integral
I_w(f) = \int_a^b f(x) w(x) \, dx.   (7.13)
Theorem 7.3 (Gaussian quadrature). Let ϕn+1 be a polynomial in Pn+1 that is orthogonal on
[a, b] to all polynomials in Pn , with respect to the weight function w (x ). If x 0 , x 1 , . . . , xn are the
roots of ϕn+1 , then the quadrature formula
G_{n,w}(f) := \sum_{k=0}^{n} σ_k f(x_k),   σ_k = \int_a^b ℓ_k(x) w(x) \, dx,
where the ℓ_k are the Lagrange basis polynomials as before, has degree of exactness 2n + 1.
I Using an appropriate weight function w (x ) can be useful for integrands with a singularity,
since we can incorporate this in w (x ) and still approximate the integral with Gn,w .
Example → Gaussian quadrature for \int_0^1 cos(x) \, x^{−1/2} \, dx, with n = 0.
This is a Fresnel integral, with exact value 1.80905 . . . Let us compare the effect of using an
appropriate weight function.
1. Standard quadrature with w(x) = 1. From ϕ_1(x) = x − \tfrac{1}{2} the single node is x_0 = \tfrac{1}{2}. The corresponding weight may be found by imposing G_0(1) = I(1), which gives σ_0 = \int_0^1 dx = 1. Then our estimate is
G_0\left( \frac{cos(x)}{\sqrt{x}} \right) = \frac{cos \tfrac{1}{2}}{\sqrt{1/2}} = 1.2411 . . .
2. Weighted quadrature with w(x) = x^{−1/2}. This time we get
ϕ_1(x) = x − \frac{\int_0^1 x^{1/2} \, dx}{\int_0^1 x^{−1/2} \, dx} = x − \frac{2/3}{2} = x − \tfrac{1}{3}   =⇒   x_0 = \tfrac{1}{3}.
The corresponding weight is σ_0 = \int_0^1 x^{−1/2} \, dx = 2, so the new estimate is the more accurate
G_{0,w}(cos(x)) = 2 cos \tfrac{1}{3} = 1.8899 . . .
Proof of Theorem 7.3. First, recall that any interpolatory quadrature formula based on n + 1
nodes will be exact for all polynomials in Pn (this follows from Theorem 7.1, which can be
modified to include the weight function w (x )). So in particular, Gn,w is exact for pn ∈ Pn .
Now let p_{2n+1} ∈ P_{2n+1}. The trick is to divide this by the orthogonal polynomial ϕ_{n+1} whose roots are the nodes. This gives
p_{2n+1}(x) = ϕ_{n+1}(x) q_n(x) + r_n(x),   with q_n, r_n ∈ P_n.   (7.14)
Then
G_{n,w}(p_{2n+1}) = \sum_{k=0}^{n} σ_k p_{2n+1}(x_k) = \sum_{k=0}^{n} σ_k \left[ ϕ_{n+1}(x_k) q_n(x_k) + r_n(x_k) \right] = \sum_{k=0}^{n} σ_k r_n(x_k) = I_w(r_n),   (7.15)
where we have used the fact that Gn,w is exact for rn ∈ Pn . Now, since qn has lower degree
than ϕn+1 , it must be orthogonal to ϕn+1 , so
I_w(ϕ_{n+1} q_n) = \int_a^b ϕ_{n+1}(x) q_n(x) w(x) \, dx = 0,   (7.16)
and hence
I_w(p_{2n+1}) = I_w(ϕ_{n+1} q_n) + I_w(r_n) = I_w(r_n) = G_{n,w}(p_{2n+1}),
so the formula is exact for all polynomials in P_{2n+1}.
Theorem 7.4 (Mean value theorem for integrals). Let f and g be continuous on [a, b], with g(x) ≥ 0 on [a, b]. Then there exists ξ ∈ (a, b) such that
\int_a^b f(x) g(x) \, dx = f(ξ) \int_a^b g(x) \, dx.
Proof. Let m and M be the minimum and maximum values of f on [a, b], respectively. Since g(x) ≥ 0, we have that
m \int_a^b g(x) \, dx ≤ \int_a^b f(x) g(x) \, dx ≤ M \int_a^b g(x) \, dx.   (7.18)
Now let I = \int_a^b g(x) \, dx. If I = 0 then g(x) ≡ 0, so \int_a^b f(x) g(x) \, dx = 0 and the theorem holds for every ξ ∈ (a, b). Otherwise, we have
m ≤ \frac{1}{I} \int_a^b f(x) g(x) \, dx ≤ M.   (7.19)
By the Intermediate Value Theorem (Theorem 4.1), f(x) attains every value between m and M somewhere in (a, b), so in particular there exists ξ ∈ (a, b) with
f(ξ) = \frac{1}{I} \int_a^b f(x) g(x) \, dx.   (7.20)
Theorem 7.5 (Error estimate for Gaussian quadrature). Let ϕn+1 ∈ Pn+1 be monic and orthogo-
nal on [a, b] to all polynomials in Pn , with respect to the weight function w (x ). Let x 0 , x 1 , . . . , xn
be the roots of ϕn+1 , and let Gn,w ( f ) be the Gaussian quadrature formula defined by Theorem 7.3.
If f has 2n + 2 continuous derivatives on (a, b), then there exists ξ ∈ (a, b) such that
I_w(f) − G_{n,w}(f) = \frac{f^{(2n+2)}(ξ)}{(2n + 2)!} \int_a^b ϕ_{n+1}^2(x) w(x) \, dx.
Proof. A neat trick is to use Hermite interpolation. Since the xk are distinct, there exists a
unique polynomial p_{2n+1} ∈ P_{2n+1} such that
p_{2n+1}(x_k) = f(x_k),   p'_{2n+1}(x_k) = f'(x_k),   k = 0, 1, . . . , n.   (7.21)
In addition (see problem sheet), there exists λ ∈ (a, b), depending on x, such that
f(x) − p_{2n+1}(x) = \frac{f^{(2n+2)}(λ)}{(2n + 2)!} \prod_{i=0}^{n} (x − x_i)^2.   (7.22)
Since G_{n,w} is exact for p_{2n+1} ∈ P_{2n+1} and p_{2n+1}(x_k) = f(x_k), we have G_{n,w}(f) = G_{n,w}(p_{2n+1}) = I_w(p_{2n+1}), and therefore
I_w(f) − G_{n,w}(f) = I_w(f − p_{2n+1}) = \int_a^b \frac{f^{(2n+2)}(λ)}{(2n + 2)!} \, ϕ_{n+1}^2(x) w(x) \, dx,
using the fact that \prod_{i=0}^{n} (x − x_i)^2 = ϕ_{n+1}^2(x), since ϕ_{n+1} is monic with roots x_0, . . . , x_n. For the right-hand side, we can't take f^{(2n+2)}(λ) outside the integral since λ depends on x. But ϕ_{n+1}^2(x) w(x) ≥ 0 on [a, b], so we can apply the mean value theorem for integrals and get
I_w(f) − G_{n,w}(f) = \frac{f^{(2n+2)}(ξ)}{(2n + 2)!} \int_a^b ϕ_{n+1}^2(x) w(x) \, dx.   (7.25)