
5 Linear equations

How do we solve a linear system numerically?

Linear systems of the form


\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2,\\
&\;\;\vdots\\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}    (5.1)
occur in many applications (often with very large n). It is convenient to express (5.1) in the
matrix form
Ax = b, (5.2)
where A is an n × n square matrix with elements a_{ij}, and x, b are n × 1 column vectors.
We will need some basic facts from linear algebra:
1. A^T is the transpose of A, so (A^T)_{ij} = a_{ji}.
2. A is symmetric if A = A^T.
3. A is non-singular iff there exists a solution x ∈ R^n for every b ∈ R^n.
4. A is non-singular iff det(A) ≠ 0.
5. A is non-singular iff there exists a unique inverse A^{-1} such that AA^{-1} = A^{-1}A = I.
It follows from fact 5 that (5.2) has a unique solution iff A is non-singular, given by x = A^{-1}b.
In this chapter, we will see how to solve (5.2) both efficiently and accurately.
I Although this seems like a conceptually easy problem (just use Gaussian elimination!), it
is actually a hard one when n gets large. Nowadays, linear systems with n = 1 million arise
routinely in computational problems. And even for small n there are some potential pitfalls,
as we will see.
I If A is instead rectangular (m × n), then there are different numbers of equations and unknowns, and we do not expect a unique solution. Nevertheless, we can still look for an approximate solution – this will be considered in Section 6.
Many algorithms are based on the idea of rewriting (5.2) in a form where the matrix is easier
to invert. Easiest to invert are diagonal matrices, followed by orthogonal matrices (where
A−1 = A> ). However, the most common method for solving Ax = b transforms the system to
triangular form.

5.1 Triangular systems

If the matrix A is triangular, then Ax = b is straightforward to solve.


A matrix L is called lower triangular if all entries above the diagonal are zero:

L = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ l_{n1} & \cdots & \cdots & l_{nn} \end{pmatrix}.    (5.3)

The determinant is just
det(L) = l_{11} l_{22} \cdots l_{nn},    (5.4)
so the matrix will be non-singular iff all of the diagonal elements are non-zero.

Example → Solve Lx = b for n = 4.


The system is
\begin{pmatrix} l_{11} & 0 & 0 & 0 \\ l_{21} & l_{22} & 0 & 0 \\ l_{31} & l_{32} & l_{33} & 0 \\ l_{41} & l_{42} & l_{43} & l_{44} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} =
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix}
\iff
\begin{aligned}
l_{11}x_1 &= b_1,\\
l_{21}x_1 + l_{22}x_2 &= b_2,\\
l_{31}x_1 + l_{32}x_2 + l_{33}x_3 &= b_3,\\
l_{41}x_1 + l_{42}x_2 + l_{43}x_3 + l_{44}x_4 &= b_4.
\end{aligned}
We can just solve step-by-step:
x_1 = \frac{b_1}{l_{11}}, \quad
x_2 = \frac{b_2 - l_{21}x_1}{l_{22}}, \quad
x_3 = \frac{b_3 - l_{31}x_1 - l_{32}x_2}{l_{33}}, \quad
x_4 = \frac{b_4 - l_{41}x_1 - l_{42}x_2 - l_{43}x_3}{l_{44}}.
This is fine since we know that l_{11}, l_{22}, l_{33}, l_{44} are all non-zero when a solution exists.
In general, any lower triangular system Lx = b can be solved by forward substitution
x_j = \frac{b_j - \sum_{k=1}^{j-1} l_{jk} x_k}{l_{jj}}, \qquad j = 1, \ldots, n.    (5.5)
Similarly, an upper triangular matrix U has the form
U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & & \vdots \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & u_{nn} \end{pmatrix},    (5.6)
and an upper triangular system Ux = b may be solved by backward substitution
x_j = \frac{b_j - \sum_{k=j+1}^{n} u_{jk} x_k}{u_{jj}}, \qquad j = n, \ldots, 1.    (5.7)
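Both substitution formulae translate directly into code. The sketch below is my own NumPy illustration (not part of the original notes); the function names and the small test system are assumptions made for the example.

import numpy as np

def forward_substitution(L, b):
    """Solve Lx = b for lower triangular L, following (5.5)."""
    n = len(b)
    x = np.zeros(n)
    for j in range(n):
        x[j] = (b[j] - L[j, :j] @ x[:j]) / L[j, j]
    return x

def backward_substitution(U, b):
    """Solve Ux = b for upper triangular U, following (5.7)."""
    n = len(b)
    x = np.zeros(n)
    for j in range(n - 1, -1, -1):
        x[j] = (b[j] - U[j, j+1:] @ x[j+1:]) / U[j, j]
    return x

# quick check on a small lower triangular system
L = np.array([[2.0, 0.0], [1.0, 3.0]])
b = np.array([4.0, 5.0])
print(forward_substitution(L, b))   # [2. 1.]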

To estimate the computational cost of forward substitution, we can count the number of
floating-point operations (+, −, ×, ÷).

Example → Number of operations required for forward substitution.


Consider each x_j. We have
j = 1 → 1 division
j = 2 → 1 division + [1 subtraction + 1 multiplication]
j = 3 → 1 division + 2 × [1 subtraction + 1 multiplication]
...
j = n → 1 division + (n − 1) × [1 subtraction + 1 multiplication]
So the total number of operations required is
\sum_{j=1}^{n} \left[ 1 + 2(j-1) \right] = 2\sum_{j=1}^{n} j - \sum_{j=1}^{n} 1 = n(n+1) - n = n^2.

So solving a triangular system by forward (or backward) substitution takes n^2 operations.
I We say that the computational complexity of the algorithm is O(n^2).
I In practice, this is only a rough estimate of the computational cost, because reading from
and writing to the computer’s memory also take time. This can be estimated given a “memory
model”, but this depends on the particular computer.

5.2 Gaussian elimination

If our matrix A is not triangular, we can try to transform it to triangular form. Gaussian elimination uses elementary row operations to transform the system to upper triangular form Ux = y.
Elementary row operations include swapping rows and adding multiples of one row to another.
They won’t change the solution x, but will change the matrix A and the right-hand side b.

Example → Transform to upper triangular form the system


x_1 + 2x_2 + x_3 = 0,
x_1 − 2x_2 + 2x_3 = 4,        A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -2 & 2 \\ 2 & 12 & -2 \end{pmatrix}, \quad b = \begin{pmatrix} 0 \\ 4 \\ 4 \end{pmatrix}.
2x_1 + 12x_2 − 2x_3 = 4.

Stage 1. Subtract 1 times equation 1 from equation 2, and 2 times equation 1 from equation 3, so as to eliminate x_1 from equations 2 and 3:
x_1 + 2x_2 + x_3 = 0,
     −4x_2 + x_3 = 4,         A^{(2)} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & 1 \\ 0 & 8 & -4 \end{pmatrix}, \quad b^{(2)} = \begin{pmatrix} 0 \\ 4 \\ 4 \end{pmatrix}, \quad m_{21} = 1, \; m_{31} = 2.
     8x_2 − 4x_3 = 4.
Stage 2. Subtract −2 times equation 2 from equation 3, to eliminate x_2 from equation 3:
x_1 + 2x_2 + x_3 = 0,
     −4x_2 + x_3 = 4,         A^{(3)} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & 1 \\ 0 & 0 & -2 \end{pmatrix}, \quad b^{(3)} = \begin{pmatrix} 0 \\ 4 \\ 12 \end{pmatrix}, \quad m_{32} = -2.
          −2x_3 = 12.
Now the system is upper triangular, and back substitution gives x_3 = −6, x_2 = −5/2, x_1 = 11.
We can write the general algorithm as follows.

Algorithm 5.1 (Gaussian elimination). Let A^{(1)} = A and b^{(1)} = b. Then for each k from 1 to n − 1, compute a new matrix A^{(k+1)} and right-hand side b^{(k+1)} by the following procedure:
1. Define the row multipliers
m_{ik} = \frac{a_{ik}^{(k)}}{a_{kk}^{(k)}}, \qquad i = k+1, \ldots, n.
2. Use these to remove the unknown x_k from equations k + 1 to n, leaving
a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik} a_{kj}^{(k)}, \qquad b_i^{(k+1)} = b_i^{(k)} - m_{ik} b_k^{(k)}, \qquad i, j = k+1, \ldots, n.

The final matrix A^{(n)} = U will then be upper triangular.
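Algorithm 5.1 is short to implement. The following NumPy sketch is my own illustration (assuming non-zero pivots); it reproduces the worked example above.

import numpy as np

def gaussian_elimination(A, b):
    """Reduce Ax = b to Ux = y (Algorithm 5.1, no pivoting).
    Assumes every pivot a_kk^(k) is non-zero."""
    U = A.astype(float).copy()
    y = b.astype(float).copy()
    n = len(y)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i, k] / U[k, k]        # row multiplier m_ik
            U[i, k:] -= m * U[k, k:]     # a_ij^(k+1) = a_ij^(k) - m_ik a_kj^(k)
            y[i] -= m * y[k]             # b_i^(k+1) = b_i^(k) - m_ik b_k^(k)
    return U, y

A = np.array([[1, 2, 1], [1, -2, 2], [2, 12, -2]])
b = np.array([0, 4, 4])
U, y = gaussian_elimination(A, b)
print(U)   # rows (1, 2, 1), (0, -4, 1), (0, 0, -2)
print(y)   # [ 0.  4. 12.]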



This procedure will work providing a_{kk}^{(k)} ≠ 0 for every k. (We will worry about this later.)
What about the computational cost of Gaussian elimination?

Example → Number of operations required to find U .


Computing A^{(k+1)} requires:
• n − (k + 1) + 1 = n − k divisions to compute the m_{ik}.
• (n − k)^2 subtractions and the same number of multiplications to compute the a_{ij}^{(k+1)}.
So in total A^{(k+1)} requires 2(n − k)^2 + (n − k) operations. Overall, we need to compute A^{(k+1)} for k = 1, \ldots, n − 1, so the total number of operations is
N = \sum_{k=1}^{n-1} \left[ 2(n-k)^2 + (n-k) \right] = \sum_{k=1}^{n-1} \left[ (2n^2+n) - (4n+1)k + 2k^2 \right] = n(2n+1)\sum_{k=1}^{n-1} 1 - (4n+1)\sum_{k=1}^{n-1} k + 2\sum_{k=1}^{n-1} k^2.

Recalling that
\sum_{k=1}^{n} k = \tfrac12 n(n+1), \qquad \sum_{k=1}^{n} k^2 = \tfrac16 n(n+1)(2n+1),
we find
N = n(2n+1)(n-1) - \tfrac12 (4n+1)(n-1)n + \tfrac13 (n-1)n(2n-1) = \tfrac23 n^3 - \tfrac12 n^2 - \tfrac16 n.

So the number of operations required to find U is O(n^3).


I It is known that O(n^3) is not optimal, and the best theoretical algorithm known for inverting a matrix takes O(n^{2.3728639}) operations (although this is not practically useful). But it remains an open conjecture that there exists an O(n^{2+ϵ}) algorithm, for ϵ arbitrarily small.

5.3 LU decomposition

In Gaussian elimination, both the final matrix U and the sequence of row operations are determined solely by A, and do not depend on b. We will see that the sequence of row operations that transforms A to U is equivalent to left-multiplying by a matrix F, so that

FA = U, \qquad Ux = Fb.    (5.8)

To see this, note that step k of Gaussian elimination can be written in the form
A^{(k+1)} = F^{(k)} A^{(k)}, \qquad b^{(k+1)} = F^{(k)} b^{(k)}, \qquad \text{where} \quad
F^{(k)} := \begin{pmatrix}
1 &        &            &   &        &   \\
  & \ddots &            &   &        &   \\
  &        & 1          &   &        &   \\
  &        & -m_{k+1,k} & 1 &        &   \\
  &        & \vdots     &   & \ddots &   \\
  &        & -m_{n,k}   &   &        & 1
\end{pmatrix}.    (5.9)
Multiplying by F^{(k)} has the effect of subtracting m_{ik} times row k from row i, for i = k+1, \ldots, n.


I A matrix with this structure (the identity except for a single column below the diagonal) is
called a Frobenius matrix.

Example → You can check in the earlier example that

F^{(1)} A = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 \\ 1 & -2 & 2 \\ 2 & 12 & -2 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 1 \\ 1-1(1) & -2-1(2) & 2-1(1) \\ 2-2(1) & 12-2(2) & -2-2(1) \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & 1 \\ 0 & 8 & -4 \end{pmatrix} = A^{(2)},
and
F^{(2)} A^{(2)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & 1 \\ 0 & 8 & -4 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & 1 \\ 0 & 8+2(-4) & -4+2(1) \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & 1 \\ 0 & 0 & -2 \end{pmatrix} = A^{(3)} = U.

It follows that
U = A^{(n)} = F^{(n-1)} F^{(n-2)} \cdots F^{(1)} A.    (5.10)
Now the F^{(k)} are invertible, and the inverse is just given by adding rows instead of subtracting:
(F^{(k)})^{-1} = \begin{pmatrix}
1 &        &           &   &        &   \\
  & \ddots &           &   &        &   \\
  &        & 1         &   &        &   \\
  &        & m_{k+1,k} & 1 &        &   \\
  &        & \vdots    &   & \ddots &   \\
  &        & m_{n,k}   &   &        & 1
\end{pmatrix}.    (5.11)
So we could write
A = (F^{(1)})^{-1} (F^{(2)})^{-1} \cdots (F^{(n-1)})^{-1} U.    (5.12)
Since the successive operations don't "interfere" with each other, we can write
(F^{(1)})^{-1} (F^{(2)})^{-1} \cdots (F^{(n-1)})^{-1} = \begin{pmatrix}
1       &         &         &        &           &   \\
m_{2,1} & 1       &         &        &           &   \\
m_{3,1} & m_{3,2} & 1       &        &           &   \\
m_{4,1} & m_{4,2} & m_{4,3} & \ddots &           &   \\
\vdots  & \vdots  & \vdots  & \ddots & 1         &   \\
m_{n,1} & m_{n,2} & m_{n,3} & \cdots & m_{n,n-1} & 1
\end{pmatrix} =: L.    (5.13)

Thus we have established the following result.

Theorem 5.2 (LU decomposition). Let U be the upper triangular matrix from Gaussian elimi-
nation of A (without pivoting), and let L be the unit lower triangular matrix (5.13). Then

A = LU .



I Unit lower triangular means that all of the diagonal entries are 1.
I Theorem 5.2 says that Gaussian elimination is equivalent to factorising A as the product of a lower triangular and an upper triangular matrix. This is not at all obvious from Algorithm 5.1! The decomposition is unique only up to a rescaling A = (LD)(D^{-1}U) for some invertible diagonal matrix D.
The system Ax = b becomes LUx = b, which we can readily solve by setting Ux = y. We first solve Ly = b for y, then Ux = y for x. Both are triangular systems.
Moreover, if we want to solve several systems Ax = b with different b but the same matrix, we just need to compute L and U once. This saves time because, although the initial LU factorisation takes O(n^3) operations, each subsequent pair of triangular solves takes only O(n^2).
I This matrix factorisation viewpoint dates only from the 1940s, and LU decomposition was
introduced by Alan Turing in a 1948 paper (Q. J. Mechanics Appl. Mat. 1, 287). Other common
factorisations used in numerical linear algebra are QR (which we will see later) and Cholesky.

Example → Solve our earlier example by LU decomposition.

\begin{pmatrix} 1 & 2 & 1 \\ 1 & -2 & 2 \\ 2 & 12 & -2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ 4 \end{pmatrix}.
We apply Gaussian elimination as before, but ignore b (for now), leading to
U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & 1 \\ 0 & 0 & -2 \end{pmatrix}.
As we apply the elimination, we record the multipliers so as to construct the matrix
L = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & -2 & 1 \end{pmatrix}.
Thus we have the factorisation/decomposition
\begin{pmatrix} 1 & 2 & 1 \\ 1 & -2 & 2 \\ 2 & 12 & -2 \end{pmatrix} =
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & -2 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & 1 \\ 0 & 0 & -2 \end{pmatrix}.
With the matrices L and U, we can readily solve for any right-hand side b. We illustrate for our particular b. Firstly, solve Ly = b:
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & -2 & 1 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ 4 \end{pmatrix}
\implies y_1 = 0, \quad y_2 = 4 - y_1 = 4, \quad y_3 = 4 - 2y_1 + 2y_2 = 12.
Notice that y is the right-hand side b^{(3)} constructed earlier. Then, solve Ux = y:
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & 1 \\ 0 & 0 & -2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ 12 \end{pmatrix}
\implies x_3 = -6, \quad x_2 = -\tfrac14(4 - x_3) = -\tfrac52, \quad x_1 = -2x_2 - x_3 = 11.
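The same bookkeeping is easy to code. This is a minimal sketch of my own (assuming no pivoting is needed); it reuses the forward_substitution and backward_substitution routines sketched earlier.

import numpy as np

def lu_decompose(A):
    """Factorise A = LU with L unit lower triangular (Theorem 5.2), no pivoting."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # record the multiplier m_ik
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

A = np.array([[1, 2, 1], [1, -2, 2], [2, 12, -2]])
L, U = lu_decompose(A)
print(np.allclose(L @ U, A))                 # True

b = np.array([0.0, 4.0, 4.0])
y = forward_substitution(L, b)               # Ly = b
x = backward_substitution(U, y)              # Ux = y
print(x)                                     # [11.  -2.5 -6. ]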



5.4 Pivoting

Gaussian elimination and LU factorisation will both fail if we ever hit a zero on the diagonal.
But this does not mean that the matrix A is singular.

Example → The system


\begin{pmatrix} 0 & 3 & 0 \\ 2 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}
obviously has solution x_1 = x_2 = x_3 = 1 (the matrix has determinant −6). But Gaussian elimination will fail because a_{11}^{(1)} = 0, so we cannot calculate m_{21} and m_{31}. However, we could avoid the problem by changing the order of the equations to get the equivalent system
\begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}.
Now there is no problem with Gaussian elimination (actually the matrix is already upper triangular). Alternatively, we could have rescued Gaussian elimination by swapping columns:
\begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_2 \\ x_1 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}.

Swapping rows or columns is called pivoting. It is needed if the “pivot” element is zero, as in
the above example. But it is also used to reduce rounding error.

Example → Consider the system


\begin{pmatrix} 10^{-4} & 1 \\ -1 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

1. Using Gaussian elimination with exact arithmetic gives
m_{21} = -10^4, \qquad a_{22}^{(2)} = 2 + 10^4, \qquad b_2^{(2)} = 1 + 10^4.
So backward substitution gives the solution
x_2 = \frac{1 + 10^4}{2 + 10^4} = 0.9999\ldots, \qquad
x_1 = \frac{1 - x_2}{a_{11}} = 10^4 \left( 1 - \frac{1 + 10^4}{2 + 10^4} \right) = \frac{10^4}{2 + 10^4} = 0.9998\ldots
2. Now do the calculation in 3-digit arithmetic. We have
m_{21} = fl(-10^4) = -10^4, \qquad a_{22}^{(2)} = fl(2 + 10^4) = 10^4, \qquad b_2^{(2)} = fl(1 + 10^4) = 10^4.
Now backward substitution gives
x_2 = fl\!\left( \frac{10^4}{10^4} \right) = 1, \qquad x_1 = fl\!\left( 10^4 (1 - 1) \right) = 0.
The large value of m_{21} has caused a rounding error which has later led to a loss of significance during the evaluation of x_1.


3. We do the calculation correctly in 3-digit arithmetic if we first swap the equations,
\begin{pmatrix} -1 & 2 \\ 10^{-4} & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
Now,
m_{21} = fl(-10^{-4}) = -10^{-4}, \qquad a_{22}^{(2)} = fl(1 + 10^{-4}) = 1, \qquad b_2^{(2)} = fl(1 + 10^{-4}) = 1,
and
x_2 = fl\!\left( \frac{1}{1} \right) = 1, \qquad x_1 = fl\!\left( -[1 - 2(1)] \right) = 1.
Now both x_1 and x_2 are correct to 3 significant figures.
So pivoting is used to avoid large multipliers m_{ik}. A common strategy is partial pivoting, where we interchange rows at the kth stage of Gaussian elimination to bring the element a_{ik}^{(k)} of largest magnitude (for k ≤ i ≤ n) to the diagonal position a_{kk}^{(k)}. This dramatically improves the stability of Gaussian elimination.
I Gaussian elimination without pivoting is unstable: rounding errors can accumulate.
I The ultimate accuracy is obtained by full pivoting, where both the rows and columns are swapped to bring the largest element possible to the diagonal.
I If it is not possible to rearrange the columns or rows to remove a zero from position a_{kk}^{(k)}, then A is singular.
If pivoting is applied, then the effect of Gaussian elimination is to produce a modified LU factorisation of the form
PA = LU,    (5.14)
where P is a permutation matrix. This is a matrix in which every row and every column has exactly one non-zero element, which is 1.
I The permutation matrices form a (sub)group, so a product of permutation matrices is again a permutation matrix.
In this case, we solve Ly = Pb and then Ux = y.

Example → Consider again the system


\begin{pmatrix} 0 & 3 & 0 \\ 2 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}.
To swap rows 1 and 2, we can left-multiply A by the permutation matrix
P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\implies PA = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
Pb = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}.
Now we find the LU factorisation of PA, which is easy in this case:
PA = LU \quad \text{where} \quad L = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
U = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
Since LUx = Pb, we can then solve for x in two steps:
Ly = Pb \implies y = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}, \qquad
Ux = y \implies \begin{pmatrix} 2x_1 \\ 3x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}
\implies x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
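In practice, library routines compute the pivoted factorisation for us. A small check of my own using SciPy (assuming SciPy is available; note that scipy.linalg.lu uses the convention A = PLU, so its P is the transpose of the P above):

import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve

A = np.array([[0.0, 3.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([3.0, 2.0, 1.0])

P, L, U = lu(A)                     # A = P @ L @ U, i.e. P.T @ A = L @ U
print(np.allclose(P.T @ A, L @ U))  # True

lu_piv = lu_factor(A)               # pivoted LU stored compactly for reuse
print(lu_solve(lu_piv, b))          # [1. 1. 1.]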

5.5 Vector norms

To measure the error when the solution is a vector, as opposed to a scalar, we usually summarise the error in a single number called a norm.
A vector norm on R^n is a real-valued function ‖·‖ that satisfies

‖x + y‖ ≤ ‖x‖ + ‖y‖ for every x, y ∈ R^n,    (N1)
‖αx‖ = |α| ‖x‖ for every x ∈ R^n and every α ∈ R,    (N2)
‖x‖ ≥ 0 for every x ∈ R^n, and ‖x‖ = 0 ⟹ x = 0.    (N3)

Property (N1) is called the triangle inequality.

Example → There are three common examples:

1. The ℓ_2-norm
‖x‖_2 := \sqrt{\sum_{k=1}^{n} x_k^2} = \sqrt{x^T x}.
This is just the usual Euclidean length of x.
2. The ℓ_1-norm
‖x‖_1 := \sum_{k=1}^{n} |x_k|.
This is sometimes known as the taxicab or Manhattan norm, because it corresponds to the distance that a taxi has to drive on a rectangular grid of streets to get to x ∈ R^2.
3. The ℓ_∞-norm
‖x‖_∞ := \max_{k=1,\ldots,n} |x_k|.
This is sometimes known as the maximum norm.

We leave the proofs that these satisfy (N1)-(N3) to the problem sheet.
I The norms in the example above are all special cases of the ℓ_p-norm,
‖x‖_p = \left( \sum_{k=1}^{n} |x_k|^p \right)^{1/p},
which is a norm for any real number p ≥ 1. Increasing p means that more and more emphasis is given to the maximum element |x_k|.



Example → Consider the vectors a = (1, −2, 3)^T, b = (2, 0, −1)^T, and c = (0, 1, 4)^T.
The ℓ_1-, ℓ_2-, and ℓ_∞-norms are
‖a‖_1 = 1 + 2 + 3 = 6,  ‖b‖_1 = 2 + 0 + 1 = 3,  ‖c‖_1 = 0 + 1 + 4 = 5,
‖a‖_2 = \sqrt{1 + 4 + 9} ≈ 3.74,  ‖b‖_2 = \sqrt{4 + 0 + 1} ≈ 2.24,  ‖c‖_2 = \sqrt{0 + 1 + 16} ≈ 4.12,
‖a‖_∞ = max{1, 2, 3} = 3,  ‖b‖_∞ = max{2, 0, 1} = 2,  ‖c‖_∞ = max{0, 1, 4} = 4.
Notice that, for a single vector x, the norms satisfy the ordering ‖x‖_1 ≥ ‖x‖_2 ≥ ‖x‖_∞, but that vectors may be ordered differently by different norms.
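These norms are available directly in NumPy; a quick check of the values above (my own snippet):

import numpy as np

a = np.array([1, -2, 3])
print(np.linalg.norm(a, 1))        # 6.0
print(np.linalg.norm(a, 2))        # 3.7416...
print(np.linalg.norm(a, np.inf))   # 3.0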

Example → Sketch the "unit circles" {x ∈ R^2 : ‖x‖_p = 1} for p = 1, 2, ∞.

5.6 Matrix norms

We also use norms to measure the "size" of matrices. Since the set R^{n×n} of n × n matrices with real entries is a vector space, we could just use a vector norm on this space. But usually we add an additional axiom.
A matrix norm is a real-valued function ‖·‖ on R^{n×n} that satisfies:

‖A + B‖ ≤ ‖A‖ + ‖B‖ for every A, B ∈ R^{n×n},    (M1)
‖αA‖ = |α| ‖A‖ for every A ∈ R^{n×n} and every α ∈ R,    (M2)
‖A‖ ≥ 0 for every A ∈ R^{n×n}, and ‖A‖ = 0 ⟹ A = 0,    (M3)
‖AB‖ ≤ ‖A‖ ‖B‖ for every A, B ∈ R^{n×n}.    (M4)

The new axiom (M4) is called consistency.

I We usually want this additional axiom because matrices are more than just vectors. Some
books call this a submultiplicative norm and define a “matrix norm” to satisfy just (M1), (M2),
(M3), perhaps because (M4) only works for square matrices.

Example → If we treat a matrix as a big vector with n^2 components, then the ℓ_2-norm is called the Frobenius norm of the matrix:
‖A‖_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^2}.
This norm is rarely used in numerical analysis because it is not induced by any vector norm (as we are about to define).


The most important matrix norms are so-called induced or operator norms. Remember that A is a linear map on R^n, meaning that it maps every vector to another vector. So we can measure the size of A by how much it can stretch vectors with respect to a given vector norm. Specifically, if ‖·‖_p is a vector norm, then the induced norm is defined as
‖A‖_p := \sup_{x \neq 0} \frac{‖Ax‖_p}{‖x‖_p} = \max_{‖x‖_p = 1} ‖Ax‖_p.    (5.15)
To see that the two definitions here are equivalent, use the fact that ‖·‖_p is a vector norm. So by (N2) we have
\sup_{x \neq 0} \frac{‖Ax‖_p}{‖x‖_p} = \sup_{x \neq 0} \left\| A \frac{x}{‖x‖_p} \right\|_p = \sup_{‖y‖_p = 1} ‖Ay‖_p = \max_{‖y‖_p = 1} ‖Ay‖_p.    (5.16)
I Usually we use the same notation for the induced matrix norm as for the original vector norm. The meaning should be clear from the context.

Example → Let
A = \begin{pmatrix} 0 & 1 \\ 3 & 0 \end{pmatrix}.
In the ℓ_2-norm, a unit vector in R^2 has the form x = (\cos θ, \sin θ)^T, so the image of the unit circle is
Ax = \begin{pmatrix} \sin θ \\ 3\cos θ \end{pmatrix},
an ellipse with semi-axes 1 and 3. The induced matrix norm is the maximum stretching of this unit circle, which is
‖A‖_2 = \max_{‖x‖_2 = 1} ‖Ax‖_2 = \max_θ \left( \sin^2 θ + 9\cos^2 θ \right)^{1/2} = \max_θ \left( 1 + 8\cos^2 θ \right)^{1/2} = 3.

Theorem 5.3. The induced norm corresponding to any vector norm is a matrix norm, and the two norms satisfy ‖Ax‖ ≤ ‖A‖ ‖x‖ for any matrix A ∈ R^{n×n} and any vector x ∈ R^n.


Proof. Properties (M1)-(M3) follow from the fact that the vector norm satisfies (N1)-(N3). To show (M4), note that, by the definition (5.15), we have for any vector y ∈ R^n that
‖A‖ ≥ \frac{‖Ay‖}{‖y‖} \implies ‖Ay‖ ≤ ‖A‖ ‖y‖.    (5.17)
Taking y = Bx for some x with ‖x‖ = 1, we get
‖ABx‖ ≤ ‖A‖ ‖Bx‖ ≤ ‖A‖ ‖B‖.    (5.18)
This holds in particular for the vector x that maximises ‖ABx‖, so
‖AB‖ = \max_{‖x‖ = 1} ‖ABx‖ ≤ ‖A‖ ‖B‖.    (5.19)  □


It is cumbersome to compute the induced norms from their definition, but fortunately there
are some very useful alternative formulae.

Theorem 5.4. The matrix norms induced by the ℓ_1-norm and ℓ_∞-norm satisfy
‖A‖_1 = \max_{j=1,\ldots,n} \sum_{i=1}^{n} |a_{ij}| \quad (maximum column sum),
‖A‖_∞ = \max_{i=1,\ldots,n} \sum_{j=1}^{n} |a_{ij}| \quad (maximum row sum).

Proof. We will prove the result for the ℓ_1-norm and leave the ℓ_∞-norm to the problem sheet. Starting from the definition of the ℓ_1 vector norm, we have
‖Ax‖_1 = \sum_{i=1}^{n} \left| \sum_{j=1}^{n} a_{ij} x_j \right| ≤ \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}| |x_j| = \sum_{j=1}^{n} |x_j| \sum_{i=1}^{n} |a_{ij}|.    (5.20)
If we let
c = \max_{j=1,\ldots,n} \sum_{i=1}^{n} |a_{ij}|,    (5.21)
then
‖Ax‖_1 ≤ c ‖x‖_1 \implies ‖A‖_1 ≤ c.    (5.22)
Now let m be the column where the maximum sum is attained. If we choose y to be the vector with components y_k = δ_{km}, then we have ‖Ay‖_1 = c. Since ‖y‖_1 = 1, we must have that
\max_{‖x‖_1 = 1} ‖Ax‖_1 ≥ ‖Ay‖_1 = c \implies ‖A‖_1 ≥ c.    (5.23)
The only way to satisfy both (5.22) and (5.23) is if ‖A‖_1 = c.  □

Example → For the matrix


A = \begin{pmatrix} -7 & 3 & -1 \\ 2 & 4 & 5 \\ -4 & 6 & 0 \end{pmatrix}
we have
‖A‖_1 = max{13, 13, 6} = 13, \qquad ‖A‖_∞ = max{11, 11, 10} = 11.

What about the matrix norm induced by the `2 -norm? This turns out to be related to the
eigenvalues of A. Recall that λ ∈ C is an eigenvalue of A with associated eigenvector u if

Au = λu. (5.24)

We define the spectral radius ρ(A) of A to be the maximum of |λ| over all eigenvalues λ of A.
Theorem 5.5. The matrix norm induced by the ℓ_2-norm satisfies
‖A‖_2 = \sqrt{ρ(A^T A)}.
I As a result this is sometimes known as the spectral norm.

Example → For our matrix
A = \begin{pmatrix} 0 & 1 \\ 3 & 0 \end{pmatrix},
we have
A^T A = \begin{pmatrix} 0 & 3 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 3 & 0 \end{pmatrix} = \begin{pmatrix} 9 & 0 \\ 0 & 1 \end{pmatrix}.
We see that the eigenvalues of A^T A are λ = 1, 9, so ‖A‖_2 = \sqrt{9} = 3 (as we calculated earlier).
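Theorems 5.4 and 5.5 are easy to check numerically. A short NumPy snippet of my own for the two matrices used above:

import numpy as np

A = np.array([[-7.0, 3.0, -1.0],
              [ 2.0, 4.0,  5.0],
              [-4.0, 6.0,  0.0]])
# maximum column sum and maximum row sum agree with the induced norms
print(np.abs(A).sum(axis=0).max(), np.linalg.norm(A, 1))       # 13.0 13.0
print(np.abs(A).sum(axis=1).max(), np.linalg.norm(A, np.inf))  # 11.0 11.0

# spectral norm via the spectral radius of A^T A
B = np.array([[0.0, 1.0], [3.0, 0.0]])
rho = np.linalg.eigvalsh(B.T @ B).max()
print(np.sqrt(rho), np.linalg.norm(B, 2))                      # 3.0 3.0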

Proof. We want to show that
\max_{‖x‖_2 = 1} ‖Ax‖_2 = \sqrt{\max\{ |λ| : λ \text{ an eigenvalue of } A^T A \}}.    (5.25)
For A real, A^T A is symmetric, so has real eigenvalues λ_1 ≤ λ_2 ≤ \ldots ≤ λ_n with corresponding orthonormal eigenvectors u_1, \ldots, u_n in R^n. (Orthonormal means that u_j^T u_k = δ_{jk}.) Note also that all of the eigenvalues are non-negative, since
A^T A u_1 = λ_1 u_1 \implies λ_1 = \frac{u_1^T A^T A u_1}{u_1^T u_1} = \frac{‖Au_1‖_2^2}{‖u_1‖_2^2} ≥ 0.    (5.26)
So we want to show that ‖A‖_2^2 = λ_n. The eigenvectors form a basis, so every vector x ∈ R^n can be expressed as a linear combination x = \sum_{k=1}^{n} α_k u_k. Therefore
‖Ax‖_2^2 = x^T A^T A x = x^T \sum_{k=1}^{n} α_k λ_k u_k = \sum_{j=1}^{n} α_j u_j^T \sum_{k=1}^{n} α_k λ_k u_k = \sum_{k=1}^{n} α_k^2 λ_k,    (5.27)
where the last step uses orthonormality of the u_k. It follows that
‖Ax‖_2^2 ≤ λ_n \sum_{k=1}^{n} α_k^2.    (5.28)
But if ‖x‖_2 = 1, then ‖x‖_2^2 = \sum_{k=1}^{n} α_k^2 = 1, so ‖Ax‖_2^2 ≤ λ_n. To show that the maximum of ‖Ax‖_2^2 is equal to λ_n, we can choose x to be the corresponding eigenvector x = u_n. In that case, α_1 = \ldots = α_{n-1} = 0 and α_n = 1, so ‖Ax‖_2^2 = λ_n.  □


5.7 Conditioning

Some linear systems are inherently more difficult to solve than others, because the solution is
sensitive to small perturbations in the input.

Example → Consider the linear system


\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\implies \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
If we add a small rounding error 0 < δ ≪ 1 to the data b_1 then
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1+δ \\ 1 \end{pmatrix}
\implies \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} δ \\ 1 \end{pmatrix}.
The solution is within rounding error of the true solution, so the system is called well conditioned.
Example → Now let ϵ ≪ 1 be a fixed positive number, and consider the linear system
\begin{pmatrix} ϵ & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1+δ \\ 1 \end{pmatrix}
\implies \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} δ/ϵ \\ 1 \end{pmatrix}.
The true solution is still (0, 1)^T, but if the error δ is as big as the matrix entry ϵ, then the solution for x_1 will be completely wrong. This system is much more sensitive to errors in b, so is called ill-conditioned.
Graphically, the second system is more sensitive to δ than the first because the two lines it represents are closer to being parallel.

To measure the conditioning of a linear system, consider
\frac{|\text{relative error in } x|}{|\text{relative error in } b|}
= \frac{‖δx‖/‖x‖}{‖δb‖/‖b‖}
= \frac{‖δx‖}{‖x‖} \frac{‖b‖}{‖δb‖}
= \frac{‖A^{-1}δb‖}{‖x‖} \frac{‖b‖}{‖δb‖}
≤ \frac{‖A^{-1}‖ ‖δb‖}{‖x‖} \frac{‖b‖}{‖δb‖}
= \frac{‖A^{-1}‖ ‖b‖}{‖x‖}
= \frac{‖A^{-1}‖ ‖Ax‖}{‖x‖}
≤ ‖A^{-1}‖ ‖A‖.    (5.29)
We define the condition number of a matrix A in some induced norm ‖·‖_* to be
κ_*(A) = ‖A^{-1}‖_* ‖A‖_*.    (5.30)
If κ_*(A) is large, then the solution will be sensitive to errors in b, at least for some b. A large condition number means that the matrix is close to being non-invertible (i.e. two rows are close to being linearly dependent).


I This is a “worst case” amplification of the error by a given matrix. The actual result will
depend on δb (which we usually don’t know if it arises from previous rounding error).
I Note that det(A) will tell you whether a matrix is singular or not, but not whether it is
ill-conditioned. Since det(αA) = α n det(A), the determinant can be made arbitrarily large or
small by scaling (which does not change the condition number). For instance, the matrix
\begin{pmatrix} 10^{-50} & 0 \\ 0 & 10^{-50} \end{pmatrix}
has tiny determinant but is well-conditioned.

Example → Return to our earlier examples and consider the condition numbers in the ℓ_1-norm. We have (assuming 0 < ϵ ≪ 1) that
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \implies A^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \implies ‖A‖_1 = ‖A^{-1}‖_1 = 2 \implies κ_1(A) = 4,
B = \begin{pmatrix} ϵ & 1 \\ 0 & 1 \end{pmatrix} \implies B^{-1} = \frac{1}{ϵ} \begin{pmatrix} 1 & -1 \\ 0 & ϵ \end{pmatrix} \implies ‖B‖_1 = 2, \; ‖B^{-1}‖_1 = \frac{1+ϵ}{ϵ} \implies κ_1(B) = \frac{2(1+ϵ)}{ϵ}.
For matrix B, κ_1(B) → ∞ as ϵ → 0, showing that the matrix B is ill-conditioned.

Example → The Hilbert matrix H_n is the n × n symmetric matrix with entries
(H_n)_{ij} = \frac{1}{i + j - 1}.
These matrices are notoriously ill-conditioned. For example, κ_2(H_5) ≈ 4.8 × 10^5, and κ_2(H_{20}) ≈ 2.5 × 10^{28}. Solving an associated linear system in floating-point arithmetic would be hopeless.
I A practical limitation of the condition number is that you have to know A^{-1} before you can calculate it. We can always estimate ‖A^{-1}‖ by taking some arbitrary vector x, forming b = Ax, and using
‖A^{-1}‖ ≥ \frac{‖x‖}{‖b‖}.
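Condition numbers are cheap to explore numerically. A small snippet of my own (assuming NumPy and SciPy are available) for the examples above:

import numpy as np
from scipy.linalg import hilbert

eps = 1e-8
B = np.array([[eps, 1.0], [0.0, 1.0]])
print(np.linalg.cond(B, 1))            # about 2(1+eps)/eps = 2e8

# Hilbert matrices: the condition number grows extremely fast with n
for n in (5, 10, 12):
    print(n, np.linalg.cond(hilbert(n), 2))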
5.8 Iterative methods

For large systems, the O(n^3) cost of Gaussian elimination is prohibitive. Fortunately many such systems that arise in practice are sparse, meaning that most of the entries of the matrix A are zero. In this case, we can often use iterative algorithms to do better than O(n^3).
In this course, we will only study algorithms for symmetric positive definite matrices. A matrix A is called symmetric positive definite (or SPD) if A = A^T and x^T Ax > 0 for every vector x ≠ 0.
I Recall that a symmetric matrix has real eigenvalues. It is positive definite iff all of its eigenvalues are positive.

Example → Show that the following matrix is SPD:


A = \begin{pmatrix} 3 & 1 & -1 \\ 1 & 4 & 2 \\ -1 & 2 & 5 \end{pmatrix}.
With x = (x_1, x_2, x_3)^T, we have
x^T Ax = 3x_1^2 + 4x_2^2 + 5x_3^2 + 2x_1x_2 + 4x_2x_3 - 2x_1x_3
       = x_1^2 + x_2^2 + 2x_3^2 + (x_1 + x_2)^2 + (x_1 - x_3)^2 + 2(x_2 + x_3)^2.
This is positive for any non-zero vector x ∈ R^3, so A is SPD (its eigenvalues are approximately 1.29, 4.14 and 6.57).
If A is SPD, then solving Ax = b is equivalent to minimizing the quadratic functional
f : R^n → R, \qquad f(x) = \tfrac12 x^T Ax - b^T x.    (5.31)
When A is SPD, this functional behaves like a U-shaped parabola, and has a unique finite global minimizer x^* such that f(x^*) < f(x) for all x ∈ R^n, x ≠ x^*. To find x^*, we need to set ∇f = 0. We have
f(x) = \tfrac12 \sum_{i=1}^{n} x_i \left( \sum_{j=1}^{n} a_{ij} x_j \right) - \sum_{j=1}^{n} b_j x_j,    (5.32)
so
\frac{∂f}{∂x_k} = \tfrac12 \left( \sum_{i=1}^{n} x_i a_{ik} + \sum_{j=1}^{n} a_{kj} x_j \right) - b_k = \tfrac12 \left( \sum_{i=1}^{n} a_{ki} x_i + \sum_{j=1}^{n} a_{kj} x_j \right) - b_k = \sum_{j=1}^{n} a_{kj} x_j - b_k.    (5.33)
In the penultimate step we used the symmetry of A to write a_{ik} = a_{ki}. It follows that
∇f = Ax - b,    (5.34)
so locating the minimum of f(x) is indeed equivalent to solving Ax = b.

I Minimizing functions is a vast sub-field of numerical analysis known as optimization. We
will only cover this specific case.
A popular class of methods for optimization are line search methods, where at each iteration the search is restricted to a single search direction d_k. The iteration takes the form
x_{k+1} = x_k + α_k d_k.    (5.35)
The step size α_k is chosen by minimizing f(x) along the line x = x_k + α d_k. For our functional (5.31), we have
f(x_k + α d_k) = \tfrac12 \left( d_k^T A d_k \right) α^2 + \left( \tfrac12 d_k^T A x_k + \tfrac12 x_k^T A d_k - b^T d_k \right) α + \tfrac12 x_k^T A x_k - b^T x_k.    (5.36)
Since A is symmetric, we have x_k^T A d_k = x_k^T A^T d_k = (A x_k)^T d_k = d_k^T A x_k and b^T d_k = d_k^T b, so we can simplify to
f(x_k + α d_k) = \tfrac12 \left( d_k^T A d_k \right) α^2 + d_k^T \left( A x_k - b \right) α + \tfrac12 x_k^T A x_k - b^T x_k.    (5.37)
This is a quadratic in α, and the coefficient of α^2 is positive because A is positive definite. It is therefore a U-shaped parabola and achieves its minimum when
\frac{∂f}{∂α} = d_k^T A d_k \, α + d_k^T \left( A x_k - b \right) = 0.    (5.38)


Defining the residual r_k := A x_k - b, we see that the desired choice of step size is
α_k = - \frac{d_k^T r_k}{d_k^T A d_k}.    (5.39)
Different line search methods differ in how the search direction d_k is chosen at each iteration. For example, the method of steepest descent sets
d_k = -∇f(x_k) = -r_k,    (5.40)
where we have remembered (5.34).

Example → Use the method of steepest descent to solve the system


\begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2 \\ -8 \end{pmatrix}.
Starting from x_0 = (-2, -2)^T, we get
d_0 = b - A x_0 = \begin{pmatrix} 12 \\ 8 \end{pmatrix} \implies α_0 = \frac{d_0^T d_0}{d_0^T A d_0} = \frac{208}{1200} \implies x_1 = x_0 + α_0 d_0 ≈ \begin{pmatrix} 0.08 \\ -0.613 \end{pmatrix}.
Continuing the iteration, x_k zig-zags towards the solution (2, -2)^T, descending the contours of f(x_1, x_2).
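A minimal implementation of this iteration (my own sketch; the tolerance and iteration cap are arbitrary choices):

import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, max_iter=1000):
    """Line search (5.35) with d_k = -r_k and step size (5.39)."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        r = A @ x - b                        # residual r_k = Ax_k - b
        if np.linalg.norm(r) < tol:
            break
        d = -r                               # steepest-descent direction
        alpha = -(d @ r) / (d @ (A @ d))     # step size (5.39)
        x = x + alpha * d
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.array([-2.0, -2.0])))   # approximately [ 2. -2.]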

Unfortunately, the method of steepest descent can be slow to converge. In the conjugate gradient method, we still take d_0 = -r_0, but subsequent search directions d_k are chosen to be A-conjugate, meaning that
d_{k+1}^T A d_k = 0.    (5.41)
This means that minimization in one direction does not undo the previous minimizations.
In particular, we construct d_{k+1} by writing
d_{k+1} = -r_{k+1} + β_k d_k,    (5.42)


then choosing the scalar β_k such that d_{k+1}^T A d_k = 0. This gives
0 = \left( -r_{k+1} + β_k d_k \right)^T A d_k = -r_{k+1}^T A d_k + β_k d_k^T A d_k    (5.43)
and hence
β_k = \frac{r_{k+1}^T A d_k}{d_k^T A d_k}.    (5.44)
Thus we get the basic conjugate gradient algorithm.
Algorithm 5.6 (Conjugate gradient method). Start with an initial guess x_0 and initial search direction d_0 = -r_0 = b - A x_0. For each k = 0, 1, \ldots, do the following:
1. Compute the step size
α_k = - \frac{d_k^T r_k}{d_k^T A d_k}.
2. Compute x_{k+1} = x_k + α_k d_k.
3. Compute the residual r_{k+1} = A x_{k+1} - b.
4. If ‖r_{k+1}‖ < tolerance, output x_{k+1} and stop.
5. Determine the new search direction
d_{k+1} = -r_{k+1} + β_k d_k \quad \text{where} \quad β_k = \frac{r_{k+1}^T A d_k}{d_k^T A d_k}.
Example → Solve our previous example with the conjugate gradient method.
Starting with x_0 = (-2, -2)^T, the first step is the same as in steepest descent, giving x_1 ≈ (0.08, -0.613)^T. But then we take
r_1 = A x_1 - b ≈ \begin{pmatrix} -2.99 \\ 4.48 \end{pmatrix}, \qquad
β_0 = \frac{r_1^T A d_0}{d_0^T A d_0} ≈ 0.139, \qquad
d_1 = -r_1 + β_0 d_0 ≈ \begin{pmatrix} 4.66 \\ -3.36 \end{pmatrix}.
The second iteration then gives
α_1 = - \frac{d_1^T r_1}{d_1^T A d_1} ≈ 0.412 \implies x_2 = x_1 + α_1 d_1 = \begin{pmatrix} 2 \\ -2 \end{pmatrix}.
This time there is no zig-zagging and the solution is reached in just two iterations.


In exact arithmetic, the conjugate gradient method will always give the exact answer in at most n iterations – one way to see this is to use the following.

Theorem 5.7. The residuals r_k := A x_k - b at each stage of the conjugate gradient method are mutually orthogonal, meaning r_j^T r_k = 0 for j = 0, \ldots, k - 1.

Proof. See problem sheet.  □

After n iterations, the only residual vector that can be orthogonal to all of the previous ones is r_n = 0, so x_n must be the exact solution.
I In practice, conjugate gradients is not competitive as a direct method. It is computationally intensive, and rounding errors can destroy the orthogonality, meaning that more than n iterations may be required. Instead, its main use is for large sparse systems. For suitable matrices (perhaps after preconditioning), it can converge very rapidly.
I We can save computation by using the alternative formulae
r_{k+1} = r_k + α_k A d_k, \qquad α_k = \frac{r_k^T r_k}{d_k^T A d_k}, \qquad β_k = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}.
With these formulae, each iteration requires only one matrix-vector product, two vector-vector products, and three vector additions. Compare this to Algorithm 5.6, which requires two matrix-vector products, four vector-vector products and three vector additions.
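A sketch of the conjugate gradient method using these economical formulae (my own code; for a genuinely large sparse problem one would store A in a sparse format, e.g. scipy.sparse):

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """CG with the one-product-per-iteration update formulae above."""
    x = x0.astype(float).copy()
    r = A @ x - b
    d = -r
    rr = r @ r
    for _ in range(len(b)):              # at most n steps in exact arithmetic
        Ad = A @ d
        alpha = rr / (d @ Ad)
        x += alpha * d
        r += alpha * Ad                  # r_{k+1} = r_k + alpha_k A d_k
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        d = -r + (rr_new / rr) * d       # beta_k = r_{k+1}^T r_{k+1} / r_k^T r_k
        rr = rr_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b, np.array([-2.0, -2.0])))   # [ 2. -2.]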



6 Least-squares approximation

How do we find approximate solutions to overdetermined systems?

If A is an m × n rectangular matrix with m > n, then the linear system Ax = b is overdetermined and will usually have no solution. But we can still look for an approximate solution.

6.1 Orthogonality

Recall that the inner product between two column vectors x, y ∈ R^n is defined as
x · y = x^T y = \sum_{k=1}^{n} x_k y_k.    (6.1)
This is related to the ℓ_2-norm since ‖x‖_2^2 = x^T x. The angle θ between x and y is given by x^T y = ‖x‖_2 ‖y‖_2 \cos θ.
Two vectors x and y are orthogonal if x^T y = 0 (i.e. they lie at right angles in R^n).
Let S = {x_1, x_2, \ldots, x_n} be a set of n vectors. Then S is called orthogonal if x_i^T x_j = 0 for all i, j ∈ {1, 2, \ldots, n} with i ≠ j.

Theorem 6.1. An orthogonal set S of n non-zero vectors in R^n is a basis for R^n.

Proof. We know that a set of n vectors is a basis for R^n if the vectors are linearly independent. If this is not the case, then some x_k ∈ S could be expressed as a linear combination of the other members,
x_k = \sum_{i=1, i≠k}^{n} c_i x_i.    (6.2)
Since x_k ≠ 0, we know that x_k^T x_k = ‖x_k‖_2^2 > 0. But we would have
x_k^T x_k = \sum_{i=1, i≠k}^{n} c_i x_k^T x_i = 0,    (6.3)
where we used orthogonality. This would be a contradiction, so we conclude that the vectors in an orthogonal set are linearly independent.  □

I Many of the best algorithms for numerical analysis are based on the idea of orthogonality.
We will see some examples this term.
An orthonormal set is an orthogonal set where all of the vectors have unit norm. Given an orthogonal set S = {x_1, x_2, \ldots, x_n}, we can always construct an orthonormal set S' = {x'_1, x'_2, \ldots, x'_n} by normalisation, meaning
x'_i = \frac{1}{‖x_i‖} x_i.    (6.4)


Theorem 6.2. Let Q be an m × n matrix. The columns of Q form an orthonormal set iff Q^T Q = I_n.

If m = n, then such a Q is called an orthogonal matrix. For m ≠ n, it is just called a matrix with orthonormal columns.

Proof. Let q_1, q_2, \ldots, q_n be the columns of Q. Then
Q^T Q = \begin{pmatrix} q_1^T \\ q_2^T \\ \vdots \\ q_n^T \end{pmatrix}
\begin{pmatrix} q_1 & q_2 & \cdots & q_n \end{pmatrix}
= \begin{pmatrix} q_1^T q_1 & q_1^T q_2 & \cdots & q_1^T q_n \\ q_2^T q_1 & q_2^T q_2 & \cdots & q_2^T q_n \\ \vdots & \vdots & \ddots & \vdots \\ q_n^T q_1 & q_n^T q_2 & \cdots & q_n^T q_n \end{pmatrix}.    (6.5)
So orthonormality q_i^T q_j = δ_{ij} is equivalent to Q^T Q = I_n, where I_n is the n × n identity matrix.  □

I Note that the columns of Q are a basis for range(Q) = {Qx : x ∈ R^n}.

Example → The set S = \left\{ \tfrac{1}{\sqrt5}(2, 1)^T, \tfrac{1}{\sqrt5}(1, -2)^T \right\}.
The two vectors in S are orthonormal, since
\tfrac{1}{\sqrt5}\begin{pmatrix} 2 & 1 \end{pmatrix} \tfrac{1}{\sqrt5}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = 1, \qquad
\tfrac{1}{\sqrt5}\begin{pmatrix} 1 & -2 \end{pmatrix} \tfrac{1}{\sqrt5}\begin{pmatrix} 1 \\ -2 \end{pmatrix} = 1, \qquad
\tfrac{1}{\sqrt5}\begin{pmatrix} 2 & 1 \end{pmatrix} \tfrac{1}{\sqrt5}\begin{pmatrix} 1 \\ -2 \end{pmatrix} = 0.
Therefore S forms a basis for R^2. If x is a vector with components x_1, x_2 in the standard basis {(1, 0)^T, (0, 1)^T}, then the components of x in the basis given by S are
\frac{1}{\sqrt5} \begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
I Inner products are preserved under multiplication by orthogonal matrices, since (Qx)^T Qy = x^T (Q^T Q) y = x^T y. This means that angles between vectors and the lengths of vectors are preserved. Multiplication by an orthogonal matrix corresponds to a rigid rotation (if det(Q) = 1) or a reflection (if det(Q) = -1).

6.2 Discrete least squares

The discrete least squares problem: find x that minimizes the ℓ_2-norm of the residual, ‖Ax - b‖_2.

Example → Polynomial data fitting.


An overdetermined system arises if we try to fit a polynomial
p_n(x) = c_0 + c_1 x + \ldots + c_n x^n
to a function f(x) at m + 1 > n + 1 nodes x_0, \ldots, x_m. In the natural basis, this leads to a rectangular system
\begin{pmatrix} 1 & x_0 & \cdots & x_0^n \\ 1 & x_1 & \cdots & x_1^n \\ 1 & x_2 & \cdots & x_2^n \\ \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^n \end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{pmatrix}
= \begin{pmatrix} f(x_0) \\ f(x_1) \\ f(x_2) \\ \vdots \\ f(x_m) \end{pmatrix}.
We can't find coefficients c_k to match the function at all m + 1 points. Instead, the least squares approach is to find coefficients that minimize
\sum_{i=0}^{m} \left[ p_n(x_i) - f(x_i) \right]^2.
We will see how to do this shortly.


To solve the problem it is useful to think geometrically. The range of A, written range(A), is the set of all possible vectors Ax ∈ R^m, where x ∈ R^n. This will only be a subspace of R^m, and in particular it will not, in general, contain b. We are therefore looking for x ∈ R^n such that Ax is as close as possible to b in R^m (as measured by the ℓ_2-norm/Euclidean distance).
The distance from Ax to b is given by ‖r‖_2 = ‖Ax - b‖_2. Geometrically, we see that ‖r‖_2 will be minimized by choosing r orthogonal to Ax, i.e.,
(Ax)^T (Ax - b) = 0 \iff x^T (A^T Ax - A^T b) = 0.    (6.6)
This will be satisfied if x satisfies the n × n linear system
A^T Ax = A^T b,    (6.7)
called the normal equations.

Theorem 6.3. The matrix A^T A is invertible iff the columns of A are linearly independent, in which case Ax = b has a unique least-squares solution x = (A^T A)^{-1} A^T b.

Proof. If A^T A is singular (non-invertible), then A^T Ax = 0 for some non-zero vector x, implying that
x^T A^T Ax = 0 \implies ‖Ax‖_2^2 = 0 \implies Ax = 0.    (6.8)
This implies that A is rank-deficient (i.e. its columns are linearly dependent).
Conversely, if A is rank-deficient, then Ax = 0 for some x ≠ 0, implying A^T Ax = 0 and hence that A^T A is singular.  □

I The n × m matrix (A^T A)^{-1} A^T is called the Moore-Penrose pseudoinverse of A. In practice, we would solve the normal equations (6.7) directly, rather than calculating the pseudoinverse itself.


Example → Polynomial data fitting.
For the matrix A in our previous example, we have a_{ij} = x_i^j for i = 0, \ldots, m and j = 0, \ldots, n. So the normal matrix has entries
(A^T A)_{ij} = \sum_{k=0}^{m} a_{ki} a_{kj} = \sum_{k=0}^{m} x_k^i x_k^j = \sum_{k=0}^{m} x_k^{i+j},
and the normal equations have the form
\sum_{j=0}^{n} c_j \sum_{k=0}^{m} x_k^{i+j} = \sum_{k=0}^{m} x_k^i f(x_k) \qquad \text{for } i = 0, \ldots, n.

Example → Fit a least-squares straight line to the data f(-3) = f(0) = 0, f(6) = 2.
Here n = 1 (fitting a straight line) and m = 2 (3 data points), so x_0 = -3, x_1 = 0 and x_2 = 6. The overdetermined system is
\begin{pmatrix} 1 & -3 \\ 1 & 0 \\ 1 & 6 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix},
and the normal equations have the form
\begin{pmatrix} 3 & x_0 + x_1 + x_2 \\ x_0 + x_1 + x_2 & x_0^2 + x_1^2 + x_2^2 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} f(x_0) + f(x_1) + f(x_2) \\ x_0 f(x_0) + x_1 f(x_1) + x_2 f(x_2) \end{pmatrix}
\iff \begin{pmatrix} 3 & 3 \\ 3 & 45 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 2 \\ 12 \end{pmatrix}
\implies \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 3/7 \\ 5/21 \end{pmatrix}.
So the least-squares approximation by a straight line is p_1(x) = \tfrac37 + \tfrac{5}{21} x.
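The same fit takes a few lines in NumPy (my own snippet): the normal equations give the coefficients directly, and np.linalg.lstsq solves the identical minimisation more stably.

import numpy as np

A = np.array([[1.0, -3.0],
              [1.0,  0.0],
              [1.0,  6.0]])
b = np.array([0.0, 0.0, 2.0])

c = np.linalg.solve(A.T @ A, A.T @ b)        # normal equations A^T A c = A^T b
print(c)                                     # [0.4286 0.2381] = [3/7, 5/21]
print(np.linalg.lstsq(A, b, rcond=None)[0])  # same least-squares solution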



6.3 QR decomposition

In practice the normal matrix A^T A can often be ill-conditioned. A better method is based on another matrix factorization.

Theorem 6.4 (QR decomposition). Any real m × n matrix A, with m ≥ n, can be written in the
form A = QR, where Q is an m ×n matrix with orthonormal columns and R is an upper-triangular
n × n matrix.

Proof. We will show this by construction… 

The simplest way to compute Q and R is by Gram-Schmidt orthogonalization.

Algorithm 6.5 (Gram-Schmidt). Let {u_1, \ldots, u_n} be a set of n linearly independent vectors in R^m, not necessarily orthogonal. Then we can construct an orthonormal set {q_1, \ldots, q_n} by
p_k = u_k - \sum_{i=1}^{k-1} (u_k^T q_i) q_i, \qquad q_k = \frac{p_k}{‖p_k‖_2}, \qquad \text{for } k = 1, \ldots, n.
Notice that each p_k is constructed from u_k by subtracting the orthogonal projections of u_k on each of the previous q_i for i < k.

Example → Use the Gram-Schmidt process to construct an orthonormal basis for W = Span{u_1, u_2}, where u_1 = (3, 6, 0)^T and u_2 = (1, 2, 2)^T.
We take
p_1 = u_1 = \begin{pmatrix} 3 \\ 6 \\ 0 \end{pmatrix} \implies q_1 = \frac{p_1}{‖p_1‖_2} = \frac{1}{\sqrt{45}} \begin{pmatrix} 3 \\ 6 \\ 0 \end{pmatrix}.
Then
p_2 = u_2 - (u_2^T q_1) q_1 = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} - \frac{15}{45} \begin{pmatrix} 3 \\ 6 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix} \implies q_2 = \frac{p_2}{‖p_2‖_2} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
The set {q_1, q_2} is orthonormal and spans W.
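Algorithm 6.5 in code, applied to the columns of a matrix (a sketch of my own; this is classical Gram-Schmidt as written above, the simplest but not the most numerically stable variant):

import numpy as np

def gram_schmidt(U):
    """Orthonormalise the columns of U following Algorithm 6.5."""
    m, n = U.shape
    Q = np.zeros((m, n))
    for k in range(n):
        p = U[:, k] - Q[:, :k] @ (Q[:, :k].T @ U[:, k])  # subtract projections
        Q[:, k] = p / np.linalg.norm(p)
    return Q

U = np.array([[3.0, 1.0],
              [6.0, 2.0],
              [0.0, 2.0]])
Q = gram_schmidt(U)
print(Q)            # columns (3,6,0)/sqrt(45) and (0,0,1)
print(Q.T @ Q)      # 2x2 identity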
How do we use this to construct our QR decomposition of A? We simply apply Gram-Schmidt to the set of columns of A, {a_1, \ldots, a_n}, which are vectors in R^m. This produces a set of orthonormal vectors q_i ∈ R^m. Moreover, we have
‖p_k‖_2 q_k = a_k - \sum_{i=1}^{k-1} (a_k^T q_i) q_i \implies a_k = ‖p_k‖_2 q_k + \sum_{i=1}^{k-1} (a_k^T q_i) q_i.    (6.9)
Taking the inner product with q_k and using orthonormality of the q_i shows that ‖p_k‖_2 = a_k^T q_k, so we can write the columns of A as
a_k = \sum_{i=1}^{k} (a_k^T q_i) q_i.    (6.10)
In matrix form this may be written
A = \underbrace{\begin{pmatrix} q_1 & q_2 & \cdots & q_n \end{pmatrix}}_{Q \; (m \times n)}
\underbrace{\begin{pmatrix} a_1^T q_1 & a_2^T q_1 & \cdots & a_n^T q_1 \\ 0 & a_2^T q_2 & \cdots & a_n^T q_2 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_n^T q_n \end{pmatrix}}_{R \; (n \times n)}    (6.11)

I What we have done here is express each column of A in the orthonormal basis given by the columns of Q. The coefficients in the new basis are stored in R, which will be non-singular if A has full column rank.
How does this help in least squares? If A = QR then
A^T Ax = A^T b \iff (QR)^T QRx = (QR)^T b    (6.12)
\iff R^T (Q^T Q) Rx = R^T Q^T b    (6.13)
\iff R^T \left( Rx - Q^T b \right) = 0    (6.14)
\iff Rx = Q^T b.    (6.15)
In the last step we assumed that R is invertible. We see that the problem is reduced to an upper-triangular system, which may be solved by back-substitution.
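NumPy's built-in QR routine computes exactly this "reduced" factorisation (Q with orthonormal columns, R square upper triangular), so the least-squares solve becomes a short script (my own snippet, assuming SciPy for the triangular solve):

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, -3.0],
              [1.0,  0.0],
              [1.0,  6.0]])
b = np.array([0.0, 0.0, 2.0])

Q, R = np.linalg.qr(A)              # reduced QR: A (3x2) = Q (3x2) R (2x2)
c = solve_triangular(R, Q.T @ b)    # back-substitution for Rc = Q^T b
print(c)                            # [3/7, 5/21]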

Example → Use QR decomposition to find our earlier least-squares straight line, where
\begin{pmatrix} 1 & -3 \\ 1 & 0 \\ 1 & 6 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix}.
The columns of A are a_1 = (1, 1, 1)^T and a_2 = (-3, 0, 6)^T. So applying Gram-Schmidt gives
p_1 = a_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \implies q_1 = \frac{p_1}{‖p_1‖_2} = \frac{1}{\sqrt3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},
and
p_2 = a_2 - (a_2^T q_1) q_1 = \begin{pmatrix} -3 \\ 0 \\ 6 \end{pmatrix} - \frac{3}{\sqrt3} \cdot \frac{1}{\sqrt3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -4 \\ -1 \\ 5 \end{pmatrix}
\implies q_2 = \frac{p_2}{‖p_2‖_2} = \frac{1}{\sqrt{42}} \begin{pmatrix} -4 \\ -1 \\ 5 \end{pmatrix}.
Therefore A = QR with
Q = \begin{pmatrix} 1/\sqrt3 & -4/\sqrt{42} \\ 1/\sqrt3 & -1/\sqrt{42} \\ 1/\sqrt3 & 5/\sqrt{42} \end{pmatrix}, \qquad
R = \begin{pmatrix} a_1^T q_1 & a_2^T q_1 \\ 0 & a_2^T q_2 \end{pmatrix} = \begin{pmatrix} \sqrt3 & \sqrt3 \\ 0 & \sqrt{42} \end{pmatrix}.
The normal equations may then be written
Rx = Q^T b \implies \begin{pmatrix} \sqrt3 & \sqrt3 \\ 0 & \sqrt{42} \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 2/\sqrt3 \\ 10/\sqrt{42} \end{pmatrix}
\implies \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 3/7 \\ 5/21 \end{pmatrix},
which agrees with our earlier solution.
I In practice, Gram-Schmidt orthogonalization is not very numerically stable. Alternative
methods of computing the QR decomposition such as the Householder algorithm are preferred
(but beyond the scope of this course).

6.4 Continuous least squares

We have seen how to approximate a function f (x ) by a polynomial pn (x ), by minimising


the sum of squares of errors at m > n nodes. In this section, we will see how to find an
approximating polynomial that minimizes the error over all x ∈ [a, b].
I Think of taking m → ∞, so that our matrix A is infinitely tall. Happily, since pn still has
finitely many coefficients, we will end up with an n × n system of normal equations to solve.
Let f, g belong to the vector space C[a, b] of continuous real-valued functions on [a, b]. We can define an inner product by
(f, g) = \int_a^b f(x) g(x) w(x) \, dx    (6.16)
for any choice of weight function w(x) that is positive, continuous, and integrable on (a, b).
I The purpose of the weight function will be to assign varying degrees of importance to errors on different portions of the interval.
Since (6.16) is an inner product, we can define a norm satisfying (N1), (N2), (N3) by
‖f‖ := \sqrt{(f, f)} = \left( \int_a^b |f(x)|^2 w(x) \, dx \right)^{1/2}.    (6.17)

Example → Inner product.
For the weight function w(x) = 1 on the interval [0, 1], we have, for example,
(1, x) = \int_0^1 x \, dx = \tfrac12, \qquad ‖x‖ = \sqrt{(x, x)} = \left( \int_0^1 x^2 \, dx \right)^{1/2} = \frac{1}{\sqrt3}.

The continuous least squares problem is to find the polynomial p_n ∈ P_n that minimizes ‖p_n - f‖ in a given inner product. The analogue of the normal equations is the following.

Theorem 6.6 (Continuous least squares). Given f ∈ C[a, b], the polynomial p_n ∈ P_n minimizes ‖q_n - f‖ among all q_n ∈ P_n if and only if
(p_n - f, q_n) = 0 \quad \text{for all } q_n ∈ P_n.

I Notice the analogy with discrete least squares: we are again setting the "error" orthogonal to the space of "possible functions" P_n.


Proof. If (p_n - f, q_n) = 0 for all q_n ∈ P_n, then for any q_n ≠ p_n we have
‖q_n - f‖^2 = ‖(q_n - p_n) + (p_n - f)‖^2 = ‖q_n - p_n‖^2 + ‖p_n - f‖^2 > ‖p_n - f‖^2,    (6.18)
i.e., p_n minimizes ‖q_n - f‖.
Conversely, suppose (p_n - f, q_n) ≠ 0 for some q_n ∈ P_n, and consider
‖(p_n - f) + λq_n‖^2 = ‖p_n - f‖^2 + 2λ(p_n - f, q_n) + λ^2 ‖q_n‖^2.    (6.19)
If we choose λ = -(p_n - f, q_n)/‖q_n‖^2 then we see that
‖(p_n + λq_n) - f‖^2 = ‖p_n - f‖^2 - \frac{(p_n - f, q_n)^2}{‖q_n‖^2} < ‖p_n - f‖^2,    (6.20)
showing that p_n does not minimize ‖q_n - f‖.  □

I The theorem holds more generally for best approximations in any subspace of an inner
product space. It is an important result in the subject of Approximation Theory.

Example → Find the least squares polynomial p_1(x) = c_0 + c_1 x that approximates f(x) = \sin(πx) on the interval [0, 1] with weight function w(x) = 1.
We can use Theorem 6.6 to find c_0 and c_1 by requiring orthogonality for both functions in the basis {1, x} for P_1. This will guarantee that p_1 - f is orthogonal to every polynomial in P_1. We get a system of two linearly independent equations
(p_1 - f, 1) = 0, \quad (p_1 - f, x) = 0 \iff (p_1, 1) = (f, 1), \quad (p_1, x) = (f, x)
\iff \int_0^1 (c_0 + c_1 x) \, dx = \int_0^1 \sin(πx) \, dx, \quad \int_0^1 (c_0 + c_1 x) x \, dx = \int_0^1 x \sin(πx) \, dx,
which may be written as the linear system
\begin{pmatrix} \int_0^1 dx & \int_0^1 x \, dx \\ \int_0^1 x \, dx & \int_0^1 x^2 \, dx \end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \end{pmatrix} =
\begin{pmatrix} \int_0^1 \sin(πx) \, dx \\ \int_0^1 x \sin(πx) \, dx \end{pmatrix}.
These are analogous to the normal equations for the discrete polynomial approximation, with sums over x_k replaced by integrals. Evaluating the integrals, we have
\begin{pmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 2/π \\ 1/π \end{pmatrix}
\implies \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 2/π \\ 0 \end{pmatrix}
\implies p_1(x) = \frac{2}{π}.


6.5 Orthogonal polynomials

Just as with discrete least squares, we can make our life easier by working in an orthonormal
basis for Pn .
A family of orthogonal polynomials associated with the inner product (6.16) is a set {ϕ 0 , ϕ 1 ,
ϕ 2 , . . .} where each ϕk is a polynomial of degree exactly k and the polynomials satisfy the
orthogonality condition
(ϕ_j, ϕ_k) = 0 \quad \text{for } k ≠ j.    (6.21)

I This condition implies that each ϕk is orthogonal to all polynomials of degree less than k.
The condition (6.21) determines the family uniquely up to normalisation, since multiplying
each ϕk by a constant factor does not change their orthogonality. There are three common
choices of normalisation:
1. Require each ϕk to be monic (leading coefficient 1).
2. Require orthonormality, (ϕ j , ϕk ) = δ jk .
3. Require ϕk (1) = 1 for all k.
I The final one is the standard normalisation for Chebyshev and Legendre polynomials.
As in Theorem 6.1, a set of orthogonal polynomials {ϕ_0, ϕ_1, \ldots, ϕ_n} will form a basis for P_n. Since this is a basis, the least-squares solution p_n ∈ P_n may be written
p_n(x) = c_0 ϕ_0(x) + c_1 ϕ_1(x) + \ldots + c_n ϕ_n(x),    (6.22)
where c_0, \ldots, c_n are the unknown coefficients to be found. Then according to Theorem 6.6 we can find these coefficients by requiring
(p_n - f, ϕ_k) = 0 \quad \text{for } k = 0, \ldots, n,    (6.23)
\iff c_0 (ϕ_0, ϕ_k) + c_1 (ϕ_1, ϕ_k) + \ldots + c_n (ϕ_n, ϕ_k) = (f, ϕ_k) \quad \text{for } k = 0, \ldots, n,    (6.24)
\iff c_k = \frac{(f, ϕ_k)}{(ϕ_k, ϕ_k)} \quad \text{for } k = 0, \ldots, n.    (6.25)
So compared to the natural basis, the number of integrals required is greatly reduced (at least
once you have the orthogonal polynomials).
We can construct an orthogonal basis using the same Gram-Schmidt algorithm as in the discrete case. For simplicity, we will construct a set of monic orthogonal polynomials. Start with the monic polynomial of degree 0,
ϕ_0(x) = 1.    (6.26)
Then construct ϕ_1(x) from x by subtracting the orthogonal projection of x on ϕ_0, giving
ϕ_1(x) = x - \frac{(xϕ_0, ϕ_0)}{(ϕ_0, ϕ_0)} ϕ_0(x) = x - \frac{(x, 1)}{(1, 1)}.    (6.27)
In general, given the orthogonal set {ϕ_0, ϕ_1, \ldots, ϕ_k}, we construct ϕ_{k+1}(x) by starting with xϕ_k(x) and subtracting its orthogonal projections on ϕ_0, ϕ_1, \ldots, ϕ_k. Thus
ϕ_{k+1}(x) = xϕ_k(x) - \frac{(xϕ_k, ϕ_0)}{(ϕ_0, ϕ_0)} ϕ_0(x) - \frac{(xϕ_k, ϕ_1)}{(ϕ_1, ϕ_1)} ϕ_1(x) - \ldots - \frac{(xϕ_k, ϕ_{k-1})}{(ϕ_{k-1}, ϕ_{k-1})} ϕ_{k-1}(x) - \frac{(xϕ_k, ϕ_k)}{(ϕ_k, ϕ_k)} ϕ_k(x).    (6.28)



Now all of these projections except for the last two vanish, since, e.g., (xϕ_k, ϕ_0) = (ϕ_k, xϕ_0) = 0 using the fact that ϕ_k is orthogonal to all polynomials of lower degree. The penultimate term may be simplified similarly since (xϕ_k, ϕ_{k-1}) = (ϕ_k, xϕ_{k-1}) = (ϕ_k, ϕ_k). So we get:

Theorem 6.7 (Three-term recurrence). The set of monic orthogonal polynomials under the inner product (6.16) satisfy the recurrence relation
ϕ_0(x) = 1, \qquad ϕ_1(x) = x - \frac{(x, 1)}{(1, 1)},
ϕ_{k+1}(x) = xϕ_k(x) - \frac{(xϕ_k, ϕ_k)}{(ϕ_k, ϕ_k)} ϕ_k(x) - \frac{(ϕ_k, ϕ_k)}{(ϕ_{k-1}, ϕ_{k-1})} ϕ_{k-1}(x) \quad \text{for } k ≥ 1.

Example → Legendre polynomials.


These are generated by the inner product with w(x) ≡ 1 on the interval (-1, 1). Starting with ϕ_0(x) = 1, we find that
ϕ_1(x) = x - \frac{\int_{-1}^{1} x \, dx}{\int_{-1}^{1} dx} = x, \qquad
ϕ_2(x) = x^2 - \frac{\int_{-1}^{1} x^3 \, dx}{\int_{-1}^{1} x^2 \, dx} x - \frac{\int_{-1}^{1} x^2 \, dx}{\int_{-1}^{1} dx} = x^2 - \tfrac13, \quad \ldots
I Traditionally, the Legendre polynomials are then normalised so that ϕ_k(1) = 1 for all k. In that case, the recurrence relation reduces to
ϕ_{k+1}(x) = \frac{(2k+1) x ϕ_k(x) - k ϕ_{k-1}(x)}{k+1}.
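The three-term recurrence of Theorem 6.7 is easy to drive symbolically. A SymPy sketch of my own that generates the first few monic orthogonal polynomials for w(x) = 1 on (-1, 1):

import sympy as sp

x = sp.symbols('x')

def inner(f, g, a=-1, b=1):
    return sp.integrate(f * g, (x, a, b))     # (f, g) with weight w = 1

phi = [sp.Integer(1)]
phi.append(sp.expand(x - inner(x, 1) / inner(1, 1)))
for k in range(1, 4):
    nxt = (x * phi[k]
           - inner(x * phi[k], phi[k]) / inner(phi[k], phi[k]) * phi[k]
           - inner(phi[k], phi[k]) / inner(phi[k - 1], phi[k - 1]) * phi[k - 1])
    phi.append(sp.expand(nxt))

print(phi)   # [1, x, x**2 - 1/3, x**3 - 3*x/5, x**4 - 6*x**2/7 + 3/35]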

Example → Use a basis of orthogonal polynomials to find the least squares polynomial p_1 = c_0 + c_1 x that approximates f(x) = \sin(πx) on the interval [0, 1] with weight function w(x) = 1.
Starting with ϕ_0(x) = 1, we compute
ϕ_1(x) = x - \frac{\int_0^1 x \, dx}{\int_0^1 dx} = x - \tfrac12.
Then the coefficients are given by
c_0 = \frac{(f, ϕ_0)}{(ϕ_0, ϕ_0)} = \frac{\int_0^1 \sin(πx) \, dx}{\int_0^1 dx} = \frac{2}{π},
c_1 = \frac{(f, ϕ_1)}{(ϕ_1, ϕ_1)} = \frac{\int_0^1 (x - \tfrac12) \sin(πx) \, dx}{\int_0^1 (x - \tfrac12)^2 \, dx} = 0,
so we recover our earlier approximation p_1(x) = \frac{2}{π}.


7 Numerical integration

How do we calculate integrals numerically?

The definite integral
I(f) := \int_a^b f(x) \, dx    (7.1)
can usually not be evaluated in closed form. To approximate it numerically, we can use a quadrature formula
I_n(f) := \sum_{k=0}^{n} σ_k f(x_k),    (7.2)
where x_0, \ldots, x_n are a set of nodes and σ_0, \ldots, σ_n are a set of corresponding weights.
I The nodes are also known as quadrature points or abscissas, and the weights as coefficients.

Example → The trapezium rule
I_1(f) = \frac{b-a}{2} \left[ f(a) + f(b) \right].
This is the quadrature formula (7.2) with x_0 = a, x_1 = b, σ_0 = σ_1 = \tfrac12(b - a).
For example, with a = 0, b = 2, f(x) = e^x, we get
I_1(f) = \frac{2-0}{2} \left( e^0 + e^2 \right) = 8.389 \text{ to 4 s.f.}
The exact answer is
I(f) = \int_0^2 e^x \, dx = e^2 - e^0 = 6.389 \text{ to 4 s.f.}
Graphically, I_1(f) measures the area under the straight line that interpolates f at the end points.


7.1 Newton-Cotes formulae

We can derive a family of “interpolatory” quadrature formulae by integrating interpolating


polynomials of different degrees. We will also get error estimates using Theorem 2.6.
Let x_0, \ldots, x_n ∈ [a, b], where x_0 < x_1 < \cdots < x_n, be a set of n + 1 nodes, and let p_n ∈ P_n be the polynomial that interpolates f at these nodes. This may be written in Lagrange form as
p_n(x) = \sum_{k=0}^{n} f(x_k) ℓ_k(x), \qquad \text{where} \quad ℓ_k(x) = \prod_{j=0, j≠k}^{n} \frac{x - x_j}{x_k - x_j}.    (7.3)
To approximate I(f), we integrate p_n(x) to define the quadrature formula
I_n(f) := \int_a^b \sum_{k=0}^{n} f(x_k) ℓ_k(x) \, dx = \sum_{k=0}^{n} f(x_k) \int_a^b ℓ_k(x) \, dx.    (7.4)
In other words,
I_n(f) := \sum_{k=0}^{n} σ_k f(x_k), \qquad \text{where} \quad σ_k = \int_a^b ℓ_k(x) \, dx.    (7.5)
When the nodes are equidistant, this is called a Newton-Cotes formula. If x_0 = a and x_n = b, it is called a closed Newton-Cotes formula.
I An open Newton-Cotes formula has nodes x_i = a + (i + 1)h for h = (b - a)/(n + 2).

Example → Trapezium rule.
This is the closed Newton-Cotes formula with n = 1. To see this, let x_0 = a, x_1 = b. Then
ℓ_0(x) = \frac{x - b}{a - b} \implies σ_0 = \int_a^b ℓ_0(x) \, dx = \frac{1}{a - b} \int_a^b (x - b) \, dx = \frac{(x - b)^2}{2(a - b)} \Big|_a^b = \frac{b - a}{2},
and
ℓ_1(x) = \frac{x - a}{b - a} \implies σ_1 = \int_a^b ℓ_1(x) \, dx = \frac{1}{b - a} \int_a^b (x - a) \, dx = \frac{(x - a)^2}{2(b - a)} \Big|_a^b = \frac{b - a}{2}.
So
I_1(f) = σ_0 f(a) + σ_1 f(b) = \frac{b - a}{2} \left[ f(a) + f(b) \right].

Theorem 7.1. Let f be continuous on [a, b] with n + 1 continuous derivatives on (a, b). Then the Newton-Cotes formula (7.5) satisfies the error bound
\left| I(f) - I_n(f) \right| ≤ \frac{\max_{ξ∈[a,b]} |f^{(n+1)}(ξ)|}{(n + 1)!} \int_a^b \left| (x - x_0)(x - x_1) \cdots (x - x_n) \right| dx.


Proof. First note that the error in the Newton-Cotes formula may be written
\left| I(f) - I_n(f) \right| = \left| \int_a^b f(x) \, dx - \int_a^b p_n(x) \, dx \right| = \left| \int_a^b \left[ f(x) - p_n(x) \right] dx \right|    (7.6)
≤ \int_a^b \left| f(x) - p_n(x) \right| dx.    (7.7)
Now recall Theorem 2.6, which says that, for each x ∈ [a, b], we can write
f(x) - p_n(x) = \frac{f^{(n+1)}(ξ)}{(n + 1)!} (x - x_0)(x - x_1) \cdots (x - x_n)    (7.8)
for some ξ ∈ (a, b). The theorem simply follows by inserting this into inequality (7.7).  □

Example → Trapezium rule.
Let M_2 = \max_{ξ∈[a,b]} |f''(ξ)|. Here Theorem 7.1 reduces to
\left| I(f) - I_1(f) \right| ≤ \frac{M_2}{(1 + 1)!} \int_a^b \left| (x - a)(x - b) \right| dx = \frac{M_2}{2!} \int_a^b (x - a)(b - x) \, dx = M_2 \frac{(b - a)^3}{12}.
For our earlier example with a = 0, b = 2, f(x) = e^x, the estimate gives
\left| I(f) - I_1(f) \right| ≤ \tfrac{1}{12} (2^3) e^2 ≈ 4.926.
This is an overestimate of the actual error, which was ≈ 2.000.


I Theorem 7.1 suggests that the accuracy of I_n is limited both by the smoothness of f (outside our control) and by the location of the nodes x_k. If the nodes are free to be chosen, then we can use Gaussian quadrature (see later).
I As with interpolation, taking a high n is not usually a good idea. One can prove for the closed Newton-Cotes formula that
\sum_{k=0}^{n} |σ_k| → ∞ \quad \text{as } n → ∞.
This makes the quadrature vulnerable to rounding errors for large n.

7.2 Composite Newton-Cotes formulae

Since the Newton-Cotes formulae are based on polynomial interpolation at equally-spaced points, the results need not converge as the number of nodes increases. A better way to improve accuracy is to divide the interval [a, b] into m subintervals [x_{i-1}, x_i] of equal length
h := \frac{b - a}{m},    (7.9)
and use a Newton-Cotes formula of small degree n on each subinterval.


Example → Composite trapezium rule.
Applying the trapezium rule I 1 ( f ) on each subinterval gives
h
C 1,m ( f ) = [f (x 0 ) + f (x 1 ) + f (x 1 ) + f (x 2 ) + . . . + f (xm−1 ) + f (xm )] ,
2f g
= h 12 f (x 0 ) + f (x 1 ) + f (x 2 ) + . . . + f (xm−1 ) + 21 f (xm ) .
We are effectively integrating a piecewise-linear approximation of f (x ); here we show m = 3
for our test problem f (x ) = ex on [0, 2]:

Look at what happens as we increase m for our test problem:


m h C 1,m ( f ) |I ( f ) − C 1,m ( f )|
1 2 8.389 2.000
2 1 6.912 0.524
4 0.5 6.522 0.133
8 0.25 6.422 0.033
16 0.125 6.397 0.008
32 0.0625 6.391 0.002
When we halve the subinterval length $h$, the error goes down by a factor of 4, suggesting quadratic convergence, i.e., $O(h^2)$.
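The table can be reproduced with a short composite-trapezium sketch (our own code, assuming NumPy; the helper name `composite_trapezium` is not from the notes):

```python
# Sketch: reproduce the table above and watch the error drop by roughly a
# factor of 4 each time h is halved.
import numpy as np

def composite_trapezium(f, a, b, m):
    """Composite trapezium rule C_{1,m}(f) with m equal subintervals."""
    x = np.linspace(a, b, m + 1)
    h = (b - a) / m
    return h * (0.5 * f(x[0]) + f(x[1:-1]).sum() + 0.5 * f(x[-1]))

exact = np.exp(2) - 1
for m in [1, 2, 4, 8, 16, 32]:
    err = abs(exact - composite_trapezium(np.exp, 0.0, 2.0, m))
    print(m, 2.0 / m, err)
```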
To show this theoretically, we can apply Theorem 7.1 in each subinterval. In $[x_{i-1}, x_i]$ we have
$$\big|I(f) - I_1(f)\big| \le \frac{\max_{\xi\in[x_{i-1},x_i]} |f''(\xi)|}{2!}\int_{x_{i-1}}^{x_i} \big|(x - x_{i-1})(x - x_i)\big|\,dx.$$
Note that
$$\int_{x_{i-1}}^{x_i} \big|(x - x_{i-1})(x - x_i)\big|\,dx = \int_{x_{i-1}}^{x_i} (x - x_{i-1})(x_i - x)\,dx = \int_{x_{i-1}}^{x_i} \big[-x^2 + (x_{i-1} + x_i)x - x_{i-1}x_i\big]\,dx$$
$$= \Big[-\tfrac{1}{3}x^3 + \tfrac{1}{2}(x_{i-1} + x_i)x^2 - x_{i-1}x_i x\Big]_{x_{i-1}}^{x_i} = \tfrac{1}{6}x_i^3 - \tfrac{1}{2}x_{i-1}x_i^2 + \tfrac{1}{2}x_{i-1}^2 x_i - \tfrac{1}{6}x_{i-1}^3 = \tfrac{1}{6}(x_i - x_{i-1})^3 = \tfrac{1}{6}h^3.$$



So overall
$$\big|I(f) - C_{1,m}(f)\big| \le \frac{1}{2}\Big(\max_i \max_{\xi\in[x_{i-1},x_i]} |f''(\xi)|\Big)\,m\,\frac{h^3}{6} = \frac{mh^3}{12}\max_{\xi\in[a,b]} |f''(\xi)| = \frac{b-a}{12}\,h^2 \max_{\xi\in[a,b]} |f''(\xi)|.$$

As long as $f$ is sufficiently smooth, this shows that the composite trapezium rule will converge as $m \to \infty$. Moreover, the convergence will be $O(h^2)$.

7.3 Exactness

From Theorem 7.1, we see that the Newton-Cotes formula $I_n(f)$ will give the exact answer if $f^{(n+1)} \equiv 0$. In other words, it will be exact if $f \in \mathbb{P}_n$.

Example → The trapezium rule I 1 ( f ) is exact for all linear polynomials f ∈ P1 .

The degree of exactness of a quadrature formula is the largest integer n for which the formula
is exact for all polynomials in Pn .
To check whether a quadrature formula has degree of exactness $n$, it suffices to check whether it is exact for the basis $1, x, x^2, \ldots, x^n$.

Example → Simpson’s rule.


This is the $n = 2$ closed Newton-Cotes formula
$$I_2(f) = \frac{b-a}{6}\left[f(a) + 4f\!\left(\frac{a+b}{2}\right) + f(b)\right],$$
derived by integrating a quadratic interpolating polynomial. Let us find its degree of exactness:
$$I(1) = \int_a^b dx = b - a, \qquad I_2(1) = \frac{b-a}{6}[1 + 4 + 1] = b - a = I(1),$$
$$I(x) = \int_a^b x\,dx = \frac{b^2 - a^2}{2}, \qquad I_2(x) = \frac{b-a}{6}[a + 2(a+b) + b] = \frac{(b-a)(b+a)}{2} = I(x),$$
$$I(x^2) = \int_a^b x^2\,dx = \frac{b^3 - a^3}{3}, \qquad I_2(x^2) = \frac{b-a}{6}\big[a^2 + (a+b)^2 + b^2\big] = \frac{2(b^3 - a^3)}{6} = I(x^2),$$
$$I(x^3) = \int_a^b x^3\,dx = \frac{b^4 - a^4}{4}, \qquad I_2(x^3) = \frac{b-a}{6}\big[a^3 + \tfrac{1}{2}(a+b)^3 + b^3\big] = \frac{b^4 - a^4}{4} = I(x^3).$$
This shows that the degree of exactness is at least 3 (contrary to what might be expected from the interpolation picture). You can verify that $I_2(x^4) \ne I(x^4)$, so the degree of exactness is exactly 3.
I This shows that the term $f'''(\xi)$ in the error formula for Simpson's rule (Theorem 7.1) is misleading. In fact, it is possible to write an error bound proportional to $f^{(4)}(\xi)$.
I In terms of degree of exactness, Simpson’s formula does better than expected. In general,
Newton-Cotes formulae with even n have degree of exactness n + 1. But this is by no means
the highest possible (next section).
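The exactness check above is easy to automate. The following sketch (ours, not from the notes) tests Simpson's rule on $[0, 1]$ against the monomial basis; the error vanishes up to degree 3 and becomes nonzero at degree 4:

```python
# Sketch (ours): automate the exactness check for Simpson's rule on [0, 1] using
# the monomial basis 1, x, x^2, ...
a, b = 0.0, 1.0

def simpson(f):
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

for d in range(6):
    exact = (b ** (d + 1) - a ** (d + 1)) / (d + 1)   # integral of x^d over [a, b]
    print(d, abs(exact - simpson(lambda x: x ** d)))  # zero up to d = 3, nonzero at d = 4
```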



7.4 Gaussian quadrature

The idea of Gaussian quadrature is to choose not only the weights σk but also the nodes xk , in
order to achieve the highest possible degree of exactness.
Firstly, we will illustrate the brute force method of undetermined coefficients.
Example → Gaussian quadrature formula $G_1(f) = \sum_{k=0}^{1} \sigma_k f(x_k)$ on the interval $[-1, 1]$.
Here we have four unknowns $x_0$, $x_1$, $\sigma_0$ and $\sigma_1$, so we can impose four conditions:
$$G_1(1) = I(1) \implies \sigma_0 + \sigma_1 = \int_{-1}^{1} dx = 2,$$
$$G_1(x) = I(x) \implies \sigma_0 x_0 + \sigma_1 x_1 = \int_{-1}^{1} x\,dx = 0,$$
$$G_1(x^2) = I(x^2) \implies \sigma_0 x_0^2 + \sigma_1 x_1^2 = \int_{-1}^{1} x^2\,dx = \tfrac{2}{3},$$
$$G_1(x^3) = I(x^3) \implies \sigma_0 x_0^3 + \sigma_1 x_1^3 = \int_{-1}^{1} x^3\,dx = 0.$$

To solve this system, the symmetry suggests that $x_1 = -x_0$ and $\sigma_0 = \sigma_1$. This will automatically satisfy the equations for $x$ and $x^3$, leaving the two equations
$$2\sigma_0 = 2, \qquad 2\sigma_0 x_0^2 = \tfrac{2}{3},$$
so that $\sigma_0 = \sigma_1 = 1$ and $x_1 = -x_0 = 1/\sqrt{3}$. The resulting Gaussian quadrature formula is
$$G_1(f) = f\!\left(-\frac{1}{\sqrt{3}}\right) + f\!\left(\frac{1}{\sqrt{3}}\right).$$
This formula has degree of exactness 3.
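A quick check of this claim (our own sketch, assuming NumPy): apply $G_1$ to the monomials on $[-1, 1]$ and compare with the exact integrals.

```python
# Sketch (ours): G_1 integrates x^d exactly on [-1, 1] for d = 0, 1, 2, 3,
# but not for d = 4.
import numpy as np

x0 = 1 / np.sqrt(3)
G1 = lambda f: f(-x0) + f(x0)

for d in range(5):
    exact = (1 - (-1) ** (d + 1)) / (d + 1)       # integral of x^d over [-1, 1]
    print(d, abs(exact - G1(lambda x: x ** d)))   # zero for d <= 3, nonzero for d = 4
```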
In general, the Gaussian quadrature formula with n + 1 nodes will have degree of exactness 2n + 1.
The method of undetermined coefficients becomes unworkable for larger numbers of nodes,
because of the nonlinearity of the equations. A much more elegant method uses orthogonal
polynomials. In addition to what we learned before, we will need the following result.

Lemma 7.2. If {ϕ 0 , ϕ 1 , . . . , ϕn } is a set of orthogonal polynomials on [a, b] under the inner prod-
uct (6.16) and ϕk is of degree k for each k = 0, 1, . . . , n, then ϕk has k distinct real roots, and these
roots lie in the interval [a, b].

Proof. Let x 1 , . . . , x j be the points where ϕk (x ) changes sign in [a, b]. If j = k then we are done.
Otherwise, suppose j < k, and consider the polynomial

$$q_j(x) = (x - x_1)(x - x_2)\cdots(x - x_j). \qquad (7.10)$$

Since $q_j$ has lower degree than $\varphi_k$, they must be orthogonal, meaning
$$(q_j, \varphi_k) = 0 \implies \int_a^b q_j(x)\varphi_k(x)w(x)\,dx = 0. \qquad (7.11)$$



On the other hand, notice that the product $q_j(x)\varphi_k(x)$ cannot change sign in $[a, b]$, because each sign change in $\varphi_k(x)$ is cancelled out by one in $q_j(x)$. This means that
$$\int_a^b q_j(x)\varphi_k(x)w(x)\,dx \ne 0, \qquad (7.12)$$

which is a contradiction. □
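As an illustration of Lemma 7.2 (a sketch we have added, using NumPy's Legendre routines), the roots of the first few Legendre polynomials, which are orthogonal on $[-1, 1]$ with $w(x) = 1$, are indeed real and lie inside the interval:

```python
# Sketch (ours): roots of the Legendre polynomials P_1, ..., P_5 are real and
# lie strictly inside [-1, 1], as Lemma 7.2 predicts.
import numpy as np
from numpy.polynomial import legendre

for k in range(1, 6):
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                       # Legendre-series coefficients of P_k
    roots = legendre.legroots(coeffs)
    print(k, np.round(roots, 6), bool(np.all(np.abs(roots) < 1)))
```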

Remarkably, these roots are precisely the optimum choice of nodes for a quadrature formula
to approximate the (weighted) integral
$$I_w(f) = \int_a^b f(x)w(x)\,dx. \qquad (7.13)$$

Theorem 7.3 (Gaussian quadrature). Let ϕn+1 be a polynomial in Pn+1 that is orthogonal on
[a, b] to all polynomials in Pn , with respect to the weight function w (x ). If x 0 , x 1 , . . . , xn are the
roots of ϕn+1 , then the quadrature formula
$$G_{n,w}(f) := \sum_{k=0}^{n} \sigma_k f(x_k), \qquad \sigma_k = \int_a^b \ell_k(x)w(x)\,dx$$

approximates (7.13) with degree of exactness 2n + 1 (the largest possible).

I Like Newton-Cotes, we see that Gaussian quadrature is based on integrating an interpolating polynomial, but now the nodes are the roots of an orthogonal polynomial, rather than equally spaced points.

Example → Gaussian quadrature with n = 1 on [−1, 1] and w (x ) = 1 (again).


To find the nodes $x_0$, $x_1$, we need to find the roots of the orthogonal polynomial $\varphi_2(x)$. For this inner product, we already computed this (Legendre polynomial) in Chapter 6, where we found
$$\varphi_2(x) = x^2 - \tfrac{1}{3}.$$
Thus the nodes are $x_0 = -1/\sqrt{3}$, $x_1 = 1/\sqrt{3}$. Integrating the Lagrange polynomials gives the corresponding weights
$$\sigma_0 = \int_{-1}^{1} \ell_0(x)\,dx = \int_{-1}^{1} \frac{x - \tfrac{1}{\sqrt{3}}}{-\tfrac{2}{\sqrt{3}}}\,dx = \left[-\frac{\sqrt{3}}{4}x^2 + \frac{x}{2}\right]_{-1}^{1} = 1,$$
$$\sigma_1 = \int_{-1}^{1} \ell_1(x)\,dx = \int_{-1}^{1} \frac{x + \tfrac{1}{\sqrt{3}}}{\tfrac{2}{\sqrt{3}}}\,dx = \left[\frac{\sqrt{3}}{4}x^2 + \frac{x}{2}\right]_{-1}^{1} = 1,$$
as before.
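These nodes and weights can be cross-checked against NumPy's built-in Gauss-Legendre routine (a sketch, not part of the notes); the same routine also illustrates the facts used after Theorem 7.3 that the weights are positive and sum to $\int_{-1}^{1} w(x)\,dx = 2$:

```python
# Sketch (ours): NumPy's Gauss-Legendre routine reproduces the nodes -+1/sqrt(3)
# and weights 1, 1; for any number of points the weights are positive and sum to 2.
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(2)   # 2 points, i.e. n = 1
print(nodes, weights)                                  # [-0.5774  0.5774], [1. 1.]

for npts in range(1, 8):
    _, w = np.polynomial.legendre.leggauss(npts)
    print(npts, bool(np.all(w > 0)), w.sum())          # weights > 0, sum = 2
```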
I Using an appropriate weight function w (x ) can be useful for integrands with a singularity,
since we can incorporate this in w (x ) and still approximate the integral with Gn,w .
Example → Gaussian quadrature for $\int_0^1 \cos(x)\,x^{-1/2}\,dx$, with $n = 0$.
This is a Fresnel integral, with exact value 1.80905 . . . Let us compare the effect of using an
appropriate weight function.



1. Unweighted quadrature ($w(x) \equiv 1$). The orthogonal polynomial of degree 1 is
$$\varphi_1(x) = x - \frac{\int_0^1 x\,dx}{\int_0^1 dx} = x - \tfrac{1}{2} \implies x_0 = \tfrac{1}{2}.$$
The corresponding weight may be found by imposing $G_0(1) = I(1)$, which gives $\sigma_0 = \int_0^1 dx = 1$. Then our estimate is
$$G_0\!\left(\frac{\cos(x)}{\sqrt{x}}\right) = \frac{\cos\big(\tfrac{1}{2}\big)}{\sqrt{\tfrac{1}{2}}} = 1.2411\ldots$$

2. Weighted quadrature with $w(x) = x^{-1/2}$. This time we get
$$\varphi_1(x) = x - \frac{\int_0^1 x\cdot x^{-1/2}\,dx}{\int_0^1 x^{-1/2}\,dx} = x - \frac{2/3}{2} = x - \tfrac{1}{3} \implies x_0 = \tfrac{1}{3}.$$
The corresponding weight is $\sigma_0 = \int_0^1 x^{-1/2}\,dx = 2$, so the new estimate is the more accurate
$$G_{0,w}\big(\cos(x)\big) = 2\cos\big(\tfrac{1}{3}\big) = 1.8899\ldots$$
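The comparison can be reproduced numerically (our own sketch, assuming NumPy); the reference value is obtained by substituting $x = t^2$, which removes the singularity:

```python
# Sketch (ours): compare the two one-point estimates above. The reference value
# uses the substitution x = t^2, so that the singular integral becomes
# 2 * int_0^1 cos(t^2) dt, which is easy to evaluate accurately.
import numpy as np

t = np.linspace(0.0, 1.0, 100001)
vals = np.cos(t ** 2)
h = t[1] - t[0]
reference = 2 * h * (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1])   # ~1.80905

unweighted = np.cos(0.5) / np.sqrt(0.5)   # G_0 with w = 1:        ~1.2411
weighted = 2 * np.cos(1.0 / 3.0)          # G_0,w with w = x^-1/2: ~1.8899
print(reference, unweighted, weighted)
```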

Proof of Theorem 7.3. First, recall that any interpolatory quadrature formula based on n + 1
nodes will be exact for all polynomials in Pn (this follows from Theorem 7.1, which can be
modified to include the weight function w (x )). So in particular, Gn,w is exact for pn ∈ Pn .
Now let p2n+1 ∈ P2n+1 . The trick is to divide this by the orthogonal polynomial ϕn+1 whose
roots are the nodes. This gives

$$p_{2n+1}(x) = \varphi_{n+1}(x)q_n(x) + r_n(x) \quad\text{for some } q_n, r_n \in \mathbb{P}_n. \qquad (7.14)$$

Then
$$G_{n,w}(p_{2n+1}) = \sum_{k=0}^{n} \sigma_k p_{2n+1}(x_k) = \sum_{k=0}^{n} \sigma_k\big[\varphi_{n+1}(x_k)q_n(x_k) + r_n(x_k)\big] = \sum_{k=0}^{n} \sigma_k r_n(x_k) = I_w(r_n), \qquad (7.15)$$

where we have used the fact that Gn,w is exact for rn ∈ Pn . Now, since qn has lower degree
than $\varphi_{n+1}$, it must be orthogonal to $\varphi_{n+1}$, so
$$I_w(\varphi_{n+1}q_n) = \int_a^b \varphi_{n+1}(x)q_n(x)w(x)\,dx = 0 \qquad (7.16)$$

and hence
$$G_{n,w}(p_{2n+1}) = I_w(r_n) + 0 = I_w(r_n) + I_w(\varphi_{n+1}q_n) = I_w(\varphi_{n+1}q_n + r_n) = I_w(p_{2n+1}). \qquad (7.17)$$
So $G_{n,w}$ is exact for every $p_{2n+1} \in \mathbb{P}_{2n+1}$. The degree of exactness cannot exceed $2n + 1$, since $\varphi_{n+1}^2 \in \mathbb{P}_{2n+2}$ satisfies $I_w(\varphi_{n+1}^2) > 0$ but $G_{n,w}(\varphi_{n+1}^2) = 0$. □



I Unlike Newton-Cotes formulae with equally-spaced points, it can be shown that $G_{n,w}(f) \to I_w(f)$ as $n \to \infty$, for any continuous function $f$. This follows (with a bit of analysis) from the fact that all of the weights $\sigma_k$ are positive, along with the fact that they sum to a fixed number $\int_a^b w(x)\,dx$. For Newton-Cotes, the signed weights still sum to a fixed number, but $\sum_{k=0}^{n} |\sigma_k| \to \infty$, which destroys convergence.
Not surprisingly, we can derive an error formula that depends on f (2n+2) (ξ ) for some ξ ∈ (a, b).
To do this, we will need the following result from calculus.
Theorem 7.4 (Mean value theorem for integrals). If $f, g$ are continuous on $[a, b]$ and $g(x) \ge 0$ for all $x \in [a, b]$, then there exists $\xi \in (a, b)$ such that
$$\int_a^b f(x)g(x)\,dx = f(\xi)\int_a^b g(x)\,dx.$$

Proof. Let $m$ and $M$ be the minimum and maximum values of $f$ on $[a, b]$, respectively. Since $g(x) \ge 0$, we have that
$$m\int_a^b g(x)\,dx \le \int_a^b f(x)g(x)\,dx \le M\int_a^b g(x)\,dx. \qquad (7.18)$$
Now let $I = \int_a^b g(x)\,dx$. If $I = 0$ then $g(x) \equiv 0$, so $\int_a^b f(x)g(x)\,dx = 0$ and the theorem holds for every $\xi \in (a, b)$. Otherwise, we have
$$m \le \frac{1}{I}\int_a^b f(x)g(x)\,dx \le M. \qquad (7.19)$$
By the Intermediate Value Theorem (Theorem 4.1), $f(x)$ attains every value between $m$ and $M$ somewhere in $(a, b)$, so in particular there exists $\xi \in (a, b)$ with
$$f(\xi) = \frac{1}{I}\int_a^b f(x)g(x)\,dx. \qquad (7.20)$$
□
Theorem 7.5 (Error estimate for Gaussian quadrature). Let ϕn+1 ∈ Pn+1 be monic and orthogo-
nal on [a, b] to all polynomials in Pn , with respect to the weight function w (x ). Let x 0 , x 1 , . . . , xn
be the roots of ϕn+1 , and let Gn,w ( f ) be the Gaussian quadrature formula defined by Theorem 7.3.
If f has 2n + 2 continuous derivatives on (a, b), then there exists ξ ∈ (a, b) such that

$$I_w(f) - G_{n,w}(f) = \frac{f^{(2n+2)}(\xi)}{(2n+2)!}\int_a^b \varphi_{n+1}^2(x)w(x)\,dx.$$

Proof. A neat trick is to use Hermite interpolation. Since the xk are distinct, there exists a
unique polynomial p2n+1 such that

$$p_{2n+1}(x_k) = f(x_k) \quad\text{and}\quad p_{2n+1}'(x_k) = f'(x_k) \quad\text{for } k = 0, \ldots, n. \qquad (7.21)$$
In addition (see problem sheet), there exists $\lambda \in (a, b)$, depending on $x$, such that
$$f(x) - p_{2n+1}(x) = \frac{f^{(2n+2)}(\lambda)}{(2n+2)!}\prod_{i=0}^{n}(x - x_i)^2. \qquad (7.22)$$



Now we know that (x − x 0 )(x − x 1 ) · · · (x − xn ) = ϕn+1 (x ), since we fixed ϕn+1 to be monic.
Hence
$$\int_a^b f(x)w(x)\,dx - \int_a^b p_{2n+1}(x)w(x)\,dx = \int_a^b \frac{f^{(2n+2)}(\lambda)}{(2n+2)!}\,\varphi_{n+1}^2(x)w(x)\,dx. \qquad (7.23)$$

Now we know that $G_{n,w}$ must be exact for $p_{2n+1}$, so
$$\int_a^b p_{2n+1}(x)w(x)\,dx = G_{n,w}(p_{2n+1}) = \sum_{k=0}^{n} \sigma_k p_{2n+1}(x_k) = \sum_{k=0}^{n} \sigma_k f(x_k) = G_{n,w}(f). \qquad (7.24)$$

For the right-hand side, we can’t take $f^{(2n+2)}(\lambda)$ outside the integral since $\lambda$ depends on $x$. But $\varphi_{n+1}^2(x)w(x) \ge 0$ on $[a, b]$, so we can apply the mean value theorem for integrals and get
$$I_w(f) - G_{n,w}(f) = \frac{f^{(2n+2)}(\xi)}{(2n+2)!}\int_a^b \varphi_{n+1}^2(x)w(x)\,dx \qquad (7.25)$$
for some $\xi \in (a, b)$ that does not depend on $x$. □
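As an illustration of Theorem 7.5 (our own sketch, assuming NumPy), the error of the $(n+1)$-point Gauss-Legendre rule for the smooth integrand $f(x) = e^x$ on $[-1, 1]$ collapses very rapidly as $n$ grows, consistent with the $f^{(2n+2)}(\xi)/(2n+2)!$ factor:

```python
# Sketch (ours): Gauss-Legendre error for f(x) = e^x on [-1, 1] shrinks extremely
# fast as the number of points increases, as Theorem 7.5 suggests.
import numpy as np

exact = np.exp(1) - np.exp(-1)
for npts in range(1, 6):                              # npts = n + 1 points
    x, w = np.polynomial.legendre.leggauss(npts)
    print(npts, abs(exact - np.dot(w, np.exp(x))))
```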

