
Numerical Mathematics

Numerical Linear Algebra


Pieter Collins

Department of Knowledge Engineering


Maastricht University
[email protected]

KEN1540, Block 5, April-May 2021

Algebraic Methods 2
Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
LU Factorisation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Sparsity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Norms and Conditioning 19


Iterative refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Iterative Methods 24
Iterative method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Jacobi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Gauss-Seidel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Over-Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Conjugate-Gradient 36
Conjugate-Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Eigenvalues 42
Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Power method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Inverse power method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Deflation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Orthogonalisation 52
Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Orthogonalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

The QR Method 58
QR method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Householder matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Givens matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Eigenvalue condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Algebraic Methods 2 / 76

Matrix form

System of linear algebraic equations:


3x1 − 7x2 − 2x3 + 2x4 = −9
−3x1 + 5x2 + x3 = 5;
6x1 − 4x2 + 2x3 − 5x4 = 7;
−9x1 + 5x2 − 5x3 + 6x4 = −19.
Write using matrix form Ax = b where
$$A = \begin{pmatrix} 3 & -7 & -2 & 2 \\ -3 & 5 & 1 & 0 \\ 6 & -4 & 2 & -5 \\ -9 & 5 & -5 & 6 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \qquad b = \begin{pmatrix} -9 \\ 5 \\ 7 \\ -19 \end{pmatrix}.$$

3 / 76

Gaussian Elimination

Solve by Gaussian elimination:
$$\left(\begin{array}{rrrr|r} 3 & -7 & -2 & 2 & -9 \\ -3 & 5 & 1 & 0 & 5 \\ 6 & -4 & 2 & -5 & 7 \\ -9 & 5 & -5 & 6 & -19 \end{array}\right)
\begin{array}{l} r_1 \\ r_2 - (-1)r_1 \\ r_3 - (+2)r_1 \\ r_4 - (-3)r_1 \end{array}
\;\sim\;
\left(\begin{array}{rrrr|r} 3 & -7 & -2 & 2 & -9 \\ 0 & -2 & -1 & 2 & -4 \\ 0 & 10 & 6 & -9 & 25 \\ 0 & -16 & -11 & 12 & -46 \end{array}\right)$$
$$\begin{array}{l} r_1 \\ r_2 \\ r_3 - (-5)r_2 \\ r_4 - (+8)r_2 \end{array}
\;\sim\;
\left(\begin{array}{rrrr|r} 3 & -7 & -2 & 2 & -9 \\ 0 & -2 & -1 & 2 & -4 \\ 0 & 0 & 1 & 1 & 5 \\ 0 & 0 & -3 & -4 & -14 \end{array}\right)$$
$$\begin{array}{l} r_1 \\ r_2 \\ r_3 \\ r_4 - (-3)r_3 \end{array}
\;\sim\;
\left(\begin{array}{rrrr|r} 3 & -7 & -2 & 2 & -9 \\ 0 & -2 & -1 & 2 & -4 \\ 0 & 0 & 1 & 1 & 5 \\ 0 & 0 & 0 & -1 & 1 \end{array}\right)$$
Solve by backsubstitution x4 = −1, x3 = 6, x2 = −2, x1 = −3.

4 / 76

LU Factorisation

Define L by letting Lij be the multiple of row j subtracted from row i, and U the tableau matrix before
backsubstitution.
   
$$L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 2 & -5 & 1 & 0 \\ -3 & 8 & -3 & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 3 & -7 & -2 & 2 \\ 0 & -2 & -1 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & -1 \end{pmatrix}$$
Notice that LU = A!
$$LU = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 2 & -5 & 1 & 0 \\ -3 & 8 & -3 & 1 \end{pmatrix} \begin{pmatrix} 3 & -7 & -2 & 2 \\ 0 & -2 & -1 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 3 & -7 & -2 & 2 \\ -3 & 5 & 1 & 0 \\ 6 & -4 & 2 & -5 \\ -9 & 5 & -5 & 6 \end{pmatrix} = A.$$

Solving Ly = b gives y = (−9, −4, 5, 1)T . Notice that this y is the right-hand column of the tableau before
backsubstitution.
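As an illustration, a minimal Matlab/Octave sketch of the elimination above, storing the multipliers in L and the reduced tableau in U (no pivoting; a sketch rather than the course's reference code):

% LU factorisation without pivoting: L holds the multipliers, U the reduced matrix.
A = [3 -7 -2 2; -3 5 1 0; 6 -4 2 -5; -9 5 -5 6];
n = size(A,1);
L = eye(n); U = A;
for k = 1:n-1
  for i = k+1:n
    L(i,k) = U(i,k) / U(k,k);        % multiple of row k subtracted from row i
    U(i,:) = U(i,:) - L(i,k)*U(k,:);
  end
end
disp(L*U - A);                        % should be (numerically) zero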

5 / 76

LU Factorisation

Factorization A = LU where L is unit lower triangular lii = 1, lij = 0 for j > i and U is upper triangular
uij = 0 for i > j .

Backsubstitution The linear system LU x = b can be solved by computing Ly = b and U x = y . Since L


and U are triangular, the linear systems can be easily solved by backsubstitution
$$y_i = b_i - \sum_{j<i} l_{ij}\, y_j; \qquad x_i = \Big(y_i - \sum_{j>i} u_{ij}\, x_j\Big)\big/u_{ii}.$$
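A minimal Matlab/Octave sketch of the two triangular solves, assuming L (unit lower triangular), U and b are already available, e.g. from the factorisation above:

% Forward substitution Ly = b, then backsubstitution Ux = y.
n = length(b);
y = zeros(n,1);
for i = 1:n
  y(i) = b(i) - L(i,1:i-1)*y(1:i-1);
end
x = zeros(n,1);
for i = n:-1:1
  x(i) = (y(i) - U(i,i+1:n)*x(i+1:n)) / U(i,i);
end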

6 / 76

LU Factorisation—Complexity

Complexity Computing the LU -factorisation:


n(n − 1) + (n − 1)(n − 2) + · · · + 3 · 2 + 2 · 1 = (n3 − n)/3 ∼ (n3 /3) = O(n3 ).
Solving the system Ly = b requires
(n − 1) + (n − 2) + · · · + 2 + 1 = (n2 − n)/2
operations, and solving U x = y requires
n + (n − 1) + · · · + 2 + 1 = (n2 + n)/2
operations, so computing x from L, U, b requires n2 operations.
Hence for large systems, most of the work in solving Ax = b lies in computing the LU -factorisation.
Once the LU -factorisation has been computed for a given A, the systems Axi = bi can easily be solved for
different vectors bi .

7 / 76

LU Factorisation—Pivoting

Let $A^{(k-1)}$ be the matrix obtained after working on the first k − 1 columns.

Pivoting If $A^{(k-1)}_{kk}$ is equal to zero, then Gaussian elimination needs to swap rows.
Obtain a factorisation P A = LU , where P is a permutation matrix.

Partial pivoting Make the pivot element $A^{(k-1)}_{kk}$ the largest of $|A^{(k-1)}_{jk}|$ for j ≥ k (by swapping rows).

Scaled partial pivoting Define scale factors $s_k = \max_{1\le j\le n} |a_{kj}|$.

Choose the first pivot row to maximise $|a_{k1}|/s_k = |a_{k1}|/\max_{1\le j\le n} |a_{kj}|$.

8 / 76

LU Factorisation—Pivoting Example (Omit)

Example Compute the LU-factorisation of A using partial pivoting and solve Ax = b where
   
2.11 −4.21 0.921 2.01
A = 1.09 0.987 0.832  , b =  4.21  .
4.01 10.2 −1.12 −3.09

Swap row 1 and row 3 and eliminate x1 :


    
0 0 1 2.11 −4.21 0.921 4.01 10.2 −1.12
T 1 A = 0 1 0 1.09 0.987 0.832  = 1.09 0.987 0.832 
1 0 0 4.01 10.2 −1.12 2.11 −4.21 0.921
  
1 0 0 4.01 10.2 −1.12
= 0.272 1 0  0 −1.79 1.14  = L1 U1
0.526 0 1 0 −9.58 1.51

9 / 76

LU Factorisation—Pivoting Example (Omit)


Example T1 A = L1 U1 given by
     
0 0 1 1.09 0.987 0.832 1 0 0 4.01 10.2 −1.12
0 1 0 2.11 −4.21 0.921  = 0.272 1 0  0 −1.79 1.14 
1 0 0 4.01 10.2 −1.12 0.526 0 1 0 −9.58 1.51

Swap row 2 and row 3, so T2 T1 A = (T2 L1 T2−1 )(T2 U1 ), and eliminate x2 :


   
1 0 0 0 0 1 2.11 −4.21 0.921
T 2 T 1 A = 0 0 1 0 1 0 1.09 0.987 0.832 
0 1 0 1 0 0 4.01 10.2 −1.12
  
1 0 0 4.01 10.2 −1.12
= 0.526 1 0  0 −9.58 1.51 
0.272 0 1 0 −1.79 1.14
   
1 0 0 1 0 0 4.01 10.2 −1.12
= 0.526 1 0 0 1 0  0 −9.58 1.51 
0.272 0 1 0 0.186 1 0 0 0.855
= (T2 L1 T2−1 ) L2 U2

10 / 76

LU Factorisation—Pivoting Example (Omit)
Example Obtain P A = LU with
    
0 0 1 2.11 −4.21 0.921 4.01 10.2 −1.12
P A = 1 0 0 1.09 0.987 0.832  = 2.11 −4.21 0.921 
0 1 0 4.01 10.2 −1.12 1.09 0.987 0.832
  
1 0 0 4.01 10.2 −1.12
= 0.526 1 0  0 −9.58 1.51  = LU
0.272 0.186 1 0 0 0.855

Solve Ax = b using P Ax = LU x = P b, so U x = y where Ly = P b.


     
−3.09 −3.09 −0.428
P b =  2.01 ; y = L\P b =  3.64 ; x = U \y =  0.427 .
4.21 4.37 5.11

11 / 76

LU Factorisation—Pivoting Example (Omit)

Example Compute y by backsubstitution:


    
1 0 0 y1 −3.09
Ly = 0.526 1 0 y2  =  2.01  = P b
0.272 0.186 1 y3 4.21
y1 = −3.09;
0.526y1 + y2 = 2.01 =⇒
y2 = 2.01 − 0.526y1 = 2.01 − 0.526×(−0.309)
= 3.64
0.272y1 + 0.186y2 + y3 = 4.21 =⇒
y3 = 4.21 − 0.272y1 − 0.186y2 = 4.21 − 0.272×(−0.309) − 0.186×3.64
= 4.37

12 / 76

LU Factorisation—Pivoting Example (Omit)

Example Compute x by backsubstitution:


    
4.01 10.2 −1.12 x1 −3.09
Ux =  0 −9.58 1.51  x2  =  3.64  = y
0 0 0.855 x3 4.37
0.855x3 = 4.37 =⇒
x3 = 4.37 ÷ 0.855
= 5.11
− 9.58x2 + 1.51x3 = 3.64 =⇒
x2 = (3.64 − 1.51x3 ) ÷ (−9.58) = (3.64 − 1.51×5.11) ÷ (−9.58)
= 0.427
4.01x1 + 10.2x2 − 1.12x3 = −3.09 =⇒

x1 = −3.09 − 10.2x2 − (−1.12)x3 ÷ 4.01

= −3.09 − 10.2×0.427 + 1.12×5.11 ÷ 4.01
= −0.428

13 / 76

Sparsity

Sparsity A matrix is sparse if it has many zero elements.

Fill-in The inverse of a sparse matrix is usually dense.

Tridiagonal matrices A is tridiagonal if aij = 0 for |i − j| > 1.


LU-factorisation preserves the zeros for a banded matrix!

Complexity Computing A−1 b if A is tridiagonal requires n2 operations; solving LU x = b requires ∼ 3n


operations!

14 / 76

Sparsity
Example For the given tridiagonal matrix, the inverse is
 −1  
4 1 0 0 0 0.268 −0.072 0.019 −0.005 0.001
1 4 1 0 0 −0.072 0.287 −0.077 0.021 −0.005
   
0 1 4 1 0 =  0.019 −0.077 0.288 −0.077 0.019 
   
0 0 1 4 1 −0.005 0.021 −0.077 0.287 −0.072
0 0 0 1 4 0.001 −0.005 0.019 −0.072 0.268

and the LU-factorisation is


  
1 0 0 0 0 4.000 1 0 0 0
0.250 1 0 0 0 0 3.750 1 0 0 

 
 0 0.267 1 0 0  0 0 3.733 1 0 
  
 0 0 0.268 1 0  0 0 0 3.732 1 
0 0 0 0.268 1 0 0 0 0 3.732

Clearly, solving LU x = b by backsubstitution is faster than computing A−1 b.

15 / 76

Sparsity
Example For the LU-factorisation
  
1 0 0 0 0 4.000 1 0 0 0
0.250 1 0 0 0
 0 3.750 1 0 0 

 
 0
LU =  0.267 1 0 0  0 0 3.733 1 0 
 

 0 0 0.268 1 0  0 0 0 3.732 1 
0 0 0 0.268 1 0 0 0 0 3.732

we have (LU )−1 = U −1 L−1 given by


  
0.250 −0.067 0.018 −0.005 0.001 1 0 0 0 0
 0 0.267 −0.071 0.019 −0.005 −0.250 1 0 0 0
  
 0 0 0.268 −0.072 0.019   0.067 −0.267 1 0 0
  
 0 0 0 0.268 −0.072 −0.018 0.071 −0.268 1 0
0 0 0 0 0.268 0.005 −0.019 0.072 −0.268 1

Clearly, solving LU x = b by backsubstitution is faster than computing L−1 , U −1 and x = U −1 (L−1 b).

16 / 76

Symmetric matrices

Symmetric matrices A symmetric matrix (A = AT ) can be factorised A = LDLT where L is


unit-lower-triangular and D is diagonal.

Positive-definite matrices A matrix is positive definite if xT Ax > 0 whenever x ≠ 0; equivalently, if all


eigenvalues are positive, or if A = LDLT with D having strictly-positive diagonal elements.

Cholesky factorisation If A is positive definite, then there is an upper-triangular matrix U such that
A = U T U (or a lower-triangular matrix L such that A = LLT .)
17 / 76

Symmetric matrices—Example (Omit)


Example Compute the LDLT factorisation:
       
$$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 1 & 0 \\ 0 & 3/2 & 1 \\ 0 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ 0 & 2/3 & 1 \end{pmatrix}\begin{pmatrix} 2 & 1 & 0 \\ 0 & 3/2 & 1 \\ 0 & 0 & 4/3 \end{pmatrix}$$
$$= \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ 0 & 2/3 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 & 0 \\ 0 & 3/2 & 0 \\ 0 & 0 & 4/3 \end{pmatrix}\begin{pmatrix} 1 & 1/2 & 0 \\ 0 & 1 & 2/3 \\ 0 & 0 & 1 \end{pmatrix} = LDL^T$$

The Cholesky factorisation is given by A = U T U where


√   p p 
2 p0 0 1 1/2 0 2/1 p1/2 p0
U = D1/2 LT =  0 3/2 p0  0 1 2/3 =  0 3/2 p2/3
0 0 4/3 0 0 1 0 0 4/3

18 / 76

Norms and Conditioning 19 / 76

Vector and matrix norms

Properties A norm is a measure of the magnitude of a vector (or matrix).


A vector norm $\|\cdot\|$ is a function $\mathbb{R}^n \to \mathbb{R}$ which satisfies
$\|v\| \ge 0$, with $\|v\| = 0 \iff v = 0$;
$\|\alpha v\| = |\alpha| \cdot \|v\|$; and $\|u + v\| \le \|u\| + \|v\|$.
A matrix norm additionally satisfies
$\|AB\| \le \|A\| \cdot \|B\|$.
Given a vector norm $\|\cdot\|_*$, the corresponding matrix norm is
$\|A\|_* = \max\{\|Ax\|_* \mid \|x\|_* = 1\}$
and satisfies
$\|Ax\|_* \le \|A\|_* \times \|x\|_*$.
20 / 76

Vector and matrix norms

p-norms Important vector norms are


$\|v\|_p := \big(\sum_{i=1}^n |v_i|^p\big)^{1/p}$;
$\|v\|_\infty := \lim_{p\to\infty} \|v\|_p$.
Note
$\|v\|_1 = \sum_{i=1}^n |v_i|$;
$\|v\|_2 = \sqrt{\sum_{i=1}^n v_i^2} = \sqrt{v \cdot v}$;
$\|v\|_\infty = \max_{i=1,\dots,n} |v_i|$.
The two-norm $\|v\|_2$ gives the Euclidean length of v. The uniform norm $\|v\|_\infty$ gives the maximum absolute value
of the components, and is usually easiest to compute.
The corresponding matrix norms are
$\|A\|_1 = \max_{j=1,\dots,n} \sum_{i=1}^m |a_{ij}|$ (maximum column sum);
$\|A\|_2 = \max(\mathrm{eig}(A^T A))^{1/2}$;
$\|A\|_\infty = \max_{i=1,\dots,m} \sum_{j=1}^n |a_{ij}|$ (maximum row sum).

21 / 76

Condition number

Conditioning For a given vector norm k · k and corresponding matrix norm, define the matrix condition
number K(A) := ||A|| × ||A−1 || .
Suppose x̃ is an approximate solution to Ax = b. Then
$$\|\tilde x - x\| = \|A^{-1}(A\tilde x - Ax)\| \le \|A^{-1}\|\, \|A\tilde x - b\|,$$
so the error satisfies
$$\|\tilde x - x\| \le K(A)\, \frac{\|A\tilde x - b\|}{\|A\|}.$$
Further, since $\|A\| \cdot \|x\| \ge \|Ax\| = \|b\|$, we have $1/\|x\| \le \|A\|/\|b\|$, so the relative error satisfies
$$\frac{\|\tilde x - x\|}{\|x\|} \le K(A)\, \frac{\|A\tilde x - b\|}{\|b\|}.$$

22 / 76

Iterative refinement

Approximate solution Suppose x̃ is an approximate solution to Ax = b.

Refinement Then
A(x − x̃) = Ax − Ax̃ = b − Ax̃ ≈ 0,
so
x = x̃ + A−1 (b − Ax̃).

Accuracy If A−1 b can be computed less accurately than Ax, refinement typically improves the accuracy of a
solution.
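A minimal Matlab/Octave sketch of one refinement step; here xt denotes the approximate solution x̃, and L, U an already-computed LU factorisation of A (all assumed given; ideally the residual is evaluated in higher precision):

% One step of iterative refinement using the existing factorisation A = L*U.
r = b - A*xt;          % residual of the approximate solution
d = U \ (L \ r);       % correction, via the triangular factors
xt = xt + d;           % refined solution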

23 / 76

Iterative Methods 24 / 76

General iterative method

Fixed-point For the linear system Ax = b, write A = D + E , where D is “easy” to invert.


Then Ax = Dx + Ex = b, so Dx = b − Ex and
x = D −1 (b − Ex).
Alternatively, we can write
x = x − D −1 (Ax − b).

Update Attempt to improve x using the update


x′ = D −1 (b − Ex) = x − D −1 (Ax − b).

Iteration Use this as a basis for an iterative method


x(n+1) = D −1 (b − Ex(n) ) = x(n) − D −1 (Ax(n) − b).
25 / 76

Jacobi method
Fixed-point formula We have $\sum_{j=1}^n a_{ij} x_j = b_i$ for i = 1, . . . , n.
Write $a_{ii} x_i + \sum_{j=1, j\ne i}^{n} a_{ij} x_j = b_i$. Rearranging gives
$$x_i = \frac{b_i - \sum_{j\ne i} a_{ij} x_j}{a_{ii}}.$$
We can use this as the basis for an iterative method.

Jacobi Method Iterate using


$$x'_i = \frac{b_i - \sum_{j\ne i} a_{ij} x_j}{a_{ii}}.$$
An alternative formula is
$$x'_i = x_i - \frac{\sum_{j} a_{ij} x_j - b_i}{a_{ii}}.$$
In matrix form
$$x' = D^{-1}(b - Ex) = x - D^{-1}(Ax - b)$$
where D is the diagonal of A and E = A − D.
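A minimal Matlab/Octave sketch of this matrix-form iteration; the stopping parameters maxit and tol are hypothetical:

% Jacobi iteration x' = x - D^{-1}(Ax - b), with D the diagonal of A.
D = diag(diag(A));
x = zeros(size(b));
for k = 1:maxit
  xnew = x - D \ (A*x - b);
  if norm(xnew - x, inf) < tol, x = xnew; break; end
  x = xnew;
end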

26 / 76

Jacobi method

Example Solve Ax = b using the Jacobi method starting at x(0) = 0 for


   
6 2 0 3
A =  3 5 −1 , b = 4 .
−2 1 4 1
(1) (0) (0)
x1 = (b1 − a12 x2 − a13 x3 )/a11 = (3 − 2×0.0 − 0×0.0)/6 = 0.500;
(1) (0) (0)
x2 = (b2 − a21 x1 − a23 x3 )/a22 = (4 − 3×0.0 − (−1)×0.0)/5 = 0.800;
(1) (0) (0)
x3 = (b3 − a31 x1 − a32 x2 )/a33 = (1 − (−2)×0.0 − 1×0.0)/4 = 0.250.
(2) (1) (1)
x1 = (b1 − a12 x2 − a13 x3 )/a11 = (3 − 2×0.800 − 0×0.250)/6 = 0.233;
(2) (1) (1)
x2 = (b2 − a21 x1 − a23 x3 )/a22 = (4 − 3×0.500 + 1×0.250)/5 = 0.550;
(2) (1) (1)
x3 = (b3 − a31 x1 − a32 x2 )/a33 = (1 + 2×0.500 − 1×0.800)/4 = 0.300.
Continuing yields      
0.3167 0.2600 0.2814
x(3) = 0.7200 , x(4) = 0.6558 , x(5) = 0.6897 .
0.2292 0.2283 0.2160

27 / 76

Gauss-Seidel method

Gauss-Seidel Method Rather than update all the xi simultaneously, we can update in turn. Then
$$x'_i = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x'_j - \sum_{j=i+1}^{n} a_{ij} x_j}{a_{ii}}.$$
In other words,
for i = 1, . . . , n, set $x_i = \big(b_i - \sum_{j\ne i} a_{ij} x_j\big)/a_{ii} = x_i - \big(\sum_{j=1}^n a_{ij} x_j - b_i\big)/a_{ii}$.
This is more easily implemented in code:
for i=1:n, x(i)=x(i)-(A(i,:)*x-b(i))/A(i,i); end;
or using an explicit loop:
for i=1:n,
  ri=-b(i);
  for j=1:n, ri=ri+A(i,j)*x(j); end;
  x(i)=x(i)-ri/A(i,i);
end;
28 / 76

Gauss-Seidel method

Example Solve Ax = b using the Gauss-Seidel method for:


     
6 2 0 3 0
A =  3 5 −1, b = 4, x(0) = 0
−2 1 4 1 0
(1) (0) (0)
x1 = (b1 − a12 x2 − a13 x3 )/a11 = (3 − 2×0.0000 − 0×0.0000)/6 = 0.5000;
(1) (1) (0)
x2 = (b2 − a21 x1 − a23 x3 )/a22 = (4 − 3×0.5000 − (−1)×0.0000)/5 = 0.5000;
(1) (1) (1)
x3 = (b3 − a31 x1 − a32 x2 )/a33 = (1 − (−2)×0.5000 − 1×0.5000)/4 = 0.3750.
(2) (1) (1)
x1 = (b1 − a12 x2 − a13 x3 )/a11 = (3 − 2×0.5000 − 0×0.3750)/6 = 0.3333;
(2) (2) (1)
x2 = (b2 − a21 x1 − a23 x3 )/a22 = (4 − 3×0.3333 + 1×0.3750)/5 = 0.6750;
(2) (2) (2)
x3 = (b3 − a31 x1 − a32 x2 )/a33 = (1 + 2×0.3333 − 1×0.6750)/4 = 0.2479.
       
0.50000 0.33333 0.27500 0.27181
x(1) =0.50000, x(2) =0.67500, x(3) =0.68458, x(4) =0.68019,
0.37500 0.2479 0.21635 0.21568
Convergence is rapid.

29 / 76

Gauss-Seidel method

Example For
     
3 −7 −2 2 −9 0
−3 5 1 0  5  0
A=
 6 −4 2 −5,
 b=
 7 ,
 x(0) =
0

−9 5 −5 6 −19 0
the Gauss-Seidel method gives
       
−3.000 6.778 −11.944 −4.857
 1.000  (2) −2.500 (3)  6.950  (4)  −6.925 
x(1) =
 3.500 , x = 3.083 , x =−30.958, x = 119.365 
      

−3.167 −2.417 14.069 −66.743


so does not converge!

30 / 76

Gauss-Seidel method

Matrix form Write A = L + D + U , where L is strictly lower-triangular, D is diagonal and U is strictly upper
triangular.
i.e. Li,j = 0 if i ≤ j , Di,j = 0 if i ≠ j , Ui,j = 0 if i ≥ j .
Note that L, U here are not the L and U of the LU factorisation!!
Then the Gauss-Seidel method is given by x′ = D −1 (b − Lx′ − U x), and rearranging gives
x′ = (L + D)−1 (b − U x)
= x − (L + D)−1 (Ax − b).

31 / 76

Convergence

Theorem An iterative method x′ = T x + r converges if ||T || < 1 for some matrix norm || · ||.
Definition A matrix A is diagonally-dominant if $|a_{ii}| > \sum_{j\ne i} |a_{ij}|$ for all i.

Theorem (Convergence of Jacobi / Gauss-Seidel) If A is diagonally-dominant, then the Jacobi and


Gauss-Seidel iterations converge to the solution of Ax = b.

Preconditioning Can precondition A by multiplying by a matrix P to obtain (P A)x = P b.

Approximate inverse If J ≈ I, then J is diagonally-dominant; hence precondition by P ≈ A−1 to obtain
J = P A ≈ I.
32 / 76

Preconditioning

Example Let
    
3 −7 −2 2 −9 0 0 −1 −1
−3 5 1 0  5  1 2 0 0
A=
 6 −4 2 −5,
 b=
 7 ,
 P =
0
.
0 1 0
−9 5 −5 6 −19 1 0 0 0
Then    
3 −1 3 −1 12
−3 3 0 2 1
 6 −4 2 −5 ,
PA =  
 7 .
Pb =  

3 −7 −2 2 −9
Applying the Gauss-Seidel method to (P A)x = (P b) gives iterates:
         
4.0 6.9 1.7 −1.69 −2.63
4.3 (2) 4.0 (4) 1.0 (8) −1.15 (12) −1.76
x(1) =
0.2, x =2.9, x =4.2, x = 5.50 , x
       =
 5.86 .

4.8 2.1 0.8 −0.40 −0.85


Method converges slowly to x(∞) = (−3, −2, 6, −1)T .

33 / 76

Successive Over-Relaxation Method

Successive Over-Relaxation Use slightly larger update in the Gauss-Seidel method:



$$x'_i = x_i + \omega\,\frac{b_i - \sum_{j<i} a_{ij} x'_j - \sum_{j\ge i} a_{ij} x_j}{a_{ii}} = (1-\omega)\,x_i + \omega\,\frac{b_i - \sum_{j<i} a_{ij} x'_j - \sum_{j>i} a_{ij} x_j}{a_{ii}},$$
with ω ≳ 1.

Typically ω is taken to be in the range 1.1 ≤ ω ≤ 1.3.

Implement in Matlab as:


for i=1:n, x(i)=x(i)-omega*(A(i,:)*x-b(i))/A(i,i); end;

Explicitly in matrix form,


x′ = x − ω (ωL + D)−1 (Ax − b).
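A sketch of a complete solver loop built around the one-line sweep above; omega, maxit and tol are assumed to be given:

% SOR: repeat the relaxed Gauss-Seidel sweep until the residual is small.
x = zeros(size(b));
for k = 1:maxit
  for i = 1:length(b)
    x(i) = x(i) - omega*(A(i,:)*x - b(i))/A(i,i);
  end
  if norm(A*x - b, inf) < tol, break; end
end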
34 / 76

Successive Over-Relaxation Method
Example Solve Ax = b using the successive over-relaxation method with
   
6 2 0 3
A= 3 5 −1 , b = 4 , x(0) = 0, ω = 1.1.
−2 1 4 1

First step gives x(1) = (0.5500, 0.5170, 0.4353)T . Second step:


(2) (1) (1) (1) (1)
x1 = x1 − ω(a11 x1 + a12 x2 + a13 x3 − b1 )/a11
= 0.5500 − 1.1 × (6×0.5500 + 2×0.5170 + 0×0.4353 − 3)/6 = 0.3054;
(2) (2) (1) (1)
x2 = (a21 x1 + a22 x2 + a23 x3 − b2 )/a22
= 0.5170 − 1.1 × (3×0.3054 + 5×0.5170 − 1×0.4353 − 4)/5 = 0.7225;
(2) (2) (2) (1)
x3 = (a31 x1 + a32 x2 + a33 x3 − b3 )/a33
= 0.4353 − 1.1 × (−2×0.3054 − 1×0.7225 + 4×0.4353 − 1)/4 = 0.2008.

Further iterates give:


       
0.25455 0.27377 0.27460 0.27358
(3) (4) (5) (∞)
x = 0.68392, x = 0.67642 , x = 0.67927 , x = 0.67925 .
0.20684 0.21888 0.21734 0.21698

35 / 76

Conjugate-Gradient 36 / 76

Conjugate-Gradient Method

Method for solving Ax = b where A is positive-definite.

Idea Construct sequence xk minimising residual ||Axk − b||2 in span{b, Ab, . . . , Ak−1 b}.
Use inner products
$$\langle u, S, v\rangle = \sum_{i,j=1}^{n} u_i S_{ij} v_j \qquad\text{and}\qquad \langle u, v\rangle = \langle u, I, v\rangle = \sum_{i} u_i v_i.$$
Algorithm
Initialise $x_0 = 0$, $r_0 = b - Ax_0$, $v_1 = r_0$;
Iterate for k = 1, 2, . . .
  $s_k = \langle r_{k-1}, r_{k-1}\rangle / \langle r_{k-2}, r_{k-2}\rangle$ ($s_1$ unused);
  $v_k = r_{k-1} + s_k v_{k-1}$, with $v_1 = r_0$;
  $t_k = \langle r_{k-1}, r_{k-1}\rangle / \langle v_k, A, v_k\rangle$;
  $x_k = x_{k-1} + t_k v_k$;
  $r_k = r_{k-1} - t_k\, A v_k$.
Note $r_k = b - Ax_k$ and $\langle v_i, A, v_j\rangle = 0$ for i ≠ j.
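A minimal Matlab/Octave sketch of the algorithm above for symmetric positive-definite A (at most n steps; the tolerance 1e-12 is an arbitrary choice):

% Conjugate-gradient method for Ax = b with A symmetric positive definite.
n = length(b);
x = zeros(n,1);  r = b - A*x;  v = r;
rho = r'*r;                          % <r_{k-1}, r_{k-1}>
for k = 1:n
  Av = A*v;
  t = rho / (v'*Av);                 % t_k = <r,r>/<v,A,v>
  x = x + t*v;
  r = r - t*Av;
  rhonew = r'*r;
  if sqrt(rhonew) < 1e-12, break; end
  v = r + (rhonew/rho)*v;            % s_{k+1} = rhonew/rho
  rho = rhonew;
end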

37 / 76

Conjugate-Gradient Method
Example Solve Ax = b using the conjugate-gradient method, where
   
6 3 −1 3
A =  3 5 2 , b = 4 .
−1 2 4 1
   
0 3
x0 = 0, r0 = b = 4 ;
0 1
 
3
v1 = r0 = 4
1
hr0 , r0 i 26
t1 = = = 0.11818
hv1 , A, v1 i 220
         
0 3 0 0.35455 0.35455
x1 = x0 + t1 v1 = 0 + 0.11818 × 4 = 0 + 0.47273  = 0.47273 
0 1 0 0.11818 0.11818
       
3 6 3 −1 0.35455 −0.427273
r1 = r0 − A t1 v1 = 4 −  3 5 2  × 0.47273  =  0.336364 
1 −1 2 4 0.11818 −0.063636

38 / 76

Conjugate-Gradient Method
Example
     
3.00000 0.35455 −0.427273
v1 = 4.00000  , x1 = 0.47273  , r1 =  0.336364  .
1.00000 0.11818 −0.063636

hr1 , r1 i 0.29975
s2 = = = 0.011529
hr0 , r0 i 26.00000
     
−0.427273 3.00000 −0.39269
v2 = r1 + s2 v1 =  0.336364  + 0.011529 × 4.00000  =  0.38248 
−0.063636 1.00000 −0.05211
hr1 , r1 i 0.29975
t2 = = = 0.46422
hv2 , A, v2 i 0.64572
         
0.355 −0.393 0.355 −0.1823 0.17225
x2 = x1 + t2 v2 = 0.473 + 0.464×  0.382  = 0.473 +  0.1776  = 0.65028 
0.118 −0.052 0.118 −0.0242 0.09399
       
−0.4273 6 3 −1 −0.1823 0.109625
r2 = r1 − A t2 v2 =  0.3364  −  3 5 2  ×  0.1776  =  0.043850 
−0.0636 −1 2 4 −0.0242 −0.504277

39 / 76

Conjugate-Gradient Method
Example      
−0.39269 0.17225 0.109625
v2 =  0.38248  , x2 = 0.65028 , r2 =  0.043850  .
−0.05211 0.09399 −0.504277

hr2 , r2 i 0.26824
s3 = = = 0.89486
hr1 , r1 i 0.29975
 
−0.24177
v3 = r2 + s3 v2 =  0.38612 
−0.55091
hr2 , r2 i 0.26824
t3 = = = 0.42390
hv3 , A, v3 i 0.63278
 
0.06977
Solution x = x3 = x2 + t3 v3 =  0.81395 
−0.13953
 
0.0
r3 = r2 − A t3 v3 = 0.0 = b − Ax (residual)
0.0

40 / 76

Preconditioned Conjugate-Gradient (Non-examinable)

Preconditioning Choose a preconditioning matrix P .


Apply the conjugate-gradient method to the equations
(P AP T )y = P b
and set
x = P T y.
A typical choice is P = (diag(A))1/2 .

41 / 76

Approximating Eigenvalues 42 / 76

Eigenvalues and eigenvectors

Eigenvalues and Eigenvectors If Av = λv and v 6= 0, then λ is an eigenvalue of A with corresponding


eigenvector v .

Similarity Matrices A and B are similar if B = P AP −1 for some invertible matrix P .


If Av = λv , then B(P v) = (P AP −1 )P v = P Av = P (λv) = λ(P v), so P v is an eigenvector of B with
eigenvalue λ.

Triangular If A is a lower- or upper-triangular matrix, then the eigenvalues of A are the diagonal entries;
λi = aii

Diagonal If D is a diagonal matrix, then the eigenvectors are the standard unit basis vectors ei , with
Dei = dii ei .

Notation Often write eigenvalues in order, |λ1 | ≥ |λ2 | ≥ · · · ≥ |λn |, with corresponding eigenvectors vi .

43 / 76

Approximation theorems

Gershgorin Circle Theorem For i = 1, . . . , n, there exists an eigenvalue λi of A within the circle
$\{z \in \mathbb{C} \mid |z - a_{ii}| \le \sum_{j=1, j\ne i}^{n} |a_{ij}|\}$.
Similarly, for j = 1, . . . , n, there exists an eigenvalue λj of A in the circle
$\{z \in \mathbb{C} \mid |z - a_{jj}| \le \sum_{i=1, i\ne j}^{n} |a_{ij}|\}$.

Example
 
$$A = \begin{pmatrix} 10 & -2 & 1 \\ 1 & 3 & -1 \\ 0 & 1 & -2 \end{pmatrix}.$$
The eigenvalues λ1,2,3 satisfy:
|λ1 − 10| ≤ |−2|+|1| = 3, |λ2 − 3| ≤ |1|+|−1| = 2, |λ3 − (−2)| ≤ 1.
If λ1,2,3 are real, then λ1 ∈ [7, 13], λ2 ∈ [1, 5], λ3 ∈ [−3, −1].
By considering the first column, see |λ1 − 10| ≤ 1.

44 / 76

The power method

Power method Iterate


$$y^{(n)} = Ax^{(n)}, \qquad x^{(n+1)} = y^{(n)} \big/ \pm\|y^{(n)}\|.$$
Note $x^{(n)} = \pm A^n x^{(0)} / \|A^n x^{(0)}\|$.
Typically use the supremum norm $\|y\|_\infty$, choosing the sign so that the maximum absolute value of x is one:
$x^{(n+1)} = y^{(n)}/y^{(n)}_{i_{\max}}$ where $|y_{i_{\max}}| \ge |y_i|$ for all i.
Take eigenvalue approximation
$$\mu^{(n)} = (Ax^{(n)})_{i_{\max}} / x^{(n)}_{i_{\max}} = y^{(n)}_{i_{\max}} / x^{(n)}_{i_{\max}}.$$
It is usually more accurate, notably for symmetric matrices, to take the alternative eigenvalue approximation
$$\mu^{(n)} = \frac{x^{(n)T} A x^{(n)}}{x^{(n)T} x^{(n)}}.$$
If $\|x^{(n)}\|_2 = 1$, then this reduces to
$$\mu^{(n)} = x^{(n)T} A x^{(n)} = x^{(n)} \cdot y^{(n)}.$$
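A minimal Matlab/Octave sketch of the power iteration with the supremum norm as described above; nit is a hypothetical iteration count:

% Power method: rescale by the entry of largest magnitude; mu is the eigenvalue estimate.
x = ones(size(A,1), 1);              % arbitrary starting vector
for n = 1:nit
  y = A*x;
  [~, imax] = max(abs(y));           % index of the entry of largest magnitude
  mu = y(imax) / x(imax);            % mu^{(n)} = y_{imax}/x_{imax}
  x = y / y(imax);                   % rescale so the largest entry is one
end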
45 / 76

The power method

Theorem If A has an eigenvalue λ1 such that |λ1 | > |λi | for all other eigenvalues, then the power method
converges, with limn→∞ x(n) = v1 and limn→∞ µ(n) = λ1 .
Proof.
For simplicity, suppose Rn has basis {v1 , v2 , . . . , vn } of eigenvectors of A.
Write x = α1 v1 + α2 v2 + · · · + αn vn . Then
Ak x = Ak (α1 v1 + α2 v2 + · · · + αn vn )
= α1 Ak v1 + α2 Ak v2 + · · · + αn Ak vn
= α1 λk1 v1 + α2 λk2 v2 + · · · + αn λkn vn
$$= \lambda_1^k\big(\alpha_1 v_1 + \alpha_2(\lambda_2/\lambda_1)^k v_2 + \cdots + \alpha_n(\lambda_n/\lambda_1)^k v_n\big).$$
Hence $\lim_{k\to\infty} A^k x/\lambda_1^k = \alpha_1 v_1$, so
$$\lim_{k\to\infty} x^{(k)} = \lim_{k\to\infty} \frac{A^k x}{\pm\|A^k x\|} = \lim_{k\to\infty} \pm\frac{A^k x/\lambda_1^k}{\|A^k x/\lambda_1^k\|} = \pm\frac{\alpha_1 v_1}{\|\alpha_1 v_1\|} = \pm\hat v_1.$$

46 / 76

The power method
Example    
1 2 0 1
A = 0 0 1 , x = 1 .
1 0 0 1
        
1 2 0 1 3 (0) 1 1.0000
y
y (0) = Ax(0) = 0 0 1 1 = 1 ; x(1) = (0) = 1/3 = 0.3333
1 0 0 1 1 ||y || ∞ 1/3 0.3333
      
1 2 0 1.0000 1.6667 (1) 1.0000
y
y (1) = Ax(1) = 0 0 1 0.3333 = 0.3333 ; x(2) = (1) = 0.2000
1 0 0 0.3333 1.0000 ||y || 0.6000
      
1 2 0 1.0000 1.4000 (2) 1.0000
y
y (2) = Ax(2) = 0 0 1 0.2000 = 0.6000 ; x(3) = (2) = 0.4286
1 0 0 0.6000 1.0000 ||y || 0.7143
      
1 2 0 1.0000 1.8571 (3) 1.0000
y
y (3) = Ax(3) = 0 0 1 0.4286 = 0.7143 ; x(4) = (3) = 0.3846
1 0 0 0.7143 1.0000 ||y || 0.5385

47 / 76

The power method


Example       
1 2 0 1.0000 1.7692 1.0000
y (4)
y (4) = Ax(4) = 0 0 1 0.3846 = 0.5385 ; x(5) = (4) = 0.3043
1 0 0 0.5385 1.0000 ||y || 0.5652
    
1 2 0 1.0000 1.6087
y (5) = Ax(5) = 0 0 1 0.3846 = 0.5652 .
1 0 0 0.5385 1.0000
Estimate  
1.0000
(Ax(5) )imax
v ≈ x(5) =0.3043; λ ≈ µ(5) = (5) )
= (Ax(5) )imax= (Ax(5) )1= 1.609.
0.5652 (x imax

Alternative estimate  
0.8415
(5) x(5) x(5)TAx(5)
v ≈ x̂ = (5) = 0.2561; λ ≈ µ̂(5) = (5)T (5) = x̂(5)TAx̂(5) = 1.661.
||x ||2 0.4756 x x
T
Actual eigenvalue/vector λ = 1.6841, v = 1.0000, 0.3514, 0.6216 or normalised
T
v̂ = 0.8138, 0.2859, 0.5059 .

48 / 76

Inverse power method

Theorem If λ is an eigenvalue of A, then 1/(λ − µ) is an eigenvalue of (A − µI)−1 with the same


eigenvector v . Hence if (A − µI)−1 v = κv , then 1/(λ − µ) = κ, so λ = µ + 1/κ.
Proof. (A − µI)v = Av − µIv = λv − µv = (λ − µ)v ,
so (A − µI)−1 v = (λ − µ)−1 v .
Inverse power method To estimate an eigenvalue λ ≈ µ, apply the power method to (A − µI)−1 .
Iterate
y (n) = (A − µI)−1 x(n) , κ(n) = y (n) /x(n) , x(n+1) = y (n) / ± ||y (n) ||.
Estimate
λ ≈ λ(n) = µ + 1/κ(n) .
In practice, first compute the LU-factorisation A − µI = Lµ Uµ , and solve
(A − µI)y (n) = Lµ Uµ y (n) = x(n) .
Can update µ to µ + 1/κ(n) to speed up convergence.
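A minimal Matlab/Octave sketch of the inverse power iteration with a fixed shift mu; here the built-in lu (which pivots) stands in for the factorisation Lµ Uµ of the slide, and nit is a hypothetical iteration count:

% Inverse power method: power method applied to (A - mu*I)^{-1}.
[Lmu, Umu, Pmu] = lu(A - mu*eye(size(A)));    % factorise once, reuse every step
x = ones(size(A,1), 1);
for n = 1:nit
  y = Umu \ (Lmu \ (Pmu*x));                  % solve (A - mu*I) y = x
  [~, imax] = max(abs(y));
  kappa = y(imax) / x(imax);                  % kappa^{(n)}
  x = y / y(imax);
end
lambda = mu + 1/kappa                          % eigenvalue estimate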

49 / 76

Inverse power method

Example 
2 −1 1
  
1
A = −1 3 −2 ; µ = 1.5. Use x(0) = 0.
1 2 3 0
    
0.5 −1 1 1 0 0 0.5 −1.0 1.0
A − µI = −1 1.5 −2 = −2.0 1 0  0 −0.5 0.0  = Lµ Uµ
1 2 1.5 2.0 −8.0 1 0 0 −0.5
 −1  
0.5 −1 1 50 28 4
(A − µI)−1 = −1 1.5 −2 =  −4 −2 0
1 2 1.5 −28 −16 −2
   
50.0000 (0) 1.0000
y
y (0) = (A − µI)−1 x(0) =  −4.0000  ; x(1) = (0) = −0.0800
−28.0000 ||y || −0.5600
   
45.5200 (1) 1.0000
y
y (1) = (A − µI)−1 x(1) =  −3.8400  ; x(2) = (1) = −0.0844
−25.6000 ||y || −0.5624
   
45.3884 (2) 1.0000
y
y (2) = (A − µI)−1 x(2) =  −3.8313  ; x(3) = (2) = −0.0844
−25.5255 ||y || −0.5624
Eigenvalue/vector λ ≈ (Ax(3) )1 = 1.5220, v ≈ (1.0000, −0.0844, −0.5624).

50 / 76

Deflation (Off-syllabus)

Deflation Suppose A has eigenvalue/vector pairs (λi , vi ).


If x is a vector such that $x^T v_1 = 1$, and $B = A - \lambda_1 v_1 x^T$,
then B has eigenvalues 0 and $\lambda_i$ for i = 2, . . . , n,
with eigenvectors $w_i$ satisfying $v_i = (\lambda_i - \lambda_1) w_i + \lambda_1 (x^T w_i) v_1$.

Wielandt deflation Define x by $x_j = a_{k,j}/\big(\lambda_1 (v_1)_k\big)$ for some k. Then the kth row of B is identically 0, so
$(w_i)_k = 0$ for all i = 2, . . . , n.

51 / 76

Orthogonalisation 52 / 76

Orthogonality
Normal vectors A vector v is normal if $\|v\|_2 := \sqrt{\sum_{i=1}^n v_i^2} = 1$, equivalently if v · v = 1.

Orthogonal vectors Vectors {v1 , . . . , vn } are orthogonal if vi · vj = 0 for all i ≠ j .

Orthonormal vectors Vectors {v1 , . . . , vn } are orthonormal if they are orthogonal and each is normal.

Orthogonal matrices A matrix Q is orthogonal if Q−1 = QT ; equivalently, if the columns of Q are


orthonormal vectors.
53 / 76

Gram-Schmidt orthogonalisation

Gram-Schmidt orthogonalisation Let x1 , . . . , xm be vectors in Rm .


Recursively define
$$v_i = x_i - \sum_{j=1}^{i-1} \frac{x_i \cdot v_j}{v_j \cdot v_j}\, v_j = x_i - \sum_{j=1}^{i-1} (x_i \cdot u_j)\, u_j; \qquad u_i = \frac{v_i}{\|v_i\|_2}.$$

Theorem The vi are an orthogonal set, and the ui are an orthonormal set, such that for all k = 1, . . . , m,
span{x1 , . . . , xk } = span{u1 , . . . , uk } = span{v1 , . . . , vk }.
Proof. Fix i and assume {u1 , . . . , ui−1 } are orthonormal. Then for i > j,
$$v_i \cdot u_j = x_i \cdot u_j - \sum_{k=1}^{i-1} (x_i \cdot u_k)(u_k \cdot u_j) = x_i \cdot u_j - (x_i \cdot u_j)(u_j \cdot u_j) = 0,$$
so $u_i \cdot u_j = (v_i/\|v_i\|) \cdot u_j = (v_i \cdot u_j)/\|v_i\| = 0$.
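A minimal Matlab/Octave sketch of the recursion above, orthonormalising the columns of a matrix X; this is the classical variant (the modified variant is numerically preferable but not shown):

% Classical Gram-Schmidt: the columns of U form an orthonormal basis
% for the span of the columns of X.
[m, kcols] = size(X);
U = zeros(m, kcols);
for i = 1:kcols
  v = X(:,i);
  for j = 1:i-1
    v = v - (X(:,i)'*U(:,j)) * U(:,j);   % subtract projection onto u_j
  end
  U(:,i) = v / norm(v);
end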

54 / 76

Gram-Schmidt orthogonalisation
Example Apply the Gram-Schmidt orthogonalisation procedure to:
     
2 −1 1
x1 = −1 , x2 =  3  , x3 = −2 .
1 2 3
       
2 −1 2 0
x 2 · v1 −3
v1 = x1 = −1 ; v2 = x2 − v1 =  3  − −1 = 5/2 ;
v1 · v1 6
1 2 1 5/2
       
1 2 0 −4/3
x3 · v1 x3 · v2 7 5/2
v3 = x3 − v1 − v2 = −2 − −1 − 5/2 = −4/3 .
v1 · v1 v2 · v2 6 25/2
3 1 5/2 4/3
√ √ √
kv1 k = 6; kv2 k = 5/ 2; kv3 k = 4/ 3.
     
2 0 −1
v1 1   1   1  
u1 = = √ −1 ; u2 = √ 1 ; u3 = √ −1 .
||v1 || 6 2 1 3
1 1
Check orthogonality:
2×0 + (−1)×1 + 1×1 −2 + 1 + 1
u1 · u2 = √ √ = 0; u1 · u3 = √ = 0; u2 · u3 = 0.
6× 2 18

55 / 76

QR factorisation

QR factorisation In the Gram-Schmidt orthogonalisation, define ri,i = ||vi || and


$r_{i,j} = x_j \cdot u_i = x_j \cdot v_i/\|v_i\|$. Then
$$v_j = x_j - \sum_{i=1}^{j-1} r_{i,j}\, u_i, \qquad u_i = v_i/r_{i,i},$$
so
$$x_j = r_{j,j}\, u_j + \sum_{i=1}^{j-1} r_{i,j}\, u_i.$$
Let
A = (x1 , . . . , xn ), Q = (u1 , . . . , un ) and (R)i,j = ri,j for i ≤ j.
Then Q is orthogonal, R is upper-triangular, and
A = QR.
56 / 76

QR factorisation
Example Compute the QR-factorisation of:
 
2 −1 1
A = −1 3 −2
1 2 3

From the Gram-Schmidt orthogonalisation,


 √ √   
2/ √6 0√ −1/√ 3 0.816 0.000 −0.577
Q = −1/√ 6 1/√2 −1√ 3  = −0.408 0.707 −0.577
1/ 6 1/ 2 1/ 3 0.408 0.707 0.577
 √ √ √   
6/ 6 −3/√ 6 7/√6 2.449 −1.225 2.858
R= 0 5/ 2 1/√2 =  0 3.536 0.707 .
0 0 4/ 3 0 0 2.309

57 / 76

The QR Method 58 / 76

The QR method

The QR Method The QR method is an iterative algorithm for finding all the eigenvalues of A.
Set A(0) = A. Iteratively find Q(n) , R(n) such that A(n) = Q(n) R(n) and set A(n+1) = R(n) Q(n) .

Theorem Assuming eigenvalues of A have distinct absolute value, A(n) converges to an upper-triangular
matrix with the eigenvalues of A on the diagonal.
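A minimal Matlab/Octave sketch of the iteration using the built-in qr factorisation; nit is a hypothetical iteration count:

% QR method: A^{(n)} = Q^{(n)} R^{(n)},  A^{(n+1)} = R^{(n)} Q^{(n)}.
Ak = A;
for n = 1:nit
  [Q, R] = qr(Ak);
  Ak = R*Q;
end
diag(Ak)                 % approximate eigenvalues on the diagonal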

59 / 76

The QR method
Example Use the QR method to approximate eigenvalues of:
 
4 −1 1
A = −1 3 −2 .
1 −2 3
   
−0.9428 −0.3244 −0.0765 −4.2426 2.1213 −2.1213
(0) (0)
Q =  0.2357 −0.8111 0.5353  , R =  0 −3.0822 2.7578 
−0.2357 0.4867 0.8412 0 0 1.3765
 
5.0000 −1.3765 −0.3244
A(1) = R(0) Q(0) = −1.3765 3.8421 0.6699 
−0.3244 0.6699 1.1579
 
5.6667 −0.9406 0.0640
A(2) = R(1) Q(1) = −0.9406 3.3226 −0.1580 
0.0640 −0.1580 1.0108
   
5.909 −0.514 −0.011 5.977 −0.263 0.002
(3) (4)
A = −0.514 3.090 0.045  A = −0.263 3.023 −0.014
−0.011 0.045 1.001 0.002 −0.014 1.000
Hence λ1 = 5.977 ± 0.265, λ2 = 3.023 ± 0.277, λ3 = 1.000 ± 0.016.

60 / 76

The QR method—Properties (Non-examinable)

Conjugacies Note
A(k) = Q(k) R(k) = R(k−1) Q(k−1) ;
Q(k) A(k+1) = Q(k) R(k) Q(k) = A(k) Q(k) .

Similarity Set P (k) = Q(0) Q(1) · · · Q(k−1) . Then


P (k) A(k) = A P (k) .
Equivalently,
$$A = P^{(k)} A^{(k)} \big(P^{(k)}\big)^{-1}.$$

Power Set S (k) = R(k) · · · R(1) R(0) . Then


Ak = P (k) S (k) .
61 / 76

The QR method—Convergence (Non-examinable)

Convergence Since P (k) is orthogonal and S (k) upper-triangular, we have


$$A^k e_1 = P^{(k)} S^{(k)} e_1 = s^{(k)}_{1,1} P^{(k)} e_1,$$
$$A^k e_2 = P^{(k)} S^{(k)} e_2 = P^{(k)}\big(s^{(k)}_{1,2} e_1 + s^{(k)}_{2,2} e_2\big) = s^{(k)}_{1,2} P^{(k)} e_1 + s^{(k)}_{2,2} P^{(k)} e_2.$$
We deduce
$P^{(k)} e_i \in \mathrm{span}\{A^k e_1, \dots, A^k e_i\}$
and is orthogonal to $\{A^k e_1, \dots, A^k e_{i-1}\}$.
Writing $e_i = \sum_j \alpha_{i,j} v_j$ gives
$A^k e_i = \sum_j \alpha_{i,j} \lambda_j^k v_j$.
Then
$P^{(k)} e_1 \propto A^k e_1 = \alpha_{1,1}\lambda_1^k\big(v_1 + \sum_{j>1}(\alpha_{1,j}/\alpha_{1,1})(\lambda_j/\lambda_1)^k v_j\big) \sim v_1$ as $k \to \infty$.

Similarly, we see that


limk→∞ P (k) ei ∈ span{v1 , v2 , . . . , vi }
and is orthogonal to {v1 , . . . , vi−1 }.

62 / 76

The QR method (Non-examinable)

Shifted QR The QR method can be shifted i.e. applied to A(n) − µ(n) I for scalars µ(n) , as
A(n) − µ(n) I = Q(n) R(n) ; A(n+1) = R(n) Q(n) + µ(n) I .
This can accelerate convergence, especially when eigenvalues have nearly the same absolute value.

Partial QR The QR method can also be modified to find only some of the eigenvalues of A by iterating
Q(k) R(k) = AQ(k−1) , where the Q(k) are n-by-m matrices with orthonormal columns, and the R(k) are
m-by-m upper-triangular matrices.
63 / 76

Householder matrices (Advanced)

Householder matrices A Householder matrix is a matrix of the form


$$H = I - 2\,\frac{v\,v^T}{v^T v}$$
or
$$H = I - 2ww^T \quad\text{where } \|w\| = 1.$$

Symmetry A Householder matrix is symmetric, since


H T = (I − 2wwT )T = I T − 2(wwT )T = I T − 2(wT )T wT = I − 2wwT .

Orthogonality A Householder matrix is orthogonal, since


H T H = HH = (I − 2wwT )(I − 2wwT )
= I − 2wwT − 2wwT + 4wwT wwT = I − 4wwT + 4w(wT w)wT = I

Theorem If H = I − 2wwT is a Householder matrix, then H = H T = H −1 .

64 / 76

Householder matrices (Advanced)

Example Take $v = (0, 2, -1, 1)^T$, $w = v/\sqrt6$,
$$H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -\tfrac13 & \tfrac23 & -\tfrac23 \\ 0 & \tfrac23 & \tfrac23 & \tfrac13 \\ 0 & -\tfrac23 & \tfrac13 & \tfrac23 \end{pmatrix} = \frac13 \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & -1 & 2 & -2 \\ 0 & 2 & 2 & 1 \\ 0 & -2 & 1 & 2 \end{pmatrix}$$

65 / 76

Householder matrices (Advanced)


Upper-Hessenberg form For a matrix A, aim to find an orthogonal matrix Q such that QT AQ has
upper-Hessenberg form Ai,j = 0 for i > j + 1.
 
∗ ∗ ∗ ∗ ∗
∗ ∗ ∗ ∗ ∗
 
0 ∗ ∗ ∗ ∗
 
0 0 ∗ ∗ ∗
0 0 0 ∗ ∗

Note that if A is symmetric and upper-Hessenberg, then it is tridiagonal Ai,j = 0 for |i − j| > 1.
 
∗ ∗ 0 0 0
∗ ∗ ∗ 0 0
 
0 ∗ ∗ ∗ 0
 
0 0 ∗ ∗ ∗
0 0 0 ∗ ∗

Upper-Hessenberg form greatly increases the efficiency of the QR method; time O(n2 ) per step instead of
O(n3 ), and O(n) for symmetric matrices.
66 / 76

Householder matrices (Advanced)

Conversion to upper-Hessenberg form First find a Householder matrix H = H1 such that H1T AH1 has first
column (H1T AH1 )i,1 = 0 for i ≥ 3:
1. Set $\alpha = \big(\sum_{i=2}^{n} a_{i,1}^2\big)^{1/2}$, $v_1 = 0$, $v_2 = a_{2,1} \pm \alpha$, $v_i = a_{i,1}$ for i ≥ 3.
   Typically choose the sign so that $v_2 = a_{2,1} + \mathrm{sgn}(a_{2,1})\,\alpha$.
2. Take $w = v/r$ where $r = \|v\| = \big(2\alpha(\alpha \pm a_{2,1})\big)^{1/2}$.
3. Set $H = I - 2ww^T$, so $(HA)_{i,1} = 0$ for i > 2.

Continue by applying the method to the sub-matrix (H1T AH1 )2:n,2:n to find a Householder matrix H2 such
that H2T H1T AH1 H2 has Hi,j = 0 for j = 1, 2 and i > j + 1.

Note that in practice, we compute


HA = (I − 2wwT )A = A − (2w)(wT A)
which takes O(n2 ) operations, whereas computing H first and then HA would take O(n3 ) operations.
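A two-line Matlab/Octave sketch of this remark, applying H on the left and right without ever forming H (w is the unit vector defining H, assumed given):

% Apply H = I - 2*w*w' without forming H: O(n^2) instead of O(n^3) operations.
HA  = A  - (2*w) * (w'*A);           % H*A
HAH = HA - (HA*w) * (2*w');          % (H*A)*H, post-multiplication by H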

67 / 76

Householder matrices (Advanced)
Example  
4 1 −2 2
1 2 0 1
A= −2 0 3 −2 .

2 1 −2 −1
P4 2 1/2
 √
α1 = − sgn(a21 ) i=2 a21 = − 9 = −3.
q √
r1 = ( 12 α1 (α1 − a2,1 ))1/2 = (− 12 α1 (a2,1 − α1 ))1/2 = 12 3(3 + 1) = 6.
     
0 0 0
1 − (−3)  4  1 2
v1 =   −2  = −2 ; w1 = v1 /2r1 = √6 −1 .
    

2 2 1
   
1 0 0 0 3 0 0 0
0 − 1 2 − 2  1 0 −1 2 −2
H1 = I − 2w1 w1T =  0 2
3 3
2
3
1  =
 
3 3 3 3 0 2 2 1 
0 − 32 13 2
3 0 −2 1 2

68 / 76

Householder matrices (Advanced)


Example (continued)
   
36 −27 0 0 4 −3 0 0
T 1 −27 30 9 12  −3 10 1 4 
H1 AH1 =  = 3 3 
5
9 0 9 15 −12  0 1 3 − 34 
4
0 12 −12 −9 0 3 − 34 −1
q
2 1/2
P4
= − 12 + ( 43 )2 = − 53 ; (v2 )3 = a2,3 − α2 = 38

α2 = ± i=3 ai,2
       
0 0 1 0 0 0 5 0 0 0
4 0 ; w2 = √1 0 ; H2 = 0 1 03
   0  1 0 5 0
 0
v2 =  4 =

3 2 5  2  0 0 − 5 − 5 5  0 0 −3 −4
1 1 0 0 − 54 3
5 0 0 −4 3
 
4.0000 −3.0000 0 0
−3.0000 3.3333 −1.6667 0 
H2T (H1T AH1 )H2 =  
 0 −1.6667 −1.3200 0.9067
0 0 0.9067 1.9867

69 / 76

Givens matrices (Advanced)

Givens rotation A Givens rotation is a matrix of the form
$$[G_{k,l}(\alpha,\beta)]_{i,j} = \begin{cases} \alpha & \text{if } i = j = k \text{ or } i = j = l, \\ \beta & \text{if } i = l \text{ and } j = k, \\ -\beta & \text{if } i = k \text{ and } j = l, \\ 1 & \text{if } i = j \text{ with } i \ne k, l, \\ 0 & \text{otherwise,} \end{cases}$$
where $\alpha^2 + \beta^2 = 1$. We can write $\alpha = \cos\theta$ and $\beta = \sin\theta$ for some θ.

Inverse Givens rotation matrices are orthogonal, with inverse
$$G_{k,l}(\theta)^{-1} = G_{k,l}(\theta)^T = G_{k,l}(-\theta).$$
Examples
$$G_{1,2}(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad G_{2,4}(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & 0 & -\sin\theta \\ 0 & 0 & 1 & 0 \\ 0 & \sin\theta & 0 & \cos\theta \end{pmatrix}$$

70 / 76

Givens matrices (Advanced)

We can implement the QR algorithm for upper-Hessenberg A by a sequence of Givens rotations


$$A = QR = G_{1,2}(\theta_1)\, G_{2,3}(\theta_2)\, G_{3,4}(\theta_3)\, R$$
so
$$R = G_{3,4}(\theta_3)^{-1}\, G_{2,3}(\theta_2)^{-1}\, G_{1,2}(\theta_1)^{-1}\, A$$
and
$$RQ = R\, G_{1,2}(\theta_1)\, G_{2,3}(\theta_2)\, G_{3,4}(\theta_3) = G_{3,4}(\theta_3)^{-1}\, G_{2,3}(\theta_2)^{-1}\, G_{1,2}(\theta_1)^{-1}\, A\, G_{1,2}(\theta_1)\, G_{2,3}(\theta_2)\, G_{3,4}(\theta_3).$$
Note that pre-multiplying A by $G_{k,l}$ (or $G_{k,l}^{-1}$) changes rows k and l to linear combinations of each other, and
leaves all other rows unchanged.
Likewise, post-multiplying by Gkl changes columns k and l, and leaves all other columns unchanged.
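A small Matlab/Octave sketch of one such step: a rotation of rows k and k+1 chosen to zero the subdiagonal entry A(k+1,k); k is an assumed loop index:

% One Givens step: choose (alpha,beta) to zero A(k+1,k), then rotate rows k and k+1.
r = hypot(A(k,k), A(k+1,k));
alpha = A(k,k)/r;  beta = A(k+1,k)/r;
G = [alpha, beta; -beta, alpha];     % this 2x2 block acts as G_{k,k+1}(alpha,beta)^{-1}
A([k k+1], :) = G * A([k k+1], :);   % only rows k and k+1 change; A(k+1,k) becomes 0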

71 / 76

Givens matrices (Advanced)


 
Example 4 1 7
A = 3 7 9
0 12 2
Take Givens rotation (with r1 = (a211 + a221 )1/2 )
 4
− 53 0
 
a11 /r1 −a21 /r1 0 5
G1 = a21 /r1 a11 /r1 0 =  3 4
0
5 5
0 0 1 0 0 1
 4 3
0
   
5 5 4 1 7 5 5 11
G−1
1 A =
− 3 4
0 3 7 9 = 0 5 3  .
5 5
0 0 1 0 12 2 0 12 2

1 0 0
   
5 5 11
5 12 
G2 = 0
 13 − 13 ; R = G−1
2 (G1 A) =
−1 0 13 3  .
12 5 0 0 −2
0 13 13

72 / 76

Givens matrices (Advanced)

Example (continued)
RQ = RG1 G2 = (RG1 )G2
 4 3
 
5 −5 0 1 0 0

5 5 11
5 12 
= 0 13 3   35 4
0 0 13 − 13 
 
5
0 0 −2 0 0 1 0 1213
5
13
 
1 0 0
 
7 1 11
4 2 0 5 − 12 
= 7 5 10 5 3 
  13 13 
0 0 −2 0 12 13
5
13
7 4
 
7 10 13 3 13 
7.000 10.538 3.308

10
= 7 54 6 13 −9 6529 
 = 7.800 6.769 −8.446

0 −1 11 − 10 0 −1.846 −0.769
13 13

73 / 76

Givens matrices (Advanced)

If A is tridiagonal, then A = QR with ri,j = 0 for i > j or j > i + 2, so R is banded with three nontrivial
bands on the main diagonal and above ( 3n nonzero entries).
However, Q is upper-Hessenberg, and has n2 /2 nonzero entries, even though it is the product of n − 1 Givens
rotations, which are sparse.
We therefore perform the QR-method using Givens rotations, and do not directly construct Q.

74 / 76

Eigenvalue condition number (Non-examinable)

Eigenvalue condition numbers Suppose λ is a simple eigenvalue of A, with Ax = λx and AT y = λy, where
||x||2 = ||y||2 = 1. Suppose ||F ||2 = 1, and let λ(ε) be the corresponding eigenvalue of the perturbed matrix A + εF . Then
|λ′(ε)| = |yT F x|/|yT x| ≤ 1/|yT x|, with the bound attained for F = y xT . Define s(λ) = |yT x|, the
condition of the eigenvalue.
[From Golub & Van Loan]

75 / 76

Eigenvalue conditioning (Non-examinable)

Eigenvalue conditioning If µ is an eigenvalue of a perturbation A + E of a nondefective matrix A, then


|µ − λk | ≤ cond2 (V )||E||2
where λk is closest eigenvalue of A to µ, and V is the matrix of eigenvectors of A.
[From Michael T. Heath: Scientific Computing: An Introductory Survey]

76 / 76

