Numerical Linear Algebra and Matrix Analysis

Higham, Nicholas J.

2015

MIMS EPrint: 2015.103

Manchester Institute for Mathematical Sciences

School of Mathematics

The University of Manchester

Reports available from: http://eprints.maths.manchester.ac.uk/

And by contacting: The MIMS Secretary
School of Mathematics
The University of Manchester
Manchester, M13 9PL, UK

ISSN 1749-9097

Numerical Linear Algebra and Matrix Analysis†

Nicholas J. Higham

†. Author's final version, before copy editing and cross-referencing, of: N. J. Higham. Numerical linear algebra and matrix analysis. In N. J. Higham, M. R. Dennis, P. Glendinning, P. A. Martin, F. Santosa, and J. Tanner, editors, The Princeton Companion to Applied Mathematics, pages 263–281. Princeton University Press, Princeton, NJ, USA, 2015.

Matrices are ubiquitous in applied mathematics. Ordinary differential equations (ODEs) and partial differential equations (PDEs) are solved numerically by finite difference or finite element methods, which lead to systems of linear equations or matrix eigenvalue problems. Nonlinear equations and optimization problems are typically solved using linear or quadratic models, which again lead to linear systems.

Solving linear systems of equations is an ancient task, undertaken by the Chinese around 1 AD, but the study of matrices per se is relatively recent, originating with Arthur Cayley's 1858 “A Memoir on the Theory of Matrices”. Early research on matrices was largely theoretical, with much attention focused on the development of canonical forms, but in the 20th century the practical value of matrices started to be appreciated. Heisenberg used matrix theory as a tool in the development of quantum mechanics in the 1920s. Early proponents of the systematic use of matrices in applied mathematics included Frazer, Duncan, and Collar, whose 1938 book Elementary Matrices and Some Applications to Dynamics and Differential Equations emphasized the important role of matrices in differential equations and mechanics. The continued growth of matrices in applications, together with the advent of mechanical and then digital computing devices, allowing ever larger problems to be solved, created the need for greater understanding of all aspects of matrices from theory to computation.

This article treats two closely related topics: matrix analysis, which is the theory of matrices with a focus on aspects relevant to other areas of mathematics, and numerical linear algebra (also called matrix computations), which is concerned with the construction and analysis of algorithms for solving matrix problems as well as related topics such as problem sensitivity and rounding error analysis.

Important themes that are discussed in this article include the matrix factorization paradigm, the use of unitary transformations for their numerical stability, exploitation of matrix structure (such as sparsity, symmetry, and definiteness), and the design of algorithms to exploit evolving computer architectures.

Throughout the article, uppercase letters are used for matrices and lower case letters for vectors and scalars. Matrices and vectors are assumed to be complex, unless otherwise stated, and $A^* = (\overline{a}_{ji})$ denotes the conjugate transpose of $A = (a_{ij})$. An unsubscripted norm $\|\cdot\|$ denotes a general vector norm and the corresponding subordinate matrix norm. Particular norms used here are the 2-norm $\|\cdot\|_2$ and the Frobenius norm $\|\cdot\|_F$. The notation “$i = 1 : n$” means that the integer variable $i$ takes on the values $1, 2, \dots, n$.

1 Nonsingularity and Conditioning

Nonsingularity of a matrix is a key requirement in many problems, such as in the solution of $n$ linear equations in $n$ unknowns. For some classes of matrices, nonsingularity is guaranteed. A good example is the diagonally dominant matrices. The matrix $A \in \mathbb{C}^{n \times n}$ is strictly diagonally dominant by rows if

\[ \sum_{j \ne i} |a_{ij}| < |a_{ii}|, \quad i = 1 : n \]

and strictly diagonally dominant by columns if $A^*$ is strictly diagonally dominant by rows. Any matrix that is strictly diagonally dominant by rows or columns is nonsingular (a proof can be obtained by applying Gershgorin's theorem in section 5.1).

Since data is often subject to uncertainty we wish to gauge the sensitivity of problems to perturbations, which is done using condition numbers. An appropriate condition number for the matrix inverse is

\[ \lim_{\varepsilon \to 0} \; \sup_{\|\Delta A\| \le \varepsilon \|A\|} \frac{\|(A + \Delta A)^{-1} - A^{-1}\|}{\varepsilon \|A^{-1}\|}. \]

This expression turns out to equal $\kappa(A) = \|A\| \|A^{-1}\|$, which is called the condition number of $A$ with respect to inversion. This condition number occurs in many contexts. For example, suppose $A$ is contaminated by errors and we perform a similarity transformation $X^{-1}(A + E)X = X^{-1}AX + F$. Then $\|F\| = \|X^{-1}EX\| \le \kappa(X)\|E\|$ and this bound is attainable for some $E$. Hence the errors can be multiplied by a factor as large as $\kappa(X)$. We therefore prefer to carry out similarity and other transformations with matrices that are well conditioned, that is, ones for which $\kappa(X)$ is close to its lower bound of 1. By contrast, a matrix for which $\kappa$ is large is called ill conditioned.
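
The definitions in this section can be tried out numerically. The following is a minimal sketch, assuming NumPy is available; the test matrices, the random seed, the helper name `is_strictly_diag_dominant_by_rows`, and the choice of the 2-norm are all assumptions made for illustration. It checks strict diagonal dominance by rows, forms $\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2$ (with an explicit inverse purely for clarity), and compares the amplification $\|X^{-1}EX\|_2 / \|E\|_2$ with $\kappa_2(X)$.

```python
import numpy as np

def is_strictly_diag_dominant_by_rows(A):
    """True if sum_{j != i} |a_ij| < |a_ii| holds for every row i."""
    d = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - d
    return bool(np.all(off < d))

# Small test matrix, chosen only for illustration.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# kappa_2(A) = ||A||_2 ||A^{-1}||_2; the inverse is formed explicitly for clarity.
kappa_A = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

# A perturbation E carried through the similarity transformation X^{-1}(.)X.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])      # nonsingular (unit upper triangular), not unitary
rng = np.random.default_rng(0)       # arbitrary seed
E = 1e-8 * rng.standard_normal((3, 3))
F = np.linalg.solve(X, E @ X)        # F = X^{-1} E X
kappa_X = np.linalg.cond(X, 2)
amplification = np.linalg.norm(F, 2) / np.linalg.norm(E, 2)

print("strictly diagonally dominant by rows:", is_strictly_diag_dominant_by_rows(A))
print(f"kappa_2(A) = {kappa_A:.3f}")
print(f"||F||_2 / ||E||_2 = {amplification:.3f} <= kappa_2(X) = {kappa_X:.3f}")
```

For a random $E$ the amplification is usually well below $\kappa_2(X)$; the bound is attained only for particular perturbations.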

For any unitary matrix $X$, $\kappa_2(X) = 1$, so in numerical linear algebra transformations by unitary or orthogonal matrices are preferred and usually lead to numerically stable algorithms.

In practice we often need an estimate of the matrix condition number $\kappa(A)$ but do not wish to go to the expense of computing $A^{-1}$ in order to obtain it. Fortunately, there are algorithms that can cheaply produce a reliable estimate of $\kappa(A)$ once a factorization of $A$ has been computed.

Note that the determinant, $\det(A)$, is rarely computed in numerical linear algebra. Its magnitude gives no useful information about the conditioning of $A$, not least because of its extreme behavior under scaling: $\det(\alpha A) = \alpha^n \det(A)$.

2 Matrix Factorizations

The method of Gaussian elimination (GE) for solving a nonsingular linear system $Ax = b$ of $n$ equations in $n$ unknowns reduces the matrix $A$ to upper triangular form and then solves for $x$ by substitution. GE is typically described by writing down the equations $a_{ij}^{(k+1)} = a_{ij}^{(k)} - a_{ik}^{(k)} a_{kj}^{(k)} / a_{kk}^{(k)}$ (and similarly for $b$) that describe how the starting matrix $A = A^{(1)} = (a_{ij}^{(1)})$ changes on each of the $n - 1$ steps of the elimination in its progress towards upper triangular form $U$. Working at the element level in this way leads to a profusion of symbols, superscripts, and subscripts that tend to obscure the mathematical structure and hinder insights being drawn into the underlying process. One of the key developments in the last century was the recognition that it is much more profitable to work at the matrix level. Thus the basic equation above is written as $A^{(k+1)} = M_k A^{(k)}$, where $M_k$ agrees with the identity matrix except below the diagonal in the $k$th column, where its $(i, k)$ element is $m_{ik} = -a_{ik}^{(k)} / a_{kk}^{(k)}$, $i = k + 1 : n$. Recurring the matrix equation gives $U := A^{(n)} = M_{n-1} \cdots M_1 A$. Taking the $M_k$ matrices over to the left-hand side leads, after some calculations, to the equation $A = LU$, where $L$ is unit lower triangular, with $(i, k)$ element $-m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)}$. The prefix “unit” means that $L$ has ones on the diagonal.

GE is therefore equivalent to factorizing the matrix $A$ as the product of a lower triangular matrix and an upper triangular matrix, something that is not at all obvious from the element-level equations. Solving the linear system $Ax = b$ now reduces to the task of solving the two triangular systems $Ly = b$ and $Ux = y$.

Interpreting GE as LU factorization separates the computation of the factors from the solution of the triangular systems. It is then clear how to solve efficiently several systems $Ax_i = b_i$, $i = 1 : r$, with different right-hand sides but the same coefficient matrix $A$: compute the LU factors once and then re-use them to solve for each $x_i$ in turn.

This matrix factorization¹ viewpoint dates from around the 1940s and has been extremely successful in matrix computations. In general, a factorization is a representation of a matrix as a product of “simpler” matrices. Factorization is a tool that can be used to solve a variety of problems, as we will see below.

1. Or decomposition; the two terms are essentially synonymous.

Two particular benefits of factorizations are unity and modularity. GE, for example, can be organized in several different ways, corresponding to different orderings of the three nested loops that it comprises, as well as the use of different blockings of the matrix elements. Yet all of them compute the same LU factorization, carrying out the same mathematical operations in a different order. Without the unifying concept of a factorization, reasoning about these GE variants would be difficult.

Modularity refers to the way that a factorization breaks a problem down into separate tasks, which can be analyzed or programmed independently. To carry out a rounding error analysis of GE we can analyze the LU factorization and the solution of the triangular systems by substitution separately and then put the analyses together. The rounding error analysis of substitution can be re-used in the many other contexts in which triangular systems arise.

An important example of the use of LU factorization is in iterative refinement. Suppose we have used GE to obtain a computed solution $\widehat{x}$ to $Ax = b$ in floating-point arithmetic. If we form $r = b - A\widehat{x}$ and solve $Ae = r$, then in exact arithmetic $y = \widehat{x} + e$ is the true solution. In computing $e$ we can reuse the LU factors of $A$, so obtaining $y$ from $\widehat{x}$ is inexpensive. In practice, the computation of $r$, $e$, and $y$ is subject to rounding errors so the computed $\widehat{y}$ is not equal to $x$. But under suitable assumptions $\widehat{y}$ will be an improved approximation and we can iterate this refinement process. Iterative refinement is particularly effective if $r$ can be computed using extra precision.
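
The reuse of LU factors in iterative refinement can be sketched as follows. This is a minimal illustration under stated assumptions rather than a production solver: the helper names `lu_nopivot` and `lu_solve` are hypothetical, the factorization uses no pivoting (safe here only because the test matrix is constructed to be strictly diagonally dominant by rows), single precision plays the role of the working precision, and double precision plays the role of the extra precision used for the residual.

```python
import numpy as np

def lu_nopivot(A):
    """LU factorization A = L U without pivoting (L unit lower, U upper triangular)."""
    n = A.shape[0]
    U = A.copy()
    L = np.eye(n, dtype=A.dtype)
    for k in range(n - 1):
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]            # multipliers a_ik / a_kk
        U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])
    return L, np.triu(U)

def lu_solve(L, U, b):
    """Solve L U x = b by forward substitution, then back substitution."""
    n = len(b)
    y = np.zeros_like(b)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]                   # L has a unit diagonal
    x = np.zeros_like(b)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

rng = np.random.default_rng(1)                           # arbitrary seed
n = 50
A = rng.uniform(-1.0, 1.0, (n, n))
# Make A strictly diagonally dominant by rows so that no pivoting is needed.
A[np.arange(n), np.arange(n)] = np.sum(np.abs(A), axis=1) + 1.0
x_true = np.ones(n)
b = A @ x_true

# Factorize once in single precision (the working precision of this sketch).
L, U = lu_nopivot(A.astype(np.float32))
x = lu_solve(L, U, b.astype(np.float32)).astype(np.float64)
err_initial = np.max(np.abs(x - x_true))

for _ in range(3):
    r = b - A @ x                                        # residual in double precision
    e = lu_solve(L, U, r.astype(np.float32))             # reuse the LU factors
    x = x + e.astype(np.float64)
err_refined = np.max(np.abs(x - x_true))

print(f"error before refinement: {err_initial:.2e}")
print(f"error after refinement:  {err_refined:.2e}")
```

Each refinement step costs only a matrix–vector product and two triangular solves, which is cheap compared with the factorization itself.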

Two other key factorizations are:

• Cholesky factorization: for Hermitian positive definite $A \in \mathbb{C}^{n \times n}$, $A = R^*R$, where $R$ is upper triangular with positive diagonal elements, and this factorization is unique.

• QR factorization: for $A \in \mathbb{C}^{m \times n}$ with $m \ge n$, $A = QR$ where $Q \in \mathbb{C}^{m \times m}$ is unitary ($Q^*Q = I_m$) and $R \in \mathbb{C}^{m \times n}$ is upper trapezoidal, that is, $R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$ with $R_1 \in \mathbb{C}^{n \times n}$ upper triangular.

These two factorizations are related: if $A \in \mathbb{C}^{m \times n}$ with $m \ge n$ has full rank and $A = QR$ is a QR factorization, in which without loss of generality we can assume that $R$ has positive diagonal, then $A^*A = R^*R$, so $R$ is the Cholesky factor of $A^*A$.

The Cholesky factorization can be computed by what is essentially a symmetric and scaled version of GE. The QR factorization can be computed in three main ways, one of which is the classical Gram–Schmidt orthogonalization. The most widely used method constructs $Q$ as a product of Householder reflectors, which are unitary matrices of the form $H = I - 2vv^*/(v^*v)$, where $v$ is a nonzero vector. Note that $H$ is a rank 1 perturbation of the identity and since it is Hermitian and unitary it is its own inverse, that is, it is involutory. The third approach builds $Q$ as a product of Givens rotations, each of which is a $2 \times 2$ matrix $\begin{bmatrix} c & s \\ -s & c \end{bmatrix}$ embedded into two rows and columns of an $m \times m$ identity matrix, where (in the real case) $c^2 + s^2 = 1$.

The Cholesky factorization helps us to make the most of the very desirable property of positive definiteness. For example, suppose $A$ is Hermitian positive definite and we wish to evaluate the scalar $\alpha = x^*A^{-1}x$. We can rewrite it as $x^*(R^*R)^{-1}x = (x^*R^{-1})(R^{-*}x) = z^*z$, where $z = R^{-*}x$. So once the Cholesky factorization has been computed we need just one triangular solve to compute $\alpha$, and of course there is no need to explicitly invert the matrix $A$.

A matrix factorization might involve a larger number of factors: $A = N_1 N_2 \cdots N_k$, say. It is immediate that $A^T = N_k^T N_{k-1}^T \cdots N_1^T$. This factorization of the transpose may have deep consequences in a particular application. For example, the discrete Fourier transform is the matrix–vector product $y = F_n x$, where the $n \times n$ matrix $F_n$ has $(p, q)$ element $\exp(-2\pi i (p - 1)(q - 1)/n)$; $F_n$ is a complex, symmetric matrix. The fast Fourier transform (FFT) is a way of evaluating $y$ in $O(n \log_2 n)$ operations, as opposed to the $O(n^2)$ operations that are required by a standard matrix–vector multiplication. Many variants of the FFT have been proposed since the original 1965 paper by Cooley and Tukey. It turns out that different FFT variants correspond to different factorizations of $F_n$ with $k = \log_2 n$ sparse factors. Some of these methods correspond simply to transposing the factorization in another method (recall that $F_n^T = F_n$), though this was not realized when the methods were developed. Transposition also plays an important role in automatic differentiation: the so-called reverse or adjoint mode can be obtained by transposing a matrix factorization representation of the forward mode.

The factorizations described in this section are in “plain vanilla” form, but all have variants that incorporate pivoting. Pivoting refers to row or column interchanges carried out at each step of the factorization as it is computed, introduced either to ensure that the factorization succeeds and is numerically stable or to produce a factorization with certain desirable properties usually associated with rank deficiency. For GE, partial pivoting is normally used: at the start of the $k$th stage of the elimination an element $a_{rk}^{(k)}$ of largest modulus in the $k$th column below the diagonal is brought into the $(k, k)$ (pivot) position by interchanging rows $k$ and $r$. Partial pivoting avoids dividing by zero (if $a_{kk}^{(k)} = 0$ after the interchange then the pivot column is zero below the diagonal and the elimination step can be skipped). More importantly, partial pivoting ensures numerical stability; see section 8. The overall effect of GE with partial pivoting is to produce an LU factorization $PA = LU$, where $P$ is a permutation matrix.

Pivoted variants of Cholesky factorization and QR factorization take the form $P^TAP = R^*R$ and $AP = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$, where $P$ is a permutation matrix and $R$ satisfies the inequalities

\[ |r_{kk}|^2 \ge \sum_{i=k}^{j} |r_{ij}|^2, \quad j = k + 1 : n, \quad k = 1 : n. \]

If $A$ is rank deficient then $R$ has the form $R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}$ with $R_{11}$ nonsingular, and the rank of $A$ is the dimension of $R_{11}$. Equally importantly, when $A$ is nearly rank deficient this tends to be revealed by a small trailing diagonal block of $R$.

A factorization of great importance in a wide variety of applications is the singular value decomposition (SVD) of $A \in \mathbb{C}^{m \times n}$:

\[ A = U \Sigma V^*, \quad \Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_p) \in \mathbb{R}^{m \times n}, \tag{1} \]

where $p = \min(m, n)$, $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ are unitary, and the singular values $\sigma_i$ satisfy $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$. For a square $A$ ($m = n$), the 2-norm condition number is given by $\kappa_2(A) = \sigma_1/\sigma_n$.
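
As a quick check of the SVD relations just stated, the following sketch (assuming NumPy; the random real test matrix and seed are arbitrary choices for illustration) computes a full SVD, rebuilds $A$ from its factors, and compares $\sigma_1/\sigma_n$ with a library evaluation of the 2-norm condition number.

```python
import numpy as np

rng = np.random.default_rng(2)        # arbitrary seed
A = rng.standard_normal((5, 5))       # real square test matrix (illustrative)

# Full SVD: U and Vh = V^* are orthogonal here, and s holds
# sigma_1 >= sigma_2 >= ... >= sigma_5 >= 0.
U, s, Vh = np.linalg.svd(A)

A_rebuilt = U @ np.diag(s) @ Vh       # A = U * Sigma * V^*
kappa2 = s[0] / s[-1]                 # kappa_2(A) = sigma_1 / sigma_n

print("singular values:", s)
print("kappa_2(A) from the SVD:", kappa2)
print("kappa_2(A) from np.linalg.cond:", np.linalg.cond(A, 2))
```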

The polar decomposition of $A \in \mathbb{C}^{m \times n}$ with $m \ge n$ is a factorization $A = UH$ in which $U \in \mathbb{C}^{m \times n}$ has orthonormal columns and $H \in \mathbb{C}^{n \times n}$ is Hermitian positive semidefinite. The matrix $H$ is unique and is given by $(A^*A)^{1/2}$, where the exponent $1/2$ denotes the principal square root, while $U$ is unique if $A$ has full rank. The polar decomposition generalizes to matrices the polar representation $z = re^{i\theta}$ of a complex number. The Hermitian polar factor $H$ is also known as the matrix absolute value, $|A|$, and is much studied in matrix analysis and functional analysis.

One reason for the importance of the polar decomposition is that it provides an optimal way to orthogonalize a matrix: a result of Fan and Hoffman (1955) says that $U$ is the nearest matrix with orthonormal columns to $A$ in any unitarily invariant norm (a unitarily invariant norm is one with the property that $\|UAV\| = \|A\|$ for any unitary $U$ and $V$; the 2-norm and the Frobenius norm are particular examples). In various applications a matrix $A \in \mathbb{R}^{n \times n}$ that should be orthogonal drifts from orthogonality because of rounding or other errors; replacing it by the orthogonal polar factor $U$ is then a good strategy.

The polar decomposition also solves the orthogonal Procrustes problem, for $A, B \in \mathbb{C}^{m \times n}$,

\[ \min \bigl\{ \|A - BQ\|_F : Q \in \mathbb{C}^{n \times n}, \; Q^*Q = I \bigr\}, \]

for which any solution $Q$ is a unitary polar factor of $B^*A$. This problem comes from factor analysis and multidimensional scaling in statistics, where the aim is to see whether two data sets $A$ and $B$ are the same up to an orthogonal transformation.

Either of the SVD and the polar decomposition can be derived, or computed, from the other. Historically, the SVD came first (Beltrami, in 1873), with the polar decomposition three decades behind (Autonne, in 1902).

3 Distance to Singularity and Low-Rank Perturbations

The question commonly arises of whether a given perturbation of a nonsingular matrix $A$ preserves nonsingularity. In a sense, this question is trivial. Recalling that a square matrix is nonsingular when all its eigenvalues are nonzero, and that the product of two matrices is nonsingular unless one of them is singular, from $A + \Delta A = A(I + A^{-1}\Delta A)$ we see that $A + \Delta A$ is nonsingular as long as $A^{-1}\Delta A$ has no eigenvalue equal to $-1$. However, this is not an easy condition to check, and in practice we may not know $\Delta A$ but only a bound for its norm. Since any norm of a matrix is at least as large as the modulus of every eigenvalue, a sufficient condition for $A + \Delta A$ to be nonsingular is that $\|A^{-1}\Delta A\| < 1$, which is certainly true if $\|A^{-1}\| \|\Delta A\| < 1$. This condition can be rewritten as the inequality $\|\Delta A\|/\|A\| < \kappa(A)^{-1}$, where $\kappa(A) = \|A\| \|A^{-1}\| \ge 1$ is the condition number introduced in section 1. It turns out that we can always find a perturbation $\Delta A$ such that $A + \Delta A$ is singular and $\|\Delta A\|/\|A\| = \kappa(A)^{-1}$. It follows that the relative distance to singularity

\[ d(A) = \min \bigl\{ \|\Delta A\|/\|A\| : A + \Delta A \text{ is singular} \bigr\} \tag{2} \]

is given by $d(A) = \kappa(A)^{-1}$. This reciprocal relation between problem conditioning and the distance to a singular problem (one with an infinite condition number) is common to a variety of problems in linear algebra and control theory, as shown by James Demmel in the 1980s.

We may want a more refined test for whether $A + \Delta A$ is nonsingular. To obtain one we will need to make some assumptions about the perturbation. Suppose that $\Delta A$ has rank 1: $\Delta A = xy^*$, for some vectors $x$ and $y$. From the analysis above we know that $A + \Delta A$ will be nonsingular if $A^{-1}\Delta A = A^{-1}xy^*$ has no eigenvalue equal to $-1$. Using the fact that the nonzero eigenvalues of $AB$ are the same as those of $BA$ for any conformable matrices $A$ and $B$, we see that the nonzero eigenvalues of $(A^{-1}x)y^*$ are the same as those of $y^*A^{-1}x$. Hence $A + xy^*$ is nonsingular as long as $y^*A^{-1}x \ne -1$.

Now that we know when $A + xy^*$ is nonsingular we might ask if there is an explicit formula for the inverse. Since $A + xy^* = A(I + A^{-1}xy^*)$ we can take $A = I$ without loss of generality. So we are looking for the inverse of $B = I + xy^*$. One way to find it is to guess that $B^{-1} = I + \theta xy^*$ for some scalar $\theta$ and equate the product with $B$ to $I$, to obtain $\theta(1 + y^*x) + 1 = 0$. Thus $(I + xy^*)^{-1} = I - xy^*/(1 + y^*x)$. The corresponding formula for $(A + xy^*)^{-1}$ is

\[ (A + xy^*)^{-1} = A^{-1} - A^{-1}xy^*A^{-1}/(1 + y^*A^{-1}x), \]

which is known as the Sherman–Morrison formula. This formula and its generalizations originate in the 1940s and have been rediscovered many times. The corresponding formula for a rank $p$ perturbation is the Sherman–Morrison–Woodbury formula: for $U, V \in \mathbb{C}^{n \times p}$,

\[ (A + UV^*)^{-1} = A^{-1} - A^{-1}U(I + V^*A^{-1}U)^{-1}V^*A^{-1}. \]

Important applications of these formulae are in optimization, where rank-1 or rank-2 updates are made to Hessian approximations in quasi-Newton methods and to basis matrices in the simplex method. More generally, the task of updating the solution to a problem after a coefficient matrix has undergone a low-rank change, or has had a row or column added or removed, arises in many applications, including signal processing, where new data is continually being received and old data is discarded.
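
The Sherman–Morrison formula can be used to solve a rank-1 updated system with solves involving $A$ only. Below is a minimal sketch assuming NumPy and real data (so that $y^* = y^T$); the function name `sherman_morrison_solve`, its `A_solve` callback argument, and the small test data are hypothetical choices for illustration. For this data $y^T A^{-1} x = 13/18$, so the denominator $1 + y^T A^{-1} x = 31/18$ is safely nonzero.

```python
import numpy as np

def sherman_morrison_solve(A_solve, x, y, b):
    """Solve (A + x y^T) z = b using only solves with A.

    A_solve(v) must return A^{-1} v. Returns (z, denom), where
    denom = 1 + y^T A^{-1} x must be nonzero for A + x y^T to be nonsingular.
    """
    Ainv_b = A_solve(b)
    Ainv_x = A_solve(x)
    denom = 1.0 + y @ Ainv_x
    if denom == 0.0:
        raise ValueError("A + x y^T is singular: y^T A^{-1} x = -1")
    # z = A^{-1} b - A^{-1} x (y^T A^{-1} b) / (1 + y^T A^{-1} x)
    return Ainv_b - Ainv_x * (y @ Ainv_b) / denom, denom

# Small illustrative data (not from the article).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 1.0, 1.0])
b = np.array([1.0, 2.0, 3.0])

z, denom = sherman_morrison_solve(lambda v: np.linalg.solve(A, v), x, y, b)
z_direct = np.linalg.solve(A + np.outer(x, y), b)   # reference solution

print("1 + y^T A^{-1} x =", denom)
print("max difference from direct solve:", np.max(np.abs(z - z_direct)))
```

In practice `A_solve` would reuse an existing factorization of $A$, so the update costs two triangular-solve pairs instead of a new factorization.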

The minimal distance in the definition (2) of the distance to singularity $d(A)$ can be shown to be attained for a rank-1 matrix $\Delta A$. Rank-1 matrices often feature in the solutions of matrix optimization problems.

4 Computational Cost

In order to compare competing methods and predict their practical efficiency we need to know their computational cost. Traditionally, computational cost has been measured by counting the number of scalar arithmetic operations and retaining only the highest order terms in the total. For example, using GE we can solve a system of $n$ linear equations in $n$ unknowns with $n^3/3 + O(n^2)$ additions, $n^3/3 + O(n^2)$ multiplications, and $O(n)$ divisions. This is typically summarized as $2n^3/3$ flops, where a flop denotes any of the scalar operations $+$, $-$, $*$, $/$. Most standard problems involving $n \times n$ matrices can be solved with a cost of order $n^3$ flops or less, so the interest is in the exponent (1, 2, or 3) and the constant of the dominant term. However, the costs of moving data around a computer's hierarchical memory and the costs of communicating between different processors on a multiprocessor system can be equally important. Simply counting flops does not therefore necessarily give a good guide to performance in practice.

Seemingly trivial problems can offer interesting challenges as regards minimizing arithmetic costs. For matrices $A$, $B$, and $C$ of any dimensions such that the product $ABC$ is defined, how should we compute the product? The associative law for matrix multiplication tells us that $(AB)C = A(BC)$, but this mathematical equivalence is not a computational one. To see why, note that for three vectors $a, b, c \in \mathbb{R}^n$ we can write

\[ \underbrace{(ab^*)}_{n \times n} c = a \underbrace{(b^*c)}_{1 \times 1}. \]

Evaluation of the left-hand side requires $O(n^2)$ flops, as there is an outer product $ab^*$ and then a matrix–vector product to evaluate, while evaluation of the right-hand side requires just $O(n)$ flops, as it involves only vector operations: an inner product and a vector scaling. One should always be alert for opportunities to use the associative law to save computational effort.

5 Eigenvalue Problems

The eigenvalue problem $Ax = \lambda x$ for a square matrix $A \in \mathbb{C}^{n \times n}$, which seeks an eigenvalue $\lambda \in \mathbb{C}$ and an eigenvector $x \ne 0$, arises in many forms. Depending on the application we may want all the eigenvalues or just a subset, such as the 10 that have the largest real part, and eigenvectors may or may not be required as well. Whether the problem is Hermitian or non-Hermitian changes its character greatly. In particular, while a Hermitian matrix has real eigenvalues and a linearly independent set of $n$ eigenvectors that can be taken to be orthonormal, the eigenvalues of a non-Hermitian matrix can be anywhere in the complex plane and there may not be a set of eigenvectors that spans $\mathbb{C}^n$.

5.1 Bounds and Localization

One of the first questions to ask is whether we can find a finite region containing the eigenvalues. The answer is yes, because $Ax = \lambda x$ implies $|\lambda| \|x\| = \|Ax\| \le \|A\| \|x\|$, and hence $|\lambda| \le \|A\|$. So all the eigenvalues lie in a disc of radius $\|A\|$ about the origin. More refined bounds are provided by Gershgorin's theorem.

Theorem 1 (Gershgorin's theorem, 1931). The eigenvalues of $A \in \mathbb{C}^{n \times n}$ lie in the union of the $n$ discs in the complex plane

\[ D_i = \Bigl\{ z \in \mathbb{C} : |z - a_{ii}| \le \sum_{j \ne i} |a_{ij}| \Bigr\}, \quad i = 1 : n. \]

An extension of the theorem says that if $k$ discs form a connected region that is isolated from the other discs then there are precisely $k$ eigenvalues in this region.

The Gershgorin discs for the matrix

\[ \begin{bmatrix} -1 & 1/3 & 1/3 & 1/3 \\ 3/2 & -2 & 0 & 0 \\ 1/2 & 0 & 3 & 1/4 \\ 1 & 0 & -1 & 6 \end{bmatrix} \tag{3} \]

are shown in figure 1. We can conclude that there is one eigenvalue in the disc centered at 3, one in the disc centered at 6, and two in the union of the other two discs.

Gershgorin's theorem is most useful for matrices that are close to diagonal, such as those eventually produced by the Jacobi iterative method for eigenvalues of Hermitian matrices. Improved estimates can be sought by applying Gershgorin's theorem to a matrix $D^{-1}AD$ similar to $A$, with the diagonal matrix $D$ chosen in an attempt to isolate and shrink the discs. Many variants of Gershgorin's theorem exist with discs replaced by other shapes.
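
The conclusions drawn from Gershgorin's theorem for the matrix in (3) can be checked numerically. The sketch below assumes NumPy; the variable names are illustrative. It computes the disc centers and radii, then counts how many computed eigenvalues fall in each disc.

```python
import numpy as np

# The matrix (3) from the text.
A = np.array([[-1.0, 1/3,  1/3, 1/3],
              [ 1.5, -2.0, 0.0, 0.0],
              [ 0.5,  0.0, 3.0, 0.25],
              [ 1.0,  0.0, -1.0, 6.0]])

centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # off-diagonal absolute row sums
eigvals = np.linalg.eigvals(A)

# in_disc[i, j] is True when eigenvalue j lies in the disc D_i.
in_disc = np.abs(eigvals[None, :] - centers[:, None]) <= radii[:, None]

for c, r, row in zip(centers, radii, in_disc):
    print(f"disc with center {c:5.2f} and radius {r:4.2f} "
          f"contains {int(row.sum())} eigenvalue(s)")
```

The discs centered at $-1$ and $-2$ overlap, so an eigenvalue may be counted in both of them; the theorem only guarantees that their union contains exactly two eigenvalues.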

Figure 1 Gershgorin discs for the matrix in (3); the eigenvalues are marked as solid dots.

The spectral radius $\rho(A)$ (the largest absolute value of any eigenvalue of $A$) satisfies $\rho(A) \le \|A\|$, as shown above, but this inequality can be arbitrarily weak, as the matrix $\begin{bmatrix} 1 & \theta \\ 0 & 1 \end{bmatrix}$ shows for $|\theta| \gg 1$. It is natural to ask whether there are any sharper relations between the spectral radius and norms. One answer is the equality

\[ \rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}. \tag{4} \]

Another is the result that given any $\varepsilon > 0$ there is a norm such that $\|A\| \le \rho(A) + \varepsilon$; however, the norm depends on $A$. This result can be used to give a proof of the fact, discussed in the article on the Jordan canonical form, that the powers of $A$ converge to zero if $\rho(A) < 1$.

The field of values, also known as the numerical range, is a tool that can be used for localization and many other purposes. It is defined for $A \in \mathbb{C}^{n \times n}$ by

\[ F(A) = \Bigl\{ \frac{z^*Az}{z^*z} : 0 \ne z \in \mathbb{C}^n \Bigr\}. \]

The set $F(A)$ is compact and convex (a nontrivial property proved by Toeplitz and Hausdorff) and it contains all the eigenvalues of $A$. For normal matrices it is the convex hull of the eigenvalues. The normal matrices $A$ are those for which $AA^* = A^*A$, and they include the Hermitian, the skew-Hermitian, and the unitary matrices. For a Hermitian matrix $F(A)$ is a segment of the real axis while for a skew-Hermitian matrix it is a segment of the imaginary axis. Figure 2 illustrates two fields of values, the second of which is the convex hull of the eigenvalues because a circulant matrix is normal.

Figure 2 Fields of values for a pentadiagonal Toeplitz matrix (left) and a circulant matrix (right), both of dimension 32. The eigenvalues are denoted by crosses.

5.2 Eigenvalue Sensitivity

If $A$ is perturbed how much do its eigenvalues change? This question is easy to answer for a simple eigenvalue $\lambda$, that is, one that has algebraic multiplicity 1. We need the notion of a left eigenvector of $A$ corresponding to $\lambda$, which is a nonzero vector $y$ such that $y^*A = \lambda y^*$. If $\lambda$ is simple with right and left eigenvectors $x$ and $y$, respectively, then there is an eigenvalue $\lambda + \Delta\lambda$ of $A + \Delta A$ such that $\Delta\lambda = y^*\Delta A x/(y^*x) + O(\|\Delta A\|^2)$ and so

\[ |\Delta\lambda| \le \frac{\|y\|_2 \|x\|_2}{|y^*x|} \|\Delta A\| + O(\|\Delta A\|^2). \]

The term $\|y\|_2 \|x\|_2/|y^*x|$ can be shown to be an (absolute) condition number for $\lambda$. It is at least 1 and tends to infinity as $y$ and $x$ approach orthogonality (which can never exactly be achieved for simple $\lambda$), so $\lambda$ can be very ill conditioned. However if $A$ is Hermitian then we can take $y = x$ and the bound simplifies to $|\Delta\lambda| \le \|\Delta A\| + O(\|\Delta A\|^2)$, so all the eigenvalues of a Hermitian matrix are perfectly conditioned.

Much research has been done to obtain eigenvalue perturbation bounds under both weaker and stronger assumptions about the problem. Suppose we drop the requirement that $\lambda$ is simple. Consider the matrix and perturbation

\[ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad \Delta A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ \varepsilon & 0 & 0 \end{bmatrix}. \]

The eigenvalues of $A$ are all zero and those of $A + \Delta A$ are the third roots of $\varepsilon$. The change in the eigenvalue is proportional not to $\varepsilon$ but to a fractional power of $\varepsilon$. In general, the sensitivity of an eigenvalue depends on the Jordan structure for that eigenvalue.
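
The fractional-power behavior in the $3 \times 3$ example can be observed directly. In this minimal sketch (assuming NumPy; the helper name `perturbed_moduli` and the sample values of $\varepsilon$ are illustrative), the moduli of the computed eigenvalues of $A + \Delta A$ are compared with $\varepsilon^{1/3}$.

```python
import numpy as np

# The 3x3 Jordan block from the text: all eigenvalues are zero.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

def perturbed_moduli(eps):
    """Moduli of the eigenvalues of A + dA, where dA has eps in position (3, 1)."""
    dA = np.zeros((3, 3))
    dA[2, 0] = eps
    return np.abs(np.linalg.eigvals(A + dA))

for eps in (1e-6, 1e-9, 1e-12):
    moduli = perturbed_moduli(eps)
    print(f"eps = {eps:.0e}: |lambda| = {moduli}, eps**(1/3) = {eps ** (1 / 3):.3e}")
```

A perturbation of size $10^{-9}$ thus moves the eigenvalues by about $10^{-3}$, six orders of magnitude more than the perturbation itself.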

5.3 Companion Matrices and the Characteristic Polynomial

The eigenvalues of a matrix $A$ are the roots of its characteristic polynomial, $\det(\lambda I - A)$. Conversely, associated with the polynomial

\[ p(\lambda) = \lambda^n - a_{n-1}\lambda^{n-1} - \cdots - a_0 \]

is the companion matrix

\[ C = \begin{bmatrix} a_{n-1} & a_{n-2} & \cdots & \cdots & a_0 \\ 1 & 0 & \cdots & \cdots & 0 \\ 0 & 1 & \ddots & & \vdots \\ \vdots & \ddots & \ddots & 0 & \vdots \\ 0 & \cdots & \cdots & 1 & 0 \end{bmatrix}, \]

and the eigenvalues of $C$ are the roots of $p$.

This relation means that the roots of a polynomial can be found by computing the eigenvalues of an $n \times n$ matrix, and this approach is used by some computer codes, for example the roots function of MATLAB. While standard eigenvalue algorithms do not exploit the structure of $C$, this approach has proved competitive with specialist polynomial root-finding algorithms. Another use for the relation is to obtain bounds for roots of polynomials from bounds for matrix eigenvalues, and vice versa.

Companion matrices have many interesting properties. For example, any nonderogatory $n \times n$ matrix is similar to a companion matrix. Companion matrices therefore have featured strongly in matrix analysis and also in control theory. However, similarity transformations to companion form are little used in practice because of problems with ill conditioning and numerical instability.

Returning to the characteristic polynomial, $p(\lambda) = \det(\lambda I - A) = \lambda^n - a_{n-1}\lambda^{n-1} - \cdots - a_0$, we know that $p(\lambda_i) = 0$ for every eigenvalue $\lambda_i$ of $A$. The Cayley–Hamilton theorem says that $p(A) = A^n - a_{n-1}A^{n-1} - \cdots - a_0 I = 0$ (which cannot be obtained simply by putting “$\lambda = A$” in the previous expression!). Hence the $n$th power of $A$, and inductively all higher powers, are expressible as a linear combination of $I, A, \dots, A^{n-1}$. Moreover, if $A$ is nonsingular then from $A^{-1}p(A) = 0$ it follows that $A^{-1}$ can also be written as a polynomial in $A$ of degree at most $n - 1$. These relations are not useful for practical computation because the coefficients $a_i$ can vary tremendously in magnitude and it is not possible to compute them to high relative accuracy.

5.4 Eigenvalue Inequalities for Hermitian Matrices

The eigenvalues of Hermitian matrices $A \in \mathbb{C}^{n \times n}$, which in this section we order $\lambda_n \le \cdots \le \lambda_1$, satisfy many beautiful inequalities. Among the most important are those in the Courant–Fischer theorem (1905), which states that every eigenvalue is the solution of a min-max problem over a suitable subspace $S$ of $\mathbb{C}^n$:

\[ \lambda_i = \min_{\dim(S) = n - i + 1} \; \max_{0 \ne x \in S} \frac{x^*Ax}{x^*x}. \]

Special cases are $\lambda_n = \min_{x \ne 0} x^*Ax/(x^*x)$ and $\lambda_1 = \max_{x \ne 0} x^*Ax/(x^*x)$.

Taking $x$ to be a unit vector $e_i$ in the previous formula for $\lambda_1$ gives $\lambda_1 \ge a_{ii}$ for all $i$. This inequality is just the first in a sequence of inequalities relating sums of eigenvalues to sums of diagonal elements, obtained by Schur in 1923:

\[ \sum_{i=1}^{k} \lambda_i \ge \sum_{i=1}^{k} \widetilde{a}_{ii}, \quad k = 1 : n, \tag{5} \]

where $\{\widetilde{a}_{ii}\}$ is the set of diagonal elements of $A$ arranged in decreasing order: $\widetilde{a}_{11} \ge \cdots \ge \widetilde{a}_{nn}$. There is equality for $k = n$, since both sides equal $\mathrm{trace}(A)$. These inequalities say that the vector $[\lambda_1, \dots, \lambda_n]$ of eigenvalues majorizes the vector $[\widetilde{a}_{11}, \dots, \widetilde{a}_{nn}]$ of diagonal elements.

In general there is no useful formula for the eigenvalues of a sum $A + B$ of Hermitian matrices. However, the Courant–Fischer theorem yields the upper and lower bounds

\[ \lambda_k(A) + \lambda_n(B) \le \lambda_k(A + B) \le \lambda_k(A) + \lambda_1(B), \]

from which it follows that $|\lambda_k(A + B) - \lambda_k(A)| \le \max(|\lambda_n(B)|, |\lambda_1(B)|) = \|B\|_2$. The latter inequality again shows that the eigenvalues of a Hermitian matrix are well conditioned under perturbation.

The Cauchy interlace theorem has a different flavor. It relates the eigenvalues of successive leading principal submatrices $A_k = A(1 : k, 1 : k)$ by

\[ \lambda_{k+1}(A_{k+1}) \le \lambda_k(A_k) \le \lambda_k(A_{k+1}) \le \cdots \le \lambda_2(A_{k+1}) \le \lambda_1(A_k) \le \lambda_1(A_{k+1}) \]

for $k = 1 : n - 1$, showing that the eigenvalues of $A_k$ interlace those of $A_{k+1}$.

In 1962 Alfred Horn made a conjecture that a certain set of linear inequalities involving real numbers $\alpha_i$, $\beta_i$, and $\gamma_i$, $i = 1 : n$, is necessary and sufficient for the existence of $n \times n$ Hermitian matrices $A$, $B$, and $C$ with eigenvalues the $\alpha_i$, $\beta_i$, and $\gamma_i$, respectively, such that $C = A + B$. The conjecture was open for many years but was finally proved to be true in papers published by Klyachko in 1998 and Knutson and Tao in 1999, which exploit deep connections with algebraic geometry, representations of Lie groups, and quantum cohomology.
vector $x$ by $A$. Since the resulting sequence is liable to overflow or underflow in floating-point arithmetic one normalizes the vector after each iteration. Therefore one step of the power method has the form $x \leftarrow Ax$, $x \leftarrow \nu^{-1}x$, where $\nu = x_j$ with $|x_j| = \max_i |x_i|$. If $A$ has a unique eigenvalue $\lambda$ of largest modulus and the starting vector has a component in the direction of the corresponding eigenvector then $\nu$ converges to $\lambda$ and $x$ converges to the corresponding eigenvector. The power method is most often applied to $(A - \mu I)^{-1}$, where $\mu$ is an approximation to an eigenvalue of interest. In this form it is known as inverse iteration and convergence is to the eigenvalue closest to $\mu$. We now turn to methods that compute all the eigenvalues.

Since similarities $X^{-1}AX$ preserve the eigenvalues and change the eigenvectors in a controlled way, carrying out a sequence of similarity transformations to reduce $A$ to a simpler form is a natural way to tackle the eigenproblem. Some early methods used nonunitary $X$, but such transformations are now avoided because of numerical instability when $X$ is ill conditioned. Since the 1960s the focus has been on using unitary similarities to compute the Schur decomposition $A = QTQ^*$, where $Q$ is unitary and $T$ is upper triangular. The diagonal entries of $T$ are the eigenvalues of $A$, and they can be made to appear in any order by appropriate choice of $Q$. The first $k$ columns of $Q$ span an invariant subspace corresponding to the eigenvalues $t_{11}, \dots, t_{kk}$. Eigenvectors can be obtained by solving triangular systems involving $T$.

For some matrices the Schur factor $T$ is diagonal; these are precisely the normal matrices defined in section 5.1. The real Schur decomposition contains only real matrices when $A$ is real: $A = QRQ^T$, where $Q$ is orthogonal and $R$ is real upper quasi-triangular, which means that $R$ is upper triangular except for $2 \times 2$ blocks on the diagonal corresponding to complex conjugate eigenvalues.

The standard algorithm for solving the non-Hermitian eigenproblem is the QR algorithm, which was proposed independently by John Francis and Vera Kublanovskaya in 1961. The matrix $A \in \mathbb{C}^{n \times n}$ is first unitarily reduced to upper Hessenberg form $H = U^*AU$ ($h_{ij} = 0$ for $i > j + 1$), with $U$ a product of Householder matrices. The QR iteration constructs a sequence of upper Hessenberg matrices beginning with $H_1 = H$ defined by $H_k - \mu_k I =: Q_kR_k$ (QR factorization, computed using Givens rotations), $H_{k+1} := R_kQ_k + \mu_k I$, where the $\mu_k$ are shifts chosen to accelerate the convergence of $H_k$ to upper triangular form. It is easy to check that $H_{k+1} = Q_k^*H_kQ_k$, so the QR iteration carries out a sequence of unitary similarity transformations.

Why the QR iteration works is not obvious but can be elegantly explained by analyzing the subspaces spanned by the columns of $Q_k$. To produce a practical and efficient algorithm various refinements of the iteration are needed, which include

• deflation, whereby when an element on the first subdiagonal of $H_k$ becomes small, that element is set to zero and the problem is split into two smaller problems that are solved independently,
• a double shift technique for real $A$ that allows two QR steps with complex conjugate shifts to be carried out entirely in real arithmetic and gives convergence to the real Schur form,
• a multishift technique for including $m$ different shifts in a single QR iteration.

A proof of convergence is lacking for all current shift strategies. Implementations introduce a random shift when convergence appears to be stagnating. The QR algorithm works very well in practice and continues to be the method of choice for the non-Hermitian eigenproblem.

5.6 Solving the Hermitian Eigenproblem

The eigenvalue problem for Hermitian matrices is easier to solve than that for non-Hermitian matrices and the range of available numerical methods is much wider.

To solve the complete Hermitian eigenproblem we need to compute the spectral decomposition $A = QDQ^*$, where $D = \operatorname{diag}(\lambda_i)$ contains the eigenvalues and the columns of the unitary matrix $Q$ are the corresponding eigenvectors. Many methods begin by unitary reduction to tridiagonal form $T = U^*AU$, where $t_{ij} = 0$ for $|i - j| > 1$ and the unitary matrix $U$ is constructed as a product of Householder matrices. The eigenvalue problem for $T$ is much simpler, though still nontrivial. The most widely used method is the QR algorithm, which has the same form as in the non-Hermitian case but with the upper Hessenberg $H_k$ replaced by the Hermitian tridiagonal $T_k$ and the shifts chosen to accelerate the convergence of $T_k$ to diagonal form. The Hermitian QR algorithm with appropriate shifts has been proved to converge at a cubic rate.

Another method for solving the Hermitian tridiagonal eigenproblem is the divide and conquer method.
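As an aside on the Hermitian QR algorithm described above, the following is a minimal sketch of the shifted QR iteration with deflation at the bottom of the matrix, for a real symmetric tridiagonal $T$. Several choices here are assumptions made for illustration and are not taken from the text: the function names `wilkinson_shift` and `tridiag_qr_eigenvalues` are hypothetical, the shift is the Wilkinson shift (one common choice of "appropriate shift"), and a dense `numpy.linalg.qr` factorization stands in for the Givens rotations that a practical implementation would use.

```python
import numpy as np


def wilkinson_shift(a, b, c):
    """Return the eigenvalue of [[a, b], [b, c]] that is closest to c."""
    d = (a - c) / 2.0
    sign = 1.0 if d >= 0 else -1.0
    return c - sign * b * b / (abs(d) + np.hypot(d, b))


def tridiag_qr_eigenvalues(T, tol=1e-14, max_sweeps=500):
    """Eigenvalues of a real symmetric tridiagonal matrix by shifted QR.

    Illustrative only: each step forms T_k - mu*I = Q*R and then
    T_{k+1} = R*Q + mu*I, deflating the last row once its subdiagonal
    entry is negligible.
    """
    T = np.array(T, dtype=float)
    n = T.shape[0]
    eigenvalues = []
    while n > 1:
        for _ in range(max_sweeps):
            a, b, c = T[n - 2, n - 2], T[n - 1, n - 2], T[n - 1, n - 1]
            # Deflation test on the last subdiagonal entry.
            if abs(b) <= tol * (abs(a) + abs(c)):
                break
            shift = wilkinson_shift(a, b, c) * np.eye(n)
            Q, R = np.linalg.qr(T[:n, :n] - shift)
            T[:n, :n] = R @ Q + shift
        eigenvalues.append(T[n - 1, n - 1])
        n -= 1  # continue with the leading (n-1) x (n-1) block
    eigenvalues.append(T[0, 0])
    return np.sort(np.array(eigenvalues))


if __name__ == "__main__":
    # Tridiagonal test matrix with -2 on the diagonal and 1 next to it.
    m = 6
    T = -2 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)
    print(tridiag_qr_eigenvalues(T))
    print(np.linalg.eigvalsh(T))  # library reference values for comparison
```

The sketch deflates only at the bottom of the matrix; production codes also split the problem when an interior subdiagonal entry becomes negligible and never form $Q$ and $R$ as dense matrices.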
This method decouples $T$ in the form
$$T = \begin{bmatrix} T_{11} & 0 \\ 0 & T_{22} \end{bmatrix} + \alpha vv^*,$$
where only the trailing diagonal element of $T_{11}$ and the leading diagonal element of $T_{22}$ differ from the corresponding elements of $T$ and hence the vector $v$ has only two nonzero elements. The eigensystems of $T_{11}$ and $T_{22}$ are found by applying the method recursively, yielding $T_{11} = Q_1\Lambda_1Q_1^*$ and $T_{22} = Q_2\Lambda_2Q_2^*$. Then
$$T = \begin{bmatrix} Q_1\Lambda_1Q_1^* & 0 \\ 0 & Q_2\Lambda_2Q_2^* \end{bmatrix} + \alpha vv^* = \operatorname{diag}(Q_1, Q_2)\bigl(\operatorname{diag}(\Lambda_1, \Lambda_2) + \alpha\tilde{v}\tilde{v}^*\bigr)\operatorname{diag}(Q_1, Q_2)^*,$$
where $\tilde{v} = \operatorname{diag}(Q_1, Q_2)^*v$. The eigensystem of a rank-1 perturbed diagonal matrix $D + \rho zz^*$ can be found by solving the secular equation obtained by equating the characteristic polynomial to zero:
$$f(\lambda) = 1 + \rho\sum_{j=1}^{n} \frac{|z_j|^2}{d_{jj} - \lambda} = 0.$$
Putting the pieces together yields the overall eigendecomposition.

Other methods are suitable for computing just a portion of the spectrum. Suppose we want to compute the $k$th smallest eigenvalue of $T$ and that we can somehow compute the integer $N(x)$ equal to the number of eigenvalues of $T$ that are less than or equal to $x$. Then we can apply the bisection method to $N(x)$ to find the point where $N(x)$ jumps from $k - 1$ to $k$. We can compute $N(x)$ by making use of the following result about the inertia of a Hermitian matrix, defined by $\operatorname{inertia}(A) = (\nu, \zeta, \pi)$, where $\nu$ is the number of negative eigenvalues, $\zeta$ is the number of zero eigenvalues, and $\pi$ is the number of positive eigenvalues.

Theorem 2 (Sylvester’s inertia theorem). If $A$ is Hermitian and $M$ is nonsingular then $\operatorname{inertia}(A) = \operatorname{inertia}(M^*AM)$.

Sylvester’s inertia theorem says that the number of negative, zero, and positive eigenvalues does not change under congruence transformations. By using GE we can factorize$^2$ $T - xI = LDL^*$, where $D$ is diagonal and $L$ is unit lower bidiagonal (a bidiagonal matrix is one that is both triangular and tridiagonal). Then $\operatorname{inertia}(T - xI) = \operatorname{inertia}(D)$, so the number of negative diagonal or zero elements of $D$ equals the number of eigenvalues of $T - xI$ less than or equal to 0, which is the number of eigenvalues of $T$ less than or equal to $x$, that is, $N(x)$. The $LDL^*$ factors of a tridiagonal matrix can be computed in $O(n)$ flops, so this bisection process is efficient. An alternative approach can be built by using properties of Sturm sequences, which are sequences comprising the characteristic polynomials of leading principal submatrices of $T - \lambda I$.

2. The factorization may not exist, but if it does not we can simply perturb $T$ slightly and try again without any loss of numerical stability.

5.7 Computing the SVD

For a rectangular matrix $A \in \mathbb{C}^{m \times n}$ the eigenvalues of the Hermitian matrix $\begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix}$ of dimension $m + n$ are plus and minus the nonzero singular values of $A$ along with $m + n - 2\min(m, n)$ zeros. Hence the SVD can be computed via the eigendecomposition of this larger matrix. However, this would be inefficient, and instead one uses algorithms that work directly on $A$ and are analogues of the algorithms for Hermitian matrices. The standard approach is to reduce $A$ to bidiagonal form $B$ by Householder transformations applied on the left and the right and then to apply an adaptation of the QR algorithm that works on the bidiagonal factor (and implicitly applies the QR algorithm to the tridiagonal matrix $B^*B$).

5.8 Generalized Eigenproblems

The generalized eigenvalue problem (GEP) $Ax = \lambda Bx$, with $A, B \in \mathbb{C}^{n \times n}$, can be converted into a standard eigenvalue problem if $B$ (say) is nonsingular: $B^{-1}Ax = \lambda x$. However, such a transformation is inadvisable numerically unless $B$ is very well conditioned. If $A$ and $B$ have a common null vector $z$ the problem takes on a different character because then $(A - \lambda B)z = 0$ for any $\lambda$; such a problem is called singular. We will assume that the problem is regular, so that $\det(A - \lambda B) \not\equiv 0$. The linear polynomial $A - \lambda B$ is sometimes called a pencil. It is convenient to write $\lambda = \alpha/\beta$, where $\alpha$ and $\beta$ are not both zero, and rephrase the problem in the more symmetric form $\beta Ax = \alpha Bx$. If $x$ is a nonzero vector such that $Bx = 0$ then, since the problem is assumed to be regular, $Ax \ne 0$ and so $\beta = 0$. This means that $\lambda = \infty$ is an eigenvalue. Infinite eigenvalues may seem a strange concept, but in fact they are no different in most respects to finite eigenvalues.

An important special case is the definite generalized eigenvalue problem, in which $A$ and $B$ are Hermitian and $B$ (say) is positive definite. If $B = R^*R$ is a Cholesky factorization then $Ax = \lambda Bx$ can be rewritten as $R^{-*}AR^{-1} \cdot Rx = \lambda Rx$, which is a standard eigenproblem for the Hermitian matrix $C = R^{-*}AR^{-1}$. This
argument shows that the eigenvalues of a definite problem are all real. Definite generalized eigenvalue problems arise in many physical situations where an energy minimization principle is at work, such as in problems in engineering and physics.

A generalization of the QR algorithm called the QZ algorithm computes a generalization to two matrices of the Schur decomposition: $Q^*AZ = T$, $Q^*BZ = S$, where $Q$ and $Z$ are unitary and $T$ and $S$ are upper triangular. The generalized Schur decomposition yields the eigenvalues as the ratios $t_{ii}/s_{ii}$ and enables eigenvectors to be computed by substitution.

The quadratic eigenvalue problem (QEP) $Q(\lambda)x = (\lambda^2A_2 + \lambda A_1 + A_0)x = 0$, where $A_i \in \mathbb{C}^{n \times n}$, $i = 0:2$, arises most commonly in the dynamic analysis of structures when the finite element method is used to discretize the original PDE into a system of second-order ODEs $A_2\ddot{q}(t) + A_1\dot{q}(t) + A_0q(t) = f(t)$. Here, the $A_i$ are usually Hermitian (though $A_1$ is skew-Hermitian in gyroscopic systems) and positive (semi)definite. Analogously to the GEP, the QEP is said to be regular if $\det(Q(\lambda)) \not\equiv 0$. The quadratic problem differs fundamentally from the linear GEP because a regular problem has $2n$ eigenvalues, which are the roots of $\det(Q(\lambda)) = 0$, but at most $n$ linearly independent eigenvectors, and a vector may be an eigenvector for two different eigenvalues. For example, the QEP with
$$Q(\lambda) = \lambda^2I + \lambda\begin{bmatrix} -1 & -6 \\ 2 & -9 \end{bmatrix} + \begin{bmatrix} 0 & 12 \\ -2 & 14 \end{bmatrix}$$
has eigenvalues 1, 2, 3, and 4, with eigenvectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, respectively. Moreover, there is no Schur form for three or more matrices, that is, we cannot in general find unitary matrices $U$ and $V$ such that $U^*A_iV$ is triangular for $i = 0:2$.

Associated with the QEP is the matrix $Q(X) = A_2X^2 + A_1X + A_0$, with $X \in \mathbb{C}^{n \times n}$. From the relation
$$Q(\lambda) - Q(X) = A_2(\lambda^2I - X^2) + A_1(\lambda I - X) = (\lambda A_2 + A_2X + A_1)(\lambda I - X)$$
it is clear that if we can find a matrix $X$ such that $Q(X) = 0$, known as a solvent, then we have reduced the QEP to finding the eigenvalues of $X$ and solving one $n \times n$ GEP. For the $2 \times 2$ $Q$ above there are five solvents, one of which is $\begin{bmatrix} 3 & 0 \\ 1 & 2 \end{bmatrix}$. The existence and enumeration of solvents is nontrivial and leads into the theory of matrix polynomials. In general, matrix polynomials are matrices of the form $\sum_{i=0}^{k} \lambda^iA_i$ whose elements are polynomials in a complex variable; an older term for such matrices is $\lambda$-matrices.

The standard approach for numerical solution of the QEP mimics the conversion of the scalar polynomial root problem into a matrix eigenproblem described in section 5.3. From the relation
$$L(\lambda)z \equiv \left(\begin{bmatrix} A_1 & A_0 \\ I & 0 \end{bmatrix} + \lambda\begin{bmatrix} A_2 & 0 \\ 0 & -I \end{bmatrix}\right)\begin{bmatrix} \lambda x \\ x \end{bmatrix} = \begin{bmatrix} Q(\lambda)x \\ 0 \end{bmatrix}$$
we see that the eigenvalues of the quadratic $Q$ are the eigenvalues of the $2n \times 2n$ linear polynomial $L(\lambda)$. This is an example of an exact linearization process, thanks to the hidden $\lambda$ in the eigenvector! The eigenvalues of $L$ can be found using the QZ algorithm. The eigenvectors of $L$ have the form $z = \begin{bmatrix} \lambda x \\ x \end{bmatrix}$, where $x$ is an eigenvector of $Q$, and so $x$ can be obtained from either the first $n$ (if $\lambda \ne 0$) or the last $n$ components of $z$.

6 Sparse Linear Systems

For linear systems coming from discretization of differential equations it is common that $A$ is banded, that is, the nonzero elements lie in a band about the main diagonal. An extreme case is a tridiagonal matrix, of which the classic example is the second-difference matrix, illustrated for $n = 4$ by
$$A = \begin{bmatrix} -2 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & -2 \end{bmatrix}, \qquad A^{-1} = -\frac{1}{5}\begin{bmatrix} 4 & 3 & 2 & 1 \\ 3 & 6 & 4 & 2 \\ 2 & 4 & 6 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}.$$
This matrix corresponds to a centered finite difference approximation to a second derivative: $f''(x) \approx (f(x + h) - 2f(x) + f(x - h))/h^2$. Note that $A^{-1}$ is a full matrix. For banded matrices, GE produces banded LU factors and its computational cost is proportional to $n$ times the square of the bandwidth.

A matrix is sparse if advantage can be taken of the zero entries, because of either their number or their distribution. A banded matrix is a special case of a sparse matrix. Sparse matrices are stored on a computer not as a square array but in a special format that records only the nonzeros and their location in the matrix. This can be done with three vectors: one to store the nonzero entries and the other two to define the row and column indices of the elements in the first vector.

Sparse matrices help to explain the tenet: never solve a linear system $Ax = b$ by computing $x = A^{-1} \times b$. The reasons for eschewing $A^{-1}$ are threefold:

• Computing $A^{-1}$ requires three times as many flops as solving $Ax = b$ by GE with partial pivoting.
• GE with partial pivoting is backward stable for solving $Ax = b$ (see section 8) but solution via $A^{-1}$ is not.
• If $A$ is sparse, $A^{-1}$ is generally dense and so requires much more storage than GE with partial pivoting.

When GE is applied to a sparse matrix fill-in occurs when the row operations cause a zero entry to become nonzero during the elimination. To minimize the storage and the computational cost, fill-in must be avoided as much as possible. This can be done by employing row and column interchanges to choose a suitable pivot from the active submatrix. The first such strategy was introduced by Markowitz in 1957. At the $k$th stage, with $c_j^{(k)}$ denoting the number of nonzeros in rows $k$ to $n$ of column $j$ and $r_i^{(k)}$ the number of nonzeros in columns $k$ to $n$ of row $i$, the Markowitz strategy finds the pair $(r, s)$ that minimizes $(r_i^{(k)} - 1)(c_j^{(k)} - 1)$ over all nonzero potential pivots $a_{ij}^{(k)}$ and then takes $a_{rs}^{(k)}$ as the pivot. The quantity being minimized is a bound on the fill-in. In practice, the potential pivots must be restricted to those not too much smaller in magnitude than the partial pivot, in order to preserve numerical stability. The result of GE with Markowitz pivoting is a factorization $PAQ = LU$, where $P$ and $Q$ are permutation matrices.

The analogue of the Markowitz strategy for Hermitian positive definite matrices chooses a diagonal entry $a_{ii}^{(k)}$ as the pivot, where $r_i^{(k)}$ is minimal. This is the minimum degree algorithm, which has been very successful in practice. Figure 3 shows in the first row a sparse and banded symmetric positive definite matrix $A$ of dimension 225 followed to the right by its Cholesky factor. The Cholesky factor has many more nonzeros than $A$. The second row shows the matrix $PAP^T$ produced by an approximate minimum degree ordering (produced by the MATLAB symamd function) and its Cholesky factor. We can see that the permutations have destroyed the band structure but have greatly reduced the fill-in, producing a much sparser Cholesky factor.

Figure 3 Sparsity plots of a symmetric positive definite matrix (left) and its Cholesky factor (right) for original matrix (first row) and reordered matrix (second row). nz is the number of nonzeros. (Panels, first row: nz = 3137 and nz = 5041; second row: nz = 3137 and nz = 3476.)

As an alternative to GE for solving sparse linear systems one can apply iterative methods, described in section 9; for sufficiently large problems these are the only feasible methods.

7 Overdetermined and Underdetermined Systems

Linear systems $Ax = b$ with a rectangular matrix $A \in \mathbb{C}^{m \times n}$ are very common. They break into two categories: overdetermined systems, with more equations than unknowns ($m > n$), and underdetermined systems, with fewer equations than unknowns ($m < n$). Since in general there is no solution when $m > n$ and there are many solutions when $m < n$, extra conditions must be imposed for the problems to be well-defined. These usually involve norms and different choices of norms are possible. We will restrict our discussion mainly to the 2-norm, which is the most important case, but other choices are also of practical interest.

7.1 The Linear Least Squares Problem

When $m > n$ the residual $r = b - Ax$ cannot in general be made zero so we try to minimize its norm. The most common choice of norm is the 2-norm, which gives the linear least squares problem
$$\min_{x \in \mathbb{C}^n} \|b - Ax\|_2. \tag{6}$$
This choice can be motivated by statistical considerations (the Gauss–Markov theorem) or by the fact that the square of the 2-norm is differentiable, which makes the problem explicitly solvable. Indeed by setting the gradient of $\|b - Ax\|_2^2$ to zero we obtain the normal equations $A^*Ax = A^*b$, which any solution of the least squares problem must satisfy. If $A$ has full rank then $A^*A$ is positive definite and so there is a unique solution, which can be computed by solving the normal
equations using Cholesky factorization. For reasons of numerical stability, it is preferable to use a QR factorization: if $A = Q\begin{bmatrix} R_1 \\ 0 \end{bmatrix}$ then the normal equations reduce to the triangular system $R_1x = c$, where $c$ is the first $n$ components of $Q^*b$.

When $A$ is rank deficient there are many least squares solutions, which vary widely in norm. A natural choice is one of minimal 2-norm, and in fact there is a unique minimal 2-norm solution, $x_{LS}$, given by
$$x_{LS} = \sum_{i=1}^{r} (u_i^*b/\sigma_i)v_i,$$
where
$$A = U\Sigma V^*, \quad U = [u_1, \dots, u_m], \quad V = [v_1, \dots, v_n] \tag{7}$$
is an SVD and $r = \operatorname{rank}(A)$. The use of this formula in practice is not straightforward because a matrix stored in floating-point arithmetic will rarely have any zero singular values. Therefore $r$ must be chosen by designating which singular values can be regarded as negligible and this choice should take account of the accuracy with which the elements of $A$ are known.

Another choice of least squares solution in the rank-deficient case is a basic solution: one with at most $r$ nonzeros. Such a solution can be computed via the QR factorization with column pivoting.

7.2 Underdetermined Systems

When $m < n$ and $A$ has full rank, there are infinitely many solutions to $Ax = b$ and again it is natural to seek one of minimal 2-norm. There is a unique such solution $x_{LS} = A^*(AA^*)^{-1}b$, and it is best computed via a QR factorization, this time of $A^*$. A basic solution, with $m$ nonzeros, can alternatively be computed. As a simple example, consider the problem “find two numbers whose sum is 5”, that is, solve $\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 5$. A basic solution is $[5 \;\; 0]^T$ while the minimal 2-norm solution is $[5/2 \;\; 5/2]^T$. Minimal 1-norm solutions to underdetermined systems are important in compressed sensing.

7.3 Pseudoinverse

The analysis in the previous two subsections can be unified in a very elegant way by making use of the Moore–Penrose pseudoinverse $A^+$ of $A \in \mathbb{C}^{m \times n}$, which is defined as the unique $X \in \mathbb{C}^{n \times m}$ satisfying the Moore–Penrose conditions
$$AXA = A, \quad XAX = X, \quad (AX)^* = AX, \quad (XA)^* = XA.$$
(It is certainly not obvious that these equations have a unique solution.) In the case where $A$ is square and nonsingular it is easily seen that $A^+$ is just $A^{-1}$. Moreover, if $\operatorname{rank}(A) = n$ then $A^+ = (A^*A)^{-1}A^*$, while if $\operatorname{rank}(A) = m$ then $A^+ = A^*(AA^*)^{-1}$. In terms of the SVD (7),
$$A^+ = V\operatorname{diag}(\sigma_1^{-1}, \dots, \sigma_r^{-1}, 0, \dots, 0)U^*,$$
where $r = \operatorname{rank}(A)$. The formula $x_{LS} = A^+b$ holds for all $m$ and $n$, so the pseudoinverse yields the minimal 2-norm solution to both the least squares (overdetermined) problem $Ax = b$ and an underdetermined system $Ax = b$. The pseudoinverse has many interesting properties, including $(A^+)^+ = A$, but it is not always true that $(AB)^+ = B^+A^+$.

Although the pseudoinverse is a very useful theoretical tool it is rarely necessary to compute it explicitly (just as for its special case the matrix inverse).

The pseudoinverse is just one of many ways of generalizing the notion of inverse to rectangular matrices, but it is the right one for minimum 2-norm solutions to linear systems. Other generalized inverses can be obtained by requiring only a subset of the four Moore–Penrose conditions to hold.

8 Numerical Considerations

Prior to the introduction of the first digital computers in the 1940s, numerical computations were carried out by humans, sometimes with the aid of mechanical calculators. The human involvement in a sequence of calculations meant that potentially dangerous events such as dividing by a tiny number or subtracting two numbers that agree to almost all their significant digits could be observed, their effect monitored, and possible corrective action taken, such as temporarily increasing the precision of the calculations. On the very early computers intermediate results were observed on a cathode-ray tube monitor, but this became impossible as problem sizes increased (along with available computing power). Fears were raised in the 1940s that algorithms such as GE would suffer exponential growth of errors as the problem dimension increased, due to the rapidly increasing number of arithmetic operations, each having its associated rounding error. These fears were particularly concerning given that the error growth might be unseen and unsuspected.

The subject of rounding error analysis grew out of the need to understand the effect on algorithms of rounding errors. The person who did the most to
develop the subject was James Wilkinson, whose influential papers and 1961 and 1965 books showed how backward error analysis can be used to obtain deep insights into numerical stability. We will discuss just two particular examples.

Wilkinson showed that when a nonsingular linear system $Ax = b$ is solved by GE in floating-point arithmetic the computed solution $\widehat{x}$ satisfies
$$(A + \Delta A)\widehat{x} = b, \qquad \|\Delta A\|_\infty \le p(n)\rho_nu\|A\|_\infty.$$
Here $p(n)$ is a cubic polynomial, the growth factor
$$\rho_n = \frac{\max_{i,j,k} |a_{ij}^{(k)}|}{\max_{i,j} |a_{ij}|} \ge 1$$
measures the growth of elements during the elimination, and $u$ is the unit roundoff. This is a backward stability result: it says that the computed solution $\widehat{x}$ is the exact solution of a perturbed system. Ideally, we would like $\|\Delta A\|_\infty \le u\|A\|_\infty$, which reflects the uncertainty caused by converting the elements of $A$ to floating-point numbers. The polynomial term $p(n)$ is pessimistic and might be more realistically replaced by its square root. The danger term is the growth factor $\rho_n$, and the conclusion from Wilkinson’s analysis is that a pivoting strategy should aim to keep $\rho_n$ small. If no pivoting is done, $\rho_n$ can be arbitrarily large (e.g., for $A = \begin{bmatrix} \varepsilon & 1 \\ 1 & 1 \end{bmatrix}$ with $0 < \varepsilon \ll 1$, $\rho_n \approx 1/\varepsilon$). For partial pivoting however, it can be shown that $\rho_n \le 2^{n-1}$ and that this bound is attainable. In practice, $\rho_n$ is almost always of modest size for partial pivoting ($\rho_n \le 50$, say); why this should be so remains one of the great mysteries of numerical analysis!

One of the benefits of Wilkinson’s backward error analysis is that it enables us to identify classes of matrices for which pivoting is not necessary, that is, for which the LU factorization $A = LU$ exists and $\rho_n$ is nicely bounded. One such class is the matrices that are diagonally dominant by either rows or columns, for which $\rho_n \le 2$.

The potential instability of GE can be attributed to the fact that $A$ is premultiplied by a sequence of nonunitary transformations, any of which can be ill conditioned. Many algorithms, including Householder QR factorization and the QR algorithm for eigenvalues, use exclusively unitary transformations. Such algorithms are usually (but not always) backward stable, essentially because unitary transformations do not magnify errors: $\|UAV\| = \|A\|$ for any unitary $U$ and $V$ for the 2-norm and the Frobenius norm. As an example, the QR algorithm applied to $A \in \mathbb{C}^{n \times n}$ produces a computed upper triangular matrix $\widehat{T}$ such that
$$\widetilde{Q}^*(A + \Delta A)\widetilde{Q} = \widehat{T}, \qquad \|\Delta A\|_F \le p(n)u\|A\|_F,$$
where $\widetilde{Q}$ is some exactly unitary matrix and $p(n)$ is a cubic polynomial. The computed Schur factor $\widehat{Q}$ is not necessarily close to $\widetilde{Q}$ (which in turn is not necessarily close to the exact $Q$!) but it is close to being orthogonal: $\|\widehat{Q}^*\widehat{Q} - I\|_F \le p(n)u$. This distinction between the different $Q$ matrices is an indication of the subtleties of backward error analysis. For some problems it is not clear exactly what form of backward error result it is possible to prove while obtaining useful bounds. However, the purpose of a backward error analysis is always the same: either to show that an algorithm behaves in a numerically stable way or to shed light on how it might fail to do so and to indicate what quantities should be monitored in order to identify potential instability.

9 Iterative Methods

In numerical linear algebra methods can broadly be divided into two classes: direct and iterative. Direct methods, such as GE, solve a problem in a fixed number of arithmetic operations or a variable number that in practice is fairly constant, as for the QR algorithm for eigenvalues. Iterative methods are infinite processes that must be truncated at some point when the approximation they provide is “good enough”. Usually, iterative methods do not transform the matrix in question and access it only through matrix–vector products; this makes them particularly attractive for large, sparse matrices, where applying a direct method may not be practical.

We have already seen in section 5.5 a simple iterative method for the eigenvalue problem: the power method. The stationary iterative methods are an important class of iterative methods for solving a nonsingular linear system $Ax = b$. These methods are best described in terms of a splitting
$$A = M - N,$$
with $M$ nonsingular. The system $Ax = b$ can be rewritten $Mx = Nx + b$, which suggests constructing a sequence $\{x^{(k)}\}$ from a given starting vector $x^{(0)}$ via
$$Mx^{(k+1)} = Nx^{(k)} + b. \tag{8}$$
Different choices of $M$ and $N$ yield different methods. The aim is to choose $M$ in such a way that it is inexpensive to solve (8) while $M$ is a good enough approximation to $A$ that convergence is fast. It is easy to analyze convergence. Denote by $e^{(k)} = x^{(k)} - x$ the error in the
$k$th iterate. Subtracting $Mx = Nx + b$ from (8) gives $M(x^{(k+1)} - x) = N(x^{(k)} - x)$, so
$$e^{(k+1)} = M^{-1}Ne^{(k)} = \cdots = (M^{-1}N)^{k+1}e^{(0)}. \tag{9}$$
If $\rho(M^{-1}N) < 1$ then $(M^{-1}N)^k \to 0$ as $k \to \infty$ (see Jordan canonical form) and so $x^{(k)}$ converges to $x$, at a linear rate. In practice, for convergence in a reasonable number of iterations we need $\rho(M^{-1}N)$ to be sufficiently less than 1 and the powers of $M^{-1}N$ should not grow too large initially before eventually decaying; in other words, $M^{-1}N$ must not be too nonnormal.

Three standard choices of splitting are, with $D = \operatorname{diag}(A)$ and $L$ and $U$ denoting the strictly lower and strictly upper triangular parts of $A$, respectively,

• $M = D$, $N = -(L + U)$: Jacobi iteration;
• $M = D + L$, $N = -U$: Gauss–Seidel iteration;
• $M = \frac{1}{\omega}D + L$, $N = \frac{1 - \omega}{\omega}D - U$, where $\omega \in (0, 2)$ is a relaxation parameter: successive overrelaxation (SOR) iteration.

Sufficient conditions for convergence are that $A$ is strictly diagonally dominant by rows for the Jacobi iteration and that $A$ is symmetric positive definite for the Gauss–Seidel iteration. How to choose $\omega$ so that $\rho(M^{-1}N|_{\omega})$ is minimized for the SOR iteration was elucidated in the landmark 1950 PhD thesis of David Young.

The Google PageRank algorithm, which underlies Google’s ordering of search results, can be interpreted as an application of the Jacobi iteration to a certain linear system involving the adjacency matrix of the graph corresponding to the whole world wide web. However, the most common use of stationary iterative methods is as preconditioners within other iterative methods.

The aim of preconditioning is to convert a given linear system $Ax = b$ into one that can be solved more cheaply by a particular iterative method. The basic idea is to use a nonsingular matrix $W$ to transform the system to $(W^{-1}A)x = W^{-1}b$ in such a way that (a) the preconditioned system can be solved in fewer iterations than the original system and (b) matrix–vector multiplications with $W^{-1}A$ (which require the solution of a linear system with coefficient matrix $W$) are not significantly more expensive than matrix–vector multiplications with $A$. In general, this is a difficult or impossible task, but in many applications the matrix $A$ has structure that can be exploited. For example, many elliptic PDE problems lead to a positive definite matrix $A$ of the form
$$A = \begin{bmatrix} M_1 & F \\ F^T & M_2 \end{bmatrix},$$
where $M_1z = d_1$ and $M_2z = d_2$ are easy to solve. In this case it is natural to take $W = \operatorname{diag}(M_1, M_2)$ as the preconditioner. When $A$ is Hermitian positive definite the preconditioned system is written in a way that preserves the structure. For example, for the Jacobi preconditioner, $D = \operatorname{diag}(A)$, the preconditioned system would be written $D^{-1/2}AD^{-1/2}\tilde{x} = \tilde{b}$, where $\tilde{x} = D^{1/2}x$ and $\tilde{b} = D^{-1/2}b$. Here, the matrix $D^{-1/2}AD^{-1/2}$ has unit diagonal and off-diagonal elements lying between $-1$ and $1$.

The most powerful iterative methods for linear systems $Ax = b$ are the Krylov methods. In these methods each iterate $x^{(k)}$ is chosen from the shifted subspace $x^{(0)} + K_k(A, r^{(0)})$ where
$$K_k(A, r^{(0)}) = \operatorname{span}\{r^{(0)}, Ar^{(0)}, \dots, A^{k-1}r^{(0)}\}$$
is a Krylov subspace of dimension $k$, with $r^{(k)} = b - Ax^{(k)}$. Different strategies for choosing approximations from within the Krylov subspaces yield different methods. For example, the conjugate gradient method (CG, for Hermitian positive definite $A$) and the full orthogonalization method (FOM, for general $A$) make the residual $r^{(k)}$ orthogonal to the Krylov subspace $K_k(A, r^{(0)})$, while the minimal residual method (MINRES, for Hermitian $A$) and the generalized minimal residual method (GMRES, for general $A$) minimize the 2-norm of the residual over all vectors in the Krylov subspace. How to compute the vectors defined in these ways is nontrivial. It turns out that CG can be implemented with a recurrence requiring just one matrix–vector multiplication and three inner products per iteration, and MINRES is just a little more expensive. GMRES, being applicable to non-Hermitian matrices, is significantly more expensive, and it is also much harder to analyze its convergence behavior. For general matrices there are alternatives to GMRES that employ short recurrences. We mention just BiCGSTAB, which has the distinction that the 1992 paper by Henk van der Vorst that introduced it was the most-cited paper in mathematics of the 1990s.

Theoretically, Krylov methods converge in at most $n$ iterations for a system of dimension $n$. However, in practical computation rounding errors intervene and the methods behave as truly iterative methods not having finite termination. Since $n$ is potentially huge, a Krylov method would not be used unless a good approximate solution was obtained in many fewer than $n$ iterations, and preconditioning plays a crucial role here. Available error bounds for a method help to guide
15

the choice of preconditioner, but care is needed in interpreting the bounds. To illustrate this, consider the CG method for $Ax = b$, where A is Hermitian positive definite. In the A-norm, $\|z\|_A = (z^*Az)^{1/2}$, the error on the kth step satisfies
\[
\|x - x^{(k)}\|_A \le 2\,\|x - x^{(0)}\|_A \left( \frac{\kappa_2(A)^{1/2} - 1}{\kappa_2(A)^{1/2} + 1} \right)^{k},
\]
where $\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2$. If we can precondition A so that its 2-norm condition number is very close to 1 then fast convergence is guaranteed. However, another result says that if A has k distinct eigenvalues then CG converges in at most k iterations. Therefore a better approach might be to choose the preconditioner so that the eigenvalues of the preconditioned matrix are clustered into a small number of groups.

Another important class of iterative methods is multigrid methods, which work on a hierarchy of grids that come from a discretization of an underlying PDE (geometric multigrid) or are constructed artificially from a given matrix (algebraic multigrid).

An important practical issue is how to terminate an iteration. Popular approaches are to stop when the residual $r^{(k)} = b - Ax^{(k)}$ (suitably scaled) is small or when an estimate of the error $x - x^{(k)}$ is small. Complicating factors include the fact that the preconditioner can change the norm and a possible desire to match the error in the iterations with the discretization error in the PDE from which the linear system might have come (as there is no point solving the system to greater accuracy than the data warrants). Research in recent years has led to good understanding of these issues.

The ideas of Krylov methods and preconditioners can be applied to problems other than linear systems. A popular Krylov method for solving the least squares problem (6) is LSQR, which is mathematically equivalent to applying CG to the normal equations. In large-scale eigenvalue problems only a few eigenpairs are usually required. A number of methods project the original matrix onto a Krylov subspace and then solve a smaller eigenvalue problem. These include the Lanczos method for Hermitian matrices and the Arnoldi method for general matrices. Also of much current research interest are rational Krylov methods based on rational generalizations of Krylov subspaces.

10 Nonnormality and Pseudospectra

Normal matrices $A \in \mathbb{C}^{n\times n}$ (defined in section 5.1) have the property that they are unitarily diagonalizable: $A = QDQ^*$ for some unitary Q and diagonal $D = \mathrm{diag}(\lambda_i)$ containing the eigenvalues on its diagonal. In many respects, normal matrices have very predictable behavior. For example, $\|A^k\|_2 = \rho(A)^k$ and $\|e^{tA}\|_2 = e^{\alpha(tA)}$, where the spectral abscissa $\alpha(tA)$ is the largest real part of any eigenvalue of tA. However, matrices that arise in practice are often very nonnormal. The adjective “very” can be quantified in various ways, of which one is the Frobenius norm of the strictly upper triangular part of the upper triangular matrix T in the Schur decomposition $A = QTQ^*$. For example, the matrix $\begin{bmatrix} t_{11} & \theta \\ 0 & t_{22} \end{bmatrix}$ is nonnormal for $\theta \ne 0$ and grows increasingly nonnormal as $|\theta|$ increases.

Consider the moderately nonnormal matrix
\[
A = \begin{bmatrix} -0.97 & 25 \\ 0 & -0.3 \end{bmatrix}. \tag{10}
\]
While the powers of A ultimately decay to zero, since $\rho(A) = 0.97 < 1$, we see from figure 4 that initially they increase in norm. Likewise, since $\alpha(A) = -0.3 < 0$ the norm $\|e^{tA}\|_2$ tends to zero as $t \to \infty$, but figure 4 shows that there is an initial hump in the plot. In stationary iterations the hump caused by a nonnormal iteration matrix $M^{-1}N$ can delay convergence, as is clear from (9). In finite precision arithmetic it can even happen that, for a sufficiently large hump, rounding errors cause the norms of the powers to plateau at the hump level and never actually converge to zero.

How can we predict the shape of the curves in figure 4? Let us concentrate on $\|A^k\|_2$. Initially it grows like $\|A\|_2^k$ and ultimately it decays like $\rho(A)^k$, the decay rate following from (4). The height of the hump is related to pseudospectra, which have been popularized by Nick Trefethen.

The ε-pseudospectrum of $A \in \mathbb{C}^{n\times n}$ is defined, for a given $\varepsilon > 0$, to be the set
\[
\Lambda_\varepsilon(A) = \{\, z \in \mathbb{C} : z \text{ is an eigenvalue of } A + E \text{ for some } E \text{ with } \|E\|_2 < \varepsilon \,\}, \tag{11}
\]
and it can also be represented, in terms of the resolvent $(zI - A)^{-1}$, as
\[
\Lambda_\varepsilon(A) = \{\, z \in \mathbb{C} : \|(zI - A)^{-1}\|_2 > \varepsilon^{-1} \,\}.
\]
The 0.001-pseudospectrum, for example, tells us the uncertainty in the eigenvalues of A if the elements are known only to three decimal places. Pseudospectra provide much insight into the effects of nonnormality of matrices and (with an appropriate extension of the definition) linear operators. For nonnormal matrices the pseudospectra are much bigger than a perturbation of
the spectrum by ε. It can be shown that for any $\varepsilon > 0$,
\[
\sup_{k \ge 0} \|A^k\| \ge \frac{\rho_\varepsilon(A) - 1}{\varepsilon}, \qquad \|A^k\| \le \frac{\rho_\varepsilon(A)^{k+1}}{\varepsilon},
\]
where the pseudospectral radius $\rho_\varepsilon(A) = \max\{\, |\lambda| : \lambda \in \Lambda_\varepsilon(A) \,\}$. For A in (10) and $\varepsilon = 10^{-2}$ these inequalities give an upper bound of 230 for $\|A^3\|$ and a lower bound of 23 for $\sup_{k \ge 0} \|A^k\|$, and figure 5 plots the corresponding ε-pseudospectrum.

11 Structured Matrices

In a wide variety of applications the matrices have a special structure. The matrix elements might form a pattern, as for a Toeplitz matrix or a Hamiltonian matrix, the matrix may satisfy a nonlinear equation such as $A^*\Sigma A = \Sigma$, where $\Sigma = \mathrm{diag}(\pm 1)$, which yields the pseudo-unitary matrices A, or the submatrices may satisfy certain rank conditions (as for quasiseparable matrices). We discuss here two of the oldest and most studied classes of structured matrices, both of which were historically important in the analysis of iterative methods for linear systems arising from the discretization of differential equations.

11.1 Nonnegative Matrices

A nonnegative matrix is a real matrix all of whose entries are nonnegative. A number of important classes of matrices are subsets of the nonnegative matrices. These include adjacency matrices, stochastic matrices, and Leslie matrices (used in population modeling). Nonnegative matrices have a large body of theory, which originates with Perron in 1907 and Frobenius in 1908.

To state the celebrated Perron–Frobenius theorem we need the definition that $A \in \mathbb{R}^{n\times n}$ with $n \ge 2$ is reducible if there is a permutation matrix P such that
\[
P^TAP = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},
\]
where $A_{11}$ and $A_{22}$ are square, nonempty submatrices, and it is irreducible if it is not reducible. A matrix with positive entries is trivially irreducible. A useful characterization is that A is irreducible if and only if the directed graph associated with A (which has n vertices, with an edge connecting the ith vertex to the jth vertex if $a_{ij} \ne 0$) is strongly connected.

Theorem 3 (Perron–Frobenius). If $A \in \mathbb{R}^{n\times n}$ is nonnegative and irreducible then

1. ρ(A) > 0,
2. ρ(A) is an eigenvalue of A,
3. there is a positive vector x such that $Ax = \rho(A)x$,
4. ρ(A) is an eigenvalue of algebraic multiplicity 1.

To illustrate the theorem consider the following two irreducible matrices and their eigenvalues:
\[
A = \begin{bmatrix} 8 & 1 & 6 \\ 3 & 5 & 7 \\ 4 & 9 & 2 \end{bmatrix}, \qquad \Lambda(A) = \{15, \pm 2\sqrt{6}\},
\]
\[
B = \begin{bmatrix} 0 & 0 & 6 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{3} & 0 \end{bmatrix}, \qquad \Lambda(B) = \{1, \tfrac{1}{2}(-1 \pm \sqrt{3}\,i)\}.
\]
The Perron–Frobenius theorem correctly tells us that $\rho(A) = 15$ is a distinct eigenvalue of A, and that it has a corresponding positive eigenvector, which is known as the Perron vector. The Perron vector of A is the vector of all ones, as A forms a magic square and ρ(A) is the magic sum! The Perron vector of B, which is both a Leslie matrix and a companion matrix, is $[6\ 3\ 1]^T$. There is one notable difference between A and B: for A, ρ(A) exceeds the other eigenvalues in modulus, but all three eigenvalues of B have modulus 1. In fact, Perron’s original version of Theorem 3 says that if A has all positive elements then ρ(A) is not only an eigenvalue of A but is larger in modulus than every other eigenvalue. Note that $B^3 = I$, which provides another way to see that the eigenvalues of B all have modulus 1.

We saw in section 9 that the spectral radius plays an important role in the convergence of stationary iterative methods, through $\rho(M^{-1}N)$, where $A = M - N$ is a splitting. In comparing different splittings we can use the result that for $A, B \in \mathbb{R}^{n\times n}$, with |A| denoting the matrix $(|a_{ij}|)$,
\[
|a_{ij}| \le b_{ij} \ \ \forall i, j \quad \Rightarrow \quad \rho(A) \le \rho(|A|) \le \rho(B).
\]

11.2 M-Matrices

$A \in \mathbb{R}^{n\times n}$ is an M-matrix if it can be written in the form $A = sI - B$, where B is nonnegative and $s > \rho(B)$. M-matrices arise in many applications, a classic one being Leontief’s input–output models in economics.

The special sign pattern of an M-matrix (positive diagonal elements and nonpositive off-diagonal elements) combines with the spectral radius condition to give many interesting characterizations and properties. For example, a nonsingular matrix A with nonpositive off-diagonal elements is an M-matrix if and only if $A^{-1}$ is nonnegative. Another characterization, which makes connections with section 1, is that A is an M-matrix if and only if A has positive diagonal
entries and AD is diagonally dominant by rows for some nonsingular diagonal matrix D.

Figure 4: 2-norms of powers and exponentials of the 2 × 2 matrix A in (10). Left panel: $\|A^k\|_2$ against k. Right panel: $\|e^{tA}\|_2$ against t.

Figure 5: Approximation to the $10^{-2}$-pseudospectrum of A in (10) comprising eigenvalues of 5000 randomly perturbed matrices A + E in (11). The eigenvalues of A are marked by white circles.

An important source of M-matrices is discretizations of differential equations, and the archetypal example is the second-difference matrix, described at the start of section 6, which is an M-matrix multiplied by −1. For this application it is an important result that when A is an M-matrix the Jacobi and Gauss–Seidel iterations for $Ax = b$ both converge for any starting vector, a result that is part of the more general theory of regular splittings.

Another important property of M-matrices is immediate from the definition: the eigenvalues all lie in the open right half-plane. This means that M-matrices are special cases of positive stable matrices, which in turn are of great interest due to the fact that the stability of various mathematical processes is equivalent to positive (or negative) stability of an associated matrix.

The class of matrices whose inverses are M-matrices is also much-studied. To indicate why, we state a result about matrix roots. It is known that if A is an M-matrix then $A^{1/2}$ is also an M-matrix. But if A is stochastic (that is, it is nonnegative and has unit row sums), $A^{1/2}$ may not be stochastic. However, if A is both stochastic and the inverse of an M-matrix then $A^{1/p}$ is stochastic for all positive integers p.

12 Matrix Inequalities

There is a large body of work on matrix inequalities, ranging from classical 19th and early 20th century inequalities (some of which are described in section 5.4) to more recent contributions, which are often motivated by applications, notably in statistics, physics, and control theory. In this section we describe just a few examples, chosen for their interest or practical usefulness.

An important class of inequalities on Hermitian matrices is expressed using the Löwner (partial) ordering in which for Hermitian X and Y, $X \ge Y$ denotes that $X - Y$ is positive semidefinite while $X > Y$ denotes that $X - Y$ is positive definite. Many inequalities between real numbers generalize to Hermitian matrices in this ordering. For example, if A, B, C are Hermitian and A commutes with B and C then
\[
A \ge 0, \quad B \le C \quad \Rightarrow \quad AB \le AC.
\]
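As a small numerical illustration of the Löwner ordering, the following NumPy sketch tests $X \le Y$ by checking that $Y - X$ is Hermitian with smallest eigenvalue nonnegative up to a tolerance. The helper name `loewner_leq`, the tolerance, and the particular matrices A, B, and C are assumptions made for this example and are not taken from the article. A is chosen to be a multiple of the identity on each diagonal block so that it commutes with the block diagonal matrices B and C.

```python
import numpy as np

def loewner_leq(X, Y, tol=1e-12):
    """Return True if X <= Y in the Loewner ordering, that is, if
    Y - X is Hermitian positive semidefinite (up to the tolerance tol)."""
    D = Y - X
    if not np.allclose(D, D.conj().T):
        return False  # the ordering is only defined for Hermitian differences
    return bool(np.linalg.eigvalsh(D).min() >= -tol)

# A >= 0 acts as 2*I on the leading 2x2 block and as 5 on the last entry,
# so it commutes with every matrix that is block diagonal in the same way.
A = np.diag([2.0, 2.0, 5.0])

# B is Hermitian and block diagonal; C = B + P with P positive definite.
B = np.array([[1.0, 2.0, 0.0],
              [2.0, -1.0, 0.0],
              [0.0, 0.0, 3.0]])
P = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.5]])
C = B + P

commutes = np.allclose(A @ B, B @ A) and np.allclose(A @ C, C @ A)

print("A commutes with B and C:", commutes)
print("A >= 0:  ", loewner_leq(np.zeros_like(A), A))
print("B <= C:  ", loewner_leq(B, C))
print("AB <= AC:", loewner_leq(A @ B, A @ C))
```

In this example B and C do not commute with each other; only commutativity of A with each of them is used, which is what makes AB and AC Hermitian so that the comparison is meaningful.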

A function f is matrix monotone if it preserves the order, that is, $A \le B$ implies $f(A) \le f(B)$, where $f(A)$ denotes a function of a matrix. Much is known about this class of functions, including that $t^{1/2}$ and $\log t$ are matrix monotone but $t^2$ is not.

Many matrix inequalities involve norms. One example is
\[
\|\, |A| - |B| \,\|_F \le \sqrt{2}\, \|A - B\|_F,
\]
where $A, B \in \mathbb{C}^{m\times n}$ and $|\cdot|$ is the matrix absolute value defined in section 2. This inequality can be regarded as a perturbation result that shows the matrix absolute value to be very well conditioned.

An example of an inequality that finds use in the analysis of convergence of methods in nonlinear optimization is the Kantorovich inequality, which for Hermitian positive definite A with eigenvalues $\lambda_n \le \cdots \le \lambda_1$ and $x \ne 0$ is
\[
\frac{(x^*Ax)(x^*A^{-1}x)}{(x^*x)^2} \le \frac{(\lambda_1 + \lambda_n)^2}{4\lambda_1\lambda_n}.
\]
This inequality is attained for some x, and the left-hand side is always at least 1.

Many inequalities are available that generalize scalar inequalities for means. For example, the arithmetic–geometric mean inequality $(ab)^{1/2} \le \frac{1}{2}(a + b)$ for positive scalars has an analogue for Hermitian positive definite A and B in the inequality $A \,\#\, B \le \frac{1}{2}(A + B)$, where $A \,\#\, B$ is the geometric mean defined as the unique Hermitian positive definite solution to $XA^{-1}X = B$. The geometric mean also satisfies the extremal property
\[
A \,\#\, B = \max\left\{ X : X = X^*, \ \begin{bmatrix} A & X \\ X & B \end{bmatrix} \ge 0 \right\},
\]
which hints at matrix completion problems, in which the aim is to choose missing elements of a matrix in order to achieve some goal, which could be to satisfy a particular matrix property or, as here, to maximize an objective function. Another mean for Hermitian positive definite matrices (and applicable more generally), is the log-Euclidean mean, $\exp(\frac{1}{2}(\log A + \log B))$, where log is the principal logarithm, which is used in image registration, for example.

Finally, we mention an inequality for the matrix exponential. Although there is no simple relation between $e^{A+B}$ and $e^Ae^B$ in general, for Hermitian A and B the inequality $\mathrm{trace}(e^{A+B}) \le \mathrm{trace}(e^Ae^B)$ was proved independently by S. Golden and J. Thompson in 1965. Originally of interest in statistical mechanics, the Golden–Thompson inequality has more recently found use in random matrix theory. Again for Hermitian A and B, the related inequalities $\|e^{A+B}\| \le \|e^{A/2}e^Be^{A/2}\| \le \|e^Ae^B\|$ hold for any unitarily invariant norm.

13 Library Software

From the early days of digital computing the benefits of providing library subroutines for carrying out basic operations such as the addition of vectors and the formation of vector inner products were recognized. Over the ensuing years many matrix computation research codes were published, including in the Linear Algebra volume of the Handbook for Automatic Computation (1971) and in the Collected Algorithms of the ACM. Starting in the 1970s the concept of standardized subprograms was developed in the form of the Basic Linear Algebra Subprograms (BLAS), which are specifications for vector (level 1), matrix–vector (level 2), and matrix–matrix (level 3) operations. The BLAS have been widely adopted, and highly optimized implementations are available for most machines. The freely-available LAPACK library of Fortran codes represents the current state of the art for solving dense linear equations, least squares problems, and eigenvalue and singular value problems. Many modern programming packages and environments build on LAPACK.

It is interesting to note that the TOP500 list (http://www.top500.org) ranks the world’s fastest computers by their speed (measured in flops per second) in solving a random linear system $Ax = b$ by GE. This benchmark has its origins in the 1970s LINPACK project, a precursor to LAPACK, in which the performance of contemporary machines was compared by running the LINPACK GE code on a 100 × 100 system.

14 Outlook

Matrix analysis and numerical linear algebra remain very active areas of research. Many problems in applied mathematics and scientific computing require the solution of a matrix problem at some stage, so there is always a demand for better understanding of matrix problems and faster and more accurate algorithms for their solution. As the overarching applications evolve, new problem variants are generated, often involving new assumptions on the data, different requirements on the solution, or new metrics for measuring the success of an algorithm. A further driver of research is computer hardware. With the advent of processors with many cores, the use of accelerators such as graphics processing units (GPUs), and the harnessing of vast numbers of processors for parallel computing, the standard algorithms in numerical linear algebra are having to be reorganized and possibly even replaced, so we are likely to see significant changes in the coming years.

15 Further Reading
Three must-haves for researchers are Golub and Van
Loan’s influential treatment of numerical linear alge-
bra and the two volumes by Horn and Johnson, which
contain a comprehensive treatment of matrix analysis.

[1] Rajendra Bhatia. Matrix Analysis. Springer-Verlag,
New York, 1997. xi+347 pp. ISBN 0-387-94846-5.
[2] Rajendra Bhatia. Linear algebra to quantum
cohomology: The story of Alfred Horn’s inequal-
ities. Amer. Math. Monthly, 108(4):289–318, 2001.
[3] Rajendra Bhatia. Positive Definite Matrices. Prince-
ton University Press, Princeton, NJ, USA, 2007.
ix+254 pp. ISBN 0-691-12918-5.
[4] Gene H. Golub and Charles F. Van Loan. Matrix
Computations. Fourth edition, Johns Hopkins Uni-
versity Press, Baltimore, MD, USA, 2013. xxi+756
pp. ISBN 978-1-4214-0794-4.
[5] Nicholas J. Higham. Accuracy and Stability of
Numerical Algorithms. Second edition, Society for
Industrial and Applied Mathematics, Philadelphia,
PA, USA, 2002. xxx+680 pp. ISBN 0-89871-521-0.
[6] Roger A. Horn and Charles R. Johnson. Topics in
Matrix Analysis. Cambridge University Press, Cam-
bridge, UK, 1991. viii+607 pp. ISBN 0-521-30587-X.
[7] Roger A. Horn and Charles R. Johnson. Matrix
Analysis. Second edition, Cambridge University
Press, Cambridge, UK, 2013. xviii+643 pp. ISBN
978-0-521-83940-2.
[8] Beresford N. Parlett. The Symmetric Eigenvalue
Problem. Society for Industrial and Applied Mathe-
matics, Philadelphia, PA, USA, 1998. xxiv+398 pp.
Unabridged, amended version of book first published
by Prentice-Hall in 1980. ISBN 0-89871-402-8.
[9] Yousef Saad. Iterative Methods for Sparse Linear
Systems. Second edition, Society for Industrial and
Applied Mathematics, Philadelphia, PA, USA, 2003.
xviii+528 pp. ISBN 0-89871-534-2.
[10] G. W. Stewart and Ji-guang Sun. Matrix Pertur-
bation Theory. Academic Press, London, 1990.
xv+365 pp. ISBN 0-12-670230-6.
[11] Françoise Tisseur and Karl Meerbergen. The quad-
ratic eigenvalue problem. SIAM Rev., 43(2):235–
286, 2001.

Matrix Analysis, Bellman
67% (3)
Matrix Analysis, Bellman
426 pages
Class 10 Mathematics Mind Map
No ratings yet
Class 10 Mathematics Mind Map
14 pages
Ebin - Pub - Linear Algebra and Differential Equations Using Matlab
No ratings yet
Ebin - Pub - Linear Algebra and Differential Equations Using Matlab
654 pages
L. Fox - An Introduction To Numerical Linear Algebra-Oxford University Press (1967) PDF
100% (2)
L. Fox - An Introduction To Numerical Linear Algebra-Oxford University Press (1967) PDF
345 pages
LinearAlgebra GDF Jan5 23
No ratings yet
LinearAlgebra GDF Jan5 23
305 pages
Matrix Analysis For Scientists and Engineers
100% (2)
Matrix Analysis For Scientists and Engineers
172 pages
(Richard Bellman) Introduction To Matrix Analysis, (BookFi)
100% (1)
(Richard Bellman) Introduction To Matrix Analysis, (BookFi)
426 pages
The Theory of Matrices - Lancaster PDF
100% (2)
The Theory of Matrices - Lancaster PDF
588 pages
Updated-Numerical Solutions To CE Problems
100% (3)
Updated-Numerical Solutions To CE Problems
24 pages
Matrix Analysis For Scientists and Engineers Alan J Laub
No ratings yet
Matrix Analysis For Scientists and Engineers Alan J Laub
172 pages
Elementary Linear Algebra Applications Version 12th Edition PDF
No ratings yet
Elementary Linear Algebra Applications Version 12th Edition PDF
24 pages
Functions of Matrices Theory and Computation TQW - Darksiderg PDF
100% (3)
Functions of Matrices Theory and Computation TQW - Darksiderg PDF
446 pages
Matrix Polynomials
100% (5)
Matrix Polynomials
435 pages
Pma Exam
100% (1)
Pma Exam
358 pages
Matrix Analysis For Scientists and Engineers by Alan J Laub
91% (11)
Matrix Analysis For Scientists and Engineers by Alan J Laub
172 pages
A Second Course in Elementary Differential Equations PDF
No ratings yet
A Second Course in Elementary Differential Equations PDF
201 pages
P Lancaster The Theory of Matrices 2nd ED PDF
100% (1)
P Lancaster The Theory of Matrices 2nd ED PDF
587 pages
Virtusa Placement Paper 2011without
No ratings yet
Virtusa Placement Paper 2011without
75 pages
Matrices and Its Application
100% (6)
Matrices and Its Application
25 pages
Lectures On Aplied Math Linear Algebra Version 34
No ratings yet
Lectures On Aplied Math Linear Algebra Version 34
595 pages
Applied Linear Algebra: Third Edition
No ratings yet
Applied Linear Algebra: Third Edition
5 pages
Schaum S Outline of Theory and Problems of Matrix Operations PDF
No ratings yet
Schaum S Outline of Theory and Problems of Matrix Operations PDF
235 pages
Matrix - Algebra - and - Applications (I)
No ratings yet
Matrix - Algebra - and - Applications (I)
30 pages
Fundamentals of Numerical Linear Algebra
No ratings yet
Fundamentals of Numerical Linear Algebra
265 pages
Algebra Note
No ratings yet
Algebra Note
116 pages
Finite-Dimensional Linear Algebra: Mark S
0% (1)
Finite-Dimensional Linear Algebra: Mark S
7 pages
Iso Cie 11664-6-2014
100% (1)
Iso Cie 11664-6-2014
18 pages
Mathematics I PDF
No ratings yet
Mathematics I PDF
247 pages
MAtrices Review
No ratings yet
MAtrices Review
9 pages
5 System of Linear Equation
No ratings yet
5 System of Linear Equation
12 pages
Ipse Ilsen
No ratings yet
Ipse Ilsen
135 pages
1 9780898717778 BM
No ratings yet
1 9780898717778 BM
46 pages
Math-12th Sample Question Papers (Solved) 2024-25
No ratings yet
Math-12th Sample Question Papers (Solved) 2024-25
21 pages
Linear Algebra and Ordinary Differential Equations
No ratings yet
Linear Algebra and Ordinary Differential Equations
17 pages
QB 060010212
No ratings yet
QB 060010212
38 pages
Main Work
No ratings yet
Main Work
60 pages
1 9781611977448 FM
No ratings yet
1 9781611977448 FM
14 pages
Apendix A
No ratings yet
Apendix A
15 pages
Nonlinear Algebra Matrix
No ratings yet
Nonlinear Algebra Matrix
13 pages
Intronumericalrecipes v01 Chapter01 Linalg
No ratings yet
Intronumericalrecipes v01 Chapter01 Linalg
31 pages
Mathematics: Linear Algebra & Ode
No ratings yet
Mathematics: Linear Algebra & Ode
24 pages
Replacement Models
No ratings yet
Replacement Models
9 pages
Maths Unit 1
No ratings yet
Maths Unit 1
14 pages
MATH3322 1 Introduction
No ratings yet
MATH3322 1 Introduction
5 pages
University of Zakho Faculty of Education Department of Mathematics Second Stage Semester 4
No ratings yet
University of Zakho Faculty of Education Department of Mathematics Second Stage Semester 4
28 pages
NA 5 Latex
No ratings yet
NA 5 Latex
45 pages
Notes LinearSystems
No ratings yet
Notes LinearSystems
33 pages
Allied Radio Data Handbook 1943
No ratings yet
Allied Radio Data Handbook 1943
52 pages
MATH219 Lecture 6
No ratings yet
MATH219 Lecture 6
9 pages
Matb 314 & Matb 253 - Linear Algebra
No ratings yet
Matb 314 & Matb 253 - Linear Algebra
30 pages
Abstract Classes
No ratings yet
Abstract Classes
5 pages
Appendix B-Matrix Algebra
No ratings yet
Appendix B-Matrix Algebra
5 pages
Relativity An Introduction To Special and General Relativity
100% (22)
Relativity An Introduction To Special and General Relativity
418 pages
The Incident at Antioch, by Alain Badiou
100% (1)
The Incident at Antioch, by Alain Badiou
32 pages
Crux v20n05 May
No ratings yet
Crux v20n05 May
35 pages
(Undergraduate Lecture Notes in Physics) Albrecht Lindner, Dieter Strauch - A Complete Course on Theoretical Physics_ From Classical Mechanics to Advanced Quantum Statistics-Springer International Pub
100% (16)
(Undergraduate Lecture Notes in Physics) Albrecht Lindner, Dieter Strauch - A Complete Course on Theoretical Physics_ From Classical Mechanics to Advanced Quantum Statistics-Springer International Pub
655 pages
Wave and Oscillation PDF PDF
100% (10)
Wave and Oscillation PDF PDF
417 pages
Linear Algebra Optimization Machine Learning PDF
100% (12)
Linear Algebra Optimization Machine Learning PDF
507 pages
Nivaldo A. Lemos - Analytical Mechanics (2018, Cambridge University Press) PDF
82% (11)
Nivaldo A. Lemos - Analytical Mechanics (2018, Cambridge University Press) PDF
475 pages
2019 Book BasicQuantumMechanics
89% (9)
2019 Book BasicQuantumMechanics
516 pages
Module 2 Revised
No ratings yet
Module 2 Revised
25 pages
Sanet - ST 3030528146
100% (5)
Sanet - ST 3030528146
506 pages
Tensor Analysis
100% (10)
Tensor Analysis
346 pages
Multivariable Calculus With Applications - Lax
95% (21)
Multivariable Calculus With Applications - Lax
488 pages
Mann, Peter - Lagrangian & Hamiltonian Dynamics (2018, Oxford University Press) PDF
100% (12)
Mann, Peter - Lagrangian & Hamiltonian Dynamics (2018, Oxford University Press) PDF
553 pages
A Course in Modern Mathematical Physics
95% (43)
A Course in Modern Mathematical Physics
618 pages
Multivariable and Vector Calculus 696f PDF
100% (11)
Multivariable and Vector Calculus 696f PDF
319 pages
A User-Friendly Introduction To Lebesgue Measure and Integration
100% (8)
A User-Friendly Introduction To Lebesgue Measure and Integration
233 pages
Differential Geometry in Physics Lugo
100% (6)
Differential Geometry in Physics Lugo
374 pages
Metric Space Topology
100% (2)
Metric Space Topology
443 pages
(UNITEXT For Physics) Kurt Lechner - Classical Electrodynamics - A Modern Perspective (2018, Springer) PDF
100% (9)
(UNITEXT For Physics) Kurt Lechner - Classical Electrodynamics - A Modern Perspective (2018, Springer) PDF
699 pages
Essentials of Hamiltonian Dynamics - John H.lowenstein
100% (12)
Essentials of Hamiltonian Dynamics - John H.lowenstein
203 pages
(GTM 275) Loring W. Tu-Differential Geometry - Connections, Curvature, and Characteristic Classes-Springer (2017)
80% (10)
(GTM 275) Loring W. Tu-Differential Geometry - Connections, Curvature, and Characteristic Classes-Springer (2017)
358 pages
Mat 1275
No ratings yet
Mat 1275
5 pages
Grade 4 Mathematics Term 4 Mock Exam: Place Value
0% (1)
Grade 4 Mathematics Term 4 Mock Exam: Place Value
4 pages
Differential Geometry and Mathematical Physics - Part II. Fibre Bundles, Topology and Gauge Fields (PDFDrive)
100% (4)
Differential Geometry and Mathematical Physics - Part II. Fibre Bundles, Topology and Gauge Fields (PDFDrive)
837 pages
Alexander L. Kuzemsky - Alexander Leonidovich Kuzemsky
100% (1)
Alexander L. Kuzemsky - Alexander Leonidovich Kuzemsky
1,259 pages
Pletser - Lagrangian and Hamiltonian Analytical Mechanics PDF
100% (6)
Pletser - Lagrangian and Hamiltonian Analytical Mechanics PDF
138 pages
Tensor Calculus For Physics Concise by Dwight Neuenschwander
100% (12)
Tensor Calculus For Physics Concise by Dwight Neuenschwander
239 pages
2017 Book MechanicsAndThermodynamics PDF
100% (8)
2017 Book MechanicsAndThermodynamics PDF
459 pages
(Thomas - Wolfram, - Sinasi - Ellialtıoglu, - 2014) Applications of Group Theory To Atoms, Molecules, and Solids
100% (3)
(Thomas - Wolfram, - Sinasi - Ellialtıoglu, - 2014) Applications of Group Theory To Atoms, Molecules, and Solids
485 pages
Jörg Bünemann - Group Theory in Physics - An Introduction With A Focus On Solid State Physics-Springer International Publishing (2024)
No ratings yet
Jörg Bünemann - Group Theory in Physics - An Introduction With A Focus On Solid State Physics-Springer International Publishing (2024)
233 pages
물리 교재 28단원
No ratings yet
물리 교재 28단원
26 pages
Samrat Class 8 Science
No ratings yet
Samrat Class 8 Science
10 pages
Introduction To Data Science Unsupervised Learning: CS 194 Fall 2015 John Canny
No ratings yet
Introduction To Data Science Unsupervised Learning: CS 194 Fall 2015 John Canny
54 pages
Elementary Statistics A Step by Step Approach 9th Edition Bluman Test Bank PDF Download
100% (2)
Elementary Statistics A Step by Step Approach 9th Edition Bluman Test Bank PDF Download
65 pages
Fig-1 in (Lec - 05 - Ver - 01.vsd) : Common Emitter Amplifier Frequency Response
No ratings yet
Fig-1 in (Lec - 05 - Ver - 01.vsd) : Common Emitter Amplifier Frequency Response
16 pages
Averages Arithmetic Mean
No ratings yet
Averages Arithmetic Mean
2 pages
Vdoc - Pub Spectral Theory Basic Concepts and Applications
100% (6)
Vdoc - Pub Spectral Theory Basic Concepts and Applications
339 pages
(Cambridge Texts in Applied Mathematics) Gabriel J. Lord, Catherine E. Powell, Tony Shardlow - An Introduction To Computational Stochastic PDEs-Cambridge University Press (2014)
100% (1)
(Cambridge Texts in Applied Mathematics) Gabriel J. Lord, Catherine E. Powell, Tony Shardlow - An Introduction To Computational Stochastic PDEs-Cambridge University Press (2014)
513 pages
SUMS87 Algebras and Representation Theory, Karin Erdmann, Thorsten Holm (2018) PDF
100% (2)
SUMS87 Algebras and Representation Theory, Karin Erdmann, Thorsten Holm (2018) PDF
304 pages
Cheng and Li Gauge Theory of Elementary Particles
100% (2)
Cheng and Li Gauge Theory of Elementary Particles


