Numerical Methods For Least Squares Problems, Second Edition
The first edition of Numerical Methods for Least Squares Problems was the leading
reference on this topic for many years. The updated second edition stands out
compared to other books on the topic because it
• provides an in-depth and up-to-date treatment of direct and iterative methods for
solving different types of least squares problems and for computing the singular
value decomposition;
• covers generalized, constrained, and nonlinear least squares problems as well as
partial least squares and regularization methods for discrete ill-posed problems;
and
• contains a bibliography of over 1,100 historical and recent references, providing a
comprehensive survey of past and present research in the field.
Audience
This book will be of interest to graduate students and researchers in applied
mathematics and to researchers working with numerical linear algebra applications.
Åke Björck
Linköping University
Linköping, Sweden
Contents
List of Tables ix
Preface xi
Bibliography 431
Index 487
List of Figures
4.1.1  Matrix A after reduction of first k = 3 blocks using Householder reflections  190
4.1.2  Reduction of a band matrix  191
4.2.1  Relative error and residual for PLS and TSVD solutions  203
4.3.1  One and two levels of dissection of a region  206
5.1.1  Nonzero pattern of matrix from structural problem and its Cholesky factor  244
5.1.2  The labeled graph G(C) of the matrix in (5.1.2)  248
5.1.3  Sequence of elimination graphs of the matrix in (5.1.2)  249
5.1.4  The transitive reduction and elimination tree T(A^T A)  250
5.1.5  The graph of a matrix for which minimum degree is not optimal  252
5.2.1  Sparse matrix A and factor R using the MATLAB colperm reordering  260
5.2.2  Sparse matrix A and factor R using the MATLAB colamd ordering  260
5.2.3  Structure of upper triangular matrix R for a rank-deficient matrix  261
6.1.1  Structure of A and A^T A for a simple image reconstruction problem  268
6.2.1  ∥x† − x_k∥ and ∥A^T r_k∥ for problem ILLC1850: LSQR and CGLS  295
6.2.2  ∥x† − x_k∥ and ∥A^T r_k∥ for problem ILLC1850: LSQR and LSMR  296
6.2.3  Underdetermined consistent problem with transpose of ILLC1850: ∥x† − x_k∥ and ∥r_k∥; CGME and LSME  297
6.2.4  Problem ILLC1850 overdetermined consistent: ∥x† − x_k∥ and ∥r_k∥; CGME and CGLS  298
List of Tables
3.2.1  Average number of correct significant decimal digits in the solution before and after iterative refinement with various QR factorizations  132
Preface
More than 25 years have passed since the first edition of this book was published in 1996. Least
squares and least-norm problems have become more significant with every passing decade, and
applications have grown in size, complexity, and variety. More advanced techniques for data
acquisition give larger amounts of data to be treated. What counts as a large matrix has gone
from dimension 1000 to 10^6. Hence, iterative methods play an increasingly crucial role for the
solution of least squares problems. On top of these changes, methods must be adapted to new
generations of multiprocessing hardware.
This second edition is primarily aimed at applied mathematicians and graduate students.
Like the first edition, it aims to give a comprehensive review of the state of the art and to survey
the most important publications. Special effort has gone into making the revised bibliography
as comprehensive and up to date as possible. More than half the references are new, and a
substantial share are from the last decade.
To address the mentioned trends, many parts of this edition are enlarged and completely
rewritten. Several new sections have been added, and the content has been reordered. Be-
cause underdetermined linear systems increasingly occur in applications, the duality between
least squares and least-norm problems is now more emphasized throughout.
The Cosine-Sine (CS) decomposition is a new addition to Chapter 1. Among the novelties
in Chapter 2 are new results on Gram–Schmidt and block QR algorithms. Blocked and recursive
algorithms for Cholesky and QR factorization are also covered. The section on computing the
SVD has been much enlarged and moved to the new Chapter 7.
Chapter 3 presents more complete treatments of generalized and weighted least squares prob-
lems. Oblique projectors and elliptic Modified Gram–Schmidt (MGS) and Householder algo-
rithms are other important additions, and new results are given on the stability of algorithms for
weighted least squares. Linear equality and inequality constrained least squares problems are
treated, along with a more complete treatment of quadratic constraints. A much enlarged treat-
ment of regularization methods is also found in Chapter 3, including truncated singular value
decomposition (SVD), Tikhonov regularization, and transformation to standard form.
Chapter 4 starts with a section on band matrices and methods for band least squares problems;
this section originally appeared in Section 6.1 of the first edition. Next, a new section follows on
Householder and Golub–Kahan bidiagonalizations and their connection to the concept of core
problems for linear systems and Krylov subspace approximations. Another new section covers
algorithms for the partial least squares (PLS) method for prediction and cause-effect inference.
(PLS is much used in chemometrics, bioinformatics, food research, medicine, pharmacology,
social sciences, and physiology.) Next, this chapter gives methods for least squares problems with
special structure, such as block-angular form, Kronecker, or Toeplitz. An introduction to tensor
computations and tensor decompositions is also a new addition to this chapter. The section on
total least squares problems now includes a treatment of large-scale problems.
Chapter 5 treats direct methods for sparse problems and corresponds to Chapter 6 in the first
edition. New software, such as SuiteSparse QR, is surveyed. A notable addition is a section on
methods for solving mixed sparse-dense least squares problems.
Iterative methods for least squares and least-norm problems are treated in Chapter 6. The
Krylov subspace methods CGLS and LSQR as well as the recently introduced LSMR are de-
scribed. The section on preconditioners is completely revised and covers new results on, e.g.,
approximate Cholesky and QR preconditioners. A survey of preconditioners based on random
sampling is also new. Section 6.4 now covers regularization by iterative methods, including
methods for saddle point and symmetric quasi-definite (SQD) systems.
Chapter 7 on algorithms for computing the SVD is a much enlarged version of Section 2.6
of the first edition. Here, new topics are Jacobi-type methods and differential qd and MRRR
algorithms. Brief surveys of the matrix square root and sign functions as well as the polar de-
composition are also included.
Chapter 8 covers methods for nonlinear least squares problems. Several new topics are in-
cluded, such as inexact Gauss–Newton methods, bilinear least squares, and nonnegative least
squares. Also discussed here are algorithms for robust regression, least-angle regression, and
LASSO; compressed sensing; and iteratively reweighted least squares (IRLS).
Acknowledgments
The works of Nick Higham, Lars Eldén, G. W. Stewart, Luc Giraud, and many others have been
prominent inspirations for many of the topics new to this edition. Special thanks go to Michael
Saunders, who patiently read several versions of the book and gave valuable advice. Without his
encouragement and support, this revision would never have been finished.
Åke Björck
Linköping, March 2023
Preface to the First Edition
A basic problem in science is to fit a model to observations subject to errors. It is clear that
the more observations that are available the more accurately will it be possible to calculate the
parameters in the model. This gives rise to the problem of “solving” an overdetermined linear
or nonlinear system of equations. It can be shown that the solution which minimizes a weighted
sum of the squares of the residuals is optimal in a certain sense. Gauss claims to have discovered
the method of least squares in 1795 when he was 18 years old. Hence this book also marks the
bicentennial of the use of the least squares principle.
The development of the basic modern numerical methods for solving linear least squares
problems took place in the late sixties. The QR decomposition by Householder transformations
was developed by Golub and published in 1965. The implicit QR algorithm for computing the
singular value decomposition (SVD) was developed about the same time by Kahan, Golub, and
Wilkinson, and the final algorithm was published in 1970. These matrix decompositions have
since been developed and generalized to a high level of sophistication. Great progress has been
made in the last decade in methods for generalized and modified least squares problems and in
direct and iterative methods for large sparse problems. Methods for total least squares problems,
which allow errors also in the system matrix, have been systematically developed.
Applications of least squares of crucial importance occur in many areas of applied and en-
gineering research such as statistics, geodetics, photogrammetry, signal processing, and control.
Because of the great increase in the capacity for automatic data capturing, least squares prob-
lems of large size are now routinely solved. Therefore, sparse direct methods as well as iterative
methods play an increasingly important role. Applications in signal processing have created a
great demand for stable and efficient methods for modifying least squares solutions when data
are added or deleted. This has led to renewed interest in rank-revealing QR decompositions,
which lend themselves better to updating than the singular value decomposition. Generalized
and weighted least squares problems and problems of Toeplitz and Kronecker structure are be-
coming increasingly important.
Chapter 1 gives the basic facts and the mathematical and statistical background of least
squares methods. In Chapter 2 relevant matrix decompositions and basic numerical methods
are covered in detail. Although most proofs are omitted, these two chapters are more elementary
than the rest of the book and essentially self-contained. Chapter 3 treats modified least squares
problems and includes many recent results. In Chapter 4 generalized QR and SVD decomposi-
tions are presented, and methods for generalized and weighted problems surveyed. Here also,
robust methods and methods for total least squares are treated. Chapter 5 surveys methods for
problems with linear and quadratic constraints. Direct and iterative methods for large sparse least
squares problems are covered in Chapters 6 and 7. These methods are still subject to intensive
research, and the presentation is more advanced. Chapter 8 is devoted to problems with special
bases, including least squares fitting of polynomials and problems of Toeplitz and Kronecker
structures. Finally, Chapter 9 contains a short survey of methods for nonlinear problems.
This book will be of interest to mathematicians working in numerical linear algebra, com-
putational scientists and engineers, and statisticians, as well as electrical engineers. Although a
solid understanding of numerical linear algebra is needed for the more advanced sections, I hope
the book will be found useful in upper-level undergraduate and beginning graduate courses in
scientific computing and applied sciences.
I have aimed to make the book and the bibliography as comprehensive and up to date as
possible. Many recent research results are included, which were only available in the research
literature before. Inevitably, however, the content reflects my own interests, and I apologize in
advance to those whose work has not been mentioned. In particular, work on the least squares
problem in the former Soviet Union is, to a large extent, not covered.
The history of this book dates back to at least 1981, when I wrote a survey entitled “Least
Squares Methods in Physics and Engineering” for the Academic Training Programme at CERN
in Geneva. In 1985 I was invited to contribute a chapter on “Least Squares Methods” in the
Handbook of Numerical Analysis, edited by P. G. Ciarlet and J. L. Lions. This chapter was
finished in 1988 and appeared in Volume 1 of the Handbook, published by North-Holland in
1990. The present book is based on this contribution, although it has been extensively updated
and made more complete.
The book has greatly benefited from the insight and knowledge kindly provided by many
friends and colleagues. In particular, I have been greatly influenced by the work of Gene H.
Golub, Nick Higham, and G. W. Stewart. Per-Åke Wedin gave valuable advice on the chapter
on nonlinear problems. Part of the Handbook chapter was written while I had the benefit of
visiting the Division of Mathematics and Statistics at CSIRO in Canberra and the Chr. Michelsen
Institute in Bergen.
Thanks are due to Elsevier Science B.V. for the permission to use part of the material from the
Handbook chapter. Finally, I thank Beth Gallagher and Vickie Kearn at SIAM for the cheerful
and professional support they have given throughout the copy editing and production of the book.
Åke Björck
Linköping, February 1996
Chapter 1
Mathematical and Statistical Foundations
De tous les principes qu’on peut proposer pour cet objet, je pense qu’il n’en est
pas de plus général, de plus exact, ni d’une application plus facile que celui qui . . .
consiste à rendre minimum la somme de quarrés des erreurs.1
—Adrien Marie Legendre, Nouvelles méthodes pour la détermination des orbites des comètes. Appendice. 1805.
1.1 Introduction
1.1.1 Historical Remarks
The least squares problem is a computational problem of primary importance in science and
engineering. Originally, it arose from the need to reduce the influence of errors when fitting
a mathematical model to given observations. A way to do this is to use a greater number of
measurements than the number of unknown parameters in the model. As an example, consider a
function known to be a linear combination of n known basis functions ϕ_j(t):

f(t) = \sum_{j=1}^{n} c_j ϕ_j(t).    (1.1.1)
1 Of all the principles that can be proposed for this purpose, I think there is none more general, more exact, or of easier application than that which consists of rendering the sum of squares of the errors minimum. (Our translation.)
be equal to zero. He showed that this implies that the solution x must satisfy exactly n out of the
m equations. Gauss argued against this, saying that by the principles of probability, greater or
smaller errors are equally possible in all equations. Therefore, a solution that precisely satisfies a
subset of the equations must be regarded as less consistent with the laws of probability.
This led him to the alternative principle of minimizing the sum of squared residuals S = \sum_{i=1}^{m} r_i^2, which
also gives a simpler computational procedure.
The first to publish the algebraic procedure and use the name “least squares method” was
Legendre [729, 1805]. A few years later, in a paper titled (in translation) Theory of the Motion
of the Heavenly Bodies Moving about the Sun in Conic Sections2 Gauss [444, 1809] justified the
method of least squares as a statistical procedure. Much to the annoyance of Legendre, he wrote
Our principle, which we have made use of since 1795, has lately been published by
Legendre.
Most historians agree that Gauss was right in his claim of precedence. He had used the least
squares principle earlier for the analyses of survey data and in astronomical calculations and had
communicated the principle to several astronomers. A famous example is Gauss’s prediction of
the orbit of the asteroid Ceres in 1801. After this success, the method of least squares quickly
became the standard procedure for analysis of astronomical and geodetic data and remains so to
this day.
Another early application of the least squares method is from 1793. At that time, the French
government decided to base the new metric system upon a unit, the meter, equal to one
10,000,000th part of the distance from the North Pole to the Equator along a meridian arc through
Paris. In a 1795 survey, four subsections of an arc from Dunkirk to Barcelona were measured.
For each subsection, the length S of the arc (in modules), the degrees d of latitude, and the
latitude L of the midpoint were determined by the following astronomical observations:
where z and y are unknown parameters. The meridian quadrant is then M = 90(z + y/2),
and the eccentricity e is found from 1/e = 3(z/y + 1/2). The least squares estimates are
1/e = 157.951374 and M = 2,564,801.46; see Stigler [1038, 1981].
The early development of statistical methods for estimating parameters in linear models
is surveyed by Farebrother [397, 1999]. Detailed accounts of the invention and history of
least squares are given by Plackett [895, 1972], Stigler [1037, 1977], [1038, 1981], and
Goldstine [484, 1977].
Analyzing data sets of very large size is now a regular task in a broad variety of applica-
tions. The method of least squares, now over 200 years old, is still one of the most frequently
used methods for data fitting. Applications of least squares fitting cover a wide range of scien-
tific disciplines, such as geodesy, photogrammetry, tomography, molecular modeling, structural
2 A German language version of this paper containing his least squares work had appeared in 1806. The 1809 publi-
cation is in Latin, and an English language translation was not available until 1857.
analysis, signal processing, cluster analysis and pattern matching. Many of these lead to prob-
lems of large size and complexity.
An application of spectacular size for its time is the least squares adjustment of coordinates
of the geodetic stations comprising the North American Datum; see Kolata [704, 1978]. This
problem consists of about 6.5 million equations in 540,000 unknowns (= twice the number of
stations). Since the equations are mildly nonlinear, only two or three linearized problems of this
size have to be solved.
A more recent application is the determination of the gravity field of Earth from highly accu-
rate satellite measurements; see Baboulin et al. [49, 2009]. To model the gravitational potential,
a function of the form
V(r, θ, λ) = \frac{GM}{R} \sum_{l=0}^{L} \left(\frac{R}{r}\right)^{l+1} \sum_{m=0}^{l} P_{lm}(\cos θ)\,\left( c_{lm} \cos mλ + s_{lm} \sin mλ \right)
is used, where G is the gravitational constant, M is Earth’s mass, R is Earth’s reference radius,
and Plm are the normalized Legendre polynomials of order m. The normalized harmonic co-
efficients clm and slm are to be determined. For L = 300, the resulting least squares problem
involves 90,000 unknowns and millions of observations and needs to be solved on a daily basis.
The demand for fast and accurate least squares solvers continues to grow as problem scales
become larger and larger. Analyzing data sets of billions of records is now a regular task at
many companies and institutions. Such large-scale problems arise in a variety of fields, such as
genetics, image processing, geophysics, language processing, and high-frequency trading.
Let x = (x1 , . . . , xn )T be a vector of random variables where the joint distribution of xi and xj
is F (xi , xj ). Then the covariance σij between xi and xj is defined by
σ_{ij} = cov(x_i, x_j) = E[(x_i − µ_i)(x_j − µ_j)] = \int_{x_i, x_j = −∞}^{∞} (x_i − µ_i)(x_j − µ_j)\, dF(x_i, x_j).
Then σij = E(xi xj ) − µi µj , where µi = E(xi ). The covariance matrix V ∈ Rn×n of the vector
x is defined by
V(x) = V = E[(x − µ)(x − µ)T ] = E(xxT ) − µµT ,
where µ = E(x) = (µ1 , . . . , µn ). We now prove some useful properties.
Lemma 1.1.1. Let z = F x, where F ∈ Rm×n is a given matrix, and let x ∈ Rn be a random
vector with E(x) = µ and covariance matrix V . Then
E(z) = F µ, V(z) = F V F T .
Proof. The first property follows directly from the definition of expected value. The second is proved as

V(z) = E[(z − Fµ)(z − Fµ)^T] = F\, E[(x − µ)(x − µ)^T]\, F^T = F V F^T.
Lemma 1.1.2. Let A ∈ Rn×n be a symmetric matrix, and let y be a random vector with expected
value µ and covariance matrix V. Then

E(y^T A y) = µ^T A µ + \mathrm{trace}(A V).
In the Gauss–Markov linear model it is assumed that the random vector of observations
b ∈ R^m is related to a parameter vector x ∈ R^n by a linear equation

b = Ax + ϵ,    (1.1.2)

where V is the known covariance of a random error vector ϵ of mean zero. The standard model
has V = I, i.e., the errors bi − b̄i are assumed to be uncorrelated and to have the same variance.
Theorem 1.1.4 (Gauss–Markov Theorem). Consider the standard Gauss–Markov linear model
(1.1.2), where A ∈ Rm×n is a known matrix of rank n. Then the best linear unbiased estimate
of any linear functional c^T x is c^T x̂, where x̂ is the least squares estimator that minimizes the sum of squares r^T r, where r = b − Ax. Furthermore, x̂ is obtained by solving the symmetric positive definite system of normal equations
ATAx = AT b. (1.1.3)
Proof. See Theorem 1.1.5 and Zelen [1143, 1962, pp. 560–561].
In the literature, this result is often stated in less general form, where the errors are assumed to
be normally distributed or independent and identically distributed. However, Gauss only assumed
the weaker condition that the errors are uncorrelated with zero mean and equal variance.
From Lemma 1.1.1 it follows that the covariance matrix of the least squares estimate x̂ = (A^T A)^{-1} A^T b is
It can be shown that s^2, and therefore also r̂, is uncorrelated with x̂, i.e.,

cov(r̂, x̂) = 0,    cov(s^2, x̂) = 0.
A^T V^{-1} A x = A^T V^{-1} b.    (1.1.8)
be the set of all least squares solutions, where ∥ · ∥2 denotes the Euclidean vector norm ∥x∥2 =
(xT x)1/2 . Then x ∈ S if and only if the orthogonality condition AT (b − Ax) = 0 holds or,
equivalently, x satisfies the normal equations
ATAx = AT b. (1.1.10)
Proof. Assume that x̂ satisfies AT r̂ = 0, where r̂ = b − Ax̂. Then for any x ∈ Rn we have
r = b − Ax = r̂ + A(x̂ − x) ≡ r̂ + Ae. From this we obtain

r^T r = (r̂ + Ae)^T (r̂ + Ae) = r̂^T r̂ + (Ae)^T (Ae),

which is minimized when x = x̂. On the other hand, suppose A^T r̂ = z ≠ 0. If x = x̂ + ϵz, then
r = r̂ − ϵAz and
r^T r = r̂^T r̂ − 2ϵ z^T z + ϵ^2 (Az)^T Az < r̂^T r̂
for sufficiently small ϵ > 0. Hence x̂ is not a least squares solution.
Theorem 1.1.6. The matrix ATA of normal equations is positive definite if and only if the col-
umns of A ∈ R^{m×n} are linearly independent, i.e., rank(A) = n. Then the matrix (A^T A)^{-1} exists, and the unique least squares solution and residual are

x = (A^T A)^{-1} A^T b,    r = b − A(A^T A)^{-1} A^T b.
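As a concrete illustration of the normal equations (1.1.3) and the explicit solution above, the following minimal NumPy sketch (hypothetical random data; any A with full column rank will do) solves the normal equations directly and compares the result with a QR-based least squares solver.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 20, 5
    A = rng.standard_normal((m, n))      # full column rank with probability 1
    b = rng.standard_normal(m)

    # Solve the normal equations A^T A x = A^T b (Theorem 1.1.4).
    x_ne = np.linalg.solve(A.T @ A, A.T @ b)

    # Reference solution from a backward stable QR-based solver.
    x_qr, *_ = np.linalg.lstsq(A, b, rcond=None)

    r = b - A @ x_ne
    print(np.allclose(x_ne, x_qr))       # True for well-conditioned A
    print(np.linalg.norm(A.T @ r))       # ~0: residual is orthogonal to R(A)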
For a matrix A ∈ Rm×n of rank r, the range (or column space) is the subspace
R(A) = {y = Ax | x ∈ R^n} ⊆ R^m    (1.1.12)
Ax ∈ R(A), r = b − Ax ⊥ R(A).
Thus, x is a least squares solution if and only if the residual r = b−Ax is perpendicular to R(A).
This geometric characterization is shown in Figure 1.1.1. The nullspace of a matrix A ∈ Rm×n
is defined as the subspace
N(A) = {z ∈ R^n | Az = 0} ⊆ R^n    (1.1.13)
where Rm and Rn denote the space of m-vectors and n-vectors. Further, from the singular
value decomposition (SVD) of A in Section 1.2.2 it follows that R(A) ⊥ N (AT ) and N (A) ⊥
R(AT ).
(I − P )2 = I − 2P + P 2 = (I − P ).
P = A(ATA)−1 AT
is the orthogonal projector onto R(A). The above results can be summarized as follows.
4. x solves the consistent linear system Ax = PR(A) b, where PR(A) is the orthogonal pro-
jector onto R(A).
If r = rank(A) < n, then A has a nullspace of dimension n − r > 0. Then the problem
minx ∥Ax − b∥2 is underdetermined, and its solution is not unique. If x̂ is a particular least
squares solution, then the set of all least squares solutions is S = {x = x̂ + z | z ∈ N (A)}. In
this case we can seek the least squares solution of least-norm ∥x∥_2, i.e., solve

min ∥x∥_2 subject to x ∈ S.    (1.1.16)
Theorem 1.1.9. Let x be a solution of the problem minx ∥Ax − b∥2 . Then x is a least squares
solution of least-norm if and only if x ⊥ N (A) or, equivalently, x = AT z, z ∈ Rm .
Proof. Let x̂ be any least squares solution, and set x = x̂ + z, where z ∈ N (A). Then Az = 0,
so r = b − Ax̂ = b − Ax, and x is also a least squares solution. By the Pythagorean theorem,
∥x∥_2^2 = ∥x̂∥_2^2 + ∥z∥_2^2, which is minimized when z = 0.
If the system Ax = b is consistent, then the least-norm solution satisfies the normal equations
of second kind,
x = AT z, AATz = b. (1.1.17)
If rank(A) = m, then AAT is nonsingular, and the solution to (1.1.17) is unique.
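The normal equations of the second kind (1.1.17) are just as easy to use in practice. A minimal sketch, assuming A has full row rank and using hypothetical random data, computes the least-norm solution of a consistent underdetermined system and checks it against the pseudoinverse solution.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 4, 8                               # underdetermined: m < n
    A = rng.standard_normal((m, n))           # full row rank with probability 1
    b = rng.standard_normal(m)

    # Normal equations of the second kind: x = A^T z with A A^T z = b.
    z = np.linalg.solve(A @ A.T, b)
    x = A.T @ z

    print(np.allclose(A @ x, b))                    # consistent: Ax = b
    print(np.allclose(x, np.linalg.pinv(A) @ b))    # x is the minimum-norm solution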
From this result and Theorem 1.1.8 we have the following characterization of a solution to
the least squares problem (1.1.16). It includes both the over- and underdetermined cases.
Theorem 1.1.10. For A of any dimension and rank, the least squares solution of minimum norm
∥x∥_2 is unique and characterized by the conditions

A^T (b − Ax) = 0,    x ∈ R(A^T).    (1.1.18)
The normal equations ATAx = AT b are equivalent to the linear equations AT r = 0, and
r = b − Ax. Together, these form a symmetric augmented system of m + n equations
\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix},    (1.1.19)
where y = r and c = 0. The augmented system is nonsingular if and only if rank(A) = n. Then
its inverse is

\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix}^{-1} = \begin{pmatrix} I − P_{R(A)} & A(A^T A)^{-1} \\ (A^T A)^{-1} A^T & −(A^T A)^{-1} \end{pmatrix},    (1.1.20)
where PR(A) = A(ATA)−1AT is the orthogonal projector onto R(A).
Theorem 1.1.11. If rank(A) = n, then the augmented system (1.1.19) has a unique solution
that solves the primal and dual least squares problems,
\min_{x ∈ R^n} \tfrac{1}{2} ∥b − Ax∥_2^2 + c^T x,    (1.1.21)

\min_{y ∈ R^m} \tfrac{1}{2} ∥y − b∥_2^2 \text{ subject to } A^T y = c.    (1.1.22)
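The augmented system (1.1.19) can also be formed and solved explicitly. The sketch below (hypothetical data, c = 0) recovers both the residual y = r and the primal least squares solution x; it only illustrates the equivalence stated in Theorem 1.1.11 and is not meant as an efficient solver.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 10, 3
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    c = np.zeros(n)

    # Augmented (saddle point) system of order m + n, cf. (1.1.19).
    M = np.block([[np.eye(m), A],
                  [A.T, np.zeros((n, n))]])
    d = np.concatenate([b, c])

    yx = np.linalg.solve(M, d)
    y, x = yx[:m], yx[m:]

    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(x, x_ls))          # x solves the least squares problem
    print(np.allclose(y, b - A @ x))     # y equals the residual r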
Theorem 1.2.1 (Cholesky Factorization). Let the matrix C ∈ Cn×n be Hermitian positive
definite. Then there exists a unique upper triangular matrix R = (rij ) with real positive diagonal
elements called the Cholesky factor of C such that
C = R^H R.    (1.2.2)
Proof. The proof is by induction. The result is clearly true for n = 1. If it is true for some
n − 1, the leading principal submatrix C_{n−1} of C has a unique Cholesky factorization C_{n−1} = R_{n−1}^H R_{n−1}, where R_{n−1} is nonsingular. Then

C_n = \begin{pmatrix} C_{n−1} & d \\ d^H & γ \end{pmatrix} = \begin{pmatrix} R_{n−1}^H & 0 \\ r^H & ρ \end{pmatrix} \begin{pmatrix} R_{n−1} & r \\ 0 & ρ \end{pmatrix}    (1.2.3)
Substituting the Cholesky factorization C = A^H A = R^H R into the normal equations gives R^H R x = d, where d = A^H b. Hence, the solution is obtained by solving two triangular systems,

R^H z = d,    Rx = z.    (1.2.5)
This method is easy to implement and often faster than other direct solution methods. It works
well unless A is ill-conditioned.
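A minimal sketch of the method of normal equations, using SciPy's Cholesky factorization and triangular solvers for the two systems in (1.2.5); the data are hypothetical and A is assumed to have full column rank.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    rng = np.random.default_rng(3)
    m, n = 50, 6
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    C = A.T @ A                    # cross-product matrix (symmetric positive definite)
    d = A.T @ b
    R = cholesky(C, lower=False)   # C = R^T R with R upper triangular

    # Two triangular solves, cf. (1.2.5): R^T z = d, then R x = z.
    z = solve_triangular(R, d, trans='T', lower=False)
    x = solve_triangular(R, z, lower=False)

    print(np.linalg.norm(A.T @ (b - A @ x)))   # ~0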
For a consistent underdetermined linear system Ax = b, the solution to the least-norm prob-
lem min ∥x∥2 subject to Ax = b satisfies the normal equations of the second kind,
x = AH z, AAH z = b.
If A has full row rank, then AAH is symmetric positive definite, and the Cholesky factorization
AAH = RHR exists. Then z is obtained by solving
RH w = b, Rz = w. (1.2.6)
It is often preferable to work with the Cholesky factorization of the cross-product of the
extended matrix ( A b ),
\begin{pmatrix} A & b \end{pmatrix}^H \begin{pmatrix} A & b \end{pmatrix} = \begin{pmatrix} A^H A & A^H b \\ b^H A & b^H b \end{pmatrix},    (1.2.7)
when solving a least squares problem. If rank(A) = n, then the Cholesky factor of the cross-
product (1.2.7),
S = \begin{pmatrix} R & z \\ 0 & ρ \end{pmatrix},    (1.2.8)
In the real case, the arithmetic cost of this Cholesky QR algorithm is 2mn^2 + n^3/3 flops. More
accurate methods that compute the QR factorization (1.2.9) directly from A are described in
Sections 2.2.2–2.2.4.
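The Cholesky QR algorithm admits a very short sketch. The helper cholesky_qr below is only illustrative, written under the assumption that A has full column rank; it is not a robust implementation, since the loss of orthogonality in the computed Q grows like κ(A)^2, which is why the methods of Sections 2.2.2–2.2.4 are preferred for ill-conditioned problems.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def cholesky_qr(A):
        """Thin QR factorization A = Q R via the Cholesky factor of A^T A."""
        R = cholesky(A.T @ A, lower=False)                           # A^T A = R^T R
        Q = solve_triangular(R, A.T, trans='T', lower=False).T       # Q = A R^{-1}
        return Q, R

    rng = np.random.default_rng(4)
    A = rng.standard_normal((100, 8))
    Q, R = cholesky_qr(A)
    print(np.allclose(Q @ R, A))                # A = Q R
    print(np.linalg.norm(Q.T @ Q - np.eye(8)))  # loss of orthogonality ~ u * kappa(A)^2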
We now state some useful properties of Hermitian positive definite matrices. From the proof
of Theorem 1.2.1 follows a well-known characterization.
Theorem 1.2.3. Let C ∈ C^{n×n} be Hermitian positive definite, and let X ∈ C^{n×p} have full column rank. Then X^H C X is positive definite. In particular, any principal p × p submatrix

\begin{pmatrix} c_{i_1 i_1} & \cdots & c_{i_1 i_p} \\ \vdots & \ddots & \vdots \\ \bar{c}_{i_p i_1} & \cdots & c_{i_p i_p} \end{pmatrix} ∈ C^{p×p},    1 ≤ p < n,

is positive definite. From p = 1 it follows that all diagonal elements in C are real and positive.
Proof. Suppose C is positive definite, z ̸= 0, and y = Xz. Then since X has full column rank,
it follows that y ̸= 0 and
z H (X HCX)z = y HCy > 0.
The result now follows because any principal submatrix of C can be written as X HCX, where the
columns of X are taken to be the columns k = ij , j = 1, . . . , p, of the identity
matrix.
Theorem 1.2.4. The element of maximum magnitude of a real symmetric positive definite matrix C = (c_{ij}) ∈ R^{n×n} lies on the diagonal.
Theorem 1.2.5 (The Singular Value Decomposition). For any matrix A ∈ Cm×n of rank r
there exist unitary matrices U = (u1 , . . . , um ) ∈ Cm×m and V = (v1 , . . . , vn ) ∈ Cn×n such
that
A = U Σ V^H = U \begin{pmatrix} Σ_1 & 0 \\ 0 & 0 \end{pmatrix} V^H,    (1.2.11)
where Σ1 = diag (σ1 , σ2 , . . . , σr ) ∈ Rr×r . The σi are the singular values of A, and ui ∈ Cm
and vi ∈ Cn are the left and right singular vectors. In the following we assume that the singular
values are ordered so that
σ1 ≥ σ2 ≥ · · · ≥ σr > 0.
Proof. We give an inductive proof that constructs the SVD from its largest singular value σ1 and
the associated left and right singular vectors. Let v1 ∈ Cn with ∥v1 ∥2 = 1 be a unit vector such
that
∥Av1 ∥2 = ∥A∥2 = σ1 ,
where σ1 is real and positive. The existence of such a vector follows from the definition of a
subordinate matrix norm ∥A∥. If σ1 = 0, then A = 0, and (1.2.11) holds with Σ = 0 and
arbitrary unitary matrices U and V . If σ1 > 0, set u1 = (1/σ1 )Av1 ∈ Cm , and let
V = ( v1 V1 ) ∈ Cn×n , U = ( u1 U1 ) ∈ Cm×m
be unitary matrices. (Recall that it is always possible to extend a unitary set of vectors to a
unitary basis for the whole space.) Since U_1^H A v_1 = σ_1 U_1^H u_1 = 0, it follows that U^H A V has the structure

A_1 ≡ U^H A V = \begin{pmatrix} σ_1 & w^H \\ 0 & B \end{pmatrix},

where w^H = u_1^H A V_1 and B = U_1^H A V_1 ∈ C^{(m−1)×(n−1)}.
From the two inequalities

∥A_1∥_2 (σ_1^2 + w^H w)^{1/2} ≥ \left\| A_1 \begin{pmatrix} σ_1 \\ w \end{pmatrix} \right\|_2 = \left\| \begin{pmatrix} σ_1^2 + w^H w \\ Bw \end{pmatrix} \right\|_2 ≥ σ_1^2 + w^H w,

it follows that ∥A_1∥_2 ≥ (σ_1^2 + w^H w)^{1/2}. But U and V are unitary and ∥A_1∥_2 = ∥A∥_2 = σ_1.
Hence w = 0, and the proof can now be completed by an induction argument on the smallest
dimension min(m, n).
Instead of the full SVD (1.2.11), it often suffices to consider the compact or economy size
SVD,
A = U_1 Σ_1 V_1^H = \sum_{i=1}^{r} σ_i u_i v_i^H,    (1.2.12)

where U_1 ∈ C^{m×r} and V_1 ∈ C^{n×r} are the singular vectors corresponding to nonzero singular
values. If A has full column rank, then V1 = V . Similarly, if A has full row rank, then U1 = U .
By (1.2.12), A is decomposed into a sum of r rank-one matrices.
Like the eigenvalues of a real Hermitian matrix, the singular values have a min-max charac-
terization.
Proof. The result is established similarly to the characterization of the eigenvalues of a symmetric
matrix in the Courant–Fischer theorem; see Horn and Johnson [641, 2012].
Theorem 1.2.7 (Jordan–Wielandt). Let A ∈ Cm×n , m ≥ n, be a matrix of rank r, and let its
SVD be A = U ΣV H , where Σ = diag ( Σ1 0 ), U = ( U1 U2 ), and V = ( V1 V2 ). Then
C = \begin{pmatrix} 0 & A \\ A^H & 0 \end{pmatrix} = P \begin{pmatrix} Σ_1 & 0 & 0 \\ 0 & −Σ_1 & 0 \\ 0 & 0 & 0 \end{pmatrix} P^H,    (1.2.19)
The use of C for computing the SVD of A was pioneered by Lanczos [717, 1958], [714,
1961, Chap. 3]. Note that the matrix
C^2 = \begin{pmatrix} A A^H & 0 \\ 0 & A^H A \end{pmatrix}    (1.2.21)
Example 1.2.8. Let A = (u1 , u2 ), where u1 and u2 are unit vectors such that 0 < uT1 u2 =
cos γ, where γ is the angle between the vectors u1 and u2 . The eigenvalues of the matrix
A^T A = \begin{pmatrix} 1 & \cos γ \\ \cos γ & 1 \end{pmatrix}
are the roots of the equation (λ − 1)^2 = \cos^2 γ and equal λ_1 = 2\cos^2(γ/2), λ_2 = 2\sin^2(γ/2).
Hence the singular values of A are
σ_1 = \sqrt{2}\,\cos(γ/2),    σ_2 = \sqrt{2}\,\sin(γ/2).
The left singular vectors can be determined from (1.2.16). However, if γ is less than the square root of the unit roundoff, then numerically cos γ ≈ 1 − γ^2/2 = 1. Then the computed eigenvalues of A^T A are 0 and 2, i.e., the smallest singular value of A has been lost!
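Example 1.2.8 is easy to reproduce numerically. In the sketch below (hypothetical data), γ is chosen below the square root of the double precision unit roundoff: an SVD computed directly from A retains the small singular value, while the eigenvalues of the explicitly formed cross-product matrix A^T A lose it.

    import numpy as np

    gamma = 1e-9                      # below sqrt(unit roundoff) ~ 1.5e-8 in double precision
    u1 = np.array([1.0, 0.0])
    u2 = np.array([np.cos(gamma), np.sin(gamma)])
    A = np.column_stack([u1, u2])     # two unit vectors with a tiny angle gamma between them

    # Singular values computed directly from A: sqrt(2)cos(gamma/2), sqrt(2)sin(gamma/2).
    print(np.linalg.svd(A, compute_uv=False))

    # Eigenvalues of the cross-product matrix: the small one is lost in rounding.
    print(np.linalg.eigvalsh(A.T @ A))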
A = U ΣV H gives unitary bases for these two subspaces. Thus the SVD is a perfect tool for
solving least squares problems.
∥b − Ax∥_2 = ∥U^H (b − A V V^H x)∥_2 = \left\| \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} − \begin{pmatrix} Σ_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} \right\|_2 = \left\| \begin{pmatrix} c_1 − Σ_1 z_1 \\ c_2 \end{pmatrix} \right\|_2.
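A short sketch of this SVD-based solution process: transform b, invert only the nonzero block Σ_1, and set z_2 = 0 to obtain the minimum-norm least squares solution. The helper function below is illustrative (the tolerance used to decide the numerical rank is an assumption) and is checked against NumPy's pseudoinverse.

    import numpy as np

    def svd_lstsq(A, b, tol=1e-12):
        """Minimum-norm least squares solution via the SVD of A."""
        U, s, Vh = np.linalg.svd(A, full_matrices=False)
        c = U.T @ b                     # c = U^H b
        keep = s > tol * s[0]           # numerical rank: keep sigma_i above the tolerance
        z = np.zeros_like(c)
        z[keep] = c[keep] / s[keep]     # z_1 = Sigma_1^{-1} c_1, z_2 = 0 gives minimum norm
        return Vh.T @ z                 # x = V z

    rng = np.random.default_rng(5)
    A = rng.standard_normal((12, 4)) @ rng.standard_normal((4, 6))   # rank-deficient 12 x 6
    b = rng.standard_normal(12)
    x = svd_lstsq(A, b)
    print(np.allclose(x, np.linalg.pinv(A) @ b))    # True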
The pseudoinverse was introduced by Moore and rediscovered by Penrose [889, 1955]. The
pseudoinverse is therefore also called the Moore–Penrose inverse. Penrose gave it an elegant
algebraic characterization.
1. (A† )† = A;
2. (A† )H = (AH )† ;
3. (αA)† = α† A† ;
6. if A = \sum_i A_i, where A_i A_j^H = 0 and A_i^H A_j = 0, i ≠ j, then A^† = \sum_i A_i^†;
For the pseudoinverse, the relations AA^† = A^† A and (AB)^† = B^† A^† are not in general true. For example, let A = ( 1   0 ) and B = ( 1   1 )^T. Then AB = 1, but

B^† A^† = \frac{1}{2}\,( 1 \;\; 1 ) \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{1}{2}.
Necessary and sufficient conditions for the identity (AB)† = B † A† to hold have been given by
Greville [536, 1966]. The following theorem gives an important sufficient condition.
Proof. The last equality follows from (1.2.25). The first equality is verified by showing that the
four Penrose conditions are satisfied.
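The four Penrose conditions are easy to verify numerically for the SVD-based pseudoinverse. The following minimal check uses hypothetical random data; the conditions written in the comments are the standard Penrose conditions characterizing A†.

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((7, 4))
    X = np.linalg.pinv(A)             # Moore-Penrose pseudoinverse via the SVD

    # The four Penrose conditions characterizing X = A^dagger.
    print(np.allclose(A @ X @ A, A))          # (1) A X A = A
    print(np.allclose(X @ A @ X, X))          # (2) X A X = X
    print(np.allclose((A @ X).T, A @ X))      # (3) (A X)^H = A X
    print(np.allclose((X @ A).T, X @ A))      # (4) (X A)^H = X A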
The pseudoinverse and the singular vectors of A give simple expressions for orthogonal pro-
jections onto the four fundamental subspaces of A. The following expressions are easily verified
using the Penrose conditions and the SVD:
where U1 = (u1 , . . . , ur ) and V1 = (v1 , . . . , vr ). From this we get the following expression for
the inverse of the augmented system matrix when A has full column rank:
\begin{pmatrix} I & A \\ A^H & 0 \end{pmatrix}^{-1} = \begin{pmatrix} I − AA^† & (A^†)^H \\ A^† & −(A^H A)^{-1} \end{pmatrix}.    (1.2.30)
If only some of the four Penrose conditions hold, the corresponding matrix is called a gen-
eralized inverse. Such inverses have been extensively analyzed; see Nashed [823, 1976]. Any
matrix A− satisfying the first Penrose condition AA−A = A is called an inner inverse or {1}-
inverse. If it satisfies the second condition A−AA− = A− , it is called an outer inverse or a
{2}-inverse.
Let A− be a {1}-inverse of A. Then for all b such that the system Ax = b is consistent,
x = A− b is a solution. The general solution can be written
x = A− b + (I − A−A)y, y ∈ Cn . (1.2.31)
This shows that AA− and A−A are idempotent and therefore (in general, oblique) projectors; see
Section 3.1.4. The residual norm ∥Ax − b∥2 is minimized when x satisfies the normal equations
A^H A x = A^H b. Suppose that a {1}-inverse A^− also satisfies the third Penrose condition (AA^−)^H = AA^−. Then AA^− is the orthogonal projector onto R(A), and A^− is called a least squares inverse. We
have
AH = (AA−A)H = AHAA− ,
which shows that x = A− b satisfies the normal equations and therefore is a least squares solution.
The following dual result also holds. If A− is a generalized inverse and (A−A)H = A−A,
then A−A is the orthogonal projector onto N (A), and A− is called a least-norm inverse. If
Ax = b is consistent, the unique solution for which ∥x∥2 is smallest satisfies the normal equa-
tions of the second kind,
x = AH z, AAH z = b.
For a least-norm inverse A− it holds that
AH = (AA−A)H = A−AAH .
The notion of inverse of a matrix was generalized to include all matrices A, singular as well as
rectangular, by E. H. Moore [804, 1920]. Moore called this the “general reciprocal.” His con-
tribution used unnecessarily complicated notation and was soon sinking into oblivion; see Ben-
Israel [100, 2002]. A collection of papers on generalized inverses can be found in Nashed [823,
1976]. Ben-Israel and Greville [101, 2003] give a comprehensive survey of generalized inverses.
The general concept of principal angles between any two subspaces of Cn goes back to a re-
markable paper by Jordan [677, 1875].
Definition 1.2.12. Let X = R(X) and Y = R(Y ) be two subspaces of Cn . Without restriction
we can assume that
p = dim (X ) ≥ dim (Y) = q ≥ 1.
The principal angles θk between X and Y and the corresponding principal vectors uk , vk , k =
1, . . . , q, are recursively defined by
Note that for k = 1, the constraints are empty, and θ1 is the smallest principal angle between
X and Y. The principal vectors need not be uniquely defined, but the principal angles always
should be.
From the min-max characterization of the singular values and vectors given in Theorem 1.2.6
follows a relationship between principal angles and the SVD.
Theorem 1.2.13. Assume that X ∈ Cm×p and Y ∈ Cm×q form unitary bases for two subspaces
X and Y. Consider the SVD
Using this result, Björck and Golub [144, 1973] give stable algorithms for computing the
principal angles and vectors between subspaces. Golub and Zha [515, 1994], [516, 1995] give
a detailed perturbation analysis, discuss equivalent characterizations of principal angles, and
present algorithms for large and structured matrices. The stability of the Björck–Golub algorithm
is proved by Drmač [334, 2000].
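A minimal sketch of the basic idea behind the Björck–Golub algorithm, under the stated assumptions: orthonormalize bases of the two subspaces and read cos θ_k off the singular values of X^H Y. SciPy's subspace_angles routine, used here only as a cross-check, implements a refined variant that is also accurate for small angles.

    import numpy as np
    from scipy.linalg import subspace_angles

    rng = np.random.default_rng(7)
    X0 = rng.standard_normal((10, 3))   # spanning sets for two subspaces of R^10
    Y0 = rng.standard_normal((10, 2))

    # Orthonormal bases, then cos(theta_k) = singular values of X^H Y.
    X, _ = np.linalg.qr(X0)
    Y, _ = np.linalg.qr(Y0)
    cos_theta = np.linalg.svd(X.T @ Y, compute_uv=False)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

    print(np.sort(theta))
    print(np.sort(subspace_angles(X0, Y0)))   # agrees for well-separated angles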
The principal angles can be used to determine when two subspaces X and Y are close to each
other. X and Y are identical if and only if all principal angles are zero.
Definition 1.2.14. The largest principal angle θmax between two subspaces X and Y of the same
dimension is a measure of the distance between them:
dist(X, Y) = sin θ_max(X, Y).
where m1 + m2 = m. Then the SVD of the blocks Q11 and Q21 are closely related. To simplify
the discussion, we assume that both Q11 and Q21 are square, i.e., m1 = m2 = n and Q11 is
nonsingular. Let
Q_{11} = U_1 C V_1^H,    C = diag(c_1, . . . , c_n),

X^H X = V_1^H Q_{21}^H Q_{21} V_1 = V_1^H (I − Q_{11}^H Q_{11}) V_1 = I − C^2.
A more general CS decomposition for a unitary matrix, where Q11 and Q21 are not required
to be square matrices, is given by Paige and Saunders [856, 1981].
Theorem 1.2.15 (CS Decomposition). For an arbitrary partitioning of a square unitary matrix
Q ∈ Cm×m ,
Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix},    Q_{11} ∈ C^{m_1×n_1},    Q_{22} ∈ C^{m_2×n_2},    (1.2.39)
where D_{ij} = U_i^H Q_{ij} V_j ∈ R^{m_i×n_j}, i, j = 1, 2, are real and diagonal matrices,

\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix} =
\left( \begin{array}{ccc|ccc}
I &   &     & O_s^H &    &       \\
  & C &     &       & S  &       \\
  &   & O_c &       &    & I     \\ \hline
O_s &   &   & I     &    &       \\
  & S &     &       & −C &       \\
  &   & I   &       &    & O_c^H
\end{array} \right),    (1.2.41)

and c_i^2 + s_i^2 = 1. Here O_c and O_s are zero blocks or may be empty matrices having no rows or
no columns. The unit matrices need not be equal and may not be present.
Proof. Paige and Wei [870, 1994] note that Qij = Ui CVjH , i, j = 1, 2, are essentially the SVDs
of the four blocks in the partitioned unitary matrix Q. Take U1 and V1 so that Q11 = U1 D11 V1H
is the SVD of Q11 . Hence, D11 is a nonnegative diagonal matrix with elements less than or equal
to unity. Choose unitary U2 and V2 to make U2H Q21 V1 lower triangular and U1H Q12 V2 upper
triangular with real nonnegative elements on their diagonals. Since the columns are orthonormal,
D21 must have the stated form. The orthonormality of rows gives D12 , except for the dimension
of the zero block denoted OsH . Since each row and column has unit length, the last block column
must have the form
\begin{pmatrix} U_1^H Q_{12} V_2 \\ U_2^H Q_{22} V_2 \end{pmatrix} =
\begin{pmatrix}
O_{12} &   &        \\
       & S &        \\
       &   & I      \\
K      & L &        \\
M      & N &        \\
       &   & O_{22}
\end{pmatrix}.
The orthogonality of the second and fourth blocks of columns shows that SM = 0. Hence
M = 0 because S is nonsingular. Similarly, from the second and fourth blocks of rows, L = 0.
Next, from the fifth and second blocks of rows, SC + N S = 0 and hence N = −C. Then we
see that KK H = I and K H K = I, and they can be transformed into I without altering the
rest of D. Finally, the unit matrices in the (1, 1) and (4, 4) blocks show that O12 = OsH and
O22 = OcH .
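SciPy provides a CS decomposition of a partitioned orthogonal or unitary matrix; the sketch below (random orthogonal Q, block sizes p and q playing the role of m_1 and n_1 in (1.2.39)) is only a quick illustration of the structure in (1.2.41), and the sign convention of the computed blocks may differ from the one used here.

    import numpy as np
    from scipy.linalg import cossin

    rng = np.random.default_rng(8)
    Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # a random 6 x 6 orthogonal matrix

    # CS decomposition of Q partitioned with an upper-left block of size p x q.
    p, q = 3, 3
    U, CS, Vdh = cossin(Q, p=p, q=q)

    print(np.allclose(U @ CS @ Vdh, Q))    # Q = U * (block-diagonal cosine/sine matrix) * V^H
    print(np.round(CS, 3))                 # exhibits the cosine/sine structure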
These have the property ∥x∥p = ∥ |x| ∥p , where |x| = (|x1 |, . . . , |xn |). Such norms are said to
be absolute. The most important particular cases are p = 1, 2 and the limit when p → ∞,
The vector ℓ2 -norm is the Euclidean length of the vector. If Q is unitary, then ∥Qx∥22 =
x^H Q^H Q x = x^H x = ∥x∥_2^2, i.e., this norm is invariant under unitary transformations. On a
finite-dimensional space, two norms differ by at most a positive constant that only depends on
the dimension. For the vector ℓp -norms,
∥x∥_{p_2} ≤ ∥x∥_{p_1} ≤ n^{1/p_1 − 1/p_2} ∥x∥_{p_2},    p_1 ≤ p_2.
Definition 1.3.1. For any given vector norm ∥ · ∥ on C^n, the dual norm ∥ · ∥^D is defined by

∥y∥^D = \max_{x ≠ 0} \frac{|y^H x|}{∥x∥}.
The vectors in the set {y ∈ Cn | ∥y∥D ∥x∥ = y H x = 1} are said to be dual vectors to x with
respect to ∥ · ∥.
From Hölder’s inequality it follows that the dual of the ℓp -norm is the ℓq -norm, where 1/p +
1/q = 1. Hence the dual of the ℓ2 -norm is itself. It is the only norm with this property; see Horn
and Johnson [641, 2012, Theorem 5.4.16].
A matrix norm is a function ∥·∥ : Cm×n → R that satisfies analogues of the three properties
of a vector norm. The matrix norm subordinate to a given vector norm is defined by
∥A∥ = \max_{x ≠ 0} \frac{∥Ax∥}{∥x∥} = \max_{∥x∥=1} ∥Ax∥.    (1.3.5)
A subordinate matrix norm is submultiplicative, i.e., whenever the product AB is defined, the
inequality ∥AB∥ ≤ ∥A∥∥B∥ holds. The matrix norms subordinate to the vector ℓp -norms are
especially important. For p = 2 it is given by the spectral norm

∥A∥_2 = \max_{∥x∥_2 = 1} ∥Ax∥_2 = σ_1(A),    (1.3.6)
where σ1 (A) is the largest singular value of A ∈ Cm×n . Because the nonzero singular values of
A and AH are the same, it follows that ∥A∥2 = ∥AH ∥2 .
For p = 1 and p = ∞ it can be shown that the matrix subordinate norms are
∥A∥_1 = \max_{1≤j≤n} \sum_{i=1}^{m} |a_{ij}|,    ∥A∥_∞ = \max_{1≤i≤m} \sum_{j=1}^{n} |a_{ij}|.    (1.3.7)
These norms are easily computable. A useful upper bound for the spectral norm, which is ex-
pensive to compute, is given by

∥A∥_2 ≤ \left( ∥A∥_1 ∥A∥_∞ \right)^{1/2}.    (1.3.8)
Another way to define matrix norms is to regard Cm×n as an mn-dimensional vector space
and apply a vector norm over that space. An example is the Frobenius norm derived from the
vector ℓ2 -norm,
∥A∥_F = ∥A^H∥_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2} = \left( \mathrm{trace}(A^H A) \right)^{1/2}.    (1.3.9)
The Frobenius norm is submultiplicative but is often larger than necessary, e.g., ∥I∥F = n1/2 .
Lower and upper bounds for the matrix ℓ2 -norm in terms of the Frobenius norm are
\frac{1}{\sqrt{n}}\, ∥A∥_F ≤ ∥A∥_2 ≤ ∥A∥_F.
The Frobenius norm and the matrix subordinate norms for p = 1 and p = ∞ satisfy ∥ |A| ∥ =
∥A∥. However, for the ℓ2 -norm, the best result is ∥ |A| ∥2 ≤ n1/2 ∥A∥2 . The spectral and
Frobenius norms of A both can be expressed in terms of singular values σi (A) as
∥A∥_2 = σ_{\max}(A),    ∥A∥_F = \left( \sum_{i=1}^{r} σ_i^2(A) \right)^{1/2},    r = \min\{m, n\}.    (1.3.10)
Such norms are unitarily invariant, i.e., ∥A∥ = ∥U H AV ∥ for any unitary U and V . The follow-
ing characterization of such norms was given by von Neumann [1094, 1937]; see Stewart and
Sun [1033, 1990].
Theorem 1.3.2. Any unitarily invariant matrix norm ∥A∥ is a symmetric function of the singular
values of A, i.e.,
∥A∥ = Φ(σ1 , . . . , σn ),
where Φ is invariant under permutation of its arguments.
Proof. Let the singular value decomposition of A be A = U ΣV H . The invariance implies that
∥A∥ = ∥Σ∥, which shows that Φ(A) only depends on Σ. As the ordering of the singular values
in Σ is arbitrary, Φ must be symmetric in σi .
The converse of Theorem 1.3.2 was also proved by von Neumann, i.e., any function
Φ(σ1 , . . . , σn ) that is symmetric in its arguments and satisfies the properties of a vector norm
defines a unitarily invariant matrix norm. Such functions are called symmetric gauge functions.
An important class of unitarily invariant matrix norms is the so-called Schatten norms
(Schatten [969, 1960]) obtained by taking the ℓp vector norm (1.3.1) of the vector of singular
values σ = (σ_1, . . . , σ_n) of A:

∥A∥_{S_p} = \left( \sum_{i=1}^{n} σ_i^p \right)^{1/p}.    (1.3.11)
For p = 2 we get the Frobenius norm, and p → ∞ gives the spectral norm ∥A∥2 = σ1 . Taking
p = 1, we obtain the nuclear norm or Ky Fan norm (see Ky Fan [394, 1949])
∥A∥_* = \mathrm{trace}\left( \sqrt{A^H A}\, \right) = \sum_{i=1}^{n} σ_i(A).    (1.3.12)
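All of the matrix norms mentioned here can be evaluated from the singular values. A small NumPy check of (1.3.7), (1.3.9), (1.3.10), and (1.3.12) with hypothetical data:

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((5, 3))
    s = np.linalg.svd(A, compute_uv=False)

    print(np.isclose(np.linalg.norm(A, 2), s[0]))                         # spectral norm = sigma_max
    print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))    # Frobenius norm
    print(np.isclose(np.linalg.norm(A, 'nuc'), np.sum(s)))                # nuclear (Ky Fan) norm
    print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())              # max column sum, cf. (1.3.7)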
Theorem 1.3.3. Let A ∈ Rm×n have the singular values σ1 ≥ σ2 ≥ · · · ≥ σn . Then the
singular values σ̃1 ≥ σ̃2 ≥ · · · ≥ σ̃n of the perturbed matrix à = A + E, m ≥ n, satisfy
(i)  |σ_i − σ̃_i| ≤ ∥E∥_2,    (ii)  \sum_{i=1}^{n} |σ_i − σ̃_i|^2 ≤ ∥E∥_F^2.    (1.3.13)
The second inequality in (1.3.13) is known as the Wielandt–Hoffman theorem for singular
values. The fact that a perturbation of A will produce perturbations of the same or smaller
magnitude in its singular values is important for the use of the SVD to determine the “numerical
rank” of a matrix; see Section 1.3.3.
Theorem 1.3.4. Let A ∈ Rm×n have a simple singular value σi with corresponding left and
right singular vectors ui and vi . Let γi = minj̸=i |σi − σj | be the absolute gap between σi and
the other singular values of A. Then, the perturbed matrix à = A + E, where ∥E∥2 < γi , has
a singular value σ̃i with singular vectors ũi and ṽi that satisfy
\max\{ \sin θ(u_i, ũ_i),\, \sin θ(v_i, ṽ_i) \} ≤ \frac{∥E∥_2}{γ_i − ∥E∥_2}.    (1.3.14)
It is well known that the eigenvalues of the leading principal submatrix of order n − 1 of a Hermitian matrix A ∈ C^{n×n} interlace the eigenvalues of A. From the min-max characterization
in Theorem 1.2.6, a similar result for the singular values follows.
Â = ( A   u ) ∈ R^{m×n},    m ≥ n.

Then the ordered singular values σ_i of A separate the ordered singular values σ̂_i of Â:

σ̂_1 ≥ σ_1 ≥ σ̂_2 ≥ σ_2 ≥ · · · ≥ σ̂_{n−1} ≥ σ_{n−1} ≥ σ̂_n ≥ σ_n.
Lemma 1.3.6. Let A ∈ Cm×n and Bk = Xk YkH , where Xk , Yk ∈ Cm×k . Then rank(Bk ) ≤
k < min{m, n} and σ1 (A − Bk ) ≥ σk+1 (A), where σi (·) denotes the ith singular value of its
argument.
Proof. Let vi , i = 1, . . . , n, be the right singular vectors of A. Since rank(Y ) = k < n, there is
a vector v = c_1 v_1 + · · · + c_{k+1} v_{k+1} such that ∥v∥_2^2 = c_1^2 + · · · + c_{k+1}^2 = 1 and Y_k^H v = 0. It follows that

σ_1^2(A − B_k) ≥ v^H (A − B_k)^H (A − B_k) v = v^H A^H A v = |c_1|^2 σ_1^2 + · · · + |c_{k+1}|^2 σ_{k+1}^2 ≥ σ_{k+1}^2.
We are now able to prove an important best approximation property of truncated SVD
expansions.
is obtained by truncating the SVD expansion to k < r terms: A_k = \sum_{i=1}^{k} σ_i u_i v_i^H. The minimum distance is given by

∥A − A_k∥_2 = σ_{k+1},    ∥A − A_k∥_F = \left( σ_{k+1}^2 + · · · + σ_r^2 \right)^{1/2}.    (1.3.16)
Proof. For the spectral norm the result follows directly from Lemma 1.3.6. For the Frobenius
norm, set B = A − Bk , where Bk has rank k. Then σk+1 (Bk ) = 0 and, setting j = k + 1 in
(1.3.15), we obtain σ_i(A − B_k) ≥ σ_{k+i}(A), i = 1, 2, . . . . From this it follows that ∥A − B_k∥_F^2 ≥ σ_{k+1}^2(A) + · · · + σ_n^2(A).
The Eckart–Young–Mirsky theorem was originally proved for the Frobenius norm, for which
the solution is unique; see Eckart and Young [357, 1936]. The result for an arbitrary unitarily
invariant norm is due to Mirsky [797, 1960]. An elementary proof of the general case is given by
Li and Strang [740, 2020].
The best approximation property of the partial sums of the SVD expansion has a wide range
of applications in applied mathematics and is a key tool for constructing reduced-order models.
In signal processing, A may be derived from data constituting a noisy signal, and a rank reduction
is used to filter out the noise and reconstruct the true signal. Other applications are noise filtering
in statistics and model reduction in control and systems theory. Recently it has been recognized
that most high-dimensional data sets can be well approximated by low-rank matrices; see Udell
and Townsend [1071, 2019].
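A minimal sketch of a best rank-k approximation by a truncated SVD, with a numerical check of the minimum distances (1.3.16); the helper best_rank_k and the data are purely illustrative.

    import numpy as np

    def best_rank_k(A, k):
        """Best rank-k approximation of A in the 2- and Frobenius norms."""
        U, s, Vh = np.linalg.svd(A, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vh[:k, :]

    rng = np.random.default_rng(10)
    A = rng.standard_normal((30, 20))
    s = np.linalg.svd(A, compute_uv=False)

    k = 5
    Ak = best_rank_k(A, k)
    print(np.isclose(np.linalg.norm(A - Ak, 2), s[k]))                          # sigma_{k+1}
    print(np.isclose(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(s[k:]**2))))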
Golub, Hoffman, and Stewart [494, 1987] prove a generalization that shows how to obtain a best
approximation when a specified set of columns in the matrix is to remain fixed.
The absolute condition number for computing the inverse A^{-1} is defined as the limit

\max_{∥E∥ ≤ ϵ} \frac{1}{ϵ} \left\| (A + E)^{-1} − A^{-1} \right\| = ∥A^{-1}∥^2,    ϵ → +0.    (1.3.17)
The relative condition number for computing the inverse is obtained by multiplying this quantity
by ∥A∥/∥A−1 ∥:
κ(A) = ∥A∥ ∥A−1 ∥. (1.3.18)
This is invariant under multiplication of A by a scalar, κ(αA) = κ(A). From (1.3.18) it follows
that
κ(AB) ≤ κ(A) κ(B).
From the identity AA−1 = I it follows that κp (A) ≥ ∥I∥p = 1 for all matrix ℓp -norms. A matrix
with large condition number is called ill-conditioned; otherwise, it is called well-conditioned.
To indicate that a particular norm is used, we write κ∞ (A). For the Euclidean norm the
condition number can be expressed in terms of the singular values σ_1 ≥ · · · ≥ σ_n of A as

κ_2(A) = σ_1/σ_n.    (1.3.19)
This expression applies also to rectangular matrices A ∈ Cm×n with rank(A) = n. For a real
orthogonal or unitary matrix U ,
κ2 (U ) = ∥U ∥2 ∥U −1 ∥2 = 1.
Such matrices are perfectly conditioned in the ℓ2 -norm. Furthermore, if P and Q are real or-
thogonal or unitary, then κ(P AQ) = κ(A) for both the ℓ2 -norm and the Frobenius norm. This
is one reason why real orthogonal and unitary transformations play such a central role in matrix
computations.
The normwise relative distance of a matrix A to the set of singular matrices is defined as
dist(A) := \min_{E} \left\{ ∥E∥/∥A∥ \;\middle|\; A + E \text{ is singular} \right\}.    (1.3.20)
For the spectral norm it follows from the Eckart–Young–Mirsky theorem (Theorem 1.3.8) that

dist_2(A) = σ_n/σ_1 = 1/κ_2(A).    (1.3.21)
This equality holds for any subordinate matrix norm and can be used to get a lower bound for the
condition number; see Kahan [681, 1966] and Stewart and Sun [1033, 1990, Theorem III.2.8].
Let B = A + E be a perturbation of a matrix A ∈ Rm×n . Estimating the difference
∥(A + E)† − A† ∥ is complicated by the fact that A† varies discontinuously when the rank of A
changes. A trivial example is
A = \begin{pmatrix} σ & 0 \\ 0 & 0 \end{pmatrix},    E = \begin{pmatrix} 0 & 0 \\ 0 & ϵ \end{pmatrix},
Definition 1.3.9. Two subspaces R(A) and R(B) are said to be acute if the corresponding
orthogonal projections satisfy
∥P_{R(A)} − P_{R(B)}∥_2 < 1.
If the perturbation E does not change the rank of A, then unbounded growth of (A + E)†
cannot occur.
∥(A + E)^†∥_2 ≤ \frac{1}{1 − η}\, ∥A^†∥_2.    (1.3.23)
By expressing the projections in terms of pseudoinverses and using the relations in (1.2.29),
we obtain Wedin’s identity (Wedin [1106, 1969]). If B = A + E, then
B^† − A^† = −B^† E A^† + (B^H B)^† E^H P_{N(A^H)} + P_{N(B)} E^H (A A^H)^†.    (1.3.24)
The result for the 2-norm is due to Wedin [1108, 1973]. For the Frobenius norm, µ = 1 as
shown by van der Sluis and Veltkamp [1074, 1979]. From the results above we deduce that
\lim_{E→0} (A + E)^† = A^†  ⟺  \lim_{E→0} \mathrm{rank}(A + E) = \mathrm{rank}(A).
This infimum is attained for B = \sum_{i=1}^{k} σ_i u_i v_i^H. It follows that A has numerical δ-rank k if and only if

σ_1 ≥ · · · ≥ σ_k > δ ≥ σ_{k+1} ≥ · · · ≥ σ_n.    (1.3.30)
From (1.3.28) it follows that
∥A − Ak ∥2 = ∥AV2 ∥2 ≤ δ, V2 = (vk+1 , . . . , vn ),
where R(V2 ) is the numerical nullspace of A. In many applications the cost of using SVD to
determine the numerical rank and nullspace of a matrix can be prohibitively high.
Choosing the parameter δ in (1.3.30) depends on the context and is not always an easy matter.
Let E = (eij ) be an upper bound on the absolute error in A. If the elements eij are about the
same magnitude, and |eij | ≤ ϵ for all i, j, then
Then rank(A + E) = n, and the perturbed solution satisfies the normal equations
Note that if rank(A) = min(m, n) then the condition η < 1 suffices to guarantee that rank(A +
E) = rank(A). The analysis needs the following estimate for the largest principal angle between
the fundamental subspaces of à and A (see Definition 1.2.12).
Lemma 1.3.13. Suppose à = A + E and conditions (1.3.33) are satisfied. Then if χ(·) denotes
any of the four fundamental subspaces,
Nearly optimal bounds for the perturbation of the solution of least squares problems are
derived in Björck [125, 1967].
Theorem 1.3.14. Suppose rank(A + E) = rank(A) and that perturbations E and f satisfy the
normwise relative bounds

∥E∥_2 ≤ ϵ_A ∥A∥_2,    ∥f∥_2 ≤ ϵ_b ∥b∥_2.
Then if η = κϵA < 1, the perturbations δx and δr in the least squares solution x and residual
r = b − Ax satisfy
∥δx∥_2 ≤ \frac{κ}{1 − η} \left( ϵ_A ∥x∥_2 + ϵ_b \frac{∥b∥_2}{∥A∥_2} + ϵ_A κ \frac{∥r∥_2}{∥A∥_2} \right) + ϵ_A κ ∥x∥_2,    (1.3.36)
We separately estimate each of the three terms in this decomposition of δx. From Lemma 1.3.11
it follows that
∥Ã^† (f − Ex)∥_2 ≤ \frac{1}{1 − η}\, ∥A^†∥_2 \left( ∥E∥_2 ∥x∥_2 + ∥f∥_2 \right).    (1.3.39)
From r ⊥ R(A) we have r = PN (AT ) r, and from (1.2.29) the second term becomes
By definition, ∥PR(Ã) PN (AT ) ∥2 = sin θmax (R(Ã), R(A)), where θmax is the largest principal
angle between the subspaces R(Ã) and R(A). Similarly, x = PR(AT ) x, and the third term
can be written as PN (Ã) x = PN (Ã) PR(AT ) x, where by Lemma 1.3.13 ∥PN (Ã) PR(AT ) ∥2 =
sin θmax (N (Ã), N (A)) ≤ η. The estimate (1.3.36) now follows, and (1.3.37) is proved using
the decomposition
r̃ − r = P_{N(Ã^T)}(b + f) − P_{N(A^T)} b = P_{N(Ã^T)} f + P_{N(Ã^T)} P_{R(A)} b − P_{R(Ã)} P_{N(A^T)} r,
If rank(A) = n, then N (Ã) = {0}, and the last term in (1.3.36) (and therefore also in
(1.3.37)) vanishes. If the system is consistent, then r = 0, and the term involving κ2 in (1.3.36)
vanishes. If rank(A) = n and ϵb = 0, the condition number of the least squares problem can be
written as
κ_{LS}(A, b) = κ(A) \left( 1 + κ(A) \frac{∥r∥_2}{∥A∥_2 ∥x∥_2} \right).    (1.3.41)
Note that the condition depends on r and therefore on the right-hand side b. This dependence is
negligible if κ(A)∥r∥2 ≪ ∥A∥2 ∥x∥2 . By considering first-order approximations of the terms, it
can be shown that for any matrix A of rank n and vector b there are perturbations E and f such
that the estimates in Theorem 1.3.14 can almost be attained.
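The condition number (1.3.41) is easily evaluated once the singular values are available. The sketch below constructs a deliberately ill-conditioned (hypothetical) matrix and compares κ_LS with the plain κ(A); for a large residual the κ(A)^2 term dominates.

    import numpy as np

    rng = np.random.default_rng(11)
    m, n = 40, 5
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    sigma = np.logspace(0, -6, n)              # singular values from 1 down to 1e-6
    A = (U * sigma) @ V.T
    b = rng.standard_normal(m)

    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    r = b - A @ x

    kappa = sigma[0] / sigma[-1]               # kappa_2(A) = sigma_1 / sigma_n, cf. (1.3.19)
    kappa_ls = kappa * (1 + kappa * np.linalg.norm(r) /
                        (np.linalg.norm(A, 2) * np.linalg.norm(x)))
    print(kappa, kappa_ls)                     # large residual => kappa(A)^2 effect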
In many applications, one is not directly interested in the least squares solution x but in some
functional c = LT x, where L ∈ Rn×k . For example, in the determination of positions using GPS
systems, the main quantities of interest are the three-dimensional coordinates, but the statistical
model involves several other auxiliary parameters; see Arioli, Baboulin, and Gratton [32, 2007].
The sensitivity of functionals of the solution of ill-conditioned least squares problems is studied
by Eldén [372, 1990]. From (1.3.31) we have
A^† = V Σ^† U^T,    (A^T A)^{-1} = V Σ^† (Σ^†)^T V^T.

Substituting these expressions into (1.3.42) gives (1.3.43) with C^T = (L^T V) Σ^† and σ_n exchanged for σ_r, where r = rank(A).
Normwise perturbation bounds yield results that are easy to present but ignore how the per-
turbations are distributed among the elements of the matrix and vector. When the matrix is poorly
scaled or sparse, such bounds can greatly overestimate the error. For this reason, component-
wise perturbation analysis is gaining increasing attention. As stressed in the excellent survey
by Higham [620, 1994], the conditioning of a problem should always be defined with respect to a
particular class of perturbations. In normwise analysis, perturbations are considered that satisfy
the inequality ∥E∥ ≤ ϵ for some matrix norm. If the columns of A have vastly different norms,
then a more relevant class of perturbations might be
In componentwise analysis, scaling factors eij ≥ 0 and fi ≥ 0 are specified, and perturbations
such that
|δaij | ≤ ωeij , |δbi | ≤ ωfi , i, j = 1, . . . , n, (1.3.45)
for some ω > 0 are considered. By setting eij to zero, we can ensure that the corresponding
element aij is not perturbed. With eij = |aij | and fi = |bi |, ω bounds the componentwise
relative perturbation in each component of A and b.
A ≥ B means the same as B ≤ A, and A > B is the same as B < A. These orderings are
transitive: if A ≤ B and B ≤ C, then A ≤ C. Note that there are matrices that cannot be
compared by any of these relations. It is rather obvious which rules for handling inequalities can be generalized to this partial ordering in matrix spaces. If C = AB, it is easy to show that |c_{ij}| ≤ \sum_{k=1}^{n} |a_{ik}|\,|b_{kj}|, i.e., |C| ≤ |A|\,|B|. A similar rule holds for matrix-vector multiplication.
With the above notation the componentwise bounds (1.3.45) can be written more compactly
as
|δA| ≤ ωE, |δb| ≤ ωf, (1.3.47)
where E > 0, f > 0. Componentwise relative perturbations are obtained by taking E = |A| and
f = |b|. We first consider a nonsingular square linear system Ax = b. The basic identity used
for a componentwise perturbation analysis is
where the inequality is to be interpreted componentwise. The matrix (I − |A−1 ||δA|) is guaran-
teed to be nonsingular if ∥ |A−1 | |δA| ∥ < 1. For perturbations satisfying (1.3.47), we obtain
Assuming that ωκE (A) < 1, it follows from (1.3.48) that for any absolute norm,
∥δx∥ ≤ \frac{ω}{1 − ω κ_E(A)}\, \big\|\, |A^{-1}| \left( |E|\,|x| + f \right) \big\|.    (1.3.49)
Hence κE (A) = ∥ |A−1 |E∥ can be taken to be the componentwise condition number with re-
spect to E. For componentwise relative error bounds (E = |A|), we obtain the Bauer–Skeel
condition number of A,

\mathrm{cond}(A) = \big\|\, |A^{-1}|\,|A|\, \big\|.    (1.3.50)
It can be shown that cond (A) and the bound (1.3.49) with E = |A| are invariant under row
scaling.
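A minimal sketch of the Bauer–Skeel condition number (1.3.50) for a badly row-scaled (hypothetical) matrix: the helper bauer_skeel is illustrative and uses the ∞-norm; cond(A) is essentially unchanged by the row scaling, while the normwise condition number κ_∞(A) grows with the scaling.

    import numpy as np

    def bauer_skeel(A):
        """Bauer-Skeel condition number || |A^{-1}| |A| ||_inf, cf. (1.3.50)."""
        return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), np.inf)

    rng = np.random.default_rng(12)
    A = rng.standard_normal((5, 5))
    D = np.diag(10.0 ** np.arange(5))          # severe row scaling
    DA = D @ A

    print(bauer_skeel(A), bauer_skeel(DA))     # essentially unchanged by row scaling
    print(np.linalg.cond(A, np.inf), np.linalg.cond(DA, np.inf))   # normwise kappa blows up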
The Bauer–Skeel perturbation analysis of linear systems can be extended to linear least
squares problems by considering the augmented system
M z = d ≡ \begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix}.    (1.3.51)
M^{-1} = \begin{pmatrix} P_{N(A^T)} & (A^†)^T \\ A^† & −(A^T A)^{-1} \end{pmatrix},    (1.3.52)
where PN (AT ) = I − AA† is the orthogonal projection onto N (AT ). Componentwise perturba-
tions |δA| ≤ ωE, |δb| ≤ ωf , and |δc| ≤ ωg give rise to perturbations
δM = \begin{pmatrix} 0 & δA \\ δA^T & 0 \end{pmatrix},    δd = \begin{pmatrix} δb \\ δc \end{pmatrix}
The least squares problem minx ∥Ax − b∥2 corresponds to taking c = g = 0 and r = y.
With E = |A| and f = |b| we obtain, after taking norms,
see Björck [132, 1991]. For small-residual problems the componentwise condition number for
the least squares solution x can be defined as
Componentwise perturbation analysis originated with Bauer [95, 1966] and was later refined by
Skeel [1000, 1979]; see also Higham [620, 1994]. A good survey on matrix perturbation theory
is given by Stewart and Sun [1033, 1990]. Demmel [304, 1992] conjectured that the distance of
a matrix from a singular matrix in a componentwise sense is close to the reciprocal of its Bauer–
Skeel condition number. This conjecture was later proved for the general weighted condition
number κE (A) by Rump [945, 1999].
Abdelmalek [3, 1974] gives a perturbation analysis for pseudoinverses and linear least squares
problems. Stewart [1017, 1977] gives a unified treatment of the perturbation theory for pseu-
doinverses and least squares solutions with historical comments. In particular, asymptotic forms
and derivatives for orthogonal projectors, pseudoinverses, and least squares solutions are derived.
Grcar [531, 2010] derives spectral condition numbers of orthogonal projections and full-rank lin-
ear least squares problems. Gratton [527, 1996] obtains condition numbers of the least squares
problem in a weighted Frobenius norm. A similar componentwise analysis is given in Arioli,
Duff, and de Rijk [36, 1989]. Baboulin and Gratton [50, 2009] give sharp bounds for the condi-
tion numbers of linear functionals of least squares solutions.
a = ±m · β e , β −1 ≤ m < 1, (1.4.1)
where exponent e is an integer and β is the base of the system. If t digits are used to represent
the fraction part m, we write
The exponent is limited to a finite range emin ≤ e ≤ emax . In a floating-point number system,
every real number in the floating-point range can be represented with a relative error that does
not exceed the unit roundoff
u = (1/2) β^{1−t}.   (1.4.3)
The IEEE 754–2008 standard for binary floating-point arithmetic [655, 2019] is used on
virtually all general-purpose computers. It specifies formats for floating-point numbers, ele-
mentary operations, and rounding rules. Three basic formats are specified for representing a
number: single, double, and quadruple precision using 32, 64, and 128 bits, respectively. Also,
a half precision format fp16 using 16 bits was introduced. This offers massive speed-up, but
because the maximum number that can be represented is only about 65,000, overflow is more
likely. This motivated Google to propose another half precision format with wider range called
bfloat16. Half precision formats have applications in computer graphics as well as deep learn-
ing; see Pranesh [904, 2019]. Because it is cheaper to move data in lower precision, the cost
of communication is reduced. The characteristics of floating-point formats are summarized in
Table 1.4.1.
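As a small numerical illustration, the following MATLAB sketch tabulates the unit roundoff (1.4.3) and the largest finite number for the four formats mentioned above; the precisions and exponent ranges used are the standard published parameters and are assumed here rather than quoted from Table 1.4.1.

% Unit roundoff u = 0.5*beta^(1-t) and overflow threshold for some
% binary formats (t = precision in bits, emax = maximum exponent).
fmt  = {'fp16','bfloat16','single','double'};
t    = [11 8 24 53];
emax = [15 127 127 1023];
for k = 1:4
    u    = 0.5*2^(1-t(k));                 % unit roundoff, cf. (1.4.3)
    xmax = 2^emax(k)*(2 - 2^(1-t(k)));     % largest finite number
    fprintf('%-8s  u = %.2e   max = %.2e\n', fmt{k}, u, xmax);
end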
Four rounding modes are supported by the standard. The default rounding mode is round
to the nearest representable number, with rounding to even in the case of a tie. Chopping (i.e.,
rounding toward zero) is also supported, as well as directed rounding to ∞ and −∞. The latter
modes simplify the implementation of interval arithmetic.
The IEEE standard specifies that all arithmetic operations, including the square root, should
be performed as if they were first calculated to infinite precision and then rounded to a floating-
point number according to one of the four modes mentioned above. One reason for specifying
precisely the results of arithmetic operations is to improve the portability of software. If a pro-
gram is moved between two computers, both supporting the IEEE standard, intermediate and
final results should be the same.
In general the image of an interval vector under a transformation is not an interval vector. As
a consequence, the inclusion operation will usually yield an overestimation. This phenomenon,
intrinsic to interval computations, is called the wrapping effect. Rump [943, 1999] gives an
algorithm for computing the product of two interval matrices using eight matrix products.
A square interval matrix [A] is called nonsingular if it does not contain a singular matrix.
An interval linear system is a system of the form [A] x = [b], where [A] is a nonsingular interval matrix and [b] an interval vector. The solution set of such an interval linear system is the set
Computing this solution set can be shown to be an intractable (NP-complete) problem. Even for
a 2 × 2 linear system, this set may not be easy to represent.
An efficient and easy-to-use MATLAB toolbox called INTLAB (INTerval LABoratory) has
been developed by Rump [944, 1999]. It contains many useful subroutines and allows verified
solutions of linear least squares problems to be computed.
fl(x1 x2 · · · xn) = x1 x2 (1 + ϵ2) x3 (1 + ϵ3) · · · xn (1 + ϵn), |ϵi| ≤ u, and hence
|fl(x1 x2 · · · xn) − x1 x2 · · · xn| ≤ δ |x1 x2 · · · xn|,
where δ = (1 + u)^{n−1} − 1 < 1.06(n − 1)u, and the last inequality holds if the condition
(n − 1)u < 0.1 is satisfied. This bounds the forward error in the computed result. Simi-
lar results can easily be derived for basic vector and matrix operations; see Wilkinson [1120,
1965, pp. 114–118].
In the following we mainly use notation due to Higham [623, 2002, Sect. 3.4]. Let |δi | ≤ u
and ρi = ±1, i = 1 : n. If nu < 1, then
\prod_{i=1}^{n} (1 + δ_i)^{ρ_i} = 1 + θ_n,   |θ_n| < γ_n,   γ_n = \frac{nu}{1 − nu}.   (1.4.9)
With the realistic assumption that nu < 0.1, it holds that γn < 1.06nu. Often it can be assumed
that nu ≪ 1, and we can set γn = nu. When it is not worth the trouble to keep precise track of
constants in the γk terms, we use Higham’s notation
γ̃_n ≡ \frac{cnu}{1 − cnu},   (1.4.10)
where c denotes a small integer constant whose exact value is unimportant.
If the inner product x^T y = x1 y1 + x2 y2 + · · · + xn yn is accumulated from left to right,
repeated use of (1.4.4) gives
f l (xT y) = x1 y1 (1 + δ1 ) + x2 y2 (1 + δ2 ) + · · · + xn yn (1 + δn ),
where |δ1 | < γn , |δi | < γn+2−i , i = 2, . . . , n. This gives the forward error bound
|fl(x^T y) − x^T y| < γ_n|x_1||y_1| + \sum_{i=2}^{n} γ_{n+2−i}|x_i||y_i| < γ_n |x|^T|y|,   (1.4.11)
where |x|, |y| denote vectors with elements |xi |, |yi |. Note that the error magnitudes depend
on the order of evaluation. The last upper bound in (1.4.11) holds independently of the sum-
mation order and is also valid for floating-point computation with no guard digit rounding. The
corresponding backward error bounds
also hold for any order of evaluation. This result is easily generalized to yield a forward error
analysis of matrix-matrix multiplication. However, for this case there is no backward error analy-
sis, because the rows and columns of the two matrices participate in several inner products! For
the outer product xy T of two vectors x, y ∈ Rn we have f l (xi yj ) = xi yj (1 + δij ), |δij | ≤ u,
and so
|f l (xy T ) − xy T | ≤ u |xy T |. (1.4.13)
This is a satisfactory result for many purposes. However, the computed result is usually not
a rank-one matrix. In general, it is not possible to find perturbations ∆x and ∆y such that
f l(xy T ) = (x + ∆x)(y + ∆y)T .
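The bound (1.4.11) is easy to test numerically. The following MATLAB sketch accumulates an inner product in single precision, takes the double precision result as reference, and compares the error with γ_n|x|^T|y| evaluated for the single precision unit roundoff (an illustration only):

% Forward error of a single precision inner product versus the
% bound gamma_n*|x|'*|y| from (1.4.11).
n  = 1000;
x  = randn(n,1);  y = randn(n,1);
s  = dot(single(x), single(y));        % computed in single precision
s0 = dot(x, y);                        % double precision reference
u  = 0.5*eps('single');                % unit roundoff for single
gamma_n = n*u/(1 - n*u);
fprintf('error = %.2e,  bound = %.2e\n', ...
        abs(double(s) - s0), gamma_n*(abs(x)'*abs(y)));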
In many matrix algorithms, expressions of the form
y = \Bigl( c − \sum_{i=1}^{k−1} a_i b_i \Bigr) \Big/ d
occur repeatedly. A simple extension of the roundoff analysis of an inner product shows that if
the term c is added last, then the computed ȳ satisfies
ȳ d (1 + δ_k) = c − \sum_{i=1}^{k−1} a_i b_i (1 + δ_i),   (1.4.14)
where |δ1 | ≤ γk−1 , |δi | ≤ γk+1−i , i = 2, . . . , k − 1, and |δk | ≤ γ2 . The result is formulated so
that c is not perturbed. The forward error satisfies
\Bigl| ȳd − \Bigl( c − \sum_{i=1}^{k−1} a_i b_i \Bigr) \Bigr| ≤ γ_k \Bigl( |ȳd| + \sum_{i=1}^{k−1} |a_i||b_i| \Bigr),   (1.4.15)
Hence each computed column cj in C has a small backward error. The same cannot be said for
C = AB as a whole, because the perturbation of A depends on j. For the forward error we get
the bound
|f l (AB) − AB| ≤ γn |A||B|, (1.4.17)
and it follows that ∥f l (AB) − AB∥ ≤ γn ∥ |A| ∥ ∥ |B| ∥. Hence for any absolute norm, such as
ℓ1 , ℓ∞ , and Frobenius norms, it holds that
For the ℓ2 -norm the best upper bound is ∥f l (AB) − AB∥2 < nγn ∥A∥2 ∥B∥2 , unless A and B
have only nonnegative elements. The rounding error results here are formulated for real arith-
metic. Similar bounds hold for complex arithmetic provided the constants in the bounds are
increased appropriately.
where c is a not-too-large constant and κ is the condition number of the problem from a pertur-
bation analysis. Clearly this is a weaker form of stability.
Backward error analysis for matrix algorithms was pioneered by J. H. Wilkinson in the late
1950s. When it applies, it tends to be markedly superior to forward analysis. In a backward error
analysis one attempts to show that for some class of input data, the algorithm computes a solution
fˆ that is the exact solution corresponding to a modified set of data ãi close to the original data
ai . There may be an infinite number of such sets, but it can also happen that no such set exists.
An algorithm is said to be backward stable if for some norm ∥ · ∥,
where c is a not-too-large constant. Backward error analysis usually gives better insight into the
stability (or lack thereof) of the algorithm, which often is the primary purpose of an error analysis.
Notice that no reference is made to the exact solution for the original data. A backward stable
algorithm is guaranteed to give an accurate solution only if the problem is well-conditioned. To
yield error bounds for the solution, the backward error analysis has to be complemented with a
perturbation analysis. It can only be expected that the error satisfies an inequality of the form
(1.4.19). Nevertheless, if the backward error is within the uncertainties of the given data, it can
be argued that the computed solution is as good as the data warrants. From the error bound
(1.4.15) it is straightforward to derive a bound for the backward error in solving a triangular
system of equations.
Theorem 1.4.1. If the lower triangular system Lx = b, L ∈ R^{n×n}, is solved by forward substitution, the computed solution x̄ is the exact solution of (L + ∆L)x̄ = b, where for i = 1, . . . , n,
|∆l_{ij}| ≤ γ_2 |l_{ij}| if j = i,   |∆l_{ij}| ≤ γ_{i−j} |l_{ij}| if j < i,   (1.4.20)
where γ_n = nu/(1 − nu). Hence |∆L| ≤ γ_n |L| for any summation order.
A similar result holds for the computed solution of an upper triangular system. We conclude
that the algorithm for solving triangular systems by forward or backward substitution is backward
stable. More precisely, we call the algorithm componentwise backward stable, because (1.4.20)
bounds the perturbations in L componentwise. Note that it is not necessary to perturb the right-
hand side b.
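For reference, a plain MATLAB version of the forward substitution analyzed in Theorem 1.4.1 might look as follows (a sketch; in practice the backslash operator would be used):

function x = forwsub(L, b)
% FORWSUB solves the lower triangular system L*x = b by forward
% substitution (componentwise backward stable; cf. Theorem 1.4.1).
n = length(b);
x = zeros(n,1);
for i = 1:n
    % subtract the already computed part and divide by the pivot
    x(i) = (b(i) - L(i,1:i-1)*x(1:i-1)) / L(i,i);
end
end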
Often the data matrix M belongs to a class M of structured matrices, such as Toeplitz matri-
ces (see Section 4.5.5) or the augmented system matrix
M = \begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix}.
Chapter 2. Basic Numerical Methods

A classical way of solving the least squares problem min_x ∥Ax − b∥2 is to form and solve the normal equations ATAx = AT b. If A has full column rank, then ATA is
positive definite and hence nonsingular. Gauss developed an elimination method for solving the
normal equations that uses pivots chosen from the diagonal; see Stewart [1028, 1995]. Then all
reduced matrices are symmetric, and the storage and number of needed operations are reduced
by half. Later, the preferred way to organize this elimination algorithm was to use Cholesky
factorization, named after André-Louis Cholesky (1875–1918). The accuracy of the computed
least squares solution using the normal equations depends on the square of the condition number
of A. Indeed, accuracy may be lost already when forming ATA and AT b. Hence, the method
of normal equations works well only for well-conditioned problems or when modest accuracy is
required. Otherwise, algorithms based on orthogonalization should be preferred.
Much of the background theory on the matrix algorithms in this chapter is given in the ex-
cellent textbook of Stewart [1030, 1998]. Accuracy and stability properties of matrix algorithms
are admirably covered by Higham [623, 2002].
C = ATA ∈ R^{n×n},   d = AT b ∈ R^n.
Given the symmetry of C, it suffices to compute the n(n + 1)/2 elements of its upper triangular
part (say). If m ≥ n, this number is always less than the mn elements in A ∈ Rm×n . Hence,
forming the normal equations can be viewed as data compression, in particular when m ≫ n. If
A = (a1 , a2 , . . . , an ), the elements of C and d can be expressed in inner product form as
This expresses C as a sum of m matrices of rank one and d as a linear combination of the rows
of A. It has the advantage that only one pass through the data A and b is required, and it can be
used to update the normal equations when equations are added or deleted. For a dense matrix,
both schemes require mn(n + 1) floating-point operations or flops, but the outer product form
requires more memory access. When A is sparse, zero elements in A can more easily be taken
into account by the outer product scheme. If the maximum number of nonzero elements in a
row of A is p, then the outer product scheme only requires m(p + 1)2 flops. Note that the flop
count only measures the amount of arithmetic work of a matrix computation but is often not an
adequate measure of the overall complexity of the computation.
Rounding errors made in forming the normal equations can be obtained from the results in
Section 1.4.2 for floating-point arithmetic. The computed elements in C = ATA are
c̄_{ij} = \sum_{k=1}^{m} a_{ik} a_{jk} (1 + δ_k),   |δ_k| < 1.06(m + 2 − k)u,
where u is the unit roundoff. It follows that the computed matrix satisfies C̄ = C + E, where
A similar estimate holds for the rounding errors in the computed vector AT b. It is important to
note that the rounding errors in forming ATA are in general not equivalent to small perturbations
of the initial data matrix A, i.e., it is not true that C̄ = (A + E)T (A + E) for some small error
matrix E. Furthermore, forming the normal equations squares the condition number:
κ(C) = κ2 (A).
Therefore methods that form the normal equations explicitly are not backward stable.
Example 2.1.1. A simple example where information is lost when ATA is formed is the matrix
studied by Läuchli [724, 1961]:
A = \begin{pmatrix} 1 & 1 & 1 \\ ϵ & 0 & 0 \\ 0 & ϵ & 0 \\ 0 & 0 & ϵ \end{pmatrix},   ATA = \begin{pmatrix} 1 + ϵ^2 & 1 & 1 \\ 1 & 1 + ϵ^2 & 1 \\ 1 & 1 & 1 + ϵ^2 \end{pmatrix}.
Assume that ϵ = 10−8 and that six decimal digits are used for the elements of ATA. Then,
because 1 + ϵ2 = 1 + 10−16 is rounded to 1 even in IEEE double precision, all information
contained in the last three rows of A is irretrievably lost.
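Example 2.1.1 can be reproduced directly in MATLAB; in the sketch below the right-hand side is chosen so that the exact solution is x = (1, 1, 1)^T (an illustration, not part of the text):

% Laeuchli matrix with eps = 1e-8: A'*A rounds to the exactly
% singular matrix ones(3), so the normal equations break down,
% while a QR-based solve still returns x close to (1,1,1)'.
ep = 1e-8;
A  = [1 1 1; ep 0 0; 0 ep 0; 0 0 ep];
b  = [3; ep; ep; ep];                 % exact solution x = (1,1,1)'
C  = A'*A;                            % 1 + ep^2 rounds to 1
xn = C \ (A'*b);                      % normal equations (warns, garbage)
xq = A \ b;                           % QR-based solve via backslash
disp([xn xq])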
Sometimes an unsuitable formulation of the problem will cause the least squares problem
to be ill-conditioned. Then a different choice of parametrization may significantly reduce the
ill-conditioning. For example, in regression problems one should try to use orthogonal or nearly
orthogonal base functions. Consider, for example, a linear regression problem for fitting a linear
model y = α + βt to the given data (yi , ti ), i = 1, . . . , m. This is a least squares problem
A = ( e  t ) = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix},   y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}.
The solution is
Note that the mean values of the data ȳ = eT y/m, t̄ = eT t/m lie on the fitted line, that is,
α + β t̄ = ȳ.
A more accurate formula for β is obtained by centering the data, i.e., by making the change of variables ỹ_i = y_i − ȳ, t̃_i = t_i − t̄, i = 1, . . . , m, and writing the model as ỹ = β t̃. In the new variables, e^T t̃ = 0 and e^T ỹ = 0. Hence the matrix of normal equations is diagonal, and we get

β = ỹ^T t̃ / t̃^T t̃.   (2.1.4)
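A small MATLAB sketch of the centered fit (2.1.4), with synthetic data in place of the observations (y_i, t_i):

% Fit y = alpha + beta*t by centering the data, cf. (2.1.4).
t = (0:20)';  y = 2 + 0.5*t + 0.01*randn(21,1);   % synthetic data
tbar = mean(t);  ybar = mean(y);
tc = t - tbar;  yc = y - ybar;          % centered variables
beta  = (yc'*tc) / (tc'*tc);            % diagonal normal equations
alpha = ybar - beta*tbar;               % fitted line passes through (tbar,ybar)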
When the elements in A and b are the original data, ill-conditioning cannot be avoided, as
in the above example, by choosing another parametrization. The method of normal equations
works well for well-conditioned problems or when only modest accuracy is required. However,
the accuracy of the computed solution will depend on the square of the condition number of A.
In view of the perturbation result in Theorem 1.3.14, this is not consistent with the sensitivity of
small-residual problems, and the method of normal equations can introduce errors much greater
than those of a backward stable algorithm. For less severely ill-conditioned problems, this can
be offset by iterative refinement; see Section 2.5.3. Otherwise, algorithms that use orthogonal
transformations and avoid forming ATA should be used; see Section 2.2.
C = RT R, (2.1.5)
where R = (rij ) is upper triangular with positive diagonal elements. Written componentwise,
this is n(n + 1)/2 equations
c_{ij} = r_{ii} r_{ij} + \sum_{k=1}^{i−1} r_{ki} r_{kj},   1 ≤ i ≤ j ≤ n,   (2.1.6)
3 It is named after André-Louis Cholesky (1875–1918), who was a French military officer involved in the surveying
of Crete and North Africa before the First World War. His work was posthumously published by Benoît [105, 1924].
for the n(n+1)/2 unknown elements in R. If properly sequenced, (2.1.6) can be used to compute
R. An element rij can be computed from equation (2.1.6) provided rii and the elements rki , rkj ,
k < i, are known. It follows that one way is to compute R column by column from left to right
as in the proof of Theorem 1.2.1. Such an algorithm is called left-looking. Let Ck ∈ Rk×k be
the kth leading principal submatrix of C. If C_{k−1} = R_{k−1}^H R_{k−1}, then

C_k = \begin{pmatrix} C_{k−1} & d_k \\ d_k^T & γ_k \end{pmatrix} = \begin{pmatrix} R_{k−1}^H & 0 \\ r_k^H & ρ_k \end{pmatrix} \begin{pmatrix} R_{k−1} & r_k \\ 0 & ρ_k \end{pmatrix}   (2.1.7)

implies that

R_{k−1}^H r_k = d_k,   ρ_k^2 = γ_k − r_k^H r_k.   (2.1.8)
Hence, the kth step of columnwise Cholesky factorization requires the solution of a lower tri-
angular system. The algorithm requires approximately n^3/3 flops and n square roots. Only the
elements in the upper triangular part of C are referenced, and R can overwrite C.
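A minimal MATLAB sketch of this columnwise (left-looking) algorithm, for a real symmetric positive definite C and without any safeguards against breakdown, is:

function R = cholcol(C)
% CHOLCOL computes the Cholesky factor R (upper triangular,
% C = R'*R) column by column, as in (2.1.7)-(2.1.8).
n = size(C,1);
R = zeros(n);
for k = 1:n
    % solve R(1:k-1,1:k-1)'*r = C(1:k-1,k) for the new column
    r = R(1:k-1,1:k-1)' \ C(1:k-1,k);
    R(1:k-1,k) = r;
    R(k,k) = sqrt(C(k,k) - r'*r);      % rho_k in (2.1.8)
end
end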
A second variant computes in step k = 1, . . . , n the elements in the kth row of R. This is a row-
wise or right-looking algorithm. The arithmetic work is dominated by matrix-vector products.
It is simple to modify the Cholesky algorithms so that they instead compute a factorization
of the form C = RT DR, where D is diagonal and R unit upper triangular. This factorization
requires no square roots and may therefore be slightly faster. The two algorithms above are
numerically equivalent, i.e., they compute the same factor R, even taking rounding errors into
account. A normwise error analysis is given by Wilkinson [1121, 1968].
Taking i = j in (2.1.6) gives c_{ii} = \sum_{k=1}^{i} r_{ki}^2, i = 1, . . . , n. It follows that r_{ki}^2 ≤ c_{ii}, k = 1, . . . , i.
Hence, the elements in R are bounded in size by the square root of the maximum diagonal
elements in C. This shows the numerical stability of Cholesky factorization.
Theorem 2.1.2. Let C ∈ Rn×n be a symmetric positive definite matrix such that
Then the Cholesky factor of C can be computed without breakdown, and the computed R̄ satisfies
Theorem 2.1.3. If Cholesky factorization applied to the symmetric positive definite matrix C ∈
Rn×n runs to completion, then the computed factor R satisfies
R^T R = C + E,   |E| ≤ γ_{n+1}(1 − γ_{n+1})^{−1} dd^T,   (2.1.12)
where d_i = c_{ii}^{1/2} and γ_k = ku/(1 − ku).
These results show that the given algorithms for computing the Cholesky factor R from
C are backward stable. The error in the computed Cholesky factor will affect the error in
the least squares solution x̄ computed by the method of normal equations. When C = ATA,
κ(C) = κ(A)^2, and this squaring of the condition number implies that the Cholesky algorithm may break down, and roots of negative numbers may arise, when κ(A) is of the order of 1/\sqrt{u}.
Rounding errors in the solution of the triangular systems RTRx = AT b are usually negligible;
see Higham [615, 1989]. From Theorem 2.1.2 it follows that the backward error in the computed
solution x̄ caused by the Cholesky factorization satisfies
The squared condition number shows that the method of normal equations is not a backward
stable method for solving a least squares problem.
A′ = D2 AD1 , b′ = D2 b, x = D 1 x′ . (2.1.14)
It seems natural to expect that such a scaling should have no effect on the relative accuracy of
the computed solution. If the system is solved by Gaussian elimination or, equivalently, by LU
factorization, this is in fact true, as shown by the following theorem due to Bauer [95, 1966].
Theorem 2.1.4. Denote by x̄ and x̄′ the computed solution using LU factorization in floating-
point arithmetic to the two linear systems Ax = b and (D2 AD1 )x′ = D2 b, where D1 and D2
are diagonal scaling matrices. If no rounding errors are introduced by the scaling, and the same
pivots are used, then x̄ = D1 x̄′ holds exactly.
For a least squares problem minx ∥Ax − b∥2 , the column scaling AD corresponds to a two-
sided symmetric scaling (AD)T AD = DCD of the normal equations matrix. Note that scaling
the rows in A is not allowed because this would change the LS objective function; see Sec-
tion 3.2.1.
Cholesky factorization of C = ATA is a special case of LU factorization, and therefore
by Theorem 2.1.4 it is numerically invariant under a diagonal scaling that preserves symmetry.
Condition (2.1.10) in Theorem 2.1.2 can therefore be replaced by 2n3/2 u(κ′ )2 < 0.1, where
κ′ (A) = minD κ(AD), D > 0. Furthermore, the error bound (2.1.13) for the computed solution
by Cholesky factorization can be improved to
∥x̄ − x∥2 ≤ 2.5n3/2 u κ′ (A)κ(A)∥x∥2 . (2.1.15)
Note that these improvements hold without explicitly carrying out any scaling in the algorithm.
Theorem 2.1.5. Let C ∈ Rn×n be symmetric and positive definite with at most q ≤ n nonzero
elements in any row. If all diagonal elements in C are equal, then
κ(C) ≤ q \min_{D>0} κ(DCD),   (2.1.16)
where D > 0 denotes the set of all n × n diagonal matrices with positive entries.
If C = ATA and the columns of A are scaled to have equal 2-norms, then all diagonal elements of C are equal, and from Theorem 2.1.5 with q = n it follows that

κ(A) ≤ \sqrt{n} \min_{D>0} κ(AD).   (2.1.17)
Hence, choosing D so that the columns of AD have equal length is a nearly optimal scaling.
Example 2.1.6. Sometimes the error in the solution computed by the method of normal equations is much smaller than the error bound in (2.1.13). This is often due to poor column scaling, and the observed error is then better predicted by (2.1.15). Consider the least squares fitting of a polynomial
polynomial
p(t) = c_0 + c_1 t + · · · + c_{n−1} t^{n−1}
to observations yi = p(ti ) at points ti = 0, 1, . . . , m − 1. The resulting least squares problem is
minc ∥Ac − y∥2 , where A ∈ Rm×n has elements
a_{ij} = (i − 1)^{j−1},   1 ≤ i ≤ m,   1 ≤ j ≤ n.
For m = 21 and n = 6, the matrix is quite ill-conditioned: κ(A) = 6.40 · 10^6. However, the condition number of the scaled matrix AD, where all columns have unit length, is more than three orders of magnitude smaller: κ(AD) = 2.22 · 10^3.
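The numbers in Example 2.1.6 are easy to check; the following MATLAB sketch forms A and the column-scaled AD (cond gives the 2-norm condition number; the values quoted above should be reproduced approximately):

% Condition numbers before and after column scaling for the
% polynomial fitting matrix of Example 2.1.6.
m = 21;  n = 6;
A = ((0:m-1)').^(0:n-1);               % a(i,j) = (i-1)^(j-1)
D = diag(1./sqrt(sum(A.^2)));          % scale columns to unit length
kappa  = cond(A)                       % approx 6.4e6
kappaD = cond(A*D)                     % approx 2.2e3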
The inverse S = R−1 = (sij ) is upper triangular and can be computed in n3 /3 flops from the
matrix equation RS = I as follows:
for j = n, n − 1, . . . , 1
    s_{jj} = 1/r_{jj};
    for i = j − 1, . . . , 2, 1
        s_{ij} = −\Bigl( \sum_{k=i+1}^{j} r_{ik} s_{kj} \Bigr) / r_{ii};
    end
end
The computed elements of S can overwrite the corresponding elements of R in storage. Forming
the upper triangular part of Cx = SS T requires an additional n3 /3 flops. This computation can
be sequenced so that the elements of Cx overwrite those of S. The variance of the components
of x is given by the diagonal elements of Cx :
c_{nn} = s_{nn}^2 = 1/r_{nn}^2,   c_{ii} = \sum_{j=i}^{n} s_{ij}^2,   i = n − 1, . . . , 1.
Note that the variance for xn is available directly from the last diagonal element rnn . The
covariance matrix of the residual vector r̂ = b − Ax̂ is
P a = a − β(uT a)u
and can be found without explicitly forming P itself. The effect of this transformation is that
it reflects the vector a in the hyperplane with normal vector u; see Figure 2.2.1. Note that
P a ∈ span {a, u} and P u = −u so that P reverses u.
The use of orthogonal reflections in numerical linear algebra was initiated by Householder
[644, 1958]. Therefore, a matrix P of the form (2.2.1) is also called a Householder reflector.
The most common use of Householder reflectors is reducing a given vector a to a multiple of the
unit vector e1 = (1, 0, . . . , 0)T :
Since P is orthogonal, we have σ = ∥a∥2 ̸= 0. From (2.2.2) it follows that P a = ±σe1 for
u = a ∓ σe1 . (2.2.3)
so that 1/β = σ(σ ∓ α1 ). To avoid cancellation when a is close to a multiple of e1 , the standard
choice is to take
u = a + sign (α1 )σe1 , 1/β = σ(σ + |α1 |). (2.2.5)
This corresponds to choosing the outer bisector in the Householder reflector.
Given a real vector a ≠ 0, Algorithm 2.2.1 computes a Householder reflector P = I − βuu^T,
where u is normalized so that u1 = 1. If n = 1 or a(2 : n) = 0, then β = 0 is returned.
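A minimal MATLAB sketch in the spirit of Algorithm 2.2.1 (not a transcription of it) is given below; it returns u with u(1) = 1 and β such that (I − βuu^T)a is a multiple of e_1, using the sign choice (2.2.5).

function [u, beta] = housegen(a)
% HOUSEGEN sketch: reflector P = I - beta*u*u' with u(1) = 1 that
% maps a onto a multiple of e_1; cf. (2.2.3)-(2.2.5).
a = a(:);
n = length(a);
u = a;  beta = 0;
if n == 1 || all(a(2:n) == 0)          % nothing to annihilate
    u(1) = 1;  return
end
sigma  = norm(a);
alpha1 = a(1);
s = sign(alpha1);  if s == 0, s = 1; end
u(1) = alpha1 + s*sigma;               % u = a + sign(alpha1)*sigma*e1
u    = u/u(1);                         % rescale so that u(1) = 1
beta = (sigma + abs(alpha1))/sigma;    % then beta*u'*u = 2
end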
Householder reflections for use with complex vectors and matrices can be constructed as
follows; see Wilkinson [1119, 1965, pp. 49–50]. A complex Householder reflector has the form
P = I − βuu^H,   β = \frac{2}{u^H u},   u ∈ C^n,   (2.2.6)
and is Hermitian and unitary (P H = P , P H P = I). Given x ∈ Cn such that eT1 x = ξ1 =
eiα |ξ1 |, we want to determine P so that
and P x = −eiα σe1 . Complex Householder transformations are discussed in more detail by
Lehoucq [732, 1996] and Demmel et al. [309, 2008].
Householder matrices can be used to introduce zeros in a column or a row of a matrix A ∈
Rm×n . The product P A can be computed as follows using only β and the Householder vector u:
This requires 4mn flops and alters A by a matrix of rank one. Similarly, multiplication from the
right is computed as
AP = A(I − βuuT ) = A − β(Au)uT . (2.2.9)
Another useful class of elementary orthogonal transformations are plane rotations, often
called Givens rotations; see Givens [480, 1958]. These have the form
G = \begin{pmatrix} c & s \\ −s & c \end{pmatrix},   c = cos θ,   s = sin θ,   (2.2.10)
A rotation G_{ik}(θ) in the plane spanned by the unit vectors e_i and e_k, i < k, is a rank-two modification of the unit matrix I_n:

G_{ik} = \begin{pmatrix} I & & & & \\ & c & & s & \\ & & I & & \\ & −s & & c & \\ & & & & I \end{pmatrix},   (2.2.11)

with c in positions (i, i) and (k, k), s in position (i, k), and −s in position (k, i).
Premultiplying a column vector a = (α_1, . . . , α_n)^T by G_{ik} will affect only the elements in rows i and k:

\begin{pmatrix} α_i' \\ α_k' \end{pmatrix} = \begin{pmatrix} c & s \\ −s & c \end{pmatrix} \begin{pmatrix} α_i \\ α_k \end{pmatrix} = \begin{pmatrix} cα_i + sα_k \\ −sα_i + cα_k \end{pmatrix}.   (2.2.12)
Any element in a vector or matrix can be annihilated by a plane rotation. For example, if in
(2.2.12) we take
c = α_i/σ,   s = α_k/σ,   σ = \sqrt{α_i^2 + α_k^2} ≠ 0,   (2.2.13)
then α_i' = σ and α_k' = 0. Premultiplication of A ∈ R^{m×n} with a plane rotation G_{ik} ∈ R^{m×m} requires 6n flops and only affects rows i and k in A. Similarly, postmultiplying A with G_{ik} ∈ R^{n×n} will only change columns i and k.
A robust algorithm for computing c, s, and σ in a plane rotation G such that G(α, β)T = σe1
to nearly full machine precision is given below. Note that the naive expression σ = (α2 + β 2 )1/2
may produce damaging underflows and overflows even though the data and result are well within
the range of the floating-point number system.
This algorithm requires two divisions, three multiplications, and one square root. No inverse
trigonometric functions are involved. Overflow can only occur if the true value of σ itself were
to overflow.
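One common way of organizing such a computation is to scale by the larger of |α| and |β| so that α^2 + β^2 is never formed directly. The following MATLAB sketch is a generic variant of this idea, not necessarily the algorithm referred to above.

function [c, s, sigma] = givens2(alpha, beta)
% GIVENS2 sketch: c, s, sigma with [c s; -s c]*[alpha; beta] =
% [sigma; 0], avoiding overflow/underflow in alpha^2 + beta^2.
if beta == 0
    c = 1;  s = 0;  sigma = alpha;
elseif abs(beta) > abs(alpha)
    t = alpha/beta;  r = sqrt(1 + t^2);
    s = 1/r;  c = t*s;  sigma = beta*r;
else
    t = beta/alpha;  r = sqrt(1 + t^2);
    c = 1/r;  s = t*c;  sigma = alpha*r;
end
end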
The standard task of mapping a given vector x ∈ Rm onto a multiple of e1 can be performed
in different ways by a sequence of plane rotations. Let Gik denote a plane rotation in the plane
(i, k) that zeros out the kth component in a vector. Then one solution is to take
Note that because G1k only affects components 1 and k, previously introduced zeros will not be
destroyed later. Another possibility is to take
where Gk−1,k is chosen to zero the kth component. This flexibility of plane rotations compared
to Householder reflections is particularly useful when operating on sparse matrices.
A plane rotation G (or reflector) can be represented by c and s and need never be explicitly
formed. Even more economical is to store either c or s, whichever is smaller. Stewart [1016,
1976] devised a scheme in which the two cases are distinguished by storing the reciprocal of c.
Then c = 0 has to be treated as a special case. If for the matrix (2.2.10) we define
ρ = \begin{cases} 1 & \text{if } c = 0, \\ \operatorname{sign}(c)\, s & \text{if } |s| < |c|, \\ \operatorname{sign}(s)/c & \text{if } |c| ≤ |s|, \end{cases}   (2.2.14)
then the numbers c and s can be retrieved up to a common factor ±1 by
if ρ = 1, then c = 0; s = 1;
if |ρ| < 1, then s = ρ; c = (1 − s2 )1/2 ;
if |ρ| > 1, then c = 1/ρ; s = (1 − c2 )1/2 .
This scheme is used because the formula \sqrt{1 − x^2} gives poor accuracy when |x| is close to unity.
An alternative to plane rotations favored by some are plane reflectors of the form

G̃ = \begin{pmatrix} c & s \\ s & −c \end{pmatrix},   c = cos θ,   s = sin θ,   (2.2.15)

for which det(G̃(θ)) = −1. These reflectors are symmetric and orthogonal, G̃^{−1} = G̃ = G̃^T, and represent a plane rotation followed by a reflection about an axis. From trigonometric identities, it follows that G̃ equals the 2 × 2 Householder reflector

G̃ = I − (I − G̃) = I − 2uu^T,   u = \begin{pmatrix} −sin(θ/2) \\ cos(θ/2) \end{pmatrix}.   (2.2.16)
For efficiency, we want c to be real also in the complex case. If α ̸= 0 and β ̸= 0, this can be
achieved by taking
c = |α|/\sqrt{|α|^2 + |β|^2},   s = sign(α)\, β̄/\sqrt{|α|^2 + |β|^2},   (2.2.18)
where σ = sign(α)\sqrt{|α|^2 + |β|^2}. Here sign(z) = z/|z| is defined for any complex z ≠ 0. If
α = 0 and β ̸= 0, we take
and updating the two factors separately, the number of multiplications can be reduced. The
transformation (2.2.19) is represented in the factored form
where D̃ is a diagonal matrix chosen so that two elements in P are equal to unity. This eliminates
2n multiplications in forming the product P A′ . In actual computation, D2 rather than D is stored
in order to avoid square roots.
Consider first the case |γ| ≥ |σ|, i.e., |θ| ≤ π/4. Then
GD = \begin{pmatrix} d_1 γ & d_2 σ \\ −d_1 σ & d_2 γ \end{pmatrix} = γD \begin{pmatrix} 1 & \frac{d_2 σ}{d_1 γ} \\ −\frac{d_1 σ}{d_2 γ} & 1 \end{pmatrix} = D̃P,
and D̃2 = γ 2 D2 . Since σ/γ = β/α = (d2 /d1 )(β ′ /α′ ), we have
P = \begin{pmatrix} 1 & p_{12} \\ −p_{21} & 1 \end{pmatrix},   p_{21} = \frac{β'}{α'},   p_{12} = \Bigl(\frac{d_2}{d_1}\Bigr)^2 p_{21}.   (2.2.20)
Hence we only need the squares of the scale factors d1 and d2 . The identity γ 2 = (1 + σ 2 /γ 2 )−1
implies that
d˜21 = d21 /t, d˜22 = d22 /t, t = 1 + σ 2 /γ 2 = 1 + p12 p21 . (2.2.21)
This eliminates the square root in the plane transformation. Similar formulas are easily derived
for the other case |γ| < |σ|, i.e., |θ| > π/4, giving
P = \begin{pmatrix} p_{11} & 1 \\ −1 & p_{22} \end{pmatrix},   p_{22} = \frac{α'}{β'},   p_{11} = \Bigl(\frac{d_1}{d_2}\Bigr)^2 p_{22},   (2.2.22)
and
d˜21 = d22 /t, d˜22 = d21 /t, t = 1 + γ 2 /σ 2 = p11 p22 + 1. (2.2.23)
Fast plane rotations have the advantage that they reduce the number of multiplications and square
roots. However, when they are applied, the square of the scale factors is always updated by a
factor in the interval [1/2, 1]. Thus after many transformations the elements in D may underflow.
Therefore the size of the scale factors must be carefully monitored to prevent underflow or overflow.
This substantially decreases the efficiency of fast plane rotations.
Anda and Park [21, 1994] developed self-scaling fast rotations, which obviate rescalings.
Four variations of these modified fast rotations are used. The choice among the four variants is
made to diminish the larger diagonal element while increasing the smaller one.
On modern processors the gain in speed of fast plane rotations is modest, due to the nontrivial
amount of monitoring needed. Hence, their usefulness appears to be limited, and LAPACK does
not make use of them.
The first rotation G23 (ψ) is used to zero the element q31 . Next, G12 (θ) zeros the modified
element q21 . Finally, G23 (ϕ) is used to zero q32 . The angles can always be chosen to make the
diagonal elements positive. Since the final product is orthogonal and upper triangular, it must be
the unit matrix I3 . By orthogonality, we have
A problem with this representation is that the Euler angles may not depend continuously on the
data. If Q equals the unit matrix plus small terms, then a small perturbation may change an
angle by as much as 2π. A different set of angles, based on zeroing the elements in the order
q21 , q31 , q32 , yields a continuous representation and is preferred. This corresponds to the product
For more details, see Hanson and Norris [590, 1981]. An application of Euler angles for solving
the eigenproblem of a symmetric 3×3 matrix is given by Bojanczyk and Lutoborski [167, 1991].
Theorem 2.2.2 (The QR Factorization). For any matrix A ∈ Cm×n , m ≥ n, of full column
rank there exists a factorization
A = Q \begin{pmatrix} R \\ 0 \end{pmatrix} = ( Q_1  Q_2 ) \begin{pmatrix} R \\ 0 \end{pmatrix} = Q_1 R,   (2.2.25)
where Q ∈ Cm×m is unitary, Q1 ∈ Cm×n , and R ∈ Cn×n is upper triangular with real positive
diagonal elements. The matrices R and Q1 = AR−1 are uniquely determined. Q2 is not unique,
and (2.2.25) holds if we substitute Q2 P , where P ∈ C(m−n)×(m−n) is any unitary matrix. The
corresponding orthogonal projectors
P_{R(A)} = Q_1 Q_1^H,   P_{R(A)}^{\perp} = Q_2 Q_2^H = I − Q_1 Q_1^H   (2.2.26)
Proof. The proof is constructive. Set A(1) = A and compute A(k+1) = Hk A(k) , k = 1, . . . , n.
Here Hk is a Householder reflection chosen to zero the elements below the main diagonal in
column k of A(k) . After step k, A(k+1) is triangular in its first k columns, i.e.,
A^{(k+1)} = \begin{pmatrix} R_{11} & R_{12} \\ 0 & Ã^{(k+1)} \end{pmatrix},   k = 1, . . . , n,   (2.2.27)
where R11 ∈ Rk×k is upper triangular and (R11 , R12 ) are the first k rows of the final factor R.
If Ã^{(k)} = (ã_k^{(k)}, . . . , ã_n^{(k)}), H_k is taken as H_k = diag(I_{k−1}, H̃_k), where
H̃_k ã_k^{(k)} = σ_k e_1,   σ_k = r_{kk} = ∥ã_k^{(k)}∥_2.   (2.2.28)
Note that Hk only transforms Ã(k) and does not destroy zeros introduced in earlier steps. After
n steps, we have
A^{(n+1)} = Q^T A = \begin{pmatrix} R \\ 0 \end{pmatrix},   Q = H_1 · · · H_n,   (2.2.29)
which is the QR factorization of A, where Q is given as a product of Householder transforma-
tions. If m = n, the last transformation Hn can be skipped.
From (2.2.25) the columns of Q1 and Q2 form orthonormal bases for R(A) and its orthogonal
complement:
R(A) = R(Q1 ), N (AH ) = R(Q2 ). (2.2.30)
The vectors ũ(k) , k = 1, . . . , n, can overwrite the elements on and below the main diagonal of
A. Thus all information associated with the factors Q and R fits into the array holding A. The
vector (β1 , . . . , βn ) of length n is usually stored separately but can also be recomputed from
βk = 12 (1 + ∥b uk ∥22 )1/2 .
For a complex matrix A ∈ Cm×n the QR factorization can be computed similarly by using
a sequence of unitary Householder reflectors. Note that a factor R with real positive diagonal
elements can always be obtained by a unitary scaling:
A = U \begin{pmatrix} R \\ 0 \end{pmatrix} = (U D^{−1}) \begin{pmatrix} DR \\ 0 \end{pmatrix},   D = diag(e^{iα_1}, . . . , e^{iα_n}).
The factor Q is usually kept in factored form and accessed through βk and the Householder
vectors ũ(k) , k = 1, . . . , n. In step k the application of the Householder reflector to the active
part of the matrix requires 4(m − k + 1)(n − k) flops. Hence, the total flop count becomes
2(mn2 − n3 /3) or 4n3 /3 flops if m = n.
Algorithm 2.2.3 computes the QR factorization of A ∈ Cm×n (m ≥ n) using Householder
transformations. Note that the diagonal element r_{kk} will be positive if a_{kk}^{(k)} is negative and negative otherwise. Negative diagonal elements may be removed by multiplying the corresponding
rows of R and columns of Q by −1.
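A compact MATLAB sketch of Householder QR in this factored form, built on the housegen sketch given earlier (an illustration, not a transcription of Algorithm 2.2.3); its output (U, beta) is stored in the same way as the input expected by the routine houseq1 listed below.

function [U, beta, R] = qrhouse(A)
% QRHOUSE sketch: Householder QR of an m-by-n matrix A (m >= n).
% Column k of U holds the Householder vector u_k (with u_k(k) = 1).
[m, n] = size(A);
U = zeros(m, n);  beta = zeros(n, 1);
for k = 1:n
    [u, beta(k)] = housegen(A(k:m, k));      % zero A(k+1:m, k)
    U(k:m, k) = u;
    A(k:m, k:n) = A(k:m, k:n) - (beta(k)*u)*(u'*A(k:m, k:n));
end
R = triu(A(1:n, 1:n));
end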
If the matrix Q is needed explicitly, it can be generated by accumulating the product
Q = ( Q_1  Q_2 ) = H_1 · · · H_n I_m ∈ R^{m×m}   (2.2.31)
from left to right. Since the transformation H_{k+1} leaves the first k rows unchanged, it follows
that
qk = H1 · · · Hp ek , k = 1 : m, p = min{k, n}. (2.2.32)
Generating the full matrix Q takes 4(mn(m − n) + n3 /3) flops. Algorithm 2.2.4 generates the
matrix Q1 ∈ Rm×n (m ≥ n) and requires 2(mn2 − n3 /3) flops.
function Q = houseq1(U,beta)
% HOUSEQ1 generates the m by n orthonormal matrix
% Q from a given Householder QR factorization
% -----------------------------------------------
[m,n] = size(U);
Q = eye(m,n);                  % backward accumulation of Q1 = H1*...*Hn*eye(m,n)
for k = n:-1:1
uk = U(k:m,k); v = uk'*Q(k:m,k:n);
Q(k:m,k:n) = Q(k:m,k:n) - (beta(k)*uk)*v;
end
end
The matrix Q2 ∈ Rm×(m−n) gives an orthogonal basis for the orthogonal complement
N (AT ) and can be generated in 2n(m − n)(2m − n) flops. Householder QR factorization
is backward stable. The following result is due to Higham [623, 2002, Theorem 19.4].
Theorem 2.2.3. Let R̄ ∈ Rm×n denote the upper trapezoidal matrix computed by the House-
holder QR factorization for A ∈ Rm×n . Then there exists an exactly orthogonal matrix Q ∈
Rm×m such that A + ∆A = QR̄, where
Note that the matrix Q̄ computed by the Householder QR factorization is not the exact or-
thogonal matrix Q in Theorem 2.2.3. However, it is very close to this:
∥Q̄ − Q∥_F ≤ \sqrt{n}\, γ_{mn}.   (2.2.34)
An important special case is the computation of the QR factorization of a matrix of the form
A = \begin{pmatrix} R_1 \\ R_2 \end{pmatrix},
where R1 , R2 ∈ Rn×n are upper triangular. This “merging” of triangular matrices occurs as
a subproblem in parallel QR factorization and in QR factorization of band and other sparse
matrices. If the rows of A are permuted in the order 1, n + 1, 2, n + 2, . . . , n, 2n, then standard
Householder QR can be used without introducing extra fill. This is illustrated in the following
diagram for n = 4:
[Diagram for n = 4: four snapshots of the Householder QR factorization of the row-permuted matrix, showing how the subdiagonal elements (⊗) are annihilated column by column without creating any fill (+).]
In the diagram, × stands for a (potential) nonzero element, ⊗ for a nonzero element that has
been zeroed out, and + for a nonzero element that has been introduced in the computations
(if any). In practice the reordering of the rows need only be carried out implicitly. The QR
factorization requires a total of approximately 2n3 /3 flops if the Householder transformations
are not accumulated.
In Givens QR factorization, a sequence of rotations is used to eliminate the elements below
the diagonal of A. An advantage over Householder QR is that the rotations can be adapted to
the nonzero structure of the matrix. For example, in the QR factorization of band matrices, zeros
can be introduced one row at a time. This case is considered further in Section 4.1. Another
important example arises in algorithms for the unsymmetric eigenvalue problem. Here the QR
factorization of a Hessenberg matrix
H_n = \begin{pmatrix} h_{11} & h_{12} & \cdots & h_{1,n−1} & h_{1,n} \\ h_{21} & h_{22} & \cdots & h_{2,n−1} & h_{2,n} \\ & h_{32} & \ddots & \vdots & \vdots \\ & & \ddots & h_{n−1,n−1} & h_{n−1,n} \\ & & & h_{n,n−1} & h_{n,n} \end{pmatrix} ∈ R^{n×n}
The backward error bound (2.2.33) holds for any ordering of the rotations in Givens QR factorization. Actual errors grow even more slowly.
Two plane rotations Gij and Gkl are said to be disjoint if the integers i, j, k, l are disjoint.
Disjoint rotations commute and can be performed in parallel. To increase the efficiency of Givens
QR factorizations, the rotations can be ordered into groups of disjoint operations. An ordering
suggested by Gentleman [451, 1975] is illustrated as follows for a 6 × 5 matrix:
×  ×  ×  ×  ×
1  ×  ×  ×  ×
2  3  ×  ×  ×
3  4  5  ×  ×
4  5  6  7  ×
5  6  7  8  9
Here an integer k in position (i, j) denotes that the corresponding element is eliminated in step
k. Note that all elements in a group are disjoint. For a matrix A ∈ Rm×n with m > n, m + n − 2
stages are needed.
Then the unique solution x to minx ∥Ax−b∥2 and the corresponding residual vector r = b−Ax
are given by the solution of the upper triangular system Rx = d1 , where
\begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = Q^T b,   r = Q \begin{pmatrix} 0 \\ d_2 \end{pmatrix}.   (2.2.37)
Golub [487, 1965] gives an algorithm using pivoted Householder QR factorization. The
factor Q is not explicitly formed but implicitly defined as Q = H1 H2 · · · Hn :
\begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = H_n · · · H_2 H_1 b,   r = H_1 H_2 · · · H_n \begin{pmatrix} 0 \\ d_2 \end{pmatrix}.   (2.2.38)
An ALGOL implementation of Golub’s least squares algorithm is given in Businger and Golub
[193, 1965]. This later appeared in Wilkinson and Reinsch [1123, 1971].
Householder QR factorization requires 2n2 (m − n/3) flops, and computing QT b and solving
Rx = d1 require a further 4mn − n2 flops. If one wants not only ∥r∥2 but also r, another
4nm − 2n2 flops are needed. This can be compared to the method of normal equations, which
requires (mn2 + n3 /3) flops for the factorization and 2(nm + n2 ) flops for each right-hand
side. For m = n this is about the same as for the Householder QR method, but for m ≫ n the
Householder method is roughly twice as expensive.
In the following algorithm the QR factorization is applied to the extended matrix ( A b ),
( A  b ) = Q \begin{pmatrix} R & d_1 \\ 0 & ρ e_1 \end{pmatrix},   Q = H_1 · · · H_n H_{n+1}.   (2.2.39)
Then Rx = d1 and the residual and its norm are given by
r = H_1 · · · H_n H_{n+1} \begin{pmatrix} 0 \\ ρ e_1 \end{pmatrix},   ∥r∥_2 = ρ ≥ 0.   (2.2.40)
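In MATLAB the computation (2.2.39)–(2.2.40) can be imitated with the built-in qr applied to the extended matrix; a small sketch with placeholder data:

% Least squares via QR of the extended matrix (A b), cf. (2.2.39)-(2.2.40).
A = [1 1; 1 2; 1 3; 1 4];  b = [1; 2; 2; 4];      % small example
[~, T] = qr([A b]);
n   = size(A, 2);
x   = T(1:n, 1:n) \ T(1:n, n+1);                  % solve R*x = d1
rho = abs(T(n+1, n+1));                           % = norm(b - A*x)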
Theorem 2.2.6. Suppose that the full-rank least squares problem minx ∥Ax − b∥2 , m ≥ n, is
solved by Householder QR factorization. Then the computed solution x̂ is the exact least squares
solution to a slightly perturbed problem
\min_x ∥(A + δA)x − (b + δb)∥_2,
In some applications it is important that the computed residual vector r̄ be accurately orthog-
onal to R(A). The backward stability of Householder QR means that the residual computed
Now assume that the residual is computed instead from re = f l(b − f l(Ax)), where x is the exact
least squares solution. Then AT r = 0, and the error analysis for (1.4.11) for inner products gives
|AT r̃| < γn+1 |AT |(|b| + |A||x|). It follows that
This bound is much weaker than the bound (2.2.42) valid for the Householder QR method, par-
ticularly when ∥r̄∥2 ≪ ∥b∥2 .
Let AT y = c be an underdetermined linear system, where A ∈ Rm×n has full row rank n.
Then the least-norm solution y ∈ Rm can be computed from the Householder QR factorization
of A as follows. We have AT = ( RT 0 ) QT and
A^T y = ( R^T  0 ) z = c,   z = Q^T y = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}.   (2.2.43)
Since ∥y∥2 = ∥z∥2 , the problem is reduced to min ∥z1 ∥2 subject to RT z1 = c. Clearly the
least-norm solution is obtained by setting z2 = 0, and
R^T z_1 = c,   y = Q \begin{pmatrix} z_1 \\ 0 \end{pmatrix}.   (2.2.44)
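In MATLAB, (2.2.43)–(2.2.44) can be imitated with the built-in qr; a small self-contained sketch (the data A and c are placeholders):

% Least-norm solution of the underdetermined system A'*y = c
% via the QR factorization of A, cf. (2.2.43)-(2.2.44).
A = [1 1; 1 2; 1 3];  c = [2; 4];        % A is m-by-n with rank n
[Q, R] = qr(A);                          % A = Q*[R1; 0]
n  = size(A, 2);
z1 = R(1:n, 1:n)' \ c;                   % solve R1'*z1 = c
y  = Q*[z1; zeros(size(A,1)-n, 1)];      % y = Q*[z1; 0]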
Proof. See Higham [623, 2002, Theorem 21.4]. (This result from 2002 was first published here;
see Notes and References at the end of Chapter 21.)
The second equation can be used to eliminate the first n components of QT y in the first equation
to solve for x. The last m − n components d2 of QT y are obtained from the last m − n equations
in the first equation. The resulting QR algorithm for solving the augmented system (2.2.46) is
summarized below.
Algorithm 2.2.6 (Augmented System Solution by Householder QR). Compute the House-
holder QR factorization of A ∈ Rm×n , rank(A) = n, and
z = R^{−T} c,   d = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = Q^T b,   (2.2.47)

x = R^{−1}(d_1 − z),   y = Q \begin{pmatrix} z \\ d_2 \end{pmatrix}.   (2.2.48)
The algorithm requires triangular solves with R and RT and multiplications of vectors with
Q and QT for a total of 8mn − 2n2 flops. Higham [618, 1991] gives a componentwise error
analysis of this algorithm, which is of importance in the analysis of iterative refinement of least
squares solution; see Section 2.5.3.
Lemma 2.2.8 (Higham 1991). Let A ∈ Rm×n , rank(A) = n, and suppose the augmented
system is solved by the algorithm in (2.2.47)–(2.2.48) using Householder or Givens QR factor-
ization. Then the computed x̄ and ȳ satisfy
\begin{pmatrix} I & A + E_1 \\ (A + E_2)^T & 0 \end{pmatrix} \begin{pmatrix} ȳ \\ x̄ \end{pmatrix} = \begin{pmatrix} b + e_1 \\ c + e_2 \end{pmatrix},
where
where ⟨·, ·⟩ denotes the inner product. By construction, ⟨qj , qk ⟩ = 0, j ̸= k, and span (q1 , . . . , qk )
= span (x1 , . . . , xk ), k ≥ 1. Replacing each qn by qn /∥qn ∥ gives an orthonormal sequence.
Having an orthogonal basis for this nested sequence of subspaces simplifies many operations.
Given A = (a1 , a2 , . . . , an ) ∈ Rm×n with linearly independent columns, the Gram–Schmidt
process computes a matrix factorization A = QR such that Q = (q_1, q_2, . . . , q_n) ∈ R^{m×n} is orthonormal and R ∈ R^{n×n} is upper triangular. The
difference between Gram–Schmidt and Householder QR factorizations has been aptly formu-
lated by Trefethen and Bau [1068, 1997, p. 70]: Gram–Schmidt is triangular orthogonalization,
whereas Householder is orthogonal triangularization.
The Classical Gram–Schmidt (CGS) algorithm applied to A = (a1 , . . . , an ) proceeds in n
steps, k = 1, . . . , n. In step k the vector ak is orthogonalized against Qk−1 = (q1 , . . . , qk−1 ),
giving
â_k = (I − Q_{k−1} Q_{k−1}^T) a_k = a_k − Q_{k−1} r_k,   r_k = Q_{k−1}^T a_k.   (2.2.51)
Then q_k is obtained by normalizing â_k: r_{kk} = ∥â_k∥_2, q_k = â_k/r_{kk}. Note that if a_1, . . . , a_k are linearly independent, then r_{kk} > 0.
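A direct MATLAB transcription of the CGS step (2.2.51), as a sketch without reorthogonalization or rank checks:

function [Q, R] = cgs(A)
% CGS sketch: classical Gram-Schmidt, cf. (2.2.51).
[m, n] = size(A);
Q = zeros(m, n);  R = zeros(n);
for k = 1:n
    R(1:k-1,k) = Q(:,1:k-1)'*A(:,k);           % r_k = Q_{k-1}'*a_k
    v = A(:,k) - Q(:,1:k-1)*R(1:k-1,k);        % orthogonalize a_k
    R(k,k) = norm(v);
    Q(:,k) = v/R(k,k);
end
end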
In step k of CGS the first k − 1 elements of the kth column of R are computed. CGS is
therefore called a columnwise or left-looking algorithm. The main work in CGS is performed
in two matrix-vector products. By omitting the normalization of ak , a square-root-free CGS
algorithm can be obtained. This gives a factorization A = QR̂, where R̂ is unit upper triangular
and QT Q = D, a positive diagonal matrix. The CGS algorithm requires approximately 2mn2
flops. This is 2n3 /3 flops more than required for Householder QR.
The modified Gram–Schmidt (MGS) algorithm is a slightly different way to carry out
Gram–Schmidt orthogonalization. As soon as a new column qk has been computed, all remain-
ing columns are orthogonalized against it. This determines the kth row of R. Hence MGS is a
row-oriented or right-looking algorithm. At the start of step k the matrix A = A(1) has been
transformed into
( Q_{k−1}  A^{(k)} ) = ( q_1, . . . , q_{k−1}, a_k^{(k)}, . . . , a_n^{(k)} ),
where the columns in A^{(k)} are orthogonal to Q_{k−1}. Normalizing the vector a_k^{(k)} gives
r_{kk} = ∥a_k^{(k)}∥_2,   q_k = a_k^{(k)}/r_{kk}.   (2.2.52)
Next a_j^{(k)}, j = k + 1, . . . , n, is orthogonalized against q_k:
a_j^{(k+1)} = a_j^{(k)} − r_{kj} q_k,   r_{kj} = q_k^T a_j^{(k)}.
The difference between CGS and MGS is subtle and has often gone unnoticed or been misunderstood. Wilkinson [1122,
1971, p. 559] writes “I used the modified process for many years without even explicitly noticing
that I was not performing the classical algorithm.” Columnwise versions of MGS have been used
by Schwarz, Rutishauser, and Stiefel [978, 1968]; see also Gander [437, 1980] and Longley [757,
1981].
In CGS and rowwise MGS algorithms virtually all operations can be implemented as matrix-
vector operations. These versions can be made to execute more efficiently than columnwise
MGS, which uses vector operations. They also offer more scope for parallel implementations.
These aspects are important in deciding which variant should be used in a particular application.
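For comparison with the CGS sketch above, here is a rowwise (right-looking) MGS sketch in MATLAB:

function [Q, R] = mgs(A)
% MGS sketch: rowwise modified Gram-Schmidt, cf. (2.2.52).
[m, n] = size(A);
Q = A;  R = zeros(n);
for k = 1:n
    R(k,k) = norm(Q(:,k));
    Q(:,k) = Q(:,k)/R(k,k);
    R(k,k+1:n) = Q(:,k)'*Q(:,k+1:n);           % k-th row of R
    Q(:,k+1:n) = Q(:,k+1:n) - Q(:,k)*R(k,k+1:n);
end
end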
Rice [926, 1966] was the first to establish the superior stability properties of MGS. Both CGS
and MGS can be shown to accurately reproduce A,
A + E1 = Q̄R̄, ∥E1 ∥2 < c1 u∥A∥2 , (2.2.55)
where c1 = c1 (m, n) is a small constant.
Rounding errors will occur when the orthogonal projections onto previous vectors qi are
subtracted. These errors will propagate to later stages of the algorithm and cause a (sometimes
severe) loss of orthogonality in the computed Q. However, as shown by Björck [125, 1967], for
MGS the loss of orthogonality can be bounded by a factor proportional to κ(A).
Theorem 2.2.9. Let Q̄ denote the orthogonal factor computed by the MGS algorithm. Then,
provided that c2 κ2 (A)u < 1, there is a constant c2 = c2 (m, n) such that
∥I − Q̄^T Q̄∥_2 ≤ \frac{c_2 κ_2(A) u}{1 − c_2 κ_2(A) u}.   (2.2.56)
The loss of orthogonality in CGS can be much more severe. Gander [437, 1980] points out
that even Cholesky QR factorization often gives better orthogonality than CGS. For the stan-
dard version of CGS, not even a bound proportional to κ(A)2 holds unless a slightly altered
“Pythagorean variant” of CGS is used, in which the diagonal entry rkk is computed as
rkk = (s2k − p2k )1/2 = (sk − pk )1/2 (sk + pk )1/2 , (2.2.57)
where s_k = ∥a_k∥_2 and p_k = (r_{1k}^2 + · · · + r_{k−1,k}^2)^{1/2}.
and Langou [1006, 2006] were able to prove the upper bound
∥I − Q̄_1^T Q̄_1∥_2 ≤ c_2(m, n) κ_2(A)^2 u.   (2.2.58)
Example 2.2.10. To illustrate the difference in loss of orthogonality of MGS and CGS, we use
a matrix A ∈ R^{50×10} with singular values σ_i = 10^{−i+1}, i = 1 : 10, generated by computing A = UΣV^T with Σ = diag(σ_1, . . . , σ_{10}).
Here U and V are random orthonormal matrices from the Haar distribution generated by an
algorithm of Stewart [1019, 1980] that uses products of Householder matrices with randomly
chosen Householder vectors. Table 2.2.1 shows κ(Ak ), Ak = (a1 , . . . , ak ), and the loss of
orthogonality in CGS and MGS as measured by ∥Ik − QTk Qk ∥2 for k = 1, . . . , 10. As expected,
the loss of orthogonality in the computed factor Q for MGS is proportional to κ(Ak ). For CGS
the loss is much worse. CGS is therefore often used with reorthogonalization; see Section 2.2.7.
Table 2.2.1. Condition number and loss of orthogonality in CGS and MGS.
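The experiment in Example 2.2.10 can be repeated along the following lines; this sketch uses MATLAB's qr of random Gaussian matrices in place of Stewart's Haar-distributed generator, together with the cgs and mgs sketches given earlier, so the numbers will differ slightly from those in Table 2.2.1.

% Loss of orthogonality of CGS and MGS, in the spirit of Example 2.2.10.
m = 50;  n = 10;
[U, ~] = qr(randn(m, n), 0);  [V, ~] = qr(randn(n));
A = U*diag(10.^(0:-1:1-n))*V';          % singular values 10^(-i+1)
[Qc, ~] = cgs(A);  [Qm, ~] = mgs(A);
fprintf('kappa(A) = %.1e\n', cond(A));
fprintf('CGS: %.1e   MGS: %.1e\n', ...
        norm(eye(n) - Qc'*Qc), norm(eye(n) - Qm'*Qm));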
It is important to note that all Gram–Schmidt algorithms are invariant under column scalings. Let D > 0 be a diagonal matrix. Then the Gram–Schmidt algorithms applied to the scaled matrix Ã = AD will yield the factors Q̃ = Q and R̃ = RD. This is true also in floating-point
arithmetic, provided that the entries of D are powers of two so that the scaling is done without
error. From the invariance under column scaling it follows that κ2 (A) in (2.2.56) can be replaced
by
κ̃_2 = \min_{D∈D} κ_2(AD),   (2.2.59)
where D is the set of all positive diagonal matrices. Scaling A so that all column norms in A are
equal will approximately minimize κ2 (AD); see Theorem 2.1.5.
The history of Gram–Schmidt orthogonalization is surveyed by Leon, Björck, and Gander [733,
2013]. What is now called the “classical” Gram–Schmidt (CGS) algorithm appeared in Schmidt
[971, 1907], [972, 1908] in the context of solving linear systems with infinitely many unknowns.
Schmidt remarked that his formulas were similar to those given earlier by J. P. Gram [525, 1883]
in a paper on series expansions of real functions using least squares. Gram used the “modi-
fied” Gram–Schmidt (MGS) algorithm for orthogonalizing a sequence of functions and applied
the results to applications involving integral equations. Gram was influenced by the work of
Chebyshev, and his original orthogonalization procedure was applied to orthogonal polynomials.
The earliest linkage of the names Gram and Schmidt to describe the orthonormalization
process appears to be in a paper by Wong [1130, 1935]. A process similar to MGS had already
been used by Laplace [721, 1816] for solving a least squares problem; see Farebrother [396,
1988] and Langou [720, 2009]. However, Laplace seems not to have recognized the crucial role
of orthogonality. Bienaymé [118, 1853] developed a similar process related to an interpolation
algorithm of Cauchy [212, 1837] that forms the basis of Thiele’s theory of linear estimation.
where P ∈ R(n+m)×(n+m) and P11 ∈ Rn×n . Recall that the Householder transformation
P a = e1 ρ uses
P = I − 2vv T /∥v∥22 , v = a − e1 ρ, ρ = ±∥a∥2
(e_k is the kth column of the unit matrix). If (2.2.60) is obtained using Householder transformations, then these have the form P_k = I − 2 v̂_k v̂_k^T/∥v̂_k∥_2^2, where the vectors v̂_k are as described below. From MGS applied to A^{(1)} = A, r_{11} = ∥a_1^{(1)}∥_2 and a_1^{(1)} = q_1' = q_1 r_{11}. Thus, for the first Householder transformation applied to the augmented matrix,

Ã^{(1)} ≡ \begin{pmatrix} O_n \\ A^{(1)} \end{pmatrix},   ã_1^{(1)} = \begin{pmatrix} 0 \\ a_1^{(1)} \end{pmatrix},
v̂_1 ≡ \begin{pmatrix} −e_1 r_{11} \\ q_1' \end{pmatrix} = r_{11} v_1,   v_1 = \begin{pmatrix} −e_1 \\ q_1 \end{pmatrix}

(and since there can be no cancellation, we take r_{kk} ≥ 0). But ∥v_1∥_2^2 = 2, giving P_1 = I − v_1 v_1^T and

P_1 ã_j^{(1)} = ã_j^{(1)} − v_1 v_1^T ã_j^{(1)} = \begin{pmatrix} 0 \\ a_j^{(1)} \end{pmatrix} − \begin{pmatrix} −e_1 \\ q_1 \end{pmatrix} q_1^T a_j^{(1)} = \begin{pmatrix} e_1 r_{1j} \\ a_j^{(2)} \end{pmatrix},
so

P_1 Ã^{(1)} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \\ 0 & a_2^{(2)} & \cdots & a_n^{(2)} \end{pmatrix},
where these values are clearly numerically the same as in the first step of MGS on A. The next
Householder transformation produces the second row of R and a_3^{(3)}, . . . , a_n^{(3)}, just as in MGS.
We have
Q̃^T \begin{pmatrix} 0 \\ A \end{pmatrix} = \begin{pmatrix} R \\ 0 \end{pmatrix},   Q̃^T = H̃_n · · · H̃_2 H̃_1,   (2.2.62)
where ek denotes the kth unit vector, and the sign is chosen so that R has a positive diagonal.
With qk = q̂k /rkk it follows that
H̃_k = I − v_k v_k^T,   v_k = \begin{pmatrix} −e_k \\ q_k \end{pmatrix},   ∥v_k∥_2^2 = 2.   (2.2.63)
Initially the first n rows are empty. Hence, the scalar products of vk with later columns will only
involve qk , and as is easily verified, the quantities rkj and qk are numerically the same as in the
MGS method. It follows that Householder QR is numerically equivalent to MGS applied to A.
From the backward stability of Householder QR we have the following result.
Theorem 2.2.11. There exists an exactly orthonormal matrix Q̂1 ∈ Rm×n such that for the
computed matrix R̄ in MGS it holds that
A consequence of this result is that the factor R̄ computed by MGS is as good as the triangular
factor obtained by using Householder or Givens QR. The result can be sharpened to show that R̄
is the exact triangular QR factor of a matrix near A in the columnwise sense; see Higham [623,
2002, Theorem 19.13].
For a matrix Q1 = (q1 , . . . , qn ) with any sequence q1 , . . . , qn of unit 2-norm vectors, the
matrix P = P_1 P_2 · · · P_n, with P_k = I − v_k v_k^T and v_k = \begin{pmatrix} −e_k \\ q_k \end{pmatrix}, has a very special structure. The following result holds without recourse to the MGS connection.
As shown by Paige [862, 2009] it can be used to simplify the error analysis of several other
algorithms.
Then
P = P_1 P_2 · · · P_n = \begin{pmatrix} P_{11} & (I − P_{11}) Q_1^T \\ Q_1 (I − P_{11}) & I − Q_1 (I − P_{11}) Q_1^T \end{pmatrix}.   (2.2.66)
The matrix P is orthogonal and depends only on Q1 and the strictly upper triangular matrix
P11 . P11 = 0 if and only if QT1 Q1 is diagonal, and then
P = \begin{pmatrix} 0 & Q_1^T \\ Q_1 & I − Q_1 Q_1^T \end{pmatrix}.   (2.2.67)
No assumption about the orthogonality of Q1 is needed for this to be true. However, if ρ ≪ ∥b∥2 ,
then q_{n+1} fails to be accurately orthogonal to R(A). A backward stable r is obtained by adding a reorthogonalization step, in which the computed r is orthogonalized against q_n, . . . , q_2, q_1, in this order. The proof of backward stability of this algorithm for computing x and r is by no means
by Householder QR. Applying the Householder transformations to the right-hand side in (2.2.70)
gives
\begin{pmatrix} d \\ e \end{pmatrix} = H_n · · · H_1 \begin{pmatrix} 0 \\ b \end{pmatrix}.
An implementation is given below.
The equivalence between MGS and Householder QR factorization can also be used to obtain
a backward stable algorithm for computing the minimum norm solution of an underdetermined
linear system,
min ∥y∥2 subject to AT y = c, (2.2.71)
where A ∈ Rm×n , rank(A) = n; see Björck [134, 1994]. Consider now using Householder QR
factorization to solve the equivalent least-norm problem
\min \left\| \begin{pmatrix} w \\ y \end{pmatrix} \right\|_2 \text{ subject to } \begin{pmatrix} 0 \\ A \end{pmatrix}^T \begin{pmatrix} w \\ y \end{pmatrix} = c.   (2.2.72)
From the special form of the matrices Hk , this leads to the following algorithm: Set y (n) = 0
and
y (k−1) = y (k) − (ωk − ζk )qk , ωk = qkT y (k) , k = n, . . . , 1. (2.2.74)
Then the least-norm solution is y = y (0) . The quantities ωk compensate for the lack of orthog-
onality of Q1 . If Q1 is exactly orthogonal, they are zero.
Algorithms 2.2.10 and 2.2.11 are columnwise backward stable in the same sense as the cor-
responding Householder QR factorizations; see Theorem 2.2.6 and Theorem 2.2.7, respectively.
The backward stable algorithm (2.2.47)–(2.2.48) using Householder QR factorization for
solving the augmented system
\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix},   (2.2.75)
where A ∈ R^{m×n} with rank(A) = n, was given in Section 2.2.3. The interpretation of MGS
as a Householder method shows the strong backward stability property of the following MGS
algorithm for solving augmented systems; see Björck and Paige [150, 1994].
4. Solve Rx = d − z for x.
Algorithm 2.2.12 requires 8mn + 2n2 flops and generalizes the previous two algorithms.
It is easily verified that if c = 0, it reduces to Algorithm 2.2.10, and if b = 0, it reduces to
Algorithm 2.2.11. The stability of the MGS algorithm for solving augmented systems is analyzed
by Björck and Paige [150, 1994].
2.2.7 Reorthogonalization
As shown in Section 2.2.4, the loss of orthogonality in the computed Q1 = (q1 , . . . , qn ) as
measured by ∥I − QT1 Q1 ∥2 is proportional to κ(A) for MGS and to κ(A)2 for a variant of CGS.
In many applications it is essential that the computed vectors be orthogonal to working accuracy.
In the orthogonal basis problem A is given, and we want to find Q1 and R such that
for modest constants c1 (m, n) and c2 (m, n). One important application of reorthogonalization
is subspace projection methods for solving eigenvalue problems.
To study the loss of orthogonality in an elementary orthogonalization step, let A = (a_1, a_2) ∈ R^{m×2} consist of two given linearly independent unit vectors. Let q_1 = a_1 and q_2' =
a2 − r12 q1 , r12 = q1T a2 , be the exact results. The corresponding quantities in floating-point
arithmetic are
r12 = f l(q1T a2 ), q ′2 = f l(a2 − f l(r12 q1 )).
The errors can be bounded by (see Section 1.4.2)
(The errors in the normalization are negligible.) This shows that loss of orthogonality results
when cancellation occurs in the computation of q ′2 . This is the case when r22 = sin(ϕ) is small,
where ϕ is the angle between a1 and a2 . Then the orthogonalization can be repeated:
Hence for α = 0.5 the computed vector q̂2′ is orthogonal to machine precision. For smaller values
of α, reorthogonalization will occur less frequently, and then the bound (2.2.79) on orthogonality
is less satisfactory.
For A = (a1 , . . . , an ), n > 2, selective reorthogonalization is used in a similar way.
In step k, k = 2, . . . , n, CGS or MGS is applied to make ak orthogonal to an orthonor-
mal Q1 = (q1 , . . . , qk−1 ), giving a computed vector q̄k′ . The vector q̄k′ is accepted provided
r̄kk = ∥q̄k′ ∥2 > α∥ak ∥2 . Otherwise, q̄k′ is reorthogonalized against Q1 . Rutishauser [951, 1970]
performs reorthogonalization when at least one decimal digit of accuracy has been lost due to
cancellation. This corresponds to selective reorthogonalization with α = 0.1. Hoffmann [637,
1989] reports extensive numerical tests with iterated reorthogonalization for CGS and MGS for
a range of values of α = 1/2, 0.1, . . . , 10−10 . The tests show that α = 0.5 makes Q1 orthogonal
to full working precision after one reorthogonalization. Moreover, with α = 0.5, CGS performs
as well as MGS. √
Daniel et al. [285, 1976] recommend using α = 1/ 2. Under certain technical assumptions,
they show that provided A has full numerical rank, iterated reorthogonalization converges to
70 Chapter 2. Basic Numerical Methods
a sufficient level of orthogonality. If failure occurs in step k, one option is to not generate a
new vector qk in this step, set rkk = 0, and proceed to the next column. This will generate a
QR factorization where, after a suitable permutation of columns, Q is m × (n − p) and R is
(n − p) × n upper trapezoidal with nonzero diagonal entries. This factorization can be used to
compute the pseudoinverse solution to a least squares problem.
If full orthogonality is desired, the simplest option is to always perform one reorthogonal-
(0)
ization, i.e., the column vectors ak = ak , k ≥ 2, in A are orthogonalized twice against the
computed basis vectors Qk−1 = (q1 , . . . , qk−1 ):
(2) (2)
The new basis vector is then given as qk = ak /∥ak ∥2 .
The corrections to the elements in R are in general small and may be omitted. However,
Gander [437, 1980] has shown that including them will give a slightly lower error in the com-
puted residual A − QR. A scheme MGS2 similar to CGS2 can be employed for the columnwise
MGS algorithm. This has the same operation count as CGS2, and both produce basis vectors
with orthogonality close to unit roundoff level. For MGS2 the inner loop is a vector operation,
whereas in CGS2 it is a matrix-vector operation. Hence MGS2 executes slower than CGS2,
which therefore usually is the preferred choice.
Giraud and Langou [479, 2002] analyze a different version of MGS2. Let the initial factor-
ization A = Q1 R be computed by rowwise MGS. MGS is then applied a second time to the
computed Q1 to give Q1 = Q e 1 R.
e Combining this and the first factorizations yields the corrected
factorization
A=Q e 1 R,
b R b = RR.
e
This algorithm can be proved to work under weaker assumptions than those for CGS2. From the
analysis of MGS by Björck [125, 1967] and Björck and Paige [149, 1992], Giraud and Langou
get the following result.
2.3. Rank-Deficient Least Squares Problems 71
Then for the factorization A = Q1 R computed by MGS, it holds that κ(Q1 ) ≤ 1.3.
Hence Q
e 1 is orthonormal to machine precision.
QT1 Q1 rk = QT1 ak .
Iterated CGS corresponds to the Jacobi, and iterated MGS corresponds to the Gauss–Seidel iter-
ative method for solving this system; see Section 6.1.4.
Theorem 2.3.1. Let C = ATA ∈ Rn×n be a symmetric positive semidefinite matrix of rank
r < n. Then there is a permutation P such that P T CP has a unique Cholesky factorization of
the form
R11 R12
P T ATAP = RTR, R = , (2.3.1)
0 0
where R11 ∈ Rr×r is upper triangular with positive diagonal elements.
Proof. The proof is constructive. The algorithm takes C (1) = ATA and computes a sequence of
matrices
(k) 0 0
C (k) = (cij ) = , k = 1, 2, . . . .
0 S (k)
At the start of step k we select the maximum diagonal element of C (k) ,
(k)
sp = max cii ,
k≤i≤n
72 Chapter 2. Basic Numerical Methods
and interchange rows and columns p and k to bring this into pivot position. This pivot must be
positive for k < r, because otherwise S (k) = 0, which implies that rank(C) < r. Next, the
elements in the permuted C (k) are transformed according to
q
(k) (k)
rkk = ckk , rkj = ckj /rkk , j = k + 1 : n,
(k+1) (k) T
cij = cij − rki rkj , i, j = k + 1 : n.
This is equivalent to subtracting a symmetric rank-one matrix rj rjT from C (k) , where rj = eTj R
is the jth row of R. The algorithm stops when k = r + 1. Then all diagonal elements are zero,
which implies that C (r+1) = 0.
Since all reduced matrices C (k) are symmetric positive semidefinite, their maximum ele-
ments lie on the diagonal. Hence, the pivot selection in the outer product Cholesky algorithm
described above is equivalent to complete pivoting. The algorithm produces a matrix R whose
diagonal elements in R form a nonincreasing sequence r11 ≥ r22 ≥ · · · ≥ rnn . Indeed, the
stronger inequalities
j
X
2 2
rkk ≥ rij , j = k + 1, . . . , n, k = 1 : r, (2.3.2)
i=k
and set rank(C) = k − 1. But this may cause unnecessary work in eliminating negligible
elements. Taking computational cost into consideration, we recommend the stopping criterion
(k) 2
max aii ≤ cn u r11 , (2.3.3)
k≤i≤n
where cn is a modest constant; see also Higham [623, 2002, Sect. 10.3.2]. Perturbation theory
and error analysis for the Cholesky decomposition of semidefinite matrices are developed by
Higham [617, 1990].
In the rank-deficient case, the permuted normal equations become
RTRe
x = d,
e x = Px
e, de = P T (AT b).
With z = Re
x, we obtain
T
R11 de1
RT z = T z= ,
R12 de2
where R11 ∈ Rr×r is nonsingular. The triangular system R11
T
z = de1 determines z ∈ Rr . From
e1 = z − R12 x
R11 x e2 ,
T
where xe = (x eT1 x eT2 ) , we can determine xe1 for an arbitrarily chosen x
e2 . This expresses the
fact that a consistent singular system has an infinite number of solutions. Finally, the permuta-
tions are undone to obtain x = P x e.
2.3. Rank-Deficient Least Squares Problems 73
Setting x
e2 = 0 we get a basic solution xb with only r = rank(A) nonzero components in
x, corresponding to the first r columns in AP . This is relevant when a good least squares fit of
b using as few variables as possible is desired. The pseudoinverse solution x† that minimizes
∥x∥2 = ∥e x∥2 is obtained from the full-rank least squares problem
S xb −1
min x2 − , S = R11 R12 . (2.3.4)
x2 −In−r 0 2
The basic solution xb can be computed in about r2 (n − r) flops. Note that S can overwrite R12 .
Then x2 can be computed from the normal equations,
(S T S + In−r )x2 = S T xb ,
Theorem 2.3.2. Given A ∈ Rm×n with rank(A) = r < n, there is a permutation matrix P and
an orthogonal matrix Q ∈ Rm×n , such that
R11 R12 }r
AP = Q , (2.3.5)
0 0 }m − r
Proof. Since rank(A) = r, we can always choose a permutation matrix P such that AP =
( A1 A2 ), where A1 ∈ Rm×r has linearly independent columns. The QR factorization
R11
QTA1 = , Q = ( Q1 Q2 )
0
uniquely determines Q1 ∈ Rm×r and R11 ∈ Rr×r with positive diagonal elements. Then
T T T R11 R12
Q AP = ( Q A1 Q A2 ) =
0 R22
has rank r. Here R22 = 0, because R cannot have more than r linearly independent rows. Hence
the factorization must have the form (2.3.5).
From (2.3.5) and orthogonal invariance it follows that the least squares problem minx ∥Ax −
b∥2 is equivalent to
R11 R12 x
e1 d1
min − , (2.3.6)
x 0 0 x
e2 d2 2
and z = x
e2 can be chosen arbitrarily. For z = 0, we obtain a basic least squares solution
x
eb −1
x=P , x eb = R11 d1 , (2.3.8)
0
Here S can be computed in about r2 (n − r) flops by solving the matrix equation R11 S = R12
using back-substitution.
A general approach to resolve rank-deficiency is to seek the solution to the least squares
problem
min ∥Bx∥2 , S = {x | min ∥Ax − b∥2 }. (2.3.10)
x∈S x
Here B can be chosen so that ∥Bx∥2 is a measure of the smoothness of x. Substituting the
general solution (2.3.7), we find that (2.3.10) is equivalent to
S x
eb
min B z−B . (2.3.11)
z −In−r 0 2
is a (nonorthonormal) basis for N (AP ). QR factorization gives an orthonormal basis for N (AP ).
Note that the unique pseudoinverse solution orthogonal to N (AP ) equals the residual of the least
squares problem (2.3.11) with B = I,
† xeb S
x
e = − z. (2.3.13)
0 −In−r
Notice that it has the form of the basic solution minus a correction in the nullspace of AP . Any
particular solution can be substituted for z in (2.3.11).
in which the pivot column at step k is chosen to maximize the diagonal element rkk . We first
show how to implement this strategy for MGS. Assume that after (k − 1) steps the nonpivotal
columns have been transformed according to
k−1
(k)
X
aj = aj − rij qi , j = k, . . . , n,
i=1
2.3. Rank-Deficient Least Squares Problems 75
(k)
where aj is orthogonal to R(Ak−1 ) = span {q1 , . . . , qk−1 }. Hence in the kth step we should
determine p, so that
(k) 2
∥a(k) 2
p ∥2 = max ∥aj ∥2 , (2.3.15)
k≤j≤n
and interchange columns k and p. This is equivalent to choosing at the kth step a pivot column
with largest distance to the subspace R(Ak−1 ) = span (ac1 , . . . , ack−1 ), where Ak−1 is the
submatrix formed by the columns corresponding to the first k − 1 selected pivots. We note that
for this pivot strategy to be relevant it is essential that the columns of A be well scaled.
Golub [487, 1965] gave an implementation of the same pivoting strategy for Householder
QR. Assume that after k steps of pivoted QR factorization the reduced matrix is
R11 R12
, (2.3.16)
0 A(k)
(k) (k)
where R11 ∈ Rk×k is square upper triangular and A(k) = (a1 , . . . , an ). Let p be the smallest
index such that
(k) (k) (k)
s(k)
p ≥ sj , sj = ∥aj ∥22 , j = k + 1, . . . , n,
(k)
where aj are the columns of the submatrix A(k) . Then before the next step, columns k + 1 and
p in A(k) are interchanged. The pivot column maximizes
(k) (k)
sj = min ∥A(k) y − aj ∥22 , j = k, . . . , n. (2.3.17)
y
(k)
The quantities sj can be updated by formulas similar to (2.3.21) used for MGS, but some care
is necessary to avoid numerical cancellation.
With the column pivoting strategy described above, the diagonal elements in R will form
a nonincreasing sequence r11 ≥ r22 ≥ · · · ≥ rrr . It is not difficult to show that, in fact, the
diagonal elements in R satisfy the stronger inequalities
j
X
2 2
rkk ≥ rij , j = k + 1, . . . , n, k = 1 : r. (2.3.18)
i=k
where P1,j is the permutation matrix that interchanges columns 1 and j. Then ∥A∥2F ≤ nr112
,
which yields upper and lower bounds for σ1 (A),
√
|r11 | ≤ σ1 (A) ≤ n |r11 |. (2.3.20)
(k)
If the column norms ∥aj ∥2 are recomputed at each stage of MGS, this will increase the opera-
tion count of the QR factorization by 50%. Since these quantities are invariant under orthogonal
transformations, this overhead can be reduced to O(mn) operations by using the recursion
(k+1) 2 (k)
∥aj ∥2 = ∥aj ∥22 − rkj
2
, j = k + 1, . . . , n, (2.3.21)
(k+1)
to update these values. To avoid numerical problems, sj should be recomputed from scratch
(k+1) (k) √
whenever there has been substantial cancellation, e.g., when ∥aj ∥2 ≤ ∥aj ∥2 / 2.
76 Chapter 2. Basic Numerical Methods
If a diagonal element rkk in QRP vanishes, it follows from (2.3.18) that rij = 0, i, j ≥ k.
Assume that at an intermediate stage of QRP the new diagonal element satisfies rk+1,k+1 ≤ δ
for some small δ. Then by (2.3.18),
∥A(k) ∥F ≤ (n − k)1/2 δ,
and setting A(k) = 0 corresponds to a perturbation Ek of A, such that A + Ek has rank-k and
∥Ek ∥F ≤ (n − k)1/2 δ. The matrix
obtained by neglecting R22 , is the best rank-k approximation to A that differs from AP only in
the last n − k columns. In particular, with k = n − 1 we obtain ∥A − Â∥F = rnn .
Example 2.3.3. The following example by Kahan [680, 1966] shows that QR factorization with
standard pivoting can fail to reveal near singularity of a matrix. The matrix
1 −c −c . . . −c
1 −c . . . −c
n−1
.. ..
An = diag(1, s, . . . , s ) 1 . . , 0 ≤ c ≤ 1,
..
. −c
1
where s2 + c2 = 1, is already in upper triangular form. Because the inequalities (2.3.18) hold,
An is invariant under QR factorization with column pivoting.4 For n = 100 and c = 0.2 the two
smallest singular values are σn = 3.6781 · 10−9 and σn−1 = 0.1482. However, the two smallest
diagonal elements of Rn are rn−1,n−1 = sn−2 = 0.1299 and rnn = sn−1 = 0.1326, and the
near singularity of An is not revealed.
The column pivoting strategy described is independent of the right-hand side b and may not
be the most appropriate for solving a given least squares problem. For example, suppose b is a
multiple of a column in A. With standard pivoting this may not be detected until the full QR
factorization has been computed. An alternative strategy is to select the pivot column in step
k + 1 as the column for which the current residual norm ∥b − Ax(k) ∥2 is maximally reduced. For
MGS this is achieved by choosing as pivot the column ap that makes the smallest acute angle
(k) (k)
with r(k) . Hence, with γj = (aj )T r(k) , the column is chosen to maximize
(k) (k)
(γj )2 /∥aj ∥22 . (2.3.23)
3|rnn |
σn ≥ √ ≥ 21−n |rnn |. (2.3.24)
4n + 6n − 1
This shows σn can be much smaller than |rnn | for moderately large values of n. Example 2.3.3
shows that the bound in (2.3.24) can almost be attained.
4 Due to roundoff, pivoting actually may occur in floating-point arithmetic. This can be avoided by making a small
Stewart [1023, 1984] shows that better bounds for σn can be found from QR factorization
using so-called reverse column pivoting. This determines the permutation P1,j so that
|rnn | = min {|eTn Ren | | AP1,j = QR}.
1≤j≤n
Theorem 2.3.4. Assume that we have a complete orthogonal decomposition (2.3.27) of A. Then
if V = ( V1 V2 ), P V2 is an orthogonal basis for the nullspace of dimension (n − r) of A.
Furthermore, the pseudoinverse of A is
−1
R 0
A† = P V QT , (2.3.28)
0 0
and x† = P V1 R−1 QT1 b is the pseudoinverse solution of the problem minx ∥Ax − b∥2 .
78 Chapter 2. Basic Numerical Methods
For matrices A ∈ Rm×n that are only close to being rank-deficient with r < n, Stewart
[1025, 1992] introduced the URV decomposition. This has the form
R11 R12
AP = U V T , R11 ∈ Rr×r , (2.3.29)
0 R22
σ1 ≥ σ2 ≥ · · · ≥ σr ≫ σr+1 ≥ · · · ≥ σn ,
↓ ↓ ↓ ↓
r r r r → r r r r r r r r
⇒ → ⊕
+ r r r r r r 0 r r r
⇒
0 0 r r 0 0 r r 0 + r r
0 0 0 r 0 0 0 r 0 0 0 r
↓ ↓
r r r r r r r r r r r e
→ 0 r r r ⇒
0 r r r 0 r r e
⇒ .
→ 0 ⊕ r r 0 0 r r → 0 0 r e
0 0 0 r 0 0 + r → 0 0 ⊕ e
This process requires O(n2 ) multiplications. We now have
Stewart [1026, 1993] has suggested a refinement process for the URV decomposition (2.3.29),
which reduces the size of the block R12 and increases the accuracy in the nullspace approxima-
tion. It can be viewed as one step of the zero-shift QR algorithm (7.1.19), and can be iterated,
and will converge quickly if there is a large relative gap between the singular values σk and σk+1 .
Alternatively one can work with the corresponding decomposition of lower triangular form, the
rank-revealing ULV decomposition
L11 0
A=U V T. (2.3.31)
L21 L22
For this decomposition with the partitioning V = ( V1 V2 ), ∥AV2 ∥2 = ∥L22 ∥F . Hence the
size of ∥L21 ∥F does not adversely affect the nullspace approximation.
Suppose we have a rank-revealing factorization
L11 0
AP = Q ,
L21 L22
where L11 and L22 are lower triangular and σk (L11 ) ≥ σk /c, and ∥L22 ∥2 ≤ cσk+1 for some
constant c. (Such a factorization can be obtained from a rank-revealing QR factorization by
reversing the rows and columns of the R-factor.) Then a rank-revealing ULV decomposition can
be obtained by a similar procedure as shown above for the URV decomposition. Suppose we
have a vector w such that ∥wT L∥2 is small. Then, as before, w is first reduced to the unit vector
en :
QT wn = GTn−1,n · · · GT12 wn = ∥wn ∥2 en .
The sequence of plane rotations are then applied to L from the left, and extra rotations from the
right are used to preserve the lower triangular form.
Definition 2.3.5. Let A ∈ Rm×n , m ≥ n, be a given matrix, and let Πk be a permutation. Then
the QR factorization
R11 R12 }k
AΠk = QR = Q , 1 ≤ k < n, (2.3.32)
0 R22 }m − k
is said to be a rank-revealing QR (RRQR) factorization if
From the interlacing property of the singular values (Theorem 1.3.5) it follows that
The permutation Π should be chosen so that the smallest singular value of the k first columns
of A1 is maximized and the largest singular value of A2 is minimized. Note that an exhaustive
search is not feasible because this has combinatorial complexity. It can be shown that an RRQR
factorization always exists.
Theorem 2.3.6. Let A ∈ Rm×n , (m ≥ n), and let k be a given integer 0 < k < n. Then there
is a permutation matrix Πk that gives an RRQR factorization (2.3.32) with
p
c = k(n − k) + 1. (2.3.35)
As pointed out by Stewart [1024, 1992] the sense in which the RRQR algorithms are rank-
revealing is different from that of the SVD. Given A ∈ Rm×n and a value k < n they produce a
permutation Π that reveals if there is a gap between σk and σk+1 . For a different value of k the
permutation may change.
Golub, Klema, and Stewart [496, 1976] (see also Golub and Van Loan [512, 1996, Sect.
12.2]) note that the selection of Π in an RRQR factorization is related to the column subset
selection problem of determining a subset A1 of k < n columns in A ∈ Rm×n such that
∥A − (A1 A†1 )A∥2 is minimized over all possible choices. This is closely related to the selection
of a subset of rows of the matrix of right singular vectors of A corresponding to small singular
values, as explained in the following theorem.
Proof. See Hong and Pan [638, 1992, Theorem 1.5]. The equality (2.3.37) follows by applying
the CS decomposition (see Section 1.2.4) to the orthogonal matrix
T
Π1 V1 ΠT1 V2
ΠT V = .
ΠT2 V1 ΠT2 V2
Here the matrix V2 of right singular vectors can be replaced by any orthonormal basis for the
column space of V2 .
It remains to obtain a sufficiently sharp lower bound for c. When k = n − 1, this is easily
√ the right singular vector corresponding to σn . Since ∥vn ∥2 = 1, it follows that
done. Let vn be
∥vn ∥∞ ≥ 1/ n. Hence, taking Π to be the permutation matrix that permutes √ the maximum
element in vn to the last position guarantees that (2.3.37) holds with c = n.
Algorithms for computing an RRQR factorization based on computing the SVD of A are
usually not practical. If the SVD is known, this is already sufficient for most purposes.
Example 2.3.8. Let Rn , n = 100, be the Kahan matrix in Example 2.3.3. The largest element
in the right singular vector vn corresponding to σn is v1,n = 0.553, whereas the element vn,n =
1.60 · 10−8 is very small. Therefore, we perform a cyclical shift of the columns in Rn that puts
the first column last, i.e., in the order 2, 3, . . . , n, 1,
−c −c . . . −c 1
1 −c . . . −c 0
n−1
.. .. ..
Hn = Rn Π = diag(1, s, . . . , s ) 1 . . . .
. . . −c 0
1 0
The matrix Hn has Hessenberg form and can be retriangularized in less than 2n2 flops using
updating techniques. Hence the total cost of this factorization is only slightly larger than that
for the standard QR factorization. In the new R-factor R̄ the last diagonal element r̄n,n =
6.654 · 10−9 is of the same order of magnitude as the smallest singular value 3.678 · 10−9 .
Furthermore, r̄n−1,n−1 = 0.16236. Hence R̄ is rank-revealing.
To obtain a sharp lower bound for c in (2.3.36) when k > 1 is more difficult. Recall that the
volume of a matrix X ∈ Rm×k , m ≥ k, is defined as the product of its singular values:
vol (X) = | det(X)| = σ1 (X) · · · σk (X).
Hong and Pan [638, 1992] show that selecting a permutation Π = ( Π1 Π2 ) such that
vol (ΠT2 V2 ) is maximum among all possible (n − k) by (n − k) submatrices in V2 is sufficient
to give an RRQR factorization.
Lemma 2.3.9. Let the unit vector v, ∥v∥2 = 1, be such that ∥Av∥2 = ϵ. Let Π be a permutation
such that if w = ΠT v, then |zn | = ∥w∥∞ . Then, in the QR factorization of AΠ we have
|rnn | ≤ n1/2 ϵ.
Proof. Since |zn | = ∥w∥∞ and ∥v∥2 = ∥w∥2 = 1, it follows that |zn | ≥ n−1/2 . Furthermore,
QTAv = QTAΠ(ΠT v) = Rw,
where the last component of Rw is rnn zn . Therefore, ϵ = ∥Av∥2 = ∥QTAw∥2 = ∥Rw∥2 ≥
|rnn zn |, from which the result follows.
Chan [224, 1987] gives a more efficient approach for the special case k = n − 1, based on
inverse iteration. Let AΠG = QR be an initial QR factorization using standard pivoting. Then
(ATA)−1 = (RTR)−1 = R−1 R−T ,
for which the dominating eigenvalue is σ1−2 . Each step then requires 2n2 flops for the solution
of the two triangular systems
RT y (k) = w(k−1) , Rw(k) = y (k) . (2.3.38)
82 Chapter 2. Basic Numerical Methods
By a few steps of inverse iteration, an approximation to σn and the corresponding singular vector
vn are obtained. From this a permutation matrix Π is determined as in Lemma 2.3.9. An RRQR
factorization of RΠ = Q̄R̄ can then be computed using updating techniques as in Example 2.3.8.
The above one-dimensional technique can be extended to the case when the approximate
nullspace is larger than one by applying it repeatedly to smaller and smaller leading blocks of R
as described in Algorithm 2.3.1.
wk
5. Assign to the kth column of W and update
0
W1 PT 0
W = := W,
W2 0 In−k
A similar algorithm was proposed independently by Foster [424, 1986]. The main difference
between the two algorithms is that Foster’s algorithm only produces a factorization for a subset
of the columns of the original matrix.
By the interlacing property of singular values (Theorem 1.2.9) it follows that the δi are non-
increasing and that the singular values σi of A satisfy δi ≤ σi , k + 1 ≤ i ≤ n. Chan [224, 1987]
proves the following upper and lower bounds.
(i) (i)
Theorem 2.3.10. Let R22 and W2 denote the lower right submatrices of dimension (n − i +
1) × (n − i + 1) of R22 and W2 , respectively. Let δi denote the smallest singular value of the
leading principal i × i submatrices of R. Then, for i = k + 1 : n,
σi (i) √ (i)
√ (i)
≤ δi ≤ σi ≤ ∥R22 ∥2 ≤ σi n − i + 1∥(W2 )−1 ∥2 .
n−i+ 1∥(W2 )−1 ∥2
(i)
Hence, ∥R22 ∥2 are easily computable upper bounds for σi . Further, the outermost bounds
(i) (i)
in the theorem show that if ∥(W2 )−1 ∥2 is not large, then δi and ∥R22 ∥2 are guaranteed to be
tight bounds, and hence the factorization will have revealed the rank. The matrix W determined
2.3. Rank-Deficient Least Squares Problems 83
Therefore, R(ΠW ) in the RRQR algorithm is a good approximation to the numerical nullspace
Nk (A). A more accurate and orthogonal basis for Nk (A) can be determined by simultaneous
inverse iteration with RTR starting with W . If R has zero or nearly zero diagonal elements, a
small multiple of the machine unit is substituted. The use of RRQR factorizations for computing
truncated SVD solutions is discussed by Chan and Hansen [227, 1990].
If the matrix A has low rank rather than low rank-deficiency, it is more efficient to build
up the rank-revealing QR factorization from estimates of singular vectors corresponding to the
large singular values. Such algorithms are described by Chan and Hansen [229, 1994]. Chan-
drasekaran and Ipsen [231, 1994] show that many previously suggested pivoting strategies form
a hierarchy of greedy algorithms. They give an algorithm called Hybrid-III that is guaranteed to
find an RRQR factorization that satisfies (2.3.33). Their algorithm works by alternately applying
standard and Stewart’s reverse column pivoting to the leading and trailing diagonal blocks of the
initial QR factorization
R11 R12
A = QR = Q , R11 ∈ Rk×k . (2.3.39)
0 R22
It keeps interchanging the “most dependent” of the first k columns with one of the last n − k
columns, and interchanging the “most independent” of the last n − k columns with one of the
first k columns, as long as det(R11 ) strictly increases. This stops after a finite number of steps.
In the worst case the work is exponential in n, but in practice usually only a few refactorizations
are needed.
Pan and Tang [875, 1999] give an RRQR algorithm that uses a similar type of cyclic pivoting.
Given A ∈ Rm×n , m ≥ n, let Πi,j be the permutation matrix such that AΠi,j interchanges
columns i and j of A. They define the pivoted magnitude η(A) of A to be the maximum
magnitude of r11 in the QR factorizations of AΠ1,j , 1 ≤ j ≤ n, i.e.,
Similarly, they define the reverse pivoted magnitude τ (A) to be the minimum magnitude of
|rnn | in the QR factorizations of AΠj,n , 1 ≤ j ≤ n. If A is nonsingular, then as shown by
Stewart [1023, 1984],
τ (A) = 1/ max ∥eTj A−1 ∥2 .
1≤j≤n
It can be shown that whenever an exchange takes place in step 2, the determinant of R11
will strictly increase in magnitude. Therefore an exchange can happen only a finite number of
times. After the last exchange, at most k − 1 iterations can take place. Hence the algorithm must
terminate.
For the RRQR factorization a basis for the numerical nullspace is given by R(W), where
−1
−R11 R12
W = .
In−r
−1
If the norm of R11 R12 is large, this cannot be stably evaluated. Gu and Eisenstat [549, 1996]
call an RRQR factorization strong if, apart from (2.3.33) being satisfied, it holds that
−1
(R11 R12 )ij ≤ c2 (k, n) (2.3.42)
where L ∈ Rm×n is unit lower trapezoidal and U ∈ Rn×n is upper triangular and nonsingular.
Π1 and Π2 are permutation matrices reflecting the row and column interchanges. With this
factorization the least squares problem becomes
If complete pivoting is used, then all elements in L are bounded by one in modulus. This ensures
that L is well-conditioned, and any ill-conditioning in A will be reflected in U . Hence, squaring
2.4. Methods Based on LU Factorization 85
the condition number is avoided, and without substantial loss of accuracy, the first subproblem
in (2.4.2) can be solved using the normal equations
LT Ly = LT Π1 b. (2.4.3)
Example 2.4.1. The following ill-conditioned matrix and its normal matrix is considered by
Noble [832, 1976]:
1 1
1 1
A = 1 1 + ϵ, ATA = 3 .
1 1 + 2ϵ2 /3
1 1−ϵ
√
If ϵ ≤ u, then in floating-point computation, f l(1 + 2ϵ2 /3) = 1, and the computed matrix
ATA is numerically singular. However, in the LU factorization
1 0
1 1
A = 1 1 ≡ LU,
0 ϵ
1 −1
1 −ϵ−1
1/3 0 1 1 1
A† = U −1 (LT L)−1 LT =
0 ϵ−1 0 1/2 0 1 −1
−1 −1
1 2 2 − 3ϵ 2 + 3ϵ
= .
6 0 3ϵ−1 −3ϵ−1
where d is obtained from the lower triangular system U T d = ΠT2 c. Problem (2.4.5) is well-
conditioned and can be solved using the normal equations of the second kind:
LU factorization with complete pivoting is slow, and when A has full column rank, it is
usually sufficient to use partial pivoting. An attractive alternative is to use a method that, in
86 Chapter 2. Basic Numerical Methods
terms of efficiency and accuracy, lies between partial and complete pivoting; see Foster [425,
1997]. Rook pivoting was introduced by Neal and Poole [824, 1992]. The name is chosen
because the pivot search resembles the movement of the rook piece in chess. A pivot element
is chosen that is the largest in both its row and column. One starts by finding the element of
maximum magnitude in the first column. If this also is of maximum magnitude in its row, it is
accepted as pivot. Otherwise, the element of maximum magnitude in this row is determined and
compared with other elements in its column, etc.
The Peters–Wilkinson method works also for problems where A is rank-deficient. Then, after
deleting zero rows in U and the corresponding columns in L, the LU factorization yields a unit
lower trapezoidal factor L ∈ Rm×r and an upper triangular factor U ∈ Rr×n of full rank. For
example, if m = 6, n = 4, and r = 3, then (in exact computation) the factors obtained after r
elimination steps have the form
1
l21 1
u11 u12 u13 u14
l31 l32 1
L= , U = u22 u23 u24 ;
l41 l42 l43
u33 u34
l51 l52 l53
l61 l62 l63
see Section 2.4.3. A problem that remains is how to determine the rank r reliably.
where C = L2 L−1
1 ∈R
p×n
. The normal equations are
can be proved by multiplying with (In + C T C) from the left and using (C T C)C = C T (CC T ).
It follows that the solution to (2.4.7) can be written u = C T v, where
This only requires the inversion of a matrix of size p × p. The resulting algorithm is summarized
below.
1. Compute the factorization A e = Π1 AΠ2 = L1 U .
L2
Neglecting higher order terms in p, the initial LU factorization requires mn2 − n3 /3 flops
and computing C requires pn2 flops. Forming and solving the normal equations for v takes about
np2 flops. Finally, solving the triangular systems L1 y = C T v + b1 and U x = y takes 2n2 flops.
For p ≪ n the total arithmetic cost mn2 − n3 /3 + pn2 is lower than that required by the normal
equations.
Equation (2.4.9) is the normal equation for the problem
Ip b2
min v− (2.4.10)
y
e2 CT −b1 2
This shows that y2 is the solution to the damped least squares problem of full rank,
T
C d
min y2 − . (2.4.11)
y2 Ip 0 2
The normal equations are (CC T + Ip )y2 = Cf . The algorithm can be summarized as follows:
L1
1. Compute the factorization A = Π1 AΠ2 =
e U.
L2
4. Compute y1 = −C T y2 .
Cardenal, Duff, and Jiménez [208, 1998] derive similar formulas for quasi-square least
squares problems by using the augmented system formulation. They implement and test the new
methods using both dense LU routines from LAPACK and the sparse LU solver MA48 from the
Harwell Software Library (HSL). Tests carried out confirm that the new method is much faster
than either the normal equations or the original Peters–Wilkinson method. Furthermore, their
88 Chapter 2. Basic Numerical Methods
sparse implementation outperformed the dense solvers on problems as small as a few hundred
equations.
An alternative suggested by Tewarson [1058, 1968] is to solve the least squares subproblem
miny ∥Ly − Π1 b∥2 in (2.4.2) by an orthogonal reduction of the trapezoidal matrix L to lower
triangular form:
L1 L̄ c1
Q = , QΠ1 b = .
L2 0 c2
Such an algorithm is given by Cline [253, 1973]. The lower trapezoidal structure in L̄ is pre-
served by taking Q as a product of Householder reflectors,
Q = P1 · · · Pn−1 Pn ,
where Pk is chosen to zero the elements ℓn+1,k , . . . , lm,k . This reduction requires 2n2 (m −
n) flops. The least squares solution is then obtained in 2n3 /3 flops from the lower triangular
system L̄y = c1 . Cline’s algorithm is very efficient for least squares problems that are slightly
overdetermined, i.e., p = m − n ≪ n. The total number of flops required for computing the
least squares solution is about 23 n3 + 3pn2 . This is fewer than needed for the method of normal
equations if p ≤ n/3. For p ≤ 2n/3 it is also more efficient than using the Householder QR
factorization of A.
Plemmons [897, 1974] solves the least squares subproblem (2.4.2) by MGS. The trapezoidal
structure in L is preserved by applying MGS in reverse order from the last to the first column
of L. This method requires a total of 43 n3 + 3pn2 flops, which is slightly more than needed for
Cline’s algorithm. Similar orthogonalization methods for the underdetermined case (m < n) are
given by Cline and Plemmons [258, 1976].
where L11 ∈ Rk×k is unit lower unit triangular, U11 ∈ Rk×k is upper triangular, and U22 = 0.
Hence the rank of the matrix is revealed. However, when A is nearly rank-deficient, then even
with complete pivoting, ∥U22 ∥ may not be small, as shown by the following example by Peters
and Wilkinson [892, 1970]. For the upper triangular Wilkinson matrix
1 −1 −1 · · · −1
1 −1 · · · −1
.. .. n×n
W = . ··· . ∈R , (2.4.13)
1 −1
1
the smallest singular value of W is of size 2−(n−2) , although no diagonal element is small.
2.4. Methods Based on LU Factorization 89
For matrices having only one small singular value, Chan [223, 1984] shows that there always
exists a permutation such that the near-rank-deficiency is revealed by LU factorization. Further-
more, these permutations are related to the size of elements in A−1 . Let A ∈ Rm×n have the
SVD
Xn
A = U ΣV T = σi ui viT ,
i=1
and assume that σn−1 ≫ σn > 0. Then the last term σn−1 vn uTn in the pseudoinverse A† =
P n −1 T
i=1 σi vi ui will dominate. Let i, j be indices corresponding to the maximum value of
|(un )i (vn )j |. Then permuting aij to the (n, n)th position will produce a rank-revealing LU
(RRLU) factorization with unn = 2−(n−2) ≈ σn (A). For the matrix W in (2.4.13), the largest
element of the inverse
1 1 2 · · · 2n−2
n−3
1 1 ··· 2
−1
. . .. n×n
W = . ··· . ∈R
1 1
1
Theorem 2.4.2. Let A ∈ Rn×n have singular values σ1 ≥ · · · ≥ σn ≥ 0. Then for any
given k, 1 ≤ k < n, there exist permutations Π1 and Π2 such that in the factorization (2.4.12),
L11 ∈ Rk×k is unit lower triangular, U11 ∈ Rk×k is upper triangular, and
The bounds established in (2.4.14) and (2.4.15) are strikingly similar to the bounds estab-
lished in Theorem 2.3.6 for rank-revealing QR factorizations. If ∥U22 ∥2 is sufficiently small,
then a rank-k approximation of A is
L11
A= ΠT1 ( U11 U21 ) ΠT2 .
L21
Note that to determine if B has a local maximum volume, it is only necessary to compare
its volume with the volumes of k(n − k) neighboring submatrices that differ from B in exactly
one column (row). In floating-point computations, (2.4.17) should be replaced by vol (B) ≥
vol (B ′ )/µ, where µ > 1 is a fudge factor.
A1
Lemma 2.4.4. Let A = ∈ Rn×k , where A1 ∈ Rk×k , n > k. Then
A2
∥A2 A−1
p
1 ∥2 ≤ k(n − k) + 1 (2.4.18)
provided vol (A1 ) is a local maximum in A.
I A1
Proof. Let M = (mij ) = A2 A−1
so that
1 A1 = . Since the submatrix A1 has
M A2
I
maximum local volume in A, it follows that I has maximum local volume in . For any
M
′
I
mij ̸= 0, interchange row i of M with row j in I. Denote the new matrix by . Then
M′
vol (I ′ ) ≤ vol (I), which implies that |mij | ≤ 1. Finally, ∥M ∥2 ≤ ∥M ∥F ≤
p
k(n − k).
To find a rank-k, 1 ≤ k < n, RRLU factorization of A ∈ Rn×n one first selects a subset of
k columns with local maximum volume in A and then a subset of k rows with local maximum
volume in the selected k columns. Pan [873, 2000] gives two algorithms. Algorithm 1 selects an
m × m submatrix of a matrix A ∈ Rm×n , with rank(A) = m < n. It starts by computing the
LU factorization with partial pivoting
Π1 AΠ2 = LU = L ( U1 U2 ) ,
where Π1 = I and |ukk | ≥ |ukj |, k < j ≤ n. This is followed by a block pivoting phase in
which column j of U1 , j = m−1, . . . , 1, is permuted to the last position in U1 , and the permuted
U1 is retriangularized by Gaussian elimination. If the updated matrix fails to satisfy the condition
Hwang, Lin, and Yang [652, 1992] were the first to consider RRLU factorizations with numerical
rank-deficiency p > 1. However, their bounds may increase faster than exponentially. Improved
bounds are given in Hwang, Lin, and Pierce [651, 1997]. Miranian and Gu [796, 2003] study
strong rank-revealing LU factorizations for which the elements of Wl = L21 L−1 11 and Wr =
−1
U11 U12 are bounded by some slow growing polynomial in k, m, and n.
Eliminating y using Gaussian elimination without pivoting gives the reduced upper block trian-
gular system
I A y b
= .
0 −ATA x −AT b
Hence this choice of pivots just leads to the normal equations. To get a more stable method, it is
necessary to choose pivots outside the block I.
Introducing the scaled residual vector s = α−1 r gives the augmented system
−1
αI A α r b
= ⇐⇒ Mα zα = dα , (2.4.21)
AT 0 x 0
where we assume that 0 ≤ α ≤ ∥A∥2 = σ1 (A). The scaling parameter α will affect the
conditioning of Mα as well as the choice of pivots and thereby the accuracy of the computed
solution. For sufficiently small values of α, pivots will not be chosen from the (1, 1) block.
However, mixing the unknowns x and r does not make sense physically, because they have
different physical units and may be on vastly different scales numerically.
For a general symmetric indefinite matrix M , a stable symmetric indefinite factorization
ΠM ΠT = LDLT with D diagonal always exists if 2 × 2 symmetric blocks in D are allowed. A
pivoting scheme due to Bunch and Kaufman [187, 1977] guarantees control of element growth
without requiring too much searching. The symmetry constraint allows row and column permu-
tations to bring any diagonal element d1 = brr or any 2 × 2 submatrix of the form
brr brs
(brs = bsr )
bsr bss
to the pivot position. Taking a 2×2 submatrix as a pivot is equivalent to a double step of Gaussian
elimination and pivoting first on brs and then on bsr . Such a double step preserves symmetry,
and only elements on and below the main diagonal of the reduced matrix need to be computed.
Ultimately, a factorization A = LDLT is obtained where D is block diagonal with a mixture of
1 × 1 and 2 × 2 blocks. L is unit lower triangular with ℓk+1,k = 0 when B (k) is reduced by a
2 × 2 pivot. The Bunch–Kaufman strategy is to search until two columns r and s are found for
which the common element brs bounds in modulus the other off-diagonal elements in the r and
s columns. Then either a 2 × 2 pivot on these two columns or a 1 × 1 pivot with the largest in
modulus of the two diagonal elements is taken, according to the test
√
max(|brr |, |bss |) ≥ ρ|brs |, ρ = ( 17 + 1)/8 ≈ 0.6404.
The number ρ has been chosen to minimize the growth per stage of elements of B, allowing for
the fact that two stages are taken by a 2 × 2 pivot. With this choice, element growth is bounded
by gn ≤ (1 + 1/ρ)n−1 < (2.57)n−1 . This bound can be compared to the bound 2n−1 that holds
for Gaussian elimination with partial pivoting.
The above bound for element growth can be achieved with fewer comparisons using a strat-
egy due to Bunch and Kaufman. Let λ = |br1 | = max2≤i≤n |bi1 | be the off-diagonal element of
largest magnitude in the first column. If |b11 | ≥ ρλ, take b11 as a pivot. Otherwise, determine
the largest off-diagonal element in column r:
σ = max |bir |, i ̸= r.
1≤i≤n
If |b11 | ≥ ρλ2 /σ, again take b11 as a pivot; else if |brr | ≥ ρσ, take brr as a pivot. Otherwise, take
the 2×2 pivot corresponding to the off-diagonal element b1r . Note that at most two columns need
2.4. Methods Based on LU Factorization 93
to be searched in each step, and at most n2 comparisons are needed in all. When the factorization
M = LDLT has been obtained, the solution of M z = d is obtained in the three steps
Lv = d, Dw = v, LT z = w.
It has been shown by Higham [621, 1997] that for stability it is necessary to solve the 2 × 2
systems arising in Dw = v using partial pivoting or the explicit 2 × 2 inverse. The proof of this
is nontrivial and makes use of the special relations satisfied by the elements of the 2 × 2 pivots
in the Bunch–Kaufman pivoting scheme.
Bunch–Kaufman pivoting does not in general give a stable method for the least squares prob-
lem, because perturbations introduced by roundoff do not respect the structure of the augmented
system. For the scaled system (2.4.21) with a sufficiently small value of α the Bunch–Kaufman
scheme will introduce 2 × 2 pivots of the form
α a1r
,
a1r 0
which may improve the stability. This raises the question of the optimal choice of α for stability.
The eigenvalues λ of Mα can be expressed in terms of the singular values σi , i = 1, . . . , n
of A; see Björck [124, 1967]. If Mα z = λz, z = (s, x)T ̸= 0, then
αs + Ax = λs, AT s = λx,
or, eliminating s, αλx + ATAx = λ2 x. Hence if x ̸= 0, then x is an eigenvector and (λ2 − αλ)
an eigenvalue of ATA. On the other hand, x = 0 implies that AT s = 0, αs = λs, s ̸= 0. It
follows that the m + n eigenvalues of Mα are
p
2 2
λ = α/2 ± α /4 + σi , i = 1, . . . , n,
α otherwise.
If rank(A) = r ≤ n, then the eigenvalue α has multiplicity (m − r), and 0 is an eigenvalue
√ multiplicity (n − r). From this it−1/2
of is easily deduced that if σn > 0, then minα κ2 (Mα ) ≈
2κ2 (A) is attained for α = α̃ = 2 σn (A). Therefore, α̃ (or σn ) can be used as a nearly
optimal scaling factor in the augmented system method. Minimizing κ2 (Mα ) will minimize the
forward bound for the error in zα ,
−1
ϵκ(Mα ) α r
∥z̄α − zα ∥2 ≤ ∥zα ∥2 , zα = .
1 − ϵκ(Mα ) x
However, α also influences the norm in which the error is measured.
Pivoting and stability in the augmented system method is studied by Björck [133, 1992]. A
more refined error analysis is given here that separately minimizes bounds for the errors in x̄ and
ȳ. It is shown that the errors in the computed solution satisfy the upper bounds
∥r̄ − r∥2 σ1 (A)
≤ cguf (α) , (2.4.22)
∥x̄ − x∥2 κ2 (A)
where c is a low-degree polynomial, g the growth factor, and
α 1
f (α) = 1 + ∥r∥2 + ∥x∥2 .
σn α
1/2
If x ̸= 0, then f (α) is minimized for α = αopt = σn ∥r∥2 ∥x∥2 . The corresponding
minimum value of f (α) is
2 2
αopt σn
fmin = 1 + ∥x∥2 = 1 + σn−1 ∥r∥2 . (2.4.23)
σn αopt
94 Chapter 2. Basic Numerical Methods
Taking α = σn we find
1
f (σn ) = 2 ∥r∥2 + ∥x∥2 ≤ 2fmin ,
σn
i.e., using α = σn will at most double the error bound.
We recall that an acceptable-error stable algorithm is defined as one that gives a solution
for which the size of the error is never significantly greater than the error bound obtained from
a tight perturbation analysis. It can be shown that the augmented system method is acceptable-
error stable with both α = σn and α = αopt .
Empirical evidence suggests that, provided column pivoting has been used in the QR factoriza-
tion, it is very rare for the bound in (2.5.1) to differ much from κ(A). In extensive tests on
randomly generated test matrices, the bound usually underestimated the true condition number
by a factor of only 2–3 and never by more than 10. However, as shown by Example 2.3.3, the
bound (2.5.1) can still be a considerable underestimate of κ(A).
Improved estimates of κ(R) can be computed in only O(n2 ) flops by using inverse iteration.
Let the singular values of A be σi , i = 1, . . . , n, where σn < σi , i ̸= n. Then the dominating
eigenvalue σ1−2 of the matrix
C = (ATA)−1 = (RTR)−1
can be computed by applying the power method to (RTR)−1 = R−1 R−T . In each step, two
triangular linear systems
are solved, which requires 2n2 flops. After normalization, z (k) will converge to the right singular
vector vn corresponding to an eigenvalue, and
Example 2.5.1. Failure to detect near-rank-deficiency of A even in the (unusual) case when this
is not revealed by a small diagonal element in R can lead to a meaningless solution of very large
norm. Inverse iteration will often prevent that. For example, the n×n upper triangular Wilkinson
matrix
1 −1 · · · −1 −1
1 · · · −1 −1
W =
.. .. .. (2.5.4)
. . .
1 −1
1
2.5. Estimating Condition Numbers and Errors 95
has numerical rank n − 1 when n is large. If n = 50 and W is perturbed by changing the z50,1
entry to −2−48 , the new matrix Ŵ will be exactly singular. The smallest singular value of W is
bounded by
σ50 ≤ ∥W − Ŵ ∥F = 2−48 ≈ 7.105·10−15 .
The next smallest singular value is σ49 ≈ 1.5, so there is a well-defined gap between σ49 and σ50 .
But in the QR factorization R = W and gives no indication of the numerical rank-deficiency. (If
column interchanges are employed, the diagonal elements in R indicate rank 49.) Doing a single
inverse iteration on W T W using the MATLAB script
The condition estimator given by A. K. Cline et al. [256, 1979], often referred to as the
LINPACK condition estimator, proceeds as follows:
This is equivalent to one step of the power method with (ATA)−1 . Let R = U ΣV T be the
SVD of R. Expanding d in terms of the right singular vectors V gives
n
X n
X n
X
d= αi vi , y= (αi /σi )ui , z= (αi /σi2 )vi .
i=1 i=1 i=1
Hence provided αn , the component of d along vn , is not very small, the vector z is likely to be
dominated by its component of vn , and
will usually be a good estimate of σn−1 . In the LINPACK algorithm the 1-norm is used for
normalization. The vector d is chosen as d = (±1, ±1, . . . , ±1)T , where the sign of dj is
determined adaptively; see A. K. Cline et al. [256, 1979].
In practice the LINPACK algorithm performs very reliably and produces good order of mag-
nitude estimates; see Higham [612, 1987]. However, examples of parametrized matrices can be
constructed for which the LINPACK estimate can underestimate the true condition number by
an arbitrarily large factor. In a modification to the LINPACK condition estimator, O’Leary [837,
1980] suggests that ∥R−1 ∥1 be estimated by
max ∥y∥∞ /∥d∥∞ , ∥z∥1 /∥y∥1 ,
This makes use of information from the first step, which can improve the estimate. Another
generalization, due to A. K. Cline, Conn, and Van Loan [255, 1982], of the LINPACK algorithm
incorporates a “look-behind” technique. This allows for the possibility of modifying previously
chosen dj ’s. It gives an algorithm for the 2-norm that requires about 10n2 flops.
Boyd [173, 1974] devised a method for computing a lower bound for an arbitrary Hölder
norm ∥B∥p , assuming only that Bx and B T x can be computed for arbitrary vectors x. In the
96 Chapter 2. Basic Numerical Methods
following, p ≥ 1 and q ≥ 1 are such that 1/p + 1/q = 1. Then ∥ · ∥q is the dual norm to ∥ · ∥p ,
and the Hölder inequality
|xT y| ≤ ∥x∥p ∥y∥q
holds. In the algorithm, dualp (x) is any vector y of unit ℓq -norm such that equality holds for
x and y in the Hölder inequality. A derivation of Boyd’s algorithm is given by Higham [623,
2002, Sect. 15.2]. When p = q = 2, Boyd’s algorithm reduces to the usual power method
applied to B TB.
For the ℓ1 -norm the algorithm was derived independently by Hager [560, 1984]. In this
case the dual norm is the ℓ∞ -norm. Since ∥B∥∞ = ∥B T ∥1 , this algorithm can be used also
to estimate the infinity norm. Hager’s algorithm is based on convex optimization and uses the
observation that ∥B∥1 is the maximal value of the convex function
n
X
f (x) = ∥Bx∥1 = |yi |, y = Bx,
i=1
over the convex set S = {x ∈ Rn | ∥x∥1 ≤ 1}. From convexity results it follows that the
maximum is attained at one of the vertices ej , j = 1, . . . , n, of S. From this observation Hager
derives an algorithm for finding a local maximum that with high probability is also the global
maximum.
x = n−1 e;
repeat
y = Bx; ξ = sign(y);
T
z = B ξ;
if ∥z∥∞ ≤ z T x
γ = ∥y∥1 ; break
end
set x = ej ; where |zj | = ∥z∥∞ ;
end
The algorithm tries to maximize f (x) = ∥Bx∥1 subject to ∥x∥1 = 1. The vector z computed
at each step can be shown to be a subgradient of f at x. From convexity properties,
Hence if |zj | > z T x for some j, then f can be increased by moving from x to the vertex ej of S.
If, however, ∥z∥∞ ≤ z T x, and if yj ̸= 0 for all j, then x can be shown to be a local maximum
point for f over S.
Higham [617, 1990] reports on experience in using Hager’s algorithm. The estimates pro-
duced are generally sharper than those produced by the LINPACK estimator. Its results are
frequently exact, usually good (γ ≥ 0.1∥B∥1 ), but sometimes poor. The algorithm almost al-
ways converges after at most four iterations, and Higham recommends that between two and five
2.5. Estimating Condition Numbers and Errors 97
iterations be used. The average cost for estimating ∥R∥1 of a triangular matrix R is in practice
around 6n2 flops.
An important feature of Hager’s norm estimator is that to estimate ∥B −1 ∥1 we only need to
be able to solve linear systems By = x and B T z = ξ. This feature makes it useful for estimating
the componentwise error bounds given in Section 1.3.4. For the least squares problem the bound
(1.3.56) can be written in the form ∥δx∥∞ ≤ ω cond (A, b)∥x∥∞ , where
cond (A, b) ≤ ∥|A† |g1 ∥∞ + ∥|ATA)−1 |g2 ∥∞ /∥x∥∞ (2.5.5)
and
g1 = |A||x| + |b|, g2 = |A|T |r|. (2.5.6)
Hager’s algorithm gives an inexpensive and reliable estimate of cond (A, b). The key idea is to
note that all terms in (2.5.5) are of the form ∥|B|g∥∞ , where g > 0. Following Arioli, Demmel,
and Duff [33, 1989], we take G = diag (g). Then using g = Ge and the properties of the
ℓ∞ -norm, we have
Hence Hager’s algorithm can be applied to estimate ∥|B|g∥∞ provided matrix-vector products
BGx and GT B T y can be computed efficiently. To estimate cond (A, b) we need to be able to
compute matrix-vector products of the forms A† x, (A† )T y, and (ATA)−1 x. This can be done
efficiently if a QR factorization of A is known.
for some E, then ∥E∥2 is called the backward error of x̄. If ∥E∥2 is small compared to the
uncertainty in the data A, then the solution x̄ can be said to be as good as the data warrants. The
forward error ∥x − x̄∥ can be estimated using the perturbation bounds given in Section 1.3.3.
In general, x̄ solves (2.5.7) for an infinite number of perturbations E. The optimal backward
error for a given x̄ is defined as
To find a good estimate of µ(x̄) is important, e.g., for deciding when to stop an iterative solution
method. For a consistent linear system Ax = b. Rigal and Gaches [928, 1967] showed that the
optimal backward error E is given by the rank-one perturbation
Finding the optimal backward error for a general least squares problem is more difficult.
Stewart [1018, 1977, Theorem 3.1] gives two simple upper bounds for µ(x̄).
Theorem 2.5.2. Let x̄ be an approximate solution to the least squares problem minx ∥Ax − b∥2 .
Assume that the corresponding residual r̄ = b − Ax̄ ̸= 0. Then x̄ exactly solves minx ∥b − (A +
Ei )x∥2 , where
(r̄ − r)x̄T r̄r̄TA
E1 = , E2 = − = −r̄r̄† , (2.5.10)
∥x̄∥22 ∥r̄∥22
and r = b − Ax is the residual corresponding to the exact solution x. The norms of these
perturbations are
p
∥r̄∥22 − ∥r∥22 ∥r̄∥2 ∥AT r̄∥2
∥E1 ∥2 = ≤ , ∥E2 ∥2 = . (2.5.11)
∥x̄∥2 ∥x̄∥2 ∥r̄∥2
Proof. The result for E2 is proved by showing that x̄ satisfies the normal equations (A+E2 )T (b−
(A + E2 )x̄ = 0. Note that r̄† = r̄T /∥r̄∥2 is the pseudoinverse of r̄ and that r̄r̄† is an orthogonal
projector. From A + E2 = (I − r̄r̄† )A and Ax̄ = b − r̄, we obtain
Hence the normal equations become AT (I − r̄r̄† )r̄r̄† b = 0. The proof for E1 may be found in
Stewart [1017, 1977, Theorem 5.3].
In Theorem 2.5.2 ∥E1 ∥2 is small when r̄ is almost equal to the residual r of the exact solution.
∥E2 ∥2 is small when r̄ is almost orthogonal to the column space of A. However, these are just
upper bounds, and µ(x̄) can be much smaller than either ∥E1 ∥2 or ∥E2 ∥2 .
An exact expression for µ(x̄) was found by Waldén, Karlsson, and Sun [1096, 1995] by char-
acterizing the set E of all possible perturbations E. Their result is summarized in the following
theorem; see also Higham [623, 2002, pp. 404–405].
Theorem 2.5.3. Let r̄ = b − Ax̄ ̸= 0 and η = ∥r̄∥2 /∥x̄∥2 . Then the optimal backward error in
the Frobenius norm is
µ(x̄) ≡ min ∥E∥F = min η, σmin ( A, B ) , (2.5.12)
E∈E
Computing the smallest singular value of the matrix (A, B) ∈ Rm×(m+n is too expensive for
most practical purposes. Karlsson and Waldén [685, 1997] proposed an estimate of µ̃ that can be
computed more cheaply. This makes use of a regularized projection of the residual r̄ = b − Ax̄.
The Karlsson–Waldén (KW) estimate can be expressed as µ e = ∥Ky∥2 /∥x̄∥2 , where y solves the
least squares problem
A r̄
min ∥Ky − v∥2 , K= , v= . (2.5.13)
y ηI 0
Numerical experiments by Grcar, Saunders, and Su [532, 2007] indicate that the KW esti-
mate is very near the true optimal backward error and can be used safely in practice. This was
confirmed by Gratton, Jiránek, and Titley-Peloquin [528, 2012], who proved the lower and upper
bounds
µ p √
1 ≤ ≤ 2 − (∥r∥2 /∥r̄∥2 ) ≤ 2. (2.5.14)
µ
e
The ratio tends to 1 as ∥r̄∥2 → ∥r∥2 .
The following MATLAB script can be used to compute the KW estimate (2.5.13) by sparse
QR factorization without storing Q.
Methods for computing the KW estimate are given by Malyshev and Sadkane [770, 2001].
Optimal backward error bounds for problems with multiple right-hand sides are given by Sun
[1050, 1996], while bounds for underdetermined systems are derived in Sun and Sun [1051,
1997]. The extension of backward error bounds to constrained least squares problems is dis-
cussed by Cox and Higham [271, 1999].
The optimal componentwise backward error of x̄ is the smallest ω ≥ 0 such that x̄ exactly
minimizes ∥(A + E)x̄ − (b + f )∥2 , and
where the inequalities are to be interpreted componentwise. For a consistent linear system b ∈
R(A), Oettli and Prager [835, 1964] proved the explicit expression
|Ax̄ − b|i
ω = max . (2.5.16)
1≤i≤n (|A||x̄| + |b|)i
Here 0/0 should be interpreted as 0, and ζ/0 (ζ ̸= 0) as infinity. (The latter case means that no
finite ω satisfying (2.5.16) exists.) Together with the perturbation result (1.3.49), (2.5.16) can be
used to compute an a posteriori bound on the error in a given approximate solution x̄.
In Section 1.3.4 we obtained perturbation bounds for the least squares problem subject to
componentwise perturbations. However, no expression for the optimal componentwise back-
ward error is known. Following Björck [132, 1991], we apply the Oettli–Prager bound to the
augmented system (1.1.19), where no perturbations in the diagonal blocks of M or in the zero
vector in the right-hand side are allowed. However, we allow for different perturbations of the
blocks A and AT , as this does not increase the forward error bounds (1.3.55) and (1.3.56).
Hence for an a posteriori error analysis, it makes sense to define the pseudocomponentwise
backward error of a computed solution x̄, r̄ to be the smallest nonnegative number ω such that
and
I A + δA1 r̄ b + δb
= . (2.5.17)
AT + δA2 0 x̄ 0
100 Chapter 2. Basic Numerical Methods
Note that this allows the two blocks of A in the augmented system to be perturbed differently
and hence does not directly correspond to perturbing the data of the least squares problem. From
the result of Oettli and Prager, this backward error for a computed solution r̄ and x̄ becomes
ω(r̄, x̄) = max(ω1 , ω2 ), where
If we only have a computed x̄, it may be feasible to define r̄ = b − Ax̄ and apply the result
above. With this choice we have ω1 = 0 (exactly), and hence
|AT (b − Ax̄)|i
ω(r̄, x̄) = ω2 = max .
1≤i≤n (|AT ||b − Ax̄|)i
If the columns of A and b are scaled, the class of perturbations scales in the same way. Hence ω
is invariant under row and column scaling of A. A bound for the forward error ∥x̄ − x∥2 can be
obtained in terms of ω, which potentially is much smaller than the standard forward error bound
involving κ2 (A).
In the case of a nearly consistent least squares problem, f l(b − Ax̄) will mainly consist of
roundoff and will not be accurately orthogonal to the range of A. Hence although x̄ may have
a small relative backward error, ω2 may not be small. This illustrates a fundamental problem in
computing the backward error: for x̄ to have a small backward error it is sufficient that either
(b − Ax̄) or AT (b − Ax̄) is small, but neither of these conditions is necessary.
for s = 0, 1, 2, . . . ,
compute rs = b − Axs ; in precision u2
round rs to precision u1
solve Aδxs = rs ; in precision u1
xs+1 = xs + δxs ; in precision u1
end;
The process is stopped when δxs /∥xs ∥ no longer shows a steady decrease.
The factorization of A used for computing the initial approximation can also be used for solv-
ing the correction equations. Therefore the cost of each refinement step is quite small. Note that
while the computed solution initially improves with each iteration, this is usually not reflected in
a corresponding decrease in the norm of the residual, which typically stays about the same.
2.5. Estimating Condition Numbers and Errors 101
On many early computers, inner products could be cheaply accumulated at twice the working
precision, and IR was used with u2 = u21 . This traditional version of IR was analyzed for fixed-
point arithmetic by Wilkinson [1118, 1963] and for floating-point arithmetic by Moler [799,
1967]. A more recent error analysis is found in Higham [623, 2002, Chapter 12]. As long as A is
not too ill-conditioned so that the initial solution has a relative error ∥x − x0 ∥/∥x0 ∥ = η, η < 1,
IR behaves roughly as follows. The relative error is decreased by a factor of about η with each
step of refinement until a stage is reached at which the solution is correct to working precision u.
Since most problems involve inexact input data, obtaining a highly accurate solution may
not seem to be justified. Even so, IR offers a useful estimate of the accuracy and reliability of a
computed solution. The correction δ1 also gives a good estimate of the sensitivity of the solution
to small relative perturbations of order u in the data A and b. Furthermore, there are applications
where an accurate solution of very ill-conditioned equations is warranted; see Ma et al. [765,
2017].
Mixed-precision IR was first applied to least squares problems by Businger and Golub [193,
1965]. In their algorithm the QR factorization of A is used to compute x0 and solve for the
corrections δxs . The iterations proceeds as follows.
for s = 0, 1, 2, . . .
compute rs = b − Axs ; in precision u2 .
solve min ∥Aδxs − rs ∥2 ; in precision u1 .
δxs
xs+1 = xs + δxs ; in precision u1 .
end
This works well for small-residual problems, but otherwise it may fail to give solutions correct
to working precision.
To remedy this it is necessary to simultaneously refine both the solution x and the residual
r by applying IR to the augmented system for the least squares problem. Let xs and rs be the
current approximations. In Björck [124, 1967] the new approximations are taken to be
where the corrections are computed in precision u1 from the augmented system
I A δrs fs
= . (2.5.19)
AT 0 δxs gs
are computed in precision u2 and rounded to precision u1 . The system (2.5.19) can be solved
stably using Algorithm 2.2.6:
−T ds
zs = R gs , = QT fs , (2.5.20)
es
zs
δrs = Q , δxs = R−1 (ds − zs ). (2.5.21)
es
Apart from the initial QR factorization, each refinement step requires about 8mn − 2n^2 flops in working precision. Computing the residuals takes 4mn flops in extended precision. The initial rate of
convergence can be shown to be linear with rate
ρ = c_1 u κ′,   κ′ = min_{D>0} κ_2(AD),   (2.5.22)
where c1 is of modest size. Note that this rate is achieved without actually carrying out the
scaling of A by the optimal D. This rate is similar to that for the linear system case, even
though the conditioning of the least squares problem includes a term proportional to κ2 (A) for
large-residual problems.
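The refinement sweep (2.5.19)–(2.5.21) can be sketched in MATLAB as follows. This is a minimal illustration, not the algorithm as implemented in the book: single precision plays the role of the working precision u1 and double precision that of u2, Q is formed explicitly rather than kept in factored form, and a fixed number of sweeps replaces a proper convergence test.

[m, n] = size(A);
[Q, Rfull] = qr(single(A));                % Householder QR in working precision
R = Rfull(1:n, 1:n);
x = R \ (Q(:, 1:n)' * single(b));          % initial least squares solution
r = single(b) - single(A)*x;               % initial residual
for s = 1:3
    f = single(double(b) - double(r) - double(A)*double(x));  % residuals in u2,
    g = single(-double(A)'*double(r));                        % rounded to u1
    z = R' \ g;                            % (2.5.20)
    w = Q' * f;   d = w(1:n);   e = w(n+1:m);
    dr = Q * [z; e];                       % (2.5.21)
    dx = R \ (d - z);
    x = x + dx;   r = r + dr;
end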
Example 2.5.4 (See Björck and Golub [143, 1967]). To illustrate the method of iterative re-
finement we consider the linear least squares problem where A is the last six columns of the
inverse of the Hilbert matrix H8 ∈ R8×8 , which has elements
hij = 1/(i + j − 1), 1 ≤ i, j ≤ 8.
Two right-hand sides b1 and b2 are chosen so that the exact solution is
x = (1/3, 1/4, 1/5, 1/6, 1/7, 1/8)T .
For b = b1 the system Ax = b is consistent; for b = b2 the norm of the residual r = b − Ax is
1.04 · 107 . Hence, for b2 the term proportional to κ2 (A) in the perturbation bound dominates.
The refinement algorithm was run on a computer with a single precision unit roundoff u =
1.46 · 10−11 . The correction equation was solved using Householder QR factorization. Double
precision accumulation of inner products was used for calculating the residuals, but otherwise
all computations were performed in single precision. We give below the first component of the
successive approximations x(s) , r(s) s = 1, 2, 3, . . . , for right-hand sides b1 (left) and b2 (right).
                      b_1                           b_2
x_1^{(s)}:   3.33323 25269 · 10^{−1}     5.56239 01547 · 10^{+1}
             3.33333 35247 · 10^{−1}     3.37777 18060 · 10^{−1}
             3.33333 33334 · 10^{−1}     3.33311 57908 · 10^{−1}
             3.33333 33334 · 10^{−1}     3.33333 33117 · 10^{−1}
r_1^{(s)}:   9.32626 24303 · 10^{−5}     2.80130 68864 · 10^{6}
             5.05114 03416 · 10^{−7}     2.79999 98248 · 10^{6}
             3.65217 71718 · 10^{−11}    2.79999 99995 · 10^{6}
            −1.95300 70174 · 10^{−13}    2.80000 00000 · 10^{6}
A gain of almost three digits of accuracy per step in the approximations to x1 and r1 is achieved for
both right-hand sides b1 and b2 . This is consistent with the estimate (2.5.22) because
κ(A) = 5.03 · 108 , uκ(A) = 5.84 · 10−3 .
For the right-hand side b_1 the approximation x_1^{(4)} is correct to full working precision. It is interesting to note that for the right-hand side b_2 the effect of the error term proportional to uκ_2(A) is evident in that the computed solution x_1^{(1)} is in error by a factor of 10^3. However, x_1^{(4)} has eight correct digits, and r_1^{(4)} is close to the true value 2.8 · 10^6.
Wampler [1099, 1979] gives two Fortran subroutines L2A and L2B using MGS with itera-
tive refinement for solving weighted least squares problems. These are based on the ALGOL
programs in Björck [126, 1968] and were found to provide the best accuracy in a comparative
evaluation at the National Bureau of Standards; see Wampler [1098, 1970]. Demmel et al. [308,
2009] developed a portable and parallel implementation of the Björck–Golub IR algorithm for
least squares solutions that uses extended precision.
Most descriptions of IR stress the importance of computing the residuals in higher precision.
However, fixed-precision IR with residuals computed in working precision (u2 = u1 ) can also
be beneficial. Jankowski and Woźniakowski [663, 1977] show that any linear equation solver
can be made backward stable by IR in fixed precision as long as the solver is not too unstable
and A not too ill-conditioned. If the product of cond (A) = ∥ |A−1 ||A| ∥2 and the measure of
ill-scaling
σ(A, x) = max_i (|A||x|)_i / min_i (|A||x|)_i   (2.5.23)
is not too large, Skeel [1001, 1980] proves that LU factorization with partial pivoting combined
with one step of fixed-precision IR becomes stable in a strong sense. Higham [618, 1991] extends
Skeel’s analysis to show that for any solver that is not too unstable, one step of fixed-precision
IR suffices to achieve a solution with a componentwise relative backward error ω < γn+1 u1 . In
particular, this result applies to solving linear systems by QR factorization.
Higham [618, 1991] studies fixed-precision IR for linear systems and least squares prob-
lems of QR factorization methods. He shows that the componentwise backward error ω(r̄, x̄) =
max(ω1 , ω2 ) in (2.5.18) eventually becomes small, although it may take more than one iteration.
In particular, IR mitigates the effect of poor row scaling.
If f_s and g_s in Algorithm 2.5.3 are evaluated in precision u_1, then the resulting roundoff errors become more important. A standard backward error analysis shows that the computed residuals f̄ and ḡ are the exact residuals corresponding to the perturbed system

\begin{pmatrix} f̄ \\ ḡ \end{pmatrix} = \begin{pmatrix} b + δb \\ 0 \end{pmatrix} − \begin{pmatrix} I + δI & A + δA_1 \\ (A + δA_2)^T & 0 \end{pmatrix} \begin{pmatrix} r̄ \\ x̄ \end{pmatrix},

where δI is diagonal and the perturbations δb, δI, δA_1, and δA_2 satisfy componentwise bounds of order u_1. Hence the roundoff errors are equivalent to small componentwise perturbations in the nonzero blocks of the augmented matrix. A perturbation |δI| can be considered as a small perturbation in the weights of the rows of Ax − b. Roundoff errors also occur in solving equations (2.5.20)–(2.5.21). However, if the refinement converges, then the roundoff errors in the solution of the final corrections are negligible.
Recently, graphics processing units (GPUs) have been introduced that perform extremely fast matrix-matrix multiplication in IEEE half precision format, with accumulation in single precision (see
Section 1.4). This has caused a renewed interest in multiprecision algorithms for applications
such as weather forecasting and machine learning. A survey of linear algebra in mixed precision
is given by Higham and Mary [627, 2022].
Carson and Higham [209, 2018] develop a three-precision iterative refinement algorithm for
solving linear equations. This uses a complete LU factorization in half IEEE precision (u = 4.9×
10−4 ), single precision as working precision, and double precision for computing residuals. The
remaining computations are performed in working precision, and all results are stored in working
precision. A rounding error analysis shows that this obtains full single-precision accuracy as long
as κ(A) ≤ 104 . With lower working precision the likelihood increases that the system being
solved is too ill-conditioned. The authors show that in these cases an improvement is obtained
by using a two-stage iterative refinement approach where the correction equation is solved by
GMRES preconditioned by LU factorization (see Section 6.4.5). For the resulting GMRES-IR
algorithm the above condition can be weakened to κ(A) ≤ 108 .
Carson, Higham, and Pranesh [210, 2020] develop an analogous three-precision iterative
refinement algorithm called GMRES-LSIR for least squares problems. It uses the QR factoriza-
tion of A computed in half IEEE precision. The correction is solved by GMRES applied to the
augmented system using a preconditioner based on Algorithm 2.2.6 and the computed QR factor-
ization. For a wide range of problems this yields backward and forward errors for the augmented
system correct to working precision.
Kielbasiński [693, 1981] studies a version of IR with variable precision called binary-cascade IR
(BCIR) in which several steps of IR are performed for solving a linear system with prescribed
relative accuracy. At each step the process uses the lowest sufficient precision for evaluating the
residuals. A BCIR process for solving least squares problems is developed by Gluchowska and
Smoktunowicz [481, 1990]. Iterative refinement of solutions has many applications to sparse
linear systems and sparse least squares problems. Arioli, Demmel, and Duff [33, 1989] adapt
Skeel’s analysis of fixed-precision IR to the problem of solving sparse linear systems with sparse
backward error. The use of a fixed-precision IR for sparse least squares problems is studied by
Arioli, Duff, and de Rijk [36, 1989]. They note that IR can regain a loss of accuracy caused by
bad scaling of the augmented system.
In the method of seminormal equations (SNE), the least squares solution is computed from

R^T R x = c,   c = A^T b,   (2.5.26)

where R is the triangular factor from a QR (or Cholesky) factorization; the orthogonal factor Q is not needed. The SNE solution can be improved by iterative refinement:

    set x_0 = 0;
    for s = 0, 1, 2, . . .
        r_s = b − A x_s;
        solve R^T (R δx_s) = A^T r_s;
        x_{s+1} = x_s + δx_s;
    end
Each step of refinement requires matrix-vector multiplication with A and AT and the solution
of two triangular systems. With R from a QR factorization the convergence of this iteration is
linear with a rate that can be shown to be approximately ρ = c u κ′(A); see Björck [130, 1987]. Note that this holds without actually performing the optimum column
scaling. When R comes from a Cholesky factorization the rate achieved is much worse: only
ρ̄ = cuκ′ (A)2 . Even then, a good final accuracy can be achieved for a large class of problems
by performing several steps of IR.
In the method of corrected seminormal equations (CSNE), a corrected solution xc is
computed by doing just one step of refinement of the SNE solution x̄: Compute the residual
r̄ = f l(b − Ax̄) and solve
R^T R δx = A^T r̄,   x_c = x̄ + δx.   (2.5.27)
Assuming that R comes from a backward stable Householder QR or MGS factorization, the
computed R is the exact R-factor of a perturbed matrix,
A + E, ∥E∥F ≤ c1 u∥A∥F ,
where c1 is a small constant depending on m and n. From this property the following error bound
for the corrected solution xc can be shown; see Björck [130, 1987, Theorem 3.2].
Theorem 2.5.5. Let x̄_c be the computed CSNE solution using R from Householder QR or MGS of A. If ρ ≡ c_1 n^{1/2} u κ(A) < 1, then, neglecting higher order terms in u κ(A), the following error estimate holds:

∥x − x̄_c∥_2 ≤ m n^{1/2} u κ ( ∥x∥_2 + κ ∥r∥_2/∥A∥_2 ) + σ u κ ( c_2 ∥x∥_2 + n^{1/2} m ∥b∥_2/∥A∥_2 ),   (2.5.28)

where κ = κ(A), σ = c_3 u κ κ′, c_3 ≤ 2 n^{1/2}(c_1 + 2n + m/2), and c_1 and c_2 are small constants depending on m and n.
If σ = c3 uκκ′ < 1, the error bound for the CSNE solution is no worse than the error bound
for a backward stable method. This condition is usually satisfied in practical applications and
is roughly equivalent to requiring that x̄ from the seminormal equation have at least one correct
digit.
An important application of CSNE is to sparse least squares problems. In the QR factor-
ization of a sparse matrix A, the factor Q often can be much less sparse than the factor R; see
Gilbert, Ng, and Peyton [470, 1997]. Therefore Q is not saved, which creates a difficulty if addi-
tional right-hand sides b have to be treated. With CSNE, recomputing the QR factorization can
be avoided.
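A minimal MATLAB sketch of this use of CSNE is shown below: only the triangular factor R is kept, and one corrected step (2.5.27) is applied for a new right-hand side b. The code is illustrative and assumes that A has full column rank.

n = size(A, 2);
R = triu(qr(A));   R = R(1:n, 1:n);   % keep only the triangular factor of A = QR
x  = R \ (R' \ (A'*b));               % seminormal equations solution (2.5.26)
r  = b - A*x;                         % residual in working precision
dx = R \ (R' \ (A'*r));               % one refinement step (2.5.27)
xc = x + dx;                          % corrected solution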
Example 2.5.6. Consider a sequence of least squares problems constructed as follows. Let A be
the first five columns of the inverse Hilbert matrix H6−1 of order six. This matrix is moderately
ill-conditioned: κ2 (A) = 4.70 × 106 . Let x = (1, 1/2, 1/3, 1/4, 1/5)T , and let b = Ax be a
consistent right-hand side. Let h satisfy AT h = 0 and κ2 (A)∥h∥2 /(∥A∥2 ∥x∥2 ) = 3.72 × 103 .
Consider a sequence of right-hand sides Ax + 10k h, k = 0, 1, 2, with increasing residual norm.
For these problems it holds that σ = c3 uκκ′ ≪ 1.
Table 2.5.1 shows the average number of correct significant decimal digits of four solution
methods: Normal equations (NE), SNE, QR, and CSNE. As predicted by the error analysis, SNE
gives only about the same accuracy as NE. On the other hand, CSNE is better than QR.
For more ill-conditioned problems, several refinement steps may be needed. Let x_p be the computed solution after p refinement steps. With R from QR the error initially behaves as ∥x − x_p∥ ∼ c_1 u κ κ′ (c_1 u κ′)^p. If c_1 ≈ 1 and κ′ = κ, an error of backward stable level is achieved in p steps if κ(A) < u^{−p/(p+1)}.
To achieve better performance, this sequence of rank-one updates can be aggregated into one
update of rank p.
A stable compact representation of a product of Householder matrices is given by Bischof
and Van Loan [123, 1987]. Here we describe a more storage–efficient version due to Schreiber
and Van Loan [975, 1989]. Let
Q_i = I − U_i T_i U_i^T,   i = 1, 2,

be two factors in compact WY form. Their product is again of this form, Q_1 Q_2 = I − U T U^T, with

U = ( U_1  U_2 ),   T = \begin{pmatrix} T_1 & −T_1 (U_1^T U_2) T_2 \\ 0 & T_2 \end{pmatrix}.   (2.6.3)

Note that U is formed by concatenation, but forming the off-diagonal block of the upper triangular matrix T requires extra operations. In the special case when n_1 = k, n_2 = 1, U_2 = u, and T_2 = τ, (2.6.3) becomes

U = ( U_1  u ),   T = \begin{pmatrix} T_1 & −τ T_1 (U_1^T u) \\ 0 & τ \end{pmatrix}.   (2.6.4)
This requires about n_1^2 (m + n_1/3) flops. The trailing matrix A_2 can then be updated by matrix-matrix operations as

Q_1^T A_2 = A_2 − U_{n_1} ( T_{n_1}^T ( U_{n_1}^T A_2 ) ) = \begin{pmatrix} R_{12} \\ B \end{pmatrix},   R_{12} ∈ R^{n_1×(n−n_1)}.   (2.6.5)
In step k of the blocked algorithm the following operations are performed:
1. Compute the Householder QR factorization of the current panel of p columns.
2. Form the triangular factor T of the corresponding block reflector.
3. Apply the block reflector to the trailing submatrix of size (m − kp) × (n − kp).
The QR factorization in step 1 requires a total of less than 2N mp2 = 2mnp flops. The operation
count of step 2 is of similar magnitude. Since the total number of flops for the Householder
QR factorization of A ∈ Rm×n must be greater than 2n2 (m − n/3) flops, all but a fraction of
p/n = 1/N of the operations are spent in the matrix-matrix operations of the updating.
The block Householder QR algorithm described above is right-looking, i.e., in step k the full
trailing submatrix of size (m − kp) × (n − kp) is updated. For p = 1 it reduces to the standard
Householder QR algorithm. The data referenced can instead be reduced by using a left-looking
algorithm that in step k applies all previous Householder transformations to the next block of
size (n − kp) × p of the trailing matrix.
A blocked form of MGS QR can easily be constructed as follows; see Björck [134, 1994].
Let A = ( A_1  A_2 ) and let A_1 = Q_1 R_{11}, Q_1 = (q_1, . . . , q_k), be the MGS factorization of A_1, where q_i^T q_i = 1. Due to rounding errors, there will be a loss of orthogonality, so that q_i^T q_j ≠ 0 for i ≠ j. In the next step, the trailing block A_2 is transformed as

B = P_k^T A_2,   P_k = (I − q_1 q_1^T) · · · (I − q_k q_k^T).   (2.6.7)
This update requires 2(n − k)k(m + p/4) flops. When k ≪ n this is the main work in the first
step. In the next step, B is partitioned as B = ( B1 B2 ), and the MGS QR factorization of B1
is computed, etc. The resulting block MGS algorithm has the same stability as the scalar MGS
algorithm and can be used to solve least squares problems in a backward stable way.
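The block MGS algorithm just described can be sketched in MATLAB as follows. This is a minimal illustration: the panel width p and the function name blk_mgs_qr are arbitrary, and no reorthogonalization is included.

function [Q, R] = blk_mgs_qr(A, p)
% Block MGS QR: MGS inside each panel of p columns, then the trailing
% columns are updated with the projections (I - q_i*q_i') as in (2.6.7).
[m, n] = size(A);
Q = zeros(m, n);  R = zeros(n, n);
for j1 = 1:p:n
    j2 = min(j1+p-1, n);  J = j1:j2;
    for j = J                              % MGS factorization of the panel
        for i = j1:j-1
            R(i,j) = Q(:,i)'*A(:,j);
            A(:,j) = A(:,j) - R(i,j)*Q(:,i);
        end
        R(j,j) = norm(A(:,j));
        Q(:,j) = A(:,j)/R(j,j);
    end
    for i = J                              % update the trailing block
        R(i, j2+1:n) = Q(:,i)'*A(:, j2+1:n);
        A(:, j2+1:n) = A(:, j2+1:n) - Q(:,i)*R(i, j2+1:n);
    end
end
end

In exact arithmetic this is just MGS organized by panels; the point is that the trailing update is rich in matrix-vector and matrix-matrix operations.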
The following result (Björck [125, 1967, Lemma 5.1]) can be used to improve the efficiency
of column-oriented MGS orthogonalization.
P_k = I − Q̃_k Q_k^T,   Q_k = Q̃_k (I + L_k^T),   (2.6.11)

where L_k ∈ R^{k×k} is a strictly lower triangular correction matrix with elements l_{ij} = q_i^T q_j, i > j.

Proof. The lemma is trivially true for k = 1. From the definition of q̃_k it follows that q_k equals the kth column of Q̃_k(I + L_k^T), which gives Q_k = Q̃_k(I + L_k^T).
In step k of column-oriented MGS, the next column a_k is therefore projected as

P_{k−1}^T a_k = (I − Q_{k−1} Q̃_{k−1}^T) a_k,   Q̃_{k−1}^T = (I + L_{k−1})^{−1} Q_{k−1}^T.   (2.6.13)
Comparing (2.6.8) and (2.6.13) gives the identity Tk−1 = (I − Lk−1 )−1 . A similar lower tri-
angular inverse is used by Walker [1097, 1988] to obtain a blocked Householder algorithm. A
summary of the compact WY and inverse compact WY for Householder and MGS transforma-
tions and a version of blocked MGS based on (2.6.13) are given by Świrydowicz et al. [1054,
2020].
Several other block Gram–Schmidt algorithms have been suggested. Jalby and Philippe [662,
1991] study a block Gram–Schmidt algorithm in which MGS is used to orthogonalize inside the
blocks, and the trailing matrix is updated as in CGS by multiplication with (I − Qk QTk ). The
stability of this algorithm is shown to lie between that of MGS and CGS. The computed matrix
Q̂ satisfies
∥I − Q̂^T Q̂∥_2 ≤ ρ u max_k κ(W_k) κ(A),
where Wk , k = 1, . . . , N , are the successive panel matrices. The accuracy can be improved
significantly by reorthogonalization of the trailing matrix.
A more challenging problem is the orthogonal basis problem: compute Q_1 and R that satisfy

A + E = Q_1 R,   ∥E∥_2 ≤ c_1(m, n) u ∥A∥_2,   ∥I − Q_1^T Q_1∥_2 ≤ c_2(m, n) u,

where A ∈ R^{m×n} and c_1(m, n) and c_2(m, n) are modest constants. Stewart [1031, 2008]
develops a left-looking Gram–Schmidt algorithm with A partitioned into blocks of columns.
Each block is successively orthogonalized and incorporated into Q. In order to maintain full
orthogonality in Q, reorthogonalization is used in all Gram–Schmidt steps. A feature of the
algorithm is that it can handle numerical rank-deficiencies in A. A similar block algorithm based
on CGS2, together with an error analysis that improves some previously given bounds, is given
by Barlow and Smoktunowicz [76, 2013].
Puglisi [906, 1992] gives an improved version of the WY representation of products of House-
holder reflections, which is richer in matrix-matrix operations; see also Joffrain et al. [674, 2006].
Bischof and Quintana-Ortí [121, 122, 1998] use a windowed version of column pivoting aided by an incremental condition estimator (ICE) to develop an efficient block algorithm for computing an RRQR factorization. Columns found to be nearly linearly dependent on previously chosen
columns are permuted to the end of the matrix. Numerical tests show that this pivoting strategy
usually correctly identifies the rank of A and generates a well-conditioned matrix R.
Oliveira et al. [842, 2000] analyze pipelined implementations of QR factorization using dif-
ferent partitioning schemes, including block and block-cyclic columnwise schemes. A parallel
implementation of CGS with reorthogonalization is given by Hernandez, Román, and Tomás
[605, 2006]. Rounding error analysis of mixed-precision block Householder algorithms is given
by Yang, Fox, and Sanders [1138, 2020]. Carson et al. [211, 2022] survey block Gram–Schmidt
algorithms and their stability properties.
Consider the block 2 × 2 partitioning of a symmetric positive definite matrix and its Cholesky factor,

\begin{pmatrix} A_{11} & A_{12} \\ A_{12}^T & A_{22} \end{pmatrix} = \begin{pmatrix} R_{11}^T & 0 \\ R_{12}^T & R_{22}^T \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix},

where R_{11} and R_{22} are upper triangular matrices. Equating both sides gives the matrix equations

R_{11}^T R_{11} = A_{11},   R_{11}^T R_{12} = A_{12},   R_{22}^T R_{22} = A_{22} − R_{12}^T R_{12}.

These are solved in three steps:
1. Compute the Cholesky factorization A_{11} = R_{11}^T R_{11}.
2. Solve the lower triangular system R_{11}^T R_{12} = A_{12} for R_{12}.
3. Form the Schur complement S_{22} = A_{22} − R_{12}^T R_{12} and compute its Cholesky factorization S_{22} = R_{22}^T R_{22}.
The recursive Cholesky algorithm in the following MATLAB program computes the two required Cholesky factorizations of size n1 × n1 and n2 × n2 by recursive calls. The recursion is stopped and a standard Cholesky routine is used when n ≤ nmin.
function L = rchol(C,nmin)
% RCHOL computes the lower triangular Cholesky factor L of C,
% C = L*L', using a divide and conquer method.
% -------------------------------------------------
n = size(C,1);
if n <= nmin
    L = chol(C,'lower');      % standard Cholesky at the lowest level
else
    n1 = floor(n/2); n2 = n - n1;
    j1 = 1:n1; j2 = n1+1:n;
    % Recursive call for the leading block
    L11 = rchol(C(j1,j1),nmin);
    % Triangular solve for the off-diagonal block
    L21 = (L11\C(j1,j2))';
    % Recursive call for the Schur complement
    L22 = rchol(C(j2,j2) - L21*L21',nmin);
    L = [L11, zeros(n1,n2); L21, L22];
end
end
At the lowest level of the recursion the factorization is computed by a call to L = chol(C,'lower'). All remaining work is done in triangular solves and matrix multiplication. At level i of the recursion, 2^i such matrix operations are performed. In going from level i to i + 1 the number of calls doubles and each problem size is halved, so the number of flops done per level goes down in a geometric progression by a factor of 4. Because the total number of flops remains the same as for the standard algorithm, a large part of the calculation is done at the first few levels, i.e., on large blocks. Since the flop rate decreases with the problem size, the computation time does not go down by quite the factor 1/4 per level, but for large problems this has little effect on the total efficiency.
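As a quick check of rchol, its output can be compared with the built-in factorization on a random symmetric positive definite test matrix; the block size 64 below is an arbitrary illustrative choice.

n = 500;
A = randn(n);   C = A'*A + n*eye(n);           % symmetric positive definite test matrix
L = rchol(C, 64);                              % recursion stopped at blocks of order <= 64
rel_err = norm(C - L*L', 'fro')/norm(C, 'fro') % should be of the order of machine precision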
To develop a recursive QR algorithm, A is partitioned as A = ( A1 A2 ), where A1 consists
of the first ⌊n/2⌋ columns of A. Assume that the QR factorization of A1 has been computed and
A2 updated as follows:
Q_1^T A_1 = \begin{pmatrix} R_{11} \\ 0 \end{pmatrix},   Q_1^T A_2 = Q_1^T \begin{pmatrix} A_{12} \\ A_{22} \end{pmatrix} = \begin{pmatrix} R_{12} \\ B \end{pmatrix}.

The QR factorization of B is then computed by a recursive call, and the process is repeated.
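The recursive QR algorithm can be sketched in MATLAB in the same style as rchol above (a minimal illustration assuming m ≥ n; the name rqr and the termination parameter nmin are arbitrary).

function [Q, R] = rqr(A, nmin)
% Recursive QR factorization: factor the left half of the columns,
% update the right half, and recurse on the reduced block B.
[m, n] = size(A);
if n <= nmin
    [Q, R] = qr(A);                           % standard Householder QR
else
    n1 = floor(n/2);
    [Q1, R1] = rqr(A(:,1:n1), nmin);          % QR of the first block column
    T = Q1'*A(:, n1+1:n);                     % apply Q1' to the trailing columns
    R12 = T(1:n1, :);   B = T(n1+1:m, :);
    [Q2, R22] = rqr(B, nmin);                 % recursive call on B
    Q = Q1*blkdiag(eye(n1), Q2);
    R = [R1(1:n1,1:n1), R12; zeros(m-n1,n1), R22];
end
end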
A disadvantage of this algorithm is the overhead in storage and operations caused by the T
matrices. At the end of the recursive QR factorization a T -matrix of size n × n is formed and
stored. This can be avoided by using a hybrid of the partitioned and recursive algorithms, where
the recursive QR algorithm is only used to factorize the blocks in the partitioned algorithm; see
Elmroth and Gustavson [385, 2004].
O’Leary and Whitman [841, 1990] analyze algorithms for Householder and MGS QR factoriza-
tions on distributed MIMD machines using rowwise partitioning schemes. Gunter and Van de
Geijn [553, 2005] present parallel algorithms for QR factorizations. A recursive algorithm for
Cholesky factorization of a matrix in packed storage format is given in Andersen, Waśniewski,
and Gustavson [23, 2001]. An incremental parallel QR factorization code is given by Baboulin
et al. [49, 2009]. Algorithms for QR factorization for multicore architectures are developed by
Buttari et al. [196, 2008], Buttari [195, 2013], and Yeralan et al. [1139, 2017]. Communication
avoiding rank-revealing QR factorizations are developed by Demmel et al. [305, 2015]. Recent
developments in hardware and software for large-scale accelerated multicores are surveyed by
Abdelfattah et al. [2, 2016]. The impact of hardware developments on subroutines for computing
the SVD is surveyed by Dongarra et al. [323, 2018].
An early collection of high-quality software for linear algebra appeared in 1971 in the Handbook by Wilkinson and Reinsch [1123,
1971]. It contained eleven subroutines written in ALGOL 60 for linear systems, linear least
squares, and linear programming and eighteen routines for eigenvalue problems.
EISPACK is a collection of Fortran 77 subroutines for computing eigenvalues and/or eigen-
vectors of several different classes of matrices as well as the SVD; see Smith et al. [1005, 1976]
and Garbow et al. [441, 1977]. The subroutines are primarily based on a collection of ALGOL
procedures in the Handbook mentioned above, although some were updated to increase reliability
and accuracy.
In 1979 a set of standard routines called BLAS (Basic Linear Algebra Subprograms) were
introduced to perform frequently occurring operations; see Lawson et al. [728, 1979]. These
included operations such as the scalar product β := x^T y (Sdot), vector sums y := αx + y (Saxpy), scaling x := αx (Sscal), and the Euclidean norm β = ∥x∥_2 (Snrm2). Both single- and double-
precision real and complex operations were provided. BLAS leads to shorter and clearer code
and aids portability. Furthermore, machine-independent optimization can be obtained by using
tuned BLAS provided by manufacturers.
LINPACK is a collection of Fortran subroutines using BLAS that analyzes and solves lin-
ear equations and linear least squares problems. It solves systems whose matrices are general,
banded, symmetric positive definite and indefinite, triangular, or tridiagonal. It uses QR and SVD
for solving least squares problems. These subroutines were developed from scratch and include
several innovations; see Dongarra et al. [322, 1979].
While successful for vector-processing machines, Level 1 BLAS were found to be unsatisfac-
tory for the cache-based machines introduced in the 1980s. This brought about the development
of Level 2 BLAS that involve operations with one matrix and one or several vectors, e.g.,
y := αAx + βy,
y := αAT x + βy,
A := αxy T + A,
x := T x,
x := T −1 x,
where A is a matrix, T is an upper or lower triangular matrix, and x and y are vectors; see
Dongarra et al. [325, 326, 1988]. Level 2 BLAS involve O(mn) data, where m × n is the
dimension of the matrix involved, and the same number of arithmetic operations.
When RISC-type microprocessors were introduced, Level 2 BLAS failed to achieve adequate
performance, due to a delay in getting data to the arithmetic processors. In Level 3 BLAS,
introduced in 1990, the vectors in Level 2 BLAS are replaced by matrices. Some typical Level 3
BLAS are
C := αAB + βC,
C := αAT B + βC,
C := αAB T + βC,
B := αT B,
B := αT −1 B.
For n × n matrices, Level 3 BLAS use O(n2 ) data but perform O(n3 ) arithmetic operations.
This gives a surface-to-volume effect for the ratio of data movement to operations and avoids
excessive data movements between different parts of the memory hierarchy. Level 3 BLAS can
achieve close to optimal performance on a large variety of computer architectures and makes
it possible to write portable high-performance linear algebra software. Formal definitions for
Level 3 BLAS were published in 2001; see Blackford et al. [155, 2002]. Vendor-supplied highly
efficient machine-specific implementations of BLAS libraries are available, such as Intel Math
Kernel Library (MKL), IBM Scientific Subroutine Library (ESSL), and the open-source BLAS
libraries OpenBLAS and ATLAS.
The kernel in the Level 3 BLAS that gets closest to peak performance is the matrix-matrix
multiply routine GEMM. Typically, it will achieve over 90% of peak on matrices of order greater
than a few hundred. The bulk of the computation of other Level 3 BLAS such as symmetric
matrix-matrix multiply (SYMM), triangular matrix-matrix multiply (TRMM), and symmetric
rank-k update (SYRK), can be expressed as calls to GEMM; see Kågström, Ling, and Van Loan
[679, 1998].
The LAPACK library [27, 1999], first released in 1992, was designed to supersede and inte-
grate LINPACK and EISPACK. The subroutines in LAPACK were restructured to achieve greater
efficiency on both vector processors and shared-memory multiprocessors. LAPACK was incor-
porated into MATLAB in the year 2000. LAPACK is continually improved and updated and
available from http://www.netlib.org/lapack/. Different versions and releases are listed
there as well as information on related projects. A number of parallel BLAS libraries can be used
in LAPACK to take advantage of common techniques for shared-memory parallelization such as
pThreads or OpenMP.
The last decade has been marked by the proliferation of multicore processors and hardware
accelerators that present new challenges in algorithm design. On such machines, costs for com-
munication, i.e., moving data between different levels of memory hierarchies and processors, can
exceed arithmetic costs by orders of magnitude; see Graham, Snir, and Patterson [524, 2004].
This gap between computing power and memory bandwidth keeps increasing; see Abdelfattah
et al. [1, 2021]. A key to high efficiency is locality of reference, which requires splitting opera-
tions into carefully sequenced tasks that operate on small portions of data. Iterative refinement is
exploited by Dongarra and his coworkers for accelerating multicore computing; see Abdelfattah
et al. [2, 2016].
There are two costs associated with communication: bandwidth cost (proportional to the
amount of data moved) and latency cost (proportional to the number of messages in which these
data are sent). Ballard et al. [65, 2011] prove bounds on the minimum amount of communication
needed for a wide variety of matrix factorizations including Cholesky and QR factorizations.
These lower bounds generalize earlier bounds by Irony, Toledo, and Tiskin [657, 2004] for matrix
products. New linear algebra algorithms with reduced communication costs are discussed and
examples given that attain these lower bounds.
ScaLAPACK is an extension of LAPACK designed to run efficiently on newer MIMD dis-
tributed memory architectures; see Choi et al. [243, 1996] and Blackford et al. [154, 1997].
ScaLAPACK builds on distributed memory versions of parallel BLAS (PBLAS) and on a set
of Basic Linear Algebra Communication Subprograms (BLACS) for executing communication
tasks. This makes the top level code of ScaLAPACK look quite similar to the LAPACK code.
Matrices are arranged in a two-dimensional block-cyclic layout using a prescribed block size.
New implementations of algorithms are available via the open-source libraries PLASMA and
MAGMA; see Agullo et al. [10, 2009].
Chapter 3
Generalized and Constrained Least Squares
Consider the general linear model

Ax + ϵ = b,   V(ϵ) = σ^2 V,   (3.1.1)

with Hermitian positive definite error covariance matrix V ≠ I. Then the following result holds. The solution x̂ of the generalized least squares (GLS) problem

min_x (b − Ax)^H V^{−1} (b − Ax)   (3.1.2)

satisfies the generalized normal equations

A^H V^{−1} A x = A^H V^{−1} b.   (3.1.3)
Proof. Since V is positive definite, the Cholesky factorization V = LL^H exists. Then A^H V^{−1} A = (L^{−1}A)^H L^{−1}A, and problem (3.1.2) can be reformulated as

min_x ∥L^{−1}(b − Ax)∥_2^2.

This is a standard least squares problem min_x ∥Ãx − b̃∥_2, where Ã = L^{−1}A and b̃ = L^{−1}b. The proof now follows by replacing A and b in Theorem 1.1.4 with Ã and b̃.
In the following we assume that A, b, and V are real. The GLS problem can be solved by first computing V = LL^T and then solving LÃ = A and Lb̃ = b. The normal equations Ã^T Ã x = Ã^T b̃ are formed and solved by Cholesky factorization. Alternatively, using the QR factorization

L^{−1} A = Q \begin{pmatrix} R \\ 0 \end{pmatrix},   Q = ( Q_1  Q_2 ),   (3.1.7)

the GLS solution is obtained from R x = Q_1^T b̃.

The generalized least-norm (GLN) problem

min_y  y^T V y   subject to   A^T y = c   (3.1.8)

is characterized by the equations

A^T V^{−1} A z = c,   y = V^{−1} A z.   (3.1.9)

If V = LL^T is the Cholesky factorization, then y^T V y = ∥L^T y∥_2^2. Hence problem (3.1.8) is equivalent to seeking the minimum ℓ_2-norm solution of the system Ã^T ỹ = c, where

Ã = L^{−1} A,   ỹ = L^T y.
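The Cholesky-plus-QR variant just described can be sketched in a few lines of MATLAB; this is a minimal illustration, with A, b, and a symmetric positive definite V assumed given.

L  = chol(V, 'lower');          % V = L*L'
At = L\A;   bt = L\b;           % Atilde = L^{-1}*A, btilde = L^{-1}*b
[Q1, R] = qr(At, 0);            % economy-size QR of Atilde
x = R \ (Q1'*bt);               % GLS solution from R*x = Q1'*btilde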
Problems GLS and GLN are special cases of the generalized augmented system
M \begin{pmatrix} y \\ x \end{pmatrix} ≡ \begin{pmatrix} V & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix}.   (3.1.11)
When V is only positive semidefinite, M is nonsingular if and only if A has full column rank and N(V) ∩ N(A^T) = {0}.
If V is positive definite, then by Sylvester’s law of inertia (see Horn and Johnson [639, 1985]) it
follows that the matrix M ∈ R(m+n)×(m+n) of system (3.1.11) has m positive and n negative
eigenvalues. For this reason, (3.1.11) is called a saddle point system. Eliminating y in (3.1.11)
gives the generalized normal equations for x,
AT V −1 Ax = AT V −1 b − c. (3.1.12)
Such systems represent the equilibrium of a physical system and occur in many applications; see
Strang [1043, 1988].
Theorem 3.1.2. If A ∈ R^{m×n} has full column rank and V ∈ R^{m×m} is symmetric positive definite, the augmented system (3.1.11) is nonsingular and gives the first-order optimality conditions for the generalized least squares problem

min_x  ½ (b − Ax)^T V^{−1}(b − Ax) + c^T x,   (3.1.13)

as well as for the equality constrained quadratic program

min_y  ½ y^T V y − b^T y   subject to   A^T y = c.   (3.1.14)
Proof. System (3.1.11) can be obtained by differentiating (3.1.13). This gives AT V −1 (b−Ax) =
c, where V y = b − Ax. It can also be obtained by differentiating the Lagrangian
L(x, y) = ½ y^T V y − b^T y + x^T (A^T y − c)
for (3.1.14) and equating to zero. Here x is the vector of Lagrange multipliers. If c = 0 in
(3.1.13), then x is the GLS solution of (1.1.2). If b = 0 in (3.1.14), y is the GLS solution with
minimum weighted norm ∥y∥V = (y T V y)1/2 of the consistent underdetermined linear system
AT y = c.
It follows that any algorithm for solving the GLS problem (3.1.13) is valid also for the qua-
dratic programming problem (3.1.14) and vice versa. An explicit expression for the inverse of
augmented matrix M is obtained from the Schur–Banachiewicz formula (3.3.6),
M^{−1} = \begin{pmatrix} V & A \\ A^T & 0 \end{pmatrix}^{−1} = \begin{pmatrix} V^{−1}(I − P) & V^{−1} A S^{−1} \\ S^{−1} A^T V^{−1} & −S^{−1} \end{pmatrix},   (3.1.15)
where
S = AT V −1 A, P = AS −1 (V −1 A)T . (3.1.16)
The solution of the augmented system (3.1.11) becomes y = L^{−T} u, x = R^{−1} v, where

\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} Q_2 Q_2^T & Q_1 \\ Q_1^T & −I \end{pmatrix} \begin{pmatrix} B^{−1} b \\ R^{−T} c \end{pmatrix}.   (3.1.18)
P b ∈ R(P),   (I − P) b ∈ N(P).
Consider first the two-dimensional case. Let u and v be unit vectors in R2 such that cos θ =
v T u > 0. Then
P = u(v^T u)^{−1} v^T = \frac{1}{\cos θ} u v^T
is the oblique projector onto u along the orthogonal complement of v. Similarly, P T =
v(uT v)−1 uT is the oblique projector onto v along the orthogonal complement of u. If u = v,
then P is an orthogonal projector and cos θ = 1. When v is almost orthogonal to u, then
∥P ∥2 = 1/ cos θ becomes large.
It is easily verified that if V ≠ I is positive definite, the matrix P in (3.1.16) is an oblique projector onto R(A). More generally, let X and Y be subspaces of C^n such that

X ∩ Y = {0},   X ⊕ Y = C^n.   (3.1.20)
Let U_1 and V_1 be orthonormal matrices such that R(U_1) = X and R(V_1) = Y^⊥, where Y^⊥ is the orthogonal complement of Y. Then the oblique projector onto X along Y is

P_{X,Y} = U_1 (V_1^T U_1)^{−1} V_1^T.   (3.1.21)

Similarly, let U_2 and V_2 be orthonormal matrices such that X^⊥ = R(U_2) and Y = R(V_2). Then P_{Y,X} = V_2 (U_2^T V_2)^{−1} U_2^T and

P_{Y,X} = I − P_{X,Y},   P_{X,Y}^T = P_{Y^⊥, X^⊥}.   (3.1.22)
Proof. We have P_{X,Y}^2 = U_1(V_1^T U_1)^{−1} V_1^T U_1 (V_1^T U_1)^{−1} V_1^T = P_{X,Y}. This shows that P_{X,Y} is a projector onto X. Similarly, P_{Y,X} = V_2(U_2^T V_2)^{−1} U_2^T is the projector onto Y. To prove the first identity in (3.1.22), we first note that the assumption implies V_1^T V_2 = 0 and U_2^T U_1 = 0. Then P_{X,Y} and P_{Y,X} vanish on Y and X, respectively, and act as the identity on X and Y, so their sum equals I on X ⊕ Y = C^n. The second identity follows from the expression P_{X,Y}^T = V_1(U_1^T V_1)^{−1} U_1^T.
For an orthogonal projector P we have ∥P x∥_2 ≤ ∥x∥_2 for all x, (3.1.23) where equality holds for all vectors in R(P). It follows that ∥P∥_2 = 1. The converse is also true; a projector P is an orthogonal projector only if (3.1.23) holds. The spectral norm of an oblique projector can be exactly computed.
Proof. See Wedin [1110, 1985, Lemma 5.1] or Stewart [1032, 2011].
An excellent introduction to oblique projectors and their representations is given by Wedin [1110,
1985]. Afriat [9, 1957] gives an exposition of orthogonal and oblique projectors. Relations
between orthogonal and oblique projectors are studied in Greville [537, 1974] and Černý [215,
2009]. Numerical properties of oblique projectors are treated by Stewart [1032, 2011]. Szyld
[1055, 2006] surveys different proofs of the equality ∥P ∥ = ∥I − P ∥ for norms of oblique
projectors in Hilbert spaces.
If G is symmetric positive definite, then (x, y)_G = x^T G y, with ∥x∥_G = (x^T G x)^{1/2}, defines a scalar product and the corresponding norm. Since the unit ball {x | ∥x∥_G ≤ 1} is an ellipsoid, ∥·∥_G is called an elliptic norm. A generalized Cauchy–Schwarz inequality holds:

|(x, y)_G| ≤ ∥x∥_G ∥y∥_G.
Two vectors x and y are said to be G-orthogonal if (x, y)G = 0, and a matrix Q ∈ Rm×n is
G-orthonormal if QT GQ = I.
If A = (a_1, . . . , a_n) ∈ R^{m×n} has full column rank, then an elliptic modified Gram–Schmidt (MGS) algorithm can be used to compute a G-orthonormal matrix Q_1 = (q_1, . . . , q_n) and an upper triangular matrix R such that A = Q_1 R. The algorithm uses elementary projectors of the form

P = I − q q^T G,   q^T G q = 1,   (3.1.28)
and satisfies P 2 = I − 2qq T G + q(q T Gq)q T G = P . It is easily verified that for any vector a,
q T G(P a) = 0, i.e., P a is G-orthogonal to q. Note that P is not symmetric and therefore is an
oblique projector; see Section 3.1.2. Furthermore, G^{1/2} P G^{−1/2} = I − (G^{1/2} q)(G^{1/2} q)^T is an orthogonal projector.
To solve min_x ∥Ax − b∥_G, elliptic MGS is applied to the bordered matrix ( A  b ), giving the factorization

( A  b ) = ( Q_1  q_{n+1} ) \begin{pmatrix} R & z \\ 0 & ρ \end{pmatrix}.   (3.1.29)
It follows that Ax − b = Q1 (Rx − z) − ρqn+1 , where qn+1 is G-orthogonal to Q1 . Hence
∥b − Ax∥G is minimized when Rx = z, and the solution and residual are obtained from
Rx = z, r = b − Ax = ρqn+1 . (3.1.30)
Extra right-hand sides can be treated later by updating the factorization A = Q1 R using the
column-oriented version of elliptic MGS.
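As an illustration, the following minimal MATLAB sketch computes the elliptic MGS factorization A = Q*R with Q'*G*Q = I for a symmetric positive definite G. The function name emgs is arbitrary and no reorthogonalization or pivoting is included.

function [Q, R] = emgs(A, G)
% Elliptic (G-inner-product) modified Gram-Schmidt QR factorization.
[~, n] = size(A);
Q = A;  R = zeros(n);
for k = 1:n
    gk = G*Q(:,k);
    R(k,k) = sqrt(Q(:,k)'*gk);
    Q(:,k) = Q(:,k)/R(k,k);
    gk = gk/R(k,k);                               % G*q_k for the normalized column
    R(k, k+1:n) = gk'*Q(:, k+1:n);                % G-inner products
    Q(:, k+1:n) = Q(:, k+1:n) - Q(:,k)*R(k, k+1:n);
end
end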
Gulliksson and Wedin [551, 1992] develop an elliptic Householder QR factorization. An elliptic Householder reflection has the form

H = I − β u u^T G,   β = 2/(u^T G u);   (3.1.31)

cf. the elementary projection operator (3.1.28). The product of an elliptic Householder reflection H with a vector a is given by

H a = a − β (u^T G a) u,

and it is easily verified that H^T G H = G. Such matrices are called G-invariant. Clearly, the unit matrix I is G-invariant, and a product of G-invariant matrices H = H_1 H_2 · · · H_n is again G-invariant. This property characterizes transformations that leave the G-norm invariant:

∥H x∥_G^2 = x^T H^T G H x = x^T G x = ∥x∥_G^2.

Hence, min_x ∥Ax − b∥_G and min_x ∥H(Ax − b)∥_G have the same solution.
To develop a Householder QR algorithm for solving minx ∥Ax − b∥G , we construct a se-
quence of generalized reflections Hi such that
H_n · · · H_1 (Ax − b) = \begin{pmatrix} R \\ 0 \end{pmatrix} x − \begin{pmatrix} c_1 \\ c_2 \end{pmatrix},   (3.1.33)
where R is upper triangular and nonsingular. Then an equivalent problem is minx ∥Rx − c1 ∥G
with solution x = R−1 c1 . As in the standard Householder method, this only requires that we
construct a generalized Householder reflection H that maps a given vector a onto a multiple of
the unit vector e1 :
Ha = a − β(uT Ga)u = ±σe1 . (3.1.34)
By the invariance of the G-norm, σ ∥e_1∥_G = ∥a∥_G, which determines σ up to its sign.
If the covariance matrix is given in the factored form

V = B B^T,   B ∈ R^{m×p},   p ≤ m,   (3.1.35)
the GLS problem minx (b − Ax)T V −1 (b − Ax) can be reformulated as a standard least squares
problem min_x ∥B^{−1}Ax − B^{−1}b∥_2. However, when B is ill-conditioned, computing Ã = B^{−1}A and b̃ = B^{−1}b may lead to a loss of accuracy. Paige [858, 853, 1979] avoids this by using the equivalent formulation

min_{v,x} ∥v∥_2^2   subject to   Ax + Bv = b.   (3.1.36)
Paige’s method can handle rank-deficiency in both A and V , but for simplicity we assume in
the following that A has full column rank n and V is positive definite. Paige’s method starts by
computing the QR factorization of A and applies QT to b and B:
Q^T ( A  b ) = \begin{pmatrix} R & c_1 \\ 0 & c_2 \end{pmatrix},   Q^T B = \begin{pmatrix} C_1 \\ C_2 \end{pmatrix},   (3.1.37)

where R ∈ R^{n×n}, c_1 and C_1 have n rows, and c_2 and C_2 have m − n rows. The constraint in (3.1.36) then becomes

\begin{pmatrix} R \\ 0 \end{pmatrix} x + \begin{pmatrix} C_1 \\ C_2 \end{pmatrix} v = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.   (3.1.38)
For any vector v ∈ Rm , x can always be determined so that the first block of these equations is
satisfied. Next, an orthogonal matrix P ∈ Rm×m is determined such that
P^T C_2^T = \begin{pmatrix} 0 \\ S^T \end{pmatrix},   (3.1.39)

where the zero block has n rows and S is upper triangular. By the nonsingularity of B it follows that C2 will have linearly inde-
pendent rows, and hence S will be nonsingular. (Note that after rows and columns are reversed,
(3.1.39) is just the QR factorization of C2T .) Now, the second set of constraints in (3.1.38) be-
comes

S u_2 = c_2,   where   P^T v = u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix},   u_1 ∈ R^n,   u_2 ∈ R^{m−n}.   (3.1.40)
Since P is orthogonal, ∥v∥2 = ∥u∥2 . Hence the minimum in (3.1.36) is achieved by taking
u_1 = 0,   u_2 = S^{−1} c_2,   v = P_2 u_2,   where P = ( P_1  P_2 ). Then x is obtained from the first block of (3.1.38): R x = c_1 − C_1 v.
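A minimal MATLAB sketch of this procedure follows. For simplicity the QR factorization of C2' is used in its standard form rather than the row- and column-reversed form (3.1.39); B (and hence C2) is assumed to have full row rank, and all variable names are illustrative.

[m, n] = size(A);
[Q, RA] = qr(A);   R = RA(1:n, 1:n);
c = Q'*b;   c1 = c(1:n);    c2 = c(n+1:m);
C = Q'*B;   C1 = C(1:n,:);  C2 = C(n+1:m,:);
[P, S] = qr(C2');                    % C2' = P*S with S upper trapezoidal
S1 = S(1:m-n, 1:m-n);                % nonsingular upper triangular block
u1 = S1' \ c2;                       % enforce C2*v = c2
v  = P(:, 1:m-n)*u1;                 % minimum-norm v
x  = R \ (c1 - C1*v);                % first block row of (3.1.38)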
Paige’s algorithm requires a total of about 4m3 /3 + 2m2 n flops. If m ≫ n, the work in
the QR factorization of C2 dominates. Paige [858, 1979] gives a perturbation analysis for the
generalized least squares problem (3.1.6) by using the formulation (3.1.36). An error analysis
shows that the algorithm is stable. The algorithm can be generalized in a straightforward way to
rank-deficient A and B.
If B has been obtained from the Cholesky factorization of V , it is advantageous to carry out
the two QR factorizations in (3.1.37) and (3.1.39) together, maintaining the lower triangular form
throughout by a careful sequencing of the plane rotations. When there are several problems of
the form (3.1.36) with constant A but variable B, the QR factorization of A can be computed
once and for all. When m = n this reduces the work for solving an additional problem from
(10/3)n3 to 2n3 .
When computing the QR factorization of the product C = BA or quotient C = AB −1 =
QR the explicit formation of C should be avoided in order to obtain backward stable results. The
generalized QR factorization (GQR) of a pair of matrices A ∈ Rm×n and B ∈ Rm×p , intro-
duced by Hammarling [564, 1987], is useful for solving generalized equality constrained least
squares problems and in the preprocessing stage for computing the generalized SVD (GSVD).
When B is nonsingular, GQR implicitly computes
without forming C. The GQR is defined for any matrices A and B with the same number of
rows. In the general case the construction of the GQR proceeds in two steps. A rank-revealing
QR factorization of A is first computed,
Q^T A Π = \begin{pmatrix} U_{11} & U_{12} \\ 0 & 0 \end{pmatrix},   r = rank(A),   (3.1.43)
where Q is orthogonal, Π is a permutation matrix, and U11 ∈ Rr×r is upper triangular and
nonsingular. Then QT is applied to B:
Q^T B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix},   B_1 ∈ R^{r×p},   B_2 ∈ R^{(m−r)×p}.
where rank(B2 ) = q, t = r + q, R11 ∈ Rr×k1 and R22 ∈ R(n−q)×k2 are upper trapezoidal, and
rank(R11 ) = k1 , rank(R22 ) = k2 . If rank(B) = p, there will be no zero columns. Note that
row interchanges can be performed on the block B2 if Q is modified accordingly.
If B is square (p = m) and nonsingular, then so is R, and from (3.1.43)–(3.1.44) we have
Q̃^T (B^{−1} A) Π = \begin{pmatrix} R_{11}^{−1} U \\ 0 \end{pmatrix} = S,   U = ( U_{11}  U_{12} ),   (3.1.45)
which is the QR factorization of B −1 AΠ. Even in this case one should avoid computing S
because in most applications it is not needed, and it is usually more effective to use R11 and U
separately. Another advantage of keeping R11 and U is that the corresponding decompositions
(3.1.43)–(3.1.44) can be updated by the standard methods when columns or rows of A and B
are added or deleted. Even when S is defined by (3.1.45) it cannot generally be updated in a
stable way.
When B is singular or not square, the GQR can be defined as the QR factorization of B † A,
where B † denotes the pseudoinverse of B. However, as pointed out by Paige [861, 1990], this
does not produce the algebraically correct solution for many applications.
The product QR factorization (PQR) of A ∈ Rm×n and B ∈ Rm×p can be computed in a
similar manner. We use (3.1.43) as the first step and replace (3.1.44) by
Q^T B Q̃ = \begin{pmatrix} L_{11} & 0 & 0 \\ L_{21} & L_{22} & 0 \\ 0 & 0 & 0 \end{pmatrix},   Q̃ = ( Q_1  Q_2  Q_3 ),   (3.1.46)

with block rows of r, q, and m − t rows,
where L11 ∈ Rq×r1 , L22 ∈ R(n−q)×r2 and rank(L11 ) = r1 , rank(L22 ) = r2 . This gives the
PQR because

Q̃^T B^T A = \begin{pmatrix} L_{11}^T U \\ 0 \end{pmatrix} = \begin{pmatrix} L^T \\ 0 \end{pmatrix},   (3.1.47)
with LT ∈ Rr1 ×n upper trapezoidal. Again, one should avoid computing LT because it is not
needed in most applications, and more accurate methods are usually obtained if L11 and U are
kept separate. (A trivial example is the case when B = A.)
The GQR factorization is given by
A = QR, B = QT Z, (3.1.48)
where Q ∈ Rm×m and Z ∈ Rp×p are orthogonal, and R and T have one of the forms
R = \begin{pmatrix} R_{11} \\ 0 \end{pmatrix}   (m ≥ n),   R = ( R_{11}  R_{12} )   (m < n),

and

T = ( 0  T_{12} )   (m ≤ p),   T = \begin{pmatrix} T_{11} \\ T_{21} \end{pmatrix}   (m > p).
If B is square and nonsingular, GQR implicitly gives the QR factorization of B −1 A. There is a
similar generalized RQ factorization related to the QR factorization of AB −1 . These generalized
decompositions and their applications are discussed in Anderson, Bai, and Dongarra [25, 1992].
Theorem 3.1.5 (Generalized SVD). Let (A, B) with A ∈ R^{m×n} and B ∈ R^{p×n} be a given matrix pair with M = ( A^T  B^T )^T and rank(M) = k. Then there exist orthogonal matrices U ∈ R^{m×m}, V ∈ R^{p×p}, Q ∈ R^{n×n}, and W ∈ R^{k×k} such that
AQ = U ΣA ( Z 0), BQ = V ΣB ( Z 0), (3.1.50)
where the nonzero singular values of Z = W T R equal those of M , and R ∈ Rk×k is upper
triangular. Moreover,
Σ_A = \begin{pmatrix} I_A & & \\ & D_A & \\ & & O_A \end{pmatrix},   Σ_B = \begin{pmatrix} O_B & & \\ & D_B & \\ & & I_B \end{pmatrix}   (3.1.51)

are block diagonal matrices, where I_A has r rows, D_A and D_B have s rows, O_A has m − r − s rows, O_B has m − k − r rows, and I_B has k − r − s rows. Here
DA = diag (αr+1 , . . . , αr+s ), 1 > αr+1 ≥ · · · ≥ αr+s ,
DB = diag (βr+1 , . . . , βr+s ), 0 < βr+1 ≤ · · · ≤ βr+s , (3.1.52)
and αi2 + βi2 = 1, i = r + 1, . . . , r + s. Furthermore, IA and IB are square unit matrices, and
OA ∈ R(m−r−s)×(k−r−s) and OB ∈ R(m−k−r)×r are zero matrices with possibly no rows or
no columns.
Note that in (3.1.51) the column partitionings of ΣA and ΣB are the same. We can define
k nontrivial singular value pairs (αi , βi ) of (A, B), where αi = 1, βi = 0, i = 1, . . . , r, and
αi = 0, βi = 1, i = r + s + 1, . . . , k. Perturbation theory for generalized singular values by
Sun [1047, 1983] and Li [742, 1993] shows that as in the SVD, αi and βi are well-conditioned
with respect to perturbations of A and B.
The GSVD algorithm of Bai and Demmel [59, 1993] requires about 2mn2 + 15n3 flops. It
uses a preprocessing step for reducing A and B to upper triangular form and gives a new stable
and accurate 2 × 2 triangular GSVD algorithm. Another approach by Bai and Zha [63, 1993]
starts by extracting a regular pair (A, B), with A and B upper triangular and B nonsingular.
A satisfying aspect of the formulation of GSVD in Theorem 3.1.5 is that A and B are treated
identically, and no assumptions are made on the dimension and rank of A and B. For many
applications this generality is not needed, and the following simplified form similar to that in
Van Loan [1079, 1976] can be used.
Corollary 3.1.6. Let (A, B) with A ∈ R^{m×n} and B ∈ R^{p×n} be a given matrix pair with m ≥ n ≥ p and rank(M) = n, where M = ( A^T  B^T )^T. Then there exist orthogonal matrices U ∈ R^{m×m}, V ∈ R^{p×p} and a nonsingular matrix Z ∈ R^{n×n}, with singular values equal to those of M, such that

A = U \begin{pmatrix} D_A \\ 0 \end{pmatrix} Z,   B = V ( D_B  0 ) Z,   (3.1.53)
where DA = diag (α1 , . . . , αn ), DB = diag (β1 , . . . , βp ), αi2 + βi2 = 1, i = 1, . . . , p, and
0 < α1 ≤ · · · ≤ αn ≤ 1, 1 ≥ β1 ≥ · · · ≥ βp > 0.
The generalized singular values of (A, B) are the ratios σi = αi /βi . Setting W = Z −1 =
(w1 , . . . , wk ), we get from (3.1.53)
Awi = αi ui , i = 1, . . . , n, Bwi = βi vi , i = 1, . . . , p,
It follows that (A w_i)^T A w_j = 0, i ≠ j.
When B ∈ Rn×n is square and nonsingular the GSVD of A and B reduces to the SVD of AB −1 ,
also called the quotient SVD (QSVD). Similarly the SVD of AB is the product SVD. If B were
ill-conditioned, then forming AB −1 (or AB) would give unnecessarily large errors in the SVD,
so this approach should be avoided. Note also that when B is not square or is singular, the SVD
of AB † does not always correspond to the GSVD.
An algorithm for computing the QSVD of A ∈ Rm×n and B ∈ Rp×n was proposed by
Paige [854, 1986]. In the first phase, A and B are reduced to generalized triangular form by an
RRQR factorization of A with column pivoting P . Next, a QR factorization is performed on BP
in which column pivoting is used on the last p − r rows and n − r columns of B to reveal the
rank q of this block. In the second phase, two n × n upper triangular matrices are computed by
a Kogbetliantz-type algorithm; see Section 7.2.2. Such algorithms can be extended to compute
the GSVD for products and quotients of several matrices.
The generalized linear model where both A and V are allowed to be rank-deficient can be
analysed by the GSVD; see Paige [860, 1985]. We use the model (3.1.36) and assume that
V = BB T is given in factored form, where B ∈ Rm×p , p ≤ m. Since A and B have the same
number of rows, the GSVD is applied to AT and B T .
Let r = rank(A), s = rank(B), k = rank(( A B )), where r ≤ n, s ≤ p, k ≤ r + s. Then
there exist orthogonal matrices U ∈ Rn×n , V ∈ Rp×p and a matrix Z ∈ Rm×k of rank k such
that
A U = Z \begin{pmatrix} 0 & 0 & 0 \\ 0 & D_A & 0 \\ 0 & 0 & I_{k−s} \end{pmatrix},   B V = Z \begin{pmatrix} I_{k−r} & 0 & 0 \\ 0 & D_B & 0 \\ 0 & 0 & 0 \end{pmatrix},   (3.1.54)

where the row blocks of both middle factors have k − r, q, and k − s rows, the column blocks have n − r, q, and k − s columns for A U and k − r, q, and p − s columns for B V, q = r + s − k, and D_A^2 + D_B^2 = I_q.
Note that the row partitionings in (3.1.54) are the same. Let the orthogonal matrices U =
( U1 U2 U3 ) and V = ( V1 V2 V3 ) be partitioned conformally with the column blocks
on the right-hand sides in (3.1.54). Then AU1 = 0, BV3 = 0, i.e., U1 and V3 span the nullspace
of A and B, respectively. The decomposition (3.1.54) separates out the common column space
of A and B. Since AU2 = ZDA and BV2 = ZDB , we have AU2 DB = BV2 DA , and it follows
that
R(AU2 ) = R(BV2 ) = R(A) ∩ R(B)
and has dimension q. For the special case B = I we have s = k = m and then q = rank(A).
Now let the QR factorization of Z in (3.1.54) be
Q^T Z = \begin{pmatrix} R \\ 0 \end{pmatrix},   Q = ( Q_1  Q_2 ),   (3.1.56)
where R ∈ Rk×k is upper triangular and nonsingular. In the model (3.1.36) we make the orthog-
onal transformations
x̃ = U T x, ũ = V T u. (3.1.57)
where we have partitioned R and c = QT1 b conformally with the block rows of the two-block
diagonal matrices in (3.1.58).
We first note that x̃1 has no effect on b and therefore cannot be estimated. The decomposition
x = xn + xe with
xn = U1 x̃1 , xe = U2 x̃2 + U3 x̃3
splits x into a nonestimable part xn and an estimable part xe . Furthermore, x̃3 can be determined
exactly from R33 x̃3 = c3 . Note that x̃3 has dimension k − s = rank(( A B )) − rank(B), so
that this can only occur when rank(B) < m.
The second block row in (3.1.59) gives the linear model

D_A x̃_2 + D_B ũ_2 = R_{22}^{−1} (c_2 − R_{23} x̃_3),

where from (3.1.57) we have V(ũ_2) = σ^2 I. Here the right-hand side is known, and the best linear unbiased estimate of x̃_2 is

x̂_2 = D_A^{−1} R_{22}^{−1} (c_2 − R_{23} x̃_3).   (3.1.60)
Since the error satisfies D_A(x̂_2 − x̃_2) = D_B ũ_2, the error covariance is

V(x̂_2 − x̃_2) = σ^2 (D_A^{−1} D_B)^2,
and the components are uncorrelated. The random vector ũ3 has no effect on b. The dimension
of ũ3 is p − s = p − rank(B), and so is zero if B has independent columns. Finally, the vector
ũ1 can be obtained exactly from (3.1.59). Since ũ1 has zero mean and covariance matrix σ 2 I, it
can be used to estimate σ 2 . Note that ũ1 has dimension k − r = rank ( A B ) − rank(A).
A QR-like algorithm for computing the SVD of a product or quotient of two or more matrices
is given by Golub, Sølna, and Van Dooren [508, 2000]. Let
C = A_p^{s_p} · · · A_2^{s_2} A_1^{s_1},   s_i = ±1,
be a sequence of products or quotients of matrices Ai of compatible dimensions. To illustrate the
idea, we consider for simplicity the case when p = 2 and si = 1. Then orthogonal matrices Qi ,
i = 0, 1, 2, can be constructed such that
B = QT2 A2 Q1 QT1 A1 Q0
is bidiagonal. The SVD of B can then be found by standard methods. Typically, the bidiagonal
matrix will be graded and will allow small singular values to be computed with high relative
precision by the QR algorithm. The generalization to the product and/or quotient of an arbitrary
number of matrices is obvious.
Consider the weighted least squares (WLS) problem

min_x ∥W(Ax − b)∥_2,   W = diag(w_1, . . . , w_m),   (3.2.1)

where the weights w_i are chosen such that the weighted residuals r_i = w_i(b − Ax)_i have equal variance.
Note that the solution of (3.2.1) is scaling independent, i.e., it does not change if W is multiplied
by a nonzero scalar. Therefore, without restriction we can assume in the following that wi ≥ 1,
and that the rows of A are normalized so that max1≤j≤n |aij | = 1, i = 1, . . . , m.
The solution to the WLS problem (3.2.1) satisfies the normal equations
AT W 2 Ax = AT W 2 b. (3.2.2)
A more stable solution method is to use the weighted QR factorization W A = QR. The solution
to (3.2.1) is then obtained from
Rx = QT W b, (3.2.3)
and squaring of the weight matrix W is avoided.
For a consistent underdetermined system AT y = c, the unique solution of the weighted
least-norm problem
min_y ∥W y∥_2   subject to   A^T y = c   (3.2.4)

is given by

y = W Q R^{−T} c.   (3.2.6)
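In MATLAB the weighted QR approach (3.2.3) can be sketched as follows; W is a diagonal matrix of weights, and A and b are assumed given.

[Qw, Rw] = qr(W*A, 0);          % economy-size QR of the row-scaled matrix
x = Rw \ (Qw'*(W*b));           % solution of min || W*(A*x - b) ||_2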
In a linear model where some error components have much smaller variance than others, the
weight matrix W = diag (wi ) is ill-conditioned. Then κ(W A) can be large even when A is
well-conditioned. We call a weighted least squares problem stiff if
µ = max_{1≤i≤m} w_i / min_{1≤i≤m} w_i ≫ 1,   (3.2.7)
by analogy to the terminology used in differential equations. Stiff problems arise, e.g., in barrier
and interior methods for optimization, electrical networks, and certain classes of finite element
problems. In interior methods, W becomes very ill-conditioned when the iterate approaches the
boundary; see Wright [1132, 1995]. Stiff problems occur also when the method of weighting is
used to solve least squares problems with the linear equality constraints A1 x = b1 . Often the
interest is in a sequence of weighted problems where W varies and A is constant.
That a weighted least squares problem is stiff does not in general imply that the problem of
computing x from the data W , A, and b is ill-conditioned. For weighted least squares problems
the componentwise Bauer–Skeel condition number (see Section 1.3.4)
is more relevant. This often depends only weakly on W and when µ → ∞ can tend to a limit
value; see Section 3.2.3.
To illustrate the possible failure of the method of normal equations for the weighted least
squares problem (3.2.2), we consider a problem where only the first p < n equations are
weighted by wi = w ≫ 1, i = 1, . . . , p:
min_x \left\| \begin{pmatrix} w A_1 \\ A_2 \end{pmatrix} x − \begin{pmatrix} w b_1 \\ b_2 \end{pmatrix} \right\|_2^2,   A_1 ∈ R^{p×n}.   (3.2.8)
When w ≫ 1, the normal equations matrix C = w^2 A_1^T A_1 + A_2^T A_2 and right-hand side d = w^2 A_1^T b_1 + A_2^T b_2 will be dominated by their first terms. If w > u^{−1/2}, then all information
contained in A2 and b2 will be lost. But since p < n, the solution depends critically on this data.
(The matrix in the Läuchli least squares problem in Example 2.1.1 is of this type.) Hence, the
method of normal equations is generally not well behaved.
The Peters–Wilkinson method (see Section 2.4.1) can be used to solve weighted least squares
problems even when W is severely ill-conditioned. Assume that the rows of A and b are pre-
ordered by decreasing weights, ∞ > w1 ≥ · · · ≥ wm > 0. Compute an LDU factorization with
complete pivoting of A,
Π1 AΠ2 = LDU,
where L ∈ Rm×n is unit lower trapezoidal, |lij | ≤ 1, and U ∈ Rn×n is upper unit triangular. In
the transformed problem
L and U are usually well-conditioned, and the weight matrix W is reflected only in D. The
transformed problem can often be solved safely by the method of normal equations LT Ly = LT b̃
and back-substitution DU x = Dy.
Consider the weighted least squares problem with

W A = \begin{pmatrix} w & w & w \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},   W b = \begin{pmatrix} w \\ 0 \\ 0 \\ 0 \end{pmatrix}

of Läuchli. For w > u^{−1/2} the normal equations become singular. The Peters–Wilkinson
method computes the factorization W A = LDU ,
L = \begin{pmatrix} 1 & & \\ w^{−1} & 1 & \\ & −1 & 1 \\ & & −1 \end{pmatrix},   U = \begin{pmatrix} 1 & 1 & 1 \\ & −1 & −1 \\ & & −1 \end{pmatrix},
where D = diag (w, 1, 1) and L and U are well-conditioned. With y = DU x the problem
becomes miny ∥Ly − W b∥2 . The solution can be accurately computed from
LT Ly = LT W b, DU x = y.
The weight w only appears in the diagonal scaling of y in the last step. Alternatively, L can be
transformed into lower triangular form by Householder reflections; see A. K. Cline [253, 1973].
As a further example, consider the problem with

A = \begin{pmatrix} 0 & 2 & 1 \\ w & w & 0 \\ w & 0 & w \\ 0 & 1 & 1 \end{pmatrix},   b = \begin{pmatrix} 3 \\ 2w \\ 2w \\ 2 \end{pmatrix},   w ≫ 1,
with exact solution x = (1, 1, 1)T . The first step of Householder QR factorization produces the
reduced matrix

\begin{pmatrix} −w\sqrt{2} & −w/\sqrt{2} & −w/\sqrt{2} \\ 0 & w/2 − \sqrt{2} & −w/2 − 1/\sqrt{2} \\ 0 & −w/2 − \sqrt{2} & w/2 − 1/\sqrt{2} \\ 0 & 1 & 1 \end{pmatrix}.
If w > 2\sqrt{2}\, u^{−1}, then in floating-point arithmetic, the terms −\sqrt{2} and −1/\sqrt{2} in the second and third rows of Ã^{(2)} are lost. But this means that all information present in the first row of A is lost.
This is disastrous, because the number of rows in A containing large elements is less than the
number of components in x. Hence there is a substantial dependence of the solution x on both
the first and fourth rows of A. Still, this is better than the method of normal equations, which
fails when w > u−1/2 for this problem. Van Loan [1081, 1985] gives further examples where
Householder QR without row interchanges gives poor accuracy for stiff least squares problems.
Note that the insensitivity of fixed-precision IR to poor row scaling of A can make it possible to
relax the need to sort rows of large norm to the top.
For the first example above, failure can easily be avoided by interchanging the first two rows
of A before performing the QR factorization. More generally, rowwise stability can be achieved
by using Householder QR with complete pivoting. By this we mean that in each step, first
a pivot column of maximum norm is selected and then an element of largest absolute value is
permuted into the pivot position. (Note the importance of interchanging columns before rows.)
With complete pivoting, the backward errors in QR factorization can be bounded rowwise in
terms of the element growth in each row. If Â^{(k)} = (â_{ij}^{(k)}) is the computed matrix after k steps, then

ω_i = max_{j,k} |â_{ij}^{(k)}| ≤ (1 + \sqrt{2})^{n−1} max_j |a_{ij}|,   i = 1, . . . , m.   (3.2.10)
This upper bound can nearly be attained, although in practice, usually ωi ≈ 1. The following
rowwise stability result is due to Powell and Reid, but a more accessible derivation is given by
Cox and Higham [273, 1998].
Theorem 3.2.2. Let R̂ ∈ Rm×n denote the computed upper triangular matrix in the Householder
QR factorization of A ∈ Rm×n with complete pivoting. Let Π be the permutation matrix that
describes the column permutations. Then there exists an exactly orthogonal matrix Q ∈ Rm×m
such that

(A + ∆A)Π = Q \begin{pmatrix} R̂ \\ 0 \end{pmatrix},   |∆a_i^T| ≤ γ̃_m (1, . . . , n^2) ω_i,   i = 1, . . . , m,   (3.2.11)

where γ̃_m is defined as in (1.4.9).
Householder QR factorization with complete pivoting is expensive and not available in stan-
dard software. Björck [135, 1996, p. 169] conjectured that if the rows of A are presorted by
decreasing row ∞-norm, i.e., so that d1 ≥ d2 ≥ · · · ≥ dm , where di = max1≤j≤n |wi aij |, then
the rowwise backward error bound holds for Householder QR with standard column pivoting.
This conjecture was later proved by Cox [272, 1997]. Then standard software can be used for
stably solving strongly weighted least squares problems.
In contrast to the Householder QR method, Gram–Schmidt QR factorization is numerically
invariant under row interchanges, except for second-order effects derived from different summa-
tion orders in inner products. However, numerical results for highly stiff problems show a loss
of accuracy also for MGS.
Example 3.2.3. The stability of algorithms using QR factorization for stiff problems can be
enhanced by iterative refinement. As test problems we take
A = V D ∈ R21×6 , vij = (i − 1)j−1 ,
with D chosen so that the columns of A have unit 2-norm. The right-hand side is taken to be
b = Ax + θh, x = D−1 (105 , 104 , . . . , 1),
where AT h = 0 and h is normalized so that κ2 (A)∥h∥2 = 1.5∥A∥2 ∥x∥2 . Problems with
−1
widely different row norms are obtained by taking Aw = Dw A, bw = Dw b, hw = Dw h,
and Dw = diag (wi ), where wi = w for i = 1, 11, 21 and wi = 1 otherwise. The tests were
run on a UNIVAC 1108 with single precision equal to 2−26 = 10−7.83 and double precision
2−62 = 10−18.66 . Mixed-precision iterative refinement was carried out with three different QR
factorizations: Modified Gram–Schmidt (MGS), Householder QR (HQR), and Householder QR
with the weighted rows permuted to the top (HQRP). Table 3.2.1 shows the initial and final
average numbers of correct significant decimal digits in the solution for w = 1, 27 , 214 . The
numbers in parentheses indicate refinements carried out.
Table 3.2.1. Average number of correct significant decimal digits in the solution before and after
iterative refinement with various QR factorizations. The number of refinement steps is shown in parentheses.
Anda and Park [22, 1996] apply their fast self-scaling plane rotations to the QR factorization
of stiff least squares problems. Their results show that regardless of row sorting, these produce
accurate results even for extremely stiff least squares problems. No significant difference in ac-
curacy is observed between different rotation orderings. This makes self-scaling plane rotations
a method of choice for solving stiff problems.
where W = diag (w1 , . . . , wm ) is positive definite. Stewart [1022, 1984] showed that there is a
finite number χA not depending on W such that the weighted least squares solution xW satisfies
This implies a bound for the perturbation of the solution that results from a perturbation of b.
However, due to the combinatorial aspect of the bound χA , it cannot be computed in polynomial
time.
Ben-Tal and Teboulle [103, 1990] show, using a formula derived from Cramer’s rule and the
Binet–Cauchy formula, that xW is a convex combination of the basic solutions to nonsingular
square subsystems of the original overdetermined system.
Theorem 3.2.4. Let F be the set of all subsets I = {i1 , i2 , . . . , in } from {1, 2, . . . , m}. Denote
by AI and bI the submatrix of A and b whose rows correspond to I. Furthermore, let
F + = {I ∈ F | rank(AI ) = n},
This result is generalized to diagonally dominant symmetric and positive semidefinite weight
matrices by Forsgren [420, 1996]. He also shows that it does not hold for general symmetric
definite weight matrices.
3.2. Weighted Least Squares 133
is the signature matrix. Note that J −1 = J. This problem arises in downdating, total least
squares problems, and H ∞ -smoothing; see Chandrasekaran, Gu, and Sayed [232, 1998]. A
necessary condition for x to solve (3.2.13) is that the gradient be zero:
Hence the residual vector r = b−Ax is J-orthogonal to R(A). Equivalently, x solves the normal
equations ATJAx = AT Jb. The indefinite least squares problem has a unique solution if and
only if AT JA is positive definite. This implies that m1 ≥ n and that A1 (and A) has full column
rank; see Bojanczyk, Higham, and Patel [166, 2003]. In the following we assume that AT JA is
positive definite.
In the method of normal equations for problem ILS the Cholesky factorization of AT JA =
R R is computed, and the solution is obtained by solving two triangular systems RT (Rx) = c,
T
where c = AT Jb.
If accuracy is important, an algorithm based on QR factorization is to be preferred. A back-
ward stable algorithm was given by Chandrasekaran, Gu, and Sayed [232, 1998]. The first step
is to compute the compact QR factorization
A1 Q1
A= = R = QR, Q1 ∈ Rm1 ×n , Q2 ∈ Rm2 ×n ,
A2 Q2
where QT Q = QT1 Q1 + QT2 Q2 = In . Substituting this into (3.2.15) gives the equation
Note that the orthogonality of Q is not needed for this to hold. By computing the Cholesky
factorization
QT1 Q1 − QT2 Q2 = LLT ,
where L ∈ Rn×n is lower triangular, this linear system becomes
LLT Rx = QT Jb.
The total cost for this QR-Cholesky algorithm is about (5m − n)n2 flops. Although QT1 Q1 −
QT2 Q2 can be very ill-conditioned, this method can be shown to be backward stable.
134 Chapter 3. Generalized and Constrained Least Squares
Bojanczyk, Higham, and Patel [166, 2003] give a perturbation analysis of the ILS problem
and develop an alternative ILS algorithm based on so-called hyperbolic QR factorization. For
the perturbation analysis the normal equations can be rewritten in symmetric augmented form as
J A s b
= , (3.2.17)
AT 0 x 0
where s = Jr = J(b − Ax). The inverse of the augmented matrix is (cf. (3.1.15))
−1
J(I − P ) JAS −1
J A
= , (3.2.18)
AT 0 S −1 AT J −S −1
They are so named because |c| = cosh θ and s = sinh θ for some θ. It is easily verified that
Like plane rotations, hyperbolic rotations can be used to zero a selected component in a vector,
x1 σ
H = ,
x2 0
3.2. Weighted Least Squares 135
which requires cx2 = sx1 . Provided |x1 | > |x2 |, the solution is
q q
s = x2 / x21 − x22 , c = x1 / x21 − x22 . (3.2.21)
The elements of a hyperbolic rotation H are unbounded and therefore must be used with care. As
shown by Chambers [217, 1971] direct computation of y = Hx is not stable. Instead, a mixed
form that combines a hyperbolic rotation with a plane rotation should be used. This is based on
the equivalence
x1 y1 y1 x1 c −s
H = ⇔ G = , G= .
x2 y2 x2 y2 s c
First, y1 = Hx1 is computed using the hyperbolic rotation, and then y2 = Gx2 is computed
from a plane rotation, i.e.,
taking A1 = A and A2 = wIn in the above reduction. Since the hyperbolic rotations will cause
the rows in A2 to fill in, the reduction is almost as expensive as when A2 is a full upper triangular
matrix. There are two different ways to perform the reduction: process In either top down, which
will use more memory but allows the use of Householder reflectors, or bottom up, which avoids
fill-in but uses more hyperbolic rotations.
Bojanczyk, Higham, and Patel [165, 2003] give a similar algorithm for the equality con-
strained ILS problem
min(b − Ax)T J(b − Ax) subject to Bx = d, (3.2.23)
x
where A ∈ R(p+q)×n , B ∈ Rs×n , and J is the signature matrix. It is assumed that rank(B) = s
and that xT (AT JA)x > 0 for all x > 0 in N (B). These conditions imply that p ≥ n − s. The
solution to problem (3.2.23) satisfies
0 0 B λ d
0 J A s = b, (3.2.24)
B T AT 0 x 0
where s = Jr = J(b − Ax).
J-orthogonal matrices can be constructed from orthogonal matrices as follows. Consider the
partitioned linear system
Q11 Q12 x1 y1
Qx = = , (3.2.25)
Q21 Q22 x2 y2
where Q is orthogonal. Solving the first equation for x1 and substituting in the second equation
will exchange x1 and y1 . This can be written as
x1 y1
= exc (Q) ,
y2 x2
where
Q−1 −Q−1
exc (Q) = 11 11 Q12 (3.2.26)
Q21 Q−1
11 Q22 − Q21 Q−1
11 Q12
is the exchange operator. The (2, 2) block in (3.2.26) is the Schur complement of Q11 in
Q. An early reference to the exchange operator is in network analysis; see the survey by Tsat-
someros [1069, 2000].
These formulas can easily be verified by forming the products L−1 L and U −1 U and using the
rule for multiplying partitioned matrices.
Let M in (3.3.1) have a nonsingular block A. Then from M −1 = (LU )−1 = U −1 L−1 and
(3.3.5) the Schur–Banachiewicz formula follows:
−1
A + A−1 BS −1 CA−1 −A−1 BS −1
−1
M = , (3.3.6)
−S −1 CA−1 S −1
138 Chapter 3. Generalized and Constrained Least Squares
I BD−1
T 0
M= , T = A − BD−1 C, (3.3.7)
0 I C D
T −1 −T −1 BD−1
M −1 = . (3.3.8)
−D−1 CT −1 D−1 + D−1 CT −1 BD−1
Woodbury [1131, 1950]5 gave a formula for the inverse of a square nonsingular matrix A
after it has been modified by a matrix of rank p. This is very useful when p ≪ n.
Theorem 3.3.1 (The Woodbury Formula). Let A ∈ Rn×n and D = Ip , p < n. If A and
S = D − CA−1 B ∈ Rp×p are nonsingular, then
Proof. Equate the (1, 1) blocks in the two expressions (3.3.6) and (3.3.8) for the inverse
M −1 .
which is sometimes called the matrix inversion lemma. Higham [626, 2017] shows that this
formula follows directly from the associative law for matrix multiplications.
Let Ax = b be a given system with known solution x = A−1 b, and let x̂ satisfy the system
where A has been modified by a matrix of rank p. Then the Woodbury formula gives
which is also known as the Sherman–Morrison formula. Note that if σ −1 = v T A−1 u, then
modified matrix A − σuv T is singular. Otherwise, the solution x̂ of the modified linear system
5 The same formula appeared in several papers before Woodbury.
3.3. Modified Least Squares Problems 139
vT x
x̂ = x + w, w = A−1 u. (3.3.14)
σ −1− vT w
The evaluation of this expression only requires the solution of two linear systems Ax = b and
Aw = u. If the LU factorization of A is known, the arithmetic cost is only about 2n2 .
The following related result was first proved by Egerváry [361, 1960] and is included as
Example 1.34 in the seminal book by Householder [645, 1975]. The sufficient part appeared
earlier in Wedderburn [1105, 1964].
rank(B) = rank(A) − 1.
Note that the Woodbury and Sherman–Morrison formulas do not always lead to numerically
stable algorithms, and therefore they should be used with some caution. Stability is a problem
whenever the unmodified problem is conditioned in a worse way than the modified problem.
The history of the Woodbury and similar updating formulas and their applications is surveyed by
Henderson and Searle [601, 1981]. Chu, Funderlic, and Golub [248, 1995] explore extensions
of the Wedderburn rank reduction formula that lead to various matrix factorizations. Explicit
expressions for the pseudoinverse (A + uv H )† are given by Meyer [793, 1973]. Depending
on which of the three conditions u ∈ R(A), v ∈ R(AH ), and 1 + v H A† u ̸= 0 are satisfied,
there are no less than six different cases. Generalizations of Meyer’s results to perturbations of
higher rank are not known. Some results for the pseudoinverse of sums of matrices are given by
R. E. Cline [257, 1965]. For rectangular or square singular matrices, no formulas similar to the
Woodbury formula (3.3.9) with A−1 replaced by the pseudoinverse A† seem to exist.
of A ∈ Rm×n , m ≥ n. Note that only R and Q1 are uniquely determined. Because no efficient
way to update a product of Householder reflections is known, we assume that Q ∈ Rm×m is
explicitly stored. Primarily, we consider algorithms for modifying a QR factorization when A
is subject to a low-rank change. The important special cases of adding or deleting a column or
a row of A are considered separately. Such updating algorithms require O(m2 ) multiplications.
140 Chapter 3. Generalized and Constrained Least Squares
One application where such algorithms are needed is stepwise regression. This is a greedy
technique for selecting a suitable subset of variables in a linear regression model
Ax ≈ b, A = (a1 , a2 , . . . , an ) ∈ Rm×n .
The regression model is built sequentially by adding or deleting one variable at a time. Initially,
set x(0) = 0 and r(0) = b. Assume that at the current step, k variables have entered the regres-
sion, and the current residual is r(k) = b − Ax(k) . In the next step, the column ap to add is
chosen so that the residual norm is maximally decreased or, equivalently, the column that makes
the smallest acute angle with the residual r(k) . Hence,
aTj r(k)
cos(aj , r(k) ) =
∥aj ∥2 ∥rk) ∥2
is maximized for j = p over all variables not yet in the model. After a new variable has entered
the regression, it may be that the contribution of some other variable included in the regression
is no longer significant. This variable is then deleted from the regression model using a similar
technique.
Efroymson [360, 1960] gave an algorithm for stepwise regression based on Gauss–Jordan
elimination on the normal equations. This is sensitive to perturbations and not numerically stable.
Miller [794, 2002] surveys subset selection in regression and emphasizes the computational and
conceptual advantages of using methods based on QR factorization rather than normal equations.
Eldén [366, 1972] describes a backward stable method that uses Householder QR factorization
that has the drawback of needing storage for a square matrix Q ∈ Rm×m . A stable implementa-
tion of stepwise regression based on MGS with reorthogonalization is given by Gragg, LeVeque,
and Trangenstein [522, 1979] and essentially uses only the storage needed for A and b.
Methods for updating least squares solutions are closely related to methods for modifying
matrix factorizations. If A ∈ Rm×n has full column rank, the least squares solution is obtained
from the extended QR factorization
R z
(A b) = Q , Q ∈ Rm×m , (3.3.16)
0 ρe1
where the right-hand side b has been appended to A as a last column. The solution is then
obtained by solving the triangular system Rx = z, and the residual norm ∥Ax − b∥2 equals ρ.
The upper triangular matrix R in (3.3.16) is the Cholesky factor of
AT ATA AT b
(A b) = .
bT bTA bT b
Appending a Row
Without loss of generality, we assume that a row v T is appended to A ∈ Rm×n after the last row.
Then we have
R
A Q 0 T
= v .
vT 0 1
0
Hence the problem is equivalent to appending v T as the (n + 1)th row to R. Now, plane rotations
Gk,n+1 , k = 1 : n, are determined to annihilate v T , giving
R R
e
G = , G = Gn,n+1 · · · G1,n+1 .
vT 0
and can be computed in 6mn flops. Note that R can be updated without Q being available. From
the interlacing property of the singular values it follows that the updating does not decrease the
singular values of R. By the general rounding error analysis of plane rotations and Householder
transformations, this updating algorithm is backward stable.
Determine H as a Householder reflection such that Hw2 = βe1 . Next, let Gk,n+1 , k =
n, n − 1, . . . , 1, be a sequence of plane transformations that zeros the elements in w1 from the
bottom up and creates a nonzero row below the matrix R. Taking P1 as the combination of these
transformations, we have
R
R w1
P1 + v T = z T + βv T , β = ±∥w2 ∥2 . (3.3.19)
0 w2
0
flops. This gives a total of 4m2 + 8nm + 4n2 flops. The algorithm can be shown to be mixed
backward stable.
Remark 3.3.1. In another version of this algorithm the matrix R was modified into a Hessenberg
matrix by using a sequence of rotations Gk,k+1 , k = n, n − 1, . . . , 1, in step 3. The version given
here is easier to implement since the modified row can be held in a vector. This becomes even
more important for large sparse problems.
Deleting a Column
is trivial. The QR factorization of A1 is obtained simply by deleting the trailing column from the
decomposition. Suppose now that we want to compute the QR factorization
A
e = (a1 , . . . , ak−1 , ak+1 , . . . , an ),
where the kth column of A is deleted, k < n. From the above observation it follows that this
decomposition can readily be obtained from the QR factorization of the matrix
where PL is a permutation matrix that performs a left circular shift of the columns ak , . . . , an .
The matrix RPL is upper Hessenberg, but the matrix PLT RPL is upper triangular, except in its
last row. For example, if k = 3, n = 6, then it has the structure
× × × × × ×
0 × × × × ×
T
0 0 × × × 0
PL RPL = .
0 0 0 × × 0
0 0 0 0 × 0
0 0 × × × ×
The task has now been reduced to constructing plane rotations Gi,n , i = k : n − 1, that zero out
the off-diagonal elements in the last row. Only the trailing principal submatrix of order n − k + 1
in PLT RPL , which has the form
R22 0
,
v T rnn
participates in this transformation. After the last column is deleted , the remaining update of R22
is precisely the same as already described for adding a row. The updated Q factor is
e = QPL GT · · · GT
Q k,n n−1,n .
By an obvious extension of the above algorithm, we obtain the QR factorization of the matrix
resulting from a left circular shift applied to a set of columns (a1 , . . . , ak−1 , ak+1 , . . . , ap , ak ,
ap+1 , . . . , an ).
3.3. Modified Least Squares Problems 143
Inserting a Column
Assume that the QR factorization
R
A = (a1 , . . . , ak−1 , ak+1 , . . . , an ) = Q , k ̸= n,
0
where γ = ∥v∥2 ̸= 0. Let Hn be a Householder reflector such that HnT v = γe1 . Then we have
the QR factorization
e R , Q e = Q In 0 e= R u .
e
( A ak ) = Q , R
0 0 Hn 0 γ
Let PR be the permutation matrix that performs a right circular shift on the columns ak+1 , . . . ,
an , ak , so that
R 11 u1 R 12
e RPR , RP
e
A e = ( A a k ) PR = Q e R= 0 u2 R22 ,
0
0 γ 0
where R11 ∈ R(k−1)×(k−1) and R22 ∈ R(n−k)×(n−k) are upper triangular. For example, for
k = 4, n = 6, we have
× × × × × ×
0 × × × × ×
e R= 0 0 × × × ×
RP 0 0 0 ×
.
× ×
0 0 0 × 0 ×
0 0 0 × 0 0
Now determine plane rotations Gi−1,i , i = n : −1 : k, to zero the last n − k elements in the kth
column of RP
e R . Then
u2 R22
Gk−1,k · · · Gn−1,n =R e22
γ 0
is upper triangular, and the updated factors are
R11 R e12
e Tn−1,n · · · Gk−1,k ,
R= e22 , Q = QG (3.3.21)
0 R
where R
e12 = ( u1 R12 ).
The above method easily generalizes to computing QR factors of
i.e., of the matrix resulting from a right circular shift of the columns ak , . . . , ap . Note that when
a column is deleted, the new R-factor can be computed without Q being available. However,
when a column is added, it is essential that Q be known.
The algorithms given for appending and deleting a column correspond to the MATLAB func-
tions qrinsert(Q,R,k,ak) and qrdelete(Q,R,k).
144 Chapter 3. Generalized and Constrained Least Squares
Deleting a Row
Suppose we are given the QR factorization of A ∈ Rm×n ,
T
a1 R
A= = Q , (3.3.22)
Ae 0
where R e ∈ Rn×n is upper triangular, and the row vector v T = aT has been generated. To find
1
the downdated factor Q̃ we need not consider the transformation H, because it does not affect
its first n columns. The matrix QGT is orthogonal, and by (3.3.23) its first row must equal eT1 .
Therefore it must have the form
T 1 0
QG =
0 Qe
with Q
e orthogonal. It then follows that
vT 1
aT1
1 1 0
= Re 0,
Ae 0 0 Q e
0 0
Re
which shows that a1 = v. The desired factorization A = Q
e e is now obtained by deleting
0
the first row and last column on both sides of the equation. Note the essential role played by the
first row of Q in this algorithm.
In downdating, the singular values of R can decrease, and R
e can become singular. Paige [859,
1980] has proved that the above downdating algorithm is mixed stable, i.e., the computed R e is
close to the corresponding exact factor of a nearby matrix A e + E, where ∥E∥ < cu.
where Q11 ∈ Rk×n and Q12 ∈ Rk×(m−n) . We want to find the QR factors of A2 , where the first
block of k rows A1 ∈ Rk×n is deleted. From QQT = Im it follows that
A1 Ik R QT11
=Q . (3.3.26)
A2 0 0 QT12
e ∈ Rn×n is upper triangular, and V has been generated. To find the downdated factor Q̃
where R
we need not consider the transformation H, which does not affect its first n columns. The matrix
QGT is orthogonal, and from (3.3.27) its k first rows must equal ( Ik 0 ). It follows that
T Ik 0
QG = e ,
0 Q
where Q
e is orthogonal, and
V Ik
A1 Ik Ik 0
= R
e 0 .
A2 0 0 Q
e
0 0
This shows that V = A1 . Equating the last block of rows, we obtain the desired downdated
factorization
R
e
A2 = Q e .
0
eT R
R e = RTR − zz T .
146 Chapter 3. Generalized and Constrained Least Squares
The Cholesky factor R is mathematically the same as R in the QR factorization. Any downdating
algorithm that uses only R and not Q or the original data A relies on less information and cannot
be expected to give full accuracy.
In the LINPACK algorithm due to Saunders [963, 1972], one seeks to recover the necessary
information in Q using the original data from A. The first row of the QR factorization can be
written
T T R
e1 A = q , q T = eT1 Q = ( q1T q2T ) , (3.3.29)
0
giving z T = q1T R. Thus q1 ∈ Rn can be found from A by forward substitution in the lower
triangular system RT q1 = AT e1 = z. Furthermore, ∥q∥22 = ∥q1 ∥22 + ∥q2 ∥22 = 1, and hence
γ = ∥q2 ∥2 = (1 − ∥q1 ∥22 )1/2 . (3.3.30)
This allows the downdated factor R e to be computed as described previously in Section 3.3.2 by
a sequence of plane rotations Gk,n+1 , k = n, n − 1, . . . , 1, constructed so that
q1
G1,n+1 · · · Gn,n+1 = αen+1 , α = 1,
γ
and
R Re
G1,n+1 · · · Gn,n+1 = .
0 vT
Then, as in (3.3.24), R
e is the downdated factor, and v = z. As described, the LINPACK al-
gorithm requires about 3n2 flops. By interleaving the two phases, Pan [871, 1990] gives an
implementation that uses 40% fewer multiplications.
We can write
RTR − zz T = RT (In − q1 q1T )R = R̃T R̃. (3.3.31)
If we put I − q1 q1T = LLT , then R̃ = LT R. The matrix I − q1 q1T has n − 1 eigenvalues equal
to 1 and one equal to γ 2 = 1 − ∥q1 ∥22 ≤ 1. Hence, σn (L) = γ. If γ is small, there will be
severe cancellation in the computation of 1 − ∥q1 ∥22 . When γ ≈ u1/2 the LINPACK algorithm
can break down. In Saunders [963, 1972] the downdate was used only on a square matrix A.
Then we know that γ = 0, and there is no danger of breakdown. However, Example 3.3.3 below
shows the danger of not having Q.
Downdating the Cholesky factor R is an inherently less stable operation than downdating
both Q and R in the QR factorization. The best we can expect is that the computed downdated
factor R
e is the exact Cholesky factor of
(R + E)T (R + E) − (z + f )(z + f )T ,
where ∥E∥2 and ∥f ∥2 are modest constants times machine precision.
Pan [872, 1993] gives a first-order perturbation analysis that shows that the normwise relative
sensitivity of the Cholesky downdating problem can be measured by
ξ(R, z) = κ(R) + κ2 (R)(1 − γ 2 )/γ 2 , (3.3.32)
where γ is defined as in (3.3.30). Hence an ill-conditioned downdating problem is signaled by a
small value of γ, but the condition number of R also plays a role. Sun [1049, 1995] derives two
different condition numbers,
κ(R, z) = κ(R)/γ 3 , e 2.
c(R, z) = κ(R)/γ (3.3.33)
Numerical tests show that in most cases, c(R, z) is the smallest. Note that the suggested condition
numbers can be cheaply estimated using a standard condition estimator.
3.3. Modified Least Squares Problems 147
Example 3.3.3. Consider the least squares problem min ∥Ax − b∥2 , where
√
τ 1
A= , b= , τ = 1/ u,
1 1
and u is the unit roundoff. We may think of the first row of A as an outlier. The QR factorization
of A, correctly rounded to single precision, is
τ 1 −ϵ τ
A= = ,
1 ϵ 1 0
An alternative downdating method that uses both R and A but not Q is given by Björck,
Eldén, and Park [151, 1994]. Let v be the solution of minv ∥Av − e1 ∥2 . Then the R-factor of
( A e1 ) is
R q1
, q1 = Rv. (3.3.34)
0 γ
The connected seminormal equations (CSNE) downdating algorithm first computes v from the
so-called seminormal equations (SNE) RTRv = AT e1 . A corrected solution v + δv is then
determined from
r = Av, RTRδv = r, v := v + δv, (3.3.35)
giving
q1 = Rv, γ = ∥Av − e1 ∥2 .
The update of R then proceeds as in the LINPACK algorithm. A similar procedure can be used
to downdate the augmented R-factor (3.3.16) by solving the least squares problem
x
min ( A b ) − e1
x,ϕ ϕ 2
using the CSNE method. This leads to an accurate downdating algorithm for least squares prob-
lems. However, the modifications are not trivial, partly because the condition number of the
augmented R-factor is large when ρ is small. An error analysis of the CSNE method is given in
Section 2.5.4.
The CSNE downdating algorithm requires three more triangular solves than the LINPACK
algorithm and an additional four matrix-vector products. Thus, a hybrid algorithm is preferable
in which the CSNE algorithm is used only when the downdating problem is ill-conditioned.
It is often required to find the downdated Cholesky factor after a modification of rank k > 1.
This can be performed as a sequence of k rank-one modifications. However, block methods using
matrix-matrix and matrix-vector operations can execute more efficiently. Let R in AT A = RT R
and Z ∈ Rk×n be given. The Cholesky block downdating problem seeks R e such that
eT R
R e = ATA − Z T Z.
The LINPACK algorithm can be generalized as follows. Suppose the first k rows Z in A are to
be deleted. We have
Z R Q11 Q12 R
A= e =Q = , (3.3.36)
A 0 Q21 Q22 0
148 Chapter 3. Generalized and Constrained Least Squares
where Q ∈ Rm×m has been partitioned so that Q11 ∈ Rk×n . It follows that
R
Z = ( Ik 0 ) A = ( Ik 0 ) Q = Q11 R. (3.3.37)
0
Hence Q11 ∈ Rk×n can be determined as in LINPACK by solving the triangular matrix equation
RT QT11 = Z. Furthermore, using the orthogonality of Q, we have
T
Q11
Ik = ( Q11 Q12 ) ,
QT12
which shows that Q12 QT12 = Ik − Q11 QT11 . Hence we can take Q12 = ( L 0 ) ∈ Rk×(m−n) ,
where L is the lower triangular Cholesky factor of Ik − Q11 QT11 . The downdating can then
proceed as in block downdating of the QR factorization described in Section 3.3.2.
Algorithms for block downdating the Cholesky factorization using hyperbolic transforma-
tions are given by Bojanczyk and Steinhardt [168, 1991] and Liu [756, 2011]. They proceed in
n steps to compute
Z 0
Pn · · · P1 = e ,
R R
where each transformation Pi consists of a Householder reflection followed by a hyperbolic
rotation; see Section 3.2.4. In step i, i = 1, . . . , n, a Householder reflection Hi is used to zero
all elements in the ith column of Z, except zk,i which then is zeroed by a hyperbolic rotation
Gk,k+i acting on rows k, k + i. If the problem is positive definite, the process will not break
down. The first two steps in the reduction are shown below for the case n = 4, k = 3.
⊗ × × × × × × ⊗ × ×
⊗ × × ×
× × ×
⊗ × ×
× × × × ⊗ × × × ⊗ × ×
H1 ⇒ G3,4
⇒ G3,5 H2
× × × ×
×
× × ×
×
× × ×
× × ×
× × ×
× × ×
× × × × × ×
× × ×
(ATA + B TB)e
x = AT b + B T c.
3.3. Modified Least Squares Problems 149
Adding B TBx to both sides of the original normal equations ATAx = AT b and subtracting gives
(ATA + B TB)(e
x − x) = B T rp , rp = c − Bx, (3.3.38)
where rp is the predicted residual for the added equations. Hence the updated solution becomes
x e T rp ,
e = x + CB e = (ATA + B TB)−1 ,
C (3.3.39)
where Ce is the updated covariance matrix. From the Woodbury formula (3.3.9) we obtain the
expression
e = C − U (Ip + BU )−1 U T , U = CB T .
C (3.3.40)
In particular, adding a single equation v T x = γ gives ρ = γ − v T x, u = Cv, and
e = C − uuT /(1 + v T u),
C x
e = x + ρCv,
e
storage and operation counts for a rank-one modification are reduced from O(m2 ) to O(mn).
Such algorithms for adding and deleting rows and columns in the compact QR factorization are
developed by Daniel et al. [285, 1976]. Their algorithms use Gram–Schmidt QR with reorthog-
onalization. Reichel and Gragg [917, 1990] give optimized Fortran subroutines implementing
similar methods.
Adding a column in the last position in the QR factorization is straightforward and equal to
an intermediate step in a columnwise Gram–Schmidt. Similarly, deleting the last column of A in
the factorization A = Q1 R is trivial. Inserting or deleting a column in another position requires
computing QR factors of a permuted triangular matrix. This can be done by a series of plane
rotations as described for updating the full QR factorization; see Section 3.3.2. Adding a row in
the QR factorization can also be performed similarly by a series of plane rotations.
We now describe an algorithm for a general rank-one update. Given A = Q1 R, with or-
thonormal Q1 ∈ Rm×n , we seek the compact QR factorization of the modified matrix A e =
A + vuT , where v ∈ Rm , and u ∈ Rn . We then have
Ae = ( Q1 v ) RT . (3.3.43)
u
The first step is to make v orthogonal to Q1 using Gram–Schmidt and, if necessary, reorthogo-
nalization. This produces vectors r and a unit vector q, ∥q∥2 = 1, such that
v = Q1 r + ρq, QT1 q = 0.
We then have
R r
A
e = ( Q1 q) + uT . (3.3.44)
0 ρ
The remaining step uses a sequence of plane rotations as in the algorithm for modifying the
full QR factorization. With one reorthogonalization this rank-one update algorithm requires
approximately 20mn + 6n2 flops.
A similar algorithm is used for downdating the compact QR factorization when the first row
z T is deleted. Let q1T = eT1 Q1 be the first row in Q1 . Then we have
T T
z q1
A= e = Q̂1 R, (3.3.45)
A
v = e1 − Q1 (QT1 e1 ) = e1 − Q1 q1 .
3.3. Modified Least Squares Problems 151
√
If ∥v∥2 < 1/ 2, then v is reorthogonalized; otherwise, v is accepted. Because of the special
form of e1 , the result has the form
T T
q1 1 q1 γ In q1 γ
= , = v/∥v∥2 . (3.3.47)
Q̂1 0 Q̂1 h 0 γ h
Since orthogonal transformations preserve length, we must have |τ | = 1. Because the trans-
h = 0 in (3.3.49), and
formed matrix also must have orthonormal columns, it follows that e
T
z 0 1 T R
Ae = Q e1 0 G 0
,
where
−ek
Pk = I − uk uTk , uk = . (3.3.52)
qk
The equivalence is true also numerically. The matrix P is orthogonal by construction and fully
determined by Q1 and the strictly upper triangular matrix P11 ∈ Rn×n ,
P11 P12 P11 (I − P11 )Q̄T1
P = = . (3.3.53)
P21 P22 Q̄1 (I − P11 ) I − Q̄1 (I − P11 )Q̄T1
where Mi = I − qi qiT . Yoo and Park note that downdating the MGS QR decomposition when
the first row in A is deleted is equivalent to downdating the corresponding Householder decom-
position (3.3.51) when row (n + 1) is deleted. This can be done stably provided the (n + 1)th
row in P is available. The HGSD algorithm starts by using (3.3.55) to recover the first row g T
of P22 :
g = eT1 P21 = ((eT1 M1 )M2 ) · · · Mn . (3.3.56)
This gives the recursion g T = eT1 , g T = g T − (g Tqk )qkT , k = 1, . . . , n. Next, a Householder
reflection H such that g T H = (∥g∥2 , 0, . . . , 0) is determined, and the first column v of P22 H is
computed from
v = P22 He1 = (M1 · · · (Mn (He1 ))), (3.3.57)
giving the recursion v = He1 , v = v − qk (qkT v), k = n, n − 1, . . . , 1. These steps replace
the steps for orthogonalization of e1 to Q1 in the previous algorithm and yield γ = v1 and
h = (v2 , . . . , vm ). The first row f of P21 could be recovered similarly from (3.3.54), but it is
much cheaper to use
eTn+1 P = (((eTn+1 P1 )P2 ) · · · Pn ), (3.3.58)
where en+1 ∈ R(n+m) is a unit vector with one in its (n + 1)th position. This leads to the
recursion f = en+1 , f T = f T − (f T uk )uk , k = 1, . . . , n, where uk is given by (3.3.52).
The remaining steps of the algorithm are similar to the steps (3.3.49)–(3.3.50) in the previous
Gram–Schmidt downdating algorithm. An orthogonal matrix G is determined as a product of
plane rotations so that T
f γ 1 0
G= f1 .
0 Q̂1 0 Q
Finally, the upper triangular matrix R is modified:
T
R z
GT = e .
0 R
A complete pseudocode of this Householder–MGS downdating algorithm is given by Yoo and
Park [1140, 1996]. The HGSD algorithm uses 4mn flops for computing g and v in (3.3.56) and
(3.3.57). The total arithmetic work is approximately 20mn + 4n2 flops.
where U and V are orthogonal and R ∈ Rk×k , G ∈ R(m−k)×(n−k) are upper triangular. Let
σ1 ≥ σ2 ≥ · · · ≥ σn be the singular values of A, and assume that for some k < n it holds that
σk ≫ σk+1 ≤ δ, where δ is a given tolerance. Then the numerical δ-rank of A equals k. Also, if
1 1/2
σk (R) ≥ σk , ∥F ∥2F + ∥G∥2F ≤ cσk+1
c
for some constant c, the decomposition (3.3.59) exhibits the rank and nullspace of A. The URV
decomposition can be updated in O(n2 ) operations when a row is added to A. Following the
algorithm given by Stewart [1025, 1992], we write
T R F
U 0 A
V = 0 G , (3.3.60)
0 1 wT
xT y T
where wT V = (xT y T ) and (∥F ∥2F + ∥G∥2F )1/2 = ν ≤ δ. In the simplest case the inequality
q
ν 2 + ∥y∥22 ≤ δ (3.3.61)
is satisfied. Then it suffices to reduce the matrix in (3.3.60) to upper triangular form by a sequence
of left plane rotations. Note that the updated matrix R cannot become effectively rank-deficient
because its singular values cannot decrease.
If (3.3.61) is not satisfied, we first reduce y T in (3.3.60) so that it becomes proportional to
T
e1 , while keeping the upper triangular form of G. This can be done by a sequence of right and
left plane rotations as illustrated below. (Note that here the f ’s represent entire columns of F .)
↓ ↓ ↓ ↓
f f f f f f f f f f f f
g g g g g g g g g g g g
0 g g g 0 g g g 0 g g g
⇒ ⇒ ⇒
0
0 g g →0 0 g g
0
+ g g
0 0 + g → 0 0 ⊕ g 0 0 0 g
y y y 0 y y y 0 y y 0 0
↓ ↓
f f f f f f f f f f f f
g
g g g
g g g
g →g g g g
→
0 g g g ⇒
+ g g
g ⇒ → ⊕
g g g.
→
0 ⊕ g g
0 ⊕ g
g
0
0 0 g
0 0 0 g 0 0 0 g 0 0 g g
y y 0 0 σ 0 0 0 σ 0 0 0
In this part of the reduction, R and xT are not involved. The system now has the form
R f F̃
0 g G̃ .
xT σ 0
This matrix is now reduced to triangular form using plane rotations from the left, and k is in-
creased by 1. Finally, the new R is checked for degeneracy and possibly reduced by deflation.
The complete update takes O(n2 ) flops.
Stewart [1027, 1993] has pointed out that although the decomposition (3.3.59) is very satis-
factory for recursive least squares problems, it is less suited for applications where an approx-
imate nullspace is to be recursively updated. Let U = ( U1 U2 ) and V = ( V1 V2 ) be
154 Chapter 3. Generalized and Constrained Least Squares
Hence the orthogonal matrix V2 can be taken as an approximation to the numerical nullspace Nk .
On the other hand, we have ∥U2T A∥2 = ∥G∥2 , and therefore the last n − k singular values of A
are less than or equal to ∥G∥2 .
Because F is involved in the bound (3.3.62), V2 is not the best available approximate nullspace.
This problem can be resolved by working instead with the corresponding rank-revealing ULV
decomposition
L 0
A=U V T, (3.3.63)
H E
where L and E have lower triangular form, and
1 1/2
σk (L) ≥ σk , ∥H∥2F + ∥E∥2F = ν ≤ δ.
c
For this decomposition, ∥AV2 ∥2 = ∥E∥F , where V = (V1 , V2 ) is a conformal partitioning of
V . Hence the size of ∥H∥2 does not affect the nullspace approximation.
Stewart [1027, 1993] has presented an updating scheme for the decomposition (3.3.63). With
wT V = ( xT y T ), the problem reduces to updating
L 0
H E .
xT y T
We first reduce y T to ∥y∥2 eT1 by right rotations while keeping the lower triangular form of E. At
the end of this reduction the matrix will have the form
l 0 0 0 0
l l 0 0 0
l l l 0 0
.
h h h e 0
h h h e e
x x x y 0
This last row is annihilated by a sequence of left rotations, and k is increased by 1. (For the case
above we would use Q = G16 G26 G36 G46 .) If there has been no effective increase in rank, a
deflation process has to be applied. If (3.3.61) is satisfied, the rank cannot increase. Then the
reduction is performed, but the first rotation G46 is skipped. This gives us a matrix of the form
l 0 0 y 0
l l 0 y 0
l l l y 0
.
h h h e 0
h h h e e
0 0 0 y 0
The y elements above the main diagonal can be eliminated using right rotations. This fills out the
last row again, but with elements the same size as y. Now the last row can be reduced by the pro-
cedure described above without destroying the rank-revealing structure; see again Stewart [1027,
1993]. The main difference compared to the scheme for updating the URV decomposition is that
there is not the same simplification when (3.3.61) is satisfied.
3.4. Equality Constrained Problems 155
A complication with the above updating algorithm is that when m ≫ n, the extra storage for
U ∈ Rm×m may be prohibitive. If only V and the triangular factor are stored, then we must use
methods like the Saunders algorithm, possibly stabilized with the CSNE method. Alternatively,
hyperbolic rotations may be used; see Section 7.2.4. Such methods will not be as satisfactory as
methods using Q or Gram–Schmidt-based methods using Q1 ; see Sections 3.3.2 and 3.3.5.
Downdating algorithms for rank-revealing URV decompositions are also treated by Park and
Eldén [880, 1995] and Barlow, Yoon, and Zha [79, 1996]. MATLAB templates for computing
RRQR and UTV decompositions are given by Fierro, P. C. Hansen, and P. S. K. Hansen [408,
1999] and Fierro and Hansen [407, 2005]. Algorithms for modifying and maintaining ULV
decompositions are given in Barlow [67, 2003], Barlow and Erbay [72, 2009], and Barlow, Erbay,
and Slapnic̆ar [73, 2005]. Stewart and Van Dooren [1034, 2000] give updating schemes for
quotient-type generalized URV decomposition. Methods for computing and updating product
and quotient ULV decompositions are developed by Simonsson [999, 2006].
This problem always has a unique solution of least-norm. Most of the methods described in the
following for solving problem LSE can, with small modifications, be adapted to solve (3.4.4).
A natural way to solve problem LSE is to derive an equivalent unconstrained least squares
problem of lower dimension. There are two different ways to perform this reduction: by direct
elimination or by using the nullspace method. The method of direct elimination starts by
reducing the matrix B to upper trapezoidal form. It is essential that column pivoting be used
in this step. To solve the more general problem (3.4.4) a QR factorization of B with column
pivoting can be used:
R11 R12 }r
QTB BΠB = , r = rank(B) ≤ p, (3.4.5)
0 0 }p − r
where QB ∈ Rp×p is orthogonal, R11 is upper triangular and nonsingular, and ΠB is a permuta-
tion matrix. With x̄ = ΠTB x the constraints become
d¯
( R11 R12 ) x̄ = d¯1 , d¯ = QTB d = ¯1 , (3.4.6)
d2
where d¯2 = 0 if the constraints are consistent. Applying the permutation ΠB to the columns of
A and partitioning the resulting matrix conformally with (3.4.5) gives
x̄1
Ax = Āx̄ = ( Ā1 Ā2 ) , (3.4.7)
x̄2
−1 ¯
where Ā = AΠB . If (3.4.6) is used to eliminate x̄1 = R11 (d1 − R12 x̄2 ) from (3.4.7), we obtain
Ax − b = A2 x̄2 − b, where
b b
b2 = Ā2 − Ā1 R−1 R12 ,
A bb = b − Ā1 R−1 d¯1 . (3.4.8)
11 11
This reduction can be interpreted as performing r steps of Gaussian elimination on the system
d¯1
R11 R12 x̄1
= .
Ā1 Ā2 x̄2 b
Then x̄2 is determined by solving the reduced unconstrained least squares problem
min ∥A
b2 x̄2 − bb∥2 , b2 ∈ Rm×(n−r) .
A (3.4.9)
x̄2
We now show that if (3.4.2) holds, then rank(A b2 ) = n − r, and (3.4.9) has a unique solution.
b2 ) < n − r, there is a vector v ̸= 0 such that
For if rank(A
b2 v = Ā2 v − Ā1 R−1 R12 v = 0.
A 11
−1 u
If we let u = −R11 R12 v, then R11 u + R12 v = 0 and Ā1 u + Ā2 v = 0. Hence w = ΠB ̸=
v
0 is a null vector to both B and A. But this contradicts the assumption (3.4.2).
The solution to (3.4.9) can be obtained from the QR factorization
T b R22 Tb c1
QA A2 = , QA b = ,
0 c2
where R22 ∈ R(n−r)×(n−r) is upper triangular and nonsingular. Then x̄ is obtained from the
triangular system
d¯1
R11 R12
x̄ = , (3.4.10)
0 R22 c1
3.4. Equality Constrained Problems 157
and x = ΠB x̄ solves problem LSE. The coding of the direct elimination algorithm can be kept
remarkably compact, as shown by the ALGOL program for Householder QR in Björck and
Golub [143, 1967]. Cox and Higham [271, 1999] obtain a similar stable elimination method by
taking the analytic limit of the weighted least squares problem
2
ωB ωd
min x−
x A b 2
Q1 ∈ Rn×p , Q2 ∈ Rn×(n−p) .
Here Q2 gives an orthogonal basis for the nullspace of B, i.e., N (B) = R(Q2 ). If rank(B) = p,
then LB is nonsingular. Then any vector x ∈ Rn such that Bx = d can be represented as
x = x1 + Q2 y2 , where
x1 = B † d, B † = Q1 L−1 B d, (3.4.12)
and y2 ∈ R(n−p) is arbitrary. It remains to solve the reduced least squares problem
x = B † d + Q2 (AQ2 )† (b − AB † d)
= (I − Q2 (AQ2 )† A)B † d + Q2 (AQ2 )† b. (3.4.14)
exists with RA upper triangular and nonsingular. It follows that the unique solution to problem
LSE is x = x1 + Q2 y2 , where
c1
RA y2 = c1 , c = = QTA (b − Ax1 ). (3.4.15)
c2
The method of direct elimination and all three nullspace methods are numerically stable and
should give almost identical results. The method of direct elimination, which uses Gaussian
elimination to derive the reduced unconstrained system, has the lowest operation count.
When A is large and sparse, the nullspace method has the drawback that fill-in can be ex-
pected when forming the matrix AQ2 . When rank(B) = p, and BΠ = ( B1 B2 ) with B1
square and nonsingular, the nullspace matrix
−B1−1 B2
Z=Π
I
satisfies BZ = 0. In the reduced gradient method, Z is used to solve problem LSE. This is
both an elimination method and a nullspace method. The reduced gradient method is potentially
more efficient because it can work with a sparse LU factorization of B1 .
Theorem 3.4.1. The solution of problem (3.4.16) can be written x = A†M L b, where
Here A†M L b is the ML-weighted pseudoinverse of A. The solution is unique if and only if
Proof. See Eldén [369, 1982, Theorem 2.1] and Mitra and Rao [798, 1974].
3.4. Equality Constrained Problems 159
†
The solution to problem LSE can be expressed in terms of the weighted pseudoinverse BIA
as follows.
Theorem 3.4.2. If the constraints are consistent, the least-norm solution of problem LSE is given
by
†
x = BIA d + (AP )† b, P = I − L† L. (3.4.20)
Next, apply the following iterative refinement scheme: Set x(1) = x(w), and for k = 1, 2, . . . ,
(k)
1. s1 = d − Bx(k) .
(k)
wB s1
2. solve min∆x(k) ∆x(k) − .
A 0 2
No further QR factorization is needed to compute the corrections. The vectors x(k) generated
can be shown to converge to xLSE with linear rate equal to ρw = wp2 /(wp2 + w2 ), where wp is
the largest generalized singular value of (A, B). With the default value w = u−1/2 the method
converges quickly unless the problem is ill-conditioned.
The method of weighting can be analyzed using the GSVD. In the following we assume that
rank(B) = p and rank ( AT B T ) = n, which ensures that the weighted problem (3.4.21) has
a unique solution. By Theorem 3.1.5 the GSVD of {A, B} can be written
ΣA
A=U Z, B = V (ΣB 0)Z, (3.4.23)
0
The generalized singular values of {A, B} are αi = 1, i > p, and σi = αi /βi . From (3.4.23)
the normal equations (ATA + w2 B TB)x = AT b + w2 B T d are transformed into diagonal form,
Σ2B 0 ΣB
Σ2A + w2 y = ( ΣA 0 ) c + w2 e, (3.4.24)
0 0 0
αi ci + w2 βi ei
(
, i = 1, . . . , p,
yi = αi2 + w2 βi2 (3.4.25)
ci , i = p + 1, . . . , n.
Several methods for solving problem LSE are described by Lawson and Hanson [727, 1995];
a detailed analysis of the method of weighting for LSE is given in Section 22. Wedin [1110,
1985] gives perturbation bounds for LSE based on the augmented system formulation. Cox and
Higham [274, 1999] analyze the accuracy and stability of three different nullspace methods for
problem LSE, give a perturbation theory, and derive practical error bounds. Rank-deficient LSE
problems are studied by Wei [1113, 1992]. An MGS algorithm for weighted and constrained
problems is presented by Gulliksson [550, 1995]. Barlow and Handy [74, 1988] compare the
method of weighting to that in Björck [126, 1968]. Reid [921, 2000] surveys the use of implicit
scaling for linear least squares problems. Barlow and Vemulapati [78, 1992] give a slightly
modified improvement scheme for the weighting method.
Problem LSI
min ∥Ax − b∥2 subject to Cx ≤ d. (3.5.1)
x
Here C ∈ Rp×n is a matrix with ith row cTi , and the inequalities are to be interpreted compo-
nentwise. A solution to problem LSI exists only if the set of points satisfying Cx ≤ d is not
empty. Problem LSI is equivalent to the quadratic programming problem
where y ∈ Rm is the vector of Lagrange multipliers. The first-order optimality conditions are
given by the Karush–Kuhn–Tucker (KKT) conditions.
y T (Cx − d) = 0, y ≥ 0. (3.5.4)
From the nonnegativity of Cx − d and y it follows that either yi = 0 or the ith constraint is
binding: cTi x − d = 0. The vector y in the KKT conditions is called the dual solution.
An important special case of LSI is when the constraints are upper and lower bounds.
Problem BLS
min ∥Ax − b∥2 subject to l ≤ x ≤ u. (3.5.5)
x
This is an LSI problem with C = (In , −In )T , d = (u, −l)T . Bound-constrained least squares
(BLS) problems arise in many practical applications, e.g., reconstruction problems in geodesy
and tomography, contact problems for mechanical systems, and modeling of ocean circulation.
It can be argued that the linear model is only realistic when the variables are constrained within
meaningful intervals. For computational efficiency it is essential that such constraints be consid-
ered separately from more general constraints, such as those in (3.5.1).
If only one-sided bounds on x are specified in BLS, it is no restriction to assume that these
are nonnegativity constraints. Then we have a linear nonnegative least squares problem.
Problem NNLS
min ∥Ax − b∥2 subject to x ≥ 0. (3.5.6)
This problem arises when x represents quantities such as amounts of material, chemical concen-
trations, and pixel intensities. Applications include geodesy, tomography, and contact problems
for mechanical systems. The KKT conditions to be satisfied at an optimal NNLS solution are
where y is the gradient of 12 ∥Ax − b∥22 . This is also known as a (monotone) linear complementar-
ity problem (LCP). LSI appears to be a more general problem than NNLS, but this is not the case.
Hanson and Haskell [588, 1982] give a number of ways any LSI problem can be transformed into
the form of NNLS.
If A is rank-deficient, there may be an infinite manifold M of optimal solutions with a unique
global optimum. Then we can seek the unique solution of least norm that satisfies min ∥x∥2 ,
x ∈ M . This can be formulated as a least distance problem.
Problem LSD
min ∥x∥2 subject to g ≤ Gx. (3.5.8)
x
The solution to problem LSD can be obtained by an appropriate normalization of the resid-
ual in a related NNLS problem. The following result is shown by Lawson and Hanson [727,
1995, Chap. 23].
where
GT 0 }n
E= , f= .
gT 1 }1
Let r = (r1 , . . . , rn+1 )T = f − Eu be the residual corresponding to the NNLS solution, and set
σ = ∥r∥2 . If σ ̸= 0, then the unique solution to problem LSD is
T
where x = ( x1 , x2 ) , are given by A. K. Cline [254, 1975], Haskell and Hanson [594, 1981],
and Hanson [587, 1986]. Each step requires the solution of a problem of type NNLS, with the
first step having additional linear equality constraints.
Haskell and Hanson [594, 1981] implement two subprograms, LSEI and WNNLS, for solv-
ing linearly constrained least squares problems with equality constraints (LSEI). This is more
general than problem LSI in that some of the equations in Ax = b are to be satisfied exactly.
A user’s guide to the two subroutines LSEI and WNNLS is given in Haskell and Hanson [593,
1979]. WNNLS is based on solving a differentially weighted least squares problem. A penalty
function minimization that is implemented in a numerically stable way is used.
Gradient projection methods and interior methods for problem NNLS are treated in Sec-
tions 8.3.1 and 8.3.2, respectively. General nonlinear optimization software can also be used to
solve nonlinear least squares problems with inequality constraints; see, e.g., Schittkowski [970,
1985] and Mahdavi-Amiri [768, 1981].
constraints that are satisfied with equality at a feasible point are called active. In an active-set
algorithm a working set of constraints is defined to be a linearly independent subset of the active
set at the current approximation. In each iteration the value of the objective function is decreased,
and the optimum is reached in finitely many steps (assuming the constraints are independent).
A first useful step in solving problem LSI (3.5.1) is to transform A to upper triangular form.
With standard column pivoting, a rank-revealing QR factorization
T R11 R12 }r
Q AP = R = (3.5.12)
0 0 }m − r
is computed with R11 upper triangular and nonsingular. The numerical rank r of A is determined
using some specified tolerance, as discussed in Section 1.3.3. After the last (m − r) rows in R
and c are deleted, the objective function in (3.5.1) becomes
c1
∥(R11 , R12 ) x − c1 ∥2 , c = = QT b.
c2
By further orthogonal transformations from the right in (3.5.12), a complete orthogonal decom-
position
T T 0 }r
Q AP V =
0 0 }m − r
is obtained, where T is upper triangular and nonsingular. With x = P V y, problem LSI (3.5.1)
is then equivalent to
min ∥T y1 − c1 ∥2 subject to Ey ≤ d,
y1
y1
where E = CP V , y = , and y1 ∈ Rr .
y2
In general, a feasible point from which to start the algorithm is not known. An exception
is the case when all constraints are simple bounds, as in BLS and NNLS. In a first phase of
an
P active-set algorithm, a feasible point is determined by minimizing the sum of infeasibilities
cTi x − di among violated constraints.
Let xk be a feasible point that satisfies the working set of nk linearly independent constraints
with associated matrix Ck . The constraints in the working set are temporarily treated as equality
constraints. An optimum solution to the corresponding problem exists because the least squares
objective is bounded below. To solve this we take
xk+1 = xk + αk pk ,
where pk is a search direction and αk a nonnegative step length. The search direction is chosen
so that Ck pk = 0, which will cause the constraints in the working set to remain satisfied for all
values of αk . If moving toward the solution encounters an inactive constraint, this constraint is
added to the active set, and the process is repeated.
To satisfy the condition Ck pk = 0, a decomposition
Qk = ( Zk Yk ), (3.5.14)
|{z} |{z}
n−nk nk
164 Chapter 3. Generalized and Constrained Least Squares
To simplify the discussion we assume AZk has full rank, so that (3.5.15) has a unique solu-
tion. To compute this solution we need the QR factorization of AZk . This is obtained from the
QR factorization of AQk , where
Rk Sk
}n − nk
PkT AQk = PkT (AZk AYk ) = 0 Uk . (3.5.16)
}nk
0 0
Computing this larger decomposition has the advantage that the orthogonal matrix Pk need not
be saved and can be discarded after being applied also to the residual vector rk . The solution qk
to (3.5.15) can now be computed from
ck }n − nk
Rk qk = ck , PkT rk = .
dk }nk
Denote by ᾱ the maximum nonnegative step along pk for which xk+1 = xk + αk pk remains
feasible with respect to the constraints not in the working set. If ᾱ ≤ 1, then we take αk = ᾱ
and add the constraint that determines ᾱ to the working set for the next iteration. If ᾱ > 1, then
we set αk = 1. In this case, xk+1 will minimize the objective function when the constraints in
the working set are treated as equalities, and the orthogonal projection of the gradient onto the
subspace of feasible directions will be zero:
In this case we check the optimality of xk+1 by computing Lagrange multipliers for the con-
straints in the working set. At xk+1 these are defined by the equation
The residual vector rk+1 for the new unconstrained problem satisfies
0
PkT rk+1 = .
dk
so from (3.5.16),
TkT λ = − ( UkT 0 ) dk .
The Lagrange multiplier λi for the constraint cTi x ≥ di in the working set is said to be
optimal if λi ≥ 0. If all multipliers are optimal, an optimal point has been found. Otherwise, the
objective function can be decreased if we delete the corresponding constraint from the working
set. When more than one multiplier is not optimal, it is normal to delete the constraint whose
multiplier deviates the most from optimality.
3.5. Inequality Constrained Problems 165
At each iteration, the working set of constraints is changed, which leads to a change in Ck .
If a constraint is dropped, the corresponding row in Ck is deleted; if a constraint is added, a new
row is introduced in Ck . An important feature of an active-set algorithm is efficient solution of
the sequence of unconstrained problems (3.5.15). Techniques described in Section 7.2 can be
used to update the matrix decompositions (3.5.13) and (3.5.16). In (3.5.13), Qk is modified by
a sequence of orthogonal transformations from the right. Factorization (3.5.16) and the vector
PkT rk+1 are updated accordingly.
If xk+1 = xk , Lagrange multipliers are computed to determine if an improvement is possible
by moving away from one of the active constraints (by deleting it from the working set). In each
iteration, the value of the objective function is decreased until the KKT conditions are satisfied.
Active-set algorithms usually restrict the change in dimension of the working set by dropping
or adding only one constraint at each iteration. For large-scale problems this implies many
iterations when the set of active constraints at the starting point is far from the working set at the
optimal point. Hence, unless a good approximation to the final set of active constraints is known,
an active-set algorithm will require many iterations to converge.
In the rank-deficient case it can happen that the matrix AZk in (3.5.15) is rank-deficient,
and hence Rk is singular. Note that if some Rk is nonsingular, it can only become singular
during later iterations when a constraint is deleted from the working set. In this case only its last
diagonal element can become zero. This simplifies the treatment of the rank-deficient case. To
make the initial Rk nonsingular one can add artificial constraints to ensure that the matrix AZk
has full rank.
A possible further complication is that the working set of constraints can become linearly
dependent. This can cause possible cycling in the algorithm, so that its convergence cannot be
ensured. A simple remedy that is often used is to enlarge the feasible region of the offending
constraint by a small quantity.
If A has full column rank, the active-set algorithm for problem LSI described here is essen-
tially identical to an algorithm given by Stoer [1039, 1971]. LSSOL by Gill et al. [473, 1986] is a
set of Fortran subroutines for solving a class of convex quadratic programming that includes LSI.
It handles rank-deficiency in A, a combination of simple bounds, and general linear constraints.
It allows for a linear term in the objective function and uses a two-phase active-set method. The
minimizations in both phases are performed by the same subroutines.
For problems BLS and NNLS, active-set methods simplify. We outline an active-set algo-
rithm for problem BLS in its upper triangular form
min ∥Rx − c∥2 subject to l ≤ x ≤ u. (3.5.18)
x
where RPk = (REFk , REBk ) = (RFk , RBk ). To simplify the discussion we assume RFk has
full column rank, so that (3.5.20) has a unique solution. This is always the case if rank(A) = n.
To solve (3.5.20) we need the QR factorization of RFk . We obtain this by considering the
first block of columns of the QR factorization,
Uk Sk dk
QTk (RFk , RBk ) = , QTk c = . (3.5.21)
0 Vk ek
(k) (k)
The solution to (3.5.20) is given by Uk xFk = dk − Sk xBk , and we take
(k) (k)
where z (k) = EFk xFk + EBk xBk and α is a nonnegative step length. (Note that, assuming
rank(R) = n, z (0) is just the solution to the unconstrained problem.)
Let ᾱ be the maximum value of α for which x(k+1) remains feasible. There are now two
possibilities:
• If ᾱ < 1, then z (k) is not feasible. We take α = ᾱ and move all indices q ∈ Fk for which
(k+1)
xq = lq or uq from Fk to Bk . Thus, the free variables that hit their lower or upper
bounds will be fixed for the next iteration step.
• If ᾱ ≥ 1, we take α = ᾱ. Then x(k+1) = z (k) is the unconstrained minimum when the
variables xBk are kept fixed. The Lagrange multipliers are checked to see if the objective
function can be decreased further by freeing one of the fixed variables. If not, we have
found the global minimum.
At each iteration, the sets Fk and Bk change. If a constraint is dropped, a column from RBk is
moved to RFk ; if a constraint is added, a column is moved from RFk to RBk . Solution of the
sequence of unconstrained problems (3.5.20) and computation of the corresponding Lagrange
multipliers can be efficiently achieved, provided the QR factorization (3.5.21) can be updated.
In a similar active-set algorithm for problem NNLS, the index set of x is divided as
{1, 2, . . . , n} = F ∪ B, where i ∈ F if xi is a free variable and i ∈ B if xi is fixed at zero.
Ck now consists of the rows ei , i ∈ B, of the unit matrix In . We let Ck = EBT , and if EF is
similarly defined, then Pk = (EF EB ), Tk = Ink . Since Pk is a permutation matrix, the product
k + 1, . . . , q − 1, q ⇒ q, k + 1, . . . , q − 1.
Similarly, to add the bound corresponding to xq to the working set we take AQk+1 =
AQk PL (q, k), where PL (q, k), q < k − 1, is a permutation matrix that performs a left circular
shift of the columns
q, q + 1, . . . , k ⇒ q + 1, . . . , k, q.
Equation (3.5.17) for the Lagrange multipliers simplifies for NNLS to λ = −EBTAT rk+1 ,
where −AT rk+1 is the gradient vector. As an initial feasible point we take x = 0 and set F = ∅.
3.5. Inequality Constrained Problems 167
The least squares subproblems need not be solved from scratch. Instead, the QR factorization
T R S T c
P A(EF , EB ) = , P b= ,
0 U d
is updated after a right or left circular shift, using the algorithms described in Section 3.3.2; cf.
stepwise regression.
The pseudocode below is based on the NNLS algorithm given by Lawson and Hanson [727,
1995, Chapter 23]. The algorithm cannot cycle and terminates after a finite number of steps.
However, the number of iterations needed can be large and cannot be estimated a priori.
Initialization:
F = ∅; B = {1, 2, . . . , n};
x = 0; w = AT b;
Main loop:
while B ̸= ∅ and maxi wi ≥ 0,
p = argmaxi∈B wi ;
Move index p from B to F, i.e., free variable xp ;
Let z solve minx ∥Ax − b∥2 subject to xB = 0;
while mini zi ≤ 0 ∀i ∈ F;
Let αi = xi /(xi − zi ) ∀i ∈ F such that zi < 0;
Find index q such that αq = mini∈F αi ; x = x + αq (z − x);
Move all indices q for which xq = 0 from F to B;
Let x solve minx ∥Ax − b∥2 subject to zB = 0;
end
x = z; w = EBT AT (b − Ax);
end
and targets applications such as multiway decomposition methods for tensor arrays; see Sec-
tion 4.3.5. In a recent variant due to Van Benthem and Keenan [1072, 2004], the performance is
improved by identifying and grouping together observations at each stage that share a common
pseudoinverse.
where A ∈ Rm×n , B ∈ Rp×n arise in many applications. For the solution to be unique it is
necessary that the nullspaces of A and B intersect trivially, i.e.,
A
N (A) ∩ N (B) = {0} ⇔ rank = n. (3.5.23)
B
Conditions for existence and uniqueness of solutions to problem LSQI and the related problem
LSQE with equality constraint ∥Bx−d∥2 = γ are given by Gander [438, 1981]. For a solution to
(3.5.22) to exist the set {x : ∥Bx − d∥2 ≤ γ} must not be empty. Furthermore, if ∥BxB − d∥2 <
γ, where xB solves
min ∥Bx − d∥2 , S = {x ∈ Rn | ∥Ax − b∥2 = min}, (3.5.24)
x∈S
Theorem 3.5.3. Assume that the constraint set {x : ∥Bx − d∥2 = γ} is not empty. Then the
solution to problem LSQE equals the solution x(λ) to the normal equations (AT A + λB T B)x =
AT b + B T d or, equivalently, to the least squares problem
\[
\min_x \left\| \begin{pmatrix} A \\ \sqrt{\lambda}\,B \end{pmatrix} x - \begin{pmatrix} b \\ \sqrt{\lambda}\,d \end{pmatrix} \right\|_2, \tag{3.5.25}
\]
where λ ≥ 0 solves the secular equation
f (λ) = ∥Bx(λ) − d∥2 − γ = 0. (3.5.26)
Proof. By the method of Lagrange multipliers, we minimize ψ(x, λ), where
ψ(x, λ) = ∥Ax − b∥22 + λ(∥Bx − d∥22 − γ 2 ).
Only positive values of λ are of interest. A necessary condition for a minimum is that the gradient
of ψ(x, λ) with respect to x equals zero. For λ ≥ 0 this shows that x(λ) solves (3.5.25). It can
be shown that f (λ) is a monotone decreasing function of λ. Hence the secular equation has a
unique positive solution.
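A minimal MATLAB sketch of this approach (our own test data; B = I and d = 0, so the constraint is ∥x∥_2 = γ) solves the damped problem (3.5.25) for each λ and applies fzero to the secular equation (3.5.26):

m = 30; n = 10;
A = randn(m, n);  b = randn(m, 1);
B = eye(n);  d = zeros(n, 1);
gamma = 0.5*norm(A\b);                          % a constraint that is certainly active
xlam = @(lam) [A; sqrt(lam)*B] \ [b; sqrt(lam)*d];
f    = @(lam) norm(B*xlam(lam) - d) - gamma;    % secular function, monotone decreasing
lambda = fzero(f, [1e-10, 1e10]);
x = xlam(lambda);
[norm(x), gamma]                                % equal at the computed root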
Damped least squares problems were used by Levenberg [736, 1944] in the solution of non-
linear least squares problems. Problem (3.5.28) can be solved by Householder QR factorization;
see Golub [487, 1965]. The structure of the initial and the transformed matrices after k = 2 steps
of Householder QR factorization is shown below for m = n = 4.
× × × ×              × × × ×
× × × ×              ⊗ × × ×
× × × ×              ⊗ ⊗ × ×
× × × ×      =⇒      ⊗ ⊗ × ×
×                    ⊗ ⊕ + +
  ×                    ⊕ + +
    ×                      ×
      ×                        ×
Only the first two rows of λI have filled in. In all steps, precisely n elements in the current
column are annihilated. Hence the Householder QR factorization requires 2mn^2 flops, which is
2n^3/3 flops more than for the QR factorization of A. A similar increase in arithmetic operations
occurs for MGS.
The standard form of problem LSQE can also be formulated as a least squares problem on
the unit sphere:
\[
\min_{\|x\|_2 = \gamma} \|Ax - b\|_2. \tag{3.5.30}
\]
This is a problem on the Stiefel (or, equivalently, Grassmann) manifold. Newton-type algorithms
for such problems are developed by Edelman, Arias, and Smith [358, 1999]. The application of
such methods to the regularized least squares problem has been studied by Eldén [374, 2002].
Each Newton iteration requires the solution of the damped least squares problem (3.5.28) for a
new value of λ. Hence a new QR decomposition must be computed. These QR factorizations
account for the main cost of an iteration.
For any fixed value of λ this problem can be reduced by a second QR factorization to
minx ∥R(λ)x − c1 (λ)∥2 . Then x(λ) and z(λ) can be computed from the two triangular sys-
tems
R(λ)x(λ) = c1 (λ), R(λ)T z(λ) = x(λ).
Eldén [367, 1977] shows that further savings in the Newton iterations can be obtained by an
initial transformation
\[
U^T A V = \begin{pmatrix} B \\ 0 \end{pmatrix}, \qquad
U^T b = \begin{pmatrix} g_1 \\ g_2 \end{pmatrix}, \tag{3.5.34}
\]
where B is upper bidiagonal; see Section 4.2.1. Computing B and V requires 4mn2 flops; see
Section 4.2.1. With x = V y, the least squares problem (3.5.28) is transformed into
\[
\min_y \left\| \begin{pmatrix} B \\ \sqrt{\lambda}\, I_n \end{pmatrix} y - \begin{pmatrix} g_1 \\ 0 \end{pmatrix} \right\|_2. \tag{3.5.35}
\]
Here G1 zeros the element in position (n + 1, 1) and creates a new nonzero element in position
(n + 2, 2). This is annihilated by a second plane rotation J1 that transforms rows n + 1 and n + 2.
All remaining steps proceed similarly. The solution is then obtained as
The QR factorization in (3.5.36) and the computation of y(λ) take about 23n flops. Eldén [367,
1977] gives a more detailed operation count and also shows how to compute the derivatives used
in Newton’s method of the equation f (λ) = ∥yλ ∥2 − γ = 0.
where f and g are assumed to be real functions in the Hilbert space L2 (Ω), and the kernel
k(· , ·) ∈ L2 (Ω × Ω). Let K : L2 (Ω) → L2 (Ω) be a continuous linear operator defined by
\[
Kf = \int_\Omega k(\cdot\,, t)\, f(t)\, dt. \tag{3.6.2}
\]
By the Riemann–Lebesgue lemma there are rapidly oscillating functions f that come arbitrarily
close to being annihilated by K. Hence the inverse of K cannot be a continuous operator, and
the solution f does not depend continuously on the data g. Therefore (3.6.1) is called
an ill-posed problem, a term introduced by Hadamard.
A compact operator K admits a singular value expansion
Kvi = σi ui , K T ui = σi v i , i = 1, 2, . . . ,
where the functions ui and vi are orthonormal with respect to the inner product
\[
\langle u, v\rangle = \int_\Omega u(t)\, v(t)\, dt, \qquad \|u\| = \langle u, u\rangle^{1/2}.
\]
The infinitely many singular values σi quickly decay with i and cluster at zero. Therefore
(3.6.1) has a solution f ∈ L2 (Ω) only for special right-hand sides g. A known (Groetsch [539,
1984, Theorem 1.2.6]) necessary and sufficient condition is that g satisfies the Picard condition
\[
\sum_{i=1}^{\infty} \bigl| u_i^T g / \sigma_i \bigr|^2 < \infty. \tag{3.6.3}
\]
In most practical applications the kernel k of the integral equation (3.6.1) is given exactly
by the mathematical model, while g consists of measured quantities known only to a certain
accuracy at a finite set of points s1 , . . . , sn . To solve the integral equation (3.6.1) numerically, it
must first be reduced to a finite-dimensional matrix equation by discretization. This can be done
where Wn = span {ψ1 , . . . , ψn } is a second n-dimensional subspace of L2 (Ω). This leads again
to a finite-dimensional linear system Kf = g for the vector (f1 , . . . , fn ), where
The discretized system Kf = g or, more generally, the least squares problem minx ∥Kf −
g∥2 , will inherit many properties of the integral equation (3.6.1). In the singular value decompo-
sition
\[
K = U\Sigma V^T = \sum_{i=1}^{n} \sigma_i u_i v_i^T,
\]
the singular values σi will decay rapidly and cluster near zero with no evident gap between any
two consecutive singular values. Such matrices have an ill-determined numerical rank, and the
corresponding problem is a discrete ill-posed problem. The solution usually depends mainly
on a few larger singular values σ1 , . . . , σp , p ≪ n. The effective condition number for the exact
right-hand side g,
κe = σ1 /σp ≪ κ(K), p ≪ n, (3.6.5)
is usually small. The concept of effective condition number was introduced by Varah [1086,
1973]; see also Chan and Foulser [226, 1988].
Example 3.6.1. Consider the Fredholm integral equation (3.6.1) with kernel K(s, t) = e^{−(s−t)^2}.
Let Kf = g be the system of linear equations obtained by discretization with the trapezoidal rule
on a square mesh with m = n = 100. The singular values of K are displayed in logarithmic
scale in Figure 3.6.1. They decay toward zero, and there is no distinct gap anywhere in the spec-
trum. For i > 30, σi are close to roundoff level, and in double precision the numerical rank of
K certainly is smaller than 30.
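A sketch of the computation behind this example is given below; the interval [0, 1] and the details of the trapezoidal weights are our own assumptions.

n = 100; t = linspace(0,1,n)'; h = t(2) - t(1);
w = h*ones(1,n); w([1 n]) = h/2;          % trapezoidal weights
K = exp(-(t - t').^2) .* w;               % K(i,j) = k(s_i, t_j)*w_j
sigma = svd(K);
semilogy(sigma,'o'); xlabel('k'); ylabel('\sigma_k');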
The discretized linear system Kf = g will only have a meaningful solution for right-hand
sides b that satisfy a discrete version of the Picard condition; see Hansen [573, 1990]. If g is
affected by noise, then the exact solution for the noisy right-hand side g = gexact + e will
bear no resemblance to the noise-free true solution. A consequence of the Picard condition for
the continuous problem is that the coefficients ci = uTi g for gexact in the SVD solution of the
discretized system
\[
f = K^{\dagger} g_{\mathrm{exact}} = \sum_{i=1}^{n} c_i \sigma_i^{-1} v_i \tag{3.6.6}
\]
Figure 3.6.1. Singular values σ_i of a discretized integral operator. Used with permission of Springer International Publishing; from Numerical Methods in Matrix Computations, Björck, Åke, 2015; permission conveyed through Copyright Clearance Center, Inc.
must eventually decrease faster than σi . However, when gexact is contaminated with errors, any
attempt to solve the discrete ill-posed problem numerically without restriction of the solution
space will give a meaningless result.
In many applications the kernel k(s, t) depends only on the difference s − t, and the integral
equation has the form
\[
\int_0^1 h(s - t)\, f(t)\, dt = g(s), \qquad 0 \le s, t \le 1. \tag{3.6.7}
\]
large reduction in the norm of the residual b − Axk is achieved without causing the norm of the
approximate solution xk to become too large. The TSVD method is widely used as a general-
purpose method for small to medium-sized ill-posed problems. For many ill-posed problems the
solution can be well approximated by the TSVD solution with a small number of terms. Such
problems are called effectively well-conditioned.
In statistical literature, TSVD is known as principal component regression (PCR) and often
formulated in terms of the eigenvectors of ATA instead of the SVD of A; see Massy [781, 1965].
The Gauss–Markov theorem (Theorem 1.1.4) states that the least squares solution is the best
unbiased linear estimator of x, in the sense that it has minimum variance. If A is ill-conditioned,
this minimum variance is still large. In regularization the variance can be substantially decreased
by allowing the estimator to be biased.
In TSVD the components selected from the SVD expansion correspond to the k largest sin-
gular values. The right-hand side b can have larger projections on some singular vectors corre-
sponding to smaller singular values. In such a case, one could take into account also the size of
the coefficients ci = uTi b when choosing which components of the SVD expansion to include in
the approximation.
Hansen, Sekii, and Shibahashi [585, 1992] introduce the modified solution MTSVD xB,k
that solves the least squares problem
\[
\min_{x\in S} \|Bx\|_2, \qquad S = \{\, x \mid \|A_k x - b\|_2 = \min \,\}, \tag{3.6.8}
\]
where B ∈ Rp×n is a matrix that penalizes solutions that are not smooth. The TSVD solution
is obtained by taking B = I. The MTSVD problem has the same form as (2.3.10) used in
Section 2.3.2 to resolve rank-deficiency. The solution can be written in the form
\[
x_{B,k} = x_k - V_k^{\perp} z, \qquad z \in \mathbb{R}^{n-k}, \tag{3.6.9}
\]
where x_k = V_k \Sigma_k^{-1} U_k^T b is the TSVD solution, and the columns of V_k^{\perp} span the nullspace of A_k.
The vector z can be computed from the QR factorization of BVk⊥ ∈ Rp×(n−k) as the solution to
the least squares problem
\[
\min_z \|(B V_k^{\perp}) z - B x_k\|_2. \tag{3.6.10}
\]
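The following MATLAB sketch (illustrative data; a full SVD is used for simplicity) computes an MTSVD solution from (3.6.9)–(3.6.10) with B a first-difference operator:

m = 100; n = 50; k = 10;
s = linspace(0,1,m)';  A = s.^(0:n-1);          % ill-conditioned test matrix
xtrue = linspace(0,1,n)'.^2;  b = A*xtrue + 1e-6*randn(m,1);
B = diff(eye(n));                               % (n-1)-by-n first differences
[U,S,V] = svd(A, 'econ');
xk    = V(:,1:k) * (diag(S(1:k,1:k)) .\ (U(:,1:k)'*b));  % TSVD solution x_k
Vperp = V(:,k+1:n);                             % spans the nullspace of A_k
z     = (B*Vperp) \ (B*xk);                     % solves (3.6.10)
xBk   = xk - Vperp*z;                           % MTSVD solution (3.6.9)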
It is often desired to compute a sequence of MTSVD solutions for decreasing values of k.
When k is decreased, more columns are added to the left of BVk⊥ . This makes it costly to update
the QR factorization of BVk⊥ . It is more efficient to work with
the subspaces spanned by the selected columns for such methods are almost identical to those
from TSVD.
The parameter λ > 0 governs the balance between a small-residual norm and the regularity
of the solution as measured by ∥Lx∥2 . Attaching Tikhonov’s name to the method is moti-
vated by the groundbreaking work of Tikhonov [1063, 1963] and Tikhonov and Arsenin [1062,
1977]. Early works by other authors on Tikhonov regularization are surveyed by Hansen [579,
2010, Appendix C]. Regularization methods of the form (3.6.11) have been used by many other
authors for smoothing noisy data and in methods for nonlinear least squares; see Levenberg [736,
1944]. In statistics the method is known as ridge regression.
Often L is chosen as a discrete approximation of some derivative operator. Typical choices
are
\[
L_1 = \begin{pmatrix} 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{pmatrix}, \qquad
L_2 = \begin{pmatrix} -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \end{pmatrix}, \tag{3.6.12}
\]
or a combination of these. These operators have a smoothing effect on the solution and are
called smoothing-norm operators; see Hanke and Hansen [570, 1993]. Note that L1 and L2
are banded matrices with small bandwidth and full row rank. Their nullspaces are explicitly
known: N(L_1) = span{(1, 1, . . . , 1)^T} and N(L_2) = span{(1, 1, . . . , 1)^T, (1, 2, . . . , n)^T}.
Clearly, any component of the solution in N (L) will not be affected by the regularization term
λ2 ∥Lx∥2 . Since the nullspaces are spanned by very smooth vectors, it will not be necessary to
regularize this part of the solution.
Any combination of the matrices L1 , L2 and the unit matrix can also be used. This corre-
sponds to a discrete approximation of a Sobolev norm. It is no restriction to assume that L ∈
Rp×n , p ≤ n. If p > n, then a QR factorization of L can be performed so that ∥Lx∥2 = ∥R2 x∥2 ,
where R2 = QT L has at most n rows. Then R2 can be substituted for L in (3.6.11).
The solution of (3.6.11) satisfies the normal equations
\[
(A^T A + \lambda L^T L)\,x = A^T b.
\]
If A and L have no nullspace in common and λ > 0, there is a unique solution. Writing the
normal equations as
\[
A^T r - \lambda L^T L x = 0, \qquad r = b - Ax,
\]
shows that they are equivalent to the augmented system
\[
\begin{pmatrix} I & A \\ A^T & -\lambda L^T L \end{pmatrix} \begin{pmatrix} r \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}. \tag{3.6.14}
\]
Forming the normal equations can be avoided by noting that problem (3.6.11) is equivalent
to the least squares problem
\[
\min_x \left\| \begin{pmatrix} A \\ \sqrt{\lambda}\, L \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix} \right\|_2. \tag{3.6.15}
\]
For a problem in standard form (L = I), the solution can be expressed in terms of the SVD of A as
\[
x(\lambda) = \sum_{i=1}^{n} f_i \,\frac{u_i^T b}{\sigma_i}\, v_i, \qquad f_i = \frac{\sigma_i^2}{\sigma_i^2 + \lambda}, \tag{3.6.17}
\]
where σi are the singular values and ui , vi are the singular vectors of A. The quantities fi ,
0 ≤ fi < 1, are called filter factors (in statistics, shrinkage factors) and are decreasing functions
of λ. If λ ≪ σi , then fi ≈ 1, and the corresponding terms are almost the same as without
regularization. On the other hand, if λ ≫ σi , then fi ≪ 1, and the corresponding terms are
damped. For a suitably chosen λ, x(λ) is approximately the same as the TSVD solution.
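For a problem in standard form, the filtered solution is easily formed from the SVD; the sketch below (our own test data) also checks it against the stacked formulation (3.6.15) with L = I.

m = 100; n = 50; s = linspace(0,1,m)';
A = s.^(0:n-1);  b = A*ones(n,1) + 1e-6*randn(m,1);
[U,S,V] = svd(A, 'econ');  sigma = diag(S);
c = U'*b;  lambda = 1e-3;
f = sigma.^2 ./ (sigma.^2 + lambda);              % Tikhonov filter factors f_i
xlam = V * ((sigma .* c) ./ (sigma.^2 + lambda)); % = V*(f.*(c./sigma)), division-safe form
norm(xlam - [A; sqrt(lambda)*eye(n)] \ [b; zeros(n,1)])   % should be small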
Often the Tikhonov problem (3.6.11) arises with A and L banded upper triangular matrices.
For example, in fitting a polynomial smoothing spline of degree 2p − 1 to m data points, the
half-bandwidth will be p and p + 1, respectively; see Reinsch [923, 1971]. Eldén [370, 1984]
shows how to reduce such regularized problems to the form
\[
\min_x \left\| \begin{pmatrix} R_1 \\ \sqrt{\lambda}\, R_2 \end{pmatrix} x - \begin{pmatrix} d_1 \\ 0 \end{pmatrix} \right\|_2, \tag{3.6.18}
\]
where R1 and R2 are banded. For a fixed value of λ the regularized problem can be reduced
further to upper triangular form. Recall that the order in which the rows are processed in this
QR factorization is important for efficiency; see Section 4.1. Unnecessary fill-in is avoided
by first sorting the rows so the matrices are in standard form; see Section 4.1.4. The rows
are then processed from the top down using plane rotations to give a matrix of banded upper
triangular form. For a given value of λ this requires n(w_1 + w_2 − 1) rotations and 4n(w_1^2 + w_2^2)
flops. It can easily be generalized to problems involving several upper triangular band matrices
λ1 R1 , λ2 R2 , λ3 R3 , etc.
Example 3.6.2. Consider the banded regularized least squares problem (3.6.18) where R1 and
R2 are banded upper triangular matrices with bandwidth w1 = 3 and w2 = 2, respectively. First,
let Givens QR factorization be carried out without reordering the rows. Below right is shown the
reduced matrix after the first three columns have been brought into upper triangular form. Note
that the upper triangular part R2 has completely filled in:
[Schematic omitted: the stacked banded matrix before the reduction (left) and after the first three columns have been brought into upper triangular form (right); fill-in elements are marked +.]
For a similar problem of size (2n−1, n) and bandwidth w the complete QR factorization requires
n(n + 1)/2 plane rotations and about 2n(n + 1)w flops.
Consider now the application of Givens QR factorization after the rows have been preordered
to put A in standard form as shown below left. To the right is shown the matrix after the first
three columns have been brought into upper triangular form. Here the algorithm is optimal in the
sense that no new nonzero elements are created, except in the final (uniquely determined) R. For
a similar matrix of size (2n − 1) × n the factorization only requires n(w + 1)/2 Givens rotations,
and a total of approximately 2n(w + 1)w flops, an improvement by a factor of n/w:
[Schematic omitted: the matrix preordered to standard form (left) and after the first three columns have been brought into upper triangular form (right); no fill-in occurs outside the final R.]
Groetsch [539, 1984] has shown that when the exact solution is very smooth and in the
presence of noisy data, Tikhonov regularization cannot reach the optimal solution that the data
allows. In those cases the solution can be improved by using iterated Tikhonov regularization,
suggested by Riley [929, 1956]. In this method a sequence of improved approximate solutions is
computed by
x(0) = 0, x(q+1) = x(q) + δx(q) ,
where δx(q) solves the regularized least squares problem
\[
\min_{\delta x} \left\| \begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix} \delta x - \begin{pmatrix} r^{(q)} \\ 0 \end{pmatrix} \right\|_2, \qquad r^{(q)} = b - A x^{(q)}. \tag{3.6.19}
\]
This iteration may be implemented very effectively because only one QR factorization is needed;
see Golub [487, 1965]. A related scheme is suggested by Rutishauser [952, 1968]. The conver-
gence can be expressed in terms of the SVD of A as
\[
x^{(q)}(\lambda) = \sum_{i=1}^{n} f_i^{(q)} \,\frac{c_i}{\sigma_i}\, v_i, \qquad
f_i^{(q)} = 1 - \left( \frac{\lambda}{\sigma_i^2 + \lambda} \right)^{q}. \tag{3.6.20}
\]
Thus, for q = 1 we have the standard regularized solution, and x(q) → A† b as q → ∞. When
iterated sufficiently many times, Tikhonov regularization will reach an accuracy that cannot be
improved significantly by any other method; see Hanke and Hansen [570, 1993].
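A minimal sketch of iterated Tikhonov regularization (our own test data), reusing a single QR factorization of the stacked matrix for every correction step:

m = 100; n = 50; s = linspace(0,1,m)';
A = s.^(0:n-1);  b = A*ones(n,1) + 1e-4*randn(m,1);
lambda = 1e-3;  q = 5;
[Q,R] = qr([A; sqrt(lambda)*eye(n)], 0);     % economy QR, computed once
x = zeros(n,1);
for j = 1:q
    r  = b - A*x;                            % current residual r^(q)
    dx = R \ (Q' * [r; zeros(n,1)]);         % solves (3.6.19)
    x  = x + dx;
end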
the parameter λ is chosen as the smallest value for which (3.6.21) is satisfied. The attained
accuracy can be very sensitive to the value of η. It has been observed that the discrepancy
principle tends to give a slightly oversmoothed solution. This means that not all the information
present in the data is recovered.
When no prior information about the noise level in the data is available, a great number of
different methods have been proposed. All of them have the common property that they require
the solution for many values of the regularization parameter λ. The L-curve method was first
proposed by Lawson and Hanson [727, 1995, Chapter 26]. It derives its name from a plot in a
doubly logarithmic scale of the curve (∥b − Axλ ∥2 , ∥xλ ∥2 ), which typically is shaped like the
letter L. Choosing λ near the “corner” of this L-curve represents a compromise between a small
residual and a small solution. The L-curve method is further studied and refined by Hansen [574,
1992]. Hansen and O’Leary [584, 1993] propose choosing λ more precisely as the point on the
L-curve where the curvature has the largest magnitude. Advantages and shortcomings of this
method are discussed by Hansen [576, 1998]. For large-scale problems it may be too expensive
to compute sufficiently many points on the L-curve. Calvetti et al. [199, 2002] show how to
compute cheap upper and lower bounds in this case.
In generalized cross-validation (GCV) the parameter in Tikhonov regularization is esti-
mated directly from the data. The underlying statistical model is that the components of b are
subject to random errors of zero mean and covariance matrix σ 2 Im , where σ 2 may or may not
be known. The predicted values of b are written as Axλ = Pλ b, where
Pλ = AA†λ , A†λ = (ATA + λI)−1 AT (3.6.23)
is the symmetric influence matrix. When σ 2 is known, Craven and Wahba [278, 1979] suggest
that λ should be chosen to minimize an unbiased estimate of the expected true mean square error.
When m is large, this minimizer is asymptotically the same as for the GCV function
\[
G_\lambda = \frac{\|b - A x_\lambda\|_2^2}{\bigl(\operatorname{trace}(I_m - P_\lambda)\bigr)^2} \tag{3.6.24}
\]
(see Golub, Heath, and Wahba [493, 1979]). The GCV method can also be used in other ap-
plications, such as truncated SVD methods and subset selection. The GCV function is invariant
under orthogonal transformations of A. It can be very flat around the minimum, and localizing
the minimum numerically can be difficult; see Varah [1088, 1983].
Ordinary cross-validation is based on the following idea; see Allen [17, 1974]. Let xλ,i be
the solution of the regularized problem when the ith equation is left out. If this solution is a
good approximation, then the error in the prediction of the ith component of the right-hand side
should be small. This is true for all i = 1 : m. Generalized cross-validation is a rotation-invariant
version of ordinary cross-validation.
For standard Tikhonov regularization the GCV function (3.6.24) can be expressed in terms of
the SVD A = UΣV^T as
\[
P_\lambda = A A_\lambda^{\dagger} = U \begin{pmatrix} \Omega & 0 \\ 0 & 0 \end{pmatrix} U^T, \tag{3.6.25}
\]
where Ω = diag(ω1 , . . . , ωn ), ωi = σi2 /(σi2 + λ). An easy calculation shows that
\[
\|(I_m - P_\lambda)b\|_2^2 = \lambda^2 \sum_{i=1}^{n} \frac{c_i^2}{(\sigma_i^2 + \lambda)^2} + \sum_{i=n+1}^{m} c_i^2, \tag{3.6.26}
\]
\[
\operatorname{trace}(I_m - P_\lambda) = m - n + \lambda \sum_{i=1}^{n} \frac{1}{\sigma_i^2 + \lambda}. \tag{3.6.27}
\]
For the general case B ̸= I, formulas similar to (3.6.26) and (3.6.27) can be derived from the
GSVD, or a transformation to the standard case can be used; see Section 3.6.5.
Example 3.6.3 (Golub and Van Loan [512, 1989, Problem 12.1.5]). For A = (1, 1, . . . , 1)T ∈
Rm×1 and b ∈ Rm the GCV function becomes
\[
G_\lambda = \frac{m(m-1)s^2 + \nu^2 m^2 \bar b^{\,2}}{(m - 1 + \nu)^2}, \qquad \nu = \frac{\lambda}{m + \lambda},
\]
where
\[
\bar b = \frac{1}{m}\sum_{i=1}^{m} b_i, \qquad s^2 = \frac{1}{m-1}\sum_{i=1}^{m} (b_i - \bar b)^2.
\]
It can be readily verified that G_λ is minimized for ν = s^2/(m \bar b^{\,2}), and the optimal value of λ is
\[
\lambda_{\mathrm{opt}} = \bigl( (\bar b/s)^2 - 1/m \bigr)^{-1}.
\]
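A quick numerical sanity check of this example (our own choice of data) minimizes the closed-form expression for G over ν and compares with ν = s²/(m b̄²):

m = 200;  b = 3 + 0.5*randn(m,1);
bbar = mean(b);  s2 = var(b);                  % var uses the 1/(m-1) normalization
G = @(nu) (m*(m-1)*s2 + nu.^2*m^2*bbar^2) ./ (m - 1 + nu).^2;
nu_star = fminbnd(G, 0, 1, optimset('TolX',1e-10));
[nu_star, s2/(m*bbar^2)]                       % the two values agree closely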
The regularized problem (3.6.22) is equivalent to min_x {∥Rx − c_1∥_2^2 + λ∥x∥_2^2}. By a second QR
factorization this can be reduced further to
\[
Q_\lambda^T \begin{pmatrix} R & c_1 \\ \sqrt{\lambda}\, I_n & 0 \end{pmatrix} = \begin{pmatrix} R_\lambda & d_1 \\ 0 & d_2 \end{pmatrix}, \tag{3.6.28}
\]
where R_λ is upper triangular and R^T R + λ I_n = R_λ^T R_λ. The solution of the regularized
problem is then obtained by solving R_λ x(λ) = d_1, and the residual norm is given by
To compute the trace term in the GCV function we first note that AA_λ^† = A M_λ^{-1} A^T, where
M_λ = A^T A + λ I_n, and M_λ^{-1} = (R_λ^T R_λ)^{-1} equals the covariance matrix. From elementary
properties of the trace function, trace(A M_λ^{-1} A^T) = trace(M_λ^{-1} A^T A) = n − λ trace(M_λ^{-1}).
Hence, the trace computation is reduced to computing the sum of the diagonal elements of the
covariance matrix C = (RλT Rλ )−1 . An efficient algorithm for this is given by Eldén [371, 1984].
By reducing A to upper bidiagonal form U T AV = B and setting y = V T x, a regularization
problem of bidiagonal form is obtained. This can be further reduced by plane rotations to upper
triangular form
\[
Q^T \begin{pmatrix} B & g_1 \\ \sqrt{\lambda}\, I_n & 0 \end{pmatrix} = \begin{pmatrix} B_\lambda & z_1 \\ 0 & z_2 \end{pmatrix} \tag{3.6.32}
\]
with B_λ upper bidiagonal:
\[
B_\lambda = \begin{pmatrix}
\rho_1 & \theta_2 & & \\
 & \rho_2 & \ddots & \\
 & & \ddots & \theta_n \\
 & & & \rho_n
\end{pmatrix}.
\]
If b_i^T denotes the ith row of B_λ^{-1}, then trace((B_λ^T B_λ)^{-1}) = \sum_{i=1}^{n} \|b_i\|_2^2. From the identity
Bλ Bλ−1 = I we obtain the recursion
ρn bn = en , ρi bi = ei − θi+1 bi+1 , i = n − 1, . . . , 2, 1.
Because Bλ−1 is upper triangular, bi+1 is orthogonal to ei . Hence
\[
\|b_n\|_2^2 = 1/\rho_n^2, \qquad \|b_i\|_2^2 = \bigl(1 + \theta_{i+1}^2 \|b_{i+1}\|_2^2\bigr)/\rho_i^2, \qquad i = n-1, n-2, \ldots, 1. \tag{3.6.33}
\]
This algorithm for computing the trace term requires only O(n) flops.
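A short MATLAB sketch (our own data) of the recursion (3.6.33), checked against a dense computation of the trace:

n = 8;
rho   = 1 + rand(n,1);                 % rho_1, ..., rho_n
theta = rand(n,1);                     % theta_2, ..., theta_n stored in theta(2:n)
nb2 = zeros(n,1);                      % nb2(i) = ||b_i||_2^2, b_i^T = ith row of inv(B_lambda)
nb2(n) = 1/rho(n)^2;
for i = n-1:-1:1
    nb2(i) = (1 + theta(i+1)^2 * nb2(i+1)) / rho(i)^2;
end
trace_fast = sum(nb2);
Blam = diag(rho) + diag(theta(2:n), 1);
trace_ref = trace(inv(Blam'*Blam));    % dense reference, for the check only
[trace_fast, trace_ref]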
Hutchinson and de Hoog [649, 1985] give a similar method for computing the GCV function
for smoothing noisy data with polynomial spline functions of degree 2p − 1. It is based on the
observation that only the elements in the central 2p + 1 bands of the inverse of the influence
function Pλ (3.6.23) are needed. These elements can be computed efficiently from the Cholesky
factor of Pλ . Their algorithm fully exploits the banded structure of the problem and only re-
quires O(p2 m) operations. A Fortran implementation for p = 2 is given in Hutchinson and
de Hoog [650, 1986].
this can be expensive. Great savings can be achieved by an initial transformation into a problem
of standard form. If L has full column rank, such a reduction can easily be made as follows.
Let L = QL RL be the (thin) QR factorization. If L has full column rank, then RL ∈ Rn×n is
nonsingular, and with y = RL x, problem (3.6.34) becomes
\[
\min_y \|\tilde A y - \tilde b\|_2^2 + \lambda \|y\|_2^2, \tag{3.6.35}
\]
where for any given y, z can be determined so that r1 = 0. Thus, the generalized problem is
reduced to the standard form (3.6.35) with
\[
\tilde A = Q_2^T (A L^{\dagger}), \qquad \tilde b = Q_2^T b.
\]
(Awi )T (Awj ) = 0, i ̸= j.
and σi = αi /βi . The second term x2 ∈ N (L) is the unregularized part of the solution. This
GSVD splitting resembles expansion (3.6.17) of the solution for a problem in standard form and
has the property that Ax1 ⊥ Ax2 .
The two components in (3.6.40) are solutions to two independent least squares problems
x2 = W2 z, z = (AW2 )† b. (3.6.42)
Usually, the dimension of N (L) is very small and the cost of computing x2 is negligible. The first
problem in (3.6.41) can be transformed into standard form using the A-weighted pseudoinverse
of L introduced by Eldén [369, 1982],
where I − L† L = PN (L) . Setting x1 = L†A y, we have Lx1 = LL†A y = y, and the first problem
in (3.6.41) becomes
\[
\min_y \|A L_A^{\dagger} y - b\|_2^2 + \lambda^2 \|y\|_2^2. \tag{3.6.45}
\]
and hence
L†A = (I − P )L† , P = W2 (AW2 )† A. (3.6.46)
It can be verified (Hansen [580, 2013]) that P^2 = P and P^T ≠ P. Hence P is an oblique
projector onto N (L) along the A-orthogonal complement of N (L) in Rn ; cf. Theorem 3.1.3. It
can also be shown that L†A satisfies four conditions similar to the Penrose conditions in Theo-
rem 1.2.10.
The amount of work in the above reduction to standard form is often negligible compared
to the amount required for solving the resulting standard form problem. The use of such a
reduction in direct and iterative regularization methods is studied by Hanke and Hansen [570,
1993], where a slightly different implementation is used. The use of transformation to standard
form in iterative regularization methods is treated in Section 6.4.
i.e., all nonzero elements in each row of A lie in at most w contiguous positions. If w ≪ n,
then only a small proportion of the n2 elements are nonzero, and they are located in a band
centered along the principal diagonal. Band linear systems and least squares problems with
small bandwidth w ≪ n arise in applications where each variable xi is coupled to only a few
other variables xj such that |j − i| is small. Clearly, the bandwidth of a matrix depends on the
ordering of its rows and columns.
The lower bandwidth r and upper bandwidth s are the smallest integers such that a_{ij} = 0 whenever i − j > r or j − i > s.
In other words, the number of nonzero diagonals below and above the main diagonal are r and
s, respectively. The maximum number of nonzero elements in any row is w = r + s + 1. For
example, the matrix
a11 a12
a21 a22 a23
a31 a32 a33 a34
(4.1.3)
a42 a43 a44 a45
a53 a54 a55 a56
a64 a65 a66
has r = 2, s = 1, and w = 4. If A is symmetric, then r = s. Several frequently occurring classes
of band matrices have special names, e.g., a matrix for which r = s = 1 is called tridiagonal. If
r = 0, s = 1 (r = 1, s = 0), the matrix is called upper (lower) bidiagonal.
To avoid storage of zero elements, the diagonals of a band matrix can be stored either as
columns in an array of dimension n × w or as rows in an array of dimension w × n. For example,
Definition 4.1.1. If a ∈ Rn is a vector, then A = diag (a, k) is a square matrix of order n + |k|
with the elements of a on its kth diagonal, where k = 0 is the main diagonal, k > 0 is above
the main diagonal, and k < 0 is below the main diagonal. If A is a square matrix of order n,
then diag (A, k) ∈ R^{n−|k|}, |k| < n, is the column vector consisting of the elements of the kth
diagonal of A.
For example, diag (A, 0) is the main diagonal of A, and if 0 ≤ k < n, the kth superdiagonal
and subdiagonal of A are diag (A, k) and diag (A, −k), respectively.
Clearly, the product of two diagonal matrices D1 and D2 is another diagonal matrix whose
elements are equal to the elementwise product of the diagonals. The following elementary but
very useful result shows which diagonals in the product of two square band matrices are nonzero.
Theorem 4.1.2. Let A ∈ R^{n×n} and B ∈ R^{n×n} have lower bandwidths r_1 and r_2 and upper
bandwidths s_1 and s_2, respectively. Then the products AB and BA have lower and upper
bandwidths r_3 ≤ min{r_1 + r_2, n − 1} and s_3 ≤ min{s_1 + s_2, n − 1}, respectively.
Proof. The elements of C = AB are c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. By definition, a_{ik} = 0 if k > i + r_1
and b_{kj} = 0 if j > k + r_2. It follows that a_{ik} b_{kj} = 0 unless k ≤ i + r_1 and j ≤ k + r_2. But this
implies that k + j ≤ i + r_1 + k + r_2, or j ≤ i + (r_1 + r_2), i.e., C has bandwidth at most r_1 + r_2.
The second case follows from the observations that if a matrix has lower bandwidth r, then AT
has upper bandwidth r, and that (AB)T = B T AT .
Note that Theorem 4.1.2 holds also for negative values of the bandwidths. For example, a
strictly upper triangular matrix A can be said to have lower bandwidth r = −1. It follows that
A2 has lower bandwidth r = −2, etc., and An = 0.
When the bandwidth of A ∈ Rn×n and B ∈ Rn×n are small compared to n the usual algo-
rithms for forming the product AB are not effective on vector computers. Instead, the product
can be formed by writing A and B as a sum of their diagonals and multiplying crosswise. For
example, if A and B are tridiagonal, then by Theorem 4.1.2 the product C = AB has upper and
lower bandwidths two. The five nonzero diagonals of C can be computed by 3^2 = 9 pointwise
vector multiplications independent of n.
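A quick illustration of Theorem 4.1.2 in MATLAB (the data are arbitrary): the product of two tridiagonal matrices has five nonzero diagonals.

n = 10; e = ones(n,1);
A = spdiags([e 2*e e], -1:1, n, n);      % tridiagonal
B = spdiags([e 3*e e], -1:1, n, n);      % tridiagonal
C = A*B;
[r3, s3] = bandwidth(C)                  % both equal 2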
The original system has been reduced to two smaller sets of equations. Hence, only the diagonal
blocks A11 and A22 need to be factorized. If again A11 or A22 is reducible, then such a reduction
can be carried out again. This can be continued until a triangular block form with irreducible
diagonal blocks is obtained. This observation motivates the term reducible.
It is well known (see Duff, Erisman, and Reid [344, 1986]) that the inverse of any irreducible
matrix A is structurally full, i.e., it is always possible to find numerical values such that all entries
in A−1 will be nonzero. In particular, the inverse of an irreducible band matrix in general has no
zero elements. Therefore, it is important to avoid computing the inverse explicitly. Even storing
the elements of A−1 may be infeasible. However, if the LU factorization of A can be carried out
without pivoting, then the band structure in A is preserved in the LU factors.
Theorem 4.1.4. Let A have lower bandwidth r and upper bandwidth s, and assume that the
factorization A = LU exists. Then, if the LU factorization is carried out without
row and column permutations, L will have lower bandwidth r and U will have upper bandwidth s.
Proof. The proof is by induction. Assume that the first k − 1 columns of L and rows of U have
bandwidths r and s. Then for p = 1 : k − 1,
\[
l_{ip} = 0, \quad i > p + r, \qquad u_{pj} = 0, \quad j > p + s. \tag{4.1.5}
\]
The assumption is trivially true for k = 1. Since a_{kj} = 0 for j > k + s, (4.1.5) yields
\[
u_{kj} = a_{kj} - \sum_{p=1}^{k-1} l_{kp} u_{pj} = 0 - 0 = 0, \qquad j > k + s.
\]
Similarly, it follows that lik = 0, i > k + r, which completes the induction step.
An important but hard problem is to find a reordering of the rows and columns of A that
minimizes the bandwidth of the LU or Cholesky factors. However, there are heuristic algorithms
that give almost optimal results; see Section 5.1.5.
When A is tridiagonal,
\[
A = \begin{pmatrix}
a_1 & c_2 & & & \\
b_2 & a_2 & c_3 & & \\
 & \ddots & \ddots & \ddots & \\
 & & b_{n-1} & a_{n-1} & c_n \\
 & & & b_n & a_n
\end{pmatrix}, \tag{4.1.6}
\]
its LU factorization can be computed by setting d_1 = a_1 and
\[
\gamma_k = b_k / d_{k-1}, \qquad d_k = a_k - \gamma_k c_k, \qquad k = 2 : n. \tag{4.1.7}
\]
Here γ_k and d_k can overwrite b_k and a_k, respectively. The solution to the system Ax = L(Ux) =
g is then obtained by solving Ly = g by forward substitution, y_1 = g_1,
\[
y_i = g_i - \gamma_i y_{i-1}, \qquad i = 2, \ldots, n, \tag{4.1.8}
\]
followed by solving Ux = y by back-substitution, x_n = y_n/d_n,
\[
x_i = (y_i - c_{i+1} x_{i+1})/d_i, \qquad i = n-1, \ldots, 1.
\]
The total number of flops for the factorization is about 3n and is 2.5n for the solution of the
triangular systems. Note that the divisions in the substitution can be avoided if (4.1.7) is modified
to compute d_k^{-1}. This may be more efficient because on many computers a division takes more
time than a multiplication.
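A minimal MATLAB sketch of the factorization (4.1.7) and the two substitutions (our own test data), checked against a dense solve:

n = 6;
a = 4*ones(n,1); b = [0; -ones(n-1,1)]; c = [0; -ones(n-1,1)];  % b_k, c_k for k = 2:n
g = randn(n,1);
d = zeros(n,1); gam = zeros(n,1); d(1) = a(1);
for k = 2:n
    gam(k) = b(k)/d(k-1);  d(k) = a(k) - gam(k)*c(k);           % (4.1.7)
end
y = zeros(n,1); y(1) = g(1);
for i = 2:n,  y(i) = g(i) - gam(i)*y(i-1);  end                 % forward substitution (4.1.8)
x = zeros(n,1); x(n) = y(n)/d(n);
for i = n-1:-1:1,  x(i) = (y(i) - c(i+1)*x(i+1))/d(i);  end     % back-substitution
norm(x - (diag(a) + diag(b(2:n),-1) + diag(c(2:n),1)) \ g)      % check against dense solve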
Lemma 4.1.5. Assume that A ∈ Rm×n has row bandwidth w. Then ATA has upper and lower
bandwidths s = r = w − 1.
Proof. From the definition (4.1.1) of bandwidth it follows that aij aik ̸= 0 ⇒ |j − k| < w. This
implies that
\[
|j - k| \ge w \;\Longrightarrow\; (A^T A)_{jk} = \sum_{i=1}^{m} a_{ij} a_{ik} = 0,
\]
and hence s ≤ w − 1.
Theorem 4.1.6. Let C = LLT be the Cholesky factorization of the symmetric positive definite
band matrix C. Then the symmetric matrix L + LT inherits the band structure of C.
The next algorithm computes the Cholesky factor L of a symmetric (Hermitian) positive
definite matrix C using a column sweep ordering. Recall that no pivoting is needed for stability.
Only the lower triangular part of C is used.
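A minimal MATLAB sketch of such a column-sweep band Cholesky factorization (our own formulation, not the book's listing):

function L = bandchol(C, r)
    % Band Cholesky C = L*L' for symmetric positive definite C with
    % lower bandwidth r; only the lower triangle of C is referenced.
    n = size(C,1);  L = zeros(n);
    for j = 1:n
        p = max(1, j-r);                       % first column that can contribute
        i = j:min(j+r, n);                     % rows within the band
        v = C(i, j) - L(i, p:j-1) * L(j, p:j-1)';
        L(j,j) = sqrt(v(1));
        L(j+1:min(j+r,n), j) = v(2:end) / L(j,j);
    end
end

For example, with C = full(gallery('tridiag', 8)) and r = 1, bandchol(C, 1) agrees with chol(C, 'lower') to roundoff.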
If r ≪ n, then this algorithm requires about nr(r+3) flops and n square roots to compute the
Cholesky factor L. When r ≪ n this is much less than the n^3/3 flops required in the full case.
In the semidefinite case, diagonal pivoting is required, which can destroy the band structure.
The least squares solution is obtained by solving the triangular systems Ly = c = A^T b and
L^T x = y by forward- and back-substitution:
\[
y_i = \Bigl( c_i - \sum_{j=p}^{i-1} r_{ji} y_j \Bigr) / r_{ii}, \qquad i = 1, \ldots, n, \quad p = \max(1, i - r),
\]
\[
x_i = \Bigl( y_i - \sum_{j=i+1}^{q} r_{ij} x_j \Bigr) / r_{ii}, \qquad i = n, \ldots, 1, \quad q = \min(i + r, n).
\]
Efficient band versions of forward- and back-substitution can be derived. Each requires
about 2(2n − r)(r + 1) ≈ 4nr flops and can be organized so that y and, finally, x overwrite
c in storage. Thus, if full advantage is taken of the band structure of the matrices involved, the
solution of a least squares problem where A has bandwidth w ≪ n requires a total of about
(m + n)w2 + 4nw flops.
Let A be symmetric positive definite and tridiagonal as in (4.1.6) with ci = bi , i = 2, . . . , n,
and write the Cholesky factorization in symmetric form as A = LDLT , D = diag (d1 , . . . , dn ).
Then the elements in D and L are obtained as follows. Set d1 = a1 , and
γk = bk /dk−1 , dk = ak − γk bk , k = 2, . . . , n. (4.1.10)
Eliminating γk gives
dk = ak − b2k /dk−1 , k = 2, . . . , n. (4.1.11)
Sometimes it is more convenient to set LD = U T and determine the factorization A = U T D−1 U .
are given in Section 2.1.5. In the case when R is banded, or generally sparse, an algorithm by
Golub and Plemmons [505, 1980] can be used with great savings to compute all elements of Cx
in positions where R has nonzero elements. This includes the diagonal elements of C_x, which give
the variances of the least squares solution x. We denote by K the index set {(i, j) | r_{ij} ≠ 0}.
Note that because R is the Cholesky factor of A^TA, its structure is such that (i, j) ∈ K and
(i, k) ∈ K imply that (j, k) ∈ K if j < k and (k, j) ∈ K if j > k. From (4.1.12) it follows that
\[
R\, C_x = R^{-T}, \tag{4.1.13}
\]
where R^{-T} is lower triangular with diagonal elements 1/r_{ii}, i = 1, \ldots, n. Equating the last
columns of the identity (4.1.13) gives
\[
R c_n = r_{nn}^{-1} e_n, \qquad e_n = (0, \ldots, 0, 1)^T,
\]
where cn is the last column of Cx . From this equation the elements in cn can be computed by
back-substitution, giving c_{nn} = r_{nn}^{-2} and
\[
c_{in} = -r_{ii}^{-1} \sum_{\substack{j=i+1 \\ (i,j)\in K}}^{n} r_{ij} c_{jn}, \qquad i = n-1, \ldots, 1. \tag{4.1.14}
\]
By symmetry, c_{ni} = c_{in}, i = 1, . . . , n − 1, so the last row of C_x is also determined. We only need
to save the elements whose row indices are greater than or equal to that of the first nonzero
element in the kth column of R.
Now assume that the elements cij , (i, j) ∈ K, in columns j = n, . . . , k + 1 have been
computed. Then
\[
c_{kk} = r_{kk}^{-1} \Bigl[ r_{kk}^{-1} - \sum_{\substack{j=k+1 \\ (k,j)\in K}}^{n} r_{kj} c_{kj} \Bigr]. \tag{4.1.15}
\]
Similarly, for i = k − 1, . . . , f_k,
\[
c_{ik} = -r_{ii}^{-1} \Bigl[ \sum_{\substack{j=i+1 \\ (i,j)\in K}}^{k} r_{ij} c_{jk} + \sum_{\substack{j=k+1 \\ (i,j)\in K}}^{n} r_{ij} c_{kj} \Bigr], \qquad (i, k) \in K. \tag{4.1.16}
\]
If the Cholesky factor R has bandwidth p, then the elements of C_x in the p + 1 bands of the
upper triangular part of C_x can be computed by the above algorithm in about (2/3)np(p + 1) flops.
An important particular case is when R is bidiagonal. Computing the two diagonals of C_x then
requires only about (4/3)n flops.
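A sketch of the bidiagonal case in MATLAB (our own data), checked against a dense inverse:

n = 8;
r  = 1 + rand(n,1);                      % diagonal r_11, ..., r_nn
e  = rand(n-1,1);                        % superdiagonal r_12, ..., r_{n-1,n}
cd = zeros(n,1); cs = zeros(n-1,1);      % cd(k) = c_kk, cs(k) = c_{k,k+1}
cd(n) = 1/r(n)^2;
for k = n-1:-1:1
    cs(k) = -(e(k)/r(k)) * cd(k+1);                  % c_{k,k+1}, cf. (4.1.16)
    cd(k) = (1/r(k)) * (1/r(k) - e(k)*cs(k));        % c_kk, cf. (4.1.15)
end
R = diag(r) + diag(e,1);
C = inv(R'*R);                                       % dense reference, check only
[norm(cd - diag(C)), norm(cs - diag(C,1))]           % both should be tiny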
A(k+1) = Pk A(k) , k = 1, . . . , n,
where the sequence of Householder reflections Pk is chosen to annihilate the subdiagonal ele-
ments in the kth column of A(k) . As shown by Reid [919, 1967] this will cause each column in
the remaining unreduced part of the matrix that has a nonzero inner product with the column be-
ing reduced to take on the sparsity pattern of their union. Hence, even though the final R retains
the bandwidth of A, large intermediate fill can take place with consequent cost in operations and
i ≤ k ⇒ fi (A) ≤ fk (A).
\[
A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_q \end{pmatrix}, \qquad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_q \end{pmatrix}, \tag{4.1.17}
\]
where Ak ∈ R(mk ,n) and m = m1 + · · · + mq . Here Ak consists of all rows of A for which
the first nonzero element is in column k, k = 1, . . . , q. The row ordering within the blocks Ak
may be specified by sorting the rows so that the column indices li (Ak ) within each block form a
nondecreasing sequence. In many applications, A is naturally given in standard form.
The structure of ATA, and therefore of R, can be generated as follows.
Theorem 4.1.7. Assume that A ∈ Rm×n is a band matrix in standard form partitioned as in
(4.1.17), and let wk be the bandwidth of the block Ak . Then the band structure of the upper
triangular Cholesky factor R is given by l1 (R) = w1 ,
The first efficient QR algorithm for band matrices was given in Chapter 27 of Lawson and
Hanson [727, 1995]. It uses Householder transformations and assumes that A is partitioned in the
form (4.1.17). First, R = R0 is initialized to be an empty upper triangular matrix of bandwidth
w. The QR factorization proceeds in q steps, k = 1, 2, . . . , q. In step k the kth block Ak is
merged as follows into the previously computed upper triangular (Rk−1 , dk−1 ) by an orthogonal
transformation:
\[
Q_k^T \begin{pmatrix} R_{k-1} \\ A_k \end{pmatrix} = \begin{pmatrix} R_k \\ 0 \end{pmatrix},
\]
where Qk is a product of Householder reflections giving an upper trapezoidal Rk . Note that
this and later steps do not involve the first k − 1 rows and columns of Rk−1 . Hence, at the
beginning of step k the first k − 1 rows of Rk−1 are rows in the final matrix R. At termination
we have obtained (R_q, d_q) such that R_q x = d_q. This Householder algorithm uses a total of about
2w(w + 1)(m + 3n/2) flops.
Example 4.1.8. The least squares approximation of a discrete set of data by a linear combination
of cubic B-splines is often used for smoothing noisy data values observed at m distinct points;
see Reinsch [922, 1967] and Craven and Wahba [278, 1979]. Let s(t) = \sum_{j=1}^{n} x_j B_j(t), where
Figure 4.1.1. The matrix A after reduction of the first k = 3 blocks using Householder reflections.
We now describe an alternative algorithm using plane rotations for the QR factorization of a
band matrix.
for i = 1, 2, . . . , m
    for j = f_i(A), . . . , min{f_i(A) + w − 1, n}
        if a_{ij} ≠ 0 then
            [c, s] = givrot(r_{jj}, a_{ij});
            \begin{pmatrix} r_j^T \\ a_i^T \end{pmatrix} := \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} r_j^T \\ a_i^T \end{pmatrix};
        end
    end
end
The reduction is shown schematically in Figure 4.1.1. The ith step only involves the w × w
upper triangular part of Ri−1 formed by rows and columns fi (A) to li (A). If at some stage
rii = 0, then the whole ith row in Ri−1 must be zero, and the remaining part of the current row
aTi can just be copied into row i of Ri−1 .
If A has constant bandwidth and is in standard form, then at step i the last (n−li (A)) columns
of R have not been touched and are still zero as initialized. Furthermore, at this stage the first
(fi (A) − 1) rows are final rows of R and can be saved on secondary storage. Since primary
storage is only needed for the active triangular part, shown in Figure 4.1.2, very large problems
can be handled. Clearly, the number of plane rotations needed to process the ith row is at most
min(i − 1, w) and requires at most 4w^2 flops. Hence, the complete factorization requires 4mw^2
flops and w(w + 3)/2 locations of storage. We remark that if A is not in standard form, then the
operation count can only be bounded by 4mnw flops.
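A minimal dense-storage MATLAB sketch of this row-oriented Givens QR factorization (our own code; for simplicity it loops over all columns instead of exploiting the band limits):

function [R, c] = rowgivensqr(A, b)
    % Merge the rows of A one at a time into an upper triangular R,
    % updating the right-hand side b alongside.
    [m, n] = size(A);
    R = zeros(n, n);  c = zeros(n, 1);
    for i = 1:m
        ai = A(i,:);  bi = b(i);
        for j = 1:n
            if ai(j) == 0, continue, end
            if R(j,j) == 0
                R(j,:) = ai;  c(j) = bi;       % empty row of R: copy and stop
                break
            end
            G = planerot([R(j,j); ai(j)]);     % 2-by-2 Givens rotation
            T = G*[R(j,:); ai];   R(j,:) = T(1,:);  ai = T(2,:);  ai(j) = 0;
            t = G*[c(j); bi];     c(j)   = t(1);    bi = t(2);
        end
    end
end

For a matrix of full column rank, R\c then agrees with the least squares solution A\b.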
The solution to a band least squares problem minx ∥Ax − b∥2 is obtained by applying QR
factorization to the extended matrix ( A b ). The solution is then obtained from a band upper
triangular system Rx = c1 . In order to treat additional right-hand sides not available at the time
of factorization, the plane rotations from the QR factorization need to be saved. As described
in Section 2.2.1, a plane rotation can be represented by a single floating-point number. Since at
most w rotations are needed to process each row, it follows that Q can be stored in no more space
than that allocated for A.
4.2 Bidiagonalization
4.2.1 Bidiagonal Decomposition
For a general m×n matrix A, the closest-to-diagonal form that can be achieved in a finite number
of operations is a bidiagonal form. Golub and Kahan [495, 1965] show that this form can be
obtained by a sequence of two-sided Householder reflections B = U TAV , where U ∈ Rm×m
and V ∈ Rn×n are orthogonal. This preserves the singular values of A, and the singular vectors
of B are closely related to those of A. Householder bidiagonalization (HHBD) is often the first
step toward computing the SVD (singular value decomposition); see Section 7.1.1.
Theorem 4.2.1 (HHBD). For any matrix A ∈ Rm×n , orthogonal matrices U = (u1 , . . . , um ) ∈
Rm×m and V = (v1 , . . . , vn ) ∈ Rn×n can be found such that U TAV is upper bidiagonal. If
m ≥ n, then
\[
U^T A V = \begin{pmatrix} B \\ 0 \end{pmatrix}, \qquad
B = \begin{pmatrix}
\beta_1 & \alpha_2 & & & \\
 & \beta_2 & \alpha_3 & & \\
 & & \ddots & \ddots & \\
 & & & \beta_{p-1} & \alpha_p \\
 & & & & \beta_p
\end{pmatrix} \in \mathbb{R}^{p\times p}. \tag{4.2.1}
\]
Proof. U and V are constructed as products of Householder reflectors Qk ∈ Rm×m from the
left and Pk ∈ Rn×n , k = 1, . . . , p, from the right, applied alternately to A. Here Qk is chosen to
zero out all elements in the kth column below the main diagonal, and Pk is chosen to zero out all
elements in the kth row to the right of B. First, Q1 is applied to A to zero out all elements, except
one in the first column of Q1 A. When P1 is next applied to zero out the elements in the first row
not in B, the first column in Q1 A is left unchanged. These two transformations determine β1
and α2 . All later steps are similar. With A(1) = A, set A(k+1) = (Qk A(k) )Pk . This determines
not only the bidiagonal elements βk and αk+1 in the kth row of B but also the kth columns in U
and V :
uk = U ek = Q1 . . . Qk ek , vk = V ek = P1 . . . Pk ek , k = 1, . . . , p. (4.2.2)
Here, some of the transformations may be skipped. For example, if m = n, then Qn = In and
Pn−1 = Pn = In . Similarly, a complex matrix A ∈ Cm×n can be reduced to real bidiagonal
form using a sequence of complex Householder reflections.
By applying the HHBD algorithm of Theorem 4.2.1 to AT , any matrix A ∈ Rm×n can be
transformed into lower bidiagonal form. (This is equivalent to starting the reduction of A with a
right transformation P1 instead of with Q1 .)
HHBD requires approximately 4(mn^2 − n^3/3) flops, or roughly twice as many as needed for
Householder QR factorization. If the matrices U_p = (u_1, . . . , u_p) and/or V_p = (v_1, . . . , v_p) are
wanted, the corresponding products of Householder reflections can be accumulated at a cost of
2(mn^2 − n^3/3) and (4/3)n^3 flops, respectively. For a square matrix, these counts are (8/3)n^3 for the
reduction and (4/3)n^3 for computing each of the matrices U and V.
HHBD is a backward stable algorithm, i.e., the computed matrix B is the exact result for a
matrix A + E, where
∥E∥F ≤ cn2 u∥A∥F , (4.2.3)
and c is a constant of order unity. Further, if the stored Householder vectors are used to generate
U and V explicitly, the computed matrices are close to the exact matrices U and V that reduce
A + E. This guarantees that the computed singular values of B and the transformed singular
vectors of B are those of a matrix close to A.
An alternative two-step reduction to bidiagonal form was suggested by Lawson and Han-
son [727, 1995] and later analyzed by T. Chan [222, 1982]. This first computes the pivoted QR
factorization
\[
A P = Q \begin{pmatrix} R \\ 0 \end{pmatrix}
\]
and then transforms the upper triangular R ∈ Rn×n into bidiagonal form. Column pivoting in
the initial QR factorization makes the two-step algorithm potentially more accurate. Householder
bidiagonalization cannot take advantage of the triangular structure of R, but with plane rotations
it can be done. In the first step the elements r1n , . . . , r13 in the first row are annihilated in this
order. To zero out element r1j a plane rotation Gj−1,j is first applied from the right. This
introduces a new nonzero element rj,j−1 , which is annihilated by a rotation G̃j−1,j from the left.
The first few rotations in the process are pictured below. (Recall that ⊕ denotes the element to
be zeroed, and fill-in elements are denoted by +.)
[Schematic omitted: the right rotation zeroes the element ⊕ in the first row; the fill-in element + it creates below the diagonal is immediately annihilated by the corresponding left rotation.]
Two plane rotations are needed to zero each element. The total cost is 6n flops, and the operation
count for the reduction of R to bidiagonal form is 2n^3 flops, which is lower than for the Householder
algorithm. However, if the product of either the left or right transformation is needed, then
Householder reduction requires less work; see Chan [222, 1982]. A similar “zero chasing” tech-
nique can be used to reduce the bandwidth of an upper triangular band matrix while preserving
the band structure; see Golub, Luk, and Overton [498, 1981].
The bidiagonal decomposition is a powerful tool for analyzing and solving least squares prob-
lems. Let U T AV = C, where U and V are square orthogonal matrices. Then the pseudoinverses
of A and C are related as A† = V C † U T , and the pseudoinverse solution is
x = A† b = V (C † (U T b)). (4.2.4)
Clearly, the minimum of ∥Ax − b∥22 is obtained when By = c. The least squares solution
x = V y = P1 · · · Pn−2 y is computed by forming U T b = Qn · · · Q1 b and solving By = c by
back-substitution:
is given by Dongarra, Sorensen, and Hammarling [327, 1989]; see also Dongarra et al. [323,
2018, Algorithm 1]. Householder reflectors Hk = I − τk vk vkT are used to eliminate elements
below the diagonal in column k, while G_k = I − π_k u_k u_k^T eliminates elements to the right of the
superdiagonal in row k:
Setting x = V y, the problem (4.2.8) splits into two independent approximation subproblems
\[
\min_{y_1} \|A_{11} y_1 - c_1\|_2, \qquad \min_{y_2} \|A_{22} y_2\|_2, \qquad
y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \tag{4.2.10}
\]
where A11 and A22 may be rectangular. Clearly, the solution to the second subproblem is y2 = 0.
Hence, all information needed for solving the original approximation problem is contained in the
first subproblem. The pseudoinverse solution x† of the original problem (4.2.8) is related to the
pseudoinverse solution y † of the transformed problem (4.2.10) by
x† = V1 y1† , V = ( V1 V2 ) . (4.2.11)
Definition 4.2.2. The subproblem miny ∥A11 y − c1 ∥2 in (4.2.10) is said to be a core problem
of minx ∥Ax − b∥2 if A11 is minimally dimensioned for all orthogonal U and V that give the
form (4.2.9).
We now show that a core subproblem can be found by bidiagonalization of the compound
matrix ( b A ), where A ∈ Rm×n , m > n. (Note that here b is placed in front of A, con-
trary to our previous practice!) This yields an upper bidiagonal decomposition U^T ( b \;\; A ) \tilde V.
Because \tilde V does not act on the first column b, we can write
\[
U^T \begin{pmatrix} b & A \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & V \end{pmatrix}
= \begin{pmatrix} \beta_1 e_1 & U^T A V \end{pmatrix}, \tag{4.2.12}
\]
where
\[
U^T A V = \begin{pmatrix} B \\ 0 \end{pmatrix}, \qquad
B = \begin{pmatrix}
\alpha_1 & & & \\
\beta_2 & \alpha_2 & & \\
 & \ddots & \ddots & \\
 & & \beta_n & \alpha_n \\
 & & & \beta_{n+1}
\end{pmatrix} \in \mathbb{R}^{(n+1)\times n}. \tag{4.2.13}
\]
The bidiagonalization process is terminated when the first zero bidiagonal element in B is en-
countered. Then a core problem has been found. There are two possible cases. If αk > 0 but
βk+1 = 0, k ≤ n, then the transformed matrix splits as follows:
\[
U^T \begin{pmatrix} b & AV \end{pmatrix} = \begin{pmatrix} \beta_1 e_1 & B_{11} & 0 \\ 0 & 0 & A_{22} \end{pmatrix}, \tag{4.2.14}
\]
where
\[
B_{11} = \begin{pmatrix}
\alpha_1 & & & \\
\beta_2 & \alpha_2 & & \\
 & \ddots & \ddots & \\
 & & \beta_k & \alpha_k
\end{pmatrix} \in \mathbb{R}^{k\times k} \tag{4.2.15}
\]
has full row rank, and A22 is the part of A that is not yet in bidiagonal form. The pseudoinverse
solution x = V y is now found by solving the square nonsingular lower bidiagonal core system
B11 y1 = β1 e1 (4.2.16)
and setting x† = V1 y1 ; see (4.2.11). Because the residual for (4.2.16) is zero, it follows that the
original system Ax = b must be consistent.
If instead the bidiagonalization terminates with βk+1 > 0 and αk+1 = 0, then the first
k columns of the transformed matrix split with a rectangular lower bidiagonal matrix of full
column rank,
\[
B_{11} = \begin{pmatrix}
\alpha_1 & & & \\
\beta_2 & \alpha_2 & & \\
 & \ddots & \ddots & \\
 & & \beta_k & \alpha_k \\
 & & & \beta_{k+1}
\end{pmatrix} \in \mathbb{R}^{(k+1)\times k}. \tag{4.2.17}
\]
The solution can be found efficiently by a QR factorization of B11 . Proof that systems (4.2.16)
and (4.2.18) are minimally dimensioned, and therefore core problems, is given by Paige and
Strakoš [868, 2006, Theorems 3.2 and 3.3].
Core problems play an important role in a unified treatment of the scaled TLS (STLS) problem;
see Paige and Strakoš [867, 2002], [868, 2006]. Hnětynková et al. [630, 2011] give a new clas-
sification of multivariate TLS problems AX ≈ B that reveals subtleties not captured previously.
The definition of the core problem is extended to the multiple right-hand side case AX ≈ B by
Plešinger [899, 2008] and Hnětynková, Plešinger, and Strakoš [632, 2013], [633, 2015].
where some of the zero blocks may be empty. Assume now that B is lower bidiagonal,
\[
B = \begin{pmatrix}
\alpha_1 & & & \\
\beta_2 & \alpha_2 & & \\
 & \beta_3 & \ddots & \\
 & & \ddots & \alpha_n \\
 & & & \beta_{n+1}
\end{pmatrix}.
\]
Given an initial unit vector u1 ∈ Rm , this yields an iterative process for generating U and V
columnwise and B rowwise from
\[
\alpha_1 v_1 = A^T u_1, \qquad \beta_{k+1} u_{k+1} = A v_k - \alpha_k u_k, \tag{4.2.23}
\]
\[
\alpha_{k+1} v_{k+1} = A^T u_{k+1} - \beta_{k+1} v_k, \qquad k = 1, 2, \ldots, \tag{4.2.24}
\]
where β_{k+1} and α_{k+1} are normalization constants. As long as no zero bidiagonal element occurs,
the choice of u_1 uniquely determines this process. It terminates with a core subproblem when
the first zero bidiagonal element is encountered. After k steps the recurrence relations (4.2.23)–
(4.2.24) can be summarized in matrix form as
\[
A V_k = U_{k+1} B_k, \qquad A^T U_{k+1} = V_k B_k^T + \alpha_{k+1} v_{k+1} e_{k+1}^T.
\]
The Bidiag1 algorithm of Paige and Saunders [866, 1982] is obtained for the special starting
vector
u1 = b/β1 , β1 = ∥b∥2 . (4.2.27)
In exact arithmetic, it generates the same lower bidiagonal decomposition of A as HHBD, trans-
forming ( b A ) into upper bidiagonal form. Bidiag1 is the basis of the important iterative least
squares algorithm LSQR of Paige and Saunders [866, 1982].
A similar GKLBD process can be devised for the case when U T AV = R is upper bidiagonal,
\[
R = \begin{pmatrix}
\rho_1 & \gamma_2 & & & \\
 & \rho_2 & \gamma_3 & & \\
 & & \ddots & \ddots & \\
 & & & \rho_{n-1} & \gamma_n \\
 & & & & \rho_n
\end{pmatrix}. \tag{4.2.28}
\]
Given an initial unit vector v1 ∈ Rn , equating columns in (4.2.19)–(4.2.20) yields Av1 = ρ1 u1 ,
AT uk = ρk vk + γk+1 vk+1 ,
Avk = γk uk−1 + ρk uk , k = 1, 2, . . . , n − 1.
These equations can be used to generate not only successive columns in U and V but also the
columns of R. As long as no zero bidiagonal element is encountered, we have
\[
\gamma_{k+1} v_{k+1} = A^T u_k - \rho_k v_k, \qquad
\rho_{k+1} u_{k+1} = A v_{k+1} - \gamma_{k+1} u_k, \qquad k = 1, 2, \ldots.
\]
These equations uniquely determine vk+1 and uk+1 as well as γk+1 and ρk+1 . After k steps
this process has generated Uk ∈ Rm×k and Vk ∈ Rn×k and a square upper bidiagonal matrix
Rk ∈ Rk×k such that
\[
A V_k = U_k R_k, \tag{4.2.31}
\]
\[
A^T U_k = V_k R_k^T + \gamma_{k+1} v_{k+1} e_k^T. \tag{4.2.32}
\]
Taking the initial vector to be v1 = AT b/γ1 , γ1 = ∥AT b∥2 , gives the Bidiag2 algorithm of Paige
and Saunders [866, 1982]. In exact arithmetic, this generates the same quantities as HHBD,
transforming ( AT b AT ) into upper bidiagonal form.
Such subspaces with C = AT A or C = AAT play a fundamental role in methods for solving
large-scale least squares problems.
There can be at most n linearly independent vectors in Rn . Hence, in any Krylov sequence
v, Cv, C 2 v, C 3 v, . . . there is a first vector C p+1 v, p ≤ n, that is a linear combination of the
preceding ones. Then
Kp+1 (C, v) = Kp (C, v), (4.2.34)
and the Krylov sequence terminates. The maximum dimension p of Kk (C, v) is called the grade
of v with respect to C. From (4.2.34) it follows that at termination, Kp (C, v) is an invariant
subspace of C. Conversely, if the vector v lies in an invariant subspace of C of dimension p, its
Krylov sequence terminates for k = p.
Krylov subspaces satisfy the following easily verified invariance properties:
Theorem 4.2.3. As long as no zero bidiagonal element is encountered, the orthonormal vectors
(u_1, . . . , u_k) and (v_1, . . . , v_k) generated by Bidiag1 are bases for the Krylov subspaces
K_k(AA^T, b) and K_k(A^TA, A^T b), respectively.
By Theorem 4.2.3 the matrices Uk and Vk can also be obtained by Gram–Schmidt orthogo-
nalization of the Krylov sequences
Hence, the uniqueness of the bases is a consequence of the uniqueness (up to a diagonal scaling
with elements ±1) of the QR factorization of a real matrix of full column rank.
Bidiag1 terminates for some k ≤ rank(A) when the subspaces Kk (ATA, AT b) have reached
maximum rank. Then the subspaces
have also reached maximal rank. At termination the pseudoinverse solution xk = Vk yk is ob-
tained for some yk ∈ Rk , where R(Vk ) = Kk (ATA, AT b).
Bidiag2 terminates with βk+1 = 0 for some k ≤ rank(A) when the subspaces Kk (ATA, AT b)
have reached maximum rank. Then the subspaces Kk (AAT , AAT b) = AKk (ATA, AT b) have
also reached maximal rank. Bidiag2 generates vectors giving orthogonal bases for
Bidiag1 and Bidiag2 generate the same Vk , but Uk differs. The upper bidiagonal matrix Rk in
Bidiag2 is the same as the matrix obtained by QR factorization of the lower bidiagonal Bk in
Bidiag1; see Paige and Saunders [866, 1982].
Collecting previous results we can now state the following.
Theorem 4.2.4 (Krylov Subspace Approximations). Let p be the grade of AT b with respect to
A^TA. Then the projected least squares problems
\[
\min_{y \in \mathbb{R}^k} \|A V_k y - b\|_2, \qquad x_k = V_k y, \qquad k = 1, \ldots, p,
\]
have full rank. The solutions x_k are uniquely determined, and the residuals satisfy r_k = b −
Axk ⊥ Kk (AAT , AAT b). Independent of b and the size or rank of A, the Krylov subspace
approximations terminate with the pseudoinverse solution xp = A† b for some p ≤ rank(A).
From the nested property of the Krylov subspaces Kk (ATA, AT b), k = 1, 2, . . . , it follows
that the sequence of residual norms ∥rk ∥2 , k = 1, 2, . . . , are monotonically decreasing. For
k < p the Krylov subspace approximations depend nonlinearly on b in a highly complicated
way.
At termination, all bidiagonal elements in B and R of Bidiag1 and Bidiag2 are positive. Such
bidiagonal matrices are said to be unreduced and have the following property.
Lemma 4.2.5. For any unreduced bidiagonal matrix B, all singular values σi must be distinct.
Theorem 4.2.6. Let A have p distinct, possibly multiple, nonzero singular values. Then, Bidiag1
and Bidiag2 terminate with an unreduced bidiagonal matrix Bk = UkT AVk of rank k ≤ p.
Proof. Since Bk is unreduced, it follows from Lemma 4.2.5 that all its singular values must
be distinct: σ1 > σ2 > · · · > σk > 0. But these are also singular values of A, and hence
k ≤ p.
Theorem 4.2.6 states that if A has singular values of multiplicity greater than one, then
Bidiag1 and Bidiag2 terminate in fewer than rank(A) steps. For example, let A = I_n + uv^T be square with
u, v ̸= 0, and rank(A) = n. The singular values of A are the square roots of the eigenvalues
of ATA = In + uv T + vuT + (uT u)vv T . AT A has one eigenvalue equal to 1 of multiplicity
(n − 2) corresponding to eigenvectors x that are orthogonal to u and v. Hence, A has at most
three distinct singular values, and, in exact arithmetic, bidiagonalization must terminate after at
most three steps.
If b is orthogonal to the left singular subspaces corresponding to some of the singular values,
Bidiag1 terminates in k ≤ p steps; see Björck [137, 2014, Lemma 2.1].
The process terminates when either ∥v̂k ∥2 or ∥ûk ∥2 is zero. Note that if uTk Ak−1 vk ̸= 0, the
rank of Ak is exactly one less than that of Ak−1 .
In the PLS literature, uk are called score vectors, vk loading weights, and pk loading vec-
tors. Summing the equations in (4.2.41) gives
NIPALS-PLS uses three matrix-vector products and one rank-one deflation that together re-
quire 8mn flops per PLS factor. The flop counts for the additional scalar products and the final
back-substitution for solving the upper triangular systems are negligible in comparison. For
k ≪ min(m, n) this is the same number of flops per step as required using Householder bidiag-
onalization.
In exact arithmetic, NIPALS-PLS computes the sequence of Krylov subspace approximations
defined in Theorem 4.2.4. The following important result is due to Eldén [375, 2004].
Theorem 4.2.7. The vectors {v1 , . . . , vk } and {u1 , . . . , uk } generated by NIPALS-PLS form
orthonormal bases for Kk (ATA, AT b) and Kk (AAT , AAT b), respectively. In exact arithmetic,
these vectors are the same as the columns of Vk and Uk from the Bidiag2 algorithm. It also
follows that PkT Vk is an upper bidiagonal matrix, and xk is the kth Krylov subspace approx-
imation.
accuracy in the computed xk . It is unfortunate that omitting the deflation of b seems to have be-
come the norm rather than the exception. For example, Manne [771, 1987] proposes a marginally
faster PLS algorithm by omitting the deflation of b. The implementation of NIPALS tested by
Andersson [29, 2009] also omits this deflation. Unfortunately, this practice has spread even to
some commercial statistical software packages.
In the context of iterative methods for least squares problems, the Bidiag2 process is known
to lead to a less stable algorithm than Bidiag1; see Paige and Saunders [866, 1982]. For PLS
it is more direct to use the Bidiag2 process. In the implementation below, orthogonality in the
computed basis vectors uk and vk is preserved by reorthogonalizing the new basis vectors uk+1
and vk+1 against all previous basis vectors. Then there is no difference in stability between
Bidiag1 and Bidiag2, as confirmed by tests in Björck and Indahl [148, 2017]. The additional
cost of reorthogonalizing is about 4(m + n)k 2 flops for k factors. Unless k is very large, this
overhead is acceptable.
function [x,U,B,V] = bidiag2pls(A,b,k)
% BIDIAG2-PLS computes the first k PLS factors using
% Golub--Kahan bidiagonalization
% -----------------------------------------------------
[m,n] = size(A);
B = zeros(k,2); % B stored by diagonals
x = zeros(n,k); U = zeros(m,k); V = zeros(n,k);
v = A'*b; v = v/norm(v);
w = A*v; rho = norm(w); w = (1/rho)*w;
V(:,1) = v; U(:,1) = w; B(1,1) = rho;
z = v/rho; x(:,1) = (w'*b)*z;
for i = 2:k % Bidiagonalization
v = A'*w - rho*v;
v = v - V*(V'*v); % Reorthogonalize
gamma = norm(v); B(i-1,2) = gamma;
v = (1/gamma)*v; V(:,i) = v;
w = A*v - gamma*w;
w = w - U*(U'*w); % Reorthogonalize
rho = norm(w); B(i,1) = rho;
if rho == 0, break, end
w = (1/rho)*w; U(:,i) = w;
z = (1/rho)*(v - gamma*z); % Update solution
x(:,i) = x(:,i-1) + (w'*b)*z;
end
end
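An illustrative call (the test problem is our own); the reorthogonalization keeps the computed bases orthonormal to machine precision:

m = 200; n = 50; s = linspace(0,1,m)';
A = s.^(0:n-1);                      % ill-conditioned test matrix
b = A*ones(n,1) + 1e-4*randn(m,1);
[x, U, B, V] = bidiag2pls(A, b, 10);
[norm(U'*U - eye(10)), norm(V'*V - eye(10))]   % both close to machine precision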
Example 4.2.8. Like TSVD (truncated SVD) approximations, the PLS approximations are or-
thogonal projections of the pseudoinverse solution onto a nested sequence of subspaces of di-
mension k ≪ n; see Section 3.6.2. Both sequences have regularization properties. However,
because PLS only requires a partial bidiagonalization of A, it is much less expensive to compute
than TSVD. Further, for PLS the subspaces depend (nonlinearly) on the right-hand side b and are
tailored to the specific right-hand side. Therefore, the minimum error for an ill-conditioned prob-
lem is often achieved with a lower dimensional subspace for PLS rather than for TSVD. Consider the discretized problem Kf = g in Example 3.6.1 for the linear operator K(s, t) = e^{−(s−t)^2}; the relative errors and residuals of the PLS and TSVD approximations are shown in Figure 4.2.1.
Figure 4.2.1. Relative error ∥fk −f ∥2 /∥f ∥2 (solid line) and relative residual ∥Kfk −f ∥2 /∥f ∥2
(dashed line) after k steps. PLS (left) and TSVD (right). Used with permission of Springer International
Publishing; from Numerical Methods in Matrix Computations, Björck, Åke, 2015; permission conveyed
through Copyright Clearance Center, Inc.
Many algorithms that differ greatly in speed and accuracy have been proposed for PLS. Both
NIPALS-PLS and HHBD perform transformations on A that require A to be explicitly available.
This makes them less suitable for large-scale problems. Andersson [29, 2009] tested nine dif-
ferent PLS algorithms on a set of contrived benchmark problems. The only provably backward
stable method among them, HHBD, was much slower than NIPALS-PLS, which also was one of
the most accurate, even though it was used here without deflation of b. The version of Bidiag2-
PLS used did not employ reorthogonalization and gave poor accuracy. In further tests by Björck
and Indahl [148, 2017] the most accurate algorithms were HHBD, NIPALS-PLS with deflation,
and Bidiag2-PLS with reorthogonalization. For large-scale problems, Bidiag2-PLS is the method
of choice. On some large simulated data sets of size 30,000 × 10,000 and 100 extracted com-
ponents, Bidiag2-PLS was about seven times faster than NIPALS. Notably, the popular SIMPLS
algorithm by de Jong [295, 1993], used by the MATLAB function plsregress, gave poor ac-
curacy. This was true even when it was improved by reorthogonalization, as suggested by Faber
and Ferré [390, 2008].
Consider a least squares problem in which the unknowns split into two groups,

    min_{y,z} ∥Ay + Bz − b∥_2,  (4.3.1)

where A ∈ R^{m×n1}, B ∈ R^{m×n2}. One example is periodic spline approximation, which leads to
a problem of augmented band form, where A is a band matrix and B is a full matrix with a small
number of columns.
If z is given, then y must solve the problem

    min_y ∥Ay − (b − Bz)∥_2.  (4.3.2)

Substituting the solution y = A†(b − Bz) into (4.3.2) to eliminate y, we see that z solves the problem

    min_z ∥(I − AA†)(Bz − b)∥_2,  (4.3.3)

where I − AA† = P_{N(A^T)} is the orthogonal projector onto N(A^T). Thus (4.3.1) has been split into two separate least squares problems (4.3.3) and (4.3.2).
One advantage of this is that different methods can be used to solve the two least squares
subproblems for z and y. Moreover, the subproblem for z is always better conditioned than the
original problem. Hence, z can sometimes be computed with sufficient accuracy by the method
of normal equations; see Foster [427, 1991]. Another application is when n2 ≫ n1 and the
subproblem for z can be solved by an iterative method; see Section 6.3.6.
The normal equations of (4.3.1) are

    [A^TA  A^TB; B^TA  B^TB] [y; z] = [A^Tb; B^Tb].

Eliminating y gives the reduced normal equations

    B^T P_{N(A^T)} B z = B^T P_{N(A^T)} b,  (4.3.4)

where P_{N(A^T)} = I − A(A^TA)^{−1}A^T. When z ∈ R^{n2} has been determined, we obtain y ∈ R^{n1} from

    A^TA y = A^T(b − Bz).  (4.3.5)
For better stability, methods based on orthogonal factorizations should be used. After n1 steps, a partial Householder QR factorization reduces the first block in ( A  B ) to the upper triangular form,

    Q_1^T ( A  B  b ) = [R_{11}  R_{12}  c_1; 0  A_{22}  c_2],  (4.3.6)

where Q_1 = P_1 · · · P_{n1}. By the orthogonal invariance of the 2-norm,

    ∥Ax − b∥_2 = ∥ [R_{11}  R_{12}; 0  A_{22}] [y; z] − [c_1; c_2] ∥_2.
The following lemma gives an alternative formulation without explicitly referring to orthog-
onal projections.
Lemma 4.3.1. Let w ∈ R^{n1} and W ∈ R^{n1×n2} be the solutions to the least squares problems

    min_w ∥Aw − b∥_2,    min_W ∥AW − B∥_F.  (4.3.8)

Then z solves the least squares problem

    min_z ∥(B − AW)z − (b − Aw)∥_2,  (4.3.9)

and y = w − Wz solves (4.3.2).

Proof. If w and W solve (4.3.8), then Aw = P_A b and AW = P_A B. Hence (B − AW)z − (b − Aw) = (I − P_A)(Bz − b), so the least squares problems (4.3.9) and (4.3.3) are equivalent. Further, y = w − Wz solves (4.3.2), because Ay = Aw − AWz = P_A(b − Bz).
A common practice in linear regression is to preprocess the data by centering the data, i.e.,
subtracting out the means. This can be interpreted as a simple case of a two-block least squares
problem, where

    ( e  B ) [ξ; z] = b,    e = (1, . . . , 1)^T.  (4.3.10)

Multiplying B ∈ R^{m×n} and b ∈ R^m with the projection matrix (I − ee^T/m) gives

    B̄ = B − (1/m) e(e^TB),    b̄ = b − (e^Tb/m) e.  (4.3.11)
This makes the columns of B̄ and b̄ orthogonal to e. The reduced least squares problem becomes min_z ∥B̄z − b̄∥_2. After solving the reduced problem for z, we obtain ξ = e^T(b − Bz)/m.
Example 4.3.2. The Hald cement data (see [561, 1952, p. 647]) are used in Draper and Smith
[331, 1998, Appendix B] and several other books as an example of regression analysis. The
right-hand side consists of m = 13 observations of the heat evolved in cement during hardening.
The explanatory variables are four different ingredients of the mix and a constant term:
    A = [ 1  7 26  6 60;  1  1 29 15 52;  1 11 56  8 20;  1 11 31  8 47;
          1  7 52  6 33;  1 11 55  9 22;  1  3 71 17  6;  1  1 31 22 44;
          1  2 54 18 22;  1 21 47  4 26;  1  1 40 23 34;  1 11 66  9 12;
          1 10 68  8 12 ],
    b = [ 78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4 ]^T.  (4.3.12)
For the least squares problem ∥Ax − b∥2 , κ(A) ≈ 3.245 · 103 indicates that about six digits
may be lost when using the normal equations. The first column of ones in A = (e, B) is added
to extract the mean values. The first variable ξ in x = (ξ, y) can be eliminated by setting
B̄ = B − epT , c = b − βe,
where p = (eT B)/m, β = eT b/m, and ξ = β − pT y. The reduced problem miny ∥B̄y − c∥2
is much better conditioned: κ(B̄) = 23.0. Normalizing the columns of B to have unit length
decreases the condition number by only a small amount to κ(BD) = 19.6.
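The computations in this example are easily reproduced; the following MATLAB sketch (the variable names are ad hoc) should give condition numbers close to the values quoted above.

% Hald cement data (4.3.12): four ingredients B and heat evolved b.
B = [ 7 26  6 60;  1 29 15 52; 11 56  8 20; 11 31  8 47;  7 52  6 33;
     11 55  9 22;  3 71 17  6;  1 31 22 44;  2 54 18 22; 21 47  4 26;
      1 40 23 34; 11 66  9 12; 10 68  8 12];
b = [78.5 74.3 104.3 87.6 95.9 109.2 102.7 72.5 93.1 115.9 83.8 113.3 109.4]';
m = size(B,1); e = ones(m,1);
A = [e B];                        % full model with constant term
condA = cond(A);                  % approximately 3.2e3
p = (e'*B)/m; beta = e'*b/m;      % column means
Bbar = B - e*p; c = b - beta*e;   % centered data
condB = cond(Bbar);               % approximately 23
y  = Bbar\c;                      % reduced least squares problem
xi = beta - p*y;                  % constant term recovered from the means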
The block structure in the matrix of the corresponding linear system induced by one and two
levels of dissection for p = 2 is shown here:
    A = [A_1  B_1;            A = [A_1  B_1   0    D_1;
         A_2  B_2],                A_2  B_2   0    D_2;
                                   A_3   0   C_3   D_3;
                                   A_4   0   C_4   D_4].  (4.3.13)
There is a finer structure in A not shown here. For example, in one level of dissection most of
the equations involve variables in A1 or A2 only but not in B.
For a least squares problem min_x ∥Ax − b∥_2 arising from a one-level dissection into p regions, the matrix has a bordered block diagonal or block-angular form,

    A = [A_1               B_1;
              A_2          B_2;
                   . . .   ... ;
                     A_p   B_p],
    x = (x_1; x_2; . . . ; x_p; x_{p+1}),    b = (b_1; b_2; . . . ; b_p),  (4.3.14)
where
Ai ∈ Rmi ×ni , Bi ∈ Rmi ×np+1 , i = 1, . . . , p,
and m = m1 + m2 + · · · + mp , n = n1 + n2 + · · · + np+1 . For some problems, the blocks
Ai and/or Bi are themselves large sparse matrices, often of the same general sparsity pattern as
A. There is a wide variation in the number and size of blocks. Some problems have large blocks
with p of moderate size (10–100), while others have many more but smaller blocks.
The normal matrix A^TA, with A given as in (4.3.14), has a doubly bordered block diagonal form

    A^TA = [ A_1^TA_1                                A_1^TB_1;
                       A_2^TA_2                      A_2^TB_2;
                                  . . .               ...    ;
                                        A_p^TA_p     A_p^TB_p;
             B_1^TA_1  B_2^TA_2   · · ·  B_p^TA_p       C     ],

where C = Σ_{i=1}^{p} B_i^TB_i. The right-hand side f = (f_1, . . . , f_{p+1}) of the normal equations is

    f_i = A_i^T b_i,   i = 1, . . . , p,      f_{p+1} = Σ_{i=1}^{p} B_i^T b_i.
If rank(A) = n, the upper triangular Cholesky factor R of ATA exists and has a block structure
similar to that of A:
    R = [ R_1                     S_1;
               R_2                S_2;
                    . . .         ... ;
                          R_p     S_p;
                                R_{p+1} ].  (4.3.15)
Equating the blocks in R^TR = A^TA gives

    R_i^TR_i = A_i^TA_i,    R_i^TS_i = A_i^TB_i,   i = 1, . . . , p,      R_{p+1}^TR_{p+1} = C − Σ_{i=1}^{p} S_i^TS_i,
where the Cholesky factors Ri ∈ Rni ×ni and Si can be computed independently and in parallel.
The least squares solution is then obtained from the two-block triangular systems RT z = AT b =
f and Rx = z. This amounts to first solving the lower triangular systems
    R_i^T z_i = A_i^T b_i,   i = 1, . . . , p,      R_{p+1}^T z_{p+1} = f_{p+1} − Σ_{i=1}^{p} S_i^T z_i,  (4.3.18)
1. Reduce the diagonal block Ai to upper triangular form by a sequence of orthogonal trans-
formations applied to (Ai , Bi ) and the right-hand side bi , yielding
       Q_i^T (A_i, B_i) = [R_i  S_i; 0  T_i],    Q_i^T b_i = [c_i; d_i],   i = 1, . . . , p.  (4.3.20)
Any sparse structure in the blocks Ai should be exploited.
2. Form

       T = [T_1; . . . ; T_p],    d = [d_1; . . . ; d_p],

   and compute the QR factorization

       Q_{p+1}^T (T  d) = [R_{p+1}  c_{p+1}; 0  d_{p+1}].  (4.3.21)
There are several ways to organize this algorithm. In steps 1 and 3 the computations can be
performed in parallel on the p subsystems. It is then advantageous to continue the reduction in
step 1 so that the matrices Ti , i = 1, . . . , p, are brought into upper trapezoidal form.
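As an illustration, a minimal dense MATLAB sketch of steps 1–3 is given below; it assumes that each block A_i and the merged block T have full column rank and ignores sparsity and parallelism. The final step uses the relation x_i = R_i^{−1}(c_i − S_i x_{p+1}) implied by the block structure (4.3.15).

function [x, y] = blockangular_qr(Ablk, Bblk, bblk)
% Solve min ||Ax - b||_2 for the block-angular form (4.3.14).
% Ablk{i} = A_i, Bblk{i} = B_i, bblk{i} = b_i, i = 1:p; y = x_{p+1}.
p = numel(Ablk);
R = cell(p,1); S = cell(p,1); c = cell(p,1); T = []; d = [];
for i = 1:p                          % Step 1: reduce each diagonal block
    ni = size(Ablk{i},2);
    [Qi,Ri] = qr([Ablk{i} Bblk{i}]); % full QR of (A_i, B_i)
    gi = Qi'*bblk{i};
    R{i} = Ri(1:ni,1:ni); S{i} = Ri(1:ni,ni+1:end); c{i} = gi(1:ni);
    T = [T; Ri(ni+1:end,ni+1:end)];  % block T_i
    d = [d; gi(ni+1:end)];           % block d_i
end
[Qe,Re] = qr(T,0);                   % Step 2: QR of the stacked blocks T
y = Re\(Qe'*d);                      % least squares solution of min ||T y - d||_2
x = cell(p,1);
for i = 1:p                          % Step 3: back-substitute for each x_i
    x{i} = R{i}\(c{i} - S{i}*y);
end
end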
Large problems may require too much memory, even if we take into account the block-
angular structure. Cox [275, 1990] suggests two modifications by which the storage requirement
can be reduced. By merging steps 1 and 2, it is not necessary to hold all blocks Ti simultaneously
in memory. Even more storage can be saved by discarding Ri and Si after they have been
computed in step 1 and recomputing them for step 3. Indeed, only Ri needs to be recomputed,
because after y has been computed in step 2, xi is the solution to the least squares problems
min ∥Ai xi − gi ∥2 , gi = bi − Bi y.
xi
This method assumes that all the matrices Ri and Si have been retained. For a discussion of how
to compute variances and covariances when the storage saving algorithm is used, see Cox [275,
1990].
In some applications the matrices Ri will be sparse, but a lot of fill occurs in the blocks Bi
in step 1. Then the triangular matrix Rp+1 will be full and expensive to compute. For such
problems a block-preconditioned iterative method may be more efficient; see Section 6.3. Then
an iterative method, such as CGLS or LSQR, is applied to the preconditioned problem min_y ∥AM^{−1}y − b∥_2, x = M^{−1}y, where a suitable preconditioner is M = diag(R_1, . . . , R_p, R_{p+1}); see Golub, Manneback, and
Toint [500, 1986].
Dissection and orthogonal decompositions in geodetic survey problems are treated by Golub and
Plemmons [505, 1980]. Avila and Tomlin [44, 1979] discuss parallelism in the solution of very
large least squares problems by nested dissection and the method of normal equations. Weil and
Kettler [1116, 1971] give a heuristic algorithm for permuting a general sparse matrix into block-
angular form. The dissection procedure described above is a variation of the nested dissection
orderings developed for general sparse positive definite systems; see Section 5.1.5.
Consider a least squares problem of the form

    min_x ∥(A ⊗ B)x − f∥_2,  (4.3.23)

where A ⊗ B is the Kronecker product of A ∈ R^{m×n} and B ∈ R^{p×q}. This product is the mp × nq block matrix

    A ⊗ B = [a_{11}B  a_{12}B  · · ·  a_{1n}B;  . . . ;  a_{m1}B  a_{m2}B  · · ·  a_{mn}B].
Problems with Kronecker structure arise in several application areas, including signal and image
processing, photogrammetry, and multidimensional approximation; see Fausett and Fulton [399,
1994]. Grosse [540, 1980] describes a tensor factorization algorithm and how it applies to least
squares fitting of multivariate data on a rectangular grid. Such problems can be solved with great
savings in storage and operations. These savings are essential for problems where A and B are
large. It is not unusual to have several hundred thousand equations and unknowns.
The Kronecker product and its relation to linear matrix equations, such as Lyapunov’s equa-
tion, are treated in Horn and Johnson [640, 1991, Chapter 4]. See also Henderson and Searle [602,
1981] and Van Loan [1083, 2000]. We now state some elementary facts about Kronecker prod-
ucts that follow from its definition:
(A + B) ⊗ C = (A ⊗ C) + (B ⊗ C),
A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C),
A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C,
(A ⊗ B)T = AT ⊗ B T .
assuming all the products are defined. We can also conclude that if P and Q are orthogonal n×n
matrices, then P ⊗ Q is an orthogonal n2 × n2 matrix. Furthermore, if A and B are square and
nonsingular, it follows that A ⊗ B is nonsingular and
(A ⊗ B)−1 = A−1 ⊗ B −1 .
This result extends to the pseudoinverse:

    (A ⊗ B)† = A† ⊗ B†,

which follows by verifying that X = A† ⊗ B† satisfies the four Penrose conditions in Theorem 1.2.11.
We now introduce a function, closely related to the Kronecker product, that converts a matrix
into a vector. For a matrix C = (c1 , c2 , . . . , cn ) ∈ Rm×n we define
    vec(C) = [c_1; c_2; . . . ; c_n] ∈ R^{mn}.  (4.3.25)
Hence vec (C) is the vector formed by stacking the columns of C into one column vector of
length mn. We now state a result that shows how the vec-function is related to the Kronecker
product.
By Lemma 4.3.5, we can write the solution to the Kronecker least squares problem (4.3.23)
as
x = (A ⊗ B)† f = (A† ⊗ B † )f = vec (B † F (A† )T ), (4.3.27)
where f = vec (F ). This allows a great reduction in the cost of solving (4.3.23). For example,
if both A and B are m × n matrices, the cost of computing the least squares solution is reduced
from O(m2 n4 ) to O(mn2 ).
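A small MATLAB experiment (with arbitrary random test matrices) illustrates the savings implied by (4.3.27): the structured solution agrees with the one obtained from the explicitly formed Kronecker product.

% Kronecker least squares: min ||kron(A,B)*x - f||_2 solved via (4.3.27).
m = 30; n = 8; p = 25; q = 6;
A = randn(m,n); B = randn(p,q); F = randn(p,m);   % f = vec(F)
f = F(:);
X  = pinv(B)*F*pinv(A)';     % x = vec(B^+ F (A^+)^T)
x1 = X(:);
x2 = kron(A,B)\f;            % explicit (much more expensive) solution
disp(norm(x1 - x2)/norm(x2)) % should be of the order of roundoff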
In some areas, the most common approach to computing the least squares solution to (4.3.23)
is from normal equations. If we assume that both A and B have full column rank, we can use the
expressions
A† = (ATA)−1 AT , B † = (B TB)−1 B T .
However, because of the instability associated with the explicit formation of ATA and B TB, an
approach based on orthogonal decompositions should generally be preferred. From the complete
QR factorizations of A and B,
    AΠ_1 = Q_1 [R_1  0; 0  0] V_1^T,    BΠ_2 = Q_2 [R_2  0; 0  0] V_2^T,

we obtain A† = Π_1 V_1 [R_1^{−1}  0; 0  0] Q_1^T and B† = Π_2 V_2 [R_2^{−1}  0; 0  0] Q_2^T. These expressions can be used in (4.3.27) to compute the pseudoinverse solution of problem
(4.3.23), even in the rank-deficient case.
From Lemma 4.3.5, the following simple expression for the singular values and vectors of
the Kronecker product A⊗B, in terms of the singular values and vectors of A and B, is obtained.
Lemma 4.3.6. Let A and B have the SVDs A = U_1Σ_1V_1^T and B = U_2Σ_2V_2^T. Then

    A ⊗ B = (U_1 ⊗ U_2)(Σ_1 ⊗ Σ_2)(V_1 ⊗ V_2)^T

is the SVD of A ⊗ B.
Barrlund [84, 1998] develops an efficient solution method for constrained least squares prob-
lems with Kronecker structure:
This problem can be solved by a nullspace method; cf. Section 3.4.2. By a change of variables
the unknowns are split into two sets. The first set is determined by the constraints, and the other
set belongs to the nullspace of the constraints.
In the first stage of the TSQR algorithm, the tall-and-skinny matrix A is partitioned into N row blocks A_i, and the independent QR factorizations

    A_i = Q_i R_i,   i = 0, . . . , N − 1,

are computed. Subsequent stages merge pairs of the resulting upper triangular matrices in a divide-and-conquer fashion until a single factor R has been obtained. This requires about log2 N
stages. The algorithm is exemplified below for N = 4. After the first step we have obtained
    A = [A_0; A_1; A_2; A_3] = [Q_0R_0; Q_1R_1; Q_2R_2; Q_3R_3]
      = diag(Q_0, Q_1, Q_2, Q_3) [R_0; R_1; R_2; R_3].  (4.3.31)
In the next two steps the QR factorizations of the stacked upper triangular factors are merged into one upper triangular matrix:

    [R_0; R_1] = Q_{01} R_{01},    [R_2; R_3] = Q_{23} R_{23},    [R_{01}; R_{23}] = Q_{0,1,2,3} R.  (4.3.32)
The representation of the factor Q in TSQR is different from the standard Householder QR
factorization. It is implicitly given by a tree of smaller Householder transformations
    Q = diag(Q_0, Q_1, Q_2, Q_3) · diag(Q_{01}, Q_{23}) · Q_{0,1,2,3}.  (4.3.33)
In general, the combination process in TSQR forms a tree with the row blocks Ai as leaves and
the final R as a root. The version pictured above corresponds to a binary tree. The tree shape can
be chosen to minimize either the communication between processors or the volume of memory
traffic between the main memory and the cache memory of each processor.
The initial Householder QR factorizations of the N blocks Ai require 2N n2 (p − n/3) flops.
Merging two triangular QR factorizations of dimension n × n takes 2n3 /3 flops. The total
arithmetic cost of TSQR is higher than that for the direct Householder QR factorization of A,
but for strongly rectangular systems most of the arithmetic work is spent in computing the QR
factorizations of the submatrices, which can be done in parallel.
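A serial MATLAB sketch of the reduction to R with a binary merge tree is shown below; it illustrates only the arithmetic (the tree of Q factors is not stored) and assumes each row block has at least n rows.

function R = tsqr_R(A, N)
% TSQR: compute the R factor of a tall-and-skinny A using N row blocks
% and a binary merge tree of small QR factorizations.
[m,n] = size(A);
rows = round(linspace(0,m,N+1));
Rs = cell(N,1);
for i = 1:N                          % local QR of each row block A_i
    [~,Rs{i}] = qr(A(rows(i)+1:rows(i+1),:), 0);
end
while numel(Rs) > 1                  % merge pairs of triangular factors
    Rnew = cell(ceil(numel(Rs)/2),1);
    for j = 1:floor(numel(Rs)/2)
        [~,Rnew{j}] = qr([Rs{2*j-1}; Rs{2*j}], 0);
    end
    if mod(numel(Rs),2) == 1, Rnew{end} = Rs{end}; end
    Rs = Rnew;
end
R = Rs{1};   % agrees with the R from qr(A,0) up to the signs of its rows
end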
An implementation of TSQR using the message passing interface (MPI) operation AllReduce
for multiple processors is given by Langou [719, 2007]. Experiments show that the AllReduce
QR algorithm obtains nearly linear speed-up. As shown by Mori et al. [811, 2012], although the
number of floating-point operations is larger, the bounds for the backward error and the deviation
from orthogonality are smaller for the AllReduce QR algorithm than for standard Householder
QR.
Another communication-avoiding QR algorithm suitable for tall-and-skinny matrices is the
Cholesky QR algorithm. Let A ∈ Rm×n have full column rank and ATA = RTR be its Cholesky
factorization. The Cholesky QR algorithm then computes Q1 = AR−1 by block forward substi-
tution, giving
A = Q1 R; (4.3.34)
see Section 1.2.1. The arithmetic cost of this algorithm is 2mn2 + n3 /3 flops. The Cholesky QR
algorithm is ideal from the viewpoint of high performance computing. It requires only one global
reduction between parallel processing units, and most of the arithmetic work can be performed
as matrix-matrix operations. However, the loss of orthogonality ∥I − QT1 Q1 ∥F can only be
bounded by the squared condition number of A.
Yamamoto et al. [1136, 2015] suggest a modified Cholesky QR2 algorithm, where R and
Q1 from Cholesky QR are refined as follows. First, compute E = QT1 Q1 and its Cholesky
factorization E = S T S. The refined factorization is taken to be A = P U , where
P = QS −1 , U = SR. (4.3.35)
This updating step doubles the arithmetic cost. The Cholesky QR2 algorithm has good stability
properties provided the initial Cholesky factorization does not break down. However, the QR2
algorithm may fail for matrices with a condition number roughly greater than u−1/2 .
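In MATLAB the Cholesky QR step and the CholeskyQR2 refinement (4.3.35) can be sketched as follows, assuming A has full column rank and κ(A) is well below u^{−1/2} so that neither Cholesky factorization breaks down.

function [Q,R] = cholqr2(A)
% Cholesky QR followed by one reorthogonalization step (CholeskyQR2).
R1 = chol(A'*A);      % first Cholesky QR: A = Q1*R1
Q1 = A/R1;
S  = chol(Q1'*Q1);    % refinement: Q1 = P*S with P nearly orthonormal
Q  = Q1/S;            % P = Q1*S^{-1}
R  = S*R1;            % U = S*R1, so that A = Q*R
end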
Yamazaki, Tomov, and Dongarra [1137, 2015] extend the applicability of the Cholesky QR2
algorithm as follows. An initial Cholesky factorization of RT R = AT A + sI is computed,
where s ≥ 0 is a shift that guarantees that the factorization runs to completion. Further, some
intermediate results are accumulated in higher precision. The resulting Cholesky QR3 algorithm
uses three Cholesky QR steps and yields a computed Q with loss of orthogonality ∥I − QT1 Q1 ∥F
and residual ∥A − QR∥F /∥A∥F of order u. See also Fukaya et al. [435, 2020].
The following sum convention is often used. If an index occurs as both a subscript and a superscript, the product should be summed over the range of this index. For example, the ith coordinate of A(x_1, x_2, . . . , x_d) is written a^i_{j_1,j_2,...,j_d} x_1^{j_1} x_2^{j_2} · · · x_d^{j_d}. (Remember the superscripts are not exponents.)
Suppose Xi = X, i = 1, 2, . . . , d. Then the set of d-linear mappings from X k to Y is itself
a linear space, denoted by Lk (X, Y ). For k = 1, we have the space of linear functions. Linear
functions can, of course, be described in vector-matrix notation as a set of matrices L(Rn , Rm ) =
Rm×n . Matrix notation can also be used for each coordinate of a bilinear function. Norms of
multilinear operators are defined analogously to subordinate matrix norms. For example,
where

    ∥A∥_∞ = max_{1≤i≤m} Σ_{j_1=1}^{n_1} Σ_{j_2=1}^{n_2} · · · Σ_{j_k=1}^{n_k} |a^i_{j_1,j_2,...,j_k}|.  (4.3.37)
A 3-mode tensor A = (a_{i,j,k}) ∈ R^{n1×n2×n3} has mode-1 fibers (columns) a_{:,j,k} ∈ R^{n1}, j = 1 : n2, k = 1 : n3.
A tensor is said to be symmetric if its elements are equal under any permutation of the indices, i.e., for a 3-mode tensor,

    a_{i,j,k} = a_{i,k,j} = a_{j,i,k} = a_{j,k,i} = a_{k,i,j} = a_{k,j,i};
see Comon et al. [263, 2008]. A tensor is diagonal if ai1 ,i2 ,...,id ̸= 0 only if i1 = i2 = · · · = id .
Elementwise addition and scalar multiplication trivially extend to hypermatrices of arbitrary
order. The tensor or outer product is denoted by ◦ (not to be confused with the Hadamard
product of matrices). For example, if A = (aij ) ∈ Rm×n and B = (bkl ) ∈ Rp×q are matrices,
then
C = A ◦ B = (ai,j,k,l )
is a 4-mode tensor. The 1-mode contraction product of two 3-mode hypermatrices A =
(ai,j,k ) ∈ Rn×n2 ×n3 and B = (bi,l,m ) ∈ Rn×m2 ×m3 with conforming first dimension is the
4-mode tensor C ∈ Rn2 ×n3 ×m2 ×m3 defined as
    C = ⟨A, B⟩_1,    c_{j,k,l,m} = Σ_{i=1}^{n} a_{i,j,k} b_{i,l,m}.  (4.3.39)
Contractions need not be restricted to one pair of indices at a time. The inner product of two
3-mode tensors of the same size and the Frobenius norm of a tensor are defined as
    ⟨A, B⟩ = Σ_{i,j,k} a_{ijk} b_{ijk},    ∥A∥_F = ⟨A, A⟩^{1/2} = ( Σ_{i,j,k} a_{ijk}^2 )^{1/2}.  (4.3.40)
The mode-1 unfolding of A ∈ R^{n1×n2×n3} is the matrix A_{(1)} ∈ R^{n1×n2n3} whose columns are the fibers a_{:,j,k}, where a colon indicates all elements of a mode. Different papers sometimes use different orderings of the columns. The specific permutation is not important as long as it is consistent.
A matrix C ∈ Rp×q can be multiplied from the left and right by other matrices X ∈ Rm×p
and Y ∈ Rn×q , and we write
    A = XCY^T,    a_{ij} = Σ_{α=1}^{p} Σ_{β=1}^{q} x_{iα} y_{jβ} c_{αβ}.
A notation for this operation suggested by de Silva and Lim [993, 2008] is
C = (X, Y, Z) · A, (4.3.43)
where the mode of each multiplication is understood from the ordering of the matrices. It
is convenient to use a separate notation for multiplication by transposed matrices. For C =
(X T , Y T , Z T ) · A we also write
C = A · (X, Y, Z).
For a matrix A ∈ Rm×n there are three ways to define the rank r, all of which yield the same
value. The rank is equal to the dimension of the subspace of Rm spanned by its columns and the
dimension of the subspace of Rn spanned by its rows. Also, the rank is the minimum number of
terms in the expansion of A as a sum of rank-one matrices; cf. the SVD expansion. For a tensor
of mode d > 2, these three definitions yield different results.
The column rank and row rank of a matrix are generalized as follows. For a 3-mode tensor
A ∈ Rn1 ×n2 ×n3 , let r1 be the dimension of the subspace of Rn1 spanned by the n2 n3 vectors
with entries a:,i2 ,i3 , i2 = 1 : n2 , i3 = 1 : n3 . In other words, r1 (A) = rank(A(1) ), with similar
interpretations for r2 and r3 . The triple (r1 , r2 , r3 ) is called the multirank of A, and r1 , r2 , r3
can all be different.
A 3-mode tensor of the form x ◦ y ◦ z, with entries x_i y_j z_k, is called a rank-one tensor if it is nonzero. The tensor rank of A is the smallest number r such
that A may be written as a sum of rank-one hypermatrices:
    A = Σ_{p=1}^{r} x_p ◦ y_p ◦ z_p.  (4.3.45)
When d = 2 this definition agrees with the usual definition of the rank of a matrix. Generalization of this definition of rank to tensors of higher order is straightforward. However, for d ≥ 3 no efficient algorithm for determining the rank of a given tensor is known; the problem is NP-hard. Furthermore,
de Silva and Lim [993, 2008] show that the problem of finding the best rank-p approximation in
general has no solution, even for d = 3.
Tensor decompositions originated with Hitchcock [629, 1927] and much later were taken
up and used to analyze data in psychometrics (Tucker [1070, 1966]). In the last decades the use
of tensor methods has spread to other fields, such as chemometrics (Bro [180, 1997]), signal and
image processing, data mining, and pattern recognition (Eldén [376, 2019]). Tensor decomposi-
tions are used in machine learning and parameter estimation.
Low-rank approximations of a given two-dimensional array of data can be found from the
SVD of a matrix. In many applications one would like to approximate a given tensor A with a
sum of rank-one tensors to minimize
p
X
A− λi xi ◦ yi ◦ zi . (4.3.46)
F
i=1
Weights λi are introduced to let us assume that vectors xi , yi , and zi are normalized to have
length one. Hillar and Lim [628, 2013] have shown that this problem (and indeed, most other
tensor problems) are NP-hard. Therefore, we assume that the number p < r of factors is fixed. A
popular algorithm for computing such an approximate decomposition is alternating least squares
(ALS). First, the vectors yi and zi are fixed, and xi is determined to minimize (4.3.46). Next,
x_i, z_i are fixed, and we solve for y_i. Finally, x_i, y_i are fixed, and we solve for z_i. Define the matrices X = (x_1, . . . , x_p), Y = (y_1, . . . , y_p), and Z = (z_1, . . . , z_p). With Y and Z fixed, the subproblem for the first factor can be written

    min_{X̂} ∥A_{(1)} − X̂ (Z ⊙ Y)^T∥_F,

where A_{(1)} ∈ R^{n1×n2n3} is the matrix obtained by unfolding A along the first mode, and

    Z ⊙ Y = (z_1 ⊗ y_1, . . . , z_p ⊗ y_p) ∈ R^{n2n3×p}

is the matching columnwise Kronecker product, also called the Khatri–Rao product, of Z and Y.
The solution can be written

    X̂ = A_{(1)} [(Z ⊙ Y)^T]†,

and then the columns of X̂ are normalized to give X̂ = X diag(λ_i). Because of the special form of the Khatri–Rao product, the solution can also be written as

    X̂ = A_{(1)} (Z ⊙ Y)(Z^TZ .∗ Y^TY)†,
where .∗ is the Hadamard (elementwise) matrix product. This version is not always suitable
because of the squared condition number. Similar formulas for the two other modes are easily
derived. At each inner iteration a pseudoinverse must be computed. ALS can take many iterations
and is not guaranteed to converge to a global minimum. Also, the solution obtained depends on
the starting point.
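The update of the first factor in one ALS step can be sketched in MATLAB as follows (random test data; the unfolding A1 and a small helper kr for the Khatri–Rao product are set up explicitly; the updates of Y and Z are analogous, using the unfoldings along the other two modes).

% Illustration of one ALS update of the first CP factor on random data.
n1 = 20; n2 = 15; n3 = 10; p = 3;
A1 = randn(n1, n2*n3);                    % mode-1 unfolding of the data tensor
Y  = randn(n2,p);  Z = randn(n3,p);       % current (fixed) factors
kr = @(U,V) cell2mat(arrayfun(@(j) kron(U(:,j),V(:,j)), ...
                              1:size(U,2), 'UniformOutput', false));
Xhat   = A1 * pinv(kr(Z,Y))';             % Xhat = A1 * [(Z kr Y)']^+
lambda = sqrt(sum(Xhat.^2,1));            % weights
X      = Xhat ./ lambda;                  % normalized columns: Xhat = X*diag(lambda)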
The idea of expressing a tensor as a sum of rank-one tensors has been proposed under differ-
ent names by several authors. In psychometrics it was called CANDECOMP (canonical decom-
position) and PARAFAC (parallel factors); see Kolda and Bader [705, 2009]. Here, following
Leurgans, Ross, and Abel [735, 1993], we call it the CP decomposition. In matrix computations,
the SVD expansion

    A = UΣV^T = Σ_{i=1}^{r} σ_i u_i v_i^T ∈ R^{m×n},   r ≤ min{m, n},  (4.3.47)
expresses a matrix A of rank r as the weighted sum of rank-one matrices u_i v_i^T, where u_i ∈ R^m and v_i ∈ R^n, i = 1 : r, are mutually orthogonal. This expansion has the desirable property that for any unitarily invariant norm, the best approximation of A by a matrix of lower rank is obtained by truncating the expansion; see the Eckart–Young–Mirsky Theorem 1.3.8.
The high-order SVD (HOSVD) is a generalization of the SVD to 3-mode hypermatrices
A = (U, V, W ) · C,
where U, V , and W are square and orthogonal, and C has the same size as A. Further, the
different matrix slices of C are mutually orthogonal (with respect to the standard inner product
on matrix spaces) and with decreasing Frobenius norm. Because of the imposed orthogonality
conditions, the HOSVD of A is essentially unique. It is rank-revealing in the sense that if A has
multirank (r1 , r2 , r3 ), then the last n1 − r1 , n2 − r2 , and n3 − r3 slices along the different modes
of the core tensor C are zero matrices. Algorithms for computing the HOSVD are described by
Lathauwer, De Moor, and Vandewalle [296, 2000]. The matrix U is obtained from the SVD of
the l × mn matrix obtained from unfolding A. V and W are obtained similarly. Since U , V , and
W are orthogonal, C = (cijk ) is easily computed from C = (U T , V T , W T ) · A.
Suppose we want to approximate tensor A by another tensor B of lower multirank. Then we
want to solve
min ∥A − B∥F , (4.3.48)
rank(B)=(p,q,r)
where the Frobenius tensor norm is defined as in (4.3.40). This is the basis of the Tucker
model [1070, 1966]. Unlike the matrix case, this problem cannot be solved by truncating the
HOSVD of A. It is no restriction to assume that B = (U, V, W) · C, where U ∈ R^{n1×p}, V ∈ R^{n2×q}, and W ∈ R^{n3×r} have orthonormal columns. Because of the orthogonal invariance of
the Frobenius norm, U , V , and W are only determined up to a rotation. With the core tensor C
eliminated, problem (4.3.48) can be rewritten as a maximization problem with objective function
    Φ(U, V, W) = (1/2) ∥(U^T, V^T, W^T) · A∥_F^2
subject to U T U = I, V T V = I, and W T W = I (compare with the corresponding matrix prob-
lem for d = 2). This can be formulated and solved as an optimization problem on a Grassmann
manifold; see Eldén and Savas [380, 2009] and Savas and Lim [968, 2010].
A treatment of the theory and computation of tensors is given by Wei and Ding [1114, 2016]. Tensor rank problems
are studied by de Silva and Lim [993, 2008] and Comon et al. [264, 2009]. A tutorial on
CP decomposition and its applications is given by Bro [180, 1997]. The N-way Toolbox for
MATLAB (Andersson and Bro [28, 2000]) for analysis of multiway data can be downloaded
from http://www.models.kvl.dk/source/. MATLAB tools for tensor computations have
also been developed by Bader and Kolda [52, 2006], [53, 2007]. A MATLAB Tensor toolbox
supported by Sandia National Labs and MathSci.ai is available on the web. Hankel tensors arise
from signal processing and data fitting; see Papy, De Lathauwer, and Van Huffel [878, 2005].
Tensors with Cauchy structure are also of interest; see Chen, Li, and Qi [240, 2016].
The assumption that all errors are confined to b is frequently unrealistic. Sampling or modeling
errors will often affect both A and b. In the errors-in-variables model it is assumed that
(A + E)x = b + f, (4.4.2)
where the rows of the error matrix ( E f ) are independently and identically distributed with
zero mean and the same variance. This model has independently been developed in statistics,
where it is known as “latent root regression.” The optimal estimates of the parameters x in this
model satisfy the total least squares7 (TLS) problem

    min_{E,f} ∥( E  f )∥_F   subject to   (A + E)x = b + f,

where ∥ · ∥_F denotes the Frobenius matrix norm. The TLS problem is equivalent to finding
the “nearest” consistent linear system, where the distance is measured in the Frobenius norm of
( E f ). When a minimizing perturbation has been found, any x satisfying (4.4.2) is said to
solve the TLS problem.
A complete and rigorous treatment of both theoretical and computational aspects of the TLS
problem is developed in the monograph by Van Huffel and Vandewalle [1077, 1991]. They find
that in typical applications, gains of 10–15% in accuracy can be obtained by using TLS instead
of standard least squares methods.
The TLS solution depends on the relative scaling of the data A and b. Paige and Strakoš [867,
2002] study the scaled TLS (STLS) problem
where γ is a given positive scaling parameter. For small values of γ, perturbations in b will
be favored. In the limit when γ → 0 in (4.4.4), the solution equals the ordinary least squares
7 The term “total least squares problem” was coined by Golub and Van Loan [511, 1980].
solution. On the other hand, large values of γ favor perturbations in A. In the limit when
1/γ → 0, we obtain the data least squares (DLS) problem
Assume that σn+1 > 0. Then, by the Eckart–Young–Mirsky theorem (Theorem 1.3.8) the unique
perturbation ( E f ) of minimum Frobenius norm that makes (A + E)x = b + f consistent is
the rank-one perturbation
    ( E  f ) = −σ_{n+1} u_{n+1} v_{n+1}^T,  (4.4.8)
and minE, f ∥ ( E f ) ∥F = σn+1 . Multiplying (4.4.8) from the right with vn+1 and using
(4.4.7) gives
(E f ) vn+1 = −σn+1 un+1 = − ( A b ) vn+1 , (4.4.9)
i.e., ( A + E b + f ) vn+1 = 0. If vn+1,n+1 ̸= 0, the problem is called generic, and the TLS
solution is obtained by scaling v_{n+1} so that its last component is −1:

    [x; −1] = −(1/γ) v_{n+1},    γ = e_{n+1}^T v_{n+1}.  (4.4.10)
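In the generic case, (4.4.10) translates directly into a few lines of MATLAB; the sketch below assumes m ≥ n + 1 and v_{n+1,n+1} ≠ 0.

function x = tls_svd(A, b)
% Basic TLS solution via the SVD of the extended matrix (A b), cf. (4.4.10).
n = size(A,2);
[~,~,V] = svd([A b], 0);        % economy SVD; V is (n+1) x (n+1)
v = V(:,n+1);                   % right singular vector for sigma_{n+1}
if v(n+1) == 0
    error('Nongeneric TLS problem: last component of v_{n+1} is zero.');
end
x = -v(1:n)/v(n+1);             % scale so that the last component is -1
end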
(E f ) = −(A + E b + f ) vv T , v = V2 z.
Then x = −γ −1 y is a TLS solution. In this case the TLS solution is not unique. A unique TLS
solution of least-norm can be found as follows. Since V2 z has unit length, minimizing ∥x∥2 is
equivalent to choosing the unit vector z ∈ Rn−p+1 to maximize γ in (4.4.11). Set z = Qe1 ,
where Q is a Householder reflector such that
    V_2 Q = [y  V̂_2; γ  0].
Then the least-norm TLS solution is x = −γ −1 y. If eTn+1 V2 = 0, then the TLS problem is
nongeneric. This case can only occur when σ̂n = σn+1 . By an argument similar to that for
p = n, it then holds that b ⊥ uj , j = p : n. Nongeneric TLS problems can be treated by adding
constraints on the solution; see Van Huffel and Vandewalle [1077, 1991].
From the relationship between the SVD of Ã = ( A  b ) and the eigendecomposition of the symmetric matrix Ã^TÃ (see Section 1.2.2) it follows that the TLS solution x can be characterized by

    [A^TA  A^Tb; b^TA  b^Tb] v = σ_{n+1}^2 v,    v = [x; −1],  (4.4.12)

where σ_{n+1}^2 is the smallest eigenvalue of the matrix Ã^TÃ and v is a corresponding eigenvector.
From (4.4.12) it follows that

    (A^TA − σ_{n+1}^2 I_n) x = A^Tb,    b^T(b − Ax) = σ_{n+1}^2.  (4.4.13)
In the first equation of (4.4.13) a positive multiple of the unit matrix is subtracted from the matrix
of normal equations ATAx = AT b. This shows that TLS can be considered as a procedure for
deregularizing the LS problem. (Compare with Tikhonov regularization, where a multiple of the
unit matrix is added to improve the conditioning; see Section 3.5.3.) From a statistical point
of view, TLS can be interpreted as removing bias by subtracting the error covariance matrix estimated by σ_{n+1}^2 I from the data covariance matrix A^TA.
Because of the nonlinear dependence of xTLS on the data A, b, a strict analysis of the sen-
sitivity and conditioning of the TLS problem is more complicated than for the least squares
problem. Golub and Van Loan [511, 1980] show that an approximate condition number for the
TLS problem is

    κ_{TLS}(A, b) = κ(A) σ̂_n/(σ̂_n − σ_{n+1}).  (4.4.14)
This shows that the condition number for the TLS problem will be much larger than κ(A) when
the relative distance 1 − σn+1 /σ̂n between σn+1 and σ̂n is small. Subtracting the normal equa-
tions from (4.4.13), we obtain
    x_{TLS} − x_{LS} = σ_{n+1}^2 (A^TA − σ_{n+1}^2 I)^{−1} x_{LS},  (4.4.15)
where xTLS and xLS denote the TLS and LS solutions. Taking norms in (4.4.15), we obtain the
upper bound

    ∥x_{TLS} − x_{LS}∥_2 ≤ σ_{n+1}^2/(σ̂_n^2 − σ_{n+1}^2) ∥x_{LS}∥_2 ≤ σ_{n+1}/(2(σ̂_n − σ_{n+1})) ∥x_{LS}∥_2,  (4.4.16)

where the last inequality follows from

    σ̂_n^2 − σ_{n+1}^2 = (σ̂_n + σ_{n+1})(σ̂_n − σ_{n+1}) ≥ 2σ_{n+1}(σ̂_n − σ_{n+1}).
From this we deduce that when the errors in A and b are small the difference between the LS and
TLS solutions is small. Otherwise, the solutions can differ considerably.
In many parameter estimation problems, some of the columns of A are known exactly. It is
no loss of generality to assume that the n1 error-free columns are the first in A = ( A1 A2 ) ∈
Rm×n , n = n1 + n2 . The mixed LS–TLS model is
    ( A_1   A_2 + E_2 ) [x_1; x_2] = b + f,    A_1 ∈ R^{m×n1},
where the rows of the errors ( E2 f ) are independently and identically distributed with zero
mean and the same variance. The problem can then be expressed as
    min_{E_2, f} ∥( E_2  f )∥_F,    ( A_1   A_2 + E_2 ) [x_1; x_2] = b + f.  (4.4.17)
When A2 is empty, this reduces to solving an ordinary least squares problem with multiple
right-hand sides. When A1 is empty, this is the standard TLS problem. When the columns
of A1 are linearly independent, the mixed LS–TLS can be solved as a two-block problem; see
Section 4.3.1. First, compute the QR factorization
    Q^T ( A_1  A_2  b ) = [R_{11}  R_{12}  c_1; 0  R_{22}  c_2],
where R11 ∈ Rn1 ×n1 is upper triangular and R22 ∈ R(m−n1 )×n2 . Next, compute x2 as the
solution to the TLS problem
and the minimum equals Σ_{i=n+1}^{n+d} σ_i^2. If σ_n > σ_{n+1}, this is the unique minimizer. If V_2 is partitioned as

    V_2 = [V_{12}; V_{22}],

and V_{22} ∈ R^{d×d} is nonsingular, then the TLS solution is

    X = −V_{12} V_{22}^{−1} = (A^TA − σ_{n+1}^2 I)^{−1} A^TB ∈ R^{n×d}.  (4.4.22)
The last formula generalizes (4.4.13). For d = 1, we recover the previous expression (4.4.10) for
the TLS solution.
We now show that if σ̂n > σn+1 , then V22 is nonsingular. From (4.4.20) it follows that
AV12 + BV22 = U2 Σ2 .
If V22 is singular, then V22 z = 0 for some unit vector z, and hence U2 Σ2 z = AV12 z. From
    V_2^TV_2 = V_{12}^TV_{12} + V_{22}^TV_{22} = I

it follows that V_{12}^TV_{12} z = z and ∥V_{12} z∥_2 = 1. But then
When this condition is satisfied, the following extension of the classical SVD algorithm computes
the least-norm solution; see Van Huffel and Vandewalle [1077, 1991, Section 3.6.1]. For d = 1
and p = n the algorithm coincides with the classical SVD algorithm described earlier.
Algorithm 4.4.1.
Given a data matrix A ∈ Rm×n and an observation matrix B ∈ Rm×d , do the following:
1. Compute the SVD of the extended matrix C = ( A  B ) ∈ R^{m×(n+d)}:

       C = UΣV^T = Σ_{i=1}^{n+d} σ_i u_i v_i^T.  (4.4.23)
From the CS decomposition it follows that if V22 has full column rank, then so has V11 .
For a proof of the equivalence of the formulas in (4.4.25), see Van Huffel and Vandewalle [1077,
1991, Theorem 3.10]. The second expression for X only requires computation of the k largest
right singular vectors of ( A b ) and is advantageous when k is small.
Algorithm 4.4.1 for solving the multidimensional TLS problem only requires a small part
of the full SVD, namely the d ≪ n right singular vectors corresponding to the smallest singu-
lar values. For this purpose Van Huffel, Vandewalle, and Haegemans [1078, 1987] developed
a modified partial QRSVD (PSVD) algorithm. In the Householder bidiagonalization phase the
singular vectors U and V are initialized by the accumulated products of the Householder reflec-
tions. During the QRSVD iterations, plane rotations are applied to U and V to generate the left and right singular vectors. A great amount of work is saved in the PSVD algorithm by delaying
the initializing of U and V until the end of the diagonalizing phase. The Householder reflections
are then applied only to the (small number of) desired singular vectors of the bidiagonal matrix.
A second modification in PSVD is to perform only a partial diagonalization of the bidiagonal
matrix. The iterative process is stopped as soon as convergence has occurred to all desired singu-
lar values. Assume that at the ith step of the diagonalization phase, we have the block bidiagonal
form

    B^{(i)} = diag(B_1^{(i)}, B_2^{(i)}, . . . , B_s^{(i)}),
where B_j^{(i)}, j = 1, . . . , s, are unreduced upper bidiagonal matrices. Suppose that a basis for a singular subspace corresponding to the singular values σ_i ≤ θ is desired. Then spectrum slicing (see Section 7.2.1) can be used to partition these blocks into three classes:

    C_1 = {B_j | all singular values > θ},
    C_2 = {B_j | all singular values ≤ θ},
    C_3 = {B_j | at least one singular value > θ and at least one ≤ θ}.
If C3 is empty, then the algorithm stops. Otherwise, one QR iteration is applied to each block in
C3 , and the partition is reclassified. If no bound on the singular values can be given but, instead,
the dimension p of the desired subspace is known, then a bound θ can be computed with the
bisection method from Section 7.2.1. A complete description of the PSVD algorithm is given in
Van Huffel and Vandewalle [1077, 1991, Section 4.3]. A Fortran 77 implementation of PSVD is
available from Netlib.
This characterization of the TLS solution suggests the following alternative formulation of the TLS problem. The unique minimum σ_{n+1}^2 of the Rayleigh quotient ρ(x) = v^T Ã^TÃ v / ∥v∥_2^2, where

    Ã = ( A  b ),    v = [x; −1],
is attained for x = x_{TLS}. Hence the TLS problem is equivalent to min_x ρ(x), where

    ρ(x) = ∥r∥_2^2/(∥x∥_2^2 + 1),    r = −Ã [x; −1] = b − Ax.  (4.4.27)
In the algorithm of Björck, Heggernes, and Matstoms [147, 2000] the TLS solution x is
computed by applying inverse iteration to the symmetric eigenvalue problem (4.4.26). Let x(k)
be the current approximation. Then x(k+1) and the scalars βk , k = 0, 1, 2, . . . , are computed by
solving

    Ã^TÃ [x^{(k+1)}; −1] = β_k [x^{(k)}; −1].  (4.4.28)
If the compact QR factorization
    Ã = ( A  b ) = Q [R  c; 0  η],    Q ∈ R^{m×(n+1)},  (4.4.29)
is known, then the solution of (4.4.28) is obtained by solving the two triangular systems
    [R^T  0; c^T  η] [z^{(k)}; −γ_k] = [x^{(k)}; −1],    [R  c; 0  η] [x^{(k+1)}; −1] = β_k [z^{(k)}; −γ_k].
After eliminating γ_k, this becomes

    x^{(k+1)} = x_{LS} + β_k R^{−1}(R^{−T} x^{(k)}),    β_k = η^2/(1 + x_{LS}^T x^{(k)}).  (4.4.30)

The iterations are initialized by taking

    x^{(0)} = x_{LS} = R^{−1} c,    β_0 = η^2/(1 + ∥x_{LS}∥_2^2).  (4.4.31)
From classical convergence results for symmetric inverse iteration it follows that

    ∥x^{(k)} − x_{TLS}∥_2 = O((σ_{n+1}/σ_n)^{2k}),    |ρ(x^{(k)}) − σ_{n+1}^2| = O((σ_{n+1}/σ_n)^{4k}).
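A minimal MATLAB sketch of this inverse iteration, using (4.4.29)–(4.4.31) with a fixed number of steps, might look as follows.

function x = tls_inviter(A, b, nsteps)
% TLS solution by inverse iteration, using the compact QR factorization
% (4.4.29) of (A b) and the recurrences (4.4.30)-(4.4.31).
n = size(A,2);
[~,T] = qr([A b], 0);             % T = [R c; 0 eta]
R = T(1:n,1:n); c = T(1:n,n+1); eta = T(n+1,n+1);
xLS  = R\c;                       % least squares solution, x^(0)
x    = xLS;
beta = eta^2/(1 + norm(xLS)^2);   % beta_0
for k = 1:nsteps
    x    = xLS + beta*(R\(R'\x)); % x^(k+1) = xLS + beta_k R^{-1}(R^{-T} x^(k))
    beta = eta^2/(1 + xLS'*x);    % next beta from (4.4.30)
end
end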
By using Rayleigh-quotient iteration (RQI), a better rate of convergence can be obtained. For
properties of the Rayleigh quotient of symmetric matrices, see Section 7.3.1.
Fasino and Fazzi [398, 2018] note that by the characterization (4.4.27) the TLS problem is
equivalent to the nonlinear least squares problem minx ∥f (x)∥2 , where
f (x) = µ(x)(b − Ax), µ(x) = (1 + xT x)−1/2 , (4.4.32)
and solved by a Gauss–Newton method (see Section 8.1.2). If xk is the current approximation,
this requires solution of a sequence of linear least squares problems

    min_{h_k} ∥f(x_k) + J(x_k) h_k∥_2,    x_{k+1} = x_k + h_k,

where J(x_k) = −µ(x_k)(A + µ(x_k)^2 r_k x_k^T), with r_k = b − Ax_k, is the Jacobian of f(x) at x_k.
Since A + µ(xk )2 rk xTk is a rank-one modification of A, its QR factorization can be cheaply
computed by modifying the QR factorization of A; see Section 3.3.2. In fact, this method is
closely related to inverse iteration and has a rate of convergence similar to that shown by Peters
and Wilkinson [891, 1979].
can be computed by Algorithm 4.4.1. The second expression for xk is better to use when k is
small. The norm of x_k is ∥x_k∥_2 = (∥V_{22}∥_2^{−2} − 1)^{1/2}. It increases with k, while the norm of the residual matrix

    ∥( A  b ) − ( Ã  b̃ )∥_F = (σ_{k+1}^2 + · · · + σ_{n+1}^2)^{1/2}  (4.4.34)
decreases with k. The TTLS solution x_k can be written as a filtered sum

    x_k = Σ_{i=1}^{n} f_i (û_i^T b/σ̂_i) v̂_i,
where A = Û Σ̂V̂ T is the SVD of A. Fierro et al. [404, 1997] show that when ûTi b ̸= 0 and
i ≤ k, the filter factors fi are close to one, and for i > k they are small.
Another approach to regularization is to restrict the TLS solution by a quadratic constraint.
The RTLS problem is

    min_{E,f} ∥( E  f )∥_F   subject to   (A + E)x = b + f,   ∥Lx∥_2 ≤ δ,  (4.4.35)

where δ > 0 is a regularization parameter, and the matrix L ∈ R^{p×n} defines a seminorm on the
solution space. In practice, the parameter δ is usually not exactly specified but has to be estimated
from the given data using the techniques discussed for the TLS problem in Section 3.6.4.
The optimal solution of the RTLS problem is different from xTLS only when the quadratic
constraint is active, i.e.,
∥LxTLS ∥2 > δ.
In this case the constraint in (4.4.35) holds with equality at the optimal solution, and the RTLS
solution can be characterized by the following first-order optimality conditions for (4.4.35); see
Golub, Hansen, and O’Leary [492, 1999].
Theorem 4.4.1. The solution to the regularized TLS problem (4.4.35) with the inequality con-
straint replaced by equality is a solution to the eigenvalue problem
where

    λ_I = ϕ(x) = ∥b − Ax∥_2^2/(1 + ∥x∥_2^2),    λ_L = (1/δ^2) ( b^T(b − Ax) − ϕ(x) ).  (4.4.37)
Sima, Van Huffel, and Golub [994, 2004] give methods for solving the RTLS problem using
an iterative method that in each step solves a quadratic eigenvalue problem. In practice, very few
steps are required. Their method can be applied to large problems using existing fast methods
based on projection onto Krylov subspaces for solving quadratic eigenvalue problems.
The first-order conditions for RTLS given in Theorem 4.4.1 are the same as for the con-
strained minimization problem
    min_x ∥b − Ax∥_2^2/(1 + ∥x∥_2^2)   subject to   ∥Lx∥_2 ≤ δ.  (4.4.38)
This Rayleigh quotient formulation of the RTLS problem is closely related to Tikhonov regular-
ization of TLS,
    min_x ∥b − Ax∥_2^2/(1 + ∥x∥_2^2) + ρ∥Lx∥_2^2.  (4.4.40)
Beck and Ben-Tal [96, 2006] show that this problem can be reduced to a sequence of trust-region
problems and give detailed algorithms. Algorithms for large-scale Tikhonov regularization of
TLS are developed by Lampe and Voss [712, 2013].
Guo and Renaut [554, 2002] show that the eigenvalue problem in Theorem 4.4.1 can be
formulated as

    (M + λ_L N) [x; −1] = λ_I [x; −1],  (4.4.41)
Based on this formulation, they suggest an algorithm using shifted inverse iteration to solve the
eigenvalue problem (4.4.41). As an initial solution, the corresponding regularized LS solution
x(0) = xRLS is used. An additional complication is that the matrix B depends on the RTLS
solution. Their algorithm is further analyzed and refined in Renaut and Guo [924, 2005]. Lampe
and Voss [711, 2008] develop a related but faster method that uses a nonlinear Arnoldi process
(see Section 6.4.5) and a modified root-finding method.
M. Wei [1111, 1992] gives algebraic relations between the total least squares and least squares
problems with more than one solution. Several papers study the sensitivity and condition-
ing of the TLS problem and give bounds for the condition number; see Zhou et al. [1149,
2009], Baboulin and Gratton [51, 2011], Jia and Li [670, 2013], and Xie, Xiang, and Y. Wei
[1134, 2017]. A perturbation analysis of TTLS is given by Gratton, Titley-Peloquin, and
Ilunga [529, 2013].
De Moor [297, 1993] studies more general structured and weighted TLS problems and con-
siders applications in systems and control theory. These problems can be solved via a nonlinear
GSVD. A review of developments and extensions of the TLS method to weighted and struc-
tured approximation is given by Markovsky and Van Huffel [777, 2007]. A recent bibliography
on total least squares is given by Markovsky [776, 2010]. Standard TLS methods may not be
appropriate when A has a special structure such as Toeplitz or Vandermonde. The structured
least-norm problem (STLN) preserves structure and can minimize errors also in the ℓ1 -norm and
others; see Rosen, Park, and Glick [936, 1996] and Van Huffel, Park, and Rosen [1076, 1996].
is a polynomial of degree k with nonzero leading coefficient akk . The coefficients of such a
family form a nonsingular lower triangular matrix A = (ai,j ), 0 ≤ j ≤ i ≤ n. Then the
monomials xk , k = 0, . . . , n, can be expressed recursively and uniquely as linear combinations
xk = bk,0 p0 + bk,1 p1 + · · · + bk,k pk , where the associated matrix is B = (bi,j ) = A−1 .
Definition 4.5.1. Let the real-valued functions f and g be defined on a finite grid G = {x_i}_{i=0}^{m} of distinct points. Then the inner product (f, g) is defined by

    (f, g) = Σ_{i=0}^{m} f(x_i) g(x_i) w_i,  (4.5.2)

where {w_i}_{i=0}^{m} are given positive weights. The norm of f is ∥f∥ = (f, f)^{1/2}.
An important consideration is the choice of a proper basis for the space of approximating
functions. The functions ϕ0 , ϕ1 , . . . , ϕn are said to form an orthogonal system if (ϕi , ϕj ) = 0
for i ̸= j and ∥ϕi ∥ ̸= 0 for all i. If, in addition, ∥ϕi ∥ = 1 for all i, then the sequence is
called an orthonormal system. We remark that the notation used is such that the results can be
generalized with minor changes to cover the least squares approximation when f is approximated
by an infinite sequence of orthogonal functions ϕ0 , ϕ1 , ϕ2 , . . . .
We study the least squares approximation problem to determine coefficients c_0, c_1, . . . , c_n in (4.5.1) such that the weighted Euclidean norm

    ∥f* − f∥^2 = Σ_{i=0}^{m} w_i |f*(x_i) − f(x_i)|^2
of the error is minimized. Note that interpolation is a special case (n = m) of this problem. By
a family of orthogonal polynomials we mean here a triangle family of polynomials, which is an
orthogonal system with respect to the inner product (4.5.2) for some given weights.
Theorem 4.5.2. Let ϕ_0, ϕ_1, . . . , ϕ_n be linearly independent functions. Then the least squares approximation problem has the unique solution

    f* = Σ_{j=0}^{n} c_j ϕ_j,    c_j = (f, ϕ_j)/(ϕ_j, ϕ_j).

    β_n = (xϕ_n, ϕ_n)/∥ϕ_n∥^2,    γ_n = (α_n/α_{n−1}) ∥ϕ_n∥^2/∥ϕ_{n−1}∥^2   (n > 0).  (4.5.6)
Proof. Suppose that the ϕj have been constructed for 0 ≤ j ≤ n, ϕj ̸= 0 (n ≥ 0). We now seek
a polynomial of degree n + 1 with leading coefficient an+1 that is orthogonal to ϕ0 , ϕ1 , . . . , ϕn .
For a triangle family of polynomials {ϕ_j}_{j=0}^{n}, we can write

    ϕ_{n+1} = α_n xϕ_n − Σ_{i=0}^{n} c_{n,i} ϕ_i.  (4.5.7)

Taking the inner product of (4.5.7) with ϕ_j and using orthogonality gives

    c_{n,j} ∥ϕ_j∥^2 = α_n (xϕ_n, ϕ_j),   j = 0, . . . , n,

which determines the coefficients c_{n,j}. From the definition of inner product it follows that (xϕ_n, ϕ_j) = (ϕ_n, xϕ_j). But xϕ_j is a polynomial of degree j + 1 and is therefore orthogonal to ϕ_n if j + 1 < n. So c_{n,j} = 0, j < n − 1, and thus

    ϕ_{n+1} = α_n xϕ_n − c_{n,n} ϕ_n − c_{n,n−1} ϕ_{n−1}.

This has the same form as (4.5.5) if we set β_n = c_{n,n}/α_n, γ_n = c_{n,n−1}. To get the expression
in (4.5.6) for γn , we take the inner product of equation (4.5.7) with ϕn+1 . From orthogonal-
ity it follows that (ϕn+1 , ϕn+1 ) = αn (ϕn+1 , xϕn ). Decreasing all indices by one, we obtain
(ϕn , xϕn−1 ) = ∥ϕn ∥2 /αn−1 , n ≥ 1. Substituting this in the expression for γn gives the desired
result.
The proof of the above theorem shows a way to construct βn , γn , and the values of the
polynomials ϕn at the grid points for n = 1, 2, 3, . . . . This is called the Stieltjes procedure.
For n = m, the constructed polynomial must be equal to am+1 (x − x0 )(x − x1 ) · · · (x − xm ),
because this polynomial is zero at all the grid points and thus orthogonal to all functions on the
grid. Since ∥ϕm+1 ∥ = 0, the construction stops at n = m. This is natural because there cannot
be more than m + 1 orthogonal (or even linearly independent) functions on a grid with m + 1
points.
Given the coefficients in an orthogonal expansion, the values of f can be efficiently computed
using the following algorithm. (The proof is left as a somewhat difficult exercise.)
Theorem 4.5.4 (Clenshaw's Formula). Let p_k = Σ_{j=0}^{k} c_j ϕ_j, where ϕ_j(x) are orthogonal polynomials satisfying the recursion (4.5.5). Then p(x) = A_0 y_0, where y_{n+2} = y_{n+1} = 0 and
From Theorem 4.5.2 follows the important result that the coefficients in the best approximat-
ing polynomial pk of degree k are independent of k and given by cj = (f, ϕj )/(ϕj , ϕj ). Hence
approximations of increasing degree can be recursively generated as follows. Assume that ϕ_i, i = 1, . . . , k, and p_k have been computed. In the next step the coefficients β_k, γ_k are computed from (4.5.6) and ϕ_{k+1} by (4.5.5). The next approximation of f is then given by

    p_{k+1} = p_k + c_{k+1} ϕ_{k+1},    c_{k+1} = (f, ϕ_{k+1})/(ϕ_{k+1}, ϕ_{k+1}).
The coefficients {βk , γk } in the recursion formula (4.5.5) and the orthogonal functions ϕj
at the grid points are computed using the Stieltjes procedure together with the orthogonal co-
efficients {cj } for j = 1, 2, . . . , n. The total work required is about 4mn flops, assuming unit
weights and that the grid is symmetric. If there are differing weights, then about mn additional
operations are needed; similarly, mn additional operations are required if the grid is not sym-
metric. If the orthogonal coefficients are determined simultaneously for several functions on the
same grid, then only about mn additional operations per function are required. (In the above, we
assume that m ≫ 1, n ≫ 1.) Hence, the procedure is much more economical than the general
methods based on normal equations or QR factorization, which all require O(mn2 ) flops.
In practice the computed ϕk+1 will gradually lose orthogonality to the previously computed
ϕ_j. Since ϕ_{k+1}^T p_k = 0, an alternative expression for the new coefficient is

    c_{k+1} = (r_k, ϕ_{k+1})/(ϕ_{k+1}, ϕ_{k+1}),    r_k = f − p_k.

This expression, which involves the residual r_k, will give better accuracy in the computed coef-
ficients. Indeed, when using the classical formula, one sometimes finds that the residual norm
increases when the degree of the approximation is increased! Note that the difference between
the two variants discussed here is similar to the difference between CGS and MGS.
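For illustration, the Stieltjes procedure with monic polynomials (α_n = 1) and the residual-based coefficients can be sketched in MATLAB as follows; x, w, f are column vectors of grid points, positive weights, and data, and the degree n must not exceed the number of grid points minus one.

function [c, phi] = stieltjes_fit(x, w, f, n)
% Least squares polynomial fit by the Stieltjes procedure, using monic
% orthogonal polynomials (alpha_k = 1) and residual-based coefficients.
m1  = length(x);                         % m+1 grid points
phi = zeros(m1, n+1);                    % values of phi_0,...,phi_n on the grid
ip  = @(u,v) sum(w.*u.*v);               % discrete inner product (4.5.2)
phi(:,1) = 1;                            % phi_0 = 1
c = zeros(n+1,1); r = f;
c(1) = ip(r,phi(:,1))/ip(phi(:,1),phi(:,1));
r = r - c(1)*phi(:,1);                   % residual after degree 0
for k = 1:n
    beta = ip(x.*phi(:,k), phi(:,k))/ip(phi(:,k), phi(:,k));
    if k == 1
        phi(:,2) = (x - beta).*phi(:,1);
    else
        gamma = ip(phi(:,k), phi(:,k))/ip(phi(:,k-1), phi(:,k-1));
        phi(:,k+1) = (x - beta).*phi(:,k) - gamma*phi(:,k-1);
    end
    c(k+1) = ip(r, phi(:,k+1))/ip(phi(:,k+1), phi(:,k+1));
    r = r - c(k+1)*phi(:,k+1);           % residual-based coefficient update
end
end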
The Stieltjes procedure may be sensitive to propagation of roundoff errors. An alternative
procedure for computing the recurrence coefficients in (4.5.5) and the values of the orthogonal
polynomials has been given by Gragg and Harrod [523, 1984]; see also Boley and Golub [169,
1987]. In this procedure these quantities are computed from an inverse eigenvalue problem for
a certain symmetric tridiagonal matrix. Reichel [916, 1991] compares this scheme with the
Stieltjes procedure and shows that the Gragg–Harrod procedure generally yields better accuracy.
Expansions in orthogonal polynomials also have the very important advantage of avoiding the difficulties with ill-conditioned systems of equations that occur even for moderate n when the coefficients in a polynomial Σ_{j=0}^{n} c_j x^j are determined from function values given on an equidistant grid.
For equidistant data, the Gram polynomials {P_{n,m}}_{n=0}^{m}, which are orthogonal with respect to the inner product

    (f, g) = Σ_{i=0}^{m} f(x_i) g(x_i),    x_i = −1 + 2i/m,

are relevant. These satisfy the recursion formula P_{−1,m}(x) = 0, P_{0,m} = (m + 1)^{−1/2},
When n ≪ m1/2 , these polynomials are well behaved. However, when n ≫ m1/2 , they have
very large oscillations between the grid points, and a large maximum norm in [−1, 1]. Related
to this is the fact that when fitting a polynomial to equidistant data, one should never choose n
larger than about 2m1/2 .
One of the motivations for the method of least squares is that it effectively reduces the in-
fluence of random errors in measurements. Suppose that the values of a function have been
measured at points x0 , x1 , . . . , xm . Let f (xp ) be the measured value, and let f¯(xp ) be the
“true” (unknown) function value, which is assumed to be the same as the expected value of the
measured value. Thus, no systematic errors are assumed to be present. Suppose further that
the errors in measurement at the various points are statistically independent. Then we have
f (xp ) = f¯(xp ) + ϵ, where
E(ϵ) = 0, V(ϵ) = s2 I, (4.5.13)
    V{f_n^*(α)} = V{ Σ_{j=0}^{n} c_j^* ϕ_j(α) } = Σ_{j=0}^{n} V{c_j^*} |ϕ_j(α)|^2 = s^2 Σ_{j=0}^{n} |ϕ_j(α)|^2.
As an average, taken over the grid of measurement points, the variance of the smoothed function
values is
    (1/(m + 1)) Σ_{i=0}^{m} V{f_n^*(x_i)} = (s^2/(m + 1)) Σ_{j=0}^{n} Σ_{i=0}^{m} |ϕ_j(x_i)|^2 = s^2 (n + 1)/(m + 1).
Between the grid points, however, the variance can, in many cases, be significantly larger. For
j ≫ m1/2 the Gram polynomials can be much larger between the grid points. Set
    s_I^2 = s^2 Σ_{j=0}^{n} (1/2) ∫_{−1}^{1} |ϕ_j(α)|^2 dα.
Thus, s2I is an average variance for fn∗ (α) taken over the entire interval [−1, 1]. The following
values were obtained for the ratio k between s2I and s2 (n + 1)/(m + 1) when m = 41; see
Dahlquist and Björck [283, 1974, Section 4.4.5]:
n 5 10 15 20 25 30 35
k 1.0 1.1 1.7 26 7 · 103 1.7 · 107 8 · 1011
These results are related to the recommendation that one should choose n < 2m1/2 when fitting
a polynomial to equidistant data. This recommendation seems to contradict the Gauss–Markov
theorem, but in fact it only means that one gives up the requirement that the estimate be unbi-
ased. Still, it is remarkable that this can lead to such a drastic reduction of the variance of the
estimates fn∗ .
The trigonometric identity cos((n + 1)ϕ) + cos((n − 1)ϕ) = 2 cos ϕ cos(nϕ) can be used recursively to express cos(nϕ) as a polynomial in cos ϕ. If we set x = cos ϕ, then ϕ = arccos x, and we obtain the Chebyshev polynomials for −1 ≤ x ≤ 1 by the formula T_n(x) = cos(n arccos x), n ≥ 0. From trigonometric formulas it follows that the Chebyshev polynomials satisfy the recursion formula

    T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x),   n ≥ 1,    T_0(x) = 1,   T_1(x) = x.
The leading coefficient of Tn (x) is 2n−1 for n ≥ 1 and 1 for n = 0. The symmetry property
Tn (−x) = (−1)n Tn (x) also follows from the recurrence formula.
Theorem 4.5.5. T_n(x) has n zeros in [−1, 1] given by the Chebyshev abscissae,

    x_k = cos( (2k + 1)π/(2n) ),   k = 0, 1, . . . , n − 1,  (4.5.15)
and n + 1 extrema Tn (x′k ) = (−1)k attained at x′k = cos(kπ/n), k = 0, . . . , n. These results
follow directly by noting that | cos(nϕ)| has maxima for ϕ′k = kπ/n, and cos(nϕk ) = 0 for
ϕk = (2k + 1)π/(2n).
8 Pafnuty Lvovich Chebyshev (1821–1894) was a Russian mathematician and a pioneer in approximation theory.
The Chebyshev polynomials T_0, T_1, . . . , T_{n−1} are orthogonal with respect to the inner product

    (f, g) = Σ_{k=0}^{n−1} f(x_k) g(x_k),

where {x_k} are the Chebyshev abscissae (4.5.15) for T_n. If i ≠ j, then (T_i, T_j) = 0, 0 ≤ i, j < n, and

    (T_i, T_j) = n/2  if i = j ≠ 0,    (T_i, T_j) = n  if i = j = 0.  (4.5.16)
If one intends to approximate a function in the entire interval [−1, 1] by a polynomial and
can choose the points at which the function is computed or measured, one should choose the
Chebyshev abscissae. With these points, interpolation is a fairly well-conditioned problem in the
entire interval, and one can conveniently fit a polynomial of lower degree than m if one wishes
to smooth errors in measurement. The risk of disturbing surprises between the grid points is
insignificant.
Let p(x) denote the interpolation polynomial of a function f (x) at the Chebyshev abscissae
xk (4.5.15). From Theorem 4.5.3 we get
    p(x) = Σ_{j=0}^{n−1} c_j T_j(x),    c_i = (1/∥T_i∥^2) Σ_{k=0}^{n−1} f(x_k) T_i(x_k),
The interpolation error can be written as

    (f^{(n)}(ξ)/n!) (x − x_0)(x − x_1) · · · (x − x_{n−1}).
Here ξ depends on x, but one can say that the error curve behaves for the most part like a polyno-
mial curve y = a(x − x0 )(x − x1 ) · · · (x − xn−1 ). A similar oscillating curve is typical for error
curves arising from least squares approximation. The zeros of the error are then about the same
as the zeros for the first neglected term in the orthogonal expansion. This contrasts sharply with
the error curve for Taylor approximation at x0 , whose usual behavior is described approximately
by the formula y = a(x − x_0)^{n−1}. From the min-max property of the Chebyshev polynomials it follows that placing the interpolation points at the Chebyshev abscissae will minimize the maximum magnitude of

    q(x) = (x − x_0)(x − x_1) · · · (x − x_m)

in the interval [−1, 1]. This corresponds to choosing q(x) = T_{m+1}(x)/2^m.
For computing p(x), one can use Clenshaw’s recursion formula; see the previous section.
(Note that αk = 2 for k > 0, but α0 = 1.) Occasionally, one is interested in the partial
sums of the expansion. To smooth errors in measurement, it can be advantageous to break off
the summation before the last term. If the values of the function are afflicted with statistically
independent errors in measurement with standard deviation s, then (see the next section) the
series can be broken off when, for the first time,

    ∥ f − Σ_{j=0}^{k} c_j T_j ∥ < s n^{1/2}.
If the measurement points are the Chebyshev abscissae, then no difficulties arise in fitting a
polynomial to the data. In this case the Chebyshev polynomials have a magnitude between the
grid points that is not much larger than their magnitude at the grid points. The average variance
for f_n^* becomes the same both on the interval [−1, 1] and on the grid of measurement points:
s^2(n + 1)/(m + 1).
The choice of n, when m is given, is a question of compromising between taking into account
the systematic error, i.e., the truncation error (which decreases when n increases) and taking into
account the random errors (which grow as n increases). In the Chebyshev case, |cj | decreases
quickly with j if f is a sufficiently smooth function, while the part of c_j^* that comes from errors
in measurement varies randomly with magnitude about s(2/(m + 1))^{1/2}. The expansion should
then be broken off when the coefficients begin to “behave randomly.” The coefficients in an ex-
pansion in terms of the Chebyshev polynomials can hence be used for filtering away the “noise”
from the signal, even when s is initially unknown.
A function f(t) with period 2π/ω can be expanded in a Fourier series
    f(t) = a_0 + \sum_{k=1}^{\infty} \bigl( a_k \cos k\omega t + b_k \sin k\omega t \bigr),   (4.5.17)
where a_k, b_k are real constants. Fourier analysis is one of the most useful and valuable tools
in applied mathematics. It has applications also to problems that are not a priori periodic. One
important area of application is in digital signal processing, e.g., in interpreting radar and sonar
signals. Another application is statistical time series, which arise in communications theory,
control theory, and the study of turbulence.
An expansion of the form (4.5.17) can be expressed in several equivalent ways. Another
form, more convenient for manipulations, is
    f(t) = \sum_{k=-\infty}^{\infty} c_k\, e^{ik\omega t},   (4.5.18)
where c0 = a0 , ck = (ak − ibk )/2, c−k = (ak + ibk )/2, k > 0. This form allows the function to
have complex values. Function f with period 2π can be approximated by partial sums of these
series. We call these finite sums trigonometric polynomials. If a function of t has period p,
then the substitution x = 2πt/p transforms the function into a function of x with period 2π.
When the functions to be modeled are known only on a discrete equidistant set of sampling
points, a discrete version of the Fourier analysis can be used. The discrete inner product of two
complex-valued functions f and g of period 2π is defined as follows (the bar over g indicates
complex conjugation):
    (f, g) = \sum_{\beta=0}^{N-1} f(x_\beta)\, \overline{g(x_\beta)}, \qquad x_\beta = 2\pi\beta/N.   (4.5.19)
Theorem 4.5.6. With inner product (4.5.19) the following orthogonality relations hold for the
functions ϕj (x) = eijx , j = 0, ±1, ±2, . . .:
    (\phi_j, \phi_k) = \begin{cases} N & \text{if } (j-k)/N \text{ is an integer}, \\ 0 & \text{otherwise}. \end{cases}

Proof. By (4.5.19), (\phi_j, \phi_k) = \sum_{\beta=0}^{N-1} e^{i(j-k)\beta h} with h = 2\pi/N. This is a geometric series with
ratio q = e^{i(j−k)h}. If (j − k)/N is an integer, then q = 1, and the sum is N. Otherwise, q ≠ 1, but
q^N = e^{i(j−k)2\pi} = 1. From the summation formula of a geometric series, (\phi_j, \phi_k) = (q^N − 1)/(q − 1) = 0.
If f has an expansion of the form f = \sum_j c_j \phi_j with at most N terms, say a ≤ j ≤ b, then
    (f, \phi_k) = \sum_{j=a}^{b} c_j (\phi_j, \phi_k) = c_k (\phi_k, \phi_k), \qquad a \le k \le b,
which determines the coefficients as c_k = (f, \phi_k)/(\phi_k, \phi_k).
Note that the calculations required to compute the coefficients c_j according to (4.5.20), called
Fourier analysis, are of essentially the same type as the calculations needed to tabulate f^*(x)
for x = 2πβ/N, β = 0, 1, . . . , N − 1, when the expansion in (4.5.21) is known, the so-called
Fourier synthesis.
Theorem 4.5.7. Every function f(x) defined on the grid x_β = 2πβ/N, β = 0, . . . , N − 1, can
be interpolated by a trigonometric polynomial
    f(x) = \sum_{j=-k}^{k+\theta} c_j\, e^{ijx},   (4.5.21)
where
    \theta = \begin{cases} 1 & \text{if } N \text{ even}, \\ 0 & \text{if } N \text{ odd}, \end{cases}
    \qquad
    k = \begin{cases} N/2 - 1 & \text{if } N \text{ even}, \\ (N-1)/2 & \text{if } N \text{ odd}. \end{cases}
If the sum in (4.5.21) is terminated when j < k + θ, one obtains the trigonometric polynomial
that is the best least squares approximation, among all trigonometric polynomials with the same
number of terms, to f on the grid.
Proof. The expression for c_j was formally derived previously (see (4.5.20)). Because e^{-ijx_\beta} = e^{-2\pi i j\beta/N}, the coefficients can be written
    c_j = \sum_{\beta=0}^{N-1} f_\beta\, \omega^{j\beta}, \qquad f_\beta = f(x_\beta),   (4.5.22)
where ω = e^{−2πi/N} is an Nth root of unity (ω^N = 1). Hence c_j is a polynomial of degree N − 1
in ω^j. Let F_N ∈ C^{N×N} be the Fourier matrix with elements
    (F_N)_{j\beta} = \omega^{j\beta}, \qquad j, \beta = 0, \ldots, N − 1.
It follows that the discrete Fourier transform can be expressed as a matrix-vector multiplica-
tion c = FN f , where the discrete Fourier transform (DFT) matrix FN is a complex symmetric
Vandermonde matrix. Furthermore,
    \frac{1}{N} F_N^H F_N = I,   (4.5.23)
i.e., N^{−1/2} F_N is a unitary matrix, and the inverse transform is
    f = \frac{1}{N} F_N^H c.
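In MATLAB the matrix F_N with elements ω^{jβ} is exactly the matrix applied by fft, so (4.5.23) and the inverse transform can be checked numerically (a small illustration, not from the book):

    N  = 8;
    w  = exp(-2i*pi/N);
    FN = w.^((0:N-1)'*(0:N-1));          % (F_N)_{j,beta} = w^(j*beta)
    norm(FN - fft(eye(N)))               % F_N coincides with MATLAB's DFT matrix
    norm(FN'*FN/N - eye(N))              % (4.5.23): N^(-1/2)*F_N is unitary
    f  = randn(N, 1);
    c  = FN*f;                           % analysis:  c = F_N*f
    norm(f - FN'*c/N)                    % synthesis: f = (1/N)*F_N^H*c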
If implemented in a naive way, the DFT will take N 2 operations to compute all cj (here,
one operation equals one complex addition and one complex multiplication). The application of
discrete Fourier analysis to large-scale problems became feasible only with the invention of the
so-called fast Fourier transform (FFT) that reduces the computational complexity to O(N log N ).
The FFT, developed in 1965 by Cooley and Tukey [269, 1965], is based on the divide-and-conquer strategy. Consider the special case when N = 2^p and set
    \beta = \begin{cases} 2\beta_1 & \text{if } \beta \text{ even}, \\ 2\beta_1 + 1 & \text{if } \beta \text{ odd}, \end{cases} \qquad 0 \le \beta_1 \le \tfrac{1}{2}N - 1.
Then the sum in (4.5.22) can be split into an even part and an odd part:
    c_j = \sum_{\beta_1=0}^{\frac{1}{2}N-1} f_{2\beta_1} (\omega^2)^{j\beta_1}
        + \sum_{\beta_1=0}^{\frac{1}{2}N-1} f_{2\beta_1+1} (\omega^2)^{j\beta_1} \omega^{j}.   (4.5.24)
Let α be the quotient and j_1 the remainder when j is divided by \tfrac{1}{2}N, i.e., j = \alpha\,\tfrac{1}{2}N + j_1. Then,
because ω^N = 1,
    (\omega^2)^{j\beta_1} = (\omega^2)^{\alpha \frac{1}{2}N \beta_1} (\omega^2)^{j_1\beta_1} = (\omega^N)^{\alpha\beta_1} (\omega^2)^{j_1\beta_1} = (\omega^2)^{j_1\beta_1}.
If we set
    \phi(j_1) = \sum_{\beta_1=0}^{\frac{1}{2}N-1} f_{2\beta_1} (\omega^2)^{j_1\beta_1}, \qquad
    \psi(j_1) = \sum_{\beta_1=0}^{\frac{1}{2}N-1} f_{2\beta_1+1} (\omega^2)^{j_1\beta_1},
where (\omega^2)^{\frac{1}{2}N} = 1, then by (4.5.24),
    c_j = \phi(j_1) + \omega^{j} \psi(j_1), \qquad j = 0, 1, \ldots, N − 1.
The two sums on the right are elements of the DFTs of length N/2 applied to the parts of f with
odd and even subscripts. The DFT of length N is obtained by combining these two DFTs. Since
\omega_N^{N/2} = −1, it follows that
    y_{j_1} = \phi_{j_1} + \omega_N^{j_1} \psi_{j_1},   (4.5.25)
    y_{j_1+N/2} = \phi_{j_1} - \omega_N^{j_1} \psi_{j_1}, \qquad j_1 = 0, \ldots, N/2 − 1.   (4.5.26)
These expressions are called the butterfly relations because of the data flow pattern. The com-
putation of ϕj1 and ψj1 is equivalent to two Fourier transforms with m = N/2 terms instead of
one with N terms. If N/2 is even, the same idea can be applied to these two Fourier transforms.
One then gets four Fourier transforms, each of which has N/4 terms. If N = 2^p, this reduction
can be continued recursively until we get N DFTs with one term. Each step involves an even–
odd permutation. In the first step the points with last binary digit equal to 0 are ordered first, and
those with last digit equal to 1 are ordered last. In the next step the two resulting subsequences
of length N/2 are reordered according to the second binary digit, etc.
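The recursion just described is easy to write down directly. The following MATLAB function is an illustrative sketch (not an excerpt from the book, and far slower than a library FFT) that applies the butterfly relations (4.5.25)–(4.5.26) recursively for N = 2^p:

    function c = fftrec(f)
    % FFTREC  Recursive radix-2 FFT of a column vector f with length(f) = 2^p.
    N = length(f);
    if N == 1
        c = f;                               % a DFT of length one is the identity
    else
        phi = fftrec(f(1:2:N-1));            % half-length DFT of the even-indexed part
        psi = fftrec(f(2:2:N));              % half-length DFT of the odd-indexed part
        w   = exp(-2i*pi/N).^((0:N/2-1)');   % twiddle factors omega_N^(j1)
        c   = [phi + w.*psi; phi - w.*psi];  % butterfly relations (4.5.25)-(4.5.26)
    end
    end

For a column vector x of length 2^p, norm(fftrec(x) - fft(x)) is of the order of roundoff.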
The number of complex operations (one multiplication and one addition) required to compute
{y_j} from the butterfly relations when {ϕ_{j_1}} and {ψ_{j_1}} have been computed is 2^p, assuming that
the powers of ω are precomputed and stored. If we denote by q_p the total number of operations
needed to compute the DFT when N = 2^p, we have q_p ≤ 2q_{p−1} + 2^p, p ≥ 1. Since q_0 = 0, it
follows by induction that
    q_p ≤ p\, 2^p = N \log_2 N.
Hence, when N is a power of two, the FFT solves the problem with at most N log_2 N complex
operations. For example, when N = 2^{20} = 1,048,576 the FFT algorithm is theoretically a factor
of 84,000 faster than the “conventional” O(N^2) algorithm. The FFT algorithm not only uses
fewer operations to evaluate the DFT, it also is more accurate. For the conventional method, the
roundoff error is proportional to N . For the FFT algorithm, the roundoff error is proportional to
log2 N .
Most implementations of FFT avoid explicit recursion and instead use two stages.
• A reordering stage in which the data vector f is permuted in bit-reversal order.
• A second stage in which first N/2 FFT transforms of length 2 are computed on adjacent
elements, followed by N/4 transforms of length 4, etc., until the final result is obtained by
merging two FFTs of length N/2.
It is not difficult to see that the combined effect of the reordering in the first stage is a bit-
reversal permutation of the data points. For i = 0 : N − 1, let the index i have the binary
expansion i = b_0 + b_1 \cdot 2 + \cdots + b_{t−1} \cdot 2^{t−1}, and set
    r(i) = b_{t−1} + \cdots + b_1 \cdot 2^{t−2} + b_0 \cdot 2^{t−1}.
That is, r(i) is the index obtained by reversing the order of the binary digits. If i < r(i), then
exchange fi and fr(i) . We denote the permutation matrix performing the bit-reversal ordering
by PN . Note that if an index is reversed twice, we end up with the original index. This means
that P_N^{−1} = P_N^T = P_N, i.e., P_N is symmetric. The permutation can be carried out “in place” by
a sequence of pairwise interchanges or transpositions of the data points. For example, for N =
16, the pairs (1,8), (2,4), (3,12), (5,10), (7,14), and (11,13) are interchanged. The bit-reversal
permutation can take a substantial fraction of the total time to do the FFT. Which implementation
is best depends strongly on the computer architecture.
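A compact way to generate the bit-reversal permutation in MATLAB, using only base functions (a sketch; the Signal Processing Toolbox also provides bitrevorder):

    N = 16;  t = log2(N);
    i = 0:N-1;
    r = bin2dec(fliplr(dec2bin(i, t))) + 1;   % 1-based bit-reversed indices
    f = (0:N-1)';                             % data vector
    fp = f(r);                                % apply the bit-reversal permutation P_N
    isequal(f, fp(r))                         % reversing twice restores the original order

In 0-based indexing, the permutation r corresponds exactly to the interchanges (1,8), (2,4), (3,12), (5,10), (7,14), and (11,13) listed above.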
When N = 2^p the FFT algorithm can be interpreted as a sparse factorization of the DFT
matrix,
    F_N = A_k \cdots A_2 A_1 P_N,   (4.5.27)
where PN is the bit-reversal permutation matrix, and A1 , . . . , Ak are block diagonal matrices:
    \Omega_{L/2} = \mathrm{diag}\,(1, \omega_L, \ldots, \omega_L^{L/2−1}), \qquad \omega_L = e^{−2\pi i/L}.   (4.5.30)
This is usually referred to as the Cooley–Tukey FFT algorithm.
The discrete cosine transform (DCT) was discovered in 1974 by Ahmed, Natarajan, and
Rao [11, 1974]; see also Rao and Yio [913, 1990]. The DCT has real entries as opposed to
the complex entries of the FFT matrix. Depending on the type of boundary condition (Dirichlet
or Neumann, centered at a mesh point or midpoint) there are different variants. The DCT-2
transform is used extensively in image processing. It uses the real basis vectors
    v_{i,j} = \sqrt{\frac{2}{N}}\, \cos\Bigl( (j−1)\bigl(i+\tfrac{1}{2}\bigr)\frac{\pi}{N} \Bigr),   (4.5.31)
divided by \sqrt{2} if j = 1. Strang [1044, 1999] surveys the four possible variants of cosine trans-
forms DCT-1, . . . , DCT-4 and their use for different boundary conditions.
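A small MATLAB check (an illustration, under the indexing assumption i = 0, . . . , N − 1 in (4.5.31)) that the scaled DCT-2 basis vectors form an orthogonal matrix:

    N = 8;
    [i, j] = ndgrid(0:N-1, 1:N);                % sample index i, basis index j
    V = sqrt(2/N)*cos((j-1).*(i+1/2)*pi/N);     % basis vectors (4.5.31)
    V(:, 1) = V(:, 1)/sqrt(2);                  % first vector divided by sqrt(2)
    norm(V'*V - eye(N))                         % the columns are orthonormal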
where {x_k}_{k=0}^{n} is a sequence of n + 1 distinct real numbers. Vandermonde matrices arise in
many applications, such as interpolation and approximation of linear functionals. Consider first
the problem of constructing a polynomial
    p(x) = a_0 + a_1 x + \cdots + a_n x^n
it follows that V = V (x0 , x1 , . . . , xn ) has positive determinant and positive leading principal
minors. More generally, it is known that V is totally positive. For such Vandermonde systems,
Higham [612, 1987] shows that if the right-hand side is sign-interchanging, (−1)^k b_k ≥ 0, then
the error in the solution computed by the Björck–Pereyra algorithm can be bounded by a quantity
independent of κ(V ).
A Vandermonde-like matrix V = (vij ) has elements vij = ϕi (xj ), 0 ≤ i, j ≤ n, where
{ϕi }n0 is a family of orthonormal polynomials that satisfy a three-term recurrence of the form
(4.5.5). Such matrices generally have much smaller condition numbers than the classical Van-
dermonde matrices. Higham [613, 1988] and Reichel [916, 1991] give fast algorithms of Björck–
Pereyra type for such systems. Demmel and Koev [311, 2005] prove that Björck–Pereyra-type
algorithms exist not only for such systems but also for any totally positive linear system for which
the initial minors can be computed accurately.
Let V ∈ Rm×n be a rectangular Vandermonde matrix consisting of the first n < m columns
of V (x0 , x1 , . . . , xm ). It is natural to ask whether fast methods exist for solving the primal
Vandermonde least squares problem
    \min_x \| Vx − b \|_2.   (4.5.34)
Demeure [301, 1989], [302, 1990] has given an algorithm of complexity O(mn) for computing
the QR factorization of V , which can be used to solve problem (4.5.34). However, because
this algorithm forms V T V , it is likely to be unstable. A fast algorithm based on the Gragg–
Harrod scheme [523, 1984] for computing the QR factorization of transposed Vandermonde-like
matrices is given by Reichel [916, 1991]. This algorithm can be used to solve overdetermined
dual Vandermonde-like systems in the least squares sense in O(mn) operations.
A survey of properties of totally positive matrices is given by Fallat [393, 2001]. Higham [623,
2002, Chapter 22], surveys algorithms for Vandermonde systems. The remarkable numerical
stability obtained for Vandermonde systems has counterparts for other classes of structured ma-
trices. Boros, Kailath, and Olshevsky [171, 1999] derive fast parallel Björck–Pereyra-type algo-
rithms for solving Cauchy linear systems Cx = d, where
    C = \begin{pmatrix}
    \dfrac{1}{x_0 − y_0} & \cdots & \dfrac{1}{x_0 − y_n} \\
    \vdots & \ddots & \vdots \\
    \dfrac{1}{x_n − y_0} & \cdots & \dfrac{1}{x_n − y_n}
    \end{pmatrix}.
This class of systems includes Hilbert linear systems with sign-interchanging right-hand side.
Martínez and Peña [780, 1998] discuss algorithms of similar type for Cauchy–Vandermonde
matrices of the form ( C V ).
A Toeplitz matrix
    T = \begin{pmatrix}
    t_0 & t_1 & \cdots & t_n \\
    t_{−1} & t_0 & \ddots & \vdots \\
    \vdots & \ddots & \ddots & t_1 \\
    t_{−n} & \cdots & t_{−1} & t_0 \\
    \vdots & & \ddots & \vdots \\
    t_{−m} & t_{−m+1} & \cdots & t_{−m+n}
    \end{pmatrix} \in \mathbb{R}^{(m+1)\times(n+1)}   (4.5.35)
is defined by its n + m + 1 elements t_{−m}, . . . , t_0, . . . , t_n in its first row and column. Toeplitz
matrices arise from discretization of convolution-type integral equations and play a fundamental
role in signal processing, time-series analysis, econometrics, and image deblurring; see Hansen,
Nagy, and O’Leary [583, 2006].
matrices arise from discretization of convolution-type integral equations and play a fundamental
role in signal processing, time-series analysis, econometrics, and image deblurring; see Hansen,
Nagy, and O’Leary [583, 2006].
The BBH algorithm of Bojanczyk, Brent, and de Hoog [162, 1986] is a fast algorithm for
computing the QR factorization of a Toeplitz matrix. It is related to a classical algorithm of
Schur and requires O(mn + n2 ) instead of O(mn2 ) operations. The basic idea of the BBH
algorithm is to partition T in two different ways,
    T = \begin{pmatrix} t_0 & u^T \\ v & T_0 \end{pmatrix}
      = \begin{pmatrix} T_0 & \tilde{u} \\ \tilde{v}^T & t_{n−m} \end{pmatrix},   (4.5.36)
For the regularized least squares problem \min_x \{ \|Tx − b\|_2^2 + \lambda^2 \|Lx\|_2^2 \},
where both T and L are upper triangular and Toeplitz, Eldén [370, 1984] gives an algorithm that
only requires 9n^2 flops for computing the solution for a given value of λ. His algorithm can be
modified to handle also the case when T and L have a few nonzero diagonals below the main
diagonal. Eldén’s algorithm uses n^2/2 storage locations. A modification that only uses O(n)
storage locations is given by Bojanczyk and Brent [161, 1986].
An alternative to direct methods for Toeplitz least squares problems is to use iterative meth-
ods, such as the preconditioned conjugate gradient method; see Section 6.2.2. This requires
in each iteration step one matrix-vector multiplication with T and T H . Such products can be
implemented in O(m log m) operations and O(m + n) storage using FFT; see Section 6.3.7.
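The fast matrix–vector product mentioned above embeds T in a circulant matrix, whose action is diagonalized by the FFT. A minimal MATLAB sketch (an illustration; the vectors c and r holding the first column and first row of T are hypothetical test data):

    m = 6;  n = 4;
    c = randn(m, 1);  r = [c(1); randn(n-1, 1)];  % first column and first row of T
    T = toeplitz(c, r);                           % dense Toeplitz matrix, for reference
    x = randn(n, 1);
    p   = m + n - 1;                              % order of the circulant embedding
    col = [c; r(n:-1:2)];                         % first column of the circulant
    y   = ifft(fft(col).*fft([x; zeros(p-n, 1)]));
    y   = real(y(1:m));                           % the first m components give T*x
    norm(y - T*x)                                 % agrees to roundoff

The cost is dominated by three FFTs of length m + n − 1, i.e., O((m + n) log(m + n)) operations.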
Let T = (tij ) be a Toeplitz matrix, and let J denote the reverse permutation matrix. Then
H = T J has constant entries along each antidiagonal, i.e., H is a Hankel matrix. For example,
if n = 2, m = 3, then
    H = TJ = \begin{pmatrix}
    t_2 & t_1 & t_0 \\
    t_1 & t_0 & t_{−1} \\
    t_0 & t_{−1} & t_{−2} \\
    t_{−1} & t_{−2} & t_{−3}
    \end{pmatrix}.
Since the reverse permutation matrix satisfies J −1 = J, we have HJ = T . Hence, methods
discussed in this section for solving Toeplitz least squares problems apply to Hankel least squares
problems as well.
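In MATLAB terms (a small illustration, not from the book), multiplying by J simply reverses the column order:

    c = (0:-1:-3)';  r = 0:2;             % values standing in for t_0,...,t_{-3} and t_0,t_1,t_2
    T = toeplitz(c, r);                   % 4-by-3 Toeplitz matrix as in (4.5.35)
    H = T(:, end:-1:1);                   % H = T*J, with J the reverse permutation
    norm(H - hankel(H(:, 1), H(end, :)))  % H has constant antidiagonals, i.e., is Hankel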
Chapter 5

Direct Methods for Sparse Problems

A sparse matrix is any matrix with enough zeros that it pays to take advantage of them.
—J. H. Wilkinson
Figure 5.1.1. Nonzero pattern of a matrix arising from a structural problem and its Cholesky factor.
1. A symbolic phase in which a column permutation P_c and the nonzero structure of the
Cholesky factor R of C = (AP_c)^T(AP_c) are determined. This step works only on the
nonzero structure of A and also generates a storage structure for R.
2. A numerical phase in which C = (APc )T (APc ) is formed numerically and its Cholesky
factor R is computed and stored in the data structure generated in the symbolic phase.
3. For the given right-hand side b, form c = P_c^T A^T b, solve R^T z = c and Ry = z, and set
x = P_c y, as sketched in the code below.
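A minimal MATLAB sketch of these three phases (an illustration only, assuming A has full column rank and using colamd as the fill-reducing column ordering):

    p = colamd(A);                 % symbolic phase: fill-reducing column ordering
    C = A(:, p)'*A(:, p);          % numerical phase: form C = (A*Pc)'*(A*Pc)
    R = chol(C);                   % sparse Cholesky factor of C
    c = A(:, p)'*b;                % right-hand side c = Pc'*A'*b
    y = R \ (R' \ c);              % solve R'*z = c and R*y = z
    x = zeros(size(A, 2), 1);
    x(p) = y;                      % undo the permutation: x = Pc*y

In MATLAB the symbolic and numerical phases are not exposed separately; the sparse chol performs its own symbolic analysis internally.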
The literature on sparse matrix algorithms is extensive. George and Liu [457, 1981] is an early
classical text on sparse Cholesky factorization. Graph theory and its connections to sparse matrix
computations are treated in George, Gilbert, and Liu [454, 1993]. An excellent survey of the
state of the art of direct methods for sparse linear systems is given by Davis [289, 2006]; see also
Davis, Rajamanickam, and Sid-Lakhdar [292, 2016] and Duff, Erisman, and Reid [345, 2017].
A recent addition that complements theory with detailed outlines of algorithms and emphasizes
the importance of sparse matrix factorizations for constructing preconditioners for iterative least
squares methods is Scott and Tůma [991, 2023].
has structural rank two, but if c_{11}c_{33} − c_{13}c_{31} = 0, the numerical rank is one.
In the following we denote the number of nonzero elements in a sparse matrix (or vector) by
nnz. A sparse vector x can be stored in compressed form as the triple (xC, ix, nnz). Its nnz
nonzero elements are stored in the vector xC. The indices of the elements in xC are stored in
the integer vector ix, so that
    x(ix(k)) = xC(k), \qquad k = 1, \ldots, nnz.
Note that it is not necessary to store the nonzero elements in any particular order in xC. This
makes it easy to add further nonzero elements in x—they can be appended in the last positions
of xC and ix.
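In MATLAB (a small illustration, not from the book) the compressed form of a sparse vector and the uncompress operation look as follows:

    n  = 10;
    x  = [0 3 0 0 -1 0 0 2 0 0]';      % a sparse vector held as a full array
    ix = find(x);                       % indices of the nonzero elements
    xC = x(ix);                         % their values; nnz = numel(ix)
    y  = zeros(n, 1);                   % uncompress into a full-length array
    y(ix) = xC;
    isequal(x, y)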
In the general sparse storage scheme for a sparse matrix A, nonzero elements are stored in
coordinate form (AC, ix, jx, nnz), that is, as an unordered one-dimensional array AC with two
integer vectors ix and jx containing the corresponding row and column indices,
    A(ix(k), jx(k)) = AC(k), \qquad k = 1, \ldots, nnz,
where nnz is the number of nonzero elements in A. This scheme is very convenient for the
initial representation of a general sparse matrix because additional nonzero elements can be
easily added to the structure. A drawback is that it is difficult to access A by rows or columns,
as is usually needed in later operations. Also, the storage overhead is large because two integer
vectors of length nnz are required. This overhead can be decreased by using a clever compressed
scheme due to Sherman; see George and Liu [457, 1981, pp. 139–142].
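MATLAB accepts exactly this coordinate form as input and converts it internally to a compressed column format. A small illustration (not from the book), using the nonzero pattern of the 6 × 5 matrix A in (5.1.1) (the same pattern as in the CSR example below) with random numerical values:

    ix = [1 1 2 2 2 3 4 4 5 5 6]';     % row indices of the nonzeros
    jx = [1 3 2 1 4 3 2 4 4 5 5]';     % column indices of the nonzeros
    AC = rand(11, 1);                  % the nnz = 11 numerical values
    A  = sparse(ix, jx, AC, 6, 5);     % build the sparse matrix from coordinate form
    [i, j, v] = find(A);               % recover a (column-sorted) coordinate form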
In compressed storage by rows (CSR) a sparse matrix is stored as the collection of its sparse
rows. For each row, the nonzeros are stored in AC in compressed form. The corresponding
column subscripts are stored in integer vector ja, so that the column subscript of element AC(k)
is in ja(k). A third vector ip gives the position in AC of the first element in the ith row of A.
For the matrix A in (5.1.1), the CSR representation is
    AC = (a_{11}, a_{13} | a_{22}, a_{21}, a_{24} | a_{33} | a_{42}, a_{44} | a_{54}, a_{55} | a_{65}),
    ja = (1, 3, 2, 1, 4, 3, 2, 4, 4, 5, 5),
    ip = (1, 3, 6, 7, 9, 11).
The components in each row need not be ordered. To access a nonzero aij there is no direct way
to calculate the corresponding index in the vector AC; some testing is needed on the subscripts
in ja. In CSR storage form, a complete row of A can be retrieved efficiently, but elements in
a particular column cannot be retrieved without a search of nearly all elements. Alternatively, a
similar compressed storage by columns (CSC) form can be used.
For a sparse symmetric matrix it suffices to store either its upper or its lower triangular part,
including the main diagonal. If the matrix is positive definite, then all its diagonal elements are
positive, and it may be convenient to store them in a separate vector.
For problems in which all the nonzero matrix elements lie along a few subdiagonals the
compressed diagonals storage mode is suitable. A matrix A is then stored in two rectangular
arrays AD and a vector la of pointers. The array AD has n rows and nd columns, where nd is
the number of diagonals. AD contains the diagonals of A that have at least one nonzero element,
and la contains the corresponding diagonal numbers. The superdiagonals are padded to length
n with k trailing zeros, where k is the diagonal number. The subdiagonals are padded to length
n with |k| leading zeros. This storage scheme is particularly useful if the matrices come from a
finite element or finite difference discretization on a tensor product grid. The matrix A in (5.1.1) would be
stored as
    AD = \begin{pmatrix}
    a_{11} & a_{13} & a_{21} & 0 \\
    a_{22} & a_{24} & 0 & a_{42} \\
    a_{33} & 0 & 0 & 0 \\
    a_{44} & 0 & a_{54} & 0 \\
    a_{55} & 0 & a_{65} & 0
    \end{pmatrix}, \qquad la = (0 \;\; 2 \;\; {-1} \;\; {-2}).
Operations on sparse vectors are simplified if one of the vectors is first uncompressed, i.e.,
stored as a dense vector. This can be done in time proportional to the number of nonzeros, and
it allows direct random access to specified elements in the vector. Vector operations, such as
adding a multiple of a sparse vector x to an uncompressed sparse vector y or computing an inner
product xT y, can then be performed in constant time per nonzero element. For example, assume
x is held in compressed form and y in a full-length array. Then the operation y := a ∗ x + y may
be expressed as
for k = 1, . . . , nnz
i = ix(k); y(i) := a ∗ x(k) + y(i);
end
Some of the more common structures are described below. Let u ∈ Rm and v ∈ Rn be uncom-
pressed vectors, and let A be stored in CSR mode. Then the matrix-vector product u = Av can
be implemented as
for i = 1 : m,
u(i) = 0;
for k = ip(i) : ip(i + 1) − 1,
u(i) = u(i) + AC(k) ∗ v(ja(k));
end
end
For the product v = AT u a similar code would access the elements of A column by column,
which is very inefficient. The product is better performed as
v(1 : n) = 0;
for i = 1 : m
for k = ip(i) : ip(i + 1) − 1,
j = ja(k); v(j) = v(j) + AC(k) ∗ u(i);
end
end
A proposal for standard computational kernels (BLAS) aimed at iterative solvers is given
by Duff et al. [349, 1997]. The interface of the sparse BLAS in the standard from the BLAS
technical forum is designed to shield one from concern over specific storage schemes; see Duff,
Heroux, and Pozo [347, 2002].
The degree of a vertex vi is the number of vertices adjacent to vi and is denoted by |AdjG (vi )|.
A graph Ḡ = (V̄ , Ē) is a subgraph of G = (V, E) if V̄ ⊂ V and Ē ⊂ E. A walk of length k
in an undirected graph is an ordered sequence of k + 1 vertices v1 , . . . , vk+1 such that
(vi , vi+1 ) ∈ E, i = 1, . . . , k.
A walk is a path if all edges are distinct. If v1 = vk+1 , the walk is closed and called a cycle.
Two nodes are said to be connected if there is a path between them. The distance between two
vertices in a graph is the number of edges in the shortest path connecting them. A graph is said
to be connected if every pair of distinct vertices is connected by a path. A graph that is not
connected consists of at least two separate connected subgraphs. If G = (X, E) is a connected
graph, then Y ⊂ X is called a separator if G becomes disconnected after removal of the nodes
Y . A directed graph is strongly connected if there is a path between every pair of distinct nodes
along directed edges.
Graphs that do not contain cycles are called acyclic. A connected acyclic graph is called a
tree; see Figure 5.1.2. In a tree, any pair of vertices is connected by exactly one path. A tree
has at least two vertices of degree 1. Such vertices are called leaf vertices. If a graph G is
connected, then a spanning tree is a subgraph of G that is a tree. In a tree an arbitrary vertex r
can be specified as the root of the tree. Then the tree becomes a directed rooted tree. An edge
(vi , vj ) ∈ G in a directed rooted tree is a directed edge from vi to vj if there is a path from vi to
r such that the first edge of this path is from vi to vj . If there is a directed edge from vi to vj ,
then vj is called the parent of vi , and vi is said to be a child of vj .
A labeling (or ordering) of a graph G with n vertices is a mapping of the integers {1, 2, . . . , n}
onto the vertices of G. The integer i assigned to a vertex is called the label (or number) of that
vertex. A labeling of the adjacency graph of a symmetric matrix C corresponds to a particular
ordering of its rows and columns. For example, the graph of the structurally symmetric matrix
× × × ×
× × ×
× × × ×
× × (5.1.2)
× ×
× ×
× ×
is given in Figure 5.1.2.
where aTi is the ith row of A. This expresses ATA as the sum of m matrices of rank one. By
the no-cancellation assumption, the nonzero structure of ATA is the direct sum of the nonzero
elements of the matrices ai aTi , i = 1, . . . , m. It follows that the graph G(ATA) is the direct sum
of the graphs G(ai aTi ), i = 1, . . . , m. That is, its edges are the union of all edges not counting
multiple edges. In G(ai aTi ) all pairs of vertices are connected. Such a graph is called a clique
and corresponds to a dense submatrix in ATA. Clearly, the structure of ATA is not changed by
dropping any row of A whose nonzero structure is a subset of another row.
The filled graph G^F(C) is the graph with n vertices whose edge set is the union of the edges of all
the elimination graphs G_0 = G(C), G_i, i = 1, . . . , n − 1. It bounds the structure of the Cholesky factor R:
Theorem 5.1.1. Let G(C) = (V, E) be the undirected graph of the symmetric matrix C. Then
(vi , vj ) is an edge of the filled graph GF (C) if and only if (vi , vj ) ∈ E, or there is a path in
G(C) from vertex i to vertex j passing only through vertices vk with k < min(i, j).
The filled graph GF (C) can be characterized by an elimination tree T (C) that captures the
row dependencies in the Cholesky factorization.
Definition 5.1.2. The elimination tree of a symmetric matrix C ∈ Rn×n with Cholesky factor R
is a directed rooted tree T (C) with n vertices labeled from 1 to n, where vertex p is the parent
of node i if and only if
    p = \min\{\, j > i \mid r_{ij} \ne 0 \,\}.
The node ordering of an elimination tree is such that children vertices are numbered before
their parent node. Such orderings are called topological orderings. All topological orderings
of the elimination tree are equivalent in the sense that they give the same triangular factor R. A
postordering is a topological ordering in which a parent node j always has node j − 1 as one of
its children. For example, the ordering of the vertices in the tree in Figure 5.1.4 can be made into
a postordering by exchanging labels 3 and 5. Postorderings can be determined by a depth-first
search; see Liu [755, 1990]. An important advantage of using a postordering is that it simplifies
data management, and the storage is reduced.
Elimination trees play a fundamental role in symmetric sparse matrix factorization and pro-
vide, in compact form, all information about the row dependencies. Schreiber [974, 1982] was
the first to note this and to define elimination trees formally. In the excellent review of the
generation and use of elimination trees by Liu [755, 1990] an efficient algorithm is given that
determines the elimination tree of C in time proportional to nnz(R) and in storage proportional
to nnz(C). When C = ATA it is possible to predict the structure of R directly without forming
C; see Gilbert, Moler, and Schreiber [469, 1992].
after increasing the number of nonzero elements in the columns. More effective methods can be
obtained by using information from successively reduced submatrices.
We first consider ordering methods that have the objective of minimizing the bandwidth of
ATA or rather the number of elements in the envelope of C = ATA. Recall that by Theo-
rem 4.1.6, zeros outside the envelope of C will not suffer fill in the Cholesky factorization. Such
ordering methods often perform well for matrices that come from one-dimensional problems or
problems that are in some sense tall and thin. The most widely used such method is the Cuthill–
McKee (CM) algorithm [281, 1969]. It starts by finding a peripheral vertex r (i.e., a vertex
with lowest degree) in G(C) and numbering this as vertex 1. It then performs a breadth-first
search using a level structure rooted at r. This partitions the vertices into disjoint levels
L1 = r, L2 (r), . . . , Lh (r),
where Li (r), i ≤ h, is the set of all vertices adjacent to Li−1 (r) that are not in Lj (r), j =
1, . . . , i − 1. The search ends when all vertices are numbered.
George [453, 1971] observed that the ordering obtained by reversing the Cuthill–McKee
(RCM) ordering gives the same bandwidth and usually results in less fill. An implementation of
the RCM algorithm with run-time proportional to the number of nonzeros in the matrix is given
by Chan and George [230, 1980]. The performance of the RCM ordering strongly depends on the
choice of the starting peripheral node. George and Liu [457, 1981] recommend a strategy where
a node of maximal or nearly maximal eccentricity is chosen as a starting node. The eccentricity
of a node is defined as
    l(x) = \max_{y \in X} d(x, y),
where d(x, y) is the length of the shortest path between the two vertices x and y in G = (X, E).
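In MATLAB the reverse Cuthill–McKee ordering is available as symrcm. A small illustration (not from the book) on a model problem whose rows and columns have first been scrambled:

    C = delsq(numgrid('S', 30));              % SPD matrix from a 5-point grid Laplacian
    q = randperm(size(C, 1));  C = C(q, q);   % destroy any favorable initial ordering
    p = symrcm(C);                            % reverse Cuthill-McKee ordering
    [i, j] = find(C);        bw0 = max(i - j);     % bandwidth before
    [i, j] = find(C(p, p));  bw1 = max(i - j);     % bandwidth after: much smaller
    [bw0, bw1]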
A substantially faster algorithm for bandwidth minimization is the GPS algorithm by Gibbs,
Poole, and Stockmeyer [467, 1976]. Lewis [738, 1982], [737, 1982] describes some improve-
ments to the implementation of the GPS and other profile reduction algorithms. Sloan [1004,
1986] gives a new algorithm that generally results in a smaller profile. Among other proposed
methods for envelope reduction, we mention the spectral method of Barnard, Pothen, and Si-
mon [81, 1995]. This uses the eigenvector of the smallest positive eigenvalue of the Laplacian
matrix corresponding to the given matrix.
The minimum degree (MD) ordering is the most widely used heuristic method for limiting
fill. The name “minimum degree” comes from the graph-theoretic formulation of the Cholesky
algorithm first given by Rose [935, 1972]; see Section 5.1.3. MD is a greedy method that in
each step of the Cholesky factorization selects the next pivot as the sparsest row and column.
It is a symmetric analogue of an ordering algorithm proposed by Markowitz [778, 1957] for
unsymmetric matrices with applications to linear programming. This algorithm was adapted for
symmetric matrices by Tinney and Walker [1064, 1967]. The MD ordering minimizes the local
arithmetic and amount of fill that occurs but in general will not minimize the global arithmetic or
fill. However, it has proved to be very effective in reducing both of these objectives. If the graph
of C = ATA is a tree, then the MD ordering results in no fill.
For the matrix C in (5.1.2) the initial elimination order indicated gives 10 fill-in elements.
A minimum degree ordering such as 4, 5, 6, 7, 1, 2, 3 will result in no fill-in. Because several
vertices in the graph have degree 1, the minimum degree ordering is not unique.
If there is more than one node of minimum degree at a particular step, the node is chosen from
the set of all vertices of minimum degree. The way this tie-breaking is done can be important.
Examples are known for which the minimum degree ordering will give poor results if the tie-
breaking is systematically done poorly. It is an open question how good the orderings are if ties
are broken randomly. A matrix for which the minimum degree algorithm is not optimal is given
by Duff, Erisman, and Reid [344, 1986]. If the minimum degree node 5 is eliminated first, fill
will occur in position (4, 6). But with the elimination ordering shown in Figure 5.1.5, there is
no fill. In the multiple minimum degree algorithm (MMD) by George and Liu [459, 1989] a
refinement of the elimination graph model is used to decrease the number of degree updates. The
vertices Y = {y1 , . . . , yp } are called indistinguishable if they have the same adjacency sets
(including the node itself), i.e.,
    \mathrm{Adj}_G(y_i) \cup \{y_i\} = \mathrm{Adj}_G(y_j) \cup \{y_j\}, \qquad 1 \le i, j \le p.
If one of these vertices is eliminated, the degrees of the remaining vertices in the set will de-
crease by one, and they all will become of minimum degree. Hence, all vertices in Y can be
eliminated simultaneously, and the graph transformation needs to be updated only once. Indeed,
indistinguishable vertices can be merged and treated as one supernode. For example, the graph
in Figure 5.1.5 contains two sets {1, 2, 3} and {7, 8, 9} of supervertices.
Figure 5.1.5. The graph of a matrix for which minimum degree is not optimal.
The structure of each row in A ∈ Rm×n corresponds to a clique in the graph of C = ATA.
Thus, the generalized element approach can be used to represent C as a sequence of cliques.
This allows the minimum degree algorithm for ATA to be implemented directly from A without
forming the structure of C = ATA, with resulting savings in work and storage.
The most costly part of the minimum degree algorithm is recomputation of the degree of the
vertices adjacent to the new pivot column. In the approximate minimum degree (AMD) algo-
rithm an upper bound on the exact minimum degree is used instead, which leads to substantial
savings in run-time, especially for very irregularly structured matrices. It has no effect on the
quality of the ordering; see Amestoy, Davis, and Duff [18, 2004].
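In MATLAB the AMD ordering is available as amd (and colamd provides a column version that works on A^TA implicitly). A small illustration (not from the book) of its effect on Cholesky fill:

    C = delsq(numgrid('S', 60));       % sparse SPD test matrix (5-point Laplacian)
    p = amd(C);                        % approximate minimum degree ordering
    fill0 = nnz(chol(C));              % fill in the Cholesky factor, natural ordering
    fill1 = nnz(chol(C(p, p)));        % fill with the AMD ordering: substantially smaller
    [fill0, fill1]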
× × × × ×
× × × × ×
× ×
× ×
× ×
A1 = , A2 = × × . (5.2.1)
× ×
× ×
× ×
×
×
For both A1 and A2 the matrix of normal equations is full. But A2 is already upper triangular,
and hence R = A2 is sparse. For A2 there will be cancellation in the Cholesky factorization of
AT2 A2 that will occur irrespective of the numerical values of the nonzero elements. This is called
structural cancellation, in contrast to numerical cancellation that occurs only for certain values
of the nonzero elements. Coleman, Edenbrandt, and Gilbert [262, 1986] prove that if A has the
so-called strong Hall property, then structural cancellation cannot occur.
Definition 5.2.1. A matrix A ∈ Rm×n , m ≥ n, is said to have the strong Hall property if for
every subset of k columns, 0 < k < n, the corresponding submatrix has nonzeros in at least
k + 1 rows. That is, when m > n, every subset of k ≤ n columns has the required property, and
when m = n, every subset of k < n columns has the property.
Theorem 5.2.2. Let A ∈ Rm×n , m ≥ n, have the strong Hall property. Then the structure of
ATA correctly predicts that of R, excluding numerical cancellations.
Note that A2 in (5.2.1) does not have the strong Hall property because the first column has
only one nonzero element. However, the matrix à obtained by deleting the first column has this
property.
The structure of R in QR factorization can also be predicted by performing the Givens or
Householder QR algorithm symbolically. In Givens QR factorization, the intermediate fill can
be modeled using the bipartite graph G(A) = {R, C, E}. Here the two sets of vertices
R = (r1 , . . . , rk ), C = (c1 , . . . , ck )
correspond to the sets of rows and columns of A. E is a set of edges {ri , cj } that connects a
node in R to one in C, and {ri , cj } ∈ E if and only if aij is nonzero; see George, Liu, and Ng
[465, 1984] and Ostrouchov [851, 1987]. The following result is due to George and Heath [455,
1980].
Theorem 5.2.3. The structure of R as predicted by a symbolic factorization of ATA includes the
structure of R as predicted by the symbolic Givens method.
Manneback [772, 1985] proves that the structure predicted by a symbolic Householder algo-
rithm is strictly included in the structure predicted from ATA. Symbolic Givens and Householder
factorizations can also overestimate the structure of R. An example where structural cancellation
occurs for the Givens rule is shown by Gentleman [452, 1976].
× × × × ×
× ×
. .
.. ..
× ×
A= , PA = . (5.2.2)
× × × × ×
× ×
× ×
× ×
Several heuristic algorithms for determining a row ordering have been suggested. The fol-
lowing is an extension of the row ordering recommended for band sparse matrices.
Algorithm 5.2.1.
1. Sort the rows of A into groups according to the column index f_i(A) of the first nonzero
element in row i.
2. For each group of rows with f_i(A) = k, k = 1, . . . , max_i f_i(A), sort all the rows by
increasing ℓ_i(A), the column index of the last nonzero element in row i.
If Algorithm 5.2.1 is applied to A in (5.2.2), the good row ordering P A is obtained. The
algorithm will not in general determine a unique ordering. One way to resolve ties is to consider
the cost of symbolically rotating a row aTi into all other rows with a nonzero element in column
ℓi (A). By cost we mean the total number of new nonzero elements created. The rows are then
ordered according to ascending cost. With this ordering it follows that rows 1, . . . , fi (A) − 1 in
Ri−1 will not be affected when the remaining rows are processed. Therefore these rows are the
final first fi (A) − 1 rows in R.
An alternative that has been found to work well in some contexts is to order the rows by
increasing values of ℓi (A). When row aTi is processed using this ordering, all previous rows
processed have nonzeros only in columns up to at most ℓi (A). Hence, only columns fi (A) to
ℓ_i(A) of R_{i−1} will be involved, and R_{i−1} has zeros in columns ℓ_i(A) + 1, . . . , n. No fill will be
generated in row aTi in these columns.
Liu [754, 1986] introduced the notion of row merge tree for structuring this operation. Let
R0 be initialized to have the structure of the final R with all elements equal to zero. Denote
by Rk−1 ∈ Rn×n the upper triangular matrix obtained after processing the first k − 1 rows.
At step k the kth row of A is first uncompressed into a full vector aTk = (ak1 , ak2 , . . . , akn ).
The nonzero elements akj ̸= 0 are annihilated from left to right by plane rotations involving
rows j < k in Rk−1 . This may create new nonzeros in both Rk−1 and in the current row aTk .
Note that if rjj = 0 in Rk−1 , this means that this row in Rk−1 has not yet been touched by any
rotation, and hence the entire jth row must be zero. When this occurs the remaining part of row k
can just be inserted as the jth row in Rk−1 . The algorithm is illustrated below using an example
from George and Ng [460, 1983].
Assume that the first k − 1 rows of A have been processed to generate R_{k−1}. Nonzero elements
of R_{k−1} are denoted by ×, nonzeros introduced into R_{k−1} and a_k^T during the elimination of a_k^T are
denoted by +, and elements involved in the elimination of a_k^T are circled. Nonzero elements
created in a_k^T during the elimination are ultimately annihilated. The sequence of row indices
involved in the elimination is {2, 4, 5, 7, 8}, where 2 is the column index of the first nonzero in
a_k^T. Note that, unlike in the Householder method, intermediate fill takes place only in the row
being processed:
× 0 × 0 0 × 0 0 0 0
⊗ 0 ⊕ ⊗ 0 0 0 0 0
× 0 × 0 0 0 × 0
⊗ ⊕ 0 ⊗ 0 0 0
Rk−1 ⊗ ⊕ 0 0 0 0
= . (5.2.3)
T
ak ⊗ ⊗ 0 0
⊗ 0 0
× ×
0 ×
0 ⊗ 0 ⊗ ⊕ 0 ⊕ ⊕ 0 0
From Theorem 5.2.3 it follows that if the structure of R has been predicted from that of
ATA, any intermediate matrix Ri−1 will fit into the predicted structure. The plane rotations can
be applied simultaneously to a right-hand side b to form QT b. In the original implementation the
Givens rotations are discarded after use. Hence, only enough storage to hold the final R and a
few extra vectors for the current row and right-hand side(s) is needed in main memory.
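A minimal dense MATLAB sketch of this row-oriented Givens scheme (an illustration only; a real sparse code would store R in the structure predicted from A^TA and work with compressed rows):

    [m, n] = size(A);
    R = zeros(n);                     % R0 initialized to zero
    for i = 1:m
        a = full(A(i, :));            % uncompress row i into a full vector
        for j = 1:n
            if a(j) == 0, continue, end
            if R(j, j) == 0           % row j of R still untouched:
                R(j, j:n) = a(j:n);   % insert the remaining part of the row
                break
            end
            [G, ~] = planerot([R(j, j); a(j)]);   % Givens rotation annihilating a(j)
            W = G*[R(j, j:n); a(j:n)];            % apply it to both rows
            R(j, j:n) = W(1, :);
            a(j:n)    = W(2, :);
        end
    end

Here planerot returns the 2-by-2 rotation G such that G*[x1; x2] has a zero second component.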
Gilbert et al. [468, 2001] give an algorithm to predict the structure of R working directly from
G(A). This algorithm runs in time proportional to nnz(A) and makes the step of determining the
structure of ATA redundant.
Variable row pivoting methods are studied by Gentleman [450, 1973], Duff [340, 1974], and
Zlatev [1151, 1982]. These schemes have never become very popular because they require a
dynamic storage structure and are complicated to implement.
Theorem 5.2.4. Let T [j] denote the subtree rooted in node j. Then if k ̸∈ T [j], columns k and j
can be eliminated independently of each other.
If T [i] and T [j] are two disjoint subtrees of T (C), columns s ∈ T [i] and t ∈ T [j] can be
eliminated in any order. The elimination tree prescribes an order relation for the elimination of
columns in the QR factorization, i.e., a column associated with a child node must be eliminated
before the parent column. On the other hand, columns associated with different subtrees of T (C)
are independent and can be eliminated in parallel.
Liu [754, 1986] developed a multifrontal QR algorithm that generalizes the row-oriented
Givens QR algorithm by using submatrix rotations. It achieves a significant reduction in time
at the cost of a modest increase in working storage. A modified version of this algorithm that
uses Householder reflections is given by George and Liu [458, 1987]. Supervertices and other
essential modifications of multifrontal methods are treated by Liu [755, 1990].
Nested dissection orderings have been discussed in Section 4.3.2. Such orderings for solving
general sparse positive definite systems have been analyzed by George, Poole, and Voigt [462,
1978] and George and Liu [457, 1981]. The use of such orderings for sparse least squares
problems is treated in George, Heath, and Plemmons [464, 1981] and George and Ng [460,
1983]. A planar graph is a graph that can be drawn in the plane without two edges crossing.
Planar graphs are known to have small balanced separators. Lipton, Rose, and Tarjan
√ [753, 1979]
show that for any planar graph with n vertices there exists a separator with O( n) vertices such
that each subgraph has at most n/2 vertices.
We illustrate the multifrontal QR factorization by the small 12 × 9 matrix
× × × ×
× × × ×
× × × ×
× × × ×
× × × ×
× × × ×
A=
. (5.2.4)
× × × ×
× × × ×
× × × ×
× × × ×
× × × ×
× × × ×
This matrix arises from a 3 × 3 mesh problem using a nested dissection ordering. The graph
G(ATA) is
First, a QR factorization of rows 1–3 is performed. These rows have nonzeros only in columns
{1, 5, 7, 8}. With the zero columns omitted, this operation can be carried out as a QR factoriza-
tion of a small dense matrix of size 3 × 4. The resulting first row equals the first of the final R
of the complete matrix and can be stored away. The remaining two rows form an update matrix
F1 and will be processed later. The other three block rows 4–6, 7–9, and 10–12 can be reduced
similarly in parallel. After this first stage the matrix has the form
× × × ×
× × ×
× ×
× × × ×
× × ×
× ×
.
× × × ×
× × ×
× ×
× × × ×
× × ×
× ×
In the second stage, F1 , F2 and F3 , F4 are simultaneously merged into two upper trape-
zoidal matrices by eliminating columns 5 and 6. In merging F1 and F2 , only the set of columns
{5, 7, 8, 9} needs to be considered. Reordering the rows by the index of the first nonzero element
and performing a QR decomposition, we get
× × × × × × ×
× × × × × ×
QT = .
× × × ×
× × ×
The first row in each of the four blocks is a final row in R and can be removed, which leaves four
upper trapezoidal update matrices, F1 –F4 . The merging of F3 and F4 is performed similarly.
The first row in each reduced matrix is a final row in R and is removed. In the final stage the
remaining two upper trapezoidal (in this example, triangular) matrices are merged, giving the
final factor R. This corresponds to eliminating columns 7, 8, and 9.
In the multifrontal method the vertices in the elimination tree are visited in turn given by the
ordering. Each node xj in the tree is associated with a frontal matrix Fj that consists of the set
of rows Aj in A, with the first nonzero in location j, together with one update matrix contributed
by each child node of xj . After variable j in the frontal matrix is eliminated, the first row in the
reduced matrix is the jth row of the upper triangular factor R. The remaining rows form a new
update matrix Uj that is stored in a stack until needed.
For j = 1, . . . , n do
1. Form the frontal matrix F_j by combining the set of rows A_j and the update matrix U_s for
each child x_s of the node x_j in the elimination tree T(A^TA).
2. Reduce F_j by orthogonal transformations that eliminate variable j. The first row of the
reduced matrix is the jth row of R; the remaining rows form the update matrix U_j, which is
placed on the stack until needed.
The frontal matrices in the multifrontal method are often too small for the efficient use of
vector processors and matrix-vector operations in the solution of the subproblems. Therefore, a
useful modification of the multifrontal method is to amalgamate several vertices into one large
supernode. Instead of eliminating one column in each node, the decomposition of the frontal
matrices then involves the elimination of several columns, and it may be possible to use Level 2
or even Level 3 BLAS; see Dongarra et al. [328, 1990]. Vertices can be grouped together to
form a supernode if they correspond to a block of contiguous columns in the Cholesky factor,
where the diagonal block is fully triangular, and these rows all have identical off-block diago-
nal column structures. Because of the computational advantages of having large supervertices,
it is advantageous to relax this condition and also amalgamate vertices that satisfy this con-
dition if some local zeros are treated as nonzeros. A practical restriction is that if too many
vertices are amalgamated, then the frontal matrices become sparse. Note also that nonnumerical
operations often make up a large part of the total decomposition time, which limits the possi-
ble gain. For a discussion of supervertices and other modifications of the multifrontal method,
see Liu [755, 1990].
For a K by K grid problem with n = K^2, m = s(K − 1)^2 it is known that nnz(R) =
O(n log n), but Q has O(n\sqrt{n}) nonzeros; see George and Ng [461, 1988]. Hence if Q is needed,
it should not be stored explicitly but represented by the Householder vectors of the frontal or-
thogonal transformations. Lu and Barlow [761, 1996] show that these require only O(n log n)
storage. In many implementations the orthogonal transformations are not stored. Then the cor-
rected seminormal equations (see Section 2.5.4) can be used to treat additional right-hand sides.
Several column reorderings are available in MATLAB for making the Cholesky
and QR factors more sparse. MATLAB stores a permutation as a vector p containing a per-
mutation of 1, 2, . . . , n such that A(:,p) is the matrix with permuted columns. The function
p = colperm(A) computes a permutation that sorts the columns so that they have increas-
ing nonzero count. An approximate minimum degree ordering for the columns is given by
p = colamd(A).
To solve a sparse least squares problem using a minimum degree ordering of the columns of
A and the corrected seminormal equations (CSNE; see Section 2.5.4), one writes the following
in MATLAB.
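A minimal sketch along these lines (an illustration, not the book's original listing), assuming MATLAB's Q-less sparse qr and the colamd ordering:

    p = colamd(A);               % fill-reducing column ordering
    B = A(:, p);
    R = qr(B, 0);                % sparse R from a Q-less QR factorization
    y = R \ (R' \ (B'*b));       % seminormal equations: R'*R*y = B'*b
    r = b - B*y;                 % one step of correction (the "C" in CSNE)
    y = y + R \ (R' \ (B'*r));
    x = zeros(size(A, 2), 1);
    x(p) = y;                    % undo the column permutation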
SuiteSparseQR is also available in MATLAB. This allows large-scale sparse least squares
problems to be solved. The function [Q,R,p] = qr(A,'vector') returns the m × n factor R
and m × m factor Q such that A(:,p) = Q*R. Because Q often is not very sparse, a better choice
is to solve one or more least squares problems min ∥AX − B∥ using [C,R,p] = qr(A,B,0)
and X(p,:) = R\C.
Example 5.2.5. We study the effect of two different column orderings in the QR factorization of
a sparse matrix A arising in a study of substance transport in rivers by Elfving and Skoglund [384,
2007]. Figures 5.2.1 and 5.2.2 show the location of nonzero elements in AP and R using two
different column orderings available in MATLAB. With colperm the columns are ordered by
increasing nonzero elements and give nnz(R) = 32,355. The colamd ordering gives nnz(R) =
15,903, a great improvement.
Figure 5.2.1. Nonzero pattern of a sparse matrix A and the factor R in its QR factorization
using the MATLAB colperm reordering. Used with permission of Springer International Publishing; from
Numerical Methods in Matrix Computations, Björck, Åke, 2015; permission conveyed through Copyright
Clearance Center, Inc.
Figure 5.2.2. Nonzero pattern of a sparse matrix A and the factor R in its QR factorization using
the MATLAB colamd column ordering. Used with permission of Springer International Publishing; from
Numerical Methods in Matrix Computations, Björck, Åke, 2015; permission conveyed through Copyright
Clearance Center, Inc.
In finite-precision, the computed R usually will not have any zero diagonal element, even
when rank(A) < n. Although the rank is often revealed by the presence of small diagonal
elements, this does not imply that the rest of the rows are negligible. Heath [596, 1982] suggests
the following postprocessing of R. Starting from the top, the diagonal of R is examined for
small elements. In each row whose diagonal element falls below a certain tolerance, the diagonal
element is set to zero. The rest of the row is then reprocessed, zeroing out all its other nonzero
elements. This might increase some previously small diagonal elements in rows below, which
is why one has to start from the top. We again end up with a matrix of the form shown in
Figure 5.2.3. However, it may be that R is numerically rank-deficient yet has no small diagonal
element.
Pierce and Lewis [894, 1997] develop a rank-revealing algorithm for sparse QR factorizations
based on techniques similar to those in Section 2.3.5. The factorization proceeds by columns,
and inverse iteration is used to determine ill-conditioning. Let Rj = (r1 , . . . , rj ) be the matrix
formed by the first j columns of the final R. Assume that Rj is not too ill-conditioned, but
Rj+1 = ( Rj rj+1 ) is found to be almost rank-deficient. Then column rj+1 is permuted to the
last position, and the algorithm is continued. This may happen several times during the numerical
factorization. At the end we obtain a QR factorization
    ( A_1 \;\; A_2 ) = Q \begin{pmatrix} R_{11} & R_{12} \\ 0 & S \end{pmatrix},
where R11 ∈ Rr×r is well-conditioned. In general R12 and S will be dense, but provided r ≪ n,
this is often acceptable. An important fact stated in the theorem below is that R11 will always
fit into the storage structure predicted for R. The following theorem is implicit in Foster [424,
1986].
Theorem 5.2.6. Let A_{F_k} = (a_{j_1}, . . . , a_{j_r}) consist of the columns of A with indices F_k = [j_1, j_2, . . . , j_r], i.e.,
be a submatrix of A. Denote the Cholesky factors of A^TA and A_{F_k}^T A_{F_k} by R and R_{F_k}, respec-
tively. Then the nonzero structure of RFk is included in the nonzero structure predicted for R
under the no-cancellation assumption.
Proof. Let G = G(X, E) be the ordered graph of ATA. The ordered graph GFk = GFk (XFk ,
EFk ) of ATFkAFk is obtained by deleting all vertices in G not in Fk = [j1 , j2 , . . . , jr ] and all
edges leading to the deleted vertices. Then (RF )ij ̸= 0 only if there exists a path in GFk from
node i to node j (i < j) through vertices with numbers less than i. If such a path exists in GFk ,
it must exist also in G, and hence we will have predicted Rij ̸= 0.
    u = R_s z, \qquad v = r_d − C_d u, \qquad C_d = A_d R_s^{−1} \in \mathbb{R}^{m_d \times n},
this problem is seen to be that of finding the least-norm solution to the linear system
    ( I_{m_d} \;\; C_d ) \begin{pmatrix} v \\ u \end{pmatrix} = r_d.   (5.3.4)
This problem is the same as the small least squares problem
    \min_v \left\| \begin{pmatrix} I_{m_d} \\ C_d^T \end{pmatrix} v − \begin{pmatrix} r_d \\ 0 \end{pmatrix} \right\|_2, \qquad u = C_d^T v,   (5.3.5)
which can be solved by QR factorization of the (m_d + n) × m_d matrix. Note that for both
problems the normal equations for v are
    (I_{m_d} + C_d C_d^T)\, v = r_d.
4. Solve the full-rank least squares problem (5.3.5) for v and form u = CdT v.
where A1 has full column rank, x1 ∈ Rn1 and x2 ∈ Rn2 , and n = n1 + n2 . The previous
updating scheme is generalized to this case by Heath [596, 1982]. Let z ∈ Rn1 and W ∈ Rn1 ×n2
be the solutions of the least squares problems
respectively. In both least squares problems (5.3.7), A_1 has full rank and m_d dense rows, and
thus problems (5.3.7) can be solved by the previous algorithm. Then x_2 is obtained as the
solution of the small dense least squares problem
Here the block Ah is underdetermined (has more columns than rows), As is square, and Av is
overdetermined (has more rows than columns). All three blocks have a nonzero diagonal, and the
submatrices Av and ATh both have the strong Hall property. One or two of the diagonal blocks
may be absent in the decomposition. The example below shows the coarse block triangular
decomposition of a matrix A ∈ R11×10 with structural rank 8.
× × ⊗ × ×
× ⊗ × × ×
⊗ × ×
× ⊗ ×
⊗ ×
× ⊗ ×
⊗ ×
⊗
×
× ×
×
It may be possible to further decompose the blocks Ah and Av in the coarse decomposition
(5.3.9) so that
    A_h = \begin{pmatrix} A_{h_1} & & \\ & \ddots & \\ & & A_{h_p} \end{pmatrix}, \qquad
    A_v = \begin{pmatrix} A_{v_1} & & \\ & \ddots & \\ & & A_{v_q} \end{pmatrix},
where each A_{h_1}, . . . , A_{h_p} is underdetermined and each A_{v_1}, . . . , A_{v_q} is overdetermined. The
submatrix A_s may be decomposable into block upper triangular form
    A_s = \begin{pmatrix}
    A_{s_1} & U_{12} & \cdots & U_{1,t} \\
            & A_{s_2} & \cdots & U_{2,t} \\
            &         & \ddots & \vdots \\
            &         &        & A_{s_t}
    \end{pmatrix}   (5.3.10)
with square diagonal blocks As1 , . . . , Ast that have nonzero diagonal elements. The resulting
fine decomposition can be shown to be essentially unique. Any block triangular form can be
obtained from any other by applying row permutations that involve the rows of a single block
row, column permutations that involve the columns of a single block column, and symmetric
permutations that reorder the blocks.
A square matrix that can be permuted to the form (5.3.10), with t > 1, is said to be reducible;
otherwise, it is called irreducible; see Definition 4.1.3. All the diagonal blocks As1 , . . . , Ast in
the fine decomposition are irreducible. This implies that they have the strong Hall property; see
Coleman, Edenbrandt, and Gilbert [262, 1986]. A two-stage algorithm for permuting a square
and structurally nonsingular matrix A to block upper triangular is given by Tarjan [1057, 1972];
see also Gustavson [555, 1976], and Duff [341, 1977], [342, 1981]. The algorithm depends
on the concept of matching in the bipartite graph of A. This is a subset of its edges with no
common end points and corresponds to a subset of nonzeros in A such that no two belong to the
same row or column. A maximum matching is one where the maximum number of edges equals
the structural rank r(A).
Pothen and Fan [902, 1990], [901, 1984] give an algorithm for the general case. The program
MC13D by Duff and Reid [351, 1978] in the HSL Mathematical Software Library implements
the fine decomposition of As . It proceeds in three steps:
1. Find a maximum matching in the bipartite graph G(A) with row set R and column set C.
2. According to the matching, partition R into sets VR, SR, HR and C into sets VC, SC,
HC corresponding to the vertical, square, and horizontal blocks.
3. Find the diagonal blocks of the submatrices Av and Ah from the connected components
in the subgraphs G(Av ) and G(Ah ). Find the block upper triangular form of As from
the strongly connected components in the associated directed subgraph G(As ), with edges
directed from columns to rows.
Next, compute the remaining part of the solution x̃k , . . . , x̃1 from
    A_{s_i} \tilde{x}_i = \tilde{b}_i − \sum_{j=i+1}^{k} U_{ij} \tilde{x}_j, \qquad i = k, \ldots, 2, 1.   (5.3.12)
Finally, set x = Qx̃. We can solve subproblems (5.3.11) and (5.3.12) by computing the QR fac-
torizations of Av and As,i , i = 1, . . . , k. As As1 , . . . , Ask and Av have the strong Hall property,
the structures of the matrices Ri are correctly predicted by the structures of the corresponding
normal matrices.
If A has structural rank n but is numerically rank-deficient, it will not be possible to fac-
torize all the diagonal blocks in (5.3.10). In this case the block triangular structure given by
the Dulmage–Mendelsohn form cannot be preserved, or some blocks may become severely ill-
conditioned. If the structural rank is less than n, there is an underdetermined block Ah . In this
case we can still obtain the form (5.3.10) with a square block A11 by permuting the extra col-
umns in the first block to the end. The least squares solution is then not unique, but there is a
unique solution of minimum length.
The block triangular form of the matrices in the Harwell–Boeing test collection (Duff,
Grimes, and Lewis [346, 1989]) and the time required to compute them are given in Pothen
and Fan [902, 1990].
Chapter 6
Iterative Methods
A least squares solution of \min_x \|Ax − b\|_2 satisfies the normal equations, which can be written in
the form
    A^T r = 0, \qquad r = b − Ax.   (6.1.1)
Working with A and AT separately has important advantages. As emphasized earlier, small
perturbations in ATA, e.g., by roundoff, may change the solution much more than perturbations
of similar size in A itself. Working with AT b instead of b as input data can also cause a loss
of accuracy. Fill that can occur in the formation of ATA is also avoided (although occasionally
ATA is more sparse than A).
Iterative methods can also be applied to the least-norm problem
    \min \|y\|_2 \quad \text{subject to} \quad A^T y = c.   (6.1.2)
If A has full row rank, the unique solution is y = Az, where z satisfies the normal equations of
the second kind (see (1.1.17)):
    A^T A z = c.   (6.1.3)
Example 6.1.1. A problem where A is sparse but ATA is significantly more dense is shown in
Figure 6.1.1. In such a case the Cholesky factor will in general also be nearly dense. This rules
out the use of sparse direct methods based on QR decomposition of A. Consider the case when
A has a random sparsity structure such that an element aij is nonzero with probability p < 1.
Ignoring numerical cancellation, it follows that (ATA)jk ̸= 0 with probability
    q = 1 − (1 − p^2)^m ≈ 1 − e^{−mp^2}.
Therefore, A^TA will be almost dense when mp ≈ m^{1/2}, i.e., when the average number of non-
zero elements in a column is about m1/2 . This type of structure is common in reconstruction
problems. An example is the inversion problem for the velocity structure for the Central Califor-
nia Microearthquake Network. In 1980 this generated a matrix A with dimensions m = 500,000,
n = 20,000, and about 107 nonzero elements. The nonzero structure of A is very irregular and
ATA is almost dense. Today similar problems of much higher dimensions are common.
Figure 6.1.1. Structure of a sparse matrix A (left) and ATA (right) for a simple image recon-
struction problem. Used with permission of Springer International Publishing; from Numerical Methods
in Matrix Computations, Björck, Åke, 2015; permission conveyed through Copyright Clearance Center, Inc.
This system combines both kinds of normal equations. Taking c = 0 gives the least squares
problem. Taking b = 0 gives the least-norm problem with z = −x. The augmented system is
symmetric and indefinite, which makes its solution more challenging. Recall that the condition
of (6.1.4) can be improved by working with
αI A y/α b
= , (6.1.5)
AT 0 x c/α
A good survey of iterative methods for solving linear systems is the comprehensive text by
Saad [957, 2003]. The main research developments of the 20th century are surveyed by Saad and
van der Vorst [960, 2000]. Other notable textbooks on iterative methods include Axelsson [48,
1994], Greenbaum [534, 1997], and van der Vorst [1075, 2003]. Templates for implementation
of iterative methods for linear systems are found in Barret et al. [82, 1994]. A PDF file of an
unpublished second edition of this book can be downloaded from http://www.netlib.org/
templates/templates.pdf. Among older textbooks, we mention Varga [1090, 1962] and
Hageman and Young [559, 2004].
M xk+1 = N xk + b, k = 0, 1, . . . , (6.1.6)
Theorem 6.1.2. The stationary iterative method xk+1 = Gxk + d is convergent for all initial
vectors x0 if and only if the spectral radius of G satisfies
For any consistent matrix norm, we have ρ(G) ≤ ∥G∥. Thus a sufficient condition for
convergence is that ∥G∥ < 1 holds for some consistent matrix norm. From (6.1.8) it follows that
for any consistent matrix norm,
Definition 6.1.3. Assume that the iterative method (6.1.6) is convergent. The average rate
Rk (G) and asymptotic rate R∞ (G) of convergence are defined as
1
Rk (G) = − ln ∥Gk ∥, R∞ (G) = − ln ρ(G),
k
To reduce the norm of the error by a fixed factor δ, at most k iterations are needed where
∥Gk ∥ ≤ δ or, equivalently, k satisfies
k ≥ − ln δ/Rk (G).
It is desirable for the iteration matrix G = I − M −1 A to have real eigenvalues. This will be
the case if the iterative method is symmetrizable.
Definition 6.1.4. The stationary iterative method (6.1.6) is said to be symmetrizable if there is
a nonsingular matrix W such that
W (I − G)W −1 = W M −1 AW −1
If A and the splitting matrix M are both symmetric positive definite, then the corresponding
stationary method is symmetrizable. To show this, let R be the Cholesky factor of A and set
W = R. Then
R(M −1 A)R−1 = RM −1 RTRR−1 = RM −1 RT .
The convergence of (6.1.6) when rank(A) < n is investigated by Keller [690, 1965] and
Young [1141, 2003]. Dax [293, 1990] investigates the convergence properties of stationary iter-
ative methods, with the emphasis is on properties that hold for singular and possibly inconsistent
systems for a square matrix A. Tanabe [1056, 1971] considers stationary iterative methods of
the form (6.1.10) for computing more general solutions x = A− b, where A− is any generalized
inverse of A such that AA− A = I. He shows that the iteration can always be written in the form
for some matrix B, and characterizes the solution in terms of R(AB) and N (BA).
The concept of splitting has been extended to rectangular matrices by Plemmons [896, 1972].
Berman and Plemmons [112, 1974] define A = M − N to be a proper splitting if the ranges
and nullspaces of A and M are equal. They show that for a proper splitting, the iteration
xk+1 = M † (N xk + b) (6.1.10)
converges to the pseudoinverse solution x = A† b for every x0 if and only if the spectral radius
ρ(M † N ) < 1. The iterative method (6.1.10) avoids explicit use of the normal system.
6.1. Basic Iterative Methods 271
This iteration is symmetrizable if A has full column rank and M is symmetric positive definite.
For solving the minimum norm problem (6.1.2), the same splitting is applied to solve ATAz =
c, giving the iteration zk+1 = zk + M −1 (c − ATAzk ). After multiplying with A and using
yk = Azk , we obtain
yk+1 = yk + AM −1 (c − AT yk ), k = 0, 1, . . . . (6.1.12)
In the following we assume that ATA is positive definite. In (6.1.11), the particular choice
M = ω −1 I gives Richardson’s first-order method
where ω > 0 is a relaxation parameter. Richardson’s method is often used for solving least
squares problems originating from discretized ill-posed problems. In this context (6.1.13) is
also known as Landweber’s method; see Section 6.4.1. If x0 ∈ R(AT ) (e.g., x0 = 0), then
by construction xk ∈ R(AT ) for all k > 0. Hence, in exact arithmetic, Richardson’s method
converges to the pseudoinverse solution when A is rank-deficient,
For the iteration (6.1.13) the error satisfies
The eigenvalues of the iteration matrix G are λi (G) = 1 − ωσi2 , i = 1, . . . , n, where σi are the
singular values of A.
Theorem 6.1.5. Assume that the singular values σi of A satisfy 0 < a ≤ σi2 ≤ b, i = 1, . . . , n.
Then Richardson’s method converges if and only if 0 < ω < 2/b.
To maximize the asymptotic rate of convergence, ω should be chosen so that the spectral
radius
ρ(G) = max{|1 − ωa|, |1 − ωb|}
is minimized. The optimal ω lies in the intersection of the graphs of |1 − ωa| and |1 − ωb|,
ω ∈ (0, 2/b). Setting 1 − ωa = ωb − 1, we obtain
where ωk > 0. Sufficient conditions for convergence of this iteration are known, which we state
below without proof.
Theorem 6.1.6. The iterates (6.1.16) converge for all vectors b to a least squares solution
x̂ = arg minx ∥Ax − b∥2 if for some ϵ > 0 it holds that
where σ1 is the largest singular value of A. If x0 ∈ R(AH ), then x̂ is the unique least-norm
solution.
The method is not symmetrizable. Each step requires the solution of a lower triangular system
by forward substitution and splits into n minor steps. Note that the ordering of the columns of A
will influence the convergence. To implement the Gauss–Seidel method the key observation is
that it differs from the Jacobi method only in the following respect. As soon as a new component
of xk+1 has been computed, it is used for computing the remaining part of xk+1 .
6.1. Basic Iterative Methods 273
Björck and Elfving [141, 1979] show that the Gauss–Seidel method applied to the normal
equations of the first and second kinds are special cases of two classes of projection methods for
square nonsingular linear systems studied by Householder and Bauer [646, 1960]. In the first
class of methods for Ax = b, let p1 , p2 , . . . ̸∈ N (A) be a sequence of n-vectors. Let x(1) be an
initial approximation, and for j = 1, 2, . . . , compute
This shows that (6.1.22) is a residual-reducing iteration method. This method was originally
devised by de la Garza [442, 1951].
One step of the Gauss–Seidel method for ATAx = AT b is obtained by taking pj in (6.1.22) to
be the unit vectors ej ∈ Rn , j = 1, . . . , n, in cyclic order. Then qj = Aej = aj , and one iteration
(1) (1) (1)
splits into n minor steps as follows. Let xk = xk be the current iterate and rk = b − Axk be
the corresponding residual. The Gauss–Seidel iteration becomes the following: For j = 1, . . . , n,
compute
(j+1) (j) (j)
xk = xk + δ j ej , δj = aTj rk /∥aj ∥22 , (6.1.23)
(j+1) (j)
rk = rk − δj aj .
(n+1) (n+1)
Then xk+1 = xk and rk+1 = rk . In the jth minor step, only the jth component is
changed, and only the jth column aj = Aej is accessed. The iteration simplifies if the columns
are prescaled to have unit norm.
The second class of projection methods finds the minimum norm solution of AT y = c =
(c, . . . , cn )T . Let p1 , p2 , . . . ̸∈ N (A) be a sequence of n-vectors, and set qj = Apj . Compute
By construction we have d(j+1) ⊥ qi , where d(j) = y − y (j) denotes the error. It follows that
aTi x = bi , i = 1, . . . , n, (6.1.26)
274 Chapter 6. Iterative Methods
where aTi = eTi A is the ith row of A. Given an initial approximation x(0) , he forms
Because the center of gravity of the system of masses wi must fall inside this hypersphere,
Cimmino’s method is error-reducing, i.e., ∥x(1) − x∥2 < ∥x(0) − x∥2 . In matrix form Cimmino’s
method can be written
2 T T
x(k+1) = x(k) + A D D(b − Ax(k) ), (6.1.30)
m
where
√
D = diag (d1 , . . . , dn ), di = wi ∥ai ∥2 . (6.1.31)
n×n
Cimmino notes that if rank(A) > 2, the iterates converge even when A ∈ R is singular
and the linear system is inconsistent. Then Cimmino’s method will converge to a solution of the
weighted least squares problem minx ∥D(Ax − b)∥2 . If wi = ∥ai ∥22 , then D = I, and (6.1.30)
is Richardson’s method with ω = 2/µ.
SOR step where the columns of A are taken in reverse order, j = n, . . . , 2, 1. The SSOR iter-
ation is symmetrizable. SOR and SSOR share with the Gauss–Seidel method the advantages of
simplicity and small storage requirements.
For positive definite ATA, SOR converges if and only if 0 < ω < 2. The parameter ω in
SOR should, if possible, be chosen to maximize the asymptotic rate of convergence. If ATA has
the following special property, the optimal ω is known.
A block tridiagonal matrix A whose diagonal blocks are nonsingular diagonal matrices can
be shown to be consistently ordered. In particular, this is true if there exists a permutation matrix
P such that P ATAP T has the form
D1 U1
, (6.1.32)
L1 D2
where D1 and D2 are nonsingular diagonal matrices. Such a matrix is said to have property A.
The following result is due to Young [1141, 2003].
Theorem 6.1.8. Let A be a consistently ordered matrix, and assume that the eigenvalues µ of
the Jacobi iteration matrix GJ = L + U are real, and its spectral radius satisfies ρ(GJ ) < 1.
Then the optimal relaxation parameter ω in SOR is given by
2
ωopt = p , ρ(Gωopt ) = ωopt − 1. (6.1.33)
1 + 1 − ρ2J
If A is consistently ordered, then using ωopt in SOR gives a great improvement in the rate of
convergence. Otherwise, SOR may not be effective for any choice of ω. In contrast to SOR, the
rate of convergence of SSOR is not very sensitive to the choice of ω. Taking ω = 1, i.e., using
the symmetric Gauss–Seidel method, is often close to optimal; see Axelsson [48, 1994].
for solving the normal equations ATAx = AT b, associated with the splitting ATA = M −N with
M symmetric positive definite. Then the eigenvalues {λi }ni=1 of M −1 ATA are real. Assume that
lower and upper bounds are known such that
where G = I − M −1 ATA is the iteration matrix. Then the eigenvalues {ρi }ni=1 of G are real
and satisfy
1 − b = c < ρi ≤ d = 1 − a < 1. (6.1.36)
(Note that we allow c ≤ −1, even though then ρ(G) ≥ 1, and the iteration (6.1.34) is divergent!)
To attempt to accelerate convergence of the basic iteration we take a linear combination of
the first k iterations,
X k
x0 = x̃0 , xk = cki x̃i , k = 1, 2, . . . , (6.1.37)
i=0
Pk
where, for consistency, we require that i=0 cki = 1. The resulting iteration is known as a
semi-iterative method; see Varga [1090, 1962]. The error equation for the basic iteration method
(6.1.34) is
x̃k − x = Gk (x̃0 − x), (6.1.38)
where
k
X
Pk (t) = cki ti , Pk (1) = 1,
i=0
where Π1k denotes the set of all polynomials of degree k such that Pk (1) = 1.
The solution to the minimization problem (6.1.40) can be expressed in terms of the Cheby-
shev polynomials Tk (z) of the first kind; see Section 4.5.2. These are defined by the three-term
recurrence relation T0 (z) = 1, T1 (z) = z, and
By induction it follows that the leading coefficient of Tk (z) is 2k−1 . Tk (z) may also be expressed
explicitly as
cos(kϕ)), z = cos ϕ if |z| ≤ 1,
Tk (z) = (6.1.42)
cosh(kγ), z = cosh γ if |z| > 1.
Thus, |Tk (z)| ≤ 1 for |z| ≤ 1. For |z| ≥ 1 we have z = 21 (w + w−1 ), where w = eγ . By solving
a quadratic equation in w, we get
p
Tk (z) = 12 (wk + w−k ), w=z± z 2 − 1 > 1. (6.1.43)
This shows that outside the interval [−1, 1], Tk (z) grows exponentially with k. The Chebyshev
polynomial Tk (z) has the following extremal property.
6.1. Basic Iterative Methods 277
Theorem 6.1.9. Let µ be any fixed number such that µ > 1. If we let Pk (z) = Tk (z)/Tk (µ),
then Pk (µ) = 1 and
max |Pk (z)| = 1/Tk (µ). (6.1.44)
1≤z≤1
Moreover, if Q(z) is any polynomial of degree k or less such that Q(µ) = 1 and max1≤z≤1 |Q(z)|
≤ 1/Tk (µ), then Q(z) = Pk (z).
From this result it follows that the solution to the minmax problem (6.1.40) is a scaled and
shifted Chebyshev polynomial. Let
be the linear transformation that maps the interval t ∈ [c, d] onto z ∈ [−1, 1]. Then the solution
to the minimization problem (6.1.40) is given by
Tk (z(t)) b+a
Pk (t) = , µ = z(1) = (2 − (d + c))/(d − c) = , (6.1.45)
Tk (z(1)) b−a
where we have used the facts that a = 1 − d and b = 1 − c; see (6.1.36). It follows that a bound
for the spectral radius of Pk (G) is given by
If the splitting matrix M is symmetric and positive definite, then κ2 = b/a > 1 is an approximate
upper bound for the condition number of M −1 ATA. From (6.1.35) it follows that
is an upper bound for the spectral condition number of A. An elementary calculation shows that
√ √
p κ+1 2 κ κ+1
w = µ + µ2 − 1 = + =√ > 1.
κ−1 κ−1 κ−1
From (6.1.43) it follows that ρ(qk (A)) ≤ 1/Tk (µ) < 2e−kγ , where
√
κ+1 2
γ = log √ >√ . (6.1.47)
κ−1 κ
Hence, to reduce the error norm by at least a factor of δ < 1 it suffices to perform k iterations,
where
1√
2
k> κ log . (6.1.48)
2 δ
Hence, the number of iterations √required for the Chebyshev accelerated method to achieve a cer-
tain accuracy is proportional to κ rather than κ as for Richardson’s method; see Section 6.1.3.
This is a great improvement but assumes that the upper and lower bounds in (6.1.35) for the
eigenvalues are sufficiently accurate.
The Chebyshev Semi-iterative (CSI) method by Golub and Varga [513, 1961] is an efficient
and stable way to implement Chebyshev acceleration. It can be applied to accelerate any station-
ary iterative method for the normal equations
xk+1 = xk + M −1 AT (b − Axk ), k = 0, 1, . . . ,
278 Chapter 6. Iterative Methods
provided it is symmetrizable. CSI also has the advantage that the number of iterations need not
be fixed in advance. CSI uses a clever rewriting of the three-term recurrence relation for the
Chebyshev polynomials to compute x(k) directly.
rk = b − Axk , sk = M −1 AT rk , k ≥ 0.
Each iteration requires two matrix-vector multiplications Axk and AT rk , and the solution of
M s(k) = AT rk . The second-order Richardson method can also be described by (6.1.49) with
α and µ as above, and
2
ωk = ωb= p . (6.1.50)
(1 + 1 − µ2 )
It can be shown that in the CSI method, ωk → ω b as k → ∞.
For SOR, the eigenvalues of the iteration matrix Bωopt are all complex with modulus |ωopt |.
In this case, convergence acceleration is of no use; see Young [1141, 2003]. On the other hand,
for SSOR, Chebyshev acceleration often achieves a substantial gain in convergence rate.
where αk and βk are parameters to be chosen. A simple induction argument using (6.2.1) shows
that rk and pk lie in the Krylov subspaces
pH
k rk
αk = . (6.2.3)
pH
k Apk
6.2. Krylov Subspace Methods 279
(pk+1 , pk )A = 0, (6.2.4)
where the A-inner product and A-norm, also called energy norm, are defined by
pH
k Ark+1
βk = − . (6.2.6)
pH
k Apk
Theorem 6.2.1. In CG, the residual vector rk is orthogonal to all previous direction vectors and
residual vectors:
rkH pj = 0, rkH rj = 0, j = 0 : k − 1. (6.2.7)
The direction vectors are mutually A-conjugate:
pH
k Apj = 0, j = 0 : k − 1. (6.2.8)
Proof. We prove (6.2.7) and (6.2.8) jointly by induction. The choice of αk ensures that rk is
orthogonal to pk−1 , and (6.2.4) shows that (6.2.8) holds also for j = k − 1. Hence these relations
are true for k = 1. Assume now that the relations are true for some k > 1. From pH k rk+1 = 0,
changing the index and taking the scalar product with pj , 0 ≤ j < k, we get
H
rk+1 pj = rkH pj − αk pH
k Apj .
H
This is zero by the induction hypothesis, and because rk+1 pk = 0, it follows that (6.2.7) holds
for k + 1. From (6.2.1), the induction hypothesis, and (6.2.8), we find that
−1 H
pH H H
k+1 Apj = rk+1 Apj + βk pk Apj = αj rk+1 (rj − rj+1 )
= αj−1 rk+1
H
(pj − βj−1 pj−1 − pj+1 + βj pj ).
By (6.2.7), this is zero for 0 < j < k. For j = 0 we use b = p0 in forming the last line of
the equation. For j = k we use (6.2.4), which yields (6.2.8). Since the vectors r0 , . . . , rk−1
and p0 , . . . , pk−1 span the same Krylov subspace Kk (A, b), the second orthogonality relation in
(6.2.7) also holds.
We now use these orthogonality properties to derive alternative expressions for αk and βk .
From (6.2.1), we have rkH pk = rkH (rk + βk−1 pk−1 ) = rkH rk , giving
rkH rk
αk = . (6.2.9)
pH
k Apk
H H H H
Similarly, rk+1 rk+1 = rk+1 (rk − αk Apk ) = −αk rk+1 Apk . Now from (6.2.6) we get rk+1 Apk
H
= βk pk Apk , and (6.2.9) gives
rH rk+1
βk = k+1 . (6.2.10)
rkH rk
Theorem 6.2.1 and the property rk ∈ Kk (A, b) imply that in theory the residual vectors r0 , r1 ,
r2 , . . . are the vectors that would be obtained from the sequence b, Ab, A2 b, . . . by Gram–Schmidt
280 Chapter 6. Iterative Methods
of the error over all vectors x ∈ x0 + Kk (A, r0 ), where r0 = b − Ax0 and x∗ = A−1 b is the
exact solution.
The computational requirements for each iteration of CG are constant. Each step requires
one matrix-vector multiplication with A, two inner products, and two vector updates of length n.
Storage is needed for four n-vectors x, r, p, q.
6.2. Krylov Subspace Methods 281
Originally, the CG method was viewed primarily as a direct method; see Householder [645,
1964, Sect. 5.7]. It soon became evident that the finite termination property is valid only in
exact arithmetic. In floating-point computation it could take many more than n iterations before
convergence occurred. This led to a widespread disregard of the method for more than a decade
after its publication. Interest was renewed when Reid [920, 1971] showed that it could be highly
efficient if used as an iterative method for solving large, sparse, well-conditioned linear systems.
is obtained by applying CG to the normal equations ATAx = AT b. These are Hermitian positive
definite or semidefinite. From Theorem 6.2.1 it follows that the residual vectors
sk = AT rk = AT (b − Axk )
in CGLS are mutually orthogonal. In exact arithmetic, the CGLS iterations terminate with sk = 0
after at most rank(A) steps. Let x† denote the pseudoinverse solution, and let r† = b − Ax† . If
ATA is positive definite, it follows from Theorem 6.2.2 that xk minimizes the error norm
where the last equality follows from the Pythagorean theorem and the identity
r = (r − r† ) + r† , r − r† ⊥ r† .
Thus ∥rk ∥2 also decreases monotonically. However, the residual norm ∥sk ∥2 = ∥AT rk ∥2 will
usually oscillate, especially when A is ill-conditioned.
Consider a straightforward application of CG to the normal equation ATAx = AT b with
x0 = 0. The only information about b available is from the initialization s = AT b because no
more reference to b is made in the iterative phase. Hence the bound on the achievable accuracy
will include a term of size
|δx| ≤ muκ(A)|A† | |b|, (6.2.14)
coming from the roundoff error in computing AT b. If |r| ≪ |b|, this term is much larger than for
perturbations of A and b. For reasons of numerical stability, the following two simple algebraic
rearrangements should be performed:
1. Explicit formation of the matrix ATA should be avoided.
2. The residual r = b − Ax should be recurred instead of the residual s = AT r of the normal
equations. This is crucial for stability because of the cancellation that occurs in r before
multiplication by AT .
The resulting method, here called CGLS, appeared as Algorithm (10:2) in the original paper by
Hestenes and Stiefel [608, 1952].9
9 The same method is called CGNR by Saad [956, 1996] and GCG-LS by Axelsson [48, 1994].
282 Chapter 6. Iterative Methods
Each iteration of CGLS requires two matrix-vector products, one with A and the other with
AT , as well as two inner products or vector updates of length m and three of length n. Storage
is needed for two m-vectors r, q and two n-vectors x, p. (Note that s can share storage with q.)
When rank(A) < n the least squares solution is not unique. However, it is easily verified
that if x0 ∈ R(AT ), e.g., x0 = 0, then in CGLS xk ∈ R(AT ), k = 0, 1, 2, . . . . Hence, in exact
arithmetic, CGLS terminates with the pseudoinverse solution x† = A† b ∈ R(AT ). We conclude
that in theory CGLS works for least squares problems of any rank and shape, overdetermined
as well as underdetermined. A version of CGLS that solves the regularized normal equations
(AT A + µ2 I)x = AT b is given in Section 6.4.2.
For a linear model Ax = b + e with positive definite covariance matrix σ 2 V ∈ Rm×m , the
generalized normal equations in factored form are AT V −1 (b − Ax) = 0. These can be solved
by the following generalized version of CGLS.
s = A'*y;
stsold = sts; sts = s'*s;
beta = sts/stsold;
p = s + beta*p;
end
end
AAT z = b, x = AT z. (6.2.16)
xk = AT zk , zk ∈ Kk (AAT , b).
For A ∈ Rm×n , CGME needs storage for two vectors x and q of length n and two vectors r
and p of length m. Three inner products or vector updates of length n and two of length m are
required per step.
The vector p in CGME can be eliminated. Then the algorithm becomes identical to an al-
gorithm due to Craig [277, 1955]; see Saad [956, 1996, Sect. 8.3.2]. We prefer to keep the
algorithm in the form given above, because this makes it possible to include a regularization
term; see Section 6.4.2.
In exact arithmetic, CGLS generates the sequence of Krylov subspace approximations xk ,
k = 1, 2, . . . , defined in Section 4.2.3 for solving ATAx = ATb. By Theorem 4.2.4, CGLS
terminates after at most r = rank(A) steps with the pseudoinverse solution x† = A† b. More
precisely, if A has p distinct (possibly multiple) nonzero singular values σ1 > σ2 > · · · > σp ,
then in exact arithmetic, CGLS terminates after p steps. For example, if A is the unit matrix plus
a matrix of rank p, at most p+1 steps are needed. Even fewer steps are required if b is orthogonal
Pn vectors ui corresponding to σi . If the original system is such that
to some of the left singular
the exact solution x = i=1 (ci /σi )vi , ci = uTi b has small projections ci onto singular vectors
ui , i > p, then p steps can be expected to give good approximations. However, the intermediate
Krylov subspace approximations xk depend nonlinearly on A and b in a highly complicated way.
We now derive an upper bound for the number of iterations needed to reduce the residual
norm by a certain amount. We assume exact arithmetic, but the bound holds also for finite-
precision computation. The residual of the normal equation can be written
Pk
where Pk−1 is a polynomial of degree k − 1, and Rk (λ) = i=0 cki λi is a residual polynomial,
i.e., Rk (1) = 1. Let S contain all the nonzero singular values σ of A, and assume that for some
residual polynomial R̃k we have
max |R̃k (σ 2 )| ≤ Mk .
σ∈S
We can now select a set S on the basis of some assumption regarding the singular value distribu-
tion of A and seek a polynomial R̃k ∈ Π̃1k such that Mk = maxσ∈S |R̃k (σ 2 )| is small. A simple
choice is to take S to be the interval [σn2 , σ12 ] and seek the polynomial R̃k ∈ Π̃1k that minimizes
The solution to this problem is the shifted Chebyshev polynomials introduced in the analysis
of the CSI method in Section 6.1.6. This gives the following upper bound for the norm of the
residual error after k steps:
k
κ(A) − 1
∥r − rk ∥2 ≤ 2 ∥r − r0 ∥2 , k = 0, 1, 2, . . . . (6.2.18)
κ(A) + 1
It follows that an upper bound on the number of iterations k needed to reduce the relative error
by a factor ϵ is
∥r − rk ∥2 1 2
< ϵ ⇐⇒ k < κ(A) log . (6.2.19)
∥r − r0 ∥2 2 ϵ
6.2. Krylov Subspace Methods 285
This is the same as for the CSI method and the second-order Richardson method. However, these
methods require that accurate lower and upper bounds for the singular values of A be known.
Furthermore, the estimate (6.2.19) tends to be sharp asymptotically for CSI, while for CGLS the
error usually decreases much faster. On the other hand, the inner products in CGLS can be a
bottleneck when implemented on parallel computers.
Although useful in the analysis of many model problems, the bound (6.2.18) in terms of κ(A)
cannot be expected to describe the highly nonlinear complexity of the convergence behavior of
CGLS. The convergence depends on the distribution of all singular values of A, as well as on
the projection of the right-hand side b onto the left singular vectors of A. In practice the rate of
convergence often accelerates as the number of steps increases.
In floating-point arithmetic the finite termination property no longer holds, and it can take
much more than n steps before the desired final accuracy is reached; see Section 6.2.6.
AB −1 y = (I − EB −1 )Bx = b, Bx = y, (6.2.21)
are the left- and right-preconditioned systems. If B is chosen so that B −1 A or AB −1 is better
conditioned than A, faster convergence may be expected when the iterative method is applied to
one of the preconditioned systems. Note that the products B −1 A and AB −1 need not be formed
explicitly, since iterative methods only require that matrix-vector products such as B −1 (Ax)
and A(B −1 y) can be formed for arbitrary x and y. This is possible if linear systems with B
can be solved. Preconditioned iterative methods may be viewed as a compromise between a
direct and an iterative solver. To be efficient, a preconditioner should have the following, partly
contradictory properties:
• The norm of the defect matrix E should be small, and AB −1 (B −1 A) should be better
conditioned than A and have well clustered eigenvalues.
• Linear systems with matrices B and B T should be cheap to solve, and B should not have
many more nonzero elements than A.
For solving a least squares problem minx ∥Ax − b∥2 the right-preconditioned problem
min ∥AB −1 y − b∥2 , y = Bx, (6.2.22)
y
286 Chapter 6. Iterative Methods
should be used, because a left-preconditioner would change the objective function. This is equiv-
alent to a symmetric preconditioning of the normal equations:
B −TATAB −1 y = B −TAT b, x = B −1 y. (6.2.23)
For the preconditioned CGLS (PCGLS) method the approximations minimize the error func-
tional ∥rk ∥22 , rk = b − Axk , over the Krylov subspace
xk − x0 ∈ Kk (B −T ATAB −1 , B −T AT r0 ). (6.2.24)
For PCGLS, the required matrix-vector product u = AB −1 y is computed by solving Bw = y
and forming u = Aw. Similarly, v = B −T AT z is computed by solving B T v = AT z. Hence,
the extra work is one solve with B and one with B T . Below we give an implementation of
preconditioned CGLS (PCGLS).
A simple preconditioner for the normal equations is the diagonal matrix B = diag (d1 ,
. . . , dm ), where dj = ∥aj ∥2 . Then the columns of the preconditioned matrix AB −1 will have
unit length. By Theorem 2.1.2 this preconditioner approximately minimizes κ2 (AD−1 ) over all
diagonal D > 0. Using this preconditioner can significantly improve the convergence rate with
almost no cost in terms of time and memory. The column norms can be obtained cheaply if A is
a sparse matrix (stored columnwise or rowwise). For CGLS the iterations are usually terminated
when ∥AT rk ∥2 /(∥A∥2 ∥rk ∥2 ) ≤ η, where η is a small tolerance (see stopping criteria (6.2.59)
and (6.2.60)). This guarantees a backward stable solution. In PCGLS, ∥rk ∥2 and ∥AB −1 ∥ can
be estimated, but usually not ∥A∥2 . Instead, we use the stopping criterion
∥(AB −1 )T rk ∥2
≤ η.
∥AB −1 ∥2 ∥rk ∥2
To solve a consistent underdetermined problem minx ∥x∥2 subject to Ax = b, we can apply
CGME (Craig’s method) to the left-preconditioned problem
min ∥x∥2 subject to B −1 Ax = B −1 b. (6.2.25)
x
6.2. Krylov Subspace Methods 287
This iteration method minimizes the error functional ∥x − xk ∥2 over the Krylov subspaces
x − xk ∈ Kk AT (BB T )−1A, AT (BB T )−1 b .
(6.2.26)
Note that although the residual vectors are transformed, the algorithm can be formulated in terms
of the original residuals rk = b − Axk .
Preconditioners for least squares problems are treated in Section 6.3. Benzi [106, 2002]
gives an excellent survey of preconditioning techniques for the iterative solution of large linear
systems, with a focus on algebraic methods for general sparse matrices. Wathen [1095, 2015] and
Pearson and Pestana [888, 2020] survey a range of preconditioners for use with partial differential
equations and optimization problems and for other purposes as well.
αk = uH
k vk , vk = Auk − βk uk−1 , k = 2, . . . , n. (6.2.29)
288 Chapter 6. Iterative Methods
Solving (6.2.28) for uk+1 gives βk+1 uk+1 = rk and rk = vk − αk uk . If rk = 0, the process
stops. Otherwise,
βk+1 = ∥rk ∥2 , uk+1 = rk /βk+1 . (6.2.30)
Thus, as long as all βk ̸= 0, the elements in the tridiagonal matrix Tk and the unitary matrix
Uk+1 ∈ Kk+1 (A, u1 ) are uniquely determined. Furthermore, it holds that
Tk
AUk = Uk Tk + βk+1 uk+1 eTk = Uk+1 T̂k , T̂k = , (6.2.31)
βk+1 eT1
which is the Lanczos decomposition. It is easy to verify that Uk is an orthonormal basis in the
Krylov subspace Kk (A, b).
The Lanczos process requires storage for Tk and three n-vectors uk−1 , uk , and uk+1 . The
eigenvalues of Tk are approximations to the eigenvalues of A. The process stops when βk+1 = 0.
Then by (6.2.31), AUk = Uk Tk , i.e., Uk is an invariant subspace of A.
Lanczos [716, 1952] noted that his process can be used to solve positive definite systems of
linear equations by a method he called the method of minimized iterations. With
β1 = ∥b∥2 , u1 = b/β1 ,
an approximate solution xk = Uk yk ∈ Kk (A, b) is determined by the Galerkin condition
UkH rk = 0, rk = b − Axk .
The Lanczos decomposition (6.2.31) gives
rk = β1 u1 − AUk yk = Uk+1 (β1 e1 − T̂k yk ). (6.2.32)
If yk is determined from Tk yk = β1 e1 , it follows that rk = −(eTk yk )βk+1 uk+1
and H
Uk+1
=0 rk
as required. Because A is positive definite, so is Tk . Hence the Cholesky factorization Tk =
Lk LTk exists, with
l11
l21 l22
l32 l33
Lk = .
.. ..
. .
lk,k−1 lkk
Because Lk is the k × k principal submatrix of Lk+1 , the Cholesky factorization can be cheaply
updated. The equation Tk yk = β1 e1 is equivalent to the bidiagonal equations
Lk zk = β1 e, LTk yk = zk .
It follows that
zk−1
zk = , ξk = −lk,k−1 ξk−1 /lkk .
ξk
If we define Pk from Lk PkT = UkT , then
xk = Uk yk = Pk LTk yk = Pk zk ,
and lk−1,k pk−1 + lkk pk = uk . Hence,
xk = xk−1 + ζk pk , pk = (uk − lk,k−1 pk−1 )/lkk
can be obtained without saving all the vectors u1 , . . . , uk or computing yk .
In exact arithmetic, this method computes the same sequence of approximations xk ∈
Kk (A, b) as CG and is therefore often called the Lanczos-CG method. The residual vectors
r0 , . . . , rk−1 in CG are mutually orthogonal and form a basis for the Krylov space Kk (A, b).
Hence by uniqueness, the columns of Uk in the Lanczos process equal the residual vectors nor-
malized to unit length.
6.2. Krylov Subspace Methods 289
Then the Lanczos process is just the Stieltjes algorithm for computing the corresponding se-
quence of orthogonal polynomials. The vectors uk are of the form qk−1 (A)u1 , and the orthog-
onality of these vectors translates into the orthogonality of the polynomials with respect to the
inner product (6.2.33).
where the scalars αk and βk are normalization constants. The recurrences can be summarized as
α1
β2 α2
..
Bk = β3 . ∈ R(k+1)×k (6.2.37)
..
. αk
βk+1
is lower bidiagonal. The columns of Uk and Vk are orthonormal bases for the Krylov subspaces
From the orthogonality of Uk+1 and Vk it follows that ∥rk ∥2 = ∥tk+1 ∥2 is minimized over all
xk ∈ span(Vk ) if yk solves the bidiagonal least squares subproblem
The special form of the right-hand side β1 e1 holds because the starting vector was taken as
b. By uniqueness, it follows that in exact arithmetic, this generates the same Krylov subspace
approximations xk ∈ Kk (ATA, AT b) as CGLS.
Subproblem (6.2.39) can be solved by the QR factorization
Rk fk
Qk (Bk β1 e1 ) = , (6.2.40)
0 ϕ̄k+1
where
ρ1 γ2 ϕ1
ρ2 γ3 ϕ2
.. .. ..
Rk =
. . ,
fk =
.
.
(6.2.41)
ρk−1 γk ϕk−1
ρk ϕk
The matrix Qk = Gk,k+1 Gk−1,k · · · G12 is a product of plane rotations chosen to eliminate the
subdiagonal elements β2 , . . . , βk+1 of Bk . The solution yk satisfies
Rk yk = fk . (6.2.42)
In exact arithmetic, the LSQR approximations xk = Vk yk are the same as for CGLS.
The factorization (6.2.40) can be computed by a recurrence relation. Assume that we have
computed the QR factorization of Bk−1 , so that
0 β1 e 1 γk ek−1 fk−1
Qk−1 αk 0 = ρ̄k ϕ̄k . (6.2.43)
1
βk+1 0 βk+1 0
(Note that the previous rotations Gk−2,k−1 , . . . , G12 do not act on the kth column.)
If xk were formed as xk = Vk yk , it would be necessary to save (or recompute) the vectors
v1 , . . . , vk . This can be avoided by defining Zk from the triangular system Zk Rk = Vk , so that
xk = Zk Rk yk ≡ Zk fk .
v1 = ρ1 z1 , vk = γk zk−1 + ρk zk , k > 1.
LSQR requires 3m + 5n multiplications and storage of two m-vectors u and Av and three
n-vectors x, v, and w. This can be compared to CGLS, which requires 2m + 3n multiplications,
two m-vectors, and two n-vectors. Unlike in CGLS, the residual vector rk = Uk+1 tk+1 is not
computed in LSQR.
Benbow [104, 1999] generalized GKL lower bidiagonalization and LSQR to solve the gen-
eralized normal equations AT M −1 Ax = AT M −1 b, where M = LLT . When vectors ũi = Lui
are introduced into LSQR, only matrix-vector operations with A, AT , and M −1 are needed:
Avi and AT L−T L−1 ui+1 = AT M −1 ũi+1 .
Algorithm LSME of Paige [852, 1974] is an algorithm for solving consistent systems Ax = b
using the same bidiagonalization process as LSQR. Let
α1
β2 α2
β α ∈ Rk×k
Lk = 3 3 (6.2.46)
.. ..
. .
βk α k
292 Chapter 6. Iterative Methods
be the lower bidiagonal matrix consisting of the first k rows of Bk in (6.2.37). Then the bidiag-
onalization recurrence relations (6.2.36) can be written as
AVk = Uk Lk + βk+1 uk+1 eTk , AT Uk = Vk LTk . (6.2.47)
The LSME approximations xk are given by v
xk = Vk zk , zk = β1 L−1
k e1 , (6.2.48)
where zk and xk are obtained by the recurrences
ζk = −(βk /αk )ζk−1 , xk = xk−1 + ζk vk . (6.2.49)
Since the increments vk form an orthogonal set, the step lengths are bounded by |ζk |2 ≤ ∥xk ∥2 ≤
∥x∥2 . The LSME approximations xk lie in Kk (AAT , r0 ) and minimize ∥x† − xk ∥2 . By unique-
ness, LSME is mathematically equivalent to Craig’s method and CGME.
Hence for LSMR, ∥AT rk ∥2 is monotonically decreasing by design. This allows the iterations to
be terminated more safely. The residual norm ∥rk ∥2 of LSQR will be smaller but, usually, not
by much.
We now describe how subproblems (6.2.50) can be solved efficiently. After k steps, we have
To solve this, a sequence of plane rotations is first used to compute the QR factorization
Rk
Qk Bk = (6.2.54)
0
as in LSQR. Here Rk is upper bidiagonal, and RkT Rk = BkT Bk . If RkT qk = β̄k+1 ek , then
qk = (β̄k+1 /ρk )ek ≡ φk ek . If we define tk = Rk yk , subproblem (6.2.53) can be written
T RkT
min ∥A rk ∥2 = min β̄1 e1 − tk , (6.2.55)
yk tk φk eTk 2
Let Wk = (w1 , . . . , wk ) and W̄k = (w̄1 , . . . , w̄k ) be computed by forward substitution from
RkT WkT = VkT and R̄kT W̄kT = WkT . Then, from
xk = Vk yk , Rk yk = tk , R̄k tk = zk ,
we have
xk = Wk Rk yk = Wk tk = W̄k R̄k tk = W̄k zk = xk−1 + ζk w̄k .
As for LSQR, all quantities needed to update the approximation xk−1 can be computed by for-
ward recurrence relations, where only the last term needs to be saved. Also, ∥rk ∥2 and ∥xk ∥2
can be obtained by using formulas that can be updated cheaply. For details see Fong and Saun-
ders [417, 2011], where a detailed pseudocode for LSMR is given. LSMR shares with LSQR the
property that for rank-deficient problems, it terminates with the least-norm solution.
LSMR requires storage for one m-vector and seven n-vectors. The number of floating-point
multiplications is 8n per iteration step. The corresponding figures for LSQR are two m-vectors,
three n-vectors and 3m + 5n, respectively. LSMR is easily extended to solve regularized least
squares problems (6.4.8).
The three algorithms LSME, LSQR, and LSMR all use the same bidiagonalization algorithm
and generate approximations in the same Krylov subspaces xk ∈ Kk (ATA, AT b). The algo-
rithms minimize different error quantities, ∥xk − x∥, ∥rk − r∥, and ∥AT (rk − r)∥, respectively
(where the last two are equivalent to minimizing ∥rk ∥2 and ∥AT rk ∥2 ). LSME can only be used
for consistent systems. LSMR is the only algorithm for which all three error measures are mono-
tonically decreasing. This makes LSMR the method of choice in terms of stability. However,
at any given iteration, LSQR has a slightly smaller residual norm. It has been suggested that
LSMR should be used until the iterations can be terminated. Then a switch can be made to the
corresponding LSQR solution.
Similar comments apply to CGME and CGLS. A CGLS-type algorithm, CRLS, correspond-
ing to LSMR that minimizes ∥AT rk ∥2 is derived by Björck [129, 1979] by using a modified
inner product in CGLS. The same algorithm is derived by Fong [416, 2011] by applying the
conjugate residual (CR) algorithm to the normal equations ATAx = AT b. Tests have shown that
CRLS achieves much lower final accuracy in xk than CGLS and LSMR, and therefore its use
cannot be recommended.
Fortran, C, Python, and MATLAB implementations of LSQR, LSMR, and other iterative
solvers are available from Systems Optimization Laboratory, Stanford University. Julia imple-
mentations are available at http://dpo.github.io/software/.
find that in the limit the computed solution’s accuracy is at least as good as for a backward stable
method.
Greenbaum [533, 1989] observed that Krylov subspace algorithms seem to behave in finite-
precision like the exact algorithm applied to a larger problem Bb x = c, where B has many
eigenvalues distributed in tiny intervals about the eigenvalues of A. Hence κ(B) ≈ κ(A), which
could explain why the error bound (6.2.18) has been observed to apply also in finite precision.
Some theoretical properties of CGLS and LSQR, such as monotonic decrease of ∥rk ∥2 , remain
valid in floating-point arithmetic.
Krylov subspace methods, such as CGLS and LSQR, typically “converge” in three phases as
follows (see, e.g., Axelsson [48, 1994]):
1. An initial phase in which convergence depends essentially on the initial vector b and can
be rapid.
2. A middle phase, where the convergence is linear with a rate depending on the spectral
condition number κ(A).
3. A final phase, where the method converges superlinearly, i.e., the rate of convergence
accelerates as the number of steps increases. This may take place after considerably less
than n iterations.
In a particular case, any of these phases can be absent or appear repeatedly. The behavior can
be supported by heuristic arguments about the spectrum of A; see Nevanlinna [829, 1993]. For
example, superlinear convergence in the third phase can be explained by the effects of the smaller
and larger singular values of A being eliminated.
A typical behavior is shown in Figure 6.2.1, which plots ∥x† −xk ∥ and ∥AT rk ∥ for LSQR and
CGLS applied to the sparse least squares problem ILLC1850 from Matrix Market (Boisvert et al.
[160, 1997]). This problem originates from a surveying application and has dimensions 1850 ×
712 with 8636 nonzeros. The condition number is κ(A) = 1.405 × 103 , and the inconsistent
right-hand side b is scaled so that the true solution x has unit norm and ∥r∥2 = 0.79 × 10−4 .
Paige and Saunders [866, 1982] remark that LSQR is often numerically somewhat more reliable
than CGLS when A is ill-conditioned and many iterations have to be carried out. In Figure 6.2.1
10 0
10 -5
error and residual norms
10 -10
10 -15
10 -20
10 -25
0 500 1000 1500 2000 2500 3000
iterations
Figure 6.2.1. ∥x† − xk ∥ and ∥AT rk ∥ for problem ILLC1850: LSQR (blue and green solid lines)
and CGLS (black and magenta dash-dot lines).
296 Chapter 6. Iterative Methods
the plots are similar, and both CGLS and LSQR converge after about 2500 ≈ 3.5n iterations to
a final error ∥x − xk ∥ < 10−14 . The superlinear convergence phase is clearly visible. Note that
the oscillations in ∥AT rk ∥ are not caused by the finite precision.
It might be tempting to restart CGLS or LSQR after a certain number of iterations from a
very good approximation to the solution and an accurately computed residual. However, after
such a restart, any previously achieved superlinear convergence is lost and often not recovered
until after many additional iterations.
Ideally, the iterations should be stopped when the backward error in xk is sufficiently small.
Evaluating the expression given in Section 2.5.2 for the optimal backward error is generally too
expensive. In practice, stopping criteria are based on the two upper bounds
where x̄ and r̄ are the current approximate solution and residual. This motivates terminating the
iterations as soon as one of the following two conditions are satisfied:
where ηA and ηb are user-specified tolerances. S1 is relevant for consistent problems; otherwise,
S2 is used. Note that it is possible for xk to be a solution of a slightly perturbed least squares
problem and yet for both ∥E1 ∥2 and ∥E2 ∥2 to be orders of magnitude larger than the norm of the
optimal perturbation bound. LSQR and LSMR also terminate if an estimate of κ2 (A) exceeds
a specified limit. This option is useful when A is ill-conditioned or rank-deficient. Because the
stopping criterion S2 is normally used for inconsistent least squares problems, the oscillations in
∥AT rk ∥ that occur for LSQR and CGLS are undesirable. This is one reason to use LSMR, for
which this quantity is monotonically decreasing.
Figure 6.2.2 compares LSQR and LSMR applied to the same problem as before. The fi-
nal accuracy is similar for both algorithms. As predicted by theory, ∥AT rk ∥ is monotonically
10 0
10 -5
error and residual norms
10 -10
10 -15
10 -20
10 -25
0 500 1000 1500 2000 2500 3000
iterations
Figure 6.2.2. ∥x† − xk ∥ and ∥AT rk ∥ for problem ILLC1850: LSQR (blue and green solid lines)
and LSMR (black and magenta dashed lines).
6.2. Krylov Subspace Methods 297
decreasing for LSMR and always smaller than for LSQR. (In many cases the difference is much
more obvious.) Hence criterion S2 will terminate the iterations sooner than for LSQR. On the
other hand, both ∥x − xk ∥2 and ∥rk ∥2 are typically larger for LSMR than for LSQR.
Fong and Saunders [417, 2011] tested LSQR and LSMR on 127 different least squares
problems of widely varying origin, structure, and size from the SuiteSparse matrix collection
(Davis and Hu [291, 2011]). They make similar observations about the accuracy achieved. With
η2 = 10−8 in stopping rule S2, the iterations terminated sooner for LSMR. They suggest the
strategy of running LSMR until termination, then transferring to the LSQR solution. In tests
by Gould and Scott [520, 2017] on 921 problems from the same collection, LSMR had fewer
failures with a given iteration limit and faster execution than LSQR.
If rank(A) < n, CGLS and LSQR in theory will converge to the pseudoinverse solution
provided x0 ∈ R(AT ) (e.g., x0 = 0). In floating-point arithmetic, rounding errors will introduce
a growing component in N (A) in the iterates xk . Initially this component will remain small,
but eventually divergence sets in, and ∥xk ∥ will grow quite rapidly. The vectors xk /∥xk ∥ then
become increasingly close to a null-vector of A; see Paige and Saunders [866, 1982, Section 6.4].
In Figure 6.2.3 plots of ∥x† − xk ∥ and ∥rk ∥ for CGME and LSME are applied to a consistent
underdetermined system with the transpose of the matrix ILLC1850. The algorithms perform
almost identically. A final accuracy of ∥x − xk ∥ = 6.2 × 10−14 is achieved after about 2500
iterations. Superlinear convergence sets in slightly earlier for CGME. As predicted, ∥x† − xk ∥
converges monotonically, while ∥rk ∥ oscillates. For consistent systems Ax = b with stopping
rule S1, the oscillation in ∥rk ∥ that occurs for CGME and LSME is not an attractive feature. For
CGLS and LSQR, ∥rk ∥ converges monotonically. These methods apply also to underdetermined
systems, and, unlike CGME and LSME, they will not break down if Ax = b turns out to be
inconsistent.
10 0
10 -2
10 -4
10 -6
error and residual norms
10 -8
10 -10
10 -12
10 -14
10 -16
10 -18
0 500 1000 1500 2000 2500
iterations
Figure 6.2.3. Underdetermined consistent problem with the transpose of ILLC1850: ∥x† − xk ∥
and ∥rk ∥; CGME (blue and red solid lines); LSME (black and green dashed lines).
As shown in Figure 6.2.4 the convergence of ∥x† − xk ∥ for CGLS is only marginally slower
than for CGME. The residual norm is always smaller and behaves monotonically. Thus, CGLS
achieves similar final accuracy with only slightly more work and storage than CGME and can
be terminated earlier. Therefore, CGLS or LSQR are usually preferred also for underdetermined
systems.
298 Chapter 6. Iterative Methods
10 0
10-2
10-4
10-6
error and residual norms
10-8
10-10
10-12
10-14
10-16
10-18
0 500 1000 1500 2000 2500
iterations
Figure 6.2.4. Problem ILLC1850 overdetermined consistent: ∥x† − xk ∥ and ∥rk ∥; CGME (blue
and red solid lines) and CGLS (black and green dash-dot lines).
Although the two sets of basis vectors Uk and Vk generated by GKL bidiagonalization are
theoretically orthogonal, this property is lost in floating-point arithmetic. The algorithm be-
havior therefore deviates from the theoretical. Reorthogonalization maintains a certain level of
orthogonality and accelerates convergence at the expense of more arithmetic and storage; see
Simon [995, 1984]. In full reorthogonalization, newly computed vectors uk+1 and vk+1 are
reorthogonalized against all previous basis vectors. If Uk and Vk are orthonormal to working
accuracy, this involves computing
and normalizing the resulting vectors. Uk and Vk must be saved, and the cost is roughly 4k(m +
n) flops. After k steps the accumulated cost is about 2k 2 (m + n) flops. Thus, full reorthogonal-
ization is not practical for larger values of k. The cost can be limited by using local reorthogo-
nalization, where uk+1 and vk+1 are reorthogonalized against the p last vectors, where p ≥ 0 is
an input parameter.
Because Uk and Vk are generated by coupled two-term recurrences, these two sets of vectors
have closely related orthogonality. Simon and Zha [996, 2000] show that keeping Vk orthonormal
√
to machine precision u will make Uk roughly orthonormal to a precision of at least O( u) and
vice versa if Uk is orthogonalized. Such one-sided reorthogonalization saves at least half the
cost of full reorthogonalization. For strongly overdetermined systems (m ≫ n) the savings are
highest if only Vk is orthogonalized.
Tests by Fong and Saunders [417, 2011] show that for more difficult problems, reorthog-
onalization can make a huge difference in the number of iterations needed by LSMR. When
comparing full reorthogonalization and the two versions of one-sided reorthogonalization, they
(unexpectedly) found that the results from these three versions were indistinguishable in all test
cases. The current implementation of LSMR includes the option of local one-sided reorthogo-
nalization.
Barlow [68, 2013] showed that one-sided reorthogonalization is a very effective strategy for
preserving the desired behavior of GKL bidiagonalization. In exact arithmetic, GKL generates
the factorization
A = U BV T ∈ Rm×n , m ≥ n,
6.2. Krylov Subspace Methods 299
A key result is that the singular values of the bidiagonal matrix Bk produced after k steps of this
procedure are the exact result for the Rayleigh–Ritz approximation of a nearby matrix.
The Lanczos process is defined for any symmetric matrix A. It generates an orthogonal basis
Uk = (u1 , u2 , . . . , uk ) for the Krylov subspaces Kk (A, b). Setting xk = Uk yk , we obtain from
(6.2.31) that
b − Axk = β1 u1 − AUk yk = Uk+1 (β1 e1 − T̂k yk ). (6.2.63)
By the orthogonality of Uk+1 , the least squares problem (6.2.62) is seen to be equivalent to
γ1 δ2 ϵ3
..
γ2 δ3 .
Rk
..
Gk,k+1 · · · G23 G12 T̂k =
= γ3 . ϵk ∈ R(k+1)×k ,
(6.2.64)
0 ..
. δk
γk
0
300 Chapter 6. Iterative Methods
where the plane rotations Gj,j+1 annihilate the subdiagonal elements in T̂k . The same rotations
are applied to the right-hand side, giving
tk
Gk,k+1 · · · G23 G12 β1 e1 = ,
τ̄k+1
where tk = (τ1 , . . . , τk )T . The factorization (6.2.64) can be updated easily, as we now show. In
the next step, a row and a column are added to T̂k . The only new nonzero elements
βk+1
αk+1
βk+2
are in column k + 1 and rows k, k + 1, and k + 2. To get column k + 1 in Rk+1 , the two last
rotations from the previous steps are first applied to column k + 1 and rows k − 1, k, k + 1. The
element βk+2 is then annihilated by a new rotation Gk+1,k+2 .
We have xk = Uk yk , where yk satisfies the upper triangular system Rk yk = tk . To compute
xk without having to save all of Uk , we define Dk = (d1 , . . . , dk ) from RkT DkT = UkT . This
yields the recurrence relation (d0 = d−1 = 0)
When A is singular, MINRES computes a least squares solution but not, in general, the
minimum-length solution
see Choi [245, 2006]. If the system is inconsistent, then MINRES converges to a least squares
solution but not, in general, to the solution of minimum length. Choi, Paige, and Saunders [245,
2011] develop MINRES-QLP for that purpose. MINRES-QLP uses a QLP decomposition of
the tridiagonal matrix from the Lanczos process and converges to the pseudoinverse solution.
MINRES-QLP requires slightly more operations per iteration step but can give more accurate
solutions than MINRES on ill-conditioned problems. An implementation is given by Choi and
Saunders [246, 2014].
Applying MINRES to the augmented system could be an alternative approach for solving
least squares problems. However, as shown by Fischer, Silvester, and Wathen [409, 1998], MIN-
RES makes progress only in every second iteration, and LSQR and CGLS converge twice as fast.
Indeed, applying the Lanczos process to the augmented system leads to the GLK process (Paige
and Saunders [866, 1982, Section 2.1]) and hence to LSQR.
MINARES by Montoison, Orban, and Saunders [802, 2023] completes the family of Krylov
methods based on the symmetric Lanczos process (CG, SYMMLQ, MINRES, MINRES-QLP).
For any symmetric system Ax = b, MINARES minimizes ∥Ark ∥22 , within the kth Krylov sub-
space Kk (A, b). As this quantity converges to zero even if A is singular, MINARES is well suited
to singular symmetric systems.
6.2. Krylov Subspace Methods 301
The quantities ∥Ark ∥22 , ∥rk ∥22 , ∥xk − x∥22 , and ∥xk − x∥2A all decrease monotonically. On
consistent systems, the number of iterations is similar to MINRES and MINRES-QLP when the
stopping criterion is based on ∥rk ∥22 , and significantly faster when the stopping criterion is based
on ∥Ark ∥22 .
LSMR is a more general solver for the least squares problem minx ∥b − Ax∥22 of square or
rectangular systems. If A were symmetric, LSMR would minimize ∥Ark ∥22 and appear at first
glance to be equivalent to MINARES. However, rk is defined within different Krylov subspaces,
and LSMR would require two matrix-vector products Av and Au per iteration. (LSMR solving
symmetric Ax = b is equivalent to MINRES solving A2 x = Ab.) MINARES is more reliable
than MINRES or MINRES-QLP on singular systems and more efficient than LSMR.
Assume that the process has generated Vk = (v1 , v2 , . . . , vk ) with orthonormal columns. In the
next step the vector wk = Avk is formed and orthogonalized against v1 , v2 , . . . , vk . Using MGS
we compute
If hk+1,k = ∥wk ∥2 > 0, we set vk+1 = wk /hk+1,k . Otherwise, the process terminates. In
matrix form this process yields the Arnoldi decomposition
where Vk+1 has orthonormal columns. It follows that the minimum of ∥rk ∥2 is obtained when
yk solves the small Hessenberg least squares subproblem
Since all Vk and Hk+1,k must be stored, this is solved only at the final step of GMRES by
determining a product of plane rotations such that
Rk gk
Qk Hk+1,k = , = ρ1 Qk e1 . (6.2.73)
0 γk
Then the solution is obtained by solving Rk yk = gk and forming xk = Vk yk . The plane rotations
can be used and then discarded.
GMRES terminates at step k if hk+1,k = 0. Then from (6.2.69), AVk = Vk Hk,k and
rank(AVk ) = rank(Vk ) = k. Hence Hk,k must be nonsingular, and
∥rk ∥2 = min ∥Pk (A)r0 ∥2 ≤ ∥X∥2 ∥X −1 ∥2 min max |Pk (λj )|∥r0 ∥2 . (6.2.74)
Pk (0)=1 Pk (0)=1 j
However, this upper bound is not very useful because the minimization of a polynomial with
Pk (0) = 1 over a set of complex numbers is an unsolved problem. Information about the eigen-
values alone does not suffice for determining the rate of convergence.
6.2. Krylov Subspace Methods 303
AM −1 u = b, x = M −1 u, (6.2.75)
has the advantage that the residuals of the preconditioned system are identical to the original
residuals. For a right-preconditioned system the GMRES algorithm constructs an orthogonal
basis for the subspace {r0 , AM −1 r0 , . . . , (AM −1 )m−1 r0 }. We summarize the algorithm below.
For many applications an effective fixed preconditioner M is not available. Then one would
like to be able to use a preconditioner defined as an arbitrary number of steps of another iterative
method applied to solve M zj = vj . For example, another Krylov subspace based on the normal
equations could be used. It is desirable that the preconditioner be allowed to change without
restarting GMRES so that zj = Mj−1 vj . Flexible GMRES is a simple modification of standard
GMRES by Saad [955, 1993] that allows the use of variable preconditioning.
The only difference from the standard version is that we now must save Zm = (z1 , . . . , zm )
and use it for the update of x0 . This doubles the storage cost, but the arithmetic cost is the same.
Note that z1 , . . . , zm in FGMRES is not a Krylov subspace.
304 Chapter 6. Iterative Methods
By Propositions 2.1 and 2.2 in Saad [955, 1993], xm obtained at step m in flexible GMRES
minimizes the residual norm ∥b − Axm ∥2 over x0 + span (Zm ). Assuming that j − 1 steps of
FGMRES have been successfully performed and that Hj is nonsingular, xj is the exact solution
if and only if hj+1,j = 0. Note that the nonsingularity of Hj is no longer implied by the
nonsingularity of A.
Ax ≈ b, AH y ≈ c, (6.2.76)
has interesting applications in design optimization, aeronautics, weather prediction, and signal
processing; see Giles and Süli [471, 2002] and Montoison and Orban [800, 2020]. More gener-
ally, let A ∈ Rm×n , m ≥ n, be rectangular. Applying two sequences of Householder transfor-
mations alternately on the left and right, we can compute an orthogonal tridiagonalization
1 0 cH 1 0 γ1 eT1
= . (6.2.77)
UH b A V β1 e1 U H AV
The first transformation in each sequence is chosen to reduce b and c, respectively, to a multiple
of e1 , so that
U H b = β1 e 1 , cH V = γ1 eT1 ,
and hence b = β1 U e1 and c = γ1 V e1 . Later transformations are chosen to reduce A to tridiago-
nal form
α1 γ2
β2 α2 γ3
Tn+1,n
. . . . . .
H
U AV = , Tn+1,n =
. . . ∈ R(n+1)×n ,
0 βn−1 αn−1 γn
βn αn
βn+1
(6.2.78)
with nonnegative off-diagonal elements. (If m = n, the last row of Tn+1,n is void.) Note that
this transformation preserves the singular values of A.
Knowing the existence of factorization (6.2.78), we can derive recurrence relations to com-
pute the nonzero elements of the tridiagonal matrix Tn+1,n and the columns of U and V . We
already have b = β1 u1 and c = γ1 v1 , so that u1 = b/∥b∥2 and v1 = c/∥c∥2 . Following Golub,
Stoll, and Wathen [509, 2008], we write
Comparing the last columns on both sides and solving for vectors uk+1 and vk+1 , respectively,
gives
Orthogonality gives αk = uH
k Avk , and the elements βk+1 > 0 and γk+1 > 0 are determined as
normalization constants.
6.2. Krylov Subspace Methods 305
Hence to apply the SSOR preconditioner, only one column of A is needed at a time. The number
of arithmetic operations per step approximately doubles when ω ̸= 0, compared to diagonal
scaling (ω = 0). Evans [389, 1968] reports SSOR preconditioning for solving symmetric positive
definite systems Ax = b as very promising. Jennings and Malik [665, 1978] also consider Jacobi
and SSOR-preconditioned CG methods.
Theory and numerical experiments indicate that the choice ω = 1 is often close to optimal.
However, the gain is usually small and often upset by the increased complexity of each itera-
tion. Convergence may be affected by reordering the columns. Also, reordering may be used to
introduce parallelism.
In many sparse least squares problems arising from multidimensional models, A has a natural
column block structure,
Aj = Qj Rj , Qj ∈ Rm×nj , j = 1, . . . , N. (6.3.6)
For this choice we have AB −1 = (Q1 , Q2 , . . . , QN ), i.e., the columns of each block are or-
thonormal. If ATA is split according to ATA = LB + DB + LTB , where LB is strictly lower
block triangular, the block SSOR preconditioner becomes
−1
B = RB (DB + ωLTB ); (6.3.8)
see Björck [128, 1979]. As with the corresponding point preconditioner, it can be implemented
without forming ATA.
If x and y = Bx are partitioned conformally with (6.3.7),
$$ x = (x_1, x_2, \ldots, x_N)^T, \qquad y = (y_1, y_2, \ldots, y_N)^T, $$
then Jacobi's method (6.1.18) applied to the preconditioned problem (6.2.22) becomes
$$ y_j^{(k+1)} = y_j^{(k)} + Q_j^T\big(b - AB^{-1}y^{(k)}\big), \qquad j = 1,\ldots,N, $$
and these corrections can be computed in parallel. Often Qj is not available and we have to
use Qj = Aj Rj−1 . This is equivalent to using the method of seminormal equations (2.5.26) for
solving (6.3.10). It can lead to some loss of accuracy, and a correction step is recommended
unless all the blocks Aj are well-conditioned.
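For illustration, the following MATLAB lines are a minimal sketch of the block diagonal preconditioner (6.3.7) for a randomly generated block-partitioned matrix (the sizes and test matrix are assumptions); the last line verifies that the leading block of AB^{-1} has orthonormal columns.

    % Sketch: block diagonal preconditioner B = diag(R_1,...,R_N) from thin QR factors.
    m = 300; N = 4; nj = 10;
    A = randn(m, N*nj);                       % assumed dense test matrix with N column blocks
    R = cell(N,1);
    for j = 1:N
        cols = (j-1)*nj+1 : j*nj;
        [~, R{j}] = qr(A(:,cols), 0);         % only R_j needs to be stored
    end
    B = blkdiag(R{:});                        % block diagonal preconditioner (6.3.7)
    C = A/B;                                  % C = A*B^{-1} = (Q_1,...,Q_N)
    disp(norm(C(:,1:nj)'*C(:,1:nj) - eye(nj)))  % first block has orthonormal columns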
A block SOR method for the normal equations can be derived similarly. Let $r_1^{(k)} = b - Ax_k$, and for $j = 1, \ldots, N$ compute
$$ x_j^{(k+1)} = x_j^{(k)} + \omega z_j^{(k)}, \qquad r_{j+1}^{(k)} = r_j^{(k)} - \omega A_j z_j^{(k)}, \qquad (6.3.11) $$
where $z_j^{(k)}$ solves $\min_{z_j} \|A_j z_j - r_j^{(k)}\|_2$. Taking ω = 1 in (6.3.11) gives the block Gauss–Seidel method.
To use the block SSOR preconditioner (6.3.8) for the conjugate gradient method, we have
to be able to compute vectors q = AB −1 p and s = B −T AT r efficiently, given p and r. The
following algorithms for this generalize the point SSOR algorithms (6.3.3) and (6.3.4):
• Set q (N ) = 0 and solve B(z1 , . . . , zN )T = p. For j = N, . . . , 2, 1, solve
The choice of partitioning A into blocks is important for the storage and computational ef-
ficiency of the methods. An important criterion is that it should be possible to compute the
factorizations Aj = Qj Rj (or at least the factors Rj ) without too much fill. Note that if Aj is
block diagonal, the computation of zj in SOR splits into independent subproblems. This makes
it possible to achieve efficiency through parallelization.
The case N = 2, $A = (A_1 \ \ A_2)$, is of special interest. For the block diagonal preconditioner (6.3.7) we have $AB^{-1} = (Q_1 \ \ Q_2)$, and the matrix of normal equations for the preconditioned system becomes
$$ (AB^{-1})^TAB^{-1} = \begin{pmatrix} I & K \\ K^T & I \end{pmatrix}, \qquad K = Q_1^TQ_2. \qquad (6.3.14) $$
This matrix is consistently ordered. Hence, it is possible to reduce the work per iteration by
approximately half for many iterative methods. This preconditioner is also called the cyclic
Jacobi preconditioner.
For consistently ordered matrices, the SOR theory holds. Hence, as shown by Elfving [383, 1980], the optimal ω in the block SOR method (6.3.11) for N = 2 is
$$ \omega_{\mathrm{opt}} = \frac{2}{1 + \sin\theta_{\min}}, $$
where $\theta_{\min}$ is the smallest principal angle between $\mathcal{R}(A_1)$ and $\mathcal{R}(A_2)$. Block SOR with $\omega_{\mathrm{opt}}$ reduces the number of iterations by a factor of $2/\sin\theta_{\min}$ compared to ω = 1.
For N = 2, the preconditioner (6.3.8) with ω = 1 has special properties; see Golub, Manneback, and Toint [500, 1986]. From
$$ B = \begin{pmatrix} R_1 & \omega Q_1^TA_2 \\ 0 & R_2 \end{pmatrix}, $$
it follows that for ω = 1,
$$ (\,A_1 \ \ A_2\,)\,B^{-1} = \big(\,Q_1 \ \ (I - P_1)Q_2\,\big), \qquad (6.3.15) $$
where $P_1 = Q_1Q_1^T$ is the orthogonal projector onto $\mathcal{R}(A_1)$. Hence the two blocks in (6.3.15) are mutually orthogonal, and the preconditioned problem (6.2.22) can be split into two independent subproblems.
This effectively reduces the original problem to one of size n2 . Hence, this preconditioner is also
called the reduced system preconditioner. The matrix of normal equations is
$$ (AB^{-1})^TAB^{-1} = \begin{pmatrix} I & 0 \\ 0 & Q_2^T(I - P_1)Q_2 \end{pmatrix} = I - \begin{pmatrix} 0 & 0 \\ 0 & K^TK \end{pmatrix}, $$
where $K = Q_1^TQ_2$. This reduction of variables corresponding to the first block of columns can
also be performed when N > 2.
Manneback [772, 1985] shows that for N = 2 the optimal choice with respect to the number
of iterations is ω = 1, i.e., the reduced system preconditioning. Further, as shown by Hageman,
Luk, and Young [558, 1980], the reduced system preconditioning is equivalent to cyclic Jacobi
preconditioning (ω = 0) for Chebyshev semi-iteration and the conjugate gradient method. The
reduced system preconditioning essentially generates the same approximations in half the num-
ber of iterations. Since the work per iteration is about doubled for ω ̸= 0, this means that cyclic
Jacobi preconditioning is optimal for CG in the class of SSOR preconditioners.
The use of SSOR preconditioners for Krylov subspace methods was first proposed by Ax-
elsson [47, 1972]. SSOR-preconditioned CG methods for the least squares and least-norm prob-
lems are developed by Björck and Elfving [141, 1979]. Experimental results for block SSOR
preconditioning with N > 2 are given by Björck [128, 1979]. Tests show that the number of
iterations required is nearly constant for values around ω = 1. For certain grid problems, a high
degree of parallelism can be achieved. Kamath and Sameh [682, 1989] give a scheme for a three-
dimensional n × n × n mesh and a seven-point difference star, for which N = 9 and each block
consists of n2 /9 separate subblocks of columns. Hence each subproblem can be solved with a
parallelism of n2 /9.
where E is the residual matrix. It follows that ∥AL−1 − I∥2 < ϵκ(A)2 , and we can expect rapid
convergence. However, since ATA is often significantly denser than A, it can be difficult to find
a sufficiently sparse and effective IC preconditioner for least squares problems.
In the pioneering paper by Meijerink and van der Vorst [786, 1977], an IC factorization for
the class of symmetric M-matrices is shown to exist. More generally, IC factorizations exist
when C is an H-matrix but may fail for a general symmetric positive definite matrix because of a
zero or negative pivot. Numerical instabilities can be expected if pivots have small magnitudes.
To avoid breakdown, Manteuffel [774, 1980] proposed factorizing a diagonally shifted matrix
C + αI for some sufficiently large α > 0. This modification can be very effective, but its success
depends critically on the choice of α. In general the only way to find a suitable α is by trial and
error.
During the last fifty years, many variations of IC preconditioners have been developed. They
differ in the strategies used to determine which elements are dropped. In a level-based IC
method, the nonzero pattern of L is based on the nonzero pattern of C and prescribed in ad-
vance. A symbolic factorization is used to assign each fill entry a level. In an IC(ℓ) incomplete
factorization, a nonzero element in L is kept in the numerical factorization only if its level is
at most ℓ. This has the advantage that the memory required for the preconditioner L is known
in advance. In an IC(0) factorization, the nonzero structure of L is the same as for the lower
triangular part of C. An IC(1) factorization also includes any nonzeros directly introduced by
the elimination of the level-zero elements. Higher-level incomplete factorizations are defined
recursively. However, memory requirements may grow quickly as ℓ is increased. An improved
strategy by Scott and Tůma [984, 2011] is to consider individual matrix elements and restrict
contributions of small elements to fewer levels of fill than for larger elements.
Another widely used class of IC factorizations is called incomplete threshold IC(τ ) fac-
torization. In these, elements in the computed IC factor whose magnitude falls below a preset
threshold τ are discarded. A choice of τ = 0 will retain all elements, giving a complete Cholesky
factorization of C. It can be shown that a choice of τ = 1 will cause all off-diagonal elements to
be rejected and give a diagonal preconditioner. In practice, an intermediate value of τ in the in-
terval [0.01, 0.02] is recommended by Ajiz and Jennings [13, 1984]. Several of the above classes
of IC factorizations are available in the ichol function supplied by MATLAB.
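As a small usage sketch (the test matrix, drop tolerance, and diagonal compensation parameter below are assumptions, not recommendations from the text), an incomplete threshold factor from ichol can be combined with PCG on the normal equations; forming C = A^TA explicitly is done here only for brevity.

    % Sketch: IC(tau) preconditioner from ichol applied to the normal equations.
    A = sprandn(600, 200, 0.02) + [speye(200); sparse(400,200)];   % assumed sparse test matrix
    b = randn(600,1);
    C = A'*A;                                            % normal equations matrix (illustration only)
    opts = struct('type','ict','droptol',1e-2,'diagcomp',0.1);     % threshold drop + diagonal shift
    L = ichol(C, opts);
    x = pcg(C, A'*b, 1e-10, 200, L, L');                 % preconditioned CG on C*x = A'*b

The diagcomp parameter plays the role of the diagonal shift discussed above and guards against breakdown when C is not an H-matrix.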
A suitable symmetric permutation P CP T can improve the performance of an IC factoriza-
tion. When a drop tolerance is used, good orderings for direct methods, such as the minimum
degree algorithm, can be expected to perform well, because with these orderings fewer elements
need to be dropped; see Munksgaard [815, 1980]. Duff and Meurant [350, 1989] study the effect
of different ordering strategies on the convergence of CG when it is preconditioned by IC. They
conclude that the rate of convergence is not related to the number of fill-ins that are dropped but
rather almost directly related to the norm of the residual matrix ∥E∥. They show that several
orderings that give a small number of fill-ins do not perform well when used with a level-zero or
level-one incomplete factorization.
An alternative strategy to avoid breakdown of an IC factorization is proposed by Ajiz and
Jennings [13, 1984]. To compensate for dropped off-diagonal elements, corrections are added
to the diagonal elements. To delete the element $c_{ij}$, $i \neq j$, a residual matrix $E_{ij}$ with nonzero elements
$$ \begin{pmatrix} c_{ii} & -c_{ij} \\ -c_{ji} & c_{jj} \end{pmatrix} $$
is added, where $c_{ii}c_{jj} - c_{ij}^2 \geq 0$. Then $E_{ij}$ is positive semidefinite, and the eigenvalues of the
modified matrix C + Eij cannot be smaller than those of C. Hence if C is positive definite and
E is the sum of such modifications, it follows that C + E is positive definite, and the incomplete
factorization cannot break down. In the algorithm of Ajiz and Jennings, modifications to $c_{ii}$ and $c_{jj}$ of equal relative magnitude are made:
$$ c_{ii} := c_{ii} + |c_{ij}|\sqrt{c_{ii}/c_{jj}}, \qquad c_{jj} := c_{jj} + |c_{ij}|\sqrt{c_{jj}/c_{ii}}. $$
A difficulty with threshold Cholesky factorization is that the amount of storage needed to hold
the factorization for a given τ cannot be determined in advance. One solution is to stop and
restart with a larger value of τ if the allocated memory does not suffice. Alternatively, only the
p largest off-diagonal elements in each column of L can be kept, for some parameter p. Lin and
Moré [748, 1999] use no drop tolerance and retain the nj + p largest elements in the jth column
of L.
Tismenetsky [1065, 1991] proposes a different modification scheme. Intermediate memory
is used during construction of the preconditioner L but then discarded. A decomposition of the
form
C = (L + R)(L + R)T = LLT + LRT + RLT + E (6.3.17)
is used, where L is lower triangular with positive diagonal elements, and R is strictly lower
triangular. The matrix L is used as a preconditioner, and R is used to stabilize the factorization
process. The residual matrix has the positive semidefinite form E = RRT . At step j, the first
column of the jth Schur complement can be decomposed as the sum lj + rj , where ljT rj = 0.
In a right-looking implementation, the Schur complement is updated by subtracting Ej = (lj +
rj )(lj + rj )T , where rj is not retained in the incomplete factorization. Hence, at step j the
positive semidefinite modification rj rjT is implicitly added to A, which prevents breakdowns.
Tismenetsky takes rj as the vector of off-diagonal elements that are smaller than a chosen drop
tolerance. The good performance of Tismenetsky’s preconditioner is partly explained by the
form of the error matrix that depends on the square of the elements in R. The fill in L can be
controlled by the choice of drop tolerance. The most serious drawback is that the total memory
requirement needed to compute L can be prohibitively high.
Kaporin [684, 1998] modifies Tismenetsky’s method in several respects. A left-looking al-
gorithm is used, and the memory requirement is controlled by using two tolerances. Elements
larger than τ1 are kept in L, and those smaller than τ2 are dropped from R. The error matrix now
has the structure
E = RRT + F + F T ,
where F is a strictly lower triangular matrix that is not computed. Kaporin’s method is not
breakdown-free and has to be stabilized, e.g., by restarting the factorization after a diagonal shift
A := A + αI. More than one restart may be required.
Further developments of the Tismenetsky–Kaporin method are proposed by Scott and Tůma
[986, 2014]. Memory is limited by using two extra parameters lsize and rsize to control the
maximal number of fill elements in each column of L and R, respectively. The lsize largest
elements are kept in lj provided they are at least τ1 in magnitude, and the rsize largest elements
are in rj provided they are at least τ2 in magnitude. An implementation MI28 is described in
Scott and Tůma [985, 2014], where extensive tests are described on a large set of problems from
the SuiteSparse collection. The code is available as part of the HSL Mathematical Software
Library; see www.hsl.rl.ac.uk.
As described in Section 2.5.3, iterative refinement (IR) can be regarded as a preconditioned
iterative method, where the preconditioner is the full factor R̄ computed from a Cholesky factorization of ATA
(or a QR factorization of A), possibly in lower precision. The iteration method in IR is the
simple power method. This can require several iterations to converge, and often some other
iterative solver such as CGLS can be used with advantage. Zhang and Wu [1147, 2019] use a
QR factorization in IEEE half precision as a preconditioner for CGLS to achieve high accuracy
least squares solutions on GPUs.
By Theorem 2.1.2, the computed full Cholesky factorization satisfies
identified by a partial Cholesky factorization P = LLT that uses only a few columns corre-
sponding to the largest diagonal elements of H. This is used as a preconditioner to reduce the
condition number of H. The smallest eigenvalues of H are handled by the deflated CG algorithm
of Saad et al. [961, 2000]; see Section 6.4.6. This requires computing approximate eigenvectors
corresponding to some of the smallest eigenvalues of the preconditioned matrix P −1 H by a
Rayleigh–Ritz procedure.
Myre et al. [816, 2018] use CGLS preconditioned with the computed complete Cholesky
factorization to solve dense least squares problems. They call their algorithm TNT because it is
a “dynamite method”(!). For problems in rock magnetism with tens of thousands of variables it
outperformed other tested methods, including dense QR factorization.
Alternatively, an incomplete factor R can be generated by modifying a QR factorization
of A. This normally involves more computation but is less subject to the effect of rounding
errors. Jennings and Ajiz [664, 1984] describe an incomplete modified Gram–Schmidt (IMGS)
factorization in which the magnitude of each off-diagonal element rij is compared against a
chosen drop tolerance τ , scaled by the norm of the corresponding column in A = (a1 , . . . , an ).
That is, elements in R such that
$$ |r_{ij}| < \tau\,\|a_j\|_2 $$
are dropped. If τ = 0, all elements in R are retained, and the MGS process is complete. If τ = 1,
all off-diagonal elements in R are dropped, thus making R diagonal. In IMGS factorization the
preconditioner R is formed by the coefficients in a series of vector orthogonalizations, and A is
converted into Q.
    for i = 1 : n
        r_{ii} := ∥a_i∥_2;  q_i := a_i/r_{ii};
        for j = i + 1 : n
            r_{ij} := q_i^T a_j;
            if |r_{ij}| < τ d_j then r_{ij} := 0;      (d_j = ∥a_j∥_2)
            else a_j := a_j − r_{ij} q_i;
            end
        end
    end
If A has full column rank, the IMGS algorithm cannot break down. Column aj is only mod-
ified by subtracting a linear combination of previous columns a1 , . . . , aj−1 and cannot vanish.
Therefore, we have at each stage A = Q̂R̂, where Q̂ is orthogonal, and upper triangular R̂ has
positive diagonal elements. Normalization will give a nonzero qj , and the process can be con-
tinued. A drawback of IMGS is that for a sparse A, the intermediate storage requirement can be
much larger than for the final preconditioner R̂.
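A runnable MATLAB version of this incomplete MGS process might look as follows; it is only a sketch, and the scaling d_j is taken here as the norm of the original column a_j, which is one reading of the description above.

    function [Q, R] = imgs(A, tau)
    % Incomplete modified Gram-Schmidt (sketch): off-diagonal elements r_ij with
    % |r_ij| < tau*d_j are dropped; A is assumed to have full column rank.
    n = size(A,2);
    d = sqrt(sum(A.^2, 1));            % d_j = ||a_j||_2 of the original columns
    R = zeros(n);  Q = full(A);
    for i = 1:n
        R(i,i) = norm(Q(:,i));
        Q(:,i) = Q(:,i)/R(i,i);
        for j = i+1:n
            rij = Q(:,i)'*Q(:,j);
            if abs(rij) >= tau*d(j)     % keep r_ij and orthogonalize a_j against q_i
                R(i,j) = rij;
                Q(:,j) = Q(:,j) - rij*Q(:,i);
            end                         % otherwise r_ij is dropped (set to zero)
        end
    end
    end

With τ = 0 this reduces to the complete MGS factorization; the upper triangular factor R is the preconditioner.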
Wang [1101, 1993] (see also Wang, Gallivan, and Bramley [1102, 1997]) gives a com-
pressed algorithm (CIMGS) for computing the IMGS preconditioner. CIMGS is similar to
an incomplete Cholesky factorization. In exact arithmetic, it can be shown to produce the
same incomplete factor R for C = ATA as IMGS. Thus it inherits the robustness of IMGS.
CIMGS is also equivalent to Tismenetsky’s IC decomposition applied to the matrix ATA; see Bru
et al. [183, 2014].
Jennings and Ajiz [664, 1984] also consider an incomplete Givens QR factorization. The
rows of A are processed sequentially. The nonzero elements in the ith row (ai1 , ai2 , . . . , ain ) are
scanned, and each nonzero element aij is annihilated by a plane rotation involving row j in R.
A rotation to eliminate an element in A is skipped if
|aij | < τ ∥aj ∥2 ,
where aj is the jth column of A. If such an element aij were simply discarded, the final in-
complete factor R would become singular. Instead, these elements are rotated into the diagonal
element $r_{jj}$ by setting
$$ r_{jj} := \sqrt{r_{jj}^2 + a_{ij}^2}. $$
This guarantees that R is nonsingular and that the residual matrix E = ATA − RTR has zero
diagonal elements.
Zlatev and Nielsen [1153, 1988] compute sparse incomplete QR factors of A by discarding
computed elements that are smaller than a drop tolerance τ ≥ 0. The initial tolerance is succes-
sively reduced if the iterative solver converges too slowly. This approach can be very efficient
for some classes of problems, especially when storage is a limiting factor.
A different dropping criterion suggested by Saad [953, 1988] is to keep the pR largest ele-
ments in a row of R and the pQ largest elements in a column of Q. The sparsity structure of R
can also be limited to a prescribed index set P , as in the incomplete Cholesky algorithm. This
version can be obtained from Algorithm 6.3.1 by modifying it so that rij = 0 when (i, j) ̸∈ P .
A multilevel incomplete Gram–Schmidt QR (MIQR) preconditioner is given by Li and Saad
[741, 2006]. This exploits the fact that when a matrix is sparse, many of its columns will be
orthogonal because of their structure. The algorithm first finds a set of structurally orthogonal
columns in A and permutes them into the first positions A1 = (a1 , . . . , ak ). Normalizing these
columns gives A1 = Q1 D1 , with Q1 orthogonal. The remaining columns A2 are then orthogo-
nalized against the first set, giving B = A2 − Q1 F1 and the partial QR factorization
$$ AP_1 = (\,Q_1 \ \ B\,)\begin{pmatrix} D_1 & F_1 \\ 0 & I \end{pmatrix}, $$
where F1 is usually sparse and has structurally independent columns. Hence, the process can
be repeated recursively on B until the reduced matrix is small enough or no longer sufficiently sparse.

A sparse approximate inverse (SPAI) preconditioner M ≈ A^{-1} can be computed by solving the constrained matrix least squares problem
$$ \min_{M\in\mathcal{G}} \|I - AM\|_F, \qquad (6.3.18) $$
where M is allowed to have nonzero elements only in a subset G of indices (i, j), 1 ≤ i, j ≤ n,
and ∥ · ∥F is the Frobenius matrix norm. If mj is the jth column of M , then
$$ \|I - AM\|_F^2 = \sum_{j=1}^{n} \|e_j - Am_j\|_2^2. \qquad (6.3.19) $$
The optimization problem reduces to solving n independent least squares subproblems min ∥ej −
Amj ∥22 for mj subject to the sparsity constraints on mj . Rows with no nonzero elements can be
discarded. Thus, when M is constrained to be a sparse matrix the least squares subproblems are
of small dimension. A simple method for solving the subproblems is coordinate descent on the
function
$$ \tfrac{1}{2}\|r_j\|_2^2, \qquad r_j = e_j - Am_j. $$
Chow and Saad [247, 1998] reduce the cost of computing the SPAI by using a few steps of an
iterative method to reduce the residuals for each column.
For matrices with a general sparsity pattern it is difficult to prescribe a good nonzero pat-
tern for M . For A ∈ Rn×n a common choice is to let M have the same sparsity structure as
A. Therefore, adaptive strategies have been developed. These start with a simple initial guess
for the sparsity pattern (for example, a diagonal structure) and successively augment this until
some criterion is satisfied. The algorithm by Grote and Huckle [544, 1997] is one of the most
successful of these. A detailed discussion is given in Benzi [106, 2002].
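The following MATLAB fragment is a minimal sketch of a static-pattern SPAI for a square sparse matrix A; taking the pattern of M equal to the pattern of A is an assumption made here for illustration, and each column is obtained from a small dense least squares subproblem as described above.

    % Sketch: sparse approximate inverse with the sparsity pattern of A (square case).
    n = size(A,2);
    M = spalloc(n, n, nnz(A));
    for j = 1:n
        J = find(A(:,j));                 % allowed nonzero positions in column m_j
        I = find(any(A(:,J), 2));         % rows of A touched by the selected columns
        e = double(I == j);               % restriction of e_j to the rows in I
        M(J,j) = full(A(I,J)) \ e;        % small least squares subproblem for m_j
    end
    % M can then be used as a right preconditioner, e.g., solving A*M*y = b and x = M*y.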
For a least squares problem minx ∥Ax − b∥2 of full column rank, we could seek an SPAI
for the normal equations matrix C = ATA. Several algorithms for computing a SPAI for posi-
tive definite systems have been suggested. Two basic types can be distinguished, depending on
whether the preconditioner is expressed as a single matrix M ≈ C −1 or as the product of two or
more matrices. For use with CGLS a symmetric positive definite preconditioner M is required.
Symmetry can be achieved by using the symmetric part $\tfrac{1}{2}(M^T + M)$ of M. Regev and Saun-
ders [915, 2022] give a modified PCGLS method that detects indefiniteness or near singularity
of M and restarts PCGLS with a more positive definite preconditioner.
Another way of computing an approximate inverse is by a procedure related to the bicon-
jugation algorithm of Fox [429, 1964]. Given a symmetric positive definite matrix C and a
Läuchli [724, 1961] was the first to use CGLS with $A_1$ as a preconditioner for solving (6.3.24). He took $A_1$ to be square and nonsingular and solved the preconditioned problem $\min \|AA_1^{-1}y - b\|_2$, $y = A_1x$, or, equivalently,
$$ \min_y \left\| \begin{pmatrix} I \\ C \end{pmatrix} y - \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \right\|_2, \qquad C = A_2A_1^{-1} \in \mathbb{R}^{m_2\times n}. \qquad (6.3.25) $$
Matrix–vector products with C and $C^T$ can be computed as
$$ Cp = A_2(A_1^{-1}p), \qquad C^Tt = A_1^{-T}(A_2^Tt). \qquad (6.3.27) $$
Fast convergence is obtained when σ1 is small, e.g., when A1 is well-conditioned and ∥A2 ∥2 is
small. Because C has at most m2 = m − n distinct singular values, the iterations will terminate
in at most m2 steps, assuming exact arithmetic. Rapid convergence can be expected if m2 is
small.
Subset preconditioners can also be constructed from QR factorization. Let A1 ∈ Rm1 ×n ,
m1 ≥ n, be a subset of rows in A ∈ Rm×n such that A1 has full column rank. Assume that the
QR factorization
$$ Q_1^T\,(\,A_1P \ \ \ b_1\,) = \begin{pmatrix} R_1 & c_1 \\ 0 & c_2 \end{pmatrix} \qquad (6.3.29) $$
is known, where P is a sparsity-preserving column permutation. Then $R_1 \in \mathbb{R}^{n\times n}$ can be used as a preconditioner to solve $\min_x \|Ax - b\|_2$. The least squares problem is equivalent to
$$ \min_x \left\| \begin{pmatrix} R_1 \\ A_2P \end{pmatrix} x - \begin{pmatrix} c_1 \\ b_2 \end{pmatrix} \right\|_2. $$
Setting $x = R_1^{-1}y$ and suppressing the column permutation gives the preconditioned problem
$$ \min_y \left\| \begin{pmatrix} I_n \\ C \end{pmatrix} y - \begin{pmatrix} c_1 \\ b_2 \end{pmatrix} \right\|_2, \qquad C = A_2R_1^{-1} \in \mathbb{R}^{m_2\times n}. \qquad (6.3.30) $$
A related preconditioner can be obtained from an LU factorization
$$ \Pi A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} = LU = \begin{pmatrix} L_1 \\ L_2 \end{pmatrix} U, \qquad (6.3.31) $$
where $U \in \mathbb{R}^{n\times n}$ is upper triangular, and $L \in \mathbb{R}^{m\times n}$ is unit lower trapezoidal with bounded off-diagonal elements. (If A is sparse, the row permutation Π and also a column permutation preserve the sparsity of L and U while bounding the elements of L.) With $A_1$ as right preconditioner, we have
$$ C = A_2A_1^{-1} = L_2L_1^{-1}. \qquad (6.3.32) $$
A matrix–vector multiply can be performed as $Cv = L_2(L_1^{-1}v)$.
If the pivoting strategy maintains |Lij | ≤ τ for some τ ∈ [1, 4], say, any ill-conditioning in
A will usually be reflected in U . Hence U can be used as a right preconditioner. This approach,
suggested by Björck [127, 1976], has the advantage that the lower triangular factor L need not
be stored, and the additional work per iteration depends only on the density of U . Saunders [964,
1979] used a rowwise elimination with a preliminary pass through the rows to select a triangular
subset with maximal diagonal elements. Subsequent use of the operator AU −1 involves only
back-substitutions with U and multiplications with A. When A has many more rows than col-
umns it may be preferable to factorize only A1 = L1 U1 and operate with C = A2 (L1 U1 )−1
in CGLS.
For sparse problems, a standard pivoting strategy in LU factorization is to choose a pivot aij
that minimizes the product of the number of nonzeros in its row and column. The product is
called the Markowitz merit function and bounds the number of fill-ins that can occur during an
elimination. For the purpose of stability, $a_{ij}$ is required to satisfy a threshold condition of the form $|a_{ij}| \geq u \max_k |a_{kj}|$, where u is a threshold parameter in the range 0 < u ≤ 1. Taking u in the range 0.1 ≤ u ≤ 0.9
(not too small) normally keeps L well-conditioned while promoting some degree of sparsity. This
is threshold partial pivoting. Threshold rook pivoting and threshold complete pivoting are also
implemented in LUSOL (Gill et al. [475, 2005]) to balance stability and sparsity more carefully
for demanding cases.
In a related approach, Howell and Baboulin [647, 2016] use LU factorization with partial
pivoting and apply CGLS to
$$ \min_y \|Ly - b\|_2, \qquad Ux = y. $$
The problem is often sufficiently well-conditioned to give rapid convergence. In their MIQR
algorithm, Li and Saad [741, 2006] further precondition L using incomplete QR factors.
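A small MATLAB sketch of this approach is given below; the test matrix is an assumption, and lsqr is used here in place of CGLS as the iterative least squares solver.

    % Sketch: LU with partial pivoting, least squares solve with L, back-substitution with U.
    m = 400; n = 60;
    A = randn(m,n)*diag(logspace(0,-6,n));  b = randn(m,1);   % assumed ill-conditioned test matrix
    [L, U, p] = lu(A, 'vector');        % A(p,:) = L*U, L unit lower trapezoidal, |L_ij| <= 1
    y = lsqr(L, b(p), 1e-12, 100);      % L is usually well conditioned
    x = U \ y;                          % recover x from U*x = y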
Problem (6.3.25) can be written in augmented form as
r1 + y = b1 , r2 + Cy = b2 , r1 + C T r2 = 0. (6.3.33)
Eliminating r1 from the first and third equations gives y = b1 + C T r2 , and then using the second
equation yields the symmetric positive definite system
$$ (I_{m_2} + CC^T)\,r_2 = b_2 - Cb_1 \qquad (6.3.34) $$
of size (m − n) × (m − n). This can be interpreted as the normal equations for the least squares problem
$$ \min_{r_2} \left\| \begin{pmatrix} -C^T \\ I_{m_2} \end{pmatrix} r_2 - \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \right\|_2. \qquad (6.3.35) $$
When m2 is sufficiently small, problem (6.3.35) can be solved by QR factorization. Applying
CGLS to (6.3.35) yields the following algorithm; see Björck and Yuan [153, 1999].
$$
\begin{aligned}
q_k &= -C^Tp_k, \\
\alpha_k &= \gamma_k/(\|p_k\|_2^2 + \|q_k\|_2^2), \\
r_{k+1} &= r_k - \alpha_kq_k, \\
t_{k+1} &= t_k - \alpha_kp_k, \\
s_{k+1} &= t_{k+1} - Cr_{k+1}, \\
\gamma_{k+1} &= \|s_{k+1}\|_2^2, \\
\beta_k &= \gamma_{k+1}/\gamma_k, \\
p_{k+1} &= s_{k+1} + \beta_kp_k.
\end{aligned}
$$
This requires about the same storage and work per step as Algorithm 6.3.3. However, as
shown by Yuan [1142, 1993], the last formulation is advantageous for generalized least squares
problems with a covariance matrix V ≠ I. The generalized normal equations for problem (6.3.25) are
$$ (\,I_n \ \ C^T\,)\,V^{-1}\left( \begin{pmatrix} I_n \\ C \end{pmatrix} y - b \right) = 0, \qquad A_1x = y. \qquad (6.3.36) $$
On the other hand, the generalized problem for (6.3.35) only involves V:
$$ (\,-C \ \ I_{m_2}\,)\,V\begin{pmatrix} -C^T \\ I_{m_2} \end{pmatrix} r_2 = b_2 - Cb_1, \qquad y = b_1 - (\,I_n \ \ 0\,)\,V\begin{pmatrix} -C^T \\ I_{m_2} \end{pmatrix} r_2. \qquad (6.3.37) $$
To achieve low coherence, a random-mixing preprocessing phase is performed before the random
sampling. The rows of A are first multiplied by a diagonal matrix D with random elements +1
or −1. Next, a fast transform is applied to each column of DA and Db. This can be a Walsh–
Hadamard transform (WHT), a discrete cosine transform (DCT), or a discrete Hartley transform
(DHT); see Hartley [592, 1942] and Bracewell [174, 1984]. For example, the DCT can be
achieved by the following MATLAB script:
    D = spdiags(sign(randn(m,1)),0,m,m);
    B = dct(D*A);  B(1,:) = B(1,:)/sqrt(2);
With high probability the coherence of the resulting row-mixed matrix B is small. After the row-
mixing step, a random sample B1 of s > γn rows of B is taken, where γ > 1 is an oversampling
factor. The QR factorization B1 = Q1 R1 then gives a preconditioner R1 for LSQR. With a
suitable sample size s, R(Q_1) is a good approximation of R(A), and LSQR converges rapidly.
With one preprocessing phase, γ = 4 was found to be near-optimal for a large range of problems.
Since DHT needs less memory than WHT and works better than DCT, this is the preferred choice.
A solver for underdetermined systems is included. Blendenpik often beats LAPACK’s DGELS
on dense highly overdetermined problems.
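Continuing the row-mixing script above, the sampling and QR steps might be sketched as follows (the oversampling factor γ = 4 is the value quoted in the text; the conditioning check is only an illustration of why LSQR then converges quickly).

    s   = 4*n;                      % sample size with oversampling factor gamma = 4 (assumes m >= 4*n)
    idx = randperm(m, s);           % uniform random sample of s rows of the mixed matrix B
    [~, R1] = qr(B(idx,:), 0);      % economy-size QR of the sampled rows
    cond(A/R1)                      % A*inv(R1) is well conditioned with high probability,
                                    % so R1 is a good right preconditioner for LSQR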
The iterative solver LSRN by Meng, Saunders, and Mahoney [789, 2014] is based on random
normal projection. LSRN works for both highly over- and underdetermined systems and can
handle rank-deficient systems. For an overdetermined least squares problem $\min_x \|Ax - b\|_2$ with $A \in \mathbb{R}^{m\times n}$ and rank(A) = r ≤ n < m, LSRN performs the following steps:
1. Choose an oversampling factor γ > 1 and set s = γn.
2. Compute $\tilde A = GA$, where $G \in \mathbb{R}^{s\times m}$ is a random matrix whose elements are independent random variables following the standard normal distribution.
3. Compute the compact SVD $\tilde A = \tilde U\tilde\Sigma\tilde V^T$, where $\tilde\Sigma \in \mathbb{R}^{r\times r}$ ($\tilde U$ is not needed).
4. Set $N = \tilde V\tilde\Sigma^{-1}$ and compute the least-norm solution of $\min_y \|ANy - b\|_2$ using a Krylov-type method such as LSQR. Return x = Ny.
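The following dense MATLAB sketch of steps 1–4 is for illustration only; the small random data are assumptions, and at realistic scales A*N would not be formed explicitly but applied as an operator.

    % Minimal dense sketch of LSRN (assumed small test problem).
    m = 2000; n = 50;
    A = randn(m,n)*diag(logspace(0,-8,n));  b = randn(m,1);
    gamma = 2;  s = ceil(gamma*n);           % step 1
    G  = randn(s, m);                        % step 2: random normal projection
    At = G*A;                                % \tilde{A} = G*A
    [~, S, V] = svd(At, 'econ');             % step 3: compact SVD (\tilde{U} not needed)
    r  = sum(diag(S) > max(s,n)*eps(S(1)));  % numerical rank of the sketch
    N  = V(:,1:r)/S(1:r,1:r);                % step 4: N = \tilde{V}*\tilde{Sigma}^{-1}
    y  = lsqr(A*N, b, 1e-12, 100);           % A*N is well conditioned with high probability
    x  = N*y;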
A similarly structured algorithm works for strongly underdetermined systems. Note that A
is used by LSRN only for matrix-vector and matrix-matrix operations. Hence LSRN is effi-
cient if A is sparse or a fast linear operator. LSRN can easily be extended to handle Tikhonov
regularization.
A reasonable choice for γ in step 1 is 2.0. The random normal projection in step 2 takes
O(mn2 ) time. This is more than the fast transforms used by some of the other methods. How-
ever, the random normal projection scales well in parallel environments. An important property
of LSRN is that the singular values of AN are the same as those of the random matrix (GU)† of size s × n and are independent of the spectrum of A; see Theorem 4.2 in Meng, Saunders, and Mahoney
[789, 2014]. The spectrum of such a random matrix is a well-studied problem in random matrix
theory, and it is possible to give strong probability bounds on the condition number of AN . To
reach precision 10−14 , the maximum number of iterations needed by LSQR is ≈ 66/ log(s/r).
Thus the running time for LSRN is fully predictable.
The LSRN package can be downloaded from http://www.stanford.edu/group/SOL/
software/lsrn.html. On dense overdetermined problems with n = 103 , LSRN is compared
with solvers DGELSD and DGELSY from LAPACK and Blendenpik. For full-rank problems,
Blendenpik is the overall winner and LSRN the runner-up. Blendenpik is not designed for rank-
deficient problems, while LSRN can take advantage of rank-deficiency. For underdetermined
problems, the LAPACK solvers run much slower, while LSRN works equally well. On sparse
problems, LSRN is also compared to SPQR from SuiteSparseQR. On generated sparse test prob-
lems, SPQR works well for m < 105 . For larger problems, LSRN is the fastest solver. The
advantage of LSRN becomes greater for underdetermined systems.
Two-level (subspace splitting) preconditioners are based on a splitting of the solution,
$$ x = Vx_1 + Wx_2, \qquad V \in \mathbb{R}^{n\times k}, \qquad W \in \mathbb{R}^{n\times(n-k)}, $$
studied in Section 4.3.1. Such methods for solving Tikhonov regularized problems were proposed by Hanke and Vogel [572, 1999]. Usually, k ≪ n, and a direct method is used to compute $x_1$. For $x_2$ a Krylov subspace method, such as LSQR, is used that acts in the space complementary to $\mathcal{R}(V)$. The rate of convergence is determined by the singular values of W, and the reduced condition number is
$$ \kappa = \max_{x\in\mathcal{R}(W),\,\|x\|_2=1} \|Ax\|_2 \;\Big/ \min_{x\in\mathcal{R}(W),\,\|x\|_2=1} \|Ax\|_2. $$
First the QR factorization
$$ AV = (\,Q_1 \ \ Q_2\,)\begin{pmatrix} R \\ 0 \end{pmatrix} $$
is computed, where $R \in \mathbb{R}^{k\times k}$ is upper triangular. The matrices $Q_1$ and $Q_2$ are not explicitly formed but (as usual) are represented by the k corresponding Householder vectors. The problem then becomes
$$ \min_{x_1,x_2} \left\| \begin{pmatrix} R & Q_1^TAW \\ 0 & Q_2^TAW \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \right\|_2. \qquad (6.3.39) $$
The subproblem for $x_2$,
$$ \min_{x_2} \|Q_2^TAWx_2 - c_2\|_2, \qquad (6.3.40) $$
is solved by an iterative method such as LSQR.
When $L \in \mathbb{R}^{p\times n}$, p < n, has a nontrivial nullspace spanned by the matrix $W_2$, this can be transformed into standard form as follows; see Section 3.6.5. Let $\bar A = AL_A^\dagger$, where
$$ L_A^\dagger = \big(I - (A(I - L^\dagger L))^\dagger A\big)L^\dagger $$
is the A-weighted pseudoinverse (3.6.46) of L. The solution can be split into two parts as
$$ x = L_A^\dagger y + z, $$
where $z \in \mathcal{N}(L)$ is the unregularized part of the solution. An iterative method is used to solve
for y, where L†A acts as a right preconditioner. The matrix L†A is not formed explicitly but kept
in the factored form (6.3.42). The implementation of this two-level method is discussed in more
detail by Hansen [576, 1998] and Hansen and Jensen [582, 2006].
When L = I the optimal choice of columns for V consists of the right singular vectors
corresponding to the k largest singular values of A. This will minimize the condition number of
the reduced subproblem (6.3.40). The choice is usually not practical; instead, singular vectors
from a related simpler problem of reduced size can be used to form V . Another possibility is to
perform products by V and V^T with fast transforms. Two examples are the cosine transform DCT-2 and the wavelet transform. Extensive numerical experiments with two-level preconditioners are
given by Jacobsen [660, 2000].
Bunse-Gerstner, Guerra-Ones, and de La Vega [189, 2006] give a modification of the two-
level LSQR algorithm that makes it considerably less expensive when the solution, as is often
the case, is needed for a large number of regularization parameters.
Least squares problems with Toeplitz matrices T of very large dimensions arise, e.g., in signal restoration, seismic explorations, and image processing.
A matrix-vector product T x is essentially a discrete convolution operation. As will be shown
in the following, by embedding the Toeplitz matrix into a circulant matrix, a matrix-vector
product T x can be computed via the fast Fourier transform in O(n log n) operations. Provided a
good preconditioner can be found, iterative methods such as CGLS or LSQR can be competitive
with the fast direct methods given in Section 4.5.5.
The eigenvalues of $P_n$ are the nth roots of unity
$$ \omega_j = e^{-2\pi i j/n}, \qquad j = 0:n-1, $$
and the eigenvectors are the columns of the discrete Fourier matrix,
$$ F = (f_{jk}), \qquad f_{jk} = \frac{1}{\sqrt{n}}\,e^{2\pi i jk/n}, \qquad 0 \leq j, k \leq n-1, \qquad (6.3.47) $$
where $i = \sqrt{-1}$. The circulant matrix $C_n$ can be written as a polynomial $C_n = \sum_{k=0}^{n-1} c_kP_n^k$ in $P_n$. Hence it has the same eigenvectors as $P_n$, and its eigenvalues are given by
$$ \lambda_j = \sum_{k=0}^{n-1} c_k\,\omega_j^k, \qquad j = 0:n-1. \qquad (6.3.48) $$
It follows that operations with a circulant matrix Cn can be performed in O(n log n) operations
using the FFT.
We now show how any Toeplitz matrix T can be expanded into a circulant matrix. For illustration, set m = n = 3, and define
$$
C_T = \begin{pmatrix} T & V \\ V & T \end{pmatrix} =
\begin{pmatrix}
t_0 & t_1 & t_2 & 0 & t_{-2} & t_{-1} \\
t_{-1} & t_0 & t_1 & t_2 & 0 & t_{-2} \\
t_{-2} & t_{-1} & t_0 & t_1 & t_2 & 0 \\
0 & t_{-2} & t_{-1} & t_0 & t_1 & t_2 \\
t_2 & 0 & t_{-2} & t_{-1} & t_0 & t_1 \\
t_1 & t_2 & 0 & t_{-2} & t_{-1} & t_0
\end{pmatrix} \in \mathbb{R}^{6\times 6}. \qquad (6.3.50)
$$
A similar construction works for rectangular Toeplitz matrices. For the Toeplitz matrix (6.3.44),
the circulant
can be used. To form $y = Tx \in \mathbb{R}^{m+1}$ for arbitrary $x \in \mathbb{R}^n$, x is padded with zeros to length n + m, and we calculate
$$ z = C_T\begin{pmatrix} x \\ 0 \end{pmatrix} = F\Lambda F^H\begin{pmatrix} x \\ 0 \end{pmatrix}, \qquad y = (\,I_m \ \ 0\,)\,z. \qquad (6.3.51) $$
This can be done with two FFTs and one multiplication with a diagonal matrix. The cost is
O(n log2 n) multiplications. A similar scheme enables fast computation of T H y.
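As an illustration of the circulant embedding, the following MATLAB lines compute y = Tx for a square real Toeplitz matrix defined by its first column and first row; col, row, and x are assumed given, and only FFTs of length 2n are used.

    % Sketch: y = T*x via circulant embedding and the FFT (square real Toeplitz T).
    col = col(:);  row = row(:);  x = x(:);  n = numel(col);   % col(1) = row(1) = t_0
    c   = [col; 0; flipud(row(2:end))];      % first column of the embedding circulant C_T
    lam = fft(c);                            % eigenvalues of C_T (FFT of its first column)
    z   = ifft(lam .* fft([x; zeros(n,1)])); % z = C_T*[x; 0] with two FFTs
    y   = real(z(1:n));                      % leading block of z gives T*x
    % Check against the dense product: norm(y - toeplitz(col,row)*x)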
Strang [1042, 1986] obtained a circulant matrix as a preconditioner for symmetric positive
definite Toeplitz systems by copying the central diagonals of T and “bringing them around.”
He showed that the eigenvalues of T C −1 cluster around 1, except for the largest and smallest
eigenvalues. T. Chan [225, 1988] gave an improved circulant preconditioner that is optimal in
the sense of minimizing ∥C − T ∥F .
Theorem 6.3.1. Let T ∈ Rn×n be a square (not necessarily symmetric positive definite) Toeplitz
matrix. Then the circulant matrix $C = (c_0, c_1, \ldots, c_{n-1})$ with
$$ c_i = \frac{i\,t_{-(n-i)} + (n-i)\,t_i}{n}, \qquad i = 0:n-1, \qquad (6.3.52) $$
minimizes ∥C − T ∥F .
The best approximation C has a simple structure. It is obtained by averaging the corresponding diagonal of T extended to length n by wraparound. For a Toeplitz matrix of order n = 4 we obtain
$$
T = \begin{pmatrix}
t_0 & t_1 & t_2 & t_3 \\
t_{-1} & t_0 & t_1 & t_2 \\
t_{-2} & t_{-1} & t_0 & t_1 \\
t_{-3} & t_{-2} & t_{-1} & t_0
\end{pmatrix}, \qquad
C = \begin{pmatrix}
t_0 & c_1 & c_2 & c_3 \\
c_3 & t_0 & c_1 & c_2 \\
c_2 & c_3 & t_0 & c_1 \\
c_1 & c_2 & c_3 & t_0
\end{pmatrix},
$$
where
$$ c_1 = (t_{-3} + 3t_1)/4, \qquad c_2 = (t_{-2} + t_2)/2, \qquad c_3 = (3t_{-1} + t_3)/4. $$
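The optimal circulant (6.3.52) is easy to form. The following MATLAB fragment, with an assumed symbol t_k (here a simple decaying sequence), builds T and C and evaluates ‖C − T‖_F as a sketch.

    % Sketch: T. Chan's optimal circulant approximation of a square Toeplitz matrix.
    n = 64;
    tval = @(k) 1./(1 + abs(k));                        % assumed values t_k, k = -(n-1):(n-1)
    i = (0:n-1)';
    c = (i.*tval(i - n) + (n - i).*tval(i))/n;          % c_i from (6.3.52); c_0 = t_0
    T = toeplitz(tval(-(0:n-1)), tval(0:n-1));          % T_{jk} = t_{k-j}
    C = toeplitz([c(1); flipud(c(2:end))], c);          % circulant with first row (c_0,...,c_{n-1})
    norm(C - T, 'fro')                                  % minimal over all circulants (Theorem 6.3.1)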
The convergence rate of CGLS applied to a preconditioned Toeplitz system T C −1 y = b de-
pends on the distribution of the singular values of T C −1 . R. Chan, Nagy, and Plemmons [219,
1994] show that if the generating functions of the blocks Tj are 2π-periodic continuous func-
tions, and if one of these functions has no zeros, then the singular values of the preconditioned
matrix T C −1 are clustered around 1, and PCGLS converges very quickly. The class of 2π-
periodic continuous functions contains a class of functions that arises in many signal processing
applications.
Similar ideas can be applied to problems where the least squares matrix T has a general
Toeplitz block or block Toeplitz structure; see Nagy [817, 1991] and R. Chan, Nagy, and Plem-
mons [218, 1993]. Hence the method can be applied also to multidimensional problems. Con-
sider a least squares problem
$$ \min_x \|Tx - b\|_2, \qquad T = \begin{pmatrix} T_1 \\ \vdots \\ T_q \end{pmatrix} \in \mathbb{R}^{m\times n}, \qquad (6.3.53) $$
where each block Tj , j = 1, . . . , q, is a square Toeplitz matrix. (Note that if T itself is a rec-
tangular Toeplitz matrix, then each block Tj is necessarily Toeplitz.) In the first step, a circulant
approximation Cj is constructed for each block Tj . Each circulant matrix Cj , j = 1, . . . , q, is
then diagonalized by the Fourier matrix F : Cj = F Λj F H . The eigenvalues Λj can be found
from the first column of Cj ; cf. (6.3.48). Hence, the spectrum of Cj , j = 1, . . . , q, can be
computed in O(m log n) operations using the FFT.
The preconditioner for T is then defined as a square circulant matrix C such that
$$ C^TC = \sum_{j=1}^q C_j^TC_j = F^H\Big(\sum_{j=1}^q \Lambda_j^H\Lambda_j\Big)F. $$
Thus, C T C is also circulant, and its spectrum can be computed in O(m log n) operations. Now
C is taken to be the Hermitian positive definite matrix
$$ C \equiv F^H\Big(\sum_{j=1}^q \Lambda_j^H\Lambda_j\Big)^{1/2}F. \qquad (6.3.54) $$
Then CGLS with right preconditioner C is applied to solve minx ∥T x − b∥2 . Note that to use C
as a preconditioner we need only know its eigenvalues, because the factorization (6.3.54) can be
used to solve linear systems involving C and C T . The generalization to block Toeplitz matrices
is straightforward.
Consider the stationary iteration
$$ x_{k+1} = x_k + \omega A^T(b - Ax_k), $$
where ω is chosen so that ω ≈ 1/σ₁(A)². In this context the method is known as Landweber's method [718, 1951]. From the standard theory of stationary iterative methods it follows that the error in $x_k$ satisfies
$$ x_k - x^\dagger = (I - \omega A^TA)^k(x_0 - x^\dagger). \qquad (6.4.2) $$
Taking $x_0 = 0$ and expanding the error in terms of the SVD (singular value decomposition) $A = \sum_{i=1}^n \sigma_iu_iv_i^T$ shows that (6.4.2) can be written as
$$ x_k = \sum_{i=1}^n \varphi_k(\sigma_i^2)\,\frac{u_i^Tb}{\sigma_i}\,v_i, \qquad \varphi_k(\sigma^2) = 1 - (1 - \omega\sigma^2)^k. \qquad (6.4.3) $$
It follows that the effect of terminating the iteration with xk is to damp the component of the so-
lution along vi by the factor φk (σi2 ), where φk (σ 2 ) is the filter factor for Landweber’s method;
see Section 3.5.3. After k iterations, only the components of the solution corresponding to
$\sigma_i \geq 1/k^{1/2}$ have converged. If the noise level in b is known, the discrepancy principle can
be used as a stopping criterion; see Section 3.6.4.
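For illustration, the following MATLAB lines run Landweber's method on an assumed small test problem with a known SVD and confirm the filter-factor representation (6.4.3).

    % Sketch: Landweber iteration and its filter factors (assumed test data).
    m = 100; n = 50;
    [U,~] = qr(randn(m,n),0);  [V,~] = qr(randn(n),0);
    s = logspace(0,-6,n)';     A = U*diag(s)*V';        % test matrix with known SVD
    b = A*(V*ones(n,1)) + 1e-6*randn(m,1);              % noisy right-hand side
    omega = 1/s(1)^2;  k = 500;  x = zeros(n,1);
    for j = 1:k
        x = x + omega*(A'*(b - A*x));                   % Landweber step
    end
    phi = 1 - (1 - omega*s.^2).^k;                      % filter factors phi_k(sigma_i^2)
    disp(norm(x - V*(phi.*(U'*b)./s)))                  % agrees with (6.4.3) up to roundoff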
When an iterative method is applied to an ill-posed problem, the error in xk will initially
decrease, but eventually the unwanted irregular part of the solution will grow and cause the
process to diverge. Such behavior is called semiconvergence. The iterations should be stopped
before divergence starts. Terminating the Landweber method after k iterations gives roughly
the same result as using truncated SVD (see Section 3.6.2) where components corresponding to
σi ≤ µ ∼ k −1/2 are dropped. The square root means that usually many iterations are required.
For this reason, Landweber’s method cannot in general be recommended; see Hanke [566, 1991].
If A ∈ Rm×n is rank-deficient, xk in Landweber’s method can be split into orthogonal
components:
xk = yk + zk , yk ∈ R(AT ), zk ∈ N (A).
The orthogonal projection of xk − x0 onto N (A) can then be shown to be zero. Hence, in exact
arithmetic, the iterates converge to the unique least squares solution that minimizes ∥x − x0 ∥2 .
Strand [1041, 1974] analyzed the more general iteration
xk+1 = xk + p(ATA)AT (b − Axk ), (6.4.4)
where p(λ) is a polynomial of order d in λ. A special case is the iteration suggested by Riley [929,
1956]: $x_{k+1} = x_k + \Delta x_k$, where
$$ \min_{\Delta x_k} \left\| \begin{pmatrix} A \\ \mu I \end{pmatrix}\Delta x_k - \begin{pmatrix} r_k \\ 0 \end{pmatrix} \right\|_2, \qquad r_k = b - Ax_k. $$
This corresponds to taking p(λ) = (λ + µ2 )−1 . Riley’s method is sometimes called the iterated
Tikhonov method.
Iteration (6.4.4) can be performed more efficiently as follows. If
$$ 1 - \lambda p(\lambda) = \prod_{j=1}^d (1 - \gamma_j\lambda) $$
is the factorized polynomial, then one iteration step can be performed in d minor steps of a
nonstationary Landweber method:
xj+1 = xj + γj AT (b − Axj ), j = 0, 1, . . . , d − 1. (6.4.5)
Assume that $\sigma_1 = \beta^{1/2}$ and that the aim is to compute an approximation to the truncated singular value solution with a cut-off for singular values $\sigma_i \leq \sigma_c = \alpha^{1/2}$. Then, as shown by Rutishauser [948, 1959], in a certain sense the optimal parameters to use in (6.4.5) are $\gamma_j = 1/\xi_j$, where $\xi_j$ are the zeros of the Chebyshev polynomial of degree d on the interval [α, β]:
$$ \xi_j = \tfrac{1}{2}(\alpha + \beta) + \tfrac{1}{2}(\alpha - \beta)x_j, \qquad x_j = \cos\Big(\frac{\pi}{2}\,\frac{2j+1}{d}\Big), \qquad (6.4.6) $$
j = 0, 1, . . . , d − 1. This choice leads to a filter function R(t) of degree d with R(0) = 0, and
of least maximum deviation from 1 on [α, β]. Note that α must be chosen in advance, but the
regularization can be varied by using a decreasing sequence α = α1 > α2 > · · · .
From standard results for Chebyshev polynomials it can be shown that if α ≪ β, then k steps
of iteration (6.4.5)–(6.4.6) reduce the regular part of the solution by the factor
1/2
δk ≈ 2e−2k(α/β) . (6.4.7)
Thus, the cut-off σc for this method is related to j in (6.4.5) as j ≈ 1/σc . This is a great
improvement over the standard Landweber’s method, for which the number of steps needed is
k ≈ (1/σc )2 .
Iteration (6.4.5) with parameters (6.4.6) suffers severely from roundoff errors. This instability
can be overcome by a reordering of the parameters ξj ; see Anderson and Golub [24, 1972].
Alternatively, (6.4.5)–(6.4.6) can be written as a three-term recursion, as in the CSI method of
Section 6.1.6.
Since y is not of interest, the regularized version RCGME of CGME given below is to be pre-
ferred. This needs only a small amount of extra arithmetic work and no extra storage.
where $B_k$ is lower bidiagonal. Orthogonal matrices $\tilde Q_k$ can be constructed from 2k − 1 plane rotations so that
$$ \tilde Q_k\begin{pmatrix} B_k & \beta_1e_1 \\ \mu I & 0 \end{pmatrix} = \begin{pmatrix} \tilde R_k & f_k \\ 0 & \phi_{k+1} \end{pmatrix}, $$
where $\tilde R_k \in \mathbb{R}^{k\times k}$ is upper bidiagonal. The basis matrices $U_{k+1}$ and $V_k$ are modified accordingly. For LSME the regularized bidiagonal subproblem is
$$ \min \left\| \begin{pmatrix} y_k \\ t_k \end{pmatrix} \right\|_2^2 \quad \text{subject to} \quad (\,L_k \ \ \mu I\,)\begin{pmatrix} y_k \\ t_k \end{pmatrix} = \beta_1e_1, $$
where Lk ∈ Rk×k and L̃k are lower bidiagonal. Numerical tests indicate that regularized LSQR
is more reliable and efficient than regularized LSME.
where quantities like ∥x∥2N = xT N x are elliptic norms. SQD systems arise in sequential qua-
dratic programming and interior-point methods for convex optimization. Another source is in
stabilized mixed finite element methods.
The SQD matrix in (6.4.12) is indefinite and has m positive and n negative eigenvalues.
It is nonsingular, and its inverse is also SQD. The following properties of SQD matrices are
established by Vanderbei [1085, 1995] and George, Ikramov, and Kucherov [456, 2000]:
1. An SQD matrix K is strongly factorizable, i.e., for every permutation matrix P there exists
a diagonal matrix D and a unit lower triangular matrix L such that
P KP T = LDLT , (6.4.15)
has a positive definite symmetric part $\tfrac{1}{2}(\tilde K + \tilde K^T)$. Gill, Saunders, and Shinnerl [477,
1996] analyze the stability of an indefinite Cholesky-type factorization using the results of
Golub and Van Loan [486, 1979] on LU factorization of positive definite matrices.
One approach to solving the SQD system (6.4.12) is to use Krylov methods for indefinite
systems such as S YMMLQ and MINRES. However, as shown by Fischer et al. [409, 1998], these
make progress only in every second iteration and do not exploit the structure of SQD systems.
Eliminating either $y \in \mathbb{R}^m$ or $x \in \mathbb{R}^n$ in (6.4.12) gives the Schur complement equations
$$ (A^TM^{-1}A + N)\,x = A^TM^{-1}b - c, \qquad (6.4.16) $$
$$ (AN^{-1}A^T + M)\,y = AN^{-1}c + b. \qquad (6.4.17) $$
Both of these systems are symmetric positive definite and hence can be solved by CG. After x or y is computed, the remaining part of the solution can be recovered from
$$ y = M^{-1}(b - Ax) \qquad \text{or} \qquad x = N^{-1}(A^Ty - c), $$
respectively.
The algorithm ECGLS below solves the Schur complement equation (6.4.16). The iterates
are mathematically the same as those generated by the standard CG method applied to (6.4.16).
Better numerical stability is obtained by not forming $A^TM^{-1}A$ explicitly but instead working with separate products with A and $A^T$ and solves with M.
From the convergence analysis of CG it follows that the convergence of ECGLS is governed
by the distribution of the nonzero eigenvalues of (AT M −1 A + N ). If A has full column rank,
ECGLS works also for N = 0. For M = I, N = µ2 I, µ ̸= 0, and c = 0, ECGLS is equal to
RCGLS. Hence, ECGLS can be viewed as an extended version of CGLS.
The Schur complement equations (6.4.17) have a similar structure to (6.4.16) and can be
obtained from (6.4.16) by making the substitutions
$$ A \rightleftarrows A^T, \qquad M \rightleftarrows N, \qquad x \rightleftarrows y, \qquad b \to c, \qquad c \to -b. $$
Let $M = LL^T$ and $N = R^TR$ be Cholesky factorizations. With new variables $\tilde y = L^Ty$ and $\tilde x = Rx$, the transformed system becomes
$$ \begin{pmatrix} I_m & \tilde A \\ \tilde A^T & -I_n \end{pmatrix}\begin{pmatrix} \tilde y \\ \tilde x \end{pmatrix} = \begin{pmatrix} \tilde b \\ \tilde c \end{pmatrix}, \qquad (6.4.20) $$
where
$$ \tilde A = L^{-1}AR^{-1}, \qquad \tilde b = L^{-1}b, \qquad \tilde c = R^{-T}c. \qquad (6.4.21) $$
For c = 0, problem (6.4.20) can be solved by Algorithm RCGLS or RCGME of Section 6.4.2.
Each iteration requires triangular solves with L, LT , RT , and R. The rate of convergence de-
pends on the eigenvalues $\lambda_i = 1 + \sigma_i^2$, where $\sigma_i$ are the singular values of $L^{-1}AR^{-1}$. Arioli [31, 2013] calls these elliptic singular values. They are the critical points of the functional
$$ \min_{x,y}\; y^TAx \quad \text{subject to} \quad \|x\|_N = \|y\|_M = 1. \qquad (6.4.22) $$
Note that since $\|\tilde x\|_2^2 = \|x\|_N^2$ and $\|\tilde y\|_2^2 = \|y\|_M^2$, the convergence rates for x and y are measured in the corresponding elliptic error norms.
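A small MATLAB sketch of this transformation is given below; the random SPD test data are assumptions, and the direct solve at the end merely stands in for the iterative methods discussed here.

    % Sketch of the transformation (6.4.20)-(6.4.21) of an SQD system (assumed data).
    m = 30; n = 20;  A = randn(m,n);  b = randn(m,1);  c = randn(n,1);
    M = randn(m); M = M*M' + m*eye(m);                 % symmetric positive definite
    N = randn(n); N = N*N' + n*eye(n);
    L = chol(M,'lower');  R = chol(N);                 % M = L*L', N = R'*R
    At = L\A/R;  bt = L\b;  ct = R'\c;                 % (6.4.21)
    K  = [eye(m) At; At' -eye(n)];                     % transformed SQD matrix in (6.4.20)
    zt = K\[bt; ct];
    y  = L'\zt(1:m);  x = R\zt(m+1:end);               % recover the original variables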
Arioli [31, 2013] develops elliptic versions of upper and lower GKL bidiagonalization that
generalize results by Benbow [104, 1999]. These generate left and right basis vectors ui and vi
that are orthonormal with respect to the inner products defined by M and N , respectively. Each
step of the bidiagonalizations requires solves with both M and N . Based on these bidiagonaliza-
tion processes, Arioli and Orban [844, 2017] derive versions of LSQR and related algorithms for
solving SQD systems with either c = 0 or b = 0. When both b ̸= 0 and c ̸= 0, it is necessary to
either shift the right-hand side to obtain one of these special cases or solve two special systems
and add the solutions.
truncated SVD (TSVD); see Example 4.2.8. Because the approximations in LSQR are tailored
to the specific right-hand side b, the minimum error is often achieved with a subspace of much
lower dimension compared to TSVD. For LSQR the iterations also diverge more quickly after
the optimal accuracy has been reached. For very ill-conditioned problems, partial reorthogonal-
ization of the u and v vectors in LSQR (or LSMR) may help to preserve stability. This is costly
in terms of storage and operations but may be acceptable when the number of iterations is small;
see Section 6.2.6.
For iterative regularization methods, it is essential to terminate the iterations before diver-
gence starts. Using a hybrid method that combines the iterative method with an inner regular-
ization can be an effective solution; see Hanke [568, 2001]. For example, the iterative method
can be applied to the Tikhonov regularized problem
$$ \min_x \left\| \begin{pmatrix} A \\ \lambda L \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix} \right\|_2. \qquad (6.4.23) $$
Such hybrid methods require two regularization parameters: the number of iterations k and λ.
With an appropriate choice of L, difficulties related to semiconvergence may be overcome. The
iterations can be continued until all relevant information available from the data has been cap-
tured.
Taking L = I in (6.4.23) gives the standard Tikhonov regularization. Then it can be verified
that the iterate xk,λ obtained by LSQR is the same as that obtained by first performing k steps of
LSQR on the unregularized problem (λ = 0) and then solving the regularized subproblem
$$ \min_{y_{k,\lambda}} \left\| \begin{pmatrix} B_k \\ \lambda I_k \end{pmatrix} y_{k,\lambda} - \begin{pmatrix} \beta_1e_1 \\ 0 \end{pmatrix} \right\|_2 \qquad (6.4.24) $$
and setting xk,λ = Vk yk,λ . In other words, for L = I, iteration and regularization commute.
This observation allows λ to be varied without restarting LSQR. The subproblems (6.4.24) can
be solved in O(k) operations using plane rotations, and yk,λ can be determined for several values
of λ at each step. However, to get the corresponding xk,λ the vectors v1 , . . . , vk are needed. If
there is not enough space to store these vectors, they can be generated anew by running the
bidiagonalization again for each λ.
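Given the quantities from k steps of GKL bidiagonalization (Bk, beta1, and Vk below are assumed available from an LSQR run), the subproblems (6.4.24) can be solved for a whole set of λ values as in the following sketch; the text notes that plane rotations do this in O(k) operations per value, while the sketch simply uses backslash for brevity.

    % Sketch: solve the projected Tikhonov subproblem (6.4.24) for several lambda.
    lambdas = logspace(-6, 0, 13);
    Y = zeros(k, numel(lambdas));
    for i = 1:numel(lambdas)
        Y(:,i) = [Bk; lambdas(i)*eye(k)] \ [beta1; zeros(2*k,1)];  % small (2k+1)-by-k problem
    end
    X = Vk*Y;            % x_{k,lambda} = V_k*y_{k,lambda} for each lambda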
Several alternative regularization methods besides Tikhonov regularization have been pro-
posed for LSQR’s bidiagonal subproblem
$$ \min_{y_k} \|B_ky_k - \beta_1e_1\|_2, \qquad B_k \in \mathbb{R}^{(k+1)\times k}. \qquad (6.4.25) $$
O’Leary and Simmons [840, 1981] use a TSVD solution to (6.4.25). At each step of the bidi-
agonalization process, the SVD of ( Bk β1 e1 ) is computed. This can be done by standard
methods in O(k 2 ) operations. Computational details of this and similar schemes are considered
by Björck [131, 1988]. When no a priori information about the solution is available, generalized
cross-validation (GCV) can be used to determine the number of terms in the TSVD solution as
suggested by Björck, Grimme, and Van Dooren [145, 1994].
When L ̸= I, iteration and regularization no longer commute. Restarting LSQR when λ
is changed is usually too demanding computationally, and an initial transformation to standard
form is to be preferred. When L ∈ Rn×n is nonsingular, this is achieved by setting y = Lx and
Ā = AL−1 . Otherwise, if L ∈ Rp×n and p < n, L has a nontrivial nullspace. Then we take
Ā = AL†A , where
L†A = (I − (A(I − L† L))† A)L†
is the A-weighted (oblique) pseudoinverse; see Section 3.6.5. The standard form problem then
becomes
min ∥AL†A y − b∥22 + λ2 ∥y∥22 , x(λ) = L†A yλ + x2 , (6.4.26)
y
where x2 ∈ N (L) is the unregularized component of the solution. For several frequently used
regularization matrices, this transformation can be implemented so that the extra work is negligible. Assume that L has full row rank, and compute the two QR factorizations
$$ L^T = (\,W_1 \ \ W_2\,)\begin{pmatrix} R \\ 0 \end{pmatrix} = W_1R, \qquad AW_2 = Q_1U, \qquad (6.4.27) $$
where $W_2$ gives an orthogonal basis for $\mathcal{N}(L)$. If n − p ≪ n, the work in the QR factorization of $AW_2$ is negligible. Then
$$ \bar A = AL_A^\dagger = (I - Q_1Q_1^T)\,AW_1R^{-T} \qquad (6.4.28) $$
and
$$ x_2 = W_2(AW_2)^\dagger b = W_2U^{-1}Q_1^Tb. \qquad (6.4.29) $$
For several discrete smoothing norms, L can be partitioned as L = ( L1 L2 ), where L1 is
square and of full rank. Then the computationally simpler expression
$$ L_A^\dagger = (I - P)\begin{pmatrix} L_1^{-1} \\ 0 \end{pmatrix} \qquad (6.4.30) $$
can be used; see Hanke and Hansen [570, 1993].
In high-dimensional problems, e.g., when L is a sum of Kronecker products, the matrix AL†A
may become too complicated to work with. For this case, an alternative projection approach that
only uses products with L and $L^T$ has been suggested by Kilmer, Hansen, and Español [695, 2007]. Inspired by work of Zha [1145, 1996], this uses a joint bidiagonalization of $Q_A$ and $Q_L$ in the QR factorization
$$ \begin{pmatrix} A \\ L \end{pmatrix} = \begin{pmatrix} Q_A \\ Q_L \end{pmatrix} R. $$
inner regularization of the projected problem and may be used in a hybrid method. This simplifies
the use of parameter choice rules for λ, such as the GCV and L-curve criteria.
The choice of a regularization term with L ̸= I in (6.4.31) is known to be potentially much
more useful. If L ∈ Rp×n , the system has dimension (k + p) × k, where often p ≫ k. To
solve subproblem (6.4.32) one may first compute the compact QR factorization LVk = Qk Rk ,
Rk ∈ Rk×k , and use the identity ∥LVk yk ∥2 = ∥Rk yk ∥2 . The reduced subproblem is then solved
by a sequence of Givens transformations.
GMRES applied to a singular system can break down before a solution has been found. The
following property is proved by Brown and Walker [182, 1997].
Theorem 6.4.1. For all b and starting approximations x0 , GMRES determines a least squares
solution $x^*$ of a singular system Ax = b without breakdown if and only if $\mathcal{N}(A) = \mathcal{N}(A^T)$.
Furthermore, if the system is consistent and x0 ∈ R(AT ), then x∗ is a least-norm solution.
A variant of MINRES called MR-II and introduced by Hanke [567, 1995] has starting vector
Ab and generates approximations xk ∈ K(A, Ab). The multiplication with A acts as a smoothing
operator and dampens high-frequency noise in b. Range-restricted GMRES is a similar method
for the nonsymmetric case due to Calvetti, Lewis, and Reichel [200, 2000]. This is based on
the Arnoldi process and also generates approximations xk ∈ Kk (A, Ab). These methods some-
times provide better regularized solutions than GMRES and MINRES but can cause a loss of
information in the data b; see Calvetti, Lewis, and Reichel [201, 2001].
If rank(A) < n, it is no restriction to assume the structure
$$ \begin{pmatrix} A_{11} & A_{12} \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \qquad (6.4.33) $$
where $A_{11} \in \mathbb{R}^{r\times r}$ is nonsingular. Then the condition $\mathcal{N}(A) = \mathcal{N}(A^T)$ in Theorem 6.4.1 is equivalent to $A_{12} = 0$, and the system is consistent if and only if $b_2 = 0$. If these conditions
are satisfied, applying GMRES to (6.4.33) is equivalent to applying GMRES to the nonsingular
system A11 x1 = b1 . In practice it is usually the case that A12 and b2 are nonzero but small.
A common approach is to choose M as an approximation to A in which the small eigenvalues
are replaced by ones. Eldén and Simoncini [382, 2012] show that a similar effect is obtained by
taking M to be a singular preconditioner equal to a low-rank approximation to A and applying
GMRES to
AM † y = b, x = M † y. (6.4.34)
In the initial iterations the residual components corresponding to large eigenvalues will be re-
duced in norm. This approach is particularly suitable for ill-posed problems from partial differen-
tial equations in two or three dimensions, such as the Cauchy problem with variable coefficients.
A fast solver for a nearby problem can then be used as a singular preconditioner.
Surveys of methods for regularization of large-scale problems are given by Hanke and Hansen
[570, 1993] and Hansen [579, 2010]. Nemirovskii [826, 1986] gives a strict proof of the regular-
izing properties of CG methods and shows that CGLS and LSQR reach about the same accuracy
as Landweber’s method before divergence starts. Hanke [568, 2001] compares the regularizing
properties of CGLS and CGME, and Jia [669, 2020] studies the regularizing effects of CGME,
LSQR, and LSMR. Wei, Xie, and Zhang [1115, 2016] propose combining Tikhonov regulariza-
tion with a randomized algorithm for truncated GSVD.
Fierro et al. [404, 1997] propose using GKL bidiagonalization for computing truncated TLS
(TTLS) solutions. The use of bidiagonalization in Tikhonov regularization of large linear prob-
lems is further analyzed by Calvetti and Reichel [204, 2003]. The choice of regularization param-
eters in iterative methods is studied by Kilmer and O’Leary [696, 2001]. Hnětynková, Plešinger,
and Strakoš [631, 2009] use bidiagonalization to estimate the noise level in the data.
Novati and Russo [834, 2014] give theoretical results on convergence properties of the
Arnoldi–Tikhonov method with L ̸= I. Gazzola, Novati, and Russo [449, 2015] survey hy-
brid Krylov projection methods for Tikhonov regularized problems. They observe experimen-
tally that the method is very efficient for discrete ill-posed problems where the singular values
cluster at zero. They also investigate use of the GCV criterion within the Arnoldi–Tikhonov
method. A MATLAB package of iterative regularization methods called IR Tools is implemented
by Gazzola, Hansen, and Nagy [448, 2018]. This package also contains a set of large-scale test
problems.
Let
$$ W = (w_1, w_2, \ldots, w_p) \in \mathbb{R}^{n\times p} $$
be a set of p linearly independent vectors that span a subspace to be added or removed. Then both C and $W^TCW$ are symmetric positive definite. The deflated Lanczos process is obtained by applying the standard Lanczos process to the auxiliary matrix B = CH, where H is the oblique projector defined by
$$ H^T = I - CW(W^TCW)^{-1}W^T. \qquad (6.4.35) $$
Since $CH = H^TC$ is symmetric,
$$ B = CH = H^TC = H^TCH. \qquad (6.4.36) $$
Let v1 be a unit vector such that W T v1 = 0. Then the standard Lanczos process applied to
B with starting vector v1 generates a sequence {vj } of mutually orthogonal unit vectors vj such
that
vj+1 ⊥ span {W } + Kj (C, v1 ) ≡ Kj (C, W, v1 ). (6.4.37)
The generated vectors satisfy
Theorem 6.4.2. Let A ∈ Rm×n and W ∈ Rm×p have full column rank. Let x∗ be the exact
solution of the least squares problem minx ∥Ax − b∥2 . Then the deflated CGLS algorithm will
not break down at any step. The approximate solution xk is the unique minimizer of the error
norm ∥rk − r∗ ∥2 , rk = b − Axk , over the affine solution space x0 + Kp,k (ATA, W, r0 ). Further,
an upper bound for the residual error after k iterations is given by
$$ \|r_k - r^*\|_2 \leq 2\left(\frac{\kappa(AH) - 1}{\kappa(AH) + 1}\right)^k \|r^* - r_0\|_2, \qquad (6.4.43) $$
where H is the oblique projector defined in (6.4.35).
Hence A and $\tilde A$ have the same eigenvalues, and the eigenvectors are related by $\tilde X = L^{-1}X$. In the LR algorithm of Rutishauser [947, 1958] this process is iterated. Setting $A_1 = A$ and
$$ A_k = L_kU_k, \qquad A_{k+1} = U_kL_k, \qquad k = 1, 2, \ldots, \qquad (7.1.1) $$
we obtain
$$ A_k = L_{k-1}^{-1}\cdots L_2^{-1}L_1^{-1}\,A\,L_1L_2\cdots L_{k-1}, \qquad k = 2, 3, \ldots. \qquad (7.1.2) $$
Clearly, $A_{k+1}$ is again symmetric and positive definite, and therefore the recurrence is well defined. Repeated application of (7.1.5) gives
$$ A_k = T_{k-1}^{-1}A_1T_{k-1} = T_{k-1}^TA_1(T_{k-1}^{-1})^T, \qquad (7.1.6) $$
where $T_{k-1} = L_1L_2\cdots L_{k-1}$.
Under certain restrictions the sequence of matrices Ak converges to a diagonal matrix whose
elements are the eigenvalues of A.
The QR algorithm is similar to the LR algorithm but uses orthogonal similarity transforma-
tions
$$ A_k = Q_kR_k, \qquad A_{k+1} = R_kQ_k = Q_k^HA_kQ_k, \qquad k = 1, 2, \ldots. \qquad (7.1.8) $$
The resulting matrix Ak+1 is similar to A1 = A. Successive iterates of the QR algorithm satisfy
relations similar to those derived for the LR algorithm. By repeated application of (7.1.8) it
follows that
$$ P_kA_{k+1} = AP_k, \qquad P_k = Q_1Q_2\cdots Q_k. \qquad (7.1.9) $$
Furthermore, setting $U_k = R_k\cdots R_2R_1$, we have
$$ P_kU_k = P_{k-1}(Q_kR_k)U_{k-1} = P_{k-1}A_kU_{k-1} = AP_{k-1}U_{k-1}, $$
and by induction,
$$ P_kU_k = A^k, \qquad k = 1, 2, \ldots. \qquad (7.1.10) $$
For the QR algorithm we have $A_k^T = A_k = R_k^TQ_k^T$ and hence
$$ A_k^2 = A_k^TA_k = R_k^TQ_k^TQ_kR_k = R_k^TR_k, \qquad (7.1.11) $$
i.e., $R_k^T$ is the lower triangular Cholesky factor of $A_k^2$. For the Cholesky LR algorithm we have from (7.1.7) that
$$ A_k^2 = L_kL_{k+1}(L_kL_{k+1})^T. \qquad (7.1.12) $$
By uniqueness, the Cholesky factorizations (7.1.11) and (7.1.12) of $A_k^2$ must be the same, and therefore $R_k^T = L_kL_{k+1}$. Thus
$$ A_{k+1} = R_kA_kR_k^{-1} = (L_kL_{k+1})^TA_k\big((L_kL_{k+1})^T\big)^{-1}. $$
Comparing this with (7.1.6) shows that two steps of the Cholesky LR algorithm are equivalent to one step in the QR algorithm.
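A quick numerical check of this equivalence can be done in MATLAB as follows; the random SPD matrix is an assumption, and the signs of R are normalized so that the QR step matches the positive-diagonal Cholesky convention.

    % Check: two Cholesky LR steps equal one QR step for a symmetric positive definite A.
    n = 6;  A = randn(n);  A = A*A' + n*eye(n);
    L1 = chol(A,'lower');   A2 = L1'*L1;       % first Cholesky LR step
    L2 = chol(A2,'lower');  A3 = L2'*L2;       % second Cholesky LR step
    [Q, R] = qr(A);
    d = sign(diag(R));  Q = Q*diag(d);  R = diag(d)*R;   % enforce positive diagonal of R
    disp(norm(A3 - R*Q, 'fro'))                          % ~ machine precision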
Matrix shapes invariant under symmetric QR algorithms are studied by Arbenz and Golub [30,
1995]. An initial reduction to real tridiagonal form reduces the arithmetic cost per step in the
QR algorithm to O(n) flops. The reduction can be carried out by a sequence of Householder
reflections
P = I − βuuH , β = 2/uH u.
In the kth step, A(k+1) = Pk A(k) Pk , where Pk is chosen to zero the last n − k − 1 elements in
the kth column. Dropping the subscripts k, we write
P AP = A − upH − puH + βuH puuH = A − uq H − quH , (7.1.14)
where p = βAu, q = p − γu, and γ = βuH p/2. The operation count for this reduction is about
2n3 /3 flops. A complex Hermitian matrix A ∈ Cn×n can be reduced to real tridiagonal form T
by a sequence of similarity transformations with complex Householder reflections.
The reduction to symmetric tridiagonal form is normwise backward stable. This ensures that
the larger eigenvalues will be computed with high relative accuracy. However, if the reduction is
performed starting from the top row, then the matrix should be ordered so that the larger elements
occur in the top left corner. This ensures the errors in the orthogonal reduction correspond to
small relative errors in the elements of A, and the small eigenvalues will not be destroyed.
If the reduction to tridiagonal form is carried out for a symmetric band matrix A in a similar
way, then the band structure will be destroyed in the intermediate matrices. By annihilating pairs
of elements using plane rotations in an ingenious order, the reduction can be performed without
increasing the intermediate bandwidth; see Rutishauser [949, 1963] and Schwarz [977, 1968].
For computing the SVD of a matrix A ∈ Cm×n it is advantageous to reduce it initially to
real bidiagonal form:
ρ1 γ2
ρ2 γ3
B T
. .. . .. ∈ Rn×n .
A = QB PB , B = (7.1.15)
0
ρn−1 γn
ρn
As described in Section 4.2.1, this can be achieved by taking P and Q as products of Householder
matrices. The resulting matrix B has the same singular values as A, and the singular vectors of
B and A are simply related. Note that both B TB and BB T are tridiagonal.
The QR factorization of A ∈ Rm×n requires 2(mn2 − n3 /3) flops or twice as many if Q
is needed explicitly. This cost usually dominates the total cost of computing the SVD. If only
the singular values are required, then the cost of bidiagonalization typically is 90% of the total
cost. If singular vectors are wanted, then the explicit transformation matrices are needed, but the
reduction still accounts for more than half the total cost.
The errors from the bidiagonal reduction may often account for most of the errors in the
computed singular values. To minimize these errors the reduction should preferably be done as a
two-step procedure. In the first step a QR factorization with column pivoting of A is performed:
R
AΠ = QR , R ∈ Rn×n . (7.1.16)
0
Next, R is reduced to upper bidiagonal form, which takes 8n3 /3 flops, or twice as many if the left
and right transformation matrices are wanted explicitly. Presorting the rows of A by decreasing
norms before the QR factorization can also reduce the relative errors in the singular values; see
Higham [622, 2000] and Drmač [335, 2017].
Note that a bidiagonal matrix with complex elements can always be transformed into real
form by a sequence of unitary diagonal scalings from the left and right. In the first step, D1 =
342 Chapter 7. SVD Algorithms and Matrix Functions
diag (eiα1 , 1, . . . , 1) is chosen to make the (1, 1) element in D1 B real. Next, D2 = diag (1, eiα2 ,
1, . . . , 1) is chosen to make the (1, 2) element in (D1 B)D2 real, and so on.
Let σi , i = 1 : n, be the singular values, and let ui and vi be the corresponding left and
right singular vectors of the upper bidiagonal matrix B in (7.1.15). Then the eigenvalues and
eigenvectors of the Jordan–Wielandt matrix W are given by
ui ui 0 B
W = ±σi , W = .
±vi ±vi BT 0
By an odd–even permutation the matrix W can be brought into the special real symmetric tridi-
agonal form
0 ρ1
ρ1 0 γ2
γ2 0 ρ2
.. ..
G = PWPT = ρ2 . . ∈ R2n×2n , (7.1.17)
.. ..
. . γn
γn 0 ρn
ρn 0
with zero diagonal elements first considered by Golub and Kahan [495, 1965].
The remaining steps are similar. Note that s and c in the plane rotations are not needed and
that two successive steps of the algorithm will transform a lower bidiagonal matrix back into
lower bidiagonal form. The work in one step of the bidiagonal zero-shift SVD algorithm is 4n
multiplications, n divisions, and n square roots. The algorithm uses no addition or subtraction.
Therefore no cancellation can take place, and each entry of the transformed matrix is computed
to high relative accuracy. By merging the two steps, we obtain the zero-shift algorithm used by
Demmel and Kahan [310, 1990].
The repeated transformation from lower to upper triangular form, or flipping of a triangu-
lar matrix, was first analyzed by Faddeev, Kublanovskaya, and Faddeeva [392, 1968]; see also
Chandrasekaran and Ipsen [233, 1995].
The following remarkably compact MATLAB function by Fernando and Parlett [403, 1994]
is simpler and more efficient. It performs one step of the unshifted QRSVD algorithm on a lower
or upper bidiagonal matrix B whose elements are stored in q[1:n] and e[2:n].
If some element γi = 0, where i < n, then the bidiagonal matrix splits into a direct sum of
two smaller bidiagonal matrices
B1 0
B= ,
0 B2
which the algorithm can treat separately. In particular, if γn = 0, then σ = ρn is a singular value.
If a diagonal element ρi = 0, i < n, then B is singular and must have a zero singular value.
Then in the next iteration the algorithm will drive this zero element to the last position, giving
γn = 0.
Demmel and Kahan [310, 1990] show that the singular values of a bidiagonal matrix are
determined to full relative accuracy by their elements, independent of their magnitudes, while
the error bounds for the associated singular vectors depend on the relative gap γi between σi
and other singular values.
Theorem 7.1.1. Let B and B̄ = B + δB, |δB| ≤ ω|B|, be upper bidiagonal matrices in Rn×n ,
with singular values σ1 ≥ · · · ≥ σn and σ̄1 ≥ · · · ≥ σ̄n , respectively. If η = (2n − 1)ω < 1,
then for i = 1, . . . , n,
η
|σ̄i − σi | ≤ |σi |, (7.1.24)
1−η
√
2η(1 + η)
max sin θ(ui , ūi ), sin θ(vi , v̄i ) ≤ , (7.1.25)
γi − η
|σi − σj |
γi = min .
j̸=i σi + σj
344 Chapter 7. SVD Algorithms and Matrix Functions
More generally, Demmel et al. [307, 1999] show that high relative accuracy in the computed
SVD can be achieved for matrices that are diagonal scalings of a well-conditioned matrix. They
consider rank-revealing decompositions of the form
where X and Y are well-conditioned and D is diagonal. Such a decomposition can be obtained,
e.g., using Gaussian elimination with rook or complete pivoting.
In the zero-shift QRSVD algorithm the diagonal elements of B will converge to the singular
values σi arranged in order of decreasing absolute magnitude. The superdiagonal elements will
behave asymptotically like cij (σi /σj )2k for some constants cij . Hence, the rate of convergence
is slow unless there is a substantial gap between the singular values; see Theorem 7.3.4. The
remedy is to introduce suitable chosen shifts in the QR algorithm. However, to do this stably is
a nontrivial task, and hence it is the subject of the next section.
Tk − τ I = Qk Rk , Rk Qk + τ I = Tk+1 , k = 1, 2, . . . . (7.1.27)
Since the shift is restored, each iteration is an orthogonal similarity transformation, and it holds
that Tk+1 = QTk Tk Qk . Further, the eigenvectors of T can be found by accumulating the product
Pk = Q1 · · · Qk , k = 1, 2, . . . . If the shift is chosen to approximate a simple eigenvalue λ of T ,
convergence of the QR algorithm to this eigenvalue will be fast.
Performing the shift τ in (7.1.27) explicitly will affect the accuracy of the smaller eigenvalues
for which |λi | ≪ |τ |. This is avoided in the implicitly shifted QR algorithm due to Francis [431,
1961], [432, 1961], where algorithmic details for performing the shifts implicitly are described.
A crucial role is played by the following theorem.
Proof. Assume that the first k columns q1 , . . . , qk in Q and the first k−1 columns in H have been
computed. (Since q1 is known, this assumption is valid for k = 1.) Equating the kth columns in
QH = AQ gives
Multiplying this by qiH and using the orthogonality of Q gives hik = qiH Aqk , i = 1 : k. Since H
is unreduced, hk+1,k ̸= 0 and
k
X
qk+1 = h−1
k+1,k Aqk − h ik i ,
q ∥qk+1 ∥2 = 1.
i=1
This and the condition that hk+1,k is real positive determine qk+1 uniquely.
7.1. The QRSVD Algorithm 345
In the implicit shift tridiagonal QR algorithm, the QR step (7.1.27) is performed as follows.
The first plane rotation P1 = G12 is chosen so that
P1 t1 = ±∥t1 ∥2 e1 , t1 = (α1 − τ, β2 , 0, . . . , 0)T ,
where t1 is the first column in Tk − τk I. The result of applying this transformation is pictured
below (for n = 5):
↓ ↓
→ × × + × × +
→
× × ×
× × ×
T T
P1 Tk =
× × × ,
P1 Tk P1 = + × × ×
.
× × × × × ×
× × × ×
To preserve the tridiagonal form, a transformation P2 = G23 is used to zero out the new nonzero
elements: × × 0
× × × +
P2T (P1T T P1 )P2 = 0 × × × .
+ × × ×
× ×
This creates two new nonzero elements, which in turn are moved further down the diagonal with
plane rotations. This process is known as chasing. Eventually, the nonzeros outside the diagonal
will disappear outside the border. By Theorem 7.1.3 the resulting symmetric tridiagonal matrix
QT Tk Q must equal Tk+1 , because the first column in Qk is P1 P2 · · · Pn−1 e1 = P1 e1 .
The shift τ in the QR algorithm is usually taken to be the eigenvalue of the trailing principal
2 × 2 submatrix of T
αn−1 βn
, (7.1.28)
βn αn
closest to αn , the so-called Wilkinson shift. In the case of a tie (αn−1 = αn ) the smaller
eigenvalue αn − |βn | is chosen. Wilkinson [1121, 1968] shows that, neglecting rounding errors,
this shift guarantees global convergence and that local convergence is nearly always cubic. A
stable formula for computing the shift is
. p
τ = αn − sign (δ)βn2 |δ| + δ 2 + βn2 , δ = (αn−1 − αn )/2;
Theorem 7.1.3. Let Q = (q1 , . . . , qn ) and V = (v1 , . . . , vn ) be orthogonal matrices such that
QT M Q = T and V T M V = S are real, symmetric, and tridiagonal. If v1 = q1 and T is
unreduced, then vi = ±qi , i = 2, . . . , n.
For a shift τ , let t1 be the first column of B TB − τ I, and determine the plane rotation
T1 = R12 so that
tridiagonal. To do this implicitly, start by applying the transformation T1 to B. This gives (take
n = 5)
↓ ↓
× ×
+ × ×
BT1 =
× × .
× ×
×
Next, premultiply by a plane rotation S1T = R12 to zero out the + element. This creates a new
nonzero element in the (1, 3) position. To preserve the upper bidiagonal form, choose a rotation
T2 = R23 to zero out the element +:
↓ ↓
→ × × + × × 0
→
0 × ×
× ×
T
S1T BT1 T2
S1 BT1 =
× × ,
=
+ × × .
× × × ×
× ×
Then continue to chase the element + down, with transformations alternately from the right and
left until a new upper bidiagonal matrix
T
B̂ = Sn−1 · · · S1T BT1 · · · Tn−1 = U T BP
When |ρn | ≤ δ where δ is a prescribed tolerance, ρn is accepted as a singular value, and the
order of the matrix B is reduced by one. This automatic deflation is an important property of the
QR algorithm.
7.1. The QRSVD Algorithm 347
is checked. If this is satisfied for some i < n, the matrix splits into a direct sum of two smaller
bidiagonal matrices B1 and B2 for which the QR iterations can be continued independently.
Furthermore, if qi = 0 for some i ≤ n, then B must have at least one singular value equal to
zero. Therefore also a second convergence criterion
is checked. If this is satisfied for some i < n, then the ith row can be zeroed out by a sequence of
plane rotations Gi,i+1 , Gi,i+2 , . . . , Gi,n applied from the left to B. The new elements generated
in the ith column can be discarded without introducing an error in the singular values that is larger
than some constant times u∥B∥2 . Then the matrix B again splits into two smaller bidiagonal
matrices B1 and B2 .
The criteria (7.1.32)–(7.1.33) ensure backward stability of the QRSVD algorithm in the
normwise sense, i.e., the computed singular values σ̄k are the exact singular values of a nearby
matrix B + δB, where ∥δB∥2 ≤ c(n) · uσ1 . Here c(m, n) is a constant depending on m and
n, and u is the machine unit. Thus, if T is nearly rank-deficient, this will always be revealed
by the computed singular values. The penalty for not spotting a negligible element is not loss
of accuracy but a slowdown of convergence. However, the smaller singular values may not be
computed with high relative accuracy. When all off-diagonal elements in B have converged to
zero, we have QTS BTS = Σ = diag (σ1 , . . . , σn ). The left and right singular vectors of T are
given by accumulating the product of transformations in the QRSVD iterations.
Each QRSVD iteration requires 14n multiplications and 2n calls to givrot. If singular vectors
are desired, accumulating the rotations into U and V requires 6mn and 6n2 flops, respectively,
and the overall cost goes up to O(mn2 ). Usually less than 2n QR iterations are needed. When
singular vectors are desired, the number of QR iterations can be reduced by first computing the
singular values without accumulating singular vectors. Then the QRSVD algorithm is run a
second time with shifts equal to the computed singular values, the so-called perfect shifts. Then
convergence occurs in at most n iteration. This may reduce the cost of the overall computations
by about 40%.
A variant of the QRSVD algorithm is proposed by Chan [222, 1982]. This differs in that a
QR factorization is performed before the bidiagonalization. In Table 7.1.1 operation counts are
shown for standard QRSVD and Chan’s version. Four different cases are considered depending
on whether U1 ∈ Rm×n and V ∈ Rn×n are explicitly required or not. Only the highest order
terms in m and n are kept. It is assumed that the iterative phase takes on average two complete
QR iterations per singular value and that standard plane rotations are used. Case (a) arises in
the computation of the pseudoinverse, case (c) in least squares applications, and case (d) in the
estimation of condition numbers and rank determination.
Ak = Qk Lk , Lk Qk = Ak+1 , k = 1, 2, . . . , (7.1.34)
with Lk lower triangular. This is merely a reorganization of the QR algorithm. Let J ∈ Rn×n be
the symmetric permutation matrix J = (en , . . . , e2 , e1 ). Then JA reverses the rows of A, AJ
reverses the columns of A, and JAJ reverses both rows and columns. If R is upper triangular,
then JRJ is lower triangular. It follows that if A = QR is the QR factorization of A, then
JAJ = (JQJ)(JRJ) is the QL factorization of JAJ. Hence, the QR algorithm applied to A is
the same as the QL algorithm applied to JAJ. Therefore the convergence theory is essentially
the same for both algorithms. But in the QL algorithm, inverse iteration is taking place in the top
left corner of A, and direct iteration in the lower right corner.
A bidiagonal matrix is said to be graded if the elements are large at one end and small at the
other. If the bidiagonalization uses an initial QR factorization with column pivoting, then the
matrix is usually graded from large at upper left to small at lower right, as illustrated here:
1 10−1
10−2 10−3
.
10−4 10−5
10−6
This is advantageous for the QR algorithm, which tries to converge to the singular values from
smallest to largest and “chases the bulge” from top to bottom. Convergence will usually be fast
if the matrix is graded this way. However, if B is graded the opposite way, the QR algorithm
may require many more steps, and the QL algorithm should be used instead. Alternatively, the
rows and columns of B could be reversed. When the matrix breaks up into diagonal blocks that
are graded in different ways, the bulge should be chased in the appropriate direction.
The QRSVD algorithm by Demmel and Kahan [310, 1990] is substantially improved com-
pared to the Golub–Reinsch algorithm. It computes the smallest singular values to maximal
relative accuracy and the others to maximal absolute accuracy. This is achieved by using the
zero-shift QR algorithm on any submatrix whose condition number κ = σmax /σmin is so large
that the shifted QR algorithm would make unacceptably large changes in the computed σmin . Al-
though the zero-shift algorithm has only a linear rate of convergence, it converges quickly when
σmin /σmax is very small. The zero-shift algorithm uses only about a third of the operations per
step as the shifted version. This makes the Demmel–Kahan algorithm faster and, occasionally,
much faster than the original Golub–Reinsch algorithm. Other important features of the new al-
gorithm are stricter convergence criteria and the use of a more accurate algorithm for computing
singular values and vectors of an upper triangular 2 × 2 matrix; see Section 7.2.2.
The QR algorithm was independently discovered by Kublanovskaya [709, 1961]. The story
of the QR algorithm and its later developments is told by Golub and Uhlig [510, 2009]. An
exposition of Francis’s work on the QR algorithm is given in Watkins [1104, 2011]. A two-
stage bidiagonalization algorithm where the matrix is first reduced to band form is developed by
Großer and Lang [541, 1999].
Initially, Golub [488, 1968] applied the Francis implicit QR algorithm to the special symmet-
ric tridiagonal matrix K in (7.1.17), whose eigenvalues are ±σi . If double QR steps with shifts
±τi are taken, then the zero diagonals in K are preserved. This makes it possible to remove
the redundancy caused by the doubling of the dimensions. The resulting algorithm is outlined
7.2. Alternative SVD Algorithms 349
also in the Stanford CS report of Golub and Businger [485, 1967], which contains an ALGOL
implementation by Businger.
The algorithm given by Golub and Reinsch [507, 1971] for computing the SVD is one of the
most elegant and reliable in numerical linear algebra and has been cited over 4600 times (as of
2023). The FORTRAN program for the SVD of a complex matrix of Businger and Golub [194,
1969] is an adaptation of the same code. The LINPACK implementation of the QRSVD al-
gorithm (see Dongarra et al. [322, 1979, Chap. 11]) follows the Handbook algorithm, except it
determines the shift from (7.1.31).
The QRSVD algorithm can be considered as a special instance of a product eigenvalue
problem, where two matrices A and B are given, and one wishes to find the eigenvalues of a
product matrix C = AB or quotient matrix C = AB −1 . For stability reasons, one wants to
operate on the factors A and B separately, without forming AB or AB −1 explicitly; see Heath
et al. [597, 1986]. The relationship between the product eigenvalue problem and the QRSVD
algorithm is discussed by Kressner [707, 2005], [706, 2005]. An overview of algorithms and
software for computing eigenvalues and singular values is given by Bai et al. [61, 2000].
α1 β2
β2 α2 β3
.. ..
T = β3 . . ∈ Rn×n (7.2.1)
..
. αn−1 βn
βn αn
that are greater than or less than a specified value can be determined by the method of bisection or
spectrum slicing. Early implementations of such methods were based on computing the leading
principal minors of the shifted matrix det(Tk − λI) of T . Expanding the determinant along the
last row and defining p0 = 1 gives
For a given numerical value of λ, the so-called Sturm sequence p1 (λ), . . . , pn (λ) can be evalu-
ated in 3n flops using (7.2.2).
Lemma 7.2.1. If the tridiagonal matrix T is irreducible, i.e., βi ̸= 0, i = 2, . . . , n, then the zeros
of pk−1 (λ) strictly separate those of pk (λ).
Proof. By Cauchy’s interlacing theorem, the eigenvalues of any leading principal minor of a
Hermitian matrix A ∈ Rn×n interlace the eigenvalues of A. In particular, the zeros of each
pk−1 (λ) separate those of pk (λ), at least in the weak sense. Suppose now that µ is a zero
of both pk (λ) and pk−1 (λ). Since βk ̸= 0, it follows from (7.2.2) that µ is also a zero of
pk−2 (λ). Continuing in this way shows that µ is a zero of p0 . This is a contradiction because
p0 = 1.
350 Chapter 7. SVD Algorithms and Matrix Functions
Theorem 7.2.2. Let s(τ ) be the number of agreements in sign of consecutive members in
the Sturm sequence p1 (τ ), p2 (τ ), . . . , pn (τ ). If pi (τ ) = 0, the sign is taken to be opposite
that of pi−1 (τ ). (Note that two consecutive pi (τ ) cannot be zero.) Then s(τ ) is the number of
eigenvalues of T strictly greater than µ.
Bisection can be used to locate an individual eigenvalue λk independent of any of the others
and is therefore suitable for parallel computing. The Sturm sequence algorithm is very stable
when carried out in IEEE floating-point arithmetic but is susceptible to underflow and overflow
and other numerical problems. There are ways to overcome these problems as shown by Barth,
Martin, and Wilkinson [93, 1971].
More recent implementations of bisection methods are developments of the inertia algorithm
analyzed by Kahan [680, 1966]; see Fernando [402, 1998]. The inertia of a symmetric matrix A
is defined as the triple (τ, ν, δ) of positive, negative, and zero eigenvalues of T . Sylvester’s law
(Horn and Johnson [639, 1985]) says that the inertia is preserved under congruence transforma-
tions. If symmetric Gaussian elimination is carried out for A − τ I, it yields the factorization
A − τ I = LDLT , D = diag (d1 , . . . , dn ). (7.2.3)
where L is unit lower bidiagonal and D = diag (d1 , . . . , dn ). Since A − σI is congruent to
D, it follows from Sylvester’s law that the number of eigenvalues of A smaller than τ equals
the number of negative elements π(D) in the sequence d1 , . . . , dn . Applied to a symmetric and
tridiagonal matrix T − τ I = LDLT , this procedure becomes particularly efficient and reliable.
A remarkable fact is that provided over- or underflow is avoided, element growth will not affect
the accuracy. For example, the LDLT factorization
1 2 1 1 1 2
A − I = 2 2 −4 = 2 1 −2 1 2
−4 −6 2 1 2 1
shows that A has two eigenvalues greater than 1.
The bisection method can be used to locate singular values of a bidiagonal matrix B by
applying it to compute the eigenvalues of one of tridiagonal matrices B TB and BB T . This can
be done without forming these matrix products explicitly. However, the best option is to apply
bisection to the special symmetric tridiagonal matrix T of Golub–Kahan form (7.1.17) with zero
diagonal and eigenvalues ±σi . It gives the highest relative accuracy in the computed singular
value.
By applying the bisection procedure to the special symmetric tridiagonal matrix G ∈ R2n×2n
in
0 ρ1
ρ1 0 γ2
γ2 0 ρ2
.. ..
G= ρ2 . . ∈ R2n×2n (7.2.4)
. . . .
. . γn
γn 0 ρn
ρn 0
with zero diagonal, we obtain a method for computing selected singular values σi of an irre-
ducible bidiagonal matrix Bn with elements ρ1 , . . . , ρn and γ2 , . . . , γn . Recall that G is per-
mutationally equivalent to the Jordan–Wielandt matrix and has eigenvalues equal to ±σi (Bn ),
i = 1, . . . , n.
7.2. Alternative SVD Algorithms 351
Following Fernando [402, 1998], the diagonal elements in the LDLT factorization of G − τ I
are obtained by Gaussian elimination as
π := 0; d := −τ ;
if d < 0 then π = 1;
for i = 1 : 2n − 1
d := −τ − zi /d;
if |d| < 0 then π := π + 1;
end
One step in Algorithm 7.2.1 requires 2n flops, and only the elements dk need be stored. The
number of multiplications can be halved by precomputing αk2 , but this may cause unnecessary
over- or underflow. To prevent breakdown
√ of the recursion, the algorithm should be modified so
that a small |dk | is replaced by ω, where ω is the underflow threshold.
Kahan [680, 1966] gives a detailed error analysis of the bisection algorithm. Assuming that
no over- or underflow occurs, he proves the monotonicity of the inertia counts in IEEE floating-
point arithmetic. He shows that the computed number π̄ is the exact number of singular values
greater than σ of a tridiagonal matrix T ′ , where the elements of T ′ have elements satisfying
a very satisfactory backward error bound. Combined with Theorem 7.1.1, this shows that the
bisection algorithm computes singular values of a bidiagonal matrix B with small relative errors.
The bisection algorithm is related to the famous quotient difference (qd) algorithm of
Rutishauser [946, 1954] for finding roots of polynomials or the poles of meromorphic functions;
see Henrici [603, 1958]. The differential qd (dqds) algorithm for computing singular values of
a bidiagonal matrix is due to Fernando and Parlett [403, 1994]. This algorithm evolved from
trying to find a faster square-root-free version of the Demmel–Kahan zero-shift bidiagonal QR
Algorithm 7.1.1. Recall that one step of the zero-shift Demmel–Kahan QR algorithm applied to
a bidiagonal matrix B with elements qi , ei+1 gives another bidiagonal matrix Bb with elements
T T b
qbi , ebi+1 such that BB = B B. Equating the (k, k) and (k, k + 1) elements on both sides of
b
this equation gives
qk2 + e2k = eb2k−1 + qbk2 , ek qk+1 = qbk ebk .
These are similar to the rhombus rules of the qd algorithm and connect the four elements
q̂k2
ê2k−1 e2k .
2
qk+1
To keep the high relative accuracy property, Fernando and Parlett had to use the so-called differ-
ential form of the progressive dqds algorithm This version also allows a stable way to introduce
352 Chapter 7. SVD Algorithms and Matrix Functions
explicit shifts in the algorithm. One step of dqds with shift τ ≤ σmin (B) computes a bidiagonal
B
e such that
BbT B
b = BB T − τ 2 I. (7.2.6)
The choice of τ ensures that B b exists. A nonrestoring orthogonal similarity transformation can be
performed without forming BB T − τ 2 I, using a hyperbolic QR factorization (see Section 3.2.4).
Alternatively, if T
B B
∈ R2n×n
b
Q =
0 τI
with Q orthogonal, then BB T = B bT Bb + τ 2 I as required. In the first step, a plane rotation is
constructed that affects only rows (1, n + 1) and makes the (n + 1, 1) element equal topτ . This is
possible because τ ≤ σmin (B) ≤ q1 , and it changes the first diagonal element to t1 = q12 − τ 2 .
Next, a rotation in rows (1, 2) is used to annihilate e2 , giving
q
qb1 = q12 − τ 2 + e22
and changing q2 to qb2 . The first column and row now have their final form:
q1 t1 qb1 eb2
e2 q 2 e2 q2 0 qe2
.. ..
⇒
.. ..
⇒
.. ..
.
. . . . . .
en q n en qn en qn
0 0 ··· 0 τ 0 ··· 0 τ 0 ··· 0
All remaining steps are similar. The kth step only acts on the last n−k +1 rows and columns and
will produce an element τ in position (n + k, n + k). One can show that this algorithm does not
introduce large relative errors in the singular values. By working instead with squared quantities,
square roots can be eliminated. More details are given in Fernando and Parlett [403, 1994] and
Parlett [883, 1995]. The dqds algorithm is available in LAPACK as the routine DLASQ and is
considered to be the fastest SVD algorithm when only singular values are required. The error
bounds for dqds are significantly smaller than those for the Demmel–Kahan QRSVD algorithm.
A further benefit is that it can be implemented in either parallel or pipelined format.
The multiple relatively robust representation (MRRR or MR3 ) algorithm by Dhillon [320,
1997] and Dhillon and Parlett [321, 2004] accurately computes the eigenvalue decomposition
of a symmetric tridiagonal matrix M ∈ Rn×n in only O(n2 ) operations. It overcomes some
difficulties with the dqds algorithm for computing the eigenvectors. Applying the MR3 algorithm
to compute the eigenvalue decompositions of B TB and BB T separately gives a fast algorithm
for computing the full SVD of a bidiagonal matrix B. Großer and Lang [542, 2003] show that
this may lead to poor results regarding the residual ∥BV − U Σ∥ and give a coupling strategy
that resolves this difficulty. The resulting algorithm is analyzed in Großer and Lang [543, 2005].
An implementation of the MR3 algorithm for the bidiagonal SVD is given by Willems, Lang,
and Vömel [1125, 2007]. Later developments of the bidiagonal MR3 algorithm are described in
Willems and Lang [1124, 2012].
ensures that π/4 < θ ≤ π/4 and minimizes the difference ∥A′ − A∥F . Note that a′pp + a′qq =
trace (A). The eigenvalues are
These formulas are chosen to reduce roundoff errors; see Rutishauser [950, 1971]. If symmetry
is exploited, then one Jacobi transformation takes about 8n flops. Note that an off-diagonal
element made zero at one step will in general become nonzero at some later stage. The Jacobi
method also destroys any band structure in A.
The convergence of the Jacobi method depends on the fact that in each step the Frobenius
norm of the off-diagonal elements
X
S(A) = a2ij = ∥A − D∥2F (7.2.12)
i̸=j
is reduced. To see this, note that because the Frobenius norm is orthogonally invariant and
a′pq ̸= 0, it holds that
S(A′ ) = S(A) − 2a2pq .
354 Chapter 7. SVD Algorithms and Matrix Functions
For simplicity of notation we set in the following A = Ak and A′ = Ak+1 . There are
various strategies for choosing the order in which the off-diagonal elements are annihilated. In
the classical Jacobi method the off-diagonal element of largest magnitude is annihilated—the
optimal choice. Then 2a2pq ≥ S(Ak )/N , N = n(n − 1)/2, and
This shows that for the classical Jacobi method, Ak+1 converges at least linearly with rate
1 − 1/N to a diagonal matrix. It can be shown that ultimately the rate of convergence is qua-
dratic, i.e., for k large enough, S(Ak+1 ) < cS(Ak )2 for some constant c. The iterations are
repeated until S(Ak ) < δ∥A∥F , where δ is a tolerance that can be chosen equal to the unit
roundoff u. Then it follows from the Bauer–Fike theorem that the diagonals of Ak approximate
the eigenvalues of A with an error less than δ∥A∥F .
In the classical Jacobi method, a large amount of effort is spent on searching for the largest
off-diagonal element. Even though it is possible to reduce this time by taking advantage of the
fact that only two rows and columns are changed at each step, the classical Jacobi method is
almost never used. Instead a cyclic Jacobi method is used, where the N = 12 n(n − 1) off-
diagonal elements are annihilated in some predetermined order. Each element is rotated exactly
once in any sequence of N rotations, called a sweep. Convergence of any cyclic Jacobi method
can be guaranteed if any rotation (p, q) is omitted for which
for some threshold τ ; see Forsythe and Henrici [423, 1960]. To ensure a good rate of conver-
gence, τ should be successively decreased after each sweep. For sequential computers, the most
popular cyclic ordering is rowwise, i.e., the rotations are performed in the order
Jacobi’s method is very suitable for parallel computation because rotations (pi , qi ) and (pj , qj )
can be performed simultaneously when pi , qi are distinct from pj , qj . If n is even, n/2 trans-
formations can be performed simultaneously, and a sweep needs at least n − 1 such parallel
steps. Several parallel schemes that use this minimum number of steps have been constructed;
see Eberlein and Park [356, 1990]. A possible choice is the round-robin ordering, illustrated here
for n = 8:
(1, 2), (3, 4), (5, 6), (7, 8),
(1, 4), (2, 6), (3, 8), (5, 7),
(1, 6), (4, 8), (2, 7), (3, 5),
(p, q) = (1, 8), (6, 7), (4, 5), (2, 3),
(1, 7), (8, 5), (6, 3), (4, 2),
(1, 5), (7, 3), (8, 2), (6, 4),
(1, 3), (5, 2), (7, 4), (8, 6).
The rotations associated with each such row can be computed simultaneously.
Convergence of any cyclic Jacobi method can be guaranteed if rotations are omitted when
the off-diagonal element is smaller in magnitude than some threshold. To ensure a good rate
of convergence, the threshold should be successively decreased after each sweep. It has been
shown that the rate of convergence is ultimately quadratic, so that for k large enough, we have
7.2. Alternative SVD Algorithms 355
S(Ak+1 ) < cS(Ak )2 for some constant c. The iterations are repeated until S(Ak ) < δ∥A∥F ,
where δ is a tolerance, which can be chosen equal to the unit roundoff u. The Bauer–Fike
theorem (see Golub and Van Loan [512, 1996, Theorem 7.2.2]) shows that the diagonal elements
of Ak then approximate the eigenvalues of A with an error less than δ∥A∥F . About 4n3 flops
are required for one sweep. In practice, the cyclic Jacobi method needs no more than about 3–5
sweeps to obtain eigenvalues of more than single precision accuracy, even when n is large. The
number of sweeps grows approximately as O(log n). About 10n3 flops are needed to compute
all the eigenvalues of A. This is about 3–5 times more than required for the QR algorithm.
An orthogonal system X = limk→∞ Xk , of eigenvectors of A is obtained by accumulating
the product of all Jacobi transformations Jk :
X0 = I, Xk = Xk−1 Jk , k = 1, 2, . . . . (7.2.14)
For each rotation Jk the associated columns p and q of Xk−1 are modified, which requires 8n
flops. Hence, computing the eigenvectors doubles the operation count.
Hestenes [606, 1958]) gave a one-sided Jacobi-type method for computing the SVD. It uses
a sequence of plane rotations from the right to find an orthogonal matrix V such that AV = U Σ
has orthogonal columns. From this the SVD A = U ΣV T is readily obtained. Hestenes’s method
is mathematically equivalent to applying Jacobi’s method to ATA. In a basic step of the method,
two columns in A are rotated,
c s
( âp âq ) = ( ap aq ) , p < q. (7.2.15)
−s c
The rotation parameters c, s are determined so that the rotated columns are orthogonal or, equiv-
alently, so that
T
∥ap ∥22 aTp aq
c s c s
−s c aTq ap ∥aq ∥22 −s c
1 q q
σ1 = (rpp + rqq )2 + rpq
2 + (rpp − rqq )2 + rpq
2 , σ2 = |rpp rqq |/σp . (7.2.17)
2
2
The right singular vector (−sr , cr ) in (7.2.16) is parallel to (rpp − σp2 , rpp rpq ). The left singular
vectors are determined by (cl , sl ) = (rpp cr − rpq sr , rqq sr )/σp . These expressions suffer from
possible over- or underflow in the squared subexpressions but can be reorganized to provide
results with nearly full machine precision; see the MATLAB code below.
356 Chapter 7. SVD Algorithms and Matrix Functions
The strategies for choosing the order in which the off-diagonal elements are annihilated are
similar to those for Jacobi’s method. A sequence of N = n(n − 1)/2 rotations in which each
column is rotated exactly once is called a sweep. No more than about five sweeps are needed to
obtain singular values of more than single precision accuracy, even when n is large.
To apply Hestenes’s method to a real m × n matrix when m > n, an initial QR factorization
with column pivoting and row sorting of A is first performed, and the algorithm is applied to R ∈
Rn×n . This tends to speed up convergence and simplify the transformations and is recommended
also when A is square. Hence, without restriction we can assume in the following that m = n.
Initial implementations of Jacobi’s method were slower than the QR algorithm but were able
to compute singular values of a general matrix more accurately. With further improvements by
Drmač [333, 1997] and Drmač and Veselić [336, 2008], [337, 2008], Jacobi’s method becomes
competitive also in terms of speed.
In the method of Kogbetliantz (see Kogbetliantz [702, 1955]) for computing the SVD of a
square matrix A, the off-diagonal elements of A are successively reduced in size by a sequence
of two-sided plane rotations
A′ = JpqT
(ϕ)AJpq (ψ), (7.2.18)
where Jpq (ϕ) and Jpq (ψ) are determined so that a′pq = a′qp = 0. Note that only rows and
columns p and q in A are affected by the transformation. The rotations Jpq (ϕ) and Jpq (ψ) are
determined by computing the SVD of a 2 × 2 submatrix
app apq
Apq = , app ≥ 0, aqq ≥ 0.
aqp aqq
7.2. Alternative SVD Algorithms 357
The assumption of nonnegative diagonal elements is no restriction because the sign of these
elements can be changed by premultiplication with an orthogonal matrix diag (±1, ±1). From
the invariance of the Frobenius norm under orthogonal transformations it follows that
S(A′ ) = S(A) − (a2pq + a2qp ), S(A) = ∥A − D∥2F .
This is the basis for a proof that the matrices generated by Kogbetliantz’s method converge to a
diagonal matrix containing the singular values of A. Orthogonal sets of left and right singular
vectors can be obtained by accumulating the product of all the transformations. Convergence is
analyzed in Paige and Van Dooren [869, 1986] and Fernando [401, 1989].
Kogbetliantz’s method should not be applied directly to A but to the triangular matrix R
obtained by an initial pivoted QR factorization. It can be shown that one sweep of the row
cyclic algorithm (7.2.13) applied to an upper triangular matrix generates a lower triangular matrix
and vice versa. The annihilation of the elements in the first row for n = 4 by plane rotations
(1, 2), (1, 3), (1, 4) from the left is pictured below:
x a0 b0 c0 x 0 b1 c1 x 0 0 c2 x 0 0 0
0 x d0 e0 0 x d1 e1 g0 x d2 e1 g1 x d2 e2
⇒ ⇒ ⇒ .
0 0 x f0 0 0 x f0 0 0 x f1 h0 0 x f2
0 0 0 x 0 0 0 x 0 0 0 x 0 0 0 x
The switching between upper and lower triangular format can be avoided by a simple permutation
scheme; see Fernando [401, 1989]. This makes it possible to reorganize the algorithm so that at
each stage of the recursion one only needs to store and process a triangular matrix. The resulting
algorithm is highly suitable for parallel computing. The reorganization of the row cyclic scheme
is achieved by the following algorithm (see also Luk [762, 1986] and Charlier, Vanbegin, and
Van Dooren [236, 1988]):
for i = 1 : n − 1
for ik = 1 : n − i
A = Pik Jik ,ik +1 (ϕk )AJiTk ,ik +1 (ψk )PiTk
end
end
where Pik denotes a permutation matrix that interchanges rows ik and ik + 1. The permutations
will shuffle the rows and columns of Ak so that each index pair (ik , jk ) in the row cyclic scheme
becomes an adjacent pair of type (ik , ik +1) when it is its turn to be processed. The permutations
involved are performed simultaneously with the rotations at no extra cost. In this scheme, only
rotations on adjacent rows and columns occur.
Below we picture the annihilation of the elements in the first row for n = 4 for the reorga-
nized scheme. After elimination of a0 , the first and second rows and columns are interchanged.
Element b1 is now in the first superdiagonal and can be annihilated. Again, by interchanging the
third and fourth rows and columns, c2 is brought to the superdiagonal and can be eliminated. The
resulting matrix is still upper triangular:
x a0 b0 c0 x 0 d1 e1 x d2 g0 e1 x d2 e2 g1
0 x d0 e0 0 x b1 c1 0 x 0 f1 0 x f2 h0
⇒ ⇒ ⇒ .
0 0 x f0 0 0 x f0 0 0 x c2 0 0 x 0
0 0 0 x 0 0 0 x 0 0 0 x 0 0 0 x
Because of its simplicity, Kogbetliantz’s algorithm has been adapted to computation of the
generalized singular value decomposition. Further developments of the Kogbetliantz SVD algo-
rithm are given by Hari and Veselić [591, 1987]. Bujanović and Drmač [184, 2012] study the
convergence and practical applications of the block version of Kogbetliantz’s method.
358 Chapter 7. SVD Algorithms and Matrix Functions
where ( l1T λ1 ) = eTk V1 is the last row of V1 , and f2T = eT1 V2 is the first row of V2 . If a
permutation matrix Pk interchanges row k and the first block row, then
qk λ1 qk l1T rk f2T
Pk CPkT = M = 0 D1 0 . (7.2.22)
0 0 D2
7.2. Alternative SVD Algorithms 359
Lemma 7.2.3. Let the SVD of the matrix in (7.2.24) be M = XΣY T , with
X = (x1 , . . . , xn ), Σ = diag (σ1 , . . . , σn ), Y = (y1 , . . . , yn ).
Then the singular values have the interlacing property
0 = d1 < σ1 < d2 < σ2 < · · · < dn < σn < dn + ∥z∥2 ,
where z = (z1 , . . . , zn )T , and they are roots of the characteristic equation
n
X zk2
f (σ) = 1 + = 0.
d2k − σ 2
k=1
The characteristic equation can be solved efficiently and accurately by the algorithm of
Li [743, 1994]. The singular values of M are always well-conditioned. The singular vectors
are xi = x̃i /∥x̃i ∥2 , yi = ỹi /∥ỹi ∥2 , i = 1 : n, where
z1 zn d2 z2 dn zn
ỹi = ,..., 2 , x̃i = −1, 2 ,..., 2 ,
d21 − σi2 dn − σi2 d2 − σi2 dn − σi2
and
n n
X zj2 X (dj zj )2
∥ỹi ∥22 = , ∥x̃i ∥22 =1+ .
j=1
(dj − σi2 )2
2
j=2
(d2j − σi2 )2
The singular vectors can be extremely sensitive to the presence of close singular values. To
get accurately orthogonal singular vectors without resorting to extended precision is a difficult
problem; see Gu and Eisenstat [546, 1995].
360 Chapter 7. SVD Algorithms and Matrix Functions
when a row wT is appended to A. From the relationship between the SVD of A and the symmet-
ric eigenvalue problem for ATA we have
ÃT Ã = ATA + wwT = V Σ2 V T + wwT = V (Σ2 + ρ2 zz T )V T = Ṽ Σ̃2 Ṽ T ,
where z = (ζ1 , . . . , ζn ) = V T w/ρ and ρ = ∥w∥2 . Hence Σ̃2 and Ṽ are the solution to a
symmetric eigenvalue problem modified by a perturbation of rank one. Such problems can be
solved by using the observation (see Golub [490, 1973]) that the eigenvalues λ1 ≥ λ2 ≥ · · · ≥
λn of
C = D + ρ2 zz T , D = diag (d1 , d2 , . . . , dn ), ∥z∥2 = 1, (7.2.27)
where d1 ≥ d2 ≥ · · · ≥ dn , are the values of λ for which
n
2
X ζj2
g(λ) = 1 + ρ = 0. (7.2.28)
j=1
(dj − λ)
7.2. Alternative SVD Algorithms 361
Good initial approximations to the roots can be obtained from the interlacing property (see The-
orem 1.3.5)
λ1 ≥ d1 ≥ λ2 ≥ · · · ≥ dn−1 ≥ λn ≥ dn .
To solve equation (7.2.28) a method based on rational approximation safeguarded with bisection
is used. The subtle details in a stable implementation of such an algorithm are treated by Li [743,
1994].
When the modified eigenvalues d˜i = σ̃i2 have been calculated, the corresponding eigenvec-
tors are found by solving
(Di + ρ2 zz T )xi = 0, Di = D − d˜2i I.
Provided Di is nonsingular (this can be ensured by an initial deflation), we have xi = Di−1 z/
∥Di−1 z∥2 . (Note that forming Di−1 z explicitly should be avoided in practice; see Bunch and
Nielsen [188, 1978].) The updated right singular vectors are Ṽ = V X, where X = (x1 , . . . , xn ).
If A (or Ã) is still available, the updated left singular vectors Ũ can be computed from Ũ =
ÃṼ Σ̃−1 .
An alternative approach for appending a row is given by Businger [192, 1970]. We have
T
U 0 A L Σ
V = Π n+1,m+1 , L = ,
0 1 wT 0 wT V
where Πn+1,m+1 denotes a permutation matrix interchanging rows n + 1 and m + 1, and L is
a special lower triangular matrix. Businger’s updating algorithm consists of two major phases.
The first phase is a finite process that transforms L ∈ R(n+1)×n into upper bidiagonal form using
plane rotations from left and right:
B̃
G1 LG2 = , B̃ ∈ Rn×n .
0
The second phase is an implicit QR diagonalization of B̃ (see Section 7.1.1) that reduces B̃ to
diagonal form Σ̃.
In the kth step of phase 1, the kth element of wT V (k = 1 : n − 1) is eliminated using plane
rotations and a chasing scheme on rows and columns. This is pictured below for n = 5 and
k = 3:
↓ ↓
× × × × × ×
× × +
× × ×
→
× × ⊕
× + ⇒ →
× × ⇒ →
+ × ×
+ ×
→
⊕ ×
×
× × ×
0 0 ⊕ × × 0 0 0 × × 0 0 0 × ×
↓ ↓ ↓ ↓
× × + → × × ⊕ × ×
× ×
→ + × ×
⊕
× ×
⊕ × ×
⇒
× ×
⇒
× ×
.
×
×
×
× × ×
0 0 0 × × 0 0 0 × × 0 0 0 × ×
Phase 1 uses n(n − 1)/2 row and column rotations. Most of the work is used to apply these
rotations to U and V . This requires 2n2 (m + n) flops if standard plane rotations are used. For
362 Chapter 7. SVD Algorithms and Matrix Functions
updating least squares solutions we only need to update V , Σ, and c = U T b. The dominating
term is then reduced to 2n3 flops. Zha [1144, 1992] shows that the work can be halved by using
a two-way chasing scheme in the reduction to bidiagonal form. Phase 2 typically requires about
3n3 flops. Note that Σ and V can be updated without U being available. From the interlacing
property (Theorem 1.3.5) it follows that the smallest singular value will increase. Hence the rank
cannot decrease.
When the SVD is to be modified by deleting a row, with no loss of generality we can assume
that the first row of A is to be deleted. Then we wish to determine the SVD of à ∈ R(m−1)×n
when the SVD T
z Σ
A= =U VT (7.2.29)
à 0
is known. This problem can be reduced to a modified eigenvalue problem of the form
The interlacing property now gives d1 ≥ d˜1 ≥ d2 ≥ · · · ≥ d˜n−1 ≥ dn ≥ d˜n ≥ 0. Hence the
Bunch–Nielsen scheme is readily adapted to solving this problem.
Park and Van Huffel [882, 1995] give a backward stable algorithm based on finding the SVD
of (e1 , A), where e1 is an added dummy column. Then
1 0 u1 Σ
U T (e1 , A) = ,
0 V u2 0
where (uT1 , uT2 ) is the first row of U . First, determine left and right plane rotations G1 and G2 so
that
1 wT
u1 Σ
G1 G2 = 0 B̃ , (7.2.31)
u2 0
0 0
with B̃ upper bidiagonal. This can be achieved by a chasing scheme similar to that used when
adding a row. The desired bidiagonal form is built from bottom to top, while nonzeros are chased
into the lower-right corner. The reduction is pictured below for k = 3, n = 4:
↓ ↓
× × × × × ×
→
× × +
×
× ⊕
×
×
→
⊕ + × ⇒
0
× × ⇒ →0
× × +
0 × × 0 + × × →0 ⊕ × ×
0 × 0 × 0 ×
↓ ↓
× × × ×
× × × ×
0
× × ⊕ ⇒ 0
× × .
0 × × →0 × ×
0 + × → 0 ⊕ ×
A total of (n − 1)2 + 1 plane rotations are needed to make the first column of G1 U T equal to
eT1 . From orthogonality it follows that this matrix must have the form
T 1 0
G1 U = ,
0 Ū T
7.3. Computing Selected Singular Triplets 363
with Ū orthogonal. Since no rotation from the right involves the first column, the transformed
matrix has the form
α 0 1 0
G2 = .
0 V 0 V̄
It now follows that
T
α wT
α 0 1 z 1 0
=0 B̃ ,
0 Ū T 0 Ã 0 V̄
0 0
B̃
which gives Ū T ÃV̄ = . In the second phase, the implicit QRSVD is used to reduce B̃ to
0
diagonal form Σ̃. Simultaneously Ū and V̄ are updated.
In general it is not necessary, or even advisable, to form AHA or AAH . The squaring of the
singular values is a drawback, as it will force the clustering of small singular values. Instead, one
may consider the equivalent Hermitian eigenvalue problem
0 A u u
= ±σ . (7.3.2)
AH 0 ±v ±v
This yields the singular values and both the left and right singular vectors. If r = rank(A),
then M has 2r nonzero eigenvalues, ±σ1 (A), . . . , ±σr (A). Here the small singular values of A
correspond to interior eigenvalues of the Hermitian matrix.
Let A ∈ Cn×n be a Hermitian matrix with eigenpairs λi , xi , i = 1, . . . , n. Given a unit initial
vector z (0) , the power method forms the vector sequence z (k) = Ak z (0) using the recursion
This only requires the ability to form products Az for given vectors z. If the eigenvalues
Pn satisfy
|λ1 | > |λ2 | ≥ · · · ≥ |λn |, expanding z (0) along the eigenvectors gives z (0) = j=1 αj xj and
n n
(k) k (0)
X X λ j k
z =A z = λkj αj xj = λk1 α1 x1 + αj xj , (7.3.4)
j=1 j=2
λ1
k = 1, 2, . . . . If α1 ̸= 0 and |λj |/|λ1 | < 1 (j ̸= 1), it follows from (7.3.4) that z (k) converges
with linear rate |λ2 |/|λ1 | to the normalized eigenvector x1 as k → ∞. To avoid overflow or
underflow, recursion (7.3.3) should be modified to
xH Ax
λ= . (7.3.6)
xH x
Theorem 7.3.1. Let x be given a unit vector and A be a Hermitian matrix. Then (µ, x), where
µ = xH Ax is the Rayleigh quotient, is an exact eigenpair of A
e = A + E, where
(A + E)x = Ax − r = µx.
This shows that r and x are orthogonal eigenvectors of E H E, with both eigenvalues equal to
rH r = ∥r∥22 . The other eigenvalues are zero, and hence ∥E∥2 = ∥r∥2 .
1 Ax xH Ax 1
∇µ(x) = H − H 2 x = H (Ax − λx). (7.3.8)
2 x x (x x) x x
Hence the Rayleigh quotient µ(x) is stationary if and only if x is an eigenvector of A. Hence
µ(x) usually is a far more accurate approximate eigenvalue than x is an approximate eigenvector.
If we apply the Rayleigh quotient to the Hermitian system (7.3.2) we obtain
1 T T 0 A u
µ(u, v) = (u , ±v ) = ±uT Av, (7.3.9)
2 AT 0 ±v
where u and v are unit vectors. Here sign(v) can be chosen to give a real nonnegative value of
7.3. Computing Selected Singular Triplets 365
µ(u, v). Given approximate right and left singular vectors of A ∈ Rm×n , the Rayleigh quotient
approximations to the dominant singular value are
Theorem 7.3.2. For any scalar α and unit vectors u, v, there is a singular value σ of A such that
1 Av − uα
|σ − α| ≤ √ . (7.3.10)
2 AT u − vα 2
For fixed u, v this error bound is minimized by taking α equal to the Rayleigh quotient given in
(7.3.9).
The power method computes approximate eigenvectors of a Hermitian matrix A for the ei-
genvalue of largest magnitude. Approximations at the other end of the spectrum can be obtained
by applying the power method to A−1 . Given an initial unit vector v (0) , the inverse power
method computes the normalized sequence v (1) , v (2) , v (3) , . . . , by the recursion
v (k) = v (k−1) ,
Ab v (k) = vb(k) /∥b
v (k) ∥2 , k = 1, 2, . . . . (7.3.11)
Here v (k) will converge to a unit eigenvector corresponding to the Rayleigh quotient
µ−1
n ≈ (v
(k−1) H −1 (k−1)
) A v = (v (k−1) )H vb(k) .
By this spectral transformation, eigenvalues close to the shift µ are transformed into large and
well-separated eigenvalues of (A−µI)−1 ; see Figure 7.3.1. Given an initial vector v0 , the shifted
inverse power method computes the sequence of vectors
(A − µI)b
vk = vk−1 , k = 1, 2, . . . . (7.3.14)
σi2 ≈ µ + 1 (vk−1
H
vbk ). (7.3.15)
Shifted inverse iteration is usually attributed to Wielandt [1117, 1944] but can be traced back
to Jacobi’s work in 1844. It is a powerful method for computing an eigenvalue in a neighborhood
of the shift µ but requires computing a factorization of the shifted matrix A − µI.
366 Chapter 7. SVD Algorithms and Matrix Functions
4
θ2
1
θ3
0
λ1 λ2 λ3
−1
−2
−3
θ1
−4
−5
−3 −2 −1 0 1 2 3 4 5
Figure 7.3.1. Spectral transformation with shift µ = 1. Used with permission of Springer
International Publishing; from Numerical Methods in Matrix Computations, Björck, Åke, 2015; permission
conveyed through Copyright Clearance Center, Inc.
So far we have considered inverse iteration with a fixed shift µ. In Rayleigh-quotient it-
eration (RQI) a variable shift is used equal to the Rayleigh quotient of the current eigenvector
approximation.
1. If A − µk I is singular, then solve (A − µk I)vk+1 = 0 for unit vector vk+1 and stop.
Otherwise solve (A − µk I)v = vk .
If A is Hermitian, the Rayleigh quotient is stationary at eigenvectors, and the local rate of conver-
gence is cubic; see Parlett [884, 1998, Theorem 4.7.1]. This ensures that the number of correct
digits in vk triples at each step for k large enough.
The norm of the residual rk = Avk − µk vk is the best measure of the accuracy of (µk , vk )
as an eigenpair. A key fact in the global analysis of RQI is that for a Hermitian A the residual
norms decrease.
Theorem 7.3.3. For a Hermitian matrix A, the residual norms in RQI are monotonically de-
creasing: ∥rk+1 ∥2 ≤ ∥rk ∥2 . Equality holds only if µk+1 = µk and vk is an eigenvector of
(A − µk I)2 .
In the Hermitian case it is not necessary to assume that RQI converges to an eigenvector
corresponding to a simple eigenvalue. Either the iterates vk will converge cubically to an eigen-
vector of A, or the odd and even iterates will converge linearly to the bisectors of a pair of
eigenvectors of A. The latter situation is unstable under small perturbations, so RQI converges
from any starting vector; see Parlett [884, 1998, Sect. 4.9]. Note that RQI may not converge to
an eigenvalue closest to µ(v0 ). It is not in general obvious how to choose the starting vector to
make RQI converge to a particular eigenvalue.
Rayleigh-quotient iteration requires a new factorization of the shifted matrix A − µk I for
each iteration. It is therefore considerably more costly than inverse iteration. For a dense matrix
the cost for a factorization is O(n3 ) operations. For problems where A is large and sparse it
may not be feasible. Then (A − µk I)v = vk can be solved inexactly using an iterative solution
method.
Z0 = S, Zk = M Zk−1 , k = 1, 2, . . . , (7.3.16)
Zk = M k S = (M k s1 , . . . , M k sp ).
In applications, M is often a very large sparse matrix, and p ≪ n. If M has a dominant eigen-
value λ1 , then all columns of Zk will converge to a scalar multiple of the dominant eigenvec-
tor x1 . Therefore, Zk will be close to a matrix of numerical rank one, and it is not clear that
much will be gained. If S = span (S), subspace iteration is actually computing a sequence of
subspaces M k S = span (M k S). The problem is that Zk = M k S becomes an increasingly
ill-conditioned basis for M k S. To avoid this, orthogonality can be maintained between the basis
columns as follows. Orthogonal iteration starts with an orthonormal matrix Q0 and computes
Zk = M Qk−1 , Zk = Qk Rk , k = 1, 2, . . . . (7.3.17)
Here Rk plays the role of a normalizing matrix, and Q1 = Z1 R1−1 = M Q0 R1−1 . By induction,
it can be shown that
Qk Rk · · · R1 = M k Q0 . (7.3.18)
Hence the iterations (7.3.16) and (7.3.17) generate the same sequence of subspaces, R(M k Q0 ) =
R(Qk ). Since iteration (7.3.16) is less costly, it is sometimes preferable to perform the orthogo-
nalization in (7.3.17) only occasionally as needed. Bauer [94, 1957] suggests a procedure called
treppen-iteration (staircase iteration) to maintain linear independence of the basis vectors. This
is similar to orthogonal iteration but uses LU instead of QR factorizations.
Orthogonal iteration overcomes several disadvantages of the power method. Provided
|λp+1 /λp | is small, it can be used to determine the invariant subspace corresponding to the
dominant p eigenvalues. Assume that the eigenvalues of M satisfy
and let
U1H T11 T12
M (U1 U2 ) = (7.3.19)
U2H 0 T22
368 Chapter 7. SVD Algorithms and Matrix Functions
Lemma 7.3.5 (Watkins [1103, 1982]). Let S and S ⊥ be orthogonal complementary subspaces
of Cn . Then for all integers k the spaces M k S and (M H )−k S ⊥ are also orthogonal.
are equivalent in the sense that the orthogonal complement of a subspace in one sequence equals
the corresponding subspace in the other. This result is important for understanding convergence
properties of the QR algorithm. A geometric theory for QR and LR iterations is given by Parlett
and Poole [885, 1973].
M y − θy ⊥ Sk . (7.3.20)
Let Sk = R(Qk ) for some orthonormal matrix Qk , and set y = Qk z. Then condition (7.3.20)
can be written
QH
k (M − θI)Qk z = 0
(Hk y − θI)z = 0, Hk = QH
k M Qk . (7.3.21)
7.3. Computing Selected Singular Triplets 369
The matrix Hk ∈ Ck is Hermitian and is the matrix Rayleigh quotient of M . Note that the
condition of this projected eigenvalue problem is not degraded. In practice, M is often large and
sparse, and one is only interested in approximating part of its spectrum. If k ≪ n, the Hermitian
eigenvalue problem (7.3.21) is small and can be solved by a standard method, such as the QR
algorithm. The solution yields k approximate eigenvalues and eigenvectors of M as described in
the procedure below.
Hk = QH
k (M Qk ) ∈ R
k×k
. (7.3.22)
2. Compute the Ritz values (the eigenvalues of Hk ) and select from them p ≤ k desired
approximate eigenvalues θi , i = 1, . . . , p. Then compute the corresponding eigenvectors
zi :
Hk zi = θi zi , i = 1, . . . , p. (7.3.23)
Backward error bounds for the approximate eigenvalues θi , i = 1 : p, are obtained from the
residuals
ri = M yi − yi θi = (M Qk )zi − yi θi , i = 1 : p. (7.3.24)
The Ritz value θi is an exact eigenvalue for a matrix M + Ei , with ∥Ei ∥2 ≤ ∥ri ∥2 . The
corresponding forward error bound is |θi − λi | ≤ ∥ri ∥2 . The Rayleigh–Ritz procedure is optimal
in the sense that the residual norm ∥M Qk −Qk Hk ∥ is minimized for all unitarily invariant norms
by taking Hk equal to the matrix Rayleigh quotient (7.3.22).
No bound for the error in a Ritz vector yi can be given without more information. This is to be
expected, because if another eigenvalue is close to the Ritz value, the eigenvector is very sensitive
to perturbations. If the Ritz value θi is known to be well separated from other eigenvalues of M
except the closest one, then a bound on the error in the Ritz vector and also an improved error
bound for the Ritz value yi can be obtained. If λi is the eigenvalue of M closest to θi , then
|θi − λi | ≤ ∥ri ∥22 /gap (θi ), gap (θi ) = min |λj − θi |. (7.3.25)
j̸=i
When some of the intervals [θi − ∥ri ∥2 , θi + ∥ri ∥2 ], i = 1, . . . , k, overlap, we cannot be sure
of having an eigenvalue of M in each of these intervals. When the Ritz values are clustered, the
following theorem provides useful bounds for individual eigenvalues of M .
370 Chapter 7. SVD Algorithms and Matrix Functions
Theorem 7.3.6. Let M ∈ Cn×n be Hermitian, let Qk ∈ Cn×p be any orthonormal matrix, and
set
H = QHk M Qk , R = M Qk − Qk B.
Then to the eigenvalues θ1 , . . . , θk of H there correspond eigenvalues λ1 , . . . , λk of M such that
|λi − θi | ≤ ∥R∥2 , i = 1 : p. Furthermore, there are eigenvalues λi of M such that
p
X
(λi − θi )2 ≤ 2∥R∥2F .
i=1
Unless the Ritz values are well separated, there is no guarantee that the Ritz vectors are
good approximations to an eigenvalue of M . This difficulty arises because B may have spurious
eigenvalues bearing no relation to the spectrum of M . This problem can be resolved by using a
refined Ritz vector as introduced by Jia [668, 2000]. This is the solution y to the problem
The solution is given by a right singular vector z corresponding to the smallest singular value of
M Qk − θQk . Since M Qk must be formed anyway in the Rayleigh–Ritz procedure, the extra
cost is only that of computing the SVD of a matrix of size n × k. In the Hermitian case the
Ritz vectors can be chosen so that Z = (z1 , . . . , zk ) is unitary and the projected matrix B is
Hermitian. Then each Ritz value θi can be paired with a corresponding eigenvalue λi of M.
For determining interior and small eigenvalues of M it is more appropriate to use the har-
monic Ritz values introduced by Paige, Parlett, and van der Vorst [864, 1995]. Given the sub-
space span(Qk), the harmonic projection method requires that
$$M y - \theta y \perp M S_k, \qquad y = Q_k z \in S_k. \qquad (7.3.29)$$
This is a generalized symmetric eigenvalue problem, and the eigenvalues are the harmonic Ritz values. If the basis matrix $Q_k$ is chosen so that $V_k = M Q_k$ is orthonormal, then (7.3.29) becomes $(M Q_k)^H (M Q_k - \theta Q_k)z = 0$, or, because $Q_k = M^{-1}V_k$,
$$\bigl(I - \theta\, V_k^H M^{-1} V_k\bigr) z = 0.$$
More directly, the GKL bidiagonalization (Section 4.2.3) of a rectangular matrix A ∈ Rm×n ,
m ≥ n, can be used to implement the Rayleigh–Ritz procedure. Starting with a unit vector
v1 ∈ Rn , this computes u1 = Av1 /∥Av1 ∥2 ∈ Rm , and for i = 1, 2, . . . ,
γi+1 vi+1 = AT ui − ρi vi , (7.3.31)
ρi+1 ui+1 = Avi+1 − γi+1 ui . (7.3.32)
Here γi+1 and ρi+1 are nonnegative scalars chosen so that ui+1 , vi+1 are unit vectors. With
Uk = (u1 , . . . , uk ), Vk = (v1 , . . . , vk ), the recurrence relations can be summarized as
AVk = Uk Bk , AT Uk = Vk BkT + γk+1 vk+1 eTk , (7.3.33)
where
$$B_k = \begin{pmatrix} \rho_1 & \gamma_2 & & & \\ & \rho_2 & \gamma_3 & & \\ & & \ddots & \ddots & \\ & & & \rho_{k-1} & \gamma_k \\ & & & & \rho_k \end{pmatrix} \in \mathbb{R}^{k\times k}$$
is upper bidiagonal. Note that
γk+1 = ∥rk+1 ∥2 , rk+1 = AT uk − ρk vk .
If γk+1 = 0, it follows from (7.3.33) that the singular values of Bk are singular values of A, and
the associated singular vectors can be obtained from the SVD of Bk and Uk and Vk .
The columns of $V_k$ and $U_k$ form orthonormal bases for the Krylov subspaces $\mathcal{K}_k(A^TA, v_1)$ and $\mathcal{K}_k(AA^T, Av_1)$, respectively. From (7.3.33) the factorization for the equivalent Hermitian problem (7.3.2) is
$$\begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix}\begin{pmatrix} U_k & 0 \\ 0 & V_k \end{pmatrix} = \begin{pmatrix} U_k & 0 \\ 0 & V_k \end{pmatrix}\begin{pmatrix} 0 & B_k \\ B_k^T & 0 \end{pmatrix} + \begin{pmatrix} 0 \\ \gamma_{k+1} v_{k+1} e_k^T \end{pmatrix}. \qquad (7.3.34)$$
To avoid spurious singular values caused by loss of orthogonality in Uk and Vk in floating-
point arithmetic, a selective reorthogonalization scheme can be used. As shown by Simon and
Zha [996, 2000] it may suffice to reorthogonalize either Vk or Uk , with considerable savings in
storage and operations.
After k steps of the bidiagonalization process the projected Rayleigh quotient matrix is given
by Bk = UkT AVk . The Rayleigh–Ritz procedure for the Krylov subspaces Kk (ATA, v1 ) and
Kk (AAT , Av1 ) computes the SVD
$$B_k = P_k \Omega_k Q_k^T, \qquad \Omega_k = \mathrm{diag}(\omega_1, \ldots, \omega_k), \qquad (7.3.35)$$
to obtain Ritz values ωi and left/right Ritz vectors v̂i = Vk Qk ei and ûi = Uk Pk ei . The largest
singular values of A tend to be quite well approximated by ωi for k ≪ n. Hochstenbach [635,
2004] shows that for nested subspaces the Ritz values approach the largest singular values mono-
tonically from above.
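A minimal Python sketch of the recurrences (7.3.31)–(7.3.32), using full reorthogonalization (simpler, though costlier, than the selective scheme of Simon and Zha mentioned above); the matrix A and the number of steps k are placeholders, and the Ritz values are read off from the SVD of B_k as in (7.3.35).

```python
import numpy as np

def gkl_bidiag(A, k, v1=None, rng=np.random.default_rng(0)):
    """k steps of Golub-Kahan-Lanczos bidiagonalization of A (m x n).
    Returns U (m x k), V (n x k), and the upper bidiagonal B_k (k x k)."""
    m, n = A.shape
    v = v1 if v1 is not None else rng.standard_normal(n)
    v /= np.linalg.norm(v)
    U = np.zeros((m, k)); V = np.zeros((n, k))
    rho = np.zeros(k)                    # diagonal entries rho_1, ..., rho_k
    gamma = np.zeros(k)                  # superdiagonal entries gamma_2, ..., gamma_k
    V[:, 0] = v
    u = A @ v; rho[0] = np.linalg.norm(u); U[:, 0] = u / rho[0]
    for i in range(k - 1):
        r = A.T @ U[:, i] - rho[i] * V[:, i]
        r -= V[:, :i+1] @ (V[:, :i+1].T @ r)    # full reorthogonalization of V
        gamma[i] = np.linalg.norm(r); V[:, i+1] = r / gamma[i]
        p = A @ V[:, i+1] - gamma[i] * U[:, i]
        p -= U[:, :i+1] @ (U[:, :i+1].T @ p)    # full reorthogonalization of U
        rho[i+1] = np.linalg.norm(p); U[:, i+1] = p / rho[i+1]
    B = np.diag(rho) + np.diag(gamma[:k-1], 1)  # upper bidiagonal B_k
    return U, V, B

# Ritz approximations to the largest singular triplets of A
A = np.random.default_rng(1).standard_normal((500, 80))
U, V, B = gkl_bidiag(A, 20)
P, omega, Qt = np.linalg.svd(B)                 # B_k = P_k Omega_k Q_k^T
u_ritz, v_ritz = U @ P, V @ Qt.T                # left/right Ritz vectors
```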
Small singular values are approached irregularly, but harmonic Ritz values converge to the
smallest singular values from above. Different extraction methods for singular values and vec-
tors are compared by Hochstenbach [635, 2004]. His numerical experiments confirm that for
extracting large singular values, the standard method works well. For interior or small singular
values, harmonic Ritz values perform better. The harmonic Ritz values θi satisfy the generalized
eigenproblem
$$\begin{pmatrix} 0 & B_k \\ B_k^T & 0 \end{pmatrix}\begin{pmatrix} s_i \\ w_i \end{pmatrix} = \frac{1}{\theta_i}\begin{pmatrix} B_k B_k^T + \gamma_{k+1}^2 e_k e_k^T & 0 \\ 0 & B_k^T B_k \end{pmatrix}\begin{pmatrix} s_i \\ w_i \end{pmatrix}; \qquad (7.3.36)$$
see Jia and Niu [671, 2010]. This result can also be obtained from similar formulas for the
Lanczos method given by Baglama, Calvetti, and Reichel [54, 2003]. It follows that the harmonic
Ritz value θi and Ritz vectors si , wi can be obtained more simply from the singular values and
right singular vectors of the lower bidiagonal matrix
$$\begin{pmatrix} B_k^T \\ \gamma_{k+1} e_k^T \end{pmatrix} = \begin{pmatrix} \rho_1 & & & & \\ \gamma_2 & \rho_2 & & & \\ & \gamma_3 & \ddots & & \\ & & \ddots & \rho_{k-1} & \\ & & & \gamma_k & \rho_k \\ & & & & \gamma_{k+1} \end{pmatrix} \in \mathbb{R}^{(k+1)\times k}.$$
With $\tilde s_i = s_i/\|s_i\|_2$ and $\tilde w_i = w_i/\|w_i\|_2$, the corresponding Ritz vectors are $\tilde u_i = U_k\tilde s_i$ and $\tilde v_i = V_k\tilde w_i$.
To improve convergence and reliability, Jia recommends that the Rayleigh quotient $\rho_i = \tilde s_i^T B_k \tilde w_i$ be used as an approximation of $\sigma_i$ rather than $\theta_i$.
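A small Python sketch of this extraction step (assuming NumPy; the arrays rho and gamma, holding $\rho_1,\ldots,\rho_k$ and $\gamma_2,\ldots,\gamma_{k+1}$ from a bidiagonalization, are placeholders): the lower bidiagonal matrix displayed above is formed explicitly and, per the statement quoted from Jia and Niu, its singular values and right singular vectors supply the harmonic Ritz data.

```python
import numpy as np

def harmonic_ritz_from_bidiag(rho, gamma):
    """Form the (k+1) x k lower bidiagonal matrix [B_k^T; gamma_{k+1} e_k^T]
    from rho = (rho_1,...,rho_k) and gamma = (gamma_2,...,gamma_{k+1}) and
    return its singular values and right singular vectors, from which the
    harmonic Ritz approximations are read off (a sketch)."""
    k = len(rho)
    L = np.zeros((k + 1, k))
    L[np.arange(k), np.arange(k)] = rho            # diagonal rho_1 .. rho_k
    L[np.arange(1, k + 1), np.arange(k)] = gamma   # subdiagonal gamma_2 .. gamma_{k+1}
    _, sigma, Wt = np.linalg.svd(L)                # descending singular values
    return sigma, Wt.T                             # smallest ones sit at the end
```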
The computed Ritz vectors may exhibit slow and irregular convergence even though the Ritz
value has converged. Jia and Niu [671, 2010] propose a refined strategy that combines harmonic
extraction with the refined projection principle. After using harmonic extraction to compute
$\rho_i = \tilde s_i^T B_k \tilde w_i$, $\tilde u_i$, and $\tilde v_i$, it computes the smallest singular value $\sigma_{\min}$ and the corresponding right singular vector $z_i = (x_i^T, y_i^T)^T$ of the matrix
$$\begin{pmatrix} 0 & B_k \\ B_k^T & 0 \\ \gamma_k e_k^T & 0 \end{pmatrix} - \rho_i \begin{pmatrix} I & 0 \\ 0 & I \\ 0 & 0 \end{pmatrix}. \qquad (7.3.38)$$
Then new left and right approximate singular vectors are extracted from $z_i$.
The matrix $T_k = B_k^T B_k$ is symmetric and tridiagonal. Hence, the implicitly restarted Lanczos algorithm of Sorensen [1011, 1992] could be applied to $\widehat T_k = B_k^T B_k - \mu^2 I$.
Björck, Grimme, and Van Dooren [145, 1994] show that forming BkT Bk can be avoided by
applying Golub–Reinsch QRSVD steps to Bk directly; see Section 7.1.4. First, a Givens rotation
$G_l^{(1)}$ is determined so that
$$G_l^{(1)} \begin{pmatrix} \rho_1^2 - \mu^2 \\ \rho_1\gamma_2 \end{pmatrix} = \begin{pmatrix} * \\ 0 \end{pmatrix}.$$
This creates in $G_l^{(1)} B_k^T$ an unwanted nonzero element in position (1, 2). Next, the bidiagonal form of $G_l^{(1)} B_k^T$ is restored using k − 1 additional left and right Givens rotations to chase out the unwanted nonzero element, giving
$$A\widehat V_k = \widehat U_k \widehat B_k, \qquad A^T \widehat U_k = \widehat V_k \widehat B_k^T + \gamma_{k+1} v_{k+1} e_k^T Q_k. \qquad (7.3.40)$$
However, the last relation is not a valid relation for the bidiagonalization algorithm because the
residual term takes on the invalid form
where q(i,j) is the (i, j)th element in Qk . This can be dealt with by sacrificing one step. Equating
the first (k − 1) columns of the second relation in (7.3.40), we obtain
$$A^T \widehat U_{k-1} = \widehat V_{k-1} \widehat B_{k-1}^T + \hat r_k e_{k-1}^T, \qquad \hat r_k = \hat\gamma_k v_k + \gamma_{k+1} q_{(k+1,k)} v_{k+1}. \qquad (7.3.42)$$
Similarly, taking the first k − 1 columns of the first relation in (7.3.40) gives the restarted analogue $A\widehat V_{k-1} = \widehat U_{k-1}\widehat B_{k-1}$. It can be shown that $\widehat U_{k-1}$, $\widehat V_{k-1}$, and $\widehat B_{k-1}$ are what would have been obtained after k − 1 steps of bidiagonalization with a unit starting vector proportional to $(A^TA - \mu^2 I)v_1$. An efficient strategy for restarting the Arnoldi or Lanczos process, proposed by
Sorensen [1011, 1992] and Lehoucq, Sorensen, and Yang [731, 1998], is to use unwanted Ritz
values as shifts to cause the resulting subspaces to contain more information about the desired
singular values. For example, to compute the p largest (smallest) singular triplets, the shifts are
chosen as the k − p smallest (largest) singular values of Bk .
The standard implicitly restarted Lanczos tridiagonalization can suffer from numerical insta-
bilities caused by propagated round-off errors; see Lehoucq, Sorensen, and Yang [735, 1998].
An alternative is to perform the implicit restarts by augmenting the Krylov subspaces by certain
Ritz vectors. This process is mathematically equivalent to standard implicit restarts but is more
stable. A description of how to restart the bidiagonalization process by this method is found in
Baglama and Reichel [56, 2005].
PROPACK is a software package that uses bidiagonalization to compute selected singular
triplets. The initial work on PROPACK is described by Larsen [722, 1998]. Later versions
include implicit restarts and partial reorthogonalization; see Larsen [723, 2000]. An overview
of PROPACK versions is found at http://soi.stanford.edu/~rmunk/PROPACK/. The al-
gorithm IRLANB of Kokiopoulou, Bekas, and Gallopoulos [703, 2004] computes a few of the
smallest singular values. It uses an implicitly restarted bidiagonalization process with partial
reorthogonalization and harmonic Ritz values. A refinement process is applied to converged
singular vectors. Deflation is applied directly on the bidiagonalization process. The implic-
itly restarted block-Lanczos algorithm irbleigs of Baglama, Calvetti, and Reichel [55, 2003]
computes a few eigenpairs of a Hermitian matrix. It can be used to obtain singular triplets by
applying it to an equivalent Hermitian eigenproblem. The algorithm irlba of Baglama and Re-
ichel [56, 2005] is directly based on the bidiagonalization process with standard or harmonic
Ritz values. A block bidiagonalization version is given by Baglama and Reichel [57, 2006].
Let
$$M_d = (I - y_k y_k^H)\, M\, (I - y_k y_k^H)$$
be the corresponding projected matrix. Then in each step of the iteration an equation of the form $M_d z = u$, where $z, u \perp y_k$, has to be solved. This can be done as in (7.3.46) by computing
$$z = \alpha M^{-1} y_k - M^{-1} u, \qquad \alpha = \frac{y_k^H M^{-1} u}{y_k^H M^{-1} y_k}.$$
Here M −1 yk and ykH M −1 yk need only be computed in the first iteration step. Only one appli-
cation of the preconditioner M is needed in later steps.
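A Python sketch of this deflated solve, assuming a user-supplied callable apply_Minv that applies the preconditioner inverse (the name is a placeholder). The quantities $M^{-1}y_k$ and $y_k^H M^{-1} y_k$ are cached so that, as stated above, later steps need only one application of the preconditioner.

```python
import numpy as np

class DeflatedSolve:
    """Apply the deflated preconditioner step used in Jacobi-Davidson-type
    inner iterations, following the formula in the text (a sketch)."""
    def __init__(self, apply_Minv, y):
        self.apply_Minv = apply_Minv
        self.y = y
        self.Minv_y = apply_Minv(y)                # computed once
        self.yH_Minv_y = np.vdot(y, self.Minv_y)   # computed once
    def solve(self, u):
        Minv_u = self.apply_Minv(u)                # one preconditioner apply per step
        alpha = np.vdot(self.y, Minv_u) / self.yH_Minv_y
        return alpha * self.Minv_y - Minv_u        # result is orthogonal to y
```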
The Jacobi–Davidson method is among the most effective methods for computing a few in-
terior eigenvalues of a large sparse matrix, particularly when a preconditioner is available or
generalized eigenvalue problems are considered. Other methods, such as the “shift-and-invert”
variants of Lanczos and Arnoldi, require factorization of the shifted matrix. Moreover, the re-
sulting linear systems need to be solved accurately. Therefore, these methods are not well suited
to combinations with iterative methods as solvers for the linear systems. In Jacobi–Davidson
methods, such expensive factorizations can be avoided. Efficient preconditioned iterative solvers
can be used in inner iterations.
The Jacobi–Davidson method was introduced by Sleijpen and van der Vorst [1002, 1996],
[1003, 2000]. For a survey of variations and applications of this method, see Hochstenbach
and Notay [636, 2006]. Jacobi–Davidson algorithms for the generalized eigenvalue problem
are given in Fokkema, Sleijpen, and van der Vorst [415, 1998]. Variable preconditioners for
eigenproblems are studied by Eldén and Simoncini [381, 2002].
ARPACK is an implementation of the implicitly restarted Arnoldi method. It has become the
most successful and best known public domain software package for solving large-scale eigen-
value problems. ARPACK can be used for finding a few eigenvalues and eigenvectors of large
symmetric or unsymmetric standard or generalized eigenvalue problems; see the users’ guide of
Lehoucq, Sorensen, and Yang [731, 1998]. In MATLAB the eigs function is an interface to
ARPACK. The block Lanczos code of Grimes, Lewis, and Simon [538, 1994] and its updates
are often used for structural analysis problems in industrial applications. A selection of other
software packages freely available are listed in Sorensen [1012, 2002]. An ARPACK-based iter-
ative method for solving large-scale quadratic problems with a quadratic constraint is developed
in Rojas, Santos, and Sorensen [931, 2008].
Ritz published his method in [930, 1908]. Lord Rayleigh incorrectly claimed in 1911 that all the ideas in
Ritz’s work were present in his earlier paper [914, 1899].
Golub, Luk, and Overton [498, 1981] develop a block Lanczos method for computing se-
lected singular values and vectors of a matrix. Other Krylov subspace algorithms for computing
singular triplets are given by Cullum, Willoughby, and Lake [279, 1983]. Codes for partial sin-
gular value decompositions of sparse matrices for application to information retrieval problems
and seismic tomography are given by Berry [113, 1992], [114, 1993], [115, 1994]. Sleijpen and
van der Vorst [1002, 1996] develop an alternative Jacobi–Davidson algorithm for the partial Her-
mitian eigenproblem. A similar algorithm called JDSVD for computing singular triplets is given
by Hochstenbach [634, 2001].
Traditional inverse iterations use several Rayleigh quotient shifts for each singular value,
or just one factorization, and apply bidiagonalization on the shifted and inverted problem. In
Ruhe [941, 1998] and a series of other papers the Rational Krylov subspace methods are devel-
oped, which attempt to combine the virtues of these two approaches. Ruhe iterates with several
shifts to build up one basis from which several singular values can be computed.
Okša, Yamamoto, and Vajteršic [836, 2022] show the convergence to singular triplets for a
two-sided block-Jacobi method with dynamic ordering.
Suppose that the power series
$$f(z) = \sum_{k=0}^{\infty} a_k z^k \qquad (7.4.1)$$
has radius of convergence r ∈ (0, ∞). Then (7.4.1) converges uniformly for any |z| < r and diverges for any |z| > r. In the interior of the circle of convergence, formal operations such as termwise differentiation and integration with respect to z are valid. Consider now the related matrix power series
$$f(A) = \sum_{k=0}^{\infty} a_k A^k. \qquad (7.4.2)$$
Furthermore, Af (A) = f (A)A, i.e., f (A) commutes with A. An important example of a matrix
function is the matrix exponential eA . This can be defined by its series expansion
$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots$$
for any matrix A ∈ Cn×n . Other examples are the matrix square root and sign functions, which
are treated next.
The previous assumption that A is diagonalizable is not necessary. Any matrix A ∈ Cn×n is
similar to a block diagonal matrix with almost diagonal matrices, which reveals its algebraic
properties. This is the Jordan canonical form named after the French mathematician Marie
Ennemond Camille Jordan (1838–1922).
Theorem 7.4.1 (Jordan Canonical Form). Any matrix $A \in \mathbb{C}^{n\times n}$ is similar to the block diagonal matrix
$$J = \mathrm{diag}\bigl(J_{m_1}(\lambda_1), \ldots, J_{m_t}(\lambda_t)\bigr), \qquad A = XJX^{-1}, \qquad (7.4.4)$$
where
$$J_{m_i}(\lambda_i) = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix} = \lambda_i I + S_i \in \mathbb{C}^{m_i\times m_i}, \quad i = 1 : t, \qquad (7.4.5)$$
are Jordan blocks and $S_i$ are shift matrices. The numbers $m_1, \ldots, m_t$ are unique and $\sum_{i=1}^{t} m_i = n$. The form (7.4.4) is called the Jordan canonical form and is unique up to the ordering of the Jordan blocks.
A proof of this fundamental theorem is given in Horn and Johnson [639, 1985, Sect. 3.1]. It is
quite long and is therefore omitted here. The following result follows from an explicit expression
of the powers of a single Jordan block.
Theorem 7.4.2. Let A have the Jordan canonical form (7.4.4). Assume that f (λ) and its first
mk − 1 derivatives are defined for λ = λk , k = 1 : t. Then the function f (A) is said to be defined
on the spectrum of A, and
$$f(A) = X\,\mathrm{diag}\bigl(f(J_{m_1}(\lambda_1)), \ldots, f(J_{m_t}(\lambda_t))\bigr)X^{-1}, \qquad (7.4.6)$$
where
$$f(J_{m_k}) = f(\lambda_k)I + \sum_{p=1}^{m_k-1}\frac{1}{p!}f^{(p)}(\lambda_k)S^p
= \begin{pmatrix} f(\lambda_k) & f'(\lambda_k) & \cdots & \dfrac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\ & f(\lambda_k) & \ddots & \vdots \\ & & \ddots & f'(\lambda_k) \\ & & & f(\lambda_k) \end{pmatrix}. \qquad (7.4.7)$$
If f is a multivalued function, and a repeated eigenvalue of A occurs in more than one Jordan
block, then the same branch of f and its derivatives is usually taken. This choice gives a primary
matrix function that is expressible as a polynomial in A. In the following it is assumed that f (A)
is a primary matrix function unless stated otherwise. Then the Jordan canonical form definition
(7.4.6) does not depend on the ordering of the Jordan blocks.
There are several equivalent ways to define a function of a matrix. One definition, due to
Sylvester (1883), uses polynomial interpolation. Denote by λ1 , . . . , λt the distinct eigenvalues
of A, and let mk be the index of λk , i.e., the order of the largest Jordan block containing λk .
Assume that the function is defined on the spectrum Λ(A) of A. Then f(A) = p(A), where p is the unique Hermite interpolating polynomial of degree less than $n^* = \sum_{k=1}^{t} m_k$ that satisfies the interpolating conditions
$$p^{(j)}(\lambda_k) = f^{(j)}(\lambda_k), \qquad j = 0 : m_k - 1, \quad k = 1 : t.$$
Note that the coefficients of the interpolating polynomial depend on A and that f (A) commutes
with A. It is well known that this interpolating polynomial exists and can be computed by
Newton's interpolation formula
$$f(A) = f(\lambda_1)I + \sum_{j=1}^{n^*-1} f[\lambda_1, \lambda_2, \ldots, \lambda_{j+1}]\,(A - \lambda_1 I)\cdots(A - \lambda_j I), \qquad (7.4.9)$$
where $\lambda_j$, $j = 1 : n^*$, are the distinct eigenvalues of A, each counted with the same multiplicity as in the minimal polynomial. Thus, $n^*$ is the degree of the minimal polynomial of A. Formulas
for complex Hermite interpolation are given in Dahlquist and Björck [284, 2008, Sect. 4.3.2].
The definitions by the Jordan canonical form and polynomial interpolation can be shown to be
equivalent. Theory and computation of matrix functions are admirably surveyed in the seminal
monograph of Higham [625, 2008].
A square root of a matrix $A \in \mathbb{C}^{n\times n}$ is any solution X of the equation
$$X^2 = A. \qquad (7.4.10)$$
If A has no eigenvalues on the closed negative real axis, then there is a unique principal square
root such that −π/2 < arg(λ(X)) < π/2. The principal square root is denoted by A1/2 . When
it exists, it is a polynomial in A. If A is Hermitian and positive definite, then the principal square
root is the unique Hermitian and positive definite square root. If A is real and has a square root,
then A1/2 is real.
The square root of a matrix may not exist. For example, it is easy to verify that
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$
cannot have a square root. To ensure that a square root exists, it suffices to assume that A has at
most one zero eigenvalue. If A is nonsingular and has s distinct eigenvalues, then it has precisely
2s square roots that are expressible as polynomials in the matrix A.
The principal square root of A can be computed directly using only the Schur decomposition $A = QTQ^H$ with T upper triangular; see Björck and Hammarling [146, 1983]. Then $A^{1/2} = QSQ^H$, where S is upper triangular and satisfies $S^2 = T$, i.e., $\sum_{k=i}^{j} s_{ik}s_{kj} = t_{ij}$. Starting with $s_{ii} = t_{ii}^{1/2}$, $i = 1 : n$, the off-diagonal elements of S can be computed one diagonal at a time from these relations in $n^3/3$ flops:
$$s_{ij} = \Bigl(t_{ij} - \sum_{k=i+1}^{j-1} s_{ik}s_{kj}\Bigr)\Big/(s_{ii} + s_{jj}), \qquad 1 \le i < j \le n. \qquad (7.4.12)$$
If tii = tjj we take sii = sjj , so this recursion does not break down. (Recall that we have
assumed that at most one diagonal element of T is zero.) The arithmetic cost of this algorithm is
dominated by the 25n3 flops required for computing Q and T in the Schur decomposition. When
A is a normal matrix (AHA = AAH ), T is diagonal. In this case, S is diagonal and the flop count
is reduced to 9n3 . A modified algorithm by Higham [611, 1987] avoids complex arithmetic for
real matrices with some complex eigenvalues by using the real Schur decomposition.
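A Python sketch of this Schur-based recursion, using scipy.linalg.schur with complex output; the code follows (7.4.12) and is meant only to illustrate the ordering of the computation, not as a library-quality implementation.

```python
import numpy as np
from scipy.linalg import schur

def sqrtm_schur(A):
    """Principal square root via the (complex) Schur form A = Q T Q^H and
    the recursion (7.4.12); assumes the principal square root exists."""
    T, Q = schur(A, output='complex')
    n = T.shape[0]
    S = np.zeros_like(T)
    np.fill_diagonal(S, np.sqrt(np.diag(T)))       # s_ii = t_ii^{1/2}
    for d in range(1, n):                          # one superdiagonal at a time
        for i in range(n - d):
            j = i + d
            s = T[i, j] - S[i, i+1:j] @ S[i+1:j, j]
            S[i, j] = s / (S[i, i] + S[j, j])
    return Q @ S @ Q.conj().T

# quick check
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 6 * np.eye(6)    # eigenvalues well away from the negative axis
X = sqrtm_schur(A)
print(np.linalg.norm(X @ X - A))
```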
In applications where it is too costly to compute the Schur decomposition, an iterative method
can be used. Assume that A ∈ Cn×n has a principal square root, and let Xk be an approximation
to $A^{1/2}$. If $X_{k+1} = X_k + H_k$, then the correction satisfies
$$X_k H_k + H_k X_k = A - X_k^2. \qquad (7.4.13)$$
To solve for the correction $H_k$ requires solving the Sylvester equation (7.4.13), which is expensive. If the initial approximation $X_0$ in (7.4.13) is chosen as a polynomial in A, e.g., $X_0 = I$ or $X_0 = A$, then all subsequent iterates $X_k$ commute with A. Then (7.4.13) simplifies to
$$X_{k+1} = \frac{1}{2}\bigl(X_k + AX_k^{-1}\bigr), \qquad (7.4.14)$$
which is the matrix version of the well-known scalar iteration zk+1 = (zk + a/zk )/2 for the
square root of a. Unfortunately, iteration (7.4.14) is unstable and converges only if A is very well-
conditioned. Divergence is caused by rounding errors that make the computed approximation Xk
fail to commute with A; see Higham [610, 1986].
Several stable modifications of the simplified Newton iteration (7.4.14) have been suggested;
see Iannazzo [653, 2003]. Denman and Beavers [313, 1976] rewrite (7.4.14) as
$$X_{k+1} = \frac{1}{2}\bigl(X_k + A^{1/2}X_k^{-1}A^{1/2}\bigr).$$
With $Y_k = A^{-1/2}X_kA^{-1/2}$, this gives the coupled iteration: $X_0 = A$, $Y_0 = I$,
$$X_{k+1} = \frac{1}{2}\bigl(X_k + Y_k^{-1}\bigr), \qquad Y_{k+1} = \frac{1}{2}\bigl(Y_k + X_k^{-1}\bigr). \qquad (7.4.15)$$
This iteration is stable with a quadratic rate of convergence, and $\lim_{k\to\infty} X_k = A^{1/2}$, $\lim_{k\to\infty} Y_k = A^{-1/2}$. Another stable modification of Newton's iteration due to Meini [787, 2004] can be written: $X_0 = A$, $H_0 = \frac{1}{2}(I - A)$,
$$X_{k+1} = X_k + H_k, \qquad H_{k+1} = -\frac{1}{2}H_k X_{k+1}^{-1} H_k, \qquad k = 0, 1, 2, \ldots. \qquad (7.4.16)$$
The convergence rate of Meini’s iteration is quadratic and can be improved by scaling. Similar
Newton methods for computing the pth root of a matrix A1/p for p > 2 can be developed; see
Iannazzo [654, 2006].
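A minimal Python sketch of the coupled Denman–Beavers iteration (7.4.15); the stopping tolerance and iteration cap are arbitrary illustrative choices.

```python
import numpy as np

def sqrtm_denman_beavers(A, tol=1e-12, maxit=50):
    """Coupled Newton (Denman-Beavers) iteration (7.4.15):
    X_k -> A^{1/2}, Y_k -> A^{-1/2}, quadratic convergence."""
    X = np.array(A, dtype=float)
    Y = np.eye(X.shape[0])
    for _ in range(maxit):
        Xnew = 0.5 * (X + np.linalg.inv(Y))
        Ynew = 0.5 * (Y + np.linalg.inv(X))
        done = np.linalg.norm(Xnew - X, 1) <= tol * np.linalg.norm(Xnew, 1)
        X, Y = Xnew, Ynew
        if done:
            break
    return X, Y
```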
Newton-type methods need the inverse of Xk or its LU (or Cholesky) factorization in each
iteration. Another possibility is to use an inner iteration for computing the needed inverses. The
Schulz iteration [976, 1933] for computing A−1 is
−1 if ℜz < 0,
n
sign(z) = (7.4.18)
+1 if ℜz > 0
for all z ∈ C not on the imaginary axis. We assume in the following that A ∈ Cn×n is a
matrix with no eigenvalues on the imaginary axis. Its Jordan canonical form can be written
$A = X\,\mathrm{diag}(J_+, J_-)X^{-1}$, where the eigenvalues of $J_+$ lie in the open right half-plane and those of $J_-$ lie in the open left half-plane. Then
$$S = \mathrm{sign}(A) = X\begin{pmatrix} I_k & 0 \\ 0 & -I_{n-k} \end{pmatrix}X^{-1}. \qquad (7.4.19)$$
It follows that the eigenvalues of A in the right half-plane equal Λ(A11 ) and those in the left
half-plane are Λ(A22 ). This can be used to design spectral divide-and-conquer algorithms
for computing eigenvalue decompositions and other fundamental matrix decompositions via the
matrix sign function. The problem is recursively decoupled into two smaller subproblems by
using the sign function to compute an invariant subspace for a subset of the spectrum. This
type of algorithm can achieve more parallelism and have lower communication costs than other
standard eigenvalue algorithms; see Bai and Demmel [60, 1998].
For Hermitian (and real symmetric) matrices $A \in \mathbb{C}^{n\times n}$, the eigenvalue decomposition can be written $A = V\,\mathrm{diag}(\Lambda_+, \Lambda_-)V^H$, where the diagonal matrices $\Lambda_+$ and $\Lambda_-$ contain the positive and negative eigenvalues, respectively. Then
$$\mathrm{sign}(A) = V\,\mathrm{diag}(I_k, -I_{n-k})V^H = P,$$
where A = PH is the polar decomposition. If the unitary polar factor P is known, then
$$P + I = (V_1\ \ V_2)\,\mathrm{diag}(2I_k, 0)\,(V_1\ \ V_2)^H = 2V_1V_1^H, \qquad (7.4.24)$$
so $C = \frac{1}{2}(P + I) = V_1V_1^H$ is the orthogonal projector onto the invariant subspace associated with the positive eigenvalues. For a general matrix A with polar decomposition A = PH, if $H = V\Sigma V^H$ is the eigendecomposition of the Hermitian factor, then
$$A = (PV)\Sigma V^H = U\Sigma V^H$$
is an SVD of A with U = PV.
The matrix sign function can be computed by a scaled version of the Newton iteration for
X 2 = I:
$$X_0 = A, \qquad X_{k+1} = \frac{1}{2}\bigl(X_k + X_k^{-1}\bigr), \qquad k = 0, 1, 2, \ldots. \qquad (7.4.25)$$
The corresponding scalar iteration $\lambda_{k+1} = \frac{1}{2}(\lambda_k + \lambda_k^{-1})$ is Newton's iteration for the square root
of 1. It converges quadratically to 1 if ℜ(λ0 ) > 0, and to −1 if ℜ(λ0 ) < 0. The matrix iteration
(7.4.25) is globally and quadratically convergent to sign (A), provided A has no eigenvalues on
the imaginary axis. From the Jordan canonical form it follows that the eigenvalues are decoupled and obey the scalar iteration with $\lambda_j^{(0)} = \lambda_j(A)$. Ill-conditioning of a matrix $X_k$ can destroy the
convergence or cause misconvergence.
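The following Python sketch implements the Newton iteration (7.4.25) for sign(A); the simple relative-change stopping test is an illustrative choice.

```python
import numpy as np

def matrix_sign_newton(A, tol=1e-13, maxit=100):
    """Newton iteration X_{k+1} = (X_k + X_k^{-1})/2 for sign(A) (7.4.25).
    Assumes A has no eigenvalues on the imaginary axis."""
    X = np.array(A, dtype=complex)
    for _ in range(maxit):
        Xnew = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(Xnew - X, 1) <= tol * np.linalg.norm(Xnew, 1):
            return Xnew
        X = Xnew
    return X
```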
Higher order iterative methods for sign (A) can be derived from matrix analogues of Taylor
or Padé approximations of the function h(ξ) = (1 − ξ)−1/2 . The Padé approximations of a
function f (z) are rational functions
$$r_{\ell,m}(z) = \frac{P_{\ell,m}(z)}{Q_{\ell,m}(z)} \equiv \frac{\sum_{j=0}^{\ell} p_j z^j}{\sum_{j=0}^{m} q_j z^j}, \qquad (7.4.26)$$
with numerator of degree at most ℓ and denominator of degree at most m, such that $f(z) - r_{\ell,m}(z) = O(z^{\ell+m+1})$. For the sign function the resulting principal approximations satisfy
$$z\,r_{\ell,m}(1 - z^2) = \frac{(1+z)^p - (1-z)^p}{(1+z)^p + (1-z)^p},$$
where p = ℓ + m + 1. That is, the numerator and denominator are, respectively, the odd and even
parts of (1 + z)p . This makes it easy to write down the corresponding rational approximations.
The principal Padé approximations have the following properties; see Kenney and Laub [691,
1991, Theorem 5.3].
Theorem 7.4.3. If A has no purely imaginary eigenvalues, then a Padé approximation with ℓ = m or ℓ = m − 1 gives the rational iteration $X_0 = A$,
$$X_{k+1} = X_k\, p_{\ell,m}(I - X_k^2)\,\bigl[q_{\ell,m}(I - X_k^2)\bigr]^{-1}, \qquad k = 0, 1, 2, \ldots. \qquad (7.4.28)$$
This converges to $S = \mathrm{sign}(A)$, and
$$(S - X_k)(S + X_k)^{-1} = \bigl[(S - A)(S + A)^{-1}\bigr]^{(\ell+m+1)^k}.$$
For ℓ = m = 1 the iteration (7.4.28) becomes $X_{k+1} = X_k(3I + X_k^2)(I + 3X_k^2)^{-1}$, which is Halley's method for sign(A) and has cubic convergence rate.
Let $A \in \mathbb{C}^{m\times n}$ ($m \ge n$) have the SVD
$$A = U\Sigma V^H = (UV^H)(V\Sigma V^H),$$
where $U \in \mathbb{C}^{m\times n}$ has orthonormal columns and $V \in \mathbb{C}^{n\times n}$ is unitary. Then (7.4.30) holds with
$$A = PH, \qquad P = UV^H, \qquad H = V\Sigma V^H. \qquad (7.4.31)$$
For a square nonsingular matrix, the polar decomposition was first given by Autonne [41,
1902]. The factor P in the polar decomposition is the orthogonal (unitary) matrix closest to A.
Theorem 7.4.5. Let $A \in \mathbb{C}^{m\times n}$ ($m \ge n$) have the polar decomposition A = PH. Then, for any unitarily invariant norm,
$$\|A - P\| \le \|A - W\| \quad \text{for every } W \in \mathbb{C}^{m\times n} \text{ with orthonormal columns.}$$
Theorem 7.4.5 suggests that computing the polar factor P is the "optimal" way of orthogonalizing a given matrix. In contrast to other orthogonalization methods it treats the columns of A
symmetrically, i.e., if the columns of A are permuted, the same P with permuted columns is
obtained. In quantum chemistry this orthogonalization method was pioneered by Löwdin [760,
1970] and is called Löwdin orthogonalization; see Bhatia and Mukherjea [117, 1986]. Other
applications of the polar decomposition arise in aerospace computations, factor analysis, satellite
tracking, and the Procrustes problem; see Section 7.4.4.
The Hermitian polar factor H also has a certain optimal property. Let A ∈ Cn×n be a Hermit-
ian matrix with at least one negative eigenvalue. Consider the problem of finding a perturbation
E such that A + E is positive semidefinite.
Theorem 7.4.5 was proved for m = n by Fan and Hoffman [395, 1955]. For the generalization
to m > n, see Higham [625, 2008, Theorem 8.4].
The polar decomposition can be regarded as a generalization of the polar decomposition $z = e^{i\theta}|z|$ of a complex number z. Thus
$$e^{i\theta} = z(\bar z z)^{-1/2} = z(1 - \xi)^{-1/2}, \qquad \xi = 1 - \bar z z.$$
Expanding $h(\xi) = (1 - \xi)^{-1/2}$ in a Taylor series and terminating the series after the term of degree p gives
$$e^{i\theta} \approx z\Bigl(1 + \tfrac{1}{2}\xi + \tfrac{3}{8}\xi^2 + \cdots + \binom{-1/2}{p}(-\xi)^p\Bigr). \qquad (7.4.34)$$
This series is convergent for |ξ| < 1.
A family of iterative methods for computing the unitary polar factor P is derived by Björck
and Bowie [138, 1971]. By a well-known analogy between matrices and complex numbers, we
get
$$P = A(A^HA)^{-1/2} = A(I - E)^{-1/2}, \qquad E = I - A^HA.$$
The matrix series corresponding to (7.4.34),
$$P = A\Bigl(I + \tfrac{1}{2}E + \tfrac{3}{8}E^2 + \cdots + \binom{-1/2}{p}(-E)^p + \cdots\Bigr), \qquad (7.4.35)$$
converges to P if the spectral radius ρ(E) < 1. Terminating the expansion after the term of
order p gives an iterative method of order p + 1 for computing P . For p = 1 the following simple
iteration is obtained: $P_0 = A$,
$$P_{k+1} = P_k\bigl(I + \tfrac{1}{2}E_k\bigr), \qquad E_k = I - P_k^HP_k, \qquad k = 0, 1, 2, \ldots. \qquad (7.4.36)$$
This only uses matrix-matrix products. If $\sigma_{\max}(A) < \sqrt{3}$, then $P_k$ converges to P with quadratic
rate. In applications where A is already close to an orthogonal matrix, sufficient accuracy will
be obtained after just a few iterations.
There are more rapidly converging iterative methods that work even when A is far from
orthogonal. Newton’s method applied to the equation P H P = I yields the iteration
$$P_0 = A, \qquad P_{k+1} = \frac{1}{2}\bigl(P_k + P_k^{-H}\bigr), \qquad k = 0, 1, 2, \ldots. \qquad (7.4.37)$$
This converges globally to the unitary polar factor P of A with quadratic rate; see Higham [625, 2008, Theorem 8.12]. The iteration (7.4.37) cannot be applied to a rectan-
gular matrix A. This is easily dealt with by first computing the QR factorization A = QR,
Q ∈ Rm×n (preferably with column pivoting). Apply the Newton iteration (7.4.37) with initial
approximation P0 = R to compute the polar factor P of R. Then QP is the unitary polar factor
of A.
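A Python sketch of the (unscaled) Newton iteration (7.4.37) with the QR preprocessing just described for rectangular A; for badly conditioned matrices the scaling strategies discussed next would be added.

```python
import numpy as np

def polar_newton(A, tol=1e-12, maxit=100):
    """Unitary polar factor of A (m >= n) by Newton's iteration (7.4.37),
    applied to the square R factor of a QR factorization of A (a sketch)."""
    Q, R = np.linalg.qr(A)             # A = QR, R is n x n
    P = np.array(R, dtype=complex)
    for _ in range(maxit):
        Pnew = 0.5 * (P + np.linalg.inv(P).conj().T)   # (P_k + P_k^{-H}) / 2
        done = np.linalg.norm(Pnew - P, 1) <= tol * np.linalg.norm(Pnew, 1)
        P = Pnew
        if done:
            break
    U = Q @ P                          # unitary polar factor of A
    H = U.conj().T @ A                 # Hermitian factor; A = U H
    return U, 0.5 * (H + H.conj().T)   # symmetrize to clean up rounding
```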
If A is ill-conditioned, the convergence of the Newton iteration can be very slow initially.
Convergence can be accelerated by taking advantage of the fact that the orthogonal polar factor
of the scaled matrix γA, γ ≠ 0, is the same as for A. The scaled Newton iteration is
$$P_{k+1} = \frac{1}{2}\bigl(\gamma_k P_k + (\gamma_k P_k)^{-H}\bigr),$$
where $\gamma_k$ are scale factors. Scale factors that minimize $\|P_{k+1} - P\|_2$ are determined by the condition that $\gamma_k\sigma_1(P_k) = 1/(\gamma_k\sigma_n(P_k))$, i.e., $\gamma_k = \bigl(\sigma_1(P_k)\sigma_n(P_k)\bigr)^{-1/2}$. Because the singular values of $P_k$ are not known, the cheaply computable approximations $\gamma_k = (\alpha_k/\beta_k)^{-1/2}$, where
$$\alpha_k = \sqrt{\|P_k\|_1\|P_k\|_\infty}, \qquad \beta_k = \sqrt{\|P_k^{-1}\|_1\|P_k^{-1}\|_\infty},$$
are used instead; see Higham [610, 1986]. The resulting iteration converges in at most nine
iterations to full IEEE double precision accuracy ($\approx 10^{-16}$) even for matrices with a condition number as large as $\kappa_2(A) = 10^{16}$; see Higham [625, 2008, Section 8.9]. Kielbasiński and Ziętak [694, 2003] show that using the suboptimal scale factors
$$\gamma_0 = 1/\sqrt{ab}, \qquad \gamma_1 = \sqrt{\frac{2\sqrt{ab}}{a+b}}, \qquad \gamma_{k+1} = 1\Big/\sqrt{\tfrac{1}{2}\bigl(\gamma_k + 1/\gamma_k\bigr)}, \qquad k = 1, 2, \ldots,$$
The initial rate of convergence of this iteration is very slow when κ(A) is large. A dynamically weighted Halley (QDWH) algorithm, with $X_0 = A/\|A\|_2$ and
$$X_{k+1} = X_k\bigl(a_k I + b_k X_k^H X_k\bigr)\bigl(I + c_k X_k^H X_k\bigr)^{-1}, \qquad (7.4.40)$$
is proposed by Nakatsukasa, Bai, and Gygi [820, 2010]. The singular values of $X_{k+1}$ are given by $\sigma_i(X_{k+1}) = g_k(\sigma_i(X_k))$, where
$$g_k(x) = x\,\frac{a_k + b_k x^2}{1 + c_k x^2}.$$
Ideally, the weighting parameters $a_k$, $b_k$, and $c_k$ should be chosen to maximize $l_{k+1}$, where $[l_{k+1}, 1]$ contains all singular values of $X_{k+1}$. A suboptimal choice makes the function $g_k$ satisfy the bounds $0 < g_k(x) \le 1$ for $x \in [l_k, 1]$ and attain the max-min
$$\max_{a_k, b_k, c_k}\Bigl\{\min_{x \in [l_k, 1]} g_k(x)\Bigr\}.$$
The QDWH method has the advantage that it requires at most six iterations for convergence to the unitary polar factor of A to full IEEE double precision ($\approx 10^{-16}$) for any matrix with $\kappa(A) \le 10^{16}$. A
proof of the backward stability of the QDWH method is given by Nakatsukasa and Higham [822,
2013].
The sensitivity of the factors in the polar decomposition to perturbations in A has been studied
by Barrlund [83, 1990] and Chaitin-Chatelin and Gratton [216, 2000]. The absolute condition
number in the Frobenius norm for the orthogonal factor P is 1/σn (A). If A is real and m = n,
this can be improved to $2/(\sigma_n(A) + \sigma_{n-1}(A))$. For the Hermitian factor H, an upper bound on the condition number is $\sqrt{2}$.
10 In Greek mythology, Procrustes was a rogue smith and bandit who seized travelers, tied them to an iron bed, and
either stretched them or cut off their legs to make them fit.
The orthogonal Procrustes problem is to find a matrix Q with orthonormal columns that minimizes $\|A - BQ\|_F$ (7.4.42). Its solution can be computed from the polar decomposition of $B^TA$, as shown by the following generalization of Theorem 7.4.5.
Theorem 7.4.7 (Schönemann [973, 1966]). Let Mm×n be the set of all matrices in Rm×n ,
m ≥ n, with orthogonal columns. Let A and B be given matrices in Rm×n such that rank(B TA) =
n. Then
∥A − BQ∥F ≥ ∥A − BP ∥F
for any matrix Q ∈ Mm×n , where B TA = P H is the polar decomposition.
Proof. From $\|A\|_F^2 = \mathrm{trace}(A^TA)$ and $\mathrm{trace}(X^TY) = \mathrm{trace}(YX^T)$ and the orthogonality of Q, it follows that
$$\|A - BQ\|_F^2 = \mathrm{trace}(A^TA) + \mathrm{trace}(B^TB) - 2\,\mathrm{trace}(Q^TB^TA).$$
It follows that problem (7.4.42) is equivalent to maximizing $\mathrm{trace}(Q^TB^TA)$. From the SVD $B^TA = U\Sigma V^T$, set $Q = UZV^T$ with Z orthogonal. Then $\|Z\|_2 = 1$, and the diagonal elements of Z must satisfy $|z_{ii}| \le 1$, $i = 1 : n$. Hence,
$$\mathrm{trace}(Q^TB^TA) = \mathrm{trace}(Z^T\Sigma) = \sum_{i=1}^{n} z_{ii}\sigma_i \le \sum_{i=1}^{n} \sigma_i,$$
with equality for Z = I, i.e., for $Q = UV^T = P$.
The orthogonal Procrustes problem arises in factor analysis and multidimensional scaling
methods in statistics; see Cox and Cox [276, 1994]. In these applications the matrices A and
B represent sets of experimental data, and the question is whether these are identical up to a
rotation. Another application is in determining rigid body movements. Let a1 , a2 , . . . , am be
measured positions of m ≥ n landmarks of a rigid body in Rn , and let b1 , b2 , . . . , bm be the
measured positions after the body has been rotated. We seek an orthogonal matrix Q ∈ Rn×n
representing the rotation of the body; see Söderkvist and Wedin [1009, 1994]. This has important
applications in radiostereometric analysis (Söderkvist and Wedin [1008, 1993]) and subspace
alignment in molecular dynamics simulation of electronic structures.
In many applications it is important that Q correspond to a pure rotation, i.e., det(Q) = 1. If $\det(UV^T) = 1$, the optimal $Q = UV^T$ as before. Otherwise, if $\det(UV^T) = -1$, the optimal solution can be shown to be (see Hanson and Norris [590, 1981])
$$Q = U\,\mathrm{diag}(1, \ldots, 1, -1)\,V^T.$$
For any Q, including the optimal Q not yet known, the best least squares estimate of c is characterized by the condition that the residual be orthogonal to e. Multiplying by $e^T$, and noting that $e^TA/m$ and $e^TB/m$ are the mean values of the rows in A and B, the optimal translation is found to be
$$c = \frac{1}{m}\bigl(Q^T(B^Te) - A^Te\bigr). \qquad (7.4.44)$$
Substituting into (7.4.43) gives the problem $\min_Q \|\widetilde A - \widetilde B Q\|_F$, where
$$\widetilde A = A - \frac{1}{m} e(e^TA), \qquad \widetilde B = B - \frac{1}{m} e(e^TB),$$
which is a standard orthogonal Procrustes problem.
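The computations of the last few paragraphs amount to the following Python sketch (the landmark arrays A and B and the model convention $A \approx BQ + ec^T$ are illustrative assumptions): the row means are removed as in the centering above, Q is taken from the SVD of $\widetilde B^T\widetilde A$, the determinant correction of Hanson and Norris is applied when a pure rotation is required, and the translation is recovered from the fitted rotation.

```python
import numpy as np

def procrustes_rigid(A, B, pure_rotation=True):
    """Fit A ~ B Q + e c^T in the least squares sense, where the rows of
    A, B in R^{m x n} are measured landmark positions (one convention; a sketch).
    Q has orthonormal columns; optionally det(Q) = +1."""
    m = A.shape[0]
    e = np.ones((m, 1))
    At = A - e @ (e.T @ A) / m           # center the data (remove row means)
    Bt = B - e @ (e.T @ B) / m
    U, s, Vt = np.linalg.svd(Bt.T @ At)  # B~^T A~ = U Sigma V^T
    Q = U @ Vt                           # polar factor: nearest orthogonal matrix
    if pure_rotation and np.linalg.det(Q) < 0:
        # Hanson-Norris correction: flip the direction associated with the
        # smallest singular value to enforce det(Q) = +1
        D = np.eye(len(s)); D[-1, -1] = -1.0
        Q = U @ D @ Vt
    c = (A - B @ Q).mean(axis=0)         # optimal translation for this model
    return Q, c
```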
A perturbation analysis of the orthogonal Procrustes problem is given by Söderkvist [1007,
1993]. If A ∈ Rm×n , B ∈ Rm×l , m > l, in (7.4.42), then the Procrustes problem is called
unbalanced. In this case, Q ∈ Rm×l is rectangular with orthonormal columns and no longer
satisfies trace (QTATAQ) = trace (ATA). Algorithms for this more difficult problem are given
by Park [879, 1991] and Eldén and Park [379, 1999]. Several other generalizations are treated in
the monograph by Gower and Dijksterhuis [521, 2004].
Chapter 8
Nonlinear Least Squares Problems
Anyone who deals with nonlinear problems knows that everything works sometimes
and nothing works every time.
—John E. Dennis, Jr.
where f ′ denotes the first derivative. It can be derived in the same way as for real-valued vari-
ables, and the extension to longer chains is straightforward.
Let f : Rk → Rk , k > 1, be a function, and consider the equation x = f (y). By formal
differentiation, $dx = f'(y)\,dy$, and we obtain $dy = (f'(y))^{-1}dx$, provided that the Jacobian
matrix f ′ (y) with elements (∂xi /∂yj ), 1 ≤ i, j ≤ k, is nonsingular. If f (x, y) = 0, then by
(8.1.2), fx dx + fy dy = 0. If fy (x0 , y0 ) is a nonsingular matrix, then y becomes, under certain
additional conditions, a differentiable function of x in a neighborhood of (x0 , y0 ), and we obtain
dy = −(fy )−1 fx dx; hence
y ′ (x) = −(fy )−1 fx |y=y(x) .
One can also show that
$$\lim_{\epsilon\to +0} \frac{f(x_0 + \epsilon v) - f(x_0)}{\epsilon} = f'(x_0)v.$$
There are, however, functions f for which such a directional derivative exists for any v, but f is
not a linear function of v for some x0 . An important example is f (x) = ∥x∥∞ , where x ∈ Rn .
(Look at the case n = 2.)
Consider the set of k-linear mappings from vector spaces Xi = X, i = 1, . . . , k, which we
also write as X k , to Y . This is itself a linear space, which we here denote by Lk (X, Y ). For
k = 1 we write it more briefly as L(X, Y ). If f ′ (x) is a differentiable function of x at the
point x0 , its derivative is denoted by f ′′ (x0 ). This is a linear function that maps X into the space
L(X, Y ) of mappings from X to Y that contains f ′′ (x0 ), i.e., f ′′ (x0 ) ∈ L(X, L(X, Y )). This
space may be identified in a natural way with the space L2 (X, Y ) of bilinear mappings X 2 → Y .
If A ∈ L(X, L(X, Y )), then the corresponding Ā ∈ L2 (X, Y ) is defined by (Au)v = Ā(u, v)
for all u, v ∈ X. In the following it is not necessary to distinguish between A and Ā, so
It can be shown that f ′′ (x0 ): X 2 → Y is a symmetric bilinear mapping, i.e., f ′′ (x0 )(u, v) =
f ′′ (x0 )(v, u). The second-order partial derivatives are denoted fxx , fxy , fyx , fyy . We can show
that fxy = fyx .
If $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$, m > 1, then $f''(x_0)$ reads $f^{p}_{ij}(x_0) = f^{p}_{ji}(x_0)$ in tensor notation.
It is thus characterized by a three-dimensional array, which one rarely needs to store or write.
Fortunately, most of the numerical work can be done on a lower level, e.g., with directional
derivatives. For each fixed value of p, we obtain a symmetric n × n matrix H(x0 ) called the
Hessian matrix; note that f ′′ (x0 )(u, v) = uT H(x0 )v. The Hessian can be looked upon as the
derivative of the gradient. An element of the Hessian is, in multilinear mapping notation, the pth
coordinate of the vector f ′′ (x0 )(ei , ej ).
Higher derivatives are recursively defined. If f (k−1) (x) is differentiable at x0 , its derivative at
x0 is denoted by f (k) (x0 ) and called the kth derivative of f at x0 . One can show that f (k) (x0 ) :
X k → Y is a symmetric k-linear mapping. Taylor’s formula then reads, when a, u ∈ X,
for $f : X \to Y$,
$$f(a + u) = f(a) + f'(a)u + \frac{1}{2}f''(a)u^2 + \cdots + \frac{1}{k!}f^{(k)}(a)u^k + R_{k+1}, \qquad (8.1.3)$$
$$R_{k+1} = \Bigl(\int_0^1 \frac{(1-t)^k}{k!} f^{(k+1)}(a + ut)\,dt\Bigr)u^{k+1}.$$
Here we have used u2 , uk , . . . as abbreviations for the lists of input vectors (u, u), (u, u, . . . , u),
. . . , etc. It follows that
$$\|R_{k+1}\| \le \max_{0\le t\le 1}\bigl\|f^{(k+1)}(a + ut)\bigr\|\,\frac{\|u\|^{k+1}}{(k+1)!},$$
where norms of multilinear operators are defined analogously to subordinate matrix norms; see
(4.3.37). Such simplifications are often convenient to use. The mean value theorem of differential
calculus and Lagrange’s form for the remainder of Taylor’s formula do not hold, but in many
places they can be replaced by the above integral form of the remainder. All this holds in complex
vector spaces too.
$$\min_{x\in\mathbb{R}^n} \phi(x) = \tfrac{1}{2}\|f(x)\|_2^2, \qquad (8.1.4)$$
where $f(x) \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$. Such problems arise, e.g., when fitting given data $(y_i, t_i)$, $i = 1, \ldots, m$, to a nonlinear model function g(x, t). If only the $y_i$ are subject to errors, and the values $t_i$ of the independent variable t are exact, we take
$$f_i(x) = y_i - g(x, t_i), \qquad i = 1, \ldots, m.$$
The choice of the least squares measure is justified here, as for the linear case, by statistical
considerations; see Bard [66, 1974]. The case when there are errors in both yi and ti is discussed
in Section 8.2.3.
The NLS problem (8.1.4) is a special case of the general unconstrained optimization problem
in Rn . Although ϕ(x) is bounded below, it is usually convex only near a local minimum. Hence,
solution methods will in general not be globally convergent. The methods are iterative in nature.
Starting from an initial guess x0 , a sequence of approximations x1 , x2 , . . . is generated that
ideally converges to a solution. Each iteration step usually requires the solution of a related
linear or quadratic subproblem.
In the following we assume that f (x) is twice continuously differentiable. Because of the
special structure of ϕ(x) in (8.1.4), the gradient ∇ϕ(x) of ϕ(x) has the special structure
$$\nabla\phi(x) = J(x)^T f(x), \qquad (8.1.6)$$
where $J(x) \in \mathbb{R}^{m\times n}$ is the Jacobian of f(x) with elements $(\partial f_i(x)/\partial x_j)$. Furthermore, the Hessian of ϕ(x) is
$$\nabla^2\phi(x) = J(x)^TJ(x) + G(x), \qquad G(x) = \sum_{i=1}^{m} f_i(x)G_i(x), \qquad (8.1.7)$$
where $G_i(x) \in \mathbb{R}^{n\times n}$, $i = 1, \ldots, m$, is the Hessian of $f_i(x)$, i.e., the symmetric matrix with elements $(\partial^2 f_i(x)/\partial x_j\partial x_k)$.
A necessary condition for $x^*$ to be a local minimum of ϕ(x) is that $\nabla\phi(x^*) = J(x^*)^Tf(x^*) = 0$. Such a point $x^*$ is called a critical point. Finding a critical point is equivalent to solving the system of nonlinear algebraic equations
$$F(x) = J(x)^T f(x) = 0. \qquad (8.1.8)$$
The basic method for solving such a system is Newton's method for nonlinear equations:
$$F'(x_k)p_k = -F(x_k), \qquad x_{k+1} = x_k + p_k.$$
Newton's method can attain a quadratic rate of convergence and is invariant under a linear transformation of variables x = Sz; see Dennis and Schnabel [316, 1996]. With F(x) given by (8.1.8), $p_k$ solves
$$\bigl(J(x_k)^TJ(x_k) + G(x_k)\bigr)p_k = -J(x_k)^Tf(x_k). \qquad (8.1.10)$$
The method can also be derived by using a quadratic approximation for the function $\phi(x) = \frac{1}{2}\|f(x)\|_2^2$ and taking $p_k$ as the minimizer of
$$\phi_k(x_k + p) = \phi(x_k) + \nabla\phi(x_k)^Tp + \frac{1}{2}p^T\nabla^2\phi(x_k)p. \qquad (8.1.11)$$
This is Newton’s method for unconstrained optimization. It has several attractive properties. In
particular, if the Hessian $\nabla^2\phi(x)$ is positive definite at $x_k$, then $p_k$ is a descent direction for
ϕ(x).
Newton’s method is seldom used for NLS because the mn2 second derivatives in the term
G(xk ) in (8.1.10) are rarely available at a reasonable cost. An exception is in curve-fitting
problems where the function values fi (x) = yi −g(x, ti ) and derivatives can be obtained from the
single function g(x, t). If g(x, t) is composed of simple exponential and trigonometric functions,
for example, then second derivatives can sometimes be computed cheaply.
In the Gauss–Newton method, f (x) is approximated in a neighborhood of xk by the linear
function f (x) = f (xk ) + J(xk )(x − xk ). Then the condition that x be a critical point can be
written
J(xk )T (f (xk ) + J(xk )(x − xk )) = 0.
The next approximation is taken as xk+1 = xk + pk , where pk solves the linear least squares
problem
min ∥f (xk ) + J(xk )p∥2 . (8.1.12)
p
If $J(x_k)$ has full column rank, then $p_k$ is uniquely determined by the condition
$$J(x_k)^TJ(x_k)p_k = -J(x_k)^Tf(x_k).$$
If $x_k$ is not a critical point, then $p_k = -J(x_k)^\dagger f(x_k)$ is a descent direction. Then, for sufficiently
small α > 0, ∥f (xk + αpk )∥2 < ∥f (xk )∥2 . To verify this, note that
∥f (xk + αpk )∥22 = ∥f (xk )∥22 − 2α∥PJk f (xk )∥22 + O(α2 ), (8.1.13)
where PJk = J(xk )J † (xk ) is the orthogonal projector onto the range of J(xk ). Moreover, since
xk is not a critical point, it follows that PJk f (xk ) ̸= 0.
Note that if $p_x = -J^\dagger(x)f(x)$ is the Gauss–Newton step in the original variables, then $p_z = (J(x)x')^\dagger f(x)$ is the step after the change of variables, where x′ is the Jacobian of x(z). Then $p_x = x'p_z$, which is the desired relation.
The Gauss–Newton method can also be thought of as arising from neglecting the term G(x)
in Newton's method (8.1.10). This term is small if the quantities
$$|f_i(x_k)|\,\|G_i(x_k)\|_2, \qquad i = 1, \ldots, m,$$
are small, i.e., if either the residuals $f_i(x_k)$, $i = 1, \ldots, m$, are small or $f_i(x)$ is only mildly
nonlinear at xk . In this case the behavior of the Gauss–Newton method can be expected to be
similar to that of Newton’s method.
The Gauss–Newton method can be written as a fixed-point iteration
$$x_{k+1} = F(x_k), \qquad F(x) = x - J(x)^\dagger f(x).$$
Assume there exists an $x^*$ such that $J(x^*)^Tf(x^*) = 0$ and $J(x^*)$ has full column rank. Sufficient conditions for local convergence and error bounds for the Gauss–Newton method can be
be obtained from Ostrowski’s fixed-point theorem; see Pereyra [890, 1967], Ortega and Rhein-
boldt [845, 2000, Theorem 10.1.3], and Dennis and Schnabel [316, 1996, Theorem 10.2.1].
Theorem 8.1.1. Let f (x) be a twice continuously Fréchet differentiable function. Assume there
exists x∗ such that J(x∗ )T f (x∗ ) = 0 and the Jacobian J(x∗ ) has full rank. Then the Gauss–
Newton iteration converges locally to $x^*$ if the spectral radius $\rho = \rho\bigl(\nabla F(x^*)\bigr)$ is less than one.
The asymptotic convergence is linear with rate bounded by ρ. In particular, the local convergence
rate is superlinear if f (x∗ ) = 0.
Wedin [1107, 1972] gives a geometrical interpretation of this convergence result. Minimizing
ϕ(x) = 21 ∥f (x)∥22 is equivalent to finding a point on the n-dimensional surface y = f (x) in Rm
closest to the origin. The normalized vector $w = f(x^*)/\|f(x^*)\|_2$ is orthogonal to the tangent plane of the surface $y = f(x^*) + J(x^*)h$, $h \in \mathbb{R}^n$, at $x^*$. The normal curvature matrix of the surface with respect to w is the symmetric matrix
$$K = (J(x^*)^\dagger)^T G_w(x^*) J(x^*)^\dagger, \qquad G_w = \sum_{i=1}^{m} w_i G_i(x^*). \qquad (8.1.18)$$
It can be shown that the matrix $\nabla F(x^*)$ has the same nonzero eigenvalues as γK, where γ is given as in (8.1.17). If $\kappa_1 \ge \kappa_2 \ge \cdots \ge \kappa_n$ denote the eigenvalues of K, it follows that
$$\rho = \rho(\nabla F(x^*)) = \gamma\max(\kappa_1, -\kappa_n) = \gamma\|K\|_2. \qquad (8.1.19)$$
This relation indicates that the local convergence of the Gauss–Newton method is invariant under
a local transformation of the nonlinear least squares problem.
If $J(x^*)$ has full column rank, then $\nabla^2\phi(x^*) = J^T(I - \gamma K)J$ with $J = J(x^*)$. It follows that $x^*$ is a local minimum of ϕ(x) if and only if $I - \gamma K$ is positive definite at $x^*$. This is the case when
$$1 - \gamma\kappa_1 > 0 \qquad (8.1.20)$$
or, equivalently, $\gamma < 1/\kappa_1$. Furthermore, if $1 - \gamma\kappa_1 \le 0$, then ϕ(x) has a saddle point at $x^*$; if $1 - \gamma\kappa_n < 0$, then ϕ(x) has a local maximum at $x^*$. If $x^*$ is a saddle point, then $\gamma\kappa_1 \ge 1$.
Hence using the Gauss–Newton method, one is generally repelled from a saddle point, which is
an excellent property.
The matrix $P_J(x) = J(x)J(x)^\dagger$ is the orthogonal projection onto the range space of J(x). Hence at a critical point $x^*$ it holds that $P_J(x^*)f(x^*) = 0$. The rate of convergence for the Gauss–Newton method can be estimated during the iterations from
$$\rho \approx \|P_J(x_{k+1})f_{k+1}\|_2 / \|P_J(x_k)f_k\|_2, \qquad (8.1.21)$$
where ρ is defined as in Theorem 8.1.1. (Recall that $\lim_{x_k\to x^*} P_J(x_k)f_k = 0$.) Since $P_J(x_k)f_k = -J(x_k)p_k$, the cost of computing this estimate is only one matrix-vector multiplication. When the estimated
ρ is greater than 0.5 (say), one should consider switching to a method using second-derivative
information.
If the radius of curvature at a critical point satisfies 1/|κi | ≪ ∥f (x∗ )∥2 , the nonlinear least
squares problems will be ill-behaved, and many insignificant local minima may exist. Poor
performance of Gauss–Newton methods often indicates poor quality of the underlying model
or insufficient accuracy in the observed data. Then it would be better to improve the model
rather than use more costly methods of solution. Wedin [1109, 1974] shows that the estimate
(8.1.21) of the rate of convergence of the Gauss–Newton method is often a good indication of the
quality of the underlying model. Deuflhard and Apostolescu [318, 1980] call problems for which
divergence occurs “inadequate problems.” Many numerical examples leading to poor behavior
of Gauss–Newton methods are far from realistic; see Hiebert [609, 1981] and Fraley [430, 1989].
As simple examples show, the Gauss–Newton method may not even be locally convergent. In the damped Gauss–Newton method the new iterate is therefore taken as
$$x_{k+1} = x_k + \alpha_k p_k,$$
where αk > 0 is a step length to be determined. There are two common algorithms for choosing
αk ; see Ortega and Rheinboldt [845, 2000] and Gill, Murray, and Wright [476, 1981].
1. Armijo–Goldstein line search, where $\alpha_k$ is taken to be the largest number in the sequence $1, \frac{1}{2}, \frac{1}{4}, \ldots$ for which the inequality
$$\|f(x_k)\|_2^2 - \|f(x_k + \alpha_k p_k)\|_2^2 \ge \tfrac{1}{2}\alpha_k\|J(x_k)p_k\|_2^2$$
holds (notice that $-J(x_k)p_k = P_{J_k}f(x_k)$); a code sketch using this rule is given after the list below.
2. “Exact” line search, i.e., taking αk as the solution to the one-dimensional minimization
problem
min ∥f (xk + αpk )∥22 . (8.1.23)
α>0
Note that a solution αk may not exist, or there may be a number of local solutions.
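A Python sketch of the damped Gauss–Newton iteration with the Armijo–Goldstein rule of item 1; the residual function f and Jacobian jac are user-supplied callables, and the tolerances and the tiny exponential-fitting example are illustrative assumptions.

```python
import numpy as np

def damped_gauss_newton(f, jac, x0, tol=1e-8, maxit=100):
    """Damped Gauss-Newton with Armijo-Goldstein step lengths (a sketch).
    f(x) returns the residual vector, jac(x) its Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        r, J = f(x), jac(x)
        if np.linalg.norm(J.T @ r) <= tol:               # near a critical point
            break
        p, *_ = np.linalg.lstsq(J, -r, rcond=None)       # Gauss-Newton step (8.1.12)
        Jp = J @ p
        alpha = 1.0
        while (np.dot(r, r) - np.sum(f(x + alpha * p)**2)
               < 0.5 * alpha * np.dot(Jp, Jp)) and alpha > 1e-12:
            alpha *= 0.5                                 # largest of 1, 1/2, 1/4, ...
        x = x + alpha * p
    return x

# example: fit y ~ exp(z*t) to synthetic data (hypothetical model)
t = np.linspace(0, 1, 20); y = np.exp(-1.3 * t)
f = lambda x: y - np.exp(x[0] * t)
jac = lambda x: (-t * np.exp(x[0] * t)).reshape(-1, 1)
print(damped_gauss_newton(f, jac, np.array([0.0])))
```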
A theoretical analysis of the Gauss–Newton method with exact line search has been given by
Ruhe [939, 1979]. The asymptotic rate of convergence is shown to be
$$\rho_e = \frac{\gamma(\kappa_1 - \kappa_n)}{2 - \gamma(\kappa_1 + \kappa_n)}.$$
A comparison with (8.1.19) shows that $\rho_e = \rho$ if $\kappa_n = -\kappa_1$, and $\rho_e < \rho$ otherwise. We also have
that γκ1 < 1 implies ρe < 1, i.e., we always get convergence close to a local minimum. This is
in contrast to the Gauss–Newton method, which may fail to converge to a local minimum.
Lindström and Wedin [751, 1984] develop an alternative line-search algorithm. In this, α is
chosen to minimize ∥pk (α)∥2 , where p(α) approximates the curve f (α) = f (xk + αpk ) ∈ Rm .
One possibility is to choose p(α) to be the unique circle (in the degenerate case, a straight line)
determined by the conditions
where 0 < ϵ ≪ 1 and f1 and f2 are of order unity. If J is considered to be of rank two, then the
search direction is pk = s1 , whereas if the assigned rank is one, pk = s2 , where
$$s_1 = -\begin{pmatrix} f_1 \\ f_2/\epsilon \end{pmatrix}, \qquad s_2 = -\begin{pmatrix} f_1 \\ 0 \end{pmatrix}.$$
Clearly the two directions s1 and s2 are almost orthogonal, and s1 is almost orthogonal to the
gradient vector J T f . A strategy for estimating the rank of J(xk ) can be based on QR factor-
ization or SVD of J(xk ); see Section 2.5. Usually an underestimate of the rank is preferable,
except when f (x) is close to an ill-conditioned quadratic function.
An alternative approach that avoids the rank determination is to take xk+1 = xk + pk , where
pk is the unique solution to
$$\min_p \|f(x_k) + J(x_k)p\|_2^2 + \mu_k^2\|p\|_2^2, \qquad (8.1.25)$$
and µk > 0 is a regularization parameter. Then pk is defined even when Jk is rank-deficient. This
method was first used by Levenberg [736, 1944] and later modified by Marquardt [779, 1963]
and is therefore called the Levenberg–Marquardt method. The solution to problem (8.1.25)
satisfies
$$\bigl(J(x_k)^TJ(x_k) + \mu_k^2 I\bigr)p_k = -J(x_k)^Tf(x_k)$$
or, equivalently,
$$\min_p \left\|\begin{pmatrix} J(x_k) \\ \mu_k I \end{pmatrix}p + \begin{pmatrix} f(x_k) \\ 0 \end{pmatrix}\right\|_2, \qquad (8.1.26)$$
and can be solved stably by QR factorization (or CGLS or LSQR).
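For illustration, a single Levenberg–Marquardt step computed from the stacked least squares formulation (8.1.26) with NumPy's QR-based least squares solver (a sketch; the scaling matrix D discussed below is taken to be the identity here):

```python
import numpy as np

def lm_step(J, r, mu):
    """One Levenberg-Marquardt step: solve min_p || [J; mu I] p + [r; 0] ||_2,
    i.e., (8.1.26), for the current Jacobian J and residual r (a sketch)."""
    n = J.shape[1]
    Jaug = np.vstack([J, mu * np.eye(n)])
    raug = np.concatenate([r, np.zeros(n)])
    p, *_ = np.linalg.lstsq(Jaug, -raug, rcond=None)
    return p
```

As mu is decreased the computed step approaches the Gauss–Newton (pseudoinverse) step, and for large mu it turns toward the steepest descent direction, matching the interpolation property described next.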
The regularized Gauss–Newton method always takes descent steps. Hence it is locally con-
vergent on almost all nonlinear least squares problems, provided an appropriate line search is
carried out. We remark that as $\mu_k \to 0^+$, $p_k \to -J(x_k)^\dagger f_k$, the pseudoinverse step. For large val-
ues of µ the direction pk becomes parallel to the steepest descent direction −J(xk )T fk . Hence
pk interpolates between the Gauss–Newton and steepest descent direction. This property makes
the Levenberg–Marquardt method preferable to damped Gauss–Newton for many problems.
A useful modification of the Levenberg–Marquardt algorithm is to change the penalty term
in (8.1.25) to ∥µk D∥22 , where D is a diagonal scaling matrix. A frequently used choice is D2 =
diag (J(x0 )T J(x0 )). This choice makes the Levenberg–Marquardt algorithm scaling invariant,
i.e., it generates the same iterations if applied to f (Dx) for any nonsingular diagonal matrix D.
From the discussion of problem LSQI in Section 3.5.3, it follows that the regularized least
squares problem (8.1.25) is equivalent to the least squares problem with a quadratic constraint,
$$\min_p \|f(x_k) + J(x_k)p\|_2 \quad \text{subject to} \quad \|Dp\|_2 \le \Delta_k, \qquad (8.1.27)$$
for some $\Delta_k > 0$. If the constraint in (8.1.27) is binding, then $p_k$ is a solution to (8.1.25) for
some µk > 0. The set of feasible vectors p, ∥Dp∥2 ≤ ∆k in (8.1.27) can be thought of as a
region of trust for the linear model f (x) ≈ f (xk ) + J(xk )p, p = x − xk . There has been much
research on so-called trust-region methods based on the formulation (8.1.27) as an alternative
to a line-search strategy. Combined with a suitable active-set strategy, such a technique can be
extended to handle problems with nonlinear inequality constraints; see Lindström [749, 1983].
Several versions of the Levenberg–Marquardt algorithm have been proved to be globally con-
vergent; see, e.g., Fletcher [411, 1971] and Osborne [847, 1976]. A general description of scaled
trust-region methods for nonlinear optimization is given by Moré [806, 1978], [807, 1983]. Moré
proves that the algorithm will converge to a critical point x∗ if f (x) is continuously differentiable,
J(x) is uniformly continuous in a neighborhood of x∗ , and J(xk ) remains bounded.
To compute ρ(pk ) stably in step 2, note that because pk satisfies (8.1.26), the predicted re-
duction can be computed from
$$\|f(x_k)\|_2^2 - \|f(x_k) + J(x_k)p_k\|_2^2 = -2p_k^TJ(x_k)^Tf(x_k) - \|J(x_k)p_k\|_2^2 = 2\mu_k^2\|Dp_k\|_2^2 + \|J(x_k)p_k\|_2^2.$$
The ratio ρk measures the agreement between the linear model and the nonlinear function. If Jk
has full rank, then ρk → 1 as ∥pk ∥2 → 0. The parameter β in step 3 can be chosen quite small,
e.g., β = 0.0001.
Assume that the Jacobian J(x) is rank-deficient with constant rank r < n in a neighborhood
of a local minimum x∗ . Then the appropriate formulation of the problem is
Boggs [157, 1976] notes that the choice pk = −J(xk )† f (xk ) gives the least-norm correction to
the linearized problem. Instead, pk should be taken as the least-norm solution to the linearized
problem
min ∥xk + p∥22 subject to ∥f (xk ) + J(xk )p∥22 = min . (8.1.29)
p
For $\mu_k > 0$ this is a full-rank linear least squares problem, and the unique solution $p_k$ can be computed by QR factorization.
This result implies that if µk → 0+ , the two Gauss–Newton methods have the same local
convergence properties.
This is called the quasi-Newton relation. We further require Sk to differ from Sk−1 by a matrix
of small rank. The search direction pk for the next step is then computed from
The solution to (8.1.35) that minimizes ∥Bk − Bk−1 ∥F is given by the rank-two update formula
where $\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$ contains the large singular values of J and $\Sigma_2 = \mathrm{diag}(\sigma_{p+1}, \ldots, \sigma_n)$. Set $B = B(x_k)$, and let $\bar s$ denote the first n components of the vector $s = U^Tf(x_k)$. Equation (8.1.10) for the Newton direction can then be split into two sets. The first p equations are
$$(\Sigma_1^2 + V_1^TBV_1)q_1 + V_1^TBV_2q_2 = -\Sigma_1\bar s_1.$$
If the terms involving $B = B_k$ can be neglected compared to $\Sigma_1^2 q_1$, we get $q_1 = -\Sigma_1^{-1}\bar s_1$. Substituting this into the last (n − p) equations, we obtain
= vjT Bk + O(h), j = p + 1, . . . , n.
This gives an approximation for V2T Bk , and then (V2T Bk )V2 can be formed.
where J and C are the Jacobians for f (x) and h(x), respectively. This problem can be solved
by the methods described in Section 4.5. The search direction pk obtained from (8.1.39) can be
shown to be a descent direction for the merit function
ψ(x, µ) = ∥f (x)∥22 + µ∥h(x)∥22
at the point xk , provided µ is chosen large enough.
may be too costly to solve accurately. In any case, far from the solution x∗ , it may not be worth
solving these subproblems to high accuracy. To solve (8.1.40) for pk , a truncated inner iterative
method such as CGLS or LSQR can be applied. A class of inexact Newton methods for solving
a system of nonlinear equations F (x) = 0 is studied by Dembo, Eisenstat, and Steihaug [300,
1982]. A sequence {xk } of approximations is generated as follows. Given an initial guess x0 ,
set xk+1 = xk + pk , where pk satisfies
$$F'(x_k)p_k = -F(x_k) + r_k, \qquad \frac{\|r_k\|_2}{\|F(x_k)\|_2} \le \eta_k < 1. \qquad (8.1.41)$$
Here {ηk } is a nonnegative forcing sequence used to control the level of accuracy. Taking
ηk = 0 gives the Newton method. Note that the requirement ηk < 1 is natural because ηk ≥ 1
would allow pk = 0.
Theorem 8.1.3 (Dembo, Eisenstat, and Steihaug [300, 1982, Theorem 3]). Assume that there
exists an x∗ such that F (x∗ ) = 0 with F ′ (x∗ ) nonsingular. Let F be continuously differentiable
in a neighborhood of x∗ , and let ηk ≤ η̂ < t < 1. Then there exists ϵ > 0 such that if
∥x∗ − x0 ∥2 ≤ ϵ, the sequence {xk } generated by (8.1.41) converges linearly to x∗ in the sense
that
$$\|x_{k+1} - x^*\|_* \le t\|x_k - x^*\|_*, \qquad (8.1.42)$$
where the norm is defined by ∥y∥∗ ≡ ∥F ′ (x∗ )y∥2 .
First, we note that the exact Gauss–Newton method can be considered as an incomplete
Newton method for the equation F (x) = J(x)T f (x) = 0. This is of the form (8.1.41), where
$p_k$ satisfies $J(x_k)^TJ(x_k)p_k = -J(x_k)^Tf(x_k)$ and $r_k = G(x_k)p_k$, $G(x_k) = \sum_{i=1}^{m} f_i(x_k)G_i(x_k)$, where $G_i(x_k)$ is the Hessian of $f_i(x_k)$. By Theorem 8.1.3 a sufficient condition for convergence is that
$$\bigl\|G(x_k)\bigl(J(x_k)^TJ(x_k)\bigr)^{-1}\bigr\|_2 = t_k \le t < 1. \qquad (8.1.43)$$
In an inexact Gauss–Newton method the step $p_k$ is instead computed so that
$$f_k = J(x_k)^T\bigl(J(x_k)p_k + f(x_k)\bigr), \qquad \|f_k\|_2 \le \beta_k\|J(x_k)^Tf(x_k)\|_2. \qquad (8.1.44)$$
The condition on ∥fk ∥2 is a natural stopping condition on an iterative method for solving the
linear subproblem. Gratton, Lawless, and Nichols [526, 2007] give conditions on the sequence
of tolerances {βk } needed to ensure convergence and investigate the use of such methods for
variational data assimilation in meteorology.
Theorem 8.1.4 (Gratton, Lawless, and Nichols [526, 2007, Theorem 5]). Let f (x) be a
twice continuously Fréchet differentiable function. Assume that there exists an x∗ such that
J T (x∗ )f (x∗ ) = 0 and J(x∗ ) has full column rank. Assume tk < β̂ < 1, where tk is given as in
(8.1.43). Assume βk , k = 0, 1, . . . , are chosen so that
Then there exists ϵ > 0 such that if ∥x∗ − x0 ∥2 ≤ ϵ, the sequence {xk } of the inexact Gauss–
Newton method (8.1.44) converges linearly to x∗ .
Note that the requirement tk < β̂ < 1 is the sufficient condition for convergence given for the
exact Gauss–Newton method. The more highly nonlinear the problem, the larger tk and the more
accurate the linear subproblems to be solved. Accelerated schemes for inexact Newton methods
using GMRES for large systems of nonlinear equations are given by Fokkema, Sleijpen, and van
der Vorst [414, 1998].
$$\min_{y, z} \|g - \Phi(z)y\|_2, \qquad (8.2.1)$$
where $\Phi(z) \in \mathbb{R}^{m\times p}$, $y \in \mathbb{R}^p$ and $z \in \mathbb{R}^q$ are unknown parameters, and $g \in \mathbb{R}^m$ is a given data vector. For a fixed value of the nonlinear parameter z, problem (8.2.1) is a linear least squares problem
in y. Such least squares problems are called separable and arise frequently in applications. One
example is when one wants to approximate given data by a linear combination of given nonlinear
functions ϕj (z), j = 1, . . . , p.
A simple method for solving separable problems is the alternating least squares (ALS) al-
gorithm. Let z1 be an initial approximation, and solve the linear problem miny ∥g − Φ(z1 )y∥2
to obtain y1 . Next, solve the nonlinear problem minz ∥g − Φ(z)y1 ∥2 to obtain z2 . Repeat both
steps until convergence. The rate of convergence of ALS is linear and can be very slow. It does
not always converge.
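A minimal Python sketch of the alternating least squares idea for (8.2.1); Phi is a user-supplied function returning the m × p matrix Φ(z), and scipy.optimize.minimize is used as a stand-in for the inner nonlinear minimization over z.

```python
import numpy as np
from scipy.optimize import minimize

def als(Phi, g, z0, iters=20):
    """Alternating least squares for min_{y,z} ||g - Phi(z) y||_2 (a sketch).
    Convergence is only linear and is not guaranteed, as noted in the text."""
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        y, *_ = np.linalg.lstsq(Phi(z), g, rcond=None)    # linear step in y
        obj = lambda zz: np.linalg.norm(g - Phi(zz) @ y)  # nonlinear step in z
        z = minimize(obj, z, method='Nelder-Mead').x
    return y, z
```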
For a fixed value of z, the least-norm least squares solution of (8.2.1) can be expressed as $y(z) = \Phi(z)^\dagger g$, where $\Phi^\dagger$ is the pseudoinverse of Φ(z). In the variable projection method of Golub and Pereyra [503, 1973], this is used to eliminate the linear parameters y, giving the reduced nonlinear problem
$$\min_z \|(I - P_{\Phi(z)})g\|_2, \qquad (8.2.3)$$
where PΦ(z) is the orthogonal projector onto the column space of Φ(z). This is a pure nonlinear
problem of reduced dimension. An important advantage is that initial values are only needed for
the nonlinear parameters z.
In order to solve (8.2.3) using a Gauss–Newton–Marquardt method, a formula for the gradient of the function $f(z) = (I - P_{\Phi(z)})g = P_{\Phi(z)}^{\perp}g$ in (8.2.3) is needed. The following lemma gives an expression for the Fréchet derivative of the orthogonal projection matrix $P_{\Phi(z)}^{\perp}$. It must be assumed that the rank of Φ(z) is locally constant, because otherwise the pseudoinverse would not be differentiable.
Lemma 8.2.1 (Golub and Pereyra [503, 1973, Lemma 4.1]). Let $\Phi(z) \in \mathbb{R}^{m\times p}$ be a matrix of local constant rank r and $\Phi^\dagger$ be its pseudoinverse. Denote by $P_\Phi = \Phi\Phi^\dagger$ the orthogonal projector onto $\mathcal{R}(\Phi)$, and set $P_\Phi^\perp = I - P_\Phi$. Then, using the product rule for differentiation, we get
$$\frac{d}{dz}(P_\Phi) = -\frac{d}{dz}(P_\Phi^\perp) = P_\Phi^\perp \frac{d\Phi}{dz}\Phi^\dagger + \Bigl(P_\Phi^\perp \frac{d\Phi}{dz}\Phi^\dagger\Bigr)^T. \qquad (8.2.4)$$
More generally, Golub and Pereyra show that (8.2.4) is valid for any generalized inverse that
satisfies ΦΦ− Φ = Φ and (ΦΦ− )T = ΦΦ− . Note that the derivative dΦ/dz in (8.2.4) is a three-
dimensional tensor with elements ∂ϕij /∂zk . The transposition in (8.2.4) is done in the (i, j)
directions for fixed k.
In many applications, each component function ϕj (z) depends on only a few of the parame-
ters z1 , . . . , zq . Hence the derivative will often contain many zeros. Golub and Pereyra develop a
storage scheme that avoids wasted storage and computations. Let $E = (e_{ij})$ be a $q \times p$ incidence matrix such that $e_{ij} = 1$ if function $\phi_j$ depends on the parameter $z_i$, and 0 otherwise. This incidence matrix can be retrieved from an $m \times \bar p$ array B, $\bar p = \sum_{i,j} e_{ij}$, in which the nonzero derivatives are stored sequentially.
Example 8.2.2. A problem of great importance is the fitting of a linear combination of exponen-
tial functions with different time constants,
g(t) = y0 + Σ_{j=1}^p yj e^{zj t},    (8.2.5)
to observations gi = g(ti )+ϵi , i = 0 : m. Since g(t) in (8.2.5) depends on p+1 linear parameters
yj and p nonlinear parameters zj , at least m = 2p + 1 observations for t0 , . . . , tm are needed.
Clearly this problem is separable, and ϕi,j (z; t) = ezj ti , j = 1, . . . , p. Here the number of
nonvanishing derivatives is p.
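As a hedged MATLAB sketch (assuming vectors t and g of abscissas and data and a starting guess z0), the reduced variable projection functional for this example can be formed and minimized as follows.

% Variable projection sketch for the exponential fitting example (8.2.5).
Phi   = @(z) [ones(numel(t),1), exp(t(:)*z(:)')];   % columns: 1 and e^{z_j t_i}
vpobj = @(z) norm(g(:) - Phi(z)*(Phi(z)\g(:)));     % ||(I - P_Phi(z)) g||_2
zhat  = fminsearch(vpobj, z0);                      % reduced problem (8.2.3)
yhat  = Phi(zhat) \ g(:);                           % recover the linear parameters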
The quantities required to solve the reduced nonlinear problem can be expressed in terms of
a complete QR decomposition
Φ = U [ R  0 ; 0  0 ] V T,    U = ( U1  U2 ),    (8.2.6)
where R ∈ Rr×r (r = rank(Φ)) is upper triangular and nonsingular, and U and V are orthogo-
nal. The solution to the linear least squares problem (8.2.1) is then y = Φ† g, where
Φ† = V [ R−1 ; 0 ] U1T    (8.2.7)
is the generalized inverse. The orthogonal projectors onto the range of Φ and its orthogonal
complement are
PΦ = ΦΦ† = U1 U1T , PΦ⊥ = I − PΦ = U2 U2T . (8.2.8)
The least squares residual is r = PΦ⊥ g = U2(U2T g), and its norm is
∥r∥2 = ∥U2(U2T g)∥2 = ∥c2∥2,    UT g = [ c1 ; c2 ].    (8.2.9)
Both columns can be computed using matrix-vector products and triangular solves. The second
part is somewhat more expensive to compute.
The variable projection approach reduces the dimension of the parameter space and leads to a
more well-conditioned problem. Furthermore, because no starting values have to be provided for
the linear parameters, convergence to a global minimum is more likely. Krogh [708, 1974] re-
ports that at the Jet Propulsion Laboratory (JPL) the variable projection algorithm solved several
problems that methods not using separability could not solve.
Kaufman [686, 1975] gave a simplified version of the variable projection algorithm that uses
an approximate Jacobian matrix obtained by dropping the second term in (8.2.4). This simplifica-
tion was motivated by the observation that the second part of the Jacobian is negligible when the
residual r is small. Kaufman’s simplification reduces the arithmetic cost per iteration by about
25%, with only a marginal increase in the number of iterations. Kaufman and Pereyra [688, 1978]
extend the simplified scheme to problems with separable equality constraints. Their approach is
further refined by Corradi [270, 1981].
Ruhe and Wedin [942, 1980] consider variable projection methods for more general separable
problems, where one set of variables can be easily eliminated. They show that the asymptotic rate
of convergence of the variable projection method is essentially the same as for the Gauss–Newton
method applied to the full problem. Both converge quadratically for zero-residual problems,
whereas ALS always converges linearly.
Several implementations of the variable projection method are available. An improved ver-
sion of the original program VARPRO was given by John Bolstad in 1977. It uses Kaufman's
modification, allows for weights on the observations, and also computes the covariance matrix.
A version called VARP2 by LeVeque handles multiple right-hand sides. Both VARPRO
and VARP2 are available in the public domain at http://www.netlib.org/opt/. Another
implementation was written by Linda Kaufman and David Gay for the Port Library. A well-
documented implementation in MATLAB written by O’Leary and Rust [839, 2013] uses the full
Jacobian as in the original Golub–Pereyra version. The motivation is that in many current appli-
cations, the increase in the number of function evaluations in Kaufman’s version outweighs the
savings gained from using an approximate Jacobian.
In the bilinear least squares (BLSQ) problem (8.2.12), the residuals ri(x, y) = bi − xT Ai y, i = 1 : m,
depend on given data matrices Ai ∈ Rp×q, and the problem is separable in each of the variables x and y.
In system theory, the identification of a Hammerstein model leads to a BLSQ problem; see Wang, Zhang,
and Ljung [1100, 2009]. Related multilinear problems are used in statistics, chemometrics, and tensor regression.
The data matrices Ai form a three-dimensional tensor A ∈ Rm×p×q with elements ai,j,k .
Slicing the tensor in the two other possible directions, we obtain matrices
The BLSQ problem is linear in each of the variables x and y. If (x, y) is a solution to problem
BLSQ, x solves the linear least squares problem
min_x ∥Bx − b∥2,    B = Σ_{k=1}^q yk Sk ∈ Rm×p,    (8.2.15)
For α ̸= 0, the residuals ri (x, y) = bi − xTAi y of the bilinear problem (8.2.12) are invariant
under the scaling (αx, α−1 y) of the variables. This shows that the bilinear problem (8.2.12) is
singular. The singularity can be handled by imposing a quadratic constraint ∥x∥2 = 1. Alterna-
tively, a linear constraint xi = eiT x = 1 for some i, 1 ≤ i ≤ p, can be used. For convenience,
we assume in the following that i = 1 (x1 = 1) and use the notation
x = [ 1 ; x̄ ] ∈ Rp,    x̄ ∈ Rp−1.
The residual r̄(x̄, y) of the constrained problem is r(x, y) with x given as above. The Jacobian
of the constrained problem is
J̄ = J̄(x̄, y) = ( J̄x̄  Jy ) ∈ Rm×(p+q−1).
Consider fitting to given observations (xi, yi), i = 1 : m, a model
y = f(x, β) ∈ Rq,    x ∈ Rn,    β ∈ Rp.    (8.2.19)
In orthogonal distance regression (ODR) the parameters β are determined by minimizing the sum
of squares of the orthogonal distances from the observations (xi , yi ) to the curve y = f (x, β); see
Figure 8.2.1. If δi and ϵi do not have constant covariance matrices, then weighted norms should
be substituted above. If the errors in the independent variables are small, then ignoring these
errors will not seriously degrade the estimates of x. Independent of statistical considerations, the
orthogonal distance measure has natural applications in fitting data to geometrical elements.
In linear orthogonal distance regression one wants to fit m > n given points yi ∈ Rn ,
i = 1 : m, to a hyperplane
M = {z | cT z = d}, z, c ∈ Rn , ∥c∥2 = 1, (8.2.21)
where c is the unit normal vector of M , and |d| is the orthogonal distance from the origin to the
plane in such a way that the sum of squares of the orthogonal distances from the given points
to M is minimized. For n = 1 this problem was studied in 1878 by Adcock [7, 1878]. The
orthogonal projection zi of the point yi onto M is given by
zi = yi − (cT yi − d)c. (8.2.22)
It is readily verified that zi lies on M and that the residual zi − yi is parallel to c and hence or-
thogonal to M . Hence, the problem of minimizing the sum of squares of the orthogonal distances
is equivalent to minimizing
Σ_{i=1}^m (cT yi − d)2    subject to    ∥c∥2 = 1.
If we put Y T = (y1 , . . . , ym ) ∈ Rn×m and e = (1, . . . , 1)T ∈ Rm , this problem can be written
in matrix form as
min_{c,d} ∥ ( −e  Y ) [ d ; c ] ∥2    subject to    ∥c∥2 = 1.    (8.2.23)
For a fixed c, this expression is minimized when the residual vector Y c − de is orthogonal to e,
i.e., when eT (Y c − de) = eT Y c − deT e = 0. Since eT e = m, it follows that
d = (1/m) cT Y T e = cT ȳ,    ȳ = (1/m) Y T e,    (8.2.24)
where ȳ is the mean of the given points yi. Hence d is determined by the condition that the mean
ȳ lies on the optimal plane M . Note that this property is shared by the solution to the usual linear
regression problem.
We now subtract the mean value ȳ from each given point and form the matrix
Ȳ T = (ȳ1 , . . . , ȳm ), ȳi = yi − ȳ, i = 1 : m.
The unit normal c of the optimal hyperplane is the right singular vector vn of Ȳ corresponding to
its smallest singular value σn. Orthogonalizing the shifted points ȳi against vn and adding back
the mean value, we get the fitted points
zi = ȳi − (vnT ȳi)vn + ȳ ∈ M.    (8.2.26)
The linear orthogonal regression problem always has a solution. The solution is unique when
σn−1 > σn , and the minimum sum of squares is σn2 . Moreover, σn = 0 if and only if the
given points yi , i = 1 : m, all lie on the hyperplane M . In the extreme case when all points
coincide, then Ȳ = 0, and any plane going through ȳ is a solution. The above method solves
the problem of fitting an (n − 1)-dimensional linear manifold to a given set of points in Rn . It
is readily generalized to fitting an (n − p)-dimensional linear manifold by orthogonalizing the
shifted points y against the p right singular vectors of Y corresponding to the p smallest singular
values.
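The construction above translates into a few lines of MATLAB; in this sketch Y is assumed to hold the m given points yi as its rows.

% Orthogonal regression fit of a hyperplane {z : c'*z = d} to the rows of Y.
ybar = mean(Y, 1)';                 % centroid of the given points
Ybar = Y - ybar';                   % shifted points (rows)
[~, ~, V] = svd(Ybar, 0);           % right singular vectors of Ybar
c = V(:, end);                      % unit normal c = v_n (smallest singular value)
d = c' * ybar;                      % the plane passes through the centroid
Z = Y - (Ybar * c) * c';            % fitted points z_i, cf. (8.2.26)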
In the nonlinear case (8.2.19), we first assume that x ∈ Rn , y ∈ R. Then the parameters
β ∈ Rp should be chosen as the solution to
min_{β,ϵ,δ} Σ_{i=1}^m (ϵi2 + δi2)    subject to    yi + ϵi = f(xi + δi, β),   i = 1, . . . , m.
This is a constrained least squares problem of special form. By using the constraints to eliminate
ϵi , the ODR problem can be formulated as an NLS problem in the parameters δ = (δ1 , . . . , δn )
and β:
min_{β,δ} Σ_{i=1}^m ( (f(xi + δi, β) − yi)2 + δi2 ).    (8.2.27)
Note that even when f (x, β) is a linear function of β, this is a nonlinear least squares problem.
If we define the residual vector r = (r1T , r2T )T by
r1i (δ, β) = f (xi + δi , β) − yi , r2i (δ) = δi ,
then (8.2.27) is a standard NLS problem of the form min_{β,δ} ∥r(δ, β)∥22. The corresponding
Jacobian matrix has the block structure
J = [ D  0 ; V  J ] ∈ R2mn×(mn+p).    (8.2.28)
Here D > 0 is a diagonal matrix of order mn reflecting the variance in δi,
V = diag(v1T, . . . , vmT) ∈ Rm×mn, and
viT = ∂f(xi + δi, β)/∂xi,    Jij = ∂f(xi + δi, β)/∂βj,    i = 1, . . . , m,   j = 1, . . . , p.
Note that J is sparse and highly structured. In applications, usually mn ≫ p, and then account-
ing for the errors δi in xi will considerably increase the size of the problem. Therefore, the use
of standard NLS software to solve orthogonal distance problems is not efficient or even feasible.
By taking the special structure of (8.2.27) into account, the work in ODR can be reduced to only
slightly more than for an ordinary least squares fit of β.
In the Gauss–Newton method, corrections ∆δk and ∆βk to the current approximations δk
and βk are obtained from the linear least squares problem
min_{∆δ,∆β} ∥ J [ ∆δ ; ∆β ] − [ r1 ; r2 ] ∥2,    (8.2.29)
where J , r1 , and r2 are evaluated at δk and βk . A stable way to solve this problem is to compute
the QR decomposition of J . First, a sequence of plane rotations is used to zero the (2,1) block
V of J . The rotations are also applied to the right-hand side vector. We obtain
Q1 J = [ U  K ; 0  J̄ ],    Q1 [ r1 ; r2 ] = [ t ; r̄2 ],    (8.2.30)
where U is a block diagonal matrix with m upper triangular blocks of size n × n. Problem
(8.2.29) now decouples, and ∆β is determined as the solution to
min_{∆β} ∥J̄∆β − r̄2∥2,
where J¯ ∈ Rm×n . This second step has the same complexity as computing the Gauss–Newton
correction in the classical NLS problem. When ∆β has been determined, ∆δ is obtained by
back-substitution:
U ∆δ = u1 , u1 = t − K∆β. (8.2.31)
More generally, when y ∈ Rq and x ∈ Rn , the ODR problem has the form
min_{β,δ} Σ_{i=1}^m ( ∥f(xi + δi, β) − yi∥22 + ∥δi∥22 ).    (8.2.32)
After an orthogonal reduction to upper triangular form, the reduced Jacobian has the same block
structure as in (8.2.30). However, now D ∈ Rmnt ×mnt is block diagonal with upper triangular
matrices of size nt × nt on the diagonal, K ∈ Rmnt ×n , and J¯ ∈ Rmny ×n .
In practice a trust-region stabilization of the Gauss–Newton step should be used, and then we
need to solve (8.2.29) with
Jµ = [ D  0 ; V  J ; µT  0 ; 0  µS ],    r = [ r1 ; r2 ; 0 ; 0 ],    (8.2.33)
with several different values of the parameter µ, where S and T are nonsingular diagonal ma-
trices. An orthogonal reduction to upper triangular form would result in a matrix with the same
block structure as in (8.2.30), where D ∈ Rmn×mn is block diagonal with upper triangular ma-
trices of size n × n on the diagonal. Therefore, if done in a straightforward way, this reduction
does not take full advantage of the structure in the blocks and is not efficient.
Boggs, Byrd, and Schnabel [158, 1987] show how the computations in a trust-region step can
be carried out so that the cost is of the same order as for a standard least squares fit. They claim
that computing a QR factorization of Jµ would require O((mn + p)2 ) operations for each µ.
For this reason the ∆δ variables are eliminated as outlined above by using the normal equations,
combined with the Woodbury formula (3.3.9). The reduced least squares problem for ∆β can
then be written min ∥J̃∆β − r̃2∥2, where
J̃ = [ MJ ; µS ],    r̃2 = [ −M(r1 − V E−2 Dr2) ; 0 ],
where E = D2 + µ2 T 2 , and an explicit expression for the diagonal M can be given in terms
of E and V . A software package ODRPACK for orthogonal distance regression based on this
algorithm is described in Boggs et al. [159, 1989].
Schwetlick and Tiller [980, 1985], [981, 1989] develop an ODR algorithm based on the
Gauss–Newton method with a special Marquardt-type regularization for which similar simplifi-
cations can be achieved. The path
p(µ) := −(∆δ(µ)T , ∆x(µ)T )T
is shown to be equivalent to a trust-region path defined by a nonstandard scaling matrix, and the
step is controlled in trust-region style.
An algorithm for computing the trust-region step based on QR factorization of Jµ is given
by Björck [136, 2002]. Because only part of U and Q1 need to be computed and saved, this
has the same leading arithmetic cost as the normal equation algorithm by Boggs et al. By taking
advantage of the special structure of Jµ , only part of the nonzero elements in the factors Q and
R need to be stored. In the first step of the factorization, plane rotations are used to merge the
two diagonal blocks D and µT . After a permutation of the last two block rows, we obtain
G [ D  0  r1 ; V  J  r2 ; 0  µS  0 ; µT  0  0 ] = [ Dµ  0  r̃1 ; V  J  r2 ; 0  µS  0 ; 0  0  r4 ],    (8.2.34)
where Dµ = (D2 + µ2 T 2 )1/2 is diagonal. The rotations are also applied to the right-hand side.
This step does not affect the second block column and the last block row in Jµ . The key step is
orthogonal triangularization of the first block column in (8.2.34). This will not affect the last two
block rows and can be carried out efficiently if full advantage is taken of the structure of blocks
Dµ and V .
A nonlinear least squares problem with structure similar to ODR arises in structured total
least squares problems. Given the data matrix A and the vector b, the TLS problem is to find E
and x to solve
min ∥(E r)∥F such that r = b − (A + E)x. (8.2.35)
If A is sparse, we may want E to have the same sparsity structure. Rosen, Park, and Glick [936,
1996] impose an affine structure on E by defining a matrix X such that Xδ = Ex, where δ ∈ Rq
is a vector containing the nonzero elements of E, and the elements of X ∈ Rm×q consist of the
elements of the vector x with a suitable repetition. Then the problem can be written as a nonlinear
least squares problem
min_{δ,x} ∥ [ δ ; r(δ, x) ] ∥2,    r(δ, x) = Ax − b + Xδ.    (8.2.36)
When E is general and sparse, the structure of the Jacobian of r with respect to δ will be similar
to that in the ODR problem.
In algebraic fitting, a least squares functional Σ_{i=1}^m f(xi, yi, p)2 is minimized, and it directly involves the function f(x, y, p); see Pratt [905, 1987]. In geometric
fitting, a least squares functional is minimized and involves the geometric distances from the
data points to the curve defined by f (x, y, p) = 0,
min_p Σ_{i=1}^m di2(p),    di2(p) = min_{f(x,y,p)=0} ( (x − xi)2 + (y − yi)2 ),    (8.2.38)
where di (p) is the orthogonal distance from the data point (xi , yi ) to the curve. This is similar
to orthogonal distance regression described for an explicitly defined function y = f (x, β) in
Section 8.2.3. However, for implicitly defined functions the calculation of the distance function
di (p) is more complicated.
Algebraic fitting often leads to a simpler problem, in particular when f is linear in the pa-
rameters p. Methods for geometrical fitting are slower but give better results both conceptually
and visually. Implicit curve fitting problems, where a model h(y, x, t) = 0 is to be fitted to
observations (yi , ti ), i = 1, . . . , m, can be formulated as a special least squares problem with
nonlinear constraints:
This is a special case of problem (8.1.38). It has n + m unknowns x and z, but the sparsity of
the Jacobian matrices can be taken advantage of; see Lindström [750, 1984].
We first discuss algebraic fitting of circles and ellipses in two dimensions. A circle has three
degrees of freedom and can be represented algebraically by
f (x, y, p) = a(x2 + y 2 ) + b1 x + b2 y − c = 0,
The constraint is added because p is only defined up to a constant multiple. This problem is
equivalent to the linear least squares problem
For plotting, the circle can more conveniently be represented in parametric form,
[ x(θ) ; y(θ) ] = [ xc ; yc ] + r [ cos θ ; sin θ ],
where the center (xc yc )T and radius r of the circle can be expressed in terms of p as
[ xc ; yc ] = −(1/(2a)) [ b1 ; b2 ],    r = (1/(2a)) √(∥b∥22 + 4ac).    (8.2.41)
From this expression we see that the constraint 2ar = 1 can be written as a quadratic constraint
b21 + b22 + 4ac = pT Cp = 1, where
p = [ a ; b1 ; b2 ; c ],    C = [ 0  0  0  2 ; 0  1  0  0 ; 0  0  1  0 ; 2  0  0  0 ].
This will guarantee that a circle is fitted. (Note that equality can be used in the constraint be-
cause of the free scaling.) The matrix C is symmetric but not positive definite (its eigenvalues
are −2, 1, 1, 2). We discuss the handling of such quadratic constraints later when dealing with
ellipses.
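Before turning to ellipses, note that (8.2.41) is easily evaluated; the following hedged MATLAB helper assumes the algebraic circle parameters are stored as p = (a, b1, b2, c)T with a > 0.

% Recover center and radius of the circle from p = (a, b1, b2, c)', cf. (8.2.41).
a = p(1);  bvec = p(2:3);  c = p(4);
center = -bvec / (2*a);                        % (x_c, y_c)
radius = sqrt(norm(bvec)^2 + 4*a*c) / (2*a);   % r (assuming a > 0)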
An ellipse in the x, y-plane can be represented algebraically by
f(x, y, p) = ( x  y ) A [ x ; y ] + ( b1  b2 ) [ x ; y ] − c = 0,    A = [ a11  a12 ; a12  a22 ].    (8.2.42)
Then the objective function is Φ(p) = ∥Sp∥22 , where S is an m×6 matrix with rows sTi . Because
the parameter vector p is only determined up to a constant factor, the problem formulation must
be completed by including some constraint on p. Three such constraints have been considered.
The first is the SVD constraint ∥p∥2 = 1. The solution of this constrained problem is the right
singular vector corresponding to the smallest singular value of S.
The second is a linear constraint dT p = 1, where d is a fixed vector with ∥d∥2 = 1. Let H be an orthogonal matrix such that Hd = e1. Then
the constraint becomes dT p = dT H T Hp = eT1 (Hp) = 1, so we can write Sp = (SH T )(Hp),
where Hp = (1 q T )T . Now, if we form SH T = S̃ = [s̃1 S̃2 ], we arrive at the unconstrained
linear least squares problem
min_q ∥S̃2 q + s̃1∥22    subject to    p = HT [ 1 ; q ].    (8.2.45)
The third is a quadratic constraint pT Cp = 1, giving the problem
min_p ∥Sp∥22    subject to    pT Cp = 1.    (8.2.46)
For general symmetric C, problem (8.2.46) reduces to the generalized eigenvalue problem
(C − λS T S)p = 0.
For the special case C = B TB, i.e., the constraint ∥Bp∥22 = 1, this becomes
(B TB − λS T S)p = 0.    (8.2.47)
Since λ∥Sp∥22 = ∥Bp∥22 = 1, we want the eigenvector p corresponding to the largest eigenvalue
λ or, equivalently, the largest eigenvalue of (S T S)−1B TB. If S = QR is the QR factorization
of S with R nonsingular, the eigenvalue problem (8.2.47) can be written
( (BR−1)T(BR−1) − λI ) q = 0,    q = Rp.
Hence, q is the right singular vector corresponding to the largest singular value of BR−1. Of
special interest is the choice B = (0 I). In this case, the constraint can be written ∥p2 ∥22 = 1,
where p = (p1 p2 ). With R conformally partitioned with p, problem (8.2.46) is equivalent to the
generalized total least squares problem
min_p ∥ [ R11  R12 ; 0  R22 ] [ p1 ; p2 ] ∥22    subject to    ∥p2∥2 = 1.    (8.2.48)
For any p2 we can determine p1 so that R11 p1 + R12 p2 = 0. Hence p2 solves minp2 ∥R22 p2 ∥2
subject to ∥p2 ∥2 = 1 and can be obtained from the SVD of R22 .
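A hedged MATLAB sketch of this two-step computation is given below; S is assumed to be formed with its columns ordered so that the last n2 components of p make up the constrained block p2.

% Solve min ||S*p||_2 subject to ||p2||_2 = 1 via (8.2.48).
[~, R] = qr(S, 0);                    % upper triangular factor of S
k   = size(S, 2) - n2;                % number of unconstrained parameters p1
R11 = R(1:k, 1:k);  R12 = R(1:k, k+1:end);  R22 = R(k+1:end, k+1:end);
[~, ~, V] = svd(R22);
p2 = V(:, end);                       % right singular vector, smallest singular value
p1 = -R11 \ (R12 * p2);               % enforces R11*p1 + R12*p2 = 0
p  = [p1; p2];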
If σmin (S) = 0, the data exactly fit the ellipse. If σmin (S) is small, the different constraints
above give similar solutions. However, when errors are large, the different constraints can lead
to very different solutions; see Varah [1089, 1996].
A desirable property of a fitting algorithm is that when the data are translated and rotated,
the fitted ellipse should be transformed in the same way. It can be shown that to have this invari-
ance, the constraint must involve only symmetric functions of the eigenvalues of A. The SVD
constraint does not have this property. For a linear constraint, the choice dT = (1 0 1 0 0 0)
gives the desired invariance:
dT p = a11 + a22 = λ1 + λ2 = 1.
The Bookstein constraint
a11² + 2a12² + a22² = λ1² + λ2² = 1
also leads to this kind of invariance. Note that the Bookstein constraint can be put in the form
∥Bp∥2 = 1, B = (0 I), by permuting the variables and scaling by √2:
p = (b1, b2, c, a11, √2 a12, a22)T,    sTi = (xi, yi, −1, xi2, √2 xi yi, yi2).
Another useful quadratic constraint is pT Cp = a11 a22 −a212 = λ1 λ2 = 1. This has the advantage
of guaranteeing that an ellipse is generated rather than another conic section. Note that the matrix
C corresponding to this constraint is not positive definite, so it leads to a generalized eigenvalue
problem.
To plot the ellipse, it is convenient to convert the algebraic form (8.2.42) to the parametric
form
[ x(θ) ; y(θ) ] = [ xc ; yc ] + G(α) [ a cos θ ; b sin θ ],    G(α) = [ cos α  sin α ; −sin α  cos α ],    (8.2.51)
where G(α) is a rotation with angle α. The new parameters (xc , yc , a, b, α) can be obtained from
the algebraic parameters p by the eigenvalue decomposition A = G(α)ΛG(α)T as follows. We
assume that Λ = diag (λ1 , λ2 ). If a12 = 0, take G(α) = I and Λ = A. Otherwise, compute
t = tan(α) from
τ = (a22 − a11)/(2a12),    t = sign(τ)/( |τ| + √(1 + τ2) ).
This ensures that |α| < π/4. Then cos α = 1/√(1 + t2), sin α = t cos α, λ1 = a11 − ta12, and
λ2 = a22 + ta12. The center of the ellipse is given by
[ xc ; yc ] = −(1/2) A−1 b = −(1/2) G(α)Λ−1G(α)T b,    (8.2.52)
and the semi-axes are (a, b) = ( √(c̃/λ1), √(c̃/λ2) ), where
c̃ = c − (1/2) bT [ xc ; yc ] = c + (1/4) bT G(α)Λ−1G(α)T b.    (8.2.53)
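The conversion can be summarized in a short hedged MATLAB fragment, assuming the algebraic parameters are given as the symmetric 2-by-2 matrix A, the vector b = (b1, b2)T, and the scalar c of (8.2.42).

% Convert algebraic ellipse parameters (A, b, c) to the parametric form (8.2.51).
a11 = A(1,1);  a12 = A(1,2);  a22 = A(2,2);
if a12 == 0
    G = eye(2);  lam = [a11; a22];
else
    tau = (a22 - a11) / (2*a12);
    t   = sign(tau) / (abs(tau) + sqrt(1 + tau^2));   % tan(alpha), |alpha| < pi/4
    cs  = 1/sqrt(1 + t^2);  sn = t*cs;
    G   = [cs, sn; -sn, cs];                          % rotation G(alpha)
    lam = [a11 - t*a12; a22 + t*a12];                 % eigenvalues of A
end
zc   = -0.5 * (A \ b(:));                 % center (x_c, y_c), cf. (8.2.52)
ctil = c + 0.25 * (b(:)' * (A \ b(:)));   % c-tilde in (8.2.53)
ax   = sqrt(ctil ./ lam);                 % semi-axes (a, b)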
The algorithms described here can be generalized for fitting conic sections in three dimen-
sions. In some applications, e.g., lens-making, it is required to fit a sphere to points representing
only a small patch of the sphere surface. For a discussion of this case, see Forbes [419, 1989].
Other fitting problems, such as fitting a cylinder, circle, or cone in three dimensions, are consid-
ered in Forbes [418, 1989].
In geometric fitting of circles and ellipses, the sum of squares of the orthogonal distances from each data
point to the curve is minimized. When the curve admits a parametrization, such as in the case of
fitting a circle and ellipse, this simplifies. We first consider fitting of a circle written in parametric
form as
f(x, y, p) = [ x − xc − r cos θ ; y − yc − r sin θ ] = 0,
where p = (xc, yc, r)T. The problem can be written as a nonlinear least squares problem
min_{p, θ1,...,θm} ∥r(p, θ)∥22,    (8.2.54)
where r is a vector of length 2m whose ith block is f(xi, yi, p) with angle θi.
After reordering of rows, the Jacobian associated with this problem has the form
J = [ rS  A ; −rC  B ],
where S = diag(sin θ1, . . . , sin θm) and C = diag(cos θ1, . . . , cos θm). The first m columns
of J correspond to the parameters θ and are mutually orthogonal. Multiplying from the left with
the orthogonal matrix QT = [ S  −C ; C  S ], we obtain
QT J = [ rI   SA − CB ; 0   CA + SB ].
Hence to compute the QR factorization of J and the Gauss–Newton search direction for problem
(8.2.54), we only need to compute the QR factorization of the m × 3 matrix CA + SB. A
trust-region stabilization can easily be added.
For the geometrical fitting of an ellipse we proceed similarly. We now have the parametric
form (8.2.51),
f(x, y, p) = [ x − xc ; y − yc ] − G(α) [ a cos θ ; b sin θ ],    G(α) = [ cos α  sin α ; −sin α  cos α ],
where p = (xc , yc , a, b, α)T . The problem can be written as a nonlinear least squares problem
(8.2.54) if we define r as a vector of length 2m with
ri = [ xi − xc ; yi − yc ] − G(α) [ a cos θi ; b sin θi ].
As for the circle, we can take as initial approximation for p the values for an algebraically
fitted ellipse. To obtain initial values for θi we note that for a point (x(θ), y(θ)) on the ellipse,
we have from (8.2.51)
[ a cos θ ; b sin θ ] = GT(α) [ x(θ) − xc ; y(θ) − yc ].
As an initial approximation one can take θi = arctan(vi /ui ), where
[ ui ; vi ] = GT(α) [ (xi − xc)/a ; (yi − yc)/b ],    i = 1, . . . , m.
and
∂ri/∂α = −G′(α) [ a cos θi ; b sin θi ],    G′(α) = [ −sin α  cos α ; −cos α  −sin α ].
To simplify, multiply the Jacobian from the left by the 2m × 2m block diagonal orthogonal
matrix diag (GT (α), . . . , GT (α)), noting that
G(α)T ∂ri/∂α = [ −b sin θi ; a cos θi ],    G(α)T G′(α) = [ 0  1 ; −1  0 ].
With rows reordered, the structure of the transformed Jacobian is similar to the circle case,
J = [ aS  A ; −bC  B ],    S = diag(sin θi),    C = diag(cos θi),
where A and B are now m × 5 matrices. The first block column can easily be triangularized
using the diagonal form of S and C. The main work is the final triangularization of the resulting
(2,2) block. If a = b, the sum of the first m columns of J is zero. In this case the parameter α is
not well determined, and it is essential to use some kind of regularization of α.
The fitting of a sphere or an ellipsoid can be treated analogously. The sphere can be repre-
sented in parametric form as
f(x, y, z, p) = [ x − xc − r cos θ cos ϕ ; y − yc − r cos θ sin ϕ ; z − zc − r sin θ ] = 0,    (8.2.55)
are competitive for problems of small to medium size. For applications where A is large and/or
sparse, other algorithms are to be preferred. For example, consider calculating the optimal
amount of material to be removed in the polishing of large optics. Here the nonnegativity con-
straints come in because polishing can only delete material from the surface. A typical problem
might have 8,000 to 20,000 rows and the same number of unknowns, with only a small per-
centage of the matrix elements being nonzero. In general, the problem is rank-deficient, and
the nonnegativity constraints are active for a significant fraction of the elements of the solution
vector. Applications in data mining and machine learning (where the given data, such as images
and text, are required to be nonnegative) give rise to problems of even larger size.
In gradient projection methods for problem BLS (bound-constrained least squares), a step
in the direction of the negative gradient is followed by projection onto the feasible set. For
example, the projected Landweber method for an NNLS problem is
x(k+1) = P( x(k) + ωAT(b − Ax(k)) ),    0 < ω < 2/σ1(A)2,
where P is the projection onto the set x ≥ 0. These simple methods have the disadvantage of
slow convergence, and σ1 (A) may not be known.
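As a minimal MATLAB sketch (assuming an estimate sigma1 of σ1(A) and an iteration limit maxit are available):

% Projected Landweber iteration for min ||A*x - b||_2 subject to x >= 0.
omega = 1 / sigma1^2;                % any fixed 0 < omega < 2/sigma1^2
x = zeros(size(A, 2), 1);
for k = 1:maxit
    x = max(x + omega * (A' * (b - A*x)), 0);   % gradient step, then projection
end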
For problem NNLS, an equivalent unconstrained problem can be obtained by introducing the
parametrization xi = ezi, i = 1, . . . , n. In image restoration, this has a physical interpretation;
see Hanke, Nagy, and Vogel [571, 2000]. By the chain rule, the gradient gz of Φ(x) = ½∥Ax − b∥22
with respect to z is
X = diag (x) ≥ 0, y = AT (Ax − b) ≥ 0, gz = Xy. (8.3.1)
Setting gz = 0, we recover the KKT first-order optimality conditions for NNLS. The correspond-
ing modified residual norm steepest descent iterative method is
x(k+1) = x(k) + αk Xk AT (b − Ax(k) ), Xk = diag (x(k) ), (8.3.2)
where the step is in the direction of the negative gradient. This iteration can be interpreted as a
nonlinear Landweber method in which Xk acts as a (variable) preconditioner. The step length
αk in (8.3.2) is restricted to ensure nonnegativity of the x(k+1) .
In certain problems in astronomy and medical imaging, such as positron emission tomogra-
phy (PET), data is subject to noise with a Poisson distribution. Statistical considerations justify
computing a nonnegative minimizer of the maximum likelihood functional
Ψ(x) = Σ_{i=1}^n yi − Σ_{i=1}^n bi log yi,    y ≡ Ax;    (8.3.3)
see Kaufman [687, 1993]. With the same parametrization x = ez as above, the gradient of Ψ
with respect to z is
gz = Xgx = XAT Y −1 (Ax − b),
where X = diag (x) and Y = diag (y). Assume now that A is nonnegative and all column
sums of A are 1, i.e., AT e = e, where e = (1, 1, . . . , 1)T . This assumption can be interpreted
as an energy conservation property of A and can always be satisfied by an initial scaling of the
columns. Then XAT Y −1 Ax = XAT e = x, and the gradient becomes gz = x − XAT Y −1 b.
Setting the gradient equal to zero leads to the fixed-point iteration
x(k+1) = Xk AT Yk−1 b, Xk = diag (x(k) ), Yk = diag (Ax(k) ). (8.3.4)
This is the basis for the expectation maximization (EM) algorithm, which is popular in astron-
omy. Note that nonnegativity of the iterates in (8.3.4) is ensured if b ≥ 0.
The choice of starting point is important in iterative NNLS algorithms. Typical applications
require the solution of an underdetermined linear system, which has no unique solution. Different
initial points will converge to different local optima. The EM algorithm is very sensitive to the
initial guess and does not allow x(0) = 0. Dax [294, 1991] shows that the use of Gauss–Seidel
iterations to obtain a good initial point is likely to give large gains in efficiency.
Steepest descent methods for nonnegative least squares have only a linear rate of conver-
gence, even when a line search is used. They tend to take very small steps whenever the level
curves are ellipsoidal. Kaufman [687, 1993] considers acceleration schemes based on the conju-
gate gradient (CG) method. In the inner iterations of CG, the scaling matrix X is kept fixed. CG
is restarted with a new scaling matrix Xk whenever a new constraint becomes active. Nagy and
Strakoš [819, 2000] consider a variant of this algorithm and show that it is more accurate
and efficient than unconstrained Krylov subspace methods.
This is the basis of a primal-dual interior method. It uses the Newton directions for the nonlinear
system
F2(x, y) = [ Xy ; AT(Ax − b) − y ] = 0,
where the iterates are not forced to satisfy the linear constraints y = AT(Ax − b). A sequence
of points {xk > 0} is computed by
(xk+1, yk+1) = (xk, yk) + θk (uk, vk),
where θk is a positive step size, and (uk, vk) satisfies the linear system
[ Yk  Xk ; ATA  −I ] [ uk ; vk ] = [ −Xk yk + µk e ; AT rk + yk ],    (8.3.6)
where θkmax is the largest value such that xk+1 ≥ 0, yk+1 ≥ 0, and
After uk has been calculated, vk is determined from the first block equation in (8.3.6):
with Uk = diag (uk ) and Vk = diag (vk ). When uk and vk have been computed, zk can be
found as the solution of the least squares problem
min_{zk} ∥ [ A ; (Xk Yk)−1/2 ] uk − [ rk ; (Xk Yk)−1/2 (µk − Uk Vk)e ] ∥2.
with θk as above. This choice does not guarantee a decrease in g(x, y) but seems to work well in
practice. Subproblems (8.3.7) and (8.3.8) must be solved from scratch at each iteration because
no reliable updating methods are available.
Portugal, Júdice, and Vicente [900, 1994] discuss implementation issues and present com-
putational experience with a predictor-corrector algorithm for problem NNLS. They find this
method gives high accuracy even when the subproblems are solved by forming the normal equa-
tion.
An interior method for large-scale nonnegative regularization is given by Rojas and Stei-
haug [933, 2002]. Surveys of interior methods are given by Wright [1133, 1997] and Forsgren,
Gill, and Wright [421, 2002]. The theory of interior methods for convex optimization is devel-
oped in the monumental work by Nesterov and Nemirovski [828, 1994]. The state of the art of
interior methods for optimization is surveyed by Nemirovski and Todd [825, 2008].
In the nonnegative matrix factorization (NNMF) problem, a given matrix A ∈ Rm×n is approximated by a product of two nonnegative factors,
min_{W≥0, H≥0} ∥A − W H T∥2F.    (8.3.9)
The NNMF problem has received much attention; see Kim and Park [697, 2008]. Applications
include analysis of image databases, data mining, machine learning, and other retrieval and clus-
tering operations. Vavasis [1092, 2009] shows that the NNMF problem is equivalent to a problem
in polyhedral combinatorics and is NP-hard.
If either of factors H or W is kept fixed, then computing the other factor in problem NNMF
is a standard NNLS problem with multiple right-hand sides. It can be solved independently by
an NNLS algorithm. For example, if H is fixed, then
min_{W≥0} ∥HW T − AT∥2F = Σ_{i=1}^m min_{wi≥0} ∥HwiT − aiT∥22,
where wi and ai are the ith rows of W and A, respectively. Given an initial guess H (1) , the
alternating NNLS method (ANLS) is
for k = 1, 2, . . . ,    (8.3.10)
min_{W≥0} ∥H(k)W T − AT∥2F,   giving W(k),    (8.3.11)
min_{H≥0} ∥W(k)H T − A∥2F,   giving H(k+1).    (8.3.12)
The two NNLS subproblems are solved alternately until a convergence criterion is satisfied. It
can be shown that every limit point attained by ANLS is a stationary point of (8.3.9). If A is
rank-deficient, a unique least-norm solution is computed in a second stage.
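A straightforward, if not particularly fast, MATLAB sketch of ANLS applies lsqnonneg row by row; the matrix A (m-by-n), the target rank k, a nonnegative initial factor H0 (n-by-k), and an iteration limit maxit are assumed given.

% Alternating NNLS (ANLS) sketch for A ~ W*H' with W, H >= 0.
[m, n] = size(A);
H = H0;  W = zeros(m, k);
for iter = 1:maxit
    for i = 1:m, W(i,:) = lsqnonneg(H, A(i,:)')'; end   % fix H, update rows of W
    for j = 1:n, H(j,:) = lsqnonneg(W, A(:,j))'; end    % fix W, update rows of H
end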
A problem with the ANLS method is that convergence is often slow, and the solution reached
is not guaranteed to be a global optimum. Finding a good initial approximation is important.
We know that the best rank-k approximation of A can be found from the first k singular triplets
σi ui viT of A. How to obtain these triplets is discussed in Section 7.3.3. Boutsidis and
Gallopoulos [172, 2008] show that good initial values for ANLS can be obtained by replacing all
negative elements in the product ui viT by zeros.
Surveys of algorithms for nonnegative matrix factorizations are given by Berry et al. [116,
2007] and Kim, He, and Park [699, 2014]. Nonnegative tensor factorizations are studied by Kim,
Park, and Eldén [698, 2007].
and the latter expression is easier to use. For 1 < p < ∞, problem (8.4.1) is strictly convex if
rank(A) = n and therefore has a unique solution. For 0 < p < 1, ψp(x) is not convex, and ∥ · ∥p
is not actually a norm, though d(x, y) = ∥x − y∥pp is a metric. For p = 1 and p → ∞, where
∥r∥1 = Σ_{i=1}^m |ri|,    ∥r∥∞ = max_{1≤i≤m} |ri|,    (8.4.2)
the minimization is complicated by the fact that these norms are only piecewise differentiable.
Already in 1799, Laplace was using the principle of minimizing the sum of the absolute errors
with the added condition that the sum of the errors be zero. He showed that this implies that the
solution x must satisfy exactly n out of the m equations. The effect on errors of using different
ℓp -norms is visualized in Figure 8.4.1.
Figure 8.4.1. The penalizing effect using the ℓp-norm for p = 0.1, 1, 2, 10.
Example 8.4.1. Consider the problem of estimating the scalar γ from m observations y ∈ Rm .
This is equivalent to
min_{γp} ∥γp e − y∥pp,    e = (1, 1, . . . , 1)T.    (8.4.3)
Minimization in the ℓ1 and ℓ∞ norms can be posed as linear programs. For the ℓ1 -norm,
define nonnegative variables r+ , r− ∈ Rm such that r = r+ − r− . Let eT = (1, 1, . . . , 1) be a
row vector of all ones. Then (8.4.1) is equivalent to
min (eT r+ + eT r− ) subject to Ax + r+ − r− = b, r+ , r− ≥ 0. (8.4.4)
The matrix ( A I −I ) has rank m and column dimension n + 2m. From standard results in
linear programming theory it follows that there exists an optimal ℓ1 solution such that at least m
ri+ or ri− are zero and at least n − rank(A) xi are zero; see Barrodale and Roberts [86, 1970].
An initial feasible basic solution is available immediately by setting x = 0 and ri+ = bi or
ri− = −bi .
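For moderate problem sizes the LP (8.4.4) can also be handed to a general solver; the hedged sketch below uses linprog from the MATLAB Optimization Toolbox with the variables ordered as (x, r+, r−).

% Solve min ||A*x - b||_1 via the linear program (8.4.4) using linprog.
[m, n] = size(A);
f   = [zeros(n,1); ones(2*m,1)];     % minimize e'*rplus + e'*rminus
Aeq = [A, eye(m), -eye(m)];          % A*x + rplus - rminus = b
lb  = [-inf(n,1); zeros(2*m,1)];     % x free, rplus >= 0, rminus >= 0
sol = linprog(f, [], [], Aeq, b, lb, []);
x   = sol(1:n);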
Barrodale and Roberts [87, 1973] use a modified simplex method to solve linear program
(8.4.4). It takes advantage of the fact that the variables ri+ and ri− cannot simultaneously be in the
basis. The simplex iterations can be performed within a condensed simplex array of dimensions
(m+2)×(n+2). Implementations of this algorithm are given in Barrodale and Roberts [88, 1974]
and Bartels and Conn [89, 1980]. The latter can handle additional linear equality and inequality
constraints. An alternative algorithm by Abdelmalek [5, 1980], [6, 1980] is based on the dual of
linear program (8.4.4).
The ℓ∞ problem, also called Chebyshev or minimax approximation, is to minimize
max_{1≤i≤m} |ri| or, equivalently, to solve
min_x ∥Ax − b∥∞.    (8.4.5)
Stiefel [1036, 1959] gave a so-called exchange algorithm for Chebyshev approximation. This is
based on the following property of the optimal solution: the maximum error is attained at n + 1
points if rank(A) = n. In a later paper he showed his algorithm to be equivalent to the simplex
method applied to a suitable linear program. Problem (8.4.5) can be formulated as
min ζ    subject to    [ A  e ; −A  e ] [ x ; ζ ] ≥ [ b ; −b ],    ζ ≥ 0.    (8.4.6)
This linear program has 2m linear constraints in n + 1 variables, and only ζ has a finite simple
bound. Osborne and Watson [850, 1967] recommended the dual program of (8.4.6),
max ( bT  −bT ) w    subject to    [ AT  −AT ; eT  eT ] w = [ 0 ; 1 ],    w ≥ 0,    (8.4.7)
which has only n+1 rows. To use a modern mathematical programming system, such as CPLEX
or Gurobi (especially when A is sparse), problem (8.4.6) can be rewritten as
min ζ    subject to    [ b ; 0 ; −∞ ] ≤ [ A  I  0 ; 0  I  e ; 0  I  −e ] [ x ; r ; ζ ] ≤ [ b ; ∞ ; 0 ],    ζ ≥ 0,    (8.4.8)
which is larger but contains only one copy of A and is very sparse.
If the assumptions in a regression model are violated and data are contaminated with outliers,
these can have a large effect on the solution. In robust regression, possible outliers among the
data points are identified and given less weight. Huber’s M-estimator (Huber [648, 1981]) is
a compromise between ℓ2 and ℓ1 estimations. It uses the least squares estimator for “normal”
data but the ℓ1 -norm estimator for data points that disagree more with the normal picture. More
precisely, Huber’s M-estimate minimizes the objective function
ψH(x) = Σ_{i=1}^m ρ( ri(x)/σ ),    (8.4.9)
where diag (|r|) denotes the diagonal matrix with ith component |ri |. Here the diagonal weight
matrix W depends on r and hence on the unknown x.
In IRLS a sequence of weighted least squares problems is solved, where the weights for the
next iteration are obtained from the current solution. The iterations are initialized by computing
the unweighted least squares solution x(1) and setting W1 = W1 (|r(1) |), where r(1) = b−Ax(1) .
In step k, one solves
min_{δx} ∥Wk (r(k) − Aδx)∥2,    Wk = diag( (|ri(k)|)(p−2)/2 ),
and sets x(k+1) = x(k) +δx(k) . It can be shown that any fixed point of the IRLS iteration satisfies
the necessary conditions for a minimum of ψ(x) = ∥r(x)∥pp .
The first study of IRLS appeared in Lawson [725, 1961], where it was applied with p = 1
for ℓ1 minimization. It was extended to 1 ≤ p ≤ ∞ by Rice and Usow [927, 1968]. Cline
[252, 1972] proved that the local rate of convergence of IRLS is linear. Osborne [848, 1985]
gives a comprehensive analysis of IRLS and proves convergence of the basic IRLS method for
1 < p < 3. For p = 1, his main conclusion is that IRLS converges with linear rate provided the
ℓ1 approximation problem has a unique nondegenerate solution. More recently, IRLS has been
used with p < 1 for computing sparse solutions.
The IRLS method is attractive because methods for solving weighted least squares are gen-
erally available. In the simplest implementations the Cholesky factorization of AT W 2 A or the
QR factorization of W A is recomputed in each step. IRLS is closely related to Newton’s method
for minimizing the nonlinear function ψ(x) = ∥b − Ax∥pp . The first and second derivatives of
ψ(x) are
∂ψ(x)/∂xj = −p Σ_{i=1}^m aij |ri|p−2 ri,    ∂2ψ(x)/(∂xj ∂xk) = p(p − 1) Σ_{i=1}^m aij |ri|p−2 aik.    (8.4.13)
where W = W (|r|) is given as in (8.4.12). Newton’s method for solving the nonlinear equation
g(x) = 0 becomes
These equations are the normal equations for the weighted linear least squares problem
It follows that the Newton step s for minimizing ψp(x) differs from the IRLS step only by the
factor q = 1/(p − 1). Hence the IRLS step is a descent direction provided the Hessian is
positive definite.
Since IRLS does not take a full Newton step, it is at best only linearly convergent. Taking the
full Newton step gives asymptotic quadratic convergence but makes the initial convergence less
robust when p < 2. In the implementation below, the full Newton step is taken only when p > 2.
For p < 2, (p − 2)/2 < 0, and a zero residual ri(k) = 0 will give an infinite weight wi(k). Then
ri = 0 also in the next step, and the zero residual will persist in all subsequent steps. Therefore,
when p < 2 it is customary to modify the weights by adding a small number:
wi(k) = ( |ri(k)| + ϵ )(p−2)/2;    (8.4.16)
see Merle and Späth [790, 1974]. Below, we take ϵ = 10−6 initially and halve its value at each
iteration. We remark that x can be eliminated from the loop by using r = (I − PA )b, where PA
is the orthogonal projector onto the column space of A.
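A minimal MATLAB sketch of this safeguarded iteration, using the plain IRLS step rather than the Newton scaling discussed above, is as follows; p, maxit, and the data A, b are assumed given.

% Simple IRLS sketch for min ||b - A*x||_p with the safeguard (8.4.16).
x = A \ b;  r = b - A*x;  ep = 1e-6;
for k = 1:maxit
    w  = (abs(r) + ep).^((p-2)/2);   % modified weights
    dx = (w.*A) \ (w.*r);            % weighted LS step: min ||W*(r - A*dx)||_2
    x  = x + dx;  r = b - A*x;
    ep = ep/2;                       % halve the safeguard at each iteration
end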
The global convergence of IRLS can be improved by using a line search. O’Leary [838,
1990] compares several strategies for implementing an efficient and reliable line search.
A useful modification of IRLS is to apply continuation. Starting with p = 2, p is successively
increased or decreased for a number of iterations until the desired value is reached. This improves
the range of values of p that give convergence and also significantly improves the rate of conver-
gence. A similar idea is to use a large initial value of ϵ in (8.4.16) and reduce it successively in
later iterations.
When p is close to unity, the convergence of the IRLS method can be extremely slow. This is
related to the fact that for p = 1 the solution has zero residuals. Li [744, 745, 1993] develops a
globalized Newton algorithm using the complementary slackness condition for the ℓ1 problem.
Far from the solution it behaves like IRLS with line search, but close to the solution it is similar
to Newton’s method for an extended nonlinear system of equations. The problem of unbounded
second derivatives is handled by a simple technique connected to the line search.
The ℓ1 constrained least squares problem
min_x ∥Ax − b∥22    subject to    ∥x∥1 ≤ µ    (8.4.18)
was proposed for variable selection in least squares problems by Tibshirani [1061, 1996]. He
gave it the colorful name LASSO, which stands for “Least Absolute Shrinkage and Selection
Operator.” For a fixed value of the regularization parameter µ > 0, the objective function in
LASSO is strictly convex over a convex feasible region. Therefore problem (8.4.18) has a unique
minimizer. Let µLS = ∥xLS ∥1 , where xLS is the unconstrained least squares solution. The trajec-
tory of the LASSO solution for µ ∈ [0, µLS ] is a piecewise linear function of µ. An algorithm for
computing the LASSO trajectory based on standard methods for convex programming is given
by Osborne, Presnell, and Turlach [849, 2000].
Efron et al. [359, 2004] gave a more intuitive algorithm called Least-Angle Regression
(LARS). By construction, the trajectory of the solution in LARS is piecewise linear with n break
points. In many cases, this trajectory coincides with that of the ℓ1 constrained least squares
problem. Indeed, with the following small modification the LARS algorithm can be used for
solving the LASSO problem: When a nonzero variable becomes zero and is about to change
sign, the variable is removed from the active set, and the least squares direction of change is
recomputed.
Theorem 8.4.2. Let x(µ) be the solution of the ℓ1 constrained least squares problem (8.4.18).
Then there exists a finite set of break points 0 = µ0 ≤ µ1 ≤ · · · ≤ µp = µLS such that x(µ) is a
piecewise linear function
x(µ) = x(µk) + ( (µ − µk)/(µk+1 − µk) ) (x(µk+1) − x(µk)),    µk ≤ µ ≤ µk+1.    (8.4.19)
The ℓ1 -constrained least squares solution can be computed with about the same arithmetic
cost as a single least squares problem. As in stepwise regression, the QR factorization of the
columns in the active set is modified in each step. When a new variable is added, the factorization
is updated by adding the new column. In the case when a variable becomes zero, the factorization
is downdated by deleting a column. Although unlikely, the possibility of a multiple change in the
active set cannot be excluded. A crude method for coping with this is to make a small random
change in the right-hand side. It follows from continuity that no index set σ can be repeated in
the algorithm. There are only a finite number of steps in the algorithm, usually not much more
than min{m, n}.
Statistical aspects of variable selection in least squares problems using LASSO and related
techniques are discussed by Hastie, Tibshirani, and Friedman [595, 2009]. In image processing a
related technique has been used, where the ℓ1 -norm in LASSO is replaced by the Total-Variation
(TV) norm, giving in one dimension the related problem
min_x ∥Ax − b∥22 + µ∥Lx∥1,    (8.4.20)
The BP objective can be interpreted as the closest convex approximation to (8.4.22). The region
∥x∥1 ≤ µ is a diamond-shaped polyhedron with many sharp corners, edges, and faces at which
one or several parameters are zero. This structure favors solutions with few nonzero coefficients.
Chen, Donoho, and Saunders [241, 2001] use BP to decompose a signal into a sparse combina-
tion of elements of a highly overcomplete dictionary, e.g., consisting of wavelets. To allow for
noise in the signal b, they also propose the BP Denoising (BPDN) problem
min_x λ∥x∥1 + ½ rT r    subject to    Ax + r = b,    (8.4.24)
where λ > 0 again encourages sparsity in x. BPDN is solved by the primal-dual interior method
PDCO [887, 2018] using LSMR to compute search directions (because A is a fast linear operator
rather than an explicit matrix).
Compressed sensing is a term coined by Donoho [329, 2006] for the problem of recovering
a signal from a small number of compressive measurements. Candès, Romberg, and Tao [206,
2006] prove that sparse solutions can with high probability be reconstructed exactly from re-
markably few measurements by compressed sensing, provided these satisfy a certain coherence
property. It has been established that compressed sensing is robust in the sense that it can deal
with measurements noise and cases where the signal is only approximately sparse. One of several
important applications is Magnetic Resonance Imaging (MRI), for which the use of compressed
sensing has improved performance by a factor of 10.
The convex optimization problem for a consistent underdetermined linear system Ax = b is
min_x ∥x∥pp    subject to    Ax = b,    1 ≤ p ≤ 2,    (8.4.25)
and can be solved by IRLS. For p = 2, the solution of (8.4.25) is the pseudoinverse solution
x = A†b. For 1 ≤ p ≤ 2, the ℓp-norm is rewritten as a weighted ℓ2-norm
∥x∥pp = Σ_{i=1}^n (|xi|/wi)2,    wi = |xi|(1−p/2).    (8.4.26)
With W = diag (wi ), problem (8.4.25) is equivalent to the weighted least-norm problem
min ∥W −1 x∥22 subject to Ax = b. (8.4.27)
From the weighted normal equations of the second kind, we obtain
x = W 2 AT (AW 2 AT )−1 b. (8.4.28)
A more stable alternative is to use the (compact) QR factorization
W AT = QR, x = W Q(R−T b). (8.4.29)
If possible, the rows of W AT should be presorted by decreasing row norm; see Section 3.2.2.
Since the weights wi are well defined for any xi , in principle no regularization is needed.
However, to prevent the matrix being inverted in (8.4.28) from becoming too ill-conditioned, the
weights should be regularized as
wi = |xi |(1−p/2) + ϵ,
where ϵ = 10−6 initially, and each iteration decreases ϵ by a factor of 2.
The MATLAB implementation below follows Burrus [190, 2012]. The iterations are started
with the least squares solution, corresponding to p1 = 2. A continuation strategy is used for p,
where the current value pk is decreased by a fixed amount dp at each iteration until the target p
is reached.
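As a simplified stand-in for the implementation referred to above, a rough MATLAB sketch of such a continuation scheme is given below; A, b, the target value p, and the decrement dp are assumed given.

% IRLS with continuation in p for min ||x||_p subject to A*x = b, cf. (8.4.26)-(8.4.29).
x = pinv(A) * b;                       % start from the p = 2 (minimum-norm) solution
pk = 2;  ep = 1e-6;
while pk > p
    pk = max(p, pk - dp);              % continuation: lower p gradually
    w  = abs(x).^(1 - pk/2) + ep;      % regularized weights
    [Q, R] = qr(w .* A', 0);           % compact QR of W*A'
    x  = w .* (Q * (R' \ b));          % x = W*Q*(R^{-T}*b), cf. (8.4.29)
    ep = ep / 2;
end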
Candès, Wakin, and Boyd [207, 2008] propose an iterative algorithm similar to IRLS in
which each iteration solves the weighted ℓ1 problem
min_x Σ_{i=1}^n wi |xi|    subject to    Ax = b,    (8.4.31)
and the weights are updated. This problem can be rewritten as a linear program like the
unweighted ℓ1 problem. As in IRLS, the weights are initially wi(1) = 1 and then updated as
wi(k+1) = 1/( |xi(k)| + ϵ ).
Solving the weighted ℓ1 problem (8.4.31) is more complex than solving the weighted least
squares problems by IRLS. Candès, Wakin, and Boyd [207, 2008] use the primal-dual log-barrier
interior software package ℓ1-MAGIC. There is a marked improvement in the recovery of sparse
signals compared to unweighted ℓ1 minimization. The number of iterations needed is typically
less than ten, but each iteration is computationally more costly than for IRLS. The primal-dual
interior method PDCO [887, 2018] can also be applied as for Basis Pursuit or Basis Pursuit
Denoising.
Bibliography
[1] Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra,
Alyson Fox, Mark Gates, Nicolas J. Higham, Xiaoye S. Li, Jennifer Loe, Piotr Luszczek, Srikara
Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry F. Smith, Kasia Świrydowicz, Stephen Thomas,
Stanimire Tomov, Yaohung M. Tsai, and Ulrike Meier Yang. A survey of numerical linear alge-
bra methods utilizing mixed-precision arithmetic. Internat. J. High Performance Comput. Appl.,
35:344–369, 2021. (Cited on p. 114.)
[2] Ahmad Abdelfattah, Hartwig Anzt, Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak,
Piotr Luszczek, Stanimire Tomov, Ichitaro Yamazaki, and A. YarKhan. Linear algebra software for
large-scale accelerated multicore computing. Acta Numer., 25:1–160, 2016. (Cited on pp. 112,
114.)
[3] N. N. Abdelmalek. On the solution of the linear least squares problems and pseudoinverses. Com-
puting, 13:215–228, 1971. (Cited on p. 31.)
[4] N. N. Abdelmalek. Roundoff error analysis for Gram–Schmidt method and solution of linear least
squares problems. BIT Numer. Math., 11:345–368, 1971. (Cited on p. 71.)
[5] N. N. Abdelmalek. Algorithm 551: A Fortran subroutine for the l1 solution of overdetermined
linear systems of equations. ACM Trans. Math. Softw., 6:228–230, 1980. (Cited on p. 421.)
[6] N. N. Abdelmalek. l1 solution of overdetermined linear systems of equations. ACM Trans. Math.
Softw., 6:220–227, 1980. (Cited on p. 421.)
[7] R. J. Adcock. A problem in least squares. The Analyst, 5:53–54, 1878. (Cited on p. 407.)
[8] Mikael Adlers and Åke Björck. Matrix stretching for sparse least squares problem. Numer. Linear
Algebra Appl., 7:51–65, 2000. (Cited on p. 263.)
[9] S. N. Afriat. Orthogonal and oblique projectors and the characteristics of pairs of vector spaces.
Proc. Cambridge Philos. Soc., 53:800–816, 1957. (Cited on p. 119.)
[10] E. Agullo, James W. Demmel, Jack Dongarra, B. Hadri, Jakub Kurzak, Julien Langou, H. Ltaief,
P. Luszczek, and S. Tomov. Numerical linear algebra on emerging architectures: The PLASMA
and MAGMA projects. J. Phys. Conf. Ser., 180:012037, 2009. (Cited on p. 114.)
[11] N. Ahmed, T. Natarajan, and K. R. Rao. Discrete cosine transform. IEEE Trans. Comput., C23:90–
93, 1974. (Cited on p. 237.)
[12] Alexander Craig Aitken. On least squares and linear combinations of observations. Proc. Roy. Soc.
Edinburgh, 55:42–48, 1934/1936. (Cited on p. 5.)
[13] M. A. Ajiz and Alan Jennings. A robust incomplete Cholesky-conjugate gradient algorithm. Int. J.
Numer. Meth. Eng., 20:949–966, 1984. (Cited on p. 310.)
[14] M. Al-Baali and Roger Fletcher. Variational methods for non-linear least squares. J. Oper. Res.
Soc., 36:405–421, 1985. (Cited on p. 400.)
[15] M. Al-Baali and R. Fletcher. An efficient line search for nonlinear least-squares. J. Optim. Theory
Appl., 48:359–377, 1986. (Cited on p. 400.)
[16] S. T. Alexander, Ching-Tsuan Pan, and Robert J. Plemmons. Analysis of a recursive least squares
hyperbolic rotation algorithm for signal processing. Linear Algebra Appl., 98:3–40, 1988. (Cited
on p. 137.)
[17] D. M. Allen. The relationship between variable selection and data augmentation and a method for
prediction. Technometrics, 16:125–127, 1974. (Cited on p. 178.)
[18] Patrick R. Amestoy, Timothy A. Davis, and Iain S. Duff. Algorithm 837: AMD, an approximate
minimum degree ordering algorithm. ACM Trans. Math. Softw., 30:381–388, 2004. (Cited on
p. 252.)
[19] Patrick R. Amestoy, I. S. Duff, and C. Puglisi. Multifrontal QR factorization in a multiprocessor
environment. Numer. Linear Algebra Appl., 3:275–300, 1996. (Cited on p. 258.)
[20] Greg S. Ammar and William B. Gragg. Superfast solution of real positive definite Toeplitz systems.
SIAM J. Matrix Anal. Appl., 9:61–76, 1988. (Cited on p. 241.)
[21] A. A. Anda and Haesun Park. Fast plane rotations with dynamic scaling. SIAM J. Matrix Anal.
Appl., 15:162–174, 1994. (Cited on p. 51.)
[22] A. Anda and Haesun Park. Self-scaling fast rotations for stiff least squares problems. Linear
Algebra Appl., 234:137–162, 1996. (Cited on p. 132.)
[23] Bjarne S. Andersen, Jerzy Waśniewski, and Fred G. Gustavson. A recursive formulation of
Cholesky factorization for a packed matrix. ACM Trans. Math. Softw., 27:214–244, 2001. (Cited
on p. 112.)
[24] R. S. Andersen and Gene H. Golub. Richardson’s Non-stationary Matrix Iterative Procedure. Tech.
Report STAN-CS-72-304, Computer Science Department, Stanford University, CA, 1972. (Cited
on p. 326.)
[25] Edward Anderssen, Zhaojun Bai, and Jack J. Dongarra. Generalized QR factorization and its ap-
plications. Linear Algebra Appl., 162/164:243–271, 1992. (Cited on pp. 124, 128.)
[26] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammar-
ling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users’ Guide. SIAM, Philadelphia,
second edition, 1995. (Cited on p. 97.)
[27] E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. W. Demmel, J. Dongarra, J. Du Croz, A. Green-
baum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ Guide. SIAM, Philadel-
phia, third edition, 1999. (Cited on p. 114.)
[28] Claus A. Andersson and Rasmus Bro. The N -way toolbox for MATLAB. Chemom. Intell. Lab.
Syst., 52:1–4, 2000. (Cited on p. 218.)
[29] Martin Andersson. A comparison of nine PLS1 algorithms. J. Chemometrics, 23:518–529, 2009.
(Cited on pp. 202, 203.)
[30] Peter Arbenz and Gene H. Golub. Matrix shapes invariant under the symmetric QR algorithm. Numer.
Linear Algebra Appl., 2:87–93, 1995. (Cited on p. 341.)
[31] M. Arioli. Generalized Golub–Kahan bidiagonalization and stopping criteria. SIAM J. Matrix Anal.
Appl., 34:571–592, 2013. (Cited on pp. 331, 331.)
[32] Mario Arioli, Marc Baboulin, and Serge Gratton. A partial condition number for linear least squares
problems. SIAM J. Matrix Anal. Appl., 29:413–433, 2007. (Cited on p. 29.)
[33] M. Arioli, J. W. Demmel, and I. S. Duff. Solving sparse linear systems with sparse backward error.
SIAM J. Matrix Anal. Appl., 10:165–190, 1989. (Cited on pp. 97, 104.)
[34] Mario Arioli and Iain S. Duff. Preconditioning linear least-squares problems by identifying a basis
matrix. SIAM J. Sci. Comput., 37:S544–S561, 2015. (Cited on p. 319.)
[35] Mario Arioli, Iain Duff, Joseph Noailles, and Daniel Ruiz. A block projection method for sparse
matrices. SIAM J. Sci. Comput., 13:47–70, 1992. (Cited on p. 275.)
[36] Mario Arioli, Iain S. Duff, and Peter P. M. de Rijk. On the augmented system approach to sparse
least-squares problems. Numer. Math., 55:667–684, 1989. (Cited on pp. 31, 104.)
[37] Mario Arioli, Iain Duff, and Daniel Ruiz. Stopping criteria for iterative solvers. SIAM J. Matrix
Anal. Appl., 13:138–144, 1992. (Cited on p. 299.)
[38] Mario Arioli, Iain S. Duff, Daniel Ruiz, and Miloud Sadkane. Block Lanczos techniques for ac-
celerating the block Cimmino method. SIAM J. Sci. Comput., 16:1478–1511, 1995. (Cited on
p. 275.)
[39] W. E. Arnoldi. The principle of minimized iteration in the solution of the matrix eigenvalue prob-
lem. Quart. Appl. Math., 9:17–29, 1951. (Cited on p. 301.)
[40] Stephen F. Ashby, Thomas A. Manteuffel, and Paul E. Saylor. A taxonomy for conjugate gradient
methods. SIAM J. Numer. Anal., 27:1542–1568, 1990. (Cited on p. 336.)
[41] Léon Autonne. Sur les groupes linéaires réels et orthogonaux. Bull. Soc. Math. France, 30:121–134,
1902. (Cited on p. 383.)
[42] Léon Autonne. Sur les matrices hypohermitiennes et les unitaires. C. R. Acad. Sci. Paris, 156:858–
860, 1913. (Cited on p. 13.)
[43] Léon Autonne. Sur les matrices hypohermitiennes et sur les matrices unitaires. Ann. Univ. Lyon
(N.S.), 38:1–77, 1915. (Cited on p. 13.)
[44] J. K. Avila and J. A. Tomlin. Solution of very large least squares problems by nested dissection on
a parallel processor. In J. F. Gentleman, editor, Proceedings of the Computer Science and Statistics
12th Annual Symposium on the Interface. University of Waterloo, Canada, 1979. (Cited on p. 209.)
[45] Haim Avron, Esmond Ng, and Sivan Toledo. Using perturbed QR factorizations to solve linear
least-squares problems. SIAM J. Matrix Anal. Appl., 31:674–693, 2009. (Cited on p. 263.)
[46] H. Avron, P. Maymounkov, and S. Toledo. Blendenpik: Supercharging LAPACK’s least squares
solver. SIAM J. Sci. Comput., 32:1217–1236, 2010. (Cited on p. 320.)
[47] Owe Axelsson. A generalized SSOR method. BIT Numer. Math., 12:443–467, 1972. (Cited on
p. 308.)
[48] Owe Axelsson. Iterative Solution Methods. Cambridge University Press, Cambridge, 1994. (Cited
on pp. 269, 275, 281, 295.)
[49] Marc Baboulin, Luc Giraud, Serge Gratton, and Julien Langou. Parallel tools for solving incremen-
tal dense least squares. Applications to space geodesy. J. Algorithms Comput. Tech., 3:117–133,
2009. (Cited on pp. 3, 112.)
[50] Marc Baboulin and Serge Gratton. Computing the conditioning of the components of a linear
least-squares solution. Numer. Linear Algebra Appl., 16:517–533, 2009. (Cited on p. 31.)
[51] Marc Baboulin and Serge Gratton. A contribution to the conditioning of the total least squares
problem. SIAM J. Matrix Anal. Appl., 32:685–699, 2011. (Cited on p. 226.)
[52] Brett W. Bader and Tamara G. Kolda. Algorithm 862: MATLAB tensor classes for fast algorithm
prototyping. ACM Trans. Math. Softw., 32:455–500, 2006. (Cited on p. 218.)
[53] Brett W. Bader and Tamara G. Kolda. Efficient MATLAB computations with sparse and factored
tensors. SIAM J. Sci. Comput., 30:205–231, 2007. (Cited on p. 218.)
[54] J. Baglama, D. Calvetti, and L. Reichel. IRBL: An implicitly restarted block-Lanczos method
for large-scale Hermitian eigenproblems. SIAM J. Sci. Comput., 24:1650–1677, 2003. (Cited on
pp. 370, 372.)
[55] James Baglama, Daniela Calvetti, and Lothar Reichel. Algorithm 827: irbleigs: A MATLAB
program for computing a few eigenpairs of a large sparse Hermitian matrix. ACM Trans. Math.
Softw., 29:337–348, 2003. (Cited on p. 374.)
[56] James Baglama and Lothar Reichel. Augmented implicitly restarted Lanczos bidiagonalization
methods. SIAM J. Sci. Comput., 27:19–42, 2005. (Cited on pp. 373, 374.)
[57] James Baglama and Lothar Reichel. Restarted block Lanczos bidiagonalization methods. Numer.
Algor., 43:251–272, 2006. (Cited on p. 374.)
[58] James Baglama, Lothar Reichel, and Daniel J. Richmond. An augmented LSQR method. Numer.
Algor., 64:263–293, 2013. (Cited on p. 337.)
[59] Zhaojun Bai and James W. Demmel. Computing the generalized singular value decomposition.
SIAM J. Sci. Comput., 14:1464–1486, 1993. (Cited on pp. 125, 356.)
[60] Zhaojun Bai and James Demmel. Using the matrix sign function to compute invariant subspaces.
SIAM J. Matrix Anal. Appl., 19:205–225, 1998. (Cited on pp. 380, 382.)
[61] Zhaojun Bai, James Demmel, Jack Dongarra, Axel Ruhe, and Henk van der Vorst, editors. Tem-
plates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia,
2000. (Cited on pp. 349, 375.)
[62] Zhaojun Bai, James W. Demmel, and Ming Gu. An inverse free parallel spectral divide and conquer
algorithm for nonsymmetric eigenproblems. Numer. Math., 76:279–308, 1997. (Cited on p. 382.)
[63] Zhaojun Bai and Hongyuan Zha. A new preprocessing algorithm for the computation of the gen-
eralized singular value decomposition. SIAM J. Sci. Comput., 14:1007–1012, 1993. (Cited on
p. 125.)
[64] Zhong-Zhi Bai, Iain S. Duff, and Andrew J. Wathen. A class of incomplete orthogonal factorization
methods I: Methods and theories. BIT Numer. Math., 41:53–70, 2001. (Cited on p. 315.)
[65] Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. Minimizing communication in
numerical linear algebra. SIAM J. Matrix Anal. Appl., 32:866–901, 2011. (Cited on p. 114.)
[66] Y. Bard. Nonlinear Parameter Estimation. Academic Press, New York, 1974. (Cited on p. 391.)
[67] Jesse L. Barlow. Modification and maintenance of ULV decompositions. In Zlatko Drmač, Vjeran
Hari, Luka Sopta, Zvonimir Tutek, and Krešimir Veselić, editors, Applied Mathematics and Scien-
tific Computing, pages 31–62. Springer, Boston, MA, 2003. (Cited on p. 155.)
[68] Jesse L. Barlow. Reorthogonalization for the Golub–Kahan–Lanczos bidiagonal reduction. Numer.
Math., 124:237–278, 2013. (Cited on p. 298.)
[69] Jesse L. Barlow. Block Gram–Schmidt downdating. ETNA, 43:163–187, 2014. (Cited on p. 152.)
[70] Jesse L. Barlow. Block modified Gram–Schmidt algorithms and their analysis. SIAM J. Matrix Anal.
Appl., 40:1257–1290, 2019. (Cited on p. 152.)
[71] Jesse L. Barlow, Nela Bosner, and Zlatko Drmač. A new stable bidiagonal reduction algorithm.
Linear Algebra Appl., 397:35–84, 2005. (Cited on p. 194.)
[72] Jesse L. Barlow and Hasan Erbay. A modifiable low-rank approximation of a matrix. Numer. Linear
Algebra Appl., 16:833–860, 2009. (Cited on p. 155.)
[73] Jesse L. Barlow, Hasan Erbay, and Ivan Slapničar. An alternative algorithm for the refinement of
ULV decompositions. SIAM J. Matrix Anal. Appl., 27:198–211, 2005. (Cited on p. 155.)
[74] Jesse L. Barlow and Susan L. Handy. The direct solution of weighted and equality constrained
least-squares problems. SIAM J. Sci. Statist. Comput., 9:704–716, 1988. (Cited on p. 160.)
[75] J. L. Barlow, N. K. Nichols, and R. J. Plemmons. Iterative methods for equality-constrained least
squares problems. SIAM J. Sci. Statist. Comput., 9:892–906, 1988. (Cited on p. 319.)
[76] Jesse L. Barlow and Alicja Smoktunowicz. Reorthogonalized block classical Gram–Schmidt. Nu-
mer. Math., 123:395–423, 2013. (Cited on pp. 109, 152.)
[77] Jesse L. Barlow, Alicja Smoktunowicz, and Hasan Erbay. Improved Gram–Schmidt downdating
methods. BIT Numer. Math., 45:259–285, 2005. (Cited on p. 152.)
[78] Jesse L. Barlow and Udaya B. Vemulapati. A note on deferred correction for equality constrained
least squares problems. SIAM J. Numer. Anal., 29:249–256, 1992. (Cited on p. 160.)
[79] Jesse L. Barlow, P. A. Yoon, and Hongyuan Zha. An algorithm and a stability theory for downdating
the ULV decomposition. BIT Numer. Math., 36:14–40, 1996. (Cited on p. 155.)
[80] Jesse L. Barlow, Hongyuan Zha, and P. A. Yoon. Stable Chasing Algorithms for Modifying Com-
plete and Partial Singular Value Decompositions. Tech. Report CSE-93-19, Department of Com-
puter Science, The Pennsylvania State University, State College, PA, 1993. (Cited on p. 363.)
[81] S. T. Barnard, Alan Pothen, and Horst D. Simon. A spectral algorithm for envelope reduction of
sparse matrices. Numer. Algor., 2:317–334, 1995. (Cited on p. 251.)
[82] Richard Barrett, Michael Berry, Tony F. Chan, James Demmel, June Donato, Jack Dongarra, Victor
Eijkhout, Roldan Pozo, Charles Romine, and Henk van der Vorst. Templates for the Solution of
Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, 1994. (Cited on
p. 269.)
[83] Anders Barrlund. Perturbation bounds on the polar decomposition. BIT Numer. Math., 30:101–113,
1990. (Cited on p. 385.)
[84] Anders Barrlund. Efficient solution of constrained least squares problems with Kronecker product
structure. SIAM J. Matrix Anal. Appl., 19:154–160, 1998. (Cited on p. 211.)
[85] Ian Barrodale and C. Phillips. Algorithm 495: Solution of an overdetermined system of linear
equations in the Chebyshev norm. ACM Trans. Math. Softw., 1:264–270, 1975. (Cited on p. 423.)
[86] Ian Barrodale and F. D. K. Roberts. Applications of mathematical programming to lp approxima-
tion. In J. B. Rosen, O. L. Mangasarian, and K. Ritter, editors, Nonlinear Programming, pages
447–464. Academic Press, New York, 1970. (Cited on p. 421.)
[87] I. Barrodale and F. D. K. Roberts. An improved algorithm for discrete l1 linear approximation.
SIAM J. Numer. Anal., 10:839–848, 1973. (Cited on p. 421.)
[88] Ian Barrodale and F. D. K. Roberts. Algorithm 478: Solution of an overdetermined system of
equations in the ℓ1 norm. Comm. ACM, 17:319–320, 1974. (Cited on pp. 421, 422.)
[89] Richard H. Bartels and Andrew R. Conn. Algorithm 563: A program for linearly constrained
discrete ℓ1 problems. ACM Trans. Math. Softw., 6:609–614, 1980. (Cited on p. 421.)
[90] Richard H. Bartels, Andrew R. Conn, and J. W. Sinclair. Minimization techniques for piecewise
differentiable functions: The l1 solution to an overdetermined linear system. SIAM J. Numer. Anal.,
15:224–241, 1978. (Cited on p. 423.)
[91] Richard H. Bartels and Gene H. Golub. Chebyshev solution to an overdetermined system. Algo-
rithm 328. Comm. ACM, 11:428–430, 1968. (Cited on p. 423.)
[92] Richard H. Bartels and Gene H. Golub. Stable numerical methods for obtaining the Chebyshev
solution to an overdetermined system. Comm. ACM, 11:401–406, 1968. (Cited on p. 423.)
[93] W. Barth, R. S. Martin, and James H. Wilkinson. Calculation of the eigenvalues of a symmetric
tridiagonal matrix by the method of bisection. In F. L. Bauer et al., editors, Handbook for Automatic
Computation. Vol. II, Linear Algebra, pages 249–256. Springer, New York, 1971. Prepublished in
Numer. Math., 9:386–393, 1967. (Cited on p. 350.)
[94] Friedrich L. Bauer. Das Verfahren der Treppeniteration und verwandte Verfahren zur Lösung alge-
braischer Eigenwertprobleme. Z. Angew. Math. Phys., 8:214–235, 1957. (Cited on p. 367.)
[95] Friedrich L. Bauer. Genauigkeitsfragen bei der Lösung linearer Gleichungssysteme. Z. Angew.
Math. Mech., 46:409–421, 1966. (Cited on pp. 31, 44.)
[96] Amir Beck and Aharon Ben-Tal. On the solution of the Tikhonov regularization of the total least
squares problem. SIAM J. Optim., 17:98–118, 2006. (Cited on p. 226.)
[97] Stefania Bellavia, Jacek Gondzio, and Benedetta Morini. A matrix-free preconditioner for sparse
symmetric positive definite systems and least-squares problems. SIAM J. Sci. Comput., 35:A192–
A211, 2013. (Cited on p. 311.)
[98] Eugenio Beltrami. Sulle funzioni bilineari. Giorn. Mat. ad Uso degli Studenti Delle Universita,
11:98–106, 1873. (Cited on p. 13.)
[99] Adi Ben-Israel. On iterative methods for solving non-linear least squares problems over convex
sets. Israel J. Math., 5:211–224, 1967. (Cited on p. 395.)
[100] Adi Ben-Israel. The Moore of the Moore–Penrose inverse. Electronic J. Linear Algebra, 9:150–
157, 2002. (Cited on p. 16.)
[101] Adi Ben-Israel and Thomas N. E. Greville. Generalized Inverses: Theory and Applications.
Springer-Verlag, New York, second edition, 2003. (Cited on p. 16.)
[102] Adi Ben-Israel and S. J. Wersan. An elimination method for computing the generalized inverse of
an arbitrary matrix. J. Assoc. Comput. Mach., 10:532–537, 1963. (Cited on p. 88.)
[103] Aharon Ben-Tal and Marc Teboulle. A geometric property of the least squares solution of linear
equations. Linear Algebra Appl., 139:165–170, 1990. (Cited on p. 132.)
[104] Steven J. Benbow. Solving generalized least-squares problems with LSQR. SIAM J. Matrix Anal.
Appl., 21:166–177, 1999. (Cited on pp. 291, 331.)
[105] Commandant Benoit. Note sur une méthode de résolution des équations normales provenant de
l’application de la méthode des moindres carrés à un système d’équations linéaires en nombre
inférieur à celui des inconnues. Application de la méthode à la résolution d’un système défini
d’équations linéaires. (Procédé du Commandant Cholesky.) Bull. Géodésique, 2:67–77, 1924.
(Cited on p. 41.)
[106] Michele Benzi. Preconditioning techniques for large linear systems: A survey. J. Comput. Phys.,
182:418–477, 2002. (Cited on pp. 287, 314.)
[107] Michele Benzi. Gianfranco Cimmino’s contribution to Numerical Mathematics. In Ciclo di Con-
ferenze in Ricordo di Gianfranco Cimmino, pages 87–109, Bologna, 2005. Tecnoprint. (Cited on
p. 273.)
[108] Michele Benzi, Gene H. Golub, and Jörg Liesen. Numerical solution of saddle point problems.
Acta Numer., 14:1–138, 2005. (Cited on p. 117.)
[109] Michele Benzi, Carl D. Meyer, and Miroslav Tůma. A sparse approximate inverse preconditioner
for the conjugate gradient method. SIAM J. Sci. Comput., 17:1135–1149, 1995. (Cited on p. 315.)
[110] Michele Benzi and Miroslav Tůma. A robust incomplete factorization preconditioner for positive
definite matrices. Numer. Linear Algebra Appl., 10:385–400, 2003. (Cited on p. 315.)
[111] Michele Benzi and Miroslav Tůma. A robust preconditioner with low memory requirements for
large sparse least squares problems. SIAM J. Sci. Comput., 25:499–512, 2003. (Cited on p. 315.)
[112] Abraham Berman and Robert J. Plemmons. Cones and iterative methods for best least squares
solutions of linear systems. SIAM J. Numer. Anal., 11:145–154, 1974. (Cited on p. 270.)
[113] Michael W. Berry. A Fortran-77 Software Library for the Sparse Singular Value Decomposition.
Tech. Report CS-92-159, University of Tennessee, Knoxville, TN, 1992. (Cited on p. 376.)
[114] Michael W. Berry. SVDPACKC: Version 1.0 User’s Guide. Tech. Report CS-93-194, University of
Tennessee, Knoxville, TN, 1993. (Cited on p. 376.)
[115] Michael W. Berry. A survey of public-domain Lanczos-based software. In J. D. Brown, M. T. Chu,
D. C. Ellison, and Robert J. Plemmons, editors, Proceedings of the Cornelius Lanczos International
Centenary Conference, Raleigh, NC, Dec. 1993, pages 332–334. SIAM, Philadelphia, 1994. (Cited
on p. 376.)
[116] Michael W. Berry, M. Browne, A. Langville, V. C. Pauca, and Robert J. Plemmons. Algorithms
and applications for approximate nonnegative matrix factorizations. Comput. Statist. Data Anal.,
21:155–173, 2007. (Cited on p. 420.)
[117] Rajendra Bhatia and Kalyan K. Mukherjea. On weighted Löwdin orthogonalization. Int. J. Quan-
tum Chemistry, 29:1775–1778, 1986. (Cited on p. 383.)
[118] I. J. Bienaymé. Remarques sur les différences qui distinguent l’interpolation de M. Cauchy de la
méthode des moindres carrés et qui assurent la supériorité de cette méthode. C. R. Acad. Sci. Paris,
37:5–13, 1853. (Cited on p. 64.)
[119] M. Bierlaire, Philippe Toint, and D. Tuyttens. On iterative algorithms for linear least-squares prob-
lems with bound constraints. Linear Algebra Appl., 143:111–143, 1991. (Cited on p. 420.)
[120] David Bindel, James W. Demmel, William M. Kahan, and Osni Marques. On computing Givens
rotations reliably and efficiently. ACM Trans. Math. Softw., 28:206–238, 2002. (Cited on p. 50.)
[121] Christian H. Bischof and Gregorio Quintana-Ortí. Algorithm 782: Codes for rank-revealing QR
factorizations of dense matrices. ACM Trans. Math. Softw., 24:254–257, 1998. (Cited on p. 109.)
[122] Christian H. Bischof and Gregorio Quintana-Ortí. Computing rank-revealing QR factorizations of
dense matrices. ACM Trans. Math. Softw., 24:226–253, 1998. (Cited on p. 109.)
[123] Christian Bischof and Charles Van Loan. The WY representation for products of Householder
matrices. SIAM J. Sci. Statist. Comput., 8:s2–s13, 1987. (Cited on p. 106.)
[124] Åke Björck. Iterative refinement of linear least squares solutions I. BIT Numer. Math., 7:257–278,
1967. (Cited on pp. 93, 101.)
[125] Åke Björck. Solving linear least squares problems by Gram–Schmidt orthogonalization. BIT Nu-
mer. Math., 7:1–21, 1967. (Cited on pp. 28, 62, 70, 108.)
[126] Åke Björck. Iterative refinement of linear least squares solutions II. BIT Numer. Math., 8:8–30,
1968. (Cited on pp. 103, 160.)
[127] Åke Björck. Methods for sparse least squares problems. In J.R. Bunch and D. J. Rose, editors,
Sparse Matrix Computations, pages 177–199. Academic Press, New York, 1976. (Cited on p. 317.)
[128] Åke Björck. SSOR preconditioning methods for sparse least squares problems. In J. F. Gentle-
man, editor, Proceedings of the Computer Science and Statistics 12th Annual Symposium on the
Interface, pages 21–25. University of Waterloo, Canada, 1979. (Cited on pp. 307, 309.)
[129] Åke Björck. Use of conjugate gradients for solving linear least squares problems. In I. S. Duff,
editor, Conjugate Gradient Methods and Similar Techniques, pages 49–71. Computer Science and
Systems Division, Harwell, AERE- R 9636, 1979. (Cited on p. 294.)
[130] Åke Björck. Stability analysis of the method of semi-normal equations for least squares problems.
Linear Algebra Appl., 88/89:31–48, 1987. (Cited on pp. 105, 105.)
[131] Åke Björck. A bidiagonalization algorithm for solving ill-posed systems of linear equations. BIT
Numer. Math., 28:659–670, 1988. (Cited on p. 332.)
[132] Åke Björck. Component-wise perturbation analysis and error bounds for linear least squares solu-
tions. BIT Numer. Math., 31:238–244, 1991. (Cited on pp. 31, 99.)
[133] Åke Björck. Pivoting and stability in the augmented system method. In D. F. Griffiths and G. A.
Watson, editors, Numerical Analysis 1991: Proceedings of the 14th Dundee Biennial Conference,
June 1991, Pitman Research Notes Math. Ser. 260, pages 1–16. Longman Scientific and Technical,
Harlow, UK, 1992. (Cited on p. 93.)
[134] Åke Björck. Numerics of Gram–Schmidt orthogonalization. Linear Algebra Appl., 197/198:297–
316, 1994. (Cited on pp. 67, 108.)
[135] Åke Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996. (Cited
on p. 131.)
[136] Åke Björck. QR factorization of the Jacobian in some structured nonlinear least squares problem. In
Sabine Van Huffel and Philippe Lemmerling, editors, Total Least Squares and Errors-in-Variables
Modeling. Analysis, Algorithms and Applications, pages 225–234. Kluwer Academic Publishers,
Dordrecht, 2002. (Cited on p. 410.)
[137] Åke Björck. Stability of two direct methods for bidiagonalization and partial least squares. SIAM
J. Matrix Anal. Appl., 35:279–291, 2014. (Cited on pp. 200, 201.)
[138] Åke Björck and C. Bowie. An iterative algorithm for computing the best estimate of an orthogonal
matrix. SIAM J. Numer. Anal., 8:358–364, 1971. (Cited on p. 383.)
[139] Åke Björck and Iain S. Duff. A direct method for the solution of sparse linear least squares prob-
lems. Linear Algebra Appl., 34:43–67, 1980. (Cited on p. 88.)
[140] Åke Björck and Tommy Elfving. Algorithms for confluent Vandermonde systems. Numer. Math.,
21:130–137, 1973. (Cited on p. 238.)
[141] Åke Björck and Tommy Elfving. Accelerated projection methods for computing pseudoinverse
solutions of linear systems. BIT Numer. Math., 19:145–163, 1979. (Cited on pp. 273, 308.)
[142] Åke Björck, Tommy Elfving, and Zdeněk Strakoš. Stability of conjugate gradient and Lanczos
methods for linear least squares problems. SIAM J. Matrix Anal. Appl., 19:720–736, 1998. (Cited
on p. 294.)
[143] Åke Björck and Gene H. Golub. Iterative refinement of linear least squares solution by Householder
transformation. BIT Numer. Math., 7:322–337, 1967. (Cited on pp. 101, 102, 157.)
[144] Åke Björck and Gene H. Golub. Numerical methods for computing angles between subspaces.
Math. Comp., 27:579–594, 1973. (Cited on pp. 17, 19.)
[145] Åke Björck, Eric Grimme, and Paul Van Dooren. An implicit shift bidiagonalization algorithm for
ill-posed systems. BIT Numer. Math., 34:510–534, 1994. (Cited on pp. 332, 373.)
[146] Åke Björck and Sven J. Hammarling. A Schur method for the square root of a matrix. Linear
Algebra Appl., 52/53:127–140, 1983. (Cited on p. 378.)
[147] Åke Björck, Pinar Heggernes, and Pontus Matstoms. Methods for large scale total least squares
problems. SIAM J. Matrix Anal. Appl., 22:413–429, 2000. (Cited on p. 224.)
[148] Åke Björck and Ulf G. Indahl. Fast and stable partial least squares modelling: A benchmark study
with theoretical comments. J. Chemom., 31:e2898, 2017. (Cited on pp. 202, 203, 203.)
[149] Å. Björck and C. C. Paige. Loss and recapture of orthogonality in the modified Gram–Schmidt
algorithm. SIAM J. Matrix Anal. Appl., 13:176–190, 1992. (Cited on pp. 66, 70.)
[150] Åke Björck and Christopher C. Paige. Solution of augmented linear systems using orthogonal
factorizations. BIT Numer. Math., 34:1–26, 1994. (Cited on pp. 68, 68.)
[151] Å. Björck, H. Park, and L. Eldén. Accurate downdating of least squares solutions. SIAM J. Matrix
Anal. Appl., 15:549–568, 1994. (Cited on p. 147.)
[152] Åke Björck and Victor Pereyra. Solution of Vandermonde systems of equations. Math. Comp.,
24:893–903, 1970. (Cited on p. 238.)
[153] Åke Björck and Jin Yun Yuan. Preconditioners for least squares problems by LU factorization.
ETNA, 8:26–35, 1999. (Cited on p. 318.)
[154] L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. W. Demmel, I. Dhillon, J. Dongarra, S.
Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users’
Guide. SIAM, Philadelphia, 1997. ISBN 0-89871-397-8. (Cited on p. 114.)
[155] L. Susan Blackford, James W. Demmel, Jack Dongarra, Iain Duff, Sven J. Hammarling, Greg
Henry, Michael Heroux, Linda Kaufman, Andrew Lumsdaine, Antoine Petitet, Roldan Pozo, Karin
Remington, and R. Clint Whaley. An updated set of Basic Linear Algebra Subprograms (BLAS).
ACM Trans. Math. Softw., 28:135–151, 2002. (Cited on p. 114.)
[156] Elena Y. Bobrovnikova and Stephen A. Vavasis. Accurate solution of weighted least squares by
iterative methods. SIAM J. Matrix Anal. Appl., 22:1153–1174, 2001. (Cited on p. 133.)
[157] Paul T. Boggs. The convergence of the Ben-Israel iteration for nonlinear least squares problems.
Math. Comp., 30:512–522, 1976. (Cited on pp. 395, 397.)
[158] Paul T. Boggs, Richard H. Byrd, and Robert B. Schnabel. A stable and efficient algorithm for
nonlinear orthogonal distance regression. SIAM J. Sci. Statist. Comput., 8:1052–1078, 1987. (Cited
on pp. 410, 410, 410.)
[159] P. T. Boggs, J. R. Donaldson, R. H. Byrd, and R. B. Schnabel. ODRPACK software for weighted
orthogonal distance regression. ACM Trans. Math. Softw., 15:348–364, 1989. (Cited on p. 410.)
[160] R. F. Boisvert, Roldan Pozo, K. Remington, R. Barrett, and Jack J. Dongarra. Matrix Market: A
web resource for test matrix collections. In R. F. Boisvert, editor, Quality of Numerical Software,
Assessment and Enhancement, pages 125–137. Chapman & Hall, London, UK, 1997. (Cited on
pp. 244, 295.)
[161] Adam W. Bojanczyk and Richard P. Brent. Parallel solution of certain Toeplitz least-squares prob-
lems. Linear Algebra Appl., 77:43–60, 1986. (Cited on p. 241.)
[162] Adam W. Bojanczyk, Richard P. Brent, and Frank R. de Hoog. QR factorization of Toeplitz matri-
ces. Numer. Math., 49:81–94, 1986. (Cited on p. 239.)
[163] Adam W. Bojanczyk, Richard P. Brent, and Frank R. de Hoog. A Weakly Stable Algorithm for Gen-
eral Toeplitz Matrices. Tech. Report TR-CS-93-15, Cornell University, Ithaca, NY, 1993. (Cited
on p. 241.)
[164] A. W. Bojanczyk, R. P. Brent, P. van Dooren, and F. de Hoog. A note on downdating the Cholesky
factorization. SIAM J. Sci. Statist. Comput., 8:210–221, 1987. (Cited on p. 135.)
[165] Adam W. Bojanczyk, Nicholas J. Higham, and Harikrishna Patel. The equality constrained indefi-
nite least squares problem: Theory and algorithms. BIT Numer. Math., 43:505–517, 2003. (Cited
on p. 136.)
[166] Adam Bojanczyk, Nicholas J. Higham, and Harikrishna Patel. Solving the indefinite least squares
problem by hyperbolic QR factorization. SIAM J. Matrix Anal. Appl., 24:914–931, 2003. (Cited
on pp. 133, 134, 135.)
[167] Adam W. Bojanczyk and Adam Lutoborski. Computation of the Euler angles of a symmetric 3 × 3
matrix. SIAM J. Matrix Anal. Appl., 12:41–48, 1991. (Cited on p. 51.)
[168] Adam W. Bojanczyk and Allan O. Steinhardt. Stability analysis of a Householder-based algorithm
for downdating the Cholesky factorization. SIAM J. Sci. Statist. Comput., 12:1255–1265, 1991.
(Cited on p. 148.)
[169] D. Boley and Gene H. Golub. A survey of matrix inverse eigenvalue problems. Inverse Problems,
3:595–622, 1987. (Cited on p. 230.)
[170] F. L. Bookstein. Fitting conic sections to scattered data. Comput. Graphics Image Process., 9:56–
71, 1979. (Cited on p. 414.)
[171] Tibor Boros, Thomas Kailath, and Vadim Olshevsky. A fast parallel Björck–Pereyra-type algorithm
for parallel solution of Cauchy linear systems. Linear Algebra Appl., 302/303:265–293, 1999.
(Cited on p. 239.)
[172] C. Boutsidis and E. Gallopoulus. SVD-based initialization: A head start for nonnegative matrix
factorization. Pattern Recognition, 41:1350–1362, 2008. (Cited on p. 420.)
[173] David W. Boyd. The power method for ℓp norms. Linear Algebra Appl., 9:95–101, 1974. (Cited
on p. 95.)
[174] R. N. Bracewell. The fast Hartley transform. Proc. IEEE, 72:1010–1018, 1984. (Cited on p. 320.)
[175] R. Bramley and A. Sameh. Row projection methods for large nonsymmetric linear systems. SIAM
J. Sci. Statist. Comput., 13:168–193, 1992. (Cited on p. 275.)
[176] Matthew Brand. Fast low-rank modification of the thin singular value decomposition. Linear
Algebra Appl., 415:20–30, 2006. (Cited on p. 363.)
[177] Richard P. Brent. Algorithm 524: A Fortran multiple-precision arithmetic package. ACM Trans.
Math. Softw., 4:71–81, 1978. (Cited on p. 33.)
[178] Richard P. Brent. A Fortran multiple-precision arithmetic package. ACM Trans. Math. Softw.,
4:57–70, 1978. (Cited on p. 33.)
[179] Richard P. Brent. Old and new algorithms for Toeplitz systems. In Franklin T. Luk, editor, Advanced
Algorithms and Architectures for Signal Processing III, SPIE Proceeding Series, Washington, pages
2–9, 1988. (Cited on p. 241.)
[180] Rasmus Bro. PARAFAC. Tutorial and applications. Chemom. Intell. Lab. Syst., 38:149–171, 1997.
Special Issue: 2nd International Conference in Chemometrics (INCINC’96). (Cited on pp. 216,
218.)
[181] Rasmus Bro and Sijmen de Jong. A fast non-negativity-constrained least squares algorithm. J.
Chemometrics, 11:393–401, 1997. (Cited on p. 167.)
[182] Peter N. Brown and Homer F. Walker. GMRES on (nearly) singular systems. SIAM J. Matrix Anal.
Appl., 18:37–51, 1997. (Cited on p. 334.)
[183] Rafael Bru, José Marín, José Mas, and Miroslav Tůma. Preconditioned iterative methods for solving
linear least squares problems. SIAM J. Sci. Comput., 36:A2002–A2022, 2014. (Cited on pp. 306,
312.)
[184] Zvonimir Bujanović and Zlatko Drmač. A contribution to the theory and practice of the block
Kogbetliantz method for computing the SVD. BIT Numer. Math., 52:827–849, 2012. (Cited on
p. 357.)
[185] J. R. Bunch. The weak and strong stability of algorithms in numerical linear algebra. Linear
Algebra Appl., 88/89:49–66, 1987. (Cited on p. 37.)
[186] James R. Bunch. Stability of methods for solving Toeplitz systems of equations. SIAM J. Sci.
Statist. Comput., 6:349–364, 1985. (Cited on p. 241.)
[187] James R. Bunch and Linda Kaufman. Some stable methods for calculating inertia and solving
symmetric linear systems. Math. Comp., 31:163–179, 1977. (Cited on p. 92.)
[188] James R. Bunch and C. P. Nielsen. Updating the singular value decomposition. Numer. Math.,
31:111–129, 1978. (Cited on pp. 361, 363.)
[189] Angelika Bunse-Gerstner, Valia Guerra-Ones, and Humberto Madrid de La Vega. An improved
preconditioned LSQR for discrete ill-posed problems. Math. Comput. Simul., 73:65–75, 2006.
(Cited on p. 322.)
[190] C. S. Burrus. Iterative Reweighted Least Squares. OpenStax-CNX Module m45285, 2012. (Cited
on p. 427.)
[191] C. S. Burrus, J. A. Barreto, and I. W. Selesnick. Iterative reweighted least-squares design of FIR
filters. IEEE Trans. Signal Process., 42:2926–2936, 1994. (Cited on p. 425.)
[192] Peter Businger. Updating a singular value decomposition. BIT Numer. Math., 10:376–385, 1970.
(Cited on p. 361.)
[193] P. Businger and G. H. Golub. Linear least squares solutions by Householder transformations. Nu-
mer. Math., 7:269–276, 1965. Also published as Contribution I/8 in Handbook for Automatic Com-
putation, Vol. 2, F. L. Bauer, ed., Springer, Berlin, 1971. (Cited on pp. 56, 101.)
[194] Peter Businger and Gene H. Golub. Algorithm 358: Singular value decomposition of a complex
matrix. Comm. ACM, 12:564–565, 1969. (Cited on p. 349.)
[195] Alfredo Buttari. Fine-grained multithreading for the multifrontal QR factorization of sparse matri-
ces. SIAM J. Sci. Comput., 35:C323–C345, 2013. (Cited on pp. 112, 258.)
[196] Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra. Parallel tiled QR factorization
for multicore architectures. Concurrency and Computation: Practice and Experience, 20:1573–
1590, 2008. (Cited on p. 112.)
[197] Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for
bound constrained optimization. SIAM J. Sci. Comput., 16:1190–1208, 1995. (Cited on p. 420.)
[198] Henri Calandra, Serge Gratton, Elisa Riccietti, and Xavier Vasseur. On iterative solution of the
extended normal equations. SIAM J. Matrix Anal. Appl., 41:1571–1589, 2020. (Cited on p. 331.)
[199] Daniela Calvetti, Per Christian Hansen, and Lothar Reichel. L-curve curvature bounds via Lanczos
bidiagonalization. ETNA, 14:20–35, 2002. (Cited on p. 178.)
[200] Daniela Calvetti, Bryan Lewis, and Lothar Reichel. GMRES-type methods for inconsistent systems.
Linear Algebra Appl., 316:157–169, 2000. (Cited on p. 334.)
[201] Daniela Calvetti, Bryan Lewis, and Lothar Reichel. On the choice of subspace for iterative methods
for linear discrete ill-posed problems. Int. J. Appl. Math. Comput. Sci., 11:1060–1092, 2001. (Cited
on p. 334.)
[202] Daniela Calvetti, Bryan Lewis, and Lothar Reichel. On the regularization properties of the GMRES
method. Numer. Math., 91:605–625, 2002. (Cited on p. 333.)
[203] Daniela Calvetti, Serena Morigi, Lothar Reichel, and Fiorella Sgallari. Tikhonov regularization
and the L-curve for large discrete ill-posed problems. J. Comput. Appl. Math., 123:423–446, 2000.
(Cited on p. 333.)
[204] Daniela Calvetti and Lothar Reichel. Tikhonov regularization of large linear problems. BIT Numer.
Math., 43:283–484, 2003. (Cited on p. 335.)
[205] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis?
J. ACM, 58:11, 2011. (Cited on p. 212.)
[206] Emmanuel J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal recon-
struction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52:335–371,
2006. (Cited on p. 427.)
[207] Emmanuel J. Candès, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted l1 minimiza-
tion. J. Fourier Anal. Appl., 14:877–905, 2008. (Cited on pp. 428, 428.)
[208] Jesús Cardenal, Iain S. Duff, and José M. Jiménez. Solution of sparse quasi-square rectangular
systems by Gaussian elimination. IMA J. Numer. Anal., 18:165–177, 1998. (Cited on pp. 86, 87.)
[209] Erin Carson and Nicholas J. Higham. Accelerating the solution of linear systems by iterative re-
finement in three precisions. SIAM J. Sci. Comput., 40:A817–A847, 2018. (Cited on p. 104.)
[210] Erin Carson, Nicholas J. Higham, and Srikara Pranesh. Three-precision GMRES-based iterative
refinement for least squares problems. SIAM J. Sci. Comput., 42:A4063–A4083, 2020. (Cited on
p. 104.)
[211] Erin Carson, Kathryn Lund, Miroslav Rozložník, and Stephen Thomas. Block Gram–Schmidt
algorithms and their stability properties. Linear Algebra Appl., 638:150–195, 2022. (Cited on
p. 109.)
[212] A. Cauchy. Mémoire sur l’interpolation. J. Math. Pures Appl., 2:193–205, 1837. (Cited on p. 64.)
[213] Yair Censor. Row-action methods for huge and sparse systems and their applications. SIAM Rev.,
23:444–466, 1981. (Cited on p. 274.)
[214] Yair Censor and S. A. Zenios. Parallel Optimization. Theory, Algorithms, and Applications. Oxford
University Press, Oxford, 1997. (Cited on p. 274.)
[215] Aleš Černý. Characterization of the oblique projector U(VU)†V with applications to constrained
least squares. Linear Algebra Appl., 431:1564–1570, 2009. (Cited on p. 119.)
[216] Françoise Chaitin-Chatelin and Serge Gratton. On the condition numbers associated with the polar
factorization of a matrix. Numer. Linear Algebra Appl., 7:337–354, 2000. (Cited on p. 385.)
[217] J. M. Chambers. Regression updating. J. Amer. Statist. Assoc., 66:744–748, 1971. (Cited on
p. 135.)
[218] Raymond H. Chan, James G. Nagy, and Robert J. Plemmons. FFT-based preconditioners for
Toeplitz-block least squares problems. SIAM J. Numer. Anal., 30:1740–1768, 1993. (Cited on
p. 324.)
[219] Raymond H. Chan, James G. Nagy, and Robert J. Plemmons. Circulant preconditioned Toeplitz
least squares iterations. SIAM J. Matrix Anal. Appl., 15:80–97, 1994. (Cited on p. 324.)
[220] Raymond H. Chan and Michael K. Ng. Conjugate gradient methods for Toeplitz systems. SIAM
Rev., 38:427–482, 1996. (Cited on p. 325.)
[221] Raymond H.-F. Chan and Xiao-Qing Jin. An Introduction to Iterative Toeplitz Solvers. SIAM,
Philadelphia, 2007. (Cited on p. 325.)
[222] Tony F. Chan. An improved algorithm for computing the singular value decomposition. ACM
Trans. Math. Softw., 8:72–83, 1982. (Cited on pp. 192, 193, 347.)
[223] Tony F. Chan. On the existence and computation of LU factorizations with small pivots. Math.
Comp., 42:535–547, 1984. (Cited on p. 89.)
[224] Tony F. Chan. Rank revealing QR factorizations. Linear Algebra Appl., 88/89:67–82, 1987. (Cited
on pp. 76, 81, 82.)
[225] Tony F. Chan. An optimal circulant preconditioner for Toeplitz systems. SIAM J. Sci. Statist.
Comput., 9:766–771, 1988. (Cited on p. 324.)
[226] Tony F. Chan and D. E. Foulser. Effectively well-conditioned linear systems. SIAM J. Sci. Statist.
Comput., 9:963–969, 1988. (Cited on p. 172.)
[227] Tony F. Chan and Per Christian Hansen. Computing truncated singular value decomposition least
squares solutions by rank revealing QR-factorizations. SIAM J. Sci. Statist. Comput., 11:519–530,
1990. (Cited on pp. 83, 174.)
[228] Tony F. Chan and Per Christian Hansen. Some applications of the rank revealing QR factorization.
SIAM J. Sci. Statist. Comput., 13:727–741, 1992. (Cited on p. 174.)
[229] Tony F. Chan and Per Christian Hansen. Low-rank revealing QR factorizations. Numer. Linear
Algebra Appl., 1:33–44, 1994. (Cited on p. 83.)
[230] W. M. Chan and Alan George. A linear time implementation of the reverse Cuthill–McKee algo-
rithm. BIT Numer. Math., 20:8–14, 1980. (Cited on p. 251.)
[231] Shivkumar Chandrasekaran and Ilse C. F. Ipsen. On rank-revealing factorizations. SIAM J. Matrix
Anal. Appl., 15:592–622, 1994. (Cited on pp. 77, 83.)
[232] S. Chandrasekaran, M. Gu, and A. H. Sayed. A stable and efficient algorithm for the indefinite
linear least-squares algorithm. SIAM J. Matrix Anal. Appl., 20:354–362, 1998. (Cited on pp. 133,
133.)
[233] S. Chandrasekaran and I. C. F. Ipsen. Analysis of a QR algorithm for computing singular values.
SIAM J. Matrix Anal. Appl., 16:520–535, 1995. (Cited on p. 343.)
[234] Xiao-Wen Chang and Christopher C. Paige. Perturbation analysis for the Cholesky downdating
problem. SIAM J. Matrix Anal. Appl., 19:429–443, 1998. (Cited on p. 148.)
[235] X.-W. Chang, C. C. Paige, and D. Titley-Peloquin. Stopping criteria for the iterative solution of
linear least squares problems. SIAM J. Matrix Anal. Appl., 31:831–852, 2009. (Cited on p. 299.)
[236] J. Charlier, M. Vanbegin, and P. Van Dooren. On efficient implementations of Kogbetliantz’s algo-
rithm for computing the singular value decomposition. Numer. Math., 52:279–300, 1988. (Cited
on p. 357.)
[237] Rick Chartrand. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal
Process. Lett., 14:707–710, 2007. (Cited on p. 428.)
[238] R. Chartrand and Wotao Yin. Iteratively reweighted algorithms for compressive sensing. In Pro-
ceedings IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), pages 3869–3872, 2008.
(Cited on p. 428.)
[239] Donghui Chen and Robert J. Plemmons. Nonnegativity constraints in numerical analysis. In Pro-
ceedings of the Symposium on the Birth of Numerical Analysis, pages 109–140, Katholieke Univer-
siteit, Leuven, 2007. (Cited on p. 167.)
[240] Haibin Chen, Guoyin Li, and Liqun Qi. Further results on Cauchy tensors and Hankel tensors.
Appl. Math. Comput., 275:50–62, 2016. (Cited on p. 218.)
[241] Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis
pursuit. SIAM Rev., 43:129–159, 2001. (Cited on p. 427.)
[242] Y. T. Chen. Iterative Methods for Linear Least Squares Problems. Tech. Report CS-75-04, Depart-
ment of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, 1975. (Cited on
p. 285.)
[243] J. Choi, James W. Demmel, I. Dhillon, J. Dongarra, S. Oustrouchov, A. Petitet, K. Stanley,
D. Walker, and R. C. Whaley. ScaLAPACK: A portable linear algebra library for distributed mem-
ory computers—Design issues and performance. Comput. Phys. Comm., 97:1–15, 1996. (Cited on
p. 114.)
[244] Sou-Cheng T. Choi. Iterative Methods for Singular Linear Equations and Least Squares Problems.
Ph.D. thesis, ICME, Stanford University, Stanford, CA, 2006. (Cited on p. 283.)
[245] Sou-Cheng T. Choi, Christopher C. Paige, and Michael A. Saunders. MINRES-QLP: A Krylov
subspace method for indefinite or singular symmetric matrices. SIAM J. Sci. Comput., 33:1810–
1836, 2011. (Cited on pp. 300, 300.)
[246] Sou-Cheng T. Choi and Michael A. Saunders. Algorithm 937: MINRES-QLP for symmetric and
Hermitian linear equations and least-squares problems. ACM Trans. Math. Softw., 40:16:1–12,
2014. (Cited on p. 300.)
[247] Edmond Chow and Yousef Saad. Approximate inverse preconditioners via sparse-sparse iterations.
SIAM J. Sci. Comput., 19:995–1023, 1998. (Cited on p. 314.)
[248] Moody T. Chu, Robert E. Funderlic, and Gene H. Golub. A rank-one reduction formula and its
application to matrix factorization. SIAM Review, 37:512–530, 1995. (Cited on p. 139.)
[249] J. Chun, T. Kailath, and H. Lev-Ari. Fast parallel algorithms for QR and triangular factorizations.
SIAM J. Sci. Statist. Comput., 8:899–913, 1987. (Cited on p. 240.)
[250] Gianfranco Cimmino. Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari. Ricerca
Sci. II, 9:326–333, 1938. (Cited on p. 273.)
[251] D. I. Clark and M. R. Osborne. Finite algorithms for Huber’s M-estimator. SIAM J. Sci. Statist.
Comput., 7:72–85, 1986. (Cited on p. 425.)
[252] Alan K. Cline. Rate of convergence of Lawson’s algorithm. Math. Comp., 26:167–176, 1972.
(Cited on p. 423.)
[253] A. K. Cline. An elimination method for the solution of linear least squares problems. SIAM J.
Numer. Anal., 10:283–289, 1973. (Cited on pp. 88, 130.)
[254] Alan K. Cline. The Transformation of a Quadratic Programming Problem into Solvable Form.
Tech. Report ICASE 75-14, NASA, Langley Research Center, Hampton, VA, 1975. (Cited on
p. 162.)
[255] Alan K. Cline, A. R. Conn, and Charles F. Van Loan. Generalizing the LINPACK condition esti-
mator. In J. P. Hennart, editor, Numerical Analysis, volume 909 of Lecture Notes in Mathematics,
pages 73–83. Springer-Verlag, Berlin, 1982. (Cited on p. 95.)
[256] A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson. An estimate for the condition number
of a matrix. SIAM J. Numer. Anal., 16:368–375, 1979. (Cited on pp. 95, 95.)
[257] Randell E. Cline. Representations for the generalized inverse of sums of matrices. SIAM J. Numer.
Anal., 2:99–114, 1965. (Cited on p. 139.)
[258] R. E. Cline and R. J. Plemmons. l2-solutions to underdetermined linear systems. SIAM Rev.,
18:92–106, 1976. (Cited on p. 88.)
[259] E. S. Coakley, V. Rokhlin, and M. Tygert. A fast randomized algorithm for orthogonal projection.
SIAM J. Sci. Comput., 33:849–868, 2011. (Cited on p. 319.)
[260] T. F. Coleman and Y. Li. A global and quadratically convergent affine scaling method for linear l1
problems. Math. Program., 56:189–222, 1992. (Cited on p. 423.)
[261] Thomas F. Coleman and Yuying Li. A global and quadratically convergent method for linear l∞
problems. SIAM J. Numer. Anal., 29:1166–1186, 1992. (Cited on p. 423.)
[262] Tom F. Coleman, Alan Edenbrandt, and John R. Gilbert. Predicting fill for sparse orthogonal
factorization. J. Assoc. Comput. Mach., 33:517–532, 1986. (Cited on pp. 253, 264.)
[263] Pierre Comon, Gene H. Golub, Lek-Heng Lim, and Bernard Mourrain. Symmetric tensors and
symmetric rank. SIAM J. Matrix Anal. Appl., 30:1254–1279, 2008. (Cited on p. 214.)
[264] P. Comon, J. M. F. ten Berge, Lieven De Lathauwer, and J. Castaing. Generic and typical ranks of
multiway arrays. Linear Algebra Appl., 430:2997–3007, 2009. (Cited on p. 218.)
[265] Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint. A globally convergent augmented
Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer.
Anal., 28:545–572, 1991. (Cited on p. 400.)
[266] Andy R. Conn, Nick I. M. Gould, and Philippe L. Toint. LANCELOT: A Fortran Package for Large-
Scale Nonlinear Optimization. (Release A). Springer-Verlag, Berlin, 1992. (Cited on p. 400.)
[267] J. W. Cooley. How the FFT gained acceptance. In S. G. Nash, editor, A History of Scientific
Computing, pages 133–140. Addison-Wesley, Reading, MA, 1990. (Cited on p. 237.)
[268] J. W. Cooley, P. A. W. Lewis, and P. D. Welch. The fast Fourier transform and its application. IEEE
Trans. Education, E-12:27–34, 1969. (Cited on p. 237.)
[269] James W. Cooley and John W. Tukey. An algorithm for machine calculation of complex Fourier
series. Math. Comp., 19:297–301, 1965. (Cited on p. 235.)
[270] Corrado Corradi. A note on the solution of separable nonlinear least-squares problems with separa-
ble nonlinear equality constraints. SIAM J. Numer. Anal., 18:1134–1138, 1981. (Cited on p. 404.)
[271] A. J. Cox and Nicholas J. Higham. Backward error bounds for constrained least squares problems.
BIT Numer. Math., 39:210–227, 1999. (Cited on pp. 99, 157.)
[272] Anthony J. Cox. Stability of Algorithms for Solving Weighted and Constrained Least Squares
Problems. Ph.D. thesis, University of Manchester, Department of Mathematics, Manchester, UK,
1997. (Cited on p. 131.)
[273] Anthony J. Cox and Nicholas J. Higham. Stability of Householder QR factorization for weighted
least squares problems. In D. F. Griffiths, D. J. Higham, and G. A. Watson, editors, Numerical
Analysis 1997: Proceedings of the 17th Dundee Biennial Conference, Pitman Research Notes Math.
Ser. 380, pages 57–73. Longman Scientific and Technical, Harlow, UK, 1998. (Cited on p. 131.)
[274] Anthony J. Cox and Nicholas J. Higham. Accuracy and stability of the null space method for
solving the equality constrained least squares problem. BIT Numer. Math., 39:34–50, 1999. (Cited
on pp. 157, 160.)
[275] Maurice G. Cox. The least-squares solution of linear equations with block-angular observation
matrix. In Maurice G. Cox and Sven J. Hammarling, editors, Reliable Numerical Computation,
pages 227–240. Oxford University Press, UK, 1990. (Cited on pp. 208, 209.)
[276] Trevor F. Cox and Michael A. A. Cox. Multidimensional Scaling. Chapman and Hall, London,
1994. (Cited on p. 386.)
[277] E. J. Craig. The N-step iteration procedure. J. Math. Phys., 34:65–73, 1955. (Cited on p. 284.)
[278] P. Craven and Grace Wahba. Smoothing noisy data with spline functions. Numer. Math., 31:377–
403, 1979. (Cited on pp. 178, 189.)
[279] Jane Cullum, Ralph A. Willoughby, and Mark Lake. A Lanczos algorithm for computing singular
values and vectors of large matrices. SIAM J. Sci. Statist. Comput., 4:197–215, 1983. (Cited on
p. 376.)
[280] J. J. M. Cuppen. A divide and conquer method for the symmetric tridiagonal eigenproblem. Numer.
Math., 36:177–195, 1981. (Cited on p. 358.)
[281] E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices. In ACM ’69:
Proc. 24th Nat. Conf., pages 157–172, New York, 1969. ACM. (Cited on p. 251.)
[282] George Cybenko. Fast Toeplitz orthogonalization using inner products. SIAM J. Sci. Statist. Com-
put., 8:734–740, 1987. (Cited on p. 241.)
[283] Germund Dahlquist and Åke Björck. Numerical Methods. Prentice-Hall Inc., Englewood Cliffs,
NJ, 1974. Reprinted in 2003 by Dover Publications, Mineola, NJ. (Cited on p. 231.)
[284] Germund Dahlquist and Åke Björck. Numerical Methods in Scientific Computing, Volume I. SIAM,
Philadelphia, 2008. (Cited on pp. 33, 378.)
[285] James W. Daniel, William B. Gragg, Linda Kaufman, and G. W. Stewart. Reorthogonalization and
stable algorithms for updating the Gram-Schmidt QR factorization. Math. Comp., 30:772–795,
1976. (Cited on pp. 69, 150.)
[286] Ingrid Daubechies, Ronald DeVore, Massimo Fornasier, and C. Sinan Güntürk. Iteratively re-
weighted least squares minimization for sparse recovery. Comm. Pure Appl. Math., 63:1–38, 2010.
(Cited on p. 425.)
[287] E. R. Davidson. The iterative calculation of a few of the lowest eigenvalues and corresponding
eigenvectors of large real symmetric matrices. J. Comput. Phys., 17:87–94, 1975. (Cited on
p. 374.)
[288] Chandler Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM J.
Numer. Anal., 7:1–46, 1970. (Cited on p. 19.)
[289] Timothy A. Davis. Direct Methods for Sparse Linear Systems, volume 2 of Fundamental of Algo-
rithms. SIAM, Philadelphia, 2006. (Cited on p. 244.)
[290] Timothy A. Davis. Algorithm 915, SuiteSparseQR: Multifrontal multithreaded rank-revealing
sparse QR factorization. ACM Trans. Math. Softw., 38:8:1–8:22, 2011. (Cited on p. 258.)
[291] Timothy A. Davis and Yifan Hu. The University of Florida sparse matrix collection. ACM Trans.
Math. Softw., 38:1:1–1:25, 2011. (Cited on pp. 244, 297.)
[292] Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar. A survey of direct
methods for sparse linear systems. Acta Numer., 25:383–566, 2016. (Cited on p. 244.)
[293] Achiya Dax. The convergence of linear stationary iterative processes for solving singular unstruc-
tured systems of linear equations. SIAM Rev., 32:611–635, 1990. (Cited on p. 270.)
[294] Achiya Dax. On computational aspects of bounded linear least squares problems. ACM Trans.
Math. Softw., 17:64–73, 1991. (Cited on p. 417.)
[295] Sijmen de Jong. SIMPLS: An alternative approach to partial least squares regression. Chemom.
Intell. Lab. Syst., 18:251–263, 1993. (Cited on p. 203.)
[296] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. A multilinear singular value decompo-
sition. SIAM J. Matrix Anal. Appl., 21:1253–1278, 2000. (Cited on p. 217.)
[297] Bart De Moor. Structured total least squares and l2 approximation problems. Linear Algebra Appl.,
188/189:163–206, 1993. (Cited on p. 226.)
[298] Bart De Moor and Paul Van Dooren. Generalizations of the singular value and QR decompositions.
SIAM J. Matrix Anal. Appl., 13:993–1014, 1992. (Cited on p. 128.)
[299] Bart De Moor and H. Zha. A tree of generalizations of the ordinary singular value decomposition.
Linear Algebra Appl., 147:469–500, 1991. (Cited on p. 128.)
[300] Ron S. Dembo, Stanley C. Eisenstat, and Trond Steihaug. Inexact Newton methods. SIAM J.
Numer. Anal., 19:400–408, 1982. (Cited on pp. 401, 401.)
[301] C. J. Demeure. Fast QR factorization of Vandermonde matrices. Linear Algebra Appl.,
122/123/124:165–194, 1989. (Cited on p. 239.)
[302] C. J. Demeure. QR factorization of confluent Vandermonde matrices. IEEE Trans. Acoust. Speech
Signal Process., 38:1799–1802, 1990. (Cited on p. 239.)
[303] James W. Demmel. On Floating Point Errors in Cholesky. Tech. Report CS-89-87, Department
of Computer Science, University of Tennessee, Knoxville, TN, 1989. LAPACK Working Note 14.
(Cited on p. 43.)
[304] James W. Demmel. The componentwise distance to the nearest singular matrix. SIAM J. Matrix
Anal. Appl., 13:10–19, 1992. (Cited on p. 31.)
[305] James W. Demmel, Laura Grigori, Ming Gu, and Hua Xiang. Communication avoiding rank reveal-
ing QR factorization with column pivoting. SIAM J. Matrix Anal. Appl., 36:55–89, 2015. (Cited
on p. 112.)
[306] James W. Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou. Communication-optimal
parallel and sequential QR and LU factorizations. SIAM J. Sci. Comput., 34:A206–A239, 2012.
(Cited on p. 212.)
[307] James W. Demmel, Ming Gu, Stanley Eisenstat, Ivan Slapničar, Krešimir Veselić, and Zlatko
Drmač. Computing the singular value decomposition with high relative accuracy. Linear Alge-
bra Appl., 299:21–80, 1999. (Cited on p. 344.)
[308] James W. Demmel, Yozo Hida, E. Jason Riedy, and Xiaoye S. Li. Extra-precise iterative refinement
for overdetermined least squares problems. ACM Trans. Math. Softw., 35:29:1–32, 2009. (Cited
on p. 103.)
[309] James W. Demmel, Mark Hoemmen, Yozo Hida, and E. Jason Riedy. Non-negative Diagonals and
High Performance on Low-Profile Matrices from Householder QR. LAPACK Working Note 203,
Tech. Report UCB/EECS-2008-76, UCB/EECS, Berkeley, CA, 2008. (Cited on p. 47.)
[310] James Demmel and W. Kahan. Accurate singular values of bidiagonal matrices. SIAM J. Sci. Statist.
Comput., 11:873–912, 1990. (Cited on pp. 343, 343, 348, 356.)
[311] James Demmel and Plamen Koev. The accurate and efficient solution of a totally positive gener-
alized Vandermonde linear system. SIAM J. Matrix Anal. Appl., 27:142–152, 2005. (Cited on
p. 238.)
[312] James Demmel and Krešimir Veselić. Jacobi’s method is more accurate than QR. SIAM J. Matrix
Anal. Appl., 13:1204–1245, 1992. (Cited on p. 353.)
[313] Eugene D. Denman and Alex N. Beavers. The matrix sign function and computations in systems.
Appl. Math. Comput., 2:63–94, 1976. (Cited on p. 379.)
[314] John E. Dennis. Nonlinear least squares and equations. In D. A. H. Jacobs, editor, The State of the
Art in Numerical Analysis, pages 269–312. Academic Press, New York, 1977. (Cited on p. 402.)
[315] John E. Dennis, David M. Gay, and R. E. Welsch. An adaptive nonlinear least-squares algorithm.
ACM Trans. Math. Softw., 7:348–368, 1981. (Cited on pp. 398, 399.)
[316] John E. Dennis, Jr. and Robert B. Schnabel. Numerical Methods for Unconstrained Optimization
and Nonlinear Equations, volume 16 of Classics in Applied Math., SIAM, Philadelphia, 1996.
Originally published in 1983 by Prentice-Hall, Englewood Cliffs, NJ. (Cited on pp. 392, 393,
399, 402.)
[317] J. E. Dennis, Jr. and Trond Steihaug. On the successive projections approach to least-squares
problems. SIAM J. Numer. Anal., 23:717–733, 1986. (Cited on p. 309.)
[318] Peter Deuflhard and V. Apostolescu. A study of the Gauss–Newton algorithm for the solution of
nonlinear least squares problems. In J. Frehse, D. Pallaschke, and U. Trottenberg, editors, Special
Topics of Applied Mathematics, pages 129–150. North-Holland, Amsterdam, 1980. (Cited on
p. 394.)
[319] Peter Deuflhard and W. Sautter. On rank-deficient pseudoinverses. Linear Algebra Appl., 29:91–
111, 1980. (Cited on p. 73.)
[320] Inderjit S. Dhillon. A New O(n²) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector
Problem. Ph.D. thesis, University of California, Berkeley, CA, 1997. (Cited on p. 352.)
[321] Inderjit S. Dhillon and Beresford N. Parlett. Orthogonal eigenvectors and relative gaps. SIAM J.
Matrix Anal. Appl., 25:858–899, 2004. (Cited on p. 352.)
[322] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart. LINPACK Users’ Guide. SIAM,
Philadelphia, 1979. (Cited on pp. 113, 140, 184, 349.)
[323] Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and
Ichitaro Yamazaki. The singular values decomposition: Anatomy of optimizing an algorithm for
extreme scale. SIAM Rev., 60:808–865, 2018. (Cited on pp. 112, 194, 360.)
[324] J. J. Dongarra and D. C. Sorensen. A fully parallel algorithm for the symmetric eigenvalue prob-
lem. SIAM J. Sci. Statist. Comput., 8:s139–s154, 1987. (Cited on p. 358.)
[325] Jack Dongarra, Jeremy Du Croz, Sven J. Hammarling, and Richard J. Hanson. Algorithm 656. An
extended set of FORTRAN Basic Linear Algebra Subprograms: Model implementation and test
programs. ACM Trans. Math. Softw., 14:18–32, 1988. (Cited on p. 113.)
[326] Jack Dongarra, Jeremy Du Croz, Sven J. Hammarling, and Richard J. Hanson. An extended set of
FORTRAN Basic Linear Algebra Subprograms. ACM Trans. Math. Softw., 14:1–17, 1988. (Cited
on p. 113.)
[327] Jack Dongarra, Danny Sorensen, and Sven J. Hammarling. Block reduction of matrices to con-
densed form for eigenvalue computation. J. Comput. Appl. Math., 27:215–227, 1989. (Cited on
p. 194.)
[328] Jack J. Dongarra, Jeremy Du Croz, Iain S. Duff, and S. Hammarling. A set of level 3 basic linear
algebra subprograms. ACM Trans. Math. Softw., 16:1–17, 1990. (Cited on p. 257.)
[329] D. L. Donoho. For most large underdetermined systems of linear equations the minimal 1-norm
solution is also the sparsest solution. Comm. Pure Appl. Math., 59:797–829, 2006. (Cited on
p. 427.)
[330] Froilán M. Dopico, Plamen Koev, and Juan M. Molera. Implicit standard Jacobi gives high relative
accuracy. Numer. Math., 113:519–553, 2009. (Cited on p. 353.)
[331] N. R. Draper and H. Smith. Applied Regression Analysis. Wiley, New York, third edition, 1998.
(Cited on p. 205.)
[332] Petros Drineas, Michael W. Mahoney, S. Muthukrishnan, and Tamás Sarlós. Faster least squares
approximation. Numer. Math., 117:219–249, 2011. (Cited on p. 319.)
[333] Zlatko Drmač. Implementation of Jacobi rotations for accurate singular value computation in float-
ing point arithmetic. SIAM J. Sci. Comput., 18:1200–1222, 1997. (Cited on p. 356.)
[334] Zlatko Drmač. On principal angles between subspaces of Euclidean space. SIAM J. Matrix Anal.
Appl., 22:173–194, 2000. (Cited on p. 17.)
[335] Zlatko Drmač. A QR-preconditioned QR SVD method for computing SVD with high accuracy.
ACM Trans. Math. Softw., 44:11:1–11:30, 2017. (Cited on p. 341.)
[336] Zlatko Drmač and Krešimir Veselić. New fast and accurate Jacobi SVD algorithm. I. SIAM J.
Matrix Anal. Appl., 29:1322–1342, 2008. (Cited on p. 356.)
[337] Zlatko Drmač and Krešimir Veselić. New fast and accurate Jacobi SVD algorithm. II. SIAM J.
Matrix Anal. Appl., 29:1343–1362, 2008. (Cited on p. 356.)
[338] A. A. Dubrulle. Householder transformations revisited. SIAM J. Matrix Anal. Appl., 22:33–40,
2000. (Cited on p. 51.)
[339] F. Duchin and Daniel B. Szyld. Application of sparse matrix techniques to inter-regional input-
output analysis. Econ. Planning, 15:147–167, 1979. (Cited on p. 206.)
[340] Iain S. Duff. Pivot selection and row orderings in Givens reduction on sparse matrices. Computing,
13:239–248, 1974. (Cited on p. 255.)
[341] Iain S. Duff. On permutations to block triangular form. J. Inst. Math. Appl., 19:339–342, 1977.
(Cited on p. 264.)
[342] Iain S. Duff. On algorithms for obtaining a maximum transversal. ACM Trans. Math. Softw.,
7:315–330, 1981. (Cited on p. 264.)
[343] Iain S. Duff. Parallel implementation of multifrontal schemes. Parallel Comput., 3:193–204, 1986.
(Cited on p. 255.)
[344] Iain S. Duff, A. M. Erisman, and John K. Reid. Direct Methods for Sparse Matrices. Oxford
University Press, London, 1986. (Cited on pp. 185, 252.)
[345] Iain S. Duff, A. M. Erisman, and John K. Reid. Direct Methods for Sparse Matrices. Oxford
University Press, London, second edition, 2017. (Cited on p. 244.)
[346] Iain S. Duff, Roger G. Grimes, and John G. Lewis. Sparse matrix test problems. ACM Trans. Math.
Softw., 15:1–14, 1989. (Cited on pp. 244, 265.)
[347] Iain S. Duff, Michael A. Heroux, and Roldan Pozo. An overview of the sparse basic linear alge-
bra subprograms: The new standard from the BLAS technical forum. ACM Trans. Math. Softw.,
28:239–257, 2002. (Cited on p. 247.)
[348] Iain S. Duff and J. Koster. On algorithms for permuting large entries to the diagonal of a sparse
matrix. SIAM J. Matrix Anal. Appl., 22:973–996, 2001. (Cited on p. 317.)
[349] Iain S. Duff, M. Marrone, G. Radicati, and C. Vittoli. Level 3 Basic Linear Algebra Subprograms
for sparse matrices: A user level interface. ACM Trans. Math. Softw., 23:379–401, 1997. (Cited
on p. 247.)
[350] Iain S. Duff and Gérard A. Meurant. The effect of ordering on preconditioned conjugate gradients.
BIT Numer. Math., 29:635–657, 1989. (Cited on p. 310.)
[351] Iain S. Duff and John K. Reid. An implementation of Tarjan’s algorithm for the block triangular-
ization of a matrix. ACM Trans. Math. Softw., 4:137–147, 1978. (Cited on p. 265.)
[352] Iain S. Duff and John K. Reid. The multifrontal solution of indefinite sparse symmetric linear
systems. ACM Trans. Math. Softw., 9:302–325, 1983. (Cited on p. 255.)
[353] A. L. Dulmage and N. S. Mendelsohn. Coverings of bipartite graphs. Canad. J. Math., 10:517–534,
1958. (Cited on p. 263.)
[354] A. L. Dulmage and N. S. Mendelsohn. A structure theory of bipartite graphs of finite exterior
dimension. Trans. Roy. Soc. Canada Sect. III, 53:1–13, 1959. (Cited on p. 263.)
[355] A. L. Dulmage and N. S. Mendelsohn. Two algorithms for bipartite graphs. J. Soc. Indust. Appl.
Math., 11:183–194, 1963. (Cited on p. 263.)
[356] P. J. Eberlein and Haesun Park. Efficient implementation of Jacobi algorithms and Jacobi sets on
distributed memory machines. J. Parallel Distrib. Comput., 8:358–366, 1990. (Cited on p. 354.)
[357] Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psy-
chometrika, 1:211–218, 1936. (Cited on p. 24.)
[358] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality con-
straints. SIAM J. Matrix Anal. Appl., 20:303–353, 1998. (Cited on p. 169.)
[359] Bradley Efron, Trevor Hastie, Iain Johnston, and Robert Tibshirani. Least angle regression. Ann.
Statist., 32:407–499, 2004. (Cited on p. 425.)
[360] M. A. Efroymson. Multiple regression analysis. In Anthony Ralston and Herbert S. Wilf, editors,
Mathematical Methods for Digital Computers. Volume I, pages 191–203. Wiley, New York, 1960.
(Cited on p. 140.)
[361] E. Egerváry. On rank-diminishing operators and their applications to the solution of linear equa-
tions. Z. Angew. Math. Phys., 11:376–386, 1960. (Cited on p. 139.)
[362] Stanley C. Eisenstat, M. C. Gursky, Martin H. Schultz, and Andrew H. Sherman. The Yale sparse
matrix package, 1. The symmetric code. Internat. J. Numer. Methods Engrg., 18:1145–1151, 1982.
(Cited on p. 258.)
[363] Håkan Ekblom. Calculation of linear best lp-approximation. BIT Numer. Math., 13:292–300, 1973.
(Cited on p. 425.)
[364] Håkan Ekblom. A new algorithm for the Huber estimator in linear models. BIT Numer. Math.,
28:123–132, 1988. (Cited on p. 425.)
[365] Håkan Ekblom and Kaj Madsen. Algorithms for nonlinear Huber estimation. BIT Numer. Math.,
29:60–76, 1989. (Cited on p. 425.)
[366] Lars Eldén. Stepwise Regression Analysis with Orthogonal Transformations. Tech. Report LiTH-
MAT-R-1972-2, Department of Mathematics, Linköping University, Sweden, 1972. (Cited on
p. 140.)
[367] Lars Eldén. Algorithms for the regularization of ill-conditioned least squares problems. BIT Numer.
Math., 17:134–145, 1977. (Cited on pp. 170, 171, 180.)
[368] Lars Eldén. Perturbation theory for the least squares problem with linear equality constraints. SIAM
J. Numer. Anal., 17:338–350, 1980. (Cited on p. 158.)
[369] Lars Eldén. A weighted pseudoinverse, generalized singular values, and constrained least squares
problems. BIT Numer. Math., 22:487–502, 1982. (Cited on pp. 158, 158, 159, 181.)
[370] Lars Eldén. An efficient algorithm for the regularization of ill-conditioned least squares problems
with triangular Toeplitz matrix. SIAM J. Sci. Statist. Comput., 5:229–236, 1984. (Cited on pp. 176,
241.)
[371] Lars Eldén. A note on the computation of the generalized cross-validation function for ill-
conditioned least squares problems. BIT Numer. Math., 24:467–472, 1984. (Cited on p. 179.)
[372] Lars Eldén. Algorithms for the computation of functionals defined on the solution of a discrete
ill-posed problem. BIT Numer. Math., 30:466–483, 1990. (Cited on p. 29.)
[373] Lars Eldén. Numerical solution of the sideways heat equation. In H. W. Engl and W. Rundell,
editors, Inverse Problems in Diffusion Processes, pages 130–150. SIAM, Philadelphia, PA, 1995.
(Cited on p. 173.)
[374] Lars Eldén. Solving quadratically constrained least squares problems using a differential-geometric
approach. BIT Numer. Math., 42:323–335, 2002. (Cited on p. 169.)
[375] Lars Eldén. Partial least-squares vs. Lanczos bidiagonalization—I: Analysis of a projection method
for multiple regression. Comput. Statist. Data Anal., 46:11–31, 2004. (Cited on p. 201.)
[376] Lars Eldén. Matrix Methods in Data Mining and Pattern Recognition. Fundamentals of Algorithms.
SIAM, Philadelphia, second edition, 2019. (Cited on p. 216.)
[377] Lars Eldén and Salman Ahmadi-Asl. Solving bilinear tensor least squares problems and application
to Hammerstein identification. Numer. Linear Algebra Appl., 26:e2226, 2019. (Cited on p. 406.)
[378] L. Eldén and H. Park. Block downdating of least squares solutions. SIAM J. Matrix Anal. Appl.,
15:1018–1034, 1994. (Cited on p. 148.)
[379] Lars Eldén and Haesun Park. A Procrustes problem on the Stiefel manifold. Numer. Math., 82:599–
619, 1999. (Cited on p. 387.)
[380] Lars Eldén and Berkant Savas. A Newton–Grassman method for computing the best multilinear
rank-(r1 , r2 , r3 ) approximation of a tensor. SIAM J. Matrix Anal. Appl., 31:248–271, 2009. (Cited
on pp. 215, 217.)
[381] Lars Eldén and Valeria Simoncini. Inexact Rayleigh quotient-type methods for eigenvalue compu-
tations. BIT Numer. Math., 42:159–182, 2002. (Cited on p. 375.)
[382] Lars Eldén and Valeria Simoncini. Solving ill-posed linear systems with GMRES and a singular
preconditioner. SIAM J. Matrix Anal. Appl., 33:1369–1394, 2012. (Cited on p. 334.)
[383] T. Elfving. Block-iterative methods for consistent and inconsistent linear equations. Numer. Math.,
35:1–12, 1980. (Cited on p. 308.)
[384] Tommy Elfving and Ingegerd Skoglund. A direct method for a regularized least-squares problem.
Numer. Linear Algebra Appl., 16:649–675, 2009. (Cited on p. 259.)
[385] Erik Elmroth and Fred G. Gustavson. Applying recursion to serial and parallel QR factorization
leads to better performance. IBM J. Res. Develop., 44:605–624, 2000. (Cited on p. 112.)
[386] Heinz W. Engl, Martin Hanke, and Andreas Neubauer. Regularization of Inverse Problems. Kluwer
Academic Publishers, Dordrecht, The Netherlands, 1996. (Cited on p. 182.)
[387] Jerry Eriksson. Quasi-Newton methods for nonlinear least squares focusing on curvature. BIT
Numer. Math., 39:228–254, 1999. (Cited on p. 400.)
[388] Jerry Eriksson, Per-Åke Wedin, Mårten E. Gulliksson, and Inge Söderkvist. Regularization methods
for uniformly rank-deficient nonlinear least squares problems. J. Optim. Theory Appl., 127:1–26,
2005. (Cited on p. 397.)
[389] D. J. Evans. The use of pre-conditioning in iterative methods for solving linear systems with
symmetric positive definite matrices. J. Inst. Math. Appl., 4:295–314, 1968. (Cited on p. 306.)
[390] Nicolaas M. Faber and Joan Ferré. On the numerical stability of two widely used PLS algorithms.
J. Chemom., 22:101–105, 2008. (Cited on p. 203.)
[391] D. K. Faddeev, V. N. Kublanovskaya, and V. N. Faddeeva. Solution of linear algebraic systems with
rectangular matrices. Proc. Steklov Inst. Math., 96:93–111, 1968. (Cited on p. 76.)
[392] D. K. Faddeev, V. N. Kublanovskaya, and V. N. Faddeeva. Sur les systèmes linéaires algébriques
de matrices rectangulaires et mal-conditionnées. In Programmation en Mathématiques Numériques,
pages 161–170. Editions Centre Nat. Recherche Sci., Paris VII, 1968. (Cited on p. 343.)
[393] Shaun M. Fallat. Bidiagonal factorizations of totally nonnegative matrices. Amer. Math. Monthly,
108:697–712, 2001. (Cited on p. 239.)
[394] Ky Fan. On a theorem by Weyl concerning the eigenvalues of linear transformations. Proc. Nat.
Acad. Sci. USA, 35:652–655, 1949. (Cited on p. 22.)
[395] Ky Fan and Alan J. Hoffman. Some metric inequalities in the space of matrices. Proc. Amer. Math.
Soc., 6:111–116, 1955. (Cited on p. 383.)
[396] R. W. Farebrother. Linear Least Squares Computations. Marcel Dekker, New York, 1988. (Cited
on p. 64.)
[397] Richard W. Farebrother. Fitting Linear Relationships. A History of the Calculus of Observations
1750–1900. Springer Series in Statistics. Springer, Berlin, 1999. (Cited on pp. 2, 139.)
[398] Dario Fasino and Antonio Fazzi. A Gauss–Newton iteration for total least squares problems. BIT
Numer. Math., 58:281–299, 2018. (Cited on p. 224.)
[399] Donald W. Fausett and Charles T. Fulton. Large least squares problems involving Kronecker prod-
ucts. SIAM J. Matrix Anal. Appl., 15:219–227, 1994. (Cited on p. 209.)
[400] M. Fazel. Matrix Rank Minimization with Applications. Ph.D. thesis, Stanford University, Stanford,
CA, 2002. (Cited on p. 429.)
[401] K. Vince Fernando. Linear convergence of the row cyclic Jacobi and Kogbetliantz methods. Numer.
Math., 56:73–91, 1989. (Cited on pp. 357, 357.)
[402] K. V. Fernando. Accurately counting singular values of bidiagonal matrices and eigenvalues of
skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., 20:373–399, 1998. (Cited on
pp. 350, 351.)
[403] K. Vince Fernando and Beresford N. Parlett. Accurate singular values and differential qd algo-
rithms. Numer. Math., 67:191–229, 1994. (Cited on pp. 343, 351, 352.)
[404] R. D. Fierro, G. H. Golub, P. C. Hansen, and D. P. O’Leary. Regularization by truncated total least
squares. SIAM J. Sci. Comput., 18:1223–1241, 1997. (Cited on pp. 225, 335.)
[405] Ricardo D. Fierro and James R. Bunch. Collinearity and total least squares. SIAM J. Matrix Anal.
Appl., 15:1167–1181, 1994. (Cited on p. 225.)
[406] Ricardo D. Fierro and James R. Bunch. Perturbation theory for orthogonal projection methods with
applications to least squares and total least squares. Linear Algebra Appl., 234:71–96, 1996. (Cited
on p. 225.)
[407] Ricardo D. Fierro and Per Christian Hansen. UTV Expansion Pack: Special-purpose rank-revealing
algorithms. Numer. Algorithms, 40:47–66, 2005. (Cited on p. 155.)
[408] Ricardo D. Fierro, Per Christian Hansen, and Peter Søren Kirk Hansen. UTV Tools: MATLAB
templates for rank-revealing UTV decompositions. Numer. Algorithms, 20:165–194, 1999. (Cited
on p. 155.)
[409] B. Fischer, A. Ramage, D. J. Silvester, and A. J. Wathen. Minimum residual methods for augmented
systems. BIT Numer. Math., 38:527–543, 1998. (Cited on pp. 300, 330.)
[410] Roger Fletcher. Generalized inverse methods for the best least squares solution of systems of non-
linear equations. Comput. J., 10:392–399, 1968. (Cited on p. 395.)
[411] Roger Fletcher. A Modified Marquardt Subroutine for Nonlinear Least Squares. Tech. Report
R6799, Atomic Energy Research Establishment, Harwell, UK, 1971. (Cited on p. 396.)
[412] Roger Fletcher. Practical Methods of Optimization. Wiley, New York, second edition, 2000. (Cited
on p. 402.)
[413] Roger Fletcher and C. Xu. Hybrid methods for nonlinear least squares. IMA J. Numer. Anal.,
7:371–389, 1987. (Cited on p. 400.)
[414] Diederik R. Fokkema, Gerard L. G. Sleijpen, and Henk A. van der Vorst. Accelerated inexact
Newton schemes for large systems of nonlinear equations. SIAM J. Sci. Comput., 19:657–674,
1998. (Cited on p. 402.)
[415] Diederik R. Fokkema, Gerard L. G. Sleijpen, and Henk A. van der Vorst. Jacobi–Davidson style
QR and QZ algorithms for the reduction of matrix pencils. SIAM J. Sci. Comput., 20:94–125, 1998.
(Cited on p. 375.)
[416] David Chin-Lung Fong. Minimum-Residual Methods for Sparse Least-Squares Using Golub–
Kahan Bidiagonalization. Ph.D. thesis, Stanford University, Stanford, CA, 2011. (Cited on p. 294.)
[417] David Chin-Lung Fong and Michael Saunders. LSMR: An iterative algorithm for sparse least-
squares problems. SIAM J. Sci. Comput., 33:2950–2971, 2011. (Cited on pp. 292, 294, 297,
298.)
[418] A. B. Forbes. Least-Squares Best-Fit Geometric Elements. Tech. Report NPL DITC 140/89, Na-
tional Physical Laboratory, Teddington, UK, 1989. (Cited on p. 414.)
[419] A. B. Forbes. Robust Circle and Sphere Fitting by Least Squares. Tech. Report NPL DITC 153/89,
National Physical Laboratory, Teddington, UK, 1989. (Cited on p. 414.)
[420] Anders Forsgren. On linear least-squares problems with diagonally dominant weight matrices.
SIAM J. Matrix Anal. Appl., 17:763–788, 1996. (Cited on p. 132.)
[421] Anders Forsgren, Philip E. Gill, and Margaret H. Wright. Interior methods for nonlinear optimiza-
tion. SIAM Rev., 44:525–597, 2002. (Cited on p. 419.)
[422] George E. Forsythe. Generation and use of orthogonal polynomials for data-fitting with a digital
computer. J. Soc. Indust. Appl. Math., 5:74–88, 1957. (Cited on p. 237.)
[423] George E. Forsythe and Peter Henrici. The cyclic Jacobi method for computing the principal values
of a complex matrix. Trans. Amer. Math. Soc., 94:1–23, 1960. (Cited on p. 354.)
[424] Leslie Foster. Rank and null space calculations using matrix decomposition without column inter-
changes. Linear Algebra Appl., 74:47–71, 1986. (Cited on pp. 82, 261.)
[425] Leslie Foster. The growth factor and efficiency of Gaussian elimination with rook pivoting. J.
Comp. Appl. Math., 86:177–194, 1997. (Cited on p. 86.)
[426] Leslie Foster and Rajesh Kommu. Algorithm 853. An efficient algorithm for solving rank-deficient
least squares problems. ACM Trans. Math. Softw., 32:157–165, 2006. (Cited on p. 79.)
[427] Leslie V. Foster. Modifications of the normal equations method that are numerically stable. In
Gene H. Golub and P. Van Dooren, editors, Numerical Linear Algebra, Digital Signal Processing
and Parallel Algorithms, NATO ASI Series, pages 501–512. Springer-Verlag, Berlin, 1991. (Cited
on p. 204.)
[428] Leslie V. Foster. Solving rank-deficient and ill-posed problems using UTV and QR factorizations.
SIAM J. Matrix Anal. Appl., 25:582–600, 2003. (Cited on p. 79.)
[429] Leslie Fox. An Introduction to Numerical Linear Algebra. Clarendon Press, Oxford, 1964. xi+328
pp. (Cited on p. 314.)
[430] C. Fraley. Computational behavior of Gauss–Newton methods. SIAM J. Sci. Statist. Comput.,
10:515–532, 1989. (Cited on p. 394.)
[431] John G. F. Francis. The QR transformation: A unitary analogue to the LR transformation. I.
Comput. J., 4:265–271, 1961. (Cited on p. 344.)
[432] John G. F. Francis. The QR transformation. II. Comput. J., 4:332–345, 1961. (Cited on p. 344.)
[433] Roland W. Freund. A note on two block SOR methods for sparse least squares problems. Linear
Algebra Appl., 88/89:211–221, 1987. (Cited on p. 316.)
[434] M. P. Friedlander and Dominique Orban. A primal-dual regularized interior-point method for con-
vex quadratic programs. Math. Prog. Comput., 4:71–107, 2012. (Cited on p. 331.)
[435] Takeshi Fukaya, Ramaseshan Kannan, Yuji Nakatsukasa, Yusaku Yamamoto, and Yuka Yanagi-
sawa. Shifted Cholesky QR for computing the QR factorization of ill-conditioned matrices. SIAM
J. Sci. Comput., 42:A477–A503, 2020. (Cited on p. 213.)
[436] Martin J. Gander and Gerhard Wanner. From Euler, Ritz, and Galerkin to modern computing. SIAM
Rev., 54:627–666, 2012. (Cited on p. 375.)
[437] Walter Gander. Algorithms for the QR-Decomposition. Research Report 80–02, Seminar für Ange-
wandte Mathematik, ETHZ, Zürich, Switzerland, 1980. (Cited on pp. 62, 62, 70.)
[438] Walter Gander. Least squares with a quadratic constraint. Numer. Math., 36:291–307, 1981. (Cited
on p. 168.)
[439] Walter Gander, Gene H. Golub, and Rolf Strebel. Least squares fitting of circles and ellipses. BIT
Numer. Math., 34:558–578, 1994. (Cited on p. 416.)
[440] Walter Gander and Urs von Matt. Some least squares problems. In W. Gander and J. Hřebíček, edi-
tors, Solving Problems in Scientific Computing Using Maple and MATLAB, pages 83–102. Springer-
Verlag, Berlin, third edition, 1997. (Cited on p. 411.)
[441] B. S. Garbow, J. M. Boyle, Jack J. Dongarra, and G. W. Stewart. Matrix Eigensystems Routines:
EISPACK Guide Extension, volume 51 of Lecture Notes in Computer Science. Springer, New York,
1977. (Cited on p. 113.)
[442] A. de la Garza. An Iterative Method for Solving Systems of Linear Equations. Report K-731, Oak
Ridge Gaseous Diffusion Plant, Oak Ridge, TN, 1951. (Cited on p. 273.)
[443] André Gaul, Martin H. Gutknecht, Jörg Liesen, and Reinhard Nabben. A framework for deflated
and augmented Krylov subspace methods. SIAM J. Matrix Anal. Appl., 34:495–518, 2013. (Cited
on p. 337.)
[444] C. F. Gauss. Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections,
C. H. Davis, translator, Dover, New York, 1963. First published in 1809. (Cited on p. 2.)
[445] Carl Friedrich Gauss. Theoria combinationis observationum erroribus minimis obnoxiae, pars prior.
In Werke, IV, pages 1–26. Königlichen Gesellschaft der Wissenschaften zu Göttingen, 1880. First
published in 1821. (Cited on p. 3.)
[446] Carl Friedrich Gauss. Theoria combinationis observationum erroribus minimis obnoxiae, pars pos-
terior. In Werke, IV, pages 27–53. Königlichen Gesellschaft der Wissenschaften zu Göttingen,
1880. First published in 1823. (Cited on p. 3.)
[447] Carl Friedrich Gauss. Theory of the Combination of Observations Least Subject to Errors. Part 1,
Part 2, Supplement, G. W. Stewart, Translator, volume 11 of Classics in Applied Math., SIAM,
Philadelphia, 1995. (Cited on p. 3.)
[448] Silvia Gazzola, Per Christian Hansen, and James G. Nagy. IR Tools: A MATLAB package of
iterative regularization methods and large-scale test problems. Numer. Algorithms, 81:773–811,
2018. (Cited on p. 335.)
[449] Silvia Gazzola, Paolo Novati, and Maria Rosario Russo. On Krylov projection methods and
Tikhonov regularization. ETNA, 44:83–123, 2015. (Cited on p. 335.)
[450] W. Morven Gentleman. Least squares computations by Givens transformations without square
roots. J. Inst. Math. Appl., 12:329–336, 1973. (Cited on pp. 50, 255.)
[451] W. Morven Gentleman. Error analysis of QR decompositions by Givens transformations. Linear
Algebra Appl., 10:189–197, 1975. (Cited on p. 56.)
[452] W. Morven Gentleman. Row elimination for solving sparse linear systems and least squares prob-
lems. In G. A. Watson, editor, Proceedings of the Dundee Conference on Numerical Analysis
1975, volume 506 of Lecture Notes in Mathematics, pages 122–133. Springer-Verlag, Berlin, 1976.
(Cited on p. 254.)
[453] Alan George. Computer Implementation of the Finite-Element Method. Ph.D. thesis, Stanford
University, CA, 1971. (Cited on p. 251.)
[454] Alan George, John R. Gilbert, and Joseph W. H. Liu, editors. Graph Theory and Sparse Matrix
Computation, volume 56 of The IMA Volumes in Mathematics and Applications. Springer, 1993.
(Cited on p. 244.)
[455] Alan George and Michael T. Heath. Solution of sparse linear least squares problems using Givens
rotations. Linear Algebra Appl., 34:69–83, 1980. (Cited on pp. 253, 254.)
[456] Alan George, Kh. D. Ikramov, and A. B. Kucherov. Some properties of symmetric quasi-definite
matrices. SIAM J. Matrix Anal. Appl., 21:1318–1323, 2000. (Cited on p. 329.)
[457] Alan George and Joseph W. H. Liu. Computer Solution of Large Sparse Positive Definite Systems.
Prentice-Hall, Englewood Cliffs, NJ, 1981. (Cited on pp. 244, 245, 251, 256.)
[458] Alan George and Joseph W. H. Liu. Householder reflections versus Givens rotations in sparse
orthogonal decomposition. Linear Algebra Appl., 88/89:223–238, 1987. (Cited on p. 256.)
[459] Alan George and Joseph W. H. Liu. The evolution of the minimum degree ordering algorithm.
SIAM Rev., 31:1–19, 1989. (Cited on p. 252.)
[460] Alan George and Esmond Ng. On row and column orderings for sparse least squares problems.
SIAM J. Numer. Anal., 20:326–344, 1983. (Cited on pp. 255, 256.)
[461] Alan George and Esmond G. Ng. On the complexity of sparse QR and LU factorization of finite-
element matrices. SIAM J. Sci. Statist. Comput., 9:849–861, 1988. (Cited on p. 258.)
[462] Alan George, William G. Poole, Jr., and Robert G. Voigt. Incomplete nested dissection for solving
n by n grid problems. SIAM J. Numer. Anal., 15:662–673, 1978. (Cited on p. 256.)
[463] J. A. George and E. G. Ng. SPARSPAK: Waterloo Sparse Matrix Package User’s Guide for
SPARSPAK-B. Res. Report CS-84-37, Department of Computer Science, University of Waterloo,
Canada, 1984. (Cited on p. 258.)
[464] J. A. George, M. T. Heath, and R. J. Plemmons. Solution of large-scale sparse least squares prob-
lems using auxiliary storage. SIAM J. Sci. Statist. Comput., 2:416–429, 1981. (Cited on p. 256.)
[465] J. Alan George, Joseph W. H. Liu, and Esmond G. Ng. Row-ordering schemes for sparse Givens
transformations I. Bipartite graph model. Linear Algebra Appl., 61:55–81, 1984. (Cited on p. 253.)
[466] Tomáš Gergelits and Zdeněk Strakoš. Composite convergence bounds based on Chebyshev polyno-
mials and finite precision conjugate gradient computations. Numer. Algorithms, 65:759–782, 2014.
(Cited on p. 299.)
[467] Norman E. Gibbs, William G. Poole, Jr., and Paul K. Stockmeyer. An algorithm for reducing the
bandwidth and profile of a sparse matrix. SIAM J. Numer. Anal., 13:236–250, 1976. (Cited on
p. 251.)
[468] John R. Gilbert, Xiaoye S. Li, Esmond G. Ng, and Barry W. Peyton. Computing row and column
counts for sparse QR and LU factorization. BIT Numer. Math., 41:693–710, 2001. (Cited on
p. 255.)
[469] John R. Gilbert, Cleve Moler, and Robert Schreiber. Sparse matrices in MATLAB: Design and
implementation. SIAM J. Matrix Anal. Appl., 13:333–356, 1992. (Cited on pp. 250, 258.)
[470] John R. Gilbert, Esmond Ng, and B. W. Peyton. Separators and structure prediction in sparse
orthogonal factorization. Linear Algebra Appl., 262:83–97, 1997. (Cited on p. 105.)
[471] M. B. Giles and E. Süli. Adjoint methods for PDEs: A posteriori error analysis and postprocessing
by duality. Acta Numer., 11:145–236, 2002. (Cited on p. 304.)
[472] Philip E. Gill, Gene H. Golub, Walter Murray, and Michael A. Saunders. Methods for modifying
matrix factorizations. Math. Comp., 28:505–535, 1974. (Cited on p. 139.)
[473] Philip E. Gill, Sven J. Hammarling, Walter Murray, Michael A. Saunders, and Margaret H. Wright.
User’s Guide for LSSOL (Version 1.0): A Fortran Package for Constrained Linear Least-Squares
and Convex Quadratic Programming. Report SOL, Department of Operations Research, Stanford
University, CA, 1986. (Cited on p. 165.)
[474] Philip E. Gill and Walter Murray. Algorithms for the solution of the nonlinear least-squares prob-
lem. SIAM J. Numer. Anal., 15:977–992, 1978. (Cited on p. 399.)
[475] Philip E. Gill, Walter Murray, and Michael A. Saunders. SNOPT: An SQP algorithm for large-scale
constrained optimization. SIAM Rev., 47:99–131, 2005. (Cited on p. 318.)
[476] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic Press,
London, UK, 1981. (Cited on pp. 395, 402.)
[477] Philip E. Gill, Michael A. Saunders, and Joseph R. Shinnerl. On the stability of Cholesky factor-
ization for symmetric quasi-definite systems. SIAM J. Matrix Anal. Appl., 17:35–46, 1996. (Cited
on p. 329.)
[478] Luc Giraud, Serge Gratton, and Julien Langou. A rank-k update procedure for reorthogonalizing
the orthogonal factor from modified Gram–Schmidt. SIAM J. Matrix Anal. Appl., 25:1163–1177,
2004. (Cited on p. 71.)
[479] Luc Giraud and Julien Langou. When modified Gram–Schmidt generates a well-conditioned set of
vectors. IMA J. Numer. Anal., 22:521–528, 2002. (Cited on p. 70.)
[480] Wallace Givens. Computation of plane unitary rotations transforming a general matrix to triangular
form. J. Soc. Indust. Appl. Math., 6:26–50, 1958. (Cited on pp. 47, 51.)
[481] J. Gluchowska and Alicja Smoktunowicz. Solving the linear least squares problem with very high
relative accuracy. Computing, 45:345–354, 1990. (Cited on p. 104.)
[482] Sergei K. Godunov. Problem of the dichotomy of the spectrum of a matrix. Siberian Math. J.,
27:649–660, 1986. (Cited on p. 382.)
[483] Israel Gohberg, Peter Lancaster, and Leiba Rodman. Indefinite Linear Algebra and Applications.
Birkhäuser, Boston, 2005. (Cited on p. 137.)
[484] Herman H. Goldstine. A History of Numerical Analysis from the 16th through the 19th Century.
Stud. Hist. Math. Phys. Sci., Vol. 2. Springer-Verlag, New York, 1977. (Cited on p. 2.)
[485] G. H. Golub and Peter Businger. Least Squares, Singular Values and Matrix Approximations;
an ALGOL Procedure for Computing the Singular Value Decomposition. Tech. Report CS-73,
Computer Science Department, Stanford University, CA, 1967. (Cited on p. 349.)
[486] G. H. Golub and C. F. Van Loan. Unsymmetric positive definite linear systems. Linear Algebra
Appl., 28:85–97, 1979. (Cited on p. 329.)
[487] Gene H. Golub. Numerical methods for solving least squares problems. Numer. Math., 7:206–216,
1965. (Cited on pp. 56, 75, 169, 177.)
[488] Gene H. Golub. Least squares, singular values and matrix approximations. Apl. Mat., 13:44–51,
1968. (Cited on p. 348.)
[489] Gene H. Golub. Matrix decompositions and statistical computing. In Roy C. Milton and John A.
Nelder, editors, Statistical Computation, pages 365–397. Academic Press, New York, 1969. (Cited
on p. 137.)
[490] Gene H. Golub. Some modified matrix eigenvalue problems. SIAM Rev., 15:318–334, 1973. (Cited
on p. 360.)
[491] Gene H. Golub and Chen Greif. On solving block-structured indefinite linear systems. SIAM J. Sci.
Comput., 24:2076–2092, 2003. (Cited on p. 117.)
[492] Gene H. Golub, Per Christian Hansen, and Dianne P. O’Leary. Tikhonov regularization and total
least squares. SIAM J. Matrix Anal. Appl., 21:185–194, 1999. (Cited on p. 225.)
[493] Gene H. Golub, Michael T. Heath, and Grace Wahba. Generalized cross-validation as a method for
choosing a good ridge parameter. Technometrics, 21:215–223, 1979. (Cited on p. 178.)
[494] Gene H. Golub, Alan Hoffman, and G. W. Stewart. A generalization of the Eckart–Young matrix
approximation theorem. Linear Algebra Appl., 88/89:317–327, 1987. (Cited on p. 24.)
[495] Gene H. Golub and William Kahan. Calculating the singular values and pseudo-inverse of a matrix.
SIAM J. Numer. Anal., 2:205–224, 1965. (Cited on pp. 191, 196, 342.)
[496] Gene H. Golub, Virginia Klema, and G. W. Stewart. Rank Degeneracy and Least Squares Problems.
Tech. Report STAN-CS-76-559, Computer Science Department, Stanford University, Stanford, CA,
1976. (Cited on p. 80.)
[497] Gene H. Golub and Randall J. LeVeque. Extensions and uses of the variable projection algorithm
for solving nonlinear least squares problems. In Proceedings of the 1979 Army Numerical Analysis
and Computers Conf., ARO Report 79-3, pages 1–12. White Sands Missile Range, White Sands,
NM, 1979. (Cited on p. 405.)
[498] Gene H. Golub, Franklin T. Luk, and Michael L. Overton. A block Lanczos method for computing
the singular values and corresponding singular vectors of a matrix. ACM Trans. Math. Softw.,
7:149–169, 1981. (Cited on pp. 193, 376.)
[499] Gene H. Golub, Franklin T. Luk, and M. Pagano. A sparse least squares problem in photogramme-
try. In J. F. Gentleman, editor, Proceedings of the Computer Science and Statistics: 12th Annual
Symposium on the Interface, pages 26–30. University of Waterloo, Ontario, Canada, 1979. (Cited
on pp. 206, 208.)
[500] Gene H. Golub, P. Manneback, and Ph. L. Toint. A comparison between some direct and iterative
methods for large scale geodetic least squares problems. SIAM J. Sci. Statist. Comput., 7:799–816,
1986. (Cited on pp. 209, 308, 309.)
[501] Gene H. Golub and Gérard Meurant. Matrices, moments and quadrature. In D. F. Griffiths and G. A.
Watson, editors, Numerical Analysis 1993: Proceedings of the 13th Dundee Biennial Conference,
volume 228 of Pitman Research Notes Math., pages 105–156. Longman Scientific and Technical,
Harlow, UK, 1994. (Cited on p. 289.)
[502] Gene H. Golub and Dianne P. O’Leary. Some history of the conjugate gradient and Lanczos algo-
rithms: 1948–1976. SIAM Rev., 31:50–102, 1989. (Cited on p. 285.)
[503] Gene H. Golub and Victor Pereyra. The differentiation of pseudo-inverses and nonlinear least
squares problems whose variables separate. SIAM J. Numer. Anal., 10:413–432, 1973. (Cited on
pp. 26, 402, 403.)
[504] Gene H. Golub and Victor Pereyra. Separable nonlinear least squares: The variable projection
method and its application. Inverse Problems, 19:R1–R26, 2003. (Cited on p. 405.)
[505] Gene H. Golub and R. J. Plemmons. Large-scale geodetic least-squares adjustment by dissection
and orthogonal decomposition. Linear Algebra Appl., 34:3–28, 1980. (Cited on pp. 187, 206,
209.)
[506] Gene H. Golub, Robert J. Plemmons, and Ahmed H. Sameh. Parallel block schemes for large-
scale least-squares computations. In High-Speed Computing, Scientific Applications and Algorithm
Design, pages 171–179. University of Illinois Press, 1988. (Cited on p. 208.)
[507] Gene H. Golub and C. Reinsch. Singular value decomposition and least squares solutions. In F. L.
Bauer et al., editors, Handbook for Automatic Computation. Vol. II, Linear Algebra, pages 134–151.
Springer, New York, 1971. Prepublished in Numer. Math., 14:403–420, 1970. (Cited on pp. 11,
349.)
[508] Gene H. Golub, Knut Sølna, and Paul Van Dooren. Computing the SVD of a general matrix
product/quotient. SIAM J. Matrix Anal. Appl., 22:1–19, 2000. (Cited on p. 127.)
[509] Gene H. Golub, Martin Stoll, and Andy Wathen. Approximation of the scattering amplitude and
linear systems. ETNA, 31:178–203, 2008. (Cited on p. 304.)
[510] Gene H. Golub and Frank Uhlig. The QR algorithm: 50 years later its genesis by John Francis
and Vera Kublanovskaya and subsequent developments. IMA J. Numer. Anal., 29:467–485, 2009.
(Cited on p. 348.)
[511] Gene H. Golub and Charles F. Van Loan. An analysis of the total least squares problem. SIAM J.
Numer. Anal., 17:883–893, 1980. (Cited on pp. 218, 220.)
[512] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press,
Baltimore, MD, third edition, 1996. (Cited on pp. 17, 80, 157, 179, 355, 368.)
[513] Gene H. Golub and Richard S. Varga. Chebyshev semi-iteration and second-order Richardson
iterative methods. Parts I and II. Numer. Math., 3:147–168, 1961. (Cited on p. 277.)
[514] Gene H. Golub and James H. Wilkinson. Note on the iterative refinement of least squares solutions.
Numer. Math., 9:139–148, 1966. (Cited on p. 27.)
[515] Gene H. Golub and Hongyuan Zha. Perturbation analysis of the canonical correlation of matrix
pairs. Linear Algebra Appl., 210:3–28, 1994. (Cited on p. 17.)
[516] Gene H. Golub and Hongyuan Zha. The canonical correlations of matrix pairs and their numerical
computation. In A. Bojanczyk and G. Cybenko, editors, Linear Algebra for Signal Processing,
volume 69 of The IMA Volumes in Mathematics and Its Applications, pages 27–49. Springer-Verlag,
1995. (Cited on p. 17.)
[517] Dan Gordon and Rachel Gordon. CGMN revisited: Robust and efficient solution of stiff linear
systems derived from elliptic partial differential equations. ACM Trans. Math. Softw., 35:18:1–
18:27, 2008. (Cited on p. 275.)
[518] Rachel Gordon, R. Bender, and Gabor T. Herman. Algebraic reconstruction techniques (ART) for
three-dimensional electron microscopy and X-ray photography. J. Theor. Biology, 29:471–481,
1970. (Cited on p. 274.)
[519] S. A. Goreinov, E. E. Tyrtyshnikov, and N. L. Zamarashkin. A theory of pseudo-skeleton approxi-
mations. Linear Algebra Appl., 261:1–21, 1997. (Cited on p. 90.)
[520] Nicholas I. M. Gould and Jennifer A. Scott. The state-of-the-art of preconditioners for sparse least-
squares problems. ACM Trans. Math. Softw., 43:36-1–36-35, 2017. (Cited on pp. 297, 306.)
[521] John C. Gower and Garmt B. Dijksterhuis. Procrustes Problems. Oxford University Press, Oxford,
UK, 2004. (Cited on p. 387.)
[522] William B. Gragg, Randall J. LeVeque, and John A. Trangenstein. Numerically stable methods for
updating regressions. J. Amer. Statist. Assoc., 74:161–168, 1979. (Cited on p. 140.)
[523] William H. Gragg and W. J. Harrod. The numerically stable reconstruction of Jacobi matrices from
spectral data. Numer. Math., 44:317–335, 1984. (Cited on pp. 230, 239.)
[524] S. L. Graham, M. Snir, and C. A. Patterson, editors. Getting up to Speed. The Future of Supercom-
puting. The National Academies Press, Washington, DC, 2004. (Cited on p. 114.)
[525] Jørgen P. Gram. Ueber die Entwickelung reeller Functionen in Reihen mittelst der Methode der
kleinsten Quadrate. J. Reine Angew. Math., 94:41–73, 1883. (Cited on p. 63.)
[526] S. Gratton, A. S. Lawless, and N. K. Nichols. Approximate Gauss–Newton methods for nonlinear
least squares problems. SIAM J. Optim., 18:106–132, 2007. (Cited on pp. 401, 401.)
[527] Serge Gratton. On the condition number of the least squares problem in a weighted Frobenius norm.
BIT Numer. Math., 36:523–530, 1996. (Cited on p. 31.)
[528] Serge Gratton, Pavel Jiránek, and David Titley-Peloquin. On the accuracy of the Karlson–Waldén
estimate of the backward error for linear least squares problems. SIAM J. Matrix Anal. Appl.,
33:822–836, 2012. (Cited on p. 99.)
[529] Serge Gratton, David Titley-Peloquin, and Jean Tshimanga Ilunga. Sensitivity and conditioning of
the truncated total least squares solution. SIAM J. Matrix Anal. Appl., 34:1257–1276, 2013. (Cited
on p. 226.)
[530] Joseph F. Grcar. Matrix Stretching for Linear Equations. Tech. Report SAND 90-8723, Sandia
National Laboratories, Albuquerque, NM, 1990. (Cited on p. 263.)
[531] Joseph F. Grcar. Spectral condition numbers of orthogonal projections and full rank linear least
squares residuals. SIAM J. Matrix Anal. Appl., 31:2934–2949, 2010. (Cited on p. 31.)
[532] Joseph F. Grcar, Michael A. Saunders, and Z. Su. Estimates of Optimal Backward Perturbations for
Linear Least Squares Problems. Tech. Report SOL 2007-1, Department of Management Science
and Engineering, Stanford University, Stanford, CA, 2007. (Cited on pp. 99, 299.)
[533] Anne Greenbaum. Behavior of slightly perturbed Lanczos and conjugate-gradient recurrences.
Linear Algebra Appl., 113:7–63, 1989. (Cited on p. 295.)
[534] Anne Greenbaum. Iterative Methods for Solving Linear Systems, volume 17 of Frontiers in Applied
Mathematics. SIAM, Philadelphia, 1997. (Cited on pp. 269, 285, 294.)
[535] Anne Greenbaum, Miroslav Rozložník, and Zdeněk Strakoš. Numerical behavior of the modified
Gram–Schmidt GMRES process and related algorithms. BIT Numer. Math., 37:706–719, 1997.
(Cited on p. 302.)
[536] T. N. E. Greville. Note on the generalized inverse of a matrix product. SIAM Rev., 8:518–521, 1966.
(Cited on p. 15.)
[537] T. N. E. Greville. Solutions of the matrix equation XAX = X, and relations between oblique and
orthogonal projectors. SIAM J. Appl. Math., 26:828–832, 1974. (Cited on p. 119.)
[538] Roger G. Grimes, John G. Lewis, and Horst D. Simon. A shifted block Lanczos algorithm for
solving sparse symmetric generalized eigenproblems. SIAM J. Matrix Anal. Appl., 15:228–272,
1994. (Cited on p. 375.)
[539] C. W. Groetsch. The Theory of Tikhonov Regularization for Fredholm Integral Equations of the
First Kind. Pitman, Boston, MA, 1984. (Cited on pp. 171, 177.)
[540] Eric Grosse. Tensor spline approximations. Linear Algebra Appl., 34:29–41, 1980. (Cited on
p. 209.)
[541] Benedikt Großer and Bruno Lang. Efficient parallel reduction to bidiagonal form. Parallel Comput.,
25:969–986, 1999. (Cited on p. 348.)
[542] Benedikt Großer and Bruno Lang. An O(n²) algorithm for the bidiagonal SVD. Linear Algebra
Appl., 358:45–70, 2003. (Cited on p. 352.)
[543] Benedikt Großer and Bruno Lang. On symmetric eigenproblems induced by the bidiagonal SVD.
SIAM J. Matrix Anal. Appl., 26:599–620, 2005. (Cited on p. 352.)
[544] Marcus J. Grote and Thomas Huckle. Parallel preconditioning with sparse approximate inverses.
SIAM J. Sci. Comput., 18:838–853, 1997. (Cited on p. 314.)
[545] Ming Gu and Stanley C. Eisenstat. A Stable and Fast Algorithm for Updating the Singular Value
Decomposition. Tech. Report RR-939, Department of Computer Science, Yale University, New
Haven, CT, 1993. (Cited on p. 363.)
[546] Ming Gu and Stanley C. Eisenstat. A divide-and-conquer algorithm for the bidiagonal SVD. SIAM
J. Matrix Anal. Appl., 16:79–92, 1995. (Cited on pp. 358, 359.)
[547] Ming Gu and Stanley C. Eisenstat. A divide-and-conquer algorithm for the symmetric tridiagonal
eigenproblem. SIAM J. Matrix Anal. Appl., 16:172–191, 1995. (Cited on p. 358.)
[548] Ming Gu and Stanley C. Eisenstat. Downdating the singular value decomposition. SIAM J. Matrix
Anal. Appl., 16:793–810, 1995. (Cited on p. 363.)
[549] Ming Gu and Stanley C. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR
factorization. SIAM J. Sci. Comput., 17:848–869, 1996. (Cited on p. 84.)
[550] Mårten E. Gulliksson. On the modified Gram–Schmidt algorithm for weighted and constrained
linear least squares problems. BIT Numer. Math., 35:453–468, 1995. (Cited on p. 160.)
[551] Mårten Gulliksson and Per-Åke Wedin. Modifying the QR-decomposition to constrained and
weighted linear least squares. SIAM J. Matrix Anal. Appl., 13:1298–1313, 1992. (Cited on p. 121.)
[552] Mårten E. Gulliksson and Per-Åke Wedin. Perturbation theory for generalized and constrained
linear least squares. Numer. Linear Algebra Appl., 7:181–196, 2000. (Cited on p. 158.)
[553] B. C. Gunter and Robert A. Van de Geijn. Parallel out-of-core computation and updating of the QR
factorization. ACM Trans. Math. Softw., 31:60–78, 2005. (Cited on p. 112.)
[554] Hongbin Guo and Rosemary A. Renaut. A regularized total least squares algorithm. In Sabine Van
Huffel and P. Lemmerling, editors, Total Least Squares and Errors-in-Variables Modeling, pages
57–66. Kluwer Academic Publishers, Dordrecht, 2002. (Cited on p. 226.)
[555] Fred G. Gustavson. Finding the block lower triangular form of a matrix. In J. R. Bunch and D. J.
Rose, editors, Sparse Matrix Computations, pages 275–289. Academic Press, New York, 1976.
(Cited on p. 264.)
[556] Martin H. Gutknecht. Block Krylov space methods for linear systems with multiple right hand-
sides: An introduction. In A. H. Siddiqi, I. S. Duff, and O. Christensen, editors, Modern Math-
ematical Models, Methods, and Algorithms for Real World Systems, pages 420–447. Anarnaya
Publishers, New Delhi, India, 2007. (Cited on p. 299.)
[557] Irwin Guttman, Victor Pereyra, and Hugo D. Scolnik. Least squares estimation for a class of
nonlinear models. Technometrics, 15:309–318, 1973. (Cited on p. 405.)
[558] L. A. Hageman, Franklin T. Luk, and David M. Young. On the equivalence of certain iterative
acceleration methods. SIAM J. Numer. Anal., 17:852–873, 1980. (Cited on p. 308.)
[559] Louis A. Hageman and David M. Young. Applied Iterative Methods. Dover, Mineola, NY, 2004.
Unabridged republication of the work first published by Academic Press, New York, 1981. (Cited
on p. 269.)
[560] William W. Hager. Condition estimates. SIAM J. Sci. and Statist. Comput., 5:311–316, 1984.
(Cited on pp. 96, 97.)
[561] Arne Hald. Statistical Theory with Engineering Applications. Wiley, New York, 1952. Translated
by G. Seidelin. (Cited on p. 205.)
[562] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic
algorithms for constructing approximate matrix decompositions. SIAM Rev., 53:217–288, 2011.
(Cited on p. 319.)
[563] Sven J. Hammarling. A note on modifications to the Givens plane rotations. J. Inst. Math. Appl.,
13:215–218, 1974. (Cited on p. 50.)
[564] Sven J. Hammarling. The numerical solution of the general Gauss–Markoff linear model. In
T. S. Durrani et al., editors, Mathematics in Signal Processing, pages 441–456. Clarendon Press,
Oxford University Press, New York, 1987. (Cited on p. 123.)
[565] Sven J. Hammarling, Nicholas J. Higham, and Craig Lucas. LAPACK-style codes for pivoted
Cholesky and QR updating. In Bo Kågström, Erik Elmroth, Jack J. Dongarra, and J. Waśniewski,
editors, Applied Parallel Computing: State of the Art in Scientific Computing. Proceedings from the
Eighth International Workshop, PARA 2006, pages 137–146, 2006. (Cited on p. 148.)
[566] Martin Hanke. Accelerated Landweber iteration for the solution of ill-posed equations. Numer.
Math., 60:341–373, 1991. (Cited on p. 326.)
[567] Martin Hanke. Conjugate Gradient Type Methods for Ill-Posed Problems, volume 327 of Pitman
Research Notes in Math. Longman Scientific and Technical, Harlow, UK, 1995. (Cited on p. 334.)
[568] Martin Hanke. On Lanczos based methods for the regularization of discrete ill-posed problems.
BIT Numer. Math., 41:1008–1018, 2001. (Cited on pp. 332, 334.)
[569] Martin Hanke. A Taste of Inverse Problems: Basic Theory and Examples. SIAM, Philadelphia,
2017. (Cited on p. 182.)
[570] Martin Hanke and Per Christian Hansen. Regularization methods for large-scale problems. Surveys
Math. Indust., 3:253–315, 1993. (Cited on pp. 175, 177, 182, 182, 333, 334.)
[571] Martin Hanke, James G. Nagy, and Curtis Vogel. Quasi-Newton approach to nonnegative image
restorations. Linear Algebra Appl., 316:223–236, 2000. (Cited on p. 417.)
[572] Martin Hanke and Curtis R. Vogel. Two-level preconditioners for regularized inverse problems I.
Theory. Numer. Math., 83:385–402, 1999. (Cited on p. 321.)
[573] Per Christian Hansen. The discrete Picard condition for discrete ill-posed problems. BIT Numer.
Math., 30:658–672, 1990. (Cited on p. 172.)
[574] Per Christian Hansen. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev.,
34:561–580, 1992. (Cited on p. 178.)
[575] Per Christian Hansen. Regularization tools: A MATLAB package for analysis and solution of
discrete ill-posed problems. Numer. Algorithms, 6:1–35, 1994. (Cited on p. 182.)
[576] Per Christian Hansen. Rank-Deficient and Discrete Ill-Posed Problems. Numerical Aspects of Lin-
ear Inversion. SIAM, Philadelphia, 1998. (Cited on pp. 174, 178, 182, 182, 322.)
[577] Per Christian Hansen. Deconvolution and regularization with Toeplitz matrices. Numer. Algorithms,
29:323–378, 2002. (Cited on p. 173.)
[578] Per Christian Hansen. Regularization tools version 4.0 for MATLAB 7.3. Numer. Algorithms,
46:189–194, 2007. (Cited on p. 182.)
[579] Per Christian Hansen. Discrete Inverse Problems. Insight and Algorithms, volume 7 of Fundamen-
tals of Algorithms. SIAM, Philadelphia, 2010. (Cited on pp. 175, 182, 334.)
[580] Per Christian Hansen. Oblique projections and standard form transformations for discrete inverse
problems. Numer. Linear Algebra Appl., 20:250–258, 2013. (Cited on p. 182.)
[581] Per Christian Hansen and H. Gesmar. Fast orthogonal decomposition of rank deficient Toeplitz
matrices. Numer. Algorithms, 4:151–166, 1993. (Cited on p. 241.)
[582] Per Christian Hansen and Toke Koldborg Jensen. Smoothing-norm preconditioning for regularizing
minimum-residual methods. SIAM J. Matrix Anal. Appl., 29:1–14, 2006. (Cited on p. 322.)
[583] Per Christian Hansen, James G. Nagy, and Dianne P. O’Leary. Deblurring Images. Matrices, Spec-
tra, and Filtering. SIAM, Philadelphia, 2006. (Cited on pp. 182, 239.)
[584] Per Christian Hansen and Dianne Prost O’Leary. The use of the L-curve in the regularization of
discrete ill-posed problems. SIAM J. Sci. Comput., 14:1487–1503, 1993. (Cited on p. 178.)
[585] Per Christian Hansen, T. Sekii, and H. Shibahashi. The modified truncated SVD method for reg-
ularization in general form. SIAM J. Sci. Statist. Comput., 13:1142–1150, 1992. (Cited on pp. 174,
174.)
[586] Per Christian Hansen and Plamen Y. Yalamov. Computing symmetric rank-revealing decompo-
sitions via triangular factorizations. SIAM J. Matrix Anal. Appl., 23:443–458, 2001. (Cited on
p. 79.)
[587] Richard J. Hanson. Linear least squares with bounds and linear constraints. SIAM J. Sci. Statist.
Comput., 7:826–834, 1986. (Cited on p. 162.)
[588] Richard J. Hanson and K. H. Haskell. Algorithm 587: Two algorithms for the linearly constrained
least squares problem. ACM Trans. Math. Softw., 8:323–333, 1982. (Cited on p. 162.)
[589] Richard J. Hanson and Charles L. Lawson. Extensions and applications of the Householder al-
gorithm for solving linear least squares problems. Math. Comp., 23:787–812, 1969. (Cited on
p. 77.)
[590] Richard J. Hanson and Michael J. Norris. Analysis of measurements based on the singular value
decomposition. SIAM J. Sci. Statist. Comput., 2:363–373, 1981. (Cited on pp. 51, 386.)
[591] Vjeran Hari and Krešimir Veselić. On Jacobi’s methods for singular value decompositions. SIAM
J. Sci. Statist. Comput., 8:741–754, 1987. (Cited on p. 357.)
[592] R. V. L. Hartley. A more symmetrical Fourier analysis applied to transmission problems. Proc.
IRE, 30:144–150, 1942. (Cited on p. 320.)
[593] K. H. Haskell and Richard J. Hanson. Selected Algorithm for the Linearly Constrained Least
Squares Problem—A User’s Guide. Tech. Report SAND78–1290, Sandia National Laboratories,
Albuquerque, NM, 1979. (Cited on p. 162.)
[594] K. H. Haskell and Richard J. Hanson. An algorithm for linear least squares problems with equality
and nonnegativity constraints. Math. Prog., 21:98–118, 1981. (Cited on pp. 162, 162.)
[595] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning.
Data Mining, Inference, and Prediction. Springer-Verlag, Berlin, second edition, 2009. (Cited on
pp. 420, 426.)
[596] Michael T. Heath. Some extensions of an algorithm for sparse linear least squares problems. SIAM
J. Sci. Statist. Comput., 3:223–237, 1982. (Cited on pp. 261, 262, 263.)
[597] M. T. Heath, A. J. Laub, C. C. Paige, and R. C. Ward. Computing the singular value decomposition
of a product of two matrices. SIAM J. Sci. Statist. Comput., 7:1147–1149, 1986. (Cited on pp. 128,
349.)
[598] M. D. Hebden. An Algorithm for Minimization Using Exact Second Derivatives. Tech. Report T. P.
515, Atomic Energy Research Establishment, Harwell, UK, 1973. (Cited on p. 169.)
[599] I. S. Helland. On the structure of partial least squares regression. Comm. Statist. Theory Methods
Ser. B, 17:581–607, 1988. (Cited on p. 203.)
[600] F. R. Helmert. Die Mathematischen und Physikalischen Theorieen der höheren Geodäsie. Ein-
leitung und 1 Teil: Die mathematischen Theorieen. Druck und Verlag von B. G. Teubner, Leipzig,
1880. (Cited on p. 206.)
[601] H. V. Henderson and S. R. Searle. On deriving the inverse of a sum of matrices. SIAM Rev.,
23:53–60, 1981. (Cited on p. 139.)
[602] H. V. Henderson and S. R. Searle. The vec-permutation matrix, the vec operator and Kronecker
products: A review. Linear Multilinear Algebra, 9:271–288, 1980/1981. (Cited on p. 210.)
[603] Peter Henrici. The quotient-difference algorithm. Nat. Bur. Standards Appl. Math. Ser., 49:23–46,
1958. (Cited on p. 351.)
[604] Peter Henrici. Fast Fourier methods in computational complex analysis. SIAM Rev., 21:481–527,
1979. (Cited on p. 237.)
[605] V. Hernandez, J. E. Román, and A. Tomás. A parallel variant of the Gram–Schmidt process with
reorthogonalization. In G. R. Joubert, W. E. Nagel, F. J. Peters, O. G. Plata, and E. L. Zapata,
editors, Parallel Computing: Current & Future Issues in High-End Computing, volume 33 of John
von Neumann Institute for Computing Series, pages 221–228. Central Institute for Applied Mathe-
matics, Jülich, Germany, 2006. (Cited on p. 109.)
[606] Magnus R. Hestenes. Inversion of matrices by biorthogonalization and related results. J. Soc.
Indust. Appl. Math., 6:51–90, 1958. (Cited on p. 355.)
[607] Magnus R. Hestenes. Conjugacy and gradients. In Stephen G. Nash, editor, A History of Scientific
Computing, volume 60 of IMA Series in Mathematics and Its Applications, pages 167–179. ACM
Press, New York, 1990. (Cited on p. 285.)
[608] Magnus R. Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems.
J. Res. Nat. Bur. Standards, Sect. B, 49:409–436, 1952. (Cited on pp. 278, 280, 281, 285.)
[609] K. L. Hiebert. An evaluation of mathematical software that solves nonlinear least squares problems.
ACM Trans. Math. Softw., 7:1–16, 1981. (Cited on p. 394.)
[610] Nicholas J. Higham. Computing the polar decomposition—with applications. SIAM J. Sci. Statist.
Comput., 7:1160–1174, 1986. (Cited on pp. 379, 383, 384.)
[611] Nicholas J. Higham. Computing real square roots of a real matrix. Linear Algebra Appl.,
88/89:405–430, 1987. (Cited on p. 379.)
[612] Nicholas J. Higham. Error analysis of the Björck–Pereyra algorithm for solving Vandermonde
systems. Numer. Math., 50:613–632, 1987. (Cited on pp. 95, 97, 238.)
[613] Nicholas J. Higham. Fast solution of Vandermonde-like systems involving orthogonal polynomials.
IMA J. Numer. Anal., 8:473–486, 1988. (Cited on p. 238.)
[614] Nicholas J. Higham. Fortran codes for estimating the one-norm of a real or complex matrix, with
applications to condition estimation. ACM Trans. Math. Softw., 14:381–396, 1988. (Cited on
p. 97.)
[615] Nicholas J. Higham. The accuracy of solutions to triangular systems. SIAM J. Numer. Anal.,
26:1252–1265, 1989. (Cited on p. 43.)
[616] Nicholas J. Higham. Analysis of the Cholesky decomposition of a semi-definite matrix. In Mau-
rice G. Cox and Sven J. Hammarling, editors, Reliable Numerical Computation, pages 161–185.
Clarendon Press, Oxford, UK, 1990. (Cited on p. 97.)
[617] Nicholas J. Higham. How accurate is Gaussian elimination? In D. F. Griffiths and G. A. Watson,
editors, Numerical Analysis 1989: Proceedings of the 13th Dundee Biennial Conference, volume
228 of Pitman Research Notes Math., pages 137–154. Longman Scientific and Technical, Harlow,
UK, 1990. (Cited on pp. 72, 96.)
[618] Nicholas J. Higham. Iterative refinement enhances the stability of QR factorization methods for
solving linear equations. BIT Numer. Math., 31:447–468, 1991. (Cited on pp. 59, 103, 103.)
[619] Nicholas J. Higham. The matrix sign decomposition and its relation to the polar decomposition.
Linear Algebra Appl., 212/213:3–20, 1994. (Cited on p. 380.)
[620] Nicholas J. Higham. A survey of componentwise perturbation theory in numerical linear algebra.
In Walter Gautschi, editor, Mathematics of Computation 1943–1993: A Half-Century of Compu-
tational Mathematics. Mathematics of Computation, 50th Anniversary Symposium, August 9–13,
1993, Vancouver, BC, volume 48 of Proceedings of Symposia in Applied Mathematics, pages 49–
78. AMS, Providence, RI, 1994. (Cited on pp. 29, 31.)
[621] Nicholas J. Higham. Stable iterations for the matrix square root. Numer. Algorithms, 15:227–242,
1997. (Cited on p. 93.)
[622] Nicholas J. Higham. QR factorization with complete pivoting and accurate computation of the
SVD. Linear Algebra Appl., 309:153–174, 2000. (Cited on p. 341.)
[623] Nicholas J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, second
edition, 2002. (Cited on pp. 34, 37, 39, 44, 54, 57, 58, 65, 72, 96, 98, 101, 239.)
[624] Nicholas J. Higham. J-orthogonal matrices: Properties and generation. SIAM Rev., 45:504–519,
2003. (Cited on pp. 19, 137.)
[625] Nicholas J. Higham. Functions of Matrices. Theory and Computation. SIAM, Philadelphia, 2008.
(Cited on pp. 378, 383, 384, 384.)
[626] Nicholas J. Higham. The world’s most fundamental matrix equation decomposition. SIAM News,
Dec.:1–3, 2017. (Cited on p. 138.)
[627] Nicholas J. Higham and Theo Mary. Mixed precision algorithms in numerical linear algebra. Acta
Numer., 31:347–414, 2022. (Cited on p. 104.)
[628] Christopher J. Hillar and Lek-Heng Lim. Most tensor problems are NP-hard. J. ACM, 60:Article
45, 2013. (Cited on p. 216.)
[629] F. L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. J. Math. Phys.,
7:164–189, 1927. (Cited on p. 216.)
[630] Iveta Hnětynková, Martin Plešinger, Diana Maria Sima, Zdeněk Strakoš, and Sabine Van Huffel.
The total least squares problem in AX ≈ B: A new classification with the relationship to the
classical works. SIAM J. Matrix Anal. Appl., 32:748–770, 2011. (Cited on p. 196.)
[631] Iveta Hnětynková, Martin Plešinger, and Zdeněk Strakoš. The regularization effect of the Golub–
Kahan iterative bidiagonalization and revealing the noise level in the data. BIT Numer. Math.,
49:669–696, 2009. (Cited on p. 335.)
[632] Iveta Hnětynková, Martin Plešinger, and Zdeněk Strakoš. The core problem within a linear approxi-
mation problem AX ≈ B with multiple right-hand sides. SIAM J. Matrix Anal. Appl., 34:917–931,
2013. (Cited on p. 196.)
[633] Iveta Hnětynková, Martin Plešinger, and Zdeněk Strakoš. Band generalization of the Golub–Kahan
bidiagonalization, generalized Jacobi matrices, and the core problem. SIAM J. Matrix Anal. Appl.,
36:417–434, 2015. (Cited on pp. 196, 305.)
[634] Michiel E. Hochstenbach. A Jacobi–Davidson type SVD method. SIAM J. Sci. Comput., 23:606–
628, 2001. (Cited on p. 376.)
[635] Michiel E. Hochstenbach. Harmonic and refined extraction methods for the singular value problem,
with applications in least squares problems. BIT Numer. Math., 44:721–754, 2004. (Cited on
pp. 371, 371.)
[636] Michiel E. Hochstenbach and Y. Notay. The Jacobi–Davidson method. GAMM Mitt., 29:368–382,
2006. (Cited on p. 375.)
[637] Walter Hoffmann. Iterative algorithms for Gram–Schmidt orthogonalization. Computing, 41:335–
348, 1989. (Cited on p. 69.)
[638] Y. P. Hong and C.-T. Pan. Rank revealing QR decompositions and the singular value decomposition.
Math. Comp., 58:213–232, 1992. (Cited on pp. 80, 80, 81.)
[639] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge,
UK, 1985. (Cited on pp. 116, 350, 377.)
[640] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press,
Cambridge, UK, 1991. (Cited on pp. 13, 210.)
[641] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge,
UK, second edition, 2012. (Cited on pp. 12, 20.)
[642] H. Hotelling. Relations between two sets of variates. Biometrika, 28:321–377, 1936. (Cited on
p. 19.)
[643] Patricia D. Hough and Stephen A. Vavasis. Complete orthogonal decomposition for weighted least
squares. SIAM J. Matrix Anal. Appl., 18:369–392, 1997. (Cited on p. 133.)
[644] Alston S. Householder. Unitary triangularization of a nonsymmetric matrix. J. Assoc. Comput.
Mach., 5:339–342, 1958. (Cited on pp. 46, 51.)
[645] Alston S. Householder. The Theory of Matrices in Numerical Analysis. Dover, Mineola, NY, 1975.
Corrected republication of work first published in 1964 by Blaisdell Publ. Co., New York. (Cited
on pp. 139, 281.)
[646] Alston S. Householder and Friedrich L. Bauer. On certain iterative methods for solving linear
systems. Numer. Math., 2:55–59, 1960. (Cited on p. 273.)
[647] Gary W. Howell and Marc Baboulin. LU preconditioning for overdetermined sparse least squares
problems. In R. Wyrzykowski, E. Deelman, J. Dongarra, K. Kurczewiski, J. Kitowski, and K. Wiatr,
editors, 11th Internat. Conf. Parallel Proc. and Appl. Math., 2015, Krakow, Poland, volume 9573
of Lecture Notes in Computer Science, pages 128–137, Springer, Heidelberg, 2016. (Cited on
p. 318.)
[648] P. J. Huber. Robust Statistics. Wiley, New York, 1981. (Cited on pp. 422, 422.)
[649] M. F. Hutchinson and F. R. de Hoog. Smoothing noisy data with spline functions. Numer. Math.,
47:99–106, 1985. (Cited on p. 180.)
[650] M. F. Hutchinson and F. R. de Hoog. A fast procedure for calculating minimum cross-validation
cubic smoothing splines. ACM Trans. Math. Softw., 12:150–153, 1986. (Cited on p. 180.)
[651] Tsung-Min Hwang, Wen-Wei Lin, and Dan Pierce. Improved bounds for rank revealing LU factor-
izations. Linear Algebra Appl., 261:173–186, 1997. (Cited on p. 91.)
[652] Tsung-Min Hwang, Wen-Wei Lin, and Eugene K. Yang. Rank revealing LU factorizations. Linear
Algebra Appl., 175:115–141, 1992. (Cited on p. 91.)
[653] Bruno Iannazzo. A note on computing the matrix square root. Calcolo, 40:273–283, 2003. (Cited
on p. 379.)
[654] Bruno Iannazzo. On the Newton method for the matrix pth root. SIAM J. Matrix Anal. Appl.,
28:503–523, 2006. (Cited on p. 379.)
[655] IEEE Standard for Floating-Point Arithmetic. In IEEE Standard 754-2019 (Revision of IEEE Stan-
dard 754-2008), pages 1–84, 2019. (Cited on p. 32.)
[656] Akira Imakura and Yusaku Yamamoto. Efficient implementations of the modified Gram–Schmidt
orthogonalization with a non-standard inner product. Japan J. Indust. Appl. Math., 36:619–641,
2019. (Cited on p. 122.)
[657] D. Irony, Sivan Toledo, and A. Tiskin. Using perturbed QR factorizations to solve linear least-
squares problems. J. Parallel Distrib. Comput., 64:1017–1026, 2004. (Cited on p. 114.)
[658] Carl Gustav Jacob Jacobi. Über eine neue Auflösungsart der bei der Methode der kleinsten Quadrate
vorkommenden lineären Gleichungen. Astron. Nachr., 22:297–306, 1845. (Cited on p. 51.)
[659] Carl Gustav Jacob Jacobi. Über ein leichtes Verfahren die in der Theorie der Säcularstörungen vork-
ommenden Gleichungen numerisch aufzulösen. J. Reine Angew. Math., 30:51–94, 1846. (Cited on
pp. 352, 374.)
[660] M. Jacobsen. Two-grid Iterative Methods for Ill-Posed Problems. Master’s thesis, Technical Uni-
versity of Denmark, Kongens Lyngby, Denmark, 2000. (Cited on p. 322.)
[661] M. Jacobsen, Per Christian Hansen, and Michael A. Saunders. Subspace preconditioned LSQR for
discrete ill-posed problems. BIT Numer. Math., 43:975–989, 2003. (Cited on p. 321.)
[662] W. Jalby and B. Philippe. Stability analysis and improvement of the block Gram–Schmidt algo-
rithm. SIAM J. Sci. Statist. Comput., 12:1058–1073, 1991. (Cited on p. 109.)
[663] Marcus Jankowski and Henryk Woźniakowski. Iterative refinement implies numerical stability. BIT
Numer. Math., 17:303–311, 1977. (Cited on p. 103.)
[664] A. Jennings and M. A. Ajiz. Incomplete methods for solving AᵀAx = b. SIAM J. Sci. Statist.
Comput., 5:978–987, 1984. (Cited on pp. 312, 313.)
[665] Alan Jennings and G. M. Malik. The solution of sparse linear equations by the conjugate gradient
algorithm. Int. J. Numer. Methods Engrg., 12:141–158, 1978. (Cited on p. 306.)
[666] Tore Koldborg Jensen and Per Christian Hansen. Iterative regularization with minimum-residual
methods. BIT Numer. Math., 47:103–120, 2007. (Cited on p. 333.)
[667] E. R. Jessup and D. C. Sorensen. A parallel algorithm for computing the singular value decompo-
sition of a matrix. SIAM J. Matrix Anal. Appl., 15:530–548, 1994. (Cited on pp. 358, 359.)
[668] Zhongxiao Jia. A refined subspace iteration algorithm for large sparse eigenproblems. Appl. Numer.
Math., 32:35–52, 2000. (Cited on p. 370.)
[669] Zhongxiao Jia. Regularization properties of Krylov iterative solvers CGME and LSMR for linear
discrete ill-posed problems with an application to truncated randomized SVDs. Numer. Algorithms,
85:1281–1310, 2020. (Cited on p. 334.)
[670] Zhongxiao Jia and Bingyu Li. On the condition number of the total least squares problem. Numer.
Math., 125:61–87, 2013. (Cited on p. 226.)
[671] Zhongxiao Jia and Datian Niu. A refined harmonic Lanczos bidiagonalization method and an
implicitly restarted algorithm for computing the smallest singular triplets of large matrices. SIAM
J. Sci. Comput., 32:714–744, 2010. (Cited on pp. 372, 372.)
[672] X.-Q. Jin. A preconditioner for constrained and weighted least squares problems with Toeplitz
structure. BIT Numer. Math., 36:101–109, 1996. (Cited on p. 325.)
[673] Pavel Jiránek and David Titley-Peloquin. Estimating the backward error in LSQR. SIAM J. Matrix
Anal. Appl., 31:2055–2074, 2010. (Cited on p. 299.)
[674] Thierry Joffrain, Tze Meng Low, Enrique S. Quintana-Ortí, Robert van de Geijn, and Field G. Van
Zee. Accumulating Householder transformations, revisited. ACM Trans. Math. Softw., 32:169–179,
2006. (Cited on p. 109.)
[675] D. M. Johnson, A. L. Dulmage, and N. S. Mendelsohn. Connectivity and reducibility of graphs.
Canad. J. Math., 14:529–539, 1963. (Cited on p. 264.)
[676] Camille Jordan. Mémoire sur les formes bilinéaires. J. Math. Pures Appl., 19:35–54, 1874. (Cited
on pp. 12, 13.)
[677] Camille Jordan. Essai sur la géométrie à n dimensions. Bull. Soc. Math. France, 3:103–174, 1875.
(Cited on p. 16.)
[678] S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Internat. Acad.
Polon. Sci. Lett., 35:355–357, 1937. (Cited on p. 273.)
[679] Bo Kågström, Per Ling, and Charles F. Van Loan. GEMM-based level 3 BLAS high performance
model implementation and performance evaluation benchmarks. ACM Trans. Math. Softw., 24:268–
302, 1998. (Cited on p. 114.)
[680] W. M. Kahan. Accurate Eigenvalues of a Symmetric Tridiagonal Matrix. Tech. Report CS-41,
Computer Science Department, Stanford University, CA, 1966. Revised June 1968. (Cited on
pp. 76, 350, 351.)
[681] W. M. Kahan. Numerical linear algebra. Canad. Math. Bull., 9:757–801, 1966. (Cited on p. 25.)
[682] C. Kamath and Ahmed Sameh. A projection method for solving nonsymmetric linear systems on
multiprocessors. Parallel Computing, 9:291–312, 1989. (Cited on p. 309.)
[683] W. J. Kammerer and M. Z. Nashed. On the convergence of the conjugate gradient method for
singular linear operator equations. SIAM J. Numer. Anal., 9:165–181, 1972. (Cited on p. 280.)
[684] Igor E. Kaporin. High quality preconditioning of a general symmetric positive definite matrix based
on its U^T U + U^T R + R^T U-decomposition. Numer. Linear Algebra Appl., 5:483–509, 1998. (Cited on
p. 311.)
[685] Rune Karlsson and Bertil Waldén. Estimation of optimal backward perturbation bounds for the
linear least squares problem. BIT Numer. Math., 37:862–869, 1997. (Cited on p. 98.)
[686] Linda Kaufman. Variable projection methods for solving separable nonlinear least squares prob-
lems. BIT Numer. Math., 15:49–57, 1975. (Cited on p. 404.)
[687] Linda Kaufman. Maximum likelihood, least squares, and penalized least squares for PET. IEEE
Trans. Med. Imaging, 12:200–214, 1993. (Cited on pp. 417, 417.)
[688] Linda Kaufman and Victor Pereyra. A method for separable nonlinear least squares problems with
separable nonlinear equality constraints. SIAM J. Numer. Anal., 15:12–20, 1978. (Cited on p. 404.)
[689] Linda Kaufman and Garrett Sylvester. Separable nonlinear least squares with multiple right-hand
sides. SIAM J. Matrix Anal. Appl., 13:68–89, 1992. (Cited on p. 405.)
[690] Herbert B. Keller. On the solution of singular and semidefinite linear systems by iteration. SIAM J.
Numer. Anal., 2:281–290, 1965. (Cited on p. 270.)
[691] Charles Kenney and Alan J. Laub. Rational iterative methods for the matrix sign function. SIAM J.
Matrix Anal. Appl., 12:273–291, 1991. (Cited on p. 382.)
[692] Andrzej Kielbasiński. Analiza numeryczna algorytmu ortogonalizacji Grama–Schmidta. Matem-
atyka Stosowana, 2:15–35, 1974. (Cited on p. 71.)
[693] Andrzej Kielbasiński. Iterative refinement for linear systems in variable-precision arithmetic. BIT
Numer. Math., 21:97–103, 1981. (Cited on p. 104.)
[694] Andrzej Kielbasiński and Krystyna Ziętak. Numerical behavior of Higham’s scaled method for
polar decomposition. Numer. Algorithms, 32:105–140, 2003. (Cited on p. 384.)
[695] Misha E. Kilmer, Per Christian Hansen, and Malena I. Español. A projection-based approach to
general-form Tikhonov regularization. SIAM J. Sci. Comput., 29:315–330, 2007. (Cited on p. 333.)
[696] Misha E. Kilmer and Dianne P. O’Leary. Choosing regularization parameters in iterative methods
for ill-posed problems. SIAM J. Matrix Anal. Appl., 22:1204–1221, 2001. (Cited on p. 335.)
[697] Hyunsoo Kim and Haesun Park. Nonnegative matrix factorization based on alternating nonnegativ-
ity constrained least squares and active set method. SIAM J. Matrix Anal. Appl., 30:713–730, 2008.
(Cited on p. 419.)
[698] Hyunsoo Kim, Haesun Park, and Lars Eldén. Nonnegative tensor factorization based on alternat-
ing large-scale nonnegativity-constrained least squares. In Proceedings of IEEE 7th International
Conference on Bioinformatics and Bioengineering, volume 2, pages 1147–1151, 2007. (Cited on
p. 420.)
[699] Jingu Kim, Yunlong He, and Haesun Park. Algorithms for nonnegative matrix and tensor factoriza-
tion: A unified view based on block coordinate descent framework. J. Glob. Optim., 58:285–319,
2014. (Cited on p. 420.)
[700] Seung-Jean Kim, Kwangmoo Koh, Michael Lustig, Stephen Boyd, and Dimitri Gorinevsky. An
interior-point method for large-scale ℓ1-regularized least squares. IEEE J. Selected Topics Signal
Process., 1:606–617, 2007. (Cited on p. 429.)
[701] Andrew V. Knyazev and Merico E. Argentati. Principal angles between subspaces in an A-based
scalar product: Algorithms and perturbation estimates. SIAM J. Sci. Comput., 23:2008–2040, 2002.
(Cited on p. 19.)
[702] E. G. Kogbetliantz. Solution of linear equations by diagonalization of coefficients matrix. Quart.
Appl. Math., 13:123–132, 1955. (Cited on p. 356.)
[703] E. Kokiopoulou, C. Bekas, and Efstratios Gallopoulos. Computing smallest singular value triplets
with implicitly restarted Lanczos bidiagonalization. Appl. Numer. Math., 49:39–61, 2004. (Cited
on p. 374.)
[704] G. B. Kolata. Geodesy: Dealing with an enormous computer task. Science, 200:421–422, 1978.
(Cited on p. 3.)
[705] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Rev., 51:455–
500, 2009. (Cited on pp. 217, 217.)
[706] Daniel Kressner. Numerical Methods for General and Structured Eigenvalue Problems. Volume 46
of Lecture Notes in Computational Science and Engineering. Springer, Berlin, 2005. (Cited on
p. 349.)
[707] Daniel Kressner. The periodic QR algorithm is a disguised QR algorithm. Linear Algebra Appl.,
417:423–433, 2005. (Cited on p. 349.)
[708] F. T. Krogh. Efficient implementation of a variable projection algorithm for nonlinear least squares.
Comm. ACM, 17:167–169, 1974. (Cited on p. 404.)
[709] Vera N. Kublanovskaya. On some algorithms for the solution of the complete eigenvalue problem.
Z. Vychisl. Mat. i Mat. Fiz., 1:555–570, 1961. In Russian. English translation in USSR Comput.
Math. Phys., 1:637–657, 1962. (Cited on p. 348.)
[710] Ming-Jun Lai and Yang Wang. Sparse Solutions of Underdetermined Linear Systems and Their
Applications. SIAM, Philadelphia, 2021. (Cited on p. 429.)
[711] Jörg Lampe and Heinrich Voss. A fast algorithm for solving regularized total least squares prob-
lems. ETNA, 31:12–24, 2008. (Cited on p. 226.)
[712] Jörg Lampe and Heinrich Voss. Large-scale Tikhonov regularization of total least squares. J.
Comput. Appl. Math., 238:95–108, 2013. (Cited on p. 226.)
[713] Peter Lancaster and M. Tismenetsky. The Theory of Matrices. With Applications. Academic Press,
New York, 1985. (Cited on p. 210.)
[714] C. Lanczos. Linear Differential Operators. D. Van Nostrand, London, UK, 1961. (Cited on p. 13.)
[715] Cornelius Lanczos. An iteration method for the solution of the eigenvalue problem of linear dif-
ferential and integral operators. J. Res. Nat. Bur. Standards, Sect. B, 45:255–281, 1950. (Cited on
pp. 287, 370.)
[716] Cornelius Lanczos. Solution of systems of linear equations by minimized iterations. J. Res. Nat.
Bur. Standards, Sect. B, 49:33–53, 1952. (Cited on p. 288.)
[717] Cornelius Lanczos. Linear systems in self-adjoint form. Amer. Math. Monthly, 65:665–671, 1958.
(Cited on p. 13.)
[718] L. Landweber. An iterative formula for Fredholm integral equations of the first kind. Amer. J.
Math., 73:615–624, 1951. (Cited on p. 325.)
[719] Julien Langou. AllReduce algorithms: Application to Householder QR factorization. In Precon-
ditioning, July 9–12, 2007, CERFACS, Toulouse, 2007. http://www.precond07/enseeih.fr/
Talks/langou/langou.pdf. (Cited on p. 213.)
[720] Julien Langou. Translation and Modern Interpretation of Laplace’s Théorie Analytique des Prob-
abilités, pages 505–512, 516–520. Tech. Report 280, UC Denver CCM, Albuquerque, NM and
Livermore, CA, 2009. (Cited on p. 64.)
[721] P. S. Laplace. Théorie analytique des probabilités. Premier supplément, Courcier, Paris, third
edition, 1816. (Cited on p. 64.)
[722] Rasmus Munk Larsen. Lanczos Bidiagonalization with Partial Reorthogonalization. Tech. Report
DAIMI PB-357, Department of Computer Science, Aarhus University, Denmark, 1998. (Cited on
p. 373.)
[723] Rasmus Munk Larsen. PROPACK: A Software Package for the Singular Value Problem Based
on Lanczos Bidiagonalization with Partial Reorthogonalization. http://soi.stanford.edu/
~rmunk/PROPACK/, SCCM, Stanford University, Stanford, CA, 2000. (Cited on p. 374.)
[724] Peter Läuchli. Jordan-Elimination und Ausgleichung nach kleinsten Quadraten. Numer. Math.,
3:226–240, 1961. (Cited on pp. 40, 316.)
[725] Charles L. Lawson. Contributions to the Theory of Linear Least Maximum Approximation. Ph.D.
thesis, University of California, Los Angeles, 1961. (Cited on p. 423.)
[726] Charles L. Lawson. Sparse Matrix Methods Based on Orthogonality and Conjugacy. Tech. Mem.
33-627, Jet Propulsion Laboratory, Cal. Inst. of Tech., Pasadena, CA, 1973. (Cited on p. 285.)
[727] Charles L. Lawson and Richard J. Hanson. Solving Least Squares Problems, volume 15 of Classics
in Applied Math., SIAM, Philadelphia, 1995. Unabridged, revised republication of the work first
published by Prentice-Hall, Inc., Englewood Cliffs, NJ, 1974. (Cited on pp. 137, 157, 160, 162,
167, 167, 178, 189, 192.)
[728] Charles L. Lawson, Richard J. Hanson, D. R. Kincaid, and Fred T. Krogh. Basic Linear Algebra
Subprograms for Fortran usage. ACM Trans. Math. Softw., 5:308–323, 1979. (Cited on p. 113.)
[729] Adrien-Marie Legendre. Nouvelles méthodes pour la détermination des orbites des comètes.
Courcier, Paris, 1805. (Cited on p. 2.)
[730] R. B. Lehoucq. Implicitly restarted Arnoldi methods and subspace iteration. SIAM J. Matrix Anal.
Appl., 23:551–562, 2001.
[731] R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users’ Guide: Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM, Philadelphia, 1998. (Cited
on pp. 373, 375.)
[732] Richard B. Lehoucq. The computations of elementary unitary matrices. ACM Trans. Math. Softw.,
22:393–400, 1996. (Cited on p. 47.)
[733] Steven J. Leon, Åke Björck, and Walter Gander. Gram–Schmidt orthogonalization: 100 years and
more. Numer. Linear Algebra Appl., 20:492–532, 2013. (Cited on p. 63.)
[734] Ö. Leringe and Per-Å. Wedin. A Comparison between Different Methods to Compute a Vector x
Which Minimizes ∥Ax−b∥2 When Gx = h. Tech. Report, Department of Computer Science, Lund
University, Lund, Sweden, 1970. (Cited on p. 158.)
[735] S. E. Leurgans, R. T. Ross, and R. B. Abel. A decomposition for three-way arrays. SIAM J. Matrix
Anal. Appl., 14:1064–1083, 1993. (Cited on pp. 217, 373.)
[736] K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quart.
Appl. Math., 2:164–168, 1944. (Cited on pp. 169, 175, 396.)
[737] J. G. Lewis. Algorithm 582: The Gibbs–Poole–Stockmeyer and Gibbs–King algorithms for re-
ordering sparse matrices. ACM Trans. Math. Softw., 8:190–194, 1982. (Cited on p. 251.)
[738] J. G. Lewis. Implementation of the Gibbs–Poole–Stockmeyer and Gibbs–King algorithms. ACM
Trans. Math. Softw., 8:180–189, 1982. (Cited on p. 251.)
[739] J. G. Lewis, D. J. Pierce, and D. K. Wah. Multifrontal Householder QR Factorization. Tech. Report
ECA-TR-127-Revised, Boeing Computer Services, Seattle, WA, 1989. (Cited on p. 258.)
[740] Chi-Kwong Li and Gilbert Strang. An elementary proof of Mirsky’s low rank approximation theo-
rem. Electronic J. Linear Algebra, 36:347–414, 2020. (Cited on p. 24.)
[741] Na Li and Yousef Saad. MIQR: A multilevel incomplete QR preconditioner for large sparse least-
squares problems. SIAM J. Matrix Anal. Appl., 28:524–550, 2006. (Cited on pp. 313, 318.)
[742] Ren-Cang Li. Bounds on perturbations of generalized singular values and of associated subspaces.
SIAM J. Matrix Anal. Appl., 14:195–234, 1993. (Cited on p. 125.)
[743] Ren-Cang Li. Solving Secular Equations Stably and Efficiently. Tech. Report UCB/CSD-94-851,
Computer Science Department, University of California, Berkeley, CA, 1994. (Cited on pp. 359,
361.)
[744] Yuying Li. A globally convergent method for lp problems. SIAM J. Optim., 3:609–629, 1993.
(Cited on p. 425.)
[745] Yuying Li. Solving lp Problems and Applications. Tech. Report CTC93TR122, 03/93, Advanced
Computing Research Institute, Cornell University, Ithaca, NY, 1993. (Cited on p. 425.)
[746] Jörg Liesen and Zdeněk Strakoš. Krylov Subspace Methods: Principles and Analysis. Oxford
University Press, Oxford, UK, 2012. (Cited on pp. 285, 289.)
[747] Lek-Heng Lim. Tensors and hypermatrices. In Leslie Hogben, editor, Handbook of Linear Algebra,
pages 15.1–15.30. Chapman & Hall/CRC Press, Boca Raton, FL, second edition, 2013. (Cited on
p. 217.)
[748] Chih-Jen Lin and Jorge J. Moré. Newton’s method for large bound-constrained optimization prob-
lems. SIAM J. Optim., 9:1100–1127, 1999. (Cited on p. 310.)
[749] Per Lindström. A General Purpose Algorithm for Nonlinear Least Squares Problems with Nonlin-
ear Constraints. Tech. Report UMINF–102.83, Institute of Information Processing, University of
Umeå, Sweden, 1983. (Cited on p. 396.)
[750] Per Lindström. Two User Guides, One (ENLSIP) for Constrained — One (ELSUNC) for Uncon-
strained Nonlinear Least Squares Problems. Tech. Report UMINF–109.82 and 110.84, Institute of
Information Processing, University of Umeå, Sweden, 1984. (Cited on p. 411.)
[751] Per Lindström and Per-Å. Wedin. A new linesearch algorithm for unconstrained nonlinear least
squares problems. Math. Program., 29:268–296, 1984. (Cited on p. 395.)
[752] Per Lindström and Per-Å. Wedin. Methods and Software for Nonlinear Least Squares Problems.
Tech. Report UMINF–133.87, Institute of Information Processing, University of Umeå, Sweden,
1988. (Cited on p. 402.)
[753] Richard J. Lipton, Donald J. Rose, and Robert E. Tarjan. Generalized nested dissection. SIAM J.
Numer. Anal., 16:346–358, 1979. (Cited on p. 256.)
[754] Joseph W. H. Liu. On general row merging schemes for sparse Givens transformations. SIAM J.
Sci. Statist. Comput., 7:1190–1211, 1986. (Cited on pp. 255, 256.)
[755] Joseph W. H. Liu. The role of elimination trees in sparse factorization. SIAM J. Matrix Anal. Appl.,
11:134–172, 1990. (Cited on pp. 250, 250, 256, 258.)
[756] Qiaohua Liu. Modified Gram–Schmidt-based methods for block downdating the Cholesky factor-
ization. J. Comput. Appl. Math., 235:1897–1905, 2011. (Cited on p. 148.)
[757] James W. Longley. Modified Gram–Schmidt process vs. classical Gram–Schmidt. Comm. Statist.
Simulation Comput., 10:517–527, 1981. (Cited on p. 62.)
[758] Per Lötstedt. Perturbation bounds for the linear least squares problem subject to linear inequality
constraints. BIT Numer. Math., 23:500–519, 1983. (Cited on p. 167.)
[759] Per Lötstedt. Solving the minimal least squares problem subject to bounds on the variables. BIT
Numer. Math., 24:206–224, 1984. (Cited on p. 167.)
[760] P.-O. Löwdin. On the non-orthogonality problem. Adv. Quantum Chemistry, 5:185–199, 1970.
(Cited on p. 383.)
[761] Szu-Min Lu and Jesse L. Barlow. Multifrontal computation with the orthogonal factors of sparse
matrices. SIAM J. Matrix Anal. Appl., 17:658–679, 1996. (Cited on p. 258.)
[762] Franklin T. Luk. A rotation method for computing the QR-decomposition. SIAM J. Sci. Statist.
Comput., 7:452–459, 1986. (Cited on p. 357.)
[763] Franklin T. Luk and S. Qiao. A new matrix decomposition for signal processing. Automatica,
30:39–43, 1994. (Cited on p. 128.)
[764] I. Lustig, R. Marsten, and D. Shanno. Computational experience with a primal-dual interior point
method for linear programming. Linear Algebra Appl., 152:191–222, 1991. (Cited on pp. 418,
419.)
[765] D. Ma, L. Yang, R. M. T. Fleming, I. Thiele, B. O. Palsson, and M. A. Saunders. Reliable and
efficient solution of genome-scale models of metabolism and macromolecular expression. Sci.
Rep., 7:40863, 2017. (Cited on p. 101.)
[766] Kaj Madsen and Hans Bruun Nielsen. Finite algorithms for robust linear regression. BIT Numer.
Math., 30:682–699, 1990. (Cited on p. 425.)
[767] Kaj Madsen and Hans Bruun Nielsen. A finite smoothing algorithm for linear ℓ1 estimation. SIAM
J. Optim., 3:223–235, 1993. (Cited on p. 422.)
[768] N. Mahdavi-Amiri. Generally Constrained Nonlinear Least Squares and Generating Test Prob-
lems: Algorithmic Approach. Ph.D. thesis, The Johns Hopkins University, Baltimore, MD, 1981.
(Cited on p. 162.)
[769] Alexander N. Malyshev. Parallel algorithms for solving spectral problems of linear algebra. Linear
Algebra Appl., 188:489–520, 1993. (Cited on p. 382.)
[770] Alexander N. Malyshev and Miloud Sadkane. Computation of optimal backward perturbation
bounds for large sparse linear least squares problems. BIT Numer. Math., 41:739–747, 2001. (Cited
on p. 99.)
[771] Rolf Manne. Analysis of two partial-least-squares algorithms for multivariate calibration. Chemom.
Intell. Lab. Syst., 2:187–197, 1987. (Cited on pp. 202, 203.)
[772] P. Manneback. On Some Numerical Methods for Solving Large Sparse Linear Least Squares Prob-
lems. Ph.D. thesis, Facultés Universitaires Notre-Dame de la Paix, Namur, Belgium, 1985. (Cited
on pp. 254, 308.)
[773] P. Manneback, C. Murigande, and Philippe L. Toint. A modification of an algorithm by Golub and
Plemmons for large linear least squares in the context of Doppler positioning. IMA J. Numer. Anal.,
5:221–234, 1985. (Cited on p. 206.)
[774] Thomas A. Manteuffel. An incomplete factorization technique for positive definite linear systems.
Math. Comp., 34:473–497, 1980. (Cited on p. 309.)
[775] A. A. Markov. Wahrscheinlichkeitsrechnung. Liebmann, Leipzig, second edition, 1912. (Cited on
p. 3.)
[776] Ivan Markovsky. Bibliography on total least-squares and related methods. Statist. Interface, 3:1–6,
2010. (Cited on p. 226.)
[777] Ivan Markovsky and Sabine Van Huffel. Overview of total least-squares methods. Signal Process.,
87:2283–2302, 2007. (Cited on p. 226.)
[778] Harry M. Markowitz. The elimination form of the inverse and its application to linear programming.
Management Sci., 3:255–269, 1957. (Cited on p. 251.)
[779] Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. J. Soc.
Indust. Appl. Math., 11:431–441, 1963. (Cited on p. 396.)
[780] J. J. Martínez and J. M. Peña. Fast parallel algorithm of Björck–Pereyra type for solving Cauchy–
Vandermonde linear systems. Appl. Numer. Math., 26:343–352, 1998. (Cited on p. 239.)
[781] W. F. Massy. Principal components regression in exploratory statistical research. J. Amer. Statist.
Assoc., 60:234–246, 1965. (Cited on p. 174.)
[782] Nicola Mastronardi and Paul Van Dooren. The antitriangular factorization of symmetric matrices.
SIAM J. Matrix Anal. Appl., 34:173–196, 2013. (Cited on p. 137.)
[783] Nicola Mastronardi and Paul Van Dooren. An algorithm for solving the indefinite least squares
problem with equality constraints. BIT Numer. Math., 54:201–218, 2014. (Cited on p. 137.)
[784] Pontus Matstoms. QR27—Specification Sheet. Tech. Report, Department of Mathematics,
Linköping University, Sweden, 1992. (Cited on p. 258.)
[785] Pontus Matstoms. Sparse QR factorization in MATLAB. ACM Trans. Math. Softw., 20:136–159,
1994. (Cited on p. 258.)
[786] J. A. Meijerink and Henk A. van der Vorst. An iterative solution method for linear systems of
which the coefficient matrix is a symmetric M-matrix. Math. Comp., 31:148–162, 1977. (Cited on
p. 309.)
[787] Beatrice Meini. The matrix square root from a new functional perspective: Theoretical results and
computational issues. SIAM J. Matrix Anal. Appl., 26:362–376, 2004. (Cited on p. 379.)
[788] Xiangrui Meng. Randomized Algorithms for Large-Scale Strongly Over-Determined Linear Re-
gression Problems. Ph.D. thesis, Stanford University, Stanford, CA, 2014. (Cited on p. 212.)
[789] Xiangrui Meng, Michael A. Saunders, and Michael W. Mahoney. LSRN: A parallel iterative solver
for strongly over- or underdetermined systems. SIAM J. Sci. Comput., 36:C95–C118, 2014. (Cited
on pp. 320, 321.)
[790] G. Merle and Helmut Späth. Computational experience with discrete Lp approximation. Comput-
ing, 12:315–321, 1974. (Cited on p. 424.)
[791] Gérard Meurant. The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Preci-
sion Computations, volume 19 of Software, Environments, and Tools. SIAM, Philadelphia, 2006.
(Cited on p. 285.)
[792] Gérard Meurant and Zdeněk Strakoš. The Lanczos and conjugate gradient algorithms in finite
precision arithmetic. Acta Numer., 15:471–542, 2006. (Cited on pp. 285, 299.)
[793] Carl D. Meyer, Jr. Generalized inversion of modified matrices. SIAM J. Appl. Math., 24:315–323,
1973. (Cited on p. 139.)
[794] Alan J. Miller. Subset Selection in Regression, volume 25 of Monograph on Statistics and Applied
Probability. Chapman & Hall/CRC Press, Boca Raton, FL, second edition, 2002. (Cited on p. 140.)
[795] Kenneth S. Miller. Complex linear least squares. SIAM Rev., 15:706–726, 1973. (Cited on p. 5.)
[796] Luiza Miranian and Ming Gu. Strong rank-revealing LU factorizations. Linear Algebra Appl.,
367:1–16, 2003. (Cited on p. 91.)
[797] L. Mirsky. Symmetric gauge functions and unitarily invariant norms. Quart. J. Math. Oxford,
11:50–59, 1960. (Cited on p. 24.)
[798] S. K. Mitra and C. R. Rao. Projections under seminorms and generalized Moore–Penrose inverses.
Linear Algebra Appl., 9:155–167, 1974. (Cited on p. 158.)
[799] Cleve B. Moler. Iterative refinement in floating point. J. Assoc. Comput. Mach., 14:316–321, 1967.
(Cited on p. 101.)
[800] Alexis Montoison and Dominique Orban. BiLQ: An iterative method for nonsymmetric linear
systems with a quasi-minimum error property. SIAM J. Matrix Anal. Appl., 41:1145–1166, 2020.
(Cited on p. 304.)
[801] Alexis Montoison and Dominique Orban. TriCG and TriMR: Two iterative methods for symmet-
ric quasi-definite systems. SIAM J. Sci. Comput., 43:A2502–A2525, 2021. (Cited on p. 331.)
[802] Alexis Montoison, Dominique Orban, and Michael Saunders. MINARES: An Iterative Solver for
Symmetric Linear Systems. Tech. Report GERAD G-2023-40, École Polytechnique Montreal,
2023. (Cited on p. 300.)
[803] Marc Moonen, Paul Van Dooren, and Joos Vandewalle. A singular value decomposition updating
algorithm for subspace tracking. SIAM J. Matrix Anal. Appl., 13:1015–1038, 1992. (Cited on
pp. 360, 363.)
[804] E. H. Moore. On the reciprocal of the general algebraic matrix. Bull. Amer. Math. Soc., 26:394–395,
1920. (Cited on p. 16.)
[805] José Morales and Jorge Nocedal. Remark on Algorithm 778: L-BFGS-B: Fortran subroutines for
large-scale bound-constrained optimization. ACM Trans. Math. Softw., 38:Article 7, 2011. (Cited
on p. 420.)
[806] Jorge J. Moré. The Levenberg–Marquardt algorithm: Implementation and theory. In G. A. Watson,
editor, Numerical Analysis Proceedings Biennial Conference Dundee 1977, volume 630 of Lecture
Notes in Mathematics, pages 105–116. Springer-Verlag, Berlin, 1978. (Cited on p. 396.)
[807] Jorge J. Moré. Recent developments in algorithms and software for trust region methods. In
A. Bachem, M. Grötchel, and B. Korte, editors, Mathematical Programming. The State of the Art,
Proceedings Bonn 1982, pages 258–287. Springer-Verlag, Berlin, 1983. (Cited on p. 396.)
[808] Jorge J. Moré, B. S. Garbow, and K. E. Hillstrom. Users’ Guide for MINPACK-1. Tech. Report
ANL-80-74, Applied Math. Div., Argonne National Laboratory, Argonne, IL, 1980. (Cited on
p. 402.)
[809] Jorge J. Moré and G. Toraldo. Algorithms for bound constrained quadratic programming problems.
Numer. Math., 55:377–400, 1989. (Cited on p. 420.)
[810] Ronald B. Morgan. A restarted GMRES method augmented with eigenvectors. SIAM J. Matrix
Anal. Appl., 16:1154–1171, 1995. (Cited on p. 337.)
[811] Daisuke Mori, Yusaku Yamamoto, Shao-Liang Zhang, and Takeshi Fukaya. Backward error analy-
sis of the AllReduce algorithm for Householder QR decomposition. Japan J. Indust. Appl. Math.,
29:111–130, 2012. (Cited on p. 213.)
[812] Keiichi Morikuni and Ken Hayami. Inner-iteration Krylov subspace methods for least squares
problems. SIAM J. Matrix Anal. Appl., 34:1–22, 2013. (Cited on p. 309.)
[813] Keiichi Morikuni and Ken Hayami. Convergence of inner-iteration GMRES methods for rank-
deficient least squares problems. SIAM J. Matrix Anal. Appl., 36:225–250, 2015. (Cited on p. 309.)
[814] V. A. Morozov. Methods for Solving Incorrectly Posed Problems. Springer, New York, 1984.
(Cited on p. 177.)
[815] N. Munksgaard. Solving sparse symmetric sets of linear equations by preconditioned conjugate
gradients. ACM Trans. Math. Softw., 6:206–219, 1980. (Cited on p. 310.)
[816] Joseph M. Myre, Erich Frahm, David J. Lilja, and Martin O. Saar. TNT: A solver for large dense
least-squares problems that takes conjugate gradient from bad in theory, to good in practice. In
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Work-
shops, pages 987–995. IEEE, 2018. (Cited on p. 312.)
[817] J. G. Nagy. Toeplitz Least Squares Computations. Ph.D. thesis, North Carolina State University,
Raleigh, NC, 1991. (Cited on p. 324.)
[818] James G. Nagy. Fast inverse QR factorization for Toeplitz matrices. SIAM J. Sci. Comput., 14:1174–
1193, 1993. (Cited on pp. 240, 240.)
[819] James G. Nagy and Zdeněk Strakoš. Enforcing nonnegativity in image reconstruction algorithms. In
Mathematical Modeling, Estimation, and Imaging, pages 182–190. Proc. SPIE 4121, Bellingham,
WA, 2000. (Cited on p. 417.)
[820] Yuji Nakatsukasa, Zhaojun Bai, and François Gygi. Optimizing Halley’s iteration for computing
the matrix polar decomposition. SIAM J. Matrix Anal. Appl., 31:2700–2720, 2010. (Cited on
pp. 385, 385.)
[821] Yuji Nakatsukasa and Roland W. Freund. Computing fundamental matrix decompositions accu-
rately via the matrix sign function in two iterations: The power of Zolotarev’s functions. SIAM
Rev., 58:461–493, 2016. (Cited on p. 382.)
[822] Yuji Nakatsukasa and Nicholas J. Higham. Stable and efficient spectral divide and conquer algo-
rithms for the symmetric eigenvalue decomposition and the SVD. SIAM J. Sci. Comput., 35:A1325–
A1349, 2013. (Cited on pp. 381, 385.)
[823] M. Zuhair Nashed. Generalized Inverses and Applications. Academic Press, New York, 1976.
(Cited on pp. 15, 16.)
[824] Larry Neal and George Poole. A geometric analysis of Gaussian elimination. II. Linear Algebra
Appl., 173:239–264, 1992. (Cited on p. 86.)
[825] Arkadi Nemirovski and Michael J. Todd. Interior point methods for optimization. Acta Numer.,
17:191–234, 2008. (Cited on p. 419.)
[826] A. S. Nemirovskii. The regularization properties of the adjoint gradient method in ill-posed prob-
lems. USSR Comput. Math. Math. Phys., 26:7–16, 1986. (Cited on p. 334.)
[827] Yurii Nesterov and Arkadi Nemirovski. On first-order algorithms for ℓ1/nuclear norm minimization.
Acta Numer., 22:509–575, 2013. (Cited on p. 429.)
[828] Yurii Nesterov and Arkadi Nemirovskii. Interior Point Polynomial Algorithms in Convex Program-
ming, volume 13 of Studies in Applied Mathematics. SIAM, Philadelphia, 1994. (Cited on p. 419.)
[829] Olavi Nevanlinna. Convergence of Iterations for Linear Equations. Lectures in Mathematics ETH
Zürich. Birkhäuser, Basel, 1993. (Cited on p. 295.)
[830] R. A. Nicolaides. Deflation of conjugate gradients with applications to boundary value problems.
SIAM J. Numer. Anal., 24:355–365, 1987. (Cited on p. 337.)
[831] Ben Noble, editor. Applied Linear Algebra. Prentice-Hall, Englewood Cliffs, NJ, 1969. (Cited on
p. 88.)
[832] Ben Noble. Method for computing the Moore-Penrose generalized inverse and related matters.
In M. Zuhair Nashed, editor, Generalized Inverses and Applications, Proceedings of an Advanced
Seminar, The University of Wisconsin–Madison, October 8–10, 1973, Publication of the Mathe-
matics Research Center, No. 32, pages 245–302, Academic Press, New York, 1976. (Cited on
p. 85.)
[833] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations
Research and Financial Engineering. Springer, New York, second edition, 2006. (Cited on p. 402.)
[834] Paolo Novati and Maria Rosario Russo. A GCV based Arnoldi–Tikhonov regularization method.
BIT Numer. Math., 54:501–521, 2014. (Cited on p. 335.)
[835] W. Oettli and W. Prager. Compatibility of approximate solution of linear equations with given error
bounds for coefficients and right-hand sides. Numer. Math., 6:404–409, 1964. (Cited on p. 99.)
[836] Gabriel Okša, Yusaku Yamamoto, and Marián Vajteršic. Convergence to singular triplets in the two-
sided block-Jacobi SVD algorithm with dynamic ordering. SIAM J. Matrix Anal. Appl., 43:1238–
1262, 2022. (Cited on p. 376.)
[837] Dianne P. O’Leary. The block conjugate gradient algorithm and related methods. Linear Algebra
Appl., 29:293–322, 1980. (Cited on p. 95.)
[838] Dianne P. O’Leary. Robust regression computation using iteratively reweighted least squares. SIAM
J. Matrix Anal. Appl., 11:466–480, 1990. (Cited on pp. 425, 425.)
[839] Dianne P. O’Leary and Bert W. Rust. Variable projection for nonlinear least squares problems.
Comput. Optim. Appl., 54:579–593, 2013. (Cited on p. 404.)
[840] Dianne P. O’Leary and John A. Simmons. A bidiagonalization-regularization procedure for large
scale discretizations of ill-posed problems. SIAM J. Sci. Statist. Comput., 2:474–489, 1981. (Cited
on p. 332.)
[841] Dianne P. O’Leary and P. Whitman. Parallel QR factorization by Householder and modified Gram-
Schmidt algorithms. Parallel Comput., 16:99–112, 1990. (Cited on p. 112.)
[842] S. Oliveira, L. Borges, M. Holzrichter, and T. Soma. Analysis of different partitioning schemes
for parallel Gram–Schmidt algorithms. Internat. J. Parallel Emergent Distrib. Syst., 14:293–320,
2000. (Cited on p. 109.)
[843] Serge J. Olszanskyj, James M. Lebak, and Adam W. Bojanczyk. Rank-k modification methods for
recursive least squares problems. Numer. Algorithms, 7:325–354, 1994. (Cited on p. 148.)
[844] Dominique Orban and Mario Arioli. Iterative Solution of Symmetric Quasi-Definite Linear Systems.
SIAM, Philadelphia, 2017. (Cited on p. 331.)
[845] James M. Ortega and Werner C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several
Variables, volume 30 of Classics in Applied Math., SIAM, Philadelphia, 2000. Unabridged repub-
lication of the work first published by Academic Press, New York and London, 1970. (Cited on
pp. 393, 395, 402.)
[846] M. R. Osborne. Some special nonlinear least squares problems. SIAM J. Numer. Anal., 12:571–592,
1975. (Cited on p. 405.)
[847] Michael R. Osborne. Nonlinear least squares—the Levenberg algorithm revisited. J. Austr. Math.
Soc. Series B, 19:342–357, 1976. (Cited on p. 396.)
[848] Michael R. Osborne. Finite Algorithms in Optimization and Data Analysis. John Wiley & Sons,
New York, 1985. (Cited on p. 423.)
[849] Michael R. Osborne, Brett Presnell, and B. A. Turlach. A new approach to variable selection in
least squares problems. IMA J. Numer. Anal., 20:389–404, 2000. (Cited on p. 425.)
[850] Michael R. Osborne and G. Alistair Watson. On the best linear Chebyshev approximation. Com-
puter J., 10:172–177, 1967. (Cited on p. 422.)
[851] George Ostrouchov. Symbolic Givens reduction and row-ordering in large sparse least squares
problems. SIAM J. Sci. Statist. Comput., 8:248–264, 1987. (Cited on p. 253.)
[852] C. C. Paige. Bidiagonalization of matrices and solution of linear equations. SIAM J. Numer. Anal.,
11:197–209, 1974. (Cited on p. 291.)
[853] C. C. Paige. Fast numerically stable computations for generalized least squares problems. SIAM J.
Numer. Anal., 16:165–171, 1979. (Cited on p. 122.)
[854] C. C. Paige. Computing the generalized singular value decomposition. SIAM J. Sci. Statist.
Comput., 7:1126–1146, 1986. (Cited on p. 126.)
[855] C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J.
Numer. Anal., 12:617–629, 1975. (Cited on p. 299.)
[856] C. C. Paige and M. A. Saunders. Towards a generalized singular value decomposition. SIAM J.
Numer. Anal., 18:398–405, 1981. (Cited on pp. 18, 19, 124, 125.)
[857] Christopher C. Paige. The Computation of Eigenvalues and Eigenvectors of Very Large Sparse
Matrices. Ph.D. thesis, University of London, UK, 1971. (Cited on p. 294.)
[858] Christopher C. Paige. Computer solution and perturbation analysis of generalized linear least
squares problems. Math. Comp., 33:171–184, 1979. (Cited on pp. 122, 123.)
[859] Christopher C. Paige. Error analysis of some techniques for updating orthogonal decompositions.
Math. Comp., 34:465–471, 1980. (Cited on p. 144.)
[860] Christopher C. Paige. The general linear model and the generalized singular value decomposition.
Linear Algebra Appl., 70:269–284, 1985. (Cited on p. 126.)
[861] Christopher C. Paige. Some aspects of generalized QR factorizations. In M. G. Cox and Sven J.
Hammarling, editors, Reliable Numerical Computation, pages 71–91. Clarendon Press, Oxford,
UK, 1990. (Cited on pp. 124, 128.)
[862] Christopher C. Paige. A useful form of a unitary matrix obtained from any sequence of unit 2-norm
n-vectors. SIAM J. Matrix Anal. Appl., 31:565–583, 2009. (Cited on p. 65.)
[863] Christopher C. Paige. Accuracy of the Lanczos process for the eigenproblem and solution of equa-
tions. SIAM J. Matrix Anal. Appl., 40:1371–1398, 2019. (Cited on p. 299.)
[864] Christopher C. Paige, Beresford N. Parlett, and Henk A. van der Vorst. Approximate solutions and
eigenvalue bounds from Krylov subspaces. Numer. Linear Algebra Appl., 2:115–133, 1995. (Cited
on p. 370.)
[865] Christopher C. Paige, Miroslav Rozložník, and Zdeněk Strakoš. Modified Gram–Schmidt (MGS),
least squares, and backward stability of MGS-GMRES. SIAM J. Matrix Anal. Appl., 28:264–284,
2006. (Cited on p. 302.)
[866] Christopher C. Paige and Michael A. Saunders. LSQR: An algorithm for sparse linear equations
and sparse least squares. ACM Trans. Math. Softw., 8:43–71, 1982. (Cited on pp. 197, 197, 197,
199, 202, 289, 291, 295, 297, 300, 322.)
[867] Christopher C. Paige and Zdeněk Strakoš. Unifying least squares, total least squares, and data
least squares. In Sabine Van Huffel and P. Lemmerling, editors, Total Least Squares and Errors-
in-Variables Modeling, pages 25–34. Kluwer Academic Publishers, Dordrecht, 2002. (Cited on
pp. 196, 218.)
[868] Christopher C. Paige and Zdeněk Strakoš. Core problems in linear algebraic systems. SIAM J.
Matrix Anal. Appl., 27:861–875, 2006. (Cited on pp. 194, 196, 196.)
[869] Christopher C. Paige and P. Van Dooren. On the quadratic convergence of Kogbetliantz’s algorithm
for computing the singular value decomposition. Linear Algebra Appl., 77:301–313, 1986. (Cited
on p. 357.)
[870] Christopher C. Paige and Musheng Wei. History and generality of the CS decomposition. Linear
Algebra Appl., 208/209:303–326, 1994. (Cited on pp. 19, 19.)
[871] Ching-Tsuan Pan. A modification to the LINPACK downdating algorithm. BIT Numer. Math.,
30:707–722, 1990. (Cited on p. 146.)
[872] Ching-Tsuan Pan. A perturbation analysis on the problem of downdating a Cholesky factorization.
Linear Algebra Appl., 183:103–116, 1993. (Cited on p. 146.)
[873] Ching-Tsuan Pan. On the existence and computation of rank revealing LU factorizations. Linear
Algebra Appl., 316:199–222, 2000. (Cited on pp. 89, 90, 91.)
[874] Ching-Tsuan Pan and Robert J. Plemmons. Least squares modifications with inverse factorizations:
Parallel implementations. J. Comput. Appl. Math., 27:109–127, 1989. (Cited on pp. 136, 149.)
[875] Ching-Tsuan Pan and Ping Tak Peter Tang. Bounds on singular values revealed by QR factorization.
BIT Numer. Math., 39:740–756, 1999. (Cited on p. 83.)
[876] C. H. Papadimitriou. The NP-completeness of the bandwidth minimization problem. Computing, 16:263–
270, 1976. (Cited on p. 250.)
[877] A. T. Papadopoulos, Iain S. Duff, and Andrew J. Wathen. A class of incomplete orthogonal factor-
ization methods II: Implementation and results. BIT Numer. Math., 45:159–179, 2005. (Cited on
p. 315.)
[878] J. M. Papy, Lieven De Lathauwer, and Sabine Van Huffel. Exponential data fitting using multilin-
ear algebra: The single-channel and multi-channel case. Numer. Linear Algebra Appl., 12:809–826,
2005. (Cited on p. 218.)
[879] Haesun Park. A parallel algorithm for the unbalanced orthogonal Procrustes problem. Parallel
Comput., 17:913–923, 1991. (Cited on p. 387.)
[880] Haesun Park and Lars Eldén. Downdating the rank-revealing URV decomposition. SIAM J. Matrix
Anal. Appl., 16:138–155, 1995. (Cited on p. 155.)
[881] Haesun Park and Lars Eldén. Stability analysis and fast algorithms for triangularization of Toeplitz
matrices. Numer. Math., 76:383–402, 1997. (Cited on p. 240.)
[882] Haesun Park and Sabine Van Huffel. Two-way bidiagonalization scheme for downdating the sin-
gular value decomposition. Linear Algebra Appl., 222:23–40, 1995. (Cited on p. 362.)
[883] Beresford N. Parlett. The new QD algorithms. Acta Numer., 4:459–491, 1995. (Cited on p. 352.)
[884] Beresford N. Parlett. The Symmetric Eigenvalue Problem, volume 20 of Classics in Applied Math.,
SIAM, Philadelphia, 1998. Unabridged republication of the work first published by Prentice-Hall,
Englewood Cliffs, NJ, 1980. (Cited on pp. 51, 69, 199, 345, 366, 366, 367, 370, 370.)
[885] Beresford N. Parlett and W. G. Poole, Jr. A geometric theory for the QR, LU and power iteration.
SIAM J. Numer. Anal., 10:389–412, 1973. (Cited on p. 368.)
[886] S. V. Parter. The use of linear graphs in Gauss elimination. SIAM Rev., 3:119–130, 1961. (Cited
on p. 249.)
[908] Charles M. Rader and Allen O. Steinhardt. Hyperbolic Householder transforms. SIAM J. Matrix
Anal. Appl., 9:269–290, 1988. (Cited on p. 137.)
[909] Rui Ralha. One-sided reduction to bidiagonal form. Linear Algebra Appl., 358:219–238, 2003.
(Cited on p. 194.)
[910] Håkan Ramsin and Per-Å. Wedin. A comparison of some algorithms for the nonlinear least squares
problem. BIT Numer. Math., 17:72–90, 1977. (Cited on p. 398.)
[911] Bhaskar D. Rao and Kenneth Kreutz-Delgado. An affine scaling methodology for best basis selec-
tion. IEEE Trans. Signal Process, 47:187–200, 1999. (Cited on p. 429.)
[912] C. R. Rao. Linear Statistical Inference and Its Applications. John Wiley & Sons, New York, second
edition, 1973. (Cited on p. 128.)
[913] K. R. Rao and P. Yip. Discrete Cosine Transforms. Academic Press, New York, 1990. (Cited on
p. 237.)
[914] Lord Rayleigh. On the calculation of the frequency of vibration of a system in its gravest mode
with an example from hydrodynamics. Philos. Mag., 47:556–572, 1899. (Cited on p. 376.)
[915] Shaked Regev and Michael A. Saunders. Ssai: A Symmetric Approximate Inverse Preconditioner
for the Conjugate Gradient Methods PCG and PCGLS. Tech. Report, Working Paper, SOL and
ICME, Stanford University, Stanford, CA, 2022. (Cited on p. 314.)
[916] Lothar Reichel. Fast QR decomposition of Vandermonde-like matrices and polynomial least
squares approximation. SIAM J. Matrix Anal. Appl., 12:552–564, 1991. (Cited on pp. 230, 238,
239.)
[917] Lothar Reichel and William B. Gragg. FORTRAN subroutines for updating the QR decomposition.
ACM Trans. Math. Softw., 16:369–377, 1990. (Cited on p. 150.)
[918] Lothar Reichel and Qiang Ye. A generalized LSQR algorithm. Numer. Linear Algebra Appl.,
15:643–660, 2008. (Cited on p. 305.)
[919] John K. Reid. A note on the least squares solution of a band system of linear equations by House-
holder reductions. Computer J., 10:188–189, 1967. (Cited on p. 188.)
[920] John K. Reid. A note on the stability of Gaussian elimination. J. Inst. Math. Appl., 8:374–375,
1971. (Cited on p. 281.)
[921] John K. Reid. Implicit scaling of linear least squares problems. BIT Numer. Math., 40:146–157,
2000. (Cited on p. 160.)
[922] Christian H. Reinsch. Smoothing by spline functions. Numer. Math., 10:177–183, 1967. (Cited on
p. 189.)
[923] Christian H. Reinsch. Smoothing by spline functions II. Numer. Math., 16:451–454, 1971. (Cited
on pp. 169, 176.)
[924] Rosemary A. Renaut and Hongbin Guo. Efficient algorithms for solution of regularized total least
squares problems. SIAM J. Matrix Anal. Appl., 26:457–476, 2005. (Cited on p. 226.)
[925] J. R. Rice. PARVEC Workshop on Very Large Least Squares Problems and Supercomputers. Tech.
Report CSD-TR 464, Purdue University, West Lafayette, IN, 1983. (Cited on p. 206.)
[926] John R. Rice. A theory of condition. SIAM J. Numer. Anal., 3:287–310, 1966. (Cited on p. 62.)
[927] John R. Rice and Karl H. Usow. The Lawson algorithm and extensions. Math. Comp., 24:118–127,
1968. (Cited on p. 423.)
[928] J. L. Rigal and J. Gaches. On the compatibility of a given solution with the data of a linear system.
J. Assoc. Comput. Mach., 14:543–548, 1967. (Cited on p. 97.)
[929] J. D. Riley. Solving systems of linear equations with a positive definite symmetric but possibly
ill-conditioned matrix. Math. Tables Aids Comput., 9:96–101, 1956. (Cited on pp. 177, 326.)
[930] Walter Ritz. Über eine neue Methode zur Lösung gewisser Variationsprobleme der mathematischen
Physik. J. Reine Angew. Math., 136:1–61, 1908. (Cited on p. 376.)
[931] Marielba Rojas, Sandra A. Santos, and Danny C. Sorensen. Algorithm 873: LSTRS: MATLAB
software for large-scale trust-region subproblems and regularization. ACM Trans. Math. Softw.,
34:11:1–11:28, 2008. (Cited on pp. 182, 375.)
[932] Marielba Rojas and Danny C. Sorensen. A trust-region approach to the regularization of large-
scale discrete forms of ill-posed problems. SIAM J. Sci. Comput., 23:1842–1860, 2002. (Cited on
p. 182.)
[933] Marielba Rojas and Trond Steihaug. An interior-point trust-region-based method for large-scale
non-negative regularization. Inverse Problems, 18:1291–1307, 2002. (Cited on p. 419.)
[934] Vladimir Rokhlin and Mark Tygert. A fast randomized algorithm for overdetermined linear least
squares regression. Proc. Natl. Acad. Sci. USA, 105:13212–13217, 2008. (Cited on p. 319.)
[935] D. J. Rose. A graph-theoretic study of the numerical solution of sparse positive definite systems of
linear equations. In R. C. Read, editor, Graph Theory and Computing, pages 183–217, Academic
Press, New York, 1972. (Cited on pp. 249, 251.)
[936] J. Ben Rosen, Haesun Park, and John Glick. Total least norm formulation and solution
for structured problems. SIAM J. Matrix Anal. Appl., 17:110–126, 1996. (Cited on pp. 227, 411.)
[937] Roman Rosipal and Nicole Krämer. Overview and recent advances in partial least squares. In C.
Saunders et al., eds., Proceedings of International Statistics and Optimization Perspectives Work-
shop, “Subspace, Latent Structure and Feature Selection,” volume 3940 of Lecture Notes in Com-
puter Science, pages 34–51. Springer, Berlin, 2006. (Cited on p. 203.)
[938] Miroslav Rozložník, Alicja Smoktunowicz, Miroslav Tůma, and Jiří Kopal. Numerical stability of
orthogonalization methods with a non-standard inner product. BIT Numer. Math., 52:1035–1058,
2012. (Cited on p. 122.)
[939] Axel Ruhe. Accelerated Gauss–Newton algorithms for nonlinear least squares problems. BIT
Numer. Math., 19:356–367, 1979. (Cited on p. 395.)
[940] Axel Ruhe. Numerical aspects of Gram–Schmidt orthogonalization of vectors. Linear Algebra
Appl., 52/53:591–601, 1983. (Cited on p. 71.)
[941] Axel Ruhe. Rational Krylov: A practical algorithm for large sparse nonsymmetric matrix pencils.
SIAM J. Sci. Comput., 19:1535–1551, 1998. (Cited on p. 376.)
[942] Axel Ruhe and Per-Åke Wedin. Algorithms for separable nonlinear least squares problems. SIAM
Rev., 22:318–337, 1980. (Cited on pp. 404, 406.)
[943] Siegfried M. Rump. INTLAB - INTerval LABoratory. In Tibor Csendes, editor, Developments
in Reliable Computing, pages 77–104. Kluwer Academic Publishers, Dordrecht, 1999. (Cited on
p. 34.)
[944] Siegfried M. Rump. Fast and parallel interval arithmetic. BIT Numer. Math., 39:534–554, 1999.
(Cited on p. 34.)
[945] Siegfried M. Rump. Ill-conditioned matrices are componentwise near to singularity. SIAM Rev.,
41:102–112, 1999. (Cited on p. 31.)
[946] Heinz Rutishauser. Der Quotienten-Differenzen-Algorithmus. Z. Angew. Math. Phys., 5:233–251,
1954. (Cited on p. 351.)
[947] Heinz Rutishauser. Solution of eigenvalue problems with the LR-transformation. Nat. Bur. Stan-
dards Appl. Math. Ser., 49:47–81, 1958. (Cited on p. 339.)
[948] Heinz Rutishauser. Theory of gradient methods. In M. Engeli, Th. Ginsburg, H. Rutishauser,
and E. Stiefel, editors, Refined Methods for Computation of the Solution and the Eigenvalues of
Self-Adjoint Boundary Value Problems, pages 24–50. Birkhäuser, Basel/Stuttgart, 1959. (Cited on
p. 326.)
[949] Heinz Rutishauser. On Jacobi rotation patterns. In Proceedings of Symposia in Applied Math-
ematics, Vol. XV: Experimental Arithmetic, High Speed Computing and Mathematics. American
Mathematical Society, Providence, RI, pages 219–239, 1963. (Cited on p. 341.)
[950] Heinz Rutishauser. The Jacobi method for real symmetric matrices. In F. L. Bauer et al., editors,
Handbook for Automatic Computation. Vol. II, Linear Algebra, pages 201–211. Springer, New
York, 1971. Prepublished in Numer. Math., 9:1–10, 1966. (Cited on p. 353.)
[951] Heinz Rutishauser. Description of ALGOL 60. Handbook for Automatic Computation. Vol. I, Part
a. Springer-Verlag, Berlin, 1967. (Cited on p. 69.)
[952] Heinz Rutishauser. Once again: The least squares problem. Linear Algebra Appl., 1:479–488,
1968. (Cited on p. 177.)
[953] Yousef Saad. Preconditioning techniques for nonsymmetric and indefinite linear systems. J. Com-
put. Appl. Math., 24:89–105, 1988. (Cited on p. 313.)
[954] Yousef Saad. Numerical Methods for Large Eigenvalue Problems. Halsted Press, New York, 1992.
(Cited on p. 375.)
[955] Yousef Saad. A flexible inner-outer preconditioned GMRES algorithm. SIAM J. Sci. Statist. Com-
put., 14:461–469, 1993. (Cited on pp. 303, 304.)
[956] Yousef Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, Boston,
MA, 1996. (Cited on pp. 281, 284.)
[957] Yousef Saad. Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia, second edition,
2003. (Cited on p. 269.)
[958] Yousef Saad. Numerical Methods for Large Eigenvalue Problems, volume 66 of Classics in Applied
Math., SIAM, Philadelphia, revised edition, 2011. Updated edition of the work first published by
Manchester University Press, 1992. (Cited on p. 375.)
[959] Youcef Saad and Martin H. Schultz. GMRES: A generalized minimal residual algorithm for solving
nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 7:856–869, 1986. (Cited on p. 301.)
[960] Yousef Saad and Henk A. van der Vorst. Iterative solution of linear systems in the 20th century. J.
Comput. Appl. Math., 123:1–33, 2000. (Cited on p. 269.)
[961] Y. Saad, M. Yeung, J. Erhel, and F. Guyomarc’h. A deflated version of the conjugate gradient
algorithm. SIAM J. Sci. Comput., 21:1909–1926, 2000. (Cited on pp. 312, 336, 336.)
[962] Douglas E. Salane. A continuation approach for solving large-residual nonlinear least squares
problems. SIAM J. Sci. Statist. Comput., 8:655–671, 1987. (Cited on p. 402.)
[963] Michael A. Saunders. Large-Scale Linear Programming Using the Cholesky Factorization. Tech.
Report CS252, Computer Science Department, Stanford University, Stanford, CA, 1972. (Cited
on pp. 146, 146.)
[964] Michael A. Saunders. Sparse least squares by conjugate gradients: A comparison of precondition-
ing methods. In J. F. Gentleman, editor, Proc. Computer Science and Statistics 12th Annual Sym-
posium on the Interface, pages 15–20. University of Waterloo, Canada, 1979. (Cited on p. 317.)
[965] Michael A. Saunders. Solution of sparse rectangular systems using LSQR and Craig. BIT Numer.
Math., 35:588–604, 1995. (Cited on pp. 292, 328.)
[966] M. A. Saunders, H. D. Simon, and E. L. Yip. Two conjugate-gradient-type methods for unsymmet-
ric systems. SIAM J. Numer. Anal., 25:927–940, 1988. (Cited on pp. 305, 331.)
[967] Werner Sautter. Error analysis of Gauss elimination for the best least squares solution. Numer.
Math., 30:165–184, 1978. (Cited on p. 85.)
[968] Berkant Savas and Lek-Heng Lim. Quasi-Newton methods on Grassmannians and multilinear
approximations of tensors. SIAM J. Sci. Comput., 32:3352–3393, 2010. (Cited on p. 217.)
[969] Robert Schatten. Norm Ideals of Completely Continuous Operators. Ergebnisse der Mathematik
und ihrer Grenzgebiete, Neue Folge. Springer Verlag, Berlin, 1960. (Cited on p. 22.)
[970] K. Schittkowski. Solving constrained nonlinear least squares problems by a general purpose SQP-
method. In K.-H. Hoffmann, J. B. Hiriart-Urruty, C. Lemaréchal, and J. Zowe, editors, Trends in
Mathematical Optimization, volume 84 of International Series of Numerical Mathematics, pages
49–83. Birkhäuser-Verlag, Basel, Switzerland, 1985. (Cited on p. 162.)
[971] Erhard Schmidt. Zur Theorie der linearen und nichtlinearen Integralgleichungen. 1 Teil. Entwick-
lung willkürlicher Funktionen nach Systemen vorgeschriebener. Math. Ann., 63:433–476, 1907.
(Cited on p. 63.)
[972] Erhard Schmidt. Über die Auflösung linearer Gleichungen mit unendlich vielen Unbekannten.
Rend. Circ. Mat. Palermo. Ser. 1, 25:53–77, 1908. (Cited on p. 63.)
[973] P. H. Schönemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika,
31:1–10, 1966. (Cited on p. 386.)
[974] Robert S. Schreiber. A new implementation of sparse Gaussian elimination. ACM Trans. Math.
Softw., 8:256–276, 1982. (Cited on p. 250.)
[975] Robert Schreiber and Charles Van Loan. A storage efficient WY representation for products of
Householder transformations. SIAM J. Sci. Statist. Comput., 10:53–57, 1989. (Cited on p. 106.)
[976] Günther Schulz. Iterative Berechnung der reziproken Matrix. Z. Angew. Math. Mech., 13:57–59,
1933. (Cited on p. 379.)
[977] H. R. Schwarz. Tridiagonalization of a symmetric band matrix. Numer. Math., 12:231–241, 1968.
Also appears in [1123, pp. 273–283]. (Cited on p. 341.)
[978] H. R. Schwarz, Hans Rutishauser, and Eduard Stiefel. Matrizen-Numerik. Teubner Verlag, Stuttgart,
1986. (Cited on p. 62.)
[979] Hubert Schwetlick. Nonlinear parameter estimation: Models, criteria and estimation. In D. F.
Griffiths and G. A. Watson, editors, Numerical Analysis 1991. Proceedings of the 14th Dundee
Conference on Numerical Analysis, volume 260 of Pitman Research Notes in Mathematics, pages
164–193. Longman Scientific and Technical, Harlow, UK, 1992. (Cited on p. 402.)
[980] Hubert Schwetlick and V. Tiller. Numerical methods for estimating parameters in nonlinear models
with errors in the variables. Technometrics, 27:17–24, 1985. (Cited on p. 410.)
[981] Hubert Schwetlick and Volker Tiller. Nonstandard scaling matrices for trust region Gauss–Newton
methods. SIAM J. Sci. Statist. Comput., 10:654–670, 1989. (Cited on p. 410.)
[982] Hugo D. Scolnik. On the solution of non-linear least squares problems. In C. V. Freiman, J. E.
Griffith, and J. L. Rosenfeld, editors, Proc. IFIP Congress 71. Vol. 2, pages 1258–1265. North-
Holland, Amsterdam, 1972. (Cited on p. 405.)
[983] Jennifer Scott. On using Cholesky-based factorizations and regularization for solving rank-deficient
sparse linear least-squares problems. SIAM J. Sci. Comput., 39:C319–C339, 2017. (Cited on
p. 315.)
[984] Jennifer A. Scott and Miroslav Tůma. The importance of structure in incomplete factorization
preconditioners. BIT Numer. Math., 51:385–404, 2011. (Cited on p. 310.)
[985] Jennifer A. Scott and Miroslav Tůma. HSL_MI28: An efficient and limited-memory incomplete
Cholesky factorization code. ACM Trans. Math. Softw., 40:Article 24, 2014. (Cited on pp. 311,
315.)
[986] Jennifer Scott and Miroslav Tůma. On positive semidefinite modification schemes for incomplete
Cholesky factorization. SIAM J. Sci. Comput., 36:A609–A633, 2014. (Cited on p. 311.)
[987] Jennifer Scott and Miroslav Tůma. Preconditioning of linear least squares by robust incomplete
factorization for implicitly held normal equations. SIAM J. Sci. Comput., 38:C603–C623, 2016.
(Cited on p. 315.)
[988] Jennifer Scott and Miroslav Tůma. Solving mixed sparse-dense linear least-squares problems by
preconditioned iterative methods. SIAM J. Sci. Comput., 39:A2422–A2437, 2017. (Cited on
p. 263.)
[989] Jennifer A. Scott and Miroslav Tůma. A Schur complement approach to preconditioning sparse
least squares with some dense rows. Numer. Algorithms, 79:1147–1168, 2018. (Cited on p. 263.)
[990] Jennifer A. Scott and Miroslav Tůma. Sparse stretching for solving sparse-dense linear least-
squares problems. SIAM J. Sci. Comput., 41:A1604–A1625, 2019. (Cited on p. 263.)
[991] Jennifer A. Scott and Miroslav Tůma. Algorithms for Sparse Linear Systems. Necas Center Series.
Birkhäuser, Cham, 2023. (Cited on p. 244.)
[992] Shayle R. Searle. Extending some results and proofs for the singular linear model. Linear Algebra
Appl., 210:139–151, 1994. (Cited on p. 128.)
[993] V. de Silva and Lek-Heng Lim. Tensor rank and the ill-posedness of the best low rank approxima-
tion. SIAM J. Matrix Anal. Appl., 30:1084–1127, 2008. (Cited on pp. 215, 216, 218.)
[994] Diana Sima, Sabine Van Huffel, and Gene H. Golub. Regularized total least squares based on
quadratic eigenvalue solvers. BIT Numer. Math., 44:793–812, 2004. (Cited on p. 225.)
[995] Horst D. Simon. Analysis of the symmetric Lanczos algorithm with reorthogonalization methods.
Linear Algebra Appl., 61:101–131, 1984. (Cited on p. 298.)
[996] Horst D. Simon and Hongyuan Zha. Low-rank matrix approximation using the Lanczos bidiagonal-
ization process with applications. SIAM J. Sci. Comput., 21:2257–2274, 2000. (Cited on pp. 203,
298, 371.)
[997] Valeria Simoncini and Daniel B. Szyld. On the occurrence of superlinear convergence of exact and
inexact Krylov subspace methods. SIAM Rev., 47:247–272, 2005. (Cited on p. 299.)
[998] Valeria Simoncini and Daniel B. Szyld. Recent computational developments in Krylov subspace
methods for linear systems. Numer. Linear Algebra Appl., 14:1–59, 2007. (Cited on p. 337.)
[999] Lennart Simonsson. Subspace Computations via Matrix Decompositions and Geometric Optimiza-
tion. Ph.D. thesis, Linköping Studies in Science and Technology No. 1052, Linköping, Sweden,
2006. (Cited on p. 155.)
[1000] Robert D. Skeel. Scaling for numerical stability in Gaussian elimination. J. Assoc. Comput. Mach.,
26:494–526, 1979. (Cited on p. 31.)
[1001] Robert D. Skeel. Iterative refinement implies numerical stability for Gaussian elimination. Math.
Comp., 35:817–832, 1980. (Cited on p. 103.)
[1002] Gerard L. G. Sleijpen and Henk A. van der Vorst. A Jacobi–Davidson iteration method for linear
eigenvalue problems. SIAM J. Matrix Anal. Appl., 17:401–425, 1996. (Cited on pp. 374, 375,
376.)
[1003] Gerard L. G. Sleijpen and Henk A. van der Vorst. A Jacobi–Davidson iteration method for linear
eigenvalue problems. SIAM Rev., 42:267–293, 2000. (Cited on p. 375.)
[1004] S. W. Sloan. An algorithm for profile and wavefront reduction of sparse matrices. Int. J. Numer.
Methods Eng., 23:239–251, 1986. (Cited on p. 251.)
[1005] B. T. Smith, J. M. Boyle, Jack J. Dongarra, B. S. Garbow, Y. Ikebe, Virginia C. Klema, and Cleve B.
Moler. Matrix Eigensystem Routines—EISPACK Guide, volume 6 of Lecture Notes in Computer
Science. Springer, New York, second edition, 1976. (Cited on p. 113.)
[1006] Alicja Smoktunowicz, Jesse L. Barlow, and Julien Langou. A note on the error analysis of the
classical Gram–Schmidt. Numer. Math., 105:299–313, 2006. (Cited on p. 63.)
[1007] Inge Söderkvist. Perturbation analysis of the orthogonal Procrustes problem. BIT Numer. Math.,
33:687–694, 1993. (Cited on p. 387.)
[1008] Inge Söderkvist and Per-Åke Wedin. Determining the movements of the skeleton using well-
configured markers. J. Biomech., 26:1473–1477, 1993. (Cited on p. 386.)
[1009] Inge Söderkvist and Per-Åke Wedin. On condition numbers and algorithms for determining a rigid
body movement. BIT Numer. Math., 34:424–436, 1994. (Cited on p. 386.)
[1010] Torsten Söderström and G. W. Stewart. On the numerical properties of an iterative method for
computing the Moore–Penrose generalized inverse. SIAM J. Numer. Anal., 11:61–74, 1974. (Cited
on p. 380.)
[1011] D. C. Sorensen. Implicit application of polynomial filters in a k-step Arnoldi method. SIAM J.
Matrix Anal. Appl., 13:357–385, 1992. (Cited on pp. 372, 373.)
[1012] Danny C. Sorensen. Numerical methods for large eigenvalue problems. Acta Numer., 11:519–584,
2002. (Cited on p. 375.)
[1013] David Sourlier. Three-Dimensional Feature-Independent Bestfit in Coordinate Metrology. Ph.D.
dissertation, Swiss Federal Institute of Technology, Zürich, 1995. (Cited on p. 416.)
[1014] Helmuth Späth. Mathematical Algorithms for Linear Regression. Academic Press, Boston, 1992.
(Cited on p. 423.)
[1015] G. W. Stewart. Introduction to Matrix Computations. Academic Press, New York, 1973. (Cited on
p. 22.)
[1016] G. W. Stewart. The economical storage of plane rotations. Numer. Math., 25:137–138, 1976. (Cited
on p. 49.)
[1017] G. W. Stewart. On the perturbation of pseudo-inverses, projections and linear least squares prob-
lems. SIAM Rev., 19:634–662, 1977. (Cited on pp. 19, 19, 31, 98.)
[1018] G. W. Stewart. Research, development, and LINPACK. In J. R. Rice, editor, Mathematical Software
III, pages 1–14. Academic Press, New York, 1977. (Cited on pp. 25, 98.)
[1019] G. W. Stewart. The efficient generation of random orthogonal matrices with an application to
condition estimators. SIAM J. Numer. Anal., 17:403–409, 1980. (Cited on p. 63.)
[1020] G. W. Stewart. Computing the CS decomposition of a partitioned orthogonal matrix. Numer. Math.,
40:297–306, 1982. (Cited on pp. 19, 128.)
[1021] G. W. Stewart. A method for computing the generalized singular value decomposition. In
B. Kågström and Axel Ruhe, editors, Matrix Pencils. Proceedings, Pite Havsbad, 1982, volume
973 of Lecture Notes in Mathematics, pages 207–220. Springer-Verlag, Berlin, 1983. (Cited on
p. 128.)
[1022] G. W. Stewart. On the asymptotic behavior of scaled singular value and QR decompositions. Math.
Comp., 43:483–489, 1984. (Cited on p. 132.)
[1023] G. W. Stewart. Rank degeneracy. SIAM J. Sci. Statist. Comput., 5:403–413, 1984. (Cited on pp. 77,
83.)
[1024] G. W. Stewart. Determining rank in the presence of errors. In Mark S. Moonen, Gene H. Golub,
and Bart L. M. De Moor, editors, Large Scale and Real-Time Applications, pages 275–292. Kluwer
Academic Publishers, Dordrecht, 1992. (Cited on p. 80.)
[1025] G. W. Stewart. An updating algorithm for subspace tracking. IEEE Trans. Signal Process.,
40:1535–1541, 1992. (Cited on pp. 78, 153.)
[1026] G. W. Stewart. On the early history of the singular value decomposition. SIAM Rev., 35:551–566,
1993. (Cited on pp. 13, 79.)
[1027] G. W. Stewart. Updating a rank-revealing ULV decomposition. SIAM J. Matrix Anal. Appl., 14:494–
499, 1993. (Cited on pp. 153, 154, 154.)
[1028] G. W. Stewart. Gauss, statistics, and Gaussian elimination. J. Comput. Graphical Statistics, 4:1–11,
1995. (Cited on p. 39.)
[1029] G. W. Stewart. On the stability of sequential updates and downdates. IEEE Trans. Signal Process.,
43:1643–1648, 1995. (Cited on pp. 8, 149.)
[1030] G. W. Stewart. Matrix Algorithms Volume I: Basic Decompositions. SIAM, Philadelphia, 1998.
(Cited on p. 39.)
[1031] G. W. Stewart. Block Gram–Schmidt orthogonalization. SIAM J. Sci. Comput., 31:761–775, 2008.
(Cited on p. 109.)
[1032] G. W. Stewart. On the numerical analysis of oblique projectors. SIAM J. Matrix Anal. Appl.,
32:309–348, 2011. (Cited on pp. 119, 119.)
[1033] G. W. Stewart and Ji-guang Sun. Matrix Perturbation Theory. Academic Press, New York, 1990.
(Cited on pp. 21, 25, 31.)
[1034] Michael Stewart and Paul Van Dooren. Updating a generalized URV decomposition. SIAM J.
Matrix Anal. Appl., 22:479–500, 2000. (Cited on p. 155.)
[1035] Eduard Stiefel. Ausgleichung ohne Aufstellung der Gaußschen Normalgleichungen. Wiss. Z. Tech.
Hochsch. Dresden, 2:441–442, 1952/53. (Cited on p. 285.)
[1036] Eduard Stiefel. Über diskrete und lineare Tschebyscheff-Approximation. Numer. Math., 1:1–28,
1959. (Cited on p. 422.)
[1037] S. M. Stigler. An attack on Gauss, published by Legendre in 1820. Hist. Math., 4:31–35, 1977.
(Cited on p. 2.)
[1038] S. M. Stigler. Gauss and the invention of least squares. Ann. Statist., 9:465–474, 1981. (Cited on
pp. 2, 2.)
[1039] Joseph Stoer. On the numerical solution of constrained least-squares problems. SIAM J. Numer.
Anal., 8:382–411, 1971. (Cited on p. 165.)
[1040] Zdeněk Strakoš and Petr Tichý. On error estimation in the conjugate gradient method and why it
works in finite precision computations. ETNA, 13:56–80, 2002. (Cited on p. 299.)
[1041] O. N. Strand. Theory and methods related to the singular-function expansion and Landweber’s
iteration for integral equations of the first kind. SIAM J. Numer. Anal., 11:798–825, 1974. (Cited
on p. 326.)
[1042] Gilbert Strang. A proposal for Toeplitz matrix computations. Stud. Appl. Math., 74:171–176, 1986.
(Cited on p. 324.)
[1043] Gilbert Strang. A framework for equilibrium equations. SIAM Rev., 30:283–297, 1988. (Cited on
p. 116.)
[1044] Gilbert Strang. The discrete cosine transform. SIAM Rev., 41:135–147, 1999. (Cited on p. 237.)
[1045] Rolf Strebel, David Sourlier, and Walter Gander. A comparison of orthogonal least squares fitting
in coordinate metrology. In Sabine Van Huffel, editor, Proceedings of the Second International
Workshop on Total Least Squares and Errors-in-Variables Modeling, Leuven, Belgium, August 21–
24, 1996, pages 249–258. SIAM, Philadelphia, 1997. (Cited on p. 416.)
[1046] Chunguang Sun. Parallel sparse orthogonal factorization on distributed-memory multiprocessors.
SIAM J. Sci. Comput., 17:666–685, 1996. (Cited on p. 258.)
[1047] Ji-guang Sun. Perturbation theorems for generalized singular values. J. Comput. Math., 1:233–242,
1983. (Cited on p. 125.)
[1048] Ji-guang Sun. Perturbation bounds for the Cholesky and QR factorizations. BIT Numer. Math.,
31:341–352, 1991. (Cited on p. 54.)
[1049] Ji-guang Sun. Perturbation analysis of the Cholesky downdating and QR updating problems. SIAM
J. Matrix Anal. Appl., 16:760–775, 1995. (Cited on pp. 146, 148.)
[1050] Ji-guang Sun. Optimal backward perturbation bounds for the linear least-squares problem with
multiple right-hand sides. IMA J. Numer. Anal., 16:1–11, 1996. (Cited on p. 99.)
[1051] Ji-guang Sun and Zheng Sun. Optimal backward perturbation bounds for underdetermined systems.
SIAM J. Matrix Anal. Appl., 18:393–402, 1997. (Cited on p. 99.)
[1052] Brian D. Sutton. Computing the complete CS decomposition. Numer. Algorithms, 50:33–65, 2009.
(Cited on p. 19.)
[1053] D. R. Sweet. Fast Toeplitz orthogonalization. Numer. Math., 43:1–21, 1984. (Cited on p. 241.)
[1054] Katarzyna Świrydowicz, Julien Langou, Shreyas Ananthan, Ulrike Yang, and Stephen Thomas.
Low synchronization Gram–Schmidt and generalized minimum residual algorithms. Numer. Linear
Algebra Appl., 28:1–20, 2020. (Cited on p. 109.)
[1055] Daniel B. Szyld. The many proofs of an identity on the norm of an oblique projection. Numer.
Algorithms, 42:309–323, 2006. (Cited on p. 119.)
[1056] Kunio Tanabe. Projection method for solving a singular system of linear equations and its applica-
tions. Numer. Math., 17:203–214, 1971. (Cited on p. 270.)
[1057] Robert Tarjan. Depth-first search and linear graph algorithms. SIAM J. Comput., 1:146–160, 1972.
(Cited on p. 264.)
[1058] R. P. Tewarson. A computational method for evaluating generalized inverses. Comput. J., 10:411–
413, 1968. (Cited on pp. 88, 88.)
[1059] Stephen J. Thomas and R. V. M. Zahar. Efficient orthogonalization in the M-norm. Congr. Numer.,
80:23–32, 1991. (Cited on p. 122.)
[1060] Stephen J. Thomas and R. V. M. Zahar. An analysis of orthogonalization in elliptic norms. Congr.
Numer., 86:193–222, 1992. (Cited on p. 122.)
[1061] Robert Tibshirani. Regression shrinkage and selection via the LASSO. Royal Statist. Soc. B,
58:267–288, 1996. (Cited on p. 425.)
[1062] A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-Posed Problems. Winston, Washington, D.C.,
1977. (Cited on p. 175.)
[1063] Andrei N. Tikhonov. Solution of incorrectly formulated problems and the regularization method.
Soviet Math. Dokl., 4:1035–1038, 1963. (Cited on p. 175.)
[1064] W. F. Tinney and J. W. Walker. Direct solution of sparse network equations by optimally ordered
triangular factorization. Proc. IEEE, 55:1801–1809, 1967. (Cited on p. 251.)
[1065] M. Tismenetsky. A new preconditioning technique for solving large sparse linear systems. Linear
Algebra Appl., 154/156:331–353, 1991. (Cited on p. 310.)
[1066] Ph. L. Toint. On large scale nonlinear least squares calculations. SIAM J. Sci. Statist. Comput.,
8:416–435, 1987. (Cited on p. 400.)
[1067] Philippe L. Toint. VE10AD a Routine for Large-Scale Nonlinear Least Squares. Harwell Subroutine
Library, AERE Harwell, Oxfordshire, UK, 1987. (Cited on p. 400.)
[1068] Lloyd N. Trefethen and David Bau, III. Numerical Linear Algebra. SIAM, Philadelphia, 1997.
(Cited on p. 60.)
[1069] Michael J. Tsatsomeros. Principal pivot transforms. Linear Algebra Appl., 307:151–165, 2000.
(Cited on p. 136.)
[1070] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31:279–311,
1966. (Cited on pp. 216, 217.)
[1071] Madeleine Udell and Alex Townsend. Why are big data matrices approximately low rank? SIAM
J. Math. Data Sci., 1:144–160, 2019. (Cited on p. 24.)
[1072] M. H. van Benthem and M. R. Keenan. A fast non-negativity-constrained least squares algorithm.
J. Chemometrics, 18:441–450, 2004. (Cited on p. 168.)
[1073] A. van der Sluis. Stability of the solutions of linear least squares problems. Numer. Math., 23:241–
254, 1975. (Cited on p. 27.)
[1074] A. van der Sluis and G. Veltkamp. Restoring rank and consistency by orthogonal projection. Linear
Algebra Appl., 28:257–278, 1979. (Cited on p. 26.)
[1075] Henk A. van der Vorst. Iterative Krylov Methods for Large Linear Systems. Number 13 in Cam-
bridge Monographs on Applied and Computational Mathematics. Cambridge University Press,
Cambridge, UK, 2003. (Cited on pp. 269, 285.)
[1076] Sabine Van Huffel, Haesun Park, and J. Ben Rosen. Formulation and solution of structured total
least norm problems for parameter estimation. IEEE Trans. Signal Process., 44:2464–2474, 1996.
(Cited on p. 227.)
[1077] Sabine Van Huffel and Joos Vandewalle. The Total Least Squares Problem: Computational Aspects
and Analysis. SIAM, Philadelphia, 1991. (Cited on pp. 218, 220, 222, 223, 223.)
[1078] Sabine Van Huffel, Joos Vandewalle, and Ann Haegemans. An efficient and reliable algorithm for
computing the singular subspace of a matrix associated with its smallest singular values. J. Comput.
Appl. Math., 19:313–330, 1987. (Cited on p. 223.)
[1079] Charles F. Van Loan. Generalizing the singular value decomposition. SIAM J. Numer. Anal., 13:76–
83, 1976. (Cited on pp. 124, 125.)
[1080] Charles F. Van Loan. Computing the CS and the generalized singular value decomposition. Numer.
Math., 46:479–492, 1985. (Cited on pp. 19, 128.)
[1081] Charles Van Loan. On the method of weighting for equality-constrained least squares. SIAM J.
Numer. Anal., 22:851–864, 1985. (Cited on pp. 130, 159.)
[1082] Charles Van Loan. Computational Frameworks for the Fast Fourier Transform, volume 10 of Frontiers
in Applied Math. SIAM, Philadelphia, 1992. (Cited on p. 237.)
[1083] Charles F. Van Loan. The ubiquitous Kronecker product. J. Comput. Appl. Math., 123:85–100,
2000. (Cited on p. 210.)
[1084] Field G. Van Zee, Robert A. van de Geijn, and Gregorio Quintana-Ortí. Restructuring the tridiago-
nal and bidiagonal QR algorithms for performance. ACM Trans. Math. Softw., 40:18:1–18:34,
2014. (Cited on p. 360.)
[1085] Robert J. Vanderbei. Symmetric quasidefinite matrices. SIAM J. Optim., 5:100–113, 1995. (Cited
on p. 329.)
[1086] J. M. Varah. On the numerical solution of ill-conditioned linear systems with application to ill-
posed problems. SIAM J. Numer. Anal., 10:257–267, 1973. (Cited on p. 172.)
[1087] J. M. Varah. A practical examination of some numerical methods for linear discrete ill-posed
problems. SIAM Rev., 21:100–111, 1979. (Cited on p. 181.)
[1088] J. M. Varah. Pitfalls in the numerical solution of linear ill-posed problems. SIAM J. Sci. Statist.
Comput., 4:164–176, 1983. (Cited on p. 178.)
[1089] James M. Varah. Least squares data fitting with implicit functions. BIT Numer. Math., 36:842–854,
1996. (Cited on pp. 412, 413.)
[1090] Richard S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, 1962. (Cited on
pp. 269, 276.)
[1091] Stephen A. Vavasis. Stable numerical algorithms for equilibrium systems. SIAM J. Matrix Anal.
Appl., 15:1108–1131, 1994. (Cited on p. 133.)
[1092] Stephen A. Vavasis. On the complexity of nonnegative matrix factorization. SIAM J. Optim.,
20:1364–1377, 2009. (Cited on p. 419.)
[1093] Vincenzo Esposito Vinzi, Wynne W. Chin, Jörg Henseler, and Huiwen Wang, editors. Handbook of
Partial Least Squares. Springer, New York, 2010. (Cited on p. 200.)
[1094] John von Neumann. Some matrix-inequalities and metrization of matrix-space. Tomsk Univ. Rev.,
1:286–300, 1937. (Cited on p. 21.)
[1095] A. J. Wathen. Preconditioning. Acta Numer., 24:323–376, 2015. (Cited on p. 287.)
[1096] Bertil Waldén, Rune Karlsson, and Ji-guang Sun. Optimal backward perturbation bounds for the
linear least squares problem. Numer. Linear Algebra Appl., 2:271–286, 1995. (Cited on p. 98.)
[1097] Homer F. Walker. Implementation of the GMRES method using Householder transformations.
SIAM J. Sci. Statist. Comput., 9:152–163, 1988. (Cited on pp. 109, 302.)
[1098] R. H. Wampler. A report on the accuracy of some widely used least squares computer programs. J.
Amer. Statist. Assoc., 65:549–565, 1970. (Cited on p. 103.)
[1099] R. H. Wampler. Solutions to weighted least squares problems by modified Gram–Schmidt with
iterative refinement. ACM Trans. Math. Softw., 5:457–465, 1979. (Cited on p. 103.)
[1100] J. Wang, Q. Zhang, and Lennart Ljung. Revisiting Hammerstein system identification through the
two-stage algorithm for bilinear parameter estimation. Automatica, 45:2627–2633, 2009. (Cited
on p. 405.)
[1101] Xiaoge Wang. Incomplete Factorization Preconditioning for Least Squares Problems. Ph.D. thesis,
Department of Mathematics, University of Illinois at Urbana-Champaign, Urbana, IL, 1993. (Cited
on p. 312.)
[1102] Xiaoge Wang, Kyle A. Gallivan, and Randall Bramley. CIMGS: An incomplete orthogonal factor-
ization preconditioner. SIAM J. Sci. Comput., 18:516–536, 1997. (Cited on p. 312.)
[1103] David S. Watkins. Understanding the QR algorithm. SIAM Rev., 24:427–440, 1982. (Cited on
p. 368.)
[1104] David S. Watkins. Francis’s algorithm. Amer. Math. Monthly, 118:387–403, 2011. (Cited on
p. 348.)
[1105] Joseph Henry Maclagan Wedderburn. Lectures on Matrices. Dover Publications, Inc., New York,
1964. Unabridged and unaltered republication of the work first published by the American Math-
ematical Society, New York, 1934 as volume XVII in their Colloquium Publications. (Cited on
p. 139.)
[1106] Per-Åke Wedin. On Pseudo-Inverses of Perturbed Matrices. Tech. Report, Department of Computer
Science, Lund University, Sweden, 1969. (Cited on p. 26.)
[1107] Per-Åke Wedin. Perturbation bounds in connection with the singular value decomposition. BIT
Numer. Math., 12:99–111, 1972. (Cited on pp. 22, 393.)
[1108] Per-Åke Wedin. Perturbation theory for pseudo-inverses. BIT Numer. Math., 13:217–232, 1973.
(Cited on pp. 26, 27, 28.)
[1109] Per-Åke Wedin. On the Gauss-Newton Method for the Nonlinear Least Squares Problems. Working
Paper 24, Institute for Applied Mathematics, Stockholm, Sweden, 1974. (Cited on p. 394.)
[1110] Per-Åke Wedin. Perturbation Theory and Condition Numbers for Generalized and Constrained
Linear Least Squares Problems. Tech. Report UMINF–125.85, Institute of Information Processing,
University of Umeå, Sweden, 1985. (Cited on pp. 119, 119, 128, 160.)
[1111] Musheng Wei. Algebraic relations between the total least squares and least squares problems with
more than one solution. Numer. Math., 62:123–148, 1992. (Cited on p. 226.)
[1112] Musheng Wei. The analysis for the total least squares problem with more than one solution. SIAM
J. Matrix Anal. Appl., 13:746–763, 1992. (Cited on p. 222.)
[1113] Musheng Wei. Perturbation theory for the rank-deficient equality constrained least squares problem.
SIAM J. Numer. Anal., 29:1462–1481, 1992. (Cited on p. 160.)
[1114] Yimin Wei and Weiyang Ding. Theory and Computation of Tensors: Multi-Dimensional Arrays.
Academic Press, New York, 2016. (Cited on p. 218.)
[1115] Yimin Wei, Pengpeng Xie, and Liping Zhang. Tikhonov regularization and randomized GSVD.
SIAM J. Matrix Anal. Appl., 37:649–675, 2016. (Cited on p. 334.)
[1116] P. R. Weil and P. C. Kettler. Rearranging matrices to block-angular form for decomposition (and
other) algorithms. Management Sci., 18:98–108, 1971. (Cited on p. 209.)
[1117] Helmut Wielandt. Das Iterationsverfahren bei nicht selbstadjungierten linearen Eigenwertaufgaben.
Math. Z., 50:93–143, 1944. (Cited on p. 365.)
[1118] James H. Wilkinson. Rounding Errors in Algebraic Processes. Prentice-Hall, Englewood Cliffs,
NJ, 1963. (Cited on p. 101.)
[1119] James H. Wilkinson. The Algebraic Eigenvalue Problem. Clarendon Press, Oxford, UK, 1965.
(Cited on p. 47.)
[1120] James H. Wilkinson. Error analysis of transformations based on the use of matrices of the form
I − 2xx^H. In L. B. Rall, editor, Error in Digital Computation, pages 77–101. John Wiley, New
York, 1965. (Cited on pp. 34, 51, 350.)
[1121] James H. Wilkinson. A priori error analysis of algebraic processes. In Proc. Internat. Congr. Math.
(Moscow, 1966), Izdat. “Mir”, Moscow, 1968, pp. 629–640. (Cited on pp. 43, 345.)
[1122] James H. Wilkinson. Modern error analysis. SIAM Rev., 13:548–568, 1971. (Cited on p. 62.)
[1123] James H. Wilkinson and C. Reinsch, editors. Handbook for Automatic Computation. Volume II: Linear
Algebra. Springer, Berlin, 1971. (Cited on pp. 56, 113, 184, 478.)
[1124] Paul R. Willems and Bruno Lang. The MR³-GK algorithm for the bidiagonal SVD. ETNA, 39:1–21,
2012. (Cited on p. 352.)
[1125] Paul R. Willems, Bruno Lang, and Christof Vömel. Computing the bidiagonal SVD using multiple
relatively robust representations. SIAM J. Matrix Anal. Appl., 28:907–926, 2006. (Cited on p. 352.)
[1126] T. J. Willmore. An Introduction to Differential Geometry. Clarendon Press, Oxford, UK, 1959.
(Cited on p. 394.)
[1127] Herman Wold. Estimation of principal components and related models by iterative least squares. In
P. R. Krishnaiah, editor, Multivariate Analysis, pages 391–420. Academic Press, New York, 1966.
(Cited on p. 200.)
[1128] S. Wold, A. Ruhe, H. Wold, and W. J. Dunn. The collinearity problem in linear regression. The
partial least squares (PLS) approach to generalized inverses. SIAM J. Sci. Statist. Comput., 5:735–
743, 1984. (Cited on p. 200.)
[1129] Svante Wold, Michael Sjöström, and Lennart Eriksson. PLS-regression: A basic tool of chemomet-
rics. Chemom. Intell. Lab. Syst., 58:109–130, 2001. (Cited on p. 203.)
[1130] Y. K. Wong. An application of orthogonalization process to the theory of least squares. Ann. Math.
Statist., 6:53–75, 1935. (Cited on p. 64.)
[1131] Max A. Woodbury. Inverting Modified Matrices. Memorandum Report 42, Statistical Research
Group, Princeton, 1950. (Cited on p. 138.)
[1132] Stephen J. Wright. Stability of linear equation solvers in interior-point methods. SIAM J. Matrix
Anal. Appl., 16:1287–1307, 1995. (Cited on p. 129.)
[1133] Stephen J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, 1997. (Cited on
p. 419.)
[1134] Pengpeng Xie, Hua Xiang, and Yimin Wei. A contribution to perturbation analysis for total least
squares problems. Numer. Algorithms, 75:381–395, 2017. (Cited on p. 226.)
[1135] Andrew E. Yagle. Non-iterative Reweighted-Norm Least-Squares Local ℓ0 Minimization for Sparse
Solution to Underdetermined Linear Systems of Equations. Tech. Report Preprint, Department of
EECS, The University of Michigan, Ann Arbor, 2008. (Cited on p. 428.)
[1136] Yusaku Yamamoto, Yuji Nakatsukasa, Yuka Yanagisawa, and Takeshi Fukaya. Roundoff error
analysis of the Cholesky QR2 algorithm. ETNA, 44:306–326, 2015. (Cited on p. 213.)
[1137] Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra. Mixed-precision Cholesky QR factoriza-
tion and its case studies on multicore CPU with multiple GPUs. SIAM J. Sci. Comput., 37:C307–
C330, 2015. (Cited on p. 213.)
[1138] L. Minah Yang, Alyson Fox, and Geoffrey Sanders. Rounding error analysis of mixed precision
block Householder QR algorithm. ETNA, 44:306–326, 2020. (Cited on p. 109.)
[1139] Sencer Nuri Yeralan, Timothy A. Davis, Wissam M. Sid-Lakhdar, and Sanjay Ranka. Algorithm
980: Sparse QR factorization on the GPU. ACM Trans. Math. Softw., 44:Article 17, 2017. (Cited
on p. 112.)
[1140] K. Yoo and Haesun Park. Accurate downdating of a modified Gram–Schmidt QR decomposition.
BIT Numer. Math., 36:166–181, 1996. (Cited on pp. 151, 152.)
[1141] David M. Young. Iterative Solution of Large Linear Systems. Dover, Mineola, NY, 2003.
Unabridged republication of the work first published by Academic Press, New York-London, 1971.
(Cited on pp. 270, 275, 277, 278.)
[1142] Jin Yun Yuan. Iterative Methods for the Generalized Least Squares Problem. Ph.D. thesis, Instituto
de Matemática Pura e Aplicada, Rio de Janeiro, Brazil, 1993. (Cited on p. 318.)
[1143] M. Zelen. Linear estimation and related topics. In John Todd, editor, Survey of Numerical Analysis,
pages 558–584. McGraw-Hill, New York, 1962. (Cited on p. 4.)
[1144] Hongyuan Zha. A two-way chasing scheme for reducing a symmetric arrowhead matrix to tridiag-
onal form. J. Num. Linear Algebra Appl., 1:49–57, 1992. (Cited on p. 362.)
[1145] Hongyuan Zha. Computing the generalized singular values/vectors of large sparse or structured
matrix pairs. Numer. Math., 72:391–417, 1996. (Cited on p. 333.)
[1146] Hongyuan Zha and Horst D. Simon. On updating problems in latent semantic indexing. SIAM J.
Sci. Comput., 21:782–791, 1999. (Cited on p. 360.)
[1147] Shaoshuai Zhang and Panruo Wu. High Accuracy Low Precision QR Factorization and Least
Squares Solver on GPU with TensorCore. Preprint. https://arxiv.org/abs/1912.05508,
2019. (Cited on p. 311.)
[1148] Zhenyue Zhang, Hongyuan Zha, and Wenlong Ying. Fast parallelizable methods for computing
invariant subspaces of Hermitian matrices. J. Comput. Math., 25:583–594, 2007. (Cited on p. 382.)
[1149] Liangmin Zhou, Lijing Lin, Yimin Wei, and Sangzheng Qiao. Perturbation analysis and condition
number of scaled total least squares problems. Numer. Algorithms, 51:381–399, 2009. (Cited on
p. 226.)
[1150] Ciyou Zhu, Richard H. Byrd, Peihuang Lu, and Jorge Nocedal. Algorithm 778: L-BFGS-B: Fortran
subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw., 23:550–560,
1997. (Cited on p. 420.)
[1151] Zahari Zlatev. Comparison of two pivotal strategies in sparse plane rotations. Comput. Math. Appl.,
8:119–135, 1982. (Cited on p. 255.)
[1152] Zahari Zlatev and H. Nielsen. LLSS01—A Fortran Subroutine for Solving Least Squares Prob-
lems (User’s Guide). Tech. Report 79-07, Institute of Numerical Analysis, Technical University of
Denmark, Lyngby, Denmark, 1979. (Cited on p. 258.)
[1153] Zahari Zlatev and Hans Bruun Nielsen. Solving large and sparse linear least-squares problems by
conjugate gradient algorithms. Comput. Math. Appl., 15:185–202, 1988. (Cited on p. 313.)
[1154] E. I. Zolotarev. Application of elliptic functions to questions of functions deviating least and most
from zero. Zap. Imp. Akad. Nauk St. Petersburg, 30, 1877. Reprinted in his Collected Works, Vol.
II, Izdat. Akad. Nauk SSSR, Moscow, 1932, pp. 1–59 (in Russian). (Cited on p. 382.)