
Gaussian Elimination

Higham, Nicholas J.

2011

MIMS EPrint: 2011.16

Manchester Institute for Mathematical Sciences


School of Mathematics

The University of Manchester

Reports available from: http://eprints.maths.manchester.ac.uk/


And by contacting: The MIMS Secretary
School of Mathematics
The University of Manchester
Manchester, M13 9PL, UK

ISSN 1749-9097
Article type: Advanced Review

Gaussian Elimination¹

Nicholas J. Higham

School of Mathematics, The University of Manchester, Manchester, M13 9PL, UK
([email protected], http://www.ma.man.ac.uk/~higham)

¹To appear in Wiley Interdisciplinary Reviews: Computational Statistics. Version of 31-1-11.

Abstract
As the standard method for solving systems of linear equations, Gaussian elimination (GE) is one of the most important and ubiquitous numerical algorithms. However, its successful use relies on understanding its numerical stability properties and how to organize its computations for efficient execution on modern computers. We give an overview of GE, ranging from theory to computation. We explain why GE computes an LU factorization and the various benefits of this matrix factorization viewpoint. Pivoting strategies for ensuring numerical stability are described. Special properties of GE for certain classes of structured matrices are summarized. How to implement GE in a way that efficiently exploits the hierarchical memories of modern computers is discussed. We also describe block LU factorization, corresponding to the use of pivot blocks instead of pivot elements, and explain how iterative refinement can be used to improve a solution computed by GE. Other topics are GE for sparse matrices and the role GE plays in the TOP500 ranking of the world's fastest computers.

Keywords
Gaussian elimination, LU factorization, pivoting, numerical stability, iterative refinement

Introduction

Gaussian elimination (GE) is the standard method for solving a system of linear equations. As such, it is one of the most ubiquitous numerical algorithms and plays a fundamental role in scientific computation.

GE was known to the ancient Chinese [Lay-Yong and Kangshen, 1989] and is familiar to many school children as the intuitively natural method of eliminating variables from linear equations. Gauss used it in the context of the linear least squares problem [Gauss, 1995], [Grcar, 2010], [Stewart, 1995]. Undergraduates learn the method in linear algebra courses, where it is usually taught in conjunction with reduction to echelon form. In this linear algebra context GE is shown to be a tool for obtaining all solutions to a linear system, for computing the determinant, and for deducing the rank of the coefficient matrix. However, there is much more to GE from the point of view of matrix analysis and matrix computations.

In this article we survey the many facets of GE that are relevant to computation—in statistics or in other contexts. We begin in the next section by summarizing GE and its basic linear algebraic properties, including conditions for its success and the key interpretation of the elimination as LU factorization. Then we turn to the numerical properties of LU factorization and discuss pivoting strategies for ensuring numerical stability. In the section "Structured Matrices" we describe some special results that hold for LU factorization when the matrix has particular properties. Computer implementation is then discussed, as well as a version of GE that uses block pivots. Iterative refinement—a means for improving the quality of a computed solution—is also described.

We will need the following notation. The unit roundoff (or machine precision) is denoted by u; in IEEE double precision arithmetic it has the value u = 2^{-53} ≈ 1.1 × 10^{-16}. We write fl(A) for the result of rounding the elements of A to floating point numbers. The ith unit vector e_i is the vector that is zero except for a 1 in the ith element. The notation 1:n denotes the vector [1, 2, ..., n], while n:-1:1 denotes the vector [n, n-1, ..., 1]. A(i, j), with i and j vectors of indices, denotes the submatrix of A comprising the intersection of the rows specified by i and the columns specified by j. ‖A‖ denotes any subordinate matrix norm, and we sometimes use the ∞-norm, given for A ∈ R^{n×n} by the formula ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_{ij}|.

LU Factorization

The aim of GE is to reduce a full system of n linear equations in n unknowns to triangular form using elementary row operations, thereby reducing a problem that we can't solve to one that we can. There are n-1 stages, beginning with A^{(1)} = A ∈ R^{n×n}, b^{(1)} = b, and finishing with the upper triangular system A^{(n)}x = b^{(n)}. At the start of the kth stage we have converted the original system to A^{(k)}x = b^{(k)}, where

                 k-1          n-k+1
    A^{(k)} = [ A_{11}^{(k)}  A_{12}^{(k)} ]  k-1
              [ 0             A_{22}^{(k)} ]  n-k+1,                    (1)

with A_{11}^{(k)} upper triangular. The kth stage of the elimination zeros the elements below the pivot element a_{kk}^{(k)} in the kth column of A^{(k)} according to the operations

    a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik} a_{kj}^{(k)},   i, j = k+1:n,    (2a)
    b_i^{(k+1)}    = b_i^{(k)}    - m_{ik} b_k^{(k)},      i = k+1:n,       (2b)

where the quantities

    m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)},   i = k+1:n

are called the multipliers and a_{kk}^{(k)} is called the pivot. At the end of the (n-1)st stage we have the upper triangular system Ux ≡ A^{(n)}x = b^{(n)}, which is solved by back substitution. Back substitution for the upper triangular system Ux = b is the recurrence

    x_n = b_n / u_{nn},
    x_k = ( b_k - Σ_{j=k+1}^{n} u_{kj} x_j ) / u_{kk},   k = n-1:-1:1.

Much insight into GE is obtained by expressing it in matrix notation. We can write A^{(k+1)} = M_k A^{(k)}, where the Gauss transformation M_k = I - m_k e_k^T with m_k = [0, ..., 0, m_{k+1,k}, ..., m_{n,k}]^T. Overall,

    M_{n-1} M_{n-2} ... M_1 A = A^{(n)} =: U.

By using the fact that M_k^{-1} = I + m_k e_k^T it is easy to show that

    A = M_1^{-1} M_2^{-1} ... M_{n-1}^{-1} U
      = (I + m_1 e_1^T)(I + m_2 e_2^T) ... (I + m_{n-1} e_{n-1}^T) U
      = ( I + Σ_{i=1}^{n-1} m_i e_i^T ) U

        [ 1                                       ]
        [ m_{21}  1                               ]
      = [ m_{31}  m_{32}  1                       ]  U  =:  LU.
        [ ...     ...     ...   ...               ]
        [ m_{n1}  m_{n2}  ...   m_{n,n-1}   1     ]

The upshot is that GE computes an LU factorization A = LU (also called an LU decomposition), where L is unit lower triangular and U is upper triangular. The cost of the computation is (2/3)n^3 + O(n^2) flops, where a flop denotes a floating point addition, subtraction, multiplication or division. There is no difficulty in generalizing the LU factorization to rectangular matrices, though by far its most common use is for square matrices.

GE may fail with a division by zero during formation of the multipliers. The following theorem shows that this happens precisely when A has a singular leading principal submatrix of dimension less than n [Higham, 2002, Thm. 9.1]. We define A_k = A(1:k, 1:k).

Theorem 1 There exists a unique LU factorization of A ∈ R^{n×n} if and only if A_k is nonsingular for k = 1:n-1. If A_k is singular for some 1 ≤ k ≤ n-1 then the factorization may exist, but if so it is not unique.

The conditions of the theorem are in general difficult to check, but for some classes of structured matrices they can be shown always to hold; see the section "Structured Matrices".

The interpretation of GE as an LU factorization is very important, because it is well established that the matrix factorization viewpoint is a powerful paradigm for thinking and computing [Golub and Van Loan, 1996], [Stewart, 2000]. In particular, separating the computation of LU factorization from its application is beneficial. We give several examples. First, note that given A = LU we can write Ax = b_1 as LUx = b_1, or Lz = b_1 and Ux = z; thus x is obtained by solving two triangular systems. If we need to solve for another right-hand side b_2 we can just carry out the corresponding triangular solves, re-using the LU factorization—something that is not so obvious if we work with the GE equations (2) that mix up operations on A and b. Similarly, solving A^T y = c reduces to solving the triangular systems U^T z = c and L^T y = z using the available factors L and U. Another example is the computation of the scalar α = y^T A^{-1} x, which can be rewritten α = y^T U^{-1} · L^{-1} x (or α = y^T · U^{-1} L^{-1} x) and so again requires just two triangular solves and avoids the need to invert a matrix explicitly. Finally, note that if A = LU and A^{-1} = (b_{ij}) then

    u_{nn}^{-1} = e_n^T U^{-1} e_n = e_n^T U^{-1} L^{-1} e_n = e_n^T A^{-1} e_n = b_{nn}.

Thus the reciprocal of u_{nn} is an element of A^{-1}, and so we have the lower bound ‖A^{-1}‖ ≥ |u_{nn}^{-1}|, for all the standard matrix norms.
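This factor once, solve many times pattern is exactly what library interfaces expose. As a minimal sketch (ours, not code from the paper; the 5 × 5 random test data is arbitrary), SciPy's LAPACK-based lu_factor and lu_solve separate the O(n^3) factorization from the O(n^2) solves:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    b1, b2, c = rng.standard_normal((3, 5))

    lu, piv = lu_factor(A)               # one O(n^3) factorization: PA = LU

    x1 = lu_solve((lu, piv), b1)         # two O(n^2) triangular solves
    x2 = lu_solve((lu, piv), b2)         # another right-hand side, same factors
    y = lu_solve((lu, piv), c, trans=1)  # solves A^T y = c with the same factors

    print(np.allclose(A @ x1, b1), np.allclose(A.T @ y, c))

The trans=1 option re-uses the factors for the transposed system, as in the A^T y = c example above.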

A very useful representation of the first stage of GE is

            1     n-1
    A = [ a_{11}  a^T ]  1
        [ b       C   ]  n-1

      = [ 1        0       ] [ a_{11}  a^T                ]
        [ b/a_{11} I_{n-1} ] [ 0       C - b a^T / a_{11} ].

The matrix C - b a^T / a_{11} is the Schur complement of a_{11} in A. More generally, it can be shown that the matrix A_{22}^{(k)} in (1) can be expressed as A_{22}^{(k)} = A_{22} - A_{21} A_{11}^{-1} A_{12}, where A_{ij} ≡ A_{ij}^{(1)}; this is the Schur complement of A_{11} in A. Various structures of A can be shown to be inherited by the Schur complement (for example symmetric positive definiteness and diagonal dominance), and this enables the proof of several interesting results about the LU factors (including some of those in the section "Structured Matrices").

Explicit determinantal formulae exist for the elements of L and U (see, e.g., Householder [1964, p. 11]):

    l_{ij} = det( A([1:j-1, i], 1:j) ) / det(A_j),       i ≥ j,
    u_{ij} = det( A(1:i, [1:i-1, j]) ) / det(A_{i-1}),   i ≤ j.

Although elegant, these are of limited practical use.

Pivoting and Numerical Stability

In practical computation it is not just zero pivots that are unwelcome but also small pivots. The problem with small pivots is that they can lead to large multipliers m_{ik}. Indeed if m_{ik} is large then there is a possible loss of significance in the subtraction a_{ij}^{(k)} - m_{ik} a_{kj}^{(k)}, with low-order digits of a_{ij}^{(k)} being lost. Losing these digits could correspond to making a relatively large change to the original matrix A. The simplest example of this phenomenon is for the matrix A = [ε 1; 1 1], where we assume 0 < ε < u. GE produces

    [ ε  1 ]   [ 1    0 ] [ ε  1          ]
    [ 1  1 ] = [ 1/ε  1 ] [ 0  -1/ε + 1   ]  =  LU.

In floating point arithmetic the factors are approximated by

    fl(L) = [ 1    0 ]  =: L̂,       fl(U) = [ ε  1    ]  =: Û,
            [ 1/ε  1 ]                      [ 0  -1/ε ]

which would be the exact answer if we changed a_{22} from 1 to 0. Hence L̂Û = A + ΔA with ‖ΔA‖_∞ / ‖A‖_∞ = 1/2 ≫ u. This shows that for this matrix GE does not compute the LU factorization of a matrix close to A, which means that GE is behaving as an unstable algorithm.
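This failure is easy to reproduce in IEEE double precision. A small demonstration (ours; eps = 10^{-20} stands in for ε < u):

    import numpy as np

    eps = 1e-20                          # plays the role of epsilon, 0 < eps < u
    A = np.array([[eps, 1.0],
                  [1.0, 1.0]])
    b = A @ np.array([1.0, 1.0])         # exact solution is x = [1, 1]

    # one stage of GE without pivoting, in IEEE double precision
    m21 = A[1, 0] / A[0, 0]              # huge multiplier 1/eps
    u22 = A[1, 1] - m21 * A[0, 1]        # fl(1 - 1/eps) = -1/eps: the 1 is lost
    x2 = (b[1] - m21 * b[0]) / u22       # back substitution
    x1 = (b[0] - x2) / A[0, 0]
    print("no pivoting:  ", [x1, x2])    # x1 comes out 0.0 instead of 1.0

    print("with pivoting:", np.linalg.solve(A, b))   # LAPACK's pivoted GE: accurate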

Three different pivoting strategies are available that attempt to avoid instability. All three strategies ensure that the multipliers are nicely bounded: |m_{ik}| ≤ 1, i = k+1:n.

Partial pivoting. At the start of the kth stage, the kth and rth rows are interchanged, where

    |a_{rk}^{(k)}| := max_{k≤i≤n} |a_{ik}^{(k)}|.

Thus an element of maximal magnitude in the pivot column is selected as pivot.

Complete pivoting. At the start of the kth stage rows k and r and columns k and s are interchanged, where

    |a_{rs}^{(k)}| := max_{k≤i,j≤n} |a_{ij}^{(k)}|;

in other words, a pivot of maximal magnitude is chosen over the whole remaining submatrix.

Rook pivoting. At the start of the kth stage, rows k and r and columns k and s are interchanged, where

    |a_{rs}^{(k)}| = max_{k≤i≤n} |a_{is}^{(k)}| = max_{k≤j≤n} |a_{rj}^{(k)}|;

in other words, a pivot is chosen that is the largest in magnitude in both its column (as for partial pivoting) and its row. The pivot search is done by repeatedly looking down a column and across a row for the largest element in modulus; see Figure 1.

Partial pivoting requires O(n^2) comparisons in total. Complete pivoting requires O(n^3) comparisons, which is of the same order of magnitude as the arithmetic and so is a significant cost. The cost of rook pivoting is intermediate between the two and depends on the matrix.

The effect on the LU factorization of the row and column interchanges in these pivoting strategies can be captured in permutation matrices P and Q; it can be shown that PAQ = LU with a unit lower triangular L and upper triangular U (with Q = I for partial pivoting). In other words, the triangular factors are those that would be obtained if all the interchanges were done at the start of the elimination and GE without pivoting were used.

In order to assess the success of these pivoting strategies in improving numerical stability we need a backward error analysis result. Such a result expresses the effects of all the rounding errors committed during the computation as an equivalent perturbation on the original data. Since we can assume P = Q = I without loss of generality, the result is stated for GE without pivoting. This is the result of Wilkinson [1961] (which he originally proved for partial pivoting); for a modern proof see Higham [2002, Thm. 9.3, Lem. 9.6].

Theorem 2 Let A ∈ R^{n×n} and suppose GE produces a computed solution x̂ to Ax = b. Then

    (A + ΔA)x̂ = b,    ‖ΔA‖_∞ ≤ p(n) ρ_n u ‖A‖_∞,

where p(n) is a cubic polynomial and the growth factor

    ρ_n = max_{i,j,k} |a_{ij}^{(k)}| / max_{i,j} |a_{ij}|.

Ideally, we would like ‖ΔA‖_∞ ≤ u‖A‖_∞, which reflects the uncertainty caused simply by rounding the elements of A. The growth factor ρ_n ≥ 1 measures the growth of elements during the elimination. The cubic term p(n) arises from many triangle inequalities in the proof and is pessimistic; replacing it by its square root gives a more realistic bound, but this term is in any case outside our control. The message of the theorem is that GE will be backward stable if ρ_n is of order 1. A pivoting strategy should therefore aim to keep ρ_n small.

If no pivoting is done ρ_n can be arbitrarily large. For example, for the matrix A = [ε 1; 1 1] (0 < ε < u) mentioned at the start of this section, ρ_n = 1/ε - 1.

The maximum size of the growth factor for the three pivoting strategies has been the subject of much research. For partial pivoting, Wilkinson [1961] showed that ρ_n ≤ 2^{n-1} and that this bound is attainable. In practice, ρ_n is almost always of modest size (ρ_n ≤ 50, say), but a good understanding of this phenomenon is still lacking.

For complete pivoting a much smaller bound on the growth factor was derived by Wilkinson [1961]:

    ρ_n ≤ n^{1/2} (2 · 3^{1/2} · · · n^{1/(n-1)})^{1/2} ∼ c n^{1/2} n^{(1/4) log n}.

However, this bound usually significantly overestimates the size of ρ_n. Indeed for many years a conjecture that ρ_n ≤ n for complete pivoting (for real A) was open. This was finally resolved by Gould [1991] and Edelman [1992], who found an example with ρ_13 > 13. Research on certain aspects of the size of ρ_n for complete pivoting is ongoing [Kravvaritis and Mitrouli, 2009]. Interestingly, ρ_n ≥ n for any Hadamard matrix (a matrix of 1s and -1s with orthogonal columns) and any pivoting strategy [Higham and Higham, 1989]. For rook pivoting, the bound ρ_n ≤ 1.5 n^{(3/4) log n} was obtained by Foster [1997].

In addition to the backward error, the relative error ‖x - x̂‖ / ‖x‖ of the solution x̂ computed in floating point arithmetic is also of interest. A bound on the relative error is obtained by applying standard perturbation bounds for linear systems to Theorem 2 [Higham, 2002, Chap. 7]. A typical bound is

    ‖x - x̂‖_∞ / ‖x‖_∞ ≤ c_n u κ_∞(A) / ( 1 - c_n u κ_∞(A) ),

where κ_∞(A) = ‖A‖_∞ ‖A^{-1}‖_∞ is the matrix condition number with respect to inversion and c_n = p(n) ρ_n.
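The growth factor can be measured experimentally. The rough sketch below (ours; both test matrices are merely illustrative) runs the elimination with partial pivoting and tracks element growth, showing the modest growth typical of a random matrix alongside the classic worst-case matrix, for which ρ_n = 2^{n-1} despite the pivoting:

    import numpy as np

    def growth_factor_partial_pivoting(A):
        # run GE with partial pivoting, tracking max |a_ij^(k)| over all stages
        A = np.array(A, dtype=float)
        n = A.shape[0]
        max_orig = np.abs(A).max()
        max_seen = max_orig
        for k in range(n - 1):
            p = k + np.argmax(np.abs(A[k:, k]))      # pivot row for column k
            A[[k, p]] = A[[p, k]]                    # row interchange
            m = A[k+1:, k] / A[k, k]                 # multipliers, |m| <= 1
            A[k+1:, k:] -= np.outer(m, A[k, k:])     # update the active submatrix
            max_seen = max(max_seen, np.abs(A[k+1:, k+1:]).max())
        return max_seen / max_orig

    rng = np.random.default_rng(1)
    print(growth_factor_partial_pivoting(rng.standard_normal((100, 100))))
    # typically a modest value, far below the attainable bound 2^99

    # worst case: unit diagonal, -1s below it, 1s in the last column;
    # the last column doubles at every stage of the elimination
    n = 20
    W = np.tril(-np.ones((n, n)), -1) + np.eye(n)
    W[:, -1] = 1.0
    print(growth_factor_partial_pivoting(W))         # 2^(n-1) = 524288.0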

Structured Matrices

A great deal of research has been directed at specializing GE to take advantage of particular matrix structures and to proving properties of the LU factors and the growth factor. We give just a very brief overview. For further details of all these properties and results see Higham [2002].

GE without pivoting exploits symmetry, in that if A is symmetric then so are all the reduced matrices A_{22}^{(k)} in (1), but symmetry does not by itself guarantee the existence or numerical stability of the LU factorization. If A is also positive definite then GE succeeds (in light of Theorem 1) and the growth factor ρ_n = 1, so pivoting is not necessary. However, it is more common for symmetric positive definite matrices to use the Cholesky factorization A = RR^T, where R is upper triangular with positive diagonal elements [Higham, 2009]. For general symmetric indefinite matrices factorizations of the form PAP^T = LDL^T are used, where P is a permutation matrix, L is unit lower triangular, and D is block diagonal with diagonal blocks of dimension 1 or 2. Several pivoting strategies are available for choosing P, of which one is a symmetric form of rook pivoting.

If A is diagonally dominant by rows, that is,

    Σ_{j=1, j≠i}^{n} |a_{ij}| ≤ |a_{ii}|,   i = 1:n,

or A is diagonally dominant by columns (that is, A^T is diagonally dominant by rows) then it is safe not to use interchanges: the LU factorization without pivoting exists and the growth factor satisfies ρ_n ≤ 2.

If A has bandwidth p, that is, a_{ij} = 0 for |i - j| > p, then in an LU factorization L and U also have bandwidth p (l_{ij} = 0 for i > j + p and u_{ij} = 0 for j > i + p). With partial pivoting the bandwidth is not preserved, but it is nevertheless true that in PA = LU the upper triangular factor U has bandwidth 2p and L has at most p + 1 nonzeros per column; moreover, ρ_n ≤ 2^{2p-1} - (p-1)2^{p-2}. Tridiagonal matrices (p = 1) form an important special case.

A matrix is totally nonnegative if the determinant of every square submatrix is nonnegative. The Hilbert matrix (1/(i + j - 1)) is an example of such a matrix. If A is nonsingular and totally nonnegative then it has an LU factorization A = LU in which L and U are totally nonnegative, so that in particular L and U have nonnegative elements. Moreover, the growth factor ρ_n = 1. More importantly, for such matrices it can be shown that a much stronger componentwise form of Theorem 2 holds with |Δa_{ij}| ≤ c_n u |a_{ij}| for all i and j, where c_n ≈ 3n.

Algorithms

In principle, GE is computationally straightforward. It can be expressed in pseudocode as follows:

    for k = 1:n-1
        for i = k+1:n
            m_{ik} = a_{ik} / a_{kk}
        end
        for j = k:n
            for i = k+1:n
                a_{ij} = a_{ij} - m_{ik} a_{kj}
            end
        end
    end

Here, pivoting has been omitted, and at the end of the computation the upper triangle of A contains the upper triangular factor U and the elements of L are the m_{ij}. Incorporating partial pivoting, and forming the permutation matrix P such that PA = LU, is straightforward.

There are 3! ways of ordering the three nested loops of the update in this pseudocode, but not all are of equal efficiency for computer implementation. The kji ordering shown above forms the basis of early Fortran implementations of GE such as those in Forsythe and Moler [1967], Moler [1972], and the LINPACK package [Dongarra et al., 1979]—the inner loop traverses the columns of A, which matches the order in which Fortran stores the elements of two-dimensional arrays. The hierarchical computer memories prevalent from the 1980s onwards led to the need to modify implementations of GE in order to maintain good efficiency: now the loops must be broken up into pieces, leading to partitioned (or blocked) algorithms. For a given block size r > 1, we can derive a partitioned algorithm by writing

    [ A_11  A_12 ]   [ L_11  0       ] [ I_r  0 ] [ U_11  U_12    ]
    [ A_21  A_22 ] = [ L_21  I_{n-r} ] [ 0    S ] [ 0     I_{n-r} ],

where A_11, L_11, U_11 ∈ R^{r×r}. Ignoring pivoting, the idea is to compute an LU factorization A_11 = L_11 U_11 (by whatever means), solve the multiple right-hand side triangular systems L_21 U_11 = A_21 and L_11 U_12 = A_12 for L_21 and U_12 respectively, form S = A_22 - L_21 U_12, and apply the same process to S to obtain L_22 and U_22. The computations yielding L_21, U_12, and S are all matrix–matrix operations and can be carried out using level 3 BLAS [Dongarra et al., 1990a], [Dongarra et al., 1990b], for which highly optimized implementations are available for most machines. The optimal choice of the block size r depends on the particular computer architecture. It is important to realize that this partitioned algorithm is mathematically equivalent to any other variant of GE: it does the same operations in a different order, but one that reduces the amount of data movement among different levels of the computer memory hierarchy. In an attempt to extract even better performance, recursive algorithms of this form with r ≈ n/2 have also been developed [Gustavson, 1997], [Toledo, 1997].
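The partitioned algorithm can be sketched compactly. In the following illustration (ours; it ignores pivoting throughout, so it is safe only for matrices that need no interchanges, hence the diagonally dominant test matrix, and a simple unblocked kernel plus scipy.linalg.solve_triangular stand in for the tuned level 3 BLAS calls of a real implementation):

    import numpy as np
    from scipy.linalg import solve_triangular

    def lu_unblocked(A):
        # textbook GE without pivoting on a small block (assumes nonzero pivots)
        A = A.copy()
        n = A.shape[0]
        for k in range(n - 1):
            A[k+1:, k] /= A[k, k]                    # multipliers stored in place
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
        return np.tril(A, -1) + np.eye(n), np.triu(A)

    def blocked_lu(A, r=32):
        # partitioned LU as in the text: factor a diagonal block, do two
        # multiple right-hand side triangular solves, then a rank-r update
        A = np.array(A, dtype=float)
        n = A.shape[0]
        L, U = np.eye(n), np.zeros((n, n))
        for k in range(0, n, r):
            e = min(k + r, n)
            L11, U11 = lu_unblocked(A[k:e, k:e])
            L[k:e, k:e], U[k:e, k:e] = L11, U11
            if e < n:
                # L11 U12 = A12 and L21 U11 = A21 (level 3 BLAS in practice)
                U[k:e, e:] = solve_triangular(L11, A[k:e, e:],
                                              lower=True, unit_diagonal=True)
                L[e:, k:e] = solve_triangular(U11, A[e:, k:e].T, trans='T').T
                A[e:, e:] -= L[e:, k:e] @ U[k:e, e:]  # Schur complement S
        return L, U

    rng = np.random.default_rng(2)
    n = 200
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably diagonally dominant
    L, U = blocked_lu(A)
    print(np.allclose(L @ U, A))                      # True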

We mention two very active areas of current research in GE, and more generally in dense linear algebra computations, both of which are aiming to extend the capabilities of the state-of-the-art package LAPACK [Anderson et al., 1999] to shared memory computers based on multi-core processor architectures. The first is aimed at developing parallel algorithms that run efficiently on systems with multiple sockets of multicore processors. A key goal is to minimize the amount of communication between processors, since on such evolving architectures communication costs are increasingly significant relative to the costs of floating point arithmetic. A second area of research aims to exploit graphics processing units (GPUs) in conjunction with multicore processors. GPUs have the ability to perform floating point arithmetic at very high parallelism and are relatively inexpensive. Current projects addressing these areas include the PLASMA (http://icl.cs.utk.edu/plasma) and MAGMA (http://icl.cs.utk.edu/magma) projects. Representative papers are Buttari et al. [2009] and Tomov et al. [2010]. Further activity is concerned with algorithms for distributed memory machines, aiming to improve upon those in the ScaLAPACK library [Blackford et al., 1997]; see, for example, Grigori et al. [2008].

Block LU Factorization

At each stage of GE a pivot element is used to eliminate elements below the diagonal in the pivot column. This notion can be generalized to use a pivot block to eliminate all elements below that block. For example, consider the factorization

        [  0  1  1  1 ]
    A = [ -1  1  1  1 ]
        [ -2  3  4  2 ]
        [ -1  2  1  3 ]

        [ 1  0  0  0 ] [  0  1  1  1 ]
      = [ 0  1  0  0 ] [ -1  1  1  1 ]
        [ 1  2  1  0 ] [  0  0  1 -1 ]
        [ 1  1  0  1 ] [  0  0 -1  1 ]   ≡  L_1 U_1.

GE without pivoting fails on A because of the zero (1,1) pivot. The displayed factorization corresponds to using the leading 2 × 2 principal submatrix of A to eliminate the elements in the (3:4, 1:2) submatrix. In the context of a linear system Ax = b, we have effectively solved for the variables x_1 and x_2 in terms of x_3 and x_4 and then substituted for x_1 and x_2 in the last two equations. This is the key idea underlying block Gaussian elimination, or block LU factorization. In general, for a given partitioning A = (A_{ij})_{i,j=1}^{m} with the diagonal blocks A_{ii} square (but not necessarily all of the same dimension), a block LU factorization has the form

        [ I                            ] [ U_11  U_12  ...  U_1m       ]
    A = [ L_21  I                      ] [       U_22  ...  ...        ]
        [ ...   ...    ...             ] [              ...  U_{m-1,m} ]
        [ L_m1  ...    L_{m,m-1}    I  ] [                   U_mm      ]   ≡  LU,

where L and U are block triangular but U is not necessarily triangular. This is in general different from the usual LU factorization. A less restrictive analogue of Theorem 1 holds [Higham, 2002, Thm. 13.2].

Theorem 3 The matrix A = (A_{ij})_{i,j=1}^{m} ∈ R^{n×n} has a unique block LU factorization if and only if the first m-1 leading principal block submatrices of A are nonsingular.

The numerical stability of block LU factorization is less satisfactory than for the usual LU factorization. However, if A is diagonally dominant by columns, or block diagonally dominant by columns in the sense that

    ‖A_{jj}^{-1}‖^{-1} - Σ_{i=1, i≠j}^{m} ‖A_{ij}‖ ≥ 0,   j = 1:m,

then the factorization can be shown to be numerically stable [Higham, 2002, Chap. 13].

Block LU factorization is motivated by the desire to maximize efficiency on modern computers through the use of matrix–matrix operations. It has also been widely used for block tridiagonal matrices arising in the discretization of partial differential equations.

Iterative Refinement

Iterative refinement is a procedure for improving a computed solution x̂ to a linear system Ax = b—usually one computed by GE. The process repeats the three steps

    1. Compute r = b - Ax̂.
    2. Solve Ad = r.
    3. Update x̂ ← x̂ + d.

In the absence of rounding errors, x̂ is the exact solution to the system after one iteration of the three steps. In practice, rounding errors vitiate all three steps and the process is iterative. For x̂ computed by GE, the system Ad = r is solved using the LU factorization already computed, so each iteration requires only O(n^2) flops.

Iterative refinement was popular in the 1960s and 1970s, when it was implemented with the residual r computed at twice the working precision, which we call mixed precision iterative refinement. On some machines of that era it was possible to accumulate inner products in extra precision in hardware, making implementation of the process easy. From the 1980s onwards, computing extra precision residuals became problematic and this spurred research into fixed precision iterative refinement, where only one precision is used throughout. In the last few years mixed precision iterative refinement has come back into favour, because modern processors either have extra precision registers or can perform arithmetic in single precision much faster than in double precision, and also because standardized routines for extra precision computation are now available [Basic Linear Algebra Subprograms Technical (BLAST) Forum, 2002], [Baboulin et al., 2009], [Demmel et al., 2006].

The following theorem summarizes the benefits iterative refinement brings to the forward error [Higham, 2002, Sec. 12.1].

Theorem 4 Let iterative refinement be applied to the nonsingular linear system Ax = b in conjunction with GE with partial pivoting. Provided A is not too ill conditioned, iterative refinement reduces the forward error at each stage until it produces an x̂ for which

    ‖x - x̂‖_∞ / ‖x‖_∞  ≲  u                (mixed precision),
    ‖x - x̂‖_∞ / ‖x‖_∞  ≲  cond(A, x) u     (fixed precision),

where cond(A, x) = ‖ |A^{-1}| |A| |x| ‖_∞ / ‖x‖_∞.

This theorem tells only part of the story. Under suitable assumptions, iterative refinement leads to a small componentwise backward error, as first shown by Skeel [1980]—even for fixed precision refinement. For the definition of componentwise backward error and further details, see Higham [2002, Sec. 12.2].
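A bare-bones sketch of mixed precision refinement (ours; single precision plays the role of the working precision and double that of twice the working precision, and the fixed iteration count is a simplification of the convergence tests a real implementation would use):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def refine(A, b, steps=5):
        # factorize in single precision; compute residuals in double precision
        lu, piv = lu_factor(A.astype(np.float32))          # O(n^3) in the fast precision
        x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
        for _ in range(steps):
            r = b - A @ x                                  # double precision residual
            d = lu_solve((lu, piv), r.astype(np.float32))  # O(n^2) work per step
            x += d
        return x

    rng = np.random.default_rng(3)
    A = rng.standard_normal((300, 300))
    x_true = rng.standard_normal(300)
    b = A @ x_true
    x = refine(A, b)
    print(np.linalg.norm(x - x_true, np.inf) / np.linalg.norm(x_true, np.inf))
    # small: close to the double precision unit roundoff for this
    # well conditioned problem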


    [ 2  10   1   2   4   5 ]
    [ 1   5   2   3   5   6 ]
    [ 3   0   3   1   4   1 ]
    [ 2   2  14   2   1   0 ]
    [ 0   9   5   6   3   8 ]
    [ 1  13   3   4   0   1 ]

    [The dot markers of the original figure are not reproduced in this text version.]

Figure 1: Illustration of how rook pivoting searches for the first pivot for a particular 6×6 matrix (with the positive
integer entries shown). Each dot denotes a putative pivot that is tested to see if it is the largest in magnitude in
both its row and its column.
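The search that the figure illustrates takes only a few lines. A sketch (ours; indices are 0-based and relative to the active submatrix):

    import numpy as np

    def rook_pivot(A):
        # alternately scan a column, then a row, until an entry is maximal in both
        i, j = int(np.argmax(np.abs(A[:, 0]))), 0
        while True:
            jnew = int(np.argmax(np.abs(A[i, :])))     # best entry in row i
            if abs(A[i, jnew]) <= abs(A[i, j]):
                return i, j                            # maximal in row and column
            j = jnew
            inew = int(np.argmax(np.abs(A[:, j])))     # best entry in column j
            if abs(A[inew, j]) <= abs(A[i, j]):
                return i, j
            i = inew

    A = np.array([[2., 10.,  1., 2., 4., 5.],
                  [1.,  5.,  2., 3., 5., 6.],
                  [3.,  0.,  3., 1., 4., 1.],
                  [2.,  2., 14., 2., 1., 0.],
                  [0.,  9.,  5., 6., 3., 8.],
                  [1., 13.,  3., 4., 0., 1.]])
    print(rook_pivot(A))   # (5, 1): the entry 13, largest in its row and column

For this matrix the sketch walks through the entries 3, 4, 5, 6, 8 and 9 before settling on the pivot 13.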

Sparse Matrices [sidebar]

A matrix is sparse if it has a sufficiently large number of zero entries that it is worth taking advantage of them in
storing the matrix and in computing with it. When GE is applied to a sparse matrix it can produce fill-in, which
occurs when a zero entry becomes nonzero. Depending on the matrix there may be no fill-in (as for a tridiagonal
matrix), total fill-in (e.g., for a sparse matrix with a full first row and column), or something in-between. Various
techniques are available for re-ordering the rows and columns in order to reduce fill-in. Since numerical stability
is also an issue, these techniques must be combined with a strategy for ensuring that the pivots are sufficiently
large. Modern techniques allow sparse GE to be successfully applied to extremely large matrices. For an up-to-date
treatment that includes C codes see Davis [2006]. When the necessary memory or computation time for GE
to solve Ax = b becomes prohibitive we must resort to iterative methods, which typically require just the ability
to compute matrix–vector products with A (and possibly its transpose) [Saad, 2003].
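As a concrete illustration (ours; the interface shown is SciPy's wrapper of the SuperLU code, and the test matrix is arbitrary), a fill-reducing ordering is requested via permc_spec and the sparse factors are re-used across solves:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import splu

    n = 1000
    # an arbitrary sparse test matrix, shifted to be comfortably nonsingular
    A = (sp.random(n, n, density=0.002, random_state=0) + 4.0 * sp.eye(n)).tocsc()

    lu = splu(A, permc_spec="COLAMD")   # LU with a fill-reducing column ordering
    b = np.ones(n)
    x = lu.solve(b)                     # sparse triangular solves re-use the factors
    print(np.linalg.norm(A @ x - b, np.inf))
    print(lu.L.nnz + lu.U.nnz)          # nonzeros in the factors: a measure of fill-in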

TOP500 [sidebar]

The TOP500 list (http://www.top500.org) ranks the world's fastest computers by their performance on the
LINPACK benchmark [Dongarra et al., 2003], which solves a random linear system Ax = b by an implementation
of GE for parallel computers written in C and MPI. Performance is measured by the floating point execution rate
counted in floating point operations (flops) per second. The user is allowed to tune the code to obtain the best
performance, by varying parameters such as the dimension n, the block size, the processor grid size, and so on.
However, the computed solution x̂ must produce a small residual in order for the result to be valid, in the sense
that ‖b - Ax̂‖_∞ / (u ‖A‖_∞ ‖x̂‖_∞) is of order 1.
This benchmark has its origins in the LINPACK project [Dongarra et al., 1979], in which the performance of
contemporary machines was compared by running the LINPACK GE code dgefa on a 100 × 100 system.
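For illustration (ours), the validity test is a one-line check once a system has been solved:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 1000
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    x = np.linalg.solve(A, b)            # GE with partial pivoting (LAPACK)

    u = np.finfo(np.float64).eps / 2     # unit roundoff u = 2^-53
    res = np.linalg.norm(b - A @ x, np.inf) / (u * np.linalg.norm(A, np.inf)
                                               * np.linalg.norm(x, np.inf))
    print(res)                           # a modest number if the solve is valid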

Conclusion
GE with partial pivoting continues to be the standard numerical method for solving linear systems that
are not so large that considerations of computational cost or storage dictate the use of iterative methods.
The first computer program for GE with partial pivoting was probably that of Wilkinson [1948] (his code
implemented iterative refinement too). It is perhaps surprising that it is still not understood why the
numerical stability of this method is so good in practice, or equivalently why large element growth with
partial pivoting is not seen in practical computations.
This overview has omitted a number of GE-related topics, including

• row or column scaling (or equilibration),

• Gauss–Jordan elimination, in which at each stage of the elimination elements both above and
below the diagonal are eliminated, and which is principally used as a method for matrix inversion,
• variants of GE motivated by parallel computing, such as pairwise elimination, in which eliminations
are carried out between adjacent rows only,

• analyzing the extent to which (when computed in floating point arithmetic) an LU factorization
reveals the rank of A,
• the sensitivity of the LU factors to perturbations in A.

For more on these topics see Higham [2002] and the references therein.

References

E. Anderson, Z. Bai, C. H. Bischof, S. Blackford, J. W. Demmel, J. J. Dongarra, J. J. Du Croz, A. Greenbaum, S. J. Hammarling, A. McKenney, and D. C. Sorensen. LAPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, third edition, 1999. ISBN 0-89871-447-8.

M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, and S. Tomov. Accelerating scientific computations with mixed precision algorithms. Comput. Phys. Comm., 180(12):2526–2533, 2009.

Basic Linear Algebra Subprograms Technical (BLAST) Forum. Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997. ISBN 0-89871-397-8.

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput., 35:38–53, 2009.

T. A. Davis. Direct Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2006. ISBN 0-89871-613-6.

J. W. Demmel, Y. Hida, W. Kahan, X. S. Li, S. Mukherjee, and E. J. Riedy. Error bounds from extra precise iterative refinement. ACM Trans. Math. Software, 32(2):325–351, 2006.

J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart. LINPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1979. ISBN 0-89871-172-X.

J. J. Dongarra, J. J. Du Croz, I. S. Duff, and S. J. Hammarling. A set of Level 3 basic linear algebra subprograms. ACM Trans. Math. Software, 16:1–17, 1990a.

J. J. Dongarra, J. J. Du Croz, I. S. Duff, and S. J. Hammarling. Algorithm 679. A set of Level 3 basic linear algebra subprograms: Model implementation and test programs. ACM Trans. Math. Software, 16:18–28, 1990b.

J. J. Dongarra, P. Luszczek, and A. Petitet. The LINPACK benchmark: Past, present and future. Concurrency and Computation: Practice and Experience, 15:803–820, 2003.

A. Edelman. The complete pivoting conjecture for Gaussian elimination is false. The Mathematica Journal, 2:58–61, 1992.

G. E. Forsythe and C. B. Moler. Computer Solution of Linear Algebraic Systems. Prentice-Hall, Englewood Cliffs, NJ, USA, 1967.

L. V. Foster. The growth factor and efficiency of Gaussian elimination with rook pivoting. J. Comput. Appl. Math., 86:177–194, 1997. Corrigendum in J. Comput. Appl. Math., 98:177, 1998.

C. F. Gauss. Theory of the Combination of Observations Least Subject to Errors. Part One, Part Two, Supplement. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1995. ISBN 0-89871-347-1. Translated from the Latin originals (1821–1828) by G. W. Stewart.

G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, USA, third edition, 1996. ISBN 0-8018-5413-X (hardback), 0-8018-5414-8 (paperback).

N. I. M. Gould. On growth in Gaussian elimination with complete pivoting. SIAM J. Matrix Anal. Appl., 12(2):354–361, 1991.

J. F. Grcar. How ordinary elimination became Gaussian elimination. Historia Mathematica, 2010. In press. doi:10.1016/j.hm.2010.06.003.

L. Grigori, J. W. Demmel, and H. Xiang. Communication avoiding Gaussian elimination. In SC ’08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pages 1–12, Piscataway, NJ, USA, 2008. IEEE Press. ISBN 978-1-4244-2835-9.

F. G. Gustavson. Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM Journal of Research and Development, 41(6):737–755, Nov. 1997.

N. J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, second edition, 2002. ISBN 0-89871-521-0.

N. J. Higham. Cholesky factorization. WIREs Comp. Stat., 1(2):251–254, 2009.

N. J. Higham and D. J. Higham. Large growth factors in Gaussian elimination with pivoting. SIAM J. Matrix Anal. Appl., 10(2):155–164, Apr. 1989.

A. S. Householder. The Theory of Matrices in Numerical Analysis. Blaisdell, New York, 1964. ISBN 0-486-61781-5. Reprinted by Dover, New York, 1975.

C. Kravvaritis and M. Mitrouli. The growth factor of a Hadamard matrix of order 16 is 16. Numer. Linear Algebra Appl., 16:715–743, 2009.

L. Lay-Yong and S. Kangshen. Methods of solving linear equations in traditional China. Historia Mathematica, 16(2):107–122, 1989.

C. B. Moler. Matrix computations with Fortran and paging. Comm. ACM, 15(4):268–270, 1972.

Y. Saad. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, second edition, 2003. ISBN 0-89871-534-2.

R. D. Skeel. Iterative refinement implies numerical stability for Gaussian elimination. Math. Comp., 35(151):817–832, 1980.

G. W. Stewart. Gauss, statistics, and Gaussian elimination. J. Comput. Graph. Statist., 4(1):1–11, 1995.

G. W. Stewart. The decompositional approach to matrix computation. Computing in Science and Engineering, 2(1):50–59, Jan/Feb 2000.

S. Toledo. Locality of reference in LU decomposition with partial pivoting. SIAM J. Matrix Anal. Appl., 18(4):1065–1081, 1997.

S. Tomov, J. Dongarra, and M. Baboulin. Towards dense linear algebra for hybrid GPU accelerated manycore systems. Parallel Comput., 36:232–240, 2010.

J. H. Wilkinson. Progress report on the Automatic Computing Engine. Report MA/17/1024, Mathematics Division, Department of Scientific and Industrial Research, National Physical Laboratory, Teddington, UK, Apr. 1948.

J. H. Wilkinson. Error analysis of direct methods of matrix inversion. J. Assoc. Comput. Mach., 8:281–330, 1961.

Cross-References
Cholesky decomposition, Numerical analysis, Sparse matrix computations
