
Exploiting Zeros on the Diagonal in the Direct Solution of Indefinite Sparse Symmetric Linear Systems
I. S. DUFF and J. K. REID
Rutherford Appleton Laboratory

We describe the design of a new code for the solution of sparse indefinite symmetric linear
systems of equations. The principal difference between this new code and earlier work lies in
the exploitation of the additional sparsity available when the matrix has a significant number
of zero diagonal entries. Other new features have been included to enhance the execution
speed, particularly on vector and parallel machines.
Categories and Subject Descriptors: G.1.3 [Numerical Analysis]: Numerical Linear Algebra—linear systems (direct methods); sparse and very large systems
General Terms: Algorithms, Performance
Additional Key Words and Phrases: sparse, 2×2 pivots, augmented systems, BLAS, Gaussian elimination, indefinite symmetric matrices, zero diagonal entries

1. INTRODUCTION
This article describes the design of a collection of Fortran subroutines for
the direct solution of sparse symmetric sets of n linear equations

    Ax = b,                                 (1.1)

when the matrix A is symmetric and has a significant number of zero diagonal entries. An example of applications in which such linear systems
arise is the equality-constrained least-squares problem

    minimize ‖Bx − c‖₂                      (1.2)
        x

subject to

    Cx = d.                                 (1.3)

Authors’ address: Computing and Information Systems Department, Rutherford Appleton Laboratory, Chilton, Didcot, Oxon OX11 0QX, England; email: {isd; jkr}@rl.ac.uk.
Permission to make digital / hard copy of part or all of this work for personal or classroom use
is granted without fee provided that the copies are not made or distributed for profit or
commercial advantage, the copyright notice, the title of the publication, and its date appear,
and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to
republish, to post on servers, or to redistribute to lists, requires prior specific permission
and / or a fee.
© 1996 ACM 0098-3500/96/0600 –0227 $03.50

ACM Transactions on Mathematical Software, Vol. 22, No. 2, June 1996, Pages 227–257.

This is equivalent to solving the sparse symmetric linear system

    ( I    0    B  ) ( r )   ( c )
    ( 0    0    C  ) ( λ ) = ( d )          (1.4)
    ( Bᵀ   Cᵀ   0  ) ( x )   ( 0 )
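To make the augmented system (1.4) concrete, the following small dense sketch (with randomly chosen B, C, c, and d of our own invention; MA47 itself works with sparse storage) assembles and solves it, recovering the constrained least-squares solution:

```python
import numpy as np

# Hypothetical small instance of problem (1.2)-(1.3):
# minimize ||Bx - c||_2 subject to Cx = d.
rng = np.random.default_rng(0)
m, n, p = 6, 4, 2
B = rng.standard_normal((m, n))
c = rng.standard_normal(m)
C = rng.standard_normal((p, n))
d = rng.standard_normal(p)

# Assemble the symmetric augmented matrix of Eq. (1.4).
# Unknowns are (r, lambda, x); note the zero diagonal blocks.
K = np.zeros((m + p + n, m + p + n))
K[:m, :m] = np.eye(m)
K[:m, m + p:] = B
K[m:m + p, m + p:] = C
K[m + p:, :m] = B.T
K[m + p:, m:m + p] = C.T
rhs = np.concatenate([c, d, np.zeros(n)])

sol = np.linalg.solve(K, rhs)
r, lam, x = sol[:m], sol[m:m + p], sol[m + p:]

# x satisfies the constraint, and r is the residual c - Bx.
assert np.allclose(C @ x, d)
assert np.allclose(r, c - B @ x)
```

The third block row enforces the stationarity condition Bᵀr + Cᵀλ = 0, so x is the minimizer of (1.2) subject to (1.3).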

Another example is the quadratic programming problem

    minimize ½ xᵀHx + cᵀx                   (1.5)
        x

subject to the linear equality constraints (1.3), where H is a symmetric matrix. Such problems arise both in their own right and as subproblems in
constrained optimization calculations. Under a suitable inertial condition,
the problem is equivalent to solving the symmetric but indefinite system of
linear equations

    ( H    Cᵀ ) ( x )   ( −c )
    ( C    0  ) ( λ ) = (  d ).             (1.6)

Our earlier Harwell Subroutine Library code MA27 [Duff and Reid 1982;
1983] uses a multifrontal solution technique and is unusual in being able to
handle indefinite matrices. It has a preliminary analysis phase that
chooses a tentative pivot sequence from the sparsity pattern alone, assuming that the matrix is definite so that all the diagonal entries are nonzero
and suitable as 1×1 pivots. For the indefinite case, this tentative pivot
sequence is modified in the factorization phase to maintain stability by
delaying the use of a pivot if it is too small or by replacing two pivots by a
2×2 block pivot [Bunch and Parlett 1971].
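The role of 2×2 block pivots can be seen in dense arithmetic. In the sketch below (a toy example of our own, not MA27's implementation), a 1×1 pivot on a₁₁ = 0 is impossible, but the leading 2×2 block serves as a block pivot and yields an exact block factorization:

```python
import numpy as np

# A symmetric indefinite matrix whose leading diagonal entry is zero.
A = np.array([[0., 2., 1.],
              [2., 0., 3.],
              [1., 3., 4.]])

P = A[:2, :2]                      # 2x2 block pivot with zero diagonal
M = np.linalg.solve(P, A[:2, 2:])  # multipliers
schur = A[2:, 2:] - A[2:, :2] @ M  # Schur complement after the block step

# Eliminating with the block pivot reproduces A when multiplied back out:
# A = L * diag(P, schur) * L^T with L unit block lower triangular.
L = np.block([[np.eye(2),               np.zeros((2, 1))],
              [A[2:, :2] @ np.linalg.inv(P), np.eye(1)]])
D = np.block([[P,                np.zeros((2, 1))],
              [np.zeros((1, 2)), schur]])
assert np.allclose(L @ D @ L.T, A)
```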
The assumption that all the diagonal entries are nonzero is clearly
violated in the preceding examples. For such problems, the fill-in during
the factorization phase of MA27 can be significantly greater than predicted
by the analysis phase. Duff et al. [1991] found that the use of 2×2 pivots
with zeros on the diagonal alleviated this problem and assisted the preservation of sparsity during the analysis phase. Our new code, MA47, is based
upon this work and, like MA27, uses a multifrontal method. It will work for
the definite case, but there are many opportunities for simplifications and
efficiency improvements; so we plan to provide a separate code for this
special case.
The factorization used has the form

    A = P L D Lᵀ Pᵀ,                        (1.7)

where P is a permutation; L is unit lower triangular; and D is block diagonal. A tentative choice of the permutation and blocking is made by
working symbolically from the zero/nonzero structure of the matrix A. We
call this the analyze phase. Given the results from this phase, we factorize

a matrix of the given structure, including symmetric permutations for the sake of numerical stability, but trying to keep close to the tentative pivotal
sequence and block structure. Finally, we use the factorization to solve for a
particular vector b.
Throughout this article, we use the term entry for a matrix coefficient
that is nonzero or might be nonzero. Note that sometimes an entry may
have the value zero, but a coefficient that is not an entry is always zero. If
a coefficient of the reduced matrix is obtained by modification of an entry,
we regard the result as an entry even if it has the value zero because it
might be nonzero for another matrix of the same pattern. Also, the user
may find it convenient to treat a zero as an entry during the analyze phase
in anticipation of a later matrix having a nonzero in the corresponding
position.
MA47 accepts an n×n symmetric sparse matrix whose entries are stored
in any order in a real array with their row and column indices stored in
corresponding positions in integer arrays. Each pair of off-diagonal entries
aij and aji is represented by either entry. Multiple entries are permitted
and are summed. This is the most user-friendly format that we have been
able to devise, and it is the same as that of MA27.
We describe the algorithm in Section 2. There are four subroutines that
are called directly by the user:
Initialize. MA47I provides default values for the arrays that together
control the execution of the package.
Analyze. MA47A accepts the pattern of A and makes a tentative choice
of block pivots. It also calculates other data needed for actual factorization.
The user may provide a pivotal sequence, in which case the necessary data
will be generated.
Factorize. MA47B accepts a matrix A together with a set of recommended block pivots. It performs the factorization, including additional
permutations when they are needed for numerical stability.
Solve. MA47C uses the factorization produced by MA47B to solve the
equations Ax = b.
These are described in detail in a separate report [Duff and Reid 1995], in
which the specification document is included as an appendix. Section 3 is
devoted to our experience of the actual running of the code. The code has
been placed in the Harwell Subroutine Library and is available from
AEA Technology, Harwell; the contact is Dr. Scott Roberts, Harwell Sub-
routine Library, Bldg. 552, AEA Technology, Harwell, Didcot, Oxon OX11
0RA; telephone (44) 1235 434714; fax (44) 1235 434136; email
⟨[email protected]⟩; who will provide details of price and conditions
of use. We also provide a complex version of the code, which handles
symmetric complex matrices. We have chosen not to offer a version for
Hermitian matrices because significant changes would have been needed to
keep track of which off-diagonal entries move between the two triangular
halves as permutations are made.

2. THE ALGORITHM
Our algorithm is based on the work of Duff et al. [1991], which uses 2×2
pivots with zeros on the diagonal to assist in the preservation of sparsity.
In addition, it is advantageous to perform eliminations with several pivots
simultaneously, which we do by accumulating them into a block pivot.
We use block pivots that may be
(i) of the form

    ( 0     A1 )
    ( A1ᵀ   0  )                            (2.1)

with A1 square, which we call an oxo pivot;


(ii) of the form

    ( A2    A1 )      ( 0     A1 )
    ( A1ᵀ   0  )  or  ( A1ᵀ   A2 )          (2.2)

with A1 square, which we call a tile pivot; or


(iii) of any other form, which we call full.
We use the term structured for a pivot that is either a tile or an oxo pivot.
The blocks A1 and A2 are usually full, and we always store them as full
matrices. Note that the inverse of a tile is a tile with its zero block in the
other diagonal position. The inverse of an oxo is an oxo.
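The remark about inverses is easy to confirm numerically. In this sketch (small random blocks of our own choosing), the inverse of a tile has its zero block in the other diagonal corner, and the inverse of an oxo is again an oxo:

```python
import numpy as np

# A1, A2 are small dense blocks; A2 symmetric so the pivots are symmetric.
rng = np.random.default_rng(1)
A1 = rng.standard_normal((3, 3))
A2 = rng.standard_normal((3, 3))
A2 = A2 + A2.T

tile = np.block([[A2, A1], [A1.T, np.zeros((3, 3))]])
oxo = np.block([[np.zeros((3, 3)), A1], [A1.T, np.zeros((3, 3))]])

tile_inv = np.linalg.inv(tile)
oxo_inv = np.linalg.inv(oxo)

# The zero block has moved to the (1,1) corner of the tile's inverse ...
assert np.allclose(tile_inv[:3, :3], 0)
# ... and both diagonal blocks of the oxo's inverse are zero.
assert np.allclose(oxo_inv[:3, :3], 0) and np.allclose(oxo_inv[3:, 3:], 0)
```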
The matrix modifications of a block pivotal step that lie outside the pivot
rows and columns (which we call the Schur update) are not applied at the
time, but are stored in a generated-element matrix. This has entries only in
a principal submatrix that after permutation has the general form

    ( 0     B2    B3 )
    ( B2ᵀ   B1    B4 ),                     (2.3)
    ( B3ᵀ   B4ᵀ   0  )

where the blocks on the diagonal are square. This is the form of the
submatrix altered by a pivotal step with an oxo pivot and is illustrated in
Figure 1. The blocks B1, B2, B3, and B4 are usually full, and we always
store them as full matrices. A tile pivot produces the special case of this
form where the first or last block row and column are null. It is illustrated
in Figure 2. For a full pivot, the generated element is held as a full matrix,
which is the special case where the first and last block row and column are
both null.
We have chosen the multifrontal technique [Duff and Reid 1983] for the
sake of efficiency during the analyze phase and to permit extensive use of
full-matrix code and the BLAS (Basic Linear Algebra Subprograms [Dongarra et al. 1988; 1990; Lawson et al. 1979]) during factorization. We use
the notation B(l) for the generated-element matrix from the lth (block)
pivotal step and the notation Ak and Bk(l) to denote the submatrices of A
and B(l) obtained by removing the rows and columns corresponding to the
first k (block) pivotal steps. Following (block) step k, the reduced matrix is
held as

    Ak + Σ_{l∈Ik} Bk(l),                    (2.4)

where Ik is the set of indices of element matrices that are active then. If
Bk−1(l) has entries only in the pivotal rows and columns, Bk(l) will be zero,
and l is omitted from the index set Ik. Other Bk(l) may have entries that lie
entirely within the pattern of the newly generated element B(k); for
efficiency, such a Bk(l) is added into B(k), and l is omitted from Ik. We say
that Bk(l) is absorbed into B(k).

Fig. 1. An oxo pivot, its pivot rows and columns, and fill-in pattern (generated element).

Fig. 2. A tile pivot, its pivot rows and columns, and fill-in pattern (generated element).

Such absorption certainly takes place if the pivot is full and overlaps one
or more of the diagonal entries of Bk(l) because in this case the pivot row
has an entry for every index of Bk(l). If all the pivots are full, all generated

elements are full, and therefore any generated element that is involved in a
pivotal step is absorbed. This is the situation for a definite matrix.
In the definite case, the whole process may be represented by a tree,
known as the assembly tree, which has a node for each block pivotal step.
The sons of a node correspond to the element matrices that contribute to
the pivotal row(s) and are absorbed in the generated element. Here it is
efficient to add all the generated elements from the sons and the pivot rows
from the original matrix into a temporary matrix known as the frontal
matrix, which can be held in a square array of size the number of rows and
columns with entries involved. The rows and columns are known as the
front. For a fuller description of this case, see Duff et al. [1986, Sections
10.5 to 10.9].
Given an assembly tree, there is considerable freedom in the ordering of
the block pivotal steps during an actual matrix factorization. The opera-
tions are the same for any ordering such that the pivotal operations at a
node follow those at a descendant of the node (apart from roundoff effects
caused by performing additions in a different order). Subject to this
requirement, the order may be chosen for organizational convenience. For a
uniprocessor implementation, it is usual to base it on postordering following a depth-first search of the tree, which allows a stack to be used to store
the generated elements awaiting assembly. We follow this practice.
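The stack discipline works because a postorder keeps each subtree contiguous: by the time a node is processed, the elements its sons pushed are exactly the top of the stack. The following toy sketch (our own data structures, not MA47's) makes that concrete:

```python
# Iterative depth-first postorder of a tree given as {node: [sons]}.
def postorder(tree, root):
    order, walk = [], [(root, False)]
    while walk:
        node, expanded = walk.pop()
        if expanded:
            order.append(node)
        else:
            walk.append((node, True))
            walk.extend((son, False) for son in tree[node])
    return order

# A small assembly tree: 5 is the root, with sons 3 and 4; 3 has sons 1, 2.
tree = {1: [], 2: [], 3: [1, 2], 4: [], 5: [3, 4]}
order = postorder(tree, 5)

stack = []
for node in order:
    # Assemble: this node's sons' generated elements sit on top of the stack.
    popped = {stack.pop() for _ in tree[node]}
    assert popped == set(tree[node])
    stack.append(node)  # push this node's own generated element
assert stack == [5]     # only the root's element remains
```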
When there are some structured pivots, we employ the same assembly
tree, but a generated element is not necessarily absorbed at its father node.
Instead, it may persist for several generations, making contributions to
several pivotal rows, until it is eventually absorbed. As an illustration of
absorption not occurring, a simple 1×1 pivot might overlap the leading
(zero) block of Eq. (2.3). In such a case, B1 and B4 are absorbed, but the
nonpivotal rows of B2 and B3 are inactive during this step (unless made
active by entries from other generated-element matrices). Absorption of
Bk(l) occurs for a structured pivot if an off-diagonal entry of Bk(l) overlaps
the off-diagonal block A1 of the pivot. This is seen by regarding the structured
pivot as a sequence of 1×1 pivots, starting with the off-diagonal entry and
its symmetrically placed partner. To handle the structured case efficiently,
we sum only the contributions to the pivot rows, form the Schur update,
and then add into it any generated elements from the sons that can be
absorbed. The frontal matrix is thus more complicated, but we still refer to
the set of rows and columns involved as the front.
Similarly the stack is more complicated. Access will be needed to generated elements corresponding to descendants (not just children), but these
will still be nearer the top of the stack than any generated elements that do
not correspond to descendants. When a generated element is absorbed, it
may leave a hole in the stack. These holes are tolerated until the stack
becomes too large for the available memory, at which point we perform a
simple data compression. To aid both access to data in the stack and stack
compression, we merge adjacent holes as soon as they appear.

It is interesting to consider the effect of using full frontal matrices of sufficient size that each can accommodate all the elements and generated
elements of its sons. Apart from roundoff effects caused by performing
additions in a different order, the operations performed on nonzeros will be
the same. However, there will be many additional operations involving
zeros, and there will be many stored zeros. Delaying the assemblies is our
way of exploiting the sparsity.
Previously [Duff et al. 1991], we had anticipated working with generated
elements (after permutation) of the forms

    ( B1    B2 )       ( 0     B3 )
    ( B2ᵀ   0  )  and  ( B3ᵀ   0  ).        (2.5)

A tile-generated element is of the first form, and an oxo-generated matrix can be represented as the sum of a matrix of the first form plus one of the
second form:

    ( 0     B2    B3 )   ( 0     B2    B4 )   ( 0     0    B3 )
    ( B2ᵀ   B1    B4 ) = ( B2ᵀ   B1    B4 ) + ( 0     0    0  ).   (2.6)
    ( B3ᵀ   B4ᵀ   0  )   ( 0     B4ᵀ   0  )   ( B3ᵀ   0    0  )

We have decided to use the form (2.3) because, in the case of an oxo-
generated element,
(i) the duplication of the index lists of the first and third blocks is
avoided;
(ii) for a row of the first or third block, one rather than two elements
involve it and need to be included in a list of elements associated with
the row (such lists are needed during the analyze phase); and
(iii) a link would need to be maintained between the two parts of an oxo
generated element in order to recognize that both parts of the
element can be absorbed when an off-diagonal entry overlaps the
off-diagonal block A1 of a structured pivot.

2.1 The Design of the Factorized Form


The following considerations had a profound effect on our design for the
factorized form:
(i) the wish to use block operations during both factorization and solu-
tion; and
(ii) the wish to be able readily to modify the factorization so that it is a
factorization of a positive-definite matrix, needed in some optimization calculations.
For a full pivot, both are easy to achieve. If the pivot A11 is full, we
factorize it as

    A11 = L D Lᵀ,                           (2.7)

where L is unit lower triangular and D is a block diagonal matrix with
blocks of order 1 or 2, and we have the matrix factorization

    ( A11    A12 )   ( L     0 ) ( D    0        ) ( Lᵀ   M )
    ( A12ᵀ   A22 ) = ( Mᵀ    I ) ( 0    A22 − S  ) ( 0    I ),   (2.8)

where M is the matrix of multipliers

    M = D⁻¹ L⁻¹ A12                         (2.9)

and A22 − S is the Schur complement, where

    S = Mᵀ D M = A12ᵀ A11⁻¹ A12             (2.10)

is the Schur update. D may be perturbed to a positive-definite matrix by examining its (1×1 and 2×2) diagonal blocks and changing the diagonal entries as necessary.
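The identities (2.8)-(2.10) for a full pivot can be checked numerically. In this sketch (our own small example; D is taken diagonal for simplicity, whereas the code allows 2×2 blocks), we build A11 = L D Lᵀ from chosen factors, form the multipliers M, and confirm that the Schur update has both forms stated in Eq. (2.10):

```python
import numpy as np

rng = np.random.default_rng(2)
L = np.tril(rng.standard_normal((3, 3)), -1) + np.eye(3)  # unit lower
D = np.diag(np.array([2.0, -1.0, 3.0]))                   # indefinite D
A11 = L @ D @ L.T
A12 = rng.standard_normal((3, 2))

M = np.linalg.solve(D, np.linalg.solve(L, A12))  # M = D^{-1} L^{-1} A12, Eq. (2.9)
S = M.T @ D @ M                                  # Schur update, Eq. (2.10)

# The two expressions in Eq. (2.10) agree.
assert np.allclose(S, A12.T @ np.linalg.solve(A11, A12))
```

The Schur complement A22 − S is what Eq. (2.8) leaves behind in the reduced matrix.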
A comparable factorization to that of Eq. (2.7) for a block tile pivot is

    A11 = ( 0     A1 )   ( L1    0   ) ( 0    D1 ) ( L1ᵀ   U2 )
          ( A1ᵀ   A2 ) = ( U2ᵀ   U1ᵀ ) ( D1   D2 ) ( 0     U1 ),   (2.11)

where L1 is unit lower triangular; U1 is unit upper triangular; U2 is strictly
upper triangular; and D1 and D2 are diagonal. The special case U2 = D2 = 0
corresponds to a block oxo pivot. Note that the relationship

    A1 = L1 D1 U1                           (2.12)

holds, so that a conventional triangular factorization of A1 is included. We
show that Eq. (2.11), with U2 strictly upper triangular, is a correct
factorization by performing a symmetric permutation to place the rows and
columns in the order 1, r+1, 2, r+2, 3, r+3, . . . , where 2r is the order of
the matrix in Eq. (2.11). This gives the block tile form of the symmetric
matrix

    ( t11   t12   . .   t1r )
    ( t21   t22   . .   t2r )
    (  .     .     .     .  ),              (2.13)
    (  .     .     .     .  )
    ( tr1   tr2   . .   trr )

where each tij is a 2×2 matrix whose (1,1) element is zero (each tij is a


tile). Using t11, t22, . . . , trr as pivots, this matrix has the factorization

    ( I                 ) ( d11             ) ( I   l21ᵀ  . .  lr1ᵀ )
    ( l21   I           ) (      d22        ) (      I    . .  lr2ᵀ )
    (  .     .    .     ) (           .     ) (            .    .   ),   (2.14)
    ( lr1   lr2  . .  I ) (              drr ) (                 I   )

where each lij has a zero in position (2,1) and where each dii is a symmetric
tile. If we now apply the inverse permutation, we get the form (2.11) with
the lower triangular entries of L1, U2ᵀ, U1ᵀ, respectively, being the (1,1),
(2,1), (2,2) entries of lij, and the diagonal entries of D1 and D2, respectively,
being the (2,1) and (2,2) entries of dii. An alternative derivation of Eq.
(2.11) is by application of a sequence of elementary row and column
operations that reduce

    ( 0     A1 )
    ( A1ᵀ   A2 )

to the form

    ( 0     D1 )
    ( D1    D2 )

in column 1, row 1, column r+1, row r+1, column 2, row 2, column r+2, row
r+2, . . . . Note that, apart from the effects of reordering additions and
subtractions, the forms (2.11) and (2.14) yield the same numerical values.
We use both forms in our code. We compute the factors with form (2.14). We
apply the factors in solutions and in Schur updates using form (2.11), which
permits use of block operations.
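The factorization (2.11) is easy to verify by multiplying out factors of the stated shapes. In this sketch (randomly chosen factors of our own), the product is symmetric, has the required zero (1,1) block, and its off-diagonal block satisfies Eq. (2.12):

```python
import numpy as np

rng = np.random.default_rng(3)
r = 4
L1 = np.tril(rng.standard_normal((r, r)), -1) + np.eye(r)  # unit lower
U1 = np.triu(rng.standard_normal((r, r)), 1) + np.eye(r)   # unit upper
U2 = np.triu(rng.standard_normal((r, r)), 1)               # strictly upper
D1 = np.diag(rng.standard_normal(r))
D2 = np.diag(rng.standard_normal(r))

# The three factors of Eq. (2.11).
left = np.block([[L1, np.zeros((r, r))], [U2.T, U1.T]])
mid = np.block([[np.zeros((r, r)), D1], [D1, D2]])
right = np.block([[L1.T, U2], [np.zeros((r, r)), U1]])
A11 = left @ mid @ right

assert np.allclose(A11[:r, :r], 0)             # zero block of the tile
assert np.allclose(A11, A11.T)                 # symmetric
assert np.allclose(A11[:r, r:], L1 @ D1 @ U1)  # Eq. (2.12)
```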
The diagonal entries of

    ( 0     D1 )
    ( D1    D2 )
can be changed to make it positive definite as easily as those of D in Eq.
(2.7). Thus we may regard Eq. (2.7) as applicable to the tile case with

    L = ( L1    0   )           ( D3    D1 )
        ( U2ᵀ   U1ᵀ )  and  D = ( D1    D2 ).   (2.15)

The special case U2 = 0 corresponds to an oxo pivot.


We therefore store, for each block pivot, L, D, and M. The matrices L and
D may be packed to take advantage of their form. For an oxo pivot, the

nonzero columns of M are ordered to the form

    M = ( M5    M6    0     0 )
        ( 0     M7    M8    0 ),            (2.16)

and only the blocks M5, M6, M7, and M8 are stored. For a tile pivot, the first
block column is absent. These forms are discussed further in Section 2.3.

2.2 Analyze Phase


In the analyze phase, we simulate the operations of the factorization,
representing each generated element by three index lists and assuming
that every pivot chosen is acceptable numerically. This allows the process-
ing to be efficient, and there is no need for any numerical values; but the
pivotal sequence chosen has to be regarded as tentative. The analyze phase
is faster than it would be if numerical values were taken into account, and
its storage demands are much more modest.
For pivotal strategy, we use the variant of the Markowitz [1957] criterion
recommended by Duff et al. [1991]. The Markowitz cost

    (ri − 1)²                               (2.17)

for a diagonal entry aii, with row count ri (number of entries in the row), is
extended to

    (ri − 1)(ri + rj − 3)                   (2.18)

for a tile pivot with nonzero aii and

    (ri − 1)(rj − 1)                        (2.19)

for an oxo pivot. Applying a structured pivot is mathematically equivalent to pivoting in turn on the two off-diagonal entries of the pivot. In the oxo
case, expression (2.19) is the Markowitz cost of either of these pivots. In the
tile case, expression (2.18) is a bound for the Markowitz costs.
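Written out directly, the three costs (2.17)-(2.19) are simple functions of the row counts. The helper names below are our own:

```python
# Row counts ri, rj are the numbers of entries in the candidate rows
# of the current reduced matrix.
def cost_1x1(ri):
    return (ri - 1) ** 2              # Eq. (2.17)

def cost_tile(ri, rj):
    return (ri - 1) * (ri + rj - 3)   # Eq. (2.18), row i has aii nonzero

def cost_oxo(ri, rj):
    return (ri - 1) * (rj - 1)        # Eq. (2.19)

# With ri = rj = r, an oxo pivot costs the same as a 1x1 pivot of the
# same row count, while a tile costs more.
assert cost_oxo(5, 5) == cost_1x1(5) == 16
assert cost_tile(5, 5) == 28
```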
We find a pivot that minimizes this extended Markowitz cost by searching the rows in order of increasing row count as follows:
main_loop: for r := 1 step 1 until n do
  if there is a row i with row count r such that aii ≠ 0 then
    accept it as a 1×1 pivot and exit main_loop
  end if
  for each row i with row count r do
    for each variable j for which there is a nonzero entry (i, j) in
        the current reduced matrix do
      if the Markowitz cost ≤ (r − 1)² then
        accept (i, j) as a 2×2 pivot and exit main_loop
      end if
      if the Markowitz cost is the smallest so far found, store
          it as such
    end do
  end do
  if there is a stored 2×2 pivot with Markowitz cost ≤ r² then
    accept the 2×2 pivot and exit main_loop
  end if
end do

We also provide an option to limit the search to a given number of rows with the fewest entries.
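The search loop above can be sketched in Python. This is our own toy implementation, not MA47's: `rows[i]` is the set of column indices with entries in row i of the reduced matrix, `diag[i]` records whether aii is nonzero, and the cost of a candidate (i, j) is taken from (2.18) or (2.19) according to whether a diagonal entry is nonzero:

```python
def find_pivot(rows, diag):
    n = len(rows)
    by_count = sorted(rows, key=lambda i: len(rows[i]))
    best, best_cost = None, None
    for r in range(1, n + 1):
        # A 1x1 pivot among rows of count r is the cheapest possibility left.
        for i in (i for i in by_count if len(rows[i]) == r):
            if diag[i]:
                return ('1x1', i)
        for i in (i for i in by_count if len(rows[i]) == r):
            for j in rows[i] - {i}:
                ri, rj = len(rows[i]), len(rows[j])
                cost = ((ri - 1) * (rj - 1) if not (diag[i] or diag[j])
                        else (ri - 1) * (ri + rj - 3))
                if cost <= (r - 1) ** 2:
                    return ('2x2', (i, j))
                if best_cost is None or cost < best_cost:
                    best, best_cost = ('2x2', (i, j)), cost
        if best_cost is not None and best_cost <= r * r:
            return best
    return best

# A 4x4 pattern: rows 1 and 3 have count 2 and nonzero diagonals,
# so a 1x1 pivot is accepted at r = 2.
rows = {0: {0, 1, 2}, 1: {0, 1}, 2: {0, 2, 3}, 3: {2, 3}}
diag = {0: False, 1: True, 2: False, 3: True}
assert find_pivot(rows, diag) == ('1x1', 1)
```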
We found it convenient to build a tree that represents the grouping of
variables into blocks as well as the assemblies. It has a node for every
variable, rather than for every block pivot. In the positive-definite case,
this bigger tree is the elimination tree (e.g., see Liu [1990]), so we call it the
elimination tree in this more general setting. The variables of a full pivot or
of either of the halves of a structured pivot are linked together in a chain
whose head we call the principal variable. The two principal variables of a
structured pivot are linked together as father and son. Only these father
nodes and the principal variables of full pivots have a node in the assembly
tree.
It is interesting that, if the same pivot sequence is applied to the matrix
obtained by adding entries to all the positions corresponding to zeros in the
pivots and the elimination tree is formed, it is identical to our elimination
tree apart from the detail in our tree within each tile and oxo pivot.
One of the ways to speed the analyze phase is to recognize rows with the
same structure, both in the original matrix and the successive reduced
matrices. The set of variables that correspond to such a set of rows is called
a supervariable, and we represent the matrix pattern in terms of supervariables. We allow a supervariable to consist of a single variable. If the rows
do not have entries on the diagonal, we say that the supervariable is
defective. Each supervariable is indexed by one of its variables, which is its
principal variable.
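Grouping rows with identical sparsity patterns can be sketched with a hash map keyed on the index lists, which gives the same grouping in expected linear time (MA47 achieves its O(n + t) bound with the cleverer data structures of Section 2.7; the function below is our own illustration):

```python
from collections import defaultdict

def supervariables(rows):
    """Group variables whose rows have identical patterns.

    `rows[i]` is the set of column indices with entries in row i.
    Returns {principal variable: list of its variables}.
    """
    groups = defaultdict(list)
    for i, pattern in rows.items():
        groups[frozenset(pattern)].append(i)
    # The principal variable indexes its supervariable.
    return {g[0]: g for g in groups.values()}

# Rows 1, 2, 5 share one pattern; rows 3, 4 share another.
rows = {1: {1, 2, 5}, 2: {1, 2, 5}, 3: {3, 4}, 4: {3, 4}, 5: {1, 2, 5}}
sv = supervariables(rows)
assert sv == {1: [1, 2, 5], 3: [3, 4]}
```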
A simple example of a matrix and its elimination tree is shown in Figure
3. Here there are just two block pivots; the first is an oxo pivot of order 6
with variable 4 as its first principal variable, and the second is a full pivot
of order 2 with variable 7 as its principal variable. Variable 1 is the
principal variable of the other half of the oxo pivot, and node 1 is the son of
node 4 in the elimination tree. The other variables are linked in chains to
the three principal variables.
We begin the analyze phase by recognizing rows with identical structure
and forming supervariables. With careful choice of data structures (see
Section 2.7), the amount of work performed for each matrix entry is
bounded by a constant, so the whole process executes in O(n + t) time,
where t is the number of entries. This leaves us with a forest with a node

Fig. 3. A matrix pattern with its elimination tree.

for each variable and the variables of each supervariable in a chain with
the principal variable at its head.
This forest is gradually modified until it becomes the elimination tree. At
an intermediate stage, each tree of the forest has a root that represents the
principal variable of either (i) a supervariable that has not yet been pivotal
or (ii) a supervariable that has been pivotal and whose generated-element
matrix is zero or has not yet contributed to a pivotal row. The node of a
principal variable that has been eliminated may be regarded as also
representing the element generated when it was eliminated. When a pivot
block is chosen, the node of its principal variable is given as sons the roots
of all the trees that contain one or more generated elements that contribute
to the pivot rows. If the pivot is a structured pivot, it is also given as a son
the principal variable of the second part. Some amalgamation of supervariables may follow the pivotal step, because the fill-ins may cause some rows
to have identical structures in the reduced matrix.
We have noted that a generated-element matrix need not be absorbed at
its father node. Indeed, it may persist for many generations of the tree,
contributing to pivotal rows at each stage. This may mean that all that is
left is a zero matrix, which is all or part of one of the zero blocks of the
matrix (2.3). It might be thought that such a zero matrix could be
discarded, but to do so risks the loss of a property of an assembly tree that
we wish to exploit: the block pivotal steps may be reordered provided those
at a node follow those at all descendants of the node. We must therefore
retain such a zero element matrix and whenever one of its variables
becomes pivotal, make the root of its tree a son of the pivot node.
In the irreducible case, this eventually yields the elimination tree, whose
root is the only node with a zero generated-element matrix. The node
represents the final block pivot, which obviously leaves a null Schur
complement. If the original matrix is reducible, it must be block diagonal
because it is symmetric. In this case, the problem consists of several
independent subproblems, and there will be a tree for each. It is not
difficult to allow for this, and our code does so. For simplicity of description,
however, we assume that the matrix is irreducible.


Fig. 4. A father-son pair of tile pivots for which amalgamation is possible.

A depth-first search of the elimination tree allows the pivotal sequence to be found, including the grouping of variables into blocks that are eliminated together. At the same time, the assembly tree is constructed.
The opportunity is taken to look for father-son amalgamations that can
take place with no extra fill-in, because most of them can be found by a
simple test involving the number of variables eliminated at the son and the
row counts before elimination. We have chosen to rely on this as the only
mechanism that we employ to find block pivots when the user specifies the
pivotal order. It is also useful when the pivotal order and tree are chosen by
the code. Many block pivots will be found through the use of supervariables, but not all, because it would be costly to ensure that every possible
supervariable is identified (we see no way of doing this without at least one
sweep of all the index lists for the supervariables).
If a father and son are both full pivots, and the number of variables
eliminated at the son is equal to the difference between the son’s row count
before elimination and the father’s before elimination, there is no extra
fill-in if they are amalgamated. The frontal matrix for the father node has
the pattern of the element generated by the son, and there is no advantage
in treating them separately. This is particularly likely to happen near the
root because the reduced matrix is often full in the final stages.
Similarly, a father-son pair of tiles can be amalgamated without extra
fill-in if the counts for the rows with nonzero diagonal entries differ by the
number of variables eliminated and if the counts for the rows with zero
diagonal entries differ by half the number of variables eliminated. A simple
example is shown in Figure 4. Here there is no fill-in, and the row counts at
the time of elimination are (5, 3) and (3, 2).
To see if a father-son pair of oxos can be amalgamated without extra
fill-in, we test the differences for each of the halves against half the number
of variables eliminated. Unfortunately, when the row counts for the two
halves are identical, we cannot be sure that the two halves of the son do not
need to be interchanged if extra fill-in is to be avoided. A simple case is
shown in Figure 5, where variables 1 and 2 are eliminated at the son node
and variables 3 and 4 at the father node. Checking for such an event
requires more information at the nodes than is available at this time. We
therefore do not amalgamate oxo nodes where the row counts of the two
halves are the same.
We also perform some amalgamation that does involve extra fill-in
because of the advantages of reasonably large block sizes, particularly on a

Fig. 5. A father-son pair of oxo pivots for which amalgamation is possible if the variables of
the father are interchanged.

vector or parallel computer. Interfering with a structured pivot may lead to greatly increased fill-in, so we do this only for father-son pairs that are full.
Also, we require that all ancestors be full, too, to avoid indirectly affecting
a structured pivot. This leads us to using a depth-first search to look for
father-son amalgamations. These amalgamations are controlled by a
parameter (with default value 5) which is a limit on the number of variables
at a node that is amalgamated with another.

2.3 Factorization
The factorization is controlled by the assembly tree created at the end of
the analyze phase. For stability, all the pivots are tested numerically. If the
updated entries of the fully summed rows in the front are $f_{ij}$, the test for a
1 × 1 pivot is

$$|f_{kk}| \ge u \max_{j \ne k} |f_{kj}|, \qquad (2.20)$$

where u is a pivot threshold parameter given a default value during
initialization (see discussion in Section 3), and the test for a 2 × 2 pivot is

$$\left| \begin{pmatrix} f_{kk} & f_{k,k+1} \\ f_{k+1,k} & f_{k+1,k+1} \end{pmatrix}^{-1} \right|
\begin{pmatrix} \max_{j \ne k,k+1} |f_{kj}| \\ \max_{j \ne k,k+1} |f_{k+1,j}| \end{pmatrix}
\le \begin{pmatrix} u^{-1} \\ u^{-1} \end{pmatrix}. \qquad (2.21)$$

For a tile pivot, it is possible for this test to fail, and yet taking its two
diagonal entries in turn as 1 × 1 pivots would lead to two 1 × 1 pivots that
satisfy inequality (2.20). Using this pair is mathematically equivalent. We
therefore accept the tile pivot in this case, also. For the second 1 × 1 pivot,
we test the inequality

$$\left| f_{k+1,k+1} - \frac{f_{k+1,k}^2}{f_{kk}} \right|
\ge u \max_{j \ne k,k+1} \left| f_{k+1,j} - \frac{f_{k+1,k}}{f_{kk}} f_{kj} \right|. \qquad (2.22)$$

As well as the relative tests we have just described, we also apply an
absolute test on the size of a 1 × 1 pivot or the off-diagonal entry of a 2 × 2
pivot. By default, the value of this pivot tolerance is zero.

These numerical tests may mean that some rows that we expected to
eliminate during a block pivotal step remain uneliminated at the end of the
step. These rows are stored alongside the generated element for treatment
at the father node. We call these rows fully summed because we know that
there are no contributions to be added from elsewhere, unlike the rows of
the generated element. At the father node, the possible pivot rows consist
of old fully summed rows coming from this son and perhaps other sons, too,
and the new fully summed rows that were recommended as pivots by the
analyze phase.
If a full block pivot was recommended, we choose a simple 1 × 1 or 2 × 2
pivot and perform the corresponding elimination operations on the pivot
rows before choosing the next simple 1 × 1 or 2 × 2 pivot. Both old and new
fully summed rows are candidates. We know of no other strategy for
ensuring that the block pivot as a whole is satisfactory. Note, however, that
the calculations for the Schur update can be delayed and performed as a
block operation once all pivots are chosen.
If a structured block was recommended, the analyze phase expects that
the new fully summed rows have the form

$$\begin{pmatrix} A_2 & A_1 & A_5 & A_6 & 0 & 0 \\ A_1^T & 0 & 0 & A_7 & A_8 & 0 \end{pmatrix}
\quad \text{or} \quad
\begin{pmatrix} 0 & A_1 & A_5 & A_6 & 0 & 0 \\ A_1^T & 0 & 0 & A_7 & A_8 & 0 \end{pmatrix} \qquad (2.23)$$

after suitable permutations. We need to check that the potential pivots
(leading two block columns in the matrix (2.23)) still have this form
because earlier changes to the pivotal sequence may destroy it. Assuming
that the form is still there, we again choose simple pivots one at a time and
perform the corresponding elimination operations on the pivot rows before
choosing the next simple pivot. We restrict these pivots to 2 × 2 pivots of
the desired form in the new fully summed rows, noting that each is
identified by an entry of the $A_1$ block. We therefore search $A_1$ or its reduced
form, again using the test (2.21) for stability.
If any fully summed rows remain in the front after completion of this
sequence of simple 2 × 2 pivot operations, we look for simple 1 × 1 and 2 × 2
pivots in these rows, exactly as for the case when full pivots were
recommended. Note that these rows may be new fully summed rows in which we
failed to find a structured pivot or old fully summed rows from the sons of
the node. If any 1 × 1 or 2 × 2 full pivots are chosen, we regard the
generated element as full, but by forming the Schur update for the rows of
the structured pivots separately, we can at least take some advantage of
the zero block or blocks within it.
Provided enough pivots are selected in the block pivot step, we use Level
3 BLAS [Dongarra et al. 1990] for constructing the Schur update. In the
case of a full pivot, the Schur update is

$$S = M^T D M, \qquad (2.24)$$

where M is a rectangular matrix, and D is a diagonal matrix with blocks of



order 1 and 2 (see Eq. (2.10)). Within the frontal matrix, we hold a compact
representation of M that excludes zero columns. Therefore, our real concern
is the efficient formation of the product (2.24) for a full rectangular matrix
M. Unfortunately, there are no BLAS routines for forming a symmetric
matrix as the product of two rectangular matrices, so we cannot form DM
and use a BLAS routine for calculating $M^T(DM)$ without doing about twice
as much work as necessary. We therefore choose a block size b (with default
size 5 set by the initialization routine) and divide the rows of M and DM
into strips that start in rows 1, b + 1, 2b + 1, . . . . This allows us to compute
the block upper triangular part of each corresponding strip of S in turn,
using SGEMM for each strip except the last. For the last (undersize) strip,
we use simple Fortran code and take full advantage of the symmetry. Note
that if b > n, simple Fortran code will be used all the time.
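The strip scheme can be sketched in NumPy (0-based indexing, so the strips start at rows 0, b, 2b, . . .). Each strip's block upper triangle is one dense product, standing in for the SGEMM call; here we also reflect each strip so the sketch returns the whole symmetric matrix and can be checked against the full product:

```python
import numpy as np

def schur_update_strips(M, D, b):
    """Form S = M^T (D M), n x n and symmetric, by strips of b rows of S.
    Each strip computes its block upper triangle with one dense product
    (SGEMM in the Fortran code) and is then reflected to fill the lower
    triangle; a NumPy sketch of the scheme described in the text."""
    n = M.shape[1]
    DM = D @ M
    S = np.zeros((n, n))
    for i in range(0, n, b):
        j = min(i + b, n)
        S[i:j, i:] = M[:, i:j].T @ DM[:, i:]   # block upper triangle of strip
        S[i:, i:j] = S[i:j, i:].T              # reflect (symmetry)
    return S

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 9))        # 4 pivots, front of order 9
D = np.diag(rng.standard_normal(4))    # block-diagonal D (all 1 x 1 here)
S = schur_update_strips(M, D, 5)
```

The wasted operations on each strip's diagonal block are exactly the b(b − 1)-type overhead discussed in Section 3.5.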
For an oxo pivot, the matrix DM has the form

$$DM = \begin{pmatrix} L_1 & \\ & U_1^T \end{pmatrix}^{-1}
\begin{pmatrix} A_5 & A_6 & 0 & 0 \\ 0 & A_7 & A_8 & 0 \end{pmatrix}
= \begin{pmatrix} M_5 & M_6 & 0 & 0 \\ 0 & M_7 & M_8 & 0 \end{pmatrix}, \qquad (2.25)$$

and the matrix M has the form

$$M = \begin{pmatrix} & D_1^{-1} \\ D_1^{-1} & \end{pmatrix}
\begin{pmatrix} M_5 & M_6 & 0 & 0 \\ 0 & M_7 & M_8 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & M_9 & M_{10} & 0 \\ M_{11} & M_{12} & 0 & 0 \end{pmatrix}. \qquad (2.26)$$

We may form the Schur update

$$\begin{pmatrix} 0 & B_2 & B_3 & 0 \\ B_2^T & B_1 & B_4 & 0 \\ B_3^T & B_4^T & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & M_{11}^T \\ M_9^T & M_{12}^T \\ M_{10}^T & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} M_5 & M_6 & 0 & 0 \\ 0 & M_7 & M_8 & 0 \end{pmatrix} \qquad (2.27)$$
by the calculations

$$(B_2 \;\; B_3) = M_{11}^T (M_7 \;\; M_8),$$

$$B_4 = M_{12}^T M_8, \qquad (2.28)$$

$$B_1 = (M_9^T \;\; M_{12}^T) \begin{pmatrix} M_6 \\ M_7 \end{pmatrix}.$$

We may use the BLAS 3 routine SGEMM directly for the first two
calculations. For the symmetric matrix $B_1$, we subdivide the computation into
strips, as for the full-pivot case.
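The block bookkeeping for the oxo case can be checked numerically. This NumPy sketch builds the blocks of (2.25) and (2.26) from random data (block sizes are arbitrary, and the trailing all-zero block column is dropped) and confirms that the three products of (2.28) reproduce the nonzero blocks of the update (2.27):

```python
import numpy as np

rng = np.random.default_rng(0)
s, c1, c2, c3 = 3, 2, 4, 2            # pivot half-size and block widths
M5 = rng.standard_normal((s, c1))
M6 = rng.standard_normal((s, c2))
M7 = rng.standard_normal((s, c2))
M8 = rng.standard_normal((s, c3))
D1inv = np.diag(1.0 / rng.uniform(1.0, 2.0, s))
M9, M10 = D1inv @ M7, D1inv @ M8      # as in Eq. (2.26)
M11, M12 = D1inv @ M5, D1inv @ M6

Z = np.zeros
DM = np.block([[M5, M6, Z((s, c3))],
               [Z((s, c1)), M7, M8]])
M = np.block([[Z((s, c1)), M9, M10],
              [M11, M12, Z((s, c3))]])
S = M.T @ DM                          # the Schur update, Eq. (2.27)

# The three calculations of Eq. (2.28):
B2B3 = M11.T @ np.hstack([M7, M8])
B4 = M12.T @ M8
B1 = np.hstack([M9.T, M12.T]) @ np.vstack([M6, M7])
```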
For a tile pivot, the sparsity in the first block column of the preceding
form is lost:

$$DM = \begin{pmatrix} L_1 & \\ U_2^T & U_1^T \end{pmatrix}^{-1}
\begin{pmatrix} A_5 & A_6 & 0 & 0 \\ 0 & A_7 & A_8 & 0 \end{pmatrix}. \qquad (2.29)$$


We therefore amalgamate the first two blocks to give

$$DM = \begin{pmatrix} M_6 & 0 & 0 \\ M_7 & M_8 & 0 \end{pmatrix} \qquad (2.30)$$

$$M = \begin{pmatrix} -D_1^{-1} D_2 D_1^{-1} & D_1^{-1} \\ D_1^{-1} & 0 \end{pmatrix}
\begin{pmatrix} M_6 & 0 & 0 \\ M_7 & M_8 & 0 \end{pmatrix}
= \begin{pmatrix} M_9 & M_{10} & 0 \\ M_{12} & 0 & 0 \end{pmatrix}. \qquad (2.31)$$

The Schur update calculation is therefore as in the oxo case except that the
first block row and column are not present, and we have only the calculations
for $B_1$ and $B_4$.

2.4 Solution
The solution is conveniently performed in two stages. The first, forward
substitution, consists of solving

$$(P L P^T) y = b, \qquad (2.32)$$

and the second, back-substitution, consists of solving

$$(P D L^T P^T) x = y. \qquad (2.33)$$

For the first step of the forward substitution, let

$$P^{-1} b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}, \qquad (2.34)$$

where $b_1$ corresponds to the block pivot, and $b_2$ corresponds to the rest of
the first front. We need to solve the equation

$$\begin{pmatrix} L & & \\ M^T & I & \\ 0 & 0 & I \end{pmatrix}
\begin{pmatrix} b_1' \\ b_2' \\ b_3' \end{pmatrix}
= \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}. \qquad (2.35)$$

This involves the forward substitution

$$L b_1' = b_1 \qquad (2.36)$$

and the modification

$$b_2' = b_2 - M^T b_1'. \qquad (2.37)$$

In the case of a full pivot, we can employ the Level 2 BLAS routine STPSV
in Eq. (2.36) (we pack the triangular array L to save storage) and the Level
2 BLAS routine SGEMV in Eq. (2.37). For a structured pivot, it is slightly

more complicated. Now L has the form

$$\begin{pmatrix} L_1 & \\ U_2^T & U_1^T \end{pmatrix},$$

so solving Eq. (2.36) requires two applications of STRSV (here, we pack the
arrays $L_1$, $D_1$ and $U_1$ together in a square array) and one application of
STPMV. Also, M has the form

$$\begin{pmatrix} M_5 & M_6 & 0 & 0 \\ 0 & M_7 & M_8 & 0 \end{pmatrix},$$

so two applications of SGEMV are needed for Eq. (2.37).


Similar considerations apply to the other steps and to the
back-substitution, as well.
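A single block step of the forward substitution, Eqs. (2.36) and (2.37), can be sketched as follows. MA47 uses the Level 2 BLAS (STPSV on the packed triangle, SGEMV); this NumPy sketch uses a generic triangular solve and a matrix-vector product instead, and the function name is ours:

```python
import numpy as np

def forward_block_step(L, M, b1, b2):
    """One block step of forward substitution:
    solve L b1' = b1 (Eq. 2.36), then b2' = b2 - M^T b1' (Eq. 2.37)."""
    y1 = np.linalg.solve(L, b1)          # STPSV in the Fortran code
    y2 = b2 - M.T @ y1                   # SGEMV in the Fortran code
    return y1, y2

L = np.array([[2.0, 0.0], [1.0, 3.0]])   # lower triangular pivot factor
M = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
b1 = np.array([2.0, 4.0])
b2 = np.array([1.0, 1.0, 1.0])
y1, y2 = forward_block_step(L, M, b1, b2)
```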

2.5 Recognition of Supervariables


The analyze phase recognizes sets of structurally identical rows in the
original matrix and forms supervariables. With careful choice of data
structures, the whole process can be made to execute in O(n + t) time,
where t is the number of entries. We work progressively so that after j
steps we have the supervariable structure for the submatrix of the first j
columns. We start with all variables in one supervariable (for the
submatrix with no columns), then split it into two according to which rows do or
do not have an entry in column 1, then split these according to the entries
in column 2, and so on. The splitting is done by moving the variables one at
a time to the new supervariable.
The variables of each supervariable are held in a circular chain. We use
two integer arrays of length n to hold links from each variable to the next
and previous variable in its chain. This allows rapid removal of a variable
from a supervariable and rapid insertion of a variable in the chain that
holds another variable. We begin with all the variables linked in a single
chain.
We use other integer arrays of length n as follows:

—svar(i) is the index of the supervariable to which variable i belongs.


—flag is initially set to zero. flag(s) is set to j when supervariable s is
encountered in column j.
—var(s) holds the index of a variable that was in supervariable s.

Using this data structure, the details of our algorithm are as follows:
for j := 1 step 1 until n do
  for each entry i in column j do
    s := svar(i)                ! s is i's old supervariable
    if flag(s) < j then         ! first occurrence of s for column j
      flag(s) := j
      if s has more than one variable then
        remove i from s
        create a new supervariable ns with i as its only variable
        svar(i) := ns
        var(s) := i
      end if
    else                        ! second or later occurrence of s for column j
      k := var(s)               ! k is the first variable of s encountered in column j
      svar(i) := svar(k)
      move i from its present chain to the chain containing k
    end if
  end do
end do

The elements of var that do not correspond to supervariables may be


employed to hold a chain of indices not currently in use for supervariables.
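The splitting algorithm can be sketched in Python. For brevity the circular chains are replaced by per-supervariable size counters, which is enough to return the partition and preserves the O(n + t) bound; variable names follow the text:

```python
def supervariables(n, columns):
    """Partition variables 0..n-1 into supervariables: variables whose
    row patterns over `columns` are identical get equal svar values.
    A Python sketch of the column-by-column splitting described above."""
    svar = [0] * n                 # supervariable of each variable
    size = {0: n}                  # number of variables per supervariable
    flag = {0: -1}                 # last column in which s was encountered
    first = {}                     # var(s): first variable split off from s
    next_sv = 1
    for j, col in enumerate(columns):   # columns[j] = row indices of column j
        for i in col:
            s = svar[i]
            if flag[s] < j:        # first occurrence of s in column j
                flag[s] = j
                if size[s] > 1:    # split: i starts a new supervariable
                    size[s] -= 1
                    svar[i] = next_sv
                    size[next_sv] = 1
                    flag[next_sv] = j
                    first[s] = i
                    next_sv += 1
            else:                  # later occurrence: join the first mover
                k = first[s]
                size[s] -= 1
                svar[i] = svar[k]
                size[svar[k]] += 1
    return svar

# Rows 0 and 1 share the pattern {0, 1}; rows 2, 3, 4 all differ.
sv = supervariables(5, [[0, 1, 2], [0, 1], [3]])
```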

3. NUMERICAL EXPERIENCE

3.1 Introduction
In this section, we examine the performance of the MA47 code on a range of
test problems on a SUN SPARCstation 10 and a CRAY Y-MP. We study the
influence of various parameter settings on the performance of MA47 and
determine the values for the defaults. We also compare the MA47 code with
the code MA27.
As always, the selection of test problems is a compromise between
choosing sufficient problems to obtain meaningful statistics while keeping
run times and this section of the report manageable. In Tables I and II, we
list the test problems used for the numerical experiments in this section. In
choosing this set, many runs were performed on other matrices, so we do
feel that this selection is broadly representative of a far larger set. Our set
of 22 matrices can be divided into three distinct sets. The first (matrices 1
to 10), in Table I, are obtained from the CUTE collection of nonlinear
optimization problems [Bongartz et al. 1995]. The problems in that set are
parameterized, and we have chosen the parameters to obtain a linear
problem of order between about 1000 and 20,000. In each case, we solve a
linear system whose coefficient matrix is the Kuhn-Tucker matrix of the
form

$$\begin{pmatrix} H & A \\ A^T & 0 \end{pmatrix},$$

where H is an m × m symmetric matrix, and A is of dimension m × n. We


are grateful to Ali Bouaricha and Nick Gould for extracting these matrices

Table I. CUTE Matrices Used for Performance Testing

Case   Identifier   Order m   Order m + n   Number of Entries   Description
1. BRITGAS 3102 5802 15282 British gas pipe network distribution
problem.
2. BIGBANK 2230 3342 8056 Nonlinear network problem.
3. MINPERM 8347 16551 16863 Minimize the permanent of a doubly
stochastic matrix.
4. SVANBERG 14000 21000 91000 Structural optimization.
5. BRATU2D 5184 10084 29684 Finite-difference discretization of
nonlinear PDE on unit square.
6. BRATU3D 4913 8288 28538 Finite-difference discretization of
nonlinear PDE on unit cube.
7. GRIDNETC 7564 11408 30256 A nonlinear network problem on a square
grid.
8. QPCSTAIR 614 970 4617 STAIR LP with additional convex Hessian.
9. KSIP 1021 2022 22023 Discretization of a semiinfinite QP.
10. AUG3DQP 3873 4873 10419 QP from nine-point formulation of 3D
PDE.

Table II. Matrices Used for Augmented Systems

Case   Identifier   Order m   Order m + n   Number of Entries   Description
11/17 FFFFF800 1028 1552 7429 LP. Oil industry.
12/18 PILOT 4860 6301 40235 LP. Energy model from Stanford.
13/19 ORSIRR 2 886 1772 6861 Oil reservoir simulation.
14/20 JPWH 991 991 1982 7018 Circuit physics modeling.
15/21 BCSSTK27 1224 2448 29899 Structural engineering. Buckling
analysis.
16/22 NNC1374 1374 2748 9980 Nuclear reactor core modeling.

Matrices 11–16 Have I as Leading Block and Matrices 17–22 Have D.

for us from CUTE. The second and third sets are obtained by forming
augmented systems of the form

$$\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} \quad \text{and} \quad
\begin{pmatrix} D & A \\ A^T & 0 \end{pmatrix}, \quad \text{with} \quad
D = \begin{pmatrix} I & \\ & 0 \end{pmatrix},$$

respectively, where the matrix A is from the Harwell-Boeing Collection
[Duff et al. 1992] or the netlib LP collection [Gay 1985]. The matrix D has
m − n unit diagonal entries and n zeros on the diagonal. We use the six
matrices from these collections that are shown in Table II, which gives us
another 12 test cases according to the two forms of augmentation previ-
ously shown. In all cases, the dimensions are given in the tables and are
the number of rows in A and the total order of the augmented system. The
number of entries is the total number in the upper triangular part of the
augmented matrix.
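Constructing these augmented test matrices is mechanical; the following dense NumPy sketch illustrates it (the codes themselves hold only the sparse upper triangle, and the function name is ours):

```python
import numpy as np

def augmented(A, leading="I"):
    """Build ( I  A ; A^T  0 ) or, with leading="D", ( D  A ; A^T  0 ),
    where D has m - n unit diagonal entries followed by n zeros.
    A dense sketch of the augmented systems used for testing."""
    m, n = A.shape
    top = np.eye(m)
    if leading == "D":
        top[m - n:, m - n:] = 0.0        # the n zero diagonal entries
    return np.block([[top, A], [A.T, np.zeros((n, n))]])

A = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])   # a 3 x 2 example
K = augmented(A, leading="D")            # order m + n = 5
```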


Before conducting a systematic study of the parameters, we first
experimented to see the effect of using the extended Markowitz cost computed in
the analyze phase (see Eqs. (2.17) – (2.19)) to decide whether to allow
prospective pivots in the factorize phase. The logic of this scheme, which
was proposed by Duff et al. [1991], is that changes to the pivotal sequence
from the analyze phase may mean that later pivots in their recommended
position in the pivotal sequence would be poor choices. There is, however, a
counterbalancing effect whereby even if the pivot is very much poorer on
sparsity grounds than predicted, it may be worse to delay using it because
of further buildup of unanticipated fill-in. In effect, it may be better to “bite
the bullet” and take the bad pivot early rather than late.
When we tested this option on the examples in Table I, we found that
there was usually little to choose between including the Markowitz test or
not. However, in most cases it was slightly worse to use the Markowitz test,
and in one case, the factorization time was increased by nearly a factor of
14 and the total storage for the factorization by nearly a factor of 6.
Because there were no examples so dramatically favoring the Markowitz
test, we have decided to drop it from the code. We have, however, left in the
structure for the test and have only commented out the test itself. We
believe that possible future changes to the way we handle fully summed
blocks might make the test useful in a later version of the code.
In our program for running the numerical experiments, we have an
option to prescale the matrices using Harwell Subroutine MC30. In nearly
all the cases, there is very little difference in performance or accuracy
whether scaling is used or not. There were, however, two cases where
additional pivoting on the unscaled systems caused a significant increase in
time and storage for the factorization, with only one case significantly the
other way. We feel more comfortable assessing the performance on scaled
systems, so we use this option in all the runs in this article.
In the following subsections, we examine the relative performance when
a single parameter is changed by means of the median, upper-quartile, and
lower-quartile ratios over the 22 test problems. We use these values rather
than means and variances to give some protection against stray results
caused either by the timer or by particular features of the problems. We
remind the reader that half the results lie between the quartile values. Full
tables of ratios are available by anonymous ftp from numerical.cc.rl.ac.uk
(130.246.8.23) in the file pub/reports/ma47.tables. In Section 3.2, we
consider the effect of choosing the option to restrict the pivot choice to a small
number of columns, and we examine the effect of altering the pivot
threshold in Section 3.3. We study the effect of changing the amount of
node amalgamation in Section 3.4. In Sections 3.5 and 3.6, we examine the
use of higher level BLAS: the Level 3 BLAS in the factorization in the first
section and the use of Level 2 BLAS in the solution phase in the second.
Finally, in Section 3.7, we show the performance of our code with default
parameter values on the test problems and compare it with the Harwell
Subroutine Library code MA27.

Table III. Results with Search Limit of 2 Rows Divided by Those with Limit of 4

Total Storage   Storage for Factors   Time (SUN)

Predicted Actual Predicted Actual Analyze Factorize Solve One-Off


lower q. 0.97 1.00 0.98 1.00 0.95 0.98 0.97 0.97
median 1.00 1.00 1.00 1.00 0.99 1.00 1.00 1.00
upper q. 1.00 1.04 1.00 1.02 1.01 1.06 1.03 1.03

Table IV. Results with Search Limit of 4 Rows Divided by Those with Limit of 10

Total Storage   Storage for Factors   Time (SUN)

Predicted Actual Predicted Actual Analyze Factorize Solve One-Off


lower q. 1.00 0.97 1.00 0.99 0.98 0.96 0.99 0.97
median 1.00 1.00 1.00 1.00 0.99 1.01 1.00 1.01
upper q. 1.08 1.09 1.02 1.01 1.01 1.10 1.03 1.07

Table V. Results with Search Limit of 10 Divided by Those with No Search Limit

Total Storage   Storage for Factors   Time (SUN)

Predicted Actual Predicted Actual Analyze Factorize Solve One-Off


lower q. 1.00 1.00 0.97 1.00 0.73 0.95 0.96 0.94
median 1.00 1.00 1.00 1.00 0.95 1.00 0.98 0.99
upper q. 1.06 1.76 1.00 1.35 1.00 1.86 1.28 1.43

3.2 Effect of Restricting Pivot Selection


If we restrict the number of rows that we search to find a pivot, we might
expect to reduce the time for the analyze phase at the cost, perhaps, of
more storage and time in the factorization. Clearly, the choice depends on
the relative importance of the phases. We show the results of runs varying
the search limit in Tables III to V. The times in these tables are in seconds
on the SUN SPARCstation 10.
The medians are near 1.0, showing slight gains to the analyze time by
restricting the pivot search at a cost of a slightly more expensive factoriza-
tion. The costs for a “one-off ” run of a single analysis followed by one
factorization and solution show that there is really quite a balance in the
competing trends. The upper-quartile figures markedly support having no
search limit (i.e., using a Markowitz count), so we have decided to keep that
as the default and use it for the later runs in this article.

3.3 Effect of Change in Pivot Threshold


In MA27, a value of 0.1 was chosen for the default value of the threshold
parameter u (see Eqs. (2.20) and (2.21)), because smaller values gave little
appreciable benefit to sparsity in the experiments that we conducted at

Table VI. Results with Threshold u Set to 0.5 Divided by Those with u 5 0.1

Storage Required Time (SUN)

Total Factors Analyze Factorize Solve


lower q. 1.00 1.00 1.00 1.08 1.01
median 1.14 1.09 1.00 1.78 1.07
upper q. 2.14 1.53 1.00 3.32 1.42

Table VII. Results with Threshold u Set to 0.1 Divided by Those with u 5 0.01

Storage Required Time (SUN)

Total Factors Analyze Factorize Solve


lower q. 1.00 1.00 0.96 0.99 1.00
median 1.02 1.05 0.98 1.07 1.03
upper q. 1.28 1.23 1.00 1.97 1.22

Table VIII. Results with Threshold u Set to 0.01 Divided by Those with u 5 0.001

Storage Required Time (SUN)

Total Factors Analyze Factorize Solve


lower q. 1.00 1.00 0.94 1.00 0.98
median 1.00 1.03 0.99 1.11 1.02
upper q. 1.14 1.16 1.03 1.63 1.12

Table IX. Results with Threshold u Set to 0.001 Divided by Those with u 5 0.0001

Storage Required Time (SUN)

Total Factors Analyze Factorize Solve


lower q. 1.00 1.00 1.00 1.03 1.02
median 1.00 1.00 1.03 1.08 1.07
upper q. 1.07 1.05 1.08 1.42 1.10

Table X. Results with Threshold u Set to 0.0001 Divided by Those with u 5 0.00001

Storage Required Time (SUN)

Total Factors Analyze Factorize Solve


lower q. 1.00 1.00 0.99 1.00 0.99
median 1.00 1.00 1.00 1.01 1.00
upper q. 1.01 1.02 1.00 1.09 1.03

that time. However, the more complicated data structures in MA47 and the
greater penalties for not being able to follow the pivotal sequence recom-
mended by the analyze phase penalize higher values of u to a greater
extent than in the earlier code. We investigate this in Tables VI to X.
We also, of course, monitored the numerical performance for all these
runs. Although the results from using a threshold value of 0.5 were better

Table XI. Results with No Amalgamation Divided by Those with Parameter 5

Total Storage   Storage for Factors   Time

Predicted Actual Predicted Actual Analyze Factorize Solve One-Off


CRAY lower q. 0.78 0.88 0.77 0.84 1.00 1.00 0.73 1.00
median 0.95 0.95 0.87 0.92 1.01 1.13 0.83 1.03
upper q. 1.00 1.00 1.00 1.00 1.03 1.37 1.00 1.10
SUN lower q. 0.78 0.90 0.77 0.84 1.00 1.00 0.82 1.00
median 0.95 0.95 0.87 0.92 1.02 1.03 0.92 1.01
upper q. 1.00 1.00 1.00 1.00 1.03 1.11 1.00 1.07

Table XII. Results with Amalgamation Parameter Set to 5 Divided by Those with
Parameter 10

Total Storage   Storage for Factors   Time

Predicted Actual Predicted Actual Analyze Factorize Solve One-Off


CRAY lower q. 0.84 0.85 0.80 0.81 1.00 1.00 1.00 1.00
median 0.91 0.96 0.87 0.92 1.00 1.02 1.11 1.01
upper q. 1.00 1.00 1.00 1.00 1.01 1.14 1.25 1.05
SUN lower q. 0.84 0.86 0.80 0.81 0.98 0.95 0.91 0.96
median 0.91 0.96 0.87 0.92 0.98 0.97 0.96 0.98
upper q. 1.00 1.00 1.00 1.00 1.00 1.00 1.01 0.99

Table XIII. Results with Amalgamation Parameter Set to 10 Divided by Those with
Parameter 20

Total Storage   Storage for Factors   Time

Predicted Actual Predicted Actual Analyze Factorize Solve One-Off


CRAY lower q. 0.82 0.83 0.77 0.80 1.00 1.00 1.00 1.00
median 0.89 0.94 0.86 0.88 1.00 1.00 1.10 1.00
upper q. 1.00 1.00 1.00 1.00 1.01 1.05 1.22 1.01
SUN lower q. 0.82 0.83 0.77 0.80 1.00 0.87 0.91 0.98
median 0.89 0.94 0.86 0.87 1.02 0.98 0.98 1.00
upper q. 1.00 1.00 1.00 1.00 1.05 1.03 1.00 1.03

than 0.1 for a couple of test problems, notably the NNC1374 example, it
was substantially more expensive to use such a high value for the thresh-
old. For lower values of the threshold, the scaled residual was remarkably
flat for all the test problems until values of the threshold less than $10^{-6}$,
when poorer results were obtained on three of the examples.
From the performance figures in Tables VI to X, the execution times and
storage decline almost monotonically, but have begun to level off by about
0.0001 (although there are a couple of outliers). However, we are anxious
not to compromise stability and recognize that our numerical experience is
necessarily limited. We have therefore decided to choose as default a

Table XIV. Times with Block Size Parameter b Set to 1 Divided by Those with b 5 5

Analyze Factorize Solve One-Off


CRAY lower q. 1.00 1.01 0.99 1.00
median 1.00 1.05 1.00 1.01
upper q. 1.00 1.14 1.00 1.03
SUN lower q. 1.00 1.00 0.98 0.99
median 1.01 1.02 1.00 1.01
upper q. 1.01 1.06 1.02 1.03

threshold value of 0.001 because for many problems, much of the sparsity
benefit has been realized at this value.

3.4 Effect of Node Amalgamation


The node amalgamation parameter, discussed at the end of Section 2.2,
controls the amalgamation of neighboring nodes in the assembly tree to
obtain larger blocks (and more eliminations within each node) at the cost of
extra fill-in and arithmetic. No node is amalgamated with another unless
its number of variables is less than the parameter value. This feature was
also present in the MA27 code. One intention of performing more amalgam-
ations is that there should be more scope for the use of Level 3 BLAS,
which might be expected to benefit platforms with efficient Level 3 BLAS
kernels.
We show the results of running with various levels of amalgamation in
Tables XI to XIII. As expected, there is a difference of performance between
the two machines. On the CRAY, a higher amount of amalgamation is
beneficial. On the SUN, there is a slight gain from some amalgamation, but
the effect is reversed before the amalgamation parameter reaches 10. In
the interests of choosing a default value that is satisfactory on several
platforms (even if not optimal), we have chosen the value 5. Note that
people running extensively on a vector machine, like a CRAY, may wish to
increase this (say, to 20).

3.5 Effect of Change of Block Size for Level 3 BLAS During Factorization
A major benefit of multifrontal methods is that the floating-point arith-
metic is performed on dense submatrices. In particular, if we perform
several pivot steps on a particular frontal matrix, Level 3 BLAS can be
used. However, in the present case, we also wish to maintain symmetry,
and the current Level 3 BLAS suite does not have an appropriate kernel.
We thus, as discussed in Section 2.3, need to split the frontal matrix into
strips, starting at rows 1, b + 1, 2b + 1, . . . , so that we can use Level 3 BLAS
without doubling the arithmetic count. In fact, in a block of size b on the
diagonal, the extra work is bp(b − 1) floating-point operations, where p is the
number of pivots in the block pivot. Clearly this
means that, while we would like to increase b for Level 3 BLAS efficiency,
by doing so we increase the amount of arithmetic. In this section, we
examine the tradeoff between these competing trends.

Table XV. Times with Block Size Parameter b Set to 5 Divided by Those with b 5 10

Analyze Factorize Solve One-Off


CRAY lower q. 1.00 0.95 1.00 0.98
median 1.00 0.98 1.00 0.99
upper q. 1.00 1.00 1.00 1.00
SUN lower q. 0.99 0.99 0.99 0.99
median 1.00 0.99 1.00 0.99
upper q. 1.01 1.01 1.01 1.00

Table XVI. Times with Block Size Parameter b Set to 10 Divided by Those with b 5 20

Analyze Factorize Solve One-Off


CRAY lower q. 1.00 0.73 1.00 0.91
median 1.00 0.90 1.00 0.97
upper q. 1.00 1.00 1.00 1.00
SUN lower q. 0.98 0.92 0.98 0.94
median 1.00 0.96 0.99 0.98
upper q. 1.01 0.99 1.00 1.00

Table XVII. Times with Block Size Parameter b Set to 5 Divided by Those with b > n

Analyze Factorize Solve One-Off


CRAY lower q. 1.00 0.97 1.00 0.99
median 1.00 0.99 1.00 1.00
upper q. 1.00 1.01 1.02 1.00
SUN lower q. 1.00 0.97 0.98 0.99
median 1.00 1.00 1.00 1.00
upper q. 1.03 1.02 1.03 1.01

We show results for various values of the block size parameter b in
Tables XIV to XVI. It would seem, from these results, that a modest value
is best, and we choose 5 as the default value on the basis of these figures.
We show in Table XVII a comparison between results with b = 5 and b > n,
which corresponds to no blocking and the use of simple Fortran code.
Interestingly, the results are almost identical.

3.6 Effect of Change of Block Size for Level 2 BLAS During Solution
In a single block pivot stage of the solution phase, one can use indirect
addressing for every operation. Alternatively, one can load the appropriate
entries of the right-hand-side vector into a small full vector corresponding
to the rows in the current front, update this vector with Level 2 BLAS
operations, and finally scatter it back to the full vector.
We have experimented with the parameter that determines whether to
use indirect or direct addressing in the solution phase. Direct addressing is
used (and Level 2 BLAS called) if the number of pivots at a step is more
than this parameter. Thus, for high values of the parameter, there will be

Table XVIII. Ratios of Solve Times with Different Values for the Parameter for Indirect
Addressing

Parameter Ratio 1:2 2:4 4:8 8:16


CRAY lower q. 1.00 1.00 1.00 1.00
median 1.04 1.03 1.03 1.00
upper q. 1.10 1.06 1.13 1.07
SUN lower q. 0.97 0.99 0.94 0.94
median 1.00 1.00 0.98 0.98
upper q. 1.02 1.03 1.06 1.04

Table XIX. Performance of Runs of MA47 Code on the SUN SPARC-10

Storage (millions of words)

Total For Factors Time

Case Predicted Actual Predicted Actual Analyze Factorize Solve


1. BRITGAS 0.52 0.59 0.28 0.34 12.40 10.96 0.16
2. BIGBANK 0.09 0.09 0.07 0.07 1.22 0.37 0.04
3. MINPERM 0.12 0.12 0.09 0.09 0.60 0.45 0.13
4. SVANBERG 1.07 0.89 0.94 0.77 12.43 3.17 0.36
5. BRATU2D 1.22 1.22 0.95 0.95 80.35 8.29 0.33
6. BRATU3D 4.09 3.94 2.32 2.18 18.00 70.53 0.65
7. GRIDNETC 0.43 0.43 0.36 0.36 1.62 1.45 0.19
8. QPCSTAIR 0.13 0.14 0.07 0.08 0.44 0.80 0.03
9. KSIP 0.14 0.70 0.11 0.20 4.68 19.28 0.07
10. AUG3DQP 0.29 0.29 0.20 0.20 0.70 1.16 0.09
11. FFFFF800 0.18 0.17 0.12 0.09 1.24 1.08 0.04
12. PILOT 1.58 1.74 0.80 0.84 9.80 15.07 0.28
13. ORSIRR 2 0.44 0.44 0.24 0.25 1.48 2.87 0.08
14. JPWH 991 0.84 0.94 0.43 0.35 1.57 6.99 0.11
15. BCSSTK27 0.18 0.18 0.10 0.10 0.94 0.21 0.04
16. NNC1374 0.38 1.32 0.32 0.48 1.88 29.89 0.16
17. FFFFF800 0.04 0.06 0.03 0.04 0.77 1.29 0.02
18. PILOT 0.29 0.29 0.16 0.19 4.97 2.76 0.10
19. ORSIRR 2 0.28 0.93 0.13 0.30 1.21 25.14 0.11
20. JPWH 991 0.57 0.99 0.14 0.26 2.37 24.08 0.10
21. BCSSTK27 0.17 0.17 0.10 0.10 0.98 0.22 0.04
22. NNC1374 0.11 0.25 0.09 0.14 2.47 5.54 0.06

less use of Level 2 BLAS. We show a summary of our results in Table
XVIII. As can be seen, the results are quite flat. On the largest of our
XVIII. As can be seen, the results are quite flat. On the largest of our
problems, there was some gain by using a value of 4 for the Level 2
blocking, and so we have chosen that as our default.

3.7 Performance of MA47 and Comparison with MA27


In the past five subsections, we have considered the effect of various
controlling parameters on the performance of MA47. We now examine the
performance of our code with the default values for the parameters on both
the SUN SPARCstation 10 and the CRAY Y-MP. The storage counts and

Table XX. Performance of Runs of MA47 Code on the CRAY Y-MP

Storage (millions of words)

Total For Factors Time

Case Predicted Actual Predicted Actual Analyze Factorize Solve


1. BRITGAS 0.52 0.59 0.28 0.35 9.63 3.46 0.06
2. BIGBANK 0.09 0.09 0.07 0.07 0.99 0.19 0.03
3. MINPERM 0.12 0.12 0.09 0.09 16.27 0.35 0.09
4. SVANBERG 1.07 1.07 0.94 0.95 10.06 1.63 0.27
5. BRATU2D 1.22 1.22 0.95 0.95 75.64 1.79 0.12
6. BRATU3D 4.09 3.94 2.32 2.18 16.69 7.17 0.11
7. GRIDNETC 0.43 0.43 0.36 0.36 1.59 0.71 0.15
8. QPCSTAIR 0.13 0.14 0.07 0.08 0.39 0.28 0.01
9. KSIP 0.14 0.70 0.11 0.20 6.24 2.97 0.02
10. AUG3DQP 0.29 0.29 0.20 0.20 0.58 0.41 0.06
11. FFFFF800 0.18 0.17 0.12 0.09 1.16 0.35 0.02
12. PILOT 1.58 1.74 0.80 0.84 9.06 2.78 0.08
13. ORSIRR 2 0.44 0.44 0.24 0.25 1.38 0.56 0.02
14. JPWH 991 0.84 0.94 0.43 0.35 1.30 1.25 0.02
15. BCSSTK27 0.18 0.18 0.10 0.10 0.91 0.16 0.02
16. NNC1374 0.38 1.32 0.32 0.47 1.51 2.43 0.03
17. FFFFF800 0.04 0.06 0.03 0.04 0.78 0.68 0.01
18. PILOT 0.29 0.29 0.16 0.19 3.94 1.63 0.05
19. ORSIRR 2 0.28 0.93 0.13 0.30 1.17 3.38 0.02
20. JPWH 991 0.57 0.99 0.14 0.26 2.39 4.92 0.02
21. BCSSTK27 0.17 0.17 0.10 0.10 0.92 0.17 0.02
22. NNC1374 0.11 0.23 0.09 0.14 2.00 1.51 0.03

times for the problems of Tables I and II are shown in Tables XIX and XX.
It was our original intention that this new MA47 code would replace MA27
in the Harwell Subroutine Library. However, the added complexity of the
new code penalizes it when it is unable to take advantage of the structure.
We thus might expect that sometimes MA47 would be better, and sometimes
MA27. We illustrate this by showing the comparison ratios for the two
codes in Tables XXI and XXII.
We show the full results in these tables, as well as the medians and
quartiles, because the performance varies very widely. On the SUN, the
new code factorizes matrix 18 (PILOT) over 60 times faster than MA27, but
is nearly 7 times slower than MA27 on matrix 1 (BRITGAS). With the
exception of matrix 18, the analyze-phase times are always greater for the
new code, once by over a factor of 80 (matrix 5, BRATU2D). The variation is
only slightly less dramatic on the CRAY.
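The medians and quartiles quoted in the tables are simple summaries of the per-problem ratios; the sketch below shows the bookkeeping, using hypothetical timings rather than the paper's data (quartile conventions differ slightly between statistical packages, so values may not match the tables exactly):

```python
import numpy as np

def ratio_summary(ma47_times, ma27_times):
    """Lower quartile, median, and upper quartile of the per-problem
    MA47/MA27 ratios, as reported at the foot of Tables XXI and XXII.
    Inputs here are hypothetical timings, not the paper's data."""
    r = np.asarray(ma47_times, dtype=float) / np.asarray(ma27_times, dtype=float)
    lq, med, uq = np.percentile(r, [25, 50, 75])
    return lq, med, uq
```

A ratio above 1 means MA47 used more time (or storage) than MA27 on that problem, so a median near 1 with wide quartiles is exactly the "sometimes better, sometimes worse" behaviour described in the text.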
We have also run the codes on a set of ten Harwell-Boeing matrices with
nonzero diagonal entries (BCSSTK14/15/16/17/18/26/27/28 and BCSSTM26/27),
and the results are summarized in Table XXIII. On the SUN, MA47 just
outperforms MA27 for the factorize and solve phases, but otherwise is
inferior. We therefore recommend that where the matrix has nonzeros on
the diagonal, MA27 should continue to be used. We plan to make a new
version of MA27 that incorporates the BLAS and some other minor
improvements. We anticipate that the revised MA27 will always outperform
MA47 on this kind of matrix.

Table XXI. Ratio of Performance of MA47 Code to MA27 Code on the SUN SPARC-10

             Total      Storage for
             Storage    Factors                  Time
Case         Predicted  Predicted  Actual  Analyze  Factorize  Solve  One-Off

1. BRITGAS 1.90 1.11 1.25 29.74 7.66 1.47 12.04
2. BIGBANK 1.49 1.31 1.29 4.52 1.86 1.69 3.31
3. MINPERM 1.01 1.12 1.12 1.85 1.31 2.95 1.66
4. SVANBERG 0.96 1.36 1.11 2.75 1.03 1.16 2.02
5. BRATU2D 1.36 1.15 0.47 83.26 0.23 0.55 2.36
6. BRATU3D 1.48 1.10 0.48 10.23 0.26 0.47 0.32
7. GRIDNETC 1.56 1.37 1.38 2.72 1.29 1.47 1.77
8. QPCSTAIR 2.52 1.57 1.64 3.52 3.17 1.53 3.20
9. KSIP 1.30 0.96 0.44 15.00 0.58 0.43 0.72
10. AUG3DQP 1.86 1.46 1.46 2.33 1.25 1.46 1.52
11. FFFFF800 1.80 1.63 1.01 3.00 1.42 1.09 1.95
12. PILOT 1.55 1.20 1.15 1.25 0.99 1.00 1.08
13. ORSIRR 2 1.61 1.15 0.42 4.11 0.26 0.52 0.38
14. JPWH 991 2.04 1.48 0.43 3.29 0.24 0.47 0.30
15. BCSSTK27 0.80 0.45 0.27 1.41 0.07 0.41 0.31
16. NNC1374 2.43 2.19 1.10 5.88 3.39 1.12 3.44
17. FFFFF800 0.41 0.40 0.39 1.83 1.14 0.59 1.31
18. PILOT 0.28 0.25 0.15 0.65 0.06 0.24 0.15
19. ORSIRR 2 1.04 0.61 0.50 3.39 2.31 0.72 2.32
20. JPWH 991 1.40 0.47 0.31 4.97 0.96 0.46 1.03
21. BCSSTK27 0.76 0.45 0.27 1.44 0.07 0.44 0.32
22. NNC1374 0.70 0.59 0.81 7.79 6.72 1.13 6.74
lower q. 0.96 0.59 0.42 1.85 0.26 0.47 0.38
median 1.44 1.14 0.66 3.34 1.09 0.86 1.59
upper q. 1.80 1.37 1.15 5.88 1.86 1.46 2.36
In these comparisons, we have used the same parameter settings (where
applicable) for both codes. In particular, we have run MA27 with a
threshold value u of 0.001, which is 100 times smaller than the default
value for MA27. However, it appears empirically that the stability of
MA27 is more sensitive to the threshold value than that of MA47, so we
have also compared MA47 with MA27 using its default threshold. A summary
of the results in Table XXIV indicates that, in general, MA47 with its
default outperforms MA27 with its default on the SUN, but the variation
in relative performance over different problem classes is still
substantial. The roles are reversed on the CRAY, however, because of the
greater penalty there for the extra integer manipulation in MA47; we were
disappointed to find that this outweighed the benefits of the BLAS.
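The threshold u referred to here is the standard relative pivot test for sparse symmetric indefinite factorization: a candidate 1x1 pivot is accepted only if it is not too small relative to the rest of its column in the frontal matrix. A minimal sketch of the criterion (illustrative only, not MA47's actual test, which also considers 2x2 pivots):

```python
import numpy as np

def accept_1x1_pivot(F, k, u):
    """Relative threshold test on the symmetric frontal matrix F:
    accept the diagonal entry f_kk as a 1x1 pivot only if
        |f_kk| >= u * max_{i != k} |f_ik|.
    A smaller u (e.g., 0.001) accepts more pivots, favouring sparsity;
    a larger u gives a stronger bound on element growth.
    Illustrative sketch only, not MA47's implementation."""
    col = np.abs(F[:, k]).copy()
    col[k] = 0.0                      # exclude the diagonal itself
    off = col.max() if col.size > 1 else 0.0
    return abs(F[k, k]) >= u * off
```

This makes the trade-off discussed above concrete: a pivot of 0.01 against an off-diagonal of 1.0 passes with u = 0.001 but is rejected with the stricter u = 0.1, forcing the code to look elsewhere (and potentially lose sparsity).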
In summary, these results indicate that it is “horses for courses.” MA47
does well on our augmented systems when the (1,2) block is nearly square
or when several of the diagonal entries of the (1,1) block are zero.
However, the efficiency of MA47 is very dependent on the details of the
assembly tree structure, so it is often difficult to judge the relative
performance in advance. We find this fragility to be the most disturbing
aspect of the present code and hope that further work will improve the
performance of MA47 on the “bad” cases.

Table XXII. Ratio of Performance of MA47 Code to MA27 Code on the CRAY Y-MP

             Total      Storage for
             Storage    Factors                  Time
Case         Predicted  Predicted  Actual  Analyze  Factorize  Solve  One-Off

1. BRITGAS 1.90 1.11 1.28 23.21 9.40 2.22 16.24
2. BIGBANK 1.49 1.31 1.29 3.08 1.70 1.89 2.69
3. MINPERM 1.01 1.12 1.12 59.17 1.42 2.53 30.05
4. SVANBERG 0.96 1.36 1.37 1.68 1.29 2.14 1.62
5. BRATU2D 1.36 1.15 0.47 71.49 0.51 1.93 16.71
6. BRATU3D 1.48 1.10 0.48 7.79 0.46 1.29 1.34
7. GRIDNETC 1.56 1.37 1.38 2.70 1.49 2.11 2.16
8. QPCSTAIR 2.52 1.57 1.64 2.35 3.03 1.83 2.57
9. KSIP 1.30 0.96 0.44 14.83 0.30 1.20 0.88
10. AUG3DQP 1.86 1.46 1.46 1.91 1.62 1.97 1.79
11. FFFFF800 1.80 1.63 1.01 2.03 2.05 1.80 2.03
12. PILOT 1.55 1.20 1.15 0.86 1.35 1.78 0.95
13. ORSIRR 2 1.61 1.15 0.42 3.00 0.54 1.83 1.30
14. JPWH 991 2.04 1.48 0.43 2.30 0.54 1.44 0.88
15. BCSSTK27 0.80 0.45 0.27 1.01 0.22 1.46 0.67
16. NNC1374 2.43 2.19 1.10 4.04 1.90 1.72 2.38
17. FFFFF800 0.41 0.39 0.38 1.36 3.14 1.44 1.84
18. PILOT 0.28 0.25 0.15 0.38 0.40 1.10 0.38
19. ORSIRR 2 1.04 0.61 0.51 2.53 3.33 1.67 3.07
20. JPWH 991 1.40 0.47 0.31 4.24 2.37 1.40 2.77
21. BCSSTK27 0.76 0.45 0.27 1.02 0.25 1.38 0.69
22. NNC1374 0.70 0.59 0.81 5.35 5.86 2.00 5.48
lower q. 0.96 0.59 0.42 1.68 0.51 1.44 0.95
median 1.44 1.14 0.66 2.61 1.45 1.79 1.93
upper q. 1.80 1.37 1.28 5.35 2.37 1.97 2.77

Table XXIII. MA47 to MA27 Ratios on 10 Matrices with Nonzero Entries on the Diagonal

             Total      Storage for
             Storage    Factors                  Time
             Predicted  Predicted  Actual  Analyze  Factorize  Solve  One-Off
CRAY lower q. 1.25 0.99 0.99 1.54 1.11 1.75 1.26
median 1.34 1.03 1.02 1.56 1.15 1.93 1.29
upper q. 1.38 1.07 1.04 1.80 1.21 2.00 1.30
SUN lower q. 1.25 0.99 0.99 1.55 0.86 0.87 0.89
median 1.34 1.03 1.02 1.70 0.94 0.94 0.98
upper q. 1.38 1.07 1.04 1.98 1.02 0.98 1.06
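The augmented systems referred to here have the symmetric block form K = [H Aᵀ; A 0], whose trailing diagonal block is structurally zero; when A is nearly square, close to half of K's diagonal entries are zero, which is precisely the structure MA47's structured 2x2 pivots are designed to exploit. A sketch of the construction (our illustration; MA47 of course works with sparse storage):

```python
import numpy as np

def augmented_matrix(H, A):
    """Assemble the symmetric augmented matrix
        K = [ H   A^T ]
            [ A    0  ]
    whose last m = A.shape[0] diagonal entries are structurally zero.
    Illustrative dense construction only."""
    n, m = H.shape[0], A.shape[0]
    K = np.zeros((n + m, n + m))
    K[:n, :n] = H       # the (1,1) block
    K[:n, n:] = A.T     # the (1,2) block
    K[n:, :n] = A       # its symmetric counterpart
    return K            # trailing (2,2) block left as zeros
```

A code restricted to 1x1 pivots must either perturb or reorder around these zero diagonal entries, whereas pairing a zero diagonal with a nonzero off-diagonal entry into a 2x2 pivot preserves both symmetry and stability.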

ACKNOWLEDGMENTS
We would like to thank our colleagues Nick Gould and Jennifer Scott and
particularly the referee John Lewis for reading drafts of the manuscript
and for making many helpful suggestions for the presentation.

Table XXIV. MA47 to MA27 Ratios Using Default Threshold Value for MA27

             Total      Storage for
             Storage    Factors                  Time
             Predicted  Predicted  Actual  Analyze  Factorize  Solve  One-Off
CRAY lower q. 0.96 0.59 0.31 1.69 0.53 1.44 0.93
median 1.45 1.15 0.57 2.64 1.38 1.82 1.80
upper q. 1.86 1.46 1.21 5.41 2.04 2.18 3.13
SUN lower q. 0.96 0.59 0.31 1.88 0.25 0.45 0.33
median 1.45 1.15 0.57 3.32 0.96 0.75 1.38
upper q. 1.86 1.46 1.12 5.58 1.88 1.40 2.53

REFERENCES

BONGARTZ, I., CONN, A. R., GOULD, N. I. M., AND TOINT, P. L. 1995. CUTE: Constrained and
unconstrained testing environment. ACM Trans. Math. Softw. 21, 1 (Mar.), 123–160.
BUNCH, J. R. AND PARLETT, B. N. 1971. Direct methods for solving symmetric indefinite
systems of linear equations. SIAM J. Numer. Anal. 8, 639–655.
DONGARRA, J. J., DU CROZ, J., DUFF, I. S., AND HAMMARLING, S. 1990. A set of level 3 basic
linear algebra subroutines. ACM Trans. Math. Softw. 16, 1 (Mar.), 1–17.
DONGARRA, J. J., DU CROZ, J., HAMMARLING, S., AND HANSON, R. J. 1988. An extended set of
Fortran basic linear algebra subprograms. ACM Trans. Math. Softw. 14, 1 (Mar.), 1–17. See
also the companion algorithm, pages 18–32.
DUFF, I. S. AND REID, J. K. 1982. MA27—A set of Fortran subroutines for solving sparse
symmetric sets of linear equations. Rep. AERE R10533, HMSO, London.
DUFF, I. S. AND REID, J. K. 1983. The multifrontal solution of indefinite sparse symmetric
linear systems. ACM Trans. Math. Softw. 9, 302–325.
DUFF, I. S. AND REID, J. K. 1995. MA47, a Fortran code for direct solution of indefinite
symmetric linear systems of equations. Rep. RAL 95-001, Rutherford Appleton Laboratory,
Oxfordshire, England.
DUFF, I. S., ERISMAN, A. M., AND REID, J. K. 1986. Direct methods for sparse matrices.
Oxford University Press, London.
DUFF, I. S., GOULD, N. I. M., REID, J. K., SCOTT, J. A., AND TURNER, K. 1991. The
factorization of sparse symmetric indefinite matrices. IMA J. Numer. Anal. 11, 181–204.
DUFF, I. S., GRIMES, R. G., AND LEWIS, J. G. 1992. Users’ guide for the Harwell-Boeing
sparse matrix collection. Rep. RAL 92-086, Rutherford Appleton Laboratory, Oxfordshire,
England.
GAY, D. M. 1985. Electronic mail distribution of linear programming test problems. Math.
Program. Soc. COAL Newsl.
HARWELL 1993. Harwell subroutine library catalogue (release 11). Theoretical Studies
Department, AEA Technology, Harwell, England.
LAWSON, C. L., HANSON, R. J., KINCAID, D. R., AND KROGH, F. T. 1979. Basic linear algebra
subprograms for Fortran use. ACM Trans. Math. Softw. 5, 308–325.
LIU, J. W. H. 1990. The role of elimination trees in sparse factorization. SIAM J. Matrix
Anal. Appl. 11, 134–172.
MARKOWITZ, H. M. 1957. The elimination form of the inverse and its application to linear
programming. Manage. Sci. 3, 255–269.

Received February 1995; revised June 1995; accepted July 1995
