KLU_UserGuide
Abstract
KLU is a set of routines for solving sparse linear systems of equations. It is particularly
well-suited to matrices arising in SPICE-like circuit simulation applications. It relies on a
permutation to block triangular form (BTF), several methods for finding a fill-reducing ordering
(variants of approximate minimum degree, and nested dissection), and a sparse left-looking LU
factorization method to factorize each block. A MATLAB interface is included. KLU appears
as Collected Algorithm 907 of the ACM [9].
∗ http://www.suitesparse.com. This work was supported by Sandia National
Labs, and the National Science Foundation. Portions of the work were done while on sabbatical at Stanford University
and Lawrence Berkeley National Laboratory (with funding from Stanford University and the SciDAC program).
Contents
1 License and Copyright
2 Overview
3 Availability
6 Installation
1 License and Copyright
KLU, Copyright (c) 2004-2023 University of Florida. All Rights Reserved. KLU is available under
alternate licenses; contact T. Davis for details.
KLU License: see KLU/Doc/License.txt for the license.
Availability: http://www.suitesparse.com
Acknowledgments:
This work was supported by Sandia National Laboratories (Mike Heroux) and the National
Science Foundation under grants 0203270 and 0620286.
2 Overview
KLU is a set of routines for solving sparse linear systems of equations. It first permutes the matrix
into upper block triangular form, via the BTF package. This is done by first finding a permutation
for a zero-free diagonal (a maximum transversal) [12]. If there is no such permutation, then the
matrix is structurally rank-deficient, and is numerically singular. Next, Tarjan’s method [13, 23]
is used to find the strongly-connected components of the graph. The block triangular form is
essentially unique; any method will lead to the same number and sizes of blocks, although the
ordering of the blocks may vary (consider a diagonal matrix, for example). Assuming the matrix
has full structural rank, the permuted matrix has the following form:
\[
PAQ = \begin{bmatrix}
A_{11} & \cdots & A_{1k} \\
       & \ddots & \vdots \\
       &        & A_{kk}
\end{bmatrix},
\]
where each diagonal block is square with a zero-free diagonal.
Next, each diagonal block is factorized with a sparse left-looking method [14]. The kernel of this
factorization method is an efficient method for solving Lx = b when L, x, and b are all sparse. This
kernel is used to compute each column of L and U , one column at a time. The total work performed
by this method is always proportional to the number of floating-point operations, something that
is not true of any other sparse LU factorization method.
Prior to factorizing each diagonal block, the blocks are ordered to reduce fill-in. By default,
the symmetric approximate minimum degree (AMD) ordering is used on $A_{ii} + A_{ii}^T$ [1, 2]. Another
ordering option is to find a column ordering via COLAMD [7, 8]. Alternatively, a permutation can
be provided by the user, or a pointer to a user-provided ordering function can be passed, which is
then used to order each block.
Only the diagonal blocks need to be factorized. Consider a linear system where the matrix is
permuted into three blocks, for example:
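\[
\begin{bmatrix}
A_{11} & A_{12} & A_{13} \\
       & A_{22} & A_{23} \\
       &        & A_{33}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.
\]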
The non-singular system A33 x3 = b3 can first be solved for x3 . After a block back substitution,
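the resulting system becomes
\[
\begin{bmatrix}
A_{11} & A_{12} \\
       & A_{22}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} b_1 - A_{13} x_3 \\ b_2 - A_{23} x_3 \end{bmatrix}
=
\begin{bmatrix} b_1' \\ b_2' \end{bmatrix}
\]
and the $A_{22} x_2 = b_2'$ system can be solved for $x_2$. The primary advantage of this method is that no
fill-in occurs in the off-diagonal blocks (A12 , A13 , and A23 ). This is particularly critical for sparse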
linear systems arising in SPICE-like circuit simulation [18, 19, 20, 22]. Circuit matrices are typically
permutable into block triangular form, with many singletons (1-by-1 blocks). They also often have a
handful of rows and columns with many nonzero entries, due to voltage and current sources. These
rows and columns are pushed into the upper block triangular form, and related to the singleton
blocks (for example, A33 in the above system is 1-by-1, and the column in A13 and A23 has many
nonzero entries). Since these nearly-dense rows and columns do not appear in the LU factorization
of the diagonal blocks, they cause no fill-in.
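The block back substitution described above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the permuted matrix is stored as a dense row-major array, each diagonal block is solved with a small dense Gaussian elimination rather than KLU's sparse left-looking factorization, and the names absval, dense_solve, and block_backsolve are hypothetical helpers that are not part of KLU. The array R holds the block boundaries, so that block k spans rows and columns R[k] to R[k+1]-1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static double absval (double a) { return ((a < 0) ? -a : a) ; }

/* Solve the dense m-by-m system in place: x holds the right-hand side on
 * input and the solution on output. S is row-major and is overwritten.
 * Gaussian elimination with partial pivoting; returns 0 on a zero pivot. */
static int dense_solve (int32_t m, double *S, double *x)
{
    for (int32_t k = 0 ; k < m ; k++)
    {
        int32_t piv = k ;
        for (int32_t i = k+1 ; i < m ; i++)
        {
            if (absval (S [i*m+k]) > absval (S [piv*m+k])) piv = i ;
        }
        if (S [piv*m+k] == 0) return (0) ;
        for (int32_t j = 0 ; j < m ; j++)       /* swap rows k and piv */
        {
            double s = S [k*m+j] ; S [k*m+j] = S [piv*m+j] ; S [piv*m+j] = s ;
        }
        double t = x [k] ; x [k] = x [piv] ; x [piv] = t ;
        for (int32_t i = k+1 ; i < m ; i++)     /* eliminate below the pivot */
        {
            double f = S [i*m+k] / S [k*m+k] ;
            for (int32_t j = k ; j < m ; j++) S [i*m+j] -= f * S [k*m+j] ;
            x [i] -= f * x [k] ;
        }
    }
    for (int32_t k = m-1 ; k >= 0 ; k--)        /* dense back substitution */
    {
        for (int32_t j = k+1 ; j < m ; j++) x [k] -= S [k*m+j] * x [j] ;
        x [k] /= S [k*m+k] ;
    }
    return (1) ;
}

/* Solve A*x = b where the n-by-n dense row-major A is upper block triangular
 * with block boundaries R [0..nblocks]. x holds b on input, the solution on
 * output. Only the diagonal blocks are ever factorized. Returns 1 if OK. */
int block_backsolve (int32_t n, int32_t nblocks, const int32_t R [ ],
    const double A [ ], double x [ ])
{
    assert (R [0] == 0 && R [nblocks] == n) ;
    for (int32_t k = nblocks-1 ; k >= 0 ; k--)
    {
        int32_t r0 = R [k], m = R [k+1] - r0 ;
        double *S = malloc ((size_t) m * (size_t) m * sizeof (double)) ;
        if (S == NULL) return (0) ;
        for (int32_t i = 0 ; i < m ; i++)       /* copy diagonal block k */
        {
            for (int32_t j = 0 ; j < m ; j++) S [i*m+j] = A [(r0+i)*n + (r0+j)] ;
        }
        int ok = dense_solve (m, S, x + r0) ;   /* solve A_kk x_k = b_k */
        free (S) ;
        if (!ok) return (0) ;
        for (int32_t i = 0 ; i < r0 ; i++)      /* b_i = b_i - A_ik x_k */
        {
            for (int32_t j = r0 ; j < r0+m ; j++) x [i] -= A [i*n+j] * x [j] ;
        }
    }
    return (1) ;
}
```

The off-diagonal blocks are read only in the final update loop and are never modified, which is why they cannot suffer fill-in.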
The structural rank of a matrix is based solely on the pattern of its entries, not their numerical
values. With random entries, the structural and numerical ranks are equal with probability one. The
structural rank of any matrix is an upper bound on the numerical rank. The maximum transversal algorithm in the
BTF package is useful in determining if a matrix is structurally rank deficient, and if so, it reveals
a (non-unique) set of rows and columns that contribute to that rank deficiency. This is useful in
determining what parts of a circuit are poorly formulated (such as dangling components).
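To make the notion of structural rank concrete, the following sketch computes it as the size of a maximum matching between rows and columns, using a textbook recursive augmenting-path search. It is not the algorithm implemented in the BTF package, which is more refined and supports a work limit; the function names augment and structural_rank are hypothetical, and the convention that Match[i] holds the column matched to row i (or -1) is chosen here for illustration. The matrix pattern is assumed to be in compressed-column form (Ap, Ai).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Try to match column j to some row, re-matching other columns if needed.
 * Flag [i] == stamp marks rows already visited in the current search. */
static int augment (int32_t j, const int32_t Ap [ ], const int32_t Ai [ ],
    int32_t Match [ ], int32_t Flag [ ], int32_t stamp)
{
    for (int32_t p = Ap [j] ; p < Ap [j+1] ; p++)
    {
        int32_t i = Ai [p] ;
        if (Flag [i] == stamp) continue ;
        Flag [i] = stamp ;
        /* row i is free, or its current column can be moved elsewhere */
        if (Match [i] < 0 || augment (Match [i], Ap, Ai, Match, Flag, stamp))
        {
            Match [i] = j ;
            return (1) ;
        }
    }
    return (0) ;
}

/* Returns the structural rank of the n-by-n pattern (Ap, Ai), or -1 if out
 * of memory. On output, Match [i] is the column matched to row i, or -1. */
int32_t structural_rank (int32_t n, const int32_t Ap [ ], const int32_t Ai [ ],
    int32_t Match [ ])
{
    assert (n > 0) ;
    int32_t *Flag = malloc ((size_t) n * sizeof (int32_t)) ;
    if (Flag == NULL) return (-1) ;
    for (int32_t i = 0 ; i < n ; i++)
    {
        Match [i] = -1 ;
        Flag [i] = -1 ;
    }
    int32_t rank = 0 ;
    for (int32_t j = 0 ; j < n ; j++)
    {
        rank += augment (j, Ap, Ai, Match, Flag, j) ;
    }
    free (Flag) ;
    return (rank) ;
}
```

A result smaller than n indicates a structurally rank-deficient matrix; the rows left with Match[i] equal to -1 are one (non-unique) set of rows that contribute to the deficiency.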
When ordered and factorized with KLU, very little fill-in occurs in the resulting LU factors,
which means that there is little scope for use of the BLAS [11]. Sparse LU factorization methods
that use the BLAS (such as SuperLU [10] and UMFPACK [4, 5]) are slower than KLU when
applied to sparse matrices arising in circuit simulation. Both KLU and SuperLU are based on
Gilbert and Peierls' left-looking method [14]. SuperLU uses supernodes, but KLU does not; thus
the name KLU refers to a “Clark Kent” LU factorization algorithm (what SuperLU was before it
became Super).
For details of the permutation to block triangular form, left-looking sparse LU factorization,
and approximate minimum degree, refer to [6]. Concise versions of these methods can be found
in the CSparse package. KLU is also the topic of a Master’s thesis by Palamadai Natarajan [21];
a copy of the thesis can be found in the KLU/Doc directory. It includes a description of an earlier
version of KLU; some of the function names and parameter lists have changed in this version. The
descriptions of the methods used still apply to the current version of KLU, however.
KLU appears as Algorithm 907: KLU, a direct sparse solver for circuit simulation problems,
ACM Transactions on Mathematical Software, vol 37, no 3, 2010.
3 Availability
KLU and its required ordering packages (BTF, COLAMD, AMD, and SuiteSparse_config) are available at
http://www.suitesparse.com. In addition, KLU can make use of any user-provided ordering
function. One such function is included, which provides KLU with an interface to the ordering
methods used in CHOLMOD [3], such as METIS, a nested dissection method [17]. After per-
mutation to block triangular form, circuit matrices have very good node separators, and are thus
excellent candidates for nested dissection. The METIS ordering takes much more time to compute
than the AMD ordering, but if the ordering is reused many times (typical in circuit simulation) the
better-quality ordering can pay off in lower total simulation time.
To use KLU, you must obtain the KLU, BTF, SuiteSparse_config, AMD, and COLAMD packages
in the SuiteSparse suite of sparse matrix libraries. See http://www.suitesparse.com for each
of these packages. They are also all contained within the single SuiteSparse.zip or SuiteSparse.tar.gz
distribution.
With a single input klu(A) returns a MATLAB struct containing the LU factors. The factor-
ization is in the form L*U + F = R \ A(p,q) where L*U is the LU factorization of just the diagonal
blocks of the block triangular form, F is a sparse matrix containing the entries in the off-diagonal
blocks, R is a diagonal matrix containing the row scale factors, and p and q are permutation vectors.
The LU struct also contains a vector r which describes the block boundaries (the same as the third
output parameter of dmperm). The kth block consists of rows and columns r(k) to r(k+1)-1 in
the permuted matrix A(p,q) and the factors L and U.
An optional final input argument (klu(A,opts) for example) provides a way of modifying
KLU’s user-definable parameters, including a partial pivoting tolerance and ordering options. A
second output argument ([LU,info] = klu ( ... )) provides statistics on the factorization.
The BTF package includes three user-callable MATLAB functions which replicate most of the
features of the MATLAB built-in dmperm function, and provide an additional option which can
significantly limit the worst-case time taken by dmperm. For more details, type help btf, help
maxtrans, and help strongcomp in MATLAB. Additional information about how these functions
work can be found in [6].
Both btf and maxtrans include an optional second input, maxwork, which limits the total work
performed in the maximum transversal to maxwork * nnz(A). The worst-case time taken by the
algorithm is O (n * nnz(A)), where the matrix A is n-by-n, but this worst-case time is rarely
reached in practice.
To use the KLU and BTF functions in MATLAB, you must first compile and install them.
In the BTF/MATLAB directory, type btf_install, and then type klu_install in the KLU/MATLAB
directory. Alternatively, if you have the entire SuiteSparse, simply run the SuiteSparse_install
function in the SuiteSparse directory.
After running the installation scripts, type pathtool and save your path for future MATLAB
sessions. If you cannot save your path because of file permissions, edit your startup.m by adding
addpath commands (type doc startup and doc addpath for more information).
• tol: partial pivoting tolerance. If the diagonal entry has a magnitude greater than or equal
to tol times the largest magnitude of entries in the pivot column, then the diagonal entry is
chosen. Default value: 0.001.
• ordering: which fill-reducing ordering to use: 0 for AMD, 1 for COLAMD, 2 for a user-
provided permutation P and Q (or a natural ordering if P and Q are NULL), or 3 for the
user_order function. Default: 0 (AMD).
• scale: whether or not the matrix should be scaled. If scale < 0, then no scaling is performed
and the input matrix is not checked for errors. If scale >= 0, the input matrix is checked for
errors. If scale=0, then no scaling is performed. If scale=1, then each row of A is divided
by the sum of the absolute values in that row. If scale=2, then each row of A is divided by
the maximum absolute value in that row. Default: 2.
• btf: if nonzero, then BTF is used to permute the input matrix into block upper triangular
form. This step is skipped if Common.btf is zero. Default: 1.
• maxwork: sets an upper limit on the amount of work performed in btf_maxtrans to
maxwork*nnz(A). If the limit is reached, a partial zero-free diagonal might be found. This has
no effect on whether or not the matrix can be factorized, since the matrix can be factorized
with no BTF pre-ordering at all. This option provides a tradeoff between the effectiveness of
the BTF ordering and the cost to compute it. A partial result can result in fewer, and larger,
blocks in the BTF form, resulting in more work required to factorize the matrix. No limit is
enforced if maxwork <= 0. Default: 0.
• user_order: a pointer to a function that can be provided by the application that uses KLU,
to redefine the fill-reducing ordering used by KLU for each diagonal block. The int32_t and
int64_t prototypes must be as follows:
The function should return 0 if an error occurred, and either -1 or a positive (nonzero) value
if no error occurred. If greater than zero, then the return value is interpreted by KLU as
an estimate of the number of nonzeros in L or U (whichever is greater), when the permuted
matrix is factorized. Only an estimate is possible, since partial pivoting with row interchanges
is performed during numerical factorization. The input matrix is provided to the function in
the parameters n, Ap, and Ai, in compressed-column form. The matrix A is n-by-n. The Ap
array is of size n+1; the jth column of A has row indices Ai[Ap[j] ... Ap[j+1]-1]. The
Ai array is of size Ap[n]. The first column pointer Ap[0] is zero. The row indices might not
appear sorted in each column, but no duplicates will appear.
The output permutation is to be passed back in the Perm array, where Perm[k]=j means
that row and column j of A will appear as the kth row and column of the permuted matrix
factorized by KLU. The Perm array is already allocated when it is passed to the user function.
The user function may use, and optionally modify, the contents of the klu_common Common object.
In particular, prior to calling KLU, the user application can set both Common.user_order
and Common.user_data. The latter is a void * pointer that KLU does not use, except to
pass to the user ordering function pointed to by Common.user_order. This is a mechanism
for passing additional arguments to the user function.
An example user function is provided in the KLU/User directory, which provides an interface
to the ordering method in CHOLMOD.
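The pivoting rule controlled by the tol parameter above can be illustrated with a short sketch. It is not KLU's internal code: the function name choose_pivot and its argument layout are hypothetical, and the fallback to the entry of largest magnitude when the diagonal fails the test is an assumption based on conventional partial pivoting.

```c
#include <assert.h>
#include <stdint.h>

/* Choose a pivot among the len candidate entries of one column.
 * rows [p] and vals [p] give the row index and value of candidate p, and
 * diagrow is the row index of the diagonal entry. Returns the position p of
 * the chosen pivot, or -1 if every candidate is zero. */
int32_t choose_pivot (int32_t len, const int32_t rows [ ],
    const double vals [ ], int32_t diagrow, double tol)
{
    assert (len > 0) ;
    int32_t pmax = -1, pdiag = -1 ;
    double amax = 0, adiag = 0 ;
    for (int32_t p = 0 ; p < len ; p++)
    {
        double a = (vals [p] < 0) ? -vals [p] : vals [p] ;
        if (a > amax)
        {
            amax = a ;          /* largest magnitude seen so far */
            pmax = p ;
        }
        if (rows [p] == diagrow)
        {
            adiag = a ;         /* magnitude of the diagonal entry */
            pdiag = p ;
        }
    }
    if (pmax < 0) return (-1) ;             /* column is numerically zero */
    /* keep the diagonal if it is large enough relative to the largest entry */
    if (pdiag >= 0 && adiag >= tol * amax) return (pdiag) ;
    return (pmax) ;                         /* assumed fallback: largest entry */
}
```

With the default tol of 0.001, a diagonal entry of magnitude 0.5 is kept even when the column contains an entry of magnitude 100, which preserves the fill-reducing ordering; a tol of 1.0 corresponds to conventional partial pivoting in this sketch.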
• n: an integer scalar. The matrix is n-by-n. Note that KLU only operates on square matrices.
• Ap: an integer array of size n+1. The first entry is Ap[0]=0, and the last entry nz=Ap[n] is
equal to the number of entries in the matrix.
• Ai: an integer array of size nz = Ap[n]. The row indices of entries in column j of A are located
in Ai [Ap [j] ... Ap [j+1]-1]. The matrix is zero-based; row and column indices are in
the range 0 to n-1.
• Ax: a double array of size nz for the real case, or 2*nz for the complex case. For the complex
case, the real and imaginary parts are interleaved, compatible with Fortran and the ANSI C99
Complex data type. KLU does not rely on the ANSI C99 data type, however, for portability
reasons. The numerical values in column j of a real matrix are located in Ax [Ap [j] ...
Ap [j+1]-1]. For a complex matrix, they appear in Ax [2*Ap [j] ... 2*Ap [j+1]-1],
as real/imaginary pairs (the real part appears first, followed by the imaginary part).
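As an illustration of how the Ap, Ai, and Ax arrays described above are traversed, the following sketch computes y = A*x for a real matrix in compressed-column form. The function csc_matvec is a hypothetical helper written for this example; it is not part of KLU.

```c
#include <assert.h>
#include <stdint.h>

/* y = A*x for an n-by-n real matrix A in compressed-column form.
 * x and y are dense vectors of size n. */
void csc_matvec (int32_t n, const int32_t Ap [ ], const int32_t Ai [ ],
    const double Ax [ ], const double x [ ], double y [ ])
{
    assert (n > 0 && Ap [0] == 0) ;
    for (int32_t i = 0 ; i < n ; i++)
    {
        y [i] = 0 ;
    }
    for (int32_t j = 0 ; j < n ; j++)
    {
        /* the entries of column j are in positions Ap [j] ... Ap [j+1]-1 */
        for (int32_t p = Ap [j] ; p < Ap [j+1] ; p++)
        {
            /* Ai [p] is the row index and Ax [p] the value of this entry */
            y [Ai [p]] += Ax [p] * x [j] ;
        }
    }
}
```

Applied to the 5-by-5 matrix of the klu_simple.c demo with x = [1 2 3 4 5], this reproduces the right-hand side b = [8 45 -3 3 19] used there.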
5.5 klu_defaults: set default parameters
This function sets the default parameters for KLU and clears the statistics. It may be used for
either the real or complex cases. A value of 0 is returned if an error occurs, 1 otherwise. This
function must be called before any other KLU function can be called.
#include "klu.h"
int ok ;
klu_common Common ;
ok = klu_defaults (&Common) ; /* real or complex */
#include "klu.h"
klu_l_common Common ;
int ok = klu_l_defaults (&Common) ; /* real or complex */
#include "klu.h"
int32_t n, Ap [n+1], Ai [nz] ;
klu_symbolic *Symbolic ;
klu_common Common ;
Symbolic = klu_analyze (n, Ap, Ai, &Common) ; /* real or complex */
#include "klu.h"
int64_t n, Ap [n+1], Ai [nz] ;
klu_l_symbolic *Symbolic ;
klu_l_common Common ;
Symbolic = klu_l_analyze (n, Ap, Ai, &Common) ; /* real or complex */
#include "klu.h"
int32_t n, Ap [n+1], Ai [nz], P [n], Q [n] ;
klu_symbolic *Symbolic ;
klu_common Common ;
Symbolic = klu_analyze_given (n, Ap, Ai, P, Q, &Common) ; /* real or complex */
#include "klu.h"
int64_t n, Ap [n+1], Ai [nz], P [n], Q [n] ;
klu_l_symbolic *Symbolic ;
klu_l_common Common ;
Symbolic = klu_l_analyze_given (n, Ap, Ai, P, Q, &Common) ; /* real or complex */
#include "klu.h"
int32_t Ap [n+1], Ai [nz] ;
double Ax [nz], Az [2*nz] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ; /* real */
Numeric = klu_z_factor (Ap, Ai, Az, Symbolic, &Common) ; /* complex */
#include "klu.h"
int64_t Ap [n+1], Ai [nz] ;
double Ax [nz], Az [2*nz] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
Numeric = klu_l_factor (Ap, Ai, Ax, Symbolic, &Common) ; /* real */
Numeric = klu_zl_factor (Ap, Ai, Az, Symbolic, &Common) ; /* complex */
#include "klu.h"
int32_t ldim, nrhs ; int ok ;
double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_solve (Symbolic, Numeric, ldim, nrhs, B, &Common) ; /* real */
ok = klu_z_solve (Symbolic, Numeric, ldim, nrhs, Bz, &Common) ; /* complex */
#include "klu.h"
int64_t ldim, nrhs ; int ok ;
double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_solve (Symbolic, Numeric, ldim, nrhs, B, &Common) ; /* real */
ok = klu_zl_solve (Symbolic, Numeric, ldim, nrhs, Bz, &Common) ; /* complex */
5.10 klu_tsolve: solve a transposed linear system
Solves the linear system $A^T x = b$ or $A^H x = b$. The conj_solve input is 0 for $A^T x = b$, or nonzero
for $A^H x = b$. Otherwise, the function is identical to klu_solve.
#include "klu.h"
int32_t ldim, nrhs ; int ok ;
double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_tsolve (Symbolic, Numeric, ldim, nrhs, B, &Common) ; /* real */
ok = klu_z_tsolve (Symbolic, Numeric, ldim, nrhs, Bz, conj_solve, &Common) ; /* complex */
#include "klu.h"
int64_t ldim, nrhs ; int ok ;
double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_tsolve (Symbolic, Numeric, ldim, nrhs, B, &Common) ; /* real */
ok = klu_zl_tsolve (Symbolic, Numeric, ldim, nrhs, Bz, conj_solve, &Common) ; /* complex */
#include "klu.h"
int ok ; int64_t Ap [n+1], Ai [nz] ;
double Ax [nz], Az [2*nz] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_refactor (Ap, Ai, Ax, Symbolic, Numeric, &Common) ; /* real */
ok = klu_zl_refactor (Ap, Ai, Az, Symbolic, Numeric, &Common) ; /* complex */
for both real and complex cases.
#include "klu.h"
klu_symbolic *Symbolic ;
klu_common Common ;
klu_free_symbolic (&Symbolic, &Common) ; /* real or complex */
#include "klu.h"
klu_l_symbolic *Symbolic ;
klu_l_common Common ;
klu_l_free_symbolic (&Symbolic, &Common) ; /* real or complex */
#include "klu.h"
klu_numeric *Numeric ;
klu_common Common ;
klu_free_numeric (&Numeric, &Common) ; /* real */
klu_z_free_numeric (&Numeric, &Common) ; /* complex */
#include "klu.h"
klu_l_numeric *Numeric ;
klu_l_common Common ;
klu_l_free_numeric (&Numeric, &Common) ; /* real */
klu_zl_free_numeric (&Numeric, &Common) ; /* complex */
#include "klu.h"
int ok ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_sort (Symbolic, Numeric, &Common) ; /* real */
ok = klu_z_sort (Symbolic, Numeric, &Common) ; /* complex */
#include "klu.h"
int ok ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_sort (Symbolic, Numeric, &Common) ; /* real */
ok = klu_zl_sort (Symbolic, Numeric, &Common) ; /* complex */
5.15 klu_flops: determine the flop count
This function determines the number of floating-point operations performed when the matrix was
factorized by klu_factor or klu_refactor. The result is returned in Common.flops.
#include "klu.h"
int ok ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_flops (Symbolic, Numeric, &Common) ; /* real */
ok = klu_z_flops (Symbolic, Numeric, &Common) ; /* complex */
#include "klu.h"
int ok ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_flops (Symbolic, Numeric, &Common) ; /* real */
ok = klu_zl_flops (Symbolic, Numeric, &Common) ; /* complex */
#include "klu.h"
int ok ; int64_t Ap [n+1], Ai [nz] ;
double Ax [nz], Az [2*nz] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_rgrowth (Ap, Ai, Ax, Symbolic, Numeric, &Common) ; /* real */
ok = klu_zl_rgrowth (Ap, Ai, Az, Symbolic, Numeric, &Common) ; /* complex */
Tisseur [16]. The inputs Ap and Ax (Az in the complex case) must be unchanged since the last call
to klu_factor or klu_refactor. The result is returned in Common.condest.
#include "klu.h"
int ok ; int32_t Ap [n+1] ;
double Ax [nz], Az [2*nz] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_condest (Ap, Ax, Symbolic, Numeric, &Common) ; /* real */
ok = klu_z_condest (Ap, Az, Symbolic, Numeric, &Common) ; /* complex */
#include "klu.h"
int ok ; int64_t Ap [n+1] ;
double Ax [nz], Az [2*nz] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_condest (Ap, Ax, Symbolic, Numeric, &Common) ; /* real */
ok = klu_zl_condest (Ap, Az, Symbolic, Numeric, &Common) ; /* complex */
#include "klu.h"
int ok ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_rcond (Symbolic, Numeric, &Common) ; /* real */
ok = klu_zl_rcond (Symbolic, Numeric, &Common) ; /* complex */
1. n > 0. Note that KLU does not handle empty (0-by-0) matrices.
2. Ap, Ai, and Ax (Az for the complex case) must not be NULL.
3. Ap[0]=0, and Ap [j] <= Ap [j+1] for all j in the range 0 to n-1.
4. The row indices in each column, Ai [Ap [j] ... Ap [j+1]-1], must be in the range 0
to n-1, and no duplicates can appear. If the workspace W is NULL on input, the check for
duplicate entries is skipped.
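The four conditions listed above can be expressed as a short validity check. This is a sketch only, not the code used inside KLU: the function csc_is_valid is hypothetical, it handles only the real case, and it allocates its own workspace W internally instead of accepting it from the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if the n-by-n compressed-column matrix satisfies conditions 1
 * to 4, and 0 otherwise (or if the workspace cannot be allocated). */
int csc_is_valid (int32_t n, const int32_t Ap [ ], const int32_t Ai [ ],
    const double Ax [ ])
{
    if (n <= 0) return (0) ;                                /* condition 1 */
    if (Ap == NULL || Ai == NULL || Ax == NULL) return (0) ; /* condition 2 */
    if (Ap [0] != 0) return (0) ;                           /* condition 3 */
    for (int32_t j = 0 ; j < n ; j++)
    {
        if (Ap [j] > Ap [j+1]) return (0) ;
    }
    /* condition 4: W [i] == j means row i was already seen in column j */
    int32_t *W = malloc ((size_t) n * sizeof (int32_t)) ;
    if (W == NULL) return (0) ;
    for (int32_t i = 0 ; i < n ; i++)
    {
        W [i] = -1 ;
    }
    int ok = 1 ;
    for (int32_t j = 0 ; ok && j < n ; j++)
    {
        for (int32_t p = Ap [j] ; ok && p < Ap [j+1] ; p++)
        {
            int32_t i = Ai [p] ;
            if (i < 0 || i >= n || W [i] == j)
            {
                ok = 0 ;        /* out-of-range or duplicate row index */
            }
            else
            {
                W [i] = j ;
            }
        }
    }
    free (W) ;
    return (ok) ;
}
```

Marking each row with the current column index lets the duplicate check run in time proportional to the number of entries, without clearing W between columns.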
#include "klu.h"
int scale, ok ; int32_t n, Ap [n+1], Ai [nz], W [n] ;
double Ax [nz], Az [2*nz], Rs [n] ;
klu_common Common ;
ok = klu_scale (scale, n, Ap, Ai, Ax, Rs, W, &Common) ; /* real */
ok = klu_z_scale (scale, n, Ap, Ai, Az, Rs, W, &Common) ; /* complex */
#include "klu.h"
int scale, ok ; int64_t n, Ap [n+1], Ai [nz], W [n] ;
double Ax [nz], Az [2*nz], Rs [n] ;
klu_l_common Common ;
ok = klu_l_scale (scale, n, Ap, Ai, Ax, Rs, W, &Common) ; /* real */
ok = klu_zl_scale (scale, n, Ap, Ai, Az, Rs, W, &Common) ; /* complex */
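The row scale factors implied by the scale parameter (1 for the sum of absolute values in each row, 2 for the maximum absolute value in each row) can be computed as in the sketch below. The function row_scale_factors is a hypothetical illustration for the real case only, not the KLU routine; in particular, setting every factor to 1 when no scaling is requested is a convention of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Compute Rs [0..n-1] for an n-by-n real matrix in compressed-column form.
 * scale == 1: Rs [i] is the sum of absolute values in row i.
 * scale == 2: Rs [i] is the maximum absolute value in row i.
 * otherwise:  Rs [i] is 1, so dividing row i by Rs [i] changes nothing. */
void row_scale_factors (int scale, int32_t n, const int32_t Ap [ ],
    const int32_t Ai [ ], const double Ax [ ], double Rs [ ])
{
    assert (n > 0 && Ap [0] == 0) ;
    int scaling = (scale == 1 || scale == 2) ;
    for (int32_t i = 0 ; i < n ; i++)
    {
        Rs [i] = scaling ? 0 : 1 ;
    }
    if (!scaling) return ;
    for (int32_t p = 0 ; p < Ap [n] ; p++)
    {
        double a = (Ax [p] < 0) ? -Ax [p] : Ax [p] ;
        int32_t i = Ai [p] ;            /* row index of this entry */
        if (scale == 1)
        {
            Rs [i] += a ;
        }
        else if (a > Rs [i])
        {
            Rs [i] = a ;
        }
    }
}
```

Because the matrix is stored by columns, the factors are accumulated in a single pass over all entries rather than row by row.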
#include "klu.h"
int ok ;
int32_t Lp [n+1], Li [lnz], Up [n+1], Ui [unz], Fp [n+1], Fi [nzoff], P [n], Q [n], R [n] ;
double Lx [lnz], Lz [lnz], Ux [unz], Uz [unz], Fx [nzoff], Fz [nzoff], Rs [n] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_extract (Numeric, Symbolic,
Lp, Li, Lx, Up, Ui, Ux, Fp, Fi, Fx, P, Q, Rs, R, &Common) ; /* real */
ok = klu_z_extract (Numeric, Symbolic,
Lp, Li, Lx, Lz, Up, Ui, Ux, Uz, Fp, Fi, Fx, Fz, P, Q, Rs, R, &Common) ; /* complex */
#include "klu.h"
int ok ;
int64_t Lp [n+1], Li [lnz], Up [n+1], Ui [unz], Fp [n+1],
Fi [nzoff], P [n], Q [n], R [n] ;
double Lx [lnz], Lz [lnz], Ux [unz], Uz [unz], Fx [nzoff], Fz [nzoff], Rs [n] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_extract (Numeric, Symbolic,
Lp, Li, Lx, Up, Ui, Ux, Fp, Fi, Fx, P, Q, Rs, R, &Common) ; /* real */
ok = klu_zl_extract (Numeric, Symbolic,
Lp, Li, Lx, Lz, Up, Ui, Ux, Uz, Fp, Fi, Fx, Fz, P, Q, Rs, R, &Common) ; /* complex */
#include "klu.h"
size_t n, nnew, nold, size ;
void *p ;
klu_common Common ;
p = klu_malloc (n, size, &Common) ;
p = klu_free (p, n, size, &Common) ;
p = klu_realloc (nnew, nold, size, p, &Common) ;
#include "klu.h"
size_t n, nnew, nold, size ;
void *p ;
klu_l_common Common ;
p = klu_l_malloc (n, size, &Common) ;
p = klu_l_free (p, n, size, &Common) ;
p = klu_l_realloc (nnew, nold, size, p, &Common) ;
The function can require up to O(n*nnz(A)) time (excluding the cheap match phase, which takes
another O(nnz(A)) time). If maxwork > 0 on input, the work is limited to O(maxwork*nnz(A))
(excluding the cheap match), but the maximum transversal might not be found if the limit is
reached.
The Work array is workspace required by the methods; its contents are undefined on input and
output.
int32_t nrow, ncol, Ap [ncol+1], Ai [nz], Match [nrow], Work [5*ncol], nmatch ;
double maxwork, work ;
nmatch = btf_maxtrans (nrow, ncol, Ap, Ai, maxwork, &work, Match, Work) ;
int64_t nrow, ncol, Ap [ncol+1], Ai [nz], Match [nrow], Work [5*ncol], nmatch ;
double maxwork, work ;
nmatch = btf_l_maxtrans (nrow, ncol, Ap, Ai, maxwork, &work, Match, Work) ;
int32_t n, Ap [n+1], Ai [nz], P [n], Q [n], R [n+1], nfound, Work [5*n], ncomp ;
double maxwork, work ;
ncomp = btf_order (n, Ap, Ai, maxwork, &work, P, Q, R, &nfound, Work) ;
int64_t n, Ap [n+1], Ai [nz], P [n], Q [n], R [n+1], nfound, Work [5*n], ncomp ;
double maxwork, work ;
ncomp = btf_l_order (n, Ap, Ai, maxwork, &work, P, Q, R, &nfound, Work) ;
5.25 Sample C programs that use KLU
Here is a simple main program, klu_simple.c, that illustrates the basic usage of KLU. It uses
KLU, and indirectly makes use of BTF and AMD. COLAMD is required to compile the demo, but
it is not called by this example. It uses statically defined global variables for the sparse matrix A,
which would not be typical of a complete application. It just makes for a simpler example.
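//------------------------------------------------------------------------------
// KLU/Demo/klu_simple: simple demo program for KLU
//------------------------------------------------------------------------------
#include <stdio.h>
#include "klu.h"
int n = 5 ;
int Ap [ ] = {0, 2, 5, 9, 10, 12} ;
int Ai [ ] = { 0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4} ;
double Ax [ ] = {2., 3., 3., -1., 4., 4., -3., 1., 2., 2., 6., 1.} ;
double b [ ] = {8., 45., -3., 3., 19.} ;
int main (void) /* minimal driver sketch: default settings, error codes ignored */
{
    klu_common Common ;
    klu_defaults (&Common) ;
    klu_symbolic *Symbolic = klu_analyze (n, Ap, Ai, &Common) ;
    klu_numeric *Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ;
    klu_solve (Symbolic, Numeric, n, 1, b, &Common) ; /* b is overwritten with x */
    klu_free_symbolic (&Symbolic, &Common) ;
    klu_free_numeric (&Numeric, &Common) ;
    for (int i = 0 ; i < n ; i++) printf ("x [%d] = %g\n", i, b [i]) ;
    return (0) ;
}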
The solution to Ax = b is $x = [1\ 2\ 3\ 4\ 5]^T$. The program uses default control settings (row scaling
by the maximum absolute value in each row, permutation to block triangular form, and the AMD
ordering). It ignores the error codes in the return values and Common.status.
The block triangular form found by btf_order for this matrix is given below
\[
PAQ = \begin{bmatrix}
2 & 0 & 0 & -1 & -3 \\
  & 2 & 0 &  3 &  0 \\
  & 3 & 6 &  0 &  4 \\
  & 0 & 1 &  4 &  2 \\
  &   &   &    &  1
\end{bmatrix}.
\]
This ordering is not modified by the AMD ordering because the 3-by-3 matrix $A_{22} + A_{22}^T$ happens
to be a dense matrix. No partial pivoting happens to occur during LU factorization; all pivots
are selected along the diagonal of each block. The matrix contains two singletons, which are the
original entries a34 = 2 and a43 = 1, and one 3-by-3 diagonal block (in which a single fill-in entry
occurs during factorization: the u23 entry of this 3-by-3 matrix).
For a more complete program that uses KLU, see KLU/Demo/kludemo.c for an int32_t version,
and KLU/Demo/kluldemo.c for a version that uses int64_t instead. The top-level main routine
uses CHOLMOD to read in a compressed-column sparse matrix from a Matrix Market file, because
KLU does not include such a function. Otherwise, no CHOLMOD functions are used. Unlike
klu_simple.c, CHOLMOD is required to run the kludemo.c and kluldemo.c programs.
6 Installation
Installation of the C-callable interface requires the cmake utility. The MATLAB installation on any
platform, including Windows, is simple; just type klu_install to compile and install KLU, BTF,
AMD, and COLAMD.
An optional Makefile is provided to simplify the use of cmake. To compile and install the
C-callable KLU, BTF, AMD, and COLAMD libraries, go to the SuiteSparse directory and type
make. The KLU and BTF libraries are placed in KLU/build/libklu.* and BTF/build/libbtf.*.
Two KLU demo programs will be compiled and tested in the KLU/Demo directory. You can compare
the output of make with the results in the KLU distribution, kludemo.out.
Typing make clean will remove all but the final compiled libraries and demo programs. Typing
make purge or make distclean removes all files not in the original distribution.
When you compile your program that uses the C-callable KLU library, you need to add the
KLU/build/libklu.*, BTF/build/libbtf.*, AMD/build/libamd.*, and COLAMD/build/libcolamd.*
libraries, and you need to tell your compiler to look in the KLU/Include and BTF/Include directories
for include files. If using cmake, each package includes scripts for find_library. Alternatively, do
make install, and KLU will be installed (on Linux/Mac) in /usr/local/lib and /usr/local/include,
and documentation is placed in /usr/local/doc. These installation locations can be changed; see
SuiteSparse/README.txt for details.
To install in SuiteSparse/lib and SuiteSparse/include, use make local ; make install.
If all you want to use is the KLU mexFunction in MATLAB, you can skip the use of the make
command entirely. Simply type klu_install in the MATLAB command window while in the
KLU/MATLAB directory. This works on any system with MATLAB, including Windows.
7 The KLU routines
The file KLU/Include/klu.h listed below describes each user-callable routine in the C version of
KLU, and gives details on their use.
//------------------------------------------------------------------------------
// KLU/Source/klu.h: include file for KLU
//------------------------------------------------------------------------------
//------------------------------------------------------------------------------
#ifndef _KLU_H
#define _KLU_H
#include "amd.h"
#include "colamd.h"
#include "btf.h"
/* -------------------------------------------------------------------------- */
/* Symbolic object - contains the pre-ordering computed by klu_analyze */
/* -------------------------------------------------------------------------- */
typedef struct
{
/* A (P,Q) is in upper block triangular form. The kth block goes from
* row/col index R [k] to R [k+1]-1. The estimated number of nonzeros
* in the L factor of the kth block is Lnz [k].
*/
/* only computed if BTF preordering requested */
int32_t structural_rank ; /* 0 to n-1 if the matrix is structurally rank
* deficient. -1 if not computed. n if the matrix has
* full structural rank */
} klu_symbolic ;
} klu_l_symbolic ;
/* -------------------------------------------------------------------------- */
/* Numeric object - contains the factors computed by klu_factor */
/* -------------------------------------------------------------------------- */
typedef struct
{
/* LU factors of each block, the pivot row permutation, and the
* entries in the off-diagonal blocks */
int32_t n ; /* A is n-by-n */
int32_t nblocks ; /* number of diagonal blocks */
int32_t lnz ; /* actual nz in L, including diagonal */
int32_t unz ; /* actual nz in U, including diagonal */
int32_t max_lnz_block ; /* max actual nz in L in any one block, incl. diag */
int32_t max_unz_block ; /* max actual nz in U in any one block, incl. diag */
int32_t *Pnum ; /* size n. final pivot permutation */
int32_t *Pinv ; /* size n. inverse of final pivot permutation */
} klu_numeric ;
typedef struct /* 64-bit version (otherwise same as above) */
{
int64_t n, nblocks, lnz, unz, max_lnz_block, max_unz_block, *Pnum,
*Pinv, *Lip, *Uip, *Llen, *Ulen ;
void **LUbx ;
size_t *LUsize ;
void *Udiag ;
double *Rs ;
size_t worksize ;
void *Work, *Xwork ;
int64_t *Iwork ;
int64_t *Offp, *Offi ;
void *Offx ;
int64_t nzoff ;
} klu_l_numeric ;
/* -------------------------------------------------------------------------- */
/* KLU control parameters and statistics */
/* -------------------------------------------------------------------------- */
/* Common->status values */
#define KLU_OK 0
#define KLU_SINGULAR (1) /* status > 0 is a warning, not an error */
#define KLU_OUT_OF_MEMORY (-2)
#define KLU_INVALID (-3)
#define KLU_TOO_LARGE (-4) /* integer overflow has occurred */
/* ---------------------------------------------------------------------- */
/* parameters */
/* ---------------------------------------------------------------------- */
* divide-by-zero may occur when computing L(:,k). The Numeric object
* can be passed to klu_solve (a divide-by-zero will occur). It can
* also be safely passed to klu_refactor.
* TRUE: stop quickly. klu_factor will free the partially-constructed
* Numeric object. klu_refactor will not free it, but will leave the
* numerical values only partially defined. This is the default. */
/* ---------------------------------------------------------------------- */
/* statistics */
/* ---------------------------------------------------------------------- */
} klu_common ;
} klu_l_common ;
/* -------------------------------------------------------------------------- */
/* klu_defaults: sets default control parameters */
/* -------------------------------------------------------------------------- */
int klu_defaults
(
klu_common *Common
) ;
/* -------------------------------------------------------------------------- */
/* klu_analyze: orders and analyzes a matrix */
/* -------------------------------------------------------------------------- */
/* Order the matrix with BTF (or not), then order each block with AMD, COLAMD,
* a natural ordering, or with a user-provided ordering function */
klu_symbolic *klu_analyze
(
/* inputs, not modified */
int32_t n, /* A is n-by-n */
int32_t Ap [ ], /* size n+1, column pointers */
int32_t Ai [ ], /* size nz, row indices */
klu_common *Common
) ;
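The Ap and Ai arguments describe the nonzero pattern of A in compressed-column form: the row indices of column j are Ai [Ap [j]] through Ai [Ap [j+1]-1]. The sketch below builds the arrays for a small matrix and checks them for consistency. The helper csc_is_valid is hypothetical, and its checks are this sketch's own choice rather than a statement of what klu_analyze verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of KLU): returns 1 if Ap and Ai describe the
 * pattern of an n-by-n matrix in compressed-column form, 0 otherwise. */
int csc_is_valid (int32_t n, const int32_t Ap [ ], const int32_t Ai [ ])
{
    if (n <= 0 || Ap == NULL || Ai == NULL || Ap [0] != 0) return (0) ;
    for (int32_t j = 0 ; j < n ; j++)
    {
        if (Ap [j] > Ap [j+1]) return (0) ;     /* pointers must not decrease */
        for (int32_t p = Ap [j] ; p < Ap [j+1] ; p++)
        {
            if (Ai [p] < 0 || Ai [p] >= n) return (0) ;  /* row out of range */
        }
    }
    return (1) ;
}

/* Usage: the 3-by-3 matrix
 *      [ 2 0 1 ]
 *      [ 0 3 0 ]
 *      [ 4 0 5 ]
 * has entries in rows {0,2} of column 0, row {1} of column 1, and rows {0,2}
 * of column 2, so nz = Ap [3] = 5. */
void example_csc (void)
{
    int32_t Ap [4] = {0, 2, 3, 5} ;
    int32_t Ai [5] = {0, 2, 1, 0, 2} ;
    assert (csc_is_valid (3, Ap, Ai)) ;
}
```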
/* -------------------------------------------------------------------------- */
/* klu_analyze_given: analyzes a matrix using given P and Q */
/* -------------------------------------------------------------------------- */
/* Order the matrix with BTF (or not), then use natural or given ordering
* P and Q on the blocks. P and Q are interpreted as identity
* if NULL. */
klu_symbolic *klu_analyze_given
(
/* inputs, not modified */
int32_t n, /* A is n-by-n */
int32_t Ap [ ], /* size n+1, column pointers */
int32_t Ai [ ], /* size nz, row indices */
int32_t P [ ], /* size n, user’s row permutation (may be NULL) */
int32_t Q [ ], /* size n, user’s column permutation (may be NULL) */
klu_common *Common
) ;
/* -------------------------------------------------------------------------- */
/* klu_factor: factors a matrix using the klu_analyze results */
/* -------------------------------------------------------------------------- */
/* inputs, not modified */
int32_t Ap [ ], /* size n+1, column pointers */
int32_t Ai [ ], /* size nz, row indices */
double Ax [ ], /* size nz, numerical values */
klu_symbolic *Symbolic,
klu_common *Common
) ;
/* -------------------------------------------------------------------------- */
/* klu_solve: solves Ax=b using the Symbolic and Numeric objects */
/* -------------------------------------------------------------------------- */
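int klu_solve
(
/* inputs, not modified */
klu_symbolic *Symbolic,
klu_numeric *Numeric,
int32_t ldim, /* leading dimension of B */
int32_t nrhs, /* number of right-hand-sides */
/* right-hand-side on input, overwritten with solution to Ax=b on output */
double B [ ], /* size ldim*nrhs */
klu_common *Common
) ;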
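int klu_z_solve
(
/* inputs, not modified */
klu_symbolic *Symbolic,
klu_numeric *Numeric,
int32_t ldim, /* leading dimension of B */
int32_t nrhs, /* number of right-hand-sides */
/* right-hand-side on input, overwritten with solution to Ax=b on output */
double B [ ], /* size 2*ldim*nrhs */
klu_common *Common
) ;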
int klu_zl_solve (klu_l_symbolic *, klu_l_numeric *,
int64_t, int64_t, double *, klu_l_common *) ;
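The ldim and nrhs arguments describe how several right-hand sides share one array. Assuming the usual column-major convention implied by the term "leading dimension", entry i of right-hand side j sits at B [i + j*ldim], with ldim at least n. The sketch below fills one column of such an array; set_rhs_column is a hypothetical helper, not a KLU routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of KLU): copy the length-n vector b into
 * column j of B, where B has leading dimension ldim >= n. Column-major
 * storage is an assumption of this sketch. */
void set_rhs_column (double B [ ], int32_t ldim, int32_t n, int32_t j,
    const double b [ ])
{
    assert (ldim >= n && j >= 0) ;
    for (int32_t i = 0 ; i < n ; i++)
    {
        B [(size_t) i + (size_t) j * (size_t) ldim] = b [i] ;
    }
}
```

With n = 3, ldim = 4, and nrhs = 2, the array B has 8 entries; the second right-hand side occupies B [4] through B [6], and B [3] and B [7] are padding.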
/* -------------------------------------------------------------------------- */
/* klu_tsolve: solves A’x=b using the Symbolic and Numeric objects */
/* -------------------------------------------------------------------------- */
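int klu_tsolve
(
/* inputs, not modified */
klu_symbolic *Symbolic,
klu_numeric *Numeric,
int32_t ldim, /* leading dimension of B */
int32_t nrhs, /* number of right-hand-sides */
/* right-hand-side on input, overwritten with solution to A'x=b on output */
double B [ ], /* size ldim*nrhs */
klu_common *Common
) ;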
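int klu_z_tsolve
(
/* inputs, not modified */
klu_symbolic *Symbolic,
klu_numeric *Numeric,
int32_t ldim, /* leading dimension of B */
int32_t nrhs, /* number of right-hand-sides */
/* right-hand-side on input, overwritten with solution to A'x=b on output */
double B [ ], /* size 2*ldim*nrhs */
int conj_solve, /* TRUE: conjugate solve, FALSE: solve A.'x=b */
klu_common *Common
) ;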
/* -------------------------------------------------------------------------- */
/* klu_refactor: refactorizes matrix with same ordering as klu_factor */
/* -------------------------------------------------------------------------- */
int klu_z_refactor /* return TRUE if successful, FALSE otherwise */
(
/* inputs, not modified */
int32_t Ap [ ], /* size n+1, column pointers */
int32_t Ai [ ], /* size nz, row indices */
double Ax [ ], /* size 2*nz, numerical values */
klu_symbolic *Symbolic,
/* input, and numerical values modified on output */
klu_numeric *Numeric,
klu_common *Common
) ;
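In the complex routines the numerical array Ax has size 2*nz. Assuming the real and imaginary parts are interleaved (entry k has its real part in Ax [2*k] and its imaginary part in Ax [2*k+1]), a matrix held as separate real and imaginary arrays can be packed as sketched below. pack_complex is a hypothetical helper, not part of KLU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of KLU): interleave nz real parts and nz
 * imaginary parts into one array Ax of size 2*nz. The interleaved layout is
 * an assumption of this sketch. */
void pack_complex (int32_t nz, const double re [ ], const double im [ ],
    double Ax [ ])
{
    for (int32_t k = 0 ; k < nz ; k++)
    {
        Ax [2*k  ] = re [k] ;   /* real part of entry k */
        Ax [2*k+1] = im [k] ;   /* imaginary part of entry k */
    }
}

/* Usage: the two entries 1+2i and 3-4i become {1, 2, 3, -4}. */
void example_pack (void)
{
    double re [2] = {1, 3}, im [2] = {2, -4}, Ax [4] ;
    pack_complex (2, re, im, Ax) ;
    assert (Ax [0] == 1 && Ax [1] == 2 && Ax [2] == 3 && Ax [3] == -4) ;
}
```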
/* -------------------------------------------------------------------------- */
/* klu_free_symbolic: destroys the Symbolic object */
/* -------------------------------------------------------------------------- */
int klu_free_symbolic
(
klu_symbolic **Symbolic,
klu_common *Common
) ;
/* -------------------------------------------------------------------------- */
/* klu_free_numeric: destroys the Numeric object */
/* -------------------------------------------------------------------------- */
int klu_free_numeric
(
klu_numeric **Numeric,
klu_common *Common
) ;
int klu_z_free_numeric
(
klu_numeric **Numeric,
klu_common *Common
) ;
/* -------------------------------------------------------------------------- */
/* klu_sort: sorts the columns of the LU factorization */
/* -------------------------------------------------------------------------- */
/* this is not needed except for the MATLAB interface */
int klu_sort
(
/* inputs, not modified */
klu_symbolic *Symbolic,
/* input/output */
klu_numeric *Numeric,
klu_common *Common
) ;
int klu_z_sort
(
/* inputs, not modified */
klu_symbolic *Symbolic,
/* input/output */
klu_numeric *Numeric,
klu_common *Common
) ;
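Assuming that "sorting the columns" means putting the entries of each column in order of ascending row index, the idea can be illustrated on plain compressed-column arrays. The sketch below uses a simple insertion sort on Ap, Ai, and Ax; it does not reproduce KLU's internal storage of L and U or the method klu_sort itself uses, and csc_sort_columns is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of KLU): sort the entries of each column of
 * an n-column compressed-column matrix by ascending row index, moving each
 * numerical value together with its row index. */
void csc_sort_columns (int32_t n, const int32_t Ap [ ], int32_t Ai [ ],
    double Ax [ ])
{
    for (int32_t j = 0 ; j < n ; j++)
    {
        for (int32_t p = Ap [j] + 1 ; p < Ap [j+1] ; p++)
        {
            int32_t i = Ai [p] ;
            double x = Ax [p] ;
            int32_t q = p - 1 ;
            /* shift larger row indices of this column one slot down */
            while (q >= Ap [j] && Ai [q] > i)
            {
                Ai [q+1] = Ai [q] ;
                Ax [q+1] = Ax [q] ;
                q-- ;
            }
            Ai [q+1] = i ;
            Ax [q+1] = x ;
        }
    }
}

/* Usage: column 0 holds rows {2,0,1} and column 1 holds rows {1,0}. */
void example_sort (void)
{
    int32_t Ap [3] = {0, 3, 5} ;
    int32_t Ai [5] = {2, 0, 1, 1, 0} ;
    double Ax [5] = {30, 10, 20, 5, 4} ;
    csc_sort_columns (2, Ap, Ai, Ax) ;
    assert (Ai [0] == 0 && Ai [1] == 1 && Ai [2] == 2) ;
    assert (Ax [0] == 10 && Ax [1] == 20 && Ax [2] == 30) ;
    assert (Ai [3] == 0 && Ai [4] == 1 && Ax [3] == 4 && Ax [4] == 5) ;
}
```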
/* -------------------------------------------------------------------------- */
/* klu_flops: determines # of flops performed in numeric factorization */
/* -------------------------------------------------------------------------- */
int klu_flops
(
/* inputs, not modified */
klu_symbolic *Symbolic,
klu_numeric *Numeric,
/* input/output */
klu_common *Common
) ;
int klu_z_flops
(
/* inputs, not modified */
klu_symbolic *Symbolic,
klu_numeric *Numeric,
/* input/output */
klu_common *Common
) ;
/* -------------------------------------------------------------------------- */
/* klu_rgrowth : compute the reciprocal pivot growth */
/* -------------------------------------------------------------------------- */
/* Pivot growth is computed after the input matrix is permuted, scaled, and
* off-diagonal entries pruned. This is because the LU factorization of each
* block takes as input the scaled diagonal blocks of the BTF form. The
* reciprocal pivot growth in column j of an LU factorization of a matrix C
* is the largest entry (in absolute value) in column j of C divided by the
* largest entry in column j of U; then the overall reciprocal pivot growth is
* the smallest such value over all columns j. Note
* that the off-diagonal entries are not scaled, since they do not take part in
* the LU factorization of the diagonal blocks.
*
* In MATLAB notation:
*
* rgrowth = min (max (abs ((R \ A(p,q)) - F)) ./ max (abs (U))) */
int klu_rgrowth
(
int32_t Ap [ ],
int32_t Ai [ ],
double Ax [ ],
klu_symbolic *Symbolic,
klu_numeric *Numeric,
klu_common *Common /* Common->rgrowth = reciprocal pivot growth */
) ;
int klu_z_rgrowth
(
int32_t Ap [ ],
int32_t Ai [ ],
double Ax [ ],
klu_symbolic *Symbolic,
klu_numeric *Numeric,
klu_common *Common /* Common->rgrowth = reciprocal pivot growth */
) ;
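The MATLAB expression in the comment above can be spelled out for small dense matrices. The sketch below assumes C (standing for the permuted, scaled, and pruned matrix) and its factor U are stored as dense n-by-n column-major arrays, which is not how KLU stores them; rgrowth_dense is a hypothetical name, and skipping all-zero columns of U is this sketch's own choice.

```c
#include <assert.h>

/* largest absolute value in column j of the dense n-by-n matrix X */
static double column_max (int n, const double X [ ], int j)
{
    double m = 0 ;
    for (int i = 0 ; i < n ; i++)
    {
        double a = X [i + j*n] ;
        if (a < 0) a = -a ;
        if (a > m) m = a ;
    }
    return (m) ;
}

/* Hypothetical helper (not part of KLU): min over columns j of
 * max (abs (C (:,j))) / max (abs (U (:,j))). Returns -1 if every column of
 * U is zero. */
double rgrowth_dense (int n, const double C [ ], const double U [ ])
{
    assert (n > 0) ;
    double rgrowth = -1 ;           /* -1: no column counted yet */
    for (int j = 0 ; j < n ; j++)
    {
        double umax = column_max (n, U, j) ;
        if (umax == 0) continue ;   /* skip an all-zero column of U */
        double ratio = column_max (n, C, j) / umax ;
        if (rgrowth < 0 || ratio < rgrowth) rgrowth = ratio ;
    }
    return (rgrowth) ;
}
```

For C = [1 1 ; 1 -1], elimination without pivoting gives U = [1 1 ; 0 -2]. The column ratios are 1/1 and 1/2, so the reciprocal pivot growth is 0.5; a value well below 1 signals that entries grew during factorization.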
/* -------------------------------------------------------------------------- */
/* klu_condest */
/* -------------------------------------------------------------------------- */
int klu_condest
(
int32_t Ap [ ], /* size n+1, column pointers, not modified */
double Ax [ ], /* size nz = Ap[n], numerical values, not modified */
klu_symbolic *Symbolic, /* symbolic analysis, not modified */
klu_numeric *Numeric, /* numeric factorization, not modified */
klu_common *Common /* result returned in Common->condest */
) ;
int klu_z_condest
(
int32_t Ap [ ],
double Ax [ ], /* size 2*nz */
klu_symbolic *Symbolic,
klu_numeric *Numeric,
klu_common *Common /* result returned in Common->condest */
) ;
/* -------------------------------------------------------------------------- */
/* klu_rcond: compute min(abs(diag(U))) / max(abs(diag(U))) */
/* -------------------------------------------------------------------------- */
int klu_rcond
(
klu_symbolic *Symbolic, /* input, not modified */
klu_numeric *Numeric, /* input, not modified */
klu_common *Common /* result in Common->rcond */
) ;
int klu_z_rcond
(
klu_symbolic *Symbolic, /* input, not modified */
klu_numeric *Numeric, /* input, not modified */
klu_common *Common /* result in Common->rcond */
) ;
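The quantity named in the klu_rcond comment, min(abs(diag(U))) / max(abs(diag(U))), is cheap to compute once the diagonal of U is available. The sketch below works on a plain array of real diagonal entries; rcond_sketch is a hypothetical name, and returning 0 as soon as a zero pivot is seen is this sketch's own convention.

```c
#include <assert.h>

/* Hypothetical helper (not part of KLU): ratio of the smallest to the
 * largest absolute value on the diagonal of U, given as a length-n array.
 * Returns 0 if any diagonal entry is zero. */
double rcond_sketch (int n, const double Udiag [ ])
{
    assert (n > 0) ;
    double umin = 0, umax = 0 ;
    for (int k = 0 ; k < n ; k++)
    {
        double u = (Udiag [k] < 0) ? -Udiag [k] : Udiag [k] ;
        if (u == 0) return (0) ;            /* zero pivot */
        if (k == 0 || u < umin) umin = u ;
        if (k == 0 || u > umax) umax = u ;
    }
    return (umin / umax) ;
}
```

A diagonal of {4, -2, 8} gives 2/8 = 0.25. This is only a rough indicator of conditioning, which is why the header also provides the more expensive klu_condest.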
/* -------------------------------------------------------------------------- */
/* klu_scale */
/* -------------------------------------------------------------------------- */
double Rs [ ],
/* workspace, not defined on input or output */
int32_t W [ ], /* size n, can be NULL */
klu_common *Common
) ;
/* -------------------------------------------------------------------------- */
/* klu_extract */
/* -------------------------------------------------------------------------- */
/* L */
int32_t *Lp, /* size n+1 */
int32_t *Li, /* size Numeric->lnz */
double *Lx, /* size Numeric->lnz */
/* U */
int32_t *Up, /* size n+1 */
int32_t *Ui, /* size Numeric->unz */
double *Ux, /* size Numeric->unz */
/* F */
int32_t *Fp, /* size n+1 */
int32_t *Fi, /* size Numeric->nzoff */
double *Fx, /* size Numeric->nzoff */
/* P, row permutation */
int32_t *P, /* size n */
/* Q, column permutation */
int32_t *Q, /* size n */
/* R, block boundaries */
int32_t *R, /* size Symbolic->nblocks+1 (nblocks is at most n) */
klu_common *Common
) ;
klu_numeric *Numeric,
klu_symbolic *Symbolic,
/* L */
int32_t *Lp, /* size n+1 */
int32_t *Li, /* size nnz(L) */
double *Lx, /* size nnz(L) */
double *Lz, /* size nnz(L) for the complex case, ignored if real */
/* U */
int32_t *Up, /* size n+1 */
int32_t *Ui, /* size nnz(U) */
double *Ux, /* size nnz(U) */
double *Uz, /* size nnz(U) for the complex case, ignored if real */
/* F */
int32_t *Fp, /* size n+1 */
int32_t *Fi, /* size nnz(F) */
double *Fx, /* size nnz(F) */
double *Fz, /* size nnz(F) for the complex case, ignored if real */
/* P, row permutation */
int32_t *P, /* size n */
/* Q, column permutation */
int32_t *Q, /* size n */
/* R, block boundaries */
int32_t *R, /* size Symbolic->nblocks+1 (nblocks is at most n) */
klu_common *Common
) ;
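The array R returned by the extract routines has nblocks+1 entries that mark block boundaries. Assuming block k of the permuted matrix occupies rows and columns R [k] through R [k+1]-1, with R [0] = 0 and R [nblocks] = n, the blocks can be walked as in the sketch below. max_block_size is a hypothetical helper, not part of KLU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of KLU): dimension of the largest diagonal
 * block, given the nblocks+1 block boundaries in R. */
int32_t max_block_size (int32_t nblocks, const int32_t R [ ])
{
    int32_t maxblock = 0 ;
    for (int32_t k = 0 ; k < nblocks ; k++)
    {
        int32_t size = R [k+1] - R [k] ;    /* block k spans R[k]..R[k+1]-1 */
        assert (size > 0) ;                 /* blocks are assumed non-empty */
        if (size > maxblock) maxblock = size ;
    }
    return (maxblock) ;
}
```

With R = {0, 1, 4, 6} there are three blocks of sizes 1, 3, and 2, so the largest block has dimension 3.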
/* -------------------------------------------------------------------------- */
/* KLU memory management routines */
/* -------------------------------------------------------------------------- */
/* ---- input ---- */
size_t n, /* number of items */
size_t size, /* size of each item */
/* --------------- */
klu_common *Common
) ;
//------------------------------------------------------------------------------
// klu_version: return KLU version
//------------------------------------------------------------------------------
#ifdef __cplusplus
}
#endif
/* ========================================================================== */
/* === KLU version ========================================================== */
/* ========================================================================== */
* printf ("This is an early version\n") ;
* #endif
*/
#endif
8 The BTF routines
The file BTF/Include/btf.h listed below describes each user-callable routine in the C version of
BTF, and gives details on their use.
//------------------------------------------------------------------------------
// BTF/Include/btf.h: include file for BTF
//------------------------------------------------------------------------------
//------------------------------------------------------------------------------
/* ========================================================================== */
/* === BTF_MAXTRANS ========================================================= */
/* ========================================================================== */
*
* q = maxtrans (A) ; % has entries in the range 0:n
* q % a column permutation (only if sprank(A)==n)
* B = A (:, q) ; % permuted matrix (only if sprank(A)==n)
* sum (q > 0) ; % same as "sprank (A)"
*
* This behaviour differs from p = dmperm (A) in MATLAB, which returns the
* matching as p(j)=i if row i and column j are matched, and p(j)=0 if column j
* is unmatched.
*
* p = dmperm (A) ; % has entries in the range 0:m
* p % a row permutation (only if sprank(A)==m)
* B = A (p, :) ; % permuted matrix (only if sprank(A)==m)
* sum (p > 0) ; % definition of sprank (A)
*
* This algorithm is based on the paper "On Algorithms for obtaining a maximum
* transversal" by Iain Duff, ACM Trans. Mathematical Software, vol 7, no. 3,
* pp. 315-330, and "Algorithm 575: Permutations for a zero-free diagonal",
* same issue, pp. 387-390. Algorithm 575 is MC21A in the Harwell Subroutine
* Library. This code is not merely a translation of the Fortran code into C.
* It is a completely new implementation of the basic underlying method (depth
* first search over a subgraph with nodes corresponding to columns matched so
* far, and cheap matching). This code was written with minimal observation of
* the MC21A/B code itself. See comments below for a comparison between the
* maxtrans and MC21A/B codes.
*
* This routine operates on a column-form matrix and produces a column
* permutation. MC21A uses a row-form matrix and produces a row permutation.
* The difference is merely one of convention in the comments and interpretation
* of the inputs and outputs. If you want a row permutation, simply pass a
* compressed-row sparse matrix to this routine and you will get a row
* permutation (just like MC21A). Similarly, you can pass a column-oriented
* matrix to MC21A and it will happily return a column permutation.
*/
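The basic method named in the comment above, a depth-first search for augmenting paths over columns matched so far, can be sketched in a few lines. This is a plain recursive version for illustration only: it omits the cheap-matching phase and is not the BTF implementation. The names maxtrans_sketch and augment are hypothetical, and marking an unmatched row with -1 is this sketch's own convention.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Try to match column j to a row, rematching earlier columns if needed.
 * Flag [i] == stamp means row i was already visited in this search. */
static int augment (int32_t j, const int32_t Ap [ ], const int32_t Ai [ ],
    int32_t Match [ ], int32_t Flag [ ], int32_t stamp)
{
    for (int32_t p = Ap [j] ; p < Ap [j+1] ; p++)
    {
        int32_t i = Ai [p] ;
        if (Flag [i] == stamp) continue ;
        Flag [i] = stamp ;
        if (Match [i] < 0 || augment (Match [i], Ap, Ai, Match, Flag, stamp))
        {
            Match [i] = j ;         /* column j is now matched to row i */
            return (1) ;
        }
    }
    return (0) ;
}

/* Hypothetical sketch: returns the number of matched columns of an
 * nrow-by-ncol column-form matrix, or -1 if out of memory. On output,
 * Match [i] = j if column j is matched to row i, or -1 if row i is
 * unmatched. */
int32_t maxtrans_sketch (int32_t nrow, int32_t ncol, const int32_t Ap [ ],
    const int32_t Ai [ ], int32_t Match [ ])
{
    assert (nrow > 0 && ncol >= 0) ;
    int32_t *Flag = malloc ((size_t) nrow * sizeof (int32_t)) ;
    if (Flag == NULL) return (-1) ;
    for (int32_t i = 0 ; i < nrow ; i++)
    {
        Match [i] = -1 ;
        Flag [i] = -1 ;
    }
    int32_t nmatch = 0 ;
    for (int32_t j = 0 ; j < ncol ; j++)
    {
        nmatch += augment (j, Ap, Ai, Match, Flag, j) ;  /* stamp = j */
    }
    free (Flag) ;
    return (nmatch) ;
}
```

The return value plays the role of sprank (A) in the MATLAB notation above. The recursion depth can reach the number of columns, so a nonrecursive search is preferable for large matrices.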
#ifndef _BTF_H
#define _BTF_H
#include "SuiteSparse_config.h"
int32_t Match [ ], /* size nrow. Match [i] = j if column j matched to row i
* (see above for the singular-matrix case) */
/* ========================================================================== */
/* === BTF_STRONGCOMP ======================================================= */
/* ========================================================================== */
/* ========================================================================== */
/* === BTF_ORDER ============================================================ */
/* ========================================================================== */
//------------------------------------------------------------------------------
// btf_version: return BTF version
//------------------------------------------------------------------------------
#ifdef __cplusplus
}
#endif
/* ========================================================================== */
/* === BTF marking of singular columns ====================================== */
/* ========================================================================== */
/* ========================================================================== */
/* === BTF version ========================================================== */
/* ========================================================================== */
#endif
References
[1] P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm.
SIAM J. Matrix Anal. Appl., 17:886–905, 1996.
[2] P. R. Amestoy, T. A. Davis, and I. S. Duff. Algorithm 837: AMD, an approximate minimum
degree ordering algorithm. ACM Trans. Math. Software, 30:381–388, 2004.
[5] T. A. Davis. A column pre-ordering strategy for the unsymmetric-pattern multifrontal method.
ACM Trans. Math. Software, 30:165–195, 2004.
[6] T. A. Davis. Direct Methods for Sparse Linear Systems. SIAM, Philadelphia, PA, 2006.
[7] T. A. Davis, J. R. Gilbert, S. I. Larimore, and E. G. Ng. Algorithm 836: COLAMD, a column
approximate minimum degree ordering algorithm. ACM Trans. Math. Software, 30:377–380,
2004.
[9] T. A. Davis and E. Palamadai Natarajan. Algorithm 907: KLU, a direct sparse
solver for circuit simulation problems. ACM Trans. Math. Software, 37:36:1–36:17, September
2010.
[11] J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. A set of level-3 basic linear algebra
subprograms. ACM Trans. Math. Software, 16:1–17, 1990.
[12] I. S. Duff. On algorithms for obtaining a maximum transversal. ACM Trans. Math. Software,
7:315–330, 1981.
[13] I. S. Duff and J. K. Reid. An implementation of Tarjan’s algorithm for the block triangularization
of a matrix. ACM Trans. Math. Software, 4:137–147, 1978.
[14] J. R. Gilbert and T. Peierls. Sparse partial pivoting in time proportional to arithmetic operations.
SIAM J. Sci. Statist. Comput., 9:862–874, 1988.
[15] W. W. Hager. Condition estimates. SIAM J. Sci. Statist. Comput., 5:311–316, 1984.
[16] N. J. Higham and F. Tisseur. A block algorithm for matrix 1-norm estimation with an application
to 1-norm pseudospectra. SIAM J. Matrix Anal. Appl., 21:1185–1201, 2000.
[17] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular
graphs. SIAM J. Sci. Comput., 20, 1998.
[18] K. S. Kundert. Sparse matrix techniques and their applications to circuit simulation. In A. E.
Ruehli, editor, Circuit Analysis, Simulation and Design. New York: North-Holland, 1986.
[20] L. W. Nagel and D. O. Pederson. SPICE (simulation program with integrated circuit emphasis).
Technical Report Memorandum No. ERL-M382, University of California, Berkeley, 1973.
[21] E. Palamadai. KLU - a high performance sparse linear system solver for circuit simulation
problems. Technical report, CISE Department, Univ. of Florida. M.S. Thesis.
[22] T. L. Quarles. Analysis of Performance and Convergence Issues for Circuit Simulation.
PhD thesis, EECS Department, University of California, Berkeley, 1989.
[23] R. E. Tarjan. Depth first search and linear graph algorithms. SIAM J. Comput., 1:146–160,
1972.