
SIAM J. SCI. COMPUT.
Vol. 25, No. 6, pp. 1896–1920
© 2004 Society for Industrial and Applied Mathematics

ADAPTIVE SMOOTHED AGGREGATION (αSA)∗


M. BREZINA†, R. FALGOUT‡, S. MACLACHLAN†, T. MANTEUFFEL†, S. MCCORMICK†, AND J. RUGE†

Abstract. Substantial effort has been focused over the last two decades on developing multilevel iterative methods capable of solving the large linear systems encountered in engineering practice. These systems often arise from discretizing partial differential equations over unstructured meshes, and the particular parameters or geometry of the physical problem being discretized may be unavailable to the solver. Algebraic multigrid (AMG) and multilevel domain decomposition methods of algebraic type have been of particular interest in this context because of their promises of optimal performance without the need for explicit knowledge of the problem geometry. These methods construct a hierarchy of coarse problems based on the linear system itself and on certain assumptions about the smooth components of the error. For smoothed aggregation (SA) methods applied to discretizations of elliptic problems, these assumptions typically consist of knowledge of the near-nullspace of the weak form. This paper introduces an extension of the SA method in which good convergence properties are achieved in situations where explicit knowledge of the near-nullspace components is unavailable. This extension is accomplished by using the method itself to determine near-nullspace components and adjusting the coarsening processes accordingly.

Key words. algebraic multigrid (AMG), generalized smoothed aggregation (SA), adaptive
method

AMS subject classifications. 65F10, 65N55, 65F30

DOI. 10.1137/S1064827502418598

1. Introduction. Over the last decade, smoothed aggregation (SA; cf. [21, 23,
22, 20, 9]) has emerged as an efficient multilevel algebraic solver for the solution of the
algebraic systems obtained by discretizing certain classes of differential equations on
unstructured meshes. In particular, SA is often very efficient at solving the systems
that arise from problems of three-dimensional (3D) thin-body elasticity, a task that
can tax traditional algebraic multigrid (AMG) techniques.
As with classical AMG [4, 18, 19], the standard SA method bases its transfer
operators on certain assumptions about the nature of smooth error. For SA applied
to discretizations of elliptic partial differential equations, this assumption usually takes
the form of explicit knowledge of the near-nullspace of the associated weak form. This
knowledge is easy to obtain for large classes of problems. For example, it is simple
to determine the near-nullspace for finite element discretizations of second- or fourth-
order partial differential equations, including many nonscalar problems. In more
general situations, however, this knowledge may not be readily available. Consider
the case where the matrix for a problem is provided without knowledge of how the
original problem was discretized or scaled.

∗Received by the editors November 25, 2002; accepted for publication (in revised form) October 26, 2003; published electronically May 25, 2004.
http://www.siam.org/journals/sisc/25-6/41859.html
†Department of Applied Mathematics, Campus Box 526, University of Colorado at Boulder, Boulder, CO 80309-0526 ([email protected], [email protected], tmanteuf@boulder.colorado.edu, [email protected], [email protected]). This work was sponsored by the National Institutes of Health under grant 1-R01-EY12291-01, the National Science Foundation under grant DMS-0084438, the Department of Energy under grants DE-FG03-94ER25217 and DE-FC02-01ER25479, and the National Science Foundation under VIGRE grant DMS-9810751.
‡Center for Applied Scientific Computation, Lawrence Livermore National Lab, P.O. Box 808, Livermore, CA 94551 ([email protected]).

Seemingly innocuous discretization practices, such as the use of scaled bases, can
hamper AMG solvers if this scaling is not taken into account. Even the simplest
problems discretized on regular grids using standard finite elements can pose serious
difficulties if the resulting matrix has been scaled without this information being
provided to the solver. Other discretization practices leading to problematic linear
systems include the use of exotic bases and systems problems in which different local
coordinate systems are used for different parts of the model.
To successfully solve such problems when only the matrix is provided, we need
a process by which the algebraic multilevel solver can determine how to effectively
coarsen the linear system using only information from the system itself. The method
we propose here, which we call adaptive smoothed aggregation (αSA), is an attempt
to do just that. αSA is based on the simple principle that applying a linear iterative
method to the homogeneous problem (Ax = 0) reveals error components that the
method does not effectively reduce. While this principle is easily stated in loose
terms, the resulting algorithm and its implementation can be very subtle. We hope
to expose these subtleties in the presentation that follows.
The objective of the setup phase of αSA is therefore to compute a set of vectors, B,
that represent error components that relaxation is slow to resolve. Such components
are usually referred to by the terms algebraically smooth, near-nullspace, near-kernel,
or, in the case of linear elasticity, rigid body modes. We simply call them candidates
here as it is not actually essential that all of the vectors we compute be troublesome
components; we use a measure that, in effect, ignores candidates that relaxation effi-
ciently handles. It is also not a problem if we compute redundant or linearly dependent
candidates because our approach is designed to select the information we need from
the candidate subspace. It is, however, important to be certain that the final set of
candidates is rich in the sense that they combine to represent all troublesome com-
ponents locally. The keys to being able to do this are to evolve the multigrid solver
by having it compute its own slow-to-converge error components (by way of the ho-
mogeneous problem) and to use these new components to properly improve the solver.
The setup phase for αSA is easiest to describe as an adaptive process. We start
from a given primitive parent method (possibly a simple relaxation scheme), with
error propagation operator M0 , and a current but possibly empty set, B, of candidates
(error components that M0 does not effectively reduce). We attempt to enhance B
by first putting M0 to the following test: given a small number, n, of iterations and
a random initial guess, e0 , compute

(1.1)    e_n ← M_0^n e_0.

If the method performs well, in the sense that e_n is much smaller than e_0 in an
appropriate norm, then it is accepted as the solver and the adaptive scheme stops.
Otherwise, the resulting approximation, en , is expected to be rich in the error compo-
nents that are not effectively reduced by M0 , so it is added to the candidate set, B. The
new candidate set is then used to construct an improved child method, with error prop-
agation operator M1 . The whole process can then be repeated with M1 in place of M0 ,
continuing in this way to generate a sequence of hopefully improving methods, Mk .
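For concreteness, this outer iteration can be sketched in a few lines of Python; this is a minimal illustration only, in which `apply_method` and `build_child_method` are hypothetical stand-ins for one sweep of the current solver M_k and for the setup that constructs the child method M_{k+1} from the candidate set B:

```python
import numpy as np

def adaptive_iteration(A, apply_method, build_child_method,
                       n=5, tol=0.1, max_cycles=4):
    # B: candidate set; M: current method, starting from a primitive parent M_0.
    B = []
    M = build_child_method(A, B)
    for _ in range(max_cycles):
        e = np.random.rand(A.shape[0])      # random initial guess e_0
        e0_energy = e @ (A @ e)
        for _ in range(n):                  # e_n <- M_0^n e_0, cf. (1.1)
            e = apply_method(M, A, e)
        # convergence factor measured on the quadratic form, as in the
        # stopping tests of the setup algorithms below
        factor = ((e @ (A @ e)) / e0_energy) ** (1.0 / n)
        if factor <= tol:                   # e_n much smaller than e_0:
            break                           # accept M as the solver
        B.append(e)                         # e is rich in error M misses
        M = build_child_method(A, B)        # improved child method M_{k+1}
    return M, B
```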
Thus, we iterate on the method itself, improving the current version by having it
compute its own troublesome components—those that it does not effectively reduce—
and then adjusting the coarsening process accordingly to produce a new method. Old
candidate components are also used in this adjustment process to ensure that the new
method continues to reduce them efficiently. This improvement process repeats until

the current method shows itself to be capable of efficient solution of the problem of
interest. The iteration on the method is called the adaptive setup phase (or, simply,
the setup phase) to distinguish it from the solver phase, where the resulting method is
applied to the target problem. The setup phase is terminated when either the latest
incarnation of the method performs satisfactorily or a prescribed number of steps is
reached.
Each new child method is constructed based on components resulting from its
parent iteration (1.1). The method is modified to reflect the newly computed candi-
dates as soon as they become available. In other words, the method is kept up to date
at all times and no more work is done than necessary. In section 3.2, we show how
the general setup phase naturally takes the form of a reverse full multigrid (FMG)
cycle.
The adaptive strategy outlined above is designed to uncover global error compo-
nents that a parent method does not handle well. It is crucial to recognize that there
are likely to be many such components—so many that, in general, we cannot expect
to identify each one individually. Typically, a small but fixed percentage of the spec-
trum of Mk corresponds to troublesome components. Thus, the few candidates that
iteration (1.1) identifies must serve as representatives for many smooth components
in the coarsening process. This is analogous to the standard SA coarsening processes
where the near-kernel is used to represent all smooth components. This representation
is accomplished by first taking local segments of each candidate (i.e., by taking the
restriction of the candidate to an aggregate and extending it outside the aggregate
with zeros) so that the segments sum to the candidate itself. Each segment is then
smoothed to enhance the overall approximation property in the sense that it accu-
rately represents similar smooth components. In this way, standard SA constructs a
rich set of local representations of the smooth or troublesome components. So too
must αSA. Indeed, we need a way to coarsen the system that ensures accurate approx-
imation of the error components that the current candidates represent. Of course, to
control storage and CPU costs, we also need to control operator complexity, which
involves limiting the number of candidates that iteration (1.1) produces, exploiting
these candidates as fully as we can, and limiting the growth of the number of coarse
degrees of freedom.
The SA framework [20] lends itself to this task. It offers fast automatic coarsening
with well-understood control over operator complexity due to its typically fixed coarse-
operator sparsity pattern. In addition, the process guarantees proper approximation
of a given set of functions and their natural localizations during the coarsening process.
The resulting coarse-level basis functions are smooth by design and thus suitable for
use in a multilevel method. The candidates obtained by iteration (1.1) play the roles
of the near-kernel components on which the SA method is based. Thus, in the αSA
context, the notion of near-kernel components depends not only on the problem but
also on the current method. In general, however, a troublesome component must
have a small Rayleigh quotient, signifying ineffectiveness of relaxation. However, in
all but the initial phase (where coarsening perhaps has not been constructed yet) or
the final phase (where the method may be efficient), the current candidate must also
have a small Rayleigh quotient defined in terms of the current coarse-level projection
operator. We do not use this property explicitly in the adaptive process, but keeping
it in mind can aid in understanding the development that follows.
Thus, our main goal is to extend the applicability of the SA concept to difficult
problems for which the original method may perform poorly, possibly due to the
lack of explicit knowledge of the near-kernel. The algorithm may also be useful for
improving performance in applications that involve multiple right-hand sides, where
efforts to improve the method may be amortized over the number of solutions.
In what follows, we develop this modification of the SA method in such a way
that good convergence properties are recovered even if explicit knowledge of the near-
kernel is either incomplete or lacking altogether. This should facilitate solution in cases
where the problem geometry, discretization method, or coefficients of the differential
operator are not explicitly known to the solver. At the same time, we strive to keep
storage requirements low.
The concept of using a multigrid algorithm to improve itself is not new. Using
representative smooth vectors in the coarsening process was first introduced in [15],
where interpolation was defined to fit vectors obtained by relaxation of the homoge-
neous problem. In [4], a variation of this idea was used for recovering typical AMG
convergence rates for a badly scaled scalar elliptic problem. While the method there
was very basic and used only one candidate, it contained many of the ingredients of the
approach developed below. These concepts were developed further in [16, 17, 19, 14].
The idea of fitting eigenvectors corresponding to the smallest eigenvalues was ad-
vocated in [14] and [19], where an AMG algorithm determining these eigenvectors
through Rayleigh quotient minimization was outlined. These vectors were, in turn,
used to update the AMG interpolation and coarse-grid operators. Most of these ideas
were later summarized in [14]. A more sophisticated adaptive framework appropriate
for the standard AMG is currently under investigation [7].
Another method of the type developed here is the bootstrap AMG scheme pro-
posed recently by Brandt [3] and Brandt and Ron [5]. It differs somewhat from ours
in that it starts on the fine grid by iterating on a number of different random initial
guesses, with interpolation then constructed to approximately fit the resulting vectors
in a least-squares sense.
Various other attempts have been made to allow for the solver itself to determine
from the discrete problem the information required to successfully solve it, without
a priori assumptions on the form of the smooth error. These include the methods
of [13, 8, 6, 11, 12]. All these methods, however, need access to the local finite
element matrices of the problem so that they can construct the multigrid transfer op-
erators based on the algebraically smooth eigenvectors of the agglomerated stiffness
matrices. Although these methods exhibit attractive convergence properties, their
need to construct, store, and manipulate the coarse-level element information typi-
cally leads to increased storage requirements compared to those of classical AMG or
standard SA. The current method aims to achieve the good convergence properties of
the element-based methods without the overhead of the element storage.
This paper is organized as follows. In section 2, we briefly recall the standard SA
method and introduce some notation used throughout the remainder of the paper.
Readers who are unfamiliar with the fundamental concepts assumed here may first
wish to consult basic references on multigrid (e.g., [10]) and SA (e.g., [20]). Section 3
motivates and describes possible strategies to extract the information used to con-
struct improved transfer operators based on the method’s iterative history. These
strategies can be described as adaptive AMG, in which the method ideally evolves
until a cross-point at which further improvement (in terms of convergence rate) is
offset by the increased cost of each iteration. Section 4 discusses implementation is-
sues and ways of reducing cost and improving accuracy of the setup phase. Finally,
section 5 presents computational examples demonstrating the performance of the SA
method based on the adaptive setup concepts.


2. Smoothed aggregation. We first briefly recall the SA method and introduce
some of the notation used later (see [20] for more detail). Assume that A is an SPD
matrix of order n_1 resulting from a discretization of an elliptic second- or fourth-order
partial differential equation in R^d, where d ∈ {1, 2, 3}. Our aim is to solve A_1 x = b_1,
obtained by scaling A y = b by its diagonal part, D:

(2.1)    A_1 = D^{-1/2} A D^{-1/2},    b_1 = D^{-1/2} b.

A hierarchy of coarse problems is generated by the Galerkin recurrence

(2.2)    A_{l+1} = (I_{l+1}^l)^T A_l I_{l+1}^l,

where the prolongator, I_{l+1}^l, is defined as the product of a given prolongation smoother,
S_l, and a tentative prolongator, P_{l+1}^l,

(2.3)    I_{l+1}^l = S_l P_{l+1}^l,

l = 1, . . . , L − 1. Suppose we are given a smoothing procedure for each level l ∈
{1, . . . , L} system, A_l x = b_l, of the form

(2.4)    x ← (I − R_l A_l) x + R_l b_l.

Here, R_l is some simple approximate inverse of A_l (e.g., R_l = s_l I, where s_l ≈ 1/ρ(A_l))
for l = 1, . . . , L − 1. Assume for simplicity that the coarsest level uses a direct solver:
R_L = A_L^{-1}. To make use of the existing convergence estimates, we assume that

    λ_min(I − R_l A_l) ≥ 0    and    λ_min(R_l) ≥ 1 / (C_R^2 ρ(A_l)),

with a constant C_R > 0 independent of the level.
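As a concrete reference for (2.1)–(2.4), the following is a minimal SciPy sketch of these three building blocks; it is an illustration only, with the scalar `s` standing for the damping s_l ≈ 1/ρ(A_l):

```python
import numpy as np
import scipy.sparse as sp

def diagonal_scale(A, b):
    """Symmetric diagonal scaling (2.1): A_1 = D^{-1/2} A D^{-1/2}."""
    d = 1.0 / np.sqrt(A.diagonal())
    D = sp.diags(d)
    return (D @ A @ D).tocsr(), d * b

def richardson_sweep(A, x, b, s):
    """One sweep of the smoother (2.4) with R = s I, s ~ 1/rho(A)."""
    return x + s * (b - A @ x)

def galerkin(A, I):
    """Galerkin coarse-level operator (2.2): A_{l+1} = I^T A_l I."""
    return I.T @ (A @ I)
```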


The SA iteration can formally be viewed as a standard variational multigrid
process with a special choice of transfer operators I_{l+1}^l. One iteration of SA multigrid,
x ← AMG(x, b_1), for solving A_1 x_1 = b_1 is given by setting AMG = AMG_1, where
AMG_l(·, ·), l = 1, . . . , L − 1, is defined as follows.
Algorithm 1 (AMG_l).
1. Presmoothing: Apply ν presmoothings to A_l x_l = b_l of the form
   x_l ← (I − R_l A_l) x_l + R_l b_l.
2. Coarse-grid correction:
   (a) Set b_{l+1} = (I_{l+1}^l)^T (b_l − A_l x_l).
   (b) If l + 1 < L, set x_{l+1} = 0 and solve the coarse-grid problem

           A_{l+1} x_{l+1} = b_{l+1}

       by γ applications of x_{l+1} ← AMG_{l+1}(x_{l+1}, b_{l+1});
       else, solve the coarse-level problem directly.
   (c) Correct the solution on level l: x_l ← x_l + I_{l+1}^l x_{l+1}.
3. Postsmoothing: Apply ν postsmoothings to A_l x_l = b_l of the form
   x_l ← (I − R_l A_l) x_l + R_l b_l.
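A recursive rendering of Algorithm 1 follows; this is a sketch under stated assumptions, where `levels` is a hypothetical list of triples (A_l, I_{l+1}^l, s_l), with (A_L, None, None) on the coarsest level, and `richardson_sweep` is the helper sketched above:

```python
import numpy as np

def amg_cycle(levels, l, x, b, nu=1, gamma=1):
    A, I, s = levels[l]
    if l == len(levels) - 1:                    # coarsest level: direct solve
        return np.linalg.solve(A.toarray(), b)
    for _ in range(nu):                         # step 1: presmoothing
        x = richardson_sweep(A, x, b, s)
    b_c = I.T @ (b - A @ x)                     # step 2(a): restrict residual
    x_c = np.zeros(I.shape[1])
    for _ in range(gamma):                      # step 2(b): gamma coarse cycles
        x_c = amg_cycle(levels, l + 1, x_c, b_c, nu, gamma)
    x = x + I @ x_c                             # step 2(c): coarse-grid correction
    for _ in range(nu):                         # step 3: postsmoothing
        x = richardson_sweep(A, x, b, s)
    return x
```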
The components of the method that can be varied to achieve better convergence
are the construction of S_l and P_{l+1}^l. A good choice for S_l is described in [20] and
scrutinized in more generality in [8]. For our purposes, it suffices to assume that the
prolongation smoother, S_l, is the error propagation operator corresponding to the
Richardson iteration for the problem on level l, with the particular choice of damping
suggested in [20],

(2.5)    S_l = I − (4 / (3 λ_l)) A_l,

where λ_l = 9^{1−l} λ̄ and λ̄ is a bound on the spectral radius of the fine-level matrix:
ρ(A_1) ≤ λ̄. With the prolongation smoothers thus chosen, we can concentrate in this
paper on the construction of P_{l+1}^l.
Note 2.1. Our selection of the multigrid smoothing procedure (2.4) and prolongation
smoothers S_l follows that of [20], where convergence estimates are obtained. We turn
to these results for heuristics in section 3.
The construction of operators P_{l+1}^l consists of deriving the sparsity structure and
specifying the nonzero values. The nonzero sparsity structure determines the supports
of the tentative coarse-grid functions and is specified by way of a decomposition of
the set of degrees of freedom associated with operator A_l into an aggregate partition

    ∪_{i=1}^{N_{l+1}} A_i^l = {1, . . . , N_l},    A_i^l ∩ A_j^l = ∅,  1 ≤ i < j ≤ N_{l+1},  l = 1, . . . , L − 1,

where N_l denotes the number of nodes on level l. Note that the number of aggregates on
level l defines the number of nodes on the next level: N_{l+1} = card({A_i^l}). Let n_l
denote the number of degrees of freedom on level l, and assume at least one degree of
freedom is associated with each node so that n_l ≥ N_l. Aggregates A_i^l can be formed
based only on the connectivity and strength of connection between the entries of A_l;
cf. [23].
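For instance, a simple greedy pass of the flavor used in [23] might look as follows; this is a sketch, not the authors' exact algorithm, with θ a strength-of-connection threshold:

```python
import numpy as np

def greedy_aggregate(A, theta=0.08):
    C = A.tocsr()
    n = C.shape[0]
    d = C.diagonal()
    agg = -np.ones(n, dtype=int)                # agg[i] = aggregate of node i
    n_agg = 0
    def strong(i):                              # strongly connected neighbors
        cols = C.indices[C.indptr[i]:C.indptr[i + 1]]
        vals = C.data[C.indptr[i]:C.indptr[i + 1]]
        return [j for j, v in zip(cols, vals)
                if j != i and abs(v) >= theta * np.sqrt(abs(d[i] * d[j]))]
    for i in range(n):                          # pass 1: seed disjoint aggregates
        if agg[i] == -1 and all(agg[j] == -1 for j in strong(i)):
            agg[i] = n_agg
            for j in strong(i):
                agg[j] = n_agg
            n_agg += 1
    for i in range(n):                          # pass 2: sweep leftover nodes
        if agg[i] == -1:
            nbrs = [j for j in strong(i) if agg[j] != -1]
            if nbrs:
                agg[i] = agg[nbrs[0]]           # join a neighboring aggregate
            else:
                agg[i] = n_agg                  # isolated node: own aggregate
                n_agg += 1
    return agg, n_agg
```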
Although we illustrate these concepts in the example of a finite element discretiza-
tion, where the notion of a node should be most familiar to the reader, for us a node
is a strictly algebraic entity consisting of a list of degrees of freedom. In fact, the
finite element analogy is only possible on the finest level; the degrees of freedom on
all other levels have no explicit geometry associated with them. Thus, throughout
this paper, a node on level l + 1 > 1 is a set of degrees of freedom associated with the
coarse basis functions whose discrete supports contain the same aggregate on level l.
Thus, each aggregate, A, on level l gives rise to one node on level l + 1, and each
degree of freedom constituting that node is a coefficient of a particular basis function
in the coarse-level basis expansion associated with A.
The second ingredient in constructing generalized aggregation tentative prolongators
P_{l+1}^l consists of endowing the sparsity structure derived from the nodal aggregation
with appropriate values. Starting with a given matrix, B^1, whose columns represent
the near-kernel of the fine-level operator, we construct the tentative prolongators
and the coarse-level representation of the near-kernel components simultaneously to
satisfy

(2.6)    P_{l+1}^l B^{l+1} = B^l,    (P_{l+1}^l)^T P_{l+1}^l = I.

This construction of P_{l+1}^l and B^{l+1} is practical and parallelizable because it is achieved
by assigning each nodal aggregate a set of columns of P_{l+1}^l with a sparsity structure
that is disjoint from all other columns. Thus, obtaining (2.6) amounts to solving a
set of local independent orthonormalization problems in which the basis given by the
fine-level near-kernel matrix, B^l, restricted to the degrees of freedom of an aggregate,
is orthonormalized using the QR algorithm. The resulting orthonormal basis forms
the values of a block column of P_{l+1}^l, while the coefficients representing the old basis
with respect to the new basis define B^{l+1}; cf. [23, 20].
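The per-aggregate QR of (2.6) can be sketched as follows (dense arrays for clarity only; it assumes one degree of freedom per fine node and at least r rows per aggregate):

```python
import numpy as np

def tentative_prolongator(B, agg, n_agg):
    """(2.6): per-aggregate QR of B^l gives a block column of P_{l+1}^l
    (the Q factor) and the coarse candidates B^{l+1} (the R factor)."""
    n, r = B.shape
    P = np.zeros((n, n_agg * r))
    B_coarse = np.zeros((n_agg * r, r))
    for a in range(n_agg):
        rows = np.where(agg == a)[0]            # dofs of aggregate a
        Q, R = np.linalg.qr(B[rows, :])         # thin QR of the local block
        P[np.ix_(rows, np.arange(a * r, (a + 1) * r))] = Q
        B_coarse[a * r:(a + 1) * r, :] = R      # old basis in the new basis
    return P, B_coarse
```

Because the block columns have disjoint supports, P satisfies (P)^T P = I and P B_coarse = B by construction; in practice P is stored as a sparse matrix with one nonzero block column per aggregate.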

Note 2.2. In this way, with B^1, A_1, and b_1 given, the entire multigrid setup
can be performed. This construction of the SA multigrid hierarchy, using (2.6), (2.3),
and (2.2), and relying on a given fine-level near-kernel representation, B^1, is called
the standard SA setup in this paper. For later reference, we outline the setup in
Algorithm 2 below. For details, see [20].
Algorithm 2 (standard SA setup). Given A_1, B^1, and L, do the following for
l = 1, . . . , L − 1:
(a) Construct {A_i^l}_{i=1}^{N_{l+1}} based on A_l.
(b) Construct B^{l+1} and P_{l+1}^l using (2.6), based on {A_i^l}_{i=1}^{N_{l+1}}.
(c) Construct the smoothed prolongator: I_{l+1}^l = S_l P_{l+1}^l.
(d) Construct the coarse matrix: A_{l+1} = (I_{l+1}^l)^T A_l I_{l+1}^l.
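Composing the pieces sketched above (greedy_aggregate, tentative_prolongator, galerkin) gives one plausible rendering of Algorithm 2; `spectral_radius_bound` is a hypothetical helper (e.g., a few power-method steps), and the 0-based level index shifts λ_l = 9^{1−l} λ̄ accordingly:

```python
import scipy.sparse as sp

def sa_setup(A1, B1, L, theta=0.08):
    levels, A, B = [], A1, B1
    lam_bar = spectral_radius_bound(A1)          # rho(A_1) <= lam_bar
    for l in range(L - 1):
        agg, n_agg = greedy_aggregate(A, theta)          # step (a)
        P, B = tentative_prolongator(B, agg, n_agg)      # step (b), eq. (2.6)
        lam = lam_bar / 9.0 ** l                         # lambda_l = 9^{1-l} lam_bar
        S = sp.identity(A.shape[0], format='csr') - (4.0 / (3.0 * lam)) * A
        I = S @ P                                        # step (c), eq. (2.3)
        levels.append((A, I, 1.0 / lam))                 # keep (A_l, I, s_l)
        A = galerkin(A, I)                               # step (d), eq. (2.2)
    levels.append((A, None, None))                       # coarsest level
    return levels
```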
With our choice of smoothing components and a coarsening procedure utiliz-
ing (2.6), the standard SA scheme can be proven to converge under certain assump-
tions on the near-kernel components alone. The following such result motivates the
need for standard SA to have access to the near-kernel components and serves to
motivate and guide our development of αSA.
Let ⟨u, v⟩_A denote the Euclidean inner product over the degrees of freedom corresponding
to an agglomerate A, and denote the A_1-norm by |||u||| = ⟨A_1 u, u⟩^{1/2}.
Let B^1 denote an n_1 × r matrix whose columns are thought to form a basis for the
near-kernel components corresponding to A_1.
Theorem 2.3 (Theorem 4.2 of [20]). With Ã_i^l denoting the set of fine-level
degrees of freedom corresponding to aggregate A_i^l on level l, assume that there exists
a constant C_a > 0 such that, for every u ∈ R^{n_1} and every l = 1, . . . , L − 1, the following
approximation property holds:

(2.7)    Σ_i min_{w ∈ R^r} ‖u − B^1 w‖²_{Ã_i^l} ≤ C_a (9^{l−1} / ρ(A_1)) ⟨A_1 u, u⟩.

Then

    |||x* − AMG(x, b_1)||| ≤ (1 − 1/c(L)) |||x* − x|||    for all x ∈ R^{n_1},

where A_1 x* = b_1 and c(L) is a polynomial of degree 3 in L.


Since the use of (2.6) is assumed, condition (2.7) reflects an assumption on all
tentative prolongators P_{l+1}^l and can be equivalently restated as

(2.8)    Σ_i min_{w ∈ R^r} ‖u − P_2^1 P_3^2 · · · P_{l+1}^l B^{l+1} w‖²_{Ã_i^l} ≤ C_a (9^{l−1} / ρ(A_1)) ⟨A_1 u, u⟩

for every u ∈ R^{n_1} and every l = 1, . . . , L − 1. Thus, in the context of SA, condition
(2.7) can be viewed as an alternative formulation of the weak approximation property [2].
Note that the required approximation of a fine-level vector is less stringent for
coarser levels. Also, convergence is guaranteed even though no regularity assumptions
have been made. Although the convergence bound naturally depends on the number
of levels, computational experiments suggest that the presence of elliptic regularity
for standard test problems yields optimal performance (i.e., convergence with bounds
that are independent of the number of levels).
That polynomial c(L) in the convergence estimate has degree 3 is an artifact
of the proof technique used in [20], where no explicit assumptions are made on the
smoothness of the coarse-level bases; instead, only the smoothness guaranteed by
application of the simple prolongation smoother, S_l, is considered.
Notice that this convergence result hinges on the selection of B^1. In particular,
B^1 and the coarse operators, B^{l+1} and P_{l+1}^l, 1 ≤ l ≤ L − 1, that it induces must
guarantee that the left side of (2.8) is small for any vector u for which ⟨A_1 u, u⟩ is
small. Since the standard SA method requires that matrix B^1 be given as input, with
the columns of B^1 representing a basis of (a superset of) the near-kernel of A_1, the
construction of P_{l+1}^l in (2.6) guarantees that all coarse-level representations, B^l, form
a basis for (a superset of) the near-kernel of A_l, l > 1. The purpose of this paper is to
enrich a given incomplete (possibly even empty) set of near-kernel components with
approximations computed at runtime in such a way that good convergence can be
recovered. The adaptive method can then be viewed as an iterative attempt to satisfy
(2.8) heuristically (see Note 3.2 below). Our B^1 is computed only approximately,
which means that the coarse-level B^l obtained by (2.6) alone may not be the optimal
representation of the near-kernel. To remedy this, we carry out the setup computation
also on the coarse levels, to improve on the initial guess for the coarse-level candidates
given by (2.6).
The following section describes our basic approach to achieving this objective.

3. Self-correcting adaptive setup. In this section, we describe a prototype
of the adaptive method. To maintain generality and simplicity, the discussion is
intentionally vague about the various processes involved in the algorithm. Details are
provided in section 4. It suffices to say here that the coarsening processes are closely
related to those used in standard SA.
Before describing the algorithm, we introduce the following notational conventions.
The transfer operators and coarse-level problems, as well as other components
of our multigrid scheme, change as our method evolves. Whenever possible, we use
the same symbols for the updated components. Thus, the symbol B^l may denote a single
vector in one cycle of the setup procedure or perhaps a two-column matrix in the next
step of the setup. The intended meaning should be clear from context.

3.1. Initialization setup stage. The adaptive multigrid setup procedure con-
sidered in this paper can be split into two stages. If no knowledge of the near-kernel
components of A1 is available, then we start with the first stage to determine an ap-
proximation to one such component. This stage also determines the number of levels,
L, to be used in the coarsening process. (Changing L in the next stage based on
observed performance is certainly possible, but it is convenient to fix L—and other
constructs—early in the setup phase.)
Let ε > 0 be a given convergence tolerance.
Algorithm 3 (initialization stage).
1. Set l = 1, select a random vector, x_1 ∈ R^{n_1}, and create a copy, x̂_1 ← x_1.
2. With initial approximation x_1, relax µ times on A_1 x = 0:

       x_1 ← (I − R_1 A_1)^µ x_1.

3. If (⟨A_1 x_1, x_1⟩ / ⟨A_1 x̂_1, x̂_1⟩)^{1/µ} ≤ ε, then set L = 1 and stop (problem A_1 x = b_1 can be
   solved fast enough by relaxation alone, so only one level is needed).
4. Otherwise, do the following:
   (a) Set B^l ← x_l.
   (b) Create a set, {A_i^l}_{i=1}^{N_{l+1}}, of nodal aggregates based on matrix A_l.
   (c) Define the tentative prolongator P_{l+1}^l and candidate matrix B^{l+1} using the
       candidate matrix B^l and relations (2.6), with structure based on {A_i^l}_{i=1}^{N_{l+1}}.
   (d) Define the prolongator: I_{l+1}^l = S_l P_{l+1}^l.
   (e) Define the coarse matrix: A_{l+1} = (I_{l+1}^l)^T A_l I_{l+1}^l. If level l + 1 is coarse
       enough that a direct solver can be used there, skip to Step 5; otherwise,
       continue.
   (f) Set the next-level approximation vector: x_{l+1} ← B^{l+1}.
   (g) Make a copy of the current approximation: x̂_{l+1} ← x_{l+1}.
   (h) With initial approximation x_{l+1}, relax µ times on A_{l+1} x = 0:

           x_{l+1} ← (I − R_{l+1} A_{l+1})^µ x_{l+1}.

   (i) If (⟨A_{l+1} x_{l+1}, x_{l+1}⟩ / ⟨A_{l+1} x̂_{l+1}, x̂_{l+1}⟩)^{1/µ} ≤ ε, skip Steps (f)–(i) in further passes through
       Step 4.
   (j) Increment l ← l + 1 and return to Step 4(a).
5. Set L ← l + 1 and update the finest-level candidate matrix:

       B^1 ← I_2^1 I_3^2 · · · I_{L−1}^{L−2} x_{L−1}.

6. Create the V-cycle based on B^1 using the standard SA setup of Algorithm 2,
   with the exception that the aggregates are predetermined in Step 4.
This initialization stage terminates whenever a level is reached in the coarsening
process where a direct solver is appropriate. It does not involve level L processing
because it is assumed that the coarsest level is handled by a direct solver, making the
stopping criterion in Step 4(i) automatically true. Notice that the candidate matrix
is actually a vector in this initial stage because we are computing only one candidate.
Note that this stage provides all of the components needed to construct our initial
solver, AMG1 .
If the criterion tested in Step 4(i) is satisfied, we are assured that the current
coarse level l + 1 can be easily solved by relaxation alone. At that point, we could
choose not to coarsen further and use relaxation as a coarsest-level solver. However,
it is possible that the general stage of the algorithm described below adds more can-
didates. In case a new candidate approximates the low-energy modes of the problem
better than the candidate obtained in the initial step, the coarse-level matrix may
no longer be easily solved by relaxation alone. Thus, we choose to coarsen further,
until we are sure that the coarsest problem can be handled well. This offers an added
benefit of producing, at the end of the initial stage, a complete aggregation that can
be reused in the general stage. Note that if Step 4(i) is satisfied, then the approxi-
mate solution of the homogeneous problem may be zero. In such a case, we restore the
saved original vector x̂l+1 . We choose to skip Steps 4(f)–(i) in further coarsening once
Step 4(i) is satisfied. This amounts to using standard SA coarsening from level l + 1
down, which guarantees that the candidate computed on level l is exactly represented
all the way to the coarsest level. Figure 3.1 illustrates Algorithm 3.
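In the sketch vocabulary used earlier (with `spectral_radius_bound`, `richardson_sweep`, `greedy_aggregate`, `tentative_prolongator`, and `galerkin` as assumed helpers), a simplified rendering of the initialization stage, which always coarsens down to the direct-solver level as discussed above, might read:

```python
import numpy as np

def initialization_stage(A1, eps=0.7, mu=5, coarse_size=100):
    A, ops = A1, []                     # ops: smoothed prolongators I_2^1, ...
    x = np.random.rand(A.shape[0])
    while A.shape[0] > coarse_size:     # stop once a direct solver is cheap
        x_hat = x.copy()
        lam = spectral_radius_bound(A)  # hypothetical rho(A_l) bound
        for _ in range(mu):             # Steps 2/4(h): relax on A x = 0
            x = richardson_sweep(A, x, np.zeros_like(x), 1.0 / lam)
        if ((x @ (A @ x)) / (x_hat @ (A @ x_hat))) ** (1.0 / mu) <= eps:
            x = x_hat                   # relaxation suffices; restore saved copy
        agg, n_agg = greedy_aggregate(A)                     # Step 4(b)
        P, B_c = tentative_prolongator(x.reshape(-1, 1), agg, n_agg)
        I = P - (4.0 / (3.0 * lam)) * (A @ P)                # I = S P, cf. (2.5)
        ops.append(I)
        A = galerkin(A, I)                                   # Step 4(e)
        x = B_c[:, 0]                                        # Step 4(f)
    for I in reversed(ops):             # Step 5: B^1 <- I_2^1 ... I_{L-1}^{L-2} x_{L-1}
        x = I @ x
    return x.reshape(-1, 1)             # candidate matrix B^1 for Algorithm 2
```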
Note 3.1. The initialization stage described in Algorithm 3 is used only if no
knowledge of the near-kernel components is provided. In many situations, however,
some knowledge may be available and should be used. In such cases, the initialization
stage can be omitted and the initial B^1 can be assumed to consist of the given set
of vectors. The initial V-cycle would then be constructed exactly as in Algorithm 2
prior to running the main adaptive setup.
Fig. 3.1. Initialization stage, Algorithm 3.

As an example, consider a problem of 3D linear elasticity discretized by a standard
linear first-order finite element method over an unstructured grid. In this case,

if the discretization package either generates the rigid body modes or supplies the
nodal geometry to the solver, then the full set of nullspace vectors is presumably
available [22] and the adaptive process may be unnecessary. Otherwise, when the full
set of rigid body modes is unavailable, it is nevertheless often possible to obtain a sub-
set of the rigid body modes consisting of three independent constant displacements,
regardless of the geometry of the grid. Such a subspace should be used whenever
possible to create B^1 and to set up a V-cycle exactly as in the standard SA method.
The initialization stage would then be omitted.
Thus, the initialization stage given by Algorithm 3 should be viewed as optional,
to be done only if no information can be assumed about the system to be solved.
In view of Note 3.1, we can in any case assume that the initial B^1 has at least
one column and that a tentative V-cycle is available. This means that we have
constructed aggregates A_i^l, transfer operators P_{l+1}^l and I_{l+1}^l, and coarse operators
A_{l+1}, l = 1, . . . , L − 1.

3.2. General setup stage. In each step of the second stage of the adaptive
procedure, we apply the current V-cycle to the homogeneous problem to uncover error
components that are not quickly attenuated. The procedure then updates its own
transfer operators to ensure that these components will be eliminated by the improved
method, while preserving the previously established approximation properties. Thus,
this stage essentially follows the initialization stage with relaxation replaced by the
current V-cycle.

One of the subtleties of this approach lies in the method's attempt to update each
level of the evolving V-cycle as soon as its ineffectiveness is exposed. Thus, on the
finest level in the second stage, the current V-cycle simply plays the role of relaxation:
if it is unable to quickly solve the homogeneous problem (i.e., Step 3 fails), then the
resulting error becomes a new candidate, and new degrees of freedom are generated
accordingly on level 2 (i.e., columns are added to B^1). The level-2-to-L part of the
old V-cycle (i.e., the part without the finest level) then plays the role of the level 2
relaxation in the initial setup phase and is thus applied to the homogeneous problem
to assess the need to improve its coarser-level interpolation operators. The same is

done on each coarser level, l, with the level l-to-L part of the old V -cycle playing the
role of the level l relaxation step in the initial setup phase. The process continues
until adequate performance is observed or the maximum permitted number of degrees
of freedom per node is reached on coarse levels.
We present a general prototype algorithm for the adaptive multigrid setup, assuming
that a tentative V-cycle has previously been constructed (cf. Note 3.1). We
thus assume that a current hierarchy of nodal aggregates, {A_i^l}_{i=1}^{N_{l+1}}, and operators
P_{l+1}^l, I_{l+1}^l, A_{l+1} are available for all l = 1, . . . , L − 1.

Consider a method in which, within each cycle of the adaptive setup, we attempt
to update the current V-cycle level by level. One cycle of this adaptive setup traverses
from the finest to the coarsest level; on each level l along the way, it updates B^l based
on computing a new candidate from the current multigrid scheme applied to the
homogeneous problem on level l. Thus, on level l in the setup process, a solver is
applied that traverses from that level to level L and back. This gives us the picture of
a backward FMG cycle, where the setup traverses from the finest to the coarsest grid
and each level along the way is processed by a V-cycle solver (see Figure 3.2). Now,
once this new candidate is computed, it is incorporated into the current multigrid
scheme; the previously existing V-cycle components are overwritten on level l + 1
but temporarily retained from that level down. As a result, we redefine the V-cycle
components level by level. Once the new B^l (and I_{l+1}^l in (2.3)) are constructed all the
way to the coarsest level, we can use them to update the current B^1 and, based
on it, construct a new V-cycle on the finest level.
Fig. 3.2. Self-correcting adaptive cycling scheme given by Algorithm 4, with the solver cycles
uncollapsed.

The general stage is therefore analogous to the initialization stage described in
Algorithm 3, with relaxation replaced by the evolving V-cycle. Instead of using simple
relaxation on each level as the initialization stage does, the general stage uses the
current solver to identify new types of error that the earlier sweeps of the setup cycle
may have missed. The initialization stage was designed to capture a type of error
that relaxation cannot efficiently eliminate. A prototype of this algebraically smooth
error is the candidate that is generated by applying relaxation to the homogeneous
problem on each level. Similarly, each cycle of the general stage is designed to capture
a type of error that the current V-cycle cannot handle, and this too must be done on
each level. It is important to note that we are talking here about error type. It is not
enough for the coarsening process to eliminate only the particular candidate; typically,
a fixed percentage of the spectrum of A_1 is algebraically smooth, so elimination of one
candidate at a time would take O(n_1) setup cycles. Thus, to avoid this unacceptably
large cost, each setup cycle must determine interpolation operators so that the solver
eliminates a relatively large set of errors of each candidate’s type. Just as each rigid
body mode is used locally in standard SA to treat errors of similar type (constants
represent errors that are smooth within variables and rotations represent intervariable
“smoothness”), so too must each candidate be used in αSA. Moreover, a full set
of types must be determined if the solver is to attain full efficiency (e.g., for two-
dimensional (2D) linear elasticity, three rigid body modes are generally needed). We
thus think of each candidate as a sort of straw man that represents a class of smooth
components. Efficient computation of a full set of straw men is the responsibility of
the adaptive process. However, proper treatment of each straw man is the task of the
basic solver, which is SA in this case.
As we apply our current method to the homogeneous problem, the resulting candidate,
x_l, becomes rich in the components of the error that are slow to converge in
the current method. Our goal in designing the adaptive algorithm is to ensure that
x_l is approximated relatively well by the newly constructed transfer operator. That
is, we want to control the constant C_a in the inequality

(3.1)    min_{v ∈ R^{n_{l+1}}} ‖x_l − P_{l+1}^l v‖² ≤ (C_a / ρ(A_l)) ‖x_l‖²_{A_l}.

The transfer operators must therefore be constructed to give accurate approximations
to each candidate as it is computed. This can be guaranteed locally by requiring that,
over every aggregate A, we have

(3.2)    min_{v ∈ R^{n_{l+1}}} ‖x_l − P_{l+1}^l v‖²_A ≤ C_a δ_A(x_l),

where the δ_A are chosen so that summing (3.2) over all aggregates leads to (3.1), i.e., so
that

(3.3)    Σ_A δ_A(x) = ⟨A_l x, x⟩ / ρ(A_l).

For now, the only assumption we place on δ_A(x) is that (3.3) holds. An appropriate
choice for the definition of δ_A(x) is given in section 4.
Note 3.2 (relationship to theoretical assumptions). To relate condition (3.1) to
the theoretical foundation of SA, we make the following observation. If P_{l+1}^l is
constructed so that (3.1) is satisfied for the candidate x_l, the construction of our method
automatically guarantees that

(3.4)    min_{v ∈ R^{n_{l+1}}} ‖x_1 − P_2^1 P_3^2 · · · P_{l+1}^l v‖² ≤ (C_a / ρ(A_l)) ‖x̂_1‖²_{A_1},

where x_1 = P_2^1 P_3^2 · · · P_l^{l−1} x_l and x̂_1 = I_2^1 I_3^2 · · · I_l^{l−1} x_l. Since it is easy to show that
‖x̂_1‖_{A_1} ≤ ‖x_1‖_{A_1}, we can thus guarantee that (2.8) holds for the particular fine-level
candidate x_1. Inequality (3.1) is easily satisfied for any component u for which ‖u‖_{A_1}
is bounded away from zero. We can thus focus on the troublesome subspace of components
with small energy. Our experience with the standard SA method indicates
that, for second- and fourth-order elliptic problems, it suffices to ensure that the
components corresponding to the nullspace of the weak form of the problem are well
approximated by the prolongation (the near-nullspace components are then well approximated
due to the localization and smoothing procedures involved in constructing
the SA transfer operators). Further, as the set of candidates constructed during the
setup cycle is expected to eventually encompass the entire troublesome subspace,
satisfaction of (3.1) for all candidates would imply the satisfaction of (2.8) for any
u ∈ R^{n_1}. This, in turn, guarantees convergence.
Note 3.3 (locally small components). Each new candidate is the result of applying
the V-cycle based on the current B^1, so it must be approximately A_1-orthogonal to
all previously computed candidates. This is, however, only a global property that the
evolving candidates tend to exhibit. It may be that a candidate is so small on some
aggregate, relative to its energy, that its representation there can be ignored. More
precisely, we could encounter situations in which

(3.5)    ‖x_l‖²_A ≤ C_a δ_A(x_l)

for a particular aggregate, A, meaning that (3.2) is automatically satisfied, no matter
what choice we make for P_{l+1}^l. We can, therefore, test for this condition for each
candidate on every aggregate. When the test is positive, we can simply remove the
candidate's segment from consideration in the construction of that aggregate's transfer
operator. This elimination can help control coarse-level complexity since small
candidate segments are prevented from generating additional columns of P_{l+1}^l and I_{l+1}^l.
(This test should be used in the initialization as well as the general setup stage. We
did not include it there for simplicity and because there is generally no worry about
complexity in the initial stage.)
Note 3.4 (construction of P_{l+1}^l to minimize the number of columns). Although
each candidate's segments may be too large to ignore, it may be that a nontrivial linear
combination of them is small. Thus, we also need to use (3.5) to identify a minimal
local basis for constructing the transfer operators so that the global approximation
property is maintained. To this end, let subscript A denote the restriction of the
corresponding vector or matrix to the degrees of freedom in A, and let r_A denote
the number of columns of B^l_A. One possibility for constructing the updated transfer
operator, P_{l+1}^l, aggregate by aggregate, would then proceed as follows (see the sketch
after this list):
• Rescale each column, y, of B^l globally: y ← y / ⟨A_l y, y⟩^{1/2}.
• Reorder the newly scaled columns of B^l_A so that their Euclidean norms over
  aggregate A are nonincreasing: ‖y_1‖_A ≥ ‖y_2‖_A ≥ · · · ≥ ‖y_{r_A}‖_A.
• Set j = 1.
• While j ≤ r_A:
  – Set γ_A = C_a δ_A(y_j) / ⟨A_l y_j, y_j⟩.
  – If ‖y_j‖²_A ≤ γ_A, then stop. Otherwise, add y_j / ‖y_j‖_A as a new column of
    P_{l+1}^l, make all remaining columns in B^l_A orthogonal to y_j, and reorder
    them so that their Euclidean norms over A are nonincreasing.
  – j ← j + 1.
A disadvantage of this process is that P_{l+1}^l (hence also I_{l+1}^l) must, in principle,
be constructed from scratch in each cycle of the adaptive setup. We discuss other
practical issues associated with this approach in Note 4.2.
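A sketch of this per-aggregate loop follows (our rendering, not a definitive implementation; `delta_A` is assumed to implement the quantity δ_A of (3.3), for example via (4.1) in section 4):

```python
import numpy as np

def local_columns(B, rows, A, delta_A, Ca):
    """Note 3.4 for one aggregate: select a minimal set of local columns.
    B holds the candidates; rows are the aggregate's degrees of freedom."""
    Y = [B[:, j] / np.sqrt(B[:, j] @ (A @ B[:, j]))   # global rescaling
         for j in range(B.shape[1])]
    cols = []
    while Y:
        Y.sort(key=lambda y: -np.linalg.norm(y[rows]))  # nonincreasing local norms
        y = Y.pop(0)
        gamma = Ca * delta_A(y) / (y @ (A @ y))
        if np.linalg.norm(y[rows]) ** 2 <= gamma:
            break                               # rest is locally negligible: stop
        q = y[rows] / np.linalg.norm(y[rows])   # new local column of P_{l+1}^l
        cols.append(q)
        for z in Y:                             # orthogonalize remaining segments
            z[rows] -= (q @ z[rows]) * q
    return cols
```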
Note 3.5 (reusing previously constructed components). To exploit the work done
in the earlier steps of the setup as much as possible, we consider an alternate procedure
that reuses the parts of P_{l+1}^l that have already been computed. Thus, in each step of the
setup, we consider only adding a single new column to P_{l+1}^l. This has the advantages
that less work is required and that the storage used to hold the global candidates can
be reused as soon as they have been incorporated into P_{l+1}^l.

In this approach, to minimize the complexity of the transfer operators, we seek to
ignore locally those components of candidate x_l that appear to be well approximated
by the current transfer operators. This includes the case when x_l is locally small
in the sense of (3.5). To decide whether to ignore x_l locally in the construction of the
new tentative prolongator P_{l+1}^l, we test how well it is approximated by the current
tentative prolongator, P̃_{l+1}^l. The following provides a test of how well the range of
P̃_{l+1}^l approximates x_l over aggregate A:

(3.6)    ‖x_l − P̃_{l+1}^l (P̃_{l+1}^l)^T x_l‖²_A ≤ C_a δ_A(x_l).

(Since (P̃_{l+1}^l)^T P̃_{l+1}^l = I, then P̃_{l+1}^l (P̃_{l+1}^l)^T is the L² projection onto the range of P̃_{l+1}^l;
thus, (3.6) is just approximation property (3.2) using the tentative prolongator in place
of the smoothed one.) If (3.6) is satisfied, then x_l is assumed to be well approximated
by the current transfer operator and is simply ignored in the construction of the new
transfer operator on aggregate A. (Practical implications of this local elimination
from the coarsening process are considered in section 4.) If the inequality is not
satisfied, then we keep the computed vector y = x_l − P̃_{l+1}^l (P̃_{l+1}^l)^T x_l, which, by
construction, is orthogonal to all the vectors already represented in the current P̃_{l+1}^l.
We then normalize via y ← y / ‖y‖_A so that the new P_{l+1}^l has orthonormal columns:
(P_{l+1}^l)^T P_{l+1}^l = I.
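In code, the local test (3.6) and the resulting single-column update might look like the following sketch, where `P_loc` is the aggregate's local block of the current tentative prolongator P̃_{l+1}^l (orthonormal columns) and `delta` is the precomputed value δ_A(x_l):

```python
import numpy as np

def new_local_column(x, P_loc, rows, delta, Ca):
    """Note 3.5 on one aggregate: return a new orthonormal local column,
    or None if (3.6) says x is already well approximated there."""
    x_loc = x[rows]
    y = x_loc - P_loc @ (P_loc.T @ x_loc)   # residual of the L2 projection
    if y @ y <= Ca * delta:                 # test (3.6)
        return None                         # ignore x on this aggregate
    return y / np.linalg.norm(y)            # keep and normalize, cf. Note 3.5
```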
Before we introduce Algorithm 4 below, we stress that the description should
be viewed as a general outline of the adaptive multigrid setup. We intentionally
ignore several practical issues that must be addressed before this algorithm can be
implemented. For instance, we do not include details on how the new B^l and I_{l+1}^l
are efficiently constructed in the evolving method. Also, when using a coarse-level
V-cycle constructed by previous applications of the setup stage, we must deal with
the possibility that the number of vectors approximated on coarse levels in previous
cycles is smaller than the number of vectors approximated on the fine levels in the
current cycle. These issues are discussed in section 4, where we take advantage of the
SA framework to turn the prototypical Algorithm 4 into a practical implementation.

Assume we are given a bound, K ∈ N, on the number of degrees of freedom per
node on coarse levels, a convergence factor tolerance ε ∈ (0, 1), and aggregate quantities
δ_A(x) such that Σ_A δ_A(x) = ⟨A_l x, x⟩ / ρ(A_l).
Algorithm 4 (one cycle of the general setup stage).
1. If the maximum number of degrees of freedom per node on level 2 equals K,
   stop (the allowed number of coarse-grid degrees of freedom has been reached).
2. Create a copy of the current B^1 for later use: B̂^1 ← B^1.
3. Select a random x_1 ∈ R^{n_1}, create a copy x̂_1 ← x_1, and apply µ iterations of
   the current finest-level V-cycle:

       x_1 ← AMG_1^µ(x_1, 0).

4. If (⟨A_1 x_1, x_1⟩ / ⟨A_1 x̂_1, x̂_1⟩)^{1/µ} ≤ ε, then stop (A_1 x = b_1 can be solved quickly enough by
   the current method).
5. Update B^1 by incorporating the computed x_1 in its range:

       B^1 ← [B^1, x_1].

6. For l = 1, . . . , L − 2:
   (a) Create a copy of the current B^{l+1} for later use: B̂^{l+1} ← B^{l+1}.
   (b) Define the new coarse-level matrix B^{l+1} and transfer operators P_{l+1}^l, I_{l+1}^l
       using (2.6) and (2.3). In creating P_{l+1}^l, some local components in B^l may
       be eliminated as suggested in Notes 3.4 and 3.5.
   (c) Construct the coarse operator A_{l+1} = (I_{l+1}^l)^T A_l I_{l+1}^l.
   (d) Set the coarse-level approximation x_{l+1} to be the last column of B^{l+1}.
   (e) Make a copy: x̂_{l+1} ← x_{l+1}.
   (f) Apply µ iterations of the current level l + 1 V-cycle:

           x_{l+1} ← AMG_{l+1}^µ(x_{l+1}, 0).

   (g) If (⟨A_{l+1} x_{l+1}, x_{l+1}⟩ / ⟨A_{l+1} x̂_{l+1}, x̂_{l+1}⟩)^{1/µ} ≤ ε, then skip Steps (d) through (h) in further
       passes through Step 6.
   (h) Update B^{l+1} by ensuring that the newly computed x_{l+1} is in its range:

           B^{l+1} ← [B̂^{l+1}, x_{l+1}].

7. Update the finest-level candidate:

(3.7)    x_1 ← I_2^1 I_3^2 · · · I_{L−1}^{L−2} x_{L−1}.

8. Update B^1 by adding the newly computed x_1 to the range of B̂^1:

       B^1 ← [B̂^1, x_1].

9. Create a V-cycle, AMG, based on B^1 using the standard SA setup of Algorithm 2.
Algorithm 4, which is illustrated in Figure 3.3, starts from a V-cycle on input
and produces an improved V-cycle as output. It stops iterating when either the
convergence factor for the fine-level iteration in Step 3 is acceptable (as measured in
Step 4) or the maximum number of iterations is reached. Note that, as with the initial
stage, this general stage does not involve level L processing because the coarsest level
is assumed to be treated by a direct solver. Also as in the initial stage, once a level
is reached where the problem can be solved well by the current method, any further
coarsening is constructed as in the standard SA.

Fig. 3.3. One step of the general setup stage, Algorithm 4.
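A compact sketch of one such cycle, under strong assumptions, is given below. Here `solver` is a hypothetical object exposing the number of levels `L`, `solve(l, x, b)` (one V-cycle applied from level l down), and `rebuild_level(l, B)` (Steps 6(a)–(c), including the temporary bridging operators of Note 4.1 in section 4); the coarse-level convergence test 6(g) is omitted for brevity:

```python
import numpy as np

def general_setup_cycle(solver, A1, B1, eps=0.7, mu=2, K=6):
    if B1.shape[1] >= K:                       # Step 1: dof-per-node budget
        return B1, True
    x = np.random.rand(A1.shape[0])            # Step 3
    x0 = x.copy()
    for _ in range(mu):
        x = solver.solve(1, x, np.zeros_like(x))       # V-cycle on A_1 x = 0
    if ((x @ (A1 @ x)) / (x0 @ (A1 @ x0))) ** (1.0 / mu) <= eps:
        return B1, True                        # Step 4: current method suffices
    B, ops = np.hstack([B1, x[:, None]]), []   # Step 5
    for l in range(1, solver.L - 1):           # Step 6: descend level by level
        B_c, I = solver.rebuild_level(l, B)    # 6(a)-(c), bridging per Note 4.1
        ops.append(I)
        x_c = B_c[:, -1]                       # 6(d): last column is the candidate
        for _ in range(mu):                    # 6(f): coarse-level V-cycles
            x_c = solver.solve(l + 1, x_c, np.zeros_like(x_c))
        B = np.hstack([B_c[:, :-1], x_c[:, None]])     # 6(h)
    x1 = B[:, -1]                              # x_{L-1}
    for I in reversed(ops):                    # Step 7: eq. (3.7)
        x1 = I @ x1
    return np.hstack([B1, x1[:, None]]), False # Step 8: B^1 <- [B-hat^1, x_1]
```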
4. Implementation issues. Several issues must be addressed to make Algo-
rithm 4 practical. We take advantage of certain features of the SA concept to carry
out the method outlined in Algorithm 4, as well as to control the amount of work
required to keep the evolving coarse-level hierarchy up to date.
As suggested in Notes 3.4 and 3.5, a candidate may occasionally be eliminated
locally over an aggregate. This results in varying numbers of degrees of freedom
per node on the coarse levels. (Recall that a coarse-level node is defined as a set of
degrees of freedom, each representing the restriction of a single candidate to a fine-
level aggregate.) To simplify notation, we assume for the time being that the number
of degrees of freedom per node is the same for all nodes on a given level (i.e., no
candidates are locally eliminated). It is important, however, to keep in mind that we
are interested in the more general case. A generalization to varying numbers of degrees
of freedom per node could be obtained easily at the cost of a much more cumbersome
notation. We briefly remark on the more general cases in Notes 4.2 and 4.3 below.
Note 4.1 (construction of temporary “bridging” transfer operators). An issue
we must consider is the interfacing between the emerging V-cycle on finer levels and
the previous V-cycle on coarser levels. Each setup cycle starts by selecting an initial
approximation for a new candidate on the finest level (cf. Figure 3.3). This approximation
is then improved by applying the error propagation matrix for the previously
constructed V-cycle to it. The resulting candidate is used to enrich B^1. This necessitates
an update of P_2^1, I_2^1, and A_2 from (2.6) and (2.2) and introduces an additional
degree of freedom for the nodes on level 2. Since we now want to run the current
solver on level 2 to obtain an improved candidate on that level, we need to temporarily
modify P_3^2 and I_3^2 because these transfer operators have not yet been updated to
reflect the added degrees of freedom on level 2. Once this modification has been made,
a V-cycle on level 2 can be run to compute the new candidate there. This candidate
is then incorporated into B^2 and new P_3^2 and I_3^2 are constructed, overwriting the
temporary versions, and the new A_3 can be computed using (2.2). To perform the
V-cycle on level 3, we then must temporarily modify operators P_4^3 and I_4^3 for the same
reason we had to update P_3^2 and I_3^2 above. Analogous temporary modifications to
the transfer operators are necessary on all coarser levels, as the setup cycle traverses
sequentially through them.

Thus, on stage l of a single cycle of the setup process, all transfer operators defining
the V-cycle can be used without change, except for P_{l+1}^l and, consequently, I_{l+1}^l
defined through (2.3). We can construct the temporary operator, P_{l+1}^l, by modifying
(2.6) as

    P_{l+1}^l B^{l+1} = B̂^l,

where B̂^l is formed by removing the last column from B^l, which consists of the
k + 1 fine-level candidate vectors, including the newly added one (so that the first k
candidates are the same as in the previous cycle). Since the tentative prolongator P_{l+1}^l
produced in this way is based only on fitting the first k vectors in B^l, the coarse-level
matrix A_{l+1} resulting from the previous cycle of the αSA setup (described below) can
be used on the next level. Thus, all the coarse operators for levels coarser than l can
be used without change. This has the advantage of reducing the amount of work needed
to keep the V-cycle up to date on coarser, yet-to-be-traversed levels.

So far, we have considered only the case where all candidates are used locally.
In the interest of keeping only the candidates that are essential to achieving good
convergence properties, we now consider locally eliminating the candidates where
appropriate.
Note 4.2 (eliminating candidates locally as suggested in Note 3.5). When we
eliminate a candidate locally over an aggregate as suggested in Note 3.5, the construction
of the bridging operator above can be easily modified so that the multigrid
hierarchy constructed in the previous setup cycle can be used to apply a level l V-cycle
in the current one. Since the procedure guarantees that the previously selected
candidates are retained and only the newly computed candidate may be locally eliminated,
the V-cycle constructed in the previous setup cycle remains valid on coarser
grids, as in the case of Note 4.1. The only difference now is that aggregates may have
a variable number of associated candidates, and the construction of the temporary
transfer operator P_{l+1}^l described in Note 4.1 must account for this when removing the
column of B^l to construct B̂^l.
Note 4.3 (eliminating candidates locally as suggested in Note 3.4). The situation
is slightly more complicated when the procedure described in Note 3.4 is used to
eliminate candidates locally over an aggregate. First, even if none of the old candidates
are eliminated, the use of the procedure of Note 3.4 may result in a permutation of
the candidates over an aggregate and hence a permutation of the coarse degrees of
freedom corresponding to the associated node. To match the fine-level V -cycle with
the existing coarser levels, an appropriate permutation of the coarse degrees of freedom
must then be done when performing the intergrid transfer in the application of the
resulting V -cycle.
However, if some of the previously selected candidates are eliminated in favor of
the new candidate in the construction of the updated P_{l+1}^l, the coarse V-cycle should
no longer be used without change. In such cases, we would have to generate all the
coarse levels below level l before running the level l + 1 V-cycle. This results in a
significantly increased cost of the setup phase.
Note 4.4 (selection of the local quantities δ_A(x)). Our algorithm relies on the local
aggregate quantities δ_A(x) to decide whether to eliminate candidate x on aggregate
A, and to guarantee that the computed candidates satisfy the global approximation
property (3.1). This leads us to the choice

(4.1)    δ_A(x) = (card(A) / N_l) ⟨A_l x, x⟩ / ρ(A_l),

where card(A) denotes the number of nodes in aggregate A on level l, and N_l is the
total number of nodes on that level. Note that Σ_A δ_A(x) = ⟨A_l x, x⟩ / ρ(A_l) for any x, so this choice
can be used in the local estimates (3.2) to guarantee (3.1).
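For instance, (4.1) can be realized as follows (a sketch; `spectral_radius_bound` is the hypothetical helper used earlier, and one degree of freedom per node is assumed so that card(A) is just the aggregate size):

```python
import numpy as np

def make_delta(A, agg, N_l):
    """Build delta_A of (4.1): delta_A(x) = (card(A)/N_l) <A x, x> / rho(A)."""
    rho = spectral_radius_bound(A)
    sizes = np.bincount(agg)                 # card(A) for each aggregate
    def delta(x, a):                         # a: aggregate index
        return (sizes[a] / N_l) * (x @ (A @ x)) / rho
    return delta
```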
Suppose that we are given a bound, K > 0, on the maximum allowed number of
degrees of freedom per node on the coarse levels, and a tolerance, ε ∈ (0, 1), on the
target convergence factor. Then one adaptive setup cycle is defined as follows.
Algorithm 5 (one cycle of αSA).
1. If the maximum number of degrees of freedom per node on level 2 equals K,
   stop (the allowed number of coarse-grid degrees of freedom has been reached).
2. Create a copy of the current B^1 for later use: B̂^1 ← B^1.
3. Select a random x_1 ∈ R^{n_1}, create a copy x̂_1 ← x_1, and apply µ iterations of
   the current V-cycle:

       x_1 ← AMG_1^µ(x_1, 0).

4. If (⟨A_1 x_1, x_1⟩ / ⟨A_1 x̂_1, x̂_1⟩)^{1/µ} ≤ ε, then stop (A_1 x = b_1 can be solved quickly enough by
   the current method).
5. Update B^1 by extending its range with the new column x_1:

       B^1 ← [B^1, x_1].

6. For l = 1, . . . , L − 2:
   (a) Define a new coarse-level matrix B^{l+1} and transfer operator P_{l+1}^l based
       on (2.6), using B^l and the decomposition {A_i^l}_{i=1}^{N_{l+1}}. In creating P_{l+1}^l, some
       local components in B^l may be locally eliminated as suggested in Note 3.5.
   (b) Construct the prolongator: I_{l+1}^l = S_l P_{l+1}^l.
   (c) Construct the coarse operator: A_{l+1} = (I_{l+1}^l)^T A_l I_{l+1}^l.
   (d) Reorder the columns of B^{l+1} so that its last is x_{l+1}, and let B̂^{l+1} consist
       of all other columns of B^{l+1}.
   (e) Create a “bridge” transfer operator P_{l+2}^{l+1} to the coarser level with the old
       B^{l+1} by fitting all the vectors in B^{l+1} except the last one; see Note 4.1.
   (f) Set the new “bridging” prolongator: I_{l+2}^{l+1} = S_{l+1} P_{l+2}^{l+1}.
   (g) Make a copy: x̂_{l+1} ← x_{l+1}.
   (h) Apply µ iterations: x_{l+1} ← AMG_{l+1}^µ(x_{l+1}, 0).
   (i) If (⟨A_{l+1} x_{l+1}, x_{l+1}⟩ / ⟨A_{l+1} x̂_{l+1}, x̂_{l+1}⟩)^{1/µ} ≤ ε, then skip Steps (d) through (j) in further passes
       through Step 6.
   (j) Update the coarse representation of the candidate, B^{l+1}:

           B^{l+1} ← [B̂^{l+1}, x_{l+1}].

7. Update the latest fine-level candidate:

(4.2)    x_1 ← I_2^1 I_3^2 · · · I_{L−1}^{L−2} x_{L−1}.

8. Update B^1 by extending the old copy with the newly computed x_1:

       B^1 ← [B̂^1, x_1].

9. Create the V-cycle based on the current B^1 using the standard SA setup described
   by Algorithm 2.
Note that if we use the candidate elimination scheme of Note 3.4 in Step 6(a), we should modify the algorithm to construct a completely new multigrid hierarchy on levels $l+1$ through $L$ before applying the level $l+1$ V-cycle in Step 6(h) (cf. Note 4.3).
Before presenting computational results, we consider several possible improvements intended to reduce the number of setup cycles needed and the amount of work required to carry out each cycle.
Note 4.5 (improving the quality of existing candidates). Many practical situ-
ations, including fourth-order equations and systems of fluid and solid mechanics,
require a set of multiple candidates to achieve optimal convergence. In the interest
of keeping operator complexity as small as possible, it is imperative that the number
of candidates used to produce the final method be controlled. Therefore, ways of
improving the quality of each candidate are of interest, to curb the demand for the
growth in their number.
When the current V -cycle hierarchy is based on approximating at least two can-
didates (in other words, the coarse problems feature at least two degrees of freedom
per node), this can be easily accomplished as follows.
Assume that the currently available candidate vectors are $x_1, \ldots, x_k$, and consider one such candidate, say $x_j$, that we want to improve. We run a modified version of the current V-cycle on the homogeneous problem, $A_1 x = 0$, using $x_j$ as the initial guess. The modification consists of disabling, in the coarse-grid correction process, the columns of the prolongator corresponding to the given candidate. That is, instead of $x^l \leftarrow x^l + I_{l+1}^l x^{l+1}$ in step 2(c) of Algorithm 1, we use
$$x^l \leftarrow x^l + I_{l+1}^l \hat{x}^{l+1},$$
where $\hat{x}^{l+1}$ is obtained from $x^{l+1}$ by setting to zero every entry corresponding to fine-level candidate $x_j$. Thus, the columns of $I_{l+1}^l$ corresponding to $x_j$ are not used in coarse-grid correction.
In this way, we come up with an improved candidate vector without restarting
the entire setup iteration from scratch and without adding a new candidate. Since we
focus on one component at a time and keep all other components intact, this modified
V -cycle is expected to converge rapidly.
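A minimal sketch of this masked correction step; mask_j, assumed to index the coarse degrees of freedom associated with candidate $x_j$, and the function name are illustrative:

```python
import numpy as np

def masked_correction(x_l, I_l, x_coarse, mask_j):
    """Coarse-grid correction with candidate j disabled: zero the coarse
    entries belonging to x_j before interpolating, so the corresponding
    prolongator columns contribute nothing."""
    x_hat = x_coarse.copy()
    x_hat[mask_j] = 0.0          # drop the dofs tied to candidate x_j
    return x_l + I_l @ x_hat     # x^l <- x^l + I^l_{l+1} x_hat^{l+1}
```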
Note 4.6 (saving work). The reuse of current coarse-level components described
in Note 4.1 reduces the amount of work required to keep the V -cycle up to date.
Additional work can be saved by performing the decomposition of nodes into disjoint
aggregates only during the setup of the initial V -cycle and then reusing this decom-
position in later cycles. Yet further savings are possible in coarsening, assuming the
candidates are allowed to be locally eliminated according to Note 3.5. For instance,
we can exploit the second-level matrix structure
$$A_2 = \begin{pmatrix} \tilde{A}_2 & X \\ Y & Z \end{pmatrix},$$
where $\tilde{A}_2$ is the second-level matrix from the previous cycle. Thus, $A_2$ need not be recomputed from scratch; it can be obtained by a rank-one update of each block entry in $\tilde{A}_2$. In a similar fashion, the new operators $P_{l+1}^l$, $B^{l+1}$ do not have to be recomputed in each new setup cycle by the local QR decomposition noted in section 2. Instead, it is possible to update each nodal entry in $\tilde{P}_{l+1}^l$, $\hat{B}^{l+1}$ by a rank-one update on all coarse levels, where $\tilde{P}_{l+1}^l$, $\hat{B}^{l+1}$ are the operators created by the previous setup cycle.
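One way this block reuse might look, assuming for illustration a symmetric problem and a prolongator whose columns split as $I = [I_{\mathrm{old}}, I_{\mathrm{new}}]$ with the columns for previously fitted candidates unchanged (a sketch, not the paper's implementation):

```python
import scipy.sparse as sp

def extend_coarse_matrix(A2_old, A1, I_old, I_new):
    """Reuse the previous coarse matrix as the leading block of
    A2 = [[A2_old, X], [Y, Z]]; only the blocks coupling the new
    candidate's coarse dofs are formed afresh (Y = X^T by symmetry)."""
    X = (I_old.T @ A1 @ I_new).tocsr()  # coupling: old coarse dofs -> new
    Z = (I_new.T @ A1 @ I_new).tocsr()  # new-new block
    return sp.bmat([[A2_old, X], [X.T, Z]], format="csr")
```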
5. Numerical experiments. To demonstrate the effectiveness of the proposed adaptive setup process, we present results obtained by applying the method to several model problems. In these tests, the solver was stopped when the relative residual reached the value $\varepsilon = 10^{-12}$ (unless otherwise specified). The value $C_a = 10^{-3}$ was used for test (3.6), and the relaxation scheme for the multigrid solver was symmetric Gauss–Seidel. While a Krylov subspace process is often used in practice, for clarity we present these results for a basic multigrid V-cycle with no acceleration scheme, unless explicitly specified otherwise.
All the experiments have been run on a notebook computer with a 1.6 GHz mo-
bile Pentium 4 processor and 512 MB of RAM. For each experiment, we report the
following. The column denoted by “Iter” contains the number of iterations required
to reduce the residual by the prescribed factor. The “Factor” column reports the convergence factor, measured as the geometric average of the residual reduction over the last 10 iterations. In the “CPU” column, we report the total CPU time in seconds required to complete both the setup and iteration phases of the solver. In the “RelCPU” column, we report the relative time to solution, with one unit defined as the time required to solve the problem given the correct near-nullspace components.
Table 5.1
Mis-scaled 3D Poisson problems, 68,921 and 1,030,301 degrees of freedom; using $\varepsilon = 10^{-8}$.

σ   Candidates   µmax   Iter   Factor   CPU       RelCPU   OpComp
Poisson problem with 68,921 degrees of freedom
0   provided     N/A    9      0.100    3.65      1.00     1.038
0   1            5      9      0.100    4.09      1.12     1.038
6   provided     N/A    150    0.871    43.76     11.99    1.038
6   1            5      10     0.126    4.27      1.17     1.038
Poisson problem with 1,030,301 degrees of freedom
0   provided     N/A    9      0.093    58.43     1.00     1.039
0   1            5      9      0.099    80.05     1.37     1.039
6   provided     N/A    690    0.970    3,252.80  55.67    1.039
6   1            5      9      0.096    88.23     1.51     1.039
In the “OpComp” column, we report the operator complexity associated with the V-cycle for every run (we define operator complexity in the usual sense [19], as the ratio of the number of entries stored in all problem matrices on all levels to the number of entries stored in the finest-level matrix). The “Candidates” column indicates the number of kernel vectors computed in the setup iteration (a value of “provided” means that complete kernel information was supplied to the solver, assuming a standard discretization and ignoring scaling). The parameter $\mu_{\max}$ denotes the maximal number of tentative V-cycles allowed in computing each candidate.
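As an illustration, the “Factor” and “OpComp” quantities can be computed as in the following sketch (the residual-norm history and per-level matrices are assumed inputs; names are illustrative):

```python
import numpy as np

def measured_factor(res_norms, k=10):
    """Geometric average of the residual reduction over the last k iterations."""
    r = np.asarray(res_norms, dtype=float)[-(k + 1):]
    return (r[-1] / r[0]) ** (1.0 / (len(r) - 1))

def operator_complexity(level_matrices):
    """Nonzeros stored on all levels divided by nonzeros on the finest level."""
    return sum(A.nnz for A in level_matrices) / level_matrices[0].nnz
```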
In all the cases considered, the problem was modified either by scaling or by
rotating each nodal entry in the system by a random angle (as described below).
These modifications pose serious difficulties for classical algebraic iterative solvers
that are not aware of such modifications, as we assume here.
For comparison, we also include the results for the unmodified problem, with a
supplied set of kernel components. Not surprisingly, the standard algorithm (without
benefit of the adaptive process) performs poorly for the modified system when the
details of this modification are kept from the solver, as we assume here.
We start by considering a diagonally scaled problem,
$$A \leftarrow D^{-1/2} A D^{-1/2},$$
where the original $A$ is the matrix obtained by a standard Q1 finite element discretization of the 3D Poisson operator on a cube and $D$ is a diagonal matrix with entries $10^{\beta}$, where $\beta \in [-\sigma, +\sigma]$ is chosen randomly. Table 5.1 shows the results for different values of the parameter $\sigma$ and different levels of refinement. Using the supplied kernel yields good convergence factors for the unmodified problem, but the performance is poor and deteriorates with increased problem size when $\sigma \neq 0$. In contrast,
the adaptive process, starting from a random approximation, recovers the convergence
properties associated with the standard Poisson problem (σ = 0), even for the scaled
case, with convergence that appears independent of the problem size.
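A sketch of how such a scaled test matrix can be generated (the constructor of the underlying Poisson matrix is assumed to exist elsewhere; only the scaling step reflects the text):

```python
import numpy as np
import scipy.sparse as sp

def diagonally_rescale(A, sigma, rng=None):
    """Form D^{-1/2} A D^{-1/2}, with D diagonal, entries 10^beta,
    beta drawn uniformly from [-sigma, +sigma]."""
    rng = np.random.default_rng() if rng is None else rng
    beta = rng.uniform(-sigma, sigma, A.shape[0])
    d_inv_sqrt = sp.diags(10.0 ** (-beta / 2.0))   # D^{-1/2}
    return (d_inv_sqrt @ A @ d_inv_sqrt).tocsr()
```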
The second problem comes from a diagonally scaled matrix arising in 2D elasticity. The diagonal entries of $D$ are again defined as $10^{\beta}$, with $\beta \in [-\sigma, +\sigma]$ chosen randomly.
The original matrix is the discrete operator for the plane-strain elasticity formulation
over a square domain using bilinear finite elements on a uniform grid, with a Poisson
ratio of ν = 0.3 and Dirichlet boundary conditions specified only along the “West”
side of the domain. The results in Table 5.2 follow a pattern similar to those for the
Poisson problem. Note, however, that more than the usual three candidate vectors
are now needed to achieve convergence properties similar to those of the unscaled problem when a correct set of three rigid-body modes is provided by the user.
Table 5.2
Scaled 2D elasticity problems with 80,400 and 181,202 degrees of freedom. Iteration counts marked with an asterisk indicate that residual reduction by $10^{12}$ was not achieved before the maximum number of iterations was reached.

σ   Candidates   µmax   Iter     Factor   CPU       RelCPU   OpComp
2D elasticity problem, 80,400 degrees of freedom
0   3 provided   N/A    17       0.21     9.16      1.00     1.27
0   3            6      23       0.37     21.16     2.31     1.27
0   3            15     18       0.23     26.65     2.91     1.27
6   3 provided   N/A    299      0.92     133.55    14.58    1.27
6   3            6      25       0.38     22.26     2.43     1.27
6   3            15     18       0.25     27.30     2.98     1.27
2D elasticity problem, 181,202 degrees of freedom
0   3 provided   N/A    23       0.35     22.85     1.00     1.28
0   3            15     267      0.937    272.14    11.91    1.27
0   4            15     26       0.422    75.18     3.29     1.50
0   4            20     26       0.439    86.60     3.79     1.50
0   5            15     20       0.314    88.20     3.86     1.78
6   3 provided   N/A    5,000*   0.996    4,559.95  199.56   1.28
6   4            15     23       0.367    74.95     3.28     1.50
6   4            20     19       0.302    76.78     3.36     1.50
6   5            10     14       0.173    69.46     3.04     1.78
For the scaled problem, however, supplying the rigid-body modes computed based on the problem geometry leads, as expected, to dismal performance of the standard solver.
The third set of experiments is based again on the 2D elasticity problem, but now
each nodal block is rotated by a random angle $\beta \in [0, \pi]$,
$$A \leftarrow Q^T A Q,$$
where $Q$ is a nodal block-diagonal matrix consisting of rotations with random angles.
The results in Table 5.3 show that αSA can recover good convergence factors for both
the unmodified and the modified systems. Without the adaptive procedure, our basic
algebraic solver could not solve the modified matrix problem in a reasonable amount
of time.
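A sketch of this modification for a problem with two degrees of freedom per node, assuming the unknowns are ordered node by node (illustrative names):

```python
import numpy as np
import scipy.sparse as sp

def randomly_rotate(A, n_nodes, rng=None):
    """Form Q^T A Q, where Q is block diagonal with one 2x2 rotation
    per node, each through a random angle beta in [0, pi]."""
    rng = np.random.default_rng() if rng is None else rng
    blocks = []
    for beta in rng.uniform(0.0, np.pi, n_nodes):
        c, s = np.cos(beta), np.sin(beta)
        blocks.append(np.array([[c, -s], [s, c]]))
    Q = sp.block_diag(blocks, format="csr")
    return (Q.T @ A @ Q).tocsr()
```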
The fourth example demonstrates performance of the method when a higher num-
ber of candidates is required. We consider a 3D elasticity problem with local rotations.
This is done to maintain locally orthogonal coordinates but is otherwise a random
rotation of the three degrees of freedom at each node. The model problem we start
from is linearized elasticity discretized using trilinear finite elements over a uniform
grid. Dirichlet boundary conditions are specified on the “West” face of the cube, and
the Poisson ratio is set to ν = 0.3. The results in Table 5.4 show that, even for the
modified system, the adaptive method can again recover good convergence factors.
Furthermore, our current method mimics the convergence of the SA for the unmodi-
fied problem with the supplied set of rigid-body modes. In this set of experiments, we
can get close to the ideal iteration counts using just six candidates. We see that using
one extra candidate can improve convergence properties and in some cases actually
lower the overall cost of the total time to solution. This is done at the price of a small
increase in operator complexity. For problems with multiple right-hand sides, the
more expensive setup would be performed only once, and using the extra candidate
may then be preferred.
Table 5.3
Rotated 2D elasticity problems with 80,400 and 181,202 degrees of freedom. Iteration counts marked with an asterisk indicate that residual reduction by $10^{12}$ was not achieved before the limit on the number of iterations was reached.

Rotated   Candidates   µmax   Iter     Factor   CPU       RelCPU   OpComp
2D elasticity problem with 80,400 degrees of freedom
NO    3 provided   N/A   17      0.21    9.16      1.00     1.27
NO    3            15    18      0.23    26.66     2.91     1.27
YES   3 provided   N/A   1,329   0.99    587.80    64.17    1.27
YES   3            15    19      0.27    27.85     3.04     1.27
2D elasticity problem with 181,202 degrees of freedom
NO    3 provided   N/A   23      0.35    22.85     1.00     1.28
NO    3            15    18      0.23    66.49     2.91     1.28
YES   3 provided   N/A   5,000*  0.999   3,968.36  173.67   1.28
YES   3            20    135     0.885   170.23    7.45     1.28
YES   4            15    27      0.488   77.46     3.39     1.50
YES   4            20    21      0.395   79.29     3.47     1.50
YES   5            6     18      0.34    60.78     2.66     1.78
YES   5            10    15      0.233   72.66     3.18     1.78

Table 5.4
Rotated 3D elasticity problems with 114,444 and 201,720 degrees of freedom.

Rotated   Candidates   µmax   Iter   Factor   CPU       RelCPU   OpComp
3D elasticity problem with 114,444 degrees of freedom
NO    6 provided   N/A   16    0.20    29.97     1.00    1.159
NO    6            15    20    0.27    189.11    6.31    1.159
NO    7            15    17    0.21    215.78    7.20    1.217
YES   6 provided   N/A   587   0.97    913.49    30.48   1.159
YES   6            15    16    0.22    184.32    6.15    1.159
YES   7            10    15    0.20    171.73    5.73    1.217
YES   7            15    15    0.20    210.99    7.04    1.217
3D elasticity problem with 201,720 degrees of freedom
NO    6 provided   N/A   16    0.20    50.33     1.00    1.153
NO    6            15    21    0.31    319.60    6.35    1.153
NO    7            10    17    0.216   297.95    5.92    1.209
NO    7            15    17    0.209   363.38    7.22    1.209
YES   6 provided   N/A   739   0.97    1,924.62  38.24   1.153
YES   6            15    16    0.23    308.02    6.12    1.153
YES   7            10    15    0.20    301.98    6.00    1.209
YES   7            15    14    0.16    357.85    7.11    1.209

The final example demonstrates the performance of the adaptive method for an elasticity problem featuring discontinuities in the Young modulus. Here we consider a 3D elasticity problem in which the Poisson ratio is fixed at 0.32, while the Young modulus is allowed to vary randomly between the elements. We consider two cases: a case of coefficients varying randomly with uniform distribution in the interval $(1, 10^{\sigma})$, and the case where the distribution is exponential; i.e., the Young modulus is computed as $10^{\sigma r}$, where $r$ is generated randomly with uniform distribution in $(0, 1)$. In keeping with the usual practice of employing Krylov acceleration for problems with coefficient discontinuities, in this experiment we use our adaptive method as a preconditioner in the conjugate gradient method. The iteration was stopped once the initial residual was reduced by $10^{8}$. Table 5.5 compares the results obtained by using our adaptive scheme, started from a random initial guess, to the results obtained when the method based on a priori knowledge of the rigid-body modes is employed as a preconditioner.
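A sketch of generating the element coefficients for the two cases described above (names illustrative):

```python
import numpy as np

def young_moduli(n_elems, sigma, exponential=False, rng=None):
    """Per-element Young modulus: uniform in (1, 10^sigma), or
    10^(sigma*r) with r uniform in (0, 1) in the exponential case."""
    rng = np.random.default_rng() if rng is None else rng
    if exponential:
        return 10.0 ** (sigma * rng.uniform(0.0, 1.0, n_elems))
    return rng.uniform(1.0, 10.0 ** sigma, n_elems)
```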
Table 5.5
3D elasticity problem, 201,720 degrees of freedom, with Young modulus featuring random jumps in $(1, 10^{\sigma})$.

σ   Candidates   µmax   Iter   Factor   CPU      RelCPU   OpComp
Elasticity problem with uniformly distributed coefficient jumps
2   6 provided   N/A    8      0.0734   24.22    1.00     1.15
2   6            10     13     0.2209   219.34   9.05     1.15
2   7            10     11     0.1866   266.29   10.99    1.21
3   6 provided   N/A    8      0.0765   24.23    1.00     1.15
3   6            10     13     0.2154   218.57   9.02     1.15
3   7            10     11     0.1861   265.70   10.96    1.21
4   6 provided   N/A    8      0.0768   24.56    1.00     1.15
4   6            10     12     0.2081   219.26   8.93     1.15
4   7            10     11     0.1648   264.38   10.76    1.21
Elasticity problem with exponentially distributed coefficient jumps
2   6 provided   N/A    9      0.1146   25.99    1.00     1.15
2   6            10     16     0.3048   225.52   8.68     1.15
2   7            10     12     0.2141   267.72   10.30    1.21
3   6 provided   N/A    14     0.2466   35.68    1.00     1.15
3   6            10     22     0.4179   237.56   6.76     1.15
3   7            10     16     0.3095   275.62   7.84     1.21
4   6 provided   N/A    20     0.3948   49.99    1.00     1.15
4   6            10     30     0.5316   255.43   5.11     1.15
4   7            10     21     0.4040   289.39   5.79     1.21
5   6 provided   N/A    32     0.5545   73.63    1.00     1.15
5   6            10     46     0.6695   292.35   3.97     1.15
5   6            20     36     0.5980   402.75   5.47     1.21
5   7            10     37     0.6020   324.19   4.40     1.21
5   7            15     27     0.4966   381.16   5.17     1.21

The table indicates that, using the adaptive procedure without a priori knowledge of the problem geometry, we can nearly recover the convergence rates of the method based on knowledge of the rigid-body modes.
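A sketch of this use of a V-cycle as a preconditioner for conjugate gradients; the vcycle callable, assumed to apply one multigrid sweep to $A e = r$ from a zero initial guess, stands in for the final αSA hierarchy:

```python
from scipy.sparse.linalg import cg, LinearOperator

def pcg_solve(A, b, vcycle):
    """Conjugate gradients preconditioned by one multigrid V-cycle."""
    M = LinearOperator(A.shape, matvec=vcycle, dtype=float)
    return cg(A, b, M=M)   # returns (solution, info flag)
```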
The topic of problems with discontinuous coefficients and the appropriate modifications to the basic SA method will be studied in a separate paper.
Note 5.1. The operator complexities in all of the test problems remain below 2. Moreover, in three spatial dimensions these complexities improve somewhat, due largely to the increased speed of aggregation coarsening. It is also worth mentioning that the increasing size of the coarse-matrix block entries due to the increasing number of candidates does not significantly impact the time needed to perform one iteration of the solver, apparently due to the more efficient memory access afforded by blocking.

6. Conclusions. We consider a new multilevel method to tackle problems which, to date, have been very difficult to handle by AMG methods.
to date, have been very difficult to handle by AMG methods. At the expense of a
somewhat more costly setup stage and more intricate implementation, we design a
method that has, thus far, proved successful at solving such problems. We observe
that the convergence properties of the method seem very insensitive to modifications of
the algebraic system by scaling or nodal rotation. Moreover, the solver is flexible and
can benefit from extra information supplied about the problem. If such information
is lacking or incorrect, then αSA can act as a full black-box solver. Despite the
growing number of degrees of freedom per coarse-level node as the method evolves,
the overall cost of one step of the final iteration grows only modestly because of better
utilization of cache memory due to dense matrix operations on the nodal blocks.
Operator complexity remains at reasonable levels and actually seems to improve with
increasing spatial dimension.
The construction of the tentative prolongator in the setup phase involves restric-
tion of the candidate functions to an aggregate and subsequent local orthogonalization
of these functions. It is therefore suitable for parallel processing as long as the ag-
gregates are local to the processor. Parallelization of the underlying SA solver is
currently under development; once it is completed, αSA will also benefit from the parallel speedup. The parallel version is also expected to gain better scalability by replacing the traditionally used Gauss–Seidel relaxation with the polynomial smoothing procedures investigated recently in [1].
implementation will depend on the quality of the parallel matrix-vector product.
Future development will concentrate on extending features of the underlying
method on which αSA relies and on developing theory beyond the heuristics we de-
veloped here. Although most decisions are currently made by the code at runtime,
much remains to be done to fully automate the procedure, such as determining cer-
tain tolerances that are now input by the user. We plan to explore the possibility of
setting or updating these parameters at runtime based on the characteristics of the
problem at hand. A related work in progress [7] explores adaptive ideas suitable in
the context of the standard AMG method.

REFERENCES

[1] M. Adams, M. Brezina, J. Hu, and R. Tuminaro, Parallel multigrid smoothing: Polynomial
versus Gauss-Seidel, J. Comput. Phys., 188 (2003), pp. 593–610.
[2] J. Bramble, J. Pasciak, J. Wang, and J. Xu, Convergence estimates for multigrid algorithms without regularity assumptions, Math. Comp., 57 (1991), pp. 23–45.
[3] A. Brandt, Lecture given at CASC, Lawrence Livermore National Lab, Livermore, CA, 2001.
[4] A. Brandt, S. F. McCormick, and J. W. Ruge, Algebraic multigrid (AMG) for sparse matrix
equations, in Sparsity and Its Applications, D. J. Evans, ed., Cambridge University Press,
Cambridge, UK, 1984, pp. 257–284.
[5] A. Brandt and D. Ron, Multigrid solvers and multilevel optimization strategies, in Multilevel
Optimization in VLSICAD, Comb. Optim. 14, J. Cong and J. R. Shinnerl, eds., Kluwer
Academic Publishers, Dordrecht, The Netherlands, 2003, pp. 1–69.
[6] M. Brezina, A. J. Cleary, R. D. Falgout, V. E. Henson, J. E. Jones, T. A. Manteuffel,
S. F. McCormick, and J. W. Ruge, Algebraic multigrid based on element interpolation
(AMGe), SIAM J. Sci. Comput., 22 (2000), pp. 1570–1592.
[7] M. Brezina, R. Falgout, S. MacLachlan, T. Manteuffel, S. F. McCormick, and J. W.
Ruge, Adaptive Algebraic Multigrid (αAMG), in preparation, 2003.
[8] M. Brezina, C. I. Heberton, J. Mandel, and P. Vaněk, An Iterative Method with Convergence Rate Chosen A Priori, UCD/CCM Report 140, Center for Computational Mathematics, University of Colorado at Denver, Denver, CO, 1999; available online from http://www-math.cudenver.edu/ccmreports/rep140.ps.gz.
[9] M. Brezina and P. Vaněk, A black-box iterative solver based on a two-level Schwarz method,
Computing, 63 (1999), pp. 233–263.
[10] W. Briggs, V. E. Henson, and S. F. McCormick, A Multigrid Tutorial, 2nd ed., SIAM,
Philadelphia, 2000.
[11] T. Chartier, Element-Based Algebraic Multigrid (AMGe) and Spectral AMGe, Ph.D. thesis, University of Colorado at Boulder, Boulder, CO, 2001.
[12] T. Chartier, R. D. Falgout, V. E. Henson, J. E. Jones, T. Manteuffel, S. F. Mc-
Cormick, J. W. Ruge, and P. S. Vassilevski, Spectral AMGe (ρAMGe), SIAM J. Sci.
Comput., 25 (2003), pp. 1–26.
[13] J. Fish and V. Belsky, Generalized aggregation multilevel solver, Internat. J. Numer. Methods
Engrg., 40 (1997), pp. 4341–4361.
[14] S. F. McCormick and J. W. Ruge, Algebraic multigrid methods applied to problems in com-
putational structural mechanics, in State of the Art Surveys on Computational Mechanics,
A. K. Noor and J. T. Oden, eds., ASME, New York, 1989, pp. 237–270.
[15] J. W. Ruge, Algebraic multigrid (AMG) for geodetic survey problems, in Proceedings of the
International Multigrid Conference, Copper Mountain, CO, 1983.
[16] J. W. Ruge, Final Report on AMG02, GMD contract 5110/022090, Gesellschaft für Mathematik und Datenverarbeitung, St. Augustin, 1985.
[17] J. W. Ruge, Algebraic multigrid applied to systems of partial differential equations, in Proceed-
ings of the International Multigrid Conference, 1985, S. McCormick, ed., North–Holland,
Amsterdam, 1986.
[18] J. W. Ruge and K. Stüben, Efficient solution of finite difference and finite element equations
by algebraic multigrid (AMG), in Multigrid Methods for Integral and Differential Equa-
tions, The Institute of Mathematics and its Applications Conference Series, D. J. Paddon
and H. Holstein, eds., Clarendon Press, Oxford, UK, 1985, pp. 169–212.
[19] J. W. Ruge and K. Stüben, Algebraic multigrid (AMG), in Multigrid Methods, Frontiers
Appl. Math. 3, S. F. McCormick, ed., SIAM, Philadelphia, 1987, pp. 73–130.
[20] P. Vaněk, M. Brezina, and J. Mandel, Convergence of algebraic multigrid based on smoothed
aggregation, Numer. Math., 88 (2001), pp. 559–579.
[21] P. Vaněk, Acceleration of convergence of a two-level algorithm by smoothing transfer operator,
Appl. Math., 37 (1992), pp. 265–274.
[22] P. Vaněk, M. Brezina, and R. Tezaur, Two-grid method for linear elasticity on unstructured
meshes, SIAM J. Sci. Comput., 21 (1999), pp. 900–923.
[23] P. Vaněk, J. Mandel, and M. Brezina, Algebraic multigrid by smoothed aggregation for
second and fourth order elliptic problems, Computing, 56 (1996), pp. 179–196.
