Adaptive Smoothed Aggregation (αSA)
© 2004 Society for Industrial and Applied Mathematics
SIAM J. Sci. Comput., Vol. 25, No. 6, pp. 1896–1920
Abstract. Substantial effort has been focused over the last two decades on developing multi-
level iterative methods capable of solving the large linear systems encountered in engineering practice.
These systems often arise from discretizing partial differential equations over unstructured meshes,
and the particular parameters or geometry of the physical problem being discretized may be un-
available to the solver. Algebraic multigrid (AMG) and multilevel domain decomposition methods of
algebraic type have been of particular interest in this context because of their promises of optimal per-
formance without the need for explicit knowledge of the problem geometry. These methods construct
a hierarchy of coarse problems based on the linear system itself and on certain assumptions about
the smooth components of the error. For smoothed aggregation (SA) methods applied to discretiza-
tions of elliptic problems, these assumptions typically consist of knowledge of the near-nullspace of
the weak form. This paper introduces an extension of the SA method in which good convergence
properties are achieved in situations where explicit knowledge of the near-nullspace components is
unavailable. This extension is accomplished by using the method itself to determine near-nullspace
components and adjusting the coarsening processes accordingly.
Key words. algebraic multigrid (AMG), generalized smoothed aggregation (SA), adaptive
method
DOI. 10.1137/S1064827502418598
1. Introduction. Over the last decade, smoothed aggregation (SA; cf. [21, 23,
22, 20, 9]) has emerged as an efficient multilevel algebraic solver for the solution of the
algebraic systems obtained by discretizing certain classes of differential equations on
unstructured meshes. In particular, SA is often very efficient at solving the systems
that arise from problems of three-dimensional (3D) thin-body elasticity, a task that
can tax traditional algebraic multigrid (AMG) techniques.
As with classical AMG [4, 18, 19], the standard SA method bases its transfer
operators on certain assumptions about the nature of smooth error. For SA applied
to discretizations of elliptic partial differential equations, this assumption usually takes
the form of explicit knowledge of the near-nullspace of the associated weak form. This
knowledge is easy to obtain for large classes of problems. For example, it is simple
to determine the near-nullspace for finite element discretizations of second- or fourth-
order partial differential equations, including many nonscalar problems. In more
general situations, however, this knowledge may not be readily available. Consider
the case where the matrix for a problem is provided without knowledge of how the
original problem was discretized or scaled.
Seemingly innocuous discretization practices, such as the use of scaled bases, can
hamper AMG solvers if this scaling is not taken into account. Even the simplest
problems discretized on regular grids using standard finite elements can pose serious
difficulties if the resulting matrix has been scaled without this information being
provided to the solver. Other discretization practices leading to problematic linear
systems include the use of exotic bases and systems problems in which different local
coordinate systems are used for different parts of the model.
To successfully solve such problems when only the matrix is provided, we need
a process by which the algebraic multilevel solver can determine how to effectively
coarsen the linear system using only information from the system itself. The method
we propose here, which we call adaptive smoothed aggregation (αSA), is an attempt
to do just that. αSA is based on the simple principle that applying a linear iterative
method to the homogeneous problem (Ax = 0) reveals error components that the
method does not effectively reduce. While this principle is easily stated in loose
terms, the resulting algorithm and its implementation can be very subtle. We hope
to expose these subtleties in the presentation that follows.
The objective of the setup phase of αSA is therefore to compute a set of vectors, B,
that represent error components that relaxation is slow to resolve. Such components
are usually referred to by the terms algebraically smooth, near-nullspace, near-kernel,
or, in the case of linear elasticity, rigid body modes. We simply call them candidates
here as it is not actually essential that all of the vectors we compute be troublesome
components; we use a measure that, in effect, ignores candidates that relaxation effi-
ciently handles. It is also not a problem if we compute redundant or linearly dependent
candidates because our approach is designed to select the information we need from
the candidate subspace. It is, however, important to be certain that the final set of
candidates is rich in the sense that they combine to represent all troublesome com-
ponents locally. The keys to doing this are to evolve the multigrid solver
by having it compute its own slow-to-converge error components (by way of the ho-
mogeneous problem) and to use these new components to properly improve the solver.
The setup phase for αSA is easiest to describe as an adaptive process. We start
from a given primitive parent method (possibly a simple relaxation scheme), with
error propagation operator M0 , and a current but possibly empty set, B, of candidates
(error components that M0 does not effectively reduce). We attempt to enhance B
by first putting M0 to the following test: given a small number, n, of iterations and
a random initial guess, e0 , compute
(1.1) $e_n \leftarrow M_0^n e_0$.
If the method performs well in the sense that $e_n$ is much smaller than $e_0$ in
an appropriate norm, then it is accepted as the solver and the adaptive scheme stops.
Otherwise, the resulting approximation, en , is expected to be rich in the error compo-
nents that are not effectively reduced by M0 , so it is added to the candidate set, B. The
new candidate set is then used to construct an improved child method, with error prop-
agation operator M1 . The whole process can then be repeated with M1 in place of M0 ,
continuing in this way to generate a sequence of hopefully improving methods, Mk .
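In outline, this loop is easy to state. The following is a minimal NumPy sketch of the principle, not the implementation used in the paper; build_solver is a hypothetical stand-in for the SA setup of section 2, returning a function that applies one iteration of the current method.

```python
import numpy as np

def adaptive_setup(A, build_solver, n=5, tol=0.1, max_steps=4, seed=0):
    # Sketch of the adaptive principle: iterate on the method itself.
    # build_solver(A, B) -> callable solver(x, b) applying one iteration
    # of the method constructed from the candidate set B (assumed helper).
    rng = np.random.default_rng(seed)
    B = []                                    # candidate set, initially empty
    solver = build_solver(A, B)               # primitive parent method M_0
    for _ in range(max_steps):
        e = rng.standard_normal(A.shape[0])   # random initial error e_0
        e0 = e.copy()
        for _ in range(n):                    # (1.1): e_n <- M^n e_0,
            e = solver(e, np.zeros_like(e))   # i.e., iterate on A x = 0
        factor = ((e @ A @ e) / (e0 @ A @ e0)) ** (1.0 / n)
        if factor <= tol:                     # method performs well: accept it
            break
        B.append(e.copy())                    # e_n is rich in slow components
        solver = build_solver(A, B)           # improved child method M_{k+1}
    return solver, B
```

The convergence-factor test mirrors the energy-norm ratio used in the setup algorithms of section 3.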
Thus, we iterate on the method itself, improving the current version by having it
compute its own troublesome components—those that it does not effectively reduce—
and then adjusting the coarsening process accordingly to produce a new method. Old
candidate components are also used in this adjustment process to ensure that the new
method continues to reduce them efficiently. This improvement process repeats until
the current method shows itself to be capable of efficient solution of the problem of
interest. The iteration on the method is called the adaptive setup phase (or, simply,
the setup phase) to distinguish it from the solver phase, where the resulting method is
applied to the target problem. The setup phase is terminated when either the latest
incarnation of the method performs satisfactorily or a prescribed number of steps is
reached.
Each new child method is constructed based on components resulting from its
parent iteration (1.1). The method is modified to reflect the newly computed candi-
dates as soon as they become available. In other words, the method is kept up to date
at all times and no more work is done than necessary. In section 3.2, we show how
the general setup phase naturally takes the form of a reverse full multigrid (FMG)
cycle.
The adaptive strategy outlined above is designed to uncover global error compo-
nents that a parent method does not handle well. It is crucial to recognize that there
are likely to be many such components—so many that, in general, we cannot expect
to identify each one individually. Typically, a small but fixed percentage of the spec-
trum of Mk corresponds to troublesome components. Thus, the few candidates that
iteration (1.1) identifies must serve as representatives for many smooth components
in the coarsening process. This is analogous to the standard SA coarsening processes
where the near-kernel is used to represent all smooth components. This representation
is accomplished by first taking local segments of each candidate (i.e., by taking the
restriction of the candidate to an aggregate and extending it outside the aggregate
with zeros) so that the segments sum to the candidate itself. Each segment is then
smoothed to enhance the overall approximation property in the sense that it accu-
rately represents similar smooth components. In this way, standard SA constructs a
rich set of local representations of the smooth or troublesome components. So too
must αSA. Indeed, we need a way to coarsen the system that ensures accurate approx-
imation of the error components that the current candidates represent. Of course, to
control storage and CPU costs, we also need to control operator complexity, which
involves limiting the number of candidates that iteration (1.1) produces, exploiting
these candidates as fully as we can, and limiting the growth of the number of coarse
degrees of freedom.
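The segmentation step is simple to state concretely. A small NumPy sketch, with aggregates assumed given as disjoint lists of degree-of-freedom indices:

```python
import numpy as np

def segment_candidate(x, aggregates):
    # Restrict the candidate to each aggregate and extend by zero outside it;
    # by construction the segments sum to the candidate itself.
    segments = []
    for agg in aggregates:
        s = np.zeros_like(x)
        s[agg] = x[agg]
        segments.append(s)
    return segments

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
segments = segment_candidate(x, [[0, 1, 2], [3, 4, 5]])
assert np.allclose(sum(segments), x)
```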
The SA framework [20] lends itself to this task. It offers fast automatic coarsening
with well-understood control over operator complexity due to its typically fixed coarse-
operator sparsity pattern. In addition, the process guarantees proper approximation
of a given set of functions and their natural localizations during the coarsening process.
The resulting coarse-level basis functions are smooth by design and thus suitable for
use in a multilevel method. The candidates obtained by iteration (1.1) play the roles
of the near-kernel components on which the SA method is based. Thus, in the αSA
context, the notion of near-kernel components depends not only on the problem but
also on the current method. In general, a troublesome component must
have a small Rayleigh quotient, signifying ineffectiveness of relaxation. However, in
all but the initial phase (where coarsening perhaps has not been constructed yet) or
the final phase (where the method may be efficient), the current candidate must also
have a small Rayleigh quotient defined in terms of the current coarse-level projection
operator. We do not use this property explicitly in the adaptive process, but keeping
it in mind can aid in understanding the development that follows.
Thus, our main goal is to extend applicability of the SA concept to difficult
problems for which the original method may perform poorly, possibly due to the
lack of explicit knowledge of the near-kernel. The algorithm may also be useful for
improving performance in applications that involve multiple right sides, where efforts
to improve the method may be amortized over the number of solutions.
In what follows, we develop this modification of the SA method in such a way
that good convergence properties are recovered even if explicit knowledge of the near-
kernel is either incomplete or lacking altogether. This should facilitate solution in cases
where the problem geometry, discretization method, or coefficients of the differential
operator are not explicitly known to the solver. At the same time, we strive to keep
storage requirements low.
The concept of using a multigrid algorithm to improve itself is not new. Using
representative smooth vectors in the coarsening process was first introduced in [15],
where interpolation was defined to fit vectors obtained by relaxation of the homoge-
neous problem. In [4], a variation of this idea was used for recovering typical AMG
convergence rates for a badly scaled scalar elliptic problem. While the method there
was very basic and used only one candidate, it contained many of the ingredients of the
approach developed below. These concepts were developed further in [16, 17, 19, 14].
The idea of fitting eigenvectors corresponding to the smallest eigenvalues was ad-
vocated in [14] and [19], where an AMG algorithm determining these eigenvectors
through Rayleigh quotient minimization was outlined. These vectors were, in turn,
used to update the AMG interpolation and coarse-grid operators. Most of these ideas
were later summarized in [14]. A more sophisticated adaptive framework appropriate
for the standard AMG is currently under investigation [7].
Another method of the type developed here is the bootstrap AMG scheme pro-
posed recently by Brandt [3] and Brandt and Ron [5]. It differs somewhat from ours
in that it starts on the fine grid by iterating on a number of different random initial
guesses, with interpolation then constructed to approximately fit the resulting vectors
in a least-squares sense.
Various other attempts have been made to allow for the solver itself to determine
from the discrete problem the information required to successfully solve it, without
a priori assumptions on the form of the smooth error. These include the methods
of [13, 8, 6, 11, 12]. All these methods, however, need access to the local finite
element matrices of the problem so that they can construct the multigrid transfer op-
erators based on the algebraically smooth eigenvectors of the agglomerated stiffness
matrices. Although these methods exhibit attractive convergence properties, their
need to construct, store, and manipulate the coarse-level element information typi-
cally leads to increased storage requirements compared to those of classical AMG or
standard SA. The current method aims to achieve the good convergence properties of
the element-based methods without the overhead of the element storage.
This paper is organized as follows. In section 2, we briefly recall the standard SA
method and introduce some notation used throughout the remainder of the paper.
Readers who are unfamiliar with the fundamental concepts assumed here may first
wish to consult basic references on multigrid (e.g., [10]) and SA (e.g., [20]). Section 3
motivates and describes possible strategies to extract the information used to con-
struct improved transfer operators based on the method’s iterative history. These
strategies can be described as adaptive AMG, in which the method ideally evolves
until a cross-point at which further improvement (in terms of convergence rate) is
offset by the increased cost of each iteration. Section 4 discusses implementation is-
sues and ways of reducing cost and improving accuracy of the setup phase. Finally,
section 5 presents computational examples demonstrating the performance of the αSA method.
(2.4) $x \leftarrow (I - R_l A_l)x + R_l b_l$.
The relaxation operators $R_l$ are assumed to satisfy
$\lambda_{\min}(I - R_l A_l) \ge 0$ and $\lambda_{\min}(R_l) \ge \frac{1}{C_R\,\rho(A_l)}$,
with $C_R > 0$ independent of the level.
Note 2.2. In this way, with B 1 , A1 , and b1 given, the entire multigrid setup
can be performed. This construction of the SA multigrid hierarchy, using (2.6), (2.3),
and (2.2), and relying on a given fine-level near-kernel representation, B 1 , is called
the standard SA setup in this paper. For later reference, we outline the setup in
Algorithm 2 below. For details, see [20].
Algorithm 2 (standard SA setup). Given $A_1$, $B^1$, and $L$, do the following for $l = 1, \dots, L-1$:
(a) Construct aggregates $\{\mathcal{A}_i^l\}_{i=1}^{N_l}$ based on $A_l$.
(b) Construct the tentative prolongator $P_{l+1}^l$ and the coarse candidate matrix $B^{l+1}$ from $B^l$ using (2.6).
(c) Construct the smoothed prolongator: $I_{l+1}^l = S_l P_{l+1}^l$.
(d) Construct the coarse matrix: $A_{l+1} = (I_{l+1}^l)^T A_l I_{l+1}^l$.
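The following dense-matrix sketch illustrates these four steps. It is illustrative only: the aggregation routine is assumed given, the prolongator smoother is taken to be damped Jacobi, $S_l = I - (\omega/\rho(A_l)) A_l$ (a common choice, assumed here rather than taken from [20]), and each aggregate is assumed to contain at least $r$ degrees of freedom.

```python
import numpy as np

def sa_setup(A1, B1, L, aggregate, omega=2.0 / 3.0):
    # Sketch of Algorithm 2: build the SA hierarchy from A_1 and B^1.
    # aggregate(A) -> disjoint lists of dof indices (step (a), assumed given).
    A, B = A1, B1
    mats, prolongators = [A1], []
    for _ in range(L - 1):
        aggs = aggregate(A)                    # (a) aggregates on this level
        n, r = B.shape
        P = np.zeros((n, r * len(aggs)))       # (b) tentative prolongator and
        Bc = np.zeros((r * len(aggs), r))      #     coarse candidate matrix
        for i, agg in enumerate(aggs):
            Q, R = np.linalg.qr(B[agg, :])     # local QR over the aggregate
            P[agg, i * r:(i + 1) * r] = Q      # orthonormal local columns,
            Bc[i * r:(i + 1) * r, :] = R       # so that P @ Bc equals B (2.6)
        rho = np.linalg.eigvalsh(A).max()      # spectral radius (A is SPD)
        S = np.eye(n) - (omega / rho) * A      # prolongator smoother S_l
        I = S @ P                              # (c) smoothed prolongator
        A = I.T @ A @ I                        # (d) Galerkin coarse matrix
        B = Bc
        prolongators.append(I)
        mats.append(A)
    return mats, prolongators
```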
With our choice of smoothing components and a coarsening procedure utiliz-
ing (2.6), the standard SA scheme can be proven to converge under certain assump-
tions on the near-kernel components alone. The following such result motivates the
need for standard SA to have access to the near-kernel components and serves to
motivate and guide our development of αSA.
Let $\langle u, v \rangle_{\mathcal{A}}$ denote the Euclidean inner product over the degrees of freedom corresponding to an agglomerate $\mathcal{A}$, and denote the $A_1$-norm by $|||u||| = \langle A_1 u, u \rangle^{1/2}$. Let $B^1$ denote an $n_1 \times r$ matrix whose columns are thought to form a basis for the near-kernel components corresponding to $A_1$.
Theorem 2.3 (Theorem 4.2 of [20]). With $\tilde{\mathcal{A}}_i^l$ denoting the set of fine-level degrees of freedom corresponding to aggregate $\mathcal{A}_i^l$ on level $l$, assume that there exists a constant $C_a > 0$ such that, for every $u \in \mathbb{R}^{n_1}$ and every $l = 1, \dots, L-1$, the following approximation property holds:

(2.7) $\sum_i \min_{w \in \mathbb{R}^r} \| u - B^1 w \|^2_{\tilde{\mathcal{A}}_i^l} \le C_a\, \frac{9^{l-1}}{\rho(A_1)}\, \langle A_1 u, u \rangle.$

Then

$|||x^* - \mathrm{AMG}(x, b_1)||| \le \left(1 - \frac{1}{c(L)}\right) |||x^* - x||| \quad \forall x \in \mathbb{R}^{n_1},$

where $c(L)$ is a polynomial in the number of levels $L$. Moreover, since (2.6) guarantees that $P_2^1 P_3^2 \cdots P_{l+1}^l B^{l+1} = B^1$, condition (2.7) is equivalent to

(2.8) $\sum_i \min_{w \in \mathbb{R}^r} \| u - P_2^1 P_3^2 \cdots P_{l+1}^l B^{l+1} w \|^2_{\tilde{\mathcal{A}}_i^l} \le C_a\, \frac{9^{l-1}}{\rho(A_1)}\, \langle A_1 u, u \rangle$
for every u ∈ Rn1 and every l = 1, . . . , L − 1. Thus, in the context of SA, condition
(2.7) can be viewed as an alternative formulation of the weak approximation prop-
erty [2]. Note that the required approximation of a fine-level vector is less stringent for
coarser levels. Also, convergence is guaranteed even though no regularity assumptions
have been made. Although the convergence bound naturally depends on the number
of levels, computational experiments suggest that the presence of elliptic regularity
for standard test problems yields optimal performance (i.e., convergence with bounds
that are independent of the number of levels).
That polynomial c(L) in the convergence estimate has degree 3 is an artifact of the proof technique used in [20], where no explicit assumptions are made on the regularity of the problem.
3.1. Initialization setup stage. The adaptive multigrid setup procedure con-
sidered in this paper can be split into two stages. If no knowledge of the near-kernel
components of A1 is available, then we start with the first stage to determine an ap-
proximation to one such component. This stage also determines the number of levels,
L, to be used in the coarsening process. (Changing L in the next stage based on
observed performance is certainly possible, but it is convenient to fix L—and other
constructs—early in the setup phase.)
Let ε > 0 be a given convergence tolerance.
Algorithm 3 (initialization stage).
1. Set $l = 1$, select a random vector $x_1 \in \mathbb{R}^{n_1}$, and create a copy, $\hat{x}_1 \leftarrow x_1$.
2. With initial approximation $x_1$, relax $\mu$ times on $A_1 x = 0$:
$x_1 \leftarrow (I - R_1 A_1)^\mu x_1$.
3. If $\left( \langle A_1 x_1, x_1 \rangle / \langle A_1 \hat{x}_1, \hat{x}_1 \rangle \right)^{1/\mu} \le \varepsilon$, then set $L = 1$ and stop (problem $A_1 x = b_1$ can be solved fast enough by relaxation alone, so only one level is needed).
4. Otherwise, do the following:
(a) Set $B^l \leftarrow x_l$.
(b) Create a set, $\{\mathcal{A}_i^l\}_{i=1}^{N_l}$, of nodal aggregates based on matrix $A_l$.
(c) Define the tentative prolongator $P_{l+1}^l$ and candidate matrix $B^{l+1}$ using the candidate matrix $B^l$ and relations (2.6), with structure based on $\{\mathcal{A}_i^l\}_{i=1}^{N_l}$.
(d) Define the prolongator: $I_{l+1}^l = S_l P_{l+1}^l$.
(e) Define the coarse matrix: $A_{l+1} = (I_{l+1}^l)^T A_l I_{l+1}^l$. If level $l + 1$ is coarse enough that a direct solver can be used there, skip to Step 5; otherwise, continue.
(f) Set the next-level approximation vector: $x_{l+1} \leftarrow B^{l+1}$.
(g) Make a copy of the current approximation: $\hat{x}_{l+1} \leftarrow x_{l+1}$.
(h) With initial approximation $x_{l+1}$, relax $\mu$ times on $A_{l+1} x = 0$:
$x_{l+1} \leftarrow (I - R_{l+1} A_{l+1})^\mu x_{l+1}$.
[Figure: the initialization stage of Algorithm 3, ending on the coarsest levels with setting $B^1 = x_1$, running the standard SA setup, and creating the new V-cycle.]
Note 3.1. If the discretization package either generates the rigid body modes or supplies the
nodal geometry to the solver, then the full set of nullspace vectors is presumably
available [22] and the adaptive process may be unnecessary. Otherwise, when the full
set of rigid body modes is unavailable, it is nevertheless often possible to obtain a sub-
set of the rigid body modes consisting of three independent constant displacements,
regardless of the geometry of the grid. Such a subspace should be used whenever
possible to create B 1 and to set up a V -cycle exactly as in the standard SA method.
The initialization stage would then be omitted.
Thus, the initialization stage given by Algorithm 3 should be viewed as optional,
to be done only if no information can be assumed about the system to be solved.
In view of Note 3.1, we can in any case assume that the initial B 1 has at least
one column and that a tentative V -cycle is available. This means that we have
constructed aggregates $\mathcal{A}_i^l$, transfer operators $P_{l+1}^l$ and $I_{l+1}^l$, and coarse operators $A_{l+1}$, $l = 1, \dots, L-1$.
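Under the same dense-matrix conventions as the sketch after Algorithm 2, the initialization stage for a single candidate might look as follows. The relaxation routine and the aggregation routine are assumed helpers, and the direct-solver size threshold is chosen arbitrarily; this outlines Algorithm 3, not the authors' code.

```python
import numpy as np

def initialization_stage(A1, relax, aggregate, mu=5, eps=0.1, seed=0):
    # Sketch of Algorithm 3 with one evolving candidate (r = 1).
    # relax(A, x, mu) -> result of mu sweeps on A x = 0 (assumed helper).
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A1.shape[0])          # 1. random x_1, copy x_hat
    x_hat = x.copy()
    x = relax(A1, x, mu)                          # 2. relax on A_1 x = 0
    if ((x @ A1 @ x) / (x_hat @ A1 @ x_hat)) ** (1.0 / mu) <= eps:
        return None                               # 3. relaxation alone: L = 1
    A, hierarchy = A1, []
    while A.shape[0] > 100:                       # 4. coarsen until a direct
        aggs = aggregate(A)                       #    solver is affordable
        P = np.zeros((A.shape[0], len(aggs)))     # (c) fit x over aggregates
        Bc = np.zeros(len(aggs))
        for i, agg in enumerate(aggs):
            nrm = np.linalg.norm(x[agg])          # assumed nonzero (random x)
            P[agg, i] = x[agg] / nrm              # normalized local segment
            Bc[i] = nrm                           # coarse candidate B^{l+1}
        rho = np.linalg.eigvalsh(A).max()
        S = np.eye(A.shape[0]) - (2.0 / (3.0 * rho)) * A
        I = S @ P                                 # (d) smoothed prolongator
        A = I.T @ A @ I                           # (e) coarse matrix
        hierarchy.append((P, I, A))
        x = relax(A, Bc.copy(), mu)               # (f)-(h) relax B^{l+1}
    # (the final step, setting B^1 = x_1 and running the standard SA setup
    #  of Algorithm 2, is omitted from this sketch)
    return hierarchy
```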
3.2. General setup stage. In each step of the second stage of the adaptive
procedure, we apply the current V -cycle to the homogeneous problem to uncover error
components that are not quickly attenuated. The procedure then updates its own
transfer operators to ensure that these components will be eliminated by the improved
method, while preserving the previously established approximation properties. Thus,
this stage essentially follows the initialization stage with relaxation replaced by the
current V -cycle.
One of the subtleties of this approach lies in the method’s attempt to update each
level of the evolving V -cycle as soon as its ineffectiveness is exposed. Thus, on the
finest level in the second stage, the current V -cycle simply plays the role of relaxation:
if it is unable to quickly solve the homogeneous problem (i.e., Step 3 fails), then the
resulting error becomes a new candidate, and new degrees of freedom are generated
accordingly on level 2 (i.e., columns are added to B 1 ). The level 2-to-L part of the
old V -cycle (i.e., the part without the finest level) then plays the role of the level 2
relaxation in the initial setup phase and is thus applied to the homogeneous problem
to assess the need to improve its coarser-level interpolation operators. The same is
done on each coarser level, l, with the level l-to-L part of the old V -cycle playing the
role of the level l relaxation step in the initial setup phase. The process continues
until adequate performance is observed or the maximum permitted number of degrees
of freedom per node is reached on coarse levels.
We present a general prototype algorithm for the adaptive multigrid setup, as-
suming that a tentative V -cycle has previously been constructed (cf. Note 3.1). We
thus assume that a current hierarchy of nodal aggregates, $\{\mathcal{A}_i^l\}_{i=1}^{N_l}$, and operators $P_{l+1}^l$, $I_{l+1}^l$, and $A_{l+1}$, $l = 1, \dots, L-1$, is available.
Consider a method in which, within each cycle of the adaptive setup, we attempt
to update the current V -cycle level by level. One cycle of this adaptive setup traverses
from the finest to the coarsest level; on each level l along the way, it updates B l based
on computing a new candidate from the current multigrid scheme applied to the
homogeneous problem on level l. Thus, on level l in the setup process, a solver is
applied that traverses from that level to level L and back. This gives us the picture of
a backward FMG cycle, where the setup traverses from the finest to the coarsest grid
and each level along the way is processed by a V -cycle solver (see Figure 3.2). Now,
once this new candidate is computed, it is incorporated into the current multigrid
scheme and the previously existing V -cycle components are overwritten on level l + 1
but temporarily retained from that level down. As a result, we redefine level by level
the V-cycle components. Once the new $B^l$ (and $I_{l+1}^l$ in (2.3)) are constructed all the
way to the coarsest level, we can then use them to update the current B 1 and, based
on it, construct a new V -cycle on the finest level.
Fig. 3.2. Self-correcting adaptive cycling scheme given by Algorithm 4, with the solver cycles uncollapsed.
Because each candidate is computed at considerable cost, each setup cycle must determine interpolation operators so that the solver
eliminates a relatively large set of errors of each candidate’s type. Just as each rigid
body mode is used locally in standard SA to treat errors of similar type (constants
represent errors that are smooth within variables and rotations represent intervariable
“smoothness”), so too must each candidate be used in αSA. Moreover, a full set
of types must be determined if the solver is to attain full efficiency (e.g., for two-
dimensional (2D) linear elasticity, three rigid body modes are generally needed). We
thus think of each candidate as a sort of straw man that represents a class of smooth
components. Efficient computation of a full set of straw men is the responsibility of
the adaptive process. However, proper treatment of each straw man is the task of the
basic solver, which is SA in this case.
As we apply our current method to the homogeneous problem, the resulting can-
didate, xl , becomes rich in the components of the error that are slow to converge in
the current method. Our goal in designing the adaptive algorithm is to ensure that
xl is approximated relatively well by the newly constructed transfer operator. That
is, we want to control the constant Ca in the inequality
(3.1) $\min_{v \in \mathbb{R}^{n_{l+1}}} \| x_l - P_{l+1}^l v \|^2 \le \frac{C_a}{\rho(A_l)}\, \| x_l \|^2_{A_l}.$
where δA are chosen so that summing (3.2) over all aggregates leads to (3.1), i.e., so
that
(3.3) $\sum_{\mathcal{A}} \delta_{\mathcal{A}}(x) = \frac{\langle A_l x, x \rangle}{\rho(A_l)}.$

For now, the only assumption we place on $\delta_{\mathcal{A}}(x)$ is that (3.3) holds. An appropriate choice for the definition of $\delta_{\mathcal{A}}(x)$ is given in section 4.
Note 3.2 (relationship to theoretical assumptions). To relate condition (3.1) to the theoretical foundation of SA, we make the following observation. If $P_{l+1}^l$ is constructed so that (3.1) is satisfied for the candidate $x_l$, the construction of our method automatically guarantees that

(3.4) $\min_{v \in \mathbb{R}^{n_{l+1}}} \| x^1 - P_2^1 P_3^2 \cdots P_{l+1}^l v \|^2 \le \frac{C_a}{\rho(A_l)}\, \| \hat{x}^1 \|^2_{A_1},$

where $x^1 = P_2^1 P_3^2 \cdots P_l^{l-1} x_l$ and $\hat{x}^1 = I_2^1 I_3^2 \cdots I_l^{l-1} x_l$. Since it is easy to show that $\|\hat{x}^1\|_{A_1} \le \|x^1\|_{A_1}$, we can thus guarantee that (2.8) holds for the particular fine-level candidate $x^1$. Inequality (3.1) is easily satisfied for any component $u$ for which $\|u\|_{A_1}$ is bounded away from zero.
ponents with small energy. Our experience with the standard SA method indicates
that for the second- and fourth-order elliptic problems it suffices to ensure that the
components corresponding to the nullspace of the weak form of the problem are well
approximated by the prolongation (the near-nullspace components are then well approximated due to the localization and smoothing procedures involved in constructing
the SA transfer operators). Further, as the set of candidates constructed during the
setup cycle is expected to eventually encompass the entire troublesome subspace,
satisfaction of (3.1) for all candidates would imply the satisfaction of (2.8) for any
u ∈ Rn1 . This, in turn, guarantees convergence.
Note 3.3 (locally small components). Each new candidate is the result of applying
the V -cycle based on the current B 1 , so it must be approximately A1 -orthogonal to
all previously computed candidates. This is, however, only a global property that the
evolving candidates tend to exhibit. It may be that a candidate is so small on some
aggregate, relative to its energy, that its representation there can be ignored. More
precisely, we could encounter situations in which
(3.5) $\| x_l \|^2_{\mathcal{A}} \le C_a\, \delta_{\mathcal{A}}(x_l)$

or

(3.6) $\| x_l - \tilde{P}_{l+1}^l (\tilde{P}_{l+1}^l)^T x_l \|^2_{\mathcal{A}} \le C_a\, \delta_{\mathcal{A}}(x_l).$
(Since $(\tilde{P}_{l+1}^l)^T \tilde{P}_{l+1}^l = I$, $\tilde{P}_{l+1}^l (\tilde{P}_{l+1}^l)^T$ is the $L^2$ projection onto the range of $\tilde{P}_{l+1}^l$; thus, (3.6) is just approximation property (3.2) using the tentative prolongator in place
of the smoothed one.) If (3.6) is satisfied, then xl is assumed to be well approximated
by the current transfer operator and is simply ignored in the construction of the new
transfer operator on aggregate A. (Practical implications of this local elimination
from the coarsening process are considered in section 4.) If the inequality is not
satisfied, then we keep the computed vector $y = x_l - \tilde{P}_{l+1}^l (\tilde{P}_{l+1}^l)^T x_l$, which, by construction, is orthogonal to all the vectors already represented in the current $\tilde{P}_{l+1}^l$. We then normalize via $y \leftarrow y / \|y\|_{\mathcal{A}}$ so that the new $P_{l+1}^l$ has orthonormal columns: $(P_{l+1}^l)^T P_{l+1}^l = I$.
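On a single aggregate, this test and update can be sketched as follows (P_loc denotes the aggregate's block of the tentative prolongator, with orthonormal columns; delta is the local quantity $\delta_{\mathcal{A}}(x_l)$ from (3.3)):

```python
import numpy as np

def extend_local_basis(P_loc, x_loc, Ca, delta):
    # Note 3.3 on one aggregate: if (3.6) holds, ignore the candidate locally;
    # otherwise append its normalized L2-orthogonal remainder y.
    y = x_loc - P_loc @ (P_loc.T @ x_loc)   # remainder after L2 projection
    if y @ y <= Ca * delta:                 # test (3.6) over the aggregate
        return P_loc                        # candidate eliminated locally
    y = y / np.linalg.norm(y)               # y <- y / ||y||_A, keeping the
    return np.column_stack([P_loc, y])      # local block orthonormal
```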
Before we introduce Algorithm 4 below, we stress that the description should
be viewed as a general outline of the adaptive multigrid setup. We intentionally
ignore several practical issues that must be addressed before this algorithm can be
implemented. For instance, we do not include details on how the new B l and Il+1 l
are efficiently constructed in the evolving method. Also, when using a coarse-level
V -cycle constructed by previous applications of the setup stage, we must deal with
the possibility that the number of vectors approximated on coarse levels in previous
cycles is smaller than the number of vectors approximated on the fine levels in the
current cycle. These issues are discussed in section 4, where we take advantage of the
SA framework to turn the prototypical Algorithm 4 into a practical implementation.
Assume we are given a bound, K ∈ N, on the number of degrees of freedom per
node on coarse levels, convergence factor tolerance ε ∈ (0, 1), and aggregate quantities
$\delta_{\mathcal{A}}(x)$ such that $\sum_{\mathcal{A}} \delta_{\mathcal{A}}(x) = \langle A_l x, x \rangle / \rho(A_l)$.
Algorithm 4 (one cycle of the general setup stage).
1. If the maximum number of degrees of freedom per node on level 2 equals K,
stop (the allowed number of coarse-grid degrees of freedom has been reached).
2. Create a copy of the current $B^1$ for later use: $\hat{B}^1 \leftarrow B^1$.
3. Select a random $x_1 \in \mathbb{R}^{n_1}$, create a copy $\hat{x}_1 \leftarrow x_1$, and apply $\mu$ iterations of the current finest-level V-cycle: $x_1 \leftarrow \mathrm{AMG}_1^\mu(x_1, 0)$.
4. If $\left( \langle A_1 x_1, x_1 \rangle / \langle A_1 \hat{x}_1, \hat{x}_1 \rangle \right)^{1/\mu} \le \varepsilon$, then stop ($A_1 x = b_1$ can be solved quickly enough by the current method).
5. Update $B^1$ by incorporating the computed $x_1$ in its range: $B^1 \leftarrow [B^1, x_1]$.
6. For $l = 1, \dots, L-2$:
(a) Create a copy of the current $B^{l+1}$ for later use: $\hat{B}^{l+1} \leftarrow B^{l+1}$.
(b) Define the new coarse-level matrix $B^{l+1}$ and transfer operators $P_{l+1}^l$, $I_{l+1}^l$ using (2.6) and (2.3). In creating $P_{l+1}^l$, some local components in $B^l$ may be eliminated (cf. Note 3.3).
8. Update the fine-level candidate matrix: $B^1 \leftarrow [\hat{B}^1, x_1]$.
Fig. 3.3. One step of the general setup stage, Algorithm 4: on each level a V-cycle is applied to the homogeneous problem (e.g., $A_2 x_2 = 0$) and the new candidate is appended ($B^2 \leftarrow [B^2, x_2]$); the final candidate is interpolated to the finest level, $B^1 \leftarrow [B^1, x_1]$ is set, the SA setup is run, and the new V-cycle is created.
As in the initialization stage, the coarsest level is assumed to be treated by a direct solver. Also, once a level
is reached where the problem can be solved well by the current method, any further
coarsening is constructed as in the standard SA.
4. Implementation issues. Several issues must be addressed to make Algo-
rithm 4 practical. We take advantage of certain features of the SA concept to carry
out the method outlined in Algorithm 4, as well as to control the amount of work
required to keep the evolving coarse-level hierarchy up to date.
As suggested in Notes 3.4 and 3.5, a candidate may occasionally be eliminated
locally over an aggregate. This results in varying numbers of degrees of freedom
per node on the coarse levels. (Recall that a coarse-level node is defined as a set of
degrees of freedom, each representing the restriction of a single candidate to a fine-
level aggregate.) To simplify notation, we assume for the time being that the number
of degrees of freedom per node is the same for all nodes on a given level (i.e., no
candidates are locally eliminated). It is important, however, to keep in mind that we
are interested in the more general case. A generalization to varying numbers of degrees
of freedom per node could be obtained easily at the cost of a much more cumbersome
notation. We briefly remark on the more general cases in Notes 4.2 and 4.3 below.
Note 4.1 (construction of temporary “bridging” transfer operators). An issue
we must consider is the interfacing between the emerging V -cycle on finer levels and
the previous V -cycle on coarser levels. Each setup cycle starts by selecting an initial
approximation for a new candidate on the finest level (cf. Figure 3.3). This approxi-
mation is then improved by applying the error propagation matrix for the previously
constructed V -cycle to it. The resulting candidate is used to enrich B 1 . This neces-
sitates an update of P21 , I21 , and A2 from (2.6) and (2.2) and introduces an additional
degree of freedom for the nodes on level 2. Since we now want to run the current
solver on level 2 to obtain an improved candidate on that level, we need to temporar-
ily modify P32 and I32 because these transfer operators have not yet been updated to
reflect the added degrees of freedom on level 2. Once this modification has been made,
a V -cycle on level 2 can be run to compute the new candidate there. This candidate
is then incorporated into B 2 and new P32 and I32 are constructed, overwriting the
temporary versions, and the new A3 can be computed using (2.2). To perform the
V -cycle on level 3, we then must temporarily modify operators P43 and I43 for the same
reason we had to update P32 and I32 above. Analogous temporary modifications to
the transfer operators are necessary on all coarser levels, as the setup cycle traverses
sequentially through them.
Thus, on stage $l$ of a single cycle of the setup process, all transfer operators defining the V-cycle can be used without change, except for $P_{l+1}^l$ and, consequently, $I_{l+1}^l$ defined through (2.3). We can construct the temporary operator, $P_{l+1}^l$, by modifying (2.6) as

$P_{l+1}^l B^{l+1} = \hat{B}^l,$

where $\hat{B}^l$ is formed by removing the last column from $B^l$, which consists of the $k+1$ fine-level candidate vectors, including the newly added one (so that the first $k$ candidates are the same as in the previous cycle). Since the tentative prolongator $P_{l+1}^l$ produced in this way is based only on fitting the first $k$ vectors in $B^l$, the coarse-level
matrix Al+1 resulting from the previous cycle of the αSA setup (described below) can
be used on the next level. Thus, all the coarse operators for levels coarser than l can
be used without change. This has the advantage of reducing the amount of work to
keep the V -cycle up to date on coarser, yet-to-be-traversed levels.
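In the dense conventions of the earlier sketches, the temporary bridge operator just refits $\hat{B}^l$, i.e., $B^l$ without its last column, so that the coarse operators from the previous cycle remain usable below level $l$. A sketch under the same assumptions as before (at least one previously kept candidate, aggregates large enough for the local QR):

```python
import numpy as np

def bridge_prolongator(B_l, aggregates):
    # Note 4.1: temporary P^l_{l+1} fitting only the first k candidates, so
    # that P @ B^{l+1} = B_hat^l as in the modified (2.6).
    B_hat = B_l[:, :-1]                     # drop the newly added candidate
    n, k = B_hat.shape
    P = np.zeros((n, k * len(aggregates)))
    for i, agg in enumerate(aggregates):
        Q, _ = np.linalg.qr(B_hat[agg, :])  # local QR over the aggregate
        P[agg, i * k:(i + 1) * k] = Q
    return P
```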
So far, we have considered only the case where all candidates are used locally.
In the interest of keeping only the candidates that are essential to achieving good
convergence properties, we now consider locally eliminating the candidates where
appropriate.
Note 4.2 (eliminating candidates locally as suggested in Note 3.5). When we
eliminate a candidate locally over an aggregate as suggested in Note 3.5, the con-
struction of the bridging operator above can be easily modified so that the multigrid
hierarchy constructed in the previous setup cycle can be used to apply a level l V -
cycle in the current one. Since the procedure guarantees that the previously selected
candidates are retained and only the newly computed candidate may be locally elim-
inated, the V -cycle constructed in the previous setup cycle remains valid on coarser
grids as in the case of Note 4.1. The only difference now is that aggregates may have
a variable number of associated candidates, and the construction of the temporary
transfer operator $P_{l+1}^l$ described in Note 4.1 must account for this when removing the column of $B^l$ to construct $\hat{B}^l$.
Note 4.3 (eliminating candidates locally as suggested in Note 3.4). The situation
is slightly more complicated when the procedure described in Note 3.4 is used to
eliminate candidates locally over an aggregate. First, even if none of the old candidates
are eliminated, the use of the procedure of Note 3.4 may result in a permutation of
the candidates over an aggregate and hence a permutation of the coarse degrees of
freedom corresponding to the associated node. To match the fine-level V -cycle with
the existing coarser levels, an appropriate permutation of the coarse degrees of freedom
must then be done when performing the intergrid transfer in the application of the
resulting V -cycle.
However, if some of the previously selected candidates are eliminated in favor of
the new candidate in the construction of the updated $P_{l+1}^l$, the coarse V-cycle should
no longer be used without change. In such cases, we would have to generate all the
coarse levels below level l before running the level l + 1 V -cycle. This results in a
significantly increased cost of the setup phase.
Note 4.4 (selection of the local quantities δA (x)). Our algorithm relies on local
aggregate quantities δA (x) to decide whether to eliminate candidate x in aggregate
A, and to guarantee that the computed candidates satisfy the global approximation
property (3.1). This leads us to the choice
(4.1) $\delta_{\mathcal{A}}(x) = \frac{\mathrm{card}(\mathcal{A})}{N_l}\, \frac{\langle A_l x, x \rangle}{\rho(A_l)},$

where $\mathrm{card}(\mathcal{A})$ denotes the number of nodes in aggregate $\mathcal{A}$ on level $l$, and $N_l$ is the total number of nodes on that level. Note that $\sum_{\mathcal{A}} \delta_{\mathcal{A}}(x) = \langle A_l x, x \rangle / \rho(A_l)$ for any $x$, so this choice can be used in the local estimates (3.2) to guarantee (3.1).
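Concretely (with one degree of freedom per node assumed for simplicity, so that card(A) is just the aggregate size):

```python
import numpy as np

def local_deltas(x, A_l, aggregates, rho):
    # The quantities (4.1); they sum over all aggregates to
    # <A_l x, x> / rho(A_l), as required by (3.3).
    N_l = sum(len(agg) for agg in aggregates)
    energy = (x @ A_l @ x) / rho
    return [len(agg) / N_l * energy for agg in aggregates]
```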
Suppose that we are given a bound, K > 0, on the maximum allowed number of
degrees of freedom per node on the coarse levels, and a tolerance, ε ∈ (0, 1), on the
target convergence factor. Then one adaptive setup cycle is defined as follows.
Algorithm 5 (one cycle of αSA).
1. If the maximum number of degrees of freedom per node on level 2 equals K,
stop (the allowed number of coarse grid degrees of freedom has been reached).
2. Create a copy of the current $B^1$ for later use: $\hat{B}^1 \leftarrow B^1$.
3. Select a random $x_1 \in \mathbb{R}^{n_1}$, create a copy $\hat{x}_1 \leftarrow x_1$, and apply $\mu$ iterations of the current V-cycle: $x_1 \leftarrow \mathrm{AMG}_1^\mu(x_1, 0)$.
4. If $\left( \langle A_1 x_1, x_1 \rangle / \langle A_1 \hat{x}_1, \hat{x}_1 \rangle \right)^{1/\mu} \le \varepsilon$, then stop ($A_1 x = b_1$ can be solved quickly enough by the current method).
5. Update $B^1$ by extending its range with the new column $x_1$: $B^1 \leftarrow [B^1, x_1]$.
6. For $l = 1, \dots, L-2$:
(a) Define a new coarse-level matrix $B^{l+1}$ and transfer operator $P_{l+1}^l$ based on (2.6), using $B^l$ and decomposition $\{\mathcal{A}_i^l\}_{i=1}^{N_l}$. In creating $P_{l+1}^l$, some local components in $B^l$ may be eliminated.
8. Update the fine-level candidate matrix: $B^1 \leftarrow [\hat{B}^1, x_1]$.
9. Create the V-cycle based on the current $B^1$ using the standard SA setup described by Algorithm 2.
Note that if we use the candidate elimination scheme of Note 3.4 in 6(a), we
should modify the algorithm to construct a completely new multigrid hierarchy on
levels l + 1 through L before applying the level l + 1 V-cycle in Step 6(h) (cf. Note 4.3).
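The finest-level portion of one such cycle can be sketched as follows; the level-by-level traversal of Step 6 is elided, and build_vcycle is a hypothetical stand-in for the standard SA setup of Algorithm 2, returning a callable that applies one V-cycle.

```python
import numpy as np

def alpha_sa_cycle(A1, B1, build_vcycle, mu=5, eps=0.1, K=6, seed=0):
    # Sketch of Algorithm 5, steps 1-5 and 9 (step 6, the coarse-level
    # traversal, is omitted here). Returns the updated B^1 and V-cycle.
    if B1.shape[1] >= K:                      # 1. dof-per-node budget reached
        return B1, build_vcycle(A1, B1)
    vcycle = build_vcycle(A1, B1)             # current method
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A1.shape[0])      # 3. random x_1, copy x_hat
    x_hat = x.copy()
    for _ in range(mu):
        x = vcycle(x, np.zeros_like(x))       #    mu V-cycles on A_1 x = 0
    if ((x @ A1 @ x) / (x_hat @ A1 @ x_hat)) ** (1.0 / mu) <= eps:
        return B1, vcycle                     # 4. current method suffices
    B1 = np.column_stack([B1, x])             # 5./8. enrich B^1 with x_1
    return B1, build_vcycle(A1, B1)           # 9. rebuild via standard setup
```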
Before presenting computational results, we consider several possible improve-
ments intended to reduce the necessary number of cycles of the setup and the amount
of work required to carry each cycle.
Note 4.5 (improving the quality of existing candidates). Many practical situ-
ations, including fourth-order equations and systems of fluid and solid mechanics,
require a set of multiple candidates to achieve optimal convergence. In the interest
of keeping operator complexity as small as possible, it is imperative that the number
of candidates used to produce the final method be controlled. Therefore, ways of
improving the quality of each candidate are of interest, to curb the demand for the
growth in their number.
When the current V -cycle hierarchy is based on approximating at least two can-
didates (in other words, the coarse problems feature at least two degrees of freedom
per node), this can be easily accomplished as follows.
$x_l \leftarrow x_l + I_{l+1}^l\, \hat{x}_{l+1},$
where $\hat{x}_{l+1}$ is obtained from $x_{l+1}$ by setting to zero every entry corresponding to fine-level candidate $x_j$. Thus, the columns of $I_{l+1}^l$ corresponding to $x_j$ are not used in coarse-grid correction.
In this way, we come up with an improved candidate vector without restarting
the entire setup iteration from scratch and without adding a new candidate. Since we
focus on one component at a time and keep all other components intact, this modified
V -cycle is expected to converge rapidly.
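A sketch of this masked coarse-grid correction, assuming $r$ coarse degrees of freedom per node stored candidate-major within each node (an illustrative layout, not one prescribed by the paper):

```python
import numpy as np

def masked_correction(x_l, I_l, x_coarse, j, r):
    # Note 4.5: interpolate a coarse update while zeroing every coarse dof
    # that corresponds to candidate j, so that columns of I^l_{l+1}
    # associated with x_j are not used in the correction.
    x_hat = x_coarse.copy()
    x_hat[j::r] = 0.0                  # entries of candidate j in every node
    return x_l + I_l @ x_hat           # x_l <- x_l + I^l_{l+1} x_hat_{l+1}
```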
Note 4.6 (saving work). The reuse of current coarse-level components described
in Note 4.1 reduces the amount of work required to keep the V -cycle up to date.
Additional work can be saved by performing the decomposition of nodes into disjoint
aggregates only during the setup of the initial V -cycle and then reusing this decom-
position in later cycles. Yet further savings are possible in coarsening, assuming the
candidates are allowed to be locally eliminated according to Note 3.5. For instance,
we can exploit the second-level matrix structure

$A_2 = \begin{pmatrix} \tilde{A}_2 & X \\ Y & Z \end{pmatrix},$
where Ã2 is the second-level matrix from the previous cycle. Thus, A2 need not be
recomputed and can be obtained by a rank-one update of each block entry in Ã2 .
In a similar fashion, the new operators $P_{l+1}^l$, $B^{l+1}$ do not have to be recomputed in each new setup cycle by the local QR decomposition noted in section 2. Instead, it is possible to update each nodal entry in $\tilde{P}_{l+1}^l$, $\hat{B}^{l+1}$ by a rank-one update on all coarse levels, where $\tilde{P}_{l+1}^l$, $\hat{B}^{l+1}$ are the operators created by the previous setup cycle.
5. Numerical experiments. To demonstrate the effectiveness of the proposed
adaptive setup process, we present results obtained by applying the method to several
model problems. In these tests, the solver was stopped when the relative residual
reached the value $\varepsilon = 10^{-12}$ (unless otherwise specified). The value $C_a = 10^{-3}$ was used for test (3.6), and the relaxation scheme for the multigrid solver was symmetric
Gauss–Seidel. While a Krylov subspace process is used often in practice, we present
these results for a basic multigrid V -cycle with no acceleration scheme for clarity,
unless explicitly specified otherwise.
All the experiments have been run on a notebook computer with a 1.6 GHz mo-
bile Pentium 4 processor and 512 MB of RAM. For each experiment, we report the
following. The column denoted by “Iter” contains the number of iterations required
to reduce the residual by the prescribed factor. The “Factor” column reports con-
vergence factor measured as the geometric average of the residual reduction in the
last 10 iterations. In the “CPU” column, we report the total CPU times in seconds
required to complete both the setup and iteration phases of the solver. In the column
“RelCPU” we report the relative times to solution, with one unit defined as the time
required to solve the problem given the correct near-nullspace components. In the
Table 5.1
Mis-scaled 3D Poisson problems, 68,921 and 1,030,301 degrees of freedom; using $\varepsilon = 10^{-8}$.
“OpComp” column, we report the operator complexity associated with the V -cycle
for every run (we define operator complexity in the usual sense [19], as the ratio of the number of entries stored in all problem matrices on all levels to the number of entries stored in the finest-level matrix). The "Candidates" column indicates
the number of kernel vectors computed in the setup iteration (a value of “provided”
means that complete kernel information was supplied to the solver, assuming standard
discretization and ignoring scaling). Parameter µmax denotes the maximal number of
tentative V -cycles allowed in computing each candidate.
In all the cases considered, the problem was modified either by scaling or by
rotating each nodal entry in the system by a random angle (as described below).
These modifications pose serious difficulties for classical algebraic iterative solvers
that are not aware of such modifications, as we assume here.
For comparison, we also include the results for the unmodified problem, with a
supplied set of kernel components. Not surprisingly, the standard algorithm (without
benefit of the adaptive process) performs poorly for the modified system when the
details of this modification are kept from the solver, as we assume here.
We start by considering a diagonally scaled problem,
$A \leftarrow D^{-1/2} A D^{-1/2},$
where the original A is the matrix obtained by standard Q1 finite element discretiza-
tion of the 3D Poisson operator on a cube and D is a diagonal matrix with entries
$10^\beta$, where $\beta \in [-\sigma, +\sigma]$ is chosen randomly. Table 5.1 shows the results for different values of parameter $\sigma$ and different levels of refinement. Using the supplied kernel yields good convergence factors for the unmodified problem, but the performance is poor and deteriorates with increased problem size when used with $\sigma \neq 0$. In contrast,
the adaptive process, starting from a random approximation, recovers the convergence
properties associated with the standard Poisson problem (σ = 0), even for the scaled
case, with convergence that appears independent of the problem size.
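This test problem is easy to reproduce; a sketch follows, with a 1D finite-difference Laplacian standing in for the Q1 finite element discretization used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 1000, 5.0
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # stand-in Laplacian
beta = rng.uniform(-sigma, sigma, size=n)
d = 10.0 ** beta                                        # diag(D) = 10^beta
A_scaled = A * np.outer(d ** -0.5, d ** -0.5)           # D^{-1/2} A D^{-1/2}
```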
The second problem comes from a diagonally scaled matrix arising in 2D elasticity.
Diagonal entries of $D$ are again defined as $10^\beta$, with $\beta \in [-\sigma, +\sigma]$ chosen randomly.
The original matrix is the discrete operator for the plane-strain elasticity formulation
over a square domain using bilinear finite elements on a uniform grid, with a Poisson
ratio of ν = 0.3 and Dirichlet boundary conditions specified only along the “West”
side of the domain. The results in Table 5.2 follow a pattern similar to those for the
Poisson problem. Note, however, that more than the usual three candidate vectors
are now needed to achieve convergence properties similar to those of the unscaled
Table 5.2
Scaled 2D elasticity problems with 80,400 and 181,202 degrees of freedom. Iteration counts marked with an asterisk indicate that residual reduction by $10^{12}$ was not achieved before the maximum number of iterations was reached.
problem when a correct set of three rigid-body modes is provided by the user. For
the scaled problem, however, supplying the rigid-body modes computed based on the
problem geometry leads, as expected, to dismal performance of the standard solver.
The third set of experiments is based again on the 2D elasticity problem, but now
each nodal block is rotated by a random angle $\beta \in [0, \pi]$,

$A \leftarrow Q^T A Q,$
Table 5.4
Rotated 3D elasticity problems with 114,444 and 201,720 degrees of freedom.
The final example demonstrates performance of the adaptive method for an elas-
ticity problem featuring discontinuities in the Young modulus. Here we consider a 3D
elasticity problem in which the Poisson ratio is fixed at 0.32, while the Young modulus
is allowed to vary randomly between the elements. We consider two cases: a case of
coefficients varying randomly with uniform distribution in the interval $(1, 10^\sigma)$ and the case where the distribution is exponential; i.e., the Young modulus is computed as $10^{\sigma r}$, where $r$ is generated randomly with uniform distribution in $(0, 1)$. Keep-
ing with the usual practice of employing Krylov method acceleration for problems
with coefficient discontinuities, in this experiment we use our adaptive method as a
preconditioner in the conjugate gradient method. The iteration was stopped once
the initial residual was reduced by $10^8$. Table 5.5 compares the results obtained by
using our adaptive scheme, started from random initial guess, to the results obtained
when the method based on a priori knowledge of the rigid body modes is employed
Table 5.5
3D elasticity problem, 201,720 degrees of freedom, with Young modulus featuring random jumps in $(1, 10^\sigma)$.
as a preconditioner. The table indicates that, using the adaptive procedure, without
a priori knowledge of the problem geometry, we can nearly recover the rates of the
method based on the knowledge of the rigid-body modes.
The topic of problems with discontinuous coefficients and the appropriate modifications to the basic SA method will be studied in a separate paper.
Note 5.1. The operator complexities in all of the test problems remain below
2. Moreover, for the larger spatial dimension of three dimensions, these complexities
improve somewhat, due largely to the increased speed of aggregation coarsening. It is
also worth mentioning that the increasing size of the coarse matrix block entries due
to the increasing number of candidates does not significantly impact the time needed
to perform one iteration of the solver, apparently due to the more efficient memory
access afforded by blocking.
The overall cost of one step of the final iteration grows only modestly because of better
utilization of cache memory due to dense matrix operations on the nodal blocks.
Operator complexity remains at reasonable levels and actually seems to improve with
increasing spatial dimension.
The construction of the tentative prolongator in the setup phase involves restric-
tion of the candidate functions to an aggregate and subsequent local orthogonalization
of these functions. It is therefore suitable for parallel processing as long as the ag-
gregates are local to the processor. Parallelization of the underlying SA solver is
currently under development. When it is completed, then αSA will also benefit from
the parallel speedup. The parallel version is also expected to gain better parallel scala-
bility by replacing the traditionally used Gauss–Seidel relaxation with the polynomial
smoothing procedures investigated recently in [1]. The performance of the parallel
implementation will depend on the quality of the parallel matrix-vector product.
Future development will concentrate on extending features of the underlying
method on which αSA relies and on developing theory beyond the heuristics we de-
veloped here. Although most decisions are currently made by the code at runtime,
much remains to be done to fully automate the procedure, such as determining cer-
tain tolerances that are now input by the user. We plan to explore the possibility of
setting or updating these parameters at runtime based on the characteristics of the
problem at hand. A related work in progress [7] explores adaptive ideas suitable in
the context of the standard AMG method.
REFERENCES
[1] M. Adams, M. Brezina, J. Hu, and R. Tuminaro, Parallel multigrid smoothing: Polynomial
versus Gauss-Seidel, J. Comput. Phys., 188 (2003), pp. 593–610.
[2] J. Bramble, J. Pasciak, J. Wang, and J. Xu, Convergence estimates for multigrid algorithms without regularity assumptions, Math. Comp., 57 (1991), pp. 23–45.
[3] A. Brandt, Lecture given at CASC, Lawrence Livermore National Lab, Livermore, CA, 2001.
[4] A. Brandt, S. F. McCormick, and J. W. Ruge, Algebraic multigrid (AMG) for sparse matrix
equations, in Sparsity and Its Applications, D. J. Evans, ed., Cambridge University Press,
Cambridge, UK, 1984, pp. 257–284.
[5] A. Brandt and D. Ron, Multigrid solvers and multilevel optimization strategies, in Multilevel
Optimization in VLSICAD, Comb. Optim. 14, J. Cong and J. R. Shinnerl, eds., Kluwer
Academic Publishers, Dordrecht, The Netherlands, 2003, pp. 1–69.
[6] M. Brezina, A. J. Cleary, R. D. Falgout, V. E. Henson, J. E. Jones, T. A. Manteuffel,
S. F. McCormick, and J. W. Ruge, Algebraic multigrid based on element interpolation
(AMGe), SIAM J. Sci. Comput., 22 (2000), pp. 1570–1592.
[7] M. Brezina, R. Falgout, S. MacLachlan, T. Manteuffel, S. F. McCormick, and J. W.
Ruge, Adaptive Algebraic Multigrid (αAMG), in preparation, 2003.
[8] M. Brezina, C. I. Heberton, J. Mandel, and P. Vaněk, An Iterative Method with Conver-
gence Rate Chosen A Priori, UCD/CCM report 140, Center for Computational Mathemat-
ics, University of Colorado at Denver, Denver, CO, 1999; available online from http://www-math.cudenver.edu/ccmreports/rep140.ps.gz.
[9] M. Brezina and P. Vaněk, A black-box iterative solver based on a two-level Schwarz method,
Computing, 63 (1999), pp. 233–263.
[10] W. Briggs, V. E. Henson, and S. F. McCormick, A Multigrid Tutorial, 2nd ed., SIAM,
Philadelphia, 2000.
[11] T. Chartier, Element-Based Algebraic Multigrid (AMGe) and Spectral AMGe, Ph.D. thesis, University of Colorado at Boulder, Boulder, CO, 2001.
[12] T. Chartier, R. D. Falgout, V. E. Henson, J. E. Jones, T. Manteuffel, S. F. Mc-
Cormick, J. W. Ruge, and P. S. Vassilevski, Spectral AMGe (ρAMGe), SIAM J. Sci.
Comput., 25 (2003), pp. 1–26.
[13] J. Fish and V. Belsky, Generalized aggregation multilevel solver, Internat. J. Numer. Methods
Engrg., 40 (1997), pp. 4341–4361.
[14] S. F. McCormick and J. W. Ruge, Algebraic multigrid methods applied to problems in com-
putational structural mechanics, in State of the Art Surveys on Computational Mechanics,
A. K. Noor and J. T. Oden, eds., ASME, New York, 1989, pp. 237–270.
[15] J. W. Ruge, Algebraic multigrid (AMG) for geodetic survey problems, in Proceedings of the
International Multigrid Conference, Copper Mountain, CO, 1983.
[16] J. W. Ruge, Final Report on AMG02, report, Gesellschaft fuer Mathematik und Datenverar-
beitung, St. Augustin, 1985, GMD, contract 5110/022090.
[17] J. W. Ruge, Algebraic multigrid applied to systems of partial differential equations, in Proceed-
ings of the International Multigrid Conference, 1985, S. McCormick, ed., North–Holland,
Amsterdam, 1986.
[18] J. W. Ruge and K. Stüben, Efficient solution of finite difference and finite element equations
by algebraic multigrid (AMG), in Multigrid Methods for Integral and Differential Equa-
tions, The Institute of Mathematics and its Applications Conference Series, D. J. Paddon
and H. Holstein, eds., Clarendon Press, Oxford, UK, 1985, pp. 169–212.
[19] J. W. Ruge and K. Stüben, Algebraic multigrid (AMG), in Multigrid Methods, Frontiers
Appl. Math. 3, S. F. McCormick, ed., SIAM, Philadelphia, 1987, pp. 73–130.
[20] P. Vaněk, M. Brezina, and J. Mandel, Convergence of algebraic multigrid based on smoothed
aggregation, Numer. Math., 88 (2001), pp. 559–579.
[21] P. Vaněk, Acceleration of convergence of a two-level algorithm by smoothing transfer operator,
Appl. Math., 37 (1992), pp. 265–274.
[22] P. Vaněk, M. Brezina, and R. Tezaur, Two-grid method for linear elasticity on unstructured
meshes, SIAM J. Sci. Comput., 21 (1999), pp. 900–923.
[23] P. Vaněk, J. Mandel, and M. Brezina, Algebraic multigrid by smoothed aggregation for
second and fourth order elliptic problems, Computing, 56 (1996), pp. 179–196.