
SIAM J. SCI. STAT. COMPUT.
Vol. 7, No. 3, July 1986

© 1986 Society for Industrial and Applied Mathematics
011

GMRES: A GENERALIZED MINIMAL RESIDUAL ALGORITHM FOR SOLVING NONSYMMETRIC LINEAR SYSTEMS*
YOUCEF SAAD† AND MARTIN H. SCHULTZ†

Abstract. We present an iterative method for solving linear systems, which has the property of minimizing at every step the norm of the residual vector over a Krylov subspace. The algorithm is derived from the Arnoldi process for constructing an $l_2$-orthogonal basis of Krylov subspaces. It can be considered as a generalization of Paige and Saunders' MINRES algorithm and is theoretically equivalent to the Generalized Conjugate Residual (GCR) method and to ORTHODIR. The new algorithm presents several advantages over GCR and ORTHODIR.

Key words. nonsymmetric systems, Krylov subspaces, conjugate gradient, descent methods, minimal residual methods
AMS(MOS) subject classification. 65F

1. Introduction. One of the most effective iterative methods for solving large sparse symmetric positive definite linear systems of equations is a combination of the conjugate gradient method with some preconditioning technique [3], [8]. Moreover, several different generalizations of the conjugate gradient method have been presented in recent years to deal with nonsymmetric problems [2], [9], [5], [4], [13], [14] and symmetric indefinite problems [10], [3], [11], [14]. For solving indefinite symmetric systems, Paige and Saunders [10] proposed an approach which exploits the relationship between the conjugate gradient method and the Lanczos method. In particular, it is known that the Lanczos method for solving the eigenvalue problem for an $N \times N$ matrix $A$ is a Galerkin method onto the Krylov subspace $K_k \equiv \operatorname{span}\{v_1, Av_1, \ldots, A^{k-1}v_1\}$, while the conjugate gradient method is a Galerkin method for solving the linear system $Ax = f$ onto the Krylov subspace $K_k$ with $v_1 = r_0/\|r_0\|$. Thus, the Lanczos method computes the matrix representation $T_k$ of the linear operator $P_k A|_{K_k}$, the restriction of $P_k A$ to $K_k$, where $P_k$ is the $l_2$-orthogonal projector onto $K_k$. The Galerkin method for $Ax = f$ in $K_k$ leads to solving a linear system with the matrix $T_k$, which is tridiagonal if $A$ is symmetric. In general, $T_k$ is indefinite when $A$ is, and some stable direct method must be used to solve the corresponding tridiagonal Galerkin system. The basis of Paige and Saunders' SYMMLQ algorithm is to use the stable LQ factorization of $T_k$. Paige and Saunders also showed that it is possible to formulate an algorithm called MINRES using the Lanczos basis to compute an approximate solution $x_k$ which minimizes the residual norm over the Krylov subspace $K_k$. In the present paper we introduce and analyse a generalization of the MINRES algorithm for solving nonsymmetric linear systems. This generalization is based on the Arnoldi process [1], [12], which is an analogue of the Lanczos algorithm for nonsymmetric matrices.
Instead of a tridiagonal matrix representing $P_k A|_{K_k}$, as is produced by the Lanczos method for symmetric matrices, Arnoldi's method produces an upper Hessenberg matrix. Using the $l_2$-orthonormal basis generated by the Arnoldi process, we will show that the approximate solution which minimizes the residual norm over $K_k$ is easily computed by a technique similar to that of Paige and Saunders. We call the resulting
* Received by the editors November 29, 1983, and in revised form May 8, 1985. This work was supported by the Office of Naval Research under grant N000014-82-K-0184 and by the National Science Foundation under grant MCS-8106181.
† Department of Computer Science, Yale University, New Haven, Connecticut 06520.
856


algorithm the Generalized Minimal Residual (GMRES) method. We will establish that GMRES is mathematically equivalent to the generalized conjugate residual method (GCR) [5], [16] and to ORTHODIR [9]. It is known that when $A$ is positive real, i.e., when its symmetric part is positive definite, the generalized conjugate residual method and the ORTHODIR method will produce a sequence of approximations $x_k$ which converge to the exact solution. However, when $A$ is not positive real, GCR may break down. ORTHODIR, on the other hand, does not break down, but is known to be numerically less stable than GCR [5], although this seems to be a scaling difficulty. Thus, systems in which the coefficient matrix is not positive real provide the main motivation for developing GMRES. For the purpose of illustration, consider the following $2 \times 2$ linear system $Ax = f$, where
$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad f = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad x_0 = 0.$$

The GCR algorithm can be briefly described as follows:

1. Start: Set $p_0 = r_0 = f - Ax_0$.
2. Iterate: For $i = 0, 1, \ldots$ until convergence do:
   Compute $a_i = (r_i, Ap_i)/(Ap_i, Ap_i)$,
   $x_{i+1} = x_i + a_i p_i$,
   $r_{i+1} = r_i - a_i A p_i$,
   $p_{i+1} = r_{i+1} + \sum_{j=0}^{i} b_j^{(i)} p_j$, where the $\{b_j^{(i)}\}$ are chosen so that
   $(Ap_{i+1}, Ap_j) = 0$, for $0 \le j \le i$.

If one attempts to execute this algorithm for the above example one would obtain the following results:
1. At step $i = 0$ we get $a_0 = 0$ and therefore $x_1 = x_0$, $r_1 = r_0$. Moreover, the vector $p_1$ is zero.
2. At step $i = 1$, a division by zero takes place when computing $a_1$ and the algorithm breaks down.

We will prove that GMRES cannot break down even for problems with indefinite symmetric parts unless it has already converged. Moreover, we will show that the GMRES method requires only about half the storage required by the GCR method and fewer arithmetic operations than GCR. In §2 we will briefly recall Arnoldi's method for generating $l_2$-orthogonal basis vectors as it is described in [13]. In §3, we will present the GMRES algorithm and its analysis. Finally, in §4 we present some numerical experiments.
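The two steps above can be replayed with a short sketch. It is a plain-Python transcription of the GCR listing (dense matrices stored as lists of rows; the helper names `dot`, `matvec` and `gcr` are illustrative, not taken from [5]), with one added assumption: the loop stops early once the residual norm falls below `tol`, standing in for "until convergence". On the $2 \times 2$ example it records $a_0 = 0$, builds $p_1 = 0$, and raises at $i = 1$; any $2 \times 2$ skew-symmetric $A$ behaves the same way, since $(r, Ar) = 0$ for every $r$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def gcr(A, f, x0, max_steps, tol=1e-12):
    """GCR as listed above (no restart) on small dense lists of rows.

    Returns (x, a_list). Raises ZeroDivisionError when (Ap_i, Ap_i) == 0,
    which is the breakdown discussed in the text.
    """
    x = list(x0)
    r = [fi - ai for fi, ai in zip(f, matvec(A, x))]
    p, Ap = [list(r)], [matvec(A, r)]           # p_0 = r_0 and A p_0
    a_list = []
    for i in range(max_steps):
        if dot(r, r) ** 0.5 <= tol:             # assumed stand-in for "until convergence"
            break
        denom = dot(Ap[i], Ap[i])
        if denom == 0.0:
            raise ZeroDivisionError(f"GCR breaks down at step i = {i}: A p_{i} = 0")
        a = dot(r, Ap[i]) / denom               # a_i = (r_i, Ap_i) / (Ap_i, Ap_i)
        a_list.append(a)
        x = [xk + a * pk for xk, pk in zip(x, p[i])]
        r = [rk - a * qk for rk, qk in zip(r, Ap[i])]
        # p_{i+1} = r_{i+1} + sum_j b_j p_j with (A p_{i+1}, A p_j) = 0 for j <= i.
        # The A p_j are mutually orthogonal, so b_j = -(A r_{i+1}, A p_j) / (A p_j, A p_j).
        Ar = matvec(A, r)
        new_p, new_Ap = list(r), list(Ar)
        for pj, Apj in zip(p, Ap):
            b = -dot(Ar, Apj) / dot(Apj, Apj)
            new_p = [u + b * v for u, v in zip(new_p, pj)]
            new_Ap = [u + b * v for u, v in zip(new_Ap, Apj)]
        p.append(new_p)
        Ap.append(new_Ap)
    return x, a_list
```

For instance, `gcr([[0.0, 1.0], [-1.0, 0.0]], [1.0, 1.0], [0.0, 0.0], 2)` raises `ZeroDivisionError`, whereas the positive real matrix `[[2.0, 1.0], [0.0, 3.0]]` with `f = [0.0, 1.0]` is solved in two steps.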
2. Arnoldi's method. Arnoldi's method [1], which uses the Gram-Schmidt method for computing an $l_2$-orthonormal basis $\{v_1, v_2, \ldots, v_k\}$ of the Krylov subspace $K_k = \operatorname{span}\{v_1, Av_1, \ldots, A^{k-1}v_1\}$, can be described as follows.

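ALGORITHM 1: Arnoldi.
1. Start: Choose an initial vector $v_1$ with $\|v_1\| = 1$.
2. Iterate: For $j = 1, 2, \ldots$ do:
   $h_{i,j} = (Av_j, v_i)$, $i = 1, 2, \ldots, j$,
   $\hat v_{j+1} = Av_j - \sum_{i=1}^{j} h_{i,j} v_i$,
   $h_{j+1,j} = \|\hat v_{j+1}\|$, and
   $v_{j+1} = \hat v_{j+1}/h_{j+1,j}$.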
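A minimal sketch of Algorithm 1 in plain Python follows. It already uses the modified Gram-Schmidt form recommended in the next paragraph (each $h_{i,j}$ is taken against the partially orthogonalized vector), and it assumes that no $h_{j+1,j}$ vanishes; `dot`, `matvec` and `arnoldi` are illustrative names. It returns the basis $v_1, \ldots, v_{k+1}$ and the $(k+1) \times k$ array of the $h_{i,j}$, so the identity $h_{i,j} = (Av_j, v_i)$ can be checked directly.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def arnoldi(A, v1, k):
    """k steps of Algorithm 1, with the Gram-Schmidt loop done in modified form.

    Returns (V, H): V = [v_1, ..., v_{k+1}] and H, the (k+1) x k upper
    Hessenberg array of the h_{i,j}. Assumes no h_{j+1,j} vanishes.
    """
    nrm = math.sqrt(dot(v1, v1))
    V = [[x / nrm for x in v1]]
    H = [[0.0] * k for _ in range(k + 1)]
    for j in range(k):
        w = matvec(A, V[j])
        for i in range(j + 1):
            H[i][j] = dot(w, V[i])              # h_{i,j}, taken against the updated w
            w = [a - H[i][j] * b for a, b in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))      # h_{j+1,j} = ||v-hat_{j+1}||
        V.append([x / H[j + 1][j] for x in w])
    return V, H
```

Entries below the subdiagonal are never touched, which is where the Hessenberg structure comes from.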

In practical implementation it is usually more suitable to replace the Gram-Schmidt algorithm of step 2 by the modified Gram-Schmidt algorithm [15]. If $V_k$ is the $N \times k$ matrix whose columns are the $l_2$-orthonormal basis $\{v_1, v_2, \ldots, v_k\}$, then $H_k = V_k^T A V_k$ is the upper $k \times k$ Hessenberg matrix whose entries are the scalars $h_{i,j}$ generated by Algorithm 1. If we call $P_k$ the $l_2$-orthogonal projector onto $K_k$, and denote by $A_k$ the section of $A$ in $K_k$, i.e., the operator $A_k = P_k A|_{K_k}$, we notice that $H_k$ is nothing but the matrix representation of $A_k$ in the basis $\{v_1, v_2, \ldots, v_k\}$. Thus Arnoldi's original method was a Galerkin method for approximating the eigenvalues of $A$ by those of $H_k$ [1], [12].

In order to solve the linear system
$$Ax = f \tag{1}$$
by the Galerkin method using the $l_2$-orthogonal basis $V_k$, we seek an approximate solution $x_k$ of the form $x_k = x_0 + z_k$, where $x_0$ is some initial guess to the solution $x$, and $z_k$ is a member of the Krylov subspace $K_k = \operatorname{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$, with $r_0 = f - Ax_0$. Suppose that $k$ steps of Algorithm 1 are carried out starting with $v_1 = r_0/\|r_0\|$. Then it is easily seen that the Galerkin condition that the residual vector $r_k = f - Ax_k$ be $l_2$-orthogonal to $K_k$ yields
$$z_k = V_k y_k, \qquad \text{where } y_k = H_k^{-1} \|r_0\| e_1,$$
and $e_1$ is the unit vector $e_1 = (1, 0, 0, \ldots, 0)^T$ [13]. Hence we can define the following algorithm [13].

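ALGORITHM 2: Full orthogonalization method.
1. Start: Choose $x_0$ and compute $r_0 = f - Ax_0$ and $v_1 = r_0/\|r_0\|$.
2. Iterate: For $j = 1, 2, \ldots, k$ do:
   $h_{i,j} = (Av_j, v_i)$, $i = 1, 2, \ldots, j$,
   $\hat v_{j+1} = Av_j - \sum_{i=1}^{j} h_{i,j} v_i$,
   $h_{j+1,j} = \|\hat v_{j+1}\|$, and
   $v_{j+1} = \hat v_{j+1}/h_{j+1,j}$.
3. Form the solution: $x_k = x_0 + V_k y_k$, where $y_k = \|r_0\| H_k^{-1} e_1$.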

In practice, the number $k$ of iterations in step 2 is chosen so that the approximate solution $x_k$ will be sufficiently accurate. Fortunately, it is simple to determine a posteriori when $k$ is sufficiently large without having to explicitly compute the approximate solution, because we can compute the residual norm of $x_k$ thanks to the relation [13], [14]
$$\|f - Ax_k\| = h_{k+1,k} \, |e_k^T y_k|. \tag{2}$$

Note that if the algorithm stops at step $k$, then clearly it is unnecessary to compute the vector $v_{k+1}$. Algorithm 2 has a number of important properties [14]. Apart from a multiplicative constant, the residual vector $r_k$ of $x_k$ is nothing but the vector $v_{k+1}$. Hence, the residual vectors produced by Algorithm 2 are $l_2$-orthogonal to each other. Algorithm 2 does not break down if and only if the degree of the minimal polynomial of $v_1$ is at least $k$ and the matrix $H_k$ is nonsingular. The process terminates in at most $N$ steps. Algorithm 2 generalizes a method developed by Parlett [11] for the symmetric case. It is also known to be mathematically equivalent to the ORTHORES algorithm developed by Young and Jea [9].
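Algorithm 2 and relation (2) can be checked numerically with the sketch below (plain Python, illustrative helper names). It assumes no breakdown occurs within $k$ steps and solves $H_k y_k = \beta e_1$ by Gaussian elimination without pivoting, which is adequate only when the pivots stay away from zero, as they do for the positive real matrix in the usage note. The returned estimate $h_{k+1,k}|e_k^T y_k|$ should match $\|f - Ax_k\|$ up to rounding.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def fom(A, f, x0, k):
    """Algorithm 2 (full orthogonalization method) on dense lists, k >= 1.

    Returns (x_k, est) where est = h_{k+1,k} |e_k^T y_k| is the residual norm
    predicted by relation (2). Assumes no breakdown within k steps.
    """
    r0 = [a - b for a, b in zip(f, matvec(A, x0))]
    beta = math.sqrt(dot(r0, r0))
    V = [[x / beta for x in r0]]
    H = [[0.0] * k for _ in range(k + 1)]
    for j in range(k):                              # step 2 (modified Gram-Schmidt)
        w = matvec(A, V[j])
        for i in range(j + 1):
            H[i][j] = dot(w, V[i])
            w = [a - H[i][j] * b for a, b in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        V.append([x / H[j + 1][j] for x in w])
    # Step 3: y_k = H_k^{-1} beta e_1 by Gaussian elimination on the square
    # Hessenberg matrix H_k, without pivoting (assumes nonzero pivots).
    M = [H[i][:k] + [beta if i == 0 else 0.0] for i in range(k)]
    for i in range(k - 1):                          # one subdiagonal entry per column
        mult = M[i + 1][i] / M[i][i]
        M[i + 1] = [a - mult * b for a, b in zip(M[i + 1], M[i])]
    y = [0.0] * k
    for i in reversed(range(k)):
        y[i] = (M[i][k] - sum(M[i][c] * y[c] for c in range(i + 1, k))) / M[i][i]
    x = [xn + sum(y[c] * V[c][n] for c in range(k)) for n, xn in enumerate(x0)]
    return x, H[k][k - 1] * abs(y[k - 1])
```

With `A = [[3.0, 1.0, 0.0, 0.0], [0.0, 4.0, 1.0, 0.0], [1.0, 0.0, 5.0, 1.0], [0.0, 1.0, 0.0, 6.0]]` and `f = [1.0, 2.0, 3.0, 4.0]`, the estimate and the explicitly computed residual norm agree to within rounding error for $k = 1, 2, 3$.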


A difficulty with the full orthogonalization method is that it becomes increasingly expensive as the step number $k$ increases. There are two distinct ways of avoiding this difficulty. The first is simply to restart the algorithm every $m$ steps. The second is to truncate the $l_2$-orthogonalization process, by insisting that the new vector $v_{i+1}$ be $l_2$-orthogonal to only the $l$ previous vectors, where $l$ is some integer parameter. The resulting Hessenberg matrix $H_k$ is then banded, and the algorithm can be implemented in such a way as to avoid storing all previous $v_i$'s, keeping only the $l$ most recent ones. The details on this Incomplete $l_2$-orthogonalization Method (IOM($l$)) can be found in [14]. A drawback of these truncation techniques is the lack of any theory concerning the global convergence of the resulting method. Such a theory is difficult because there is no optimality property similar to that of the conjugate gradient method. In the next section we derive a method, which we call GMRES, based on Algorithm 1 to provide an approximate solution which satisfies an optimality property.
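The truncation can be sketched as a variant of Algorithm 1 that keeps a sliding window of the $l$ most recent basis vectors. The sketch below (plain Python; `incomplete_arnoldi` is an illustrative name) only builds the banded Hessenberg matrix; it does not reproduce the solution update of IOM($l$) described in [14]. Entries with $i < j - l + 1$ are never computed, which is the banded structure mentioned above, and only $l$ basis vectors are held at any time.

```python
import math
from collections import deque


def incomplete_arnoldi(A, v1, k, l):
    """Truncated Arnoldi in the spirit of IOM(l): v_{j+1} is orthogonalized
    only against the l most recent basis vectors.

    Returns (H, window): H is the (k+1) x k Hessenberg array, with h_{i,j} = 0
    whenever i < j - l + 1, and window holds the last l (index, vector) pairs.
    Assumes no h_{j+1,j} vanishes.
    """
    nrm = math.sqrt(sum(x * x for x in v1))
    window = deque([(0, [x / nrm for x in v1])], maxlen=l)   # oldest pair drops out
    H = [[0.0] * k for _ in range(k + 1)]
    for j in range(k):
        vj = window[-1][1]
        w = [sum(a * b for a, b in zip(row, vj)) for row in A]   # A v_j
        for i, vi in window:                                  # only l vectors are used
            H[i][j] = sum(a * b for a, b in zip(w, vi))
            w = [a - H[i][j] * b for a, b in zip(w, vi)]
        H[j + 1][j] = math.sqrt(sum(x * x for x in w))
        window.append((j + 1, [x / H[j + 1][j] for x in w]))
    return H, list(window)
```

With $l \ge k$ the window never drops a vector and the sketch reduces to Algorithm 1 in modified Gram-Schmidt form.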
3. The generalized minimal residual (GMRES) algorithm.
3.1. The algorithm. The approximate solution of the form $x_0 + z$, which minimizes the residual norm over $z$ in $K_k$, can in principle be obtained by several known algorithms: the ORTHODIR algorithm of Jea and Young [9]; Axelsson's method [2]; the generalized conjugate residual method [4], [5]. However, if the matrix is indefinite these algorithms may break down or have stability problems. Here we introduce a new algorithm to compute the same approximate solution by using the basis generated by Arnoldi's method, Algorithm 1. To describe the algorithm we start by noticing that after $k$ steps of Arnoldi's method we have an $l_2$-orthonormal system $V_{k+1}$ and a $(k+1) \times k$ matrix $\bar H_k$ whose only nonzero entries are the elements $h_{ij}$ generated by the method. Thus $\bar H_k$ is the same as $H_k$ except for an additional row whose only nonzero element is $h_{k+1,k}$ in the $(k+1, k)$ position. The vectors $v_i$ and the matrix $\bar H_k$ satisfy the important relation
$$AV_k = V_{k+1} \bar H_k. \tag{3}$$

Now we would like to solve the least squares problem
$$\min_{z \in K_k} \|f - A[x_0 + z]\| = \min_{z \in K_k} \|r_0 - Az\|. \tag{4}$$
If we set $z = V_k y$, we can view the norm to be minimized as the following function of $y$:
$$J(y) = \|\beta v_1 - AV_k y\|, \tag{5}$$
where we have let $\beta = \|r_0\|$ for convenience. Using (3) we obtain
$$J(y) = \|V_{k+1}[\beta e_1 - \bar H_k y]\|. \tag{6}$$
Here, the vector $e_1$ is the first column of the $(k+1) \times (k+1)$ identity matrix. Recalling that $V_{k+1}$ is $l_2$-orthonormal, we see that
$$J(y) = \|\beta e_1 - \bar H_k y\|. \tag{7}$$
Hence the solution of the least squares problem (4) is given by
$$x_k = x_0 + V_k y_k, \tag{8}$$
where $y_k$ minimizes the function $J(y)$, defined by (7), over $y \in \mathbb{R}^k$.
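The derivation (4) to (8) translates into a short sketch of non-restarted GMRES (plain Python, illustrative names). For brevity it solves the small problem $\min_y \|\beta e_1 - \bar H_k y\|$ through the normal equations, which is acceptable for the tiny, well-conditioned examples here but is not what §3.2 recommends. It also stops when $h_{j+1,j}$ becomes negligible relative to $\beta$ (the threshold `tol` is an assumption), which, as §3.4 shows, happens exactly when the iterate is already exact. The returned value $J(y_k)$ from (7) can be compared with the explicitly computed residual norm.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def solve_spd(G, b):
    """Gaussian elimination without pivoting for a small symmetric positive definite G."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(G, b)]
    for i in range(n):
        for r in range(i + 1, n):
            mult = M[r][i] / M[i][i]
            M[r] = [a - mult * c for a, c in zip(M[r], M[i])]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][c] * y[c] for c in range(i + 1, n))) / M[i][i]
    return y


def gmres(A, f, x0, k, tol=1e-12):
    """Non-restarted GMRES for at most k steps (assumes r_0 != 0).

    Returns (x, J, steps), where J = ||beta e_1 - Hbar y|| is the minimum of (7).
    """
    r0 = [a - b for a, b in zip(f, matvec(A, x0))]
    beta = math.sqrt(dot(r0, r0))
    V, cols = [[x / beta for x in r0]], []     # cols[j] = column j+1 of Hbar
    for j in range(k):
        w = matvec(A, V[j])
        col = []
        for vi in V:                           # modified Gram-Schmidt
            col.append(dot(w, vi))
            w = [a - col[-1] * b for a, b in zip(w, vi)]
        col.append(math.sqrt(dot(w, w)))       # h_{j+1,j} in the text's indexing
        cols.append(col)
        if col[-1] <= tol * beta:              # "lucky" breakdown: iterate is exact
            break
        V.append([x / col[-1] for x in w])
    m = len(cols)
    cols = [c + [0.0] * (m + 1 - len(c)) for c in cols]   # pad to length m+1
    # Normal equations Hbar^T Hbar y = Hbar^T (beta e_1), used here only for brevity.
    G = [[dot(ca, cb) for cb in cols] for ca in cols]
    y = solve_spd(G, [beta * c[0] for c in cols])
    t = [(beta if i == 0 else 0.0) - sum(c[i] * yc for c, yc in zip(cols, y))
         for i in range(m + 1)]
    x = [xn + sum(yc * V[c][n] for c, yc in enumerate(y)) for n, xn in enumerate(x0)]
    return x, math.sqrt(dot(t, t)), m
```

For the diagonal matrix $\operatorname{diag}(1, 2, 3, 4)$ with $f = (1, 1, 0, 0)^T$ and $x_0 = 0$, the minimal polynomial of $r_0$ has degree 2, so the sketch stops after two steps with the exact solution $(1, 0.5, 0, 0)^T$.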


The resulting algorithm is similar to the Full Orthogonalization Method, Algorithm 2, described earlier, the only difference being that the vector $y_k$ used in step 3 for computing $x_k$ is now replaced by the minimizer of $J(y)$. Hence the method has the following structure.

ALGORITHM 3: The generalized minimal residual method (GMRES).
1. Start: Choose $x_0$ and compute $r_0 = f - Ax_0$ and $v_1 = r_0/\|r_0\|$.
2. Iterate: For $j = 1, 2, \ldots, k, \ldots$ until satisfied do:
   $h_{i,j} = (Av_j, v_i)$, $i = 1, 2, \ldots, j$,
   $\hat v_{j+1} = Av_j - \sum_{i=1}^{j} h_{i,j} v_i$,
   $h_{j+1,j} = \|\hat v_{j+1}\|$, and
   $v_{j+1} = \hat v_{j+1}/h_{j+1,j}$.
3. Form the approximate solution: $x_k = x_0 + V_k y_k$, where $y_k$ minimizes (7).

When using the GMRES algorithm we can easily use the Arnoldi matrix $H_k$ for estimating the eigenvalues of $A$. This is particularly useful in the hybrid Chebyshev procedure proposed in [6]. It is clear that we face the same practical difficulties with the above GMRES method as with the Full Orthogonalization Method. When $k$ increases, the number of vectors requiring storage increases like $k$ and the number of multiplications like $\frac{1}{2}k^2 N$. To remedy this difficulty, we can use the algorithm iteratively, i.e., we can restart the algorithm every $m$ steps, where $m$ is some fixed integer parameter. This restarted version of GMRES, denoted by GMRES($m$), is described below.
ALGORITHM 4: GMRES($m$).
1. Start: Choose $x_0$ and compute $r_0 = f - Ax_0$ and $v_1 = r_0/\|r_0\|$.
2. Iterate: For $j = 1, 2, \ldots, m$ do:
   $h_{i,j} = (Av_j, v_i)$, $i = 1, 2, \ldots, j$,
   $\hat v_{j+1} = Av_j - \sum_{i=1}^{j} h_{i,j} v_i$,
   $h_{j+1,j} = \|\hat v_{j+1}\|$, and
   $v_{j+1} = \hat v_{j+1}/h_{j+1,j}$.
3. Form the approximate solution: $x_m = x_0 + V_m y_m$, where $y_m$ minimizes $\|\beta e_1 - \bar H_m y\|$, $y \in \mathbb{R}^m$.
4. Restart: Compute $r_m = f - Ax_m$; if satisfied then stop, else compute $x_0 := x_m$, $v_1 := r_m/\|r_m\|$ and go to 2.

Note that in certain applications we will not restart GMRES. Such is the case for example in the solution of stiff ODEs [7] and in the hybrid adaptive Chebyshev method [6].
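Algorithm 4 can be sketched in plain Python as follows (illustrative names). Each cycle solves its small least squares problem with the plane rotations described in §3.2, and step 4 recomputes $r_m = f - Ax_m$ explicitly; the stopping rule (a relative reduction `tol`) and the cycle limit are assumptions of the sketch. On a positive real test matrix the residual norms decrease to the tolerance, whereas on the $2 \times 2$ example of the introduction GMRES(1) returns $x_1 = x_0 = 0$ at every restart, the stationary behaviour analysed in §3.4; GMRES(2) solves that system in one cycle.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def residual_norm(A, f, x):
    r = [a - b for a, b in zip(f, matvec(A, x))]
    return math.sqrt(dot(r, r))


def gmres_cycle(A, f, x0, m):
    """Steps 1-3 of Algorithm 4: m Arnoldi steps from x0, then x_m = x0 + V_m y_m."""
    r0 = [a - b for a, b in zip(f, matvec(A, x0))]
    beta = math.sqrt(dot(r0, r0))
    if beta == 0.0:
        return list(x0)
    V, rots, R, g = [[x / beta for x in r0]], [], [], [beta]
    for j in range(m):
        w = matvec(A, V[j])
        h = []
        for vi in V:                                  # modified Gram-Schmidt
            h.append(dot(w, vi))
            w = [a - h[-1] * b for a, b in zip(w, vi)]
        h_next = math.sqrt(dot(w, w))                 # new subdiagonal entry of Hbar
        h.append(h_next)
        if h_next > 0.0:
            V.append([x / h_next for x in w])
        for i, (c, s) in enumerate(rots):             # apply the previous rotations
            h[i], h[i + 1] = c * h[i] - s * h[i + 1], s * h[i] + c * h[i + 1]
        rho = math.hypot(h[j], h[j + 1])
        c, s = h[j] / rho, -h[j + 1] / rho            # new rotation zeroes h[j + 1]
        rots.append((c, s))
        h[j] = rho
        g.append(s * g[j])
        g[j] = c * g[j]
        R.append(h[: j + 1])                          # column of the triangular factor
        if h_next == 0.0:                             # lucky breakdown: iterate is exact
            break
    k = len(R)
    y = [0.0] * k
    for i in reversed(range(k)):
        y[i] = (g[i] - sum(R[c][i] * y[c] for c in range(i + 1, k))) / R[i][i]
    return [xn + sum(y[c] * V[c][n] for c in range(k)) for n, xn in enumerate(x0)]


def gmres_restarted(A, f, x0, m, tol=1e-10, max_cycles=200):
    """GMRES(m): restart from x_m until ||f - Ax|| <= tol * ||f - Ax0||."""
    x = list(x0)
    history = [residual_norm(A, f, x)]
    for _ in range(max_cycles):
        if history[-1] <= tol * history[0]:
            break
        x = gmres_cycle(A, f, x, m)
        history.append(residual_norm(A, f, x))       # step 4: r_m = f - A x_m
    return x, history
```

Because each cycle minimizes over $x_0 + K_m$, which contains $x_0$ itself, the recorded residual norms are nonincreasing, in line with the remark in §3.4.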
3.2. Practical implementation. We now describe a few important additional details concerning the practical implementation of GMRES. Consider the matrix $\bar H_k$, and let us suppose that we want to solve the least squares problem
$$\min_y \|\beta e_1 - \bar H_k y\|.$$
A classical way of solving such problems is to factor $\bar H_k$ into $Q_k R_k$ using plane rotations. This is quite simple to implement because of the special structure of $\bar H_k$. However, it is desirable to be able to update the factorization of $\bar H_k$ progressively as each column appears, i.e., at every step of the Arnoldi process. This is important because, as will be seen, it enables us to obtain the residual norm of the approximate


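solution without computing $x_k$, thus allowing us to decide when to stop the process without wasting needless operations. We now show in detail how such a factorization can be carried out. In what follows, we let $F_j$ represent the rotation matrix which rotates the unit vectors $e_j$ and $e_{j+1}$ by the angle $\theta_j$:
$$F_j = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & c_j & -s_j & & \\ & & s_j & c_j & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix},$$
in which the entries $c_j, -s_j$ lie in row $j$ and $s_j, c_j$ in row $j+1$, and where $c_j = \cos(\theta_j)$, $s_j = \sin(\theta_j)$. Assume that the rotations $F_i$, $i = 1, \ldots, j$, have been previously applied to $\bar H_j$ to produce the following upper triangular matrix of dimension $(j+1) \times j$:
$$\begin{pmatrix} \times & \times & \times & \times & \times \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \\ & & & & \times \\ & & & & 0 \end{pmatrix}.$$
The letter $\times$ stands for a nonzero element. At the next step the last column and row of $\bar H_{j+1}$ appear and are appended to the above matrix. In order to obtain $R_{j+1}$ we must start by premultiplying the new column by the previous rotations. Once this is done we obtain a $(j+2) \times (j+1)$ matrix of the form
$$\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ & \times & \times & \times & \times & \times \\ & & \times & \times & \times & \times \\ & & & \times & \times & \times \\ & & & & \times & \times \\ & & & & 0 & r \\ & & & & 0 & h \end{pmatrix}.$$
The principal upper $(j+1) \times j$ submatrix of the above matrix is nothing but $R_j$, and $h$ stands for $h_{j+2,j+1}$, which is not affected by the previous rotations. The next rotation will then consist in eliminating that element $h$ in position $(j+2, j+1)$. This is achieved by the rotation $F_{j+1}$ defined by
$$c_{j+1} = r/(r^2 + h^2)^{1/2}, \qquad s_{j+1} = -h/(r^2 + h^2)^{1/2}.$$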

Note that the successive rotations $F_j$ must also simultaneously be applied to the right-hand side $\beta e_1$. Thus, after $k$ steps of the above process, we have achieved the following decomposition of $\bar H_k$:
$$Q_k \bar H_k = R_k,$$
where $Q_k$ is $(k+1) \times (k+1)$ and is the accumulated product of the rotation matrices $F_i$, while $R_k$ is an upper triangular matrix of dimension $(k+1) \times k$ whose last row is zero. Since $Q_k$ is unitary, we have
$$J(y) = \|\beta e_1 - \bar H_k y\| = \|Q_k[\beta e_1 - \bar H_k y]\| = \|g_k - R_k y\|, \tag{9}$$
where $g_k = Q_k \beta e_1$ is the transformed right-hand side. Since the last row of $R_k$ is a zero row, the minimization of (9) is achieved by solving the upper triangular linear system which results from removing the last row of $R_k$ and the last component of $g_k$. This provides $y_k$, and the approximate solution $x_k$ is then formed by the linear combination (8).
We claimed earlier that it is possible to obtain the residual norm of the approximate solution $x_k$ while performing the above factorization, without explicitly computing $x_k$. Indeed, notice that from the definition of $J(y)$, the residual norm is nothing but $J(y_k)$, which, from (9), is in turn equal to $\|g_k - R_k y_k\|$. But by construction of $y_k$, this norm is the absolute value of the last component of $g_k$. We have proved the following.

PROPOSITION 1. The residual norm of the approximate solution $x_k$ is equal to the absolute value of the $(k+1)$st component of the right-hand side $g_k$ obtained by premultiplying $\beta e_1$ by the $k$ successive rotations transforming $\bar H_k$ into an upper triangular matrix.

Therefore, since $g_k$ is updated at each step, the residual norm is available at every step of the QR factorization at no extra cost. This is very useful in the practical implementation of the algorithm because it prevents us from taking unnecessary iterations while allowing us to avoid the extra computation needed to obtain $x_k$ explicitly.
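The progressive factorization and Proposition 1 can be illustrated without any Arnoldi process: the sketch below (plain Python; `hessenberg_lsq` is an illustrative name) takes the columns of $\bar H_k$ one at a time, applies the previous rotations and then one new rotation with $c_{j+1} = r/(r^2+h^2)^{1/2}$, $s_{j+1} = -h/(r^2+h^2)^{1/2}$, updates $g$, and records the absolute value of the last component of $g$ after each column. It assumes that each rotated column has a nonzero entry on or below the diagonal, which holds when the subdiagonal entries are nonzero.

```python
import math


def hessenberg_lsq(columns, beta):
    """Progressive QR factorization of Hbar_k by plane rotations (Section 3.2).

    columns[j] is column j+1 of Hbar (its j+2 entries h_{1,j+1}, ..., h_{j+2,j+1}).
    Returns (y, res): y minimizes ||beta e_1 - Hbar_k y||, and res[j] is the
    absolute value of the last component of g after column j+1 is processed.
    """
    rots, R, g, res = [], [], [beta], []
    for j, col in enumerate(columns):
        h = list(col)
        for i, (c, s) in enumerate(rots):        # previous rotations F_1, ..., F_j
            h[i], h[i + 1] = c * h[i] - s * h[i + 1], s * h[i] + c * h[i + 1]
        r_diag, h_sub = h[j], h[j + 1]           # the entries called r and h in the text
        rho = math.hypot(r_diag, h_sub)
        c, s = r_diag / rho, -h_sub / rho        # c_{j+1}, s_{j+1}
        rots.append((c, s))
        h[j], h[j + 1] = rho, 0.0
        g.append(s * g[j])                       # apply the new rotation to (g[j], 0)
        g[j] = c * g[j]
        R.append(h[: j + 1])                     # new column of the triangular factor
        res.append(abs(g[j + 1]))                # Proposition 1
    k = len(R)
    y = [0.0] * k
    for i in reversed(range(k)):                 # back substitution, last row dropped
        y[i] = (g[i] - sum(R[c][i] * y[c] for c in range(i + 1, k))) / R[i][i]
    return y, res
```

Only the new column is rotated at each step, so the residual norm history comes out as a by-product of the factorization, before any $y_k$ or $x_k$ is formed.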

Next we describe an efficient implementation of the last step of GMRES. If we can show that we can obtain the residual vector as a combination of the Arnoldi vectors $v_i$ and $Av_m$, then after step $m$ we do not need $v_{m+1}$. Note that computing $v_{m+1}$ and its norm costs $(2m+1)N$ multiplications, so eliminating its computation is a significant saving. Assume that the first $m-1$ Arnoldi steps have already been performed, i.e., that the first $m-1$ columns of $\bar H_m$ are available, as well as the first $m$ vectors $v_i$, $i = 1, \ldots, m$. Since we will not normalize $v_i$ at every step, we do not have explicitly the vectors $v_i$ but rather the vectors $w_i = \mu_i v_i$, where the $\mu_i$ are some known scaling coefficients. All we need in order to be able to compute $x_m$ is the matrix $\bar H_m$ and the vectors $v_1, \ldots, v_m$. Since the vectors $v_i$, $i = 1, \ldots, m$, are already known, we only need to compute the coefficients $h_{i,m}$, $i = 1, \ldots, m+1$. Noting that $h_{i,m} = (Av_m, v_i)$ for $i \le m$, we see that these first $m$ coefficients can be obtained as follows:
1. Compute $Av_m$.
2. Compute the $m$ inner products $(Av_m, v_i)$, $i = 1, \ldots, m$.
Clearly, the scaling coefficients $\mu_i$ must be used in the above computations, as the $v_i$, $i = 1, \ldots, m$, are only available as $w_i = \mu_i v_i$, where $\mu_i = \|w_i\|$. This determines the $m$th column of $\bar H_m$, except for the element $h_{m+1,m}$. We wish to compute this coefficient without having to compute $\hat v_{m+1}$. By definition and the orthogonality of the $v_i$'s,
$$h_{m+1,m}^2 = \Big\|Av_m - \sum_{i=1}^{m} h_{i,m} v_i\Big\|^2 = \|Av_m\|^2 - \sum_{i=1}^{m} h_{i,m}^2. \tag{10}$$
Hence the last coefficient can be obtained from the $h_{i,m}$'s, $i = 1, \ldots, m$, and the norm of $Av_m$.

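Now we will show how to compute the residual vector $r_m = f - Ax_m$ from the $v_i$'s, $i = 1, \ldots, m$, and $Av_m$. This computation is necessary only when restarting. From (6) the residual vector can be expressed as
$$r_m = V_{m+1}[\beta e_1 - \bar H_m y_m].$$
If we define $t \equiv [t_1, t_2, \ldots, t_{m+1}]^T \equiv \beta e_1 - \bar H_m y_m$, then
$$\begin{aligned} r_m &= \sum_{i=1}^{m} t_i v_i + t_{m+1} v_{m+1} \\ &= \sum_{i=1}^{m} t_i v_i + \frac{t_{m+1}}{h_{m+1,m}} \Big[Av_m - \sum_{i=1}^{m} h_{i,m} v_i\Big] \\ &= \frac{t_{m+1}}{h_{m+1,m}} Av_m + \sum_{i=1}^{m} \Big(t_i - \frac{t_{m+1} h_{i,m}}{h_{m+1,m}}\Big) v_i. \end{aligned} \tag{11}$$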

It is to be expected that for large $m$ the alternative expression (10) for $h_{m+1,m}$ would be inaccurate, as the orthogonality of the vectors $v_i$, on which it is based, is likely to be lost [11]. Moreover, in the restarted GMRES, the computation of $r_m$ by (11) may be more time consuming than the explicit use of $r_m = f - Ax_m$. Therefore, it is not recommended to use the above implementation when $m$ is large.
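A sketch of this last step follows (plain Python, illustrative names). Two simplifications are assumptions of the sketch: the basis vectors are stored normalized rather than as $w_i = \mu_i v_i$, and $y$ is any coefficient vector supplied by the caller, since (6), and hence (11), holds for every $y$ and not only for the minimizer. The last column of $\bar H_m$ is formed from the $m$ inner products $(Av_m, v_i)$ and (10); the explicit $\|\hat v_{m+1}\|$ is computed only to compare against (10).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def last_step_residual(A, f, x0, m, y):
    """Residual of x_m = x0 + V_m y via (10) and (11), without normalizing v_{m+1}.

    Returns (x_m, r_m, h_from_10, h_explicit). Assumes m >= 1 and no breakdown.
    """
    r0 = [a - b for a, b in zip(f, matvec(A, x0))]
    beta = math.sqrt(dot(r0, r0))
    V = [[x / beta for x in r0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m - 1):                         # ordinary Arnoldi steps 1..m-1
        w = matvec(A, V[j])
        for i in range(j + 1):
            H[i][j] = dot(w, V[i])
            w = [a - H[i][j] * b for a, b in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        V.append([x / H[j + 1][j] for x in w])
    Av = matvec(A, V[m - 1])                       # last step: A v_m and m inner products
    for i in range(m):
        H[i][m - 1] = dot(Av, V[i])
    H[m][m - 1] = math.sqrt(dot(Av, Av) - sum(H[i][m - 1] ** 2 for i in range(m)))  # (10)
    w = list(Av)                                   # explicit v-hat_{m+1}, comparison only
    for i in range(m):
        w = [a - H[i][m - 1] * b for a, b in zip(w, V[i])]
    h_explicit = math.sqrt(dot(w, w))
    t = [(beta if i == 0 else 0.0) - sum(H[i][c] * y[c] for c in range(m))
         for i in range(m + 1)]                    # t = beta e_1 - Hbar_m y
    coef = t[m] / H[m][m - 1]
    r = [coef * a for a in Av]                     # (11)
    for i in range(m):
        r = [a + (t[i] - coef * H[i][m - 1]) * b for a, b in zip(r, V[i])]
    x = [xn + sum(y[c] * V[c][n] for c in range(m)) for n, xn in enumerate(x0)]
    return x, r, H[m][m - 1], h_explicit
```

For small $m$ the two values of $h_{m+1,m}$ agree closely; the loss of orthogonality noted above is what would separate them for large $m$.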
3.3. Comparison with other methods. From the previous description of GMRES, it is not clear whether or not this algorithm is more effective than GCR or ORTHODIR. Let us examine the computational costs of these three methods. We will denote by $NZ$ the number of nonzero elements in $A$. We will evaluate the cost of computing the approximation $x_k$ by GMRES. There are several possible implementations, but we will refer to the one described in the previous section. If we neglect the cost of computing $y_k$, which is the solution of a least squares problem of size $k$, where $k$ is usually much less than $N$, the total cost of computing $x_k$ by GMRES can be divided into two parts.
1. The computation of the Arnoldi vectors $v_{j+1}$, for $j = 1, 2, \ldots, k$. The $j$th step in this loop requires $(2j+1)N + NZ$ multiplications, assuming that the vectors $v_j$ are not normalized but that their norms are only computed and saved. The last step requires only $(k+1)N$ multiplications instead of $(2k+1)N$, i.e., $kN$ fewer multiplications than the regular cost, as was shown in the previous section. Hence, the total number of multiplications for this part is approximately
$$k(k+2)N + kNZ - kN = k(k+1)N + kNZ.$$
2. The formation of the approximate solution $x_0 + V_k y_k$ in step 3 requires $kN$ multiplications.

The $k$ steps of GMRES therefore require $k(k+2)N + kNZ$ multiplications. Dividing by the total number of steps $k$, we see that each step requires $(k+2)N + NZ$ multiplications on average. In [5], it was shown that both GCR and ORTHODIR require on average $\frac{1}{2}(3k+5)N + NZ$ multiplications per step to produce the same approximation $x_k$. Therefore, with the above implementation, GMRES is always less expensive than either GCR or ORTHODIR. For large $k$ the savings will be nearly 1/3, since $(k+2)/\frac{1}{2}(3k+5) \to 2/3$. The above comparison concerns the nonrestarted GMRES algorithm. Note that the notation adopted in [5] for the restarted versions of GCR and ORTHODIR differs slightly from ours in that GCR($m$) has $m+1$ steps in each inner loop, while GMRES($m$) has only $m$ steps. Hence GMRES($m$) is mathematically equivalent to GCR($m-1$). When we restart GMRES, we will need the residual vector after the $m$ steps are completed. The residual vector can be obtained either explicitly as $f - Ax_m$ or, as was shown in §3.2, as a linear combination of $Av_m$ and the $v_i$'s, $i = 1, \ldots, m$. Assuming


the latter, we will perform $(m+1)N$ extra multiplications. This will increase the average cost per step by $(1 + 1/m)N$ to $(m + 3 + 1/m)N + NZ$. The corresponding cost per step of the restarted GCR and ORTHODIR is $\frac{1}{2}(3m+5)N + NZ$. Thus GMRES($m$) is more economical than GCR($m-1$) for $m > 1$. Note that the above operation count for GCR($m-1$) and ORTHODIR($m-1$) does not include the computation of the norm of the residual vector, which is required in the stopping criterion, while for GMRES we have shown earlier that this norm is available at every step at no extra cost. This remark shows that in fact the algorithms require the same number of operations when $m = 1$.

For GMRES($m$), it is clear that all we need to store is the $v_i$'s, the approximate solution, and a vector for $Av_i$, which means $(m+2)N$ storage locations. For large $m$, this is nearly half the $(2m+1)N$ storage required by both GCR and ORTHODIR. The comparison of costs is summarized in the following table, in which GCR($m-1$) and GMRES($m$) are the restarted versions of GCR and GMRES, using $m$ steps in each inner loop. Note that the operation count of ORTHODIR($m-1$) is identical with that of GCR($m-1$) [5].
TABLE
Method        Multiplications per step     Storage
GCR(m-1)      [(3m+5)/2]N + NZ             (2m+1)N
GMRES(m)      (m+3+1/m)N + NZ              (m+2)N
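The counts in the text and table are easy to tabulate. The sketch below (Python with `fractions.Fraction` for exact arithmetic; all names are illustrative) drops the common $NZ$ term and works in units of $N$. Charging GCR one extra inner product ($N$ multiplications) per step for the residual norm in its stopping test is an assumption about how that norm would be computed. With it, the two counts coincide at $m = 1$ and GMRES($m$) is cheaper for every $m > 1$; without it, they coincide at $m = 2$ (both $5.5N$ per step).

```python
from fractions import Fraction


def gmres_total(k):
    """Multiplications for k non-restarted GMRES steps, in units of N (NZ dropped)."""
    arnoldi = sum(2 * j + 1 for j in range(1, k + 1)) - k   # last step saves kN
    form_x = k                                              # x_0 + V_k y_k
    return arnoldi + form_x


def gmres_m_per_step(m):
    """GMRES(m): (m + 3 + 1/m) N per step, NZ term omitted."""
    return m + 3 + Fraction(1, m)


def gcr_m1_per_step(m, with_residual_norm=False):
    """GCR(m-1) or ORTHODIR(m-1): (3m + 5)/2 N per step, NZ term omitted.

    with_residual_norm adds N for one inner product per step (an assumption).
    """
    return Fraction(3 * m + 5, 2) + (1 if with_residual_norm else 0)


def storage(m):
    """Storage in units of N: (GMRES(m), GCR(m-1))."""
    return m + 2, 2 * m + 1


if __name__ == "__main__":
    for m in (1, 2, 3, 5, 10):
        print(m, float(gmres_m_per_step(m)), float(gcr_m1_per_step(m)),
              float(gcr_m1_per_step(m, True)), storage(m))
```

The `gmres_total` function reproduces the $k(k+2)N$ total stated above from its two parts.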

3.4. Theoretical aspects of GMRES. A question often raised in assessing iterative algorithms is whether they may break down. As we showed in the introduction, GCR can break down when $A$ is not positive real, i.e., when its symmetric part is not positive definite. In this section we will show that GMRES cannot break down, regardless of the positiveness of $A$. Initially, we assume that the first $m$ Arnoldi vectors can be constructed. This will be the case if $h_{j+1,j} \ne 0$, $j = 1, 2, \ldots, m$. In fact, if $h_{j+2,j+1} \ne 0$, the diagonal element $r_{j+1,j+1}$ of $R_{j+1}$ obtained from the above algorithm satisfies
$$r_{j+1,j+1} = c_{j+1} r - s_{j+1} h_{j+2,j+1} = (r^2 + h_{j+2,j+1}^2)^{1/2} > 0.$$

Hence, the diagonal elements of $R_m$ do not vanish and therefore the least squares problem (9) can always be solved, establishing that the algorithm cannot break down if $h_{j+1,j} \ne 0$, $j = 1, \ldots, m$. Thus the only possible potential difficulty is that during the Arnoldi process we encounter an element $h_{j+1,j}$ equal to zero. Assume that this actually happens at the $j$th step. Then, since $h_{j+1,j} = 0$, the vector $v_{j+1}$ cannot be constructed. However, from Arnoldi's algorithm it is easily seen that we have the relation $AV_j = V_j H_j$, which means that the subspace $K_j$ spanned by $V_j$ is invariant. Notice that if $A$ is nonsingular then $H_j$, whose spectrum is a part of the spectrum of $A$, is also nonsingular. The quadratic form (5) at the $j$th step becomes
$$J(y) = \|\beta v_1 - AV_j y\| = \|\beta v_1 - V_j H_j y\| = \|V_j[\beta e_1 - H_j y]\| = \|\beta e_1 - H_j y\|.$$
Since $H_j$ is nonsingular, the above function is minimum for $y = H_j^{-1}\beta e_1$ and the corresponding minimum norm is zero, i.e., the solution $x_j$ is exact.

To prove that the converse is also true, assume that $x_j$ is the exact solution and that $x_i$, $i = 1, 2, \ldots, j-1$, are not, i.e., $r_j = 0$ but $r_i \ne 0$ for $i = 0, 1, \ldots, j-1$. Then, by Proposition 1,
$$0 = \|r_j\| = |s_j \, e_j^T g_{j-1}|,$$
i.e., the residual norm is the previous residual norm times $|s_j|$. Since the previous residual norm is nonzero by assumption, we must have $s_j = 0$, which implies $h_{j+1,j} = 0$, i.e., the algorithm breaks down and $\hat v_{j+1} = 0$, which proves the result.

Moreover, it is possible to show that $\hat v_{j+1} = 0$ and $\hat v_i \ne 0$, $i = 1, 2, \ldots, j$, is equivalent to the property that the degree of the minimal polynomial of the initial residual vector $r_0 = \beta v_1$ is equal to $j$. Indeed, assume the degree of the minimal polynomial of $v_1$ is $j$. This means that there exists a polynomial $p_j$ of degree $j$ such that $p_j(A)v_1 = 0$, and $p_j$ is the polynomial of lowest degree for which this is true. Therefore $\operatorname{span}\{v_1, Av_1, \ldots, A^j v_1\}$ is equal to $K_j$. Hence the vector $\hat v_{j+1}$, which is a member of $K_{j+1} = K_j$ and is orthogonal to $K_j$, is necessarily a zero vector. Moreover, if $\hat v_i = 0$ for $i \le j$, then there exists a polynomial $p_i$ of degree $i$ such that $p_i(A)v_1 = 0$, which contradicts the minimality of $p_j$. To prove the converse, assume that $\hat v_{j+1} = 0$ and $\hat v_i \ne 0$, $i = 1, 2, \ldots, j$. Then there exists a polynomial $p_j$ of degree $j$ such that $p_j(A)v_1 = 0$. Moreover, $p_j$ is the polynomial of lowest degree for which this is true, since otherwise we would have $\hat v_{i+1} = 0$ for some $i < j$ by the first part of this proof, which is a contradiction.

PROPOSITION 2. The solution $x_j$ produced by GMRES at step $j$ is exact if and only if the following four equivalent conditions hold:
(1) The algorithm breaks down at step $j$.
(2) $\hat v_{j+1} = 0$.
(3) $h_{j+1,j} = 0$.
(4) The degree of the minimal polynomial of the initial residual vector $r_0$ is equal to $j$.

This uncommon type of breakdown is sometimes referred to as a "lucky" breakdown in the context of the Lanczos algorithm. Because the degree of the minimal polynomial of $v_1$ cannot exceed $N$ for an $N$-dimensional problem, an immediate corollary follows.

COROLLARY 3. For an $N \times N$ problem GMRES terminates in at most $N$ steps.

A consequence of Proposition 2 is that the restarted algorithm GMRES($m$) does not break down. GMRES($m$) would therefore constitute a very reliable algorithm if it always converged. Unfortunately this is not always the case, i.e., there are instances where the residual norms produced by the algorithm, although nonincreasing, do not converge to zero. In [5] it was shown that the GCR($m-1$) method converges under the condition that $A$ is positive real, and so the same result is true for GMRES($m$). It is easy to construct a counterexample showing that this result does not extend to indefinite problems, i.e., that the method may not converge if the symmetric part of $A$ is not positive definite. In fact it is possible to show that the restarted GMRES method may be stationary. Consider GMRES(1) for the problem $Ax = f$, where

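$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad f = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad x_0 = 0,$$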

which we considered in the introduction. The approximate solution $x_1$ minimizes the residual norm $\|f - Az\|$, where $z$ is a vector of the form $z = \alpha f$. It is easily seen that $x_1 = 0$: the optimal $\alpha$ is $(f, Af)/(Af, Af)$, and $(f, Af) = 0$ because $A$ is skew-symmetric. Therefore the algorithm will provide a stationary sequence. Note that this is independent of the problem of breakdown. In fact GMRES will produce the solution in two steps, but GMRES(1) never will. Since the residual norm is minimized at every step of the method, it is clear that it is nonincreasing. Intuitively, for $m$ large enough the residual norm will be reduced at each restart by a ratio small enough to ensure convergence. Thus we would expect GMRES($m$) to be convergent for sufficiently large $m$. However, note that ultimately


when $m = N$, the result is trivial, i.e., the method converges in one step. Thus, we will not attempt to show that the method GMRES($m$) converges for sufficiently large $m$. On the other hand, it is useful to show that if $A$ is nearly positive real, i.e., when it has a small number of eigenvalues in the left half plane, then $m$ need not be too large for convergence to take place. In order to analyse this convergence, we let $P_m$ be the space of all polynomials of degree $\le m$ and let $\sigma$ represent the spectrum of $A$. The following result was established in [5] for the GCR algorithm and is a simple consequence of the optimality property.

PROPOSITION 4. Suppose that $A$ is diagonalizable so that $A = XDX^{-1}$ and let
$$\varepsilon^{(m)} = \min_{p \in P_m,\; p(0) = 1} \; \max_{\lambda_i \in \sigma} |p(\lambda_i)|. \tag{12}$$
Then the residual norm provided at the $m$th step of GMRES satisfies
$$\|r_{m+1}\| \le \kappa(X) \, \varepsilon^{(m)} \|r_0\|,$$
where $\kappa(X) = \|X\| \, \|X^{-1}\|$.

When $A$ is positive real with symmetric part $M$, the following error bound can be derived from the proposition, see [5]:

$$\|r_m\| \le \left[1 - \frac{\alpha}{\beta}\right]^{m/2} \|r_0\|,$$
with $\alpha = (\lambda_{\min}(M))^2$, $\beta = \lambda_{\max}(A^T A)$. This proves the convergence of GMRES($m$) for all $m$ when $A$ is positive real [5]. When $A$ is not positive real the above result is no longer true, but we can establish the following explicit upper bound for $\varepsilon^{(m)}$.

THEOREM 5. Assume that there are $\nu$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_\nu$ of $A$ with nonpositive real parts, and let the other eigenvalues be enclosed in a circle centered at $C$, with $C > 0$, and having radius $R$, with $C > R$. Then

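$$\varepsilon^{(m)} \le \left[\frac{R}{C}\right]^{m-\nu} \max_{j=\nu+1,\ldots,N} \; \prod_{i=1}^{\nu} \frac{|\lambda_i - \lambda_j|}{|\lambda_i|} \le \left[\frac{D}{d}\right]^{\nu} \left[\frac{R}{C}\right]^{m-\nu}, \tag{13}$$
where
$$D = \max_{i=1,\ldots,\nu;\; j=\nu+1,\ldots,N} |\lambda_i - \lambda_j| \qquad \text{and} \qquad d = \min_{i=1,\ldots,\nu} |\lambda_i|.$$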

Proof. Consider the particular class of polynomials defined by $p(z) = r(z)q(z)$, where $r(z) = (1 - z/\lambda_1)(1 - z/\lambda_2) \cdots (1 - z/\lambda_\nu)$ and $q(z)$ is an arbitrary polynomial of degree $\le m - \nu$ such that $q(0) = 1$. Clearly, since $p(0) = 1$ and $p(\lambda_i) = 0$, $i = 1, \ldots, \nu$, we have
$$\varepsilon^{(m)} \le \max_{j=\nu+1,\ldots,N} |p(\lambda_j)| \le \max_{j=\nu+1,\ldots,N} |r(\lambda_j)| \; \max_{j=\nu+1,\ldots,N} |q(\lambda_j)|.$$
It is easily seen that
$$\max_{j=\nu+1,\ldots,N} |r(\lambda_j)| = \max_{j=\nu+1,\ldots,N} \; \prod_{i=1}^{\nu} \frac{|\lambda_i - \lambda_j|}{|\lambda_i|} \le \left(\frac{D}{d}\right)^{\nu}.$$
Moreover, by the maximum principle, the maximum of $|q(z)|$ for $z$ belonging to the set $\{\lambda_j\}_{j=\nu+1}^{N}$ is no larger than its maximum over the circle that encloses that set. Taking the polynomial $q(z) = [(C - z)/C]^{m-\nu}$, whose maximum modulus on the circle is $(R/C)^{m-\nu}$, yields the desired result. ∎

A similar result was shown by Chandra [3] for the symmetric indefinite case. Note that when the eigenvalues of $A$ are all real then the maximum of the product term in


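the second part of inequality (13) satisfies
$$\max_{j=\nu+1,\ldots,N} \; \prod_{i=1}^{\nu} \frac{|\lambda_i - \lambda_j|}{|\lambda_i|} = \prod_{i=1}^{\nu} \frac{|\lambda_i - \lambda_N|}{|\lambda_i|},$$
where $\lambda_N$ is the largest eigenvalue of $A$. A simple consequence of the above theorem is the following corollary.

COROLLARY 6. Under the assumptions of Proposition 4 and Theorem 5, GMRES($m$) converges for any initial vector $x_0$ if
$$m > \nu \, \frac{\operatorname{Log}\left[\dfrac{DC}{dR} \, \kappa(X)^{1/\nu}\right]}{\operatorname{Log}\left[\dfrac{C}{R}\right]}.$$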
A few comments are in order. First, note that in general the upper bound (13) is not likely to be sharp, and so convergence may take place for $m$ much smaller than would be predicted by the result. Second, observe that the minimal $m$ that ensures convergence is related only to the eigenvalue distribution and the condition number of $X$. In particular, it is independent of the problem size $N$. Third, it may very well happen that the minimal $m$ would be larger than $N$, in which case the information provided by the corollary would be trivial, since the method is exact for $m = N$.
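The bound (13), the polynomial used in its proof and the threshold of Corollary 6 can be evaluated directly. In the sketch below (plain Python, illustrative names) the spectrum is a made-up example: one eigenvalue $\lambda_1 = -1$ ($\nu = 1$) and the others $2, 3, 4$ inside the circle of center $C = 3$ and radius $R = 1$, with $\kappa(X) = 1$ assumed. Then $D = 5$, $d = 1$, both bounds in (13) equal $5(1/3)^{m-1}$, and Corollary 6 guarantees convergence for $m > \operatorname{Log} 15/\operatorname{Log} 3 \approx 2.46$, i.e., for $m \ge 3$.

```python
import math


def theorem5_bounds(left, right, C, R, m):
    """Both right-hand sides of (13).

    left:  the nu eigenvalues with nonpositive real part (complex values allowed);
    right: the remaining eigenvalues, assumed to lie in the disk |z - C| <= R, 0 < R < C.
    """
    nu = len(left)
    prod_max = max(math.prod(abs(li - lj) / abs(li) for li in left) for lj in right)
    D = max(abs(li - lj) for li in left for lj in right)
    d = min(abs(li) for li in left)
    ratio = (R / C) ** (m - nu)
    return ratio * prod_max, (D / d) ** nu * ratio


def proof_polynomial(z, left, C, m):
    """p(z) = r(z) q(z) from the proof, with q(z) = ((C - z)/C)^(m - nu)."""
    r = math.prod(1 - z / li for li in left)
    return r * ((C - z) / C) ** (m - len(left))


def corollary6_min_m(nu, D, d, C, R, kappa):
    """Smallest integer m with m > nu Log[D C kappa^(1/nu) / (d R)] / Log[C / R], nu >= 1."""
    bound = nu * math.log(D * C * kappa ** (1.0 / nu) / (d * R)) / math.log(C / R)
    return math.floor(bound) + 1
```

For this real spectrum the maximum of the product term is attained at the largest eigenvalue $\lambda_N = 4$, as noted before Corollary 6, and the proof polynomial $p(z) = (1 + z)((3 - z)/3)^{2}$ attains the bound $5/9$ there.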
4. Numerical experiments. In this section we repo a few numerical experiments comparing the performances of GMRES with other conjugate gradient-like methods. The tests were performed on a VAX-11/780 using double precision corresponding to a unit round off of nearly 6.93 x 10 -18. The GMRES (k) algorithm used in the following tests computes explicitly the last vector Vk+ of each outer iteration, i.e. it does not implement the modification described at the end of 3.2. The test problem was derived from the five point discretization of the following paaial differential equation which was described in H. Elmans thesis [5]:

$$-(b u_x)_x - (c u_y)_y + d u_x + (d u)_x + e u_y + (e u)_y + f u = g$$


on the unit square, where

$$c(x,y) = e^{xy}, \quad d(x,y) = \beta(x+y), \quad b(x,y) = e^{-xy}, \quad e(x,y) = \gamma(x+y) \quad \text{and} \quad f(x,y) = 1/(1+x+y),$$


subject to the Dirichlet boundary condition $u = 0$ on the boundary. The right-hand side $g$ was chosen so that the solution was known to be $u = x e^{xy} \sin(\pi x) \sin(\pi y)$. The parameters $\beta$ and $\gamma$ are useful for changing the degree of symmetry of the resulting linear systems. Note that the matrix $A$ resulting from the discretization remains positive real independently of these parameters. We denote by $n$ the number of interior nodes on each side of the square and by $h = 1/(n+1)$ the mesh size.

In the first example we took $n = 48$, $\gamma = 50$ and $\beta = 1$. This yielded a matrix of dimension $N = 2304$. The system was preconditioned by MILU preconditioning applied on the right, i.e., we solved $AM^{-1}(Mx) = f$, where $M$ was an approximation to $A$ provided by an approximate LU factorization of $A$ (see [5]). The process was stopped as soon as the residual norm was reduced by a factor of $\varepsilon = 10^{-6}$. Figure 4.1 compares the results obtained for GCR (k), GMRES (k), and ORTHOMIN (k) for some representative values of k. It shows that ORTHOMIN (k) did not converge for k = 1 and k = 5 on this example. In fact, we observed the same nonconverging behaviour for all values of k between 1 and 5. Another interesting observation is that GMRES (5) performed almost as well as GCR (1). Note that the value k = 5 yielded the best result among all reasonable choices of k, and similarly GCR (1) corresponded to the best possible performance for GCR (k).

[Figure 4.1: residual norm (from $10^{0}$ down to $10^{-6}$) versus number of multiplications for GCR (1), GMRES (5), ORTHOMIN (1), ORTHOMIN (5) and ORTHOMIN (10). FIG. 4.1. n = 48, MILU preconditioning, γ = 50., β = 1.]

[Figure 4.2: residual norm (from $10^{1}$ down to $10^{-6}$) versus number of multiplications for GMRES (5), GCR (19), GMRES (20) and ORTHOMIN (10). FIG. 4.2. n = 18, MILU preconditioning, γ = 50., β = -20.]
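The model problem above can be turned into a matrix along the following lines. This is a sketch under stated assumptions: the paper does not spell out the stencil, so here the diffusion coefficients $b$, $c$ are taken at half-way points and each convective pair $d u_x + (d u)_x$ is discretized by centered differences, giving the weight $\pm(d_P + d_Q)/(2h)$ for each neighbour $Q$ of node $P$. The resulting matrix need not match the one used in the experiments entry for entry. The function name `elman_test_matrix` is hypothetical, and a dense array is used only to keep the example short.

```python
import numpy as np


def elman_test_matrix(n, beta, gamma):
    """Dense five-point matrix for
    -(b u_x)_x - (c u_y)_y + d u_x + (d u)_x + e u_y + (e u)_y + f u
    on the unit square with u = 0 on the boundary (centered differences).
    """
    h = 1.0 / (n + 1)

    def b(x, y): return np.exp(-x * y)
    def c(x, y): return np.exp(x * y)
    def d(x, y): return beta * (x + y)
    def e(x, y): return gamma * (x + y)
    def f(x, y): return 1.0 / (1.0 + x + y)

    def node(i, j):  # interior nodes i, j = 1..n, numbered row by row
        return (j - 1) * n + (i - 1)

    A = np.zeros((n * n, n * n))
    for j in range(1, n + 1):
        for i in range(1, n + 1):
            x, y = i * h, j * h
            p = node(i, j)
            # Diffusion coefficients at the four half-way points.
            west, east = b((i - 0.5) * h, y), b((i + 0.5) * h, y)
            south, north = c(x, (j - 0.5) * h), c(x, (j + 0.5) * h)
            A[p, p] = (west + east + south + north) / h**2 + f(x, y)
            for di, dj, diff, conv in ((-1, 0, west, d), (1, 0, east, d),
                                       (0, -1, south, e), (0, 1, north, e)):
                ii, jj = i + di, j + dj
                if not (1 <= ii <= n and 1 <= jj <= n):
                    continue  # boundary neighbour: u = 0 there, no matrix entry
                sign = di + dj  # +1 for east/north, -1 for west/south
                # convective weight sign * (d_P + d_Q) / (2h): exactly skew-symmetric
                A[p, node(ii, jj)] = (-diff / h**2
                                      + sign * (conv(x, y) + conv(ii * h, jj * h)) / (2 * h))
    return A
```

With this choice the convective part is skew-symmetric, so the symmetric part of $A$ is the diffusion-plus-$f$ operator, which is one way to see why $A$ stays positive real for any $\beta$ and $\gamma$. Calling `elman_test_matrix(48, 1.0, 50.0)` gives the $N = 2304$ size of the first example; a sparse format would be used in practice.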


It is worth pointing out that for moderate accuracy ($\varepsilon \ge 10^{-2}$), GMRES (5) was slightly better than GCR (1). Finally, we should indicate that ORTHOMIN performed so badly in this example because the preconditioned system is not positive real. The MILU preconditioning seems to be more prone to such peculiarities than the simpler ILU preconditioning; in fact, for this example ORTHOMIN (1) performed very well when the ILU preconditioning was used.

In the next test we took $n = 18$, which yielded a matrix of smaller dimension $N = 324$, and $\gamma = 50$, $\beta = -20$. The main purpose of this experiment was to show that there are instances where using a large parameter m is important. Here again we used the MILU preconditioning, and the stopping tolerance was $\varepsilon = 10^{-6}$. This example was more difficult to treat. ORTHOMIN (k) diverged for all values of k between 1 and 10. GCR (1), GCR (2) and GCR (3) also diverged, as did their equivalent versions GMRES (k), k = 2, 3, 4. GMRES (k) started to converge with k = 5 and improved substantially as k increased; the best performance was realized for larger values of k. Figure 4.2 shows the results obtained for GMRES (5), GMRES (20) and ORTHOMIN (10). To appreciate the gains made by GMRES (20) over its equivalent version GCR (19), we also plotted the results for GCR (19). Note that we saved not only nearly 25% in the number of multiplications but also almost half the storage, which was quite important here since we needed to keep 22 vectors in memory versus 39 for GCR (19).
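For readers who want to experiment with the restart parameter, here is a minimal sketch of restarted GMRES (m) in the right-preconditioned form used above: it works on $AM^{-1}(Mx) = f$ and stops once the residual norm has been reduced by a factor $\varepsilon$. Assumptions: dense NumPy arrays; `M_solve` is a hypothetical callback returning $M^{-1}v$ (MILU itself is not reproduced); the small least-squares problem is solved with `numpy.linalg.lstsq` rather than the Givens rotations GMRES normally uses; and the stopping test is applied only at the start of each cycle, with the residual recomputed explicitly.

```python
import numpy as np


def gmres_right(A, f, x0, m, M_solve, eps=1e-6, max_restarts=500):
    """Restarted GMRES(m) on A M^{-1} (M x) = f (right preconditioning).

    M_solve(v) must return M^{-1} v (hypothetical stand-in for an MILU solve).
    Stops once ||f - A x|| <= eps * ||f - A x0||, checked at each restart.
    """
    x = np.array(x0, dtype=float)
    r = f - A @ x
    target = eps * np.linalg.norm(r)
    N = len(f)
    for _ in range(max_restarts):
        beta = np.linalg.norm(r)
        if beta <= target:
            break
        V = np.zeros((N, m + 1))   # Arnoldi basis v_1, ..., v_{m+1}
        H = np.zeros((m + 1, m))   # upper Hessenberg matrix
        V[:, 0] = r / beta
        k = m
        for j in range(m):
            w = A @ M_solve(V[:, j])
            for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
                H[i, j] = V[:, i] @ w
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] <= 1e-14 * np.linalg.norm(H[: j + 2, j]):
                k = j + 1  # (near) breakdown: the Krylov subspace is invariant
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Minimize ||beta e_1 - H_k y|| over y.
        rhs = np.zeros(k + 1)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[: k + 1, :k], rhs, rcond=None)
        x = x + M_solve(V[:, :k] @ y)   # x = x_0 + M^{-1} V_k y
        r = f - A @ x                   # explicit residual for the next cycle
    return x
```

The array `V` holds $m + 1$ basis vectors of length $N$, which is where the storage of GMRES (m) grows with m. A usage example with a simple diagonal (Jacobi) scaling standing in for MILU: `gmres_right(A, f, np.zeros(len(f)), 10, lambda v: v / np.diag(A))`.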
REFERENCES

[1] W. E. ARNOLDI, The principle of minimized iteration in the solution of the matrix eigenvalue problem, Quart. Appl. Math., 9 (1951), pp. 17-29.
[2] O. AXELSSON, Conjugate gradient type methods for unsymmetric and inconsistent systems of linear equations, Lin. Alg. Appl., 29 (1980), pp. 1-16.
[3] R. CHANDRA, Conjugate gradient methods for partial differential equations, Ph.D. thesis, Computer Science Dept., Yale Univ., New Haven, CT, 1978.
[4] S. C. EISENSTAT, H. C. ELMAN AND M. H. SCHULTZ, Variational iterative methods for nonsymmetric systems of linear equations, SIAM J. Numer. Anal., 20 (1983), pp. 345-357.
[5] H. C. ELMAN, Iterative methods for large sparse nonsymmetric systems of linear equations, Ph.D. thesis, Computer Science Dept., Yale Univ., New Haven, CT, 1982.
[6] H. C. ELMAN, Y. SAAD AND P. SAYLOR, A hybrid Chebyshev Krylov subspace algorithm for solving nonsymmetric systems of linear equations, Technical Report YALU/DCS/TR-301, Yale Univ., New Haven, CT, 1984.
[7] W. C. GEAR AND Y. SAAD, Iterative solution of linear equations in ODE codes, this Journal, 4 (1983), pp. 583-601.
[8] A. L. HAGEMAN AND D. M. YOUNG, Applied Iterative Methods, Academic Press, New York, 1981.
[9] K. C. JEA AND D. M. YOUNG, Generalized conjugate gradient acceleration of nonsymmetrizable iterative methods, Lin. Alg. Appl., 34 (1980), pp. 159-194.
[10] C. C. PAIGE AND M. A. SAUNDERS, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975), pp. 617-624.
[11] B. N. PARLETT, A new look at the Lanczos algorithm for solving symmetric systems of linear equations, Lin. Alg. Appl., 29 (1980), pp. 323-346.
[12] Y. SAAD, Variations on Arnoldi's method for computing eigenelements of large unsymmetric matrices, Lin. Alg. Appl., 34 (1980), pp. 269-295.
[13] Y. SAAD, Krylov subspace methods for solving large unsymmetric linear systems, Math. Comput., 37 (1981), pp. 105-126.
[14] Y. SAAD, Practical use of some Krylov subspace methods for solving indefinite and unsymmetric linear systems, this Journal, 5 (1984), pp. 203-228.
[15] G. W. STEWART, Introduction to Matrix Computations, Academic Press, New York, 1973.
[16] P. K. W. VINSOME, ORTHOMIN, an iterative method for solving sparse sets of simultaneous linear equations, in Proc. Fourth Symposium on Reservoir Simulation, Society of Petroleum Engineers of AIME, 1976, pp. 149-159.
