Chapter 3

Systems of Linear Equations - Iterative Approach
Many linear systems arising in real-world applications are large and sparse.
Because of the large number of equations and unknowns, storage becomes a
serious concern. When it is possible, Gaussian elimination remains a very
economical, accurate, and useful algorithm. Elimination is possible as long as
there is space to store all the nonzero elements of the triangular matrices
associated with the elimination and as long as the coding necessary to locate
these elements can be programmed. Techniques along this line can be found, for
example, in SPARSPAK by George, A., and Liu, J.W.H., "Computer Solution of
Large Sparse Positive Definite Systems".
There are cases where the order n is so large that it is impossible to store
the fill-in resulting from the Gaussian elimination. It is, therefore, desirable
to solve such linear systems Ax = b by methods that never alter the matrix A and
never require storing more than a few vectors of length n. Iterative methods
are especially suitable for this purpose.
In an iterative method, beginning with an initial vector x^{(0)}, we generate a
sequence of vectors x^{(1)} → x^{(2)} → · · · according to the iteration scheme.
We hope that as k → ∞, x^{(k)} will converge to the exact solution. The
computational effort in each individual step x^{(i)} → x^{(i+1)}, generally, is
comparable to the multiplication of A with a vector. This is a very modest
amount when A is sparse.
An iterative method may be motivated by the following consideration.
Given a linear system
Ax = b (3.1)
and an approximate solution x̃, the residual corresponding to x̃ is defined by
r := b − Ax̃. (3.2)
For the Jacobi method, the splitting is

S = D;   T = L + U;   (3.9)

D x^{new} = (L + U) x^{old} + b;   (3.10)

x_i^{new} = (1/a_{ii}) ( b_i − Σ_{j=1}^{i−1} a_{ij} x_j^{old} − Σ_{j=i+1}^{n} a_{ij} x_j^{old} ).   (3.11)
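For concreteness, the following is a minimal sketch of the Jacobi sweep (3.10)-(3.11) in Python with NumPy; the function name, tolerance, and iteration cap are illustrative choices, not part of the text.

    import numpy as np

    def jacobi(A, b, x0, tol=1e-10, max_iter=500):
        """Jacobi iteration D x_new = (L + U) x_old + b, cf. (3.10)-(3.11)."""
        D = np.diag(A)                 # diagonal entries a_ii
        R = A - np.diagflat(D)         # off-diagonal part of A, i.e. -(L + U)
        x = np.array(x0, dtype=float)
        for _ in range(max_iter):
            x_new = (b - R @ x) / D    # componentwise form (3.11)
            if np.linalg.norm(x_new - x, np.inf) < tol:
                return x_new
            x = x_new
        return x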
For the Gauss-Seidel method,

S = D − L;   T = U;   (3.12)

(D − L) x^{new} = U x^{old} + b;   (3.13)

x_i^{new} = (1/a_{ii}) ( b_i − Σ_{j=1}^{i−1} a_{ij} x_j^{new} − Σ_{j=i+1}^{n} a_{ij} x_j^{old} ).   (3.14)
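A corresponding sketch of the Gauss-Seidel sweep (3.14); as (3.14) indicates, each new component is used as soon as it has been computed (names and stopping rule are again illustrative).

    import numpy as np

    def gauss_seidel(A, b, x0, tol=1e-10, max_iter=500):
        """Gauss-Seidel sweep implementing (3.14)."""
        n = len(b)
        x = np.array(x0, dtype=float)
        for _ in range(max_iter):
            x_old = x.copy()
            for i in range(n):
                s1 = A[i, :i] @ x[:i]          # already-updated components
                s2 = A[i, i+1:] @ x_old[i+1:]  # old components
                x[i] = (b[i] - s1 - s2) / A[i, i]
            if np.linalg.norm(x - x_old, np.inf) < tol:
                break
        return x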
For the SOR method, with ω = 1/σ,

S = σD − L;   T = (σ − 1)D + U;   (3.15)

(D − ωL) x^{new} = (1 − ω) D x^{old} + ω U x^{old} + ω b;   (3.16)

x̂_i^{new} = (1/a_{ii}) ( b_i − Σ_{j=1}^{i−1} a_{ij} x_j^{new} − Σ_{j=i+1}^{n} a_{ij} x_j^{old} );   (3.17)

x_i^{new} = (1 − ω) x_i^{old} + ω x̂_i^{new}.   (3.18)
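A sketch of the SOR sweep (3.17)-(3.18), which blends the Gauss-Seidel value x̂_i^{new} with the old value using the parameter ω; the interface mirrors the previous sketches and is illustrative.

    import numpy as np

    def sor(A, b, x0, omega, tol=1e-10, max_iter=500):
        """SOR sweep implementing (3.17)-(3.18)."""
        n = len(b)
        x = np.array(x0, dtype=float)
        for _ in range(max_iter):
            x_old = x.copy()
            for i in range(n):
                s1 = A[i, :i] @ x[:i]
                s2 = A[i, i+1:] @ x_old[i+1:]
                x_hat = (b[i] - s1 - s2) / A[i, i]             # (3.17)
                x[i] = (1 - omega) * x_old[i] + omega * x_hat  # (3.18)
            if np.linalg.norm(x - x_old, np.inf) < tol:
                break
        return x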
(pf): The first inequality is true for all induced norms. We prove the second
inequality by construction. Given H, there exists a nonsingular matrix P such
that
P^{−1} H P = Λ + U
The following statements are equivalent:
(1) lim_{n→∞} H^n = 0;
(2) lim_{n→∞} ||H^n|| = 0 for some norm;
(3) ρ(H) < 1.
Remark. By the norm equivalence theorem and the fact that lim_{n→∞} c^{1/n} = 1
for any nonzero constant c, it follows that β (and hence, α) is norm independent.
Theorem 3.1.2 Suppose ρ(H) < 1. Then the iterative scheme has asymptotic
convergence factor α = ρ(H).
(pf): Since ρ(H) < 1, we may choose a norm ||·|| such that ||H|| ≤ ρ(H) + ε < 1.
We have already seen that ||x^{(k)} − x*|| ≤ ||H||^k ||x^{(0)} − x*||. It follows
that β ≤ ρ(H) + ε. Since ε is arbitrary, it follows that α ≤ ρ(H). To show
equality, we construct a sequence {x^{(k)}} such that the equality holds.
Toward this, we consider two cases:

(i) Suppose λ is a real eigenvalue of H such that |λ| = ρ(H). Let u be
the associated real unit eigenvector of λ. We choose x^{(0)} := x* + u. Then

x^{(k)} − x* = H^k u = λ^k u.

For this sequence β = |λ| = ρ(H).

(ii) Suppose ρ(H) corresponds to a pair of complex conjugate eigenvalues λ
and λ̄. Let u and ū be the corresponding eigenvectors. We may select a basis
{u_i} for C^n such that u_1 = u and u_2 = ū. Any vector y ∈ C^n may be expressed
as y = Σ_{i=1}^{n} c_i u_i. We may take ||y|| := Σ |c_i| as a norm for y. Now we
choose x^{(0)} := x* + ½(u + ū). Then x^{(k)} − x* = H^k ½(u + ū) = ½(λ^k u + λ̄^k ū).
Using the norm just defined, we have ||x^{(k)} − x*|| = ½(|λ|^k + |λ̄|^k) = ρ(H)^k.
It follows that β = ρ(H). ⊕
Recall that the splitting of the matrix A

A = S − T

induces the iterative scheme

S x^{new} = T x^{old} + b

for the system Ax = b. Thus it is imperative to find conditions such that
ρ(S^{−1}T) < 1. We discuss below several sufficient conditions that have
been established in the literature (cf. R. S. Varga, Matrix Iterative Analysis).
Theorem 3.1.5 Both the Jacobi method and the Gauss-Seidel method converge
if A is strictly diagonally dominant.
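To see Theorem 3.1.5 and the condition ρ(S^{−1}T) < 1 at work numerically, one can compute the spectral radii of the Jacobi and Gauss-Seidel iteration matrices for a small strictly diagonally dominant example; the test matrix below is an arbitrary illustration.

    import numpy as np

    A = np.array([[4.0, 1.0, 1.0],
                  [1.0, 5.0, 2.0],
                  [0.0, 1.0, 3.0]])        # strictly diagonally dominant

    D = np.diagflat(np.diag(A))
    L = -np.tril(A, -1)                    # A = D - L - U
    U = -np.triu(A, 1)

    spectral_radius = lambda M: max(abs(np.linalg.eigvals(M)))

    H_jacobi = np.linalg.solve(D, L + U)   # S^{-1} T with S = D
    H_gs     = np.linalg.solve(D - L, U)   # S^{-1} T with S = D - L

    print(spectral_radius(H_jacobi))       # < 1, so Jacobi converges
    print(spectral_radius(H_gs))           # < 1, so Gauss-Seidel converges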
Since G and G1 are similar, G and G1 have the same eigenvalues. Suppose
G1 x = λx with x*x = 1 (note that x may be in C^n). Then L1^T x = λ(I − L1)x.
It follows that x* L1^T x = λ(1 − x* L1 x). Let x* L1 x = a + ib, so that
x* L1^T x = a − ib and a − ib = λ(1 − a − ib). Then

|λ|^2 = |a − ib|^2 / |1 − a − ib|^2 = (a^2 + b^2) / (1 − 2a + a^2 + b^2).

Note that D^{−1/2} A D^{−1/2} = I − L1 − L1^T is still positive definite, so
1 − 2a = x*(I − L1 − L1^T)x > 0; hence |λ|^2 < 1.
Let

S := ω^{−1}D − L and T := (ω^{−1} − 1)D + U,

and define

Q := A − (S^{−1}T)^T A (S^{−1}T).

We claim that both S + T and Q are positive definite. Suppose these claims
are true. Let λ be any eigenvalue of H = S^{−1}T, and y the corresponding
eigenvector. Then 0 < y*Qy = y*Ay − (λy)*A(λy) = (1 − |λ|^2) y*Ay. It follows
that |λ| < 1, and hence ρ(H) < 1.

Now we prove the claims. Recall that any given matrix M can be written
as M = ½(M + M^T) + ½(M − M^T) := M_s + M_k, where M_s is symmetric and M_k is
skew-symmetric. Note also that x^T M x = x^T M_s x. So it suffices to check the
symmetric part of S + T for positive definiteness. Now

(S + T)_s = ½{S + S^T + T + T^T} = ½{(ω^{−1}D − L) + (ω^{−1}D − U) + ((ω^{−1} − 1)D + U) + ((ω^{−1} − 1)D + L)} = ω^{−1}(2 − ω) D,

which obviously is positive definite (since 0 < ω < 2 and the diagonal of A is
positive). To check the matrix Q for positive definiteness, we first observe that

Q = A − H^T A H = A − (I − S^{−1}A)^T A (I − S^{−1}A)
  = A − {A − (S^{−1}A)^T A − A(S^{−1}A) + (S^{−1}A)^T A (S^{−1}A)}
  = (S^{−1}A)^T {S + S^T − A} (S^{−1}A) = (S^{−1}A)^T {S^T + T} (S^{−1}A).

Now x^T Q x = x^T (S^{−1}A)^T {S^T + T} (S^{−1}A) x = y^T (S^T + T) y = y^T (S + T) y > 0
for all x ≠ 0, where y := (S^{−1}A)x. So Q is positive definite. ⊕
Remark. It is often possible to choose the parameter ω so that the SOR
method converges rapidly; much more rapidly than the Jacobi method or the
Gauss-Seidel method. Normally, such an optimum value of ω can be prescribed
if the coefficient matrix A, relative to the partitioning imposed, has the
so-called property A and is consistently ordered (cf. Hageman and Young,
Applied Iterative Methods, Chapter 9). In practice, the estimate of the SOR
parameter ω is obtained by an adaptive procedure.
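For consistently ordered matrices with property A, the classical relation (due to Young; cf. Hageman and Young) between the optimal ω and the spectral radius μ of the Jacobi iteration matrix is ω_opt = 2/(1 + √(1 − μ^2)). The sketch below assumes μ < 1 and that the eigenvalues of the Jacobi matrix are affordable to compute; for large problems μ itself must be estimated adaptively, as remarked above.

    import numpy as np

    def sor_optimal_omega(A):
        """Estimate omega_opt = 2 / (1 + sqrt(1 - mu^2)), where mu is the
        spectral radius of the Jacobi iteration matrix; valid for consistently
        ordered matrices with property A, and assumes mu < 1."""
        D = np.diagflat(np.diag(A))
        H_jacobi = np.linalg.solve(D, D - A)        # I - D^{-1} A
        mu = max(abs(np.linalg.eigvals(H_jacobi)))
        return 2.0 / (1.0 + np.sqrt(1.0 - mu**2))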
In a polynomial acceleration method, the iterates x^{(i)} of the basic scheme
are combined into new approximations

u^{(k)} := Σ_{i=0}^{k} α_{k,i} x^{(i)},   k = 0, 1, . . .   (3.23)

where the real numbers α_{k,i} are required to satisfy the consistency condition

Σ_{i=0}^{k} α_{k,i} = 1,   k = 0, 1, . . .   (3.24)
Since x^{(i)} − x* = H^i (x^{(0)} − x*), it follows from (3.24) that
u^{(k)} − x* = Q_k(H)(x^{(0)} − x*), where

Q_k(H) := α_{k,0} I + α_{k,1} H + · · · + α_{k,k} H^k
is a matrix polynomial. The idea is to choose the polynomials {Q_k} so that
{u^{(k)}} converges to x* faster than {x^{(k)}}. Generally speaking, using (3.23)
directly to obtain u^{(k)} requires a high arithmetic cost and a large amount of
storage. Alternatively, we usually consider only the important family of
polynomials satisfying the recurrence relation:
Q_0(x) = 1,
Q_1(x) = γ_1 x − γ_1 + 1,   (3.25)
Q_{k+1}(x) = ρ_{k+1}(γ_{k+1} x + 1 − γ_{k+1}) Q_k(x) + (1 − ρ_{k+1}) Q_{k−1}(x),   for k ≥ 1.
Theorem 3.3.1 If the polynomial sequence {Q_k} in (3.25) is used, then the
iterates {u^{(k)}} of (3.23) may be obtained using the three-term relation

u^{(k+1)} = ρ_{k+1} { γ_{k+1} (H u^{(k)} + c) + (1 − γ_{k+1}) u^{(k)} } + (1 − ρ_{k+1}) u^{(k−1)},   k ≥ 1,

with u^{(0)} = x^{(0)} and u^{(1)} = γ_1 (H u^{(0)} + c) + (1 − γ_1) u^{(0)}, where
x^{new} = H x^{old} + c (H = S^{−1}T, c = S^{−1}b) denotes the basic scheme.
So the polynomial sequence {Q_k} may be chosen to minimize ||Q_k(H) ε^{(0)}||,
where ε^{(0)} := x^{(0)} − x* (cf. Hageman and Young, Chapter 7).
(2) (Chebyshev Acceleration) This is motivated by the fact that
u^{(k)} − x* = Q_k(H)(x^{(0)} − x*), so that ||u^{(k)} − x*|| ≤ ||Q_k(H)|| ||x^{(0)} − x*||.
Let M(H) and m(H) denote, respectively, the algebraically largest and smallest
eigenvalues of H. So the polynomial {Q_k} is chosen such that the virtual spectral
radius of Q_k(H), defined by

S̄(Q_k(H)) := max_{m(H) ≤ x ≤ M(H)} |Q_k(x)|,

is as small as possible; the minimizing polynomials are suitably normalized
Chebyshev polynomials (cf. Hageman and Young).
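Under the assumption that the basic scheme has been written as x^{new} = H x^{old} + c with H = S^{−1}T and c = S^{−1}b, the three-term relation of Theorem 3.3.1 can be carried out while storing only two previous acceleration vectors. The sketch below is generic in the parameter sequences ρ_k, γ_k (how they are chosen, e.g. by Chebyshev acceleration, is a separate matter); apply_basic, the argument names, and the indexing convention are illustrative.

    import numpy as np

    def accelerate(apply_basic, u0, rho, gamma, num_steps):
        """Polynomial acceleration via the three-term relation of Theorem 3.3.1.
        apply_basic(u) returns H u + c, one step of the basic scheme.
        gamma[k-1] plays the role of gamma_k and rho[k-1] of rho_k (rho[0] unused)."""
        u_prev = np.array(u0, dtype=float)
        u = gamma[0] * apply_basic(u_prev) + (1.0 - gamma[0]) * u_prev   # u^(1)
        for k in range(1, num_steps):
            u_next = (rho[k] * (gamma[k] * apply_basic(u) + (1.0 - gamma[k]) * u)
                      + (1.0 - rho[k]) * u_prev)
            u_prev, u = u, u_next
        return u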
∇F(u) = Au − b.   (3.33)

The direction of the vector ∇F(x) is the direction in which the functional F(x)
at the point x changes most rapidly. Suppose x^{(k)} is an approximation to x;
then in the direction of steepest descent r_k := −∇F(x^{(k)}) = b − Ax^{(k)} we
should obtain an improved approximation

x^{(k+1)} := x^{(k)} + α_k r_k

if α_k is chosen to minimize F(x^{(k)} + α r_k). Using (3.31), we can easily
calculate the number α_k. Thus we have derived
Algorithm 3.4.1. (The Steepest Descent Method)
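A minimal sketch of the steepest-descent iteration just described: the residual r_k = b − Ax^{(k)} serves as the search direction, and the exact line-search value α_k = r_k^T r_k / (r_k^T A r_k) comes from minimizing F(x^{(k)} + α r_k) for the quadratic functional. The tolerance and iteration cap are illustrative.

    import numpy as np

    def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
        """Steepest descent for Ax = b with A symmetric positive definite."""
        x = np.array(x0, dtype=float)
        r = b - A @ x
        for _ in range(max_iter):
            Ar = A @ r
            alpha = (r @ r) / (r @ Ar)   # minimizes F(x + alpha r)
            x = x + alpha * r
            r = b - A @ x                # new residual = steepest-descent direction
            if np.linalg.norm(r) < tol:
                break
        return x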
Definition 3.4.1 Given a symmetric and positive definite matrix A, two vectors
d_1, d_2 are said to be A-conjugate if and only if d_1^T A d_2 = 0. A finite set
of vectors d_0, . . . , d_k is called an A-conjugate set if d_i^T A d_j = 0 for
all i ≠ j.
5. β_k = + (r_{k+1}^T r_{k+1}) / (r_k^T r_k).   (3.47)
Note also that r_{k+1} = r_k − α_k A d_k, so that A d_k = (r_k − r_{k+1})/α_k,
and that d_k^T A d_k = r_k^T r_k / α_k.

(e) By definition,

β_k := − (r_{k+1}^T A d_k) / (d_k^T A d_k)
     = − (r_{k+1}^T A d_k) / (r_k^T r_k / α_k)
     = − (r_{k+1}^T (r_k − r_{k+1}) / α_k) / (r_k^T r_k / α_k)
     = + (r_{k+1}^T r_{k+1}) / (r_k^T r_k),

because, by (3.45), r_k = d_k − β_{k−1} d_{k−1}, so that
r_{k+1}^T r_k = r_{k+1}^T d_k − β_{k−1} r_{k+1}^T d_{k−1} = 0.
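Assembling the recurrences (the step length α_k = r_k^T r_k / (d_k^T A d_k), the residual update r_{k+1} = r_k − α_k A d_k, the β_k of (3.47), and the new direction d_{k+1} = r_{k+1} + β_k d_k from (3.45)) gives the conjugate gradient loop. A minimal sketch for symmetric positive definite A; the stopping rule is illustrative.

    import numpy as np

    def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
        """Conjugate gradient sketch using the formulas summarized above."""
        n = len(b)
        if max_iter is None:
            max_iter = n
        x = np.array(x0, dtype=float)
        r = b - A @ x
        d = r.copy()
        for _ in range(max_iter):
            Ad = A @ d
            alpha = (r @ r) / (d @ Ad)        # step length alpha_k
            x = x + alpha * d
            r_new = r - alpha * Ad            # r_{k+1} = r_k - alpha_k A d_k
            if np.linalg.norm(r_new) < tol:
                break
            beta = (r_new @ r_new) / (r @ r)  # beta_k from (3.47)
            d = r_new + beta * d              # d_{k+1} = r_{k+1} + beta_k d_k
            r = r_new
        return x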