
Chapter 3 Supplementary

Another condition for convergence


Let A = N − P (N is invertible).
Consider the iterative scheme: Nx^{k+1} = Px^k + f, or x^{k+1} = N^{-1}Px^k + N^{-1}f.

Theorem (Householder-John). Suppose that A and N + N^* − A are self-adjoint positive definite. (Recall that N^* = (N̄)^T.)
Then the iterative scheme x^{k+1} = N^{-1}Px^k + N^{-1}f converges.

Proof. We have M = N^{-1}P = N^{-1}(N − A) = I − N^{-1}A. It suffices to show that |λ| < 1 for every eigenvalue λ of M. Let x be a corresponding eigenvector. (λ and x could be complex.) We have:

    Mx = λx
    ⟺ (I − N^{-1}A)x = λx
    ⟺ (N − A)x = λNx
    ⟺ (1 − λ)Nx = Ax.

Note that λ ≠ 1. Otherwise, Ax = 0, contradicting that A is PD.


Multiplying both sides by x^*: (1 − λ) x^* N x = x^* A x, so

    x^* N x = (1/(1 − λ)) x^* A x.    (*)

Taking the conjugate transpose on both sides (using x^* A^* x = x^* A x, since A is self-adjoint):

    x^* N^* x = (1/(1 − λ̄)) x^* A x.    (**)

Adding (*) and (**) and subtracting x^* A x from both sides:

    x^* (N + N^* − A) x = ( 1/(1 − λ) + 1/(1 − λ̄) − 1 ) x^* A x = ((1 − |λ|²)/|1 − λ|²) x^* A x.

Since A and N + N^* − A are PD, we have x^* A x > 0 and x^* (N + N^* − A) x > 0. Hence 1 − |λ|² > 0 and |λ| < 1.
So ρ(N^{-1}P) < 1 and the iterative scheme converges.
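To make the hypotheses concrete, here is a minimal NumPy sketch (the matrix A and the splitting N below are made-up illustrations, not taken from the notes) that checks the assumptions of the Householder-John theorem and then confirms ρ(N^{-1}P) < 1:

```python
import numpy as np

# Hypothetical example: A is SPD, N is the lower-triangular (Gauss-Seidel) splitting matrix.
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
N = np.tril(A)          # N = D + L  (Gauss-Seidel choice)
P = N - A               # so that A = N - P

# Hypotheses of the Householder-John theorem: A and N + N^* - A are SPD.
for name, S in [("A", A), ("N + N^* - A", N + N.T - A)]:
    assert np.allclose(S, S.T) and np.all(np.linalg.eigvalsh(S) > 0), f"{name} not SPD"

# Conclusion: spectral radius of the iteration matrix M = N^{-1} P is < 1.
M = np.linalg.solve(N, P)
rho = max(abs(np.linalg.eigvals(M)))
print(f"rho(N^-1 P) = {rho:.4f}  (< 1, so the iteration converges)")
```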
Example:
Suppose A is self-adjoint positive definite. Using the Householder-John theorem, prove that the SOR method converges if and only if 0 < ω < 2.
Solution:
(⇒) Proved earlier.
(⇐) Note that N_SOR(ω) + N_SOR(ω)^* − A = (2/ω − 1)D.
(Recall: N_SOR(ω) = L + (1/ω)D. Also, L^* = U since A is self-adjoint.)
Hence N_SOR(ω) + N_SOR(ω)^* − A is self-adjoint positive-definite if 0 < ω < 2. By the Householder-John theorem, the SOR method converges.
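The statement can also be observed numerically. The following is a minimal sketch of the SOR iteration Nx^{k+1} = Px^k + f with N_SOR(ω) = L + (1/ω)D, applied to a made-up SPD test problem (not from the notes); any ω in (0, 2) should converge:

```python
import numpy as np

def sor_solve(A, b, omega, x0=None, tol=1e-10, max_iter=500):
    """Basic SOR iteration N x_{k+1} = P x_k + b with N = L + D/omega (illustrative sketch)."""
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    N = L + D / omega           # N_SOR(omega)
    P = N - A                   # P_SOR(omega), so that A = N - P
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for k in range(max_iter):
        x_new = np.linalg.solve(N, P @ x + b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

# Hypothetical SPD test problem.
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
for omega in (0.5, 1.0, 1.5):   # any omega in (0, 2) should converge
    x, iters = sor_solve(A, b, omega)
    print(omega, iters, np.linalg.norm(A @ x - b))
```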
Symmetric SOR
Symmetric SOR is defined as follows: let A = D + L + U.

    x^{(j+1/2)} = ((1/ω)D + L)^{-1} [((1 − ω)/ω)D − U] x^{(j)} + ((1/ω)D + L)^{-1} f;

    x^{(j+1)} = ((1/ω)D + U)^{-1} [((1 − ω)/ω)D − L] x^{(j+1/2)} + ((1/ω)D + U)^{-1} f.

Combining, we get an iterative scheme:

    x^{(j+1)} = M_SSOR x^{(j)} + c_SSOR,

where M_SSOR = ((1/ω)D + U)^{-1} [((1 − ω)/ω)D − L] ((1/ω)D + L)^{-1} [((1 − ω)/ω)D − U].

It can be proved that SSOR is associated to the following splitting:

    A = N_SSOR − P_SSOR,

where:

    N_SSOR(ω) = (ω/(2 − ω)) ((1/ω)D + L) D^{-1} ((1/ω)D + U), and

    P_SSOR(ω) = (ω/(2 − ω)) [((1 − ω)/ω)D − L] D^{-1} [((1 − ω)/ω)D − U].

(Verify that A = N_SSOR − P_SSOR and N_SSOR^{-1} P_SSOR = M_SSOR! A numerical check is sketched below.)
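As a sanity check on the splitting above, the following sketch (with a made-up SPD matrix, not one from the notes) builds N_SSOR, P_SSOR and M_SSOR directly from their definitions and confirms numerically that A = N_SSOR − P_SSOR and N_SSOR^{-1} P_SSOR = M_SSOR:

```python
import numpy as np

def ssor_matrices(A, omega):
    """Build N_SSOR, P_SSOR and M_SSOR from the definitions above (illustrative sketch)."""
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    U = np.triu(A, k=1)
    Dinv = np.linalg.inv(D)
    N = omega / (2 - omega) * (D / omega + L) @ Dinv @ (D / omega + U)
    P = omega / (2 - omega) * ((1 - omega) / omega * D - L) @ Dinv @ ((1 - omega) / omega * D - U)
    M = (np.linalg.inv(D / omega + U)
         @ ((1 - omega) / omega * D - L)
         @ np.linalg.inv(D / omega + L)
         @ ((1 - omega) / omega * D - U))
    return N, P, M

# Hypothetical SPD test matrix.
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
omega = 1.3
N, P, M = ssor_matrices(A, omega)
print(np.allclose(A, N - P))                    # True: A = N_SSOR - P_SSOR
print(np.allclose(np.linalg.solve(N, P), M))    # True: N_SSOR^{-1} P_SSOR = M_SSOR
```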
Theorem. If A is self-adjoint and positive-definite, then SSOR converges if and only if 0 < ω < 2.
Proof. (Sketch of proof) It can be shown that ρ(M_SSOR) ≥ (ω − 1)², which implies that 0 < ω < 2 is necessary for SSOR to converge.
Also,

    N_SSOR(ω) + N_SSOR(ω)^* − A = ((2 − ω)/(2ω)) D + (2ω/(2 − ω)) ((1/2)D^{1/2} + LD^{-1/2}) ((1/2)D^{1/2} + LD^{-1/2})^*

is self-adjoint and positive-definite if 0 < ω < 2.
The Householder-John theorem then implies that SSOR converges.
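A quick numerical illustration of this theorem (a made-up SPD matrix again; N_SSOR is rebuilt here so the snippet stands alone): for any ω in (0, 2), N_SSOR + N_SSOR^* − A comes out SPD and ρ(M_SSOR) < 1.

```python
import numpy as np

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])   # hypothetical SPD matrix
D = np.diag(np.diag(A)); L = np.tril(A, -1); U = np.triu(A, 1)
Dinv = np.linalg.inv(D)

for omega in (0.5, 1.0, 1.5, 1.9):
    N = omega / (2 - omega) * (D / omega + L) @ Dinv @ (D / omega + U)
    S = N + N.T - A                                   # = N_SSOR + N_SSOR^* - A
    spd = np.all(np.linalg.eigvalsh(S) > 0)           # positive definite?
    M = np.linalg.solve(N, N - A)                     # M_SSOR = N^{-1} P = I - N^{-1} A
    rho = max(abs(np.linalg.eigvals(M)))
    print(f"omega={omega}: N+N^*-A SPD? {spd},  rho(M_SSOR)={rho:.3f}")
```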

Another iterative method: Gradient descent method


Consider the linear system Ax = b where A is real symmetric positive definite.
Solving Ax = b is equivalent to minimizing f : R^M → R, where f is the quadratic function

    f(x) = (1/2) x·Ax − b·x,

and A is a square SPD M × M matrix.

Our goal is to look for an iterative method of the form

    x^{k+1} = x^k + α_k d^k,   k = 0, 1, 2, . . .    (x^k ∈ R^M, d^k ∈ R^M and α_k ∈ R)

which minimizes f iteratively: f(x^k) → min f as k → ∞.

We call d^k the search direction and α_k the time step.
From calculus, we can compute:

    ∇f = ( ∂f/∂x_1, ∂f/∂x_2, . . . , ∂f/∂x_M ) = Ax − b.

(⟹ the minimizer must be the unique critical point: ∇f = Ax − b = 0.)

Now, the Hessian is

    f''(x) = [ ∂²f/∂x_i ∂x_j ]_{1 ≤ i, j ≤ M} = A.

From calculus, we have the Taylor expansion:

    f(x^{k+1}) = f(x^k + α_k d^k) = f(x^k) + α_k ∇f(x^k)·d^k + O(α_k²).

⟹ if ∇f(x^k)·d^k < 0 and α_k is small enough, then f(x^{k+1}) < f(x^k).


⟹ we choose d^k = −∇f(x^k) (the steepest descent direction).
The time step α_k can be chosen such that:

    f(x^k + α_k d^k) = min_{α>0} f(x^k + α d^k).

In this case, α_k is said to be optimal. If α_k is optimal, then:

    (d/dα) f(x^k + α d^k) = 0   at α = α_k.

So, we have:

    f'(x^{k+1})·d^k = 0  ⟺  (Ax^{k+1} − b)·d^k = 0
    ⟺ (A(x^k + α_k d^k) − b)·d^k = 0  ⟺  (Ax^k − b)·d^k + α_k d^k·Ad^k = 0.

The optimal α_k is:

    α_k = − (Ax^k − b)·d^k / (d^k·Ad^k).
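The steps above translate directly into code. The following is a minimal NumPy sketch of steepest descent with the optimal (exact line search) step size; the test matrix is a made-up illustration:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Gradient descent with the optimal (exact line search) step size."""
    x = x0.astype(float).copy()
    for k in range(max_iter):
        d = -(A @ x - b)                 # d^k = -grad f(x^k) = b - A x^k
        if np.linalg.norm(d) < tol:      # gradient (residual) small: done
            return x, k
        alpha = (d @ d) / (d @ (A @ d))  # optimal alpha_k = -(Ax^k - b).d^k / (d^k . A d^k)
        x = x + alpha * d                # x^{k+1} = x^k + alpha_k d^k
    return x, max_iter

# Hypothetical SPD test problem.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, iters = steepest_descent(A, b, np.zeros(2))
print(x, iters, np.linalg.norm(A @ x - b))
```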

Convergence of gradient method


We consider the gradient method with a constant time step τ:

    (*) x^{k+1} = x^k + τ d^k,   where τ is chosen to be small enough;
    (**) d^k = −f'(x^k) = −(Ax^k − b).

Let x̄ be the solution. Then x̄ = x̄ − τ(Ax̄ − b).    (***)

(*) − (***):   e^{k+1} = (I − τA)e^k    (e^k = x^k − x̄ = error vector)

Similar to what we have discussed before: ‖e^{k+1}‖ ≤ ρ(I − τA)‖e^k‖.
We need ρ(I − τA) < 1! Let λ_1, λ_2, . . . , λ_M be the eigenvalues of A. Then 1 − τλ_1, 1 − τλ_2, . . . , 1 − τλ_M are the eigenvalues of I − τA.

    ρ(I − τA) < 1  ⟺  |1 − τλ_j| < 1 for all j
                  ⟺  1 − τλ_j < 1 (always true since τ, λ_j > 0) and 1 − τλ_j > −1, i.e. τλ_j < 2, for all j.

Choose 0 < τ < 2/λ_max, where λ_max = max{λ_1, λ_2, . . . , λ_M}.

In practice, we can choose τ = 1/λ_max. Then:

    ρ(I − τA) = 1 − λ_min/λ_max,   where λ_min = min{λ_1, λ_2, . . . , λ_M}.

Define:

    κ(A) = λ_max/λ_min = condition number of A.

Then:

    ρ(I − τA) = 1 − 1/κ(A)

and

    ‖e^k‖ ≤ (1 − 1/κ(A))^k ‖e^0‖,   with 1 − 1/κ(A) < 1.

⟹ e^k → 0 as k → ∞.
⟹ The gradient method converges.


Remark: To reduce the error by a factor of ε, we need n iterations where:

    (1 − 1/κ(A))^n ≤ ε   ⟹   n ≳ κ(A) log(1/ε).

⟹ Condition number large ⟹ convergence slow!!
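For instance (an illustrative number, not taken from the notes): with κ(A) = 100 and a target reduction ε = 10⁻⁶, the estimate gives n ≈ 100 · log(10⁶) ≈ 1.4 × 10³ iterations of the constant-step gradient method.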

The conjugate gradient method


Goal: Develop an iterative method that solves Ax = b in at most M steps, where A is an M × M SPD matrix.
Method: Choose the time step α_k to be optimal and search directions d^k that are conjugate: d^i·Ad^j = 0 for i ≠ j.
Since A is SPD, we can define an inner product on R^M by

    ⟨η, ζ⟩ = η·Aζ,   η, ζ ∈ R^M.

Then we require:

    ⟨d^i, d^j⟩ = 0 for i ≠ j.

We also define the A-norm:

    ‖η‖_A = ⟨η, η⟩^{1/2}   for η ∈ R^M.

We now state the conjugate gradient method:
Given x^0 ∈ R^M and d^0 = r^0 := −(Ax^0 − b), find x^k and d^k (k = 1, 2, . . . ) such that:

    (a) x^{k+1} = x^k + α_k d^k;

    (b) α_k = r^k·d^k / ⟨d^k, d^k⟩;

    (c) d^{k+1} = r^{k+1} + β_k d^k;

    (d) β_k = −⟨r^{k+1}, d^k⟩ / ⟨d^k, d^k⟩.

(A code sketch of these four steps is given below.)
Analysis of each step:

Recall that for the gradient descent method, the optimal time step α_k is

    α_k = −(Ax^k − b)·d^k / (d^k·Ad^k) = r^k·d^k / (d^k·Ad^k).

⟹ (b) means α_k is optimal.

Now, (c) means ⟨d^{k+1}, d^k⟩ = ⟨r^{k+1} + β_k d^k, d^k⟩.
⟹ ⟨d^{k+1}, d^k⟩ = 0 implies β_k = −⟨r^{k+1}, d^k⟩ / ⟨d^k, d^k⟩, which is (d).
But: is ⟨d^i, d^j⟩ = 0 for all i ≠ j?? Yes!
Lemma 1. For m = 0, 1, 2, . . . , we have:

    Span(d^0, d^1, . . . , d^m) = Span(r^0, r^1, . . . , r^m) = Span(r^0, Ar^0, . . . , A^m r^0).

(Recall: Span(η^0, η^1, . . . , η^m) = {η ∈ R^M : η = a_0 η^0 + · · · + a_m η^m, a_j ∈ R}.)

Proof. We use mathematical induction. When m = 0, the claim is obviously true.
Suppose now the equality holds for m = k.
Multiplying (a) by A, we get Ax^{k+1} − b = Ax^k − b + α_k Ad^k, which gives:

    r^{k+1} = r^k − α_k Ad^k.    (★)

By the induction hypothesis,

    d^k ∈ Span{r^0, Ar^0, . . . , A^k r^0},

and so:

    Ad^k ∈ Span{r^0, Ar^0, . . . , A^{k+1} r^0}.

From (★), we see that

    Span{r^0, . . . , r^{k+1}} ⊂ Span{r^0, Ar^0, . . . , A^{k+1} r^0}.

Also, from the induction hypothesis, A^k r^0 ∈ Span(d^0, . . . , d^k), so

    A^{k+1} r^0 ∈ Span(Ad^0, . . . , Ad^k).

From (★), Ad^i ∈ Span(r^i, r^{i+1}) ⊂ Span(r^0, r^1, . . . , r^{k+1}), so A^{k+1} r^0 ∈ Span(r^0, r^1, . . . , r^{k+1}). Hence:

    Span(r^0, Ar^0, . . . , A^{k+1} r^0) ⊂ Span(r^0, r^1, . . . , r^{k+1}).

Thus:

    Span(r^0, Ar^0, . . . , A^{k+1} r^0) = Span(r^0, r^1, . . . , r^{k+1}).

Now, from (c):

    d^{k+1} = r^{k+1} + β_k d^k.

It is then clear that:

    Span(r^0, . . . , r^{k+1}) = Span(d^0, d^1, . . . , d^{k+1}).

By M.I., the lemma is true!
Lemma 2. The search directions d^i are pairwise conjugate. That is:

    ⟨d^i, d^j⟩ = 0   for i ≠ j.    (★★)

Also, the residuals r^i are pairwise orthogonal. That is:

    r^i · r^j = 0   for i ≠ j.    (★★★)

Proof. We again use induction. Suppose the statement is true for all i, j ≤ k. By Lemma 1, Span(d^0, . . . , d^j) = Span(r^0, . . . , r^j). From the induction hypothesis,

    r^k · d^j = 0   for j = 0, 1, 2, . . . , k − 1    (since d^j ∈ Span(r^0, r^1, . . . , r^j)).

Since r^{k+1} = r^k − α_k Ad^k, we have

    r^{k+1} · d^j = r^k · d^j − α_k ⟨d^k, d^j⟩ = 0   for j = 0, 1, 2, . . . , k − 1    (by the induction hypothesis).

Also, α_k is optimal and so

    r^{k+1} · d^k = −f'(x^k + α_k d^k) · d^k = − (d/dα) f(x^k + α d^k) |_{α=α_k} = 0.

Hence, r^{k+1} ⊥ d^j for j = 0, 1, 2, . . . , k.

By Lemma 1, r^{k+1} · r^j = 0 for j = 0, 1, 2, . . . , k (since r^j ∈ Span(d^0, . . . , d^j)), which proves (★★★) for i, j ≤ k + 1.

Now, Ad^j ∈ Span(r^0, r^1, . . . , r^{j+1}) since r^{j+1} = r^j − α_j Ad^j, and therefore

    ⟨r^{k+1}, d^j⟩ = r^{k+1} · Ad^j = 0   for j = 0, 1, 2, . . . , k − 1.

Now, d^{k+1} = r^{k+1} + β_k d^k (from (c)).
Also, from the induction hypothesis, ⟨d^k, d^j⟩ = 0 for j = 0, 1, 2, . . . , k − 1.

    ⟹ ⟨d^{k+1}, d^j⟩ = ⟨r^{k+1}, d^j⟩ + β_k ⟨d^k, d^j⟩ = 0   for j = 0, 1, 2, . . . , k − 1.

Also, by the construction of β_k, ⟨d^{k+1}, d^k⟩ = 0.

    ⟹ ⟨d^i, d^j⟩ = 0 for all i, j ≤ k + 1 with i ≠ j.

By M.I., the lemma is true!
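Lemma 2 is easy to observe numerically: run the recursion (a)-(d), store every d^k and r^k, and check the two orthogonality properties. A small self-contained sketch (made-up SPD matrix, not from the notes):

```python
import numpy as np

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])   # hypothetical SPD matrix
b = np.array([1.0, 2.0, 3.0])
x = np.zeros(3)
r = -(A @ x - b); d = r.copy()
ds, rs = [d], [r]

for k in range(len(b) - 1):
    Ad = A @ d
    alpha = (r @ d) / (d @ Ad)
    x = x + alpha * d
    r = r - alpha * Ad
    beta = -(r @ Ad) / (d @ Ad)
    d = r + beta * d
    ds.append(d); rs.append(r)

# (star-star): <d^i, d^j> = d^i . A d^j = 0 and (star-star-star): r^i . r^j = 0 for i != j
print(max(abs(ds[i] @ A @ ds[j]) for i in range(3) for j in range(3) if i != j))
print(max(abs(rs[i] @ rs[j]) for i in range(3) for j in range(3) if i != j))
```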


Theorem. For some m ≤ M, we have Ax^m = b.
Proof. Note that r^i · r^j = 0 for i ≠ j. Since in R^M there can be at most M pairwise orthogonal non-zero vectors, it follows that r^m = −(Ax^m − b) = 0 for some m ≤ M.
Remark:
- From the theorem, we see that the conjugate gradient method must converge in at most M iterations.
- The gradient descent method might not reach the exact solution in finitely many iterations.
- The conjugate gradient method converges to the exact solution in at most M iterations.
- In fact, the convergence rates of both the steepest descent method and the conjugate gradient method depend on κ(A).
What if κ(A) is large?? ⟹ Preconditioning.
Pre-conditioning

Recall that the gradient method is equivalent to minimizing:

    min_{x ∈ R^M} f(x) = min_{x ∈ R^M} (1/2) x·Ax − b·x.

Let E be a non-singular M × M matrix. Define η = Ex. Then x = E^{-1}η.
Define:

    f̃(η) = f(x) = f(E^{-1}η) = (1/2)(E^{-1}η)·A(E^{-1}η) − b·E^{-1}η
          = (1/2) η·(E^{-T} A E^{-1})η − (E^{-T}b)·η = (1/2) η·Ãη − b̃·η,

where Ã = E^{-T} A E^{-1} and b̃ = E^{-T} b.   (Here E^{-T} = (E^T)^{-1} = (E^{-1})^T.)

If κ(Ã) << κ(A), then the gradient method for the new minimization problem will be much faster!!
Question: How to choose the right pre-conditioner E??
Answer: An interesting but challenging research topic!!
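As a simple illustration of the effect (a made-up example; the diagonal choice E = D^{1/2}, a Jacobi-style preconditioner, is only one of many possibilities):

```python
import numpy as np

# Hypothetical ill-conditioned SPD matrix (the diagonal scaling is exaggerated on purpose).
A = np.array([[1000.0, 1.0],
              [   1.0, 0.01]])
b = np.array([1.0, 1.0])

# Jacobi-style preconditioner: E = D^{1/2}, so A_tilde = E^{-T} A E^{-1}.
E = np.diag(np.sqrt(np.diag(A)))
Einv = np.linalg.inv(E)
A_tilde = Einv.T @ A @ Einv
b_tilde = Einv.T @ b

print("kappa(A)       =", np.linalg.cond(A))
print("kappa(A_tilde) =", np.linalg.cond(A_tilde))   # much smaller -> gradient method converges faster

# Solve the transformed system, then map back: x = E^{-1} eta.
eta = np.linalg.solve(A_tilde, b_tilde)
x = Einv @ eta
print(np.linalg.norm(A @ x - b))                     # recovers the solution of the original system
```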
