This document provides an introduction to multigrid methods for solving large-scale sparse linear systems. It discusses how large modern simulations can be, from millions to billions of nodes. It then outlines the basics of multigrid methods, including that their computation cost scales with problem size and they are easy to parallelize. It also covers stationary iterative methods, finite element error estimates, algebraic multigrid, nonlinear multigrid, and multigrid parallelization. Multigrid methods are introduced as an efficient way to solve the large linear systems that arise from finite element discretizations of partial differential equations.


An Introduction to Multigrid Methods

for Large-Scale Computation

Chin-Tien Wu
National Center for Theoretical Sciences
National Tsing-Hua University
01/24/2005
How Large Are Real Simulations?
Large-scale Simulation of Polymer Electrolyte Fuel Cells by Parallel Computing (Hua Meng and Chao-Yang Wang, 2004): FEM model with O(10^6) nodes.

Three-Dimensional Finite Element Modeling of Human Ear for Sound Transmission (R. Z. Gan, B. Feng and Q. Sun, 2004): FEM model with 10^5 - 10^6 nodes.

Other examples: car engine with O(10^5) nodes; commercial aircraft with 10^7 nodes; FinFET transistor with 10^5 nodes.
What do we need in order to simulate?
• Deep understanding of the physical problems
• Good mathematical models
• Good computable mathematical models
• Computation grids (not necessary, but ...)
• Discretizations
• Solve linear systems
• Solve linear systems fast!
Our goal is to introduce multigrid methods for solving sparse linear systems.

Why multigrid?

1. The computation cost of multigrid is proportional to the problem size.

2. Multigrid is "easy" to parallelize.
Outline
• Stationary Iterative Methods
• Some finite element error estimates
• Multigrid
• Algebraic Multigrid
• Nonlinear Multigrid (FAS)
• Multigrid Parallelization

Reference:

1. An introduction to multilevel methods (Jinchao Xu)


2. Multigrid Methods (Stephen F. McCormick)
3. A multigrid tutorial (William L. Briggs)
4. Matrix iterative analysis (Richard S. Varga)
5. The mathematical theory of finite element methods (Brenner and Scott)
6. Introduction to Algebraic Multigrid (Christian Wagner)
Solving the Linear System Ax = b by Iterative Methods

Methods:

• Stationary methods: Jacobi, Gauss-Seidel (GS), SOR.

• Krylov subspace methods: conjugate gradient; GMRES (Saad and Schultz, 1986); MINRES (Paige and Saunders, 1975).

• Multigrid methods: geometric multigrid (MG), Fedorenko 1961; algebraic multigrid (AMG), Ruge and Stüben 1985.
Basic questions and some definitions

Basic questions:

1. How do we iterate?
2. For what categories of matrices A does the iteration converge?
3. What is the convergence rate?

Some definitions:

A is irreducible if there is no permutation P such that $P^{T}AP = \begin{bmatrix} A_{1,1} & A_{1,2} \\ 0 & A_{2,2} \end{bmatrix}$.

A is non-negative (denoted $A \ge 0$) if $a_{i,j} \ge 0$ for all $1 \le i,j \le n$.

A is an M-matrix if A is nonsingular, $a_{i,j} \le 0$ for $i \ne j$, and $A^{-1} \ge 0$.

A is irreducibly diagonally dominant if A is irreducible and diagonally dominant, with $|a_{i,i}| > \sum_{j \ne i} |a_{i,j}|$ for at least one i.

$A = M - N$ is a regular splitting of A if M is nonsingular, $M^{-1} \ge 0$, and $N \ge 0$.

Stationary Iterative Methods

1. $r^{\text{old}} = f - A u^{\text{old}}$
2. Solve $e = B^{-1} r^{\text{old}}$
3. Update $u^{\text{new}} = u^{\text{old}} + e$

The error is reduced according to
$e^{\text{new}} = e^{\text{old}} - B^{-1}\left(f - Au^{\text{old}}\right) = e^{\text{old}} - B^{-1}A\left(u - u^{\text{old}}\right) = \left(I - B^{-1}A\right) e^{\text{old}}$.

B is called an iterator or preconditioner of A.

$E_B = I - B^{-1}A$ is called the error reduction operator of the iterator B.
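As a small illustration (not from the slides), the three steps above translate directly into code; here the iterator B is taken to be the diagonal of A (Jacobi), and all names are hypothetical:

```python
import numpy as np

def stationary_solve(A, f, B_solve, u0, tol=1e-10, max_iter=500):
    """Generic stationary iteration: u <- u + B^{-1}(f - A u)."""
    u = u0.copy()
    for _ in range(max_iter):
        r = f - A @ u                 # step 1: residual
        e = B_solve(r)                # step 2: e = B^{-1} r
        u = u + e                     # step 3: update
        if np.linalg.norm(r) < tol * np.linalg.norm(f):
            break
    return u

# Example: Jacobi iterator B = D, the diagonal of A
A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
f = np.array([1., 2., 1.])
D = np.diag(A)
u = stationary_solve(A, f, lambda r: r / D, np.zeros(3))
```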

Perron-Frobenius Theorem
Theorem: Let $A \ge 0$ be an irreducible matrix. Then

1. A has a positive real eigenvalue equal to its spectral radius $\rho(A)$.
2. There is an eigenvector $x > 0$ corresponding to $\rho(A)$.
3. $\rho(A)$ increases when any entry of A increases.
4. $\rho(A)$ is a simple eigenvalue of A.
Some Well-Known Iterative Methods

Suppose $A = D - L - U$, where D is the diagonal of A and L and U are (the negatives of) its strictly lower and strictly upper triangular parts, respectively.

Richardson: $B = \frac{1}{\omega}I$, where $0 < \omega < \frac{2}{\rho(A)}$.

Jacobi: $B = D$.

Damped Jacobi: $B = \frac{1}{\omega}D$, where $0 < \omega < \frac{2}{\rho(D^{-1}A)}$.

Gauss-Seidel: $B = D - L$.

SOR: $B = \frac{1}{\omega}(D - \omega L)$, where $0 < \omega < 2$.
Jacobi and Gauss-Seidel

Jacobi: $x_i^{(m+1)} = -\sum_{j \ne i} \frac{a_{i,j}}{a_{i,i}}\, x_j^{(m)} + \frac{r_i}{a_{i,i}}$

Gauss-Seidel: $x_i^{(m+1)} = -\sum_{j=1}^{i-1} \frac{a_{i,j}}{a_{i,i}}\, x_j^{(m+1)} - \sum_{j=i+1}^{n} \frac{a_{i,j}}{a_{i,i}}\, x_j^{(m)} + \frac{r_i}{a_{i,i}}$

(here $r_i$ denotes the i-th entry of the right-hand side).

HW1: Write down a formula for SOR.

HW2: Write a program to solve

$\begin{bmatrix} 1 & 0 & -\frac14 & -\frac14 \\ 0 & 1 & -\frac14 & -\frac14 \\ -\frac14 & -\frac14 & 1 & 0 \\ -\frac14 & -\frac14 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = 0.5 \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$

by Jacobi and Gauss-Seidel, starting with the initial guess $x^{(0)} = [0,0,0,0]$.

Let $E_J = I - D^{-1}A$ and $E_{GS} = I - (D-L)^{-1}A$. Since the solution of HW2 is $x = [1,1,1,1]$ and $e^0 = x - x^{(0)} = [1,1,1,1]$, clearly we have $e^m_J = (E_J)^m e^0$ and $e^m_{GS} = (E_{GS})^m e^0$. One can easily check that

$e^m_J = \frac{1}{2^m}\,[1,1,1,1]^T$ and $e^m_{GS} = \frac{1}{4^m}\,[2,2,1,1]^T$. Thus $\|e^m_J\| = \frac{1}{2^{m-1}} > \|e^m_{GS}\| = \frac{\sqrt{10}}{4^m}$.

You might get the feeling that the Gauss-Seidel method is faster than the Jacobi method.
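A short sketch of HW2 (an illustration, not the assigned solution); it reproduces the error decay $1/2^{m-1}$ for Jacobi and $\sqrt{10}/4^m$ for Gauss-Seidel:

```python
import numpy as np

A = np.array([[1.0, 0.0, -0.25, -0.25],
              [0.0, 1.0, -0.25, -0.25],
              [-0.25, -0.25, 1.0, 0.0],
              [-0.25, -0.25, 0.0, 1.0]])
b = 0.5 * np.ones(4)
x_exact = np.ones(4)

def jacobi(A, b, x, sweeps):
    D = np.diag(A)
    for _ in range(sweeps):
        x = (b - (A - np.diag(D)) @ x) / D     # x <- D^{-1}(b - (L+U)x)
    return x

def gauss_seidel(A, b, x, sweeps):
    x = x.copy()
    for _ in range(sweeps):
        for i in range(len(b)):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]        # uses already-updated entries
    return x

for m in (1, 2, 3):
    eJ = x_exact - jacobi(A, b, np.zeros(4), m)
    eG = x_exact - gauss_seidel(A, b, np.zeros(4), m)
    print(m, np.linalg.norm(eJ), np.linalg.norm(eG))   # ~1/2^(m-1) vs ~sqrt(10)/4^m
```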

Stein-Rosenberg Theorem

Theorem: Let $B_J = L + U$ be the Jacobi matrix and $B_{GS} = (I - L)^{-1}U$ be the Gauss-Seidel matrix, where A is scaled so that $D = I$ and $L, U \ge 0$. Then one and only one of the following relations is valid:

1) $\rho(B_J) = \rho(B_{GS}) = 0$.
2) $0 < \rho(B_{GS}) < \rho(B_J) < 1$.
3) $\rho(B_J) = \rho(B_{GS}) = 1$.
4) $1 < \rho(B_J) < \rho(B_{GS})$.
Convergence of Jacobi, Gauss-Seidel and SOR Iterative Methods

Lemma 1. If $A = (a_{i,j}) \ge 0$ is irreducible, then either $\sum_{j=1}^{n} a_{i,j} = \rho(A)$ for all i, or

$\min_{1\le i\le n}\left(\sum_{j=1}^{n} a_{i,j}\right) < \rho(A) < \max_{1\le i\le n}\left(\sum_{j=1}^{n} a_{i,j}\right)$ -------- (1).

Proof: Case (1): all row sums of A are equal ($=\sigma$). Let $\zeta = [1,1,\ldots,1]^T$. Clearly $A\zeta = \sigma\zeta$ and $\sigma \le \rho(A)$. However, Gershgorin's theorem implies $\rho(A) \le \sigma$. Hence $\rho(A) = \sigma$.
Case (2): not all row sums of A are equal. Construct $B = (b_{i,j}) \ge 0$ and $C = (c_{i,j}) \ge 0$ by decreasing and increasing some entries of A, respectively, such that

$\sum_{j=1}^{n} b_{\ell,j} = \alpha = \min_{1\le i\le n}\left(\sum_{j=1}^{n} a_{i,j}\right)$ and $\sum_{j=1}^{n} c_{\ell,j} = \beta = \max_{1\le i\le n}\left(\sum_{j=1}^{n} a_{i,j}\right)$, for all $1 \le \ell \le n$.

By the Perron-Frobenius theorem, we have $\rho(B) \le \rho(A) \le \rho(C)$. Clearly, from the result of Case (1), the inequality (1) holds.

Lemma 2. Let A and B be two matrices with $0 \le B \le A$. Then $\rho(B) \le \rho(A)$.


Theorem 1. Let $A = (a_{i,j})$ be a strictly or irreducibly diagonally dominant matrix. Then the Jacobi and Gauss-Seidel iterative methods converge.

Proof: Recall that $E_J = I - D^{-1}A = D^{-1}(L+U) = (b_{i,j})$, where $b_{i,j} = 0$ for $i = j$ and $b_{i,j} = -\frac{a_{i,j}}{a_{i,i}}$ for $i \ne j$. From Lemma 2 it is clear that $\rho(E_J) \le \rho(|E_J|)$. Since A is strictly diagonally dominant, we have $\sum_{j=1}^{n} |b_{i,j}| < 1$ for all $1 \le i \le n$. Therefore Lemma 1 implies $\rho(|E_J|) < 1$. As a result, we have shown that the Jacobi iterative method converges, from $\rho(E_J) \le \rho(|E_J|) < 1$.

Now, $E_{GS} = I - (D-L)^{-1}A = (D-L)^{-1}U = (I - \tilde{L})^{-1}\tilde{U}$, where $\tilde{L} = D^{-1}L$ and $\tilde{U} = D^{-1}U$. We have

$\left|(I-\tilde{L})^{-1}\tilde{U}\right| \le \left(I + |\tilde{L}| + |\tilde{L}|^2 + \cdots + |\tilde{L}|^{n-1}\right)|\tilde{U}| = \left(I - |\tilde{L}|\right)^{-1}|\tilde{U}|$.

Now consider $\tilde{B}_J = |\tilde{L}| + |\tilde{U}|$ and $\tilde{B}_{GS} = (I - |\tilde{L}|)^{-1}|\tilde{U}|$. Since we have already shown $\rho(\tilde{B}_J) < 1$, the Stein-Rosenberg theorem implies $\rho(\tilde{B}_{GS}) < 1$. Therefore, by Lemma 2, $\rho(E_{GS}) \le \rho(\tilde{B}_{GS}) < 1$, and we conclude that the Gauss-Seidel iterative method converges.
Theorem 2. Let $A = D - E - E^{*}$, where D and A are Hermitian, D is positive definite, and $D - \omega E$ is nonsingular for $0 \le \omega \le 2$. Let $E_{SOR} = I - \omega\left(D - \omega E\right)^{-1}A$. Then $\rho(E_{SOR}) < 1$ if and only if A is positive definite and $0 < \omega < 2$.

Proof: First, assume $e_0$ is a nonzero vector. The SOR iteration for the error can be written as
$\left(D - \omega E\right)e_{m+1} = \left(\omega E^{*} + (1-\omega)D\right)e_m$, $m \ge 0$ ------- (2).
Let $\delta_m = e_m - e_{m+1}$. Subtracting $(D - \omega E)e_m$ and $\left(\omega E^{*} + (1-\omega)D\right)e_{m+1}$ from both sides of (2), we obtain
$\left(D - \omega E\right)\delta_m = \omega A e_m$ ---- (3) and $\omega A e_{m+1} = \left[(1-\omega)D + \omega E^{*}\right]\delta_m$ ----- (4).
From $e_m^{*}\times(3) - e_{m+1}^{*}\times(4)$ and "simplifying the expression" (HW), one has
$\left(2 - \omega\right)\delta_m^{*} D \delta_m = \omega\left\{e_m^{*} A e_m - e_{m+1}^{*} A e_{m+1}\right\}$ ------ (5).
Assume A is positive definite and $0 < \omega < 2$, and let $e_0$ be any eigenvector of $E_{SOR}$ with eigenvalue λ. We have $e_1 = \lambda e_0$ and $\delta_0 = (1-\lambda)e_0$, and (5) reduces to
$\left(\frac{2-\omega}{\omega}\right)\left|1-\lambda\right|^2 e_0^{*} D e_0 = \left(1 - |\lambda|^2\right) e_0^{*} A e_0$ ------- (6).

Now, $\lambda \ne 1$: otherwise $\delta_0 = 0 \Rightarrow A e_0 = 0$ (by (3)) $\Rightarrow e_0 = 0$, a contradiction!

Since A and D are positive definite and $0 < \omega < 2$, (6) implies $1 - |\lambda|^2 > 0$. Therefore $\rho(E_{SOR}) < 1$.

Using similar arguments, one can show that the converse is also true.
• For SOR, it is possible to determine the optimal value of ω for special types of matrices (p-cyclic). The optimal value $\omega_b$ is precisely specified as the unique positive root in $\left(0,\ \frac{p}{p-1}\right)$ of the equation

$\left[\rho(E_J)\,\omega_b\right]^{p} = p^{p}\,(p-1)^{1-p}\,\left(\omega_b - 1\right)$  (Varga 1959).

• For p = 2, $\omega_b = 1 + \left(\dfrac{\rho(E_J)}{1 + \sqrt{1 - \rho^2(E_J)}}\right)^{2}$  (Young 1950).

• Semi-iterative method: $y_m = \sum_{j=0}^{m} v_j(m)\, x_j$, where $\sum_{j=0}^{m} v_j(m) = 1$.
We then have $e_m = \sum_{j=0}^{m} v_j(m)\, e_j$. In general, $e_m = P_m(E_S)\, e_0$, where $P_m$ is a polynomial and $E_S$ is the error reduction operator of an iterator S.
This is the so-called polynomial acceleration method. The most important choice is the Chebyshev polynomials.
Chebyshev Semi-Iterative Method

Algorithm:
$y_{m+1} = \omega_{m+1}\left\{E_S\, y_m + f - y_{m-1}\right\} + y_{m-1}$, for $m \ge 1$, where
$\omega_{m+1} = 1 + \dfrac{C_{m-1}(1/\rho)}{C_{m+1}(1/\rho)}$, $C_{m-1}$ and $C_{m+1}$ are Chebyshev polynomials,
$\rho = \rho(E_S)$ and $y_0 = x_0$.  ($C_0 = 1$, $C_1(x) = x$, $C_{m+1}(x) = 2xC_m(x) - C_{m-1}(x)$.)

Convergence:
$\|e_m\| \le \left(\dfrac{2\,(\omega_b - 1)^{m/2}}{1 + (\omega_b - 1)^{m}}\right)\|e_0\|$, where $\omega_b = \dfrac{2}{1 + \sqrt{1 - \rho^2}}$.
The convergence rate is accelerated as ρ → 0.

Remark: there are cases where polynomial acceleration does not improve the asymptotic rate of convergence.
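A minimal sketch of the Chebyshev semi-iterative algorithm above (an illustration, not the slides' code): it accelerates the stationary step $y \leftarrow E_S y + g$ with $g = B^{-1}f$, assumes the first step $y_1$ is a plain sweep, and takes the spectral radius `rho` as a known input.

```python
import numpy as np

def chebyshev_semi_iterative(A, f, B_solve, rho, y0, sweeps):
    """Chebyshev acceleration of y <- E_S y + g, where E_S = I - B^{-1}A, g = B^{-1}f."""
    g = B_solve(f)
    step = lambda y: y - B_solve(A @ y) + g      # E_S y + g
    y_prev = y0.copy()
    y = step(y_prev)                             # y_1: plain stationary sweep (assumption)
    C_prev, C_curr = 1.0, 1.0 / rho              # C_0(1/rho), C_1(1/rho)
    for _ in range(1, sweeps):
        C_next = (2.0 / rho) * C_curr - C_prev   # Chebyshev recurrence C_{m+1} = 2x C_m - C_{m-1}
        omega = 1.0 + C_prev / C_next            # omega_{m+1} = 1 + C_{m-1}/C_{m+1}
        y, y_prev = omega * (step(y) - y_prev) + y_prev, y
        C_prev, C_curr = C_curr, C_next
    return y
```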
Some PDE and Finite Element Analysis

Finite element solutions:
Let $\mathcal{T}_h$ be a given triangulation, $V_h = \{ v \in H^1_0 : v|_T \in P_1(T),\ T \in \mathcal{T}_h \}$, and let $\pi_h : H^1 \to V_h$ be the interpolation defined by $\pi_h(u)(N_i) = u(N_i)$.
Consider the weak solution $u \in H^1_0$ satisfying $a(u,v) = (f,v)$ for all $v \in H^1_0$; a finite element solution $u_h \in V_h$ satisfies $a(u_h, v) = (f,v)$ for all $v \in V_h$.

Interpolation errors:

$\| v - \pi_h v \|_1 \le C h^{r} \| v \|_{r+1}$ and $\| v - \pi_h v \|_0 \le C h^{r+1} \| v \|_{r+1}$.

H²-regularity: $a(\cdot,\cdot)$ is said to be H²-regular if there exists a constant C such that for all $f \in L^2$, $\| u \|_2 \le C \| f \|_0$.
Finite Element Solution is Quasi-Optimal

Céa's Theorem: $\| u - u_h \|_{H^1} \le \dfrac{C}{\alpha} \min_{v \in V_h} \| u - v \|_{H^1}$,

where C is the continuity constant and α is the coercivity constant of $a(\cdot,\cdot)$.

Proof:
Step 1 (Galerkin orthogonality): $a(u - u_h, v) = 0$ for all $v \in V_h$.
Step 2: $\alpha \| u - u_h \|_{H^1}^2 \le a(u-u_h, u-u_h) = a(u-u_h, u-v) + a(u-u_h, v-u_h) = a(u-u_h, u-v) \le C \| u-u_h \|_{H^1} \| u-v \|_{H^1}$.

HW4: If $a(\cdot,\cdot)$ is self-adjoint, show that $\| u - u_h \|_A = \min_{v\in V_h} \| u - v \|_A$, where $\| v \|_A = \sqrt{a(v,v)}$ is the energy norm.

Remark: the finite element solution is the orthogonal projection of the exact solution onto $V_h$ with respect to the energy norm.
FEM Error Estimation

Theorem 3: Assume the interpolation error estimates hold for the given $\mathcal{T}_h$ and $a(\cdot,\cdot)$ is H²-regular. Then the following estimates hold:

$\| u - u_h \|_{H^1} \le C h^{r} \| u \|_{r+1}$ ---- (i) and $\| u - u_h \|_{L^2} \le C h^{r+1} \| u \|_{r+1}$ ---- (ii).

Proof: From the interpolation estimate and Céa's theorem, (i) is immediate. To prove (ii) we use the duality argument. Let w be the solution of the adjoint problem $a(v, w) = (u - u_h, v)$ for all $v \in H^1_0$. Choosing $v = u - u_h$, we have

$(u-u_h,\, u-u_h) = a(u-u_h,\, w) = a(u-u_h,\, w - w_h)$ for any $w_h \in V_h$
$\le C \| u-u_h \|_{H^1} \| w - w_h \|_{H^1} \le C h \| u-u_h \|_{H^1} \| w \|_2 \le C h \| u-u_h \|_{H^1} \| u-u_h \|_{L^2}$.

Therefore $\| u-u_h \|_{L^2} \le C h \| u - u_h \|_{H^1} \le C h^{r+1} \| u \|_{r+1}$.
Definition (mesh-dependent norms): $\| v \|_{k,h} = \sqrt{(A_h^{k} v, v)_h}$ for $v \in V_h$, k = 0, 1, where $(v, w)_h = \sum_i h^2\, v(N_i)\, w(N_i)$. Clearly $\| v \|_{0,h} \equiv \| v \|_0$ and $\| v \|_{1,h} \equiv \| v \|_A$.

Lemma 3: $\Lambda(A_h) \le C h^{-2}$, where $\Lambda(A_h)$ denotes the largest eigenvalue of $A_h$.
Proof: Let λ be an eigenvalue of $A_h$ with eigenvector φ. Then
$a(\varphi, \varphi) = (A_h \varphi, \varphi)_h = \lambda (\varphi, \varphi)_h = \lambda \| \varphi \|_{0,h}^2$, so
$\lambda \le \dfrac{C \| \varphi \|_A^2}{\| \varphi \|_{0,h}^2} \le \dfrac{C h^{-2} \| \varphi \|_0^2}{\| \varphi \|_{0,h}^2} \approx C h^{-2}$ (using the inverse inequality).

Lemma 4 (generalized Cauchy-Schwarz inequality):

$a(v, w) \le \| v \|_{1+t,h}\, \| w \|_{1-t,h}$ for all $v, w \in V_h$ and $t \in \mathbb{R}$.
Multigrid Methods
Ideas:
• Approximate solutions on the fine grid using iterative methods.
• Correct the remaining errors from coarse grids.

[Figure: mesh refinement vs. MG coarsening; restriction $I_k^{k-1}$ maps from level k to level k-1, prolongation $I_{k-1}^{k}$ maps back, forming the MG V-cycle.]
Why Does Multigrid Work?
1. Relaxation methods converge slowly but smooth the error quickly.

Ex1: Consider $Lu = -u'' = \lambda u$. The finite difference discretization gives
$\dfrac{-u_{j-1} + 2u_j - u_{j+1}}{h^2} = \lambda u_j$,
with eigenvalues $\lambda_k = \dfrac{4}{h^2}\sin^2\!\left(\dfrac{k\pi}{2(N+1)}\right)$ and eigenvectors $\varphi^k_j = \sin\!\left(\dfrac{kj\pi}{N+1}\right)$,
where k = 1, ..., N is the wave number and j is the node number.

Richardson relaxation: $E_R = I - \sigma^{-1}A_h$, where $A_h = \dfrac{1}{h^2}\,\mathrm{tridiag}[-1\ \ 2\ \ -1]$.

Fourier analysis: choosing $\sigma = \dfrac{4}{h^2}$ (the largest eigenvalue),
$e_m = \sum_{k=1}^{N}\left(1 - \dfrac{\lambda_k}{\sigma}\right)^{m}\varphi^k = \sum_{k=1}^{N}\left(1 - \sin^2\!\left(\dfrac{k\pi}{2(N+1)}\right)\right)^{m}\varphi^k = \sum_{k=1}^{N}\alpha_k^m\,\varphi^k$
after m relaxations. $\alpha_k^m \to 0$ more quickly for k close to N.

[Figure: damped Jacobi relaxation, $e^m = (I - \omega D^{-1}A)^m e^0$, applied to the initial error $e^0_j = \sin\!\left(\frac{2j\pi}{N+1}\right) + \frac12\sin\!\left(\frac{16j\pi}{N+1}\right) + \frac12\sin\!\left(\frac{32j\pi}{N+1}\right)$: the oscillatory components are damped after a few sweeps (see the numerical sketch below).]

2. Smooth error modes are more oscillatory on coarse grids. Smooth errors can be better corrected by relaxation on coarser grids.

[Figure: the same smooth error plotted on the fine grid and on the coarse grid.]

• Relaxation convergence rate on the fine grid: $1 - O(h^2)$.
• Relaxation convergence rate on the coarse grid: $1 - O(4h^2)$.
Remember: $\alpha_1 \approx 1 - \left(\dfrac{\pi}{2(N+1)}\right)^2 = 1 - O(h^2)$ for $N \gg 1$.
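A small numerical sketch (not from the slides) of the smoothing effect just described, using damped Jacobi on the 1-D Laplacian; the damping factor ω = 2/3 is an assumption, chosen only for illustration:

```python
import numpy as np

# Damped Jacobi on the 1-D Laplacian: high-frequency Fourier modes of the
# error are damped much faster than low-frequency ones.
N = 63                                    # interior nodes, h = 1/(N+1)
h = 1.0 / (N + 1)
A = (np.diag(2*np.ones(N)) - np.diag(np.ones(N-1), 1) - np.diag(np.ones(N-1), -1)) / h**2
omega = 2.0 / 3.0                         # damping choice (assumption)
E = np.eye(N) - omega * np.diag(1.0 / np.diag(A)) @ A    # I - omega D^{-1} A

j = np.arange(1, N + 1)
e = (np.sin(2*j*np.pi/(N+1)) + 0.5*np.sin(16*j*np.pi/(N+1)) + 0.5*np.sin(32*j*np.pi/(N+1)))
for _ in range(5):                        # a few relaxation sweeps
    e = E @ e

# Discrete sine coefficients of the remaining error:
coeff = lambda k: 2.0/(N+1) * np.sin(k*j*np.pi/(N+1)) @ e
print([round(coeff(k), 3) for k in (2, 16, 32)])
# The oscillatory components (k = 16, 32) shrink; the smooth k = 2 component barely changes.
```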
3. The smooth error is corrected by the coarse-grid correction operator

$E_c = I - I_H^{h} A_H^{-1} I_h^{H} A_h = \left(A_h^{-1} - I_H^{h} A_H^{-1} I_h^{H}\right) A_h$,

where $I_h^{H}$ and $I_H^{h}$ are called the restriction and prolongation operators, respectively.

• $A_H$ can be obtained from discretization on the coarse grid.

• Galerkin formulation: $A_H = I_h^{H} A_h I_H^{h}$ with $I_h^{H} = c\,(I_H^{h})^{T}$, which implies $I_h^{H} A_h E_c = 0$ and $E_c\, I_H^{h} = 0$.

It follows that
• $E_c$ is an A-orthogonal projection: $\left(A_h E_c e,\ I_H^{h} e_H\right) = 0$;
• $N(E_c) = R(I_H^{h})$;
• $R(E_c) = N(I_h^{H} A_h)$, and $E_c$ is the identity on $N(I_h^{H} A_h)$.
A Picture That Shows How Multigrid Works!

[Figure: the error sketched in the plane spanned by $R(I_H^{h})$ and $N(I_h^{H} A_h)$; relaxation damps the high-frequency (H) component lying in $N(I_h^{H} A_h)$, and coarse-grid correction removes the component in $R(I_H^{h})$, leaving only a small low (L) residual error.]
Consider $I_H^{h} = \left[\tfrac12,\ 1,\ \tfrac12\right]^{T}$ (linear interpolation) and $I_h^{H} = \tfrac12\left[\tfrac12,\ 1,\ \tfrac12\right]$.
It is easy to check that $A_H = I_h^{H} A_h I_H^{h}$ is the discretization of L on $\mathcal{T}_H$.
Now, for any $v \in V_h$, let $f_v = A_h v$. One can consider v and $v_H = I_H^{h} A_H^{-1} I_h^{H} f_v$ as finite element approximations of $\hat v$, the solution of $a(\hat v, w) = (f_v, w)$. Then, from the FEM error estimates and H²-regularity, we have

$\| E_c(v) \|_k = \left\|\left(A_h^{-1} - I_H^{h} A_H^{-1} I_h^{H}\right)(A_h v)\right\|_k = \| \hat v - v_H - (\hat v - v) \|_k$ ----- (∗)
$\le C h^{2-k} \| \hat v \|_2 \le C h^{2-k} \| f_v \|_0 = C h^{2-k} \| A_h v \|_0$.

Consider the eigenfunction $\varphi^k_j$ with $k \ll \frac{N}{2}$; $\varphi^k_j$ is also an eigenfunction of $A_H$. We have $\| E_c(\varphi^k_j) \|_1 \le C h \lambda_k = O(h)$. This shows that the coarse-grid correction fixes the low-frequency errors. For $k \approx N$, $\| E_c(\varphi^k_j) \|_1 \le \dfrac{4C}{h}$: the high-frequency errors can be amplified by the coarse-grid correction.

Multigrid Algorithm

Multigrid (MG) algorithm, $\mathrm{MG}_k(w_k, g_k)$:
1. $x_k = w_k$
2. (pre-smoothing) $x_k = x_k + M_k^{-1}\left(g_k - A_k x_k\right)$
3. (restriction) $g_{k-1} = I_k^{k-1}\left(g_k - A_k x_k\right)$
4. (correction) $q_i = \mathrm{MG}_{k-1}(q_{i-1}, g_{k-1})$ for $1 \le i \le m$, m = 1 or 2, with $q_0 = 0$
5. (prolongation) $q_m = I_{k-1}^{k}\, q_m$
6. Set $x_k = x_k + q_m$
7. (post-smoothing) $x_k = x_k + M_k^{-1}\left(g_k - A_k x_k\right)$
8. Set $\mathrm{MG}_k(w_k, g_k) = x_k$

MG error reduction operator:

$E_{mg} = \left(A_h^{-1} - I_H^{h} A_H^{-1} I_h^{H}\right) A_h E_s$ (pre-smoothing only), $\qquad E_{mg} = E_s\left(I - I_H^{h} A_H^{-1} I_h^{H} A_h\right)$ (post-smoothing only).
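A minimal recursive sketch of steps 1-8 (an illustration, not the author's code), assuming a 1-D Poisson hierarchy, damped Jacobi smoothing, linear interpolation, and $I_h^{H} = 0.5\,(I_H^{h})^{T}$; all names and parameter choices are illustrative:

```python
import numpy as np

def mg_cycle(k, w, g, A, P, R, m=1, nu=2, omega=2.0/3.0):
    """Recursive MG cycle on level k (k = 0 is the coarsest grid).
    A[k]: level-k matrix, P[k]: prolongation k-1 -> k, R[k]: restriction k -> k-1."""
    if k == 0:
        return np.linalg.solve(A[0], g)          # direct solve on the coarsest grid
    x = w.copy()                                 # step 1
    D = np.diag(A[k])
    for _ in range(nu):                          # step 2: pre-smoothing (damped Jacobi)
        x += omega * (g - A[k] @ x) / D
    g_c = R[k] @ (g - A[k] @ x)                  # step 3: restrict the residual
    q = np.zeros(A[k-1].shape[0])
    for _ in range(m):                           # step 4: m = 1 gives a V-cycle, m = 2 a W-cycle
        q = mg_cycle(k-1, q, g_c, A, P, R, m, nu, omega)
    x += P[k] @ q                                # steps 5-6: prolongate and correct
    for _ in range(nu):                          # step 7: post-smoothing
        x += omega * (g - A[k] @ x) / D
    return x                                     # step 8

def setup_1d(levels, n_finest):
    """1-D Laplacian hierarchy with linear interpolation P and R = 0.5 * P^T."""
    A, P, R = {}, {}, {}
    n = n_finest
    for k in range(levels, -1, -1):
        h = 1.0 / (n + 1)
        A[k] = (np.diag(2*np.ones(n)) - np.diag(np.ones(n-1), 1) - np.diag(np.ones(n-1), -1)) / h**2
        if k > 0:
            nc = (n - 1) // 2                    # coarse grid size (n = 2^p - 1)
            Pk = np.zeros((n, nc))
            for j in range(nc):                  # stencil [1/2, 1, 1/2]
                Pk[2*j, j] = 0.5; Pk[2*j+1, j] = 1.0; Pk[2*j+2, j] = 0.5
            P[k], R[k] = Pk, 0.5 * Pk.T
            n = nc
    return A, P, R

# Usage sketch: a few V-cycles for -u'' = 1 on 63 interior points
A, P, R = setup_1d(levels=4, n_finest=2**6 - 1)
b = np.ones(2**6 - 1)
x = np.zeros_like(b)
for _ in range(5):
    x = mg_cycle(4, x, b, A, P, R)
```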
Multigrid Cycles

[Figures: multigrid cycle with m = 1 (V-cycle) and with m = 2 (W-cycle); Full Multigrid cycle (FMG).]

[Figures: damped Jacobi relaxation and its smoothing effect; multigrid V-cycle convergence for the 1-D Laplacian. Results provided by 曾昱豪, NCTU.]
MG Convergence

Smoothing property: $\| A_l E_s^{m} \| \le \eta(m)\, \| A_l \|$ for all $0 \le m < \infty$ and $l > 0$, with $\eta(m) \to 0$ as $m \to \infty$.

Approximation property: $\left\| A_l^{-1} - I_H^{h} A_{l-1}^{-1} I_h^{H} \right\| \le \dfrac{C_A}{\| A_l \|}$ for all $l > 0$.

The idea for proving the approximation property was shown in the derivation of (∗) above.

Proof of the smoothing property:
Consider $E_S = E_R = I - \frac{1}{\Lambda} A_h$. Let $v \in V_h$ and $v_S^m = E_S^m(v)$. From the Fourier expansion $v = \sum_k v_k \varphi^k$, we have $v_S^m = \left(I - \frac{1}{\Lambda}A_h\right)^m v = \sum_k \left(1 - \frac{\lambda_k}{\Lambda}\right)^m v_k \varphi^k$. Therefore

$\| A_h E_R^m(v) \|_0^2 \le \sum_k \left(1 - \frac{\lambda_k}{\Lambda}\right)^{2m} \lambda_k^2 v_k^2 = \Lambda \sum_k \left(1 - \frac{\lambda_k}{\Lambda}\right)^{2m} \frac{\lambda_k}{\Lambda}\, \lambda_k v_k^2 \le \Lambda \sup_{0 \le x \le 1}\left\{(1-x)^{2m} x\right\} (A_h v, v) \le \frac{C h^{-2}}{m}\, \| v \|_0\, \| A_h v \|_0$,

using $\Lambda \le C h^{-2}$ (Lemma 3), $\sup_{0\le x\le 1}(1-x)^{2m}x \le \frac{C}{m}$, and $(A_h v, v) \le \| v \|_0 \| A_h v \|_0$. Since $\| A_h v \|_0 \le \| A_h \| \| v \|_0$ and $h^{-2} \le C\| A_h \|$, clearly

$\| A_h E_R^m(v) \|_0 \le \frac{C}{\sqrt{m}}\, \| A_h \|\, \| v \|_0 \ \Rightarrow\ \| A_h E_R^m \| \le \eta(m)\, \| A_h \|$, with $\eta(m) \to 0$ as $m \to \infty$.

HW5: Prove that MG with the Richardson smoother is convergent in the $\|\cdot\|_1$-norm.
Choices of Interpolations and Coarse Grids

• Linear interpolation: prolongation $p = \dfrac14\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$ and restriction $r = \dfrac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$.

• Operator-dependent interpolation: De Zeeuw 1990.

• The rule for choosing interpolation and restriction (Brandt 1977):
$m_p + m_r > 2m$,
where 2m is the order of the PDE, $m_p - 1$ is the degree of polynomials exactly interpolated by $I_H^{h}$, and $m_r - 1$ is the degree of polynomials exactly interpolated by $(I_h^{H})^{T}$.

• Coarse grid selection: regular coarsening, semi-coarsening, algebraic coarsening.
Choices of Smoothers
• Stationary iterative methods: Jacobi, Gauss-Seidel, SOR, ...
• Block-type stationary iterative methods (blocks can be determined by the way we number the nodes).

[Figures: vertical line ordering, horizontal line ordering, red-black ordering.]

Matrix of the 2-D Laplacian:
$A = \begin{bmatrix} T & -I & & \\ -I & T & -I & \\ & -I & T & -I \\ & & -I & T \end{bmatrix}$, $\quad T = \mathrm{tridiag}[-1,\ 4,\ -1]$.

[Figures: matrix pattern for line ordering; matrix pattern for red-black (R-B) ordering.]
HW6: Write an MG code for solving the 1-D problem
$-u'' = f$, $\quad u(0) = u(1) = 0$,
using the linear interpolant $I_H^{h}$, the restriction $I_h^{H} = 0.5\,(I_H^{h})^{T}$, and the red-black Gauss-Seidel smoother:
update the even (red) points first: $x_{2i} = 0.5\,(f_{2i} + x_{2i-1} + x_{2i+1})$;
then update the odd (black) points: $x_{2i+1} = 0.5\,(f_{2i+1} + x_{2i} + x_{2i+2})$.
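As a sketch of the HW6 smoother (an illustration, not the assigned solution), the two half-sweeps vectorize directly; here x includes the two boundary values and f is assumed to already carry the h² scaling:

```python
import numpy as np

def red_black_gs_sweep(x, f):
    """One red-black Gauss-Seidel sweep for -u'' = f on a uniform grid.
    x[0] and x[-1] hold boundary values; f is assumed pre-scaled by h^2."""
    x[2:-1:2] = 0.5 * (f[2:-1:2] + x[1:-2:2] + x[3::2])   # even (red) interior points
    x[1:-1:2] = 0.5 * (f[1:-1:2] + x[0:-2:2] + x[2::2])   # odd (black) interior points
    return x
```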

Brandt's Local Mode Analysis

For analyzing the robustness of a smoother, Brandt's local mode analysis is a useful tool. Here we demonstrate the method on the Jacobi and Gauss-Seidel relaxations for the 2-D Laplace equation with periodic boundary conditions.

Brandt's smoothing factor:
Let ε be the error before relaxation. From discrete Fourier transform theory, ε can be written as

$\varepsilon_{i,j} = \sum_{\theta \in \Theta_n} \hat\varepsilon_\theta\, \phi_{i,j}(\theta)$ ------ (i), where $\theta = (\theta_1, \theta_2)$, $\phi_{i,j}(\theta) = e^{\mathrm{i}(i\theta_1 + j\theta_2)}$,

$\hat\varepsilon_\theta = \dfrac{1}{(n+1)^2} \sum_{1 \le k,l \le n} \varepsilon_{k,l}\, \phi_{k,l}(-\theta)$, and $\Theta_n = \left\{ \dfrac{2\pi}{n+1}(k,l)\ :\ -\dfrac{n+1}{2} \le k,l \le \dfrac{n+3}{2},\ n\ \text{odd} \right\}$.

Similarly, the error $\bar\varepsilon$ obtained after relaxation can be written as $\bar\varepsilon_{i,j} = \sum_{\theta\in\Theta_n} \hat{\bar\varepsilon}_\theta\, \phi_{i,j}(\theta)$ ----- (ii). Let $\lambda(\theta) \equiv \dfrac{\hat{\bar\varepsilon}_\theta}{\hat\varepsilon_\theta}$. Brandt's smoothing factor is defined as

$\bar\rho = \sup\left\{ |\lambda(\theta)|\ :\ \dfrac{\pi}{2} \le \max\left(|\theta_1|, |\theta_2|\right) \le \pi \right\}$.
Smoothing Factor of the Damped Jacobi Iteration

Recall that $\bar\varepsilon_{i,j} = \varepsilon_{i,j} - \dfrac{\omega}{4}\left(4\varepsilon_{i,j} - \left(\varepsilon_{i+1,j} + \varepsilon_{i-1,j} + \varepsilon_{i,j+1} + \varepsilon_{i,j-1}\right)\right)$. Plugging (i) and (ii) into this and using $\phi_{i\pm1,j}(\theta) = \phi_{i,j}(\theta)\,e^{\pm\mathrm{i}\theta_1}$ and $\phi_{i,j\pm1}(\theta) = \phi_{i,j}(\theta)\,e^{\pm\mathrm{i}\theta_2}$, we obtain

$\sum_{\theta\in\Theta_n}\hat{\bar\varepsilon}_\theta\,\phi_{i,j}(\theta) = \sum_{\theta\in\Theta_n}\hat\varepsilon_\theta\left\{1 - \omega\left(1 - \dfrac{\cos\theta_1 + \cos\theta_2}{2}\right)\right\}\phi_{i,j}(\theta)$.

Therefore $\lambda(\theta) = 1 - \omega\left(1 - \dfrac{\cos\theta_1 + \cos\theta_2}{2}\right)$. Over the high-frequency range, $\dfrac{\cos\theta_1 + \cos\theta_2}{2}$ runs between $-1$ and $\dfrac12$, so it is easy to see that

$\bar\rho = \max\left\{ \left|1 - \dfrac{\omega}{2}\right|,\ \left|1 - 2\omega\right| \right\}$.

The optimal ω that minimizes $\bar\rho$ is $\dfrac45$, and the smoothing factor is $\bar\rho = 0.6$ for this ω.

HW7: Show that the smoothing factor of the Gauss-Seidel iteration is 0.5.
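A quick numerical check (an illustration, not from the slides) of the damped Jacobi smoothing factor just derived, sampling $|\lambda(\theta)|$ over the high-frequency range:

```python
import numpy as np

def smoothing_factor(omega, n=401):
    """Max of |1 - omega*(1 - (cos t1 + cos t2)/2)| over pi/2 <= max(|t1|,|t2|) <= pi."""
    t = np.linspace(-np.pi, np.pi, n)
    T1, T2 = np.meshgrid(t, t)
    lam = np.abs(1.0 - omega * (1.0 - (np.cos(T1) + np.cos(T2)) / 2.0))
    high = np.maximum(np.abs(T1), np.abs(T2)) >= np.pi / 2
    return lam[high].max()

print(smoothing_factor(0.8))   # ~0.6 at the optimal omega = 4/5
print(smoothing_factor(1.0))   # plain Jacobi: ~1.0 (the checkerboard mode is not damped)
```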
How Much Does Multigrid Cost?

Convergence factor:
• Stationary methods: $\approx 1 - O(\kappa^{-1}) \approx 1 - h^2$
• Conjugate gradient: $\approx 1 - O(\kappa^{-1/2}) \approx 1 - h$
• Multigrid: O(1), independent of h.

How much does each MG step cost?

Ignoring the cost associated with inter-grid transfers (typically within 10-20%), the computation cost of one MG V-cycle is
$2cn^d\left(1 + 2^{-d} + 2^{-2d} + \cdots\right) = \dfrac{2cn^d}{1 - 2^{-d}}$,
where $n^d$ is the total number of points, d is the dimension of the problem, c is the cost of updating a single unknown, and $cn^d$ is the cost per relaxation sweep.
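As a quick worked check of this bound (an illustration, not from the slides): in two dimensions the geometric factor is $2/(1 - 2^{-2}) = 8/3$, so one V-cycle costs about $\tfrac{8}{3}cn^2$, roughly the work of three relaxation sweeps on the finest grid; in three dimensions the factor is $2/(1 - 2^{-3}) = 16/7 \approx 2.3$.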
Standard MG can fail!
• The original PDE has poor coercivity or regularity (for example, crack problems, convection-diffusion problems, etc.):
 - Relaxation may not smooth the error.
 - Coarse-grid correction can only capture a small portion of the error, or even make it worse!

[Figure: a sketch in the $R(I_H^{h})$ / $N(I_h^{H}A_h)$ plane illustrating why MG converges slowly in this case.]

• Next, let's consider the following example:

Ex2: $-\dfrac{d}{dx}\left(c(x)\dfrac{du}{dx}\right) = f(x)$, $\quad u(0) = u(1) = 0$, where $c(x) = \begin{cases} \varepsilon, & 0 \le x \le i_0 h \\ 1, & i_0 h < x \le i_1 h \\ \varepsilon, & i_1 h < x \le 1 \end{cases}$ ---- (+)
The discrete matrix of (+) is

$A_h = \begin{bmatrix}
2\varepsilon & -\varepsilon & & & & & & \\
-\varepsilon & 2\varepsilon & -\varepsilon & & & & & \\
 & \ddots & \ddots & \ddots & & & & \\
 & & -\varepsilon & 1+\varepsilon & -1 & & & \\
 & & & -1 & 2 & -1 & & \\
 & & & & \ddots & \ddots & \ddots & \\
 & & & & -1 & 1+\varepsilon & -\varepsilon & \\
 & & & & & -\varepsilon & 2\varepsilon & -\varepsilon \\
 & & & & & & \ddots & \ddots \\
 & & & & & & -\varepsilon & 2\varepsilon
\end{bmatrix}$

and the damped Jacobi error reduction matrix of $A_h$ is

$E_{DJ} = I - \omega D^{-1}A_h = I - \omega\begin{bmatrix}
1 & -1/2 & & & & & & \\
-1/2 & 1 & -1/2 & & & & & \\
 & \ddots & \ddots & \ddots & & & & \\
 & & -\varepsilon/(1+\varepsilon) & 1 & -1/(1+\varepsilon) & & & \\
 & & & -1/2 & 1 & -1/2 & & \\
 & & & & \ddots & \ddots & \ddots & \\
 & & & & -1/(1+\varepsilon) & 1 & -\varepsilon/(1+\varepsilon) & \\
 & & & & & -1/2 & 1 & -1/2 \\
 & & & & & & \ddots & \ddots \\
 & & & & & & -1/2 & 1
\end{bmatrix}$
For $\varepsilon \to 0$, the eigenvector corresponding to the largest eigenvalue $\lambda^{(0)}$ of $E_{DJ}$ converges to the vector $e^{(0)}$ while $\lambda^{(0)} \to 1$, where

$\left(e^{(0)}\right)_i = \begin{cases} ih, & 0 \le x_i \le i_0 h,\\[2pt] i_0 h, & i_0 h < x_i \le i_1 h,\\[2pt] \dfrac{i_0}{\,i_1 - n - 1\,}\left(ih - (n+1)h\right), & i_1 h < x_i \le 1. \end{cases}$

[Figure: plot of $e^{(0)}$, a ramp-plateau-ramp profile with kinks at $i_0 h$ and $i_1 h$.]

Damped Jacobi fails to smooth the high-frequency error!

MG convergence deteriorates as $\varepsilon \to 0$.
A remedy for this is to use operator-dependent interpolation, but constructing such an interpolation is not easy.
There is, however, an "easier and better" way to do it!
Algebraic Multigrid

MG:
1. A priori generated coarse grids are needed; the coarse grids are generated from geometric information about the domain.
2. Interpolation operators are defined independently of the coarsening process.
3. The smoother is not always fixed.

AMG:
1. A priori generated coarse grids are not needed; coarse grids are generated by algebraic coarsening from the matrix on the fine grid.
2. Interpolation operators are defined dynamically in the coarsening process.
3. The smoother is fixed.

Ideas:
• Fix the smoothing operator.
• Carefully select the coarse grids and define the interpolation weights.
AMG Convergence

Smoothing assumption: $\exists\, \alpha > 0$ such that $\| E_s e \|_1^2 \le \| e \|_1^2 - \alpha \| e \|_2^2$ for all $e \in V_h$.

Approximation assumption: $\min_{e_H} \| e - I_H^{h} e_H \|_0^2 \le \beta\, \| e \|_1^2$, where β is independent of e.

Then

$\| E_c e \|_1^2 = \left(A E_c e,\ E_c e - I_H^{h} e_H\right) \le \| E_c e \|_2\, \| E_c e - I_H^{h} e_H \|_0 \le \sqrt{\beta}\, \| E_c e \|_2\, \| E_c e \|_1$,

$\| E_s E_c e \|_1^2 \le \| E_c e \|_1^2 - \alpha \| E_c e \|_2^2 \le \left(1 - \dfrac{\alpha}{\beta}\right) \| E_c e \|_1^2 \le \left(1 - \dfrac{\alpha}{\beta}\right) \| e \|_1^2$,

where $\| v \|_0^2 = (Dv, v)$, $\| v \|_1^2 = (Av, v)$, and $\| v \|_2^2 = (D^{-1}Av, Av)$ for $v \in V_h$.

AMG works when A is a symmetric positive definite M-matrix.

In the following, we assume that A is also weakly diagonally dominant.
What Does the Smoothing Assumption Tell Us?

• Smooth error is characterized by $\| E_s e_s \|_1 \approx \| e_s \|_1$, i.e. $\| e_s \|_2$ is very small.

$\| e \|_1^2 = (Ae, e) = \left(D^{-1/2}Ae,\ D^{1/2}e\right) \le \| e \|_2\, \| e \|_0 \ \Rightarrow\ \| e \|_1 \ll \| e \|_0$.

$(Ae, e) = \dfrac12 \sum_{i,j} (-a_{i,j})\left(e_i - e_j\right)^2 + \sum_i \left(\sum_j a_{i,j}\right) e_i^2 \ \ll\ \sum_i a_{i,i}\, e_i^2$,

so, locally, $\dfrac12 \sum_j (-a_{i,j})\left(e_i - e_j\right)^2 \ll a_{i,i}\, e_i^2$, i.e.

$\sum_{j \ne i} \dfrac{|a_{i,j}|}{a_{i,i}}\, \dfrac{\left(e_i - e_j\right)^2}{2\, e_i^2} \ll 1$.

• Smooth errors vary slowly in the direction of strong connections, i.e. from $e_i$ to $e_j$ where $|a_{i,j}| / a_{i,i}$ is large.

• AMG coarsening should be done in the direction of the strong connections.

• In the coarsening process, the interpolation weights are computed so that the approximation assumption is satisfied (for details see Ruge and Stüben 1985).
What Does the Approximation Assumption Tell Us?

The approximation assumption $\min_{e_H} \| e - I_H^{h} e_H \|_0^2 \le \beta \| e \|_1^2$ reads

$\sum_{i \in F} a_{ii}\left(e_i - \sum_{k \in C} w_{ik} e_k\right)^2 \le \beta\left(\dfrac12 \sum_{i,j} (-a_{ij})\left(e_i - e_j\right)^2 + \sum_i \left(\sum_j a_{ij}\right) e_i^2\right)$.

Since

$\sum_{i \in F} a_{ii}\left(e_i - \sum_{k \in C} w_{ik} e_k\right)^2 = \sum_{i \in F} a_{ii}\left(\sum_{k \in C} w_{ik}\left(e_i - e_k\right) + (1 - s_i) e_i\right)^2 \le \sum_{i \in F} a_{ii}\left(\sum_{k \in C} w_{ik}\left(e_i - e_k\right)^2 + (1 - s_i) e_i^2\right)$,

where $w_{ik} > 0$ is the interpolation weight from node k to node i and $s_i = \sum_{k \in C} w_{ik} < 1$, clearly, if

$(\Theta)\quad \begin{cases} \displaystyle\sum_{i \in F} a_{ii} \sum_{k \in C} w_{ik}\left(e_i - e_k\right)^2 \le \frac{\beta}{2} \sum_{i,j} (-a_{ij})\left(e_i - e_j\right)^2, \\[8pt] \displaystyle\sum_{i \in F} a_{ii}\left(1 - s_i\right) e_i^2 \le \beta \sum_i \left(\sum_j a_{ij}\right) e_i^2, \end{cases}$

then the approximation assumption holds. For (Θ) to hold, we can simply require

$(\Xi)\quad 0 \le a_{ii} w_{ik} \le \beta\, |a_{ik}| \quad\text{and}\quad 0 \le a_{ii}\left(1 - s_i\right) \le \beta \sum_k a_{ik}$.
Lemma 5: Given $\beta \ge 1$, suppose the coarse grid C is selected such that

$a_{i,i} + \sum_{j \notin C_i,\ j \ne i} a_{i,j} = \sum_{j \notin C_i} a_{i,j} \ \ge\ \dfrac{1}{\beta}\, a_{i,i}$,

where $C_i = N_i \cap C$, C is the coarse grid, and $N_i$ is the set of neighbors of the i-th node. Then the approximation assumption holds if the interpolation weights are defined as

$(\Phi)\quad w_{i,k} = \dfrac{|a_{i,k}|}{\sum_{j \notin C_i} a_{i,j}}, \quad k \in C_i$.

Indeed,

$a_{i,i}\, w_{i,k} = \dfrac{a_{i,i}}{\sum_{j \notin C_i} a_{i,j}}\, |a_{i,k}| \le \beta\, |a_{i,k}|$,

and

$a_{i,i}\left(1 - s_i\right) = a_{i,i}\left(1 - \sum_{k \in C_i} w_{i,k}\right) = a_{i,i}\, \dfrac{\sum_j a_{i,j}}{\sum_{j \notin C_i} a_{i,j}} \le \beta \sum_j a_{i,j}$.

Therefore (Ξ) holds. From the arguments on the previous slide, we conclude that the approximation assumption holds.
The Smoothing Property Holds for Gauss-Seidel

Recall $E_S = I - B^{-1}A$. We have

$\| E_{GS} e \|_1^2 = \left(A\left(I - B^{-1}A\right)e,\ \left(I - B^{-1}A\right)e\right)$
$= (Ae, e) - \left(AB^{-1}Ae, e\right) - \left(Ae, B^{-1}Ae\right) + \left(AB^{-1}Ae, B^{-1}Ae\right)$
$= \| e \|_1^2 - \left(\left(B^{T} + B - A\right) B^{-1}Ae,\ B^{-1}Ae\right)$.

The smoothing assumption is therefore equivalent to

$\alpha \| e \|_2^2 \le \left(\left(B^{T} + B - A\right) B^{-1}Ae,\ B^{-1}Ae\right)$ ---- (Θ).

Let $\tilde e = B^{-1}Ae$. Since $\| e \|_2^2 = \left(D^{-1}Ae, Ae\right) = \left(D^{-1}B\tilde e, B\tilde e\right)$, clearly

$(\Theta) \equiv \alpha\left(D^{-1}B\tilde e,\ B\tilde e\right) \le \left(\left(B^{T} + B - A\right)\tilde e,\ \tilde e\right)$ ---- (ΘΘ).

Now consider $B = D - L$. Since $B^{T} + B - A = D$, (ΘΘ) becomes

$\alpha\, \dfrac{\left(B^{T}D^{-1}B\,\tilde e,\ \tilde e\right)}{\left(D\tilde e,\ \tilde e\right)} = \alpha\, \dfrac{\left(D^{-1/2}B^{T}D^{-1}B\,D^{-1/2}\hat e,\ \hat e\right)}{\left(\hat e,\ \hat e\right)} \le 1$, with $\hat e = D^{1/2}\tilde e$,

i.e. $\alpha \le \dfrac{1}{\rho\left(D^{-1}B^{T}D^{-1}B\right)}$ ---- (ΘΘΘ).

Therefore the smoothing assumption holds for the Gauss-Seidel iteration.

If A is a diagonally dominant M-matrix, we can estimate α as follows. Since

$\rho\left(D^{-1}B^{T}D^{-1}B\right) \le \left\|I - D^{-1}L^{T}\right\|_\infty \left\|I - D^{-1}L\right\|_\infty \le \left(1 + \left\|D^{-1}L^{T}\right\|_\infty\right)\left(1 + \left\|D^{-1}L\right\|_\infty\right)$

and $\left\|D^{-1}L\right\|_\infty \le \max_{1 \le i \le n}\left\{\sum_{j=1,\, j\ne i}^{n} \dfrac{|a_{i,j}|}{a_{i,i}}\right\} \le 1$ (similarly for $L^{T}$, by symmetry), clearly we have $\dfrac{1}{\rho\left(D^{-1}B^{T}D^{-1}B\right)} \ge \dfrac14$.

Therefore, the Gauss-Seidel iteration satisfies the smoothing property with $\alpha = \dfrac14$.

In fact, for symmetric M-matrices the smoothing assumption holds for both the Gauss-Seidel and the Jacobi iterations.

Furthermore, one can also show that the coarse-grid matrix $A_H = (I_H^{h})^{T} A_h I_H^{h}$ is also a diagonally dominant M-matrix when $A_h$ is a diagonally dominant M-matrix and the interpolation weights satisfy (Ξ) and (Φ).
AMG Coarsening Criteria

First, let's define the following sets:

$N_i^{S} = \left\{ j\ :\ -a_{i,j} \ge \gamma \max_{m \ne i}\left(-a_{i,m}\right) \right\}, \quad 0 < \gamma < 1$

$\left(N_i^{S}\right)^{T} = \left\{ j\ :\ i \in N_j^{S} \right\}$

Here $N_i^{S}$ is the set of nodes that node i strongly connects to, and $\left(N_i^{S}\right)^{T}$ is the set of nodes that strongly connect to node i.

• $C_i$-nodes should be chosen from $N_i^{S}$.

• From the convergence result, we want β close to 1. This suggests we need a larger set $C_i$ (we need to choose a small γ).

• We don't want C = ∅, but we want C to be as small as possible.

Criterion 1. For each node i in F, every node j in $N_i^{S}$ should be either in C or strongly connected to at least one node in $C_i$.

Criterion 2. C should be a maximal subset of all nodes with the property that no two C-nodes are strongly connected to each other.
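A simplified sketch (an illustration only) of the strong-connection sets and a greedy first-pass C/F splitting in the spirit of the criteria above; the full Ruge-Stüben algorithm also has a second pass enforcing Criterion 1, and all names here are illustrative:

```python
import numpy as np

def strong_connections(A, gamma=0.25):
    """N_i^S: j is a strong connection of i if -a_ij >= gamma * max_{m != i}(-a_im)."""
    n = A.shape[0]
    S = [set() for _ in range(n)]
    for i in range(n):
        off = -A[i].copy()
        off[i] = -np.inf                            # exclude the diagonal from the max
        thresh = gamma * off.max()
        if thresh > 0:
            S[i] = {j for j in range(n) if j != i and -A[i, j] >= thresh}
    return S

def greedy_cf_split(S):
    """First-pass C/F splitting: repeatedly pick the node that strongly influences
    the most undecided nodes as a C-point; the nodes it influences become F-points."""
    n = len(S)
    ST = [set() for _ in range(n)]                  # (N_i^S)^T: nodes that i influences
    for i in range(n):
        for j in S[i]:
            ST[j].add(i)
    undecided, C, F = set(range(n)), set(), set()
    while undecided:
        i = max(undecided, key=lambda k: len(ST[k] & undecided))
        C.add(i); undecided.discard(i)
        for j in ST[i] & undecided:
            F.add(j); undecided.discard(j)
    return C, F
```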
AMG Coarsening (I) and (II)

[Figures: illustration of the strong-connection sets $N_i^{S}$ and $(N_i^{S})^{T}$ on a matrix stencil, and of the resulting selection of coarse (C) and fine (F) points.]
AMG Coarsening: Example 1
Laplace operator from a Galerkin FEM discretization.

[Figure: the resulting coarsening of the FEM Laplacian.]

A very good MG and AMG tutorial resource (by Van Emden Henson): http://www.llnl.gov/CASC/people/henson
AMG Coarsening: Example 2
Convection-diffusion with characteristic and downstream layers:

$-\varepsilon\Delta u + \dfrac{\partial u}{\partial y} = 0$, $\quad u|_{\partial\Omega} = \begin{cases} 1 & \text{if } (y = 0 \text{ and } x > 0) \text{ or } x = 1, \\ 0 & \text{otherwise}, \end{cases}$ where $\Omega = [-1,1] \times [-1,1]$.

[Figures: solution from the Galerkin discretization on a 32×32 grid; solution from the SDFEM discretization on a 32×32 grid.]

The SDFEM discretization with $\delta_T = \dfrac{h}{2}$ yields the matrix stencil

$\begin{bmatrix} -\dfrac{\varepsilon}{3} & -\dfrac{\varepsilon}{3} & -\dfrac{\varepsilon}{3} \\[6pt] \dfrac{h}{6} - \dfrac{\varepsilon}{3} & \dfrac{2h}{3} + \dfrac{8\varepsilon}{3} & \dfrac{h}{6} - \dfrac{\varepsilon}{3} \\[6pt] -\dfrac{h}{6} - \dfrac{\varepsilon}{3} & -\dfrac{2h}{3} - \dfrac{\varepsilon}{3} & -\dfrac{h}{6} - \dfrac{\varepsilon}{3} \end{bmatrix}$

[Figure: C-points from AMG coarsening with strong-connection parameter $\varepsilon/h \ll \beta \ll 0.25$.]

[Figures: coarse grids from GMG coarsening vs. coarse grids from AMG coarsening.]
Example 2: GMG vs. AMG

On the uniform mesh:

          ε = 10^-2        ε = 10^-3        ε = 10^-4
 level   GMG   AMG        GMG   AMG        GMG   AMG
   3      13     7         27     8         51    11
   2      13     6         26     7         35     8
   1      12     6         16     6         17     6

On the adaptive mesh:

          ε = 10^-2        ε = 10^-3        ε = 10^-4
 level   GMG   AMG        GMG   AMG        GMG   AMG
   4       9     6         22     8         59    14
   3       8     8         24     9         57    10
   2       7     6         18     8         47     8
   1       7     5         17     7         34     7


Nonlinear Multigrid

Nonlinear problems: $L(u) = f$; after discretization, one needs to solve

$A_h(u_h) = f_h \ \equiv\ \begin{pmatrix} a_1(u_1, u_2, \ldots, u_n) \\ a_2(u_1, u_2, \ldots, u_n) \\ \vdots \\ a_n(u_1, u_2, \ldots, u_n) \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{pmatrix}$.

Method 1: Linearize $A_h$ by Newton's method and solve the resulting linear systems by multigrid:

$u_{j+1} = u_j - \left[\dfrac{D}{Du} A_h(u_j)\right]^{-1}\left(A_h(u_j) - f\right)$.

Method 2: Nonlinear multigrid, the so-called full approximation storage scheme (FAS):
• Nonlinear relaxation
• Nonlinear defect correction
Nonlinear Relaxation:

Jacobi: solve $a_i\left(u_1^{old}, \ldots, u_{i-1}^{old}, u_i^{new}, u_{i+1}^{old}, \ldots, u_n^{old}\right) = f_i$ for $u_i^{new}$, for all i = 1, 2, ..., n.

Gauss-Seidel: solve $a_i\left(u_1^{new}, \ldots, u_{i-1}^{new}, u_i^{new}, u_{i+1}^{old}, \ldots, u_n^{old}\right) = f_i$ for $u_i^{new}$, for all i = 1, 2, ..., n.

The local nonlinear problems are solved iteratively.

Example (nonlinear Gauss-Seidel): discretize the problem and apply a scalar Newton iteration to each unknown j, starting from j = 1 (a worked instance appears in the FAS example below).


Nonlinear defect correction:

In the linear case: $r_h^{(n)} = A_h(u_h) - A_h\left(u_h^{(n)}\right) = A_h\left(u_h - u_h^{(n)}\right)$.
In the nonlinear case: $r_h^{(n)} = A_h(u_h) - A_h\left(u_h^{(n)}\right) \ne A_h\left(u_h - u_h^{(n)}\right)$.

Solving $A_H e_H = I_h^{H} r_h^{(n)}$ does not give an approximation to $e_h = u_h - u_h^{(n)}$.

Now consider
$e_h = u_h - u_h^{(n)}$, where $u_h$ satisfies $r_h^{(n)} = A_h(u_h) - A_h\left(u_h^{(n)}\right)$;
$e_H = u_H - I_h^{H} u_h^{(n)}$, where $u_H$ satisfies $I_h^{H} r_h^{(n)} = A_H(u_H) - A_H\left(I_h^{H} u_h^{(n)}\right)$.

Observe that $u_h^{(n)} \to u_h \ \Rightarrow\ u_H \to I_h^{H} u_h \ \Rightarrow\ e_H \to I_h^{H} e_h$. (Here $I_h^{H}$ can simply be an injection.) From this point of view, $e_H$ is a reasonable approximation of $e_h$.
Now we can write down the FAS algorithm:

FAS:
1. Nonlinear relaxation.
2. Restrict $u_h^{n}$ and $r_h^{n}$: $r_H = I_h^{H} r_h^{n}$ and $v = I_h^{H} u_h^{n}$.
3. Solve $A_H(u_H) = r_H + A_H(v)$.
4. Compute $e_H = u_H - v$.
5. Update $u_h^{n} \leftarrow u_h^{n} + I_H^{h} e_H$.
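A generic sketch of one FAS two-grid cycle following steps 1-5 (an illustration, not the author's code); the function and parameter names are assumptions, and the nonlinear operators, relaxation, coarse solver, and transfer matrices are supplied by the caller:

```python
def fas_two_grid(u, f, A_h, A_H, relax_h, solve_H, R, P, pre=2, post=2):
    """One FAS two-grid cycle.
    A_h, A_H : nonlinear operators on the fine/coarse grids,
    relax_h  : nonlinear relaxation on the fine grid,
    solve_H  : (approximate) coarse solver for A_H(u_H) = rhs, started from an initial guess,
    R, P     : restriction and prolongation matrices."""
    for _ in range(pre):                 # 1. nonlinear relaxation
        u = relax_h(u, f)
    r = f - A_h(u)                       # fine-grid residual
    v = R @ u                            # 2. restrict current approximation ...
    rH = R @ r                           #    ... and residual
    uH = solve_H(rH + A_H(v), v)         # 3. solve A_H(u_H) = R r + A_H(R u)
    u = u + P @ (uH - v)                 # 4.-5. correct with e_H = u_H - v
    for _ in range(post):                # post-smoothing
        u = relax_h(u, f)
    return u
```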
Example: $-\Delta u(x,y) + \gamma\, u(x,y)\, e^{u(x,y)} = f(x,y)$ in $[0,1] \times [0,1]$, with $u(x,y) = \left(x - x^2\right)\sin(3\pi y)$.

• Discretization: finite difference.

• Interpolation and restriction:
$p = \dfrac14\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$, $\quad r = \dfrac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$.

• Relaxation: nonlinear Gauss-Seidel,

$u_{i,j} \leftarrow u_{i,j} - \dfrac{h^{-2}\left(4u_{i,j} - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1}\right) + \gamma\, u_{i,j}\, e^{u_{i,j}} - f_{i,j}}{4h^{-2} + \gamma\left(1 + u_{i,j}\right) e^{u_{i,j}}}$,

starting from $(i,j) = (1,1), (1,2), \ldots, (2,1), (2,2), \ldots, (n,n)$.

This example is taken from A Multigrid Tutorial (Briggs).

Who is better, Newton-MG or FAS?

Not so sure ... but FAS is popular in CFD.


Multigrid Parallelization

Parallelization: use multiple computers, communicating with each other, to do the job!

[Figure: processors (P), each with its own memory (M), connected by a communication network.]

What needs to be done?
1. The numerical algorithm has to be capable of it.
2. The program has to distribute the work to the processors properly and dynamically (load balancing).
3. The computers have to communicate with each other (Message Passing Interface, MPI).
4. Many other issues (grid topology, scheduling, ...).

Multigrid is a scalable algorithm!

(Jim E. Jones, CASC, Lawrence Livermore National Laboratory)
Domain Decomposition:

[Figure: the domain partitioned into subdomains $D_G$, $D_B$, ..., one per processor.]

• FEM assembling in the domains $D_G$, $D_B$, ... can be done simultaneously.

• The matrix-vector product $A \cdot x$ can be computed independently in each domain; pass $x_G$ to $D_2$ and $x_B$ to $D_1$.

• Jacobi and red-black Gauss-Seidel relaxations can be done in parallel.

• Grid coarsening and refinement can be done in parallel (not quite easy ... one needs to keep track of the grid topology).

• Interpolations can be parallelized too.
Scalability

$T(N, P)$: time to solve a problem with N unknowns on P processors.
Speedup: $S(N,P) = T(N,1)/T(N,P)$; perfect if $S(N,P) = P$.
Scaled efficiency: $E(N,P) = T(N,1)/T(NP,P)$; perfect if $E(N,P) = 1$.

Assume a 2-D problem of size $(pN)^2$ is distributed to $p^2$ processors, so that the number of unknowns in each processor is $N^2$ (5-point stencils).

Time for relaxation on grid level k: $T_k = T_{comm}\!\left(\dfrac{N}{2^k}\right) + 5\left(\dfrac{N}{2^k}\right)^2 f$.

Time for a V-cycle: $T_v \approx \sum_k 2T_k \approx 8\alpha L + 16N\beta + \dfrac{40}{3}N^2 f$, where

α = startup time,
β = time to transfer a single double,
$T_{comm}(n) = \alpha + \beta n$ = communication time for transmitting n doubles to one processor,
f = time for one floating-point operation,
L = number of grid levels.

Since MG has an O(1) convergence rate, we can analyze the scaled efficiency as follows:

$T_v\left(N^2, 1\right) \approx \dfrac{40}{3}N^2 f$

$T_v\left((pN)^2, p^2\right) \approx 8\alpha\log_2(pN) + 16\beta N + \dfrac{40}{3}N^2 f$

$E(N,P) \approx O\!\left(1/\log_2 p\right)$ as $p \to \infty$; $\qquad E(N,P) \approx O(1)$ as $N \to \infty$.

We need to be careful: on an IBM SP2,
α = 5 × 10^-5, β = 1 × 10^-6, f = 8 × 10^-9 ⇔ communication is expensive!
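A small sketch (illustrative only) evaluating the scaled-efficiency model above with the quoted IBM SP2 constants; the choice L ≈ log₂(pN) for the number of levels is an assumption:

```python
import math

alpha, beta, f = 5e-5, 1e-6, 8e-9          # startup, per-double transfer, flop times (IBM SP2)

def T_serial(N):
    return (40.0 / 3.0) * N * N * f         # ~ (40/3) N^2 f on one processor

def T_parallel(N, p):
    L = int(math.log2(p * N))               # number of grid levels (assumption)
    return 8 * alpha * L + 16 * beta * N + (40.0 / 3.0) * N * N * f

for N in (64, 256, 1024):                   # N^2 unknowns per processor, p^2 = 256 processors
    E = T_serial(N) / T_parallel(N, p=16)
    print(N, round(E, 3))
# Small per-processor problems are dominated by the 8*alpha*L startup term;
# the efficiency approaches 1 only as the local problem size N grows.
```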
