
Krylov Subspace Methods, Sparse Direct Solvers, and Preconditioning

Per-Olof Persson
[email protected]

Department of Mathematics
University of California, Berkeley

Math 228A Numerical Solutions of Differential Equations


Iterative Methods for Linear Systems

Direct methods for solving Ax = b, e.g. Gaussian elimination, compute an
exact solution after a finite number of steps (in exact arithmetic)
Iterative algorithms produce a sequence of approximations x(1), x(2), ...
which hopefully converges to the solution, and
    may require less memory than direct methods
    may be faster than direct methods
    may handle special structures (such as sparsity) in a simpler way

[Figure: residual r = b − Ax versus iteration number, on a log scale from
10^0 down to 10^−15; a direct method reaches machine precision only at its
final step, while an iterative method reduces the residual gradually over
about 30 iterations]
Two Classes of Iterative Methods

Stationary methods (or classical iterative methods) find a splitting
A = M − K and iterate

    x(k+1) = M^{-1}(K x(k) + b) = R x(k) + c

    Jacobi, Gauss-Seidel, Successive Overrelaxation (SOR), and
    Symmetric Successive Overrelaxation (SSOR)
Krylov subspace methods use only multiplication by A (and possibly by A^T)
and find solutions in the Krylov subspace {b, Ab, A^2 b, ..., A^{k−1} b}
    Conjugate Gradient (CG), Generalized Minimal Residual (GMRES),
    BiConjugate Gradient (BiCG), etc.
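
To make the stationary idea concrete, here is a minimal Jacobi iteration
sketch in MATLAB (my own illustration, not from the slides; the function
name jacobi and its arguments are hypothetical):

    function x = jacobi(A, b, maxit, tol)
      % Jacobi iteration: splitting A = M - K with M = diag(A), so each
      % sweep is x(k+1) = M^{-1}(K x(k) + b) = x(k) + M^{-1}(b - A x(k)).
      d = diag(A);                        % diagonal of A (assumed nonzero)
      x = zeros(size(b));
      for k = 1:maxit
          x = x + (b - A*x) ./ d;         % one Jacobi sweep
          if norm(b - A*x) <= tol*norm(b), break, end
      end
    end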
The Model Poisson Problem
Test problem for linear solvers: Discretize Poisson's equation in 2-D:

    −(∂^2 u/∂x^2 + ∂^2 u/∂y^2) = 1

on a square grid using centered finite difference approximations:

    4 u_{ij} − u_{i−1,j} − u_{i+1,j} − u_{i,j−1} − u_{i,j+1} = h^2

Dirichlet conditions u = 0 on boundaries
Grid spacing h = 1/(n + 1)
Total of n^2 unknowns u_{ij}
A "typical problem" despite the simplicity

[Figure: the n × n interior grid of unknowns u_{ij} on the unit square,
with the boundary at indices 0 and n + 1]
The Model Problem in MATLAB

In MATLAB:

    n=8; h=1/(n+1); e=ones(n,1);
    A1=spdiags([-e,2*e,-e],-1:1,n,n);
    A=kron(A1,speye(n,n))+kron(speye(n,n),A1);
    f=h^2*ones(n^2,1);

or simply

    A=delsq(numgrid('S',n+2));

Resulting linear system Au = f is sparse and banded

[Figure: spy plot of the 64 × 64 matrix A, nz = 288]
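
A short sketch (mine, not part of the slides) that assembles this system
and solves it both directly and with the built-in Conjugate Gradients
routine:

    % Assemble the n = 8 model problem as above and solve it two ways.
    n = 8; h = 1/(n+1); e = ones(n,1);
    A1 = spdiags([-e, 2*e, -e], -1:1, n, n);
    A  = kron(A1, speye(n)) + kron(speye(n), A1);
    f  = h^2 * ones(n^2, 1);

    u_direct   = A \ f;                        % sparse direct solve
    [u_cg, fl] = pcg(A, f, 1e-10, 100);        % built-in Conjugate Gradients
    norm(u_direct - u_cg) / norm(u_direct)     % agreement to roughly the CG tolerance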
Eigenvalues of Model Problem

A is symmetric positive definite, with eigenvalues

    λ_{ij} = λ_i + λ_j,   i, j = 1, ..., n,   where

    λ_k = 2(1 − cos(πk/(n + 1)))

are eigenvalues of the 1-D Laplace operator

Largest eigenvalue λ_n = 2(1 − cos(πn/(n + 1))) ≈ 4
Smallest eigenvalue λ_1 = 2(1 − cos(π/(n + 1))) ≈ π^2/(n + 1)^2
Condition number κ(A) = λ_n/λ_1 ≈ 4(n + 1)^2/π^2
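
A quick numerical check of these formulas (my own sketch):

    n  = 8;
    A  = delsq(numgrid('S', n+2));         % 2-D model problem, n^2 unknowns
    lk = 2*(1 - cos(pi*(1:n)/(n+1)));      % 1-D eigenvalues lambda_k
    ev = eig(full(A));                     % all eigenvalues (small matrix, so full eig is fine)
    [max(lk)/min(lk), max(ev)/min(ev), 4*(n+1)^2/pi^2]   % three estimates of kappa(A)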
Krylov Subspace Algorithms

Create a sequence of Krylov subspaces for Ax = b:

    Kn = ⟨b, Ab, ..., A^{n−1} b⟩

and find approximate solutions xn in Kn

Only matrix-vector products involved
For SPD matrices, the most popular algorithm is the Conjugate Gradients
method [Hestenes/Stiefel, 1952]
    Finds the best solution xn ∈ Kn in the norm ‖x‖_A = √(x^T A x)
    Only requires storage of 4 vectors (not all the n vectors in Kn)
    Remarkably simple and excellent convergence properties
    Originally invented as a direct algorithm! (converges after m steps
    in exact arithmetic)
The Conjugate Gradients Method

Algorithm: Conjugate Gradients Method

    x_0 = 0, r_0 = b, p_0 = r_0
    for n = 1, 2, 3, ...
        α_n = (r_{n−1}^T r_{n−1}) / (p_{n−1}^T A p_{n−1})     step length
        x_n = x_{n−1} + α_n p_{n−1}                            approximate solution
        r_n = r_{n−1} − α_n A p_{n−1}                          residual
        β_n = (r_n^T r_n) / (r_{n−1}^T r_{n−1})                improvement this step
        p_n = r_n + β_n p_{n−1}                                search direction

Only one matrix-vector product A p_{n−1} per iteration
Operation count O(m) per iteration (excluding the matrix-vector product)
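
A direct MATLAB transcription of the algorithm above (my own sketch; in
practice the built-in pcg does the same job with more careful stopping
tests):

    function x = cg(A, b, maxit, tol)
      x = zeros(size(b));  r = b;  p = r;
      rr = r'*r;
      for n = 1:maxit
          Ap    = A*p;                      % the single matrix-vector product
          alpha = rr / (p'*Ap);             % step length
          x     = x + alpha*p;              % approximate solution
          r     = r - alpha*Ap;             % residual
          rr_new = r'*r;
          if sqrt(rr_new) <= tol*norm(b), break, end
          beta  = rr_new / rr;              % improvement this step
          p     = r + beta*p;               % search direction
          rr    = rr_new;
      end
    end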
Properties of Conjugate Gradients Vectors

The spaces spanned by the solutions, the search directions, and the
residuals are all equal to the Krylov subspaces:

    Kn = ⟨x_1, x_2, ..., x_n⟩ = ⟨p_0, p_1, ..., p_{n−1}⟩
       = ⟨r_0, r_1, ..., r_{n−1}⟩ = ⟨b, Ab, ..., A^{n−1} b⟩

The residuals are orthogonal:

    r_n^T r_j = 0    (j < n)

The search directions are A-conjugate:

    p_n^T A p_j = 0    (j < n)

Proofs. Textbook/black-board
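
These two properties are easy to observe numerically; the following sketch
(mine, not from the slides) stores the residual and search-direction
histories for the model problem and checks the off-diagonal inner products:

    n = 8;  A = delsq(numgrid('S', n+2));  b = ones(size(A,1), 1);
    x = zeros(size(b));  r = b;  p = r;  R = r;  P = p;
    for k = 1:10                               % a few CG steps, keeping history
        Ap = A*p;  alpha = (r'*r)/(p'*Ap);
        x  = x + alpha*p;  rnew = r - alpha*Ap;
        beta = (rnew'*rnew)/(r'*r);  p = rnew + beta*p;  r = rnew;
        R = [R, r];  P = [P, p];
    end
    G = R'*R;    norm(G - diag(diag(G)), 'fro')      % ~0: residuals orthogonal
    H = P'*A*P;  norm(H - diag(diag(H)), 'fro')      % ~0: directions A-conjugate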
Optimality of Conjugate Gradients

The errors en = x∗ − xn are minimized in the A-norm

Proof. For any other point x = xn − Δx ∈ Kn the error is

    ‖e‖_A^2 = (en + Δx)^T A (en + Δx)
            = en^T A en + (Δx)^T A (Δx) + 2 en^T A (Δx)

But en^T A (Δx) = rn^T (Δx) = 0, since rn is orthogonal to Kn, so Δx = 0
minimizes ‖e‖_A

Monotonic: ‖en‖_A ≤ ‖e_{n−1}‖_A, and en = 0 in n ≤ m steps
Proof. Monotonicity follows from Kn ⊆ K_{n+1}; termination follows since
Kn ⊆ R^m, so the subspaces cannot keep growing beyond m dimensions unless
convergence has already occurred
Optimization in CG

CG can be interpreted as a minimization algorithm

We know it minimizes ‖e‖_A, but this cannot be evaluated
CG also minimizes the quadratic function φ(x) = ½ x^T A x − x^T b:

    ‖en‖_A^2 = en^T A en = (x∗ − xn)^T A (x∗ − xn)
             = xn^T A xn − 2 xn^T A x∗ + x∗^T A x∗
             = xn^T A xn − 2 xn^T b + x∗^T b = 2φ(xn) + constant

At each step, αn is chosen to minimize φ along the line xn = x_{n−1} + αn p_{n−1}
The conjugated search directions pn give minimization over all of Kn
Polynomial Approximation by CG

Conjugate Gradients finds an optimal polynomial pn ∈ Pn of degree n with
pn(0) = 1, minimizing ‖pn(A) e0‖_A, with initial error e0 = x∗
More specifically, with Λ(A) being the spectrum of A:

    ‖en‖_A / ‖e0‖_A = inf_{p ∈ Pn} ‖p(A) e0‖_A / ‖e0‖_A
                    ≤ inf_{p ∈ Pn} max_{λ ∈ Λ(A)} |p(λ)|

Proof. It is clear that xn = qn(A) b = qn(A) A x∗ with qn of degree n − 1.
Then en = pn(A) e0 with pn ∈ Pn. The equality above then follows since CG
minimizes ‖en‖_A. For the inequality, expand in eigenvectors of A:

    e0 = Σ_j a_j v_j,    p(A) e0 = Σ_j a_j p(λ_j) v_j

Then ‖e0‖_A^2 = Σ_j a_j^2 λ_j and ‖p(A) e0‖_A^2 = Σ_j a_j^2 λ_j (p(λ_j))^2,
which implies the inequality.
Rate of Convergence

Important convergence results can be obtained from the polynomial
approximation:

1. If A has n distinct eigenvalues, CG converges in at most n steps
   Proof. The polynomial p(x) = Π_{j=1}^{n} (1 − x/λ_j) is zero on Λ(A)

2. If A has 2-norm condition number κ, the errors satisfy

       ‖en‖_A / ‖e0‖_A ≤ 2 ((√κ − 1)/(√κ + 1))^n ≈ 2 (1 − 2/√κ)^n

   Proof. Textbook

In general: CG performs well with clustered eigenvalues
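
The bound in 2 is easy to compare against the observed A-norm error; a
small experiment of my own on the 2-D model problem:

    n = 30;  A = delsq(numgrid('S', n+2));  b = ones(size(A,1), 1);
    xstar = A \ b;                             % reference solution, e0 = xstar (x0 = 0)
    kappa = 4*(n+1)^2/pi^2;                    % condition number from the earlier slide
    e0A   = sqrt(xstar'*A*xstar);
    for k = [5 10 20 40]
        [x, flag] = pcg(A, b, 1e-14, k);       % at most k CG iterations (tolerance not reached here)
        ekA   = sqrt((xstar-x)'*A*(xstar-x));
        bound = 2*((sqrt(kappa)-1)/(sqrt(kappa)+1))^k;
        fprintf('k = %2d   error %.2e   bound %.2e\n', k, ekA/e0A, bound);
    end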
Sparse vs. Dense Matrices

A sparse matrix is a matrix with enough zeros that it is worth taking
advantage of them [Wilkinson]

A structured matrix has enough structure that it is worthwhile to use it
(e.g. Toeplitz)

A dense matrix is neither sparse nor structured
MATLAB Sparse Matrices: Design Principles

Most operations should give the same results for sparse and
full matrices
Sparse matrices are never created automatically, but once
created they propagate
Performance is important – but usability, simplicity,
completeness, and robustness are more important
Storage for a sparse matrix should be O(nonzeros)
Time for a sparse operation should be close to O(flops)
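
A small demonstration of these principles (my own sketch): sparse matrices
are created explicitly, and sparsity then propagates through operations
while storage stays proportional to the number of nonzeros:

    S = sparse(1000, 1000);            % explicitly created, all-zero sparse matrix
    S(1,1) = 3;  S(500,2) = -1;        % insert a couple of nonzeros
    issparse(S)                        % true
    issparse(S + S), issparse(2*S)     % true: sparsity propagates through operations
    issparse(full(S))                  % false: conversion back to full is explicit
    whos S                             % storage proportional to nnz, not to 1000^2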
Data Structures for Matrices

Example matrix:

    31   0  53
     0  59   0
    41  26   0

Full:
    Storage: array of real (or complex) numbers (double *A)
    Memory: nrows*ncols

Sparse (compressed column storage):
    double *Pr:  31 41 59 26 53     (nonzero values, column by column)
    int    *Ir:   1  3  2  3  1     (row index of each nonzero)
    int    *Jc:   1  3  5  6        (start of each column in Pr/Ir)
    Memory: about 1.5*nnz + .5*ncols
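
The same example in MATLAB (my own sketch): building the matrix from
(row, column, value) triplets and recovering them in the column-major order
in which the compressed column format stores them:

    i = [1 3 2 3 1];  j = [1 1 2 2 3];  v = [31 41 59 26 53];
    S = sparse(i, j, v, 3, 3);
    full(S)                    % [31 0 53; 0 59 0; 41 26 0]
    [ii, jj, vv] = find(S);
    [ii jj vv]                 % nonzeros returned column by column, as stored
    nnz(S)                     % 5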
Compressed Column Format - Observations

Element look-up: O(log #elements in column) time


Insertion of new nonzero very expensive
Sparse vector = Column vector (not Row vector)
Graphs and Sparsity: Cholesky Factorization

Fill: New nonzeros in factor

Symmetric Gaussian elimination:
    for j = 1 to N
        Add edges between j's higher-numbered neighbors

[Figure: a 10-vertex graph G(A) and its filled graph G+(A) after symbolic
elimination, with the fill edges added]
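
Fill is easy to see on the model problem; a sketch of my own comparing the
nonzeros of A with those of its Cholesky factor:

    n = 30;  A = delsq(numgrid('S', n+2));    % N = n^2 = 900 unknowns
    R = chol(A);                              % sparse Cholesky factor, A = R'*R
    [nnz(A), nnz(R)]                          % the factor has many more nonzeros (fill)
    subplot(1,2,1), spy(A), title('A')
    subplot(1,2,2), spy(R), title('chol(A)')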
Permutations of the 2-D Model Problem

2-D Model Problem: Poisson's equation on an n × n finite difference grid
Total number of unknowns n^2 = N
Theoretical results for the fill-in:
    With the natural permutation: O(N^{3/2}) fill
    With any permutation: Ω(N log N) fill (lower bound)
    With a nested dissection permutation: O(N log N) fill
Nested Dissection Ordering

A separator in a graph G is a set S of vertices whose removal


leaves at least two connected components
A nested dissection ordering for an N -vertex graph G
numbers its vertices from 1 to N as follows:
Find a separator S, whose removal leaves connected
components T1 , T2 , . . . , Tk
Number the vertices of S from N − |S| + 1 to N
Recursively, number the vertices of each component: T1 from
1 to |T1 |, T2 from |T1 | + 1 to |T1 | + |T2 |, etc
If a component is small enough, number it arbitrarily
It all boils down to finding good separators!
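
For the model grid, good separators are simply middle rows or columns; here
is a minimal recursive sketch of such an ordering (entirely my own
illustration, with the hypothetical function names nd_grid and nd_recurse):

    function p = nd_grid(n)
      % Nested dissection ordering of the n-by-n grid: remove a middle
      % row/column (the separator), number the two components first,
      % and number the separator last, recursively.
      p = nd_recurse(reshape(1:n^2, n, n));   % natural column-wise numbering
    end

    function p = nd_recurse(idx)
      [m, n] = size(idx);
      if m*n <= 3                             % small enough: number arbitrarily
          p = idx(:)';  return
      end
      if m >= n                               % separate along the longer dimension
          s = ceil(m/2);
          p = [nd_recurse(idx(1:s-1,:)), nd_recurse(idx(s+1:end,:)), idx(s,:)];
      else
          s = ceil(n/2);
          p = [nd_recurse(idx(:,1:s-1)), nd_recurse(idx(:,s+1:end)), idx(:,s)'];
      end
    end

    % Usage:  A = delsq(numgrid('S', n+2));  p = nd_grid(n);  spy(chol(A(p,p)))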
Heuristic Fill-Reducing Matrix Permutations

Banded orderings (Reverse Cuthill-McKee, Sloan, etc):


Try to keep all nonzeros close to the diagonal
Theory, practice: Often wins for “long, thin” problems
Minimum degree:
Eliminate row/col with fewest nonzeros, add fill, repeat
Hard to implement efficiently – current champion is
“Approximate Minimum Degree” [Amestoy, Davis, Duff]
Theory: Can be suboptimal even on 2-D model problem
Practice: Often wins for medium-sized problems
Heuristic Fill-Reducing Matrix Permutations

Nested dissection:
Find a separator, number it last, proceed recursively
Theory: Approximately optimal separators =⇒ approximately
optimal fill and flop count
Practice: Often wins for very large problems
The best modern general-purpose orderings are ND/MD
hybrids
Fill-Reducing Permutations in Matlab

Reverse Cuthill-McKee:
p=symrcm(A);
Symmetric permutation: A(p,p) often has smaller bandwidth
than A
Symmetric approximate minimum degree:
p=symamd(A);
Symmetric permutation: chol(A(p,p)) sparser than chol(A)
Nonsymmetric approximate minimum degree:
p=colamd(A);
Column permutation: lu(A(:,p)) sparser than lu(A)
Symmetric nested dissection:
Not built into MATLAB, several versions in the MESHPART
toolbox
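
A quick comparison of these orderings on the model problem (my own sketch):

    n  = 50;  A = delsq(numgrid('S', n+2));
    p1 = symrcm(A);                            % reverse Cuthill-McKee
    p2 = symamd(A);                            % symmetric approximate minimum degree
    fprintf('natural: nnz(chol) = %d\n', nnz(chol(A)));
    fprintf('symrcm : nnz(chol) = %d\n', nnz(chol(A(p1,p1))));
    fprintf('symamd : nnz(chol) = %d\n', nnz(chol(A(p2,p2))));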
Complexity of Direct Methods

Time and space to solve any problem on any well-shaped finite element mesh
with N nodes:

                     1-D       2-D           3-D
    Space (fill):    O(N)      O(N log N)    O(N^{4/3})
    Time (flops):    O(N)      O(N^{3/2})    O(N^2)
Preconditioners for Linear Systems

Main idea: Instead of solving

    Ax = b

solve, using a nonsingular m × m preconditioner M,

    M^{-1} A x = M^{-1} b

which has the same solution x

Convergence properties based on M^{-1}A instead of A
Trade-off between the cost of applying M^{-1} and the improvement of the
convergence properties. Extreme cases:
    M = A: perfect conditioning of M^{-1}A = I, but applying M^{-1} is expensive
    M = I: "do nothing", M^{-1} = I, but no improvement of M^{-1}A = A
Preconditioned Conjugate Gradients

To keep symmetry, solve (C^{-1} A C^{-*}) C^* x = C^{-1} b with C C^* = M
Can be written in terms of M^{-1} only, without reference to C:

Algorithm: Preconditioned Conjugate Gradients Method

    x_0 = 0, r_0 = b
    p_0 = M^{-1} r_0, z_0 = p_0
    for n = 1, 2, 3, ...
        α_n = (r_{n−1}^T z_{n−1}) / (p_{n−1}^T A p_{n−1})     step length
        x_n = x_{n−1} + α_n p_{n−1}                            approximate solution
        r_n = r_{n−1} − α_n A p_{n−1}                          residual
        z_n = M^{-1} r_n                                       preconditioning
        β_n = (r_n^T z_n) / (r_{n−1}^T z_{n−1})                improvement this step
        p_n = z_n + β_n p_{n−1}                                search direction
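
A MATLAB transcription of this algorithm (my own sketch; the function name
my_pcg and the handle Minv for applying M^{-1} are mine; the built-in pcg
provides the same functionality):

    function x = my_pcg(A, b, Minv, maxit, tol)
      x = zeros(size(b));  r = b;
      z = Minv(r);  p = z;  rz = r'*z;
      for n = 1:maxit
          Ap    = A*p;
          alpha = rz / (p'*Ap);            % step length
          x     = x + alpha*p;             % approximate solution
          r     = r - alpha*Ap;            % residual
          if norm(r) <= tol*norm(b), break, end
          z     = Minv(r);                 % preconditioning, z = M^{-1} r
          rz_new = r'*z;
          beta  = rz_new / rz;             % improvement this step
          p     = z + beta*p;              % search direction
          rz    = rz_new;
      end
    end

    % Example (Jacobi preconditioning, M = diag(A)):
    %   d = diag(A);  x = my_pcg(A, b, @(r) r./d, 200, 1e-8);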
Commonly Used Preconditioners

A preconditioner should “approximately solve” the problem


Ax = b
Jacobi preconditioning - M = diag(A), very simple and
cheap, might improve certain problems but usually insufficient
Block-Jacobi preconditioning - Use block-diagonal instead
of diagonal. Another variant is using several diagonals (e.g.
tridiagonal)
Classical iterative methods - Precondition by applying one
step of Jacobi, Gauss-Seidel, SOR, or SSOR
Incomplete factorizations - Perform Gaussian elimination
but ignore fill, results in approximate factors A ≈ LU or
A ≈ RT R (more later)
Coarse-grid approximations - For a PDE discretized on a
grid, a preconditioner can be formed by transferring the
solution to a coarser grid, solving a smaller problem, then
transferring back (multigrid)
Incomplete Cholesky Factorization (IC, ILU)

[Figure: sparsity patterns illustrating A ≈ R^T × R for the incomplete factors]

Compute factors of A by Gaussian elimination, but ignore fill
Preconditioner B = R^T R ≈ A, not formed explicitly
Compute B^{-1} z by triangular solves in time O(nnz(A))
Total storage is O(nnz(A)), static data structure
Either symmetric (IC) or nonsymmetric (ILU)
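
In MATLAB the zero-fill incomplete Cholesky factor is available through
ichol, and it can be handed directly to pcg; a small experiment of my own
on the model problem:

    n = 50;  A = delsq(numgrid('S', n+2));  b = ones(size(A,1), 1);
    L = ichol(A);                                  % zero-fill IC(0): L*L' ~ A (L plays the role of R^T)
    [x0, ~, ~, it0] = pcg(A, b, 1e-8, 500);        % unpreconditioned CG
    [x1, ~, ~, it1] = pcg(A, b, 1e-8, 500, L, L'); % CG preconditioned with B = L*L'
    [it0, it1]                                     % the preconditioned run needs far fewer iterations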
Incomplete Cholesky and ILU: Variants

Allow one or more "levels of fill"
    Unpredictable storage requirements
Allow fill whose magnitude exceeds a "drop tolerance"
    May get better approximate factors than levels of fill
    Unpredictable storage requirements
    Choice of tolerance is ad hoc
Partial pivoting (for nonsymmetric A)
"Modified ILU" (MIC): Add dropped fill to diagonal of U (R)
    A and R^T R have same row sums
    Good in some PDE contexts
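
The drop-tolerance and modified variants are exposed through ichol's
options; a small sketch of my own:

    n  = 50;  A = delsq(numgrid('S', n+2));  b = ones(size(A,1), 1);
    L0 = ichol(A);                                            % IC(0), no fill
    Lt = ichol(A, struct('type','ict','droptol',1e-3));       % keep fill above the drop tolerance
    Lm = ichol(A, struct('type','ict','droptol',1e-3, ...
                         'michol','on'));                     % modified IC: add dropped fill to diagonal
    [nnz(L0), nnz(Lt)]                                        % the drop-tolerance factor keeps more fill
    [~, ~, ~, it] = pcg(A, b, 1e-8, 500, Lt, Lt');  it        % typically fewer iterations than with IC(0)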
Incomplete Cholesky and ILU: Issues

Choice of parameters
    Good: Smooth transition from iterative to direct methods
    Bad: Very ad hoc, problem-dependent
    Trade-off: Time per iteration vs. number of iterations (more fill →
    more time per iteration but fewer iterations)
Effectiveness
    Condition number usually improves (only) by a constant factor
    (except MIC for some problems from PDEs)
    Still, often good when tuned for a particular class of problems
Parallelism
    Triangular solves are not very parallel
    Reordering for parallel triangular solve by graph coloring
Complexity of Linear Solvers
Time to solve the Poisson model problem on a regular mesh with N nodes:

    Solver               1-D         2-D          3-D
    Sparse Cholesky      O(N)        O(N^1.5)     O(N^2)
    CG, exact arith.     O(N^2)      O(N^2)       O(N^2)
    CG, no precond.      O(N^2)      O(N^1.5)     O(N^1.33)
    CG, modified IC      O(N^1.5)    O(N^1.25)    O(N^1.17)
    Multigrid            O(N)        O(N)         O(N)
