
Solving Linear Systems:

Iterative Methods and Sparse Systems

COS 323
Last time

Linear system: Ax = b
Singular and ill-conditioned systems
Gaussian Elimination: A general purpose method
Naïve Gauss (no pivoting)
Gauss with partial and full pivoting
Asymptotic analysis: O(n^3)
Triangular systems and LU decomposition
Special matrices and algorithms:
Symmetric positive definite: Cholesky decomposition
Tridiagonal matrices
Singularity detection and condition numbers
Today:
Methods for large and sparse systems
Rank-one updating with Sherman-Morrison
Iterative refinement
Fixed-point and stationary methods
Introduction
Iterative refinement as a stationary method
Gauss-Seidel and Jacobi methods
Successive over-relaxation (SOR)
Solving a system as an optimization problem
Representing sparse systems
Problems with large systems

Gaussian elimination, LU decomposition (factoring step) take O(n^3)
Expensive for big systems!
Can get by more easily with special matrices
Cholesky decomposition: for symmetric positive definite A; still O(n^3) but halves storage and operations
Band-diagonal: O(n) storage and operations
What if A is big? (And not diagonal?)
Special Example: Cyclic Tridiagonal

Interesting extension: cyclic tridiagonal

Could derive yet another special-case algorithm, but there's a better way
Updating Inverse

Suppose we have some fast way of finding A^-1 for some matrix A
Now A changes in a special way:
A* = A + u v^T
for some n×1 vectors u and v
Goal: find a fast way of computing (A*)^-1
Eventually, a fast way of solving (A*) x = b
Analogue for Scalars
Sherman-Morrison Formula

A* = A + u v^T = A(I + A^-1 u v^T)
(A*)^-1 = (I + A^-1 u v^T)^-1 A^-1 = A^-1 - (A^-1 u v^T A^-1) / (1 + v^T A^-1 u)

To check, verify that (A*)^-1 A* = I and A* (A*)^-1 = I

Sherman-Morrison Formula

x = (A*)^-1 b = A^-1 b - (A^-1 u v^T A^-1 b) / (1 + v^T A^-1 u)

So, to solve (A*) x = b:
solve Ay = b and Az = u, then set x = y - z (v^T y) / (1 + v^T z)

Applying Sherman-Morrison

Let's consider cyclic tridiagonal again: tridiagonal except for the corner entries a16 and a61.

Take A to be the tridiagonal part, with the corners folded into the diagonal:

        [ a11-1  a12                                 ]
        [ a21    a22   a23                           ]
        [        a32   a33   a34                     ]
    A = [              a43   a44   a45               ]
        [                    a54   a55   a56         ]
        [                          a65   a66-a61*a16 ]

    u = [ 1  0  0  0  0  a61 ]^T,    v = [ 1  0  0  0  0  a16 ]^T

Then A* = A + u v^T restores the corner entries a16 and a61.

Applying Sherman-Morrison

Solve Ay = b, Az = u using the special fast (tridiagonal) algorithm
Applying Sherman-Morrison then takes just a couple of dot products
Generalization for several corrections: the Woodbury formula
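To make this concrete, below is a minimal C sketch of the Sherman-Morrison solve. The solve_A callback is a hypothetical stand-in for whatever fast solver applies to the unmodified A (here, the tridiagonal algorithm); only the two solves, the two dot products, and the final combination come from the formula on the previous slides.

#include <stdlib.h>

/* Solve (A + u v^T) x = b, given a fast solver for A itself. */
void sherman_morrison_solve(
    void (*solve_A)(const double *rhs, double *sol, int n),  /* assumed: solves A y = rhs */
    const double *u, const double *v, const double *b,
    double *x, int n)
{
    double *y = malloc(n * sizeof(double));
    double *z = malloc(n * sizeof(double));
    double vy = 0.0, vz = 0.0, factor;
    int i;

    solve_A(b, y, n);                       /* y = A^-1 b */
    solve_A(u, z, n);                       /* z = A^-1 u */
    for (i = 0; i < n; i++) {               /* two dot products: v^T y and v^T z */
        vy += v[i] * y[i];
        vz += v[i] * z[i];
    }
    factor = vy / (1.0 + vz);
    for (i = 0; i < n; i++)                 /* x = y - z (v^T y) / (1 + v^T z) */
        x[i] = y[i] - factor * z[i];

    free(y);
    free(z);
}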
Summary: Sherman-Morrison

Not just for band-diagonals: S.-M. is good for rank-one changes to any matrix whose inverse we know (or can compute easily)
O(n^2) (for the matrix-vector computations) rather than O(n^3)
Caution: error can propagate when S.-M. is applied repeatedly
Woodbury formula works for higher-rank changes
Iterative Methods
Direct vs. Iterative Methods

So far, have looked at direct methods for solving linear systems
Predictable number of steps
No answer until the very end
Alternative: iterative methods
Start with approximate answer
Each iteration improves accuracy
Stop once estimated error below tolerance
Benefits of Iterative Algorithms

Some iterative algorithms designed for accuracy:
Direct methods subject to roundoff error
Iterate to reduce error to O(ε) (machine precision)
Some algorithms produce answer faster
Most important class: sparse matrix solvers
Speed depends on # of nonzero elements,
not total # of elements
First Iterative Method:
Iterative Refinement
Suppose you've solved (or think you've solved) some system Ax = b
Can check the answer by computing the residual:
r = b - A x_computed
If r is small (compared to b), x is accurate
What if it's not?
Iterative Refinement

Large residual caused by error in x:
e = x_correct - x_computed
If we knew the error, could try to improve x:
x_correct = x_computed + e
Solve for the error:
r = b - A x_computed
A x_computed = A(x_correct - e) = b - r
A x_correct - A e = b - r
A e = r     (since A x_correct = b)
Iterative Refinement

So, compute the residual, solve for e, and apply the correction to the estimate of x
If the original system was solved using LU, this is relatively fast (relative to O(n^3), that is):
O(n^2) matrix-vector multiplication + O(n) vector subtraction to compute r
O(n^2) forward/back substitution to solve for e
O(n) vector addition to correct the estimate of x
Requires 2x storage; often requires extra precision for representing the residual
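As a rough illustration (not code from the slides), here is a C sketch of the refinement loop for a dense, row-major A. The lu_solve routine is a hypothetical helper that reuses an LU factorization computed earlier; in practice the residual r would be accumulated in extra precision.

#include <stdlib.h>

/* Assumed helper (hypothetical): solves A e = r via forward/back
 * substitution using a previously computed LU factorization. */
void lu_solve(const double *LU, const int *piv,
              const double *rhs, double *sol, int n);

void iterative_refinement(const double *A, const double *LU, const int *piv,
                          const double *b, double *x, int n, int passes)
{
    double *r = malloc(n * sizeof(double));
    double *e = malloc(n * sizeof(double));
    int i, j, k;

    for (k = 0; k < passes; k++) {
        for (i = 0; i < n; i++) {            /* r = b - A x          O(n^2) */
            double Ax_i = 0.0;
            for (j = 0; j < n; j++)
                Ax_i += A[i * n + j] * x[j];
            r[i] = b[i] - Ax_i;
        }
        lu_solve(LU, piv, r, e, n);          /* solve A e = r        O(n^2) */
        for (i = 0; i < n; i++)              /* x <- x + e           O(n)   */
            x[i] += e[i];
    }
    free(r);
    free(e);
}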
Questions?
Fixed-Point and Stationary Methods
Fixed points

x* is a fixed point of f(x) if x* = f(x*)

Formulating root-finding as fixed-point-finding:
Choose a g(x) such that g(x) has a fixed point at x* when f(x*) = 0
e.g. f(x) = x^2 - 2x + 3 = 0
g(x) = (x^2 + 3) / 2
if x* = (x*^2 + 3) / 2 then f(x*) = 0
Or, f(x) = sin(x)
g(x) = sin(x) + x
if x* = sin(x*) + x* then f(x*) = 0
Fixed-point iteration

Step 1. Choose some initial x_0
Step 2. Iterate:
For i = 0, 1, 2, ...: x_(i+1) = g(x_i)
Stop when |x_(i+1) - x_i| < threshold.
Example

Compute pi using
f(x) = sin(x)
g(x) = sin(x) + x
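A minimal C sketch of this example (the starting guess 3.0 and the tolerance are illustrative choices, not from the slides); since the fixed points of g(x) = sin(x) + x are exactly the roots of sin(x), the iteration settles at pi:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 3.0;                     /* initial guess x_0 near pi */
    double x_next;

    for (;;) {
        x_next = sin(x) + x;            /* x_(i+1) = g(x_i) */
        if (fabs(x_next - x) < 1e-12)   /* stop when the change is tiny */
            break;
        x = x_next;
    }
    printf("%.15f\n", x_next);          /* prints ~3.141592653589793 */
    return 0;
}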
Notes on fixed-point root-finding

Sensitive to starting x_0
|g'(x)| < 1 near the fixed point is sufficient for convergence
Converges linearly (when it converges)
Extending fixed-point iteration to systems of multiple equations
General form:
Step 0. Formulate a set of fixed-point equations:
x_1 = g_1(x_1, ..., x_n), x_2 = g_2(x_1, ..., x_n), ..., x_n = g_n(x_1, ..., x_n)
Step 1. Choose x_1^0, x_2^0, ..., x_n^0
Step 2. Iterate:
x_1^(i+1) = g_1(x_1^i, ..., x_n^i), x_2^(i+1) = g_2(x_1^i, ..., x_n^i), ...
Example:
Fixed point method for 2 equations
f_1(x) = x_1^2 + x_1 x_2 - 10
f_2(x) = x_2 + 3 x_1 x_2^2 - 57

Formulate new equations:
g_1(x_1, x_2) = sqrt(10 - x_1 x_2)
g_2(x_1, x_2) = sqrt((57 - x_2) / (3 x_1))

Iteration steps:
x_1^(i+1) = sqrt(10 - x_1^i x_2^i)
x_2^(i+1) = sqrt((57 - x_2^i) / (3 x_1^i))
Stationary Iterative Methods for Linear
Systems
Can we formulate g(x) such that x* = g(x*) when Ax* - b = 0?
Yes: let A = M - N (for any M, N satisfying this)
and let g(x) = Gx + c = M^-1 N x + M^-1 b
Check: if x* = g(x*) = M^-1 N x* + M^-1 b, then
Ax* = (M - N)(M^-1 N x* + M^-1 b)
    = Nx* + b - N(M^-1 N x* + M^-1 b)
    = Nx* + b - Nx*
    = b
So what?

We have an update equation:
x^(k+1) = M^-1 N x^k + M^-1 b
Only requires the inverse of M, not A
(FYI: It's stationary because G and c do not change)
Iterative refinement is a stationary method!

x^(k+1) = x^k + e
        = x^k + A^-1 r (for an estimated A^-1)
This is equivalent to choosing
g(x) = Gx + c = M^-1 N x + M^-1 b
where G = (I - B^-1 A) and c = B^-1 b
(if B^-1 is our most recent estimate of A^-1)
So what?

We have an update equation:
x^(k+1) = M^-1 N x^k + M^-1 b
Only requires the inverse of M, not A
We can choose M to be nicely invertible (e.g., diagonal)
Jacobi Method

Choose M to be the diagonal of A: M = D
Choose N to be M - A = -(L + U), where L and U are the strictly lower and upper triangular parts of A
Note that A != LU here (these are not the LU factors)
So, use the update equation:
x^(k+1) = D^-1 (b - (L + U) x^k)
Jacobi method

Alternate formulation: recall we've got x^(k+1) = D^-1 (b - (L + U) x^k)
Store all x_i^(k)
In each iteration, set

x_i^(k+1) = ( b_i - Σ_{j≠i} a_ij x_j^(k) ) / a_ii
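A minimal C sketch of one Jacobi sweep for a dense, row-major A (function and array names are illustrative). The caller keeps two vectors, swaps x and x_new after each sweep, and stops when the change drops below a tolerance:

void jacobi_sweep(const double *A, const double *b,
                  const double *x, double *x_new, int n)
{
    int i, j;
    for (i = 0; i < n; i++) {
        double sum = 0.0;
        for (j = 0; j < n; j++)
            if (j != i)
                sum += A[i * n + j] * x[j];      /* sum of a_ij x_j^(k), j != i */
        x_new[i] = (b[i] - sum) / A[i * n + i];  /* x_i^(k+1) */
    }
}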
Gauss-Seidel

Why make a complete pass through the components of x using only x_j^(k), ignoring the x_j^(k+1) values we've already computed?

Jacobi:  x_i^(k+1) = ( b_i - Σ_{j≠i} a_ij x_j^(k) ) / a_ii

G.S.:    x_i^(k+1) = ( b_i - Σ_{j>i} a_ij x_j^(k) - Σ_{j<i} a_ij x_j^(k+1) ) / a_ii
Notes on Gauss-Seidel

Gauss-Seidel is also a stationary method:
A = M - N where M = D + L, N = -U
Both G.S. and Jacobi may or may not converge
Jacobi: diagonal dominance is a sufficient condition
G.S.: diagonal dominance or symmetric positive definite
Both can be very slow to converge


Successive Over-relaxation (SOR)

Let x^(k+1) = (1 - w) x^(k) + w x_GS^(k+1)

If w = 1, the update rule is Gauss-Seidel
If w < 1: under-relaxation
Proceed more cautiously: e.g., to make a non-convergent system converge
If 1 < w < 2: over-relaxation
Proceed more boldly, e.g. to accelerate convergence of an already-convergent system
If w > 2: divergence.
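Because SOR just blends the Gauss-Seidel value with the previous iterate, one in-place sweep can serve for both methods. A rough C sketch for a dense, row-major A (names are illustrative; w = 1 reduces to plain Gauss-Seidel):

void sor_sweep(const double *A, const double *b, double *x, int n, double w)
{
    int i, j;
    for (i = 0; i < n; i++) {
        double sum = 0.0, x_gs;
        for (j = 0; j < n; j++)
            if (j != i)                     /* x[j] already holds x^(k+1) for j < i */
                sum += A[i * n + j] * x[j];
        x_gs = (b[i] - sum) / A[i * n + i]; /* Gauss-Seidel value */
        x[i] = (1.0 - w) * x[i] + w * x_gs; /* SOR blend */
    }
}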
Questions?
One more method:
Conjugate Gradients
Transform problem to a function minimization!

Solve Ax = b
Minimize f(x) = x^T A x - 2 b^T x

To motivate this, consider 1D:
f(x) = a x^2 - 2bx
df/dx = 2ax - 2b = 0
ax = b
Conjugate Gradient for Linear Systems

Preferred method: conjugate gradients
Recall: plain gradient descent has a problem (it can zig-zag back and forth across long, narrow valleys)
Conjugate Gradient for Linear Systems

...that's solved by conjugate gradients
Walk along direction: s_(k+1) = -∇f(x_(k+1)) + β_k s_k
Polak-Ribière formula: β_k = ∇f_(k+1)^T (∇f_(k+1) - ∇f_k) / (∇f_k^T ∇f_k)

Conjugate Gradient is easily computable for linear systems
If A is symmetric positive definite:
At any point, the gradient is the negative residual (up to a factor of 2):
f(x) = x^T A x - 2 b^T x
so ∇f(x) = 2(Ax - b) = -2r
Easy to compute: just A multiplied by a vector
For any search direction s_k, can directly compute the minimum in that direction:
x_(k+1) = x_k + α_k s_k
where α_k = r_k^T r_k / (s_k^T A s_k)
Conjugate Gradient for Linear Systems

Just a few matrix-vector multiplies (plus some dot products, etc.) per iteration
For m nonzero entries, each iteration is O(max(m, n))
Conjugate gradients may need n iterations for perfect convergence, but often get a decent answer well before then
For non-symmetric matrices: biconjugate gradient
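For concreteness, here is a compact C sketch of the standard linear conjugate-gradient recurrence for a symmetric positive definite A. It is generic textbook CG, not code from the slides; A is dense and row-major here for simplicity, and for a sparse A the two matrix-vector products would use a CSR loop like the one in the next section.

#include <stdlib.h>
#include <math.h>

void conjugate_gradient(const double *A, const double *b, double *x,
                        int n, int max_iter, double tol)
{
    double *r  = malloc(n * sizeof(double));   /* residual b - A x   */
    double *s  = malloc(n * sizeof(double));   /* search direction   */
    double *As = malloc(n * sizeof(double));   /* A times s          */
    double rr = 0.0, rr_new, alpha, beta;
    int i, j, k;

    for (i = 0; i < n; i++) {                  /* r = b - A x;  s = r */
        double Ax_i = 0.0;
        for (j = 0; j < n; j++)
            Ax_i += A[i * n + j] * x[j];
        r[i] = b[i] - Ax_i;
        s[i] = r[i];
        rr += r[i] * r[i];
    }

    for (k = 0; k < max_iter && sqrt(rr) > tol; k++) {
        double sAs = 0.0;
        for (i = 0; i < n; i++) {              /* As = A s,  sAs = s^T A s */
            As[i] = 0.0;
            for (j = 0; j < n; j++)
                As[i] += A[i * n + j] * s[j];
            sAs += s[i] * As[i];
        }
        alpha = rr / sAs;                      /* alpha_k = r^T r / s^T A s */
        rr_new = 0.0;
        for (i = 0; i < n; i++) {
            x[i] += alpha * s[i];              /* step along the direction  */
            r[i] -= alpha * As[i];             /* update the residual       */
            rr_new += r[i] * r[i];
        }
        beta = rr_new / rr;                    /* standard linear-CG beta   */
        for (i = 0; i < n; i++)
            s[i] = r[i] + beta * s[i];         /* next conjugate direction  */
        rr = rr_new;
    }
    free(r); free(s); free(As);
}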
Representing Sparse Systems
Sparse Systems

Many applications require solution of large linear systems (n = thousands to millions, or more)
Local constraints or interactions: most entries are 0
Wasteful to store all n^2 entries
Difficult or impossible to use O(n^3) algorithms
Goal: solve the system with:
Storage proportional to # of nonzero elements
Running time << n^3
Sparse Matrices in General

Represent sparse matrices by noting which elements are nonzero
Critical for A·v and A^T·v to be efficient: proportional to # of nonzero elements
Useful for both conjugate gradient and Sherman-Morrison
Compressed Sparse Row Format

Three arrays
Values: actual numbers in the matrix
Cols: column of corresponding entry in values
Rows: index of first entry in each row
Example: (zero-based! C/C++/Java, not Matlab!)

values 3 2 3 2 5 1 2 3
cols 1 2 3 0 3 1 2 3
rows 0 3 5 5 8
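Decoding the arrays row by row (entries rows[i] through rows[i+1]-1 belong to row i) gives this 4x4 matrix; row 2 has no nonzeros, which is why rows[2] = rows[3] = 5:

    [ 0  3  2  3 ]
    [ 2  0  0  5 ]
    [ 0  0  0  0 ]
    [ 0  1  2  3 ]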
Compressed Sparse Row Format

values 3 2 3 2 5 1 2 3
cols 1 2 3 0 3 1 2 3
rows 0 3 5 5 8
Multiplying Ax:

for (i = 0; i < n; i++) {
    out[i] = 0;
    for (j = rows[i]; j < rows[i+1]; j++)
        out[i] += values[j] * x[ cols[j] ];
}
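Wrapping that loop in a tiny runnable program with the example arrays from the previous slide (the main() harness is only an illustration, not part of the slides):

#include <stdio.h>

int main(void)
{
    double values[] = { 3, 2, 3, 2, 5, 1, 2, 3 };
    int    cols[]   = { 1, 2, 3, 0, 3, 1, 2, 3 };
    int    rows[]   = { 0, 3, 5, 5, 8 };
    double x[]      = { 1, 1, 1, 1 };          /* multiply by the all-ones vector */
    double out[4];
    int n = 4, i, j;

    for (i = 0; i < n; i++) {                  /* same loop as above */
        out[i] = 0;
        for (j = rows[i]; j < rows[i+1]; j++)
            out[i] += values[j] * x[cols[j]];
    }
    for (i = 0; i < n; i++)
        printf("out[%d] = %g\n", i, out[i]);   /* 8, 7, 0, 6 */
    return 0;
}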
Summary of Methods for Linear Systems

Method: Forward/backward substitution
  Benefits: Fast: O(n^2)
  Drawbacks: Applies only to upper- or lower-triangular matrices

Method: Gaussian elimination
  Benefits: Works for any [non-singular] matrix
  Drawbacks: O(n^3)

Method: LU decomposition
  Benefits: Works for any matrix (singular matrices can still be factored); can re-use L, U for different b values; once factored, uses only forward/backward substitution
  Drawbacks: O(n^3) initial factorization (same process as Gauss)

Method: Cholesky
  Benefits: O(n^3), but with half the storage and computation of Gauss
  Drawbacks: Still O(n^3); only for symmetric positive definite

Method: Band-diagonal elimination
  Benefits: O(w^2 n), where w = band width
  Drawbacks: Only for band-diagonal matrices

Method: Sherman-Morrison
  Benefits: Update step is O(n^2)
  Drawbacks: Only for rank-1 changes; degrades with repeated iterations (then use Woodbury instead)

Method: Iterative refinement
  Benefits: Can be applied following any solution method
  Drawbacks: Requires 2x storage, extra precision for the residual

Method: Jacobi
  Benefits: More appropriate than elimination for large/sparse systems; can be parallelized
  Drawbacks: Can diverge when not diagonally dominant; slow

Method: Gauss-Seidel
  Benefits: More appropriate than elimination for large/sparse; a bit more powerful than Jacobi
  Drawbacks: Can diverge when not diagonally dominant or symmetric positive definite; slow; can't parallelize

Method: SOR
  Benefits: Potentially faster than Jacobi, Gauss-Seidel for large/sparse systems
  Drawbacks: Requires parameter tuning

Method: Conjugate gradient
  Benefits: Fast(er) for large/sparse systems; often doesn't require all n iterations
  Drawbacks: Requires symmetric positive definite (otherwise use biconjugate gradient)
