Iterative Methods for Linear Systems
GOAL.
• Understand norms of vectors and matrices
• Understand the condition number of a matrix
• Understand, implement and analyze iterative methods
KEY WORDS. Condition number, iterative method, Jacobi method, Gauss-Seidel method, successive over-relaxation (SOR) method
In the last chapter we saw that Gaussian elimination is the most general approach for solving a nonsingular linear system

$Ax = b.$    (1)
1 Vector and matrix norms

Definition 1.1. (Vector norm) A vector norm of a vector $x \in R^n$, written $\|x\|$ and interpreted as the length of $x$, is any mapping from $R^n$ to $R$ with the following three properties, for any $x, y \in R^n$ and any scalar $\alpha \in R$:
1. positivity: the length of a vector is always greater than 0, unless it is the zero vector;
2. positive scalability: the length of a scalar multiple of a vector is the length of the vector multiplied by the absolute value of the scalar;
3. triangle inequality: the length of one side of a triangle is never larger than the sum of the lengths of the other two sides.

The most familiar example is the Euclidean ($l_2$) norm, $\|x\|_2 = \big(\sum_{i=1}^n |x_i|^2\big)^{1/2}$.
Other examples of vector norms are the $l_1$ norm and the $l_\infty$ norm,

$\|x\|_1 = \sum_{i=1}^n |x_i| \quad (l_1 \text{ norm}), \qquad \|x\|_\infty = \max_{1 \le i \le n} |x_i| \quad (l_\infty \text{ norm}).$
It can be checked from Definition 1.1 that the $l_1$, $l_2$ and $l_\infty$ norms defined above are indeed vector norms. Below we use the $l_1$ norm as an example.
Example 1.3. The $l_1$ norm is a vector norm.
It suffices to check that
1. $\|x\|_1 = \sum_{i=1}^n |x_i| > 0$, if $x \ne 0$;
2. $\|\alpha x\|_1 = \sum_{i=1}^n |\alpha x_i| = |\alpha| \sum_{i=1}^n |x_i| = |\alpha| \|x\|_1$, for any $\alpha \in R$;
3. $\|x + y\|_1 = \sum_{i=1}^n |x_i + y_i| \le \sum_{i=1}^n (|x_i| + |y_i|) = \|x\|_1 + \|y\|_1$.
Example 1.4. Let $x = (1, 1, \cdots, 1) \in R^n$; then

$\|x\|_1 = n, \qquad \|x\|_2 = \sqrt{n}, \qquad \|x\|_\infty = 1.$
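As a quick numerical illustration (our own, not part of the original notes), the three vector norms in Example 1.4 can be checked with NumPy; the choice $n = 5$ is arbitrary.

```python
import numpy as np

# Check Example 1.4: for x = (1, ..., 1) with n entries,
# ||x||_1 = n, ||x||_2 = sqrt(n), ||x||_inf = 1.
n = 5                                 # arbitrary choice for the illustration
x = np.ones(n)

print(np.linalg.norm(x, 1))           # l1 norm: sum of absolute values -> 5.0
print(np.linalg.norm(x, 2))           # l2 norm: Euclidean length -> sqrt(5) ~ 2.2361
print(np.linalg.norm(x, np.inf))      # l-infinity norm: largest absolute entry -> 1.0
```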
Definition 1.5. (Matrix norm) A matrix norm $\|A\|$ is any mapping from $R^{n \times n}$ to $R$ with the following three properties:
1. $\|A\| > 0$, if $A \ne 0$;
2. $\|\alpha A\| = |\alpha| \|A\|$, for any $\alpha \in R$;
3. $\|A + B\| \le \|A\| + \|B\|$ (triangle inequality);
for any matrices $A, B \in R^{n \times n}$.
We usually prefer matrix norms that are related to a vector norm.
Definition 1.6. (Subordinate matrix norm) The subordinate matrix norm based on a vector norm $\|\cdot\|$ is given by

$\|A\| = \sup\{\|Ax\| : x \in R^n \text{ and } \|x\| = 1\} = \sup_{\|x\|=1} \|Ax\|.$    (2)
It can be checked that the subordinate matrix norm defined by eq. (2) is a norm:
1. $\|A\| = \sup_{\|x\|=1} \|Ax\| > 0$, if $A \ne 0$, since $Ax \ne 0$ for some $x$ with $\|x\| = 1$;
2. $\|\alpha A\| = \sup_{\|x\|=1} \|\alpha A x\| = |\alpha| \sup_{\|x\|=1} \|Ax\| = |\alpha| \|A\|$;
3. $\|A + B\| = \sup_{\|x\|=1} \|(A + B)x\| \le \sup_{\|x\|=1} (\|Ax\| + \|Bx\|) \le \sup_{\|x\|=1} \|Ax\| + \sup_{\|x\|=1} \|Bx\| = \|A\| + \|B\|$.
The subordinate matrix norm also satisfies

• $\|I\| = 1$,
• $\|Ax\| \le \|A\| \|x\|$,
• $\|AB\| \le \|A\| \|B\|$.

To derive them,

• $\|I\| = \sup_{\|x\|=1} \|Ix\| = \sup_{\|x\|=1} \|x\| = 1$;
• the other two are left as homework.
Examples of subordinate matrix norms for a matrix $A$, based on the $l_1$, $l_2$ and $l_\infty$ vector norms respectively, are

$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|$  ($l_1$ norm, the maximum absolute column sum),

$\|A\|_2 = \sigma_{\max}$  ($l_2$ norm),

$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|$  ($l_\infty$ norm, the maximum absolute row sum),

where the $\sigma_i$ are the square roots of the eigenvalues of $A^T A$, which are called the singular values of $A$, and $\sigma_{\max}$ is the largest among the $\sigma_i$.
Example 1.7. Let the matrix $A$ be

$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix};$

then

$\|A\|_1 = 6, \qquad \|A\|_2 = 5.4650, \qquad \|A\|_\infty = 7.$

To get $\|A\|_2$, use eig(A'*A) in MATLAB to obtain $\sigma_1^2, \cdots, \sigma_n^2$, then take the square root of the largest one to obtain $\sigma_{\max}$.
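The same check can be done numerically for Example 1.7; this NumPy sketch (our own, not part of the notes) mirrors the MATLAB command eig(A'*A) mentioned above.

```python
import numpy as np

# Check Example 1.7 numerically.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.linalg.norm(A, 1))           # max absolute column sum -> 6.0
print(np.linalg.norm(A, np.inf))      # max absolute row sum -> 7.0

# ||A||_2 = sigma_max: square root of the largest eigenvalue of A^T A,
# mirroring the MATLAB command eig(A'*A).
sigma_sq = np.linalg.eigvalsh(A.T @ A)
print(np.sqrt(sigma_sq.max()))        # -> 5.4650 (same as np.linalg.norm(A, 2))
```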
The formulas for the $l_1$, $l_2$ and $l_\infty$ subordinate matrix norms can be derived from Definition 1.6. For example,

$\|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2 = \sup_{\|x\|_2 = 1} \sqrt{x^T A^T A x} = \sqrt{\lambda_{\max}(A^T A)} = \sigma_{\max}.$    (3)
2 Condition number and stability
Definition 2.1. (Condition number) The condition number of a matrix indicates whether the solution of the linear system is sensitive to small changes in the data. It turns out that this sensitivity can be measured by the condition number, defined as

$\kappa(A) = \|A\| \, \|A^{-1}\|.$

To see why, suppose the right-hand side $b$ is perturbed to $b + \delta b$, so that the solution changes from $x$ to $x + \delta x$, where we have

$A \delta x = \delta b.$

For the original linear system $Ax = b$, we have $\|b\| = \|Ax\| \le \|A\| \|x\|$, which gives

$\frac{1}{\|x\|} \le \frac{\|A\|}{\|b\|}.$    (4)

For the perturbed linear system $A \delta x = \delta b$, we have $\delta x = A^{-1} \delta b$ and therefore $\|\delta x\| \le \|A^{-1}\| \|\delta b\|$. Combining this with eq. (4),

$\frac{\|\delta x\|}{\|x\|} \le \|A\| \, \|A^{-1}\| \, \frac{\|\delta b\|}{\|b\|} = \kappa(A) \, \frac{\|\delta b\|}{\|b\|}.$
Remark 2.2. $\|\delta b\| / \|b\|$ is the relative perturbation in $b$, and $\|\delta x\| / \|x\|$ is the resulting relative perturbation in the solution $x$. Therefore the condition number $\kappa(A)$ measures the sensitivity of the system to errors in the data. When the condition number is large, the computed solution of the system may be dangerously in error, and further checks should be made before accepting the solution as accurate.
Remark 2.3. The condition number is always at least 1: $\kappa(A) = \|A\| \|A^{-1}\| \ge \|A A^{-1}\| = \|I\| = 1$. Values of the condition number close to 1 indicate a well-conditioned matrix, whereas large values indicate an ill-conditioned matrix.
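To illustrate Remarks 2.2 and 2.3, the following NumPy sketch (our own; the nearly singular matrix is a made-up example) compares the actual relative change in $x$ with the bound $\kappa(A) \, \|\delta b\| / \|b\|$.

```python
import numpy as np

# An ill-conditioned 2x2 example (hypothetical data, chosen for illustration).
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])        # nearly singular, hence ill-conditioned
b = np.array([2.0, 2.0001])

kappa = np.linalg.cond(A)            # ||A|| * ||A^{-1}|| in the l2 norm
x = np.linalg.solve(A, b)            # unperturbed solution: [1, 1]

db = np.array([0.0, 1e-4])           # tiny perturbation of b
dx = np.linalg.solve(A, db)          # resulting perturbation of x (A dx = db)

print(kappa)                                            # ~ 4e4: ill-conditioned
print(np.linalg.norm(dx) / np.linalg.norm(x))           # large relative change in x
print(kappa * np.linalg.norm(db) / np.linalg.norm(b))   # upper bound from the estimate above
```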
3 Basic iterative method
An iterative method produces a sequence of approximate solution vectors $x^{(0)}, x^{(1)}, x^{(2)}, \cdots, x^{(k)}, \cdots$ for the system of equations $Ax = b$. The numerical procedure is designed so that, in principle, the sequence of approximate vectors converges to the actual solution, and as rapidly as possible. The process can be stopped when the approximate solutions are sufficiently close to the true solution or to each other. This is in contrast with Gaussian elimination, which produces no provisional solution along the way. A general iterative procedure goes as follows:
1. Select an initial guess $x^{(0)}$.
2. Design an iterative procedure of the form

$Q x^{(k)} = (Q - A) x^{(k-1)} + b, \qquad k = 1, 2, \cdots,$    (7)

for some nonsingular matrix $Q$.
To see that the iterative procedure eq. (7) is actually consistent with the original $Ax = b$, let $k \to \infty$ and presume that the approximate sequence converges to $x$; then we have

$Qx = (Q - A)x + b,$

which leads to $Ax = b$. Thus, if the sequence converges, its limit is the solution of $Ax = b$.
For the method to be efficient, we hope to choose $Q$ with the following properties for the general iterative procedure eq. (7) (a minimal code sketch of this procedure is given after the list):
1. $Q$ is easy to invert.
2. The sequence $x^{(k)}$ converges to $x$, no matter what the initial guess is.
3. The sequence $x^{(k)}$ converges to $x$ as rapidly as possible.
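Here is a minimal NumPy sketch of the general iteration eq. (7); the function name, tolerance and iteration cap are our own choices, and $Q$ is assumed to be supplied explicitly as a matrix.

```python
import numpy as np

def splitting_iteration(Q, A, b, x0, tol=1e-8, max_iter=500):
    """Generic iteration Q x^(k) = (Q - A) x^(k-1) + b from eq. (7).

    Returns the final iterate and the number of iterations performed.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_new = np.linalg.solve(Q, (Q - A) @ x + b)   # one application of Q^{-1} per step
        if np.linalg.norm(x_new - x, np.inf) < tol:   # stop when iterates are close
            return x_new, k
        x = x_new
    return x, max_iter
```

With Q taken as the diagonal of A this reproduces the Jacobi method described below, and with the lower triangular part of A it reproduces the Gauss-Seidel method.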
In the following, we will introduce three iterative methods: the Jacobi method, the Gauss-Seidel method and the successive over-relaxation (SOR) method.
Jacobi method. Let's first write the system of equations $Ax = b$ in its detailed form

$\sum_{j=1}^{n} a_{ij} x_j = b_i, \qquad 1 \le i \le n.$    (8)
In the $k$th iteration, we solve the $i$th equation for the $i$th unknown $x_i^{(k)}$, assuming that the other unknowns take their values $x_j^{(k-1)}$ from the previous iteration. This gives an equation that describes the Jacobi method:

$\sum_{j=1}^{i-1} a_{ij} x_j^{(k-1)} + a_{ii} x_i^{(k)} + \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} = b_i, \qquad 1 \le i \le n,$    (9)

or

$\sum_{j=1, j \ne i}^{n} a_{ij} x_j^{(k-1)} + a_{ii} x_i^{(k)} = b_i, \qquad 1 \le i \le n,$    (10)
which can be rearranged as

$x_i^{(k)} = \Big(b_i - \sum_{j=1, j \ne i}^{n} a_{ij} x_j^{(k-1)}\Big) \Big/ a_{ii} = \frac{b_i}{a_{ii}} - \sum_{j=1, j \ne i}^{n} \frac{a_{ij}}{a_{ii}} x_j^{(k-1)}.$    (11)

Here we assume that all diagonal entries are nonzero. If this is not the case, we can usually rearrange the equations so that it is.
Equation (9) can be written in the following matrix form:

$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix} x^{(k)} + \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ a_{21} & 0 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 0 \end{pmatrix} x^{(k-1)} = b,$

that is,

$D x^{(k)} + (A - D) x^{(k-1)} = b, \qquad \text{or} \qquad D x^{(k)} = (D - A) x^{(k-1)} + b,$

where $D$ denotes the diagonal part of $A$. This is in the form of eq. (7) with $Q = D$.
Example 3.1. (Jacobi iterative method) Let

$A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 8 \\ -5 \end{pmatrix}.$

Carry out a number of Jacobi iterations, starting with the zero initial vector.
Solution: Rewriting the equations, we have

$x_1^{(k)} = \frac{1}{2} x_2^{(k-1)} + \frac{1}{2},$

$x_2^{(k)} = \frac{1}{3} x_1^{(k-1)} + \frac{1}{3} x_3^{(k-1)} + \frac{8}{3},$    (12)

$x_3^{(k)} = \frac{1}{2} x_2^{(k-1)} - \frac{5}{2}.$    (13)
Taking the initial vector to be $x^{(0)} = [0, 0, 0]^T$, we find that

$x^{(1)} = [0.5000, 2.6667, -2.5000]^T$
$x^{(2)} = [1.8333, 2.0000, -1.1667]^T$
$\cdots$
$x^{(21)} = [2.0000, 3.0000, -1.0000]^T$

After 21 iterations, the actual solution is obtained within some fixed precision.
In the Jacobi method, the matrix $Q$ is taken to be the diagonal part of $A$,

$Q = D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$

With this $Q$, the Jacobi method can also be implemented as

$x^{(k)} = B x^{(k-1)} + h,$

where the Jacobi iteration matrix $B = I - Q^{-1}A$ and the constant vector $h = Q^{-1}b$ are

$B = \begin{pmatrix} 0 & 1/2 & 0 \\ 1/3 & 0 & 1/3 \\ 0 & 1/2 & 0 \end{pmatrix}, \qquad h = \begin{pmatrix} 1/2 \\ 8/3 \\ -5/2 \end{pmatrix}.$
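As a rough illustration (our own sketch, not part of the original notes), the componentwise Jacobi update eq. (11) can be coded directly; the tolerance 1e-4 is an assumed choice, so the exact iteration count may differ from the 21 reported above.

```python
import numpy as np

# Jacobi iteration for Example 3.1 using the componentwise update eq. (11).
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 3.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 8.0, -5.0])

x = np.zeros(3)                           # zero initial vector x^(0)
for k in range(1, 100):
    x_new = np.empty_like(x)
    for i in range(3):                    # solve equation i for x_i
        s = A[i, :] @ x - A[i, i] * x[i]  # sum of a_ij * x_j^(k-1) over j != i
        x_new[i] = (b[i] - s) / A[i, i]
    if np.linalg.norm(x_new - x, np.inf) < 1e-4:   # assumed tolerance
        x = x_new
        break
    x = x_new

print(k, x)                               # approaches the true solution [2, 3, -1]
```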
Gauss-Seidel method. In the Jacobi method, the equations are solved in order. When solving the $i$th equation, the components $x_j^{(k)}$ ($1 \le j < i$) are already available, and they are expected to be more accurate than $x_j^{(k-1)}$ ($1 \le j < i$). Using them immediately in their place, we obtain an equation that describes the Gauss-Seidel (GS) method:

$\sum_{j=1}^{i-1} a_{ij} x_j^{(k)} + a_{ii} x_i^{(k)} + \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} = b_i, \qquad 1 \le i \le n.$    (14)
Here we assume that all diagonal entries are nonzero. If this is not the case, we can usually rearrange the equations so that it is. Equation (14) can be written in the following matrix form:

$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} x^{(k)} + \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} x^{(k-1)} = b.$
Using the decomposition $A = D - C_L - C_U$, where $D$ is the diagonal part of $A$, $-C_L$ its strictly lower triangular part and $-C_U$ its strictly upper triangular part, the above matrix representation of the GS method is

$(D - C_L) x^{(k)} + (A - D + C_L) x^{(k-1)} = b.$

Rearranging it a little, we have

$(D - C_L) x^{(k)} = (D - C_L - A) x^{(k-1)} + b,$

which is in the form of eq. (7) with $Q = D - C_L$.
Example 3.2. (GS iterative method) Let $A$ and $b$ be the same as in Example 3.1. Carry out a number of GS iterations, starting with the zero initial vector.
Solution: Rewriting the equations, we have

$x_1^{(k)} = \frac{1}{2} x_2^{(k-1)} + \frac{1}{2},$

$x_2^{(k)} = \frac{1}{3} x_1^{(k)} + \frac{1}{3} x_3^{(k-1)} + \frac{8}{3},$    (16)

$x_3^{(k)} = \frac{1}{2} x_2^{(k)} - \frac{5}{2}.$    (17)
Taking the initial vector to be $x^{(0)} = [0, 0, 0]^T$, we find that

$x^{(1)} = [0.5000, 2.8333, -1.0833]^T$
$x^{(2)} = [1.9167, 2.9444, -1.0278]^T$
$\cdots$
$x^{(9)} = [2.0000, 3.0000, -1.0000]^T$

After 9 iterations, the actual solution is obtained within some fixed precision.
In the GS method, the matrix $Q$ is taken to be the lower triangular part of $A$,

$Q = D - C_L = \begin{pmatrix} 2 & 0 & 0 \\ -1 & 3 & 0 \\ 0 & -1 & 2 \end{pmatrix}.$

With this $Q$, the GS method can also be implemented as

$x^{(k)} = B x^{(k-1)} + h,$

where the GS iteration matrix $B$ and constant vector $h$ are

$B = \begin{pmatrix} 0 & 1/2 & 0 \\ 0 & 1/6 & 1/3 \\ 0 & 1/12 & 1/6 \end{pmatrix}, \qquad h = \begin{pmatrix} 1/2 \\ 17/6 \\ -13/12 \end{pmatrix}.$    (18)
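A corresponding Gauss-Seidel sketch (again our own illustration, with an assumed tolerance) differs from the Jacobi loop only in that it overwrites the components in place, so updated values are used immediately, as in eq. (14).

```python
import numpy as np

# Gauss-Seidel iteration for the same system as Example 3.1.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 3.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 8.0, -5.0])

x = np.zeros(3)
for k in range(1, 100):
    x_old = x.copy()
    for i in range(3):
        s = A[i, :] @ x - A[i, i] * x[i]   # new values below the diagonal, old above
        x[i] = (b[i] - s) / A[i, i]
    if np.linalg.norm(x - x_old, np.inf) < 1e-4:   # assumed tolerance
        break

print(k, x)   # reaches [2, 3, -1] in fewer iterations than Jacobi
```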
Successive Overrelaxation (SOR) method. Let's first write the system of equations $Ax = b$ in its detailed form

$\sum_{j=1}^{n} a_{ij} x_j = b_i, \qquad 1 \le i \le n.$

The idea of the SOR method is essentially the same as that of the GS method, except that it also uses $x_i^{(k-1)}$ when solving for $x_i^{(k)}$. The algorithm is the following:

$\sum_{j=1}^{i-1} a_{ij} x_j^{(k)} + a_{ii}\Big(\frac{1}{w} x_i^{(k)} + \big(1 - \frac{1}{w}\big) x_i^{(k-1)}\Big) + \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} = b_i, \qquad 1 \le i \le n,$    (19)

where $w$ is the relaxation parameter.
Again we assume that all diagonal entries are nonzero. Equation (19) can be written in the following matrix form:

$\begin{pmatrix} a_{11}/w & 0 & \cdots & 0 \\ a_{21} & a_{22}/w & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}/w \end{pmatrix} x^{(k)} + \begin{pmatrix} (1 - \frac{1}{w}) a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & (1 - \frac{1}{w}) a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (1 - \frac{1}{w}) a_{nn} \end{pmatrix} x^{(k-1)} = b,$

which is in the form of eq. (7) with $Q = D/w - C_L$.

Example 3.3. (SOR iterative method) Let $A$ and $b$ be the same as in Example 3.1, and take the relaxation parameter $w = 1.1$. Carry out a number of SOR iterations, starting with the zero initial vector.
Solution: Taking the initial vector to be $x^{(0)} = [0, 0, 0]^T$, we find that

$x^{(1)} = [0.5500, 3.1350, -1.0258]^T$
$\cdots$
$x^{(7)} = [2.0000, 3.0000, -1.0000]^T$

After 7 iterations, the actual solution is obtained within some fixed precision.
In the SOR method, the matrix $Q$ is taken to be $D/w - C_L$,

$Q = \begin{pmatrix} 2/w & 0 & 0 \\ -1 & 3/w & 0 \\ 0 & -1 & 2/w \end{pmatrix}.$

With this $Q$, the SOR($w$) method can also be implemented as

$x^{(k)} = B x^{(k-1)} + h,$

where $B = I - Q^{-1}A$ and $h = Q^{-1}b$; for $w = 1.1$,

$B = \begin{pmatrix} -1/10 & 11/20 & 0 \\ -11/300 & 61/600 & 11/30 \\ -121/6000 & 671/12000 & 61/600 \end{pmatrix}, \qquad h = \begin{pmatrix} 0.5500 \\ 3.1350 \\ -1.0258 \end{pmatrix}.$    (23)
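A matching SOR sketch (our own illustration, with $w = 1.1$ as in Example 3.3 and an assumed tolerance) blends the Gauss-Seidel value with the previous iterate according to eq. (19).

```python
import numpy as np

# SOR iteration for the same system as Example 3.1, with w = 1.1.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 3.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 8.0, -5.0])
w = 1.1

x = np.zeros(3)
for k in range(1, 100):
    x_old = x.copy()
    for i in range(3):
        s = A[i, :] @ x - A[i, i] * x[i]
        x_gs = (b[i] - s) / A[i, i]             # the Gauss-Seidel value
        x[i] = w * x_gs + (1.0 - w) * x_old[i]  # relax between GS value and old value
    if np.linalg.norm(x - x_old, np.inf) < 1e-4:   # assumed tolerance
        break

print(k, x)   # converges to [2, 3, -1], faster than GS for this w
```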
Remark 3.5. (Stopping criteria) The stopping criterion in the iterative methods for solving $Ax = b$ is to make sure that the distance, measured in some norm, between successive approximations is bounded by some prescribed tolerance, for example

$\|x^{(k)} - x^{(k-1)}\| \le \epsilon.$
4 Convergence analysis

The general iterative procedure eq. (7) can be written as

$x^{(k)} = (I - Q^{-1}A) x^{(k-1)} + Q^{-1} b.$    (24)

Let the error at the $k$th iteration be $e^{(k)} = x - x^{(k)}$. Since the true solution satisfies $x = (I - Q^{-1}A) x + Q^{-1} b$, subtracting both sides of eq. (24) from $x$ gives

$e^{(k)} = (I - Q^{-1}A) e^{(k-1)},$    (25)

and therefore

$\|e^{(k)}\| \le \|I - Q^{-1}A\| \, \|e^{(k-1)}\| \le \cdots \le \|I - Q^{-1}A\|^k \, \|e^{(0)}\|.$    (26)

As can be seen from eq. (26), if $\|I - Q^{-1}A\| < 1$, the error becomes smaller and smaller as the iteration goes on, and therefore the iterative method converges. What is more, the smaller $\|I - Q^{-1}A\|$ is, the faster the convergence we would expect. A very classical theorem about the convergence of iterative methods is the following.
Theorem 4.1. (Spectral Radius Theorem) In order for the sequence generated by eq. (7) to converge, no matter what starting point $x^{(0)}$ is selected, it is necessary and sufficient that all eigenvalues of the matrix $I - Q^{-1}A$ lie in the open unit disc, $|z| < 1$, in the complex plane.
Proof. Let $B = I - Q^{-1}A$; then

$e^{(k)} = B^k e^{(0)},$

hence

$\|e^{(k)}\| \le \|B^k\| \, \|e^{(0)}\|.$

By the spectral radius theorem, $B^k \to 0$ as $k \to \infty$ if and only if $\rho(B) < 1$, where $\rho$ is the spectral radius function of a matrix: for an $n$-by-$n$ matrix $A$ with eigenvalues $\lambda_i$,

$\rho(A) = \max_i |\lambda_i|.$
In Examples 3.1-3.3, we used the Jacobi, GS and SOR methods to solve the same system iteratively. We observed that they take 21, 9 and 7 iterations respectively to obtain solutions within the same tolerance. This behavior can actually be predicted by the eigenvalues of $I - Q^{-1}A$.
Example 4.2. Determine whether the Jacobi, GS and SOR methods will converge for the matrix $A$ and vector $b$ in Example 3.1, no matter what the initial condition is.
Solution: For the Jacobi method, we can easily compute the eigenvalues of the relevant matrix $I - Q^{-1}A$ (the matrix $B$ in Example 3.1). The steps are

$\det(B - \lambda I) = \det \begin{pmatrix} -\lambda & 1/2 & 0 \\ 1/3 & -\lambda & 1/3 \\ 0 & 1/2 & -\lambda \end{pmatrix} = -\lambda^3 + \frac{1}{3}\lambda = 0.$

Solving for $\lambda$ gives the three eigenvalues $0, \pm 0.5774$, all of which lie in the open unit disc. Thus, the Jacobi method converges.
Similarly, for the GS method, the eigenvalues of the relevant matrix $I - Q^{-1}A$ (the $B$ from Example 3.2, eq. (18)) are determined by

$\det(B - \lambda I) = \det \begin{pmatrix} -\lambda & 1/2 & 0 \\ 0 & 1/6 - \lambda & 1/3 \\ 0 & 1/12 & 1/6 - \lambda \end{pmatrix} = -\lambda (1/6 - \lambda)^2 + \frac{1}{36}\lambda = 0.$

Solving for $\lambda$ gives the three eigenvalues $0, 0, 0.3333$. Thus, the GS method converges.
Similarly, for the SOR method with $w = 1.1$, the eigenvalues of the relevant matrix $I - Q^{-1}A$ (the $B$ from Example 3.3, eq. (23)) are determined by

$\det(B - \lambda I) = \det \begin{pmatrix} -1/10 - \lambda & 11/20 & 0 \\ -11/300 & 61/600 - \lambda & 11/30 \\ -121/6000 & 671/12000 & 61/600 - \lambda \end{pmatrix} = -\frac{1}{1000} + \frac{31}{3000}\lambda + \frac{31}{300}\lambda^2 - \lambda^3 = 0.$

Solving for $\lambda$ gives the three eigenvalues $\approx 0.1200, 0.0833, -0.1000$. Thus, the SOR method converges.
Also, from the magnitudes of those eigenvalues, it is no surprise that SOR performs better than GS, which in turn performs better than Jacobi, in terms of efficiency.
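The hand computation above can be cross-checked numerically; the following NumPy sketch (our own, not part of the notes) forms $B = I - Q^{-1}A$ for each of the three splittings and prints its spectral radius, which governs the convergence speed.

```python
import numpy as np

# Cross-check Example 4.2: spectral radius of B = I - Q^{-1}A for each splitting.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 3.0, -1.0],
              [0.0, -1.0, 2.0]])
D = np.diag(np.diag(A))          # Jacobi: Q = D
L = np.tril(A)                   # Gauss-Seidel: Q = D - C_L (lower triangle of A)
w = 1.1
Q_sor = D / w + np.tril(A, -1)   # SOR: Q = D/w - C_L

for name, Q in [("Jacobi", D), ("Gauss-Seidel", L), ("SOR(1.1)", Q_sor)]:
    B = np.eye(3) - np.linalg.solve(Q, A)    # I - Q^{-1} A
    rho = max(abs(np.linalg.eigvals(B)))     # spectral radius rho(B)
    print(name, rho)   # ~0.5774, ~0.3333, ~0.1200: smaller rho means faster convergence
```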