4 QR Factorization

4.1 Reduced vs. Full QR
Every full-rank matrix $A \in \mathbb{C}^{m\times n}$ ($m \ge n$) has a reduced QR factorization
\[
A = \hat{Q}\hat{R},
\]
where $\hat{Q} \in \mathbb{C}^{m\times n}$ with orthonormal columns and $\hat{R} \in \mathbb{C}^{n\times n}$ an upper triangular matrix such that $\hat{R}(j,j) \neq 0$, $j = 1, \ldots, n$.
As with the SVD, $\hat{Q}$ provides an orthonormal basis for $\mathrm{range}(A)$, i.e., the columns of $A$ are linear combinations of the columns of $\hat{Q}$. In fact, we have $\mathrm{range}(A) = \mathrm{range}(\hat{Q})$. This is true since $Ax = \hat{Q}\hat{R}x = \hat{Q}y$ for some $y$, so that $\mathrm{range}(A) \subseteq \mathrm{range}(\hat{Q})$. Moreover, $\mathrm{range}(\hat{Q}) \subseteq \mathrm{range}(A)$ since we can write $A\hat{R}^{-1} = \hat{Q}$ because $\hat{R}$ is upper triangular with nonzero diagonal elements. (Now we have $\hat{Q}x = A\hat{R}^{-1}x = Ay$ for some $y$.)
Note that any partial set of columns satisfies the same property, i.e.,
\[
\mathrm{span}\{a_1, \ldots, a_j\} = \mathrm{span}\{q_1, \ldots, q_j\}, \qquad j = 1, \ldots, n.
\]
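These properties are easy to check numerically. The following Matlab sketch uses the built-in economy-size qr to compute a reduced QR factorization (the random test matrix is an arbitrary choice, not from the text):

    A = randn(6, 3);             % arbitrary full-rank test matrix
    [Qhat, Rhat] = qr(A, 0);     % reduced QR: Qhat is 6-by-3, Rhat is 3-by-3
    norm(A - Qhat*Rhat)          % near machine precision
    norm(Qhat'*Qhat - eye(3))    % columns of Qhat are orthonormal
    rank([A Qhat])               % returns 3, so range(A) = range(Qhat)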
In order to obtain the full QR factorization we proceed as with the SVD and extend $\hat{Q}$ to a unitary matrix $Q \in \mathbb{C}^{m\times m}$ by appending $m-n$ additional orthonormal columns, and extend $\hat{R}$ to $R \in \mathbb{C}^{m\times n}$ by appending rows of zeros.

4.2 Gram-Schmidt Orthogonalization

Writing out the reduced factorization $A = \hat{Q}\hat{R}$ column by column yields
\[
a_1 = r_{11} q_1, \tag{14}
\]
\[
a_2 = r_{12} q_1 + r_{22} q_2, \tag{15}
\]
\[
\vdots
\]
\[
a_n = r_{1n} q_1 + r_{2n} q_2 + \cdots + r_{nn} q_n = \sum_{i=1}^{n} r_{in} q_i. \tag{17}
\]
Note that in these formulas the columns $a_j$ of $A$ are given and we want to determine the columns $q_j$ of $\hat{Q}$ and the entries $r_{ij}$ of $\hat{R}$ such that $\hat{Q}$ has orthonormal columns, i.e.,
\[
q_i^* q_j = \delta_{ij}, \tag{18}
\]
$\hat{R}$ is upper triangular, and $A = \hat{Q}\hat{R}$. The latter two conditions are already reflected in the formulas above.
Using (14) in the orthogonality condition (18) we get
\[
q_1^* q_1 = \frac{a_1^* a_1}{r_{11}^2} = 1,
\]
so that
\[
r_{11} = \sqrt{a_1^* a_1} = \|a_1\|_2.
\]
Note that we arbitrarily chose the positive square root here (so that the factorization
becomes unique).
Next, the orthogonality condition (18) gives us
\[
q_1^* q_2 = 0 \qquad \text{and} \qquad q_2^* q_2 = 1.
\]
Now we apply (15) to the first of these two conditions. Then
\[
q_1^* q_2 = \frac{q_1^* a_2 - r_{12}\, q_1^* q_1}{r_{22}} = 0.
\]
Since we ensured $q_1^* q_1 = 1$ in the previous step, the numerator yields $r_{12} = q_1^* a_2$, so that
\[
q_2 = \frac{a_2 - (q_1^* a_2)\, q_1}{r_{22}}.
\]
To find $r_{22}$ we normalize, i.e., demand that $q_2^* q_2 = 1$, or equivalently $\|q_2\|_2 = 1$. This immediately gives
\[
r_{22} = \|a_2 - (q_1^* a_2)\, q_1\|_2.
\]
To fully understand how the algorithm proceeds we add one more step (for n = 3).
Now we have three orthogonality conditions:
\[
q_1^* q_3 = 0, \qquad q_2^* q_3 = 0, \qquad q_3^* q_3 = 1.
\]
The first of these conditions together with (17) for $n = 3$ yields
\[
q_1^* q_3 = \frac{q_1^* a_3 - r_{13}\, q_1^* q_1 - r_{23}\, q_1^* q_2}{r_{33}} = 0,
\]
so that $r_{13} = q_1^* a_3$ due to the orthonormality of the columns $q_1$ and $q_2$.
Similarly, the second orthogonality condition together with (17) for $n = 3$ yields
\[
q_2^* q_3 = \frac{q_2^* a_3 - r_{13}\, q_2^* q_1 - r_{23}\, q_2^* q_2}{r_{33}} = 0,
\]
so that $r_{23} = q_2^* a_3$.
Together this gives us
\[
q_3 = \frac{a_3 - (q_1^* a_3)\, q_1 - (q_2^* a_3)\, q_2}{r_{33}},
\]
and the last unknown, $r_{33}$, is determined by normalization, i.e.,
\[
r_{33} = \|a_3 - (q_1^* a_3)\, q_1 - (q_2^* a_3)\, q_2\|_2.
\]
In general we can formulate the following algorithm:
\[
r_{ij} = q_i^* a_j \quad (i \neq j),
\]
\[
v_j = a_j - \sum_{i=1}^{j-1} r_{ij} q_i,
\]
\[
r_{jj} = \|v_j\|_2, \qquad q_j = \frac{v_j}{r_{jj}}.
\]
We can compute the reduced QR factorization with the following (somewhat more practical, almost-Matlab implementation of the) classical Gram-Schmidt algorithm.

Algorithm (Classical Gram-Schmidt)
for j = 1 : n
    v_j = a_j
    for i = 1 : (j - 1)
        r_ij = q_i^* a_j
        v_j = v_j - r_ij q_i
    end
    r_jj = ||v_j||_2
    q_j = v_j / r_jj
end
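In Matlab this translates almost line by line into the following sketch (one possible form of the routine clgs.m mentioned in Section 4.6; the actual assignment code may differ):

    function [Q, R] = clgs(A)
    % Classical Gram-Schmidt: A = Q*R with orthonormal columns of Q.
    [m, n] = size(A);
    Q = zeros(m, n);
    R = zeros(n, n);
    for j = 1:n
        v = A(:, j);
        for i = 1:j-1
            R(i, j) = Q(:, i)' * A(:, j);    % r_ij = q_i^* a_j
            v = v - R(i, j) * Q(:, i);
        end
        R(j, j) = norm(v);
        Q(:, j) = v / R(j, j);
    end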
Remark The classical Gram-Schmidt algorithm is not ideal for numerical calculations since it is known to be unstable. Note that, by construction, the Gram-Schmidt algorithm yields an existence proof for the QR factorization.
Theorem 4.1 Let $A \in \mathbb{C}^{m\times n}$ with $m \ge n$. Then $A$ has a QR factorization. Moreover, if $A$ is of full rank ($n$), then the reduced factorization $A = \hat{Q}\hat{R}$ with $r_{jj} > 0$ is unique.
Example We compute the QR factorization of the matrix
\[
A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}.
\]
First, $v_1 = a_1 = (1, 0, 1)^T$ and $r_{11} = \|v_1\|_2 = \sqrt{2}$. This gives us
\[
q_1 = \frac{v_1}{\|v_1\|_2} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.
\]
Next,
\[
v_2 = a_2 - \underbrace{(q_1^* a_2)}_{= r_{12}} q_1
    = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} - \sqrt{2}\cdot\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
    = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}.
\]
This calculation required that $r_{12} = \frac{2}{\sqrt{2}} = \sqrt{2}$. Moreover, $r_{22} = \|v_2\|_2 = \sqrt{3}$ and
\[
q_2 = \frac{v_2}{\|v_2\|_2} = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}.
\]
In the third iteration we have
\[
v_3 = a_3 - \underbrace{(q_1^* a_3)}_{= r_{13}} q_1 - \underbrace{(q_2^* a_3)}_{= r_{23}} q_2,
\]
from which we first compute $r_{13} = \frac{1}{\sqrt{2}}$ and $r_{23} = 0$. This gives us
\[
v_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} - \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - 0
    = \frac{1}{2} \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}.
\]
Finally, $r_{33} = \|v_3\|_2 = \frac{\sqrt{6}}{2}$ and
\[
q_3 = \frac{v_3}{\|v_3\|_2} = \frac{1}{\sqrt{6}} \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}.
\]
Collecting all of the information we end up with
\[
Q = \begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} \\
0 & \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}}
\end{pmatrix}
\quad\text{and}\quad
R = \begin{pmatrix}
\sqrt{2} & \sqrt{2} & \frac{1}{\sqrt{2}} \\
0 & \sqrt{3} & 0 \\
0 & 0 & \frac{\sqrt{6}}{2}
\end{pmatrix}.
\]
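We can check this result in Matlab. Note that the built-in qr does not enforce $r_{jj} > 0$, so we normalize the signs afterwards to obtain the unique factorization of Theorem 4.1 (a sketch):

    A = [1 2 0; 0 1 1; 1 0 1];
    [Q, R] = qr(A);
    D = diag(sign(diag(R)));   % D^2 = I, so (Q*D)*(D*R) = A
    Q = Q * D;                 % flip column signs of Q
    R = D * R;                 % flip row signs of R so that diag(R) > 0
    % Q and R now agree with the hand computation, e.g. R(1,1) = sqrt(2)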
4.3 An Application of the QR Factorization
Consider the solution of the linear system $Ax = b$ with $A \in \mathbb{C}^{m\times m}$ nonsingular. Since
\[
Ax = b \iff QRx = b \iff Rx = Q^* b,
\]
where the last equivalence holds since $Q$ is unitary, we can proceed as follows:

1. Compute $A = QR$ (which is the same as $A = \hat{Q}\hat{R}$ in this case).

2. Compute $y = Q^* b$.

3. Solve the upper triangular system $Rx = y$ (a Matlab sketch follows below).
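In Matlab these three steps can be sketched as follows (the right-hand side b is a hypothetical example):

    A = [1 2 0; 0 1 1; 1 0 1];   % matrix from the example above
    b = [1; 2; 3];               % hypothetical right-hand side
    [Q, R] = qr(A);              % step 1
    y = Q' * b;                  % step 2
    x = R \ y;                   % step 3: backslash on a triangular matrix
    norm(A*x - b)                % residual, near machine precision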
We will have more applications for the QR factorization later in the context of least
squares problems.
Remark The QR factorization (if implemented properly) yields a very stable method for solving $Ax = b$. However, it is about twice as costly as Gaussian elimination (or $A = LU$). In fact, the QR factorization can also be applied to rectangular systems, and it is the basis of Matlab's backslash matrix division operator. We will discuss Matlab examples in a later section.
4.4 Modified Gram-Schmidt
The classical Gram-Schmidt algorithm is based on projections of the form
\[
v_j = a_j - \sum_{i=1}^{j-1} r_{ij} q_i = a_j - \sum_{i=1}^{j-1} (q_i^* a_j)\, q_i.
\]
Note that this means we are performing a sequence of vector projections. The starting point for the modified Gram-Schmidt algorithm is to rewrite one step of the classical Gram-Schmidt algorithm as a single matrix projection, i.e.,
\[
v_j = a_j - \sum_{i=1}^{j-1} (q_i^* a_j)\, q_i
    = a_j - \sum_{i=1}^{j-1} (q_i q_i^*)\, a_j
    = a_j - \hat{Q}_{j-1} \hat{Q}_{j-1}^*\, a_j
    = \underbrace{\left( I - \hat{Q}_{j-1} \hat{Q}_{j-1}^* \right)}_{= P_j} a_j,
\]
where $\hat{Q}_{j-1} = [q_1\ q_2\ \ldots\ q_{j-1}]$ is the matrix formed by the column vectors $q_i$, $i = 1, \ldots, j-1$.
In order to obtain the modified Gram-Schmidt algorithm we require the following observation: the single projection $P_j$ can also be viewed as a series of complementary projections onto the individual columns $q_i$, i.e.,

Lemma 4.2 If $P_j = I - \hat{Q}_{j-1} \hat{Q}_{j-1}^*$ with $\hat{Q}_{j-1} = [q_1\ q_2\ \ldots\ q_{j-1}]$ a matrix with orthonormal columns, then
\[
P_j = \prod_{i=1}^{j-1} P_{\perp q_i}.
\]
Proof First we remember that
\[
P_j = I - \hat{Q}_{j-1} \hat{Q}_{j-1}^* = I - \sum_{i=1}^{j-1} q_i q_i^*
\]
and that the complementary projector is defined as
\[
P_{\perp q_i} = I - q_i q_i^*.
\]
Therefore, we need to show that
\[
I - \sum_{i=1}^{j-1} q_i q_i^* = \prod_{i=1}^{j-1} \left( I - q_i q_i^* \right).
\]
This is done by induction. For $j = 1$ the sum and the product are empty, and the statement holds by the convention that an empty sum is zero and an empty product is the identity, i.e., $P_1 = I$.
Now we step from $j-1$ to $j$. First,
\[
\prod_{i=1}^{j} \left( I - q_i q_i^* \right)
= \left[ \prod_{i=1}^{j-1} \left( I - q_i q_i^* \right) \right] \left( I - q_j q_j^* \right)
= \left( I - \sum_{i=1}^{j-1} q_i q_i^* \right) \left( I - q_j q_j^* \right)
\]
by the induction hypothesis. Expanding the right-hand side yields
\[
I - \sum_{i=1}^{j-1} q_i q_i^* - q_j q_j^* + \sum_{i=1}^{j-1} q_i \underbrace{q_i^* q_j}_{=0}\, q_j^*
= I - \sum_{i=1}^{j} q_i q_i^*,
\]
so that the claim is proved.
Summarizing the discussion thus far, a single step in the Gram-Schmidt algorithm can be written as
\[
v_j = P_{\perp q_{j-1}} P_{\perp q_{j-2}} \cdots P_{\perp q_1}\, a_j,
\]
or more algorithmically:

v_j = a_j
for i = 1 : (j - 1)
    v_j = v_j - q_i (q_i^* v_j)
end
For the final modified Gram-Schmidt algorithm the projections are arranged differently, i.e., $P_{\perp q_i}$ is applied to all $v_j$ with $j > i$. This leads to
Algorithm (Modified Gram-Schmidt)

for i = 1 : n
    v_i = a_i
end
for i = 1 : n
    r_ii = ||v_i||_2
    q_i = v_i / r_ii
    for j = (i + 1) : n
        r_ij = q_i^* v_j
        v_j = v_j - r_ij q_i
    end
end
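A Matlab sketch of the modified algorithm (one possible form of the routine mgs.m used in Section 4.6; note that $r_{ij}$ is computed from the updated working vector $v_j$, not from $a_j$):

    function [Q, R] = mgs(A)
    % Modified Gram-Schmidt: A = Q*R with orthonormal columns of Q.
    [m, n] = size(A);
    V = A;                       % working vectors v_j (columns of V)
    Q = zeros(m, n);
    R = zeros(n, n);
    for i = 1:n
        R(i, i) = norm(V(:, i));
        Q(:, i) = V(:, i) / R(i, i);
        for j = i+1:n
            R(i, j) = Q(:, i)' * V(:, j);    % r_ij = q_i^* v_j
            V(:, j) = V(:, j) - R(i, j) * Q(:, i);
        end
    end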
We can compare the operations count, i.e., the number of basic arithmetic operations (+, -, *, /), of the two algorithms. We give only a rough estimate (exact counts will be part of the homework). Assuming vectors of length $m$, roughly $4m$ operations are performed inside the innermost loop of the classical Gram-Schmidt algorithm (namely, $m$ multiplications and $m-1$ additions for the inner product, and $m$ multiplications and $m$ subtractions for the update in the second line). Thus, the operations count is roughly
\[
\sum_{j=1}^{n} \sum_{i=1}^{j-1} 4m = \sum_{j=1}^{n} (j-1)\, 4m \approx 4m \sum_{j=1}^{n} j = 4m\, \frac{n(n+1)}{2} \approx 2mn^2.
\]
The innermost loop of the modified Gram-Schmidt algorithm consists formally of exactly the same operations, i.e., it also requires roughly $4m$ operations. Thus its operations count is
\[
\sum_{i=1}^{n} \sum_{j=i+1}^{n} 4m = \sum_{i=1}^{n} (n-i)\, 4m = 4m \left( n^2 - \sum_{i=1}^{n} i \right) = 4m \left( n^2 - \frac{n(n+1)}{2} \right) \approx 2mn^2.
\]
Thus, the operations count for the two algorithms is the same. In fact, mathematically,
the two algorithms can be shown to be identical. However, we will learn later that the
modied Gram-Schmidt algorithm is to be preferred due to its better numerical stability
(see Section 4.6).
4.5 Gram-Schmidt as Triangular Orthogonalization
One can view the modified Gram-Schmidt algorithm (applied to the entire matrix $A$) as
\[
A R_1 R_2 \cdots R_n = \hat{Q}, \tag{19}
\]
where $R_1, \ldots, R_n$ are upper triangular matrices. For example,
\[
R_1 = \begin{pmatrix}
\frac{1}{r_{11}} & -\frac{r_{12}}{r_{11}} & -\frac{r_{13}}{r_{11}} & \cdots & -\frac{r_{1n}}{r_{11}} \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \cdots & & 1
\end{pmatrix},
\qquad
R_2 = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & \frac{1}{r_{22}} & -\frac{r_{23}}{r_{22}} & \cdots & -\frac{r_{2n}}{r_{22}} \\
0 & 0 & 1 & & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \cdots & & 1
\end{pmatrix}
\]
and so on.
Thus we are applying triangular transformation matrices to $A$ to obtain a matrix $\hat{Q}$ with orthonormal columns. We refer to this approach as triangular orthogonalization. Since the inverse of an upper triangular matrix is again an upper triangular matrix, and the product of two upper triangular matrices is also upper triangular, we can think of the product $R_1 R_2 \cdots R_n$ in (19) as a matrix $\hat{R}^{-1}$. Thus, the (modified) Gram-Schmidt algorithm yields a reduced QR factorization $A = \hat{Q}\hat{R}$ of $A$.
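As a small illustration, the following Matlab sketch builds $R_1$ for the $3\times 3$ example matrix from Section 4.2 and verifies that $A R_1$ has $q_1$ as its first column and no $q_1$-component in the remaining columns:

    A = [1 2 0; 0 1 1; 1 0 1];
    r11 = norm(A(:, 1));
    q1  = A(:, 1) / r11;
    r12 = q1' * A(:, 2);
    r13 = q1' * A(:, 3);
    R1 = [1/r11, -r12/r11, -r13/r11; 0 1 0; 0 0 1];
    A1 = A * R1;    % first column is q1, other columns orthogonal to q1
    q1' * A1        % approximately [1 0 0]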
4.6 Stability of CGS vs. MGS in Matlab
The following discussion is taken from [Trefethen/Bau] and illustrated by the Matlab code GramSchmidt.m (whose supporting routines clgs.m and mgs.m are part of a computer assignment).
We create a random matrix $A \in \mathbb{R}^{80\times 80}$ by selecting singular values $\frac{1}{2}, \frac{1}{4}, \ldots, \frac{1}{2^{80}}$ and generating
\[
A = U \Sigma V^T = \sum_{i=1}^{80} \sigma_i u_i v_i^T
\]
with random orthogonal matrices $U$ and $V$, so that
\[
a_j = A(:, j) = \sum_{i=1}^{80} \sigma_i u_i v_{ji}.
\]
Next, $V$ is a normally distributed random unitary matrix, and therefore the entries in one of its columns satisfy
\[
|v_{ji}| \approx \frac{1}{\sqrt{80}} \approx 0.1.
\]
Now from the (classical) Gram-Schmidt algorithm we know that
\[
r_{11} = \|a_1\|_2 = \left\| \sum_{i=1}^{80} \sigma_i v_{1i} u_i \right\|_2.
\]
Since the singular values were chosen to decrease exponentially, only the first one really matters, i.e.,
\[
r_{11} \approx \|\sigma_1 v_{11} u_1\|_2 = \sigma_1 |v_{11}| \approx \frac{1}{2}\cdot\frac{1}{\sqrt{80}}
\]
(since $\|u_1\|_2 = 1$).
Similar arguments result in the general relationship
\[
r_{jj} \approx \frac{\sigma_j}{\sqrt{80}} = \frac{2^{-j}}{\sqrt{80}}
\]
(the latter of which we know). The plot produced by GramSchmidt.m shows how accurately the diagonal elements of $R$ are computed, since the computed $r_{jj}$ can be compared against these known values. We can observe that the classical Gram-Schmidt algorithm is stable up to $r_{jj} \approx 10^{-8} \approx \sqrt{\varepsilon_{\text{machine}}}$, after which the computed values stagnate, whereas the modified Gram-Schmidt algorithm remains accurate down to $r_{jj} \approx 10^{-16} \approx \varepsilon_{\text{machine}}$.
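The experiment can be sketched in Matlab as follows (assuming clgs and mgs are implementations such as the sketches above; the actual GramSchmidt.m may differ in its details):

    [U, ~] = qr(randn(80));           % random orthogonal U
    [V, ~] = qr(randn(80));           % random orthogonal V
    S = diag(2.^(-(1:80)));           % singular values 2^-1, ..., 2^-80
    A = U * S * V';
    [QC, RC] = clgs(A);
    [QM, RM] = mgs(A);
    semilogy(1:80, diag(RC), 'o', 1:80, diag(RM), 'x')
    legend('classical Gram-Schmidt', 'modified Gram-Schmidt')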
4.7 Householder Triangularization

While the Gram-Schmidt approach applies triangular matrices to $A$ to produce a matrix with orthonormal columns (triangular orthogonalization), the Householder approach applies a sequence of unitary matrices $Q_k$ to $A$ so that the result is upper triangular (orthogonal triangularization). For an $m \times n$ matrix with $m = 4$, $n = 3$ this looks like
\[
A = \begin{pmatrix} x & x & x \\ x & x & x \\ x & x & x \\ x & x & x \end{pmatrix}, \quad
Q_1 A = \begin{pmatrix} x & x & x \\ 0 & x & x \\ 0 & x & x \\ 0 & x & x \end{pmatrix}, \quad
Q_2 Q_1 A = \begin{pmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \\ 0 & 0 & x \end{pmatrix}, \quad
Q_3 Q_2 Q_1 A = \begin{pmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \\ 0 & 0 & 0 \end{pmatrix},
\]
where $x$ stands for a generally nonzero entry. From this we note that $Q_k$ needs to operate on rows $k : m$ and must not change the first $k-1$ rows and columns. Therefore it will be of the form
\[
Q_k = \begin{pmatrix} I_{k-1} & O \\ O & F \end{pmatrix},
\]
where $I_{k-1}$ is a $(k-1)\times(k-1)$ identity matrix and $F$ has the effect that
\[
Fx = \|x\|\, e_1
\]
in order to introduce zeros in the lower part of column $k$. We will call $F$ a Householder reflector.
Graphically, we can use either a rotation (Givens rotation) or a reflection about the bisector of $x$ and $e_1$ to transform $x$ to $\|x\| e_1$.
Recall from an earlier homework assignment that, given an orthogonal projector $P$, the matrix $I - 2P$ is unitary. In fact, $I - 2P$ is a reflector. Therefore, if we choose $v = \|x\| e_1 - x$ and define
\[
P = \frac{v v^*}{v^* v},
\]
then
\[
F = I - 2P = I - 2\,\frac{v v^*}{v^* v}
\]
is our desired Householder reflector. Since it is easy to see that $F$ is Hermitian, so is $Q_k$. Note that $Fx$ can be computed as
\[
Fx = \left( I - 2\,\frac{v v^*}{v^* v} \right) x
   = x - 2\, \underbrace{\frac{v v^*}{v^* v}}_{\text{matrix}}\, x
   = x - 2 v\, \underbrace{\frac{v^* x}{v^* v}}_{\text{scalar}},
\]
so that $F$ never needs to be formed explicitly.
In fact, we have two choices for the reflection $Fx$:
\[
v_+ = x + \mathrm{sign}(x(1))\,\|x\|\, e_1 \qquad \text{and} \qquad v_- = x - \mathrm{sign}(x(1))\,\|x\|\, e_1.
\]
Here $x(1)$ denotes the first component of the vector $x$. These choices are illustrated in Figure 4.

Figure 4: Graphical interpretation of Householder reflections.

A numerically more stable algorithm (one that avoids cancellation of significant digits) is guaranteed by choosing the reflection which moves $x$ further. Therefore we pick
\[
v = x + \mathrm{sign}(x(1))\,\|x\|\, e_1 = v_+.
\]
The resulting algorithm is
Algorithm (Householder QR)

for k = 1 : n   (loop over columns)
    x = A(k : m, k)
    v_k = x + sign(x(1)) ||x||_2 e_1
    v_k = v_k / ||v_k||_2
    A(k : m, k : n) = A(k : m, k : n) - 2 v_k (v_k^* A(k : m, k : n))
end
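A Matlab sketch of this algorithm follows (the function name house is ours; storing the reflection vectors in the columns of a matrix W, padded with zeros, is an implementation choice; note also that Matlab's sign(0) is 0, whereas the algorithm needs sign(0) = 1):

    function [W, R] = house(A)
    % Householder QR: overwrite A with R, store reflection vectors in W.
    [m, n] = size(A);
    W = zeros(m, n);
    for k = 1:n
        x = A(k:m, k);
        s = sign(x(1));  if s == 0, s = 1; end
        v = x;
        v(1) = v(1) + s * norm(x);    % v = x + sign(x(1))*||x||*e_1
        v = v / norm(v);
        A(k:m, k:n) = A(k:m, k:n) - 2 * v * (v' * A(k:m, k:n));
        W(k:m, k) = v;
    end
    R = triu(A(1:n, 1:n));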
Note that the statement in the last line of the Householder QR algorithm performs the reflection simultaneously for all remaining columns of the matrix $A$. On completion of this algorithm the matrix $A$ contains the matrix $R$ of the QR factorization, and the vectors $v_1, \ldots, v_n$ are the reflection vectors. They will be used to calculate matrix-vector products of the form $Qx$ and $Q^* b$.

Example Consider $x = (2, 1, 2)^T$, so that $\|x\|_2 = 3$ and $v = x + \mathrm{sign}(x(1))\,\|x\|_2\, e_1 = (5, 1, 2)^T$. We note that $v^* x = 15$ and $v^* v = 30$.
Thus
\[
Fx = \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix} - 2 \begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix} \frac{15}{30}
   = \begin{pmatrix} -3 \\ 0 \\ 0 \end{pmatrix}.
\]
This vector contains the desired zeros below the first entry.
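A quick Matlab check of this computation:

    x = [2; 1; 2];
    v = x + sign(x(1)) * norm(x) * [1; 0; 0];   % v = (5, 1, 2)^T
    Fx = x - 2 * v * (v' * x) / (v' * v)        % yields (-3, 0, 0)^T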
For many applications only products of the form $Q^* b$ and $Qx$ are required, so that the matrix $Q$ itself is never formed explicitly; the reflection vectors $v_k$ suffice. For the first of these products we note that
\[
Q_n \cdots Q_2 Q_1 A = R \quad\Longrightarrow\quad Q^* = Q_n \cdots Q_2 Q_1,
\]
so that we can apply exactly the same steps that were applied to the matrix $A$ in the Householder QR algorithm:
Algorithm (Compute $Q^* b$)

for k = 1 : n
    b(k : m) = b(k : m) - 2 v_k (v_k^* b(k : m))
end
For the second algorithm we use $Q = Q_1 Q_2 \cdots Q_n$ (since $Q_i^* = Q_i$), so that the following algorithm simply performs the reflection operations in reverse order:

Algorithm (Compute $Qx$)

for k = n : -1 : 1
    x(k : m) = x(k : m) - 2 v_k (v_k^* x(k : m))
end
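Both loops are easy to express in Matlab, e.g., using the matrix W of reflection vectors from the house sketch above (the function names are ours, and each function would be stored in its own file):

    function b = applyQstar(b, W)
    % Compute Q'*b from the stored reflection vectors.
    [m, n] = size(W);
    for k = 1:n
        b(k:m) = b(k:m) - 2 * W(k:m, k) * (W(k:m, k)' * b(k:m));
    end

    function x = applyQ(x, W)
    % Compute Q*x by applying the reflections in reverse order.
    [m, n] = size(W);
    for k = n:-1:1
        x(k:m) = x(k:m) - 2 * W(k:m, k) * (W(k:m, k)' * x(k:m));
    end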
The operations counts for the three algorithms listed above are

Householder QR: $O\!\left( 2mn^2 - \frac{2}{3} n^3 \right)$,

Compute $Q^* b$, compute $Qx$: $O(mn)$.