Matrix Analysis Manual Solutions
James R. Schott
Chapter 1
1.1 (a)
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
(b)
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.$$
1.9 We need A = B + C, where B' = B and C' = -C. This would imply that A' = B - C, and so
A + A' = 2B and A - A' = 2C. Thus, we have the solutions
$$B = \frac{1}{2}(A + A'), \qquad C = \frac{1}{2}(A - A').$$
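As a quick numerical check of this decomposition (a minimal sketch, assuming numpy is available; the matrix A below is an arbitrary illustration, not one from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # any square matrix

B = (A + A.T) / 2                    # symmetric part
C = (A - A.T) / 2                    # skew-symmetric part

print(np.allclose(A, B + C))         # True
print(np.allclose(B, B.T))           # True: B' = B
print(np.allclose(C, -C.T))          # True: C' = -C
```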
1.11 Since A' = -A, we have -A^2 = (-A)A = A'A, and so for any m × 1 vector x,
$$x'(-A^2)x = x'A'Ax = y'y = \sum_{i=1}^{m} y_i^2 \ge 0,$$
where y = Ax.
1.13 (a) (αA)^{-1} = α^{-1}A^{-1} since
$$(\alpha^{-1}A^{-1})(\alpha A) = (\alpha^{-1}\alpha)A^{-1}A = I_m,$$
so |A^{-1}| = |A|^{-1}.
$$\mathrm{diag}(a_{11}^{-1}, \ldots, a_{mm}^{-1})\,\mathrm{diag}(a_{11}, \ldots, a_{mm}) = \mathrm{diag}(a_{11}^{-1}a_{11}, \ldots, a_{mm}^{-1}a_{mm}) = \mathrm{diag}(1, \ldots, 1) = I_m.$$
$$B^{-1}A^{-1}AB = B^{-1}I_mB = B^{-1}B = I_m.$$
1.15 Using the cofactor expansion formula on the first column of A, we get
$$|A| = 1\times\begin{vmatrix} 1 & 2 & 0 \\ 2 & 2 & 1 \\ -1 & 1 & 2 \end{vmatrix} + 1\times\begin{vmatrix} 2 & 1 & 1 \\ 1 & 2 & 0 \\ -1 & 1 & 2 \end{vmatrix} = \{4 - 2 + 0 - (0 + 8 + 1)\} + \{8 + 1 + 0 - (-2 + 2 + 0)\} = -7 + 9 = 2.$$
1.17 Let B be the matrix obtained from A by replacing the kth row of A by its ith row. Note that A_{kj} = B_{kj}
for j = 1, ..., m, and so
$$\sum_{j=1}^{m} a_{ij}A_{kj} = \sum_{j=1}^{m} b_{kj}B_{kj} = |B| = 0,$$
due to Theorem 1.4(h). Similarly, let C be the matrix obtained from A by replacing the kth column
of A by its ith column. Note that A_{jk} = C_{jk} for j = 1, ..., m, and so
$$\sum_{j=1}^{m} a_{ji}A_{jk} = \sum_{j=1}^{m} c_{jk}C_{jk} = |C| = 0.$$
1.19 The adjoint matrix of A is
$$A^{\#} = \begin{pmatrix} -7 & -5 & 9 & -1 \\ -4 & 4 & 2 & 0 \\ -2 & 0 & 2 & 0 \\ 3 & 1 & -3 & 1 \end{pmatrix},$$
and since |A| = 2, the inverse of A is
$$A^{-1} = \frac{1}{2}\begin{pmatrix} -7 & -5 & 9 & -1 \\ -4 & 4 & 2 & 0 \\ -2 & 0 & 2 & 0 \\ 3 & 1 & -3 & 1 \end{pmatrix}.$$
$$(D + \alpha ab')^{-1} = D^{-1} - \alpha D^{-1}ab'D^{-1}/(1 + \alpha b'D^{-1}a).$$
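The rank-one update identity above can be spot-checked numerically. A minimal sketch, with D, a, b, and α chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
D = np.diag(rng.uniform(1.0, 2.0, m))    # nonsingular diagonal D
a, b = rng.standard_normal(m), rng.standard_normal(m)
alpha = 0.7

lhs = np.linalg.inv(D + alpha * np.outer(a, b))
Dinv = np.linalg.inv(D)
rhs = Dinv - alpha * Dinv @ np.outer(a, b) @ Dinv / (1 + alpha * b @ Dinv @ a)
print(np.allclose(lhs, rhs))             # True
```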
1.25 Partition B = A^{-1} as A is partitioned and note that AB = I_m yields the four equations
$$A_{11}B_{11} + A_{12}B_{21} = I_{m_1}, \quad A_{11}B_{12} + A_{12}B_{22} = (0), \quad A_{21}B_{11} + A_{22}B_{21} = (0), \quad A_{21}B_{12} + A_{22}B_{22} = I_{m_2}.$$
Solving the first two equations for B_{11} and B_{12}, substituting these solutions into the third and fourth equations, and then premultiplying by A_{22}^{-1} yields the result.
1.27 Expanding along the last row of A, we find that
$$|A| = -2\begin{vmatrix} 0 & 1 & -1 \\ -1 & 1 & -1 \\ -1 & 2 & 0 \end{vmatrix} - 2\begin{vmatrix} 2 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & -1 & 2 \end{vmatrix} = -2(2) - 2(-2) = 0,$$
so rank(A) < 4. On the other hand, the determinant of the submatrix obtained by deleting the last
row and column of A is
$$\begin{vmatrix} 2 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & -1 & 2 \end{vmatrix} = -2 \ne 0.$$
Thus, rank(A) = 3.
1.29 Theorem 1.12(b) follows from part (a) and Theorem 1.7 since
$$(PQ)'PQ = Q'P'PQ = Q'Q = I_m.$$
1.31 (PP')_{31} = (PP')_{32} = 0 yield the equations p_{31} + p_{32} + p_{33} = 0 and p_{31} = p_{32}, which together imply
that p_3 = (p_{31}, p_{32}, p_{33})' has the form p_3 = (a, a, -2a)' for some a. But p_3'p_3 = 6a^2 and so since
(PP')_{33} = 1, we must have a = ±1/√6. Thus, there are two solutions: p_{31} = p_{32} = 1/√6, p_{33} = -2/√6 and
p_{31} = p_{32} = -1/√6, p_{33} = 2/√6.
which immediately leads to P_1'P_1 = I_{m_1} and P_2'P_2 = I_{m_2}. From the identity PP' = I_m we get
$$\begin{bmatrix} P_1 & P_2 \end{bmatrix}\begin{bmatrix} P_1' \\ P_2' \end{bmatrix} = P_1P_1' + P_2P_2' = I_m.$$
1.35 (a)
$$A = \begin{pmatrix} 1 & 2 & -3 \\ 2 & 2 & 4 \\ -3 & 4 & -1 \end{pmatrix}.$$
(b)
$$A = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix}.$$
(c)
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$
1.37 Let y = Tx for an arbitrary m × 1 vector x. It follows that A = T'T is nonnegative definite since
$$x'Ax = x'T'Tx = y'y = \sum_{i=1}^{m} y_i^2 \ge 0.$$
1.39 Let y = B'x for an arbitrary n × 1 vector x. Since A is nonnegative definite, it follows that
$$x'BAB'x = y'Ay \ge 0,$$
1.41 We have
$$E(z) = m^{(1)}(t)|_{t=0} = e^{t^2/2}\,t\,|_{t=0} = 0,$$
$$E(z^2) = m^{(2)}(t)|_{t=0} = e^{t^2/2}(1 + t^2)|_{t=0} = 1,$$
$$E(z^3) = m^{(3)}(t)|_{t=0} = \{e^{t^2/2}t(1 + t^2) + e^{t^2/2}\,2t\}|_{t=0} = e^{t^2/2}(3t + t^3)|_{t=0} = 0,$$
$$E(z^4) = m^{(4)}(t)|_{t=0} = \{e^{t^2/2}t(3t + t^3) + e^{t^2/2}(3 + 3t^2)\}|_{t=0} = e^{t^2/2}(3 + 6t^2 + t^4)|_{t=0} = 3,$$
$$E(z^5) = m^{(5)}(t)|_{t=0} = \{e^{t^2/2}t(3 + 6t^2 + t^4) + e^{t^2/2}(12t + 4t^3)\}|_{t=0} = e^{t^2/2}(15t + 10t^3 + t^5)|_{t=0} = 0,$$
$$E(z^6) = m^{(6)}(t)|_{t=0} = \{e^{t^2/2}t(15t + 10t^3 + t^5) + e^{t^2/2}(15 + 30t^2 + 5t^4)\}|_{t=0} = e^{t^2/2}(15 + 45t^2 + 15t^4 + t^6)|_{t=0} = 15.$$
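These moments can be reproduced symbolically by differentiating the moment generating function m(t) = e^{t^2/2}; a short sympy sketch (illustrative only):

```python
import sympy as sp

t = sp.symbols('t')
m = sp.exp(t**2 / 2)                                   # m(t) for z ~ N(0, 1)

moments = [sp.diff(m, t, k).subs(t, 0) for k in range(1, 7)]
print(moments)                                         # [0, 1, 0, 3, 0, 15]
```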
1.43 Since XX' = Σ_{i=1}^{n} x_ix_i' and n^{-1}X1_n = x̄, we have
$$S = (n-1)^{-1}\left(\sum_{i=1}^{n} x_ix_i' - n\bar{x}\bar{x}'\right) = (n-1)^{-1}(XX' - n^{-1}X1_n1_n'X') = (n-1)^{-1}XAX',$$
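Assuming A here denotes the centering matrix I_n - n^{-1}1_n1_n', the identity S = (n-1)^{-1}XAX' can be checked against numpy's sample covariance; a hedged sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 10
X = rng.standard_normal((m, n))          # columns x_1, ..., x_n are the observations

A = np.eye(n) - np.ones((n, n)) / n      # centering matrix A = I_n - n^{-1} 1 1'
S = X @ A @ X.T / (n - 1)

print(np.allclose(S, np.cov(X)))         # True: matches the usual sample covariance
```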
1.45 (a) With D_Ω^{-1/2} = diag(1/√2, 1/√2, 1/√3), the correlation matrix is given by
$$P = D_\Omega^{-1/2}\,\Omega\,D_\Omega^{-1/2} = \begin{pmatrix} 1 & 1/2 & -1/\sqrt{6} \\ 1/2 & 1 & 1/\sqrt{6} \\ -1/\sqrt{6} & 1/\sqrt{6} & 1 \end{pmatrix}.$$
(e) Since σ_u² ≠ 0 and |Ω_v| ≠ 0, the distributions given in (b) and (c) are nonsingular. Clearly, |Ω_w| = 0 since
$$\mathrm{rank}\begin{pmatrix} A \\ B \end{pmatrix} = 3,$$
so the distribution given in (d) is singular.
In addition,
$$\mathrm{var}(x) = E(xx') = E(nzz'/v^2) = nE(v^{-2})E(zz') = \frac{n}{n-2}I_m,$$
since
$$E(v^{-2}) = \int_0^\infty t^{-1}\,\frac{t^{n/2-1}e^{-t/2}}{2^{n/2}\Gamma(\tfrac{n}{2})}\,dt = \frac{\Gamma(\tfrac{n-2}{2})2^{(n-2)/2}}{2^{n/2}\Gamma(\tfrac{n}{2})}\int_0^\infty \frac{t^{(n-2)/2-1}e^{-t/2}}{2^{(n-2)/2}\Gamma(\tfrac{n-2}{2})}\,dt = \frac{\Gamma(\tfrac{n}{2}-1)}{2(\tfrac{n}{2}-1)\Gamma(\tfrac{n}{2}-1)} = \frac{1}{n-2},$$
as long as n > 2.
Chapter 2
(c) If α + β ≠ 1, then
$$\alpha\begin{pmatrix} a_1 \\ b_1 \\ c_1 \\ 1 - a_1 - b_1 - c_1 \end{pmatrix} + \beta\begin{pmatrix} a_2 \\ b_2 \\ c_2 \\ 1 - a_2 - b_2 - c_2 \end{pmatrix} = \begin{pmatrix} \alpha a_1 + \beta a_2 \\ \alpha b_1 + \beta b_2 \\ \alpha c_1 + \beta c_2 \\ (\alpha + \beta) - (\alpha a_1 + \beta a_2) - (\alpha b_1 + \beta b_2) - (\alpha c_1 + \beta c_2) \end{pmatrix}$$
2.3 (a, a + b, a + b, -b)' ≠ (1, 1, 1, 1)' for any choices of a and b, so (1, 1, 1, 1)' is not in S, whereas (4, 1, 1, 3)'
is in S since (a, a + b, a + b, -b)' = (4, 1, 1, 3)' when a = 4, b = -3.
d_Ω(x_1, µ) = 2 and d_Ω(x_2, µ) = 2/√3, so x_2 is closer to the mean.
$$\cos\theta = \frac{6}{\sqrt{10(11)}} = .572,$$
(b) We have
$$\cos(\pi/6) = .866 = \frac{x'y}{\sqrt{3(2)}},$$
so x'y = 2.121.
$$\|x + y\|^2 + \|x - y\|^2 = \langle x + y, x + y\rangle + \langle x - y, x - y\rangle$$
2.11 (a) This is a linearly independent set since neither vector is a scalar multiple of the other.
(b) This is a linearly dependent set since (4, -1, 2)' - 2(3, 2, 3)' + (2, 5, 4)' = 0.
(c) This is a linearly independent set since the only solution to a(1, 2, 3)' + b(2, 3, 1)' + c(-1, 1, 1)' = 0
is a = b = c = 0.
(d) This is a linearly dependent set of vectors because the number of vectors in the set exceeds the
dimension of the vectors.
2.13 (a) This is a linearly dependent set of vectors since 2(2, 1, 4, 3)' = (4, 2, 8, 6)'.
(b) Any subset of two of the vectors is a linearly independent set with the exception of the subset
consisting of the two vectors referred to in part (a).
2.15 (a) Any basis for R^4 has four vectors, so this set is not a basis.
(b) It is easily verified that the only solution to a(2, 2, 2, 1)' + b(2, 1, 1, 1)' + c(3, 2, 1, 1)' + d(1, 1, 1, 1)' = 0
is a = b = c = d = 0. Since it is a set of four linearly independent vectors, it must be a basis
for R^4.
(c) Since (2, 0, 1, 1)' - 2(3, 1, 2, 2)' + (2, 1, 1, 2)' + (2, 1, 2, 1)' = 0, the set is linearly dependent and
consequently not a basis for R^4.
2.17 (a) The dimension of S is less than 3 since 2x1 + 3x2 − x3 = 0. Any pair of the vectors form a linearly
independent set so the dimension of S is 2, and we will use z 1 = x1 and z 2 = x2 .
(b) Clearly, any nonnull vector of the form (x', 0')' cannot be expressed as a linear combination of
vectors of the form (0', y')'. This implies that the number of linearly independent columns of
$$\begin{bmatrix} A & (0) \\ (0) & B \end{bmatrix}$$
is the number of linearly independent columns of A plus the number of linearly independent columns of B, that is, rank(A) + rank(B).
(c) We establish the inequality for the first matrix given in part (c) of the theorem. The proofs for
the others are similar. Suppose A is m × n, B is p × q, and rank(A) = r. Then there exists an
n × n nonsingular matrix X such that AX = [A_1 (0)], where A_1 is m × r with rank(A_1) = r.
Let CX = [C_1 C_2], where C_1 is p × r and define B_* = [C_2 B]. Then
$$\mathrm{rank}\begin{bmatrix} A & (0) \\ C & B \end{bmatrix} = \mathrm{rank}\left(\begin{bmatrix} A & (0) \\ C & B \end{bmatrix}\begin{bmatrix} X & (0) \\ (0) & I_q \end{bmatrix}\right) = \mathrm{rank}\begin{bmatrix} A_1 & (0) \\ C_1 & B_* \end{bmatrix}.$$
Note that the first r columns of the last matrix given above are linearly independent of the
remaining columns since A_1x ≠ 0 for all nonnull x. Thus,
$$\mathrm{rank}\begin{bmatrix} A_1 & (0) \\ C_1 & B_* \end{bmatrix} = \mathrm{rank}\begin{bmatrix} A_1 \\ C_1 \end{bmatrix} + \mathrm{rank}\begin{bmatrix} (0) \\ B_* \end{bmatrix} = \mathrm{rank}(A) + \mathrm{rank}(B_*) \ge \mathrm{rank}(A) + \mathrm{rank}(B).$$
2.23 First note that the set {γ 1 , . . . , γ m } is linearly independent since the matrix Γ = (γ 1 , . . . , γ m ) is
triangular with each diagonal element equal to one and, hence, |Γ| = 1. Also, for arbitrary x ∈ Rm ,
let αi = xi − xi+1 for i = 1, . . . , m − 1 and αm = xm . Then
$$\sum_{i=1}^{m}\alpha_i\gamma_i = \sum_{i=1}^{m-1}(x_i - x_{i+1})\gamma_i + x_m\gamma_m = \sum_{i=1}^{m-1}x_i\gamma_i - \sum_{i=1}^{m-1}x_{i+1}\gamma_i + x_m\gamma_m = \sum_{i=1}^{m-1}x_i\gamma_i - \sum_{i=2}^{m}x_i\gamma_{i-1} + x_m\gamma_m$$
$$= x_1\gamma_1 + \sum_{i=2}^{m-1}x_i(\gamma_i - \gamma_{i-1}) + x_m(\gamma_m - \gamma_{m-1}) = x_1e_1 + \sum_{i=2}^{m-1}x_ie_i + x_me_m = x.$$
2.25 If for some C, AC = B, then R(B) ⊆ R(A) follows from Problem 2.24(a). Now suppose that R(B) ⊆
R(A). Thus, any vector in R(B) is also in R(A), and so it can be written as a linear combination of
the columns of A. In particular, each column of B can then be written as a linear combination of the
columns of A and this immediately leads to the existence of a matrix C so that AC = B.
2.27 (a) According to Definition 2.8, the columns of X = (x1 , . . . , xn ) will be a basis if they span S.
Suppose the columns of the m × n matrix Y form a basis for S so for each x ∈ S, there exists an
n × 1 vector a such that x = Y a. In particular, there exists an n × n matrix A such that X = Y A
since the columns of X are in S. But the columns of X are linearly independent, so n = rank(X) = rank(Y A) ≤ rank(A),
which implies rank(A) = n and Y = XA^{-1}. Thus, the columns of X span S since the columns of
Y span S.
(b) Let X = (x1 , . . . , xn ) and let the columns of the m × n matrix Y form a basis for S. Since the
columns of Y must be in S and the columns of X span S, we must have Y = XB for some n × n
matrix B. Note that
rank(Y ) = rank(XB) ≤ rank(X).
But rank(Y ) = n since its columns form a basis, and so we must also have rank(X) = n, which
confirms that the columns of X are linearly independent.
(c) Let X1 = (x1 , . . . , xr ) and suppose the columns of the m × n matrix Y span S. Since the columns
of X1 are in S, X1 = Y A1 for some n × r matrix A1. Also r = rank(X1) = rank(Y A1) ≤ rank(A1),
from which it follows that rank(A1) = r, and so we can find an n × (n − r) matrix A2 such that
A = [A1 A2 ] is nonsingular. Then the n columns of
X = [X1 X2 ] = [Y A1 Y A2 ] = Y A
are in S and are linearly independent since we have rank(X) = rank(Y ) = n. The result now
follows from part (a) of this theorem.
2.29 Since
$$D = \begin{bmatrix} I_m & -F \\ (0) & I_p \end{bmatrix}, \qquad E = \begin{bmatrix} I_n & (0) \\ -G & I_q \end{bmatrix}$$
are both nonsingular, we have
$$\mathrm{rank}\begin{bmatrix} C & B \\ A & (0) \end{bmatrix} = \mathrm{rank}\left(D\begin{bmatrix} C & B \\ A & (0) \end{bmatrix}E\right) = \mathrm{rank}\begin{bmatrix} (0) & B \\ A & (0) \end{bmatrix} = \mathrm{rank}(A) + \mathrm{rank}(B),$$
2.31 If
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$
then rank(AB) = 1 whereas rank(BA) = 0.
$$y_2 = (2, 3, 1, 2)' - \frac{13}{10}(1, 2, 1, 2)' = \frac{1}{10}(7, 4, -3, -6)',$$
so z_2 = (1/√110)(7, 4, -3, -6)'. Finally,
$$y_3 = (3, 4, -1, 0)' - \frac{10}{10}(1, 2, 1, 2)' - \frac{4}{(11/10)}\,\frac{1}{10}(7, 4, -3, -6)' = \frac{1}{11}(-6, 6, -10, 2)',$$
so z_3 = (1/√44)(-3, 3, -5, 1)'.
(c) We have z_1'x = 3/√10, z_2'x = 1/√110, and z_3'x = -1/√11, so
$$u = \frac{3}{\sqrt{10}}z_1 + \frac{1}{\sqrt{110}}z_2 - \frac{1}{\sqrt{11}}z_3 = \frac{1}{2}(1, 1, 1, 1)'.$$
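The Gram–Schmidt steps above (starting from x_1 = (1, 2, 1, 2)', x_2 = (2, 3, 1, 2)', x_3 = (3, 4, -1, 0)', as read off the computation) can be reproduced numerically; a minimal sketch:

```python
import numpy as np

x1 = np.array([1., 2., 1., 2.])
x2 = np.array([2., 3., 1., 2.])
x3 = np.array([3., 4., -1., 0.])

def gram_schmidt(vectors):
    basis = []
    for x in vectors:
        y = x - sum((z @ x) * z for z in basis)   # remove projections onto earlier z's
        basis.append(y / np.linalg.norm(y))
    return basis

z1, z2, z3 = gram_schmidt([x1, x2, x3])
print(np.allclose(z2, np.array([7., 4., -3., -6.]) / np.sqrt(110)))   # True
print(np.allclose(z3, np.array([-3., 3., -5., 1.]) / np.sqrt(44)))    # True
```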
2.35 We have
$$X_1 = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & -1 \end{pmatrix}, \qquad P_S = X_1(X_1'X_1)^{-1}X_1' = \frac{1}{42}\begin{pmatrix} 17 & 20 & -5 \\ 20 & 26 & 4 \\ -5 & 4 & 41 \end{pmatrix},$$
and so the point in S closest to x is P_Sx = (1/21)(16, 25, 20)'.
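A short numerical check of the projection matrix P_S (sketch only; the vector x itself is not repeated here):

```python
import numpy as np

X1 = np.array([[1., 1.], [2., 1.], [3., -1.]])
PS = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T          # projection onto the column space of X1

target = np.array([[17., 20., -5.], [20., 26., 4.], [-5., 4., 41.]]) / 42
print(np.allclose(PS, target))                      # True
print(np.allclose(PS @ PS, PS), np.allclose(PS, PS.T))   # idempotent and symmetric
```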
2.37 (a) Letting xi denote the ith column of PS , we find that α1 x1 +α2 x2 +α3 x3 +α4 x4 = 0 has solutions
α2 = α3 = .5α1 , α4 = α1 for any α1 , so PS is singular. On the other hand, α1 x1 +α2 x2 +α3 x3 = 0
has only the solution α1 = α2 = α3 = 0, so dim(S) = rank(PS ) = 3.
2.39 (a) Denoting the ith column of A by ai , note that a1 = a2 + a3 − a4 and a3 = 43 a4 − 23 a2 , while no
column of A is a scalar multiple of another column. Thus, dim(S) = rank(A) = 2 and any two
columns of A, for instance, {a2 , a4 }, form a basis for S.
(b) We have dim{N (A)} = 4 − rank(A) = 4 − 2 = 2, and so a basis will be 2 linearly independent
vectors satisfying Ax = 0. Note that (A)1· x = 0 yields x2 = − 21 (x1 + x4 ), while (A)3· x = 0 yields
x3 = − 41 (x1 + 3x4 ). By choosing x1 = 4, x4 = 0, and x1 = 0, x4 = 4, we get the basis {x1 , x2 },
where x1 = (4, −2, −1, 0)0 and x2 = (0, −2, −3, 4)0 .
(d) A1_4 ≠ 0, so 1_4 ∉ N(A).
2.41 Let {x_1, ..., x_r} be a basis for T so that for any x ∈ T, there exist scalars α_1, ..., α_r such that
x = Σ_{i=1}^{r} α_ix_i = Xα, where X = (x_1, ..., x_r) and α = (α_1, ..., α_r)'. Since u is linear, we have
$$u(x) = u\!\left(\sum_{i=1}^{r}\alpha_ix_i\right) = \sum_{i=1}^{r}\alpha_iu(x_i) = U\alpha = U(X'X)^{-1}X'x = Ax,$$
where A = U(X'X)^{-1}X'.
$$Ax = x - m^{-1}1_m1_m'x = x,$$
it follows that dim{R(A)} = m − 1 and dim{N (A)} = 1.
Since
$$\hat{y} = X\hat{\beta} = \begin{pmatrix} \bar{y}_11_{n_1} \\ \vdots \\ \bar{y}_k1_{n_k} \end{pmatrix},$$
it then follows that
$$SSE_1 = (y - \hat{y})'(y - \hat{y}) = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2.$$
(c) The reduced model has X = 1_n. In this case, X'X = n, (X'X)^{-1} = n^{-1}, and X'y = nȳ, so the
least squares estimator of µ in the reduced model is µ̂ = (X'X)^{-1}X'y = ȳ.
This yields ŷ = Xµ̂ = ȳ1_n, and so the sum of squared errors for this reduced model is
$$SSE_2 = (y - \bar{y}1_n)'(y - \bar{y}1_n) = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2.$$
The sum of squares for treatment is given by
$$SST = SSE_2 - SSE_1 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2 - \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}^2 - 2\bar{y}y_{ij} + \bar{y}^2) - \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}^2 - 2\bar{y}_iy_{ij} + \bar{y}_i^2) = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(-2\bar{y}y_{ij} + \bar{y}^2 + 2\bar{y}_iy_{ij} - \bar{y}_i^2)$$
$$= \sum_{i=1}^{k}(-2n_i\bar{y}\bar{y}_i + n_i\bar{y}^2 + 2n_i\bar{y}_i^2 - n_i\bar{y}_i^2) = \sum_{i=1}^{k}n_i(\bar{y}_i^2 - 2\bar{y}\bar{y}_i + \bar{y}^2) = \sum_{i=1}^{k}n_i(\bar{y}_i - \bar{y})^2.$$
(d) In Example 2.11, N is the number of observations, k + 1 is the number of parameters in the
complete model, and k_2 is the difference in the number of parameters in the complete and
reduced models. For our models in this problem, these quantities are n, k, and k - 1, respectively.
Making these substitutions along with SSE_2 - SSE_1 = SST in (2.11), we get
$$F = \frac{SST/(k-1)}{SSE_1/(n-k)}.$$
2.47 Suppose y i ∈ S1 + S2 for i = 1, 2. Then for i = 1, 2, there are vectors xi ∈ Si and ui ∈ Si such that
y 1 = x1 + x2 and y 2 = u1 + u2 . Now
α1 y 1 + α2 y 2 = α1 (x1 + x2 ) + α2 (u1 + u2 ) = v 1 + v 2 ,
u_2 = Σ_{j=1}^{h} β_jy_j for some scalars α_1, ..., α_r, β_1, ..., β_h. Thus,
$$x = u_1 + u_2 = \sum_{i=1}^{r}\alpha_ix_i + \sum_{j=1}^{h}\beta_jy_j,$$
2.55 Let S1 be the space spanned by the vector (1, 0, 0)0 and S2 be the space spanned by the vector (0, 0, 1)0 .
Then T ⊕ S1 = R3 , T ⊕ S2 = R3 , and S1 ∩ S2 = {0}.
2.59 (a) The rank of PS1 |S2 is 2 and so for a basis we can choose two linearly independent columns of
PS1 |S2 , for instance, {(1, 0, 0)0 , (1, 3, 3)0 }.
(b) Looking at the columns of I3 − PS1 |S2 , we see that S2 is spanned by the vector (1, 3, 2)0 .
$$(P_{S_1|S_2} + P_{T_1|T_2})^2 = P_{S_1|S_2}^2 + P_{S_1|S_2}P_{T_1|T_2} + P_{T_1|T_2}P_{S_1|S_2} + P_{T_1|T_2}^2 = P_{S_1|S_2} + P_{S_1|S_2}P_{T_1|T_2} + P_{T_1|T_2}P_{S_1|S_2} + P_{T_1|T_2}.$$
When PS1 |S2 PT1 |T2 = PT1 |T2 PS1 |S2 = (0), this reduces to PS1 |S2 + PT1 |T2 so it follows that PS1 |S2 +
PT1 |T2 is a projection matrix. Conversely, if PS1 |S2 + PT1 |T2 is a projection matrix, we must have
PS1 |S2 PT1 |T2 = −PS1 |S2 PT1 |T2 PS1 |S2 ,
PT1 |T2 PS1 |S2 = −PS1 |S2 PT1 |T2 PS1 |S2 ,
The three preceding equations together imply that
(b) Since PS1 |S2 + PT1 |T2 is a projection matrix, it can be written as PU |V . We first show that
U = S1 + T1 . Now for any x ∈ Rm , (PS1 |S2 + PT1 |T2 )x = PS1 |S2 x + PT1 |T2 x, where PS1 |S2 x ∈ S1
and PT1 |T2 x ∈ T1 , so U ⊂ S1 + T1 . Also, any w ∈ S1 + T1 can be written w = x + y, where
x ∈ S1 , y ∈ T1 , and so using the conditions in (a), we have
where z = PS1 |S2 x + PT1 |T2 y is some vector in Rm . Thus, w ∈ U and so S1 + T1 ⊂ U . This
establishes that U = S1 + T1 , and in a similar manner we will show that V = S2 ∩ T2 . For any
x ∈ V , x = (Im − PS1 |S2 − PT1 |T2 )x and, using the conditions in (a),
(Im − PS1 |S2 )x = (Im − PS1 |S2 )(Im − PS1 |S2 − PT1 |T2 )x
and
(Im − PT1 |T2 )x = (Im − PT1 |T2 )(Im − PS1 |S2 − PT1 |T2 )x
2.63 (a) Suppose xi ∈ S1 ∩ S2 for i = 1, 2. Since this implies that xi ∈ S1 and xi ∈ S2 for i = 1, 2, and S1
and S2 are convex sets, we have
cx1 + (1 − c)x2 ∈ Sj ,
for j = 1, 2, where 0 < c < 1. This confirms that cx1 + (1 − c)x2 ∈ S1 ∩ S2 , and so S1 ∩ S2 is a
convex set.
(b) Suppose z i ∈ S1 + S2 for i = 1, 2, so that there exist xi ∈ S1 and y i ∈ S2 for i = 1, 2, such that
z i = xi + y i . Then, for 0 < c < 1,
z∗ = cz 1 + (1 − c)z 2
= c(x1 + y 1 ) + (1 − c)(x2 + y 2 )
= x∗ + y ∗ .
Since S1 and S2 are convex sets, it follows that x∗ = cx1 +(1−c)x2 ∈ S1 and y ∗ = cy 1 +(1−c)y 2 ∈
S2 , so z ∗ ∈ S1 + S2 . Thus, S1 + S2 is a convex set.
2.65 Suppose x0i xi ≤ n−1 for i = 1, 2. Then, using the Cauchy-Schwarz inequality, we have for any 0 < c < 1,
so Bn is convex.
2.67 Let x ∈ C(S). Then according to Problem 2.66, for some r, x = Σ_{i=1}^{r} α_ix_i, where each x_i ∈ S and
α_1, ..., α_r are nonnegative scalars satisfying Σ_{i=1}^{r} α_i = 1. We will show that if r > m + 1, then x can
be expressed as a convex combination of r - 1 vectors in S. This can be used repeatedly until we have
x expressed as a convex combination of m + 1 vectors. Since r - 1 > m, the vectors x_2 - x_1, ..., x_r - x_1
are linearly dependent and so there are scalars β_2, ..., β_r, not all zero, such that
$$0 = \sum_{i=2}^{r}\beta_i(x_i - x_1) = -\sum_{i=2}^{r}\beta_ix_1 + \sum_{i=2}^{r}\beta_ix_i = \sum_{i=1}^{r}\beta_ix_i,$$
where β_1 = -Σ_{i=2}^{r} β_i. Note that Σ_{i=1}^{r} β_i = 0 and since not all of the β_i are 0, at least one β_i is
positive. Thus, for any scalar γ,
$$x = \sum_{i=1}^{r}\alpha_ix_i = \sum_{i=1}^{r}\alpha_ix_i - \gamma\sum_{i=1}^{r}\beta_ix_i = \sum_{i=1}^{r}(\alpha_i - \gamma\beta_i)x_i.$$
Suppose we choose γ = min_{1≤i≤r}(α_i/β_i : β_i > 0) so that γ = α_j/β_j for some j. Then γ > 0,
α_i - γβ_i ≥ 0 for all i, and α_j - γβ_j = 0. Thus,
$$x = \sum_{i=1}^{r}(\alpha_i - \gamma\beta_i)x_i = \sum_{\substack{i=1 \\ i\ne j}}^{r}(\alpha_i - \gamma\beta_i)x_i$$
and
$$\sum_{\substack{i=1 \\ i\ne j}}^{r}(\alpha_i - \gamma\beta_i) = \sum_{i=1}^{r}(\alpha_i - \gamma\beta_i) = \sum_{i=1}^{r}\alpha_i - \gamma\sum_{i=1}^{r}\beta_i = 1,$$
Chapter 3
$$\begin{vmatrix} 9-\lambda & -3 & -4 \\ 12 & -4-\lambda & -6 \\ 8 & -3 & -3-\lambda \end{vmatrix} = -(\lambda + 1)(\lambda - 1)(\lambda - 2) = 0,$$
(b) An eigenvector corresponding to the -1 eigenvalue satisfies the equation Ax = -x, which leads
to the constraints x_2 = 2x_1 and x_3 = x_1. Thus, an eigenvector has the form (x, 2x, x)' and a unit
eigenvector is (1/√6)(1, 2, 1)'. An eigenvector corresponding to the 1 eigenvalue satisfies the equation
Ax = x, which leads to the constraints x_2 = 0 and x_3 = 2x_1. Thus, an eigenvector has the form
(x, 0, 2x)' and a unit eigenvector is (1/√5)(1, 0, 2)'. An eigenvector corresponding to the 2 eigenvalue
satisfies the equation Ax = 2x, which leads to the constraints x_2 = x_1 and x_3 = x_1. Thus, an
eigenvector has the form (x, x, x)' and a unit eigenvector is (1/√3)(1, 1, 1)'.
3.3 The eigenvalues of A0 are the same as those of A, so from Problem 3.1 we have the eigenvalues −1, 1, 2.
An eigenvector corresponding to the −1 eigenvalue satisfies the equation A0 x = −x, which leads to the
constraints x1 = −2x2 and x3 = x2 . Thus, SA0 (−1) = {(−2x, x, x)0 : −∞ < x < ∞}. An eigenvector
corresponding to the 1 eigenvalue satisfies the equation A0 x = x, which leads to the constraints x2 = 0
and x3 = −x1 . Thus, SA0 (1) = {(x, 0, −x)0 : −∞ < x < ∞}. An eigenvector corresponding to the 2
eigenvalue satisfies the equation A0 x = 2x, which leads to the constraints x1 = −4x2 and x3 = 2x2 .
Thus, SA0 (2) = {(−4x, x, 2x)0 : −∞ < x < ∞}. While the eigenvalues of A0 are the same as those of
A, we see that the eigenspaces of A0 are not the same as those of A.
(b) The equation Ax = −2x leads to the constraints x2 = x4 = 0 and x3 = −x1 so SA (−2) =
{(x, 0, −x, 0)0 : −∞ < x < ∞}. The equation Ax = −x leads to the constraints x1 = −2x3 ,
x2 = x3 and x4 = 0, so SA (−1) = {(−2x, x, x, 0)0 : −∞ < x < ∞}. The equation Ax = x leads to
the constraints x1 = 2x3 and x2 = 3x3 , so SA (1) = {(2x, 3x, x, y)0 : −∞ < x < ∞ −∞ < y < ∞}.
3.7 (a) Since N ≥ 2k + 1, we can find an N × k matrix W satisfying W 0 [Z1 y] = (0). We can choose
the columns of W so that they are orthogonal and scaled so that W 0 W = γIk . Under these
conditions, the ordinary least squares estimate of δ 1 in the model
y = δ0 1N + (Z1 + W )δ 1 +
is
(b) Let U be any k × k matrix satisfying U'U = γI_k. Then the ordinary least squares estimate of δ_1
in the model
$$\begin{pmatrix} y \\ 0 \end{pmatrix} = \begin{pmatrix} 1_N \\ 0 \end{pmatrix}\delta_0 + \begin{pmatrix} Z_1 \\ U \end{pmatrix}\delta_1 + \begin{pmatrix} \epsilon \\ \epsilon_* \end{pmatrix}$$
is
$$\left(\begin{bmatrix} Z_1 \\ U \end{bmatrix}'\begin{bmatrix} Z_1 \\ U \end{bmatrix}\right)^{-1}\begin{bmatrix} Z_1 \\ U \end{bmatrix}'\begin{pmatrix} y \\ 0 \end{pmatrix} = (Z_1'Z_1 + U'U)^{-1}Z_1'y = (Z_1'Z_1 + \gamma I_k)^{-1}Z_1'y.$$
Thus, the characteristic equations of AB and BA are the same so they have the same eigenvalues.
Similarly, if A is nonsingular, we have
|AB − λIm | = |A−1 ||AB − λIm ||A| = |A−1 ABA − λA−1 A| = |BA − λIm |.
3.11 (a) P is an orthogonal matrix since
$$P'P = \begin{pmatrix} \cos^2\theta + \sin^2\theta & \cos\theta\sin\theta - \cos\theta\sin\theta \\ \cos\theta\sin\theta - \cos\theta\sin\theta & \cos^2\theta + \sin^2\theta \end{pmatrix} = I_2.$$
which yields the two solutions λ = cos θ + i sin θ and λ = cos θ - i sin θ.
Thus, A0 and A have the same characteristic equation, and so they have the same eigenvalues.
(b) A is singular if and only if its columns are linearly dependent, that is, if and only if Ax = 0 for
some nonnull x. However, this is equivalent to saying A has a zero eigenvalue.
(c) It follows from Problem 1.22 that the characteristic equation of a triangular matrix A is
m
Y
|A − λIm | = (aii − λ),
i=1
Consequently, BAB −1 and A have identical characteristic equations, and so the two matrices have
identical eigenvalues.
3.15 It is easily verified that the matrices
$$A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$
both have eigenvalue 0 with multiplicity 2. Clearly, no B exists for which C = BAB −1 since rank(A) =
0 and rank(C) = 1.
3.17 If rank(A−λIm ) = m−1, then there is one linearly independent vector x satisfying (A−λIm )x = 0 or,
equivalently, Ax = λx. Thus, λ is an eigenvalue of A and dim{SA (λ)} = 1. It follows from Theorem
3.3 that the multiplicity of the eigenvalue λ is at least one.
3.19 Since A is a triangular matrix, it follows from Theorem 3.2(c) that it has eigenvalue 1 with multiplicity
m. The eigenvalue-eigenvector equation, Ax = x, leads to the m equations xi + xi+1 = xi , for
i = 1, . . . , m − 1, and xm = xm . It follows that xi+1 = 0 for i = 1, . . . , m − 1 and x1 is arbitrary. Thus,
there is only one linearly independent eigenvector of the form (x, 0, . . . , 0)0 .
(b) Since
$$(A + A^{-1})x_i = Ax_i + A^{-1}x_i = \lambda_ix_i + \lambda_i^{-1}x_i = (\lambda_i + \lambda_i^{-1})x_i,$$
λ_i + λ_i^{-1} is an eigenvalue of A + A^{-1} corresponding to the eigenvector x_i.
(c) Since
$$(I_m + A^{-1})x_i = x_i + A^{-1}x_i = x_i + \lambda_i^{-1}x_i = (1 + \lambda_i^{-1})x_i,$$
1 + λ_i^{-1} is an eigenvalue of I_m + A^{-1} corresponding to the eigenvector x_i.
3.23 (a) Using Theorem 3.4(b) (see also Problem 3.21), we have
(b) An application of Theorem 1.9 reveals that
3.27 Note that 1m 10m 1m = m1m , while 1m 10m x = 0x for any x orthogonal to 1m . Thus, m is an eigenvalue
with corresponding eigenvector being any vector in the space S spanned by 1m , and 0 is an eigenvalue,
having multiplicity m − 1, with corresponding eigenvector being any vector in S ⊥ .
3.29 (a) Note that (αIm + β1m 10m )1m = (α + mβ)1m , while (αIm + β1m 10m )x = αx for any x orthogonal
to 1m . Thus, α + mβ is an eigenvalue with corresponding eigenvector being any vector in the
space S spanned by 1m , and α is an eigenvalue, having multiplicity m − 1, with corresponding
eigenvector being any vector in S ⊥ .
(b) The eigenspaces are S and S^⊥ as indicated in part (a). The eigenprojection corresponding to the
eigenvalue α + mβ is
$$P_S = \frac{1}{m}1_m1_m',$$
while the eigenprojection corresponding to the eigenvalue α is
$$P_{S^\perp} = I_m - P_S = I_m - \frac{1}{m}1_m1_m'.$$
(c) None of the eigenvalues can be 0, so we must have α ≠ 0 and β ≠ -α/m.
(d) We have
$$A^{-1} = (\alpha + m\beta)^{-1}\frac{1}{m}1_m1_m' + \alpha^{-1}\left(I_m - \frac{1}{m}1_m1_m'\right) = \alpha^{-1}I_m + \left(\frac{1}{\alpha + m\beta} - \frac{1}{\alpha}\right)\frac{1}{m}1_m1_m' = \alpha^{-1}I_m - \frac{\beta}{\alpha(\alpha + m\beta)}1_m1_m'.$$
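A quick numerical spot-check of the closed-form inverse in (d), with illustrative values of α, β, and m:

```python
import numpy as np

m, alpha, beta = 4, 2.0, 0.5
J = np.ones((m, m))
A = alpha * np.eye(m) + beta * J

Ainv = np.linalg.inv(A)
formula = np.eye(m) / alpha - beta / (alpha * (alpha + m * beta)) * J
print(np.allclose(Ainv, formula))        # True
```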
(e) The determinant is the product of the eigenvalues so
(b) The rank of A is 2 since it is symmetric and has two nonzero eigenvalues.
3.35 Let λ_1, ..., λ_m denote the eigenvalues of A. Using Problem 3.33, we have
$$\sum_{i=1}^{m}\lambda_i^2 = \sum_{i=1}^{m}\sum_{j=1}^{m}a_{ij}^2 = \sum_{i=1}^{m}a_{ii}^2 + \sum_{i=1}^{m}\sum_{\substack{j=1 \\ j\ne i}}^{m}a_{ij}^2 = \sum_{i=1}^{m}\lambda_i^2 + \sum_{i=1}^{m}\sum_{\substack{j=1 \\ j\ne i}}^{m}a_{ij}^2.$$
Thus, the final double sum must reduce to zero and this requires a_{ij} = 0 for all i ≠ j.
3.39 (a) Let λ1 , . . . , λm be the eigenvalues of A, so that by Theorem 3.4(a), λk1 , . . . , λkm are the eigenvalues
of Ak . But Ak = (0), so every eigenvalue of Ak is 0. That is, λki = 0 for all i, so we must have
λi = 0 for all i.
satisfies A2 = (0).
3.41 We will prove the max-min result; the proof of the min-max result is similar. Note that we may
Pk 0
without loss of generality consider the max-min of the sum j=1 xj Axj if we restrict each xj to
be a unit vector. Now suppose the columns of Cij are eigenvectors corresponding to the eigenvalues
λij +1 , . . . , λm . Then Ci0j xj = 0 implies that xj is a linear combination of orthonormal eigenvectors
corresponding to the eigenvalues λ_1, ..., λ_{i_j}, and so we must have x_j'Ax_j ≥ λ_{i_j}. Thus, with this
particular choice for C_{i_1}, ..., C_{i_k}, we get
$$\sum_{j=1}^{k}x_j'Ax_j \ge \sum_{j=1}^{k}\lambda_{i_j}.$$
That is, the max-min must be greater than or equal to Σ_{j=1}^{k} λ_{i_j}. As a result, the proof will be complete
if we can show that the max-min is less than or equal to Σ_{j=1}^{k} λ_{i_j}. We prove this result by induction
on m. Note that the result is trivially true when k = m. In particular, the result is true when m = 1.
Now suppose m > 1, k < m, and the result holds for (m - 1) × (m - 1) matrices. Let C_{i_1}, ..., C_{i_k} be
given matrices satisfying the conditions of the theorem. We need to show that there are orthonormal
vectors x_1, ..., x_k, with C_{i_j}'x_j = 0, such that Σ_{j=1}^{k} x_j'Ax_j ≤ Σ_{j=1}^{k} λ_{i_j}.
First suppose that ik < m so that dim{N (Ci0k )} ≤ m − 1. Let the columns of Y = (y 1 , . . . , y m−1 ) be
orthonormal vectors satisfying Ci0j y l = 0 for l = 1, . . . , ij and j = 1, . . . , k. Then if µ1 , . . . , µm−1 are
the eigenvalues of B = Y 0 AY , it follows from Theorem 3.19 that
µi ≤ λi , for i = 1, . . . , m − 1. (1)
Let D_{i_j} = Y'C_{i_j}, so that if D_{i_j}'u = 0, then C_{i_j}'w = 0 and u'Bu = w'Aw, where w = Yu. Thus,
since B is (m - 1) × (m - 1), we know by our induction hypothesis that there are orthonormal vectors
u_1, ..., u_k satisfying D_{i_j}'u_j = 0 and
$$\sum_{j=1}^{k}u_j'Bu_j \le \sum_{j=1}^{k}\mu_{i_j}. \qquad (2)$$
Thus, if w_j = Yu_j, then C_{i_j}'w_j = 0, Σ_{j=1}^{k} u_j'Bu_j = Σ_{j=1}^{k} w_j'Aw_j, and so from (1) and (2)
$$\sum_{j=1}^{k}w_j'Aw_j \le \sum_{j=1}^{k}\lambda_{i_j},$$
as required.
Next suppose that ik = m and let l be the largest index for which il +1 < il+1 . We know such an l exists
since k < m. Let Sm−1 be an (m − 1)-dimensional space that contains N (Ci0l ) and the eigenvectors of
A corresponding to the eigenvalues λil+1 , . . . , λm ; there is such a space since m − il+1 + 1 + il < m. By
the definition of l, it follows that il+1 , . . . , m are among the indices i1 , . . . , ik and
$$N(C_{i_l}') \subset N(C_{i_{l+1}}') \cap S_{m-1} \subset \cdots \subset N(C_{m-1}') \cap S_{m-1} \subset S_{m-1}.$$
Since for j = i_{l+1}, ..., m - 1, dim{N(C_j') ∩ S_{m-1}} ≥ j - 1, we can find subspaces S_{i_{l+1}-1}, ..., S_{m-2}
such that dim(S_j) = j,
$$S_{i_{l+1}-1} \subset N(C_{i_{l+1}}'), \ldots, S_{m-2} \subset N(C_{m-1}')$$
and
$$N(C_{i_1}') \subset \cdots \subset N(C_{i_l}') \subset S_{i_{l+1}-1} \subset \cdots \subset S_{m-2} \subset S_{m-1}.$$
Consider orthonormal vectors x1 , . . . , xk that satisfy xj ∈ N (Ci0j ) for j = 1, . . . , l and xj ∈ Sij −1 for
j = l + 1, . . . , k. Since each xj ∈ Sm−1 , there exist an m × (m − 1) semiorthogonal matrix Y and
orthonormal (m − 1) × 1 vectors u1 , . . . , uk such that xj = Y uj . As a result, by using the induction
hypothesis on the matrix B = Y 0 AY , we have
$$\sum_{j=1}^{k}x_j'Ax_j = \sum_{j=1}^{k}u_j'Bu_j \le \sum_{j=1}^{l}\mu_{i_j} + \sum_{j=i_{l+1}-1}^{m-1}\mu_j \le \sum_{j=1}^{l}\lambda_{i_j} + \sum_{j=i_{l+1}-1}^{m-1}\mu_j, \qquad (3)$$
where the second inequality utilized Theorem 3.19. Since the eigenvectors of A corresponding to the
eigenvalues λ_{i_{l+1}}, ..., λ_m are in S_{m-1}, it follows that λ_{i_{l+1}}, ..., λ_m are also eigenvalues of B. Since
µ_{i_{l+1}-1}, ..., µ_{m-1} are the smallest eigenvalues of B, this then guarantees that
$$\sum_{j=i_{l+1}-1}^{m-1}\mu_j \le \sum_{j=i_{l+1}}^{m}\lambda_j,$$
and when this is substituted into (3), we get the desired result.
$$= \min_{\substack{C_h'x=0 \\ x\ne 0}}\frac{x'Ax}{x'x} + \lambda_m(B) \ge \min_{\substack{C_h'x=0 \\ x\ne 0}}\frac{x'Ax}{x'x},$$
where the last equality follows from Theorem 3.16. Now maximizing both sides of the equation above
over all choices of Ch satisfying Ch0 Ch = Im−h and using (3.10) of Theorem 3.18, we get
$$\lambda_h(A + B) = \max_{C_h}\min_{\substack{C_h'x=0 \\ x\ne 0}}\frac{x'(A + B)x}{x'x} \ge \max_{C_h}\min_{\substack{C_h'x=0 \\ x\ne 0}}\frac{x'Ax}{x'x} = \lambda_h(A).$$
3.47 Define G, E, T and H as in the proof of Theorem 3.31 so that A = GΛG', B = GG', E = G'F,
E'E = T^2, and H = ET^{-1}.
$$\lambda_{h-i+1}\{(F'BF)^{-1}(F'AF)\} = \lambda_{h-i+1}\{(F'GG'F)^{-1}F'G\Lambda G'F\} = \lambda_{h-i+1}\{(T^2)^{-1}E'\Lambda E\}$$
(b) We have
where the last equality follows from the lower bound in Theorem 3.19 and the fact that the bound
is attained when H 0 = [(0) Ih ].
(c) We have
where the last equality follows from the lower bound in Theorem 3.19 and the fact that the bound
is attained when H 0 = [(0) Ih ].
so the eigenvalues of T T 0 are 3 and 9. The equation T T 0 x = 3x yields the constraint x2 = −x1 ,
so an eigenvector corresponding to 3 has the form (x, −x)0 . The equation T T 0 x = 9x yields the
constraint x2 = x1 , so an eigenvector corresponding to 9 has the form (x, x)0 .
(b) We have
$$T'T = \begin{pmatrix} 5 & 1 & 4 \\ 1 & 2 & -1 \\ 4 & -1 & 5 \end{pmatrix}.$$
The positive eigenvalues of T T 0 and T 0 T are the same, so T 0 T has eigenvalues 0, 3, and 9. The
equation T 0 T x = 0x yields the constraints x3 = x2 = −x1 , so an eigenvector corresponding to 0
has the form (x, −x, −x)0 . The equation T 0 T x = 3x yields the constraints x2 = 2x1 and x3 = −x1 ,
so an eigenvector corresponding to 3 has the form (x, 2x, −x)0 . The equation T 0 T x = 9x yields
the constraints x2 = 0 and x3 = x1 , so an eigenvector corresponding to 9 has the form (x, 0, x)0 .
where Y = (y 1 , . . . , y m ) = Λ1/2 X 0 . Note that aii = y 0i y i and so aii = 0 implies y i = 0. But this
implies aij = aji = 0 for any j since aij = aji = y 0i y j .
(b) Since
ABxi = Aγi xi = γi Axi = λi γi xi ,
(c) The spectral decompositions of A and B have a common orthogonal matrix, that is, they can be
written as A = XΛX 0 and B = XΓX 0 . The matrices Λ and Γ are diagonal so ΛΓ = ΓΛ. As a
result,
AB = XΛX 0 XΓX 0 = XΛΓX 0 = XΓΛX 0 = XΓX 0 XΛX 0 = BA.
if B'B = I_r. Note that the lower bound is attained when B contains as its columns normalized
eigenvectors of A corresponding to the eigenvalues λ_{m-r+1}, ..., λ_m, and so we must have
$$\min_{B'B=I_r}\mathrm{tr}(B'AB) = \sum_{i=1}^{r}\lambda_{m-i+1}.$$
Similarly, the upper bound is attained when B contains as its columns normalized eigenvectors
of A corresponding to the eigenvalues λ_1, ..., λ_r, and so we must have
$$\max_{B'B=I_r}\mathrm{tr}(B'AB) = \sum_{i=1}^{r}\lambda_i.$$
(b) Using the bounds given in part (a) and the choice of B = (e1 , . . . , er ), the result follows
3.61 For a proof of this result, see Khattree, R., 2002, Generalized antieigenvalues and antieigenvectors,
American Journal of Mathematical and Management Sciences 22, 89–98.
Chapter 4
4.1 Now
$$AA' = \begin{pmatrix} 10 & 4 \\ 4 & 4 \end{pmatrix},$$
and |AA' - λI_2| = (12 - λ)(2 - λ), so the eigenvalues of AA' are 2 and 12. The equation AA'x = 2x
leads to the constraint x_2 = -2x_1, so an eigenvector corresponding to 2 has the form (x, -2x)'. The
equation AA'x = 12x leads to the constraint x_1 = 2x_2, so an eigenvector corresponding to 12 has the
form (2x, x)'. The positive eigenvalues of
$$A'A = \begin{pmatrix} 2 & 3 & 3 & 0 \\ 3 & 5 & 5 & 1 \\ 3 & 5 & 5 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix}$$
are also 2 and 12. The equation A'Ax = 2x leads to the constraints x_2 = x_3 = 0 and x_4 = -3x_1,
so an eigenvector corresponding to 2 has the form (x, 0, 0, -3x)'. The equation A'Ax = 12x leads
to the constraints x_1 = 3x_4 and x_2 = x_3 = 5x_4, so an eigenvector corresponding to 12 has the form
(3x, 5x, 5x, x)'. As a result, the singular value decomposition of A is given by
$$A = \begin{pmatrix} 1/\sqrt{5} & 2/\sqrt{5} \\ -2/\sqrt{5} & 1/\sqrt{5} \end{pmatrix}\begin{pmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{12} \end{pmatrix}\begin{pmatrix} -1/\sqrt{10} & 0 & 0 & 3/\sqrt{10} \\ 3/\sqrt{60} & 5/\sqrt{60} & 5/\sqrt{60} & 1/\sqrt{60} \end{pmatrix}.$$
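The decomposition can be verified with numpy; the 2 × 4 matrix below is the one consistent with the products AA' and A'A shown above (a sketch, not part of the original solution):

```python
import numpy as np

A = np.array([[1., 2., 2., 1.], [1., 1., 1., -1.]])    # consistent with AA' and A'A above

print(A @ A.T)                                         # [[10, 4], [4, 4]]
P, d, Qt = np.linalg.svd(A, full_matrices=False)
print(d**2)                                            # [12, 2]: squared singular values
print(np.allclose(P @ np.diag(d) @ Qt, A))             # True: A = P D Q'
```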
4.3 A has a zero eigenvalue if and only if rank(A) = r < m, and this condition is equivalent to rank(AA') =
r < m, which holds if and only if A has r singular values.
4.5 We prove the result when m ≥ n. The proof for m < n is similar. Let ∆ be the n × n matrix
∆ = diag(µ_1, ..., µ_r, 0, ..., 0). Then the singular value decomposition of A can be written
$$A = P\begin{bmatrix} \Delta \\ (0) \end{bmatrix}Q' = [P_1\ P_2]\begin{bmatrix} \Delta \\ (0) \end{bmatrix}Q' = P_1\Delta Q',$$
Then UU' = U'U = I_{m+n} and
$$U\begin{bmatrix} \Delta & (0) & (0) \\ (0) & -\Delta & (0) \\ (0) & (0) & (0) \end{bmatrix}U' = \begin{bmatrix} (0) & A \\ A' & (0) \end{bmatrix},$$
and
$$A'Ay = (x'x)(y'y)y.$$
Thus, (x'x)(y'y) is the one positive eigenvalue of AA' and A'A with corresponding unit eigenvectors
given by p = (x'x)^{-1/2}x and q = (y'y)^{-1/2}y. The singular value is d = {(x'x)(y'y)}^{1/2} and the singular
value decomposition is then given by A = xy' = pdq'.
and
$$BAB = Q_1\Delta^{-1}P_1'P_1\Delta Q_1'Q_1\Delta^{-1}P_1' = Q_1\Delta^{-1}\Delta\Delta^{-1}P_1' = Q_1\Delta^{-1}P_1' = B.$$
MSE(ŷ) = var(ŷ) = {N −1 10N + z 0 (Z10 Z1 )−1 Z10 }(σ 2 IN ){N −1 10N + z 0 (Z10 Z1 )−1 Z10 }0
(b) Note that since V_1'1_N = 0, ȳ and z'U_1D_1^{-1}V_1'y are independently distributed, and so
$$= \sigma^2(N^{-1} + z'U_1D_1^{-2}U_1'z) = \sigma^2(N^{-1} + v_1'v_1) = \sigma^2\left(N^{-1} + \sum_{i=1}^{k-r}v_i^2\right),$$
E(ỹ) = E(ȳ) + z 0 U1 D1−1 V10 E(y) = δ0 + z 0 δ 1 − z 0 U2 D2−2 U20 Z10 E(y), (1)
and so
where α1 = (α011 , α012 )0 = U 0 δ 1 . Combining (1) and (2) yields the result.
(c) MSE(ỹ) <MSE(ŷ) when d2k vk2 αk2 < σ 2 vk2 , that is, when d2k αk2 < σ 2 .
so the eigenvalues of A are 4 with multiplicity two and 1. The equation Ax = x leads to the
constraints x_2 = -x_1 and x_3 = x_1, so an eigenvector corresponding to 1 has the form (x, -x, x)'.
The equation Ax = 4x leads to the constraint x_3 = x_2 - x_1. Consequently, two orthogonal
eigenvectors corresponding to 4 are of the form (x, x, 0)' and (x, -x, -2x)'. As a result, the
spectral decomposition of A can be expressed as A = XΛX', where
$$X = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ -1/\sqrt{3} & 1/\sqrt{2} & -1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -2/\sqrt{6} \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{pmatrix}.$$
(b) The symmetric square root of A is given by
$$X\Lambda^{1/2}X' = \frac{1}{3}\begin{pmatrix} 5 & 1 & -1 \\ 1 & 5 & 1 \\ -1 & 1 & 5 \end{pmatrix}.$$
(c) A nonsymmetric square root matrix of A has the form A^{1/2} = XΛ^{1/2}Q', where Q is an orthogonal
matrix other than X. For instance, when Q = I_3, we get
$$A^{1/2} = \begin{pmatrix} 1/\sqrt{3} & \sqrt{2} & 2/\sqrt{6} \\ -1/\sqrt{3} & \sqrt{2} & -2/\sqrt{6} \\ 1/\sqrt{3} & 0 & -4/\sqrt{6} \end{pmatrix}.$$
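A numerical check of the symmetric square root, using the matrix A = XΛX' implied by the decomposition above (illustrative sketch):

```python
import numpy as np
from scipy.linalg import sqrtm

A = np.array([[3., 1., -1.], [1., 3., 1.], [-1., 1., 3.]])   # A = X Λ X' for the X, Λ above

root = sqrtm(A)                                              # symmetric square root
target = np.array([[5., 1., -1.], [1., 5., 1.], [-1., 1., 5.]]) / 3
print(np.allclose(root, target))                             # True
print(np.allclose(root @ root, A))                           # True
```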
so the nonzero eigenvalues of A are 5 and 10. The equation Ax = 5x leads to the constraints x_2 = 0 and
3x_3 = -4x_1, so an eigenvector corresponding to 5 has the form (3x, 0, -4x)'. The equation Ax = 10x
leads to the constraints 4x_2 = 5x_1 and 4x_3 = 3x_1, so an eigenvector corresponding to 10 has the form
(4x, 5x, 3x)'. Consequently, we can compute T as
$$T = \begin{pmatrix} 3/5 & 4/\sqrt{50} \\ 0 & 5/\sqrt{50} \\ -4/5 & 3/\sqrt{50} \end{pmatrix}\begin{pmatrix} \sqrt{5} & 0 \\ 0 & \sqrt{10} \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} 3 & 4 \\ 0 & 5 \\ -4 & 3 \end{pmatrix}.$$
4.17 (a) An eigenanalysis of A reveals that it has eigenvalue 0 with multiplicity two and eigenvalue 3.
By inspection of the equation Ax = 0, we find any eigenvector corresponding to 0 has the form
(−x, −2x, x)0 . Since there is only one linearly independent eigenvector corresponding to the 0
eigenvalue, A is not diagonalizable. It is easily shown that B has eigenvalues 0 and 1 with
multiplicity two. The equation Bx = x yields the single constraint x2 = x1 , so an eigenvector
corresponding to 1 has the form (x, x, y)0 . Since there are two linearly independent vectors of
this form, B is diagonalizable. Finally, an eigenanalysis of C reveals that it has eigenvalue
4 with multiplicity two and eigenvalue 0. By inspection of the equation Cx = 4x, we find
any eigenvector corresponding to 4 has the form (x, x, −x)0 . Since there is only one linearly
independent eigenvector corresponding to the 4 eigenvalue, C is not diagonalizable.
(b) Each matrix has rank of 2 so B and C have their rank equal to the number of nonzero eigenvalues.
4.19 Suppose A is nonsingular. Since AB is diagonalizable, there exist an m × m nonsingular matrix X and
an m × m diagonal matrix D such that
$$AB = XDX^{-1}.$$
Then BA = A^{-1}(AB)A = A^{-1}XDX^{-1}A = YDY^{-1},
where Y = A^{-1}X. This confirms that BA is diagonalizable. A similar proof can be constructed for
the case in which B is nonsingular. Next consider the 2 × 2 matrices
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$
where A = diag(A1 , . . . , Ak−1 ) and B = Ak , and then again use the proof given for the 2 × 2 form.
4.23 (a) Note that y_i'x_j = (Y'X)_{ij} = (X^{-1}X)_{ij} = (I_m)_{ij}, so y_i'x_j = 0 if i ≠ j and y_i'x_j = 1 if i = j. Now
$$P_A(\mu_i)P_A(\mu_j) = \sum_{k=1}^{r_i}\sum_{l=1}^{r_j}x_{M_i+k}y_{M_i+k}'x_{M_j+l}y_{M_j+l}'. \qquad (1)$$
Since the sets {M_i + 1, ..., M_i + r_i} and {M_j + 1, ..., M_j + r_j} have no elements in common when
i ≠ j, (1) reduces to (0) when i ≠ j.
(b) We have
$$\sum_{i=1}^{h}P_A(\mu_i) = \sum_{i=1}^{h}\sum_{j=1}^{r_i}x_{M_i+j}y_{M_i+j}' = \sum_{l=1}^{m}x_ly_l' = \sum_{l=1}^{m}(X)_{\cdot l}(Y')_{l\cdot} = \sum_{l=1}^{m}(X)_{\cdot l}(X^{-1})_{l\cdot} = XX^{-1} = I_m.$$
(c) Write Xi = (xMi +1 , . . . , xMi +ri ) and Yi = (y Mi +1 , . . . , y Mi +ri ), so that PA (µi ) = Xi Yi0 . Note
that the null space of (A − µi Im ) is the eigenspace of µi so that N (A − µi Im ) = R(Xi ). Since
Yi0 Xi = Iri follows from Y 0 X = X −1 X = Im , we have
R{PA (µi )} = R(Xi Yi0 ) ⊆ R(Xi ) = R(Xi Yi0 Xi ) = R{PA (µi )Xi } ⊆ R{PA (µi )}.
Thus, R{PA (µi )} = N (A − µi Im ), that is, PA (µi ) is a projection matrix for the null space
of (A − µi Im ). All that remains is to show that this projection is along the column space of
(A − µi Im ) by showing N {PA (µi )} = R(A − µi Im ). Now using part (a), we have
$$P_A(\mu_i)(A - \mu_iI_m) = \sum_{j=1}^{h}\mu_jP_A(\mu_i)P_A(\mu_j) - \mu_iP_A(\mu_i) = \mu_iP_A(\mu_i) - \mu_iP_A(\mu_i) = (0),$$
so
R(A − µi Im ) ⊆ N {PA (µi )}. (2)
This along with (2) confirms that N {PA (µi )} = R(A − µi Im ) and so the proof is complete.
4.25 (a)
$$\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
(b)
$$\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
(c)
$$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
4.27 (a) Inspection of the diagonal elements reveals that the eigenvalues are 2 and 3 with multiplicities
four and two.
4.29 Since (A − λI5 )2 = (0), the Jordan canonical form has no hi larger than 2. Thus, the forms from
Problem 4.26 satisfying this constraint are
diag(J1 (λ), J1 (λ), J1 (λ), J1 (λ), J1 (λ)), diag(J2 (λ), J1 (λ), J1 (λ), J1 (λ)), diag(J2 (λ), J2 (λ), J1 (λ)).
4.31 Let J = D+B be as in part (b) of Problem 4.28. According to Theorem 4.11, there exists a nonsingular
matrix X such that A = XJX −1 . Since all of the eigenvalues of A are zero, D = (0), and so
A = XBX −1 . As a result, Am = XB m X −1 . But from Problem 4.28 part (b), we know that B h = (0)
if h ≥ maxi hi . Since maxi hi ≤ m, we must have B m = (0), and hence Am = (0).
4.33 Let A1 = A − λIm . Note that λ is an eigenvalue of A with multiplicity r if and only if A1 has m − r
nonzero eigenvalues. But from Problem 4.32, rank(A1 ) = rank(A21 ) is equivalent to the condition
rank(A1 ) = rank(A − λIm ) = m − r. The result now follows from Theorem 4.8.
4.35 Since V = TU is upper triangular, it follows that v_{ij} = 0 for 1 ≤ j < i ≤ r + 1. Now for 1 ≤ i ≤ j ≤ r,
note that
$$v_{ij} = (TU)_{ij} = (T)_{i\cdot}(U)_{\cdot j} = \sum_{l=1}^{m}t_{il}u_{lj} = \sum_{l=i}^{j}t_{il}u_{lj} = 0,$$
where the fourth equality follows from the fact that T and U are upper triangular and the final equality
follows since t_{ij} = 0 for 1 ≤ i ≤ r, 1 ≤ j ≤ r. When 1 ≤ i ≤ j = r + 1, we have
$$v_{ij} = \sum_{l=i}^{j}t_{il}u_{lj} = t_{i,r+1}u_{r+1,r+1} = 0,$$
since u_{r+1,r+1} = 0.
4.37 In Problem 4.17, we saw that (1/√3, 1/√3, -1/√3)' is a normalized eigenvector of C corresponding
to the eigenvalue 4. Define
$$Y = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\ -1/\sqrt{3} & 0 & 2/\sqrt{6} \end{pmatrix},$$
and note that
$$Y'CY = \begin{pmatrix} 4 & -1/\sqrt{6} & 5/\sqrt{2} \\ 0 & 2 & -6/\sqrt{3} \\ 0 & -2/\sqrt{3} & 2 \end{pmatrix}.$$
A normalized eigenvector of
$$B = \begin{pmatrix} 2 & -6/\sqrt{3} \\ -2/\sqrt{3} & 2 \end{pmatrix}$$
corresponding to the eigenvalue 0 is (√3/2, 1/2)' and
$$W'BW = \begin{pmatrix} 0 & -4/\sqrt{3} \\ 0 & 4 \end{pmatrix}$$
when
$$W = \begin{pmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{pmatrix}.$$
A Schur decomposition of C is then given as C = XTX', where
$$X = Y\begin{pmatrix} 1 & 0' \\ 0 & W \end{pmatrix} = \begin{pmatrix} 1/\sqrt{3} & 2/\sqrt{6} & 0 \\ 1/\sqrt{3} & -1/\sqrt{6} & 1/\sqrt{2} \\ -1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{2} \end{pmatrix},$$
and
$$T = \begin{pmatrix} 4 & \sqrt{2} & 8/\sqrt{6} \\ 0 & 0 & -4/\sqrt{3} \\ 0 & 0 & 4 \end{pmatrix}.$$
so that Σ_{i<j}|t_{ij}|^2 is uniquely defined if and only if Σ_{i≤j}|t_{ij}|^2 is uniquely defined. Let A = XTX* be
= tr(A*A).
4.41 The proof is very similar to the proof of Theorem 4.18. First suppose that such a nonsingular matrix X
does exist; that is, there is a nonsingular matrix X such that X −1 AX = Λ1 and X −1 BX = Λ2 , where
Λ1 and Λ2 are diagonal matrices. Then because Λ1 and Λ2 are diagonal matrices, clearly Λ1 Λ2 = Λ2 Λ1 ,
so we have
and hence, A and B do commute. Conversely, now assuming that AB = BA, we need to show that
such a nonsingular matrix X does exist. Let µ1 , . . . , µh be the distinct values of the eigenvalues of A
having multiplicities r1 , . . . , rh , respectively. Since A is diagonalizable, a nonsingular matrix Y exists,
satisfying
Y −1 AY = Λ1 = diag(µ1 Ir1 , . . . , µh Irh ).
Performing this same transformation on B and partitioning the resulting matrix in the same way that
Y −1 AY has been partitioned, we get
$$C = Y^{-1}BY = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1h} \\ C_{21} & C_{22} & \cdots & C_{2h} \\ \vdots & \vdots & & \vdots \\ C_{h1} & C_{h2} & \cdots & C_{hh} \end{pmatrix},$$
where Cij is ri × rj . Note that because AB = BA, we must have
Λ1 C = Y −1 AY Y −1 BY = Y −1 ABY = Y −1 BAY
= Y −1 BY Y −1 AY = CΛ1 .
Equating the (i, j)th submatrix of Λ1 C to the (i, j)th submatrix of CΛ1 yields the identity µi Cij =
µj Cij . Since µi 6= µj if i 6= j, we must have Cij = (0) if i 6= j; that is, the matrix C =
diag(C11 , . . . , Chh ) is block diagonal. Now clearly C is diagonalizable since B is, and so from Problem
4.21, we know that Cii is diagonalizable for each i. Thus, we can find an ri × ri nonsingular matrix Zi
satisfying
Zi−1 Cii Zi = ∆i ,
X −1 AX = Z −1 Y −1 AY Z = Z −1 Λ1 Z
and
X −1 BX = Z −1 Y −1 BY Z = Z −1 CZ
= diag(∆1 , . . . , ∆h ) = ∆,
4.43 From Problem 4.41, it follows that there exists a nonsingular matrix X such that X −1 AX = Λ =
diag(λi1 , . . . , λim ) and X −1 BX = M = diag(µj1 , . . . , µjm ), where (i1 , . . . , im ) and (j1 , . . . , jm ) are
permutations of (1, . . . , m), As a result,
X −1 (A + B)X = X −1 AX + X −1 BX
Thus, A + B is diagonalizable with eigenvalues λi1 + µj1 , . . . , λim + µjm , so the result follows.
4.45 Consider
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} -1 & 0 \\ 0 & 0 \end{pmatrix},$$
so that CAC' and CBC' are both diagonal when C = I_2. Any linear combination of A and B is
singular so Theorem 4.15 does not apply, and B is not nonnegative definite so Theorem 4.16 does not
apply.
4.47 Let C be defined as in the solution to the previous problem. For any y ∈ Rm , let x = C 0 y so
Thus, A + B is positive definite if and only if Λ + Im is positive definite. The result follows since Λ + Im
is positive definite if and only if λi > −1 for all i.
4.49 (a) Let B = XΛX 0 be the spectral decomposition of B and P DP 0 be the spectral decomposition of
Λ1/2 X 0 AXΛ1/2 . Put Y = XΛ−1/2 P . Then
= P 0 Λ1/2 X 0 AXΛ1/2 P
= P 0 P DP 0 P = D,
so AB is diagonalizable.
where H_{11} is r × r. Since B is nonnegative definite, so is H and as a result, it can be expressed
as H = TT' for some m × m matrix T. Partitioning T appropriately, we have
$$\begin{pmatrix} H_{11} & H_{12} \\ H_{12}' & H_{22} \end{pmatrix} = \begin{pmatrix} T_1 \\ T_2 \end{pmatrix}[T_1'\ T_2'] = \begin{pmatrix} T_1T_1' & T_1T_2' \\ T_2T_1' & T_2T_2' \end{pmatrix},$$
which reveals that R(H_{12}) ⊂ R(H_{11}). Thus, H_{12} = H_{11}X for some r × (m - r) matrix X, and as
a result
$$Y'HY = \begin{pmatrix} H_{11} & (0) \\ (0) & H_{22} - X'H_{11}X \end{pmatrix},$$
where
$$Y = \begin{pmatrix} I_r & -X \\ (0) & I_{m-r} \end{pmatrix}.$$
Since H11 and H22 − X 0 H11 X are symmetric, there are orthogonal matrices Q1 and Q2 , and
diagonal matrices D1 and D2 , such that Q01 H11 Q1 = D1 and Q02 (H22 − X 0 H11 X)Q2 = D2 . Let
F = CY Q, where Q = diag(Q1 , Q2 ), and note that F −1 AF −10 = diag(Ir , (0)) and F 0 BF = D =
diag(D1 , D2 ). Consequently,
and so AB is diagonalizable.
4.51 Clearly, ‖A‖_* is nonnegative and reduces to zero if and only if a_{ij} = 0 for all i and j, so properties (a)
and (b) of a matrix norm hold. Also, property (c) holds since |ca_{ij}| = |c||a_{ij}|, while property (d) holds
since
$$\|A + B\|_* = m\max_{1\le i,j\le m}|a_{ij} + b_{ij}| \le m\max_{1\le i,j\le m}\{|a_{ij}| + |b_{ij}|\} \le m\max_{1\le i,j\le m}|a_{ij}| + m\max_{1\le i,j\le m}|b_{ij}| = \|A\|_* + \|B\|_*.$$
4.53 (a) Using property (e) of the matrix norm ‖·‖, we have
$$\|I_m\| = \|I_mI_m\| \le \|I_m\|\|I_m\| = \|I_m\|^2,$$
which yields
$$\|A^{-1}\| \ge \frac{\|I_m\|}{\|A\|} \ge \|A\|^{-1},$$
where the second inequality follows from the result from part (a).
4.55 Let A = P DQ0 be a singular value decomposition of A so that the m × m matrices P and Q are
orthogonal and the matrix D has the form given in (a) or (d) of Theorem 4.1 since n = m. Then we
have
4.57 Let
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$
Both A and B have eigenvalue 0 with multiplicity two, so ρ(A) = ρ(B) = 0.
(b) Note that
$$AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$
Since AB has eigenvalues of 0 and 1, we have
4.59 (a) We prove the A_r = L_*U_* factorization by induction. The result holds for r = 2 since
$$\begin{pmatrix} a_{11} & 0 \\ a_{21} & a_{22} - a_{21}a_{12}/a_{11} \end{pmatrix}\begin{pmatrix} 1 & a_{12}/a_{11} \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},$$
where clearly L_* and U_* are nonsingular since a_{11} ≠ 0 and |A_2| = a_{11}a_{22} - a_{21}a_{12} ≠ 0. Now
suppose r > 2 and the factorization holds for (r - 1) × (r - 1) matrices. Partition A_r as
$$A_r = \begin{pmatrix} B & c \\ d' & e \end{pmatrix},$$
Since L and U are nonsingular, the unique solutions to Lw = c and f'U = d' are given by
w = L^{-1}c and f' = d'U^{-1}. Finally, we solve gv + f'w = e by setting v = 1 and g = e - f'w.
Note that our resulting lower and upper triangular matrices are nonsingular since L and U are,
v ≠ 0, and g = e - d'U^{-1}L^{-1}c = e - d'B^{-1}c ≠ 0, a consequence of the fact that |A_r| ≠ 0. This
proves the A_r = L_*U_* factorization.
Partition A, L, and U as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad L = \begin{pmatrix} L_{11} & (0) \\ L_{21} & L_{22} \end{pmatrix}, \qquad U = \begin{pmatrix} U_{11} & U_{12} \\ (0) & U_{22} \end{pmatrix},$$
Now L_{11} and U_{11} can be obtained via the process described in the first part of this problem, and
from (1) we immediately get U_{12} = L_{11}^{-1}A_{12} and L_{21} = A_{21}U_{11}^{-1}. Since rank(A) = rank(A_{11}) = r,
it follows that A_{21} = BA_{11} and A_{22} = BA_{12} for some (m - r) × r matrix B. Then from (1), we
have
$$A_{22} = L_{21}U_{12} + L_{22}U_{22} = A_{21}U_{11}^{-1}L_{11}^{-1}A_{12} + L_{22}U_{22} = BA_{11}A_{11}^{-1}A_{12} + L_{22}U_{22} = BA_{12} + L_{22}U_{22}.$$
Thus, we must have L_{22}U_{22} = (0). For instance, one of the matrices can be chosen to be the null
matrix, while the other matrix can be chosen so that L or U is nonsingular and triangular.
(c) If A = LU , the equation Ax = c can be solved in two steps. First we find a solution y to the
equation
Ly = c, (1)
U x = y. (2)
Since both L and U are triangular matrices, solutions to (1) and (2) are easily obtained. For
instance, if both L and U are nonsingular matrices, the solution to (1) has
$$y_1 = \frac{c_1}{l_{11}}, \qquad y_i = \frac{c_i - \sum_{j=1}^{i-1}l_{ij}y_j}{l_{ii}}, \quad \text{for } i = 2, \ldots, m,$$
while the solution to (2) has
$$x_m = \frac{y_m}{u_{mm}}, \qquad x_{m-i} = \frac{y_{m-i} - \sum_{j=0}^{i-1}u_{m-i,m-j}x_{m-j}}{u_{m-i,m-i}}, \quad \text{for } i = 1, \ldots, m-1.$$
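A minimal sketch of solving Ax = c through an LU factorization with the forward and back substitutions above (the example matrix is illustrative; scipy's factorization includes a permutation, which the sketch accounts for):

```python
import numpy as np
from scipy.linalg import lu

def solve_lu(A, c):
    # Factor A = P L U, then solve L y = P'c by forward substitution
    # and U x = y by back substitution.
    P, L, U = lu(A)
    b = P.T @ c
    m = len(b)
    y = np.zeros(m)
    for i in range(m):                       # forward substitution
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    x = np.zeros(m)
    for i in range(m - 1, -1, -1):           # back substitution
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

A = np.array([[4., 3., 2.], [2., 4., 1.], [6., 1., 5.]])   # illustrative matrix
c = np.array([1., 2., 3.])
print(np.allclose(A @ solve_lu(A, c), c))                  # True
```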
4.61 (a) It follows from Problem 4.59(a) that A can be expressed as A = L∗ U∗ , where L∗ and U∗0 are
nonsingular lower triangular m × m matrices. Let D1 and D2 be the diagonal matrices that have
the same diagonal elements as L∗ and U∗ , respectively. Define L = L∗ D1−1 , M 0 = D2−1 U∗ , and
D = D1 D2 , and note that L and M are lower triangular matrices with all diagonal elements equal
to one. Then the result follows since
(b) Note that if A has the LDM' decomposition from part (a), then
and so, since A is symmetric, we must have (M)_{·1} = (L)_{·1}. In addition, if the first k - 1 columns
of M are the same as those of L, then
$$(A)_{k\cdot}' = \sum_{i=1}^{k}l_{ki}d_{ii}(M)_{\cdot i} = \sum_{i=1}^{k-1}l_{ki}d_{ii}(L)_{\cdot i} + d_{kk}(M)_{\cdot k}, \qquad (1)$$
while
$$(A)_{\cdot k} = \sum_{i=1}^{k}m_{ki}d_{ii}(L)_{\cdot i} = \sum_{i=1}^{k-1}l_{ki}d_{ii}(L)_{\cdot i} + d_{kk}(L)_{\cdot k}. \qquad (2)$$
The symmetry of A implies (1) equals (2), and so (M)_{·k} = (L)_{·k}. Thus, we have established
that M = L.
Chapter 5
(a) We have
$$(\alpha A)(\alpha^{-1}A^+)(\alpha A) = \alpha AA^+A = \alpha A,$$
and
$$(\alpha^{-1}A^+)(\alpha A)(\alpha^{-1}A^+) = \alpha^{-1}A^+AA^+ = \alpha^{-1}A^+.$$
In addition, (αA)(α^{-1}A^+) = AA^+ and (α^{-1}A^+)(αA) = A^+A are both symmetric since AA^+ and A^+A
are symmetric.
(b) We have
A0 (A+ )0 A0 = (AA+ A)0 = A0 ,
and
(A+ )0 A0 (A+ )0 = (A+ AA+ )0 = (A+ )0 .
In addition, A0 (A+ )0 = (A+ A)0 and (A+ )0 A0 = (AA+ )0 are both symmetric since AA+ and A+ A
are symmetric.
(c) (A+ )+ = A follows immediately since conditions (5.1), (5.2), (5.3) and (5.4) for the Moore–Penrose
inverse A+ of A are conditions (5.2), (5.1), (5.4) and (5.3), respectively, for the Moore–Penrose
inverse A of A+ .
and
A−1 AA−1 = A−1 .
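The four defining conditions of the Moore–Penrose inverse can be checked numerically for any matrix; a hedged sketch using an illustrative A:

```python
import numpy as np

A = np.array([[1., 2., 0.], [2., 4., 0.]])      # an illustrative rank-1 matrix
Ap = np.linalg.pinv(A)

print(np.allclose(A @ Ap @ A, A))               # AA+A = A
print(np.allclose(Ap @ A @ Ap, Ap))             # A+AA+ = A+
print(np.allclose((A @ Ap).T, A @ Ap))          # AA+ symmetric
print(np.allclose((Ap @ A).T, Ap @ A))          # A+A symmetric
```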
5.5 Let Φ = diag(φ_1, ..., φ_m). Note that since λ_iφ_iλ_i = λ_iλ_i^{-1}λ_i = λ_i if λ_i ≠ 0 and λ_iφ_iλ_i = 0 = λ_i if
λ_i = 0,
$$\Lambda\Phi\Lambda = \Lambda.$$
$$\Phi\Lambda\Phi = \Phi.$$
Further, ΛΦ and ΦΛ are diagonal matrices and, hence, are symmetric. This confirms that Φ = Λ^+.
5.9 Since rank(A) = 1, the singular value decomposition of A has the form A = pdq', where d is a scalar,
while p and q are m × 1 and n × 1 unit vectors. Now d is the positive square root of the one nonzero
eigenvalue of A'A, and so d = {tr(A'A)}^{1/2} = c^{1/2} and A = c^{1/2}pq'. Thus,
and
(A+ A)2 = A+ (AA+ A) = A+ A,
so both AA+ and A+ A are idempotent. Since they are idempotent, we also find that
and
(Im − A+ A)2 = Im − 2A+ A + (A+ A)2 = Im − 2A+ A + A+ A = Im − A+ A,
5.13 Since B is positive definite, there exists an n × n nonsingular matrix T such that B = T T 0 . Letting
A∗ = AT and using Problem 5.12(c), we have
that is,
AT T 0 A0 (AT T 0 A0 )+ AT = AT.
ABA0 (ABA0 )+ A = A.
5.15 Let A = XΛX 0 be the spectral decomposition of A. If A has one nonzero eigenvalue λ with multiplicity
r, then Λ and X can be chosen so that Λ = diag(λIr , (0)) and X = (X1 , X2 ), and the spectral
decomposition will reduce to the form A = X1(λIr)X1' = λX1X1', where X1 is an m × r matrix satisfying
X10 X1 = Ir . Thus,
5.17 (a) If A is nonnegative definite, then its eigenvalues are nonnegative. It then follows from Theorem
5.7 that the eigenvalues of A^+ are also nonnegative, and so it is a nonnegative definite matrix.
(b) If Ax = 0, then A+ Ax = 0. Using the fact that A and A+ are symmetric, we have
Premultiplying by A+ , we get
0 = A+ AA+ x = A+ x.
5.19 Let rA = rank(A), rB = rank(B), while A = P DP 0 and B = XΛX 0 are the spectral decompositions
of A and B, where D and Λ are rA × rA and rB × rB diagonal matrices. Note that if Ax = 0, we
must have Bx = 0 since B and A − B are nonnegative definite, and so R(B) ⊆ R(A). First suppose
that rA = rB in which case R(B) = R(A). As a result, there exists an orthogonal matrix Q such that
X = P Q and
while
B + − A+ = XΛ−1 X 0 − P D−1 P 0 = P D−1/2 (D1/2 QΛ−1 Q0 D1/2 − IrA )D−1/2 P 0 .
Since A − B is nonnegative definite, so is IrA − D−1/2 QΛQ0 D−1/2 . But this implies that (see Problem
4.46) D1/2 QΛ−1 Q0 D1/2 − IrA is nonnegative definite and, hence, so is B + − A+ . Conversely, now
suppose that B + − A+ is nonnegative definite. But this implies that R(A+ ) ⊆ R(B + ), that is,
R(A) ⊆ R(B) in addition to the already established condition R(B) ⊆ R(A), and so rA = rB follows.
Q0 A+ P 0 P AQ = Q0 A+ AQ is symmetric,
Q0 A+ P 0 P AQQ0 A+ P 0 = Q0 A+ AA+ P 0 = Q0 A+ P 0 .
5.23 (a) Note that
$$A^+ = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},$$
$$ABA = \mathrm{diag}(A_{11}A_{11}^+A_{11}, \ldots, A_{rr}A_{rr}^+A_{rr}) = \mathrm{diag}(A_{11}, \ldots, A_{rr}) = A,$$
$$BAB = \mathrm{diag}(A_{11}^+A_{11}A_{11}^+, \ldots, A_{rr}^+A_{rr}A_{rr}^+) = \mathrm{diag}(A_{11}^+, \ldots, A_{rr}^+) = B,$$
AB = diag(A_{11}A_{11}^+, ..., A_{rr}A_{rr}^+) is symmetric since A_{ii}A_{ii}^+ is for each i,
BA = diag(A_{11}^+A_{11}, ..., A_{rr}^+A_{rr}) is symmetric since A_{ii}^+A_{ii} is for each i.
while
$$V^+ = (V'V)^{-1}V' = \frac{1}{3}\begin{pmatrix} 0 & -3 & 3 \\ -1 & -1 & 2 \end{pmatrix}.$$
Since U'V = (0), Corollary 5.13.1(d) applies and so
$$A^+ = \begin{pmatrix} U^+ \\ V^+ \end{pmatrix} = \begin{pmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 0 & -1 & 1 \\ -1/3 & -1/3 & 2/3 \end{pmatrix}.$$
5.29 Let U = wx0 and V = yz 0 . Then U V 0 = (0) and U 0 V = (0), so Theorem 5.17 applies. Using this and
the solution to Problem 5.10(d), we get
5.31 We can write Ω = nD − npp0 , where p = (p1 , . . . , pm )0 , and so Ω takes the form A + cd0 in Theorem
5.18 with A = nD, c = −np, and d = p. Since
$$1 + d'A^{-1}c = 1 - p'D^{-1}p = 1 - \sum_{i=1}^{m}p_i = 0,$$
according to Theorem 5.18, Ω is singular. Note that for x = A−1 d = n−1 D−1 p = n−1 1m and
y = A−1 c = −n−1 D−1 np = −1m , we have xx+ = yy + = m−1 1m 10m . Thus, from Theorem 5.18
5.33 Since c and d are in the column space of A and A is symmetric, we have A+ Ac = AA+ c = c and
A+ Ad = AA+ d = d. We will show that (A + cd0 )+ = A+ − α−1 A+ cd0 A+ , where α = 1 + d0 A+ c, by
verifying the four conditions of the Moore–Penrose inverse. Now
(A + cd0 )(A+ − α−1 A+ cd0 A+ ) = AA+ − α−1 AA+ cd0 A+ + cd0 A+ − α−1 cd0 A+ cd0 A+
so condition 5.3 holds. Similarly,
(A+ − α−1 A+ cd0 A+ )(A + cd0 ) = A+ A + A+ cd0 − α−1 A+ cd0 A+ A − α−1 A+ cd0 A+ cd0
= A+ A + A+ cd0 − A+ cd0 = A+ A,
= A + cd0 ,
and
(A+ − α−1 A+ cd0 A+ )(A + cd0 )(A+ − α−1 A+ cd0 A+ ) = A+ A(A+ − α−1 A+ cd0 A+ )
= A+ − α−1 A+ cd0 A+ .
5.39 First suppose that B is a generalized inverse of A. Then ABA = A and so postmultiplying by B, we
have ABAB = AB confirming that AB is idempotent. Further, from Theorem 2.8(a),
implying then that rank(A) = rank(AB). Now suppose AB is idempotent and rank(A) = rank(AB).
Thus,
ABAB = AB. (1)
But the column space of AB is a subspace of the column space of A, and when rank(A) = rank(AB)
these two subspaces are the same. Thus, the columns of A can be written as a linear combination of
those of AB meaning there is some matrix C such that A = ABC. Postmultiplying equation (1) by C
yields ABA = A, so we have shown that B is a generalized inverse of A.
BACDBAC = BAC.
ACDBA = A.
This implies that CDB is a generalized inverse of A, that is, CDB = A− for some A− , and this leads
to D = C −1 A− B −1 .
5.43 (a) First suppose that B is a reflexive inverse of A. Then since ABA = A, we have
Together these two identities imply rank(B) = rank(A). Conversely, suppose ABA = A and
rank(B) = rank(A). Then BABA = BA, that is, BA is idempotent, and from Theorem 5.23,
rank(BA) = rank(A) = rank(B). It now follows from Problem 5.39 that A is a generalized inverse
of B, so BAB = B.
(b) Note that
$$AB = P\begin{bmatrix} \Delta & (0) \\ (0) & (0) \end{bmatrix}Q'\,Q\begin{bmatrix} \Delta^{-1} & E \\ F & F\Delta E \end{bmatrix}P' = P\begin{bmatrix} I & \Delta E \\ (0) & (0) \end{bmatrix}P'.$$
Thus,
$$ABA = P\begin{bmatrix} I & \Delta E \\ (0) & (0) \end{bmatrix}P'P\begin{bmatrix} \Delta & (0) \\ (0) & (0) \end{bmatrix}Q' = P\begin{bmatrix} \Delta & (0) \\ (0) & (0) \end{bmatrix}Q' = A,$$
and
$$BAB = Q\begin{bmatrix} \Delta^{-1} & E \\ F & F\Delta E \end{bmatrix}P'P\begin{bmatrix} I & \Delta E \\ (0) & (0) \end{bmatrix}P' = Q\begin{bmatrix} \Delta^{-1} & E \\ F & F\Delta E \end{bmatrix}P' = B.$$
5.45 We prove the result by simply showing that the expression given for A^- satisfies AA^-A = A, and since
$$AA^-A = \begin{pmatrix} UWU + UXV \\ VWU + VXV \end{pmatrix},$$
and
$$VXV = V(I_n - U^-U)\{V(I_n - U^-U)\}^-V,$$
so
$$= VU^-U + V(I_n - U^-U) = V.$$
1) Add row 1 to row 2.
Since H is in Hermite form, it follows from Theorem 5.31 that C is a generalized inverse of A.
5.51 We show that A+ = BA0 by verifying the four conditions of Definition 5.1. We know A0 ABA0 A =
A0 A, so A+0 A0 ABA0 A = A+0 A0 A, which leads to the first condition, ABA0 A = A. We also know
(BA0 A)0 = BA0 A, which is the fourth condition. Postmultiplying this identity by A+ yields the
identity A0 AB 0 A+ = BA0 AA+ = BA0 . Thus,
where the second equality uses the fourth condition, while the last equality follows from the first
condition. This gives us the second condition BA0 ABA0 = BA0 . Finally, we get the third condition by
noting that
$$ABA' = AB(ABA'A)' = ABA'AB'A'$$
is symmetric.
5.53 We show that A+ = A0 (AA0 )L by verifying the four conditions in Definition 5.1. Using Theorem
5.27(b)
AA0 (AA0 )L A = AA0 (AA0 )+ A = AA0 A+0 A+ A = A(A+ A)0 A+ A = AA+ AA+ A = A,
and
Note that the symmetry of AA0 (AA0 )L follows immediately from the definition of (AA0 )L . Finally,
A0 (AA0 )L A is symmetric due to Theorem 5.27(c).
5.55 According to Theorem 5.27(b), A(A0 A)− A0 = A(A0 A)+ A0 and A0 (AA0 )− A = A0 (AA0 )+ A. Thus
5.57 Since H is upper triangular, we have
$$(H^2)_{ij} = \sum_{k=1}^{m}h_{ik}h_{kj} = \begin{cases} 0, & \text{if } i > j, \\ h_{ii}^2, & \text{if } i = j, \\ \sum_{k=i}^{j}h_{ik}h_{kj}, & \text{if } i < j. \end{cases}$$
Thus, (H^2)_{ij} = h_{ij} = 0 if i > j, and (H^2)_{ii} = h_{ii}^2 = h_{ii} since h_{ii} = 0 or h_{ii} = 1, so all that remains is
to show that (H^2)_{ij} = h_{ij} when i < j. Now if h_{ii} = 0, then h_{ik} = 0 for all k and so
$$(H^2)_{ij} = \sum_{k=i}^{j}h_{ik}h_{kj} = 0 = h_{ij}.$$
When h_{ii} = 1,
$$\sum_{k=i}^{j}h_{ik}h_{kj} = h_{ij} + \sum_{k=i+1}^{j}h_{ik}h_{kj} = h_{ij},$$
since if for any k, h_{ik} ≠ 0, then h_{kk} = 0 and h_{kj} = 0, and so h_{ik}h_{kj} = 0 for k = i + 1, ..., j.
Then since A has rank of 2,
0 4 0
−6
+ 0 −1 0 1 1 1
A = 2{tr(B2 A A)} B2 A = .
12
1 −2
1
1 2 1
5.61 Using the notation of Theorem 5.13 and its corollary, if A = [A_{j-1} a_j] = [U V], then C = a_j -
A_{j-1}d_j = c_j. If c_j ≠ 0, then C^+C = c_j^+c_j = 1, and so Corollary 5.13.1(c) applies. This gives
$$A_j^+ = \begin{pmatrix} A_{j-1}^+ - A_{j-1}^+a_jc_j^+ \\ c_j^+ \end{pmatrix} = \begin{pmatrix} A_{j-1}^+ - d_jb_j' \\ b_j' \end{pmatrix}.$$
$$K = (1 + a_j'A_{j-1}^{+\prime}A_{j-1}^+a_j)^{-1} = (1 + d_j'd_j)^{-1},$$
so
$$A_j^+ = \begin{pmatrix} U^+ - U^+VKV'U^{+\prime}U^+ \\ KV'U^{+\prime}U^+ \end{pmatrix} = \begin{pmatrix} A_{j-1}^+ - A_{j-1}^+a_jb_j' \\ b_j' \end{pmatrix} = \begin{pmatrix} A_{j-1}^+ - d_jb_j' \\ b_j' \end{pmatrix}.$$
Chapter 6
and then
$$AA^+c = \begin{pmatrix} 1 \\ 3 \\ -1 \\ 0 \end{pmatrix}.$$
Since this is c, the system is consistent.
(c) It is easily verified that rank(A) = 3, so from Theorem 6.7, it follows that there is n−rank(A)+1 =
1 linearly independent solution.
6.3 By augmenting A with a final column of zeros and then row reducing this matrix to one in Hermite
form, we obtain a generalized inverse given by
$$A^- = \begin{pmatrix} 0 & 2/5 & 0 & -1/5 \\ 0 & -1/5 & 0 & 3/5 \\ 0 & -3 & 5 & -1 \end{pmatrix}.$$
6.5 According to Theorem 6.3, since AXB = C is consistent, AA− CB − B = C. As a result
XX∗ = A− CB − + X∗ − A− AX∗ BB − = A− CB − + X∗ − A− CB − = X∗ .
6.7 Suppose there exists a y such that A− c and (In − A− A)y 6= 0 are linearly dependent. After possibly
making a scalar adjustment to this vector y, we would then have A− c = (In − A− A)y, and this leads
to
AA− c = A(In − A− A)y = (A − AA− A)y = (A − A)y = 0.
But this is a contradiction because if the system is consistent, we must have AA− c = c 6= 0. Thus,
A− c and (In − A− A)y must be linearly independent.
since AA− c = c if Ax = c is consistent. But since C is arbitrary and c 6= 0, any b ∈ Rn can be written
as b = Cc for some C, so we must have In − A− A = (0). The result now follows from Theorem 6.6.
6.11 The number of linearly independent solutions is r = n - rank(A) = 4 - 2 = 2. Using the Moore-Penrose
inverse
$$A^+ = A'(AA')^{-1} = \frac{1}{86}\begin{pmatrix} 9 & 17 \\ 12 & -6 \\ -34 & -26 \\ -9 & -17 \end{pmatrix},$$
and
$$I_4 - A^+A = \frac{1}{86}\begin{pmatrix} 61 & 24 & 18 & 25 \\ 24 & 32 & 24 & -24 \\ 18 & 24 & 18 & -18 \\ 25 & -24 & -18 & 61 \end{pmatrix},$$
the general solution is given by x_y = (I_n - A^+A)y. Choosing 86e_1 and 86e_2 for y, we get the two
particular linearly independent solutions, (61, 24, 18, 25)' and (24, 32, 24, -24)'.
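A numerical check of this solution set, using the 2 × 4 matrix whose Moore–Penrose inverse matches A^+ above (a sketch; the matrix is reconstructed from A^+, not quoted from the text):

```python
import numpy as np

A = np.array([[-1., 3., -2., 1.], [2., -3., 0., -2.]])   # consistent with the A+ above
Ap = np.linalg.pinv(A)

print(np.allclose(86 * Ap, np.array([[9., 17.], [12., -6.], [-34., -26.], [-9., -17.]])))  # True
Q = np.eye(4) - Ap @ A                                   # I - A+A
x1, x2 = 86 * Q[:, 0], 86 * Q[:, 1]                      # the two particular solutions
print(np.allclose(A @ x1, 0), np.allclose(A @ x2, 0))    # both solve Ax = 0
```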
6.13 Since G is a generalized inverse of A, the arbitrary solution x∗ can be written as
x∗ = Gc + (In − GA)y
since
Thus,
x0∗ x∗ ≥ c0 G0 Gc (1)
since
y 0 (In − GA)0 (In − GA)y (2)
is nonnegative. We have equality in (1) only if the quantity in (2) is 0 and this requires (In −GA)y = 0.
This confirms that we have a strict inequality in (1) if x∗ 6= Gc.
6.15 Note that Im = XX −1 = X1 X1− + X2 X2− , so that X2 X2− = Im − X1 X1− . Then using Theorem 6.5,
we can solve AX1 = X1 Λ1 for A to get
A = X1 ΛX1− − W X2− .
6.17 (a) Suppose there is a common solution. That is, there exists an X satisfying AX = C and XB = D.
Postmultiplying the first of these two equations by B, while premultiplying the second by A yields
two equations for AXB which when equated gives AD = CB. Now suppose AD = CB. We need
to show that the general solution to AX = C, XY = A− C + Y − A− AY , is a solution to XB = D
for some Y . Take Y = DB − so that
XY = A− C + DB − − A− ADB − = A− C + DB − − A− CBB − .
XY B = A− CB + DB − B − A− CBB − B = A− CB + D − A− CB = D.
for arbitrary Y , since (In − A− A)− = (In − A− A). Substituting this back into the expression for
X, we get
as required.
6.19 (a) Using the least squares inverse obtained in the solution to Problem 5.49, we find that
$$AA^Lc = \frac{1}{22}\begin{pmatrix} 32 \\ 18 \\ 114 \end{pmatrix} \neq c,$$
so the system of equations is not consistent.
(b) A least squares solution is given by
$$A^Lc = \frac{1}{22}\begin{pmatrix} 73 \\ 41 \\ 0 \\ 0 \end{pmatrix}.$$
(c) The sum of squared errors for a least squares solution to this system of equations is
$$(AA^Lc - c)'(AA^Lc - c) = \frac{4}{11}.$$
6.21 Suppose x_* is a least squares solution. Then, according to Theorem 6.13, Ax_* = AA^Lc, and this leads to
$$A'Ax_* = A'(AA^L)'c = A'A^{L\prime}A'c = (AA^LA)'c = A'c.$$
Conversely, now suppose A'Ax_* = A'c. Then A^{L'}A'Ax_* = A^{L'}A'c, or (AA^L)'Ax_* = (AA^L)'c, and so AA^LAx_* = AA^Lc or, finally, Ax_* = AA^Lc. Thus, it follows from Theorem 6.13 that x_* is a least
squares solution.
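A small numerical check of this equivalence, using a hypothetical overdetermined system and the Moore–Penrose inverse as one particular least squares inverse A^L:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))            # hypothetical full-column-rank A
c = rng.standard_normal(6)                 # generally inconsistent right-hand side

x_ne = np.linalg.solve(A.T @ A, A.T @ c)   # solves the normal equations A'Ax = A'c
AL = np.linalg.pinv(A)                     # one particular least squares inverse
print(np.allclose(A @ x_ne, A @ AL @ c))   # True: Ax equals AA^L c, as in Theorem 6.13
```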
XB − b + X(Im − B − B)u = y,
or
X(Im − B − B)u = y − XB − b. (2)
Now using Theorem 6.14, the general least squares solution for u in (2) is given by
where w is an arbitrary m × 1 vector. We then get the general restricted least squares solution for β
by substituting uw for u in (1), and this leads to
Chapter 7
(c) Writing
$$A^{-1} = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},$$
and using Theorem 7.1, we get
$$B_{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1} = \{(a - bc/d)I_m\}^{-1} = \frac{d}{ad - bc}I_m,$$
$$B_{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1} = \{(d - bc/a)I_m\}^{-1} = \frac{a}{ad - bc}I_m,$$
$$B_{12} = -A_{11}^{-1}A_{12}B_{22} = -\frac{b}{ad - bc}I_m,$$
$$B_{21} = -A_{22}^{-1}A_{21}B_{11} = -\frac{c}{ad - bc}I_m.$$
That is,
$$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} dI_m & -bI_m \\ -cI_m & aI_m \end{pmatrix}.$$
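Since A here is just [[a, b], [c, d]] ⊗ I_m, the formula is easy to verify numerically; the values of a, b, c, d, and m below are arbitrary choices for illustration:

```python
import numpy as np

a, b, c, d, m = 2.0, -1.0, 3.0, 5.0, 4
I = np.eye(m)
A = np.block([[a * I, b * I], [c * I, d * I]])          # A = [[a,b],[c,d]] kron I_m

A_inv = np.block([[d * I, -b * I], [-c * I, a * I]]) / (a * d - b * c)
print(np.allclose(A @ A_inv, np.eye(2 * m)))            # True
```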
so A is nonsingular if a ≠ 0, b ≠ 0, and ab ≠ cdm². Letting g = ab − cdm² and using Theorem 7.1 and Problem 3.29, we find that
$$B_{11} = (aI_m - (cdm/b)1_m1_m')^{-1} = a^{-1}I_m + \frac{cdm}{ag}1_m1_m',$$
$$B_{22} = (bI_m - (cdm/a)1_m1_m')^{-1} = b^{-1}I_m + \frac{cdm}{bg}1_m1_m',$$
$$B_{12} = -(aI_m)^{-1}(c1_m1_m')\left(b^{-1}I_m + \frac{cdm}{bg}1_m1_m'\right) = -\frac{c}{g}1_m1_m',$$
$$B_{21} = -(bI_m)^{-1}(d1_m1_m')\left(a^{-1}I_m + \frac{cdm}{ag}1_m1_m'\right) = -\frac{d}{g}1_m1_m'.$$
That is,
$$A^{-1} = \begin{pmatrix} a^{-1}I_m + \frac{cdm}{ag}1_m1_m' & -\frac{c}{g}1_m1_m' \\ -\frac{d}{g}1_m1_m' & b^{-1}I_m + \frac{cdm}{bg}1_m1_m' \end{pmatrix}.$$
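This block formula can likewise be spot-checked numerically; the scalars below are arbitrary, chosen only so that g = ab − cdm² ≠ 0:

```python
import numpy as np

a, b, c, d, m = 3.0, 2.0, 0.5, -1.0, 4
J = np.ones((m, m))                       # 1_m 1_m'
I = np.eye(m)
g = a * b - c * d * m**2

A = np.block([[a * I, c * J], [d * J, b * I]])
A_inv = np.block([[I / a + (c * d * m / (a * g)) * J, -(c / g) * J],
                  [-(d / g) * J,                      I / b + (c * d * m / (b * g)) * J]])
print(np.allclose(A @ A_inv, np.eye(2 * m)))   # True
```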
we get
$$|A| = |A_{11}||A_{22} - A_{21}A_{11}^{-1}A_{12}| = 24\begin{vmatrix} 1 & 3/2 \\ 5/12 & 5/6 \end{vmatrix} = 5.$$
Applying Theorem 7.1, we find
$$B_{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1} = \frac{1}{5}\begin{pmatrix} 20 & -36 \\ -10 & 24 \end{pmatrix},$$
$$B_{11} = A_{11}^{-1} + A_{11}^{-1}A_{12}B_{22}A_{21}A_{11}^{-1} = \frac{1}{5}\begin{pmatrix} 2 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix},$$
$$B_{12} = -A_{11}^{-1}A_{12}B_{22} = \frac{1}{5}\begin{pmatrix} 0 & -3 \\ 0 & -4 \\ -5 & 0 \end{pmatrix},$$
$$B_{21} = -B_{22}A_{21}A_{11}^{-1} = \frac{1}{5}\begin{pmatrix} 9 & 12 & -10 \\ -6 & -8 & 5 \end{pmatrix}.$$
for some matrices X and Y. This requires X = A_{21}A_{11}^{-1} and Y = A_{11}^{-1}A_{12}, and so
Here
$$G^+ = G'(GG')^{-1} = \begin{pmatrix} I_{m_1} \\ A_{12}'(A_{11}')^{-1} \end{pmatrix}\{I_{m_1} + A_{11}^{-1}A_{12}A_{12}'(A_{11}')^{-1}\}^{-1} = \begin{pmatrix} A_{11}' \\ A_{12}' \end{pmatrix}(A_{11}A_{11}' + A_{12}A_{12}')^{-1}A_{11},$$
and
7.9 From Theorem 7.1, B_{11}^{-1} = A_{11} − A_{12}A_{22}^{-1}A_{12}', and so
$$C = \begin{pmatrix} A_{11} - B_{11}^{-1} & A_{12} \\ A_{12}' & A_{22} \end{pmatrix} = \begin{pmatrix} A_{12}A_{22}^{-1}A_{12}' & A_{12} \\ A_{12}' & A_{22} \end{pmatrix} = \begin{pmatrix} A_{12}A_{22}^{-1} \\ I_{m_2} \end{pmatrix}A_{22}\begin{pmatrix} A_{22}^{-1}A_{12}' & I_{m_2} \end{pmatrix}$$
is nonnegative definite since A_{22} is positive definite. Due to the identity above, rank(C) ≤ rank(A_{22}) = m_2 = m − m_1, while it follows that rank(C) ≥ rank(A_{22}) since A_{22} is a submatrix of C. This confirms that C is positive semidefinite with rank m − m_1.
7.11 (a) Applying Theorem 7.4, we have
only if a0 A−1
11 a = 0, and this is equivalent to a = 0.
(b) We can prove by induction. The result holds for m = 2 since in this case we can apply the result
from part (a). Now assume the result holds for (m − 1) × (m − 1) positive definite matrices. That
is, if A11 is (m − 1) × (m − 1) and positive definite, then
with equality if and only if A_{11} is diagonal. Applying the result of part (a) to the matrix
$$A = \begin{pmatrix} A_{11} & a_{12} \\ a_{12}' & a_{mm} \end{pmatrix},$$
we have |A| ≤ a_{mm}|A_{11}|, with equality if and only if a_{12} = 0. Combining these two results yields the desired result.
7.13 The result follows by computing the determinant of both sides of the identity
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} + BA_{11} & A_{22} + BA_{12} \end{pmatrix} = \begin{pmatrix} I_{m_1} & (0) \\ B & I_{m_2} \end{pmatrix}\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$
Thus,
$$|A_{11}||A_{21}A_{12} - A_{11}A_{22}| = |-A_{11}||A|,$$
7.17 If B has rank 1, it can be expressed as B = −dc' for some m × 1 vectors c and d. Using Theorem 7.4(a), we have
$$\begin{vmatrix} A & d \\ c' & 1 \end{vmatrix} = |A - dc'| = |A + B|,$$
while Theorem 7.4(b) yields
$$\begin{vmatrix} A & d \\ c' & 1 \end{vmatrix} = |A|(1 - c'A^{-1}d) = |A|\{1 + \operatorname{tr}(A^{-1}B)\}.$$
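A quick numerical check of |A + B| = |A|{1 + tr(A⁻¹B)} for a rank-one B (the random A, c, and d below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
A = rng.standard_normal((m, m)) + m * np.eye(m)   # comfortably nonsingular
c = rng.standard_normal(m)
d = rng.standard_normal(m)
B = -np.outer(d, c)                               # rank-one perturbation B = -dc'

lhs = np.linalg.det(A + B)
rhs = np.linalg.det(A) * (1 + np.trace(np.linalg.solve(A, B)))
print(np.isclose(lhs, rhs))                       # True
```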
7.19 The proof utilizes the results given in Theorem 5.25. If R(A_{12}) ⊂ R(A_{11}), then A_{11}A_{11}^-A_{12} = A_{12}, and so
$$A = \begin{pmatrix} A_{11} & (0) \\ A_{21} & I_{m_2} \end{pmatrix}\begin{pmatrix} I_{m_1} & A_{11}^-A_{12} \\ (0) & A_{22} - A_{21}A_{11}^-A_{12} \end{pmatrix}. \qquad (1)$$
Taking the determinant of both sides of (1) and (2) yields the desired result.
7.21 Consider
$$A = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix},$$
and partition as in (7.1) with m_1 = 1 and m_2 = 3. The conditions in Theorem 7.8(a) do not hold since the column spaces of both A_{21} and A_{12}' are spanned by (0, 1, 1)', while the column space of A_{22} is spanned by (1, 1, 1)'. However, the identity holds since both |A| and |A_{22}| are zero.
$$A_{12}A_{22}^-A_{21} = CA_{22}A_{22}^-A_{22}B = CA_{22}B.$$
Consider
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},$$
and partition A as in (7.1) with m_1 = 1 and m_2 = 2. Note that R(A_{21}) ⊂ R(A_{22}) holds, but R(A_{12}') ⊂ R(A_{22}') does not. The matrices D = diag(1, 0) and E = diag(0, 1) are both generalized inverses of the matrix A_{22} = 1_21_2', and A_{11} − A_{12}DA_{21} = 1, while A_{11} − A_{12}EA_{21} = 0.
7.25 Write the expression given for A− as A− = X + Y . Note that since A11 G = (0), we have
A11 A−
11 + CC −
F (0)
AX = ,
(Im2 − DD )(A21 A−
− −
11 + BC F ) DD
−
Also
−C h i
AY = E BC − F + A21 A−
11 −Im2 ,
−(Im2 − DD− )B
and since CE = (0), ED = (0), and (Im2 − DD− )BEB(Im2 − C − C) = (Im2 − DD− )B(Im2 − C − C)
(0) (0)
AY A = . (2)
(0) (Im2 − DD− )B(Im2 − C − C)
7.27 With
$$C = \begin{pmatrix} B_1^- & -B_1^-A_{12}A_{22}^- \\ -A_{22}^-A_{21}B_1^- & A_{22}^- + A_{22}^-A_{21}B_1^-A_{12}A_{22}^- \end{pmatrix},$$
we need to verify that ACA = A if and only if the stated conditions hold. Partitioning ACA and A into 2 × 2 forms and equating the (1, 1)th submatrices yields an identity equivalent to the identity A_{11} = A_{11}, which of course holds. Equating the (2, 1)th submatrices yields an identity equivalent to the identity given in (a), while equating the (1, 2)th submatrices yields the identity given in (b). Finally, equating the (2, 2)th submatrices yields an identity equivalent to the final condition in (c).
7.29 We have
$$A_{11}^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix},$$
from which we get B = 0. Consequently,
$$Z = A_{12}'A_{11}^{-1}(I_2 + A_{11}^{-1}A_{12}A_{12}'A_{11}^{-1})^{-1} = \frac{1}{9}\begin{pmatrix} 2 & -2 \end{pmatrix},$$
and
A−1 0 Z 0
B ∼ = [Z 1]
11 = 20 .
0 0
0 1 81
7.31 Note that C is symmetric and idempotent and R(B) = R(B + ) ⊂ R(C). Thus, B + C = B + and
CB + = B + , and so
A012 B + = (0) and B + A12 = (0). (1)
and
A+0 0 +
12 A12 + B A11 C (0)
A+ A = . (3)
A+ + +
12 A11 C − A12 A11 B A11 C A+
12 A12
which is symmetric. Finally using (1), (4) and the fact that BB + A11 = BB + CA11 = CA11 , we get
AA+ A = A and A+ AA+ = A+ . Thus, we have shown that the expression given for A+ satisfies the
four conditions of the Moore–Penrose inverse.
7.33 In the proof of Theorem 7.18, it was shown that λm1 −j+1 (B1 ) ≥ λm−j+1 (A), so the lower bound clearly
holds. Applying the inequality from Problem 7.32 to A−1 , we get
k
( ) k
X
−2 −2 λ−1
m2 (B2 )
X
{λm−j+1 (A) − λm1 −j+1 (B1 )} ≤ 2 1 + −1 −1 λj (CC 0 ).
j=1
λ m1 −k+1 (B1 ) − λ m 2 (B2 ) j=1
leads to
k
X
{λ2m1 −j+1 (B1 ) − λ2m−j+1 (A)} ≤ 2λ2m−k+1 (A)λ2m1 −k+1 (B1 )
j=1
( ) k
λ−1
m2 (B2 )
X
× 1+ λj (CC 0 )
λ−1 −1
m1 −k+1 (B1 ) − λm2 (B2 ) j=1
( ) k
−1
λm2 (B2 ) X
≤ 4
2λm1 −k+1 (B1 ) 1 + −1 λj (CC 0 ).
λm1 −k+1 (B1 ) − λ−1
m2 (B2 ) j=1
7.35 Note that the lower bound is an immediate consequence of the lower bound given for λh (B1 ) in Problem
7.34. Now let
Q0 (0) Q (0) Q0 A11 Q Q0 A12 C11 C12
C= A = = .
(0) Im2 (0) Im2 A012 Q A22 0
C12 C22
−1 0
Note that C is positive definite and C1 = C11 − C12 C22 C12 = ∆, so that the positive eigenvalues of
C1 and B1 are the same and, in particular, λ1 (C1 ) < λm2 (B2 ). Thus, we can apply Theorem 7.18 to
C. Note that −C1−1 C12 C22
−1
= −∆−1 Q0 A12 A−1
22 = Ĉ, and so
−r+k
m1X −r+k
m1X
{λm1 −j+1 (B1 ) − λm−j+1 (A)} = {λm1 −j+1 (B1 ) − λm−j+1 (A)}
j=1 j=m1 −r+1
k
X
≤ {λr−j+1 (C1 ) − λm2 +r−j+1 (C)}
j=1
k
λ2r−k+1 (C1 ) X
≤ λj (Ĉ Ĉ 0 )
{λ−1 −1
r−k+1 (C1 ) − λm2 (B2 )} j=1
k
λ2r−k+1 (B1 ) X
= λj (Ĉ Ĉ 0 ).
{λ−1 −1
r−k+1 (B1 ) − λm2 (B2 )} j=1
The first equality follows from the fact that λh+m2 (A) = λh (B1 ) = 0 for h = r + 1, . . . , m1 , a conse-
quence of rank(A) = rank(A22 ) + rank(B1 ) = m2 + r, while the first inequality follows from Theorem
3.19.
Chapter 8
8.7 From Theorem 8.1(f), we have
(A ⊗ B)0 = A0 ⊗ B 0 = A ⊗ B,
so A ⊗ B is symmetric.
confirming that A⊗B is orthogonal. Conversely, if A⊗B is orthogonal, we must have A0 A⊗B 0 B = Imn .
This implies that (A'A)_{ij}B'B = (0) when i ≠ j, so A'A must be diagonal. Also, we must have (A'A)_{ii}B'B = I_n for i = 1, . . . , m, implying that B'B is diagonal with each diagonal element equal to (A'A)_{ii}^{-1}. Thus, we have shown that for some d > 0, A'A = dI_m and B'B = d^{-1}I_n, and this yields the result with c = d^{-1/2}.
8.13 Since rank(A) ≤ m < n, there exist n − m linearly independent n × 1 vectors x1 , . . . , xn−m such that
Axi = 0 for each i. If y 1 , . . . , y m is a set of linearly independent m × 1 vectors, then xi ⊗ y j for
i = 1, . . . , n − m and j = 1, . . . , m, form a linearly independent set of vectors and
(A ⊗ B)(xi ⊗ y j ) = (Axi ⊗ By j ) = (0 ⊗ By j ) = 0
for all i and j. This confirms that 0 is an eigenvalue with multiplicity at least (n − m)m.
8.15 Let λ1 , . . . , λm be the eigenvalues of A, and θ1 , . . . , θp be the eigenvalues of B. Since A and B are
positive definite, λi > 0 and θj > 0 for all i and j. Thus, λi θj > 0 for all i and j, and so according to
Theorem 8.5, all of the eigenvalues of A ⊗ B are positive. This confirms that A ⊗ B is positive definite.
$$y' \otimes x = \begin{pmatrix} y_1x & \cdots & y_nx \end{pmatrix} = \begin{pmatrix} x_1y_1 & \cdots & x_1y_n \\ \vdots & & \vdots \\ x_my_1 & \cdots & x_my_n \end{pmatrix},$$
$$x \otimes y' = \begin{pmatrix} x_1y' \\ \vdots \\ x_my' \end{pmatrix} = \begin{pmatrix} x_1y_1 & \cdots & x_1y_n \\ \vdots & & \vdots \\ x_my_1 & \cdots & x_my_n \end{pmatrix}.$$
$$X = (1_a \otimes 1_b \otimes 1_n,\ I_a \otimes 1_b \otimes 1_n,\ 1_a \otimes I_b \otimes 1_n),$$
and
$$X'X = \begin{pmatrix} abn & bn1_a' & an1_b' \\ bn1_a & bnI_a & n(1_a \otimes 1_b') \\ an1_b & n(1_a' \otimes 1_b) & anI_b \end{pmatrix}.$$
Using the generalized inverse
$$(X'X)^- = \operatorname{diag}\{(abn)^{-1},\ (bn)^{-1}(I_a - a^{-1}1_a1_a'),\ (an)^{-1}(I_b - b^{-1}1_b1_b')\}$$
where ȳ·· , ȳi· , and ȳ·j are as defined in Example 8.3. The fitted value for yijk is µ̂ + τ̂i + γ̂j =
ȳi· + ȳ·j − ȳ·· , and so the sum of squared errors is given by
a X
X b X
n
SSE = (yijk − ȳi· − ȳ·j + ȳ·· )2 .
i=1 j=1 k=1
(b) The reduced model is a one-way classification model with b treatments and an observations for each treatment. Using Example 8.2, we find that the sum of squared errors for this reduced model is given by
$$SSE_1 = \sum_{i=1}^a\sum_{j=1}^b\sum_{k=1}^n (y_{ijk} - \bar y_{\cdot j})^2 = \sum_{ijk} y_{ijk}^2 - (an)^{-1}\sum_j\left(\sum_{ik} y_{ijk}\right)^2.$$
we find that
$$SSA = SSE_1 - SSE = (bn)^{-1}\sum_i\left(\sum_{jk} y_{ijk}\right)^2 - (abn)^{-1}\left(\sum_{ijk} y_{ijk}\right)^2 = nb\sum_{i=1}^a(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2.$$
(c) The reduced model is a one-way classification model with a treatments and bn observations for each treatment. Again using Example 8.2, we find that the sum of squared errors for this reduced model is given by
$$SSE_2 = \sum_{i=1}^a\sum_{j=1}^b\sum_{k=1}^n (y_{ijk} - \bar y_{i\cdot})^2 = \sum_{ijk} y_{ijk}^2 - (bn)^{-1}\sum_i\left(\sum_{jk} y_{ijk}\right)^2.$$
Thus, we have
$$SSB = SSE_2 - SSE = (an)^{-1}\sum_j\left(\sum_{ik} y_{ijk}\right)^2 - (abn)^{-1}\left(\sum_{ijk} y_{ijk}\right)^2 = na\sum_{j=1}^b(\bar y_{\cdot j} - \bar y_{\cdot\cdot})^2.$$
(d) For each i, µ + τi is estimable and for each j, µ + γj is estimable with the corresponding estimates
given by ȳi· and ȳ·j , respectively.
(e) Note that the SSE given in Problem 8.18 can be expressed as
$$\sum_{ijk} y_{ijk}^2 - n^{-1}\sum_{ij}\left(\sum_k y_{ijk}\right)^2.$$
Subtracting this from the SSE given in (1), we get
$$SSAB = n^{-1}\sum_{ij}\left(\sum_k y_{ijk}\right)^2 - (bn)^{-1}\sum_i\left(\sum_{jk} y_{ijk}\right)^2 - (an)^{-1}\sum_j\left(\sum_{ik} y_{ijk}\right)^2 + (abn)^{-1}\left(\sum_{ijk} y_{ijk}\right)^2$$
$$= n\sum_{ij}\bar y_{ij}^2 - nb\sum_i\bar y_{i\cdot}^2 - na\sum_j\bar y_{\cdot j}^2 + nab\,\bar y_{\cdot\cdot}^2 = n\sum_{i=1}^a\sum_{j=1}^b(\bar y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot})^2.$$
(b) Similarly
= diag(A1 A3 , A2 A4 ) = A1 A3 ⊕ A2 A4 .
(c) Finally,
(A1 ⊕ A2 ) ⊗ A3 = diag(A1 , A2 ) ⊗ A3
8.23 Solving the system of equations AXB = C in Theorem 6.5 is equivalent to solving the system of
equations
(B 0 ⊗ A) vec(X) = vec(C).
8.25 (a) The result follows from Theorem 8.12 when p = n, q = m, and D = Im .
= {vec(C)}0 (B ⊗ A) vec(C).
and
vec(AB) = vec(AIn B) = (B 0 ⊗ A) vec(In ).
We have equality if and only if one of the vectors, vec(A) or vec(B), is a scalar multiple of the other,
or in other words, one of the matrices, A or B, is a scalar multiple of the other.
8.33 Since I_m = Σ_{i=1}^m e_ie_i', we have
$$\operatorname{vec}(I_m) = \operatorname{vec}\left(\sum_{i=1}^m e_ie_i'\right) = \sum_{i=1}^m \operatorname{vec}(e_ie_i') = \sum_{i=1}^m e_i \otimes e_i,$$
(b) The characteristic equation of A is |A − λI_2| = λ² − 5λ = 0. Thus, the eigenvalues of A are 0 and 5, so it is positive semidefinite. The characteristic equation of B is |B − λI_2| = λ² − 7λ + 11 = 0. Thus, the eigenvalues of B are ½(7 + √5) and ½(7 − √5). Since these are both positive, B is positive definite. The characteristic equation of A ⊙ B is |A ⊙ B − λI_2| = λ² − 16λ + 44 = 0. Thus, the eigenvalues of A ⊙ B are 8 ± √20, so it is also positive definite. The fact that A ⊙ B is positive definite follows from Theorem 8.17.
and
n
X n
X
0 0
tr{A (B C)} = {A (B C)}ii = (A0 )i· (B C)·i
i=1 i=1
Xn Xn Xm
= (A)0·i (B C)·i = aji bji cji .
i=1 i=1 j=1
(b) This follows directly from part (a) since
λ2 (A B) ≥ λ2 (AB) = 3.
The bound from Theorem 8.23 is closer since it gives the actual value.
(b) Since
2 0
A B= ,
0 3
clearly λ2 (A B) = 2. The lower bound from Theorem 8.21 is
It is easily shown that AB = B has eigenvalues 1 and 4, so Theorem 8.23 yields the bound
λ2 (A B) ≥ λ2 (AB) = 1.
The bound from Theorem 8.21 is closer since it gives the actual value.
while
where RAi and RBi denote the matrices obtained by deleting the first i rows and columns of
RA and RB . Note that we need to show that l1 ≥ 0. It follows from Theorem 8.19 that C is
nonnegative definite and, hence, by Theorem 8.16, RA C is nonnegative definite. An application
of Theorem 8.20 yields
The last step was obtained by expanding along the first row. This last inequality can be equiva-
lently expressed as
|RA RB | − |RA1 RB1 |/ρ ≥ |RA | − |RA |/ρ.
Using this and the fact that |RB | = |RB1 |/ρ, we have
Note that ρ−1 − |RB | = ρ−1 (1 − |RB1 |) ≥ 0 since RB1 is a correlation matrix, and so we
must have |RB1 | ≤ 1. In addition, |RA1 | − |RA | = |RA1 |(1 − |RA |/|RA1 |) ≥ 0 follows from
Theorem 8.19. Thus, we have established that l1 ≥ l2 |RB |/|RB1 |. In a similar fashion, we
get li ≥ li+1 |RBi−1 |/|RBi | for i = 2, . . . , m − 1. These inequalities lead to the inequality
l1 ≥ lm |RB |/|RBm−1 |. The result then follow since lm = 1 + 1 − 1 − 1 = 0.
8.45 It follows from Theorem 8.16 that A ⊙ B is nonnegative definite, and since B has r positive diagonal elements, so does A ⊙ B. If the ith diagonal element of A ⊙ B is 0, then (A ⊙ B)e_i = 0 (see Problem 3.51), and so we must have rank(A ⊙ B) ≤ r. Now apply the formula |A ⊙ B| ≥ |A|∏ b_{ii} of Theorem 8.20 to the submatrix of A ⊙ B obtained by deleting the rows and columns corresponding to the 0 diagonal elements. This shows that A ⊙ B has an r × r nonsingular submatrix and so rank(A ⊙ B) ≥ r. Combining these two inequalities for rank(A ⊙ B) yields rank(A ⊙ B) = r.
8.47 If R ≠ I_m, then we must have λ < 1, so the matrix R − λI_m has all of its diagonal elements positive. It then follows from Theorem 8.17 that R ⊙ (R − λI_m) is positive definite. Consequently, if x is a normalized eigenvector of R ⊙ R corresponding to the eigenvalue τ, then
$$0 < x'\{R \odot (R - λI_m)\}x = x'(R \odot R)x - λx'(R \odot I_m)x = τ - λx'I_mx = τ - λ,$$
8.49 With C = P (A ⊗ B)P 0 and D = C −1 partitioned into the 2 × 2 form of (7.1), we have from Problem
−1
7.10, that D11 − C11 is nonnegative definite. Since C11 = A B and
so that D11 = A−1 B −1 , part (a) follows. Parts (b) and (c) are special cases of part (a) in which
B = A and B = A−1 , respectively.
8.51
$$K_{22} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad K_{24} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$
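These matrices are easy to generate and test against the defining property K_{mn} vec(A) = vec(A'); the construction below is a small sketch of that check:

```python
import numpy as np

def commutation(m, n):
    """K_{mn}, the mn x mn matrix with K vec(A) = vec(A') for any m x n matrix A."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # vec stacks columns: a_{ij} sits at position j*m + i in vec(A)
            # and at position i*n + j in vec(A').
            K[i * n + j, j * m + i] = 1.0
    return K

K22, K24 = commutation(2, 2), commutation(2, 4)
A = np.arange(8.0).reshape(2, 4)
print(np.allclose(K24 @ A.flatten(order="F"), A.T.flatten(order="F")))  # True
```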
8.53 Using Theorem 8.1(g), we have
$$K_{mn} = \sum_{i=1}^m\sum_{j=1}^n e_{im}e_{jn}' \otimes e_{jn}e_{im}' = \sum_{i=1}^m\sum_{j=1}^n e_{im} \otimes e_{jn}' \otimes e_{jn} \otimes e_{im}' = \sum_{i=1}^m e_{im} \otimes \left(\sum_{j=1}^n e_{jn}e_{jn}'\right) \otimes e_{im}' = \sum_{i=1}^m (e_{im} \otimes I_n \otimes e_{im}').$$
Consequently,
$$K_{mn}'(x \otimes A \otimes y') = \sum_{i=1}^m (e_{im}'x \otimes A \otimes e_{im}y') = \sum_{i=1}^m x_i(A \otimes e_{im}y') = A \otimes \left(\sum_{i=1}^m x_ie_{im}\right)y' = A \otimes xy'.$$
(d) Since P is symmetric and K_{mn}'K_{mn} = I_{mn}, we find that
$$P^2 = P'P = (A \otimes A')K_{mn}'K_{mn}(A' \otimes A) = (A \otimes A')(A' \otimes A) = (AA' \otimes A'A).$$
(e) It follows from part (d) and Theorem 8.5 that the r² nonzero eigenvalues of P² are λ_iλ_j, i = 1, . . . , r; j = 1, . . . , r. We then get the nonzero eigenvalues of P, except for sign, by taking the square roots of these eigenvalues. However,
$$\operatorname{tr}(P) = \operatorname{tr}(A'A) = \sum_{i=1}^r λ_i,$$
and so the nonzero eigenvalues of P must be λ_i, i = 1, . . . , r, and ±(λ_iλ_j)^{1/2} for all i < j.
8.57 Using Problem 8.56, we have
and
vec(A0 ⊗ B 0 ) = (Kmp,n ⊗ Iq ){vec(A) ⊗ vec(B 0 )}.
0
Now applying Theorem 8.10 and the fact that Knq,n Kmp,n = Imnp , we get
1 1 1
(B ⊗ B)Nm = (B ⊗ B)(Im2 + Kmm ) = B ⊗ B + (B ⊗ B)Kmm
2 2 2
1 1 1
= B ⊗ B + Kmm (B ⊗ B) = (Im2 + Kmm )(B ⊗ B) = Nm (B ⊗ B).
2 2 2
2
Further, using this and the fact that Nm = Nm ,
Nm (B ⊗ B)Nm = Nm Nm (B ⊗ B) = Nm (B ⊗ B).
(b) For any m × 1 vectors a and b,
$$N_m(a \otimes b) = \tfrac12(I_{m^2} + K_{mm})(a \otimes b) = \tfrac12\{a \otimes b + K_{mm}(a \otimes b)\} = \tfrac12(a \otimes b + b \otimes a).$$
1
(a ⊗ b ⊗ c + a ⊗ c ⊗ b + b ⊗ a ⊗ c + b ⊗ c ⊗ a + c ⊗ a ⊗ b + c ⊗ b ⊗ a)
6
m m m
1 XXX
= (eh ah ⊗ ei bi ⊗ ej cj + eh ah ⊗ ei ci ⊗ ej bj
6 i=1 j=1
h=1
+ eh bh ⊗ ei ai ⊗ ej cj + eh bh ⊗ ei ci ⊗ ej aj
+ eh ch ⊗ ei ai ⊗ ej bj + eh ch ⊗ ei bi ⊗ ej aj
m m m
1 XXX
= (eh e0h a ⊗ ei e0i b ⊗ ej e0j c + eh e0h a ⊗ ei e0j b ⊗ ej e0i c
6 i=1 j=1
h=1
= ∆(a ⊗ b ⊗ c).
and so
X X
vec(A) = vec(Tij )aij = vec(Tij )u0ij v(A)
i≥j i≥j
8.65 (a) First suppose B is an m × m symmetric matrix and x0 Bx = 0 for all m × 1 vectors x. Then, in
particular,
e0i Bei = 0
for i = 1, . . . , m, and
(ei + ej )0 B(ei + ej ) = 0
for all i ≠ j. Together these two identities imply that b_{ii} = 0 for all i and b_{ij} = 0 for all i ≠ j, and so we must have B = (0). Now for an arbitrary symmetric m × m matrix A,
$$\{v(A)\}'D_m'D_mv(A) = \{\operatorname{vec}(A)\}'\operatorname{vec}(A) = \sum_{i=1}^m\sum_{j=1}^m a_{ij}^2 = 2\sum_{i\geq j} a_{ij}^2 - \sum_{i=1}^m a_{ii}^2 = 2\{v(A)\}'v(A) - \{v(A)\}'\sum_{i=1}^m u_{ii}u_{ii}'v(A) = \{v(A)\}'\left(2I_{m(m+1)/2} - \sum_{i=1}^m u_{ii}u_{ii}'\right)v(A).$$
Thus, we have shown that y'By = 0 for all m(m + 1)/2 × 1 vectors y when B = {D_m'D_m − (2I_{m(m+1)/2} − Σ_{i=1}^m u_{ii}u_{ii}')} and, since B is symmetric, the result follows.
(b) From part (a), we see that D_m'D_m is a diagonal matrix with m diagonal elements equal to 1 and m(m − 1)/2 diagonal elements equal to 2. Since the determinant of a diagonal matrix is the
product of its diagonal elements, we have |D_m'D_m| = 2^{m(m−1)/2}.
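The structure of D_m'D_m can be confirmed numerically with a small sketch of the duplication matrix (here v(A) stacks the columns of the lower triangle of A, including the diagonal):

```python
import numpy as np

def duplication(m):
    """D_m with D_m v(A) = vec(A) for symmetric A."""
    cols = [(i, j) for j in range(m) for i in range(j, m)]   # pairs with i >= j, column by column
    D = np.zeros((m * m, len(cols)))
    for k, (i, j) in enumerate(cols):
        D[j * m + i, k] = 1.0        # position of a_{ij} in vec(A)
        D[i * m + j, k] = 1.0        # position of a_{ji}; same entry when i == j
    return D

m = 4
D = duplication(m)
DtD = D.T @ D
print(np.allclose(DtD, np.diag(np.diag(DtD))))        # True: D_m'D_m is diagonal
print(np.linalg.det(DtD), 2 ** (m * (m - 1) / 2))     # both 64.0 when m = 4
```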
+
8.67 (a) From Theorem 8.33(c), Dm Dm = Nm , so
+
Dm Dm (A ⊗ A)Dm = Nm (A ⊗ A)Dm = (A ⊗ A)Nm Dm = (A ⊗ A)Dm .
(b) We prove the result by induction. The result holds for i = 2 since
+
{Dm (A ⊗ A)Dm }2 +
= Dm +
(A ⊗ A){Dm Dm (A ⊗ A)Dm }
+ +
= Dm (A ⊗ A)(A ⊗ A)Dm = Dm (A2 ⊗ A2 )Dm .
8.69 Since A is lower triangular, A = i≥j aij Eij . Thus, using the fact that aij = u0ij v(A), we get
P
X X X
vec(A) = vec aij Eij = vec(Eij )aij = {vec(Eij )}u0ij v(A).
i≥j i≥j i≥j
8.71 Note that if A is strictly lower triangular, then vec(A)0 vec(A) = ṽ(A)0 ṽ(A), and so
0 = vec(A)0 vec(A) − ṽ(A)0 ṽ(A) = ṽ(A)0 L̃m L̃0m ṽ(A) − ṽ(A)0 ṽ(A) = ṽ(A)0 (L̃m L̃0m − Im(m−1)/2 )ṽ(A).
Since any x ∈ Rm(m−1)/2 can be expressed as x = ṽ(A) for some strictly lower triangular matrix A,
this implies that L̃m L̃0m − Im(m−1)/2 = (0) (see the solution to Problem 8.65(a)), which proves (b).
But (b) implies that L̃m is full row rank, which then yields (a) and (c) since
0 0 −1
L̃+
m = L̃m (L̃m L̃m ) = L̃0m Im(m−1)/2 = L̃0m .
To prove (d), for arbitrary A, write A = AL + AU , where AL is strictly lower triangular and AU is
upper triangular. Then
0 = vec(AL )0 vec(AU ) = ṽ(AL )0 L̃m vec(AU ),
Lm vec(A0 ) = v(A0 ) = 0,
and
L̃m vec(A0 ) = ṽ(A0 ) = 0.
Thus,
ṽ(A)0 L̃m Kmm L̃0m = vec(A0 )0 L̃0m = 00 ,
and
ṽ(A)0 L̃m Kmm L0m = vec(A0 )0 L0m = 00 ,
which proves (a) and (b). We get (d) and (e) since for any strictly lower triangular matrix A,
L0m Lm L̃0m ṽ(A) = L0m Lm vec(A) = L0m v(A) = vec(A) = L̃0m ṽ(A),
and
To get (c), postmultiply the transpose of (d) by Dm and then use Theorem 8.37(a). Finally, (f) follows
from (d) and Theorem 8.38(b).
8.75 First suppose that A is an m × m nonsingular positive matrix and let B = A−1 . For i 6= j,
m
X
(BA)ij = bik akj = 0. (1)
k=1
Since akj > 0 for all k, (1) holds only if bik < 0 for at least one k or bik = 0 for all k. However, this
second option is not possible since B is nonsingular, and so B is not nonnegative. Now suppose both A
and B are nonnegative and assume that the jth column of A has more than one nonzero component. If
ahj > 0 and alj > 0, then for (1) to hold, we must have bih = bil = 0 for all i 6= j. But this implies that
the hth and lth columns of B are both scalar multiples of ej . This cannot be since B is nonsingular,
and so each column of A has one nonzero component.
(b) An eigenvector of B corresponding to the eigenvalue ρ(B) = 2 is of the form (0, x2 ), which is not
positive.
However, since A is nonnegative, ρ(A) is an eigenvalue of A and, hence, 1 + ρ(A) is an eigenvalue
of Im + A. This confirms that ρ(Im + A) = 1 + ρ(A).
(b) If λ1 , . . . , λm are the eigenvalues of A, λk1 , . . . , λkm are the eigenvalues of Ak . Since A is nonnegative,
ρ(A) is an eigenvalue of A and if it is a multiple eigenvalue of A, then ρ(A)k = ρ(Ak ) would be a
multiple eigenvalue of Ak . However, from Theorem 8.45, we know this is not possible since Ak is
positive.
(c) It follows from part (a) that if ρ(A) is a multiple eigenvalue of A, then 1 + ρ(A) = ρ(Im + A) is
a multiple eigenvalue of Im + A. But A is an irreducible nonnegative matrix and so by Theorem
8.46, (Im + A)m−1 is positive, and then according to part (b), 1 + ρ(A) is a simple eigenvalue of
Im + A. Thus, ρ(A) must be a simple eigenvalue of A.
8.81 Since A is nonnegative and nonnegative definite, we know that aij ≥ 0 for all i and j, and a11 −a212 /a22 ≥
0 if a22 > 0 (see Problem 7.8). We will find B of the form
a b
B= ,
0 c
so that
a2 + b2 bc a11 a12
BB 0 = = . (1)
bc c2 a12 a22
If a22 = 0, a12 = 0 since A is nonnegative definite (Problem 3.51), and so BB 0 = A with
1/2
a11 0
B= .
0 0
1/2 1/2
c = a22 , b = a12 /a22 , a = (a11 − a212 /a22 )1/2 .
This confirms that the conjugate transpose of F is as given in the problem. Now √m(F)_{ik} = θ^{(i−1)(k−1)} and √m(F^*)_{kj} = θ^{−(k−1)(j−1)}, so
$$m(FF^*)_{ij} = m(F)_{i\cdot}(F^*)_{\cdot j} = \sum_{k=1}^m θ^{(i-1)(k-1)}θ^{-(k-1)(j-1)} = \sum_{k=1}^m θ^{(k-1)(i-j)} = \sum_{k=0}^{m-1}\{θ^{i-j}\}^k.$$
$$Π_m^{mn+r} = (Π_m^m)^nΠ_m^r = (I_m)^nΠ_m^r = Π_m^r,$$
if j = 1, and
$$δ_j = \sum_{k=0}^{m-1}(θ^{j-1})^k = \frac{1-(θ^{j-1})^m}{1-θ^{j-1}} = \frac{1-(θ^m)^{j-1}}{1-θ^{j-1}} = \frac{1-1}{1-θ^{j-1}} = 0,$$
if j = 2, . . . , m. That is, it has the eigenvalue m and the eigenvalue 0 repeated m − 1 times.
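A direct numerical check of both computations, taking θ = exp(2πi/m) so that √m F has (i, k) entry θ^{(i−1)(k−1)}:

```python
import numpy as np

m = 6
theta = np.exp(2j * np.pi / m)
i, k = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
F = theta ** (i * k) / np.sqrt(m)                 # sqrt(m)*F has (i,k) entry theta^{(i-1)(k-1)}

print(np.allclose(F @ F.conj().T, np.eye(m)))     # True: F F* = I_m
deltas = [sum(theta ** (j * k) for k in range(m)) for j in range(m)]
print(np.round(deltas, 10))                       # [m, 0, 0, ..., 0]
```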
8.89 The matrix
$$A = B = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
is not a circulant matrix, while AB = I_3 is a circulant matrix.
8.91 Now
(AA−1 )11 = (AA−1 )mm = c − abc = c(1 − ab) = (1 − ab)−1 (1 − ab) = 1,
while
(AA−1 )jj = −abc + (ab + 1)c − abc = c − abc = 1,
for j = 2, . . . , m − 1. Also
(AA−1 )j1 = aj−1 c − aj−1 c = 0,
for j = 2, . . . , m, and
(AA−1 )jm = bm−j c − bm−j c = 0,
for j = 1, . . . , m − 1. Finally,
for j = 2, . . . , m − 1, i = 1, . . . , j − 1, and
for j = 2, . . . , m−1, i = j+1, . . . , m, so that the matrix given is the inverse of A. If ab = 1, then b = 1/a,
(A)·1 = (1, a, a2 , . . . , am−1 )0 , (A)·2 = (1/a, 1, a, . . . , am−2 )0 , and (A)·1 − a(A)·2 = 0, so rank(A) < m.
where Ah is the hth leading principal submatrix of A. Clearly, |A1 | = 1 and |A2 | = 1 − ρ2 , so (1) holds
for h = 1 and h = 2. We show that it holds for general h by induction. Suppose (1) holds for h − 1, so
that |Ah−1 | = (1 − ρ2 )h−2 . The determinant of B = Ah can be computed by expanding along its first
row to get
h
X
|Ah | = |B| = ρi−1 B1i = B11 + ρB12 . (2)
i=1
The last equality follows since when i > 2, the matrix obtained from Ah by deleting its first row and ith
column has its first and second columns being scalar multiples of one another. In addition, note that B11
is identical to |Ah−1 | and, upon using Theorem 1.4(f), we find that B12 = (−1)1+2 ρ|Ah−1 | = −ρ|Ah−1 |.
Thus, (2) becomes
The result follows from Theorem 7.6 since the quantity in (1) is positive for all h.
8.95
1 1 1 1 1 1 1 1 1 1 1 1
−1 1 −1 −1 −1 −1 1 −1
1 1 1 1
−1 −1 1 −1 −1 −1 −1
1 1 1 1 1
1 −1 −1 1 −1 −1 −1 −1
1 1 1 1
−1 1 −1 −1 1 −1 −1 −1
1 1 1 1
−1 −1 1 −1 −1 1 −1 1 −1
1 1 1
H= .
−1 −1 −1 1 −1 −1 1 −1
1 1 1 1
1 −1 −1 −1 1 −1 −1 1 −1
1 1 1
1 −1 −1 −1 1 −1 −1 −1
1 1 1 1
−1 −1 −1 1 −1 −1 1 −1
1 1 1 1
−1 −1 −1 −1 1 −1 −1
1 1 1 1 1
1 1 −1 1 1 1 −1 −1 −1 1 −1 −1
8.97 Since each element of H is 1 or −1, we simply need to verify that HH 0 = 4mI4m . Partitioning HH 0
as H is partitioned, we find that each submatrix along the diagonal is
since BA0 = AB 0 and DC 0 = CD0 . In a similar fashion, using the condition XY 0 = Y X 0 , it can be
shown that the remaining off-diagonal submatrices of HH 0 are null matrices.
8.99 The result follows immediately from Theorem 8.61 since A is nonsingular if and only if |A| 6= 0, and
each term in the product given in (8.28) is nonzero if and only if ai 6= aj for all i 6= j.
is a Toeplitz matrix since it has the form given in (8.23). Similarly, we find that
bm−1 bm bm+1 ··· b2m−2
e0m
bm−2 bm−1 bm ··· b2m−3
0
em−1
0 0 0 AA0 =
P AA = P AA = bm−3 bm−2 bm−1 ··· b2m−4
..
. ..
.. .. ..
. . . .
e01
b0 b1 b2 ··· bm−1
Chapter 9
9.1 (a) With f(x) = log(x), we have f^{(i)}(x) = (−1)^{i−1}(i − 1)!x^{−i}. Consequently, the kth-order Taylor formula is given by
$$\log(1 + u) = \log(1) + \sum_{i=1}^k \frac{u^i(-1)^{i-1}(i-1)!\,1^{-i}}{i!} + r_k(u, 1) = \sum_{i=1}^k \frac{u^i(-1)^{i-1}}{i} + r_k(u, 1).$$
(b) We get
$$\log(1.1) \approx .1 - \frac{(.1)^2}{2} + \frac{(.1)^3}{3} - \frac{(.1)^4}{4} + \frac{(.1)^5}{5} = .0953103.$$
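The value in (b) is reproduced by the short computation below:

```python
import math

u, k = 0.1, 5
approx = sum((-1) ** (i - 1) * u ** i / i for i in range(1, k + 1))
print(approx, math.log(1.1))   # 0.0953103... vs 0.0953101...
```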
9.3 Since
−z2 1
∂ z12 z1
g= ,
∂z 0 z2 z1
and
∂ 2x1 2x2 2x3
f = ,
∂x0 2 −1 −1
we have
2(x21 +x22 +x23 )−2x1 (2x1 −x2 −x3 )
∂ ∂ ∂ (x21 +x22 +x23 )2
y = g f =
∂x0 ∂f 0 ∂x0 2(x21 + x22 + x23 ) + 2x1 (2x1 − x2 − x3 )
−(x21 +x22 +x23 )−2x2 (2x1 −x2 −x3 ) −(x21 +x22 +x23 )−2x3 (2x1 −x2 −x3 )
(x21 +x22 +x23 )2 (x21 +x22 +x23 )2 .
−(x21 + x22 + x23 ) + 2x2 (2x1 − x2 − x3 ) −(x21 + x22 + x23 ) + 2x3 (2x1 − x2 − x3 )
9.5 We have
x0 Bxd(x0 Ax) − x0 Axd(x0 Bx)
df =
(x0 Bx)2
x Bx{(dx)0 Ax + x0 Adx} − x0 Ax{(dx)0 Bx + x0 Bdx}
0
=
(x0 Bx)2
2(x Bxx Adx − x Axx0 Bdx)
0 0 0
= ,
(x0 Bx)2
and so
∂ 2(x0 Bxx0 A − x0 Axx0 B)
0
f= .
∂x (x0 Bx)2
9.7 (a) We have
d tr(AX) = tr(AdX) = vec(A0 )0 vec(dX) = vec(A0 )0 d vec(X),
so
∂
tr(AX) = vec(A0 )0 .
∂ vec(X)0
(b) Similarly, we get
= tr(BXAdX) + tr(AXBdX)
so
∂
tr(AXBX) = vec(A0 X 0 B 0 )0 + vec(B 0 X 0 A0 )0 .
∂ vec(X)0
and so
∂
log |X 0 AX| = 2 vec{AX(X 0 AX)−1 }0 .
d vec(X)0
9.11 Since
$$dX^n = (dX)X^{n-1} + X(dX)X^{n-2} + \cdots + X^{n-1}dX = \sum_{j=1}^n X^{j-1}(dX)X^{n-j},$$
we find that
$$d\operatorname{vec}(X^n) = \operatorname{vec}\left(\sum_{j=1}^n X^{j-1}(dX)X^{n-j}\right) = \sum_{j=1}^n \operatorname{vec}\{X^{j-1}(dX)X^{n-j}\} = \sum_{j=1}^n \{(X^{n-j})' \otimes X^{j-1}\}\,d\operatorname{vec}(X),$$
and so
$$\frac{\partial}{\partial \operatorname{vec}(X)'}\operatorname{vec}(X^n) = \sum_{j=1}^n \{(X^{n-j})' \otimes X^{j-1}\}.$$
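This Jacobian formula can be checked against a finite-difference approximation; the 3 × 3 matrix X and the power n below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
X = rng.standard_normal((m, m))

# Analytic Jacobian: sum_j (X^{n-j})' kron X^{j-1}
J = sum(np.kron(np.linalg.matrix_power(X, n - j).T,
                np.linalg.matrix_power(X, j - 1)) for j in range(1, n + 1))

# Finite-difference Jacobian of vec(X^n) with respect to vec(X)
eps, J_fd = 1e-6, np.zeros((m * m, m * m))
for col in range(m * m):
    E = np.zeros(m * m); E[col] = eps
    Xp = X + E.reshape(m, m, order="F")
    J_fd[:, col] = (np.linalg.matrix_power(Xp, n)
                    - np.linalg.matrix_power(X, n)).flatten(order="F") / eps

print(np.max(np.abs(J - J_fd)) < 1e-3)   # True, up to finite-difference error
```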
and
and so
∂
vec(X# ) = |X|{vec(X −1 ) vec(X −10 )0 − (X −10 ⊗ X −1 )}.
∂ vec(X)0
9.15 (a) Since dF = d(AXA0 ) = A(dX)A0 and
we get
∂
vec(F ) = (A ⊗ A)Dm .
∂ v(X)0
(b) Since dF = d(XBX) = (dX)BX + XB(dX) and
we find that
∂
vec(F ) = {(XB ⊗ Im ) + (Im ⊗ XB)}Dm .
∂ v(X)0
9.17 (a) Using Theorem 8.27, we get
It then follows that
∂
vec(F ) = (In ⊗ Knm ⊗ Im ){Imn ⊗ vec(X) + vec(X) ⊗ Imn }.
∂ vec(X)0
= 2Dvec(X) d vec(X).
9.19 We prove the result by induction. The result holds when n = 1 due to Theorem 9.2. Suppose the
result holds for n − 1. The result then holds for n since
9.21 Using the perturbation formula for a matrix inverse given in Section 9.6, we have
Solutions for the B_i's can be obtained by equating (1) and (2). First we must have −Y = 2B_1, so B_1 = −½Y. Next, we have
$$Y^2 = 2B_2 + B_1^2 = 2B_2 + \tfrac14Y^2,$$
and so
$$B_2 = \tfrac12 \cdot \tfrac34Y^2 = \tfrac38Y^2.$$
Then we have
$$-Y^3 = 2B_3 + B_1B_2 + B_2B_1 = 2B_3 - \tfrac{3}{16}Y^3 - \tfrac{3}{16}Y^3,$$
and this leads to
$$B_3 = -\tfrac12 \cdot \tfrac{10}{16}Y^3 = -\tfrac{5}{16}Y^3.$$
Finally, we have
$$Y^4 = 2B_4 + B_1B_3 + B_3B_1 + B_2^2 = 2B_4 + \tfrac{5}{32}Y^4 + \tfrac{5}{32}Y^4 + \tfrac{9}{64}Y^4,$$
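A numerical check of the resulting expansion (I + Y)^{−1/2} ≈ I − ½Y + ⅜Y² − (5/16)Y³ for a small symmetric Y (the Y below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 4
Y = rng.standard_normal((m, m)); Y = 0.05 * (Y + Y.T)   # small symmetric perturbation

w, V = np.linalg.eigh(np.eye(m) + Y)                    # I + Y is symmetric positive definite here
exact = V @ np.diag(w ** -0.5) @ V.T                    # (I + Y)^(-1/2)
series = np.eye(m) - Y / 2 + 3 * (Y @ Y) / 8 - 5 * np.linalg.matrix_power(Y, 3) / 16
print(np.max(np.abs(exact - series)))                   # of order ||Y||^4, i.e., very small
```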
9.23 Note that D_S can be written as D_S = I_m + D_A. Then using the result from Problem 9.21, we have
$$R = D_S^{-1/2}SD_S^{-1/2} = \left(I_m - \tfrac12D_A + \tfrac38D_A^2 - \tfrac{5}{16}D_A^3 + \cdots\right)(\Omega + A)\left(I_m - \tfrac12D_A + \tfrac38D_A^2 - \tfrac{5}{16}D_A^3 + \cdots\right)$$
$$= \Omega + A - \tfrac12(\Omega D_A + D_A\Omega) + \tfrac38(D_A^2\Omega + \Omega D_A^2) + \tfrac14D_A\Omega D_A - \tfrac12(AD_A + D_AA) + \tfrac38(D_A^2A + AD_A^2) + \tfrac14D_AAD_A - \tfrac{3}{16}(D_A^2\Omega D_A + D_A\Omega D_A^2) - \tfrac{5}{16}(D_A^3\Omega + \Omega D_A^3) = \Omega + C_1 + C_2 + C_3.$$
involving first order terms. Premultiplying (1) by e0l yields
where y is an arbitrary scalar. Note that γ 0l γ l = 1 implies that e0l b1 = 0, and so this confirms
that bl1 = 0. Premultiplying (2) by ei when i 6= l yields
uil − xl vil
bi1 = e0i (X − xl Im )+ (xl V − U )el = −
xi − xl
since c = 1.
(c) γ 0l (Im + V )γ l = 1 and c = 1 leads to the equation vll + 2bl1 = 0, and so bl1 = −vll /2. The formula
for bil is obtained as in part (b).
1/2 1/2
(d) γ 0l γ l = λl and c = xl leads to the equation 2xl bl1 = a1 , and so
a1 ull − xl vll
bl1 = 1/2
= 1/2
.
2xl 2xl
Premultiplying (2) by ei when i 6= l yields
1/2
1/2 0 xl (uil − xl vil )
bi1 = xl ei (X − xl Im )+ (xl V − U )el = −
xi − xl
1/2
since c = xl .
9.29 (a) The differential of f is
and
$$\frac{\partial}{\partial x'}f = 2x'B + a' = 0'$$
yields the equation Bx = −½a. It follows from Theorem 6.4 that x = −½B^+a + (I_m − B^+B)y is a solution for any choice of y.
(b) If B is nonsingular, then B^+ = B^{-1} and the solution given in part (a) becomes
$$x = -\tfrac12 B^{-1}a + (I_m - B^{-1}B)y = -\tfrac12 B^{-1}a,$$
which is unique. Since d²f = d(df) = 2(dx)'B(dx), H_f = 2B. Thus, this unique solution is a minimum when B is positive definite and it is a maximum when B is negative definite.
9.31 Following the development in the one sample problem given in Example 9.6, we need to maximize the function
$$g(\mu_1, \ldots, \mu_k, \Omega) = -\frac12\sum_{i=1}^k n_i\log|\Omega| - \frac12\sum_{i=1}^k \operatorname{tr}(\Omega^{-1}U_i) = -\frac12n\log|\Omega| - \frac12\sum_{i=1}^k \operatorname{tr}(\Omega^{-1}U_i),$$
where
$$U_i = \sum_{j=1}^{n_i}(x_{ij} - \mu_i)(x_{ij} - \mu_i)'.$$
k
1X
+ tr(Ω−1 {ni (dµi )(xi − µi )0 + ni (xi − µi )dµ0i })
2 i=1
k k
1X X
= vec(Ui − ni Ω)0 (Ω−1 ⊗ Ω−1 ) vec(dΩ) + ni (xi − µi )0 Ω−1 dµi .
2 i=1 i=1
ni Ω−1 (xi − µi ) = 0,
or, equivalently,
k
X
(Ω−1 Ui Ω−1 − ni Ω−1 ) = (0).
i=1
Premultiplying and postmultiplying this equation by Ω and then solving for Ω, we get
k ki n
1X 1 XX
Ω̂ = Ui = (xij − µ̂i )(xij − µ̂i )0
n i=1 n i=1 j=1
k in
1 XX
= (xij − xi )(xij − xi )0 .
n i=1 j=1
Computing the second differential and then evaluating at µi = xi and Ω = Ω̂, we get
k
n X
d2 g = − vec(dΩ)0 (Ω̂−1 ⊗ Ω̂−1 ) vec(dΩ) − ni (dµi )0 Ω̂−1 dµi .
2 i=1
Consequently,
−n1 Ω̂−1 (0) ··· (0)
(0) −n2 Ω̂−1 ··· (0)
Hg =
.. .. ..
.
. . .
(0) (0) ··· − n2 Ω̂−1 ⊗ Ω̂−1
Clearly, Hg is negative definite and so our solutions do maximize g.
9.33 Let z 1 = (x01 , y1 )0 ∈ T and z 2 = (x02 , y2 )0 ∈ T , so that xi ∈ S and yi ≥ f (xi ) for i = 1, 2. We need to
show that if 0 ≤ c ≤ 1 and z = (x0 , y)0 = cz 1 + (1 − c)z 2 , then z ∈ T . Now since xi ∈ S and S is a
convex set, it follows that x = cx1 + (1 − c)x2 ∈ S. Also, since f is a convex function,
≤ cy1 + (1 − c)y2 = y.
9.35 Let x1 ∈ S, x2 ∈ S, and 0 ≤ c ≤ 1. Applying the given inequality to x = x1 and a = cx1 + (1 − c)x2
leads to
∂
f (x1 ) ≥ f (a) + f (a) (1 − c)(x1 − x2 ),
∂a0
while applying it to x = x2 and the same a yields
∂
f (x2 ) ≥ f (a) + f (a) c(x2 − x1 ).
∂a0
Multiplying the first of these equations by c and the second by (1 − c), and then combining gives us
and so f is convex.
df = (dxc1 )x1−c
2 + xc1 dx1−c
2
c −c
= cxc−1 1−c
1 x2 dx1 + (1 − c)x1 x2 dx2 ,
+ (1 − c)(dxc1 )x−c c −c
2 dx2 + (1 − c)x1 (dx2 )dx2
c−1 −c
= c(c − 1)x1c−2 x1−c 2
2 dx1 + c(1 − c)x1 x2 dx1 dx2
−c c −c−1
+ c(1 − c)xc−1
1 x2 dx1 dx2 − c(1 − c)x1 x2 dx22 .
Thus,
−c−1
−x22 x1 x2
Hf = c(1 − c)xc−2
1 x2
.
x1 x2 −x21
Since the leading principal minors of −Hf are nonnegative, −Hf is nonnegative definite. It follows
from Problem 9.36 that −f (x) is a convex function or, equivalently, f (x) is a concave function.
(b) Since −f (x) is a convex function, the inequality is an immediate consequence of Jensen’s inequal-
ity.
9.39 Setting the first derivative with respect to x of the Lagrange function
2x2 − 2λx2 = 0,
2x3 − 2λx3 + 6λ = 0.
From these we get x1 = 2λ/(1 − λ), x2 = 0, and x3 = −3λ/(1 − λ). Since x21 + x22 + x23 + 4x1 − 6x3 = 2,
p
we then find that λ = 1 ± 13/15. Thus, we have the solutions,
p p p p p
x0 = (−2(1 + 13/15)/ 13/15, 0, 3(1 + 13/15)/ 13/15) when λ = 1 + 13/15,
p p p p p
x0 = (2(1 − 13/15)/ 13/15, 0, −3(1 − 13/15)/ 13/15) when λ = 1 − 13/15,
Consequently, |∆3 | = −16(1 − λ)2 {(x1 + 2)2 + x22 + (x3 − 3)2 } and |∆2 | = −8(1 − λ){(x1 + 2)2 + x22 }.
p
Here n = 3 and m = 1. Since |∆2 | < 0 and |∆3 | < 0 when λ = 1 − 13/15, we have
9.41 Setting the first derivative with respect to x of the Lagrange function
From the third component above, we see that λ2 = 1 since the two constraints ensure that x1 6= 0.
Multiplying the first component by 2λ1 and adding to the second component, we obtain an equation
which yields the solution x1 = (1 − 4λ21 )−1 , and when this is substituted back into the first component,
we also get x2 = 2λ1 (1 − 4λ21 )−1 . Substituting both of these into the first constraint, we find that
p p
λ1 = 0, 3/4, or − 3/4. Using these values, the solutions for x1 and x2 , and the solution for x3
obtained from the second constraint, we find that there are 3 stationary points:
9.43 Since f (cx) = f (x) for any scalar c 6= 0, maximizing and minimizing f (x) is equivalent to maximizing
and minimizing x0 Ax subject to the constraint x0 Bx = 1. Differentiating the Lagrange function
L(x, λ) = x0 Ax − λ(x0 Bx − 1) with respect to x and setting equal to 00 , yields the equation
Ax = λBx.
x0 Ax
λ= = x0 Ax.
x0 Bx
Thus, for any stationary point x, λ = f (x) and λ is an eigenvalue of B −1 A. Consequently, the minimum
of f is λm (B −1 A), which is attained at any eigenvector of B −1 A corresponding to λm (B −1 A), and the
maximum of f is λ1 (B −1 A), which is attained at any eigenvector of B −1 A, corresponding to λ1 (B −1 A).
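A quick numerical illustration that the values of f(x) = x'Ax/x'Bx indeed lie between the smallest and largest eigenvalues of B⁻¹A (the symmetric A and positive definite B below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
m = 5
A = rng.standard_normal((m, m)); A = (A + A.T) / 2
C = rng.standard_normal((m, m)); B = C @ C.T + m * np.eye(m)   # positive definite

eigs = np.linalg.eigvals(np.linalg.solve(B, A)).real
X = rng.standard_normal((m, 20000))                            # many random directions x
f = np.einsum("ij,ij->j", X, A @ X) / np.einsum("ij,ij->j", X, B @ X)
print(eigs.min(), f.min(), f.max(), eigs.max())                # sampled values stay inside [min eig, max eig]
```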
n n
!
X X
2 2
L(a, λ) = σ ai − λ ai − 1 .
i=1 i=1
Setting the first derivative with respect to a equal to 0, we obtain the equations
2σ 2 ai − λ = 0,
Pn
for all i. Thus, a1 = · · · = an and the only solution satisfying i=1 ai = 1 has ai = 1/n for all i.
This of course yields the estimator x̄. Now A and B in Theorem 9.15 are
A = 2σ 2 In , B = 10n .
According to Theorem 9.15, our solution yields a minimum since x0 Ax > 0 for all x 6= 0 for which
Bx = 0.
9.47 Since f (c1 a, c2 b) = f (a, b) for any c1 6= 0 and c2 6= 0, maximizing f is equivalent to maximizing
(a0 Ω12 b)2 subject to the constraints a0 Ω11 a = 1 and b0 Ω22 b = 1. Thus, we consider the Lagrange
function
L(a, b, λ1 , λ2 ) = (a0 Ω12 b)2 − λ1 (a0 Ω11 a − 1) − λ2 (b0 Ω22 b − 1).
That is, λ1 = λ2 = (a0 Ω12 b)2 , which is the quantity we wish to maximize. Letting λ = λ1 = λ2 , it
follows from (1) that a = (a0 Ω12 b)−1 Ω−1 0 −1
11 Ω12 b, and when substituting this in (2), we get Ω12 Ω11 Ω12 b =
maximum will be given by the largest eigenvalue. Similarly, if we use (2) to solve for b and substitute
this in (1), we get Ω−1 −1 0
11 Ω12 Ω22 Ω12 a = λa. Thus, the maximum is attained when a is an eigenvector
of Ω−1 −1 0 −1 0 −1
11 Ω12 Ω22 Ω12 corresponding to its largest eigenvalue, and b is an eigenvector of Ω22 Ω12 Ω11 Ω12
Chapter 10
10.1 If x ≺ y, then
k
X k
X
x[i] ≤ y[i] ,
i=1 i=1
for k = 1, . . . , m, which implies that x[i] = y[i] , for i = 1, . . . , m. That is, for some permutation matrix
P , y = P x.
for k = 1, . . . , m − 1, and
m
X m
X m
X
w[i] = λ x[i] + (1 − λ) y[i]
i=1 i=1 i=1
Xm Xm
= λ z[i] + (1 − λ) z[i]
i=1 i=1
m
X
= z[i] .
i=1
Thus, w = λx + (1 − λ)y ≺ z.
10.5 If x ≺ y, then it follows from Theorem 10.9 that
m
X m
X
|xi − a| ≤ |yi − a| (1)
i=1 i=1
since g(x) = |x − a| is a convex function. Conversely now suppose (1) holds for all a. Note that if
xi > a and yi > a for all i, then
m
X m
X m
X m
X m
X m
X
xi − ma = (xi − a) = |xi − a| ≤ |yi − a| = (yi − a) = yi − ma,
i=1 i=1 i=1 i=1 i=1 i=1
so we must have
m
X m
X
xi ≤ yi . (2)
i=1 i=1
so we must have
m
X m
X
xi ≥ yi . (3)
i=1 i=1
whereas
m
X
(y[i] − y[k] )+ = 0,
i=k+1
so that
m
X k
X
(y[i] − y[k] )+ = y[i] − ky[k] .
i=1 i=1
Thus, due to (5) we have
k
X m
X k
X k
X
y[i] − ky[k] ≥ (x[i] − y[k] )+ ≥ (x[i] − y[k] )+ ≥ x[i] − ky[k] ,
i=1 i=1 i=1 i=1
it follows from (1) that (5) holds. We have shown this guarantees that (6) holds and this along with
(4) confirms that x ≺ y.
10.7 Since P is doubly stochastic, P 1m = 1m and 10m P = 10m . Premultiplying the first of these equations
by P −1 and postmultiplying the second by P −1 yields P −1 1m = 1m and 10m P −1 = 10m . Also if P and
P −1 are doubly stochastic, from Theorem 10.1, we know that P ej ≺ ej and P −1 (P ej ) = ej ≺ P ej .
By Problem 10.1, this means that the jth column of P can be obtained from ej by permutation of
its components; that is, the jth column of P has one nonzero component equal to one. In a similar
fashion, we can show that the jth row of P has one nonzero component equal to one. This establishes
that P is a permutation matrix.
10.9 Write b = (b1 ,0 b02 )0 , where b1 = (b1 , . . . , bm1 )0 has the eigenvalues of A11 and b2 = (bm1 +1 , . . . , bm )0
has the eigenvalues of A22 . Let A11 = P1 Db1 P10 and A22 = P2 Db2 P20 be the spectral decompositions
of A11 and A22 , so that P1 and P2 are m1 × m1 and m2 × m2 orthogonal matrices. It follows that
P = diag(P1 , P2 ) is an m × m orthogonal matrix, and so the eigenvalues of A are the same as the
eigenvalues of
P10 A11 P1 P10 A12 P2 Db1 P10 A12 P2
P 0 AP = = . (1)
P20 A012 P1 P20 A22 P2 P20 A012 P1 Db2
Now applying Theorem 10.3 to the matrix given in (1), we find that b ≺ a.
10.11 Note that the function g(x) = x^{-1} is a convex function on (0, ∞) and x̄1_m ≺ x. Thus, an application of Theorem 10.9 yields
$$\sum_{i=1}^m \frac{1}{x_i} \geq \sum_{i=1}^m \frac{1}{\bar x}.$$
Multiplying both sides of this inequality by m x̄ and then subtracting m leads to
$$\sum_{i=1}^m \frac{m\bar x - x_i}{x_i} \geq m(m-1).$$
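A direct numerical check of this inequality for a few random positive vectors:

```python
import numpy as np

rng = np.random.default_rng(5)
m = 6
for _ in range(5):
    x = rng.uniform(0.1, 10.0, size=m)
    lhs = np.sum((m * x.mean() - x) / x)
    print(lhs >= m * (m - 1) - 1e-12)   # True every time
```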
10.13 Since y is in the column space of A, it can be written as y = Az for some m × 1 vector z. Then using
Problem 10.12, we get
We have equality if and only if one of the vectors Ax and Az = y is a scalar multiple of the other.
10.17 Let λ = (λ1 , . . . , λm )0 , where λ1 ≥ · · · ≥ λm are the eigenvalues of A. Then using the Cauchy-Schwarz
inequality, we have
m
!2
X
{tr(A)} 2
= λi = (10m λ)2 ≤ (10m 1m )(λ0 λ)
i=1
m
!
X
= m λ2i = m tr(A2 ),
i=1
with equality if and only if one of the vectors 1m and λ is a scalar multiple of the other, and this is
equivalent to saying the eigenvalues are all equal.
10.19 Clearly
tr{(A0 B)2 } = tr(A0 BA0 B) = tr(BA0 BA0 ) = tr{(BA0 )2 }.
Now applying the result in Problem 10.18 with A0 replaced by B and B replaced by A0 , we get
10.21 We prove the result by induction. It follows from Theorem 10.16 that the result holds for n = 2. Now
Pn−1
assume the result holds for n − 1 and we will show that it must hold for n. Let A = i=1 βi Ai , where
βi = αi /(1 − αn ) for i = 1, . . . , n − 1. Note that
n−1
X n−1
X
βi = (1 − αn )−1 αi = (1 − αn )−1 (1 − αn ) = 1,
i=1 i=1
and since A1 , . . . , An−1 are positive definite, we can apply our result to A, that is,
n−1
X n−1
Y
|A| = β i Ai ≥ |Ai |βi , (1)
i=1 i=1
with equality if and only if A1 = · · · = An−1 . Also, an application of Theorem 10.16 yields
n
X n−1
X
αi Ai = (1 − αn ) βi Ai + αn An = |(1 − αn )A + αn An | ≥ |A|1−αn |An |αn , (2)
i=1 i=1
with equality if and only if A1 = . . . = An .
10.23 Using Theorem 4.14, we can write A−1 = C 0 Λ−1 C and B −1 = C 0 C, where C is a nonsingular matrix
and Λ is a diagonal matrix with positive diagonal elements. As a result
= C 0 DC,
where D = αΛ−1 + (1 − α)Im − {αΛ + (1 − α)Im }−1 . Note that since g(x) = x−1 is a convex function
on (0, ∞),
α
αg(x) + (1 − α)g(1) = + (1 − α)
x
1
≥ g(αx + (1 − α)) = ,
αx + (1 − α)
with equality if and only if x = 1. This guarantees that D, and hence also W , is nonnegative definite.
Thus, we must have
10.25 Using the result from the previous problem, we find that
We have equality if and only if the three minimums occur at the same X in which case from the
previous problem, this requires that A−1 , B −1 , and (A + B)−1 be proportional to one another, and so
A and B are proportional.
Chapter 11
and
I
m Im − A Im (0) Im Im − A Im −(Im − A)
rank = rank
A (0) −A Im (0)A (0) Im
Im (0)
= rank = rank(Im ) + rank{A(Im − A)}
(0) −A(Im − A)
= m + rank{A(Im − A)},
where we have used Theorem 1.10 and Theorem 2.9. Together these two identities imply rank{A(I_m − A)} = 0, so that we must have A(I_m − A) = (0), or A² = A.
11.3 The matrix A has r eigenvalues equal to one, with the remaining eigenvalues all being zero. Thus,
A has a spectral decomposition A = XΛX 0 , where Λ = diag(Ir , (0)) for some orthogonal matrix X.
Partition X as X = [P Q], where P is m × r. Then X 0 X = Im implies that P 0 P = Ir and
0
I r (0) P
A = XΛX 0 = [P Q] = P P 0.
0
(0) (0) Q
11.5 It follows from Problem 11.3 that a symmetric idempotent matrix is the projection matrix for the
orthogonal projection onto its column space. Thus, since these projection matrices are unique and A
and B have the same column space, we must have A = B.
(b) We must have
(bIm + c1m 10m )2 = b2 Im + 2bc1m 10m + mc2 1m 10m = bIm + c1m 10m .
That is, we must have b2 = b and 2bc + mc2 = c. Thus, we need b = 0 and c = 0 or c = 1/m, or
b = 1 and c = 0 or c = −1/m.
BABA = BA,
11.11 Note that if c = 0, A2 = (0) and so all the eigenvalues of A equal zero. Thus, tr(A) = c rank(A) holds
since both sides reduce to zero. If c 6= 0, then A2 = cA implies
so tr(A) = c rank(A).
(A − BB + )2 = A2 − ABB + − BB + A + BB + BB +
= A − BB + − (BB + )0 A + BB + = A − B +0 B 0 A
= A − B +0 (AB)0 = A − B +0 B 0
= A − (BB + )0 = A − BB + ,
(b) Since Im − A is symmetric idempotent and (Im − A)B = B, we can apply the result from part
(a). That is, (Im − A) − BB + is symmetric idempotent with
11.15 Since Ω is positive definite, it can be written Ω = T T 0 for some m × m nonsingular matrix T . Then
z = T −1 x ∼ Nm (T −1 µ, Im ) and
x0 Ax = x0 T −10 T 0 AT T −1 x = z 0 T 0 AT z.
Then z ∗ = P 0 z ∼ Nm (P 0 T −1 µ, Im ) and
r
X
x0 Ax = z 0 T 0 AT z = zP ∆P 0 z = z 0∗ ∆z ∗ = 2
zi∗ .
i=1
Thus, x0 Ax has a chi-squared distribution with r degrees of freedom. The noncentrality parameter is
1 1 1 1
λ= E(z ∗ )0 ∆E(z ∗ ) = µ0 T −10 P ∆P 0 T −1 µ = µ0 T −10 T 0 AT T −1 µ = µ0 Aµ.
2 2 2 2
11.19 Since x_1, . . . , x_n are independent and identically distributed as N(µ, σ²) random variables, we have x = (x_1, . . . , x_n)' ∼ N_n(µ1_n, σ²I_n), y = x − µ1_n ∼ N_n(0, σ²I_n), and n^{-1}(x − µ1_n)'1_n = (x̄ − µ). Thus,
$$t = \frac{n(\bar x - µ)^2}{σ^2} = \frac{n(x - µ1_n)'(n^{-1}1_n)(n^{-1}1_n')(x - µ1_n)}{σ^2} = y'(n^{-1}σ^{-2}1_n1_n')y.$$
Since (n^{-1}σ^{-2}1_n1_n')(σ²I_n) is idempotent and rank(n^{-1}1_n1_n') = 1, it follows from Theorem 11.10 that
t ∼ χ²_1.
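A short simulation consistent with t ∼ χ²₁ (the values of µ, σ, and n below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n, reps = 2.0, 1.5, 10, 200000
x = rng.normal(mu, sigma, size=(reps, n))
t = n * (x.mean(axis=1) - mu) ** 2 / sigma ** 2

print(t.mean(), t.var())   # approximately 1 and 2, the mean and variance of a chi-square(1)
```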
11.23 Since Ω is positive definite, we can write Ω = T T 0 for some nonsingular matrix T , and y = T −1 x ∼
Nm (T −1 µ, Im ). Let A∗ = T 0 AT and B∗ = BT , and note that
If the spectral decomposition of A∗ is given by
Λ (0) P10
A∗ = [P1 P2 ] = P1 ΛP10 ,
(0) (0) P20
then the condition B∗ A∗ = (0) implies that B∗ = CP20 for some matrix C. Now let z = P 0 y ∼
Nm (P 0 T −1 µ, Im ) and partition z = (z 01 , z 02 )0 similar to P = (P1 , P2 ). Then
x0 Ax = y 0 T 0 AT y = y 0 A∗ y = z 0 P 0 A∗ P z = z 01 Λz 1 ,
and
Bx = BT y = B∗ y = B∗ P z = CP20 P z = Cz 2 .
1 1
A1 = 14 104 + (e1 − e2 )(e1 − e2 )0 ,
4 2
1
A2 = (14 − 4e4 )(14 − 4e4 )0 ,
12
A3 = (e1 + e2 − 2e3 )(e1 + e2 − 2e3 )0 + (e3 − e4 )(e3 − e4 )0 .
(b) Since A21 = A1 and tr(A1 ) = 2, t1 ∼ χ22 . Since A22 = A2 and tr(A2 ) = 1, t2 ∼ χ21 . Finally, t3 does
not have a chi-squared distribution because A23 6= A3 .
1
A2 A3 = (14 − 4e4 )(e3 − e4 )0 6= (0).
3
Thus, t1 and t2 are independently distributed, t1 and t3 are independently distributed, but t2 and
t3 are not independently distributed.
11.27 Since Ω is positive semidefinite, there is an m × m matrix T satisfying Ω = T T 0 , and so condition (a)
can be written as
T T 0 AT T 0 BT T 0 = (0).
T 0 AT T 0 BT = (0),
which is the condition (11.12) established in the proof of Theorem 11.14. Writing x = µ + T P y, we
find that
x0 Ax = µ0 Aµ + y 0 Cy + 2µ0 AT P y, (1)
and
x0 Bx = µ0 Bµ + y 0 Dy + 2µ0 BT P y, (2)
where C, D, and P are as defined in the proof of Theorem 11.14, and y ∼ Nm (0, Im ). As demonstrated
in the proof of Theorem 11.14, CD = (0) and this shows that y 0 Cy and y 0 Dy are independent. Next
write y 0 Cy = y 0 CC + Cy and note that the covariance between Cy and µ0 BT P y is
But condition (b) implies that T 0 AT T 0 Bµ = 0, that is, P CP 0 T 0 Bµ = 0, and this yields CP 0 T 0 Bµ = 0.
Thus, Cy and µ0 BT P y are independently distributed, and hence so are y 0 Cy and µ0 BT P y. In a
similar fashion, it can be shown that the independence of y 0 Dy and µ0 AT P y follows from condition
(c). Finally, due to condition (d), we have
µ0 AT P E(yy 0 )P 0 T 0 Bµ = µ0 AT P P 0 T 0 Bµ = µ0 AΩBµ = 0,
so that µ'ATPy and µ'BTPy are uncorrelated and, hence, independent. This then confirms that (1) and (2) are independently distributed.
11.29 Let F = [G0 H 0 ]0 , and note that F −1 = [G0 (GG0 )−1 H 0 (HH 0 )−1 ]. Then
y = Xβ + = XF −1 F β +
= X∗ β ∗ + XH 0 (HH 0 )−1 c + ,
The sum of squared errors for this reduced model is then given by
SSEr = (y ∗ − X∗ β̂ ∗ )0 (y ∗ − X∗ β̂ ∗ )
= {y − XH 0 (HH 0 )−1 c}0 {IN − XG0 (GX 0 XG0 )−1 GX 0 }{y − XH 0 (HH 0 )−1 c},
while the sum of squared errors for the complete model is
and for i 6= j,
m
X m
X
E{zi zj (zz 0 ⊗ zz 0 )} = Im ⊗ Tij + (Tik ⊗ Tjk ) + (Tjk ⊗ Tik ) + Tij ⊗ Im .
k=1 k=1
This leads to
m m
0 0 1 XX
E{zi zj (zz ⊗ zz )} = δij {Im ⊗ Im + (Thk ⊗ Thk )} + Im ⊗ Tij + Tij ⊗ Im
2
h=1 k=1
m
X Xm
+ (Tik ⊗ Tjk ) + (Tjk ⊗ Tik ),
k=1 k=1
m X
X m X
m
+ {Eij ⊗ (Tik ⊗ Tjk + Tjk ⊗ Tik )}
i=1 j=1 k=1
m Xm
1 X
= Im3 + (Im ⊗ Tij ⊗ Tij
2 i=1 j=1
+ Tij ⊗ Im ⊗ Tij + Tij ⊗ Tij ⊗ Im )
Xm X m Xm
+ (Tij ⊗ Tik ⊗ Tjk ).
i=1 j=1 k=1
11.33 (a) We can write y = sx, where x ∼ Nm (0, Ω) and s is a nonnegative random variable that is
distributed independently of x. Using Theorem 11.21, we then get
where c = E(s4 ).
(b) Premultiplying E(yy 0 ⊗ yy 0 ) in part (a) by e0i ⊗ e0i and postmultiplying by ei ⊗ ei yields the
identity
E(yi4 ) = 3c{E(yi2 )}2 ,
and clearly
n
X
E{vec(Y 0 AY )} = E{vec(y i y 0i )} − nE{vec(ȳ ȳ 0 )} = (n − 1) vec(Ω). (2)
i=1
Since aii = 1 − n−1 and aij = −n−1 for i 6= j, and the y i ’s are independent, we have
!0
X X
E{vec(Y 0 AY ) vec(Y 0 AY )0 } = E vec aij y i y 0j vec akl y k y 0l
ij kl
X
= aij akl E(y j y 0l ⊗ y i y 0k )
ijkl
X X
= aii ajj E(y i y 0j ⊗ y i y 0j ) + a2ij E(y i y 0j ⊗ y j y 0i )
i6=j i6=j
X X
+ a2ii E(y i y 0i ⊗ y i y 0i ) + a2ij E(y i y 0i ⊗ y j y 0j )
i i6=j
X
= aii ajj E{vec(y i y 0i ) vec(y j y 0j )0 }
i6=j
X
+ a2ij E(y i y 0i ⊗ y j y 0j )Kmm
i6=j
X X
+ a2ii E(y i y 0i ⊗ y i y 0i ) + a2ij E(y i y 0i ⊗ y j y 0j )
i i6=j
3 −1
= (n − 1) n vec(Ω) vec(Ω) + (n − 1)n−1 Kmm (Ω ⊗ Ω)
0
+ (n − 1)n−1 (Ω ⊗ Ω)
and
(d) Following the derivation given in Example 11.11 and using the formula from part (c), we find that
1
var{vec(R)} = H{2cNm (P ⊗ P ) + (c − 1) vec(P ) vec(P )0 }H 0
n−1
1
= H{2cNm (P ⊗ P )}H 0
n−1
2c
= Nm ΘNm ,
n−1
where H and Θ are as defined in Example 11.11. The second equality above follows from the fact
that
1
H vec(P ) = Im2 − {(Im ⊗ P ) + (P ⊗ Im )}Λm vec(P )
2
1
= vec(P ) − {(Im ⊗ P ) + (P ⊗ Im )} vec(Im )
2
1
= vec(P ) − {vec(P ) + vec(P )} = 0,
2
whereas the last equality follows from the simplification given in Problem 11.53.
Now
tr(A ⊗ BTij ⊗ CTij ) = tr(A) tr{B(ei e0j + ej e0i )} tr{C(ei e0j + ej e0i )}
In a similar fashion, we get tr(ATij ⊗B⊗CTij ) = 4 tr(B)aij cij , tr(ATij ⊗BTij ⊗C) = 4 tr(C)aij bij ,
and tr(ATij ⊗ BTik ⊗ CTjk ) = 8aij bik cjk . Thus,
m X
X m
E(x0 Axx0 Bxx0 Cx) = tr(A) tr(B) tr(C) + 2 {tr(A)bij cij + tr(B)aij cij
i=1 j=1
m X
X m X
m
+ tr(C)aij bij } + 8 aij bik cjk
i=1 j=1 k=1
= tr(A) tr(B) tr(C) + 2 tr(A) tr(BC) + 2 tr(B) tr(AC)
(b) Similarly
(c) Also
= vec(V1 ) ⊗ vec(V2 ).
(e) Finally
= V1 ⊗ V2 − µ1 µ01 ⊗ µ2 µ02 .
(a) Using Theorem 11.23(c), we get
= 2 tr{(4γ 1 γ 01 + 4γ 2 γ 02 )2 }
= 1536
11.41 Since Vi ∼ Wm (Ω, ni ), it can be written as Vi = Xi0 Xi , where the columns of the m × ni matrix Xi0
are independently distributed as Nm (0, Ω). Put X 0 = (X10 , X20 ), and note that since V1 and V2 are
independent, the n1 + n2 columns of X 0 are independently distributed as Nm (0, Ω). It then follows
from Theorem 11.25, that
−1 0
11.43 (a) In finding the distribution of V1·2 = V11 − V12 V22 V12 in the proof of Theorem 11.27, it was
observed that the distribution of V1·2 given X2 does not depend on X2 . Since V22 = X20 X2 ,
this implies that V1·2 is independent of V22 . Also, since V1·2 = X10 {In − X2 (X20 X2 )−1 X20 }X1 ,
V12 = X10 X2 , and {In − X2 (X20 X2 )−1 X20 }X2 = (0), it follows that V1·2 and V12 are independent
given X2 , that is, given V22 . Their unconditional independence also follows since
Z
fV1·2 ,V12 (U1 , U2 ) = fV1·2 ,V12 ,V22 (U1 , U2 , U3 )dU3
Z
= fV1·2 ,V12 |V22 (U1 , U2 |U3 )fV22 (U3 )dU3
Z
= fV1·2 |V22 (U1 |U3 )fV12 |V22 (U2 |U3 )fV22 (U3 )dU3
Z
= fV1·2 (U1 )fV12 |V22 (U2 |U3 )fV22 (U3 )dU3
Z
= fV1·2 (U1 )fV12 ,V22 (U2 , U3 )dU3
Then since
vec(V12 ) = vec(X10 X2 ) = (X20 ⊗ Im1 ) vec(X10 ),
11.45 Let U = Ω−1/2 V Ω−1/2 , where Ω1/2 is the symmetric square root of Ω. It follows from Theorem 11.26
that U ∼ Wm (Im , n). Note that if B = AΩ−1/2 , then
and (AΩ−1 A0 )−1 = (BB 0 )−1 . Thus, the result will be proven if we can show that (BU −1 B 0 )−1 ∼
Wp ((BB 0 )−1 , n − m + p). Now it follows from Theorem 4.1 that B can be written as B = C[Ip (0)]Q,
where C is a p × p nonsingular matrix and Q is an m × m orthogonal matrix. Then
−1
Ip
(BU −1 B 0 )−1 = C[Ip (0)]QU −1 Q0 C 0
(0)
−1
Ip
= C −10 [Ip (0)](QU Q0 )−1 C −1
(0)
−1
Ip
= C −10 [Ip (0)]D−1 C −1 ,
(0)
= (P 0 ⊗ Im )(In ⊗ Ω)(P ⊗ Im ) = In ⊗ Ω.
Thus,
n
!
X
0 0 0 0
E(X AX) = E(X P ΛP X) = E(Y ΛY ) = E λi y i y 0i
i=1
n n n
!
X X X
= λi E(y i y 0i ) = λi (Ω + µi∗ µ0i∗ ) = Ω λi + M∗0 ΛM∗
i=1 i=1 i=1
= Ω tr(A) + M 0 P ΛP 0 M = Ω tr(A) + M 0 AM.
(b) Since !
n
X n
X
0 0
vec(X AX) = vec(Y ΛY ) = vec λi y i y 0i = λi vec(y i y 0i ),
i=1 i=1
we have
n
X n
X
var{vec(X 0 AX)} = λ2i var{vec(y i y 0i )} = λ2i var(y i ⊗ y i )
i=1 i=1
n
X
= λ2i (2Nm {Ω ⊗ Ω + Ω ⊗ µi∗ µ0i∗ + µi∗ µ0i∗ ⊗ Ω})
i=1
( n
! )
X
= 2Nm λ2i (Ω ⊗ Ω) + Ω ⊗ M∗0 Λ2 M∗ + M∗0 Λ2 M∗ ⊗Ω
i=1
= 2Nm {tr(A )(Ω ⊗ Ω) + Ω ⊗ M 0 P Λ2 P 0 M + M 0 P Λ2 P 0 M ⊗ Ω}
2
and !
k
1 0 1 X
M A1 M = ni µi µ0i − nµ̄µ̄0
= Φ,
2 2 i=1
it follows that B ∼ Wm (Ω, k − 1, Φ). Turning to W , we see that it can be expressed as
X k Xni
W = y ij y 0ij − ni ȳ i ȳ 0i
i=1 j=1
k
X
= Yi0 (Ini − n−1 0
i 1ni 1ni )Yi
i=1
= Y 0 A2 Y,
A2 is idempotent and
k
1 0 1X 0
M A2 M = M (In − n−1 0
i 1ni 1ni )Mi
2 2 i=1 i i
k
1X
= µ 10 (In − n−1 0 0
i 1ni 1ni )1ni µi = (0),
2 i=1 i ni i
where we have written M 0 = (M10 , . . . , Mk0 ) with Mi0 = µi 10ni . Thus, W ∼ Wm (Ω, n − k) since
k
X k
X
tr(A2 ) = tr{(Ini − n−1 0
i 1ni 1ni )} = (ni − 1) = n − k.
i=1 i=1
independently distributed.
Since the columns of X 0 are independent, each with covariance matrix Ω, var{vec(X 0 )} = In ⊗ Ω, and
so using Theorem 8.27, Theorem 11.21(d), and the fact that
× (In ⊗ Kmn ⊗ Im ){vec(A) ⊗ Im2 }
Now
V = (X + M )0 A(X + M ) = X 0 AX + X 0 AM + M 0 AX + M 0 AM.
Since the first and third-order moments of vec(X 0 ) are all zero, we have
we have
1 1
Im2 − {(Im ⊗ P ) + (P ⊗ Im )}Λm Nm = Nm − {(Im ⊗ P ) + (P ⊗ Im )}Nm Λm
2 2
1
= Nm − 2Nm (Im ⊗ P )Nm Λm
2
= Nm {Im2 − (Im ⊗ P )Λm }.