
Matrix Analysis For Statistics, 3rd edition

Solutions to Selected Odd Numbered Problems

James R. Schott

Chapter 1

1.1 (a) A = [1, 0; 0, 0],   B = [0, 0; 0, 1].

(b) A = [1, 0; 0, 0],   B = [1, 0; 0, 1],   C = [1, 0; 0, 2].

1.3 If AB is symmetric, then

AB = (AB)' = B'A' = BA,

and if AB = BA, then

(AB)' = B'A' = BA = AB.

1.5 (a) We have

tr(xy') = Σ_{i=1}^m (xy')_ii = Σ_{i=1}^m x_i y_i = x'y.

(b) Using Theorem 1.3 (d),

tr(BAB^{-1}) = tr(B^{-1}BA) = tr(A).
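A quick numerical sketch of these two trace identities (not part of the original solution; the matrices below are arbitrary illustrations):

    import numpy as np

    rng = np.random.default_rng(0)
    m = 4
    x, y = rng.standard_normal(m), rng.standard_normal(m)
    # tr(xy') equals the inner product x'y
    assert np.isclose(np.trace(np.outer(x, y)), x @ y)

    A = rng.standard_normal((m, m))
    B = rng.standard_normal((m, m))          # almost surely nonsingular
    # the trace is invariant under the similarity transformation B A B^{-1}
    assert np.isclose(np.trace(B @ A @ np.linalg.inv(B)), np.trace(A))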

1.7 Using Theorem 1.3 (a) and (d), we have

tr(ABC) = tr{(ABC)0 } = tr(C 0 B 0 A0 ) = tr(CBA) = tr(ACB).

1.9 We need A = B + C, where B' = B and C' = −C. This would imply that A' = B − C, and so
A + A' = 2B and A − A' = 2C. Thus, we have the solutions

B = (1/2)(A + A'),   C = (1/2)(A − A').
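The decomposition is easy to verify numerically; a short sketch (not from the original text, using an arbitrary test matrix):

    import numpy as np

    A = np.arange(9.0).reshape(3, 3)           # any square matrix will do
    B = 0.5 * (A + A.T)                        # symmetric part
    C = 0.5 * (A - A.T)                        # skew-symmetric part
    assert np.allclose(B, B.T) and np.allclose(C, -C.T)
    assert np.allclose(A, B + C)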

1.11 Since A' = −A, we have −A^2 = (−A)A = A'A, and so for any m × 1 vector x,

x'(−A^2)x = x'A'Ax = y'y = Σ_{i=1}^m y_i^2 ≥ 0,

where y = Ax.

1.13 (a) (αA)^{-1} = α^{-1}A^{-1} since

α^{-1}A^{-1}αA = (α^{-1}α)A^{-1}A = I_m.

(b) (A')^{-1} = (A^{-1})' since

(A^{-1})'A' = (AA^{-1})' = I_m' = I_m.

(c) (A^{-1})^{-1} = A since

AA^{-1} = I_m.

(d) Note that

1 = |I_m| = |A^{-1}A| = |A^{-1}||A|,

so |A^{-1}| = |A|^{-1}.

(e) A^{-1} = diag(a_11^{-1}, . . . , a_mm^{-1}) if A = diag(a_11, . . . , a_mm) since

diag(a_11^{-1}, . . . , a_mm^{-1}) diag(a_11, . . . , a_mm) = diag(a_11^{-1}a_11, . . . , a_mm^{-1}a_mm) = diag(1, . . . , 1) = I_m.

(f) It follows from (b) that (A^{-1})' = (A')^{-1} = A^{-1} if A = A'.

(g) (AB)^{-1} = B^{-1}A^{-1} since

B^{-1}A^{-1}AB = B^{-1}I_mB = B^{-1}B = I_m.

1.15 Using the cofactor expansion formula on the first column of A, we get

|A| = 1 × det[1, 2, 0; 2, 2, 1; −1, 1, 2] + 1 × det[2, 1, 1; 1, 2, 0; −1, 1, 2]
    = {4 − 2 + 0 − (0 + 8 + 1)} + {8 + 1 + 0 − (−2 + 2 + 0)} = −7 + 9 = 2.

1.17 Let B be the matrix obtained from A by replacing the kth row of A by its ith row. Note that A_kj = B_kj
for j = 1, . . . , m, and so

Σ_{j=1}^m a_ij A_kj = Σ_{j=1}^m b_kj B_kj = |B| = 0,

due to Theorem 1.4 (h). Similarly, let C be the matrix obtained from A by replacing the kth column
of A by its ith column. Note that A_jk = C_jk for j = 1, . . . , m, and so

Σ_{j=1}^m a_ji A_jk = Σ_{j=1}^m c_jk C_jk = |C| = 0,

by another application of Theorem 1.4 (h).

1.19 The adjoint matrix of A is

A# = [−7, −5, 9, −1; 4, 2, −4, 0; −2, 0, 2, 0; 3, 1, −3, 1],

and since |A| = 2, the inverse of A is

A^{-1} = (1/2)[−7, −5, 9, −1; 4, 2, −4, 0; −2, 0, 2, 0; 3, 1, −3, 1].
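These two answers can be checked numerically. The matrix A is not reprinted in this manual; the matrix used below is a reconstruction consistent with the cofactor expansion in Problem 1.15 and with the adjoint above, so treat it as an assumption:

    import numpy as np

    # Reconstructed from the cofactor expansion in 1.15; treat as an assumption.
    A = np.array([[1., 2., 1., 1.],
                  [0., 1., 2., 0.],
                  [1., 2., 2., 1.],
                  [0., -1., 1., 2.]])
    adj = np.array([[-7., -5., 9., -1.],
                    [4., 2., -4., 0.],
                    [-2., 0., 2., 0.],
                    [3., 1., -3., 1.]])
    assert np.isclose(np.linalg.det(A), 2.0)
    assert np.allclose(A @ adj, 2.0 * np.eye(4))        # A A# = |A| I
    assert np.allclose(np.linalg.inv(A), adj / 2.0)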

1.21 Using Theorem 1.9

(a) with A = I_m, B = 1, C = 1_m, and D = 1_m', we find that

(I_m + 1_m 1_m')^{-1} = I_m − {1/(1 + m)} 1_m 1_m',

(b) and with A = I_m, B = 1, C = e_1, and D = 1_m', we find that

(I_m + e_1 1_m')^{-1} = I_m − (1/2) e_1 1_m'.

1.23 Applying Corollary 1.9.2 with A = D, c = αa, and d = b, we get

(D + αab')^{-1} = D^{-1} − αD^{-1}ab'D^{-1}/(1 + αb'D^{-1}a).
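This rank-one update formula (a Sherman–Morrison-type identity) is easy to sanity-check numerically; the diagonal D, vectors a, b, and scalar α below are arbitrary illustrations, not from the text:

    import numpy as np

    rng = np.random.default_rng(1)
    m = 5
    D = np.diag(rng.uniform(1.0, 2.0, m))
    a, b = rng.standard_normal(m), rng.standard_normal(m)
    alpha = 0.7
    Dinv = np.linalg.inv(D)
    lhs = np.linalg.inv(D + alpha * np.outer(a, b))
    rhs = Dinv - alpha * Dinv @ np.outer(a, b) @ Dinv / (1 + alpha * b @ Dinv @ a)
    assert np.allclose(lhs, rhs)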

1.25 Partition B = A^{-1} as A is partitioned and note that AB = I_m yields the four equations

A_11 B_11 = I_m1,   A_11 B_12 = (0),

A_21 B_11 + A_22 B_21 = (0),   A_21 B_12 + A_22 B_22 = I_m2.

Premultiplying the first two equations by A_11^{-1} yields the solutions B_11 = A_11^{-1} and B_12 = (0). Substituting
these solutions into the third and fourth equations and then premultiplying by A_22^{-1} yields the
solutions B_22 = A_22^{-1} and B_21 = −A_22^{-1} A_21 A_11^{-1}. That is,

A^{-1} = [A_11^{-1}, (0); −A_22^{-1} A_21 A_11^{-1}, A_22^{-1}].
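A small numerical check of this block formula (not from the original; the blocks below are arbitrary, with A_12 = (0) as the problem assumes):

    import numpy as np

    rng = np.random.default_rng(2)
    m1, m2 = 2, 3
    A11 = rng.standard_normal((m1, m1)) + 3 * np.eye(m1)   # nonsingular
    A22 = rng.standard_normal((m2, m2)) + 3 * np.eye(m2)   # nonsingular
    A21 = rng.standard_normal((m2, m1))
    A = np.block([[A11, np.zeros((m1, m2))], [A21, A22]])

    A11i, A22i = np.linalg.inv(A11), np.linalg.inv(A22)
    Ainv = np.block([[A11i, np.zeros((m1, m2))], [-A22i @ A21 @ A11i, A22i]])
    assert np.allclose(np.linalg.inv(A), Ainv)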

1.27 Expanding along the last row of A, we find that

|A| = −2 det[0, 1, −1; −1, 1, −1; −1, 2, 0] − 2 det[2, 0, 1; 1, −1, 1; 1, −1, 2] = −2(2) − 2(−2) = 0,

so rank(A) < 4. On the other hand, the determinant of the submatrix obtained by deleting the last
row and column of A is

det[2, 0, 1; 1, −1, 1; 1, −1, 2] = −2 ≠ 0.

Thus, rank(A) = 3.

1.29 Theorem 1.12 (b) follows from part (a) and Theorem 1.7 since

|P 0 AP | = |P 0 ||A||P | = |A||P |2 = |A|,

while Theorem 1.12 (c) follows since

(P Q)0 P Q = Q0 P 0 P Q = Q0 Q = Im .

1.31 (P P 0 )31 = (P P 0 )32 = 0 yield the equations p31 + p32 + p33 = 0 and p31 = p32 , which together imply
that p3 = (p31 , p32 , p33 )0 has the form p3 = (a, a, −2a)0 for some a. But p03 p3 = 6a2 and so since
(P P 0 )33 = 1, we must have a = ±1. Thus, there are two solutions: p31 = p32 = 1, p33 = −2 and
p31 = p32 = −1, p33 = 2.

1.33 From the identity P'P = I_m we get

[P_1'; P_2'][P_1 P_2] = [P_1'P_1, P_1'P_2; P_2'P_1, P_2'P_2] = [I_m1, (0); (0), I_m2],

which immediately leads to P_1'P_1 = I_m1 and P_2'P_2 = I_m2. From the identity PP' = I_m we get

[P_1 P_2][P_1'; P_2'] = P_1P_1' + P_2P_2' = I_m.

1.35 (a) A = [1, 2, −3; 2, 2, 4; −3, 4, −1].

(b) A = [3, 1, 1; 1, 5, 2; 1, 2, 2].

(c) A = [0, 1, 1; 1, 0, 1; 1, 1, 0].

1.37 Let y = T x for an arbitrary m × 1 vector x. It follows that A = T'T is nonnegative definite since

x'Ax = x'T'Tx = y'y = Σ_i y_i^2 ≥ 0.

1.39 Let y = B 0 x for an arbitrary n × 1 vector x. Since A is nonnegative definite, it follows that

x0 BAB 0 x = y 0 Ay ≥ 0,

and so BAB 0 is nonnegative definite.

1.41 We have

E(z) = m^{(1)}(t)|_{t=0} = e^{t^2/2} t |_{t=0} = 0,

E(z^2) = m^{(2)}(t)|_{t=0} = e^{t^2/2}(1 + t^2)|_{t=0} = 1,

E(z^3) = m^{(3)}(t)|_{t=0} = {e^{t^2/2} t(1 + t^2) + e^{t^2/2}(2t)}|_{t=0} = e^{t^2/2}(3t + t^3)|_{t=0} = 0,

E(z^4) = m^{(4)}(t)|_{t=0} = {e^{t^2/2} t(3t + t^3) + e^{t^2/2}(3 + 3t^2)}|_{t=0} = e^{t^2/2}(3 + 6t^2 + t^4)|_{t=0} = 3,

E(z^5) = m^{(5)}(t)|_{t=0} = {e^{t^2/2} t(3 + 6t^2 + t^4) + e^{t^2/2}(12t + 4t^3)}|_{t=0} = e^{t^2/2}(15t + 10t^3 + t^5)|_{t=0} = 0,

E(z^6) = m^{(6)}(t)|_{t=0} = {e^{t^2/2} t(15t + 10t^3 + t^5) + e^{t^2/2}(15 + 30t^2 + 5t^4)}|_{t=0}
       = e^{t^2/2}(15 + 45t^2 + 15t^4 + t^6)|_{t=0} = 15.

1.43 Since XX' = Σ_{i=1}^n x_i x_i' and n^{-1} X 1_n = x̄, we have

S = (n − 1)^{-1} ( Σ_{i=1}^n x_i x_i' − n x̄ x̄' ) = (n − 1)^{-1} (XX' − n^{-1} X 1_n 1_n' X') = (n − 1)^{-1} X A X',

where A = I_n − n^{-1} 1_n 1_n'.
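A short numerical illustration of this identity (not part of the original solution; the data matrix X below, with columns x_i, is arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 3, 8                                # m variables, n observations
    X = rng.standard_normal((m, n))            # columns are x_1, ..., x_n

    ones = np.ones((n, 1))
    A = np.eye(n) - ones @ ones.T / n          # centering matrix I_n - n^{-1} 1_n 1_n'
    S_centering = X @ A @ X.T / (n - 1)
    S_numpy = np.cov(X)                        # sample covariance (divisor n - 1)
    assert np.allclose(S_centering, S_numpy)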

1.45 (a) With D_Ω^{-1/2} = diag(1/√2, 1/√2, 1/√3), the correlation matrix is given by

P = D_Ω^{-1/2} Ω D_Ω^{-1/2} = [1, 1/2, −1/√6; 1/2, 1, 1/√6; −1/√6, 1/√6, 1].

(b) It follows that u ~ N(μ_u, σ_u^2), where

μ_u = 1_3'μ = 6,   σ_u^2 = 1_3'Ω1_3 = 9.

(c) v ~ N_3(μ_v, Ω_v), where μ_v = Aμ = (9, −2, −1)' and

Ω_v = AΩA' = [5, 7, 3; 3, 0, −4; 2, 1, −2] A' = [27, 2, 4; 2, 7, 4; 4, 4, 3].

(d) w ~ N_5(μ_w, Ω_w), where

μ_w = [A; B]μ = (9, −2, −1, 6, 1)',

and

Ω_w = [A; B] Ω [A' B'] = [27, 2, 4, 15, 2; 2, 7, 4, −1, −3; 4, 4, 3, 1, −1; 15, −1, 1, 9, 2; 2, −3, −1, 2, 2].

(e) Since σ_u^2 ≠ 0 and |Ω_v| ≠ 0, the distributions given in (b) and (c) are nonsingular. Clearly,
|Ω_w| = 0 since

rank([A; B]) = 3,

so the distribution given in (d) is singular.

1.47 (a) Since H1_m = m^{1/2} e_1 and HH' = I_m, u ~ N_m(m^{1/2} μ e_1, σ^2 I_m).

(b) Note that u_1 = m^{1/2} x̄ and x'x = x'H'Hx = u'u, so

Σ_{i=1}^m (x_i − x̄)^2 = Σ_{i=1}^m x_i^2 − m x̄^2 = Σ_{i=1}^m u_i^2 − u_1^2 = Σ_{i=2}^m u_i^2.

It follows from part (a) that u_1 is independent of u_2, . . . , u_m and, as a result, s^2 = (m − 1)^{-1} Σ_{i=2}^m u_i^2 is independent of x̄ = m^{-1/2} u_1.
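The orthogonal matrix H with first row m^{-1/2} 1_m' is not written out in the problem; one standard choice is a Helmert matrix, used here purely as an illustration (this sketch is not part of the original solution):

    import numpy as np

    m = 5
    rows = [np.ones(m) / np.sqrt(m)]           # first row proportional to 1_m'
    for k in range(1, m):
        r = np.zeros(m)
        r[:k] = 1.0
        r[k] = -float(k)
        rows.append(r / np.sqrt(k * (k + 1)))
    H = np.vstack(rows)                        # Helmert matrix: orthogonal, H 1_m = sqrt(m) e_1
    assert np.allclose(H @ H.T, np.eye(m))

    x = np.random.default_rng(4).standard_normal(m)
    u = H @ x
    assert np.isclose(u[0], np.sqrt(m) * x.mean())
    assert np.isclose(np.sum(u[1:] ** 2), np.sum((x - x.mean()) ** 2))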

1.49 Since E(z) = 0, and v and z are independent, we have

E(x) = E(n1/2 z/v) = n1/2 E(v −1 )E(z) = 0.

In addition,

var(x) = E(xx') = E(nzz'/v^2) = nE(v^{-2})E(zz') = {n/(n − 2)} I_m,

since

E(v^{-2}) = ∫_0^∞ t^{-1} [t^{n/2 − 1} e^{−t/2} / {2^{n/2} Γ(n/2)}] dt
          = [Γ((n − 2)/2) / {2Γ(n/2)}] ∫_0^∞ [t^{(n−2)/2 − 1} e^{−t/2} / {2^{(n−2)/2} Γ((n − 2)/2)}] dt
          = Γ(n/2 − 1) / {2(n/2 − 1)Γ(n/2 − 1)} = 1/(n − 2),

as long as n > 2.

Chapter 2

2.1 (a) If α ≠ 1, α(a, b, a + b, 1)' is not in the set so it is not a vector space.

(b) For any scalars α and β,

α(a_1, b_1, c_1, a_1 + b_1 − 2c_1)' + β(a_2, b_2, c_2, a_2 + b_2 − 2c_2)'
  = (αa_1 + βa_2, αb_1 + βb_2, αc_1 + βc_2, (αa_1 + βa_2) + (αb_1 + βb_2) − 2(αc_1 + βc_2))'

is in the set so it is a vector space.

(c) If α + β ≠ 1, then

α(a_1, b_1, c_1, 1 − a_1 − b_1 − c_1)' + β(a_2, b_2, c_2, 1 − a_2 − b_2 − c_2)'
  = (αa_1 + βa_2, αb_1 + βb_2, αc_1 + βc_2, (α + β) − (αa_1 + βa_2) − (αb_1 + βb_2) − (αc_1 + βc_2))'

is not in the set so it is not a vector space.

2.3 (a, a + b, a + b, −b)0 6= (1, 1, 1, 1)0 for any choices of a and b, so (1, 1, 1, 1)0 is not in S, whereas (4, 1, 1, 3)0
is in S since (a, a + b, a + b, −b)0 = (4, 1, 1, 3)0 when a = 4, b = −3.

2.5 Straightforward calculation reveals that

Ω^{-1} = [4/3, 2/3; 2/3, 4/3],

d_Ω(x_1, μ) = 2 and d_Ω(x_2, μ) = 2/√3, so x_2 is closer to the mean.
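Assuming d_Ω is the Mahalanobis-type distance d_Ω(x, μ) = {(x − μ)'Ω^{-1}(x − μ)}^{1/2}, the computation is a one-liner; Ω below is recovered by inverting the Ω^{-1} shown above, while x_1, x_2, and μ are placeholders since they are not reprinted in this manual:

    import numpy as np

    def d_omega(x, mu, omega):
        """Mahalanobis-type distance {(x - mu)' omega^{-1} (x - mu)}^{1/2}."""
        diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
        return float(np.sqrt(diff @ np.linalg.solve(omega, diff)))

    omega = np.array([[1.0, -0.5], [-0.5, 1.0]])   # inverse of the Omega^{-1} above
    mu = np.zeros(2)                                # placeholder
    print(d_omega([1.0, 1.0], mu, omega), d_omega([0.5, 0.0], mu, omega))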

2.7 (a) Since x'x = 10, y'y = 11 and x'y = 6, we have

cos θ = 6/√(10(11)) = .572,

and so θ = .9618 = π/3.2665.

(b) We have

cos(π/6) = .866 = x'y/√(3(2)),

so x'y = 2.121.

2.9 The identity holds since

‖x + y‖^2 + ‖x − y‖^2 = ⟨x + y, x + y⟩ + ⟨x − y, x − y⟩
  = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ + ⟨x, x⟩ − ⟨x, y⟩ − ⟨y, x⟩ + ⟨y, y⟩
  = 2⟨x, x⟩ + 2⟨y, y⟩ = 2‖x‖^2 + 2‖y‖^2.

2.11 (a) This is a linearly independent set since neither vector is a scalar multiple of the other.

(b) This is a linearly dependent set since (4, −1, 2)0 − 2(3, 2, 3)0 + (2, 5, 4)0 = 00 .

(c) This is a linearly independent set since the only solution to a(1, 2, 3)0 +b(2, 3, 1)0 +c(−1, 1, 1)0 = 00
is a = b = c = 0.

(d) This is a linearly dependent set of vectors because the number of vectors in the set exceeds the
dimension of the vectors.

2.13 (a) This is a linearly dependent set of vectors since 2(2, 1, 4, 3)0 = (4, 2, 8, 6)0 .

(b) Any subset of two of the vectors is a linearly independent set with the exception of the subset
consisting of the two vectors referred to in part (a).

2.15 (a) Any basis for R4 has four vectors, so this set is not a basis.

(b) It is easily verified that the only solution to a(2, 2, 2, 1)0 +b(2, 1, 1, 1)0 +c(3, 2, 1, 1)0 +d(1, 1, 1, 1)0 =
00 is a = b = c = d = 0. Since it is a set of four linearly independent vectors it must be a basis
for R4 .

10
(c) Since (2, 0, 1, 1)0 − 2(3, 1, 2, 2)0 + (2, 1, 1, 2)0 + (2, 1, 2, 1)0 = 00 , the set is linearly dependent and
consequently not a basis for R4 .

2.17 (a) The dimension of S is less than 3 since 2x1 + 3x2 − x3 = 0. Any pair of the vectors form a linearly
independent set so the dimension of S is 2, and we will use z 1 = x1 and z 2 = x2 .

(b) We have x = α1 x1 + α2 x2 when α1 = −1 and α2 = 1.

(c) Two solutions are α1 = −1, α2 = 1, α3 = 0 and α1 = 1, α2 = 4, α3 = −1.

2.19 (a) If A is m × n and B is m × p, then

A = [A B][I_n; (0)],   B = [A B][(0); I_p],

and so the result is a consequence of Theorem 2.8(a).

(b) Clearly, any nonnull vector of the form (x', 0')' cannot be expressed as a linear combination of
vectors of the form (0', y')'. This implies that the number of linearly independent columns of

[A, (0); (0), B]

is equal to the number of linearly independent columns of

[A; (0)],

which equals rank(A), plus the number of linearly independent columns of

[(0); B],

which equals rank(B). Thus, we have

rank([A, (0); (0), B]) = rank(A) + rank(B).

A similar argument can be used to prove the second identity.

(c) We establish the inequality for the first matrix given in part (c) of the theorem. The proofs for
the others are similar. Suppose A is m × n, B is p × q, and rank(A) = r. Then there exists an
n × n nonsingular matrix X such that AX = [A_1 (0)], where A_1 is m × r with rank(A_1) = r.
Let CX = [C_1 C_2], where C_1 is p × r, and define B_∗ = [C_2 B]. Then

rank([A, (0); C, B]) = rank([A, (0); C, B][X, (0); (0), I_q]) = rank([A_1, (0); C_1, B_∗]).

Note that the first r columns of the last matrix given above are linearly independent of the
remaining columns since A_1 x ≠ 0 for all nonnull x. Thus,

rank([A_1, (0); C_1, B_∗]) = rank([A_1; C_1]) + rank([(0); B_∗]) = rank(A) + rank(B_∗) ≥ rank(A) + rank(B).

2.21 Let X = (x_1, x_2), Y = (y_1, y_2), and

A = [a, c; b, d],

so that Y = XA. Note that y_1 and y_2 are linearly independent, that is, Y has full column rank, if
and only if A is nonsingular. Thus, we must have |A| = ad − bc ≠ 0.

2.23 First note that the set {γ 1 , . . . , γ m } is linearly independent since the matrix Γ = (γ 1 , . . . , γ m ) is
triangular with each diagonal element equal to one and, hence, |Γ| = 1. Also, for arbitrary x ∈ Rm ,
let αi = xi − xi+1 for i = 1, . . . , m − 1 and αm = xm . Then
m
X m−1
X
αi γ i = (xi − xi+1 )γ i + xm γ m
i=1 i=1
m−1
X m−1
X
= xi γ i − xi+1 γ i + xm γ m
i=1 i=1
m−1
X m
X
= xi γ i − xi γ i−1 + xm γ m
i=1 i=2
m−1
X
= x1 γ 1 + xi (γ i − γ i−1 ) + xm (γ m − γ m−1 )
i=2
m−1
X
= x1 e 1 + xi ei + xm em = x.
i=2

Thus, {γ 1 , . . . , γ m } is a basis for Rm .

12
2.25 If for some C, AC = B, then R(B) ⊆ R(A) follows from Problem 2.24(a). Now suppose that R(B) ⊆
R(A). Thus, any vector in R(B) is also in R(A), and so it can be written as a linear combination of
the columns of A. In particular, each column of B can then be written as a linear combination of the
columns of A and this immediately leads to the existence of a matrix C so that AC = B.

2.27 (a) According to Definition 2.8, the columns of X = (x1 , . . . , xn ) will be a basis if they span S.
Suppose the columns of the m × n matrix Y form a basis for S so for each x ∈ S, there exists an
n × 1 vector a such that x = Y a. In particular, there exists an n × n matrix A such that X = Y A
since the columns of X are in S. But the columns of X are linearly independent so

n = rank(X) = rank(Y A) ≤ rank(A),

which implies rank(A) = n and Y = XA−1 . Thus, the columns of X span S since the columns of
Y span S.

(b) Let X = (x1 , . . . , xn ) and let the columns of the m × n matrix Y form a basis for S. Since the
columns of Y must be in S and the columns of X span S, we must have Y = XB for some n × n
matrix B. Note that
rank(Y ) = rank(XB) ≤ rank(X).

But rank(Y ) = n since its columns form a basis, and so we must also have rank(X) = n, which
confirms that the columns of X are linearly independent.

(c) Let X1 = (x1 , . . . , xr ) and suppose the columns of the m × n matrix Y span S. Since the columns
of X1 are in S, X1 = Y A1 for some n × r matrix A1 . Also

r = rank(X1 ) = rank(Y A1 ) ≤ rank(A1 ),

from which it follows that rank(A1 ) = r, and so we can find an n × (n − r) matrix A2 such that
A = [A1 A2 ] is nonsingular. Then the n columns of

X = [X1 X2 ] = [Y A1 Y A2 ] = Y A

are in S and are linearly independent since we have rank(X) = rank(Y ) = n. The result now
follows from part (a) of this theorem.

2.29 Since    
Im −F In (0)
D= , E= 
(0) Ip −G Iq

13
are both nonsingular, we have
     
C B C B
rank   = rank D   E
A (0) A (0)
 
(0) B
= rank   = rank(A) + rank(B),
A (0)

where the final equality follows from Theorem 2.9(b).

2.31 If

A = [1, 0; 0, 0],   B = [0, 1; 0, 0],

then rank(AB) = 1 whereas rank(BA) = 0.

2.33 (a) α_1 x_1 + α_2 x_2 + α_3 x_3 + α_4 x_4 = 0 yields solutions α_2 = −2α_1, α_3 = −α_1, α_4 = 2α_1, where α_1 is
arbitrary, so {x_1, x_2, x_3, x_4} is a linearly dependent set. The only solution to α_1 x_1 + α_2 x_2 + α_3 x_3 =
0 has α_1 = α_2 = α_3 = 0, so {x_1, x_2, x_3} is a linearly independent set and thus a basis for S.

(b) z_1 = (1/√10)(1, 2, 1, 2)'.

y_2 = (2, 3, 1, 2)' − (13/10)(1, 2, 1, 2)' = (1/10)(7, 4, −3, −6)',

so z_2 = (1/√110)(7, 4, −3, −6)'. Finally

y_3 = (3, 4, −1, 0)' − (10/10)(1, 2, 1, 2)' − {4/(11/10)}(1/10)(7, 4, −3, −6)' = (1/11)(−6, 6, −10, 2)',

so z_3 = (1/√44)(−3, 3, −5, 1)'.

(c) We have z_1'x = 3/√10, z_2'x = 1/√110, and z_3'x = −1/√11, so

u = (3/√10)z_1 + (1/√110)z_2 − (1/√11)z_3 = (1/2)(1, 1, 1, 1)'.

(d) v = x − u = (1/2)(1, −1, −1, 1)'.
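A numerical re-run of this Gram–Schmidt computation and of the projection in (c)–(d); x_1, x_2, x_3 are taken from the solution above, and x is inferred from parts (c)–(d) as (1, 0, 0, 1)', so treat that as an assumption (this sketch is not part of the original text):

    import numpy as np

    x1 = np.array([1., 2., 1., 2.])
    x2 = np.array([2., 3., 1., 2.])
    x3 = np.array([3., 4., -1., 0.])
    x = np.array([1., 0., 0., 1.])             # inferred from (c)-(d); an assumption

    # classical Gram-Schmidt on {x1, x2, x3}
    z = []
    for v in (x1, x2, x3):
        y = v - sum((v @ zi) * zi for zi in z)
        z.append(y / np.linalg.norm(y))
    Z = np.column_stack(z)

    u = Z @ (Z.T @ x)                          # projection of x onto S = span{x1, x2, x3}
    assert np.allclose(u, [0.5, 0.5, 0.5, 0.5])
    assert np.allclose(x - u, [0.5, -0.5, -0.5, 0.5])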

2.35 We have

X_1 = [1, 1; 2, 1; 3, −1],   P_S = X_1(X_1'X_1)^{-1}X_1' = (1/42)[17, 20, −5; 20, 26, 4; −5, 4, 41],

and so the point in S closest to x is P_S x = (1/21)(16, 25, 20)'.
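A numerical check (not in the original text; x is not reprinted in this manual, but x = (1, 1, 1)' reproduces the stated answer and is used here as an assumption):

    import numpy as np

    X1 = np.array([[1., 1.], [2., 1.], [3., -1.]])
    PS = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
    assert np.allclose(PS, np.array([[17., 20., -5.], [20., 26., 4.], [-5., 4., 41.]]) / 42)

    x = np.array([1., 1., 1.])                 # assumed; consistent with the stated answer
    assert np.allclose(PS @ x, np.array([16., 25., 20.]) / 21)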

14
2.37 (a) Letting xi denote the ith column of PS , we find that α1 x1 +α2 x2 +α3 x3 +α4 x4 = 0 has solutions
α2 = α3 = .5α1 , α4 = α1 for any α1 , so PS is singular. On the other hand, α1 x1 +α2 x2 +α3 x3 = 0
has only the solution α1 = α2 = α3 = 0, so dim(S) = rank(PS ) = 3.

(b) The first 3 columns of PS form a basis for S.

2.39 (a) Denoting the ith column of A by ai , note that a1 = a2 + a3 − a4 and a3 = 43 a4 − 23 a2 , while no
column of A is a scalar multiple of another column. Thus, dim(S) = rank(A) = 2 and any two
columns of A, for instance, {a2 , a4 }, form a basis for S.

(b) We have dim{N (A)} = 4 − rank(A) = 4 − 2 = 2, and so a basis will be 2 linearly independent
vectors satisfying Ax = 0. Note that (A)1· x = 0 yields x2 = − 21 (x1 + x4 ), while (A)3· x = 0 yields
x3 = − 41 (x1 + 3x4 ). By choosing x1 = 4, x4 = 0, and x1 = 0, x4 = 4, we get the basis {x1 , x2 },
where x1 = (4, −2, −1, 0)0 and x2 = (0, −2, −3, 4)0 .

(c) There is no α1 and α2 such that α1 a2 + α2 a4 = (3, 5, 2, 4)0 , so (3, 5, 2, 4)0 ∈


/ S.

(d) A14 6= 0, so 14 ∈
/ N (A).

2.41 Let {x1 , . . . , xr } be a basis for T so that for any x ∈ T , there exist scalars α1 , . . . , αr such that
Pr
x = i=1 αi xi = Xα, where X = (x1 , . . . , xr ) and α = (α1 , . . . , αr )0 . Since u is linear, we have
r
! r
X X
u(x) = u αi xi = αi u(xi ) = U α,
i=1 i=1

where U = (u(x1 ), . . . , u(xr )). But

u(x) = U α = U (X 0 X)−1 X 0 Xα = U (X 0 X)−1 X 0 x = Ax,

where A = U (X 0 X)−1 X 0 .

2.43 Since x̄ = m−1 10m x, we have

u(x) = x − 1m x̄ = Im x − m−1 1m 10m x = Ax,

where A = Im − m−1 1m 10m . Since

A1m = 1m − m−1 1m 10m 1m = 1m − 1m = 0,

and for any vector x that is orthogonal to 1m ,

Ax = x − m−1 1m 10m x = x,

15
it follows that dim{R(A)} = m − 1 and dim{N (A)} = 1.

2.45 (a) The model y = Xβ +  has


   
y11 11
 ..  ..
   
 
 .   .   
1n1 0 ··· 0
   
   
 y1n1   1n1   
 0 1 n2 ··· 0
     
 .   .  
y= ..  , = . , X= .
   .   .. .. .. 
     . . . 
 y      
 k1   k1 
 .. 
 
 ..
 
 0 0 ··· 1 nk
 .   . 
   
yknk knk

(b) Since X 0 X = diag(n1 , . . . , nk ), (X 0 X)−1 = diag(n−1 −1 0 0


1 , . . . , nk ), and X y = (n1 ȳ1 , . . . , nk ȳk ) , the

least squares estimator of β is


 
ȳ1
..
 
β̂ = (X 0 X)−1 X 0 y =  .
 
 . 
ȳk

Since  
ȳ1 1n1
..
 
ŷ = X β̂ =  ,
 
 . 
ȳk 1nk
it then follows that
ni
k X
X
SSE1 = (y − ŷ)0 (y − ŷ) = (yij − ȳi )2 .
i=1 j=1

(c) The reduced model has X = 1n . In this case, X 0 X = n, (X 0 X)−1 = n−1 , and X 0 y = nȳ, so the
least squares estimator of µ in the reduced model is

µ̂ = (X 0 X)−1 X 0 y = n−1 nȳ = ȳ.

This yields ŷ = X µ̂ = ȳ1n , and so the sum of squared errors for this reduced model is
ni
k X
X
0
SSE2 = (y − ȳ1n ) (y − ȳ1n ) = (yij − ȳ)2 .
i=1 j=1

16
The sum of squares for treatment is given by

X ni
k X ni
k X
X
SST = SSE2 − SSE1 = (yij − ȳ)2 − (yij − ȳi )2
i=1 j=1 i=1 j=1

X ni
k X ni
k X
X
2
= (yij − 2ȳyij + ȳ 2 ) − 2
(yij − 2ȳi yij + ȳi2 )
i=1 j=1 i=1 j=1
ni
k X
X
= (−2ȳyij + ȳ 2 + 2ȳi yij − ȳi2 )
i=1 j=1
k
X
= (−2ni ȳ ȳi + ni ȳ 2 + 2ni ȳi2 − ni ȳi2 )
i=1
k
X k
X
= ni (ȳi2 − 2ȳ ȳi + ȳ 2 ) = ni (ȳi − ȳ)2 .
i=1 i=1

(d) In Example 2.11, N is the number of observations, k + 1 is the number of parameters in the
complete model, and k2 is the difference in the number of parameters in the complete and
reduced models. For our models in this problem, these quantities are n, k, and k − 1, respectively.
Making these substitutions along with SSE2 − SSE1 = SST in (2.11), we get

SST /(k − 1)
F = .
SSE1 /(n − k)

2.47 Suppose y i ∈ S1 + S2 for i = 1, 2. Then for i = 1, 2, there are vectors xi ∈ Si and ui ∈ Si such that
y 1 = x1 + x2 and y 2 = u1 + u2 . Now

α1 y 1 + α2 y 2 = α1 (x1 + x2 ) + α2 (u1 + u2 ) = v 1 + v 2 ,

where v 1 = α1 x1 + α2 u1 and v 2 = α1 x2 + α2 u2 . But since S1 and S2 are vector spaces, v i ∈ Si for


i = 1, 2, and so α1 y 1 + α2 y 2 ∈ S1 + S2 and, hence, S1 + S2 is a vector space.

2.49 Suppose x ∈ S1 + S2 . This implies that there is an x1 ∈ S1 and an x2 ∈ S2 such that x = x1 + x2 .


Since x1 ∈ S1 , x1 ∈ S1 ∪ S2 , and since x2 ∈ S2 , x2 ∈ S1 ∪ S2 . Further, x1 ∈ T and x2 ∈ T since
S1 ∪ S2 ⊂ T . But T is a vector space so x = x1 + x2 ∈ T . This confirms that S1 + S2 ⊆ T .

2.51 Each vector in {x1 , . . . , xr y 1 , . . . , y h } is in S1 + S2 since xi ∈ S1 implies that xi ∈ S1 + S2 and y i ∈ S2


implies that y i ∈ S1 + S2 . Now x ∈ S1 + S2 implies that x can be expressed as x = u1 + u2 , where
Pr
u1 ∈ S1 and u2 ∈ S2 . But since {x1 , . . . , xr } spans S1 and {y 1 , . . . , y h } spans S2 ,u1 = i=1 αi xi and

17
Ph
u2 = j=1 βj y j for some scalars α1 , . . . , αr , β1 , . . . , βh . Thus,
r
X h
X
x = u1 + u2 = α i xi + βj y j ,
i=1 j=1

and so {x1 , . . . , xr , y 1 , . . . , y h } spans S1 + S2 .

2.53 (a) max(r1 , r2 ) ≤ dim(S1 + S2 ) ≤ min(m, r1 + r2 ).

(b) max(0, r1 + r2 − m) ≤ dim(S1 ∩ S2 ) ≤ min(r1 , r2 ).

2.55 Let S1 be the space spanned by the vector (1, 0, 0)0 and S2 be the space spanned by the vector (0, 0, 1)0 .
Then T ⊕ S1 = R3 , T ⊕ S2 = R3 , and S1 ∩ S2 = {0}.

2.57 This is an immediate consequence of Theorem 2.25 since

dim(S1 + S2 ) = dim(S1 ) + dim(S2 )

if and only if dim(S1 ∩ S2 ) = 0, that is, S1 ∩ S2 = {0}.

2.59 (a) The rank of PS1 |S2 is 2 and so for a basis we can choose two linearly independent columns of
PS1 |S2 , for instance, {(1, 0, 0)0 , (1, 3, 3)0 }.

(b) Looking at the columns of I3 − PS1 |S2 , we see that S2 is spanned by the vector (1, 3, 2)0 .

2.61 (a) Note that

(PS1 |S2 + PT1 |T2 )2 = PS21 |S2 + PS1 |S2 PT1 |T2 + PT1 |T2 PS1 |S2 + PT21 |T2

= PS1 |S2 + PS1 |S2 PT1 |T2 + PT1 |T2 PS1 |S2 + PT1 |T2 .

When PS1 |S2 PT1 |T2 = PT1 |T2 PS1 |S2 = (0), this reduces to PS1 |S2 + PT1 |T2 so it follows that PS1 |S2 +
PT1 |T2 is a projection matrix. Conversely, if PS1 |S2 + PT1 |T2 is a projection matrix, we must have

PS1 |S2 PT1 |T2 + PT1 |T2 PS1 |S2 = (0).

Premultiplying this equation by PS1 |S2 leads to

PS1 |S2 PT1 |T2 = −PS1 |S2 PT1 |T2 PS1 |S2 ,

while postmultiplying by PS1 |S2 leads to

PT1 |T2 PS1 |S2 = −PS1 |S2 PT1 |T2 PS1 |S2 ,

18
The three preceding equations together imply that

PS1 |S2 PT1 |T2 = PT1 |T2 PS1 |S2 = (0).

(b) Since PS1 |S2 + PT1 |T2 is a projection matrix, it can be written as PU |V . We first show that
U = S1 + T1 . Now for any x ∈ Rm , (PS1 |S2 + PT1 |T2 )x = PS1 |S2 x + PT1 |T2 x, where PS1 |S2 x ∈ S1
and PT1 |T2 x ∈ T1 , so U ⊂ S1 + T1 . Also, any w ∈ S1 + T1 can be written w = x + y, where
x ∈ S1 , y ∈ T1 , and so using the conditions in (a), we have

w = x + y = PS1 |S2 x + PT1 |T2 y = (PS1 |S2 + PT1 |T2 )z,

where z = PS1 |S2 x + PT1 |T2 y is some vector in Rm . Thus, w ∈ U and so S1 + T1 ⊂ U . This
establishes that U = S1 + T1 , and in a similar manner we will show that V = S2 ∩ T2 . For any
x ∈ V , x = (Im − PS1 |S2 − PT1 |T2 )x and, using the conditions in (a),

(Im − PS1 |S2 )x = (Im − PS1 |S2 )(Im − PS1 |S2 − PT1 |T2 )x

= (Im − PS1 |S2 − PT1 |T2 )x = x,

and

(Im − PT1 |T2 )x = (Im − PT1 |T2 )(Im − PS1 |S2 − PT1 |T2 )x

= (Im − PS1 |S2 − PT1 |T2 )x = x.

That is, x ∈ S2 and x ∈ T2 so V ⊂ S1 ∩ S2 . Conversely, suppose x ∈ S2 and x ∈ T2 , so that


(Im − PS1 |S2 )x = x and (Im − PT1 |T2 )x = x. Then

x = (Im − PS1 |S2 )x = (Im − PS1 |S2 )(Im − PT1 |T2 )x

= (Im − PS1 |S2 − PT1 |T2 )x,

so x ∈ V and S2 ∩ T2 ⊂ V . This confirms that V = S2 ∩ T2 .

2.63 (a) Suppose xi ∈ S1 ∩ S2 for i = 1, 2. Since this implies that xi ∈ S1 and xi ∈ S2 for i = 1, 2, and S1
and S2 are convex sets, we have
cx1 + (1 − c)x2 ∈ Sj ,

for j = 1, 2, where 0 < c < 1. This confirms that cx1 + (1 − c)x2 ∈ S1 ∩ S2 , and so S1 ∩ S2 is a
convex set.

19
(b) Suppose z i ∈ S1 + S2 for i = 1, 2, so that there exist xi ∈ S1 and y i ∈ S2 for i = 1, 2, such that
z i = xi + y i . Then, for 0 < c < 1,

z∗ = cz 1 + (1 − c)z 2

= c(x1 + y 1 ) + (1 − c)(x2 + y 2 )

= {cx1 + (1 − c)x2 } + {cy 1 + (1 − c)y 2 }

= x∗ + y ∗ .

Since S1 and S2 are convex sets, it follows that x∗ = cx1 +(1−c)x2 ∈ S1 and y ∗ = cy 1 +(1−c)y 2 ∈
S2 , so z ∗ ∈ S1 + S2 . Thus, S1 + S2 is a convex set.

2.65 Suppose x0i xi ≤ n−1 for i = 1, 2. Then, using the Cauchy-Schwarz inequality, we have for any 0 < c < 1,

{cx1 + (1 − c)x2 }0 {cx1 + (1 − c)x2 } = c2 x01 x1 + 2c(1 − c)x01 x2 + (1 − c)2 x02 x2

≤ c2 x01 x1 + 2c(1 − c)(x01 x1 x02 x2 )1/2 + (1 − c)2 x02 x2

≤ n−1 {c2 + 2c(1 − c) + (1 − c)2 }

= n−1 {c + (1 − c)}2 = n−1 ,

so Bn is convex.
Pr
2.67 Let x ∈ C(S). Then according to Problem 2.66, for some r, x = i=1 αi xi , where each xi ∈ S and
Pr
α1 , . . . , αr are nonnegative scalars satisfying i=1 αi = 1. We will show that if r > m + 1, then x can
be expressed as a convex combination of r − 1 vectors in S. This can be used repeatedly until we have
x expressed as a convex combination of m+1 vectors. Since r −1 > m, the vectors x2 −x1 , . . . , xr −x1
are linearly dependent and so there are scalars β2 , . . . , βr , not all zero, such that
r
X r
X r
X r
X
0= βi (xi − x1 ) = − βi x 1 + β i xi = βi x i ,
i=2 i=2 i=2 i=1
Pr Pr
where β1 = − i=2 βi . Note that i=1 βi = 0 and since not all of the βi are 0, at least one βi is
positive. Thus, for any scalar γ,
r
X r
X r
X r
X
x= αi xi = αi xi − γ βi xi = (αi − γβi )xi .
i=1 i=1 i=1 i=1

20
Suppose we choose γ = min1≤i≤r (αi /βi : βi > 0) so that γ = αj /βj for some j. Then γ > 0,
αi − γβi ≥ 0 for all i, and αj − γβj = 0. Thus,
r
X r
X
x= (αi − γβi )xi = (αi − γβi )xi
i=1 i=1
i6=j

and
r
X r
X r
X r
X
(αi − γβi ) = (αi − γβi ) = αi − γ βi = 1,
i=1 i=1 i=1 i=1
i6=j

so we have expressed x as a convex combination of r − 1 vectors in S.

Chapter 3

3.1 (a) The characteristic equation is

|A − λI_3| = det[9 − λ, −3, −4; 12, −4 − λ, −6; 8, −3, −3 − λ] = −(λ + 1)(λ − 1)(λ − 2) = 0,

so the eigenvalues of A are −1, 1, 2.

(b) An eigenvector corresponding to the −1 eigenvalue satisfies the equation Ax = −x, which leads
to the constraints x2 = 2x1 and x3 = x1 . Thus, an eigenvector has the form (x, 2x, x)0 and a unit
eigenvector is √1 (1, 2, 1)0 . An eigenvector corresponding to the 1 eigenvalue satisfies the equation
6
Ax = x, which leads to the constraints x2 = 0 and x3 = 2x1 . Thus, an eigenvector has the form
(x, 0, 2x)0 and a unit eigenvector is √1 (1, 0, 2)0 .
5
An eigenvector corresponding to the 2 eigenvalue
satisfies the equation Ax = 2x, which leads to the constraints x2 = x1 and x3 = x1 . Thus, an
eigenvector has the form (x, x, x)0 and a unit eigenvector is √1 (1, 1, 1)0 .
3

(c) It follows from Theorem 3.4(a) and Theorem 3.5(a) that

tr(A10 ) = (−1)10 + 110 + 210 = 1026.
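A quick numpy check of (a)–(c), reading A off the characteristic determinant above (not part of the original solution):

    import numpy as np

    A = np.array([[9., -3., -4.],
                  [12., -4., -6.],
                  [8., -3., -3.]])
    evals = np.sort(np.linalg.eigvals(A).real)
    assert np.allclose(evals, [-1., 1., 2.])
    assert np.isclose(np.trace(np.linalg.matrix_power(A, 10)), 1026.0)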

3.3 The eigenvalues of A0 are the same as those of A, so from Problem 3.1 we have the eigenvalues −1, 1, 2.
An eigenvector corresponding to the −1 eigenvalue satisfies the equation A0 x = −x, which leads to the
constraints x1 = −2x2 and x3 = x2 . Thus, SA0 (−1) = {(−2x, x, x)0 : −∞ < x < ∞}. An eigenvector
corresponding to the 1 eigenvalue satisfies the equation A0 x = x, which leads to the constraints x2 = 0
and x3 = −x1 . Thus, SA0 (1) = {(x, 0, −x)0 : −∞ < x < ∞}. An eigenvector corresponding to the 2
eigenvalue satisfies the equation A0 x = 2x, which leads to the constraints x1 = −4x2 and x3 = 2x2 .
Thus, SA0 (2) = {(−4x, x, 2x)0 : −∞ < x < ∞}. While the eigenvalues of A0 are the same as those of
A, we see that the eigenspaces of A0 are not the same as those of A.

3.5 (a) The characteristic equation is

|A − λI4 | = (λ + 2)(λ + 1)(λ − 1)2 = 0,

so the eigenvalues are −2, −1, and 1 with multiplicity two.

22
(b) The equation Ax = −2x leads to the constraints x2 = x4 = 0 and x3 = −x1 so SA (−2) =
{(x, 0, −x, 0)0 : −∞ < x < ∞}. The equation Ax = −x leads to the constraints x1 = −2x3 ,
x2 = x3 and x4 = 0, so SA (−1) = {(−2x, x, x, 0)0 : −∞ < x < ∞}. The equation Ax = x leads to
the constraints x1 = 2x3 and x2 = 3x3 , so SA (1) = {(2x, 3x, x, y)0 : −∞ < x < ∞ −∞ < y < ∞}.

3.7 (a) Since N ≥ 2k + 1, we can find an N × k matrix W satisfying W 0 [Z1 y] = (0). We can choose
the columns of W so that they are orthogonal and scaled so that W 0 W = γIk . Under these
conditions, the ordinary least squares estimate of δ 1 in the model

y = δ0 1N + (Z1 + W )δ 1 + 

is

{(Z1 + W )0 (Z1 + W )}−1 (Z1 + W )0 y = (Z10 Z1 + Z10 W + W 0 Z1 + W 0 W )−1 (Z10 y + W 0 y)

= (Z10 Z1 + γIk )−1 Z10 y.

(b) Let U be any k × k matrix satisfying U 0 U = γIk . Then the ordinary least squares estimate of δ 1
in the model        
y δ0 1N Z1 
 = +  δ1 +  
0 0 U ∗

is
 0  −1  0  
 Z Z1  Z1 y
1
        = (Z10 Z1 + U 0 U )−1 Z10 y
 U U  U 0
= (Z10 Z1 + γIk )−1 Z10 y.

3.9 Suppose B is nonsingular. Then since |B||B −1 | = 1, we have

|AB − λIm | = |B||AB − λIm ||B −1 | = |BABB −1 − λBB −1 | = |BA − λIm |.

Thus, the characteristic equations of AB and BA are the same so they have the same eigenvalues.
Similarly, if A is nonsingular, we have

|AB − λIm | = |A−1 ||AB − λIm ||A| = |A−1 ABA − λA−1 A| = |BA − λIm |.

23
3.11 (a) P is an orthogonal matrix since
 
cos2 θ + sin2 θ cos θ sin θ − cos θ sin θ
P 0P =   = I2 .
cos θ sin θ − cos θ sin θ cos2 θ + sin2 θ

(b) The characteristic equation of P is

|P − λI2 | = (cos θ − λ)2 + sin2 θ = λ2 − 2λ cos θ + 1,

which yields the two solutions λ = cos θ + i sin θ and λ = cos θ − i sin θ.

3.13 (a) Note that


|A0 − λIm | = |(A − λIm )0 | = |A − λIm |.

Thus, A0 and A have the same characteristic equation, and so they have the same eigenvalues.

(b) A is singular if and only if its columns are linearly dependent, that is, if and only if Ax = 0 for
some nonnull x. However, this is equivalent to saying A has a zero eigenvalue.

(c) It follows from Problem 1.22 that the characteristic equation of a triangular matrix A is
m
Y
|A − λIm | = (aii − λ),
i=1

thereby confirming that the diagonal elements of A are its eigenvalues.

(d) Note that

|BAB −1 − λIm | = |BAB −1 − λBB −1 | = |B||A − λIm ||B −1 | = |A − λIm |.

Consequently, BAB −1 and A have identical characteristic equations, and so the two matrices have
identical eigenvalues.

(e) Although A is a real matrix, an eigenvalue λ and corresponding eigenvector x of A could be


complex. Since A0 A = Im and Ax = λx, we have

x∗ x = x∗ A0 Ax = (Ax)∗ Ax = (λx)∗ λx = λ̄λx∗ x.

Note that x∗ x 6= 0, and so λ̄λ =1 or |λ| = 1.

24
3.15 It is easily verified that the matrices

A = [0, 0; 0, 0],   C = [0, 1; 0, 0],

both have eigenvalue 0 with multiplicity 2. Clearly, no B exists for which C = BAB^{-1} since rank(A) =
0 and rank(C) = 1.

3.17 If rank(A−λIm ) = m−1, then there is one linearly independent vector x satisfying (A−λIm )x = 0 or,
equivalently, Ax = λx. Thus, λ is an eigenvalue of A and dim{SA (λ)} = 1. It follows from Theorem
3.3 that the multiplicity of the eigenvalue λ is at least one.

3.19 Since A is a triangular matrix, it follows from Theorem 3.2(c) that it has eigenvalue 1 with multiplicity
m. The eigenvalue-eigenvector equation, Ax = x, leads to the m equations xi + xi+1 = xi , for
i = 1, . . . , m − 1, and xm = xm . It follows that xi+1 = 0 for i = 1, . . . , m − 1 and x1 is arbitrary. Thus,
there is only one linearly independent eigenvector of the form (x, 0, . . . , 0)0 .

3.21 (a) Since


(Im + A)xi = xi + Axi = xi + λi xi = (1 + λi )xi ,

1 + λi is an eigenvalue of Im + A corresponding to the eigenvector xi . From Theorem 3.4(b), it


then follows that (1 + λi )−1 is an eigenvalue of (Im + A)−1 corresponding to the eigenvector xi .

(b) Since
(A + A−1 )xi = Axi + A−1 xi = λi xi + λ−1 −1
i xi = (λi + λi )xi ,

λi + λ−1
i is an eigenvalue of A + A−1 corresponding to the eigenvector xi .

(c) Since
(Im + A−1 )xi = xi + A−1 xi = xi + λ−1 −1
i xi = (1 + λi )xi ,

1 + λ−1
i is an eigenvalue of Im + A−1 corresponding to the eigenvector xi .

3.23 (a) Using Theorem 3.4(b) (see also Problem 3.21), we have

Bx = (Im + A)−1 x + (Im + A−1 )−1 x = (1 + λ)−1 x + (1 + λ−1 )−1 x


1 + λ−1 + 1 + λ
= x = x,
(1 + λ)(1 + λ−1 )

so 1 is an eigenvalue of B corresponding to the eigenvector x.

25
(b) An application of Theorem 1.9 reveals that

(Im + A)−1 = Im − (Im + A−1 )−1

from which it immediately follows that B = Im .

3.25 (a) The characteristic equation of A is

|A − λI2 | = λ2 − (a11 + a22 )λ + (a11 a22 − a12 a21 ).

(b) Using the quadratic formula, we have


p p
a11 + a22 ± (a11 + a22 )2 − 4(a11 a22 − a12 a21 ) a11 + a22 ± (a11 − a22 )2 + 4a12 a21
= .
2 2
(c) The eigenvalues are real when (a11 − a22 )2 + 4a12 a21 ≥ 0.

3.27 Note that 1m 10m 1m = m1m , while 1m 10m x = 0x for any x orthogonal to 1m . Thus, m is an eigenvalue
with corresponding eigenvector being any vector in the space S spanned by 1m , and 0 is an eigenvalue,
having multiplicity m − 1, with corresponding eigenvector being any vector in S ⊥ .

3.29 (a) Note that (αIm + β1m 10m )1m = (α + mβ)1m , while (αIm + β1m 10m )x = αx for any x orthogonal
to 1m . Thus, α + mβ is an eigenvalue with corresponding eigenvector being any vector in the
space S spanned by 1m , and α is an eigenvalue, having multiplicity m − 1, with corresponding
eigenvector being any vector in S ⊥ .

(b) The eigenspaces are S and S ⊥ as indicated in part (a). The eigenprojection corresponding to the
eigenvalue α + mβ is
1
PS =1m 10m ,
m
while the eigenprojection corresponding to the eigenvalue α is
1
PS ⊥ = Im − PS = Im − 1m 10m .
m
(c) None of the eigenvalues can be 0, so we must have α 6= 0 and β 6= −α/m.

(d) We have

A^{-1} = (α + mβ)^{-1} (1/m) 1_m 1_m' + α^{-1} {I_m − (1/m) 1_m 1_m'}
       = α^{-1} I_m + {1/(α + mβ) − 1/α}(1/m) 1_m 1_m'
       = α^{-1} I_m − {β/(α(α + mβ))} 1_m 1_m'.
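A numerical check of this closed-form inverse (not in the original; the values of α, β, m below are arbitrary choices with α ≠ 0 and β ≠ −α/m):

    import numpy as np

    m, alpha, beta = 6, 2.0, 0.3
    J = np.ones((m, m))                        # 1_m 1_m'
    A = alpha * np.eye(m) + beta * J
    A_inv = np.eye(m) / alpha - beta / (alpha * (alpha + m * beta)) * J
    assert np.allclose(A @ A_inv, np.eye(m))
    # the determinant formula derived in part (e) below
    assert np.isclose(np.linalg.det(A), alpha ** (m - 1) * (alpha + m * beta))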

26
(e) The determinant is the product of the eigenvalues so

|A| = αm−1 (α + mβ).

3.31 (a) The characteristic equation of A is

|A − λI3 | = −λ(λ − 2)(λ − 3) = 0,

so A has eigenvalues 0, 2, 3. The equation Ax = 0 leads to the constraints x1 = x2 /2 and


x3 = −x2 /2, so a normalized eigenvector corresponding to 0 is √1 (1, 2, −1)0 . The equation
6
Ax = 2x leads to the constraints x1 = x3 and x2 = 0, so a normalized eigenvector corresponding
to 2 is √1 (1, 0, 1)0 . The equation Ax = 3x leads to the constraints x1 = −x2 and x3 = x2 , so a
2
normalized eigenvector corresponding to 3 is √1 (1, −1, −1)0 .
3

(b) The rank of A is 2 since it is symmetric and has two nonzero eigenvalues.

(c) The eigenspaces are

SA(0) = {(x, 2x, −x)' : −∞ < x < ∞},
SA(2) = {(x, 0, x)' : −∞ < x < ∞},
SA(3) = {(x, −x, −x)' : −∞ < x < ∞},

while the corresponding eigenprojections are

P_SA(0) = (1/6)[1, 2, −1; 2, 4, −2; −1, −2, 1],   P_SA(2) = (1/2)[1, 0, 1; 0, 0, 0; 1, 0, 1],   P_SA(3) = (1/3)[1, −1, −1; −1, 1, 1; −1, 1, 1].

(d) tr(A4 ) = 04 + 24 + 34 = 97.

3.33 It follows from Theorem 3.4(a) that


m
X
tr(A2 ) = λ2i .
i=1

But since A is symmetric, we also have


m
X m
X m X
X m
tr(A2 ) = tr(A0 A) = (A0 A)jj = (A0 )j· (A)·j = a2ij .
j=1 j=1 i=1 j=1

Equating these two equations gives the result.

27
3.35 Let λ1 , . . . , λm denote the eigenvalues of A. Using Problem 3.33, we have
m
X m X
X m m
X m X
X m
λ2i = a2ij = a2ii + a2ij
i=1 i=1 j=1 i=1 i=1 j=1
j6=i
m
X m X
X m
= λ2i + a2ij .
i=1 i=1 j=1
j6=i

Thus, the final double sum must reduce to zero and this requires aij = 0 for all i 6= j.

3.37 Since A0 A is symmetric and rank(A0 A) = rank(A) = r, A0 A can be written as


  
0
Λ (0) U
A0 A = [U X]   ,
(0) (0) X0

where Λ is an r × r diagonal matrix, U and X are n × r and n × (n − r) matrices satisfying U 0 U = Ir ,


X 0 X = In−r , and X 0 U = (0). It follows that X 0 A0 AX = (0) which leads to AX = (0). Similarly, since
AA0 is symmetric and rank(AA0 ) = rank(A) = r, AA0 can be written as
  
Λ (0) V
AA0 = [V 0 Y 0 ]   ,
(0) (0) Y

where Λ is an r × r diagonal matrix, V 0 and Y 0 are m × r and m × (m − r) matrices satisfying V V 0 = Ir ,


Y Y 0 = Im−r , and V Y 0 = (0). It follows that Y AA0 Y 0 = (0) which leads to Y A = (0).

3.39 (a) Let λ1 , . . . , λm be the eigenvalues of A, so that by Theorem 3.4(a), λk1 , . . . , λkm are the eigenvalues
of Ak . But Ak = (0), so every eigenvalue of Ak is 0. That is, λki = 0 for all i, so we must have
λi = 0 for all i.

(b) The matrix

A = [0, 1; 0, 0]

satisfies A^2 = (0).

3.41 We will prove the max-min result; the proof of the min-max result is similar. Note that we may
Pk 0
without loss of generality consider the max-min of the sum j=1 xj Axj if we restrict each xj to

be a unit vector. Now suppose the columns of Cij are eigenvectors corresponding to the eigenvalues
λij +1 , . . . , λm . Then Ci0j xj = 0 implies that xj is a linear combination of orthonormal eigenvectors

28
corresponding to the eigenvalues λ1 , . . . , λij , and so we must have x0j Axj ≥ λij . Thus, with this
particular choice for Ci1 , . . . , Cik , we get
k
X k
X
x0j Axj ≥ λij .
j=1 j=1
Pk
That is, the max-min must be greater than or equal to λij . As a result, the proof will be complete
j=1
Pk
if we can show that the max-min is less than or equal to j=1 λij . We prove this result by induction
on m. Note that the result is trivially true when k = m. In particular, the result is true when m = 1.
Now suppose m > 1, k < m, and the result holds for (m − 1) × (m − 1) matrices. Let Ci1 , . . . , Cik be
given matrices satisfying the conditions of the theorem. We need to show that there are orthonormal
Pk Pk
vectors x1 , . . . , xk , with Ci0j xj = 0, such that j=1 x0j Axj ≤ j=1 λij .

First suppose that ik < m so that dim{N (Ci0k )} ≤ m − 1. Let the columns of Y = (y 1 , . . . , y m−1 ) be
orthonormal vectors satisfying Ci0j y l = 0 for l = 1, . . . , ij and j = 1, . . . , k. Then if µ1 , . . . , µm−1 are
the eigenvalues of B = Y 0 AY , it follows from Theorem 3.19 that

µi ≤ λi , for i = 1, . . . , m − 1. (1)

Let Dij = Y 0 Cij , so that if Di0j u = 0, then Ci0j w = 0 and u0 Bu = w0 Aw, where w = Y u. Thus,
since B is (m − 1) × (m − 1), we know by our induction hypothesis that there are orthonormal vectors
u1 , . . . , uk satisfying Di0j uj = 0 and
k
X k
X
u0j Buj ≤ µij . (2)
j=1 j=1
Pk Pk
Thus, if wj = Y uj , then Ci0j wj = 0, j=1 u0j Buj = j=1 w0j Awj , and so from (1) and (2)
k
X k
X
w0j Awj ≤ λ ij ,
j=1 j=1

as required.

Next suppose that ik = m and let l be the largest index for which il +1 < il+1 . We know such an l exists
since k < m. Let Sm−1 be an (m − 1)-dimensional space that contains N (Ci0l ) and the eigenvectors of
A corresponding to the eigenvalues λil+1 , . . . , λm ; there is such a space since m − il+1 + 1 + il < m. By
the definition of l, it follows that il+1 , . . . , m are among the indices i1 , . . . , ik and

0
N (Ci0l ) ⊂ N (Ci0l+1 ) ∩ Sm−1 ⊂ · · · ⊂ N (Cm−1 ) ∩ Sm−1 ⊂ Sm−1 .

29
Since for j = il+1 , . . . , m − 1, dim{N (Cj0 ) ∩ Sm−1 ) ≥ j − 1, we can find subspaces Sil+1 −1 , . . . , Sm−2
such that dim(Sj ) = j,
Sil+1 −1 ⊂ N (Ci0l+1 ), . . . , Sm−2 ⊂ N (Cm−1
0
)

and
N (Ci01 ) ⊂ · · · ⊂ N (Ci0l ) ⊂ Sil+1 −1 ⊂ · · · ⊂ Sm−2 ⊂ Sm−1 .

Consider orthonormal vectors x1 , . . . , xk that satisfy xj ∈ N (Ci0j ) for j = 1, . . . , l and xj ∈ Sij −1 for
j = l + 1, . . . , k. Since each xj ∈ Sm−1 , there exist an m × (m − 1) semiorthogonal matrix Y and
orthonormal (m − 1) × 1 vectors u1 , . . . , uk such that xj = Y uj . As a result, by using the induction
hypothesis on the matrix B = Y 0 AY , we have
k
X k
X l
X m−1
X l
X m−1
X
x0j Axj = u0j Buj ≤ µij + µj ≤ λij + µj , (3)
j=1 j=1 j=1 j=il+1 −1 j=1 j=il+1 −1

where the second inequality utilized Theorem 3.19. Since the eigenvectors of A corresponding to the
eigenvalues λil+1 , . . . λm are in Sm−1 , it follows that λil+1 , . . . λm are also eigenvalues of B. Since
µil+1 −1 , . . . , µm−1 are the smallest eigenvalues of B, this then guarantees that
m−1
X m
X
µj ≤ λj ,
j=il+1 −1 j=il+1

and when this is substituted into (3), we get the desired result.

3.43 Consider the matrices

A = [2, 0; 1, 2],   B = [2, 1; 0, 2].

Both matrices have the eigenvalue 2 with multiplicity 2. Theorem 3.21 would hold only if A + B had
the eigenvalue 4 with multiplicity 2. However, the eigenvalues of A + B are 3 and 5.

3.45 For any m × (m − h) matrix Ch satisfying Ch0 Ch = Im−h , we have


x0 (A + B)x
 0
x Ax x0 Bx

min = min +
Ch0
x=0 x0 x 0
Ch x=0 x0 x x0 x
x6=0 x6=0
x0 Ax x0 Bx
≥ min 0
+ min
0
Ch x=0 xx 0
Ch x=0 x0 x
x6=0 x6=0
0
x Ax x0 Bx
≥ min 0
+ min 0
0
Ch x=0 xx x6=0 x x
x6=0

30
x0 Ax
= min + λm (B)
0
Ch x=0 x0 x
x6=0
x0 Ax
≥ min ,
0
Ch x=0 x0 x
x6=0

where the last equality follows from Theorem 3.16. Now maximizing both sides of the equation above
over all choices of Ch satisfying Ch0 Ch = Im−h and using (3.10) of Theorem 3.18, we get

x0 (A + B)x
λh (A + B) = max min
Ch 0
Ch x=0 x0 x
x6=0
x0 Ax
≥ max min = λh (A).
Ch 0
Ch x=0 x0 x
x6=0

3.47 Define G, E, T and H as in the proof of Theorem 3.31 so that A = GΛG0 , B = GG0 , E = G0 F ,
E 0 E = T 2 , and H = ET −1 .

(a) Using the lower bound in Theorem 3.19, we have

λh−i+1 {(F 0 BF )−1 (F 0 AF )} = λh−i+1 {(F 0 GG0 F )−1 F 0 GΛG0 F } = λh−i+1 {(T 2 )−1 E 0 ΛE}

= λh−i+1 (T −1 E 0 ΛET −1 ) = λh−i+1 (H 0 ΛH) ≥ λm−h+(h−i+1) (Λ)

= λm−i+1 (Λ) = λm−i+1 (B −1 A).

(b) We have

min λ1 {(F 0 BF )−1 (F 0 AF )} = min λ1 {(E 0 E)−1 E 0 ΛE}


F E

= min λ1 (T −1 E 0 ΛET −1 ) = min λ1 (H 0 ΛH)


E H
−1
= λm−h+1 (B A),

where the last equality follows from the lower bound in Theorem 3.19 and the fact that the bound
is attained when H 0 = [(0) Ih ].

(c) We have

min λh {(F 0 BF )−1 (F 0 AF )} = min λh {(E 0 E)−1 E 0 ΛE}


F E

= min λh (T −1 E 0 ΛET −1 ) = min λh (H 0 ΛH)


E H
−1
= λm (B A),

31
where the last equality follows from the lower bound in Theorem 3.19 and the fact that the bound
is attained when H 0 = [(0) Ih ].

3.49 (a) Now

TT' = [6, 3; 3, 6]

and

|TT' − λI_2| = (λ − 3)(λ − 9),

so the eigenvalues of TT' are 3 and 9. The equation TT'x = 3x yields the constraint x_2 = −x_1,
so an eigenvector corresponding to 3 has the form (x, −x)'. The equation TT'x = 9x yields the
constraint x_2 = x_1, so an eigenvector corresponding to 9 has the form (x, x)'.

(b) We have

T'T = [5, 1, 4; 1, 2, −1; 4, −1, 5].

The positive eigenvalues of TT' and T'T are the same, so T'T has eigenvalues 0, 3, and 9. The
equation T'Tx = 0x yields the constraints x_3 = x_2 = −x_1, so an eigenvector corresponding to 0
has the form (x, −x, −x)'. The equation T'Tx = 3x yields the constraints x_2 = 2x_1 and x_3 = −x_1,
so an eigenvector corresponding to 3 has the form (x, 2x, −x)'. The equation T'Tx = 9x yields
the constraints x_2 = 0 and x_3 = x_1, so an eigenvector corresponding to 9 has the form (x, 0, x)'.
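T itself is not reprinted in this manual; T = [2, 1, 1; 1, −1, 2] reproduces both Gram matrices above, so it is used here as a reconstructed assumption for a quick numpy check:

    import numpy as np

    T = np.array([[2., 1., 1.],
                  [1., -1., 2.]])              # reconstructed; consistent with TT' and T'T above
    assert np.allclose(T @ T.T, [[6., 3.], [3., 6.]])
    assert np.allclose(T.T @ T, [[5., 1., 4.], [1., 2., -1.], [4., -1., 5.]])
    # TT' and T'T share their positive eigenvalues (3 and 9)
    print(np.sort(np.linalg.eigvalsh(T @ T.T)), np.sort(np.linalg.eigvalsh(T.T @ T)))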

3.51 Using the spectral decomposition of A, we have

A = XΛX 0 = (XΛ1/2 )(Λ1/2 X 0 ) = Y 0 Y,

where Y = (y 1 , . . . , y m ) = Λ1/2 X 0 . Note that aii = y 0i y i and so aii = 0 implies y i = 0. But this
implies aij = aji = 0 for any j since aij = aji = y 0i y j .

3.53 (a) Since


(A + B)xi = Axi + Bxi = λi xi + γi xi = (λi + γi )xi ,

it follows that λ1 + γ1 , . . . , λm + γm are the eigenvalues of A + B with corresponding eigenvectors


x1 , . . . , xm .

32
(b) Since
ABxi = Aγi xi = γi Axi = λi γi xi ,

it follows that λ1 γ1 , . . . , λm γm are the eigenvalues of AB with x1 , . . . , xm as the corresponding


eigenvectors.

(c) The spectral decompositions of A and B have a common orthogonal matrix, that is, they can be
written as A = XΛX 0 and B = XΓX 0 . The matrices Λ and Γ are diagonal so ΛΓ = ΓΛ. As a
result,
AB = XΛX 0 XΓX 0 = XΛΓX 0 = XΓΛX 0 = XΓX 0 XΛX 0 = BA.

3.55 Let X = (X1 , X2 ) and Λ = diag(Λ1 , Λ2 ), where X1 = (x1 , . . . , xk−1 ), X2 = (xk , . . . , xm ), Λ1 =


diag(λ1 , . . . , λk−1 ), and Λ2 = diag(λk , . . . , λm ). Note that

A − λIm = XΛX 0 − λXX 0

= X1 Λ1 X10 + X2 Λ2 X20 − λX1 X10 − λX2 X20

= X1 (Λ1 − λIk−1 )X10 + X2 (Λ2 − λIm−k+1 )X20 .

Since P = X2 X20 , we have

P (A − λIm )P = X2 X20 (A − λIm )X2 X20 = X2 (Λ2 − λIm−k+1 )X20 , (1)

which clearly is (0) when λk = · · · = λm = λ. Conversely, if P (A − λIm )P = (0), then premultiplying


(1) by X20 and postmultiplying by X2 confirms that λk = · · · = λm = λ.
Pr
3.57 (a) Since tr(B 0 AB) = i=1 λi (B 0 AB), it follows from Theorem 3.19 that
r
X r
X
λm−i+1 ≤ tr(B 0 AB) ≤ λi ,
i=1 i=1

if B 0 B = Ir . Note that the lower bound is attained when B contains as its columns, normalized
eigenvectors of A corresponding to the eigenvalues λm−r+1 , . . . , λm , and so we must have
r
X
min
0
tr(B 0 AB) = λm−i+1 .
B B=Ir
i=1

Similarly, the upper bound is attained when B contains as its columns, normalized eigenvectors
of A corresponding to the eigenvalues λ1 , . . . , λr , and so we must have
r
X
max
0
tr(B 0 AB) = λi .
B B=Ir
i=1

33
(b) Using the bounds given in part (a) and the choice of B = (e1 , . . . , er ), the result follows

3.59 Using the spectral decomposition of C, we have

C = XΛX 0 = XΛ1/2 Λ1/2 X 0 = T T 0 ,

where the m × m matrix T = XΛ1/2 is nonsingular since C is positive definite.

(a) Using Theorem 3.2(d) and Theorem 3.32(a), we find

λi (AB) = λi (T −1 ABT ) = λi (T −1 AT −10 T 0 BT )

≤ λj (T −1 AT −10 )λk (T 0 BT ) = λj (AT −10 T −1 )λk (T T 0 B)

= λj (AC −1 )λk (CB).

(b) Using Theorem 3.2(d) and Theorem 3.32(b), we find

λm−i+1 (AB) = λm−i+1 (T −1 ABT ) = λm−i+1 (T −1 AT −10 T 0 BT )

≥ λm−j+1 (T −1 AT −10 )λm−k+1 (T 0 BT ) = λm−j+1 (AT −10 T −1 )λm−k+1 (T T 0 B)

= λm−j+1 (AC −1 )λm−k+1 (CB).

3.61 For a proof of this result, see Khattree, R., 2002, Generalized antieigenvalues and antieigenvectors,
American Journal of Mathematical and Management Sciences 22, 89–98.

Chapter 4

4.1 Now

AA' = [10, 4; 4, 4],

and |AA' − λI_2| = (12 − λ)(2 − λ), so the eigenvalues of AA' are 2 and 12. The equation AA'x = 2x
leads to the constraint x_2 = −2x_1, so an eigenvector corresponding to 2 has the form (x, −2x)'. The
equation AA'x = 12x leads to the constraint x_1 = 2x_2, so an eigenvector corresponding to 12 has the
form (2x, x)'. The positive eigenvalues of

A'A = [2, 3, 3, 0; 3, 5, 5, 1; 3, 5, 5, 1; 0, 1, 1, 2]

are also 2 and 12. The equation A'Ax = 2x leads to the constraints x_2 = x_3 = 0 and x_4 = −3x_1,
so an eigenvector corresponding to 2 has the form (x, 0, 0, −3x)'. The equation A'Ax = 12x leads
to the constraints x_1 = 3x_4 and x_2 = x_3 = 5x_4, so an eigenvector corresponding to 12 has the form
(3x, 5x, 5x, x)'. As a result, the singular value decomposition of A is given by

A = [1/√5, 2/√5; −2/√5, 1/√5] diag(√2, √12) [−1/√10, 0, 0, 3/√10; 3/√60, 5/√60, 5/√60, 1/√60].
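Multiplying out these three factors gives A = [1, 2, 2, 1; 1, 1, 1, −1], which can be used for a quick numpy check (the explicit A is a reconstruction, since it is not reprinted in this manual):

    import numpy as np

    A = np.array([[1., 2., 2., 1.],
                  [1., 1., 1., -1.]])          # reconstructed from the factors above
    assert np.allclose(A @ A.T, [[10., 4.], [4., 4.]])

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    assert np.allclose(np.sort(s), np.sqrt([2., 12.]))   # singular values sqrt(2) and sqrt(12)
    assert np.allclose(U @ np.diag(s) @ Vt, A)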

4.3 A has a zero eigenvalue if and only if rank(A) = r < m, and this condition is equivalent to rank(AA0 ) =
r < m which hold if and only of A has r singular values.

4.5 We prove the result when m ≥ n. The proof for m < n is similar. Let ∆ be the n × n matrix
∆ = diag(µ1 , . . . , µr , 0, . . . , 0). Then the singular value decomposition of A can be written
   
∆ ∆
A=P  Q0 = [P1 P2 ]   Q0 = P1 ∆Q0 ,
(0) (0)

where P1 is m × n and P2 is m × (m − n). Let


 √ √ 
P1 / 2 −P1 / 2 P2
U = √ √ .
Q/ 2 Q/ 2 (0)

35
Then U U 0 = U 0 U = Im+n and
 
∆ (0) (0)  
 0  (0) A
 
U  (0) −∆ (0)  U = ,

  A0 (0)
(0) (0) (0)

and so the result follows.

4.7 Let A = xy 0 . Note that


AA0 x = (x0 x)(y 0 y)x,

and
A0 Ay = (x0 x)(y 0 y)y.

Thus, (x0 x)(y 0 y) is the one positive eigenvalue of AA0 and A0 A with corresponding unit eigenvectors
p
given by p = (x0 x)−1/2 x and q = (y 0 y)−1/2 y. The singular value is d = (x0 x)(y 0 y) and the singular
value decomposition is then given by A = xy 0 = pdq 0 .

4.9 If A = P1 ∆Q01 and B = Q1 ∆−1 P10 , then since P10 P1 = Q01 Q1 = Ir ,

ABA = P1 ∆Q01 Q1 ∆−1 P10 P1 ∆Q01 = P1 ∆∆−1 ∆Q01 = P1 ∆Q01 = A,

and
BAB = Q1 ∆−1 P10 P1 ∆Q01 Q1 ∆−1 P10 = Q1 ∆−1 ∆∆−1 P10 = Q1 ∆−1 P10 = B.

4.11 (a) Note that ŷ can be expressed as

ŷ = {N −1 10N + z 0 (Z10 Z1 )−1 Z10 }y.

Since E(ŷ) = δ0 + z 0 δ 1 = θ, Z10 Z1 = U D2 U 0 , and v = D−1 U 0 z, we have

MSE(ŷ) = var(ŷ) = {N −1 10N + z 0 (Z10 Z1 )−1 Z10 }(σ 2 IN ){N −1 10N + z 0 (Z10 Z1 )−1 Z10 }0

= σ 2 {N −1 + z 0 (Z10 Z1 )−1 z} = σ 2 {N −1 + z 0 U D−2 U 0 z}


k
!
X
2 −1 0 2 −1 2
= σ {N + v v} = σ N + vi .
i=1

36
(b) Note that since V10 1N = 0, ȳ and z 0 U1 D1−1 V10 y are independently distributed, and so

var(ỹ) = var(ȳ) + var(z 0 U1 D1−1 V10 y)

= N −1 σ 2 + (z 0 U1 D1−1 V10 )(σ 2 IN )(z 0 U1 D1−1 V10 )0

= σ 2 (N −1 + z 0 U1 D1−2 U10 z)
k−r
!
X
−1
= σ (N 2
+ v 01 v 1 ) =σ 2
N −1
+ vi2 ,
i=1

where v = (v 01 , v 02 )0 and v 1 is (k − r) × 1. We also have (see the solution to Problem 3.8)

E(ỹ) = E(ȳ) + z 0 U1 D1−1 V10 E(y) = δ0 + z 0 δ 1 − z 0 U2 D2−2 U20 Z10 E(y), (1)

and so

{E(ỹ) − θ}2 = (z 0 U2 D2−2 U20 Z10 Z1 δ 1 )2 = (z 0 U2 D2−2 U20 U D2 U 0 δ 1 )2

= ([(0) z 0 U2 D2−2 ]D2 U 0 δ 1 )2 = (z 0 U2 D2−1 D2 U20 δ 1 )2


k
!2
X
= (v 02 D2 α12 )2 = di vi αi , (2)
i=k−r+1

where α1 = (α011 , α012 )0 = U 0 δ 1 . Combining (1) and (2) yields the result.

(c) MSE(ỹ) <MSE(ŷ) when d2k vk2 αk2 < σ 2 vk2 , that is, when d2k αk2 < σ 2 .

4.13 (a) The characteristic equation of A is

|A − λI_3| = −(λ − 4)^2(λ − 1) = 0,

so the eigenvalues of A are 4 with multiplicity two and 1. The equation Ax = x leads to the
constraints x_2 = −x_1 and x_3 = x_1, so an eigenvector corresponding to 1 has the form (x, −x, x)'.
The equation Ax = 4x leads to the constraint x_3 = x_2 − x_1. Consequently, two orthogonal
eigenvectors corresponding to 4 are of the form (x, x, 0)' and (x, −x, −2x)'. As a result, the
spectral decomposition of A can be expressed as A = XΛX', where

X = [1/√3, 1/√2, 1/√6; −1/√3, 1/√2, −1/√6; 1/√3, 0, −2/√6],   Λ = diag(1, 4, 4).

(b) The symmetric square root of A is given by

XΛ^{1/2}X' = (1/3)[5, 1, −1; 1, 5, 1; −1, 1, 5].

(c) A nonsymmetric square root matrix of A has the form A^{1/2} = XΛ^{1/2}Q', where Q is an orthogonal
matrix other than X. For instance, when Q = I_3, we get

A^{1/2} = [1/√3, √2, 2/√6; −1/√3, √2, −2/√6; 1/√3, 0, −4/√6].
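A numpy sketch of (a) and (b); A is not reprinted in this manual, but A = [3, 1, −1; 1, 3, 1; −1, 1, 3] is the matrix recovered by squaring the symmetric square root above, so it is used as a reconstructed assumption:

    import numpy as np

    A = np.array([[3., 1., -1.],
                  [1., 3., 1.],
                  [-1., 1., 3.]])              # reconstructed assumption
    lam, X = np.linalg.eigh(A)                 # spectral decomposition A = X diag(lam) X'
    assert np.allclose(np.sort(lam), [1., 4., 4.])

    A_half = X @ np.diag(np.sqrt(lam)) @ X.T   # symmetric square root
    assert np.allclose(A_half @ A_half, A)
    assert np.allclose(A_half, np.array([[5., 1., -1.], [1., 5., 1.], [-1., 1., 5.]]) / 3)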

4.15 This is an application of the previous problem. The characteristic equation of A is

|A − λI3 | = −λ(λ − 5)(λ − 10) = 0,

so the nonzero eigenvalues of A are 5 and 10. The equation Ax = 5x leads to the constraints x2 = 0 and
3x3 = −4x1 , so an eigenvector corresponding to 5 has the form (3x, 0, −4x)0 . The equation Ax = 10x
leads to the constraints 4x2 = 5x1 and 4x3 = 3x1 , so an eigenvector corresponding to 10 has the form
(4x, 5x, 3x)'. Consequently, we can compute T as

T = [3/5, 4/√50; 0, 5/√50; −4/5, 3/√50] diag(√5, √10) = (1/√5)[3, 4; 0, 5; −4, 3].

4.17 (a) An eigenanalysis of A reveals that it has eigenvalue 0 with multiplicity two and eigenvalue 3.
By inspection of the equation Ax = 0, we find any eigenvector corresponding to 0 has the form
(−x, −2x, x)0 . Since there is only one linearly independent eigenvector corresponding to the 0
eigenvalue, A is not diagonalizable. It is easily shown that B has eigenvalues 0 and 1 with
multiplicity two. The equation Bx = x yields the single constraint x2 = x1 , so an eigenvector
corresponding to 1 has the form (x, x, y)0 . Since there are two linearly independent vectors of
this form, B is diagonalizable. Finally, an eigenanalysis of C reveals that it has eigenvalue
4 with multiplicity two and eigenvalue 0. By inspection of the equation Cx = 4x, we find
any eigenvector corresponding to 4 has the form (x, x, −x)0 . Since there is only one linearly
independent eigenvector corresponding to the 4 eigenvalue, C is not diagonalizable.

38
(b) Each matrix has rank of 2 so B and C have their rank equal to the number of nonzero eigenvalues.

4.19 Suppose A is nonsingular. Since AB is diagonalizable, there exist an m × m nonsingular matrix X and
an m × m diagonal matrix D such that

AB = XDX −1 .

Thus, B = A−1 XDX −1 and


BA = A−1 XDX −1 A = Y DY −1 ,

where Y = A−1 X. This confirms that BA is diagonalizable. A similar proof can be constructed for
the case in which B is nonsingular. Next consider the 2 × 2 matrices
   
0 1 1 0
A= , B= .
0 0 0 0

Then AB = (0) is a diagonal matrix, while BA = A is not diagonalizable.

4.21 Let x1 , . . . , xm be eigenvectors of A corresponding to the eigenvalues λ1 , . . . , λm , and let y 1 , . . . , y n be


eigenvectors of B corresponding to the eigenvalues γ1 , . . . , γn . It follows from Problem 3.48 that the
eigenvectors of C are (x01 , 00 )0 , . . . , (x0m , 00 ), (00 , y 1 )0 , . . . , (00 , y n )0 , and clearly these vectors are linearly
independent if and only if the xi ’s are linearly independent and the y j ’s are linearly independent. This
confirms that C is diagonalizable if and only if A and B are diagonalizable. The induction proof for
the general case is immediate since we can write

diag(A1 , . . . , Ak ) = diag(A, B),

where A = diag(A1 , . . . , Ak−1 ) and B = Ak , and then again use the proof given for the 2 × 2 form.

4.23 (a) Note that y 0i xj = (Y 0 X)ij = (X −1 X)ij = (Im )ij , so y 0i xj = 0 if i 6= j and y 0i xj = 1 if i = j. Now
rj
ri X
X
PA (µi )PA (µj ) = xMi +k y 0Mi +k xMj +l y 0Mj +l . (1)
k=1 l=1

Since the sets {Mi + 1, . . . , Mi + ri } and {Mj + 1, . . . , Mj + rj } have no elements in common when
i 6= j, (1) reduces to (0) when i 6= j.

39
(b) We have
h
X X ri
h X m
X
PA (µi ) = xMi +j y 0Mi +j = xl y 0l
i=1 i=1 j=1 l=1
m
X m
X
= (X)·l (Y 0 )l· = (X)·l (X −1 )l·
l=1 l=1
−1
= XX = Im .

(c) Write Xi = (xMi +1 , . . . , xMi +ri ) and Yi = (y Mi +1 , . . . , y Mi +ri ), so that PA (µi ) = Xi Yi0 . Note
that the null space of (A − µi Im ) is the eigenspace of µi so that N (A − µi Im ) = R(Xi ). Since
Yi0 Xi = Iri follows from Y 0 X = X −1 X = Im , we have

R{PA (µi )} = R(Xi Yi0 ) ⊆ R(Xi ) = R(Xi Yi0 Xi ) = R{PA (µi )Xi } ⊆ R{PA (µi )}.

Thus, R{PA (µi )} = N (A − µi Im ), that is, PA (µi ) is a projection matrix for the null space
of (A − µi Im ). All that remains is to show that this projection is along the column space of
(A − µi Im ) by showing N {PA (µi )} = R(A − µi Im ). Now using part (a), we have
h
X
PA (µi )(A − µi Im ) = µj PA (µi )PA (µj ) − µi PA (µi )
j=1

= µi PA2 (µi ) − µi PA (µi )

= µi PA (µi ) − µi PA (µi ) = (0),

so
R(A − µi Im ) ⊆ N {PA (µi )}. (2)

But since R{PA (µi )} = N (A − µi Im ),

dim{R(A − µi Im )} = m − dim{N (A − µi Im )} = m − dim(R{PA (µi )}) = dim(N {PA (µi )}).

This along with (2) confirms that N {PA (µi )} = R(A − µi Im ) and so the proof is complete.
4.25 (a)
 
0 0 0 0
 
 
 0 0 0 0 
 .
 
 0 0 0 0 
 
0 0 0 1

40
(b)
 
0 0 0 0
 
 
 0 0 1 0 
 .
 
 0 0 0 0 
 
0 0 0 1

(c)
 
0 1 0 0
 
 
 0 0 1 0 
 .
 
 0 0 0 0 
 
0 0 0 1

4.27 (a) Inspection of the diagonal elements reveals that the eigenvalues are 2 and 3 with multiplicities
four and two.

(b) The two eigenspaces are

SJ (2) = {(a, 0, b, 0, 0, 0)0 : −∞ < a < ∞, −∞ < b < ∞},

SJ (4) = {(0, 0, 0, 0, a, 0)0 : −∞ < a < ∞}.

4.29 Since (A − λI5 )2 = (0), the Jordan canonical form has no hi larger than 2. Thus, the forms from
Problem 4.26 satisfying this constraint are

diag(J1 (λ), J1 (λ), J1 (λ), J1 (λ), J1 (λ)), diag(J2 (λ), J1 (λ), J1 (λ), J1 (λ)), diag(J2 (λ), J2 (λ), J1 (λ)).

4.31 Let J = D+B be as in part (b) of Problem 4.28. According to Theorem 4.11, there exists a nonsingular
matrix X such that A = XJX −1 . Since all of the eigenvalues of A are zero, D = (0), and so
A = XBX −1 . As a result, Am = XB m X −1 . But from Problem 4.28 part (b), we know that B h = (0)
if h ≥ maxi hi . Since maxi hi ≤ m, we must have B m = (0), and hence Am = (0).

4.33 Let A1 = A − λIm . Note that λ is an eigenvalue of A with multiplicity r if and only if A1 has m − r
nonzero eigenvalues. But from Problem 4.32, rank(A1 ) = rank(A21 ) is equivalent to the condition
rank(A1 ) = rank(A − λIm ) = m − r. The result now follows from Theorem 4.8.

41
4.35 Since V = T U is upper triangular, it follows that vij = 0 for 1 ≤ j < i ≤ r + 1. Now for 1 ≤ i ≤ j ≤ r,
note that
m
X
vij = (T U )ij = (T )i· (U )·j = til ulj
l=1
j
X
= til ulj = 0,
l=i

where the fourth equality follows from the fact that T and U are upper triangular and the final equality
follows since tij = 0 for 1 ≤ i ≤ r, 1 ≤ j ≤ r. When 1 ≤ i ≤ j = r + 1, we have
j
X
vij = til ulj = tir+1 ur+1r+1 = 0,
l=i

since ur+1r+1 = 0.
4.37 In Problem 4.17, we saw that (1/√3, 1/√3, −1/√3)′ is a normalized eigenvector of C corresponding
to the eigenvalue 4. Define

          [  1/√3    1/√2    1/√6 ]
     Y =  [  1/√3   −1/√2    1/√6 ],
          [ −1/√3      0     2/√6 ]

and note that

              [ 4   −1/√6    5/√2  ]
     Y′CY =   [ 0      2    −6/√3  ].
              [ 0   −2/√3      2   ]

A normalized eigenvector of

          [   2     −6/√3 ]
     B =  [ −2/√3      2  ]

corresponding to the eigenvalue 0 is (√3/2, 1/2)′ and

               [ 0   −4/√3 ]
     W′BW =    [ 0      4  ]

when

          [ √3/2   −1/2 ]
     W =  [  1/2   √3/2 ].

A Schur decomposition of C is then given as C = XT X′, where

                          [  1/√3    2/√6      0   ]
     X = Y [ 1   0′ ] =   [  1/√3   −1/√6    1/√2  ],
           [ 0   W  ]     [ −1/√3    1/√6    1/√2  ]

and

          [ 4    √2    8/√6  ]
     T =  [ 0     0   −4/√3  ].
          [ 0     0      4   ]

4.39 Note that


     Σ_{i≤j} |tij|² = Σ_{i=1}^{m} |tii|² + Σ_{i<j} |tij|² = Σ_{i=1}^{m} |λi(A)|² + Σ_{i<j} |tij|² ,

so that Σ_{i<j} |tij|² is uniquely defined if and only if Σ_{i≤j} |tij|² is uniquely defined. Let A = XT X∗ be
a Schur decomposition of A so that T = X∗AX. The result follows since

     Σ_{i≤j} |tij|² = tr(T∗T) = tr{(X∗AX)∗ X∗AX}

                    = tr(X∗A∗XX∗AX) = tr(A∗AXX∗)

                    = tr(A∗A).

4.41 The proof is very similar to the proof of Theorem 4.18. First suppose that such a nonsingular matrix X
does exist; that is, there is a nonsingular matrix X such that X −1 AX = Λ1 and X −1 BX = Λ2 , where
Λ1 and Λ2 are diagonal matrices. Then because Λ1 and Λ2 are diagonal matrices, clearly Λ1 Λ2 = Λ2 Λ1 ,
so we have

AB = XΛ1 X −1 XΛ2 X −1 = XΛ1 Λ2 X −1 = XΛ2 Λ1 X −1

= XΛ2 X −1 XΛ1 X −1 = BA,

and hence, A and B do commute. Conversely, now assuming that AB = BA, we need to show that
such a nonsingular matrix X does exist. Let µ1 , . . . , µh be the distinct values of the eigenvalues of A
having multiplicities r1 , . . . , rh , respectively. Since A is diagonalizable, a nonsingular matrix Y exists,
satisfying
Y −1 AY = Λ1 = diag(µ1 Ir1 , . . . , µh Irh ).

Performing this same transformation on B and partitioning the resulting matrix in the same way that
Y −1 AY has been partitioned, we get
 
C11 C12 ··· C1h
 
 C21 C22 ··· C2h
 
−1

C = Y BY = 
 .. .. ..
,

 . . . 
 
Ch1 Ch2 ··· Chh
where Cij is ri × rj . Note that because AB = BA, we must have

Λ1 C = Y −1 AY Y −1 BY = Y −1 ABY = Y −1 BAY

= Y −1 BY Y −1 AY = CΛ1 .

Equating the (i, j)th submatrix of Λ1 C to the (i, j)th submatrix of CΛ1 yields the identity µi Cij =
µj Cij . Since µi 6= µj if i 6= j, we must have Cij = (0) if i 6= j; that is, the matrix C =
diag(C11 , . . . , Chh ) is block diagonal. Now clearly C is diagonalizable since B is, and so from Problem
4.21, we know that Cii is diagonalizable for each i. Thus, we can find an ri × ri nonsingular matrix Zi
satisfying
Zi−1 Cii Zi = ∆i ,

where ∆i is diagonal. Let X = Y Z, where Z is the block diagonal matrix Z = diag(Z1 , . . . , Zh ) so


that X is a nonsingular matrix. Finally, the matrix ∆ = diag(∆1 , . . . , ∆h ) is diagonal,

X −1 AX = Z −1 Y −1 AY Z = Z −1 Λ1 Z

= diag(Z1−1 , . . . , Zh−1 ) diag(µ1 Ir1 , . . . , µh Irh ) diag(Z1 , . . . , Zh )

= diag(µ1 Z1−1 Z1 , . . . , µh Zh−1 Zh )

= diag(µ1 Ir1 , . . . , µh Irh ) = Λ1

and

X −1 BX = Z −1 Y −1 BY Z = Z −1 CZ

= diag(Z1−1 , . . . , Zh−1 ) diag(C11 , . . . , Chh ) diag(Z1 , . . . , Zh )

= diag(Z1−1 C11 Z1 , . . . , Zh−1 Chh Zh )

= diag(∆1 , . . . , ∆h ) = ∆,

and so the proof is complete.

4.43 From Problem 4.41, it follows that there exists a nonsingular matrix X such that X −1 AX = Λ =
diag(λi1 , . . . , λim ) and X −1 BX = M = diag(µj1 , . . . , µjm ), where (i1 , . . . , im ) and (j1 , . . . , jm ) are
permutations of (1, . . . , m). As a result,

X −1 (A + B)X = X −1 AX + X −1 BX

= diag(λi1 + µj1 , . . . , λim + µjm ).

Thus, A + B is diagonalizable with eigenvalues λi1 + µj1 , . . . , λim + µjm , so the result follows.

4.45 Consider

     A = [ 1  0 ],    B = [ −1  0 ],
         [ 0  0 ]         [  0  0 ]

so that CAC′ and CBC′ are both diagonal when C = I2. Any linear combination of A and B is
singular, so Theorem 4.15 does not apply, and B is not nonnegative definite, so Theorem 4.16 does not
apply.

4.47 Let C be defined as in the solution to the previous problem. For any y ∈ Rm , let x = C 0 y so

x0 (A + B)x = y 0 C(A + B)C 0 y = y 0 (Λ + Im )y.

Thus, A + B is positive definite if and only if Λ + Im is positive definite. The result follows since Λ + Im
is positive definite if and only if λi > −1 for all i.

4.49 (a) Let B = XΛX 0 be the spectral decomposition of B and P DP 0 be the spectral decomposition of
Λ1/2 X 0 AXΛ1/2 . Put Y = XΛ−1/2 P . Then

Y −1 ABY = P 0 Λ1/2 X 0 AXΛX 0 XΛ−1/2 P

= P 0 Λ1/2 X 0 AXΛ1/2 P

= P 0 P DP 0 P = D,

so AB is diagonalizable.

(b) Let r = rank(A) and suppose A = P ΛP′ is the spectral decomposition of A, where Λ =
diag(Λ1 , (0)) and Λ1 is r × r. Then C^{-1} = diag(Λ1^{-1/2} , Im−r )P′ is a nonsingular matrix sat-
isfying C −1 AC −10 = diag(Ir , (0)). Consider the matrix
 
                   [ H11    H12 ]
     H = C′BC =    [ H12′   H22 ],

where H11 is r × r. Since B is nonnegative definite, so is H and as a result, it can be expressed
as H = T T 0 for some m × m matrix T . Partitioning T appropriately, we have
     
     [ H11    H12 ]   [ T1 ]                [ T1 T1′   T1 T2′ ]
     [ H12′   H22 ] = [ T2 ] [ T1′  T2′ ] = [ T2 T1′   T2 T2′ ],

which reveals that R(H12 ) ⊂ R(H11 ). Thus, H12 = H11 X for some r × (m − r) matrix X, and as
a result

               [ H11            (0)       ]
     Y′HY =    [ (0)    H22 − X′H11 X     ],

where

          [ Ir    −X   ]
     Y =  [ (0)   Im−r ].
Since H11 and H22 − X 0 H11 X are symmetric, there are orthogonal matrices Q1 and Q2 , and
diagonal matrices D1 and D2 , such that Q01 H11 Q1 = D1 and Q02 (H22 − X 0 H11 X)Q2 = D2 . Let
F = CY Q, where Q = diag(Q1 , Q2 ), and note that F −1 AF −10 = diag(Ir , (0)) and F 0 BF = D =
diag(D1 , D2 ). Consequently,

F −1 AF −10 F 0 BF = F −1 ABF = diag(D1 , (0)),

and so AB is diagonalizable.

4.51 Clearly, kAk∗ is nonnegative and reduces to zero if and only if aij = 0 for all i and j, so properties (a)
and (b) of a matrix norm hold. Also, property (c) holds since |caij | = |c||aij |, while property (d) holds
since
   
kA + Bk∗ = m max |aij + bij | ≤ m max {|aij | + |bij |}
1≤i,j≤m 1≤i,j≤m
   
≤ m max |aij | + m max |bij | = kAk∗ + kBk∗ .
1≤i,j≤m 1≤i,j≤m

Finally, property (e) holds since


  m
!
X
kABk∗ = m max |(AB)ij | = m max aik bkj
1≤i,j≤m 1≤i,j≤m
k=1
m
! m
!
X X
≤ m max |aik bkj | =m max |aik ||bkj |
1≤i,j≤m 1≤i,j≤m
k=1 k=1
m
!
X
−1 −1
≤ m max (m kAk∗ )(m kBk∗ ) = kAk∗ kBk∗ .
1≤i,j≤m
k=1

4.53 (a) Using property (e) of the matrix norm k · k, we have

2
kIm k = kIm k = kIm Im k ≤ kIm kkIm k = kIm k2 ,

and this leads to kIm k ≥ 1.

(b) For a nonsingular matrix A, we have

kIm k = kAA−1 k ≤ kAkkA−1 k,

which yields
kIm k
kA−1 k ≥ ≥ kAk−1 ,
kAk
where the second inequality follows from the result from part (a).

4.55 Let A = P DQ0 be a singular value decomposition of A so that the m × m matrices P and Q are
orthogonal and the matrix D has the form given in (a) or (d) of Theorem 4.1 since n = m. Then we
have

kAkE = {tr(A0 A)}1/2 = {tr[(P DQ0 )0 P DQ0 ]}1/2

= {tr(QDP 0 P DQ0 )}1/2 = {tr(D2 Q0 Q)}1/2


     = {tr(D²)}^{1/2} = ( Σ_{i=1}^{r} δi² )^{1/2} .
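
As an aside (not part of the original solution), this identity is easy to spot-check numerically; the
matrix A below is an arbitrary example.

```python
import numpy as np

# Arbitrary square matrix (n = m), for illustration only.
A = np.array([[2.0, -1.0, 0.0],
              [1.0,  3.0, 4.0],
              [0.0,  2.0, 1.0]])

frob = np.sqrt(np.trace(A.T @ A))          # ||A||_E = {tr(A'A)}^{1/2}
sing = np.linalg.svd(A, compute_uv=False)  # singular values delta_1, ..., delta_r
print(np.isclose(frob, np.sqrt(np.sum(sing**2))))   # True
print(np.isclose(frob, np.linalg.norm(A, 'fro')))   # True
```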

4.57 Let

     A = [ 0  1 ],    B = [ 0  0 ].
         [ 0  0 ]         [ 1  0 ]

Both A and B have eigenvalue 0 with multiplicity two, so ρ(A) = ρ(B) = 0.

(a) Note that

     A + B = [ 0  1 ].
             [ 1  0 ]
Since A + B has eigenvalues of −1 and 1, we have

ρ(A + B) = 1 > 0 = ρ(A) + ρ(B).

(b) Note that

     AB = [ 1  0 ].
          [ 0  0 ]
Since AB has eigenvalues of 0 and 1, we have

ρ(AB) = 1 > 0 = ρ(A)ρ(B).
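
The counterexample can be confirmed directly with NumPy; the sketch below simply recomputes the
spectral radii.

```python
import numpy as np

A = np.array([[0, 1],
              [0, 0]], dtype=float)
B = np.array([[0, 0],
              [1, 0]], dtype=float)

def rho(M):
    # Spectral radius: largest modulus of the eigenvalues.
    return np.max(np.abs(np.linalg.eigvals(M)))

print(rho(A), rho(B))        # 0.0 0.0
print(rho(A + B))            # 1.0  >  rho(A) + rho(B) = 0
print(rho(A @ B))            # 1.0  >  rho(A) * rho(B) = 0
```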

4.59 (a) We prove the Ar = L∗ U∗ factorization by induction. The result holds for r = 2 since
    
     [ a11                      0 ] [ 1   a12/a11 ]   [ a11   a12 ]
     [ a21   a22 − a21 a12 /a11  ] [ 0      1    ] = [ a21   a22 ],

where clearly L∗ and U∗ are nonsingular since a11 6= 0 and |A2 | = a11 a22 − a21 a12 6= 0. Now
suppose r > 2 and the factorization holds for (r − 1) × (r − 1) matrices. Partition Ar as
 
     Ar = [ B    c ],
          [ d′   e ]

where B is (r − 1) × (r − 1). According to our induction hypothesis, B can be factored as B = LU ,


where L is a nonsingular lower triangular matrix and U is a nonsingular upper triangular matrix.
Our proof will be complete if we can find g, f , v, and w such that
  
     Ar = [ L    0 ] [ U    w ].
          [ f′   g ] [ 0′   v ]

Since L and U are nonsingular, the unique solutions to Lw = c and f 0 U = d0 are given by
w = L−1 c and f 0 = d0 U −1 . Finally, we solve gv + f 0 w = e by setting v = 1 and g = e − f 0 w.
Note that our resulting lower and upper triangular matrices are nonsingular since L and U are,
v ≠ 0, and g = e − d′U^{-1}L^{-1}c = e − d′B^{-1}c ≠ 0, a consequence of the fact that |Ar| ≠ 0. This
proves the Ar = L∗ U∗ factorization.
Partition A, L, and U as
     
     A = [ A11  A12 ],   L = [ L11  (0) ],   U = [ U11  U12 ],
         [ A21  A22 ]        [ L21  L22 ]        [ (0)  U22 ]

where the (1, 1)th block matrices are r × r, and consider

     [ A11  A12 ]   [ L11 U11   L11 U12           ]
     [ A21  A22 ] = [ L21 U11   L21 U12 + L22 U22 ].        (1)

Now L11 and U11 can be obtained via the process described in the first part of this problem, and
from (1) we immediately get U12 = L11^{-1} A12 and L21 = A21 U11^{-1} . Since rank(A) = rank(A11 ) = r,
it follows that A21 = BA11 and A22 = BA12 for some (m − r) × r matrix B. Then from (1), we have

     A22 = L21 U12 + L22 U22 = A21 U11^{-1} L11^{-1} A12 + L22 U22
         = BA11 A11^{-1} A12 + L22 U22 = BA12 + L22 U22
         = A22 + L22 U22 .

Thus, we must have L22 U22 = (0). For instance, one of the matrices can be chosen to be the null
matrix, while the other matrix can be chosen so that L or U is nonsingular and triangular.

(b) Consider the matrix

         A = [ 0  1 ].
             [ 1  1 ]
An LU factorization would have l11 u11 = a11 = 0, so that l11 = 0 or u11 = 0. But |A| = |LU |
requires that both L and U be nonsingular since A is. Thus, an LU factorization is not possible
for this matrix.

(c) If A = LU , the equation Ax = c can be solved in two steps. First we find a solution y to the
equation
Ly = c, (1)

and then find a solution x to the equation

U x = y. (2)

As a result, we will then have


Ax = LU x = Ly = c.

Since both L and U are triangular matrices, solutions to (1) and (2) are easily obtained. For
instance, if both L and U are nonsingular matrices, the solution to (1) has
     y1 = c1 / l11 ,     yi = ( ci − Σ_{j=1}^{i-1} lij yj ) / lii ,   for i = 2, . . . , m,

while the solution to (2) has

     xm = ym / umm ,     x_{m-i} = ( y_{m-i} − Σ_{j=0}^{i-1} u_{m-i,m-j} x_{m-j} ) / u_{m-i,m-i} ,   for i = 1, . . . , m − 1.
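
The two-step solve can be sketched numerically as follows; the triangular factors L and U and the
vector c below are arbitrary illustrative choices, and the loops implement the formulas above.

```python
import numpy as np

# Illustrative nonsingular triangular factors and right-hand side.
L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 1.0, 5.0]])
U = np.array([[1.0, 2.0, 1.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 4.0]])
c = np.array([2.0, 7.0, 14.0])
A = L @ U
m = len(c)

# Step 1: forward substitution for Ly = c.
y = np.zeros(m)
for i in range(m):
    y[i] = (c[i] - L[i, :i] @ y[:i]) / L[i, i]

# Step 2: back substitution for Ux = y.
x = np.zeros(m)
for i in range(m - 1, -1, -1):
    x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]

print(np.allclose(A @ x, c))                    # True
print(np.allclose(x, np.linalg.solve(A, c)))    # True
```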

4.61 (a) It follows from Problem 4.59(a) that A can be expressed as A = L∗ U∗ , where L∗ and U∗0 are
nonsingular lower triangular m × m matrices. Let D1 and D2 be the diagonal matrices that have
the same diagonal elements as L∗ and U∗ , respectively. Define L = L∗ D1−1 , M 0 = D2−1 U∗ , and
D = D1 D2 , and note that L and M are lower triangular matrices with all diagonal elements equal
to one. Then the result follows since

A = L∗ U∗ = L∗ D1−1 D1 D2 D2−1 U∗ = LDM 0 .

(b) Note that if A has the LDM 0 decomposition from part (a), then

(A)01· = d11 (M )·1 , (A)·1 = d11 (L)·1 ,

and so, since A is symmetric, we must have (M )·1 = (L)·1 . In addition, if the first k − 1 columns
of M are the same as those of L, then
     (A)′k· = Σ_{i=1}^{k} lki dii (M)·i = Σ_{i=1}^{k-1} lki dii (L)·i + dkk (M)·k ,        (1)

while

     (A)·k = Σ_{i=1}^{k} mki dii (L)·i = Σ_{i=1}^{k-1} lki dii (L)·i + dkk (L)·k .        (2)

The symmetry of A implies (1) equals (2), and so (M)·k = (L)·k . Thus, we have established that M = L.

Chapter 5

5.1 Each proof simply involves the verification of (5.1)–(5.4).

(a) We have
αAα−1 A+ αA = αAA+ A = αA,

and
α−1 A+ αAα−1 A+ = α−1 A+ AA+ = α−1 A+ .

In addition, αAα−1 A+ = AA+ and α−1 A+ αA = A+ A are both symmetric since AA+ and A+ A
are symmetric.

(b) We have
A0 (A+ )0 A0 = (AA+ A)0 = A0 ,

and
(A+ )0 A0 (A+ )0 = (A+ AA+ )0 = (A+ )0 .

In addition, A0 (A+ )0 = (A+ A)0 and (A+ )0 A0 = (AA+ )0 are both symmetric since AA+ and A+ A
are symmetric.

(c) (A+ )+ = A follows immediately since conditions (5.1), (5.2), (5.3) and (5.4) for the Moore–Penrose
inverse A+ of A are conditions (5.2), (5.1), (5.4) and (5.3), respectively, for the Moore–Penrose
inverse A of A+ .

(d) Since A−1 A = Im , we have


AA−1 A = A,

and
A−1 AA−1 = A−1 .

The symmetry conditions also follow since AA−1 = A−1 A = Im is symmetric.

5.3 a′a = 18, so a+ = (a′a)^{-1} a′ = (1/18)(2, 1, 3, 2).

5.5 Let Φ = diag(φ1 , . . . , φm ). Note that since λi φi λi = λi λi^{-1} λi = λi if λi ≠ 0 and λi φi λi = 0 = λi if
λi = 0,

     ΛΦΛ = Λ.

Similarly, since φi λi φi = λi^{-1} λi λi^{-1} = λi^{-1} = φi if λi ≠ 0 and φi λi φi = 0 = φi if λi = 0,

     ΦΛΦ = Φ.

Further, ΛΦ and ΦΛ are diagonal matrices and, hence, are symmetric. This confirms that Φ = Λ+ .

5.7 (a) Now

             [  5   5  −4 ]
     AA′ =   [  5   5  −4 ]
             [ −4  −4  14 ]

and |AA′ − λI3 | = −λ(λ − 6)(λ − 18), so AA′ has eigenvalues 0, 6, 18. The equation AA′x = 6x
yields the constraints x2 = x1 and x3 = x1 , so a normalized eigenvector corresponding to 6 is
x1 = (1/√3)(1, 1, 1)′. The equation AA′x = 18x yields the constraints x2 = x1 and x3 = −2x1 , so a
normalized eigenvector corresponding to 18 is x2 = (1/√6)(1, 1, −2)′. Thus,

     (AA′)+ = (1/6) x1 x1′ + (1/18) x2 x2′ = (1/108) [ 7  7   4 ]
                                                     [ 7  7   4 ]
                                                     [ 4  4  10 ],

and
                            [  2   2  5 ]
     A+ = A′(AA′)+ = (1/18) [ −1  −1  2 ].
                            [  4   4  1 ]

(b) The projection for the range of A is

                                  [ 1  1  0 ]
     AA+ = AA′(AA′)+ =      (1/2) [ 1  1  0 ],
                                  [ 0  0  2 ]

while the projection for the row space of A is

                                  [  5   2   1 ]
     A+ A = A′(AA′)+ A =    (1/6) [  2   2  −2 ].
                                  [  1  −2   5 ]
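
The identity A+ = A′(AA′)+ used in (a) holds for any matrix, and is easy to spot-check with NumPy;
the matrix below is a random rank-deficient example, not the A of this problem (which is not reproduced
here).

```python
import numpy as np

rng = np.random.default_rng(0)
# Random 3 x 5 matrix of rank 2, so AA' is singular and a genuine
# Moore-Penrose inverse (not an ordinary inverse) is needed.
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 5))

lhs = A.T @ np.linalg.pinv(A @ A.T)   # A'(AA')^+
rhs = np.linalg.pinv(A)               # A^+
print(np.allclose(lhs, rhs))          # True

# AA^+ and A^+A are symmetric idempotent (projection) matrices.
P, Q = A @ rhs, rhs @ A
print(np.allclose(P, P.T), np.allclose(P @ P, P))   # True True
print(np.allclose(Q, Q.T), np.allclose(Q @ Q, Q))   # True True
```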

5.9 Since rank(A) = 1, the singular value decomposition of A has the form A = pdq 0 , where d is a scalar,
while p and q are m × 1 and n × 1 unit vectors. Now d is the positive square root of the one nonzero
eigenvalue of A′A, and so d = √(tr(A′A)) = c^{1/2} and A = c^{1/2} pq′. Thus,

A+ = c−1/2 qp0 = c−1 c1/2 qp0 = c−1 A0 .

5.11 Using the identity AA+ A = A, we have

(AA+ )2 = (AA+ A)A+ = AA+ ,

and
(A+ A)2 = A+ (AA+ A) = A+ A,

so both AA+ and A+ A are idempotent. Since they are idempotent, we also find that

(Im − AA+ )2 = Im − 2AA+ + (AA+ )2 = Im − 2AA+ + AA+ = Im − AA+ ,

and
(Im − A+ A)2 = Im − 2A+ A + (A+ A)2 = Im − 2A+ A + A+ A = Im − A+ A,

so (Im − AA+ ) and (Im − A+ A) are also idempotent.

5.13 Since B is positive definite, there exists an n × n nonsingular matrix T such that B = T T 0 . Letting
A∗ = AT and using Problem 5.12(c), we have

A∗ A0∗ (A∗ A0∗ )+ A∗ = A∗ ,

that is,
AT T 0 A0 (AT T 0 A0 )+ AT = AT.

Postmultiplying both sides by T −1 and using T T 0 = B yields

ABA0 (ABA0 )+ A = A.

5.15 Let A = XΛX 0 be the spectral decomposition of A. If A has one nonzero eigenvalue λ with multiplicity
r, then Λ and X can be chosen so that Λ = diag(λIr , (0)) and X = (X1 , X2 ), and the spectral
decomposition will reduce to the form A = X1 (λIr )X1′ = λX1 X1′ , where X1 is an m × r matrix satisfying
X10 X1 = Ir . Thus,

A+ = X1 (λIr )−1 X10 = λ−1 X1 X10 = λ−2 (λX1 X10 ) = λ−2 A.

5.17 (a) If A is nonnegative definite, then its eigenvalues are nonnegative. It then follows from Theorem
5.7 that the eigenvalues of A+ are also nonnegative, and so it is a nonnegative definite matrix.

(b) If Ax = 0, then A+ Ax = 0. Using the fact that A and A+ are symmetric, we have

0 = A+ Ax = (A+ A)0 x = A0 A+0 x = AA+ x.

Premultiplying by A+ , we get
0 = A+ AA+ x = A+ x.

5.19 Let rA = rank(A), rB = rank(B), while A = P DP 0 and B = XΛX 0 are the spectral decompositions
of A and B, where D and Λ are rA × rA and rB × rB diagonal matrices. Note that if Ax = 0, we
must have Bx = 0 since B and A − B are nonnegative definite, and so R(B) ⊆ R(A). First suppose
that rA = rB in which case R(B) = R(A). As a result, there exists an orthogonal matrix Q such that
X = P Q and

A − B = P DP 0 − XΛX 0 = P D1/2 (IrA − D−1/2 QΛQ0 D−1/2 )D1/2 P 0 ,

while
B + − A+ = XΛ−1 X 0 − P D−1 P 0 = P D−1/2 (D1/2 QΛ−1 Q0 D1/2 − IrA )D−1/2 P 0 .

Since A − B is nonnegative definite, so is IrA − D−1/2 QΛQ0 D−1/2 . But this implies that (see Problem
4.46) D1/2 QΛ−1 Q0 D1/2 − IrA is nonnegative definite and, hence, so is B + − A+ . Conversely, now
suppose that B + − A+ is nonnegative definite. But this implies that R(A+ ) ⊆ R(B + ), that is,
R(A) ⊆ R(B) in addition to the already established condition R(B) ⊆ R(A), and so rA = rB follows.

5.21 Since P 0 P = Im and QQ0 = In , we see that

P AQQ0 A+ P 0 = P AA+ P 0 is symmetric,

Q0 A+ P 0 P AQ = Q0 A+ AQ is symmetric,

P AQQ0 A+ P 0 P AQ = P AA+ AQ = P AQ,

Q0 A+ P 0 P AQQ0 A+ P 0 = Q0 A+ AA+ P 0 = Q0 A+ P 0 .

The four conditions of Definition 5.1 hold, so (P AQ)+ = Q0 A+ P 0 .

5.23 (a) Note that

              [ 0  1  0 ]
     A+ =     [ 0  0  1 ],
              [ 0  0  0 ]

B+ = diag(1, 0, 1/2), A′A = diag(1, 1, 0), A+A = diag(1, 1, 0) and BB′ = diag(1, 0, 4). Consequently,

                               [ 0  1  0 ]
     A+ ABB′A′ = BB′A′ =       [ 0  0  0 ],
                               [ 0  0  0 ]

and BB+ A′AB = A′AB = diag(1, 0, 0), and so (AB)+ = B+ A+ follows from Theorem 5.10(a).

(b) Note that A+ A = P_{R(A+)} = P_{R(A′)} = diag(1, 1, 0) and

                               [ 0  0  0 ]   [ 0  0  0 ]
     A+ ABB′ = diag(1, 1, 0)   [ 0  2  1 ] = [ 0  2  1 ].
                               [ 0  1  1 ]   [ 0  0  0 ]

Since this is not symmetric, according to Theorem 5.10(b), (AB)+ ≠ B+ A+.

5.25 Let B = diag(A11^+ , . . . , Arr^+ ). Then B is the Moore–Penrose inverse of A since

     ABA = diag(A11 A11^+ A11 , . . . , Arr Arr^+ Arr ) = diag(A11 , . . . , Arr ) = A,

     BAB = diag(A11^+ A11 A11^+ , . . . , Arr^+ Arr Arr^+ ) = diag(A11^+ , . . . , Arr^+ ) = B,

     AB = diag(A11 A11^+ , . . . , Arr Arr^+ ) is symmetric since Aii Aii^+ is for each i,

     BA = diag(A11^+ A11 , . . . , Arr^+ Arr ) is symmetric since Aii^+ Aii is for each i.

5.27 Using the solution to Problem 5.10(a), we have

     U+ = (1/9) [ 1  1  1 ]
                [ 1  1  1 ],
                [ 1  1  1 ]

while

     V+ = (V′V)^{-1} V′ = (1/3) [  0  −3  3 ].
                                [ −1  −1  2 ]

Since U′V = (0), Corollary 5.13.1(d) applies and so

            [ U+ ]   [  1/9   1/9   1/9 ]
     A+ =   [    ] = [  1/9   1/9   1/9 ]
            [ V+ ]   [  1/9   1/9   1/9 ].
                     [   0    −1     1  ]
                     [ −1/3  −1/3   2/3 ]

5.29 Let U = wx0 and V = yz 0 . Then U V 0 = (0) and U 0 V = (0), so Theorem 5.17 applies. Using this and
the solution to Problem 5.10(d), we get

     A+ = U+ + V+ = (w′w x′x)^{-1} xw′ + (y′y z′z)^{-1} zy′

                     [  2   0   1   1 ]
        = (1/24)     [  2   0   1   1 ].
                     [ −1  −3  −2  −2 ]
                     [ −3   3   0   0 ]

5.31 We can write Ω = nD − npp0 , where p = (p1 , . . . , pm )0 , and so Ω takes the form A + cd0 in Theorem
5.18 with A = nD, c = −np, and d = p. Since
     1 + d′A^{-1}c = 1 − p′D^{-1}p = 1 − Σ_{i=1}^{m} pi = 0,

according to Theorem 5.18, Ω is singular. Note that for x = A−1 d = n−1 D−1 p = n−1 1m and
y = A−1 c = −n−1 D−1 np = −1m , we have xx+ = yy + = m−1 1m 10m . Thus, from Theorem 5.18

     Ω+ = (Im − yy+) A^{-1} (Im − xx+)

        = n^{-1} (Im − m^{-1} 1m 1m′) D^{-1} (Im − m^{-1} 1m 1m′).
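
As a numerical aside (not part of the original solution), this expression for Ω+ can be checked against a
pseudoinverse computed directly; the probability vector p and the value of n below are arbitrary choices.

```python
import numpy as np

n = 7
p = np.array([0.2, 0.3, 0.1, 0.4])   # arbitrary probabilities summing to 1
m = len(p)

D = np.diag(p)
Omega = n * D - n * np.outer(p, p)   # Omega = nD - npp'

C = np.eye(m) - np.ones((m, m)) / m  # I_m - m^{-1} 1_m 1_m'
Omega_plus = (C @ np.linalg.inv(D) @ C) / n

print(np.allclose(Omega_plus, np.linalg.pinv(Omega)))   # True
```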

5.33 Since c and d are in the column space of A and A is symmetric, we have A+ Ac = AA+ c = c and
A+ Ad = AA+ d = d. We will show that (A + cd0 )+ = A+ − α−1 A+ cd0 A+ , where α = 1 + d0 A+ c, by
verifying the four conditions of the Moore–Penrose inverse. Now

(A + cd0 )(A+ − α−1 A+ cd0 A+ ) = AA+ − α−1 AA+ cd0 A+ + cd0 A+ − α−1 cd0 A+ cd0 A+

= AA+ + cd0 A+ − α−1 (1 + d0 A+ c)cd0 A+

= AA+ + cd0 A+ − cd0 A+ = AA+ ,

so condition 5.3 holds. Similarly,

(A+ − α−1 A+ cd0 A+ )(A + cd0 ) = A+ A + A+ cd0 − α−1 A+ cd0 A+ A − α−1 A+ cd0 A+ cd0

= A+ A + A+ cd0 − α−1 (1 + d0 A+ c)A+ cd0

= A+ A + A+ cd0 − A+ cd0 = A+ A,

so (5.4) holds as well. Conditions (5.1) and (5.2) hold since

(A + cd0 )(A+ − α−1 A+ cd0 A+ )(A + cd0 ) = AA+ (A + cd0 )

= AA+ A + AA+ cd0

= A + cd0 ,

and

(A+ − α−1 A+ cd0 A+ )(A + cd0 )(A+ − α−1 A+ cd0 A+ ) = A+ A(A+ − α−1 A+ cd0 A+ )

= A+ AA+ − α−1 A+ AA+ cd0 A+

= A+ − α−1 A+ cd0 A+ .

5.35 (a) diag(0, 1/2, 1/3).

(b) diag(1, 1/2, 1/3).


(c)
     [ 1    1     1  ]
     [ 1   1/2    0  ].
     [ 1    0    1/3 ]

5.37 Since rank(B) = n, it follows from Theorem 5.23(f) that BB − = In . Thus,

ABB − A− AB = AA− AB = AB,

and so B − A− is a generalized inverse of AB.

5.39 First suppose that B is a generalized inverse of A. Then ABA = A and so postmultiplying by B, we
have ABAB = AB confirming that AB is idempotent. Further, from Theorem 2.8(a),

rank(A) = rank(ABA) ≤ rank(AB) ≤ rank(A),

implying then that rank(A) = rank(AB). Now suppose AB is idempotent and rank(A) = rank(AB).
Thus,
ABAB = AB. (1)

But the column space of AB is a subspace of the column space of A, and when rank(A) = rank(AB)
these two subspaces are the same. Thus, the columns of A can be written as a linear combination of
those of AB meaning there is some matrix C such that A = ABC. Postmultiplying equation (1) by C
yields ABA = A, so we have shown that B is a generalized inverse of A.

5.41 First let D = C −1 A− B −1 for some generalized inverse A− of A. Then

BACDBAC = BACC −1 A− B −1 BAC = BAA− AC = BAC,

so D is a generalized inverse of BAC. Conversely, suppose D is a generalized inverse of BAC so that

BACDBAC = BAC.

Premultiplying this equation by B −1 and postmultiplying by C −1 yields

ACDBA = A.

This implies that CDB is a generalized inverse of A, that is, CDB = A− for some A− , and this leads
to D = C −1 A− B −1 .

5.43 (a) First suppose that B is a reflexive inverse of A. Then since ABA = A, we have

rank(A) = rank(ABA) ≤ rank(AB) ≤ rank(B),

and since BAB = B, we have

rank(B) = rank(BAB) ≤ rank(BA) ≤ rank(A).

Together these two identities imply rank(B) = rank(A). Conversely, suppose ABA = A and
rank(B) = rank(A). Then BABA = BA, that is, BA is idempotent, and from Theorem 5.23,
rank(BA) = rank(A) = rank(B). It now follows from Problem 5.39 that A is a generalized inverse
of B, so BAB = B.

(b) Note that

              [ ∆    (0) ]      [ ∆^{-1}    E   ]        [ I    ∆E  ]
     AB = P   [ (0)  (0) ] Q′ Q [   F      F∆E  ] P′ = P  [ (0)  (0) ] P′.

Thus,

               [ I    ∆E  ]      [ ∆    (0) ]
     ABA = P   [ (0)  (0) ] P′ P [ (0)  (0) ] Q′

               [ ∆    (0) ]
         = P   [ (0)  (0) ] Q′ = A,

and

               [ ∆^{-1}    E   ]      [ I    ∆E  ]
     BAB = Q   [   F      F∆E  ] P′ P [ (0)  (0) ] P′

               [ ∆^{-1}    E   ]
         = Q   [   F      F∆E  ] P′ = B.

5.45 We prove the result by simply showing that the expression given for A− satisfies AA− A = A, and since
 
U W U + U XV
AA− A =  ,
V W U + V XV

this requires U W U +U XV = U and V W U +V XV = V . Now U XV = (0), while U W U = U U − U = U ,


and so the first of these two identities holds. To verify the second identity, note that

V W U = V U − U − V (In − U − U ){V (In − U − U )}− V U − U

and
V XV = V (In − U − U ){V (In − U − U )}− V,

so

V W U + V XV = V U − U + V (In − U − U ){V (In − U − U )}− V (In − U − U )

= V U − U + V (In − U − U ) = V.

5.47 We perform the following elementary row transformations on A:

1) Add row 1 to row 2.

2) Subtract from row 3 two times row 1.

3) Interchange rows 2 and 3.

4) Add row 2 to row 1.

This results in CA = H, where


   
−1 0 1 1 0 2
   
C =  −2 1 , H= 0 3 .
   
0 1
   
1 1 0 0 0 0

Since H is in Hermite form, it follows from Theorem 5.31 that C is a generalized inverse of A.

5.49 First find the matrix  


6 −8 −11 6
 
 −8 −8 
 
0
18 11
AA= .
 −11 22 −11 
 
11
 
6 −8 −11 6
Following the approach used in the solution to Problem 5.47, we get CA0 A = H, where
   
9/22 4/22 0 0 1 0 −5/2 1
   
 0 1 −1/2 0 
   
 4/22 3/22 0 0 
C= 
, H = 
 
.

 15 3 6 0   0 0 0 0 
   
−1 0 0 1 0 0 0 0

C is a generalized inverse of A0 A and so we can use it to get


 
5 −2 13
 
 
1  1 4 7 
AL = (A0 A)− A0 =  .
22  0 0
 
0 
 
0 0 0

5.51 We show that A+ = BA0 by verifying the four conditions of Definition 5.1. We know A0 ABA0 A =
A0 A, so A+0 A0 ABA0 A = A+0 A0 A, which leads to the first condition, ABA0 A = A. We also know

(BA0 A)0 = BA0 A, which is the fourth condition. Postmultiplying this identity by A+ yields the
identity A0 AB 0 A+ = BA0 AA+ = BA0 . Thus,

BA0 − BA0 ABA0 = (In − BA0 A)A0 AB 0 A+ = (In − A0 AB 0 )A0 AB 0 A+

= (A0 − A0 AB 0 A0 )AB 0 A+ = (0),

where the second equality uses the fourth condition, while the last equality follows from the first
condition. This gives us the second condition BA0 ABA0 = BA0 . Finally, we get the third condition by
noting that
ABA0 = AB(ABA0 A)0 = ABA0 AB 0 A0

is symmetric.

5.53 We show that A+ = A0 (AA0 )L by verifying the four conditions in Definition 5.1. Using Theorem
5.27(b)

AA0 (AA0 )L A = AA0 (AA0 )+ A = AA0 A+0 A+ A = A(A+ A)0 A+ A = AA+ AA+ A = A,

and

A0 (AA0 )L AA0 (AA0 )L = A0 (AA0 )+ AA0 (AA0 )L = A0 A+0 A+ AA0 (AA0 )L

= (A+ A)0 A+ AA0 (AA0 )L = A+ AA0 (AA0 )L

= A0 A+0 A0 (AA0 )L = A0 (AA0 )L .

Note that the symmetry of AA0 (AA0 )L follows immediately from the definition of (AA0 )L . Finally,
A0 (AA0 )L A is symmetric due to Theorem 5.27(c).

5.55 According to Theorem 5.27(b), A(A0 A)− A0 = A(A0 A)+ A0 and A0 (AA0 )− A = A0 (AA0 )+ A. Thus

A0 (AA0 )− A(A0 A)− A0 = A0 (AA0 )+ A(A0 A)+ A0

= A0 A+0 A+ AA+ A+0 A0

= (A+ A)0 A+ AA+ (AA+ )0

= A+ AA+ AA+ AA+ = A+ .

5.57 Since H is upper triangular, we have
m
X
(H 2 )ij = hik hkj
k=1
= 0, if i > j,

= h2ii , if i = j,
j
X
= hik hkj , if i < j.
k=i

Thus, (H 2 )ij = hij = 0 if i > j, and (H 2 )ii = h2ii = hii since hii = 0 or hii = 1, so all that remains is
to show that (H 2 )ij = hij when i < j. Now if hii = 0, then hik = 0 for all k and so
j
X
(H 2 )ij = hik hkj = 0 = hij .
k=i

When hii = 1,
j
X j
X
hik hkj = hij + hik hkj = hij ,
k=i k=i+1

since if for any k, hik 6= 0, then hkk = 0 and hkj = 0, and so hik hkj = 0 for k = i + 1, . . . , j.

5.59 We first compute  


3 1 4 7
 
 
 1 3 4 5 
A0 A = 

.

 4 4 8 12 
 
7 5 12 19
This then leads to  
30 −1 −4 −7
 
 −1 −4 −5 
 
0
30
B2 = 33I4 − A A =  .
 −4 −4 25 −12 
 
 
−7 −5 −12 14
We next calculate
   
0 24 0 24 −24 0 24
   
−36 6   −24
   
0
 6 0
48 24 0 
B2 A =  , B2 A A =  .
−12 6 
   
 6  0 24 24 24 
   
6 12 6 24 0 24 48

Then since A has rank of 2,
 
0 4 0
 
−6
 
+ 0 −1 0 1  1 1 
A = 2{tr(B2 A A)} B2 A =  .
12 
 1 −2

1 
 
1 2 1

5.61 Using the notation of Theorem 5.13 and its corollary, if A = [Aj−1 aj ] = [U V ], then C = aj −
Aj−1 dj = cj . If cj 6= 0, then C + C = c+
j cj = 1, and so Corollary 5.13.1(c) applies. This gives
   
0
A+ + +
j−1 − Aj−1 aj cj A+
j−1 − dj bj
A+
j =
 = .
c+
j b0j

If cj = 0, then Corollary 5.13.1(b) applies and

K = (1 + a0j A+0 +
j−1 Aj−1 aj )
−1
= (1 + d0j dj )−1 ,

KV 0 U +0 U + = (1 + d0j dj )−1 a0j A+0 + 0


j−1 Aj−1 = (1 + dj dj )
−1 0 +
dj Aj−1 = b0j ,

so
   
0
U + − U + V KV 0 U +0 U + A+ +
j−1 − Aj−1 aj bj
A+
j =  = 
KV 0 U +0 U + b0j
 
A+
j−1 − dj b0j
=  .
b0j

Chapter 6

6.1 (a) Using A+ which was obtained in Problem 5.2, we find


 
3 2 2 2
 
1  2 6 −1 −1 
 
AA+ =  ,
7 2 −1 6 −1 

 
2 −1 −1 6

and then  
1
 
 
+
 3 
AA c = 
 .
 −1 

 
0
Since this is c, the system is consistent.

(b) One solution is  


2
 
A+ c =  3  .
 
 
−4

(c) It is easily verified that rank(A) = 3, so from Theorem 6.7, it follows that there is n−rank(A)+1 =
1 linearly independent solution.

6.3 By augmenting A with a final column of zeros and then row reducing this matrix to one in Hermite
form, we obtain a generalized inverse given by
 
     A− = [ 0   2/5   0   −1/5 ]
          [ 0  −1/5   0    3/5 ].
          [ 0   −3    5    −1  ]

(a) Since AA−c = (14/5, 2, 4/5, −2)′ ≠ c, the system is inconsistent.

(b) Since AA−c = (3, 2, 1, −1)′ = c, the system is consistent.

(c) Since AA−c = (36/5, 8, −4/5, −28)′ ≠ c, the system is inconsistent.
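
The test AA−c = c is easy to automate. The sketch below uses the Moore–Penrose inverse as one
particular generalized inverse and a small made-up system, since the A of this problem is not reproduced
in this solution.

```python
import numpy as np

# Arbitrary rank-deficient coefficient matrix, for illustration only.
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [1.0, 0.0, 1.0]])
A_minus = np.linalg.pinv(A)   # A^+ is one choice of generalized inverse

def consistent(A, c, G):
    # Ax = c is consistent if and only if A G c = c.
    return np.allclose(A @ G @ c, c)

c1 = A @ np.array([1.0, 1.0, 1.0])       # in the column space of A
c2 = np.array([1.0, 0.0, 0.0])           # not in the column space of A
print(consistent(A, c1, A_minus))        # True  -> consistent
print(consistent(A, c2, A_minus))        # False -> inconsistent
```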

6.5 According to Theorem 6.3, since AXB = C is consistent, AA− CB − B = C. As a result

AXY B = AA− CB − B + AY B − AA− AY BB − B = C + AY B − AY B = C,

so XY is a solution. Next suppose X∗ is a solution. Then AX∗ B = C which leads to A− AX∗ BB − =


A− CB − . The result now follows since

XX∗ = A− CB − + X∗ − A− AX∗ BB − = A− CB − + X∗ − A− CB − = X∗ .

6.7 Suppose there exists a y such that A− c and (In − A− A)y 6= 0 are linearly dependent. After possibly
making a scalar adjustment to this vector y, we would then have A− c = (In − A− A)y, and this leads
to
AA− c = A(In − A− A)y = (A − AA− A)y = (A − A)y = 0.

But this is a contradiction because if the system is consistent, we must have AA− c = c 6= 0. Thus,
A− c and (In − A− A)y must be linearly independent.

6.9 From Theorem 5.24, for arbitrary n × m matrix C, we must have

0 = (C − A− ACAA− )c = Cc − A− ACAA− c = Cc − A− ACc = (In − A− A)Cc,

since AA− c = c if Ax = c is consistent. But since C is arbitrary and c 6= 0, any b ∈ Rn can be written
as b = Cc for some C, so we must have In − A− A = (0). The result now follows from Theorem 6.6.

6.11 The number of linearly independent solutions is r = n−rank(A) = 4−2 = 2. Using the Moore-Penrose
inverse  
9 17
 
−6 
 
1  12
A+ = A0 (AA0 )−1 =  ,
86  −34

−26 

 
−9 −17
and  
61 24 18 25
 
24 −24 
 
+ 1  24 32
I4 − A A =  ,
86 
 18 24 18 −18 

 
25 −24 −18 61
the general solution is given by xy = (In − A+ A)y. Choosing 86e1 and 86e2 for y, we get the two
particular linearly independent solutions, (61, 24, 18, 25)0 and (24, 32, 24, −24)0 .

6.13 Since G is a generalized inverse of A, the arbitrary solution x∗ can be written as

x∗ = Gc + (In − GA)y

for some vector y. Then

x0∗ x∗ = c0 G0 Gc + y 0 (In − GA)0 (In − GA)y + 2c0 G0 (In − GA)y

= c0 G0 Gc + y 0 (In − GA)0 (In − GA)y

since

c0 G0 (In − GA)y = (Ax∗ )0 G0 (In − GA)y

= x0∗ A0 G0 (In − GA)y

= x0∗ (GA)0 (In − GA)y

= x0∗ GA(In − GA)y

= x0∗ (GA − GAGA)y

= x0∗ (GA − GA)y = 0.

Thus,
x0∗ x∗ ≥ c0 G0 Gc (1)

since
y 0 (In − GA)0 (In − GA)y (2)

is nonnegative. We have equality in (1) only if the quantity in (2) is 0 and this requires (In −GA)y = 0.
This confirms that we have a strict inequality in (1) if x∗ 6= Gc.

6.15 Note that Im = XX −1 = X1 X1− + X2 X2− , so that X2 X2− = Im − X1 X1− . Then using Theorem 6.5,
we can solve AX1 = X1 Λ1 for A to get

A = X1 ΛX1− + Y (Im − X1 X1− ) = X1 ΛX1− + Y X2 X2− ,

where Y is an arbitrary m × m matrix. If we let W = −Y X2 , then A can be expressed as

A = X1 ΛX1− − W X2− .

6.17 (a) Suppose there is a common solution. That is, there exists an X satisfying AX = C and XB = D.
Postmultiplying the first of these two equations by B, while premultiplying the second by A yields
two equations for AXB which when equated gives AD = CB. Now suppose AD = CB. We need
to show that the general solution to AX = C, XY = A− C + Y − A− AY , is a solution to XB = D
for some Y . Take Y = DB − so that

XY = A− C + DB − − A− ADB − = A− C + DB − − A− CBB − .

Then since XB = D is consistent, we know that DB − B = D and so

XY B = A− CB + DB − B − A− CBB − B = A− CB + D − A− CB = D.

(b) The general solution to AX = C is X = A− C + W − A− AW for arbitrary W . If this is also


to be a solution to XB = D, W must satisfy XB = A− CB + (In − A− A)W B = D, and since
AD = CB, this leads to (In − A− A)W B = D − A− AD. The general solution for W in this last
system of equations is

W = (In − A− A)(D − A− AD)B − + Y − (In − A− A)Y BB −

for arbitrary Y , since (In − A− A)− = (In − A− A). Substituting this back into the expression for
X, we get

X = A− C + (In − A− A)(D − A− AD)B − + Y − (In − A− A)Y BB − − A− AY

= A− C + (In − A− A)DB − + (In − A− A)Y (Ip − BB − ),

as required.

6.19 (a) Using the least squares inverse obtained in the solution to Problem 5.49, we find that
 
32
1  
AAL c =  18  6= c,
 
22  
114

so the system is inconsistent.

(b) A least squares solution is given by
 
73
 
 
L 1  41 
A c=  .
22  0 


 
0

(c) The sum of squared errors for a least squares solution to this system of equations is
4
(AAL c − c)0 (AAL c − c) = .
11
6.21 Suppose x∗ is a least squares solution. Then, according to Theorem 6.13, Ax∗ = AAL c, and this leads
to
A0 Ax∗ = A0 (AAL )0 c = A0 AL0 A0 c = A0 c.

Conversely, now suppose A0 Ax∗ = A0 c. Then AL0 A0 Ax∗ = AL0 A0 c, or (AAL )0 Ax∗ = (AAL )0 c, and
so AAL Ax∗ = AAL c or, finally Ax∗ = AAL c. Thus, it follows from Theorem 6.13 that x∗ is a least
squares solution.
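
This equivalence (x∗ is a least squares solution if and only if it satisfies the normal equations
A′Ax∗ = A′c) can be illustrated numerically; the matrix and right-hand side below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
c = rng.standard_normal(6)

# One least squares solution, via the Moore-Penrose inverse
# (which is in particular a least squares inverse).
x_star = np.linalg.pinv(A) @ c

# It satisfies the normal equations A'Ax = A'c.
print(np.allclose(A.T @ A @ x_star, A.T @ c))    # True

# And it minimizes the sum of squared errors.
x_lstsq = np.linalg.lstsq(A, c, rcond=None)[0]
print(np.allclose(x_star, x_lstsq))              # True (here A has full column rank)
```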

6.23 A solution to this problem can be found in Example 8.3.

6.25 The general solution for β̂ in B β̂ = b is given by

β̂ u = B − b + (Im − B − B)u, (1)

where u is an arbitrary m × 1 vector. Substituting this in the equation X β̂ = y yields

XB − b + X(Im − B − B)u = y,

or
X(Im − B − B)u = y − XB − b. (2)

Now using Theorem 6.14, the general least squares solution for u in (2) is given by

uw = {X(Im − B − B)}L (y − XB − b) + [Im − {X(Im − B − B)}L X(Im − B − B)]w,

where w is an arbitrary m × 1 vector. We then get the general restricted least squares solution for β
by substituting uw for u in (1), and this leads to

β̂ w = B − b + (Im − B − B)({X(Im − B − B)}L (y − XB − b)

+ [Im − {X(Im − B − B)}L X(Im − B − B)]w).

Chapter 7

7.1 (a) We have

     |A| = |A22| |A11 − A12 A22^{-1} A21| = |dIm| |(a − bc/d)Im|
         = d^m (a − bc/d)^m = (ad − bc)^m .

(b) A is nonsingular for any values of a, b, c, and d for which |A| ≠ 0, that is, for which ad ≠ bc.

(c) Writing

         A^{-1} = [ B11  B12 ],
                  [ B21  B22 ]
and using Theorem 7.1, we get

     B11 = (A11 − A12 A22^{-1} A21)^{-1} = {(a − bc/d) Im}^{-1} = ( d/(ad − bc) ) Im ,

     B22 = (A22 − A21 A11^{-1} A12)^{-1} = {(d − bc/a) Im}^{-1} = ( a/(ad − bc) ) Im ,

     B12 = −A11^{-1} A12 B22 = −( b/(ad − bc) ) Im ,

     B21 = −A22^{-1} A21 B11 = −( c/(ad − bc) ) Im .
That is,

     A^{-1} = (ad − bc)^{-1} [ dIm   −bIm ]
                             [ −cIm   aIm ].
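
These block formulas can be checked numerically for particular values of a, b, c, d, and m; the values
below are arbitrary, chosen so that ad ≠ bc.

```python
import numpy as np

a, b, c, d = 2.0, 3.0, 1.0, 4.0     # ad - bc = 5, so A is nonsingular
m = 3
I = np.eye(m)

A = np.block([[a * I, b * I],
              [c * I, d * I]])

A_inv = np.block([[ d * I, -b * I],
                  [-c * I,  a * I]]) / (a * d - b * c)

print(np.allclose(A @ A_inv, np.eye(2 * m)))                # True
print(np.isclose(np.linalg.det(A), (a * d - b * c) ** m))   # True
```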

7.3 Using Theorem 7.4 and Problem 3.29, we have

|A| = |bIm ||aIm − (cdm/b)1m 10m | = bm am−1 (a − cdm2 /b),

so A is nonsingular if a 6= 0, b 6= 0, and ab 6= cdm2 . Letting g = ab − cdm2 and using Theorem 7.1 and
Problem 3.29, we find that

cdm
B11 = (aIm − (cdm/b)1m 10m )−1 = a−1 Im + 1m 10m ,
ag
cdm
B22 = (bIm − (cdm/a)1m 10m )−1 = b−1 Im + 1m 10m ,
bg

 
cdm c
B12 = −(aIm )−1 (c1m 10m ) b−1 Im + 1m 10m = − 1m 10m ,
bg g
 
cdm d
B21 = −(bIm )−1 (d1m 10m ) a−1 Im + 1m 10m = − 1m 10m .
ag g
That is,  
0
a−1 Im + cdm
ag 1m 1m − gc 1m 10m
A−1 =  .
− dg 1m 10m b−1 Im + cdm 0
bg 1m 1m

7.5 Applying Theorem 7.4 with A11 = diag(4, 3, 2),

           [ 1  2 ]
     A12 = [ 1  2 ],    A21 = [ 0  0  1 ],    A22 = [ 2  3 ],
           [ 2  3 ]           [ 1  1  0 ]           [ 1  2 ]

we get
                                               | 1     3/2 |
     |A| = |A11| |A22 − A21 A11^{-1} A12| = 24 |           | = 5.
                                               | 5/12  5/6 |

Applying Theorem 7.1, we find

     B22 = (A22 − A21 A11^{-1} A12)^{-1} = (1/5) [  20  −36 ],
                                                 [ −10   24 ]

     B11 = A11^{-1} + A11^{-1} A12 B22 A21 A11^{-1} = (1/5) [ 2  1  0 ]
                                                            [ 1  3  0 ],
                                                            [ 0  0  5 ]

     B12 = −A11^{-1} A12 B22 = (1/5) [  0  −3 ]
                                     [  0  −4 ],
                                     [ −5   0 ]

     B21 = −B22 A21 A11^{-1} = (1/5) [  9  12  −10 ].
                                     [ −6  −8    5 ]
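
As a numerical check (not part of the original solution), the sketch below assembles A from the blocks
above and confirms that the Theorem 7.1 expressions reproduce the inverse computed directly.

```python
import numpy as np

A11 = np.diag([4.0, 3.0, 2.0])
A12 = np.array([[1.0, 2.0], [1.0, 2.0], [2.0, 3.0]])
A21 = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
A22 = np.array([[2.0, 3.0], [1.0, 2.0]])
A = np.block([[A11, A12], [A21, A22]])

print(np.isclose(np.linalg.det(A), 5.0))                 # True

# Block-inverse formulas (Theorem 7.1).
A11i = np.linalg.inv(A11)
B22 = np.linalg.inv(A22 - A21 @ A11i @ A12)
B11 = A11i + A11i @ A12 @ B22 @ A21 @ A11i
B12 = -A11i @ A12 @ B22
B21 = -B22 @ A21 @ A11i

B = np.block([[B11, B12], [B21, B22]])
print(np.allclose(B, np.linalg.inv(A)))                  # True
```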

7.7 (a) If rank(A) = rank(A11 ), it follows that


   
A11 A12 A11 A11 Y
 = 
A21 A22 XA11 XA11 Y

for some matrices X and Y . This requires X = A21 A−1 −1
11 and Y = A11 A12 , and so

A22 = XA11 Y = A21 A−1 −1 −1


11 A11 A11 A12 = A21 A11 A12 .

(b) With B = diag(A−1


11 , (0)), we have
    
A11 A12 I A−1
11 A12 A11 A12
A(BA) =    m1 =  = A.
A21 A21 A−1 11 A12 (0) (0) A21 A21 A−1
11 A12

(c) It follows from Theorem 5.9 that


  +
Im1
A+ A−1
 
=   A11 Im1 11 A12

A21 A−1
11
 +
+ Im1
A−1 A−1  = G+ A−1 +

= Im1 11 A12 11
 11 H .
A21 A−1
11

Here
 
Im1
G+ = G0 (GG0 )−1 =   (Im1 + A−1 0 −10 −1
11 A12 A12 A11 )
A012 A−10
11
 
A011
=   (A11 A011 + A12 A012 )−1 A11 ,
A012

and

H+ (H 0 H)−1 H 0 = (Im1 + A−10 0 −1 −1


A−10 0
 
= 11 A21 A21 A11 ) Im1 11 A21

= A11 (A011 A11 + A021 A21 )−1 [A011 A021 ] ,

and so the result follows.

−1
7.9 From Theorem 7.1, B11 = A11 − A12 A−1 0
22 A12 , and so
     
−1 −1 0 −1
A11 − B11 A12 A A A A12 A A
 =  12 22 12  =  12 22  A22 A−1 0
 
C= 22 A12 Im2
0 0
A12 A22 A12 A22 Im2

is nonnegative definite since A22 is positive definite. Due to the identity above, rank(C) ≤ rank(A22 ) =
m2 = m − m1 , while it follows that rank(C) ≥ rank(A22 ) since A22 is a submatrix of C. This confirms
that C is positive semidefinite with rank of m − m1 .

7.11 (a) Applying Theorem 7.4, we have

|A| = |A11 |(amm − a0 A−1


11 a) ≤ amm |A11 |,

where the inequality follows from the fact that A−1


11 is positive definite. We have equality if and

only if a0 A−1
11 a = 0, and this is equivalent to a = 0.

(b) We can prove by induction. The result holds for m = 2 since in this case we can apply the result
from part (a). Now assume the result holds for (m − 1) × (m − 1) positive definite matrices. That
is, if A11 is (m − 1) × (m − 1) and positive definite, then

|A11 | ≤ a11 · · · am−1,m−1 ,

with equality if and only if A11 is diagonal. Applying the result of part (a) to the matrix
 
A11 a12
A= ,
a012 amm

we have |A| ≤ amm |A11 | with equality if and only if a12 = 0. Combining these two results yields
the desired result.

7.13 The result follows by computing the determinant of both sides of the identity
    
A11 A12 Im1 (0) A11 A12
 =  .
A21 + BA11 A22 + BA12 B Im2 A21 A22

7.15 If A11 A21 = A21 A11 , then


    
A11 A12 I (0) A11 A12
  =  m1  .
(0) A21 A12 − A11 A22 A21 −A11 A21 A22

Thus,
|A11 ||A21 A12 − A11 A22 | = | − A11 ||A|,

which leads to |A| = |A11 A22 − A21 A12 |.

7.17 If B has rank 1, it can be expressed as B = −dc0 for some m × 1 vectors c and d. Using Theorem
7.4(a), we have
A d
= |A − dc0 | = |A + B|,
0
c 1

while Theorem 7.4(b) yields

A d
= |A|(1 − c0 A−1 d) = |A|{1 + tr(A−1 B)}.
0
c 1

The result follows by equating these two equations.

7.19 The proof utilizes the results given in Theorem 5.25. If R(A12 ) ⊂ R(A11 ), then A11 A−
11 A12 = A12 , and

so   
A11 (0) Im1 A−
11 A12
A=  . (1)
A21 Im2 (0) A22 − A21 A−
11 A12

If R(A021 ) ⊂ R(A011 ), then A21 A−


11 A11 = A21 , and so
  
Im1 (0) A A12
A=   11 . (2)
A21 A− −
11 A22 − A21 A11 A12 (0) Im2

Taking the determinant of both sides of (1) and (2) yields the desired result.

7.21 Consider  
1 0 1 1
 
 
 0 1 1 1 
A=

,

 1 1 1 1 
 
1 1 1 1
and partition as in (7.1) with m1 = 1 and m2 = 3. The conditions in Theorem 7.8(a) do not hold
since the column spaces of both A21 and A012 are spanned by (0, 1, 1)0 , while the column space of A22
is spanned by (1, 1, 1)0 . However, the identity holds since both |A| and |A22 | are zero.

7.23 Using Theorem 5.25, if R(A21 ) ⊂ R(A22 ), then A21 = A22 A−


22 A21 . That is, A21 = A22 B for some

matrix B. Similarly, if R(A012 ) ⊂ R(A022 ), then A12 A−


22 A22 = A12 , so that A12 = CA22 for some matrix

C. The result then follows since

A12 A− −
22 A21 = CA22 A22 A22 B = CA22 B.

Consider  
1 0 1
 
A= 1 1 ,
 
1
 
1 1 1

and partition A as in (7.1) with m1 = 1 and m2 = 2. Note that R(A21 ) ⊂ R(A22 ) holds, but
R(A012 ) ⊂ R(A022 ) does not. The matrices D = diag(1, 0) and E = diag(0, 1) are both generalized
inverses of the matrix A22 = 12 102 , and A11 − A12 DA21 = 1, while A11 − A12 EA21 = 0.

7.25 Write the expression given for A− as A− = X + Y . Note that since A11 G = (0), we have
 
A11 A−
11 + CC −
F (0)
AX =  ,
(Im2 − DD )(A21 A−
− −
11 + BC F ) DD

and then since F A11 = (0),


 
A11 A12
AXA =  . (1)
A21 A22 − (Im2 − DD− )B(Im2 − C − C)

Also  
−C h i
AY =  E BC − F + A21 A−
11 −Im2 ,
−(Im2 − DD− )B

and since CE = (0), ED = (0), and (Im2 − DD− )BEB(Im2 − C − C) = (Im2 − DD− )B(Im2 − C − C)
 
(0) (0)
AY A =  . (2)
(0) (Im2 − DD− )B(Im2 − C − C)

Adding (1) and (2) confirms that AA− A = A.

7.27 With  
B1− −B1− A12 A−
22
C= ,
−A− −
22 A21 B1 A− − − −
22 + A22 A21 B1 A12 A22

we need to verify that ACA = A if and only the stated conditions hold. Partitioning ACA and A
into 2 × 2 forms and equating the (1, 1)th submatrices yields an identity equivalent to the identity
A11 = A11 , which of course holds. Equating the (2, 1)th submatrices yields an identity equivalent
to the identity given in (a), while equating the (1, 2)th submatrices yields the identity given in (b).
Finally, equating the (2, 2)th submatrices yields an identity equivalent to the final condition in (c).

7.29 We have  
1 −1
A−1
11 =
 ,
−1 2

from which we get B = 0. Consequently,

1
Z = A012 A−1 −1 0 −1 −1
11 (I2 + A11 A12 A12 A11 ) = [2 − 2] ,
9

and   
A−1 0 Z 0
B ∼ = [Z 1] 
11   = 20 .
0 0
0 1 81

This then leads to  


17 19 −4
1  
A+ = −14  .
 
 19 26
81  
−4 −14 20

7.31 Note that C is symmetric and idempotent and R(B) = R(B + ) ⊂ R(C). Thus, B + C = B + and
CB + = B + , and so
A012 B + = (0) and B + A12 = (0). (1)

Using this, we find that


 
+
A12 A+
12 + CA11 B
+
CA11 A+0 + +0
12 − CA11 B A11 A12
AA =   (2)
(0) A012 A+0
12

and  
A+0 0 +
12 A12 + B A11 C (0)
A+ A =  . (3)
A+ + +
12 A11 C − A12 A11 B A11 C A+
12 A12

Now CA11 B + = CA11 CB + = BB + and, since B = CA11 C = CT T 0 C 0 for some m1 × m1 matrix T ,

CA11 B + A11 = CA11 CB + CA11 = BB + CA11 = PR(CT ) CT T 0 = CT T 0 = CA11 .

When these are used in (2) and (3), we get


 
A12 A+
12 + BB
+
(0)
AA+ = A+ A =   (4)
(0) A012 A+0
12

which is symmetric. Finally using (1), (4) and the fact that BB + A11 = BB + CA11 = CA11 , we get
AA+ A = A and A+ AA+ = A+ . Thus, we have shown that the expression given for A+ satisfies the
four conditions of the Moore–Penrose inverse.

7.33 In the proof of Theorem 7.18, it was shown that λm1 −j+1 (B1 ) ≥ λm−j+1 (A), so the lower bound clearly
holds. Applying the inequality from Problem 7.32 to A−1 , we get
k
( ) k
X
−2 −2 λ−1
m2 (B2 )
X
{λm−j+1 (A) − λm1 −j+1 (B1 )} ≤ 2 1 + −1 −1 λj (CC 0 ).
j=1
λ m1 −k+1 (B1 ) − λ m 2 (B2 ) j=1

Now combining this with the fact that


k k
X X λ2m (B1 ) − λ2m−j+1 (A)
1 −j+1
{λ−2 −2
m−j+1 (A) − λm1 −j+1 (B1 )} =
j=1 j=1
λ2m−j+1 (A)λ2m1 −j+1 (B1 )
k
X
≥ λ−2 −2
m−k+1 (A)λm1 −k+1 (B1 ) {λ2m1 −j+1 (B1 ) − λ2m−j+1 (A)},
j=1

leads to
k
X
{λ2m1 −j+1 (B1 ) − λ2m−j+1 (A)} ≤ 2λ2m−k+1 (A)λ2m1 −k+1 (B1 )
j=1
( ) k
λ−1
m2 (B2 )
X
× 1+ λj (CC 0 )
λ−1 −1
m1 −k+1 (B1 ) − λm2 (B2 ) j=1
( ) k
−1
λm2 (B2 ) X
≤ 4
2λm1 −k+1 (B1 ) 1 + −1 λj (CC 0 ).
λm1 −k+1 (B1 ) − λ−1
m2 (B2 ) j=1

7.35 Note that the lower bound is an immediate consequence of the lower bound given for λh (B1 ) in Problem
7.34. Now let
       
Q0 (0) Q (0) Q0 A11 Q Q0 A12 C11 C12
C= A = = .
(0) Im2 (0) Im2 A012 Q A22 0
C12 C22
−1 0
Note that C is positive definite and C1 = C11 − C12 C22 C12 = ∆, so that the positive eigenvalues of
C1 and B1 are the same and, in particular, λ1 (C1 ) < λm2 (B2 ). Thus, we can apply Theorem 7.18 to
C. Note that −C1−1 C12 C22
−1
= −∆−1 Q0 A12 A−1
22 = Ĉ, and so

−r+k
m1X −r+k
m1X
{λm1 −j+1 (B1 ) − λm−j+1 (A)} = {λm1 −j+1 (B1 ) − λm−j+1 (A)}
j=1 j=m1 −r+1
k
X
≤ {λr−j+1 (C1 ) − λm2 +r−j+1 (C)}
j=1
k
λ2r−k+1 (C1 ) X
≤ λj (Ĉ Ĉ 0 )
{λ−1 −1
r−k+1 (C1 ) − λm2 (B2 )} j=1

k
λ2r−k+1 (B1 ) X
= λj (Ĉ Ĉ 0 ).
{λ−1 −1
r−k+1 (B1 ) − λm2 (B2 )} j=1

The first equality follows from the fact that λh+m2 (A) = λh (B1 ) = 0 for h = r + 1, . . . , m1 , a conse-
quence of rank(A) = rank(A22 ) + rank(B1 ) = m2 + r, while the first inequality follows from Theorem
3.19.

Chapter 8

8.1 (a) We have

             [ 10   6  15   9 ]              [ 10  15   6   9 ]
     A ⊗ B = [  6   4   9   6 ],     B ⊗ A = [  5  10   3   6 ].
             [  5   3  10   6 ]              [  6   9   4   6 ]
             [  3   2   6   4 ]              [  3   6   2   4 ]

(b) tr(A ⊗ B) = tr(A) tr(B) = 4(7) = 28.

(c) |A ⊗ B| = |A|² |B|² = (1²)(1²) = 1.

(d) We find that

     |A − λI2| = λ² − 4λ + 1,     |B − λI2| = λ² − 7λ + 1,

and so the eigenvalues of A are 2 ± √3, while those of B are 7/2 ± (3/2)√5. Consequently, the four
eigenvalues of A ⊗ B are given by

     (2 + √3)(7/2 + (3/2)√5),  (2 + √3)(7/2 − (3/2)√5),  (2 − √3)(7/2 + (3/2)√5),  (2 − √3)(7/2 − (3/2)√5).

(e) Using Theorem 8.4, we have

     (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1} = [  2  −3 ] ⊗ [  2  −3 ]
                                      [ −1   2 ]   [ −3   5 ]

                                      [  4   −6   −6    9 ]
                                    = [ −6   10    9  −15 ].
                                      [ −2    3    4   −6 ]
                                      [  3   −5   −6   10 ]
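
All of the computations in (a) through (e) can be reproduced with NumPy; the matrices A and B below
are the 2 × 2 matrices consistent with the product displayed in (a) (they are given in the textbook
problem, not in this solution).

```python
import numpy as np

# Assumed to be the A and B of the textbook problem (consistent with the displays above).
A = np.array([[2.0, 3.0], [1.0, 2.0]])
B = np.array([[5.0, 3.0], [3.0, 2.0]])

AB = np.kron(A, B)
print(AB.astype(int))                                  # the 4 x 4 matrix in (a)
print(np.trace(AB), np.trace(A) * np.trace(B))         # 28.0 28.0
print(np.linalg.det(AB))                               # 1.0 up to rounding (= |A|^2 |B|^2)

# Eigenvalues of the Kronecker product are the pairwise products of eigenvalues.
eigs = np.sort(np.linalg.eigvals(AB))
pairs = np.sort(np.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).ravel())
print(np.allclose(eigs, pairs))                        # True

# Inverse of a Kronecker product (Theorem 8.4).
print(np.allclose(np.linalg.inv(AB),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))   # True
```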

8.5 Using the definition of a Kronecker product, we have


   
a1 B a1 B1 · · · a1 Bk
 .   .. ..
    h i
a ⊗ B =  ..  =   = a ⊗ B1 ··· a ⊗ Bk .

   . . 
am B am B1 · · · am Bk

8.7 From Theorem 8.1(f), we have
(A ⊗ B)0 = A0 ⊗ B 0 = A ⊗ B,

so A ⊗ B is symmetric.

8.9 If cA and c−1 B are orthogonal, then A0 A = c−2 Im and B 0 B = c2 In , and so

(A ⊗ B)0 (A ⊗ B) = A0 A ⊗ B 0 B = c−2 Im ⊗ c2 In = Imn ,

confirming that A⊗B is orthogonal. Conversely, if A⊗B is orthogonal, we must have A0 A⊗B 0 B = Imn .
This implies that (A0 A)ij B 0 B = (0) when i 6= j, so A0 A must be diagonal. Also, we must have
(A0 A)ii B 0 B = In for i = 1, . . . , m, implying that B 0 B is diagonal with each diagonal element equal to
{(A0 A)ii }−1 . Thus, we have shown that for some d > 0, A0 A = dIm and B 0 B = d−1 In , and this yields the
result with c = d−1/2 .

8.11 Each element of A ⊗ B is an element of A times an element of B so it immediately follows that


A ⊗ B = (0) if A = (0) or B = (0). Now suppose that A ⊗ B = (0) and A 6= (0). Then for some i
and j, aij 6= 0. But aij B is a submatrix of A ⊗ B, so we must have aij B = (0) implying B = (0). In a
similar fashion, it can be shown that if B 6= (0), we must have A = (0).

8.13 Since rank(A) ≤ m < n, there exist n − m linearly independent n × 1 vectors x1 , . . . , xn−m such that
Axi = 0 for each i. If y 1 , . . . , y m is a set of linearly independent m × 1 vectors, then xi ⊗ y j for
i = 1, . . . , n − m and j = 1, . . . , m, form a linear independent set of vectors and

(A ⊗ B)(xi ⊗ y j ) = (Axi ⊗ By j ) = (0 ⊗ By j ) = 0

for all i and j. This confirms that 0 is an eigenvalue with multiplicity at least (n − m)m.

8.15 Let λ1 , . . . , λm be the eigenvalues of A, and θ1 , . . . , θp be the eigenvalues of B. Since A and B are
positive definite, λi > 0 and θj > 0 for all i and j. Thus, λi θj > 0 for all i and j, and so according to
Theorem 8.5, all of the eigenvalues of A ⊗ B are positive. This confirms that A ⊗ B is positive definite.

8.17 The three matrices are the same since


   
x1 x1 y1 ··· x1 yn
 .  .. ..
 h i  
xy 0 =  ..  y1 ··· = ,
 
yn . .
   
xm xm y1 ··· xm yn

 
x1 y1 ··· x1 yn
.. ..
h i  
y0 ⊗ x = ··· = ,
 
y1 x yn x . .
 
xm y1 ··· xm yn
   
x1 y 0 x1 y1 ··· x1 yn
.. .. ..
   
0
x⊗y = = .
   
 .   . . 
xm y 0 xm y1 ··· xm yn

8.19 (a) For this model, we have

X = (1a ⊗ 1b ⊗ 1n , Ia ⊗ 1b ⊗ 1n , 1a ⊗ Ib ⊗ 1n ),

and  
abn bn10a an10b
 
0
X X =  bn1a n1a ⊗ 10b  .
 
bnIa
 
an1b n10a ⊗ 1b anIb
Using the generalized inverse

(X 0 X)− = diag((abn)−1 , (bn)−1 (Ia − a−1 1a 10a ), (an)−1 (Ib − b−1 1b 10b ))

we find that a least squares solution for β is given by


 
ȳ··
 
 ȳ − ȳ 
 1· ·· 

 .
..


 
 
β̂ = (X 0 X)− X 0 y =  ȳa· − ȳ·· ,
 
 
 ȳ·1 − ȳ··
 

 
 .. 

 . 

ȳ·b − ȳ··

where ȳ·· , ȳi· , and ȳ·j are as defined in Example 8.3. The fitted value for yijk is µ̂ + τ̂i + γ̂j =
ȳi· + ȳ·j − ȳ·· , and so the sum of squared errors is given by
a X
X b X
n
SSE = (yijk − ȳi· − ȳ·j + ȳ·· )2 .
i=1 j=1 k=1

(b) The reduced model is a one-way classification model with b treatments and an observations for
each treatment. Using Example 8.2, we find that the sum of squared errors for this reduced model
is given by
a X
b X
n
!2
X X X X
SSE1 = (yijk − ȳ·j )2 = 2
yijk − (an)−1 yijk .
i=1 j=1 k=1 ijk j ik

Noting that the SSE from part (a) can be expressed as


 2 !2  2
X X X X X X
SSE = 2
yijk − (bn)−1  yijk  − (an)−1 yijk + (abn)−1  yijk  , (1)
ijk i jk j ik ijk

we find that
 2  2
X X X
SSA = SSE1 − SSE = (bn)−1  yijk  − (abn)−1  yijk 
i jk ijk
a
X
= nb (ȳi· − ȳ·· )2 .
i=1

(c) The reduced model is a one-way classification model with a treatments and bn observations for
each treatment. Again using Example 8.2, we find that the sum of squared errors for this reduced
model is given by
 2
a X
X b X
n X X X
SSE2 = (yijk − ȳi· )2 = 2
yijk − (bn)−1  yijk  .
i=1 j=1 k=1 ijk i jk

Thus, we have
!2  2
X X X
SSB = SSE2 − SSE = (an)−1 yijk − (abn)−1  yijk 
j ik ijk
b
X
= na (ȳ·j − ȳ·· )2 .
j=1

(d) For each i, µ + τi is estimable and for each j, µ + γj is estimable with the corresponding estimates
given by ȳi· and ȳ·j , respectively.

(e) Note that the SSE given in Problem 8.18 can be expressed as
!2
X X X
2 −1
yijk − n yijk .
ijk ij k

Subtracting this from the SSE given in (1), we get
!2  2 !2
X X X X X X
SSAB = n−1 yijk − (bn)−1  yijk  − (an) −1
yijk
ij k i jk j ik
 2
X
+ (abn)−1  yijk 
ijk
X X X
2
= n ȳij − nb ȳi·2 − na 2
ȳ·j + nabȳ··2
ij i j
a X
X b
= n (ȳij − ȳi· − ȳ·j + ȳ·· )2 .
i=1 j=1
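
As a numerical aside, the closed-form sums of squares derived in (a) through (e) add up to the total sum
of squares about the grand mean, and the SSE of part (a) equals the within-cell SSE of Problem 8.18
plus SSAB. The sketch below checks both identities on random data.

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, n = 3, 4, 5
y = rng.standard_normal((a, b, n))       # y[i, j, k]

ybar_ij = y.mean(axis=2)                  # cell means
ybar_i = y.mean(axis=(1, 2))              # level means for the first factor
ybar_j = y.mean(axis=(0, 2))              # level means for the second factor
ybar = y.mean()                           # grand mean

SSA = n * b * np.sum((ybar_i - ybar) ** 2)
SSB = n * a * np.sum((ybar_j - ybar) ** 2)
SSAB = n * np.sum((ybar_ij - ybar_i[:, None] - ybar_j[None, :] + ybar) ** 2)
SSE_within = np.sum((y - ybar_ij[:, :, None]) ** 2)        # SSE of Problem 8.18
SSE_additive = np.sum((y - ybar_i[:, None, None]
                         - ybar_j[None, :, None] + ybar) ** 2)   # SSE of part (a)
SSTotal = np.sum((y - ybar) ** 2)

print(np.isclose(SSA + SSB + SSAB + SSE_within, SSTotal))   # True
print(np.isclose(SSE_additive, SSAB + SSE_within))          # True
```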

8.21 (a) We have

(A1 ⊕ A2 ) + (A3 ⊕ A4 ) = diag(A1 , A2 ) + diag(A3 , A4 )

= diag(A1 + A3 , A2 + A4 ) = (A1 + A3 ) ⊕ (A2 + A4 ).

(b) Similarly

(A1 ⊕ A2 )(A3 ⊕ A4 ) = diag(A1 , A2 ) diag(A3 , A4 )

= diag(A1 A3 , A2 A4 ) = A1 A3 ⊕ A2 A4 .

(c) Finally,

(A1 ⊕ A2 ) ⊗ A3 = diag(A1 , A2 ) ⊗ A3

= diag(A1 ⊗ A3 , A2 ⊗ A3 ) = (A1 ⊗ A3 ) ⊕ (A2 ⊗ A3 ).

8.23 Solving the system of equations AXB = C in Theorem 6.5 is equivalent to solving the system of
equations
(B 0 ⊗ A) vec(X) = vec(C).

From Theorem 6.4, the general solution is

vec(X) = (B 0 ⊗ A)− vec(C) + (Inp − (B 0 ⊗ A)− (B 0 ⊗ A)) vec(Y )

= (B −0 ⊗ A− ) vec(C) + (Inp − (B −0 B 0 ⊗ A− A)) vec(Y )

= vec(A− CB − ) + vec(Y ) − vec(A− AY BB − ).

The result follows since this identity is equivalent to X = A− CB − + Y − A− AY BB − .

8.25 (a) The result follows from Theorem 8.12 when p = n, q = m, and D = Im .

(b) Applying Theorem 8.12, we have

tr(AD0 BDC) = tr(D0 BDCA) = tr{D0 BD(CA)}

= {vec{(D0 )0 }}0 {(CA)0 ⊗ B} vec(D)

= {vec(D)}0 (A0 C 0 ⊗ B) vec(D).

8.27 If C is symmetric, then


Kmm vec(C) = vec(C 0 ) = vec(C).

Using this and Theorem 8.26, we have

{vec(C)}0 (A ⊗ B) vec(C) = {vec(C)}0 (A ⊗ B)Kmm vec(C)

= {vec(C)}0 Kmm (B ⊗ A) vec(C)

= {Kmm vec(C)}0 (B ⊗ A) vec(C)

= {vec(C)}0 (B ⊗ A) vec(C).

8.29 Using Theorem 8.11, we have

vec(AB) = vec(ABIp ) = (Ip ⊗ A) vec(B),

vec(AB) = vec(Im AB) = (B 0 ⊗ Im ) vec(A),

and
vec(AB) = vec(AIn B) = (B 0 ⊗ A) vec(In ).
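
These three identities are easy to verify numerically; note that vec stacks columns, which in NumPy
corresponds to flattening in Fortran (column-major) order. The matrices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 3, 4, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

def vec(M):
    # Column-stacking operator.
    return M.reshape(-1, order='F')

v = vec(A @ B)
print(np.allclose(v, np.kron(np.eye(p), A) @ vec(B)))      # vec(AB) = (I_p kron A) vec(B)
print(np.allclose(v, np.kron(B.T, np.eye(m)) @ vec(A)))    # vec(AB) = (B' kron I_m) vec(A)
print(np.allclose(v, np.kron(B.T, A) @ vec(np.eye(n))))    # vec(AB) = (B' kron A) vec(I_n)
```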

8.31 Using Theorem 8.10 and the Cauchy-Schwarz inequality,

{tr(A0 B)}2 = [{vec(A)}0 vec(B)]2

≤ {vec(A)}0 vec(A){vec(B)}0 vec(B)

= {tr(A0 A)}{tr(B 0 B)}.

We have equality if and only if one of the vectors, vec(A) or vec(B), is a scalar multiple of the other,
or in other words, one of the matrices, A or B, is a scalar multiple of the other.

Pm
8.33 Since Im = i=1 ei e0i , we have
m
! m m
X X X
vec(Im ) = vec ei e0i = vec(ei e0i ) = ei ⊗ ei ,
i=1 i=1 i=1

where the last equality follows from Theorem 8.9(b).

8.35 (a) Straightforward computation yields

     A ∘ B = [ 4   2 ].
             [ 2  12 ]

(b) The characteristic equation of A is |A − λI2| = λ² − 5λ = 0. Thus, the eigenvalues of A are 0 and
5, so it is positive semidefinite. The characteristic equation of B is |B − λI2| = λ² − 7λ + 11 = 0.
Thus, the eigenvalues of B are (1/2)(7 + √5) and (1/2)(7 − √5). Since these are both positive, B is
positive definite. The characteristic equation of A ∘ B is |A ∘ B − λI2| = λ² − 16λ + 44 = 0.
Thus, the eigenvalues of A ∘ B are 8 ± √20, so it is also positive definite. The fact that A ∘ B is
positive definite follows from Theorem 8.17.

8.37 The identity holds since


n
X n
X
tr{(A0 B 0 )C} = {(A0 B 0 )C}ii = (A0 B 0 )i· (C)·i
i=1 i=1
n
X n X
X m
= (A B)0·i (C)·i = aji bji cji ,
i=1 i=1 j=1

and
n
X n
X
0 0
tr{A (B C)} = {A (B C)}ii = (A0 )i· (B C)·i
i=1 i=1
Xn Xn Xm
= (A)0·i (B C)·i = aji bji cji .
i=1 i=1 j=1

8.39 (a) From Theorem 8.20, we have !


m
Y
|A B| ≥ |A| bii .
i=1
Also,
m
Y
bii ≥ |B|,
i=1
when B is positive definite due to Theorem 8.18 and when B is positive semidefinite since |B| = 0
and the diagonal elements of B are nonnegative. Combining these two inequalities yields (a).

(b) This follows directly from part (a) since

|A A−1 | ≥ |A||A−1 | = |AA−1 | = |Im | = 1.

8.41 (a) Since  


8 0
A B= ,
0 3
clearly λ2 (A B) = 3. The lower bound from Theorem 8.21 is

λ2 (A B) ≥ λ2 (A)(min bii ) = 1(2) = 2,


i

while Theorem 8.23 yields the bound

λ2 (A B) ≥ λ2 (AB) = 3.

The bound from Theorem 8.23 is closer since it gives the actual value.

(b) Since  
2 0
A B= ,
0 3
clearly λ2 (A B) = 2. The lower bound from Theorem 8.21 is

λ2 (A B) ≥ λ2 (A)(min bii ) = 1(2) = 2.


i

It is easily shown that AB = B has eigenvalues 1 and 4, so Theorem 8.23 yields the bound

λ2 (A B) ≥ λ2 (AB) = 1.

The bound from Theorem 8.21 is closer since it gives the actual value.

−1/2 −1/2 −1/2 −1/2 −1/2 −1/2 −1/2


8.43 (a) Note that RA = DA ADA and RB = DB BDB , where DA = diag(a11 , . . . , amm )
−1/2 −1/2 −1/2
and DB = diag(b11 , . . . , bmm ). Then using Theorem 8.13(h), we find that

−1/2 −1/2 −1/2 −1/2 −1/2 −1/2 −1/2 −1/2


|RA RB | + |RA ||RB | = |DA ADA DB BDB | + |DA ADA ||DB BDB |

= |DA |−1 |DB |−1 {|A B| + |A||B|},

while

−1/2 −1/2 −1/2 −1/2


|RA | + |RB | = |DA ADA | + |DB BDB |

= |DA |−1 |DB |−1 {|A||DB | + |B||DA |}


( m m
)
Y Y
−1 −1
= |DA | |DB | |A| bii + |B| aii ,
i=1 i=1

so the two inequalities are equivalent.


−1
(b) Let ρ = e01 RB e1 and

li+1 = |RAi RBi | + |RAi ||RBi | − |RAi | − |RBi |, i = 1, . . . , m − 1,

where RAi and RBi denote the matrices obtained by deleting the first i rows and columns of
RA and RB . Note that we need to show that l1 ≥ 0. It follows from Theorem 8.19 that C is
nonnegative definite and, hence, by Theorem 8.16, RA C is nonnegative definite. An application
of Theorem 8.20 yields

|RA |(1 − ρ−1 ) ≤ |RA C| = |RA RB − ρ−1 e1 e01 |


 
|RA1 RB1 |
= |RA RB | 1 − .
|RA RB |ρ

The last step was obtained by expanding along the first row. This last inequality can be equiva-
lently expressed as
|RA RB | − |RA1 RB1 |/ρ ≥ |RA | − |RA |/ρ.

Using this and the fact that |RB | = |RB1 |/ρ, we have

l1 − l2 /ρ = |RA RB | − |RA1 RB1 |/ρ + |RA ||RB | − |RA1 ||RB1 |/ρ

+ |RA1 |/ρ − |RA | + |RB1 |/ρ − |RB |

≥ |RA | − |RA |/ρ + |RA ||RB | − |RA1 ||RB1 |/ρ

+ |RA1 |/ρ − |RA | + |RB1 |/ρ − |RB |

= (ρ−1 − |RB |)(|RA1 | − |RA |).

Note that ρ−1 − |RB | = ρ−1 (1 − |RB1 |) ≥ 0 since RB1 is a correlation matrix, and so we
must have |RB1 | ≤ 1. In addition, |RA1 | − |RA | = |RA1 |(1 − |RA |/|RA1 |) ≥ 0 follows from
Theorem 8.19. Thus, we have established that l1 ≥ l2 |RB |/|RB1 |. In a similar fashion, we

get li ≥ li+1 |RBi−1 |/|RBi | for i = 2, . . . , m − 1. These inequalities lead to the inequality
l1 ≥ lm |RB |/|RBm−1 |. The result then follows since lm = 1 + 1 − 1 − 1 = 0.

8.45 It follows from Theorem 8.16 that A B is nonnegative definite, and since B has r positive diagonal
elements, so does A B. If the ith diagonal element of A B is 0, then (A B)ei = 0 (see Problem
Q
3.51), and so we must have rank(A B) ≤ r. Now apply the formula |A B| ≥ |A| bii of Theorem
8.20 on the submatrix of A B obtained by deleting the rows and columns corresponding to the 0
diagonal elements. This show that A B has an r × r nonsingular submatrix and so rank(A B) ≥ r.
Combining these two inequalities for rank(A B) yields rank(A B) = r.

8.47 If R 6= Im , then we must have λ < 1, so the matrix R − λIm has all of its diagonal elements positive.
It then follows from Theorem 8.17, that R (R − λIm ) is positive definite. Consequently, if x is a
normalized eigenvector of R R corresponding to the eigenvalue τ , then

0 < x0 {R (R − λIm )}x = x0 (R R)x − x0 (R λIm )x

= τ − λx0 Im x = τ − λ,

which yields the desired result.

8.49 With C = P (A ⊗ B)P 0 and D = C −1 partitioned into the 2 × 2 form of (7.1), we have from Problem
−1
7.10, that D11 − C11 is nonnegative definite. Since C11 = A B and

D = C −1 = {P (A ⊗ B)P 0 }−1 = P (A−1 ⊗ B −1 )P 0

so that D11 = A−1 B −1 , part (a) follows. Parts (b) and (c) are special cases of part (a) in which
B = A and B = A−1 , respectively.
8.51

     K22 = [ 1  0  0  0 ]
           [ 0  0  1  0 ]
           [ 0  1  0  0 ],
           [ 0  0  0  1 ]

     K24 = [ 1  0  0  0  0  0  0  0 ]
           [ 0  0  1  0  0  0  0  0 ]
           [ 0  0  0  0  1  0  0  0 ]
           [ 0  0  0  0  0  0  1  0 ]
           [ 0  1  0  0  0  0  0  0 ].
           [ 0  0  0  1  0  0  0  0 ]
           [ 0  0  0  0  0  1  0  0 ]
           [ 0  0  0  0  0  0  0  1 ]
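
As an aside, Kmn can be generated programmatically from the representation used in the solution to
Problem 8.53 below, and the defining property Kmn vec(A) = vec(A′) then recovers the matrices displayed
above.

```python
import numpy as np

def commutation(m, n):
    # K_mn = sum over i, j of (e_i f_j' kron f_j e_i'), with e_i in R^m, f_j in R^n,
    # so that K_mn vec(A) = vec(A') for any m x n matrix A.
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            E = np.zeros((m, n))
            E[i, j] = 1.0
            K += np.kron(E, E.T)
    return K

def vec(M):
    # Column-stacking operator.
    return M.reshape(-1, order='F')

K22, K24 = commutation(2, 2), commutation(2, 4)
print(K22.astype(int))   # matches the K22 displayed above
print(K24.astype(int))   # matches the K24 displayed above

A = np.arange(8, dtype=float).reshape(2, 4)
print(np.allclose(K24 @ vec(A), vec(A.T)))   # True
```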

8.53 Using Theorem 8.1(g), we have
m X
X n m X
X n
Kmn = eim e0jn ⊗ ejn e0im = eim ⊗ e0jn ⊗ ejn ⊗ e0im
i=1 j=1 i=1 j=1
   
m
X n
X m
X n
X
= eim ⊗ (e0jn ⊗ ejn ) ⊗ e0im  = eim ⊗ ejn e0jn ⊗ e0im 
i=1 j=1 i=1 j=1
m
X
= (eim ⊗ In ⊗ e0im ).
i=1

Consequently,
m
X m
X
0
Kmn (x ⊗ A ⊗ y 0 ) = (e0im x ⊗ A ⊗ eim y 0 ) = xi (A ⊗ eim y 0 )
i=1 i=1
m
!
X
= A⊗ xi eim y 0 = A ⊗ xy 0 .
i=1

8.55 (a) Using Theorem 8.26(a), we have

P 0 = {Kmn (A0 ⊗ A)}0 = (A0 ⊗ A)0 Kmn


0
= (A ⊗ A0 )Knm = Kmn (A0 ⊗ A) = P.

(b) Since Kmn is nonsingular, it follows that

rank(P ) = rank(A0 ⊗ A) = rank(A0 ) rank(A) = r2 .

(c) Using Theorem 8.26(f), we have

tr(P ) = tr{Kmn (A0 ⊗ A)} = tr{(A0 ⊗ A)Kmn } = tr(A0 A).

0
(d) Since P is symmetric and Kmn Kmn = Imn , we find that

P 2 = P 0 P = (A ⊗ A0 )Kmn
0
Kmn (A0 ⊗ A) = (A ⊗ A0 )(A0 ⊗ A) = (AA0 ⊗ A0 A).

(e) It follows from part (d) and Theorem 8.5 that the r2 nonzero eigenvalues of P 2 are λi λj , i =
1, . . . , r; j = 1, . . . , r. We then get the nonzero eigenvalues of P , except for sign, by taking the
square roots of these eigenvalues. However,
r
X
tr(P ) = tr(A0 A) = λi ,
i=1

and so the nonzero eigenvalues of P must be λi , i = 1, . . . , r, and ±(λi λj )1/2 for all i < j.

8.57 Using Problem 8.56, we have

vec(In ⊗ Iq ) = (Knq,n ⊗ Iq ){vec(In ) ⊗ vec(Iq )},

and
vec(A0 ⊗ B 0 ) = (Kmp,n ⊗ Iq ){vec(A) ⊗ vec(B 0 )}.

0
Now applying Theorem 8.10 and the fact that Knq,n Kmp,n = Imnp , we get

tr(A ⊗ B) = tr(A0 ⊗ B 0 ) = tr{(In ⊗ Iq )(A0 ⊗ B 0 )}

= {vec(In ⊗ Iq )}0 vec(A0 ⊗ B 0 )

= {vec(In ) ⊗ vec(Iq )}0 (Knq,n


0
⊗ Iq )(Kmp,n ⊗ Iq ){vec(A) ⊗ vec(B 0 )}

= {vec(In ) ⊗ vec(Iq )}0 {vec(A) ⊗ vec(B 0 )}.

8.59 (a) Since (B ⊗ B)Kmm = Kmm (B ⊗ B), we have

1 1 1
(B ⊗ B)Nm = (B ⊗ B)(Im2 + Kmm ) = B ⊗ B + (B ⊗ B)Kmm
2 2 2
1 1 1
= B ⊗ B + Kmm (B ⊗ B) = (Im2 + Kmm )(B ⊗ B) = Nm (B ⊗ B).
2 2 2
2
Further, using this and the fact that Nm = Nm ,

Nm (B ⊗ B)Nm = Nm Nm (B ⊗ B) = Nm (B ⊗ B).

(b) Using part (a),

(B ⊗ B)Nm (B 0 ⊗ B 0 ) = Nm (B ⊗ B)(B 0 ⊗ B 0 ) = Nm (BB 0 ⊗ BB 0 ) = Nm (A ⊗ A).

8.61 (a) The result follows since


 
m m m m m X m
1 XX 1 X X X 
(ei e0i ⊗ ej e0j + ei e0j ⊗ ej e0i ) = (ei e0i ⊗ ej e0j ) + (ei e0j ⊗ ej e0i )
2 i=1 j=1 2  i=1 j=1 i=1 j=1

  
m m
1 X X 
= ei e0i ⊗ ej e0j  + Kmm
2 i=1 j=1

1 1
= {(Im ⊗ Im ) + Kmm } = (Im2 + Kmm ) = Nm .
2 2

(b) For any m × 1 vectors a and b,

1 1 1
Nm (a + b) = (Im2 + Kmm )(a ⊗ b) = {a ⊗ b + Kmm (a ⊗ b)} = (a ⊗ b + b ⊗ a).
2 2 2

(c) The result follows since

1
(a ⊗ b ⊗ c + a ⊗ c ⊗ b + b ⊗ a ⊗ c + b ⊗ c ⊗ a + c ⊗ a ⊗ b + c ⊗ b ⊗ a)
6
m m m
1 XXX
= (eh ah ⊗ ei bi ⊗ ej cj + eh ah ⊗ ei ci ⊗ ej bj
6 i=1 j=1
h=1
+ eh bh ⊗ ei ai ⊗ ej cj + eh bh ⊗ ei ci ⊗ ej aj

+ eh ch ⊗ ei ai ⊗ ej bj + eh ch ⊗ ei bi ⊗ ej aj
m m m
1 XXX
= (eh e0h a ⊗ ei e0i b ⊗ ej e0j c + eh e0h a ⊗ ei e0j b ⊗ ej e0i c
6 i=1 j=1
h=1

+ eh e0i a ⊗ ei e0h b ⊗ ej e0j c + eh e0j a ⊗ ei e0h b ⊗ ej e0i c

+ eh e0i a ⊗ ei e0j b ⊗ ej e0h c + eh e0j a ⊗ ei e0i b ⊗ ej e0h c

= ∆(a ⊗ b ⊗ c).

8.63 Note that


X X X
A = aij Eij = aii Eii + aij (Eij + Eji )
i,j i i>j
X X X
= aii Tii + aij Tij = aij Tij ,
i i>j i≥j

and so
X X
vec(A) = vec(Tij )aij = vec(Tij )u0ij v(A)
i≥j i≥j

since aij = u0ij v(A).

8.65 (a) First suppose B is an m × m symmetric matrix and x0 Bx = 0 for all m × 1 vectors x. Then, in
particular,
e0i Bei = 0

for i = 1, . . . , m, and
(ei + ej )0 B(ei + ej ) = 0

for all i 6= j. Together these two identities imply that bii = 0 for all i and bij = 0 for all i 6= j,
and so we must have B = (0). Now for an arbitrary symmetric m × m matrix A,
m X
X m
{v(A)}0 Dm
0
Dm v(A) = {vec(A)}0 vec(A) = a2ij
i=1 j=1
X m
X
= 2 a2ij − a2ii
i≥j i=1
m
X
= 2{v(A)}0 v(A) − {v(A)}0 uii u0ii v(A)
i=1
m
!
X
0
= {v(A)} 2Im(m+1)/2 − uii u0ii v(A).
i=1

Thus, we have shown that y 0 By = 0 for all m(m + 1)/2 × 1 vectors y when B = {Dm
0
Dm −
Pm
(2Im(m+1)/2 − i=1 uii u0ii )} and, since B is symmetric, the result follows.
0
(b) From part (a), we see that Dm Dm is a diagonal matrix with m diagonal elements equal to 1
and m(m − 1)/2 diagonal elements equal to 2. Since the determinant of a diagonal matrix is the
0
product of its diagonal elements, we have |Dm Dm | = 2m(m−1)/2 .

+
8.67 (a) From Theorem 8.33(c), Dm Dm = Nm , so
+
Dm Dm (A ⊗ A)Dm = Nm (A ⊗ A)Dm = (A ⊗ A)Nm Dm = (A ⊗ A)Dm .

(b) We prove the result by induction. The result holds for i = 2 since
+
{Dm (A ⊗ A)Dm }2 +
= Dm +
(A ⊗ A){Dm Dm (A ⊗ A)Dm }
+ +
= Dm (A ⊗ A)(A ⊗ A)Dm = Dm (A2 ⊗ A2 )Dm .

If the result holds for i, then it holds for i + 1 since


+
{Dm (A ⊗ A)Dm }i+1 = +
{Dm (A ⊗ A)Dm }i {Dm
+
(A ⊗ A)Dm }
+
= Dm (Ai ⊗ Ai )Dm Dm
+
(A ⊗ A)Dm
+
= Dm (Ai ⊗ Ai )(A ⊗ A)Dm
+
= Dm (Ai+1 ⊗ Ai+1 )Dm .

8.69 Since A is lower triangular, A = i≥j aij Eij . Thus, using the fact that aij = u0ij v(A), we get
P
 
X  X X
vec(A) = vec aij Eij = vec(Eij )aij = {vec(Eij )}u0ij v(A).
 
i≥j i≥j i≥j

8.71 Note that if A is strictly lower triangular, then vec(A)0 vec(A) = ṽ(A)0 ṽ(A), and so

0 = vec(A)0 vec(A) − ṽ(A)0 ṽ(A) = ṽ(A)0 L̃m L̃0m ṽ(A) − ṽ(A)0 ṽ(A) = ṽ(A)0 (L̃m L̃0m − Im(m−1)/2 )ṽ(A).

Since any x ∈ Rm(m−1)/2 can be expressed as x = ṽ(A) for some strictly lower triangular matrix A,
this implies that L̃m L̃0m − Im(m−1)/2 = (0) (see the solution to Problem 8.65(a)), which proves (b).
But (b) implies that L̃m is full row rank, which then yields (a) and (c) since

0 0 −1
L̃+
m = L̃m (L̃m L̃m ) = L̃0m Im(m−1)/2 = L̃0m .

To prove (d), for arbitrary A, write A = AL + AU , where AL is strictly lower triangular and AU is
upper triangular. Then
0 = vec(AL )0 vec(AU ) = ṽ(AL )0 L̃m vec(AU ),

regardless of the choice of AL , so we must have L̃m vec(AU ) = 0. Thus, we have

L̃m vec(A) = L̃m vec(AL + AU ) = L̃m vec(AL )

= L̃m L̃0m ṽ(AL ) = ṽ(AL ) = ṽ(A).

8.73 Let A be an arbitrary strictly lower triangular matrix. Then

Kmm L̃0m ṽ(A) = Kmm vec(A) = vec(A0 ),

and since A0 is strictly upper triangular

Lm vec(A0 ) = v(A0 ) = 0,

and
L̃m vec(A0 ) = ṽ(A0 ) = 0.

Thus,
ṽ(A)0 L̃m Kmm L̃0m = vec(A0 )0 L̃0m = 00 ,

and
ṽ(A)0 L̃m Kmm L0m = vec(A0 )0 L0m = 00 ,

which proves (a) and (b). We get (d) and (e) since for any strictly lower triangular matrix A,

L0m Lm L̃0m ṽ(A) = L0m Lm vec(A) = L0m v(A) = vec(A) = L̃0m ṽ(A),

and

Dm Lm L̃0m ṽ(A) = Dm Lm vec(A) = Dm v(A) = vec(A + A0 − DA )

= vec(A) + vec(A0 ) = (Im2 + Kmm ) vec(A)

= 2Nm vec(A) = 2Nm L̃0m ṽ(A).

To get (c), postmultiply the transpose of (d) by Dm and then use Theorem 8.37(a). Finally, (f) follows
from (d) and Theorem 8.38(b).

8.75 First suppose that A is an m × m nonsingular positive matrix and let B = A−1 . For i 6= j,
m
X
(BA)ij = bik akj = 0. (1)
k=1

Since akj > 0 for all k, (1) holds only if bik < 0 for at least one k or bik = 0 for all k. However, this
second option is not possible since B is nonsingular, and so B is not nonnegative. Now suppose both A
and B are nonnegative and assume that the jth column of A has more than one nonzero component. If
ahj > 0 and alj > 0, then for (1) to hold, we must have bih = bil = 0 for all i 6= j. But this implies that
the hth and lth columns of B are both scalar multiples of ej . This cannot be since B is nonsingular,
and so each column of A has one nonzero component.

8.77 Consider the matrices    


0 1 1 0
A= , B= .
0 0 0 2
Both matrices are clearly reducible with A having both of its eigenvalues equal to 0, while B has
eigenvalues of 1 and 2.

(a) It follows that ρ(A) = 0.

(b) An eigenvector of B corresponding to the eigenvalue ρ(B) = 2 is of the form (0, x2 ), which is not
positive.

(c) ρ(A) = 0 is a multiple eigenvalue of A.

8.79 (a) Let λ1 , . . . , λm be the eigenvalues of A, so that λ1 + 1, . . . , λm + 1 are the eigenvalues of Im + A.


Then
ρ(Im + A) = max |λi + 1| ≤ max |λi | + 1 = 1 + ρ(A).
1≤i≤m 1≤i≤m

However, since A is nonnegative, ρ(A) is an eigenvalue of A and, hence, 1 + ρ(A) is an eigenvalue
of Im + A. This confirms that ρ(Im + A) = 1 + ρ(A).

(b) If λ1 , . . . , λm are the eigenvalues of A, λk1 , . . . , λkm are the eigenvalues of Ak . Since A is nonnegative,
ρ(A) is an eigenvalue of A and if it is a multiple eigenvalue of A, then ρ(A)k = ρ(Ak ) would be a
multiple eigenvalue of Ak . However, from Theorem 8.45, we know this is not possible since Ak is
positive.

(c) It follows from part (a) that if ρ(A) is a multiple eigenvalue of A, then 1 + ρ(A) = ρ(Im + A) is
a multiple eigenvalue of Im + A. But A is an irreducible nonnegative matrix and so by Theorem
8.46, (Im + A)m−1 is positive, and then according to part (b), 1 + ρ(A) is a simple eigenvalue of
Im + A. Thus, ρ(A) must be a simple eigenvalue of A.

8.81 Since A is nonnegative and nonnegative definite, we know that aij ≥ 0 for all i and j, and a11 −a212 /a22 ≥
0 if a22 > 0 (see Problem 7.8). We will find B of the form
 
a b
B= ,
0 c

so that    
a2 + b2 bc a11 a12
BB 0 =  = . (1)
bc c2 a12 a22
If a22 = 0, a12 = 0 since A is nonnegative definite (Problem 3.51), and so BB 0 = A with
 
1/2
a11 0
B= .
0 0

If a22 > 0, using (1), we solve for c, b and a to get

1/2 1/2
c = a22 , b = a12 /a22 , a = (a11 − a212 /a22 )1/2 .

Thus, since B is nonnegative, we have shown that A is completely positive.

8.83 Now θ = cos(2π/m) + i sin(2π/m) = e2πi/m and its complex conjugate is

θ̄ = cos(2π/m) − i sin(2π/m) = cos(−2π/m) + i sin(−2π/m) = e−2πi/m = θ−1 .

Similarly, θh = cos(2πh/m) + i sin(2πh/m) = e2πhi/m and

θ¯h = cos(2πh/m) − i sin(2πh/m) = cos(−2πh/m) + i sin(−2πh/m) = e−2πhi/m = θ−h .


This confirms that the conjugate transpose of F is as given in the problem. Now √m (F )ik = θ(i−1)(k−1)
and √m (F ∗ )kj = θ−(k−1)(j−1) , so

m(F F ∗ )ij = m(F )i· (F ∗ )·j = Σ_{k=1}^{m} θ(i−1)(k−1) θ−(k−1)(j−1) = Σ_{k=1}^{m} θ(k−1)(i−j) = Σ_{k=0}^{m−1} {θi−j }k .

When i = j, this reduces to m. Otherwise

Σ_{k=0}^{m−1} {θi−j }k = {1 − (θi−j )m }/(1 − θi−j ) = {1 − (θm )i−j }/(1 − θi−j ) = (1 − 1i−j )/(1 − θi−j ) = 0,

and so we have shown that F F ∗ = Im .
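
As an added check (assuming the Fourier matrix is the normalized one, with (F )jk = m−1/2 θ(j−1)(k−1) , consistent with the computation above), a few lines of NumPy confirm F F ∗ = Im :

```python
import numpy as np

m = 5
theta = np.exp(2j * np.pi / m)                    # theta = cos(2*pi/m) + i sin(2*pi/m)
jj, kk = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
F = theta ** (jj * kk) / np.sqrt(m)               # (F)_{jk} = theta^{(j-1)(k-1)} / sqrt(m)
print(np.allclose(F @ F.conj().T, np.eye(m)))     # True: F F* = I_m
```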

8.85 Since Πm−1


m = (e2 , e3 , . . . , em , e1 ), we find that
 
e02
 
 0
 e3

 m
 .
  X
Πm m−1
m = Πm Πm = [e2 , e3 , . . . , em , e1 ]  .. = ei e0i = Im .

 
 0  i=1
 em 
 
e01

This proves (b), while multiplying the identity in (b) by Π−1


m leads to (a). Finally, using (b), we have

mn+r
Πm = (Πm n r n r r
m ) Πm = (Im ) Πm = Πm ,

which establishes (c).

8.87 The jth eigenvalue of circ(1, . . . , 1) is given by

δj = Σ_{k=0}^{m−1} (θj−1 )k = Σ_{k=0}^{m−1} 1 = m,

if j = 1, and

δj = Σ_{k=0}^{m−1} (θj−1 )k = {1 − (θj−1 )m }/(1 − θj−1 ) = {1 − (θm )j−1 }/(1 − θj−1 ) = (1 − 1)/(1 − θj−1 ) = 0,

if j = 2, . . . , m. That is, it has the eigenvalue m and the eigenvalue 0 repeated m − 1 times.
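
An added sketch comparing the circulant eigenvalue formula δj = Σk ck θ(j−1)k with a direct eigenvalue computation for circ(1, . . . , 1):

```python
import numpy as np

m = 6
theta = np.exp(2j * np.pi / m)
delta = np.array([sum(theta ** ((j - 1) * k) for k in range(m)) for j in range(1, m + 1)])
eig = np.linalg.eigvals(np.ones((m, m)))          # eigenvalues of circ(1, ..., 1)
print(np.round(delta, 8))                          # m followed by m - 1 zeros
print(np.round(np.sort(eig.real)[::-1], 8))        # matches: one eigenvalue m, the rest 0
```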

8.89 The matrix

A = B = [ 0 1 0
          1 0 0
          0 0 1 ]
is not a circulant matrix, while AB = I3 is a circulant matrix.

8.91 Now
(AA−1 )11 = (AA−1 )mm = c − abc = c(1 − ab) = (1 − ab)−1 (1 − ab) = 1,

while
(AA−1 )jj = −abc + (ab + 1)c − abc = c − abc = 1,

for j = 2, . . . , m − 1. Also
(AA−1 )j1 = aj−1 c − aj−1 c = 0,

for j = 2, . . . , m, and
(AA−1 )jm = bm−j c − bm−j c = 0,

for j = 1, . . . , m − 1. Finally,

(AA−1 )ij = −bcbj−i−1 + (ab + 1)cbj−i − acbj−i+1

= bj−i−1 (−bc + ab2 c + bc − ab2 c) = 0

for j = 2, . . . , m − 1, i = 1, . . . , j − 1, and

(AA−1 )ij = −bcai−j+1 + (ab + 1)cai−j − acai−j−1

= ai−j−1 (−a2 bc + a2 bc + ac − ac) = 0

for j = 2, . . . , m−1, i = j+1, . . . , m, so that the matrix given is the inverse of A. If ab = 1, then b = 1/a,
(A)·1 = (1, a, a2 , . . . , am−1 )0 , (A)·2 = (1/a, 1, a, . . . , am−2 )0 , and (A)·1 − a(A)·2 = 0, so rank(A) < m.

8.93 We will prove the result by showing that

|Ah | = (1 − ρ2 )h−1 , (1)

where Ah is the hth leading principal submatrix of A. Clearly, |A1 | = 1 and |A2 | = 1 − ρ2 , so (1) holds
for h = 1 and h = 2. We show that it holds for general h by induction. Suppose (1) holds for h − 1, so

that |Ah−1 | = (1 − ρ2 )h−2 . The determinant of B = Ah can be computed by expanding along its first
row to get

|Ah | = |B| = Σ_{i=1}^{h} ρi−1 B1i = B11 + ρB12 . (2)

The last equality follows since when i > 2, the matrix obtained from Ah by deleting its first row and ith
column has its first and second columns being scalar multiples of one another. In addition, note that B11
is identical to |Ah−1 | and, upon using Theorem 1.4(f), we find that B12 = (−1)1+2 ρ|Ah−1 | = −ρ|Ah−1 |.
Thus, (2) becomes

|Ah | = |Ah−1 | − ρ2 |Ah−1 | = (1 − ρ2 )|Ah−1 | = (1 − ρ2 )(1 − ρ2 )h−2 = (1 − ρ2 )h−1 .

The result follows from Theorem 7.6 since the quantity in (1) is positive for all h.
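
A quick numerical sketch, added here under the assumption that A is the matrix with (A)ij = ρ|i−j| used in the expansion above:

```python
import numpy as np

rho, m = 0.6, 6
A = rho ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))   # (A)_{ij} = rho^{|i-j|}
for h in range(1, m + 1):
    # each leading principal minor equals (1 - rho^2)^(h-1)
    print(h, np.isclose(np.linalg.det(A[:h, :h]), (1 - rho ** 2) ** (h - 1)))   # all True
```
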
8.95
 
1 1 1 1 1 1 1 1 1 1 1 1
 
−1 1 −1 −1 −1 −1 1 −1 
 
 1 1 1 1
 
−1 −1 1 −1 −1 −1 −1
 
 1 1 1 1 1 
 
1 −1 −1 1 −1 −1 −1 −1 
 
 1 1 1 1
 
−1 1 −1 −1 1 −1 −1 −1 
 
 1 1 1 1
 
−1 −1 1 −1 −1 1 −1 1 −1 
 
 1 1 1
H= .
−1 −1 −1 1 −1 −1 1 −1
 
 1 1 1 1 
 
1 −1 −1 −1 1 −1 −1 1 −1
 
 1 1 1 
 
1 −1 −1 −1 1 −1 −1 −1
 
 1 1 1 1 
 
−1 −1 −1 1 −1 −1 1 −1 
 
 1 1 1 1
 
−1 −1 −1 −1 1 −1 −1
 
 1 1 1 1 1 
 
1 1 −1 1 1 1 −1 −1 −1 1 −1 −1

8.97 Since each element of H is 1 or −1, we simply need to verify that HH 0 = 4mI4m . Partitioning HH 0
as H is partitioned, we find that each submatrix along the diagonal is

AA0 + BB 0 + CC 0 + DD0 = 4mIm ,

as required. The (1, 2) submatrix of HH 0 is

−AB 0 + BA0 − CD0 + DC 0 = −AB 0 + AB 0 − CD0 + CD0 = (0),

since BA0 = AB 0 and DC 0 = CD0 . In a similar fashion, using the condition XY 0 = Y X 0 , it can be
shown that the remaining off-diagonal submatrices of HH 0 are null matrices.

8.99 The result follows immediately from Theorem 8.61 since A is nonsingular if and only if |A| 6= 0, and
each term in the product given in (8.28) is nonzero if and only if ai 6= aj for all i 6= j.

8.101 Using (8.27), we find that


 
b0 b1 b2 ··· bm−1
 
b1 b2 b3 ··· bm
 
 
 
AA0 =  b2 b3 b4 ··· bm+1 ,
 
.. .. .. ..
 
 

 . . . . 

bm−1 bm bm+1 ··· b2m−2
Pm
where bi = k=1 aik . Then
 
bm−1 bm−2 bm−3 ··· b0
 
 bm bm−1 bm−2 ··· b1
 

 
AA0 P = AA0 [em , em−1 , . . . , e1 ] =  bm+1 bm bm−1 ··· b2
 

.. .. .. ..
 
 

 . . . . 

b2m−2 b2m−3 b2m−4 ··· bm−1

is a Toeplitz matrix since it has the form given in (8.23). Similarly, we find that
 
  bm−1 bm bm+1 ··· b2m−2
e0m  
 bm−2 bm−1 bm ··· b2m−3 
   
 0
 em−1
  
0 0 0  AA0 = 

P AA = P AA =   bm−3 bm−2 bm−1 ··· b2m−4 

 .. 
 .   ..

.. .. ..


   . . . . 
e01  
b0 b1 b2 ··· bm−1

is also a Toeplitz matrix.

Chapter 9

9.1 (a) With f (x) = log(x), we have f (i) (x) = (−1)i−1 (i − 1)!x−i . Consequently, the kth-order Taylor
formula is given by

log(1 + u) = log(1) + Σ_{i=1}^{k} (−1)i−1 (i − 1)! (1)−i ui /i! + rk (u, 1) = Σ_{i=1}^{k} (−1)i−1 ui /i + rk (u, 1).

(b) We get
log(1.1) ≈ .1 − (.1)2 /2 + (.1)3 /3 − (.1)4 /4 + (.1)5 /5 = .0953103.
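
A one-line check of this arithmetic (an addition, not from the text):

```python
import math

approx = sum((-1) ** (i - 1) * 0.1 ** i / i for i in range(1, 6))
print(round(approx, 7), round(math.log(1.1), 7))   # 0.0953103 and 0.0953102 (exact value)
```
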
9.3 Since  
−z2 1
∂ z12 z1
g= ,
∂z 0 z2 z1
and  
∂ 2x1 2x2 2x3
f = ,
∂x0 2 −1 −1
we have

   2(x21 +x22 +x23 )−2x1 (2x1 −x2 −x3 )
∂ ∂ ∂ (x21 +x22 +x23 )2
y = g f =
∂x0 ∂f 0 ∂x0 2(x21 + x22 + x23 ) + 2x1 (2x1 − x2 − x3 )

−(x21 +x22 +x23 )−2x2 (2x1 −x2 −x3 ) −(x21 +x22 +x23 )−2x3 (2x1 −x2 −x3 )
(x21 +x22 +x23 )2 (x21 +x22 +x23 )2 .
−(x21 + x22 + x23 ) + 2x2 (2x1 − x2 − x3 ) −(x21 + x22 + x23 ) + 2x3 (2x1 − x2 − x3 )

9.5 We have
x0 Bxd(x0 Ax) − x0 Axd(x0 Bx)
df =
(x0 Bx)2
x Bx{(dx)0 Ax + x0 Adx} − x0 Ax{(dx)0 Bx + x0 Bdx}
0
=
(x0 Bx)2
2(x Bxx Adx − x Axx0 Bdx)
0 0 0
= ,
(x0 Bx)2
and so
∂ 2(x0 Bxx0 A − x0 Axx0 B)
0
f= .
∂x (x0 Bx)2

9.7 (a) We have
d tr(AX) = tr(AdX) = vec(A0 )0 vec(dX) = vec(A0 )0 d vec(X),

so

tr(AX) = vec(A0 )0 .
∂ vec(X)0
(b) Similarly, we get

d tr(AXBX) = tr(A(dX)BX) + tr(AXBdX)

= tr(BXAdX) + tr(AXBdX)

= vec(A0 X 0 B 0 )0 vec(dX) + vec(B 0 X 0 A0 )0 vec(dX)

= {vec(A0 X 0 B 0 )0 + vec(B 0 X 0 A0 )0 }d vec(X),

so

tr(AXBX) = vec(A0 X 0 B 0 )0 + vec(B 0 X 0 A0 )0 .
∂ vec(X)0

9.9 Using Corollary 9.1.1, we have

d log |X 0 AX| = tr{(X 0 AX)−1 d(X 0 AX)}

= tr[(X 0 AX)−1 {(dX)0 AX + X 0 AdX}]

= 2 tr{(X 0 AX)−1 X 0 AdX}

= 2 vec{AX(X 0 AX)−1 }0 d vec(X),

and so

log |X 0 AX| = 2 vec{AX(X 0 AX)−1 }0 .
d vec(X)0

9.11 Since
n
X
dX n = (dX)X n−1 + X(dX)X n−2 + · · · + X n−1 dX = X j−1 (dX)X n−j ,
j=1

we find that
 
Xn  n
X
d vec(X n ) = vec X j−1 (dX)X n−j = vec{X j−1 (dX)X n−j }
 
j=1 j=1
n
X
= (X n−j0 ⊗ X j−1 )d vec(X),
j=1

and so
n
∂ n
X
vec(X ) = (X n−j0 ⊗ X j−1 ).
∂ vec(X)0 j=1

9.13 Since X# = |X|X −1 , we have

dX# = (d|X|)X −1 + |X|dX −1 = |X| tr(X −1 dX)X −1 − |X|X −1 (dX)X −1

= |X|{tr(X −1 dX)X −1 − X −1 (dX)X −1 }

and

d vec(X# ) = |X|{tr(X −1 dX) vec(X −1 ) − vec(X −1 (dX)X −1 )}

= |X|{vec(X −1 ) vec(X −10 )0 d vec(X) − (X −10 ⊗ X −1 )d vec(X)},

and so

vec(X# ) = |X|{vec(X −1 ) vec(X −10 )0 − (X −10 ⊗ X −1 )}.
∂ vec(X)0
9.15 (a) Since dF = d(AXA0 ) = A(dX)A0 and

d vec(F ) = vec{A(dX)A0 } = (A ⊗ A)d vec(X) = (A ⊗ A)Dm d v(X),

we get

vec(F ) = (A ⊗ A)Dm .
∂ v(X)0
(b) Since dF = d(XBX) = (dX)BX + XB(dX) and

d vec(F ) = vec{(dX)BX} + vec(XBdX)

= (XB ⊗ Im )d vec(X) + (Im ⊗ XB)d vec(X)

= {(XB ⊗ Im ) + (Im ⊗ XB)}Dm d v(X),

we find that

vec(F ) = {(XB ⊗ Im ) + (Im ⊗ XB)}Dm .
∂ v(X)0
9.17 (a) Using Theorem 8.27, we get

d vec(F ) = vec{(dX) ⊗ X + X ⊗ dX}

= (In ⊗ Knm ⊗ Im ){vec(dX) ⊗ vec(X) + vec(X) ⊗ vec(dX)}

= (In ⊗ Knm ⊗ Im ){Imn ⊗ vec(X) + vec(X) ⊗ Imn }d vec(X).

It then follows that

vec(F ) = (In ⊗ Knm ⊗ Im ){Imn ⊗ vec(X) + vec(X) ⊗ Imn }.
∂ vec(X)0

(b) The result follows since

d vec(X ⊙ X) = vec{(dX) ⊙ X + X ⊙ dX} = 2 vec(X ⊙ dX) = 2Dvec(X) d vec(X).

9.19 We prove the result by induction. The result holds when n = 1 due to Theorem 9.2. Suppose the
result holds for n − 1. The result then holds for n since

dn X −1 = d(dn−1 X −1 ) = d{(−1)n−1 (n − 1)!(X −1 dX)n−1 X −1

= (−1)n−1 (n − 1)![{d(X −1 dX)n−1 }X −1 + (X −1 dX)n−1 dX −1 ]


" n−1 
X
= (−1)n−1 (n − 1)!  (X −1 dX)j−1 {d(X −1 dX)}(X −1 dX)n−1−j  X −1
j=1
#
−1 n−1 −1 −1
− (X dX) X dXX
" n−1 
X
= (−1)n−1 (n − 1)!(−1)  (X −1 dX)j−1 (X −1 dX)2 (X −1 dX)n−1−j  X −1
j=1
#
+ (X −1 dX)n X −1

= (−1)n (n − 1)!{(n − 1)(X −1 dX)n X −1 + (X −1 dX)n X −1 }

= (−1)n n!(X −1 dX)n X −1 .

9.21 Using the perturbation formula for a matrix inverse given in Section 9.6, we have

(Im + εY )−1 = Im − εY + ε2 Y 2 − ε3 Y 3 + ε4 Y 4 − · · · . (1)

Writing (Im + εY )−1/2 = Im + Σ_{i=1}^{∞} εi Bi , we also have

(Im + εY )−1/2 (Im + εY )−1/2 = (Im + Σ_{i=1}^{∞} εi Bi )2
= Im + 2εB1 + ε2 (2B2 + B12 ) + ε3 (2B3 + B1 B2 + B2 B1 ) + ε4 (2B4 + B1 B3 + B3 B1 + B22 ) + · · · (2)

Solutions for the Bi ’s can be obtained by equating (1) and (2). First we must have −Y = 2B1 , so
B1 = −(1/2)Y . Next, we have

Y 2 = 2B2 + B12 = 2B2 + (1/4)Y 2 ,

and so

B2 = (1/2)(3/4)Y 2 = (3/8)Y 2 .

Then we have

−Y 3 = 2B3 + B1 B2 + B2 B1 = 2B3 − (3/16)Y 3 − (3/16)Y 3 ,

and this leads to

B3 = −(1/2)(10/16)Y 3 = −(5/16)Y 3 .

Finally, we have

Y 4 = 2B4 + B1 B3 + B3 B1 + B22 = 2B4 + (5/32)Y 4 + (5/32)Y 4 + (9/64)Y 4 ,

and this yields

B4 = (1/2)(35/64)Y 4 = (35/128)Y 4 .
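
These are the Maclaurin coefficients of (1 + x)−1/2 ; an added sketch with a scalar Y (so matrix powers reduce to ordinary powers) confirms agreement through order ε4 :

```python
eps, y = 0.01, 0.7                     # scalar Y so that matrix powers are ordinary powers
x = eps * y
series = 1 - x / 2 + 3 * x ** 2 / 8 - 5 * x ** 3 / 16 + 35 * x ** 4 / 128
exact = (1 + x) ** -0.5
print(abs(series - exact))             # ~4e-12, i.e. the error is of order eps^5
```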

9.23 Note that DS can be written as DS = Im + DA . Then using the result from Problem 9.21, we have

−1/2 −1/2
R = DS SDS
   
1 3 2 5 3 1 3 2 5 3
= Im − DA + DA − DA + · · · (Ω + A) Im − DA + DA − DA + ···
2 8 16 2 8 16
1 3 2 2
= Ω + A − (ΩDA + DA Ω) + (DA Ω + ΩDA )
2 8
1 1 3 2 2
+ DA ΩDA − (ADA + DA A) + (DA A + ADA )
4 2 8
1 3 2 2 5 3 3
+ DA ADA − (DA ΩDA + DA ΩDA ) − (DA Ω + ΩDA )
4 16 16
= Ω + C1 + C2 + C3 .

9.25 (a) From the equation


(X + U )(cel + b1 ) = (xl + a1 )(Im + V )(cel + b1 )

we get the equation


Xb1 + cU el = xl b1 + cxl V el + ca1 el (1)

involving first order terms. Premultiplying (1) by e0l yields

xl bl1 + cull = xl bl1 + cxl vll + ca1 ,

and when solving for a1 , we get


a1 = ull − xl vll .

(b) Solving (1) for b1 , we get

b1 = c(X − xl Im )+ (xl V − U )el + yel , (2)

where y is an arbitrary scalar. Note that γ 0l γ l = 1 implies that e0l b1 = 0, and so this confirms
that bl1 = 0. Premultiplying (2) by ei when i 6= l yields
uil − xl vil
bi1 = e0i (X − xl Im )+ (xl V − U )el = −
xi − xl
since c = 1.

(c) γ 0l (Im + V )γ l = 1 and c = 1 leads to the equation vll + 2bl1 = 0, and so bl1 = −vll /2. The formula
for bil is obtained as in part (b).
1/2 1/2
(d) γ 0l γ l = λl and c = xl leads to the equation 2xl bl1 = a1 , and so
a1 ull − xl vll
bl1 = 1/2
= 1/2
.
2xl 2xl
Premultiplying (2) by ei when i 6= l yields
1/2
1/2 0 xl (uil − xl vil )
bi1 = xl ei (X − xl Im )+ (xl V − U )el = −
xi − xl
1/2
since c = xl .

9.27 (a) Setting


∂f /∂x0 = [ 6x21 − 6   3x22 − 27 ]
equal to 0 yields the equations x21 = 1 and x22 = 9, from which we get the stationary points
(1, 3), (−1, 3), (1, −3), and (−1, −3).

(b) The Hessian matrix is

Hf = [ 12x1   0
        0    6x2 ].
Hf is positive definite when (x1 , x2 ) = (1, 3) so this point is a minimum. Hf is negative definite
when (x1 , x2 ) = (−1, −3) so this point is a maximum. The other two are saddle points.
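
An added check of part (b); it assumes f (x) = 2x31 − 6x1 + x32 − 27x2 , which is consistent with the gradient displayed in part (a):

```python
import numpy as np

for x1, x2 in [(1, 3), (-1, 3), (1, -3), (-1, -3)]:
    eig = np.linalg.eigvalsh(np.diag([12.0 * x1, 6.0 * x2]))   # eigenvalues of the Hessian
    kind = "min" if np.all(eig > 0) else ("max" if np.all(eig < 0) else "saddle")
    print((x1, x2), kind)   # (1,3) min, (-1,-3) max, the other two saddle points
```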

9.29 (a) The differential of f is

df = d(x0 Bx) + d(a0 x) = (dx)0 Bx + x0 Bdx + a0 dx = 2x0 Bdx + a0 dx,

and

f = 2x0 B + a0 = 00
∂x0
yields the equation Bx = − 12 a. It follows from Theorem 6.4 that x = − 12 B + a + (Im − B + B)y
is a solution for any choice of y.

(b) If B is nonsingular, then B + = B −1 and the solution given in part (a) becomes

1 1
x = − B −1 a + (Im − B −1 B)y = − B −1 a,
2 2

which is unique. Since d2 f = d(df ) = 2(dx)0 B(dx), Hf = 2B. Thus, this unique solution is a
minimum when B is positive definite and it is a maximum when B is negative definite.

9.31 Following the development in the one sample problem given in Example 9.6, we need to maximize the
function
k k
1X 1X
g(µ1 , . . . , µk , Ω) = − ni log |Ω| − tr(Ω−1 Ui )
2 i=1 2 i=1
k
1 1X
= − n log |Ω| − tr(Ω−1 Ui ),
2 2 i=1

where
ni
X
Ui = (xij − µi )(xij − µi )0 .
j=1

The first differential of g is given by


k
1 1X
dg = − nd(log |Ω|) − [tr{(dΩ−1 )Ui } + tr(Ω−1 dUi )]
2 2 i=1
k
1 1X
= − n tr(Ω−1 dΩ) + tr{Ω−1 (dΩ)Ω−1 Ui }
2 2 i=1
  
k ni ni
1 X  X X 
+ tr Ω−1 (dµi ) (xij − µi )0 + (xij − µi )dµ0i 
2 i=1

j=1 j=1

k
1X
= tr{(dΩ)Ω−1 (Ui − ni Ω)Ω−1 }
2 i=1

k
1X
+ tr(Ω−1 {ni (dµi )(xi − µi )0 + ni (xi − µi )dµ0i })
2 i=1
k k
1X X
= vec(Ui − ni Ω)0 (Ω−1 ⊗ Ω−1 ) vec(dΩ) + ni (xi − µi )0 Ω−1 dµi .
2 i=1 i=1

Setting the derivative with respect to µi equal to 0 yields

ni Ω−1 (xi − µi ) = 0,

and this leads to the solution


µ̂i = xi .

Setting the derivative with respect to vec(Ω) equal to 0 yields


k
X
(Ω−1 ⊗ Ω−1 ) vec(Ui − ni Ω) = 0,
i=1

or, equivalently,
k
X
(Ω−1 Ui Ω−1 − ni Ω−1 ) = (0).
i=1

Premultiplying and postmultiplying this equation by Ω and then solving for Ω, we get
k ki n
1X 1 XX
Ω̂ = Ui = (xij − µ̂i )(xij − µ̂i )0
n i=1 n i=1 j=1
k in
1 XX
= (xij − xi )(xij − xi )0 .
n i=1 j=1

Computing the second differential and then evaluating at µi = xi and Ω = Ω̂, we get
k
n X
d2 g = − vec(dΩ)0 (Ω̂−1 ⊗ Ω̂−1 ) vec(dΩ) − ni (dµi )0 Ω̂−1 dµi .
2 i=1

Consequently,  
−n1 Ω̂−1 (0) ··· (0)
 
(0) −n2 Ω̂−1 ··· (0)
 
 
Hg = 
 .. .. ..
.


 . . . 

(0) (0) ··· − n2 Ω̂−1 ⊗ Ω̂−1
Clearly, Hg is negative definite and so our solutions do maximize g.

9.33 Let z 1 = (x01 , y1 )0 ∈ T and z 2 = (x02 , y2 )0 ∈ T , so that xi ∈ S and yi ≥ f (xi ) for i = 1, 2. We need to
show that if 0 ≤ c ≤ 1 and z = (x0 , y)0 = cz 1 + (1 − c)z 2 , then z ∈ T . Now since xi ∈ S and S is a
convex set, it follows that x = cx1 + (1 − c)x2 ∈ S. Also, since f is a convex function,

f (x) = f (cx1 + (1 − c)x2 ) ≤ cf (x1 ) + (1 − c)f (x2 )

≤ cy1 + (1 − c)y2 = y.

This confirms that T is a convex set.

9.35 Let x1 ∈ S, x2 ∈ S, and 0 ≤ c ≤ 1. Applying the given inequality to x = x1 and a = cx1 + (1 − c)x2
leads to  

f (x1 ) ≥ f (a) + f (a) (1 − c)(x1 − x2 ),
∂a0
while applying it to x = x2 and the same a yields
 

f (x2 ) ≥ f (a) + f (a) c(x2 − x1 ).
∂a0
Multiplying the first of these equations by c and the second by (1 − c), and then combining gives us

cf (x1 ) + (1 − c)f (x2 ) ≥ f (a) = f (cx1 + (1 − c)x2 ),

and so f is convex.

9.37 (a) The differential of f is

df = (dxc1 )x1−c
2 + xc1 dx1−c
2

c −c
= cxc−1 1−c
1 x2 dx1 + (1 − c)x1 x2 dx2 ,

while the second differential is

d2 f = c(dxc−1 1−c c−1 1−c


1 )x2 dx1 + cx1 (dx2 )dx1

+ (1 − c)(dxc1 )x−c c −c
2 dx2 + (1 − c)x1 (dx2 )dx2

c−1 −c
= c(c − 1)x1c−2 x1−c 2
2 dx1 + c(1 − c)x1 x2 dx1 dx2

−c c −c−1
+ c(1 − c)xc−1
1 x2 dx1 dx2 − c(1 − c)x1 x2 dx22 .

Thus,  
−c−1 
−x22 x1 x2
Hf = c(1 − c)xc−2
1 x2
.
x1 x2 −x21

Since the leading principal minors of −Hf are nonnegative, −Hf is nonnegative definite. It follows
from Problem 9.36 that −f (x) is a convex function or, equivalently, f (x) is a concave function.

(b) Since −f (x) is a convex function, the inequality is an immediate consequence of Jensen’s inequal-
ity.

9.39 Setting the first derivative with respect to x of the Lagrange function

L(x, λ) = x0 x − λ(x21 + x22 + x23 + 4x1 − 6x3 − 2)

equal to 0, we obtain the equations


2x1 − 2λx1 − 4λ = 0,

2x2 − 2λx2 = 0,

2x3 − 2λx3 + 6λ = 0.

From these we get x1 = 2λ/(1 − λ), x2 = 0, and x3 = −3λ/(1 − λ). Since x21 + x22 + x23 + 4x1 − 6x3 = 2,
we then find that λ = 1 ± √(13/15). Thus, we have the solutions

x0 = (−2(1 + √(13/15))/√(13/15), 0, 3(1 + √(13/15))/√(13/15)) when λ = 1 + √(13/15),

x0 = (2(1 − √(13/15))/√(13/15), 0, −3(1 − √(13/15))/√(13/15)) when λ = 1 − √(13/15).

Using Theorem 9.16, we compute


 
∆3 = [ 0         2x1 + 4   2x2        2x3 − 6
        2x1 + 4   2(1 − λ)  0          0
        2x2       0         2(1 − λ)   0
        2x3 − 6   0         0          2(1 − λ) ] ,

∆2 = [ 0         2x1 + 4   2x2
        2x1 + 4   2(1 − λ)  0
        2x2       0         2(1 − λ) ] .

Consequently, |∆3 | = −16(1 − λ)2 {(x1 + 2)2 + x22 + (x3 − 3)2 } and |∆2 | = −8(1 − λ){(x1 + 2)2 + x22 }.
Here n = 3 and m = 1. Since |∆2 | < 0 and |∆3 | < 0 when λ = 1 − √(13/15), we have

(−1)m |∆2 | > 0, (−1)m |∆3 | > 0,

and so x0 = (2(1 − √(13/15))/√(13/15), 0, −3(1 − √(13/15))/√(13/15)) yields a minimum. Since |∆2 | > 0
and |∆3 | < 0 when λ = 1 + √(13/15), we have

(−1)2 |∆2 | > 0, (−1)3 |∆3 | > 0,

and so x0 = (−2(1 + √(13/15))/√(13/15), 0, 3(1 + √(13/15))/√(13/15)) yields a maximum.
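
An added numerical sketch verifying that both stationary points satisfy the constraint and showing the objective x0 x at each:

```python
import numpy as np

r = np.sqrt(13 / 15)
for lam in (1 + r, 1 - r):
    x = np.array([2 * lam / (1 - lam), 0.0, -3 * lam / (1 - lam)])
    g = x @ x + 4 * x[0] - 6 * x[2] - 2              # constraint value, should be 0
    print(round(g, 10), round(float(x @ x), 4))       # 0.0, and x'x at the max / min
```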

9.41 Setting the first derivative with respect to x of the Lagrange function

L(x, λ) = x1 (x2 + x3 ) − λ1 (x21 + x22 − 1) − λ2 (x1 x3 + x2 − 2)

equal to 0, we obtain the equation

(x2 + x3 − 2λ1 x1 − λ2 x3 , x1 − 2λ1 x2 − λ2 , x1 − λ2 x1 )0 = 00 .

From the third component above, we see that λ2 = 1 since the two constraints ensure that x1 6= 0.
Multiplying the first component by 2λ1 and adding to the second component, we obtain an equation
which yields the solution x1 = (1 − 4λ21 )−1 , and when this is substituted back into the first component,
we also get x2 = 2λ1 (1 − 4λ21 )−1 . Substituting both of these into the first constraint, we find that
p p
λ1 = 0, 3/4, or − 3/4. Using these values, the solutions for x1 and x2 , and the solution for x3
obtained from the second constraint, we find that there are 3 stationary points:

(x1 , x2 , x3 )0 = (1, 0, 2)0 , when λ1 = 0,


p p p
(x1 , x2 , x3 )0 = (−1/2, − 3/4, −4 − 2 3/4)0 , when λ1 = 3/4,
p p p
(x1 , x2 , x3 )0 = (−1/2, 3/4, −4 + 2 3/4)0 , when λ1 = − 3/4.

Straightforward calculation reveals that since λ2 = 1,


 
0 0 2x1 2x2 0
 
 
 0 0 x3 1 x1 
 
∆3 =  2x1 x3 −2λ1 0 ,
 
1
 
−2λ1
 
 2x2 1 1 0 
 
0 x1 0 0 0
p
and |∆3 | = −8λ1 x41 − 8x31 x2 − 8λ1 x21 x22 . When λ1 = 3/4, we have |∆3 | < 0 and (−1)3 |∆3 | > 0, and so
p
this stationary point is a max. When λ1 = − 3/4, we have |∆3 | > 0 and (−1)m |∆3 | > 0, and so this
stationary point is a min. When λ1 = 0, |∆3 | = 0. A closer inspection of this stationary point reveals
that it is a saddle point. For instance, if we use the two constraints to obtain equations for x2 and x3
p
in terms of x1 , and substitute these into f (x), we get f (x) = 2 ± (x1 − 1) 1 − x21 . There are values
arbitrarily close to x1 = 1 for which f is larger than when x1 = 1 and there are values arbitrarily close
to x1 = 1 for which f is smaller than when x1 = 1.

9.43 Since f (cx) = f (x) for any scalar c 6= 0, maximizing and minimizing f (x) is equivalent to maximizing
and minimizing x0 Ax subject to the constraint x0 Bx = 1. Differentiating the Lagrange function
L(x, λ) = x0 Ax − λ(x0 Bx − 1) with respect to x and setting equal to 00 , yields the equation

Ax = λBx.

Premultiplying by x0 and rearranging, we get

x0 Ax
λ= = x0 Ax.
x0 Bx

Thus, for any stationary point x, λ = f (x) and λ is an eigenvalue of B −1 A. Consequently, the minimum
of f is λm (B −1 A), which is attained at any eigenvector of B −1 A corresponding to λm (B −1 A), and the
maximum of f is λ1 (B −1 A), which is attained at any eigenvector of B −1 A, corresponding to λ1 (B −1 A).

9.45 (a) We have ! !


n
X n
X n
X n
X
E(µ̂) = E ai xi = ai E(xi ) = ai µ = µ ai ,
i=1 i=1 i=1 i=1
Pn
and so µ̂ is an unbiased estimator of µ only if i=1 ai = 1.

(b) First note that since the xi ’s are independent,


n
! n n
X X X
var(µ̂) = var ai xi = a2i var(xi ) = σ 2 a2i .
i=1 i=1 i=1
Pn
We need to minimize var(µ̂) subject to ai = 1, so we consider the Lagrange function
i=1

n n
!
X X
2 2
L(a, λ) = σ ai − λ ai − 1 .
i=1 i=1

Setting the first derivative with respect to a equal to 0, we obtain the equations

2σ 2 ai − λ = 0,
Pn
for all i. Thus, a1 = · · · = an and the only solution satisfying i=1 ai = 1 has ai = 1/n for all i.
This of course yields the estimator x̄. Now A and B in Theorem 9.15 are

A = 2σ 2 In , B = 10n .

According to Theorem 9.15, our solution yields a minimum since x0 Ax > 0 for all x 6= 0 for which
Bx = 0.

9.47 Since f (c1 a, c2 b) = f (a, b) for any c1 6= 0 and c2 6= 0, maximizing f is equivalent to maximizing
(a0 Ω12 b)2 subject to the constraints a0 Ω11 a = 1 and b0 Ω22 b = 1. Thus, we consider the Lagrange
function
L(a, b, λ1 , λ2 ) = (a0 Ω12 b)2 − λ1 (a0 Ω11 a − 1) − λ2 (b0 Ω22 b − 1).

Differentiating L with respect to a and setting equal to 0 yields

(a0 Ω12 b)Ω12 b − λ1 Ω11 a = 0, (1)

while differentiation with respect to b leads to

(a0 Ω12 b)Ω012 a − λ2 Ω22 b = 0. (2)

Premultiplying (1) by a0 , we get

(a0 Ω12 b)2 = λ1 a0 Ω11 a = λ1 ,

and premultiplying (2) by b0 , we get

(a0 Ω12 b)2 = λ2 b0 Ω22 b = λ2 .

That is, λ1 = λ2 = (a0 Ω12 b)2 , which is the quantity we wish to maximize. Letting λ = λ1 = λ2 , it
follows from (1) that a = (a0 Ω12 b)−1 Ω−1 0 −1
11 Ω12 b, and when substituting this in (2), we get Ω12 Ω11 Ω12 b =

λΩ22 b or, equivalently, Ω−1 0 −1 −1 0 −1


22 Ω12 Ω11 Ω12 b = λb. Thus, λ is an eigenvalue of Ω22 Ω12 Ω11 Ω12 , and so the

maximum will be given by the largest eigenvalue. Similarly, if we use (2) to solve for b and substitute
this in (1), we get Ω−1 −1 0
11 Ω12 Ω22 Ω12 a = λa. Thus, the maximum is attained when a is an eigenvector

of Ω−1 −1 0 −1 0 −1
11 Ω12 Ω22 Ω12 corresponding to its largest eigenvalue, and b is an eigenvector of Ω22 Ω12 Ω11 Ω12

corresponding to its largest eigenvalue.

Chapter 10

10.1 If x ≺ y, then
k
X k
X
x[i] ≤ y[i] ,
i=1 i=1

for k = 1, . . . , m, and if y ≺ x, then


k
X k
X
y[i] ≤ x[i] ,
i=1 i=1

for k = 1, . . . , m. Combining these two inequalities, we get


k
X k
X
x[i] = y[i] ,
i=1 i=1

for k = 1, . . . , m, which implies that x[i] = y[i] , for i = 1, . . . , m. That is, for some permutation matrix
P , y = P x.

10.3 Let w = λx + (1 − λ)y. Since x ≺ z and y ≺ z, we have


k
X k
X k
X
w[i] ≤ λ x[i] + (1 − λ) y[i]
i=1 i=1 i=1
k
X k
X
≤ λ z[i] + (1 − λ) z[i]
i=1 i=1
k
X
= z[i]
i=1

for k = 1, . . . , m − 1, and
m
X m
X m
X
w[i] = λ x[i] + (1 − λ) y[i]
i=1 i=1 i=1
Xm Xm
= λ z[i] + (1 − λ) z[i]
i=1 i=1
m
X
= z[i] .
i=1

Thus, w = λx + (1 − λ)y ≺ z.

10.5 If x ≺ y, then it follows from Theorem 10.9 that
m
X m
X
|xi − a| ≤ |yi − a| (1)
i=1 i=1

since g(x) = |x − a| is a convex function. Conversely now suppose (1) holds for all a. Note that if
xi > a and yi > a for all i, then
m
X m
X m
X m
X m
X m
X
xi − ma = (xi − a) = |xi − a| ≤ |yi − a| = (yi − a) = yi − ma,
i=1 i=1 i=1 i=1 i=1 i=1

so we must have
m
X m
X
xi ≤ yi . (2)
i=1 i=1

On the other hand, if xi < a and yi < a for all i, then


m
X m
X m
X m
X m
X m
X
ma − xi = (a − xi ) = |xi − a| ≤ |yi − a| = (a − yi ) = ma − yi ,
i=1 i=1 i=1 i=1 i=1 i=1

so we must have
m
X m
X
xi ≥ yi . (3)
i=1 i=1

Combining (2) and (3) yields


m
X m
X
xi = yi . (4)
i=1 i=1

Now define x+ = max(x, 0) and suppose that


m
X m
X
+
(xi − a) ≤ (yi − a)+ (5)
i=1 i=1

holds for all a. Note that for fixed k,


k
X k
X
(y[i] − y[k] )+ = y[i] − ky[k] ,
i=1 i=1

whereas
m
X
(y[i] − y[k] )+ = 0,
i=k+1

so that
m
X k
X
(y[i] − y[k] )+ = y[i] − ky[k] .
i=1 i=1

Thus, due to (5) we have
k
X m
X k
X k
X
y[i] − ky[k] ≥ (x[i] − y[k] )+ ≥ (x[i] − y[k] )+ ≥ x[i] − ky[k] ,
i=1 i=1 i=1 i=1

and so we must have


k
X k
X
x[i] ≤ y[i] . (6)
i=1 i=1
Finally, note that since |xi − a| = 2(xi − a) − (xi − a), |yi − a| = 2(yi − a)+ − (yi − a), and (4) holds,
+

it follows from (1) that (5) holds. We have shown this guarantees that (6) holds and this along with
(4) confirms that x ≺ y.

10.7 Since P is doubly stochastic, P 1m = 1m and 10m P = 10m . Premultiplying the first of these equations
by P −1 and postmultiplying the second by P −1 yields P −1 1m = 1m and 10m P −1 = 10m . Also if P and
P −1 are doubly stochastic, from Theorem 10.1, we know that P ej ≺ ej and P −1 (P ej ) = ej ≺ P ej .
By Problem 10.1, this means that the jth column of P can be obtained from ej by permutation of
its components; that is, the jth column of P has one nonzero component equal to one. In a similar
fashion, we can show that the jth row of P has one nonzero component equal to one. This establishes
that P is a permutation matrix.

10.9 Write b = (b1 ,0 b02 )0 , where b1 = (b1 , . . . , bm1 )0 has the eigenvalues of A11 and b2 = (bm1 +1 , . . . , bm )0
has the eigenvalues of A22 . Let A11 = P1 Db1 P10 and A22 = P2 Db2 P20 be the spectral decompositions
of A11 and A22 , so that P1 and P2 are m1 × m1 and m2 × m2 orthogonal matrices. It follows that
P = diag(P1 , P2 ) is an m × m orthogonal matrix, and so the eigenvalues of A are the same as the
eigenvalues of    
P10 A11 P1 P10 A12 P2 Db1 P10 A12 P2
P 0 AP =  = . (1)
P20 A012 P1 P20 A22 P2 P20 A012 P1 Db2
Now applying Theorem 10.3 to the matrix given in (1), we find that b ≺ a.

10.11 Note that the function g(x) = x−1 is a convex function on (0, ∞) and x̄1m ≺ x. Thus, an application
of Theorem 10.9 yields
m m
X 1 X 1
≥ .
x
i=1 i i=1

Multiplying both sides of this equation by mx̄ and then subtracting m leads to
m
X mx̄ − xi
≥ m(m − 1).
i=1
xi

10.13 Since y is in the column space of A, it can be written as y = Az for some m × 1 vector z. Then using
Problem 10.12, we get

(x0 y)2 = (x0 Az)2 ≤ (x0 Ax)(z 0 Az)

= (x0 Ax)(z 0 AA− Az) = (x0 Ax)(y 0 A− y).

We have equality if and only if one of the vectors Ax and Az = y is a scalar multiple of the other.

10.15 If we let z = (x0 x)−1/2 x, the inequality can be equivalently expressed as


(z 0 Az)(z 0 A−1 z) ≤ (λ1 + λm )2 /(4λ1 λm ),
for all m × 1 unit vectors z. Let A = P ΛP 0 be the spectral decomposition of A so that Λ =
diag(λ1 , . . . , λm ). Note that for i = 1, . . . , m,

λ2i − (λ1 + λm )λi + λ1 λm = (λi − λ1 )(λi − λm ) ≤ 0, (1)

since λi − λ1 ≤ 0 and λi − λm ≥ 0. Dividing (1) by −λi , we get

(λ1 + λm ) − λi − λ1 λm /λi ≥ 0
for i = 1, . . . , m, or, in other words,

(λ1 + λm )Im − Λ − λ1 λm Λ−1 (2)

is nonnegative definite. Premultiplying (2) by P and postmultiplying by P 0 , it then follows that

(λ1 + λm )Im − A − λ1 λm A−1

is also nonnegative definite. Consequently, for any unit vector z,

(λ1 + λm ) − z 0 Az − λ1 λm z 0 A−1 z ≥ 0. (3)

Multiplying (3) by z 0 Az and rearranging, we get

λ1 λm (z 0 Az)(z 0 A−1 z) ≤ (λ1 + λm )z 0 Az − (z 0 Az)2 = −{z 0 Az − (λ1 + λm )/2}2 + {(λ1 + λm )/2}2 ≤ (λ1 + λm )2 /4,
from which the required inequality follows.
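
An added random check of this Kantorovich-type bound for positive definite matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    B = rng.standard_normal((4, 4))
    A = B @ B.T + 4 * np.eye(4)                    # a positive definite A
    lam = np.linalg.eigvalsh(A)                    # lam[0] smallest, lam[-1] largest eigenvalue
    z = rng.standard_normal(4)
    z /= np.linalg.norm(z)
    lhs = (z @ A @ z) * (z @ np.linalg.inv(A) @ z)
    ok &= lhs <= (lam[0] + lam[-1]) ** 2 / (4 * lam[0] * lam[-1]) + 1e-12
print(ok)                                          # True: the bound holds on every trial
```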

10.17 Let λ = (λ1 , . . . , λm )0 , where λ1 ≥ · · · ≥ λm are the eigenvalues of A. Then using the Cauchy-Schwarz
inequality, we have
m
!2
X
{tr(A)} 2
= λi = (10m λ)2 ≤ (10m 1m )(λ0 λ)
i=1
m
!
X
= m λ2i = m tr(A2 ),
i=1

with equality if and only if one of the vectors 1m and λ is a scalar multiple of the other, and this is
equivalent to saying the eigenvalues are all equal.

10.19 Clearly
tr{(A0 B)2 } = tr(A0 BA0 B) = tr(BA0 BA0 ) = tr{(BA0 )2 }.

Now applying the result in Problem 10.18 with A0 replaced by B and B replaced by A0 , we get

tr{(A0 B)2 } = tr{(BA0 )2 } ≤ tr(BB 0 AA0 ) = tr(AA0 BB 0 ),

with equality if and only if B 0 A is symmetric.

10.21 We prove the result by induction. It follows from Theorem 10.16 that the result holds for n = 2. Now
Pn−1
assume the result holds for n − 1 and we will show that it must hold for n. Let A = i=1 βi Ai , where
βi = αi /(1 − αn ) for i = 1, . . . , n − 1. Note that
n−1
X n−1
X
βi = (1 − αn )−1 αi = (1 − αn )−1 (1 − αn ) = 1,
i=1 i=1

and since A1 , . . . , An−1 are positive definite, we can apply our result to A, that is,
n−1
X n−1
Y
|A| = β i Ai ≥ |Ai |βi , (1)
i=1 i=1

with equality if and only if A1 = · · · = An−1 . Also, an application of Theorem 10.16 yields
n
X n−1
X
αi Ai = (1 − αn ) βi Ai + αn An = |(1 − αn )A + αn An | ≥ |A|1−αn |An |αn , (2)
i=1 i=1

with equality if and only if A = An . Combining (1) and (2), we get


n n−1
!(1−αn )
X Y
βi
αi Ai ≥ |Ai | |An |αn
i=1 i=1
n−1
! n
Y Y
= |Ai |αi |An |αi = |Ai |αi ,
i=1 i=1

with equality if and only if A1 = . . . = An .

10.23 Using Theorem 4.14, we can write A−1 = C 0 Λ−1 C and B −1 = C 0 C, where C is a nonsingular matrix
and Λ is a diagonal matrix with positive diagonal elements. As a result

W = αA−1 + (1 − α)B −1 − {αA + (1 − α)B}−1

= C 0 {αΛ−1 + (1 − α)Im }C − [C −1 {αΛ + (1 − α)Im }C −10 ]−1

= C 0 DC,

where D = αΛ−1 + (1 − α)Im − {αΛ + (1 − α)Im }−1 . Note that since g(x) = x−1 is a convex function
on (0, ∞),

α
αg(x) + (1 − α)g(1) = + (1 − α)
x
1
≥ g(αx + (1 − α)) = ,
αx + (1 − α)

with equality if and only if x = 1. This guarantees that D, and hence also W , is nonnegative definite.
Thus, we must have

tr(W ) = α tr(A−1 ) + (1 − α) tr(B −1 ) − tr[{αA + (1 − α)B}−1 ] ≥ 0

which is the desired result. We have equality if and only if A = B.

10.25 Using the result from the previous problem, we find that

|A + B|1/m = min m−1 tr{(A + B)X}


|X|=1

≥ min m−1 tr(AX) + min m−1 tr(BX)


|X|=1 |X|=1
1/m 1/m
= |A| + |B| .

We have equality if and only if the three minimums occur at the same X in which case from the
previous problem, this requires that A−1 , B −1 , and (A + B)−1 be proportional to one another, and so
A and B are proportional.

Chapter 11

11.1 Note that


     
 I
m Im − A   Im −Im Im Im − A Im (0) 
rank   = rank    
 A (0)   (0) Im A (0) −Im Im 
 
 (0) Im − A 
= rank   = rank(A) + rank(Im − A) = m,
 A (0) 

and
     
 I
m Im − A   Im (0) Im Im − A Im −(Im − A) 
rank   = rank    
 A (0)   −A Im (0)A (0) Im 
 
 Im (0) 
= rank   = rank(Im ) + rank{A(Im − A)}
 (0) −A(Im − A) 
= m + rank{A(Im − A)},

where we have used Theorem 1.10 and Theorem 2.9. Together these two identities imply rank{A(Im −
A)} = 0, so that we must have A(Im − A) = (0) or A2 = A.

11.3 The matrix A has r eigenvalues equal to one, with the remaining eigenvalues all being zero. Thus,
A has a spectral decomposition A = XΛX 0 , where Λ = diag(Ir , (0)) for some orthogonal matrix X.
Partition X as X = [P Q], where P is m × r. Then X 0 X = Im implies that P 0 P = Ir and

A = XΛX 0 = [P Q] diag(Ir , (0)) [P Q]0 = P P 0 .

11.5 It follows from Problem 11.3 that a symmetric idempotent matrix is the projection matrix for the
orthogonal projection onto its column space. Thus, since these projection matrices are unique and A
and B have the same column space, we must have A = B.

11.7 (a) We must have


(a1m 10m )2 = a2 1m 10m 1m 10m = ma2 1m 10m = a1m 10m ,

that is, ma2 = a. Thus, a = 0 or a = 1/m.

(b) We must have

(bIm + c1m 10m )2 = b2 Im + 2bc1m 10m + mc2 1m 10m = bIm + c1m 10m .

That is, we must have b2 = b and 2bc + mc2 = c. Thus, we need b = 0 and c = 0 or c = 1/m, or
b = 1 and c = 0 or c = −1/m.

11.9 If AB is idempotent, then


ABAB = AB.

Since B is nonsingular, we can postmultiply this equation by B −1 and premultiply by B to get

BABA = BA,

confirming that BA is idempotent.

11.11 Note that if c = 0, A2 = (0) and so all the eigenvalues of A equal zero. Thus, tr(A) = c rank(A) holds
since both sides reduce to zero. If c 6= 0, then A2 = cA implies

(c−1 A)2 = c−2 A2 = c−2 cA = c−1 A.

That is, c−1 A is idempotent. Thus,

rank(A) = rank(c−1 A) = tr(c−1 A) = c−1 tr(A),

so tr(A) = c rank(A).

11.13 (a) Clearly A − BB + is symmetric since A and BB + are, and

(A − BB + )2 = A2 − ABB + − BB + A + BB + BB +

= A − BB + − (BB + )0 A + BB + = A − B +0 B 0 A

= A − B +0 (AB)0 = A − B +0 B 0

= A − (BB + )0 = A − BB + ,

so A − BB + is idempotent. Since A and BB + are idempotent as well,

rank(A − BB + ) = tr(A − BB + ) = tr(A) − tr(BB + )

= rank(A) − rank(BB + ) = rank(A) − rank(B).

(b) Since Im − A is symmetric idempotent and (Im − A)B = B, we can apply the result from part
(a). That is, (Im − A) − BB + is symmetric idempotent with

rank{(Im − A) − BB + } = rank{(Im − A)} − rank(B) = m − rank(A) − rank(B) = 0,

since rank(A) + rank(B) = m. This requires that (Im − A) − BB + = (0), so A = Im − BB + .

11.15 Since Ω is positive definite, it can be written Ω = T T 0 for some m × m nonsingular matrix T . Then
z = T −1 x ∼ Nm (T −1 µ, Im ) and

x0 Ax = x0 T −10 T 0 AT T −1 x = z 0 T 0 AT z.

Clearly, T 0 AT is symmetric and

(T 0 AT )2 = T 0 AΩAT = T 0 AΩAΩΩ−1 T = T 0 (AΩ)2 Ω−1 T = T 0 AΩΩ−1 T = T 0 AT,

so it is also idempotent. Consequently, it can be written T 0 AT = P ∆P 0 , where P is an orthogonal


matrix and ∆ is the diagonal matrix diag(Ir , (0)) since

rank(∆) = rank(T 0 AT ) = tr(T 0 AT ) = tr(AΩ) = rank(AΩ) = r.

Then z ∗ = P 0 z ∼ Nm (P 0 T −1 µ, Im ) and
r
X
x0 Ax = z 0 T 0 AT z = zP ∆P 0 z = z 0∗ ∆z ∗ = 2
zi∗ .
i=1

Thus, x0 Ax has a chi-squared distribution with r degrees of freedom. The noncentrality parameter is

1 1 1 1
λ= E(z ∗ )0 ∆E(z ∗ ) = µ0 T −10 P ∆P 0 T −1 µ = µ0 T −10 T 0 AT T −1 µ = µ0 Aµ.
2 2 2 2

11.17 Since A is symmetric of rank r, it can be expressed as A = P ΛP 0 , where P is an m × r matrix


satisfying P 0 P = Ir , Λ = diag(λ1 , . . . , λr ), and λ1 , . . . , λr are the nonzero eigenvalues of A. Then
z = P 0 x ∼ Nr (0, Ir ) and
r
X
x0 Ax = x0 P ΛP 0 x = z 0 Λz = λi zi2 ,
i=1
0
confirming that x Ax is a linear combination of independent one-degree of freedom chi-squared random
variables. When A is idempotent, the coefficients, that is, the λ0i s, are all equal to one.

11.19 Since x1 , . . . , xn are independent and identically distributed as N (µ, σ 2 ) random variables, we have
x = (x1 , . . . , xn )0 ∼ Nn (µ1n , σ 2 In ), y = x − µ1n ∼ Nn (0, σ 2 In ), and n−1 (x − µ1n )0 1n = (x̄ − µ).
Thus,
n(x̄ − µ)2 n(x − µ1n )0 (n−1 1n )(n−1 10n )(x − µ1n )
t= = = y 0 (n−1 σ −2 1n 10n )y.
σ2 σ2
Since (n−1 σ −2 1n 10n )(σ 2 In ) is idempotent and rank(n−1 1n 10n ) = 1, it follows from Theorem 11.10 that
t ∼ χ21 .

11.21 (a) Now t1 = y 0 A1 y, where A1 = diag(Ω−1


11 , (0)) and y = x − µ ∼ Nn (0, Ω). Since
 
Ir Ω−1
11 Ω 12
A1 Ω =  
(0) (0)

is idempotent and rank(A1 Ω) = r, it follows from Theorem 11.10 that t1 ∼ χ2r .

(b) Note that t2 = y 0 A2 y, where A2 = Ω−1 − A1 . Since

A2 ΩA2 = Ω−1 − A1 − A1 + A1 ΩA1 = Ω−1 − A1 = A2 ,

it follows that A2 Ω is idempotent. Applying Theorem 7.1 to A = Ω, we find that


 
−1 0 −1 −1
Ω11 Ω12 B22 Ω12 Ω11 −Ω11 Ω12 B22
A2 =  
−B22 Ω012 Ω−111 B 22
 
−1
Ω11 Ω12
 B22 Ω012 Ω−1
 
=  11 − In−r ,
−In−r

where B22 = (Ω22 −Ω012 Ω−1


11 Ω12 )
−1
. As a result, we see that rank(A2 Ω) = rank(A2 ) = rank(B22 ) =
n − r, and so another application of Theorem 11.10 shows that t2 ∼ χ2n−r .

(c) By Theorem 11.14, t1 and t2 are independently distributed since

A1 ΩA2 = A1 Ω(Ω−1 − A1 ) = A1 − A1 ΩA1 = A1 − A1 = (0).

11.23 Since Ω is positive definite, we can write Ω = T T 0 for some nonsingular matrix T , and y = T −1 x ∼
Nm (T −1 µ, Im ). Let A∗ = T 0 AT and B∗ = BT , and note that

B∗ A∗ = BT T 0 AT = BΩAT = (0)T = (0).

If the spectral decomposition of A∗ is given by
  
Λ (0) P10
A∗ = [P1 P2 ]    = P1 ΛP10 ,
(0) (0) P20

then the condition B∗ A∗ = (0) implies that B∗ = CP20 for some matrix C. Now let z = P 0 y ∼
Nm (P 0 T −1 µ, Im ) and partition z = (z 01 , z 02 )0 similar to P = (P1 , P2 ). Then

x0 Ax = y 0 T 0 AT y = y 0 A∗ y = z 0 P 0 A∗ P z = z 01 Λz 1 ,

and
Bx = BT y = B∗ y = B∗ P z = CP20 P z = Cz 2 .

Since var(z) = Im , z 1 is independent of z 2 and so it follows that x0 Ax is independent of Bx.

11.25 (a) We have ti = x0 Ai x, where

A1 = (1/4)14 104 + (1/2)(e1 − e2 )(e1 − e2 )0 ,

A2 = (1/12)(14 − 4e4 )(14 − 4e4 )0 ,
A3 = (e1 + e2 − 2e3 )(e1 + e2 − 2e3 )0 + (e3 − e4 )(e3 − e4 )0 .

(b) Since A21 = A1 and tr(A1 ) = 2, t1 ∼ χ22 . Since A22 = A2 and tr(A2 ) = 1, t2 ∼ χ21 . Finally, t3 does
not have a chi-squared distribution because A23 ≠ A3 .

(c) It is easily shown that A1 A2 = (0) and A1 A3 = (0), while

A2 A3 = (1/3)(14 − 4e4 )(e3 − e4 )0 ≠ (0).

Thus, t1 and t2 are independently distributed, t1 and t3 are independently distributed, but t2 and
t3 are not independently distributed.
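
An added sketch confirming the idempotency, trace, and product claims for A1 , A2 , and A3 :

```python
import numpy as np

e, one = np.eye(4), np.ones(4)
A1 = np.outer(one, one) / 4 + np.outer(e[0] - e[1], e[0] - e[1]) / 2
A2 = np.outer(one - 4 * e[3], one - 4 * e[3]) / 12
A3 = (np.outer(e[0] + e[1] - 2 * e[2], e[0] + e[1] - 2 * e[2])
      + np.outer(e[2] - e[3], e[2] - e[3]))
print(np.allclose(A1 @ A1, A1), np.trace(A1))      # True 2.0
print(np.allclose(A2 @ A2, A2), np.trace(A2))      # True 1.0
print(np.allclose(A3 @ A3, A3))                    # False: A3 is not idempotent
print(np.allclose(A1 @ A2, 0), np.allclose(A1 @ A3, 0), np.allclose(A2 @ A3, 0))  # True True False
```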

11.27 Since Ω is positive semidefinite, there is an m × m matrix T satisfying Ω = T T 0 , and so condition (a)
can be written as
T T 0 AT T 0 BT T 0 = (0).

Premultiplying this by T + and postmultiplying by T +0 , we find that

T 0 AT T 0 BT = (0),

which is the condition (11.12) established in the proof of Theorem 11.14. Writing x = µ + T P y, we
find that
x0 Ax = µ0 Aµ + y 0 Cy + 2µ0 AT P y, (1)

and
x0 Bx = µ0 Bµ + y 0 Dy + 2µ0 BT P y, (2)

where C, D, and P are as defined in the proof of Theorem 11.14, and y ∼ Nm (0, Im ). As demonstrated
in the proof of Theorem 11.14, CD = (0) and this shows that y 0 Cy and y 0 Dy are independent. Next
write y 0 Cy = y 0 CC + Cy and note that the covariance between Cy and µ0 BT P y is

E(Cyy 0 P 0 T 0 Bµ) = CE(yy 0 )P 0 T 0 Bµ = CP 0 T 0 Bµ.

But condition (b) implies that T 0 AT T 0 Bµ = 0, that is, P CP 0 T 0 Bµ = 0, and this yields CP 0 T 0 Bµ = 0.
Thus, Cy and µ0 BT P y are independently distributed, and hence so are y 0 Cy and µ0 BT P y. In a
similar fashion, it can be shown that the independence of y 0 Dy and µ0 AT P y follows from condition
(c). Finally, due to condition (d), we have

µ0 AT P E(yy 0 )P 0 T 0 Bµ = µ0 AT P P 0 T 0 Bµ = µ0 AΩBµ = 0,

so that µ0 AT P y and µ0 BT P y are uncorrelated and, hence, independent. This then confirms that (1)
and (2) are independently distributed.

11.29 Let F = [G0 H 0 ]0 , and note that F −1 = [G0 (GG0 )−1 H 0 (HH 0 )−1 ]. Then

y = Xβ +  = XF −1 F β + 

= XG0 (GG0 )−1 Gβ + XH 0 (HH 0 )−1 Hβ + 

= X∗ β ∗ + XH 0 (HH 0 )−1 c + ,

which is equivalent to the model y ∗ = X∗ β ∗ + . The least squares estimate of β ∗ is

β̂ ∗ = (X∗0 X∗ )−1 X∗0 y ∗

= GG0 (GX 0 XG0 )−1 GX 0 (y − XH 0 (HH 0 )−1 c).

The sum of squared errors for this reduced model is then given by

SSEr = (y ∗ − X∗ β̂ ∗ )0 (y ∗ − X∗ β̂ ∗ )

= {y − XH 0 (HH 0 )−1 c}0 {IN − XG0 (GX 0 XG0 )−1 GX 0 }{y − XH 0 (HH 0 )−1 c},

while the sum of squared errors for the complete model is

SSEc = y 0 {IN − X(X 0 X)−1 X 0 }y.

Consequently, the F statistic is


(SSEr − SSEc )/m2
F = .
SSEc /(N − m)
11.31 Using the moments of the standard normal distribution and the fact that the components of z are
independent of one another, we have



 0, i 6= j 6= h 6= k,


i = j, i 6= h 6= k,



 Thk ,

0
E(zi zj zh zk zz ) = Im + Tii + Thh , i = j, h = k, i 6= h,


i = j = h, i 6= k,



 3Tik ,



 3Im + 6Tii , i = j = h = k.

This can then be used to obtain


m m m
1 XX X
E{zi2 (zz 0 0
⊗ zz )} = (Thk ⊗ Thk ) + Im ⊗ Tii + Im ⊗ Im + Tii ⊗ Im + 2 (Tik ⊗ Tik ),
2
h=1 k=1 k=1

and for i 6= j,
m
X m
X
E{zi zj (zz 0 ⊗ zz 0 )} = Im ⊗ Tij + (Tik ⊗ Tjk ) + (Tjk ⊗ Tik ) + Tij ⊗ Im .
k=1 k=1

This leads to
m m
0 0 1 XX
E{zi zj (zz ⊗ zz )} = δij {Im ⊗ Im + (Thk ⊗ Thk )} + Im ⊗ Tij + Tij ⊗ Im
2
h=1 k=1
m
X Xm
+ (Tik ⊗ Tjk ) + (Tjk ⊗ Tik ),
k=1 k=1

where δij denotes the (i, j)th component of Im . Thus,


m X
X m
E(zz 0 ⊗ zz 0 ⊗ zz 0 ) = (Eij ⊗ E{zi zj (zz 0 ⊗ zz 0 )})
i=1 j=1
m m m
X 1 XX
= (Eii ⊗ {Im ⊗ Im + (Thk ⊗ Thk )})
i=1
2
h=1 k=1
m X
X m m X
X m
+ (Eij ⊗ Im ⊗ Tij ) + (Eij ⊗ Tij ⊗ Im )
i=1 j=1 i=1 j=1

m X
X m X
m
+ {Eij ⊗ (Tik ⊗ Tjk + Tjk ⊗ Tik )}
i=1 j=1 k=1
m Xm
1 X
= Im3 + (Im ⊗ Tij ⊗ Tij
2 i=1 j=1
+ Tij ⊗ Im ⊗ Tij + Tij ⊗ Tij ⊗ Im )
Xm X m Xm
+ (Tij ⊗ Tik ⊗ Tjk ).
i=1 j=1 k=1

11.33 (a) We can write y = sx, where x ∼ Nm (0, Ω) and s is a nonnegative random variable that is
distributed independently of x. Using Theorem 11.21, we then get

E(yy 0 ⊗ yy 0 ) = c{2Nm (Ω ⊗ Ω) + vec(Ω) vec(Ω)0 },

where c = E(s4 ).

(b) Premultiplying E(yy 0 ⊗ yy 0 ) in part (a) by e0i ⊗ e0i and postmultiplying by ei ⊗ ei yields the
identity
E(yi4 ) = 3c{E(yi2 )}2 ,

and solving for c leads to


E(yi4 )
c= .
3{E(yi2 )}2
(c) We can write S = (n − 1)−1 Y 0 AY , where Y 0 is the m × n matrix with ith column y i and
A = In − n−1 1n 10n . Now

var{vec(S)} = (n − 1)−2 [E{vec(Y 0 AY ) vec(Y 0 AY )0 } − E{vec(Y 0 AY )}E{vec(Y 0 AY )0 }], (1)

and clearly
n
X
E{vec(Y 0 AY )} = E{vec(y i y 0i )} − nE{vec(ȳ ȳ 0 )} = (n − 1) vec(Ω). (2)
i=1

Since aii = 1 − n−1 and aij = −n−1 for i 6= j, and the y i ’s are independent, we have
   !0 
 X X 
E{vec(Y 0 AY ) vec(Y 0 AY )0 } = E vec  aij y i y 0j  vec akl y k y 0l
 
ij kl
X
= aij akl E(y j y 0l ⊗ y i y 0k )
ijkl

X X
= aii ajj E(y i y 0j ⊗ y i y 0j ) + a2ij E(y i y 0j ⊗ y j y 0i )
i6=j i6=j
X X
+ a2ii E(y i y 0i ⊗ y i y 0i ) + a2ij E(y i y 0i ⊗ y j y 0j )
i i6=j
X
= aii ajj E{vec(y i y 0i ) vec(y j y 0j )0 }
i6=j
X
+ a2ij E(y i y 0i ⊗ y j y 0j )Kmm
i6=j
X X
+ a2ii E(y i y 0i ⊗ y i y 0i ) + a2ij E(y i y 0i ⊗ y j y 0j )
i i6=j
3 −1
= (n − 1) n vec(Ω) vec(Ω) + (n − 1)n−1 Kmm (Ω ⊗ Ω)
0

+ (n − 1)2 n−1 c{2Nm (Ω ⊗ Ω) + vec(Ω) vec(Ω)0 }

+ (n − 1)n−1 (Ω ⊗ Ω)

= (n − 1)3 n−1 vec(Ω) vec(Ω)0 + (n − 1)n−1 (Im2 + Kmm )(Ω ⊗ Ω)

+ (n − 1)2 n−1 c{2Nm (Ω ⊗ Ω) + vec(Ω) vec(Ω)0 }

Thus, using this and (1) and (2), we get

var{vec(Y 0 AY )} = (n − 1)3 n−1 vec(Ω) vec(Ω)0 + (n − 1)n−1 (Im2 + Kmm )(Ω ⊗ Ω)

+ (n − 1)2 n−1 c{2Nm (Ω ⊗ Ω) + vec(Ω) vec(Ω)0 } − (n − 1)2 vec(Ω) vec(Ω)0

= −(n − 1)2 n−1 vec(Ω) vec(Ω)0 + (n − 1)n−1 (Im2 + Kmm )(Ω ⊗ Ω)

+ (n − 1)2 n−1 c{2Nm (Ω ⊗ Ω) + vec(Ω) vec(Ω)0 },

and

var{vec(S)} = (n − 1)−2 var{vec(Y 0 AY )}

= n−1 c{2Nm (Ω ⊗ Ω) + vec(Ω) vec(Ω)0 } − n−1 vec(Ω) vec(Ω)0

+ (n − 1)−1 n−1 (Im2 + Kmm )(Ω ⊗ Ω)

≈ (n − 1)−1 {2cNm (Ω ⊗ Ω) + (c − 1) vec(Ω) vec(Ω)0 }.

(d) Following the derivation given in Example 11.11 and using the formula from part (c), we find that
1
var{vec(R)} = H{2cNm (P ⊗ P ) + (c − 1) vec(P ) vec(P )0 }H 0
n−1
1
= H{2cNm (P ⊗ P )}H 0
n−1

2c
= Nm ΘNm ,
n−1

where H and Θ are as defined in Example 11.11. The second equality above follows from the fact
that
 
1
H vec(P ) = Im2 − {(Im ⊗ P ) + (P ⊗ Im )}Λm vec(P )
2
1
= vec(P ) − {(Im ⊗ P ) + (P ⊗ Im )} vec(Im )
2
1
= vec(P ) − {vec(P ) + vec(P )} = 0,
2

whereas the last equality follows from the simplification given in Problem 11.53.

11.35 (a) We have

E(x0 Axx0 Bxx0 Cx) = E{(x0 ⊗ x0 ⊗ x0 )(A ⊗ B ⊗ C)(x ⊗ x ⊗ x)}

= E[tr{(x0 ⊗ x0 ⊗ x0 )(A ⊗ B ⊗ C)(x ⊗ x ⊗ x)}]

= E[tr{(A ⊗ B ⊗ C)(x ⊗ x ⊗ x)(x0 ⊗ x0 ⊗ x0 )}]

= tr{(A ⊗ B ⊗ C)E(xx0 ⊗ xx0 ⊗ xx0 )}.

(b) Using Problem 11.31, we get


m m
1 XX
E(x0 Axx0 Bxx0 Cx) = tr(A ⊗ B ⊗ C) + {tr(A ⊗ BTij ⊗ CTij )
2 i=1 j=1
+ tr(ATij ⊗ B ⊗ CTij ) + tr(ATij ⊗ BTij ⊗ C)}
Xm Xm X m
+ tr(ATij ⊗ BTik ⊗ CTjk ).
i=1 j=1 k=1

Now

tr(A ⊗ BTij ⊗ CTij ) = tr(A) tr{B(ei e0j + ej e0i )} tr{C(ei e0j + ej e0i )}

= tr(A)(2bij )(2cij ) = 4 tr(A)bij cij .

In a similar fashion, we get tr(ATij ⊗B⊗CTij ) = 4 tr(B)aij cij , tr(ATij ⊗BTij ⊗C) = 4 tr(C)aij bij ,
and tr(ATij ⊗ BTik ⊗ CTjk ) = 8aij bik cjk . Thus,
m X
X m
E(x0 Axx0 Bxx0 Cx) = tr(A) tr(B) tr(C) + 2 {tr(A)bij cij + tr(B)aij cij
i=1 j=1

m X
X m X
m
+ tr(C)aij bij } + 8 aij bik cjk
i=1 j=1 k=1
= tr(A) tr(B) tr(C) + 2 tr(A) tr(BC) + 2 tr(B) tr(AC)

+ 2 tr(C) tr(AB) + 8 tr(ABC).

11.37 (a) We have

E(xy 0 ⊗ xy 0 ) = E(x ⊗ x){E(y ⊗ y)}0 = E{vec(xx0 )}[E{vec(yy 0 )}]0

= vec{E(xx0 )}[vec{E(yy 0 )}]0 = vec(V1 ){vec(V2 )}0 .

(b) Similarly

E(xy 0 ⊗ yx0 ) = E{(x ⊗ y)(y 0 ⊗ x0 )} = E{Kmn (y ⊗ x)(y 0 ⊗ x0 )}

= Kmn E(yy 0 ⊗ xx0 ) = Kmn (V2 ⊗ V1 ) = (V1 ⊗ V2 )Kmn .

(c) Also

E(x ⊗ x ⊗ y ⊗ y) = E(x ⊗ x) ⊗ E(y ⊗ y) = E{vec(xx0 )} ⊗ E{vec(yy 0 )}

= vec(V1 ) ⊗ vec(V2 ).

(d) Using the result from (c), we get

E(x ⊗ y ⊗ x ⊗ y) = E{x ⊗ Knm (x ⊗ y) ⊗ y}

= (Im ⊗ Knm ⊗ In )E(x ⊗ x ⊗ y ⊗ y)

= (Im ⊗ Knm ⊗ In ){vec(V1 ) ⊗ vec(V2 )}.

(e) Finally

var(x ⊗ y) = E{(x ⊗ y)(x ⊗ y)0 } − E(x ⊗ y){E(x ⊗ y)}0

= E(xx0 ) ⊗ E(yy 0 ) − {E(x) ⊗ E(y)}{E(x) ⊗ E(y)}0

= V1 ⊗ V2 − µ1 µ01 ⊗ µ2 µ02 .

11.39 We have t1 = x0 A1 x and t2 = x0 A2 x, where

A1 = (e1 + e2 − 2e3 )(e1 + e2 − 2e3 )0 + (e3 − e4 )(e3 − e4 )0 = γ 1 γ 01 + γ 2 γ 02 ,

A2 = (e1 − e2 − e3 )(e1 − e2 − e3 )0 + (e1 + e2 − e4 )(e1 + e2 − e4 )0 = γ 3 γ 03 + γ 4 γ 04 ,

(a) Using Theorem 11.23(c), we get

var(t1 ) = 2 tr{(A1 Ω)2 } + 4µ0 A1 ΩA1 µ

= 2 tr[{(γ 1 γ 01 + γ 2 γ 02 )(4I4 + 14 104 )}2 ]

+ 4104 (γ 1 γ 01 + γ 2 γ 02 )(4I4 + 14 104 )(γ 1 γ 01 + γ 2 γ 02 )14

= 2 tr{(4γ 1 γ 01 + 4γ 2 γ 02 )2 }

= 1536.

(b) Another application of Theorem 11.23(c) yields

var(t2 ) = 2 tr{(A2 Ω)2 } + 4µ0 A2 ΩA2 µ

= 2 tr{(4γ 3 γ 03 + 4γ 4 γ 04 − γ 3 104 + γ 4 104 )2 }

+ 4104 (4γ 3 γ 03 + 4γ 4 γ 04 − γ 3 104 + γ 4 104 )(γ 3 γ 03 + γ 4 γ 04 )14

= 2(340) + 4(28) = 792.

(c) Using Theorem 11.23(b), we have

cov(t1 , t2 ) = 2 tr(A1 ΩA2 Ω) + 4µ0 A1 ΩA2 µ

= 2 tr{(4γ 1 γ 01 + 4γ 2 γ 02 )(4γ 3 γ 03 + 4γ 4 γ 04 − γ 3 104 + γ 4 104 )}

+ 4104 (4γ 1 γ 01 + 4γ 2 γ 02 )(γ 3 γ 03 + γ 4 γ 04 )14

= 2(160) + 4(0) = 320.
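
These values can be reproduced directly; the following added sketch assumes, as in the calculations above, that Ω = 4I4 + 14 104 and µ = 14 :

```python
import numpy as np

e = np.eye(4)
g1, g2 = e[0] + e[1] - 2 * e[2], e[2] - e[3]
g3, g4 = e[0] - e[1] - e[2], e[0] + e[1] - e[3]
A1 = np.outer(g1, g1) + np.outer(g2, g2)
A2 = np.outer(g3, g3) + np.outer(g4, g4)
Omega = 4 * np.eye(4) + np.ones((4, 4))            # assumed covariance matrix
mu = np.ones(4)                                    # assumed mean vector
var_t1 = 2 * np.trace(A1 @ Omega @ A1 @ Omega) + 4 * mu @ A1 @ Omega @ A1 @ mu
var_t2 = 2 * np.trace(A2 @ Omega @ A2 @ Omega) + 4 * mu @ A2 @ Omega @ A2 @ mu
cov12 = 2 * np.trace(A1 @ Omega @ A2 @ Omega) + 4 * mu @ A1 @ Omega @ A2 @ mu
print(var_t1, var_t2, cov12)                       # 1536.0 792.0 320.0
```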

11.41 Since Vi ∼ Wm (Ω, ni ), it can be written as Vi = Xi0 Xi , where the columns of the m × ni matrix Xi0
are independently distributed as Nm (0, Ω). Put X 0 = (X10 , X20 ), and note that since V1 and V2 are
independent, the n1 + n2 columns of X 0 are independently distributed as Nm (0, Ω). It then follows
from Theorem 11.25, that

X 0 X = X10 X1 + X20 X2 = V1 + V2 ∼ Wm (Ω, n1 + n2 ).

−1 0
11.43 (a) In finding the distribution of V1·2 = V11 − V12 V22 V12 in the proof of Theorem 11.27, it was
observed that the distribution of V1·2 given X2 does not depend on X2 . Since V22 = X20 X2 ,
this implies that V1·2 is independent of V22 . Also, since V1·2 = X10 {In − X2 (X20 X2 )−1 X20 }X1 ,

V12 = X10 X2 , and {In − X2 (X20 X2 )−1 X20 }X2 = (0), it follows that V1·2 and V12 are independent
given X2 , that is, given V22 . Their unconditional independence also follows since
Z
fV1·2 ,V12 (U1 , U2 ) = fV1·2 ,V12 ,V22 (U1 , U2 , U3 )dU3
Z
= fV1·2 ,V12 |V22 (U1 , U2 |U3 )fV22 (U3 )dU3
Z
= fV1·2 |V22 (U1 |U3 )fV12 |V22 (U2 |U3 )fV22 (U3 )dU3
Z
= fV1·2 (U1 )fV12 |V22 (U2 |U3 )fV22 (U3 )dU3
Z
= fV1·2 (U1 )fV12 ,V22 (U2 , U3 )dU3

= fV1·2 (U1 )fV12 (U2 ).

(b) Applying the result from Example 7.3, we have

vec(X10 )|X2 ∼ Nm1 n (vec(Ω12 Ω−1 0 −1 0


22 X2 ), In ⊗ Ω11 − Ω12 Ω22 Ω12 ).

Then since
vec(V12 ) = vec(X10 X2 ) = (X20 ⊗ Im1 ) vec(X10 ),

we see that given V22 ,

vec(V12 ) ∼ Nm1 m2 (vec(Ω12 Ω−1 −1 0


22 V22 ), V22 ⊗ (Ω11 − Ω12 Ω22 Ω12 )).

11.45 Let U = Ω−1/2 V Ω−1/2 , where Ω1/2 is the symmetric square root of Ω. It follows from Theorem 11.26
that U ∼ Wm (Im , n). Note that if B = AΩ−1/2 , then

(AV −1 A0 )−1 = (BΩ1/2 Ω−1/2 U −1 Ω−1/2 Ω1/2 B 0 )−1 = (BU −1 B 0 )−1 ,

and (AΩ−1 A0 )−1 = (BB 0 )−1 . Thus, the result will be proven if we can show that (BU −1 B 0 )−1 ∼
Wp ((BB 0 )−1 , n − m + p). Now it follows from Theorem 4.1 that B can be written as B = C[Ip (0)]Q,
where C is a p × p nonsingular matrix and Q is an m × m orthogonal matrix. Then
   −1
Ip
(BU −1 B 0 )−1 = C[Ip (0)]QU −1 Q0   C 0
(0)
  −1
Ip
= C −10 [Ip (0)](QU Q0 )−1   C −1
(0)

  −1
Ip
= C −10 [Ip (0)]D−1   C −1 ,
(0)

where D = QU Q0 ∼ Wm (Im , n) due to Theorem 11.26. Partition D and D−1 as


   
D11 D12 E11 E 12
D= , E = D−1 =  ,
0 0
D12 D22 E12 E22
−1 −1 −1 −1 0
where D11 and E11 are p × p. Now (BU −1 B 0 )−1 = C −10 E11 C and since E11 = D11 − D12 D22 D12 ,
−1 −1 −1
it follows from Theorem 11.27 that E11 ∼ Wp (Ip , n − m + p). Thus, C −10 E11 C ∼ Wp ((CC 0 )−1 , n −
m + p), and since (CC 0 )−1 = (BB 0 )−1 , the proof is complete.

11.47 Write A = P ΛP 0 , where P is an orthogonal matrix and Λ = diag(λ1 , . . . , λn ). Let Y 0 = (y 1 , . . . , y n ) =


X 0 P , so that M∗0 = E(Y 0 ) = M 0 P .

(a) Note that

var{vec(Y 0 )} = var{vec(X 0 P )} = var{(P 0 ⊗ Im ) vec(X 0 )}

= (P 0 ⊗ Im )(In ⊗ Ω)(P ⊗ Im ) = In ⊗ Ω.

Thus,
n
!
X
0 0 0 0
E(X AX) = E(X P ΛP X) = E(Y ΛY ) = E λi y i y 0i
i=1
n n n
!
X X X
= λi E(y i y 0i ) = λi (Ω + µi∗ µ0i∗ ) = Ω λi + M∗0 ΛM∗
i=1 i=1 i=1
= Ω tr(A) + M 0 P ΛP 0 M = Ω tr(A) + M 0 AM.

(b) Since !
n
X n
X
0 0
vec(X AX) = vec(Y ΛY ) = vec λi y i y 0i = λi vec(y i y 0i ),
i=1 i=1

we have
n
X n
X
var{vec(X 0 AX)} = λ2i var{vec(y i y 0i )} = λ2i var(y i ⊗ y i )
i=1 i=1
n
X
= λ2i (2Nm {Ω ⊗ Ω + Ω ⊗ µi∗ µ0i∗ + µi∗ µ0i∗ ⊗ Ω})
i=1

( n
! )
X
= 2Nm λ2i (Ω ⊗ Ω) + Ω ⊗ M∗0 Λ2 M∗ + M∗0 Λ2 M∗ ⊗Ω
i=1
= 2Nm {tr(A )(Ω ⊗ Ω) + Ω ⊗ M 0 P Λ2 P 0 M + M 0 P Λ2 P 0 M ⊗ Ω}
2

= 2Nm {tr(A2 )(Ω ⊗ Ω) + Ω ⊗ M 0 A2 M + M 0 A2 M ⊗ Ω},

where we have used the result from Problem 11.36(a).

11.49 Let Yi0 = (y i1 , . . . , y ini ) and Y 0 = (Y10 , . . . , Yk0 ), so that ȳ i = n−1 0


i Yi 1ni and ȳ = n
−1
Y 1n . Note that B
can be expressed as
k
X
B = ni ȳ i ȳ 0i − nȳ ȳ 0
i=1
k
X
= n−1 0 0
i Yi 1ni 1ni Yi − n
−1 0
Y 1n 10n Y
i=1
= Y 0 A1 Y,

where A1 = O(N −1 − n−1 1k 10k )O0 , N = diag(n1 , . . . , nk ), and


 
1n1 0 ··· 0
 
 0 1 n2 ··· 0
 

O=
 .. .. ..
.

 . . . 
 
0 0 ··· 1nk

Note that since O0 O = N and 10k N 1k = n,

A21 = O(N −1 − n−1 1k 10k )O0 O(N −1 − n−1 1k 10k )O0

= O(N −1 N N −1 − n−1 N −1 N 1k 10k − n−1 1k 10k N N −1 + n−2 1k 10k N 1k 10k )O0

= O(N −1 − n−1 1k 10k )O0 = A1 ,

and !
k
1 0 1 X
M A1 M = ni µi µ0i − nµ̄µ̄0
= Φ,
2 2 i=1

where M 0 = (µ1 10n1 , . . . , µk 10nk ). Since

tr(A1 ) = tr{(N −1 − n−1 1k 10k )N } = tr(Ik ) − n−1 10k N 1k = k − 1,

it follows that B ∼ Wm (Ω, k − 1, Φ). Turning to W , we see that it can be expressed as
 
X k Xni
W =  y ij y 0ij − ni ȳ i ȳ 0i 
i=1 j=1
k
X
= Yi0 (Ini − n−1 0
i 1ni 1ni )Yi
i=1
= Y 0 A2 Y,

where A2 = diag(In1 − n−1 0 −1 0 −1 0


1 1n1 1n1 , . . . , Ink − nk 1nk 1nk ). Clearly, since Ini − ni 1ni 1ni is idempotent,

A2 is idempotent and
k
1 0 1X 0
M A2 M = M (In − n−1 0
i 1ni 1ni )Mi
2 2 i=1 i i
k
1X
= µ 10 (In − n−1 0 0
i 1ni 1ni )1ni µi = (0),
2 i=1 i ni i

where we have written M 0 = (M10 , . . . , Mk0 ) with Mi0 = µi 10ni . Thus, W ∼ Wm (Ω, n − k) since
k
X k
X
tr(A2 ) = tr{(Ini − n−1 0
i 1ni 1ni )} = (ni − 1) = n − k.
i=1 i=1

Finally, since (Ini − n−1 0


i 1ni 1ni )1ni = 0, we see that A2 A1 = (0). This guarantees that B and W are

independently distributed.

11.51 First we find var{vec(X 0 AX)}. Note that

vec(X 0 AX) = (X 0 ⊗ X 0 ) vec(A) = vec{Im2 (X 0 ⊗ X 0 ) vec(A)}

= {vec(A) ⊗ Im2 }0 vec(X 0 ⊗ X 0 ).

Since the columns of X 0 are independent, each with covariance matrix Ω, var{vec(X 0 )} = In ⊗ Ω, and
so using Theorem 8.27, Theorem 11.21(d), and the fact that

Knm,nm = (In ⊗ Kmn ⊗ Im )(Knn ⊗ Kmm )(In ⊗ Knm ⊗ Im )

due to Problem 8.58(b), we find that

var{vec(X 0 AX)} = {vec(A) ⊗ Im2 }0 {var(X 0 ⊗ X 0 )}{vec(A) ⊗ Im2 }

= {vec(A) ⊗ Im2 }0 (In ⊗ Knm ⊗ Im )(var{vec(X 0 ) ⊗ vec(X 0 )})

× (In ⊗ Kmn ⊗ Im ){vec(A) ⊗ Im2 }

= {vec(A) ⊗ Im2 }0 (In ⊗ Knm ⊗ Im )(Im2 n2 + Knm,nm )

× {(In ⊗ Ω) ⊗ (In ⊗ Ω)}(In ⊗ Kmn ⊗ Im ){vec(A) ⊗ Im2 }

= {vec(A) ⊗ Im2 }0 (Im2 n2 + Knn ⊗ Kmm )(In ⊗ Knm ⊗ Im )

× {(In ⊗ Ω) ⊗ (In ⊗ Ω)}(In ⊗ Kmn ⊗ Im ){vec(A) ⊗ Im2 }

= {vec(A) ⊗ Im2 )0 (Im2 n2 + Knn ⊗ Kmm )(In2 ⊗ Ω ⊗ Ω){vec(A) ⊗ Im2 }

= {vec(A)0 vec(A)}(Ω ⊗ Ω) + {vec(A0 )0 vec(A)}Kmm (Ω ⊗ Ω)

= tr(A0 A)(Ω ⊗ Ω) + tr(A2 )Kmm (Ω ⊗ Ω).

Now

V = (X + M )0 A(X + M ) = X 0 AX + X 0 AM + M 0 AX + M 0 AM.

Since the first and third-order moments of vec(X 0 ) are all zero, we have

var{vec(V )} = var{vec(X 0 AX)} + var[{M 0 A0 ⊗ Im + Kmm (M 0 A ⊗ Im )} vec(X 0 )]

= tr(A0 A)(Ω ⊗ Ω) + tr(A2 )Kmm (Ω ⊗ Ω) + {M 0 A0 ⊗ Im + Kmm (M 0 A ⊗ Im )}

× (In ⊗ Ω){AM ⊗ Im + (A0 M ⊗ Im )Kmm }

= tr(A0 A)(Ω ⊗ Ω) + tr(A2 )Kmm (Ω ⊗ Ω) + M 0 A0 AM ⊗ Ω

+ Kmm (M 0 AA0 M ⊗ Ω)Kmm + (M 0 A2 M ⊗ Ω)0 Kmm + Kmm (M 0 A2 M ⊗ Ω)

= tr(A0 A)(Ω ⊗ Ω) + tr(A2 )Kmm (Ω ⊗ Ω) + M 0 A0 AM ⊗ Ω

+ Ω ⊗ M 0 AA0 M + Kmm (Ω ⊗ M 0 A2 M )0 + Kmm (M 0 A2 M ⊗ Ω).

11.53 Since Λm = Λm Nm = Nm Λm and, from Problem 8.60,

{(Im ⊗ P ) + (P ⊗ Im )}Nm = 2Nm (Im ⊗ P )Nm ,

we have
 
1 1
Im2 − {(Im ⊗ P ) + (P ⊗ Im )}Λm Nm = Nm − {(Im ⊗ P ) + (P ⊗ Im )}Nm Λm
2 2
1
= Nm − 2Nm (Im ⊗ P )Nm Λm
2
= Nm {Im2 − (Im ⊗ P )Λm }.

