Multivariate Analysis - M.E
1.1. Review of vectors and matrices. (The results are stated for
vectors and matrices with real entries but also hold for complex entries.)
An m × n matrix A ≡ {a_ij} is an array of mn numbers:
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.$$
This matrix represents the linear mapping (≡ linear transformation)
$$(1.1)\quad A : \mathbb{R}^n \to \mathbb{R}^m, \qquad x \mapsto Ax,$$
where x ∈ R^n is written as an n × 1 column vector and
$$Ax \equiv \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \equiv \begin{pmatrix} \sum_{j=1}^n a_{1j}x_j \\ \vdots \\ \sum_{j=1}^n a_{mj}x_j \end{pmatrix} \in \mathbb{R}^m.$$
Then AB is the matrix of the composition $\mathbb{R}^p \xrightarrow{B} \mathbb{R}^n \xrightarrow{A} \mathbb{R}^m$ of the two linear mappings determined by A and B [verify]:
$$(AB)x = A(Bx) \quad \forall\, x \in \mathbb{R}^p.$$
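As a quick numerical illustration of this composition identity (added here as a sketch; NumPy is assumed, and the matrix sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # A : R^4 -> R^3
B = rng.standard_normal((4, 2))   # B : R^2 -> R^4
x = rng.standard_normal(2)

# (AB)x and A(Bx) agree, so the product AB represents the composition.
lhs = (A @ B) @ x
rhs = A @ (B @ x)
assert np.allclose(lhs, rhs)
```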
STAT 542 Notes, Winter 2007; MDP
(1.3) (A + B)' = A' + B';
(1.4) (AB)' = B'A' (A : m × n, B : n × p);
(1.5) (A⁻¹)' = (A')⁻¹ (A : n × n, nonsingular).
$$AA^{-1} = A^{-1}A = I,$$
(a) A is nonsingular.
(b) The n columns of A are linearly independent (i.e., column rank(A) = n). Equivalently, Ax ≠ 0 for every nonzero x ∈ R^n.
(c) The n rows of A are linearly independent (i.e., row rank(A) = n). Equivalently, x'A ≠ 0 for every nonzero x ∈ R^n.
(d) The determinant |A| ≠ 0 (i.e., rank(A) = n). [Define det geometrically.]
Proof of (1.9):
$$\operatorname{tr}(AB) = \sum_{i=1}^m (AB)_{ii} = \sum_{i=1}^m\sum_{k=1}^n a_{ik}b_{ki} = \sum_{k=1}^n\sum_{i=1}^m b_{ki}a_{ik} = \sum_{k=1}^n (BA)_{kk} = \operatorname{tr}(BA).$$
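A numerical check of this trace identity (an illustrative sketch; NumPy assumed, with non-square factors to emphasize that only the products AB and BA need be square):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))   # m x n
B = rng.standard_normal((5, 3))   # n x m

# tr(AB) = tr(BA) even though AB is 3x3 and BA is 5x5
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```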
(1.15) ΓΓ' = I.
This is equivalent to the fact that the n row vectors of Γ form an orthonormal basis for R^n. Note that (1.15) implies that Γ' = Γ⁻¹, hence also Γ'Γ = I, which is equivalent to the fact that the n column vectors of Γ also form an orthonormal basis for R^n.
Note that Γ preserves angles and lengths, i.e., preserves the usual inner
product and norm in Rn : for x, y ∈ Rn ,
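These facts can be illustrated numerically (a sketch, not part of the notes' derivation; NumPy assumed, with a random orthogonal Γ obtained from a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# QR of a random matrix yields a random orthogonal matrix Gamma
Gamma, _ = np.linalg.qr(rng.standard_normal((n, n)))

x = rng.standard_normal(n)
y = rng.standard_normal(n)

assert np.allclose(Gamma @ Gamma.T, np.eye(n))      # (1.15)
assert np.allclose(Gamma.T @ Gamma, np.eye(n))      # columns orthonormal too
assert np.isclose((Gamma @ x) @ (Gamma @ y), x @ y) # inner product preserved
assert np.isclose(np.linalg.norm(Gamma @ x), np.linalg.norm(x))  # norm preserved
```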
$$c\bar c = a^2 + b^2 \equiv |c|^2, \qquad \overline{cd} = \bar c\,\bar d.$$
For any complex matrix C ≡ {c_ij}, let C̄ = {c̄_ij} and define C* = C̄'. Note that
(1.16) (CD)* = D*C*.
(1.17) |A − l I| = 0.
These roots may be real or complex; the complex roots occur in conjugate
pairs. Note that the eigenvalues of a triangular or diagonal matrix are just
its diagonal elements.
By (b) (for matrices with possibly complex entries), for each eigenvalue
l there exists some nonzero (possibly complex) vector u ∈ Cn s.t.
(A − l I)u = 0,
equivalently,
(1.18) Au = lu.
$$(1.19)\quad u_i \equiv (0, \ldots, 0, \underset{i}{1}, 0, \ldots, 0)'$$
$$Su = lu \;\Rightarrow\; u^* S u = l\, u^* u = l.$$
$$l\,\psi'\gamma = \psi' S\gamma = (\psi' S\gamma)' = \gamma' S\psi = m\,\gamma'\psi = m\,\psi'\gamma,$$
so γ'ψ = 0 since l ≠ m. □
(1.20) S = Γ D_l Γ',
(1.21) Γ ≡ (γ1, . . . , γn) : n × n
(1.24) x'Sx ≥ 0 ∀ x ∈ R^n;
(1.26) E ≡ {x ∈ R^n | x'S⁻¹x = 1}
is the ellipsoid with principal axes √l₁ γ₁, . . ., √lₙ γₙ.
Proof. (a) Apply the above results and the spectral decomposition (1.20).
(b) From (1.20), S = ΓD_l Γ' with Γ = (γ₁ · · · γₙ), so S⁻¹ = ΓD_l⁻¹Γ' and,
But E₀ is the ellipsoid with principal axes √l₁ u₁, . . ., √lₙ uₙ (recall (1.19)) and Γuᵢ = γᵢ, so E is the ellipsoid with principal axes √l₁ γ₁, . . ., √lₙ γₙ. □
$$(1.27)\quad S^{1/2} = \Gamma\,\operatorname{diag}\big(l_1^{1/2}, \ldots, l_n^{1/2}\big)\,\Gamma' \equiv \Gamma D_l^{1/2}\Gamma';$$
this is a symmetric square root of S. Any square root S^{1/2} is nonsingular, for
$$(1.28)\quad |S^{1/2}| = |S|^{1/2} > 0.$$
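The spectral construction of the symmetric square root can be verified numerically (an illustrative sketch; NumPy assumed, with an arbitrary positive definite S built from a random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
S = M @ M.T + 4 * np.eye(4)          # a positive definite matrix

l, Gamma = np.linalg.eigh(S)         # spectral decomposition S = Gamma D_l Gamma'
S_half = Gamma @ np.diag(np.sqrt(l)) @ Gamma.T   # (1.27)

assert np.allclose(S_half, S_half.T)       # symmetric
assert np.allclose(S_half @ S_half, S)     # a genuine square root
# (1.28): |S^(1/2)| = |S|^(1/2) > 0
assert np.isclose(np.linalg.det(S_half), np.sqrt(np.linalg.det(S)))
```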
$$(1.29)\quad S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}\ \begin{matrix} n_1 \\ n_2 \end{matrix},$$
where
$$(1.31)\quad S_{11\cdot2} \equiv S_{11} - S_{12}S_{22}^{-1}S_{21}$$
(1.34) S is pd ⇐⇒ S11·2 and S22 are pd ⇐⇒ S22·1 and S11 are pd.
$$(1.38)\quad \Gamma = (\,\underset{m}{\Gamma_1}\ \ \underset{n-m}{\Gamma_2}\,) : n \times n,$$
so (1.37) becomes
(1.39) P = Γ₁Γ₁'.
$$P\Gamma_1 = (\Gamma_1\Gamma_1')\Gamma_1 = \Gamma_1, \qquad P\Gamma_2 = (\Gamma_1\Gamma_1')\Gamma_2 = 0.$$
$$|S + UU'| = |S|\cdot|I_q + U'S^{-1}U|,$$
$$a'(S + aa')^{-1}a = \frac{a'S^{-1}a}{1 + a'S^{-1}a}.$$
3. For S : p × p and T : p × p with S > 0 and T ≥ 0, show that
$$\lambda_i\big[T(S+T)^{-1}\big] = \frac{\lambda_i(TS^{-1})}{1 + \lambda_i(TS^{-1})}, \qquad i = 1, \ldots, p,$$
Then
$$S_{(12)\cdot3} \equiv S_{(12)} - S_{(12)3}S_{33}^{-1}S_{3(12)} = \begin{pmatrix} S_{11} - S_{13}S_{33}^{-1}S_{31} & S_{12} - S_{13}S_{33}^{-1}S_{32} \\ S_{21} - S_{23}S_{33}^{-1}S_{31} & S_{22} - S_{23}S_{33}^{-1}S_{32} \end{pmatrix} \equiv \begin{pmatrix} S_{11\cdot3} & S_{12\cdot3} \\ S_{21\cdot3} & S_{22\cdot3} \end{pmatrix},$$
but in general
$$S^{11} \neq (S_{11\cdot2})^{-1}, \qquad S_{11} \neq (S^{11\cdot2})^{-1};$$
instead,
$$S^{11} = (S_{11\cdot(23)})^{-1}, \qquad S_{11} = (S^{11\cdot(23)})^{-1}.$$
Show:
(i) (S_(12)·3)_11·2 = S_11·(23).
(ii) S_11·2 = (S^11·3)⁻¹.
(iii) S_12·3 (S_22·3)⁻¹ = −(S^11)⁻¹ S^12.
(iv) S_11 ≥ S_11·2 ≥ S_11·(23). When do the inequalities become equalities?
(v) S_12·3 (S_22·3)⁻¹ = −(S^11·4)⁻¹ S^12·4 (for a 4 × 4 partitioning).
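Identity (i) can be checked numerically for a small positive definite matrix (an illustrative sketch, not a proof; NumPy assumed, with blocks of size 1 each so the three-block partition is {0}, {1}, {2}):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3))
S = M @ M.T + 3 * np.eye(3)        # pd, partitioned with blocks {0}, {1}, {2}

def schur(S, a, b):
    """S_aa - S_ab S_bb^{-1} S_ba for index lists a, b (S symmetric)."""
    Saa = S[np.ix_(a, a)]
    Sab = S[np.ix_(a, b)]
    Sbb = S[np.ix_(b, b)]
    return Saa - Sab @ np.linalg.inv(Sbb) @ Sab.T

S12_dot3 = schur(S, [0, 1], [2])   # S_(12)·3 : 2 x 2
# now take the 11·2 Schur complement inside S_(12)·3
lhs = (S12_dot3[0:1, 0:1]
       - S12_dot3[0:1, 1:2] @ np.linalg.inv(S12_dot3[1:2, 1:2]) @ S12_dot3[1:2, 0:1])
rhs = schur(S, [0], [1, 2])        # S_11·(23)
assert np.allclose(lhs, rhs)       # (i): (S_(12)·3)_11·2 = S_11·(23)
```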
1.3. Random vectors and covariance matrices. Let X ≡ (X₁, . . . , Xₙ)' be a rvtr in R^n. The expected value of X is the vector
$$E(X) \equiv \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{pmatrix},$$
and [verify]
$$(1.47)\quad X \perp\!\!\!\perp Y \;\Rightarrow\; \operatorname{Cov}(X, Y) = 0 \;\Rightarrow\; \operatorname{Cov}(X \pm Y) = \operatorname{Cov}(X) + \operatorname{Cov}(Y).$$
Exercise 1.8. Verify the Weak Law of Large Numbers (WLLN) for rvtrs: X̄ₙ converges to µ in probability ($\bar X_n \xrightarrow{p} \mu$), that is, for each ε > 0,
$$P[\,\|\bar X_n - \mu\| \le \varepsilon\,] \to 1 \quad \text{as } n \to \infty.$$
$$E(s_n^2) = \tfrac{1}{n-1}\,E\Big[\sum_{i=1}^n X_i^2 - n\bar X_n^2\Big] = \tfrac{1}{n-1}\Big[n(\sigma^2 + \mu^2) - n\big(\tfrac{\sigma^2}{n}[1 + (n-1)\rho] + \mu^2\big)\Big]$$
$$(1.53)\quad = (1 - \rho)\sigma^2.$$
Example 1.9b. We now re-derive (1.51) and (1.53) via covariance matrices, using properties (1.44) and (1.45). Set X = (X₁, . . . , Xₙ)', so
$$(1.54)\quad E(X) = \begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix} \equiv \mu\,e_n, \quad \text{where } e_n = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} : n\times1,$$
$$(1.55)\quad \operatorname{Cov}(X) = \sigma^2\begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \\ \rho & \cdots & \rho & 1 \end{pmatrix} = \sigma^2\big[(1-\rho)I_n + \rho\,e_n e_n'\big].$$
Then
$$\operatorname{Var}(\bar X_n) = \tfrac{\sigma^2}{n^2}\,e_n'\big[(1-\rho)I_n + \rho\,e_n e_n'\big]e_n = \tfrac{\sigma^2}{n^2}\big[(1-\rho)n + \rho n^2\big] \quad [\text{since } e_n'e_n = n]$$
$$= \tfrac{\sigma^2}{n}\big[1 + (n-1)\rho\big],$$
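The quadratic-form computation for Var(X̄ₙ) can be checked directly (an illustrative sketch; NumPy assumed, with arbitrary values of n, σ², ρ):

```python
import numpy as np

n, sigma2, rho = 6, 2.0, 0.3
e = np.ones(n)
# equicorrelated covariance (1.55)
Cov = sigma2 * ((1 - rho) * np.eye(n) + rho * np.outer(e, e))

var_xbar = e @ Cov @ e / n**2          # Var(Xbar_n) = e' Cov(X) e / n^2
assert np.isclose(var_xbar, sigma2 / n * (1 + (n - 1) * rho))
```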
$$\sum_{i=1}^n (X_i - \bar X_n)^2 = \sum_{i=1}^n X_i^2 - n\bar X_n^2 = X'X - \tfrac1n(e_n'X)^2 = X'X - \tfrac1n(X'e_n)(e_n'X)$$
$$(1.56)\quad \equiv X'\big(I_n - \tilde e_n\tilde e_n'\big)X \equiv X'QX,$$
where $\tilde e_n \equiv \frac{e_n}{\sqrt n}$ is a unit vector, $P \equiv \tilde e_n\tilde e_n'$ is the projection matrix of rank 1 ≡ tr(ẽₙẽₙ') that projects R^n orthogonally onto the 1-dimensional
Exercise 1.10. Prove Lemma 1.11 below, and use it to show that
(1.57) E(X QX) = (n − 1)(1 − ρ)σ 2 ,
which is equivalent to (1.53).
¯
Example 1.9c. Eqn. (1.53) also can be obtained from the properties of the projection matrix Q. First note that [verify]
$$(1.59)\quad Qe_n = \sqrt n\,Q\tilde e_n = 0.$$
Define
$$(1.60)\quad Y \equiv (Y_1, \ldots, Y_n)' = QX : n\times1,$$
so
$$(1.61)\quad E(Y) = Q\,E(X) = \mu\,Qe_n = 0,$$
Exercise 1.12. Show that Cov(X) ≡ σ²[(1 − ρ)Iₙ + ρ eₙeₙ'] in (1.55) has one eigenvalue = σ²[1 + (n − 1)ρ] with eigenvector eₙ, and n − 1 eigenvalues = σ²(1 − ρ). □
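A numerical check of the eigenstructure asserted in Exercise 1.12 (an illustration, not a solution; NumPy assumed):

```python
import numpy as np

n, sigma2, rho = 5, 1.5, 0.4
e = np.ones(n)
Cov = sigma2 * ((1 - rho) * np.eye(n) + rho * np.outer(e, e))

eigvals = np.sort(np.linalg.eigvalsh(Cov))
assert np.isclose(eigvals[-1], sigma2 * (1 + (n - 1) * rho))  # largest eigenvalue
assert np.allclose(eigvals[:-1], sigma2 * (1 - rho))          # multiplicity n-1
# e_n is indeed an eigenvector for the largest eigenvalue:
assert np.allclose(Cov @ e, sigma2 * (1 + (n - 1) * rho) * e)
```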
$$(2.1)\quad = (2\pi)^{-\frac p2}\,e^{-\frac12 x'x}, \qquad x \in \mathbb{R}^p,$$
where
$$E(Y) = A\,E(X) + \mu = \mu, \qquad \operatorname{Cov}(Y) = A\operatorname{Cov}(X)A' = AA' \equiv \Sigma > 0.$$
Since the distribution of Y depends only on µ and Σ, we denote this distri-
bution by Np (µ, Σ), the multivariate normal distribution (MVND) on Rp
with mean vector µ and covariance matrix Σ.
$$(2.6)\quad Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad Y_i, \mu_i : p_i \times 1,$$
where p₁ + p₂ = p. Then Y₁ ⊥⊥ Y₂ ⇐⇒ Σ₁₂ = 0.
Proof. This follows from the pdf (2.2) or the mgf (2.4).
¯
Since also |Σ| = |Σ11·2 | |Σ22 |, the result follows from the pdf (2.2).
Thus by Lemma 2.1 for C = (I_{p₁} 0_{p₁×p₂}) and (0_{p₂×p₁} I_{p₂}), respectively,
$$Y_1 - \Sigma_{12}\Sigma_{22}^{-1}Y_2 \sim N_{p_1}\big(\mu_1 - \Sigma_{12}\Sigma_{22}^{-1}\mu_2,\ \Sigma_{11\cdot2}\big),$$
$$Y_2 \sim N_{p_2}(\mu_2, \Sigma_{22}),$$
Suppose, however, that we omit Σ−1 in (2.17) and seek the distribution
of (X − µ) (X − µ). Then this will not have a chi-square distribution in
general. Instead, by the spectral decomposition Σ = ΓD_λΓ', (2.16) yields
Let the first row of Γ be ξ'/‖ξ‖ and let the remaining n − 1 rows be any orthonormal basis for L⊥.
The pdf of χ²₁(δ), the distribution of (Z + √δ)² with Z ∼ N(0, 1), expands as follows:
$$f(v) = \frac{1}{\sqrt{2\pi}}\,\frac{d}{dv}\int_{-\sqrt v}^{\sqrt v} e^{-\frac{(t-\sqrt\delta)^2}{2}}\,dt = \frac{1}{\sqrt{2\pi}}\,e^{-\frac\delta2}\,\frac{d}{dv}\int_{-\sqrt v}^{\sqrt v}\sum_{k=0}^\infty\frac{(\sqrt\delta\,t)^k}{k!}\,e^{-\frac{t^2}{2}}\,dt$$
$$= \frac{1}{\sqrt{2\pi}}\,e^{-\frac\delta2}\sum_{k=0}^\infty\frac{\delta^{k/2}}{k!}\,\frac{d}{dv}\int_{-\sqrt v}^{\sqrt v} t^k e^{-\frac{t^2}{2}}\,dt = \frac{1}{\sqrt{2\pi}}\,e^{-\frac\delta2}\sum_{k=0}^\infty\frac{\delta^{k}}{(2k)!}\,\frac{d}{dv}\int_{-\sqrt v}^{\sqrt v} t^{2k} e^{-\frac{t^2}{2}}\,dt \quad [\text{why?}]$$
$$= \frac{1}{\sqrt{2\pi}}\,e^{-\frac\delta2}\sum_{k=0}^\infty\frac{\delta^{k}}{(2k)!}\,v^{k-\frac12}\,e^{-\frac v2} \quad [\text{verify}]$$
$$(2.25)\quad = \sum_{k=0}^\infty \underbrace{e^{-\frac\delta2}\,\frac{(\delta/2)^k}{k!}}_{\text{Poisson}(\frac\delta2)\ \text{weights}}\;\cdot\;\underbrace{\frac{v^{\frac{1+2k}{2}-1}\,e^{-\frac v2}}{2^{\frac{1+2k}{2}}\,\Gamma\big(\frac{1+2k}{2}\big)}}_{\text{pdf of }\chi^2_{1+2k}}\;\cdot\; c_k,$$
where
$$c_k = \frac{2^k\,k!\;2^{\frac{1+2k}{2}}\,\Gamma\big(\frac{1+2k}{2}\big)}{(2k)!\,\sqrt{2\pi}} = 1.$$
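The identity c_k = 1 can be verified directly from the duplication-type cancellation (an illustrative check, pure standard library):

```python
import math

# c_k = 2^k k! 2^{(1+2k)/2} Gamma((1+2k)/2) / ((2k)! sqrt(2*pi)) should equal 1
for k in range(8):
    ck = (2**k * math.factorial(k) * 2**((1 + 2 * k) / 2)
          * math.gamma((1 + 2 * k) / 2)) / (math.factorial(2 * k) * math.sqrt(2 * math.pi))
    assert abs(ck - 1) < 1e-12
```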
The representation (2.27) can be used to obtain the mean and variance
of χ2n (δ):
Exercise 2.6. Show that the noncentral chi-square distribution χ2n (δ) is
stochastically increasing in both n and δ.
¯
since
$$Z \equiv \Sigma^{-\frac12}X \sim N_n\big(\Sigma^{-\frac12}\mu,\ I_n\big)$$
and
$$\big\|\Sigma^{-\frac12}\mu\big\|^2 = \mu'\Sigma^{-1}\mu.$$
But
$$\Gamma_1 Y \sim N_m\big(\Gamma_1\xi,\ \sigma^2\,\Gamma_1\Gamma_1'\big) = N_m\big(\Gamma_1\xi,\ \sigma^2 I_m\big),$$
2.4. Joint pdf of a random sample from the MVND Np (µ, Σ).
Let X1 , . . . , Xn be an i.i.d. random sample from Np (µ, Σ). Assume that Σ
is positive definite (pd) so that each Xi has pdf given by (2.2). Thus the
joint pdf of X1 , . . . , Xn is
$$f(x_1, \ldots, x_n) = \prod_{i=1}^n \frac{1}{(2\pi)^{\frac p2}|\Sigma|^{\frac12}}\,e^{-\frac12(x_i-\mu)'\Sigma^{-1}(x_i-\mu)}$$
$$= \frac{1}{(2\pi)^{\frac{np}2}|\Sigma|^{\frac n2}}\,e^{-\frac12\sum_{i=1}^n(x_i-\mu)'\Sigma^{-1}(x_i-\mu)} = \frac{1}{(2\pi)^{\frac{np}2}|\Sigma|^{\frac n2}}\,e^{-\frac12\operatorname{tr}[\Sigma^{-1}\sum_{i=1}^n(x_i-\mu)(x_i-\mu)']}$$
$$(2.32)\quad = \frac{1}{(2\pi)^{\frac{np}2}|\Sigma|^{\frac n2}}\,e^{-\frac n2(\bar x-\mu)'\Sigma^{-1}(\bar x-\mu)-\frac12\operatorname{tr}(\Sigma^{-1}s)},$$
or alternatively,
$$(2.33)\quad = \frac{e^{-\frac n2\mu'\Sigma^{-1}\mu}}{(2\pi)^{\frac{np}2}|\Sigma|^{\frac n2}}\,e^{\,n\bar x'\Sigma^{-1}\mu-\frac12\operatorname{tr}(\Sigma^{-1}t)},$$
where
$$\bar X = \frac1n\sum_{i=1}^n X_i, \qquad S = \sum_{i=1}^n(X_i-\bar X)(X_i-\bar X)', \qquad T = \sum_{i=1}^n X_iX_i'.$$
It follows from (2.32) and (2.33) that (X̄, S) and (X̄, T) are equivalent representations of the minimal sufficient statistic for (µ, Σ). Also from (2.33), with no further restrictions on (µ, Σ), this MVN statistical model constitutes a (p + p(p+1)/2)-dimensional full exponential family with natural parameter (Σ⁻¹µ, Σ⁻¹).
$$X = (X_1, \ldots, X_n) : p\times n, \qquad S = XX' = \sum_{i=1}^n X_iX_i' : p\times p.$$
In particular, for a : p × 1,
so S is singular w.pr.1.
(⇐) Method 1 (Stein; Eaton and Perlman (1973) Ann. Statist.) Assume that Σ is pd and n ≥ p. Since
$$S = XX' = \sum_{i=1}^p X_iX_i' + \sum_{i=p+1}^n X_iX_i',$$
it suffices to show that $\sum_{i=1}^p X_iX_i'$ is pd w. pr. 1. Thus we can take n = p, so X : p × p is a square matrix. Then |S| = |X|², so it suffices to show that X itself is nonsingular w.pr.1. But
$$\{X\ \text{singular}\} = \bigcup_{i=1}^p \big\{X_i \in S_i \equiv \operatorname{span}\{X_j \mid j \neq i\}\big\},$$
so
$$\Pr[X\ \text{singular}] \le \sum_{i=1}^p \Pr[X_i \in S_i] = \sum_{i=1}^p E\big\{\Pr[X_i \in S_i \mid X_j,\ j \neq i]\big\} = 0,$$
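Both directions can be illustrated numerically: with n < p the sample matrix S = XX' is singular, while with n ≥ p it is full rank with probability one (a sketch; NumPy assumed, sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 6, 4                  # fewer observations than dimensions
X = rng.standard_normal((p, n))
S = X @ X.T                  # p x p, but rank at most n < p

assert np.linalg.matrix_rank(S) == n
assert abs(np.linalg.det(S)) < 1e-6     # singular

# with n = p, X is square and nonsingular w.pr.1, so S is pd w.pr.1
X2 = rng.standard_normal((p, p))
assert np.linalg.matrix_rank(X2 @ X2.T) == p
```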
Pr[ rank(X1 ) = p ] = 1
(i) A ⊗ B is bilinear:
(ii) $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, where A : p × q, B : m × n, C : q × r, D : n × s, so AC : p × r and BD : m × s.
(iii) $(A \otimes B)' = A' \otimes B'$, so $A = A',\ B = B' \Longrightarrow A \otimes B = (A \otimes B)'$.
$$A = \Gamma D_\alpha \Gamma', \qquad B = \Psi D_\beta \Psi',$$
$$(3.8)\quad A \otimes B = (\Gamma D_\alpha\Gamma') \otimes (\Psi D_\beta\Psi') = (\Gamma\otimes\Psi)\,(D_\alpha\otimes D_\beta)\,(\Gamma\otimes\Psi)'$$
$$\operatorname{Cov}(X) \equiv \operatorname{Cov}\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix} \equiv \begin{pmatrix} \operatorname{Cov}(X_1) & \cdots & \operatorname{Cov}(X_1, X_n) \\ \vdots & & \vdots \\ \operatorname{Cov}(X_n, X_1) & \cdots & \operatorname{Cov}(X_n) \end{pmatrix}$$
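Properties (ii) and the spectral form (3.8) can be illustrated numerically; in particular, the eigenvalues of A ⊗ B for symmetric A, B are the products αᵢβⱼ (a sketch; NumPy assumed, matrix sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((2, 3)); C = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 2)); D = rng.standard_normal((2, 3))

# (ii): (A x B)(C x D) = (AC) x (BD)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# eigenvalues of A x B for symmetric psd A, B are all products of eigenvalues
A2 = A @ A.T; B2 = B @ B.T
ev = np.sort(np.linalg.eigvalsh(np.kron(A2, B2)))
prod = np.sort(np.outer(np.linalg.eigvalsh(A2), np.linalg.eigvalsh(B2)).ravel())
assert np.allclose(ev, prod)
```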
$$(3.9)\quad \operatorname{Cov}(AXB) = (A\Sigma A') \otimes (B'\Lambda B), \qquad A : q\times p,\ X : p\times n,\ B : n\times m.$$
(a) First, AX = (A ⊗ Iₙ)X, so
$$\operatorname{Cov}(AX) = (A\otimes I_n)\operatorname{Cov}(X)(A\otimes I_n)' = (A\otimes I_n)(\Sigma\otimes\Lambda)(A'\otimes I_n) = (A\Sigma A')\otimes\Lambda \quad [\text{by (ii)}].$$
(b) Next,
$$\operatorname{Cov}(X') = \Lambda\otimes\Sigma \quad [\text{Lemma 3.8}],$$
so
$$\operatorname{Cov}(B'X') = (B'\Lambda B)\otimes\Sigma \quad [\text{(a)}],$$
hence
$$\operatorname{Cov}(XB) \equiv \operatorname{Cov}\big((B'X')'\big) = \Sigma\otimes(B'\Lambda B) \quad [\text{Lemma 3.8}].$$
—————————————
Looking ahead: Our goal will be to determine the joint distribution of the
matrices (S11·2 , S12 , S22 ) that arise from a partitioned Wishart matrix S.
In §3.4 we will see that the conditional distribution of S12 | S22 follows a
multivariate normal linear model (MNLM) of the form (3.14) in §3.3, whose
covariance structure has Kronecker product form. Therefore we will first
study this MNLM and determine the joint distribution of its MLEs (β̂, Σ̂)
given by (3.15) and (3.16). This will readily yield the joint distribution of
(S11·2 , S12 , S22 ), which in turn will have several interesting consequences,
including the evaluation of E(S⁻¹) and the distribution of Hotelling's T²
statistic X̄ₙ' S⁻¹ X̄ₙ.
(3.11) L = {βZ | β : 1 × q ∈ R^q},
$$(3.12)\quad E(X) = \beta Z,\ \ \beta : 1\times q; \qquad \operatorname{Cov}(X) = \sigma^2 I_n,\ \ \sigma^2 > 0.$$
$$(3.13)\quad E(X) = \beta Z,\ \ \beta : p\times q; \qquad \operatorname{Cov}(X) = \Sigma\otimes I_n,\ \ \Sigma > 0.$$
Often Z is called a design matrix for the linear model. We now assume that Z is of rank q ≤ n, so ZZ' is nonsingular and β is identifiable:
The maximum likelihood estimator (β̂, Σ̂). We now show that the
MLE (β̂, Σ̂) exists w. pr. 1 iff n − q ≥ p and is given by
$$f_{\beta,\Sigma}(x) = \frac{c_1}{|\Sigma|^{\frac n2}}\,e^{-\frac12\sum_{i=1}^n(x_i-\beta Z_i)'\Sigma^{-1}(x_i-\beta Z_i)} = \frac{c_1}{|\Sigma|^{\frac n2}}\,e^{-\frac12\operatorname{tr}[\Sigma^{-1}\sum_{i=1}^n(x_i-\beta Z_i)(x_i-\beta Z_i)']}$$
$$(3.17)\quad = \frac{c_1}{|\Sigma|^{\frac n2}}\,e^{-\frac12\operatorname{tr}[\Sigma^{-1}(x-\beta Z)(x-\beta Z)']},$$
where c₁ = (2π)^{−np/2} and Z₁, . . . , Zₙ are the columns of Z. To find the MLEs β̂, Σ̂, first fix Σ and maximize (3.17) w.r.to β. This can be accomplished by "minimizing" the matrix-valued quadratic form ∆(β) ≡ (x − βZ)(x − βZ)' w.r.to the Loewner ordering², which a fortiori minimizes tr[Σ⁻¹∆(β)] [verify]. Since each row of βZ lies in L ≡ row space(Z) ⊂ R^n, this suggests that the minimizing β̂ be chosen such that each row of β̂Z is the orthogonal projection of the corresponding row of X onto L. But the matrix of this orthogonal projection is
$$P \equiv Z'(ZZ')^{-1}Z : n\times n$$
² T ≥ S iff T − S is psd.
$$(3.22)\quad \max_{\Sigma>0}\ \frac{1}{|\Sigma|^{\frac n2}}\,e^{-\frac12\operatorname{tr}(\Sigma^{-1}W)} = \frac{1}{|\hat\Sigma|^{\frac n2}}\cdot e^{-\frac{np}2},$$
where Σ̂ ≡ (1/n)W is the unique maximizing value of Σ.
Proof. Since the mappings
$$\Sigma \mapsto \Sigma^{-1} =: \Lambda, \qquad \Lambda \mapsto (W^{\frac12})'\,\Lambda\,W^{\frac12} =: \Omega$$
are both bijections of $\mathcal S_p^+$ onto itself, the maximum in (3.22) is given by
$$(3.23)\quad \max_{\Lambda>0}\,|\Lambda|^{\frac n2}e^{-\frac12\operatorname{tr}(\Lambda W)} = \frac{1}{|W|^{\frac n2}}\max_{\Omega>0}\,|\Omega|^{\frac n2}e^{-\frac12\operatorname{tr}\Omega} = \frac{1}{|W|^{\frac n2}}\max_{\omega_1\ge\cdots\ge\omega_p>0}\,\prod_{i=1}^p\omega_i^{\frac n2}e^{-\frac12\omega_i},$$
(3.24) XQX' is pd w.pr.1 ⇐⇒ n − q ≥ p.
$$\operatorname{rank}(Q) = \operatorname{tr}(Q) = n - q,$$
hence (3.24) follows from Lemma 3.2. Lastly, by (3.25), (3.29), and (3.27),
$$(3.34)\quad \hat\mu = Xe_n(e_n'e_n)^{-1} = \frac1n\sum_{i=1}^n X_i = \bar X_n \sim N_p\big(\mu,\ \tfrac1n\Sigma\big),$$
$$(3.35)\quad n\hat\Sigma = XQX' = \sum_{i=1}^n\big(X_i-\bar X_n\big)\big(X_i-\bar X_n\big)' \sim W_p(n-1, \Sigma),$$
$$(3.36)\quad \bar X_n \perp\!\!\!\perp \hat\Sigma.$$
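The algebraic identity behind (3.35), namely XQX' = Σ(Xᵢ − X̄ₙ)(Xᵢ − X̄ₙ)' with Q the centering projection, holds for every data matrix, not just in expectation; a deterministic check (a sketch; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
p, n = 3, 10
X = rng.standard_normal((p, n))          # columns X_1, ..., X_n

e = np.ones(n)
Q = np.eye(n) - np.outer(e, e) / n       # centering projection, rank n - 1

xbar = X.mean(axis=1)
centered_sum = sum(np.outer(X[:, i] - xbar, X[:, i] - xbar) for i in range(n))

assert np.allclose(X @ Q @ X.T, centered_sum)        # X Q X' = sum of centered outer products
assert np.allclose(Q @ Q, Q)                         # Q is a projection
assert np.isclose(np.trace(Q), n - 1)                # rank(Q) = tr(Q) = n - 1
assert np.allclose(Q @ e, 0)                         # Q e_n = 0, cf. (1.59)
```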
X ↔ Y₁, β ↔ Σ₁₂Σ₂₂⁻¹, p ↔ p₁; Z ↔ Y₂, Σ ↔ Σ₁₁·₂, q ↔ p₂.
Clearly (3.46) ⇒ (3.39), while (3.40) follows from Lemma 3.1 with A = (0_{p₂×p₁} I_{p₂}). Also, (3.47) ⇒ (3.41) and (3.47) ⇒ S₁₁·₂ ⊥⊥ Y₂, which combines with (3.48) to yield S₁₁·₂ ⊥⊥ (S₁₂, Y₂), which implies (3.42).³ □
Note that (3.39) can be restated in two equivalent forms:
where S₂₂^{1/2} can be any (Borel-measurable) square root of S₂₂. It follows from (3.50) and (3.42) that
$$(3.51)\quad \Sigma_{12} = 0 \implies S_{12}S_{22}^{-\frac12} \perp\!\!\!\perp S_{22} \perp\!\!\!\perp S_{11\cdot2}.$$
We remark that Proposition 3.13 can also be derived directly from the
pdf of the Wishart distribution, the existence of which requires the stronger
conditions n ≥ p and Σ > 0. We shall derive the Wishart pdf in §8.4.
$$(3.52)\quad |S| \sim |\Sigma|\cdot\prod_{i=1}^p \chi^2_{n-p+i},$$
¹ Note that (3.52) implies that although (1/n)S is an unbiased estimator of Σ, |(1/n)S| is a biased estimator of |Σ|:
$$(3.55)\quad \frac{1}{a'S^{-1}a} \sim \frac{1}{a'\Sigma^{-1}a}\cdot\chi^2_{n-p+1}.$$
Note: Compare (3.54) to (3.1): ASA' ∼ W_q(n, AΣA'), which holds with no restrictions on n, p, Σ, A, or q.
$$(3.56)\quad A = \Gamma D_a\Psi_1',$$
where D_a = diag(a₁, . . . , a_q) and a₁² ≥ · · · ≥ a_q² > 0 are the ordered eigenvalues of AA'.⁴ By extending Ψ₁ to a p × p orthogonal matrix $\Psi \equiv \binom{\Psi_1}{\Psi_2}$, we have the alternative representations
$$(3.57)\quad A = \Gamma\,(\,D_a\ \ 0_{q\times(p-q)}\,)\,\Psi,$$
$$(3.58)\quad\ \ = C\,(\,I_q\ \ 0_{q\times(p-q)}\,)\,\Psi,$$
⁴ a₁ ≥ · · · ≥ a_q > 0 are called the singular values of A.
$$T^2 = X'S^{-1}X.$$
Then
$$(3.59)\quad T^2 \sim \frac{\chi^2_p(\mu'\Sigma^{-1}\mu)}{\chi^2_{n-p+1}} \equiv F_{p,\,n-p+1}(\mu'\Sigma^{-1}\mu),$$
$$T^2 \equiv (X-\mu_0)'S^{-1}(X-\mu_0)$$
$$(3.60)\quad \sim \frac{\chi^2_p\big((\mu-\mu_0)'\Sigma^{-1}(\mu-\mu_0)\big)}{\chi^2_{n-p+1}} \equiv F_{p,\,n-p+1}\big((\mu-\mu_0)'\Sigma^{-1}(\mu-\mu_0)\big).$$
Note: In Example 6.11 and Exercise 6.12 it will be shown that T² is the UMP invariant test statistic and the LRT statistic for testing µ = µ₀ vs. µ ≠ µ₀ with Σ unknown. When µ = µ₀,
$$s^{11} = \frac{1}{s_{11\cdot2}} \sim \frac{1}{\chi^2_{n-p+1}},$$
so
$$(3.62)\quad E(s^{11}) = \frac{1}{n-p-1} < \infty \quad \text{iff } n \ge p+2.$$
$$E(S^{-1}) = \frac{1}{n-p-1}\,I \qquad (n \ge p+2).$$
we conclude that
$$E(S^{-1}) = E\big[\big(\Sigma^{\frac12}\check S\,\Sigma^{\frac12}\big)^{-1}\big] = \Sigma^{-\frac12}E(\check S^{-1})\Sigma^{-\frac12} = \tfrac{1}{n-p-1}\,\Sigma^{-\frac12}\Sigma^{-\frac12}$$
$$(3.63)\quad = \tfrac{1}{n-p-1}\,\Sigma^{-1} \qquad (n \ge p+2).$$
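Result (3.63) can be checked by Monte Carlo (an illustrative sketch, not part of the notes; NumPy assumed, and the values of p, n, Σ, and the replication count are arbitrary choices with a loose 5% tolerance):

```python
import numpy as np

rng = np.random.default_rng(8)
p, n, reps = 2, 10, 20000
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)

acc = np.zeros((p, p))
for _ in range(reps):
    X = L @ rng.standard_normal((p, n))   # columns iid N_p(0, Sigma), so S ~ W_p(n, Sigma)
    acc += np.linalg.inv(X @ X.T)
est = acc / reps                          # Monte Carlo estimate of E(S^{-1})

expected = np.linalg.inv(Sigma) / (n - p - 1)   # (3.63)
assert np.allclose(est, expected, rtol=0.05)
```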
where T : p × p is lower triangular with t_ii > 0, i = 1, . . . , p. Since T₁ = S₁₁^{1/2} and Σ = I, it follows from (3.51), (3.50), and (3.41) (with the indices "1" and "2" interchanged) that
$$S_{21}T_1^{-1} \;\perp\!\!\!\perp\; T_1 \;\perp\!\!\!\perp\; s_{22\cdot1},$$
$$S_{21}T_1^{-1} \sim N_{1\times(p-1)}\big(0,\ 1\otimes I_{p-1}\big), \qquad s_{22\cdot1} \sim \chi^2_{n-p+1},$$
$$(3.65)\quad S = \begin{pmatrix} s_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}\ \begin{matrix} 1 \\ p-1 \end{matrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
and define
$$(3.66)\quad R^2 = \frac{S_{12}S_{22}^{-1}S_{21}}{s_{11}}, \qquad \rho^2 = \frac{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11}},$$
$$U = \frac{R^2}{1-R^2} = \frac{S_{12}S_{22}^{-1}S_{21}}{s_{11\cdot2}}, \qquad \zeta = \frac{\rho^2}{1-\rho^2} = \frac{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11\cdot2}},$$
$$V \equiv V(S_{22}, \Sigma) = \frac{\Sigma_{12}\Sigma_{22}^{-1}S_{22}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11\cdot2}}.$$
From Proposition 3.13 and (3.50) we have
$$S_{12}S_{22}^{-\frac12} \mid S_{22} \sim N_{1\times(p-1)}\big(\Sigma_{12}\Sigma_{22}^{-1}S_{22}^{\frac12},\ \sigma_{11\cdot2}\otimes I_{p-1}\big),$$
$$s_{11\cdot2} \sim \sigma_{11\cdot2}\cdot\chi^2_{n-p+1}, \qquad S_{22} \sim W_{p-1}(n, \Sigma_{22}), \qquad s_{11\cdot2} \perp\!\!\!\perp (S_{12}, S_{22}),$$
so [verify]
$$U \mid S_{22} \sim \frac{\chi^2_{p-1}(V)}{\chi^2_{n-p+1}} \overset{\text{distn}}{=} F_{p-1,\,n-p+1}(V), \qquad V \sim \zeta\cdot\chi^2_n.$$
Use (3.68), (3.69), and (A.8) to show that the unconditional distribution of U (resp., R²) can be represented as a negative binomial mixture of central F (resp., Beta) rvs:
$$(3.70)\quad U \mid K \sim F_{p-1+2K,\,n-p+1},$$
$$(3.71)\quad R^2 \equiv \frac{U}{U+1}\ \Big|\ K \sim B\big(\tfrac{p-1}2 + K,\ \tfrac{n-p+1}2\big),$$
$$(3.72)\quad K \sim \text{Negative binomial}(\rho^2),$$
that is,
$$\Pr[K = k] = \frac{\Gamma\big(\tfrac n2 + k\big)}{\Gamma\big(\tfrac n2\big)\,k!}\,\rho^{2k}\big(1-\rho^2\big)^{\frac n2}, \qquad k = 0, 1, \ldots. \quad\Box$$
Note: In Example 6.26 and Exercise 6.27 it will be shown that R² is the LRT statistic and the UMP invariant test statistic for testing ρ² = 0 vs. ρ² > 0. When ρ² = 0 (⇐⇒ Σ₁₂ = 0 ⇐⇒ ζ = 0), U ⊥⊥ Z by (3.68) and
$$(3.74)\quad R^2 \sim B\big(\tfrac{p-1}2,\ \tfrac{n-p+1}2\big),$$
$$f(T) = \prod_{1\le j<i\le p}\frac{1}{\sqrt{2\pi}}\,e^{-\frac12 t_{ij}^2}\;\cdot\;\prod_{i=1}^p\frac{t_{ii}^{\,n-i}\,e^{-\frac12 t_{ii}^2}}{2^{\frac{n-i-1}2}\,\Gamma\big(\frac{n-i+1}2\big)}$$
$$(4.1)\quad = \frac{1}{2^{\frac{pn}2-p}\,\pi^{\frac{p(p-1)}4}\,\prod_{i=1}^p\Gamma\big(\frac{n-i+1}2\big)}\,\prod_{i=1}^p t_{ii}^{\,n-i}\cdot\exp\Big(-\tfrac12\sum_{1\le j\le i\le p}t_{ij}^2\Big) =: 2^p\,c_{p,n}\cdot\prod_{i=1}^p t_{ii}^{\,n-i}\cdot\exp\big(-\tfrac12\operatorname{tr}T'T\big).$$
Since the pdf of S is given by $f(S) = f(T)\,\big|\frac{\partial T}{\partial S}\big|$, we first must find the Jacobian $\big|\frac{\partial S}{\partial T}\big| \equiv \big|\frac{\partial T}{\partial S}\big|^{-1}$ of the mapping S = TT'. [This derivation of the Wishart pdf will resume in §4.4.]
$$(4.2)\quad A \to B, \qquad x \equiv (x_1, \ldots, x_n) \mapsto y \equiv (y_1, \ldots, y_n),$$
where A and B are open subsets of R^n. The Jacobian matrix of this mapping is given by
$$(4.3)\quad \frac{\partial y}{\partial x} := \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_1} \\ \vdots & & \vdots \\ \frac{\partial y_1}{\partial x_n} & \cdots & \frac{\partial y_n}{\partial x_n} \end{pmatrix},$$
Proof. This follows from the chain rule for partial derivatives:
$$\frac{\partial z_i\big(y_1(x_1,\ldots,x_n),\ldots,y_n(x_1,\ldots,x_n)\big)}{\partial x_j} = \sum_k\frac{\partial z_i}{\partial y_k}\,\frac{\partial y_k}{\partial x_j},$$
so that, with the convention (4.3), $\frac{\partial z}{\partial x} = \frac{\partial y}{\partial x}\cdot\frac{\partial z}{\partial y}$ and the Jacobian determinants multiply.
Proof. Use the fact that A can be written as the product of elementary
matrices of the forms
Verify the result when A = Mi (c) and A = Eij , then apply the chain rule.
¯
(f) triangular matrices:
• Y = LX, X, Y : p × p lower triangular:
$$\Big|\frac{\partial Y}{\partial X}\Big| = \prod_{i=1}^p |l_{ii}|^i.$$
Proof. Since $y_{ij} = \sum_{k=j}^i l_{ik}x_{kj}$ (i ≥ j), the Jacobian matrix, with the entries of X and Y ordered as (11, 21, 22, 31, . . . , pp), is a p(p+1)/2 × p(p+1)/2 upper triangular matrix with diagonal entries ∂y_ij/∂x_ij = l_ii. For each fixed i there are i such pairs (i, j), j ≤ i, so the determinant is $\prod_{i=1}^p l_{ii}^i$.⁵ □
¯
• Y = UX, X, Y : p × p upper triangular:
$$\Big|\frac{\partial Y}{\partial X}\Big| = \prod_{i=1}^p |u_{ii}|^{p-i+1}.$$
• Y = XL, X, Y : p × p lower triangular:
$$\Big|\frac{\partial Y}{\partial X}\Big| = \prod_{i=1}^p |l_{ii}|^{p-i+1}.$$
51
STAT 542 Notes, Winter 2007; MDP
• Y = XU, X, Y : p × p upper triangular:
$$\Big|\frac{\partial Y}{\partial X}\Big| = \prod_{i=1}^p |u_{ii}|^i.$$
• Y = UXV, X, Y : p × p upper triangular:
$$\Big|\frac{\partial Y}{\partial X}\Big| = \prod_{i=1}^p |u_{ii}|^{p-i+1}\cdot\prod_{i=1}^p |v_{ii}|^i.$$
$$\Big|\frac{\partial Y}{\partial X}\Big| = \prod_{i=1}^p |l_{ii}|^{-(p-i+1)}\cdot 2^p\cdot\prod_{i=1}^p |l_{ii}|^{p+1} = 2^p\prod_{i=1}^p |l_{ii}|^i.$$
$$\Big|\frac{\partial Y}{\partial X}\Big| = 2^p\prod_{i=1}^p |u_{ii}|^{p-i+1}.$$
$$\Big|\frac{\partial Y}{\partial X}\Big| = 2^p\prod_{i=1}^p |l_{ii}|^{p-i+1}.$$
$$\Big|\frac{\partial Y}{\partial X}\Big| = 2^p\prod_{i=1}^p |u_{ii}|^i.$$
$$(4.7)\quad dy_i = \frac{\partial y_i}{\partial x_1}\,dx_1 + \cdots + \frac{\partial y_i}{\partial x_n}\,dx_n, \qquad i = 1, \ldots, n,$$
$$(4.8)\quad dy = \Big(\frac{\partial y}{\partial x}\Big)'\,dx$$
$$\Big|\frac{\partial S}{\partial T}\Big| = 2^p\prod_{i=1}^p t_{ii}^{\,p-i+1}.$$
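This Jacobian of the Cholesky map S = TT' can be verified numerically by finite differences (an illustrative sketch, not part of the notes; NumPy assumed, p = 3 chosen arbitrarily, and the lower-triangular entries ordered row by row):

```python
import numpy as np

p = 3
idx = [(i, j) for i in range(p) for j in range(i + 1)]   # lower-triangular positions

def pack(M):
    return np.array([M[i, j] for (i, j) in idx])

def unpack(v):
    T = np.zeros((p, p))
    for k, (i, j) in enumerate(idx):
        T[i, j] = v[k]
    return T

def s_of_t(v):
    T = unpack(v)
    return pack(T @ T.T)          # lower-triangular entries of S = TT'

rng = np.random.default_rng(9)
t0 = rng.standard_normal(len(idx))
for k, (i, j) in enumerate(idx):
    if i == j:
        t0[k] = abs(t0[k]) + 1.0  # positive diagonal, as for a Cholesky factor

# central finite-difference Jacobian of the map t -> s
h = 1e-6
m = len(idx)
J = np.zeros((m, m))
for k in range(m):
    dv = np.zeros(m); dv[k] = h
    J[:, k] = (s_of_t(t0 + dv) - s_of_t(t0 - dv)) / (2 * h)

T0 = unpack(t0)
# |dS/dT| = 2^p * prod_i t_ii^{p-i+1}  (exponents p, p-1, ..., 1)
analytic = 2**p * np.prod([T0[i, i]**(p - i) for i in range(p)])
assert np.isclose(abs(np.linalg.det(J)), analytic, rtol=1e-4)
```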
$$f(S) = f(T(S))\cdot\Big|\frac{\partial S}{\partial T}\Big|^{-1}\bigg|_{T=T(S)} = 2^p\,c_{p,n}\prod_{i=1}^p t_{ii}^{\,n-i}\cdot\exp\big(-\tfrac12\operatorname{tr}T'T\big)\cdot 2^{-p}\prod_{i=1}^p t_{ii}^{\,-p+i-1}$$
$$(4.9)\quad = c_{p,n}\,|S|^{\frac{n-p-1}2}\,e^{-\frac12\operatorname{tr}S},$$
since $\prod_i t_{ii}^{\,n-i}\cdot\prod_i t_{ii}^{\,-p+i-1} = \prod_i t_{ii}^{\,n-p-1} = |S|^{\frac{n-p-1}2}$ and tr T'T = tr S, where
$$(4.10)\quad c_{p,n}^{-1} := 2^{\frac{pn}2}\,\pi^{\frac{p(p-1)}4}\cdot\prod_{i=1}^p\Gamma\big(\tfrac{n-i+1}2\big) =: 2^{\frac{pn}2}\,\Gamma_p\big(\tfrac n2\big).$$
$$(4.12)\quad E(|S|^k) = |\Sigma|^k\cdot\frac{2^{pk}\,\Gamma_p\big(\frac n2+k\big)}{\Gamma_p\big(\frac n2\big)}, \qquad k = 1, 2, \ldots.$$
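For k = 1, (4.12) reduces (via the multivariate gamma ratio, in agreement with (3.52)) to E(|S|) = |Σ| · n(n−1)···(n−p+1); a Monte Carlo check (an illustrative sketch; NumPy assumed, with arbitrary p, n, Σ and a loose 5% tolerance):

```python
import numpy as np

rng = np.random.default_rng(10)
p, n, reps = 2, 8, 40000
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
L = np.linalg.cholesky(Sigma)

dets = np.empty(reps)
for r in range(reps):
    X = L @ rng.standard_normal((p, n))   # S = XX' ~ W_p(n, Sigma)
    dets[r] = np.linalg.det(X @ X.T)

# E(|S|) = |Sigma| * n (n-1) ... (n-p+1)
expected = np.linalg.det(Sigma) * np.prod([n - i for i in range(p)])
assert np.isclose(dets.mean(), expected, rtol=0.05)
```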
$$(4.13)\quad V = S + T.$$
Show that the range of (U, V) is given by {0 < U < I} × {V > 0} and verify that (4.13) is a bijection. Show that the joint pdf of (U, V) is given by
$$(4.14)\quad f(U, V) = \frac{c_{p,r}\,c_{p,n}}{c_{p,r+n}}\cdot |U|^{\frac{r-p-1}2}\,|I-U|^{\frac{n-p-1}2}\ \cdot\ \frac{c_{p,r+n}}{|\Sigma|^{\frac{r+n}2}}\cdot |V|^{\frac{r+n-p-1}2}\,e^{-\frac12\operatorname{tr}\Sigma^{-1}V},$$
$$(4.16)\quad E(|U|^k) = \frac{E(|S|^k)}{E(|V|^k)} = \frac{\Gamma_p\big(\frac r2+k\big)\,\Gamma_p\big(\frac{n+r}2\big)}{\Gamma_p\big(\frac r2\big)\,\Gamma_p\big(\frac{n+r}2+k\big)}.$$
Hint: To find the Jacobian of (4.13), apply the chain rule to the sequence of mappings
$$(S, T) \to (S, V) \to (U, V).$$
Use the extended combination rule to find the two intermediate Jacobians.
i=1 σi
2
i=1 i=1
where Ω = Σ−1 .
¯
L1 (Σ̂, I) → p as Σ̂ → 0, L1 (Σ̂, I) → ∞ as Σ̂ → ∞;
L2 (Σ̂, I) → ∞ as Σ̂ → 0 or ∞.
Sp+ → Sp+
(5.1)
Σ → AΣA .
6
If n < p it would seem impossible to estimate Σ. However, several proposals have recently been put forth to address this case, which occurs for example with microarray data where p ≈ 10^5 but n ≈ 10^3. [References?]
Note that both L1 and L2 are fully invariant, i.e., are GL-invariant. If L is G-invariant then the risk function of any estimator Σ̂ ≡ Σ̂(S) transforms as follows: for A ∈ G,
$$(5.3)\quad R\big(A^{-1}\hat\Sigma(ASA')(A')^{-1},\ \Sigma\big) = E_\Sigma\big[L\big(A^{-1}\hat\Sigma(ASA')(A')^{-1},\ \Sigma\big)\big] = E_\Sigma\big[L\big(\hat\Sigma(ASA'),\ A\Sigma A'\big)\big]$$
$$= E_{A\Sigma A'}\big[L\big(\hat\Sigma(S),\ A\Sigma A'\big)\big] = R\big(\hat\Sigma(S),\ A\Sigma A'\big).$$
Proof. Set G = GL and A = S_GL⁻¹ in (5.2) to obtain
$$\hat\Sigma(I) = S_{GL}^{-1}\,\hat\Sigma(S)\,(S_{GL}')^{-1},$$
so
$$(5.7)\quad \hat\Sigma(S) = S_{GL}\,\hat\Sigma(I)\,S_{GL}'.$$
Proposition 5.2. (a) The best fully equivariant estimator w. r. to the loss function L1 is the biased estimator $\frac{1}{n+p+1}S$.
(b) The best fully equivariant estimator w. r. to the loss function L2 is the unbiased estimator $\frac1n S$.
Proof. (a) Let S = {s_ij | i, j = 1, . . . , p}. Because GL acts transitively on $\mathcal S_p^+$ and L1 is GL-invariant, δS has constant risk given by
The quadratic function of δ in (5.10) is minimized by δ̂ = 1/(n+p+1).
61
STAT 542 Notes, Winter 2007; MDP
$$\Pi' S\,\Pi \sim W_p(n, \Pi'\Pi) = W_p(n, I) \sim S$$
for any permutation matrix Π. Also $s_{12}s_{22}^{-1/2} \perp\!\!\!\perp s_{22}$ and $s_{12}s_{22}^{-1/2} \sim N(0, 1)$ by (3.50) and (3.51), so
$$E_I\big(s_{12}^2\big) = E_I\Big(\frac{s_{12}^2}{s_{22}}\cdot s_{22}\Big) = E_I\Big(\frac{s_{12}^2}{s_{22}}\Big)\cdot E_I(s_{22}) = 1\cdot n = n.$$
(b) Because GL acts transitively on $\mathcal S_p^+$ and L2 is GL-invariant, δS has constant risk given by
This is minimized by δ̂ = 1/n. □
$$(5.12)\quad \hat\Sigma(S) = S_T\,\Delta\,S_T'$$
But (5.14) implies that Σ̂(I) = ∆ for some diagonal matrix ∆ ∈ $\mathcal S_p^+$ [verify], hence (5.12) follows from (5.13). □
$$(5.15)\quad \hat\Sigma_T(S) = S_T\,\hat\Delta_T\,S_T',$$
where
$$(5.16)\quad \hat\Delta_T = \operatorname{diag}\big(\hat\delta_{T,1}, \ldots, \hat\delta_{T,p}\big)$$
and
$$(5.17)\quad \hat\delta_{T,i} = \frac{1}{n+p+1-2i}.$$
8
James and Stein (1962), Proc. 4th Berkeley Symp. Math. Statist. Prob. V.1.
The ith term in the last sum is minimized by δ̂ᵢ = 1/(n+p+1−2i), as asserted.*
*This follows from Bartlett's decomposition (Proposition 3.20). □
$$(5.19)\quad \hat\Sigma_U(S) = S_U\,\hat\Delta_U\,S_U',$$
Because GU also acts transitively on $\mathcal S_p^+$, the risk function of Σ̂_U is also constant on $\mathcal S_p^+$ with the same constant value as the risk function of Σ̂_T [why?¹⁰]. Since L2(Σ̂, Σ) is strictly convex in Σ̂ [verify!], so is R₂(Σ̂, Σ)
⁹ S. Lin and M. Perlman (1985). A Monte Carlo comparison of four estimators of a covariance matrix. In Multivariate Analysis – VI, P. R. Krishnaiah, ed., pp. 411-429.
¹⁰ Use an invariance argument: Let Π denote the p × p permutation matrix corresponding to the permutation (1, . . . , p) → (p, . . . , 1). Now apply (5.3) with A = Π to obtain
$$(5.20)\quad R_2\big(\hat\Sigma_U(S),\ \Sigma\big) \equiv R_2\big(\Pi'\,\hat\Sigma_T(\Pi S\Pi')\,\Pi,\ \Sigma\big) = R_2\big(\hat\Sigma_T(S),\ \Pi\Sigma\Pi'\big),$$
[verify], hence
$$R_2\big(\tfrac12(\hat\Sigma_T + \hat\Sigma_U),\ \Sigma\big) < \tfrac12 R_2(\hat\Sigma_T, \Sigma) + \tfrac12 R_2(\hat\Sigma_U, \Sigma) = R_2(\hat\Sigma_T, \Sigma).$$
Therefore the estimator $\tfrac12(\hat\Sigma_T + \hat\Sigma_U)$ strictly dominates Σ̂_T (and Σ̂_U).
The preceding discussion suggests another estimator that strictly dominates $\tfrac12(\hat\Sigma_T + \hat\Sigma_U)$, namely
$$(5.21)\quad \hat\Sigma_P(S) := \frac{1}{p!}\sum_{\Pi\in\mathcal P(p)}\Pi'\,\hat\Sigma_T(\Pi S\Pi')\,\Pi,$$
where ν is the Haar probability measure on O, i.e. the unique (left ≡ right) orthogonally invariant probability measure on O. Since [verify!]
$$(5.24)\quad \hat\Sigma_O(S) = \int_O \Gamma'\,\hat\Sigma_P(\Gamma S\Gamma')\,\Gamma\,d\nu(\Gamma),$$
so Σ̂_U and Σ̂_T must have the same (constant) risk function, as asserted. □
the strict convexity of L2 implies that Σ̂_O in turn dominates Σ̂_P [verify!]:
Lemma 5.3. For any S ∈ $\mathcal S_p^+$ let S = Γ_S D_{l(S)} Γ_S' be its spectral decomposition. Here l(S) = (l₁(S), . . . , l_p(S))' where l₁ ≥ · · · ≥ l_p (> 0) are the ordered eigenvalues of S, the columns of Γ_S are the corresponding eigenvectors, and D_{l(S)} = diag(l₁(S), . . . , l_p(S)). An estimator Σ̂ ≡ Σ̂(S) is O-equivariant iff
hence Γ_{ΓSΓ'} = Γ Γ_S and l(ΓSΓ') = l(S). Thus if Σ̂(S) satisfies (5.26) then so Σ̂ is O-equivariant.
Conversely, if Σ̂ is O-equivariant then
$$\hat\Sigma\big(D_{l(S)}\big) = D_{\phi(l(S))}$$
where λ(Σ) ≡ (λ₁(Σ) ≥ · · · ≥ λ_p(Σ) (> 0)) is the vector of the ordered eigenvalues of Σ. Thus, by restricting consideration to orthogonally equivariant estimators, the problem of estimating Σ reduces to that of estimating the population eigenvalues λ(Σ) based on the sample eigenvalues l(S).
Exercise 5.5. (Takemura). When p = 2, show that Σ̂_O(S) has the form (5.26) with
$$(5.29)\quad \phi_1(l_1, l_2) = \Big(\frac{\sqrt{l_1}\,\hat\delta_{T,1}}{\sqrt{l_1}+\sqrt{l_2}} + \frac{\sqrt{l_2}\,\hat\delta_{T,2}}{\sqrt{l_1}+\sqrt{l_2}}\Big)\,l_1,$$
$$\phi_2(l_1, l_2) = \Big(\frac{\sqrt{l_2}\,\hat\delta_{T,2}}{\sqrt{l_1}+\sqrt{l_2}} + \frac{\sqrt{l_1}\,\hat\delta_{T,1}}{\sqrt{l_1}+\sqrt{l_2}}\Big)\,l_2.$$
where δ̂_{T,1} = 1/(n+1) and δ̂_{T,2} = 1/(n−1) (set p = 2 in (5.17)). □
Because 1/(n+1) < 1/n < 1/(n−1) and l₁ > l₂, Σ̂_O "shrinks" the largest eigenvalue of (1/n)S and "expands" its smallest eigenvalue when p = 2 [verify], and Takemura showed that this remains true of Σ̂_O for all p ≥ 2.
Stein has argued that the shrinkage/expansion should be stronger than
that given by Σ̂O . For example, he suggested that for any p ≥ 2, if consid-
eration is restricted to orthogonally invariant estimators having the simple
form φi (l1 , . . . , lp ) = ci li for constants ci > 0, then the best choice of ci is
given by (recall (5.17))
$$(5.30)\quad c_i = \hat\delta_{T,i} = \frac{1}{n+p+1-2i}, \qquad i = 1, \ldots, p.$$
show that l1 (S) and lp (S) are, respectively, convex and concave functions
of S [verify]. Thus by Jensen’s inequality,
(5.33) EΣ [l1 (S)] ≥ l1 [E(S)] = l1 (nΣ) ≡ n λ1 (Σ),
(5.34) EΣ [lp (S)] ≤ lp [E(S)] = lp (nΣ) ≡ n λp (Σ).
Thus (1/n)l₁ tends to overestimate λ₁ and should be shrunk, while (1/n)l_p tends to underestimate λ_p and should be expanded. This holds for the other eigenvalues also: (1/n)l₂, (1/n)l₃, . . . should be shrunk while (1/n)l_{p−1}, (1/n)l_{p−2}, . . . should be expanded.
Next from (3.53) and the concavity of log x,
$$E\Big[\prod_{i=1}^p \tfrac1n l_i(S)\Big] = \prod_{i=1}^p \lambda_i(\Sigma)\cdot\prod_{i=1}^p \frac{n-p+i}{n} \le \prod_{i=1}^p \lambda_i(\Sigma)\cdot\Big(1-\frac{p-1}{2n}\Big)^{p}$$
$$(5.35)\quad \le \prod_{i=1}^p \lambda_i(\Sigma)\cdot e^{-\frac{p(p-1)}{2n}}.$$
Thus $\prod_{i=1}^p \frac1n l_i(S)$ will tend to underestimate $\prod_{i=1}^p \lambda_i(\Sigma)$ unless n ≫ p², which does not usually hold in applications. This suggests that the shrinkage/expansion of the sample eigenvalues should not be done in a linear manner: the smaller (1/n)lᵢ(S)'s should be expanded proportionately more than the larger (1/n)lᵢ(S)'s should be shrunk.
A more precise justification is based on the celebrated "semi-circle" law [draw figure] of the mathematical physicist E. P. Wigner, since extended by many others. A strong consequence of these results is that when Σ = λI_p (equivalently, λ₁(Σ) = · · · = λ_p(Σ) = λ) and both n, p → ∞ while p/n → η for some fixed η ∈ (0, 1], then
$$(5.36)\quad \tfrac1n l_1(S) \xrightarrow{a.s.} \lambda\,(1+\sqrt\eta)^2,$$
$$(5.37)\quad \tfrac1n l_p(S) \xrightarrow{a.s.} \lambda\,(1-\sqrt\eta)^2.$$
Thus if it were known that Σ = λI_p then (1/n)l₁(S) should be shrunk by the factor 1/(1+√η)² while (1/n)l_p(S) should be expanded by the factor 1/(1−√η)². Furthermore, the expansion is proportionately greater than the shrinkage since
$$\frac{1}{(1+\sqrt\eta)^2}\cdot\frac{1}{(1-\sqrt\eta)^2} = \frac{1}{(1-\eta)^2} > 1.$$
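The edge limits (5.36)–(5.37) are easy to observe numerically even at moderate dimensions (an illustrative sketch, not part of the notes; NumPy assumed, with n = 2000, p = 500 so η = 0.25, and a loose 5% tolerance against the asymptotic edges):

```python
import numpy as np

rng = np.random.default_rng(11)
lam = 1.0
n, p = 2000, 500                 # eta = p/n = 0.25
X = np.sqrt(lam) * rng.standard_normal((p, n))   # columns iid N_p(0, lam*I_p)
ev = np.linalg.eigvalsh(X @ X.T) / n             # eigenvalues of (1/n) S, ascending

eta = p / n
assert np.isclose(ev[-1], lam * (1 + np.sqrt(eta))**2, rtol=0.05)  # (5.36)
assert np.isclose(ev[0], lam * (1 - np.sqrt(eta))**2, rtol=0.05)   # (5.37)
```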
Note that these two desired shrinkage factors for (1/n)l₁(S) and (1/n)l_p(S) are even more extreme than nc₁ ≡ nδ̂_{T,1} and nc_p ≡ nδ̂_{T,p} from (5.30):
The shrinkage and expansion factors in (5.36) and (5.37) are derived only for the case Σ = λI_p (the "worst case" in that the most shrinkage/expansion is required). In general the appropriate shrinkage/expansion factors (equivalently, the functions φ₁, . . . , φ_p in (5.26)) depend on the (unknown) empirical distribution of λ₁(Σ), . . . , λ_p(Σ) so must themselves be estimated adaptively. Stein¹² proposed the following adaptive eigenvalue
12
I first learned of this result at Stein’s 1975 IMS Rietz Lecture in Atlanta, which
remains unpublished in English - Stein published his results in a Russian journal in
1977. I have copies of his handwritten lecture notes from his courses at Stanford and U.
of Washington. Similar results were later obtained independently by Len Haff at UCSD.
estimators:
$$(5.40)\quad \phi_i^*(l_1, \ldots, l_p) = \Bigg(\frac{1}{\,n-p+1+2l_i\sum_{j\neq i}\frac{1}{l_i-l_j}\,}\Bigg)^{\!\!+}\, l_i.$$
The term inside the large parentheses can be negative hence its positive part
is taken. Also the required ordering φ∗1 > . . . > φ∗p need not hold, in which
case the ordering is achieved by an isotonization algorithm – see Lin and
Perlman (1985) for details. Despite these complications, Stein’s estimator
offers substantial improvement over the other estimators considered thus
far – the reduction in risk can be 70-90% when Σ ≈ λ Ip !
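A minimal implementation of the raw modifications (5.40) (a sketch, with the positive part applied but the isotonization step omitted; NumPy assumed, and the helper name `stein_phi` is hypothetical). For widely dispersed eigenvalues the output should be close to the linear rule (5.42):

```python
import numpy as np

def stein_phi(l, n):
    """Raw Stein eigenvalue modifications (5.40), before isotonization.
    l: sample eigenvalues l_1 > ... > l_p of S; n: degrees of freedom."""
    p = len(l)
    phi = np.empty(p)
    for i in range(p):
        denom = n - p + 1 + 2 * l[i] * sum(1.0 / (l[i] - l[j])
                                           for j in range(p) if j != i)
        # positive part of the parenthesized term: 0 when denom <= 0
        phi[i] = l[i] / denom if denom > 0 else 0.0
    return phi

# widely dispersed eigenvalues: (5.40) should approximate (5.42)
n = 20
l = np.array([1000.0, 100.0, 10.0, 1.0])
p = len(l)
phi = stein_phi(l, n)
approx = np.array([l[i] / (n + p - 1 - 2 * i) for i in range(p)])  # delta_hat_{T,i} * l_i
assert np.allclose(phi, approx, rtol=0.05)
assert np.all(np.diff(phi) < 0)   # ordering holds here, so no isotonization is needed
```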
If the population eigenvalues are widely dispersed, i.e.,
$$(5.41)\quad \lambda_1(\Sigma) \gg \cdots \gg \lambda_p(\Sigma),$$
then the sample eigenvalues {lᵢ} will also be widely dispersed, so
$$l_i\sum_{j\neq i}\frac{1}{l_i-l_j} = \sum_{j>i}\frac{l_i}{l_i-l_j} + \sum_{j<i}\frac{l_i}{l_i-l_j} \approx (p-i) + 0,$$
in which case (5.40) reduces to [verify]
$$(5.42)\quad \phi_i^*(l_1, \ldots, l_p) \approx \frac{1}{n+p+1-2i}\,l_i \equiv \hat\delta_{T,i}\,l_i$$
(recall (5.30)). On the other hand, if two or more λi (Σ)’s are nearly equal
then the same will be true for the corresponding li ’s, in which case the
shrinkage/expansion offered by the φ∗i ’s will be more pronounced than in
(5.42), a desirable feature as indicated by (5.38) and (5.39).
$$\mathcal X/G := \{Gx \mid x \in \mathcal X\}$$
$$\pi : \mathcal X \to \mathcal X/G, \qquad x \mapsto Gx.$$
Equivalently, if X ∼ P then gX ∼ gP .
P = g(g −1 P) ⊆ gP ∀g ∈ G,
Pgθ := gPθ ≡ Pθ ◦ g −1 .
(6.1) H0 : θ ∈ Θ0 vs. H : θ ∈ Θ \ Θ0
t ≡ t(x) rather than on the value of x itself. In general this may entail
a loss of information, but optimal invariant tests often (but not always)
remain admissible among all possible tests.
Because P0 and P are G-invariant, the invariance-reduced testing prob-
lem can be restated equivalently as that of testing
(6.2) H0 : τ ∈ Ξ0 vs. H : τ ∈ Ξ \ Ξ0
based on a MIS t, for appropriate sets Ξ0 and Ξ in the range of the MIP τ .
Our goal will be to determine the distribution of the MIS t and apply the
principles of hypothesis testing to (6.2). In particular, if a UMP test exists
for (6.2), it is called UMP invariant (UMPI) with respect to G for (6.1).
In cases where the class of invariant tests is still so large that no UMPI test exists, the likelihood ratio test (LRT) for (6.1), which rejects H₀ for large values of the LRT statistic
$$\Lambda(x) := \frac{\max_\Theta f(x, \theta)}{\max_{\Theta_0} f(x, \theta)},$$
Λ(gx) = Λ(x) ∀ g ∈ G.
X → gX and µ → gµ,
respectively. Because
$$gX \sim N_p\big(g\mu,\ gg' \equiv I_p\big),$$
respectively, so represent the MIS and MIP, resp. The distribution of t is χ²_p(τ), the noncentral chi-square distribution with noncentrality parameter τ. Any G-invariant statistic depends on X only through ‖X‖², and its distribution depends on µ only through ‖µ‖². The invariance-reduced problem (6.2) becomes that of testing
the upper α quantile of the χ2p distribution, and is unbiased. Thus this test
is UMPI level α for (6.3) and is unbiased for (6.3).
¯
Exercise 6.9. (a) In Example 6.8 show that the UMP invariant level α
test is the level α LRT based on X for (6.3).
(b) The power function of this LRT is given by
It follows from the MLR property (or the log concavity of the normal pdf)
that βp (τ ) is increasing in τ , hence this test is unbiased. Show that for fixed
τ , βp (τ ) is decreasing in p. Hint: apply the NP Lemma.
(c) (Kiefer and Schwartz (1965) Ann. Math. Statist.) Show that the LRT
is a proper Bayes test for (6.3), and therefore is admissible among all tests
for (6.3).
Hint: consider the following prior distribution:
Pr[ µ = 0] = γ,
Pr[ µ = 0] = 1 − γ,
µ µ = 0 ∼ Np (0, λIp ), (0 < γ < 1, λ > 0).
respectively. Again Θ and Θ0 are G-invariant. Now there are only two G-
orbits in X: {0} and R^p \ {0} [why?], so any G-invariant statistic is constant
on R^p \ {0}, hence its distribution does not depend on µ. Thus there is
no G-invariant test that can distinguish between the hypotheses µ = 0 and
µ ≠ 0 on the basis of a single observation X when Σ is unknown. ¯
respectively. It follows that
t := Y′W⁻¹Y and τ := µ′Σ⁻¹µ
represent the MIS and MIP, respectively [verify!]. We have seen that Hotelling's
T² ≡ Y′W⁻¹Y ∼ χ²_p(τ) / χ²_{n−p+1},
the ratio of two independent chi-square variates, the first noncentral. (This is
the (nonnormalized) noncentral F distribution Fp, n−p+1 (τ ).) The invariance-
reduced problem (6.2) becomes that of testing
(6.6) τ = 0 vs. τ > 0 based on T².
Because Fp, n−p+1 (τ ) has MLR in τ (see Example A.15), the UMP level α
test for (6.6) rejects τ = 0 if T 2 > Fp, n−p+1; α and is unbiased. Thus this
test is UMPI level α for (6.5), and is unbiased for (6.5).
¯
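A minimal sketch of the T² test of Example 6.11 (my function names; note the notes' F_{p,n−p+1} is the nonnormalized ratio χ²_p/χ²_{n−p+1}, i.e. (p/(n−p+1)) times a standard F variate, so the critical value is a rescaled standard F quantile):

```python
import numpy as np
from scipy import stats

def hotelling_t2(y, w):
    """Hotelling's T^2 = Y' W^{-1} Y."""
    return float(y @ np.linalg.solve(w, y))

def t2_critical(p, n, alpha=0.05):
    """Upper-alpha quantile of the nonnormalized F_{p, n-p+1}:
    chi^2_p/chi^2_{n-p+1} = (p/(n-p+1)) * F_{p, n-p+1} (standard F)."""
    return p / (n - p + 1) * stats.f.ppf(1 - alpha, p, n - p + 1)
```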
Exercise 6.12. (a) In Example 6.11, show that the UMP invariant level α
test ( ≡ the T 2 test) is the level α LRT based on (Y, W ) for (6.5).
(b) The power function of this LRT is given by
β_{p, n−p+1}(τ) := Pr_τ[ T² > F_{p, n−p+1;α} ] ≡ Pr[ F_{p, n−p+1}(τ) > F_{p, n−p+1;α} ].
It follows from MLR that βp, n−p+1 (τ ) is increasing in τ , hence this test is
unbiased. Show that for fixed τ and p, βp, n−p+1 (τ ) is increasing in n.
(c)* (Kiefer and Schwartz (1965) Ann. Math. Statist.). Show that the
LRT is a proper Bayes test for testing (6.5) based on (Y, W ), and thus is
admissible among all tests for (6.5).
Hint: consider the prior probability distribution on Θ0 ∪ Θ given by
Pr[Θ0 ] = γ,
Pr[Θ] = 1 − γ, (0 < γ < 1);
(µ, Σ) Θ0 ∼ π0 ,
(µ, Σ) Θ ∼ π,
where η has pdf proportional to |Ip + ηη |−(n+1)/2 ; π assigns all its mass to
points of the form
Verify that π0 and π are proper measures, i.e., verify that the corresponding
pdfs of η have finite total mass. Show that the T 2 test is the Bayes test for
this prior distribution.
¯
(6.7) µ1 = 0 vs. µ1 ≠ 0 based on (Y, W) ∼ Np(µ, Σ) × Wp(n, Σ)
Exercise 6.14. (a) In Example 6.13, apply Lemma 6.3 to show that
(b) Show that the joint distribution of (L, M) ≡ (L(Y, W), M(Y, W)) can
be described as follows:
L | M ∼ χ²_{p1}(τ1/(1+M)) / χ²_{n−p+1} ≡ F_{p1, n−p+1}(τ1/(1+M)).
Hint: Begin by finding the conditional distribution of Y1 − W12W22⁻¹Y2 given
(Y2, W22).
(c) Show that the level α LRT based on (Y, W ) for (6.7) is the test that
rejects (µ1 , µ2 ) = (0, 0) if
This test is the conditionally UMP level α test for (6.8) given the ancillary
statistic M and is conditionally unbiased for (6.8), therefore unconditionally
unbiased for (6.7) ≡ (6.8).
(d)** Show that no UMP size α test exists for (6.8), so no UMPI test exists
for (6.7). Therefore the LRT is not UMPI. (See Remark 6.16).
(e)* In Exercise 6.12b, show βp,m (τ ) is decreasing in p for fixed τ and m.
Hint: Apply the results (6.9) concerning the joint distribution of (L, M )
¯
compare its power function to that of the LRT in Example 6.13. Given M ,
the conditional power function of the LRT is given by
Pr_τ[ F_{p1, n−p+1}(τ1/(1+M)) > F_{p1, n−p+1;α} | M ] ≡ β_{p1, n−p+1}(τ1/(1+M)),
while the (unconditional) power of the size-α T 2 test is βp, n−p+1 (τ1 ) because
τ = τ1 when µ2 = 0. Since βp,m (δ) is decreasing in p but increasing in δ
(recall Exercises 6.12b, 6.14e), neither power function dominates the other.
Another possible test in Example 6.13 rejects (µ1, µ2) = (0, 0) iff
T1² := Y1′W11⁻¹Y1 > F_{p1, n−p1+1; α},
a test that ignores the covariate information and is not G1 -invariant [verify].
Since
T1² ∼ F_{p1, n−p1+1}(τ̃1),
where τ̃1 := µ1′Σ11⁻¹µ1, the power function of the level α test based on T1² is
β_{p1, n−p1+1}(τ̃1). Because τ̃1 ≤ τ1 but β_{p,m}(δ) is decreasing in p and increasing
in m, the power function of T1² neither dominates nor is dominated by
that of the LRT or of T².
¯
Exercise 6.17. Let (Y, W) be as in Examples 6.11 and 6.13. Consider the
problem of testing µ2 = 0 vs. µ2 ≠ 0 with µ1 and Σ unknown. Find a
natural invariance group G2 such that the test that rejects µ2 = 0 if
T2² := Y2′W22⁻¹Y2 > F_{p2, n−p2+1; α}
Here, unlike Examples 6.8, 6.11, and 6.13, when p ≥ 2 the alternative
hypothesis remains multi-dimensional even after reduction by invariance,
so it is not to be expected that a UMPI test exists (it does not).
¯
Exercise 6.19a. In Example 6.18 derive the LRT for (6.10). Express the
test statistic in terms of l(S).
Answer: The LRT rejects Σ = Ip for large values of etr(S)/|S|, or equivalently,
for large values of
Σ_{i=1}^p ( l_i(S) − log l_i(S) − 1 ).
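The statistic above is easy to evaluate from the eigenvalues of S; the sketch below (my naming, assuming S is symmetric positive definite) also makes visible that the criterion vanishes exactly when all l_i(S) = 1:

```python
import numpy as np

def lrt_sigma_identity(S):
    """LRT criterion for Sigma = I_p in terms of the eigenvalues l_i(S):
    sum_i (l_i - log l_i - 1); nonnegative, zero iff all l_i = 1."""
    l = np.linalg.eigvalsh(S)
    return float(np.sum(l - np.log(l) - 1.0))
```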
Show that this problem remains invariant under the extended group
Ḡ := {ḡ = ag | a > 0, g ∈ Op}.
Express a MIS and MIP for this problem in terms of l(S) and λ(Σ) respectively.
Find the LRT for this problem and express it in terms of l(S).
(The hypothesis Σ = κIp, 0 < κ < ∞, is called the hypothesis of sphericity.)
Answer: The LRT rejects the sphericity hypothesis for large values of
[(1/p) tr S] / |S|^{1/p},
or equivalently, for large values of
[(1/p) Σ_{i=1}^p l_i(S)] / [Π_{i=1}^p l_i(S)]^{1/p},
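The sphericity criterion is just the ratio of the arithmetic to the geometric mean of the eigenvalues, so it is ≥ 1 with equality iff Σ ∝ I_p; a minimal sketch (my naming):

```python
import numpy as np

def sphericity_ratio(S):
    """Arithmetic over geometric mean of the eigenvalues of S
    (monotone transform of the sphericity LRT criterion); >= 1,
    with equality iff S is proportional to I_p."""
    l = np.linalg.eigvalsh(S)
    return float(l.mean() / np.exp(np.log(l).mean()))
```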
(so G = GL(p1) × GL(p2)), then (6.13) is invariant under the action of G on
S_p⁺ given by S → gSg′ [verify]. It follows from Lemma 6.3 and the singular
value decomposition that a MIS is [verify!]
r(S) ≡ (r1(S) ≥ ··· ≥ rq(S)) := the singular values of S11^{−1/2} S12 S22^{−1/2},
and a MIP is
ρ(Σ) ≡ (ρ1(Σ) ≥ ··· ≥ ρq(Σ)) := the singular values of Σ11^{−1/2} Σ12 Σ22^{−1/2},
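The MIS r(S) can be computed directly as sketched below (my naming; S is assumed symmetric positive definite, partitioned with first block of size p1):

```python
import numpy as np

def canonical_correlations(S, p1):
    """Singular values of S11^{-1/2} S12 S22^{-1/2} for a partitioned
    SPD matrix S (the sample canonical correlations r(S))."""
    S11, S12, S22 = S[:p1, :p1], S[:p1, p1:], S[p1:, p1:]

    def inv_sqrt(A):
        # symmetric inverse square root via the spectral decomposition
        w, v = np.linalg.eigh(A)
        return v @ np.diag(w ** -0.5) @ v.T

    return np.linalg.svd(inv_sqrt(S11) @ S12 @ inv_sqrt(S22),
                         compute_uv=False)
```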
Remark 6.23. This model and testing problem can be reduced to the
multivariate linear model and MANOVA testing problem (see Remark 8.5)
by conditioning on S22.
Exercise 6.24. In Example 6.22 find the LRT for (6.13). Express the
test statistic in terms of r(S). Show this LRT statistic is equivalent to
ρ1(Σ) = max_{a1≠0, a2≠0} Cor(a1′X1, a2′X2) ≡ max_{a1≠0, a2≠0} a1′Σ12a2 / [ (a1′Σ11a1)(a2′Σ22a2) ]^{1/2}.
then
ρ = max_{a2≠0} Cor(X1, a2′X2) = max_{a2≠0} Σ12a2 / [ Σ11 (a2′Σ22a2) ]^{1/2},
Θ0 = {Σ | Σij = 0, i ≠ j},
(6.20) Σ1 = Σ2 vs. Σ1 ≠ Σ2 based on (S1, S2) ∼ Wp(n1, Σ1) × Wp(n2, Σ2)
with n1 , n2 ≥ p. Here
Exercise 6.33. In Example 6.32, derive the LRT for (6.20) and express the
test statistic in terms of f(S1, S2). Show that the LRT statistic is minimized
when (1/n1)S1 = (1/n2)S2.
Answer: The LRT rejects Σ1 = Σ2 for large values of
|S1 + S2|^{n1+n2} / ( |S1|^{n1} |S2|^{n2} ).
with n1 ≥ p, . . . , nk ≥ p. Here
Exercise 6.35. In Example 6.34, derive the LRT for (6.23). Show that the
LRT statistic is minimized when (1/n1)S1 = ··· = (1/nk)Sk.
Answer: The LRT rejects Σ1 = ··· = Σk for large values of
|S1 + ··· + Sk|^{n} / Π_{i=1}^k |Si|^{n_i}, where n := n1 + ··· + nk.
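In log form the k-sample criterion is numerically stable via `slogdet`; a hedged sketch (my naming, exponents as in the display above, so a monotone transform of the LRT statistic):

```python
import numpy as np

def log_lrt_equal_cov(S_list, n_list):
    """log of |S_1 + ... + S_k|^n / prod_i |S_i|^{n_i}, n = sum_i n_i
    (monotone transform of the LRT criterion for Sigma_1 = ... = Sigma_k)."""
    n = sum(n_list)
    stat = n * np.linalg.slogdet(sum(S_list))[1]
    for Si, ni in zip(S_list, n_list):
        stat -= ni * np.linalg.slogdet(Si)[1]
    return float(stat)
```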
with Σ > 0 unknown and n ≥ p. (Example 6.11 is the special case where
r = 1.) Here
This problem is invariant under the action of the group GL × Or ≡ {(g, γ)}
acting on X and Θ via
(6.25) (Y, W) → (gYγ′, gWg′), (µ, Σ) → (gµγ′, gΣg′),
respectively. It follows from Lemma 6.3 and the singular value decomposition
that a MIS is [verify!]
f(Y, W) ≡ (f1(Y, W) ≥ ··· ≥ fq(Y, W)) := the nonzero eigenvalues of Y′W⁻¹Y.
Here the MIS and MIP have the same dimension, namely q, and a UMP
invariant test will not exist when q ≡ min(p, r) ≥ 2.
Exercise 6.37a. In Example 6.36, derive the LRT for testing µ = 0 vs.
µ ≠ 0 based on (Y, W). Express the test statistic in terms of f(Y, W). Show
that when µ = 0, W + YY′ is independent of f(Y, W), hence is independent
of the LRT statistic.
Partial solution: The LRT rejects µ = 0 for large values of
(6.27) |W + YY′| / |W| = |I + Y′W⁻¹Y| ≡ Π_{i=1}^q (1 + f_i(Y, W)).
Derive the moments of |W| / |W + YY′| ≡ |U| under the null hypothesis µ = 0.
Solution: By independence,
(6.30) E(|U|^k) = E(|W|^k) / E(|W + YY′|^k) = E(|S|^k) / E(|V|^k)
= Γ_p(n/2 + k) Γ_p((r+n)/2) / [ Γ_p(n/2) Γ_p((r+n)/2 + k) ]. ¯
Exercise 6.37d. In Exercise 6.24 it was found that the LRT for testing
Σ12 = 0 (i.e., testing independence of two sets of variates) rejects Σ12 = 0
for small values of |S| / (|S11||S22|). Show that the null (Σ12 = 0) distribution
of this LRT statistic is U(p1, p2, n − p2) (see Exercise 6.24).
¯
U(p, r, n) ∼ Π_{i=1}^p B( (n−p+i)/2, r/2 ). ¯
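The moment formula (6.30) and the Beta-product representation above can be cross-checked exactly; the sketch below (my naming) uses `scipy.special.multigammaln`, which implements log Γ_p:

```python
from scipy.special import multigammaln, gammaln

def log_moment_wilks(p, r, n, k):
    """log E|U|^k from (6.30), via the multivariate gamma function."""
    return (multigammaln(n / 2 + k, p) + multigammaln((r + n) / 2, p)
            - multigammaln(n / 2, p) - multigammaln((r + n) / 2 + k, p))

def log_moment_beta_product(p, r, n, k):
    """Same moment from U(p, r, n) ~ prod_{i=1}^p B((n-p+i)/2, r/2)."""
    out = 0.0
    for i in range(1, p + 1):
        a, b = (n - p + i) / 2, r / 2
        out += gammaln(a + k) + gammaln(a + b) - gammaln(a) - gammaln(a + b + k)
    return out
```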
Remark 6.38. Perlman and Olkin (Annals of Statistics 1980) applied the
FKG inequality to show that the LRTs in Exercises 6.24 and 6.37a are
unbiased.
¯
(6.33) V_a ≡ a′S1a / a′S2a ∼ (a′Σ1a / a′Σ2a) · F_{1,1} ≡ δ_a · F_{1,1}
and let φ_a denote the UMPU size α test for testing δ_a = 1 vs. δ_a ≠ 1 based
on V_a (cf. TSH Ch. 5 §3). Then [verify]: φ_a is unbiased size α for testing
Σ1 = Σ2, with power > α when δ_a ≠ 1, so φ_a dominates the UMPI test φ.
Note: This failure of invariance to yield a nontrivial UMPI test is usually
attributed to the group GL being “too large”, i.e., not “amenable”.13 How-
ever, this example is somewhat artificial in that the sample sizes are too
small (n1 = n2 = 1) to permit estimation of Σ1 and Σ2 . It would be of
interest to find (if possible?) an example of a trivial UMPI test in a less
contrived model. ¯
13. See Bondar and Milnes (1981) Zeit. f. Wahr. 57, pp. 103-128.
ST⁻¹ or S(S + T)⁻¹,
First we shall derive the pdf of b, then obtain the pdf of f using the relation
(7.2) f_i = b_i / (1 − b_i).
Partition
W ≡ [ W11 W12 ; W21 W22 ] ∼ W_{p+r}(m, I_{p+r}),
where the diagonal blocks W11 and W22 are p × p and r × r, resp.
Assume that m ≥ max(p, r), so W11 > 0 and W22 > 0 w. pr. 1. By the
properties of the distribution of a partitioned Wishart matrix (Proposition
3.13),
(a) the distribution of the nonzero eigenvalues of W12W22⁻¹W21W11⁻¹
is b(p, m − r, r) [verify!]
17. If p > r then q = r and p − r of the eigenvalues of S(S + T)⁻¹ are trivially
≡ 1. By Okamoto's Lemma the nonzero eigenvalues are distinct w. pr. 1.
(b) the distribution of the nonzero eigenvalues of W21W11⁻¹W12W22⁻¹
is b(r, m − p, p) [verify!].
But these two sets of eigenvalues are identical18 so the result follows by
setting n = m − r.
¯
(S, T ) → (S, V ≡ S + T ).
(7.4) S = E D_b E′, V = E E′,
hence
f(b, E) = 2^p c_{p,r} c_{p,n} · Π_{i=1}^p b_i^{(r−p−1)/2} Π_{i=1}^p (1 − b_i)^{(n−p−1)/2} Π_{1≤i<j≤p} (b_i − b_j)
· |EE′|^{(n+r−p)/2} e^{−(1/2) tr EE′}.
Thus b and E are independent, with pdfs
(7.6) f(b) = c_b · Π_{i=1}^p b_i^{(r−p−1)/2} (1 − b_i)^{(n−p−1)/2} · Π_{1≤i<j≤p} (b_i − b_j), b ∈ R_b(p),
(7.7) f(E) = c_E · |EE′|^{(n+r−p)/2} e^{−(1/2) tr EE′}, E ∈ R_E(p),
where
c_b c_E = 2^p c_{p,r} c_{p,n}.
Thus, to determine c_b it suffices to determine c_E. This is accomplished as
follows:
c_E⁻¹ = ∫_{R_E} |EE′|^{(n+r−p)/2} e^{−(1/2) tr EE′} dE
= 2^{−p} ∫_{R^{p²}} |EE′|^{(n+r−p)/2} e^{−(1/2) tr EE′} dE [by symmetry]
= 2^{−p} (2π)^{p²/2} ∫_{R^{p²}} |EE′|^{(n+r−p)/2} Π_{i,j=1}^p (2π)^{−1/2} e^{−e_{ij}²/2} de_{ij}
= 2^{−p} (2π)^{p²/2} · E |W_p(p, I_p)|^{(n+r−p)/2} [why?]
= 2^{−p} (2π)^{p²/2} · c_{p,p} / c_{p,n+r}. [by (4.12)]
Thus
c_b ≡ π^{p²/2} Γ_p((n+r)/2) / [ Γ_p(n/2) Γ_p(r/2) Γ_p(p/2) ].
This completes the derivation of the pdf f (b) in (7.6), hence determines the
distribution b(p, n, r) when r ≥ p. Note that this can be viewed as another
generalization of the Beta distribution.
From (7.4),
dS = (dE) D_b E′ + E D_{db} E′ + E D_b (dE)′,
dV = (dE) E′ + E (dE)′,
hence, defining
dF = E⁻¹(dE),
dG = E⁻¹(dS)(E⁻¹)′,
dH = E⁻¹(dV)(E⁻¹)′,
we have
(7.10) dG = (dF) D_b + D_{db} + D_b (dF)′,
(7.11) dH = (dF) + (dF)′.
To evaluate ∂(dS, dV)/∂(db, dE), apply the chain rule to the sequence
(db, dE) → (db, dF) → (dG, dH) → (dS, dV)
to obtain
∂(dS, dV)/∂(db, dE) = ∂(db, dF)/∂(db, dE) · ∂(dG, dH)/∂(db, dF) · ∂(dS, dV)/∂(dG, dH).
Therefore
J = ∂[ (dg_ii), (dh_ii), (dg_ij), (dh_ij) ] / ∂[ (db_i), (df_ii), (df_ij), (df_ji) ]
= det [ I_p 0 0 0 ; 2D_b 2I_p 0 0 ; 0 0 D_1 I_{p(p−1)/2} ; 0 0 D_2 I_{p(p−1)/2} ],
where
D_1 := Diag(b_2, ..., b_p, b_3, ..., b_p, ..., b_{p−1}, b_p, b_p),
D_2 := Diag(b_1, ..., b_1, b_2, ..., b_2, ..., b_{p−2}, b_{p−2}, b_{p−1}),
hence [verify!]
(7.14) J = 2^p |D_1 − D_2| = 2^p Π_{1≤i<j≤p} (b_i − b_j).
The desired Jacobian (7.5) follows from (7.12), (7.13), and (7.14).
¯
(7.15) c_b(p, r, n) Π_{i=1}^p f_i^{(r−p−1)/2} (1 + f_i)^{−(n+r)/2} Π_{1≤i<j≤p} (f_i − f_j),
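The passage from (7.6) to (7.15) under f_i = b_i/(1 − b_i) can be verified pointwise: the Jacobian of the substitution is Π_i (1 + f_i)^{−2}, and it carries one unnormalized density into the other. A sketch (my naming; constants omitted since they agree):

```python
import numpy as np

def log_shape_b(b, p, r, n):
    """Unnormalized log pdf (7.6) of the ordered roots b_1 > ... > b_p."""
    val = ((r - p - 1) / 2) * np.sum(np.log(b)) \
        + ((n - p - 1) / 2) * np.sum(np.log1p(-b))
    for i in range(p):
        for j in range(i + 1, p):
            val += np.log(b[i] - b[j])
    return val

def log_shape_f(f, p, r, n):
    """Unnormalized log pdf (7.15) of f_i = b_i/(1 - b_i)."""
    val = ((r - p - 1) / 2) * np.sum(np.log(f)) \
        - ((n + r) / 2) * np.sum(np.log1p(f))
    for i in range(p):
        for j in range(i + 1, p):
            val += np.log(f[i] - f[j])
    return val
```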
Exercise 7.3. Under the weaker assumption that n + r ≥ p, show that the
distribution of b ≡ {bi (S, T )} does not depend on Σ and that b and V are
independent. (Note that f ≡ {fi (S, T )} is not defined unless n ≥ p.)
Hint: Apply the GL-invariance of {bi (S, T )} and Basu’s Lemma. If n ≥ p
and r ≥ p the result also follows from Exercise 4.2. ¯
l_1 (Roy), l_1/l_p, and Π_{i=1}^p l_i · Σ_{i=1}^p (1/l_i)
on the range
Rl := {l | ∞ > l1 > · · · > lp > 0}.
Outline of solution. Use the limit representation
(7.18) S = Γ D_l Γ′, I_p = Γ Γ′,
From (7.18),
dS = (dΓ) D_l Γ′ + Γ D_{dl} Γ′ + Γ D_l (dΓ)′,
0 = (dΓ) Γ′ + Γ (dΓ)′,
hence, defining dF = Γ⁻¹(dΓ),
to obtain
∂(dS)/∂(dl, dΓ) = ∂(dl, dF)/∂(dl, dΓ) · ∂(dG)/∂(dl, dF) · ∂(dS)/∂(dG) [verify],
where the first and third factors equal 1 and the middle factor ≡ J.
From (7.22),
dg_ii = dl_i, i = 1, ..., p,
dg_ij = (df_ij)(l_j − l_i), 1 ≤ i < j ≤ p
(note that df_ii = 0 by skew-symmetry), so
J = ∂[ (dg_ii), (dg_ij) ] / ∂[ (dl_i), (df_ij) ] = det [ I_p ∗ ; 0 D ] = |D| = Π_{1≤i<j≤p} (l_i − l_j),
where
D = Diag(l_2 − l_1, ..., l_p − l_1, ..., l_p − l_{p−1}).
Therefore from (7.19),
f(l, Γ) = c_{p,r} · Π_{i=1}^p l_i^{(r−p−1)/2} e^{−l_i/2} Π_{1≤i<j≤p} (l_i − l_j),
so
(7.23) f(l) = ∫_{O_p} f(l, Γ) dΓ = [ c_{p,r} ∫_{O_p} dΓ ] · Π_{i=1}^p l_i^{(r−p−1)/2} e^{−l_i/2} Π_{1≤i<j≤p} (l_i − l_j).
ΓS ∼ Haar(Op ),
S̃ = Ψ S Ψ′ ∼ Wp(r, I).
Then
S̃ = (ΨΓ_S) D_{l(S)} (ΨΓ_S)′,
so
Γ_{S̃} = Ψ Γ_S and l(S̃) = l(S).
Therefore
if
t : X → T, x ↦ t(x),
is a maximal invariant statistic, then the pdf of t w.r. to the induced measure
µ̃ = µ(t⁻¹) on T is given by
(7.26) f̄(x) = ∫_G f(gx) dν(g),
f(x) = k(‖x‖²)
for some k(·) on (0, ∞), then the pdf of t(X) w.r.to dµ̃(t) is simply
f(x) = (2π)^{−p/2} e^{−‖x‖²/2} ≡ k(‖x‖²) w.r. to dµ(x),
so t has pdf
h(t) = k(t) = (2π)^{−p/2} e^{−t/2} w.r. to dµ̃(t).
(7.29) dµ̃(t) = [ w(t)/k(t) ] dt = [ π^{p/2} / Γ(p/2) ] t^{p/2−1} dt.
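As a sanity check on (7.29): multiplying h(t) = k(t) by the weight w(t)/k(t) · k(t) = w(t) must reassemble the ordinary χ²_p pdf. A sketch (my naming):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def chi2_pdf_via_729(t, p):
    """h(t) times the (7.29) weight: k(t) * [pi^{p/2}/Gamma(p/2)] t^{p/2-1},
    which should equal the chi^2_p pdf."""
    log_h = -0.5 * t - (p / 2) * np.log(2 * np.pi)
    log_w = (p / 2) * np.log(np.pi) - gammaln(p / 2) + (p / 2 - 1) * np.log(t)
    return float(np.exp(log_h + log_w))
```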
Here
f(x) = (2π)^{−p/2} e^{−‖x−ξ‖²/2},
(2) = (2π)^{−p/2} e^{−δ/2} e^{−t/2} Σ_{k=0}^∞ [ (tδ)^k / (2k)! ] ∫_{O_p} γ₁₁^{2k} dν_p(γ) [verify!]
(3) = (2π)^{−p/2} e^{−δ/2} e^{−t/2} Σ_{k=0}^∞ [ (tδ)^k / (2k)! ] E[ Beta(1/2, (p−1)/2)^k ] [verify!]
= (2π)^{−p/2} e^{−δ/2} e^{−t/2} Σ_{k=0}^∞ [ (tδ)^k / (2k)! ] · Γ(1/2 + k) Γ(p/2) / [ Γ(p/2 + k) Γ(1/2) ].
(1) This follows from the left and right invariance of the Haar measure νp .
(2) By the invariance of νp the distribution of γ11 is even, i.e., γ11 ∼ −γ11 ,
so its odd moments vanish.
(3) By left invariance, the first column of γ is uniformly distributed on the
unit sphere in R^p, hence γ₁₁² ∼ Beta(1/2, (p−1)/2) [verify!].
Thus from (7.29) and Legendre's duplication formula, t has pdf w. r. to dt
given by
h(t) dµ̃(t)/dt = [ t^{p/2−1} / 2^{p/2} ] e^{−δ/2} e^{−t/2} Σ_{k=0}^∞ [ (tδ)^k / (2k)! ] · Γ(1/2 + k) / [ Γ(p/2 + k) Γ(1/2) ]
(7.30) = Σ_{k=0}^∞ { e^{−δ/2} (δ/2)^k / k! } · { t^{(p+2k)/2−1} e^{−t/2} / [ 2^{(p+2k)/2} Γ((p+2k)/2) ] },
the Poisson(δ/2)-weighted mixture of the pdfs of χ²_{p+2k}, k = 0, 1, 2, ....
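The mixture representation (7.30) can be checked directly against scipy's noncentral chi-square pdf (my naming; the Poisson series is truncated at kmax, which is ample for moderate δ):

```python
import numpy as np
from scipy import stats

def ncx2_pdf_mixture(t, p, delta, kmax=200):
    """(7.30): Poisson(delta/2)-weighted mixture of central chi^2_{p+2k} pdfs."""
    k = np.arange(kmax)
    weights = stats.poisson.pmf(k, delta / 2)
    return float(np.sum(weights * stats.chi2.pdf(t, p + 2 * k)))
```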
x → γxψ′.
(7.34) dµ̃(l) = [ w(l)/k(l) ] dl = { π^{p(r+1)/2} / [ Γ_p(p/2) Γ_p(r/2) ] } Π_{i=1}^p l_i^{(r−p−1)/2} Π_{1≤i<j≤p} (l_i − l_j) dl.
Finally, the case r < p follows from (7.33) by interchanging p and r, since
XX′ and X′X have the same nonzero eigenvalues.
X ∼ N_{p×r}(ξ, I_p ⊗ I_r)

h(l) = (2π)^{−pr/2} ∫_{O_p} ∫_{O_r} e^{−(1/2) tr (γxψ′ − ξ)(γxψ′ − ξ)′} dν_p(γ) dν_r(ψ)
= (2π)^{−pr/2} e^{−(1/2) tr ξξ′} e^{−(1/2) tr xx′} ∫_{O_p} ∫_{O_r} etr(γxψ′ξ′) dν_p(γ) dν_r(ψ)
(1) = (2π)^{−pr/2} e^{−(1/2) Σ λ_i} e^{−(1/2) Σ l_i} ∫_{O_p} ∫_{O_r} etr( γ D_l^{1/2} ψ̃′ D_λ^{1/2} ) dν_p(γ) dν_r(ψ)
= (2π)^{−pr/2} e^{−(1/2) Σ λ_i} e^{−(1/2) Σ l_i} ∫_{O_p} ∫_{O_r} exp( Σ_{i=1}^p Σ_{j=1}^p l_i^{1/2} λ_j^{1/2} γ_{ji} ψ_{ji} ) dν_p(γ) dν_r(ψ)
(2) = (2π)^{−pr/2} e^{−(1/2) Σ λ_i} e^{−(1/2) Σ l_i} ∫_{O_p} ∫_{O_r} Π_{i=1}^p Σ_{k=0}^∞ [ 1/(2k)! ] ( Σ_{j=1}^p l_i^{1/2} λ_j^{1/2} γ_{ji} ψ_{ji} )^{2k} dν_p(γ) dν_r(ψ).
as already noted in (1) its extension to the positive orthant R₊^p is convex
and symmetric in the {l_i^{1/2}}. Thus the symmetric extension to R₊^p of the
acceptance region A ⊆ R_l of any proper Bayes test for testing λ = 0 vs.
λ > 0 based on l must be convex and decreasing in {l_i^{1/2}} [explain and verify!].
Wald's fundamental theorem of decision theory states that the closure
in the weak* topology of the set of all proper Bayes acceptance regions
determines an essentially complete class of tests. Because convexity and
monotonicity are preserved under weak* limits, this implies that the sym-
metric extension to R₊^p of any admissible acceptance region A ⊆ R_l must
be convex and decreasing in {l_i^{1/2}}. This shows, for example, that the test
(7.37) f_λ(l) = { π^{p²/2} / [ 2^{pr/2} Γ_p(p/2) Γ_p(r/2) (Π_{i=1}^p λ_i)^{r/2} ] } Π_{i=1}^p l_i^{(r−p−1)/2} Π_{1≤i<j≤p} (l_i − l_j)
· ∫_{O_p} e^{−(1/2) tr D_λ^{−1} γ D_l γ′} dν_p(γ), l ∈ R_l,
Remark 7.12. Stein’s integral formula (7.26) for the pdf of a maximal
invariant statistic under the action of a compact topological group G can
be partially extended to the case where G is locally compact. Important
examples include the general linear group GL and the triangular groups GT
and GU . In this case, however, the integral representation does not provide
the normalizing constant for the pdf of the MIS, but still provides a useful
expression for the likelihood ratio, e.g. (7.36). References include:
S. A. Andersson (1982). Distributions of maximal invariants using quotient
measures. Ann. Statist. 10 955-961.
M. L. Eaton (1989). Group Invariance Applications in Statistics. Regional
Conference Series in Probability and Statistics Vol. 1, Institute of Mathe-
matical Statistics.
R. A. Wijsman (1990). Invariant Measures on Groups and their Use in
Statistics. Lecture Notes – Monograph Series Vol. 14, Institute of Mathe-
matical Statistics.
The forms (8.1) – (8.4) are all extrinsic, in that they require spec-
ification of the design matrix X, which in turn is specified only after a
choice of coordinate system. We seek to express these equivalent forms in
an intrinsic algebraic form that will allow us to determine when a specified
linear subspace L ⊆ M(p, m) can be written as Lp (X) for some X. This
is accomplished by means of an invariant ≡ coordinate-free definition of a
MANOVA subspace.
(8.5) M(p, p) L ⊆ L.
Because M(p, p) is in fact a matrix algebra (i.e., closed under matrix mul-
tiplication as well as matrix addition) that contains the identity matrix Ip ,
(8.5) is equivalent to the condition M(p, p) L = L.
¯
(8.7) L = {x ∈ M(p, m) | x = xP }.
L1 = · · · = Lp =: L̃ ⊆ M(1, m).
L ⊆ {x ∈ M(p, m) | x = xP }.
Writing e_i for the ith standard unit vector in R^p:
xP = x ⟹ e_i′ x P = e_i′ x, i = 1, ..., p,
⟹ e_i′ x ∈ L̃ ≡ L_i
⟹ e_i′ x = e_i′ x_i for some x_i ∈ L
⟹ x ≡ Σ_{i=1}^p e_i e_i′ x = Σ_{i=1}^p (e_i e_i′) x_i ∈ L,
where the final membership follows from (a) and the assumption that L is
a linear subspace. Thus
L ⊇ {x ∈ M(p, m) | x = xP },
By (8.5), L₀ Γ [ I_l ; 0_{n×l} ] is a MANOVA subspace of R^{p×l}, so we can find
Γ₀ : l × l orthogonal so that
L₀ Γ [ I_l ; 0_{n×l} ] Γ₀ = {(ξ, 0_{p×r}) | ξ ∈ M(p, l₀)}.
Now take Γ* = Γ [ Γ₀ 0_{l×n} ; 0_{n×l} I_n ] and verify that (8.11) holds. ¯
(8.13) (U, Y, Z) → (U + b, Y, Z), (ξ, µ, Σ) → (ξ + b, µ, Σ),
Since M(p, l0 ) acts transitively on itself, the MIS and MIP are (Y, Z) and
(µ, Σ), resp., and the invariance-reduced problem becomes that of testing
Remark 8.5. By Proposition 8.2b and Remark 8.3, Lp (X) and Lp (X, C)
are MANOVA subspaces of Rp×m such that Lp (X, C) ⊂ Lp (X). Thus the
general MANOVA testing problem (8.10) is often stated as that of testing
[Add Examples]
¯
(8.17) {0} ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vr ⊂ Rp
AVi ⊆ Vi , i = 1, . . . , r
Exercise 8.7. Give an algebraic definition of the set of lower block trian-
gular matrices. ¯
(8.21) {0} ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vr ⊂ Rp
Y_k = ( Y_{1k} ; Y_{2k} ), µ = ( µ_1 ; µ_2 ), Σ = [ Σ_{11} Σ_{12} ; Σ_{21} Σ_{22} ],
with blocks of dimensions p_1 and p_2.
= c · |Σ_{11·2}|^{−m/2} exp{ −(1/2) tr Σ_{11·2}^{−1} Σ_{k=1}^m (y_{1k} − α − β y_{2k})^{∗2} }
· |Σ_{22}|^{−(m+n)/2} exp{ −(1/2) tr Σ_{22}^{−1} [ Σ_{k=1}^m (y_{2k} − µ_2)^{∗2} + Σ_{k=1}^n (v_k − µ_2)^{∗2} ] }
The MLEs for these models are given in (3.15), (3.16), (3.34), and
(3.35). To assure the existence of the MLE, the single condition m ≥ p + 1
is necessary and sufficient [verify!]. (This is the same condition required for
existence of the MLE based on the complete observations Y1 , . . . , Ym only.)
If this condition holds, then the MLEs of α, β, Σ11·2 , µ2 , Σ22 are as follows:
α̂ = Ȳ_1 − β̂ Ȳ_2, µ̂_2 = (m Ȳ_2 + n V̄) / (m + n),
(9.2) β̂ = S_{12} S_{22}^{−1}, Σ̂_{22} = [ S_{22} + T + (mn/(m+n)) (Ȳ_2 − V̄)^{∗2} ] / (m + n),
Σ̂_{11·2} = (1/m) S_{11·2},
[verify!], where
S = Σ_{k=1}^m (Y_k − Ȳ)^{∗2}, T = Σ_{k=1}^n (V_k − V̄)^{∗2}.
Verify that [(m+n)/(m+n−1)] Σ̂_{22} is the sample covariance matrix based on the
combined sample Y_{21}, ..., Y_{2m}, V_1, ..., V_n. Furthermore, the maximum value of
the LF is given by
Remark 9.1. The pairs (Ȳ , S) and (V̄ , T ) together form a complete and
sufficient statistic for the above incomplete data model.
¯
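The MLEs (9.2) can be computed directly from the two samples; a hedged sketch (my function and variable names; x^{∗2} is read as the outer product xx′, matching the notes' scatter notation):

```python
import numpy as np

def incomplete_mles(Y1, Y2, V):
    """MLEs (9.2) for the monotone incomplete-data model: m complete pairs
    (Y1, Y2) plus n additional observations V of the second block only.
    Y1: (m, p1), Y2: (m, p2), V: (n, p2)."""
    m, n = len(Y2), len(V)
    Y1b, Y2b, Vb = Y1.mean(0), Y2.mean(0), V.mean(0)
    S11 = (Y1 - Y1b).T @ (Y1 - Y1b)
    S12 = (Y1 - Y1b).T @ (Y2 - Y2b)
    S22 = (Y2 - Y2b).T @ (Y2 - Y2b)
    T22 = (V - Vb).T @ (V - Vb)
    beta = S12 @ np.linalg.inv(S22)
    alpha = Y1b - beta @ Y2b
    mu2 = (m * Y2b + n * Vb) / (m + n)
    d = Y2b - Vb
    Sigma22 = (S22 + T22 + (m * n / (m + n)) * np.outer(d, d)) / (m + n)
    Sigma11_2 = (S11 - beta @ S12.T) / m   # (1/m) S_{11.2}
    return alpha, beta, mu2, Sigma22, Sigma11_2
```

Note that µ̂_2 is exactly the combined-sample mean, and Σ̂_{22} the combined-sample scatter over m + n, as the pooling identity in (9.2) asserts.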
and noting that each conditional pdf is the LF of a normal linear regression
model.
¯
Partial solutions: First, for each testing problem, the LF is given by (9.1)
and its maximum under H given by (9.3).
(i) Because α = µ1 when µ2 = 0, it follows from (9.1) that the LF under
H1 is given by
(9.6) c · |Σ_{11·2}|^{−m/2} exp{ −(1/2) tr Σ_{11·2}^{−1} Σ_{k=1}^m (y_{1k} − µ_1 − β y_{2k})^{∗2} }
· |Σ_{22}|^{−(m+n)/2} exp{ −(1/2) tr Σ_{22}^{−1} [ Σ_{k=1}^m (y_{2k})^{∗2} + Σ_{k=1}^n (v_k)^{∗2} ] },
where
Σ̃_{22} := (S̃_{22} + T̃) / (m + n) = [ Σ_{k=1}^m (Y_{2k})^{∗2} + Σ_{k=1}^n (V_k)^{∗2} ] / (m + n)
[verify!]. Thus, by (9.3) and (9.7) the LRT rejects H2 in favor of H for large
values of [verify!]
|Σ̃_{22}| / |Σ̂_{22}| = | Σ̂_{22} + [ (m Ȳ_2 + n V̄)/(m+n) ]^{∗2} | / |Σ̂_{22}|
= 1 + [ (m Ȳ_2 + n V̄)/(m+n) ]′ Σ̂_{22}^{−1} [ (m Ȳ_2 + n V̄)/(m+n) ] ≡ 1 + T_2².
Note that T_2² is exactly the T² statistic for testing µ_2 = 0 vs. µ_2 ≠ 0 based
on the combined sample Y_{21}, ..., Y_{2m}, V_1, ..., V_n, so the LRT ignores the
observations Y_{11}, ..., Y_{1m}.
(ii) The LRT statistic is the product of the LRT statistics for problem (i)
and for the problem of testing µ_1 = 0, µ_2 = 0 vs. µ_1 ≠ 0, µ_2 = 0 (see
Exercise 6.14). Both LRTs can be obtained explicitly, but the distribution
of their product is not simple. (See Eaton and Kariya (1983).)
(iii) Under H3 : µ1 = 0, µ2 appears in different forms in the two exponen-
tials on the right-hand side of (9.1), hence maximization over µ2 cannot be
done explicitly.
¯
Exercise 9.5. For simplicity, assume µ is known, say µ = 0. Find the LRT
based on Y1 , . . . , Ym , V1 , . . . , Vn for testing
Solution: The LRT statistic for this problem is the same as if the addi-
tional observations V_1, ..., V_n were not present (cf. Exercise 6.24), namely
|S| / (|S_{11}||S_{22}|). This can be seen by examining the LF factorization in (9.1)
when µ = 0 (so α = 0 and µ_2 = 0). The null hypothesis H_0 : Σ_{12} = 0 is equivalent
to β = 0, so the second exponential on the right-hand side of (9.1) is the
same under H0 and H, hence has the same maximum value under H0 and
H. Thus this second factor cancels when forming the LRT statistic, hence
the LRT does not involve V1 , . . . , Vn .
¯
i.e., if
(A.1) f(x1, y1) f(x2, y2) ≥ f(x1, y2) f(x2, y1).
(A.2) f(x2, y) / f(x1, y) is nondecreasing in y ∀ x1 < x2.
Fact A.4. If f(x, y) > 0 and ∂² log f / ∂x∂y ≥ 0 on A × B, then f is TP2.
¯
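Definition (A.1) is easy to probe on a finite grid, e.g. to confirm that f(x, y) = e^{xy} is TP2 (consistent with Fact A.4, since ∂² log f/∂x∂y = 1) while e^{−xy} is not; a brute-force sketch (my naming):

```python
import math

def is_tp2_on_grid(f, xs, ys, tol=1e-12):
    """Brute-force check of (A.1) on a grid: for all x1 < x2, y1 < y2,
    f(x1,y1) f(x2,y2) >= f(x1,y2) f(x2,y1).  xs, ys sorted ascending."""
    for i, x1 in enumerate(xs):
        for x2 in xs[i + 1:]:
            for j, y1 in enumerate(ys):
                for y2 in ys[j + 1:]:
                    if f(x1, y1) * f(x2, y2) < f(x1, y2) * f(x2, y1) - tol:
                        return False
    return True
```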
s = x1 − y1 , u = x1 − y2 ,
t = x2 − y2 , v = x2 − y1 .
Then [verify]
u ≤ min(s, t) ≤ max(s, t) ≤ v,
s + t = u + v,
so, since h is concave,
h(s) + h(t) ≥ h(u) + h(v),
which is equivalent to the TP2 condition (A.1) for f (x, y) ≡ g(x − y).
¯
These Facts yield the following examples of TP2 functions f (x, y):
is TP2 on A × C.
h(λ2) − h(λ1) = ∫ g(x) [ f(x|λ2) − f(x|λ1) ] dν(x)
= (1/2) ∫∫ [ g(x) − g(y) ] [ f(x|λ2) f(y|λ1) − f(y|λ2) f(x|λ1) ] dν(x) dν(y)
≥ 0,
Remark A.13. If {f(x|λ)} has MLR and X ∼ f(x|λ), then for each a ∈ X,
Pr_λ[ X > a ] ≡ E_λ[ I_{(a,∞)}(X) ]
is nondecreasing in λ [apply A.12 with g = I_{(a,∞)}].
Thus if f_n(x|δ) and f_n(x) denote the pdfs of χ²_n(δ) and χ²_n, then
f_n(x|δ) = Σ_{k=0}^∞ f_{n+2k}(x) Pr[K = k]
= Σ_{k=0}^∞ [ x^{n/2+k−1} e^{−x/2} / ( 2^{n/2+k} Γ(n/2 + k) ) ] · [ e^{−δ/2} (δ/2)^k / k! ]
(A.5) ≡ x^{n/2−1} e^{−x/2} · e^{−δ/2} · Σ_{k=0}^∞ c_k x^k δ^k,
where c_k ≥ 0. Thus by A.2, A.3, and A.10, f_n(x|δ) is TP2 in (x, δ). ¯
so if f_{m,n}(x|δ) and f_{m,n}(x) now denote the pdfs of F_{m,n}(δ) and F_{m,n}, then
f_{m,n}(x|δ) = Σ_{k=0}^∞ f_{m+2k, n}(x) Pr[K = k]
= Σ_{k=0}^∞ [ Γ((m+n)/2 + k) / ( Γ(m/2 + k) Γ(n/2) ) ] · [ x^{m/2+k−1} / (x + 1)^{(m+n)/2+k} ] · [ e^{−δ/2} (δ/2)^k / k! ]
(A.8) ≡ [ x^{m/2−1} / (x + 1)^{(m+n)/2} ] · e^{−δ/2} · Σ_{k=0}^∞ d_k [ x/(x + 1) ]^k δ^k,
where d_k ≥ 0. Thus by A.2 and A.10, f_{m,n}(x|δ) is TP2 in (x, δ).
¯
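The mixture in (A.8) can be checked numerically: conditional on K = k, the nonnormalized ratio χ²_{m+2k}/χ²_n is BetaPrime((m+2k)/2, n/2), and the result must agree with scipy's (df-normalized) noncentral F pdf after the scale change V = (m/n)F. A sketch (my naming):

```python
import numpy as np
from scipy import stats

def nonnorm_ncf_pdf(x, m, n, delta, kmax=200):
    """(A.8) as a mixture: pdf of chi^2_m(delta)/chi^2_n (no df scaling),
    a Poisson(delta/2) mixture of BetaPrime(m/2 + k, n/2) densities."""
    k = np.arange(kmax)
    weights = stats.poisson.pmf(k, delta / 2)
    return float(np.sum(weights * stats.betaprime.pdf(x, m / 2 + k, n / 2)))
```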
Question A.16. Does χ²_n(δ) have MLR w.r.to n (δ fixed)? Does F_{m,n}(δ)
have MLR w.r.to m (n, δ fixed)? ¯
where f_{p−1, n−p+1}(·|ζz) and f_n(·) are the pdfs of F_{p−1, n−p+1}(ζz) and χ²_n,
respectively. Then f_{p−1, n−p+1}(u|y) is TP2 in (u, y) by Example A.15, while
f_n(y/ζ) = c · (y/ζ)^{n/2−1} e^{−y/(2ζ)}