2.6.3 Generalized Likelihood Ratio Tests
When a UMP test does not exist, we usually use a generalized likelihood ratio test to test $H_0: \theta \in \Theta^\star$ against $H_1: \theta \in \Theta \setminus \Theta^\star$. It can be used even when $H_0$ is composite, which none of the above methods can handle.
The generalized likelihood ratio test has critical region $R = \{y : \lambda(y) \leq a\}$, where
$$\lambda(y) = \frac{\max_{\theta \in \Theta^\star} L(\theta|y)}{\max_{\theta \in \Theta} L(\theta|y)}.$$
If we let θ̂ denote the maximum likelihood estimate of θ and let θb0 denote the
value of θ which maximises the likelihood over all values of θ in Θ⋆ , then we
may write
$$\lambda(y) = \frac{L(\hat\theta_0|y)}{L(\hat\theta|y)}.$$
The quantity θb0 is called the restricted maximum likelihood estimate of θ under
H0 .
For example, suppose $Y_i \sim N(\mu, \sigma^2)$ independently, $i = 1, \dots, n$, and consider testing $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$, with $\sigma^2$ unknown. The maximum likelihood estimate of $\theta = (\mu, \sigma^2)^T$ is $\hat\theta = (\hat\mu, \hat\sigma^2)^T$, where $\hat\mu = \bar{y}$ and $\hat\sigma^2 = \sum_{i=1}^n (y_i - \bar{y})^2/n$.
Under $H_0$, the restricted MLE of $\sigma^2$ is $\hat\sigma_0^2 = \sum_{i=1}^n (y_i - \mu_0)^2/n$, so
$$\lambda(y) = \left(\frac{\hat\sigma^2}{\hat\sigma_0^2}\right)^{n/2} \exp\left\{\frac{n\sum_{i=1}^n (y_i - \bar{y})^2}{2\sum_{i=1}^n (y_i - \bar{y})^2} - \frac{n\sum_{i=1}^n (y_i - \mu_0)^2}{2\sum_{i=1}^n (y_i - \mu_0)^2}\right\} = \left(\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{\sum_{i=1}^n (y_i - \mu_0)^2}\right)^{n/2}.$$
Hence we reject $H_0$ if
$$\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{\sum_{i=1}^n (y_i - \mu_0)^2} \leq b,$$
where $a$ and $b$ are constants chosen to give significance level $\alpha$. Now, we may write
$$
\begin{aligned}
\sum_{i=1}^n (y_i - \mu_0)^2 &= \sum_{i=1}^n \{(y_i - \bar{y}) + (\bar{y} - \mu_0)\}^2 \\
&= \sum_{i=1}^n (y_i - \bar{y})^2 + 2(\bar{y} - \mu_0)\sum_{i=1}^n (y_i - \bar{y}) + n(\bar{y} - \mu_0)^2 \\
&= \sum_{i=1}^n (y_i - \bar{y})^2 + n(\bar{y} - \mu_0)^2.
\end{aligned}
$$
So we reject $H_0$ if
$$\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{\sum_{i=1}^n (y_i - \bar{y})^2 + n(\bar{y} - \mu_0)^2} \leq b.$$
Thus, rearranging, we reject $H_0$ if
$$1 + \frac{n(\bar{y} - \mu_0)^2}{\sum_{i=1}^n (y_i - \bar{y})^2} \geq c \quad\Rightarrow\quad \frac{n(\bar{y} - \mu_0)^2}{\frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2} \geq d,$$
where $c$ and $d$ are constants chosen to give significance level $\alpha$; that is, we can write
$$\alpha = P(\lambda(Y) \leq a \,|\, H_0) = P\left(\frac{n(\bar{Y} - \mu_0)^2}{S^2} \geq d \,\Big|\, H_0\right),$$
where $S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y})^2$.
2
To get d we need to work out the distribution of n(Y S−µ
2
0)
under the null hypothesis.
For $Y_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$ we have
$$\bar{Y} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
and so, under $H_0$,
$$\bar{Y} \sim N\!\left(\mu_0, \frac{\sigma^2}{n}\right) \quad\text{and}\quad \frac{\sqrt{n}\,(\bar{Y} - \mu_0)}{\sigma} \sim N(0, 1).$$
This gives
$$\frac{n(\bar{Y} - \mu_0)^2}{\sigma^2} \sim \chi^2_1.$$
Also, independently of $\bar{Y}$,
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Now we may use the fact that if $U$ and $V$ are independent rvs such that $U \sim \chi^2_{\nu_1}$ and $V \sim \chi^2_{\nu_2}$, then
$$\frac{U/\nu_1}{V/\nu_2} \sim F_{\nu_1, \nu_2}.$$
Hence, under $H_0$,
$$\frac{n(\bar{Y} - \mu_0)^2}{S^2} \sim F_{1, n-1},$$
so $d$ is the upper $\alpha$ point of the $F_{1, n-1}$ distribution.
Equivalently, since $F_{1, n-1} = t_{n-1}^2$, we reject $H_0$ if
$$\sqrt{\frac{n(\bar{y} - \mu_0)^2}{s^2}} \geq t_{n-1, \alpha/2},$$
that is, if
$$\frac{|\bar{y} - \mu_0|}{\sqrt{s^2/n}} \geq t_{n-1, \alpha/2}.$$
Of course, this is the usual two-sided t test.
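The derivation above implies the identity $\lambda(y) = \{1 + t^2/(n-1)\}^{-n/2}$, where $t$ is the usual t statistic, so $\lambda(y)$ is a monotone decreasing function of $|t|$. A minimal sketch checking this numerically; the sample values and null mean $\mu_0 = 5$ are made up for illustration, not taken from the text.

```python
import math

# Hypothetical sample and assumed null value mu0 (illustrative only)
y = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7]
n = len(y)
mu0 = 5.0

ybar = sum(y) / n
ss = sum((yi - ybar) ** 2 for yi in y)   # sum of squared deviations about ybar
s2 = ss / (n - 1)                        # sample variance S^2

# Generalized likelihood ratio for H0: mu = mu0 with sigma^2 unknown
lam = (ss / (ss + n * (ybar - mu0) ** 2)) ** (n / 2)

# Two-sided t statistic
t = (ybar - mu0) / math.sqrt(s2 / n)

# Identity from the derivation: lambda(y) = (1 + t^2/(n-1))^(-n/2)
check = (1 + t ** 2 / (n - 1)) ** (-n / 2)
print(lam, check)  # the two quantities agree
```

Because the map from $|t|$ to $\lambda$ is monotone, rejecting for small $\lambda(y)$ is exactly rejecting for large $|t|$, which is why the GLR test reduces to the two-sided t test.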
It can be shown that all of the standard tests in situations with normal distributions
are generalized likelihood ratio tests.
In more complex cases, we have to use the following approximation, due to Wilks, to find the critical region. The result is stated without proof: under regularity conditions, if $H_0$ is true, then as $n \to \infty$,
$$-2\log\{\lambda(Y)\} \xrightarrow{d} \chi^2_\nu,$$
where $\nu = p - p_0$ is the difference between the number of free parameters in $\Theta$ and in $\Theta^\star$. Thus, for large $n$, the critical region for a test with approximate significance level $\alpha$ is
$$R = \{y : -2\log\{\lambda(y)\} \geq \chi^2_{\nu, \alpha}\}.$$
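The quality of the $\chi^2$ approximation can be checked by simulation. The sketch below uses the normal-mean test, for which $-2\log\lambda = n\log\{1 + n(\bar{y}-\mu_0)^2/\sum(y_i-\bar{y})^2\}$ follows from the earlier derivation; the sample size, seed, and replication count are arbitrary choices, not from the text.

```python
import math
import random

# Monte Carlo check of Wilks' chi-squared approximation for the
# normal-mean GLR test (illustrative settings, chosen by us)
random.seed(1)
n, reps = 30, 20000
mu0, sigma = 0.0, 1.0
chi2_1_05 = 3.841  # chi^2_{1, 0.05}

exceed = 0
for _ in range(reps):
    y = [random.gauss(mu0, sigma) for _ in range(n)]
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)
    # -2 log lambda for the normal-mean test
    stat = n * math.log(1 + n * (ybar - mu0) ** 2 / ss)
    if stat >= chi2_1_05:
        exceed += 1

print(exceed / reps)  # close to the nominal level 0.05
```

For this test the exact null distribution is known (it is based on $t_{n-1}$), so the simulated rejection rate is slightly above 0.05 for moderate $n$, illustrating that the Wilks result is an approximation.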
We have seen that no UMP test exists for testing $H_0: \lambda = \lambda_0$ against $H_1: \lambda \neq \lambda_0$ when $Y_i$, $i = 1, \dots, n$, are iid Poisson($\lambda$) random variables. Now, the likelihood is
$$L(\lambda|y) = \frac{\lambda^{\sum_{i=1}^n y_i}\, e^{-n\lambda}}{\prod_{i=1}^n y_i!}.$$
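For this Poisson likelihood, the unrestricted MLE is $\hat\lambda = \bar{y}$, and substituting into the ratio gives the closed form $-2\log\lambda(y) = 2n\{\bar{y}\log(\bar{y}/\lambda_0) - (\bar{y} - \lambda_0)\}$. A sketch evaluating this statistic; the counts and $\lambda_0$ below are made-up values, not data from the text.

```python
import math

# Hypothetical Poisson counts and an assumed null value lambda0
y = [3, 1, 4, 2, 2, 5, 3, 0, 2, 3]
n = len(y)
lam0 = 2.0

ybar = sum(y) / n  # MLE of lambda over the full parameter space

# -2 log lambda(y) = 2 n [ ybar log(ybar/lam0) - (ybar - lam0) ]
stat = 2 * n * (ybar * math.log(ybar / lam0) - (ybar - lam0))

# Wilks: approximately chi-squared with 1 df under H0;
# reject at the 5% level if stat >= chi^2_{1,0.05} = 3.841
print(round(stat, 3), stat >= 3.841)
```

Here $\bar{y} = 2.5$ and the statistic is about 1.16, well below 3.841, so for these illustrative counts $H_0: \lambda = 2$ would not be rejected.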
Example 2.40. Suppose that $Y_i$, $i = 1, \dots, n$, are iid random variables with the probability mass function given by
$$P(Y = y) = \begin{cases} \theta_j, & \text{if } y = j,\ j = 1, 2, 3; \\ 0, & \text{otherwise,} \end{cases}$$
and consider testing the null hypothesis that the $\theta_j$ are all equal. Here the full parameter space $\Theta$ is two-dimensional because there are only two free parameters, i.e., $\theta_3 = 1 - \theta_1 - \theta_2$ and
$$\Theta = \{\theta = (\theta_1, \theta_2, \theta_3)^T : \theta_3 = 1 - \theta_1 - \theta_2,\ \theta_j \geq 0\}.$$
Hence $p = 2$.
The restricted parameter space is zero-dimensional, because under the null hypothesis all the parameters are equal, and as they sum to 1, they must all equal 1/3. Hence
$$\Theta^\star = \left\{\theta = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)^T\right\}$$
and so $p_0 = 0$ (zero unknown parameters). Thus the number of degrees of freedom of the $\chi^2$ distribution is $\nu = p - p_0 = 2$.
To calculate $\lambda(y)$ we need to find the MLE of $\theta$ in $\Theta$ and in the restricted space $\Theta^\star$.
MLE of $\theta$ in $\Theta$:
The likelihood function is
$$L(\theta|y) \propto \theta_1^{n_1} \theta_2^{n_2} \theta_3^{n_3},$$
where $n_j$ denotes the number of observations equal to $j$ (so $n_1 + n_2 + n_3 = n$), and hence the log-likelihood is, up to an additive constant,
$$\ell(\theta) = n_1 \log \theta_1 + n_2 \log \theta_2 + n_3 \log(1 - \theta_1 - \theta_2).$$
For $j = 1, 2$ we have
$$\frac{\partial \ell}{\partial \theta_j} = \frac{n_j}{\theta_j} - \frac{n_3}{1 - \theta_1 - \theta_2}.$$
Setting these derivatives to zero gives
$$\frac{\hat\theta_j}{n_j} = \frac{\hat\theta_3}{n_3} = \gamma, \quad\text{say}.$$
Since $\sum_j \hat\theta_j = 1$, we get $1 = \gamma \sum_j n_j = \gamma n$, so $\gamma = 1/n$ and $\hat\theta_j = n_j/n$. In $\Theta^\star$ every $\theta_j$ is fixed at $1/3$, so
$$\lambda(y) = \frac{\prod_{j=1}^3 (1/3)^{n_j}}{\prod_{j=1}^3 (n_j/n)^{n_j}} = \prod_{j=1}^3 \left(\frac{n}{3n_j}\right)^{n_j}.$$
That is,
$$-2\log\{\lambda(y)\} = -2\sum_{j=1}^3 n_j \log\left(\frac{n}{3n_j}\right).$$
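The statistic above is easy to evaluate once the category counts are tallied. A minimal sketch with made-up counts $n_1, n_2, n_3$ (an assumption for illustration, not data from the text):

```python
import math

# Hypothetical category counts n_j for j = 1, 2, 3
counts = [40, 28, 22]   # n_j = number of observations equal to j
n = sum(counts)         # total sample size n = 90

# -2 log lambda(y) = -2 * sum_j n_j log( n / (3 n_j) )
stat = -2 * sum(nj * math.log(n / (3 * nj)) for nj in counts)

# Wilks: approximately chi-squared with nu = p - p0 = 2 df;
# reject at the 5% level if stat >= chi^2_{2,0.05} = 5.991
print(round(stat, 3), stat >= 5.991)
```

For these counts the statistic is about 5.50, just below the 5% critical value 5.991, so equal probabilities would (narrowly) not be rejected.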
We now show that the usual test for association in contingency tables is a gen-
eralized likelihood ratio test and that its asymptotic distribution is an example of
Wilks’ theorem.
Suppose that we have an r × c contingency table in which there are Yij individuals
classified in row i and column j, when N individuals are classified independently.
Then the variables Yij have a multinomial distribution with parameters N and θij
for i = 1, 2, . . . , r and j = 1, 2, . . . , c.
                   Improved   No difference   Worse   r_i = sum_j y_ij
Placebo               16           20            9          45
Half dose             17           18           10          45
Full dose             26           20           14          60
c_j = sum_i y_ij      59           58           33        N = 150
We are interested in testing the hypothesis that the response to the drug does not
depend on the dose level.
The null hypothesis is that the row and column classifications are independent, and the alternative is that they are dependent. The usual test statistic is
$$X^2 = \sum_{i=1}^r \sum_{j=1}^c \frac{(Y_{ij} - E_{ij})^2}{E_{ij}},$$
where
$$E_{ij} = \frac{R_i C_j}{N}, \quad R_i = \sum_{j=1}^c Y_{ij} \quad\text{and}\quad C_j = \sum_{i=1}^r Y_{ij}.$$
Now consider the generalized likelihood ratio test. The likelihood is
$$L(\theta|y) = \frac{N!}{\prod_{i=1}^r \prod_{j=1}^c y_{ij}!} \prod_{i=1}^r \prod_{j=1}^c \theta_{ij}^{y_{ij}} = A \prod_{i=1}^r \prod_{j=1}^c \theta_{ij}^{y_{ij}},$$
where the coefficient $A$ is the number of ways that $N$ subjects can be divided into $rc$ groups with $y_{ij}$ in the $ij$-th group.
The log-likelihood is
$$\ell(\theta; y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log(\theta_{ij}).$$
We have to maximize this, subject to the constraint $\sum_{i=1}^r \sum_{j=1}^c \theta_{ij} = 1$.
Let $\Sigma'$ denote the sum over all pairs $(i, j)$ except $(r, c)$. Writing $\theta_{rc} = 1 - \Sigma' \theta_{ij}$ and setting the derivatives $\partial\ell/\partial\theta_{ij}$ to zero gives the unrestricted MLE $\hat\theta_{ij} = y_{ij}/N$. It follows that
$$\ell(\hat\theta|y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left(\frac{y_{ij}}{N}\right).$$
Under $H_0$ we may write $\theta_{ij} = a_i b_j$, where $\sum_{i=1}^r a_i = 1$ and $\sum_{j=1}^c b_j = 1$, so that the log-likelihood becomes
$$\ell = \log(A) + \sum_{i=1}^r r_i \log(a_i) + \sum_{j=1}^c c_j \log(b_j).$$
Substituting $a_r = 1 - \sum_{i=1}^{r-1} a_i$ and differentiating, for $i = 1, \dots, r-1$,
$$\frac{\partial\ell}{\partial a_i} = \frac{r_i}{a_i} - \frac{r_r}{a_r},$$
which, when set to zero, gives
$$\frac{r_i}{\hat{a}_i} = \frac{r_r}{\hat{a}_r}, \quad\text{or}\quad \frac{\hat{a}_i}{r_i} = \frac{\hat{a}_r}{r_r} = \gamma, \quad\text{say}.$$
However,
$$1 = \sum_{i=1}^r \hat{a}_i = \sum_{i=1}^r r_i \gamma = N\gamma,$$
so $\gamma = 1/N$ and $\hat{a}_i = r_i/N$. By the same argument, $\hat{b}_j = c_j/N$. Hence the restricted MLE is
$$\hat\theta_{ij}^0 = \hat{a}_i \hat{b}_j = \frac{r_i c_j}{N \cdot N} = e_{ij}/N$$
for $i = 1, 2, \dots, r$ and $j = 1, 2, \dots, c$, where $e_{ij} = r_i c_j/N$ is the observed value of $E_{ij}$.
It follows that
$$\ell(\hat\theta_0|y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left(\frac{e_{ij}}{N}\right).$$
Hence, we obtain
$$-2\log\{\lambda(y)\} = -2\log\left\{\frac{L(\hat\theta_0|y)}{L(\hat\theta|y)}\right\} = -2\{\ell(\hat\theta_0|y) - \ell(\hat\theta|y)\} = 2\sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left(\frac{y_{ij}}{e_{ij}}\right).$$
The test statistics $X^2$ and $-2\log\{\lambda(y)\}$ are asymptotically equivalent; that is, they differ by quantities that tend to zero as $N \to \infty$. To see this, first note that $y_{ij} - e_{ij}$ is small relative to $y_{ij}$ and $e_{ij}$ when $N$ is large. Thus, since $y_{ij} = e_{ij} + (y_{ij} - e_{ij})$, we may write (by a Taylor series expansion of $\log(1 + x)$ around $x = 0$, truncated after the second-order term)
$$\log\left(\frac{y_{ij}}{e_{ij}}\right) = \log\left(1 + \frac{y_{ij} - e_{ij}}{e_{ij}}\right) \simeq \frac{y_{ij} - e_{ij}}{e_{ij}} - \frac{1}{2}\,\frac{(y_{ij} - e_{ij})^2}{e_{ij}^2}.$$
Hence, we have
$$
\begin{aligned}
y_{ij} \log\left(\frac{y_{ij}}{e_{ij}}\right) &\simeq \{e_{ij} + (y_{ij} - e_{ij})\}\left\{\frac{y_{ij} - e_{ij}}{e_{ij}} - \frac{1}{2}\,\frac{(y_{ij} - e_{ij})^2}{e_{ij}^2}\right\} \\
&\simeq (y_{ij} - e_{ij}) + \frac{1}{2}\,\frac{(y_{ij} - e_{ij})^2}{e_{ij}}.
\end{aligned}
$$
Since $\sum_{i=1}^r \sum_{j=1}^c (y_{ij} - e_{ij}) = 0$, it follows that
$$-2\log\{\lambda(y)\} \simeq \sum_{i=1}^r \sum_{j=1}^c \frac{(y_{ij} - e_{ij})^2}{e_{ij}}.$$
For the data above, this gives
$$X^2_{\text{obs}} = \sum_{i=1}^r \sum_{j=1}^c \frac{(y_{ij} - e_{ij})^2}{e_{ij}} = 1.417.$$
The critical value is $\chi^2_{4; 0.05} = 9.488$; hence there is no evidence to reject the null hypothesis that the responses are independent of the dose levels of the new drug and placebo.
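Both statistics can be reproduced directly from the drug-response table given earlier. The sketch below computes the expected counts $e_{ij} = r_i c_j/N$, the Pearson statistic $X^2$, and the deviance $-2\log\lambda(y)$:

```python
import math

# Drug-response table from the text: rows = Placebo, Half dose, Full dose;
# columns = Improved, No difference, Worse
y = [[16, 20, 9],
     [17, 18, 10],
     [26, 20, 14]]

R = [sum(row) for row in y]         # row totals r_i = 45, 45, 60
C = [sum(col) for col in zip(*y)]   # column totals c_j = 59, 58, 33
N = sum(R)                          # N = 150

# Expected counts under independence: e_ij = r_i c_j / N
e = [[R[i] * C[j] / N for j in range(3)] for i in range(3)]

# Pearson chi-squared statistic
x2 = sum((y[i][j] - e[i][j]) ** 2 / e[i][j]
         for i in range(3) for j in range(3))

# Deviance: -2 log lambda(y) = 2 sum_ij y_ij log(y_ij / e_ij)
g2 = 2 * sum(y[i][j] * math.log(y[i][j] / e[i][j])
             for i in range(3) for j in range(3))

# Both fall far below chi^2_{4,0.05} = 9.488, so H0 is not rejected
print(round(x2, 3), round(g2, 2))  # 1.417 1.42
```

As the asymptotic-equivalence argument predicts, the two statistics (1.417 and 1.42) are almost identical here, and both lead to the same conclusion.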
We obtain the same conclusion using the critical region derived from Wilks' theorem, which is
$$R = \left\{y : 2\sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left(\frac{y_{ij}}{e_{ij}}\right) \geq \chi^2_{\nu; \alpha}\right\}.$$
Here
$$2\sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left(\frac{y_{ij}}{e_{ij}}\right) = 1.42.$$