
2.6.3 Generalized likelihood ratio tests

When a UMP test does not exist, we usually use a generalized likelihood ratio
test to test H0 : θ ∈ Θ⋆ against H1 : θ ∈ Θ \ Θ⋆. Unlike the methods above, it
can be used when H0 is composite.

The generalized likelihood ratio test has critical region R = {y : λ(y) ≤ a},
where

$$\lambda(y) = \frac{\max_{\theta \in \Theta^\star} L(\theta \mid y)}{\max_{\theta \in \Theta} L(\theta \mid y)}$$

is the generalized likelihood ratio and a is a constant chosen to give significance
level α, that is, such that

$$P(\lambda(Y) \le a \mid H_0) = \alpha.$$

If we let $\hat\theta$ denote the maximum likelihood estimate of θ and let $\hat\theta_0$ denote the
value of θ which maximises the likelihood over all values of θ in Θ⋆, then we
may write

$$\lambda(y) = \frac{L(\hat\theta_0 \mid y)}{L(\hat\theta \mid y)}.$$

The quantity $\hat\theta_0$ is called the restricted maximum likelihood estimate of θ under
H0.
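
When neither maximiser is available in closed form, λ(y) can be computed numerically by
optimising the log-likelihood twice. The sketch below is our own illustration, not part of
the notes: the exponential model, the composite null H0 : θ ∈ [1, 2] and the simulated data
are all assumptions chosen for the example.

```python
# Minimal numerical sketch of the generalized likelihood ratio (illustrative
# model and data): Y_i ~ Exponential(rate theta), composite null
# H0: theta in [1, 2], full parameter space theta > 0.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglik(theta, y):
    # Negative exponential log-likelihood: -(n log(theta) - theta * sum(y))
    return -(len(y) * np.log(theta) - theta * np.sum(y))

rng = np.random.default_rng(1)
y = rng.exponential(scale=1 / 2.5, size=50)      # true rate 2.5, outside H0

# Maximise the likelihood over the restricted space and over the full space.
restricted = minimize_scalar(neg_loglik, bounds=(1.0, 2.0),
                             args=(y,), method="bounded")
full = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0),
                       args=(y,), method="bounded")

lam = np.exp(full.fun - restricted.fun)          # lambda(y) <= 1 by construction
print(lam)                                       # small values are evidence against H0
```

The same two-optimisation pattern applies to any model: maximise the log-likelihood over
Θ⋆, maximise it over Θ, and exponentiate the difference.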

Example 2.38. Suppose that Yi ∼iid N(µ, σ²) and consider testing H0 : µ = µ0
against H1 : µ ≠ µ0. Then the likelihood is

$$L(\mu, \sigma^2 \mid y) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right\}.$$

The maximum likelihood estimate of θ = (µ, σ²)ᵀ is $\hat\theta = (\hat\mu, \hat\sigma^2)^T$, where $\hat\mu = \bar y$
and $\hat\sigma^2 = \sum_{i=1}^n (y_i - \bar y)^2 / n$.

Similarly, the restricted maximum likelihood estimate of θ = (µ, σ²)ᵀ under H0
is $\hat\theta_0 = (\mu_0, \hat\sigma_0^2)^T$, where $\hat\mu_0 = \mu_0$ and $\hat\sigma_0^2 = \sum_{i=1}^n (y_i - \mu_0)^2 / n$.

Thus, the generalized likelihood ratio is

$$\lambda(y) = \frac{L(\mu_0, \hat\sigma_0^2 \mid y)}{L(\hat\mu, \hat\sigma^2 \mid y)} = \frac{(2\pi\hat\sigma_0^2)^{-n/2} \exp\left\{ -\frac{1}{2\hat\sigma_0^2} \sum_{i=1}^n (y_i - \mu_0)^2 \right\}}{(2\pi\hat\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\hat\sigma^2} \sum_{i=1}^n (y_i - \bar y)^2 \right\}}$$

$$= \left( \frac{\hat\sigma^2}{\hat\sigma_0^2} \right)^{n/2} \exp\left\{ \frac{n \sum_{i=1}^n (y_i - \bar y)^2}{2 \sum_{i=1}^n (y_i - \bar y)^2} - \frac{n \sum_{i=1}^n (y_i - \mu_0)^2}{2 \sum_{i=1}^n (y_i - \mu_0)^2} \right\} = \left( \frac{\sum_{i=1}^n (y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \mu_0)^2} \right)^{n/2}.$$

Since the critical region is R = {y : λ(y) ≤ a}, we reject H0 if

$$\left( \frac{\sum_{i=1}^n (y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \mu_0)^2} \right)^{n/2} \le a \quad\Rightarrow\quad \frac{\sum_{i=1}^n (y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \mu_0)^2} \le b,$$

where a and b are constants chosen to give significance level α. Now, we may
write

$$\sum_{i=1}^n (y_i - \mu_0)^2 = \sum_{i=1}^n \{(y_i - \bar y) + (\bar y - \mu_0)\}^2 = \sum_{i=1}^n (y_i - \bar y)^2 + 2(\bar y - \mu_0) \sum_{i=1}^n (y_i - \bar y) + n(\bar y - \mu_0)^2 = \sum_{i=1}^n (y_i - \bar y)^2 + n(\bar y - \mu_0)^2,$$

since $\sum_{i=1}^n (y_i - \bar y) = 0$.

So we reject H0 if

$$\frac{\sum_{i=1}^n (y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \bar y)^2 + n(\bar y - \mu_0)^2} \le b.$$

Thus, rearranging, we reject H0 if

$$1 + \frac{n(\bar y - \mu_0)^2}{\sum_{i=1}^n (y_i - \bar y)^2} \ge c \quad\Rightarrow\quad \frac{n(\bar y - \mu_0)^2}{\frac{1}{n-1} \sum_{i=1}^n (y_i - \bar y)^2} \ge d,$$

where c and d are constants chosen to give significance level α; that is, we can
write

$$\alpha = P(\lambda(Y) \le a \mid H_0) = P\left( \frac{n(\bar Y - \mu_0)^2}{S^2} \ge d \;\Big|\; H_0 \right),$$

where $S^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar Y)^2$.

To get d we need to work out the distribution of $n(\bar Y - \mu_0)^2 / S^2$ under the null
hypothesis. For Yi ∼iid N(µ, σ²) we have

$$\bar Y \sim N\left( \mu, \frac{\sigma^2}{n} \right)$$

and so, under H0,

$$\bar Y \sim N\left( \mu_0, \frac{\sigma^2}{n} \right) \quad\text{and}\quad \frac{\sqrt{n}\,(\bar Y - \mu_0)}{\sigma} \sim N(0, 1).$$

This gives

$$\frac{n(\bar Y - \mu_0)^2}{\sigma^2} \sim \chi^2_1.$$

Also

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$

Now we may use the fact that if U and V are independent rvs such that $U \sim \chi^2_{\nu_1}$
and $V \sim \chi^2_{\nu_2}$, then

$$\frac{U/\nu_1}{V/\nu_2} \sim F_{\nu_1, \nu_2}.$$

Here, $U = n(\bar Y - \mu_0)^2 / \sigma^2$ and $V = (n-1)S^2 / \sigma^2$, which are independent because $\bar Y$
and $S^2$ are independent for normal samples. Hence, if H0 is true, we have

$$F = \frac{U/1}{V/(n-1)} = \frac{n(\bar Y - \mu_0)^2}{S^2} \sim F_{1, n-1}.$$

Therefore, we reject H0 at significance level α if

$$\frac{n(\bar y - \mu_0)^2}{s^2} \ge F_{1, n-1, \alpha},$$

where $F_{1, n-1, \alpha}$ is such that $P(F \ge F_{1, n-1, \alpha}) = \alpha$.

Equivalently, we reject H0 if

$$\sqrt{\frac{n(\bar y - \mu_0)^2}{s^2}} \ge t_{n-1, \alpha/2},$$

that is, if

$$\frac{|\bar y - \mu_0|}{\sqrt{s^2/n}} \ge t_{n-1, \alpha/2}.$$

Of course, this is the usual two-sided t test. □
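
The chain of equivalences above is easy to check numerically; the following minimal
sketch uses simulated data (the sample, µ0 and α are illustrative choices, not from
the notes):

```python
# Numerical check (illustrative data) that the GLRT for H0: mu = mu0 with
# unknown sigma^2 is the usual two-sided t test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(loc=0.4, scale=1.0, size=20)
n, mu0, alpha = len(y), 0.0, 0.05

ybar = y.mean()
s2 = y.var(ddof=1)                          # S^2, divisor n - 1
lam = (np.sum((y - ybar) ** 2) / np.sum((y - mu0) ** 2)) ** (n / 2)

F = n * (ybar - mu0) ** 2 / s2              # ~ F_{1,n-1} under H0
t = (ybar - mu0) / np.sqrt(s2 / n)          # t statistic; note F = t^2

reject_F = F >= stats.f.ppf(1 - alpha, 1, n - 1)
reject_t = abs(t) >= stats.t.ppf(1 - alpha / 2, n - 1)
print(lam, F, t ** 2, reject_F, reject_t)   # the two rejection rules always agree
```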

It can be shown that all of the standard tests used with normally distributed data
are generalized likelihood ratio tests.

2.6.4 Wilks’ theorem

In more complex cases, we have to use the following approximation to find the
critical region. The result is stated without proof.

Theorem 2.9 (Wilks’ theorem). Assume that the joint distribution of Y1, . . . , Yn
depends on p unknown parameters and that, under H0, the joint distribution depends
on p0 unknown parameters. Let ν = p − p0. Then, under some regularity conditions,
when the null hypothesis is true, the distribution of the statistic −2 log{λ(Y)}
converges to a χ²ν distribution as the sample size n → ∞, i.e., when H0 is true and
n is large,

$$-2 \log\{\lambda(Y)\} \overset{\text{approx.}}{\sim} \chi^2_\nu.$$

Thus, for large n, the critical region for a test with approximate significance level
α is

$$R = \{y : -2 \log\{\lambda(y)\} \ge \chi^2_{\nu, \alpha}\}.$$
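
In practice the critical value $\chi^2_{\nu,\alpha}$ is read from tables or computed directly; a
minimal sketch (ν and α are illustrative):

```python
# Upper-alpha critical value chi^2_{nu, alpha} (nu and alpha illustrative).
from scipy.stats import chi2

nu, alpha = 2, 0.05
print(chi2.ppf(1 - alpha, df=nu))   # about 5.991; reject H0 when -2 log lambda exceeds it
```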

Example 2.39. Suppose that Yi ∼iid Poisson(λ) and consider testing H0 : λ = λ0
against H1 : λ ≠ λ0. We have seen that no UMP test exists in this case. Now, the
likelihood is

$$L(\lambda \mid y) = \frac{\lambda^{\sum_{i=1}^n y_i} e^{-n\lambda}}{\prod_{i=1}^n y_i!}.$$

The maximum likelihood estimate of λ is $\hat\lambda = \bar y$ and the restricted maximum
likelihood estimate of λ under H0 is $\hat\lambda_0 = \lambda_0$. Thus, the generalized likelihood
ratio is

$$\lambda(y) = \frac{L(\lambda_0 \mid y)}{L(\hat\lambda \mid y)} = \frac{\lambda_0^{\sum_{i=1}^n y_i} e^{-n\lambda_0}}{\prod_{i=1}^n y_i!} \cdot \frac{\prod_{i=1}^n y_i!}{\bar y^{\sum_{i=1}^n y_i} e^{-n\bar y}} = \left( \frac{\lambda_0}{\bar y} \right)^{\sum_{i=1}^n y_i} e^{n(\bar y - \lambda_0)}.$$

It follows, since $\sum_{i=1}^n y_i = n\bar y$, that

$$-2 \log\{\lambda(y)\} = -2\left\{ n\bar y \log\left( \frac{\lambda_0}{\bar y} \right) + n(\bar y - \lambda_0) \right\} = 2n\left\{ \bar y \log\left( \frac{\bar y}{\lambda_0} \right) + \lambda_0 - \bar y \right\}.$$

Here, p = 1 and p0 = 0, and so ν = 1. Therefore, by Wilks’ theorem, when H0 is
true and n is large,

$$2n\left\{ \bar Y \log\left( \frac{\bar Y}{\lambda_0} \right) + \lambda_0 - \bar Y \right\} \overset{\text{approx.}}{\sim} \chi^2_1.$$

Hence, for a test with approximate significance level α, we reject H0 if and only
if

$$2n\left\{ \bar y \log\left( \frac{\bar y}{\lambda_0} \right) + \lambda_0 - \bar y \right\} \ge \chi^2_{1, \alpha}.$$
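
As a quick illustration, the test is a few lines of Python (the counts and λ0 below
are our illustrative assumptions, not from the notes):

```python
# Wilks-based GLRT of H0: lambda = lambda0 for iid Poisson counts
# (counts and lambda0 illustrative).
import numpy as np
from scipy.stats import chi2

y = np.array([3, 5, 2, 4, 6, 3, 4, 5, 2, 4])
lam0, alpha = 3.0, 0.05
n, ybar = len(y), y.mean()

W = 2 * n * (ybar * np.log(ybar / lam0) + lam0 - ybar)   # -2 log lambda(y)
print(W, W >= chi2.ppf(1 - alpha, df=1))                 # True means reject H0
```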


Example 2.40. Suppose that Yi, i = 1, . . . , n, are iid random variables with
probability mass function

$$P(Y = y) = \begin{cases} \theta_j, & \text{if } y = j,\ j = 1, 2, 3; \\ 0, & \text{otherwise,} \end{cases}$$

where the θj are unknown parameters such that θ1 + θ2 + θ3 = 1 and θj ≥ 0. Consider
testing

H0 : θ1 = θ2 = θ3 against H1 : H0 is not true.

We will use Wilks’ theorem to derive the critical region for testing this hypothesis
at an approximate significance level α.

Here the full parameter space Θ is two-dimensional because there are only two
free parameters, i.e., θ3 = 1 − θ1 − θ2 and

$$\Theta = \{\theta = (\theta_1, \theta_2, \theta_3)^T : \theta_3 = 1 - \theta_1 - \theta_2,\ \theta_j \ge 0\}.$$

Hence p = 2.

The restricted parameter space is zero-dimensional: under the null hypothesis all
the parameters are equal and, as they sum to 1, they must all equal 1/3. Hence

$$\Theta^\star = \left\{ \theta = \left( \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3} \right)^T \right\}$$

and so p0 = 0 (no unknown parameters). Thus the number of degrees of freedom
of the χ² distribution is ν = p − p0 = 2.

To calculate λ(y) we need to find the MLE of θ both in Θ and in the restricted space Θ⋆.

MLE of θ in Θ:
The likelihood function is

$$L(\theta; y) = \theta_1^{n_1} \theta_2^{n_2} (1 - \theta_1 - \theta_2)^{n_3},$$

where nj is the number of responses equal to j, j = 1, 2, 3. Then the log-likelihood
is

$$\ell(\theta; y) = n_1 \log(\theta_1) + n_2 \log(\theta_2) + n_3 \log(1 - \theta_1 - \theta_2).$$

For j = 1, 2 we have

$$\frac{\partial \ell}{\partial \theta_j} = \frac{n_j}{\theta_j} - \frac{n_3}{1 - \theta_1 - \theta_2}.$$

Setting these derivatives equal to zero gives

$$\frac{\hat\theta_j}{n_j} = \frac{\hat\theta_3}{n_3} = \gamma.$$

Now, since $\hat\theta_1 + \hat\theta_2 + \hat\theta_3 = 1$ we obtain γ = 1/n, where n = n1 + n2 + n3. Then the
estimates of the parameters are $\hat\theta_j = n_j / n$, j = 1, 2, 3.

The second derivatives are

$$\frac{\partial^2 \ell}{\partial \theta_j^2} = -\frac{n_j}{\theta_j^2} - \frac{n_3}{\theta_3^2}, \qquad \frac{\partial^2 \ell}{\partial \theta_j \, \partial \theta_i} = -\frac{n_3}{\theta_3^2} \quad (i \ne j).$$

The determinant of the matrix of second derivatives is positive for all θ and the
element $\partial^2 \ell / \partial \theta_1^2$ is negative for all θ, so also at $\hat\theta$. Hence there is a maximum
at $\hat\theta$, and

$$\text{MLE}(\theta_j) = \hat\theta_j = \frac{n_j}{n}.$$
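
As a sanity check, the closed-form MLE can be compared against a direct numerical
maximisation of the log-likelihood; the counts below are illustrative assumptions:

```python
# Sanity check (illustrative counts): direct numerical maximisation of the
# log-likelihood recovers theta_j = n_j / n.
import numpy as np
from scipy.optimize import minimize

counts = np.array([12, 9, 15])               # n1, n2, n3
n = counts.sum()

def neg_loglik(th):
    t1, t2 = th
    t3 = 1 - t1 - t2
    if t1 <= 0 or t2 <= 0 or t3 <= 0:        # keep theta inside the simplex
        return np.inf
    return -np.sum(counts * np.log([t1, t2, t3]))

res = minimize(neg_loglik, x0=[0.3, 0.3], method="Nelder-Mead")
print(res.x, counts[:2] / n)                 # numerical optimum matches n_j / n
```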

MLE of θ in Θ⋆:

$$L(\theta; y) = \left( \frac{1}{3} \right)^{n_1} \left( \frac{1}{3} \right)^{n_2} \left( \frac{1}{3} \right)^{n_3} = \left( \frac{1}{3} \right)^n.$$

That is, $\hat\theta_0 = \left( \frac{1}{3}, \frac{1}{3}, \frac{1}{3} \right)^T$.

Now we can calculate λ(y):

$$\lambda(y) = \frac{L(\hat\theta_0; y)}{L(\hat\theta; y)} = \frac{\left( \frac{1}{3} \right)^{n_1} \left( \frac{1}{3} \right)^{n_2} \left( \frac{1}{3} \right)^{n_3}}{\left( \frac{n_1}{n} \right)^{n_1} \left( \frac{n_2}{n} \right)^{n_2} \left( \frac{n_3}{n} \right)^{n_3}} = \left( \frac{n}{3n_1} \right)^{n_1} \left( \frac{n}{3n_2} \right)^{n_2} \left( \frac{n}{3n_3} \right)^{n_3}.$$

That is,

$$-2 \log\{\lambda(y)\} = -2 \sum_{j=1}^3 n_j \log\left( \frac{n}{3n_j} \right) = 2 \sum_{j=1}^3 n_j \log\left( \frac{3n_j}{n} \right).$$

We reject H0 at an approximate significance level α if the observed sample belongs
to the critical region R, where

$$R = \left\{ y : 2 \sum_{j=1}^3 n_j \log\left( \frac{3n_j}{n} \right) \ge \chi^2_{2, \alpha} \right\}.$$
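
A minimal sketch of this test (the counts are illustrative assumptions, not from
the notes):

```python
# GLRT of H0: theta1 = theta2 = theta3 = 1/3 (counts illustrative).
import numpy as np
from scipy.stats import chi2

counts = np.array([28, 40, 32])                    # n1, n2, n3
n, alpha = counts.sum(), 0.05

W = 2 * np.sum(counts * np.log(3 * counts / n))    # -2 log lambda(y)
print(W, W >= chi2.ppf(1 - alpha, df=2))           # True means reject H0
```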

2.6.5 Contingency tables

We now show that the usual test for association in contingency tables is a
generalized likelihood ratio test and that its asymptotic distribution is an example
of Wilks’ theorem.

Suppose that we have an r × c contingency table in which Yij individuals are
classified in row i and column j, where the N individuals are classified independently.

Let θij be the corresponding probability that an individual is classified in row i
and column j, so that θij ≥ 0 and $\sum_{i=1}^r \sum_{j=1}^c \theta_{ij} = 1$.

Then the variables Yij have a multinomial distribution with parameters N and θij
for i = 1, 2, . . . , r and j = 1, 2, . . . , c.

Example 2.41. In an experiment, 150 patients were allocated to three groups of
45, 45 and 60 patients. Two groups were given a new drug at different dose
levels and the third group received a placebo. The responses are as follows:

                              Improved   No difference   Worse   $r_i = \sum_{j=1}^c y_{ij}$
  Placebo                        16           20            9         45
  Half dose                      17           18           10         45
  Full dose                      26           20           14         60
  $c_j = \sum_{i=1}^r y_{ij}$    59           58           33      N = 150

We are interested in testing the hypothesis that the response to the drug does not
depend on the dose level. □

The null hypothesis is that the row and column classifications are independent,
and the alternative is that they are dependent.

More precisely, the null hypothesis is H0 : θij = ai bj for some ai > 0 and bj > 0,
with $\sum_{i=1}^r a_i = 1$ and $\sum_{j=1}^c b_j = 1$, and the alternative is H1 : θij ≠ ai bj for at
least one pair (i, j).

The usual test statistic is given by

$$X^2 = \sum_{i=1}^r \sum_{j=1}^c \frac{(Y_{ij} - E_{ij})^2}{E_{ij}},$$

where

$$E_{ij} = \frac{R_i C_j}{N},$$

with $R_i = \sum_{j=1}^c Y_{ij}$ and $C_j = \sum_{i=1}^r Y_{ij}$.

For large N, $X^2 \sim \chi^2_{(r-1)(c-1)}$ approximately when H0 is true, and so we reject
H0 at significance level α if $X^2 \ge \chi^2_{(r-1)(c-1), \alpha}$.
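
A minimal sketch computing Eij and X² for an arbitrary table of observed counts
(the helper name pearson_x2 is ours, not from the notes); the reference distribution
has (r − 1)(c − 1) degrees of freedom:

```python
# Pearson's X^2 for an r x c table of observed counts
# (the helper name pearson_x2 is ours, not from the notes).
import numpy as np

def pearson_x2(table):
    table = np.asarray(table, dtype=float)
    R = table.sum(axis=1, keepdims=True)     # row totals R_i, shape (r, 1)
    C = table.sum(axis=0, keepdims=True)     # column totals C_j, shape (1, c)
    E = R @ C / table.sum()                  # expected counts E_ij = R_i C_j / N
    return np.sum((table - E) ** 2 / E), E
```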

Now consider the generalized likelihood ratio test. The likelihood is

$$L(\theta \mid y) = \frac{N!}{\prod_{i=1}^r \prod_{j=1}^c y_{ij}!} \prod_{i=1}^r \prod_{j=1}^c \theta_{ij}^{y_{ij}} = A \prod_{i=1}^r \prod_{j=1}^c \theta_{ij}^{y_{ij}},$$

where the coefficient A is the number of ways that N subjects can be divided into
rc groups with yij in the ij-th group.

The log-likelihood is

$$\ell(\theta; y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log(\theta_{ij}).$$

We have to maximize this subject to the constraint $\sum_{i=1}^r \sum_{j=1}^c \theta_{ij} = 1$.

Let Σ′ represent the sum over all pairs (i, j) except (r, c). Then we may write

$$\ell(\theta \mid y) = \log(A) + \Sigma' y_{ij} \log(\theta_{ij}) + y_{rc} \log(1 - \Sigma' \theta_{ij}).$$

Thus, for (i, j) ≠ (r, c), solving the equation

$$\frac{\partial \ell}{\partial \theta_{ij}} = \frac{y_{ij}}{\theta_{ij}} - \frac{y_{rc}}{1 - \Sigma' \theta_{ij}} = \frac{y_{ij}}{\theta_{ij}} - \frac{y_{rc}}{\theta_{rc}} = 0$$

yields $\hat\theta_{ij}/y_{ij} = \hat\theta_{rc}/y_{rc} = \gamma$, say.

Since $\sum_{i=1}^r \sum_{j=1}^c \hat\theta_{ij} = \sum_{i=1}^r \sum_{j=1}^c y_{ij} \gamma = 1$, we have γ = 1/N, so that the
maximum likelihood estimate of θij is $\hat\theta_{ij} = y_{ij}/N$ for i = 1, 2, . . . , r and
j = 1, 2, . . . , c.

It follows that

$$\ell(\hat\theta \mid y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left( \frac{y_{ij}}{N} \right).$$

Now, under H0, we have θij = ai bj and so

$$\ell(\theta \mid y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \{\log(a_i) + \log(b_j)\} = \log(A) + \sum_{i=1}^r r_i \log(a_i) + \sum_{j=1}^c c_j \log(b_j),$$

where $r_i = \sum_{j=1}^c y_{ij}$ and $c_j = \sum_{i=1}^r y_{ij}$ are the observed row and column totals.
Now, we maximize this subject to the constraints $\sum_{i=1}^r a_i = 1$ and $\sum_{j=1}^c b_j = 1$.
That is, with Σ′ now denoting the sum over all but the last index, we maximize

$$\ell(\theta \mid y) = \log(A) + \Sigma' r_i \log(a_i) + r_r \log(a_r) + \Sigma' c_j \log(b_j) + c_c \log(b_c)$$
$$= \log(A) + \Sigma' r_i \log(a_i) + r_r \log(1 - \Sigma' a_i) + \Sigma' c_j \log(b_j) + c_c \log(1 - \Sigma' b_j).$$

Then,

$$\frac{\partial \ell}{\partial a_i} = \frac{r_i}{a_i} - \frac{r_r}{a_r},$$

which, when set equal to zero, gives

$$\frac{r_i}{\hat a_i} = \frac{r_r}{\hat a_r}, \quad\text{or}\quad \frac{\hat a_i}{r_i} = \frac{\hat a_r}{r_r} = \gamma.$$

However,

$$1 = \sum_{i=1}^r \hat a_i = \sum_{i=1}^r r_i \gamma = N\gamma.$$

Hence, γ = 1/N and so

$$\hat a_i = \frac{r_i}{N}.$$

Similarly, we obtain the ML estimates for the bj as $\hat b_j = c_j / N$.

Thus, the restricted maximum likelihood estimate of θij under H0 is

$$\hat\theta_{ij}^{\,0} = \hat a_i \hat b_j = \frac{r_i c_j}{N^2} = \frac{e_{ij}}{N},$$

where $e_{ij} = r_i c_j / N$ is the observed value of Eij, for i = 1, 2, . . . , r and
j = 1, 2, . . . , c.

It follows that

$$\ell(\hat\theta_0 \mid y) = \log(A) + \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left( \frac{e_{ij}}{N} \right).$$

Hence, we obtain

$$-2 \log\{\lambda(y)\} = -2 \log\left\{ \frac{L(\hat\theta_0 \mid y)}{L(\hat\theta \mid y)} \right\} = -2\{\ell(\hat\theta_0 \mid y) - \ell(\hat\theta \mid y)\} = 2 \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left( \frac{y_{ij}}{e_{ij}} \right).$$

Here, p = rc − 1 and p0 = (r − 1) + (c − 1) = r + c − 2, and so ν = p − p0 = (r − 1)(c − 1).

Therefore, by Wilks’ theorem, when H0 is true and N is large,

$$2 \sum_{i=1}^r \sum_{j=1}^c Y_{ij} \log\left( \frac{Y_{ij}}{E_{ij}} \right) \overset{\text{approx.}}{\sim} \chi^2_{(r-1)(c-1)}.$$

The test statistics X² and −2 log{λ(y)} are asymptotically equivalent; that is,
they differ by quantities that tend to zero as N → ∞.

To see this, first note that yij − eij is small relative to yij and eij when N is large.
Thus, since yij = eij + (yij − eij), we may write (by Taylor series expansion of
log(1 + x) around x = 0, truncated after the second-order term)

$$\log\left( \frac{y_{ij}}{e_{ij}} \right) = \log\left( 1 + \frac{y_{ij} - e_{ij}}{e_{ij}} \right) \simeq \frac{y_{ij} - e_{ij}}{e_{ij}} - \frac{1}{2} \frac{(y_{ij} - e_{ij})^2}{e_{ij}^2}.$$

Hence, we have

$$y_{ij} \log\left( \frac{y_{ij}}{e_{ij}} \right) \simeq \{e_{ij} + (y_{ij} - e_{ij})\} \left\{ \frac{y_{ij} - e_{ij}}{e_{ij}} - \frac{1}{2} \frac{(y_{ij} - e_{ij})^2}{e_{ij}^2} \right\} \simeq (y_{ij} - e_{ij}) + \frac{1}{2} \frac{(y_{ij} - e_{ij})^2}{e_{ij}}.$$

Since $\sum_{i=1}^r \sum_{j=1}^c (y_{ij} - e_{ij}) = 0$, it follows that

$$-2 \log\{\lambda(y)\} \simeq \sum_{i=1}^r \sum_{j=1}^c \frac{(y_{ij} - e_{ij})^2}{e_{ij}}.$$

So we now see why we use the X 2 test statistic.

Example 2.41 (continued). Now we test the hypothesis from the previous example.
The table of eij values is as follows:

              Improved   No difference   Worse
  Placebo        17.7         17.4         9.9
  Half dose      17.7         17.4         9.9
  Full dose      23.6         23.2        13.2

This gives

$$X^2_{\text{obs}} = \sum_{i=1}^r \sum_{j=1}^c \frac{(y_{ij} - e_{ij})^2}{e_{ij}} = 1.417.$$

The critical value is $\chi^2_{4; 0.05} = 9.488$; hence there is no evidence to reject the
null hypothesis that the response is independent of the treatment received (placebo
or either dose level of the new drug).

We reach the same conclusion using the critical region derived from Wilks’
theorem, which is

$$R = \left\{ y : 2 \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left( \frac{y_{ij}}{e_{ij}} \right) \ge \chi^2_{\nu; \alpha} \right\}.$$

Here

$$2 \sum_{i=1}^r \sum_{j=1}^c y_{ij} \log\left( \frac{y_{ij}}{e_{ij}} \right) = 1.42.$$
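
Both statistics are easy to reproduce from the table above; the data are those of
Example 2.41, and scipy's chi2_contingency provides an independent cross-check of X²:

```python
# Reproduce X^2 = 1.417 and -2 log lambda = 1.42 for the drug-trial table.
import numpy as np
from scipy.stats import chi2, chi2_contingency

obs = np.array([[16, 20, 9],
                [17, 18, 10],
                [26, 20, 14]], dtype=float)

R = obs.sum(axis=1, keepdims=True)               # row totals
C = obs.sum(axis=0, keepdims=True)               # column totals
E = R @ C / obs.sum()                            # the e_ij table above

X2 = np.sum((obs - E) ** 2 / E)                  # Pearson statistic, about 1.417
G = 2 * np.sum(obs * np.log(obs / E))            # -2 log lambda(y), about 1.42
crit = chi2.ppf(0.95, df=4)                      # 9.488

print(X2, G, crit)                               # neither statistic exceeds the critical value
print(chi2_contingency(obs, correction=False)[0])   # scipy's X^2 agrees
```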
