2024-11-22 Slides 10

8 Model assessment

8.1 General notions in hypothesis testing

8.2 Error types, size and power

8.3 Basic idea for constructing tests

8.4 The notion of p-value

8.5 Examples of tests

8.6 Most powerful and uniformly most powerful tests

8.7 Likelihood ratio test

© Marius Hofert Section 8 | p. 219


8.1 General notions in hypothesis testing
Motivation: After having estimated various models for given data, we need to
assess their goodness-of-fit, i.e. how well the fitted models actually fit the given
data, so that we can discard those in stark contrast to the data.
Setup: We have realizations x1 , . . . , xn of X1 , . . . , Xn ind.∼ F .
Goal: We want to test a hypothesis about F , i.e. a statement about F (or its
density/pmf f or parameter vector θ). We assume F ∈ F for a set of dfs F.
Graphical: Graphical assessments are tools to check a model’s fit graphically.
Example: A Q-Q plot displays {(F0⁻¹((i − 1/2)/n), x(i) ) = (F0⁻¹((i − 1/2)/n), Fn⁻¹(i/n)) :
i = 1, . . . , n} for a hypothesized F0 for F , so that if F0 ≈ F , the points of the
Q-Q plot lie approximately on the line y = x.
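The Q-Q plot points can be computed directly; a minimal sketch in Python, assuming F0 = N(0, 1) via statistics.NormalDist and a small hypothetical sample:

```python
from statistics import NormalDist

def qq_points(x, F0_inv):
    """Return the Q-Q plot points (F0^{-1}((i - 1/2)/n), x_(i)), i = 1, ..., n."""
    xs = sorted(x)                      # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    return [(F0_inv((i - 0.5) / n), xs[i - 1]) for i in range(1, n + 1)]

# Hypothetical sample; under F0 = N(0, 1) the points should hug y = x.
x = [-1.2, 0.3, -0.4, 1.1, 0.0, 0.8, -0.7, 1.9, -1.5, 0.4]
pts = qq_points(x, NormalDist().inv_cdf)
print(pts[0])   # smallest theoretical quantile paired with smallest observation
```

Plotting the pairs (e.g. with any plotting library) and overlaying y = x then gives the graphical assessment described above.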
Formal: A (hypothesis) test is a decision function (detailed later) that indicates
whether the data supports a given hypothesis or not. The setup is as follows:
▶ Partition F = F0 ⊎ F1 and describe a hypothesis via

H0 : F ∈ F0 vs H1 : F ∈ F1 ,



where H0 is the null hypothesis (what we are interested in testing and
rejecting) and H1 the alternative (hypothesis) (what H0 is tested against).
Note: The term “null” historically comes from “no relationship” between
quantities tested for linear dependence, e.g. H0 : ρ = 0 vs H1 : ρ ̸= 0.
▶ In a parametric hypothesis test, we assume F = {F (·; θ) : θ ∈ Θ ⊆ Rp },
Θ = Θ0 ⊎ Θ1 , and typically describe the hypotheses via
H0 : θ ∈ Θ 0 vs H1 : θ ∈ Θ 1 .
▶ H0 (H1 ) is simple if |H0 | = 1 (|H1 | = 1), otherwise composite.
Decision: The decision of a hypothesis test is to “reject H0 ” or “not reject H0 ”
(not: “accept H0 ”), so can be formulated in terms of the decision function
φ(x) := 1{x∈C} , x = (x1⊤ , . . . , xn⊤ )⊤ ∈ Rn×d ,
for some critical region C (boundary values of C are the critical values).
Power function: The power function is the probability of rejecting H0 , so
π(F ) := P(φ(X) = 1) = P(X ∈ C), F ∈ F = F0 ⊎ F1 .

8.2 Error types, size and power


A test’s decision leads to precisely one of four possible outcomes:
Outcome (probability)   | H0 is rejected (x ∈ C)                       | H0 is not rejected (x ∈ C^c)
H0 is true (F ∈ F0 )    | Type I error (α) (false positive conclusion) | No error (1 − α) (specificity)
H1 is true (F ∈ F1 )    | No error (1 − β) (sensitivity)               | Type II error (β) (false negative conclusion)

The type I error is the error of incorrectly rejecting H0 .


▶ Its probability is π(F )|F ∈F0 , the size of the test.
▶ Similar to the non-coverage probability of CIs, the size of the test is controlled
by a small probability, the significance level α ∈ [0, 1].
▶ If supF ∈F0 π(F ) = α (≤ α), the test is a size α test (level α test).
▶ A level α test with supF ∈F0 π(F ) < α is conservative.
The probability of correctly rejecting H0 is π(F )|F ∈F1 , the power of the test.
If limn→∞ π(F ) = 1 ∀ F ∈ F1 , the test is consistent.
The type II error is the error of incorrectly not rejecting H0 .
▶ Its probability is β = 1 − π(F )|F ∈F1 .
Only one error type can be controlled (one takes α, typically 5%).
Goal: Construct size α tests with maximal power (minimal β).
8.3 Basic idea for constructing tests
1) Fix α ∈ (0, 1) (typically 5%) and find a test statistic
Tn = Tn (X)
that is a pivotal quantity under H0 , so whose df FTn does not depend on
unknown parameters under H0 , at least for n → ∞ (for an asymptotic test).
2) Determine critical values cα,1 , cα,2 (possibly ±∞) such that
PH0 (cα,1 ≤ Tn ≤ cα,2 ) ≥ 1 − α
(under H0 , so for F ∈ F0 , typically θ ∈ Θ0 ).
3) Based on x and Tn (x), the test’s critical region is then
Cα = {x : cα,1 ≤ Tn (x) ≤ cα,2 }c = {x : Tn (x) < cα,1 or Tn (x) > cα,2 },
so φ(x) = 1{Tn (x)∉[cα,1 ,cα,2 ]} (reject H0 iff Tn (x) ∉ [cα,1 , cα,2 ]).

Note: If cα,2 = ∞ (cα,1 = −∞), the test is a left-tailed (right-tailed) test, and if
−∞ < cα,1 , cα,2 < ∞, the test is a two-tailed test (i.e. the critical region lies in
the respective tail(s) of FTn ).
Example 8.1 (Identifying φ, π, α, β)
Let θ ∈ (0, 1) be the probability of recovery after treatment with a new medication
and X1 , . . . , Xn ind.∼ B(1, θ) the recovery indicators of n patients. The manufacturer
of the medication tests H0 : θ = θ0 vs H1 : θ = θ1 for some 0 < θ0 < θ1 < 1 with
test statistic Tn = Tn (X) = ∑_{i=1}^n Xi and critical region C = {x : Tn (x) > c} for
some critical value c ∈ {1, . . . , n}. Identify the test's φ, π, α and β.
Solution. φ(x) = 1{x∈C} = 1{Tn (x)>c} , π(θ) = P(φ(X) = 1) = P(X ∈
C) = P(Tn (X) > c) = F̄B(n,θ) (c), α = supθ∈Θ0 π(θ) = π(θ)|θ=θ0 = π(θ0 ) =
F̄B(n,θ0 ) (c) and β = 1 − π(θ)|θ=θ1 = FB(n,θ1 ) (c).
α fixed, θ runs: If θ is large, then Tn is expected to be large (since E(Tn (X)) =
nθ), so under H1 we need to reject H0 more often ⇒ C = {x : Tn (x) > c}
makes sense here.
θ fixed, α runs: A larger α should lead to a smaller c (higher rejection probability
under H0 ), so a larger C and thus a larger φ. In short, 0 < α1 < α2 < 1 ⇒
cα1 > cα2 ⇒ Cα1 ⊆ Cα2 ⇒ 0 ≤ φα1 ≤ φα2 ≤ 1.
For the test to keep its level α, i.e. for supF ∈F0 π(F ) = π(θ0 ) ≤ α here, c cannot
be too small. So c = cα , thus C = Cα and φ = φα all depend on α.
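The size α and type II error probability β from the solution can be evaluated exactly; a sketch with illustrative (hypothetical) values n = 20, θ0 = 0.5, θ1 = 0.7 and c = 13:

```python
from math import comb

def sf_binom(n, theta, c):
    """Survival function F_bar_B(n,theta)(c) = P(T_n > c) for T_n ~ B(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(c + 1, n + 1))

n, theta0, theta1, c = 20, 0.5, 0.7, 13      # illustrative values
alpha = sf_binom(n, theta0, c)               # size: P_{theta0}(T_n > c)
beta = 1 - sf_binom(n, theta1, c)            # type II error: P_{theta1}(T_n <= c)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Raising c decreases α but increases β, illustrating that only one error type can be controlled.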
8.4 The notion of p-value
Typically, φα is monotone in α, so φα1 ≤ φα2 ∀ 0 < α1 < α2 < 1 (the smaller
the rejection prob. α, the less often H0 is rejected). Instead of reporting the test
decision φα (x), one can then report the smallest level at which H0 is still rejected.

Definition 8.2 (P-value)


The p-value of a test is p(x) := inf{α ∈ (0, 1) : φα (x) = 1}, i.e. the smallest
level at which the test still rejects H0 .

For given α, reject H0 iff p(x) ≤ α. This gives a test’s decision for all α and is
thus frequently reported as test result (not just whether H0 was rejected).

Proposition 8.3 (Characterisation)


With FTn symmetric about 0 in the two-tailed case, one has p(x) =
PH0 (“Tn (X) is at least as extreme as Tn (x)”).

Proof. We consider the three forms of tests separately:


1) For a left-tailed test, reject H0 iff Tn (x) < cα = FTn⁻¹(α).
Then φα (x) = 1{x∈Cα } for Cα = {x̃ : Tn (x̃) < FTn⁻¹(α)}.
The p-value is p(x) = inf{α ∈ (0, 1) : φα (x) = 1} = inf{α ∈ (0, 1) :
Tn (x) < FTn⁻¹(α)} = inf{α ∈ (0, 1) : FTn (Tn (x)) < α} (FTn right-continuous)
= FTn (Tn (x)) = PH0 (Tn (X) ≤ Tn (x)).
2) For a right-tailed test, reject H0 iff Tn (x) > cα = FTn⁻¹(1 − α).
Then φα (x) = 1{x∈Cα } for Cα = {x̃ : Tn (x̃) > FTn⁻¹(1 − α)}.
The p-value is p(x) = inf{α ∈ (0, 1) : φα (x) = 1} = inf{α ∈ (0, 1) :
Tn (x) > FTn⁻¹(1 − α)} = inf{α ∈ (0, 1) : FTn (Tn (x)−) > 1 − α} (FTn (·−)
left-continuous) = inf{α ∈ (0, 1) : α > 1 − FTn (Tn (x)−)} = 1 − FTn (Tn (x)−) =
1 − PH0 (Tn (X) < Tn (x)) = PH0 (Tn (X) ≥ Tn (x)).
3) For a two-tailed test, reject H0 iff Tn (x) < cα,1 = FTn⁻¹(α/2) or Tn (x) > cα,2 =
FTn⁻¹(1 − α/2). If FTn is symmetric about 0, reject H0 iff |Tn (x)| > FTn⁻¹(1 − α/2).
Then φα (x) = 1{x∈Cα } for Cα = {x̃ : Tn (x̃) < FTn⁻¹(α/2) or Tn (x̃) >
FTn⁻¹(1 − α/2)}.
The p-value is p(x) = inf{α ∈ (0, 1) : φα (x) = 1} = inf{α ∈ (0, 1) :
Tn (x) < FTn⁻¹(α/2) or Tn (x) > FTn⁻¹(1 − α/2)} = (as in 1) and 2)) inf{α ∈ (0, 1) :
α/2 > FTn (Tn (x)) or α/2 > 1 − FTn (Tn (x)−)} = inf{α ∈ (0, 1) : α > 2 min{
FTn (Tn (x)), 1 − FTn (Tn (x)−)}} = 2 min{FTn (Tn (x)), 1 − FTn (Tn (x)−)}
= 2 min{PH0 (Tn (X) ≤ Tn (x)), PH0 (Tn (X) ≥ Tn (x))}. If FTn is symmetric
about 0, then p(x) = PH0 (|Tn (X)| ≥ |Tn (x)|).

Remark 8.4 (Uniformity)


One can show that if φα is a size α test ∀ α ∈ (0, 1), then p(X) ∼ U(0, 1). For
continuous FTn (most cases), we can directly verify this via the proof of P. 8.3:
1) For a left-tailed test, p(X) = FTn (Tn (X)) ∼ U(0, 1) (probability transform).
2) For a right-tailed test, p(X) = 1 − FTn (Tn (X)−) = 1 − FTn (Tn (X)) (continuity),
and 1 − U ∼ U(0, 1) for U = FTn (Tn (X)) ∼ U(0, 1).
3) For a two-tailed test, p(X) = 2 min{FTn (Tn (X)), 1 − FTn (Tn (X)−)} =
2 min{U, 1 − U } ∼ U(0, 1) for U = FTn (Tn (X)) ∼ U(0, 1).
This can be used to check a test's correctness (e.g. if the test is done based on
resampling or otherwise approximate).
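This uniformity can be checked by simulation; a sketch for the two-tailed z-test of H0 : µ = 0 with known σ = 1 (sample size, repetition count and seed are arbitrary choices):

```python
import random
from statistics import NormalDist, mean

random.seed(42)                 # fixed seed for reproducibility
Phi = NormalDist().cdf
n, reps = 30, 2000

pvals = []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]     # data generated under H0
    t = (n ** 0.5) * mean(x) / 1.0                 # T_n = sqrt(n) * xbar / sigma
    pvals.append(2 * min(Phi(t), 1 - Phi(t)))      # two-tailed p-value

# Under H0, p(X) ~ U(0, 1): mean near 0.5, rejection rate at level 5% near 0.05.
print(mean(pvals), sum(p <= 0.05 for p in pvals) / reps)
```

A histogram of pvals should look approximately flat on (0, 1); systematic deviations would indicate a bug or a badly calibrated (approximate) test.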



8.5 Examples of tests
As with the construction of confidence intervals, asymptotic tests are easier to
construct than exact tests.
We will again see the power of the CLT and Slutsky’s theorem.

Example 8.5 (Asymptotic test of the mean for Poi(λ))


Let X1 , . . . , Xn ind.∼ Poi(λ) for unknown λ > 0. Provide a test of H0 : λ = λ0 vs
H1 : λ ̸= λ0 for a given λ0 .

Solution.
In this case Θ = {λ : λ > 0} = Θ0 ⊎ Θ1 for Θ0 = {λ0 } and Θ1 = (0, ∞)\{λ0 }.
Under H0 , Tn = √n (X̄n − λ0 )/√X̄n →d N(0, 1) as n → ∞ (by the CLT and
Slutsky's theorem).
For sufficiently large n, the critical region is thus Cα = {x : |Tn (x)| > z1−α/2 =
Φ⁻¹(1 − α/2)}.
Since var(X1 ) = λ = λ0 under H0 , we could have also considered Tn = √n (X̄n − λ0 )/√λ0 .
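The resulting test can be sketched as a small function (the count data below are hypothetical):

```python
from statistics import NormalDist, mean

def poisson_mean_test(x, lam0, alpha=0.05):
    """Asymptotic two-tailed test of H0: lambda = lam0 based on
    T_n = sqrt(n) (xbar - lam0) / sqrt(xbar); reject iff |T_n| > z_{1-alpha/2}."""
    n, xbar = len(x), mean(x)
    t = (n ** 0.5) * (xbar - lam0) / xbar ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return t, abs(t) > z

x = [3, 1, 4, 2, 2, 5, 3, 0, 2, 3, 4, 1, 2, 3, 2, 4, 1, 3, 2, 2]  # hypothetical counts
t, reject = poisson_mean_test(x, lam0=2.0)
print(f"T_n = {t:.3f}, reject H0: {reject}")
```

Replacing `xbar ** 0.5` by `lam0 ** 0.5` gives the variant mentioned at the end of the example.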



Example 8.6 (Asymptotic test of the mean for B(m, p) for known m)
Let X1 , . . . , Xn ind.∼ B(m, p) for known m and unknown p ∈ (0, 1). Provide a test
of H0 : p ≤ p0 vs H1 : p > p0 for a given p0 .

Solution. Even if we don't know p under H0 (composite case), we still consider
Tn (X) = √n (X̄n − mp0 )/√(m (X̄n /m)(1 − X̄n /m))
and, for large n, Cα = {x : Tn (x) > z1−α = Φ⁻¹(1 − α)}, because this still gives
a size α test, as the power (in terms of p) is
π(p) = P(Tn (X) > z1−α )
= P(√n (X̄n − mp0 )/√(m (X̄n /m)(1 − X̄n /m)) > z1−α )
= P(√n (X̄n − mp)/√(m (X̄n /m)(1 − X̄n /m)) > z1−α + √n m(p0 − p)/√(m (X̄n /m)(1 − X̄n /m)))
≤ P(√n (X̄n − mp)/√(m (X̄n /m)(1 − X̄n /m)) > z1−α )   (the added term is ≥ 0 under H0 )
→ 1 − Φ(z1−α ) = α   as n → ∞ (CLT, Slutsky: the left-hand side is asymptotically N(0, 1)),
with equality if p = p0 . Similarly for H0 : p ≥ p0 vs H1 : p < p0 (z1−α ← zα ).
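A sketch of this one-sided test (m, p0 and the data are illustrative):

```python
from statistics import NormalDist, mean

def binom_mean_test(x, m, p0, alpha=0.05):
    """Asymptotic test of H0: p <= p0 vs H1: p > p0 for X_i ~ B(m, p);
    reject iff T_n > z_{1-alpha}."""
    n, xbar = len(x), mean(x)
    phat = xbar / m                                   # plug-in estimate of p
    t = (n ** 0.5) * (xbar - m * p0) / (m * phat * (1 - phat)) ** 0.5
    return t, t > NormalDist().inv_cdf(1 - alpha)

# Hypothetical data with m = 10 trials per observation
x = [7, 5, 6, 8, 6, 7, 5, 6, 7, 6, 8, 7, 6, 5, 7, 6, 7, 8, 6, 7]
t, reject = binom_mean_test(x, m=10, p0=0.5)
print(f"T_n = {t:.3f}, reject H0: {reject}")
```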


Remark 8.7 (Exact tests for iid normal data)
Let X1 , . . . , Xn ind.∼ N(µ, σ²).

1) Tests H0 : µ = µ0 vs H1 : µ ̸= µ0 , H1′ : µ < µ0 or H1′′ : µ > µ0 .

σ² known:
▶ Pivot: √n (X̄n − µ)/σ ∼ N(0, 1).
▶ Test statistic: Tn = √n (X̄n − µ0 )/σ ∼ N(0, 1) under H0 (symmetric about 0).
▶ Critical region, p-value:
H1 : Cα = {|Tn (x)| > z1−α/2 }, p(x) = PH0 (|Tn (X)| ≥ |Tn (x)|) =
PH0 (Tn²(X) ≥ Tn²(x)) = 1 − Fχ²₁ (Tn²(x)) (since Tn²(X) ∼ χ²₁ under H0 ).
H1′ : Cα = {Tn (x) < zα = −z1−α }, p(x) = PH0 (Tn (X) ≤ Tn (x)) = Φ(Tn (x)).
H1′′ : Cα = {Tn (x) > z1−α }, p(x) = PH0 (Tn (X) ≥ Tn (x)) = Φ̄(Tn (x)).

σ² unknown:
▶ Pivot: √n (X̄n − µ)/Ŝn ∼ tn−1 .
▶ Test statistic: Tn = √n (X̄n − µ0 )/Ŝn ∼ tn−1 under H0 (symmetric about 0).
▶ Critical region, p-value:
H1 : Cα = {|Tn (x)| > Ftn−1⁻¹(1 − α/2)}, p(x) = P(|Tn (X)| ≥ |Tn (x)|) =
2P(Tn (X) ≥ |Tn (x)|) = 2(1 − Ftn−1 (|Tn (x)|)).
H1′ : Cα = {Tn (x) < Ftn−1⁻¹(α) = −Ftn−1⁻¹(1 − α)}, p(x) = P(Tn (X) ≤
Tn (x)) = Ftn−1 (Tn (x)).
H1′′ : Cα = {Tn (x) > Ftn−1⁻¹(1 − α)}, p(x) = P(Tn (X) ≥ Tn (x)) = F̄tn−1 (Tn (x)).

2) Test H0 : σ² = σ0² vs H1 : σ² > σ0² (others possible).

µ known:
▶ Pivot: ∑_{i=1}^n ((Xi − µ)/σ)² ∼ χ²n .
▶ Test statistic: Tn = ∑_{i=1}^n ((Xi − µ)/σ0 )² ∼ χ²n under H0 .
▶ Critical region, p-value: H1 : Cα = {Tn (x) > Fχ²n⁻¹(1 − α)}, p(x) =
P(Tn (X) ≥ Tn (x)) = F̄χ²n (Tn (x)).

µ unknown:
▶ Pivot: (n − 1)Ŝn²/σ² ∼ χ²n−1 .
▶ Test statistic: Tn = (n − 1)Ŝn²/σ0² ∼ χ²n−1 under H0 .
▶ Critical region, p-value: H1 : Cα = {Tn (x) > Fχ²n−1⁻¹(1 − α)}, p(x) =
P(Tn (X) ≥ Tn (x)) = F̄χ²n−1 (Tn (x)).
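The σ²-known z-test from 1) can be sketched with only the standard library (statistics.NormalDist supplies Φ and Φ⁻¹; the data, µ0 and σ are hypothetical):

```python
from statistics import NormalDist, mean

def z_test(x, mu0, sigma, alpha=0.05):
    """Exact two-tailed test of H0: mu = mu0 for N(mu, sigma^2) with known sigma.
    Returns T_n, the p-value P_{H0}(|T_n(X)| >= |T_n(x)|) and the decision."""
    n = len(x)
    t = (n ** 0.5) * (mean(x) - mu0) / sigma
    Phi = NormalDist().cdf
    p = 2 * min(Phi(t), 1 - Phi(t))     # = P(|Z| >= |t|) for Z ~ N(0, 1)
    return t, p, p <= alpha

x = [0.9, 1.4, 0.2, 1.1, 0.7, 1.6, 0.5, 1.0, 1.3, 0.8]  # hypothetical data
t, p, reject = z_test(x, mu0=0.5, sigma=1.0)
print(f"T_n = {t:.3f}, p = {p:.4f}, reject: {reject}")
```

The t- and χ²-based tests follow the same pattern but need quantiles of tn−1 and χ², which are not in the standard library (a stats package or a table would supply them).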
Remark 8.8 (Exact tests for two independent iid normal datasets)
For j = 1, 2, let X1,j , . . . , Xnj ,j ind.∼ N(µj , σj²) be independent.

1) Tests H0 : µ1 − µ2 = µ0 vs H1 : µ1 − µ2 ̸= µ0 (others possible).

σ1², σ2² known:
▶ Pivot: ((X̄n1 ,1 − X̄n2 ,2 ) − (µ1 − µ2 ))/σn1 ,n2 ∼ N(0, 1) for σ²n1 ,n2 = σ1²/n1 + σ2²/n2 .
▶ Test statistic: Tn1 ,n2 = ((X̄n1 ,1 − X̄n2 ,2 ) − µ0 )/σn1 ,n2 ∼ N(0, 1) under H0
(symmetric about 0).
▶ Critical region, p-value: Cα = {|Tn1 ,n2 (x1 , x2 )| > z1−α/2 }, p(x1 , x2 ) =
PH0 (|Tn1 ,n2 (X1 , X2 )| ≥ |Tn1 ,n2 (x1 , x2 )|) = 1 − Fχ²₁ (T²n1 ,n2 (x1 , x2 )) (as in R. 8.7 1)).

σ1² = σ2² =: σ² unknown:
▶ Pivot: ((X̄n1 ,1 − X̄n2 ,2 ) − (µ1 − µ2 ))/Ŝn1 ,n2 ∼ tn1 +n2 −2 for
Ŝ²n1 ,n2 = ((n1 − 1)Ŝ²n1 ,1 + (n2 − 1)Ŝ²n2 ,2 )/(n1 + n2 − 2) · (1/n1 + 1/n2 ).
▶ Test statistic: Tn1 ,n2 = ((X̄n1 ,1 − X̄n2 ,2 ) − µ0 )/Ŝn1 ,n2 ∼ tn1 +n2 −2 under H0
(symmetric about 0).
▶ Critical region, p-value: Cα = {|Tn1 ,n2 (x1 , x2 )| > Ftn1 +n2 −2⁻¹(1 − α/2)},
p(x1 , x2 ) = PH0 (|Tn1 ,n2 (X1 , X2 )| ≥ |Tn1 ,n2 (x1 , x2 )|) = 2(1 −
Ftn1 +n2 −2 (|Tn1 ,n2 (x1 , x2 )|)) (as in R. 8.7 1)).

2) Tests H0 : σ1² = σ2² vs H1 : σ1² ̸= σ2² (others possible) for unknown µ1 , µ2 ,
based on the pivot (Ŝ²n1 ,1 /σ1²)/(Ŝ²n2 ,2 /σ2²) ∼ Fn1 −1,n2 −1 (R. 6.33) with test
statistic Tn1 ,n2 = Ŝ²n1 ,1 /Ŝ²n2 ,2 ∼ Fn1 −1,n2 −1 under H0 .

The pivots in R. 8.7 and 8.8 were the same as in R. 6.32 and 6.33.
As we can see from these remarks, the (1 − α)-CIs are the complement of the
critical regions of the respective two-tailed tests.



Example 8.9 (Asymptotic Welch’s t-test of equality of two means)
The test of the mean difference in R. 8.8 1) can be extended to arbitrary
distributions with finite variances, at the cost of it being asymptotic.
For j = 1, 2, let X1,j , . . . , Xnj ,j ind.∼ Fj be all independent with mean µj and
unknown, but finite, second moment. Suppose we are interested in testing
H0 : µ1 − µ2 = 0 vs H1 : µ1 − µ2 ̸= 0 (here: µ0 = 0; others possible).
As test statistic, one uses
Tn1 ,n2 = Tn1 ,n2 (X·,1 , X·,2 ) = (X̄n1 ,1 − X̄n2 ,2 )/√(Ŝ²n1 ,1 /n1 + Ŝ²n2 ,2 /n2 )
(a sample version of the first test statistic in R. 8.8 1)). If nk > 5, k = 1, 2,
Tn1 ,n2 is approximately tν distributed, where
ν = (Ŝ²n1 ,1 /n1 + Ŝ²n2 ,2 /n2 )² / ((Ŝ²n1 ,1 /n1 )²/ν1 + (Ŝ²n2 ,2 /n2 )²/ν2 )
for νk = nk − 1, k = 1, 2.
The critical region is therefore Cα = {(x·,1 , x·,2 ) : |Tn1 ,n2 (x·,1 , x·,2 )| > Ftν⁻¹(1 −
α/2)}, so reject H0 iff |Tn1 ,n2 (x·,1 , x·,2 )| > Ftν⁻¹(1 − α/2).
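The Welch statistic and the degrees of freedom ν can be computed directly from two samples; a minimal sketch (samples are hypothetical; the final comparison against Ftν⁻¹(1 − α/2) would need a t-quantile table or a stats library):

```python
from statistics import mean, variance

def welch(x1, x2):
    """Welch's t-statistic and Welch–Satterthwaite degrees of freedom nu."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1), variance(x2)          # unbiased sample variances S^2
    se2 = v1 / n1 + v2 / n2
    t = (mean(x1) - mean(x2)) / se2 ** 0.5
    nu = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, nu

x1 = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7]    # hypothetical samples
x2 = [4.2, 4.5, 4.1, 4.8, 4.3, 4.6, 4.0, 4.4, 4.7, 4.2]
t, nu = welch(x1, x2)
print(f"T = {t:.3f}, nu = {nu:.1f}")   # compare |T| with F_t_nu^{-1}(1 - alpha/2)
```

Note that ν always lies between min(n1, n2) − 1 and n1 + n2 − 2, so it interpolates between the conservative and the pooled choices of degrees of freedom.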



8.6 Most powerful and uniformly most powerful tests
Question: How can we construct size α tests with maximal power (minimal β)?

A size (level) α test is a uniformly most powerful (UMP) test if π(F ) ≥ π ′ (F )


∀ F ∈ F1 , ∀ power functions π ′ of size (level) α tests.

Theorem 8.10 (Neyman–Pearson lemma (NPL))


Let X1 , . . . , Xn ind.∼ f (·; θ) (density or pmf), θ ∈ {θ0 , θ1 }, and consider H0 : θ =
θ0 vs H1 : θ = θ1 with critical region Cα , α ∈ [0, 1], satisfying, for some η ≥ 0,
(i) π(θ0 ) = PH0 (X ∈ Cα ) = α (size α test);
(ii) fX (x; θ1 ) > ηfX (x; θ0 ) for a.e. x ∈ Cα ; and
(iii) fX (x; θ1 ) < ηfX (x; θ0 ) for a.e. x ∈ Cαc .
Then:
1) Sufficiency/existence: Any test satisfying (i)–(iii) is a UMP test among all
level α tests.
2) Necessity/uniqueness: If there exists a test φα satisfying (i)–(iii) for some
η > 0, then every UMP level α test also satisfies (i)–(iii) with the same η.



Example 8.11 (UMP size α test for N(µ, σ²) for known σ²)
Let X1 , . . . , Xn ind.∼ N(µ, σ²) for known σ² > 0.

1) For µ0 < µ1 , find a UMP size α test for H0 : µ = µ0 vs H1 : µ = µ1 .
2) Find a UMP size α test for H0 : µ ≤ µ0 vs H1 : µ > µ0 .

Solution.
1) Let φαNP be a test with critical region satisfying (ii)–(iii) of the NPL, i.e.
Cα = {x : fX (x; µ1 ) > ηfX (x; µ0 )} = {x : L(µ0 ; x)/L(µ1 ; x) < 1/η}. With
L(µ; x) = ∏_{i=1}^n (1/σ)φ((xi − µ)/σ) = (2πσ²)^{−n/2} exp(−(1/2) ∑_{i=1}^n ((xi − µ)/σ)²),
we must have
L(µ0 ; x)/L(µ1 ; x) = exp(−(1/(2σ²)) ∑_{i=1}^n ((xi − µ0 )² − (xi − µ1 )²))
= (multiplying out) exp(−(1/(2σ²))(2nx̄n (µ1 − µ0 ) − n(µ1² − µ0²))) < 1/η,
which happens iff x̄n > (2σ² log(η) + n(µ1² − µ0²))/(2n(µ1 − µ0 )) (using µ0 < µ1 ),
so Cα = {x : x̄n > cα }.
To determine cα , we use that the test must be a size α test, so
α = PH0 (X ∈ Cα ) = PH0 (X̄n > cα ) = PH0 (√n (X̄n − µ0 )/σ > √n (cα − µ0 )/σ)
= Φ̄(√n (cα − µ0 )/σ).
Solving for cα , we obtain cα = µ0 + σΦ⁻¹(1 − α)/√n = µ0 + σz1−α /√n .
By the NPL, the test with critical region Cα = {x : x̄n > µ0 + σz1−α /√n } is
thus a UMP size α test.
2) The critical region Cα = {x : x̄n > µ0 + σz1−α /√n } from 1) does not depend on the
value of µ1 > µ0 , so the test in 1) is also a UMP size α test for H0 : µ = µ0
vs H1 : µ > µ0 .
The power function for this test is
π(µ) = Pµ (X ∈ Cα ) = Pµ (X̄n > µ0 + σz1−α /√n )
= Pµ (√n (X̄n − µ)/σ > √n (µ0 − µ)/σ + z1−α ) = Φ̄(√n (µ0 − µ)/σ + z1−α ),
which is increasing in µ with π(µ0 ) = Φ̄(z1−α ) = 1 − (1 − α) = α,
so supµ≤µ0 π(µ) = α.
The test is thus also a UMP size α test for H0 : µ ≤ µ0 vs H1 : µ > µ0 .
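The power function from part 2) can be evaluated numerically; a sketch with illustrative values µ0 = 0, σ = 1, n = 25 and α = 0.05:

```python
from statistics import NormalDist

def power(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """pi(mu) = Phibar(sqrt(n) (mu0 - mu)/sigma + z_{1-alpha}) for the UMP test
    of H0: mu <= mu0 vs H1: mu > mu0 (illustrative parameter values)."""
    N = NormalDist()
    z = N.inv_cdf(1 - alpha)
    return 1 - N.cdf(n ** 0.5 * (mu0 - mu) / sigma + z)

# Size: the power at mu = mu0 equals alpha; the power increases towards 1 for mu > mu0.
print(power(0.0), power(0.3), power(0.6))
```

Plotting power(µ) over a grid of µ values visualizes how quickly the test detects departures from H0 as n or µ − µ0 grows.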



Remark 8.12
1) With its simple null and alternative hypotheses, the NPL seems to be limited,
but as E. 8.11 demonstrates, it often easily generalizes to composite hypotheses.
2) Similarly one can find a UMP size α test for the left-tailed H0 : µ ≥ µ0 vs
H1 : µ < µ0 .
3) For the two-tailed test H0 : µ = µ0 vs H1 : µ ̸= µ0 , there is no UMP size α
test (and it typically fails to exist in the two-tailed case as the critical regions
for µ < µ0 and µ > µ0 differ).



8.7 Likelihood ratio test
Question: Since the NPL does not always apply to composite hypotheses, what
is a general approach for constructing a test (not necessarily UMP)?
Suppose we are interested in testing
H0 : θ ∈ Θ 0 vs H1 : θ ∈ Θ1 = Θ\Θ0 .
We now present a test statistic based on likelihoods for this test.
The likelihood ratio test (LRT) statistic is
Tn = Tn (x) = −2 log( supθ∈Θ0 L(θ; x) / supθ∈Θ L(θ; x) ) = −2(ℓ(θ̂0,n ) − ℓ(θ̂n )),
where θ̂0,n is the MLE of L|Θ0 (the likelihood restricted to Θ0 ) and θ̂n is the
unrestricted MLE.
Idea: If there are θ ∈ Θ1 for which L(θ; x) is much larger than for any θ ∈ Θ0 ,
then the likelihood ratio is small, so Tn is large and we should reject H0 .
The critical region Cα is thus of the form Cα = {x : Tn (x) > cα }. One can
show that Tn (X) →d χ²ν as n → ∞ for ν = dim(Θ) − dim(Θ0 ), so that Cα = {x :
Tn (x) > Fχ²ν⁻¹(1 − α)}.
If Θ0 = {θ0 } and Θ1 = {θ1 } are simple, the LRT and NPL test coincide.
Example 8.13 (LRT for N(µ, σ²) for known σ²)
Let X1 , . . . , Xn ind.∼ N(µ, σ²) for known σ². Find the LRT of size α for testing
H0 : µ = µ0 vs H1 : µ ̸= µ0 .
Solution.
The log-likelihood is ℓ(µ; x) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^n (xi − µ)² (E. 7.12).
So the restricted log-likelihood is ℓ(µ0 ; x) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^n (xi − µ0 )².
The unrestricted MLE is µ̂n = X̄n (E. 7.12) with log-likelihood ℓ(X̄n ; X) =
−(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^n (Xi − X̄n )².
Therefore, Tn (x) = −2(ℓ(µ0 ) − ℓ(µ̂n )) = (1/σ²) ∑_{i=1}^n ((xi − µ0 )² − (xi − x̄n )²)
= (multiplying out) (n/σ²)(x̄n − µ0 )² = (√n (x̄n − µ0 )/σ)².
Under H0 , Tn approx.∼ χ²₁ for n large (ν = 1 − 0 = 1), so Cα = {x : Tn (x) > Fχ²₁⁻¹(1 − α)}.
Alternatively, we know α = PH0 (Tn (X) > cα ) = PH0 ((√n (X̄n − µ0 )/σ)² > cα ) =
PH0 (|√n (X̄n − µ0 )/σ| > c̃α ) = PH0 (|Z| > c̃α ) for Z ∼ N(0, 1), from which we
obtain c̃α = z1−α/2 and thus the equivalent critical region Cα = {x : |√n (x̄n − µ0 )/σ| >
z1−α/2 }.
Example 8.14 (Two-tailed LRT for Exp(λ))
Let X1 , . . . , Xn ind.∼ Exp(λ), λ > 0. Find the LRT of size α for testing H0 : λ = λ0
vs H1 : λ ̸= λ0 . Apply it to test λ0 = 1 at significance level 5% based on n = 100
observed losses with sum 125.
Solution.
Based on observations x = (x1 , . . . , xn ), the likelihood is L(λ; x) = (λe^{−λx̄n })ⁿ,
λ > 0, with log-likelihood ℓ(λ; x) = n(log(λ) − λx̄n ), λ > 0. With ℓ′(λ; x) =
n(1/λ − x̄n ) and ℓ′′(λ; x) = −n/λ² < 0, we see that the MLE is 1/X̄n .
The LRT statistic is therefore
Tn = −2(ℓ(λ0 ) − ℓ(1/X̄n )) = −2n(log(λ0 ) − λ0 X̄n − log(1/X̄n ) + 1)
= −2n(log(λ0 X̄n ) − λ0 X̄n + 1),
where λ0 comes from H0 and 1/X̄n is the MLE.
Under H0 , Tn approx.∼ χ²₁ for n large (ν = 1 − 0 = 1), so we reject H0 if Tn > Fχ²₁⁻¹(1 − α).
With the given quantities (n = 100, x̄n = 1.25), we have Tn = −2 · 100 (log(1 ·
1.25) − 1 · 1.25 + 1) ≈ 5.3713 > 3.8415 ≈ Fχ²₁⁻¹(0.95), so we reject H0 .
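The numerical application can be reproduced in a few lines (the 95% quantile 3.8415 of χ²₁ is hardcoded, since the standard library has no χ² quantile function):

```python
from math import log

def lrt_exp_two_tailed(n, xbar, lam0):
    """LRT statistic T_n = -2 n (log(lam0 * xbar) - lam0 * xbar + 1) for
    H0: lambda = lam0 vs H1: lambda != lam0 with Exp(lambda) data."""
    return -2 * n * (log(lam0 * xbar) - lam0 * xbar + 1)

Tn = lrt_exp_two_tailed(n=100, xbar=1.25, lam0=1.0)
chi2_1_q95 = 3.8415                    # F_{chi^2_1}^{-1}(0.95), from a table
print(f"T_n = {Tn:.4f}, reject H0: {Tn > chi2_1_q95}")   # T_n ≈ 5.3713, reject
```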
1



Example 8.15 (Right-tailed LRT for Exp(λ))
Let X1 , . . . , Xn ind.∼ Exp(λ), λ > 0. Find the LRT of size α for testing H0 : λ ≤ λ0
vs H1 : λ > λ0 . Apply it to the same numbers as before.
Solution.
As in E. 8.14, based on observations x = (x1 , . . . , xn ), the log-likelihood is
ℓ(λ; x) = n(log(λ) − λx̄n ), λ > 0, and we have the MLE 1/X̄n .
Since ℓ(λ; x) is strictly concave with maximum at the realized MLE 1/x̄n ,
λ̂0,n = argsupλ≤λ0 L(λ; x) = 1/x̄n if λ0 ≥ 1/x̄n , and λ0 if λ0 < 1/x̄n .
The LRT statistic is therefore
Tn = −2(ℓ(λ̂0,n ) − ℓ(1/X̄n )) = 0 if λ0 ≥ 1/X̄n , and
−2n(log(λ0 X̄n ) − λ0 X̄n + 1) if λ0 < 1/X̄n ,
i.e. Tn = −2n(log(λ0 X̄n ) − λ0 X̄n + 1) 1(0,1/X̄n ) (λ0 ).
Under H0 , Tn approx.∼ χ²₁ for n large, so we reject H0 if Tn > Fχ²₁⁻¹(1 − α).
Since 1/x̄n = 1/1.25 = 0.8 < 1 = λ0 , Tn = 0 here and so H0 cannot be rejected.
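A sketch of the right-tailed version, reproducing the computation with the same numbers:

```python
from math import log

def lrt_exp_right_tailed(n, xbar, lam0):
    """LRT statistic for H0: lambda <= lam0 vs H1: lambda > lam0:
    T_n = 0 if lam0 >= 1/xbar (H0 already contains the MLE),
    else the two-tailed statistic -2 n (log(lam0*xbar) - lam0*xbar + 1)."""
    if lam0 >= 1 / xbar:
        return 0.0
    return -2 * n * (log(lam0 * xbar) - lam0 * xbar + 1)

Tn = lrt_exp_right_tailed(n=100, xbar=1.25, lam0=1.0)
print(f"T_n = {Tn}, reject H0: {Tn > 3.8415}")   # T_n = 0.0: H0 cannot be rejected
```

For a λ0 below the realized MLE 1/x̄n (e.g. λ0 = 0.5 here), the statistic becomes positive and rejection is possible.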
