2024-11-22 Slides 10

8 Model assessment

8.1 General notions in hypothesis testing

8.2 Error types, size and power

8.3 Basic idea for constructing tests

8.4 The notion of p-value

8.5 Examples of tests

8.6 Most powerful and uniformly most powerful tests

8.7 Likelihood ratio test

© Marius Hofert Section 8 | p. 219


8.1 General notions in hypothesis testing
Motivation: After having estimated various models for given data, we need to
assess their goodness-of-fit, i.e. how well the fitted models actually fit the given
data, so that we can discard those in stark contrast to the data.
Setup: We have realizations x1 , . . . , xn of X1 , . . . , Xn ind.∼ F .
Goal: We want to test a hypothesis about F , i.e. a statement about F (or its
density/pmf f or parameter vector θ). We assume F ∈ F for a set of dfs F.
Graphical: Graphical assessments are tools to check a model’s fit graphically.
Example: A Q-Q plot displays {(F0⁻¹((i − 1/2)/n), x(i) ) = (F0⁻¹((i − 1/2)/n), Fn⁻¹(i/n)) :
i = 1, . . . , n} for a hypothesized F0 for F , so that if F0 ≈ F , the points of the
Q-Q plot lie approximately on the line y = x.
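The Q-Q plot points can be computed directly; a minimal sketch in Python, assuming F0 = N(0, 1) via statistics.NormalDist and a small hypothetical sample:

```python
from statistics import NormalDist

def qq_points(x, F0_inv):
    """Return the Q-Q plot points (F0^{-1}((i - 1/2)/n), x_(i)), i = 1, ..., n."""
    xs = sorted(x)                      # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    return [(F0_inv((i - 0.5) / n), xs[i - 1]) for i in range(1, n + 1)]

# Hypothetical sample; under F0 = N(0, 1) the points should hug y = x.
x = [-1.2, 0.3, -0.4, 1.1, 0.0, 0.8, -0.7, 1.9, -1.5, 0.4]
pts = qq_points(x, NormalDist().inv_cdf)
print(pts[0])   # smallest theoretical quantile paired with smallest observation
```

Plotting the pairs (e.g. with any plotting library) and overlaying y = x then gives the graphical assessment described above.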
Formal: A (hypothesis) test is a decision function (detailed later) that indicates
whether the data supports a given hypothesis or not. The setup is as follows:
▶ Partition F = F0 ⊎ F1 and describe a hypothesis via

H0 : F ∈ F0 vs H1 : F ∈ F1 ,



where H0 is the null hypothesis (what we are interested in testing and
rejecting) and H1 the alternative (hypothesis) (what H0 is tested against).
Note: The term “null” historically comes from “no relationship” between
quantities tested for linear dependence, e.g. H0 : ρ = 0 vs H1 : ρ ̸= 0.
▶ In a parametric hypothesis test, we assume F = {F (·; θ) : θ ∈ Θ ⊆ Rp },
Θ = Θ0 ⊎ Θ1 , and typically describe the hypotheses via
H0 : θ ∈ Θ 0 vs H1 : θ ∈ Θ 1 .
▶ H0 (H1 ) is simple if |H0 | = 1 (|H1 | = 1), otherwise composite.
Decision: The decision of a hypothesis test is to “reject H0 ” or “not reject H0 ”
(not: “accept H0 ”), so can be formulated in terms of the decision function
φ(x) := 1{x∈C} , x = (x1⊤ , . . . , xn⊤ )⊤ ∈ Rn×d ,
for some critical region C (boundary values of C are the critical values).
Power function: The power function is the probability of rejecting H0 , so
π(F ) := P(φ(X) = 1) = P(X ∈ C), F ∈ F = F0 ⊎ F1 .

8.2 Error types, size and power


A test’s decision leads to precisely one of four possible outcomes:
Outcome (probability)   | H0 is rejected (x ∈ C)                       | H0 is not rejected (x ∈ C^c)
H0 is true (F ∈ F0 )    | Type I error (α) (false positive conclusion) | No error (1 − α) (specificity)
H1 is true (F ∈ F1 )    | No error (1 − β) (sensitivity)               | Type II error (β) (false negative conclusion)

The type I error is the error of incorrectly rejecting H0 .


▶ Its probability is π(F )|F ∈F0 , the size of the test.
▶ Similar to the non-coverage probability of CIs, the size of the test is controlled
by a small probability, the significance level α ∈ [0, 1].
▶ If supF ∈F0 π(F ) = α (≤ α), the test is a size α test (level α test).
▶ A level α test with supF ∈F0 π(F ) < α is conservative.
The probability of correctly rejecting H0 is π(F )|F ∈F1 , the power of the test.
If limn→∞ π(F ) = 1 ∀ F ∈ F1 , the test is consistent.
The type II error is the error of incorrectly not rejecting H0 .
▶ Its probability is β = 1 − π(F )|F ∈F1 .
Only one error type can be controlled (one takes α, typically 5%).
Goal: Construct size α tests with maximal power (minimal β).
8.3 Basic idea for constructing tests
1) Fix α ∈ (0, 1) (typically 5%) and find a test statistic
Tn = Tn (X)
that is a pivotal quantity under H0 , so whose df FTn does not depend on
unknown parameters under H0 , at least for n → ∞ (for an asymptotic test).
2) Determine critical values cα,1 , cα,2 (possibly ±∞) such that
PH0 (cα,1 ≤ Tn ≤ cα,2 ) ≥ 1 − α
(under H0 , so for F ∈ F0 , typically θ ∈ Θ0 ).
3) Based on x and Tn (x), the test’s critical region is then
Cα = {x : cα,1 ≤ Tn (x) ≤ cα,2 }c = {x : Tn (x) < cα,1 or Tn (x) > cα,2 },
so φ(x) = 1{Tn (x)∉[cα,1 ,cα,2 ]} (reject H0 iff Tn (x) ∉ [cα,1 , cα,2 ]).

Note: If cα,2 = ∞ (cα,1 = −∞), the test is a left-tailed (right-tailed) test, and if
−∞ < cα,1 , cα,2 < ∞, the test is a two-tailed test (i.e. the critical region lies in
the respective tail(s) of FTn ).
Example 8.1 (Identifying φ, π, α, β)
Let θ ∈ (0, 1) be the probability of recovery after treatment with a new medication
and X1 , . . . , Xn ind.∼ B(1, θ) the recovery indicators of n patients. The manufacturer
of the medication tests H0 : θ = θ0 vs H1 : θ = θ1 for some 0 < θ0 < θ1 < 1 with
test statistic Tn = Tn (X) = ∑_{i=1}^n Xi and critical region C = {x : Tn (x) > c} for
some critical value c ∈ {1, . . . , n}. Identify the test's φ, π, α and β.
Solution. φ(x) = 1{x∈C} = 1{Tn (x)>c} , π(θ) = P(φ(X) = 1) = P(X ∈
C) = P(Tn (X) > c) = F̄B(n,θ) (c), α = supθ∈Θ0 π(θ) = π(θ)|θ=θ0 = π(θ0 ) =
F̄B(n,θ0 ) (c) and β = 1 − π(θ)|θ=θ1 = FB(n,θ1 ) (c).
α fixed, θ runs: If θ is large, then Tn is expected to be large (since E(Tn (X)) =
nθ), so under H1 we need to reject H0 more often ⇒ C = {x : Tn (x) > c}
makes sense here.
θ fixed, α runs: A larger α should lead to a smaller c (higher rejection probability
under H0 ), so a larger C and thus a larger φ. In short, 0 < α1 < α2 < 1 ⇒
cα1 > cα2 ⇒ Cα1 ⊆ Cα2 ⇒ 0 ≤ φα1 ≤ φα2 ≤ 1.
For the test to keep its level α, i.e. for supF ∈F0 π(F ) = π(θ0 ) ≤ α here, c cannot
be too small. So c = cα , thus C = Cα and φ = φα all depend on α.
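The size α and type II error probability β from the solution can be evaluated exactly; a sketch with illustrative (hypothetical) values n = 20, θ0 = 0.5, θ1 = 0.7 and c = 13:

```python
from math import comb

def sf_binom(n, theta, c):
    """Survival function F_bar_B(n,theta)(c) = P(T_n > c) for T_n ~ B(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(c + 1, n + 1))

n, theta0, theta1, c = 20, 0.5, 0.7, 13      # illustrative values
alpha = sf_binom(n, theta0, c)               # size: P_{theta0}(T_n > c)
beta = 1 - sf_binom(n, theta1, c)            # type II error: P_{theta1}(T_n <= c)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Raising c decreases α but increases β, illustrating that only one error type can be controlled.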
8.4 The notion of p-value
Typically, φα is monotone in α, so φα1 ≤ φα2 ∀ 0 < α1 < α2 < 1 (the smaller
the rejection prob. α, the less often H0 is rejected). Instead of reporting the test
decision φα (x), one can then report the smallest level at which H0 is still rejected.

Definition 8.2 (P-value)


The p-value of a test is p(x) := inf{α ∈ (0, 1) : φα (x) = 1}, i.e. the smallest
level at which the test still rejects H0 .

For given α, reject H0 iff p(x) ≤ α. This gives a test’s decision for all α and is
thus frequently reported as test result (not just whether H0 was rejected).

Proposition 8.3 (Characterisation)


With FTn symmetric about 0 in the two-tailed case, one has p(x) =
PH0 (“Tn (X) is at least as extreme as Tn (x)”).

Proof. We consider the three forms of tests separately:


1) For a left-tailed test, reject H0 iff Tn (x) < cα = FTn⁻¹(α).
Then φα (x) = 1{x∈Cα } for Cα = {x̃ : Tn (x̃) < FTn⁻¹(α)}.
The p-value is p(x) = inf{α ∈ (0, 1) : φα (x) = 1} = inf{α ∈ (0, 1) :
Tn (x) < FTn⁻¹(α)} = inf{α ∈ (0, 1) : FTn (Tn (x)) < α} (FTn right-continuous)
= FTn (Tn (x)) = PH0 (Tn (X) ≤ Tn (x)).
2) For a right-tailed test, reject H0 iff Tn (x) > cα = FTn⁻¹(1 − α).
Then φα (x) = 1{x∈Cα } for Cα = {x̃ : Tn (x̃) > FTn⁻¹(1 − α)}.
The p-value is p(x) = inf{α ∈ (0, 1) : φα (x) = 1} = inf{α ∈ (0, 1) :
Tn (x) > FTn⁻¹(1 − α)} = inf{α ∈ (0, 1) : FTn (Tn (x)−) > 1 − α} (FTn (·−)
left-continuous) = inf{α ∈ (0, 1) : α > 1 − FTn (Tn (x)−)} = 1 − FTn (Tn (x)−) =
1 − PH0 (Tn (X) < Tn (x)) = PH0 (Tn (X) ≥ Tn (x)).
3) For a two-tailed test, reject H0 iff Tn (x) < cα,1 = FTn⁻¹(α/2) or Tn (x) > cα,2 =
FTn⁻¹(1 − α/2). If FTn is symmetric about 0, reject H0 iff |Tn (x)| > FTn⁻¹(1 − α/2).
Then φα (x) = 1{x∈Cα } for Cα = {x̃ : Tn (x̃) < FTn⁻¹(α/2) or Tn (x̃) >
FTn⁻¹(1 − α/2)}.
The p-value is p(x) = inf{α ∈ (0, 1) : φα (x) = 1} = inf{α ∈ (0, 1) :
Tn (x) < FTn⁻¹(α/2) or Tn (x) > FTn⁻¹(1 − α/2)} = (as in 1) and 2)) inf{α ∈ (0, 1) :
α/2 > FTn (Tn (x)) or α/2 > 1 − FTn (Tn (x)−)} = inf{α ∈ (0, 1) : α > 2 min{
FTn (Tn (x)), 1 − FTn (Tn (x)−)}} = 2 min{FTn (Tn (x)), 1 − FTn (Tn (x)−)}
= 2 min{PH0 (Tn (X) ≤ Tn (x)), PH0 (Tn (X) ≥ Tn (x))}. If FTn is symmetric
about 0, then p(x) = PH0 (|Tn (X)| ≥ |Tn (x)|).

Remark 8.4 (Uniformity)


One can show that if φα is a size α test ∀ α ∈ (0, 1), then p(X) ∼ U(0, 1). For
continuous FTn (most cases), we can directly verify this via the proof of P. 8.3:
1) For a left-tailed test, p(X) = FTn (Tn (X)) ∼ U(0, 1) (probability transform).
2) For a right-tailed test, p(X) = 1 − FTn (Tn (X)−) = 1 − FTn (Tn (X)) (continuity),
and 1 − U ∼ U(0, 1) for U = FTn (Tn (X)) ∼ U(0, 1).
3) For a two-tailed test, p(X) = 2 min{FTn (Tn (X)), 1 − FTn (Tn (X)−)} =
2 min{U, 1 − U } ∼ U(0, 1) for U = FTn (Tn (X)) ∼ U(0, 1).
This can be used to check a test's correctness (e.g. if the test is done based on
resampling or otherwise approximate).
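This uniformity can be checked by simulation; a sketch for the two-tailed z-test of H0 : µ = 0 with known σ = 1 (sample size, repetition count and seed are arbitrary choices):

```python
import random
from statistics import NormalDist, mean

random.seed(42)                 # fixed seed for reproducibility
Phi = NormalDist().cdf
n, reps = 30, 2000

pvals = []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]     # data generated under H0
    t = (n ** 0.5) * mean(x) / 1.0                 # T_n = sqrt(n) * xbar / sigma
    pvals.append(2 * min(Phi(t), 1 - Phi(t)))      # two-tailed p-value

# Under H0, p(X) ~ U(0, 1): mean near 0.5, rejection rate at level 5% near 0.05.
print(mean(pvals), sum(p <= 0.05 for p in pvals) / reps)
```

A histogram of pvals should look approximately flat on (0, 1); systematic deviations would indicate a bug or a badly calibrated (approximate) test.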



8.5 Examples of tests
As with the construction of confidence intervals, asymptotic tests are easier to
construct than exact tests.
We will again see the power of the CLT and Slutsky’s theorem.

Example 8.5 (Asymptotic test of the mean for Poi(λ))


Let X1 , . . . , Xn ind.∼ Poi(λ) for unknown λ > 0. Provide a test of H0 : λ = λ0 vs
H1 : λ ̸= λ0 for a given λ0 .

Solution.
In this case Θ = {λ : λ > 0} = Θ0 ⊎ Θ1 for Θ0 = {λ0 } and Θ1 = (0, ∞)\{λ0 }.
Under H0 , Tn = √n (X̄n − λ0 )/√X̄n →d N(0, 1) as n → ∞ (by the CLT and
Slutsky's theorem).
For sufficiently large n, the critical region is thus Cα = {x : |Tn (x)| > z1−α/2 =
Φ⁻¹(1 − α/2)}.
Since var(X1 ) = λ = λ0 under H0 , we could have also considered Tn = √n (X̄n − λ0 )/√λ0 .
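The resulting test can be sketched as a small function (the count data below are hypothetical):

```python
from statistics import NormalDist, mean

def poisson_mean_test(x, lam0, alpha=0.05):
    """Asymptotic two-tailed test of H0: lambda = lam0 based on
    T_n = sqrt(n) (xbar - lam0) / sqrt(xbar); reject iff |T_n| > z_{1-alpha/2}."""
    n, xbar = len(x), mean(x)
    t = (n ** 0.5) * (xbar - lam0) / xbar ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return t, abs(t) > z

x = [3, 1, 4, 2, 2, 5, 3, 0, 2, 3, 4, 1, 2, 3, 2, 4, 1, 3, 2, 2]  # hypothetical counts
t, reject = poisson_mean_test(x, lam0=2.0)
print(f"T_n = {t:.3f}, reject H0: {reject}")
```

Replacing `xbar ** 0.5` by `lam0 ** 0.5` gives the variant mentioned at the end of the example.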



Example 8.6 (Asymptotic test of the mean for B(m, p) for known m)
Let X1 , . . . , Xn ind.∼ B(m, p) for known m and unknown p ∈ (0, 1). Provide a test
of H0 : p ≤ p0 vs H1 : p > p0 for a given p0 .

Solution. Even if we don't know p under H0 (composite case), we still consider
Tn (X) = √n (X̄n − mp0 )/√(m (X̄n /m)(1 − X̄n /m))
and, for large n, Cα = {x : Tn (x) > z1−α = Φ⁻¹(1 − α)}, because this still gives
a size α test, as the power (in terms of p) is
π(p) = P(Tn (X) > z1−α )
= P(√n (X̄n − mp0 )/√(m (X̄n /m)(1 − X̄n /m)) > z1−α )
= P(√n (X̄n − mp)/√(m (X̄n /m)(1 − X̄n /m)) > z1−α + √n m(p0 − p)/√(m (X̄n /m)(1 − X̄n /m)))
≤ P(√n (X̄n − mp)/√(m (X̄n /m)(1 − X̄n /m)) > z1−α )   (the added term is ≥ 0 under H0 )
→ 1 − Φ(z1−α ) = α   as n → ∞ (CLT, Slutsky: the left-hand side is asymptotically N(0, 1)),
with equality if p = p0 . Similarly for H0 : p ≥ p0 vs H1 : p < p0 (z1−α ← zα ).
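A sketch of this one-sided test (m, p0 and the data are illustrative):

```python
from statistics import NormalDist, mean

def binom_mean_test(x, m, p0, alpha=0.05):
    """Asymptotic test of H0: p <= p0 vs H1: p > p0 for X_i ~ B(m, p);
    reject iff T_n > z_{1-alpha}."""
    n, xbar = len(x), mean(x)
    phat = xbar / m                                   # plug-in estimate of p
    t = (n ** 0.5) * (xbar - m * p0) / (m * phat * (1 - phat)) ** 0.5
    return t, t > NormalDist().inv_cdf(1 - alpha)

# Hypothetical data with m = 10 trials per observation
x = [7, 5, 6, 8, 6, 7, 5, 6, 7, 6, 8, 7, 6, 5, 7, 6, 7, 8, 6, 7]
t, reject = binom_mean_test(x, m=10, p0=0.5)
print(f"T_n = {t:.3f}, reject H0: {reject}")
```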


Remark 8.7 (Exact tests for iid normal data)
Let X1 , . . . , Xn ind.∼ N(µ, σ²).

1) Tests H0 : µ = µ0 vs H1 : µ ̸= µ0 , H1′ : µ < µ0 or H1′′ : µ > µ0 .

σ² known:
▶ Pivot: √n (X̄n − µ)/σ ∼ N(0, 1).
▶ Test statistic: Tn = √n (X̄n − µ0 )/σ ∼ N(0, 1) under H0 (symmetric about 0).
▶ Critical region, p-value:
H1 : Cα = {|Tn (x)| > z1−α/2 }, p(x) = PH0 (|Tn (X)| ≥ |Tn (x)|) =
PH0 (Tn²(X) ≥ Tn²(x)) = 1 − Fχ²₁ (Tn²(x)) (since Tn²(X) ∼ χ²₁ under H0 ).
H1′ : Cα = {Tn (x) < zα = −z1−α }, p(x) = PH0 (Tn (X) ≤ Tn (x)) = Φ(Tn (x)).
H1′′ : Cα = {Tn (x) > z1−α }, p(x) = PH0 (Tn (X) ≥ Tn (x)) = Φ̄(Tn (x)).

σ² unknown:
▶ Pivot: √n (X̄n − µ)/Ŝn ∼ tn−1 .
▶ Test statistic: Tn = √n (X̄n − µ0 )/Ŝn ∼ tn−1 under H0 (symmetric about 0).
▶ Critical region, p-value:
H1 : Cα = {|Tn (x)| > Ftn−1⁻¹(1 − α/2)}, p(x) = P(|Tn (X)| ≥ |Tn (x)|) =
2P(Tn (X) ≥ |Tn (x)|) = 2(1 − Ftn−1 (|Tn (x)|)).
H1′ : Cα = {Tn (x) < Ftn−1⁻¹(α) = −Ftn−1⁻¹(1 − α)}, p(x) = P(Tn (X) ≤
Tn (x)) = Ftn−1 (Tn (x)).
H1′′ : Cα = {Tn (x) > Ftn−1⁻¹(1 − α)}, p(x) = P(Tn (X) ≥ Tn (x)) = F̄tn−1 (Tn (x)).

2) Test H0 : σ² = σ0² vs H1 : σ² > σ0² (others possible).

µ known:
▶ Pivot: ∑_{i=1}^n ((Xi − µ)/σ)² ∼ χ²n .
▶ Test statistic: Tn = ∑_{i=1}^n ((Xi − µ)/σ0 )² ∼ χ²n under H0 .
▶ Critical region, p-value: H1 : Cα = {Tn (x) > Fχ²n⁻¹(1 − α)}, p(x) =
P(Tn (X) ≥ Tn (x)) = F̄χ²n (Tn (x)).

µ unknown:
▶ Pivot: (n − 1)Ŝn²/σ² ∼ χ²n−1 .
▶ Test statistic: Tn = (n − 1)Ŝn²/σ0² ∼ χ²n−1 under H0 .
▶ Critical region, p-value: H1 : Cα = {Tn (x) > Fχ²n−1⁻¹(1 − α)}, p(x) =
P(Tn (X) ≥ Tn (x)) = F̄χ²n−1 (Tn (x)).
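The σ²-known z-test from 1) can be sketched with only the standard library (statistics.NormalDist supplies Φ and Φ⁻¹; the data, µ0 and σ are hypothetical):

```python
from statistics import NormalDist, mean

def z_test(x, mu0, sigma, alpha=0.05):
    """Exact two-tailed test of H0: mu = mu0 for N(mu, sigma^2) with known sigma.
    Returns T_n, the p-value P_{H0}(|T_n(X)| >= |T_n(x)|) and the decision."""
    n = len(x)
    t = (n ** 0.5) * (mean(x) - mu0) / sigma
    Phi = NormalDist().cdf
    p = 2 * min(Phi(t), 1 - Phi(t))     # = P(|Z| >= |t|) for Z ~ N(0, 1)
    return t, p, p <= alpha

x = [0.9, 1.4, 0.2, 1.1, 0.7, 1.6, 0.5, 1.0, 1.3, 0.8]  # hypothetical data
t, p, reject = z_test(x, mu0=0.5, sigma=1.0)
print(f"T_n = {t:.3f}, p = {p:.4f}, reject: {reject}")
```

The t- and χ²-based tests follow the same pattern but need quantiles of tn−1 and χ², which are not in the standard library (a stats package or a table would supply them).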
Remark 8.8 (Exact tests for two independent iid normal datasets)
For j = 1, 2, let X1,j , . . . , Xnj ,j ind.∼ N(µj , σj²) be independent.

1) Tests H0 : µ1 − µ2 = µ0 vs H1 : µ1 − µ2 ̸= µ0 (others possible).

σ1², σ2² known:
▶ Pivot: ((X̄n1 ,1 − X̄n2 ,2 ) − (µ1 − µ2 ))/σn1 ,n2 ∼ N(0, 1) for σ²n1 ,n2 = σ1²/n1 + σ2²/n2 .
▶ Test statistic: Tn1 ,n2 = ((X̄n1 ,1 − X̄n2 ,2 ) − µ0 )/σn1 ,n2 ∼ N(0, 1) under H0
(symmetric about 0).
▶ Critical region, p-value: Cα = {|Tn1 ,n2 (x1 , x2 )| > z1−α/2 }, p(x1 , x2 ) =
PH0 (|Tn1 ,n2 (X1 , X2 )| ≥ |Tn1 ,n2 (x1 , x2 )|) = 1 − Fχ²₁ (T²n1 ,n2 (x1 , x2 )) (as in R. 8.7 1)).

σ1² = σ2² =: σ² unknown:
▶ Pivot: ((X̄n1 ,1 − X̄n2 ,2 ) − (µ1 − µ2 ))/Ŝn1 ,n2 ∼ tn1 +n2 −2 for
Ŝ²n1 ,n2 = ((n1 − 1)Ŝ²n1 ,1 + (n2 − 1)Ŝ²n2 ,2 )/(n1 + n2 − 2) · (1/n1 + 1/n2 ).
▶ Test statistic: Tn1 ,n2 = ((X̄n1 ,1 − X̄n2 ,2 ) − µ0 )/Ŝn1 ,n2 ∼ tn1 +n2 −2 under H0
(symmetric about 0).
▶ Critical region, p-value: Cα = {|Tn1 ,n2 (x1 , x2 )| > Ftn1 +n2 −2⁻¹(1 − α/2)},
p(x1 , x2 ) = PH0 (|Tn1 ,n2 (X1 , X2 )| ≥ |Tn1 ,n2 (x1 , x2 )|) = 2(1 −
Ftn1 +n2 −2 (|Tn1 ,n2 (x1 , x2 )|)) (as in R. 8.7 1)).

2) Tests H0 : σ1² = σ2² vs H1 : σ1² ̸= σ2² (others possible) for unknown µ1 , µ2 ,
based on the pivot (Ŝ²n1 ,1 /σ1²)/(Ŝ²n2 ,2 /σ2²) ∼ Fn1 −1,n2 −1 (R. 6.33) with test
statistic Tn1 ,n2 = Ŝ²n1 ,1 /Ŝ²n2 ,2 ∼ Fn1 −1,n2 −1 under H0 .

The pivots in R. 8.7 and 8.8 were the same as in R. 6.32 and 6.33.
As we can see from these remarks, the (1 − α)-CIs are the complement of the
critical regions of the respective two-tailed tests.



Example 8.9 (Asymptotic Welch’s t-test of equality of two means)
The test of the mean difference in R. 8.8 1) can be extended to arbitrary
distributions with finite variances, at the cost of it being asymptotic.
For j = 1, 2, let X1,j , . . . , Xnj ,j ind.∼ Fj be all independent with mean µj and
unknown, but finite, second moment. Suppose we are interested in testing
H0 : µ1 − µ2 = 0 vs H1 : µ1 − µ2 ̸= 0 (here: µ0 = 0; others possible).
As test statistic, one uses
Tn1 ,n2 = Tn1 ,n2 (X·,1 , X·,2 ) = (X̄n1 ,1 − X̄n2 ,2 )/√(Ŝ²n1 ,1 /n1 + Ŝ²n2 ,2 /n2 )
(a sample version of the first test statistic in R. 8.8 1)). If nk > 5, k = 1, 2,
Tn1 ,n2 is approximately tν distributed, where
ν = (Ŝ²n1 ,1 /n1 + Ŝ²n2 ,2 /n2 )² / ((Ŝ²n1 ,1 /n1 )²/ν1 + (Ŝ²n2 ,2 /n2 )²/ν2 )
for νk = nk − 1, k = 1, 2.
The critical region is therefore Cα = {(x·,1 , x·,2 ) : |Tn1 ,n2 (x·,1 , x·,2 )| > Ftν⁻¹(1 −
α/2)}, so reject H0 iff |Tn1 ,n2 (x·,1 , x·,2 )| > Ftν⁻¹(1 − α/2).
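The Welch statistic and the degrees of freedom ν can be computed directly from two samples; a minimal sketch (samples are hypothetical; the final comparison against Ftν⁻¹(1 − α/2) would need a t-quantile table or a stats library):

```python
from statistics import mean, variance

def welch(x1, x2):
    """Welch's t-statistic and Welch–Satterthwaite degrees of freedom nu."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1), variance(x2)          # unbiased sample variances S^2
    se2 = v1 / n1 + v2 / n2
    t = (mean(x1) - mean(x2)) / se2 ** 0.5
    nu = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, nu

x1 = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7]    # hypothetical samples
x2 = [4.2, 4.5, 4.1, 4.8, 4.3, 4.6, 4.0, 4.4, 4.7, 4.2]
t, nu = welch(x1, x2)
print(f"T = {t:.3f}, nu = {nu:.1f}")   # compare |T| with F_t_nu^{-1}(1 - alpha/2)
```

Note that ν always lies between min(n1, n2) − 1 and n1 + n2 − 2, so it interpolates between the conservative and the pooled choices of degrees of freedom.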



8.6 Most powerful and uniformly most powerful tests
Question: How can we construct size α tests with maximal power (minimal β)?

A size (level) α test is a uniformly most powerful (UMP) test if π(F ) ≥ π ′ (F )


∀ F ∈ F1 , ∀ power functions π ′ of size (level) α tests.

Theorem 8.10 (Neyman–Pearson lemma (NPL))


Let X1 , . . . , Xn ind.∼ f (·; θ) (density or pmf), θ ∈ {θ0 , θ1 }, and consider H0 : θ =
θ0 vs H1 : θ = θ1 with critical region Cα , α ∈ [0, 1], satisfying, for some η ≥ 0,
(i) π(θ0 ) = PH0 (X ∈ Cα ) = α (size α test);
(ii) fX (x; θ1 ) > ηfX (x; θ0 ) for a.e. x ∈ Cα ; and
(iii) fX (x; θ1 ) < ηfX (x; θ0 ) for a.e. x ∈ Cαc .
Then:
1) Sufficiency/existence: Any test satisfying (i)–(iii) is a UMP test among all
level α tests.
2) Necessity/uniqueness: If there exists a test φα satisfying (i)–(iii) for some
η > 0, then every UMP level α test also satisfies (i)–(iii) with the same η.



Example 8.11 (UMP size α test for N(µ, σ²) for known σ²)
Let X1 , . . . , Xn ind.∼ N(µ, σ²) for known σ² > 0.

1) For µ0 < µ1 , find a UMP size α test for H0 : µ = µ0 vs H1 : µ = µ1 .
2) Find a UMP size α test for H0 : µ ≤ µ0 vs H1 : µ > µ0 .

Solution.
1) Let φαNP be a test with critical region satisfying (ii)–(iii) of the NPL, i.e.
Cα = {x : fX (x; µ1 ) > ηfX (x; µ0 )} = {x : L(µ0 ; x)/L(µ1 ; x) < 1/η}. With
L(µ; x) = ∏_{i=1}^n (1/σ)φ((xi − µ)/σ) = (2πσ²)^{−n/2} exp(−(1/2) ∑_{i=1}^n ((xi − µ)/σ)²),
we must have
L(µ0 ; x)/L(µ1 ; x) = exp(−(1/(2σ²)) ∑_{i=1}^n ((xi − µ0 )² − (xi − µ1 )²))
= (multiplying out) exp(−(1/(2σ²))(2nx̄n (µ1 − µ0 ) − n(µ1² − µ0²))) < 1/η,
which happens iff x̄n > (2σ² log(η) + n(µ1² − µ0²))/(2n(µ1 − µ0 )) (using µ0 < µ1 ),
so Cα = {x : x̄n > cα }.
To determine cα , we use that the test must be a size α test, so
α = PH0 (X ∈ Cα ) = PH0 (X̄n > cα ) = PH0 (√n (X̄n − µ0 )/σ > √n (cα − µ0 )/σ)
= Φ̄(√n (cα − µ0 )/σ).
Solving for cα , we obtain cα = µ0 + σΦ⁻¹(1 − α)/√n = µ0 + σz1−α /√n .
By the NPL, the test with critical region Cα = {x : x̄n > µ0 + σz1−α /√n } is
thus a UMP size α test.
2) The critical region Cα = {x : x̄n > µ0 + σz1−α /√n } from 1) does not depend on the
value of µ1 > µ0 , so the test in 1) is also a UMP size α test for H0 : µ = µ0
vs H1 : µ > µ0 .
The power function for this test is
π(µ) = Pµ (X ∈ Cα ) = Pµ (X̄n > µ0 + σz1−α /√n )
= Pµ (√n (X̄n − µ)/σ > √n (µ0 − µ)/σ + z1−α ) = Φ̄(√n (µ0 − µ)/σ + z1−α ),
which is increasing in µ with π(µ0 ) = Φ̄(z1−α ) = 1 − (1 − α) = α,
so supµ≤µ0 π(µ) = α.
The test is thus also a UMP size α test for H0 : µ ≤ µ0 vs H1 : µ > µ0 .
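The power function from part 2) can be evaluated numerically; a sketch with illustrative values µ0 = 0, σ = 1, n = 25 and α = 0.05:

```python
from statistics import NormalDist

def power(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """pi(mu) = Phibar(sqrt(n) (mu0 - mu)/sigma + z_{1-alpha}) for the UMP test
    of H0: mu <= mu0 vs H1: mu > mu0 (illustrative parameter values)."""
    N = NormalDist()
    z = N.inv_cdf(1 - alpha)
    return 1 - N.cdf(n ** 0.5 * (mu0 - mu) / sigma + z)

# Size: the power at mu = mu0 equals alpha; the power increases towards 1 for mu > mu0.
print(power(0.0), power(0.3), power(0.6))
```

Plotting power(µ) over a grid of µ values visualizes how quickly the test detects departures from H0 as n or µ − µ0 grows.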



Remark 8.12
1) With its simple null and alternative hypotheses, the NPL seems to be limited,
but as E. 8.11 demonstrates, it often easily generalizes to composite hypotheses.
2) Similarly one can find a UMP size α test for the left-tailed H0 : µ ≥ µ0 vs
H1 : µ < µ0 .
3) For the two-tailed test H0 : µ = µ0 vs H1 : µ ̸= µ0 , there is no UMP size α
test (and it typically fails to exist in the two-tailed case as the critical regions
for µ < µ0 and µ > µ0 differ).



8.7 Likelihood ratio test
Question: Since the NPL does not always apply to composite hypotheses, what
is a general approach for constructing a test (not necessarily UMP)?
Suppose we are interested in testing
H0 : θ ∈ Θ 0 vs H1 : θ ∈ Θ1 = Θ\Θ0 .
We now present a test statistic based on likelihoods for this test.
The likelihood ratio test (LRT) statistic is
Tn = Tn (x) = −2 log( supθ∈Θ0 L(θ; x) / supθ∈Θ L(θ; x) ) = −2(ℓ(θ̂0,n ) − ℓ(θ̂n )),
where θ̂0,n is the MLE of L|Θ0 (the likelihood restricted to Θ0 ) and θ̂n is the
unrestricted MLE.
Idea: If there are θ ∈ Θ1 for which L(θ; x) is much larger than for any θ ∈ Θ0 ,
then the likelihood ratio is small, so Tn is large and we should reject H0 .
The critical region Cα is thus of the form Cα = {x : Tn (x) > cα }. One can
show that Tn (X) →d χ²ν as n → ∞ for ν = dim(Θ) − dim(Θ0 ), so that Cα = {x :
Tn (x) > Fχ²ν⁻¹(1 − α)}.
If Θ0 = {θ0 } and Θ1 = {θ1 } are simple, the LRT and NPL test coincide.
Example 8.13 (LRT for N(µ, σ²) for known σ²)
Let X1 , . . . , Xn ind.∼ N(µ, σ²) for known σ². Find the LRT of size α for testing
H0 : µ = µ0 vs H1 : µ ̸= µ0 .
Solution.
The log-likelihood is ℓ(µ; x) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^n (xi − µ)² (E. 7.12).
So the restricted log-likelihood is ℓ(µ0 ; x) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^n (xi − µ0 )².
The unrestricted MLE is µ̂n = X̄n (E. 7.12) with log-likelihood ℓ(X̄n ; X) =
−(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^n (Xi − X̄n )².
Therefore, Tn (x) = −2(ℓ(µ0 ) − ℓ(µ̂n )) = (1/σ²) ∑_{i=1}^n ((xi − µ0 )² − (xi − x̄n )²)
= (multiplying out) (n/σ²)(x̄n − µ0 )² = (√n (x̄n − µ0 )/σ)².
Under H0 , Tn approx.∼ χ²₁ for n large (ν = 1 − 0 = 1), so Cα = {x : Tn (x) > Fχ²₁⁻¹(1 − α)}.
Alternatively, we know α = PH0 (Tn (X) > cα ) = PH0 ((√n (X̄n − µ0 )/σ)² > cα ) =
PH0 (|√n (X̄n − µ0 )/σ| > c̃α ) = PH0 (|Z| > c̃α ) for Z ∼ N(0, 1), from which we
obtain c̃α = z1−α/2 and thus the equivalent critical region Cα = {x : |√n (x̄n − µ0 )/σ| >
z1−α/2 }.
Example 8.14 (Two-tailed LRT for Exp(λ))
Let X1 , . . . , Xn ind.∼ Exp(λ), λ > 0. Find the LRT of size α for testing H0 : λ = λ0
vs H1 : λ ̸= λ0 . Apply it to test λ0 = 1 at significance level 5% based on n = 100
observed losses with sum 125.
Solution.
Based on observations x = (x1 , . . . , xn ), the likelihood is L(λ; x) = (λe^{−λx̄n })ⁿ,
λ > 0, with log-likelihood ℓ(λ; x) = n(log(λ) − λx̄n ), λ > 0. With ℓ′(λ; x) =
n(1/λ − x̄n ) and ℓ′′(λ; x) = −n/λ² < 0, we see that the MLE is 1/X̄n .
The LRT statistic is therefore
Tn = −2(ℓ(λ0 ) − ℓ(1/X̄n )) = −2n(log(λ0 ) − λ0 X̄n − log(1/X̄n ) + 1)
= −2n(log(λ0 X̄n ) − λ0 X̄n + 1),
where λ0 comes from H0 and 1/X̄n is the MLE.
Under H0 , Tn approx.∼ χ²₁ for n large (ν = 1 − 0 = 1), so we reject H0 if Tn > Fχ²₁⁻¹(1 − α).
With the given quantities (n = 100, x̄n = 1.25), we have Tn = −2 · 100 (log(1 ·
1.25) − 1 · 1.25 + 1) ≈ 5.3713 > 3.8415 ≈ Fχ²₁⁻¹(0.95), so we reject H0 .
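The numerical application can be reproduced in a few lines (the 95% quantile 3.8415 of χ²₁ is hardcoded, since the standard library has no χ² quantile function):

```python
from math import log

def lrt_exp_two_tailed(n, xbar, lam0):
    """LRT statistic T_n = -2 n (log(lam0 * xbar) - lam0 * xbar + 1) for
    H0: lambda = lam0 vs H1: lambda != lam0 with Exp(lambda) data."""
    return -2 * n * (log(lam0 * xbar) - lam0 * xbar + 1)

Tn = lrt_exp_two_tailed(n=100, xbar=1.25, lam0=1.0)
chi2_1_q95 = 3.8415                    # F_{chi^2_1}^{-1}(0.95), from a table
print(f"T_n = {Tn:.4f}, reject H0: {Tn > chi2_1_q95}")   # T_n ≈ 5.3713, reject
```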
1



Example 8.15 (Right-tailed LRT for Exp(λ))
Let X1 , . . . , Xn ind.∼ Exp(λ), λ > 0. Find the LRT of size α for testing H0 : λ ≤ λ0
vs H1 : λ > λ0 . Apply it to the same numbers as before.
Solution.
As in E. 8.14, based on observations x = (x1 , . . . , xn ), the log-likelihood is
ℓ(λ; x) = n(log(λ) − λx̄n ), λ > 0, and we have the MLE 1/X̄n .
Since ℓ(λ; x) is strictly concave with maximum at the realized MLE 1/x̄n ,
λ̂0,n = argsupλ≤λ0 L(λ; x) = 1/x̄n if λ0 ≥ 1/x̄n , and λ0 if λ0 < 1/x̄n .
The LRT statistic is therefore
Tn = −2(ℓ(λ̂0,n ) − ℓ(1/X̄n )) = 0 if λ0 ≥ 1/X̄n , and
−2n(log(λ0 X̄n ) − λ0 X̄n + 1) if λ0 < 1/X̄n ,
i.e. Tn = −2n(log(λ0 X̄n ) − λ0 X̄n + 1) 1(0,1/X̄n ) (λ0 ).
Under H0 , Tn approx.∼ χ²₁ for n large, so we reject H0 if Tn > Fχ²₁⁻¹(1 − α).
Since 1/x̄n = 1/1.25 = 0.8 < 1 = λ0 , Tn = 0 here and so H0 cannot be rejected.
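A sketch of the right-tailed version, reproducing the computation with the same numbers:

```python
from math import log

def lrt_exp_right_tailed(n, xbar, lam0):
    """LRT statistic for H0: lambda <= lam0 vs H1: lambda > lam0:
    T_n = 0 if lam0 >= 1/xbar (H0 already contains the MLE),
    else the two-tailed statistic -2 n (log(lam0*xbar) - lam0*xbar + 1)."""
    if lam0 >= 1 / xbar:
        return 0.0
    return -2 * n * (log(lam0 * xbar) - lam0 * xbar + 1)

Tn = lrt_exp_right_tailed(n=100, xbar=1.25, lam0=1.0)
print(f"T_n = {Tn}, reject H0: {Tn > 3.8415}")   # T_n = 0.0: H0 cannot be rejected
```

For a λ0 below the realized MLE 1/x̄n (e.g. λ0 = 0.5 here), the statistic becomes positive and rejection is possible.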
