
STSM3714

Hypothesis Testing and Interval Estimation


by

P.C.N. Groenewald
(Revised by KN Bekker, R Schall, S van der Merwe and M Sjölander)

Handbooks :

1. Statistical Inference – G. Casella & R.L. Berger, Brooks/Cole Publishing, 1990.

2. Probability and Statistics – M.H. DeGroot, Addison–Wesley, 1986.

3. Introduction to the Theory of Statistics – A.M. Mood, F.A. Graybill & D.C. Boes, McGraw–Hill,
1974.
CONTENTS

CHAPTER 1 : Definitions and testing simple hypotheses

1.1. Introduction
1.2. Testing simple hypotheses
1.3. Optimal tests for simple hypotheses
Exercises

CHAPTER 2 : Composite Hypotheses

2.1. Introduction
2.2. Generalized Likelihood Ratio Tests
2.3. Uniformly Most Powerful Tests
2.4. Two–sample Normal Tests
2.5. Chi–square Tests
2.6. Approximate Large–Sample Tests
Exercises

CHAPTER 3 : Interval Estimation

3.1. Introduction
3.2. Methods of finding Interval Estimators
3.3. Evaluating Interval Estimators
3.4. Approximate Confidence Intervals
Exercises
Chapter 1
Definitions and testing simple hypotheses

1.1 Introduction
There are two major areas of statistical inference: the estimation of parameters and the testing
of hypotheses. (Examples of a hypothesis: 0 < µ < 5; or p = 0.2.) We shall study the second of these
two areas in this chapter. Our aim will be to develop general methods for testing hypotheses and
to apply those methods to some common problems.

In experimental research, the object is sometimes merely to estimate parameters. Thus one may
wish to estimate the yield of a new hybrid line of corn. But more often the ultimate purpose will
involve some use of the estimate. One may wish, for example, to compare the yield of the new
line with that of a standard line and perhaps recommend that the new line replace the standard
line if it appears superior. This is a common situation in research. One may wish to determine
whether a new method of sealing light bulbs will increase the life of the bulbs, whether a new
germicide is more effective in treating a certain infection than a standard germicide, whether
one method of preserving foods is better than another insofar as retention of vitamins is
concerned, and so on.

Note that in all these examples we are trying to discover something about a parameter of a
distribution, but rather than estimating it, we simply want to determine whether it falls in a
particular region of the parameter space.

The following definition of a hypothesis is rather general, but the important point is that a
hypothesis makes a statement about the population. The goal of a hypothesis test is to decide,
based on a sample from the population, which of two complementary hypotheses is true.

Definition 1.1 : A statistical hypothesis is an assertion or conjecture or statement about the


parameter or parameters of a population.

Definition 1.2 : The two complementary hypotheses in a hypothesis testing problem are called the
null hypothesis and the alternative hypothesis. They are denoted by 𝐻0 and 𝐻1 , respectively.
(e.g. H0: µ = 1400 ; H1: µ > 1400)

It is important to remember that the conjecture that we want to find evidence for is the
alternative hypothesis. The null hypothesis refers to the neutral position or status quo.

Notation : When drawing a random sample from a distribution 𝑓(𝑥|𝜃) with parameter 𝜃, we
will denote the sample space by 𝑆 and the parameter space by Ω, so that (𝑥1 , 𝑥2 , … , 𝑥𝑛 ) ∈ 𝑆
and 𝜃 ∈ Ω.
Definition 1.3 : A statistical test or hypothesis test is a rule that specifies for which values of the
sample the null hypothesis should be rejected.
e.g. reject H0 if 𝑥 ≥ 1500

We will use the capital gamma, Γ, to denote a test. So the hypotheses should be of the form
𝐻0 : 𝜃 ∈ Ω0   versus   𝐻1 : 𝜃 ∈ Ω1 ,   (1.1)
where Ω0 ∩ Ω1 = 𝜙 and Ω0 ∪ Ω1 = Ω (Ω0 and Ω1 are disjoint and mutually exhaustive).
(e.g. Ω0 = {1400} and Ω1 = (1400, ∞))
(e.g. Ω0 = (−∞, 1400] and Ω1 = (1400, ∞), thus Ω = (−∞, ∞))
(e.g. H0: µ ≤ 1400 ; H1: µ > 1400)

Definition 1.4 : The critical region or rejection region is that subset 𝑅 of the sample space 𝑆
for which 𝐻0 would be rejected if (𝑥1 , … , 𝑥𝑛 ) ∈ 𝑅.
e.g. 𝑅 = {(𝑥1 , … , 𝑥𝑛 ): (1/𝑛) ∑ 𝑥𝑖 ≥ 1500}
i.e. reject H0 if we get a sample with the property x̄ ≥ 1500
i.e. reject H0 if x̄ ≥ 1500

e.g.
X ~ N(μ ,25)
H0: μ ≤ 1000 (null hypothesis) Ω0 = (-∞,1000]
H1: μ > 1000 (alternative hypothesis) Ω1 = (1000, ∞) Ω = (-∞,∞)
Example of a statistical test / hypothesis test: Reject H0 if 𝑋 > 1100
Rejection rule / rejection region / critical region:
Reject H0 if {x1, x2, …, xn} 𝜖 {(x1 , x2 , … , x𝑛 ) ∶ 𝑋 > 1100}
R = {(x1 , x2 , … , x𝑛 ) ∶ 𝑋 > 1100}

e.g.
X ~ N(1000,𝜎 2 )
H0: 𝜎 2 = 25 (null hypothesis) Ω0 = {25}
H1: 𝜎 2 ≠ 25 (alternative hypothesis) Ω1 = (0, 25)∪(25, ∞) Ω = (0, ∞)
Example of a statistical test / hypothesis test: Reject H0 if 𝑆 2 < 22 or if 𝑆 2 > 28
Rejection rule / rejection region / critical region:
Reject H0 if {x1, x2, …, xn} 𝜖 {(x1 , x2 , … , x𝑛 ) ∶ 𝑆 2 < 22 or if 𝑆 2 > 28}
i.e. R = {(x1 , x2 , … , x𝑛 ) ∶ 𝑆 2 < 22 or if 𝑆 2 > 28}

e.g.
X ~ Bin(10,p)
H0: p ≤ 0.3 (null hypothesis) Ω0 = [0 , 0.3]
H1: p > 0.3 (alternative hypothesis) Ω1 = (0.3 , 1] Ω = [0 , 1]
Example of a statistical test / hypothesis test: Reject H0 if 𝑋 > 0.4
Rejection rule / rejection region / critical region:
Reject H0 if {x1, x2, …, xn} 𝜖 {(x1 , x2 , … , x𝑛 ) ∶ 𝑋 > 0.4}
R = {(x1 , x2 , … , x𝑛 ) ∶ 𝑋 > 0.4}
The complement of the critical region is called the acceptance region.
So to define a statistical test Γ is to specify the critical region of the test.

Definition 1.5 : The power function of a test, 𝜋(𝜃), is the probability of rejecting the null
hypothesis (H0), as a function of 𝜃.

So
𝜋(𝜃) = 𝑃[𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝜃]
= 𝑃[(𝑋1 , … , 𝑋𝑛 ) ∈ 𝑅|𝜃] (1.2)
= 𝑃𝜃 [𝑿 ∈ 𝑅] .

e.g.
if our test is given by: Reject H0 if 𝑥 ≥ 1500
then 𝑅 = {(𝑥1 , … , 𝑥𝑛 ): 𝑥 ≥ 1500}
then 𝜋(𝜃) = P𝜃 (Reject H0 ) = P𝜃 (𝑋 ≥ 1500)

Note that 𝜋(𝜃) is defined over the whole range of 𝜃 ∈ Ω.


Note : The bold type 𝑿 (or 𝑋) denotes the whole sample.

Example 1.1

Suppose that the average life of light bulbs made under a standard manufacturing procedure is
1400 hours. It is desired to test a new process for manufacturing the bulbs. So we are dealing
with two populations of light bulbs. We know that the mean of the first population is 1400. The
question is whether the mean of the second population is greater than or less than 1400. If we
are really interested in determining if the new process would be an improvement on the
standard, i.e longer average lifetime, we would typically set up the null hypothesis as stating that
the new process has a mean that is actually smaller than or equal to the standard. If we can reject
this null hypothesis we can conclude that the new process is better.

Assume that the lifetimes of bulbs are exponentially Exp(1/𝜃) distributed with mean 𝜃, that is,
𝑓(𝑥|𝜃) = (1/𝜃) e^(−𝑥/𝜃) , 0 < 𝑥 < ∞ .

The two hypotheses are then ;


𝐻0 : 𝜃 ≤ 1400
𝐻1 : 𝜃 > 1400 ,

and we must decide on the basis of a random sample of 𝑛 of the new bulbs tested whether to
not reject or to reject 𝐻0 . Note that rejection of 𝐻0 means acceptance of 𝐻1 .

So we must specify our test Γ by defining the critical region 𝑅. 𝑅 is a subset of the sample
space 𝑆 and is generally in 𝑛 dimensions. However, as we shall see later, the critical region can
usually be expressed in terms of a single sufficient statistic, 𝑇(𝑋1 , … , 𝑋𝑛 ).
In this case a sufficient statistic for 𝜃 is the sample mean, X̄ = (1/𝑛) ∑ᵢ₌₁ⁿ 𝑋𝑖 . Obviously the larger
the sample mean, the less likely is the null hypothesis. So let us reject the null hypothesis if the
sample from the new process gives us a mean that is larger than 1500. The test Γ is then
specified by the following critical region:

𝑅 = {(𝑥1 , … , 𝑥𝑛 ): 𝑥 ≥ 1500} .
which means:

Reject 𝐻0 if 𝑥 ≥ 1500.

The power function of the test can now also be determined:

𝜋(𝜃) = 𝑃𝜃 [𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 ]
= 𝑃𝜃 [𝑋 ≥ 1500] .

To determine this function we must know the distribution of X̄, which is Gamma(𝑛, 𝜃/𝑛), and which
cannot be evaluated explicitly.

So let us assume 𝑛 = 1, i.e. a single new light bulb was tested. (X̄ = (1/𝑛) ∑ 𝑋𝑖 = (1/1) 𝑋1 = 𝑋1, which
we will call X). Then

𝜋(𝜃) = 𝑃𝜃 [𝑋 > 1500] , 𝜃 > 0

For a continuous distribution it doesn’t matter if you have < or if you have ≤
For a continuous distribution it doesn’t matter if you have > or if you have ≥

where 𝑋 ∼ Exp(1/𝜃), i.e. with mean 𝜃.

So (the probability that we reject H0 is)


𝜋(𝜃) = ∫₁₅₀₀^∞ (1/𝜃) e^(−𝑥/𝜃) 𝑑𝑥
= e^(−1500/𝜃) ,
a function of 𝜃.

From this power function we can now determine the probability of rejecting the null hypothesis
for any value of 𝜃.

For example, if the true mean of the new process is 𝜃 = 1600, then the probability of rejecting
the null hypothesis (correctly (because 𝜃 > 1400)) is
(null hypothesis is 𝐻0 : 𝜃 ≤ 1400)
“If the true mean of the new process is 1600, what is the probability that we (correctly) reject
H0”
𝜋(1600) = 𝑒 −0.938 = 0.392.

On the other hand, if the true mean is 𝜃 = 1300, then we don’t want to reject the null
hypothesis (because it is true i.e. 𝜃 ≤ 1400), but the probability of doing so is
“If the true mean of the new process is 1300, what is the probability that we (incorrectly) reject
H0”

𝜋(1300) = 𝑒 −1.1538 = 0.315.

FIGURE 1.1 : Power function of the test Γ.
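The power function above is easy to evaluate numerically. A minimal Python sketch (standard library only) that reproduces the two values just computed:

import math

def power(theta):
    # pi(theta) = P(reject H0 | theta) = P(X >= 1500 | theta) = exp(-1500/theta)
    return math.exp(-1500 / theta)

for theta in (1300, 1400, 1500, 1600, 2000):
    print(f"pi({theta}) = {power(theta):.3f}")
# pi(1600) is about 0.392 and pi(1300) about 0.315, as above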

Note for example 1.2:


X ~ N(𝜇, 𝜎²) implies Z = (X − 𝜇)/𝜎 ~ N(0,1)
Xᵢ ~ N(𝜇, 𝜎²) implies X̄ ~ N(𝜇, 𝜎²/𝑛) implies Z = (X̄ − 𝜇)/√(𝜎²/𝑛) = (X̄ − 𝜇)/(𝜎/√𝑛) ~ N(0,1)

Example 1.2

Suppose 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜃, 𝜎 2 ) where 𝜎 2 = 25 is known. We want to test whether 𝜃 could


be larger than 10, where the norm is that 𝜃 is smaller than or equal to 10.

𝐻0 : 𝜃 ≤ 10
So
𝐻1 : 𝜃 > 10 .

We decide to reject 𝐻0 if the sample mean is larger than 13. Thus the critical region

𝑅 = {𝒙 ∶ 𝑥 > 13}

specifies the test and the power function of the test is given by
𝜋(𝜃) = P𝜃(Reject H0)
= 𝑃𝜃[X̄ > 13]   where X̄ ∼ 𝑁(𝜃, 𝜎²/𝑛)
= 𝑃𝜃[(X̄ − 𝜃)/(𝜎/√𝑛) > (13 − 𝜃)/(𝜎/√𝑛)]
= 𝑃𝜃[𝑍 > (13 − 𝜃)/(𝜎/√𝑛)]
= 1 − 𝑃𝜃[𝑍 < (13 − 𝜃)/(𝜎/√𝑛)]
= 1 − Φ((13 − 𝜃)/(5/√𝑛))

< and ≤ is the same thing as we are dealing with a continuous distribution (the normal distribution)

where Φ represents the cumulative distribution function of a standard normal variate i.e. cdf
of Z~N(0,1) i.e. Φ(… ) is FZ(… ).

Φ always gives you probability with < or ≤ sign for N(0,1) distribution

Φ can be read off of table C or you can use the = norm.s.dist(…, true) function in Excel

𝜋(𝜃) can now be evaluated for any given value of 𝜃.

“i) Given that n = 9, find the probability of (correctly) rejecting H0 if θ = 15”


“ii) Given that n = 9, find the probability of (incorrectly) rejecting H0 if θ = 10”

i) 𝜋(15) = 1 − Φ((13 − 15)/(5/√9)) = 1 − Φ(−1.2) = 1 − P(Z < −1.2)
   = 1 − P(Z > 1.2) = P(Z < 1.2) = Φ(1.20) = 0.8849

ii) 𝜋(10) = 1 − Φ((13 − 10)/(5/√9)) = 1 − Φ(1.80) = 1 − 0.9641 = 0.0359.

in Excel:
i) 𝜋(15) = 1 − Φ((13 − 15)/(5/√9)) = 1 − Φ(−1.2) = 1 − norm.s.dist(−1.2, true) = 0.8849
ii) 𝜋(10) = 1 − Φ((13 − 10)/(5/√9)) = 1 − Φ(1.8) = 1 − norm.s.dist(1.8, true) = 0.0359
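The same two probabilities can be checked without tables; a minimal Python sketch using the standard library's NormalDist in place of Φ (the numbers 13, σ = 5 and n = 9 are those of this example):

from math import sqrt
from statistics import NormalDist

def power(theta, n=9, sigma=5.0, cutoff=13.0):
    # pi(theta) = P(Xbar > cutoff | theta) = 1 - Phi((cutoff - theta)/(sigma/sqrt(n)))
    return 1 - NormalDist().cdf((cutoff - theta) / (sigma / sqrt(n)))

print(round(power(15), 4))  # 0.8849
print(round(power(10), 4))  # 0.0359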
FIGURE 1.2 : Power function of the test Γ, (𝑛 = 9).

FIGURE 1.3 : Power function of the test Γ, (various sample sizes 𝑛).

Theorem 1.4 : Let X(1) < X(2) < ⋯ < X(n) denote the order statistics of a random sample
X1, …, Xn from a continuous population with pdf f_X(x) and cdf F_X(x). Then the pdf of X(j) is
f_{X(j)}(x) = [n!/((j−1)!(n−j)!)] f_X(x) [F_X(x)]^(j−1) [1 − F_X(x)]^(n−j) . (1.19)

It follows then for the smallest order statistic that f_{X(1)}(x) = n f_X(x) [1 − F_X(x)]^(n−1) (1.20)

and for the largest order statistic that f_{X(n)}(x) = n f_X(x) [F_X(x)]^(n−1) . (1.21)
(this often gets used with a distribution which has a pdf in a certain limited range, e.g. below, and
also the uniform U(0, θ) distribution)
Class Example 1

Consider a sample X1,…,Xn from the distribution: f(x|θ) = (x/θ)^(θ−1) , 0 ≤ x ≤ θ , θ > 0.

Let Y = X(n). For the test of H0: θ ≥ 3 versus H1: θ < 3, let the critical region be R = {x: X(n) ≤ 2.8}
Determine the power function of the test.

Solution:
The power function is:
π(θ) = P(Reject H0 | θ)
= P(X(n) ≤ 2.8 | θ)
= P(Y ≤ 2.8 | θ)
= ∫₀^2.8 f_Y(y) dy    (we thus need to find f_Y(y))

Y = X(n) is the nth order statistic. We will need the density function of Y:
f_Y(y) = n f_X(y) (F_X(y))^(n−1)
= n (y/θ)^(θ−1) (∫₀^y (x/θ)^(θ−1) dx)^(n−1)
= n (y/θ)^(θ−1) ([x^θ/(θ^(θ−1) θ)]₀^y)^(n−1)
= n (y/θ)^(θ−1) ((y/θ)^θ)^(n−1)
= n (y/θ)^(θ−1) (y/θ)^(nθ−θ)
= n (y/θ)^(nθ−1)    for 0 ≤ y ≤ θ

Thus the power function is:


π(θ) = ∫₀^2.8 f_Y(y) dy
= ∫₀^2.8 n (y/θ)^(nθ−1) dy
= [n (1/θ)^(nθ−1) (y^(nθ)/(nθ))]₀^2.8
= (2.8/θ)^(nθ)    for θ ≥ 2.8    (0 ≤ 2.8 ≤ θ implies θ ≥ 2.8)

If 𝜃 < 2.8 then π(θ) = P(Y ≤ 2.8 | θ) = 1


(Why 1?
Because we know 0 ≤ x ≤ θ
so 0 ≤ x ≤ number smaller than 2.8
so 0 ≤ xi ≤ number smaller than 2.8
so 0 ≤ biggest xi ≤ number smaller than 2.8
so 0 ≤ x(n) ≤ number smaller than 2.8
so 0 ≤ y ≤ number smaller than 2.8)
e.g. if 𝜃 = 2.6 then 0 ≤ y ≤ 2.6 then P(Y ≤ 2.8 | θ) = 1)
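A short sketch of this power function in Python; note that the sample size n below is a hypothetical value chosen only for illustration, since the class example leaves n general:

def power(theta, n=5):
    # pi(theta) = P(X(n) <= 2.8 | theta) for the density f(x|theta) = (x/theta)**(theta - 1)
    # on [0, theta]: equal to (2.8/theta)**(n*theta) when theta >= 2.8, and 1 otherwise
    if theta < 2.8:
        return 1.0
    return (2.8 / theta) ** (n * theta)

for theta in (2.5, 2.8, 3.0, 3.5):
    print(theta, round(power(theta), 4))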
1.2 Testing simple hypotheses
Definition 1.6 : When the parameter space under a particular hypothesis contains a single point,
it is a simple hypothesis. When a hypothesis contains more than one point it is called a composite
hypothesis.

H0: θ = 5 is an example of a simple hypothesis i.e. Ω0 = {5}


H1: θ = 6 is an example of a simple hypothesis i.e. Ω1 = {6} Ω = {5, 6}

θ<5,θ>5,θ≠5,θ≤5,θ≥5 are examples of composite hypotheses

Definition 1.7 : A Type I Error is made if the null hypothesis is rejected when it is true, and a Type
II Error is made if the null hypothesis is not rejected when it is false.

Since there are only two possible choices, the test of a simple null hypothesis against a simple
alternative can be represented schematically as follows :

𝐻0 true 𝐻0 false
(𝐻1 true)
𝐻0 not rejected Correct Type II Error
𝐻0 rejected Type I Error Correct

For the rest of ch1 we will consider the situation where both the null and alternative hypotheses
are simple. For the purpose of the test, the parameter space contains only two points. Under a
simple hypothesis the distribution of the observations is completely specified. Under a composite
hypothesis only the class of distributions is specified. So in this case we have

𝐻0 : 𝜃 = 𝜃0
𝐻1 : 𝜃 = 𝜃1 , 𝑤ℎ𝑒𝑟𝑒 Ω = {𝜃0 , 𝜃1 } .

Since the power function is the probability of rejecting the null hypothesis as a function of the
parameter, we can see that

𝜋(𝜃) = P[ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝜃]

𝜋(𝜃0 ) = 𝑃[ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝜃0 ]


= 𝑃[ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑖𝑠 𝑡𝑟𝑢𝑒 ]
(1.3)
= 𝑃[ 𝑇𝑦𝑝𝑒 𝐼 𝐸𝑟𝑟𝑜𝑟 ]
= 𝛼

and
𝜋(𝜃1 ) = 𝑃[ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝜃1 ]
= 𝑃[ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻1 𝑖𝑠 𝑡𝑟𝑢𝑒 ]
= 1 − 𝑃[ 𝐷𝑜 𝑛𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻1 𝑖𝑠 𝑡𝑟𝑢𝑒 ] (1.4)
= 1 − 𝑃[ 𝑇𝑦𝑝𝑒 𝐼𝐼 𝐸𝑟𝑟𝑜𝑟 ]
= 1−𝛽

Notation : We will denote the probability of a Type I Error by 𝛼, and the probability of a Type II
Error by 𝛽. 𝛼 is also called the size of the test.

In general:
α = P(type I error) = P(Reject H0 | H0 true)
β = P(type II error) = P(Do not reject H0 | H1 true)
𝜋(𝜃) = P(Reject H0 |𝜃)

So it follows for
𝐻0 : 𝜃 = 𝜃0
𝐻1 : 𝜃 = 𝜃1

𝛼 = P(Reject H0 | H0 true) = P(Reject H0 |𝜃0 ) = 𝜋(𝜃0 )


𝛽 = P(Do not reject H0 | H1 true) = 1 – P(Reject H0 | H1 true) = 1 – P(Reject H0 |𝜃1 ) = 1 − 𝜋(𝜃1 )
(1.5)
for simple hypotheses.

1.3 Optimal tests for simple hypotheses


It is desirable to find a test procedure for which the probabilities 𝛼 and 𝛽 are both small. In
general one test procedure would be better than another if its probabilities for the two types of
errors are both smaller than those for the other test. It is easy to construct a test for which 𝛼 =
0 by always accepting 𝐻0 . However for this procedure 𝛽 = 1. For a fixed sample size it is not
possible to find a test procedure for which 𝛼 and 𝛽 will both be arbitrarily small. If you want a
very small 𝛼, you have to accept a large 𝛽, and vice versa.

We will discuss two possible criteria for defining the best, or optimal test.

1.3.1 Minimizing a linear combination (minimize 𝒂𝜶 + 𝒃𝜷)


To minimize 𝒂𝜶 + 𝒃𝜷 (a and b are given/known), reject H0 if f(x|θ0)/f(x|θ1) < b/a.
Once we simplify the rejection rule, we can find 𝛼 and 𝛽,
and thus the minimum value of 𝑎𝛼 + 𝑏𝛽.

Suppose 𝑎 and 𝑏 are specified positive constants and it is desired to find a test Γ ∗ for which
𝑎𝛼 ∗ + 𝑏𝛽 ∗ will be a minimum, where 𝛼 ∗ and 𝛽 ∗ are the error probabilities associated with
the test Γ ∗ . The following theorem shows that a procedure that is optimal in this sense has a
very simple form.
Consider the simple hypotheses

𝐻0 : 𝜃 = 𝜃0 𝑎𝑔𝑎𝑖𝑛𝑠𝑡 𝐻1 : 𝜃 = 𝜃1 ,

and let
𝑓(𝐱|𝜃𝑖 ) = f(x1 , … , 𝑥𝑛 |𝜃𝑖 ) = ∏𝑛𝑗=1 𝑓(𝑥𝑗 |𝜃𝑖 )
( random sample
=> independent and identically distributed
=> pdf of Xj is fXj (xj |θi ) = fX (xj |θi ) = f(xj |θi )
Pdf of X1 ,…, Xn = Pdf of X1 × … × Pdf of Xn )

be the joint p.d.f. of the observations in the sample if 𝐻𝑖 is true (𝑖 = 0,1).

Theorem 1.1 : Let Γ* denote a test procedure such that 𝐻0 is rejected if f(x|θ0)/f(x|θ1) < b/a. Then for
any other test Γ, it follows that

𝑎𝛼 ∗ + 𝑏𝛽 ∗ ≤ 𝑎𝛼 + 𝑏𝛽 .

Proof : (We shall present the proof for the case where the sample 𝑋1 , … , 𝑋𝑛 is drawn from a
continuous distribution. In the case of a discrete distribution the 𝑛–dimensional integrals are
replaced by summations.)

Let 𝑅 denote the critical region of an arbitrary test Γ. Then the acceptance region is the
complement 𝑅 𝑐 .

Then

𝛼 = 𝑃[𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑖𝑠 𝑡𝑟𝑢𝑒 ]


= 𝑃[𝑿 ∈ 𝑅|𝜃 = 𝜃0 ]
= ∫𝑅 𝑓(𝒙|𝜃0 ) 𝑑𝑥.

Similarly

𝛽 = 𝑃[𝐷𝑜 𝑛𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻1 𝑖𝑠 𝑡𝑟𝑢𝑒 ]


= 𝑃[𝑿 ∈ 𝑅 𝑐 |𝜃 = 𝜃1 ]
= ∫𝑅𝑐 𝑓(𝒙|𝜃1 ) 𝑑𝑥 ,
and

𝑎𝛼 + 𝑏𝛽 = 𝑎 ∫𝑅 𝑓(𝒙|𝜃0 )𝑑𝑥 + 𝑏 ∫𝑅𝑐 𝑓(𝒙|𝜃1 )𝑑𝑥


= 𝑎 ∫𝑅 𝑓(𝒙|𝜃0 )𝑑𝑥 + 𝑏[1 − ∫𝑅 𝑓(𝒙|𝜃1 )𝑑𝑥]
(1.6)
= ∫𝑅 𝑎 𝑓(𝒙|𝜃0 )𝑑𝑥 + 𝑏 − ∫𝑅 𝑏 𝑓(𝒙|𝜃1 )𝑑𝑥
= 𝑏 + ∫𝑅 [𝑎𝑓(𝒙|𝜃0 ) − 𝑏𝑓(𝒙|𝜃1 )]𝑑𝑥.
(It can be seen from (1.6) that this quantity would be a) minimum if 𝑅 is chosen so that the
integral (on the right hand side) is a minimum. (This would be so) if 𝑅 contains every point for
which 𝑎𝑓(𝒙|𝜃0) − 𝑏𝑓(𝒙|𝜃1) < 0 (and excludes every point for which 𝑎𝑓(𝒙|𝜃0) − 𝑏𝑓(𝒙|𝜃1) ≥ 0).
So if 𝑅 is the region where 𝑎𝑓(𝒙|𝜃0) < 𝑏𝑓(𝒙|𝜃1), i.e. f(x|θ0)/f(x|θ1) < b/a, then 𝑎𝛼 + 𝑏𝛽 is a
minimum and corresponds to the test Γ* in the theorem.

Note : The ratio f(x|θ0)/f(x|θ1) is called the likelihood ratio of the sample, and to minimize 𝑎𝛼 + 𝑏𝛽,

reject 𝐻0 if f(x|θ0)/f(x|θ1) < b/a . (1.7)

The constants 𝑎 and 𝑏 determine the relative weights that are given to the two types of errors.
For example, if 𝑎 = 2 and 𝑏 = 1, then a Type I Error is considered twice as serious as a Type II
Error.

1.3.2 Most powerful tests (set 𝜶 to a small value (often 0.05) and minimise 𝜷)

To minimize 𝛽 (for a given 𝛼), i.e. to find the most powerful test, reject 𝐻0 if f(x|θ0)/f(x|θ1) < 𝑘.
Once we simplify the above, as we know the value of 𝛼, we can use it to find k and thus 𝛽.
1 − 𝛽 is known as the power of a test
Minimising 𝛽 => maximising 1 − 𝛽 => maximising power => most powerful

Since 𝛼 and 𝛽 cannot both be made arbitrarily small for a fixed sample size, an important
criterion for defining an optimal test is to keep the size of the Type I Error fixed and to choose a
test which will minimize the size of the Type II Error. This leads to the definition of a most powerful
test.

Definition 1.8 : A test Γ ∗ is a most powerful test (MPT) of size 𝛼 (0 < 𝛼 < 1) if :

(i) 𝜋 ∗ (𝜃0 ) = 𝛼 , (i.e. 𝛼 ∗ = 𝛼) and


(ii) 𝛽 ∗ ≤ 𝛽

for all other tests Γ of size 𝛼 or smaller.

Theorem 1.2 : (Neyman–Pearson Lemma)


Suppose Γ ∗ is a test of 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 = 𝜃1 which has the following form for some
constant 𝑘 > 0 :

𝒙 ∈ 𝑅 if f(x|θ0)/f(x|θ1) < 𝑘

where (𝑘 is found by) 𝑃𝜃0[𝑿 ∈ 𝑅] = 𝛼* .

If Γ is any other test such that 𝛼 ≤ 𝛼 ∗ , then 𝛽 ≥ 𝛽 ∗ .


Proof : The test has the same form as that given in Theorem 1.1 with 𝑎 = 1 and 𝑏 = 𝑘. So it
follows that

𝛼 ∗ + 𝑘𝛽 ∗ ≤ 𝛼 + 𝑘𝛽

where 𝛼 and 𝛽 are the error sizes of any other test Γ. So if 𝛼 ≤ 𝛼 ∗ , then 0 ≤ 𝛼 ∗ − 𝛼 ≤
𝑘(𝛽 − 𝛽 ∗ ), and thus, since 𝑘 > 0, 𝛽 ≥ 𝛽 ∗ .

OR

Note : The Neyman–Pearson Lemma states that the most powerful test of size 𝛼 is the one
where the null hypothesis is rejected when

f(x|θ0)/f(x|θ1) < 𝑘, (1.8)

where 𝑘 is chosen so that 𝑃𝜃0 [ 𝑿 ∈ 𝑅] = 𝛼.

Then the size of the Type II Error is smaller than for any other test with the same or smaller
Type I Error probability.
Example 1.3
Let 𝑥1 , … , 𝑥𝑛 be a random sample from a 𝑁(𝜃, 𝜎 2 ) distribution with 𝜎 2 known. Only two
values for 𝜃 are possible, so we want to test 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 = 𝜃1 where 𝜃0 < 𝜃1 .

(a) Suppose we want to minimize a linear combination of the error sizes, 𝑎𝛼 + 𝑏𝛽. Then,
according to Equation (1.7) we must reject 𝐻0 if f(x|θ0)/f(x|θ1) < b/a.
Now
f(xᵢ|θ) = (2πσ²)^(−1/2) e^(−(xᵢ−θ)²/(2σ²))

f(x|θ) = f(x1, …, xn|θ) = ∏ᵢ f(xᵢ|θ)
= (2πσ²)^(−1/2) e^(−(x1−θ)²/(2σ²)) × (2πσ²)^(−1/2) e^(−(x2−θ)²/(2σ²)) × … × (2πσ²)^(−1/2) e^(−(xn−θ)²/(2σ²))
= (2πσ²)^(−n/2) e^(−∑(xᵢ−θ)²/(2σ²))
(1.9)
is the normal likelihood function, and

f(x|θ0)/f(x|θ1) = [(2πσ²)^(−n/2) e^(−∑(xᵢ−θ0)²/(2σ²))] / [(2πσ²)^(−n/2) e^(−∑(xᵢ−θ1)²/(2σ²))]     (e^a/e^b = e^(a−b))   (1.10)
= e^(−[∑(xᵢ−θ0)² − ∑(xᵢ−θ1)²]/(2σ²))
= e^(−∑[xᵢ² − 2xᵢθ0 + θ0² − xᵢ² + 2xᵢθ1 − θ1²]/(2σ²))
= e^(−∑[2xᵢθ1 − 2xᵢθ0 + θ0² − θ1²]/(2σ²))
= e^(−[2(θ1−θ0)∑xᵢ + n(θ0² − θ1²)]/(2σ²))

Note: (1/n)∑xᵢ = x̄, thus ∑xᵢ = n x̄.

= e^(−[2(θ1−θ0)n x̄ + n(θ0² − θ1²)]/(2σ²))
= e^(−n[(θ0² − θ1²) + 2x̄(θ1−θ0)]/(2σ²))

So reject 𝐻0 if
e^(−n[(θ0² − θ1²) + 2x̄(θ1−θ0)]/(2σ²)) < b/a,
−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1−θ0)] < ln(b/a),

that is, if
θ0² − θ1² + 2x̄(θ1−θ0) > −(2σ²/n) ln(b/a),
or if
2x̄(θ1−θ0) > −(2σ²/n) ln(b/a) − (θ0² − θ1²),
2x̄(θ1−θ0) > θ1² − θ0² − (2σ²/n) ln(b/a),
x̄ > [θ1² − θ0² − (2σ²/n) ln(b/a)] / (2(θ1−θ0)) . (1.11)

For a numerical example, let 𝜃0 = 0, 𝜃1 = 5, 𝜎 2 = 16, 𝑛 = 9 and 𝑎 = 2, 𝑏 = 1.


From (1.11) follows that 𝐻0 should be rejected if 𝑥 > 2.7464.
What are then the error probabilities?

𝛼 = 𝑃[Reject 𝐻0 | 𝐻0 true]
= 𝑃[X̄ > 2.7464 | θ = 0]
= 𝑃[(X̄ − 0)/√(16/9) > (2.7464 − 0)/√(16/9) | θ = 0]
= 𝑃[𝑍 > 2.7464/(4/3)]
= 𝑃[𝑍 > 2.0598]
= 1 − 𝑃[𝑍 < 2.0598]
= 1 − 0.9803
= 0.0197 .

𝛽 = 𝑃(Do not reject 𝐻0 | 𝐻1 true)
= 𝑃[X̄ < 2.7464 | θ = 5]
= 𝑃[(X̄ − 5)/√(16/9) < (2.7464 − 5)/√(16/9) | θ = 5]
= 𝑃[𝑍 < (2.7464 − 5)/(4/3)]
= 𝑃[𝑍 < −1.6902]
= 𝑃[𝑍 > 1.6902]
= 1 − 𝑃[𝑍 < 1.6902]
= 1 − 0.9549
= 0.0455

(Excel can be used instead of the table. See note I added by example 1.2)

So 2𝛼 + 𝛽 = 2(0.0197) + (0.0455) = 0.0849.
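The numbers in part (a) can be reproduced in a few lines; a minimal Python sketch (standard library only) using the cutoff from (1.11) with θ0 = 0, θ1 = 5, σ² = 16, n = 9, a = 2, b = 1:

from math import log, sqrt
from statistics import NormalDist

theta0, theta1, sigma2, n, a, b = 0.0, 5.0, 16.0, 9, 2.0, 1.0
sigma, Z = sqrt(sigma2), NormalDist()

# cutoff from (1.11): reject H0 if xbar exceeds this value
cutoff = (theta1**2 - theta0**2 - (2 * sigma2 / n) * log(b / a)) / (2 * (theta1 - theta0))

se = sigma / sqrt(n)                        # standard deviation of the sample mean
alpha = 1 - Z.cdf((cutoff - theta0) / se)   # P(reject H0 | theta = theta0)
beta = Z.cdf((cutoff - theta1) / se)        # P(do not reject H0 | theta = theta1)

print(round(cutoff, 4), round(alpha, 4), round(beta, 4), round(a * alpha + b * beta, 4))
# cutoff ~ 2.7464, alpha ~ 0.0197, beta ~ 0.0455, 2*alpha + beta ~ 0.0849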


(b) Consider now the MPT of size 𝛼. According to Equation (1.8), reject 𝐻0 if f(x|θ0)/f(x|θ1) < 𝑘,
and it follows directly from Equation (1.11) with 𝑎 = 1 and 𝑏 = 𝑘 that 𝐻0 is rejected when
x̄ > [θ1² − θ0² − (2σ²/n) ln(k)] / (2(θ1−θ0)) . (1.12)

(If I didn’t have part a, I would have to first find this:


f(xᵢ|θ) = (2πσ²)^(−1/2) e^(−(xᵢ−θ)²/(2σ²))

f(x|θ) = f(x1, …, xn|θ) = ∏ᵢ f(xᵢ|θ) = (2πσ²)^(−n/2) e^(−∑(xᵢ−θ)²/(2σ²))   (1.9)

is the normal likelihood function, and

f(x|θ0)/f(x|θ1) = [(2πσ²)^(−n/2) e^(−∑(xᵢ−θ0)²/(2σ²))] / [(2πσ²)^(−n/2) e^(−∑(xᵢ−θ1)²/(2σ²))]   (1.10)
= e^(−[∑(xᵢ−θ0)² − ∑(xᵢ−θ1)²]/(2σ²))
= e^(−∑[2xᵢθ1 − 2xᵢθ0 + θ0² − θ1²]/(2σ²))
= e^(−[2(θ1−θ0)∑xᵢ + n(θ0² − θ1²)]/(2σ²))
= e^(−[2(θ1−θ0)n x̄ + n(θ0² − θ1²)]/(2σ²))      (since ∑xᵢ = n x̄)
= e^(−n[(θ0² − θ1²) + 2x̄(θ1−θ0)]/(2σ²))

So reject 𝐻0 if
e^(−n[(θ0² − θ1²) + 2x̄(θ1−θ0)]/(2σ²)) < k,
−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1−θ0)] < ln(k),
that is, if
θ0² − θ1² + 2x̄(θ1−θ0) > −(2σ²/n) ln(k),
or if
2x̄(θ1−θ0) > −(2σ²/n) ln(k) − (θ0² − θ1²),
2x̄(θ1−θ0) > θ1² − θ0² − (2σ²/n) ln(k),
x̄ > [θ1² − θ0² − (2σ²/n) ln(k)] / (2(θ1−θ0)) . )
(with min a𝛼 + 𝑏𝛽 we could stop here, as the rhs is known.

But with setting 𝛼 and minimising 𝛽 (most powerful test), the rhs has a k in the formula, which
is unknown. Thus we must continue and use the definition of 𝛼 to find the value of the rhs)

(The important difference between the two approaches is that 𝑎 and 𝑏 in (1.11) are assumed
known, while 𝑘 in (1.12) must be determined so that the size of the Type I Error is equal to 𝛼.
So let the right–hand side of (1.12) be equal to 𝑐. ) Then 𝐻0 is rejected if 𝑥 > 𝑐, where

𝛼 = 𝑃[ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑖𝑠 𝑡𝑟𝑢𝑒 ] = 𝑃[ 𝑋 > 𝑐|𝜃0 ] = 𝑃𝜃0 [𝑋 > 𝑐]

We know that under 𝐻0, Xᵢ ∼ 𝑁(θ0, σ²), so X̄ ∼ 𝑁(θ0, σ²/n); so, by standardising,

𝛼 = 𝑃θ0[X̄ > 𝑐] = 𝑃[(X̄ − θ0)/(σ/√n) > (𝑐 − θ0)/(σ/√n)]
= 𝑃[𝑍 > (𝑐 − θ0)/(σ/√n)]   where 𝑍 ∼ 𝑁(0,1)
= 1 − 𝑃[𝑍 < (𝑐 − θ0)/(σ/√n)]
1 − 𝛼 = 𝑃[𝑍 < (𝑐 − θ0)/(σ/√n)]
1 − 𝛼 = Φ((𝑐 − θ0)/(σ/√n))      Φ gets read from Table C, but c is unknown.
Φ⁻¹(1 − 𝛼) = (𝑐 − θ0)/(σ/√n)

Φ⁻¹ gets read from Table D (or Table E if it had an ∞ row), or in Excel =norm.s.inv(…).

z_{1−𝛼} = (𝑐 − θ0)/(σ/√n)

So (𝑐 − θ0)/(σ/√n) is the (1 − 𝛼)th percentile of the standard normal distribution, z_{1−𝛼}, and
(𝑐 − θ0)/(σ/√n) = z_{1−𝛼}, or
𝑐 = θ0 + z_{1−𝛼} σ/√n .

So the MPT of size 𝛼 is:

Reject 𝐻0 if x̄ > θ0 + z_{1−𝛼} σ/√n . (1.13)

Note that it is not necessary to evaluate 𝑘, only 𝑐, which is the right–hand side of (1.12).
For the numerical example 𝛼 should first be fixed. Let 𝛼 = 0.01 (and let 𝜃0 = 0, 𝜃1 = 5, 𝜎 2 =
16, 𝑛 = 9) then it follows from (1.13) that 𝐻0 should be rejected if 𝑥 > 3.1067.

Then

𝛼 = 0.01 𝑎𝑛𝑑
𝛽 = 𝑃(𝐷𝑜 𝑛𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻1 𝑡𝑟𝑢𝑒) = 𝑃[𝑋 < 3.1067|𝜃 = 5] = 𝑃[𝑍 < −1.4200]
= 0.0778 .

(the way we find 𝛽 is similar to that in part a)
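For part (b), the cutoff c = θ0 + z_{1−α}σ/√n and the resulting β can be computed the same way; a minimal sketch (standard library only; note the notes use the table value z_0.99 = 2.33, so the cutoff below differs slightly):

from math import sqrt
from statistics import NormalDist

theta0, theta1, sigma, n, alpha = 0.0, 5.0, 4.0, 9, 0.01
se, Z = sigma / sqrt(n), NormalDist()

c = theta0 + Z.inv_cdf(1 - alpha) * se   # cutoff of the MPT, equation (1.13)
beta = Z.cdf((c - theta1) / se)          # P(Xbar < c | theta = theta1)

print(round(c, 4), round(beta, 4))
# c ~ 3.10 and beta ~ 0.077 (the notes use z_0.99 = 2.33 from the table, giving 3.1067 and 0.0778)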

Example 1.4 : Sampling from a Bernoulli distribution. [N.B. This is an example where we use a
discrete distribution.]

Suppose that 𝑋1 , … , 𝑋𝑛 is a random sample from a Bernoulli distribution with unknown


parameter 𝑝, and that the following hypotheses are to be tested :

𝐻0 : 𝑝 = 𝑝0 𝑣𝑒𝑟𝑠𝑢𝑠 𝐻1 : 𝑝 = 𝑝1 𝑤ℎ𝑒𝑟𝑒 𝑝0 < 𝑝1 .

We want to find a MPT of size 𝛼. Now the probability function is


f(x|p) = p^x (1 − p)^(1−x) , x = 0,1 ,

and the likelihood function for a sample of 𝑛 is

f(x|p) = ∏ᵢ p^(xᵢ) (1 − p)^(1−xᵢ)
= p^(∑xᵢ) (1 − p)^(n−∑xᵢ)
= p^y (1 − p)^(n−y)   where y = ∑xᵢ .
Then
f(x|p0)/f(x|p1) = [p0^y (1−p0)^(n−y)] / [p1^y (1−p1)^(n−y)] = (p0/p1)^y ((1−p0)/(1−p1))^(n−y)

and we reject 𝐻0 if f(x|p0)/f(x|p1) < 𝑘, i.e. if
(p0/p1)^y ((1−p0)/(1−p1))^(n−y) < 𝑘 .
(p0/p1)^y ((1−p0)/(1−p1))^n ((1−p0)/(1−p1))^(−y) < 𝑘
(p0/p1)^y ((1−p0)/(1−p1))^n ((1−p1)/(1−p0))^y < 𝑘

That is, if
[p0(1−p1)/(p1(1−p0))]^y ((1−p0)/(1−p1))^n < 𝑘,
[p0(1−p1)/(p1(1−p0))]^y < 𝑘 / ((1−p0)/(1−p1))^n
[p0(1−p1)/(p1(1−p0))]^y < 𝑘 ((1−p1)/(1−p0))^n
or if
y ln[p0(1−p1)/(p1(1−p0))] < ln[𝑘 ((1−p1)/(1−p0))^n] .

Now it is important to note that since p0 < p1, p0/p1 < 1, and p0, p1 > 0, so 0 < p0/p1 < 1.
Also p0 < p1, so −p0 > −p1, so 1 − p0 > 1 − p1, so (1−p1)/(1−p0) < 1,
and 0 < p0, p1 < 1, so 0 < 1 − p0, 1 − p1 < 1, so 0 < (1−p1)/(1−p0) < 1.

So 0 < p0(1−p1)/(p1(1−p0)) < 1, and therefore the log is negative. So if we divide both sides of the inequality
by ln[p0(1−p1)/(p1(1−p0))] < 0, the inequality sign is reversed. So reject 𝐻0 if
y > ln[𝑘 ((1−p1)/(1−p0))^n] / ln[p0(1−p1)/(p1(1−p0))] . (1.14)

As in the previous example it is not necessary to evaluate 𝑘. If we denote the right–hand side by
𝑐, then we reject 𝐻0 if 𝑦 > 𝑐 where 𝑐 is obtained from the relation
𝛼 = 𝑃(𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑡𝑟𝑢𝑒) = 𝑃𝑝0 [𝑌 > 𝑐] .

Under 𝐻0 , 𝑌 = ∑𝑛𝑖=1 𝑋𝑖 has a Binomial distribution with parameters 𝑛 and 𝑝0 . Since the
distribution is discrete it is generally not possible to find 𝑐 that exactly satisfies the equation.
For example, suppose 𝑛 = 20 and 𝑝0 = 1/2 and we want 𝛼 = 0.05. Now
P[Y > c | p0 = 1/2] = ∑_{y=c+1}^{20} (20 choose y) (1/2)^y (1/2)^(20−y) = ∑_{y=c+1}^{20} (20 choose y) (1/2)^20 (1.15)

We want 0.05 = 𝛼 = 𝑃𝑝0 [𝑌 > 𝑐] i.e. we want c such that 1 − 0.05 = 1 − 𝑃𝑝0 [𝑌 > 𝑐] i.e.

𝑃𝑝0 [𝑌 ≤ 𝑐] = 0.95 where Y ~ Bin(20, ½)


From Binomial tables follows that
P[Y ≤ 13 | p = 1/2] = 0.9423
and
P[Y ≤ 14 | p = 1/2] = 0.9793 .

So there is no integer value of 𝑐 for which P[Y ≤ c] is exactly 0.95; it must be either
0.9423 (𝑐 = 13) or 0.9793 (𝑐 = 14), i.e. the size of the test is either 0.0577 or 0.0207. In practice we usually
use a test with size as close as possible to the desired one. 0.9423 is closer to 0.95 than 0.9793, thus we
choose c = 13, i.e. reject H0 if 𝑌 > 13 (equivalently, if Y ≥ 14).
(The size of the test can be made exactly what is desired by using randomized tests, but we will
not deal with that.)
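Choosing c from the Binomial(20, ½) distribution can also be done directly; a minimal sketch assuming scipy is available (scipy.stats.binom plays the role of Table A):

from scipy.stats import binom

n, p0, alpha = 20, 0.5, 0.05

# choose c so that the size P(Y > c | p0) is as close as possible to alpha
sizes = [(c, 1 - binom.cdf(c, n, p0)) for c in range(n + 1)]
c, size = min(sizes, key=lambda pair: abs(pair[1] - alpha))

print(c, round(size, 4))   # c = 13 with actual size ~ 0.0577, i.e. reject H0 if Y > 13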

Table A  Binomial (e.g. Bin(20,0.5)) CDF: P(Y ≤ 14) = BINOM.DIST(14,20,0.5,TRUE) = 0.979305267
(no table)  Binomial (e.g. Bin(20,0.5)) PDF: P(Y = 14) = BINOM.DIST(14,20,0.5,FALSE)
Table B  Poisson (e.g. Poisson(7)) CDF: P(X ≤ 2) = POISSON.DIST(2,7,TRUE) = 0.029636164
(no table)  Poisson (e.g. Poisson(7)) PDF: P(X = 2) = POISSON.DIST(2,7,FALSE)
Table C  Normal CDF: =NORM.S.DIST(…, TRUE)
Table D  Normal percentiles: =NORM.S.INV(…)
Table E  t percentiles: =T.INV(…, df)
Table F  Chi-square percentiles: =CHISQ.INV(…, df)
Table G  Beta percentiles: =BETA.INV(…, alpha, beta)
Table H  F percentiles: =F.INV(…, df1, df2)

e.g. X~Beta(0.2 , 0.3) and I want F(?) = P(X < ?) = 0.9 then ? = Beta.Inv(0.9, 0.2, 0.3)

1.3.3 Sufficient statistics

The application of the Neyman–Pearson Lemma can be simplified by using sufficient statistics
instead of the whole sample.
Remember that a statistic
𝑇(𝑥1 , … , 𝑥𝑛 ) = 𝑇(𝐱) ,
a function of the sample, is a sufficient statistic for the parameter 𝜃 if the joint density function
of the sample can be expressed as
𝑓(𝒙|𝜃) = 𝑔(𝑇(𝒙)|𝜃)ℎ(𝒙) , (1.16)

where ℎ(𝒙) > 0 is independent of 𝜃.

e.g. ∑𝑋𝑖 is sufficient for 𝜇 as we can show the above; practically it means 𝑇(𝑿) = ∑𝑋𝑖 has
enough information in it for us to get the best estimate of 𝜇, which is (1/𝑛)∑𝑋𝑖. You will know your
sample size (e.g. 10) and all you need to know is the value of ∑𝑋𝑖 (e.g. ∑𝑋𝑖 = 120); then you
have enough information to get the best estimate of 𝜇, i.e. it is (1/𝑛)∑𝑋𝑖 = (1/10)(120) = 12.
Theorem 1.3 : Consider the simple hypotheses 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 = 𝜃1 . Suppose
𝑇(𝐱 ) is a sufficient statistic for 𝜃 and 𝑔(𝑡|𝜃𝑖 ) is the pdf of 𝑇 corresponding to 𝜃𝑖 (𝑖 = 0,1).
Then any test with critical region 𝑈 (a subset of the sample space of 𝑇) is a MPT of size 𝛼 if it
satisfies
t ∈ U if g(t|θ0)/g(t|θ1) < 𝑘 (1.17)
for some 𝑘 > 0, where 𝑃θ0[𝑇 ∈ 𝑈] = 𝛼 .

i.e. we can use g(t|θ0)/g(t|θ1) < 𝑘 as our rejection rule (when to reject H0), where g is the pdf of t,
instead of f(x|θ0)/f(x|θ1) < 𝑘, where f is the pdf of x1,…,xn, when finding the most powerful test.

E.g.:
• This t could be for example x̅ when doing a test on θ for the N(θ, σ2) distribution
where σ2 is known. If Xi ~ N(θ, σ2) then X̅ ~ N(θ, σ2/n), so you can easily write down its
density function. (see example 1.5)
• This t could be for Y = max(Xi) = X(n) when doing a test on θ of the Uniform(0, θ)
distribution. We would use theorem 1.4. to get the density function of t = Y = max(Xi)
= X(n)
• If a density function has range 0 ≤ x ≤ θ, then we use t = Y = max(Xi) = X(n). We would
use theorem 1.4. to get the density function of t = Y = max(Xi) = X(n). (see example 1.6)
• If a density function has range a ≤ x ≤ θ, where a is known and the test is on θ, then we
use t = max(Xi) = X(n). We would use theorem 1.4. to get the density function of t = Y =
max(Xi) = X(n)
• If a density function has range θ ≤ x ≤ b, where b is known and the test is on θ, then we
use t = min(Xi) = X(1). We would use theorem 1.4. to get the density function of t = Y =
min(Xi) = X(1).

Proof : According to the Neyman–Pearson Lemma,


x ∈ R if f(x|θ0)/f(x|θ1) < 𝑘 .
By the Factorization Theorem, the pdf can be written as f(x|θᵢ) = g(T(x)|θᵢ) h(x), i = 0,1,
where T(x) is a sufficient statistic. So
x ∈ R if [g(T(x)|θ0) h(x)] / [g(T(x)|θ1) h(x)] < 𝑘 .
Since h(x) is independent of θ, it follows that
x ∈ R if g(T(x)|θ0)/g(T(x)|θ1) < 𝑘 .

In terms of the original sample, the test based on 𝑇 has critical region
R = {x : x ∈ R} = {x : T(x) ∈ U}, where T(x) ∈ U if g(T(x)|θ0)/g(T(x)|θ1) < 𝑘,
and thus
𝛼 = 𝑃θ0[𝑿 ∈ 𝑅] = 𝑃θ0[𝑇(𝑿) ∈ 𝑈].
So the test based on 𝑇 is a MPT of size 𝛼.
Example 1.5 : Consider again the problem in Example 1.3(b). (We want to test the mean of a
normal distribution based on a sample of 𝑛 observations. We know a sufficient statistic for the
mean 𝜃 is 𝑇(𝑿) = X̄, the sample mean, with distribution X̄ ∼ 𝑁(𝜃, σ²/n).
So
g(x̄|θ) = √(n/(2πσ²)) e^(−n(x̄−θ)²/(2σ²)) (1.18)
and we apply the Neyman–Pearson Lemma. The result is identical to (1.12) in what follows, but
the calculations are simpler. )
Example 1.5 is just example 1.3(b) where we use our new result (theorem 1.3) instead of using
theorem 1.2 i.e.

𝑋ᵢ ∼ 𝑁(θ, σ²)

t(𝑿) = X̄ ∼ 𝑁(θ, σ²/n)   because X̄ is sufficient for θ. (Shown in 2nd year)

g(x̄|θ) = [1/√(2πσ²/n)] e^(−(x̄−θ)²/(2σ²/n)) = √(n/(2πσ²)) e^(−n(x̄−θ)²/(2σ²))

g(x̄|θ0)/g(x̄|θ1) = [√(n/(2πσ²)) e^(−n(x̄−θ0)²/(2σ²))] / [√(n/(2πσ²)) e^(−n(x̄−θ1)²/(2σ²))]     (e^a/e^b = e^(a−b))
= e^(−n[(x̄−θ0)² − (x̄−θ1)²]/(2σ²))
= e^(−n[x̄² − 2x̄θ0 + θ0² − x̄² + 2x̄θ1 − θ1²]/(2σ²))
= e^(−n[2x̄θ1 − 2x̄θ0 + θ0² − θ1²]/(2σ²))
= e^(−n[(θ0² − θ1²) + 2x̄(θ1−θ0)]/(2σ²))

So reject 𝐻0 if
e^(−n[(θ0² − θ1²) + 2x̄(θ1−θ0)]/(2σ²)) < k,
−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1−θ0)] < ln(k),

that is, if
θ0² − θ1² + 2x̄(θ1−θ0) > −(2σ²/n) ln(k),

or if
2x̄(θ1−θ0) > −(2σ²/n) ln(k) − (θ0² − θ1²),
2x̄(θ1−θ0) > θ1² − θ0² − (2σ²/n) ln(k),
x̄ > [θ1² − θ0² − (2σ²/n) ln(k)] / (2(θ1−θ0)) .

Then 𝐻0 is rejected if 𝑥 > 𝑐, where

𝛼 = 𝑃[ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑖𝑠 𝑡𝑟𝑢𝑒 ] = 𝑃[ 𝑋 > 𝑐|𝜃0 ] = 𝑃𝜃0 [𝑋 > 𝑐]

We know that under 𝐻0, Xᵢ ∼ 𝑁(θ0, σ²), so X̄ ∼ 𝑁(θ0, σ²/n); so, by standardising,

𝛼 = 𝑃θ0[X̄ > 𝑐] = 𝑃[(X̄ − θ0)/(σ/√n) > (𝑐 − θ0)/(σ/√n)]
= 𝑃[𝑍 > (𝑐 − θ0)/(σ/√n)]   where 𝑍 ∼ 𝑁(0,1)
= 1 − 𝑃[𝑍 < (𝑐 − θ0)/(σ/√n)]
1 − 𝛼 = 𝑃[𝑍 < (𝑐 − θ0)/(σ/√n)]
1 − 𝛼 = Φ((𝑐 − θ0)/(σ/√n))      Φ gets read from Table C, but c is unknown.
Φ⁻¹(1 − 𝛼) = (𝑐 − θ0)/(σ/√n)
z_{1−𝛼} = (𝑐 − θ0)/(σ/√n)
(𝑐 − θ0)/(σ/√n) = z_{1−𝛼}, or
𝑐 = θ0 + z_{1−𝛼} σ/√n .

So the MPT of size 𝛼 is:

Reject 𝐻0 if x̄ > θ0 + z_{1−𝛼} σ/√n

I could also ask you to calculate 𝛽 for this example (and for the theoretical example 1.3(b)):

𝛽 = 𝑃[Do not reject 𝐻0 | 𝐻1 is true] = 𝑃[X̄ < 𝑐 | θ1] = 𝑃[X̄ < θ0 + z_{1−𝛼} σ/√n | θ1]
= 𝑃[(X̄ − θ1)/(σ/√n) < (θ0 + z_{1−𝛼} σ/√n − θ1)/(σ/√n)]
= 𝑃[Z < (θ0 + z_{1−𝛼} σ/√n − θ1)/(σ/√n)]
= Φ((θ0 + z_{1−𝛼} σ/√n − θ1)/(σ/√n))
= Φ(√n(θ0 − θ1)/σ + z_{1−𝛼})

(Φ( ) is the same thing as FZ ( ).)
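As a numerical illustration of this β formula, a minimal sketch (standard library only) using the same values as Example 1.3, θ0 = 0, θ1 = 5, σ = 4, n = 9, α = 0.01:

from math import sqrt
from statistics import NormalDist

theta0, theta1, sigma, n, alpha = 0.0, 5.0, 4.0, 9, 0.01
Z = NormalDist()

# beta = Phi(sqrt(n)*(theta0 - theta1)/sigma + z_{1-alpha})
beta = Z.cdf(sqrt(n) * (theta0 - theta1) / sigma + Z.inv_cdf(1 - alpha))
print(round(beta, 4))   # ~0.077, matching the direct calculation in Example 1.3(b)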


For the next example we will need the following theorem.

Theorem 1.4 : Let X(1) < X(2) < ⋯ < X(n) denote the order statistics of a random sample
X1, …, Xn from a continuous population with pdf f_X(x) and cdf F_X(x). Then the pdf of X(j) is
f_{X(j)}(x) = [n!/((j−1)!(n−j)!)] f_X(x) [F_X(x)]^(j−1) [1 − F_X(x)]^(n−j) . (1.19)
It follows then for the smallest order statistic that
f_{X(1)}(x) = n f_X(x) [1 − F_X(x)]^(n−1) (1.20)
and for the largest order statistic that
f_{X(n)}(x) = n f_X(x) [F_X(x)]^(n−1) . (1.21)
(this often gets used with a distribution which has a pdf in a certain limited range, limited by the
parameter, e.g. below, and also the uniform U(0, θ) distribution.) If the range is 0 ≤ 𝑥 ≤ 𝜃 then
X(n) will be sufficient for 𝜃, i.e. we can use Theorem 1.3 where our t (which in the previous
example was x̄) is now t = X(n). (If we had 𝜃 ≤ 𝑥 ≤ 5 we would use t = X(1).)

Example 1.6 : Consider a sample 𝑋1 , … , 𝑋𝑛 from the distribution


𝑥 𝜃−1
𝑓(𝑥|𝜃) = (𝜃) ,0 ≤ 𝑥 ≤ 𝜃 ,𝜃 > 0 ,
and we want to test 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 = 𝜃1 , where 𝜃0 < 𝜃1 .

(Now the joint pdf of 𝑥1 , … , 𝑥𝑛 is

f(x|θ) = (1/θ^(n(θ−1))) ∏ᵢ xᵢ^(θ−1) , 0 ≤ xᵢ ≤ θ , i = 1, … , n .
This is a nonstandard problem and difficult to evaluate unless we use a sufficient statistic.
A sufficient statistic for 𝜃 is 𝑌 = max{𝑋1 , … , 𝑋𝑛 }, the largest order statistic.

Note : It is useful to remember that if a parameter defines a boundary point of the sample space,
then the corresponding order statistic is sufficient.)
We will use theorem 1.3 instead of theorem 1.2.

From (1.21) it follows that the distribution of 𝑡 = 𝑋(𝑛) = 𝑌 is given by


𝑔𝑌 (𝑦|𝜃) = 𝑛𝑓𝑋 (𝑦|𝜃)[𝐹𝑋 (𝑦|𝜃)]𝑛−1 , 0 ≤ 𝑦 ≤ 𝜃 ,
𝑤ℎ𝑒𝑟𝑒
𝑦 𝜃−1
𝑓𝑋 (𝑦|𝜃) = ( ) , 𝑎𝑛𝑑
𝜃
𝑦
𝑥 𝜃−1 𝑦 𝜃
𝐹𝑋 (𝑦|𝜃) = ∫ ( ) 𝑑𝑥 = ( ) .
0 𝜃 𝜃
𝑆𝑜
𝑛
𝑔(𝑦|𝜃) = 𝑛𝜃−1
𝑦 𝑛𝜃−1 , 0 ≤ 𝑦 ≤ 𝜃 .
𝜃
𝑔(𝑦|𝜃0 )
𝑊𝑒 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 < 𝑘, 𝑤ℎ𝑒𝑟𝑒
𝑔(𝑦|𝜃1 )
𝑛 𝑛𝜃0 −1
𝑔(𝑦|𝜃0 ) 𝜃 𝑛𝜃0 −1 𝑦
𝑛𝜃 −1 𝑛𝜃 −1
𝑦 𝑛𝜃0 −1 𝜃1 1 𝜃 1
−𝑛(𝜃1 −𝜃0 ) 1
= 𝑛 = 𝑛𝜃 −1
= 𝑦 𝑛𝜃 −1
𝑔(𝑦|𝜃1 ) 𝑦 𝑛𝜃1 −1 𝑦 𝑛𝜃1 −1 𝜃0 0 𝜃0 0
𝜃 𝑛𝜃1 −1

We reject 𝐻0 if g(y|θ0)/g(y|θ1) < 𝑘
i.e. if y^(−n(θ1−θ0)) θ1^(nθ1−1)/θ0^(nθ0−1) < 𝑘
i.e. if y^(−n(θ1−θ0)) < 𝑘 θ0^(nθ0−1)/θ1^(nθ1−1)
(you had to check whether θ0^(nθ0−1)/θ1^(nθ1−1) is negative or positive, i.e. whether < becomes > or stays <;
we know 0 < θ0 < θ1, so θ0^(nθ0−1)/θ1^(nθ1−1) is positive, so the sign stays the same)
that is, when
y > (𝑘 θ0^(nθ0−1)/θ1^(nθ1−1))^(−1/(n(θ1−θ0)))   (since θ1 > θ0; when you raise to a negative power, the inequality sign changes)
or 𝑦 > 𝑐 , where
𝛼 = 𝑃(𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑡𝑟𝑢𝑒)

𝛼 = 𝑃𝜃0 [𝑌 > 𝑐] 𝑜𝑟 𝑦𝑜𝑢 𝑐𝑎𝑛 𝑢𝑠𝑒 1 − 𝑃𝜃0 [𝑌 < 𝑐]


𝛼 = ∫_c^θ0 [n/θ0^(nθ0−1)] y^(nθ0−1) dy   or you can use 1 − ∫₀^c [n/θ0^(nθ0−1)] y^(nθ0−1) dy
𝛼 = 1 − (c/θ0)^(nθ0) .

So (c/θ0)^(nθ0) = 1 − 𝛼, and c = θ0 (1 − 𝛼)^(1/(nθ0)) .

The MPT of size 𝛼 is then: Reject 𝐻0 if y > θ0 (1 − 𝛼)^(1/(nθ0)) .

I could also ask you to calculate 𝛽:


𝛽 = 𝑃[Do not reject 𝐻0 | 𝐻1 is true] = 𝑃θ1[𝑌 < 𝑐] = ∫₀^c [n/θ1^(nθ1−1)] y^(nθ1−1) dy
= (c/θ1)^(nθ1) = (θ0 (1 − 𝛼)^(1/(nθ0)) / θ1)^(nθ1) = (θ0/θ1)^(nθ1) (1 − 𝛼)^(θ1/θ0)
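To get a feel for the numbers, the cutoff c = θ0(1 − α)^(1/(nθ0)) and the corresponding β can be evaluated; a minimal sketch where θ0, θ1, n and α are hypothetical values chosen only for illustration:

theta0, theta1, n, alpha = 3.0, 4.0, 5, 0.05   # hypothetical illustration values (theta0 < theta1)

c = theta0 * (1 - alpha) ** (1 / (n * theta0))   # MPT cutoff: reject H0 if y > c
beta = (c / theta1) ** (n * theta1)              # P(Y < c | theta = theta1)

print(round(c, 4), round(beta, 4))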

Exercises 1
1. Suppose that 𝑋1 , … , 𝑋𝑛 form a random sample from a uniform distribution on the interval
(0, 𝜃), and that the following hypotheses are to be tested : 𝐻0 : 𝜃 ≥ 2 𝑣𝑒𝑟𝑠𝑢𝑠 𝐻1 : 𝜃 < 2.
Let 𝑌𝑛 = max{𝑋1 , … , 𝑋𝑛 } and consider the test procedure with critical region 𝑌𝑛 ≤ 1.5.
Determine the power function of the test.
2. Consider a sample 𝑥1 , 𝑥2 , … , 𝑥𝑛 from the distribution

f(x|θ) = (x/θ)^(θ−1) , 0 ≤ x ≤ θ , θ > 0 .
Let 𝑌 = max {𝑥1 , 𝑥2 , … , 𝑥𝑛 } and for the test of 𝐻0 : 𝜃 ≤ 2 versus 𝐻1 : 𝜃 > 2, let the critical
region be 𝑅 = {𝑥: 𝑦 > 1.9}.
Find the power function of this test.

3. Let 𝑋 be a single observation from the density


𝑓(𝑥|𝜃) = 2𝜃𝑥 + 1 − 𝜃 , 0 ≤ 𝑥 ≤ 1,
where −1 ≤ 𝜃 ≤ 1. We want to test 𝐻0 : 𝜃 = 0 versus 𝐻1 : 𝜃 = 1.
(a) Determine a test for which the value of 𝛼 + 2𝛽 is a minimum and determine that
minimum value.
(b) Find a test Γ ∗ for which 𝛼 ∗ ≤ 0.1 and 𝛽 ∗ is a minimum. What is the power of this test?

4. Let 𝑋 be a single observation from the density


𝑓(𝑥|𝜃) = 𝜃𝑥 𝜃−1 , 0 < 𝑥 < 1 , 𝜃 > 0 .
(a) Find a most powerful size 𝛼 test of 𝐻0 : 𝜃 = 2 versus 𝐻1 : 𝜃 = 1 and determine the size
of the Type II Error.
(b) Find the test for the above hypotheses that minimizes 𝛼 + 𝛽 and calculate the values of
𝛼 and 𝛽 for this test.

5. Let 𝑋 be a single observation from


f(x|θ) = θ 2^θ / x^(θ+1) , 2 ≤ x < ∞ .
𝐹𝑜𝑟 𝑡ℎ𝑒 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠 𝐻0 : 𝜃 = 1 𝑣𝑠 𝐻1 : 𝜃 = 2 ,
find the MPT of size 𝛼. What is the size of the Type II error?
6. Suppose 𝑋1 , … , 𝑋𝑛 is a random sample from a normal distribution with unknown mean 𝜃
and variance 1, and that the following hypotheses are to be tested : 𝐻0 : 𝜃 = 3.5 versus
𝐻1 : 𝜃 = 5.
(a) Among all test procedures for which 𝛽 ≤ 0.05 , determine the test for which 𝛼 is a
minimum.
(b) For 𝑛 = 4, find the minimum value attained by the test in (a).

7. Consider a random sample of size 𝑛 from a normal distribution with mean zero and
unknown variance 𝜎 2 . We want to test the hypotheses 𝐻0 : 𝜎 2 = 2 versus 𝐻1 : 𝜎 2 = 3.
(a) Find the most powerful test of size 𝛼 = 0.05.
(b) Determine the test in (a) when 𝑛 = 8.

8. Suppose a single observation is taken from a uniform distribution on the interval (0, 𝜃), and
that the following hypotheses are to be tested : 𝐻0 : 𝜃 = 1 versus 𝐻1 : 𝜃 = 2.
(a) Show that there exists a test procedure for which 𝛼 = 0 and 𝛽 < 1.
(b) Among all tests for which 𝛼 = 0, find the one for which 𝛽 is a minimum.
9. Let 𝑋1 , … , 𝑋𝑛 be a random sample from a Poisson distribution with unknown parameter
𝜆. For the hypotheses 𝐻0 : 𝜆 = 𝜆0 versus 𝐻1 : 𝜆 = 𝜆1 , where 𝜆0 < 𝜆1 ;
(a) Find the most powerful size−𝛼 test ;
(b) Find the MPT when 𝑛 = 20, 𝜆0 = 0.1, 𝜆1 = 0.2 and 𝛼 is approximately 0.1.

10. Suppose a random sample from a normal distribution with unknown mean 𝜇 and
standard deviation 2. We want to test 𝐻0 : 𝜇 = −1 versus 𝐻1 : 𝜇 = 1. Determine the minimum
value of 𝛼 + 𝛽 that can be attained for (a) 𝑛 = 1, (b) 𝑛 = 9, (c) 𝑛 = 36.

11. In 20 tosses of a coin, 5 heads and 15 tails appear. Test the null hypothesis that the coin is
fair against the alternative that the probability for heads is 0.3. The size of the test should be
smaller or equal to 0.1. What is the power of your test?

12. A bag contains 5 balls of which 𝑚 are black and the rest, 5 − 𝑚, are white. Now, 𝑚 is
either 3 or 1, so we draw two balls without replacement and we want to test 𝐻0 : 𝑚 = 3
versus 𝐻1 : 𝑚 = 1. If we decide to reject 𝐻0 if both balls drawn are white, find 𝛼 and 𝛽.
Chapter 2
Composite Hypotheses
(< or > or ≤ or ≥ or ≠ etc.)
(simple implies =)

2.1 Introduction
As stated in definition 1.6, if the parameter space under a particular hypothesis contains more
than one point, it is a composite hypothesis. In most cases in practice composite hypotheses are
considered, or cases where one of the hypotheses is composite. Often the null hypotheses is
simple while the alternative hypothesis is composite. For example, the null hypothesis can specify
a particular value, 𝐻0 : 𝜃 = 𝜃0 , the standard or norm, while the alternative just states that the
null hypothesis is not true, or 𝐻1 : 𝜃 ≠ 𝜃0 .

Note : We shall denote the parameter space by Ω, and 𝐻0 : 𝜃 ∈ Ω0 versus 𝐻1 : 𝜃 ∈ Ω1 , where


Ω0 ∪ Ω1 = Ω and Ω0 ∩ Ω1 = 𝜙, as the hypotheses. The alternative is one–sided if Ω1 is one
subset of Ω, and two–sided if Ω1 is formed by the union of two disjoint subsets of Ω.
For example, for 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 ≠ 𝜃0 , the alternative is two–sided, while for 𝐻0 : 𝜃 ≤
𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 or if we have 𝐻0 : 𝜃 ≥ 𝜃0 versus 𝐻1 : 𝜃 < 𝜃0 , the alternative is one–
sided.

In Chapter 1, for two simple hypotheses, the size of the test 𝛼 = 𝜋(θ0) =
𝑃(Reject 𝐻0 | 𝐻0 true) = 𝑃(Reject 𝐻0 | θ = θ0) was defined as the probability of rejecting 𝐻0
when 𝐻0 is true. When 𝐻0 is composite (e.g. 𝑃(Reject 𝐻0 | 𝐻0 true) = 𝑃(Reject 𝐻0 | θ ≤ θ0)),
this definition is not sufficient and we need the more general definition below.

sup is simply a max where the limit is always included
inf is simply a min where the limit is always included

max_{x∈(2,3)} x² does not exist        sup_{x∈(2,3)} x² = 3² = 9
max_{x∈(2,3]} x² = 3² = 9              sup_{x∈(2,3]} x² = 3² = 9

min_{x∈(2,3)} x² does not exist        inf_{x∈(2,3)} x² = 2² = 4
min_{x∈[2,3)} x² = 2² = 4              inf_{x∈[2,3)} x² = 2² = 4

𝜃 ∉ Ω0 is the same thing as 𝜃 ∈ Ω0ᶜ is the same thing as 𝜃 ∈ Ω1


Definition 2.1 : Let Γ be a test of 𝐻0 : 𝜃 ∈ Ω0 versus 𝐻1 : 𝜃 ∈ Ω0ᶜ where Ω0 ⊂ Ω.
The size of the test is defined as
𝛼 = sup 𝜋(𝜃)
𝜃∈Ω0
= sup 𝑃[𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 ]
𝜃∈Ω0 (2.1)
= sup 𝑃[𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑡𝑟𝑢𝑒]
= sup𝑃[𝑇𝑦𝑝𝑒 𝐼 𝐸𝑟𝑟𝑜𝑟] ,

where 𝜋(𝜃) is the power function of the test.

Similarly,
𝛽 = sup [1 − 𝜋(𝜃)]
𝜃∈Ω𝑐0
= sup 𝑃[𝐷𝑜 𝑛𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 ]
𝜃∈Ω𝑐0
= sup 𝑃[𝐷𝑜 𝑛𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻1 𝑡𝑟𝑢𝑒]
= sup𝑃[𝑇𝑦𝑝𝑒 𝐼𝐼 𝐸𝑟𝑟𝑜𝑟 ] ,
Note that the supremum (sup) is the same as the least-upper-bound, but differs from the
maximum in that it is not necessarily in the set of values.

Note : The size of a test is also called the level of significance of the test.

Example 2.1
Consider again the light bulbs from Example 1.1. There we had the two composite hypotheses,
𝐻0 : 𝜃 ≤ 1400 versus 𝐻1 : 𝜃 > 1400, and it was decided to reject 𝐻0 if 𝑥 ≥ 1500. The power
function was derived as
π(θ) = e^(−1500/θ) .
Now the size of this test is the maximum probability of committing a Type I Error,
α = sup_{θ∈Ω0} π(θ) = sup_{θ≤1400} e^(−1500/θ) .
Since π(θ) is an increasing function of θ, the supremum occurs at the maximum value of θ
under 𝐻0, which is 1400. So α = e^(−1500/1400) = 0.3425.

The power function is a measure of the efficiency of a test. For a “good” test we should have

(i) 𝜋(𝜃) ⟶ 0 if 𝜃 ∈ Ω0 (𝐻0 true).

(ii) 𝜋(𝜃) ⟶ 1 if 𝜃 ∈ Ω1 (𝐻1 true).

The power function of a perfect test of the hypothesis 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 would
look like the solid line in Figure 2.1. However a test is never perfect and the power function would
usually look like the dotted line.
FIGURE 2.1 : Power function

2.2 Generalized Likelihood Ratio Tests


(mainly used for = vs ≠. Can be used for ≤ vs > or for ≥ vs <)

The most powerful test (MPT) defined by the Neyman–Pearson Lemma does not apply in the
case of composite hypotheses. The Neyman–Pearson Lemma defines a likelihood ratio test
𝑓(𝒙|𝜃 ) 𝐿(𝜃 |𝒙)
because we rejected H0 if 𝑓(𝒙|𝜃0) < 𝑘 i.e. 𝐿(𝜃0 |𝒙) < 𝑘. We will use the same principle to define
1 1
generalized likelihood ratio tests (LR-tests). However it will not have the same optimal properties
as the most powerful test in the case of simple hypotheses.

Definition 2.2 : Let 𝒙 = {𝑥1 , … , 𝑥𝑛} be a random sample from a distribution with likelihood
function 𝐿(𝜃|𝒙). For the hypotheses 𝐻0 : 𝜃 ∈ Ω0 versus 𝐻1 : 𝜃 ∈ Ω0ᶜ, where Ω0 ⊂ Ω, let
λ(𝒙) = sup_{θ∈Ω0} L(θ|𝒙) / sup_{θ∈Ω} L(θ|𝒙) . (2.3)

A generalized likelihood ratio test (GLRT) of size 𝛼 is any test that has a rejection region of the
form 𝑅 = {𝒙 : λ(𝒙) ≤ 𝑘}, where 0 ≤ 𝑘 ≤ 1 is chosen so that sup_{θ∈Ω0} 𝑃[𝑿 ∈ 𝑅] = 𝛼.

(set 𝜶 to a small value (often 0.05) and minimise 𝜷)


To minimize 𝛽 (for a given 𝛼), i.e. to find the GLRT, reject 𝐻0 if sup_{θ∈Ω0} L(θ|𝒙) / sup_{θ∈Ω} L(θ|𝒙) ≤ 𝑘.
Once we simplify the above, as we know the value of 𝛼, we use it to find k (or c) and thus 𝛽.)

Remarks :
(i) Note that Λ(𝑋1 , … , 𝑋𝑛 ) is a statistic with observed value 𝜆(𝒙) = 𝜆(𝑥1 , … , 𝑥𝑛 ), called the
test statistic.

(ii) If the parameter space is reduced to two points so that Ω0 = {θ0} and Ω = {θ0, θ1}, then
the GLRT does not reduce to the MPT for simple hypotheses.
(iii) Note that 0 ≤ λ(𝒙) ≤ 1, since 𝐿(𝜃|𝒙) is positive and sup_{θ∈Ω0} L(θ|𝒙) ≤ sup_{θ∈Ω} L(θ|𝒙)
because Ω0 ⊂ Ω.

(iv) Note that the supremum of 𝐿(𝜃|𝒙) over the whole parameter space Ω is always
obtained at the maximum likelihood estimators of the parameters. So sup𝜃∈Ω 𝐿(𝜃|𝒙) =
𝐿(𝜃̂|𝒙), where 𝜃̂ represents the ML estimator of 𝜃.

The ML estimator of 𝜃 is the value of 𝜃 that maximizes 𝐿(𝜃|𝒙).

For convenience, we note that this is the same value as the value that maximizes ln(𝐿(𝜃|𝒙)),
sometimes denoted 𝑙(𝜃|𝒙), i.e. we find it by setting (∂/∂θ) ln(𝐿(𝜃|𝒙)) |_{θ=θ̂} = 0.

Steps in deriving a LR–test:

1. Write down the likelihood function.


2. Write down the two hypotheses and define the parameter spaces Ω and Ω0 .
3. Determine the maximum likelihood estimators in Ω and in Ω0 i.e. usual MLE, and
maximising the likelihood function just over Ω0 .
4. Calculate the LR statistic 𝜆(𝒙) and simplify as much as possible.
5. State the LR–test in terms of the result in (iv) i.e. Reject H0 if 𝜆(𝒙) ≤ c.
6. If possible, write the rule stated in (5) in terms of a single statistic with known distribution
under the null hypothesis i.e. usually we try to get a LHS where we know LHS~…..
7. Use the distribution of the statistic to determine the constant so that the size of the test is
𝛼. (Use the definition of 𝛼 to find the RHS value of the rejection rule.)
8. State the critical region of the test clearly. (i.e. Reject H0 if …..)

Example 2.2 : Consider a sample 𝑥1 , … , 𝑥𝑛 from the exponential Exp(θ) distribution:


𝑓(𝑥|𝜃) = 𝜃𝑒 −𝜃𝑥 , 0 ≤ 𝑥 < ∞ .

We want to find a test of size 𝛼 for the simple null hypothesis 𝐻0 : 𝜃 = 𝜃0 versus the composite
two–sided alternative 𝐻1 : 𝜃 ≠ 𝜃0 . Note that now the exponential density function is defined
differently from Example 1.1.

i) The likelihood function is


𝐿(𝜃|𝒙) = 𝜃 𝑛 𝑒 −𝜃Σ𝑥𝑖 .

ii) 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 ≠ 𝜃0
The parameter spaces are
Ω = {𝜃: 𝜃 > 0} 𝑎𝑛𝑑 Ω0 = {𝜃: 𝜃 = 𝜃0 } .
(formula sheet)

iii)
In Ω0 : (θ can only be θ0) so we substitute θ0 for θ:
sup_{θ∈Ω0} L(θ|𝒙) = L(θ0|𝒙) = θ0^n e^(−θ0 ∑xᵢ) ,
since Ω0 contains only one point, θ0.

In Ω : (the maximum (supremum) will occur at the usual MLE)

sup_{θ∈Ω} L(θ|𝒙) = L(θ̂|𝒙) ,

where θ̂ is the maximum likelihood estimator. If the ML estimator is not known, it must be
derived by setting (∂/∂θ) ln L(θ|𝒙) = 0 and solving for θ. In this case
L(θ|𝒙) = θ^n e^(−θ∑xᵢ)
ln L(θ|𝒙) = n ln θ − θ∑xᵢ
(∂/∂θ) ln L(θ|𝒙) = n/θ − ∑xᵢ
Setting (∂/∂θ) ln L(θ|𝒙)|_{θ=θ̂} = n/θ̂ − ∑xᵢ = 0, it follows that
n/θ̂ = ∑xᵢ
1/θ̂ = ∑xᵢ/n
θ̂ = n/∑xᵢ = 1/((1/n)∑xᵢ) = 1/x̄ .
So
sup_{θ∈Ω} L(θ|𝒙) = L(θ̂|𝒙) = L(1/x̄ | 𝒙)
= (1/x̄)^n e^(−(1/x̄)∑xᵢ)
= (1/x̄)^n e^(−(1/x̄)n x̄)
= x̄^(−n) e^(−n)
iv)
So
λ(𝒙) = sup_{θ∈Ω0} L(θ|𝒙) / sup_{θ∈Ω} L(θ|𝒙) = θ0^n e^(−θ0 ∑xᵢ) / (x̄^(−n) e^(−n)) = θ0^n e^(−θ0 n x̄) / (x̄^(−n) e^(−n)) = (θ0 x̄)^n e^(n(1−θ0 x̄)) . (2.4)

v)
According to the GLR test we reject 𝐻0 if λ(𝒙) ≤ 𝑐 , that is, if
(θ0 x̄)^n e^(n(1−θ0 x̄)) ≤ 𝑐 . (2.5)

This can't be simplified algebraically, but graphically we can see that it is the same as:
x̄ ≤ 𝑘1 or x̄ ≥ 𝑘2 .

(To determine 𝑐 we must write the inequality in terms of a statistic of which the distribution is
known. We note that the only sample statistic on the left-hand side of (2.5) is x̄, so we want to
redefine the rejection region in terms of x̄. Now λ(𝒙) in (2.4) is a non-monotone function of
x̄ with a maximum of one when x̄ = 1/θ0. So (2.5) will hold for 0 ≤ 𝑐 ≤ 1 when x̄ is either
small enough or large enough. This is shown in Figure 2.2.
FIGURE 2.2 : LR statistic as a function of x̄

So λ(𝒙) ≤ 𝑐 if and only if x̄ ≤ 𝑘1 or x̄ ≥ 𝑘2. Note that 𝑘1 and 𝑘2 are the two roots of the
equation λ(𝒙) − 𝑐 = 0. It is usually not easy to find these roots, so the problem is solved by
using the condition that the size of the test must be 𝛼.)

vi)

Important relationships I :
If Xᵢ ∼ Exp(θ) , i = 1, … , n, then
∑ᵢ₌₁ⁿ Xᵢ ∼ Gamma(n, θ) , and
X̄ = (1/n)∑ᵢ₌₁ⁿ Xᵢ ∼ Gamma(n, θ/(1/n)) = Gamma(n, nθ) , and
2nθX̄ ∼ Gamma(n, nθ/(2nθ)) = Gamma(2n/2, 1/2) ,
so Y = 2nθX̄ ∼ χ²_{2n}   (because χ²_v = Gamma(v/2, 1/2))
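The last relationship can be checked quickly by simulation; a minimal sketch assuming numpy is installed (n and θ below are arbitrary illustration values), using the fact that a χ²_{2n} variable has mean 2n and variance 4n:

import numpy as np

rng = np.random.default_rng(0)
n, theta, reps = 6, 0.5, 200_000      # arbitrary illustration values

x = rng.exponential(scale=1 / theta, size=(reps, n))   # Exp(theta) has mean 1/theta
y = 2 * n * theta * x.mean(axis=1)                     # 2*n*theta*Xbar for each sample

print(y.mean(), y.var())   # should be close to 2n = 12 and 4n = 24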

Reject H0 if 𝑥 ≤ 𝑘1 or 𝑥 ≥ 𝑘2 .

vii)
(We have seen that the critical region 𝑅 = {𝒙 : λ(𝒙) ≤ 𝑐} is equivalent to
𝑅 = {𝒙 : x̄ ≤ 𝑘1 or x̄ ≥ 𝑘2} , so)
α = sup_{θ∈Ω0} P[Reject 𝐻0] = 𝑃θ0[Λ(𝑿) ≤ 𝑐]
= 𝑃θ0[X̄ ≤ 𝑘1 or X̄ ≥ 𝑘2] . (2.6)
α/2 + α/2 = 𝑃θ0[X̄ ≤ 𝑘1] + 𝑃θ0[X̄ ≥ 𝑘2]

(However, an infinite number of pairs (𝑘1 , 𝑘2 ) satisfy (2.6). Any pair would specify a valid LR–
test of size 𝛼 for the given hypotheses, but the proper one would also satisfy 𝜆(𝑘1 ) = 𝜆(𝑘2 ).
As this is a difficult solution to find, in practice an “equal tail” test is usually preferred. This means
that Equation (2.6) is split into two parts where 𝑘1 < 𝑘2 , so that)
𝑃𝜃0 [𝑋 ≤ 𝑘1 ] = 𝑃[𝑋 ≥ 𝑘2 ] = 𝛼/2 . (2.7)
i.e.
𝑃𝜃0 [𝑋 ≤ 𝑘1 ] = 𝛼/2 and 𝑃[𝑋 ≥ 𝑘2 ] = 𝛼/2

(This means we make the probability of a Type I Error for a two–sided alternative the same above
and below the null hypothesis. From (2.7) the constants 𝑘1 and 𝑘2 can now be uniquely
determined.)

So under the null hypothesis we know the distribution of 2𝑛𝜃0 𝑋, and


α/2 = 𝑃θ0[X̄ ≤ 𝑘1] = 𝑃[2nθ0X̄ ≤ 2nθ0𝑘1]
= 𝑃[𝑌 ≤ 2nθ0𝑘1] = F_Y(2nθ0𝑘1)   where 𝑌 = 2nθ0X̄ ∼ χ²_{2n} (2.8)
So 2nθ0𝑘1 = F_Y⁻¹(α/2) = χ²_{2n, α/2} ,
which can be found in a chi–squared table with 2n degrees of freedom.

(Notation : The subscript after a symbol for a distribution will always denote the cdf–value of the
distribution. For example, if 𝑍 ∼ 𝑁(0,1), then z_α is the value for which Φ(z_α) = F_Z(z_α) =
𝑃[𝑍 ≤ z_α] = α, i.e. z_α = Φ⁻¹(α); or if 𝑋 ∼ t_n, then t_{n,α} means that F_X(t_{n,α})
= 𝑃[𝑋 ≤ t_{n,α}] = α, i.e. t_{n,α} = F_X⁻¹(α) if 𝑋 ∼ t_n. If 𝑋 ∼ χ²_n, then χ²_{n,α} is that value for which
F_X(χ²_{n,α}) = 𝑃[𝑋 ≤ χ²_{n,α}] = α, i.e. χ²_{n,α} = F_X⁻¹(α) if 𝑋 ∼ χ²_n. )

From (2.8) it follows then that
𝑘1 = χ²_{2n, α/2} / (2nθ0) . (2.9)
Similarly, it follows that
α/2 = 𝑃θ0[X̄ ≥ 𝑘2] = 𝑃θ0[2nθ0X̄ ≥ 2nθ0𝑘2] = 1 − 𝑃θ0[2nθ0X̄ ≤ 2nθ0𝑘2] .
So
F_Y(2nθ0𝑘2) = 𝑃[𝑌 ≤ 2nθ0𝑘2] = 1 − α/2
2nθ0𝑘2 = F_Y⁻¹(1 − α/2)
which means that 2nθ0𝑘2 = χ²_{2n, 1−α/2}, and
𝑘2 = χ²_{2n, 1−α/2} / (2nθ0) . (2.10)
viii) (Finally the result can be stated as follows:
For the hypotheses 𝐻0 : θ = θ0 versus 𝐻1 : θ ≠ θ0,) reject 𝐻0 if
x̄ ≤ χ²_{2n, α/2}/(2nθ0)   or   x̄ ≥ χ²_{2n, 1−α/2}/(2nθ0) .
OR :
𝑅 = {𝒙 : x̄ ≤ χ²_{2n, α/2}/(2nθ0) or x̄ ≥ χ²_{2n, 1−α/2}/(2nθ0)} . (2.11)
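The cut-off points k1 and k2 in (2.11) can be computed directly; a minimal sketch assuming scipy is available (θ0, n and α below are illustration values, not from the notes):

from scipy.stats import chi2

theta0, n, alpha = 2.0, 10, 0.05      # illustration values

k1 = chi2.ppf(alpha / 2, 2 * n) / (2 * n * theta0)
k2 = chi2.ppf(1 - alpha / 2, 2 * n) / (2 * n * theta0)

print(round(k1, 4), round(k2, 4))     # reject H0 if xbar <= k1 or xbar >= k2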
Steps in deriving a LR–test:
1. Write down the likelihood function.
2. Write down the two hypotheses and define the parameter spaces Ω and Ω0 .
3. Determine the maximum likelihood estimators in Ω and in Ω0 i.e. usual MLE, and
maximising the likelihood function just over Ω0 .
4. Calculate the LR statistic 𝜆(𝒙) and simplify as much as possible.
5. State the LR–test in terms of the result in (iv) i.e. Reject H0 if 𝜆(𝒙) ≤ c.
6. If possible, write the rule stated in (5) in terms of a single statistic with known distribution
under the null hypothesis i.e. usually we try to get a LHS where we know LHS~…..
7. Use the distribution of the statistic to determine the constant so that the size of the test is
𝛼. (Use the definition of 𝛼 to find the RHS value of the rejection rule.)
8. State the critical region of the test clearly. (i.e. Reject H0 if …..)

In example 2.3 and example 2.4 we have that ∑(𝒙𝒊 − 𝝁𝟎 )𝟐 = ∑(𝒙𝒊 − 𝒙)𝟐 + 𝒏(𝒙 − 𝝁𝟎 )𝟐
which is from:

∑(𝑥𝑖 − 𝜇0 )2 = ∑(𝑥𝑖 − 𝑥 + 𝑥 − 𝜇0 )2 = ∑[(𝑥𝑖 − 𝑥) + (𝑥 − 𝜇0 )]2


= ∑[(𝑥𝑖 − 𝑥)2 + 2(𝑥𝑖 − 𝑥)(𝑥 − 𝜇0 ) + (𝑥 − 𝜇0 )2 ]
= ∑(𝑥𝑖 − 𝑥)2 + ∑ 2(𝑥𝑖 − 𝑥)(𝑥 − 𝜇0 ) + ∑(𝑥 − 𝜇0 )2
= ∑(𝑥𝑖 − 𝑥)2 + 2(𝑥 − 𝜇0 ) ∑(𝑥𝑖 − 𝑥) + 𝑛(𝑥 − 𝜇0 )2
= ∑(𝑥𝑖 − 𝑥)2 + 2(𝑥 − 𝜇0 )(∑ 𝑥𝑖 − ∑ 𝑥) + 𝑛(𝑥 − 𝜇0 )2
= ∑(𝑥𝑖 − 𝑥)2 + 2(𝑥 − 𝜇0 )(𝑛𝑥 − 𝑛𝑥) + 𝑛(𝑥 − 𝜇0 )2
= ∑(𝑥𝑖 − 𝑥)2 + 2(𝑥 − 𝜇0 )(0) + 𝑛(𝑥 − 𝜇0 )2
= ∑(𝑥𝑖 − 𝑥)2 + 0 + 𝑛(𝑥 − 𝜇0 )2
= ∑(𝑥𝑖 − 𝑥)2 + 𝑛(𝑥 − 𝜇0 )2

Let us look at an important standard problem and follow the steps as given above.
Example 2.3 : Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) where 𝜎 2 is known and we want to derive a LR–
test of size 𝛼 for 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0 .
1. Likelihood function: 𝐿(𝜇|𝐱) = (2πσ²)^(−n/2) e^(−∑(xᵢ−μ)²/(2σ²)) . (pdf on formula sheet, ∏f(xᵢ|μ))

2. 𝐻0 : 𝜇 = 𝜇0 vs 𝐻1 : 𝜇 ≠ 𝜇0 .
Ω = {𝜇: −∞ < 𝜇 < ∞}
Ω0 = {𝜇: 𝜇 = 𝜇0 }

3. In Ω the ML estimator is μ̂ = x̄ (from (∂/∂μ) ln L(μ|𝒙)|_{μ=μ̂} = 0, because we want
sup_{μ∈Ω} L(μ|𝐱)).
In Ω0 there is a single point, μ = μ0 (because we want sup_{μ∈Ω0} L(μ|𝐱)).

sup_{μ∈Ω} L(μ|𝐱) = L(μ̂|𝐱) = L(x̄|𝐱) = (2πσ²)^(−n/2) e^(−∑(xᵢ−x̄)²/(2σ²)) .
sup_{μ∈Ω0} L(μ|𝐱) = L(μ0|𝐱) = (2πσ²)^(−n/2) e^(−∑(xᵢ−μ0)²/(2σ²)) .
4.
So
λ(𝒙) = sup_{μ∈Ω0} L(μ|𝐱) / sup_{μ∈Ω} L(μ|𝐱)
= [(2πσ²)^(−n/2) e^(−∑(xᵢ−μ0)²/(2σ²))] / [(2πσ²)^(−n/2) e^(−∑(xᵢ−x̄)²/(2σ²))]
= e^([∑(xᵢ−x̄)² − ∑(xᵢ−μ0)²]/(2σ²)) (2.12)
= e^(−n(x̄−μ0)²/(2σ²))
since ∑(xᵢ−μ0)² = ∑(xᵢ−x̄)² + n(x̄−μ0)²
(shown before this example)

5. Now reject 𝐻0 if λ(𝒙) ≤ 𝑐, that is, if

e^(−n(x̄−μ0)²/(2σ²)) ≤ 𝑐 .
−(n/(2σ²))(x̄−μ0)² ≤ ln 𝑐
(x̄−μ0)² ≥ −(2σ²/n) ln 𝑐
(x̄−μ0)² ≥ 𝑘²

6. Thus reject 𝐻0 if
(𝑥 − 𝜇0 )2 ≥ 𝑘 2 .
√(𝑥 − 𝜇0 )2 ≥ √𝑘 2 .
|𝑥 − 𝜇0 | ≥ k
±(𝑥 − 𝜇0 ) ≥ k
−(𝑥 − 𝜇0 ) ≥ k or (𝑥 − 𝜇0 ) ≥ k
(𝑥 − 𝜇0 ) ≤ −k or (𝑥 − 𝜇0 ) ≥ k
𝑥 − 𝜇0 ≤ −k or 𝑥 − 𝜇0 ≥ k
𝑥 ≤ 𝜇0 − k or 𝑥 ≥ 𝜇0 + k

where, under 𝐻0, Xᵢ ∼ 𝑁(μ0, σ²), so X̄ ∼ 𝑁(μ0, σ²/n).
𝑛

7. To determine 𝑘, we must have

α = sup P[Reject 𝐻0 | 𝐻0 true] = 𝑃μ0[Reject 𝐻0] = 𝑃μ0[X̄ ≤ μ0 − k or X̄ ≥ μ0 + k]
α/2 + α/2 = 𝑃μ0[X̄ ≤ μ0 − k] + 𝑃μ0[X̄ ≥ μ0 + k]

Let
α/2 = 𝑃μ0[X̄ ≥ μ0 + k]
= 𝑃μ0[(X̄ − μ0)/(σ/√n) ≥ ((μ0 + k) − μ0)/(σ/√n)] = 𝑃μ0[𝑍 ≥ k/(σ/√n)] = 1 − P[𝑍 ≤ k/(σ/√n)]
= 1 − Φ(k/(σ/√n))

Φ(k/(σ/√n)) = 1 − α/2
k/(σ/√n) = Φ⁻¹(1 − α/2)
k/(σ/√n) = z_{1−α/2}
k = z_{1−α/2} σ/√n .

8. LR–test: Reject the null hypothesis 𝐻0 : 𝜇 = 𝜇0 if


x̄ ≤ μ0 − k or x̄ ≥ μ0 + k, i.e.
x̄ ≤ μ0 − z_{1−α/2} σ/√n   or   x̄ ≥ μ0 + z_{1−α/2} σ/√n . (2.13)
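A minimal sketch of the resulting two-sided z-test in Python (standard library only); the data, μ0 and σ below are made-up illustration values:

from math import sqrt
from statistics import NormalDist, mean

def z_test(x, mu0, sigma, alpha=0.05):
    # two-sided test of H0: mu = mu0 with sigma known, as in (2.13)
    n = len(x)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    xbar = mean(x)
    return xbar <= mu0 - half_width or xbar >= mu0 + half_width

data = [10.2, 12.1, 9.8, 11.5, 10.9]         # made-up observations
print(z_test(data, mu0=10.0, sigma=1.5))     # False here: xbar = 10.9 lies inside (8.69, 11.31)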

(Note : Remember that 𝑧𝛼/2 = −𝑧1−𝛼/2 so if we used 𝛼/2 = 𝑃𝜇0 [𝑋 ≤ 𝜇0 − 𝑘] we would


end up with the same answer by converting 𝑧𝛼/2 to −𝑧1−𝛼/2 .

In the next important example we will show how to deal with a nuisance parameter, that is, a
parameter which we are not interested in testing, but which is unknown.)

Example 2.4 : Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) where 𝜎 2 is unknown and we want to derive the
LR–test of size 𝛼 for 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0 .

1. Likelihood function :
𝐿(μ, σ²|𝒙) = (2πσ²)^(−n/2) e^(−∑(xᵢ−μ)²/(2σ²)) . (pdf on formula sheet, ∏f(xᵢ|μ))

2. 𝐻0 : 𝜇 = 𝜇0 vs 𝐻1 : 𝜇 ≠ 𝜇0 .
Now the parameter space is two–dimensional since we have two unknown parameters.
Ω = {(𝜇, 𝜎 2 ) : − ∞ < 𝜇 < ∞ , 𝜎 2 > 0}
Ω0 = {(𝜇, 𝜎 2 ): 𝜇 = 𝜇0 , 𝜎 2 > 0} .

3. In Ω the ML estimators are


μ̂ = x̄  and  σ̂² = (1/n) Σ(xi − x̄)².

This is from:
∂ℓn L(μ,σ²|x)/∂μ |_{μ=μ̂, σ=σ̂} = 0, which yields μ̂ = x̄,
and
∂ℓn L(μ,σ²|x)/∂σ |_{μ=μ̂, σ=σ̂} = 0, which yields σ̂² = (1/n) Σ(xi − μ̂)² = (1/n) Σ(xi − x̄)².

In Ω0 , now a line in the space 𝜎 2 > 0,

μ = μ0  and  σ̂0² = (1/n) Σ(xi − μ0)².
This is from:
μ can only be μ0, but we still have to estimate σ²:
∂ℓn L(μ,σ²|x)/∂σ |_{μ=μ0, σ=σ̂0} = 0 yields σ̂0² = (1/n) Σ(xi − μ0)².
𝜇0 is in the formula because when we set 𝜇 = 𝜇0

4. (Plug what you got in step 3 into the Likelihood, as this is where the sup (max) occurs)

In Ω:
sup_{(μ,σ²)∈Ω} L(μ, σ²|x) = L(μ̂, σ̂²|x)
  = (2πσ̂²)^{−n/2} e^{−Σ(xi−x̄)²/(2σ̂²)}
  = (2πσ̂²)^{−n/2} e^{−Σ(xi−x̄)²/(2·(1/n)Σ(xi−x̄)²)}
  = (2πσ̂²)^{−n/2} e^{−n/2}

In Ω0: sup_{(μ,σ²)∈Ω0} L(μ, σ²|x) = L(μ0, σ̂0²|x)
  = (2πσ̂0²)^{−n/2} e^{−Σ(xi−μ0)²/(2σ̂0²)}
  = (2πσ̂0²)^{−n/2} e^{−Σ(xi−μ0)²/(2·(1/n)Σ(xi−μ0)²)}
  = (2πσ̂0²)^{−n/2} e^{−n/2}

So
λ(x) = sup_{(μ,σ²)∈Ω0} L(μ, σ²|x) / sup_{(μ,σ²)∈Ω} L(μ, σ²|x)
     = [(2πσ̂0²)^{−n/2} e^{−n/2}] / [(2πσ̂²)^{−n/2} e^{−n/2}]
     = (σ̂0²/σ̂²)^{−n/2}.        (2.14)

5. Reject 𝐻0 if 𝜆(𝒙) ≤ 𝑐, that is, if


(σ̂0²/σ̂²)^{−n/2} ≤ c.

6.
λ(x) = (σ̂0²/σ̂²)^{−n/2} = [ (1/n)Σ(xi−μ0)² / ((1/n)Σ(xi−x̄)²) ]^{−n/2}
     = [ Σ(xi−μ0)² / Σ(xi−x̄)² ]^{−n/2}
     = [ (Σ(xi−x̄)² + n(x̄−μ0)²) / Σ(xi−x̄)² ]^{−n/2}
     = [ 1 + n(x̄−μ0)²/Σ(xi−x̄)² ]^{−n/2}.
(done before previous example)
Reject H0 if
[1 + n(x̄−μ0)²/Σ(xi−x̄)²]^{−n/2} ≤ c
1 + n(x̄−μ0)²/Σ(xi−x̄)² ≥ c^{−2/n}
n(x̄−μ0)²/Σ(xi−x̄)² ≥ c^{−2/n} − 1
n(x̄−μ0)²/Σ(xi−x̄)² ≥ d²   (where d² = c^{−2/n} − 1),
n(x̄−μ0)² / [(n−1)·(1/(n−1))Σ(xi−x̄)²] ≥ d²
n(x̄−μ0)² / [(n−1)S²] ≥ d²
√{ n(x̄−μ0)² / [(n−1)S²] } ≥ √(d²)

that is,
√n |x̄−μ0| / √((n−1)s²) ≥ d,        (2.15)
where s² = (1/(n−1)) Σ(xi − x̄)², the unbiased estimator for σ².

We must now write the left–hand side of (2.15) in terms of a statistic with known distribution
under 𝐻0 .

Important Relationship II:
If Z ∼ N(0,1), independent of U ∼ χ²_ν, then
T = Z/√(U/ν) ∼ t_ν.

We know that under H0, Xi ∼ N(μ0, σ²), so X̄ ∼ N(μ0, σ²/n), so that
Z = (X̄ − μ0)/(σ/√n) = √n(X̄ − μ0)/σ ∼ N(0,1).

Further, U = Σ(Xi−X̄)²/σ² = (n−1)s²/σ² ∼ χ²_{n−1}, independent of X̄, so that

T = Z/√(U/(n−1)) = [√n(X̄−μ0)/σ] / √{(n−1)s²/[(n−1)σ²]} = [√n(X̄−μ0)/σ] / (s/σ) = √n(X̄−μ0)/s ∼ t_{n−1}.        (2.16)

Reject H0 if
√n|x̄−μ0| / √((n−1)s²) ≥ d
√n|x̄−μ0| / (s√(n−1)) ≥ d
√n|x̄−μ0| / s ≥ d√(n−1)
|T| ≥ d√(n−1)
|T| ≥ k
±T ≥ k
−T ≥ k or T ≥ k
T ≤ −k or T ≥ k

|T| = √n|x̄−μ0|/s ≥ k (where k = √(n−1)·d), that is, if T ≤ −k or T ≥ k, where T ∼ t_{n−1}.

7. Since we now know the distribution of 𝑇, we can determine 𝑘 by setting

𝛼 = sup 𝑃[𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑡𝑟𝑢𝑒] = 𝑃𝜇0 [𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 ] = 𝑃𝜇0 [𝑇 ≤ −𝑘 𝑜𝑟 𝑇 ≥ 𝑘]


α/2 + α/2 = P_{μ0}[T ≤ −k] + P_{μ0}[T ≥ k]
P_{μ0}[T ≥ k] = α/2
P_{μ0}[T ≤ k] = 1 − α/2
F_T[k] = 1 − α/2
k = F_T⁻¹(1 − α/2) = t_{n−1,1−α/2}

From the t–tables we obtain k = t_{n−1,1−α/2}.

8. Reject the null hypothesis H0: μ = μ0 if |t| ≥ t_{n−1,1−α/2}, where |t| = √n|x̄−μ0|/s,

or, equivalently, if x̄ ≤ μ0 − t_{n−1,1−α/2} s/√n  or  x̄ ≥ μ0 + t_{n−1,1−α/2} s/√n.        (2.17)

(Remember that 𝑡𝑛−1,𝛼/2 = −𝑡𝑛−1,1−𝛼/2 so if we used 𝛼/2 = 𝑃𝜇0 [𝑇 ≤ −𝑘 ] we would end up


with the same answer by converting 𝑡𝑛−1,𝛼/2 to −𝑡n−1,1−𝛼/2 .)

In the previous examples we have only dealt with two–sided (≠) alternatives (H1) and a point (=)
null hypothesis (H0). For one–sided tests we would usually apply the methods described in the
example. However, one–sided LR–tests are derived in a similar manner and are usually just a
one–sided version of the tests described above.
In Example 2.4, let 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0
The LR–test of size 𝛼 would then be:
Reject H0 if x̄ ≤ μ0 − t_{n−1,1−α/2} s/√n  or  x̄ ≥ μ0 + t_{n−1,1−α/2} s/√n.

For example, in Example 2.4, if the two hypotheses had been H0: μ ≤ μ0 versus H1: μ > μ0, the LR–test of size α would then be:
Reject H0 if x̄ ≥ μ0 + t_{n−1,1−α} s/√n.        (2.18)

Similarly, for H0: μ ≥ μ0 versus H1: μ < μ0 it follows that:
Reject H0 if x̄ ≤ μ0 − t_{n−1,1−α} s/√n.        (2.19)

These are one–sided critical regions and the only difference to (2.17) is that the 𝛼/2 is replaced
by 𝛼.
(A similar thing happens in example 2.3.)
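As an extra illustration of the one-sample t-test (2.17), here is a minimal Python sketch (the data are hypothetical and numpy/scipy are assumed; scipy's ttest_1samp is only used as a cross-check on the t statistic):

import numpy as np
from scipy import stats

# Two-sided one-sample t-test (2.17): H0: mu = mu0 vs H1: mu != mu0, sigma^2 unknown.
x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.7, 9.6])   # hypothetical sample
mu0, alpha = 10.0, 0.05

n = len(x)
xbar, s = x.mean(), x.std(ddof=1)            # s^2 = (1/(n-1)) * sum (x_i - xbar)^2
t_stat = np.sqrt(n) * (xbar - mu0) / s
crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t_{n-1, 1-alpha/2}
print(abs(t_stat) >= crit)                   # True => reject H0
print(stats.ttest_1samp(x, popmean=mu0))     # cross-check of the same t statistic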

The above example is one case to which the theory of the next section does not apply.

As an illustration of a one–sided LR–test, consider the following example.

Like in chapter 1, where you can replace the 𝑓(𝒙|𝜃) (pdf of entire sample X1,..,Xn) with 𝑔(t|𝜃)
(pdf of a sufficient statistic) in calculating the liklihood ratio, you can do so also for LR-tests i.e.
𝐿(𝜃|𝒙) = 𝑓(𝒙|𝜃) can be replaced with 𝐿(𝜃|𝑡) = 𝑔(t|𝜃) . This is one of the things that is
different about example 2.5.

Example 2.5 : Let 𝑋1 , … , 𝑋𝑛 be a random sample from the distribution

𝑓(𝑥|𝜃) = 𝑒 −(𝑥−𝜃) , 0 ≤ 𝜃 ≤ 𝑥 < ∞ .

(𝜃 ≤ 𝑥 < ∞ is the range of x and 𝜃 ≥ 0)


Find the LR–test of size 𝛼 for the hypothesis 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 .

(Since 𝜃 is the lower boundary of the sample space, we know that a sufficient statistic for 𝜃 is
𝑌 = min{𝑋1 , … , 𝑋𝑛 }, the first order statistic. Just as in Chapter 1, we can base our test on
sufficient statistics.)
Preliminary calculations:
𝑁𝑜𝑤 𝑓𝑋 (𝑥|𝜃) = 𝑒 −(𝑥−𝜃) , 𝜃 ≤ 𝑥 < ∞ , 𝑎𝑛𝑑
𝑥 𝑥
𝐹𝑋 (𝑥|𝜃) = ∫−∞ 𝑓𝑋 (𝑢|𝜃)𝑑𝑢 = ∫𝜃 𝑒 −(𝑢−𝜃) 𝑑𝑢 = . . . = 1 − 𝑒 −(𝑥−𝜃) .
(From (1.20) follows that) t = Y = min(Xi) = X(1) (introduced sufficient statistics as t, theorem 1.4
introduced smallest order statistic as min(Xi) = X(1), we’ll just call it Y)
𝑓𝑋(1) (𝑥) = 𝑛𝑓𝑋 (𝑥)[1 − 𝐹𝑋 (𝑥)]𝑛−1
𝑓𝑌 (𝑦|𝜃) = 𝑛 𝑒 −(𝑦−𝜃) 𝑒 −(𝑛−1)(𝑦−𝜃) (2.20)
= 𝑛 𝑒 −𝑛(𝑦−𝜃) , 𝜃 ≤ 𝑦 < ∞ .
1. Likelihood function :
𝐿(𝜃|𝑦) = fY (y|𝜃) = 𝑛𝑒 −𝑛(𝑦−𝜃) , 0 ≤ 𝜃 ≤ 𝑦.

2. 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 .
Ω = {𝜃 : 𝜃 > 0}
Ω0 = {𝜃 : 0 < 𝜃 ≤ 𝜃0 } .

3.
𝐿(𝜃|𝑦) = 𝑛𝑒 −𝑛(𝑦−𝜃) is an increasing function of 𝜃 because:
𝜃 ↑ implies −𝜃 ↓ implies 𝑦 − 𝜃 ↓ implies 𝑛(𝑦 − 𝜃) ↓ implies −𝑛(𝑦 − 𝜃) ↑ implies
𝑒 −𝑛(𝑦−𝜃) ↑ implies 𝑛𝑒 −𝑛(𝑦−𝜃) ↑ implies 𝐿(𝜃|𝑦) ↑

In Ω the ML estimator for 𝜃 is 𝜃̂ = 𝑦 since 𝐿(𝜃|𝑦) is an increasing function of 𝜃 up to


𝜃 = 𝑦. If 𝜃 > 𝑦, then 𝐿(𝜃|𝑦) = 0.

In Ω0 there are two possibilities.

If θ0 ≤ y, the maximum is attained at θ = θ0. However, if θ0 > y, then the maximum likelihood is at θ = y, the same point as in Ω.

4.
So
sup 𝐿(𝜃|𝑦) = 𝐿( y|𝑦) = 𝑛 𝑒 −𝑛(𝑦−y) = 𝑛 𝑒 0 = 𝑛
𝜃∈Ω
𝐿(𝜃0 |𝑦) 𝑖𝑓 𝜃0 ≤ 𝑦 𝑛 𝑒 −𝑛(𝑦−𝜃0) 𝑖𝑓 𝜃0 ≤ 𝑦 𝑛 𝑒 −𝑛(𝑦−𝜃0) 𝑖𝑓 𝜃0 ≤ 𝑦
sup 𝐿(𝜃|𝑦) = { = { −𝑛(𝑦−y) ={
𝜃∈Ω0 𝐿(𝑦|𝑦) 𝑖𝑓 𝜃0 > 𝑦 . 𝑛𝑒 𝑖𝑓 𝜃0 > 𝑦 . 𝑛 𝑖𝑓 𝜃0 > 𝑦 .

and
λ(y) = sup_{θ∈Ω0} L(θ|y) / sup_{θ∈Ω} L(θ|y) = { e^{−n(y−θ0)}  if θ0 ≤ y ;   1  if θ0 > y }.

5. Reject 𝐻0 if 𝜆(𝑦) ≤ 𝑐 where 0 < 𝑐 < 1. So we will never reject 𝐻0 if 𝜃0 > 𝑦. So reject
𝐻0 if 𝑒 −𝑛(𝑦−𝜃0 ) ≤ 𝑐 for 𝜃0 ≤ 𝑦. That is, when
𝑒 −𝑛(𝑦−𝜃0) ≤ 𝑐

y ≥ k   (where k = θ0 − (ℓn c)/n).

6. Under 𝐻0 the distribution of 𝑌 is known and given in (2.20).

7. Let 𝛼 = sup 𝑃(𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻 0 𝑡𝑟𝑢𝑒) = sup𝜃≤𝜃0 𝑃[𝑌 ≥ 𝑘] . Notice that since the null
hypothesis is now composite we must find the supremum of the probability and set it equal to
𝛼.
∞ ∞
𝑃[𝑌 ≥ 𝑘] = ∫𝑘 𝑓(𝑦|𝜃)𝑑𝑦 = ∫𝑘 𝑛𝑒 −𝑛(𝑦−𝜃) 𝑑𝑦
= … = 𝑒 −𝑛(𝑘−𝜃) .

This is an increasing function of 𝜃 and since 𝜃 ≤ 𝜃0 under 𝐻0 , the supremum is obtained


when 𝜃 = 𝜃0 . So
sup_{θ≤θ0} P[Y ≥ k] = e^{−n(k−θ0)} = α,
so that
k = θ0 − (ℓn α)/n.

8. A test of size 𝛼 for 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 is:

Reject H0 if y ≥ k, i.e. y ≥ θ0 − (ℓn α)/n, where y = min{x1, …, xn}.
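A minimal Python sketch of this test (hypothetical data; numpy assumed):

import numpy as np

# Example 2.5: reject H0: theta <= theta0 if y = min(x_i) >= theta0 - ln(alpha)/n.
x = np.array([2.3, 1.9, 2.8, 2.1, 2.6])   # hypothetical sample
theta0, alpha = 1.5, 0.05

n = len(x)
y = x.min()                               # sufficient statistic Y = X_(1)
k = theta0 - np.log(alpha) / n
print(y, k, y >= k)                       # True => reject H0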

2.4 Two–sample normal tests (LR tests for 2 groups)


(continuation of 2.2 i.e. Generalized Likelihood Ratio Tests (LR-tests) but for 2 samples /
populations – in the notes they do examples on the normal distribution. The exercises has cases
(of Generalized Likelihood Ratio Tests (LR-tests)) for 2 samples / populations for other
distributions as well.)

In this section we will derive the LR–tests in two important examples, tests that are often
encountered in practice. Here we are comparing the parameters of two different populations.

Example 2.10 : Consider a sample 𝑋1 , … , 𝑋𝑛 of size 𝑛 from a 𝑁(𝜇1 , 𝜎 2 ) distribution and


an independent sample 𝑌1 , … , 𝑌𝑚 of size 𝑚 from a 𝑁(𝜇2 , 𝜎 2 ) distribution. Note that the
variances are assumed equal but unknown. We want to derive the LR test for 𝐻0 : 𝜇1 = 𝜇2 (= 𝜇0 )
versus 𝐻1 : 𝜇1 ≠ 𝜇2 .
1. Likelihood function :

𝐿(𝜇1 , 𝜇2 , 𝜎 2 |𝒙, 𝒚)
= 𝑓𝑋1 (𝑥1 |𝜇1 , 𝜎 2 ) × … × 𝑓𝑋𝑛 (𝑥𝑛 |𝜇1 , 𝜎 2 ) × 𝑓𝑌1 (𝑦1 |𝜇2 , 𝜎 2 ) × … × 𝑓𝑌𝑚 (𝑦𝑚 |𝜇2 , 𝜎 2 )
= ∏_{i=1}^{n} f_X(xi|μ1, σ²) × ∏_{i=1}^{m} f_Y(yi|μ2, σ²)
= ∏_{i=1}^{n} (2πσ²)^{−1/2} e^{−(xi−μ1)²/(2σ²)} × ∏_{i=1}^{m} (2πσ²)^{−1/2} e^{−(yi−μ2)²/(2σ²)}
= (2πσ²)^{−(n+m)/2} e^{−[Σ(xi−μ1)² + Σ(yj−μ2)²]/(2σ²)}        (2.26)

2. 𝐻0 : 𝜇1 = 𝜇2 (= 𝜇0 ) versus 𝐻1 : 𝜇1 ≠ 𝜇2 .

Ω = {(𝜇1 , 𝜇2 , 𝜎 2 ) : − ∞ < 𝜇1 , 𝜇2 < ∞ , 𝜎 2 > 0}


Ω0 = {(𝜇0 , 𝜎02 ) : − ∞ < 𝜇0 < ∞ , 𝜎02 > 0} .

3. In Ω the ML estimators of the means are

𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎2 |𝒙, 𝒚) 𝜇1 =𝜇
̂ 𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎2 |𝒙, 𝒚) 𝜇1 =𝜇
̂ 𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎2 |𝒙, 𝒚) 𝜇1 =𝜇
̂
(
𝜕𝜇1
|𝜇2=𝜇̂12 = 0; 𝜕𝜇2
|𝜇2=𝜇̂12 = 0; 𝜕𝜎2
| 𝜇2=𝜇̂12 = 0)
̂2
𝜎2 =𝜎 ̂2
𝜎2 =𝜎 𝜎2 =𝜎̂2

𝜇̂ 1 = 𝑥 𝑎𝑛𝑑 𝜇̂ 2 = 𝑦 . (2.27)

Further,
𝜕ℓ𝑛 𝐿 𝜇1 =𝜇
̂ 𝑛+𝑚 1
| ̂12 = − + 2𝜎̂4 [Σ(𝑥𝑖 − 𝜇̂ 1 )2 + Σ(𝑦𝑗 − 𝜇̂ 2 )2 ] = 0
𝜕𝜎2 𝜇2 =𝜇 ̂2
2𝜎
𝜎2 =𝜎
̂2

Set equal to zero, replace 𝜇1 and 𝜇2 by their MLE’s and solve for 𝜎 2 . Thus

1
𝜎̂ 2 = [Σ(𝑥𝑖 − 𝑥)2 + Σ(𝑦𝑗 − 𝑦)2 ] . (2.28)
𝑛+𝑚

In Ω0 (we call the 𝜎2 here 𝜎20 for convenience) the means are equal, so

2 2
𝜕𝐿(𝜇0 , 𝜇0 , 𝜎0 |𝒙, 𝒚) 𝜇 =𝜇 ̂ 𝜕𝐿(𝜇0 , 𝜇0 , 𝜎0 |𝒙, 𝒚) 𝜇 =𝜇 ̂
( |𝜎02 =𝜎̂02 = 0; |𝜎02 =𝜎̂02 = 0)
𝜕𝜇0 0 0 𝜕𝜎02 0 0

2
𝜕ℓ𝑛 𝐿(𝜇0 , 𝜇0 , 𝜎0 |𝒙, 𝒚) 𝜕 1
= {− [Σ(𝑥𝑖 − 𝜇0 )2 + Σ(𝑦𝑗 − 𝜇0 )2 ]}
𝜕𝜇0 𝜕𝜇0 2𝜎02
1
= [Σ(𝑥𝑖 − 𝜇0 ) + Σ(𝑦𝑗 − 𝜇0 )]
𝜎02

2
𝜕𝐿(𝜇0 , 𝜇0 , 𝜎0 |𝒙, 𝒚) 𝜇 =𝜇̂ 1
|𝜎02=𝜎̂02 = ̂02
[Σ(𝑥𝑖 − 𝜇̂ 0 ) + Σ(𝑦𝑗 − 𝜇̂ 0 )] = 0
𝜕𝜇0 0 0 𝜎

so that
Σ(𝑥𝑖 − 𝜇̂ 0 ) + Σ(𝑦𝑗 − 𝜇̂ 0 ) = 0
Σ𝑥𝑖 − 𝑛𝜇̂ 0 + Σ𝑦𝑗 − 𝑚𝜇̂ 0 = 0
Σ𝑥𝑖 + Σ𝑦𝑗 = 𝑛𝜇̂ 0 + 𝑚𝜇̂ 0
𝜇̂ 0 (𝑛 + 𝑚) = Σ𝑥𝑖 + Σ𝑦𝑗 (2.29)
Σ𝑥𝑖 +Σ𝑦𝑗
𝜇̂ 0 = 𝑛+𝑚
𝑛𝑥+𝑚𝑦
= .
𝑛+𝑚

Similarly to (2.28) the estimator for 𝜎02 in Ω0 is

𝜕𝐿(𝜇0 , 𝜇0 , 𝜎02 |𝒙, 𝒚) 𝜇0=𝜇̂0


|𝜎2=𝜎̂2 = 0
𝜕𝜎02 0 0

2 1
𝜎̂0 = 𝑛+𝑚 [Σ(𝑥𝑖 − 𝜇̂ 0 )2 + Σ(𝑦𝑗 − 𝜇̂ 0 )2 ] . (2.30)

4. Then

sup 𝐿(𝜇1 , 𝜇2 , 𝜎 2 |𝐱, 𝐲) = 𝐿(𝜇̂ 1 , 𝜇̂ 2 , 𝜎̂ 2 |𝐱, 𝐲)


Ω
𝑛+𝑚 1
− [Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2 ]
= (2𝜋𝜎̂ 2 )− 2 𝑒 ̂2
2𝜎
1
𝑛+𝑚 − 1 [Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2 ]
[Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2 ]}
= (2𝜋𝜎̂ 2 )− 2 𝑒 2{
𝑛+𝑚
𝑛+𝑚 𝑛+𝑚
= (2𝜋𝜎̂ 2 )− 2 𝑒− 2 .
Similarly
sup 𝐿(𝜇0 , 𝜎02 |𝐱, 𝐲) = 𝐿(𝜇̂ 0 , 𝜎̂0 |𝐱, 𝐲 )
Ω0
1
𝑛+𝑚 − ̂ 0 )2 +Σ(𝑦𝑗 −𝜇
[Σ(𝑥𝑖 −𝜇 ̂ 0 )2 ]
= (2𝜋𝜎̂0 2 )− 2 𝑒 ̂2
2𝜎 0
1
𝑛+𝑚 − 1 ̂ 0 )2 +Σ(𝑦𝑗 −𝜇
[Σ(𝑥𝑖 −𝜇 ̂ 0 )2 ]
2 − 2{ ̂ 0 )2 +Σ(𝑦𝑗 −𝜇
[Σ(𝑥𝑖 −𝜇 ̂ 0 )2 ]}
= (2𝜋𝜎̂0 ) 2 𝑒 𝑛+𝑚
𝑛+𝑚 𝑛+𝑚

= (2𝜋𝜎̂02 ) 2 𝑒− 2 ,

so that
𝑛+𝑚
sup 𝐿(𝜇0 ,𝜎02 |𝐱,𝐲) −
𝑛+𝑚

𝑛+𝑚

Ω0 ̂ 2)
(2𝜋𝜎 2 𝑒 2 ̂02
𝜎 2
𝜆(𝐱, 𝐲) = = 𝑛+𝑚 𝑛+𝑚 = (𝜎̂2 )
sup 𝐿(𝜇1 ,𝜇2 ,𝜎2 |𝐱,𝐲) − −
Ω ̂02 ) 2
(2𝜋𝜎 𝑒 2
𝑛+𝑚
1 −
̂ 0 )2 +Σ(𝑦𝑗 −𝜇
[Σ(𝑥𝑖 −𝜇 ̂ 0 )2 ] 2
= ( 𝑛+𝑚
) (2.31)
1
[Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2 ]
𝑛+𝑚
𝑛+𝑚

̂ 0 )2 +Σ(𝑦𝑗 −𝜇
Σ(𝑥𝑖 −𝜇 ̂ 0 )2 2
= [ ] .
Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2
To simplify, notice that
Like what was mentioned before example 2.3 and used in example 2.3 and example 2.4 we
have that ∑(𝑥𝑖 − 𝜇0 )2 = ∑(𝑥𝑖 − 𝑥)2 + 𝑛(𝑥 − 𝜇0 )2

Σ(𝑥𝑖 − 𝜇̂ 0 )2 = Σ(𝑥𝑖 − 𝑥)2 + 𝑛(𝑥 − 𝜇̂ 0 )2


𝑛𝑥+𝑚𝑦 2
= Σ(𝑥𝑖 − 𝑥)2 + 𝑛(𝑥 − )
𝑛+𝑚
(𝑛+𝑚)𝑥 𝑛𝑥+𝑚𝑦
= Σ(𝑥𝑖 − 𝑥)2 + 𝑛[ 𝑛+𝑚 − 𝑛+𝑚 ]2
𝑛𝑥+𝑚𝑥−𝑛𝑥−𝑚𝑦 2
= Σ(𝑥𝑖 − 𝑥)2 + 𝑛[ ]
𝑛+𝑚
𝑚𝑥−𝑚𝑦 2
= Σ(𝑥𝑖 − 𝑥)2 + 𝑛[ ]
𝑛+𝑚
𝑚(𝑥−𝑦) 2
= Σ(𝑥𝑖 − 𝑥)2 + 𝑛[ (𝑚+𝑛) ]
𝑛𝑚2
= Σ(𝑥𝑖 − 𝑥)2 + (𝑚+𝑛)2 (𝑥 − 𝑦)2
𝑎𝑛𝑑 𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑙𝑦
𝑛2 𝑚
Σ(𝑦𝑗 − 𝜇̂ 0 )2 = Σ(𝑦𝑗 − 𝑦)2 + (𝑛+𝑚)2 (𝑥 − 𝑦)2
so that
𝑛𝑚2 𝑛2 𝑚
Σ(𝑥𝑖 − 𝜇̂ 0 )2 + Σ(𝑦𝑗 − 𝜇̂ 0 )2 = Σ(𝑥𝑖 − 𝑥)2 + (𝑚+𝑛)2 (𝑥 − 𝑦)2 + Σ(𝑦𝑗 − 𝑦)2 + (𝑛+𝑚)2 (𝑥 − 𝑦)2
𝑛𝑚 𝑛𝑚
= Σ(𝑥𝑖 − 𝑥)2 + Σ(𝑦𝑗 − 𝑦)2 + 𝑚 (𝑚+𝑛)2 (𝑥 − 𝑦)2 + 𝑛 (𝑛+𝑚)2 (𝑥 − 𝑦)2
𝑛𝑚
= Σ(𝑥𝑖 − 𝑥)2 + Σ(𝑦𝑗 − 𝑦)2 + [𝑚 + 𝑛][(𝑚+𝑛)2 (𝑥 − 𝑦)2 ]
𝑛𝑚
= Σ(𝑥𝑖 − 𝑥)2 + Σ(𝑦𝑗 − 𝑦)2 + (𝑥 − 𝑦)2 .
𝑛+𝑚

λ(x, y) = [ (Σ(xi−x̄)² + Σ(yj−ȳ)² + (nm/(n+m))(x̄−ȳ)²) / (Σ(xi−x̄)² + Σ(yj−ȳ)²) ]^{−(n+m)/2}

Then it follows that

λ(x, y) = [ 1 + (nm/(n+m))(x̄−ȳ)² / (Σ(xi−x̄)² + Σ(yj−ȳ)²) ]^{−(n+m)/2}.        (2.32)

5. Reject 𝐻0 if 𝜆(𝐱 , 𝐲 ) < 𝑐. This is equivalent to rejecting 𝐻0 if


[1 + (nm/(n+m))(x̄−ȳ)² / (Σ(xi−x̄)² + Σ(yj−ȳ)²)]^{−(n+m)/2} < c
1 + (nm/(n+m))(x̄−ȳ)² / (Σ(xi−x̄)² + Σ(yj−ȳ)²) > c^{−2/(n+m)}
(nm/(n+m))(x̄−ȳ)² / (Σ(xi−x̄)² + Σ(yj−ȳ)²) > c^{−2/(n+m)} − 1
(nm/(n+m))(x̄−ȳ)² / {(1/(n+m−2))[Σ(xi−x̄)² + Σ(yj−ȳ)²]} > (c^{−2/(n+m)} − 1)(n+m−2) = k²,
√[(nm/(n+m))(x̄−ȳ)²] / √{(1/(n+m−2))[Σ(xi−x̄)² + Σ(yj−ȳ)²]} > k        (2.33)
or |x̄−ȳ| / √[(1/n + 1/m) S_p²] > k,

where S_p² = (1/(n+m−2))[Σ(xi−x̄)² + Σ(yj−ȳ)²]        (2.34)
is the pooled variance estimator,
and as nm/(n+m) = ((n+m)/(nm))^{−1} = (n/(nm) + m/(nm))^{−1} = (1/m + 1/n)^{−1} = 1/(1/n + 1/m).

6. Now the numerator and denominator in (2.33) are independent and we must find the
distribution of the left–hand side under 𝐻0 . (We work under 𝐻0 because we are going to use
these results in α, in which it is given that 𝐻0 is true.)

Xi ∼ 𝑁(𝜇0 , 𝜎 2 ) 𝑎𝑛𝑑 Yi ∼ 𝑁(𝜇0 , 𝜎 2 ) , 𝑠𝑜

𝜎2 𝜎2
𝑋 ∼ 𝑁 (𝜇0 , ) 𝑎𝑛𝑑 𝑌 ∼ 𝑁 (𝜇0 , 𝑚 ) , 𝑠𝑜
𝑛
𝜎2 𝜎2 (𝑋−𝑌)−0 𝑋−𝑌
𝑋−𝑌 ∼ 𝑁 (0, + ) 𝑠𝑜 𝑍= 2 2
= ∼ 𝑁(0,1)
𝑛 𝑚 1 1
√ 𝜎 +𝜎 √( + )𝜎2
𝑛 𝑚 𝑛 𝑚

As:
𝐸(𝑎𝑋 + 𝑏𝑌) = 𝑎𝐸(𝑋) + 𝑏𝐸(𝑌)
𝑉(𝑎𝑋 + 𝑏𝑌) = 𝑎2 𝑉(𝑋) + 𝑏 2 𝑉(𝑌)
So:
𝐸(𝑋 − 𝑌) = 𝐸(1𝑋 + (−1)𝑌) = 1𝐸(𝑋) + (−1)𝐸(𝑌) = 𝐸(𝑋) − 𝐸(𝑌) = 𝜇0 − 𝜇0 = 0
𝜎2 𝜎2
𝑉(𝑋 − 𝑌) = 𝑉(1𝑋 + (−1)𝑌) = 12 𝑉(𝑋) + (−1)2 𝑉(𝑌) = 𝑉(𝑋) + 𝑉(𝑌) = +
𝑛 𝑚

Next,
(n−1)S_X²/σ² = Σ(Xi−X̄)²/σ² ∼ χ²_{n−1}  and  Σ(Yj−Ȳ)²/σ² ∼ χ²_{m−1}  (see ch6 of Rice Textbook)
independently, so that Σ(Xi−X̄)²/σ² + Σ(Yj−Ȳ)²/σ² ∼ χ²_{n−1+m−1}, i.e.
U = (n+m−2)S_p²/σ² = [Σ(Xi−X̄)² + Σ(Yj−Ȳ)²]/σ² ∼ χ²_{n+m−2}.

If Z ~ N(0,1) and U ~ χ²_v then Z/√(U/v) ~ t_v, from the definition of the t-distribution from 2nd year,

so that
W = Z / √(U/(n+m−2)) = [ (X̄−Ȳ)/√((1/n + 1/m)σ²) ] / √{ [(n+m−2)S_p²/σ²] / (n+m−2) } = (X̄−Ȳ) / √((1/n + 1/m)S_p²) ∼ t_{n+m−2}.        (2.35)

7. So we reject 𝐻0 if
|𝑥−𝑦|
1 1
> k i.e. |𝑤| > 𝑘 i.e. ±𝑤 > 𝑘 i.e. 𝑤 > 𝑘 𝑜𝑟 − 𝑤 > 𝑘 i.e. 𝑤 > 𝑘 𝑜𝑟 𝑤 < −𝑘
√( + )𝑆𝑝2
𝑛 𝑚

𝛼 = sup 𝑃(𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑡𝑟𝑢𝑒) = sup 𝑃(𝑤 > 𝑘 𝑜𝑟 𝑤 < −𝑘|𝜇1 = 𝜇2 (= 𝜇0 ))


= sup 𝑃𝜇0 (𝑤 > 𝑘 𝑜𝑟 𝑤 < −𝑘) = 𝑃𝜇0 (𝑤 > 𝑘 𝑜𝑟 𝑤 < −𝑘) = 𝑃𝜇0 (𝑤 > 𝑘 )+𝑃𝜇0 ( 𝑤 < −𝑘)

where 𝑊 ∼ 𝑡𝑛+𝑚−2 .
𝛼
Since the 𝑡–distribution is symmetric around zero, = 𝑃[𝑊 > 𝑘]
2
𝛼
so 1 − 2 = 𝑃[𝑊 < 𝑘] ,
𝛼
so 1 − 2 = 𝐹𝑊 [𝑘] ,
−1 𝛼
so 𝐹𝑊 [1 − 2 ] = 𝑘 where 𝑊 ∼ 𝑡𝑛+𝑚−2
−1 𝛼
so 𝑘 = 𝐹𝑊 [1 − 2 ] where 𝑊 ∼ 𝑡𝑛+𝑚−2
so 𝑘 = 𝑡𝑛+𝑚−2;1−𝛼/2.

8. The LR–test is: Reject the null hypothesis 𝐻0 : 𝜇1 = 𝜇2 if


|w| = |x̄−ȳ| / √[(1/n + 1/m) S_p²]  >  t_{n+m−2;1−α/2}.        (2.36)

Remarks :
𝐻0 : 𝜇1 = 𝜇2 versus 𝐻1 : 𝜇1 ≠ 𝜇2 Reject H0 if |𝑤| > 𝑡𝑛+𝑚−2;1−𝛼/2
Is the same as
𝐻0 : 𝜇1 = 𝜇2 versus 𝐻1 : 𝜇1 ≠ 𝜇2 Reject H0 if 𝑤 > 𝑡𝑛+𝑚−2;1−𝛼/2 𝑜𝑟 𝑤 < −𝑡𝑛+𝑚−2;1−𝛼/2

1. For the one–sided hypotheses


𝐻0 : 𝜇1 ≤ 𝜇2 𝐻0 : 𝜇1 ≥ 𝜇2
versus 𝐻1 : 𝜇1 > 𝜇2 versus 𝐻1 : 𝜇1 < 𝜇2 ,
LR–test is just the 1–sided version of (2.36), LR–test is just the 1–sided version of (2.36),
Reject 𝐻0 if 𝑤 > 𝑡𝑛+𝑚−2;1−𝛼 Reject 𝐻0 if 𝑤 < −𝑡𝑛+𝑚−2,1−𝛼
𝐻0 : 𝜇1 = 𝜇2 means 𝐻0 : 𝜇1 − 𝜇2 = 0
2. A more general hypothesis would be 𝐻0 : 𝜇1 − 𝜇2 = 𝛿0 versus 𝐻1 : 𝜇1 − 𝜇2 ≠ 𝛿0 .
|𝑥−𝑦−𝛿0 |
The test is the same as in (2.36) except that 𝑤 = 1 1
.
√( + )𝑆𝑝2
𝑛 𝑚
(See also Exercise 2).

3. The test is unbiased.

4. For more than two populations, the test of 𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘 (= 𝜇0 ) versus


𝐻1 : 𝜇𝑖 ≠ 𝜇𝑗 for at least one pair (𝑖, 𝑗), (𝐻1 : not 𝐻0 ) is called the one–way analysis of
variance (one-way ANOVA) (see Exercise 4).
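As an extra illustration of the pooled two-sample t-test (2.36), here is a minimal Python sketch (hypothetical samples; numpy/scipy assumed; scipy's ttest_ind with equal_var=True is the usual cross-check):

import numpy as np
from scipy import stats

# Two-sample pooled t-test (2.36) for H0: mu1 = mu2, equal but unknown variances.
x = np.array([12.1, 11.8, 12.6, 12.0, 11.5])         # hypothetical sample 1
y = np.array([11.2, 11.0, 11.7, 10.9, 11.4, 11.1])   # hypothetical sample 2
alpha = 0.05

n, m = len(x), len(y)
sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)   # pooled variance (2.34)
w = (x.mean() - y.mean()) / np.sqrt((1 / n + 1 / m) * sp2)
crit = stats.t.ppf(1 - alpha / 2, df=n + m - 2)
print(abs(w) >= crit)                                 # True => reject H0
print(stats.ttest_ind(x, y, equal_var=True))          # cross-check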

Example 2.11 : Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇1 , 𝜎12 ) and, independently, 𝑌1 , … , 𝑌𝑚 ∼ 𝑁(𝜇2 , 𝜎22 ) with
all parameters unknown. Derive the LR–test of 𝐻0 : 𝜎12 = 𝜎22 (= 𝜎02 ) versus 𝐻1 : 𝜎12 ≠ 𝜎22 .

1. Likelihood function :
𝐿(𝜇1 , 𝜇2 , 𝜎12 , 𝜎22 | , )
= 𝑓𝑋1 (𝑥1 |𝜇1 , 𝜎12 ) × … × 𝑓𝑋𝑛 (𝑥𝑛 |𝜇1 , 𝜎12 ) × 𝑓𝑌1 (𝑦1 |𝜇2 , 𝜎12 ) × … × 𝑓𝑌𝑚 (𝑦𝑚 |𝜇2 , 𝜎22 )
= ∏𝑛𝑖=1 𝑓𝑋 (𝑥𝑖 |𝜇1 , 𝜎12 ) × ∏𝑚 2
𝑖=1 𝑓𝑌 (𝑦𝑖 |𝜇2 , 𝜎2 )
1 1 1 1
− (𝑥 −𝜇1 )2 − (𝑦 −𝜇2 )2
= ∏𝑛𝑖=1(2𝜋𝜎12 )−2 𝑒 2𝜎2 𝑖 × ∏𝑚 2 −2
𝑖=1(2𝜋𝜎2 ) 𝑒 2𝜎2 𝑖
1 1
− Σ(𝑥𝑖 −𝜇1 )2 − Σ(𝑦𝑗 −𝜇2 )2
2𝜎2 2𝜎2
= (2𝜋𝜎12 )−𝑛/2 (2𝜋𝜎22 )−𝑚/2 𝑒 1 2 . (2.37)

2. 𝐻0 : 𝜎12 = 𝜎22 (= 𝜎02 ) versus 𝐻1 : 𝜎12 ≠ 𝜎22 .


Ω = {(𝜇1 , 𝜇2 , 𝜎12 , 𝜎22 ) : −∞ < 𝜇1 , 𝜇2 < ∞ , 𝜎12 > 0 , 𝜎22 > 0}
Ω0 = {(𝜇1 , 𝜇2 , 𝜎02 ) : −∞ < 𝜇1 , 𝜇2 < ∞ , 𝜎02 > 0}

3. In Ω all four parameters have the usual ML–estimators :


2 , 𝜎2 |𝒙, 𝒚)
𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎1 2 , 𝜎2 |𝒙, 𝒚)
𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎1
2 𝜇 =𝜇 ̂ 2 𝜇 =𝜇̂
(
𝜕𝜇1
|𝜇12 =𝜇̂12
= 0; 𝜕𝜇
|𝜇12=𝜇̂12 = 0;
2
𝜎21 =𝜎̂ 21 ̂ 21
𝜎21 =𝜎
̂ 22
𝜎22 =𝜎 ̂ 22
𝜎22 =𝜎
2 , 𝜎2 |𝒙, 𝒚)
𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎1 2 , 𝜎2 |𝒙, 𝒚)
𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎1
2 𝜇 =𝜇 ̂ 2 𝜇 =𝜇 ̂
𝜕𝜎12 | 𝜇12=𝜇̂12 = 0; 𝜕𝜎22 |𝜇12=𝜇̂12 = 0)
𝜎21 =𝜎̂ 21 ̂ 21
𝜎21 =𝜎
̂ 22
𝜎22 =𝜎 ̂ 22
𝜎22 =𝜎

𝜇̂ 1 = 𝑥 , 𝜇̂ 2 = 𝑦 ,
1 1
𝜎̂12 = Σ(𝑥𝑖 − 𝑥)2 , 𝜎̂22 = Σ(𝑦𝑗 − 𝑦)2 .
𝑛 𝑚

In Ω0 there is also no restriction on the means,


𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎2 2
0 , 𝜎0 |𝒙, 𝒚) 𝜇1 =𝜇
̂1;0 𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎2 2
0 , 𝜎0 |𝒙, 𝒚) 𝜇1 =𝜇
̂
( 𝜕𝜇
|𝜇 =𝜇̂ = 0; 𝜕𝜇
|𝜇 =𝜇̂1;0 = 0;
1 2 2;0 2 2;0 2
̂ 20
𝜎20 =𝜎 ̂ 20
𝜎20 =𝜎
𝜕𝑙𝑛𝐿(𝜇1 , 𝜇2 , 𝜎2 2
0 , 𝜎0 |𝒙, 𝒚) 𝜇1 =𝜇
̂
𝜕𝜎2
|𝜇 =𝜇̂1;0 = 0)
0 2 2;0
̂ 20
𝜎20 =𝜎

so 𝜇̂ 1;0 = 𝑥 , 𝜇̂ 2;0 = 𝑦, but as in (2.28), when the variances are assumed equal,
1
𝜎̂02 = 𝑛+𝑚 [Σ(𝑥𝑖 − 𝑥)2 + Σ(𝑦𝑗 − 𝑦)2 ] .

4. Now, supΩ 𝐿(𝜇1 , 𝜇2 , 𝜎12 , 𝜎22 | , 𝒙, 𝒚) = 𝐿(𝑥, 𝑦, 𝜎̂12 , 𝜎̂22 | 𝒙, 𝒚)


1 1

𝑛+𝑚 − Σ(𝑥𝑖 −𝑥)2 − Σ(𝑦𝑗 −𝑦)2
̂2 ̂2
= (2𝜋) 2 (𝜎̂12 )−𝑛/2 (𝜎̂22 )−𝑚/2 𝑒 2𝜎 1 2𝜎 2
1 1
𝑛+𝑚 − 1 Σ(𝑥𝑖 −𝑥)2 − 1 Σ(𝑦𝑗 −𝑦)2
− 2 Σ(𝑥𝑖 −𝑥)2 2 Σ(𝑦𝑗 −𝑦)2
= (2𝜋) 2 (𝜎̂12 )−𝑛/2 (𝜎̂22 )−𝑚/2 𝑒 𝑛 𝑚
𝑛+𝑚 𝑛+𝑚
= (2𝜋)− 2 (𝜎̂12 )−𝑛/2 (𝜎̂22 )−𝑚/2 𝑒 − 2 .

Also supΩ0 𝐿(𝜇1 , 𝜇2 , 𝜎02 , 𝜎02 | , 𝑦) = 𝐿(𝑥, 𝑦, 𝜎̂02 , 𝜎̂02 | , 𝑦)


1 1
𝑛+𝑚 − Σ(𝑥𝑖 −𝑥)2 − 2 Σ(𝑦𝑗 −𝑦)2
= (2𝜋)− 2 (𝜎̂02 )−𝑛/2 (𝜎̂02 )−𝑚/2 𝑒 ̂2
2𝜎 0 ̂0
2𝜎
1
𝑛+𝑚 − 1 [Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2 ]
− 2 [Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2 ]
= (2𝜋) 2
0 (𝜎̂ ) −𝑛/2
(𝜎̂02 )−𝑚/2 𝑒 2
𝑛+𝑚
𝑛+𝑚 𝑛+𝑚
2 − 2 −
= (2𝜋𝜎̂0 ) 𝑒 2 .

Then the LR–statistic is given by

supΩ0 𝐿(𝜇1 ,𝜇2 ,𝜎02 ,𝜎02 | ,𝑦)


𝜆(𝐱 , 𝐲) = supΩ 𝐿(𝜇1 ,𝜇2 ,𝜎12 ,𝜎22 | ,𝒙,𝒚)
𝑛+𝑚 𝑛+𝑚
− −
(2𝜋) 2 (𝜎̂12 )−𝑛/2 (𝜎
̂22 )−𝑚/2 𝑒 2
= 𝑛+𝑚 𝑛+𝑚
− −
̂02 ) 2 𝑒 2
(2𝜋𝜎
𝑛+𝑚

̂02 )
(𝜎 2
= ̂12 )−𝑛/2 (𝜎
̂22 )−𝑚/2
(𝜎
−𝑛/2 −𝑚/2
𝜎̂02 ̂02
𝜎
= (𝜎̂2 ) (𝜎̂ 2)
1 2
1 −𝑛/2 1 −𝑚/2
[Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2 ] [Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2 ]
𝑛+𝑚 𝑛+𝑚
= ( 1 ) ( 1 )
Σ(𝑥𝑖 −𝑥)2 Σ(𝑦𝑗 −𝑦)2
𝑛 𝑚
−𝑛/2 −𝑚/2
(𝑛+𝑚)−1 Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2 (𝑛+𝑚)−1 Σ(𝑥𝑖 −𝑥)2 +Σ(𝑦𝑗 −𝑦)2
= ( [ ]) ( [ ])
n−1 Σ(𝑥𝑖 −𝑥)2 𝑚−1 Σ(𝑦𝑗 −𝑦)2
𝑛 −𝑛/2
−𝑚/2
(𝑛+𝑚)−1 − 2 (𝑛+𝑚)−1 −𝑚/2 Σ(𝑥𝑖 −𝑥)2 Σ(𝑦𝑗 −𝑦)2 Σ(𝑥𝑖 −𝑥)2 Σ(𝑦 −𝑦)2
= ( ) ( ) ([Σ(𝑥 −𝑥)2 + Σ(𝑥 −𝑥)2 ]) ([ + Σ(𝑦𝑗 −𝑦)2])
n−1 𝑚−1 𝑖 𝑖 Σ(𝑦𝑗 −𝑦)2 𝑗
𝑛+𝑚 −𝑛/2 −𝑚/2
(𝑛+𝑚) 2 Σ(𝑦𝑗 −𝑦)2 Σ(𝑥𝑖 −𝑥)2
= (1 + Σ(𝑥 −𝑥)2 ) (1 + ) .
𝑛𝑛/2 𝑚𝑚/2 𝑖 Σ(𝑦𝑗 −𝑦)2
Σ(𝑦𝑗 −𝑦)2
5. Reject 𝐻0 if 𝜆(𝒙, 𝐲 ) < 𝑐. Let 𝑣 = , then it is equivalent to rejecting 𝐻0 if
Σ(𝑥𝑖 −𝑥)2
𝑛+𝑚
(𝑛+𝑚) 2 1 −𝑚/2
(1 + 𝑣)−𝑛/2 (1 + 𝑣) < 𝑐. (2.38)
𝑛𝑛/2 𝑚𝑚/2
𝑚

𝑛 1 2 𝑐
(1 + 𝑣)− 2 (1 + ) < 𝑛+𝑚
𝑣
(𝑛 + 𝑚) 2
𝑚
𝑛𝑛/2 𝑚𝑚/2

𝑛 1 −2
(1 + 𝑣) 2 (1 + ) <𝑘
𝑣

6. Reject 𝐻0 if 𝑣 < 𝑘1 or 𝑣 > 𝑘2

Important Relationship IV:
If X ∼ χ²_ν, independent of Y ∼ χ²_ω, then        (2.39)
W = (X/ν)/(Y/ω) ∼ F_{ν,ω}.

(𝑛−1)𝑠12 Σ(𝑋𝑖 −𝑋)2 2 (𝑚−1)𝑠22 Σ(𝑌𝑗 −𝑌)2 2


Now under 𝐻0 , = ∼ 𝜒𝑛−1 and = ∼ 𝜒𝑚−1 ,
𝜎02 𝜎02 𝜎02 𝜎02

1 1
Let 𝑠12 = 𝑛−1 Σ(𝑥𝑖 − 𝑥)2 and 𝑠22 = 𝑚−1 Σ(𝑦𝑗 − 𝑦)2
(𝑚−1)𝑠22 /(𝑚−1) 1
𝜎2 𝑠2 Σ(𝑌𝑗 −𝑌)2 𝑛−1
so 0
(𝑛−1)𝑠2
= 𝑠22 = 𝑚−1
1 = 𝑚−1 𝑉 ∼ 𝐹𝑚−1,𝑛−1 .
1 /(𝑛−1) 1 Σ(𝑋𝑖 −𝑋)2
𝑛−1
𝜎2
0
then the LR–test is equivalent to
Reject 𝐻0 if 𝑣 < 𝑘1 or 𝑣 > 𝑘2
𝑛−1 𝑛−1 𝑛−1 𝑛−1
if 𝑚−1 𝑣 < 𝑚−1 𝑘1 or 𝑣 > 𝑚−1 𝑘2
𝑚−1
𝑛−1 𝑛−1
if 𝑣 < 𝑘1′ or 𝑣 > 𝑘2′
𝑚−1 𝑚−1
𝑠22 𝑠22 𝑠22
if < 𝑘1′ or > 𝑘2′ where ∼ 𝐹𝑚−1,𝑛−1 .
𝑠12 𝑠12 𝑠12

7. Now 𝛼 = sup 𝑃(𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑡𝑟𝑢𝑒) = sup 𝑃𝜎02 (𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 ) = 𝑃𝜎02 (𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 )
𝑠2 𝑠22 𝑠2 𝑠2
= 𝑃𝜎02 [𝑠22 < 𝑘′1 or > 𝑘′2 ] = 𝑃𝜎02 [𝑠22 < 𝑘 ′1 ] + 𝑃𝜎02 [𝑠22 > 𝑘 ′ 2 ]
1 𝑠12 1 1

𝑠22 𝑠22
𝛼/2 + 𝛼/2 = 𝑃𝜎02 [ 2 < 𝑘 ′1 ] + 𝑃𝜎02 [ 2 > 𝑘 ′ 2 ]
𝑠1 𝑠1

𝑠2 𝑠2
Let 𝑃𝜎02 [𝑠22 < 𝑘′1 ] = 𝛼/2 and 𝑃𝜎02 [𝑠22 > 𝑘′2 ] = 𝛼/2 , then
1 1
𝑠22 𝑠22 𝑠22
𝑃𝜎02 [𝑠2 < 𝑘′1 ] = 𝛼/2 and 𝑃𝜎02 [𝑠2 < 𝑘′2 ] = 1 − 𝛼/2 where ~𝐹𝑚−1,𝑛−1
1 1 𝑠12
𝐹𝐹𝑚−1,𝑛−1 (𝑘 ′1 ) = 𝛼/2 and 𝐹𝐹𝑚−1,𝑛−1 (𝑘 ′ 2 ) = 1 − 𝛼/2
𝑘 ′1 = 𝐹𝐹−1
𝑚−1,𝑛−1
(𝛼/2) and 𝑘 ′ 2 = 𝐹𝐹−1
𝑚−1,𝑛−1
(1 − 𝛼/2)
𝑘′1 = 𝐹𝑚−1,𝑛−1;𝛼/2 𝑎𝑛𝑑 𝑘′2 = 𝐹𝑚−1,𝑛−1;1−𝛼/2 .

However, this two–sided test can be reduced to a one–sided test since s1²/s2² ∼ F_{n−1,m−1} (similar to how we got s2²/s1² ∼ F_{m−1,n−1}) and F_{m−1,n−1;α/2} = 1/F_{n−1,m−1;1−α/2} (easy to show, similar to above).

8. The LR–test for 𝐻0 : 𝜎12 = 𝜎22 versus 𝐻1 : 𝜎12 ≠ 𝜎22 is :


Reject H0 if
s2²/s1² < k1′ or s2²/s1² > k2′
s2²/s1² < F_{m−1,n−1;α/2} or s2²/s1² > F_{m−1,n−1;1−α/2}
s2²/s1² < 1/F_{n−1,m−1;1−α/2} or s2²/s1² > F_{m−1,n−1;1−α/2}        (2.40)
s1²/s2² > F_{n−1,m−1;1−α/2} or s2²/s1² > F_{m−1,n−1;1−α/2}.

(so that we just use one F table, the 1 − α/2 table. You might only have the 97.5% table and not the 2.5% table, and have α = 0.05; then by doing this second last step you only need the 97.5% table.)

(OR Reject H0 if s2²/s1² > F_{m−1,n−1;1−α} where s2² is the larger variance.        (2.40a))

Remarks :
1. For the one–sided hypotheses 𝐻0 : 𝜎12 ≤ 𝜎22 (𝐻0 : 𝜎12 ≥ 𝜎22 ) versus 𝐻1 : 𝜎12 > 𝜎22 (𝐻1 : 𝜎12 <
𝑠12 𝑠22
𝜎22 ) the LR–test is to reject 𝐻0 if > 𝐹𝑛−1,𝑚−1,1−𝛼 (if > 𝐹𝑚−1,𝑛−1,1−𝛼 ). (similar to before
𝑠22 𝑠12
– see previous examples)

2. In example 2.11 we had 𝐻0 : 𝜎12 = 𝜎22 versus 𝐻1 : 𝜎12 ≠ 𝜎22


𝜎2 𝜎2
which is the same as 𝐻0 : 𝜎12 = 1 versus 𝐻1 : 𝜎12 ≠ 1
2 2
𝜎12 𝜎12
A more general hypothesis is 𝐻0 : = 𝜆0 versus 𝐻1 : ≠ 𝜆0 , see Exercise 3.
𝜎22 𝜎22
Or 𝐻0 : 𝜎12 = 𝜆0 𝜎22 (= 𝜎10
2
) versus 𝐻1 : 𝜎12 ≠ 𝜆0 𝜎22

3. For 𝑘 populations the hypothesis 𝐻0 : 𝜎12 = 𝜎22 = ⋯ = 𝜎𝑘2 (= 𝜎02 ) is usually tested by
means of an approximate test, using the asymptotic distribution of the LR–statistic (see a later
section and Exercise 5).

4. See “Computer lecture” excel file on blackboard!!!
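The computer lecture uses Excel; purely as an alternative illustration, here is a minimal Python sketch of the two-sided variance-ratio test (2.40) (hypothetical samples; numpy/scipy assumed):

import numpy as np
from scipy import stats

# F-test (2.40) for H0: sigma1^2 = sigma2^2 vs H1: sigma1^2 != sigma2^2.
x = np.array([12.1, 11.8, 12.6, 12.0, 11.5, 12.9])   # hypothetical sample 1
y = np.array([11.2, 11.0, 11.7, 10.9, 11.4])         # hypothetical sample 2
alpha = 0.05

n, m = len(x), len(y)
s1_sq, s2_sq = x.var(ddof=1), y.var(ddof=1)          # sample variances
reject = (s1_sq / s2_sq > stats.f.ppf(1 - alpha / 2, dfn=n - 1, dfd=m - 1)) or \
         (s2_sq / s1_sq > stats.f.ppf(1 - alpha / 2, dfn=m - 1, dfd=n - 1))
print(s1_sq / s2_sq, s2_sq / s1_sq, reject)          # True => reject H0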


2.3 Uniformly Most Powerful Tests
(used for ≤ vs > or for ≥ vs <. Example 2.9 shows that it can’t be used for = vs ≠)

In Chapter 1 we had the definition of a most powerful test (MPT) for simple hypotheses.

We were dealing with simple hypotheses (= and =) in that section (so then the size is α* = π*(θ0) for test Γ*):
Definition 1.8 : A test Γ ∗ is a most powerful test (MPT) of size 𝛼 (0 < 𝛼 < 1) if :
(i) 𝜋 ∗ (𝜃0 ) = 𝛼 , (i.e. 𝛼 ∗ = 𝛼) and
(ii) 𝛽 ∗ ≤ 𝛽
for all other tests Γ of size 𝛼 or smaller.

We had:
(ii) 𝛽 ∗ ≤ 𝛽 i.e. 1 − 𝜋 ∗ (𝜃1 ) ≤ 1 − 𝜋(𝜃1 ) i.e. −𝜋 ∗ (𝜃1 ) ≤ −𝜋(𝜃1 ) i.e. 𝜋 ∗ (𝜃1 ) ≥ 𝜋(𝜃1 )
Remember, 𝜋(𝜃1 ) = 1 − 𝛽 is called the power of the test, so 𝛽 = 1 − 𝜋(𝜃1 )
𝜋(𝜃) is called the power function.

For composite hypotheses, we have:


Definition 2.1 : Let Γ be a test of H0: θ ∈ Ω0 versus H1: θ ∈ Ω0ᶜ, where Ω0 ⊂ Ω.
The size of the test is defined as
α = sup_{θ∈Ω0} π(θ) = sup_{θ∈Ω0} P[Reject H0] = sup_{θ∈Ω0} P[Reject H0 | H0 true] = sup_{θ∈Ω0} P[Type I Error]
where π(θ) is the power function of the test.

Similarly, β = sup_{θ∈Ω0ᶜ} [1 − π(θ)] = sup_{θ∈Ω0ᶜ} P[Do not reject H0]
  = sup_{θ∈Ω0ᶜ} P[Do not reject H0 | H1 true] = sup_{θ∈Ω0ᶜ} P[Type II Error].

A more general version of that definition for composite hypotheses (so then size is 𝛼 ∗ =
sup𝜃∈Ω0 𝜋 ∗ (𝜃) for test*) is as follows:

Definition 2.3 : A test Γ ∗ of 𝐻0 : 𝜃 ∈ Ω0 versus 𝐻1 : 𝜃 ∈ Ω𝑐0 is a uniformly most powerful test


(UMPT) of size 𝛼 if and only if
1. sup𝜃∈Ω0 𝜋 ∗ (𝜃) = 𝛼 ,(i.e. 𝛼 ∗ = 𝛼), and
2. 𝜋 ∗ (𝜃) ≥ 𝜋(𝜃) for all 𝜃 ∈ Ω𝑐0 , (2.21)
where 𝜋(𝜃) is the power function of any other test of size 𝛼 or smaller.

(Remember Ω𝑐0 = Ω1 .)

So a test is UMP if its power function is higher over the region where the alternative is true than
any other test whose size is the same or smaller.
FIGURE 2.3 : Power functions of two tests

In Figure 2.3 the power functions, 𝜋 ∗ (𝜃) and 𝜋(𝜃) , are depicted for two tests of the
hypotheses 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 . Both tests have size 𝛼, but Γ ∗ is a uniformly more
powerful test than Γ.

In Chapter 1, we were dealing with simple hyotheses (= and =):


Theorem 1.2 : (Neyman–Pearson Lemma)
Suppose Γ ∗ is a test of 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 = 𝜃1 which has the following form for some
𝑓(𝒙|𝜃 )
constant 𝑘 > 0 : 𝒙 ∈ 𝑅 𝑖𝑓 𝑓(𝒙|𝜃0 ) < 𝑘 Where (𝑘 is found by) 𝑃𝜃0 [𝑿 ∈ 𝑅] = 𝛼 ∗ .
1

Theorem 1.3 : Consider the simple hypotheses 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 = 𝜃1 . Suppose


𝑇(𝐱 ) is a sufficient statistic for 𝜃 and 𝑔(𝑡|𝜃𝑖 ) is the pdf of 𝑇 corresponding to 𝜃𝑖 (𝑖 = 0,1).
Then any test with critical region 𝑈 (a subset of the sample space of 𝑇) is a MPT of size 𝛼 if it
𝑔(𝑡|𝜃0 )
satisfies 𝑡∈𝑈 𝑖𝑓 < 𝑘 for some 𝑘 > 0, where 𝑃𝜃0 [𝑇 ∈ 𝑈] = 𝛼 .
𝑔(𝑡|𝜃1 )
(working with the pdf of t instead of the pdf of x = (x1,…,xn) )

In general, only one–sided tests are UMP. The following theorem will show how to obtain an
UMP test by using the result of the Neyman–Pearson Lemma.

Theorem 2.1 : Consider testing 𝐻0 : 𝜃 ∈ Ω0 versus 𝐻1 : 𝜃 ∈ Ω𝑐0 .


Suppose the test is based on a sufficient statistic 𝑇 with rejection region 𝑈 (or 𝑅 if we are
working with the entire sample) satisfying the following conditions:
1. The test is of size 𝛼,
so there exists a 𝜃0 ∈ Ω0 such that 𝑃𝜃0 [𝑇 ∈ 𝑈] = 𝛼 (or 𝑃𝜃0 [𝑿 ∈ 𝑅] = 𝛼).
(We don’t have 𝑠𝑢𝑝, because 𝑠𝑢𝑝 will occur at 𝜃0 , so sup𝜃∈Ω0 𝑃(… ) = 𝑃𝜃0 (… ))
2. Let 𝑔(𝑡|𝜃) be the pdf of 𝑇. For the 𝜃0 in (i) and for each 𝜃′ ∈ Ω𝑐0 , there exists a 𝑘′ > 0
such that t ∈ U if g(t|θ0)/g(t|θ′) < k′ (or X ∈ R if f(x|θ0)/f(x|θ′) < k).
Then this test is a uniformly most powerful size 𝛼 test for 𝐻0 versus 𝐻1 .

(If we have 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 , then 𝜃 ′ ∈ Ω𝑐0 = (𝜃0 , ∞))


(If we have 𝐻0 : 𝜃 ≥ 𝜃0 versus 𝐻1 : 𝜃 < 𝜃0 , then 𝜃 ′ ∈ Ω𝑐0 = (−∞, 𝜃0 ))
Proof : (𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 OR 𝐻0 : 𝜃 ≥ 𝜃0 versus 𝐻1 : 𝜃 < 𝜃0 )
Let 𝜋 (𝜃) be the power function of the test with rejection region R. Fix 𝜃′ ∈ Ω𝑐0 and consider

testing 𝐻0′ : 𝜃 = 𝜃0 versus 𝐻1′ : 𝜃 = 𝜃′ . We now have two simple hypotheses and since the
rejection region has the same form as in the Neyman–Pearson Lemma (Th1.2, or we can use
Th1.3), it follows that 𝜋 ∗ (𝜃′) ≥ 𝜋(𝜃′) where 𝜋(𝜃) is the power function of any other size 𝛼
test of 𝐻′0 , that is, any test satisfying 𝜋(𝜃0 ) = 𝛼. (For our original test, sup𝜃∈Ω0 𝜋(𝜃) = 𝜋(𝜃0 ))
But 𝜃′ was arbitrary, so 𝜋 ∗ (𝜃′) ≥ 𝜋(𝜃′) for all 𝜃′ ∈ Ω𝑐0 and the result follows.

Example 2.6 : Consider again Example 1.3(b) where the MP test of size α was derived for testing H0: θ = θ0 versus H1: θ = θ1, where θ0 < θ1 and X1, …, Xn ∼ N(θ, σ²) with σ² known. The MPT from (1.13) is: Reject H0 if x̄ > θ0 + z_{1−α} σ/√n.        (2.22)

Suppose now we want to test H0: θ ≤ θ0 versus H1: θ > θ0. We'll thus use the same rejection rule according to Theorem 2.1, i.e. Reject H0 if x̄ > θ0 + z_{1−α} σ/√n, for our UMP test of size α.

(We can see that the conditions for the theorem are met:
We see that the critical region above is independent of θ1, so it is the MP test for any θ1 > θ0.
Condition (i) of the theorem is true since the power function of the test,
π*(θ) = P_θ[X̄ > θ0 + z_{1−α} σ/√n], is an increasing function of θ, so sup_{θ≤θ0} π*(θ) = π*(θ0) = α.
So (2.22) is also the UMP test for H0: θ ≤ θ0 against H1: θ > θ0.)

Another way to derive an UMP test is to apply the Karlin–Rubin theorem. For that we need the
following definition. I do this section different to how the original notes do it, so only use this
updated version of the notes for the rest of this section 2.3.

Definition 2.4 :
a) A family of pdfs {g(t|θ): θ ∈ Ω} for a random variable T with parameter θ has a monotone θ0′ < θ1′ likelihood ratio (MLR θ0′ < θ1′) if, for every θ0′ < θ1′, g(t|θ0′)/g(t|θ1′) is a nondecreasing (increasing or constant) function of t.
b) A family of pdfs {g(t|θ): θ ∈ Ω} for a random variable T with parameter θ has a monotone θ0′ > θ1′ likelihood ratio (MLR θ0′ > θ1′) if, for every θ0′ > θ1′, g(t|θ0′)/g(t|θ1′) is a nondecreasing (increasing or constant) function of t.

(Note: 𝑡 can also be the entire sample i.e. 𝒙 i.e. 𝑔(𝑡| … ) becomes 𝑓(𝒙| … ). Non-increasing
(decreasing or constant) w.r.t. 𝑠 is the same as non-decreasing (increasing or constant) w.r.t.
𝑡 = −𝑠). Many common families of distributions have a MLR. Indeed, any regular exponential
family with 𝑔(𝑡|𝜃) = ℎ(𝑡)𝑐(𝜃)𝑒 𝑤(𝜃)𝑡 has a MLR if 𝑤(𝜃) is a nondecreasing function.

Theorem 2.2 : (Karlin–Rubin)


Consider testing 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 (or 𝐻0 : 𝜃 ≥ 𝜃0 versus 𝐻1 : 𝜃 < 𝜃0 ). Suppose
𝑇 is a sufficient statistic for 𝜃 and the family of pdfs of 𝑇 has a MLR 𝜃0′ < 𝜃1′ (or 𝜃0′ > 𝜃1′
respectively). Then for any 𝑐, the test that rejects 𝐻0 if and only if 𝑇 < 𝑐 is a UMP test of size
𝛼, where 𝛼 = 𝑃𝜃0 [𝑇 < 𝑐].
We shall not consider the proof.

Remarks :

1. Note that the theorem only applies to one–sided hypotheses. As we shall illustrate in Example
2.9, no UMP test exists for two–sided hypotheses.

2. To apply this theorem, first determine the likelihood ratio of the sample, or of a sufficient
statistic. Verify that it is a nondecreasing function of the sufficient statistic for the two parameter
values where if 𝐻0 : 𝜃 ≤ 𝜃0 we set 𝜃0′ < 𝜃1′ , and 𝐻0 : 𝜃 ≥ 𝜃0 we set 𝜃0′ > 𝜃1′ Write the test
down and determine the constant 𝑐.

3. The test is for a one–parameter family of distributions, or in other words, only the parameter
of interest must be unknown.

Example 2.7 : Let 𝑋1 , … , 𝑋𝑛 be a sample from a Poisson distribution with parameter 𝜃. Find
a UMP test of size 𝛼 for 𝐻0 : 𝜃 ≥ 𝜃0 versus 𝐻1 : 𝜃 < 𝜃0 .
Let 𝜃0′ > 𝜃1′ and calculate the ratio
f(x|θ0′)/f(x|θ1′) = L(θ0′|x)/L(θ1′|x) = [θ0′^{Σxi} e^{−nθ0′} / ∏xi!] / [θ1′^{Σxi} e^{−nθ1′} / ∏xi!]
  = e^{−n(θ0′−θ1′)} (θ0′/θ1′)^{Σxi}.
(It is easy to see that a·bᵗ is increasing w.r.t. t if a > 0 and b > 1, because if t gets bigger, then bᵗ gets bigger if b > 1, and then a·bᵗ gets bigger if a > 0.)

Since 𝜃0′ > 𝜃1′ , this is N.B. an increasing function of 𝑇 = Σ𝑋𝑖 . Furthermore, Σ𝑋𝑖 is a sufficient
statistic for 𝜃. So, according to the Karlin–Rubin theorem we can immediately state that the UMP
test is of the form:
𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 Σ𝑥𝑖 < 𝑐 .
To find 𝑐 so that the size of the test is 𝛼, let
𝛼 = sup 𝑃[𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑡𝑟𝑢𝑒] = sup 𝑃[Σ𝑥𝑖 < 𝑐|𝜃 ≥ 𝜃0 ] = sup P[Σ𝑥𝑖 < 𝑐] = 𝑃𝜃0 [Σ𝑋𝑖 < 𝑐].
𝜃≥𝜃0
Under 𝐻0 , Σ𝑋𝑖 ∼ Pois(𝑛𝜃0 ), a discrete distribution, so it may not be possible to find a 𝑐 so that
the size of the test is exactly 𝛼.

For example, let 𝜃0 = 1 and 𝑛 = 10.


We want 𝑐 so that we get a value as close as possible to 0.05 = 𝑃𝜃0 =1 [Σ𝑋𝑖 < 𝑐] where Σ𝑋𝑖 ∼
Pois(10 × 1) i.e. Σ𝑋𝑖 ∼ Pois(10),

From Poisson tables it follows that 𝑃𝜃0 [Σ𝑋𝑖 ≤ 4] = 0.0293 and 𝑃𝜃0 [Σ𝑋𝑖 ≤ 5] = 0.0671. So
the size of the test would be one of these two values depending on your choice. If we want an 𝛼
as close as possible to 0.05, then the UMP test of size 𝛼 = 0.0671 is:

𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 Σ𝑥𝑖 ≤ 5 i. e. 𝑖𝑓 Σ𝑥𝑖 < 6 .
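The table probabilities used above can be checked with a short Python sketch (scipy assumed; this simply evaluates the Poisson cdf rather than reading the tables):

from scipy.stats import poisson

# Example 2.7 with theta0 = 1 and n = 10: sum(X_i) ~ Poisson(10) under theta = theta0.
lam = 10 * 1.0
print(poisson.cdf(4, lam), poisson.cdf(5, lam))   # approx 0.0293 and 0.0671, as in the notes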


Example 2.8 : Consider a sample of 𝑛 independent observations from a 𝑁(0, 𝜎 2 )
distribution. Find a UMP test of size 𝛼 for 𝐻0 : 𝜎 2 ≤ 𝜎02 versus 𝐻1 : 𝜎 2 > 𝜎02 .
Let 𝜎0′2 < 𝜎1′2 , then
1 2

′2 Σ(xi −0)
𝐿(𝜎0′2 | 𝒙) (2𝜋𝜎0′2 )−𝑛/2 𝑒 2𝜎0
= 1 2
𝐿(𝜎1′2 | 𝒙) − ′2 Σ(xi −0)
(2𝜋𝜎1′2 )−𝑛/2 𝑒 2𝜎 1

1 1 1
𝜎′2
𝑛/2 − Σ𝑥𝑖2 ( ′2 − ′2 )
2
= (𝜎1′2 ) 𝑒 𝜎 0𝜎 1 .
0
1 1 1
𝜎1′2
𝑛/2 − Σ𝑥𝑖2 ( ′2 − ′2 )
( (𝜎′2 ) 𝑒 2 𝜎0 𝜎1
= 𝑎𝑒 𝑏𝑡 will be increasing if 𝑎, 𝑏 > 0
0
𝑛/2
𝜎′2 1 1 1 1 1 1 1
𝑎 = (𝜎1′2 ) > 0 but 𝑏 = − 2 (𝜎′2 − 𝜎′2 ) < 0 ∵ 𝜎0′2 < 𝜎1′2 so > 𝜎′2 so − 𝜎′2 > 0 so
0 0 1 𝜎0′2 1 𝜎0′2 1
1 1 1
− 2 (𝜎′2 − 𝜎′2 ) < 0 where 𝑡 = Σ𝑥𝑖2 (actually decreasing w.r.t. this 𝑡 = Σ𝑥𝑖2 )
0 1

So we do it like this:
1 1 1
𝜎′2
𝑛/2 − Σ𝑥𝑖2 ( ′2 − ′2 )
(𝜎1′2 ) 𝑒 2 𝜎 0𝜎 1 = 𝑎𝑒 𝑏𝑡 will be increasing if 𝑎, 𝑏 > 0
0
𝑛/2
𝜎′2 1 1 1
𝑎 = (𝜎1′2 ) > 0 but 𝑏 = 2 (𝜎′2 − 𝜎′2 ) > 0 where 𝑡 = −Σ𝑥𝑖2 so increasing w.r.t. 𝑡 = −Σ𝑥𝑖2 )
0 0 1

It is an increasing function of 𝑇 = −Σ𝑋𝑖2 . Furthermore, this expression is a function of


𝑇 = −Σ𝑋𝑖2 , which is a sufficient statistic for 𝜎 2 .

So the UMP test is of the form


𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 − Σ𝑥𝑖2 < 𝑐
Σ𝑥𝑖2 > −𝑐
Σ𝑥𝑖2 > 𝑐′
where 𝛼 = sup P[Reject H0 |𝐻0 𝑡𝑟𝑢𝑒] = 𝑃𝜎02 [Σ𝑋𝑖2 > 𝑐′] . (see previous example)

Important Relationship III


If X1, …, Xn ∼ N(μ, σ²), then (Xi−μ)/σ ~ N(0,1), then ((Xi−μ)/σ)² ~ χ²_1, then
Σ((Xi−μ)/σ)² = Σ(Xi−μ)²/σ² ∼ χ²_n, and        (2.22)
(Σ(Xi−X̄)²/σ² = (n−1)S²/σ² ∼ χ²_{n−1}.        (2.23))
Since μ = 0, we know that Σ(Xi−0)²/σ0² = ΣXi²/σ0² ∼ χ²_n under H0.
So α = P_{σ0²}[ΣXi² > c′] = P_{σ0²}[ΣXi²/σ0² > c′/σ0²] = 1 − P_{σ0²}[ΣXi²/σ0² < c′/σ0²] = 1 − F_{χ²_n}[c′/σ0²],
so F_{χ²_n}[c′/σ0²] = 1 − α, which means that c′/σ0² = χ²_{n,1−α}, or c′ = σ0² χ²_{n,1−α}.

The UMP test of size α is then:

Reject H0 if Σxi² > σ0² χ²_{n,1−α}.

(If I ask for the power function:)


The power function of this test is

π(σ²) = P_{σ²}[Reject H0]
  = P_{σ²}[ΣXi² > σ0² χ²_{n,1−α}]
  = P_{σ²}[ΣXi²/σ² > (σ0²/σ²) χ²_{n,1−α}]        (2.24)
  = 1 − F_{χ_n}((σ0²/σ²) χ²_{n,1−α}),

because Σ(Xi−0)²/σ0² = ΣXi²/σ0² ∼ χ²_n under H0, but in general Σ(Xi−0)²/σ² = ΣXi²/σ² ∼ χ²_n,
where F_{χ_n}(·) is the cdf of a chi–squared distribution with n degrees of freedom.
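A minimal Python sketch of this power function (2.24), evaluated at a few values of σ² (the n, σ0² and α below are made-up inputs; numpy/scipy assumed):

import numpy as np
from scipy.stats import chi2

# Power (2.24) of the UMP test in Example 2.8:
# pi(sigma^2) = 1 - F_chi2_n((sigma0^2/sigma^2) * chi2_{n,1-alpha}).
n, sigma0_sq, alpha = 20, 1.0, 0.05
crit = chi2.ppf(1 - alpha, df=n)                   # chi2_{n, 1-alpha}

for sigma_sq in [1.0, 1.5, 2.0, 3.0]:              # at sigma^2 = sigma0^2 the power equals alpha
    power = 1 - chi2.cdf(sigma0_sq / sigma_sq * crit, df=n)
    print(sigma_sq, round(power, 4))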

(You can skip this example)


Example 2.9 : This example is a graphical explanation as to why a two–sided test can not be
UMP. Consider 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) with 𝜎 2 known. Let 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0 .
From Example 2.3 (Equation 2.13) the LR test follows as:
𝜎 𝜎
𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 𝑥 ≤ 𝜇0 − 𝑧1−𝛼/2 𝑛 𝑜𝑟 𝑥 ≥ 𝜇0 + 𝑧1−𝛼/2 𝑛 .
√ √
The power function of this test is
𝜋(𝜇) = 𝑃𝜇 [ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 ]
𝜎 𝜎
= 𝑃𝜇 [𝑋 ≤ 𝜇0 − 𝑧1−𝛼/2 𝑜𝑟 𝑋 ≥ 𝜇0 + 𝑧1−𝛼/2 ]
√𝑛 √𝑛
𝜎 𝜎
= 1 − 𝑃𝜇 [𝜇0 − 𝑧1−𝛼/2 ≤ 𝑋 ≤ 𝜇0 + 𝑧1−𝛼/2 ]
√𝑛 √𝑛
𝜇0 −𝜇 𝜇0 −𝜇
= 1−𝑃[ 𝜎 − 𝑧1−𝛼/2 ≤ 𝒵 ≤ 𝜎 + 𝑧1−𝛼/2 ] (2.25)
√𝑛 √𝑛

𝜇0 −𝜇 𝜇0 −𝜇
= 1 − {𝑃 [𝒵 ≤ 𝜎 + 𝑧1−𝛼/2 ] − 𝑃 [𝒵 ≤ 𝜎 − 𝑧1−𝛼/2 ]}
√𝑛 √𝑛
√𝑛(𝜇0 −𝜇) √𝑛(𝜇0 −𝜇)
= 1 − Φ( 𝜎
+ 𝑧1−𝛼/2 ) + Φ ( 𝜎
− 𝑧1−𝛼/2 ) .
𝜋(𝜇0 ) = 1 − Φ(𝑧1−𝛼/2 ) + Φ(−𝑧1−𝛼/2 )
= 1 − Φ(𝑧1−𝛼/2 ) + Φ(𝑧𝛼/2 )
Notice that
= 1 − (1 − 𝛼/2) + 𝛼/2
= 𝛼 , 𝑡ℎ𝑒 𝑠𝑖𝑧𝑒 𝑜𝑓 𝑡ℎ𝑒 𝑡𝑒𝑠𝑡.

The power function is shown as the solid line in Figure 2.4.

FIGURE 2.4

(𝜇0′ ∈ {𝜇0 } and 𝜇1′ ∈ (−∞, 𝜇0 ) ∪ (𝜇0 , ∞) because we had 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0 )

Consider now a test of the form:


𝜎 𝜎
Reject 𝐻0 if 𝑥 ≤ 𝜇0 + 𝑧4𝛼 𝑛 or 𝑥 ≥ 𝜇0 + 𝑧1−𝛼 .
5 √ 5 √𝑛

The size of this test is still 𝜋2 (𝜇0 ) = 𝛼, but the power functions, 𝜋2 (𝜇), is shown in Figure 2.4
as the broken line. It is clear that neither test is uniformly more powerful than the other. For 𝜇 >
𝜇0 , the original test is more powerful, while for 𝜇 < 𝜇0 the new test is more powerful. So for
any two–sided test it is always possible to find another two–sided test that will be more powerful
in some region of the parameter space.

The LR–test in the above example has however one desirable property which is unique to it. It is
an unbiased test.

Definition 2.5 : A test with power function 𝜋(𝜃) is an unbiased test if 𝜋(𝜃1′ ) ≥ 𝜋(𝜃0′ ) for
every 𝜃0′ ∈ Ω0 and 𝜃1′ ∈ Ω𝑐0 .

(We could get for example 𝜃0′ ∈ (−∞, 𝜃0 ] and 𝜃1′ ∈ (𝜃0 , ∞))

(𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0 then Ω0 = {𝜇0 } and Ω1 = Ω𝑐0 = (−∞, 𝜇0 ) ∪ (𝜇0 , ∞)


𝐻0 : 𝜇 ≤ 𝜇0 versus 𝐻1 : 𝜇 > 𝜇0 then Ω0 = (−∞, 𝜇0 ] and Ω1 = Ω𝑐0 = (𝜇0 , ∞)
𝐻0 : 𝜇 ≥ 𝜇0 versus 𝐻1 : 𝜇 < 𝜇0 then Ω0 = [𝜇0 , ∞) and Ω1 = Ω𝑐0 = (−∞, 𝜇0 ))

This simply means that the probability of rejecting the null hypothesis when it is false should
never be smaller than the probability of rejecting it when it is true.
From Figure 2.4 we can see that inf𝜇∈Ω 𝜋(𝜇) = 𝜋(𝜇0 ) = 𝛼, so 𝜋(𝜇) > 𝜋(𝜇0 ) for all 𝜇 ≠ 𝜇0 ,
and thus the LR–test is an unbiased test of size 𝛼. The test with power function 𝜋2 (𝜇) is
however not unbiased since the power function drops below the size of the test in a region of
Ω𝑐0 .

Note : The LR–test is a symmetrical test and there exists no other unbiased test of the same size
that is more powerful. So we call the LR–test in this situation the uniformly most powerful
unbiased test of the given hypothesis.

Important results (memorize the rejection rules to be able to apply this to dataset(s)):

1. (One-group) Normal test on the mean where the variance is unknown (t-test):

One sample test on the mean of a normal distribution (variance unknown):


From example 2.4 : Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) where 𝜎 2 is unknown and we want to
derive the LR–test of size 𝛼 for 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0 .

Reject the null hypothesis H0: μ = μ0 if |t| ≥ t_{n−1,1−α/2}, where |t| = √n|x̄−μ0|/s,

or, equivalently, if x̄ ≤ μ0 − t_{n−1,1−α/2} s/√n  or  x̄ ≥ μ0 + t_{n−1,1−α/2} s/√n.

In the previous examples we have only dealt with two–sided (≠) alternatives (H1) and a point (=)
null hypothesis (H0). For one–sided tests we would usually apply the methods described in the
example. However, one–sided LR–tests are derived in a similar manner and are usually just a
one–sided version of the tests described above.

In Example 2.4, let 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0


The LR–test of size 𝛼 would then be:
Reject H0 if x̄ ≤ μ0 − t_{n−1,1−α/2} s/√n  or  x̄ ≥ μ0 + t_{n−1,1−α/2} s/√n.

For example, in Example 2.4, if the two hypotheses had been H0: μ ≤ μ0 versus H1: μ > μ0, the LR–test of size α would then be:
Reject H0 if x̄ ≥ μ0 + t_{n−1,1−α} s/√n.

Similarly, for H0: μ ≥ μ0 versus H1: μ < μ0 it follows that:
Reject H0 if x̄ ≤ μ0 − t_{n−1,1−α} s/√n.

These are one–sided critical regions and the only difference to (2.17) is that the 𝛼/2 is replaced
by 𝛼.
• The case where the variance is known doesn’t have to be memorized, but it is called a z-
test.

2. Two-group (independent groups) normal test on the means where the variances are
unknown but equal (t-test) (case 2a of computer lecture):

Two sample (independent samples) test on the means of normal distributions (variances
equal but unknown):
From example 2.10 : Consider a sample 𝑋1 , … , 𝑋𝑛 of size 𝑛 from a 𝑁(𝜇1 , 𝜎 2 )
distribution and an independent sample 𝑌1 , … , 𝑌𝑚 of size 𝑚 from a 𝑁(𝜇2 , 𝜎 2 ) distribution.
Note that the variances are assumed equal but unknown. We want to derive the LR test for
𝐻0 : 𝜇1 = 𝜇2 (= 𝜇0 ) versus 𝐻1 : 𝜇1 ≠ 𝜇2 .

The LR–test is: Reject the null hypothesis 𝐻0 : 𝜇1 = 𝜇2 if


|w| = |x̄−ȳ| / √[(1/n + 1/m) S_p²]  >  t_{n+m−2;1−α/2}.

Note that: s_p² = [(n1−1)s1² + (n2−1)s2²] / [(n1−1) + (n2−1)] = [Σ(xi−x̄)² + Σ(yi−ȳ)²] / (n1+n2−2)

Remarks :

𝐻0 : 𝜇1 = 𝜇2 versus 𝐻1 : 𝜇1 ≠ 𝜇2 Reject H0 if |𝑤| > 𝑡𝑛+𝑚−2;1−𝛼/2


Is the same as
𝐻0 : 𝜇1 = 𝜇2 versus 𝐻1 : 𝜇1 ≠ 𝜇2 Reject H0 if 𝑤 > 𝑡𝑛+𝑚−2;1−𝛼/2 𝑜𝑟 𝑤 < −𝑡𝑛+𝑚−2;1−𝛼/2

For the one–sided hypotheses


H0: μ1 ≤ μ2 versus H1: μ1 > μ2: the LR–test is just the 1–sided version; Reject H0 if w > t_{n+m−2;1−α}.
H0: μ1 ≥ μ2 versus H1: μ1 < μ2: the LR–test is just the 1–sided version; Reject H0 if w < −t_{n+m−2;1−α}.

• Two-group (independent groups) normal test on the means where the variances are
unknown but unequal (t-test) doesn’t have to be memorized. (case 2b of computer
lecture)

• Two-group (independent groups) normal test on the means where the variances are
known but equal (z-test) doesn’t have to be memorized. (case 2c of computer lecture)
3. Two-group dependent groups normal test on the means where the variances are
unknown but equal (t-test) (case 1 of computer lecture):

Two sample (dependent samples) test on the means of normal distributions (variances equal
but unknown):
From ex 2. of ch 2
Consider paired samples (𝑋1 , 𝑌1 ), (𝑋2 , 𝑌2 ), … (𝑋𝑛 , 𝑌𝑛 ) from two normal distributions where
𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇1 , 𝜎12 ) and 𝑌1 , … , 𝑌𝑛 ∼ 𝑁(𝜇2 , 𝜎22 ) and 𝑋 and 𝑌 are not independent with
Cov (𝑋, 𝑌) = 𝜌𝜎1 𝜎2 . Derive a test of size 𝛼 for 𝐻0 : 𝜇1 − 𝜇2 = 0 versus 𝐻1 : 𝜇1 − 𝜇2 ≠ 0
where 𝜎12 and 𝜎22 are unknown. (Hint: Work with the distribution of 𝑋 − 𝑌).

In Exercise 2 from this chapter we derived the LR-test (which is called the paired t-test) of
size α for the hypothesis for 𝐻0 : 𝜇1 − 𝜇2 = 0 versus 𝐻1 : 𝜇1 − 𝜇2 ≠ 0. The test statistic is
based on the within-subject differences 𝐷1 , … , 𝐷𝑛 where 𝐷𝑖 = 𝑋𝑖 − 𝑌𝑖 , have distribution
𝑁(𝜇, 𝜎 2 )and 𝜇 = 𝜇1 − 𝜇2 . Then the LR test is as follows:

Reject the null hypothesis H0: μ = μ0 (e.g. H0: μ = 0) if

|t| > t_{n−1;1−α/2}, where |t| = |d̄ − μ0| / (s/√n).

Here d̄ = (1/n) Σdi = x̄ − ȳ and s² = (1/(n−1)) Σ(di − d̄)², i.e. the sample variance of the di's.

Generalize to the one-sided hypotheses as in the previous situation.
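As an extra illustration of the paired t-test, here is a minimal Python sketch (hypothetical paired data; numpy/scipy assumed; scipy's ttest_rel is only a cross-check):

import numpy as np
from scipy import stats

# Paired t-test: work with d_i = x_i - y_i and test H0: mu_D = 0.
x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3])   # hypothetical first measurements
y = np.array([4.9, 4.7, 5.2, 4.8, 4.8, 5.0])   # hypothetical second measurements
alpha, mu0 = 0.05, 0.0

d = x - y
n = len(d)
t_stat = np.sqrt(n) * (d.mean() - mu0) / d.std(ddof=1)
crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(abs(t_stat) >= crit)                      # True => reject H0
print(stats.ttest_rel(x, y))                    # cross-check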

4. k-group independent groups normal test on the means where the variances are unknown
but equal (t-test):
(example 2.10’s case where we have k groups instead of 2 groups)

ANOVA: k-sample (independent samples) test on the means of normal distributions


(variances equal but unknown):

From ex 4. of ch 2
Consider 𝑘 normal populations where 𝑋𝑖 ∼ 𝑁(𝜇𝑖 , 𝜎 2 ) , 𝑖 = 1, … , 𝑘, independently. A sample
of size 𝑛 is drawn from each population and 𝜎 2 is unknown. Find the LR–test of size 𝛼 for
testing 𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘 (= 𝜇0 ) versus 𝐻1 : Not all means are equal.

Calculate x̅ for each sample and then find the sample variance (=var.s()) for all the x̅’s to find 𝑠2𝑥̅
Calculate the sample variance of each sample, s_i², and then find s_p² = [(n1−1)s1² + ⋯ + (nk−1)sk²] / [(n1−1) + ⋯ + (nk−1)]

The LR-test is: Reject the null hypothesis if F = n s_x̄² / s_p² > F_{k−1, k(n−1); 1−α}
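As an extra illustration of this one-way ANOVA F-test (equal group sizes), here is a minimal Python sketch (hypothetical data; numpy/scipy assumed; scipy's f_oneway gives the same F value as a cross-check):

import numpy as np
from scipy import stats

# One-way ANOVA: F = n * s_xbar^2 / s_p^2, reject H0 if F > F_{k-1, k(n-1); 1-alpha}.
groups = [np.array([10.1, 9.8, 10.4, 10.0]),
          np.array([10.9, 11.2, 10.7, 11.0]),
          np.array([9.5, 9.9, 9.7, 9.4])]        # hypothetical samples, n = 4 each
alpha = 0.05
k, n = len(groups), len(groups[0])

means = np.array([g.mean() for g in groups])
s_xbar2 = means.var(ddof=1)                      # sample variance of the group means
s_p2 = np.mean([g.var(ddof=1) for g in groups])  # pooled variance (equal n => simple average)
F = n * s_xbar2 / s_p2
crit = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=k * (n - 1))
print(F, crit, F > crit)                         # True => reject H0
print(stats.f_oneway(*groups))                   # cross-check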
2.5 Chi–Square Tests
In this section we present a number of tests of hypotheses that one way or another involve the
chi–square distribution. Included will be the asymptotic distribution of the generalized
likelihood–ratio, goodness–of–fit, and tests concerning contingency tables. The material in this
section will be presented with an aim of merely finding tests of certain hypotheses, and it will not
be presented in such a way that concern is given to the optimality of the test. Thus, the power
functions of the derived tests will not be discussed.

Sometimes the distribution of the generalized likelihood ratio is intractable. In such cases we can
use the asymptotic distribution of the GLR as given without proof in the following theorem.

A result that holds as n → ∞ holds approximately if n is large.

Theorem 2.3 : Let 𝑋1, … , 𝑋𝑛 be a random sample from a pdf 𝑓(𝑥| 𝜃 ) where
θ = {θ1, …, θk}. Let H0: θ ∈ Ω0 and H1: θ ∈ Ω0ᶜ. Under some regularity conditions the
distribution of −2ℓ𝑛 𝜆(𝐗) converges to a chi–squared distribution under 𝐻0 as 𝑛 → ∞. The
degrees of freedom is the difference between the number of free parameters in Ω and the
number of free parameters under 𝐻0 .

Since the LR criterium rejects the null hypothesis if 𝜆(𝐗) < 𝑐, it follows that this is equivalent to
rejecting when −2ℓ𝑛 𝜆(𝐗) > −2 ℓ𝑛(𝑐) i.e. −2ℓ𝑛 𝜆(𝐗) > 𝑘 , where we have that
𝑃𝐻0 [−2ℓ𝑛 𝜆(𝐗) > 𝑘] = 𝛼 i.e. 𝑃𝐻0 [−2ℓ𝑛 𝜆(𝐗) < 𝑘] = 1 − 𝛼 i.e. 𝐹−2ℓ𝑛 𝜆(𝐗),𝐻0 [𝑘] = 1 − 𝛼
. 2
Furthermore, −2ℓ𝑛 𝜆(𝐗) ∼ 𝜒𝜈2 under 𝐻0 , so 𝑘 = 𝜒𝜈,1−𝛼 where 𝜈 is determined according
to the theorem above.

Approximate LR–test:

2
𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 𝑎𝑛𝑑 𝑜𝑛𝑙𝑦 𝑖𝑓 − 2ℓ𝑛 𝜆(𝐱) > 𝜒𝜈,1−𝛼 . (2.41)

𝜈 = dim(Ω) − dim(Ω0 ) where dim refers to the number of free parameters under Ω or Ω0 ,
which is the number of parameter that can vary (i.e. is not a fixed value) under Ω or Ω0 .

We look at the following in preparation of section 2.5.1

Binomial distribution: Y~Bin(n,p)


𝑛 𝑛!
𝑓(𝑦|𝑝) = (𝑦) 𝑝 𝑦 (1 − 𝑝)𝑛−𝑦 = 𝑦!(𝑛−𝑦)! 𝑝 𝑦 𝑞 𝑛−𝑦 𝑦 = 0, 1, … , 𝑛 0≤𝑝≤1 𝑞 = 1−𝑝

The experiment has 𝑛 independent trials.


Each trial results in either a success or a failure.
𝑃(𝑠𝑢𝑐𝑐𝑒𝑠𝑠) = 𝑝 and 𝑃(𝑓𝑎𝑖𝑙𝑢𝑟𝑒) = 𝑞 = 1 − 𝑝 in each trial.
𝑦 = total number of successes.
𝑛 − 𝑦 = total number of failures.
Or:

𝑛! 𝑦 𝑦
𝑓(𝑥|𝑝) = 𝑦 𝑝1 1 𝑝2 2 𝑦1 = 0, 1, … , 𝑛 𝑦2 = 𝑛 − 𝑦1 0 ≤ 𝑝1 ≤ 1 𝑝2 = 1 − 𝑝1
1 !𝑦2 !

Or:
𝑛! 𝑦 𝑦
𝑓(𝑥|𝑝) = 𝑦 !𝑦 ! 𝑝1 1 𝑝2 2 𝑦𝑖 = 0, 1, … , 𝑛 ∑2𝑖=1 𝑦𝑖 = 𝑛 0 ≤ 𝑝𝑖 ≤ 1 ∑2𝑖=1 𝑝𝑖 = 1
1 2

The experiment has 𝑛 independent trials. Each trial results in either event 1 or event 2.
𝑃(𝑒𝑣𝑒𝑛𝑡 1) = 𝑝1 and 𝑃(𝑒𝑣𝑒𝑛𝑡 2) = 𝑝2 = 1 − 𝑝1 in each trial.
𝑦1 = total number of times event 1 occurs
and 𝑦2 = 𝑛– 𝑦1 = total number of times event 2 occurs.

Multinomial distribution: (y1,…,yk)~Multinomial(n,p1,…,pk)


𝑛! 𝑦 𝑦 𝑦
𝑓(𝒚|𝒑) = 𝑦 !𝑦 !…𝑦 ! 𝑝1 1 𝑝2 2 … 𝑝𝑘 𝑘 𝑦𝑖 = 0, 1, … , 𝑛 ∑𝑘𝑖=1 𝑦𝑖 = 𝑛 0 ≤ 𝑝𝑖 ≤ 1 ∑𝑘𝑖=1 𝑝𝑖 = 1
1 2 𝑘

The experiment has 𝑛 independent trials. Each trial results in either event 1 or event 2 or … or
event k.
For 𝑖 = 1, … , 𝑘: 𝑃(𝑒𝑣𝑒𝑛𝑡 𝑖) = 𝑝𝑖 in each trial .
For 𝑖 = 1, … , 𝑘: 𝑦𝑖 = total number of times event 𝑖 occurs.

Practical example
This is like looking for the probability that we draw 2 red balls, 3 green balls and 4 yellow balls
out of a bucket that has 10 red balls, 15 green balls and 20 yellow balls, which is:
9! 10 2 15 3 20 4
( ) (10+15+20) (10+15+20) = …
2!3!4! 10+15+20

Note:
If we have the value of the first 𝑘 − 1 𝑝𝑖 ’s we have automatically have the value of
𝑝𝑘 = 1 − ∑𝑘−1
𝑖=1 𝑝𝑖
If we have the value of the first 𝑘 − 1 𝑦𝑖 ’s we have automatically have the value of
𝑦𝑘 = 𝑛 − ∑𝑘−1
𝑖=1 𝑦𝑖

(1 (observation) multinomial experiment with n trials i.e. there is one vector (y1, y2,…,yk)
observation)

𝑛! 𝑦 𝑦 𝑦
𝑓(𝒚|𝒑) = 𝑦 𝑝1 1 𝑝2 2 … 𝑝𝑘 𝑘
!𝑦
1 2 !…𝑦 𝑘 !
𝑛! 𝑦 𝑦 𝑦 𝑦 𝑦 𝑦
𝐿(𝒑|𝒚) = 𝑦 !𝑦 !…𝑦 ! 𝑝1 1 𝑝2 2 … 𝑝𝑘 𝑘 = 𝐶𝑝1 1 𝑝2 2 … 𝑝𝑘 𝑘 (as we have one vector observation)
1 2 𝑘
𝑦 𝑦 𝑦
i.e. 𝐿(𝒑|𝒚) ∝ 𝑝1 1 𝑝2 2 … 𝑝𝑘 𝑘
2.5.1 Goodness–of–fit Tests

Let the possible outcomes of a random experiment be decomposed into 𝑘 mutually exclusive
sets, say 𝐴1 , … , 𝐴𝑘 . Define 𝑝𝑗 = 𝑃[𝐴𝑗 ], 𝑗 = 1,2, … , 𝑘 . In 𝑛 independent repetitions of the
experiment, let 𝑌𝑗 denote the number of outcomes belonging to the set 𝐴𝑗 , so that ∑𝑘𝑗=1 𝑌𝑗 =
𝑛 and ∑𝑘𝑗=1 𝑝𝑗 = 1 . Then 𝑌1 , … , 𝑌𝑘 have a multinomial distribution with parameters
𝑝1 , … , 𝑝𝑘 and 𝑛. This is a very general situation and is used to test hypotheses about 𝑝1 , … , 𝑝𝑘
where the original data can be continuous, discrete or categorical.
The likelihood function for the multinomial distribution is given by
𝑦 𝑦 𝑦
𝐿(𝑝1 , … , 𝑝𝑘 | ) ∝ 𝑝1 1 𝑝2 2 … 𝑝𝑘 𝑘
𝑦 (2.42)
= ∏𝑘𝑗=1 𝑝𝑗 𝑗 .
Let us look at the LR–statistic for some typical null hypotheses.

You only need to memorise and know the highlighted part of the following example and remarks
that follow.

Example 2.12 : The simplest null hypothesis for the multinomial model is:

𝐻0 : 𝑝𝑗 = 𝑝𝑗0 (𝑗 = 1, … , 𝑘) 𝑣𝑒𝑟𝑠𝑢𝑠 𝐻1 : 𝐻0 𝑛𝑜𝑡 𝑡𝑟𝑢𝑒 .


Notice that under 𝐻0 all the parameters are completely specified so that Ω0 is a single point
(Dim = 0). Further,

Ω = {(𝑝1 , … , 𝑝𝑘 ): 0 ≤ 𝑝𝑗 ≤ 1 , ∑𝑘𝑗=1 𝑝𝑗 = 1} ,
with dimension = 𝑘 − 1. Thus the number of free parameters in Ω is 𝑘 − 1 and the number
of free parameters under 𝐻0 is zero. We shall not derive the ML estimators of 𝑝1 , … , 𝑝𝑘
formally, but it is logical that
𝑦
𝑝̂𝑗 = 𝑛𝑗 , 𝑗 = 1, … , 𝑘 . (2.43)
𝑘
𝑦
𝑆𝑜 sup 𝐿(𝑝1 , … , 𝑝𝑘 | ) ∝ ∏ 𝑝𝑗0𝑗
Ω0
𝑗=1
𝑘
𝑦
𝑎𝑛𝑑 sup 𝐿(𝑝1 , … , 𝑝𝑘 | ) ∝ ∏ 𝑝̂𝑗 𝑗
Ω
𝑗=1
𝑘
𝑦𝑗 𝑦𝑗
= ∏ ( ) .
𝑛
𝑗=1

It then follows that


𝑦𝑗
𝑛𝑝𝑗0
𝜆(𝒚) = ∏𝑘𝑗=1 ( ) . (2.44)
𝑦𝑗

The distribution of 𝜆(𝒚) is intractable, so we use the asymptotic result in (2.41), namely that
𝑦
𝑄𝜈 = −2ℓ𝑛𝜆(𝒚) = 2 ∑𝑘𝑗=1 𝑦𝑗 ℓ𝑛 (𝑛𝑝𝑗 ) has a chi–squared distribution with 𝜈 = dim(Ω) −
𝑗0
dim(Ω0 ) = 𝑘 − 1 degrees of freedom.

So approximate LR–test:
𝑦
𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 𝑄𝑘−1 = 2 ∑𝑘𝑗=1 𝑦𝑗 ℓ𝑛 (𝑛𝑝𝑗 ) > 𝜒𝑘−1,1−𝛼
2
. (2.45)
𝑗0

Note that the degrees of freedom = number of free parameters in Ω – number of free
parameters in Ω0
= (k – 1) – 0
=k–1

Remarks

1. If we write

𝑂𝑗 = 𝑦𝑗 = observed frequencies in set 𝐴𝑗 ,


𝐸𝑗 = 𝑛𝑝𝑗0 = expected frequencies in 𝐴𝑗 under 𝐻0 ,

then the left–hand side of (2.45) becomes

Q_{k−1} = 2 Σ_{j=1}^{k} O_j ℓn(O_j/E_j).        (2.46)

This is a convenient form and is the basic form of 𝑄 for all tests of a multinomial distribution.

Use the Qk-1 given above in 2.46.


ONLY IF I ask you to use the Pearson statistic then use the Qk-1 given below in (b) in 2.47:

2. Another commonly used form of 𝑄 is the Pearson statistic. A Taylor series expansion of 𝑄
about 𝐸𝑗 is:

Σ_{j=1}^{k} O_j ℓn(O_j/E_j) = Σ_{j=1}^{k} [(O_j − E_j) + (O_j − E_j)²/(2E_j) + ⋯].
Since ΣO_j = ΣE_j = n, the first term in the expansion is zero. A second order approximation is then

Q_{k−1} ≈ Σ_{j=1}^{k} (O_j − E_j)²/E_j ~̇ χ²_{k−1}.        (2.47)

This approximation is only reasonable if the 𝐸𝑗 ’s are not too small. A rule of thumb is that all
𝐸𝑗 ’s should be larger or equal to five, 𝐸𝑗 ≥ 5 , 𝑗 = 1, … , 𝑘.
Class example (using the result of example 2.12):
Suppose we want to test the following hypotheses:

𝐻0 : 𝑝1 = 0.3, 𝑝2 = 0.5, 𝑝3 = 0.2 versus 𝐻1 : 𝐻0 𝑖𝑠 𝑛𝑜𝑡 𝑡𝑟𝑢𝑒

where pi = P(event i) and we observe 28 occurrences of event 1, 53 occurrences of event 2 and 19 occurrences of event 3.

Solution:
𝐻0 : 𝑝1 = 0.3, 𝑝2 = 0.5, 𝑝3 = 0.2 versus 𝐻1 : 𝐻0 𝑖𝑠 𝑛𝑜𝑡 𝑡𝑟𝑢𝑒
2
Reject 𝐻0 if 𝑄𝑘−1 > 𝜒𝑘−1,1−𝛼
(O1 = 28, O2 = 53, O3 = 19. n = 28 + 53 + 19 = 100. Ej = npj = 100pj. Thus E1 = 30, E2 = 50, E3 = 20.)
Q_{k−1} = 2 Σ_{j=1}^{k} O_j ℓn(O_j/E_j) = 2[28 ℓn(28/30) + 53 ℓn(53/50) + 19 ℓn(19/20)] = 0.3638
χ²_{k−1,1−α} = χ²_{3−1,1−0.05} = χ²_{2,0.95} = 5.99146

0.3638 ≯ 5.99146
Do not reject H0
At a 95% confidence level we could not conclude that “𝑝1 = 0.3, 𝑝2 = 0.5, 𝑝3 = 0.2” is false.
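The class example can be reproduced with a short Python sketch (numpy/scipy assumed):

import numpy as np
from scipy.stats import chi2

# Goodness-of-fit LR statistic (2.46) for the class example.
O = np.array([28, 53, 19])          # observed counts
p0 = np.array([0.3, 0.5, 0.2])      # probabilities under H0
alpha = 0.05

E = O.sum() * p0                    # expected counts under H0: E_j = n * p_j0
Q = 2 * np.sum(O * np.log(O / E))
crit = chi2.ppf(1 - alpha, df=len(O) - 1)
print(round(Q, 4), round(crit, 4), Q > crit)   # Q approx 0.36 < 5.99, so do not reject H0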

Example 2.13 : In the previous example the null hypothesis was simple. Let us look at a
composite null hypothesis.
In certain genetics problems, each individual in a given population must have one of three
possible genotypes, and it is assumed that the probabilities 𝑝1 , 𝑝2 and 𝑝3 of the three
genotypes can be represented in the following form:

𝑝1 = 𝜃 2 , 𝑝2 = 2𝜃(1 − 𝜃) , 𝑝3 = (1 − 𝜃)2 .

Here the parameter 𝜃 is unknown and lies in the interval 0 < 𝜃 < 1. For any 𝜃 in this interval,
𝑝𝑗 > 0 and ∑3𝑗=1 𝑝𝑗 = 1. A sample of 𝑛 is taken from the population and the number of each
genotype, 𝑦1 , 𝑦2 and 𝑦3 observed. We want to test whether it is reasonable to assume that
the probabilities have the form given above for some value of 𝜃.
𝑦 𝑦 𝑦
1. Likelihood function : 𝐿(𝑝1 , 𝑝2 , 𝑝3 | ) ∝ 𝑝1 1 𝑝2 2 𝑝3 3 .

2. 𝐻0 : 𝑝1 = 𝜃 2 , 𝑝2 = 2𝜃(1 − 𝜃) , 𝑝3 = (1 − 𝜃)2 .
Ω = {(𝑝1 , 𝑝2 , 𝑝3 ): 𝑝𝑗 > 0 , Σ𝑝𝑗 = 1} . (dim = k − 1 = 3 − 1 = 2)
Ω0 = {𝜃: 0 < 𝜃 < 1} (dim = 1) .
𝑦
3. In Ω the ML estimators are 𝑝̂𝑗 = 𝑛𝑗 , 𝑗 = 1,2,3,
In Ω0 we must determine the ML estimator of 𝜃.
Under 𝐻0 ,

ℓ𝑛 𝐿(𝑝1 , 𝑝2 , 𝑝3 | ) = 𝑐 + 𝑦1 ℓ𝑛 𝜃 2 + 𝑦2 ℓ𝑛(2𝜃(1 − 𝜃)) + 𝑦3 ℓ𝑛(1 − 𝜃)2


= 𝑐 + 2𝑦1 ℓ𝑛 𝜃 + 𝑦2 ℓ𝑛2 + 𝑦2 ℓ𝑛 𝜃 + 𝑦2 ℓ𝑛(1 − 𝜃) + 2𝑦3 ℓ𝑛(1 − 𝜃) .

𝜕ℓ𝑛 𝐿 2𝑦1 𝑦2 𝑦2 2𝑦
= + − 1−𝜃 − 1−𝜃3 .
𝜕𝜃 𝜃 𝜃
Set equal to zero, then
1 1
̂ (2𝑦1
𝜃
+ 𝑦2 ) = ̂ (𝑦2
1−𝜃
+ 2𝑦3 )
𝑎𝑛𝑑 (2.48)
2𝑦1 +𝑦2 2𝑦1 +𝑦2
𝜃̂ = = .
2𝑦1 +2𝑦2 +2𝑦3 2𝑛

4. So
3
𝑦𝑗 𝑦𝑗
sup 𝐿(𝑝1 , 𝑝2 , 𝑝3 |𝒙) ∝ ∏( )
Ω 𝑛
𝑗=1
𝑎𝑛𝑑
sup𝐿(𝜃|𝒙) ∝ 𝜃̂ 2𝑦1 [2𝜃̂ (1 − 𝜃̂)]𝑦2 (1 − 𝜃̂)2𝑦3
Ω0
𝑠𝑜 𝑡ℎ𝑎𝑡
𝑦1 𝑦2 𝑦3
𝑛𝜃̂ 2 𝑛2𝜃̂(1 − 𝜃̂) 𝑛(1 − 𝜃̂)2
𝜆(𝒙) = ( ) ( ) ( )
𝑦1 𝑦2 𝑦3
𝑎𝑛𝑑
𝑦1 𝑦2 𝑦3
−2ℓ𝑛 𝜆(𝒙) = 2𝑦1 ℓ𝑛 ( 2 ) + 2𝑦2 ℓ𝑛 ( ) + 2𝑦3 ℓ𝑛 ( )
𝑛𝜃̂ ̂ ̂
𝑛2𝜃(1 − 𝜃) 𝑛(1 − 𝜃̂)2
𝑖. 𝑒.
3
𝑂𝑗
𝑄1 = 2 ∑ 𝑂𝑗 ℓ𝑛 ( )
𝐸𝑗
𝑗=1
with degrees of freedom = 1.

2
5. Reject 𝐻0 if 𝑄1 > 𝜒1,1−𝛼 . (2.48)

In general for this type of example (H0 has pi’s being a function of 𝜃), our rejection rule is:
2
Reject 𝐻0 if 𝑄k−2 > 𝜒k−2,1−𝛼 .

See remarks after previous example – you can be asked Pearson’s Statistic for this case as well.

2.5.2 Contingency Tables

In a two–way contingency table we suppose that 𝑛 individuals or items are classified according
to criteria 𝐴 and 𝐵, that there are 𝑟 classifications in 𝐴 and 𝑐 classifications in 𝐵, and that
the number of individuals belonging to 𝐴𝑖 and 𝐵𝑗 is 𝑁𝑖𝑗 . We then have a 𝑟 × 𝑐 contingency
table with cell frequencies 𝑁𝑖𝑗 and ∑𝑟𝑖=1 ∑𝑐𝑗=1 𝑁𝑖𝑗 = 𝑛.

We shall denote the row totals by 𝑁𝑖 . and the column totals by 𝑁 .𝑗 , that is,

𝑁𝑖 . = ∑𝑐𝑗=1 𝑁𝑖𝑗 𝑎𝑛𝑑 𝑁 .𝑗 = ∑𝑟𝑖=1 𝑁𝑖𝑗 .

𝐵1 𝐵2 ⋯ 𝐵𝑐
𝐴1 𝑁11 𝑁12 ⋯ 𝑁1𝑐 𝑁1 .
𝐴2 𝑁21 𝑁22 ⋯ 𝑁2𝑐 𝑁2 .
⋮ ⋮ ⋮ ⋮ ⋮
𝐴𝑟 𝑁𝑟1 𝑁𝑟2 ⋯ 𝑁𝑟𝑐 𝑁𝑟 .
𝑁 .1 𝑁 .2 ⋯ 𝑁 .𝑐 𝑛

The 𝑛 individuals can again be regarded as a sample from a multinomial distribution with
probabilities 𝑝𝑖𝑗 , 𝑖 = 1, … , 𝑟 , 𝑗 = 1, … , 𝑐, for the 𝑟 × 𝑐 cells.

1. The likelihood function is then

𝑁
𝐿(𝑝𝑖𝑗 , 𝑖 = 1, … , 𝑟 , 𝑗 = 1, … , 𝑐|𝑁𝑖𝑗 ) ∝ ∏𝑟𝑖=1 ∏𝑐𝑗=1 𝑝𝑖𝑗𝑖𝑗 . (2.49)

2. The null hypothesis is now the independence of criteria 𝐴 and 𝐵. This is different from the
null hypothesis in goodness–of–fit tests. We want to determine whether 𝐴𝑖 is independent of
𝐵𝑗 for all 𝑖, 𝑗. That is, if 𝑃[𝐴𝑖 𝑎𝑛𝑑 𝐵𝑗 ] = 𝑃[𝐴𝑖 ]𝑃[𝐵𝑗 ]. So if we write 𝑃[𝐴𝑖 𝑎𝑛𝑑 𝐵𝑗 ] =
𝑝𝑖𝑗 , 𝑃[𝐴𝑖 ] = 𝑝𝑖 . and 𝑃[𝐵𝑗 ] = 𝑝 .𝑗 , then the null hypothesis is:

𝐻0 : 𝑝𝑖𝑗 = 𝑝𝑖 . 𝑝 .𝑗 ; 𝑖 = 1, … , 𝑟 , 𝑗 = 1, … , 𝑐 .

Note: The hypotheses for this situation are the same as:
• H0: the two categories are independent versus H1: the two categories are dependent
• OR H0: there is not a relationship between the two categories versus H 1: there is a
relationship between the two categories
• OR H0: there is no interaction between the two categories versus H1: there is
interaction between the two categories

When 𝐻0 is not true, there is interaction between the two criteria of classification.
Now
Ω = {(𝑝11 , … , 𝑝𝑟𝑐 ): 0 ≤ 𝑝𝑖𝑗 ≤ 1 , ΣΣ𝑝𝑖𝑗 = 1} (dim = 𝑟𝑐 − 1)
Ω0 = {(𝑝1 . , … , 𝑝𝑟 . , 𝑝 .1 , … , 𝑝 .𝑐 ): 0 ≤ 𝑝𝑖 . , 𝑝 .𝑗 ≤ 1, ∑𝑖 𝑝𝑖 . = 1, ∑𝑗 𝑝 .𝑗 = 1}
(dim = 𝑟 − 1 + 𝑐 − 1)
𝑁𝑖𝑗
3. In Ω the ML estimators of 𝑝𝑖𝑗 are 𝑝̂𝑖𝑗 = 𝑛 .
In Ω0 the ML estimators for 𝑝𝑖 . and 𝑝 .𝑗 are
𝑁𝑖 . 𝑁 .𝑗
𝑝̂ 𝑖 . = , 𝑖 = 1, … , 𝑟 , 𝑎𝑛𝑑 𝑝̂ .𝑗 = , 𝑗 = 1, … , 𝑐 .
𝑛 𝑛

4.
𝑁 𝑁 𝑁𝑖𝑗
sup𝐿(𝑝𝑖𝑗 |𝑁𝑖𝑗 ) ∝ ∏𝑟𝑖=1 ∏𝑐𝑗=1 𝑝̂𝑖𝑗𝑖𝑗 = ∏𝑟𝑖=1 ∏𝑐𝑗=1 ( 𝑖𝑗) ,
Ω 𝑛
𝑁 𝑁𝑖𝑗 𝑁 .𝑗 𝑁𝑖𝑗
sup𝐿(𝑝𝑖 . , 𝑝 .𝑗 |𝑁𝑖𝑗 ) ∝ ∏𝑟𝑖=1 ∏𝑐𝑖=1 (𝑝̂ 𝑖 . 𝑝̂ .𝑗 )𝑁𝑖𝑗 = ∏𝑟𝑖=1 ∏𝑐𝑗=1 ( 𝑖 . ) ( ) .
Ω0 𝑛 𝑛

Then
𝑁 𝑁 .𝑗
(∏𝑟𝑖=1 𝑁𝑖 . 𝑖 . )(∏𝑐𝑗=1 𝑁 .𝑗 )
𝜆(𝑁𝑖𝑗 ) = 𝑁𝑖𝑗 (2.50)
𝑛𝑛 ∏𝑟𝑖=1 ∏𝑐𝑗=1 𝑁𝑖𝑗

𝑟 ∑𝑐𝑗 𝑁𝑖𝑗
since ∏𝑟𝑖 ∏𝑐𝑗 𝑛𝑁𝑖𝑗 = 𝑛∑𝑖 = 𝑛𝑛
and
𝑁 ∑𝑐 𝑁𝑖𝑗 𝑁
∏𝑟𝑖 ∏𝑐𝑗 𝑁𝑖 . 𝑖𝑗 = ∏𝑟𝑖 𝑁𝑖 . 𝑗 = ∏𝑟𝑖 𝑁𝑖 . 𝑖 . . (2.51)
Further,
−2 ℓn λ(N_ij) = 2 Σᵢ Σⱼ N_ij ℓn[ N_ij / (N_i. N_.j / n) ]

i.e. Q = 2 Σᵢ Σⱼ O_ij ℓn(O_ij/E_ij) (Prove).

The degrees of freedom is then no. of free parameters in Ω – no. of free parameters in Ω0
= (𝑟𝑐 − 1) − (𝑟 − 1 + 𝑐 − 1)
= (𝑟 − 1)(𝑐 − 1) .

5. So we reject H0 (independence) if Q > χ²_{(r−1)(c−1); 1−α}.

Remark : The Pearson statistic for testing the above hypothesis would again be similar to (2.47),
namely

Q′ = Σᵢ Σⱼ (O_ij − E_ij)²/E_ij ~̇ χ²_{(r−1)(c−1)}.        (2.52)

As before, use the Q given just under 2.51, and you use the Q’ above in equation 2.52 ONLY IF I
ask you to use the Pearson statistic
Example 2.14 : A thousand individuals were classified according to sex and according to
whether or not they were colour–blind as follows

(Oij) Male Female


Not Colour–blind 442 514 956
Colour–blind 38 6 44
480 520 1000

Test whether the two categories are independent at the 1%.

Solution:
𝐻0 Colour–blindness is independent of gender.
𝐻1 Colour–blindness is dependent of gender.

𝑁𝑖 . 𝑁 .𝑗
The table of expected frequencies, 𝐸𝑖𝑗 = , is as follows:
𝑛

Eij Male Female


Not Colour–blind 956×480 956×520 956
1000 1000
= 458.88 = 497.12
44×480 44×520
Colour–blind 44
1000 1000
= 21.12 = 22.88
480 520 1000

Oij ln(Oij/Eij) yields

442 ln(442/458.88) 514 ln(514/497.12)


= -16.5657 = 17.1634
22.3199 -8.03102

From (2.51) follows that

Q = 2 Σᵢ Σⱼ O_ij ℓn(O_ij/E_ij)
Q = 2(−16.5657 + 17.1635 + 22.3199 − 8.0310)
  = 29.7732.
The 1% critical value is χ²_{(r−1)(c−1); 1−0.01} = χ²_{(2−1)(2−1); 1−0.01} = χ²_{1;0.99} = 6.635.

We reject H0 if Q > χ²_{(r−1)(c−1); 1−α}
29.7732 > 6.635
Reject H0
At a 1% significance level we have sufficient evidence to say that there is a relationship between
colour-blindness and gender
OR At a 1% significance level we have sufficient evidence to say that there is interaction between
colour-blindness and gender
OR At a 1% significance level we have sufficient evidence to say that colour-blindness and gender
are dependent
OR At a 1% significance level we reject the hypothesis that colour-blindness and gender are
independent

(The Pearson statistic (2.52) is calculated as 𝑄′ = 27.1387, leading to the same conclusion.)
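Example 2.14 can be reproduced with a short Python sketch (numpy/scipy assumed):

import numpy as np
from scipy.stats import chi2

# Contingency-table test: Q = 2 * sum O_ij ln(O_ij/E_ij) with E_ij = (row total)(col total)/n.
O = np.array([[442, 514],
              [ 38,   6]])                       # observed frequencies
alpha = 0.01

n = O.sum()
E = np.outer(O.sum(axis=1), O.sum(axis=0)) / n   # expected frequencies under independence
Q = 2 * np.sum(O * np.log(O / E))
r, c = O.shape
crit = chi2.ppf(1 - alpha, df=(r - 1) * (c - 1))
print(round(Q, 4), round(crit, 3), Q > crit)     # approx 29.77 > 6.635, so reject H0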

2.6 Approximate Large–Sample Tests


(To make the work easier, the work is explained differently here than the original notes)

Another common method of constructing a large–sample test statistic is based on an estimator


that has an asymptotic normal distribution. Suppose we wish to test a hypothesis about
parameter 𝜇 𝑜𝑟 𝐸(𝑋). 𝑋 is the estimator of 𝜇, based on a sample of size n, that has been
derived by the maximum likelihood method or method of moments. We know that by the central
limit theorem:

Z_n = (X̄ − μ)/(σ/√n) = (X̄ − μ)/σ_X̄ = (μ̂ − μ)/σ_μ̂ ~̇ N(0,1)
(for n large)

The 3 components are: (1) 𝜇, which gets estimated by (2) 𝜇̂ , which has standard deviation (3)
𝜎𝜇̂ .
We replace the 𝜇 with a 𝜇0 as we work with supremum under 𝐻0 in our definition of 𝛼.
If the data is normally distributed, the ~́ becomes ~.
Remember ~́ means “is approximately distributed as”, while ~ means “is distributed as”.
Thus our LR-test results (using the results of example 2.3) will be:
a) H0: μ = μ0 vs H1: μ ≠ μ0: Reject H0 if x̄ ≤ μ0 − z_{1−α/2} σ/√n or x̄ ≥ μ0 + z_{1−α/2} σ/√n
b) H0: μ ≤ μ0 vs H1: μ > μ0: Reject H0 if x̄ ≥ μ0 + z_{1−α} σ/√n
c) H0: μ ≥ μ0 vs H1: μ < μ0: Reject H0 if x̄ ≤ μ0 − z_{1−α} σ/√n
Or alternatively
a) H0: μ = μ0 vs H1: μ ≠ μ0: Reject H0 if (x̄−μ0)/σ_X̄ ≤ −z_{1−α/2} or (x̄−μ0)/σ_X̄ ≥ z_{1−α/2}
   (or use just |(x̄−μ0)/σ_X̄| ≥ z_{1−α/2})
b) H0: μ ≤ μ0 vs H1: μ > μ0: Reject H0 if (x̄−μ0)/σ_X̄ ≥ z_{1−α}
c) H0: μ ≥ μ0 vs H1: μ < μ0: Reject H0 if (x̄−μ0)/σ_X̄ ≤ −z_{1−α}, where σ_X̄ = σ/√n
Or alternatively
a) H0: μ = μ0 vs H1: μ ≠ μ0: Reject H0 if |Z_n| ≥ z_{1−α/2}
b) H0: μ ≤ μ0 vs H1: μ > μ0: Reject H0 if Z_n ≥ z_{1−α}
c) H0: μ ≥ μ0 vs H1: μ < μ0: Reject H0 if Z_n ≤ −z_{1−α}
where Z_n = (μ̂ − μ0)/σ_μ̂ = (x̄ − μ0)/σ_X̄ and σ_μ̂ = σ_X̄ = σ/√n

We thus use these results for our approximate tests (i.e. even when the data is not normally
distributed.) If the 𝜎𝜇̂ or 𝜎 contains unknown parameter(s), we apply Slutsky’s theorem and
̂ −𝜇
𝜇
thus use ~́𝑁(0,1)
𝑠𝜇
̂

Slutsky's theorem states that if

(W_n − θ)/σ_n ∼ N(0,1), then
(W_n − θ)/S_n ~̇ N(0,1) if σ_n/S_n → 1 in probability when n → ∞.

Notes:

1. For example, if we deal with the Bernoulli distribution (see example 2.15), then:
𝜇 = 𝐸(𝑋) = 𝑝

1 1
and is estimated by: 𝜇̂ = 𝑋 = ∑ 𝑋𝑖 = (𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 𝑜𝑓 𝑠𝑢𝑐𝑐𝑒𝑠𝑠)
𝑛 𝑛
= 𝑟𝑒𝑙𝑎𝑡𝑖𝑣𝑒 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 𝑜𝑓 𝑠𝑢𝑐𝑐𝑒𝑠𝑠 = 𝑝̂

𝜎 √𝑉(𝑋) √𝑝(1−𝑝)
with standard deviation 𝜎𝜇̂ = 𝜎𝑝̂ = 𝜎𝑋 = = =
√𝑛 √𝑛 √𝑛

2. If we deal with 2 populations: (2 Samples: X1,…,Xn and Y1,…,Ym) Let D = X – Y


𝜇 = 𝜇𝐷 = 𝜇𝑋−𝑌 = 𝐸(𝑋 − 𝑌) = 𝐸(𝑋) − 𝐸(𝑌) = 𝜇𝑋 − 𝜇𝑌

and is estimated by: 𝐷 = 𝑋 − 𝑌

with standard deviation:


𝑉(𝑋) 𝑉(𝑌) 𝜎2 𝜎𝑌2
𝜎𝜇̂ = 𝜎𝐷 = 𝜎𝑋−𝑌 = √𝑉(𝑋 − 𝑌) = √𝑉(𝑋) + 𝑉(𝑌) = √ + = √ 𝑛𝑋 +
𝑛 𝑚 𝑚

3. Dealing with the Poisson distribution (1 population), and dealing with 2 Bernoulli
populations, are tutorial questions.

*** If θ̂ is the MLE of θ, then g(θ̂) is the MLE of g(θ) (Invariance property of MLE’s)
e.g. If θ̂ is the MLE of θ, then θ̂2 is the MLE of θ2, and sin(θ̂) is the MLE of sin(θ).
Example 2.15 :
Let 𝑋1 , … , 𝑋𝑛 be a random sample from a Ber( 𝑝 ) population. Consider testing 𝐻0 : 𝑝 ≤ 𝑝0
versus 𝐻1 : 𝑝 > 𝑝0 .

𝑝̂−𝑝 𝑝̂−𝑝0
𝑍𝑛 = ~́𝑁(0,1), and in our definition of α: 𝑍𝑛 becomes
𝜎𝑝
̂ 𝜎𝑝
̂
𝑝̂−𝑝0
Thus for our approximate test we have case (b), we reject H0 if 𝑍𝑛 ≥ 𝑧1−𝛼 i.e. ≥ 𝑧1−𝛼
𝜎𝑝
̂

𝜎 √𝑉(𝑋) √𝑝(1−𝑝) 𝑝(1−𝑝) 𝑝̂(1−𝑝̂)


where 𝜎𝑝̂ = 𝜎𝑋 = = = =√ which we estimate by 𝑠𝑝̂ = √
√ 𝑛 √𝑛 √𝑛 𝑛 𝑛
because ***
𝑝̂−𝑝0
Thus we reject H0 if ≥ 𝑧1−𝛼
̂ (1−𝑝
𝑝 ̂)

𝑛

Let 𝑋1 , … , 𝑋𝑛 be a random sample from a Ber(𝑝) population. If we are testing the two–sided
hypothesis 𝐻0 : 𝑝 = 𝑝0 versus 𝐻1 : 𝑝 ≠ 𝑝0 :

𝑝̂−𝑝 𝑝̂−𝑝0
𝑍𝑛 = ~́𝑁(0,1), and in our definition of α: 𝑍𝑛 becomes
𝜎𝑝
̂ 𝜎𝑝
̂
𝑝̂−𝑝0
For our approximate test we have case (a), we reject H0 if |𝑍𝑛 | ≥ 𝑧1−𝛼/2 i.e. | | ≥ 𝑧1−𝛼/2
𝜎𝑝
̂

𝜎 √𝑉(𝑋) √𝑝(1−𝑝) 𝑝(1−𝑝)


where 𝜎𝑝̂ = 𝜎𝑋 = = = =√
√𝑛 √𝑛 √𝑛 𝑛
𝑝̂(1−𝑝̂)
because 𝑝 = 𝑝0 under 𝐻0 , instead of estimating this with 𝑠𝑝̂ = √ as above, we rather
𝑛
𝑝0 (1−𝑝0 )
use √ 𝑛
𝑝̂−𝑝0
Thus we reject H0 if | | ≥ 𝑧1−𝛼/2
√𝑝0 (1−𝑝0 )/𝑛

Class example:

At the beginning of last year, the entry requirements for a certain module were lowered. In that
year, 18 students failed the module and there were 60 students in the course. Test whether 20%
or less of students still fail the module, or whether more than 20% of students now fail the
module.

Solution:

We are clearly dealing with 60 Bernoulli trials (trials resulting in a success or failure).
Xi = 1 if student i failed and 0 otherwise (i = 1,…,60).
H0: p ≤ 0.2 versus H1: p > 0.2
𝑝̂−𝑝0
We reject H0 if ≥ 𝑧1−𝛼 (from the 1st half of example 2.15)
√𝑝̂(1−𝑝̂)/𝑛
18
𝑝̂−𝑝0 −0.2
60
= = 1.6903 𝑧1−𝛼 = 𝑧0.95 = 1.644854
√𝑝̂(1−𝑝̂)/𝑛 18 18
√ (1− )/60
60 60
1.6903 ≥ 1.644854
Reject H0
At a 5% significance level, we have sufficient evidence to say that more than 20% of students now
fail the module.

Note: Whenever doing any practical (data) hypothesis test question (where data is given) in this
module: a) state the hypotheses, b) state the rejection rule, c) calculate the lhs of the rejection
rule (called the test statistic if the rhs is just a table value (or minus table value)), d) calculate the
rhs of the rejection rule (if it is just a table value it is known as the critical value), e) say whether
we reject H0 or not, and f) give a conclusion.
(Even if marks were not awarded for some of these parts in tutorials, they may be awarded marks
in tests/exams.)

EXERCISES 2

1. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ). Find the LR–test of size 𝛼 for 𝐻0 : 𝜎 2 = 𝜎02 versus 𝐻1 : 𝜎 2 ≠ 𝜎02
when (a) 𝜇 is known, (b) 𝜇 is unknown.

2. Consider paired samples (𝑋1 , 𝑌1 ), (𝑋2 , 𝑌2 ), … (𝑋𝑛 , 𝑌𝑛 ) from two normal distributions where
𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇1 , 𝜎12 ) and 𝑌1 , … , 𝑌𝑛 ∼ 𝑁(𝜇2 , 𝜎22 ) and 𝑋 and 𝑌 are not independent with
Cov (𝑋, 𝑌) = 𝜌𝜎1 𝜎2 . Derive a test of size 𝛼 for 𝐻0 : 𝜇1 − 𝜇2 = 0 versus 𝐻1 : 𝜇1 − 𝜇2 ≠ 0
where 𝜎12 and 𝜎22 are unknown. (Hint: Work with the distribution of 𝑋 − 𝑌).

3. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇1 , 𝜎12 ) independently from 𝑌1 , … , 𝑌𝑚 ∼ 𝑁(𝜇2 , 𝜎22 ). Derive the LR–test
𝜎2 𝜎2
of size 𝛼 for 𝐻0 : 𝜎12 = 𝜆0 versus 𝐻1 : 𝜎12 ≠ 𝜆0 where 𝜇1 and 𝜇2 are unknown.
2 2

4. Consider 𝑘 normal populations where 𝑋𝑖 ∼ 𝑁(𝜇𝑖 , 𝜎 2 ) , 𝑖 = 1, … , 𝑘 , independently. A


sample of size 𝑛 is drawn from each population and 𝜎 2 is unknown. Find the LR–test of size 𝛼
for testing 𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘 (= 𝜇0 ) versus 𝐻1 : Not all means are equal.

5. Using the asymptotic distribution of the LR–statistic, find an approximate test for the
hypothesis 𝐻0 : 𝜎12 = 𝜎22 = ⋯ = 𝜎𝑘2 (= 𝜎02 ) versus 𝐻1 : Not all variances are equal, where you
have 𝑘 independent samples of size 𝑛 each from normal distributions with unknown means.
6. A random sample, 𝑋1 , … , 𝑋𝑛 , is drawn from a Pareto population with pdf

𝜃𝜈 𝜃
𝑓(𝑥|𝜃, 𝜈) = , 𝜈≤𝑥<∞
𝑥 𝜃+1
, 𝜃 > 0, 𝜈 > 0 .

(a) Find the ML estimators of 𝜃 and 𝜈.


(b) Show that the LR–test of 𝐻0 : 𝜃 = 1 versus 𝐻1 : 𝜃 ≠ 1 has a critical region of the form R =
{𝒙 ∶ 𝑇(𝒙) ≤ 𝑐1 or 𝑇(𝒙) ≥ 𝑐2 }, where 0 < 𝑐1 < 𝑐2 and

𝑖 𝑖 ∏𝑛 𝑋
𝑇(𝑿) = ℓ𝑛 [(min𝑋 ].
)𝑛 𝑖

7. Suppose 𝑋1 , … , 𝑋𝑛 ∼ Exp (𝜃) independent of 𝑌1 , … , 𝑌𝑚 ∼ Exp (𝜇).

(a) Find the LR–test of 𝐻0 : 𝜃 = 𝜇 versus 𝐻1 : 𝜃 ≠ 𝜇.


Σ𝑋𝑖
(b) Show that the test is based on the statistic 𝑇 = Σ𝑋 +Σ𝑌 .
𝑖 𝑗
(c) What is the distribution of 𝑇 under 𝐻0 ?

8. Find the LR–test of 𝐻0 : 𝜃 = 0 versus 𝐻1 : 𝜃 ≠ 0 based on a sample of 𝑛 from a population


with pdf

1
𝑓(𝑥|𝜃, 𝜆) = 𝜆 𝑒 −(𝑥−𝜃)/𝜆 , 𝜃 < 𝑥 < ∞ ,
where 𝜆 is unknown.

9. Let 𝑋1 , … , 𝑋𝑛 be a sample from a population with pdf

2𝜃2
𝑓(𝑥|𝜃) = ,𝜃 < 𝑥 < ∞ .
𝑥3

Derive the LR–test of size 𝛼 for the hypothesis 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 . Also find the
power function of this test.

10. Let 𝑋1 , … , 𝑋𝑛 be a sample from a population with pdf

𝑓(𝑥|𝛽) = 𝛽 2 𝑥 𝑒 −𝛽𝑥 , 0 < 𝑥 < ∞.

Find a UMP test of size 𝛼 for the hypothesis 𝐻0 : 𝛽 ≥ 𝛽0 versus 𝐻1 : 𝛽 < 𝛽0.

11. Let 𝑋1 , … , 𝑋𝑛 be a sample from a population with pdf

𝑓(𝑥|𝜃) = 𝜃𝑥 𝜃−1 , 0 < 𝑥 < 1 .

Find a UMP test of size 𝛼 for the hypothesis 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 .


12. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) with 𝜇 known. Find the UMP test of size 𝛼 for the hypothesis
𝐻0 : 𝜎 2 ≥ 𝜎02 versus 𝐻1 : 𝜎 2 < 𝜎02 .

13. Let 𝑋1 , … , 𝑋𝑛 ∼ Poisson (𝜆).

(a) Find the UMP test for 𝐻0 : 𝜆 ≤ 𝜆0 versus 𝐻1 : 𝜆 > 𝜆0 .

(b) Consider the case where 𝑛 is large and derive an approximate test for the hypotheses in
(a).

(c) Assume in (b) that 𝐻0 : 𝜆 ≤ 1 versus 𝐻1 : 𝜆 > 1. Determine the sample size 𝑛 so that the
size of the test is approximately 0,05 and 𝑃[Reject 𝐻0 |𝜆 = 2] = 0,9.

14. Let 𝑋1 , … , 𝑋𝑛 ∼ Ber (𝑝1 ) independent of 𝑌1 , … , 𝑌𝑚 ∼ Ber (𝑝2 ). Derive an approximate


test of size 𝛼 for the hypothesis 𝐻0 : 𝑝1 = 𝑝2 versus 𝐻2 : 𝑝1 ≠ 𝑝2 .

15. Suppose 𝑔(𝑡|𝜃) = ℎ(𝑡)𝑐(𝜃)𝑒 𝑤(𝜃)𝑡 is a one–parameter exponential family for the random
variable 𝑇. Show that this family has a monotone likelihood ratio (MLR) in 𝑡 if 𝑤(𝜃) is an
increasing function of 𝜃. What would then be the form of the critical region for the test 𝐻0 : 𝜃 ≤
𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 . Give three examples of such a family.

16. Given the sample (-0.2 -0.9 -0.6 0.1) from a normal population with unit variance, test the
assumption that the population mean is greater than zero at the 5% level.

17.
(a) Given the sample (-4.4 4.0 2.0 -4.8) from a normal population with variance 4 and the sample
(6.0 1.0 3.2 -0.4) from a normal population with variance 5, test at the 5% level that the means
differ by no more than one unit.
(b) Test the hypothesis that the two samples in (a) came from populations with the same
variance. Use 𝛼 = 0,05.
(c) Test the hypothesis in (a) if you assume that the two variances are equal but unknown.

18. A metallurgist made four determinations of the melting point of manganese : 1269, 1271,
1263 and 1265 degrees centigrade. Test the hypothesis that the mean 𝜇 is equal to the
published value of 1260. Use 𝛼 = 0,05.

19. The metallurgist in Problem 18 decided that his measurements should have a standard
deviation of 2 degrees or less. Are the data consistent with this supposition at the 5% level?
20. Two drugs for high blood pressure, A and B, must be compared. Ten patients are treated
with drug A and the decrease in blood pressure measured. Later the same 10 patients are treated
with drug B and blood pressure measured. The paired results are as follows :

Decrease in blood pressure


Patient 1 2 3 4 5 6 7 8 9 10
Drug A 10 8 0 4 6 1 7 9 11 -1
Drug B 6 4 2 4 2 -2 4 7 7 1

(a) Test the null hypothesis that there is no difference in the effects of the two drugs.
(b) Assume that 20 patients were randomly assigned to the two drugs. Test now the hypothesis
in (a).

21. To compare two diets, 20 people are grouped into 10 pairs where each person of a pair has
the same mass. One person in a pair then follows diet A while the other person follows diet B.
After three weeks the loss in mass was as follows:

Diet A 12 16 9 20 8 7 10 7 13 5
Diet B 8 14 12 14 4 9 10 5 9 2

(a) Supporters of diet A claim that it is better than diet B. Test with 𝛼 = 0,05 if this claim is
justified.
(b) Assume that the 20 persons were randomly assigned to the two diets. Test now the
hypothesis in (a).

22. In a certain city the number of car accidents on each day of the working week are recorded
over a period of a few months.

Day Mon Tues Wed Thurs Fri


Number 21 14 16 18 31

(a) With 𝛼 = 0.05, test the null hypothesis that the probability for an accident is the same for
all days of the week.
(b) Test the null hypothesis that an accident is twice as likely on a Friday than on any other day.

23. A prominent baseball player’s batting average dropped from 0.313 in one year to 0.280 in
the following year. He was at bat 374 times during the first year and 268 times during the second
year. Is the hypothesis tenable at the 5% level that his hitting ability was the same during the two
years? Use two different approaches to test this hypothesis. (Hint: See Exercise 14 and Section
5.2.2).
24. For the data given in Example 2.14, the following genetic model is assumed :

𝑝 𝑝2
+ 𝑝𝑞
2 2
𝑞 𝑞2
2 2

where 𝑝 = 1 − 𝑞 is the proportion of non–colour–blind individuals in the population. Are the


data consistent with the model?

25. Of 64 offspring of a certain cross between guinea pigs, 34 were red, 10 were black and 20
were white. According to the genetic model, these numbers should be in the ratio 9/3/4. Are the
data consistent with the model at the 5% level?

26. Gilby classified 1725 school children according to intelligence and apparent family economic
level. They were classified as follows:

Dull Intelligent Very capable


Very well clothed 81 322 233
Well clothed 141 457 153
Poorly clothed 127 163 48

Test for independence at the 1% level.

27. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) with 𝜎 2 known. For 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0 , find the


exact distribution of −2ℓ𝑛 Λ, where Λ is the GLR–statistic.
Chapter 3
Interval Estimation

3.1 Introduction
The point estimation of a parameter 𝜃 is a guess of a single value as the value of 𝜃. In this
chapter we discuss interval estimation and, more generally, set estimation. The inference in a set
estimation problem is the statement that “ 𝜃 ∈ 𝐶 ” where 𝐶 ⊂ Θ and 𝐶 = 𝐶(𝑿) is a set
determined by the value of the data 𝑿 = 𝒙 observed. If 𝜃 is real–valued, then we usually
prefer the set estimate 𝐶 to be an interval. Interval estimators will be the main topic of this
chapter.

As in the previous two chapters, this chapter is divided into two parts, the first concerned with
finding interval estimators and the second part concerned with evaluating the worth of the
estimators. We begin with a formal definition of an interval estimator.

Definition 3.1 : An interval estimate of a real–valued parameter 𝜃 is any pair of functions,


𝐿(𝑥1 , … , 𝑥𝑛 ) and 𝑈(𝑥1 , … , 𝑥𝑛 ), of a sample that satisfy 𝐿(𝒙) ≤ 𝑈(𝒙) for all 𝒙 ∈ 𝑆. If 𝑿 = 𝒙
is observed, the inference 𝐿(𝒙) ≤ 𝜃 ≤ 𝑈(𝒙) is made. The random interval [𝐿(𝑿), 𝑈(𝑿)] is
called an interval estimator.

We will use our previously defined conventions and write [𝐿(𝑿), 𝑈(𝑿)] for an interval
estimator of 𝜃 based on the random sample 𝑿 = (𝑋1 , … , 𝑋𝑛 ) and [𝐿(𝒙), 𝑈(𝒙)] for the
realized value of the interval. Although in the majority of cases we will work with finite values for
𝐿 and 𝑈 , there is sometimes interest in one–sided interval estimates. For instance, if
𝐿(𝒙) = −∞, then we have the one–sided interval (−∞, 𝑈(𝒙)] and the assertion is that “𝜃 ≤
𝑈(𝒙),” with no mention of a lower bound. We could similarly take 𝑈(𝒙) = ∞ and have a one–
sided interval [𝐿(𝒙), ∞).

1
Example of a point estimator: 𝜇̂ = 𝑋 = 𝑛 ∑ 𝑋𝑖
Example of a point estimate: 𝜇̂ = 𝑥 = 23.4

Example of an interval estimator (confidence interval) is: [𝑋 − 1, 𝑋 + 1]


Example of an interval estimate (confidence interval) is:
[𝑥 − 1, 𝑥 + 1] = [23.4 − 1 , 23.4 + 1] = [22.4 , 24.4]

Example 3.1 : For a sample 𝑋1 , 𝑋2 , 𝑋3 , 𝑋4 from a 𝑁(𝜇, 1) distribution, an interval estimator


of 𝜇 is [𝑋 − 1, 𝑋 + 1]. This means that we will assert that 𝜇 is in this interval
At this point, it is natural to inquire as to what is gained by using an interval estimator. Previously,
we estimated 𝜇 with 𝑋 and now we have the less precise estimator [𝑋 − 1, 𝑋 + 1]. We surely
must gain something! By giving up some precision in our estimate (or assertion about 𝜇), we
have gained some confidence, or assurance, that our assertion is correct. When we estimate 𝜇
by 𝑋, the probability that we are exactly correct, that is, 𝑃(𝑋 = 𝜇), is zero. However, with an
interval estimator, we have a positive probability of being correct. The probability that 𝜇 is
covered by the interval [𝑋 − 1, 𝑋 + 1] can be calculated as

𝑃(𝜇 ∈ [𝑋 − 1, 𝑋 + 1]) = 𝑃(𝑋 − 1 ≤ 𝜇 ≤ 𝑋 + 1)


" = 𝑃(−𝑋 + 1 ≥ −𝜇 ≥ −𝑋 − 1) "
= 𝑃(−𝑋 − 1 ≤ −𝜇 ≤ −𝑋 + 1)
= 𝑃(−1 ≤ 𝑋 − 𝜇 ≤ 1)
−1 𝑋−𝜇 1
= 𝑃( ≤ ≤ )
√1/4 √1/4 √1/4
𝑋−𝜇
= 𝑃 (−2 ≤ ≤ 2)
√1/4
= 𝑃(−2 ≤ 𝑍 ≤ 2)
= 𝑃(𝑍 ≤ 2) − P(Z ≤ −2)
= 𝑃(𝑍 ≤ 2) − [1 − 𝑃(𝑍 ≤ 2)]
= 0.9772 − [1 − 0.9772]
= 0.9544 .

Thus we have over a 95% chance of covering the unknown parameter with our interval estimator.
Sacrificing some precision in our estimate, in moving from a point to an interval, has resulted in
increased confidence that our assertion is correct.

The purpose of using an interval estimator, rather than a point estimator, is to have some
guarantee of capturing the parameter of interest. The certainty of this guarantee is quantified in
the following definitions.

Definition 3.2 : For an interval estimator [𝐿(𝑿), 𝑈(𝑿)] of a parameter 𝜃 , the coverage
probability is the probability that the random interval [𝐿(𝑿), 𝑈(𝑿)] covers the true parameter,
𝜃. In symbols, it is denoted by either 𝑃𝜃 (𝜃 ∈ [𝐿(𝑿), 𝑈(𝑿)]), or 𝑃(𝜃 ∈ [𝐿(𝑿), 𝑈(𝑿)]|𝜃).

Definition 3.3 : For an interval estimator [𝐿(𝑿), 𝑈(𝑿)] of a parameter 𝜃, the confidence
coefficient of [𝐿(𝑿), 𝑈(𝑿)] is the infimum of the coverage probabilities, inf𝜃 𝑃𝜃 (𝜃 ∈
[𝐿(𝑿), 𝑈(𝑿)]).

For the previous example:


0.9544 is the coverage probability, while inf(0.9544) = 0.9544 is the confidence coefficient
𝜇
[x̅ - 1, x̅ + 1] was our interval estimate for μ
[x̅ - 1, x̅ + 1] (with confidence coefficient 0.9544) is our confidence interval for μ
In Bayesian statistics, parameters are seen as random variables.
In Inferencial statistics, parameters are seen as constants.

𝜇 ∈ [𝑋 − 1, 𝑋 + 1]

There are a number of things to be aware of in these definitions. One, it is important to keep in
mind that the interval is the random quantity, not the parameter. Therefore, when we write
probability statements such as 𝑃𝜃 (𝜃 ∈ [𝐿(𝑿), 𝑈(𝑿)]), these probability statements refer to , not
𝜃. In other words, think of 𝑃𝜃 (𝜃 ∈ [𝐿(𝑿), 𝑈(𝑿)]), which might look like a statement about a
random 𝜃 , as the algebraically equivalent 𝑃𝜃 (𝐿(𝑿) ≤ 𝜃, 𝑈(𝑿) ≥ 𝜃) , a statement about a
random .

Interval estimators, together with a measure of confidence (usually a confidence coefficient) are
sometimes known as confidence intervals. We will often use this term interchangeably with
interval estimator. A interval estimator with confidence coefficient equal to some value, say
1 − 𝛼, is simply called a 1 − 𝛼 confidence interval.

Another important point is concerned with coverage probabilities and confidence coefficients.
Since we do not know the true value of 𝜃, we can only guarantee a coverage probability equal
to the infimum, the confidence coefficient. In some cases this does not matter because the
coverage probability will be a constant function of 𝜃. In other cases, however, the coverage
probability can be a fairly variable function of 𝜃.
Example 3.2 : Let 𝑋1 , … , 𝑋𝑛 be a random sample from a uniform(0, 𝜃) population and let
𝑌 = max{𝑋1 , … , 𝑋𝑛 }. We are interested in an interval estimator of 𝜃. We consider two candidate
estimators: [𝑎𝑌, 𝑏𝑌], 1 ≤ 𝑎 < 𝑏 , and [𝑌 + 𝑐, 𝑌 + 𝑑], 0 ≤ 𝑐 < 𝑑 , where 𝑎, 𝑏, 𝑐 and 𝑑 are
specified constants. Note that 𝜃 is necessarily larger than 𝑦. For the first interval we have

𝑃𝜃 (𝜃 ∈ [𝑎𝑌, 𝑏𝑌]) = 𝑃𝜃 (𝑎𝑌 ≤ 𝜃 ≤ 𝑏𝑌)


1 1 1
" = 𝑃𝜃 ( ≥ ≥ ) "
𝑎𝑌 𝜃 𝑏𝑌
1 1 1
= 𝑃𝜃 ( ≤ ≤ )
𝑏𝑌 𝜃 𝑎𝑌
1 𝑌 1
= 𝑃𝜃 ( ≤ ≤ )
𝑏 𝜃 𝑎
1 1
= 𝑃𝜃 ( ≤ 𝑇 ≤ ). (𝑇 = 𝑌/𝜃)
𝑏 𝑎
𝑌
𝑇 = 𝑔(𝑌) = 𝜃

𝑌
To find the inverse function: 𝑇 = 𝜃 ⟹ 𝑌 = 𝑇𝜃 thus 𝑌 = 𝑔−1 (𝑇) = 𝑇𝜃

𝑑𝑔−1 (𝑡)
So =𝜃
𝑑𝑡

We previously saw (by using Theorem 1.4 in Exercise 1 of Chapter 1) that

𝑓𝑌 (𝑦) = 𝑛𝑦 𝑛−1 /𝜃 𝑛 , 0 ≤ 𝑦 ≤ 𝜃,

so the pdf of 𝑇 is:

𝑑𝑔−1 (𝑡) 𝑛(𝑡𝜃)𝑛−1 𝑛(𝑡𝜃)𝑛−1 𝑛(𝑡)𝑛−1 (𝜃)𝑛−1


𝑓𝑇 (𝑡) = 𝑓𝑌 (𝑔−1 (𝑡)) | 𝑑𝑡 | = 𝜃𝑛 𝜃 = 𝜃𝑛𝜃−1 = = 𝑛𝑡 𝑛−1
𝜃𝑛−1
for 0 ≤ 𝑌 ≤ 𝜃 ⟹ 0 ≤ 𝑇𝜃 ≤ 𝜃 ⟹ 0 ≤ 𝑇 ≤ 1
1
1 1 1/𝑎 1 𝑛 1 𝑛
We therefore have 𝑃𝜃 (𝑏 ≤ 𝑇 ≤ 𝑎) = ∫1/𝑏 𝑛𝑡 𝑛−1 𝑑𝑡 = t n |a1 = (𝑎) − (𝑏) . (3.1)
b

1 𝑛 1 𝑛
(Coverage probability = (𝑎) − (𝑏) .
1 𝑛 1 𝑛 1 𝑛 1 𝑛
Confidence coefficient = inf [(𝑎) − (𝑏) ] = (𝑎) − (𝑏) )
𝜃

The coverage probability of the first interval is independent of the value of 𝜃 and thus
1 𝑛 1 𝑛
(𝑎 ) − (𝑏 ) is the confidence coefficient of the interval.

For the other interval, for 𝜃 ≥ 𝑑 a similar calculation yields


𝑃𝜃 (𝜃 ∈ [𝑌 + 𝑐, 𝑌 + 𝑑]) = 𝑃𝜃 (𝑌 + 𝑐 ≤ 𝜃 ≤ 𝑌 + 𝑑)
= 𝑃𝜃 (𝑐 ≤ 𝜃 − 𝑌 ≤ 𝑑)
= 𝑃𝜃 (𝑐 − 𝜃 ≤ −𝑌 ≤ 𝑑 − 𝜃)
" = 𝑃𝜃 (−𝑐 + 𝜃 ≥ 𝑌 ≥ −𝑑 + 𝜃) "
" = 𝑃𝜃 (𝜃 − 𝑐 ≥ 𝑌 ≥ 𝜃 − 𝑑) "
= 𝑃𝜃 (𝜃 − 𝑑 ≤ 𝑌 ≤ 𝜃 − 𝑐) (3.2)
𝜃−𝑑 𝑌 𝜃−𝑐
= 𝑃𝜃 ( ≤𝜃≤ )
𝜃 𝜃
𝑑 𝑐
= 𝑃𝜃 (1 − 𝜃 ≤ 𝑇 ≤ 1 − 𝜃) (𝑇 = 𝑌/𝜃)
𝑐
1−𝑐/𝜃 1− 𝑐 𝑛 𝑑 𝑛
= ∫1−𝑑/𝜃 𝑛𝑡 𝑛−1 𝑑𝑡 = t n | 𝜃
𝑑 = (1 − 𝜃) − (1 − 𝜃 ) .
1−
𝜃

𝑐 𝑛 𝑑 𝑛
(Coverage probability = (1 − 𝜃) − (1 − 𝜃 ) .
𝑐 𝑛 𝑑 𝑛 𝑐 𝑛 𝑑 𝑛
Confidence coefficient = inf [(1 − 𝜃) − (1 − 𝜃 ) ] = lim [(1 − 𝜃) − (1 − 𝜃 ) ]
𝜃 𝜃→∞
= (1 − 0)𝑛 − (1 − 0)𝑛 = 1 − 1 = 0 )

In this case, the coverage probability depends on 𝜃 . Furthermore, it is straightforward to


calculate lim 𝑃𝜃 (𝜃 ∈ [𝑌 + 𝑐, 𝑌 + 𝑑]) = 0, showing that the confidence coefficient of this
𝜃→∞
interval is zero.

Alternative approach:
𝑌
𝑇 = 𝑔(𝑌) = .
𝜃

𝑌
To find the inverse function: 𝑇 = 𝜃 ⟹ 𝑌 = 𝑇𝜃 thus 𝑌 = 𝑔−1 (𝑇) = 𝑇𝜃

𝑑𝑔−1 (𝑡)
So =𝜃
𝑑𝑡

We previously saw (by using Theorem 1.4 in Exercise 1 of Chapter 1) that

𝑓𝑌 (𝑦) = 𝑛𝑦 𝑛−1 /𝜃 𝑛 , 0 ≤ 𝑦 ≤ 𝜃,

so the pdf of 𝑇 is:

𝑑𝑔−1 (𝑡) 𝑛(𝑡𝜃)𝑛−1 𝑛(𝑡𝜃)𝑛−1 𝑛(𝑡)𝑛−1 (𝜃)𝑛−1


𝑓𝑇 (𝑡) = 𝑓𝑌 (𝑔−1 (𝑡)) | 𝑑𝑡 | = 𝜃= = = 𝑛𝑡 𝑛−1 0 ≤ 𝑌 ≤ 𝜃 ⟹
𝜃𝑛 𝜃𝑛 𝜃−1 𝜃𝑛−1
0 ≤ 𝑇𝜃 ≤ 𝜃 ⟹ 0 ≤ 𝑇 ≤ 1

t
We therefore have 𝐹𝑇 (𝑡) = ∫0 𝑛𝑢𝑛−1 𝑑𝑢 = un |t0 = t 𝑛 0≤𝑇≤1
𝑃𝜃 (𝜃 ∈ [𝑎𝑌, 𝑏𝑌]) = 𝑃𝜃 (𝑎𝑌 ≤ 𝜃 ≤ 𝑏𝑌)
1 1 1
" = 𝑃𝜃 ( ≥ ≥ ) "
𝑎𝑌 𝜃 𝑏𝑌
1 1 1
= 𝑃𝜃 ( ≤ ≤ )
𝑏𝑌 𝜃 𝑎𝑌
1 𝑌 1
= 𝑃𝜃 ( ≤ ≤ )
𝑏 𝜃 𝑎
1 1
= 𝑃𝜃 ( ≤ 𝑇 ≤ ). (𝑇 = 𝑌/𝜃)
𝑏 𝑎
1 1
= 𝐹𝑇 ( ) − 𝐹𝑇 ( )
𝑎 𝑏
𝑛 𝑛
1 1
= ( ) −( )
𝑎 𝑏

The coverage probability of the first interval is independent of the value of 𝜃 and thus
1 𝑛 1 𝑛
(𝑎 ) − (𝑏 ) is the confidence coefficient of the interval.

For the other interval, for 𝜃 ≥ 𝑑 a similar calculation yields

𝑃𝜃 (𝜃 ∈ [𝑌 + 𝑐, 𝑌 + 𝑑]) = 𝑃𝜃 (𝑌 + 𝑐 ≤ 𝜃 ≤ 𝑌 + 𝑑)
= 𝑃𝜃 (𝑐 ≤ 𝜃 − 𝑌 ≤ 𝑑)
= 𝑃𝜃 (𝑐 − 𝜃 ≤ −𝑌 ≤ 𝑑 − 𝜃)
" = 𝑃𝜃 (−𝑐 + 𝜃 ≥ 𝑌 ≥ −𝑑 + 𝜃) "
" = 𝑃𝜃 (𝜃 − 𝑐 ≥ 𝑌 ≥ 𝜃 − 𝑑) "
= 𝑃𝜃 (𝜃 − 𝑑 ≤ 𝑌 ≤ 𝜃 − 𝑐) (3.2)
𝜃−𝑑 𝑌 𝜃−𝑐
= 𝑃𝜃 ( ≤𝜃≤ )
𝜃 𝜃
𝑑 𝑐
= 𝑃𝜃 (1 − 𝜃 ≤ 𝑇 ≤ 1 − 𝜃) (𝑇 = 𝑌/𝜃)
𝑐 𝑑 𝑐 𝑛 𝑑 𝑛
= 𝐹𝑇 (1 − 𝜃) − 𝐹𝑇 (1 − 𝜃 ) = (1 − 𝜃) − (1 − 𝜃 ) .

In this case, the coverage probability depends on 𝜃 . Furthermore, it is straightforward to


calculate lim 𝑃𝜃 (𝜃 ∈ [𝑌 + 𝑐, 𝑌 + 𝑑]) = 0, showing that the confidence coefficient of this
𝜃→∞
interval is zero.

3.2 Methods of finding Interval Estimators


We will examine three methods of finding confidence intervals, although there are some overlap
in the methods and different methods will usually give the same result. However, there are
situations where one method is preferable to another.
3.2.1 Inverting a Test Statistic (inverting a rejection region of a hypothesis test)

There is a very strong correspondence between hypothesis testing and interval estimation. We
can say in general that every confidence set corresponds to a test and vice versa. Consider the
following example.

In example 3.3 the question is for the normal distribution where σ2 is known, find a 100(1-α)%
confidence interval for µ. The result used comes from example 2.3. (They did extra steps shown
in brackets which you can skip.)

Example 3.3 Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) with 𝜎 2 known and consider testing. Find a
100(1 – α)% confidence interval for 𝜇.

Solution:
Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) with 𝜎 2 known and consider testing 𝐻0 : 𝜇 = 𝜇0 versus
𝐻1 : 𝜇 ≠ 𝜇0 . From Chapter 2 we know that the unbiased test of size 𝛼 has rejection region

𝜎
𝑅 = {𝒙 ∶ |𝑥 − 𝜇0 | > 𝑧1−𝛼/2 } , (𝑤ℎ𝑒𝑟𝑒 𝑃[ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝜇 = 𝜇0 ] = 𝛼) .
√𝑛
𝜎
= {𝒙 ∶ ±( 𝑥 − 𝜇0 ) > 𝑧1−𝛼/2 }
√𝑛
𝜎 𝜎
= {𝒙 ∶ −(𝑥 − 𝜇0 ) > 𝑧1−𝛼 𝑜𝑟 𝑥 − 𝜇0 > 𝑧1−𝛼 }
2 √𝑛 2 √𝑛
𝜎 𝜎
= {𝒙 ∶ 𝑥 − 𝜇0 < −𝑧1−𝛼 𝑜𝑟 𝑥 − 𝜇0 > 𝑧1−𝛼 }
2√𝑛 2 √𝑛
𝜎 𝜎
= {𝒙 ∶ 𝑥 < 𝜇0 − 𝑧1−𝛼 𝑜𝑟 𝑥 > 𝜇0 + 𝑧1−𝛼 }
2 √𝑛 2 √𝑛

(This means that 𝑃[ 𝐴𝑐𝑐𝑒𝑝𝑡 𝐻0 |𝜇 = 𝜇0 ] = 1 − 𝛼.)


But the acceptance region, 𝐴(𝜇0 ), is the complement of 𝑅, so

𝜎 𝜎
𝐴(𝜇0 ) = {𝒙 ∶ 𝜇0 − 𝑧1−𝛼/2 ≤ 𝑥 ≤ 𝜇0 + 𝑧1−𝛼/2 }
√𝑛 √𝑛

(with probability
𝜎 𝜎
𝑃[𝑿 ∈ 𝐴(𝜇0 )] = 𝑃 [𝜇0 − 𝑧1−𝛼/2 ≤ 𝑋 ≤ 𝜇0 + 𝑧1−𝛼/2 ]
√𝑛 √𝑛
= 1−𝛼.

This probability statement is true for every 𝜇0 , hence the statement

𝜎 𝜎
𝑃𝜇 [𝜇 − 𝑧1−𝛼/2 ≤ 𝑋 ≤ 𝜇 + 𝑧1−𝛼/2 ] = 1−𝛼
√ 𝑛 √𝑛

is true.) By inverting this statement and dropping the subscript it follows that
𝜎 𝜎
( 𝑃𝜇 [𝑋 − 𝑧1−𝛼/2 ≤ 𝜇 ≤ 𝑋 + 𝑧1−𝛼/2 ] = 1−𝛼.
√𝑛 √𝑛

For a given sample 𝑥1 , … , 𝑥𝑛 we then define)


Get 𝜇 alone in the middle, and you will get:

𝜎 𝜎
𝐶(𝑿) = {𝜇: 𝑥 − 𝑧1−𝛼/2 ≤ 𝜇 ≤ 𝑥 + 𝑧1−𝛼/2 } (3.3)
√𝑛 √𝑛

is a 100 (1 –𝛼)% confidence interval for 𝜇.


𝜎 𝜎
As {𝜇0 − 𝑧1−𝛼 ≤ 𝑥 ≤ 𝜇0 + 𝑧1−𝛼 }
2 √ 𝑛 2 √𝑛
𝜎 𝜎
= {−𝑧1−𝛼 ≤ 𝑥 − 𝜇0 ≤ 𝑧1−𝛼 }
2 √ 𝑛 2 √𝑛
𝜎 𝜎
= {−𝑥 − 𝑧1−𝛼 ≤ −𝜇0 ≤ −𝑥 + 𝑧1−𝛼 }
2 √𝑛 2 √𝑛
𝜎 𝜎
= " {𝑥 + 𝑧1−𝛼 ≥ 𝜇0 ≥ 𝑥 − 𝑧1−𝛼 }"
2 √𝑛 2 √𝑛
𝜎 𝜎
= {𝑥 − 𝑧1−𝛼 ≤ 𝜇0 ≤ 𝑥 + 𝑧1−𝛼 }
2 √𝑛 2 √𝑛

Solution (without extra steps):


Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) with 𝜎 2 known and consider testing 𝐻0 : 𝜇 = 𝜇0 versus
𝐻1 : 𝜇 ≠ 𝜇0 . From Chapter 2 we know that the unbiased test of size 𝛼 has rejection region

𝜎
𝑅 = {𝒙 ∶ |𝑥 − 𝜇0 | > 𝑧1−𝛼/2 }
√𝑛
𝜎
= {𝒙 ∶ ±( 𝑥 − 𝜇0 ) > 𝑧1−𝛼/2 }
√𝑛
𝜎 𝜎
= {𝒙 ∶ −(𝑥 − 𝜇0 ) > 𝑧1−𝛼 𝑜𝑟 𝑥 − 𝜇0 > 𝑧1−𝛼 }
2 √𝑛 2 √𝑛
𝜎 𝜎
= {𝒙 ∶ 𝑥 − 𝜇0 < −𝑧1−𝛼 𝑜𝑟 𝑥 − 𝜇0 > 𝑧1−𝛼 }
2 √𝑛 2 √𝑛
𝜎 𝜎
= {𝒙 ∶ 𝑥 < 𝜇0 − 𝑧1−𝛼 𝑜𝑟 𝑥 > 𝜇0 + 𝑧1−𝛼 }
2 √𝑛 2 √𝑛

But the acceptance region, 𝐴(𝜇0 ), is the complement of 𝑅, so

𝜎 𝜎
𝐴(𝜇0 ) = {𝒙 ∶ 𝜇0 − 𝑧1−𝛼/2 ≤ 𝑥 ≤ 𝜇0 + 𝑧1−𝛼/2 }
√𝑛 √𝑛

𝜎 𝜎
{𝜇0 − 𝑧1−𝛼 ≤ 𝑥 ≤ 𝜇0 + 𝑧1−𝛼 }
2 √𝑛 2 √𝑛
𝜎 𝜎
= {−𝑧1−𝛼 ≤ 𝑥 − 𝜇0 ≤ 𝑧1−𝛼 }
2 √ 𝑛 2 √𝑛
𝜎 𝜎
= {−𝑥 − 𝑧1−𝛼 ≤ −𝜇0 ≤ −𝑥 + 𝑧1−𝛼 }
2 √ 𝑛 2 √𝑛
𝜎 𝜎
= " {𝑥 + 𝑧1−𝛼 ≥ 𝜇0 ≥ 𝑥 − 𝑧1−𝛼 }"
2 √𝑛 2 √𝑛
𝜎 𝜎
= {𝑥 − 𝑧1−𝛼 ≤ 𝜇0 ≤ 𝑥 + 𝑧1−𝛼 }
2 √𝑛 2 √𝑛
By inverting this statement and dropping the subscript it follows that

𝜎 𝜎
𝐶(𝑿) = {𝜇: 𝑥 − 𝑧1−𝛼/2 ≤ 𝜇 ≤ 𝑥 + 𝑧1−𝛼/2 } (3.3)
√𝑛 √𝑛

is a 100 (1 – 𝛼)% confidence interval for 𝜇.

By inverting the acceptance region of a test of size 𝛼, we obtain a confidence interval with
confidence coefficient 1 − 𝛼. These sets are connected by the tautology
(𝑥1 , … , 𝑥𝑛 ) ∈ 𝐴(𝜇0 ) ⟺ 𝜇0 ∈ 𝐶(𝑥1 , … , 𝑥𝑛 ) .

Theorem 3.1 : For each 𝜃0 ∈ Θ, let 𝐴(𝜃0 ) be the acceptance region of a size 𝛼 test of
𝐻0 : 𝜃 = 𝜃0 . For each 𝒙 ∈ 𝜒, define a set 𝐶(𝒙) in the parameter space by
𝐶(𝒙) = {𝜃0 : ∈ 𝐴(𝜃0 )} .
Then the random set 𝐶(𝑿) is a 1 − 𝛼 confidence set. Conversely, let 𝐶(𝑿) be a 1 − 𝛼
confidence set. For any 𝜃0 ∈ Θ, define
𝐴(𝜃0 ) = {𝒙 ∶ 𝜃0 ∈ 𝐶(𝒙)} .
Then 𝐴(𝜃0 ) is the acceptance region of a size 𝛼 test of 𝐻0 : 𝜃 = 𝜃0 .

In example 3.4 the question is for the exponential distribution, find a 100(1-α)% confidence
interval for θ. The result used comes from example 2.2.

Example 3.4
(This example is done slightly differently to the original notes to make it easier)
For the exponential distribution, find a 100(1-α)% confidence interval for θ.

Solution:
Suppose we want to find a confidence interval for the parameter 𝜃 of an exponential
distribution by inverting a size 𝛼 test of 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 ≠ 𝜃0 .

For the hypotheses 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 ≠ 𝜃0 ,) from example 2.2, reject 𝐻0 if


1 2 1 2
𝑥 ≤ 2𝑛𝜃 𝜒2𝑛,𝛼/2 𝑜𝑟 𝑥 ≥ 2𝑛𝜃 𝜒2𝑛,1−𝛼/2 .
0 0
OR :
1
2 2 1
𝑅 = {𝑥: 𝑥 ≤ 2𝑛𝜃 𝜒2𝑛,𝛼/2 𝑜𝑟 𝑥 ≥ 2𝑛𝜃 𝜒2𝑛,1−𝛼/2 }. (2.11)
0 0

Then the acceptance region is


12 2 1
𝐴(𝜃0 ) = {𝒙 ∶ 2𝑛𝜃 𝜒2𝑛,𝛼/2 ≤ 𝑥 ≤ 2𝑛𝜃 𝜒2𝑛,1−𝛼/2 } , (3.5)
0 0

and by inverting it, the 100( 1 − 𝛼 )% confidence interval for 𝜃 follows after dropping the
subscript.

1 2 2 1
𝐶(𝒙)1−𝛼 = {𝜃: 2𝑛𝑥 𝜒2𝑛,𝛼/2 ≤ 𝜃 ≤ 2𝑛𝑥 𝜒2𝑛,1−𝛼/2 } . (3.6)
Note that the inversion of a two–sided test gave a two–sided interval. In the next example we
invert a one–sided test to get a one–sided interval.

In example 3.5 the question is for the normal distribution where σ2 is unknown, find a
100(1-α)% upper confidence bound for µ. The result used comes from example 2.4. remarks
(equation 2.19). (Upper confidence bound means confidence interval of the form (−∞, 𝑈].)

Example 3.5

Let 𝑋1 , … , 𝑋𝑁 ∼ 𝑁(𝜇, 𝜎 2 ) and we want a 1 − 𝛼 upper confidence bound for 𝜇. That is, we
want a confidence interval of the form 𝐶(𝒙) = (−∞, 𝑈(𝒙)]. To obtain such an interval we will
invert a one sided hypothesis of the form 𝐻0 : 𝜇 ≥ 𝜇0 versus 𝐻1 : 𝜇 < 𝜇0 .

Note that if we want an upper bound on the interval, we must use a test with an upper bound
on the alternative hypothesis and vice versa.

The standard test has critical region

𝑠
𝑅 = {𝒙 ∶ 𝑥 < 𝜇0 − 𝑡𝑛−1,1−𝛼 }
√𝑛

and acceptance region


𝑠
𝐴(𝜇0 ) = {𝒙 ∶ 𝑥 ≥ 𝜇0 − 𝑡𝑛−1,1−𝛼 }
√𝑛

𝑠 𝑠 𝑠
{𝑥 ≥ 𝜇0 − 𝑡𝑛−1,1−𝛼 } = {𝑥 + 𝑡𝑛−1,1−𝛼 ≥ 𝜇0 } = {𝜇0 ≤ 𝑥 + 𝑡𝑛−1,1−𝛼 }
√𝑛 √𝑛 √𝑛

After dropping the subscript and inverting follows the 100(1 − 𝛼)% confidence interval :

𝑠
𝐶(𝒙)1−𝛼 = {𝜇: 𝜇 ≤ 𝑥 + 𝑡𝑛−1,1−𝛼 }. (3.7)
√𝑛

𝑠
i.e. {𝜇: −∞ < 𝜇 ≤ 𝑥 + 𝑡𝑛−1,1−𝛼 }.
√𝑛
The test inversion method is completely general in that we can invert any test and obtain a
confidence set. However, in certain situations one of the following two methods works easier.
Especially for discrete distributions.

3.2.2 Pivotal Quantities

Definition 3.4 : A random variable 𝑄(𝑿, 𝜃), a function of the data and the parameter, is a
pivotal quantity if the distribution of 𝑄(𝑿, 𝜃) is independent of the parameter. That is, if 𝑋 ∼
𝑓(𝑥|𝜃), then 𝑄(𝑿, 𝜃) has the same distribution for all values of 𝜃.

In location and scales cases there are many examples of pivotal quantities.
Example 3.6 : Suppose 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ). Then we know
√𝑛(𝑋−𝜇) (𝑋−𝜇)
(a) = ∼ 𝑁(0,1),
𝜎 𝜎/√𝑛
√𝑛(𝑋−𝜇) (𝑋−𝜇)
(b) = ∼ 𝑡𝑛−1,
𝑆 𝑆/√𝑛
𝑋𝑖 −𝜇
(d) ~𝑁(0,1)
𝜎
𝑋𝑖 −𝜇 2 (𝑋𝑖 −𝜇)2
(e) ( ) = ~𝜒12
𝜎 𝜎2
(𝑋𝑖 −𝜇)2 1 ∑𝑛
𝑖=1(𝑋𝑖 −𝜇)
2
(f) ∑𝑛𝑖=1 = 𝜎2 ∑𝑛𝑖=1(𝑋𝑖 − 𝜇)2 = ~𝜒𝑛2
𝜎2 𝜎2
2
∑𝑛
𝑖=1(𝑋𝑖 −𝑋) (𝑛−1)𝑆 2 ̂2
𝑛𝜎 2
(c) = = ~𝜒𝑛−1 where 𝜎̂ 2 is the MLE of 𝜎 2 (3.8)
𝜎2 𝜎2 𝜎2

1 2 1 2
Remember: 𝑆 2 = 𝑛−1 ∑𝑛𝑖=1(𝑋𝑖 − 𝑋) and 𝜎̂ 2 = 𝑛 ∑𝑛𝑖=1(𝑋𝑖 − 𝑋) is the MLE of 𝜎 2 .

All these statistics are pivotal quantities since (i) they are functions of both the data and the
parameter(s), and (ii) their distributions are independent of the parameters.

Approaches to find a pivotal quanity:


Approach 1: Remember an important relationship or hint previously/currently given.
Approach 2: We can sometimes look at the form of the pdf (of the data or of a sufficient statistic)
𝑥−𝜇
to see if a pivot exists. In the normal case the quantity appears in the pdf of 𝑋, and is a
𝜎/√𝑛
pivot. Then use important relationship or hint previously/currently to get the pivotal quantity’s
distribution (or use MGF’s for this).
Approach 3: In general, suppose the pdf of a statistic 𝑇, 𝑓(𝑡|𝜃), can be expressed in the form

𝜕
𝑓(𝑡|𝜃) = 𝑔(𝑄(𝑡, 𝜃)) |𝜕𝑡 𝑄(𝑡, 𝜃)| , (3.9)

for some function 𝑔 and some monotone (in 𝑡) function 𝑄. Then 𝑄(𝑇, 𝜃) is a pivot. Then use
important relationship or hint previously/currently to get the pivotal quantity’s distribution (or
use MGF’s for this).

Don’t confuse this formula with the formula for getting the pdf of a function of 𝑋 e.g. 𝑦 = 𝑅(𝑥)
and 𝑥 = 𝑆(𝑦) then 𝑔𝑌 (𝑦) = … (which by the way has 2 forms, one which divides and one that
times’s which looks similar to equation (3.9) but is NOT the same).

Example 3.7 :
Suppose 𝑋1 , … , 𝑋𝑛 ∼ 𝐸𝑥𝑝 (𝜆). Then 𝑇 = Σ𝑋𝑖 is a sufficient statistic for 𝜆 and we know 𝑇 =
Σ𝑋𝑖 ∼ Gamma(𝑛, 𝜆) with pdf
𝜆𝑛
𝑓(𝑡|𝜆) = 𝑡 𝑛−1 𝑒 −𝜆𝑡 .
Γ(𝑛)
Notice that 𝜆 and 𝑡 appear together in the pdf as 𝜆𝑡, and in fact, 2𝜆𝑇 has a 𝜒 2 –distribution
2
with 2𝑛 degrees of freedom. So 𝑄 = 2𝜆𝑇 = 2𝜆Σ𝑋𝑖 ∼ 𝜒2𝑛 , independent of 𝜆, so 𝑄 is a pivot.
Once we have a pivotal quantity, the construction of a confidence interval is simple, provided the
pivotal quantity is invertable. Since the distribution of 𝑄(𝑇, 𝜃) is known, we can find two
numbers 𝑎 and 𝑏 so that

𝑃[𝑎 ≤ 𝑄(𝑇, 𝜃) ≤ 𝑏] = 1 − 𝛼 (3.10)

for a specified 𝛼. By inverting the inequalities we can obtain the confidence interval, similar to
inverting the acceptance region for a test.

Example 3.7 (cont.) :


Since 2𝜆𝑇 is a pivot, we can find 𝑎 and 𝑏 so that
𝑃[𝑎 ≤ 2𝜆𝑇 ≤ 𝑏] = 1 − 𝛼 . (3.11)

There is an infinite number of sets (𝑎, 𝑏) that satisfy (3.11). If we use the equal–tail criterium,
2 2
then 𝑃[2𝜆𝑇 > 𝑏] = 𝛼/2 where 2𝜆𝑇 ∼ 𝜒2𝑛 . So 𝑏 = 𝜒2𝑛;1−𝛼/2 . Similarly it follows then that
2
𝑎 = 𝜒2𝑛,𝛼/2.

So
2 2
𝑃[𝜒2𝑛,𝛼/2 ≤ 2𝜆𝑇 ≤ 𝜒2𝑛,1−𝛼/2 ] = 1−𝛼
𝑎𝑛𝑑
2 2
𝜒2𝑛,𝛼/2 𝜒2𝑛,1−𝛼/2
𝑃[ ≤𝜆≤ ] = 1−𝛼,
2𝑇 2𝑇
2 2
𝜒2𝑛,𝛼/2 𝜒2𝑛,1−𝛼/2
so that 𝐶(𝑿)1−𝛼 = {𝜆: ≤𝜆≤ }. (3.12)
2Σ𝑥𝑖 2Σ𝑥𝑖

Example 3.7 using approach 3:


Suppose 𝑋1 , … , 𝑋𝑛 ∼ 𝐸𝑥𝑝 (𝜆). Then 𝑇 = Σ𝑋𝑖 is a sufficient statistic for 𝜆 and we know 𝑇 =
Σ𝑋𝑖 ∼ Gamma(𝑛, 𝜆) with pdf
𝜆𝑛
𝑓(𝑡|𝜆) = 𝑡 𝑛−1 𝑒 −𝜆𝑡
Γ(𝑛)
1
= 𝑒 −𝜆𝑡 𝑛𝜆𝑛 𝑡 𝑛−1
nΓ(𝑛)
𝜕
= 𝑔[(𝜆𝑡)𝑛 ] |𝜕𝑡 (𝜆𝑡)𝑛 |

2
Thus (𝜆𝑡)𝑛 is a pivotal quantity, so 2[(𝜆𝑡)𝑛 ]1/𝑛 = 2𝜆𝑡 is a pivotal quantity. 2𝜆𝑇 ∼ 𝜒2𝑛 .

So
2 2
𝑃[𝜒2𝑛,𝛼/2 ≤ 2𝜆𝑇 ≤ 𝜒2𝑛,1−𝛼/2 ] = 1−𝛼
𝑎𝑛𝑑
2 2
𝜒2𝑛,𝛼/2 𝜒2𝑛,1−𝛼/2
𝑃[ ≤𝜆≤ ] = 1−𝛼,
2𝑇 2𝑇
2 2
𝜒2𝑛,𝛼/2 𝜒2𝑛,1−𝛼/2
so that 𝐶(𝑿)1−𝛼 = {𝜆: ≤𝜆≤ }.
2Σ𝑥𝑖 2Σ𝑥𝑖
Example 3.8 :
Suppose 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) and we want a 100( 1 − 𝛼 )% confidence interval for 𝜎 . ( 𝜇
unknown, so the pivotal quantity may not contain 𝜇 ). A pivotal quantity for 𝜎 2 is 𝑄 =
(𝑛−1)𝑆 2 2 1
∼ 𝜒𝑛−1 where 𝑆 2 = 𝑛−1 Σ(𝑋𝑖 − 𝑋)2. So
𝜎2
(𝑛−1)𝑆 2
𝑃 [𝑎 ≤ ≤ 𝑏] = 1 = 𝛼
𝜎2
2 2
if 𝑎 = 𝜒𝑛−1,𝛼/2 and 𝑏 = 𝜒𝑛−1,1−𝛼/2 .
2 (𝑛−1)𝑆 2 2
𝑃 [𝜒𝑛−1,𝛼/2 ≤ ≤ 𝜒𝑛−1,1−𝛼/2 ] = 1=𝛼
𝜎2
1 𝜎2 1
𝑃 [𝜒 2 ≤ (𝑛−1)𝑆 2
≤ 𝜒2 ] = 1=𝛼
𝑛−1,1−𝛼/2 𝑛−1,𝛼/2
(𝑛−1)𝑆 2 (𝑛−1)𝑆 2
So 𝑃 [𝜒2 ≤ 𝜎 2 ≤ 𝜒2 ] = 1 − 𝛼,
𝑛−1,1−𝛼/2 𝑛−1,𝛼/2
or equivalently,
(𝑛−1)𝑠2 (𝑛−1)𝑠2
𝐶(𝑋)1−𝛼 = {𝜎: √𝜒2 ≤ 𝜎 ≤ √ 𝜒2 }. (3.13)
𝑛−1,1−𝛼/2 𝑛−1,𝛼/2

3.2.3 Statistical Method

We base this confidence interval construction for a parameter 𝜃 on a real–valued statistic 𝑇


with cdf 𝐹𝑇 (𝑡|𝜃). (In practice we would usually take 𝑇 to be a sufficient statistic for 𝜃). We will
first assume that 𝑇 is a continuous random variable. The situation where 𝑇 is discrete is
similar, but has a few additional technical details to consider. We, therefore, state the discrete
case in a separate theorem. Why? Remember 𝑃(𝑋 < 3) = 𝑃(𝑋 ≤ 2) for a discrete distribution,
but 𝑃(𝑋 < 3) = 𝑃(𝑋 ≤ 3) for a continuous distribution etc.

Theorem 3.2 : Let 𝑇 be a statistic with continuous cdf 𝐹𝑇 (𝑡|𝜃). Let 0 < 𝛼 < 1 be a fixed
value. Suppose that for each 𝑡 ∈ 𝑇, the function 𝜃𝐿 (𝑡) and 𝜃𝑈 (𝑡) can be defined as follows.

1. If 𝐹𝑇 (𝑡|𝜃) is a decreasing function of 𝜃 for each 𝑡, define 𝜃𝐿 (𝑡) and 𝜃𝑈 (𝑡) by


𝛼 𝛼
𝐹𝑇 (𝑡|𝜃𝑈 (𝑡)) = 2 , 𝐹𝑇 (𝑡|𝜃𝐿 (𝑡)) = 1 − 2
𝛼 𝛼
𝐹𝑇 (𝑡|𝜃𝑈 (𝑡)) = 2 , 1 − 𝐹𝑇 (𝑡|𝜃𝐿 (𝑡)) = 2 .
𝑡 𝛼 ∞ 𝛼
∫−∞ 𝑓𝑇 (𝑢|𝜃𝑈 (𝑡))𝑑𝑢 = 2 , ∫𝑡 𝐹𝑇 (𝑢|𝜃𝐿 (𝑡))𝑑𝑢 = 2

2. If 𝐹𝑇 (𝑡|𝜃) is an increasing function of 𝜃 for each 𝑡, define 𝜃𝐿 (𝑡) and 𝜃𝑈 (𝑡) by


𝛼 𝛼
𝐹𝑇 (𝑡|𝜃𝑈 (𝑡)) = 1 − 2 , 𝐹𝑇 (𝑡|𝜃𝐿 (𝑡)) = 2 (3.14)
𝛼 𝛼
1 − 𝐹𝑇 (𝑡|𝜃𝑈 (𝑡)) = 2 , 𝐹𝑇 (𝑡|𝜃𝐿 (𝑡)) = 2
∞ 𝛼 𝑡 𝛼
∫𝑡 𝐹𝑇 (𝑢|𝜃𝑈 (𝑡))𝑑𝑢 = 2
∫−∞ 𝑓𝑇 (𝑢|𝜃𝐿 (𝑡))𝑑𝑢 = 2

Then the random interval [𝜃𝐿 (𝑇), 𝜃𝑈 (𝑇)] is a 1 − 𝛼 confidence interval for 𝜃.
If you have data, it is not really necessary to determine whether 𝐹𝑇 (𝑡|𝜃) is increasing or
decreasing. Simply solve the equations (case 1 or case 2) for values, say 𝜃1 (𝑡) and 𝜃2 (𝑡), of the
parameter instead of 𝜃L (𝑡) and 𝜃U (𝑡), and put the smaller number between 𝜃1 (𝑡) and 𝜃2 (𝑡)
on the left (i.e. it is then 𝜃L (𝑡)), and the bigger number between 𝜃1 (𝑡) and 𝜃2 (𝑡) on the right
(i.e. it is then 𝜃U (𝑡)).

Proof : We will prove only part (1). The proof of part (2) is similar. Since 𝐹𝑇 (𝑡|𝜃) is a
decreasing function of 𝜃 for each 𝑡 and 1 − 𝛼/2 > 𝛼/2, 𝜃𝐿 (𝑡) < 𝜃𝑈 (𝑡) and the values
𝜃𝐿 (𝑡) and 𝜃𝑈 (𝑡) are unique. Also, 𝐹𝑡 (𝑡|𝜃) is decreasing in 𝜃 and hence

𝐹𝑇 (𝑡|𝜃) < 𝛼/2 ⟺ 𝜃 > 𝜃𝑈 (𝑡) ,


𝐹𝑇 (𝑡|𝜃) > 1 − 𝛼/2 ⟺ 𝜃 < 𝜃𝐿 (𝑡) .

Consider the interval [𝜃𝐿 (𝑇), 𝜃𝑈 (𝑇)]. Since

𝜃 > 𝜃𝑈 (𝑡) ⟺ 𝐹𝑇 (𝑡|𝜃) < 𝛼/2 ,

it follows that

𝛼 𝛼 𝛼
𝑃𝜃 (𝜃 > 𝜃𝑈 (𝑇)) = 𝑃𝜃 (𝐹𝑇 (𝑇|𝜃) < 2 ) = 𝐹𝐹𝑇 ( 2 ) = 2 ,
So 𝑃𝜃 (𝜃 ≤ 𝜃𝑈 (𝑇)) = 1 − 𝛼/2

where the last equality follows from the Probability Integral Transform which states that the
0 𝑦<0
𝑦−0
random variable 𝐹(𝑥) ∼ Uniform(0, 1) i.e. 𝐹𝐹(𝑥) (𝑦) = {1−0 = 𝑦 0 ≤ 𝑦 ≤ 1.
1 𝑦>1
By a similar argument, we have

𝜃 < 𝜃𝐿 (𝑡) ⟺ 𝐹𝑇 (𝑡|𝜃) > 1 − 𝛼/2

𝛼 𝛼 𝛼 𝛼
𝑃𝜃 (𝜃 < 𝜃𝐿 (𝑇)) = 𝑃𝜃 (𝐹𝑇 (𝑇|𝜃) > 1 − 2 ) = 1 − 𝐹𝐹𝑇 (1 − 2 ) = 1 − (1 − 2 ) = 2 .

Putting these two together, we have


𝛼 𝛼
𝑃𝜃 (𝜃𝐿 (𝑇) ≤ 𝜃 ≤ 𝜃𝑈 (𝑇)) = 𝑃𝜃 (𝜃 ≤ 𝜃𝑈 (𝑇)) − 𝑃𝜃 (𝜃 < 𝜃𝐿 (𝑇)) = 1 − − = 1 − 𝛼 ,
2 2
proving the theorem.

The equations in the case of a decreasing cdf can also be expressed in terms of the pdf of the
statistic 𝑇. The functions 𝜃𝑈 (𝑡) and 𝜃𝐿 (𝑡) can be defined to satisfy
𝑡 ∞
∫−∞ 𝑓𝑇 (𝑢|𝜃𝑈 (𝑡))𝑑𝑢 = 𝛼/2 𝑎𝑛𝑑 ∫𝑡 𝐹𝑇 (𝑢|𝜃𝐿 (𝑡))𝑑𝑢 = 𝛼/2 . (3.15)
A similar set of equations holds for the increasing case.
The statistical method is particularly useful in the case (a) where the sample space depends on
the parameter, and (b) with discrete distributions.

Example 3.9 :
Consider a sample 𝑥1 , … , 𝑥𝑛 from a Uniform(0, 𝜃) distribution

1
𝑓(𝑥|𝜃) = ,0 ≤ 𝑥 ≤ 𝜃 .
𝜃

We want to find a 100(1 − 𝛼)% confidence interval for 𝜃.

We know that 𝑇 = 𝑌 = max{𝑋1 , … , 𝑋𝑛 } = X(𝑛) is a sufficient statistic for 𝜃 with pdf

𝑛
𝑓𝑌 (𝑦|𝜃) = 𝜃𝑛 𝑦 𝑛−1 , 0 ≤ 𝑦 ≤ 𝜃 ,

(from Theorem 1.4 as in Exercise 1 of Chapter 1) and cdf

𝑦 𝑛
𝐹𝑌 (𝑦|𝜃) = (𝜃) , 0 ≤ 𝑦 ≤ 𝜃 .

𝐹𝑌 (𝑦|𝜃) is a decreasing function of 𝜃, so set


𝛼 𝛼
𝐹𝑌 (𝑦|𝜃𝑈 (𝑦)) = 𝑎𝑛𝑑 𝐹𝑌 (𝑦|𝜃𝐿 (𝑦)) = 1 − 2 .
2
𝑦 𝑛 𝛼 𝑦 𝑛 𝛼
That is, (𝜃 ) = and (𝜃 ) =1−2 ,
𝑈 (𝑦) 2 𝐿 (𝑦)

from which we solve


1
𝜃𝐿 (𝑦) = 𝑦(1 − 𝛼/2)−𝑛
1
𝜃𝑈 (𝑦) = 𝑦(𝛼/2)−𝑛 .

So
1 1
𝐶(𝑦)1−𝛼 = {𝜃: 𝑦(1 − 𝛼/2)−𝑛 ≤ 𝜃 ≤ 𝑦(𝛼/2)−𝑛 } . (3.16)

Notice two things about this method. Firstly, the Equations (3.14) or (3.15) need to be solved only
for the actual observed value of the statistic 𝑇 = 𝑡0 , and secondly, it is not really necessary to
determine whether 𝐹𝑇 (𝑡|𝜃) is increasing or decreasing. Simply solve the equations for values,
say 𝜃1 (𝑡) and 𝜃2 (𝑡), of the parameter. Then the smaller solution is 𝜃𝐿 (𝑡) and the larger
𝜃𝑈 (𝑡). We now consider the discrete case.
Theorem 3.3 : Let 𝑇 be a discrete statistic with cdf 𝐹𝑇 (𝑡|𝜃) = 𝑃(𝑇 ≤ 𝑡|𝜃) . Let
0 < 𝛼 < 1 be a fixed value. Suppose that for each 𝑡 ∈ 𝑇, 𝜃1 (𝑡) and 𝜃2 (𝑡) can be defined as
𝛼 𝛼
𝐹𝑇 (𝑡|𝜃1 (𝑡)) = 2 , 𝐹𝑇 (𝑡 − 1|𝜃2 (𝑡)) = 1 − 2
𝛼 𝛼
𝑃[𝑇 ≤ 𝑡|𝜃1 (𝑡)] = 2 , 𝑃[𝑇 ≤ 𝑡 − 1|𝜃2 (𝑡)] = 1 − 2
𝛼 𝛼
𝑃[𝑇 ≤ 𝑡|𝜃1 (𝑡)] = 2 , 𝑃[𝑇 < 𝑡|𝜃2 (𝑡)] = 1 − 2
𝛼 𝛼
𝑃[𝑇 ≤ 𝑡|𝜃1 (𝑡)] = , 𝑃[𝑇 ≥ 𝑡|𝜃2 (𝑡)] = . (3.17)
2 2
𝛼 𝛼
∑𝑡𝑡=0
0
𝑓𝑇 (𝑡|𝜃1 (𝑡)) = , ∑∞
𝑡=𝑡0 𝑓𝑇 (𝑡|𝜃2 (𝑡)) = . (3.18)
2 2
𝛼 𝛼
∑t𝑢=0 𝑓𝑇 (𝑢|𝜃1 (𝑡)) = , ∑∞
𝑢=t 𝑓𝑇 (𝑢|𝜃2 (𝑡)) = . (3.18)
2 2

Then [𝜃1 (𝑡) , 𝜃2 (𝑡)] ([𝜃2 (𝑡), 𝜃1 (𝑡)]) is a 1 − 𝛼 confidence interval for 𝜃 if 𝐹𝑇 (𝑡|𝜃) is an
increasing (decreasing) function of 𝜃. Notice that Equation (3.17) can be written in terms of the
probability function as (3.18).

Example 3.10 :
Let 𝑋1 , … , 𝑋𝑛 be a random sample from a Poisson population with parameter 𝜆 and define
𝑌 = Σ𝑋𝑖 . 𝑌 = Σ𝑋𝑖 is sufficient for 𝜆 and 𝑌 = Σ𝑋𝑖 ∼ Poisson (𝑛𝜆) . Applying the above
method, if 𝑌 = 𝑦0 is observed, we are led to solve for 𝜆1 and 𝜆2 in the equations

a) If you want an exact confidence interval (then we do a trick):


𝛼 𝛼
∑t𝑢=0 𝑓𝑇 (𝑢|𝜃1 (𝑡)) = , ∑∞𝑢=t 𝑓𝑇 (𝑢|𝜃2 (𝑡)) = 2
2
𝛼 𝛼
𝑃[𝑌 ≤ 𝑦0 |𝜆1 ] = , 𝑃[𝑌 ≥ 𝑦0 |𝜆2 ] = 2 . (3.19)
2

There is an interesting relationship between the Poisson and Gamma distributions:


If 𝑌 ~ Poisson(𝛽𝑥), then 𝑃[𝑌 ≥ 𝛾] = 𝑃[𝑋 < 𝑥], 𝑋 ∼Gamma(𝛾, 𝛽) and 𝛾 is an integer
𝛽 2 2𝛾 1
Further, 2𝛽𝑋 ∼ Gamma (𝛾, 2𝛽) ∼ Gamma ( 2 , 2) ~𝜒2𝛾 .

𝑌 = Σ𝑋𝑖 ∼ Poisson(𝑛𝜆). So, for the first equation in (3.19) it follows that
𝛼 𝛼
𝑃[𝑌 ≤ 𝑦0 |𝜆1 ] = 2 , 𝑃[𝑌 ≥ 𝑦0 |𝜆2 ] = 2 .
𝛼 𝛼
𝑃[𝑌 ≥ 𝑦0 + 1|𝜆1 ] = 1 − 2 , 𝑃[𝑌 ≥ 𝑦0 |𝜆2 ] = 2
𝛼 𝛼
𝑃[𝐺𝑎𝑚𝑚𝑎(𝑦0 + 1, n) < 𝜆1 |𝜆1 ] = 1 − 2 , 𝑃[𝐺𝑎𝑚𝑚𝑎(𝑦0 , n) < 𝜆2 |𝜆2 ] = 2
𝛼 𝛼
𝑃[2𝑛𝐺𝑎𝑚𝑚𝑎(𝑦0 + 1, n) < 2𝑛𝜆1 |𝜆1 ] = 1 − 2 , 𝑃[2𝑛𝐺𝑎𝑚𝑚𝑎(𝑦0 , n) < 2𝑛𝜆2 |𝜆2 ] = 2
2 2 𝛼 𝛼
𝑃[𝜒2(𝑦 0 +1)
< 2𝑛𝜆1 ] = 1 − 2 , 𝑃[𝜒2𝑦 0
< 2𝑛𝜆2 ] = 2
(< is the same as ≤ with 𝜒 2 as it is a continuous random variable)
𝛼 𝛼
𝐹𝜒2 [2𝑛𝜆1 ] = 1 − , 𝐹𝜒2𝑦 2 [2𝑛𝜆2 ] =
2(𝑦0 +1) 2 2 0
𝛼 𝛼
2𝑛𝜆1 = 𝐹𝜒−1 2 [1 − ] , 2𝑛𝜆2 = 𝐹𝜒−1
2 [ ]
2(𝑦0 +1) 2 2𝑦0 2
1 −1 𝛼 1 −1 𝛼
𝜆1 = 2𝑛 𝐹𝜒2 [1 − 2 ], 𝜆2 = 𝐹2 [ ]
2(𝑦0 +1) 2𝑛 𝜒2𝑦0 2
1 2 1
𝜆1 = 2𝑛 𝜒2(𝑦 0 +1);1−𝛼/2
, 𝜆2 = 𝜒2 .
2𝑛 2𝑦0 ;𝛼/2
Now 𝜆2 < 𝜆1 and the confidence interval is
2 1 2 1
𝐶(𝑦)1−𝛼 = {𝜆: 2𝑛 𝜒2𝑦0 ;𝛼/2
≤ 𝜆 ≤ 2𝑛 𝜒2(𝑦0 +1);1−𝛼/2
}. (3.20)

(Note : A similar relationship exists between the Binomial and 𝐹 distributions.)

b) Otherwise an approximate confidence interval is given by (approximate because the discrete


situation is going to let us work with the “closest table value”):

Another way of finding the confidence interval is to solve for 𝜆1 in


𝛼 𝛼
( ∑t𝑢=0 𝑓𝑇 (𝑢|𝜃1 (𝑡)) = , ∑∞𝑢=t 𝑓𝑇 (𝑢|𝜃2 (𝑡)) = 2
2
𝛼 𝛼
𝑃[𝑌 ≤ 𝑦0 |𝜆1 ] = , 𝑃[𝑌 ≥ 𝑦0 |𝜆2 ] = 2 . (3.19)
2
𝛼 𝛼
𝑃[𝑌 ≤ 𝑦0 |𝜆1 ] = 2
, 𝑃[𝑌 < 𝑦0 |𝜆2 ] = 1 − 2 .
𝛼 𝛼
𝑃[𝑌 ≤ 𝑦0 |𝜆1 ] = , 𝑃[𝑌 ≤ 𝑦0 − 1|𝜆2 ] = 1 − 2 )
2
𝛼 𝛼
𝐹𝑌 (𝑦0 |𝜃1 (𝑡)) = 2 , 𝐹𝑌 (𝑦0 − 1|𝜃2 (𝑡)) = 1 − 2
(< is NOT the same as ≤ with 𝑃𝑜𝑖𝑠𝑠𝑜𝑛 as it is a discrete random variable)

This is easiest done with the help of Poisson tables, where the value of the parameter can be
read off for a given probability. However, tables are never comprehensive enough to obtain an
interval with the exact required confidence coefficient 1 − 𝛼.

If data is given: For a numerical example, consider 𝑛 = 10 and observing 𝑦0 = Σ𝑥𝑖 = 3.

a) If you want an exact confidence interval


According to (3.20), a 95% confidence interval for 𝜆 is
1 2 1 2
𝐶(𝑦0 )0,95 = {𝜆: 𝜒6;0.025 ≤ 𝜆 ≤ 20 𝜒8;0.975 }
20 (3.21)
= {𝜆: 0.0619 ≤ 𝜆 ≤ 0.8768} .

b) The approximate confidence interval is given by:


Using the Poisson tables, we want to find 𝜆1 so that
𝐹𝑌 (3|𝜃1 (𝑡)) = 0.025, 𝐹𝑌 (2|𝜃2 (𝑡)) = 0.975
𝐹𝑌 (3|𝜃1 (𝑡) = 9) = 0.0212, 𝐹𝑌 (2|𝜃2 (𝑡) = 0.6) = 0.9769

So our confidence coefficient is not 0.95, but 0.9769 – 0.0212 = 0.9557, and
𝐶(𝑦0 )0.9557 = {𝜆: 0.6 ≤ 𝑛𝜆 ≤ 9.0}
(3.22)
= {𝜆: 0.06 ≤ 𝜆 ≤ 0.90} ,
which is reasonably close to (3.21).

Remember 𝑌 = Σ𝑋𝑖 ∼ Poisson(𝑛𝜆) so we had an approximate 95% confidence interval for 𝑛𝜆


that we converted to an approximate 95% confidence interval for 𝜆. (Actually an exact 95.57%
confidence interval.)
3.3 Evaluating Interval Estimators
We now have seen many methods for deriving confidence sets and, in fact, we can derive
different confidence sets for the same problem. In such situations we would, of course, want to
choose a best one. Therefore, we now examine some methods and criteria for evaluating set
estimators.

In set estimation two quantities vie against one another, size and coverage probability. Naturally,
we want our set to have small size and large coverage probability. The coverage probability of a
confidence set will, except in special cases, be a function of the parameter so there is not one
value to consider, but an infinite number of values. For the most part, however, we will measure
coverage probability performance by the confidence coefficient, the infimum of the coverage
probabilities. When we speak of the size of a confidence set we will usually mean the length of
the confidence set, if the set is an interval.

3.3.1 Size and Coverage Probability

We now consider what appears to be a simple, constrained minimization problem. For a given,
specified coverage probability find the confidence interval with the shortest length. We first
consider an example.

Example 3.11 : Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) where 𝜎 2 = 1 . Using the fact that


𝑍 = √𝑛 (𝑋 − 𝜇)/1 is a pivot with a standard normal distribution, any 𝑎 and 𝑏 that satisfy
"1 − 𝛼 𝐶. 𝐼. 𝑓𝑜𝑟 𝑍" = 𝑃[𝑎 ≤ 𝑍 ≤ 𝑏] = 1 − 𝛼 will give a 1 − 𝛼 confidence interval for 𝜇 ;
𝑏 𝑎
"1 − 𝛼 𝐶. 𝐼. 𝑓𝑜𝑟 𝜇" = 𝑃 [𝑥 − ≤𝜇≤𝑥− ]=1−𝛼
√ 𝑛 √𝑛
𝑏 𝑎
{𝜇: 𝑥 − ≤𝜇≤𝑥− }.
√𝑛 √𝑛

Which choice of 𝑎 and 𝑏 is best? The length of the interval is

𝑎 𝑏 1
𝐿1−𝛼 𝐶.𝐼. 𝑓𝑜𝑟 𝜇 = 𝐿 = [𝑥 − ] − [𝑥 − ]= (𝑏 − 𝑎) ∝ 𝑏 − 𝑎 = 𝐿1−𝛼 𝐶.𝐼. 𝑓𝑜𝑟 𝑍 , (3.23)
√ 𝑛 √ 𝑛 √𝑛

1
but since is constant, we want to minimize 𝑏 − 𝑎 while maintaining 1 − 𝛼 coverage.
√𝑛
There are an infinite number of choices for 𝑎 and 𝑏. The traditional values used are 𝑏 = 𝑧1−𝛼/2
and 𝑎 = 𝑧𝛼/2 = −𝑧1−𝛼/2 . The following theorem will show whether this choice is optimal.

Theorem 3.4 : Let 𝑓(𝑥) be a unimodal pdf. If the interval [𝑎, 𝑏] satisfies
𝑏
a) 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = ∫𝑎 𝑓(𝑥)𝑑𝑥 = 1 − 𝛼,
b) 𝑓(𝑎) = 𝑓(𝑏), and
c) 𝑎 ≤ 𝑥 ∗ ≤ 𝑏, where 𝑥 ∗ is the mode of 𝑓(𝑥),
then [𝑎, 𝑏] is the shortest interval satisfying (a).
Proof : Let [𝑎′, 𝑏′] be any interval with 𝑏′ − 𝑎′ < 𝑏 − 𝑎 . We will show that this implies
𝑏′
∫𝑎′ 𝑓(𝑥)𝑑𝑥 ≤ 1 − 𝛼. The result will be proved only for 𝑎′ ≤ 𝑎, the proof being similar if 𝑎 < 𝑎′.
Also, two cases need to be considered, 𝑏′ ≤ 𝑎 and 𝑏′ > 𝑎.
If 𝑏′ ≤ 𝑎, then 𝑎′ ≤ 𝑏′ ≤ 𝑎 ≤ 𝑥 ∗ and

𝑏′
∫𝑎′ 𝑓(𝑥)𝑑𝑥 ≤ 𝑓(𝑏′)(𝑏′ − 𝑎′) (𝑥 ≤ 𝑏′ ≤ 𝑥 ∗ ⇒ 𝑓(𝑥) ≤ 𝑓(𝑏′))
≤ 𝑓(𝑎)(𝑏′ − 𝑎′) (𝑏′ ≤ 𝑎 ≤ 𝑥 ∗ ⇒ 𝑓(𝑏′) ≤ 𝑓(𝑎))
< 𝑓(𝑎)(𝑏 − 𝑎) (𝑏′ − 𝑎′ < 𝑏 − 𝑎 𝑎𝑛𝑑 𝑓(𝑎) > 0)
𝑏
≤ ∫𝑎 𝑓(𝑥)𝑑𝑥 ((𝑎), (𝑏), 𝑎𝑛𝑑 𝑢𝑛𝑖𝑚𝑜𝑑𝑎𝑙𝑖𝑡𝑦 ⇒ 𝑓(𝑥) ≥ 𝑓(𝑎) 𝑓𝑜𝑟 𝑎 ≤ 𝑥 ≤ 𝑏)
= 1−𝛼,

𝑏′
Thus ∫𝑎′ 𝑓(𝑥)𝑑𝑥 < 1 − 𝛼 (thus we have a contradiction thus the assumption is false)
completing the proof in the first case.

If 𝑏′ > 𝑎 , then 𝑎′ ≤ 𝑎 < 𝑏′ < 𝑏 for, if 𝑏′ were greater than or equal to 𝑏 , then 𝑏′ − 𝑎′
would be greater than or equal to 𝑏 − 𝑎. In this case, we can write
𝑏′ 𝑏 𝑎 𝑏
∫𝑎′ 𝑓(𝑥)𝑑𝑥 = ∫𝑎 𝑓(𝑥)𝑑𝑥 + [∫𝑎′ 𝑓(𝑥)𝑑𝑥 − ∫𝑏′ 𝑓(𝑥)𝑑𝑥]
𝑎 𝑏
= (1 − 𝛼) + [∫𝑎′ 𝑓(𝑥)𝑑𝑥 − ∫𝑏′ 𝑓(𝑥)𝑑𝑥]
and the theorem will be proved if we show that the expression
in square brackets is negative. Now, using the unimodality of 𝑓,
the ordering 𝑎′ ≤ 𝑎 < 𝑏′ < 𝑏, and (b),
𝑎
∫𝑎′ 𝑓(𝑥)𝑑𝑥 ≤ 𝑓(𝑎)(𝑎 − 𝑎′)
𝑏
and ∫𝑏′ 𝑓(𝑥)𝑑𝑥 ≥ 𝑓(𝑏)(𝑏 − 𝑏′) .
𝑏
so − ∫𝑏′ 𝑓(𝑥)𝑑𝑥 ≤ −𝑓(𝑏)(𝑏 − 𝑏′)
𝑎 𝑏
Thus [∫𝑎′ 𝑓(𝑥)𝑑𝑥 − ∫𝑏′ 𝑓(𝑥)𝑑𝑥] ≤ 𝑓(𝑎)(𝑎 − 𝑎′) − 𝑓(𝑏)(𝑏 − 𝑏′)
= 𝑓(𝑎)(𝑎 − 𝑎′ ) − 𝑓(𝑎)(𝑏 − 𝑏 ′ ) (𝑓(𝑎) = 𝑓(𝑏))
= 𝑓(𝑎)[(𝑎 − 𝑎′) − (𝑏 − 𝑏′)]
= 𝑓(𝑎)[𝑎 − 𝑎′ − 𝑏 + 𝑏′]
= 𝑓(𝑎)[𝑏 ′ − 𝑎′ − 𝑏 + 𝑎]
= 𝑓(𝑎)[(𝑏 ′ − 𝑎′ ) − (𝑏 − 𝑎)] < 0,
which is negative if (𝑏′ − 𝑎′) < (𝑏 − 𝑎) and 𝑓(𝑎) > 0.

𝑏′ 𝑎 𝑏
∫𝑎′ 𝑓(𝑥)𝑑𝑥 = (1 − 𝛼) + [∫𝑎′ 𝑓(𝑥)𝑑𝑥 − ∫𝑏′ 𝑓(𝑥)𝑑𝑥]
= (1 − 𝛼) + 𝑛𝑒𝑔. 𝑛𝑢𝑚𝑏𝑒𝑟
< 1−𝛼
𝑏′
Thus ∫𝑎′ 𝑓(𝑥)𝑑𝑥 < 1 − 𝛼 (thus we have a contradiction thus the assumption is false)
completing the proof in the second case.
Example 3.11 (Cont.) : According to Theorem 3.4 the value of the density function should be
the same at 𝑎 and 𝑏. So chosing 𝑎 = 𝑧𝛼/2 = −𝑧1−𝛼/2 and 𝑏 = 𝑧1−𝛼/2 (obviously (a) holds):
since the distribution in question here is the standard normal, and thus symmetric around zero
((c) holds), it means that 𝑓(−𝑧1−𝛼/2 ) = 𝑓(𝑧1−𝛼/2 ) ((b) holds), and the equal tail interval is
indeed the shortest possible 1 − 𝛼 interval so 𝐿1−𝛼 𝐶.𝐼. 𝑓𝑜𝑟 𝑍 is minimised, so 𝐿1−𝛼 𝐶.𝐼. 𝑓𝑜𝑟 𝜇
1 1 1
is minimised with length 𝐿 = [b − a] = [𝑧1−𝛼 − (−𝑧1−𝛼 )] = 2𝑧1−𝛼/2 .
√𝑛 √𝑛 2 2 √𝑛

So for any symmetric unimodal distribution, (like the 𝑡–distribution) the shortest interval will by
symmetric around the mean, given the interval is a function of 𝑏 − 𝑎.

Corollary 3.1 : Let 𝑓(𝑥) be a strictly decreasing pdf on [0, ∞). Of all intervals [𝑎, 𝑏]
𝑏
that satisfy ∫𝑎 𝑓(𝑥)𝑑𝑥 = 1 − 𝛼, the shortest is obtained by choosing 𝑎 = 0 and 𝑏 so that
𝑏
𝑃(0 ≤ 𝑋 ≤ 𝑏) = ∫0 𝑓(𝑥)𝑑𝑥 = 1 − 𝛼.

The proof of this corollary follows the same line as Theorem 3.4, see Exercise 5.

NOTE : It is important to note that the previous theorem and corollary only applies when the
interval length is a function of 𝑏 − 𝑎, which is usually only the case when working with a location
problem. In scale cases the theorem may not be applicable, as in the following example.

1 2𝑋
Example 3.12 : Let 𝑋 ∼ Gamma(𝑘, 𝛽) with 𝑘 known. The quantity 𝑌 = 𝛽
is a pivot with
2
𝑌∼ 𝜒2𝑘 , so we can get a confidence interval by finding constants 𝑎 and 𝑏 to satisfy
𝑃[𝑎 < 𝑌 < 𝑏] = 1 − 𝛼 .

However, Theorem 3.4 will not give the shortest interval since the interval for 𝛽 is of the form
2𝑥 2𝑥
𝐶(𝑦)1−𝛼 = {𝛽: ≤𝛽≤ } (3.24)
𝑏 𝑎
with length
2𝑥 2𝑥 1 1 𝑏−𝑎 𝑏−𝑎
𝐿1−𝛼 𝐶.𝐼.𝑓𝑜𝑟 𝛽 = − = 2𝑥 (𝑎 − 𝑏) = 2𝑥 ( 𝑎𝑏 ) ∝
𝑁𝑂𝑇 𝑃𝑅𝑂𝑃 𝑏 − 𝑎 = 𝐿1−𝛼 𝐶.𝐼.𝑓𝑜𝑟 𝑌 ,
𝑎 𝑏 𝑎𝑏
(3.25)
𝑏−𝑎
that is, it is 𝐿 is proportional to 𝑎𝑏 and not 𝑏 − 𝑎. Thus theorem 3.4. is useless. It is however
not easy to solve for 𝑎 and 𝑏 in this case, and in practice the equal–tail intervals are often
used if the distribution is not too skew.

2
Notice that if 𝑋 is defined in the example as 𝑋 ∼ Gamma(𝑘, 𝛽), then 𝑌 = 2𝛽𝑋 ∼ 𝜒2𝑘 , and
the confidence interval for 𝛽 is of the form
𝑎 𝑏
{𝛽: ≤ 𝛽 ≤ 2𝑥} (3.26)
2𝑥
1
with length 𝐿1−𝛼 𝐶.𝐼.𝑓𝑜𝑟 𝛽 = (𝑏 − 𝑎) ∝ 𝑏 − 𝑎 = 𝐿1−𝛼 𝐶.𝐼.𝑓𝑜𝑟 𝑌 . Thus we would use corollary
2𝑥
2
3.1. i.e. we would choose 𝑎 = 0 and 𝑏 = 𝜒1−α ,2𝑘 as the pdf is decreasing w.r.t. 𝑦. If the pdf
2
was unimodal, we would be able to use theorem 3.4., i.e. we would choose 𝑎 = 𝜒α/2 ,2𝑘 and
2
𝑏 = 𝜒1−α/2 ,2𝑘 .
3.3.2 Test–Related Optimality

Since there is a one-to-one correspondence between confidence sets and tests of hypotheses
(Theorem 9.2.1), there is some correspondence between optimality of tests and optimality of
confidence sets. Usually, test–related optimality properties of confidence sets do not directly
relate to the size of the set but rather to the probability of the set covering false values.

Consider the general situation, where 𝑋 ∼ 𝑓(𝑥|𝜃) and we construct a 1 − 𝛼 confidence


interval for 𝜃, 𝐶(𝑋), by inverting an acceptance region. The probability of false coverage is the
function of 𝜃 and 𝜃′ given by

𝑃𝜃 [𝜃′ ∈ 𝐶(𝑋)] , 𝜃 ≠ 𝜃′ , (3.27)

the probability of covering 𝜃′ when 𝜃 is the true parameter.

Definition 3.5 : A 1 − 𝛼 confidence set that minimizes the probability of false coverage over a
class of 1 − 𝛼 confidence sets is called a uniformly most accurate (UMA) confidence set.

We know that a uniformly most powerful test (UMP test) minimizes the probability of accepting
the null hypothesis over all parameter values outside the null hypothesis. {𝛽(𝜃) ≥ 𝛽′(𝜃) for all
𝜃 ∈ Ω0 }. It can be shown that this is equivalent to minimizing the probability of false coverage
when inverting the acceptance region. So UMA confidence intervals are constructed by inverting
UMP tests. Unfortunately UMP tests are one–sided, so that UMA intervals are also one–sided.

Example 3.13 : Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ), where 𝜎 2 is known. The interval


𝜎
𝐶(𝑥)1−𝛼 = {𝜇: 𝜇 ≥ 𝑥 − 𝑧1−𝛼 } (3.28)
√𝑛

𝜎
[ 𝑥 − 𝑧1−𝛼 , ∞) is a 1 − 𝛼 UMA lower confidence bound since it can be obtained by
√𝑛
inverting the UMP test of 𝐻0 : 𝜇 ≤ 𝜇0 versus 𝐻1 : 𝜇 > 𝜇0 .

The property of unbiasedness when testing a two–sided hypothesis can also be transferred to
confidence intervals. Remember that an unbiased test is one in which the power under the
alternative is always greater than the power under the null hypothesis.

Definition 3.6 : A 1 − 𝛼 confidence set 𝐶(𝑋) is unbiased if 𝑃𝜃 [𝜃′ ∈ 𝐶(𝑋)] ≤ 1 − 𝛼 for all
𝜃 ≠ 𝜃′.

Thus, for an unbiased confidence set, the probability of false coverage is never more than the
minimum probability of true coverage. Again, the inversion of an unbiased test, or unbiased UMP
test, will result in an unbiased, or unbiased UMA, confidence interval.
3.4 Approximate Confidence Intervals
As with tests, we close this chapter with some approximate and asymptotic (result holds for
𝑛 ⟶ ∞, thus it holds approximately when n is large) versions of confidence sets. These methods
can be of use in complicated situations where other methods have failed.

3.4.1 Approximate Maximum Likelihood Intervals


If 𝑋1 , … , 𝑋𝑛 ∼ 𝑓(𝑥|𝜃) and 𝜃̂ is the MLE of 𝜃 (then by the invariance property of MLE’s we
have that ℎ(𝜃̂) is the MLE of ℎ(𝜃)), then the variance of a function ℎ(𝜃̂) can be approximated
[ℎ′ (𝜃)]2 |𝜃=𝜃
𝑉𝑎𝑟 (ℎ(𝜃̂)|𝜃) ≈
̂
by ̂ . (3.29)
𝜕2
− 2 ℓ𝑛 𝐿(𝜃| 𝒙)|𝜃=𝜃̂
𝜕𝜃
̂ )−ℎ(𝜃)
ℎ(𝜃
Then, under quite general conditions, we have that ⟶ 𝑁(0,1) , (3.30)
̂ )|𝜃)
̂ (ℎ(𝜃
√ 𝑉𝑎𝑟

giving the approximate confidence interval for ℎ(𝜃) (by using the pivotal quantity method);
̂ )−ℎ(𝜃)
ℎ(𝜃
i.e. 𝑃 [−𝑧1−𝛼 ≤ ≤ 𝑧1−𝛼 ] ≈ 1 − 𝛼
2 √ 𝑉𝑎𝑟 ̂ )| 𝜃 )
̂ (ℎ(𝜃 2

yielding 𝑃 [ℎ(𝜃̂ ) − 𝑧1−𝛼 √ ̂


𝑉𝑎𝑟 (ℎ(𝜃̂)|𝜃) ≤ ℎ(𝜃) ≤ ℎ(𝜃̂ ) + 𝑧1−𝛼 √ ̂
𝑉𝑎𝑟 (ℎ(𝜃̂)|𝜃)] ≈ 1 − 𝛼
2 2

𝐶(𝑿)1−𝛼 ≈ {ℎ(𝜃): ℎ(𝜃̂) − 𝑧1−𝛼/2 √ ̂


𝑉𝑎𝑟 (ℎ(𝜃̂)|𝜃) ≤ ℎ(𝜃) ≤ ℎ(𝜃̂) + 𝑧1−𝛼/2 √ ̂
𝑉𝑎𝑟 (ℎ(𝜃̂)|𝜃)} .
(3.31)

Example 3.14 : Consider a sample 𝑋1 , … , 𝑋𝑛 from a Bernoulli (p) distribution. We want a


𝑝 1 𝑝
confidence interval for the odds, ℎ(𝑝) = 1−𝑝. The MLE of 𝑝 is 𝑝̂ = 𝑛 Σ𝑥𝑖 . So let ℎ(𝑝) = 1−𝑝 ,
then
1
ℎ′(𝑝) = (1−𝑝)2
1
[ℎ′ (𝑝)]2 |𝑝=𝑝̂ = (1−𝑝̂)2
𝑎𝑛𝑑 𝐿(𝜃| 𝒙) = 𝑝 Σ𝑥𝑖 (1
− 𝑝)(𝑛−Σ𝑥𝑖 )
ℓ𝑛 𝐿(𝜃| 𝒙) = Σ𝑥𝑖 ℓ𝑛 𝑝 + (𝑛 − Σ𝑥𝑖 )ℓ𝑛(1 − 𝑝)
= 𝑛𝑝̂ ℓ𝑛 𝑝 + (n − 𝑛𝑝̂ )ℓ𝑛(1 − 𝑝)
𝜕 𝑛𝑝̂ 𝑛(1−𝑝̂)
ℓ𝑛 𝐿(𝜃| 𝒙) = −
𝜕𝑝 𝑝 1−𝑝
𝜕2 𝑛𝑝̂ 𝑛(1−𝑝̂)
− 𝜕𝑝2 ℓ𝑛 𝐿(𝜃| 𝒙) = + (1−𝑝)2
𝑝2
𝜕2 𝑛 𝑛 𝑛(1−𝑝̂) 𝑛𝑝̂
− 𝜕𝑝2 ℓ𝑛 𝐿(𝜃| 𝒙)|𝑝=𝑝̂ = + (1−𝑝̂) = 𝑝̂(1−𝑝̂) + 𝑝̂(1−𝑝̂) = ⋯
𝑝̂
𝑛
= .
𝑝̂(1−𝑝̂)
So
𝑝̂
̂
𝑉𝑎𝑟 (ℎ(𝑝̂ )|𝑝) ≈ ,
𝑛(1−𝑝̂)3
and the approximate 1 − 𝛼 confidence interval is given by

𝑝̂ 𝑝̂ 𝑝 𝑝̂ 𝑝̂
− 𝑧1−𝛼/2 √𝑛(1−𝑝̂)3 ≤ 1−𝑝 ≤ 1−𝑝̂ + 𝑧1−𝛼/2 √𝑛(1−𝑝̂)3 .
1−𝑝̂

3.4.2 Other Approximate Intervals

If we have any statistics 𝑊 and 𝑉 and a parameter 𝜃 such that, as 𝑛 → ∞,


𝑊−𝜃
→ 𝑁(0,1) ,
𝑉

then we can form an approximate interval for 𝜃 given (by using the pivotal quantity method) by

𝑊−𝜃
−𝑧1−𝛼/2 ≤ ≤ 𝑧1−𝛼/2
𝑉
𝑊 − 𝑧1−𝛼/2 𝑉 ≤ 𝜃 ≤ 𝑊 + 𝑧1−𝛼/2 𝑉

In particular, if 𝑋1 , … , 𝑋𝑛 is a random sample from a population with mean 𝜇 and variance 𝜎 2 ,


then, from the Central Limit Theorem,
𝑋−𝜇
𝜎 → 𝑁(0,1).
√𝑛
𝑋−𝜇
(for any distribution. However, if the data is normal, this will be 𝜎 ~𝑁(0,1) yeilding an
√𝑛
exact result.)
𝑋−𝜇
−𝑧1−𝛼/2 ≤ 𝜎 ≤ 𝑧1−𝛼/2
√𝑛
𝜎 𝜎
𝑋 − 𝑧1−𝛼/2 ≤ 𝜃 ≤ 𝑋 + 𝑧1−𝛼/2 *
√𝑛 √𝑛

Moreover, from Slutsky’s Theorem, if 𝑠 2 → 𝜎 2 in probability, then (for when 𝜎 is unknown):


𝑋−𝜇
𝑠 → 𝑁(0,1).
√𝑛
𝑋−𝜇
−𝑧1−𝛼/2 ≤ 𝑠 ≤ 𝑧1−𝛼/2
√𝑛
𝑠 𝑠
𝑋 − 𝑧1−𝛼/2 ≤ 𝜃 ≤ 𝑋 + 𝑧1−𝛼/2
√𝑛 √𝑛
i.e. the first result* is already an approximate result, so further approximating 𝜎 with 𝑠 still yields
an approximate result. If the data was normally distributed, the first result would have been an
exact result, and if we used the second result, we would rather get an exact result by using the t-
𝑋−𝜇
distribution (when the data is normal) as then 𝑠 ∼ 𝑡𝑛−1 and we would use the pivotal
√𝑛
quantity method i.e. for Xi ~ N(), we would get an exact confidence interval as
𝑠 𝑠
𝑋 − 𝑡1−𝛼,n−1 ≤ 𝜃 ≤ 𝑋 + 𝑡1−𝛼,n−1
2 √𝑛 2 √𝑛
In the above case we get an approximate interval without specifying the sampling distribution.
We can do even better when we do specify the form of the distribution.

Example 3.15 : If 𝑋1 , … , 𝑋𝑛 ∼ Pois(𝜆) we know that 𝐸[𝑋] = 𝜆,


Option 1: We can use the following:
𝑋−𝜇 𝑋−𝜆
𝜎 ≈ 𝑠 → 𝑁(0,1) (3.32)
√𝑛 √𝑛

1
where 𝑠 2 = 𝑛−1 Σ(𝑥𝑖 − 𝑥)2.

Option 2: We can use the following:


However, this is true even if we did not know the sample is from a Poisson distribution, as long
as we want an interval for the mean. Since we know that 𝜎 2 = Var[𝑋] = 𝜆 = 𝐸(𝑋) ≈ 𝑋 also,
with estimator 𝑋, the variance can be estimated by 𝑋, and then

𝑋−𝜇 𝑋−𝜆
𝜎 ≈ √𝑋
→ 𝑁(0,1) (3.33)
√𝑛
√𝑛
is another approximation. (This trick doesn’t work with any distribution – it works for example
with the Poisson distribution because 𝜎 2 = Var [𝑋] = 𝜆 = 𝐸(𝑋) ≈ 𝑋 for the Poisson
distribution.)

Option 3: We can use the following:


Since the variance is a function of the parameter, a third approximation is
𝑋−𝜇 𝑋−𝜆
𝜎 = √𝜆
→ 𝑁(0,1) . (3.34)
√𝑛 √𝑛

This approximation is the best since it uses the fewest number of estimators. However it is not
directly invertable since the variance contains the unknown parameter. The confidence interval
is of the form

𝑋−𝜆 𝑋−𝜆 𝑋−𝜆


{𝜆: −𝑧1−𝛼 ≤ √𝜆
≤ 𝑧1−𝛼 } = {𝜆: −𝑧1−𝛼 ≤ √𝜆
; √𝜆
≤ 𝑧1−𝛼 }
2 2 2 2
√𝑛 √𝑛 √𝑛

𝑋−𝜆 𝑋−𝜆 𝑋−𝜆 𝑋−𝜆 𝑋−𝜆


= {𝜆: 𝑧1−𝛼 ≥ − √𝜆
; √𝜆
≤ 𝑧1−𝛼/2 } = {𝜆: − √𝜆
≤ 𝑧1−𝛼 ; √𝜆
≤ 𝑧1−𝛼/2 } = {𝜆: ± √𝜆
≤ 𝑧1−𝛼 }
2 2 2
√𝑛 √𝑛 √𝑛 √𝑛 √𝑛

𝑥−𝜆
= {𝜆: | | ≤ 𝑧1−𝛼/2 } .
√𝜆/𝑛

After squaring both sides and rearranging terms, we have


2
𝑥−𝜆 2 2 2 𝜆
2 2 2 𝜆
{𝜆: ( ) ≤ 𝑧1− 𝛼 } = {𝜆: (𝑥 − 𝜆) ≤ 𝑧1−𝛼/2 } = {𝜆: 𝑥 − 2𝑥𝜆 + 𝜆 ≤ 𝑧1−𝛼/2 } ,
√𝜆/𝑛 2 𝑛 𝑛

or the quadratic equation


2 1 2
{𝜆: 𝜆2 − (2𝑥 + 𝑧1−𝛼/2 )𝜆 + 𝑥 ≤ 0} . (3.35)
𝑛
2 1 2
In the form 𝑎𝜆2 + 𝑏𝜆 + c ≤ 0 where 𝑎 = 1, 𝑏 = −(2𝑥 + 𝑛 𝑧1−𝛼/2 ), 𝑐 = 𝑥

−𝑏±√𝑏2 −4𝑎𝑐
For 𝑎𝜆2 + 𝑏𝜆 + c = 0 then the solution would be 𝜆 = 2𝑎
𝑎 = 1 > 0 therefore the arms go up.
−𝑏−√𝑏2 −4𝑎𝑐 −𝑏+√𝑏2 −4𝑎𝑐
Thus we know ≤𝜆≤
2𝑎 2𝑎

Since the coefficient of 𝜆2 is positive, the inequality is satisfied if 𝜆 lies between the two roots
of the quadratic. These roots are
−𝑏±√𝑏 2 −4𝑎𝑐 1 1 1 2 2
2
= [(2𝑥 + 𝑛 𝑧1−𝛼/2 ) ± √(2𝑥 + 𝑛 𝑧1−𝛼/2
2
) − 4𝑥 ]
2𝑎 2
(3.36)
21 2 1 𝑥
= 𝑥 + 2𝑛 𝑧1−𝛼/2 ± 𝑧1−𝛼/2 √4𝑛2 𝑧1−𝛼/2 +𝑛

Notice if 𝑛 gets large and we let terms of order 𝑛−1 go to zero, then (3.36) becomes
𝑥
≈ 𝑥 ± 𝑧1−𝛼/2 √𝑛 (3.37)
𝑥 𝑥
i.e. we get 𝑥 − 𝑧1−𝛼 √𝑛 ≤ 𝜆 ≤ 𝑥 + 𝑧1−𝛼/2 √𝑛 (using option 3).
2

Option 2 would have given the same answer i.e. which is the same interval given by (3.33). So for
large 𝑛 there will be little difference between the different approaches.

Section 3.4.1. would have given the same answer: Notice also that if the approach of Section
3.4.1 is folowed with ℎ(𝜆) = 𝜆, the resulting interval will also be as in (3.37). The approach of
Section 3.4.1 is usually only followed when we want a confidence interval for a function of the
parameter.

𝑠 𝑠
Option 1 would have given: 𝑋 − 𝑡1−𝛼,n−1 ≤ 𝜆 ≤ 𝑋 + 𝑡1−𝛼,n−1 .
2 √𝑛 2 √𝑛
EXERCISES 3

1. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇1 , 𝜎12 ) independent from 𝑌1 , … , 𝑌𝑚 ∼ 𝑁(𝜇2 , 𝜎22 ) . Find a 1 - 𝛼


confidence interval for 𝜇1 − 𝜇2 if (a) 𝜎12 and 𝜎22 are known ; (b) 𝜎12 = 𝜎22 = 𝜎 2 unknown.

2. For the situation in Problem 2 of Chapter 2, find a 1 - 𝛼 confidence interval for 𝜇1 − 𝜇2 .

𝜎2
3. For the situation in Problem 3 of Chapter 2, find a 1 - 𝛼 confidence interval for 𝜆 = 𝜎12 .
2

4. Let 𝑋 be a single observation from the density 𝑓(𝑥|𝜃) = 𝜃𝑥 𝜃−1 , 0 ≤ 𝑥 ≤ 1 , where


𝜃 > 0.

(a) Find a pivotal quantity and confidence interval for 𝜃.

1 2
(b) Let 𝑌 = −ℓ𝑛 𝑋. What is the confidence coefficient of the interval [𝑦 , 𝑦]? Also, find a better
interval for 𝜃.

5. Prove Corollary 3.1.

6. Let 𝑋1 , … , 𝑋𝑛 be distributed as 𝑓(𝑥|𝜃) = 𝑒 −(𝑥−𝜃) , 𝜃 ≤ 𝑥 < ∞.


Use each of the three methods described in this chapter (see also Example 2.5) to find a 1 - 𝛼
confidence interval for 𝜃. Which interval is the best? If 𝑛 = 6 with = (7; 9; 4; 6; 12; 8), find the
best 95% confidence interval for 𝜃.

7. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜃, 𝜃) , 𝜃 > 0. Give three examples of pivotal quantities for 𝜃 and obtain
the 1 - 𝛼 confidence intervals.

8. How large a sample must be taken from a 𝑁(𝜇, 𝜎 2 ) distribution if the length of a 95%
1
confidence interval must be not larger than 2 𝜎? Assume 𝜎 2 known.

9. 𝑋 is a single observation from


𝑓(𝑥|𝜃) = 𝜃𝑒 −𝜃𝑥 , 0 ≤ 𝑥 < ∞ , 𝜃 > 0 .
1
(a) (𝑋, 2𝑋) is a confidence interval for 𝜃. What is the confidence coefficient?

1
(b) Find another confidence interval for with the same coefficient but smaller expected
𝜃
length.
10. Let 𝑋1 , … , 𝑋𝑛 be a random sample from

𝑓(𝑥|𝜃) = 𝜃𝑒 −𝜃𝑥 , 0 ≤ 𝑥 ≤ ∞ , 𝜃 > 0 .

(a) Find a 1 - 𝛼 confidence interval for the mean of the population.

(b) Find a confidence interval for 𝑃[𝑋 > 1] = 𝑒 −𝜃 .

(c) If 𝑌1 = min{𝑋1 , … , 𝑋𝑛 }, find a pivotal quantity and interval for 𝜃, based only on 𝑌1 .

11. The breaking strengths in kg of five specimens of manila rope were found to be 330, 230,
270, 290 and 275.

(a) Find a 95% confidence interval for the mean breaking strength, assuming normality.

(b) Estimate the point at which only 5% of such specimens are expected to break.

(c) Find a 90% confidence interval for 𝜎 2 and 𝜎.

12. To test two new lines of hybrid corn under normal farming conditions, a seed company
selected eight farms at random and planted both lines in experimental plots on each farm. The
yield for the eight locations were

𝐿𝑖𝑛𝑒 𝐴 86 87 56 93 84 93 75 79
𝐿𝑖𝑛𝑒 𝐵 80 79 58 91 77 82 74 66

Assuming joint normality, estimate the difference between the mean yields by a 95% confidence
interval.

13. Let 𝑋1 , … , 𝑋𝑛 be distributed as

2𝑥
𝑓(𝑥|𝜃) = 𝜃2 , 0 < 𝑥 < 𝜃 , 𝜃 > 0 .

Find a 1 - 𝛼 confidence interval for 𝜃.

14. Let 𝑋1 , … , 𝑋𝑛 be a sample from a Gamma (𝑘, 𝛽) distribution with 𝑘 known. Find a
uniformly most accurate (UMA) 1 - 𝛼 confidence interval of the form (0, 𝑢( )] for 𝛽.
15. Let 𝑋 be one observation from the pdf

𝑒 𝑥−𝜃
𝑓(𝑥|𝜃) = (1+𝑒 𝑥−𝜃 )2 , −∞ < 𝑥 < ∞ , −∞ < 𝜃 < ∞ .

Find a UMA 1 - 𝛼 interval for 𝜃 of the form (−∞, 𝑢(𝑥)].

16. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝐸𝑋𝑃 (𝜆). Find a UMA 1 - 𝛼 confidence interval based on inverting the
UMP test of 𝐻0 : 𝜆 ≥ 𝜆0 versus 𝐻1 : 𝜆 < 𝜆0 . Find the expected length of this interval.

17. If 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) with 𝜎 2 unknown, find the shortest 1 - 𝛼 confidence interval for
𝜇.

18. A thumbtack is tossed 20 times and landed with the point up 14 times. Use the statistical
method and tables to find a confidence interval for 𝑝, the probability for “point up”, with
confidence coefficient of approximately 90%.

19. The following data, the number of aphids per row in nine rows of a potato field, can be
assumed to follow a Poisson distribution:
155, 104, 66, 50, 36, 40, 30, 35, 42.
Construct the best approximate 90% confidence interval for the mean number of aphids per row.
Compare with the exact result as obtained from Equation (3.20).

20. For a random sample 𝑋1 , … , 𝑋𝑛 from a Bernoulli(𝑝) distribution, find the best approximate
1 - 𝛼 confidence interval (as in Example 3.14) for 𝑝. Find also a simpler, but less accurate,
interval. Find the approximate 90% interval for the data in Exercise 19.

21. If 𝑋1 , … , 𝑋𝑛 ∼ 𝐵𝑒𝑟 (𝑝1 ) independently from 𝑌1 , … , 𝑌𝑚 ∼ 𝐵𝑒𝑟 (𝑝2 ), find an approximate


1 - 𝛼 confidence interval for 𝑝1 − 𝑝2.

22. Let 𝑋1 , … , 𝑋𝑛 be a sample from the Geometric distribution


𝑓(𝑥|𝑝) = 𝑝(1 − 𝑝)𝑥−1 , 𝑥 = 1,2, … ,0 < 𝑝 < 1 .
Find an approximate confidence interval for
1
(a) 𝐸[𝑋] = 𝑝 ; (b) 𝑃[𝑋 = 2] = 𝑝(1 − 𝑝).

23. Let 𝑋1 , … , 𝑋𝑛 be a sample from a Negative Binomial (𝑟, 𝑝) distribution with 𝑟 known ;
𝑟+𝑥−1 𝑟
𝑓(𝑥|𝑝) = ( ) 𝑝 (1 − 𝑝)𝑥 , 𝑥 = 0.1, … ,0 < 𝑝 < 1 .
𝑥
𝑟(1−𝑝)
(a) Find an approximate 1 − 𝛼 confidence interval for the mean, 𝐸[𝑋] = .
𝑝

(b) The aphid data of Exercise 20 can also be modelled using the negative binomial distribution
with 𝑟 = 2. Construct an approximate 90% confidence interval for the mean using the result in
part (a) and compare it with the result of Exercise 20.
24. One side of a square field is measured 9 times. The measuring instrument has a
measurement error that is normally distributed with standard deviation of one when the true
distance is approximately 9 meters. The mean length obtained from the 9 measurements is 9
meters. Find an approximate 99% confidence interval for the area of the field.

(Tables, and formulae are in the original notes)

You might also like