STSM3714 (With Notes From Class)
P.C.N. Groenewald
(Revised by KN Bekker, R Schall, S van der Merwe and M Sjölander)
Handbooks :
3. Introduction to the Theory of Statistics – A.M. Mood, F.A. Grabill & D.C. Boes, McGraw–Hill,
1974.
CONTENTS
1.1. Introduction
1.2. Testing simple hypotheses
1.3. Optimal tests for simple hypotheses
Exercises
2.1. Introduction
2.2. Generalized Likelihood Ratio Tests
2.3. Uniformly Most Powerful Tests
2.4. Two–sample Normal Tests
2.5. Chi–square Tests
2.6. Approximate Large–Sample Tests
Exercises
3.1. Introduction
3.2. Methods of finding Interval Estimators
3.3. Evaluating Interval Estimators
3.4. Approximate Confidence Intervals
Exercises
Chapter 1
Definitions and testing simple hypotheses
1.1 Introduction
There are two major areas of statistical inference: the estimation of parameters and the testing
of hypotheses. (Examples of hypotheses: 0 < µ < 5; or p = 0.2.) We shall study the second of these
two areas in this chapter. Our aim will be to develop general methods for testing hypotheses and
to apply those methods to some common problems.
In experimental research, the object is sometimes merely to estimate parameters. Thus one may
wish to estimate the yield of a new hybrid line of corn. But more often the ultimate purpose will
involve some use of the estimate. One may wish, for example, to compare the yield of the new
line with that of a standard line and perhaps recommend that the new line replace the standard
line if it appears superior. This is a common situation in research. One may wish to determine
whether a new method of sealing light bulbs will increase the life of the bulbs, whether a new
germicide is more effective in treating a certain infection than a standard germicide, whether
one method of preserving foods is better than another insofar as retention of vitamins is
concerned, and so on.
Note that in all these examples we are trying to discover something about a parameter of a
distribution, but rather than estimating it, we simply want to find out whether it falls in a
particular region of the parameter space.
The following definition of a hypothesis is rather general, but the important point is that a
hypothesis makes a statement about the population. The goal of a hypothesis test is to decide,
based on a sample from the population, which of two complementary hypotheses is true.
Definition 1.2 : The two complementary hypotheses in a hypothesis testing problem are called the
null hypothesis and the alternative hypothesis. They are denoted by 𝐻0 and 𝐻1 , respectively.
(e.g. H0: µ = 1400 ; H1: µ > 1400)
It is important to remember that the conjecture that we want to find evidence for is the
alternative hypothesis. The null hypothesis refers to the neutral position or status quo.
Notation : When drawing a random sample from a distribution 𝑓(𝑥|𝜃) with parameter 𝜃, we
will denote the sample space by 𝑆 and the parameter space by Ω, so that (𝑥1 , 𝑥2 , … , 𝑥𝑛 ) ∈ 𝑆
and 𝜃 ∈ Ω.
Definition 1.3 : A statistical test or hypothesis test is a rule that specifies for which values of the
sample the null hypothesis should be rejected.
e.g. reject H0 if 𝑥 ≥ 1500
We will use the capital gamma, Γ, to denote a test. The hypotheses are of the form
𝐻0 : 𝜃 ∈ Ω0 ,
𝐻1 : 𝜃 ∈ Ω1 ,   (1.1)
where Ω0 ∩ Ω1 = ∅ and Ω0 ∪ Ω1 = Ω (Ω0 and Ω1 are disjoint and mutually exhaustive).
(e.g. Ω0 = {1400} and Ω1 = (1400, ∞))
(e.g. Ω0 = (−∞, 1400] and Ω1 = (1400, ∞), thus Ω = (−∞, ∞))
(e.g. H0: µ ≤ 1400 ; H1: µ > 1400)
Definition 1.4 : The critical region or rejection region is that subset 𝑅 of the sample space 𝑆
for which 𝐻0 would be rejected if (𝑥1 , … , 𝑥𝑛 ) ∈ 𝑅.
e.g. 𝑅 = {(𝑥1, …, 𝑥𝑛) : (1/n) Σ 𝑥𝑖 ≥ 1500}
i.e. reject H0 if we get a sample with the property x̄ ≥ 1500
i.e. reject H0 if x̄ ≥ 1500
e.g.
X ~ N(μ ,25)
H0: μ ≤ 1000 (null hypothesis) Ω0 = (-∞,1000]
H1: μ > 1000 (alternative hypothesis) Ω1 = (1000, ∞) Ω = (-∞,∞)
Example of a statistical test / hypothesis test: Reject H0 if x̄ > 1100
Rejection rule / rejection region / critical region:
Reject H0 if (x1, x2, …, xn) ∈ {(x1, x2, …, xn) : x̄ > 1100}
R = {(x1, x2, …, xn) : x̄ > 1100}
e.g.
X ~ N(1000,𝜎 2 )
H0: 𝜎 2 = 25 (null hypothesis) Ω0 = {25}
H1: 𝜎 2 ≠ 25 (alternative hypothesis) Ω1 = (0, 25)∪(25, ∞) Ω = (0, ∞)
Example of a statistical test / hypothesis test: Reject H0 if 𝑆 2 < 22 or if 𝑆 2 > 28
Rejection rule / rejection region / critical region:
Reject H0 if (x1, x2, …, xn) ∈ {(x1, x2, …, xn) : S² < 22 or S² > 28}
i.e. R = {(x1, x2, …, xn) : S² < 22 or S² > 28}
e.g.
X ~ Bin(10,p)
H0: p ≤ 0.3 (null hypothesis) Ω0 = [0 , 0.3]
H1: p > 0.3 (alternative hypothesis) Ω1 = (0.3 , 1] Ω = [0 , 1]
Example of a statistical test / hypothesis test: Reject H0 if 𝑋 > 0.4
Rejection rule / rejection region / critical region:
Reject H0 if {x1, x2, …, xn} 𝜖 {(x1 , x2 , … , x𝑛 ) ∶ 𝑋 > 0.4}
R = {(x1 , x2 , … , x𝑛 ) ∶ 𝑋 > 0.4}
The complement of the critical region is called the acceptance region.
So to define a statistical test Γ is to specify the critical region of the test.
Definition 1.5 : The power function of a test, 𝜋(𝜃), is the probability of rejecting the null
hypothesis (H0), as a function of 𝜃.
So
𝜋(𝜃) = 𝑃[𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝜃]
= 𝑃[(𝑋1 , … , 𝑋𝑛 ) ∈ 𝑅|𝜃] (1.2)
= 𝑃𝜃 [𝑿 ∈ 𝑅] .
e.g.
if our test is given by: Reject H0 if 𝑥 ≥ 1500
then 𝑅 = {(𝑥1 , … , 𝑥𝑛 ): 𝑥 ≥ 1500}
then 𝜋(𝜃) = P𝜃 (Reject H0 ) = P𝜃 (𝑋 ≥ 1500)
Example 1.1
Suppose that the average life of light bulbs made under a standard manufacturing procedure is
1400 hours. It is desired to test a new process for manufacturing the bulbs. So we are dealing
with two populations of light bulbs. We know that the mean of the first population is 1400. The
question is whether the mean of the second population is greater than or less than 1400. If we
are really interested in determining if the new process would be an improvement on the
standard, i.e. a longer average lifetime, we would typically set up the null hypothesis as stating that
the new process has a mean that is actually smaller than or equal to the standard. If we can reject
this null hypothesis we can conclude that the new process is better.
So we test H0: θ ≤ 1400 versus H1: θ > 1400.
Assume that the lifetimes of bulbs are exponentially distributed, Exp(𝜃), with mean 𝜃, that is,
f(x|θ) = (1/θ) e^{−x/θ} , 0 < x < ∞ ,
and we must decide, on the basis of a random sample of 𝑛 of the new bulbs tested, whether to
reject 𝐻0 or not. Note that rejection of 𝐻0 means acceptance of 𝐻1 .
So we must specify our test Γ by defining the critical region 𝑅. 𝑅 is a subset of the sample
space 𝑆 and is generally in 𝑛 dimensions. However, as we shall see later, the critical region can
usually be expressed in terms of a single sufficient statistic, 𝑇(𝑋1 , … , 𝑋𝑛 ).
In this case a sufficient statistic for 𝜃 is the sample mean, X̄ = (1/n) Σ_{i=1}^n Xi. Obviously the larger
the sample mean, the less likely is the null hypothesis. So let us reject the null hypothesis if the
sample from the new process gives us a mean that is larger than 1500. The test Γ is then
specified by the following critical region:
𝑅 = {(𝑥1 , … , 𝑥𝑛 ): 𝑥 ≥ 1500} .
which means:
Reject 𝐻0 if 𝑥 ≥ 1500.
𝜋(𝜃) = 𝑃𝜃 [𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 ]
= 𝑃𝜃 [𝑋 ≥ 1500] .
To determine this function we must know the distribution of X̄, which is Gamma(n, θ/n), whose cdf
cannot be evaluated explicitly.
So let us assume 𝑛 = 1, i.e. a single new light bulb was tested. (X̄ = (1/n) Σ Xi = (1/1) X1 = X1, which
we will call X.) Then
π(θ) = P_θ[X ≥ 1500] = ∫_{1500}^∞ (1/θ) e^{−x/θ} dx = e^{−1500/θ} ,
where X ∼ Exp(θ).
(For a continuous distribution it doesn't matter whether you have < or ≤, or > or ≥.)
From this power function we can now determine the probability of rejecting the null hypothesis
for any value of 𝜃.
For example, if the true mean of the new process is 𝜃 = 1600, then the probability of rejecting
the null hypothesis (correctly (because 𝜃 > 1400)) is
(null hypothesis is 𝐻0 : 𝜃 ≤ 1400)
“If the true mean of the new process is 1600, what is the probability that we (correctly) reject
H0”
𝜋(1600) = 𝑒 −0.938 = 0.392.
On the other hand, if the true mean is 𝜃 = 1300, then we don't want to reject the null
hypothesis (because it is true, i.e. 𝜃 ≤ 1400), but the probability of doing so is
“If the true mean of the new process is 1300, what is the probability that we (incorrectly) reject
H0?”
𝜋(1300) = 𝑒^{−1500/1300} = 𝑒^{−1.154} = 0.315 .
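(Not part of the original notes: a small Python sketch that evaluates the power function π(θ) = e^{−1500/θ} of this test; scipy is only used to confirm the value via the exponential survival function.)

```python
import math
from scipy import stats

def power(theta):
    # pi(theta) = P_theta(X >= 1500) for X ~ Exp(mean theta), n = 1
    return math.exp(-1500.0 / theta)

for theta in (1300, 1400, 1600):
    # survival function of an exponential with mean theta gives the same value
    sf_value = stats.expon(scale=theta).sf(1500)
    print(f"theta={theta}: pi={power(theta):.4f} (check: {sf_value:.4f})")
```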
Example 1.2
Let X1, …, Xn be a random sample from a N(θ, σ²) distribution with σ = 5 known, and suppose we
want to test
𝐻0 : 𝜃 ≤ 10
versus
𝐻1 : 𝜃 > 10 .
We decide to reject 𝐻0 if the sample mean is larger than 13. Thus the critical region
𝑅 = {𝒙 : x̄ > 13}
specifies the test and the power function of the test is given by
𝜋(𝜃) = P_θ(Reject H0)
     = P_θ[X̄ > 13]   where X̄ ∼ N(θ, σ²/n)
     = P_θ[(X̄ − θ)/(σ/√n) > (13 − θ)/(σ/√n)]
     = P_θ[Z > (13 − θ)/(σ/√n)]
     = 1 − P_θ[Z < (13 − θ)/(σ/√n)]
     = 1 − Φ((13 − θ)/(5/√n)) ,
(< and ≤ are the same thing here as we are dealing with a continuous distribution (the normal distribution))
where Φ represents the cumulative distribution function of a standard normal variate, i.e. the cdf
of Z ∼ N(0,1), i.e. Φ(…) is F_Z(…).
Φ always gives you probability with < or ≤ sign for N(0,1) distribution
Φ can be read off of table C or you can use the = norm.s.dist(…, true) function in Excel
i) π(15) = 1 − Φ((13 − 15)/(5/√9)) = 1 − Φ(−1.2) = 1 − P(Z < −1.2)
        = 1 − P(Z > 1.2) = P(Z < 1.2) = Φ(1.20) = 0.8849
ii) while π(10) = 1 − Φ((13 − 10)/(5/√9)) = 1 − Φ(1.80) = 1 − 0.9641 = 0.0359 .
In Excel:
i) π(15) = 1 − Φ((13 − 15)/(5/√9)) = 1 − Φ(−1.2) = 1 − NORM.S.DIST(−1.2, TRUE) = 0.8849
ii) π(10) = 1 − Φ((13 − 10)/(5/√9)) = 1 − Φ(1.8) = 1 − NORM.S.DIST(1.8, TRUE) = 0.0359
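(A short computational sketch — my addition, not from the notes — reproducing these two power values with scipy's standard normal cdf in place of Table C or Excel.)

```python
from scipy.stats import norm

sigma, n, cutoff = 5.0, 9, 13.0

def power(theta):
    # pi(theta) = P_theta(Xbar > 13) = 1 - Phi((13 - theta) / (sigma/sqrt(n)))
    return 1.0 - norm.cdf((cutoff - theta) / (sigma / n**0.5))

print(round(power(15), 4))  # 0.8849
print(round(power(10), 4))  # 0.0359
```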
FIGURE 1.2 : Power function of the test Γ, (𝑛 = 9).
FIGURE 1.3 : Power function of the test Γ, (various sample sizes 𝑛).
Theorem 1.4 : Let 𝑋(1) < 𝑋(2) < ⋯ < 𝑋(𝑛) denote the order statistics of a random sample
𝑋1, …, 𝑋𝑛 from a continuous population with pdf 𝑓𝑋(𝑥) and cdf 𝐹𝑋(𝑥). Then the pdf of 𝑋(𝑗) is
f_{X(j)}(x) = [n! / ((j−1)!(n−j)!)] f_X(x) [F_X(x)]^{j−1} [1 − F_X(x)]^{n−j} .   (1.19)
It follows then for the smallest order statistic that f_{X(1)}(x) = n f_X(x) [1 − F_X(x)]^{n−1}   (1.20)
and for the largest order statistic that f_{X(n)}(x) = n f_X(x) [F_X(x)]^{n−1} .   (1.21)
(This often gets used with a distribution whose pdf has a limited range, e.g. the example below and
also the uniform U(0, θ) distribution.)
Class Example 1
Consider a sample x1, …, xn from the distribution f(x|θ) = (x/θ)^{θ−1}, 0 ≤ x ≤ θ.
Let Y = X(n). For the test of H0: θ ≥ 3 versus H1: θ < 3, let the critical region be R = {x: X(n) ≤ 2.8}.
Determine the power function of the test.
Solution:
The power function is:
π(θ) = P(Reject H0 | θ) = P(X(n) ≤ 2.8 | θ) = P(Y ≤ 2.8 | θ) = ∫_0^{2.8} f_Y(y) dy ,
so we need to find f_Y(y).
Y = X(n) is the nth order statistic. We will need the density function of Y:
f_Y(y) = n f_X(y) (F_X(y))^{n−1}
       = n (y/θ)^{θ−1} (∫_0^y (x/θ)^{θ−1} dx)^{n−1}
       = n (y/θ)^{θ−1} ([x^θ/(θ^{θ−1} θ)]_0^y)^{n−1}
       = n (y/θ)^{θ−1} ((y/θ)^θ)^{n−1}
       = n (y/θ)^{θ−1} (y/θ)^{nθ−θ}
       = n (y/θ)^{nθ−1}   for 0 ≤ y ≤ θ .
So, for θ ≥ 2.8,
π(θ) = ∫_0^{2.8} n (y/θ)^{nθ−1} dy = (2.8/θ)^{nθ} ,
and π(θ) = 1 for θ < 2.8 (since then X(n) ≤ θ < 2.8 with certainty).
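(A sketch of my own, with n and θ chosen purely for illustration, that checks the resulting power (2.8/θ)^{nθ} against a Monte Carlo simulation of Y = X(n); sampling uses the inverse cdf F⁻¹(u) = θu^{1/θ}.)

```python
import numpy as np

rng = np.random.default_rng(0)

def power_exact(theta, n, cut=2.8):
    # P(X(n) <= 2.8 | theta) = (2.8/theta)^(n*theta), valid for theta >= 2.8
    return (cut / theta) ** (n * theta)

def power_sim(theta, n, cut=2.8, reps=200_000):
    u = rng.random((reps, n))
    x = theta * u ** (1.0 / theta)   # inverse-cdf sampling from F(x) = (x/theta)^theta
    return np.mean(x.max(axis=1) <= cut)

theta, n = 3.0, 5                    # hypothetical values, for illustration only
print(power_exact(theta, n), power_sim(theta, n))
```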
Definition 1.7 : A Type I Error is made if the null hypothesis is rejected when it is true, and a Type
II Error is made if the null hypothesis is not rejected when it is false.
Since there are only two possible choices, the test of a simple null hypothesis against a simple
alternative can be represented schematically as follows :
𝐻0 true 𝐻0 false
(𝐻1 true)
𝐻0 not rejected Correct Type II Error
𝐻0 rejected Type I Error Correct
For the rest of ch1 we will consider the situation where both the null and alternative hypotheses
are simple. For the purpose of the test, the parameter space contains only two points. Under a
simple hypothesis the distribution of the observations is completely specified. Under a composite
hypothesis only the class of distributions is specified. So in this case we have
𝐻0 : 𝜃 = 𝜃0
𝐻1 : 𝜃 = 𝜃1 , 𝑤ℎ𝑒𝑟𝑒 Ω = {𝜃0 , 𝜃1 } .
Since the power function is the probability of rejecting the null hypothesis as a function of the
parameter, we can see that
𝜋(𝜃0) = P[Reject H0 | θ0] = P[Reject H0 | H0 is true] = P[Type I Error] = α ,
and
𝜋(𝜃1) = P[Reject H0 | θ1]
      = P[Reject H0 | H1 is true]
      = 1 − P[Do not reject H0 | H1 is true]   (1.4)
      = 1 − P[Type II Error]
      = 1 − β .
Notation : We will denote the probability of a Type I Error by 𝛼, and the probability of a Type II
Error by 𝛽. 𝛼 is also called the size of the test.
In general:
α = P(type I error) = P(Reject H0 | H0 true)
β = P(type II error) = P(Do not reject H0 | H1 true)
𝜋(𝜃) = P(Reject H0 |𝜃)
So it follows for
𝐻0 : 𝜃 = 𝜃0
𝐻1 : 𝜃 = 𝜃1
that α = π(θ0) and β = 1 − π(θ1).
We will discuss two possible criteria for defining the best, or optimal test.
Suppose 𝑎 and 𝑏 are specified positive constants and it is desired to find a test Γ ∗ for which
𝑎𝛼 ∗ + 𝑏𝛽 ∗ will be a minimum, where 𝛼 ∗ and 𝛽 ∗ are the error probabilities associated with
the test Γ ∗ . The following theorem shows that a procedure that is optimal in this sense has a
very simple form.
Consider the simple hypotheses
𝐻0 : 𝜃 = 𝜃0 𝑎𝑔𝑎𝑖𝑛𝑠𝑡 𝐻1 : 𝜃 = 𝜃1 ,
and let
𝑓(𝐱|𝜃𝑖 ) = f(x1 , … , 𝑥𝑛 |𝜃𝑖 ) = ∏𝑛𝑗=1 𝑓(𝑥𝑗 |𝜃𝑖 )
( random sample
=> independent and identically distributed
=> pdf of Xj is fXj (xj |θi ) = fX (xj |θi ) = f(xj |θi )
Pdf of X1 ,…, Xn = Pdf of X1 × … × Pdf of Xn )
Theorem 1.1 : Let Γ* denote a test procedure such that H0 is rejected if f(x|θ0)/f(x|θ1) < b/a. Then
for any other test Γ, it follows that
aα* + bβ* ≤ aα + bβ .
Proof : (We shall present the proof for the case where the sample 𝑋1 , … , 𝑋𝑛 is drawn from a
continuous distribution. In the case of a discrete distribution the 𝑛–dimensional integrals are
replaced by summations.)
Let 𝑅 denote the critical region of an arbitrary test Γ. Then the acceptance region is the
complement 𝑅 𝑐 .
Then
Similarly
Note : The ratio f(x|θ0)/f(x|θ1) is called the likelihood ratio of the sample, and to minimize aα + bβ,
reject H0 if f(x|θ0)/f(x|θ1) < b/a .   (1.7)
The constants 𝑎 and 𝑏 determine the relative weights that are given to the two types of errors.
For example, if 𝑎 = 2 and 𝑏 = 1, then a Type I Error is considered twice as serious as a Type II
Error.
1.3.2 Most powerful tests (set 𝜶 to a small value (often 0.05) and minimise 𝜷)
To minimize β (for a given α), i.e. to find the most powerful test, reject H0 if f(x|θ0)/f(x|θ1) < k.
Once we simplify the above, as we know the value of 𝛼, we can use it to find k and thus 𝛽.
1 − 𝛽 is known as the power of a test.
Minimising 𝛽 => maximising 1 − 𝛽 => maximising power => most powerful.
Since 𝛼 and 𝛽 cannot both be made arbitrarily small for a fixed sample size, an important
criterion for defining an optimal test is to keep the size of a Type I Error fixed and to choose a
test which will minimize the size of a Type II Error. This leads to the definition of a most powerful
test.
Definition 1.8 : A test Γ* is a most powerful test (MPT) of size 𝛼 (0 < 𝛼 < 1) if :
(i) π*(θ0) = α , and
(ii) β* ≤ β
for all other tests Γ of size 𝛼 or smaller.
Such a test rejects H0 when the likelihood ratio is small, that is,
𝒙 ∈ 𝑅 if f(x|θ0)/f(x|θ1) < k
for some k > 0. By Theorem 1.1 (with a = 1 and b = k),
α* + kβ* ≤ α + kβ ,
where 𝛼 and 𝛽 are the error sizes of any other test Γ. So if 𝛼 ≤ 𝛼*, then 0 ≤ 𝛼* − 𝛼 ≤
k(𝛽 − 𝛽*), and thus, since 𝑘 > 0, 𝛽 ≥ 𝛽*.
Note : The Neyman–Pearson Lemma states that the most powerful test of size 𝛼 is the one
where the null hypothesis is rejected when
f(x|θ0)/f(x|θ1) < k ,   (1.8)
where k is chosen so that the size of the test equals α. Then the size of the Type II Error is smaller
than for any other test with the same or smaller Type I Error probability.
Example 1.3
Let 𝑥1 , … , 𝑥𝑛 be a random sample from a 𝑁(𝜃, 𝜎 2 ) distribution with 𝜎 2 known. Only two
values for 𝜃 are possible, so we want to test 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 = 𝜃1 where 𝜃0 < 𝜃1 .
(a) Suppose we want to minimize a linear combination of the error sizes, 𝑎𝛼 + 𝑏𝛽. Then,
according to Equation (1.7) we must reject 𝐻0 if f(x|θ0)/f(x|θ1) < b/a.
Now
f(xi|θ) = (1/(2πσ²)^{1/2}) e^{−(xi−θ)²/(2σ²)}
f(x|θ) = f(x1, …, xn|θ) = ∏_{i=1}^n f(xi|θ)
       = (1/√(2πσ²)) e^{−(x1−θ)²/(2σ²)} × (1/√(2πσ²)) e^{−(x2−θ)²/(2σ²)} × ⋯ × (1/√(2πσ²)) e^{−(xn−θ)²/(2σ²)}
       = (2πσ²)^{−n/2} e^{−Σ_{i=1}^n (xi−θ)²/(2σ²)}   (1.9)
is the normal likelihood function, and
f(x|θ0)/f(x|θ1) = [(2πσ²)^{−n/2} e^{−Σ(xi−θ0)²/(2σ²)}] / [(2πσ²)^{−n/2} e^{−Σ(xi−θ1)²/(2σ²)}]     (e^a/e^b = e^{a−b})   (1.10)
  = e^{−[Σ(xi−θ0)² − Σ(xi−θ1)²]/(2σ²)}
  = e^{−(1/(2σ²)) Σ_{i=1}^n [xi² − 2xiθ0 + θ0² − xi² + 2xiθ1 − θ1²]}
  = e^{−(1/(2σ²)) Σ_{i=1}^n [2xiθ1 − 2xiθ0 + θ0² − θ1²]}
  = e^{−(1/(2σ²)) [Σ_{i=1}^n 2xi(θ1−θ0) + n(θ0² − θ1²)]}
Note: (1/n) Σ_{i=1}^n xi = x̄ , thus Σ_{i=1}^n xi = n x̄ .
  = e^{−(1/(2σ²)) [2(θ1−θ0) n x̄ + n(θ0² − θ1²)]}
  = e^{−(n/(2σ²)) [(θ0² − θ1²) + 2x̄(θ1 − θ0)]}
So reject 𝐻0 if
e^{−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1 − θ0)]} < b/a ,
−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1 − θ0)] < ln(b/a) ,
that is, if
θ0² − θ1² + 2x̄(θ1 − θ0) > −(2σ²/n) ln(b/a) ,
or if
2x̄(θ1 − θ0) > −(2σ²/n) ln(b/a) − (θ0² − θ1²) ,
2x̄(θ1 − θ0) > θ1² − θ0² − (2σ²/n) ln(b/a) ,
x̄ > [1/(2(θ1 − θ0))] [θ1² − θ0² − (2σ²/n) ln(b/a)] .   (1.11)
(Excel can be used instead of the table. See note I added by example 1.2)
(b) Now suppose instead that we want to find the most powerful test of size 𝛼. According to (1.8)
we must reject 𝐻0 if f(x|θ0)/f(x|θ1) < k. Exactly as in (a), the likelihood ratio is
f(x|θ0)/f(x|θ1) = e^{−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1 − θ0)]} .
So reject 𝐻0 if
e^{−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1 − θ0)]} < k ,
−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1 − θ0)] < ln(k) ,
that is, if
θ0² − θ1² + 2x̄(θ1 − θ0) > −(2σ²/n) ln(k) ,
or if
2x̄(θ1 − θ0) > −(2σ²/n) ln(k) − (θ0² − θ1²) ,
2x̄(θ1 − θ0) > θ1² − θ0² − (2σ²/n) ln(k) ,
x̄ > [1/(2(θ1 − θ0))] [θ1² − θ0² − (2σ²/n) ln(k)] .   (1.12)
(with min a𝛼 + 𝑏𝛽 we could stop here, as the rhs is known.
But with setting 𝛼 and minimising 𝛽 (most powerful test), the rhs has a k in the formula, which
is unknown. Thus we must continue and use the definition of 𝛼 to find the value of the rhs)
(The important difference between the two approaches is that 𝑎 and 𝑏 in (1.11) are assumed
known, while 𝑘 in (1.12) must be determined so that the size of the Type I Error is equal to 𝛼.
So let the right–hand side of (1.12) be equal to 𝑐. ) Then 𝐻0 is rejected if 𝑥 > 𝑐, where
We know that under 𝐻0, Xi ∼ N(θ0, σ²), so X̄ ∼ N(θ0, σ²/n), so by standardising,
α = P_{θ0}[X̄ > c] = P[(X̄ − θ0)/(σ/√n) > (c − θ0)/(σ/√n)]
  = P[Z > (c − θ0)/(σ/√n)]   where Z ∼ N(0,1)
  = 1 − P[Z < (c − θ0)/(σ/√n)] ,
1 − α = P[Z < (c − θ0)/(σ/√n)]
1 − α = Φ((c − θ0)/(σ/√n))       (Φ gets read from Table C. But c is unknown.)
Φ^{−1}(1 − α) = (c − θ0)/(σ/√n)       (Φ^{−1} gets read from Table D, or Table E if it had an ∞ row, or in Excel =NORM.S.INV(…).)
z_{1−α} = (c − θ0)/(σ/√n)
So (c − θ0)/(σ/√n) is the (1 − α)th percentile of the standard normal distribution, z_{1−α}, and
(c − θ0)/(σ/√n) = z_{1−α} , or
c = θ0 + z_{1−α} σ/√n .   (1.13)
Note that it is not necessary to evaluate 𝑘, only 𝑐, which is the right–hand side of (1.12).
For the numerical example 𝛼 should first be fixed. Let 𝛼 = 0.01 (and let 𝜃0 = 0, 𝜃1 = 5, 𝜎 2 =
16, 𝑛 = 9) then it follows from (1.13) that 𝐻0 should be rejected if 𝑥 > 3.1067.
Then
𝛼 = 0.01 𝑎𝑛𝑑
𝛽 = 𝑃(𝐷𝑜 𝑛𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻1 𝑡𝑟𝑢𝑒) = 𝑃[𝑋 < 3.1067|𝜃 = 5] = 𝑃[𝑍 < −1.4200]
= 0.0778 .
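(A small sketch of mine, not from the notes, reproducing this numerical example: it computes c = θ0 + z_{1−α}σ/√n and then β = Φ((c − θ1)/(σ/√n)). It uses the exact z_{0.99} = 2.3263 rather than the rounded 2.33 from the tables, so c comes out as 3.1018 rather than 3.1067.)

```python
from scipy.stats import norm

theta0, theta1, sigma2, n, alpha = 0.0, 5.0, 16.0, 9, 0.01
se = (sigma2 / n) ** 0.5                    # sigma / sqrt(n)

c = theta0 + norm.ppf(1 - alpha) * se       # critical value: reject H0 if xbar > c
beta = norm.cdf((c - theta1) / se)          # P(do not reject H0 | theta = theta1)

print(round(c, 4), round(beta, 4))          # approx. 3.1018 and 0.0773
```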
Example 1.4 : Sampling from a Bernoulli distribution. [N.B. This is an example where we use a
discrete distribution.]
That is, reject 𝐻0 if
[p0(1−p1)/(p1(1−p0))]^y ((1−p0)/(1−p1))^n < k ,
[p0(1−p1)/(p1(1−p0))]^y < k / ((1−p0)/(1−p1))^n ,
[p0(1−p1)/(p1(1−p0))]^y < k ((1−p1)/(1−p0))^n ,
or if
y ln[p0(1−p1)/(p1(1−p0))] < ln[ k ((1−p1)/(1−p0))^n ] .
Now it is important to note that since p0 < p1, p0/p1 < 1, and p0, p1 > 0, so 0 < p0/p1 < 1.
Also p0 < p1, so −p0 > −p1, so 1 − p0 > 1 − p1, so (1−p1)/(1−p0) < 1,
and 0 < p0, p1 < 1, so 0 < 1 − p0, 1 − p1 < 1, so 0 < (1−p1)/(1−p0) < 1.
Hence 0 < p0(1−p1)/(p1(1−p0)) < 1, and therefore the log is negative. So if we divide both sides of the
inequality by ln[p0(1−p1)/(p1(1−p0))] < 0, the inequality sign is reversed. So reject 𝐻0 if
y > ln[ k ((1−p1)/(1−p0))^n ] / ln[ p0(1−p1)/(p1(1−p0)) ] .   (1.14)
As in the previous example it is not necessary to evaluate 𝑘. If we denote the right–hand side by
𝑐, then we reject 𝐻0 if 𝑦 > 𝑐 where 𝑐 is obtained from the relation
𝛼 = 𝑃(𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 |𝐻0 𝑡𝑟𝑢𝑒) = 𝑃𝑝0 [𝑌 > 𝑐] .
Under 𝐻0 , 𝑌 = ∑𝑛𝑖=1 𝑋𝑖 has a Binomial distribution with parameters 𝑛 and 𝑝0 . Since the
distribution is discrete it is generally not possible to find 𝑐 that exactly satisfies the equation.
For example, suppose n = 20 and p0 = 1/2 and we want α = 0.05. Now
P[Y > c | p0 = 1/2] = Σ_{y=c+1}^{20} C(20, y) (1/2)^y (1/2)^{20−y} = Σ_{y=c+1}^{20} C(20, y) (1/2)^{20} .   (1.15)
We want 0.05 = 𝛼 = P_{p0}[Y > c], i.e. we want c such that 0.95 = P_{p0}[Y ≤ c]. From binomial tables,
P[Y ≤ 13] = 0.9423 and P[Y ≤ 14] = 0.9793.
So there is no discrete value of c for which P_{p0}[Y ≤ c] is exactly 0.95; it must be either
0.9423 (c = 13, size 0.0577) or 0.9793 (c = 14, size 0.0207). In practice we usually use a test with size
as close as possible to the desired one. 0.0577 is closer to 0.05 than 0.0207, thus we choose c = 13,
i.e. Reject H0 if y > 13 (equivalently, if y ≥ 14).
(The size of the test can be made exactly what is desired by using randomized tests, but we will
not deal with that.)
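(A sketch — my own addition — that searches for the cut-off c with scipy's binomial cdf, mirroring the table lookup above.)

```python
from scipy.stats import binom

n, p0, alpha = 20, 0.5, 0.05

# size of the test "reject H0 if Y > c" is P(Y > c | p0) = 1 - F(c)
sizes = {c: 1 - binom.cdf(c, n, p0) for c in range(n + 1)}

# choose the c whose size is closest to the desired alpha
c_best = min(sizes, key=lambda c: abs(sizes[c] - alpha))
print(c_best, round(sizes[c_best], 4))   # 13, 0.0577
```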
e.g. X~Beta(0.2 , 0.3) and I want F(?) = P(X < ?) = 0.9 then ? = Beta.Inv(0.9, 0.2, 0.3)
The application of the Neyman–Pearson Lemma can be simplified by using sufficient statistics
instead of the whole sample.
Remember that a statistic
𝑇(𝑥1 , … , 𝑥𝑛 ) = 𝑇(𝐱) ,
a function of the sample, is a sufficient statistic for the parameter 𝜃 if the joint density function
of the sample can be expressed as
𝑓(𝒙|𝜃) = 𝑔(𝑇(𝒙)|𝜃)ℎ(𝒙) , (1.16)
e.g. Σ Xi is sufficient for 𝜇 as we can show the above; practically it means T(X) = Σ Xi has
enough information in it for us to get the best estimate of 𝜇, which is (1/n) Σ Xi. You will know your
sample size (e.g. 10) and all you need to know is the value of Σ Xi (e.g. Σ Xi = 120); then you
have enough information to get the best estimate of 𝜇, i.e. it is (1/n) Σ Xi = (1/10)(120) = 12.
Theorem 1.3 : Consider the simple hypotheses 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 = 𝜃1 . Suppose
𝑇(𝐱 ) is a sufficient statistic for 𝜃 and 𝑔(𝑡|𝜃𝑖 ) is the pdf of 𝑇 corresponding to 𝜃𝑖 (𝑖 = 0,1).
Then any test with critical region 𝑈 (a subset of the sample space of 𝑇) is a MPT of size 𝛼 if it
satisfies
t ∈ U if g(t|θ0)/g(t|θ1) < k   (1.17)
for some 𝑘 > 0, where P_{θ0}[T ∈ U] = 𝛼 .
i.e. when finding the most powerful test we can use g(t|θ0)/g(t|θ1) < k as our rejection rule (when to
reject H0), where g is the pdf of t, instead of f(x|θ0)/f(x|θ1) < k, where f is the joint pdf of x1, …, xn.
E.g.:
• This t could be for example x̅ when doing a test on θ for the N(θ, σ2) distribution
where σ2 is known. If Xi ~ N(θ, σ2) then X̄ ~ N(θ, σ2/n) so you can easily write down its
density function. (see example 1.5)
• This t could be for Y = max(Xi) = X(n) when doing a test on θ of the Uniform(0, θ)
distribution. We would use theorem 1.4. to get the density function of t = Y = max(Xi)
= X(n)
• If a density function has range 0 ≤ x ≤ θ, then we use t = Y = max(Xi) = X(n). We would
use theorem 1.4. to get the density function of t = Y = max(Xi) = X(n). (see example 1.6)
• If a density function has range a ≤ x ≤ θ, where a is known and the test is on θ, then we
use t = max(Xi) = X(n). We would use theorem 1.4. to get the density function of t = Y =
max(Xi) = X(n)
• If a density function has range θ ≤ x ≤ b, where b is known and the test is on θ, then we
use t = min(Xi) = X(1). We would use theorem 1.4. to get the density function of t = Y =
min(Xi) = X(1).
In terms of the original sample, the test based on 𝑇 has critical region
R = {x : T(x) ∈ U} , where T(x) ∈ U if g(T(x)|θ0)/g(T(x)|θ1) < k ,
and thus
α = P_{θ0}[X ∈ R] = P_{θ0}[T(X) ∈ U] .
So the test based on 𝑇 is a MPT of size 𝛼.
Example 1.5 : Consider again the problem in Example 1.3(b). (We want to test the mean of a
normal distribution based on a sample of 𝑛 observations. We know a sufficient statistic for the
mean 𝜃 is T(X) = X̄, the sample mean, with distribution X̄ ∼ N(θ, σ²/n).
So
g(x̄|θ) = √(n/(2πσ²)) e^{−n(x̄−θ)²/(2σ²)}   (1.18)
and we apply the Neyman–Pearson Lemma. The result is identical to (1.12) in what follows, but
the calculations are simpler.)
Example 1.5 is just Example 1.3(b) where we use our new result (Theorem 1.3) instead of using
Theorem 1.2, i.e.
Xi ∼ N(θ, σ²)
T(X) = X̄ ∼ N(θ, σ²/n) because X̄ is sufficient for θ. (Shown in 2nd year.)
g(x̄|θ) = (1/√(2πσ²/n)) e^{−(x̄−θ)²/(2σ²/n)} = √(n/(2πσ²)) e^{−n(x̄−θ)²/(2σ²)}
g(x̄|θ0)/g(x̄|θ1) = [√(n/(2πσ²)) e^{−n(x̄−θ0)²/(2σ²)}] / [√(n/(2πσ²)) e^{−n(x̄−θ1)²/(2σ²)}]     (e^a/e^b = e^{a−b})
  = e^{−(n/(2σ²))[(x̄−θ0)² − (x̄−θ1)²]}
  = e^{−(n/(2σ²))[x̄² − 2x̄θ0 + θ0² − x̄² + 2x̄θ1 − θ1²]}
  = e^{−(n/(2σ²))[2x̄θ1 − 2x̄θ0 + θ0² − θ1²]}
  = e^{−(n/(2σ²))[2(θ1−θ0)x̄ + (θ0² − θ1²)]}
  = e^{−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1 − θ0)]}
So reject 𝐻0 if
e^{−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1 − θ0)]} < k ,
−(n/(2σ²))[(θ0² − θ1²) + 2x̄(θ1 − θ0)] < ln(k) ,
that is, if
θ0² − θ1² + 2x̄(θ1 − θ0) > −(2σ²/n) ln(k) ,
or if
2x̄(θ1 − θ0) > −(2σ²/n) ln(k) − (θ0² − θ1²) ,
2x̄(θ1 − θ0) > θ1² − θ0² − (2σ²/n) ln(k) ,
x̄ > [1/(2(θ1 − θ0))] [θ1² − θ0² − (2σ²/n) ln(k)] .
Let the right-hand side be c, so 𝐻0 is rejected if x̄ > c. We know that under 𝐻0, Xi ∼ N(θ0, σ²), so
X̄ ∼ N(θ0, σ²/n), so by standardising,
α = P_{θ0}[X̄ > c] = P[(X̄ − θ0)/(σ/√n) > (c − θ0)/(σ/√n)]
  = P[Z > (c − θ0)/(σ/√n)]   where Z ∼ N(0,1)
  = 1 − P[Z < (c − θ0)/(σ/√n)] ,
1 − α = P[Z < (c − θ0)/(σ/√n)]
1 − α = Φ((c − θ0)/(σ/√n))       (Φ gets read from Table C. But c is unknown.)
Φ^{−1}(1 − α) = (c − θ0)/(σ/√n)
z_{1−α} = (c − θ0)/(σ/√n)
(c − θ0)/(σ/√n) = z_{1−α} , or
c = θ0 + z_{1−α} σ/√n .
Reject 𝐻0 if x̄ > θ0 + z_{1−α} σ/√n .
I could also ask you to calculate 𝛽 for this example (and for the theoretical Example 1.3(b)):
β = P[Do not reject H0 | H1 is true] = P[X̄ < c | θ1] = P[X̄ < θ0 + z_{1−α} σ/√n | θ1]
  = P[(X̄ − θ1)/(σ/√n) < (θ0 + z_{1−α} σ/√n − θ1)/(σ/√n)]
  = P[Z < (θ0 + z_{1−α} σ/√n − θ1)/(σ/√n)]
  = Φ((θ0 + z_{1−α} σ/√n − θ1)/(σ/√n))
  = Φ((θ0 − θ1)√n/σ + z_{1−α}) .
Theorem 1.4 : Let 𝑋(1) < 𝑋(2) < ⋯ < 𝑋(𝑛) denote the order statistics of a random sample
𝑋1, …, 𝑋𝑛 from a continuous population with pdf 𝑓𝑋(𝑥) and cdf 𝐹𝑋(𝑥). Then the pdf of 𝑋(𝑗) is
f_{X(j)}(x) = [n! / ((j−1)!(n−j)!)] f_X(x) [F_X(x)]^{j−1} [1 − F_X(x)]^{n−j} .   (1.19)
It follows then for the smallest order statistic that
f_{X(1)}(x) = n f_X(x) [1 − F_X(x)]^{n−1}   (1.20)
and for the largest order statistic that
f_{X(n)}(x) = n f_X(x) [F_X(x)]^{n−1} .   (1.21)
(This often gets used with a distribution whose pdf has a range limited by the parameter, e.g. the
example below and also the uniform U(0, θ) distribution.) If the range is 0 ≤ 𝑥 ≤ 𝜃 then
X(n) will be sufficient for 𝜃, i.e. we can use Theorem 1.3 where our t (which in the previous
example was x̄) is now t = X(n). (If we had 𝜃 ≤ 𝑥 ≤ 5 we would use t = X(1).)
f(x|θ) = (1/θ^{n(θ−1)}) ∏_{i=1}^n xi^{θ−1} , 0 ≤ xi ≤ θ , i = 1, …, n .
This is a nonstandard problem and difficult to evaluate unless we use a sufficient statistic.
A sufficient statistic for 𝜃 is 𝑌 = max{𝑋1 , … , 𝑋𝑛 }, the largest order statistic.
Note : It is useful to remember that if a parameter defines a boundary point of the sample space,
then the corresponding order statistic is sufficient.)
We will use Theorem 1.3 instead of Theorem 1.2.
We reject 𝐻0 if g(y|θ0)/g(y|θ1) < k ,
i.e. if y^{−n(θ1−θ0)} (θ1^{nθ1−1}/θ0^{nθ0−1}) < k ,
i.e. if y^{−n(θ1−θ0)} < k (θ0^{nθ0−1}/θ1^{nθ1−1}) .
(You had to check whether θ0^{nθ0−1}/θ1^{nθ1−1} is negative or positive, i.e. whether < becomes >
or stays <. We know 0 < θ0 < θ1, so θ0^{nθ0−1}/θ1^{nθ1−1} is positive, so the sign stays the same.)
That is, reject 𝐻0 when
y > (k θ0^{nθ0−1}/θ1^{nθ1−1})^{−1/(n(θ1−θ0))}   (since θ1 > θ0; when you raise both sides to a negative power, the inequality sign changes),
or y > c , where
α = P(Reject H0 | H0 true) = P_{θ0}[Y > c] = ∫_c^{θ0} n (y/θ0)^{nθ0−1} dy = 1 − (c/θ0)^{nθ0} ,
so that c = θ0 (1 − α)^{1/(nθ0)} .
Exercises 1
1. Suppose that 𝑋1 , … , 𝑋𝑛 form a random sample from a uniform distribution on the interval
(0, 𝜃), and that the following hypotheses are to be tested : 𝐻0 : 𝜃 ≥ 2 𝑣𝑒𝑟𝑠𝑢𝑠 𝐻1 : 𝜃 < 2.
Let 𝑌𝑛 = max{𝑋1 , … , 𝑋𝑛 } and consider the test procedure with critical region 𝑌𝑛 ≤ 1.5.
Determine the power function of the test.
2. Consider a sample 𝑥1, 𝑥2, …, 𝑥𝑛 from the distribution
f(x|θ) = (x/θ)^{θ−1} , 0 ≤ x ≤ θ , θ > 0 .
Let 𝑌 = max {𝑥1 , 𝑥2 , … , 𝑥𝑛 } and for the test of 𝐻0 : 𝜃 ≤ 2 versus 𝐻1 : 𝜃 > 2, let the critical
region be 𝑅 = {𝑥: 𝑦 > 1.9}.
Find the power function of this test.
7. Consider a random sample of size 𝑛 from a normal distribution with mean zero and
unknown variance 𝜎 2 . We want to test the hypotheses 𝐻0 : 𝜎 2 = 2 versus 𝐻1 : 𝜎 2 = 3.
(a) Find the most powerful test of size 𝛼 = 0.05.
(b) Determine the test in (a) when 𝑛 = 8.
8. Suppose a single observation is taken from a uniform distribution on the interval (0, 𝜃), and
that the following hypotheses are to be tested : 𝐻0 : 𝜃 = 1 versus 𝐻1 : 𝜃 = 2.
(a) Show that there exists a test procedure for which 𝛼 = 0 and 𝛽 < 1.
(b) Among all tests for which 𝛼 = 0, find the one for which 𝛽 is a minimum.
9. Let 𝑋1 , … , 𝑋𝑛 be a random sample from a Poisson distribution with unknown parameter
𝜆. For the hypotheses 𝐻0 : 𝜆 = 𝜆0 versus 𝐻1 : 𝜆 = 𝜆1 , where 𝜆0 < 𝜆1 ;
(a) Find the most powerful size−𝛼 test ;
(b) Find the MPT when 𝑛 = 20, 𝜆0 = 0.1, 𝜆1 = 0.2 and 𝛼 is approximately 0.1.
10. Suppose a random sample is taken from a normal distribution with unknown mean 𝜇 and
standard deviation 2. We want to test 𝐻0 : 𝜇 = −1 versus 𝐻1 : 𝜇 = 1. Determine the minimum
value of 𝛼 + 𝛽 that can be attained for (a) 𝑛 = 1, (b) 𝑛 = 9, (c) 𝑛 = 36.
11. In 20 tosses of a coin, 5 heads and 15 tails appear. Test the null hypothesis that the coin is
fair against the alternative that the probability for heads is 0.3. The size of the test should be
smaller or equal to 0.1. What is the power of your test?
12. A bag contains 5 balls of which 𝑚 are black and the rest, 5 − 𝑚, are white. Now, 𝑚 is
either 3 or 1, so we draw two balls without replacement and we want to test 𝐻0 : 𝑚 = 3
versus 𝐻1 : 𝑚 = 1. If we decide to reject 𝐻0 if both balls drawn are white, find 𝛼 and 𝛽.
Chapter 2
Composite Hypotheses
(< or > or ≤ or ≥ or ≠ etc.)
(simple implies =)
2.1 Introduction
As stated in definition 1.6, if the parameter space under a particular hypothesis contains more
than one point, it is a composite hypothesis. In most cases in practice composite hypotheses are
considered, or cases where one of the hypotheses is composite. Often the null hypotheses is
simple while the alternative hypothesis is composite. For example, the null hypothesis can specify
a particular value, 𝐻0 : 𝜃 = 𝜃0 , the standard or norm, while the alternative just states that the
null hypothesis is not true, or 𝐻1 : 𝜃 ≠ 𝜃0 .
In Chapter 1, for two simple hypotheses, the size of the test 𝛼 = 𝜋(𝜃0) =
P(Reject H0 | H0 true) = P(Reject H0 | θ = θ0) was defined as the probability of rejecting 𝐻0
when 𝐻0 is true. When 𝐻0 is composite (e.g. P(Reject H0 | H0 true) = P(Reject H0 | θ ≤ θ0)),
this definition is not sufficient and we have the more general definition
α = sup_{θ∈Ω0} π(θ) = sup_{θ∈Ω0} P(Reject H0 | θ) .
Similarly,
β = sup_{θ∈Ω0^c} [1 − π(θ)]
  = sup_{θ∈Ω0^c} P[Do not reject H0 | θ]
  = sup_{θ∈Ω0^c} P[Do not reject H0 | H1 true]
  = sup_{θ∈Ω0^c} P[Type II Error] .
Note that the supremum (sup) is the same as the least upper bound, but differs from the
maximum in that it is not necessarily attained in the set of values.
Note : The size of a test is also called the level of significance of the test.
Example 2.1
Consider again the light bulbs from Example 1.1. There we had the two composite hypotheses,
𝐻0 : 𝜃 ≤ 1400 versus 𝐻1 : 𝜃 > 1400, and it was decided to reject 𝐻0 if 𝑥 ≥ 1500. The power
function was derived as
π(θ) = e^{−1500/θ} .
Now the size of this test is the maximum probability of committing a Type I Error,
α = sup_{θ∈Ω0} π(θ) = sup_{θ≤1400} e^{−1500/θ} .
Since 𝜋(𝜃) is an increasing function of 𝜃, the supremum occurs at the maximum value of 𝜃
under 𝐻0, which is 1400. So α = e^{−1500/1400} = 0.3425.
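(A sketch of mine that evaluates the power function on a grid over Ω0 and confirms that the supremum is attained at θ = 1400.)

```python
import numpy as np

def power(theta):
    # pi(theta) = P_theta(X >= 1500) = exp(-1500/theta) for the n = 1 case
    return np.exp(-1500.0 / theta)

theta_grid = np.linspace(1.0, 1400.0, 100_000)   # grid over Omega_0 = (0, 1400]
alpha = power(theta_grid).max()
print(round(alpha, 4), round(power(1400.0), 4))  # both 0.3425
```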
The power function is a measure of the efficiency of a test. For a "good" test we should have
𝜋(𝜃) close to 0 for 𝜃 ∈ Ω0 and close to 1 for 𝜃 ∈ Ω0^c.
The power function of a perfect test of the hypothesis 𝐻0: 𝜃 ≤ 𝜃0 versus 𝐻1: 𝜃 > 𝜃0 would
look like the solid line in Figure 2.1. However a test is never perfect and the power function would
usually look like the dotted line.
FIGURE 2.1 : Power function
The most powerful test (MPT) defined by the Neyman–Pearson Lemma does not apply in the
case of composite hypotheses. The Neyman–Pearson Lemma defines a likelihood ratio test
because we rejected H0 if f(x|θ0)/f(x|θ1) < k, i.e. L(θ0|x)/L(θ1|x) < k. We will use the same principle
to define generalized likelihood ratio tests (LR-tests). However, they will not have the same optimal
properties as the most powerful test in the case of simple hypotheses.
Definition 2.2 : Let 𝒙 = (𝑥1, …, 𝑥𝑛) be a random sample from a distribution with likelihood
function 𝐿(𝜃|𝒙). For the hypotheses 𝐻0: 𝜃 ∈ Ω0 versus 𝐻1: 𝜃 ∈ Ω0^c, where Ω0 ⊂ Ω, let
λ(x) = sup_{θ∈Ω0} L(θ|x) / sup_{θ∈Ω} L(θ|x) .   (2.3)
A generalized likelihood ratio test (GLRT) of size 𝛼 is any test that has a rejection region of the
form R = {x : λ(x) ≤ k}, where 0 ≤ k ≤ 1 is chosen so that sup_{θ∈Ω0} P_θ[X ∈ R] = α.
Remarks :
(i) Note that Λ(𝑋1 , … , 𝑋𝑛 ) is a statistic with observed value 𝜆(𝒙) = 𝜆(𝑥1 , … , 𝑥𝑛 ), called the
test statistic.
(ii) If the parameter space is reduced to two points so that Ω0 = {θ0} and Ω = {θ0, θ1}, then
the GLRT does not reduce to the MPT for simple hypotheses.
(iii) Note that 0 ≤ λ(x) ≤ 1 since 𝐿(𝜃|𝒙) is positive and sup_{θ∈Ω0} L(θ|x) ≤ sup_{θ∈Ω} L(θ|x)
because Ω0 ⊂ Ω.
(iv) Note that the supremum of 𝐿(𝜃|𝒙) over the whole parameter space Ω is always
obtained at the maximum likelihood estimators of the parameters. So sup𝜃∈Ω 𝐿(𝜃|𝒙) =
𝐿(𝜃̂|𝒙), where 𝜃̂ represents the ML estimator of 𝜃.
We want to find a test of size 𝛼 for the simple null hypothesis 𝐻0: 𝜃 = 𝜃0 versus the composite
two-sided alternative 𝐻1: 𝜃 ≠ 𝜃0, based on a random sample X1, …, Xn from an exponential
distribution with density f(x|θ) = θ e^{−θx}, x > 0. Note that now the exponential density function is
defined differently from Example 1.1.
i) Likelihood function (pdf on formula sheet): L(θ|x) = ∏_{i=1}^n θ e^{−θxi} = θ^n e^{−θΣxi} .
ii) 𝐻0: 𝜃 = 𝜃0 versus 𝐻1: 𝜃 ≠ 𝜃0
The parameter spaces are
Ω = {θ: θ > 0} and Ω0 = {θ: θ = θ0} .
iii)
In Ω0 ; (𝜃 can only be 𝜃0 ) so we will substitute 𝜃0 for 𝜃
sup 𝐿(𝜃|𝒙) = 𝐿(𝜃0 |𝒙) = 𝜃0𝑛 𝑒 −𝜃0Σ𝑥𝑖 ,
𝜃∈Ω0
since Ω0 contains only one point i.e. 𝜃0 .
where 𝜃̂ is the maximum likelihood estimator. If the ML estimator is not known, it must be
derived by setting (∂/∂θ) ln L(θ|x) = 0 and solving for 𝜃. In this case
L(θ|x) = θ^n e^{−θΣxi} ,
ln L(θ|x) = n ln θ − θ Σxi ,
(∂/∂θ) ln L(θ|x) = n/θ − Σxi .
Set (∂/∂θ) ln L(θ|x) |_{θ=θ̂} = n/θ̂ − Σxi = 0 ; then it follows that
n/θ̂ = Σxi
1/θ̂ = Σxi/n
θ̂ = n/Σxi = 1/((1/n)Σxi) = 1/x̄ .
So
sup_{θ∈Ω} L(θ|x) = L(θ̂|x) = L(1/x̄ | x)
  = (1/x̄)^n e^{−(1/x̄)Σxi}
  = (1/x̄)^n e^{−(1/x̄) n x̄}
  = x̄^{−n} e^{−n} .
iv)
So
λ(x) = sup_{θ∈Ω0} L(θ|x) / sup_{θ∈Ω} L(θ|x) = θ0^n e^{−θ0Σxi} / (x̄^{−n} e^{−n}) = θ0^n e^{−θ0 n x̄} / (x̄^{−n} e^{−n}) = (θ0 x̄)^n e^{n(1−θ0 x̄)} .   (2.4)
v)
According to the GLR test we reject 𝐻0 if λ(x) ≤ c , that is if
(θ0 x̄)^n e^{n(1−θ0 x̄)} ≤ c .   (2.5)
This cannot be simplified algebraically, but graphically we can see that it is equivalent to:
x̄ ≤ k1 or x̄ ≥ k2 .
(To determine 𝑐 we must write the inequality in terms of a statistic of which the distribution is
known. We note that the only sample statistic on the left-hand side of (2.5) is x̄, so we want to
redefine the rejection region in terms of x̄. Now λ(x) in (2.4) is a non-monotone function of
x̄ with a maximum of one when x̄ = 1/θ0. So (2.5) will hold for 0 ≤ c ≤ 1 when x̄ is either
small enough or large enough. This is shown in Figure 2.2.
FIGURE 2.2 : LR statistic as a function of x̄
So λ(x) ≤ c if and only if x̄ ≤ k1 or x̄ ≥ k2. Note that k1 and k2 are the two roots of the
equation λ(x) − c = 0. It is usually not easy to find these roots, so the problem is solved by
using the condition that the size of the test must be 𝛼.)
vi)
Important relationships I :
If Xi ∼ Exp(θ) (rate θ), i = 1, …, n, then
Σ_{i=1}^n Xi ∼ Gamma(n, θ) , and
X̄ = (1/n) Σ_{i=1}^n Xi ∼ Gamma(n, nθ) , and
2nθX̄ ∼ Gamma(n, 1/2) = Gamma(2n/2, 1/2) ,
so Y = 2nθX̄ ∼ χ²_{2n} (because χ²_v = Gamma(v/2, 1/2)).
Reject H0 if x̄ ≤ k1 or x̄ ≥ k2 .
vii)
(We have seen that the critical region R = {x : λ(x) ≤ c} is equivalent to
R = {x : x̄ ≤ k1 or x̄ ≥ k2} , so)
α = sup_{θ∈Ω0} P[Reject H0] = P_{θ0}[Λ(X) ≤ c]
  = P_{θ0}[X̄ ≤ k1 or X̄ ≥ k2] .   (2.6)
α/2 + α/2 = P_{θ0}[X̄ ≤ k1] + P_{θ0}[X̄ ≥ k2]
(However, an infinite number of pairs (k1, k2) satisfy (2.6). Any pair would specify a valid LR-test
of size 𝛼 for the given hypotheses, but the proper one would also satisfy λ(k1) = λ(k2).
As this is a difficult solution to find, in practice an "equal tail" test is usually preferred. This means
that Equation (2.6) is split into two parts where k1 < k2, so that)
P_{θ0}[X̄ ≤ k1] = P_{θ0}[X̄ ≥ k2] = α/2 .   (2.7)
i.e.
P_{θ0}[X̄ ≤ k1] = α/2 and P_{θ0}[X̄ ≥ k2] = α/2
(This means we make the probability of a Type I Error for a two-sided alternative the same above
and below the null hypothesis. From (2.7) the constants k1 and k2 can now be uniquely
determined.)
(Notation : The subscript after a symbol for a distribution will always denote the cdf-value of the
distribution. For example, if Z ∼ N(0,1), then z_α is the value for which Φ(z_α) = F_Z(z_α) =
P[Z ≤ z_α] = α, i.e. z_α = Φ^{−1}(α); or if X ∼ t_n, then t_{n,α} means that F_X(t_{n,α}) =
P[X ≤ t_{n,α}] = α, i.e. t_{n,α} = F_X^{−1}(α) if X ∼ t_n. If X ∼ χ²_n, then χ²_{n,α} is that value for which
F_X(χ²_{n,α}) = P[X ≤ χ²_{n,α}] = α, i.e. χ²_{n,α} = F_X^{−1}(α) if X ∼ χ²_n.)
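(Using the relationship 2nθ0X̄ ∼ χ²_{2n} under H0 (Important relationships I), the equal-tail cut-offs in (2.7) follow as k1 = χ²_{2n, α/2}/(2nθ0) and k2 = χ²_{2n, 1−α/2}/(2nθ0). A sketch of mine, with n, θ0 and α chosen only for illustration:)

```python
from scipy.stats import chi2

n, theta0, alpha = 10, 2.0, 0.05   # hypothetical values, for illustration only

# under H0: 2*n*theta0*Xbar ~ chi-square with 2n degrees of freedom
k1 = chi2.ppf(alpha / 2, 2 * n) / (2 * n * theta0)
k2 = chi2.ppf(1 - alpha / 2, 2 * n) / (2 * n * theta0)

print(round(k1, 4), round(k2, 4))  # reject H0 if xbar <= k1 or xbar >= k2
```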
In Example 2.3 and Example 2.4 we use the identity Σ(xi − μ0)² = Σ(xi − x̄)² + n(x̄ − μ0)², which follows from
Σ(xi − μ0)² = Σ(xi − x̄ + x̄ − μ0)² = Σ(xi − x̄)² + 2(x̄ − μ0)Σ(xi − x̄) + n(x̄ − μ0)² = Σ(xi − x̄)² + n(x̄ − μ0)² ,
since Σ(xi − x̄) = 0.
Let us look at an important standard problem and follow the steps as given above.
Example 2.3 : Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) where 𝜎 2 is known and we want to derive a LR–
test of size 𝛼 for 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0 .
1. Likelihood function: L(μ|x) = (2πσ²)^{−n/2} e^{−Σ(xi−μ)²/(2σ²)} .   (pdf on formula sheet, ∏ f(xi|μ))
2. 𝐻0: 𝜇 = 𝜇0 vs 𝐻1: 𝜇 ≠ 𝜇0 .
Ω = {μ: −∞ < μ < ∞}
Ω0 = {μ: μ = μ0}
3. In Ω the ML estimator is μ̂ = x̄, from (∂/∂μ) ln L(μ|x) |_{μ=μ̂} = 0 (because we want sup_{μ∈Ω} L(μ|x)).
In Ω0 there is a single point, 𝜇 = 𝜇0 (because we want sup_{μ∈Ω0} L(μ|x)).
sup_{μ∈Ω} L(μ|x) = L(μ̂|x) = L(x̄|x) = (2πσ²)^{−n/2} e^{−Σ(xi−x̄)²/(2σ²)} .
sup_{μ∈Ω0} L(μ|x) = L(μ0|x) = (2πσ²)^{−n/2} e^{−Σ(xi−μ0)²/(2σ²)} .
4.
So
λ(x) = sup_{μ∈Ω0} L(μ|x) / sup_{μ∈Ω} L(μ|x)
     = [(2πσ²)^{−n/2} e^{−Σ(xi−μ0)²/(2σ²)}] / [(2πσ²)^{−n/2} e^{−Σ(xi−x̄)²/(2σ²)}]
     = e^{[Σ(xi−x̄)² − Σ(xi−μ0)²]/(2σ²)}   (2.12)
     = e^{−n(x̄−μ0)²/(2σ²)} ,
since Σ(xi − μ0)² = Σ(xi − x̄)² + n(x̄ − μ0)² (shown before this example).
5. Reject 𝐻0 if λ(x) ≤ c, i.e. e^{−n(x̄−μ0)²/(2σ²)} ≤ c, i.e. −n(x̄−μ0)²/(2σ²) ≤ ln c, i.e.
(x̄ − μ0)² ≥ −(2σ²/n) ln c = k² (say).
6. Thus reject 𝐻0 if
(x̄ − μ0)² ≥ k²
√((x̄ − μ0)²) ≥ √(k²)
|x̄ − μ0| ≥ k
±(x̄ − μ0) ≥ k
−(x̄ − μ0) ≥ k or (x̄ − μ0) ≥ k
(x̄ − μ0) ≤ −k or (x̄ − μ0) ≥ k
x̄ ≤ μ0 − k or x̄ ≥ μ0 + k ,
where, under 𝐻0, Xi ∼ N(μ0, σ²), so X̄ ∼ N(μ0, σ²/n).
Let
α/2 = P_{μ0}[X̄ ≥ μ0 + k]
    = P_{μ0}[(X̄ − μ0)/(σ/√n) ≥ ((μ0 + k) − μ0)/(σ/√n)] = P_{μ0}[Z ≥ k/(σ/√n)] = 1 − P[Z ≤ k/(σ/√n)]
    = 1 − Φ(k/(σ/√n)) ,
so
Φ(k/(σ/√n)) = 1 − α/2
k/(σ/√n) = Φ^{−1}(1 − α/2)
k/(σ/√n) = z_{1−α/2}
k = z_{1−α/2} σ/√n .
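(A sketch — my addition — of the resulting two-sided z-test of H0: μ = μ0 as a reusable function, using scipy for z_{1−α/2}; the numbers in the call are made up.)

```python
from scipy.stats import norm

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """LR-test of Example 2.3: reject H0 if |xbar - mu0| >= k = z_{1-alpha/2}*sigma/sqrt(n)."""
    k = norm.ppf(1 - alpha / 2) * sigma / n**0.5
    return abs(xbar - mu0) >= k, k

# hypothetical numbers, just to show the call
print(z_test_two_sided(xbar=103.2, mu0=100, sigma=10, n=25, alpha=0.05))
```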
In the next important example we will show how to deal with a nuisance parameter, that is, a
parameter which we are not interested in testing, but which is unknown.)
Example 2.4 : Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) where 𝜎 2 is unknown and we want to derive the
LR–test of size 𝛼 for 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0 .
1. Likelihood function :
L(μ, σ²|x) = (2πσ²)^{−n/2} e^{−Σ(xi−μ)²/(2σ²)} .   (pdf on formula sheet, ∏ f(xi|μ, σ²))
2. 𝐻0 : 𝜇 = 𝜇0 vs 𝐻1 : 𝜇 ≠ 𝜇0 .
Now the parameter space is two–dimensional since we have two unknown parameters.
Ω = {(𝜇, 𝜎 2 ) : − ∞ < 𝜇 < ∞ , 𝜎 2 > 0}
Ω0 = {(𝜇, 𝜎 2 ): 𝜇 = 𝜇0 , 𝜎 2 > 0} .
3. In Ω the ML estimators are μ̂ = x̄ and σ̂² = (1/n)Σ(xi − x̄)². This is from:
(∂/∂μ) ln L(μ, σ²|x) |_{μ=μ̂, σ²=σ̂²} = 0 , which yields μ̂ = x̄ ,
and
(∂/∂σ²) ln L(μ, σ²|x) |_{μ=μ̂, σ²=σ̂²} = 0 , which yields σ̂² = (1/n)Σ(xi − μ̂)² = (1/n)Σ(xi − x̄)² .
In Ω0,
μ = μ0 and σ̂0² = (1/n)Σ(xi − μ0)² .
This is from:
μ can only be μ0, but we still have to estimate σ²:
(∂/∂σ²) ln L(μ, σ²|x) |_{μ=μ0, σ²=σ̂0²} = 0 yields σ̂0² = (1/n)Σ(xi − μ0)² .
μ0 is in the formula because we set μ = μ0.
4. (Plug what you got in step 3 into the likelihood, as this is where the sup (max) occurs.)
In Ω:
sup_{(μ,σ²)∈Ω} L(μ, σ²|x) = L(μ̂, σ̂²|x)
  = (2πσ̂²)^{−n/2} e^{−Σ(xi−x̄)²/(2σ̂²)}
  = (2πσ̂²)^{−n/2} e^{−Σ(xi−x̄)²/(2(1/n)Σ(xi−x̄)²)}
  = (2πσ̂²)^{−n/2} e^{−n/2} ,
and similarly in Ω0: sup_{(μ,σ²)∈Ω0} L(μ0, σ0²|x) = (2πσ̂0²)^{−n/2} e^{−n/2} .
So
λ(x) = sup_{(μ,σ²)∈Ω0} L(μ0, σ0²|x) / sup_{(μ,σ²)∈Ω} L(μ, σ²|x) = [(2πσ̂0²)^{−n/2} e^{−n/2}] / [(2πσ̂²)^{−n/2} e^{−n/2}] = (σ̂0²/σ̂²)^{−n/2} .   (2.14)
6.
λ(x) = (σ̂0²/σ̂²)^{−n/2} = [((1/n)Σ(xi − μ0)²) / ((1/n)Σ(xi − x̄)²)]^{−n/2}
     = [Σ(xi − μ0)² / Σ(xi − x̄)²]^{−n/2}
     = [(Σ(xi − x̄)² + n(x̄ − μ0)²) / Σ(xi − x̄)²]^{−n/2}
     = [1 + n(x̄ − μ0)²/Σ(xi − x̄)²]^{−n/2} .
(using the identity shown before the previous example)
Reject H0 if
[1 + n(x̄ − μ0)²/Σ(xi − x̄)²]^{−n/2} ≤ c
1 + n(x̄ − μ0)²/Σ(xi − x̄)² ≥ c^{−2/n}
n(x̄ − μ0)²/Σ(xi − x̄)² ≥ c^{−2/n} − 1
n(x̄ − μ0)²/Σ(xi − x̄)² ≥ d²   (where d² = c^{−2/n} − 1) ,
n(x̄ − μ0)² / [(n−1) · (1/(n−1))Σ(xi − x̄)²] ≥ d²
n(x̄ − μ0)² / [(n−1)s²] ≥ d²
√( n(x̄ − μ0)² / [(n−1)s²] ) ≥ √(d²)
that is,
√n |x̄ − μ0| / √((n−1)s²) ≥ d ,   (2.15)
where s² = (1/(n−1))Σ(xi − x̄)², the unbiased estimator for σ².
We must now write the left–hand side of (2.15) in terms of a statistic with known distribution
under 𝐻0 .
Important Relationship II :
If Z ∼ N(0,1), independent of U ∼ χ²_ν, then
T = Z/√(U/ν) ∼ t_ν .
We know that under 𝐻0, Xi ∼ N(μ0, σ²), so X̄ ∼ N(μ0, σ²/n), so that
Z = (X̄ − μ0)/(σ/√n) = √n(X̄ − μ0)/σ ∼ N(0,1) ,
and, independently, U = (n−1)S²/σ² = Σ(Xi − X̄)²/σ² ∼ χ²_{n−1}, so that
T = Z/√(U/(n−1)) = √n(X̄ − μ0)/S ∼ t_{n−1} .
Reject H0 if
√n|x̄ − μ0|/√((n−1)s²) ≥ d
√n|x̄ − μ0|/(s√(n−1)) ≥ d
√n|x̄ − μ0|/s ≥ d√(n−1)
|T| ≥ d√(n−1)
|T| ≥ k
±T ≥ k
−T ≥ k or T ≥ k
T ≤ −k or T ≥ k ,
i.e. |T| = √n|X̄ − μ0|/S ≥ k (where k = √(n−1) d), that is, reject if T ≤ −k or T ≥ k, where T ∼ t_{n−1} under H0.
8. Reject the null hypothesis 𝐻0: 𝜇 = 𝜇0 if |t| ≥ t_{n−1,1−α/2}, where |t| = √n|x̄ − μ0|/s,
or, equivalently, if x̄ ≤ μ0 − t_{n−1,1−α/2} s/√n or x̄ ≥ μ0 + t_{n−1,1−α/2} s/√n .   (2.17)
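(A sketch of mine applying rejection rule (2.17) to sample data, checked against scipy's built-in one-sample t-test; the data values are made up for illustration.)

```python
import numpy as np
from scipy import stats

x = np.array([10.2, 9.7, 11.1, 10.8, 9.9, 10.5, 10.0, 11.3])   # hypothetical data
mu0, alpha = 10.0, 0.05
n = len(x)

t_stat = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(abs(t_stat) >= t_crit, t_stat)

# cross-check with the built-in test (same t statistic, two-sided p-value)
print(stats.ttest_1samp(x, popmean=mu0))
```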
In the previous examples we have only dealt with two-sided (≠) alternatives (H1) and a point (=)
null hypothesis (H0). One-sided LR-tests are derived in a similar manner and are usually just a
one-sided version of the tests described above.
In Example 2.4, for 𝐻0: 𝜇 = 𝜇0 versus 𝐻1: 𝜇 ≠ 𝜇0, the LR-test of size 𝛼 is:
Reject H0 if x̄ ≤ μ0 − t_{n−1,1−α/2} s/√n or x̄ ≥ μ0 + t_{n−1,1−α/2} s/√n .
If instead the two hypotheses were 𝐻0: 𝜇 ≤ 𝜇0 versus 𝐻1: 𝜇 > 𝜇0, the LR-test of size 𝛼 would be:
Reject H0 if x̄ ≥ μ0 + t_{n−1,1−α} s/√n .   (2.18)
These are one-sided critical regions and the only difference to (2.17) is that the 𝛼/2 is replaced
by 𝛼.
(A similar thing happens in Example 2.3.)
The above example is one case to which the theory of the next section does not apply.
Like in Chapter 1, where you can replace f(x|θ) (the pdf of the entire sample X1, …, Xn) with g(t|θ)
(the pdf of a sufficient statistic) in calculating the likelihood ratio, you can do so also for LR-tests,
i.e. L(θ|x) = f(x|θ) can be replaced with L(θ|t) = g(t|θ). This is one of the things that is
different about Example 2.5.
(Since 𝜃 is the lower boundary of the sample space, we know that a sufficient statistic for 𝜃 is
𝑌 = min{𝑋1 , … , 𝑋𝑛 }, the first order statistic. Just as in Chapter 1, we can base our test on
sufficient statistics.)
Preliminary calculations:
𝑁𝑜𝑤 𝑓𝑋 (𝑥|𝜃) = 𝑒 −(𝑥−𝜃) , 𝜃 ≤ 𝑥 < ∞ , 𝑎𝑛𝑑
F_X(x|θ) = ∫_{−∞}^x f_X(u|θ) du = ∫_θ^x e^{−(u−θ)} du = ⋯ = 1 − e^{−(x−θ)} .
(From (1.20) follows that) t = Y = min(Xi) = X(1) (introduced sufficient statistics as t, theorem 1.4
introduced smallest order statistic as min(Xi) = X(1), we’ll just call it Y)
𝑓𝑋(1) (𝑥) = 𝑛𝑓𝑋 (𝑥)[1 − 𝐹𝑋 (𝑥)]𝑛−1
𝑓𝑌 (𝑦|𝜃) = 𝑛 𝑒 −(𝑦−𝜃) 𝑒 −(𝑛−1)(𝑦−𝜃) (2.20)
= 𝑛 𝑒 −𝑛(𝑦−𝜃) , 𝜃 ≤ 𝑦 < ∞ .
1. Likelihood function :
𝐿(𝜃|𝑦) = fY (y|𝜃) = 𝑛𝑒 −𝑛(𝑦−𝜃) , 0 ≤ 𝜃 ≤ 𝑦.
2. 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 .
Ω = {𝜃 : 𝜃 > 0}
Ω0 = {𝜃 : 0 < 𝜃 ≤ 𝜃0 } .
3.
𝐿(𝜃|𝑦) = 𝑛𝑒 −𝑛(𝑦−𝜃) is an increasing function of 𝜃 because:
𝜃 ↑ implies −𝜃 ↓ implies 𝑦 − 𝜃 ↓ implies 𝑛(𝑦 − 𝜃) ↓ implies −𝑛(𝑦 − 𝜃) ↑ implies
𝑒 −𝑛(𝑦−𝜃) ↑ implies 𝑛𝑒 −𝑛(𝑦−𝜃) ↑ implies 𝐿(𝜃|𝑦) ↑
4.
So
sup_{θ∈Ω} L(θ|y) = L(y|y) = n e^{−n(y−y)} = n e^0 = n ,
sup_{θ∈Ω0} L(θ|y) = L(θ0|y) = n e^{−n(y−θ0)} if θ0 ≤ y , and = L(y|y) = n if θ0 > y ,
and
λ(y) = sup_{θ∈Ω0} L(θ|y) / sup_{θ∈Ω} L(θ|y) = e^{−n(y−θ0)} if θ0 ≤ y , and = 1 if θ0 > y .
5. Reject 𝐻0 if 𝜆(𝑦) ≤ 𝑐 where 0 < 𝑐 < 1. So we will never reject 𝐻0 if 𝜃0 > 𝑦. So reject
𝐻0 if 𝑒 −𝑛(𝑦−𝜃0 ) ≤ 𝑐 for 𝜃0 ≤ 𝑦. That is, when
𝑒 −𝑛(𝑦−𝜃0) ≤ 𝑐
…
y ≥ k   (where k = θ0 − (ln c)/n) .
7. Let α = sup P(Reject H0 | H0 true) = sup_{θ≤θ0} P_θ[Y ≥ k]. Notice that since the null
hypothesis is now composite we must find the supremum of the probability and set it equal to
𝛼.
P_θ[Y ≥ k] = ∫_k^∞ f(y|θ) dy = ∫_k^∞ n e^{−n(y−θ)} dy = ⋯ = e^{−n(k−θ)} .
This is increasing in 𝜃, so the supremum over 𝜃 ≤ 𝜃0 is attained at 𝜃 = 𝜃0, giving
α = e^{−n(k−θ0)} and hence k = θ0 − (ln α)/n.
Reject 𝐻0 if y ≥ k, i.e. y ≥ θ0 − (ln α)/n, where y = min{x1, …, xn}.
In this section we will derive the LR-tests in two important examples, tests that are often
encountered in practice. Here we are comparing the parameters of two different populations.
Example 2.10 : Consider a sample 𝑋1, …, 𝑋𝑛 of size 𝑛 from a N(μ1, σ²) distribution and an
independent sample 𝑌1, …, 𝑌𝑚 of size 𝑚 from a N(μ2, σ²) distribution. Note that the variances
are assumed equal but unknown. We want to derive the LR-test for 𝐻0: 𝜇1 = 𝜇2 (= 𝜇0) versus
𝐻1: 𝜇1 ≠ 𝜇2.
1. Likelihood function :
L(μ1, μ2, σ²|x, y)
= f_{X1}(x1|μ1, σ²) × … × f_{Xn}(xn|μ1, σ²) × f_{Y1}(y1|μ2, σ²) × … × f_{Ym}(ym|μ2, σ²)
= ∏_{i=1}^n f_X(xi|μ1, σ²) × ∏_{j=1}^m f_Y(yj|μ2, σ²)
= ∏_{i=1}^n (2πσ²)^{−1/2} e^{−(xi−μ1)²/(2σ²)} × ∏_{j=1}^m (2πσ²)^{−1/2} e^{−(yj−μ2)²/(2σ²)}
= (2πσ²)^{−(n+m)/2} e^{−[Σ(xi−μ1)² + Σ(yj−μ2)²]/(2σ²)}   (2.26)
2. 𝐻0: 𝜇1 = 𝜇2 (= 𝜇0) versus 𝐻1: 𝜇1 ≠ 𝜇2 .
3. In Ω, set
(∂/∂μ1) ln L(μ1, μ2, σ²|x, y) = 0 , (∂/∂μ2) ln L(μ1, μ2, σ²|x, y) = 0 , (∂/∂σ²) ln L(μ1, μ2, σ²|x, y) = 0 ,
evaluated at μ1 = μ̂1, μ2 = μ̂2, σ² = σ̂². This gives
μ̂1 = x̄ and μ̂2 = ȳ .   (2.27)
Further,
(∂/∂σ²) ln L |_{μ1=μ̂1, μ2=μ̂2, σ²=σ̂²} = −(n+m)/(2σ̂²) + [Σ(xi − μ̂1)² + Σ(yj − μ̂2)²]/(2σ̂⁴) = 0 .
Set equal to zero, replace μ1 and μ2 by their MLEs and solve for σ². Thus
σ̂² = [Σ(xi − x̄)² + Σ(yj − ȳ)²]/(n + m) .   (2.28)
In Ω0 (we call the σ² here σ0² for convenience) the means are equal, so we set
(∂/∂μ0) ln L(μ0, μ0, σ0²|x, y) |_{μ0=μ̂0, σ0²=σ̂0²} = 0 and (∂/∂σ0²) ln L(μ0, μ0, σ0²|x, y) |_{μ0=μ̂0, σ0²=σ̂0²} = 0 .
Now
(∂/∂μ0) ln L(μ0, μ0, σ0²|x, y) = (∂/∂μ0) {−[Σ(xi − μ0)² + Σ(yj − μ0)²]/(2σ0²)}
                              = [Σ(xi − μ0) + Σ(yj − μ0)]/σ0² ,
so
(∂/∂μ0) ln L |_{μ0=μ̂0, σ0²=σ̂0²} = [Σ(xi − μ̂0) + Σ(yj − μ̂0)]/σ̂0² = 0 ,
so that
Σ(xi − μ̂0) + Σ(yj − μ̂0) = 0
Σxi − nμ̂0 + Σyj − mμ̂0 = 0
Σxi + Σyj = nμ̂0 + mμ̂0
μ̂0(n + m) = Σxi + Σyj   (2.29)
μ̂0 = (Σxi + Σyj)/(n + m) = (nx̄ + mȳ)/(n + m) .
As before,
σ̂0² = [Σ(xi − μ̂0)² + Σ(yj − μ̂0)²]/(n + m) .   (2.30)
4. Then
sup_{Ω0} L(μ0, σ0²|x, y) = (2πσ̂0²)^{−(n+m)/2} e^{−(n+m)/2} and sup_{Ω} L(μ1, μ2, σ²|x, y) = (2πσ̂²)^{−(n+m)/2} e^{−(n+m)/2} ,
so that
λ(x, y) = sup_{Ω0} L(μ0, σ0²|x, y) / sup_{Ω} L(μ1, μ2, σ²|x, y) = (σ̂0²/σ̂²)^{−(n+m)/2}
        = [ ( [Σ(xi − μ̂0)² + Σ(yj − μ̂0)²]/(n+m) ) / ( [Σ(xi − x̄)² + Σ(yj − ȳ)²]/(n+m) ) ]^{−(n+m)/2}   (2.31)
        = [ (Σ(xi − μ̂0)² + Σ(yj − μ̂0)²) / (Σ(xi − x̄)² + Σ(yj − ȳ)²) ]^{−(n+m)/2} .
To simplify, notice that (like the identity Σ(xi − μ0)² = Σ(xi − x̄)² + n(x̄ − μ0)² mentioned before
Example 2.3 and used in Examples 2.3 and 2.4) we have
Σ(xi − μ̂0)² + Σ(yj − μ̂0)² = Σ(xi − x̄)² + Σ(yj − ȳ)² + (nm/(n+m))(x̄ − ȳ)² ,
so that
λ(x, y) = [ (Σ(xi − x̄)² + Σ(yj − ȳ)² + (nm/(n+m))(x̄ − ȳ)²) / (Σ(xi − x̄)² + Σ(yj − ȳ)²) ]^{−(n+m)/2}
        = [ 1 + (nm/(n+m)) (x̄ − ȳ)² / (Σ(xi − x̄)² + Σ(yj − ȳ)²) ]^{−(n+m)/2} .
Rejecting 𝐻0 for small λ(x, y) is therefore equivalent to rejecting 𝐻0 for large values of
|x̄ − ȳ| / √((1/n + 1/m) Sp²) ,   (2.33)
where
Sp² = [Σ(xi − x̄)² + Σ(yj − ȳ)²]/(n + m − 2)   (2.34)
is the pooled variance estimator,
and as nm/(n+m) = ((n+m)/(nm))^{−1} = (n/(nm) + m/(nm))^{−1} = (1/m + 1/n)^{−1} = 1/(1/m + 1/n).
6. Now the numerator and denominator in (2.33) are independent and we must find the
distribution of the left-hand side under 𝐻0. (We work under 𝐻0 because we are going to use
these results in α, in which it is given that 𝐻0 is true.)
X̄ ∼ N(μ0, σ²/n) and Ȳ ∼ N(μ0, σ²/m) , so
X̄ − Ȳ ∼ N(0, σ²/n + σ²/m) , so Z = (X̄ − Ȳ − 0)/√(σ²/n + σ²/m) = (X̄ − Ȳ)/√((1/n + 1/m)σ²) ∼ N(0,1) .
As (for independent X̄ and Ȳ):
E(aX̄ + bȲ) = aE(X̄) + bE(Ȳ)
V(aX̄ + bȲ) = a²V(X̄) + b²V(Ȳ)
So:
E(X̄ − Ȳ) = E(X̄) − E(Ȳ) = μ0 − μ0 = 0
V(X̄ − Ȳ) = V(X̄) + V(Ȳ) = σ²/n + σ²/m .
Next,
(n−1)S_X²/σ² = Σ(Xi − X̄)²/σ² ∼ χ²_{n−1} and Σ(Yj − Ȳ)²/σ² ∼ χ²_{m−1}   (see Ch. 6 of the Rice textbook)
independently, so that
Σ(Xi − X̄)²/σ² + Σ(Yj − Ȳ)²/σ² ∼ χ²_{n−1+m−1} , i.e.
U = (n+m−2)Sp²/σ² = [Σ(Xi − X̄)² + Σ(Yj − Ȳ)²]/σ² ∼ χ²_{n+m−2} .
If Z ∼ N(0,1) and U ∼ χ²_v then Z/√(U/v) ∼ t_v, from the definition of the t-distribution from 2nd year,
so that
W = Z/√(U/(n+m−2)) = [(X̄ − Ȳ)/√((1/n + 1/m)σ²)] / √(Sp²/σ²) = (X̄ − Ȳ)/√((1/n + 1/m)Sp²) ∼ t_{n+m−2} .   (2.35)
7. So we reject 𝐻0 if
|x̄ − ȳ|/√((1/n + 1/m)sp²) > k , i.e. |w| > k , i.e. ±w > k , i.e. w > k or w < −k ,
where W ∼ t_{n+m−2}.
Since the 𝑡-distribution is symmetric around zero, α/2 = P[W > k] ,
so 1 − α/2 = P[W < k] ,
so 1 − α/2 = F_W(k) ,
so k = F_W^{−1}(1 − α/2) where W ∼ t_{n+m−2} ,
so k = t_{n+m−2; 1−α/2} .
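(A sketch — my addition — of the pooled two-sample t-test (2.35), cross-checked against scipy's ttest_ind with equal_var=True; the data are invented purely for illustration.)

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3])        # hypothetical sample 1
y = np.array([5.8, 5.5, 6.0, 5.9, 5.4, 5.7, 6.1])   # hypothetical sample 2
n, m, alpha = len(x), len(y), 0.05

sp2 = ((x - x.mean())**2).sum() + ((y - y.mean())**2).sum()
sp2 /= (n + m - 2)                                   # pooled variance estimator (2.34)
w = (x.mean() - y.mean()) / np.sqrt((1/n + 1/m) * sp2)
k = stats.t.ppf(1 - alpha / 2, df=n + m - 2)

print(abs(w) > k, w)
print(stats.ttest_ind(x, y, equal_var=True))         # same statistic and two-sided p-value
```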
Remarks :
𝐻0 : 𝜇1 = 𝜇2 versus 𝐻1 : 𝜇1 ≠ 𝜇2 Reject H0 if |𝑤| > 𝑡𝑛+𝑚−2;1−𝛼/2
Is the same as
𝐻0 : 𝜇1 = 𝜇2 versus 𝐻1 : 𝜇1 ≠ 𝜇2 Reject H0 if 𝑤 > 𝑡𝑛+𝑚−2;1−𝛼/2 𝑜𝑟 𝑤 < −𝑡𝑛+𝑚−2;1−𝛼/2
Example 2.11 : Let 𝑋1, …, 𝑋𝑛 ∼ N(μ1, σ1²) and, independently, 𝑌1, …, 𝑌𝑚 ∼ N(μ2, σ2²) with
all parameters unknown. Derive the LR-test of 𝐻0: σ1² = σ2² (= σ0²) versus 𝐻1: σ1² ≠ σ2².
1. Likelihood function :
L(μ1, μ2, σ1², σ2²|x, y)
= f_{X1}(x1|μ1, σ1²) × … × f_{Xn}(xn|μ1, σ1²) × f_{Y1}(y1|μ2, σ2²) × … × f_{Ym}(ym|μ2, σ2²)
= ∏_{i=1}^n f_X(xi|μ1, σ1²) × ∏_{j=1}^m f_Y(yj|μ2, σ2²)
= ∏_{i=1}^n (2πσ1²)^{−1/2} e^{−(xi−μ1)²/(2σ1²)} × ∏_{j=1}^m (2πσ2²)^{−1/2} e^{−(yj−μ2)²/(2σ2²)}
= (2πσ1²)^{−n/2} (2πσ2²)^{−m/2} e^{−Σ(xi−μ1)²/(2σ1²) − Σ(yj−μ2)²/(2σ2²)} .   (2.37)
In Ω the ML estimators are
μ̂1 = x̄ , μ̂2 = ȳ , σ̂1² = (1/n)Σ(xi − x̄)² , σ̂2² = (1/m)Σ(yj − ȳ)² .
In Ω0 the means are unrestricted, so μ̂1;0 = x̄ , μ̂2;0 = ȳ , but as in (2.28), when the variances are
assumed equal,
σ̂0² = [Σ(xi − x̄)² + Σ(yj − ȳ)²]/(n + m) .
Important Relationship IV :
If X ∼ χ²_ν, independent of Y ∼ χ²_ω, then
W = (X/ν)/(Y/ω) ∼ F_{ν,ω} .   (2.39)
Let s1² = (1/(n−1))Σ(xi − x̄)² and s2² = (1/(m−1))Σ(yj − ȳ)². Then, under 𝐻0,
s2²/s1² = [((m−1)s2²/σ0²)/(m−1)] / [((n−1)s1²/σ0²)/(n−1)] = [(1/(m−1))Σ(Yj − Ȳ)²] / [(1/(n−1))Σ(Xi − X̄)²]
        = ((n−1)/(m−1)) V ∼ F_{m−1,n−1} ,
where V = Σ(Yj − Ȳ)²/Σ(Xi − X̄)².
Then the LR-test is equivalent to:
Reject 𝐻0 if v < k1 or v > k2 ,
i.e. if ((n−1)/(m−1)) v < ((n−1)/(m−1)) k1 or ((n−1)/(m−1)) v > ((n−1)/(m−1)) k2 ,
i.e. if ((n−1)/(m−1)) v < k1′ or ((n−1)/(m−1)) v > k2′ ,
i.e. if s2²/s1² < k1′ or s2²/s1² > k2′ , where s2²/s1² ∼ F_{m−1,n−1} .
7. Now α = sup P(Reject H0 | H0 true) = sup P_{σ0²}(Reject H0) = P_{σ0²}(Reject H0)
= P_{σ0²}[s2²/s1² < k1′ or s2²/s1² > k2′] = P_{σ0²}[s2²/s1² < k1′] + P_{σ0²}[s2²/s1² > k2′]
α/2 + α/2 = P_{σ0²}[s2²/s1² < k1′] + P_{σ0²}[s2²/s1² > k2′]
Let P_{σ0²}[s2²/s1² < k1′] = α/2 and P_{σ0²}[s2²/s1² > k2′] = α/2 ; then
P_{σ0²}[s2²/s1² < k1′] = α/2 and P_{σ0²}[s2²/s1² < k2′] = 1 − α/2 , where s2²/s1² ∼ F_{m−1,n−1} ,
F_{F_{m−1,n−1}}(k1′) = α/2 and F_{F_{m−1,n−1}}(k2′) = 1 − α/2 ,
k1′ = F^{−1}_{F_{m−1,n−1}}(α/2) and k2′ = F^{−1}_{F_{m−1,n−1}}(1 − α/2) ,
k1′ = F_{m−1,n−1; α/2} and k2′ = F_{m−1,n−1; 1−α/2} .
However, this two-sided test can be reduced to a one-sided test since s1²/s2² ∼ F_{n−1,m−1}
(similar to how we got s2²/s1² ∼ F_{m−1,n−1}) and F_{m−1,n−1; α/2} = 1/F_{n−1,m−1; 1−α/2}
(easy to show, similar to above).
(So we just use one F table, the 1 − α/2 table. You might only have the 97.5% table and not
the 2.5% table, and have α = 0.05; then by doing this second-last step you only need the 97.5% table.)
( OR: Reject 𝐻0 if s2²/s1² > F_{m−1,n−1; 1−α} , where s2² is the larger variance.   (2.40a) )
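(A sketch — my own addition — of the equal-tail F-test for H0: σ1² = σ2² using scipy's F quantiles; the data are invented.)

```python
import numpy as np
from scipy import stats

x = np.array([12.1, 11.8, 13.0, 12.5, 11.5, 12.8, 12.2])   # hypothetical sample 1
y = np.array([11.9, 13.5, 10.8, 14.1, 12.0, 13.2])          # hypothetical sample 2
n, m, alpha = len(x), len(y), 0.05

ratio = y.var(ddof=1) / x.var(ddof=1)                # s2^2 / s1^2 ~ F_{m-1, n-1} under H0
k1 = stats.f.ppf(alpha / 2, m - 1, n - 1)
k2 = stats.f.ppf(1 - alpha / 2, m - 1, n - 1)

print(ratio < k1 or ratio > k2, ratio, (k1, k2))     # reject H0 if outside (k1, k2)
```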
Remarks :
1. For the one-sided hypotheses 𝐻0: σ1² ≤ σ2² (𝐻0: σ1² ≥ σ2²) versus 𝐻1: σ1² > σ2²
(𝐻1: σ1² < σ2²), the LR-test is to reject 𝐻0 if s1²/s2² > F_{n−1,m−1,1−α} (if s2²/s1² > F_{m−1,n−1,1−α}).
(Similar to before – see previous examples.)
3. For 𝑘 populations the hypothesis 𝐻0 : 𝜎12 = 𝜎22 = ⋯ = 𝜎𝑘2 (= 𝜎02 ) is usually tested by
means of an approximate test, using the asymptotic distribution of the LR–statistic (see a later
section and Exercise 5).
In Chapter 1 we had the definition of a most powerful test (MPT) for simple hypotheses.
There we were dealing with simple hypotheses (= and =), so the size is 𝛼* = 𝜋*(𝜃0) for the test Γ*:
Definition 1.8 : A test Γ* is a most powerful test (MPT) of size 𝛼 (0 < 𝛼 < 1) if :
(i) 𝜋*(𝜃0) = 𝛼 , (i.e. 𝛼* = 𝛼) and
(ii) 𝛽* ≤ 𝛽
for all other tests Γ of size 𝛼 or smaller.
We had:
(ii) 𝛽 ∗ ≤ 𝛽 i.e. 1 − 𝜋 ∗ (𝜃1 ) ≤ 1 − 𝜋(𝜃1 ) i.e. −𝜋 ∗ (𝜃1 ) ≤ −𝜋(𝜃1 ) i.e. 𝜋 ∗ (𝜃1 ) ≥ 𝜋(𝜃1 )
Remember, 𝜋(𝜃1 ) = 1 − 𝛽 is called the power of the test, so 𝛽 = 1 − 𝜋(𝜃1 )
𝜋(𝜃) is called the power function.
A more general version of that definition for composite hypotheses (where the size is
𝛼* = sup_{θ∈Ω0} π*(θ) for the test Γ*) is as follows:
(Remember Ω𝑐0 = Ω1 .)
So a test is UMP if its power function is higher over the region where the alternative is true than
any other test whose size is the same or smaller.
FIGURE 2.3 : Power functions of two tests
In Figure 2.3 the power functions, 𝜋 ∗ (𝜃) and 𝜋(𝜃) , are depicted for two tests of the
hypotheses 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 . Both tests have size 𝛼, but Γ ∗ is a uniformly more
powerful test than Γ.
In general, only one–sided tests are UMP. The following theorem will show how to obtain an
UMP test by using the result of the Neyman–Pearson Lemma.
testing 𝐻0′ : 𝜃 = 𝜃0 versus 𝐻1′ : 𝜃 = 𝜃′ . We now have two simple hypotheses and since the
rejection region has the same form as in the Neyman–Pearson Lemma (Th1.2, or we can use
Th1.3), it follows that 𝜋 ∗ (𝜃′) ≥ 𝜋(𝜃′) where 𝜋(𝜃) is the power function of any other size 𝛼
test of 𝐻′0 , that is, any test satisfying 𝜋(𝜃0 ) = 𝛼. (For our original test, sup𝜃∈Ω0 𝜋(𝜃) = 𝜋(𝜃0 ))
But 𝜃′ was arbitrary, so 𝜋 ∗ (𝜃′) ≥ 𝜋(𝜃′) for all 𝜃′ ∈ Ω𝑐0 and the result follows.
Example 2.6 : Consider again Example 1.3(b) where the MP test of size 𝛼 was derived for
testing 𝐻0: 𝜃 = 𝜃0 versus 𝐻1: 𝜃 = 𝜃1, where 𝜃0 < 𝜃1 and 𝑋1, …, 𝑋𝑛 ∼ N(θ, σ²) with σ²
known. The MPT from (1.13) is: Reject 𝐻0 if x̄ > θ0 + z_{1−α} σ/√n .   (2.22)
Suppose now we want to test 𝐻0: 𝜃 ≤ 𝜃0 versus 𝐻1: 𝜃 > 𝜃0. We thus use the same rejection
rule according to Theorem 2.1, i.e. Reject 𝐻0 if x̄ > θ0 + z_{1−α} σ/√n, for our UMP test of size 𝛼.
(We can see that the conditions for the theorem are met:
We see that the critical region above is independent of 𝜃1, so it is the MP test for any 𝜃1 > 𝜃0.
Condition (i) of the theorem is true since the power function of the test,
π*(θ) = P_θ[X̄ > θ0 + z_{1−α} σ/√n], is an increasing function of 𝜃, so sup_{θ≤θ0} π*(θ) = π*(θ0) = α.
So (2.22) is also the UMP test for 𝐻0: 𝜃 ≤ 𝜃0 against 𝐻1: 𝜃 > 𝜃0.)
Another way to derive an UMP test is to apply the Karlin–Rubin theorem. For that we need the
following definition. I do this section different to how the original notes do it, so only use this
updated version of the notes for the rest of this section 2.3.
Definition 2.4 :
a) A family of pdfs {g(t|θ): θ ∈ Ω} for a random variable 𝑇 with parameter 𝜃 has a
monotone 𝜃0′ < 𝜃1′ likelihood ratio (MLR 𝜃0′ < 𝜃1′) if, for every 𝜃0′ < 𝜃1′, g(t|θ0′)/g(t|θ1′) is a
nondecreasing (increasing or constant) function of 𝑡.
b) A family of pdfs {g(t|θ): θ ∈ Ω} for a random variable 𝑇 with parameter 𝜃 has a
monotone 𝜃0′ > 𝜃1′ likelihood ratio (MLR 𝜃0′ > 𝜃1′) if, for every 𝜃0′ > 𝜃1′, g(t|θ0′)/g(t|θ1′) is a
nondecreasing (increasing or constant) function of 𝑡.
(Note: 𝑡 can also be the entire sample, i.e. 𝒙, i.e. g(t|…) becomes f(x|…). Non-increasing
(decreasing or constant) w.r.t. 𝑠 is the same as non-decreasing (increasing or constant) w.r.t.
𝑡 = −𝑠.) Many common families of distributions have a MLR. Indeed, any regular exponential
family with g(t|θ) = h(t)c(θ)e^{w(θ)t} has a MLR if 𝑤(𝜃) is a nondecreasing function.
Remarks :
1. Note that the theorem only applies to one–sided hypotheses. As we shall illustrate in Example
2.9, no UMP test exists for two–sided hypotheses.
2. To apply this theorem, first determine the likelihood ratio of the sample, or of a sufficient
statistic. Verify that it is a nondecreasing function of the sufficient statistic for the two parameter
values, where if 𝐻0: 𝜃 ≤ 𝜃0 we set 𝜃0′ < 𝜃1′, and if 𝐻0: 𝜃 ≥ 𝜃0 we set 𝜃0′ > 𝜃1′. Write the test
down and determine the constant 𝑐.
3. The test is for a one–parameter family of distributions, or in other words, only the parameter
of interest must be unknown.
Example 2.7 : Let 𝑋1 , … , 𝑋𝑛 be a sample from a Poisson distribution with parameter 𝜃. Find
a UMP test of size 𝛼 for 𝐻0 : 𝜃 ≥ 𝜃0 versus 𝐻1 : 𝜃 < 𝜃0 .
Let 𝜃0′ > 𝜃1′ and calculate the ratio
f(x|θ0′)/f(x|θ1′) = L(θ0′|x)/L(θ1′|x) = [θ0′^{Σxi} e^{−nθ0′}/∏xi!] / [θ1′^{Σxi} e^{−nθ1′}/∏xi!]
                  = e^{−n(θ0′−θ1′)} (θ0′/θ1′)^{Σxi} .
(It is easy to see that a·b^t is increasing w.r.t. 𝑡 if a > 0 and b > 1,
because if 𝑡 gets bigger, then b^t gets bigger if b > 1, and then a·b^t gets bigger if a > 0.)
Since 𝜃0′ > 𝜃1′ , this is N.B. an increasing function of 𝑇 = Σ𝑋𝑖 . Furthermore, Σ𝑋𝑖 is a sufficient
statistic for 𝜃. So, according to the Karlin–Rubin theorem we can immediately state that the UMP
test is of the form:
𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 Σ𝑥𝑖 < 𝑐 .
To find 𝑐 so that the size of the test is 𝛼, let
α = sup P[Reject H0 | H0 true] = sup_{θ≥θ0} P[Σxi < c | θ] = P_{θ0}[ΣXi < c] .
Under 𝐻0 , Σ𝑋𝑖 ∼ Pois(𝑛𝜃0 ), a discrete distribution, so it may not be possible to find a 𝑐 so that
the size of the test is exactly 𝛼.
From Poisson tables it follows that P_{θ0}[ΣXi ≤ 4] = 0.0293 and P_{θ0}[ΣXi ≤ 5] = 0.0671. So
the size of the test would be one of these two values depending on your choice. If we want an 𝛼
as close as possible to 0.05, then the UMP test of size 𝛼 = 0.0671 is:
Reject 𝐻0 if Σxi ≤ 5 .
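(A sketch of mine reproducing the table lookup with scipy's Poisson cdf; the value nθ0 = 10 is inferred from the two tabled probabilities and is an assumption on my part.)

```python
from scipy.stats import poisson

n_theta0, alpha = 10, 0.05    # assumed: n*theta0 = 10 matches the tabled 0.0293 and 0.0671

# size of "reject H0 if sum(x) < c" is P(Y <= c - 1) with Y ~ Pois(n*theta0)
sizes = {c: poisson.cdf(c - 1, n_theta0) for c in range(1, 21)}
c_best = min(sizes, key=lambda c: abs(sizes[c] - alpha))
print(c_best, round(sizes[c_best], 4))   # 6, 0.0671  ->  reject H0 if sum(x) <= 5
```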
= (σ1′²/σ0′²)^{n/2} e^{−(1/2) Σxi² (1/σ0′² − 1/σ1′²)} .
( (σ1′²/σ0′²)^{n/2} e^{−(1/2) Σxi² (1/σ0′² − 1/σ1′²)} = a e^{bt} will be increasing if a, b > 0.
Here a = (σ1′²/σ0′²)^{n/2} > 0, but b = −(1/2)(1/σ0′² − 1/σ1′²) < 0, because σ0′² < σ1′², so
1/σ0′² > 1/σ1′², so 1/σ0′² − 1/σ1′² > 0, so −(1/2)(1/σ0′² − 1/σ1′²) < 0, where t = Σxi²
(so the ratio is actually decreasing w.r.t. this t = Σxi²).
So we do it like this:
(σ1′²/σ0′²)^{n/2} e^{−(1/2) Σxi² (1/σ0′² − 1/σ1′²)} = a e^{bt} will be increasing if a, b > 0, with
a = (σ1′²/σ0′²)^{n/2} > 0 and b = (1/2)(1/σ0′² − 1/σ1′²) > 0, where t = −Σxi², so the ratio is
increasing w.r.t. t = −Σxi². )
For the two-sided test of Example 2.9 the power function works out to
π(μ) = 1 − {P[Z ≤ (μ0 − μ)/(σ/√n) + z_{1−α/2}] − P[Z ≤ (μ0 − μ)/(σ/√n) − z_{1−α/2}]}
     = 1 − Φ(√n(μ0 − μ)/σ + z_{1−α/2}) + Φ(√n(μ0 − μ)/σ − z_{1−α/2}) .
Notice that
π(μ0) = 1 − Φ(z_{1−α/2}) + Φ(−z_{1−α/2})
      = 1 − Φ(z_{1−α/2}) + Φ(z_{α/2})
      = 1 − (1 − α/2) + α/2
      = α , the size of the test.
FIGURE 2.4
The size of this test is still π2(μ0) = α, but its power function, π2(μ), is shown in Figure 2.4 as the broken line. It is clear that neither test is uniformly more powerful than the other. For μ > μ0 the original test is more powerful, while for μ < μ0 the new test is more powerful. So for any two-sided test it is always possible to find another two-sided test that is more powerful in some region of the parameter space.
The LR–test in the above example does, however, have one desirable property which is unique to it: it is an unbiased test.
Definition 2.5 : A test with power function 𝜋(𝜃) is an unbiased test if 𝜋(𝜃1′ ) ≥ 𝜋(𝜃0′ ) for
every 𝜃0′ ∈ Ω0 and 𝜃1′ ∈ Ω𝑐0 .
(We could get for example 𝜃0′ ∈ (−∞, 𝜃0 ] and 𝜃1′ ∈ (𝜃0 , ∞))
This simply means that the probability of rejecting the null hypothesis when it is false should
never be smaller than the probability of rejecting it when it is true.
From Figure 2.4 we can see that inf𝜇∈Ω 𝜋(𝜇) = 𝜋(𝜇0 ) = 𝛼, so 𝜋(𝜇) > 𝜋(𝜇0 ) for all 𝜇 ≠ 𝜇0 ,
and thus the LR–test is an unbiased test of size 𝛼. The test with power function 𝜋2 (𝜇) is
however not unbiased since the power function drops below the size of the test in a region of
Ω𝑐0 .
Note : The LR–test is a symmetrical test and there exists no other unbiased test of the same size
that is more powerful. So we call the LR–test in this situation the uniformly most powerful
unbiased test of the given hypothesis.
Important results (memorize the rejection rules to be able to apply this to dataset(s)):
1. (One-group) Normal test on the mean where the variance is unknown (t-test):
Reject the null hypothesis H0: μ = μ0 if |t| ≥ t_{n−1,1−α/2}, where |t| = √n|x̄ − μ0|/s,
or, equivalently, if x̄ ≤ μ0 − t_{n−1,1−α/2} s/√n or x̄ ≥ μ0 + t_{n−1,1−α/2} s/√n.
In the previous examples we have only dealt with two–sided (≠) alternatives (H1) and a point (=)
null hypothesis (H0). For one–sided tests we would usually apply the methods described in the
example. However, one–sided LR–tests are derived in a similar manner and are usually just a
one–sided version of the tests described above.
For example, in Example 2.4, suppose the hypotheses had been H0: μ ≤ μ0 versus H1: μ > μ0. The LR–test of size α would then be:
Reject H0 if x̄ ≥ μ0 + t_{n−1,1−α} s/√n.
These are one–sided critical regions, and the only difference to (2.17) is that α/2 is replaced by α.
• The case where the variance is known doesn’t have to be memorized, but it is called a z-
test.
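A minimal Python sketch of the rejection rule in result 1 (the data vector x, mu0 and alpha below are hypothetical; numpy and scipy are assumed available):

import numpy as np
from scipy import stats

x = np.array([1405., 1421., 1388., 1412., 1399.])   # hypothetical sample
mu0, alpha = 1400.0, 0.05
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)                   # sample mean and sample std dev
t_stat = np.sqrt(n) * (xbar - mu0) / s
crit = stats.t.ppf(1 - alpha / 2, df=n - 1)         # t_{n-1, 1-alpha/2}
reject_two_sided = abs(t_stat) >= crit
# one-sided H0: mu <= mu0 vs H1: mu > mu0 uses t_{n-1, 1-alpha} instead
reject_one_sided = t_stat >= stats.t.ppf(1 - alpha, df=n - 1)
print(t_stat, crit, reject_two_sided, reject_one_sided)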
2. Two-group (independent groups) normal test on the means where the variances are
unknown but equal (t-test) (case 2a of computer lecture):
Two sample (independent samples) test on the means of normal distributions (variances
equal but unknown):
From example 2.10 : Consider a sample 𝑋1 , … , 𝑋𝑛 of size 𝑛 from a 𝑁(𝜇1 , 𝜎 2 )
distribution and an independent sample 𝑌1 , … , 𝑌𝑚 of size 𝑚 from a 𝑁(𝜇2 , 𝜎 2 ) distribution.
Note that the variances are assumed equal but unknown. We want to derive the LR test for
𝐻0 : 𝜇1 = 𝜇2 (= 𝜇0 ) versus 𝐻1 : 𝜇1 ≠ 𝜇2 .
Remark: the one-sided LR–tests are just the 1–sided versions of this test, as in the one-sample case above.
• Two-group (independent groups) normal test on the means where the variances are
unknown but unequal (t-test) doesn’t have to be memorized. (case 2b of computer
lecture)
• Two-group (independent groups) normal test on the means where the variances are
known but equal (z-test) doesn’t have to be memorized. (case 2c of computer lecture)
3. Two-group dependent groups normal test on the means where the variances are
unknown but equal (t-test) (case 1 of computer lecture):
Two sample (dependent samples) test on the means of normal distributions (variances equal
but unknown):
From ex 2. of ch 2
Consider paired samples (𝑋1 , 𝑌1 ), (𝑋2 , 𝑌2 ), … (𝑋𝑛 , 𝑌𝑛 ) from two normal distributions where
𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇1 , 𝜎12 ) and 𝑌1 , … , 𝑌𝑛 ∼ 𝑁(𝜇2 , 𝜎22 ) and 𝑋 and 𝑌 are not independent with
Cov (𝑋, 𝑌) = 𝜌𝜎1 𝜎2 . Derive a test of size 𝛼 for 𝐻0 : 𝜇1 − 𝜇2 = 0 versus 𝐻1 : 𝜇1 − 𝜇2 ≠ 0
where 𝜎12 and 𝜎22 are unknown. (Hint: Work with the distribution of 𝑋 − 𝑌).
In Exercise 2 from this chapter we derived the LR-test (which is called the paired t-test) of size α for H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0. The test statistic is based on the within-subject differences D1, …, Dn, where Di = Xi − Yi, which have distribution N(μ, σ²) with μ = μ1 − μ2. Then the LR test is as follows:
Reject H0 if |t| > t_{n−1;1−α/2}, where |t| = |d̄ − μ0|/(s/√n), with μ0 = 0 here.
Here d̄ = (1/n)Σdi = x̄ − ȳ and s² = (1/(n−1))Σ(di − d̄)², i.e. the sample variance of the di's.
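A minimal Python sketch of this paired t-test (the paired measurements x and y below are hypothetical):

import numpy as np
from scipy import stats

x = np.array([12.1, 9.8, 11.4, 10.2, 13.0])   # hypothetical first measurement per subject
y = np.array([11.0, 9.1, 11.9,  9.5, 12.2])   # hypothetical second measurement per subject
d = x - y                                     # within-subject differences
n = len(d)
dbar, s_d = d.mean(), d.std(ddof=1)
t_stat = (dbar - 0) / (s_d / np.sqrt(n))
crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)
print(abs(t_stat) >= crit)                    # True means: reject H0: mu1 - mu2 = 0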
4. k-group independent groups normal test on the means where the variances are unknown
but equal (F-test):
(example 2.10’s case where we have k groups instead of 2 groups)
From ex 4. of ch 2
Consider 𝑘 normal populations where 𝑋𝑖 ∼ 𝑁(𝜇𝑖 , 𝜎 2 ) , 𝑖 = 1, … , 𝑘, independently. A sample
of size 𝑛 is drawn from each population and 𝜎 2 is unknown. Find the LR–test of size 𝛼 for
testing 𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘 (= 𝜇0 ) versus 𝐻1 : Not all means are equal.
Calculate x̄ for each sample and then find the sample variance (=var.s()) of the k sample means x̄i to obtain s²_x̄.
Calculate the sample variance of each sample, s²i, and then find the pooled variance
s²p = [(n1−1)s²1 + ⋯ + (nk−1)s²k] / [(n1−1) + ⋯ + (nk−1)].
The LR-test is: Reject the null hypothesis if F = n s²_x̄ / s²p > F_{k−1, k(n−1); 1−α}.
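A minimal Python sketch of this F-test for k groups of equal size n (the three groups below are hypothetical):

import numpy as np
from scipy import stats

groups = [np.array([10., 12., 11., 13.]),          # hypothetical samples, k = 3 groups
          np.array([14., 15., 13., 16.]),
          np.array([11., 10., 12., 12.])]
k, n = len(groups), len(groups[0])
means = np.array([g.mean() for g in groups])
s2_xbar = means.var(ddof=1)                        # sample variance of the k group means
s2_p = np.mean([g.var(ddof=1) for g in groups])    # pooled variance (equal group sizes)
F = n * s2_xbar / s2_p
crit = stats.f.ppf(0.95, k - 1, k * (n - 1))
print(F, crit, F > crit)                           # True means: reject H0 (not all means equal)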
2.5 Chi–Square Tests
In this section we present a number of tests of hypotheses that one way or another involve the
chi–square distribution. Included will be the asymptotic distribution of the generalized
likelihood–ratio, goodness–of–fit, and tests concerning contingency tables. The material in this
section will be presented with an aim of merely finding tests of certain hypotheses, and it will not
be presented in such a way that concern is given to the optimality of the test. Thus, the power
functions of the derived tests will not be discussed.
Sometimes the distribution of the generalized likelihood ratio is intractable. In such cases we can
use the asymptotic distribution of the GLR as given without proof in the following theorem.
Theorem 2.3 : Let X1, …, Xn be a random sample from a pdf f(x|θ) where θ = {θ1, …, θk}. Let H0: θ ∈ Ω0 and H1: θ ∈ Ω0ᶜ. Under some regularity conditions the distribution of −2 ln λ(𝐗) converges to a chi–squared distribution under H0 as n → ∞. The degrees of freedom is the difference between the number of free parameters in Ω and the number of free parameters under H0.
Since the LR criterion rejects the null hypothesis if λ(𝐗) < c, this is equivalent to rejecting when −2 ln λ(𝐗) > −2 ln(c), i.e. −2 ln λ(𝐗) > k, where P_{H0}[−2 ln λ(𝐗) > k] = α, i.e. P_{H0}[−2 ln λ(𝐗) < k] = 1 − α, i.e. F_{−2 ln λ(𝐗), H0}[k] = 1 − α.
Furthermore, −2 ln λ(𝐗) ∼ χ²_ν under H0, so k = χ²_{ν,1−α}, where ν is determined according to the theorem above.
Approximate LR–test:
Reject H0 if and only if −2 ln λ(𝐱) > χ²_{ν,1−α}. (2.41)
ν = dim(Ω) − dim(Ω0), where dim refers to the number of free parameters under Ω or Ω0, which is the number of parameters that can vary (i.e. are not fixed values) under Ω or Ω0.
f(x|p) = [n!/(y1! y2!)] p1^{y1} p2^{y2}, y1 = 0, 1, …, n, y2 = n − y1, 0 ≤ p1 ≤ 1, p2 = 1 − p1
Or:
f(x|p) = [n!/(y1! y2!)] p1^{y1} p2^{y2}, yi = 0, 1, …, n, Σ_{i=1}^{2} yi = n, 0 ≤ pi ≤ 1, Σ_{i=1}^{2} pi = 1
The experiment has 𝑛 independent trials. Each trial results in either event 1 or event 2.
𝑃(𝑒𝑣𝑒𝑛𝑡 1) = 𝑝1 and 𝑃(𝑒𝑣𝑒𝑛𝑡 2) = 𝑝2 = 1 − 𝑝1 in each trial.
𝑦1 = total number of times event 1 occurs
and 𝑦2 = 𝑛– 𝑦1 = total number of times event 2 occurs.
The experiment has 𝑛 independent trials. Each trial results in either event 1 or event 2 or … or
event k.
For 𝑖 = 1, … , 𝑘: 𝑃(𝑒𝑣𝑒𝑛𝑡 𝑖) = 𝑝𝑖 in each trial .
For 𝑖 = 1, … , 𝑘: 𝑦𝑖 = total number of times event 𝑖 occurs.
Practical example
This is like looking for the probability that we draw 2 red balls, 3 green balls and 4 yellow balls
out of a bucket that has 10 red balls, 15 green balls and 20 yellow balls, which is:
[9!/(2! 3! 4!)] (10/(10+15+20))² (15/(10+15+20))³ (20/(10+15+20))⁴ = …
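A small Python check of this multinomial probability (the only inputs are the bucket counts quoted above; sampling with replacement is assumed, as in the multinomial model):

from math import factorial

counts = [2, 3, 4]                       # red, green, yellow drawn
p = [10/45, 15/45, 20/45]                # cell probabilities from the bucket composition
n = sum(counts)
coef = factorial(n) // (factorial(2) * factorial(3) * factorial(4))
prob = coef * p[0]**2 * p[1]**3 * p[2]**4
print(prob)                              # roughly 0.09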
Note:
If we have the values of the first k − 1 pi's, we automatically have the value of p_k = 1 − Σ_{i=1}^{k−1} pi.
If we have the values of the first k − 1 yi's, we automatically have the value of y_k = n − Σ_{i=1}^{k−1} yi.
(1 (observation) multinomial experiment with n trials i.e. there is one vector (y1, y2,…,yk)
observation)
f(𝒚|𝒑) = [n!/(y1! y2! … yk!)] p1^{y1} p2^{y2} … pk^{yk}
L(𝒑|𝒚) = [n!/(y1! y2! … yk!)] p1^{y1} p2^{y2} … pk^{yk} = C p1^{y1} p2^{y2} … pk^{yk} (as we have one vector observation)
i.e. L(𝒑|𝒚) ∝ p1^{y1} p2^{y2} … pk^{yk}
2.5.1 Goodness–of–fit Tests
Let the possible outcomes of a random experiment be decomposed into 𝑘 mutually exclusive
sets, say 𝐴1 , … , 𝐴𝑘 . Define 𝑝𝑗 = 𝑃[𝐴𝑗 ], 𝑗 = 1,2, … , 𝑘 . In 𝑛 independent repetitions of the
experiment, let 𝑌𝑗 denote the number of outcomes belonging to the set 𝐴𝑗 , so that ∑𝑘𝑗=1 𝑌𝑗 =
𝑛 and ∑𝑘𝑗=1 𝑝𝑗 = 1 . Then 𝑌1 , … , 𝑌𝑘 have a multinomial distribution with parameters
𝑝1 , … , 𝑝𝑘 and 𝑛. This is a very general situation and is used to test hypotheses about 𝑝1 , … , 𝑝𝑘
where the original data can be continuous, discrete or categorical.
The likelihood function for the multinomial distribution is given by
L(p1, …, pk|𝒚) ∝ p1^{y1} p2^{y2} … pk^{yk} = ∏_{j=1}^{k} pj^{yj}. (2.42)
Let us look at the LR–statistic for some typical null hypotheses.
You only need to memorise and know the highlighted part of the following example and remarks
that follow.
Example 2.12 : The simplest null hypothesis for the multinomial model is the simple hypothesis H0: pj = p_{j0}, j = 1, …, k, where the p_{j0} are fully specified values. The full parameter space is
Ω = {(p1, …, pk): 0 ≤ pj ≤ 1, Σ_{j=1}^{k} pj = 1},
with dimension k − 1. Thus the number of free parameters in Ω is k − 1 and the number of free parameters under H0 is zero. We shall not derive the ML estimators of p1, …, pk formally, but it is logical that
p̂j = yj/n, j = 1, …, k. (2.43)
So sup_{Ω0} L(p1, …, pk|𝒚) ∝ ∏_{j=1}^{k} p_{j0}^{yj}
and sup_{Ω} L(p1, …, pk|𝒚) ∝ ∏_{j=1}^{k} p̂j^{yj} = ∏_{j=1}^{k} (yj/n)^{yj}.
The distribution of λ(𝒚) is intractable, so we use the asymptotic result in (2.41), namely that Q_ν = −2 ln λ(𝒚) = 2 Σ_{j=1}^{k} yj ln(yj/(n p_{j0})) has a chi–squared distribution with ν = dim(Ω) − dim(Ω0) = k − 1 degrees of freedom.
So the approximate LR–test is:
Reject H0 if Q_{k−1} = 2 Σ_{j=1}^{k} yj ln(yj/(n p_{j0})) > χ²_{k−1,1−α}. (2.45)
Note that the degrees of freedom = number of free parameters in Ω – number of free
parameters in Ω0
= (k – 1) – 0
=k–1
Remarks
1. If we write Oj = yj (the observed frequency) and Ej = n p_{j0} (the expected frequency under H0), then
Q_{k−1} = 2 Σ_{j=1}^{k} Oj ln(Oj/Ej). (2.46)
This is a convenient form and is the basic form of 𝑄 for all tests of a multinomial distribution.
2. Another commonly used form of Q is the Pearson statistic. A Taylor series expansion of Q about Ej gives
Σ_{j=1}^{k} Oj ln(Oj/Ej) = Σ_{j=1}^{k} [(Oj − Ej) + (Oj − Ej)²/(2Ej) + ⋯].
Since ΣOj = ΣEj = n, the first term in the expansion is zero. A second-order approximation is then
Q_{k−1} ≈ Σ_{j=1}^{k} (Oj − Ej)²/Ej ∼ χ²_{k−1}. (2.47)
This approximation is only reasonable if the 𝐸𝑗 ’s are not too small. A rule of thumb is that all
𝐸𝑗 ’s should be larger or equal to five, 𝐸𝑗 ≥ 5 , 𝑗 = 1, … , 𝑘.
Class example (using the result of Example 2.12):
Suppose we want to test the following hypotheses:
H0: p1 = 0.3, p2 = 0.5, p3 = 0.2 versus H1: H0 is not true,
with observed frequencies O1 = 28, O2 = 53, O3 = 19.
Solution:
Reject H0 if Q_{k−1} > χ²_{k−1,1−α}.
n = 28 + 53 + 19 = 100 and Ei = npi = 100pi, thus E1 = 30, E2 = 50, E3 = 20.
Q_{k−1} = 2 Σ_{j=1}^{k} Oj ln(Oj/Ej) = 2[28 ln(28/30) + 53 ln(53/50) + 19 ln(19/20)] = 0.3638
χ²_{k−1,1−α} = χ²_{3−1,1−0.05} = χ²_{2,0.95} = 5.99146
0.3638 ≯ 5.99146
Do not reject H0.
At the 5% significance level we cannot conclude that "p1 = 0.3, p2 = 0.5, p3 = 0.2" is false.
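A Python sketch of this calculation, showing both the LR form (2.46) and the Pearson form (2.47), using the class-example counts:

import numpy as np
from scipy import stats

O = np.array([28, 53, 19])              # observed frequencies
p0 = np.array([0.3, 0.5, 0.2])          # hypothesised probabilities under H0
E = O.sum() * p0                        # expected frequencies n * p_j0
Q = 2 * np.sum(O * np.log(O / E))       # LR statistic (2.46)
Q_pearson = np.sum((O - E)**2 / E)      # Pearson statistic (2.47)
crit = stats.chi2.ppf(0.95, df=len(O) - 1)
print(Q, Q_pearson, crit, Q > crit)     # do not reject H0 here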
Example 2.13 : In the previous example the null hypothesis was simple. Let us look at a
composite null hypothesis.
In certain genetics problems, each individual in a given population must have one of three
possible genotypes, and it is assumed that the probabilities 𝑝1 , 𝑝2 and 𝑝3 of the three
genotypes can be represented in the following form:
𝑝1 = 𝜃 2 , 𝑝2 = 2𝜃(1 − 𝜃) , 𝑝3 = (1 − 𝜃)2 .
Here the parameter 𝜃 is unknown and lies in the interval 0 < 𝜃 < 1. For any 𝜃 in this interval,
𝑝𝑗 > 0 and ∑3𝑗=1 𝑝𝑗 = 1. A sample of 𝑛 is taken from the population and the number of each
genotype, 𝑦1 , 𝑦2 and 𝑦3 observed. We want to test whether it is reasonable to assume that
the probabilities have the form given above for some value of 𝜃.
𝑦 𝑦 𝑦
1. Likelihood function : 𝐿(𝑝1 , 𝑝2 , 𝑝3 | ) ∝ 𝑝1 1 𝑝2 2 𝑝3 3 .
2. 𝐻0 : 𝑝1 = 𝜃 2 , 𝑝2 = 2𝜃(1 − 𝜃) , 𝑝3 = (1 − 𝜃)2 .
Ω = {(𝑝1 , 𝑝2 , 𝑝3 ): 𝑝𝑗 > 0 , Σ𝑝𝑗 = 1} . (dim = k − 1 = 3 − 1 = 2)
Ω0 = {𝜃: 0 < 𝜃 < 1} (dim = 1) .
3. In Ω the ML estimators are p̂j = yj/n, j = 1, 2, 3.
In Ω0 we must determine the ML estimator of 𝜃.
Under 𝐻0 ,
∂ ln L/∂θ = 2y1/θ + y2/θ − y2/(1−θ) − 2y3/(1−θ).
Set equal to zero; then
(1/θ̂)(2y1 + y2) = (1/(1−θ̂))(y2 + 2y3)
and (2.48)
θ̂ = (2y1 + y2)/(2y1 + 2y2 + 2y3) = (2y1 + y2)/(2n).
4. So
sup_{Ω} L(p1, p2, p3|𝒙) ∝ ∏_{j=1}^{3} (yj/n)^{yj}
and
sup_{Ω0} L(θ|𝒙) ∝ θ̂^{2y1} [2θ̂(1−θ̂)]^{y2} (1−θ̂)^{2y3},
so that
λ(𝒙) = (nθ̂²/y1)^{y1} (n2θ̂(1−θ̂)/y2)^{y2} (n(1−θ̂)²/y3)^{y3}
and
−2 ln λ(𝒙) = 2y1 ln(y1/(nθ̂²)) + 2y2 ln(y2/(n2θ̂(1−θ̂))) + 2y3 ln(y3/(n(1−θ̂)²)),
i.e.
Q1 = 2 Σ_{j=1}^{3} Oj ln(Oj/Ej)
with degrees of freedom = 2 − 1 = 1.
5. Reject H0 if Q1 > χ²_{1,1−α}.
In general for this type of example (where H0 specifies the pi's as functions of a single parameter θ), the rejection rule is:
Reject H0 if Q_{k−2} > χ²_{k−2,1−α}.
See remarks after previous example – you can be asked Pearson’s Statistic for this case as well.
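A Python sketch for this composite null hypothesis (the genotype counts y below are hypothetical, since the example gives no data):

import numpy as np
from scipy import stats

y = np.array([26, 52, 22])                     # hypothetical genotype counts y1, y2, y3
n = y.sum()
theta_hat = (2*y[0] + y[1]) / (2*n)            # MLE of theta under H0, eq. (2.48)
E = n * np.array([theta_hat**2,
                  2*theta_hat*(1 - theta_hat),
                  (1 - theta_hat)**2])         # expected counts under H0
Q1 = 2 * np.sum(y * np.log(y / E))             # -2 ln lambda, with 1 degree of freedom
crit = stats.chi2.ppf(0.95, df=1)
print(Q1, crit, Q1 > crit)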
In a two–way contingency table we suppose that 𝑛 individuals or items are classified according
to criteria 𝐴 and 𝐵, that there are 𝑟 classifications in 𝐴 and 𝑐 classifications in 𝐵, and that
the number of individuals belonging to 𝐴𝑖 and 𝐵𝑗 is 𝑁𝑖𝑗 . We then have a 𝑟 × 𝑐 contingency
table with cell frequencies 𝑁𝑖𝑗 and ∑𝑟𝑖=1 ∑𝑐𝑗=1 𝑁𝑖𝑗 = 𝑛.
We shall denote the row totals by 𝑁𝑖 . and the column totals by 𝑁 .𝑗 , that is,
𝐵1 𝐵2 ⋯ 𝐵𝑐
𝐴1 𝑁11 𝑁12 ⋯ 𝑁1𝑐 𝑁1 .
𝐴2 𝑁21 𝑁22 ⋯ 𝑁2𝑐 𝑁2 .
⋮ ⋮ ⋮ ⋮ ⋮
𝐴𝑟 𝑁𝑟1 𝑁𝑟2 ⋯ 𝑁𝑟𝑐 𝑁𝑟 .
𝑁 .1 𝑁 .2 ⋯ 𝑁 .𝑐 𝑛
The 𝑛 individuals can again be regarded as a sample from a multinomial distribution with
probabilities 𝑝𝑖𝑗 , 𝑖 = 1, … , 𝑟 , 𝑗 = 1, … , 𝑐, for the 𝑟 × 𝑐 cells.
L(pij, i = 1, …, r, j = 1, …, c | Nij) ∝ ∏_{i=1}^{r} ∏_{j=1}^{c} pij^{Nij}. (2.49)
2. The null hypothesis is now the independence of criteria 𝐴 and 𝐵. This is different from the
null hypothesis in goodness–of–fit tests. We want to determine whether 𝐴𝑖 is independent of
𝐵𝑗 for all 𝑖, 𝑗. That is, if 𝑃[𝐴𝑖 𝑎𝑛𝑑 𝐵𝑗 ] = 𝑃[𝐴𝑖 ]𝑃[𝐵𝑗 ]. So if we write 𝑃[𝐴𝑖 𝑎𝑛𝑑 𝐵𝑗 ] =
𝑝𝑖𝑗 , 𝑃[𝐴𝑖 ] = 𝑝𝑖 . and 𝑃[𝐵𝑗 ] = 𝑝 .𝑗 , then the null hypothesis is:
𝐻0 : 𝑝𝑖𝑗 = 𝑝𝑖 . 𝑝 .𝑗 ; 𝑖 = 1, … , 𝑟 , 𝑗 = 1, … , 𝑐 .
Note: The hypotheses for this situation are the same as:
• H0: the two categories are independent versus H1: the two categories are dependent
• OR H0: there is not a relationship between the two categories versus H 1: there is a
relationship between the two categories
• OR H0: there is no interaction between the two categories versus H1: there is
interaction between the two categories
When 𝐻0 is not true, there is interaction between the two criteria of classification.
Now
Ω = {(𝑝11 , … , 𝑝𝑟𝑐 ): 0 ≤ 𝑝𝑖𝑗 ≤ 1 , ΣΣ𝑝𝑖𝑗 = 1} (dim = 𝑟𝑐 − 1)
Ω0 = {(𝑝1 . , … , 𝑝𝑟 . , 𝑝 .1 , … , 𝑝 .𝑐 ): 0 ≤ 𝑝𝑖 . , 𝑝 .𝑗 ≤ 1, ∑𝑖 𝑝𝑖 . = 1, ∑𝑗 𝑝 .𝑗 = 1}
(dim = 𝑟 − 1 + 𝑐 − 1)
3. In Ω the ML estimators of pij are p̂ij = Nij/n.
In Ω0 the ML estimators for pi. and p.j are
p̂i. = Ni./n, i = 1, …, r, and p̂.j = N.j/n, j = 1, …, c.
4.
sup_{Ω} L(pij|Nij) ∝ ∏_{i=1}^{r} ∏_{j=1}^{c} p̂ij^{Nij} = ∏_{i=1}^{r} ∏_{j=1}^{c} (Nij/n)^{Nij},
sup_{Ω0} L(pi., p.j|Nij) ∝ ∏_{i=1}^{r} ∏_{j=1}^{c} (p̂i. p̂.j)^{Nij} = ∏_{i=1}^{r} ∏_{j=1}^{c} (Ni./n)^{Nij} (N.j/n)^{Nij}.
Then
λ(Nij) = (∏_{i=1}^{r} Ni.^{Ni.})(∏_{j=1}^{c} N.j^{N.j}) / (n^n ∏_{i=1}^{r} ∏_{j=1}^{c} Nij^{Nij}) (2.50)
since ∏_i ∏_j n^{Nij} = n^{Σ_i Σ_j Nij} = n^n
and
∏_i ∏_j Ni.^{Nij} = ∏_i Ni.^{Σ_j Nij} = ∏_i Ni.^{Ni.}. (2.51)
Further
−2 ln λ(Nij) = 2 Σ_i Σ_j Nij ln(Nij/(Ni. N.j/n)),
i.e. Q = 2 Σ_i Σ_j Oij ln(Oij/Eij) (Prove).
The degrees of freedom is then: no. of free parameters in Ω − no. of free parameters in Ω0
= (rc − 1) − (r − 1 + c − 1)
= (r − 1)(c − 1).
5. So we reject H0 (independence) if Q > χ²_{(r−1)(c−1); 1−α}.
Remark : The Pearson statistic for testing the above hypothesis would again be similar to (2.47), namely
Q′ = Σ_i Σ_j (Oij − Eij)²/Eij ∼ χ²_{(r−1)(c−1)}. (2.52)
As before, use the Q given just under 2.51, and you use the Q’ above in equation 2.52 ONLY IF I
ask you to use the Pearson statistic
Example 2.14 : A thousand individuals were classified according to sex and according to
whether or not they were colour–blind as follows
Solution:
H0: Colour–blindness is independent of gender.
H1: Colour–blindness is dependent on (related to) gender.
The table of expected frequencies, Eij = Ni. N.j / n, is as follows:
Q = 2 Σ_i Σ_j Oij ln(Oij/Eij)
𝑄 = 2(−16.5657 + 17.1635 + 22.3199 − 8.0310)
= 29.7732 .
The 1% critical value is χ²_{(r−1)(c−1); 1−0.01} = χ²_{(2−1)(2−1); 0.99} = χ²_{1; 0.99} = 6.635.
We reject H0 if Q > χ²_{(r−1)(c−1); 1−α}:
29.7732 > 6.635
Reject H0
At a 1% significance level we have sufficient evidence to say that there is a relationship between
colour-blindness and gender
OR At a 1% significance level we have sufficient evidence to say that there is interaction between
colour-blindness and gender
OR At a 1% significance level we have sufficient evidence to say that colour-blindness and gender
are dependent
OR At a 1% significance level we reject the hypothesis that colour-blindness and gender are
independent
(The Pearson statistic (2.52) is calculated as 𝑄′ = 27.1387, leading to the same conclusion.)
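A Python sketch of this contingency-table test for a general r × c table. The observed table is not reproduced in these notes; the counts below are chosen so that they reproduce (up to rounding) the four terms and the values Q ≈ 29.77 and Q′ ≈ 27.14 quoted above, with the assignment of rows and columns to the two criteria assumed:

import numpy as np
from scipy import stats

N = np.array([[442, 514],                          # assumed observed frequencies N_ij
              [ 38,   6]])
n = N.sum()
E = np.outer(N.sum(axis=1), N.sum(axis=0)) / n     # E_ij = N_i. * N_.j / n
Q = 2 * np.sum(N * np.log(N / E))                  # LR statistic
Q_pearson = np.sum((N - E)**2 / E)                 # Pearson statistic (2.52)
r, c = N.shape
crit = stats.chi2.ppf(0.99, df=(r - 1) * (c - 1))
print(Q, Q_pearson, crit, Q > crit)                # reject H0 (independence) here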
Z_n = (X̄ − μ)/(σ/√n) = (X̄ − μ)/σ_X̄ = (μ̂ − μ)/σ_μ̂ ~́ N(0,1)
(for n large)
The 3 components are: (1) 𝜇, which gets estimated by (2) 𝜇̂ , which has standard deviation (3)
𝜎𝜇̂ .
We replace the 𝜇 with a 𝜇0 as we work with supremum under 𝐻0 in our definition of 𝛼.
If the data is normally distributed, the ~́ becomes ~.
Remember ~́ means “is approximately distributed as”, while ~ means “is distributed as”.
Thus our LR-test results (using the results of example 2.3) will be:
a) H0: μ = μ0 vs H1: μ ≠ μ0: Reject H0 if x̄ ≤ μ0 − z_{1−α/2} σ/√n or x̄ ≥ μ0 + z_{1−α/2} σ/√n
b) H0: μ ≤ μ0 vs H1: μ > μ0: Reject H0 if x̄ ≥ μ0 + z_{1−α} σ/√n
c) H0: μ ≥ μ0 vs H1: μ < μ0: Reject H0 if x̄ ≤ μ0 − z_{1−α} σ/√n
Or alternatively
a) H0: μ = μ0 vs H1: μ ≠ μ0: Reject H0 if (x̄ − μ0)/σ_X̄ ≤ −z_{1−α/2} or (x̄ − μ0)/σ_X̄ ≥ z_{1−α/2}
(or use just |(x̄ − μ0)/σ_X̄| ≥ z_{1−α/2})
b) H0: μ ≤ μ0 vs H1: μ > μ0: Reject H0 if (x̄ − μ0)/σ_X̄ ≥ z_{1−α}
c) H0: μ ≥ μ0 vs H1: μ < μ0: Reject H0 if (x̄ − μ0)/σ_X̄ ≤ −z_{1−α}, where σ_X̄ = σ/√n
Or alternatively
a) H0: μ = μ0 vs H1: μ ≠ μ0: Reject H0 if |Z_n| ≥ z_{1−α/2}
b) H0: μ ≤ μ0 vs H1: μ > μ0: Reject H0 if Z_n ≥ z_{1−α}
c) H0: μ ≥ μ0 vs H1: μ < μ0: Reject H0 if Z_n ≤ −z_{1−α}
where Z_n = (μ̂ − μ0)/σ_μ̂ = (X̄ − μ0)/σ_X̄ with σ_μ̂ = σ_X̄ = σ/√n.
We thus use these results for our approximate tests (i.e. even when the data are not normally distributed). If σ_μ̂ or σ contains unknown parameter(s), we apply Slutsky's theorem and thus use (μ̂ − μ)/s_μ̂ ~́ N(0,1):
if (W_n − θ)/σ_n ∼ N(0,1), then
(W_n − θ)/S_n ~́ N(0,1) if σ_n/S_n → 1 in probability as n → ∞.
Notes:
1. For example, if we deal with the Bernoulli distribution (see Example 2.15), then
μ = E(X) = p
and is estimated by μ̂ = X̄ = (1/n)ΣXi = (frequency of successes)/n = relative frequency of success = p̂,
with standard deviation σ_μ̂ = σ_p̂ = σ_X̄ = σ/√n = √V(X)/√n = √(p(1−p)/n).
3. Dealing with the Poisson distribution (1 population), and dealing with 2 Bernoulli
populations, are tutorial questions.
*** If θ̂ is the MLE of θ, then g(θ̂) is the MLE of g(θ) (Invariance property of MLE’s)
e.g. If θ̂ is the MLE of θ, then θ̂2 is the MLE of θ2, and sin(θ̂) is the MLE of sin(θ).
Example 2.15 :
Let 𝑋1 , … , 𝑋𝑛 be a random sample from a Ber( 𝑝 ) population. Consider testing 𝐻0 : 𝑝 ≤ 𝑝0
versus 𝐻1 : 𝑝 > 𝑝0 .
Z_n = (p̂ − p)/σ_p̂ ~́ N(0,1), and in our definition of α, Z_n becomes (p̂ − p0)/σ_p̂.
Thus for our approximate test we have case (b): we reject H0 if Z_n ≥ z_{1−α}, i.e. if (p̂ − p0)/σ_p̂ ≥ z_{1−α}.
Let 𝑋1 , … , 𝑋𝑛 be a random sample from a Ber(𝑝) population. If we are testing the two–sided
hypothesis 𝐻0 : 𝑝 = 𝑝0 versus 𝐻1 : 𝑝 ≠ 𝑝0 :
Z_n = (p̂ − p)/σ_p̂ ~́ N(0,1), and in our definition of α, Z_n becomes (p̂ − p0)/σ_p̂.
For our approximate test we have case (a): we reject H0 if |Z_n| ≥ z_{1−α/2}, i.e. if |(p̂ − p0)/σ_p̂| ≥ z_{1−α/2}.
Class example:
At the beginning of last year, the entry requirements for a certain module were lowered. In that
year, 18 students failed the module and there were 60 students in the course. Test whether 20%
or less of students still fail the module, or whether more than 20% of students now fail the
module.
Solution:
We are clearly dealing with 60 Bernoulli trials (trials resulting in a success or failure).
Xi = 1 if student i failed and 0 otherwise (i = 1,…,60).
H0: p ≤ 0.2 versus H1: p > 0.2
We reject H0 if (p̂ − p0)/√(p̂(1−p̂)/n) ≥ z_{1−α} (from the first half of Example 2.15).
(p̂ − p0)/√(p̂(1−p̂)/n) = (18/60 − 0.2)/√((18/60)(1 − 18/60)/60) = 1.6903 and z_{1−α} = z_{0.95} = 1.644854.
1.6903 ≥ 1.644854
Reject H0.
At a 5% significance level, we have sufficient evidence to say that more than 20% of students now
fail the module.
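A Python sketch of this approximate proportion test, using the class-example numbers:

import numpy as np
from scipy import stats

n, failures = 60, 18
p_hat, p0, alpha = failures / n, 0.2, 0.05
z = (p_hat - p0) / np.sqrt(p_hat * (1 - p_hat) / n)
crit = stats.norm.ppf(1 - alpha)            # z_{1-alpha} = 1.6449
print(z, crit, z >= crit)                   # 1.6903 >= 1.6449, so reject H0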
Note: Whenever doing any practical (data) hypothesis test question (where data is given) in this
module: a) state the hypotheses, b) state the rejection rule, c) calculate the lhs of the rejection
rule (called the test statistic if the rhs is just a table value (or minus table value)), d) calculate the
rhs of the rejection rule (if it is just a table value it is known as the critical value), e) say whether
we reject H0 or not, and f) give a conclusion.
(Even if marks were not awarded for some of these parts in tutorials, marks may be awarded for them in tests/exams.)
EXERCISES 2
1. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ). Find the LR–test of size 𝛼 for 𝐻0 : 𝜎 2 = 𝜎02 versus 𝐻1 : 𝜎 2 ≠ 𝜎02
when (a) 𝜇 is known, (b) 𝜇 is unknown.
2. Consider paired samples (𝑋1 , 𝑌1 ), (𝑋2 , 𝑌2 ), … (𝑋𝑛 , 𝑌𝑛 ) from two normal distributions where
𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇1 , 𝜎12 ) and 𝑌1 , … , 𝑌𝑛 ∼ 𝑁(𝜇2 , 𝜎22 ) and 𝑋 and 𝑌 are not independent with
Cov (𝑋, 𝑌) = 𝜌𝜎1 𝜎2 . Derive a test of size 𝛼 for 𝐻0 : 𝜇1 − 𝜇2 = 0 versus 𝐻1 : 𝜇1 − 𝜇2 ≠ 0
where 𝜎12 and 𝜎22 are unknown. (Hint: Work with the distribution of 𝑋 − 𝑌).
3. Let X1, …, Xn ∼ N(μ1, σ1²) independently from Y1, …, Ym ∼ N(μ2, σ2²). Derive the LR–test of size α for H0: σ1²/σ2² = λ0 versus H1: σ1²/σ2² ≠ λ0, where μ1 and μ2 are unknown.
5. Using the asymptotic distribution of the LR–statistic, find an approximate test for the
hypothesis 𝐻0 : 𝜎12 = 𝜎22 = ⋯ = 𝜎𝑘2 (= 𝜎02 ) versus 𝐻1 : Not all variances are equal, where you
have 𝑘 independent samples of size 𝑛 each from normal distributions with unknown means.
6. A random sample, 𝑋1 , … , 𝑋𝑛 , is drawn from a Pareto population with pdf
f(x|θ, ν) = θν^θ / x^{θ+1}, ν ≤ x < ∞, θ > 0, ν > 0.
T(𝑿) = ln[∏_{i=1}^{n} Xi / (min_i Xi)^n].
f(x|θ, λ) = (1/λ) e^{−(x−θ)/λ}, θ < x < ∞,
where 𝜆 is unknown.
f(x|θ) = 2θ²/x³, θ < x < ∞.
Derive the LR–test of size 𝛼 for the hypothesis 𝐻0 : 𝜃 ≤ 𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 . Also find the
power function of this test.
Find a UMP test of size 𝛼 for the hypothesis 𝐻0 : 𝛽 ≥ 𝛽0 versus 𝐻1 : 𝛽 < 𝛽0.
(b) Consider the case where 𝑛 is large and derive an approximate test for the hypotheses in
(a).
(c) Assume in (b) that 𝐻0 : 𝜆 ≤ 1 versus 𝐻1 : 𝜆 > 1. Determine the sample size 𝑛 so that the
size of the test is approximately 0,05 and 𝑃[Reject 𝐻0 |𝜆 = 2] = 0,9.
15. Suppose 𝑔(𝑡|𝜃) = ℎ(𝑡)𝑐(𝜃)𝑒 𝑤(𝜃)𝑡 is a one–parameter exponential family for the random
variable 𝑇. Show that this family has a monotone likelihood ratio (MLR) in 𝑡 if 𝑤(𝜃) is an
increasing function of 𝜃. What would then be the form of the critical region for the test 𝐻0 : 𝜃 ≤
𝜃0 versus 𝐻1 : 𝜃 > 𝜃0 . Give three examples of such a family.
16. Given the sample (-0.2 -0.9 -0.6 0.1) from a normal population with unit variance, test the
assumption that the population mean is greater than zero at the 5% level.
17.
(a) Given the sample (-4.4 4.0 2.0 -4.8) from a normal population with variance 4 and the sample
(6.0 1.0 3.2 -0.4) from a normal population with variance 5, test at the 5% level that the means
differ by no more than one unit.
(b) Test the hypothesis that the two samples in (a) came from populations with the same
variance. Use 𝛼 = 0,05.
(c) Test the hypothesis in (a) if you assume that the two variances are equal but unknown.
18. A metallurgist made four determinations of the melting point of manganese : 1269, 1271,
1263 and 1265 degrees centigrade. Test the hypothesis that the mean 𝜇 is equal to the
published value of 1260. Use 𝛼 = 0,05.
19. The metallurgist in Problem 18 decided that his measurements should have a standard
deviation of 2 degrees or less. Are the data consistent with this supposition at the 5% level?
20. Two drugs for high blood pressure, A and B, must be compared. Ten patients are treated
with drug A and the decrease in blood pressure measured. Later the same 10 patients are treated
with drug B and blood pressure measured. The paired results are as follows :
(a) Test the null hypothesis that there is no difference in the effects of the two drugs.
(b) Assume that 20 patients were randomly assigned to the two drugs. Test now the hypothesis
in (a).
21. To compare two diets, 20 people are grouped into 10 pairs where each person of a pair has
the same mass. One person in a pair then follows diet A while the other person follows diet B.
After three weeks the loss in mass was as follows:
Diet A 12 16 9 20 8 7 10 7 13 5
Diet B 8 14 12 14 4 9 10 5 9 2
(a) Supporters of diet A claim that it is better than diet B. Test with 𝛼 = 0,05 if this claim is
justified.
(b) Assume that the 20 persons were randomly assigned to the two diets. Test now the
hypothesis in (a).
22. In a certain city the number of car accidents on each day of the working week are recorded
over a period of a few months.
(a) With 𝛼 = 0.05, test the null hypothesis that the probability for an accident is the same for
all days of the week.
(b) Test the null hypothesis that an accident is twice as likely on a Friday than on any other day.
23. A prominent baseball player’s batting average dropped from 0.313 in one year to 0.280 in
the following year. He was at bat 374 times during the first year and 268 times during the second
year. Is the hypothesis tenable at the 5% level that his hitting ability was the same during the two
years? Use two different approaches to test this hypothesis. (Hint: See Exercise 14 and Section
5.2.2).
24. For the data given in Example 2.14, the following genetic model is assumed :
p/2     p²/2 + pq
q/2     q²/2
25. Of 64 offspring of a certain cross between guinea pigs, 34 were red, 10 were black and 20
were white. According to the genetic model, these numbers should be in the ratio 9/3/4. Are the
data consistent with the model at the 5% level?
26. Gilby classified 1725 school children according to intelligence and apparent family economic
level. They were classified as follows:
3.1 Introduction
The point estimation of a parameter 𝜃 is a guess of a single value as the value of 𝜃. In this
chapter we discuss interval estimation and, more generally, set estimation. The inference in a set
estimation problem is the statement that “ 𝜃 ∈ 𝐶 ” where 𝐶 ⊂ Θ and 𝐶 = 𝐶(𝑿) is a set
determined by the value of the data 𝑿 = 𝒙 observed. If 𝜃 is real–valued, then we usually
prefer the set estimate 𝐶 to be an interval. Interval estimators will be the main topic of this
chapter.
As in the previous two chapters, this chapter is divided into two parts, the first concerned with
finding interval estimators and the second part concerned with evaluating the worth of the
estimators. We begin with a formal definition of an interval estimator.
We will use our previously defined conventions and write [𝐿(𝑿), 𝑈(𝑿)] for an interval
estimator of 𝜃 based on the random sample 𝑿 = (𝑋1 , … , 𝑋𝑛 ) and [𝐿(𝒙), 𝑈(𝒙)] for the
realized value of the interval. Although in the majority of cases we will work with finite values for
𝐿 and 𝑈 , there is sometimes interest in one–sided interval estimates. For instance, if
𝐿(𝒙) = −∞, then we have the one–sided interval (−∞, 𝑈(𝒙)] and the assertion is that “𝜃 ≤
𝑈(𝒙),” with no mention of a lower bound. We could similarly take 𝑈(𝒙) = ∞ and have a one–
sided interval [𝐿(𝒙), ∞).
Example of a point estimator: μ̂ = X̄ = (1/n)ΣXi
Example of a point estimate: μ̂ = x̄ = 23.4
Thus we have over a 95% chance of covering the unknown parameter with our interval estimator.
Sacrificing some precision in our estimate, in moving from a point to an interval, has resulted in
increased confidence that our assertion is correct.
The purpose of using an interval estimator, rather than a point estimator, is to have some
guarantee of capturing the parameter of interest. The certainty of this guarantee is quantified in
the following definitions.
Definition 3.2 : For an interval estimator [𝐿(𝑿), 𝑈(𝑿)] of a parameter 𝜃 , the coverage
probability is the probability that the random interval [𝐿(𝑿), 𝑈(𝑿)] covers the true parameter,
𝜃. In symbols, it is denoted by either 𝑃𝜃 (𝜃 ∈ [𝐿(𝑿), 𝑈(𝑿)]), or 𝑃(𝜃 ∈ [𝐿(𝑿), 𝑈(𝑿)]|𝜃).
Definition 3.3 : For an interval estimator [𝐿(𝑿), 𝑈(𝑿)] of a parameter 𝜃, the confidence
coefficient of [𝐿(𝑿), 𝑈(𝑿)] is the infimum of the coverage probabilities, inf𝜃 𝑃𝜃 (𝜃 ∈
[𝐿(𝑿), 𝑈(𝑿)]).
(e.g. the assertion "μ ∈ [X̄ − 1, X̄ + 1]")
There are a number of things to be aware of in these definitions. One, it is important to keep in mind that the interval is the random quantity, not the parameter. Therefore, when we write probability statements such as P_θ(θ ∈ [L(𝑿), U(𝑿)]), these probability statements refer to 𝑿, not θ. In other words, think of P_θ(θ ∈ [L(𝑿), U(𝑿)]), which might look like a statement about a random θ, as the algebraically equivalent P_θ(L(𝑿) ≤ θ, U(𝑿) ≥ θ), a statement about a random 𝑿.
Interval estimators, together with a measure of confidence (usually a confidence coefficient) are
sometimes known as confidence intervals. We will often use this term interchangeably with
interval estimator. An interval estimator with confidence coefficient equal to some value, say 1 − α, is simply called a 1 − α confidence interval.
Another important point is concerned with coverage probabilities and confidence coefficients.
Since we do not know the true value of 𝜃, we can only guarantee a coverage probability equal
to the infimum, the confidence coefficient. In some cases this does not matter because the
coverage probability will be a constant function of 𝜃. In other cases, however, the coverage
probability can be a fairly variable function of 𝜃.
Example 3.2 : Let 𝑋1 , … , 𝑋𝑛 be a random sample from a uniform(0, 𝜃) population and let
𝑌 = max{𝑋1 , … , 𝑋𝑛 }. We are interested in an interval estimator of 𝜃. We consider two candidate
estimators: [𝑎𝑌, 𝑏𝑌], 1 ≤ 𝑎 < 𝑏 , and [𝑌 + 𝑐, 𝑌 + 𝑑], 0 ≤ 𝑐 < 𝑑 , where 𝑎, 𝑏, 𝑐 and 𝑑 are
specified constants. Note that 𝜃 is necessarily larger than 𝑦. For the first interval we have
(using the transformation T = Y/θ: since T = Y/θ ⟹ Y = Tθ, we have Y = g⁻¹(T) = Tθ and dg⁻¹(t)/dt = θ, with f_Y(y) = n y^{n−1}/θ^n, 0 ≤ y ≤ θ)
Coverage probability = (1/a)^n − (1/b)^n.
Confidence coefficient = inf_θ [(1/a)^n − (1/b)^n] = (1/a)^n − (1/b)^n.
The coverage probability of the first interval is independent of the value of θ, and thus (1/a)^n − (1/b)^n is the confidence coefficient of the interval.
For the second interval,
Coverage probability = (1 − c/θ)^n − (1 − d/θ)^n.
Confidence coefficient = inf_θ [(1 − c/θ)^n − (1 − d/θ)^n] = lim_{θ→∞} [(1 − c/θ)^n − (1 − d/θ)^n] = (1 − 0)^n − (1 − 0)^n = 1 − 1 = 0.
Alternative approach:
T = g(Y) = Y/θ.
To find the inverse function: T = Y/θ ⟹ Y = Tθ, thus Y = g⁻¹(T) = Tθ, so dg⁻¹(t)/dt = θ.
f_Y(y) = n y^{n−1}/θ^n, 0 ≤ y ≤ θ,
so f_T(t) = f_Y(g⁻¹(t)) |dg⁻¹(t)/dt| = n(tθ)^{n−1} θ / θ^n = n t^{n−1}, 0 ≤ t ≤ 1.
We therefore have F_T(t) = ∫₀ᵗ n u^{n−1} du = u^n |₀ᵗ = t^n, 0 ≤ t ≤ 1.
P_θ(θ ∈ [aY, bY]) = P_θ(aY ≤ θ ≤ bY)
"= P_θ(1/(aY) ≥ 1/θ ≥ 1/(bY))"
= P_θ(1/(bY) ≤ 1/θ ≤ 1/(aY))
= P_θ(1/b ≤ Y/θ ≤ 1/a)
= P_θ(1/b ≤ T ≤ 1/a)    (T = Y/θ)
= F_T(1/a) − F_T(1/b)
= (1/a)^n − (1/b)^n.
The coverage probability of the first interval is independent of the value of θ and thus (1/a)^n − (1/b)^n is the confidence coefficient of the interval.
P_θ(θ ∈ [Y + c, Y + d]) = P_θ(Y + c ≤ θ ≤ Y + d)
= P_θ(c ≤ θ − Y ≤ d)
= P_θ(c − θ ≤ −Y ≤ d − θ)
"= P_θ(−c + θ ≥ Y ≥ −d + θ)"
"= P_θ(θ − c ≥ Y ≥ θ − d)"
= P_θ(θ − d ≤ Y ≤ θ − c) (3.2)
= P_θ((θ − d)/θ ≤ Y/θ ≤ (θ − c)/θ)
= P_θ(1 − d/θ ≤ T ≤ 1 − c/θ)    (T = Y/θ)
= F_T(1 − c/θ) − F_T(1 − d/θ) = (1 − c/θ)^n − (1 − d/θ)^n.
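A small simulation sketch checking these two coverage probabilities (the choices of n, θ, a, b, c and d below are hypothetical):

import numpy as np

rng = np.random.default_rng(0)
n, theta = 5, 10.0
a, b = 1.0, 1.5          # first interval [aY, bY]
c, d = 0.5, 2.0          # second interval [Y + c, Y + d]
reps = 100_000
Y = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
cov1 = np.mean((a*Y <= theta) & (theta <= b*Y))        # should approximate (1/a)^n - (1/b)^n
cov2 = np.mean((Y + c <= theta) & (theta <= Y + d))    # should approximate (1-c/theta)^n - (1-d/theta)^n
print(cov1, (1/a)**n - (1/b)**n)
print(cov2, (1 - c/theta)**n - (1 - d/theta)**n)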
There is a very strong correspondence between hypothesis testing and interval estimation. We
can say in general that every confidence set corresponds to a test and vice versa. Consider the
following example.
In example 3.3 the question is for the normal distribution where σ2 is known, find a 100(1-α)%
confidence interval for µ. The result used comes from example 2.3. (They did extra steps shown
in brackets which you can skip.)
Example 3.3: Let X1, …, Xn ∼ N(μ, σ²) with σ² known. Find a 100(1 − α)% confidence interval for μ.
Solution:
Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) with 𝜎 2 known and consider testing 𝐻0 : 𝜇 = 𝜇0 versus
𝐻1 : 𝜇 ≠ 𝜇0 . From Chapter 2 we know that the unbiased test of size 𝛼 has rejection region
R = {𝒙 : |x̄ − μ0| > z_{1−α/2} σ/√n}, (where P[Reject H0 | μ = μ0] = α)
= {𝒙 : ±(x̄ − μ0) > z_{1−α/2} σ/√n}
= {𝒙 : −(x̄ − μ0) > z_{1−α/2} σ/√n or x̄ − μ0 > z_{1−α/2} σ/√n}
= {𝒙 : x̄ − μ0 < −z_{1−α/2} σ/√n or x̄ − μ0 > z_{1−α/2} σ/√n}
= {𝒙 : x̄ < μ0 − z_{1−α/2} σ/√n or x̄ > μ0 + z_{1−α/2} σ/√n}
A(μ0) = {𝒙 : μ0 − z_{1−α/2} σ/√n ≤ x̄ ≤ μ0 + z_{1−α/2} σ/√n}
(with probability
P[𝑿 ∈ A(μ0)] = P[μ0 − z_{1−α/2} σ/√n ≤ X̄ ≤ μ0 + z_{1−α/2} σ/√n] = 1 − α.
For every μ,
P_μ[μ − z_{1−α/2} σ/√n ≤ X̄ ≤ μ + z_{1−α/2} σ/√n] = 1 − α
is true.) By inverting this statement and dropping the subscript it follows that
P_μ[X̄ − z_{1−α/2} σ/√n ≤ μ ≤ X̄ + z_{1−α/2} σ/√n] = 1 − α,
and
C(𝑿) = {μ: x̄ − z_{1−α/2} σ/√n ≤ μ ≤ x̄ + z_{1−α/2} σ/√n}. (3.3)
In more detail:
R = {𝒙 : |x̄ − μ0| > z_{1−α/2} σ/√n}
= {𝒙 : ±(x̄ − μ0) > z_{1−α/2} σ/√n}
= {𝒙 : x̄ − μ0 < −z_{1−α/2} σ/√n or x̄ − μ0 > z_{1−α/2} σ/√n}
= {𝒙 : x̄ < μ0 − z_{1−α/2} σ/√n or x̄ > μ0 + z_{1−α/2} σ/√n}
A(μ0) = {𝒙 : μ0 − z_{1−α/2} σ/√n ≤ x̄ ≤ μ0 + z_{1−α/2} σ/√n}
Now
{μ0 − z_{1−α/2} σ/√n ≤ x̄ ≤ μ0 + z_{1−α/2} σ/√n}
= {−z_{1−α/2} σ/√n ≤ x̄ − μ0 ≤ z_{1−α/2} σ/√n}
= {−x̄ − z_{1−α/2} σ/√n ≤ −μ0 ≤ −x̄ + z_{1−α/2} σ/√n}
"= {x̄ + z_{1−α/2} σ/√n ≥ μ0 ≥ x̄ − z_{1−α/2} σ/√n}"
= {x̄ − z_{1−α/2} σ/√n ≤ μ0 ≤ x̄ + z_{1−α/2} σ/√n}.
By inverting this statement and dropping the subscript it follows that
C(𝑿) = {μ: x̄ − z_{1−α/2} σ/√n ≤ μ ≤ x̄ + z_{1−α/2} σ/√n}. (3.3)
By inverting the acceptance region of a test of size 𝛼, we obtain a confidence interval with
confidence coefficient 1 − 𝛼. These sets are connected by the tautology
(𝑥1 , … , 𝑥𝑛 ) ∈ 𝐴(𝜇0 ) ⟺ 𝜇0 ∈ 𝐶(𝑥1 , … , 𝑥𝑛 ) .
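A minimal Python sketch of the interval (3.3) (the data vector x is hypothetical and σ is taken as known):

import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 4.9, 4.4])     # hypothetical sample
sigma, alpha = 1.0, 0.05                    # known sigma
z = stats.norm.ppf(1 - alpha / 2)
half = z * sigma / np.sqrt(len(x))
print(x.mean() - half, x.mean() + half)     # 95% confidence interval for mu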
Theorem 3.1 : For each 𝜃0 ∈ Θ, let 𝐴(𝜃0 ) be the acceptance region of a size 𝛼 test of
𝐻0 : 𝜃 = 𝜃0 . For each 𝒙 ∈ 𝜒, define a set 𝐶(𝒙) in the parameter space by
C(𝒙) = {θ0 : 𝒙 ∈ A(θ0)}.
Then the random set 𝐶(𝑿) is a 1 − 𝛼 confidence set. Conversely, let 𝐶(𝑿) be a 1 − 𝛼
confidence set. For any 𝜃0 ∈ Θ, define
𝐴(𝜃0 ) = {𝒙 ∶ 𝜃0 ∈ 𝐶(𝒙)} .
Then 𝐴(𝜃0 ) is the acceptance region of a size 𝛼 test of 𝐻0 : 𝜃 = 𝜃0 .
In example 3.4 the question is for the exponential distribution, find a 100(1-α)% confidence
interval for θ. The result used comes from example 2.2.
Example 3.4
(This example is done slightly differently to the original notes to make it easier)
For the exponential distribution, find a 100(1-α)% confidence interval for θ.
Solution:
Suppose we want to find a confidence interval for the parameter 𝜃 of an exponential
distribution by inverting a size 𝛼 test of 𝐻0 : 𝜃 = 𝜃0 versus 𝐻1 : 𝜃 ≠ 𝜃0 .
and by inverting it, the 100( 1 − 𝛼 )% confidence interval for 𝜃 follows after dropping the
subscript.
C(𝒙)_{1−α} = {θ: χ²_{2n,α/2}/(2nx̄) ≤ θ ≤ χ²_{2n,1−α/2}/(2nx̄)}. (3.6)
Note that the inversion of a two–sided test gave a two–sided interval. In the next example we
invert a one–sided test to get a one–sided interval.
In Example 3.5 the question is: for the normal distribution where σ² is unknown, find a 100(1−α)% upper confidence bound for μ. The result used comes from the remarks after Example 2.4 (equation 2.19). (An upper confidence bound means a confidence interval of the form (−∞, U].)
Example 3.5
Let 𝑋1 , … , 𝑋𝑁 ∼ 𝑁(𝜇, 𝜎 2 ) and we want a 1 − 𝛼 upper confidence bound for 𝜇. That is, we
want a confidence interval of the form 𝐶(𝒙) = (−∞, 𝑈(𝒙)]. To obtain such an interval we will
invert a one sided hypothesis of the form 𝐻0 : 𝜇 ≥ 𝜇0 versus 𝐻1 : 𝜇 < 𝜇0 .
Note that if we want an upper bound on the interval, we must use a test with an upper bound
on the alternative hypothesis and vice versa.
R = {𝒙 : x̄ < μ0 − t_{n−1,1−α} s/√n}
{x̄ ≥ μ0 − t_{n−1,1−α} s/√n} = {x̄ + t_{n−1,1−α} s/√n ≥ μ0} = {μ0 ≤ x̄ + t_{n−1,1−α} s/√n}
After dropping the subscript and inverting, the 100(1 − α)% confidence interval follows:
C(𝒙)_{1−α} = {μ: μ ≤ x̄ + t_{n−1,1−α} s/√n}, (3.7)
i.e. {μ: −∞ < μ ≤ x̄ + t_{n−1,1−α} s/√n}.
The test inversion method is completely general in that we can invert any test and obtain a confidence set. However, in certain situations one of the following two methods is easier to apply, especially for discrete distributions.
Definition 3.4 : A random variable 𝑄(𝑿, 𝜃), a function of the data and the parameter, is a
pivotal quantity if the distribution of 𝑄(𝑿, 𝜃) is independent of the parameter. That is, if 𝑋 ∼
𝑓(𝑥|𝜃), then 𝑄(𝑿, 𝜃) has the same distribution for all values of 𝜃.
In location and scales cases there are many examples of pivotal quantities.
Example 3.6 : Suppose X1, …, Xn ∼ N(μ, σ²). Then we know
(a) √n(X̄ − μ)/σ = (X̄ − μ)/(σ/√n) ∼ N(0,1),
(b) √n(X̄ − μ)/S = (X̄ − μ)/(S/√n) ∼ t_{n−1},
(c) Σ_{i=1}^{n}(Xi − X̄)²/σ² = (n−1)S²/σ² = nσ̂²/σ² ∼ χ²_{n−1}, where σ̂² is the MLE of σ², (3.8)
(d) (Xi − μ)/σ ∼ N(0,1),
(e) ((Xi − μ)/σ)² = (Xi − μ)²/σ² ∼ χ²_1,
(f) Σ_{i=1}^{n}(Xi − μ)²/σ² = (1/σ²)Σ_{i=1}^{n}(Xi − μ)² ∼ χ²_n.
Remember: S² = (1/(n−1))Σ_{i=1}^{n}(Xi − X̄)² and σ̂² = (1/n)Σ_{i=1}^{n}(Xi − X̄)² is the MLE of σ².
All these statistics are pivotal quantities since (i) they are functions of both the data and the
parameter(s), and (ii) their distributions are independent of the parameters.
Suppose the pdf of a statistic T can be written in the form
f(t|θ) = g(Q(t, θ)) |∂Q(t, θ)/∂t|, (3.9)
for some function g and some monotone (in t) function Q. Then Q(T, θ) is a pivot. Then use an important relationship or hint given previously (or in the question) to get the pivotal quantity's distribution (or use MGFs for this).
Don't confuse this formula with the formula for getting the pdf of a function of X, e.g. y = R(x) and x = S(y), then g_Y(y) = … (which, by the way, has two forms, one which divides by and one which multiplies by the derivative; the latter looks similar to equation (3.9) but is NOT the same).
Example 3.7 :
Suppose 𝑋1 , … , 𝑋𝑛 ∼ 𝐸𝑥𝑝 (𝜆). Then 𝑇 = Σ𝑋𝑖 is a sufficient statistic for 𝜆 and we know 𝑇 =
Σ𝑋𝑖 ∼ Gamma(𝑛, 𝜆) with pdf
f(t|λ) = [λ^n / Γ(n)] t^{n−1} e^{−λt}.
Notice that λ and t appear together in the pdf as λt, and in fact, 2λT has a χ²–distribution with 2n degrees of freedom. So Q = 2λT = 2λΣXi ∼ χ²_{2n}, independent of λ, so Q is a pivot.
Once we have a pivotal quantity, the construction of a confidence interval is simple, provided the pivotal quantity is invertible. Since the distribution of Q(T, θ) is known, we can find two numbers a and b so that
P[a ≤ Q(T, θ) ≤ b] = 1 − α (3.11)
for a specified α. By inverting the inequalities we can obtain the confidence interval, similar to inverting the acceptance region for a test.
There is an infinite number of pairs (a, b) that satisfy (3.11). If we use the equal-tail criterion, then P[2λT > b] = α/2 where 2λT ∼ χ²_{2n}, so b = χ²_{2n;1−α/2}. Similarly it follows that a = χ²_{2n,α/2}.
So
P[χ²_{2n,α/2} ≤ 2λT ≤ χ²_{2n,1−α/2}] = 1 − α
and
P[χ²_{2n,α/2}/(2T) ≤ λ ≤ χ²_{2n,1−α/2}/(2T)] = 1 − α,
so that C(𝑿)_{1−α} = {λ: χ²_{2n,α/2}/(2Σxi) ≤ λ ≤ χ²_{2n,1−α/2}/(2Σxi)}. (3.12)
Thus (λt)^n is a pivotal quantity, so 2[(λt)^n]^{1/n} = 2λt is also a pivotal quantity, with 2λT ∼ χ²_{2n}; the same steps as above then again give the interval (3.12).
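A Python sketch of the equal-tail interval (3.12) (the exponential data x below are hypothetical; λ is the rate parameter):

import numpy as np
from scipy import stats

x = np.array([0.7, 1.9, 0.4, 2.3, 1.1, 0.8])    # hypothetical Exp(lambda) sample
n, T, alpha = len(x), x.sum(), 0.05
lower = stats.chi2.ppf(alpha / 2, 2 * n) / (2 * T)
upper = stats.chi2.ppf(1 - alpha / 2, 2 * n) / (2 * T)
print(lower, upper)                             # 95% CI for the rate lambda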
Example 3.8 :
Suppose X1, …, Xn ∼ N(μ, σ²) and we want a 100(1 − α)% confidence interval for σ (μ unknown, so the pivotal quantity may not contain μ). A pivotal quantity for σ² is Q = (n−1)S²/σ² ∼ χ²_{n−1}, where S² = (1/(n−1))Σ(Xi − X̄)². So
P[a ≤ (n−1)S²/σ² ≤ b] = 1 − α
if a = χ²_{n−1,α/2} and b = χ²_{n−1,1−α/2}:
P[χ²_{n−1,α/2} ≤ (n−1)S²/σ² ≤ χ²_{n−1,1−α/2}] = 1 − α
P[1/χ²_{n−1,1−α/2} ≤ σ²/((n−1)S²) ≤ 1/χ²_{n−1,α/2}] = 1 − α
So P[(n−1)S²/χ²_{n−1,1−α/2} ≤ σ² ≤ (n−1)S²/χ²_{n−1,α/2}] = 1 − α,
or equivalently,
C(𝑿)_{1−α} = {σ: √[(n−1)s²/χ²_{n−1,1−α/2}] ≤ σ ≤ √[(n−1)s²/χ²_{n−1,α/2}]}. (3.13)
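A Python sketch of (3.13) (the normal data x below are hypothetical):

import numpy as np
from scipy import stats

x = np.array([10.2, 9.4, 11.1, 10.8, 9.9, 10.5])   # hypothetical N(mu, sigma^2) sample
n, alpha = len(x), 0.05
s2 = x.var(ddof=1)
lower = np.sqrt((n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, n - 1))
upper = np.sqrt((n - 1) * s2 / stats.chi2.ppf(alpha / 2, n - 1))
print(lower, upper)                                 # 95% CI for sigma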
Theorem 3.2 : Let 𝑇 be a statistic with continuous cdf 𝐹𝑇 (𝑡|𝜃). Let 0 < 𝛼 < 1 be a fixed
value. Suppose that for each 𝑡 ∈ 𝑇, the function 𝜃𝐿 (𝑡) and 𝜃𝑈 (𝑡) can be defined as follows.
Then the random interval [𝜃𝐿 (𝑇), 𝜃𝑈 (𝑇)] is a 1 − 𝛼 confidence interval for 𝜃.
If you have data, it is not really necessary to determine whether 𝐹𝑇 (𝑡|𝜃) is increasing or
decreasing. Simply solve the equations (case 1 or case 2) for values, say 𝜃1 (𝑡) and 𝜃2 (𝑡), of the
parameter instead of 𝜃L (𝑡) and 𝜃U (𝑡), and put the smaller number between 𝜃1 (𝑡) and 𝜃2 (𝑡)
on the left (i.e. it is then 𝜃L (𝑡)), and the bigger number between 𝜃1 (𝑡) and 𝜃2 (𝑡) on the right
(i.e. it is then 𝜃U (𝑡)).
Proof : We will prove only part (1). The proof of part (2) is similar. Since F_T(t|θ) is a decreasing function of θ for each t and 1 − α/2 > α/2, θ_L(t) < θ_U(t) and the values θ_L(t) and θ_U(t) are unique. Also, F_T(t|θ) is decreasing in θ, and hence it follows that
P_θ(θ > θ_U(T)) = P_θ(F_T(T|θ) < α/2) = F_{F_T}(α/2) = α/2,
so P_θ(θ ≤ θ_U(T)) = 1 − α/2,
where the last equality follows from the Probability Integral Transform, which states that the random variable F(X) ∼ Uniform(0,1), i.e. F_{F(X)}(y) = 0 for y < 0, = y for 0 ≤ y ≤ 1, and = 1 for y > 1.
By a similar argument, we have
P_θ(θ < θ_L(T)) = P_θ(F_T(T|θ) > 1 − α/2) = 1 − F_{F_T}(1 − α/2) = 1 − (1 − α/2) = α/2.
The equations in the case of a decreasing cdf can also be expressed in terms of the pdf of the statistic T. The functions θ_U(t) and θ_L(t) can be defined to satisfy
∫_{−∞}^{t} f_T(u|θ_U(t)) du = α/2 and ∫_{t}^{∞} f_T(u|θ_L(t)) du = α/2. (3.15)
A similar set of equations holds for the increasing case.
The statistical method is particularly useful in the case (a) where the sample space depends on
the parameter, and (b) with discrete distributions.
Example 3.9 :
Consider a sample x1, …, xn from a Uniform(0, θ) distribution with
f(x|θ) = 1/θ, 0 ≤ x ≤ θ,
and let Y = max{X1, …, Xn}, so that
f_Y(y|θ) = n y^{n−1}/θ^n, 0 ≤ y ≤ θ,
F_Y(y|θ) = (y/θ)^n, 0 ≤ y ≤ θ.
So
C(y)_{1−α} = {θ: y(1 − α/2)^{−1/n} ≤ θ ≤ y(α/2)^{−1/n}}. (3.16)
Notice two things about this method. Firstly, the Equations (3.14) or (3.15) need to be solved only
for the actual observed value of the statistic 𝑇 = 𝑡0 , and secondly, it is not really necessary to
determine whether 𝐹𝑇 (𝑡|𝜃) is increasing or decreasing. Simply solve the equations for values,
say 𝜃1 (𝑡) and 𝜃2 (𝑡), of the parameter. Then the smaller solution is 𝜃𝐿 (𝑡) and the larger
𝜃𝑈 (𝑡). We now consider the discrete case.
Theorem 3.3 : Let 𝑇 be a discrete statistic with cdf 𝐹𝑇 (𝑡|𝜃) = 𝑃(𝑇 ≤ 𝑡|𝜃) . Let
0 < 𝛼 < 1 be a fixed value. Suppose that for each 𝑡 ∈ 𝑇, 𝜃1 (𝑡) and 𝜃2 (𝑡) can be defined as
F_T(t|θ1(t)) = α/2, F_T(t − 1|θ2(t)) = 1 − α/2
i.e. P[T ≤ t|θ1(t)] = α/2, P[T ≤ t − 1|θ2(t)] = 1 − α/2
i.e. P[T ≤ t|θ1(t)] = α/2, P[T < t|θ2(t)] = 1 − α/2
i.e. P[T ≤ t|θ1(t)] = α/2, P[T ≥ t|θ2(t)] = α/2. (3.17)
In terms of the probability function:
Σ_{u=0}^{t} f_T(u|θ1(t)) = α/2, Σ_{u=t}^{∞} f_T(u|θ2(t)) = α/2. (3.18)
Then [𝜃1 (𝑡) , 𝜃2 (𝑡)] ([𝜃2 (𝑡), 𝜃1 (𝑡)]) is a 1 − 𝛼 confidence interval for 𝜃 if 𝐹𝑇 (𝑡|𝜃) is an
increasing (decreasing) function of 𝜃. Notice that Equation (3.17) can be written in terms of the
probability function as (3.18).
Example 3.10 :
Let 𝑋1 , … , 𝑋𝑛 be a random sample from a Poisson population with parameter 𝜆 and define
𝑌 = Σ𝑋𝑖 . 𝑌 = Σ𝑋𝑖 is sufficient for 𝜆 and 𝑌 = Σ𝑋𝑖 ∼ Poisson (𝑛𝜆) . Applying the above
method, if 𝑌 = 𝑦0 is observed, we are led to solve for 𝜆1 and 𝜆2 in the equations
P[Y ≤ y0|λ1] = α/2, P[Y ≥ y0|λ2] = α/2. (3.19)
Y = ΣXi ∼ Poisson(nλ). So, for the first equation in (3.19) it follows that
P[Y ≥ y0 + 1|λ1] = 1 − α/2, and P[Y ≥ y0|λ2] = α/2, i.e.
P[Gamma(y0 + 1, n) < λ1 | λ1] = 1 − α/2, P[Gamma(y0, n) < λ2 | λ2] = α/2
P[2n·Gamma(y0 + 1, n) < 2nλ1 | λ1] = 1 − α/2, P[2n·Gamma(y0, n) < 2nλ2 | λ2] = α/2
P[χ²_{2(y0+1)} < 2nλ1] = 1 − α/2, P[χ²_{2y0} < 2nλ2] = α/2
(< is the same as ≤ with χ², as it is a continuous random variable)
F_{χ²_{2(y0+1)}}[2nλ1] = 1 − α/2, F_{χ²_{2y0}}[2nλ2] = α/2
2nλ1 = F⁻¹_{χ²_{2(y0+1)}}[1 − α/2], 2nλ2 = F⁻¹_{χ²_{2y0}}[α/2]
λ1 = (1/(2n)) χ²_{2(y0+1);1−α/2}, λ2 = (1/(2n)) χ²_{2y0;α/2}.
Now λ2 < λ1 and the confidence interval is
C(y)_{1−α} = {λ: (1/(2n)) χ²_{2y0;α/2} ≤ λ ≤ (1/(2n)) χ²_{2(y0+1);1−α/2}}. (3.20)
This is easiest done with the help of Poisson tables, where the value of the parameter can be
read off for a given probability. However, tables are never comprehensive enough to obtain an
interval with the exact required confidence coefficient 1 − 𝛼.
So our confidence coefficient is not 0.95, but 0.9769 – 0.0212 = 0.9557, and
𝐶(𝑦0 )0.9557 = {𝜆: 0.6 ≤ 𝑛𝜆 ≤ 9.0}
(3.22)
= {𝜆: 0.06 ≤ 𝜆 ≤ 0.90} ,
which is reasonably close to (3.21).
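A Python sketch of the exact interval (3.20), which avoids the table-lookup limitation (the sample size n and observed total y0 below are hypothetical):

from scipy import stats

n, y0, alpha = 10, 3, 0.05                        # hypothetical sample size and total count
lower = stats.chi2.ppf(alpha / 2, 2 * y0) / (2 * n)
upper = stats.chi2.ppf(1 - alpha / 2, 2 * (y0 + 1)) / (2 * n)
print(lower, upper)                               # exact 95% CI for lambda, as in (3.20)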
In set estimation two quantities vie against one another, size and coverage probability. Naturally,
we want our set to have small size and large coverage probability. The coverage probability of a
confidence set will, except in special cases, be a function of the parameter so there is not one
value to consider, but an infinite number of values. For the most part, however, we will measure
coverage probability performance by the confidence coefficient, the infimum of the coverage
probabilities. When we speak of the size of a confidence set we will usually mean the length of
the confidence set, if the set is an interval.
We now consider what appears to be a simple, constrained minimization problem. For a given,
specified coverage probability find the confidence interval with the shortest length. We first
consider an example.
L_{1−α C.I. for μ} = L = [x̄ − a/√n] − [x̄ − b/√n] = (1/√n)(b − a) ∝ b − a = L_{1−α C.I. for Z}, (3.23)
but since 1/√n is constant, we want to minimize b − a while maintaining 1 − α coverage.
There are an infinite number of choices for 𝑎 and 𝑏. The traditional values used are 𝑏 = 𝑧1−𝛼/2
and 𝑎 = 𝑧𝛼/2 = −𝑧1−𝛼/2 . The following theorem will show whether this choice is optimal.
Theorem 3.4 : Let 𝑓(𝑥) be a unimodal pdf. If the interval [𝑎, 𝑏] satisfies
a) P(a ≤ X ≤ b) = ∫_a^b f(x) dx = 1 − α,
b) 𝑓(𝑎) = 𝑓(𝑏), and
c) 𝑎 ≤ 𝑥 ∗ ≤ 𝑏, where 𝑥 ∗ is the mode of 𝑓(𝑥),
then [𝑎, 𝑏] is the shortest interval satisfying (a).
Proof : Let [a′, b′] be any interval with b′ − a′ < b − a. We will show that this implies ∫_{a′}^{b′} f(x) dx ≤ 1 − α. The result will be proved only for a′ ≤ a, the proof being similar if a < a′. Also, two cases need to be considered, b′ ≤ a and b′ > a.
If b′ ≤ a, then a′ ≤ b′ ≤ a ≤ x* and
∫_{a′}^{b′} f(x) dx ≤ f(b′)(b′ − a′)    (x ≤ b′ ≤ x* ⇒ f(x) ≤ f(b′))
≤ f(a)(b′ − a′)    (b′ ≤ a ≤ x* ⇒ f(b′) ≤ f(a))
< f(a)(b − a)    (b′ − a′ < b − a and f(a) > 0)
≤ ∫_a^b f(x) dx    ((a), (b), and unimodality ⇒ f(x) ≥ f(a) for a ≤ x ≤ b)
= 1 − α.
Thus ∫_{a′}^{b′} f(x) dx < 1 − α (so we have a contradiction and the assumption is false), completing the proof in the first case.
If 𝑏′ > 𝑎 , then 𝑎′ ≤ 𝑎 < 𝑏′ < 𝑏 for, if 𝑏′ were greater than or equal to 𝑏 , then 𝑏′ − 𝑎′
would be greater than or equal to 𝑏 − 𝑎. In this case, we can write
∫_{a′}^{b′} f(x) dx = ∫_a^b f(x) dx + [∫_{a′}^{a} f(x) dx − ∫_{b′}^{b} f(x) dx]
= (1 − α) + [∫_{a′}^{a} f(x) dx − ∫_{b′}^{b} f(x) dx],
and the theorem will be proved if we show that the expression in square brackets is negative. Now, using the unimodality of f, the ordering a′ ≤ a < b′ < b, and (b),
∫_{a′}^{a} f(x) dx ≤ f(a)(a − a′)
and ∫_{b′}^{b} f(x) dx ≥ f(b)(b − b′), so −∫_{b′}^{b} f(x) dx ≤ −f(b)(b − b′).
Thus [∫_{a′}^{a} f(x) dx − ∫_{b′}^{b} f(x) dx] ≤ f(a)(a − a′) − f(b)(b − b′)
= f(a)(a − a′) − f(a)(b − b′)    (f(a) = f(b))
= f(a)[(a − a′) − (b − b′)]
= f(a)[a − a′ − b + b′]
= f(a)[(b′ − a′) − (b − a)] < 0,
which is negative since (b′ − a′) < (b − a) and f(a) > 0.
So ∫_{a′}^{b′} f(x) dx = (1 − α) + [∫_{a′}^{a} f(x) dx − ∫_{b′}^{b} f(x) dx] = (1 − α) + (a negative number) < 1 − α.
Thus ∫_{a′}^{b′} f(x) dx < 1 − α (so we have a contradiction and the assumption is false), completing the proof in the second case.
Example 3.11 (Cont.) : According to Theorem 3.4 the value of the density function should be the same at a and b. So choosing a = z_{α/2} = −z_{1−α/2} and b = z_{1−α/2} (obviously (a) holds): since the distribution in question here is the standard normal, and thus symmetric around zero ((c) holds), it means that f(−z_{1−α/2}) = f(z_{1−α/2}) ((b) holds), and the equal-tail interval is indeed the shortest possible 1 − α interval. So L_{1−α C.I. for Z} is minimised, and hence L_{1−α C.I. for μ} is minimised, with length L = (1/√n)[b − a] = (1/√n)[z_{1−α/2} − (−z_{1−α/2})] = 2z_{1−α/2}/√n.
So for any symmetric unimodal distribution (like the t–distribution) the shortest interval will be symmetric around the mean, provided the interval length is proportional to b − a.
Corollary 3.1 : Let f(x) be a strictly decreasing pdf on [0, ∞). Of all intervals [a, b] that satisfy ∫_a^b f(x) dx = 1 − α, the shortest is obtained by choosing a = 0 and b so that
P(0 ≤ X ≤ b) = ∫_0^b f(x) dx = 1 − α.
The proof of this corollary follows the same line as Theorem 3.4, see Exercise 5.
NOTE : It is important to note that the previous theorem and corollary only applies when the
interval length is a function of 𝑏 − 𝑎, which is usually only the case when working with a location
problem. In scale cases the theorem may not be applicable, as in the following example.
Example 3.12 : Let X ∼ Gamma(k, β) with k known. The quantity Y = 2X/β is a pivot with Y ∼ χ²_{2k}, so we can get a confidence interval by finding constants a and b to satisfy
P[a < Y < b] = 1 − α.
However, Theorem 3.4 will not give the shortest interval, since the interval for β is of the form
C(y)_{1−α} = {β: 2x/b ≤ β ≤ 2x/a} (3.24)
with length
L_{1−α C.I. for β} = 2x/a − 2x/b = 2x(1/a − 1/b) = 2x(b − a)/(ab) ∝ (b − a)/(ab), which is NOT proportional to b − a = L_{1−α C.I. for Y}, (3.25)
that is, L is proportional to (b − a)/(ab) and not to b − a. Thus Theorem 3.4 is of no use here. It is, however, not easy to solve for a and b in this case, and in practice the equal-tail intervals are often used if the distribution is not too skew.
Notice that if X is instead defined as X ∼ Gamma(k, β) with β a rate parameter, then Y = 2βX ∼ χ²_{2k}, and the confidence interval for β is of the form
{β: a/(2x) ≤ β ≤ b/(2x)} (3.26)
with length L_{1−α C.I. for β} = (1/(2x))(b − a) ∝ b − a = L_{1−α C.I. for Y}. Thus we would use Corollary 3.1, i.e. we would choose a = 0 and b = χ²_{1−α,2k}, as the pdf is decreasing w.r.t. y. If the pdf were unimodal, we would be able to use Theorem 3.4, i.e. we would choose a = χ²_{α/2,2k} and b = χ²_{1−α/2,2k}.
3.3.2 Test–Related Optimality
Since there is a one-to-one correspondence between confidence sets and tests of hypotheses
(Theorem 9.2.1), there is some correspondence between optimality of tests and optimality of
confidence sets. Usually, test–related optimality properties of confidence sets do not directly
relate to the size of the set but rather to the probability of the set covering false values.
Definition 3.5 : A 1 − 𝛼 confidence set that minimizes the probability of false coverage over a
class of 1 − 𝛼 confidence sets is called a uniformly most accurate (UMA) confidence set.
We know that a uniformly most powerful test (UMP test) minimizes the probability of accepting the null hypothesis over all parameter values outside the null hypothesis {β(θ) ≥ β′(θ) for all θ ∈ Ω0ᶜ}. It can be shown that this is equivalent to minimizing the probability of false coverage when inverting the acceptance region. So UMA confidence intervals are constructed by inverting UMP tests. Unfortunately UMP tests are one-sided, so UMA intervals are also one-sided.
[x̄ − z_{1−α} σ/√n, ∞) is a 1 − α UMA lower confidence bound, since it can be obtained by inverting the UMP test of H0: μ ≤ μ0 versus H1: μ > μ0.
The property of unbiasedness when testing a two–sided hypothesis can also be transferred to
confidence intervals. Remember that an unbiased test is one in which the power under the
alternative is always greater than the power under the null hypothesis.
Definition 3.6 : A 1 − 𝛼 confidence set 𝐶(𝑋) is unbiased if 𝑃𝜃 [𝜃′ ∈ 𝐶(𝑋)] ≤ 1 − 𝛼 for all
𝜃 ≠ 𝜃′.
Thus, for an unbiased confidence set, the probability of false coverage is never more than the
minimum probability of true coverage. Again, the inversion of an unbiased test, or unbiased UMP
test, will result in an unbiased, or unbiased UMA, confidence interval.
3.4 Approximate Confidence Intervals
As with tests, we close this chapter with some approximate and asymptotic (result holds for
𝑛 ⟶ ∞, thus it holds approximately when n is large) versions of confidence sets. These methods
can be of use in complicated situations where other methods have failed.
giving the approximate confidence interval for h(θ) (by using the pivotal quantity method); i.e.
P[−z_{1−α/2} ≤ (h(θ̂) − h(θ))/√(V̂ar(h(θ̂)|θ)) ≤ z_{1−α/2}] ≈ 1 − α
p̂/(1−p̂) − z_{1−α/2}√(p̂/(n(1−p̂)³)) ≤ p/(1−p) ≤ p̂/(1−p̂) + z_{1−α/2}√(p̂/(n(1−p̂)³)).
then we can form an approximate interval for θ (by using the pivotal quantity method) given by
−z_{1−α/2} ≤ (W − θ)/V ≤ z_{1−α/2},
i.e. W − z_{1−α/2}V ≤ θ ≤ W + z_{1−α/2}V,
where s² = (1/(n−1))Σ(xi − x̄)².
(X̄ − μ)/(σ/√n) ≈ (X̄ − λ)/(√X̄/√n) → N(0,1) (3.33)
is another approximation. (This trick doesn't work with every distribution; it works, for example, with the Poisson distribution because σ² = Var[X] = λ = E(X) ≈ X̄ for the Poisson distribution.)
This approximation is the best since it uses the fewest estimators. However, it is not directly invertible since the variance contains the unknown parameter. The confidence interval is of the form
{λ: |(x̄ − λ)/√(λ/n)| ≤ z_{1−α/2}}.
Squaring the inequality |x̄ − λ| ≤ z_{1−α/2}√(λ/n) gives the quadratic λ² − (2x̄ + z²_{1−α/2}/n)λ + x̄² ≤ 0, i.e. aλ² + bλ + c ≤ 0 with a = 1, b = −(2x̄ + z²_{1−α/2}/n) and c = x̄².
For aλ² + bλ + c = 0 the solutions are λ = (−b ± √(b² − 4ac))/(2a).
Since a = 1 > 0, the parabola opens upwards, so we know
(−b − √(b² − 4ac))/(2a) ≤ λ ≤ (−b + √(b² − 4ac))/(2a).
Since the coefficient of λ² is positive, the inequality is satisfied if λ lies between the two roots of the quadratic. These roots are
(−b ± √(b² − 4ac))/(2a) = (1/2)[(2x̄ + z²_{1−α/2}/n) ± √((2x̄ + z²_{1−α/2}/n)² − 4x̄²)]
= x̄ + z²_{1−α/2}/(2n) ± z_{1−α/2}√(z²_{1−α/2}/(4n²) + x̄/n). (3.36)
Notice that if n gets large and we let terms of order n⁻¹ go to zero, then (3.36) becomes
≈ x̄ ± z_{1−α/2}√(x̄/n), (3.37)
i.e. we get x̄ − z_{1−α/2}√(x̄/n) ≤ λ ≤ x̄ + z_{1−α/2}√(x̄/n) (using option 3).
Option 2 would have given the same answer, i.e. the same interval given by (3.33). So for large n there will be little difference between the different approaches.
The approach of Section 3.4.1 would also have given the same answer: notice that if the approach of Section 3.4.1 is followed with h(λ) = λ, the resulting interval will also be as in (3.37). The approach of Section 3.4.1 is usually only followed when we want a confidence interval for a function of the parameter.
Option 1 would have given: x̄ − t_{1−α/2, n−1} s/√n ≤ λ ≤ x̄ + t_{1−α/2, n−1} s/√n.
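A Python sketch comparing these approximate intervals for λ (the Poisson counts x below are hypothetical):

import numpy as np
from scipy import stats

x = np.array([3, 5, 2, 4, 6, 3, 5, 4, 2])     # hypothetical Poisson sample
n, alpha = len(x), 0.10
xbar, s = x.mean(), x.std(ddof=1)
z = stats.norm.ppf(1 - alpha / 2)
# Option 1: t-interval using the sample standard deviation s
t = stats.t.ppf(1 - alpha / 2, n - 1)
ci_opt1 = (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))
# Large-n form (3.33)/(3.37): replace Var[X] = lambda by xbar
ci_337 = (xbar - z * np.sqrt(xbar / n), xbar + z * np.sqrt(xbar / n))
# Exact roots of the quadratic (3.36)
half = z * np.sqrt(z**2 / (4 * n**2) + xbar / n)
ci_336 = (xbar + z**2 / (2 * n) - half, xbar + z**2 / (2 * n) + half)
print(ci_opt1, ci_337, ci_336)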
EXERCISES 3
3. For the situation in Problem 3 of Chapter 2, find a 1 − α confidence interval for λ = σ1²/σ2².
1 2
(b) Let 𝑌 = −ℓ𝑛 𝑋. What is the confidence coefficient of the interval [𝑦 , 𝑦]? Also, find a better
interval for 𝜃.
7. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜃, 𝜃) , 𝜃 > 0. Give three examples of pivotal quantities for 𝜃 and obtain
the 1 - 𝛼 confidence intervals.
8. How large a sample must be taken from a N(μ, σ²) distribution if the length of a 95% confidence interval must be no larger than σ/2? Assume σ² known.
(b) Find another confidence interval for 1/θ with the same confidence coefficient but smaller expected length.
10. Let 𝑋1 , … , 𝑋𝑛 be a random sample from
(c) If 𝑌1 = min{𝑋1 , … , 𝑋𝑛 }, find a pivotal quantity and interval for 𝜃, based only on 𝑌1 .
11. The breaking strengths in kg of five specimens of manila rope were found to be 330, 230,
270, 290 and 275.
(a) Find a 95% confidence interval for the mean breaking strength, assuming normality.
(b) Estimate the point at which only 5% of such specimens are expected to break.
12. To test two new lines of hybrid corn under normal farming conditions, a seed company
selected eight farms at random and planted both lines in experimental plots on each farm. The
yield for the eight locations were
𝐿𝑖𝑛𝑒 𝐴 86 87 56 93 84 93 75 79
𝐿𝑖𝑛𝑒 𝐵 80 79 58 91 77 82 74 66
Assuming joint normality, estimate the difference between the mean yields by a 95% confidence
interval.
f(x|θ) = 2x/θ², 0 < x < θ, θ > 0.
14. Let 𝑋1 , … , 𝑋𝑛 be a sample from a Gamma (𝑘, 𝛽) distribution with 𝑘 known. Find a
uniformly most accurate (UMA) 1 - 𝛼 confidence interval of the form (0, 𝑢( )] for 𝛽.
15. Let X be one observation from the pdf
f(x|θ) = e^{x−θ}/(1 + e^{x−θ})², −∞ < x < ∞, −∞ < θ < ∞.
16. Let 𝑋1 , … , 𝑋𝑛 ∼ 𝐸𝑋𝑃 (𝜆). Find a UMA 1 - 𝛼 confidence interval based on inverting the
UMP test of 𝐻0 : 𝜆 ≥ 𝜆0 versus 𝐻1 : 𝜆 < 𝜆0 . Find the expected length of this interval.
17. If 𝑋1 , … , 𝑋𝑛 ∼ 𝑁(𝜇, 𝜎 2 ) with 𝜎 2 unknown, find the shortest 1 - 𝛼 confidence interval for
𝜇.
18. A thumbtack is tossed 20 times and landed with the point up 14 times. Use the statistical
method and tables to find a confidence interval for 𝑝, the probability for “point up”, with
confidence coefficient of approximately 90%.
19. The following data, the number of aphids per row in nine rows of a potato field, can be
assumed to follow a Poisson distribution:
155, 104, 66, 50, 36, 40, 30, 35, 42.
Construct the best approximate 90% confidence interval for the mean number of aphids per row.
Compare with the exact result as obtained from Equation (3.20).
20. For a random sample 𝑋1 , … , 𝑋𝑛 from a Bernoulli(𝑝) distribution, find the best approximate
1 - 𝛼 confidence interval (as in Example 3.14) for 𝑝. Find also a simpler, but less accurate,
interval. Find the approximate 90% interval for the data in Exercise 19.
23. Let 𝑋1 , … , 𝑋𝑛 be a sample from a Negative Binomial (𝑟, 𝑝) distribution with 𝑟 known ;
f(x|p) = (r+x−1 choose x) p^r (1 − p)^x, x = 0, 1, …; 0 < p < 1.
(a) Find an approximate 1 − α confidence interval for the mean, E[X] = r(1 − p)/p.
𝑝
(b) The aphid data of Exercise 20 can also be modelled using the negative binomial distribution
with 𝑟 = 2. Construct an approximate 90% confidence interval for the mean using the result in
part (a) and compare it with the result of Exercise 20.
24. One side of a square field is measured 9 times. The measuring instrument has a
measurement error that is normally distributed with standard deviation of one when the true
distance is approximately 9 meters. The mean length obtained from the 9 measurements is 9
meters. Find an approximate 99% confidence interval for the area of the field.