Lecture Notes - 3
In all of the previous sections of these notes, we have focussed on the area of statistical estimation.
In other words, we have tried to use our data (and sometimes a prior belief in the case of Bayesian
approaches) to arrive at either “best guesses” (in the case of point estimation) or “plausible ranges”
(in the case of interval estimation) for some quantitative aspect (often encoded as a parameter
of a distributional family) of a population of interest. In many situations, however, the simple
estimation of a population characteristic is not the final desired outcome of a statistical analysis.
Specifically, we may want to use our estimates to decide whether some previously proposed theory
or statement regarding the population of interest is actually true (or at least is plausible given the
information provided by the observations at hand). This is, of course, the standard framework of
statistical hypothesis testing which is familiar from any introductory unit in basic statistics. In
this final section of these notes, we will briefly discuss the more formal structure of the theory
of hypothesis testing which underlies the standard testing procedures (for population means and
proportions) which are the main staple of any introductory presentation. We start by giving a
formal set of definitions for parametric hypothesis testing and then introduce perhaps the most
important and flexible of all testing procedures, based on the likelihood function. Finally, as we
have done with both point and interval estimation previously, we will briefly investigate procedures
for some standard situations which are not (as heavily) dependent on the parametric assumptions
which will underlie our initial discussions of statistical testing theory.
4.1. Definitions
We shall introduce and define the key aspects of a statistical hypothesis test through a rather simple
example. Suppose that we have purchased a light-bulb based on its advertised claim that the mean
lifetime of such bulbs is at least 1000 hours. If we then observe the lifetime of the actual bulb we
purchased, we have some data with which to assess the advertising claim. This simple scenario is
precisely the framework of statistical hypothesis testing.
4.1.1. Statistical Hypotheses and Decision Rules: More formally, suppose we believe that the
lifetime of the population of bulbs in question is exponentially distributed with mean parameter
θ, so that the probability density associated with X, the random lifetime of a bulb, is given by
fX (x; θ) = θ−1 e−x/θ for some θ ∈ Θ. A statistical hypothesis is then simply a statement regarding
the population of interest or, equivalently in the parametric case described here, the value of the
true population parameter. As such, we can formulate the hypothesis we wish to examine regarding
the population of light-bulbs as H0 : θ ≥ 1000. More generally, we have:
Definition 4.1: Suppose that X1 , . . . , Xn represent a simple random sample from a parametric
family with density function fX (x; θ) for some parameter θ ∈ Θ. A statistical hypothesis is
simply a subset of the parameter space, Θ. Any statistical hypothesis of interest, often termed
the null hypothesis, is associated with a competing alternative hypothesis. As such, a null
hypothesis and its alternative form a partition of the parameter space Θ consisting of the sets:
Θ0 , the set of parameter values which constitute the null hypothesis and Θ1 = Θc0 ∩ Θ, the set
of parameter values which are in the parameter space but not in the null hypothesis collection.
Note that in our light-bulb example, Θ0 = {θ ∈ Θ : θ ≥ 1000}. Moreover, we stress that the
alternative hypothesis is defined as the complement of the null hypothesis within the parameter
space. In other words, if we are considering testing the mean of a normal distribution and our null
hypothesis is H0 : µ = 0, then the general alternative (in the case that the parameter space of µ is
the entire real line) would be the two-sided one, H1 : µ ≠ 0. However, if we restrict the parameter
space to only non-negative values (perhaps because of some external information regarding the
specific problem at hand), then the relevant alternative hypothesis would be the one-sided one,
H1 : µ > 0, since {µ = 0}c ∩ {µ ≥ 0} = {µ > 0}.
A statistical test of the null hypothesis H0 : θ ∈ Θ0 is then just a decision rule based on
the observed data for deciding whether to accept H0 or reject it, and thus accept the alternative
hypothesis, H1 : θ ∈ Θ1 .
Definition 4.2: Suppose that X1 , . . . , Xn represent a simple random sample from a parametric
family with density functions fX (x; θ) for some parameter θ ∈ Θ. Further let X represent the
sample space of the (random) vector X = (X1 , . . . , Xn ). A statistical test of the null hypothesis
H0 : θ ∈ Θ0 is just a decision rule based on a partitioning of the sample space. In particular,
if we partition the sample space X into those outcomes of the observations which would lead
us to reject H0 , often denoted as C and referred to as the rejection region or critical region of
the test, and those observations which would lead us to accept H0 , which is just the collection
C c ∩ X , then a statistical test is simply defined by the decision rule which rejects H0 in favor
of H1 in the case that X = (X1 , . . . , Xn ) ∈ C and accepts H0 otherwise.
So, characterising a statistical test is as simple as defining its associated rejection region. For
instance, in our light-bulb example, we can define the test which rejects H0 if X, the observed
lifetime of our sampled bulb, is less than 1000 hours. In other words, we define a test with critical
region C = {X < 1000}. Indeed, since we have already seen (during our initial discussions of
the concept of sufficiency) that statistics can be viewed as partitioning the sample space of the
observations, it is quite common to define a statistical test in terms of a rejection region which
is just a level set for some statistic T (X1 , . . . , Xn ); in other words, C has the form C = {X ∈
X : T (X) < k} for some prespecified value k. Of course, whether this is a “good” test must be
determined by examining the properties of the testing procedure so determined. This exercise is
the subject of the next section.
4.1.2. Size and the Power Function: Common sense would indicate that the test described in the example of the previous section (namely, rejecting the null hypothesis that the mean lifetime of the bulbs is at least 1000 hours whenever a single observed lifetime is less than 1000 hours) is not a very good test, since it is quite prone to making an error. Indeed, we can assess the quality
of a statistical test by examining the two distinct types of errors that can arise from it. If the
observations fall in the rejection region C when in fact the null hypothesis, H0, is true, then our testing procedure will reject H0 when it should not. Such a mistake is termed a Type I error and occurs with probability Prθ(C) for θ ∈ Θ0. Alternatively, if the observed data values fall outside the rejection region when in fact the null hypothesis is false, then our testing procedure will accept H0 when it should not. Such a mistake is termed a Type II error and occurs with probability Prθ(C^c) for θ ∈ Θ1. Clearly, we would like to use a testing procedure which has a small chance of
making errors of either type.
Of course, to actually assess the probability of making an error, we must make a probability
statement about the observed data values, and these values depend on the true parameter θ. For
instance, suppose that in our light-bulb example, the true mean lifetime of bulbs is exactly 1000
hours. In this case, H0 is indeed true and the chance of a Type I error is
$$\Pr(C) = \Pr{}_{1000}(X < 1000) = \int_0^{1000} \frac{1}{1000}\, e^{-x/1000}\,dx = 1 - e^{-1000/1000} = 0.632.$$
On the other hand, if θ = 500 then H0 is false and the chance of making a Type II error is:
$$\Pr(C^c) = \Pr{}_{500}(X \ge 1000) = \int_{1000}^{\infty} \frac{1}{500}\, e^{-x/500}\,dx = e^{-1000/500} = 0.135.$$
Clearly, there is a strong relationship between Type I and Type II errors. In particular, note that
for a given value of θ, only one type of error can occur (since for any given θ, H0 either is or is not
true). For convenience we generally focus our attention on the so-called power function:
Definition 4.3: The power function of a statistical test of H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1
determined by the rejection region C is given by KC (θ) = P rθ (C). Note that this function
yields the chance of a Type I error when θ ∈ Θ0 and yields the probability of correctly rejecting
H0 when θ ∈ Θ1 ; that is KC (θ) = 1 − P rθ (C c ), which is one minus the probability of a Type
II error when θ ∈ Θ1 . This last probability is often termed the power of the test, since it
represents the likelihood of the test detecting that the null hypothesis is indeed false (i.e., its
power of detection).
Since KC (θ) is just the chance of rejecting H0 when the true parameter value is θ, we would like
to have tests which have values of KC (θ) which are large when θ ∈ Θ1 and which are small when
θ ∈ Θ0 . Of course, since KC (θ) is a function, it can sometimes be difficult to work with directly.
Therefore, we often define:
Definition 4.4: The size (or significance level) of a statistical test is given by
$$\alpha_C = \sup_{\theta\in\Theta_0} K_C(\theta).$$
In other words, the size of a test is the largest possible chance of a Type I error.
In the case of our light-bulb example, it is easy to calculate the power function as KC(θ) = Prθ(X < 1000) = 1 − e^{−1000/θ}. Therefore, the size of the test determined by C = {X < 1000} is easily seen to be
$$\sup_{\theta\in\Theta_0} K_C(\theta) = \sup_{\theta\ge 1000}\left(1 - e^{-1000/\theta}\right) = 1 - e^{-1000/1000} = 0.632.$$
To obtain a test with a prescribed size α, we can instead use a critical region of the form C = {X < kα}, which has size supθ≥1000(1 − e^{−kα/θ}) = 1 − e^{−kα/1000}. Some simple algebra shows that, for this example, we have kα = −1000 ln(1 − α). In particular, if we want a test with size α = 0.05, we should use a test with rejection region C = {X < 51.29}, since −1000 ln(1 − 0.05) = 51.29.
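As a quick numerical check, these size and power calculations are easy to reproduce in code. The following Python sketch (the function names are our own illustration, not part of the notes) assumes numpy is available.

```python
import numpy as np

def cutoff(alpha, theta0=1000.0):
    """Critical value k_alpha for the test C = {X < k} of H0: theta >= theta0,
    based on a single Exponential(mean theta) lifetime."""
    # Size is sup_{theta >= theta0} (1 - exp(-k/theta)) = 1 - exp(-k/theta0).
    return -theta0 * np.log(1.0 - alpha)

def power(k, theta):
    """K_C(theta) = Pr_theta(X < k) for X ~ Exponential(mean theta)."""
    return 1.0 - np.exp(-k / theta)

k = cutoff(0.05)                  # 51.29 hours
print(k)                          # 51.293...
print(power(k, theta=1000.0))     # 0.05, the size of the test
print(power(k, theta=500.0))      # 0.0975, the power at theta = 500
```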
Of course, size (i.e., Type I error) is only one side of the coin. We must also examine our
chance of a Type II error. In particular, we would like to ensure that the power function of our test
is large when θ ∈ Θ1 . Note that if we employ the test based on the critical region C = {X < 51.29}
for our lightbulb example (so that we have a test with size α = 0.05), then the power of this
test when θ = 500 (i.e., when the true mean lifetime is half of the advertised duration) is given
by
$$K_C(500) = \Pr{}_{500}(X < 51.29) = \int_0^{51.29} \frac{1}{500}\, e^{-x/500}\,dx = 1 - e^{-51.29/500} = 0.0975.$$
In other
words, this test has less than a 10% chance of detecting even this drastic departure from the null
hypothesis. Unfortunately, if our power is not as large as we like, then we cannot simply change a
rejection region of the form C = {X < k} to increase the power without simultaneously affecting
the size of our test. Indeed, it is usually the case that simple modifications to a testing procedure
to decrease the chance of a Type II error (or equivalently to increase the power of the test when H0
is false) will increase the size of the test. Our task, then, is to find tests (or equivalently rejection
regions) of a given size which have the best possible power when θ ∈ Θ1 . We note that there are
two potential ways of modifying our test so as to increase its power at the same time as maintaining
its size. The first is to change the sample space X (recall that a statistical test is equivalent to a
partitioning of the sample space). The only way to effectively achieve a change in X is to change
the sample size (and indeed, it should seem reasonable that the easiest way to increase the power of
detection of departures from the null hypothesis is to increase the information available on which
to base a decision). While this is sometimes a possibility in practice, usually we are in the position
of already having gathered our observations and so the size of the sample is a fixed quantity. The
other method of changing our test is, of course, to change our critical region C. We have noted that
critical regions are generally based on level sets of a statistic (though they certainly do not have to
be), and simply changing the level of the set [i.e., changing the value k in the region of the form
{X ∈ X : T (X) < k}] will generally only increase the power at the expense of increasing the size
of the test as well. As such, we must change our critical region (and thus the corresponding test)
more substantially and dramatically, generally by basing it on a different statistic, T (X). Finding
“good” tests based on level sets of statistics T (X) for a fixed sample size is the subject of the
following sections.
However, there is one case in which it is always possible to find a UMP (uniformly most powerful) test, that is, a test whose power at every θ ∈ Θ1 is at least as large as that of any other test of size no larger than α. This case is the subject of the next section.
4.2.1. Simple Hypotheses and the Neyman-Pearson Lemma: A statistical hypothesis which
consists of only a single parameter value is generally termed simple. For instance, if Θ0 for the null
hypothesis H0 : θ ∈ Θ0 consists of the single value θ0 (i.e., Θ0 = {θ0 }), then it is a simple hypothesis.
Alternatively, if a hypothesis is not simple (i.e., it contains more than a single possible value) it is
termed composite. In this section, we will examine the case of a statistical test for which both the
null and alternative hypotheses are simple. In other words, we shall suppose that X1 , . . . , Xn are
a sample from a population characterised by a probability model with density function fX (x; θ)
for θ ∈ Θ where Θ = {θ0 , θ1 } and we shall focus on testing the null hypothesis H0 : θ ∈ Θ0 with
Θ0 = {θ0 }. Note that the structure of Θ means that this is a test of the null hypothesis H0 : θ = θ0
versus the alternative hypothesis H1 : θ = θ1 .
We now demonstrate that the UMP test in the case of two simple hypotheses is based on the
so-called likelihood ratio:
$$\Lambda(X_1,\ldots,X_n) = \frac{L(\theta_0; X_1,\ldots,X_n)}{L(\theta_1; X_1,\ldots,X_n)} = \frac{f_X(X_1;\theta_0)\cdots f_X(X_n;\theta_0)}{f_X(X_1;\theta_1)\cdots f_X(X_n;\theta_1)}.$$
In particular, the test we shall define has a critical region of the form C = {Λ(X1 , . . . , Xn ) ≤ k}.
[NOTE: Since θ0 and θ1 are specified constants in the current testing framework, Λ(X1 , . . . , Xn )
is a statistic.] The idea here is to construct the critical region, C, by collecting together those
elements of the sample space, X , which give the strongest evidence against the null hypothesis. In
this respect, the ratio of the likelihood for any given sample at each of the two possible parameter
values is precisely a relative measure of how plausible the two hypotheses are. In other words, when
Λ(X1 , . . . , Xn ) is very small, this is strong evidence that the observations arose from the alternative
hypothesis rather than the null hypothesis. All that remains, then, is to determine the value of
k so as to ensure that the test is of the desired size α. This can always be accomplished with
an application of (perhaps rather tedious) calculus in the current setting, since we have assumed
that our hypotheses are simple (and thus completely determine the distribution of the data). Of
course, while it should seem intuitively reasonable that the likelihood ratio is a good method of
distinguishing between samples which support the null hypothesis versus samples which support
the alternative hypothesis, in order to be assured that the test based on this statistic is UMP
we need to demonstrate that the likelihood ratio provides the “best” information for making this
distinction. This fact is the subject of the so-called Neyman-Pearson Lemma:
Theorem 4.1: Suppose that X1 , . . . , Xn are a sample from a population characterised by
a probability model with density function fX (x; θ) for θ ∈ Θ where Θ = {θ0 , θ1 } and we
want to test the null hypothesis H0 : θ ∈ Θ0 with Θ0 = {θ0 }. Then the test with critical
region C = {Λ(X1 , . . . , Xn ) ≤ kα }, where Λ(X1 , . . . , Xn ) is the likelihood ratio statistic defined
previously and kα is defined such that P rθ0 (C) = α, is uniformly most powerful among all tests
of size no larger than α.
Proof: We start by considering any other test of size α′ ≤ α, determined by the critical region C′. We need to show that Prθ1(C) ≥ Prθ1(C′), since this demonstrates that the test based on the critical region C has power at least as large as that of any other test of size no larger than α for all θ ∈ Θ1 (and here we see why the fact that the alternative hypothesis is simple makes this situation much easier to deal with than the general case of a composite alternative). Now, we note the following simple probability identities:
$$\Pr{}_{\theta_1}(C) = \Pr{}_{\theta_1}(C\cap C') + \Pr{}_{\theta_1}(C\cap C'^c);$$
$$\Pr{}_{\theta_1}(C') = \Pr{}_{\theta_1}(C'\cap C) + \Pr{}_{\theta_1}(C'\cap C^c).$$
So, we can demonstrate the desired result by simply showing that Prθ1(C ∩ C′^c) − Prθ1(C′ ∩ C^c) ≥ 0. To do so, we first note that for any event E ⊆ C we have Λ(x1, . . . , xn) ≤ kα, and thus:
$$\Pr{}_{\theta_1}(E) = \int_E L(\theta_1; x_1,\ldots,x_n)\,dx_1\cdots dx_n = \int_E \frac{L(\theta_0; x_1,\ldots,x_n)}{\Lambda(x_1,\ldots,x_n)}\,dx_1\cdots dx_n \ge \frac{1}{k_\alpha}\int_E L(\theta_0; x_1,\ldots,x_n)\,dx_1\cdots dx_n = \frac{1}{k_\alpha}\Pr{}_{\theta_0}(E),$$
while for any event E ⊆ C^c we have Λ(x1, . . . , xn) > kα, so that the analogous argument gives Prθ1(E) ≤ (1/kα) Prθ0(E). Therefore:
$$\begin{aligned}
\Pr{}_{\theta_1}(C\cap C'^c) - \Pr{}_{\theta_1}(C'\cap C^c) &\ge \frac{1}{k_\alpha}\Pr{}_{\theta_0}(C\cap C'^c) - \frac{1}{k_\alpha}\Pr{}_{\theta_0}(C'\cap C^c)\\
&= \frac{1}{k_\alpha}\left\{\Pr{}_{\theta_0}(C\cap C'^c) - \Pr{}_{\theta_0}(C'\cap C^c)\right\}\\
&= \frac{1}{k_\alpha}\left\{\Pr{}_{\theta_0}(C\cap C'^c) + \Pr{}_{\theta_0}(C\cap C') - \Pr{}_{\theta_0}(C'\cap C) - \Pr{}_{\theta_0}(C'\cap C^c)\right\}\\
&= \frac{1}{k_\alpha}\left\{\Pr{}_{\theta_0}(C) - \Pr{}_{\theta_0}(C')\right\}\\
&= \frac{1}{k_\alpha}(\alpha - \alpha')\\
&\ge 0.
\end{aligned}$$
So, we now have a UMP test for the case of simple null and alternative hypotheses. Of course, for
any specific instance, we will need to calculate the appropriate value of kα .
Example 4.1: Suppose that X1 , . . . , Xn are a random sample from a normal distribution with
mean µ and unit variance. Further, suppose that we know µ ∈ {0, 1}. We wish to test H0 : µ = 0
versus H1 : µ = 1. Now, the likelihood function in this case is:
$$L(\mu; X_1,\ldots,X_n) = \frac{1}{(2\pi)^{n/2}}\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(X_i-\mu)^2\right\}.$$
Therefore, the uniformly most powerful test of size α in this case is determined by the rejection
region C = {Λ(X1 , . . . , Xn ) ≤ kα }, where
$$\Lambda(X_1,\ldots,X_n) = \frac{(2\pi)^{-n/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}X_i^2\right\}}{(2\pi)^{-n/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(X_i-1)^2\right\}} = \exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\left[X_i^2-(X_i-1)^2\right]\right\} = \exp\left\{\frac{n}{2}-\sum_{i=1}^{n}X_i\right\},$$
In other words, we can now see that the UMP test is equivalently determined by a rejection region of the form C = {X̄ ≥ cα}, where cα = 1/2 − (1/n) ln(kα) is now determined so that Pr0(C) = α.
This form of the critical region makes determination of the required constant much easier, since
the distribution of the statistic X̄ is well-known in this case. In particular, when µ = 0, X̄ is normally distributed with mean 0 and variance 1/n, so that the required value of cα can be
determined as:
$$\Pr{}_0(\bar X \ge c_\alpha) = \alpha \implies \Pr{}_0\left(\sqrt{n}\,\bar X \ge c_\alpha\sqrt{n}\right) = \alpha \implies 1 - \Phi\left(c_\alpha\sqrt{n}\right) = \alpha \implies c_\alpha = \Phi^{-1}(1-\alpha)\,\frac{1}{\sqrt{n}}.$$
Of course, we can now determine the value of kα if we so desire, but it is no longer necessary, as
we see that the UMP test is now simply determined by the decision rule which rejects H0 : µ = 0
in favor of H1 : µ = 1 whenever X̄ ≥ Φ⁻¹(1 − α)/√n. As a final aside, we note that this rejection rule can also be written in the form:
$$\frac{\bar X - 0}{\sqrt{1/n}} \ge \Phi^{-1}(1-\alpha),$$
which looks strikingly like the usual one-sided test for a single population mean when the popu-
lation variance is assumed known. Indeed, this is precisely the starting point for demonstrating
the previously stated facts regarding the UMP nature of the usual one-sided t-tests under the
assumption of normally distributed observations.
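To make the resulting decision rule concrete, here is a small Python sketch of the test from Example 4.1 (our own illustration; the function name ump_test is hypothetical, and scipy is assumed for the normal quantile Φ⁻¹).

```python
import numpy as np
from scipy.stats import norm

def ump_test(x, alpha=0.05):
    """UMP test of H0: mu = 0 vs H1: mu = 1 for N(mu, 1) data.

    Rejects H0 when the sample mean exceeds Phi^{-1}(1 - alpha) / sqrt(n).
    """
    x = np.asarray(x)
    n = len(x)
    c_alpha = norm.ppf(1.0 - alpha) / np.sqrt(n)
    return x.mean() >= c_alpha

rng = np.random.default_rng(0)
sample = rng.normal(loc=1.0, scale=1.0, size=25)  # data actually drawn from H1
print(ump_test(sample))  # True with high probability: the power here is
                         # Phi(sqrt(n) - Phi^{-1}(1 - alpha))
```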
We close this section with a few remarks. First, we note that the simple nature of the null and alter-
native hypotheses assumed here by no means requires θ0 and θ1 to be scalar values, just that they be
a single (possibly vector-valued) point in the parameter space Θ. Second, we note that there was no
real requirement that our observed sample contain independent observations or even that the struc-
ture of our testing framework be parametric in the true sense of the word. All that we truly required
was that the two competing hypotheses each completely determined a distinct joint likelihood for
the observed data. In other words, if our hypotheses took the form H0 : fX1 ,...,Xn (x1 , . . . , xn ) =
g0 (x1 , . . . , xn ) and H1 : fX1 ,...,Xn (x1 , . . . , xn ) = g1 (x1 , . . . , xn ) where fX1 ,...,Xn (x1 , . . . , xn ) repre-
sents the joint density function of the observations X1 , . . . , Xn and g0 (x1 , . . . , xn ) and g1 (x1 , . . . , xn )
are two given functions, then the UMP test of size α associated with these competing hypotheses is
determined by a rejection region of the form
$$C = \left\{\frac{g_0(x_1,\ldots,x_n)}{g_1(x_1,\ldots,x_n)} \le k_\alpha\right\},$$
where kα is determined so that
$$\int_C g_0(x_1,\ldots,x_n)\,dx_1\cdots dx_n = \alpha$$
(of course, this last requirement may mean a rather tedious and complicated calculus problem is required before we can actually implement this test). Finally, we
complicated calculus problem is required before we can actually implement this test). Finally, we
note that it may not always be possible to find a value kα which satisfies the strictures of Theorem
4.1. In other words, there may be no value kα such that P rθ0 (C) = α exactly. In particular, this
can occur when the observed data have a discrete distribution.
Example 4.2: Suppose that X1 , . . . , X10 are iid random variables having a Bernoulli distribu-
tion with parameter θ. Further, suppose that we wish to test H0 : θ = 0.5 versus H1 : θ = 0.2.
The likelihood function in this case is just:
$$L(\theta; X_1,\ldots,X_{10}) = \theta^{10\bar X}(1-\theta)^{10(1-\bar X)},$$
so that the critical region of the UMP test takes the form:
$$C = \{\Lambda(X_1,\ldots,X_{10}) \le k_\alpha\} = \left\{\frac{(0.5)^{10\bar X}(0.5)^{10(1-\bar X)}}{(0.2)^{10\bar X}(0.8)^{10(1-\bar X)}} \le k_\alpha\right\} = \left\{\left(\tfrac{5}{8}\right)^{10} 4^{10\bar X} \le k_\alpha\right\} = \{10\bar X \le c_\alpha\},$$
where cα = log₄[(8/5)^{10} kα]. Suppose that we wish to find a UMP test of size α = 0.01. Since 10X̄ has a binomial distribution with parameters n = 10 and p = 0.5 under the null hypothesis, we see that we must find cα such that:
$$\sum_{i=0}^{c_\alpha}\binom{10}{i}(0.5)^{10} = 0.01.$$
However, Pr0.5(10X̄ = 0) = (0.5)^{10} = 0.00098, while Pr0.5(10X̄ ≤ 1) = 11(0.5)^{10} = 0.01074. Therefore, Pr0.5(C) < 0.01 for any choice cα < 1 and Pr0.5(C) > 0.01 for any choice cα ≥ 1. In other words, there is no possible value of cα which makes the probability of the rejection region exactly equal to 0.01.
In such cases, however, while there may be no UMP test of a specific size α (if kα does not exist
for this size), there will always be a UMP test for some collection of sizes α1 , α2 , . . . , and we can
then pick the UMP test with the size closest to our desired size α. Indeed, in Example 4.2, we can
find a UMP test of size α = 0.0107421875 which is rather close to 0.01.
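The granularity of the attainable sizes in Example 4.2 can be seen by enumerating the binomial tail probabilities directly; a minimal Python sketch, assuming scipy is available:

```python
from scipy.stats import binom

# Attainable sizes for the test C = {10*Xbar <= c} when 10*Xbar ~ Bin(10, 0.5)
for c in range(11):
    print(c, binom.cdf(c, 10, 0.5))
# c = 0 gives size 0.00098 and c = 1 gives 0.01074: no c yields exactly 0.01,
# and 0.0107421875 is the attainable size closest to the desired 0.01.
```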
4.2.2. Generalised Likelihood Ratio Tests: In the previous section, we were able to find a UMP
test in the case of simple null and alternative hypotheses. Of course, we have already noted that
such an endeavour is generally not possible in the case of composite hypotheses. Nonetheless, the
result of the Neyman-Pearson lemma does lead quite naturally to the construction of a test in the
case of composite hypotheses. In particular, suppose that X1 , . . . , Xn are a random sample from a
population characterised by a probability model with density function fX (x; θ) for θ ∈ Θ and we
are interested in testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1 where Θ0 ∪ Θ1 = Θ is any partition of the
parameter space. Following the general notion of the Neyman-Pearson lemma, we can define the
generalised likelihood ratio:
$$\Lambda_g(X_1,\ldots,X_n) = \frac{\sup_{\theta\in\Theta_0} L(\theta; X_1,\ldots,X_n)}{\sup_{\theta\in\Theta} L(\theta; X_1,\ldots,X_n)} = \frac{\sup_{\theta\in\Theta_0} f_X(X_1;\theta)\cdots f_X(X_n;\theta)}{\sup_{\theta\in\Theta} f_X(X_1;\theta)\cdots f_X(X_n;\theta)}$$
and the generalised likelihood ratio test which has critical region C = {Λg (X1 , . . . , Xn ) ≤ kα }
where, as usual, kα is defined so that the size of the test is α; that is, supθ∈Θ0 P rθ (C) = α. Note
that the generalised likelihood ratio differs from the likelihood ratio statistic defined in Theorem
4.1 not only in the use of supremums (which are now necessary due to the potentially composite
nature of the hypotheses) but also in that the denominator is maximised over the entirety of the
parameter space Θ (rather than over the alternative hypothesis). This difference is employed for
purely mathematical reasons, and a little thought shows that the set C defined by a level set of the
generalised likelihood ratio is typically equivalent to a level set of the statistic:
$$\Lambda'_g(X_1,\ldots,X_n) = \frac{\sup_{\theta\in\Theta_0} L(\theta; X_1,\ldots,X_n)}{\sup_{\theta\in\Theta_1} L(\theta; X_1,\ldots,X_n)}.$$
In other words,
$$\{\Lambda_g(X_1,\ldots,X_n) \le k_\alpha\} = \{\Lambda'_g(X_1,\ldots,X_n) \le k'_\alpha\}$$
for some value k′α, provided the level set of Λg(X1, . . . , Xn) in question has kα < 1. However, from the perspective of constructing a critical region, these are the only level sets of interest, since samples for which Λg(X1, . . . , Xn) = 1 indicate that the null hypothesis is at least as likely as the alternative (since the supremum of the likelihood over the entire parameter space is no smaller than the supremum over the alternative hypothesis subset) and as such would never reasonably be included in a rejection region.
It would be nice if this test based on the generalised likelihood ratio were always the UMP test; however, this is not the case. There are indeed cases where this test can be shown to be the UMP
test (indeed, the usual t-tests for population means and linear regression coefficients in the case of
normally distributed observations turn out to have the form of generalised likelihood ratio tests).
However, a full demonstration of when these tests are UMP is beyond the scope of these notes.
Moreover, even were we able to conclude that the likelihood ratio test was UMP we would still be
in the unenviable position of having to determine the appropriate value kα in the definition of the
critical region C. Fortunately, it turns out that even when the generalised likelihood ratio test is
not UMP, it typically has excellent properties (in particular, it can be shown to have nearly the
largest possible power as the sample size increases towards infinity). As such, we tend to use the
generalised likelihood ratio test in most complex testing situations where no other specific UMP
test is available.
We close this section by noting one other strength of the generalised likelihood ratio test.
Recall that we must determine the value of kα in the definition of the rejection region C =
{Λg (X1 , . . . , Xn ) ≤ kα }. To do so requires the distribution of the statistic Λg (X1 , . . . , Xn ) which
can be quite complicated in general. However, in some specific situations, the distribution of
Λg (X1 , . . . , Xn ) can be accurately approximated. In particular, suppose that θ = (θ1 , . . . , θp ), so
that the probability model parameter is a p-vector. Further suppose that Θ is an open subset
of p-dimensional Euclidean space (for example, the entire p-dimensional Euclidean space itself or
perhaps the positive quadrant, so that Θ = {θ ∈ IRp : θ1 > 0, . . . , θp > 0}) and the null hy-
pothesis we are interested in testing has the form H0 : θ1 = θ1,0 , . . . , θq = θq,0 for some q ≤ p.
In this case, it can be shown that the distribution of −2 ln{Λg (X1 , . . . , Xn )} is approximately
chi-squared with q degrees of freedom. As such, we can construct a test based on the gener-
alised likelihood ratio with an approximate size α which is determined by the rejection region
C = {−2 ln[Λg(X1, . . . , Xn)] ≥ χ²q(1 − α)} = {Λg(X1, . . . , Xn) ≤ exp[−χ²q(1 − α)/2]}, where χ²q(1 − α) denotes the 1 − α quantile of the chi-squared distribution with q degrees of freedom.
Example 4.3: Suppose that we observe the array of independent random variables Xij , i =
1, . . . , I, j = 1, . . . , J where Xij is normally distributed with mean µi and variance σi2 (i.e.,
a standard balanced one-way analysis of variance dataset). An important assumption for the
validity of standard ANOVA procedures is that of homoscedasticity. Suppose we wish to test this
assumption; that is, we wish to test the hypothesis H0 : σ1² = · · · = σI². Note that this hypothesis is not quite in the form required for our chi-squared approximation to the generalised likelihood ratio test. However, a simple reparameterisation from σ1², . . . , σI² to σ1², τ2² = σ2² − σ1², . . . , τI² = σI² − σ1² shows that the null hypothesis can be written in the form H0 : τ2² = 0, . . . , τI² = 0. Now, the likelihood for this situation can readily be calculated as:
$$L(\mu_1,\ldots,\mu_I,\sigma_1^2,\tau_2^2,\ldots,\tau_I^2) = \prod_{i=1}^{I}\prod_{j=1}^{J}\left[2\pi(\sigma_1^2+\tau_i^2)\right]^{-1/2}\exp\left\{-\frac{(X_{ij}-\mu_i)^2}{2(\sigma_1^2+\tau_i^2)}\right\},$$
where we have defined τ1² = 0. Some straightforward (though tedious) calculus shows that this likelihood is maximised at:
$$\hat\mu_i = \frac{1}{J}\sum_{j=1}^{J}X_{ij} = \bar X_i; \qquad \hat\sigma_1^2 = \frac{1}{J}\sum_{j=1}^{J}(X_{1j}-\bar X_1)^2; \qquad \hat\tau_i^2 = \frac{1}{J}\sum_{j=1}^{J}(X_{ij}-\bar X_i)^2 - \hat\sigma_1^2,$$
while under the null hypothesis (where all I groups share a common variance) the likelihood is maximised at µ̂i = X̄i and σ̂0² = (IJ)⁻¹ Σᵢ Σⱼ (Xij − X̄i)². Writing σ̂i² = σ̂1² + τ̂i² for the unrestricted variance estimates, the ratio of the two maximised likelihoods yields −2 ln[Λg] = IJ ln σ̂0² − J Σᵢ ln σ̂i². Finally, then, since the null hypothesis specifies q = I − 1 of the parameters, we see that the generalised likelihood ratio test with approximate size α is determined by the rejection region:
$$C = \left\{IJ\ln\hat\sigma_0^2 - J\sum_{i=1}^{I}\ln\hat\sigma_i^2 \ \ge\ \chi^2_{I-1}(1-\alpha)\right\}.$$
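A minimal Python sketch of the resulting procedure (our own illustration, assuming the uncorrected statistic −2 ln Λg = IJ ln σ̂0² − J Σᵢ ln σ̂ᵢ² derived above, with MLE-style variance estimates that divide by J rather than J − 1):

```python
import numpy as np
from scipy.stats import chi2

def glr_equal_variances(x, alpha=0.05):
    """Generalised likelihood ratio test of H0: sigma_1^2 = ... = sigma_I^2
    for a balanced one-way layout x of shape (I, J)."""
    I, J = x.shape
    group_vars = x.var(axis=1)      # sigma_hat_i^2, with divisor J (MLE form)
    pooled_var = group_vars.mean()  # sigma_hat_0^2 = (1/(IJ)) sum (X_ij - Xbar_i)^2
    stat = I * J * np.log(pooled_var) - J * np.sum(np.log(group_vars))
    return stat, stat >= chi2.ppf(1.0 - alpha, df=I - 1)

rng = np.random.default_rng(1)
data = rng.normal(size=(4, 30)) * np.array([1.0, 1.0, 1.0, 2.0])[:, None]
print(glr_equal_variances(data))  # the heteroscedastic fourth group should trigger rejection
```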
We close this section by noting that a proof of the chi-squared approximation to the distribution of −2 ln{Λg(X1, . . . , Xn)} is beyond the scope of these notes, but it does follow along the lines of
the argument used in the construction of the asymptotic chi-squared likelihood-based confidence
intervals developed in Section 3.2.2.
4.3. Non-Parametric Tests
4.3.1. Tests for Univariate Samples: Suppose that X1, . . . , Xn represent a random sample from a population characterised by a distribution with CDF F(x), about which we make no parametric assumptions, and that we wish to test the null hypothesis H0 : F(z) = p0 for some specified value z and proportion p0. If we let Y denote the number of positive values in the collection X1 − z, . . . , Xn − z, then Y has a binomial distribution with parameters n and 1 − p, and we can construct a test against the two-sided alternative based on the critical region C = {Y ≤ c1 or Y ≥ c2},
where p is the (unknown) true value of F (z). Of course, we must choose c1 and c2 in order to
achieve a desired size for this test. As such, we need to choose the values of c1 and c2 so that
KC (p0 ) = α. [NOTE: Since Y is clearly a discrete random variable, we will not be able to achieve
all possible sizes; see Example 4.2 and the remarks at the end of Section 4.2.1.] We note that this
test is valid regardless of the underlying distribution F (x). Typically, the value of interest for p0
will be one-half, so that we are testing whether z is the median of the distribution F (x). In such
cases, the test described here is referred to as the sign test, since it can be seen to be based on
the number of positive values among the collection X1 − z, . . . , Xn − z. [NOTE: The version of the
sign test presented here is two-sided, however, it can be easily modified to achieve a test against
either of the one-sided alternatives H1 : F (z) > p0 or H1 : F (z) < p0 . All that is required is a
modification of the critical region to the form C = {Y ≥ c} or C = {Y ≤ c}, respectively.]
Example 4.4: Let X1 , . . . , X10 be a random sample from a population characterised by a
distribution with CDF F (x). Suppose we wish to test whether the median of this distribution is
equal to 72; that is, we wish to test H0 : F (72) = 0.5 against the two-sided alternative. Further,
suppose that we would like a test of size α = 0.07. Some simple calculation shows:
$$\sum_{i=0}^{0}\binom{10}{i}(0.5)^i(1-0.5)^{10-i} = 0.00097656; \qquad \sum_{i=0}^{1}\binom{10}{i}(0.5)^i(1-0.5)^{10-i} = 0.01074219;$$
$$\sum_{i=0}^{2}\binom{10}{i}(0.5)^i(1-0.5)^{10-i} = 0.05468750; \qquad \sum_{i=8}^{10}\binom{10}{i}(0.5)^i(1-0.5)^{10-i} = 0.05468750;$$
$$\sum_{i=9}^{10}\binom{10}{i}(0.5)^i(1-0.5)^{10-i} = 0.01074219; \qquad \sum_{i=10}^{10}\binom{10}{i}(0.5)^i(1-0.5)^{10-i} = 0.00097656.$$
So, we see that it is not possible to choose a rejection region such that the size of the test is
precisely 0.07. However, we can choose either C = {Y ≤ 2 or Y ≥ 9} or C = {Y ≤ 1 or Y ≥ 8}
and arrive at a test which has size 0.0547 + 0.0107 = 0.0654 which is reasonably close to the
desired level.
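The binomial tail sums of Example 4.4 are a one-liner with scipy's binomial CDF; the following sketch (our own) reproduces the attainable sizes:

```python
from scipy.stats import binom

n = 10
# Tail probabilities of Y ~ Bin(10, 0.5) under H0: F(72) = 0.5
for c in range(3):
    print(c, binom.cdf(c, n, 0.5))          # Pr(Y <= c): 0.00098, 0.01074, 0.05469
for c in (8, 9, 10):
    print(c, 1 - binom.cdf(c - 1, n, 0.5))  # Pr(Y >= c): 0.05469, 0.01074, 0.00098

# Size of the rejection region C = {Y <= 2 or Y >= 9}:
print(binom.cdf(2, n, 0.5) + (1 - binom.cdf(8, n, 0.5)))  # 0.0654
```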
The sign test is remarkably flexible, making essentially no assumptions regarding the underlying
distribution F (x). However, it can be shown that its power (i.e., the probability of it detecting
that the null hypothesis is actually false) is quite low (and indeed, calculating the power when
F (z) = p1 for some value p1 ≠ p0 is a straightforward calculation again involving the binomial
distribution). This lack of power should not be very surprising, as the sign test is only based
on whether the given observations are larger than the proposed median z, and ignores how far
above or below z the observations were. As such, the sign test tends to ignore useful information
contained in the sample. It does this in order to avoid making various assumptions regarding the
underlying distribution F (x), and this is a common theme for non-parametric tests; namely, they
give up power in order to avoid making parametric assumptions. This is not an advisable thing to
do if we truly believe in a given set of parametric assumptions. However, if we do not believe in
any parametric framework, then using the non-parametric approach seems a more prudent way to
proceed. Nonetheless, the loss of information inherent in the sign test seems rather dramatic, and
it can often be improved upon without requiring parametric assumptions to be made.
We saw that the sign test for the median was based on the statistic Y , the number of values in
the collection X1 − z, . . . , Xn − z which were positive. This approach essentially ignores the size of
the deviation between the observation and the proposed null hypothesis median value z. It turns out
that it is possible to retain some of the information contained in the size of these differences without
reverting to a parametric approach. In particular, suppose we define the quantities Zi = |Xi − z|
and let Ri be the rank of Zi in an ordered list of the values Z1 , . . . , Zn . For example, if n = 3 and
Z2 < Z3 < Z1 , then we would have R1 = 3 (since Z1 is the largest of the Zi ’s) while R2 = 1 and
R3 = 2. Finally, we define
$$s_i = \begin{cases} -1 & \text{if } X_i < z \\ \phantom{-}0 & \text{if } X_i = z \\ \phantom{-}1 & \text{if } X_i > z. \end{cases}$$
A test of the null hypothesis H0 : F(z) = 0.5 can then be constructed based on the level sets of the so-called Wilcoxon signed-rank statistic, $W = \sum_{i=1}^{n} s_i R_i$. The idea is that if z truly is
the median of the population, then the ranks Ri will be evenly dispersed among the positive and
negative Xi − z values, and thus the statistic W will tend to be near zero. On the other hand,
if z is not the true median, then the large deviations from z will tend to congregate on one side
of z or the other, meaning that more of the large ranks will go with either the positive Xi − z
values (if the true median is larger than z) or the negative Xi − z values (if the true median is
smaller than z). In either case, the value of the statistic W will tend to be far from zero (in either
direction). Therefore, we can construct a test against the two-sided alternative with rejection region
C = {W ≤ c1 or W ≥ c2 }. Again, of course, we must determine the values c1 and c2 so as to ensure
that the size of our test is the desired value, α (and again, there are the obvious one-sided versions
of this test). Unfortunately, unlike the sign test, the distribution of the statistic W is no longer as
simple as the binomial distribution of Y . Nonetheless, the distribution of W under H0 can indeed
be computed directly (and tables of its distribution for small sample sizes exist). Moreover, it can
further be shown that the distribution of W under H0 is approximately normal with mean zero and variance $\sum_{i=1}^{n} i^2 = \frac{1}{6}n(n+1)(2n+1)$ when the sample size n is large. The demonstration of this fact is beyond the scope of these notes; however, we do note that $W = \sum_{i=1}^{n} s_i R_i$ has the form of a sum, and thus it is not overly surprising that its distribution can be approximated by a normal distribution.
Example 4.5: Suppose that we observe the following 20 data values:
94.1, 93.3, 91.2, 93.0, 104.8, 100.6, 110.4, 94.1, 95.2, 102.1,
92.9, 102.7, 111.5, 88.4, 88.7, 105.0, 94.0, 99.1, 109.5, 97.3,
and we wish to construct an α = 0.01 level test of H0 : F (95) = 0.5 versus the two-sided
alternative. So, the desired critical region has the form C = {W ≤ c1 or W ≥ c2 }, and we need
to choose c1 and c2 so that PrH0(C) = 0.01 (at least approximately). Using the fact that, under H0, W is approximately normally distributed with mean zero and variance (1/6)(20)(21)(41) = 2870, we see that
$$\Pr{}_{H_0}(C) \approx \Phi\left(\frac{c_1}{\sqrt{2870}}\right) + 1 - \Phi\left(\frac{c_2}{\sqrt{2870}}\right),$$
and thus choosing c2 = −c1 = 2.575√2870 = 137.95 yields a test with size 2[1 − Φ(2.575)] ≈ 0.01.
[NOTE: These are certainly not the only possible choices for c1 and c2 , but the symmetry of
the resulting rejection region seems a sensible feature.] Now, to actually implement the test on
the given data, we note that the Xi − 95 values are:
−0.9, −1.7, −3.8, −2.0, 9.8, 5.6, 15.4, −0.9, 0.2, 7.1,
−2.1, 7.7, 16.5, −6.6, −6.3, 10.0, −1.0, 4.1, 14.5, 2.3.
The ranks of the absolute values of this collection are:
2.5, 5, 9, 6, 16, 11, 19, 2.5, 1, 14,
7, 15, 20, 13, 12, 17, 4, 10, 18, 8.
[NOTE: In the case of tied values, we simply assign the average rank; for example, the two
absolute values of 0.9 are the second and third smallest, so each is assigned a rank of 2.5. It
should be noted, however, that if there are a large number of tied observations, the normal
approximation to the distribution of W can become poor, and the procedure described here
would need to be modified.] So, we can now calculate the Wilcoxon signed-rank statistic as:
W = −2.5 − 5 − 9 − 6 + 16 + 11 + 19 − 2.5 + 1 + 14
− 7 + 15 + 20 − 13 − 12 + 17 − 4 + 10 + 18 + 8
= 88.
Since 88 ∉ C, we do not reject the null hypothesis. Of course, if we were to change the size
of our test to α = 0.1, then the rejection region would need to change accordingly. A simple
calculation (left as an exercise) shows that (assuming we wish to maintain the symmetric aspect
of our rejection region), the new critical region is given by C = {W ≤ −88.13 or W ≥ 88.13}.
Again, we see that 88 ∉ C, but this time it is a very near thing. Indeed, we recall from our introductory units in statistics that the p-value of a testing procedure is the smallest size α for which
the observed data falls in the rejection region. As such, we see that the p-value associated with
this Wilcoxon signed-rank test for the observed data is very near to 0.1.
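The signed-rank computation in Example 4.5 can be automated as follows; this is our own Python sketch, using scipy's rankdata, which assigns averaged ranks to ties exactly as described in the NOTE above.

```python
import numpy as np
from scipy.stats import rankdata, norm

def signed_rank(x, z):
    """Wilcoxon signed-rank statistic W = sum_i s_i R_i for H0: F(z) = 0.5."""
    d = np.asarray(x) - z
    ranks = rankdata(np.abs(d))   # tied absolute values receive averaged ranks
    return np.sum(np.sign(d) * ranks)

x = [94.1, 93.3, 91.2, 93.0, 104.8, 100.6, 110.4, 94.1, 95.2, 102.1,
     92.9, 102.7, 111.5, 88.4, 88.7, 105.0, 94.0, 99.1, 109.5, 97.3]
W = signed_rank(x, z=95.0)
n = len(x)
sd = np.sqrt(n * (n + 1) * (2 * n + 1) / 6.0)   # sqrt(2870) under H0
print(W)                                        # 88.0
print(norm.ppf(0.995) * sd)                     # 137.95: the alpha = 0.01 cut-off
print(2 * (1 - norm.cdf(abs(W) / sd)))          # approximate two-sided p-value, near 0.1
```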
We close this section by noting that the Wilcoxon signed-rank test generally has much better power
than the sign test and still does not require parametric assumptions. Of course, the power of the
Wilcoxon signed-rank test is still generally less than that of parametric procedures, provided we
believe that the required parametric assumptions are indeed true.
4.3.2. Tests for Bivariate Samples: In this section, we shall assume that we have observed two random samples X1, . . . , Xn and Y1, . . . , Ym from two independent, univariate populations characterised by distributions having CDFs F(x) and G(y), respectively. As in the previous section, we shall not make any further assumptions regarding the forms of F(x) or G(y). We shall then be interested in testing whether the two populations are characterised by the same distribution. In other words, we wish to test the null hypothesis H0 : F(z) = G(z) for all z against the two-sided alternative H1 : F(z) ≠ G(z) for some z. We shall discuss several different tests for this situation.
The first test we shall discuss is essentially just a test for the equality of medians (and as such is usually referred to as the median test). The idea of the test is that if the two populations have the same distribution (or indeed, just the same median), then when the two samples are combined and the median of this combined collection is calculated, we should expect half of each sample to fall below the combined median. Specifically, then, we define Z = median{X1, . . . , Xn, Y1, . . . , Ym} and
$$V = \sum_{i=1}^{n} I_{\{X_i < Z\}},$$
so that V is just the number of Xi's which fall below the combined median Z. Clearly, if H0 is true then we would expect V to be close to n/2, and thus we shall construct our test with a rejection region of the form C = {|V − n/2| ≥ k}. All that remains is to determine the value of k to achieve a desired size α for our test. If we assume that the CDFs F(x) and G(y) are continuous, so that
the chance of any of the Xi ’s or Yj ’s being equal is zero (i.e., there is no chance of any ties in the
combined collection of observations), then it can easily be seen that in order for V = v, we must
choose v out of the n Xi's and 0.5(m + n) − v of the m Yj's to be less than Z. This is precisely the structure of the so-called hypergeometric distribution. In other words, we have:
$$\Pr{}_{H_0}(V = v) = \frac{\binom{n}{v}\binom{m}{0.5(m+n)-v}}{\binom{m+n}{0.5(m+n)}},$$
where we must be careful to interpret $\binom{m}{0.5(m+n)-v}$ to be zero when 0.5(m + n) − v < 0.
[NOTE: In the case that 0.5(m+n) is not an integer then, by convention, we simply use 0.5(m+n−1)
instead, the idea being that we have thus ignored the observed value equal to Z, the combined
median, which will always exist in a combined sample of odd size.] Now, if m and n are small enough,
an exact calculation of the hypergeometric probabilities can be performed and an appropriate value
for k can then be chosen to yield a test of the desired size. When m and n are large, however, such
calculations are extremely time consuming. As such, we can approximate the distribution of V with
a normal distribution when m and n are large (in this particular case, the normal approximation
is quite accurate as soon as m, n > 10). It is a reasonably straightforward exercise to show that:
$$E_{H_0}(V) = \frac{n}{2}; \qquad \mathrm{Var}_{H_0}(V) = \frac{mn}{4(m+n-1)} = \sigma_V^2.$$
A simple calculation then shows that we should choose k = Φ⁻¹(1 − α/2)σV. As a specific example, suppose that we have m = n = 20. In this case, σV² = 400/(4 · 39) = 2.5641. Thus, a test with size α = 0.05 would reject H0 whenever V differed from n/2 = 10 by more than k = 1.96√2.5641 = 3.14; that is, we will reject H0 if more than 13 or fewer than 7 Xi's fall below the combined median, Z.
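Both the exact hypergeometric distribution of V and its normal approximation are readily computed; a Python sketch for the m = n = 20 illustration above (our own, using scipy's hypergeom with its M, n, N parameterisation):

```python
import numpy as np
from scipy.stats import hypergeom, norm

n, m = 20, 20
half = (n + m) // 2
# Under H0, V ~ Hypergeometric: v of the n X's land among the half smallest values
V = hypergeom(M=n + m, n=n, N=half)
print(V.mean(), V.var())           # 10.0 and 2.5641, matching n/2 and mn/(4(m+n-1))
k = norm.ppf(0.975) * np.sqrt(V.var())
print(k)                           # 3.14: reject when |V - 10| > 3.14
# Exact size of the region {V <= 6 or V >= 14}, for comparison:
print(V.cdf(6) + V.sf(13))
```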
Of course, just as the sign test ignored the actual values of the observed data, the median test
described above does not take into account the size of the Xi ’s and Yj ’s but only the number of these
values which fall below the observed median of the combined sample. In the case of the univariate
framework, we saw that the sign test could be improved upon by incorporating the ranks of the
data, and this led to the Wilcoxon signed-rank test. In the current setting, a very similar approach
can be taken to develop an improved test for the null hypothesis H0 : F (z) = G(z) for all z. We
again start by considering the combined sample, and define Ri (i = 1, . . . , n) to be the rank of Xi
in the combined sample. For example, if we have observed the samples X1 = 1, X2 = 6, X3 = 2
and Y1 = 0, Y2 = 4, then the ordered combined collection is Y1 , X1 , X3 , Y2 , X2 and thus R1 = 2,
R2 = 5 and R3 = 3. The Mann-Whitney test can then be determined by defining a rejection region based on the statistic $T = \sum_{i=1}^{n} R_i$ [NOTE: this test is also sometimes referred to as the Wilcoxon rank-sum test]. It is a reasonably straightforward (though tedious) exercise to show that, under the null hypothesis, the mean and variance of T are:
$$E_{H_0}(T) = \frac{n(n+m+1)}{2}; \qquad \mathrm{Var}_{H_0}(T) = \frac{nm(n+m+1)}{12}.$$
Now, if the observed value of T is far from its expectation under the null hypothesis, then this
is evidence that we should reject the null hypothesis. Indeed, the rejection region for the Mann-
Whitney test is of the form C = {|T −EH0 (T )| ≥ k}. All that remains is to appropriately determine
the value k to ensure the desired size of the test. For the simple data set with n = 3 and m = 2
given earlier, we note that the observed value of T is 2 + 5 + 3 = 10. To determine the distribution
of T in this case, we note that for n = 3 and m = 2, there are 10 possible general arrangements for
the combined values in terms of the sample to which the values belong; that is, the ordered sample
could have been associated with the arrangements:
xxxyy, xxyxy, xxyyx, xyxxy, xyxyx, xyyxx, yxxxy, yxxyx, yxyxx, yyxxx
(e.g., the given data are in the arrangement yxxyx). For each of these 10 arrangements, the
associated values of T are 6, 7, 8, 8, 9, 10, 9, 10, 11, and 12. Under the null hypothesis, each of
these 10 arrangements is equally likely, and thus we can calculate:
$$\Pr{}_{H_0}(T \le 6) = \frac{1}{10}, \quad \Pr{}_{H_0}(T \le 7) = \frac{1}{5}, \quad \Pr{}_{H_0}(T \ge 11) = \frac{1}{5}, \quad \Pr{}_{H_0}(T \ge 12) = \frac{1}{10}.$$
As such, if we want a test with size α = 0.2, we could use the rejection region C = {T = 6 or T =
12}. [NOTE: Again, we see that it is not always possible to construct tests for all possible sizes.]
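For small samples, the exact null distribution of T can be obtained by brute-force enumeration of the arrangements, exactly as above; a short Python sketch (our own):

```python
from itertools import combinations

n, m = 3, 2
positions = range(1, n + m + 1)
# T is the sum of the ranks occupied by the X-sample; under H0 every
# allocation of the n X-labels to the n + m positions is equally likely.
t_values = sorted(sum(c) for c in combinations(positions, n))
print(t_values)                                        # [6, 7, 8, 8, 9, 9, 10, 10, 11, 12]
print(sum(t <= 6 for t in t_values) / len(t_values))   # Pr(T <= 6) = 0.1
print(sum(t >= 12 for t in t_values) / len(t_values))  # Pr(T >= 12) = 0.1
```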
Unfortunately, the exact distribution of the statistic T is quite complicated when m and n are
reasonably large. However, as for the Wilcoxon signed-rank statistic, it turns out that, under the
null hypothesis, the distribution of T is well-approximated by a normal distribution with mean
EH0 (T ) and variance V arH0 (T ). As such, we can define a rejection region for the Mann-Whitney
test with approximate size α by setting k = Φ⁻¹(1 − α/2)√VarH0(T).
Example 4.6: Suppose that we observe two samples of size n = 10 and m = 9 as follows:
X : 4.3, 5.9, 4.9, 3.1, 5.3, 6.4, 6.2, 3.8, 7.1, 5.8,
Y : 5.5, 7.9, 6.8, 9.0, 5.6, 6.3, 8.5, 4.6, 7.5.
The sorted combined sample (along with whether each observation was an Xi or a Yj) is:
3.1(x), 3.8(x), 4.3(x), 4.6(y), 4.9(x), 5.3(x), 5.5(y), 5.6(y), 5.8(x), 5.9(x),
6.2(x), 6.3(y), 6.4(x), 6.8(y), 7.1(x), 7.5(y), 7.9(y), 8.5(y), 9.0(y).
Therefore, the observed value of the rank-sum statistic is T = 1 + 2 + 3 + 5 + 6 + 9 + 10 + 11 + 13 + 15 = 75. Furthermore, we see that under the null hypothesis, the mean and variance of T are
given by
$$E_{H_0}(T) = \frac{10(10+9+1)}{2} = 100; \qquad \mathrm{Var}_{H_0}(T) = \frac{10(9)(10+9+1)}{12} = 150.$$
Therefore, a size α = 0.05 test is determined by the critical region C = {|T − 100| ≥ k}, where k = 1.96√150 = 24.005. So, since |T − 100| = |75 − 100| = 25, we reject the null hypothesis (of course, if we had desired a test of size α = 0.01 we would not have rejected H0, since in this case the appropriate value of k would have been 2.575√150 = 31.537). Finally, by way of comparison, we note that the observed number of Xi's less than the combined sample median of 5.9 is V = 6. Using the normal approximation to the distribution of V, we see that a median test with size α = 0.05 is determined by the critical region C = {|V − 5| ≥ 1.96√1.25} = {|V − 5| ≥ 2.19}, since
$$E_{H_0}(V) = \frac{10}{2} = 5; \qquad \mathrm{Var}_{H_0}(V) = \frac{9(10)}{4(9+10-1)} = 1.25.$$
Thus, since |V − 5| = 1 in this case, we do not reject the null hypothesis. This is a nice example
of how the median test is less powerful than the Mann-Whitney test (not an overly surprising
result given that the median test ignores more information contained in the observed data than
does the Mann-Whitney test).
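A Python sketch of the Mann-Whitney calculation in Example 4.6 (our own illustration; rankdata again performs the combined-sample ranking):

```python
import numpy as np
from scipy.stats import rankdata, norm

x = [4.3, 5.9, 4.9, 3.1, 5.3, 6.4, 6.2, 3.8, 7.1, 5.8]
y = [5.5, 7.9, 6.8, 9.0, 5.6, 6.3, 8.5, 4.6, 7.5]
n, m = len(x), len(y)

ranks = rankdata(np.concatenate([x, y]))
T = ranks[:n].sum()                               # ranks of the X's in the combined sample
mean_T = n * (n + m + 1) / 2.0                    # 100
sd_T = np.sqrt(n * m * (n + m + 1) / 12.0)        # sqrt(150)
print(T)                                          # 75.0
print(abs(T - mean_T) >= norm.ppf(0.975) * sd_T)  # True: reject at size 0.05
```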
As noted at the end of Example 4.6, the Mann-Whitney test is generally more powerful than the
median test since it takes into account, to some degree, the relative sizes of the observed data values.
Of course, it only takes account of these sizes through the use of ranks, and thus still ignores some
potentially relevant information. As such, we close this section by introducing another testing
procedure for the null hypothesis H0 : F (z) = G(z) for all z which does take into account the
actual observed values of the data directly.
Suppose that we have observed two independent samples, X = (X1, . . . , Xn) and Y = (Y1, . . . , Ym), and that we have settled on some statistic T = T(X, Y) which can be used to investigate the potential differences between the two samples (e.g., the most common choice would be X̄ − Ȳ, though many other choices are possible). As in the parametric setting, we then construct a test based on a rejection region of the form C = {T ≤ k1 or T ≥ k2} for values of k1 and k2 chosen to ensure that the size of the resulting test is some desired value α.
would use our chosen underlying probability model to determine the value of k1 and k2 . However,
in the current setting, we have avoided making parametric assumptions. Nonetheless, it is possible
to determine values of k1 and k2 under the assumption of the null hypothesis of equal distributions
within the two populations under study. We note that if H0 is true, then the observation labels (i.e., whether the observation is associated with the X-sample or the Y-sample) are equally likely to have arisen in any of the $\binom{n+m}{n}$ possible allocations of the observed values to X and Y samples.
As such, we can define a new data set X′ = (X′1, . . . , X′n), Y′ = (Y′1, . . . , Y′m), which is just a permutation of the original samples (so that values in the original X-sample may now appear in the new Y′-sample instead), and a new test statistic value T′ = T(X′, Y′). If we calculate values of T′ for all of the possible re-allocations of the data labels, then we can approximate the probability PrH0(C) by simply calculating the proportion of these T′ values which fall in the set C. Or, conversely, we can construct a rejection region with (approximately) the desired size α by selecting k1 and k2 to be the lower and upper α/2-quantiles of the observed distribution of the T′ values; that is, if we represent the ordered collection of the N = $\binom{n+m}{n}$ values of T′ as T′[1], . . . , T′[N], then k1 = T′[Nα/2] and k2 = T′[N(1−α/2)] [where, of course, we must round off the values Nα/2 and
N(1 − α/2) to the nearest integer value]. The test so constructed is often referred to as a permutation test, due to the process of permuting the sample labels on which it is based. We stress that, despite the fact that we are using the actual observed values of our data in the construction of the test, there are no parametric assumptions being employed. The actual implementation of the testing process is easiest to understand by examination of a simple example:
Example 4.7: Suppose that we observed the two datasets
X1 = 4, X2 = 3, X3 = 7; Y1 = 1, Y2 = 9.
Taking T = X̄ − Ȳ, the observed value of the statistic is T = 14/3 − 5 = −0.33. There are $\binom{5}{3}$ = 10 possible re-allocations of the five observed values into an X′-sample of size 3 and a Y′-sample of size 2, and the associated T′ values are:
−5.33, −2.83, −2.00, −1.17, −0.33, −0.33, 1.33, 2.17, 3.83, 4.67.
Since each of these T′ values is equally likely under the null hypothesis, we see that the region C = {T ≤ −5.33 or T ≥ 4.67} has an approximate size of 0.2 (since 2 of the ten re-allocations yield T′ values which lie in C). As such, we have constructed a test with size α = 0.2. Since our observed value is T = −0.33 ∉ C, we see that we cannot reject the null hypothesis H0 : F(z) = G(z) for all z. Of course, we could just as easily have used some other statistic T, say the difference in medians. In general, the choice of statistic will depend upon how we believe the two populations are likely to differ from one another, and thus is a quite problem-specific issue.
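The full enumeration in Example 4.7 takes only a few lines of Python (our own sketch); as the next paragraph notes, for larger samples the exhaustive enumeration would simply be replaced by a random subset of re-allocations.

```python
from itertools import combinations
import numpy as np

x = [4.0, 3.0, 7.0]
y = [1.0, 9.0]
pooled = np.array(x + y)
n = len(x)

def stat(xs, ys):
    return np.mean(xs) - np.mean(ys)  # T = Xbar - Ybar

t_obs = stat(x, y)  # -0.33
# All C(5,3) = 10 re-allocations of the pooled values into an X'-sample of size 3
t_perm = []
for idx in combinations(range(len(pooled)), n):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    t_perm.append(stat(pooled[mask], pooled[~mask]))
t_perm = np.sort(t_perm)
print(t_perm)                 # approx. [-5.33, -2.83, -2.0, -1.17, -0.33, -0.33, 1.33, 2.17, 3.83, 4.67]
print(t_perm[0], t_perm[-1])  # k1 and k2 for an (approximate) size 0.2 test
print(t_obs)                  # -0.33, not in C, so H0 is not rejected
```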
In general, when n + m is large, the number of re-allocations of the labels is extremely large (e.g., for the dataset of Example 4.6, where n = 10 and m = 9, there are 92,378 different re-allocations of the data into two samples of appropriate size). In such cases, it is common practice to use only
a random subset of some number B of the possible re-allocations. We note the similarity in this
regard to the idea underlying the bootstrap introduced in Section 2.6.3. Indeed, the bootstrap can
also be used to construct non-parametric hypothesis tests, but we do not discuss this idea here.