Chap 3 Hypothesis Testing
3.1 Introduction
The truth or falsity of a statistical hypothesis can never be known with certainty unless the entire
population is examined, which is usually impractical. Instead, the data from a
random sample of the population are used to provide evidence that either supports or contradicts
the stated hypothesis.
In testing any statistical hypothesis, there are four possible situations that determine
whether our decision is correct or in error. These are presented in the table below.
                       H0 is true          H0 is false
Fail to reject H0      Correct decision    Type II error
Reject H0              Type I error        Correct decision
We note from this table that there are two wrong decisions one can make; that is, a type
I error and a type II error. These are defined below and their roles in hypothesis testing
are elaborated on.
Definition 3.1.2. A type I error is rejecting a null hypothesis when it’s true.
Definition 3.1.3. A type II error is failing to reject a null hypothesis when it’s false.
The power of a test assesses the sensitivity of the test. For example, suppose we were
testing H0 : µ = 68 vs H1 : µ ≠ 68 and we fail to reject H0 when 67 ≤ x̄ ≤ 69. Then
the power of the test measures the capability of the test to properly reject H0 when indeed
µ = 68.5. If the power is low, then the test would not be good enough for an analyst.
Definition 3.1.5. A test of any statistical hypothesis where the alternative is one
sided, such as
H0 : θ = θ0 vs H1 : θ > θ0
or
H0 : θ = θ0 vs H1 : θ < θ0 ,
is called a one-tailed test.
Definition 3.1.6. A test of any statistical hypothesis where the alternative is two
sided, such as
H0 : θ = θ0 vs H1 : θ ≠ θ0 ,
is called a two-tailed test, since the critical region is split into two parts, one in each tail
of the distribution of the test statistic.
The null hypothesis will often be stated using the equality sign but in the case of
one-tailed tests, the statement of the alternative is the most important consideration.
Whether one sets up a one-tailed or a two-tailed test will depend on the conclusion to
be drawn if H0 is rejected.
Example 3.1.1. A manufacturer of a certain brand of rice cereal claims that the
average saturated fat content does not exceed 1.5 grams per serving. State the null
and alternative hypotheses to be used in testing this claim.
Example 3.1.2. A real estate agent claims that 60% of all private residences being
built today are 3-bedroom homes. To test this claim, a large sample of new residences
is inspected; the proportion of these homes with 3 bedrooms is recorded and used as
the test statistic. State the null and alternative hypotheses to be used in this test.
3.2 One-sample hypothesis testing
Let X1 , X2 , . . . , Xn be a random sample from a distribution with mean µ and variance
σ². Consider the hypotheses
H0 : µ = µ0 ,
H1 : µ ≠ µ0 .
The appropriate test statistic should be based on the random variable X̄. We recall
from the CLT that X̄ has approximately a normal distribution with mean µ and
variance σ²/n for sufficiently large sample sizes. We can therefore determine a critical
region based on the computed sample average, x̄.
Under H0 , the standardized statistic satisfies
\[ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1), \]
at least approximately, so the expression
\[ P\left( -z_{\alpha/2} < \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2} \right) = 1 - \alpha \]
can be used to write an appropriate non-rejection region. Note that the critical
region is designed to control α, the probability of type I error.
Given the observed sample mean x̄, compute
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}. \tag{3.1} \]
Then if −zα/2 < z < zα/2 , we fail to reject the null hypothesis at a level of
significance α. Otherwise we reject H0 . Note that this implies that there is
probability α of rejecting H0 when indeed µ = µ0 .
Tests of one-sided hypotheses on the mean involve the same statistic described in
the two-sided case. The difference, of course, is that the critical region is only in one
tail of the standard normal distribution. For example, suppose that we seek to test
H0 : µ = µ0 ,
H1 : µ > µ0 .
Then rejection of H0 results when the computed z > zα . Similarly if the alternative is
H1 : µ < µ0 then z < −zα results in rejection of H0 .
Example 3.2.1. A random sample of 100 recorded deaths in Botswana during the
past year showed an average life span of 61.8 years. Assuming a population standard
deviation of 8.9 years, does this seem to indicate that the mean life span today is
greater than 60 years? Use a 0.05 level of significance.
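The one-sided test in Example 3.2.1 can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function name z_test_upper is an illustrative choice, and the hypotheses are taken to be H0 : µ = 60 against H1 : µ > 60.

```python
from math import sqrt
from statistics import NormalDist

def z_test_upper(xbar, mu0, sigma, n, alpha):
    """Upper-tailed z-test of H0: mu = mu0 against H1: mu > mu0 (sigma known)."""
    z = (xbar - mu0) / (sigma / sqrt(n))       # test statistic, as in (3.1)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value z_alpha
    p_value = 1 - NormalDist().cdf(z)          # area to the right of z
    return z, z_crit, p_value, z > z_crit

# Figures from Example 3.2.1
z, z_crit, p_value, reject = z_test_upper(xbar=61.8, mu0=60, sigma=8.9, n=100, alpha=0.05)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Here z is about 2.02, which exceeds z0.05 ≈ 1.645, so H0 is rejected at the 0.05 level.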
Example 3.2.2. An electrical firm manufactures light bulbs that have a lifetime that is
approximately normally distributed with a mean of 800 hours and a standard deviation
of 40 hours.
(a) State appropriate null and alternative hypotheses for testing this claim.
(b) Suppose a random sample of 49 bulbs has an average life of 788 hours. Test your
null hypothesis against the alternative at 0.01 level of significance.
The p-value
The pre-selection of the significance level, α, has its roots in the philosophy that the
maximum risk of making a type I error should be controlled. However, this approach
does not account for values of test statistics that are close to the critical region even
though the risk of committing a type I error for such values could hardly be considered
severe.
Definition 3.2.1. The p-value is the probability of obtaining a test statistic as
extreme or more extreme than the observed test statistic under the null hypothesis. That
is, for the two-sided test of H0 : µ = µ0 ,
\[ p = P(|Z| \ge |z| \mid \mu = \mu_0). \tag{3.2} \]
Smaller p-values imply stronger evidence against H0 .
That is, if the p-value is sufficiently small, we may be willing to abandon the assumption
that H0 is true and believe that H1 is more plausible. This is rejecting the null
hypothesis. When using the p-value, there is no need to specify the level of significance
in advance because the conclusions are drawn on the basis of its size in harmony with the
subjective judgement of the investigator.
Example 3.2.3. Recently many companies have been allowing people to work from
home due to COVID-19. Among other things, working from home is supposed to
reduce the number of sick days taken by employees. Suppose at one firm, it is known
that pre-COVID employees took a mean of 5.4 sick days with a standard deviation of
2.5 days. During 2021, management chose a simple random sample of 80 employees to
follow for the year and they averaged 4.5 sick days.
(a) State appropriate null and alternative hypotheses for testing the statement
above.
(b) Find the p-value for testing your null hypothesis against the alternative and
interpret your results.
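A p-value calculation for Example 3.2.3 might look like the sketch below. It assumes the hypotheses H0 : µ = 5.4 against H1 : µ < 5.4 (one reading of the claim that working from home reduces sick days) and that the pre-COVID standard deviation of 2.5 days still applies; both are assumptions rather than part of the example statement.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Example 3.2.3
mu0, sigma = 5.4, 2.5      # pre-COVID mean and (assumed known) standard deviation
n, xbar = 80, 4.5          # sample size and observed sample mean

z = (xbar - mu0) / (sigma / sqrt(n))   # test statistic, as in (3.1)

# Lower-tailed alternative: "as extreme or more extreme" means Z <= z
p_value = NormalDist().cdf(z)

print(f"z = {z:.3f}, p-value = {p_value:.5f}")
```

The p-value is well below 0.001, so under these assumptions the data give strong evidence that the mean number of sick days has decreased.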
The reader should realize by now that the hypothesis-testing approach to statistical
inference in this chapter is very closely related to the confidence interval approach. It
turns out that the testing of H0 : µ = µ0 against H1 : µ ≠ µ0 at a significance level α
is equivalent to computing a 100(1 − α)% confidence interval on µ and rejecting H0 if
µ0 is outside the confidence interval.
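The equivalence between the two-sided test and the confidence interval can be illustrated with the figures of Example 3.2.2(b). This is a minimal sketch; the function name two_sided_z_test is hypothetical, and the interval is treated as open so that it matches the non-rejection region −zα/2 < z < zα/2.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, sigma, n, alpha):
    """Two-sided z-test and the matching 100(1 - alpha)% confidence interval."""
    se = sigma / sqrt(n)                          # standard error of the mean
    z_half = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    z = (xbar - mu0) / se
    reject_by_test = abs(z) >= z_half             # z outside (-z_{alpha/2}, z_{alpha/2})
    ci = (xbar - z_half * se, xbar + z_half * se)
    reject_by_ci = not (ci[0] < mu0 < ci[1])      # mu0 outside the interval
    return z, ci, reject_by_test, reject_by_ci

# Figures from Example 3.2.2(b): H0: mu = 800 vs H1: mu != 800 at alpha = 0.01
z, ci, reject_by_test, reject_by_ci = two_sided_z_test(788, 800, 40, 49, 0.01)
print(f"z = {z:.2f}, 99% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"reject by test: {reject_by_test}, reject by CI: {reject_by_ci}")
```

Both routes give the same decision: z = −2.1 lies inside the non-rejection region, and µ0 = 800 lies inside the 99% confidence interval.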
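When σ is unknown and the population is approximately normal, σ is replaced by the sample
standard deviation s. For testing
H0 : µ = µ0 vs H1 : µ ≠ µ0
the test statistic is
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \tag{3.3} \]
which is compared with critical values of the t-distribution with n − 1 degrees of freedom.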
Example 3.2.4. According to a dietary study, high sodium intake may be related to
ulcers, stomach cancer, and migraine headaches. The human requirement for salt is
only 220 mg per day, which is surpassed in most single servings of ready-to-eat cereals.
If a random sample of 20 similar servings of a certain cereal has a mean sodium content
of 244 mg and a standard deviation of 24.5 mg, does this suggest at the 0.05 level of
significance that the average sodium content for a single serving of such cereal is greater
than 220 milligrams? Assume the distribution of sodium contents to be normal.
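The t statistic for Example 3.2.4 can be computed as in the sketch below. The Python standard library has no t-distribution, so the critical value is an assumption read from a standard t-table rather than computed; the function name one_sample_t is illustrative.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    t = (xbar - mu0) / (s / sqrt(n))
    return t, n - 1

# Figures from Example 3.2.4: H0: mu = 220 vs H1: mu > 220
t, df = one_sample_t(xbar=244, mu0=220, s=24.5, n=20)

# Assumption: upper 5% point of the t-distribution with 19 degrees of freedom,
# taken from a standard t-table (not computed here).
T_CRIT_05_19 = 1.729

reject = t > T_CRIT_05_19
print(f"t = {t:.3f} on {df} degrees of freedom, reject H0: {reject}")
```

Since t is about 4.38, far above 1.729, H0 is rejected at the 0.05 level.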
We consider the problem of testing the hypothesis that the proportion of successes
in a binomial experiment equals some specified value in this section. That is, we are
testing the null hypothesis H0 that p = p0 , where p is the parameter of the binomial
distribution. The alternative hypothesis may be one of the usual one-sided or two-sided
alternatives:
p < p0 , p > p0 or p ≠ p0
For large n we use the normal approximation, so that the z-value test statistic for p = p0
is given by
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \tag{3.4} \]
where p̂ is the sample proportion of successes.
Example 3.2.5. A commonly prescribed drug for relieving nervous tension is believed
to be only 60% effective. Experimental results with a new drug administered to a
random sample of 100 adults who were suffering from nervous tension show that 70
received relief. Is this sufficient evidence to conclude that the new drug is superior to
the one commonly prescribed? Use a 0.05 level of significance.
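The test in Example 3.2.5 can be sketched as follows, assuming the hypotheses H0 : p = 0.6 against H1 : p > 0.6 and the normal approximation to the binomial. The function name proportion_z is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(x, n, p0):
    """z statistic for H0: p = p0 using the normal approximation."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Figures from Example 3.2.5: 70 of 100 adults received relief
z = proportion_z(x=70, n=100, p0=0.6)
z_crit = NormalDist().inv_cdf(0.95)    # z_{0.05} for the upper-tailed test
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
```

Here z is about 2.04, which exceeds 1.645, so under these assumptions there is evidence at the 0.05 level that the new drug is superior.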
Example 3.2.6. Suppose that, in the past, 80% of all adults favored capital pun-
ishment. Do we have reason to believe that the proportion of adults favoring capital
punishment has decreased if, in a random sample of 15 adults, 11 favor capital punish-
ment? Use a 0.05 level of significance.
Exercise 1
Group Assignment: The class should be divided into pairs of students for this
project. Suppose it is conjectured that at least 25% of students at your university
exercise for more than two hours a week. Collect data from a random sample of 50
students. Ask each student if he or she works out for at least two hours per week.
Then do the computations that allow either rejection or non-rejection of the above
conjecture. Show all work and quote a P-value in your conclusion.
3.3 Two-sample hypothesis testing
Now suppose we wish to determine whether the means of two independent populations
are equal. The basic idea is to compute the difference of the sample means. If the
difference is far from zero, then we have evidence that the population means are different.
Otherwise, if the difference is close to 0, the data are consistent with the population means
being the same. More formally, when the variances σ1² and σ2² are known, the test statistic is
\[ z = \frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \tag{3.5} \]
Note that if the variances are unknown but n1 and n2 are sufficiently large,
then the variances may be approximated with s1² and s2².
Example 3.3.1. A random sample of size n1 = 25, taken from a normal population
with a standard deviation σ1 = 5.2, has a mean x̄1 = 81. A second random sample
of size n2 = 36, taken from a different normal population with a standard deviation
σ2 = 3.4, has a mean x̄2 = 76. Test the hypothesis that µ1 = µ2 against the alternative,
µ1 ≠ µ2 . Quote a p-value in your conclusion.
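The two-sample z-test of Example 3.3.1 can be sketched as below, using the known population standard deviations and a two-sided p-value. The function name two_sample_z is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """z statistic for H0: mu1 = mu2 with known population variances."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of the difference
    return (xbar1 - xbar2) / se

# Figures from Example 3.3.1
z = two_sample_z(xbar1=81, xbar2=76, sigma1=5.2, sigma2=3.4, n1=25, n2=36)

# Two-sided p-value: both tails beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-sided p-value = {p_value:.6f}")
```

With z about 4.22, the p-value is far below any conventional significance level, so the hypothesis µ1 = µ2 is rejected.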
Example 3.3.2. Two soft drink filling machines are being compared. The number of
containers filled each minute is counted for 60 minutes for each machine. During the
60 minutes, machine 1 filled an average of 73.8 cans per minute compared to 76.1 cans
per minute for machine 2 with standard deviations of 5.2 and 4.1 cans respectively.
(a) If the counts are made each minute for 60 consecutive minutes, what assumption
necessary to the validity of a hypothesis test may be violated?
(b) Assuming that all necessary assumptions are met, can you conclude that machine
2 is faster than machine 1?
Independent samples
We know from the first chapter that the Central Limit Theorem is only applicable
when the sample size is sufficiently large. However, decisions must sometimes be
based on small samples, for example when a procedure
used for testing a particular product for quality standards is prohibitively expensive,
or when research is carried out on a rare species. In those cases, the
t-distribution can be used to construct a hypothesis test under the assumption that the
populations are approximately normal or are not too skewed.
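Assuming equal population variances, the test statistic is
\[ t = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{s_p\sqrt{1/n_1 + 1/n_2}} \tag{3.6} \]
where ∆0 is the difference in means specified by H0 and s_p is the pooled standard deviation,
s_p² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2). Critical regions are constructed using the
t-distribution with n1 + n2 − 2 degrees of freedom.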
Example 3.3.3. A researcher claims that the average life span of mice can be extended
by as much as 8 months when the calories in their diet are reduced by approximately
40% from the time they are weaned. The restricted diets are enriched to normal levels
by vitamins and protein. Suppose that a random sample of 10 mice is fed a normal diet
and has an average life span of 32.1 months with a standard deviation of 3.2 months,
while a random sample of 15 mice is fed the restricted diet and has an average life span
of 37.6 months with a standard deviation of 2.8 months.
Test the hypothesis, at the 0.05 level of significance, that the average life span of
mice on this restricted diet is increased by 8 months against the alternative that the
increase is less than 8 months. Assume the distributions of life spans for the regular
and restricted diets are approximately normal with equal variances.
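The pooled t statistic for Example 3.3.3 can be computed as in the following sketch. It assumes the pooled-variance form of the statistic with equal population variances, takes the difference as restricted diet minus normal diet, and uses a critical value read from a standard t-table rather than computed; the function name pooled_t is illustrative.

```python
from math import sqrt

def pooled_t(xbar, ybar, s1, s2, n1, n2, delta0):
    """Pooled two-sample t statistic for H0: mu_x - mu_y = delta0."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)   # pooled std. deviation
    t = ((xbar - ybar) - delta0) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, df, sp

# Figures from Example 3.3.3: restricted diet (n = 15) vs normal diet (n = 10)
# H0: increase = 8 months against H1: increase < 8 months
t, df, sp = pooled_t(xbar=37.6, ybar=32.1, s1=2.8, s2=3.2, n1=15, n2=10, delta0=8)

# Assumption: upper 5% point of the t-distribution with 23 degrees of freedom,
# taken from a standard t-table; the lower-tailed test rejects when t < -1.714.
T_CRIT_05_23 = 1.714

reject = t < -T_CRIT_05_23
print(f"sp = {sp:.3f}, t = {t:.3f} on {df} degrees of freedom, reject H0: {reject}")
```

Here t is about −2.07, which falls below −1.714, so H0 is rejected at the 0.05 level in favour of an increase of less than 8 months.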
Example 3.3.4. In a study to compare the effectiveness of online learning with tradi-
tional classroom instruction, 12 students took a business course online while 14 students
took it in a classroom. The final exam scores for the online students averaged 76.8 with
a standard deviation of 8.6, whereas the final exam scores for the classroom students
averaged 80.1 with a standard deviation of 9.3.
Can you conclude that the traditional classroom instruction is more effective?
Paired samples
\[ T = \frac{\bar{D} - \mu_D}{s_D/\sqrt{n}} \]
H0 : µD = d0
\[ t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}} \tag{3.7} \]
Critical regions are constructed using the t-distribution with n − 1 degrees of freedom.
Example 3.3.5. A taxi company manager is trying to decide whether the use of radial
tires instead of regular belted tires improves fuel economy. Twelve cars were equipped
with radial tires and driven over a prescribed test course. Without changing drivers,
the same cars were then equipped with regular belted tires and driven once again over
the test course. The gasoline consumption, in kilometers per liter, was recorded as
follows:
Can we conclude that cars equipped with radial tires give better fuel economy than
those equipped with belted tires? Assume the populations to be normally distributed.
Use a p-value in your conclusion.
Situations often arise where we wish to test the hypothesis that two proportions are
equal. For example, a person may decide to give up smoking only if he or she is
convinced that the proportion of smokers with lung cancer exceeds the proportion of
non-smokers with lung cancer. In that case we may wish to compare the proportions
of non-smokers with lung cancer against the similar proportion for the smokers.
Under H0 : p1 = p2 , the pooled estimate of the common proportion p is
\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \tag{3.8} \]
where x1 and x2 are the numbers of successes in each of the two samples. Substituting
p̂ and q̂ = 1 − p̂ for p and q respectively, the z-value for testing H0 is given by
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\,(1/n_1 + 1/n_2)}} \tag{3.9} \]
The critical regions for the appropriate alternative hypotheses are set up as before,
using critical points of the standard normal curve.
Example 3.3.6. A vote is to be taken among the residents of a town and the surround-
ing villages to determine whether a proposed chemical plant should be constructed.
The construction site is within the town limits, and for this reason many voters in
the villages believe that the proposal will pass because of the large proportion of town
voters who favor the construction. To determine if there is a significant difference in
the proportions of town voters and voters in the villages favoring the proposal, a poll
is taken. If 120 of 200 town voters favor the proposal and 240 of 500 villages’ residents
favor it, would you agree that the proportion of town voters favoring the proposal is
higher than the proportion of village voters? Use an α = 0.05 level of significance.
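The two-proportion test in Example 3.3.6 can be sketched as below, assuming H0 : p1 = p2 against H1 : p1 > p2 with sample 1 the town voters. The function name two_proportion_z is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)        # pooled estimate of the common p
    q_pool = 1 - p_pool
    se = sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Figures from Example 3.3.6: 120 of 200 town voters, 240 of 500 village residents
z = two_proportion_z(x1=120, n1=200, x2=240, n2=500)
p_value = 1 - NormalDist().cdf(z)         # upper-tailed alternative p1 > p2

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

With z about 2.87 and a p-value near 0.002, H0 is rejected at α = 0.05 under these assumptions.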
3.4 One- and two-sample hypothesis testing for sample variances
Let us consider the problem of testing the null hypothesis H0 that the population variance
σ² equals a specified value σ0² against one of the usual alternatives. The appropriate
test statistic is based on the sampling distribution of the statistic (n − 1)S²/σ².
Therefore, if we assume that the distribution of the population being sampled is normal,
the chi-squared value for testing H0 : σ² = σ0² is given by
\[ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2}, \]
where n is the sample size and s² is the sample variance. For a two-tailed test at the
α-level of significance, the critical region is χ² < χ²_{α/2,n−1} or χ² > χ²_{1−α/2,n−1},
where χ²_{γ,ν} denotes the value of the chi-squared distribution with ν degrees of freedom
that has area γ to its left.
Solution 3.4.1. The hypotheses from the above problem are deduced as follows:
H0 : σ² ≤ 1.15
H1 : σ² > 1.15
\[ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{24(2.03)}{1.15} = 42.365 \]
From the chi-square statistical table, it can be seen that χ²_{1−α,n−1} = χ²_{0.95,24} = 36.415
and since χ² > χ²_{0.95,24} , we reject the null hypothesis and conclude that the dispensing
machine is out of control.
Now let us consider the problem of testing the equality of the variances σ1² and σ2²
of two populations. That is, we shall test the null hypothesis H0 that σ1² = σ2² against
one of the usual alternatives.
For independent random samples of sizes n1 and n2 , respectively, the test statistic for
H0 is the ratio
\[ f = \frac{s_1^2}{s_2^2} \tag{3.10} \]
where s1² and s2² are the two sample variances. If the two populations are approximately
normally distributed and the null hypothesis is true, then it can be shown that the ratio
f = s1²/s2² is a value of the F-distribution with υ1 = n1 − 1 and υ2 = n2 − 1 degrees
of freedom. Therefore, the rejection regions of size α corresponding to the two-sided
Men          Women
n1 = 11      n2 = 14
s1 = 6.1     s2 = 5.3
Test the hypothesis that σ1² = σ2² against the alternative that σ1² > σ2². Use a p-value
in your conclusion.
Solution 3.4.2. The test statistic is given by
\[ f = \frac{s_1^2}{s_2^2} = \frac{6.1^2}{5.3^2} = 1.32 \]
and under H0 is a value of the F-distribution with υ1 = 10 and υ2 = 13 degrees of freedom.
From the F statistical table we have that F0.9 (10, 13) = 2.14 and therefore the p-value of
the observed value satisfies
\[ p = P(F_{10,13} > 1.32) > P(F_{10,13} > 2.14) = 0.10 \]
That is, if H0 is true there is more than a 10% chance of observing a variance ratio at
least as large as 1.32. Such a ratio is not unusual under H0 , so we fail to reject the
null hypothesis and conclude that there is insufficient evidence that the variance of the
times for men exceeds that for women.
In R, the exact p-value is given by 1 - pf(1.32, 10, 13) and was found to be
0.314.