6.1 The Elements of A Test of Hypothesis: Date of Latest Update: August 20
Department of Statistics
Summer Session II
Suppose you wanted to determine whether the mean level of a driver’s blood alcohol
exceeds the legal limit after two drinks, or whether the majority of registered voters
approve of the president’s performance. In both cases, you are interested in making an
inference about how the value of a parameter relates to a specific numerical value. Is it
less than, equal to, or greater than the specified number? This type of inference is called a
test of hypothesis.
1. Null hypothesis (H0): A theory about the values of one or more population parameters.
The theory generally represents the status quo, which we adopt until it is
proven false. By convention, the theory is stated as H0: parameter = value.
2. Alternative (research) hypothesis (Ha): A theory that contradicts the null hypothesis.
The theory generally represents that which we will accept only when sufficient
evidence exists to establish its truth.
3. Test statistic: A sample statistic used to decide whether to reject the null hypothesis.
4. Rejection region: The numerical values of the test statistic for which the null hy-
pothesis will be rejected. The rejection region is chosen so that the probability is
α that it will contain the test statistic when the null hypothesis is true, thereby
leading to a Type I error. The value of α is usually chosen to be small (e.g., 0.01,
0.05, or 0.10) and is referred to as the level of significance of the test.
7. Conclusion:
a. If the numerical value of the test statistic falls into the rejection region, we reject
the null hypothesis and conclude that the alternative hypothesis is true. We know
that the hypothesis-testing process will lead to this conclusion incorrectly (a Type
I error) only 100α% of the time when H0 is true.
b. If the test statistic does not fall into the rejection region, we do not reject H0 .
Thus, we reserve judgement about which hypothesis is true. We do not conclude
that the null hypothesis is true because we do not (in general) know the probability
β that our test procedure will lead to an incorrect acceptance of H0 (a Type II
error).
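The claim that a Type I error occurs only 100α% of the time when H0 is true can be checked by simulation. The following is a minimal sketch, not part of the original notes: the population values (mean 2,400, standard deviation 200, samples of size 50) are hypothetical, the population is assumed to be normal with known σ, and the test is upper tailed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)                       # fixed seed so the run is repeatable

mu_0, sigma, n = 2400, 200, 50       # hypothetical population in which H0 is true
alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)   # rejection region: z > z_alpha

trials = 5000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu_0) / (sigma / sqrt(n))
    if z > z_alpha:
        rejections += 1              # a Type I error, because H0 is true here

type_1_rate = rejections / trials
print(f"Observed Type I error rate: {type_1_rate:.3f} (alpha = {alpha})")
```

The observed rejection rate should land close to α, which is the sense in which the confidence lies in the testing process rather than in any single result.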
Example 1 Suppose building specifications in a certain city require that the average break-
ing strength of residential sewer pipe be more than 2,400 pounds per foot of length (i.e.,
per linear foot). Each manufacturer that wants to sell pipe in that city must demonstrate
that its product meets the specification. Note that we are interested in making an infer-
ence about the mean µ of a population. However, in this example, we are less interested
in estimating the value of µ than we are in testing a hypothesis about its value. That is,
we want to decide whether the mean breaking strength of the pipe exceeds
2,400 pounds per linear foot. Suppose we test 50 sections of sewer pipe and find the
mean and standard deviation for the measurements are x̄ = 2,460 pounds per linear foot
and s = 200 pounds per linear foot. Test whether the pipe meets the specification, using α = 0.05.
Solution:
1. Null hypothesis (H0): µ ≤ 2,400 (i.e., the manufacturer’s pipe does not meet
specifications)
2. Alternative hypothesis (Ha): µ > 2,400 (i.e., the manufacturer’s pipe meets
specifications)
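3. Test statistic: $z = \frac{\bar{x} - 2{,}400}{\sigma_{\bar{x}}} = \frac{\bar{x} - 2{,}400}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - 2{,}400}{s/\sqrt{n}} = \frac{2{,}460 - 2{,}400}{200/\sqrt{50}} = 2.12$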
5. Conclusion: The sample mean lies 2.12σx̄ above the hypothesized value of µ =
2,400. Since this value of z exceeds z0.05 = 1.645, the critical value for an upper-tailed
test at α = 0.05, it falls into the rejection region. That is,
we reject the null hypothesis that µ = 2,400 and conclude that µ > 2,400. Thus,
it appears that the company’s pipe has a mean strength that exceeds 2,400 pounds
per linear foot.
Note: When the null hypothesis contains more than one value of µ, as in this case
(H0: µ ≤ 2,400), we compute the test statistic using the value of µ closest to the values
specified in the alternative hypothesis. The idea is that if the hypothesis that µ equals
2,400 can be rejected in favor of µ > 2,400, then any value of µ less than 2,400 can
certainly be rejected as well.
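The arithmetic in Example 1 can be checked with a short script. This is a minimal sketch rather than part of the original notes: the function name `upper_tailed_z_test` is made up for illustration, and s is substituted for σ as in the solution, which assumes a large sample.

```python
from math import sqrt
from statistics import NormalDist


def upper_tailed_z_test(x_bar, s, n, mu_0, alpha):
    """Large-sample test of H0: mu = mu_0 against Ha: mu > mu_0.

    Hypothetical helper for illustration; s stands in for sigma,
    which assumes n is large.
    """
    z = (x_bar - mu_0) / (s / sqrt(n))           # test statistic
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # upper-tail critical value
    return z, z_alpha, z > z_alpha               # True means "reject H0"


# Example 1: n = 50, x_bar = 2,460, s = 200, alpha = 0.05
z, z_alpha, reject = upper_tailed_z_test(2460, 200, 50, 2400, 0.05)
print(f"z = {z:.2f}, critical value = {z_alpha:.3f}, reject H0: {reject}")
```

With the Example 1 values this gives z of about 2.12 against a critical value of about 1.645, so H0 is rejected.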
Example 2 Suppose the sample mean breaking strength for the 50 sections of sewer pipe in
Example 1 had turned out to be x̄ = 2,430 pounds per linear foot. Assume that the sample
standard deviation is still s = 200. Let’s repeat the test at α = 0.05.
Solution:
1. Null hypothesis (H0): µ ≤ 2,400 (i.e., the manufacturer’s pipe does not meet
specifications)
2. Alternative hypothesis (Ha): µ > 2,400 (i.e., the manufacturer’s pipe meets
specifications)
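3. Test statistic: $z = \frac{\bar{x} - 2{,}400}{\sigma_{\bar{x}}} = \frac{\bar{x} - 2{,}400}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - 2{,}400}{s/\sqrt{n}} = \frac{2{,}430 - 2{,}400}{200/\sqrt{50}} = 1.06$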
5. Conclusion: The sample mean x̄ = 2,430 is only 1.06 standard deviations above
the null hypothesized value of µ = 2,400. This value does not fall into the rejection
region (z > 1.645). Therefore, we cannot reject H0 at α = 0.05. Even though the
sample mean exceeds the specification of 2,400 by 30 pounds per linear foot, it does
not exceed it by enough to provide convincing evidence that the population mean
exceeds 2,400.
Note: Concluding that the null hypothesis is true (the pipe does not meet specifications)
when in fact it is false (the pipe does meet specifications) is the Type II decision error.
Note: Be careful not to “accept H0 ” when conducting a test of hypothesis, since the mea-
sure of reliability, β = P (Type II error), is almost always unknown. If the test statistic
does not fall into the rejection region, it is better to state the conclusion as “insufficient
evidence to reject H0 .”
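To see why "accepting" H0 is risky, it helps to compute β for one assumed alternative. The sketch below is an illustration only: the true mean `mu_true = 2430` is a hypothetical value chosen for the example (in practice the true mean is unknown, which is why β is usually unknown), and σx̄ is approximated by s/√n as in Examples 1 and 2.

```python
from math import sqrt
from statistics import NormalDist

n, s, mu_0, alpha = 50, 200, 2400, 0.05
mu_true = 2430   # hypothetical true mean, assumed only for illustration

se = s / sqrt(n)                              # approximate sigma of x-bar
z_alpha = NormalDist().inv_cdf(1 - alpha)     # about 1.645
x_crit = mu_0 + z_alpha * se                  # reject H0 when x-bar > x_crit

# beta = P(x-bar does not fall in the rejection region | mu = mu_true)
beta = NormalDist(mu=mu_true, sigma=se).cdf(x_crit)
print(f"Reject H0 when x-bar > {x_crit:.1f}; beta = {beta:.2f}")
```

Under this assumed alternative the test would fail to reject H0 roughly 72% of the time even though the pipe meets the specification, so a non-rejection is weak evidence that H0 is true.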
6.2 Large-Sample Test of Hypothesis about a Population Mean
The null and alternative hypotheses may take one of several forms, leading to either a
one-tailed (or one-sided) statistical test or a two-tailed (or two-sided) statistical test.
Steps for Selecting the Null and Alternative Hypotheses
1. Select the alternative hypothesis as that which the sampling experiment is intended
to establish. The alternative hypothesis will assume one of three forms:
a. One tailed, upper tailed Example: Ha: µ > 2,400
b. One tailed, lower tailed Example: Ha: µ < 2,400
c. Two tailed Example: Ha: µ ≠ 2,400
2. Select the null hypothesis as the status quo – that which will be presumed true unless
the sampling experiment conclusively establishes the alternative hypothesis. The
null hypothesis will be specified as that parameter value closest to the alternative in
one-tailed tests and as the complementary (or only unspecified) value in two-tailed
tests. Example: H0: µ = 2,400
One-tailed test
Rejection region: z < −zα (or z > zα when Ha: µ > µ0),
where zα is chosen so that P(z < −zα) = α
Two-tailed test
Rejection region: z < −zα/2 or z > zα/2,
where zα/2 is chosen so that P(z > zα/2) = α/2
Note: µ0 is the symbol for the numerical value assigned to µ under the null hypothesis.
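The critical values zα and zα/2 are usually read from a standard normal table, but they can also be computed. The following is a minimal sketch using Python's standard library; the function name `critical_values` is an assumption made for this illustration.

```python
from statistics import NormalDist


def critical_values(alpha):
    """Return (z_alpha, z_alpha_over_2) for a standard normal test statistic."""
    std_normal = NormalDist()                      # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha)        # P(z > z_alpha) = alpha
    z_half = std_normal.inv_cdf(1 - alpha / 2)     # P(z > z_half) = alpha / 2
    return z_alpha, z_half


for alpha in (0.10, 0.05, 0.01):
    one_tailed, two_tailed = critical_values(alpha)
    print(f"alpha = {alpha:.2f}: z_alpha = {one_tailed:.3f}, "
          f"z_alpha/2 = {two_tailed:.3f}")
```

For α = 0.05, for instance, this gives approximately 1.645 for a one-tailed test and 1.960 for a two-tailed test.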
Conditions Required for a Valid Large-Sample Hypothesis Test for µ
1. A random sample is selected from the target population.
2. The sample size n is large (i.e., n ≥ 30). (Due to the central limit theorem, this
condition guarantees that the test statistic will be approximately normal regardless
of the shape of the underlying probability distribution of the population.)
Possible Conclusions for a Test of Hypothesis
1. If the calculated test statistic falls into the rejection region, reject H0 and conclude
that the alternative hypothesis Ha is true. State that you are rejecting H0 at the
α level of significance. Remember that the confidence is in the testing process, not
the particular result of a single test.
2. If the test statistic does not fall into the rejection region, conclude that the sampling
experiment does not provide sufficient evidence to reject H0 at the α level of
significance. (Generally, we will not “accept” the null hypothesis unless the probability
β of a Type II error has been calculated.)
Example 3 The effect of drugs and alcohol on the nervous system has been the subject
of considerable research. Suppose a research neurologist is testing the effect of a drug on
response time by injecting 100 rats with a unit dose of the drug, subjecting each rat to a
neurological stimulus, and recording its response time. The mean and standard deviation
for the 100 response times are x̄ = 1.05 and s = 0.5 seconds, respectively. The neurologist knows that
the mean response time for rats not injected with the drug (the “control” mean) is 1.2
seconds. She wishes to test whether the mean response time for drug-injected rats differs
from 1.2 seconds. Set up the test of hypothesis for this experiment, using α = 0.01.
Solution:
1. H0: µ = 1.2
2. Ha: µ ≠ 1.2
3. Test statistic: $z = \frac{\bar{x} - 1.2}{\sigma_{\bar{x}}} = \frac{\bar{x} - 1.2}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - 1.2}{s/\sqrt{n}} = \frac{1.05 - 1.2}{0.5/\sqrt{100}} = -3.0$
Note: Since the sample size of the experiment is large enough (n > 30), the central limit
theorem will apply, and no assumptions need be made about the population of response
time measurements. The sampling distribution of the sample mean response of 100 rats will
be approximately normal, regardless of the distribution of the individual rats’ response
times.
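The solution above stops at the test statistic. As a sketch of how the remaining steps could be carried out, the script below applies the two-tailed rejection region from this section (reject H0 if z < −zα/2 or z > zα/2) to the Example 3 data; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

# Example 3 data
n, x_bar, s = 100, 1.05, 0.5
mu_0, alpha = 1.2, 0.01

z = (x_bar - mu_0) / (s / sqrt(n))                 # test statistic
z_half = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}, about 2.576
reject = abs(z) > z_half                           # two-tailed rejection region

print(f"z = {z:.1f}, reject H0 if |z| > {z_half:.3f}: {reject}")
```

Since |−3.0| exceeds roughly 2.576, the statistic falls into the rejection region at α = 0.01, which suggests that the mean response time of drug-injected rats differs from 1.2 seconds.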
6.3 Observed Significance Levels: p-Values
A second method of presenting the results of a statistical test reports
the extent to which the test statistic disagrees with the null hypothesis and leaves to
the reader the task of deciding whether to reject the null hypothesis. This measure of
disagreement is called the observed significance level (or p-value) for the test.
Definition 6.1 The observed significance level, or p-value, for a specific statistical
test is the probability (assuming that H0 is true) of observing a value of the test statistic
that is at least as contradictory to the null hypothesis, and supportive of the alternative
hypothesis, as the actual one computed from the sample data.
1. If the test is one-tailed, the p-value is equal to the tail area beyond z in the same
direction as the alternative hypothesis. Thus, if the alternative hypothesis is of
the form >, the p-value is the area to the right of, or above, the observed z-value.
Conversely, if the alternative is of the form <, the p-value is the area to the left of,
or below, the observed z-value.
2. If the test is two tailed, the p-value is equal to twice the tail area beyond the observed
z-value in the direction of the sign of z. That is, if z is positive, the p-value is twice
the area to the right of, or above, the observed z value. Conversely, if z is negative,
the p-value is twice the area to the left of, or below, the observed z-value.
2. If the observed significance level (p-value) of the test is less than the chosen value
of α, reject the null hypothesis. Otherwise, do not reject the null hypothesis.
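The tail-area rules above translate directly into code. The following is a minimal sketch for a large-sample z statistic; the function name `p_value` and its `tail` argument are assumptions made for this illustration, and the z-values are the rounded ones from Examples 1 and 3.

```python
from statistics import NormalDist


def p_value(z, tail):
    """Observed significance level for a large-sample z statistic.

    tail is "upper" (Ha: >), "lower" (Ha: <), or "two" (Ha: not equal).
    """
    cdf = NormalDist().cdf
    if tail == "upper":
        return 1 - cdf(z)                # area to the right of z
    if tail == "lower":
        return cdf(z)                    # area to the left of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))     # twice the tail area beyond |z|
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


p_example_1 = p_value(2.12, "upper")   # Example 1, Ha: mu > 2,400
p_example_3 = p_value(-3.0, "two")     # Example 3, Ha: mu != 1.2
print(f"Example 1: p = {p_example_1:.4f}; Example 3: p = {p_example_3:.4f}")
```

The p-value for Example 1 is about 0.017, which is below α = 0.05, and the p-value for Example 3 is about 0.0027, which is below α = 0.01. Both agree with the rejection-region approach.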
6.4 Small-Sample Test of Hypothesis about a Population Mean
Recall from Section 5.3 that when we are faced with making inferences about a population mean
from the information in a small sample, two problems emerge:
1. The normality of the sampling distribution for x̄ does not follow from the central
limit theorem when the sample size is small. We must assume that the distribution
of measurements from which the sample was selected is approximately normally
distributed.
2. If the population standard deviation σ is unknown, as is usually the case, then we
cannot assume that s will provide a good approximation for σ when the sample size
is small. Instead, we must use the t-distribution rather than the standard normal
z-distribution to make inferences about the population mean µ.
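For reference, the statistic that replaces z in this setting is usually written as follows. This is the standard one-sample form rather than a formula stated in these notes; µ0 denotes the hypothesized mean, as in Section 6.2.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{with } n - 1 \text{ degrees of freedom}
```

It has the same form as the large-sample z statistic with s in place of σ. The difference is that its value is compared with critical values from the t-distribution instead of the standard normal distribution.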
Conditions Required for a Valid Large-Sample Hypothesis Test for p
2. The sample size n is large. (This condition will be satisfied if np0 and nq0 are both
at least 15.)
Example 4 The reputations (and hence sales) of many businesses can be severely damaged
by shipments of manufactured items that contain a large percentage of defectives. For
example, a manufacturer of alkaline batteries may want to be reasonably certain that less
than 5% of its batteries are defective. Suppose 300 batteries are randomly selected from a
very large shipment, each is tested, and 10 defective batteries are found. Does this outcome
provide sufficient evidence for the manufacturer to conclude that the fraction defective in
the entire shipment is less than 0.05? Use α = 0.01.
Solution:
1. H0 : p = 0.05
2. Ha : p < 0.05
3. Test statistic: $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - 0.05}{\sigma_{\hat{p}}} = \frac{(10/300) - 0.05}{\sqrt{p_0 q_0 / n}} = \frac{0.03333 - 0.05}{\sqrt{(0.05)(0.95)/300}} = -1.32$
5. Conclusion: The calculated z-value does not fall into the rejection region. Therefore,
there is insufficient evidence at the 0.01 level of significance to indicate that the
shipment contains less than 5% defective batteries.
Note:
1. Before conducting the test, we should check to determine whether the sample size
is large enough to use the normal approximation to the sampling distribution of p̂.
Since np0 = (300)(0.05) = 15 and nq0 = (300)(0.95) = 285 are both at least 15, the
normal approximation will be adequate.
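The calculation in Example 4, including the sample-size check, can be reproduced as follows. This is a sketch under the same large-sample assumptions; the function name `lower_tailed_proportion_test` is hypothetical, and the cutoff −zα is computed from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def lower_tailed_proportion_test(successes, n, p_0, alpha):
    """Large-sample test of H0: p = p_0 against Ha: p < p_0.

    Hypothetical helper for illustration only.
    """
    q_0 = 1 - p_0
    sample_size_ok = n * p_0 >= 15 and n * q_0 >= 15   # normal approximation check
    p_hat = successes / n
    z = (p_hat - p_0) / sqrt(p_0 * q_0 / n)            # sigma of p-hat under H0
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return z, -z_alpha, z < -z_alpha, sample_size_ok


# Example 4: 10 defectives among 300 batteries, p_0 = 0.05, alpha = 0.01
z, cutoff, reject, ok = lower_tailed_proportion_test(10, 300, 0.05, 0.01)
print(f"z = {z:.2f}, reject H0 if z < {cutoff:.2f}: {reject}")
```

Here z is about −1.32 and the cutoff is about −2.33, so the statistic does not fall into the rejection region, in agreement with the conclusion above.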