Chapter 9.
Introduction
What is a hypothesis?
Hypothesis testing is a procedure, based on sample evidence and probability, for testing
claims about a property/characteristic of a population.
Suppose that your state legislature is considering a proposal to lower the legal limit for the
blood alcohol concentration (BAC) that constitutes drunk driving. A legislator wants to
determine whether a majority of adults in her district favor this proposal. To gather information,
she surveys 200 randomly selected individuals from her district and learns that 59%
of these people favor the proposal. This sample information will be used to decide whether
more than 50% of the population in her district favors the proposal.
Can the legislator conclude that a majority of all adults in her district favor this proposal?
Because the result is based on a sample, there is the possibility that the observed majority
may have occurred just by the “luck of the draw.” If a majority of the whole population
actually is opposed to the proposal (so 50% or less favor it), how likely is it that 59% of a
random sample from this population would favor it?
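The question above can be explored numerically. The following is a minimal sketch, assuming the boundary case in which exactly 50% of the population favors the proposal; it computes the exact binomial probability that at least 59% of 200 respondents (118 or more) favor it. The function name is illustrative and not part of the original notes.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: the true proportion sits at the boundary of "50% or less".
n = 200
k = round(0.59 * n)  # number of respondents in favor in the sample
prob = binomial_upper_tail(n, k, 0.5)
print(f"P(at least {k} of {n} favor | p = 0.5) = {prob:.4f}")
```

A small probability here would suggest that a 59% sample result is unlikely to be just the "luck of the draw" when the population is evenly split.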
Because we use sample data to test hypotheses, we cannot state with 100% certainty that
the statement is true; we can only determine whether the sample data support the statement
or not. In fact, because the statement can be either true or false, hypothesis testing is based
on two types of hypotheses.
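Null Hypothesis (H0): A statement of no change, no effect, or no difference; it is assumed
to be true until the sample evidence indicates otherwise.
Alternative Hypothesis (Ha): A statement that we are trying to find evidence to support;
contradictory to H0.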
Left- and right-tailed tests are referred to as one-tailed tests. Only our alternative
hypothesis changes; the null hypothesis remains the same in all three tests.
(b) The packaging on a light bulb states that the bulb will last 500 hours under normal
use. A consumer advocate would like to know if the mean lifetime of a bulb is less than
500 hours.
(c) The standard deviation of the rate of return for a certain class of mutual funds is 0.08.
A mutual fund manager believes the standard deviation of the rate of return for his
fund is less than 0.08.
Example 3. Determine the null and alternative hypotheses and the tailed-ness.
(a) The Medco pharmaceutical company has just developed a new antibiotic for children.
Among the competing antibiotics, 2% of children who take the drug experience
headaches as a side effect. A researcher for the Food and Drug Administration wishes to
know if the percentage of children taking the new antibiotic who experience headaches
as a side effect is more than 2%.
(b) The Blue Book value of a used 3-year-old Chevy Corvette is $37,500. Grant wonders if
the mean price of a used 3-year-old Chevy Corvette in the Miami metropolitan area is
different from $37,500.
(c) The standard deviation of the content of a 64-ounce bottle of detergent using an old
filling machine was known to be 0.23 ounces. The company purchased a new filling
machine and wants to know if there is less variability with the new filling machine.
Every time we make a decision (either reject H0 or fail to reject H0), we either make the
correct decision or we make an error.
The numbering of the errors indicates which of the two hypotheses, (1) null or (2) alternative,
is actually true. For example, a Type I error is the error that occurs when the first hypothesis,
the null, is really true, but we decide in favor of the alternative.
Power = probability we correctly reject the null hypothesis, which occurs with
probability 1 − β.
Think of a criminal trial. In any trial, the defendant is assumed to be innocent. (We give the
defendant the benefit of the doubt). The district attorney must collect and present evidence
proving that the defendant is guilty beyond all reasonable doubt.
Because we are seeking evidence for guilt, it becomes the alternative hypothesis. Innocence
is assumed, so it is the null hypothesis. The hypotheses for a trial are written
H0: the defendant is innocent versus Ha: the defendant is guilty.
Using this court analogy, the two correct decisions are to:
• conclude that an innocent person is not guilty or
• conclude that a guilty person is guilty.
The two incorrect decisions are to:
• convict an innocent person (a Type I error) or
• let a guilty person go free (a Type II error).
Rather than focusing on the risk of making a mistake, many investigators prefer to focus on
the chance that their sample will provide the evidence necessary to make the right choice.
The power of a hypothesis test is the probability (1 − β) of rejecting a false null hypothesis.
That is, it is the probability that we decide in favor of the alternative hypothesis, given a
specific truth about the population. When the alternative is actually true, power is the
probability that we do not make a Type II error.
There are two features of power that apply to all hypothesis tests and that researchers should
keep in mind when they plan a study:
• The power increases when the sample size is increased. The sample statistic is a more
accurate estimate of the population value, making it easier to detect a difference between
the true population value and the null value (Law of Large Numbers).
• The power increases when the difference between the true population value and the null
hypothesis value increases. This makes sense because the probability of detecting a
large difference is higher than the probability of detecting a small difference.
For a specified true value and level of significance, we can either calculate the power for a
given sample size or we can compute the sample size required to achieve a desired power.
The table below shows the power for three different “true” proportions and for three different
sample sizes, for a test with a significance level = 0.05. As you look at the table, keep in
mind that the power is the probability the sample evidence will lead us to conclude that a
majority of the student population would attend a regular summer term.
                     True Population Proportion
Sample Size      0.52      0.60      0.65
n = 50           0.09      0.41      0.69
n = 100          0.11      0.64      0.92
n = 400          0.20      0.99      Nearly 1
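The power values in the table can be approximated with the normal distribution. The sketch below assumes a right-tailed test of H0: p = 0.5 (the null value is inferred from the word "majority" and is not stated in the table) and uses the normal approximation to the sampling distribution of the sample proportion. The function name and its parameters are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(p0, p_true, n, alpha=0.05):
    """Approximate power of the right-tailed z-test of H0: p = p0."""
    z = NormalDist()
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Probability that p-hat exceeds the cutoff when the truth is p_true.
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - z.cdf((cutoff - p_true) / se_true)

# Assumption: null value 0.5, significance level 0.05.
for n in (50, 100, 400):
    row = [power_one_sided(0.5, p, n) for p in (0.52, 0.60, 0.65)]
    print(f"n = {n:<4}", "  ".join(f"{pw:.2f}" for pw in row))
```

Under these assumptions, the computed values illustrate both features listed earlier: power grows with the sample size and with the distance between the true proportion and the null value.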
Example 7. Suppose that a test for ESP has four choices, and that the probability of a
correct guess by chance on each trial is 0.25. A researcher believes that the true probability
of a correct guess is 0.33. The following output shows the power of the one-sided test for
this situation for three possible sample sizes:
Sample Size    Power
50             0.3776
100            0.5740
400            0.9705
(b) Write a sentence providing the power of the test for n = 100 and explain its meaning.
(c) If the researcher wants to have at least a 0.95 probability of detecting ESP in the study,
and is correct that the true probability of a success is 0.33, would a sample of size 400
be sufficient? Explain.
(d) If the true probability of success is actually 0.40 on each trial, would the power for each
sample size be higher or lower than that shown in the output? Explain.
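Going the other direction, we can solve for the sample size required to reach a desired power. The following is a rough sketch for the ESP setting, assuming a right-tailed z-test based on the normal approximation and a significance level of 0.05 (the example does not state the level, so this is an assumption). The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_for_power(p0, p_true, target_power, alpha=0.05):
    """Smallest n for which a right-tailed z-test of H0: p = p0 reaches
    the target power when the true proportion is p_true (> p0)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)       # critical value of the test
    z_power = z.inv_cdf(target_power)    # quantile matching the target power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_power * sqrt(p_true * (1 - p_true))
    return ceil((numerator / (p_true - p0)) ** 2)

# Chance guessing is 0.25; the researcher believes the truth is 0.33.
n_needed = sample_size_for_power(0.25, 0.33, 0.95)
print("Approximate sample size for power 0.95:", n_needed)
```

Because the formula divides by the gap between the true and null proportions, a larger gap (for example, a true probability of 0.40) requires a smaller sample for the same power.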
(b) According to the Centers for Disease Control and Prevention, 16% of children aged 6 to
11 years are overweight. A school nurse thinks that the percentage of 6- to 11-year-olds
who are overweight is higher in her school district.
Example 9. According to the Centers for Disease Control and Prevention, in 2005, 15.2%
of tenth-grade students had tried marijuana. The Drug Abuse and Resistance Education
(DARE) program underwent several major changes to keep up with technology and issues
facing students in the 21st century. After the changes, a school resource officer (SRO) thinks
that the proportion of tenth-grade students who have tried marijuana has decreased from
the 2005 level.
(b) Assume you fail to reject H0 , and suppose, in fact, that the proportion of tenth-grade
students who have tried marijuana is 14.7%. Was a Type I or Type II error committed?
Earlier in the course, we discussed sampling distributions. Particular distributions are
associated with hypothesis testing. We perform tests of a population mean using a Student’s
t-distribution. (Remember, use a Student’s t-distribution when the population standard
deviation is unknown and the distribution of the sample mean is approximately normal.)
We perform tests of a population proportion using a normal distribution (usually n is large).
If you are testing a single population mean, the distribution for the test is for means:
$\bar{X} \sim t_{df}$
The population parameter is µ. The estimated value (point estimate) for µ is x̄, the sample
mean.
If you are testing a single population proportion, the distribution for the test is for
proportions or percentages:
$\hat{p} \sim N\left(p, \sqrt{\dfrac{p(1-p)}{n}}\right)$
The population parameter is p. The estimated value (point estimate) for p is p̂, where
$\hat{p} = \dfrac{x}{n}$, x is the number of successes, and n is the sample size.
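• For means
– $\bar{X} \sim N$ if $X \sim N$ (the sample mean is normally distributed when the population is normal)
• For proportions
– X must follow a binomial distribution
– np and nq must both be at least 5, where q = 1 − p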
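The requirement for proportions is easy to check in code. The following sketch assumes that p is taken to be the null-hypothesis value (written `p0` here), since the test is carried out under the assumption that the null is true. The function name and the `minimum` parameter are illustrative.

```python
def proportion_requirements_met(n, p0, minimum=5):
    """Check that np >= 5 and nq >= 5, with q = 1 - p and p set to p0."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return expected_successes >= minimum and expected_failures >= minimum

# Legislator survey: n = 200, null value 0.5.
print(proportion_requirements_met(200, 0.5))
# A small sample with a rare outcome: n = 50, null value 0.02 (hypothetical n).
print(proportion_requirements_met(50, 0.02))
```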
After determining the null and alternative hypotheses, the next step is to calculate the data
summary called a test statistic that measures the difference between the sample result and
the null value.
• Test statistic for a single proportion: $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$
• Test statistic for a single mean: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
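The two test statistics translate directly into code. In this sketch, p and µ in the formulas are the null-hypothesis values (named `p0` and `mu0` here, which are illustrative names). The proportion inputs come from the legislator survey in the introduction; the mean inputs are the summary values given in Example 15 later in these notes.

```python
from math import sqrt

def z_statistic(p_hat, p0, n):
    """Test statistic for a single proportion."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def t_statistic(x_bar, mu0, s, n):
    """Test statistic for a single mean (population SD unknown)."""
    return (x_bar - mu0) / (s / sqrt(n))

# Legislator survey: 59% of 200 favor the proposal, null value 0.50.
z = z_statistic(0.59, 0.50, 200)
# Example 15: x-bar = 104.8, s = 9.2, n = 23, null value 100.
t = t_statistic(104.8, 100, 9.2, 23)
print(f"z = {z:.3f}, t = {t:.3f}")
```

Both statistics measure how many standard errors the sample result lies from the null value.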
We then compute a probability, called a p-value, which is the probability of getting a value
of the test statistic that is at least as extreme as the test statistic obtained from the sample
data, assuming the null hypothesis is true.
In hypothesis testing, the objective is to decide if we should reject the null hypothesis in
favor of the alternative. We do this by comparing the p-value to a designated standard called
the level of significance for the test.
The details of how to find the p-value - the probability of a test statistic as extreme as
or more extreme than the observed test statistic - depend on the direction specified in the
alternative hypothesis:
• For a less than alternative hypothesis, find the probability that the test statistic z could
have been equal to or less than what it is.
• For a greater than alternative hypothesis, find the probability that the test statistic z
could have been equal to or greater than what it is.
• For a two-tailed alternative hypothesis, the p-value includes the probability areas in
both extremes of the distribution of the test statistic z.
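The three cases above can be sketched for a z test statistic using the standard normal distribution. The function name and the labels "less", "greater", and "two-sided" are illustrative choices, not notation from these notes.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative):
    """p-value for a z test statistic, by direction of the alternative."""
    std_normal = NormalDist()
    if alternative == "less":        # left-tailed: area at or below z
        return std_normal.cdf(z)
    if alternative == "greater":     # right-tailed: area at or above z
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":   # both tails beyond |z|
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# z for the legislator survey (p-hat = 0.59, null value 0.50, n = 200).
z = (0.59 - 0.50) / (0.50 * 0.50 / 200) ** 0.5
for alt in ("less", "greater", "two-sided"):
    print(alt, round(p_value_from_z(z, alt), 4))
```

Note that for the same test statistic the two-tailed p-value is twice the smaller of the two one-tailed p-values.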
The level of significance, α (alpha), is the value that is the borderline between when a
p-value is small enough to choose the alternative hypothesis, and when it is not small enough.
• If p-value ≤ α, we reject the null hypothesis. (“If the P is low, the null must go!”)
• If p-value > α, we fail to reject the null hypothesis.
The level of significance is chosen by the researcher BEFORE the experiment/study begins.
The phrase statistically significant is used to describe the results when the researcher has
decided that the p-value is small enough to decide in favor of the alternative hypothesis.
Before technology became so widespread, it was difficult or impossible to compute the p-value
in many circumstances, so researchers used a method called the rejection region approach.
The critical region, or rejection region, in a hypothesis test is the region of possible values
for the test statistic that would lead to rejection of the null hypothesis.
A boundary of a rejection region is called a critical value. The critical value is denoted as
z ∗ or t∗ (similar to what we found when computing confidence intervals), while the rejection
region is the area more extreme than this value.
We compare the test statistic to the rejection region. If the test statistic falls in the rejection
region, the null hypothesis is rejected.
Possible decisions:
• We reject the null hypothesis
• We fail to reject the null hypothesis
We never ‘ACCEPT’ the null hypothesis.
The p-value method and the rejection region method give equivalent conclusions.
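The equivalence of the two methods can be seen by running both on the same data. The sketch below handles only a right-tailed test of a single proportion; the significance level of 0.05 and the second sample proportion (0.54) are assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_proportion_test(p_hat, p0, n, alpha):
    """Run a right-tailed z-test with both decision methods."""
    std_normal = NormalDist()
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - std_normal.cdf(z)
    critical_value = std_normal.inv_cdf(1 - alpha)
    return {
        "z": z,
        "p_value": p_value,
        "critical_value": critical_value,
        "reject_by_p_value": p_value <= alpha,       # p-value method
        "reject_by_region": z >= critical_value,     # rejection region method
    }

# Legislator survey, then a hypothetical weaker result (54% in favor).
strong = right_tailed_proportion_test(0.59, 0.50, 200, alpha=0.05)
weak = right_tailed_proportion_test(0.54, 0.50, 200, alpha=0.05)
for result in (strong, weak):
    print(result["reject_by_p_value"], result["reject_by_region"])
```

The two decisions agree because the p-value is at most α exactly when the test statistic is at least as extreme as the critical value.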
Recall our criminal trial. The trial is the process whereby the jury obtains information
(sample data). The jury then deliberates about the evidence (the data analysis). Finally,
the jury either convicts the defendant (rejects the null hypothesis) or declares the defendant
not guilty (fails to reject the null hypothesis).
Note: the defendant is never declared innocent. That is, we never conclude that the null
hypothesis is true.
Notes:
• Remember that the hypotheses are always statements about a parameter, not about a
statistic. The whole point is to see what you can infer about an unknown parameter
value based on a sample statistic.
• The p-value is NOT the probability that the null hypothesis is true. Rather, it is the
probability of obtaining such an extreme sample result (or one even more extreme) if
the null hypothesis were true.
• Remember that we do NOT accept the null hypothesis; we fail to reject it.
1. Set up hypothesis:
5. OR Decision Rule: If the test statistic is MORE EXTREME than the critical value,
reject the null hypothesis.
6. Conclusion - State your decision and your conclusion in terms of the problem.
If you Reject H0: There is sufficient evidence to conclude [statement in Ha].
If you Fail to Reject H0: There is not sufficient evidence to conclude [statement in Ha].
Step 1: Hyp
Step 2: Req
Step 3: TS
Step 4: p-val
Step 5: Dec
Step 6: Conc
Step 4: CV
Step 5: Dec
The critical value for z with an area to the right of 0.10 is approximately 1.28. Notice that
the critical value for t is bigger than the corresponding critical value of z with an area to the
right of 0.10. This is because the t-distribution has more spread than the z-distribution.
If the degrees of freedom we desire are not available in the t-table, we follow the practice of
choosing the closest number of degrees of freedom available in the table. For example, if we
have 43 degrees of freedom, we use 45 degrees of freedom from the t-table.
In addition, the last row of the t-table provides the z-values from the standard normal
distribution. We use these values for situations where the degrees of freedom are more than
2,000. This is acceptable because the t-distribution starts to behave like the standard normal
distribution as n increases.
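The relationship between t and z critical values can be checked without a table. The sketch below uses only the Python standard library: it integrates the Student's t density numerically (Simpson's rule) and finds the critical value by bisection. This is an illustrative approach, not the method used to build the t-table, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(right_tail_area, df):
    """Critical value with the given area to its right (area < 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection on the upper-tail area
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > right_tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_critical = NormalDist().inv_cdf(0.90)  # area of 0.10 to the right
print(f"z: {z_critical:.3f}")
for df in (5, 22, 100, 2000):
    print(f"df = {df}: t = {t_critical(0.10, df):.3f}")
```

As the degrees of freedom grow, the printed t critical values should shrink toward the z critical value, which is why the last row of the t-table can hold the z-values.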
19.68
20.66
19.56
19.98
20.65
19.61
20.55
20.36
21.02
21.50
19.74
Example 15. To test H0 : µ = 100 versus H1 : µ ≠ 100, a simple random sample of size
n = 23 is obtained from a population that is known to be normally distributed.
(a) If x̄ = 104.8 and s = 9.2, compute the test statistic.
(b) If the researcher decides to test this hypothesis at the α = 0.01 level of significance,
determine the critical value(s).
2. What conditions are necessary for testing this hypothesis? Are those conditions met?