Statistics Module: 6th Week
Hypothesis Testing
By the end of this lesson, the student is expected to:
The method in which we select samples to learn more about characteristics in a given
population is called hypothesis testing.
Hypothesis testing is a systematic way to test claims or ideas about a group or
population.
Illustration:
Suppose we read an article stating that housewives in the Philippines watch an average
of 3 hours of teleserye on TV per day. To test whether this claim is true, we record the
time (in hours) that a group of 20 housewives (the sample), among all housewives in the
Philippines (the population), spend watching teleserye on TV. The mean we measure for
these 20 housewives is a sample mean. We can then compare the sample mean we obtain to
the population mean stated in the article.
I. Null Hypothesis
Denoted by $H_0$
The statement being tested
It represents what the experimenter doubts to be true
Must contain the condition of equality and must be written with the symbol
=, ≤, or ≥.
For the mean, the null hypothesis will be stated in one of these three possible
forms, where $\mu_0$ is the hypothesized value of the population mean:
$H_0: \mu = \mu_0$
$H_0: \mu \le \mu_0$
$H_0: \mu \ge \mu_0$
For the mean, the alternative hypothesis ($H_a$) will be stated in one of these three
possible forms:
$H_a: \mu \ne \mu_0$
$H_a: \mu > \mu_0$
$H_a: \mu < \mu_0$
If you are conducting a research study and you want to use a hypothesis
test to support your claim, the claim must be stated in such a way that it
becomes the alternative hypothesis, so it cannot contain the condition of
equality.
Example. If you believe that your brand of LED bulb lasts longer than the mean of 10
years for other brands, state the claim that $\mu > 10$, where $\mu$ is the mean life of
your LED bulb.
$H_0: \mu = 10$ vs. $H_a: \mu > 10$
Research Problem
Example.
Suppose that the government is deciding whether to approve the manufacturing of a new
drug. A drug is to be tested to find out if it can dissolve cholesterol deposits in the heart’s
arteries. A major cause of heart diseases is the hardening of the arteries caused by the
accumulation of cholesterol. The Bureau of Food and Drug (BFAD) will not allow the
marketing of the drug unless there is strong evidence that it is effective.
A random sample of 98 middle-aged men has been selected for the experiment. Each
man is given a standard daily dosage of the drug for 2 consecutive weeks. Their
To perform a statistical hypothesis test, we must first identify the parameter of interest,
and have some educated guess about the true value of the parameter. In the case of the
BFAD example, the possible states of the drug’s effectiveness are referred to as
hypotheses. Because the director wants only to know whether it is effective or not, either
of the following hypotheses applies.
To measure the effectiveness of the drug, we can look at the percent change in
cholesterol levels, before versus after taking the drug, among all middle-aged men who
take it.
The null hypothesis represents no practical change in cholesterol levels before and after
the drug use. In terms of the mean percent change $\mu$, we say
$H_0: \mu \le 30\%$
$H_a: \mu > 30\%$
Type I Error
o The mistake of rejecting the null hypothesis when it is true.
o It is not a miscalculation or procedural misstep; it is an actual error that can
occur when a rare event happens by chance.
o The probability of rejecting the null hypothesis when it is true is called the
significance level ($\alpha$).
o The value of $\alpha$ is typically predetermined, and very common choices are
$\alpha = 0.05$ and $\alpha = 0.01$.
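The meaning of the significance level can be checked by simulation. The sketch below is only an illustration under stated assumptions: the population values (`mu0 = 10`, `sigma = 2`), the sample size, the number of trials, and the function name `type_i_error_rate` are all hypothetical and do not come from the lesson. It repeatedly samples from a population where the null hypothesis really is true and counts how often a right-tailed z-test rejects it anyway.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(mu0=10.0, sigma=2.0, n=25, alpha=0.05, trials=2000, seed=1):
    """Estimate how often a right-tailed z-test rejects H0 when H0 is true.

    All default values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tail critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 (mu = mu0) really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - mu0) / (sigma / math.sqrt(n))
        if z > z_crit:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen significance level, which is what "probability of rejecting the null hypothesis when it is true" means in practice.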
Type II Error
o The mistake of failing to reject the null hypothesis when it is false.
o The symbol $\beta$ (beta) is used to represent the probability of a type II error.
o Example: failing to reject the null hypothesis that the mean life of your
LED bulb is $\mu = 10$ years when it is actually false (i.e., the mean is not 10
years).
o Example: BFAD does not allow the release of an effective medicine.
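Decision                 True Situation
                         $H_0$ is true        $H_0$ is false
Reject $H_0$             Type I error         Correct decision
Fail to reject $H_0$     Correct decision     Type II error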
o The experimenter is free to determine $\alpha$. If the test leads to the rejection of $H_0$,
the researcher can then conclude that there is sufficient evidence supporting
$H_a$ at the $\alpha$ level of significance.
o Usually, $\beta$ is unknown because it is hard to calculate. The common solution to
this difficulty is to "withhold judgment" if the test leads to the failure to reject $H_0$.
o $\alpha$ and $\beta$ are inversely related. For a fixed sample size $n$, as $\alpha$ decreases,
$\beta$ increases.
o In almost all statistical tests, both $\alpha$ and $\beta$ can be reduced by increasing the
sample size.
o Because of the inverse relationship of $\alpha$ and $\beta$, setting a very small $\alpha$ should also
be avoided if the researcher cannot afford a very large risk of committing a Type
II error.
o The choice of $\alpha$ usually depends on the consequences associated with making a
Type I error.
Common Choices of $\alpha$     Consequences of Type I error
0.01 or smaller                very serious
0.05                           moderately serious
0.10                           not too serious
o The usual practice in research and industry is to determine in advance the values
of $\alpha$ and $n$, so the value of $\beta$ is determined.
o Depending on the seriousness of a type I error, try to use the largest $\alpha$ that
you can tolerate.
o For type I errors with more serious consequences, select smaller values of $\alpha$.
Then choose a sample size $n$ as large as is reasonable, based on considerations
of time, cost, and other such relevant factors.
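The trade-off between the two error probabilities can be made concrete for one simple case. The sketch below assumes a right-tailed z-test with known population standard deviation and a specific "true" mean; the numbers (hypothesized mean 368, true mean 372.5, standard deviation 15) are hypothetical inputs picked for illustration, and `type_ii_error` is a name introduced here, not one from the lesson.

```python
import math
from statistics import NormalDist

def type_ii_error(alpha, n, mu0, mu_true, sigma):
    """Probability of a Type II error for a right-tailed z-test of H0: mu = mu0.

    mu_true is an assumed true mean greater than mu0 (hypothetical input).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # When the true mean is mu_true, Z is normal with this mean and variance 1.
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # H0 is not rejected whenever Z <= z_crit.
    return std_normal.cdf(z_crit - shift)

# Fixed n: the Type II error probability grows as alpha shrinks.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}, n = 25  -> beta = {type_ii_error(alpha, 25, 368, 372.5, 15):.3f}")

# Fixed alpha: the Type II error probability shrinks as n grows.
for n in (25, 50, 100):
    print(f"alpha = 0.05, n = {n:3d} -> beta = {type_ii_error(0.05, n, 368, 372.5, 15):.3f}")
```

The first loop shows the inverse relationship for a fixed sample size; the second shows that a larger sample lowers the Type II error probability without changing the significance level.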
The Test Statistic - a statistic computed from the sample data that is especially
sensitive to the differences between $H_0$ and $H_a$.
Region of Rejection or Critical Region - the set of all values of the test statistic which
will lead to the rejection of $H_0$. It depends on:
o the behavior of the test statistic if the null hypothesis were true.
o the alternative hypothesis: the location of the region of rejection depends on the
form of $H_a$.
o level of significance ($\alpha$): the smaller $\alpha$ is, the smaller the region of rejection.
Critical Value/s
o the value or values that separate the critical region from the values of the test
statistic that would not lead to rejection of the null hypothesis.
o It depends on the nature of the null hypothesis, the relevant sampling distribution,
and the level of significance.
Types of Tests
o Two-tailed Test. If we are primarily concerned with deciding whether the true
value of a population parameter is different from a specified value, then the test
should be two-tailed. For the case of the mean, we say $H_a: \mu \ne \mu_0$.
o Left-tailed Test. If we are primarily concerned with deciding whether the true
value of a parameter is less than a specified value, then the test should be
left-tailed. For the case of the proportion, we say $H_a: p < p_0$.
o Right-tailed Test. If we are primarily concerned with deciding whether the true
value of a parameter is greater than a specified value, then we should use the
right-tailed test. For the case of the standard deviation, we say $H_a: \sigma > \sigma_0$.
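How the critical value(s) and the region of rejection change with the type of test can be sketched for the standard normal case. This is a minimal illustration, assuming the test statistic follows the standard normal distribution when the null hypothesis is true; the function names `z_critical_values` and `in_rejection_region` are introduced here for the example.

```python
from statistics import NormalDist

def z_critical_values(alpha, tail):
    """Critical value(s) of a z-test; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "two":
        # A two-tailed test splits alpha equally between the two tails.
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_rejection_region(z, alpha, tail):
    """True if the observed statistic z falls in the region of rejection."""
    crit = z_critical_values(alpha, tail)
    if tail == "right":
        return z > crit[0]
    if tail == "left":
        return z < crit[0]
    return z < crit[0] or z > crit[1]

for tail in ("left", "right", "two"):
    values = ", ".join(f"{c:.3f}" for c in z_critical_values(0.05, tail))
    print(f"{tail}-tailed, alpha = 0.05: critical value(s) {values}")
```

Lowering the significance level pushes the critical values farther from zero, which is the sense in which a smaller level of significance gives a smaller region of rejection.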
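[Figures: rejection region and critical value(s) for a left-tailed test ($H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$), a right-tailed test ($H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$), and a two-tailed test ($H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$).]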
− Decision
0.01 Reject $H_0$.
0.05 Reject $H_0$.
Notes:
o Some texts say, “accept the null hypothesis” instead of “fail to reject the null
hypothesis.”
o Whether we use the term accept or fail to reject, we should recognize that we are
not proving the null hypothesis; we are merely saying that the sample evidence is
not strong enough to warrant rejection of the null hypothesis.
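Tests for the mean, $H_0: \mu = \mu_0$

Z-test ($\sigma$ known)
Test statistic: $Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
$H_a$                Region of rejection
$\mu > \mu_0$        $Z > z_{\alpha}$
$\mu < \mu_0$        $Z < -z_{\alpha}$
$\mu \ne \mu_0$      $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$

t-test ($\sigma$ unknown)
Test statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
$H_a$                Region of rejection
$\mu > \mu_0$        $t > t_{(\alpha, df)}$
$\mu < \mu_0$        $t < -t_{(\alpha, df)}$
$\mu \ne \mu_0$      $t < -t_{(\alpha/2, df)}$ or $t > t_{(\alpha/2, df)}$
where $df = n - 1$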
The above tests are exact $\alpha$-level tests for samples from a normal distribution. However,
they provide good approximate $\alpha$-level tests when the distribution is not normal, provided
that the sample size is $n > 30$.
If $\sigma$ is unknown and $n > 30$, use the Z-test but replace $\sigma$ by $s$, that is,
$Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
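The choice between the two statistics can be written as a small helper. This is a sketch of the rule of thumb stated above (use the population standard deviation when it is known; otherwise use the sample standard deviation, treating the statistic as Z when n > 30 and as t otherwise). The function name `one_sample_statistic` and its parameters are assumptions made for this example.

```python
import math

def one_sample_statistic(xbar, mu0, n, sigma=None, s=None):
    """Return (name, value) of the test statistic for H0: mu = mu0.

    sigma: population standard deviation, if known
    s:     sample standard deviation, used when sigma is unknown
    """
    if sigma is not None:
        # Population standard deviation known: z statistic.
        return "z", (xbar - mu0) / (sigma / math.sqrt(n))
    if s is None:
        raise ValueError("provide either sigma or s")
    # Sigma unknown: same formula with s; read it as z only for large samples.
    name = "z" if n > 30 else "t"
    return name, (xbar - mu0) / (s / math.sqrt(n))

print(one_sample_statistic(372.5, 368, 25, sigma=15))
print(one_sample_statistic(372.5, 368, 36, s=15))
print(one_sample_statistic(372.5, 368, 13, s=15))
```

The formula is the same in all three cases; what changes is the standard deviation that goes into it and the distribution used to find the critical value.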
z-test
Example 1. (One-Tail)
Does an average box of cereal contain more than 368 grams of cereal? A random
sample of 25 boxes showed $\bar{x} = 372.5$. The company has specified $\sigma$ to be 15
grams. Test at the $\alpha = 0.05$ level.
Solution:
$H_0: \mu = 368$
$H_a: \mu > 368$
Step 3. Identify the test statistic. Because we know the population standard deviation
$\sigma$, the test statistic is the z-test.
$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.5$
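Conclusion: Since $Z = 1.5$ does not exceed the critical value $z_{0.05} = 1.645$, we fail
to reject $H_0$. There is insufficient sample evidence to support the claim that the true
mean is more than 368 grams.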
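The arithmetic of the one-tailed cereal example above can be checked with a few lines of Python. This is a minimal sketch using only the numbers given in the problem; the variable names are chosen here for readability.

```python
import math
from statistics import NormalDist

# Values taken from the problem statement.
xbar, mu0, sigma, n, alpha = 372.5, 368, 15, 25, 0.05

# Test statistic for H0: mu = 368 against Ha: mu > 368.
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Right-tailed critical value: the point with area alpha to its right.
z_crit = NormalDist().inv_cdf(1 - alpha)

decision = "reject H0" if z > z_crit else "fail to reject H0"
print(f"z = {z:.2f}, critical value = {z_crit:.3f}: {decision}")
```

Because the statistic falls short of the critical value, the code reaches the same decision as the written conclusion.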
Example 2. (Two-Tail)
Does an average box of cereal contain 368 grams of cereal? A random sample of 25
boxes showed $\bar{x} = 372.5$. The company has specified $\sigma$ to be 15 grams. Test at the
$\alpha = 0.05$ level.
Solution:
$H_0: \mu = 368$
$H_a: \mu \ne 368$
Step 3. Identify the test statistic. Because we know the population standard deviation
$\sigma$, the test statistic is the z-test.
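Conclusion: The sample gives $Z = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.5$, which lies between
the two-tailed critical values $\pm z_{0.025} = \pm 1.96$, so we fail to reject $H_0$. There is
insufficient sample evidence to conclude that the true mean differs from 368 grams.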
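The two-tailed version of the cereal example can be checked the same way. The sketch below uses only the numbers given in the problem; the one change from a one-tailed test is that the significance level is split between the two tails, so the statistic is compared in absolute value with the larger critical value.

```python
import math
from statistics import NormalDist

# Values taken from the problem statement.
xbar, mu0, sigma, n, alpha = 372.5, 368, 15, 25, 0.05

# Test statistic for H0: mu = 368 against Ha: mu != 368.
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Two-tailed test: area alpha / 2 in each tail.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(f"z = {z:.2f}, critical values = +/-{z_crit:.3f}: {decision}")
```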
t-Test
Example 3. (One-Tail)
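Does an average box of cereal contain more than 368 grams of cereal? A random
sample of 36 boxes showed $\bar{x} = 372.5$ and $s = 15$ grams. Test at the $\alpha = 0.01$
level.
Solution
$H_0: \mu = 368$
$H_a: \mu > 368$
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{36}} = 1.80$
$\alpha = 0.01$; $n = 36$, $df = n - 1 = 35$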
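Conclusion: Since $t = 1.80$ does not exceed the critical value $t_{(0.01, 35)} \approx 2.438$,
we fail to reject $H_0$. There is insufficient sample evidence to support the claim that the
true mean is more than 368 grams.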
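The t-test example above can also be checked numerically. Python's standard library has no t distribution, so this sketch stands in for a t-table by integrating the t density with Simpson's rule and locating the critical value by bisection; that numerical approach and the helper names (`t_pdf`, `t_upper_tail`, `t_critical`) are choices made for this illustration, not part of the lesson.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0, via Simpson's rule on the interval [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so the area to the right of 0 is 0.5.
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Right-tail critical value: the t whose upper-tail area equals alpha."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2

# Values taken from the problem statement.
xbar, mu0, s, n, alpha = 372.5, 368, 15, 36, 0.01
df = n - 1
t = (xbar - mu0) / (s / math.sqrt(n))
t_crit = t_critical(alpha, df)

decision = "reject H0" if t > t_crit else "fail to reject H0"
print(f"t = {t:.2f}, df = {df}, critical value = {t_crit:.3f}: {decision}")
```

In practice the critical value would be read from a t-table or obtained from a statistics package; the numerical search is used here only to keep the example self-contained.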
Example 4.
Cooking oils that are low in both cholesterol and saturated fats are often recommended
for people who are trying to lower their blood cholesterol level or to lose weight. Many
cooking oils that have no cholesterol still have saturated fat contents of 6% to 18%.
Cooking oil made from soybeans has been advertised as containing 15% saturated fats.
A dietitian thinks that the percentage of saturated fats is greater than 15% and randomly
selects 13 bottles of soybean cooking oil for testing. These bottles contain the following
percentages of saturated fats:
On the basis of this sample, can the dietitian conclude that the level of saturated fats in
cooking oil made from soybeans is greater than 15% at the 0.01 level of significance?
(Assume that the population is normally distributed.)
What would happen if, instead of taking a sample of size 13, the dietitian takes a sample
of size 39? Include the following additional observations in the original data set and test
the same hypotheses at the 0.01 level of significance.