Chapter 8: Introduction to Hypothesis Testing
8.1 Inferential Statistics and Hypothesis Testing
8.2 Four Steps to Hypothesis Testing
8.3 Hypothesis Testing and Sampling Distributions
8.4 Making a Decision: Types of Error
8.5 Testing a Research Hypothesis: Examples Using the z Test
8.6 Research in Focus: Directional Versus Nondirectional Tests
8.7 Measuring the Size of an Effect: Cohen’s d
8.8 Effect Size, Power, and Sample Size
8.9 Additional Factors That Increase Power
8.10 SPSS in Focus: A Preview for Chapters 9 to 18
8.11 APA in Focus: Reporting the Test Statistic and Effect Size

LEARNING OBJECTIVES
After reading this chapter, you should be able to:
1 Identify the four steps of hypothesis testing.
2 Define null hypothesis, alternative hypothesis, level of significance, test statistic, p value, and statistical significance.
3 Define Type I error and Type II error, and identify the type of error that researchers control.
4 Calculate the one-independent sample z test and interpret the results.
5 Distinguish between a one-tailed and two-tailed test, and explain why a Type III error is possible only with one-tailed tests.
6 Explain what effect size measures and compute a Cohen’s d for the one-independent sample z test.
FIGURE 8.1 The sampling distribution for a population mean is equal to 1,000.
If 1,000 is the correct population mean, then we know that, on average, the
sample mean will equal 1,000 (the population mean). Using the empirical rule,
we know that about 95% of all samples selected from this population will have a
sample mean that falls within two standard deviations (SD) of the mean. It is
therefore unlikely (less than a 5% probability) that we will measure a sample
mean beyond 2 SD from the population mean, if the population mean is indeed
correct. [Figure labels: µ = 1000; we expect the sample mean to be equal to the
population mean.]
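The empirical rule cited in Figure 8.1 can be checked numerically. The sketch below is only an illustration: it assumes the sampling distribution is exactly normal and uses Python's standard library class `statistics.NormalDist`; the variable names are our own.

```python
from statistics import NormalDist

# Assumption: the sampling distribution is normal, so we can work on the
# standard normal scale (mean 0, SD 1) regardless of the actual mean and SD.
std_normal = NormalDist()

# Proportion of sample means expected within 2 SD of the population mean.
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

# Proportion expected beyond 2 SD (both tails combined).
beyond_2sd = 1 - within_2sd

print(round(within_2sd, 4))  # 0.9545
print(round(beyond_2sd, 4))  # 0.0455
```

The result, about 95% within 2 SD and less than 5% beyond, matches the figure caption.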
suppose we read an article stating that children in the United States watch an average
of 3 hours of TV per week. To test whether this claim is true, we record the time
(in hours) that a group of 20 American children (the sample), among all children in
the United States (the population), watch TV. The mean we measure for these 20
children is a sample mean. We can then compare the sample mean we select to the
population mean stated in the article.
2. We select a criterion upon which we decide that the claim being tested is
true or not. For example, the claim is that children watch 3 hours of TV per
week. Most samples we select should have a mean close to or equal to
3 hours if the claim we are testing is true. So at what point do we decide that
the discrepancy between the sample mean and 3 is so big that the claim
we are testing is likely not true? We answer this question in this step of
hypothesis testing.
3. Select a random sample from the population and measure the sample mean.
For example, we could select 20 children and measure the mean time (in
hours) that they watch TV per week.
4. Compare what we observe in the sample to what we expect to observe if
the claim we are testing is true. We expect the sample mean to be around
3 hours. If the discrepancy between the sample mean and population mean
is small, then we will likely decide that the claim we are testing is indeed
true. If the discrepancy is too large, then we will likely decide to reject the
claim as being not true.
NOTE: Hypothesis testing is the method of testing whether claims or hypotheses
regarding a population are likely to be true.
LEARNING CHECK 1
1. On average, what do we expect the sample mean to be equal to?
2. True or false: Researchers select a sample from a population to learn more about
characteristics in that sample.
Step 1: State the hypotheses. We begin by stating the value of a population mean
in a null hypothesis, which we presume is true. For the children watching TV
example, we state the null hypothesis that children in the United States watch an
average of 3 hours of TV per week. This is a starting point so that we can decide
whether this is likely to be true, similar to the presumption of innocence in a
courtroom. When a defendant is on trial, the jury starts by assuming that the
defendant is innocent. The basis of the decision is to determine whether this
assumption is true. Likewise, in hypothesis testing, we start by assuming that the
hypothesis or claim we are testing is true. This is stated in the null hypothesis. The
basis of the decision is to determine whether this assumption is likely to be true.
DEFINITION: The null hypothesis (H0), stated as the null, is a statement about a
population parameter, such as the population mean, that is assumed to be true.
The null hypothesis is a starting point. We will test whether the value
stated in the null hypothesis is likely to be true.
Keep in mind that the only reason we are testing the null hypothesis is because
we think it is wrong. We state what we think is wrong about the null hypothesis in
an alternative hypothesis. For the children watching TV example, we may have
reason to believe that children watch more than (>) or less than (<) 3 hours of TV
per week. When we are uncertain of the direction, we can state that the value in the
null hypothesis is not equal to (≠) 3 hours.
In a courtroom, since the defendant is assumed to be innocent (this is the null
hypothesis so to speak), the burden is on a prosecutor to conduct a trial to show
evidence that the defendant is not innocent. In a similar way, we assume the null
hypothesis is true, placing the burden on the researcher to conduct a study to show
evidence that the null hypothesis is unlikely to be true. Regardless, we always make
a decision about the null hypothesis (that it is likely or unlikely to be true). The
alternative hypothesis is needed for Step 2.
NOTE: In hypothesis testing, we conduct a study to test whether the null hypothesis
is likely to be true.
1. Decisions are made about the null hypothesis. Using the courtroom
analogy, a jury decides whether a defendant is guilty or not guilty. The
jury does not make a decision of guilty or innocent because the defendant
is assumed to be innocent. All evidence presented in a trial is to show
that a defendant is guilty. The evidence either shows guilt (decision:
guilty) or does not (decision: not guilty). In a similar way, the null
hypothesis is assumed to be correct. A researcher conducts a study showing
evidence that this assumption is unlikely (we reject the null hypothesis)
or fails to do so (we retain the null hypothesis).
2. The bias is to do nothing. Using the courtroom analogy, for the same
reason the courts would rather let the guilty go free than send the innocent
to prison, researchers would rather do nothing (accept previous
notions of truth stated by a null hypothesis) than make statements that
are not correct. For this reason, we assume the null hypothesis is correct,
thereby placing the burden on the researcher to demonstrate that the
null hypothesis is not likely to be correct.
Step 2: Set the criteria for a decision. To set the criteria for a decision, we state the
level of significance for a test. This is similar to the criterion that jurors use in a
criminal trial. Jurors decide whether the evidence presented shows guilt beyond a
reasonable doubt (this is the criterion). Likewise, in hypothesis testing, we collect
data to show that the null hypothesis is not true, based on the likelihood of selecting
a sample mean from a population (the likelihood is the criterion). The likelihood or
level of significance is typically set at 5% in behavioral research studies. When the
probability of obtaining a sample mean is less than 5% if the null hypothesis were
true, then we conclude that the sample we selected is too unlikely and so we reject
the null hypothesis.
FIGURE 8.2 [Figure: sampling distribution centered at µ = 3, where we expect the
sample mean to be equal to the population mean. Lower tail: H1: Children watch
less than 3 hours of TV per week. Upper tail: H1: Children watch more than
3 hours of TV per week.]
sample mean that is beyond 2 SD from the population mean. For the children
watching TV example, we can look for the probability of obtaining a sample mean
beyond 2 SD in the upper tail (greater than 3), the lower tail (less than 3), or both
tails (not equal to 3). Figure 8.2 shows that the alternative hypothesis is used to
determine which tail or tails to place the level of significance for a hypothesis test.
Step 3: Compute the test statistic. Suppose we measure a sample mean equal to
4 hours per week that children watch TV. To make a decision, we need to evaluate
how likely this sample outcome is, if the population mean stated by the null
hypothesis (3 hours per week) is true. We use a test statistic to determine this
likelihood. Specifically, a test statistic tells us how far, or how many standard
deviations, a sample mean is from the population mean. The larger the value of the
test statistic, the further the distance, or number of standard deviations, a sample
mean is from the population mean stated in the null hypothesis. The value of the
test statistic is used to make a decision in Step 4.
NOTE: The level of significance in hypothesis testing is the criterion we use to
decide whether the value stated in the null hypothesis is likely to be true.
Step 4: Make a decision. We use the value of the test statistic to make a decision
about the null hypothesis. The decision is based on the probability of obtaining a
sample mean, given that the value stated in the null hypothesis is true. If the
probability of obtaining a sample mean is less than 5% when the null hypothesis is
true, then the decision is to reject the null hypothesis. If the probability of obtaining
a sample mean is greater than 5% when the null hypothesis is true, then the
decision is to retain the null hypothesis. In sum, there are two decisions a researcher
can make:
NOTE: We use the value of the test statistic to make a decision regarding the null
hypothesis.
1. Reject the null hypothesis. The sample mean is associated with a low
probability of occurrence when the null hypothesis is true.
2. Retain the null hypothesis. The sample mean is associated with a high
probability of occurrence when the null hypothesis is true.
The probability of obtaining a sample mean, given that the value stated in the
null hypothesis is true, is stated by the p value. The p value is a probability: It varies
between 0 and 1 and can never be negative. In Step 2, we stated the criterion or
probability of obtaining a sample mean at which point we will decide to reject the
value stated in the null hypothesis, which is typically set at 5% in behavioral research.
To make a decision, we compare the p value to the criterion we set in Step 2.
When the p value is less than 5% (p < .05), we reject the null hypothesis. We will
refer to p < .05 as the criterion for deciding to reject the null hypothesis, although
note that when p = .05, the decision is also to reject the null hypothesis. When the
p value is greater than 5% (p > .05), we retain the null hypothesis. The decision to
reject or retain the null hypothesis is called significance. When the p value is less
than .05, we reach significance; the decision is to reject the null hypothesis. When
the p value is greater than .05, we fail to reach significance; the decision is to retain
the null hypothesis. Figure 8.3 shows the four steps of hypothesis testing.
NOTE: Researchers make decisions regarding the null hypothesis. The decision can
be to retain the null (p > .05) or reject the null (p < .05).
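The decision rule in Step 4 can be written as a short function. This is a minimal sketch, not a standard routine: the function name `decide` and its return strings are our own, and the default criterion of .05 follows the convention described above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the decision about the null hypothesis for a given p value."""
    # A p value is a probability, so it must lie between 0 and 1.
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    # The text treats p = .05 as a decision to reject as well, hence <=.
    if p_value <= alpha:
        return "reject the null"
    return "retain the null"


print(decide(0.03))  # reject the null
print(decide(0.20))  # retain the null
```

Rejecting the null corresponds to reaching significance; retaining it corresponds to failing to reach significance.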
LEARNING CHECK 2
1. State the four steps of hypothesis testing.
2. The decision in hypothesis testing is to retain or reject which hypothesis: the
null or alternative hypothesis?
3. The criterion or level of significance in behavioral research is typically set at
what probability value?
4. A test statistic is associated with a p value less than .05 or 5%. What is the
decision for this hypothesis test?
5. If the null hypothesis is rejected, then did we reach significance?
Answers: 1. Step 1: State the null and alternative hypothesis. Step 2: Determine the level of significance.
Step 3: Compute the test statistic. Step 4: Make a decision; 2. Null; 3. A .05 or 5% likelihood for obtaining a sample
outcome; 4. Reject the null; 5. Yes.
FIGURE 8.3
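TABLE 8.1 A review of the notation used for the mean, variance, and standard deviation in population,
sample, and sampling distributions.
                      Population    Sample             Sampling distribution
Variance              $\sigma^2$    $s^2$ or $SD^2$    $\sigma_M^2 = \frac{\sigma^2}{n}$
Standard deviation    $\sigma$      $s$ or $SD$        $\sigma_M = \frac{\sigma}{\sqrt{n}}$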
TABLE 8.2 A review of the key differences between population, sample, and sampling distributions.
                      Population               Sample                        Sampling distribution
What is it?           Scores of all persons    Scores of a select portion    All possible sample means that can
                      in a population          of persons from the           be drawn, given a certain sample
                                               population                    size
What is the shape?    Could be any shape       Could be any shape            Normally distributed
LEARNING CHECK 3
1. For the following statement, write increases or decreases as an answer. The
likelihood that we reject the null hypothesis (increases or decreases):
a. The closer the value of a sample mean is to the value stated by the null
hypothesis?
b. The further the value of a sample mean is from the value stated in the null
hypothesis?
2. A researcher selects a sample of 49 students to test the null hypothesis that the
average student exercises 90 minutes per week. What is the mean for the
sampling distribution for this population of interest if the null hypothesis is true?
Answers: 1. (a) Decreases, (b) Increases; 2. 90 minutes.
TABLE 8.3 Four outcomes for making a decision. The decision can be either correct (correctly reject
or retain null) or wrong (incorrectly reject or retain null).
DEFINITION: Type II error, or beta (β) error, is the probability of retaining a null
hypothesis that is actually false.
DEFINITION: Type I error is the probability of rejecting a null hypothesis that is
actually true. Researchers directly control for the probability of committing this
type of error.
An alpha (α) level is the level of significance or criterion for a hypothesis test.
It is the largest probability of committing a Type I error that we will allow and
still decide to reject the null hypothesis.
Since we assume the null hypothesis is true, we control for Type I error by stating a
level of significance. The level we set, called the alpha level (symbolized as α), is the
largest probability of committing a Type I error that we will allow and still decide to
reject the null hypothesis. This criterion is usually set at .05 (α = .05), and we compare
the alpha level to the p value. When the p value is less than the alpha level of 5%
(p < .05), we decide to reject the null hypothesis; otherwise, we retain the null hypothesis.
NOTE: Researchers directly control for the probability of a Type I error by stating an
alpha (α) level.
The correct decision is to reject a false null hypothesis. There is always some
probability that we decide that the null hypothesis is false when it is indeed false. This
decision is called the power of the decision-making process. It is called power because
it is the decision we aim for. Remember that we are only testing the null hypothesis
because we think it is wrong. Deciding to reject a false null hypothesis, then, is the
power, inasmuch as we learn the most about populations when we accurately reject
false notions of truth. This decision is the most published result in behavioral research.
NOTE: The power in hypothesis testing is the probability of correctly rejecting the
value stated in the null hypothesis.
LEARNING CHECK 4
1. What type of error do we directly control?
Recall that we can state one of three alternative hypotheses: A population mean
is greater than (>), less than (<), or not equal (≠) to the value stated in a null
hypothesis. The alternative hypothesis determines which tail of a sampling distribution to
place the level of significance, as illustrated in Figure 8.2. In this section, we will use
an example for each type of alternative hypothesis.
NOTE: The z test is used to test hypotheses about a population mean when the
population variance is known.
NONDIRECTIONAL, TWO-TAILED
HYPOTHESIS TESTS (H1: ≠)
In Example 8.1, we will use the z test for a nondirectional, or two-tailed test,
where the alternative hypothesis is stated as not equal to (≠) the null hypothesis. For
this test, we will place the level of significance in both tails of the sampling
distribution. We are therefore interested in any alternative from the null hypothesis. This is
the most common alternative hypothesis tested in behavioral science.
NOTE: Nondirectional tests are used to test hypotheses when we are interested in
any alternative from the null hypothesis.
EXAMPLE 8.1
Templer and Tomeo (2002) reported that the population mean score on the
quantitative portion of the Graduate Record Examination (GRE) General Test for
students taking the exam between 1994 and 1997 was 558 ± 139 (µ ± σ). Suppose we
select a sample of 100 participants (n = 100). We record a sample mean equal to 585
(M = 585). Compute the one-independent sample z test for whether or not we will
retain the null hypothesis (µ = 558) at a .05 level of significance (α = .05).
Step 1: State the hypotheses. The population mean is 558, and we are testing
whether the null hypothesis is (=) or is not (≠) correct:
H0: µ = 558 Mean test scores are equal to 558 in the population.
H1: µ ≠ 558 Mean test scores are not equal to 558 in the population.
Step 2: Set the criteria for a decision. The level of significance is .05, which makes the
alpha level α = .05. To locate the probability of obtaining a sample mean from a given
population, we use the standard normal distribution. We will locate the z scores in a
standard normal distribution that are the cutoffs, or critical values, for sample mean
values with less than a 5% probability of occurrence if the value stated in the null
(µ = 558) is true.
DEFINITION: A critical value is a cutoff value that defines the boundaries beyond which
less than 5% of sample means can be obtained if the null hypothesis is true.
Sample means obtained beyond a critical value will result in a decision to
reject the null hypothesis.
Splitting α in half: $\frac{\alpha}{2} = \frac{.05}{2} = .0250$ in each tail
TABLE 8.4 Critical values for one- and two-tailed tests at three commonly used levels of significance.
FIGURE 8.4 The critical values (±1.96) for a nondirectional (two-tailed) test
with a .05 level of significance.
To locate the critical values, we use the unit normal table given in Table B1 in Appendix
B and look up the proportion .0250 toward the tail in column C. This value, .0250, is
listed for a z-score equal to z = 1.96. This is the critical value for the upper tail of the
standard normal distribution. Since the normal distribution is symmetrical, the critical
value in the bottom tail will be the same distance below the mean, or z = −1.96. The
regions beyond the critical values, displayed in Figure 8.4, are called the rejection
regions. If the value of the test statistic falls in these regions, then the decision is to
reject the null hypothesis; otherwise, we retain the null hypothesis.
NOTE: For two-tailed tests, the alpha is split in half and placed in each tail of a
standard normal distribution.
NOTE: A critical value marks the cutoff for the rejection region.
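The table lookup described above can also be done in software. The sketch below is one possible approach rather than the method used in this book: it assumes Python's standard library class `statistics.NormalDist`, and the helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha: float = 0.05, tails: int = 2) -> float:
    """Positive critical z value for a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Alpha is split across the tails, then we find the z score that
    # leaves that proportion in the upper tail of the standard normal.
    return NormalDist().inv_cdf(1 - alpha / tails)


two_tailed = critical_value(0.05, tails=2)
one_tailed = critical_value(0.05, tails=1)

print(round(two_tailed, 2))  # 1.96
print(round(one_tailed, 3))  # 1.645
```

Because the normal distribution is symmetrical, the lower-tail critical value is the negative of the value returned.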
DEFINITION: The rejection region is the region beyond a critical value in a hypothesis test.
When the value of a test statistic is in the rejection region, we decide to reject
the null hypothesis; otherwise, we retain the null hypothesis.
Step 3: Compute the test statistic. Step 2 sets the stage for making a decision because the
criterion is set. The probability is less than 5% that we will obtain a sample mean that is at
least 1.96 standard deviations above or below the value of the population mean stated in
the null hypothesis. In this step, we will compute a test statistic to determine whether the
sample mean we selected is beyond or within the critical values we stated in Step 2.
The test statistic for a one–independent sample z test is called the z statistic. The
z statistic converts any sampling distribution into a standard normal distribution.
The z statistic is therefore a z transformation. The solution of the formula gives the
number of standard deviations, or z-scores, that a sample mean falls above or below
the population mean stated in the null hypothesis. We can then compare the value
of the z statistic, called the obtained value, to the critical values we determined in
Step 2. The z statistic formula is the sample mean minus the population mean
stated in the null hypothesis, divided by the standard error of the mean:
z statistic: $z_{obt} = \frac{M - \mu}{\sigma_M}$, where $\sigma_M = \frac{\sigma}{\sqrt{n}}$.
The obtained value is the value of a test statistic. This value is compared to
the critical value(s) of a hypothesis test to make a decision. When the obtained
value exceeds a critical value, we decide to reject the null hypothesis;
otherwise, we retain the null hypothesis.
To calculate the z statistic, first compute the standard error ($\sigma_M$), which is the
denominator for the z statistic:
$\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{139}{\sqrt{100}} = 13.9.$
Then compute the z statistic by substituting the values of the sample mean,
M = 585; the population mean stated by the null hypothesis, µ = 558; and the
standard error we just calculated, $\sigma_M$ = 13.9:
$z_{obt} = \frac{M - \mu}{\sigma_M} = \frac{585 - 558}{13.9} = 1.94.$
NOTE: The z statistic measures the number of standard deviations, or z-scores, that
a sample mean falls above or below the population mean stated in the null hypothesis.
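The same computation can be reproduced in a few lines of code. This is a minimal sketch using the values from Example 8.1; the function name `z_statistic` and its parameter names are our own.

```python
import math


def z_statistic(sample_mean: float, null_mean: float,
                sigma: float, n: int) -> float:
    """One-independent sample z statistic: (M - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)  # denominator of the z statistic
    return (sample_mean - null_mean) / standard_error


# Values from Example 8.1: M = 585, mu = 558, sigma = 139, n = 100.
z_obt = z_statistic(sample_mean=585, null_mean=558, sigma=139, n=100)

print(round(z_obt, 2))  # 1.94
```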
Step 4: Make a decision. To make a decision, we compare the obtained value to the
critical values. We reject the null hypothesis if the obtained value exceeds a critical
value. Figure 8.5 shows that the obtained value ($z_{obt}$ = 1.94) is less than the critical value;
it does not fall in the rejection region. The decision is to retain the null hypothesis.
The probability of obtaining $z_{obt}$ = 1.94 is stated by the p value. To locate the p value
or probability of obtaining the z statistic, we refer to the unit normal table in
Table B1 in Appendix B. Look for a z score equal to 1.94 in column A, then locate
the probability toward the tail in column C. The value is .0262. Finally, multiply the
value given in column C times the number of tails for alpha. Since this is a two-
tailed test, we multiply .0262 times 2: p = (.0262) × 2 tails = .0524. Table 8.5
summarizes how to determine the p value for one- and two-tailed tests. (We will
compute one-tailed tests in Examples 8.2 and 8.3.)
TABLE 8.5 To find the p value for the z statistic, find its probability (toward the tail) in the unit normal
table and multiply this probability times the number of tails for alpha.
Number of tails        1     2
Probability            p     p
p value calculation    1p    2p
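The rule summarized in Table 8.5 can be sketched in code. This is an illustration under stated assumptions, not a definitive implementation: it uses Python's `statistics.NormalDist` in place of the unit normal table, the function name `p_value` is hypothetical, and for one-tailed tests it assumes the obtained value falls in the tail where alpha was placed.

```python
from statistics import NormalDist


def p_value(z_obt: float, tails: int) -> float:
    """Tail probability of z_obt, multiplied by the number of tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Probability toward the tail (column C of a unit normal table).
    tail_probability = 1 - NormalDist().cdf(abs(z_obt))
    return tails * tail_probability


print(round(p_value(1.94, tails=1), 4))  # 0.0262
print(round(p_value(1.94, tails=2), 4))  # 0.0524
```

The two-tailed result, p = .0524, is the value obtained in Example 8.1.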
We found in Example 8.1 that if the null hypothesis were true, then the probability is
p = .0524 that we could have selected this sample mean from this population. The
criterion we set in Step 2 was that the probability must be less than 5% that we obtain
a sample mean, if the null hypothesis were true. Since p is greater than 5%, we decide to
retain the null hypothesis. We conclude that the mean score on the GRE General
Test in this population is 558 (the value stated in the null hypothesis).
EXAMPLE 8.2
Using the same study from Example 8.1, Templer and Tomeo (2002) reported that the
population mean on the quantitative portion of the GRE General Test for students
taking the exam between 1994 and 1997 was 558 ± 139 (µ ± σ). Suppose we select a
sample of 100 students enrolled in an elite private school (n = 100). We hypothesize
that students at this elite school will score higher than the general population. We
record a sample mean equal to 585 (M = 585), same as measured in Example 8.1.
Compute the one-independent sample z test at a .05 level of significance.
Step 1: State the hypotheses. The population mean is 558, and we are testing
whether the alternative is greater than (>) this value:
H0: µ = 558 Mean test scores are equal to 558 in the population of students
at the elite school.
H1: µ > 558 Mean test scores are greater than 558 in the population of
students at the elite school.
Step 2: Set the criteria for a decision. The level of significance is .05, which makes
the alpha level α = .05. To determine the critical value for an upper-tail critical test,
we locate the probability .0500 toward the tail in column C in the unit normal
table. The z-score associated with this probability is between z = 1.64 and z = 1.65.
The average of these z-scores is z = 1.645. This is the critical value or cutoff for the
rejection region. Figure 8.6 shows that for this test, we place all the value of alpha in
the upper tail of the standard normal distribution.
Step 3: Compute the test statistic. Step 2 sets the stage for making a decision because
the criterion is set. The probability is less than 5% that we will obtain a sample
mean that is at least 1.645 standard deviations above the value of the population
mean stated in the null hypothesis. In this step, we will compute a test statistic to
determine whether or not the sample mean we selected is beyond the critical value
we stated in Step 2.
NOTE: For one-tailed tests, the alpha level is placed in a single tail of a distribution.
For upper-tail critical tests, the alpha level is placed above the mean in the upper tail.
FIGURE 8.6 [Figure labels: rejection region, α = .05.]
The test statistic does not change from that in Example 8.1. We are testing the same
population, and we measured the same value of the sample mean. We changed only
the location of the rejection region in Step 2. The z statistic is the same computation
as that shown in Example 8.1:
$z_{obt} = \frac{M - \mu}{\sigma_M} = \frac{585 - 558}{13.9} = 1.94.$
Step 4: Make a decision. To make a decision, we compare the obtained value to the
critical value. We reject the null hypothesis if the obtained value exceeds the critical
value. Figure 8.7 shows that the obtained value ($z_{obt}$ = 1.94) is greater than the
critical value; it falls in the rejection region. The decision is to reject the null
hypothesis. The p value for this test is .0262 (p = .0262). We do not double the
p value for one-tailed tests.
We found in Example 8.2 that if the null hypothesis were true, then the probability is
p = .0262 that we could have selected this sample mean from this population. The
criterion we set in Step 2 was that the probability must be less than 5% that we obtain
a sample mean, if the null hypothesis were true. Since p is less than 5%, we decide to reject
the null hypothesis. We decide that the mean score on the GRE General Test in this
population is not 558, which was the value stated in the null hypothesis. Also,
notice that we made two different decisions using the same data in Examples 8.1
and 8.2. This outcome is explained further in Section 8.6.
FIGURE 8.7 Since the obtained value reaches the rejection region, we decide to
reject the null hypothesis. [Figure labels: rejection region, α = .05; retain the
null hypothesis; obtained value, 1.94.]
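To see why the same data led to different decisions in Examples 8.1 and 8.2, it helps to compare the obtained value against both critical values side by side. The following is a rough sketch assuming Python's `statistics.NormalDist`; the variable names are our own.

```python
from statistics import NormalDist

z_obt = 1.94   # obtained value from Examples 8.1 and 8.2
alpha = 0.05   # level of significance used in both examples

std_normal = NormalDist()

# Two-tailed test: alpha is split in half, so the cutoff is farther out.
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96
# Upper-tail critical test: all of alpha sits in the upper tail.
upper_tail_cutoff = std_normal.inv_cdf(1 - alpha)       # about 1.645

reject_two_tailed = abs(z_obt) > two_tailed_cutoff
reject_upper_tail = z_obt > upper_tail_cutoff

print(reject_two_tailed)  # False: retain the null (Example 8.1)
print(reject_upper_tail)  # True: reject the null (Example 8.2)
```

The obtained value falls between the two cutoffs, which is why only the one-tailed test reaches significance.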
EXAMPLE 8.3
Using the same study from Example 8.1, Templer and Tomeo (2002) reported that
the population mean on the quantitative portion of the GRE General Test for those
taking the exam between 1994 and 1997 was 558 ± 139 (µ ± σ). Suppose we select a
sample of 100 students enrolled in a school with low funding and resources (n = 100).
We hypothesize that students at this school will score lower than the general
population. We record a sample mean equal to 585 (M = 585), same as measured in
Examples 8.1 and 8.2. Compute the one-independent sample z test at a .05 level of
significance.
Step 1: State the hypotheses. The population mean is 558, and we are testing
whether the alternative is less than (<) this value:
H0: µ = 558 Mean test scores are equal to 558 in the population at this
school.
H1: µ < 558 Mean test scores are less than 558 in the population at this
school.
Step 2: Set the criteria for a decision. The level of significance is .05, which makes
the alpha level α = .05. To determine the critical value for a lower-tail critical test, we
locate the probability .0500 toward the tail in column C in the unit normal table.
The z-score associated with this probability is again z = 1.645. Since this test is a
lower-tail critical test, we place the critical value the same distance below the mean:
The critical value for this test is z = −1.645. All of the alpha level is placed in the
lower tail of the distribution beyond the critical value. Figure 8.8 shows the standard
normal distribution, with the rejection region beyond the critical value.
NOTE: For one-tailed tests, the alpha level is placed in a single tail of the distribution.
For lower-tail critical tests, the alpha is placed below the mean in the lower tail.
Step 3: Compute the test statistic. Step 2 sets the stage for making a decision because
the criterion is set. The probability is less than 5% that we will obtain a sample
mean that is at least 1.645 standard deviations below the value of the population
mean stated in the null hypothesis. In this step, we will compute a test statistic to
determine whether or not the sample mean we selected is beyond the critical value
we stated in Step 2.
The test statistic does not change from that used in Example 8.1. We are testing the
same population, and we measured the same value of the sample mean. We changed
only the location of the rejection region in Step 2. The z statistic is the same
computation as that shown in Example 8.1:
$z_{obt} = \frac{M - \mu}{\sigma_M} = \frac{585 - 558}{13.9} = 1.94.$
FIGURE 8.8 [Figure labels: rejection region, α = .05.]
Step 4: Make a decision. To make a decision, we compare the obtained value to the
critical value. We reject the null hypothesis if the obtained value exceeds the critical
value. Figure 8.9 shows that the obtained value ($z_{obt}$ = +1.94) does not exceed the
critical value. Instead, the value we obtained is located in the opposite tail. The
decision is to retain the null hypothesis.
FIGURE 8.9 [Figure labels: rejection region, α = .05; retain the null hypothesis.]
The decision in Example 8.3 was to retain the null hypothesis, although if we
placed the rejection region in the upper tail (as we did in Example 8.2), we would
have decided to reject the null hypothesis. We anticipated that scores would be
worse, and instead, they were better than the value stated in the null hypothesis.
When we fail to reject the null hypothesis because we placed the rejection region in
the wrong tail, we commit a Type III error (Kaiser, 1960).
NOTE: A Type III error occurs when the rejection region is located in the wrong tail.
This type of error is only possible for one-tailed tests.
DEFINITION: A Type III error occurs with one-tailed tests, where the researcher decides
to retain the null hypothesis because the rejection region was located in the wrong tail.
The “wrong tail” refers to the opposite tail from where a difference was
observed and would have otherwise been significant.
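The situation in Example 8.3 can be flagged programmatically. The sketch below is only illustrative: it assumes Python's `statistics.NormalDist`, and the variable names (including `type_iii_error`) are hypothetical labels for the definition above.

```python
from statistics import NormalDist

z_obt = 1.94   # obtained value from Example 8.3
alpha = 0.05   # level of significance

# Lower-tail critical test: the cutoff is about -1.645.
lower_cutoff = NormalDist().inv_cdf(alpha)
# The same distance above the mean marks the opposite (upper) tail.
upper_cutoff = -lower_cutoff

rejected_in_lower_tail = z_obt < lower_cutoff
significant_in_opposite_tail = z_obt > upper_cutoff

# We retain the null only because the rejection region was in the wrong tail.
type_iii_error = (not rejected_in_lower_tail) and significant_in_opposite_tail

print(round(lower_cutoff, 3))  # -1.645
print(type_iii_error)          # True
```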
FIGURE 8.10
The two-tailed test is more conservative; it makes it more difficult to reject the
null hypothesis. It also eliminates the possibility of committing a Type III error.
The one-tailed test, though, is associated with greater power. If the value stated in
the null hypothesis is false, then a one-tailed test will make it easier to detect this
(i.e., lead to a decision to reject the null hypothesis). Because the one-tailed test
makes it easier to reject the null hypothesis, it is important that we justify that an
outcome can occur in only one direction. Justifying that an outcome can occur in
only one direction is difficult for much of the data that behavioral researchers
measure. For this reason, most studies in behavioral research are two-tailed tests.
NOTE: Two-tailed tests are more conservative and eliminate the possibility of
committing a Type III error. One-tailed tests are associated with more power,
assuming the value stated in the null hypothesis is wrong.
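To make the comparison between tails concrete, the following is a minimal Python sketch that applies the same obtained value from the GRE examples (M = 585, µ = 558, σM = 13.9) to an upper-tail, a lower-tail, and a two-tailed test at a .05 level of significance. The function names and the tail labels ('upper', 'lower', 'two') are hypothetical and chosen only for this illustration; the critical values 1.645 and 1.96 are the ones used in this chapter.

```python
# Critical z values at alpha = .05, as used in the chapter's examples.
Z_CRIT_ONE_TAILED = 1.645
Z_CRIT_TWO_TAILED = 1.96


def z_statistic(sample_mean, pop_mean, std_error):
    """One-independent sample z statistic: (M - mu) / sigma_M."""
    return (sample_mean - pop_mean) / std_error


def decide(z_obt, tail):
    """Return 'reject' or 'retain' for a test at alpha = .05.

    tail is 'upper', 'lower', or 'two' (hypothetical labels for this sketch).
    """
    if tail == "upper":
        in_rejection_region = z_obt > Z_CRIT_ONE_TAILED
    elif tail == "lower":
        in_rejection_region = z_obt < -Z_CRIT_ONE_TAILED
    elif tail == "two":
        in_rejection_region = abs(z_obt) > Z_CRIT_TWO_TAILED
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two'")
    return "reject" if in_rejection_region else "retain"


z_obt = z_statistic(585, 558, 13.9)
decisions = {tail: decide(z_obt, tail) for tail in ("upper", "lower", "two")}
print(round(z_obt, 2), decisions)
```

Only the upper-tail test rejects the null hypothesis for this obtained value. Retaining the null hypothesis with the lower-tail test, when the effect was in the upper tail, is the situation described above as a Type III error.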
LEARNING CHECK 5
1. Is the following set of hypotheses appropriate for a directional or a
nondirectional hypothesis test?
H0: µ = 35
H1: µ ≠ 35
3. A researcher conducts a hypothesis test and finds that the probability of
selecting the sample mean is p = .0689 if the value stated in the null hypothesis is
true. What is the decision for a hypothesis test at a .05 level of significance?
DEFINITION
For a single sample, an effect is the difference between a sample mean and
the population mean stated in the null hypothesis. In hypothesis testing, an
effect is insignificant when we retain the null hypothesis; an effect is
significant when we reject the null hypothesis.
Effect size is a statistical measure of the size of an effect in a population,
which allows researchers to describe how far scores shifted in the population,
or the percent of variance that can be explained by a given variable.
$$\text{Cohen's } d = \frac{M - \mu}{\sigma}.$$
The value of Cohen’s d is zero when there is no difference between two means
and increases as the differences get larger. To interpret values of d, we refer to Cohen’s
effect size conventions outlined in Table 8.6. The sign of d indicates the direction
of the shift. When values of d are positive, an effect shifted above the population
mean; when values of d are negative, an effect shifted below the population mean.
In Example 8.4, we will compute effect size for the research study in Examples 8.1
to 8.3. Since we tested the same population and measured the same sample mean
in each example, the effect size estimate will be the same for all examples.
EXAMPLE 8.4
In Examples 8.1 to 8.3, we used data given by Templer and Tomeo (2002). They
reported that the population mean on the quantitative portion of the GRE General
Test for those taking the exam between 1994 and 1997 was 558 ± 139 (µ ± σ). In
each example, the mean test score in the sample was 585 (M = 585). What is the
effect size for this test using Cohen’s d?
The numerator for Cohen’s d is the difference between the sample mean (M = 585)
and the population mean (µ = 558). The denominator is the population standard
deviation (σ = 139):

$$d = \frac{M - \mu}{\sigma} = \frac{27}{139} = 0.19.$$
We conclude that the observed effect shifted 0.19 standard deviations above
the mean in the population. This way of interpreting effect size is illustrated in
Figure 8.11. We are stating that students in the elite school scored 0.19 standard
deviations higher, on average, than students in the general population. This
interpretation is most meaningfully reported with Example 8.2 since we decided to
reject the null hypothesis using this example. Table 8.7 compares the basic
characteristics of hypothesis testing and effect size.
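The computation in Example 8.4 can be sketched in a few lines of Python. This is only an illustration of the formula d = (M − µ)/σ; the function name cohens_d is hypothetical and not part of any library.

```python
def cohens_d(sample_mean, pop_mean, pop_sd):
    """Cohen's d for a single sample: (M - mu) / sigma."""
    return (sample_mean - pop_mean) / pop_sd


# Example 8.4: GRE scores, M = 585, mu = 558, sigma = 139.
d_gre = cohens_d(585, 558, 139)
print(round(d_gre, 2))

# A sample mean below the population mean gives a negative d,
# here M = 23, mu = 25, sigma = 6.
d_below = cohens_d(23, 25, 6)
print(round(d_below, 2))
```

Notice that the sample size does not appear anywhere in the function: unlike the test statistic, Cohen’s d depends only on the mean difference and the population standard deviation.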
FIGURE 8.11 Effect size. Cohen’s d estimates the size of an effect using the
population standard deviation as an absolute comparison. A 27-point effect shifted
the distribution of scores in the population by 0.19 standard deviations (d = 0.19).
Panels: population distribution assuming the null is true (µ = 558, σ = 139);
population distribution assuming the null is false, with a 27-point effect
(µ = 585, σ = 139).
24 PART III: PROBABILITY AND THE FOUNDATIONS OF INFERENTIAL STATISTICS
TABLE 8.7 Distinguishing characteristics for significance testing and effect size.

| | Hypothesis (Significance) Testing | Effect Size (Cohen’s d) |
|---|---|---|
| What does the test measure? | The probability of obtaining a measured sample mean | The size of a measured treatment effect in the population |
| What can be inferred from the test? | Whether the null hypothesis is true or false | Whether the size of a treatment effect is small to large |
| Can this test stand alone in research reports? | Yes, the test statistic can be reported without an effect size | No, effect size is almost always reported with a test statistic |
LEARNING CHECK 6
1. ________ measures the size of an effect in a population, whereas ______________
measures whether an effect exists in a population.
2. The scores for a population are normally distributed with a mean equal to 25
and standard deviation equal to 6. A researcher selects a sample of 36 students
and measures a sample mean equal to 23 (M = 23). For this example:
a. What is the value of Cohen’s d?
b. Is this effect size small, medium, or large?
Answers: 1. Effect size, hypothesis or significance testing; 2. (a) $d = \frac{23 - 25}{6} = -0.33$, (b) Medium effect size.
standard deviation differs between these populations. Using the values given in
Table 8.8, we already have enough information to compute effect size:
Class 1: M1 = 40, µ1 = 38, σ1 = 10
Class 2: M2 = 40, µ2 = 38, σ2 = 2

$$\text{Effect size for Class 1: } d = \frac{M - \mu}{\sigma} = \frac{40 - 38}{10} = 0.20.$$

$$\text{Effect size for Class 2: } d = \frac{M - \mu}{\sigma} = \frac{40 - 38}{2} = 1.00.$$
The numerator for each effect size estimate is the same. The mean difference
between the sample mean and the population mean is 2 points. Although there is a
2-point effect in both Class 1 and Class 2, Class 2 is associated with a much larger
effect size in the population because the standard deviation is smaller. Since a larger
effect size is associated with greater power, we should find that it is easier to detect
the 2-point effect in Class 2. To determine whether this is true, suppose we select a
sample of 30 students (n = 30) from each class and measure the same sample mean
value that is listed in Table 8.8. Let’s determine the power of each test when we
conduct an upper-tail critical test at a .05 level of significance.
To determine the power, we will first construct the sampling distribution for each
class, with a mean equal to the population mean and standard error equal to σ/√n:
If the null hypothesis is true, then the sampling distribution of the mean for
alpha (α), the type of error associated with a true null hypothesis, will have a mean
equal to 38. We can now determine the smallest value of the sample mean that is
the cutoff for the rejection region, where we decide to reject that the true population
mean is 38. For an upper-tail critical test using a .05 level of significance, the critical
$$\text{Cutoff for } \alpha \text{ (Class 1): } 1.645 = \frac{M - 38}{1.82}, \quad M = 40.99$$

$$\text{Cutoff for } \alpha \text{ (Class 2): } 1.645 = \frac{M - 38}{0.37}, \quad M = 38.61$$
FIGURE 8.12 Small effect size and low power for Class 1. In this example, when
alpha is .05, the critical value or cutoff for alpha is 40.99. When α = .05, notice
that only about 29% of samples will detect this effect (the power). So even if the
researcher is correct, and the null is false (with a 2-point effect), only about 29%
of the samples he or she selects at random will result in a decision to reject the
null hypothesis.
Panel shown: sampling distribution assuming the null is false, with a 2-point effect
(µ = 40, SEM = 1.82, n = 30, cutoff = 40.99, power = .2946). About 29% of sample
means selected from this population will result in a decision to reject the null,
if the null is false.
If we are correct, and the 2-point effect exists, then we are much more likely to
detect the effect in Class 2 for n = 30. Class 1 has a small effect size (d = .20). Even if
we are correct, and a 2-point effect does exist in this population, then of all the
samples of size 30 we could select from this population, only about 29% (power =
.2946) of those samples will show the effect (i.e., lead to a decision to reject the
null). The probability of correctly rejecting the null hypothesis (power) is low.
Class 2 has a large effect size (d = 1.00). If we are correct, and a 2-point effect
does exist in this population, then of all the samples of size 30 we could select from
this population, nearly 100% (power = .9999) of those samples will show the effect
(i.e., lead to a decision to reject the null hypothesis). Hence, we have more power to
detect an effect in this population, and correctly reject the null hypothesis.
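The power values for the two classes can be approximated with a short Python sketch. It assumes an upper-tail critical test with a critical value of 1.645, as in the text; the function names are hypothetical, and the normal curve area is computed from the error function in the standard library rather than read from the unit normal table.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_upper_tail(mu_null, mu_true, sigma, n, z_crit=1.645):
    """Power of an upper-tail critical z test.

    Finds the cutoff sample mean under the null hypothesis, then the
    probability of exceeding that cutoff when the true mean is mu_true.
    """
    std_error = sigma / math.sqrt(n)
    cutoff = mu_null + z_crit * std_error
    return 1.0 - normal_cdf((cutoff - mu_true) / std_error)


power_class1 = power_upper_tail(mu_null=38, mu_true=40, sigma=10, n=30)
power_class2 = power_upper_tail(mu_null=38, mu_true=40, sigma=2, n=30)
print(round(power_class1, 2), round(power_class2, 4))
```

Because this sketch does not round the standard error or the z score along the way, the value for Class 1 comes out at about .29 rather than exactly the .2946 obtained with rounded table values. The conclusion is the same: power is low for Class 1 and close to 1 for Class 2.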
$$z_{obt} = \frac{M - \mu}{\sigma / \sqrt{n}} = \frac{40 - 38}{10 / \sqrt{30}} = 1.10.$$

For a one-tailed test that is upper-tail critical, the critical value is 1.645. The
value of the test statistic (+1.10) does not exceed the critical value (+1.645), so we
retain the null hypothesis.
NOTE: Increasing the sample size increases power by reducing the standard error,
thereby increasing the value of the test statistic in hypothesis testing.
Increase the sample size to n = 100. The test statistic for Class 1 when n = 100 is:

$$z_{obt} = \frac{M - \mu}{\sigma / \sqrt{n}} = \frac{40 - 38}{10 / \sqrt{100}} = 2.00.$$
The critical value is still 1.645. The value of the test statistic (+2.00) now exceeds
the critical value (+1.645), so we reject the null hypothesis.
Notice that increasing the sample size alone led to a decision to reject the
null hypothesis. Hence, increasing sample size increases power: It makes it
more likely that we will detect an effect, assuming that an effect exists in some
population.
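The effect of sample size on the test statistic can be checked with a brief Python sketch. It uses the Class 1 values from the text (M = 40, µ = 38, σ = 10) and the upper-tail critical value of 1.645; the function name is hypothetical.

```python
import math


def z_from_sample(sample_mean, pop_mean, pop_sd, n):
    """z statistic with the standard error computed as sigma / sqrt(n)."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


z_n30 = z_from_sample(40, 38, 10, 30)
z_n100 = z_from_sample(40, 38, 10, 100)

for n, z in ((30, z_n30), (100, z_n100)):
    decision = "reject" if z > 1.645 else "retain"
    print(n, round(z, 2), decision)
```

The mean difference and the population standard deviation are identical in both calls. Only n changes, which shrinks the standard error and moves the test statistic past the critical value.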
LEARNING CHECK 7
1. As effect size increases, what happens to the power?
3. When a population is associated with a small effect size, what can a researcher
do to increase the power of the study?
4. True or false: The effect size, power, and sample size associated with a study can
affect the decisions we make in hypothesis testing.
Answers: 1. Power increases; 2. Power decreases; 3. Increase the sample size (n); 4. True.
the direction that an effect is expected to occur, thereby increasing the power to
detect an effect.
$$\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{16}} = 2.0.$$

To compute the z statistic, we subtract the population mean from the sample
mean and divide by the standard error:

$$z_{obt} = \frac{M - \mu}{\sigma_M} = \frac{12 - 10}{2} = 1.00.$$
An obtained value equal to 1.00 does not exceed the critical value for a one-
tailed test (critical value = 1.645) or a two-tailed test (critical values = ±1.96). The
decision is to retain the null hypothesis.
If the population standard deviation is smaller, the standard error will be
smaller, thereby making the value of the test statistic larger. Suppose, for example,
that we reduce the population standard deviation to 4. The standard error in this
example is now:
$$\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{16}} = 1.0.$$

To compute the z statistic, we subtract the population mean from the sample
mean and divide by this smaller standard error:

$$z_{obt} = \frac{M - \mu}{\sigma_M} = \frac{12 - 10}{1} = 2.00.$$
An obtained value equal to 2.00 does exceed the critical value for a one-tailed
test (critical value = 1.645) and a two-tailed test (critical values = ±1.96). Now the
decision is to reject the null hypothesis. Assuming that an effect exists in the
population, decreasing the population standard deviation decreases standard error
and increases the power to detect an effect. Table 8.9 lists each factor that increases
power.
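The two computations above can be placed side by side in a short Python sketch. It uses the values from the text (M = 12, µ = 10, n = 16, σ = 8 or 4) and the critical values 1.645 and 1.96; the function names and the layout of the results dictionary are hypothetical.

```python
import math


def standard_error(pop_sd, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return pop_sd / math.sqrt(n)


def z_statistic(sample_mean, pop_mean, std_error):
    """One-independent sample z statistic: (M - mu) / sigma_M."""
    return (sample_mean - pop_mean) / std_error


results = {}
for pop_sd in (8, 4):
    se = standard_error(pop_sd, 16)
    z = z_statistic(12, 10, se)
    results[pop_sd] = {
        "standard_error": se,
        "z": z,
        "reject_one_tailed": z > 1.645,      # upper-tail critical test
        "reject_two_tailed": abs(z) > 1.96,  # nondirectional test
    }
    print(pop_sd, results[pop_sd])
```

Halving the population standard deviation halves the standard error and doubles the test statistic, which changes the decision for both the one-tailed and the two-tailed test.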
To increase power:
Increase Decrease
Test scores for students in the elite school were significantly higher than
the standard performance of test takers, z = 1.94, p < .03.
Notice that when we report a result, we do not state that we reject or retain the
null hypothesis. Instead, we report whether a result is significant (the decision was
to reject the null hypothesis) or not significant (the decision was to retain the null
hypothesis). Also, you are not required to report the exact p value, although it is
recommended. An alternative is to report it in terms of the closest value to the
hundredths or thousandths place that its value is less than. In this example, we
stated p < .03 for a p value actually equal to .0262.
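As an illustration of where the reported p value comes from, the following Python sketch computes the upper-tail probability for z = 1.94 and then rounds it up to the nearest hundredth for reporting. It assumes the upper-tail critical test from Example 8.2; the function names are hypothetical, and rounding up with math.ceil is only one simple way to find "the closest value that p is less than."

```python
import math


def upper_tail_p(z_obt):
    """Probability of a z score at least as large as z_obt."""
    return 0.5 * math.erfc(z_obt / math.sqrt(2.0))


def apa_p_bound(p):
    """Report p as less than the next hundredth, without a leading zero."""
    bound = math.ceil(p * 100) / 100
    return "p < " + f"{bound:.2f}".lstrip("0")


p_value = upper_tail_p(1.94)
apa_text = apa_p_bound(p_value)
print(round(p_value, 4), apa_text)
```

The exact p value and the reported bound describe the same result; reporting the exact value is recommended when it is available.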
Finally, it is often necessary to include a figure or table to illustrate a significant
effect and the effect size associated with it. For example, we could describe the effect
size in one additional sentence supported by the following figure:
FIGURE 8.14 The mean Graduate Record Examination (GRE) General Test scores among a
sample of gifted students compared with the general population. Error bars
indicate SEM. (Bar chart; y-axis: Score; x-axis: Population and Sample, with bars
for General Population and Gifted Students.)
In two sentences and a figure, we reported the value of the test statistic,
p value, effect size, and the mean test scores. The error bars indicate the standard
error of the mean for this study.
CHAPTER SUMMARY ORGANIZED BY LEARNING OBJECTIVE

LO 1: Identify the four steps of hypothesis testing.
• Hypothesis testing, or significance testing, is a method of testing a claim or
hypothesis about a parameter in a population, using data measured in a sample. In
this method, we test some hypothesis by determining the likelihood that a sample
statistic could have been selected, if the hypothesis regarding the population
parameter were true. The four steps of hypothesis testing are as follows:
– Step 1: State the hypotheses.
– Step 2: Set the criteria for a decision.
– Step 3: Compute the test statistic.
– Step 4: Make a decision.

LO 2: Define null hypothesis, alternative hypothesis, level of significance, test
statistic, p value, and statistical significance.
• The null hypothesis (H0), stated as the null, is a statement about a population
parameter, such as the population mean, that is assumed to be true.
• An alternative hypothesis (H1) is a statement that directly contradicts a null
hypothesis by stating that the actual value of a population parameter, such as the
mean, is less than, greater than, or not equal to the value stated in the null
hypothesis.
• Level of significance refers to a criterion of judgment upon which a decision is
made regarding the value stated in a null hypothesis.
• The test statistic is a mathematical formula that allows researchers to determine
the likelihood or probability of obtaining sample outcomes if the null hypothesis
were true. The value of a test statistic can be used to make inferences concerning
the value of population parameters stated in the null hypothesis.
• A p value is the probability of obtaining a sample outcome, given that the value
stated in the null hypothesis is true. The p value of a sample outcome is compared
to the level of significance.
• Significance, or statistical significance, describes a decision made concerning a
value stated in the null hypothesis. When a null hypothesis is rejected, a result is
significant. When a null hypothesis is retained, a result is not significant.

LO 3: Define Type I error and Type II error, and identify the type of error that
researchers control.
• We can decide to retain or reject the null hypothesis, and this decision can be
correct or incorrect. Two types of errors in hypothesis testing are called Type I
and Type II errors.
• A Type I error is the probability of rejecting a null hypothesis that is actually
true. The probability of this type of error is determined by the researcher and
stated as the level of significance or alpha level for a hypothesis test.
• A Type II error is the probability of retaining a null hypothesis that is actually
false.

LO 4: Calculate the one–independent sample z test and interpret the results.
• The one–independent sample z test is a statistical procedure used to test
hypotheses concerning the mean in a single population with a known variance. The
test statistic for this hypothesis test is

$$z_{obt} = \frac{M - \mu}{\sigma_M}, \quad \text{where } \sigma_M = \frac{\sigma}{\sqrt{n}}.$$

• Critical values, which mark the cutoffs for the rejection region, can be
identified for any level of significance. The value of the test statistic is
compared to the critical values. When the value of a test statistic exceeds a
critical value, we reject the null hypothesis; otherwise, we retain the null
hypothesis.

LO 5: Distinguish between a one-tailed and two-tailed test, and explain why a
Type III error is possible only with one-tailed tests.
• Nondirectional (two-tailed) tests are hypothesis tests where the alternative
hypothesis is stated as not equal to (≠). So we are interested in any alternative
from the null hypothesis.
• Directional (one-tailed) tests are hypothesis tests where the alternative
hypothesis is stated as greater than (>) or less than (<) some value. So we are
interested in a specific alternative from the null hypothesis.
• A Type III error occurs for one-tailed tests where a result would have been
significant in one tail, but the researcher retains the null hypothesis because the
rejection region was placed in the wrong or opposite tail.

LO 6: Explain what effect size measures and compute a Cohen’s d for the
one–independent sample z test.
• Effect size is a statistical measure of the size of an observed effect in a
population, which allows researchers to describe how far scores shifted in the
population, or the percent of variance that can be explained by a given variable.
• Cohen’s d is used to measure how far scores shifted in a population and is
computed using the following formula:

$$d = \frac{M - \mu}{\sigma}.$$

• [...] medium, and large effects based on typical findings in behavioral research.

LO 7: Define power and identify six factors that influence power.
• The power in hypothesis testing is the probability that a randomly selected
sample will show that the null hypothesis is false when the null hypothesis is in
fact false.
• To increase the power of detecting an effect in a given population:
a. Increase effect size (d), sample size (n), and alpha (α).
b. Decrease beta error (β), population standard deviation (σ), and standard
error (σM).

APA LO 8: Summarize the results of a one–independent sample z test in American
Psychological Association (APA) format.
KEY TERMS
1. State the four steps of hypothesis testing.
2. What are two decisions that a researcher makes in hypothesis testing?
3. What is a Type I error (α)?
5. What is the power in hypothesis testing?
6. What are the critical values for a one–independent sample nondirectional
(two-tailed) z test at a .05 level of significance?
7. Explain why a one-tailed test is associated with greater power than a two-tailed
test.
8. How are the rejection region, probability of a Type I error, level of
significance, and alpha level related?
9. Alpha (α) is used to measure the error for decisions concerning true null
hypotheses. What is beta (β) error used to measure?
10. What three factors can be increased to increase power?
11. What three factors can be decreased to increase power?
12. Distinguish between the significance of a result and the size of an effect.

Concepts and Application Problems

13. Explain why the following statement is true: The population standard deviation
is always larger than the standard error when the sample size is greater than one
(n > 1).
14. A researcher conducts a hypothesis test and concludes that his hypothesis is
correct. Explain why this conclusion is never an appropriate decision in hypothesis
testing.
15. The weight (in pounds) for a population of school-aged children is normally
distributed with a mean equal to 135 ± 20 pounds (µ ± σ). Suppose we select a
sample of 100 children (n = 100) to test whether children in this population are
gaining weight at a .05 level of significance.
a. What are the null and alternative hypotheses?
b. What is the critical value for this test?
c. What is the mean of the sampling distribution?
d. What is the standard error of the mean for the sampling distribution?
16. A researcher selects a sample of 30 participants and makes the decision to
retain the null hypothesis. She conducts the same study testing the same hypothesis
with a sample of 300 participants and makes the decision to reject the null
hypothesis. Give a likely explanation for why the two samples led to different
decisions.
17. A researcher conducts a one–independent sample z test and makes the decision to
reject the null hypothesis. Another researcher selects a larger sample from the
same population, obtains the same sample mean, and makes the decision to retain the
null hypothesis using the same hypothesis test. Is this possible? Explain.
18. Determine the level of significance for a hypothesis test in each of the
following populations given the specified standard error and critical values.
Hint: Refer to the values given in Table 8.4:
a. µ = 100, σM = 8, critical values: 84.32 and 115.68
b. µ = 100, σM = 6, critical value: 113.98
c. µ = 100, σM = 4, critical value: 86.8
19. For each p value stated below: (1) What is the decision for each if α = .05?
(2) What is the decision for each if α = .01?
a. p = .1000
b. p = .0250
c. p = .0050
d. p = .0001
20. For each obtained value stated below: (1) What is the decision for each if
α = .05 (one-tailed test, upper-tail critical)? (2) What is the decision for each
if α = .01 (two-tailed test)?
a. zobt = 2.10
b. zobt = 1.70
c. zobt = 2.75
d. zobt = –3.30
21. Will each of the following increase, decrease, or have no effect on the value
of a test statistic for the one–independent sample z test?
a. The sample size is increased.
b. The population variance is decreased.
c. The sample variance is doubled.
d. The difference between the sample mean and population mean is decreased.
22. The police chief selects a sample of 49 local police officers from a population
of officers with a mean physical fitness rating of 72 ± 7.0 (µ ± σ) on a 100-point
physical endurance rating scale. He measures a sample mean physical fitness rating on
this scale equal to 74. He conducts a one–independent sample z test to determine
whether physical endurance increased at a .05 level of significance.
a. State the value of the test statistic and whether to retain or reject the null
hypothesis.
b. Compute effect size using Cohen’s d.
23. A cheerleading squad received a mean rating (out of 100 possible points) of
75 ± 12 (µ ± σ) in competitions over the previous three seasons. The same
cheerleading squad performed in 36 local competitions this season with a mean
rating equal to 78 in competitions. Suppose we conduct a one–independent sample
z test to determine whether mean ratings increased this season (compared to the
previous three seasons) at a .05 level of significance.
a. State the value of the test statistic and whether to retain or reject the null
hypothesis.
b. Compute effect size using Cohen’s d.
24. A local school reports that its average GPA is 2.66 ± 0.40 (µ ± σ). The school
announces that it will be introducing a new program designed to improve GPA scores
at the school. What is the effect size (d) for this program if it is expected to
improve GPA by:
a. .05 points?
b. .10 points?
c. .40 points?
25. Will each of the following increase, decrease, or have no effect on the value
of Cohen’s d?
a. The sample size is decreased.
b. The population variance is increased.
c. The sample variance is reduced.
d. The difference between the sample and population mean is increased.
26. State whether the effect size for a 1-point effect (M – µ = 1) is small,
medium, or large given the following population standard deviations:
a. σ = 1
b. σ = 2
c. σ = 4
d. σ = 6
27. As α increases, so does the power to detect an effect. Why, then, do we
restrict α from being larger than .05?
28. Will increasing sample size (n) and decreasing the population standard
deviation (σ) increase or decrease the value of standard error? Will this increase
or decrease power?

Problems in Research

29. Directional vs. nondirectional hypothesis testing. In an article reviewing
directional and nondirectional tests, Leventhal (1999) stated the following
hypotheses concerning the difference between two population means.

| A | B |
|---|---|
| µ1 – µ2 = 0 | µ1 – µ2 = 0 |
| µ1 – µ2 > 0 | µ1 – µ2 ≠ 0 |

a. Which did he identify as nondirectional?
b. Which did he identify as directional?
30. The one-tailed tests. In their book, Common Errors in Statistics (and How to
Avoid Them), Good and Hardin (2003) wrote, “No one will know whether your
[one-tailed] hypothesis was conceived before you started or only after you’d
examined the data” (p. 347). Why do the authors state this as a concern for
one-tailed tests?
31. The hopes of a researcher. Hayne Reese (1999) wrote, “The standard method of
statistical inference involves testing a null hypothesis that the researcher
usually hopes to reject” (p. 39). Why does the researcher usually hope to reject
the null hypothesis?
32. Describing the z test. In an article describing hypothesis testing with small
sample sizes, Collins and Morris (2008) provided the following description for a
z test: “Z is considered significant if the difference is more than roughly two
standard deviations above or below zero (or more
precisely, |Z| > 1.96)” (p. 464). Based on this description:
a. Are the authors referring to critical values for a one- or two-tailed z test?
b. What alpha level are the authors referring to?
33. Sample size and power. Collins and Morris (2008) simulated selecting thousands
of samples and analyzed the results using many different test statistics. With
regard to the power for these samples, they reported that “generally speaking,
all tests became more powerful as sample size increased” (p. 468). How did
increasing the sample size in this study increase power?
34. Describing hypothesis testing. Blouin and Riopelle (2004) made the following
statement concerning how scientists select test statistics: “[This] test is the
norm for conducting a test of H0, when . . . the population(s) are normal with
known variance(s)” (p. 78). Based on this description, what test statistic are they
describing as the norm? How do you know this?
APPENDIX C
Chapter Solutions for Even-Numbered End-of-Chapter Problems
CHAPTER 8
2. Reject the null hypothesis and retain the null hypothesis.
4. A Type II error is the probability of retaining a null hypothesis that is
actually false.
6. Critical values = ±1.96.
8. All four terms describe the same thing. The level of significance is represented
by alpha, which defines the rejection region or the region associated with the
probability of committing a Type I error.
10. Alpha level, sample size, and effect size.
12. In hypothesis testing, the significance of an effect determines whether an
effect exists in some population. Effect size is used as a measure for how big the
effect is in the population.
14. All decisions are made about the null hypothesis and not the alternative
hypothesis. The only appropriate decisions are to retain or reject the null
hypothesis.
16. The sample size in the second sample was larger. Therefore, the second sample
had more power to detect the effect, which is likely why the decisions were
different.
18.
a. α = .05.
b. α = .01.
c. α = .001.
20.
1a. Reject the null hypothesis.
1b. Reject the null hypothesis.
1c. Reject the null hypothesis.
1d. Retain the null hypothesis.
2a. Retain the null hypothesis.
2b. Retain the null hypothesis.
2c. Reject the null hypothesis.
2d. Reject the null hypothesis.
22.
a. $\sigma_M = \frac{7}{\sqrt{49}} = 1.0$; hence, $z_{obt} = \frac{74 - 72}{1} = 2.00$.
The decision is to reject the null hypothesis.
b. $d = \frac{74 - 72}{7} = .29$. A medium effect size.
24.
a. $d = \frac{0.05}{0.4} = 0.125$. A small effect size.
b. $d = \frac{0.1}{0.4} = 0.25$. A medium effect size.
c. $d = \frac{0.4}{0.4} = 1.00$. A large effect size.
26.
a. $d = \frac{1}{1} = 1.00$. Large effect size.
b. $d = \frac{1}{2} = 0.50$. Medium effect size.
c. $d = \frac{1}{4} = 0.25$. Medium effect size.
d. $d = \frac{1}{6} = .17$. Small effect size.
28. This will decrease standard error, thereby increasing power.
30. The point Good and Hardin (2003) are making is that it is possible with the
same data to retain the null for a two-tailed test and reject the null for a
one-tailed test where the entire rejection region is placed in a single tail.
32.
a. Two-tailed z test.
b. α = .05.
34. We would use the z test because the population variance is known.