1.1. Most students will prefer to work in seconds, to avoid having to work with decimals or
fractions.
1.2. Who? The individuals in the data set are students in a statistics class. What? There are
eight variables: ID (a label, with no units); Exam1, Exam2, Homework, Final, and Project
(in units of “points,” scaled from 0 to 100); TotalPoints (in points, computed from the other
scores, on a scale of 0 to 900); and Grade (A, B, C, D, and E). Why? The primary purpose
of the data is to assign grades to the students in this class, and (presumably) the variables
are appropriate for this purpose. (The data might also be useful for other purposes.)
1.5. The cases are apartments. There are five variables: rent (quantitative), cable (categorical),
pets (categorical), bedrooms (quantitative), distance to campus (quantitative).
1.6. (a) To find injuries per worker, divide the rates in Example 1.6 by 100,000 (or, redo the
computations without multiplying by 100,000). For wage and salary workers, there are
0.000034 fatal injuries per worker. For self-employed workers, there are 0.000099 fatal
injuries per worker. (b) These rates are 1/10 the size of those in Example 1.6, or 10,000
times larger than those in part (a): 0.34 fatal injuries per 10,000 wage/salary workers, and
0.99 fatal injuries per 10,000 self-employed workers. (c) The rates in Example 1.6 would
probably be more easily understood by most people, because numbers like 3.4 and 9.9 feel
more “familiar.” (It might be even better to give rates per million workers: 34 and 99.)
1.7. Shown are two possible stemplots; the first uses split stems (described on page 11 of the
text). The scores are slightly left-skewed; most range from 70 to the low 90s.
  Split stems:       Single stems:
  5 | 58             5 | 58
  6 | 0              6 | 058
  6 | 58             7 | 00235558
  7 | 0023           8 | 000035557
  7 | 5558           9 | 00022338
  8 | 00003
  8 | 5557
  9 | 0002233
  9 | 8
1.8. Preferences will vary. However, the stemplot in Figure 1.8 shows a bit more detail, which
is useful for comparing the two distributions.
1.9. (a) The stemplot of the altered data is shown below. (b) Blank stems should always be
retained (except at the beginning or end of the stemplot), because the gap in the distribution
is an important piece of information about the data.
  1 | 6
  2 |
  2 | 5568
  3 | 34
  3 | 55678
  4 | 012233
  4 | 8
  5 | 1
… shape as the second histogram in Exercise 1.7.
[Two histograms of the first exam scores (frequency versus score).]
1.13. Using either a stemplot or histogram, we see that the distribution is left-skewed, centered
near 80, and spread from 55 to 98. (Of course, a histogram would not show the exact values
of the maximum and minimum.)
1.14. (a) The cases are the individual employees. (b) The first four (employee identification
number, last name, first name, and middle initial) are labels. Department and education level
are categorical variables; number of years with the company, salary, and age are quantitative
variables. (c) Column headings in student spreadsheets will vary, as will sample cases.
1.15. A Web search for “city rankings” or “best cities” will yield lots of ideas, such as crime
rates, income, cost of living, entertainment and cultural activities, taxes, climate, and school
system quality. (Students should be encouraged to think carefully about how some of these
might be quantitatively measured.)
1.16. Recall that categorical variables place individuals into groups or categories, while
quantitative variables “take numerical values for which arithmetic operations. . . make sense.”
Variables (a), (d), and (e)—age, amount spent on food, and height—are quantitative. The
answers to the other three questions—about dancing, musical instruments, and broccoli—are
categorical variables.
1.18. Student answers will vary. A Web search for “college ranking methodology” gives
some ideas; in recent years, U.S. News and World Report used “16 measures of academic
excellence,” including academic reputation (measured by surveying college and university
administrators), retention rate, graduation rate, class sizes, faculty salaries, student-faculty
ratio, percentage of faculty with highest degree in their fields, quality of entering students
(ACT/SAT scores, high school class rank, enrollment-to-admission ratio), financial resources,
and the percentage of alumni who give to the school.
[Two bar graphs of favorite color (percent of respondents for each color: orange, blue, green,
red, purple, yellow, gray, black, white, brown).]
1.21. (a) There were 232 total respondents. The table that follows gives the percents; for
example, 10/232 = 4.31%. (b) The bar graph is below. (c) For example, 87.5% of the group
were between 19 and 50. (d) The age-group classes do not have equal width: The first is
18 years wide, the second is 6 years wide, the third is 11 years wide, etc.
Note: In order to produce a histogram from the given data, the bar for the first age
group would have to be three times as wide as the second bar, the third bar would have to
be wider than the second bar by a factor of 11/6, etc. Additionally, if we change a bar’s
width by a factor of x, we would need to change that bar’s height by a factor of 1/x.
  Age group (years)   Percent
  1 to 18              4.31%
  19 to 24            41.81%
  25 to 35            30.17%
  36 to 50            15.52%
  51 to 69             6.03%
  70 and over          2.16%
[Bar graph of percent versus age group.]
1.22. (a) & (b) The bar graph and pie charts are shown below. (c) A clear majority (76%)
agree or strongly agree that they browse more with the iPhone than with their previous
phone. (d) Student preferences will vary. Some might prefer the pie chart because it is more
familiar.
[Bar graph and pie chart of the four responses: Strongly agree, Mildly agree, Mildly disagree,
Strongly disagree.]
1.24. (a) The weights add to 254.2 million tons, and the percents add to 99.9.
(b) & (c) The bar graph and pie chart are shown below.
[Bar graph and pie chart of municipal solid waste by source: Paper, Yard trimmings, Food
scraps, Plastics, Metals, Rubber/leather/textile, Wood, Glass, Other.]
1.25. (a) & (b) Both bar graphs are shown below. (c) The ordered bars in the graph from (b)
make it easier to identify those materials that are frequently recycled and those that are not.
(d) Each percent represents part of a different whole. (For example, 2.6% of food scraps are
recycled; 23.7% of glass is recycled, etc.)
[Bar graphs of percent recycled by material, in the order given and with bars ordered by size.]
[Bar graphs of percent of all spam by type of spam (Adult, Financial, Health, Leisure,
Products, Scams), in the order given and with bars ordered by size.]
1.28. (a) The bar graph is below. (b) The number of Facebook users trails off rapidly after the
top seven or so. (Of course, this is due in part to the variation in the populations of these
countries. For example, that Norway has nearly half as many Facebook users as France is
remarkable, because the 2008 populations of France and Norway were about 62.3 million
and 4.8 million, respectively.)
[Bar graph of Facebook users (millions) by country.]
1.29. (a) Most countries had moderate (single- or double-digit) increases in Facebook usage.
Chile (2197%) is an extreme outlier, as are (maybe) Venezuela (683%) and Colombia (246%).
(b) In the stemplot below, Chile and Venezuela have been omitted, and stems are split five
ways. (c) One observation is that, even without the outliers, the distribution is right-skewed.
(d) The stemplot can show some of the detail of the low part of the distribution, if the
outliers are omitted.
  0 | 000
  0 | 2333
  0 | 4444
  0 | 6
  0 | 99
  1 |
  1 | 33
  1 |
  1 |
  1 |
  2 |
  2 |
  2 | 4
[Bar graph of counts by graduate degree: M.B.A., M.Ed., Theology, Ed.D., Law, M.D.,
Other M.S., Other M.A., Other Ph.D.]
1.31. (a) The luxury car bar graph is below on the left; bars are in decreasing order of
size (the order given in the table). (b) The intermediate car bar graph is below on the
right. For this stand-alone graph, it seemed appropriate to re-order the bars by decreasing
size. Students may leave the bars in the order given in the table; this (admittedly) might
make comparison of the two graphs simpler. (c) The graph on the right is one possible
choice for comparing the two types of cars: for each color, we have one bar for each car
type.
[Bar graphs of color percents for luxury cars and for intermediate cars, and a combined
graph with a pair of bars for each color.]
1.32. This distribution is skewed to the right, meaning that Shakespeare’s plays contain many
short words (up to six letters) and fewer very long words. We would probably expect most
authors to have skewed distributions, although the exact shape and spread will vary.
1.33. Shown is the stemplot; as the text suggests, we have trimmed numbers (dropped the
last digit) and split stems. 359 mg/dl appears to be an outlier. Overall, glucose levels are not
under control: Only 4 of the 18 had levels in the desired range.
  0 | 799
  1 | 0134444
  1 | 5577
  2 | 0
  2 | 57
  3 |
  3 | 5
1.34. The back-to-back stemplot below suggests that the individual-instruction group was
more consistent (their numbers have less spread) but not more successful (only two had
numbers in the desired range).
  Individual        Class
             | 0 | 799
          22 | 1 | 0134444
    99866655 | 1 | 5577
       22222 | 2 | 0
           8 | 2 | 57
             | 3 |
             | 3 | 5
1.35. The distribution is roughly symmetric, centered near 7 (or “between 6 and 7”), and
spread from 2 to 13.
1.36. (a) Total emissions would almost certainly be higher for very large countries; for
example, we would expect that even with great attempts to control emissions, China (with
over 1 billion people) would have higher total emissions than the smallest countries in the
data set. (b) A stemplot is shown; a histogram would also be appropriate. We see a strong
right skew with a peak from 0 to 0.2 metric tons per person and a smaller peak from 0.8
to 1. The three highest countries (the United States, Canada, and Australia) appear to be
outliers; apart from those countries, the distribution is spread from 0 to 11 metric tons per
person.
  0 | 00000000000000011111
  0 | 222233333
  0 | 445
  0 | 6677
  0 | 888999
  1 | 001
  1 |
  1 |
  1 | 67
  1 | 9
1.38. (a) The first histogram shows two modes: 5–5.2 and 5.6–5.8. (b) The second histogram
has peaks in locations close to those of the first, but these peaks are much less pronounced,
so they would not usually be viewed as distinct modes. (c) The results will vary with the
software used.
[Two histograms of rainwater pH using different class intervals.]
1.39. Graph (a) is studying time (Question 4); it is reasonable to expect this to be right-skewed
(many students study little or not at all; a few study longer).
Graph (d) is the histogram of student heights (Question 3): One would expect a fair
amount of variation but no particular skewness to such a distribution.
The other two graphs are (b) handedness and (c) gender—unless this was a particularly
unusual class! We would expect that right-handed students should outnumber lefties
substantially. (Roughly 10 to 15% of the population as a whole is left-handed.)
1.40. Sketches will vary. The distribution of coin years would be left-skewed because newer
coins are more common than older coins.
1.41. (a) Not only are most responses multiples of 10; many are multiples of 30 and 60.
Most people will “round” their answers when asked to give an estimate like this; in fact,
the most striking answers are ones such as 115, 170, or 230. The students who claimed 360
minutes (6 hours) and 300 minutes (5 hours) may have been exaggerating. (Some students
might also “consider suspicious” the student who claimed to study 0 minutes per night. As a
teacher, I can easily believe that such students exist, and I suspect that some of your students
might easily accept that claim as well.) (b) The stemplots suggest that women (claim to)
study more than men. The approximate centers are 175 minutes for women and 120 minutes
for men.
             Women        Men
                   | 0 | 033334
                96 | 0 | 66679999
          22222221 | 1 | 2222222
   888888888875555 | 1 | 558
              4440 | 2 | 00344
                   | 2 |
                   | 3 | 0
                 6 | 3 |
1.42. The stemplot gives more information than a histogram (since all the original numbers
can be read off the stemplot), but both give the same impression. The distribution is roughly
symmetric with one value (4.88) that is somewhat low. The center of the distribution is
between 5.4 and 5.5 (the median is 5.46, the mean is 5.448); if asked to give a single estimate
for the “true” density of the earth, something in that range would be the best answer.
  48 | 8
  49 |
  50 | 7
  51 | 0
  52 | 6799
  53 | 04469
  54 | 2467
  55 | 03578
  56 | 12358
  57 | 59
  58 | 5
1.43. (a) There are four variables: GPA, IQ, and self-concept are quantitative, while gender
is categorical. (OBS is not a variable, since it is not really a “characteristic” of a student.)
(b) Below. (c) The distribution is skewed to the left, with center (median) around 7.8. GPAs
are spread from 0.5 to 10.8, with only 15 below 6. (d) There is more variability among the
boys; in fact, there seems to be a subset of boys with GPAs from 0.5 to 4.9. Ignoring that
group, the two distributions have similar shapes.
  All students:
   0 | 5
   1 | 8
   2 | 4
   3 | 4689
   4 | 0679
   5 | 1259
   6 | 0112249
   7 | 22333556666666788899
   8 | 0000222223347899
   9 | 002223344556668
  10 | 01678

    Female                 Male
             | 0  | 5
             | 1  | 8
             | 2  | 4
           4 | 3  | 689
           7 | 4  | 069
         952 | 5  | 1
        4210 | 6  | 129
    98866533 | 7  | 223566666789
      997320 | 8  | 0002222348
       65300 | 9  | 2223445668
         710 | 10 | 68
1.47. The total for the 24 countries was 897 days, so with Suriname, it is 897 + 694 = 1591
days, and the mean is x = 1591/25 = 63.64 days.
1.48. The mean score is x = 821/10 = 82.1.
1.49. To find the ordered list of times, start with the 24 times in Example 1.23, and add 694 to
the end of the list. The ordered times (with median highlighted) are
4, 11, 14, 23, 23, 23, 23, 24, 27, 29, 31, 33, 40 ,
42, 44, 44, 44, 46, 47, 60, 61, 62, 65, 77, 694
The outlier increases the median from 36.5 to 40 days, but the change is much less than the
outlier’s effect on the mean.
1.50. The median of the service times is 103.5 seconds. (This is the average of the 40th and
41st numbers in the sorted list, but for a set of 80 numbers, we assume that most students
will compute the median using software, which does not require that the data be sorted.)
1.53. The maximum and minimum can be found by inspecting the list. The sorted list (with
quartile and median locations highlighted) is
1 2 2 3 4 9 9 9 11 19
19 25 30 35 40 44 48 51 52 54
55 56 57 59 64 67 68 73 73 75
75 76 76 77 80 88 89 90 102 103
104 106 115 116 118 121 126 128 137 138
140 141 143 148 148 157 178 179 182 199
201 203 211 225 274 277 289 290 325 367
372 386 438 465 479 700 700 951 1148 2631
This confirms the five-number summary (1, 54.5, 103.5, 200, and 2631 seconds)
given in Example 1.26. The sum of the 80 numbers is 15,726 seconds, so the mean is
x = 15,726/80 = 196.575 seconds (the value 197 in the text was rounded).
Note: The most tedious part of this process is sorting the numbers and adding them
all up. Unless you really want to confirm that your students can sort a list of 80 numbers,
consider giving the students the sorted list of times, and checking their ability to identify the
locations of the quartiles.
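Note: If you would rather let software do the sorting and arithmetic, a minimal Python sketch
follows. The array name is our choice, and the “midpoint” quantile rule is used because, for
these data, it happens to match the quartile convention of the text (other conventions give
slightly different values; in NumPy versions before 1.22 the keyword is interpolation= rather
than method=).

    import numpy as np

    # The 80 sorted service times (seconds) listed above.
    times = np.array([
        1, 2, 2, 3, 4, 9, 9, 9, 11, 19,
        19, 25, 30, 35, 40, 44, 48, 51, 52, 54,
        55, 56, 57, 59, 64, 67, 68, 73, 73, 75,
        75, 76, 76, 77, 80, 88, 89, 90, 102, 103,
        104, 106, 115, 116, 118, 121, 126, 128, 137, 138,
        140, 141, 143, 148, 148, 157, 178, 179, 182, 199,
        201, 203, 211, 225, 274, 277, 289, 290, 325, 367,
        372, 386, 438, 465, 479, 700, 700, 951, 1148, 2631,
    ])

    print(times.min(), times.max())                                   # 1 and 2631
    print(np.quantile(times, [0.25, 0.5, 0.75], method="midpoint"))   # 54.5, 103.5, 200.0
    print(times.sum(), times.mean())                                  # 15726 and 196.575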
1.54. The median and quartiles were found earlier; the minimum and maximum are easy to
locate in the ordered list of scores (see the solutions to Exercises 1.51 and 1.52), so the
five-number summary is Min = 55, Q 1 = 75, M = 82.5, Q 3 = 92, Max = 98.
1.55. Use the five-number summary from the solution to Exercise 1.54:
95
Min = 55, Q 1 = 75, M = 82.5, Q 3 = 92, Max = 98 90
Score on first exam
85
80
75
70
65
60
55
50
1.56. The interquartile range is IQR = Q 3 − Q 1 = 92 − 75 = 17, so the 1.5 × IQR rule would
consider as outliers scores outside the range Q 1 − 25.5 = 49.5 to Q 3 + 25.5 = 117.5.
According to this rule, there are no outliers.
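Note: a one-function Python sketch of the 1.5 × IQR rule (the function name is ours):

    def iqr_outlier_limits(q1, q3):
        """Return the 1.5 x IQR limits; values outside them are suspected outliers."""
        iqr = q3 - q1
        return q1 - 1.5 * iqr, q3 + 1.5 * iqr

    print(iqr_outlier_limits(75, 92))   # (49.5, 117.5) -- no exam scores fall outside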
1.57. The variance can be computed from the formula s² = (1/(n − 1)) Σ(xi − x)²; for
example, the first term in the sum would be (80 − 82.1)² = 4.41. However, in practice,
software or a calculator is the preferred approach; this yields s² = 1416.9/9 = 157.43 and
s = √s² ≈ 12.5472.
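Note: a minimal Python check of this calculation. The ten scores listed are a set consistent
with the summaries used in these solutions (sum 821 and five-number summary 55, 75, 82.5,
92, 98); the function name is ours.

    from math import sqrt

    def sample_variance(data):
        """s^2 = (1/(n-1)) * sum((x_i - xbar)^2), the sample variance."""
        n = len(data)
        xbar = sum(data) / n
        return sum((x - xbar) ** 2 for x in data) / (n - 1)

    scores = [80, 73, 92, 85, 75, 98, 93, 55, 80, 90]   # mean 82.1
    s2 = sample_variance(scores)
    print(s2, sqrt(s2))   # about 157.43 and 12.547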
1.59. Without Suriname, the quartiles are 23 and 46.5 days; with Suriname included, they are
23 and 53.5 days. Therefore, the IQR increases from 23.5 to 30.5 days—a much less drastic
change than the change in s (18.6 to 132.6 days).
1.60. Divide the total score by 4: 950/4 = 237.5 points.
1.63. All of these numbers are given in the table in the solution to the previous exercise.
(a) x changes from 4.76% (with) to 4.81% (without); the median (4.7%) does not change.
(b) s changes from 0.7523% to 0.5864%; Q 1 changes from 4.3% to 4.35%, while Q 3 = 5%
does not change. (c) A low outlier decreases x; any kind of outlier increases s. Outliers
have little or no effect on the median and quartiles.
1.65. Use a small data set with an odd number of points, so that the median is the middle
number. After deleting the lowest observation, the median will be the average of that middle
number and the next number after it; if that latter number is much larger, the median will
change substantially. For example, start with 0, 1, 2 , 998, 1000; after removing 0, the
median changes from 2 to 500.
1.66. Salary distributions (especially in professional sports) tend to be skewed to the right. This
skew makes the mean higher than the median.
1.67. (a) The distribution is left-skewed. Because of the skew, the five-number summary is
preferable, though some students might give the mean/standard deviation. In ounces, these
statistics are:
     x       s      Min    Q1    M     Q3    Max
  6.456   1.425    3.7   4.95   6.7   7.85   8.2
(b) The numerical summary does not reveal the two weight clusters (visible in a stemplot or
histogram). (c) For small potatoes (less than 6 oz), n = 8, x = 4.662 oz, and s = 0.501 oz.
For large potatoes, n = 17, x = 7.300 oz, and s = 0.755 oz. Because there are clearly two
groups, it seems appropriate to treat them separately.
  3 | 7
  4 | 3
  4 | 7777
  5 | 23
  5 |
  6 | 0033
  6 | 7
  7 | 03
  7 | 668899999
  8 | 2
1.68. (a) The five-number summary is Min = 2.2 cm, Q 1 = 10.95 cm, M = 28.5 cm, Q 3 =
41.9 cm, Max = 69.3 cm. (b) & (c) The boxplot and histogram are shown below. (Students
might choose different interval widths for the histogram.) (d) Preferences will vary. Both
plots reveal the right-skew of this distribution, but the boxplot does not show the two peaks
visible in the histogram.
[Boxplot and histogram of diameter at breast height (cm).]
1.69. (a) The five-number summary is Min = 0 mg/l, Q 1 = 0 mg/l, M = 5.085 mg/l, Q 3 =
9.47 mg/l, Max = 73.2 mg/l. (b) & (c) The boxplot and histogram are shown below.
(Students might choose different interval widths for the histogram.) (d) Preferences will
vary. Both plots reveal the sharp right-skew of this distribution, but because Min = Q 1 , the
boxplot looks somewhat strange. The histogram seems to convey the distribution better.
[Boxplot and histogram of CRP (mg/l).]
1.70. (a) The five-number summaries of the transformed values are:
  Logarithm   Min   Q1      M        Q3       Max
  Natural      0     0    1.8048   2.3485   4.3068
  Common       0     0    0.7838   1.0199   1.8704
(The ratio between these answers is roughly ln 10 ≈ 2.3.)
(b) & (c) The boxplots and histograms are shown below. (Students might choose different
interval widths for the histograms.) (d) As for Exercise 1.69, preferences will vary.
[Boxplots and histograms of the natural log of (1 + CRP) and the base-10 log of (1 + CRP).]
1.71. (a) The five-number summary (in units of µmol/l) is Min = 0.24, Q 1 = 0.355, M =
0.76, Q 3 = 1.03, Max = 1.9. (b) & (c) The boxplot and histogram are shown below.
(Students might choose different interval widths for the histogram.) (d) The distribution is
right-skewed. A histogram (or stemplot) is preferable because it reveals an important feature
not evident from a boxplot: This distribution has two peaks.
[Boxplot and histogram of retinol level (µmol/l).]
1.72. The mean and standard deviation for these ratings are x ≈ 5.9 and s ≈ 3.7719; the
five-number summary is Min = Q1 = 1, M = 6.5, Q3 = Max = 10. For a graphical
presentation, a stemplot (or histogram) is better than a boxplot because the latter obscures
details about the distribution. (With a little thought, one might realize that Min = Q1 = 1
and Q3 = Max = 10 means that there are lots of 1’s and lots of 10’s, but this is much more
evident in a stemplot or histogram.)
   1 | 0000000000000000
   2 | 0000
   3 | 0
   4 | 0
   5 | 00000
   6 | 000
   7 | 0
   8 | 000000
   9 | 00000
  10 | 000000000000000000
1.73. The distribution of household net worth would almost surely be strongly skewed to the
right: Most families would generally have accumulated little or modest wealth, but a few
would have become rich. This strong skew pulls the mean to be higher than the median.
1.74. See also the solution to Exercise 1.36. (a) The five-number summary (in units of metric
tons per person) is: Min = 0, Q1 = 0.75, M = 3.2, Q3 = 7.8, Max = 19.9. The evidence
for the skew is in the large gaps between the higher numbers; that is, the differences Q3 − M
and Max − Q3 are large compared to Q1 − Min and M − Q1. (b) The IQR is Q3 − Q1 = 7.05,
so outliers would be less than −9.825 or greater than 18.375. According to this rule, only the
United States qualifies as an outlier, but Canada and Australia seem high enough to also
include them.
  0 | 00000000000000011111
  0 | 222233333
  0 | 445
  0 | 6677
  0 | 888999
  1 | 001
  1 |
  1 |
  1 | 67
  1 | 9
1.75. The total salary is $690,000, so the mean is x = $690,000/9 ≈ $76,667. Six of the nine
employees earn less than the mean. The median is M = $35,000.
1.76. If three individuals earn $0, $0, and $20,000, the reported median is $20,000. If the two
individuals with no income take jobs at $14,000 each, the median decreases to $14,000.
The same thing can happen to the mean: In this example, the mean drops from $20,000 to
$16,000.
1.77. The total salary is now $825,000, so the new mean is x = $825,000/9 ≈ $91,667. The
median is unchanged.
1.79. The quote describes a distribution with a strong right skew: Lots of years with no losses
to hurricane ($0), but very high numbers when they do occur. For example, if there is one
hurricane in a 10-year period causing $1 million in damages, the “average annual loss” for
that period would be $100,000, but that does not adequately represent the cost for the year
of the hurricane. Means are not the appropriate measure of center for skewed distributions.
1.80. (a) x and s are appropriate for symmetric distributions with no outliers. (b) Both high
numbers are flagged as outliers. For women, IQR = 60, so the upper 1.5 × IQR limit is 300
minutes. For men, IQR = 90, so the upper 1.5 × IQR limit is 285 minutes. The table below
shows the effect of removing these outliers.
              Women           Men
             x      s        x      s
  Before   165.2   56.5    117.2   74.2
  After    158.4   43.7    110.9   66.9
1.81. (a) & (b) See the table on the right. In both cases, x s M
the mean and median are quite similar. pH 5.4256 0.5379 5.44
Density 5.4479 0.2209 5.46
1.82. See also the solution to Exercise 1.43. (a) The mean of this distribution appears to be
higher than 100. (There is no substantial difference between the standard deviations.) (b) The
mean and median are quite similar; the mean is slightly smaller due to the slight left skew of
the data. (c) In addition to the mean and median, the standard deviation is shown for
reference (the exercise did not ask for it).
          x        s       M
  IQ    108.9    13.17   110
  GPA   7.447    (2.1)   7.829
Note: Students may be somewhat puzzled by the statement in (b) that the median is “close
to the mean” (when they differ by 1.1), followed by (c), where they “differ a bit” (when
M − x = 0.382). It may be useful to emphasize that we judge the size of such differences
relative to the spread of the distribution. For example, we can note that 1.1/13.17 ≈ 0.08
for (b), and 0.382/2.1 ≈ 0.18 for (c).
1.83. With only two observations, the mean and median are always equal because the median
is halfway between the middle two (in this case, the only two) numbers.
1.84. (a) The mean (green arrow) moves along with the moving point (in fact, it moves in
the same direction as the moving point, at one-third the speed). At the same time, as long
as the moving point remains to the right of the other two, the median (red arrow) points to
the middle point (the rightmost nonmoving point). (b) The mean follows the moving point
as before. When the moving point passes the rightmost fixed point, the median slides along
with it until the moving point passes the leftmost fixed point, then the median stays there.
1.85. (a) There are several different answers, depending on the configuration of the first five
points. Most students will likely assume that the first five points should be distinct (no
repeats), in which case the sixth point must be placed at the median. This is because the
median of 5 (sorted) points is the third, while the median of 6 points is the average of the
third and fourth. If these are to be the same, the third and fourth points of the set of six
must both equal the third point of the set of five.
The diagram below illustrates all of the possibilities; in each case, the arrow shows the
location of the median of the initial five points, and the shaded region (or dot) on the line
indicates where the sixth point can be placed without changing the median. Notice that there
are four cases where the median does not change, regardless of the location of the sixth
point. (The points need not be equally spaced; these diagrams were drawn that way for
convenience.)
(b) Regardless of the configuration of the first five points, if the sixth point is added so as to
leave the median unchanged, then in that (sorted) set of six, the third and fourth points must
be equal. One of these two points will be the middle (fourth) point of the (sorted) set of
seven, no matter where the seventh point is placed.
Note: If you have a student who illustrates all possible cases above, then it is likely that
the student either (1) obtained a copy of this solutions manual, (2) should consider a career
in writing solutions manuals, (3) has too much time on his or her hands, or (4) both 2 and
3 (and perhaps 1) are true.
1.86. The five-number summary for the bihai variety (in millimeters) is:
           Min     Q1      M       Q3       Max
  bihai   46.34   46.71   47.12   48.245   50.26
[Side-by-side boxplots of flower length (mm) for the three varieties.]
1.87. (a) The means and standard deviations (all in millimeters) are:
  Variety      x         s
  bihai     47.5975   1.2129
  red       39.7113   1.7988
  yellow    36.1800   0.9753
(b) Bihai and red appear to be right-skewed (although it is difficult to tell with such small
samples). Skewness would make these distributions unsuitable for x and s.
  bihai             red               yellow
  46 | 3466789      37 | 4789         34 | 56
  47 | 114          38 | 0012278      35 | 146
  48 | 0133         39 | 167          36 | 0015678
  49 |              40 | 56           37 | 01
  50 | 12           41 | 4699         38 | 1
                    42 | 01
                    43 | 0
1.88. (a) The mean is x = 15, and the standard deviation is s ≈ 5.4365. (b) The mean is still
15; the new standard deviation is 3.7417. (c) Using the mean as a substitute for missing data
will not change the mean, but it decreases the standard deviation.
1.89. The minimum and maximum are easily determined to be 1 and 12 letters, and the
quartiles and median can be found by adding up the bar heights. For example, the first
two bars have total height 22.3% (less than 25%), and adding the third bar brings the total
to 45%, so Q 1 must equal 3 letters. Continuing this way, we find that the five-number
summary, in units of letters, is:
Min = 1, Q 1 = 3, M = 4, Q 3 = 5, Max = 12
Note that even without the frequency table given in the data file, we could draw the same
conclusion by estimating the heights of the bars in the histogram.
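Note: the “adding up the bar heights” step is just a cumulative sum; a minimal Python sketch
of the idea (the function name is ours, and the toy percents at the end are illustrative only,
not the word-length data):

    import numpy as np

    def value_at_percentile(values, percents, p):
        """Smallest value whose cumulative percent reaches p (percents sum to 100)."""
        cum = np.cumsum(percents)
        return values[np.searchsorted(cum, p)]

    # Toy illustration: four word lengths with percents 10, 30, 40, 20.
    lengths = np.arange(1, 5)
    print(value_at_percentile(lengths, [10, 30, 40, 20], 25))   # 2
    print(value_at_percentile(lengths, [10, 30, 40, 20], 50))   # 3
    # With the actual word-length percents, this approach gives Q1 = 3, M = 4, Q3 = 5.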
1.90. Because the mean is to be 7, the five numbers must add up to 35. Also, the third number
(in order from smallest to largest) must be 10 because that is the median. Beyond that, there
is some freedom in how the numbers are chosen.
Note: It is likely that many students will interpret “positive numbers” as meaning
positive integers only, which leads to eight possible solutions, shown below.
  {1, 1, 10, 10, 13}   {1, 1, 10, 11, 12}   {1, 2, 10, 10, 12}   {1, 2, 10, 11, 11}
  {1, 3, 10, 10, 11}   {1, 4, 10, 10, 10}   {2, 2, 10, 10, 11}   {2, 3, 10, 10, 10}
1.91. The simplest approach is to take (at least) six numbers—say, a, b, c, d, e, f in increasing
order. For this set, Q 3 = e; we can cause the mean to be larger than e by simply choosing
f to be much larger than e. For example, if all numbers are nonnegative, f > 5e would
accomplish the goal because then
x = (a + b + c + d + e + f)/6 > (e + f)/6 > (e + 5e)/6 = e.
1.93. (a) One possible answer is 1, 1, 1, 1. (b) 0, 0, 20, 20. (c) For (a), any set of four
identical numbers will have s = 0. For (b), the answer is unique; here is a rough description
of why. We want to maximize the “spread-out”-ness of the numbers (which is what standard
deviation measures), so 0 and 20 seem to be reasonable choices based on that idea. We also
want to make each individual squared deviation—(x1 − x)2 , (x2 − x)2 , (x3 − x)2 , and
(x4 − x)2 —as large as possible. If we choose 0, 20, 20, 20—or 20, 0, 0, 0—we make the
first squared deviation 152 , but the other three are only 52 . Our best choice is two at each
extreme, which makes all four squared deviations equal to 102 .
1.94. Answers will vary. Typical calculators will carry only about 12 to 15 digits; for example,
a TI-83 fails (gives s = 0) for 14-digit numbers. Excel (at least the version I checked) also
fails for 14-digit numbers, but it gives s = 262,144 rather than 0. The (very old) version of
Minitab used to prepare these answers fails at 20,000,001 (eight digits), giving s = 2.
1.95. The table below reproduces the means and standard deviations from the solution to
Exercise 1.87 and shows those values expressed in inches. For each conversion, multiply by
39.37/1000 = 0.03937 (or divide by 25.4—an inch is defined as 25.4 millimeters). For
example, for the bihai variety, x = (47.5975 mm)(0.03937 in/mm) = (47.5975 mm) ÷
(25.4 mm/in) = 1.874 in.
               (in mm)               (in inches)
  Variety      x         s           x        s
  bihai     47.5975   1.2129       1.874   0.04775
  red       39.7113   1.7988       1.563   0.07082
  yellow    36.1800   0.9753       1.424   0.03840
1.96. (a) x = 5.4479 and s = 0.2209. (b) The first measurement corresponds to
5.50 × 62.43 = 343.365 pounds per cubic foot. To find xnew and snew, we similarly multiply
by 62.43: xnew ≈ 340.11 and snew ≈ 13.79.
Note: The conversion from centimeters to feet is included in the multiplication by 62.43; the
step-by-step process of this conversion looks like this:
(1 g/cm³)(0.001 kg/g)(2.2046 lb/kg)(30.48³ cm³/ft³) = 62.43 lb/ft³
1.97. Convert from kilograms to pounds by multiplying by 2.2: x = (2.42 kg)(2.2 lb/kg) ≈
5.32 lb and s = (1.18 kg)(2.2 lb/kg) ≈ 2.60 lb.
1.99. There are 80 service times, so to find the 10% trimmed mean, remove the highest and
lowest eight values (leaving 64). Remove the highest and lowest 16 values (leaving 48) for
the 20% trimmed mean.
The mean and median for the full data set are x = 196.575 and M = 103.5 seconds. The
10% trimmed mean is x* ≈ 127.734, and the 20% trimmed mean is x** ≈ 111.917 seconds.
Because the distribution is right-skewed, removing the extremes lowers the mean.
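Note: SciPy provides a trimmed-mean function; a minimal sketch, assuming times is the
array of 80 sorted service times defined in the sketch following Exercise 1.53:

    from scipy import stats

    # stats.trim_mean drops the stated proportion from *each* end before averaging.
    print(stats.trim_mean(times, 0.10))   # about 127.73 (8 values dropped from each end)
    print(stats.trim_mean(times, 0.20))   # about 111.92 (16 values dropped from each end)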
1.100. After changing the scale from centimeters to inches, the five-number summary values
change by the same ratio (that is, they are multiplied by 0.39). The shape of the histogram
might change slightly because of the change in class intervals. (a) The five-number
summary (in inches) is Min = 0.858, Q1 = 4.2705, M = 11.115, Q3 = 16.341, Max =
27.027. (b) & (c) The boxplot and histogram are shown below. (Students might choose
different interval widths for the histogram.) (d) As in Exercise 1.68, the histogram reveals
more detail about the shape of the distribution.
[Boxplot and histogram of diameter at breast height (in).]
1.101. Take the mean plus or minus two standard deviations: 572 ± 2(51) = 470 to 674.
1.102. Take the mean plus or minus three standard deviations: 572 ± 3(51) = 419 to 725.
1.103. The z-score is z = (620 − 572)/51 ≈ 0.94.
1.104. The z-score is z = (510 − 572)/51 ≈ −1.22. This is negative because an ISTEP score
of 510 is below average; specifically, it is 1.22 standard deviations below the mean.
1.105. Using Table A, the proportion below 620 (z = 0.94) is 0.8264 and the proportion at or
above is 0.1736; these two proportions add to 1. The graph below illustrates this with a
single curve; it conveys essentially the same idea as the “graphical subtraction” picture shown
in Example 1.36.
[Normal curve for the ISTEP scores with the area 0.8264 below 620 and 0.1736 above it.]
1.106. Using Table A, the proportion below 620 (z = 0.94) is 0.8264, and the proportion
below 660 (z = 1.73) is 0.9582. Therefore:
  area between 620 and 660 = (area left of 660) − (area left of 620) = 0.9582 − 0.8264 = 0.1318
[Normal curve for the ISTEP scores with the areas below 620 and below 660 marked.]
1.107. Using Table A, this ISTEP score should correspond to a standard score of z ≈ 0.67
(software gives 0.6745), so the ISTEP score (unstandardized) is 572 + 0.67(51) = 606.2
(software: 606.4).
1.108. Using Table A, x should correspond to a standard score of z ≈ −0.84 (software gives
−0.8416), so the ISTEP score (unstandardized) is x = 572 − 0.84(51) = 529.2 (software:
529.1).
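Note: the “software” values quoted in Exercises 1.103–1.108 can be reproduced with SciPy's
Normal distribution; a minimal Python sketch (572 and 51 are the ISTEP mean and standard
deviation used above):

    from scipy.stats import norm

    mu, sigma = 572, 51                              # ISTEP mean and standard deviation
    print((620 - mu) / sigma)                        # 0.94..., the z-score in Exercise 1.103
    print(norm.cdf(620, mu, sigma))                  # about 0.827, proportion below 620 (Exercise 1.105)
    print(norm.cdf(660, mu, sigma) - norm.cdf(620, mu, sigma))   # about 0.131 (Exercise 1.106)
    print(mu + norm.ppf(0.75) * sigma)               # about 606.4 (z = 0.6745, Exercise 1.107)
    print(mu + norm.ppf(0.20) * sigma)               # about 529.1 (z = -0.8416, Exercise 1.108)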
1.112. (a) The table below gives the ranges for women; for example, about 68% of women
speak between 7856 and 20,738 words per day. (b) Negative numbers do not make sense for
this situation. The 68–95–99.7 rule is reasonable for a distribution that is close to Normal,
but by constructing a stemplot or histogram, it is easily confirmed that this distribution is
slightly right-skewed. (c) These ranges are also in the table; the men’s distribution is more
skewed than the women’s distribution, so the 68–95–99.7 rule is even less appropriate.
(d) This does not support the conventional wisdom: The ranges from parts (a) and (c) overlap
quite a bit. Additionally, the difference in the means is quite small relative to the large
standard deviations.
              Women                 Men
  68%      7856 to 20,738       4995 to 23,125
  95%      1415 to 27,179      −4070 to 32,190
  99.7%   −5026 to 33,620     −13,135 to 41,255
1.115. (a) We need the 5th, 15th, 55th, and 85th percentiles for a N(0, 1) distribution. These
are given in the table below. (b) To convert to actual scores, take the standard-score cut-off z
and compute 10z + 70. (c) Opinions will vary.
          Table A               Software
       Standard  Actual      Standard  Actual
  F     −1.64     53.6       −1.6449   53.55
  D     −1.04     59.6       −1.0364   59.64
  C      0.13     71.3        0.1257   71.26
  B      1.04     80.4        1.0364   80.36
Note: The cut-off for an A given in the previous solution is the lowest score that gets an
A—that is, the point where one’s grade drops from an A to a B. These cut-offs are the points
where one’s grade jumps up. In practice, this is only an issue for a score that falls exactly
on the border between two grades.
1.118. The mean and median both equal 0.5; the quartiles are Q 1 = 0.25 and Q 3 = 0.75.
1.119. (a) Mean is C, median is B (the right skew pulls the mean to the right). (b) Mean A,
median A. (c) Mean A, median B (the left skew pulls the mean to the left).
1.121. (a) The applet shows an area of 0.6826 between −1.000 and 1.000, while the
68–95–99.7 rule rounds this to 0.68. (b) Between −2.000 and 2.000, the applet reports
0.9544 (compared to the rounded 0.95 from the 68–95–99.7 rule). Between −3.000 and
3.000, the applet reports 0.9974 (compared to the rounded 0.997).
1.122. See the sketch of the curve in the solution to Exercise 1.120. (a) The middle 95% fall
within two standard deviations of the mean: 266 ± 2(16), or 234 to 298 days. (b) The
shortest 2.5% of pregnancies are shorter than 234 days (more than two standard deviations
below the mean).
1.125. The mean and standard deviation are x = 5.4256 and s = 0.5379. About 67.62%
(71/105 ≈ 0.6762) of the pH measurements are in the range x ± s = 4.89 to 5.96. About
95.24% (100/105) are in the range x ± 2s = 4.35 to 6.50. All (100%) are in the range
x ± 3s = 3.81 to 7.04.
1.130. 70 is two standard deviations below the mean (that is, it has standard score z = −2), so
about 2.5% (half of the outer 5%) of adults would have WAIS scores below 70.
1.131. 130 is two standard deviations above the mean (that is, it has standard score z = 2), so
about 2.5% of adults would score at least 130.
1.132. Tonya’s score standardizes to z = (1820 − 1509)/321 ≈ 0.9688, while Jermaine’s score
corresponds to z = (29 − 21.5)/5.4 ≈ 1.3889. Jermaine’s score is higher.
1.133. Jacob’s score standardizes to z = (16 − 21.5)/5.4 ≈ −1.0185, while Emily’s score
corresponds to z = (1020 − 1509)/321 ≈ −1.5234. Jacob’s score is higher.
1.134. Jose’s score standardizes to z = (2080 − 1509)/321 ≈ 1.7788, so an equivalent ACT
score is 21.5 + 1.7788 × 5.4 ≈ 31.1. (Of course, ACT scores are reported as whole numbers,
so this would presumably be a score of 31.)
1.135. Maria’s score standardizes to z = (30 − 21.5)/5.4 ≈ 1.5741, so an equivalent SAT
score is 1509 + 1.5741 × 321 ≈ 2014.
1.136. Maria’s score standardizes to z = (2090 − 1509)/321 ≈ 1.81, for which Table A gives
0.9649. Her score is the 96.5th percentile.
1.137. Jacob’s score standardizes to z = (19 − 21.5)/5.4 ≈ −0.4630, for which Table A gives
0.3228. His score is the 32.3rd percentile.
1.138. 1920 and above: The top 10% corresponds to a standard score of z = 1.2816, which in
turn corresponds to a score of 1509 + 1.2816 × 321 ≈ 1920 on the SAT.
1.139. 1239 and below: The bottom 20% corresponds to a standard score of z = −0.8416,
which in turn corresponds to a score of 1509 − 0.8416 × 321 ≈ 1239 on the SAT.
1.140. The quartiles of a Normal distribution are ±0.6745 standard deviations from the mean,
so for ACT scores, they are 21.5 ± 0.6745 × 5.4 ≈ 17.9 to 25.1.
1.141. The quintiles of the SAT score distribution are 1509 − 0.8416 × 321 = 1239,
1509 − 0.2533 × 321 = 1428, 1509 + 0.2533 × 321 = 1590, and 1509 + 0.8416 × 321 = 1779.
1.142. For a Normal distribution with mean 55 mg/dl and standard deviation 15.5 mg/dl:
(a) 40 mg/dl standardizes to z = (40 − 55)/15.5 ≈ −0.9677. Using Table A, 16.60% of
women fall below this level (software: 16.66%). (b) 60 mg/dl standardizes to
z = (60 − 55)/15.5 ≈ 0.3226. Using Table A, 37.45% of women fall above this level
(software: 37.35%). (c) Subtract the answers from (a) and (b) from 100%: Table A gives
45.95% (software: 45.99%), so about 46% of women fall in the intermediate range.
1.143. For a Normal distribution with mean 46 mg/dl and standard deviation 13.6 mg/dl:
(a) 40 mg/dl standardizes to z = (40 − 46)/13.6 ≈ −0.4412. Using Table A, 33% of men fall
below this level (software: 32.95%). (b) 60 mg/dl standardizes to z = (60 − 46)/13.6 ≈
1.0294. Using Table A, 15.15% of men fall above this level (software: 15.17%). (c) Subtract
the answers from (a) and (b) from 100%: Table A gives 51.85% (software: 51.88%), so about
52% of men fall in the intermediate range.
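Note: a minimal Python sketch of these calculations for the women’s distribution; replacing
the mean and standard deviation with 46 and 13.6 gives the corresponding values for men in
Exercise 1.143.

    from scipy.stats import norm

    low  = norm.cdf(40, loc=55, scale=15.5)        # proportion below 40 mg/dl, about 0.167
    high = 1 - norm.cdf(60, loc=55, scale=15.5)    # proportion above 60 mg/dl, about 0.374
    print(low, high, 1 - low - high)               # remainder, about 0.46, is the intermediate range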
1.144. (a) About 0.6% of healthy young adults have osteoporosis (the cumulative probability
below a standard score of −2.5 is 0.0062). (b) About 31% of this population of older
women has osteoporosis: The BMD level which is 2.5 standard deviations below the young
adult mean would standardize to −0.5 for these older women, and the cumulative probability
for this standard score is 0.3085.
1.145. (a) About 5.2%: x < 240 corresponds to z < −1.625. Table A gives 5.16% for
−1.63 and 5.26% for −1.62. Software (or averaging the two table values) gives 5.21%.
(b) About 54.7%: 240 < x < 270 corresponds to −1.625 < z < 0.25. The area to the
left of 0.25 is 0.5987; subtracting the answer from part (a) leaves about 54.7%. (c) About
279 days or longer: Searching Table A for 0.80 leads to z > 0.84, which corresponds to
x > 266 + 0.84(16) = 279.44. (Using the software value z > 0.8416 gives x > 279.47.)
1.146. (a) The quartiles for a standard Normal distribution are ±0.6745. (b) For a N(µ, σ)
distribution, Q1 = µ − 0.6745σ and Q3 = µ + 0.6745σ. (c) For human pregnancies,
Q1 = 266 − 0.6745 × 16 ≈ 255.2 and Q3 = 266 + 0.6745 × 16 ≈ 276.8 days.
1.147. (a) As the quartiles for a standard Normal distribution are ±0.6745, we have
IQR = 1.3490. (b) c = 1.3490: For a N (µ, σ ) distribution, the quartiles are
Q 1 = µ − 0.6745σ and Q 3 = µ + 0.6745σ .
1.148. In the previous two exercises, we found that for a N (µ, σ ) distribution,
Q 1 = µ − 0.6745σ , Q 3 = µ + 0.6745σ , and IQR = 1.3490σ . Therefore,
1.5 × IQR = 2.0235σ , and the suspected outliers are below Q 1 − 1.5 × IQR = µ − 2.698σ ,
and above Q 3 + 1.5 × IQR = µ + 2.698σ . The percentage outside of this range is
2 × 0.0035 = 0.70%.
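Note: a minimal Python check of the 0.70% figure:

    from scipy.stats import norm

    q1, q3 = norm.ppf(0.25), norm.ppf(0.75)      # -0.6745 and 0.6745
    iqr = q3 - q1                                # 1.3490
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # -2.698 and 2.698
    outside = norm.cdf(low) + (1 - norm.cdf(high))
    print(low, high, outside)                    # about 0.007, i.e., 0.70% flagged as suspected outliers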
1.149. (a) The first and last deciles for a standard Normal distribution are ±1.2816. (b) For a
N(9.12, 0.15) distribution, the first and last deciles are µ − 1.2816σ ≈ 8.93 and
µ + 1.2816σ ≈ 9.31 ounces.
1.150. The shape of the quantile plot suggests that the data are right-skewed (as was observed
in Exercises 1.36 and 1.74). This can be seen in the flat section in the lower left—these
numbers were less spread out than they should be for Normal data—and the three apparent
outliers (the United States, Canada, and Australia) that deviate from the line in the upper
right; these were much larger than they would be for a Normal distribution.
1.151. (a) The plot is reasonably linear except for the point in the upper right, so this
distribution is roughly Normal, but with a high outlier. (b) The plot is fairly linear, so
the distribution is roughly Normal. (c) The plot curves up to the right—that is, the large
values of this distribution are larger than they would be in a Normal distribution—so the
distribution is skewed to the right.
… other points.
[Normal quantile plot of the density measurements versus Normal score.]
1.153. (a) All three quantile plots are below; the yellow variety is the nearest to a straight line.
(b) The other two distributions are slightly right-skewed (the lower-left portion of the graph
is somewhat flat); additionally, the bihai variety appears to have a couple of high outliers.
[Normal quantile plots of flower length versus Normal score for each of the three varieties.]
1.154. Shown are a histogram and quantile plot for one sample of 200 simulated N (0, 1)
points. Histograms will vary slightly but should suggest a bell curve. The Normal quantile
plot shows something fairly close to a line but illustrates that, even for actual Normal data,
the tails may deviate slightly from a line.
[Histogram and Normal quantile plot of the 200 simulated N(0, 1) values.]
1.155. Shown are a histogram and quantile plot for one sample of 200 simulated uniform data
points. Histograms will vary slightly but should suggest the density curve of Figure 1.34
(but with more variation than students might expect). The Normal quantile plot shows that,
compared to a Normal distribution, the uniform distribution does not extend as low or as
high (not surprising, since all observations are between 0 and 1).
[Histogram and Normal quantile plot of the 200 simulated uniform values.]
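Note: one way to produce simulated samples and Normal quantile plots like those described
in Exercises 1.154 and 1.155; a minimal Python sketch (NumPy, SciPy, and Matplotlib are
our choices of tools, not requirements of the exercises):

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng()
    normal_sample = rng.normal(loc=0, scale=1, size=200)    # Exercise 1.154
    uniform_sample = rng.uniform(low=0, high=1, size=200)   # Exercise 1.155

    fig, axes = plt.subplots(2, 2, figsize=(8, 6))
    axes[0, 0].hist(normal_sample, bins=12)
    stats.probplot(normal_sample, dist="norm", plot=axes[0, 1])   # Normal quantile plot
    axes[1, 0].hist(uniform_sample, bins=10)
    stats.probplot(uniform_sample, dist="norm", plot=axes[1, 1])
    plt.tight_layout()
    plt.show()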
1.157. (a) The distribution appears to be roughly Normal. (b) One could justify using either
the mean and standard deviation or the five-number summary:
     x        s       Min    Q1    M      Q3     Max
  15.27%   3.118%    8.2%   13%   15.5%  17.6%  22.8%
(c) For example, binge drinking rates are typically 10% to 20%. Which states are high, and
which are low? One might also note the geographical distribution of states with high
binge-drinking rates: The top six states (Wisconsin, North Dakota, Iowa, Minnesota, Illinois,
and Nebraska) are all adjacent to one another.
   8 | 28
   9 |
  10 | 58
  11 | 34
  12 | 023689
  13 | 015788
  14 | 0077
  15 | 13466889
  16 | 01567
  17 | 45677789
  18 | 8
  19 | 148
  20 | 2
  21 | 6
  22 | 8
1.158. (a) The stemplot below suggests that there are two groups of states: the under-23%
and over-23% groups. Additionally, while they do not qualify as outliers, Oklahoma (16.3%)
and Vermont (30%) stand out as notably low and high. (b) One could justify using either the
mean and standard deviation or the five-number summary:
     x        s       Min     Q1     M      Q3     Max
  23.71%   3.517%    16.3%   20.8%  24.3%  26.4%   30%
Neither summary reveals the two groups of states visible in the stemplot. (c) One could
explore the connections (geographical, socioeconomic, etc.) between the states in the two
groups; for example, the top group includes many northeastern states, while the bottom group
includes quite a few southern states.
  16 | 3
  17 |
  18 | 14678
  19 | 4679
  20 | 268
  21 | 346899
  22 | 3488
  23 |
  24 | 12446
  25 | 023468
  26 | 02346
  27 | 0455
  28 | 355679
  29 |
  30 | 0
… or side-by-side bars like those below. (They could also make six pie charts, but comparing
slices across pies is difficult.) Possible observations: white is considerably less popular in
Europe, and gray is less common in China.
[Bar graphs of color preference percents, grouped by region (North America, South America,
Europe, China, South Korea, Japan, Other) and by color (Silver, White, Gray, Black, Blue,
Red, Brown, Other).]
Note: The order of countries and colors is as given in the text, which is more-or-less
arbitrary. (Colors are ordered by decreasing popularity in North America.)
… summary is preferred.
  Min   Q1    M      Q3   Max
   0     3   12.5    34    86
Some students might report the less-appropriate x ≈ 21.62 and s ≈ 22.76. From the
histogram and five-number summary, we can observe, for example, that many countries have
fewer than 10 Internet users per 100 people. In 75% of countries, less than 1/3 of the
population uses the Internet.
[Histogram of Internet users per hundred people.]
1.164. (a) & (b) The graphs are below. Bars are shown in alpha- Baltimore 7.82
betical order by city name (as the data were given in the table). Boston 8.26
.
651 = 7.82. The
(c) For Baltimore, for example, this rate is 5091 Chicago 4.02
complete table is shown on the right. (d) & (e) Graphs below. Long Beach 6.25
Los Angeles 8.07
Note that the text does not specify whether the bars should be Miami 3.67
ordered by increasing or decreasing rate. (f) Preferences may Minneapolis 14.87
vary, but the ordered bars make comparisons easier. New York 6.23
Oakland 9.30
Philadelphia 7.04
San Francisco 7.61
Washington, D.C. 13.12
1.165. The given description is true on the average, but the curves (and a few calculations)
give a more complete picture. For example, a score of about 675 is about the 97.5th
percentile for both genders, so the top boys and girls have very similar scores.
1.166. (a) & (b) Answers will vary. Definitions might be as simple as “free time,” or “time
spent doing something other than studying.” For part (b), it might be good to encourage
students to discuss practical difficulties; for example, if we ask Sally to keep a log of her
activities, the time she spends filling it out presumably reduces her available “leisure time.”
1.168. Gender and automobile preference are categorical; age and household income are
quantitative.
… the “Other” category presumably includes the remaining 29.3 million subscribers.
[Bar graph of subscribers (millions) by Internet service provider.]
1.170. Women’s weights are skewed to the right: This makes the mean higher than the median,
and it is also revealed in the differences M − Q 1 = 14.9 lb and Q 3 − M = 24.1 lb.
1.171. (a) For car makes (a categorical variable), use either a bar graph or pie chart. For
car age (a quantitative variable), use a histogram, stemplot, or boxplot. (b) Study time is
quantitative, so use a histogram, stemplot, or boxplot. To show change over time, use a time
plot (average hours studied against time). (c) Use a bar graph or pie chart to show radio
station preferences. (d) Use a Normal quantile plot to see whether the measurements follow
a Normal distribution.
… appropriate. What students learn from this graph will vary; one observation might be that
AA and BB (and perhaps some others) might need some advice on how to reduce the amount
of spam they receive.
[Bar graph of spam count by account ID (AA through LL, and other).]
1.173. No, and no: It is easy to imagine examples of many different data sets with mean 0 and
standard deviation 1—for example, {−1,0,1} and {−2,0,0,0,0,0,0,0,2}.
Likewise, for any given five numbers a ≤ b ≤ c ≤ d ≤ e (not all the same), we can
create many data sets with that five-number summary, simply by taking those five numbers
and adding some additional numbers in between them, for example (in increasing order):
10, ___, 20, ___, ___, 30, ___, ___, 40, ___, 50. As long as the number in the first blank is
between 10 and 20, and so on, the five-number summary will be 10, 20, 30, 40, 50.
1.174. The time plot is shown below; because of the great detail in this plot, it is larger than
other plots. Ruth’s and McGwire’s league-leading years are marked with different symbols.
(a) During World War II (when many baseball players joined the military), the best home
run numbers decline sharply and steadily. (b) Ruth seemed to set a new standard for other
players; after his first league-leading year, he had 10 seasons much higher than anything that
had come before, and home run production has remained near that same level ever since
(even the worst post-Ruth year—1945—had more home runs than the best pre-Ruth season).
While some might argue that McGwire’s numbers also raised the standard, the change is
not nearly as striking, nor did McGwire maintain it for as long as Ruth did. (This is not
necessarily a criticism of McGwire; it instead reflects that in baseball, as in many other
endeavors, rates of improvement tend to decrease over time as we reach the limits of human
ability.)
[Time plot of the league-leading home run total by season, 1880 to the 2000s, with Ruth’s
and McGwire’s league-leading years marked.]
1.175. Bonds’s mean changes from 36.56 to 34.41 home runs (a drop of 2.15), while his
median changes from 35.5 to 34 home runs (a drop of 1.5). This illustrates that outliers affect
the mean more than the median.
  1 | 69
  2 | 4
  2 | 55
  3 | 3344
  3 | 77
  4 | 02
  4 | 5669
  5 |
  5 |
  6 |
  6 |
  7 | 3
1.176. Recall the text’s description of the effects of a linear transformation xnew = a + bx:
The mean and standard deviation are each multiplied by b (technically, the standard deviation
is multiplied by |b|, but this problem specifies that b > 0). Additionally, we add a to the
(new) mean, but a does not affect the standard deviation. (a) The desired transformation is
xnew = −40 + 2x; that is, a = −40 and b = 2. (We need b = 2 to double the standard
deviation; as this also doubles the mean, we then subtract 40 to make the new mean 100.)
(b) xnew = −45.4545 + 1.8182x; that is, a = −500/11 ≈ −45.4545 and b = 20/11 ≈ 1.8182.
(This choice of b makes the new standard deviation 20 and the new mean 145 5/11; we then
subtract 45.4545 to make the new mean 100.) (c) David’s score—2 · 72 − 40 = 104—is
higher within his class than Nancy’s score—1.8182 · 78 − 45.4545 ≈ 96.4—is within her
class. (d) A third-grade score of 75 corresponds to a score of 110 from the N(100, 20)
distribution, which has a standard score of z = (110 − 100)/20 = 0.5. (Alternatively,
z = (75 − 70)/10 = 0.5.) A sixth-grade score of 75 corresponds to about 90.9 on the
transformed scale, which has standard score z = (90.9 − 100)/20 = (75 − 80)/11 ≈ −0.45.
Therefore, about 69% of third graders and 32% of sixth graders score below 75.
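Note: the coefficients in parts (a) and (b) can be found mechanically from the old and new
means and standard deviations; a minimal Python sketch (the function name is ours; 70/10
and 80/11 are the third- and sixth-grade means and standard deviations used above):

    def rescale_coeffs(old_mean, old_sd, new_mean, new_sd):
        """Return (a, b) so that x_new = a + b*x has the desired mean and sd (b > 0)."""
        b = new_sd / old_sd
        a = new_mean - b * old_mean
        return a, b

    print(rescale_coeffs(70, 10, 100, 20))   # (-40.0, 2.0), part (a)
    print(rescale_coeffs(80, 11, 100, 20))   # (about -45.45, about 1.818), part (b)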
1.177. Results will vary. One set of 20 samples gave the results below (Normal quantile plots
are not shown).
  Means           Standard deviations
  22 | 568          5 | 6
  23 |              6 |
  23 | 89           6 | 66899
  24 | 02           7 | 3
  24 | 89           7 |
  25 | 3            8 | 113
  25 | 6799         8 | 789
  26 | 124          9 | 000
  26 | 59           9 | 556
  27 | 4           10 | 2
Theoretically, x will have a Normal distribution with mean 25 and standard deviation
8/√30 ≈ 1.46, so that about 99.7% of the time, one should find x between 20.6 and 29.4.
Meanwhile, the theoretical distribution of s is nearly Normal (slightly skewed) with mean
≈ 7.9313 and standard deviation ≈ 1.0458; about 99.7% of the time, s will be between 4.8
and 11.1.
Note: If we take a sample of size n from a Normal distribution and compute the sample
standard deviation S, then (S/σ)√(n − 1) has a “chi” distribution with n − 1 degrees of
freedom (which looks like a Normal distribution when n is reasonably large). You can learn
all you would want to know—and more—about this distribution on the Web (for example, at
Wikipedia). One implication of this is that “on the average,” s underestimates σ; specifically,
the mean of S is σ √(2/(n − 1)) Γ(n/2)/Γ((n − 1)/2). The factor multiplying σ is always less
than 1, but approaches 1 as n approaches infinity. The proof of this fact is left as an
exercise—for the instructor, not for the average student!
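Note: a minimal Python sketch of the simulation described above (20 samples of size 30 from
an N(25, 8) population):

    import numpy as np

    rng = np.random.default_rng()
    samples = rng.normal(loc=25, scale=8, size=(20, 30))   # 20 samples of size 30

    xbars = samples.mean(axis=1)          # 20 sample means
    sds = samples.std(axis=1, ddof=1)     # 20 sample standard deviations (divisor n - 1)
    print(xbars.round(1))
    print(sds.round(1))
    # Most runs give means between roughly 20.6 and 29.4 and standard deviations
    # between roughly 4.8 and 11.1, as described above.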
Chapter 2 Solutions
2.2. When students are classified like this, PSQI is being used as a categorical variable,
because each student is categorized by the group he/she falls in.
One advantage is that it might simplify the analysis, or at least it might simplify the
process of describing the results. (Saying that someone fell into the “poor” category is easier
to interpret than saying that person had a PSQI score of 12.) A more subtle issue is that it is
not clear whether finding an average is appropriate for these numbers; technically, averages
are not appropriate for a quantitative measurement unless the variable is measured on an
“interval” scale, meaning (for example) that the difference between PSQI scores of 1 and 2
is the same as the difference between PSQI scores of 10 and 11.
2.3. With this change, the cases are cups of Mocha Frappuccino (as before). The variables
(both quantitative) are size and price.
2.4. One could make the argument that being subjected to stress makes it more difficult to
sleep, so that SUDS (stress level) is explanatory and PSQI (sleep quality) is the response.
2.6. Stemplots are shown; histograms would be equivalent. Students may choose different
ways to summarize the data, such as bar graphs (one bar for each botnet). Note that
summarizing each variable separately does not reveal the relationship between the two
variables; that is done using a scatterplot in the next exercise. Because both distributions are
skewed, we prefer five-number summaries to the mean and standard deviation.
  Bots          Spams/day
  0 | 1223       0 | 002359
  0 | 58         1 | 06
  1 | 2          2 |
  1 | 58         3 | 0
  2 |            4 |
  2 |            5 |
  3 | 1          6 | 0
                           x      s      Min    Q1    M      Q3    Max
  Bots (thousands)       99.7   96.6     12     20   67.5   150    315
  Spams/day (billions)   13.6   18.6    0.35     2    7.0    16     60
… the appropriate price for each size (rather than vice versa). The scatterplot shows a positive association between size and price.
[Scatterplot: price versus size (ounces), not reproduced here.]
2.10. Two good choices are the change in debt from 2006 to 2007 (subtract the two numbers for each country) or the ratio of the two debts (divide one number by the other). Students may think of other new variables, but these have the most direct bearing on the question.
Shown are stemplots of the increase (2007 debt minus 2006 debt, measured in US$ billions) and the debt ratio (2007 debt divided by 2006 debt; these numbers have no units). From either variable, we can see that debt increased for all but two of the 24 countries. This can be summarized using either the mean and standard deviation or the five-number summary (the latter is preferred for increase, because of the skew).

    Increase:      Ratio:
    −0 21           9 88
     0 0044        10 4
     0 56667       10 668
     1 034         11 0223334
     1             11 66778
     2 13          12 013
     2 7           12 589
     3
     3 77
     4 122
     4
     5 34

              x̄      s      Min    Q1     M       Q3     Max
    Increase  19.07  18.38  −2.89  5.25   12.205  37.66  54.87
    Ratio     1.145  0.082  0.984  1.098  1.143   1.193  1.298
Note: In looking at increases, one notes that the size of the debt and the size of the change
are related (countries with smaller debts typically changed less than countries with large
debts). Debt ratio does not have this relationship with debt size (or at least it is less appar-
ent); for this reason, it might be considered a better choice for answering this question.
[Scatterplot: increase versus 2006 debt (US$ billions), not reproduced here.]
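A minimal sketch of how the two new variables could be computed; the two rows below are made-up placeholder values, not the actual debt figures from the exercise.

    import pandas as pd

    # Placeholder data; the real table has 2006 and 2007 debt (US$ billions) for 24 countries.
    debt = pd.DataFrame({"country": ["A", "B"],
                         "debt2006": [100.0, 250.0],
                         "debt2007": [110.0, 240.0]})
    debt["increase"] = debt["debt2007"] - debt["debt2006"]   # US$ billions
    debt["ratio"] = debt["debt2007"] / debt["debt2006"]      # no units
    print(debt)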
2.13. (a) A boxplot summarizes the distribution of one variable. (Two [or more] boxplots
can be used to compare two [or more] distributions, but that does not allow us to
examine the relationship between those variables.) (b) This is only correct if there is an
explanatory/response relationship. Otherwise, the choice of which variable goes on which
axis might be somewhat arbitrary. (c) High values go with high values, and low values go
with low values. (Of course, those statements are generalizations; there can be exceptions.)
2.14. (a) The points should all fall close to a negatively sloped line. (b) Look for a “cloud”
of points with no discernible pattern. Watch for students who mistakenly consider “no
relationship” as meaning “no linear relationship.” For example, points that suggest a
circle, triangle, or curve may indicate a non-linear relationship. (c) The points should be
widely scattered around a positively sloped line. (d) Sketches might be curved, angular, or
something more imaginative.
2.15. (a) Below, left. (b) Adding up the numbers in the first column of the table gives 46,994
thousand (that is, about 47 million) uninsured; divide each number in the second column
by this amount. (Express answers as percents; that is, multiply by 100.) (c) Below, right.
(The title on the vertical axis is somewhat wordy, but that is needed to distinguish between
this graph and the one in the solution to the next exercise.) (d) The plots differ only in the
vertical scale. (e) The uninsured are found in similar numbers for the five lowest age groups
(with slightly more in those aged 25–34 and 45–64), and fewer among those over 65.
[Bar graphs: number uninsured (thousands) and percent of all uninsured in each age group, plotted against age group (<18, 18–24, 25–34, 35–44, 45–64, 65+); not reproduced here.]
2.17. The percents in Exercise 2.15 show what fraction of the uninsured fall in each age group.
The percents in Exercise 2.16 show what fraction of each age group is uninsured.
Note: When looking at fractions and percents, encourage students to focus on
the “whole”—that is, what does the denominator represent? For all the fractions in
Exercise 2.15, the “whole” is the group of all uninsured people. For Exercise 2.16, the
“whole” for each fraction is the total number of people in the corresponding age group.
… relationship—that is, beers with high alcohol content generally have more carbohydrates, and those with low alcohol content generally have fewer carbohydrates. (b) The outlier is O'Doul's, which is marketed as "non-alcoholic" beer. (c) The scatterplot is on the right. (d) Without the outlier, the scatterplot suggests a slightly curved relationship. (That relationship was also somewhat visible in Figure 2.10, but it is easier to see when the points are not crowded together in half of the graph.)
[Scatterplot: carbohydrates versus percent alcohol, not reproduced here.]
… approximation given the amount of scatter.)
[Scatterplot with percent alcohol on the horizontal axis, not reproduced here.]
2.20. (a) Figure 2.11 shows a positive curved relationship. More specifically, in countries with
fewer than 10 Internet users per 100 people, life expectancy ranges between 40 and about
75 years. For countries with more than 10 Internet users per 100 people, life expectancy
increases (slowly) with increasing Internet usage. (b) A more likely explanation for the
association is that countries with higher Internet usage are typically more developed and
affluent, which comes with benefits such as better access to medical care, etc.
2.21. There is a moderate positive linear relationship; the relationship for all countries is less
linear because of the wide range in life expectancy among countries with low Internet use.
… changes in temperature (not vice versa). (b) Temperature increases linearly with time (about 10 degrees per month); the relationship is strong.
[Plot of temperature (degrees F) versus month (February–May), not reproduced here.]
2.26. To be considered an outlier, the point for the ninth student should be in either the upper
left or lower right portion of the scatterplot. The former would correspond to a student who
had a below-average second-test score but an above-average final-exam score. The latter
would be a student who did well on the second test but poorly on the final.
2.27. (a) Age is explanatory; weight is the response variable. (b) Explore the relationship;
there is no reason to view one or the other as explanatory. (c) Number of bedrooms is
explanatory; price is the response variable. (d) Amount of sugar is explanatory; sweetness is
the response variable. (e) Explore the relationship.
2.28. Parents’ income is explanatory, and college debt is the response. Both variables are
quantitative. We would expect a negative association: Low income goes with high debt, high
income with low debt.
2.29. (a) In general, we expect more intelligent children to be better readers and less intelligent
children to be weaker. The plot does show this positive association. (b) The four points are
for children who have moderate IQs but poor reading scores. (c) The rest of the scatterplot
is roughly linear but quite weak (there would be a lot of variation about any line we draw
through the scatterplot).
2.30. (a) The response variable (estimated level) can only take on the values 1, 2, 3, 4, 5, so
the points in the scatterplot must fall on one of those five levels. (b) The association is
(weakly) positive. (c) The estimate is 4, which is an overestimate; that child had the lowest
score on the test.
… and high values, but those points do not deviate from the pattern of the rest. Social exclusion does appear to trigger a pain response.
[Scatterplot: brain activity versus social distress score, not reproduced here.]
Men (open circles) show fairly steady improvement. Women have made more rapid progress, but their progress seems to have slowed (the record has not changed since 1993), while men's records may be dropping more rapidly in recent years. (b) The data support the first claim but do not seem to support the second.
[Time plot of the men's and women's records, 1900–2000, not reproduced here.]
2.38. The correlation is r ≈ 0.8839. (This can be computed by hand, but software makes it much easier.)
2.39. (a) The correlation is r ≈ 0.8839 (again). (b) They are equal. (c) Units do not affect correlation.
2.40. The correlation is near 1, because the scatterplot shows a very strong positive linear association. (This can be confirmed with the data; we find that r ≈ 0.9971.)
2.41. In both these cases, the points in a scatterplot would fall exactly on a positively sloped
line, so both have correlation r = 1. (a) With x = the price of a brand-name product, and
y = the store-brand price, the prices satisfy y = 0.9x. (b) The prices satisfy y = x − 1.
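Both claims are easy to verify numerically; a minimal sketch (the brand-name prices below are arbitrary placeholders):

    import numpy as np

    x = np.array([10.0, 20.0, 35.0, 50.0])    # brand-name prices (placeholders)
    for y in (0.9 * x, x - 1):                # store-brand rules from parts (a) and (b)
        print(np.corrcoef(x, y)[0, 1])        # 1.0 in both cases (up to rounding)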
2.43. Software gives r ≈ 0.2873. (This is consistent with the not-very-strong association visible in the plot.)
2.44. (a) With the outlier (O'Doul's) removed, the correlation is r ≈ 0.4185. (b) Outliers that
are not consistent with the pattern of the other points tend to decrease the size of r (that is,
they make r closer to 0), because they weaken the association between the variables. (Points
that lie far from the other points, but roughly on the same line, are typically not referred to
as outliers because they actually strengthen the association.)
2.45. The correlation is r ≈ 0.6701, but we note …
[Scatterplot: life expectancy (years) versus Internet users (per 100 people), not reproduced here.]
2.47. (a) r ≈ 0.5194. (b) The first-test/final-exam correlation will be lower, because the relationship is weaker. (See the next solution for confirmation.)
2.48. (a) r ≈ −0.2013. (b) The small correlation (that is, close to 0) is consistent with a weak association. (c) This correlation is much smaller (in absolute value) than the second-test/final-exam correlation 0.5194.
2.49. Such a point should be at the lower left part of the scatterplot. Because it tends to
strengthen the relationship, the correlation increases.
Note: In this case, r was positive, so strengthening the relationship means r gets larger.
If r had been negative, strengthening the relationship would have decreased r (toward −1).
2.50. Any outlier should make r closer to 0, because it weakens the relationship. To be
considered an outlier, the point for the ninth student should be in either the upper left or
lower right portion of the scatterplot. The former would correspond to a student who had a
below-average second-test score but an above-average final-exam score. The latter would be
a student who did well on the second test but poorly on the final.
Note: In this case, because r > 0, this means r gets smaller. If r had been negative,
getting closer to 0 would mean that r gets larger (but gets smaller in absolute value).
2.51. The correlations are listed below; these support the observation from the solution to Exercise 2.34 that the value/debt relationship is by far the strongest.
    Value and revenue    r1 ≈ −0.3228
    Value and debt       r2 ≈ 0.9858
    Value and income     r3 ≈ 0.7177
2.52. For Exercise 2.32, r1 ≈ 0.2797; for Exercise 2.36, r2 ≈ 0.9958 (run 8903) and r3 ≈ 0.9982 (run 8905).
2.54. The correlation is r ≈ 0.481. The correlation is greatly lowered by the one outlier. Outliers tend to have fairly strong effects on correlation; it is even stronger here because there are so few observations.
[Scatterplot of y versus x, not reproduced here.]
2.55. (a) As two points determine a line, the correlation is always either −1 or 1. (b) Sketches
will vary; an example is shown as the first graph below. Note that the scatterplot must be
positively sloped, but r is affected only by the scatter about a line drawn through the data
points, not by the steepness of the slope. (c) The first nine points cannot be spread from the
top to the bottom of the graph because in such a case the correlation cannot exceed about
0.66 (based on empirical evidence—that is, from a reasonable amount of playing around
with the applet). One possibility is shown as the second graph below. (d) To have r ≈ 0.8,
the curve must be higher at the right than at the left. One possibility is shown as the third
graph below.
2.56. (a) The correlation will be closer to −1. One possible answer is shown below, left.
(b) Answers will vary, but the correlation will increase and can be made positive by
dragging the point down far enough (below, right).
2.57. (Scatterplot not shown.) If the husband’s age is y and the wife’s x, the linear relationship
y = x + 2 would hold, and hence r = 1 (because the slope is positive).
2.58. Explanations and sketches will vary, but should note that correlation measures
the strength of the association, not the slope of the line (except for the sign of the
slope—positive or negative). The hypothetical Funds A and B mentioned in the report, for
example, might be related by a linear formula with slope 2 (or 1/2).
2.59. The person who wrote the article interpreted a correlation close to 0 as if it were a
correlation close to −1 (implying a negative association between teaching ability and
research productivity). Professor McDaniel’s findings mean there is little linear association
between research and teaching—for example, knowing that a professor is a good researcher
gives little information about whether she is a good or bad teacher.
Note: Students often think that “negative association” and “no association” mean the
same thing. This exercise provides a good illustration of the difference between these terms.
2.60. (a) Because occupation has a categorical (nominal) scale, we cannot compute the
correlation between occupation and anything. (There may be a strong association between
these variables; some writers and speakers use “correlation” as a synonym for “association.”
It is much better to retain the more specific meaning.) (b) A correlation r = 1.19 is
impossible because −1 ≤ r ≤ 1 always. (c) Neither variable (gender and color) is
quantitative.
2.61. Both relationships (scatterplots follow) are somewhat linear. The GPA/IQ scatterplot (r ≈ 0.6337) shows a stronger association than GPA/self-concept (r ≈ 0.5418). The two
students with the lowest GPAs stand out in both plots; a few others stand out in at least one
plot. Generally speaking, removing these points raises r (because the remaining points look
more linear). An exception: Removing the lower-left point in the self-concept plot decreases
r because the relative scatter of the remaining points is greater.
[Scatterplots: GPA versus IQ and GPA versus self-concept score, not reproduced here.]
2.63. The estimated fat gain is 3.505 − 0.00344 × 600 = 1.441 kg.
2.64. The data used to determine the regression line had NEA increase values ranging from
−94 to 690 calories, so estimates for values inside that range (like 200 and 500) should be
relatively safe. For values far outside this range (like −400 and 1000), the predictions would
not be trustworthy.
2.65. The table below shows the values of r² (expressed as percents). We observe that (i) the fraction of variation explained depends only on the magnitude (absolute value) of r, not its sign, and (ii) the fraction of explained variation drops off drastically as |r| moves away from 1.
    r     −0.9   −0.5   −0.3    0    0.3    0.5    0.9
    r²     81%    25%     9%   0%     9%    25%    81%
… (although note that the slope of the line is almost completely determined by the largest cities). (c) The regression equation is ŷ = 1248 + 6.1050x. (d) Regression on population explains r² ≈ 95.2% of the variation in open space.
[Scatterplot: open space (acres) versus population (thousands), not reproduced here.]
2.67. Residuals (found with software) are given in the table below. Los Angeles is the best; it has nearly 6000 acres more than the regression line predicts. Chicago, which falls almost 7300 acres short of the regression prediction, is the worst of this group.
    Los Angeles         5994.85
    Washington, D.C.    2763.75
    Minneapolis         2107.59
    Philadelphia         169.42
    Oakland               27.91
    Boston                20.96
    San Francisco        −75.78
    Baltimore           −131.55
    New York            −282.99
    Long Beach         −1181.70
    Miami              −2129.21
    Chicago            −7283.26
2.69. For Baltimore, for example, this rate is 5091/651 ≈ 7.82. The complete table is shown below on the left. Note that population is in thousands, so these are in units of acres per 1000 people. (a) Scatterplot below on the right. (b) The association is much less linear than in the scatterplot for Exercise 2.66. (c) The regression equation is ŷ = 8.739 − 0.000424x. (d) Regression on population explains only r² ≈ 8.7% of the variation in open space per person.
    Baltimore     7.82
    Boston        8.26
    Chicago       4.02
    Long Beach    6.25
    …
[Scatterplot: open space (acres/1000 people) versus population, not reproduced here.]
2.71. The regression equation for predicting carbohydrates from alcohol content is ŷ = 3.379 + 1.6155x.
Note: As we would guess from the scatterplot, and from the correlation r ≈ 0.2873 found in Exercise 2.43, this is not a very reliable prediction; it only explains r² ≈ 8.3% of the variation in carbohydrates.
2.72. (a) With the outlier (O’Doul’s) removed, the regression equation changes to
ŷ = −3.544 + 3.0319x. (b) An outlier tends to weaken the association between the variables,
and (as in this case) can drastically change the linear regression equation.
2.73. (a) To three decimal places, the correlations are all approximately 0.816 (for Set D, r actually rounds to 0.817), and the regression lines are all approximately ŷ = 3.000 + 0.500x. For all four sets, we predict ŷ ≈ 8 when x = 10. (b) Scatterplots below. (c) For Set A, the
use of the regression line seems to be reasonable—the data do seem to have a moderate
linear association (albeit with a fair amount of scatter). For Set B, there is an obvious
non-linear relationship; we should fit a parabola or other curve. For Set C, the point
(13, 12.74) deviates from the (highly linear) pattern of the other points; if we can exclude
it, the (new) regression formula would be very useful for prediction. For Set D, the data
point with x = 19 is a very influential point—the other points alone give no indication
of slope for the line. Seeing how widely scattered the y coordinates of the other points
are, we cannot place too much faith in the y coordinate of the influential point; thus, we
cannot depend on the slope of the line, so we cannot depend on the estimate when x = 10.
(We also have no evidence as to whether or not a line is an appropriate model for this
relationship.)
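The four data sets in this exercise appear to be Anscombe's well-known quartet; assuming those published values, the common correlation and regression line can be verified with a short script (a sketch, not part of the original solution):

    import numpy as np

    x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
    sets = {
        "A": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
        "B": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
        "C": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
        "D": ([8] * 7 + [19] + [8] * 3,
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
    }
    for name, (x, y) in sets.items():
        r = np.corrcoef(x, y)[0, 1]
        slope, intercept = np.polyfit(x, y, 1)
        # each set gives r close to 0.816 and a line close to y-hat = 3.000 + 0.500x
        print(name, round(r, 3), round(intercept, 3), round(slope, 3))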
2.74. (a) The scatterplot (below, left) suggests a moderate positive linear relationship. (b) The regression equation is ŷ = 17.38 + 0.6233x. (c) The residual plot is below, right. (d) The regression explains r² ≈ 27.4% of the variation in y. (e) Student summaries will vary.
[Scatterplot of y versus x and plot of residuals versus x, not reproduced here.]
2.75. (a) The scatterplot (below, left) suggests a fairly strong positive linear relationship. (b) The regression equation is ŷ = 1.470 + 1.4431x. (c) The residual plot is below, right. The new point's residual is positive; the other residuals decrease as x increases. (d) The regression explains r² ≈ 71.1% of the variation in y. (e) The new point makes the relationship stronger, but its location has a large impact on the regression equation—both the slope and intercept changed substantially.
[Scatterplot of y versus x and plot of residuals versus x, not reproduced here.]
2.76. (a) The scatterplot (below, left) gives little indication of a relationship between x and y. The regression equation is ŷ = 29.163 + 0.02278x; it explains only r² ≈ 0.5% of the variation in y. The residual plot (below, right) tells a similar story to the first scatterplot—little evidence of a relationship. This new point does not fall along the same line as the other points, so it drastically weakens the relationship. (b) A point that does not follow the same pattern as the others can drastically change an association, and in extreme cases, can essentially make it disappear.
[Scatterplot of y versus x and plot of residuals versus x, not reproduced here.]
… (b) Based on the "up-and-over" method, most students will probably estimate that ŷ ≈ 0; the regression formula gives ŷ = −0.0045. (c) The correlation is r ≈ 0.8782, so the line explains r² ≈ 77% of the variation in brain activity.
[Scatterplot: brain activity versus social distress score, with regression line, not reproduced here.]
2.80. The regression equations are ŷ = −2.39 + 0.158x (Run 8903, 11.9 mg/s) and
ŷ = −1.45 + 0.0911x (Run 8905, 29.6 mg/s). Therefore, the growth rates are (respectively)
0.158 cm/minute and 0.0911 cm/minute; this suggests that the faster the water flows, the
more slowly the icicles grow.
2.81. The means and standard deviations are x̄ = 95 min, ȳ ≈ 12.6611 cm, sx ≈ 53.3854 min, and sy ≈ 8.4967 cm; the correlation is r ≈ 0.9958.
For predicting length from time, the slope and intercept are b1 = r·sy/sx ≈ 0.158 cm/min and a1 = ȳ − b1·x̄ ≈ −2.39 cm, giving the equation ŷ = −2.39 + 0.158x (as in Exercise 2.80).
For predicting time from length, the slope and intercept are b2 = r·sx/sy ≈ 6.26 min/cm and a2 = x̄ − b2·ȳ ≈ 15.79 min, giving the equation x̂ = 15.79 + 6.26y.
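Both lines follow directly from the summary statistics quoted above; a minimal sketch:

    # Summary statistics from the solution above
    x_bar, s_x = 95.0, 53.3854      # time (min)
    y_bar, s_y = 12.6611, 8.4967    # length (cm)
    r = 0.9958

    b1 = r * s_y / s_x              # about 0.158 cm/min
    a1 = y_bar - b1 * x_bar         # about -2.39 cm
    b2 = r * s_x / s_y              # about 6.26 min/cm
    a2 = x_bar - b2 * y_bar         # about 15.79 min
    print(b1, a1, b2, a2)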
2.82. The means and standard deviations are: for lean body mass, m̄ = 46.74 and sm = 8.28 kg, and for metabolic rate, r̄ = 1369.5 and sr = 257.5 cal/day. The correlation is r = 0.8647. For predicting metabolic rate from body mass, the slope is b1 = r · sr/sm ≈ 26.9 cal/day per kg. For predicting body mass from metabolic rate, the slope is b2 = r · sm/sr ≈ 0.0278 kg per cal/day.
2.83. The correlation of IQ with GPA is r1 ≈ 0.634; for self-concept and GPA, r2 ≈ 0.542. IQ does a slightly better job; it explains about r1² ≈ 40.2% of the variation in GPA, while self-concept explains about r2² ≈ 29.4% of the variation.
2.86. (a) x̄ = 95 min, sx ≈ 53.3854 min, ȳ ≈ 12.6611 cm, and sy ≈ 8.4967 cm. The correlation r ≈ 0.9958 has no units. (b) Multiply the old values of ȳ and sy by 2.54: ȳ ≈ 32.1591 and sy ≈ 21.5816 inches. The correlation r is unchanged. (c) The slope is r·sy/sx; with sy from part (b), this gives b1 ≈ 0.4025 in/min. (Or multiply by 2.54 the appropriate slope from the solution to Exercise 2.80.)
2.87. r = √0.16 = 0.40 (high attendance goes with high grades, so r must be positive).
2.88. (a) The scatterplot (below, left) includes the regression line ŷ = 21581 − 473.73x. (b) The scatterplot does not suggest a linear association, so a regression line is not an appropriate summary of the relationship. (c) The residual plot (below, right) reveals—in a manner similar to the original scatterplot—that a line is not appropriate for this relationship. More specifically, the wide range of GDP-per-capita values for low unemployment rates suggests that there may be no useful relationship unless unemployment is sufficiently high.
[Scatterplot: GDP per capita versus unemployment, with regression line; plot of residuals versus unemployment. Not reproduced here.]
2.89. (a) The scatterplot (below, left) includes the regression line ŷ = 10.27 − 0.5382x. (b) The scatterplot looks more linear than Figure 2.5, but a line may not be appropriate for all values of log unemployment. (c) In the residual plot (below, right), we see that there are more negative residuals on the left and right, with more positive residuals in the middle.
[Scatterplot: log GDP per capita versus log unemployment, with regression line; plot of residuals versus log unemployment. Not reproduced here.]
2.90. After removing countries with low unemployment rates, there are 70 countries left. (a) The scatterplot (below, left) includes the regression line ŷ = 10.8614 − 0.7902x. (b) A line seems appropriate for this set of countries. (c) The residual plot (below, right) does not seem to show any patterns that might suggest any causes for concern.
[Scatterplot: log GDP per capita versus log unemployment, with regression line; plot of residuals versus log unemployment. Not reproduced here.]
[Scatterplots: team value ($millions) versus income ($millions), debt ($millions), and revenue ($millions); not reproduced here.]
2.92. For an NEA increase of 143 calories, the predicted fat gain is ŷ = 3.505 − 0.00344 × 143 ≈ 3.013 kg, so the residual is y − ŷ = 3.2 − 3.013 = 0.187 kg. This residual is positive because the actual fat gain was greater than the prediction.
2.94. (a) It is impossible for all the residuals to be positive; some must be negative, because
they will always sum to 0. (b) The direction of the relationship (positive or negative) has no
connection to whether or not it is due to causation. (c) Lurking variables can be any kind of
variables (and may be more likely to be explanatory rather than response).
2.95. (a) A high correlation means strong association, not causation. (b) Outliers in the y
direction (and some other data points) will have large residuals. (c) It is not extrapolation if
1 ≤ x ≤ 5.
2.96. Both variables are indicators of the level of technological and economic development in a
country; typically, they will both be low in a poorer, underdeveloped country, and they will
both be high in a more affluent country.
2.97. A reasonable explanation is that the cause-and-effect relationship goes in the other
direction: Doing well makes students or workers feel good about themselves, rather than
vice versa.
2.98. Patients suffering from more serious illnesses are more likely to go to larger hospitals
(which may have more or better facilities) for treatment. They are also likely to require
more time to recuperate afterwards.
2.99. The explanatory and response variables were “consumption of herbal tea” and
“cheerfulness/health.” The most important lurking variable is social interaction; many of the
nursing home residents may have been lonely before the students started visiting.
2.100. See also the solutions to Exercises 2.3 and 2.9. (a) Size should be on the horizontal
axis because it is the explanatory variable. (b) The regression line is ŷ = 2.6071 + 0.08036x.
(c) See the plot (next page, left). (d) Rounded to four decimal places, the residuals (as
computed by software) are −0.0714, 0.1071, and −0.0357. It turns out that these three
residuals add up to 0, no matter how much they are rounded. However, if they are computed
by hand, and the slope and intercept in the regression equation have been rounded, there
might be some roundoff error. (e) The middle residual is positive and the other two are
negative, meaning that the 16-ounce drink costs more than the predicted value and the other
two sizes cost less than predicted. Note that the residuals show the same pattern (relative to
a horizontal line at 0) as the original points around the regression line.
[Scatterplot: cost ($) versus size (ounces), with regression line; plot of residuals versus size. Not reproduced here.]
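A minimal sketch that reproduces the regression line and the three residuals. The prices themselves are not reproduced in this solution, so the values below (about $3.50, $4.00, and $4.50 for the 12-, 16-, and 24-ounce sizes) are an assumption inferred from the fitted line and residuals.

    import numpy as np

    size = np.array([12.0, 16.0, 24.0])    # ounces
    price = np.array([3.50, 4.00, 4.50])   # assumed prices ($)

    slope, intercept = np.polyfit(size, price, 1)    # about 0.08036 and 2.6071
    residuals = price - (intercept + slope * size)   # about -0.0714, 0.1071, -0.0357
    print(intercept, slope, residuals, residuals.sum())   # the residuals sum to 0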
2.101. (a) The plot (below, left) is curved (low at the beginning and end of the year, high in
the middle). (b) The regression line is ŷ = 39.392 + 1.4832x. It does not fit well because
a line is poor summary of this relationship. (c) Residuals are negative for January through
March and October through December (when actual temperature is less than predicted
temperature), and positive from April to September (when it is warmer than predicted).
(d) A similar pattern would be expected in any city that is subject to seasonal temperature
variation. (e) Seasons in the Southern Hemisphere are reversed, so temperature would be
cooler in the middle of the year.
[Plot of temperature (°F) versus month, with regression line; plot of residuals versus month. Not reproduced here.]
2.102. (a) Below, left. (b) This line is not a good summary of the pattern; the scatterplot is
curved rather than linear. (c) The sum is 0.01. The first two and last four residuals are
negative, and those in the middle are positive. Plot below, right.
[Plot of weight (kg) versus age (months), with regression line; plot of residuals versus age. Not reproduced here.]
2.103. With individual children, the correlation would be smaller (closer to 0) because the
additional variation of data from individuals would increase the “scatter” on the scatterplot,
thus decreasing the strength of the relationship.
2.104. Presumably, those applicants who were hired would generally have been those who
scored well on the test. As a result, we have little or no information on the job performance
of those who scored poorly (and were therefore not hired). Those with higher test scores
(who were hired) will likely have a range of performance ratings, so we will only see
the various ratings for those with high scores, which will almost certainly show a weaker
relationship than if we had performance ratings for all applicants.
2.105. For example, a student who in the past might have received a grade of B (and a
lower SAT score) now receives an A (but has a lower SAT score than an A student in the
past). While this is a bit of an oversimplification, this means that today’s A students are
yesterday’s A and B students, today’s B students are yesterday’s C students, and so on.
Because of the grade inflation, we are not comparing students with equal abilities in the past
and today.
2.106. A simple example illustrates this nicely: Suppose that everyone’s current salary is their
age (in thousands of dollars); for example, a 52-year-old worker makes $52,000 per year.
Everyone receives a $500 raise each year. That means that in two years, every worker’s
income has increased by $1000, but their age has increased by 2, so each worker’s salary is
now their age minus 1 (thousand dollars).
2.107. The correlation between BMR and fat gain is r ≈ 0.08795; the slope of the regression line is b = 0.000811 kg/cal. These both show that BMR is less useful for predicting fat
gain. The small correlation suggests a very weak linear relationship (explaining less than 1%
of the variation in fat gain). The small slope means that changes in BMR have very little
impact on fat gain; for example, increasing BMR by 100 calories changes fat gain by only
0.08 kg.
2.108. (a) The scatterplot of the data is below on the left. (It is difficult to tell that there
are 20 data points, because many of the points overlap.) (b) The regression equation is
ŷ = −14.4 + 46.6x. (c) Residual plot below, right. The residuals for the extreme x-values
(x = 0.25 and x = 20.0) are almost all positive; all those for the middle two x-values are
negative.
[Scatterplot: response versus mass (ng), with regression line; plot of residuals versus mass. Not reproduced here.]
2.111. (a) Drawing the “best line” by eye is a very inaccurate process; few people choose
the best line (although you can get better at it with practice). (b) Most people tend to
overestimate the slope for a scatterplot with r ≈ 0.7; that is, most students will find that the …
2.112. (a) Any point that falls exactly on the regression line will not increase the sum of
squared vertical distances (which the regression line minimizes). Any other line—even if it
passes through this new point—will necessarily have a higher total sum of squares. Thus,
the regression line does not change. Possible output below, left. (b) Influential points are
those whose x coordinates are outliers; this point is on the right side, while all others are on
the left. Possible output below, right.
2.114. See also the solution to Exercise 2.73. (a) Fits and residuals are listed below. (Students
should find these using software.) (b) Plots below. (c) The residual plots confirm the
observations made in Exercise 2.73: Regression is only appropriate for Set A.
[Residual plots for Sets A, B, C, and D, not reproduced here.]
2.115. There are 1684 female binge drinkers in the table; 8232 female students are not binge
drinkers.
2.116. There are 1684 + 8232 = 9916 women in the study. The number of students who are
not binge drinkers is 5550 + 8232 = 13,782.
2.117. Divide the number of non-bingeing females by the total number of students: 8232/17,096 ≈ 0.482.
2.118. Use the numbers in the right-hand column of the table in Example 2.28. Divide the counts of bingeing and non-bingeing students by the total number of students: 3314/17,096 ≈ 0.194 and 13,782/17,096 ≈ 0.806.
2.119. This is a conditional distribution; take the number of bingeing males divided by the total number of males: 1630/7180 ≈ 0.227.
2.120. The first computation was performed in the previous solution; for the second, take the number of non-bingeing males divided by the total number of males: 5550/7180 ≈ 0.773.
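Exercises 2.115 through 2.120 all work from the same two-way table of counts; a minimal sketch using those counts:

    import numpy as np

    # Rows: women, men; columns: binge drinker, not a binge drinker
    counts = np.array([[1684, 8232],
                       [1630, 5550]])
    total = counts.sum()                    # 17,096 students
    print(counts[0, 1] / total)             # 0.482: non-bingeing women among all students
    print(counts.sum(axis=0) / total)       # 0.194 and 0.806: marginal distribution of bingeing
    print(counts[1] / counts[1].sum())      # 0.227 and 0.773: conditional distribution for men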
2.121. (a) There are 151 + 148 = 299 "high exercisers," of which 151/299 ≈ 50.5% get enough sleep and 49.5% (the rest) do not. (b) There are 115 + 242 = 357 "low exercisers," of which 115/357 ≈ 32.2% get enough sleep and 67.8% (the rest) do not. (c) Those who exercise more
than the median are more likely to get enough sleep.
Note: This question is asking for the conditional distribution of sleep within each
exercise group. The next question asks for the conditional distribution of exercise within
each sleep group.
2.122. (a) There are 151 + 115 = 266 students who get enough sleep, of which 151/266 ≈ 56.8% are high exercisers and 43.2% (the rest) are low exercisers. (b) There are 148 + 242 = 390 students who do not get enough sleep, of which 148/390 ≈ 37.9% are high exercisers and 62.1%
(the rest) are low exercisers. (c) Students who get enough sleep are more likely to be high
exercisers. (d) Preferences will vary. In particular, note that one can make the case for a
cause-and-effect relationship in either direction between these variables.
2.123. 63/2100 = 3.0% of Hospital A's patients died, compared with 16/800 = 2.0% at Hospital B.
[Bar graphs: proportion of students by age group (15–19, 20–24, 25–34, 35 and over) and by enrollment status (full-time, part-time), not reproduced here.]
2.126. Refer to the counts in the solution to Exercise 2.125. For each age category, the conditional distribution of status is found by dividing the counts in that row by that row total. For example, 3388/3777 ≈ 0.8970 and 389/3777 ≈ 0.1030, meaning that of all college students in the 15–19 age range, about 89.7% are full-time, and the rest (10.3%) are part-time. Note that each pair of numbers should add up to 1 (except for rounding error, but with only two numbers, that rarely happens). The complete table is shown below, along with one possible graphical presentation. We see that the older the students are, the more likely they are to be part-time.
              FT       PT
    15–19     0.8970   0.1030
    20–24     0.8182   0.1818
    25–34     0.5006   0.4994
    35+       0.2715   0.7285
[Segmented bar graph: proportion of full-time and part-time students in each age group, not reproduced here.]
2.127. Refer to the counts in the solution to Exercise 2.125. For each status category, the conditional distribution of age is found by dividing the counts in that column by that column total. For example, 3388/11,091 ≈ 0.3055, 5238/11,091 ≈ 0.4723, etc., meaning that of all full-time college students, about 30.55% are aged 15 to 19, 47.23% are 20 to 24, and so on. Note that each set of four numbers should add up to 1 (except for rounding error). Graphical presentations may vary; one possibility is shown below. We see that full-time students are dominated by younger ages, while part-time students are more likely to be older. (This is essentially the same observation made in the previous exercise, seen from a different viewpoint.)
              FT       PT
    15–19     0.3055   0.0734
    20–24     0.4723   0.2197
    25–34     0.1535   0.3207
    35+       0.0687   0.3861
[Segmented bar graph: age distribution within the full-time and part-time groups, not reproduced here.]
2.128. Two examples are shown below. In general, choose a to be any number from 0 to 200, and then all the other entries can be determined.
     50  150        175   25
    150   50         25  175
Note: This is why we say that such a table has “one degree of freedom”: We can make
one (nearly) arbitrary choice for the first number, and then have no more decisions to make.
2.129. To construct such a table, we can start by choosing values for the row and column sums r1, r2, r3, c1, c2, c3, as well as the grand total N. Note that N = r1 + r2 + r3 = c1 + c2 + c3, so we only have five choices to make. Then, find each count a, b, c, d, e, f, g, h, i by taking the corresponding row total, times the corresponding column total, divided by the grand total. For example, a = r1 × c1/N and f = r2 × c3/N. Of course, these counts should be whole numbers, so it may be necessary to make adjustments in the row and column totals to meet this requirement.
      a    b    c  |  r1
      d    e    f  |  r2
      g    h    i  |  r3
     c1   c2   c3  |  N
The simplest such table would have all nine counts (a, b, c, d, e, f, g, h, i) equal to one another.
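A minimal sketch of this construction: each cell is (row total × column total)/N. The totals below are placeholder choices picked so that every cell is a whole number.

    import numpy as np

    rows = np.array([30, 60, 90])    # r1, r2, r3 (placeholders)
    cols = np.array([60, 60, 60])    # c1, c2, c3; both sets sum to N = 180
    N = rows.sum()

    table = np.outer(rows, cols) / N  # cell (i, j) = row_i * col_j / N, e.g. a = 30*60/180 = 10
    print(table)                      # every row has the same conditional distribution: no association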
2.130. (a) Overall, (125 + 155 + 180)/900 ≈ 51.1% …
[Graph of nonresponse rate (%), not reproduced here.]
2.131. (a) Use column percents, e.g., 68/225 ≈ 30.22% of females are in accounting. See table and graph below. The biggest difference between women and men is in Administration: A higher percentage of women chose this major. Meanwhile, a greater proportion of men chose other fields, especially Finance. (b) There were 386 responses; 336/722 ≈ 46.5% did not respond.
                Female    Male     Overall
    Accting.    30.22%    34.78%   32.12%
    Admin.      40.44%    24.84%   33.94%
    Econ.        2.22%     3.70%    2.85%
    Fin.        27.11%    36.65%   31.09%
[Bar graph: percent of students choosing each major (accounting, administration, economics, finance), by gender and overall, not reproduced here.]
2.132. 14/24 ≈ 58.33% of desipramine users …
[Bar graph, and the malaria/helminths diagram for the following solution, not reproduced here.]
2.134. Opinions will vary; one can argue for causation both ways, and the truth is probably
that both conditions exacerbate one another.
2.137. No; self-confidence and improving fitness could be a common response to some other
personality trait, or high self-confidence could make a person more likely to join the
exercise program.
2.139. Two possibilities are that they might perform better simply because this is their second
attempt or because they feel better prepared as a result of taking the course (whether or not
they really are better prepared).
2.140. The diagram below illustrates the confounding between exposure to chemicals and
standing up.
    Time standing up  ---?---  Seriousness of illness
2.141. Patients suffering from more serious illnesses are more likely to go to larger hospitals
(which may have more or better facilities) for treatment. They are also likely to require
more time to recuperate afterwards.
2.142. Spending more time watching TV means that less time is spent on other activities; this
may suggest lurking variables. For example, perhaps the parents of heavy TV watchers do
not spend as much time at home as other parents. Also, heavy TV watchers would typically
not get as much exercise.
2.143. In this case, there may be a causative effect, but in the direction opposite to the one
suggested: People who are overweight are more likely to be on diets and so choose artificial
sweeteners over sugar. (Also, heavier people are at a higher risk to develop diabetes; if they
do, they are likely to switch to artificial sweeteners.)
2.144. (a) Statements such as this typically mean that the risk of dying at a given age is half
as great; that is, given two groups of the same age, where one group walks and the other
does not, the walkers are half as likely to die in (say) the next year. (b) Men who choose
to walk might also choose (or have chosen, earlier in life) other habits and behaviors that
reduce mortality.
2.145. This is an observational study—students choose their “treatment” (to take or not take
the refresher sessions).
2.146. (a) Time plot on the right. (b) The regression equation is ŷ = 116779 − 57.83x. …
[Time plot of the rank of "Atticus," not reproduced here.]
2.148. (a) The scatterplot shows a positive, curved relationship. (b) The regression explains about r² ≈ 98.3% of the variation in salary. While this indicates that the relationship is strong, and close to linear, we can see from the scatterplot that the actual relationship is curved.
2.149. (a) The residuals are positive at the beginning and end, and negative in the middle.
(b) The behavior of the residuals agrees with the curved relationship seen in Figure 2.30.
2.150. (a) Both plots show a positive association, but the log-salary plot is linear rather than
curved. (b) While the residuals in Figure 2.31 were positive at the beginning and end, and
negative in the middle, the log-salary residuals in Figure 2.33 show no particular pattern.
2.151. (a) The regression equation for predicting salary from year is ŷ = 41.253 + 3.9331x; for x = 25, the predicted salary is ŷ ≈ 139.58 thousand dollars, or about $139,600. (b) The log-salary regression equation is ŷ = 3.8675 + 0.04832x. With x = 25, we have ŷ ≈ 5.0754, so the predicted salary is e^ŷ ≈ 160.036, or about $160,040. (c) Although both
predictions involve extrapolation, the second is more reliable, because it is based on a linear
fit to a linear relationship. (d) Interpreting relationships without a plot is risky. (e) Student
summaries will vary, but should include comments about the importance of looking at plots,
and the risks of extrapolation.
2.154. (a) A spreadsheet or other software is the best way to do these computations. As a
check, the first two raises are (142,900 − 141,800)/141,800 ≈ 0.78% and (113,600 − 109,800)/109,800 ≈ 3.46%. The
scatterplot (following page, left) shows a moderately strong negative linear relationship.
(b) The regression equation is ŷ = 8.8536 − 0.00005038x. (c) A plot of residuals
versus 2007–08 salaries reveals no outliers or other causes for concern. (d) The data
do show that, in general, those with lower salaries are given greater percentage raises.
Students might observe, for example, that the regression explains 71.1% of the variation
in raise. In addition, the regression slope tells us that on the average, an individual’s
raise decreases by about 0.5% for each additional $10,000 earned in 2007–08, because 0.00005038 × $10,000 ≈ 0.5%. (Note that the slope is in units of percent per dollar.)
[Scatterplot: raise (%) versus 2007–08 salary ($1000); plot of residuals (%) versus 2007–08 salary. Not reproduced here.]
2.155. A school that accepts weaker students but graduates a higher-than-expected number of
them would have a positive residual, while a school with a stronger incoming class but a
lower-than-expected graduation rate would have a negative residual. It seems reasonable to
measure school quality by how much benefit students receive from attending the school.
2.156. (a) The association is negative and roughly linear. This seems reasonable because a
low number of smokers suggests that the state’s population is health-conscious, so we
might expect more people in that state to have healthy eating habits. (b) The correlation is
r ≈ −0.5503. (c) Utah is the farthest point to the left (that is, it has the lowest smoking
rate) and lies well below the line (i.e., the proportion of adults who eat fruits and vegetables
is lower than we would expect). (d) California has the second-lowest smoking rate and
one of the highest fruit/vegetable rates. This point lies above the line, meaning that the
proportion of California adults who eat fruits and vegetables is higher than we would expect.
… x = 62 pages, we predict ŷ ≈ 68.7 pages.
[Scatterplot: text pages versus LaTeX pages, not reproduced here.]
2.161. These results support the idea (the slope is negative, so variation decreases with increasing diversity), but the relationship is only moderately strong (r² = 0.34, so diversity only explains 34% of the variation in population variation).
Note: That last parenthetical comment is awkward and perhaps confusing, but is consistent with similar statements interpreting r².
… distribution.
[Normal quantile plot of the residuals, not reproduced here.]
2.166. (a) Yes: The two lines appear to fit the data well. There do not appear to
be any outliers or influential points. (b) Compare the slopes: before—0.189;
after—0.157. (The units for these slopes are 100 ft3 /day per degree-day/day; for
students who are comfortable with units, 18.9 ft3 vs. 15.7 ft3 would be a better
answer.) (c) Before: ŷ = 1.089 + 0.189(35) = 7.704 = 770.4 ft3 . After:
ŷ = 0.853 + 0.157(35) = 6.348 = 634.8 ft3 . (d) This amounts to an additional
($1.20)(7.704 − 6.348) = $1.63 per day, or $50.44 for the month.
2.167. (a) Shown below are plots of count against time, and residuals against time for the
regression, which gives the formula ŷ = 259.58 − 19.464x. Both plots suggest a curved
relationship rather than a linear one. (b) With natural logarithms, the regression equation is
ŷ = 5.9732 − 0.2184x; with common logarithms, ŷ = 2.5941 − 0.09486x. The second pair
of plots below show the (natural) logarithm of the counts against time, suggesting a fairly
linear relationship, and the residuals against time, which shows no systematic pattern. (If
common logarithms are used instead of natural logs, the plots will look the same, except the
vertical scales will be different.) The correlations confirm the increased linearity of the log
plot: r 2 = 0.8234 for the original data, r 2 = 0.9884 for the log-data.
[Plots: bacteria count (hundreds) versus time, and residuals versus time; log of bacteria count versus time, and residuals versus time. Not reproduced here.]
2.168. Note that ȳ = 46.6 + 0.41x̄, because the least-squares line passes through the point of means (x̄, ȳ). We predict that Octavio will score 4.1 points above the mean on the final exam: ŷ = 46.6 + 0.41(x̄ + 10) = 46.6 + 0.41x̄ + 4.1 = ȳ + 4.1. (Alternatively, because the slope is 0.41, we can observe that an increase of 10 points on the midterm yields an increase of 4.1 on the predicted final-exam score.)
2.169. Number of firefighters and amount of damage both increase with the seriousness of the
fire (i.e., they are common responses to the fire’s seriousness).
… The Western Europe and Asia distributions are similar.
[Bar graph: percent of degrees for the US, Western Europe, Asia, and Overall, not reproduced here.]
2.171. Different graphical presentations are possible; one is shown below. More women
perform volunteer work; the notably higher percentage of women who are “strictly
voluntary” participants accounts for the difference. (The “court-ordered” and “other”
percentages are similar for men and women.)
[Two bar graphs showing, for men and women, the percents who are non-volunteers, court-ordered, "other," and strictly voluntary participants; not reproduced here.]
… are admitted, compared to 200/400 = 50% of females. (d) A majority (6/7) of male applicants apply to the business school, which admits (400 + 200)/(600 + 300) ≈ 66.67% of all applicants. Meanwhile, a majority (3/5) of women apply to the law school, which admits only (90 + 200)/(200 + 400) ≈ 48.33% of its applicants.
2.174. Tables will vary, of course. The key idea is that one gender should be more likely to
apply to the schools that are easier to get into. For example, if the four schools admit 50%,
60%, 70%, and 80% of applicants, and men are more likely to apply to the first two, while
women apply to the latter two, women will be admitted more often.
A nice variation on this exercise is to describe two basketball teams practicing. You
observe that one team makes 50% of their shots, while the other makes only 40%. Does that
mean the first team is more accurate? Not necessarily; perhaps they attempted more lay-ups
while the other team spent more time shooting three-pointers. (Some students will latch onto
this kind of example much more quickly than discussions of male/female admission rates.)
2.176. (a) Subtract the "agreed" counts from the sample sizes to get the "disagreed" counts. The table is in the Minitab output below. (The output has been slightly altered to have more descriptive row and column headings.) We find X² ≈ 2.67, df = 1, and P = 0.103, so we cannot conclude that students and non-students differ in the response to this question. (b) For testing H0: p1 = p2 versus Ha: p1 ≠ p2, we have p̂1 ≈ 0.3607, p̂2 ≈ 0.5085, p̂ ≈ 0.4333, SE_Dp ≈ 0.09048, and z = −1.63. Up to rounding, z² = X², and the P-values are the same. (c) The statistical tests in (a) and (b) assume that we have two SRSs, which we clearly do not have here. Furthermore, the two groups differed in geography (northeast/West Coast) in addition to student/non-student classification. These issues mean we should not place too much confidence in the conclusions of our significance test—or, at least, we should not generalize our conclusions too far beyond the populations "upper level northeastern college students taking a course in Internet marketing" and "West Coast residents willing to participate in commercial focus groups."

    Minitab output (expected counts below observed counts):
                 Students   Non-st   Total
    Agreed           22        30      52
                  26.43     25.57
    Disagreed        39        29      68
                  34.57     33.43
    Total            61        59     120

    ChiSq = 0.744 + 0.769 + 0.569 + 0.588 = 2.669
    df = 1, p = 0.103
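A minimal sketch of both computations, using the counts in the table above (not part of the original solution):

    import numpy as np
    from scipy.stats import chi2_contingency, norm

    # Rows: agreed, disagreed; columns: students, non-students
    table = np.array([[22, 30],
                      [39, 29]])
    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    print(chi2, dof, p)                      # about 2.669, 1, 0.10

    # Equivalent two-proportion z test for the proportions who agreed
    p1, p2 = 22 / 61, 30 / 59
    p_pool = 52 / 120
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / 61 + 1 / 59))
    z = (p1 - p2) / se
    print(z, z ** 2, 2 * norm.cdf(-abs(z)))  # z is about -1.63; z^2 matches chi2, same P-value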
3.1. Any group of friends is unlikely to include a representative cross section of all students.
3.3. A hard-core runner (and her friends) are not representative of all young people.
3.4. The performance of one car is anecdotal evidence—a “haphazardly selected individual
case.” People tend to remember—and re-tell—stories about extraordinary performance, while
cases of average or below-average reliability are typically forgotten.
3.5. For example, who owns the Web site? Do they have data to back up this statement, and if
so, what was the source of that data?
3.8. (a) No treatment is imposed on the subjects (children); they (or their parents) choose how
much TV they watch. The explanatory variable is hours watching TV, and the response
variable is “later aggressive behavior.” (b) An adolescent who watches a lot of television
probably is more likely to spend less time doing homework, playing sports, or having social
interactions with peers. He or she may also have less contact with or guidance from his/her
parents.
3.9. This is an experiment: Each subject is (presumably randomly) assigned to a group, each
with its own treatment (computer animation or reading the textbook). The explanatory
variable is the teaching method, and the response variable is the change in each student’s
test score.
3.10. This is an experiment, assuming the order of treatment given to each subject was
randomly determined. The explanatory variable is the form of the apple (whole, or juice),
and the response variable is how full the subjects felt.
Note: This is a matched pairs experiment, described on page 181 of the text.
3.11. The experimental units are food samples, the treatment is exposure to different levels of
radiation, and the response variable is the amount of lipid oxidation. Note that in a study
with only one factor—like this one—the treatments and factor levels are essentially the same
thing: The factor is varying radiation exposure, with nine levels.
It is hard to say how much this will generalize; it seems likely that different lipids react
to radiation differently.
3.12. This is an experiment because the experimental units (students) are randomly assigned to
a treatment group. Note that in a study with only one factor—like this one—the treatments
and factor levels are essentially the same thing: There are two treatments/levels of the factor
“instruction method.” The response variable is the change in score on the standardized test.
The results of this experiment should generalize to other classes (on the same topic)
taught by the same instructor, but might not apply to other subject matter, or to classes
taught by other instructors.
3.13. Those who volunteer to use the software may be better students (or worse). Even if
we cannot decide the direction of the bias (better or worse), the lack of random allocation
means that the conclusions we can draw from this study are limited at best.
3.14. Random assignment:
    Group 1 (25 students) → Treatment 1 (paper-based instruction) →
    Group 2 (25 students) → Treatment 2 (Web-based instruction)   → Compare change in test scores
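A minimal sketch of how this randomization could be carried out, assuming the 50 students are simply labeled 1 through 50 (not part of the original solution):

    import random

    students = list(range(1, 51))                 # label the 50 students 1-50
    random.seed(3)                                # any seed; a table of random digits also works
    group1 = sorted(random.sample(students, 25))  # paper-based instruction
    group2 = sorted(set(students) - set(group1))  # Web-based instruction
    print(group1)
    print(group2)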
3.15. Because there are nine levels, this diagram is rather large (and repetitive), so only the top three branches are shown.
    Random assignment:
    Group 1 (3 samples) → Treatment 1 (radiation level 1) →
    Group 2 (3 samples) → Treatment 2 (radiation level 2) → Compare lipid oxidation
    Group 3 (3 samples) → Treatment 3 (radiation level 3) →
3.17. (a) Shopping patterns may differ on Friday and Saturday, which would make it hard to
determine the true effect of each promotion. (That is, the effect of the promotion would be
confounded with the effect of the day.) To correct this, we could offer one promotion on a
Friday, and the other on the following Friday. (Or, we could do as described in the exercise,
and then on the next weekend, swap the order of the offers.) (b) Responses may vary in
different states. To control for this, we could launch both campaigns in (separate) parts of
the same state or states. (c) A control is needed for comparison; if we simply compare
this year’s yield to last year’s yield, we will not know how much of the difference can be
attributed to changes in the economy. We should compare the new strategy’s yield with
another investment portfolio using the old strategy.
Note: For part (c), this comparison might be done without actually buying or selling
anything; we could simply compute how much money would have been made if we had
followed the new strategy; that is, we keep a “virtual portfolio.” This assumes that our
buying and selling is on a small enough scale that it does not affect market prices.
3.18. (a) Assigning subjects by gender is not random. It would be better to treat gender
as a blocking variable, assigning five men and five women to each treatment. (b) This
randomization will not necessarily divide the subjects into two groups of five. (Note that it
would be a valid randomization to use this method until one group had four subjects, and
then assign any remaining subjects to the other group.) (c) The 10 rats in a batch might be
similar to one another in some way. For example, they might be siblings, or they might have
been exposed to unusual conditions during shipping. (The safest approach in this situation
would be to treat each batch as a block, and randomly assign two or three rats from each
batch to each treatment.)
3.19. The experiment can be single-blind (those evaluating the exams should not know which
teaching approach was used), but not double-blind, because the students will know which
treatment (teaching method) was assigned to them.
3.20. For example, we might block by gender, by year in school, or by housing type
(dorm/off-campus/Greek).
3.21. For example, new employees should be randomly assigned to either the current
program or the new one. There are many possible choices for outcome variables, such as
performance on a test of the information covered in the program, or a satisfaction survey or
other evaluation of the program by those who went through it.
3.22. Subjects—perhaps recruited from people suffering from chronic pain, or those recovering
from surgery or an injury—should be randomly assigned to be treated with magnets, or a
placebo (an object similar to the magnets, except that it is not magnetic). Students should
address some of the practical difficulties of such an experiment, such as: How does one
measure pain relief? How can we prevent subjects from determining whether they are
being treated with a magnet? (For the latter question, we might apply the treatments in a
controlled setting, making sure that there is nothing metal with which the subjects could test
their treatment object.)
3.23. (a) The factors are calcium dose and vitamin D dose. There are 9 treatments (each
calcium/vitamin D combination). (b) Assign 20 students to each group, with 10 of each
gender. The complete diagram (including the blocking step) would have a total of 18
branches; below is a portion of that diagram, showing only three of the nine branches for
each gender. (c) Randomization results will vary.
[Diagram (portion): subjects are separated by gender; within each gender, random assignment places students into Group 1 (Treatment 1), Group 2 (Treatment 2), and Group 3 (Treatment 3), and TBBMC is measured.]
3.24. Students may need guidance as to how to illustrate these interactions. Figure 3.7 shows
one such illustration (as part of Exercise 3.40). Shown below are two possible illustrations,
based on made-up data (note there is no scale on the vertical axis). In the first, we see
that the effect of vitamin D on TBBMC depends on the calcium dose. In the second, we
see little variation in men’s TBBMC across all nine treatments, while women’s TBBMC
appears to depend on the treatment group. (In particular, women’s TBBMC is greatest for
treatment 9, with the highest doses of both calcium and vitamin D.)
[Two illustrative graphs: (1) TBBMC level versus calcium dose (0, 200, 400 mg/day), with separate lines for 0 IU and 50 IU of vitamin D; (2) TBBMC level versus treatment group (1 through 9), with separate lines for men and women.]
3.25. (a) For example, flip a coin for each customer to choose which variety (s)he will taste.
To evaluate preferences, we would need to design some scale for customers to rate the
coffee they tasted, and then compare the ratings. (b) For example, flip a coin for each
customer to choose which variety (s)he will taste first. Ask which of the two coffees (s)he
preferred. (c) The matched-pairs version described in part (b) is the stronger design; if each
customer tastes both varieties, we only need to ask which was preferred. In part (a), we
might ask customers to rate the coffee they tasted on a scale of (say) 1 to 10, but such
ratings can be wildly variable (one person’s “5” might be another person’s “8”), which
makes comparison of the two varieties more difficult.
3.26. An experiment would be nearly impossible in this case, unless the publishers of Sports
Illustrated would allow you to choose the cover photo for several issues. (The ideal design
would involve taking a collection of currently-successful teams or individuals, and randomly
assigning half to be on the cover, then observing their fortunes over the next few months.)
3.27. Experimental units: pine tree seedlings. Factor: amount of light. Treatments (two): full
light, or shaded to 5% of normal. Response variable: dry weight at end of study.
3.28. Experimental units: Middle schools. Factors: Physical activity program, and nutrition
program. Treatments (four): Activity intervention, nutrition intervention, both interventions,
and neither intervention. Response variables: Physical activity and lunchtime consumption
of fat.
3.29. Subjects: adults (or registered voters) from selected households. Factors: level of
identification, and offer of survey results. Treatments (six): interviewer’s name with results,
interviewer’s name without results, university name with results, university name without
results, both names with results, both names without results. Response variable: whether or
not the interview is completed.
3.30. (a) The subjects are the physicians, the factor is medication (aspirin or placebo), and the
response variable is health, specifically whether the subjects have heart attacks. (b) Below.
(c) The difference in the number of heart attacks between the two groups was so great that
it would rarely occur by chance if aspirin had no effect.
[Diagram: the physicians are randomly assigned to Group 1 (11,000 physicians, aspirin) or Group 2 (11,000 physicians, placebo), and heart attacks are observed.]
3.31. Assign nine subjects to treatment 1, then nine more to treatment 2, etc. A diagram is
below; if we assign labels 01 through 36, line 151 gives the selection (not reproduced here).
[Diagram: groups of 9 subjects each are randomly assigned to the treatments (antidepressant; antidepressant plus stress management; placebo; . . .), and the number and severity of headaches are observed.]
3.32. (a) A diagram is shown below. (b) Label the subjects from 01 through 20. From
line 101, we choose:
19, 05, 13, 17, 09, 07, 02, 01, 18, and 14
Assuming that labels are assigned alphabetically, that is Wayman, Cunningham, Mitchell,
Seele, Knapp, Fein, Brifcani, Becker, Truong, and Ponder for one group, and the rest for the
other. See note on page 50 about using Table B.
[Diagram: the 20 subjects are randomly assigned to Group 1 (10 subjects, Treatment 1: strong marijuana) or Group 2 (10 subjects, Treatment 2: weak marijuana), and work output and earnings are observed.]
3.33. Students might envision different treatments; one possibility is to have some volunteers
go through a training session, while others are given a written set of instructions, or watch a
video. For the response variable(s), we need some measure of training effectiveness; perhaps
we could have the volunteers analyze a sample of lake water and compare their results to
some standard.
3.34. (a) Diagram below. (b) Using line 153 from Table B, the first four subjects are 07, 88,
65, and 68. See note on page 50 about using Table B.
[Diagram: the 90 subjects are randomly assigned to Group 1 (30 subjects, accomplice fired because he/she did poorly), Group 2 (30 subjects, accomplice randomly fired), or Group 3 (30 subjects, both continue to work), and performance is observed.]
[Partial diagram, apparently from the solution to Exercise 3.35: the schools are randomly assigned, 6 per group, to the activity intervention, the nutrition intervention, both interventions, or neither; physical activity and lunchtime fat consumption are observed.]
3.36. (a) The table below shows the 16 treatments—four levels for each of the two factors.
(b) A diagram is not shown here (with 16 treatments, it is quite large). Six subjects are
randomly assigned to each treatment; they read the ad for that treatment, and we record their
attractiveness ratings for the ad. Using line 111, the first six subjects are 81, 48, 66, 94, 87,
and 60.
Factor B
Fraction of shoes on sale
25% 50% 75% 100%
20% 1 2 3 4
Factor A 40% 5 6 7 8
Discount level 60% 9 10 11 12
80% 13 14 15 16
3.37. (a) Set "Population = 1 to 150," select a sample of size 25, and click "Reset" and then "Sample."
(b) Without resetting, click "Sample" again. (c) Click "Sample" three more times.
3.38. Set "Population = 1 to 40," select a sample of size 20, then click "Reset" and "Sample."
3.39. Design (a) is an experiment. Because the treatment is randomly assigned, the effect of
other habits would be “diluted” because they would be more-or-less equally split between
the two groups. Therefore, any difference in colon health between the two groups could be
attributed to the treatment (bee pollen or not).
Design (b) is an observational study. It is flawed because the women observed chose
whether or not to take bee pollen; one might reasonably expect that people who choose to
take bee pollen have other dietary or health habits that would differ from those who do not.
3.40. For a range of discounts, the attractiveness of the sale decreases slightly as the
percentage of goods on sale increases. (The decrease is so small that it might not be
significant.) With precise discounts, on the other hand, mean attractiveness increases with
the percentage on sale. Range discounts are more attractive when only 25% of goods are
marked down, while the precise discount is more attractive if 75% or 100% of goods are
discounted.
3.41. As described, there are two factors: ZIP code (three levels: none, five-digit, nine-digit)
and the day on which the letter is mailed (three levels: Monday, Thursday, or Saturday) for
a total of nine treatments. To control lurking variables, aside from mailing all letters to the
same address, all letters should be the same size, and either printed in the same handwriting
or typed. The design should also specify how many letters will be in each treatment group.
Also, the letters should be sent randomly over many weeks.
3.42. Results will vary, but probability computations reveal that more than 97% of samples
will have 7 to 13 fast-reacting subjects (and 99.6% of samples have 8 to 14 fast-reacting
subjects). Additionally, if students average their 10 samples, nearly all students (more than
99%) should find that the average number of fast-reacting subjects is between 8.5 and 11.5.
Note: X, the number of fast-reacting subjects in the sample, has a hypergeometric
distribution with parameters N = 40, r = 20, n = 20, so that P(7 ≤ X ≤ 13) ≈ 0.974. The
theoretical average number of fast-reacting subjects is 10.
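As a rough check (not part of the printed solution, which relies on Table B), these hypergeometric figures can be reproduced with Python's scipy library; the parameterization below follows scipy's conventions rather than the text's.

    # Sketch: verifying the hypergeometric figures in 3.42 with scipy.
    # Population of 40 subjects, 20 "fast-reacting", sample of size 20.
    from scipy.stats import hypergeom

    # scipy's parameterization: hypergeom(M = population size,
    # n = number of successes in the population, N = sample size)
    X = hypergeom(M=40, n=20, N=20)

    p_7_to_13 = X.cdf(13) - X.cdf(6)    # P(7 <= X <= 13)
    print(round(p_7_to_13, 3))          # about 0.974
    print(X.mean())                     # theoretical average, 10.0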
3.43. Each player will be put through the sequence (100 yards, four times) twice—once with
oxygen and once without, and we will observe the difference in their times on the final run.
(If oxygen speeds recovery, we would expect that the oxygen-boosted time will be lower.)
Randomly assign half of the players to use oxygen on the first trial, while the rest use it on
the second trial. Trials should be on different days to allow ample time for full recovery.
If we label the players 01 through 20 and begin on line 140, we choose 12, 13, 04, 18,
19, 16, 02, 08, 17, 10 to be in the oxygen-first group. See note on page 50 about using
Table B.
3.44. The sketches requested in the problem are not shown here; random assignments will vary
among students. (a) Label the circles 1 to 6, then randomly select three (using Table B, or
simply by rolling a die) to receive the extra CO2 . Observe the growth in all six regions,
and compare the mean growth within the three treated circles with the mean growth in the
other three (control) circles. (b) Select pairs of circles in each of three different areas of the
forest. For each pair, randomly select one circle to receive the extra CO2 (using Table B or
by flipping a coin). For each pair, compute the difference in growth (treated minus control).
3.45. (a) Randomly assign half the girls to get high-calcium punch; the other half will get
low-calcium punch. The response variable is not clearly described in this exercise; the best
we can say is “observe how the calcium is processed.” (b) Randomly select half of the
girls to receive high-calcium punch first (and low-calcium punch later), while the other half
gets low-calcium punch first (followed by high-calcium punch). For each subject, compute
the difference in the response variable for each level. This is a better design because it
deals with person-to-person variation; the differences in responses for 60 individuals give
more precise results than the difference in the average responses for two groups of 30
subjects. (c) The first five subjects are 38, 44, 18, 33, and 46. In the CR design, the first
group receives high-calcium punch all summer; in the matched pairs design, they receive
high-calcium punch for the first part of the summer, and then low-calcium punch in the
second half.
3.46. (a) False. Such regularity holds only in the long run. If it were true, you could look at
the first 39 digits and know whether or not the 40th was a 0. (b) True. All pairs of digits
(there are 100, from 00 to 99) are equally likely. (c) False. Four random digits have chance
1/10000 to be 0000, so this sequence will occasionally occur. 0000 is no more or less
random than 1234 or 2718 or any other four-digit sequence.
3.47. (a) This is a block design. (b) The diagram might be similar to the one below (which
assumes equal numbers of subjects in each group). (c) The results observed in this study
would rarely have occurred by chance if vitamin C were ineffective.
[Diagram: within the block of n runners, random assignment places subjects into Group 1 (Treatment 1: vitamin C) or Group 2 (Treatment 2: placebo); the same is done within the block of n nonrunners; all subjects are watched for infections.]
3.48. The population is faculty members at Mongolian public universities, and the sample is
the 300 faculty members to whom the survey was sent. Because we do not know how many
responses were received, we cannot determine the response rate.
Note: We might consider the population to be either the 2500 faculty members on the
list, or the larger group of “all current and future faculty members.” In the latter case,
those on the list constitute the sampling frame—the subset of the population from which our
sample will be selected.
The sample might be considered to be only those faculty who actually responded to the
survey (rather than the 300 selected), because that is the actual group from which we “draw
conclusions about the whole.”
3.49. The population is all forest owners in the region. The sample is the 772 forest owners
contacted. The response rate is 348/772 ≈ 45%. Aside from the given information, we would
like to know the sample design (and perhaps some other things).
Note: It would also be reasonable to consider the sample to be the 348 forest owners
who returned the survey, because that is the actual group from which we “draw conclusions
about the whole.”
3.50. To use Table B, number the list from 0 to 9 and choose two single digits. (One can
also assign labels 01–10, but that would require two-digit numbers, and we would almost
certainly end up skipping over many pairs of digits before we found two in the desired
range.)
It is worth noting that choosing an SRS is often described as “pulling names out of a
hat.” For long lists, it is often impractical to do this literally, but with such a small list, one
really could write each ringtone on a slip of paper and choose two slips at random.
3.51. See the solution to the previous exercise; for this problem, we need to choose three items
instead of two, but it is otherwise the same.
3.52. (a) This is a multistage sample: We first sample three of the seven course sections, then
eight from each chosen section. (b) This is an SRS: Each student has the same chance
(5/55) of being selected. (c) This is a voluntary response sample: Only those who visit the
site can participate (if they choose to). (d) This is a stratified random sample: Males and
females (the strata) are sampled separately.
3.53. (a) This statement confuses the ideas of population and sample. (If the entire population
is found in our sample, we have a census rather than a sample.) (b) “Dihydrogen monoxide”
is H2O. Any concern about the dangers posed by water most likely means that the
respondent did not know what dihydrogen monoxide was, and was too embarrassed to admit
it. (Conceivably, the respondent knew the question was about water and had concerns arising
from a bad experience of flood damage or near-drowning. But misunderstanding seems
to be more likely.) (c) Honest answers to such questions are difficult to obtain even in an
anonymous survey; in a public setting like this, it would be surprising if there were any
raised hands (even though there are likely to be at least a few cheaters in the room).
3.54. (a) The content of a single chapter is not random; choose random words from random
pages. (b) Students who are registered for an early-morning class might have different
characteristics from those who avoid such classes. (c) Alphabetic order is not random. One
problem is that the sample might include people with the same last name (siblings, spouses,
etc.). Additionally, some last names tend to be more common in some ethnic groups.
3.55. The population is (all) local businesses. The sample is the 73 businesses that return the
questionnaire, or the 150 businesses selected. The nonresponse rate is 77/150 ≈ 51.3%.
Note: The definition of “sample” makes it somewhat unclear whether the sample
includes all the businesses selected or only those that responded. My inclination is toward
the latter (the smaller group), which is consistent with the idea that the sample is “a part of
the population that we actually examine.”
3.56. (a) [Bar graph of the data by day of the week (Monday through Sunday); not reproduced here.] (b) The exclusion of the Christmas shopping
season might impact the numbers. In addition, much of this period fell during an economic recession.
3.57. Note that the numbers add to 100% down the columns; that is, 39% is the percent of Fox
viewers who are Republicans, not the percent of Republicans who watch Fox.
Students might display the data using a stacked bar graph like the one below, or
side-by-side bars. (They could also make four pie charts, but comparing slices across pies is
difficult.) The most obvious observation is that the party identification of Fox’s audience is
noticeably different from the other three sources.
[Stacked bar graph: percent of viewers who identify as Republican, Democrat, Independent, or Other/Don't know, for Fox, CNN, MSNBC, and network news.]
3.58. Exact descriptions of the populations may vary. (a) All current students—or perhaps all
current students who were enrolled during the year prior to the change. (The latter would be
appropriate if we want opinions based on a comparison between the new and old curricula.)
(b) All U.S. households. (c) Adult residents of the United States.
3.60. Assign labels 001 to 200. To use Table B, take three digits at a time beginning on
line 112; the first five pixels are 089, 064, 032, 117, and 003.
3.61. Set "Population = 1 to 200," select a sample of size 25, then click "Reset" and "Sample."
3.62. With the applet: set "Population = 1 to 371," select a sample of size 25, then click
"Reset" and "Sample." With Table B, line 120 gives the codes labeled 354, 239, 193, 099,
and 262.
3.63. One could use the labels already assigned to the blocks, but that would mean skipping
a lot of four-digit combinations that do not correspond to any block. An alternative would
be to drop the second digit and use labels 100–105, 200–211, and 300–325. But by far the
simplest approach is to assign labels 01–44 (in numerical order by the four-digit numbers
already assigned), enter the table at line 135, and select:
39 (block 3020), 10 (2003), 07 (2000), 11 (2004), and 20 (3001)
See note on page 50 about using Table B.
3.64. If one always begins at the same place, then the results could not really be called
random.
3.65. The sample will vary with the starting line in Table B. The simplest method is to use
the last digit of the numbers assigned to the blocks in Group 1 (that is, assign the labels
0–5), then choose one of those blocks; use the last two digits of the blocks in Group 2
(00–11) and choose two of those, and finally use the last two digits of the blocks in Group
3 (00–25) and choose three of them.
3.66. (a) If we choose one of the first 45 students and then every 45th name after that, we
will have a total of 9000/45 = 200 names. (b) Label the first 45 names 01–45. Beginning at
line 125, the first number we find is 21, so we choose names 21, 66, 111, . . . .
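For students using software rather than Table B, a minimal Python sketch of this systematic sample (assuming the 9000-name list described in the exercise; the variable names are ours) might look like this:

    # Sketch: systematic sample as in 3.66 -- one random start among the
    # first 45 names, then every 45th name after that.
    import random

    start = random.randint(1, 45)            # choose one of the first 45 names
    sample = list(range(start, 9001, 45))    # then every 45th name
    print(start, len(sample))                # the sample always has 200 names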
3.67. Considering the 9000 students of Exercise 3.66, each student is equally likely;
specifically, each name has chance 1/45 of being selected. To see this, note that each of the
first 45 has chance 1/45 because one is chosen at random. But each student in the second
45 is chosen exactly when the corresponding student in the first 45 is, so each of the second
45 also has chance 1/45. And so on.
This is not an SRS because the only possible samples have exactly one name from
the first 45, one name from the second 45, and so on; that is, there are only 45 possible
samples. An SRS could contain any 200 of the 9000 students in the population.
3.68. (a) This is a stratified random sample. (b) Label from 01 through 27; beginning at
line 122, we choose:
13 (805), 15 (760), 05 (916), 09 (510), 08 (925),
27 (619), 07 (415), 10 (650), 25 (909), and 23 (310)
Note: The area codes are in north-south order if we read across the rows; that is how they
were labeled for this solution. Students might label down rather than across; the sample
should include the same set of labels but a different list of area codes.
3.69. Assign labels 01–36 for the Climax 1 group, 01–72 for the Climax 2 group, and so on.
Then beginning at line 140, choose:
12, 32, 13, 04 from the Climax 1 group and (continuing on in Table B)
51, 44, 72, 32, 18, 19, 40 from the Climax 2 group
24, 28, 23 from the Climax 3 group and
29, 12, 16, 25 from the mature secondary group
See note on page 50 about using Table B.
3.70. Label the students 01, . . . , 30 and use Table B. Then label the faculty 0, . . . , 9 and use
the table again. (You could also label the faculty from 01 to 10, but that would needlessly
require two-digit labels.)
Note: Students often try some fallacious method of choosing both samples
simultaneously. We simply want to choose two separate SRSs: one from the students and one
from the faculty. See note on page 50 about using Table B.
3.71. Each student has a 10% chance: 3 out of 30 over-21 students, and 2 of 20 under-21
students. This is not an SRS because not every group of 5 students can be chosen; the only
possible samples are those with 3 older and 2 younger students.
3.72. Label the 500 midsize accounts from 001 to 500, and the 4400 small accounts from
0001 to 4400. On line 115, we first encounter numbers 417, 494, 322, 247, and 097 for the
midsize group, then 3698, 1452, 2605, 2480, and 3716 for the small group. See note on
page 50 about using Table B.
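A software version of this stratified selection (a sketch only, with the labels described above and Python's random module in place of Table B) could be:

    # Sketch: separate SRSs from the two strata in 3.72.
    import random

    midsize = random.sample(range(1, 501), 5)    # 5 of the accounts labeled 001-500
    small   = random.sample(range(1, 4401), 5)   # 5 of the accounts labeled 0001-4400
    print(sorted(midsize), sorted(small))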
3.73. (a) This design would omit households without telephones or with unlisted numbers.
Such households would likely be made up of poor individuals (who cannot afford a phone),
those who choose not to have phones, and those who do not wish to have their phone
numbers published. (b) Those with unlisted numbers would be included in the sampling
frame when a random-digit dialer is used.
3.74. (a) This will almost certainly produce a positive response because it draws the dubious
conclusion that cell phones cause brain cancer. Some people who drive cars, or eat carrots,
or vote Republican develop brain cancer, too. Do we conclude that these activities should
come with warning labels, also? (b) The phrasing of this question will tend to make
people respond in favor of national health insurance: It lists two benefits of such a system,
and no arguments from the other side of the issue. (c) This sentence is so convoluted
and complicated that it is almost unreadable; it is also vague (what sort of “economic
incentives”? How much would this cost?). A better phrasing might be, “Would you be
willing to pay more for the products you buy if the extra cost were used to conserve
resources by encouraging recycling?” That is still vague, but less so, and is written in plain
English.
3.75. The first wording brought the higher numbers in favor of a tax cut; “new government
programs” has considerably less appeal than the list of specific programs given in the second
wording.
3.76. Children from larger families will be overrepresented in such a sample. Student
explanations of why will vary; a simple illustration can aid in understanding this effect.
Suppose that there are 100 families with children; 60 families have one child and the other
40 have three. Then there are a total of 180 children (an average of 1.8 children per family),
and two-thirds (120) of those children come from families with three children. Therefore, if
we had a class (a sample) chosen from these 180 children, only one-third of the class would
answer “one” to the teacher’s question, and the rest would say “three.” This would give an
average of about 2.3 children per family.
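The arithmetic of this illustration is easy to verify with a short Python sketch (the 60/40 split is the made-up population described above, not data from the exercise):

    # Sketch: size-biased sampling -- sampling children instead of families
    # over-represents the larger families.
    families = [1] * 60 + [3] * 40                                 # family sizes
    children = [size for size in families for _ in range(size)]   # one entry per child

    per_family = sum(families) / len(families)     # 1.8 children per family
    per_child  = sum(children) / len(children)     # about 2.33 when children report
    print(per_family, round(per_child, 2))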
3.78. Responses to public opinion polls can be affected by things like the wording of the
question, as was the case here: Both statements address the question of how to distribute
wealth in a society, but subtle (and not-so-subtle) slants in the wording suggest that the
public holds conflicting opinions on the subjects.
3.79. The population is undergraduate college students. The sample is the 2036 students. (We
assume they were randomly selected.)
3.80. No; this is a voluntary response sample. The procedures described in the text apply to
data gathered from an SRS.
3.81. The larger sample would have less sampling variability. (That is, the results would have a
higher probability of being closer to the “truth.”)
3.82. (a) Parameters are associated with the population; statistics describe samples. (b) Bias
means that the center of sampling distribution is not equal to the true value of the
parameter; that is, bias is systematic under- or over-estimation. Variability refers to the
spread (not the center) of the sampling distribution. (c) Large samples generally have lower
variability, but if the samples are biased, that lower variability is of little use. (In addition,
larger samples generally come at a cost; the added cost might not justify the decrease in
variability.) (d) A sampling distribution might be visualized (or even simulated) with a
computer, but it arises from the process of sampling, not from computation.
3.83. (a) Population: college students. Sample: 17,096 students. (b) Population: restaurant
workers. Sample: 100 workers. (c) Population: longleaf pine trees. Sample: 40 trees.
3.84. (a) High bias, high variability (many are low, wide scatter). (b) Low bias, low variability
(close to parameter, little scatter). (c) Low bias, high variability (neither too low nor too
high, wide scatter). (d) High bias, low variability (too high, little scatter).
Note: Make sure that students understand that “high bias” means that the values are far
from the parameter, not that they are too high.
3.85. (a) . . . and the largest is 11.5 (from 9, 10, 12, 15). (b) Answers will vary. On the right
is the (exact) sampling distribution, showing all possible values of the experiment (so the
first rectangle is for 5.25, the next is for 5.5, etc.). Note that it looks roughly Normal; if we
had taken a larger sample from a larger population, it would appear even more Normal.
[Histogram of the exact sampling distribution of the sample mean, roughly 5 to 11, not reproduced.]
Note: This histogram was found by considering all C(10, 4) = 210 of the possible samples. A
collection of only 10 random samples will, of course, be considerably less detailed.
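A brute-force construction of such an exact sampling distribution is straightforward in Python; the population list below is a placeholder only, since the ten actual values are given in the exercise, not here:

    # Sketch: enumerate all C(10, 4) = 210 samples of size 4 and their means.
    from itertools import combinations
    from collections import Counter

    population = [4, 5, 6, 7, 8, 9, 10, 12, 14, 15]   # hypothetical values only

    means = [sum(s) / 4 for s in combinations(population, 4)]
    print(len(means))                        # 210 possible samples
    print(Counter(means).most_common(3))     # tallies behind the histogram bars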
3.86. No: With sufficiently large populations (“at least 100 times larger than the sample”), the
variability (and margin of error) depends on the sample size.
3.87. (a) This is a multistage sample. (b) Attitudes in smaller countries (many of which were
not surveyed) might be different. (c) An individual country’s reported percent will typically
differ from its true percent by no more than the stated margin of error. (The margins of
error differ among the countries because the sample sizes were not all the same.)
Note: The number of countries in the world is about 195 (the exact number depends on
the criteria of what constitutes a separate country). That means that about 60 countries are
not represented in this survey.
3.88. (a) The population is Ontario residents; the sample is the 61,239 people interviewed.
(b) The sample size is very large, so if there were large numbers of both sexes in the
sample—this is a safe assumption because we are told this is a “random sample”—these two
numbers should be fairly accurate reflections of the values for the whole population.
3.89. (a) The histogram should be centered at about 0.6 (with quite a bit of spread). For
reference, the theoretical histogram is shown below on the left; student results should have a
similar appearance. (b) The histogram should be centered at about 0.2 (with quite a bit of
spread). The theoretical histogram is shown below on the right.
[Theoretical histograms for parts (a) and (b), each over values from 0 to 1, not reproduced.]
3.91. (a) The scores will vary depending on the starting row. Note that the smallest possible
mean is 61.75 (from the sample 58, 62, 62, 65) and the largest is 77.25 (from 73, 74, 80,
82). (b) Answers will vary; shown below are two views of the (exact) sampling distribution.
The first shows all possible values of the experiment (so the first rectangle is for 61.75,
the next is for 62.00, etc.); the other shows values grouped from 61 to 61.75, 62 to 62.75,
etc. (which makes the histogram less bumpy). The tallest rectangle in the first picture is 8
units; in the second, the tallest is 28 units.
Note: These histograms were found by considering all C(10, 4) = 210 of the possible
samples. It happens that half (105) of those samples yield a mean smaller than 69.4 and
half yield a greater mean.
3.92. . . . and 78. The shape of the sampling distribution becomes more apparent if the
results of many students are pooled. Below on the right is an example based on 300 sample
means, which might arise from pooling all the results in a class of 30. [Histogram of values
of x-bar from roughly 20 to 75 not reproduced.]
Note: Because the values in these samples are not independent (there can be no repeats),
a stronger version of the central limit theorem is needed to determine that the sampling
distribution is approximately Normal. Confirming the standard deviation given above is a
reasonably difficult exercise even for a mathematics major.
3.93. (a) Below is the population stemplot (which gives the same information as a histogram).
The (population) mean GPA is µ ≈ 2.6352, and the standard deviation is σ ≈ 0.7794.
[Technically, we should take σ ≈ 0.7777, which comes from dividing by n rather than
n − 1, but few (if any) students would know this, and it has little effect on the results.]
(b) & (c) Results will vary; these histograms are not shown. Not every sample of size 20
could be viewed as "generally representative of the population," but most should bear at
least some resemblance to the population distribution.
0 134
0 567889
1 0011233444
1 5566667888888888999999
2 000000000111111111222222222333333333444444444
2 5555555555555666666667777777777777788888888888888999999
3 0000000000000011111111112222222223333333333333333444444444
3 556666666677777788889
4 0000
Many of the questions in Section 3.4 (Ethics), Exercises 3.96–3.117, are matters of opinion and
may be better used for class discussion rather than as assigned homework. A few comments
are included here.
3.96. These three proposals are clearly in increasing order of risk. Most students will likely
consider that (a) qualifies as minimal risk, and most will agree that (c) goes beyond minimal
risk.
3.97. (a) A nonscientist might raise different viewpoints and concerns from those considered
by scientists. (b) Answers will vary.
3.98. It is good to plainly state the purpose of the research (“To study how people’s religious
beliefs and their feelings about authority are related”). Stating the research thesis (that
orthodox religious beliefs are associated with authoritarian personalities) would cause bias.
3.102. (a) Ethical issues include informed consent and confidentiality; random assignment
generally is not an ethical consideration. (b) “Once research begins, the board monitors its
progress at least once a year.” (c) Harm need not be physical; psychological harm also needs
to be considered.
3.103. To control for changes in the mass spectrometer over time, we should alternate between
control and cancer samples.
3.105. The articles are “Facebook and academic performance: Reconciling a media sensation
with data” (Josh Pasek, eian more, Eszter Hargittai), a critique of the first article called “A
response to reconciling a media sensation with data” (Aryn C. Karpinski), and a response to
the critique (“Some clarifications on the Facebook-GPA study and Karpinski’s response”) by
the original authors. In case these articles are not available at the address given in the text,
they might be found elsewhere with a Web search.
3.106. (a) The simplest approach is to label from 00001 through 14959 and then take five
digits at a time from the table. A few clever students might think of some ways to make
this process more efficient, such as taking the first random digit chosen as “0” if it is
even and “1” if odd. (This way, fewer numbers need to be ignored.) (b) Using labels
00001–14959, we choose 05995, 06788, and 14293. Students who try an alternate approach
may have a different sample.
3.108. (a) A sample survey: We want to gather information about a population (U.S. residents)
based on a sample. (b) An experiment: We want to establish a cause-and-effect relationship
between teaching method and amount learned. (c) An observational study: There is no
particular population from which we will sample; we simply observe “your teachers,” much
like an animal behavioral specialist might study animals in the wild.
3.111. They cannot be anonymous because the interviews are conducted in person in the
subject’s home. They are certainly kept confidential.
Note: For more information about this survey, see the GSS Web site:
www.norc.org/GSS+Website
3.112. This offers anonymity, since names are never revealed. (However, faces are seen, so
there may be some chance of someone’s identity becoming known.)
3.116. (a) Those being surveyed should be told the kind of questions they will be asked
and the approximate amount of time required. (b) Giving the name and address of the
organization may give the respondents a sense that they have an avenue to complain should
they feel offended or mistreated by the pollster. (c) At the time that the questions are being
asked, knowing who is paying for a poll may introduce bias, perhaps due to nonresponse
(not wanting to give what might be considered a “wrong” answer). When information about
a poll is made public, though, the poll’s sponsor should be announced.
3.120. At norc.org, search for “Consumer Finances” or “SCF,” and from the SCF page,
click on the link to “website for SCF respondents.” At the time this manual was written, the
pledge was found at www.norc.org/scf2010/Confidentiality.html.
3.121. (a) You need information about a random selection of his games, not just the ones he
chooses to talk about. (b) These students may have chosen to sit in the front; all students
should be randomly assigned to their seats.
3.122. (a) A matched pairs design (two halves of the same board would have similar
properties). (b) A sample survey (with a stratified sample: smokers and nonsmokers). (c) A
block design (blocked by gender).
3.124. (a) For example, one could select (or recruit) a sample and assess each person’s calcium
intake (perhaps by having them record what they eat for a week), and measure his/her
TBBMD. (b) For example, measure each subject’s TBBMD, then randomly assign half the
subjects to take a calcium supplement, and the other half to take a placebo. After a suitable
period, measure TBBMD again. (c) The experiment, while more complicated, gives better
information about the relationship between these variables, because it controls for other
factors that may affect bone health.
3.126. Each subject should taste both kinds of fries in a randomly selected order and then
be asked about preference. One question to consider is whether they should have ketchup
available; many people typically eat fries with ketchup, and its presence or absence might
affect their preferences. If ketchup is used, should one use the same ketchup for both, or a
sample of the ketchup from each restaurant?
3.127. The two factors are gear (three levels) and steepness of the course (number of levels
not specified). Assuming there are at least three steepness levels—which seems like the
smallest reasonable choice—that means at least nine treatments. Randomization should be
used to determine the order in which the treatments are applied. Note that we must allow
ample recovery time between trials, and it would be best to have the rider try each treatment
several times.
3.129. (a) One possible population: all full-time undergraduate students in the fall term on a
list provided by the registrar. (b) A stratified sample with 125 students from each year is
one possibility. (c) Mailed (or emailed) questionnaires might have high nonresponse rates.
Telephone interviews exclude those without phones and may mean repeated calling for those
who are not home. Face-to-face interviews might be more costly than your funding will
allow. There might also be some response bias: Some students might be hesitant about
criticizing the faculty (while others might be far too eager to do so).
3.130. (a) For the two factors (administration method, with three levels, and dosage, with
two levels), the treatment combinations are shown in the table below, and the design is
diagrammed below.
            Injection   Skin patch   IV drip
    5 mg        1            2          3
   10 mg        4            5          6
(b) Larger samples give more information; in particular, with large samples, we reduce the
variability in the observed mean concentrations so that we can have more confidence that the
differences we might observe are due to the treatment applied rather than random fluctuation.
[Diagram: n subjects are randomly assigned to each of the six groups (Treatment 1: 5 mg, injected; Treatment 2: 5 mg, patch; Treatment 3: 5 mg, IV drip; Treatment 4: 10 mg, injected; Treatment 5: 10 mg, patch; Treatment 6: 10 mg, IV drip); the concentration in the blood is measured after 30 minutes.]
3.131. Use a block design: Separate men and women, and randomly allocate each gender
among the six treatments.
The remaining exercises relate to the material of Section 3.4 (Ethics). Answers are given
for the first two; the rest call for student opinions, or information specific to the student’s
institution.
3.132. Parents who fail to return the consent form may be more likely to place less priority
on education and therefore may give their children less help with homework, and so forth.
Including those children in the control group is likely to lower that group’s score.
Note: This is a generalization, to be sure: We are not saying that every such parent does
not value education, only that the percentage of this group that highly values education will
almost certainly be lower than that percentage of the parents who return the form.
3.133. The latter method (CASI) will show a higher percentage of drug use because
respondents will generally be more comfortable (and more assured of anonymity) about
revealing embarrassing or illegal behavior to a computer than to a person, so they will be
more likely to be honest.
Chapter 4 Solutions
4.1. Only 6 of the first 20 digits on line 119 (95857 07118 87664 92099, i.e., TTTTT HTHHT
TTTTH THHTT) correspond to "heads," so the proportion of heads is 6/20 = 0.3.
With such a small sample, random variation can produce results different from the expected
value (0.5).
4.2. The overall rate (56%) is an average. Graduation rates vary greatly among institutions;
some will have higher rates, and others lower.
4.3. (a) Most answers (99.5% of them) will be between 82% and 98%. (b) Based on 100,000
simulated trials—more than students are expected to do—the longest string of misses will
be quite short (3 or fewer with probability 99%, 5 or fewer with probability 99.99%). The
average (“expected”) longest run of misses is about 1.7. For shots made, the average run is
about 27, but there is lots of variation; 77% of simulations will have a longest run of made
shots between 17 and 37, and about 95% of simulations will fall between 12 and 46.
4.5. If you hear music (or talking) one time, you will almost certainly hear the same thing for
several more checks after that. (For example, if you tune in at the beginning of a 5-minute
song and check back every 5 seconds, you’ll hear that same song over 30 times.)
4.6. To estimate the probability, count the number of times the dice show 7 or 11, then divide
by 25. For “perfectly made” (fair) dice, the number of winning rolls will nearly always
(99.4% of the time) be between 1 and 11 out of 25.
4.7. Out of a very large number of patients taking this medication, the fraction who experience
this bad side effect is about 0.00001.
Note: Student explanations will vary, but should make clear that 0.00001 is a long-run
average rate of occurrence. Because a probability of 0.00001 is often stated as "1 in
100,000," it is tempting to interpret this probability as meaning "exactly 1 out of every
100,000." While we expect about 1 occurrence of side effects out of 100,000 patients, the
actual number of patients with side effects is random; it might be 0, or 1, or 2, . . . .
4.8. (a)–(c) Results will vary, but after n tosses, the distribution of the proportion p̂ is
approximately Normal with mean 0.5 and standard deviation 0.5/√n, while the distribution
of the count of heads is approximately Normal with mean 0.5n and standard deviation
0.5√n. Using the 68–95–99.7 rule, we have the results shown in the table below. For
example, after 300 tosses, nearly all students should have p̂ between 0.4134 and 0.5866,
and a count between 124 and 176. Note that the range for p̂ gets narrower, while the range
for the count gets wider.
      n     99.7% range for p̂     99.7% range for count
     50       0.5 ± 0.2121             25 ± 10.6
    150       0.5 ± 0.1225             75 ± 18.4
    300       0.5 ± 0.0866            150 ± 26.0
    600       0.5 ± 0.0612            300 ± 36.7
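The margins in the table come directly from the Normal approximation above; a small Python sketch that reproduces them (three standard deviations on each side of the mean) is:

    # Sketch: 99.7% margins for the proportion and the count of heads.
    from math import sqrt

    for n in (50, 150, 300, 600):
        margin_phat  = 3 * 0.5 / sqrt(n)     # margin for the proportion p-hat
        margin_count = 3 * 0.5 * sqrt(n)     # margin for the count of heads
        print(n, round(margin_phat, 4), round(margin_count, 1))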
4.9. The true probability (assuming perfectly fair dice) is 1 − (5/6)⁴ ≈ 0.5177, so students
should conclude that the probability is "quite close to 0.5."
4.10. Sample spaces will likely include blonde, brunette (or brown), black, red, and gray.
Depending on student imagination (and use of hair dye), other colors may be listed; there
should at least be options to answer "other" and "bald."
4.11. One possibility: from 0 to some sufficiently large number of hours (the largest number should be big enough to include
all possible responses). In addition, some students might respond with fractional answers
(e.g., 3.5 hours).
4.13. P(Blue, Green, Black, Brown, Grey, or White) = 1 − P(Purple, Red, Orange, or
Yellow) = 1 − (0.14 + 0.08 + 0.05 + 0.03) = 1 − 0.3 = 0.7. Using Rule 4 (the complement
rule) is slightly easier, because we only need to add the four probabilities of the colors we
do not want, rather than adding the six probabilities of the colors we want.
4.15. In Example 4.13, P(B) = P(6 or greater) was found to be 0.222, so P(A or B) =
P(A) + P(B) = 0.301 + 0.222 = 0.523.
4.16. For each possible value (1, 2, . . . , 6), the probability is 1/6.
4.18. If Ak is the event "the kth card drawn is an ace," then A1 and A2 are not independent; in
particular, if we know that A1 occurred, then the probability of A2 is only 3/51.
4.19. (a) The probability that both of two disjoint events occur is 0. (Multiplication is
appropriate for independent events.) (b) Probabilities must be no more than 1; P(A and B)
will be no more than 0.5. (We cannot determine this probability exactly from the given
information.) (c) P(Ac ) = 1 − 0.35 = 0.65.
4.20. (a) The two outcomes (say, A and B) in the sample space need not be equally likely.
The only requirements are that P(A) ≥ 0, P(B) ≥ 0, and P(A) + P(B) = 1. (b) In a
table of random digits, each digit has probability 0.1. (c) If A and B were independent,
then P(A and B) would equal P(A)P(B) = 0.06. (That is, probabilities are multiplied, not
added.) In fact, the given probabilities are impossible, because P(A and B) must be less than
the smaller of P(A) and P(B).
4.21. There are six possible outcomes: { link1, link2, link3, link4, link5, leave }.
4.22. There are an infinite number of possible outcomes, and the description of the sample
space will depend on whether the time is measured to any degree of accuracy (S is the set
of all positive numbers) or rounded to (say) the nearest second (S = {0, 1, 2, 3, . . .}), or
nearest tenth of a second (S = {0, 0.1, 0.2, 0.3 . . .}).
4.23. (a) P(“Empire State of Mind” or “I Gotta Feeling”) = 0.180 + 0.068 = 0.248.
(b) P(neither “Empire State of Mind” nor “I Gotta Feeling”) = 1 − 0.248 = 0.752.
4.24. (a) If Rk is the event " 'Party in the USA' is the kth chosen ringtone," then
P(R1 and R2) = P(R1)P(R2) = 0.107² = 0.011449. (b) The complement would be "at most
one ringtone is 'Party in the USA.' " P[(R1 and R2)c] = 1 − P(R1 and R2) = 0.988551.
4.25. (a) The given probabilities have sum 0.97, so P(type AB) = 0.03.
(b) P(type O or B) = 0.44 + 0.11 = 0.55.
4.26. P(both are type O) = (0.44)(0.52) = 0.2288; P(both are the same type) =
(0.42)(0.35) + (0.11)(0.10) + (0.03)(0.03) + (0.44)(0.52) = 0.3877.
4.27. (a) Not legitimate because the probabilities sum to 2. (b) Legitimate (for a nonstandard
deck). (c) Legitimate (for a nonstandard die).
4.28. (a) The given probabilities have sum 0.77, so P(French) = 0.23.
(b) P(not English) = 1 − 0.59 = 0.41, using Rule 4. (Or, add the other three probabilities.)
4.29. (a) The given probabilities have sum 0.72, so this probability must be 0.28.
(b) P(at least a high school education) = 1 − P(has not finished HS) = 1 − 0.12 = 0.88.
(Or add the other three probabilities.)
4.30. The probabilities of 2, 3, 4, and 5 are unchanged (1/6), so P(1 or 6) must still be 1/3.
If P(6) = 0.21, then P(1) = 1/3 − 0.21 = 0.123 (or 37/300). The complete table follows.
    Face           1      2    3    4    5     6
    Probability  0.123   1/6  1/6  1/6  1/6   0.21
4.31. For example, the probability for A-positive blood is (0.42)(0.84) = 0.3528 and for
A-negative (0.42)(0.16) = 0.0672.
Blood type A+ A– B+ B– AB+ AB– O+ O–
Probability 0.3528 0.0672 0.0924 0.0176 0.0252 0.0048 0.3696 0.0704
4.32. (a) All are equally likely; the probability is 1/38. (b) Because 18 slots are red, the
probability of a red is P(red) = 18/38 ≈ 0.474. (c) There are 12 winning slots, so P(win a
column bet) = 12/38 ≈ 0.316.
4.33. (a) There are six arrangements of the digits 4, 9, and 1 (491, 419, 941, 914, 149,
194), so that P(win) = 6/1000 = 0.006. (b) The only winning arrangement is 222, so
P(win) = 1/1000 = 0.001.
4.34. (a) There are 10⁴ = 10,000 possible PINs (0000 through 9999).* (b) The probability that
a PIN has no 0s is 0.9⁴ (because there are 9⁴ PINs that can be made from the nine nonzero
digits), so the probability of at least one 0 is 1 − 0.9⁴ = 0.3439.
*If we assume that PINs cannot have leading 0s, then there are only 9000 possible codes
(1000–9999), and the probability of at least one 0 is 1 − 9⁴/9000 ≈ 0.271.
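Both complement-rule answers can be confirmed by brute-force counting; here is a short Python sketch (our own check, not part of the printed solution):

    # Sketch: count PINs containing at least one 0.
    pins = [f"{i:04d}" for i in range(10000)]              # 0000 through 9999
    with_zero = sum("0" in p for p in pins)
    print(with_zero / 10000)                               # 0.3439 = 1 - 0.9**4

    no_leading = [p for p in pins if p[0] != "0"]          # the 9000 codes 1000-9999
    print(sum("0" in p for p in no_leading) / 9000)        # about 0.271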
4.35. P(none are O-negative) = (1 − 0.07)¹⁰ ≈ 0.4840, so P(at least one is
O-negative) = 1 − 0.4840 = 0.5160.
4.36. If we assume that each site is independent of the others (and that they can be considered
as a random sample from the collection of sites referenced in scientific journals), then P(all
seven are still good) = 0.87⁷ ≈ 0.3773.
4.37. This computation would only be correct if the events “a randomly selected person is at
least 75” and “a randomly selected person is a woman” were independent. This is likely
not true; in particular, as women have a greater life expectancy than men, this fraction is
probably greater than 3%.
4.38. As P(R) = 2/6 and P(G) = 4/6, and successive rolls are independent, the respective
probabilities are:
    (2/6)⁴(4/6) = 2/243 ≈ 0.00823,   (2/6)⁴(4/6)² = 4/729 ≈ 0.00549,   and   (2/6)⁵(4/6) = 2/729 ≈ 0.00274.
4.39. (a) (0.65)³ ≈ 0.2746 (under the random walk theory). (b) 0.35 (because performance in
separate years is independent). (c) (0.65)² + (0.35)² = 0.545.
4.40. For any event A, along with its complement Ac , we have P(S) = P(A or Ac ) because
“A or Ac ” includes all possible outcomes (that is, it is the entire sample space S). By Rule 2,
P(S) = 1, and by Rule 3, P(A or Ac ) = P(A) + P(Ac ), because A and Ac are disjoint.
Therefore, P(A) + P(Ac ) = 1, from which Rule 4 follows.
4.41. Note that A = (A and B) or (A and Bc), and the events (A and B) and (A and Bc) are
disjoint, so Rule 3 says that
    P(A) = P[(A and B) or (A and Bc)] = P(A and B) + P(A and Bc).
If P(A and B) = P(A)P(B), then we have P(A and Bc) = P(A) − P(A)P(B) =
P(A)(1 − P(B)), which equals P(A)P(Bc) by the complement rule.
4.42. (a) Hannah and Jacob's children can have alleles AA, BB, or AB, so they can have
blood type A, B, or AB. (The table below shows the possible combinations.) (b) Either note
that the four combinations in the table are equally likely, or compute:
         A    B
    A   AA   AB
    B   AB   BB
    P(type A) = P(A from Hannah and A from Jacob) = P(AH)P(AJ) = 0.5² = 0.25
    P(type B) = P(B from Hannah and B from Jacob) = P(BH)P(BJ) = 0.5² = 0.25
    P(type AB) = P(AH)P(BJ) + P(BH)P(AJ) = 2 · 0.25 = 0.5
4.43. (a) Nancy and David's children can have alleles BB, BO, or OO, so they can have
blood type B or O. (The table below shows the possible combinations.) (b) Either note that
the four combinations in the table are equally likely or compute P(type O) = P(O from
Nancy and O from David) = 0.5² = 0.25 and P(type B) = 1 − P(type O) = 0.75.
         B    O
    B   BB   BO
    O   BO   OO
4.44. Any child of Jennifer and José has a 50% chance of being type A (alleles AA or AO),
and each child inherits alleles independently of other children, so P(both are type A) =
0.5² = 0.25. For one child, we have P(type A) = 0.5 and P(type AB) = P(type B) = 0.25,
so that P(both have the same type) = 0.5² + 0.25² + 0.25² = 0.375 = 3/8.
         A    O
    A   AA   AO
    B   AB   BO
4.45. (a) Any child of Jasmine and Joshua has an equal (1/4) chance of having blood type
AB, A, B, or O (see the allele combinations in the table below). Therefore, P(type O) = 0.25.
(b) P(all three have type O) = 0.25³ = 0.015625 = 1/64. P(first has type O, next two do
not) = 0.25 · 0.75² = 0.140625 = 9/64.
         A    O
    B   AB   BO
    O   AO   OO
4.49. (a) The probabilities for a discrete random variable always add to one. (b) Continuous
random variables can take values from any interval, not just 0 to 1. (c) A Normal random
variable is continuous. (Also, a distribution is associated with a random variable, but
“distribution” and “random variable” are not the same things.)
4.50. (a) If T is the event that a person uses Twitter, we can write the sample space as
{T, T c }. (b) There are various ways to express this; one would be
{T T T, T T T c , T T c T, T c T T, T T c T c , T c T T c , T c T c T, T c T c T c }.
(c) For this random variable (call it X ), the sample space is {0, 1, 2, 3}. (d) The sample
space in part (b) reveals which of the three people use Twitter. This may or may not be
important information; it depends on what questions we wish to ask about our sample.
4.51. (a) Based on the information from Exercise 4.50, along with the complement rule,
P(T) = 0.19 and P(Tc) = 0.81. (b) Use the multiplication rule for independent events;
for example, P(TTT) = 0.19³ ≈ 0.0069, P(TTTc) = (0.19²)(0.81) ≈ 0.0292,
P(TTcTc) = (0.19)(0.81²) ≈ 0.1247, and P(TcTcTc) = 0.81³ ≈ 0.5314. (c) Add up the
probabilities from (b) that correspond to each value of X.
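Part (c) can also be done by listing all eight outcomes with software. The sketch below assumes, as the solution does, that the three people use Twitter independently with probability 0.19:

    # Sketch: distribution of X = number of Twitter users among three people.
    from itertools import product
    from collections import defaultdict

    p = 0.19
    dist = defaultdict(float)
    for outcome in product([True, False], repeat=3):   # (T, T, T), (T, T, Tc), ...
        prob = 1.0
        for uses in outcome:
            prob *= p if uses else (1 - p)
        dist[sum(outcome)] += prob                     # X = number who use Twitter

    print({x: round(pr, 4) for x, pr in sorted(dist.items())})
    # {0: 0.5314, 1: 0.374, 2: 0.0877, 3: 0.0069}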
4.52. The two histograms are shown below. The most obvious difference is that a “family”
must have at least two people. Otherwise, the family-size distribution has slightly larger
probabilities for 2, 3, or 4, while for large family/household sizes, the differences between
the distributions are small.
[Two probability histograms (household size and family size), each over sizes 1 through 7, with probabilities from 0 to 0.4.]
4.53. (a) See also the solution to Exercise 4.22. If we view this time as being measured to
any degree of accuracy, it is continuous; if it is rounded, it is discrete. (b) A count like this
must be a whole number, so it is discrete. (c) Incomes—whether given in dollars and cents,
or rounded to the nearest dollar—are discrete. (However, it is often useful to treat such
variables as continuous.)
4.54. . . . the horizontal axis.) (c) P(at least one ace) = 0.1493, which can be computed
either as 0.1448 + 0.0045 or 1 − 0.8507. [Probability histogram for 0, 1, or 2 aces not
reproduced.]
4.55. (a) Histogram not reproduced here. (b) "At least one nonword error" is the event
"X ≥ 1" (or "X > 0"). P(X ≥ 1) = 1 − P(X = 0) = 0.9. (c) "X ≤ 2" is "no more than two
nonword errors," or "fewer than three nonword errors."
4.57. (a) The pairs are given in the original table (not reproduced here); we must assume
that we can distinguish between, for example, "(1,2)" and "(2,1)"; otherwise, the outcomes
are not equally likely. (b) Each pair has probability 1/36. (c) The value of X is given below
each pair; the distribution is shown below. For example, there are four pairs that add to 5,
so P(X = 5) = 4/36. (d) P(7 or 11) = 6/36 + 2/36 = 8/36 = 2/9. (e) P(not 7) =
1 − 6/36 = 5/6.
    Value of X     2     3     4     5     6     7     8     9    10    11    12
    Probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
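A quick Python enumeration of the 36 ordered pairs reproduces these probabilities (an optional check; the exercise expects the table to be written out by hand):

    # Sketch: distribution of the sum of two fair dice.
    from itertools import product
    from collections import Counter
    from fractions import Fraction

    sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    print(sums[5], Fraction(sums[5], 36))       # 4 pairs sum to 5, so P(X = 5) = 4/36 = 1/9
    print(Fraction(sums[7] + sums[11], 36))     # P(7 or 11) = 8/36 = 2/9
    print(Fraction(36 - sums[7], 36))           # P(not 7) = 5/6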
4.58. The possible values of Y are 1, 2, 3, . . . , 12, each with probability 1/12. Aside from
drawing a diagram showing all the possible combinations, one can reason that the first
(regular) die is equally likely to show any number from 1 through 6. Half of the time, the
second roll shows 0, and the rest of the time it shows 6. Each possible outcome therefore
has probability (1/6) · (1/2) = 1/12.
4.59. The additional columns to add to the table given in the solution to Exercise 4.57 are the
pairs (1,7), (1,8), (2,7), (2,8), . . . , (6,7), (6,8), with sums 8, 9, 9, 10, 10, 11, 11, 12, 12, 13,
13, 14. There are 48 possible (equally-likely) combinations.
    Value of X     2     3     4     5     6     7     8     9    10    11    12    13    14
    Probability  1/48  2/48  3/48  4/48  5/48  6/48  6/48  6/48  5/48  4/48  3/48  2/48  1/48
4.60. (a) W can be 0, 1, 2, or 3. (b) See the top two lines of the table below. (c) The
distribution is given in the bottom two lines of the table. For example, P(W = 0) =
(0.73)(0.73)(0.73) ≈ 0.3890, and in the same way, P(W = 3) = 0.27³ ≈ 0.0197. For
P(W = 1), note that each of the three arrangements that give W = 1 has probability
(0.73)(0.73)(0.27) = 0.143883, so P(W = 1) = 3(0.143883) ≈ 0.4316. Similarly,
P(W = 2) = 3(0.73)(0.27)(0.27) ≈ 0.1597.
    Arrangement   DDD     DDF     DFD     FDD     FFD     FDF     DFF     FFF
    Probability   0.3890  0.1439  0.1439  0.1439  0.0532  0.0532  0.0532  0.0197
    Value of W      0       1       2       3
    Probability   0.3890  0.4316  0.1597  0.0197
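Because the three outcomes are independent with the same probability 0.27 of F, W is binomial, and the table can be checked with scipy (an optional software check, not part of the printed solution):

    # Sketch: W ~ Binomial(n = 3, p = 0.27).
    from scipy.stats import binom

    W = binom(n=3, p=0.27)
    print([round(W.pmf(k), 4) for k in range(4)])
    # [0.389, 0.4316, 0.1597, 0.0197]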
4.61. (a) P(X < 0.6) = 0.6. (b) P(X ≤ 0.6) = 0.6. (c) For continuous random variables,
“equal to” has no effect on the probability; that is, P(X = c) = 0 for any value of c.
4.62. (a) P(X ≥ 0.30) = 0.7. (b) P(X = 0.30) = 0. (c) P(0.30 < X < 1.30) =
P(0.30 < X < 1) = 0.7. (d) P(0.20 ≤ X ≤ 0.25 or 0.7 ≤ X ≤ 0.9) = 0.05 + 0.2 = 0.25.
(e) P(not [0.4 ≤ X ≤ 0.7]) = 1 − P(0.4 ≤ X ≤ 0.7) = 1 − 0.3 = 0.7.
4.64. (a) The area of a triangle is (1/2)bh = (1/2)(2)(1) = 1. (b) P(Y < 1) = 0.5.
(c) P(Y > 0.6) = 0.82; the easiest way to compute this is to note that the unshaded area is
a triangle with area (1/2)(0.6)(0.6) = 0.18.
4.65. P(8 ≤ x̄ ≤ 10) = P((8 − 9)/0.0724 ≤ (x̄ − 9)/0.0724 ≤ (10 − 9)/0.0724) =
P(−13.8 ≤ Z ≤ 13.8). This probability is essentially 1; x̄ will almost certainly estimate µ
within ±1 (in fact, it will almost certainly be much closer than this).
4.66. (a) P(0.52 ≤ p̂ ≤ 0.60) = P((0.52 − 0.56)/0.019 ≤ (p̂ − 0.56)/0.019 ≤ (0.60 − 0.56)/0.019) =
P(−2.11 ≤ Z ≤ 2.11) = 0.9826 − 0.0174 = 0.9652. (b) P(p̂ ≥ 0.72) =
P((p̂ − 0.56)/0.019 ≥ (0.72 − 0.56)/0.019) = P(Z ≥ 8.42); this is basically 0.
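The same two probabilities can be computed directly from the Normal distribution with software, which avoids rounding z to 2.11 (a sketch; scipy's norm is used here in place of Table A):

    # Sketch: Normal probabilities for p-hat with mean 0.56 and sd 0.019.
    from scipy.stats import norm

    phat = norm(loc=0.56, scale=0.019)
    print(round(phat.cdf(0.60) - phat.cdf(0.52), 4))   # part (a): about 0.965
    print(phat.sf(0.72))                               # part (b): essentially 0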
4.71. First we note that µX = 0(0.5) + 2(0.5) = 1, so σX² = (0 − 1)²(0.5) + (2 − 1)²(0.5) = 1
and σX = √1 = 1.
4.72. (a) Each toss of the coin is independent (that is, coins have no memory). (b) The
variance is multiplied by 102 = 100. (The mean and standard deviation are multiplied
by 10.) (c) The correlation does not affect the mean of a sum (although it does affect the
variance and standard deviation).
4.75. The average grade is µ = (0)(0.05) + (1)(0.04) + (2)(0.20) + (3)(0.40) + (4)(0.31) = 2.88.
4.78. In the solution to Exercise 4.75, we found the average grade was µ = 2.88, so
σ 2 = (0 − 2.88)2 (0.05) + (1 − 2.88)2 (0.04)
+ (2 − 2.88)2 (0.2) + (3 − 2.88)2 (0.4) + (4 − 2.88)2 (0.31) = 1.1056,
and the standard deviation is σ = √1.1056 ≈ 1.0515.
4.79. (a) With ρ = 0, the variance of X + Y is σX² + σY² = (75)² + (41)² = 7306, so the
standard deviation is √7306 ≈ $85.48. (b) This is larger; the negative correlation decreased
the variance.
4.80. (a) The mean of Y is µY = 1—the obvious balance point of the triangle. (b) Both X1
and X2 have mean µX1 = µX2 = 0.5, and µY = µX1 + µX2.
4.81. The situation described in this exercise—“people who have high intakes of calcium
in their diets are more compliant than those who have low intakes”—implies a positive
correlation between calcium intake and compliance. Because of this, the variance of total
calcium intake is greater than the variance we would see if there were no correlation (as the
calculations in Example 4.38 demonstrate).
4.82. Let N and W be nonword and word error counts. In Exercise 4.76, we found
µN = 1.9 errors and µW = 1 error. The variances of these distributions are σN² = 1.29
and σW² = 1, so the standard deviations are σN ≈ 1.1358 errors and σW = 1 error. The
mean total error count is µN + µW = 2.9 errors for both cases. (a) If error counts are
independent (so that ρ = 0), the variance of N + W is σN² + σW² = 2.29, so σN+W ≈ 1.5133
errors. (Note that we add the variances, not the standard deviations.) (b) With ρ = 0.5, the
variance of N + W is σN² + σW² + 2ρσNσW = 2.29 + 1.1358 = 3.4258, so σN+W ≈ 1.8509
errors.
4.83. (a) The mean for one coin is µ1 = (0)(1/2) + (1)(1/2) = 0.5 and the variance is
σ1² = (0 − 0.5)²(1/2) + (1 − 0.5)²(1/2) = 0.25, so the standard deviation is σ1 = 0.5.
(b) Multiply µ1 and σ1² by 4: µ4 = 4µ1 = 2 and σ4² = 4σ1² = 1, so σ4 = 1. (c) Note that
because of the symmetry of the distribution, we do not need to compute the mean to see
that µ4 = 2; this is the obvious balance point of the probability histogram in Figure 4.7. The
details of the two computations are
µW = (0)(0.0625) + (1)(0.25) + (2)(0.375) + (3)(0.25) + (4)(0.0625) = 2
σW² = (0 − 2)²(0.0625) + (1 − 2)²(0.25) + (2 − 2)²(0.375) + (3 − 2)²(0.25) + (4 − 2)²(0.0625) = 1.
4.84. If D is the result of rolling a single four-sided die, then µD = (1 + 2 + 3 + 4)(1/4) = 2.5,
and σD² = [(1 − 2.5)² + (2 − 2.5)² + (3 − 2.5)² + (4 − 2.5)²](1/4) = 1.25. Then for the sum
of the rolls, add the means, and (because the rolls are independent) add the variances.
4.87. (a) Not independent: Knowing the total X of the first two cards tells us something
about the total Y for three cards. (b) Independent: Separate rolls of the dice should be
independent.
4.88. Divide the given values by 2.54: µ ≈ 69.6063 in and σ ≈ 2.8346 in.
4.91. Although the probability of having to pay for a total loss for one or more of the 10
policies is very small, if this were to happen, it would be financially disastrous. On the other
hand, for thousands of policies, the law of large numbers says that the average claim on
many policies will be close to the mean, so the insurance company can be assured that the
premiums they collect will (almost certainly) cover the claims.
4.93. (a) Add up the given probabilities and subtract from 1; this gives P(man does not die
in the next five years) = 0.99749. (b) The distribution of income (or loss) is given below.
Multiplying each possible value by its probability gives the mean income µ ≈ $623.22.
4.94. The mean µ of the company’s “winnings” (premiums) minus its “losses” (insurance
claims) is positive. Even though the company will lose a large amount of money on a small
number of policyholders who die, it will gain a small amount on the majority. The law of
large numbers says that the average “winnings” minus “losses” should be close to µ, and
overall the company will almost certainly show a profit.
4.95. The events “roll a 3” and “roll a 5” are disjoint, so P(3 or 5) = P(3) + P(5) =
1/6 + 1/6 = 2/6 = 1/3.
4.96. The events E (roll is even) and G (roll is greater than 4) are not disjoint—specifically,
E and G = {6}—so P(E or G) = P(E) + P(G) − P(E and G) = 3/6 + 2/6 − 1/6 = 4/6 = 2/3.
4.97. Let A be the event “next card is an ace” and B be “two of Slim’s four cards are aces.”
Then, P(A | B) = 2/48 because (other than those in Slim’s hand) there are 48 cards, of which
2 are aces.
4.98. Let A1 = “the next card is a diamond” and A2 = “the second card is a diamond.”
We wish to find P(A1 and A2). There are 27 unseen cards, of which 10 are diamonds, so
P(A1) = 10/27, and P(A2 | A1) = 9/26, so P(A1 and A2) = (10/27) × (9/26) = 5/39 ≈ 0.1282.
Note: Technically, we wish to find P(A1 and A2 | B), where B is the given event (25 cards
visible, with 3 diamonds in Slim’s hand). We have P(A1 | B) = 10/27 and
P(A2 | A1 and B) = 9/26, and compute P(A1 and A2 | B) = P(A1 | B) × P(A2 | A1 and B).
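A minimal Python sketch of the same multiplication, using exact fractions built from the counts quoted above (27 unseen cards, 10 diamonds, then 9 of 26):

    from fractions import Fraction

    p_first = Fraction(10, 27)               # P(A1): 10 diamonds among 27 unseen cards
    p_second_given_first = Fraction(9, 26)   # P(A2 | A1)
    p_both = p_first * p_second_given_first

    print(p_both, float(p_both))             # 5/39  0.1282...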
4.99. This computation uses the addition rule for disjoint events, which is appropriate for this
setting because B (full-time students) is made up of four disjoint groups (those in each of
the four age groups).
4.100. With A and B as defined in Example 4.44 (respectively, 15- to 19-year-old students, and
full-time students), we want to find
P(B | A) = P(B and A)/P(A) = 0.21/(0.21 + 0.02) ≈ 0.9130
For these two calculations, we restrict our attention to different subpopulations of students
(that is, different rows of the table given in Example 4.44). For P(A | B), we ask what
fraction of full-time students (the subpopulation) are aged 15 to 19 years. For P(B | A), we
ask what fraction of the subpopulation of 15- to 19-year-old students are full-time.
4.101. The tree diagram shows the probability found in Exercise 4.98 on the top branch. The
middle two branches (added together) give the probability that Slim gets exactly one diamond
from the next two cards, and the bottom branch is the probability that neither card is a
diamond. The four branches are: diamond then diamond, (10/27)(9/26) = 5/39; diamond then
non-diamond, (10/27)(17/26) = 85/351; non-diamond then diamond, (17/27)(10/26) = 85/351;
non-diamond then non-diamond, (17/27)(16/26) = 136/351.
4.102. (a) The given statement is only true for disjoint events; in general, P(A or B) =
P(A) + P(B) − P(A and B). (b) P(A) plus P(Ac ) is always equal to one. (c) Two events
are independent if P(B | A) = P(B). They are disjoint if P(A and B) = 0.
4.103. For a randomly chosen adult, let S = “(s)he gets enough sleep” and let E = “(s)he
gets enough exercise,” so P(S) = 0.4, P(E) = 0.46, and P(S and E) = 0.24.
(a) P(S and E c ) = 0.4 − 0.24 = 0.16. (b) P(S c and E) = 0.46 − 0.24 = 0.22.
(c) P(S c and E c ) = 1 − (0.4 + 0.46 − 0.24) = 0.38. (d) The answers in (a) and
(b) are found by a variation of the addition rule for disjoint events: We note that
P(S) = P(S and E) + P(S and E c ) and P(E) = P(S and E) + P(S c and E). In each
case, we know the first two probabilities, and find the third by subtraction. The answer
for (c) is found by using the general addition rule to find P(S or E), and noting that
S c and E c = (S or E)c .
4.105. For a randomly chosen high school student, let L = “student admits to lying” and
M = “student is male,” so P(L) = 0.48, P(M) = 0.5, and P(M and L) = 0.25. Then
P(M or L) = P(M) + P(L) − P(M and L) = 0.73.
4.106. Using the addition rule for disjoint events, note that P(M^c and L) = P(L) −
P(M and L) = 0.23. Then by the definition of conditional probability, P(M^c | L) =
P(M^c and L)/P(L) = 0.23/0.48 ≈ 0.4792.
4.107. Let B = “student is a binge drinker” and M = “student is male.” (a) The four
probabilities sum to 0.11 + 0.12 + 0.32 + 0.45 = 1. (b) P(B c ) = 0.32 + 0.45 = 0.77.
(c) P(B^c | M) = P(B^c and M)/P(M) = 0.32/(0.11 + 0.32) ≈ 0.7442. (d) In the language of this chapter, the
events are not independent. An attempt to phrase this for someone who has not studied this
material might say something like, “Knowing a student’s gender gives some information
about whether or not that student is a binge drinker.”
Note: Specifically, male students are slightly more likely to be binge drinkers. This
statement might surprise students who look at the table and note that the proportion
of binge drinkers in the men’s column is smaller than that proportion in the women’s
column. We cannot compare those proportions directly; we need to compare the conditional
probabilities of binge drinkers within each given gender (see the solution to the next
exercise.)
4.108. Let B = “student is a binge drinker” and M = “student is male.” (a) These two
probabilities are given as entries in the table: P(M and B) = 0.11 and P(M c and B) = 0.12.
(b) These are conditional probabilities: P(B | M) = P(B and M)/P(M) = 0.11/(0.11 + 0.32) ≈ 0.2558 and
P(B | M^c) = P(B and M^c)/P(M^c) = 0.12/(0.12 + 0.45) ≈ 0.2105. (c) The fact that P(B | M) > P(B | M^c)
indicates that male students are more likely to be binge drinkers (see the comment in the
solution to the previous exercise). The other comparison, P(M and B) < P(M c and B), is
more a reflection of the fact that the survey reported responses for more women (57%)
than men (43%) and does not by itself allow for comparison of binge drinking between
the genders. (To understand this better, imagine a more extreme case, where, say, 90% of
respondents were women . . . .)
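For readers who prefer to verify such conditional probabilities with software, a short Python sketch using the four joint probabilities quoted in this solution:

    joint = {("M", "B"): 0.11, ("F", "B"): 0.12,
             ("M", "notB"): 0.32, ("F", "notB"): 0.45}

    p_m = joint[("M", "B")] + joint[("M", "notB")]    # P(M)   = 0.43
    p_f = joint[("F", "B")] + joint[("F", "notB")]    # P(M^c) = 0.57

    print(joint[("M", "B")] / p_m)    # P(B | M)   = 0.2558...
    print(joint[("F", "B")] / p_f)    # P(B | M^c) = 0.2105...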
4.110. The branches of this tree diagram have the probabilities given in Exercise 4.109, and
the branches end with the probabilities found in the solution to that exercise: the 4-year
branch (0.61) splits into male (0.44, ending probability 0.2684) and female (0.56, ending
probability 0.3416), while the 2-year branch (0.39) splits into male (0.41, ending probability
0.1599) and female (0.59, ending probability 0.2301).
4.112. P(A or B) = P(A) + P(B) − P(A and B) = 0.138 + 0.261 − 0.082 = 0.317.
4.113. P(A | B) = P(A and B)/P(B) = 0.082/0.261 ≈ 0.3142. If A and B were independent, then P(A | B)
would equal P(A), and also P(A and B) would equal the product P(A)P(B).
4.116. Let A be the event “income ≥ $1 million” and B be “income ≥ $100,000.” Then
“A and B” is the same as A, so:
P(A | B) = P(A)/P(B) = (392,220/142,978,806)/(17,993,498/142,978,806) = 392,220/17,993,498 ≈ 0.02180
4.117. See also the solution to Exercise 4.115, especially the table of probabilities given there.
(a) P(A^c | B) = P(A^c and B)/P(B) = 0.14/0.22 ≈ 0.6364. (b) The events A and B are not
independent; if they were, P(A^c | B) would equal P(A^c).
4.118. Begin by placing P(A and B and C) = 0 in the center of the Venn diagram; had this
probability been positive, we would have subtracted it from each of the two-way intersection
probabilities to find, for example, P(A and B and C^c). Next, determine P(A only) so that the
total probability of the regions that make up the event A is 0.7. Finally, P(none) =
P(A^c and B^c and C^c) = 0.
4.119. We seek P(at least one offer) = P(A or B or C); we can find this as 1 − P(no
offers) = 1 − P(Ac and B c and C c ). We see in the Venn diagram of Exercise 4.118 that this
probability is 1.
4.120. This is P(A and B and C c ). As was noted in Exercise 4.118, because
P(A and B and C) = 0, this is the same as P(A and B) = 0.3.
4.122. Let W = “the degree was earned by a woman” and P = “the degree was a professional
degree.” (a) To construct the table (below), divide each entry by the grand total of
all entries (2403); for example, 933/2403 ≈ 0.3883 is the fraction of all degrees that were
bachelor’s degrees awarded to women. Some students may also find the row totals (1412
and 991) and the column totals (1594, 662, 95, 52) and divide those by the grand total;
for example, 1594/2403 ≈ 0.6633 is the fraction of all degrees that were bachelor’s degrees.
(b) P(W) = 1412/2403 ≈ 0.5876 (this is one of the optional marginal probabilities from the
table below). (c) P(W | P) = (51/2403)/(95/2403) = 51/95 ≈ 0.5368. (This is the “Female”
entry from the “Professional” column, divided by that column’s total.) (d) W and P are not
independent; if they were, the two probabilities in (b) and (c) would be equal.
4.123. Let M be the event “the person is a man” and B be “the person earned a bachelor’s
degree.” (a) P(M) = 991/2403 ≈ 0.4124. Or take the answer from part (b) of the
previous exercise and subtract from 1. (b) P(B | M) = (661/2403)/(991/2403) = 661/991 ≈ 0.6670.
(This is the “Bachelor’s” entry from the “Male” row, divided by that row’s total.)
(c) P(M and B) = P(M)P(B | M) ≈ (0.4124)(0.6670) ≈ 0.2751. This agrees with the
directly computed probability: P(M and B) = 661/2403 ≈ 0.2751.
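A brief Python check of these three probabilities, built only from the counts quoted above (991 men, 661 bachelor's degrees earned by men, 2403 degrees in all):

    total, men, bach_men = 2403, 991, 661

    p_m = men / total                # P(M)     = 0.4124...
    p_b_given_m = bach_men / men     # P(B | M) = 0.6670...
    p_m_and_b = p_m * p_b_given_m    # multiplication rule

    print(p_m, p_b_given_m, p_m_and_b, bach_men / total)   # last two agree: 0.2751...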
4.124. Each unemployment rate is computed as shown below. (Alternatively, subtract the
number employed from the number in the labor force, then divide that difference by the
number in the labor force.) Because these rates (probabilities) are different, education level
and being employed are not independent.
Did not finish HS: 1 − 11,552/12,623 ≈ 0.0848
HS/no college: 1 − 36,249/38,210 ≈ 0.0513
Some college: 1 − 32,429/33,928 ≈ 0.0442
College graduate: 1 − 39,250/40,414 ≈ 0.0288
4.125. (a) Add up the numbers in the first and second columns. We find that there are 186,210
thousand (i.e., over 186 million) people aged 25 or older, of which 125,175 thousand are in
the labor force, so P(L) = 125,175/186,210 ≈ 0.6722. (b) P(L | C) = P(L and C)/P(C) =
40,414/51,568 ≈ 0.7837. (c) L and C are not independent; if they were, the two probabilities
in (a) and (b) would be equal.
4.126. For the first probability, add up the numbers in the third column. We find that there
are 119,480 thousand (i.e., over 119 million) employed people aged 25 or older. Therefore,
P(C | E) = P(C and E)/P(E) = 39,250/119,480 ≈ 0.3285.
For the second probability, we use the total number of college graduates in the
population: P(E | C) = P(C and E)/P(C) = 39,250/51,568 ≈ 0.7611.
4.127. The population includes retired people who have left the labor force. Retired persons
are more likely than other adults to have not completed high school; consequently, a
relatively large number of retired persons fall in the “did not finish high school” category.
Note: Details of this lurking variable can be found in the Current Population Survey
annual report on “Educational Attainment in the United States.” For 2006, this report says
that among the 65-and-over population, about 24.8% have not completed high school,
compared to about 19.3% of the under-65 group.
4.129. (a) Her brother has type aa, and he got one allele from each parent. But neither parent
is albino, so neither could be type aa; each parent must be type Aa. (b) Crossing two Aa
parents gives the equally likely combinations AA, Aa, Aa, and aa, so P(aa) = 0.25,
P(Aa) = 0.5, and P(AA) = 0.25. (c) Beth is either AA or Aa, and
P(AA | not aa) = 0.25/0.75 = 1/3, while P(Aa | not aa) = 0.50/0.75 = 2/3.
4.130. (a) If Beth is Aa, then (since her husband is type aa) the equally likely allele
combinations for a child are Aa, aa, Aa, aa, so P(child is non-albino | Beth is Aa) = 1/2. If
Beth is AA, then their child will definitely be type Aa (and non-albino), so P(child is
non-albino | Beth is AA) = 1. (b) We have:
P(child is non-albino) = P(child Aa and Beth Aa) + P(child Aa and Beth AA)
= P(Beth Aa) P(child Aa | Beth Aa) + P(Beth AA) P(child Aa | Beth AA)
= (2/3)(1/2) + (1/3)(1) = 2/3
Therefore, P(Beth is Aa | child is Aa) = (1/3)/(2/3) = 1/2.
4.131. Let C be the event “Toni is a carrier,” T be the event “Toni tests positive,” and D
be “her son has DMD.” We have P(C) = 2/3, P(T | C) = 0.7, and P(T | C^c) = 0.1.
Therefore, P(T) = P(T and C) + P(T and C^c) = P(C)P(T | C) + P(C^c)P(T | C^c) =
(2/3)(0.7) + (1/3)(0.1) = 0.5, and:
P(C | T) = P(C and T)/P(T) = (2/3)(0.7)/0.5 ≈ 0.9333
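The same Bayes computation can be sketched in a few lines of Python, using the prior 2/3 and the test probabilities 0.7 and 0.1 given above:

    prior_carrier = 2 / 3
    p_pos_given_carrier = 0.7
    p_pos_given_not = 0.1

    p_pos = (prior_carrier * p_pos_given_carrier
             + (1 - prior_carrier) * p_pos_given_not)        # P(T) = 0.5
    p_carrier_given_pos = prior_carrier * p_pos_given_carrier / p_pos

    print(p_pos, p_carrier_given_pos)    # 0.5  0.9333...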
4.132. The value −1 should occur about 30% of the time (that is, the proportion should be
close to 0.3).
4.133. The mean of X is µX = (−1)(0.3) + (2)(0.7) = 1.1, so the mean of many such values
will be close to 1.1.
4.134. (a) µX = (1)(0.4) + (2)(0.6) = 1.6 and σX² = (1 − 1.6)²(0.4) + (2 − 1.6)²(0.6) = 0.24,
so σX = √0.24 ≈ 0.4899. (b) The mean is µY = 3µX − 2 = 2.8. The variance is
σY² = 9σX² = 2.16, and the standard deviation is σY = √2.16 ≈ 1.4697 (this can also be
found as 3σX). (c) The first computation used Rule 1 for means. The second computation
used Rule 1 for variances and standard deviations.
4.135. (a) Because the possible values of X are 1 and 2, the possible values of Y are
3 · 1² − 2 = 1 (with probability 0.4) and 3 · 2² − 2 = 10 (with probability 0.6).
(b) µY = (1)(0.4) + (10)(0.6) = 6.4 and σY² = (1 − 6.4)²(0.4) + (10 − 6.4)²(0.6) = 19.44,
so σY = √19.44 ≈ 4.4091. (c) Those rules are for transformations of the form aX + b, not
aX² + b.
4.136. (a) A and B are disjoint. (If A happens, B did not.) (b) A and B are independent. (A
concerns the first roll, B the second.) (c) A and B are independent. (A concerns the second
roll, B the first.) (d) A and B are neither disjoint nor independent. (If A happens, then so
does B.)
(c) P(A) = 15/36 = 5/12 and P(B) = 10/36 = 5/18. (d) P(A) = 15/36 = 5/12 and
P(B) = 10/36 = 5/18.
(a) σX² = (2 − 3)²(0.3) + (3 − 3)²(0.4) + (4 − 3)²(0.3) = 0.6,
so the standard deviation is σX = √0.6 ≈ 0.7746. (b) & (c) To achieve a mean of 3 with
possible values 2, 3, and 4, the distribution must be symmetric; that is, the probability at 2
must equal the probability at 4 (so that 3 would be the balance point of the distribution). Let
p be the probability assigned to 2 (and also to 4) in the new distribution. A larger standard
deviation is achieved when p > 0.3, and a smaller standard deviation arises when p < 0.3.
In either case, the new standard deviation is √(2p).
4.139. For each bet, the mean is the winning probability times the winning payout, plus the
losing probability times −$10. These are summarized below; all mean payoffs equal $0.
Point 4 or 10: (1/3)(+$20) + (2/3)(−$10) = 0
Point 5 or 9: (2/5)(+$15) + (3/5)(−$10) = 0
Point 6 or 8: (5/11)(+$12) + (6/11)(−$10) = 0
Note: Alternatively, we can find the mean amount of money we have at the end of the
bet. For example, if the point is 4 or 10, we end with either $30 or $0, and our expected
ending amount is (1/3)($30) + (2/3)($0) = $10—equal to the amount of the bet.
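A small Python sketch that confirms all three expected payoffs are $0, using the winning probabilities and payouts listed above:

    from fractions import Fraction

    bets = {"4 or 10": (Fraction(1, 3), 20),
            "5 or 9":  (Fraction(2, 5), 15),
            "6 or 8":  (Fraction(5, 11), 12)}

    for point, (p_win, payout) in bets.items():
        mean = p_win * payout + (1 - p_win) * (-10)
        print(point, mean)        # every mean payoff is 0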
4.140. P(A) = P(B) = · · · = P(F) = 0.72/6 = 0.12 and P(1) = · · · = P(8) = (1 − 0.72)/8 = 0.035.
4.141. (a) All probabilities are greater than or equal to 0, and their sum is 1. (b) Let R1
be Taster 1’s rating and R2 be Taster 2’s rating. Add the probabilities on the diagonal
(upper left to lower right): P(R1 = R2 ) = 0.03 + 0.07 + 0.25 + 0.20 + 0.06 = 0.61.
(c) P(R1 > 3) = 0.39 (the sum of the ten numbers in the bottom two rows), and
P(R2 > 3) = 0.39 (the sum of the ten numbers in the right two columns).
4.145. With B, M, and D representing the three kinds of degrees, and W meaning the degree
recipient was a woman, we have been given:
P(B) = 0.71, P(M) = 0.23, P(D) = 0.06,
P(W | B) = 0.44, P(W | M) = 0.41, P(W | D) = 0.30.
Therefore, we find
P(W) = P(W and B) + P(W and M) + P(W and D)
= P(B)P(W | B) + P(M)P(W | M) + P(D)P(W | D) = 0.4247,
so:
P(B | W) = P(B and W)/P(W) = P(B)P(W | B)/P(W) = 0.3124/0.4247 ≈ 0.7356
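A short Python sketch of the total-probability and Bayes steps above, using only the six given probabilities:

    degrees = {"B": (0.71, 0.44),   # (P(degree type), P(W | degree type))
               "M": (0.23, 0.41),
               "D": (0.06, 0.30)}

    p_w = sum(p * pw for p, pw in degrees.values())          # P(W) = 0.4247
    p_b_given_w = degrees["B"][0] * degrees["B"][1] / p_w    # P(B | W)

    print(p_w, p_b_given_w)   # 0.4247  0.7355...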
4.147. P(no point is established) = 12/36 = 1/3. In Exercise 4.139, the probabilities of
winning each odds bet were given as 1/3 for 4 and 10, 2/5 for 5 and 9, and 5/11 for 6 and 8.
This tree diagram can get a bit large (and crowded); in the diagram summarized below, the
probabilities are omitted from the individual branches. The probability of winning an odds
bet on 4 or 10 (with a net payout of $20) is (3/36)(1/3) = 1/36. Losing that odds bet costs
$10, and has probability (3/36)(2/3) = 2/36 (or 1/18). Similarly, the probability of winning
an odds bet on 5 or 9 is (4/36)(2/5) = 2/45, and the probability of losing that bet is
(4/36)(3/5) = 3/45 (or 1/15). For an odds bet on 6 or 8, we win $12 with probability
(5/36)(5/11) = 25/396, and lose $10 with probability (5/36)(6/11) = 30/396 (or 5/66).
Branches of the tree (first roll, payoff, probability): no point, $0, 12/36; 4 or 10, +$20,
1/36 each; 4 or 10 then 7, −$10, 2/36 each; 5 or 9, +$15, 2/45 each; 5 or 9 then 7, −$10,
3/45 each; 6 or 8, +$12, 25/396 each; 6 or 8 then 7, −$10, 30/396 each.
To confirm that this game is fair, one can multiply each payoff by its probability then
add up all of those products. More directly, because each individual odds bet is fair (as was
shown in the solution to Exercise 4.139), one can argue that taking the odds bet whenever it
is available must be fair.
4.148. Student findings will depend on how much they explore the Web site. Individual growth
charts include weight-for-age, height-for-age, weight-for-length, head circumference-for-age,
and body mass index-for-age.
4.149. Let R1 be Taster 1’s rating and R2 be Taster 2’s rating. P(R1 = 3) =
0.01 + 0.05 + 0.25 + 0.05 + 0.01 = 0.37, so:
P(R2 > 3 | R1 = 3) = P(R2 > 3 and R1 = 3)/P(R1 = 3) = (0.05 + 0.01)/0.37 ≈ 0.1622
4.151. The event {Y < 1/2} is the bottom half of the unit square, while {Y > X} is the upper
left triangle of the square. They overlap in a triangle with area 1/8, so:
P(Y < 1/2 | Y > X) = P(Y < 1/2 and Y > X)/P(Y > X) = (1/8)/(1/2) = 1/4
5.1. The population is iPhone users (or iPhone users who use the AppsFire service). The
statistic is an average of 65 apps per device. Likely values will vary, in part based on how
many apps are on student phones (which they might consider “typical”).
5.2. With µ = 240, σ = 18, and n = 36, we have mean µx̄ = µ = 240 and standard deviation
σx̄ = σ/√n = 3.
5.3. When n = 144, the mean is µx̄ = µ = 240 (unchanged), and the standard deviation is
σx̄ = σ/√n = 1.5. Increasing n does not change µx̄ but decreases σx̄, the variability of
the sampling distribution. (In this case, because n was increased by a factor of 4, σx̄ was
halved.)
5.4. When n = 144, σx̄ = σ/√144 = 18/12 = 1.5. The sampling distribution of x̄ is
approximately N(240, 1.5), so about 95% of the time, x̄ is between 237 and 243.
5.5. When n = 1296, σx̄ = σ/√1296 = 18/36 = 0.5. The sampling distribution of x̄ is
approximately N(240, 0.5), so about 95% of the time, x̄ is between 239 and 241.
5.6. With σ/√50 ≈ 3.54, we have P(x̄ < 28) = P((x̄ − 25)/3.54 < (28 − 25)/3.54) = P(Z < 0.85) ≈ 0.8023.
5.7. (a) Either change “variance” to “standard deviation” (twice), or change the formula at the
end to 10²/30. (b) Standard deviation decreases with increasing sample size. (c) µx̄ always
equals µ, regardless of the sample size.
are 290 and 370. (c) Answers will vary. A histogram of the (exact) sampling distribution,
with sample means running from 290 to 370, shows that with a sample size of only 3 the
distribution is noticeably non-Normal. (d) The center of the exact sampling distribution is µ,
but with only 10 values of x̄, this might not be true for student histograms.
Note: These histograms were found by considering all 1000 possible samples.
5.10. (a) σx̄ = σ/√200 ≈ 0.08132. (b) With n = 200, x̄ will be within ±0.16 (about 10
minutes) of µ = 7.02 hours. (c) P(x̄ ≤ 6.9) = P(Z ≤ (6.9 − 7.02)/0.08132) = P(Z ≤ −1.48) =
0.0694.
5.11. (a) With n = 200, the 95% probability range was about ±10 minutes, so we need a
larger sample size. (Specifically, to halve the range, we need to roughly quadruple the
sample size.) (b) We need 2σx̄ = 5/60, so σx̄ ≈ 0.04167. (c) With σ = 1.15, we have
√n = 1.15/0.04167 ≈ 27.6, so n ≈ 761.76—use 762 students.
5.12. (a) The standard deviation is σ/√10 = 280/√10 ≈ 88.5438 seconds. (b) In order to have
σ/√n = 15 seconds, we need √n = 280/15, so n ≈ 348.4—use n = 349.
5.13. Mean µ = 250 ml and standard deviation σ/√6 = 3/√6 ≈ 1.2247 ml.
5.14. (a) For this exercise, bear in mind that the actual distribution for a single song length is
definitely not Normal; in particular, a Normal distribution with mean 350 seconds and
standard deviation 280 seconds extends well below 0 seconds. The Normal curve for x̄
should be taller by a factor of √10 and skinnier by a factor of 1/√10 (although that technical
detail will likely be lost on most students). (b) Using a N(350, 280) distribution,
1 − P(331 < X < 369) = 1 − P(−0.07 < Z < 0.07) ≈ 0.9442. (c) Using a N(350, 88.5438)
distribution, 1 − P(331 < x̄ < 369) = 1 − P(−0.21 < Z < 0.21) ≈ 0.8336.
5.15. In Exercise 5.13, we found that σx̄ ≈ 1.2247 ml, so x̄ has a N(250 ml, 1.2247 ml)
distribution. (a) In the sketch, the Normal curve for x̄ should be taller by a factor of √6 and
skinnier by a factor of 1/√6. (b) The probability that a single can’s volume differs from the
target by at least 1 ml—one-third of a standard deviation—is
1 − P(−0.33 < Z < 0.33) = 0.7414. (c) The probability that x̄ is at least 1 ml from the target
is 1 − P(249 < x̄ < 251) = 1 − P(−0.82 < Z < 0.82) = 0.4122.
5.16. For the population distribution (the number of friends of a randomly chosen individual),
µ = 130 and σ = 85 friends. (a) For the total number of friends for a sample of n = 30
users, the mean is nµ = 3900 and the standard deviation is σ√n ≈ 465.56 friends.
(b) For the mean number of friends, the mean is µ = 130 and the standard deviation is
σ/√n ≈ 15.519 friends. (c) P(x̄ > 140) = P(Z > (140 − 130)/15.519) = P(Z > 0.64) = 0.2611
(software: 0.2597).
5.17. (a) x is not systematically higher than or lower than µ; that is, it has no particular
tendency to underestimate or overestimate µ. (In other words, it is “just right” on the
average.) (b) With large samples, x is more likely to be close to µ because with a larger
sample comes more information (and therefore less uncertainty).
5.18. (a) P(X ≥ 23) = P(Z ≥ (23 − 19.2)/5.1) = P(Z ≥ 0.75) = 0.2266 (with software:
0.2281). Because ACT scores are reported as whole numbers, we might instead compute
P(X ≥ 22.5) = P(Z ≥ 0.65) = 0.2578 (software: 0.2588). (b) µx̄ = 19.2 and
σx̄ = σ/√25 = 1.02. (c) P(x̄ ≥ 23) = P(Z ≥ (23 − 19.2)/1.02) = P(Z ≥ 3.73) = 0.0001. (In
this case, it is not appropriate to find P(x̄ ≥ 22.5), unless x̄ is rounded to the nearest
whole number.) (d) Because individual scores are only roughly Normal, the answer to (a) is
approximate. The answer to (c) is also approximate but should be more accurate because x̄
should have a distribution that is closer to Normal.
5.19. (a) µx̄ = 0.5 and σx̄ = σ/√50 = 0.7/√50 ≈ 0.09899. (b) Because this distribution
is only approximately Normal, it would be quite reasonable to use the 68–95–99.7 rule
to give a rough estimate: 0.6 is about one standard deviation above the mean, so the
probability should be about 0.16 (half of the 32% that falls outside ±1 standard deviation).
Alternatively, P(x̄ > 0.6) = P(Z > (0.6 − 0.5)/0.09899) = P(Z > 1.01) = 0.1562.
5.21. Let X be Sheila’s measured glucose level. (a) P(X > 140) = P(Z > 1.5) = 0.0668.
(b) If x̄ is the mean of three measurements (assumed to be independent), then x̄ has a
N(125, 10/√3) or N(125 mg/dl, 5.7735 mg/dl) distribution, and P(x̄ > 140) = P(Z >
2.60) = 0.0047.
5.22. (a) µX = ($500)(0.001) = $0.50 and σX = √249.75 ≈ $15.8035. (b) In the long run,
Joe makes about 50 cents for each $1 ticket. (c) If x̄ is Joe’s average payoff over a year,
then µx̄ = µ = $0.50 and σx̄ = σX/√104 ≈ $1.5497. The central limit theorem says that
x̄ is approximately Normally distributed (with this mean and standard deviation). (d) Using
this Normal approximation, P(x̄ > $1) = P(Z > 0.32) = 0.3745 (software: 0.3735).
Note: Joe comes out ahead if he wins at least once during the year. This probability is
easily computed as 1 − (0.999)¹⁰⁴ ≈ 0.0988. The distribution of x̄ is different enough from a
Normal distribution so that answers given by the approximation are not as accurate in this
case as they are in many others.
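The Normal-approximation calculation in parts (c) and (d) can be sketched in Python as follows; the mean, standard deviation, and the 104 tickets per year are the values used above:

    from math import sqrt, erf

    mu, sigma, n = 0.50, 15.8035, 104
    se = sigma / sqrt(n)                       # ≈ 1.5497

    def phi(z):                                # standard Normal cdf
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(se, 1 - phi((1 - mu) / se))          # ≈ 1.5497, 0.3735 (the software value)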
5.23. The mean of three measurements has a N(125 mg/dl, 5.7735 mg/dl) distribution, and
P(Z > 1.645) = 0.05 if Z is N(0, 1), so L = 125 + 1.645 · 5.7735 ≈ 134.5 mg/dl.
5.24. x̄ is approximately Normal with mean 1.3 and standard deviation 1.5/√200 ≈
0.1061 flaws/yd², so P(x̄ > 2) = P(Z > 6.6) = 0 (essentially).
5.26. (a) Although the probability of having to pay for a total loss for one or more of the 12
policies is very small, if this were to happen, it would be financially disastrous. On the other
hand, for thousands of policies, the law of large numbers says that the average claim on
many policies will be close to the mean, so the insurance company can be assured that the
premiums they collect will (almost certainly) cover the claims. (b) The central limit theorem
says that, in spite of the skewness of the population distribution, the average loss among
10,000 policies will be approximately Normally distributed with mean $250 and standard
deviation σ/√10,000 = $1000/100 = $10. Since $275 is 2.5 standard deviations above the
mean, the probability of seeing an average loss over $275 is about 0.0062.
5.27. (a) The mean of six untreated specimens has a standard deviation of 2.2/√6 ≈
0.8981 lbs, so P(x̄u > 50) = P(Z > (50 − 57)/0.8981) = P(Z > −7.79), which is basically 1.
(b) x̄u − x̄t has mean 57 − 30 = 27 lbs and standard deviation √(2.2²/6 + 1.6²/6) ≈
1.1106 lbs, so P(x̄u − x̄t > 25) = P(Z > (25 − 27)/1.1106) = P(Z > −1.80) = 0.9641.
5.28. (a) The central limit theorem says that the sample means will be roughly
Normal. Note that the distribution of individual scores cannot have extreme outliers
because all scores are between 1 and 7. (b) For Journal scores, ȳ has mean 4.8 and
standard deviation 1.5/√28 ≈ 0.2835. For Enquirer scores, x̄ has mean 2.4 and
standard deviation 1.6/√28 ≈ 0.3024. (c) ȳ − x̄ has (approximately) a Normal
distribution with mean 2.4 and standard deviation √(1.5²/28 + 1.6²/28) ≈ 0.4145.
(d) P(ȳ − x̄ ≥ 1) = P(Z ≥ (1 − 2.4)/0.4145) = P(Z ≥ −3.38) = 0.9996.
5.29. (a) ȳ has a N(µY, σY/√m) distribution and x̄ has a N(µX, σX/√n) distribution.
(b) ȳ − x̄ has a Normal distribution with mean µY − µX and standard deviation
√(σY²/m + σX²/n).
5.30. We have been given µX = 9%, σX = 19%, µY = 11%, σY = 17%, and ρ = 0.6.
(a) Linda’s return R = 0.7X + 0.3Y has mean µR = 0.7µX + 0.3µY = 9.6% and standard
deviation σR = √((0.7σX)² + (0.3σY)² + 2ρ(0.7σX)(0.3σY)) ≈ 16.8611%. (b) R̄, the average
return over 20 years, has approximately a Normal distribution with mean 9.6% and standard
deviation σR/√20 ≈ 3.7703%, so P(R̄ < 5%) = P(Z < −1.22) = 0.1112. (c) After a
12% gain in the first year, Linda would have $1120; with a 6% gain in the second year, her
portfolio would be worth $1187.20. By contrast, two years with a 9% return would make
her portfolio worth $1188.10.
Note: As the text suggests, the appropriate average for this situation is (a variation
on) the geometric mean, computed as √((1.12)(1.06)) − 1 ≈ 8.9587%. Generally, if the
sequence of annual returns is r1, r2, . . . , rk (expressed as decimals), the mean return is
[(1 + r1)(1 + r2) · · · (1 + rk)]^(1/k) − 1. It can be shown that the geometric mean is always
smaller than the arithmetic mean, unless all the returns are the same.
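A minimal Python sketch of the computations in parts (a) and (b), using only the means, standard deviations, and correlation given in the exercise:

    from math import sqrt, erf

    muX, sX, muY, sY, rho = 9, 19, 11, 17, 0.6
    muR = 0.7 * muX + 0.3 * muY
    sR = sqrt((0.7 * sX) ** 2 + (0.3 * sY) ** 2
              + 2 * rho * (0.7 * sX) * (0.3 * sY))     # ≈ 16.86%

    se20 = sR / sqrt(20)                               # ≈ 3.77%
    z = (5 - muR) / se20                               # ≈ -1.22
    p = 0.5 * (1 + erf(z / sqrt(2)))                   # P(Rbar < 5%) ≈ 0.111

    print(muR, sR, se20, p)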
5.32. n = 250 (the sample size), p̂ = 45% = 0.45, and X = n p̂ = 112.5. (Because X must be
a whole number, it was either 112 or 113, and the reported value of p̂ was rounded.)
5.33. (a) n = 1500 (the sample size). (b) The “Yes” count seems like the most reasonable
choice, but either count is defensible. (c) X = 825 (or X = 675). (d) p̂ = 825/1500 = 0.55 (or
p̂ = 675/1500 = 0.45).
5.34. Assuming no multiple births (twins, triplets, quadruplets), we have four independent
trials, each with probability of success (type O blood) equal to 0.25, so the number of
children with type O blood has the B(4, 0.25) distribution.
5.35. We have 15 independent trials, each with probability of success (heads) equal to 0.5, so
X has the B(15, 0.5) distribution.
5.36. Assuming each free-throw attempt is an independent trial, X has the B(10, 0.8)
distribution, and P(X ≤ 4) = 0.0064.
5.37. (a) For the B(5, 0.4) distribution, P(X = 0) = 0.0778 and P(X ≥ 3) = 0.3174. (b) For
the B(5, 0.6) distribution, P(X = 5) = 0.0778 and P(X ≤ 2) = 0.3174. (c) The number
of “failures” in the B(5, 0.4) distribution has the B(5, 0.6) distribution. With 5 trials,
0 successes is equivalent to 5 failures, and 3 or more successes is equivalent to 2 or fewer
failures.
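For readers who want to check these binomial probabilities with software, a minimal Python sketch for the B(5, 0.4) case:

    from math import comb

    def binom_pmf(k, n, p):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    print(binom_pmf(0, 5, 0.4))                              # 0.07776 ≈ 0.0778
    print(sum(binom_pmf(k, 5, 0.4) for k in range(3, 6)))    # 0.31744 ≈ 0.3174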
5.38. (a) For the B(100, 0.5) distribution, µp̂ = p = 0.5 and σp̂ = √(p(1 − p)/n) = 1/20 = 0.05.
(b) No; the mean and standard deviation of the sample count are both 100 times bigger.
(That is, p̂ = X/100, so µp̂ = µX/100 and σp̂ = σX/100.)
5.39. (a) p̂ has approximately a Normal distribution with mean 0.5 and standard deviation
0.05, so P(0.3 < p̂ < 0.7) = P(−4 < Z < 4) ≈ 1. (b) P(0.35 < p̂ < 0.65) = P(−3 < Z < 3) = 0.9974.
Note: For the second, the 68–95–99.7 rule would give 0.997—an acceptable answer,
especially since this is an approximation anyway. For comparison, the exact answers (to
four decimal places) are P(0.3 < p̂ < 0.7) = 0.9999 or P(0.3 ≤ p̂ ≤ 0.7) = 1.0000, and
P(0.35 < p̂ < 0.65) = 0.9965 or P(0.35 ≤ p̂ ≤ 0.65) = 0.9982. (Notice that the “correct”
answer depends on our understanding of “between.”)
5.40. (a) P(X ≥ 3) = 4(0.53)³(0.47) + (0.53)⁴ ≈ 0.3588. (b) If the coin were fair,
P(X ≥ 3) = 4(0.5)³(0.5) + (0.5)⁴ = 0.3125.
5.41. (a) Separate flips are independent (coins have no “memory,” so they do not try to
compensate for a lack of tails). (b) Separate flips are independent (coins have no “memory,”
so they do not get on a “streak” of heads). (c) p̂ can vary from one set of observed data to
another; it is not a parameter.
5.42. (a) X is a count; p̂ is a proportion. (b) The given formula is the standard deviation for a
binomial proportion. The variance for a binomial count is np(1 − p). (c) The rule of thumb
in the text is that np and n(1 − p) should both be at least 10. If p is close to 0 (or close to
1), n = 1000 might not satisfy this rule of thumb. (See also the solution to Exercise 5.22.)
5.43. (a) A B(200, p) distribution seems reasonable for this setting (even though we do not
know what p is). (b) This setting is not binomial; there is no fixed value of n. (c) A
B(500, 1/12) distribution seems appropriate for this setting. (d) This is not binomial,
because separate cards are not independent.
5.44. (a) This is not binomial; X is not a count of successes. (b) A B(20, p) distribution
seems reasonable, where p (unknown) is the probability of a defective pair. (c) This should
be (at least approximately) the B(n, p) distribution, where n is the number of students
in our sample, and p is the probability that a randomly-chosen student eats at least five
servings of fruits and vegetables.
5.45. (a) C, the number caught, is B(10, 0.7). M, the number missed, is B(10, 0.3).
(b) Referring to Table C, we find P(M ≥ 4) = 0.2001 + 0.1029 + 0.0368 + 0.0090 +
0.0014 + 0.0001 = 0.3503 (software: 0.3504).
5.46. (a) The B(20, 0.3) distribution (at least approximately). (b) P(X ≥ 8) = 0.2277.
5.48. X , the number who listen to streamed music daily, has the B(20, 0.25) distribution.
(a) µ X = np = 5, and µp̂ = 0.25. (b) With n = 200, µ X = 50 and µp̂ = 0.25. With
n = 2000, µ X = 500 and µp̂ = 0.25. µ X increases with n, while µp̂ does not depend on n.
5.50. (a) The population (the 75 members of the fraternity) is only 2.5 times the size of
the sample. Our rule of thumb says that this ratio should be at least 20. (b) Our rule of
thumb for the Normal approximation calls for np and n(1 − p) to be at least 10; we have
np = (1000)(0.002) = 2.
5.51. The count of 5s among n random digits has a binomial distribution with p = 0.1.
(a) P(at least one 5) = 1 − P(no 5) = 1 − (0.9)⁵ = 0.4095. (Or take 0.5905 from Table C
and subtract from 1.) (b) µ = (40)(0.1) = 4.
x: 0, 1, 2, 3, 4
P(X = x): 0.3164, 0.4219, 0.2109, 0.0469, 0.0039
5.54. For p̂, µ = 0.52 and σ = √(p(1 − p)/n) ≈ 0.01574. As p̂ is approximately Normally
distributed with this mean and standard deviation, we find:
P(0.49 < p̂ < 0.55) = P(−1.91 < Z < 1.91) = 0.9438
(Software computation of the Normal probability gives 0.9433. Using a binomial
distribution, we can also find P(493 ≤ X ≤ 554) ≈ 0.9495.)
5.56. When n = 300, the distribution of p̂ is approximately Normal with mean 0.52 and
standard deviation 0.02884 (nearly twice that in Exercise 5.54). When n = 5000, the
standard deviation drops to 0.00707 (less than half as big as in Exercise 5.54). Therefore:
n = 300: P(0.49 < p̂ < 0.55) = P(−1.04 < Z < 1.04) = 0.7016
n = 5000: P(0.49 < p̂ < 0.55) = P(−4.25 < Z < 4.25) ≈ 1
Larger samples give a better probability that p̂ will be close to the true proportion p.
(Software computation of the first Normal probability gives 0.7017; using a binomial
distribution, we can also find P(147 ≤ X ≤ 165) ≈ 0.7278. These more accurate answers do
not change our conclusion.)
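A short Python sketch of this Normal-approximation comparison for the two sample sizes above, with p = 0.52:

    from math import sqrt, erf

    def phi(z):                                   # standard Normal cdf
        return 0.5 * (1 + erf(z / sqrt(2)))

    p = 0.52
    for n in (300, 5000):
        se = sqrt(p * (1 - p) / n)
        prob = phi((0.55 - p) / se) - phi((0.49 - p) / se)
        print(n, round(se, 5), round(prob, 4))    # 300: 0.7017;  5000: ≈ 1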
5.57. (a) The mean is µ = p = 0.69, and the standard deviation is σ = √(p(1 − p)/n) ≈
0.0008444. (b) µ ± 2σ gives the range 68.83% to 69.17%. (c) This range is considerably
narrower than the historical range. In fact, 67% and 70% correspond to z = −23.7 and
z = 11.8—suggesting that the observed percents do not come from a N(0.69, 0.0008444)
distribution; that is, the population proportion has changed over time.
5.58. (a) p̂ = 294/400 = 0.735. (b) With p = 0.8, σp̂ = √((0.8)(0.2)/400) = 0.02. (c) Still
assuming that p = 0.8, we would expect that about 95% of the time, p̂ should fall between
0.76 and 0.84. (d) It appears that these students prefer this type of course less than the
national average. (The observed value of p̂ is quite a bit lower than we would expect from a
N(0.8, 0.02) distribution, which suggests that it came from a distribution with a lower mean.)
5.60. As σp̂ = √(p(1 − p)/n), we have 0.004² = (0.52)(0.48)/n, so n = 15,600.
5.61. (a) p = 1/4 = 0.25. (b) P(X ≥ 10) = 0.0139. (c) µ = np = 5 and
σ = √(np(1 − p)) = √3.75 ≈ 1.9365 successes. (d) No: The trials would not be independent
because the subject may alter his/her guessing strategy based on this information.
5.65. (a) p = 23,772,494/209,128,094 ≈ 0.1137. (b) If B is the number of blacks, then B has
(approximately) the B(1200, 0.1137) distribution, so the mean is np ≈ 136.4 blacks.
(c) P(B ≤ 100) ≈ P(Z < −3.31) = 0.0005.
Note: In (b), the population is at least 20 times as large as the sample, so our “rule of
thumb” for using a binomial distribution is satisfied. In fact, the mean would be the same
even if we could not use a binomial distribution, but we need to have a binomial distribution
for part (c), so that we can approximate it with a Normal distribution—which we can safely
do, because both np and n(1 − p) are much greater than 10.
5.66. (a) C(n, n) = n!/(n! 0!) = 1. The only way to distribute n successes among n observations
is for all observations to be successes. (b) C(n, n − 1) = n!/[(n − 1)! 1!] = n(n − 1)!/(n − 1)! = n.
To distribute n − 1 successes among n observations, the one failure must be either observation
1, 2, 3, . . . , n − 1, or n. (c) C(n, k) = n!/[k! (n − k)!] = n!/[(n − k)! (n − (n − k))!] = C(n, n − k).
Distributing k successes is equivalent to distributing n − k failures.
5.68. (a) P(first ace appears on toss 2) = (5/6)(1/6) = 5/36.
(b) P(first ace appears on toss 3) = (5/6)²(1/6) = 25/216.
(c) P(first ace appears on toss 4) = (5/6)³(1/6).
P(first ace appears on toss 5) = (5/6)⁴(1/6).
5.69. Y has possible values 1, 2, 3, . . . . P(first ace appears on toss k) = (5/6)^(k−1)(1/6).
5.72. (a) The table of standard deviations is given below. (b) The graph plots standard
deviation against sample size; it is shown as a scatterplot, but in this situation it would be
reasonable to “connect the dots” because the relationship between standard deviation and
sample size holds for all n. (c) As n increases, the standard deviation decreases—at first
quite rapidly, then more slowly (a demonstration of the law of diminishing returns).
n: 1, 4, 25, 100, 250, 500, 1000, 5000
σ/√n: 100, 50, 20, 10, 6.32, 4.47, 3.16, 1.41
5.73. (a) Out of 12 independent vehicles, the number X with one person has the B(12, 0.755)
distribution, so P(X ≥ 7) = 0.9503 (using software or a calculator). (b) Y (the number
of one-person cars in a sample of 80) has the B(80, 0.755) distribution. Regardless of
the approach used—Normal approximation, or exact computation using software or a
calculator—P(Y ≥ 41) ≈ 1.
5.74. This would not be surprising: Assuming that all the authors are independent (for
example, none were written by siblings or married couples), we can view the 12
names as being a random sample so that the number N of occurrences of the ten most
common names would have a binomial distribution with n = 12 and p = 0.056. Then
P(N = 0) = (1 − 0.056)¹² ≈ 0.5008.
5.77. If x̄ is the average weight of 12 eggs, then x̄ has a N(65 g, 5/√12 g) =
N(65 g, 1.4434 g) distribution, and P(755/12 < x̄ < 830/12) = P(−1.44 < Z < 2.89) = 0.9231
(software: 0.9236).
5.78. (a) The machine that makes the caps and the machine that applies the torque are not
the same. (b) T (torque) is N(7.0, 0.9) and S (cap strength) is N(10.1, 1.2), so T − S is
N(7 − 10.1, √(0.9² + 1.2²)) = N(−3.1 inch·lb, 1.5 inch·lb). The probability that the cap
breaks is P(T > S) = P(T − S > 0) = P(Z > 2.07) = 0.0192 (software: 0.0194).
5.79. The center line is µx̄ = µ = 4.25 and the control limits are µ ± 3σ/√5 = 4.0689 to
4.4311.
5.80. (a) x̄ has a N(32, 6/√25) = N(32, 1.2) distribution, and ȳ has a N(29, 5/√25) =
N(29, 1) distribution. (b) ȳ − x̄ has a N(29 − 32, √(5²/25 + 6²/25)) = N(−3, 1.5620)
distribution. (c) P(ȳ ≥ x̄) = P(ȳ − x̄ ≥ 0) = P(Z ≥ 1.92) = 0.0274.
5.82. (a) Yes; this rule works for any random variables X and Y . (b) No; this rule requires that
X and Y be independent. The incomes of two married people are certainly not independent,
as they are likely to be similar in many characteristics that affect income (for example,
educational background).
5.83. For each step of the random walk, the mean is µ = (1)(0.6) + (−1)(0.4) = 0.2, the
variance is σ² = (1 − 0.2)²(0.6) + (−1 − 0.2)²(0.4) = 0.96, and the standard deviation is
σ = √0.96 ≈ 0.98.
6.4. Shown below are sample output screens for (a) 10 and (b) 1000 SRSs. In 99.4% of all
repetitions of part (a), students should see between 5 and 10 hits (that is, at least 5 of the 10
SRSs capture the true mean µ). Out of 1000 80% confidence intervals, nearly all students
will observe between 76% and 84% capturing the mean.
6.5. The standard error is σ/√100 = 0.3, and the 95% confidence interval for µ is
87.3 ± 1.96(3/√100) = 87.3 ± 0.588 = 86.712 to 87.888
6.6. A 99% confidence interval would have a larger margin of error; a wider interval is needed
in order to be more confident that the interval includes the true mean. The 99% confidence
interval for µ is
87.3 ± 2.576(3/√100) = 87.3 ± 0.7728 = 86.527 to 88.073
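A minimal Python sketch that reproduces both intervals from the values x̄ = 87.3, σ = 3, and n = 100 used above:

    from math import sqrt

    xbar, sigma, n = 87.3, 3, 100
    se = sigma / sqrt(n)

    for z, level in [(1.96, "95%"), (2.576, "99%")]:
        m = z * se
        print(level, round(xbar - m, 3), "to", round(xbar + m, 3))
    # 95% 86.712 to 87.888;  99% 86.527 to 88.073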
6.7. n = [(1.96)(12,000)/1000]² ≈ 553.19—take n = 554.
6.8. In the previous exercise, we found that n = 554 would give a margin of error of $1000.
The margin of error (1.96 × $12,000/√n) would be smaller than $1000 with a larger sample
(n > 554), and larger with a smaller sample (n < 554).
If all 1000 graduates respond, the margin of error would be (1.96)($379.47) ≈ $743.77; if
n = 500, the margin of error would be (1.96)($536.66) ≈ $1051.85.
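A quick Python illustration of how the margin of error 1.96σ/√n changes with n, using σ = $12,000 from the exercise:

    from math import sqrt

    def margin(n, sigma=12000, z=1.96):
        return z * sigma / sqrt(n)

    for n in (500, 554, 1000):
        print(n, round(margin(n), 2))   # 1051.85, 999.33 (just under $1000), 743.77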
6.9. The (useful) response rate is 249/5800 ≈ 0.0429, or about 4.3%. The reported margin of error
is probably unreliable because we know nothing about the 95.7% of students that did not
provide (useful) responses; they may be more (or less) likely to charge education-related
expenses.
6.10. (a) The 95% confidence interval is 87 ± 10 = 77 to 97. (The sample size is not needed.)
(b) Greater than 10: A wider margin of error is needed in order to be more confident that
the interval includes the true mean.
Note: If this result is based on a Normal distribution, the margin of error for 99%
confidence would be roughly 13.1, because we multiply by 2.576 rather than 1.96.
6.11. The margins of error are 1.96 × 7/√n; for n = 10, 20, 40, and 80, this yields 4.3386,
3.0679, 2.1693, and 1.5339. (And, of course, all intervals are centered at 50.) Interval width
decreases with increasing sample size.
6.12. The margins of error are z* × 14/√49 = 2z*. With z* equal to 1.282, 1.645, 1.960, and
2.576 (for 80%, 90%, 95%, and 99% confidence), this yields 2.564, 3.290, 3.920, and 5.152.
(And, of course, all intervals are centered at 70.) Increasing confidence makes the interval
wider.
√
6.13. (a) She did not divide the standard deviation by n = 20. (b) Confidence intervals
concern the population mean, not the sample mean. (The value of the sample mean is
known to be 8.6; it is the population mean that we do not know.) (c) 95% is a confidence
level, not a probability. Furthermore, it does not make sense to make probability statements
about the population mean µ, which is an unknown constant (rather than a random
quantity). (d) The large sample size does not affect the distribution of individual alumni
ratings (the population distribution). The use of a Normal distribution is justified because the
distribution of the sample mean is approximately Normal when the sample is large.
Note: For part (c), a Bayesian statistician might view the population mean µ as a
random quantity, but the viewpoint taken in the text is non-Bayesian.
6.14. (a) The standard deviation should be divided by √100 = 10, not by 100. (b) The correct
interpretation is that (with 95% confidence) the average time spent at the site is between
3.71 and 4.69 hours. That is, the confidence interval is a statement about the population
mean, not about the individual members. (c) To halve the margin of error, the sample size
needs to be quadrupled, to about 400. (In fact, n = 385 would be enough.)
6.15. (a) To estimate the mean importance of recreation to college satisfaction, the 95%
confidence interval for µ is
7.5 ± 1.96(3.9/√2673) = 7.5 ± 0.1478 = 7.3522 to 7.6478
(b) The 99% confidence interval for µ is
7.5 ± 2.576(3.9/√2673) = 7.5 ± 0.1943 = 7.3057 to 7.6943
6.16. We must assume that the 2673 students were chosen as an SRS (or something like it).
The non-Normality of the population distribution is not a problem; we have a very large
sample, so the central limit theorem applies.
6.17. For mean TRAP level, the margin of error is 2.29 U/l, and the 95% confidence interval
for µ is
13.2 ± 1.96(6.5/√31) = 13.2 ± 2.29 = 10.91 to 15.49 U/l
6.19. Scenario B has a smaller margin of error. Both samples would have the same value of z ∗
(1.96), but the value of σ would be smaller for (B) because we would have less variability
in textbook cost for students in a single major.
Note: Of course, at some schools, taking a sample of 100 sophomores in a given major
is not possible. However, even if we sampled students from a number of institutions, we still
might expect less variability within a given major than from a broader cross-section.
6.20. We assume that the confidence interval was constructed using the methods of
this chapter; that is, we assume it has the form x̄ ± 1.96σ/√2500. Then the center of the
given confidence interval is x̄ = ½($45,330 + $46,156) = $45,743, the margin of error is
$46,156 − x̄ = $413, and the 99% confidence margin of error is therefore
$413 · 2.576/1.96 ≈ $542.8. Then the desired confidence interval is
($48,633 − x̄) ± $542.8 = $2347.2 to $3432.8;
that is, the average starting salary at this institution is about $2300 to $3400 less than the
NACE mean.
6.21. (a) “The probability is about 0.95 that x is within 14 kcal/day of . . . µ” (because 14
is two standard deviations). (b) This is simply another way of understanding the statement
from part (a): If |x − µ| is less than 14 kcal/day 95% of the time, then “about 95% of all
samples will capture the true mean . . . in the interval x plus or minus 14 kcal/day.”
6.22. For the mean monthly rent for unfurnished one-bedroom apartments in Dallas, the 95%
confidence interval for µ is
$980 ± 1.96($290/√10) = $980 ± $179.74 = $800.26 to $1159.74
6.23. No; this is a range of values for the mean rent, not for individual rents.
Note: To find a range to include 95% of all rents, we should take µ ± 2σ (or more
precisely, µ ± 1.96σ), where µ is the (unknown) mean rent for all apartments, and σ is
the standard deviation for all apartments (assumed to be $290 in Exercise 6.22). If µ were
equal to $1050, for example, this range would be about $470 to $1630. However, because
we do not actually know µ, we estimate it using x̄, and to account for the variability in x̄,
we must widen the margin of error by a factor of √(1 + 1/n). The formula x̄ ± 2σ√(1 + 1/10)
is called a prediction interval for future observations. (Usually, such intervals are constructed
with the t distribution, discussed in Chapter 7, but the idea is the same.)
6.24. If the distribution were roughly Normal, the 68–95–99.7 rule says that 68% of all
measurements should be in the range 13.8 to 53.0 ng/ml, 95% should be between −5.8
and 72.6 ng/ml, and 99.7% should be between −25.4 and 92.2 ng/ml. Because the
measurements cannot be negative, this suggests that the distribution must be skewed to
the right. The Normal confidence interval should be fairly accurate nonetheless because
the central limit theorem says that the distribution of the sample mean x will be roughly
Normal.
6.25. (a) For the mean number of hours spent on the Internet, the 95% confidence interval for
µ is
19 ± 1.96(5.5/√1200) = 19 ± 0.3112 = 18.6888 to 19.3112 hours
(b) No; this is a range of values for the mean time spent, not for individual times. (See also
the comment in the solution to Exercise 6.23.)
6.26. (a) To change from hours to minutes, multiply by 60: x̄m = 60x̄h = 1140 and
σm = 60σh = 330 minutes. (b) For mean time in minutes, the 95% confidence interval for µ
is
1140 ± 1.96(330/√1200) = 1140 ± 18.67 = 1121.33 to 1158.67 minutes
(c) This interval can be found by multiplying the previous interval (18.6888 to
19.3112 hours) by 60.
6.27. (a) We can be 95% confident, but not certain. (b) We obtained the interval 85% to
95% by a method that gives a correct result (that is, includes the true mean) 95% of the
time. (c) For 95% confidence, the margin of error is about two standard deviations (that
is, z* = 1.96), so σestimate ≈ 2.5%. (d) No; confidence intervals only account for random
sampling error.
6.28. (a) The standard deviation of the mean is σx̄ = 3.5/√20 ≈ 0.7826 mpg. (b) A
stemplot (below) does not suggest any severe deviations from Normality. The mean of the
20 numbers in the sample is x̄ = 43.17 mpg. (c) If µ is the population mean fuel efficiency,
the 95% confidence interval for µ is
43.17 ± 1.96(3.5/√20) = 43.17 ± 1.5339 = 41.6361 to 44.7039 mpg
3 4
3 677
3 9
4 1
4 23333
4 445
4 667
4 88
5 0
6.30. One sample screen is shown below, along with a sample stemplot of results. The number
of hits will vary, but the distribution should follow a binomial distribution with n = 50 and
p = 0.95, so we expect the average number of hits to be about 47.5. We also find that about
99.7% of individual counts should be 43 or more, and the mean hit count for 30 samples
should be approximately Normal with mean 47.5 and standard deviation 0.2814—so almost
all sample means should be between 46.66 and 48.34.
44 00
45 0000
46 00
47 00000000
48 000000
49 000
50 00000
6.31. n = [(1.96)(6.5)/1.5]² ≈ 72.14—take n = 73.
6.32. If we start with a sample of size k and lose 20% of the sample, we will end with 0.8k.
Therefore, we need to increase the sample size by 25%—that is, start with a sample of size
k = 1.25n—so that we end with (0.8)(1.25n) = n. With n = 73, that means we should
initially sample k = 91.25 (use 92) subjects.
6.33. No: Because the numbers are based on voluntary response rather than an SRS, the
confidence interval methods of this chapter cannot be used; the interval does not apply to
the whole population.
6.34. (a) For the mean of all repeated measurements, the 98% confidence interval for µ is
10.0023 ± 2.326(0.0002/√5) = 10.0023 ± 0.0002 = 10.0021 to 10.0025 g
(b) n = [(2.326)(0.0002)/0.0001]² ≈ 21.64—take n = 22.
6.35. The number of hits has a binomial distribution with parameters n = 5 and p = 0.95, so
the number of misses is binomial with n = 5 and p = 0.05. We can therefore use Table C
to answer these questions. (a) The probability that all cover their means is 0.95⁵ ≈ 0.7738.
(Or use Table C to find the probability of 0 misses.) (b) The probability that at least four
intervals cover their means is 0.95⁵ + 5(0.05)(0.95⁴) ≈ 0.9774. (Or use Table C to find the
probability of 0 or 1 misses.)
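A two-line Python check of these two probabilities, using the B(5, 0.95) setup described above:

    p = 0.95
    print(p ** 5)                          # 0.7738 (all five intervals cover)
    print(p ** 5 + 5 * p ** 4 * (1 - p))   # 0.9774 (at least four cover)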
6.36. The new design can be considered an improvement if the mean response µ to the survey
is greater than 4. The null hypothesis should be H0 : µ = 4; the alternative hypothesis
could be either µ > 4 or µ < 4. The first choice would be appropriate if we want the
default assumption (H0 ) to be that the new design is not an improvement; that is, we will
only conclude that the new design is better if we see compelling evidence to that effect.
Choosing Ha : µ < 4 would mean that the default assumption is that the new design is at
least as good as the old one, and we will stick with that belief unless we see compelling
evidence that it is worse.
Students who are just learning about stating hypotheses might have difficulty choosing the
alternative for this problem. In fact, either one is defensible, although the typical choice in
such cases would be µ > 4—that is, we give the benefit of the doubt to the old design and
need convincing evidence that the new design is better.
6.37. If µ is the mean DXA reading for the phantom, we test H0: µ = 1.4 g/cm² versus
Ha: µ ≠ 1.4 g/cm².
6.42. For z ∗ = 2 the P-value would be 2P(Z > 2) = 0.0456, and for z ∗ = 3 the P-value
would be 2P(Z > 3) = 0.0026.
Note: In other words, the Supreme Court uses α no bigger than about 0.05.
6.43. (a) z = (27 − 25)/(5/√36) = 2.4. (b) For a one-sided alternative, P = P(Z > 2.4) = 0.0082. (c) For
a two-sided alternative, double the one-sided P-value: P = 0.0164.
6.44. The test statistic is z = (0.514 − 0.5)/(0.314/√100) ≈ 0.45. This gives very little reason to doubt the null
hypothesis (µ = 0.5); in fact, the two-sided P-value is P ≈ 0.6527.
6.45. Recall the statement from the text: “A level α two-sided significance test rejects . . .
H0 : µ = µ0 exactly when the value µ0 falls outside a level 1 − α confidence interval for µ.”
(a) No; 30 is not in the 95% confidence interval because P = 0.033 means that we would
reject H0 at α = 0.05. (b) Yes; 30 is in the 99% confidence interval because we would not
reject H0 at α = 0.01.
6.46. See the quote from the text in the previous solution. (a) No, we would not reject µ = 58
because 58 falls well inside the confidence interval, so the P-value is (much) greater than
0.05. (b) Yes, we would reject µ = 63; the fact that 63 falls outside the 95% confidence
interval means that P < 0.05.
Note: The given confidence interval suggests that x = 57.5, and if the interval
was constructed using the Normal distribution, the standard error of the mean is about
2.25—half the margin of error. (The standard error might be less if it was constructed with
a t distribution rather than the Normal distribution.) Then x is about 0.22 standard errors
.
below µ = 58—yielding P = 0.82—and x is about 2.44 standard errors below µ = 63, so
.
that P = 0.015.
6.47. (a) Yes, we reject H0 at α = 0.05. (b) No, we do not reject H0 at α = 0.01.
(c) We have P = 0.039; we reject H0 at significance level α if P < α.
6.48. (a) No, we do not reject H0 at α = 0.05. (b) No, we do not reject H0 at α = 0.01.
(c) We have P = 0.062; we reject H0 at significance level α if P < α.
6.49. (a) One of the one-sided P-values is half as big as the two-sided P-value (0.022); the
other is 1 − 0.022 = 0.978. (b) Suppose the null hypothesis is H0 : µ = µ0 . The smaller
P-value (0.022) goes with the one-sided alternative that is consistent with the observed data;
for example, if x > µ0 , then P = 0.022 for the alternative µ > µ0 .
6.50. (a) The null hypothesis should be a statement about µ, not x̄. (b) The standard deviation
of the sample mean is 5/√30. (c) x̄ = 45 would not make us inclined to believe that µ > 50
over the (presumed) null hypothesis µ = 50. (d) Even if we fail to reject H0 , we are not
sure that it is true.
Note: That is, “not rejecting H0 ” is different from “knowing that H0 is true.” This is
the same distinction we make about a jury’s verdict in a criminal trial: If the jury finds the
defendant “not guilty,” that does not necessarily mean that they are sure he/she is innocent.
It simply means that they were not sufficiently convinced of his/her guilt.
6.51. (a) Hypotheses should be stated in terms of the population mean, not the sample mean.
(b) The null hypothesis H0 should be that there is no change (µ = 21.2). (c) A small
P-value is needed for significance; P = 0.98 gives no reason to reject H0 . (d) We compare
the P-value, not the z-statistic, to α. (In this case, such a small value of z would have a
very large P-value—close to 0.5 for a one-sided alternative, or close to 1 for a two-sided
alternative.)
6.52. (a) We are checking to see if the proportion p increased, so we test H0 : p = 0.88 versus
Ha : p > 0.88. (b) The professor believes that the mean µ for the morning class will be
higher, so we test H0 : µ = 75 versus Ha : µ > 75. (c) Let µ be the mean response (for the
population of all students who read the newspaper). We are trying to determine if students
are neutral about the change, or if they have an opinion about it, with no preconceived idea
about the direction of that opinion, so we test H0: µ = 0 versus Ha: µ ≠ 0.
6.53. (a) If µ is the mean score for the population of placement-test students, then we
test H0: µ = 77 versus Ha: µ ≠ 77 because we have no prior belief about whether
placement-test students will do better or worse. (b) If µ is the mean time to complete the
maze with rap music playing, then we test H0 : µ = 20 seconds versus Ha : µ > 20 seconds
because we believe rap music will make the mice finish more slowly. (c) If µ is the mean
area of the apartments, we test H0 : µ = 880 ft2 versus Ha : µ < 880 ft2 , because we suspect
the apartments are smaller than advertised.
6.54. (a) If pm and p f are the proportions of (respectively) males and females who like MTV
best, we test H0 : pm = p f versus Ha : pm > p f . (b) If µ A and µ B are the mean test scores
for each group, we test H0 : µ A = µ B versus Ha : µ A > µ B . (c) If ρ is the (population)
correlation between time spent at social network sites and self-esteem, we test H0 : ρ = 0
versus Ha : ρ < 0.
Note: In each case, the parameters identified refer to the respective populations, not the
samples.
6.55. (a) H0 : µ = $42,800 versus Ha : µ > $42,800, where µ is the mean household income
of mall shoppers. (b) H0 : µ = 0.4 hr versus Ha : µ ≠ 0.4 hr, where µ is this year's mean
response time.
6.56. (a) For Ha : µ > µ0 , the P-value is P(Z > 1.63) = 0.0516.
(b) For Ha : µ < µ0 , the P-value is P(Z < 1.63) = 0.9484.
(c) For Ha : µ ≠ µ0 , the P-value is 2P(Z > 1.63) = 2(0.0516) = 0.1032.
6.57. (a) For Ha : µ > µ0 , the P-value is P(Z > −1.82) = 0.9656.
(b) For Ha : µ < µ0 , the P-value is P(Z < −1.82) = 0.0344.
(c) For Ha : µ ≠ µ0 , the P-value is 2P(Z < −1.82) = 2(0.0344) = 0.0688.
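These three P-value recipes (upper tail, lower tail, and doubled tail) recur throughout the chapter. A minimal Python sketch, using scipy (not part of the original solutions), that reproduces the numbers in 6.56 and 6.57:

    from scipy.stats import norm

    def z_pvalues(z):
        """P-values for Ha: mu > mu0, Ha: mu < mu0, and Ha: mu != mu0."""
        return norm.sf(z), norm.cdf(z), 2 * norm.sf(abs(z))

    print(z_pvalues(1.63))    # about (0.0516, 0.9484, 0.1032), as in 6.56
    print(z_pvalues(-1.82))   # about (0.9656, 0.0344, 0.0688), as in 6.57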
6.58. Recall the statement from the text: “A level α two-sided significance test rejects . . .
H0 : µ = µ0 exactly when the value µ0 falls outside a level 1 − α confidence interval for µ.”
(a) No, 30 is not in the 95% confidence interval because P = 0.032 means that we would
reject H0 at α = 0.05. (b) No, 30 is not in the 90% confidence interval because we would
also reject H0 at α = 0.10.
6.59. See the quote from the text in the previous solution. (a) No, we would not reject H0 : µ = 30 because 30 falls inside the confidence interval, so P > 0.10. (b) Yes, we would reject H0 : µ = 24; the fact that 24 falls outside the 90% confidence interval means that P < 0.10.
Note: The given confidence interval suggests that x = 28.5, and if the interval was constructed using the Normal distribution, the standard error of the mean is about 1.75—half the margin of error. (The standard error might be less if it was constructed with a t distribution rather than the Normal distribution.) Then x is about 2.57 standard errors above µ = 24—yielding P ≈ 0.01—and x is about 0.86 standard errors below µ = 30, so that P ≈ 0.39.
6.60. The study presumably examined malarial infection rates in two groups of subjects—one
with bed nets and one without. The observed differences between the two groups were so
large that they would be unlikely to occur by chance if bed nets had no effect. Specifically,
if the groups really were the same and we took many samples, a difference in malarial infections as large as the one observed would occur less than 0.1% of the time.
6.61. P = 0.09 means there is some evidence for the wage decrease, but it is not significant
at the α = 0.05 level. Specifically, the researchers observed that average wages for
peer-driven students were 13% lower than average wages for ability-driven students, but
(when considering overall variation in wages) such a difference might arise by chance 9% of
the time, even if student motivation had no effect on wages.
6.62. If the presence of pig skulls were not an indication of wealth, then differences similar to
those observed in this study would occur less than 1% of the time by chance.
6.63. Even if the two groups (the health and safety class, and the statistics class) had the same
level of alcohol awareness, there might be some difference in our sample due to chance. The
difference observed was large enough that it would rarely arise by chance. The reason for
this difference might be that health issues related to alcohol use are probably discussed in
the health and safety class.
6.64. Even if scores had not changed over time, random fluctuation might cause the mean in
2009 to be different from the 2007 mean. However, in this case the difference was so great
that it is unlikely to have occurred by chance; specifically, such a difference would arise less
than 5% of the time if the actual mean had not changed. We therefore conclude that the
mean did change from 2007 to 2009.
6.65. If µ is the mean difference between the two groups of children, we test H0 : µ = 0 versus Ha : µ ≠ 0. The test statistic is z = (41.2 − 0)/SE ≈ 3.33, where SE is the standard error reported in the exercise; software reports P ≈ 0.0009—very strong evidence against the null hypothesis.
Note: The exercise reports the standard deviation of the mean, rather than the sample standard deviation; that is, the reported value has already been divided by √238.
6.66. If µ is the mean north-south location, we wish to test H0 : µ = 100 versus Ha : µ ≠ 100. We find z = (99.74 − 100)/(58/√584) ≈ −0.11; this is not significant—P = 2(0.4562) = 0.9124—so we have no reason to doubt a uniform distribution based on this test.
6.67. If µ is the mean east-west location, the hypotheses are H0 : µ = 100 versus Ha : µ ≠ 100 (as in the previous exercise). For testing these hypotheses, we find z = (113.8 − 100)/(58/√584) ≈ 5.75. This is highly significant (P < 0.0001), so we conclude that the trees are not uniformly spread from east to west.
6.68. For testing these hypotheses, we find z = (10.2 − 8.9)/(2.5/√6) ≈ 1.27. This is not significant (P = 0.1020); there is not enough evidence to conclude that these sonnets were not written by our poet. (That is, we cannot reject H0 .)
6.69. (a) z = (127.8 − 115)/(30/√25) ≈ 2.13, so the P-value is P = P(Z > 2.13) = 0.0166. This is strong evidence that the older students have a higher SSHA mean. (b) The important assumption is that this is an SRS from the population of older students. We also assume a Normal distribution, but this is not crucial provided there are no outliers and little skewness.
6.70. (a) Because we suspect that athletes might be deficient, we use a one-sided alternative: H0 : µ = 2811.5 kcal/day versus Ha : µ < 2811.5 kcal/day. (b) The test statistic is z = (2403.7 − 2811.5)/(880/√201) ≈ −6.57, for which P < 0.0001. There is strong evidence of below-recommended caloric consumption among female Canadian high-performance athletes.
6.71. (a) H0 : µ = 0 mpg versus Ha : µ ≠ 0 mpg, where µ is the mean difference. (b) The mean of the 20 differences is x = 2.73, so z = (2.73 − 0)/(3/√20) ≈ 4.07, for which P < 0.0001. We conclude that µ ≠ 0 mpg; that is, we have strong evidence that the computer's reported fuel efficiency differs from the driver's computed values.
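Exercises 6.66 through 6.71 all use the same one-sample z computation: standardize the sample mean, then convert to a P-value. A small Python sketch (the helper name is ours, not the text's; scipy assumed), checked against the numbers in 6.71:

    from math import sqrt
    from scipy.stats import norm

    def one_sample_z(xbar, mu0, sigma, n, alternative="two-sided"):
        """z statistic and P-value for H0: mu = mu0, with sigma known."""
        z = (xbar - mu0) / (sigma / sqrt(n))
        if alternative == "greater":
            p = norm.sf(z)
        elif alternative == "less":
            p = norm.cdf(z)
        else:
            p = 2 * norm.sf(abs(z))
        return z, p

    print(one_sample_z(2.73, 0, 3, 20))   # z about 4.07, P < 0.0001 (Exercise 6.71)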
6.72. A debt of $3817 in the West would be equivalent to a debt of ($3817)(1.06) ≈ $4046 in the Midwest, for a difference of $4046 − $3260 = $786. With the hypotheses given in Example 6.10, and the standard deviation ($374) from Example 6.11, the test statistic is z = ($786 − $0)/$374 ≈ 2.10. The P-value is P = 2P(Z ≥ 2.10) = 0.0358; recall that, for the unadjusted data, the P-value was 0.1362. Adjusting for the differing value of a dollar strengthens the evidence against H0 , enough that it is now significant at the 5% level.
6.73. For (b) and (c), either compare with the critical values in Table D or determine the
P-value (0.0336). (a) H0 : µ = 0.9 mg versus Ha : µ > 0.9 mg. (b) Yes, because z > 1.645
(or because P < 0.05). (c) No, because z < 2.326 (or because P > 0.01).
6.74. A sample screenshot from the applet (for x = 1) is not shown. Judging from the shading under the Normal curve in such a screen, x = 0.5 is not significant, but 0.6 is. (In fact, the cutoff is about 0.52, which is approximately 1.645/√10.)
6.75. As in the previous solution (applet screenshot not shown), one can judge from the shading under the Normal curve that x = 0.7 is not significant, but 0.8 is. (In fact, the cutoff is about 0.7354, which is approximately 2.326/√10.) Smaller α means that x must be farther away from µ0 in order to reject H0 .
6.77. When a test is significant at the 5% level, it means that if the null hypothesis were true,
outcomes similar to those seen are expected to occur fewer than 5 times in 100 repetitions
of the experiment or sampling. “Significant at the 10% level” means we have observed
something that occurs in fewer than 10 out of 100 repetitions (when H0 is true). Something
that occurs “fewer than 5 times in 100 repetitions” also occurs “fewer than 10 times in 100
repetitions,” so significance at the 5% level implies significance at the 10% level (or any
higher level).
6.78. Something that occurs “fewer than 5 times in 100 repetitions” is not necessarily as rare
as something that occurs “less than once in 100 repetitions,” so a test that is significant at
5% is not necessarily significant at 1%.
6.79. Using Table D or software, we find that the 0.005 critical value is 2.576, and the 0.0025
critical value is 2.807. Therefore, if 2.576 < |z| < 2.807—that is, either 2.576 < z < 2.807
or −2.807 < z < −2.576—then z would be significant at the 1% level, but not at the 0.5%
level.
6.80. As 2.326 < 2.52 < 2.576, the two-sided P-value is between 2(0.005) = 0.01 and 2(0.01) = 0.02. (Software tells us that P ≈ 0.012, consistent with the observation that z is close to 2.576.)
6.81. As 0.63 < 0.674, the one-sided P-value is P > 0.25. (Software gives P = 0.2643.)
6.82. Because 1.645 < 1.92 < 1.960, the P-value is between 2(0.025) = 0.05 and
2(0.05) = 0.10. From Table A, P = 2(0.0274) = 0.0548.
6.83. Because the alternative is two-sided, the answer for z = −1.92 is the same as for
z = 1.92: −1.645 > −1.92 > −1.960, so Table D says that 0.05 < P < 0.10, and Table A
gives P = 2(0.0274) = 0.0548.
6.84. (a) z = (541.4 − 525)/(100/√100) = 1.64. This is not significant at α = 0.05 because z < 1.645 (or P = 0.0505). (b) z = (541.5 − 525)/(100/√100) = 1.65. This is significant at α = 0.05 because z > 1.645 (or P = 0.0495). (c) Fixed-level significance tests require that we draw a line
z > 1.645 (or P = 0.0495). (c) Fixed-level significance tests require that we draw a line
between “significant” and “not significant”; in this example, we see evidence on each side
of that line. The 5% significance level is a guideline, not a sacred edict. P-values are more
informative ways to convey the strength of the evidence.
6.85. In order to determine the effectiveness of alarm systems, we need to know the percent of
all homes with alarm systems, and the percent of burglarized homes with alarm systems.
For example, if only 10% of all homes have alarm systems, then we should compare the
proportion of burglarized homes with alarm systems to 10%, not 50%.
An alternate (but rather impractical) method would be to sample homes and classify
them according to whether or not they had an alarm system, and also by whether or not
they had experienced a break-in at some point in the recent past. This would likely require
a very large sample in order to get a sufficiently large count of homes that had experienced
break-ins.
6.86. Finding something to be “statistically significant” is not really useful unless the
significance level is sufficiently small. While there is some freedom to decide what
“sufficiently small” means, α = 0.20 would lead the student to incorrectly reject H0 one-fifth
of the time, so it is clearly a bad choice.
6.87. The first test was barely significant at α = 0.05, while the second was significant at any
reasonable α.
6.88. One can learn something from negative results; for example, a study that finds no benefit
from a particular treatment is at least useful in terms of what will not work. Furthermore,
reviewing such results might point researchers to possible future areas of study.
6.89. A significance test answers only Question b. The P-value states how likely the observed
effect (or a stronger one) is if H0 is true, and chance alone accounts for deviations from
what we expect. The observed effect may be significant (very unlikely to be due to chance)
and yet not be of practical importance. And the calculation leading to significance assumes a
properly designed study.
6.90. Based on the description, this seems to have been an experiment (not just an
observational study), so a statistically significant outcome suggests that vitamin C is
effective in preventing colds.
6.91. (a) If SES had no effect on LSAT results, there would still be some difference in scores
due to chance variation. “Statistically insignificant” means that the observed difference was
no more than we might expect from that chance variation. (b) If the results are based on a
small sample, then even if the null hypothesis were not true, the test might not be sensitive
enough to detect the effect. Knowing the effects were small tells us that the statistically
insignificant test result did not occur merely because of a small sample size.
6.92. These questions are addressed in the summary for Section 6.3. (a) Failing to reject
H0 does not mean that H0 is true. (b) This is correct; a difference that is statistically
significant might not be practically important. (This does not mean that these are opposites;
a difference could be both statistically and practically significant.) (c) This might be
technically true, but in order for the analysis to be meaningful, the data must satisfy the
assumptions of the analysis. (d) Searching for patterns and then testing their significance can
lead to false positives (that is, we might reject the null hypothesis incorrectly). If a pattern is
observed, we should collect new data to test if it is present.
6.93. In each case, we find the test statistic z by dividing the observed difference (2453.7 − 2403.7 = 50 kcal/day) by 880/√n. (a) For n = 100, z ≈ 0.57, so P = P(Z > 0.57) = 0.2843. (b) For n = 500, z ≈ 1.27, so P = P(Z > 1.27) = 0.1020. (c) For n = 2500, z ≈ 2.84, so P = P(Z > 2.84) = 0.0023.
6.94. The study may have rejected µ = µ0 (or some other null hypothesis), but with such a
large sample size, such a rejection might occur even if the actual mean (or other parameter)
differs only slightly from µ0 . For example, there might be no practical importance to the
difference between µ = 10 and µ = 10.5.
6.95. We expect more variation with small sample sizes, so even a large difference between
x and µ0 (or whatever measures are appropriate in our hypothesis test) might not turn out
to be significant. If we were to repeat the test with a larger sample, the decrease in the
standard error might give us a small enough P-value to reject H0 .
6.98. When many variables are examined, “significant” results will show up by chance, so
we should not take it for granted that the variables identified are really indicative of future
success. In order to decide if they are appropriate, we should track this year’s trainees and
compare the success of those from urban/suburban backgrounds with the rest, and likewise
compare those with a degree in a technical field with the rest.
6.100. We expect 50 tests to be statistically significant: Each of the 1000 tests has a 5%
chance of being significant, so the number of significant tests has a binomial distribution
with n = 1000 and p = 0.05, for which the mean is np = 50.
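The binomial reasoning in 6.100 is easy to check, and it also shows how much the count of falsely "significant" tests can vary; a brief sketch (scipy assumed, not part of the original solution):

    from scipy.stats import binom

    n_tests, alpha = 1000, 0.05
    mean = n_tests * alpha                        # expected false positives: 50
    sd = (n_tests * alpha * (1 - alpha)) ** 0.5   # about 6.9
    print(mean, sd)
    # Chance of 60 or more "significant" results even if every null hypothesis is true:
    print(binom.sf(59, n_tests, alpha))           # roughly 0.08 to 0.09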
6.102. Using α/6 ≈ 0.008333 as the cutoff, the fourth (P = 0.003) and sixth (P < 0.001) tests are significant.
6.103. Using α/12 ≈ 0.004167 as the cutoff, we reject the fifth (P = 0.002) and eleventh (P < 0.002) tests.
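The Bonferroni screening used in 6.102 and 6.103 (and again in 6.132) is simply a comparison of each P-value with α/k, where k is the number of tests. A short sketch of that rule (the helper function is ours, shown only to make the cutoff explicit):

    def bonferroni_flags(p_values, alpha=0.05):
        """Return True for each test that stays significant after Bonferroni correction."""
        cutoff = alpha / len(p_values)
        return [p < cutoff for p in p_values]

    print(0.05 / 6)    # cutoff in 6.102, about 0.008333
    print(0.05 / 12)   # cutoff in 6.103, about 0.004167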
6.104. The power of this study is far lower than what is generally desired—for example, it is
well below the “80% standard” mentioned in the text. For the specified effect, 35% power
means that, if the effect is present, we will only detect it 35% of the time. With such a
small chance of detecting an important difference, the study should probably not be run
(unless the sample size is increased to give sufficiently high power).
6.105. A larger sample gives more information and therefore gives a better chance of detecting
a given alternative; that is, larger samples give more power.
6.106. The power for µ = −3 is 0.82—the same as the power for µ = 3—because both
alternatives are an equal distance from the null value of µ. (The symmetry of two-sided
tests with the Normal distribution means that we only need to consider the size of the
difference, not the direction.)
6.110. (a) For the alternative Ha : µ > 168, we reject H0 at the 5% significance level if z > 1.645. (b) (x − 168)/(27/√70) > 1.645 when x > 168 + 1.645 · 27/√70 ≈ 173.31. (c) When µ = 173, the probability of rejecting H0 is
    P(x > 173.31) = P((x − 173)/(27/√70) > (173.31 − 173)/(27/√70)) = P(Z > 0.10) ≈ 0.4602.
(d) The power of this test is not up to the 80% standard suggested in the text; he should collect a larger sample.
Note: Software gives a slightly different answer for the power in part (c), but the conclusion in part (d) is the same. To achieve 80% power against µ = 173, we need n = 180.
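The two-step power calculation in 6.110 (find the rejection cutoff, then find the probability of exceeding it under the alternative) can be scripted. A sketch under the same assumptions (one-sided z test with σ known; scipy assumed; helper name ours), which also reproduces 6.111:

    from math import sqrt
    from scipy.stats import norm

    def power_one_sided_z(mu0, mu_alt, sigma, n, alpha=0.05):
        """Power of the z test of H0: mu = mu0 vs. Ha: mu > mu0 at the alternative mu_alt."""
        cutoff = mu0 + norm.isf(alpha) * sigma / sqrt(n)   # reject H0 when xbar > cutoff
        return norm.sf((cutoff - mu_alt) / (sigma / sqrt(n)))

    print(power_one_sided_z(168, 173, 27, 70))                # about 0.46 (Exercise 6.110)
    print(power_one_sided_z(450, 460, 100, 500, alpha=0.01))  # about 0.46 (Exercise 6.111)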
6.111. We reject H0 when z > 2.326, which is equivalent to x > 450 + 2.326 · 100/√500 ≈ 460.4, so the power against µ = 460 is
    P(reject H0 when µ = 460) = P(x > 460.4 when µ = 460) = P(Z > (460.4 − 460)/(100/√500)) = P(Z > 0.09) = 0.4641.
This is quite a bit less than the "80% power" standard.
6.113. (a) The hypotheses are “subject should go to college” and “subject should join work
force.” The two types of errors are recommending someone go to college when (s)he is
better suited for the work force, and recommending the work force for someone who should
go to college. (b) In significance testing, we typically wish to decrease the probability of
wrongly rejecting H0 (that is, we want α to be small); the answer to this question depends
on which hypothesis is viewed as H0 .
Note: For part (a), there is no clear choice for which should be the null hypothesis.
In the past, when fewer people went to college, one might have chosen “work force” as
H0 —that is, one might have said, “we’ll assume this student will join the work force unless
we are convinced otherwise.” Presently, roughly two-thirds of graduates attend college,
which might suggest H0 should be “college.”
6.114. This is probably not a confidence interval; it is not intended to give an estimate of the
mean income, but rather it gives the range of incomes earned by all (or most) telemarketers
working for this company.
6.115. From the description, we might surmise that we had two (or more) groups of
students—say, an exercise group and a control (or no-exercise) group. (a) For example, if µ
is the mean difference in scores between the two groups, we might test H0 : µ = 0 versus
Ha : µ ≠ 0. (Assuming we had no prior suspicion about the effect of exercise, the alternative
should be two-sided.) (b) With P = 0.38, we would not reject H0 . In plain language: The
results observed do not differ greatly from what we would expect if exercise had no effect
on exam scores. (c) For example: Was this an experiment? What was the design? How big
were the samples?
6.117. (a) Because all standard deviations and sample sizes are the same, the margin of error for all intervals is 1.96 × 19/√180 ≈ 2.7757. The confidence intervals are listed in the table below. (b) A plot of the error bars for the intervals of part (a) and part (c) is not shown; the part (a) intervals are the shorter ones, not extending as far above and below the mean. (c) With z∗ = 2.40, the margin of error for all intervals is 2.40 × 19/√180 ≈ 3.3988. These confidence intervals are also listed in the table below. (d) When we use z∗ = 2.40 to adjust for the fact that we are making three "simultaneous" confidence intervals, the margin of error is larger, so the intervals overlap more.
    Workplace size    Interval with z∗ = 1.96 (part a)    Interval with z∗ = 2.40 (part c)
    < 50              64.45 to 70.01                      63.83 to 70.63
    50–200            67.59 to 73.15                      66.97 to 73.77
    > 200             72.05 to 77.61                      71.43 to 78.23
(The error-bar plot of mean SCI against workplace size is not shown.)
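The margins of error behind this table take one line each to verify; a minimal sketch (the group means themselves are not repeated here, only the margins):

    from math import sqrt

    s, n = 19, 180
    for z_star in (1.96, 2.40):
        print(z_star, z_star * s / sqrt(n))   # about 2.7757 and 3.3988
    # Each interval above is (group mean - margin) to (group mean + margin).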
6.118. A sample screenshot from the applet and an example of the resulting plot (the percent of intervals covering the mean, plotted against the number of intervals) are not shown. Most students (99.7% of them) should find that their final proportion is between 0.90 and 1; 90% will have a proportion between 0.925 and 0.975.
Note: For each n (number of intervals), the number of “hits” would have a binomial
distribution with p = 0.95, but these counts would not be independent; for example, if we
knew there were 28 hits after 30 tries, we would know that there could be no more than 38
after 40 tries.
6.119. A sample screenshot and example plot are not shown but would be similar to those described in the previous exercise. Most students (99.4% of them) should find that their
final proportion is between 0.84 and 0.96; 85% will have a proportion between 0.87 and
0.93.
6.120. For n = 10, the test statistic is z ≈ 0.90, for which P = 0.1841. For the other sample sizes, the computations are similar; the resulting table is shown below (the accompanying graphs are not reproduced). We see that increasing the sample size increases the value of the test statistic (for the same observed mean), which in turn decreases the size of the P-value.
     n      z       P
    10    0.90    0.1841
    20    1.28    0.1003
    30    1.56    0.0594
    40    1.81    0.0351
    50    2.02    0.0217
(Graphs of the test statistic and of the P-value against sample size are not shown.)
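The pattern in this table (a fixed observed effect looking "more significant" as n grows) can be generated for any fixed difference and standard deviation. A sketch in which diff and sigma are placeholder values chosen only to illustrate the trend (scipy assumed):

    from math import sqrt
    from scipy.stats import norm

    def p_versus_n(diff, sigma, sample_sizes):
        """One-sided P-values for a fixed observed difference as the sample size grows."""
        for n in sample_sizes:
            z = diff / (sigma / sqrt(n))
            print(n, round(z, 2), round(norm.sf(z), 4))

    p_versus_n(diff=4, sigma=14, sample_sizes=[10, 20, 30, 40, 50])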
6.121. (a) x ≈ 5.3 mg/dl, so x ± 1.960σ/√6 gives 4.6132 to 6.0534 mg/dl. (b) To test H0 : µ = 4.8 mg/dl versus Ha : µ > 4.8 mg/dl, we compute z = (x − 4.8)/(0.9/√6) ≈ 1.45 and P ≈ 0.0735. This is not strong enough to reject H0 .
Note: The confidence interval in (a) would allow us to say without further computation
that, against a two-sided alternative, we would have P > 0.05. Because we have a one-sided
alternative, we could conclude from the confidence interval that P > 0.025, but that is not
enough information to draw a conclusion.
6.124. (a) The intended population is probably "the American public"; the population that was actually sampled was "citizens of Indianapolis (with listed phone numbers)." (b) Take x ± 1.96s/√201; these intervals are listed below. (c) The confidence intervals do not overlap at all; in particular, the lower confidence limit of the rating for pharmacies is higher than the upper confidence limit for the other stores. This indicates that the pharmacies are really rated higher.
    Food stores           15.22 to 22.12
    Mass merchandisers    27.77 to 36.99
    Pharmacies            43.68 to 53.52
6.125. (a) Under H0 , x has a N(0%, 55%/√104) ≈ N(0%, 5.3932%) distribution. (b) z = (6.9 − 0)/(55/√104) ≈ 1.28, so P = P(Z > 1.28) = 0.1003. (c) This is not significant at α = 0.05. The study gives some evidence of increased compensation, but it is not very strong; similar results would happen about 10% of the time just by chance.
6.126. No: “Significant at α = 0.01” does mean that the null hypothesis is unlikely, but only in
the sense that the evidence (from the sample) would not occur very often if H0 were true.
There is no probability associated with H0 ; it is either true or it is not.
Note: Bayesian statistics views the parameter we wish to estimate as having a
probability distribution; with that viewpoint, it would make sense to speak of “the
probability that H0 is true.” This textbook does not take the Bayesian approach.
6.127. Yes. That’s the heart of why we care about statistical significance. Significance tests
allow us to discriminate between random differences (“chance variation”) that might occur
when the null hypothesis is true, and differences that are unlikely to occur when H0 is true.
6.129. For each sample, find x, then take x ± 1.96(4/√12) = x ± 2.2632.
We "expect" to see that 95 of the 100 intervals will include 25 (the true value of µ); binomial computations show that (about 99% of the time) 90 or more of the 100 intervals will include 25.
6.130. For each sample, find x, then compute z = (x − 25)/(4/√12). Choose a significance level α and the appropriate cutoff point—for example, with α = 0.10, reject H0 if |z| > 1.645; with α = 0.05, reject H0 if |z| > 1.96.
If, for example, α = 0.05, then we "expect" to reject H0 (that is, make the wrong decision) only 5 of the 100 times.
6.131. For each sample, find x, then compute z = (x − 23)/(4/√12). Choose a significance level α and the appropriate cutoff point (z∗)—for example, with α = 0.10, reject H0 if |z| > 1.645; with α = 0.05, reject H0 if |z| > 1.96.
Because the true mean is 25, Z = (x − 25)/(4/√12) has a N(0, 1) distribution, so the probability that we will accept H0 is
    P(−z∗ < (x − 23)/(4/√12) < z∗) = P(−z∗ < Z + 1.7321 < z∗) = P(−1.7321 − z∗ < Z < −1.7321 + z∗).
If α = 0.10 (z∗ = 1.645), this probability is P(−3.38 < Z < −0.09) = 0.4637; if α = 0.05 (z∗ = 1.96), this probability is P(−3.69 < Z < 0.23) = 0.5909. For smaller α, the probability will be larger. Thus we "expect" to (wrongly) accept H0 about half the time (or more), and correctly reject H0 about half the time or less. (The probability of rejecting H0 is essentially the power of the test against the alternative µ = 25.)
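The acceptance probabilities in 6.131 can also be checked numerically; a sketch under the exercise's setup (true mean 25, testing H0: µ = 23 with σ = 4 and n = 12; scipy assumed):

    from math import sqrt
    from scipy.stats import norm

    def prob_accept(z_star, mu_true=25, mu0=23, sigma=4, n=12):
        """P(two-sided z test fails to reject H0: mu = mu0) when the true mean is mu_true."""
        shift = (mu_true - mu0) / (sigma / sqrt(n))   # about 1.7321 here
        return norm.cdf(z_star - shift) - norm.cdf(-z_star - shift)

    print(prob_accept(1.645))   # about 0.46 (alpha = 0.10)
    print(prob_accept(1.960))   # about 0.59 (alpha = 0.05)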
6.132. The test statistics and P-values are in the table below, computed as described in the text; for example, for conscientiousness, z = (3.80 − 3.88)/0.10 = −0.8. Because we are
performing 13 tests, we should use the Bonferroni procedure (see Exercise 6.102) or some
other multiple-test method. For an overall significance level of α, Bonferroni requires
individual-test significance at α/13; for α = 0.05, this means we need P < 0.0038.
Therefore, the only significant differences (for α = 0.05 or α = 0.01) are for handshake
strength and grip.
Characteristic z P
Conscientiousness −0.8 0.4238
Extraversion 0.9 0.3682
Agreeableness −2.2 0.0278
Emotional stability 1.5 0.1336
Openness to experience 0.5 0.6170
Overall handshake 2.3 0.0214
Handshake strength 5.3 < 0.0001*
Handshake vigor 1.7 0.0892
Handshake grip 3.8 0.0002*
Handshake duration 1.5 0.1336
Eye contact −0.6 0.5486
Professional dress −2.0 0.0456
Interviewer assessment −0.58 0.5626
Chapter 7 Solutions
7.2. In each case, use df = n − 1; if that number is not in Table D, drop to the lower degrees
of freedom. (a) For 95% confidence and df = 10, use t ∗ = 2.228. (b) For 99% confidence
and df = 32, we drop to df = 30 and use t ∗ = 2.750. (Software gives t ∗ = 2.7385 for
df = 32.) (c) For 90% confidence and df = 249, we drop to df = 100 and use t ∗ = 1.660.
(Software gives t ∗ = 1.6510 for df = 249.)
7.3. For the mean monthly rent, the 95% confidence interval for µ is
    $613 ± 2.131($96/√16) = $613 ± $51.14 = $561.86 to $664.14
7.4. The margin of error for 90% confidence would be smaller (so the interval would be
narrower) because we are taking a greater risk—specifically, a 10% risk—that the interval
does not include the true mean µ.
7.6. For the hypotheses H0 : µ = $550 versus Ha : µ > $550, we find t = (613 − 550)/(96/√16) = 2.625 with df = 15, for which P ≈ 0.0096. We have strong evidence against H0 , and conclude that the mean rent is greater than $550.
7.7. Software will typically give a more accurate value for t∗ than that given in Table D, and will not round off intermediate values such as the standard deviation. Otherwise, the details of this computation are the same as what is shown in the textbook: df = 7, t∗ = 2.3646, 6.75 ± t∗(3.8822/√8) = 6.75 ± 3.2456 = 3.5044 to 9.9956, or about 3.5 to 10.0 hours per month.
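As this solution notes, software supplies t∗ to more decimal places than Table D. A sketch showing how such critical values and the interval in 7.7 might be obtained (scipy assumed; not part of the original solution):

    from math import sqrt
    from scipy.stats import t

    print(t.ppf(0.975, df=10))   # about 2.228 (cf. 7.2(a))
    print(t.ppf(0.975, df=7))    # about 2.3646 (used in 7.7)

    xbar, s, n = 6.75, 3.8822, 8
    margin = t.ppf(0.975, df=n - 1) * s / sqrt(n)
    print(xbar - margin, xbar + margin)   # roughly 3.50 to 10.00 hours per month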
7.9. About −1.33 to 10.13: Using the mean and standard deviation from the previous exercise, the 95% confidence interval for µ is
    4.4 ± 2.7765(4.6152/√5) = 4.4 ± 5.7305 = −1.3305 to 10.1305
(This is the interval produced by software; using the critical value t∗ = 2.776 from Table D gives −1.3296 to 10.1296.)
7.10. See also the solutions to Exercises 1.36, 1.74, and 1.150. The CO2 data are sharply
right-skewed (clearly non-Normal). However, the robustness of the t procedures should make
them safe for this situation because the sample size is large (n = 48). The bigger question
is whether we can treat the data as an SRS; we have recorded CO2 emissions for every
country with a population over 20 million, rather than a random sample.
7.11. The distribution is clearly non-Normal, but the sample size (n = 63) should be sufficient
to overcome this, especially in the absence of strong skewness. One might question the
independence of the observations; it seems likely that, after 40 or so tickets had been posted for sale, someone listing a ticket would look at those already posted for an idea of what price to charge.
If we were to use t procedures, we would presumably take the viewpoint that these 63
observations come from a larger population of hypothetical tickets for this game, and we are
trying to estimate the mean µ of that population. However, because (based on the histogram
in Figure 1.33) the population distribution is likely bimodal, the mean µ might not be the
most useful summary of a bimodal distribution.
7.12. The power would be greater because larger differences (like µ > 1) are easier to detect.
7.15. (a) df = 10, t ∗ = 2.228. (b) df = 21, t ∗ = 2.080. (c) df = 21, t ∗ = 1.721. (d) For
a given confidence level, t ∗ (and therefore the margin of error) decreases with increasing
sample size. For a given sample size, t ∗ increases with increasing confidence.
7.18. Because the value of x is positive, which supports the direction of the alternative
(µ > 0), the P-value for the one-sided test is half as big as that for the two-sided test:
P = 0.037.
7.20. (a) df = 15. (b) 1.753 < t < 2.131. (c) 0.025 < P < 0.05. (d) t = 2.10 is significant at 5%, but not at 1%. (e) From software, P ≈ 0.0265.
7.21. (a) df = 27. (b) 1.703 < t < 2.052. (c) Because the alternative is two-sided, we double the upper-tail probabilities to find the P-value: 0.05 < P < 0.10. (d) t = 2.01 is not significant at either level (5% or 1%). (e) From software, P ≈ 0.0546.
7.22. (a) df = 13. (b) Because 2.282 < |t| < 2.650, the P-value satisfies 0.01 < P < 0.02. (c) From software, P ≈ 0.0121.
7.23. Let P be the given (two-sided) P-value, and suppose that the alternative is µ > µ0 . If x
is greater than µ0 , this supports the alternative over H0 . However, if x < µ0 , we would not
take this as evidence against H0 because x is on the “wrong” side of µ0 . So, if the value of
x is on the “correct” side of µ0 , the one-sided P-value is simply P/2. However, if the value
of x is on the “wrong” side of µ0 , the one-sided P-value is 1 − P/2 (which will always be
at least 0.5, so it will never indicate significant evidence against H0 ).
7.24. (a) The distribution is slightly right-skewed, and the largest observation stands out from the rest (although it does not quite qualify as an outlier using the 1.5 × IQR rule); see the stemplot below. (b) The reasonably large sample should be sufficient to overcome the mild non-Normality of the data, and because it was based on a random sample from a large population, t procedures should be appropriate. (c) x ≈ 119.0667 and s ≈ 29.5669 friends, so the standard error is s/√30 ≈ 5.3982. The critical value for 95% confidence is t∗ = 2.045, so the margin of error is 11.04. (d) With 95% confidence, the mean number of Facebook friends at this university is between 108.03 and 130.11.
Stemplot (stems are tens):
     7 | 24
     8 | 355
     9 | 679
    10 | 3456
    11 | 012899
    12 | 0678
    13 | 7
    14 | 8
    15 | 248
    16 | 0
    17 | 1
    18 |
    19 | 3
7.25. (a) If µ is the mean number of uses a person can produce in 5 minutes after witnessing rudeness, we wish to test H0 : µ = 10 versus Ha : µ < 10. (b) t = (7.88 − 10)/(2.35/√34) ≈ −5.2603, with df = 33, for which P < 0.0001. This is very strong evidence that witnessing rudeness decreases performance.
(a) The distribution of diameters has two peaks and a high value (not quite an outlier). Both the stemplot and the quantile plot (neither shown) indicate that the distribution is not Normal. The five-number summary is 2.2, 10.95, 28.5, 41.9, 69.3 (all in cm); a boxplot is not shown, but the long "whisker" between Q3 and the maximum is an indication of the skewness. (b) Maybe: We have a large enough sample to overcome the non-Normal distribution, but we are sampling from a small population. (c) The mean is x ≈ 27.29 cm, s ≈ 17.7058 cm, and the margin of error is t∗ · s/√40:
df t∗ Interval
Table D 30 2.042 27.29 ± 5.7167 = 21.57 to 33.01 cm
Software 39 2.0227 27.29 ± 5.6626 = 21.63 to 32.95 cm
(d) One could argue for either answer. We chose a random sample from this tract, so the
main question is, can we view trees in this tract as being representative of trees elsewhere?
7.28. (a) We wish to test H0 : µ = 3421.7 kcal/day versus Ha : µ < 3421.7 kcal/day. (b) The test statistic is t = (3077 − 3421.7)/(987/√114) ≈ −3.73, with df = 113, for which P ≈ 0.0002. (c) Starting with the average shortfall 3421.7 − 3077.0 = 344.7 kcal/day, the mean deficiency is (with 95% confidence) between about 160 and 530 kcal/day.
df t∗ Interval
Table D 100 1.984 344.7 ± 183.4030 = 161.2970 to 528.1030
Software 113 1.9812 344.7 ± 183.1423 = 161.5577 to 527.8423
(d) Because this is not a random sample, it may not represent other children well.
7.30. (a) The distribution cannot be Normal because all values must be (presumably) integers between 0 and 4. (b) The sample size (282) should make the t methods appropriate because the distribution of ratings can have no outliers. (c) The margin of error is t∗ · s/√282, which is either 0.1611 (Table D) or 0.1591 (software):
df t∗ Interval
Table D 100 2.626 2.22 ± 0.1611 = 2.0589 to 2.3811
Software 281 2.5934 2.22 ± 0.1591 = 2.0609 to 2.3791
(d) The sample might not represent children from other locations well (or perhaps more
accurately, it might not represent well the opinions of parents of children from other
locations).
df t∗ Interval
90% confidence Table D 100 1.660 2.22 ± 0.1018 = 2.1182 to 2.3218
Software 281 1.6503 2.22 ± 0.1012 = 2.1188 to 2.3212
95% confidence Table D 100 1.984 2.22 ± 0.1217 = 2.0983 to 2.3417
Software 281 1.9684 2.22 ± 0.1207 = 2.0993 to 2.3407
7.32. (a) For example, Subject 1's weight change is 61.7 − 55.7 = 6 kg. (b) The mean change is x ≈ 4.73125 kg and the standard deviation is s ≈ 1.7457 kg. (c) SEx = s/√16 ≈ 0.4364 kg; for df = 15, t∗ = 2.131, so the margin of error for 95% confidence is ±0.9300 (software: ±0.9302). Based on a method that gives correct results 95% of the time, the mean weight change is 3.8012 to 5.6613 kg (software: 3.8010 to 5.6615 kg). (d) x ≈ 10.40875 lb, s ≈ 3.8406 lb, and the 95% confidence interval is 8.3626 to 12.4549 lb (software: 8.3622 to 12.4553 lb). (e) H0 is µ = 16 lb. The test statistic is t ≈ −5.823 with df = 15, which is highly significant evidence (P < 0.0001) against H0 (unless Ha is µ > 16 lb). (f) The data suggest that the excess calories were not converted into weight; the subjects must have used this energy some other way. (See the next exercise for more information.)
7.33. (a) t = (328 − 0)/(256/√16) = 5.1250 with df = 15, for which P ≈ 0.0012. There is strong evidence of a change in NEAT. (b) With t∗ = 2.131, the 95% confidence interval is 191.6 to 464.4 kcal/day. This tells us how much of the additional calories might have been burned by the increase in NEAT: It consumed 19% to 46% of the extra 1000 kcal/day.
7.34. (a) For the differences, x = $112.5 and s ≈ $123.7437. (b) We wish to test H0 : µ = 0 versus Ha : µ > 0, where µ is the mean difference between Jocko's estimates and those of the other garage. (The alternative hypothesis is one-sided because the insurance adjusters suspect that Jocko's estimates may be too high.) For this test, we find t = (112.5 − 0)/(123.7437/√10) ≈ 2.87 with df = 9, for which P = 0.0092 (Minitab output not shown). This is significant evidence against H0 —that is, we have good reason to believe that Jocko's estimates are higher. (c) The 95% confidence interval with df = 9 is x ± 2.262s/√10 = $112.5 ± $88.5148 = $23.99 to $201.01. (The software interval is $23.98 to $201.02.) (d) Student answers may vary; based on the confidence interval, one could justify any answer in the range $25,000 to $200,000.
7.36. (a) The five-number summary (in units of "picks") is
    Min     Q1      M       Q3     Max
    886    919.5   936.5    958    986
There are no outliers or particular skewness, but the stemplot (below) reveals two peaks. (The boxplot gives no evidence of the two peaks; they are visible in the quantile plot, but it takes a fair amount of thought—or practice—to observe this in a quantile plot. Neither the boxplot nor the quantile plot is shown.) (b) While the distribution is non-Normal, there are no outliers or strong skewness, so the sample size n = 36 should make the t procedures reasonably safe. (c) The mean is x ≈ 938.2, the standard deviation is s ≈ 24.2971, and the standard error of the mean is s/√36 ≈ 4.0495. (All are in units of picks.) (d) The 90% confidence interval for the mean number of picks in a 1-pound bag is:
              df    t∗       Interval
    Table D   30    1.697    938.2 ± 6.8720 = 931.3502 to 945.0943
    Software  35    1.6896   938.2 ± 6.8420 = 931.3803 to 945.0642
Stemplot of pick counts (two-digit stems, leaves are ones):
    88 | 6
    89 | 5
    90 | 9
    91 | 236679
    92 | 0133445
    93 | 567
    94 | 05
    95 | 0226779
    96 | 12579
    97 | 07
    98 | 6
7.37. (a) To test H0 : µ = 925 picks versus Ha : µ > 925 picks, we have t = (938.2 − 925)/(24.2971/√36) ≈ 3.27 with df = 35, for which P ≈ 0.0012. (b) For H0 : µ = 935 picks versus Ha : µ > 935 picks, we have t = (938.2 − 935)/(24.2971/√36) ≈ 0.80, again with df = 35, for which P ≈ 0.2158. (c) The 90%
confidence interval from the previous exercise was 931.4 to 945.1 picks, which includes 935,
but not 925. For a test of H0 : µ = µ0 versus Ha : µ ≠ µ0 , we know that P < 0.10 for
values of µ0 outside the interval, and P > 0.10 if µ0 is inside the interval. The one-sided
P-value would be half of the two-sided P-value.
7.38. The 90% confidence interval is 3.8 ± t∗(1.02/√1783). With Table D, take df = 1000 and t∗ = 1.646; with software, take df = 1782 and t∗ = 1.6457. Either way, the confidence interval is 3.7602 to 3.8398.
7.39. (a) The differences are spread from −0.018 to 0.020 g, with mean x = −0.0015 and standard deviation s ≈ 0.0122 g. A stemplot is shown below; the sample is too small to make judgments about skewness or symmetry. (b) For H0 : µ = 0 versus Ha : µ ≠ 0, we find t = (−0.0015 − 0)/(s/√8) ≈ −0.347 with df = 7, for which P = 0.7388. We cannot reject H0 based on this sample. (c) The 95% confidence interval for µ is
    −0.0015 ± 2.365(0.0122/√8) = −0.0015 ± 0.0102 = −0.0117 to 0.0087 g
(d) The subjects from this sample may be representative of future subjects, but the test results and confidence interval are suspect because this is not a random sample.
Stemplot of the differences (split stems; stems are hundredths of a gram, leaves are thousandths):
    −1 | 85
    −1 |
    −0 | 65
    −0 |
     0 | 2
     0 | 55
     1 |
     1 |
     2 | 0
7.40. (a) The differences are spread from −31 to 45 g/cm², with mean x = 4.625 and standard deviation s ≈ 26.8485 g/cm². A stemplot is shown below; the sample is too small to make judgments about skewness or symmetry. (b) For H0 : µ = 0 versus Ha : µ ≠ 0, we find t = (4.625 − 0)/(s/√8) ≈ 0.487 with df = 7, for which P = 0.6410. We cannot reject H0 based on this sample. (c) The 95% confidence interval for µ is
    4.625 ± 2.365(26.8485/√8) = 4.625 ± 22.4494 = −17.8244 to 27.0744 g/cm²
(d) See the answer to part (d) of the previous exercise.
Stemplot of the differences (stems are tens):
    −3 | 1
    −2 | 8
    −1 |
    −0 | 4
     0 | 16
     1 | 3
     2 |
     3 | 5
     4 | 5
7.44. We have data for all countries with population at least 20 million, so this cannot be
considered a random sample of (say) all countries.
7.45. We test H0 : median = 0 versus Ha : median > 0—or equivalently, H0 : p = 1/2 versus
Ha : p > 1/2, where p is the probability that Jocko’s estimate is higher. One difference is
0; of the nine non-zero differences, seven are positive. The P-value is P(X ≥ 7) = 0.0898
from a B(9, 0.5) distribution; there is not quite enough evidence to conclude that Jocko’s
estimates are higher. In Exercise 7.34 we were able to reject H0 ; here we cannot.
Note: The failure to reject H0 in this case is because with the sign test, we pay attention
only to the sign of each difference, not the size. In particular, the negative differences are
each given the same “weight” as each positive difference, in spite of the fact that the
negative differences are only −$50 and −$75, while most of the positive differences are
larger. See the “Caution” about the sign test on page 425 of the text.
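The sign tests in 7.45 through 7.49 are binomial calculations on the number of positive differences among the nonzero ones. A sketch (helper name ours; scipy assumed) checked against 7.45 and 7.47:

    from scipy.stats import binom

    def sign_test_p(n_positive, n_nonzero, alternative="greater"):
        """Sign-test P-value for H0: median = 0, using only the nonzero differences."""
        upper = binom.sf(n_positive - 1, n_nonzero, 0.5)   # P(X >= n_positive)
        if alternative == "greater":
            return upper
        lower = binom.cdf(n_positive, n_nonzero, 0.5)
        return min(1.0, 2 * min(upper, lower))             # two-sided

    print(sign_test_p(7, 9))                            # about 0.0898 (Exercise 7.45)
    print(sign_test_p(5, 8, alternative="two-sided"))   # about 0.7266 (Exercise 7.47)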
7.46. We test H0 : median = 0 versus Ha : median ≠ 0. Minitab (output not shown) gives P = 1 because there were four positive and four negative differences, giving us no reason to doubt H0 . (This is the same conclusion we reached with the t test, for which P = 0.7388.)
7.47. We test H0 : median = 0 versus Ha : median ≠ 0. There were three negative and five positive differences, so the P-value is 2P(X ≥ 5) for a binomial distribution with parameters n = 8 and p = 0.5. From Table C or software (Minitab output not shown), we have P = 0.7266, which gives no reason to doubt H0 . The t test P-value was 0.6410.
7.48. We test H0 : median = 0 versus Ha : median > 0, or H0 : p = 1/2 versus Ha : p > 1/2.
Three of the 20 differences are zero; of the other 17, 16 are positive. The P-value
is P(X ≥ 16) for a B(17, 0.5) distribution. While Table C cannot give us the exact
value of this probability, if we weaken the evidence by pretending that the three zero
differences were negative and look at the B(20, 0.5) distribution, we can estimate that
P < 0.0059—enough information to reject the null hypothesis. In fact, software reports the
P-value as 0.0001. (For the t test, we found P = 0.0005.)
7.49. We test H0 : median = 0 versus Ha : median > 0, or H0 : p = 1/2 versus Ha : p > 1/2.
Out of the 20 differences, 17 are positive (and none equal 0). The P-value is P(X ≥ 17)
for a B(20, 0.5) distribution. From Table C or software (Minitab output not shown), we have P = 0.0013, so we reject H0 and conclude that the results of the two computations are different. (Using a t test, we found P ≈ 0.0003, which led to the same conclusion.)
7.51. The standard deviation for the given data was s ≈ 0.012224. With α = 0.05, t = x/(s/√15), and df = 14, we reject H0 if |t| ≥ 2.145, which means |x| ≥ (2.145)(s/√15), or |x| ≥ 0.00677. Assuming µ = 0.002:
    P(|x| ≥ 0.00677) = 1 − P(−0.00677 ≤ x ≤ 0.00677)
                     = 1 − P((−0.00677 − 0.002)/(s/√15) ≤ (x − 0.002)/(s/√15) ≤ (0.00677 − 0.002)/(s/√15))
                     = 1 − P(−2.78 ≤ Z ≤ 1.51)
                     ≈ 1 − (0.9345 − 0.0027) = 0.07
The power is about 7% against this alternative—not surprising, given the small sample size,
and the fact that the difference (0.002) is small relative to the standard deviation.
Note: Power calculations are often done with software. This may give answers that
differ slightly from those found by the method described in the text. Most software does
these computations with a “noncentral t distribution” (used in the text for two-sample
power problems) rather than a Normal distribution, resulting in more accurate answers. In
most situations, the practical conclusions drawn from the power computations are the same
regardless of the method used.
7.53. Taking s = 1.5 as in Example 7.9, the power for the alternative µ = 0.75 is:
    P(x ≥ t∗s/√n when µ = 0.75) = P((x − 0.75)/(s/√n) ≥ (t∗s/√n − 0.75)/(s/√n)) = P(Z ≥ t∗ − 0.5√n)
Using trial-and-error, we find that with n = 26, power ≈ 0.7999, and with n = 27, power ≈ 0.8139. Therefore, we need n > 26.
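The trial-and-error search in 7.53 is easy to automate with the same Normal approximation, P(Z ≥ t∗ − 0.5√n); a sketch (scipy assumed, helper name ours):

    from math import sqrt
    from scipy.stats import norm, t

    def approx_power(n, effect=0.75, s=1.5, alpha=0.05):
        """Approximate power of the one-sided one-sample t test (Normal approximation)."""
        t_star = t.ppf(1 - alpha, df=n - 1)
        return norm.sf(t_star - (effect / s) * sqrt(n))

    n = 2
    while approx_power(n) < 0.80:
        n += 1
    print(n, approx_power(n))   # n = 27, power about 0.81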
7.54. (a) Use a two-sided alternative (Ha : µA ≠ µB ) because we (presumably) have no prior
suspicion that one design will be better than the other. (b) Both sample sizes are the same
(n 1 = n 2 = 15), so the appropriate degrees of freedom would be df = 15 − 1 = 14. (c) For a
two-sided test at α = 0.05, we need |t| > t ∗ , where t ∗ = 2.145 is the 0.025 critical value for
a t distribution with df = 14.
7.55. Because 2.264 < t < 2.624 and the alternative is two-sided, Table D tells us that the
P-value is 0.02 < P < 0.04. (Software gives P = 0.0280.) That is sufficient to reject H0 at
α = 0.05.
7.56. We find SED ≈ 2.0396. The options for the 95% confidence interval for µ1 − µ2 are shown below. This interval includes fewer values than a 99% confidence interval would (that is, a 99% confidence interval would be wider) because increasing our confidence level means that we need a larger margin of error.
    df     t∗       Confidence interval
    85.4   1.9881   −14.0550 to −5.9450
    49     2.0096   −14.0987 to −5.9013
    40     2.021    −14.1220 to −5.8780
7.57. We find SED ≈ 4.5607. The options for the 95% confidence interval for µ1 − µ2 are shown below. The instructions for this exercise say to use the second approximation (df = 9), in which case we do not reject H0 , because 0 falls in the 95% confidence interval. Using the first approximation (df = 15.7, typically given by software), the interval is narrower, and we would reject H0 at α = 0.05 against a two-sided alternative. (In fact, t ≈ −2.193, for which 0.05 < P < 0.1 [Table D, df = 9], or P ≈ 0.0438 [software, df = 15.7].)
    df     t∗       Confidence interval
    15.7   2.1236   −19.6851 to −0.3149
    9      2.262    −20.3163 to 0.3163
7.59. SPSS and SAS give both results (the SAS output refers to the unpooled result as the
Satterthwaite method), while JMP and Excel show only the unpooled procedures. The
pooled t statistic is 1.998, for which P = 0.0808.
Note: When the sample sizes are equal—as in this case—the pooled and unpooled t
statistics are equal. (See the next exercise.)
Both Excel and JMP refer to the unpooled test with the slightly-misleading phrase
“assuming unequal variances.” The SAS output also implies that the variances are unequal
for this method. In fact, unpooled procedures make no assumptions about the variances.
Finally, note that both Excel and JMP can do pooled procedures as well as the unpooled
procedures that are shown.
7.61. (a) Hypotheses should involve µ1 and µ2 (population means) rather than x 1 and x 2
(sample means). (b) The samples are not independent; we would need to compare the 56
males to the 44 females. (c) We need P to be small (for example, less than 0.10) to reject
H0 . A large P-value like this gives no reason to doubt H0 . (d) Assuming the researcher
computed the t statistic using x 1 − x 2 , a positive value of t does not support Ha . (The
one-sided P-value would be 0.982, not 0.018.)
7.62. (a) Because 0 is not in the confidence interval, we would reject H0 at the 5% level.
(b) Larger samples generally give smaller margins of error (at the same confidence level,
and assuming that the standard deviations for the large and small samples are about the
same). One conceptual explanation for this is that larger samples give more information and
therefore offer more precise results. A more mathematical explanation: In looking at the formula for a two-sample confidence interval, we see that SED = √(s1²/n1 + s2²/n2), so that if n1 and n2 are made larger, SED (and with it the margin of error) becomes smaller.
7.63. (a) We cannot reject H0 : µ1 = µ2 in favor of the two-sided alternative at the 5% level because 0.05 < P < 0.10 (Table D) or P ≈ 0.0542 (software). (b) We could reject H0 in favor of Ha : µ1 < µ2 . A negative t-statistic means that x1 < x2 , which supports the claim that µ1 < µ2 , and the one-sided P-value would be half of its value from part (a): 0.025 < P < 0.05 (Table D) or P ≈ 0.0271 (software).
7.64. We find SED ≈ 3.4792. The options for the 95% confidence interval for µ1 − µ2 are shown below. A 99% confidence interval would include more values (it would be wider) because increasing our confidence level means that we need a larger margin of error.
    df     t∗       Confidence interval
    87.8   1.9873   −16.9144 to −3.0856
    39     2.0227   −17.0374 to −2.9626
    30     2.042    −17.1046 to −2.8954
7.65. (a) Stemplots (below) do not look particularly Normal, but they have no extreme outliers or skewness, so t procedures should be reasonably safe. (b) The table of summary statistics is below. (c) We wish to test H0 : µN = µS versus Ha : µN < µS . (d) We find SED ≈ 0.3593 and t ≈ −4.303, so P ≈ 0.0001 (df = 26.5) or P < 0.0005 (df = 13). Either way, we reject H0 . (e) The 95% confidence interval for the difference is one of the two options in the table below.
Stemplots (split stems; stems are dollars, leaves are tenths):
    Neutral           Sad
    0 | 0000000       0 | 0
    0 | 55            0 | 5
    1 | 000           1 | 000
    1 |               1 | 555
    2 | 00            2 | 0
                      2 | 55
                      3 | 00
                      3 | 55
                      4 | 00
Summary statistics and 95% confidence intervals:
    Group     n    x         s         df     t∗       Confidence interval
    Neutral   14   $0.5714   $0.7300   26.5   2.0538   −2.2842 to −0.8082
    Sad       17   $2.1176   $1.2441   13     2.160    −2.3224 to −0.7701
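The unpooled (Satterthwaite) results in part (d) can be checked with scipy's ttest_ind once the data are entered; a sketch in which the two samples are reconstructed from the stemplots above:

    import numpy as np
    from scipy.stats import ttest_ind

    # Amounts offered (dollars), read from the stemplots above.
    neutral = np.array([0.0] * 7 + [0.5] * 2 + [1.0] * 3 + [2.0] * 2)
    sad = np.array([0.0, 0.5] + [1.0] * 3 + [1.5] * 3 + [2.0] + [2.5] * 2
                   + [3.0] * 2 + [3.5] * 2 + [4.0] * 2)

    # equal_var=False requests the unpooled (Satterthwaite) procedure.
    t_stat, p_two_sided = ttest_ind(neutral, sad, equal_var=False)
    print(t_stat, p_two_sided / 2)   # t about -4.30; one-sided P about 0.0001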
7.66. (a) The scores can be examined with either histograms or stemplots (stemplots below). Neither distribution reveals any extreme skewness or outliers, so t procedures should be safe. (b) Summary statistics for the two distributions are given in the table below. We find SED ≈ 0.2891 and t ≈ 3.632, so P ≈ 0.0008 (df = 39.5) or 0.001 < P < 0.002 (df = 19). Either way, we reject H0 . (c) The 95% confidence interval for the difference is one of the two options in the table below. (d) The hypothesis test and confidence interval suggest that primed individuals had a more positive response to the shampoo label, with an average rating between 0.4 and 1.6 points higher than the unprimed group. (However, priming can only do so much, as the average score in the primed group was only 4 on a scale of 1 to 7.)
Stemplots (all scores are integers, so every leaf is 0):
    Primed            Non-primed
    1 |               1 | 00
    2 | 00            2 | 00
    3 | 000           3 | 000000000000
    4 | 0000000000    4 | 000
    5 | 0000000       5 | 0
Summary statistics and 95% confidence intervals:
    Group        n    x      s        df     t∗       Confidence interval
    Primed       22   4.00   0.9258   39.5   2.0220   0.4655 to 1.6345
    Non-primed   20   2.95   0.9445   19     2.093    0.4450 to 1.6550
7.67. (a) The female means and standard deviations are xF ≈ 4.0791 and sF ≈ 0.9861; for males, they are xM ≈ 3.8326 and sM ≈ 1.0677. (b) Both distributions are somewhat skewed to the left. This can be seen by constructing a histogram, but is also evident in the data table in the text by noting the large numbers of "4" and "5" ratings for both genders. However, because the ratings range from 1 to 5, there are no outliers, so the t procedures should be safe. (c) We find SED ≈ 0.0851 and t ≈ 2.898, for which P ≈ 0.0040 (df = 402.2) or 0.002 < P < 0.005 (df = 220). Either way, there is strong evidence of a difference in satisfaction. (d) The 95% confidence interval for the difference is one of the three options in the table below—roughly 0.08 to 0.41. (e) While we have evidence of a difference in mean ratings, it might not be as large as 0.25.
    df      t∗       Confidence interval
    402.2   1.9659   0.0793 to 0.4137
    220     1.9708   0.0788 to 0.4141
    100     1.984    0.0777 to 0.4153
7.68. (a) For testing H0 : µLC = µLF versus Ha : µLC ≠ µLF , we have SED ≈ 6.7230 and t ≈ 4.165, so P < 0.0001 (df = 62.1) or P < 0.001 (df = 31). Either way, we clearly reject H0 . (b) It might be that the moods of subjects who dropped out differed from the moods of those who stayed; in particular, it seems reasonable to suspect that those who dropped out had higher TMDS scores.
7.69. (a) Assuming we have SRSs from each population, use of two-sample t procedures seems reasonable. (We cannot assess Normality, but the large sample sizes would overcome most problems.) (b) We wish to test H0 : µf = µm versus Ha : µf ≠ µm . (c) We find SED ≈ 6.8490 mg/dl. The test statistic is t ≈ 0.276, with df = 76.1 (or 36—use 30 for Table D), for which P ≈ 0.78. We have no reason to believe that male and female cholesterol levels are different. (d) The options for the 95% confidence interval for µf − µm are shown below. (e) It might not be appropriate to treat these students as SRSs from larger populations.
    df     t∗       Confidence interval
    76.1   1.9916   −11.7508 to 15.5308
    36     2.0281   −12.0005 to 15.7805
    30     2.042    −12.0957 to 15.8757
Note: Because t distributions are more spread out than Normal distributions, a t-value
that would not be significant for a Normal distribution (such as 0.276) cannot possibly be
significant when compared to a t distribution.
7.71. (a) The distribution cannot be Normal because all numbers are integers. (b) The t procedures should be appropriate because we have two large samples with no outliers. (c) We will test H0 : µI = µC versus Ha : µI > µC (or µI ≠ µC ). The one-sided alternative reflects the researchers' (presumed) belief that the intervention would increase scores on the test. The two-sided alternative allows for the possibility that the intervention might have had a negative effect. (d) SED = √(sI²/nI + sC²/nC) ≈ 0.1198 and t = (xI − xC)/SED ≈ 6.258. Regardless of how we compute degrees of freedom (df = 354 or 164), the P-value is very small: P < 0.0001. We reject H0 and conclude that the intervention increased test scores. (e) The interval is xI − xC ± t∗SED ; the value of t∗ depends on the df (see the table below), but note that in every case the interval rounds to 0.51 to 0.99. (f) The results for this sample may not generalize well to other areas of the country.
    df      t∗       Confidence interval
    354.0   1.9667   0.5143 to 0.9857
    164     1.9745   0.5134 to 0.9866
    100     1.984    0.5122 to 0.9878
7.73. (a) This may be near enough to an SRS, if this company's working conditions were similar to those of other workers. (b) SED ≈ 0.7626; regardless of how we choose df, the interval rounds to 9.99 to 13.01 mg·y/m³ (see the table below). (c) A one-sided alternative would seem to be reasonable here; specifically, we would likely expect that the mean exposure for outdoor workers would be lower. For testing H0 , we find t = 15.08, for which P < 0.0001 with either df = 137 or 114 (and for either a one- or a two-sided alternative). We have strong evidence that outdoor concrete workers have lower dust exposure than the indoor workers. (d) The sample sizes are large enough that skewness should not matter.
    df      t∗       Confidence interval
    137.1   1.9774   9.9920 to 13.0080
    114     1.9810   9.9893 to 13.0107
    100     1.984    9.9870 to 13.0130
7.74. With the given standard deviations, SED ≈ 0.2653; regardless of how we choose df, a 95% confidence interval for the difference in means rounds to 4.37 to 5.43 mg·y/m³ (see the table below). With the null hypothesis H0 : µi = µo (and either a one- or two-sided alternative, as in the previous exercise), we find t = 18.47, for which P < 0.0001 regardless of df and the chosen alternative. We have strong evidence that outdoor concrete workers have lower respirable dust exposure than the indoor workers.
    df      t∗       Confidence interval
    121.5   1.9797   4.3747 to 5.4253
    114     1.9810   4.3744 to 5.4256
    100     1.984    4.3736 to 5.4264
7.76. (a) The 68–95–99.7 rule suggests that the distributions are not Normal: If they were Normal, then (for example) 95% of 7-to-10-year-olds drink between −13.2 and 29.6 oz of sweetened drinks per day. As negative numbers do not make sense (unless some children are regurgitating sweetened drinks), the distributions must be right-skewed. (b) We find SED ≈ 4.3786 and t ≈ −1.439, with either df = 7.8 (P ≈ 0.1890) or df = 4 (P ≈ 0.2236). We do not have enough evidence to reject H0 . (c) The possible 95% confidence intervals are given in the table below. (d) Because the distributions are not Normal and the samples are small, the t procedures are questionable for these data. (e) Because this group is not an SRS—and indeed might not be random in any way—we would have to be very cautious about extending these results to other children.
    df    t∗       Confidence interval
    7.8   2.3159   −16.4404 to 3.8404
    4     2.776    −18.4551 to 5.8551
7.77. This is a matched pairs design; for example, Monday hits are (at least potentially) not
independent of one another. The correct approach would be to use one-sample t methods on
the seven differences (Monday hits for design 1 minus Monday hits for design 2, Tuesday/1
minus Tuesday/2, and so on).
7.78. (a) Results for this randomization will depend on the technique used. (b) SED ≈ 0.5235, and the options for the 95% confidence interval are given in the table below. (c) Because 0 falls outside the 95% confidence interval, the P-value is less than 0.05, so we would reject H0 . (For reference, t ≈ 3.439 and the actual P-value is either 0.0045 or 0.0074, depending on which df we use.)
    df     t∗       Confidence interval
    12.7   2.1651   0.6667 to 2.9333
    9      2.262    0.6160 to 2.9840
7.79. The next 10 employees who need screens might not be an independent group—perhaps
they all come from the same department, for example. Randomization reduces the chance
that we end up with such unwanted groupings.
7.83. (a) SE_D = 1.9686. Answers will vary with the df used (see the table below), but the interval is roughly −1 to 7 units. (b) Because of random fluctuations between stores, we might (just by chance) have seen a rise in the average number of units sold even if actual mean sales had remained unchanged. (Based on the confidence interval, mean sales might have even dropped slightly.)
       df      t*       Confidence interval
     122.5   1.9795     −0.8968 to 6.8968
      54     2.0049     −0.9468 to 6.9468
      50     2.009      −0.9549 to 6.9549
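The table above can be reproduced with software. The following Python sketch is not part of the original solution; it assumes scipy is available, takes SE_D = 1.9686 from this solution, and infers the point estimate (a difference of 3 units) from the midpoints of the intervals.
    from scipy import stats
    diff, se = 3.0, 1.9686          # assumed difference in means and its standard error
    for df in (122.5, 54, 50):      # software df, conservative n-1, Table D df
        tstar = stats.t.ppf(0.975, df)           # two-sided 95% critical value
        print(df, tstar, diff - tstar * se, diff + tstar * se)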
7.84. (a) Good statistical practice dictates that the alternative hypothesis should be chosen
without looking at the data; we should only choose a one-sided alternative if we have some
reason to expect it before looking at the sample results. (b) The correct P-value is twice
that reported for the one-tailed test: P = 0.12.
7.86. See the solution to Exercise 7.65 for a table of means and standard deviations. The pooled standard deviation is sp = 1.0454, so the pooled standard error is sp·sqrt(1/14 + 1/17) = 0.3773. The test statistic is t = −4.098 with df = 29, for which P = 0.0002, and the 95% confidence interval (with t* = 2.045) is −2.3178 to −0.7747. In the solution to Exercise 7.65, we reached the same conclusion on the significance test (t = −4.303 and P = 0.0001), and the confidence interval was quite similar (roughly −2.3 to −0.8).
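The pooled computations in 7.86–7.88 all follow the same pattern. Below is a generic Python sketch (assumes scipy; the summary statistics in the example call are made up for illustration and are not the Exercise 7.65 data).
    from math import sqrt
    from scipy import stats
    def pooled_t(x1, s1, n1, x2, s2, n2, conf=0.95):
        sp = sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / (n1 + n2 - 2))  # pooled sd
        se = sp * sqrt(1/n1 + 1/n2)                                   # pooled standard error
        df = n1 + n2 - 2
        t = (x1 - x2) / se
        p = 2 * stats.t.sf(abs(t), df)                                # two-sided P-value
        tstar = stats.t.ppf(0.5 + conf/2, df)
        return sp, t, df, p, (x1 - x2 - tstar*se, x1 - x2 + tstar*se)
    print(pooled_t(4.2, 1.1, 14, 5.7, 1.0, 17))   # hypothetical inputs, for illustration only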
7.87. See the solution to Exercise 7.66 for a table of means and standard deviations. The pooled standard deviation is sp = 0.9347, so the pooled standard error is sp·sqrt(1/22 + 1/20) = 0.2888. The test statistic is t = 3.636 with df = 40, for which P = 0.0008, and the 95% confidence interval (with t* = 2.021) is 0.4663 to 1.6337. In the solution to Exercise 7.66, we reached the same conclusion on the significance test (t = 3.632 and P = 0.0008), and the confidence interval (using the more-accurate df = 39.5) was quite similar: 0.4655 to 1.6345.
7.88. See the solution to Exercise 7.67 for means and standard deviations. The pooled standard deviation is sp = 1.0129, so the pooled standard error is sp·sqrt(1/468 + 1/221) = 0.0827. The test statistic is t = 2.981 with df = 687, for which P = 0.0030 (or, using Table D, 0.002 < P < 0.005). The 95% confidence interval is one of the two entries in the table below.
       df      t*       Confidence interval
     687     1.9634     0.0842 to 0.4088
     100     1.984      0.0825 to 0.4105
In the solution to Exercise 7.67, we reached the same conclusion on the significance test (t = 2.898 and P = 0.0040). The confidence intervals were slightly wider, but similar; using the more-accurate df = 402.2, the interval was 0.0793 to 0.4137. (The other intervals were wider than this.)
7.89. See the solution to Exercise 7.81 for means and standard deviations. The pooled standard deviation is sp = 15.9617, and the standard error is SE_D = 4.1213. For the significance test, t = −2.629, df = 58, and P = 0.0110, so we have fairly strong evidence (though not quite significant at α = 0.01) that the south mean is greater than the north mean. Possible answers for the confidence interval (with software, and with Table D) are given in the table below. All results are similar to those found in Exercise 7.81.
       df      t*       Confidence interval
      58     2.0017     −19.0830 to −2.5837
      50     2.009      −19.1130 to −2.5536
Note: If n1 = n2 (as in this case), the standard error and t statistic are the same for the usual and pooled procedures. The degrees of freedom will usually be different (specifically, df is larger for the pooled procedure, unless s1 = s2 and n1 = n2).
7.91. With sn = 17.5001, ss = 14.2583, and nn = ns = 30, we have sn²/nn = 10.2085 and ss²/ns = 6.7767, so:

    df = (sn²/nn + ss²/ns)² / [ (1/(nn − 1))·(sn²/nn)² + (1/(ns − 1))·(ss²/ns)² ]
       = (10.2085 + 6.7767)² / [ (10.2085² + 6.7767²)/29 ]
       = 55.7251
7.92. With se = 16.0743, sw = 15.3314, and ne = nw = 30, we have se²/ne = 8.6128 and sw²/nw = 7.8351, so:

    df = (se²/ne + sw²/nw)² / [ (1/(ne − 1))·(se²/ne)² + (1/(nw − 1))·(sw²/nw)² ]
       = (8.6128 + 7.8351)² / [ (8.6128² + 7.8351²)/29 ]
       = 57.8706
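A short Python sketch (not part of the original solutions) of the Satterthwaite approximation used in 7.91 and 7.92, using the standard deviations and sample sizes quoted there:
    def welch_df(s1, n1, s2, n2):
        v1, v2 = s1**2 / n1, s2**2 / n2          # the two variance components
        return (v1 + v2)**2 / (v1**2/(n1 - 1) + v2**2/(n2 - 1))
    print(welch_df(17.5001, 30, 14.2583, 30))    # about 55.73 (Exercise 7.91)
    print(welch_df(16.0743, 30, 15.3314, 30))    # about 57.87 (Exercise 7.92)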
7.93. (a) With si = 7.8, ni = 115, so = 3.4, and no = 220, we have si²/ni = 0.5290 and so²/no = 0.05255, so:

    df = (0.5290 + 0.05255)² / (0.5290²/114 + 0.05255²/219) = 137.0661

(b) sp = sqrt[ ((ni − 1)si² + (no − 1)so²) / (ni + no − 2) ] = 5.3320, which is slightly closer to so (the standard deviation from the larger sample). (c) With no assumption of equality, SE1 = sqrt(si²/ni + so²/no) = 0.7626. With the pooled method, SE2 = sp·sqrt(1/ni + 1/no) = 0.6136. (d) With the pooled standard deviation, t = 18.74 and df = 333, for which P < 0.0001, and the 95% confidence interval is as shown in the table below. With the smaller standard error, the t value is larger (it had been 15.08), and the confidence interval is narrower. The P-value is also smaller (although both are less than 0.0001). (e) With si = 2.8, ni = 115, so = 0.7, and no = 220, we have si²/ni = 0.06817 and so²/no = 0.002227, so:

    df = (0.06817 + 0.002227)² / (0.06817²/114 + 0.002227²/219) = 121.5030

The pooled standard deviation is sp = 1.7338; the standard errors are SE1 = 0.2653 (with no assumptions) and SE2 = 0.1995 (assuming equal standard deviations). The pooled t is 24.56 (df = 333, P < 0.0001), and the 95% confidence intervals are shown in the table. The pooled and usual t procedures compare similarly to the results for part (d): With the pooled procedure, t is larger, and the interval is narrower.
                 df      t*       Confidence interval
    Part (d)    333     1.9671    10.2931 to 12.7069
                100     1.984     10.2827 to 12.7173
    Part (e)    333     1.9671    4.5075 to 5.2925
                100     1.984     4.5042 to 5.2958
7.94. We have n 1 = n 2 = 5. (a) For a two-sided test with df = 4, the critical value is
t ∗ = 2.776. (b) With the pooled procedures, df = 8 and the critical value is t ∗ = 2.306.
(c) The smaller critical value with the pooled approach means that a smaller t-value (that is,
weaker evidence) is needed to reject H0 .
Note: When software is available, we use the more accurate degrees of freedom for the standard approach. In this case, pooling typically is less beneficial; for this example, the software output shown in Figure 7.14 shows that df = 7.98 for the unpooled approach.
7.95. (a) From an F(15, 22) distribution with α = 0.05, F ∗ = 2.15. (b) Because F = 2.45
is greater than the 5% critical value, but less than the 2.5% critical value (F ∗ = 2.50),
we know that P is between 2(0.025) = 0.05 and 2(0.05) = 0.10. (Software tells us that
P = 0.055.) F = 2.45 is significant at the 10% level but not at the 5% level.
7.96. The power would be higher. Larger differences are easier to detect; that is, when µ1 − µ2 is more than 5, there is a greater chance that the test statistic will be significant.
Note: In fact, as the table below shows, if we repeat the computations of Example 7.23 with larger values of µ1 − µ2, the power increases rapidly.
    µ1 − µ2    Power
       5      0.7965
       6      0.9279
       7      0.9817
       8      0.9968
       9      0.9996
7.97. The power would be smaller. A larger value of σ means that large differences between the sample means would arise more often by chance so that, if we observe such a difference, it gives less evidence of a difference in the population means.
Note: The table below shows the decrease in the power as σ increases.
       σ      Power
      7.4    0.7965
      7.5    0.7844
      7.6    0.7722
      7.7    0.7601
      7.8    0.7477
7.101. The test statistic is F = 1.16²/1.15² = 1.0175 with df 211 and 164. Table E tells us that P > 0.20, while software gives P = 0.9114. The distributions are not Normal (“total score was an integer between 0 and 6”), so the test may not be reliable (although with s1 and s2 so close, the conclusion is probably correct). To reject at the 5% level, we would need F > F*, where F* = 1.46 (using df 120 and 100 from Table E) or F* = 1.3392 (using software). As F = s2²/s1², we would need s2² > s1²·F*, or s2 > 1.15·sqrt(F*), which is about 1.3896 (Table E) or 1.3308 (software).
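A Python sketch of the “software” computations for this F test (not from the original solution; assumes scipy):
    from scipy import stats
    s1, s2, df1, df2 = 1.16, 1.15, 211, 164
    F = s1**2 / s2**2                                        # 1.0175
    P = 2 * min(stats.f.sf(F, df1, df2), stats.f.cdf(F, df1, df2))   # two-sided P, about 0.91
    print(F, P, stats.f.ppf(0.975, df1, df2))                # F* for rejection at the 5% level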
7.102. The test statistic is F = 1.19²/1.12² = 1.1289 with df 164 and 211. Table E tells us that P > 0.2, while software gives P = 0.4063. We cannot conclude that the standard deviations are different. The distributions are not Normal (because all responses are integers from 1 to 5), so the test may not be reliable.
7.103. The test statistic is F = 7.8²/3.4² = 5.2630 with df 114 and 219. Table E tells us that P < 0.002, while software gives P < 0.0001; we have strong evidence that the standard deviations differ. The authors described the distributions as somewhat skewed, so the Normality assumption may be violated.
7.104. The test statistic is F = 2.8²/0.7² = 16 with df 114 and 219. Table E tells us that P < 0.002, while software gives P < 0.0001; we have strong evidence that the standard deviations differ. We have no information about the Normality of the distributions, so it is difficult to determine how reliable these conclusions are. (We can observe that for Exercise 7.73, x̄1 − 3s1 and x̄2 − 3s2 were both negative, hinting at the skewness of those distributions. For Exercise 7.74, this is not the case, suggesting that these distributions might not be as skewed.)
7.105. The test statistic is F = 17.5001²/14.2583² = 1.5064 with df 29 and 29. Table E tells us that P > 0.2, while software gives P = 0.2757; we cannot conclude that the standard deviations differ. The stemplots and boxplots of the north/south distributions in Exercise 7.81 do not appear to be Normal (both distributions were skewed), so the results may not be reliable.
7.106. The test statistic is F = 16.0743²/15.3314² = 1.0993 with df 29 and 29. Table E tells us that P > 0.2, while software gives P = 0.8006; we cannot conclude that the standard deviations differ. The stemplots and boxplots of the east/west distributions in Exercise 7.82 do not appear to be Normal (both distributions were skewed), so the results may not be reliable.
7.107. (a) To test H0: σ1 = σ2 versus Ha: σ1 ≠ σ2, we find F = 7.1554²/6.7676² = 1.1179. We do not reject H0. (b) With an F(4, 4) distribution and a two-sided alternative, we need the critical value for p = 0.025: F* = 9.60. The table below gives the critical values for other sample sizes. With such small samples, this is a very low-power test; large differences between σ1 and σ2 would rarely be detected.
       n       F*
       5      9.60
       4     15.44
       3     39.00
       2    647.79
7.108. (a) For two samples of size 20, we have noncentrality parameter

    δ = 10 / (20·sqrt(2/20)) = 1.5811

The power is about 0.33 (using the Normal approximation) or 0.34 (software); see the table below. (b) With n1 = n2 = 60, we have δ = 2.7386, df = 118, and t* = 1.9803 (or 1.984 for df = 100 from Table D). The approximate power is about 0.78 (details in the table below). (c) Samples of size 60 would give a reasonably good chance of detecting a difference of 20 cm.
                              Power       Power
       n      δ       t*     (Normal)   (software)
      20    1.5811  2.0244    0.3288      0.3379
      60    2.7386  1.9803    0.7759      0.7753
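A Python sketch of these power computations (not part of the original solution; assumes scipy). It uses the values appearing in the δ formula above (a difference of 10 and σ = 20); the noncentral t value is the “software” column of the table.
    from math import sqrt
    from scipy import stats
    def two_sample_power(diff, sigma, n, alpha=0.05):
        delta = diff / (sigma * sqrt(2 / n))        # noncentrality parameter
        df = 2 * n - 2
        tstar = stats.t.ppf(1 - alpha/2, df)        # two-sided critical value
        approx = stats.norm.sf(tstar - delta)       # Normal approximation P(Z > t* - delta)
        exact = stats.nct.sf(tstar, df, delta)      # noncentral t ("software") power
        return delta, tstar, approx, exact
    print(two_sample_power(10, 20, 20))   # delta = 1.5811, power about 0.33 / 0.34
    print(two_sample_power(10, 20, 60))   # delta = 2.7386, power about 0.78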
7.109. The four standard deviations from Exercises 7.81 and 7.82 are sn = 17.5001, ss = 14.2583, se = 16.0743, and sw = 15.3314 cm. Using a larger σ for planning the study is advisable because it provides a conservative (safe) estimate of the power. For example, if we choose a sample size to provide 80% power and the true σ is smaller than that used for planning, the actual power of the test is greater than the desired 80%.
Results of additional power computations depend on what students consider to be “other reasonable values of σ.” Shown in the table are some possible answers using the Normal approximation. (Powers computed using the noncentral t distribution are slightly greater.)
              Power with n =
       σ       20        60
      15     0.5334    0.9527
      16     0.4809    0.9255
      17     0.4348    0.8928
      18     0.3945    0.8560
7.110. (a) The noncentrality parameter is δ = 1.5 / (1.6·sqrt(2/65)) = 5.3446. With such a large value of δ, the value of t* (1.9787 for df = 128, or 1.984 for df = 100 from Table D) does not matter very much. The Normal approximation for the power is P(Z > t* − δ) = 0.9996 for either choice of t*. Software gives the same result. (b) For samples of size 100, δ = 6.6291, and once again the value of t* makes little difference; the power is very close to 1 (using the Normal approximation or software). (c) Because the effect is large relative to the standard deviation, small samples are sufficient. (Even samples of size 20 will detect this difference with probability 0.8236.)
.
7.111. The mean is x = 140.5, the standard deviation is s = 13.58, and the standard error of
.
the mean is sx = 6.79. It would not be appropriate to construct a confidence interval because
we cannot consider these four scores to be an SRS.
7.112. To support the alternative µ1 > µ2 , we need to see x 1 > x 2 , so that t = (x 1 − x 2 )/SE D
must be positive. (a) If t = 2.08, the one-sided P-value is half of the reported two-sided
value (P = 0.04), so we reject H0 at α = 0.05. (b) t = −2.08 does not support Ha ; the
one-sided P-value is 0.96. We do not reject H0 at α = 0.05 (or any reasonable choice of α).
7.113. The plot (below, left) shows that t ∗ approaches 1.96 as df increases.
7.114. The plot (below, right) shows that t ∗ approaches 1.645 as df increases.
[Plots for 7.113 and 7.114: t* versus degrees of freedom (0 to 100); the left panel shows t* for 95% confidence approaching 1.96, the right panel shows t* approaching 1.645.]
7.115. The margin of error is t*/√n, using t* for df = n − 1 and 95% confidence. For example, when n = 5, the margin of error is 1.2417, and when n = 10, it is 0.7154, and for n = 100, it is 0.1984. As we see in the plot (below, left), as sample size increases, margin of error decreases (toward 0, although it gets there very slowly).
7.116. The margin of error is t*/√n, using t* for df = n − 1 and 99% confidence. For example, when n = 5, the margin of error is 2.0590, and when n = 10, it is 1.0277, and for n = 100, it is 0.2626. As we see in the plot (below, right), as sample size increases, margin of error decreases (toward 0, although it gets there very slowly).
[Plots for 7.115 and 7.116: margin of error versus sample size (0 to 100), for 95% and 99% confidence.]
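A Python sketch reproducing the margins of error quoted in 7.115 and 7.116 (not part of the original solution; assumes scipy):
    from math import sqrt
    from scipy import stats
    for conf in (0.95, 0.99):
        for n in (5, 10, 100):
            tstar = stats.t.ppf(0.5 + conf/2, n - 1)   # t* with df = n - 1
            print(conf, n, tstar / sqrt(n))            # matches the values quoted above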
7.117. (a) Use two independent samples (students that live in the dorms, and those that live
elsewhere). (b) Use a matched pairs design: Take a sample of college students, and have
each subject rate the appeal of each label design. (c) Take a single sample of college
students, and ask them to rate the appeal of the product.
7.118. (a) Take a single sample of customers, and record the age of each subject. (b) Use two
independent samples (this year’s sample, and last year’s sample). (c) Use a matched pairs
design: Ask each customer in the sample to rate each floor plan.
7.119. (a) To test H0: µ = 1.5 versus Ha: µ < 1.5, we have t = (1.20 − 1.5)/(1.81/√200) = −2.344 with df = 199, for which P = 0.0100. We can reject H0 at the 5% significance level. (b) From Table D, use df = 100 and t* = 1.984, so the 95% confidence interval for µ is

    1.20 ± 1.984·(1.81/√200) = 1.20 ± 0.2539 = 0.9461 to 1.4539 violations

(With software, the interval is 0.9476 to 1.4524.) (c) While the significance test lets us conclude that there were fewer than 1.5 violations (on the average), the confidence interval gives us a range of values for the mean number of violations. (d) We have a large sample (n = 200), and the limited range means that there are no extreme outliers, so t procedures should be safe.
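A Python sketch of the one-sample t computations in 7.119 from the summary statistics (not part of the original solution; assumes scipy):
    from math import sqrt
    from scipy import stats
    n, xbar, s, mu0 = 200, 1.20, 1.81, 1.5
    se = s / sqrt(n)
    t = (xbar - mu0) / se
    print(t, stats.t.cdf(t, n - 1))              # t = -2.344, one-sided P about 0.0100
    tstar = stats.t.ppf(0.975, n - 1)            # software critical value (df = 199)
    print(xbar - tstar*se, xbar + tstar*se)      # 0.9476 to 1.4524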
7.120. (a) The prior studies give us reason to expect greater improvement from those who used
.
the computer, so we test H0 : µC = µ D versus Ha : µC > µ D . (b) With SE D = 0.7527, the
. . .
test statistic is t = 2.79, so P = 0.0027 (df = 484.98) or 0.0025 < P < 0.005 (df = 241;
use df = 100 in Table D). Either way, we reject H0 at the α = 0.05 level. (c) While we
have strong evidence that the mean improvement is greater using computerized training, we
cannot draw this conclusion about an individual’s response.
7.121. (a) The mean difference in body weight change (with wine minus without wine) was x̄1 = 0.4 − 1.1 = −0.7 kg, with standard error SE1 = 8.6/√14 = 2.2984 kg. The mean difference in caloric intake was x̄2 = 2589 − 2575 = 14 cal, with SE2 = 210/√14 = 56.1249 cal. (b) The t statistics ti = x̄i/SEi, both with df = 13, are t1 = −0.3046 (P1 = 0.7655) and t2 = 0.2494 (P2 = 0.8069). (c) For df = 13, t* = 2.160, so the 95% confidence intervals x̄i ± t*·SEi are −5.6646 to 4.2646 kg (−5.6655 to 4.2655 with software) and −107.2297 to 135.2297 cal (−107.2504 to 135.2504 with software).
(d) Students might note a number of factors in their discussions; for example, all subjects
were males, weighing 68 to 91 kg (about 150 to 200 lb), which may limit how widely we
can extend these conclusions.
7.123. How much a person eats or drinks may depend on how many people he or she is sitting
with. This means that the individual customers within each wine-label group probably
cannot be considered to be independent of one another, which is a fundamental assumption
of the t procedures.
7.124. The mean is x̄ = 26.8437 cm, s = 18.3311 cm, and the margin of error is t*·s/√584:
df t∗ Interval
Table D 100 1.984 26.8437 ± 1.5050 = 25.3387 to 28.3486 cm
Software 583 1.9640 26.8437 ± 1.4898 = 25.3538 to 28.3335 cm
The confidence interval is much narrower with the whole data set, largely because the
standard error is about one-fourth what it was with a sample of size 40. The distribution of
the 584 measurements is right-skewed (although not as much as the smaller sample). If we
can view these trees as an SRS of similar stands—a fairly questionable assumption—the t
procedures should be fairly reliable because of the large sample size. See the solution to
Exercise 7.126 for an examination of the distribution.
7.125. The tables below contain summary statistics and 95% confidence intervals for the differences. For north/south differences, the test of H0: µn = µs gives t = −7.15 with df = 575.4 or 283; either way, P < 0.0001, so we reject H0. For east/west differences, t = −3.69 with df = 472.7 or 230; either way, P = 0.0003, so we reject H0. The larger data set results in smaller standard errors (both are near 1.5, compared to about 4 in Exercises 7.81 and 7.82), meaning that t is larger and the margin of error is smaller.
                x̄         s        n
    North    21.7990   18.9230    300
    South    32.1725   16.0763    284
    East     24.5785   17.7315    353
    West     30.3052   18.7264    231

              df       t*       Confidence interval
    N–S     575.4    1.9641     −13.2222 to −7.5248
            283      1.9684     −13.2285 to −7.5186
            100      1.984      −13.2511 to −7.4960
    E–W     472.7    1.9650     −8.7764 to −2.6770
            230      1.9703     −8.7847 to −2.6687
            100      1.984      −8.8059 to −2.6475
[Graphs: histograms of DBH (cm) and log(DBH) with the mean (x̄) and median (M) marked, and Normal quantile plots of DBH (cm) and log(DBH) versus Normal score.]
7.127. (a) This is a matched pairs design because at each of the 24 nests, the same mockingbird responded on each day. (b) The variance of the difference is approximately s1² + s4² − 2ρs1s4 = 48.684, so the standard deviation is 6.9774 m. (c) To test H0: µ1 = µ4 versus Ha: µ1 ≠ µ4, we have t = (15.1 − 6.1)/(6.9774/√24) = 6.319 with df = 23, for which P is very small. (d) Assuming the correlation is the same (ρ = 0.4), the variance of the difference is approximately s1² + s5² − 2ρs1s5 = 31.324, so the standard deviation is 5.5968 m. To test H0: µ1 = µ5 versus Ha: µ1 ≠ µ5, we have t = (4.9 − 6.1)/(5.5968/√24) = −1.050 with df = 23,
for which P = 0.3045. (e) The significant difference between day 1 and day 4 suggests
that the mockingbirds altered their behavior when approached by the same person for four
consecutive days; seemingly, the birds perceived an escalating threat. When approached by a
new person on day 5, the response was not significantly different from day 1; this suggests
that the birds saw the new person as less threatening than a return visit from the first person.
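A Python sketch of the calculation pattern in 7.127(b)–(c). The individual standard deviations are not repeated in this solution, so s1 and s4 below are placeholders; only ρ = 0.4, n = 24, and the means 15.1 and 6.1 come from the exercise.
    from math import sqrt
    s1, s4 = 5.0, 6.0          # hypothetical standard deviations (placeholders only)
    rho, n = 0.4, 24
    mean1, mean4 = 15.1, 6.1
    sd_diff = sqrt(s1**2 + s4**2 - 2*rho*s1*s4)   # approximate sd of the difference
    t = (mean1 - mean4) / (sd_diff / sqrt(n))     # matched pairs t with df = n - 1
    print(sd_diff, t)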
7.129. The mean and standard deviation of the 25 numbers are x̄ = 78.32% and s = 33.3563%, so the standard error is SE_x̄ = 6.6713%. For df = 24, Table D gives t* = 2.064, so the 95% confidence interval is x̄ ± 13.7695% = 64.5505% to 92.0895% (with software, t* = 2.0639 and the interval is x̄ ± 13.7688% = 64.5512% to 92.0888%). This seems to support the retailer’s claim: The original supplier’s price was higher between 65% and 93% of the time.
7.130. (a) We are interested in weight change; the pairs are the “before” and “after”
measurements. (b) The mean weight change was a loss. The exact amount lost is not
specified, but it was large enough so that it would rarely happen by chance for an ineffective
weight-loss program. (c) Comparing to a t (40) distribution in Table D, we find P < 0.0005
for a one-sided alternative (P < 0.0010 for a two-sided alternative). Software reveals that it
is even smaller than that: about 0.000013 (or 0.000026 for a two-sided alternative).
7.135. The similarity of the sample standard deviations suggests that the population standard deviations are likely to be similar. The pooled standard deviation is sp = 436.368 and t = −0.3533, so P = 0.3621 (df = 179)—still not significant.
7.136. (a) The sample sizes (98 and 83) are quite large, so the t test should be reasonably safe
(provided there are no extreme outliers). (b) Large samples do not make the F test more
reliable when the underlying distributions are skewed, so it should not be used.
7.137. No: What we have is nothing like an SRS of the population of school corporations.
G•Power output
A priori analysis for "t-Test (means)", two-tailed:
Alpha: 0.0500
Power (1-beta): 0.9000
Effect size "d": 0.1600
Total sample size: 1644
Actual power: 0.9001
Critical value: t(1642) = 1.9614
Delta: 3.2437
7.141. (a) The distributions can be compared using a back-to-back stemplot (shown below), or two histograms, or side-by-side boxplots. Three-bedroom homes are right-skewed; four-bedroom homes are generally more expensive. The top two prices from the three-bedroom distribution qualify as outliers using the 1.5 × IQR criterion. Boxplots are probably a poor choice for displaying the distributions because they leave out so much detail, but five-number summaries do illustrate that four-bedroom prices are higher at every level.
           3BR        4BR
         99987   0
    4432211100   1   4
          8655   1   678
             1   2   024
           976   2   68
                 3   223
                 3   9
                 4   2
Summary
statistics (in units of $1000) are given in the table below. (b) For testing H0: µ3 = µ4 versus Ha: µ3 ≠ µ4, we have t = −4.475 with either df = 20.98 (P = 0.0002) or df = 13
(P < 0.001). We reject H0 and conclude that the mean prices are different (specifically,
that 4BR houses are more expensive). (c) The one-sided alternative µ3 < µ4 could have
been justified because it would be reasonable to expect that four-bedroom homes would
be more expensive. (d) The 95% confidence interval for the difference µ4 − µ3 is about
$63,823 to $174,642 (df = 20.97) or $61,685 to $176,779 (df = 13). (e) While the data
were not gathered from an SRS, it seems that they should be a fair representation of three-
and four-bedroom houses in West Lafayette. (Even so, the small sample sizes, together with
the skewness and the outliers in the three-bedroom data, should make us cautious about the
t procedures. Additionally, we might question independence in these data: When setting
the asking price for a home, sellers are almost certainly influenced by the asking prices for
similar homes on the market in the area.)
n x s Min Q1 M Q3 Max
3BR 23 147.561 61.741 79.5 100.0 129.9 164.9 295.0
4BR 14 266.793 87.275 149.9 189.0 259.9 320.0 429.9
Chapter 8 Solutions
8.1. (a) n = 760 banks. (b) X = 283 banks expected to acquire another bank. (c) p̂ = 283/760 = 0.3724.
8.2. (a) n = 1063 adults played video games. (b) X = 223 of those adults play daily or almost daily. (c) p̂ = 223/1063 = 0.2098.
8.3. (a) With p̂ = 0.3724, SE_p̂ = sqrt(p̂(1 − p̂)/760) = 0.01754. (b) The 95% confidence interval is p̂ ± 1.96·SE_p̂ = 0.3724 ± 0.0344. (c) The interval is 33.8% to 40.7%.
8.4. (a) With p̂ = 0.2098, SE_p̂ = sqrt(p̂(1 − p̂)/1063) = 0.01249. (b) The 95% confidence interval is p̂ ± 1.96·SE_p̂ = 0.2098 ± 0.0245. (c) The interval is 18.5% to 23.4%.
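A Python sketch of the large-sample interval used in 8.3 and 8.4 (not part of the original solutions):
    from math import sqrt
    def prop_ci(phat, n, zstar=1.96):
        se = sqrt(phat * (1 - phat) / n)               # large-sample standard error
        return se, (phat - zstar*se, phat + zstar*se)
    print(prop_ci(0.3724, 760))    # SE = 0.01754, interval about 0.338 to 0.407
    print(prop_ci(0.2098, 1063))   # SE = 0.01249, interval about 0.185 to 0.234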
8.6. The 95% confidence interval is 44% to 86%. (Student opinions of what qualifies as
“appropriate” rounding might vary.) This is given directly in the Minitab output; the SAS
output gives a confidence interval for the complementary proportion.
The confidence interval is consistent with the result of the significance test, but is more informative in that it gives a range of values for the true proportion.
8.8. With n = 40 and p̂ = 0.65, the standard error for the significance test is σ_p̂ = sqrt(p0(1 − p0)/40) = 0.0791, and the test statistic is z = (p̂ − p0)/σ_p̂ = 0.15/0.0791 = 1.90. The two-sided P-value is 0.0574 (Table A) or 0.0578 (software)—not quite significant at the 5% level, but stronger evidence than the result with n = 20.
8.9. (a) To test H0: p = 0.5 versus Ha: p ≠ 0.5 with p̂ = 0.35, the test statistic is

    z = (p̂ − p0) / sqrt(p0(1 − p0)/n) = −0.15/0.1118 = −1.34

This is the opposite of the value of z given in Example 8.4, and the two-sided P-value is the same: 0.1802 (or 0.1797 with software). (b) The standard error for a confidence interval is SE_p̂ = sqrt(p̂(1 − p̂)/20) = 0.1067, so the 95% confidence interval is 0.35 ± 0.2090 = 0.1410 to 0.5590. This is the complement of the interval shown in the Minitab output in Figure 8.2.
8.10. We can achieve that margin of error with 90% confidence with a smaller sample. With p* = 0.5 (as in Example 8.5), we compute n = (1.645/((2)(0.03)))² = 751.67, so we need a sample of 752 students.
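A Python sketch of the sample-size computation used in 8.10 (and in 8.40–8.44), not part of the original solution: with a guessed proportion p*, n = (z*/m)²·p*(1 − p*), which reduces to (z*/(2m))² when p* = 0.5.
    from math import ceil
    def sample_size(m, zstar, pstar=0.5):
        return ceil((zstar / m)**2 * pstar * (1 - pstar))
    print(sample_size(0.03, 1.645))   # 752, as in 8.10
    print(sample_size(0.03, 1.96))    # 1068, the corresponding 95% computation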
size of the maximum error depends on the sample size.)
[Figure: margin of error (0 to 0.05) plotted against sample proportion (0 to 1).]
Note: The first printing of the text asked students to plot sample proportion (p̂) versus the margin of error (m), rather than m versus p̂. Because p̂ is the explanatory variable, the latter is more natural.
8.12. (a) H0 should refer to p (the population proportion), not p̂ (the sample proportion).
(b) Use Normal distributions (and a z test statistic) for significance tests involving
proportions. (c) The margin of error equals z ∗ times standard error; for 95% confidence, we
would have z ∗ = 1.96.
8.13. (a) Margin of error only accounts for random sampling error. (b) P-values measure the
strength of the evidence against H0 , not the probability of it being true. (c) The confidence
level cannot exceed 100%. (In practical terms, the confidence level must be less than 100%.)
8.15. The sample proportion is p̂ = 3274/5000 = 0.6548, the standard error is SE_p̂ = 0.00672, and the 95% confidence interval is 0.6548 ± 0.0132 = 0.6416 to 0.6680.
.
8.16. (a) With p̂ = 0.16, we have SE p̂ = 0.00518, so the 95% confidence interval is
.
p̂ ± 1.96 SE p̂ = 0.16 ± 0.01016 = 0.1498 to 0.1702. (b) With 95% confidence, the percent
of this group who would prefer a health care professional as a marriage partner is about
16 ± 1%, or 15% and 17%.
8.17. (a) SE p̂ depends on n, which is some number less than 7061. Without that number,
we do not know the margin of error z ∗ SE p̂ . (b) The number who expect to begin playing
.
an instrument is 67% of half of 7061, or (0.67)(0.5)(7061) = 2365 players. (c) Taking
.
n = (0.5)(7061) = 3530.5, the 99% confidence interval is p̂ ± 2.576 SE p̂ = 0.67 ± 0.02038 =
0.6496 to 0.6904. (d) We do not know the sampling methods used, which might make these
methods unreliable.
Note: Even though n must be an integer in reality, it is not necessary to round n in
part (c); the confidence interval formula works fine when n is not a whole number.
.
8.20. (a) For 99% confidence, the margin of error is (2.576)(0.00122) = 0.00314. (b) All
of these facts suggest possible sources of error; for example, students at two-year colleges
are not represented, nor are students at institutions that do not wish to pay the fee. (Even
though the fee is scaled for institution size, larger institutions can more easily absorb it.)
None of these potential errors are covered by the margin of error found in part (a), which
only accounts for random sampling error.
8.21. (a) About (0.42)(159,949) = 67,179 students plan to study abroad. (b) SE_p̂ = 0.00123, the margin of error is 2.576·SE_p̂ = 0.00318, and the 99% confidence interval is 0.4168 to 0.4232.
8.22. With p̂ = 1087/1430 = 0.7601, we have SE_p̂ = 0.0113, and the 95% confidence interval is p̂ ± 1.96·SE_p̂ = 0.7601 ± 0.0221 = 0.7380 to 0.7823.
8.23. With p̂ = 0.43, we have SE_p̂ = 0.0131, and the 95% confidence interval is p̂ ± 1.96·SE_p̂ = 0.43 ± 0.0257 = 0.4043 to 0.4557.
8.24. (a) A 99% confidence interval would be wider: We need a larger margin of error (by
a factor of 2.576/1.96) in order to be more confident that we have included p. The 99%
confidence interval is 0.3963 to 0.4637. (b) A 90% confidence interval would be narrower
(by a factor of 1.645/1.96). The 90% confidence interval is 0.4085 to 0.4515.
8.25. (a) SE_p̂ = sqrt((0.87)(0.13)/430,000) = 0.0005129. For 99% confidence, the margin of error is 2.576·SE_p̂ = 0.001321. (b) One source of error is indicated by the wide variation in
response rates: We cannot assume that the statements of respondents represent the opinions
of nonrespondents. The effect of the participation fee is harder to predict, but one possible
impact is on the types of institutions that participate in the survey: Even though the fee is
scaled for institution size, larger institutions can more easily absorb it. These other sources
of error are much more significant than sampling error, which is the only error accounted
for in the margin of error from part (a).
8.26. (a) The standard error is SE_p̂ = sqrt((0.69)(0.31)/1048) = 0.01429, so the margin of error for 95% confidence is 1.96·SE_p̂ = 0.02800 and the interval is 0.6620 to 0.7180. (b) To test H0: p = 0.79 versus Ha: p < 0.79, the standard error is σ_p̂ = sqrt((0.79)(0.21)/1048) = 0.01258 and the test statistic is z = (0.69 − 0.79)/0.01258 = −7.95. This is very strong evidence against H0 (P < 0.00005).
8.27. (a) The standard error is SE_p̂ = sqrt((0.38)(0.62)/1048) = 0.01499, so the margin of error for 95% confidence is 1.96·SE_p̂ = 0.02939 and the interval is 0.3506 to 0.4094. (b) Yes; some respondents might not admit to such behavior. The true frequency of such actions might be higher than this survey suggests.
8.28. (a) p̂ = 9054/24,142 = 0.3750. (b) The standard error is SE_p̂ = sqrt(p̂(1 − p̂)/24,142) = 0.003116, so the margin of error for 95% confidence is 1.96·SE_p̂ = 0.00611 and the interval is 0.3689 to 0.3811. (c) The nonresponse rate was (37,328 − 24,142)/37,328 = 0.3532—about 35%. We have no way of knowing if cheating is more or less prevalent among nonrespondents; this weakens the conclusions we can draw from this survey.
8.29. (a) p̂ = 390/1191 = 0.3275. The standard error is SE_p̂ = sqrt(p̂(1 − p̂)/1191) = 0.01360, so the margin of error for 95% confidence is 1.96·SE_p̂ = 0.02665 and the interval is 0.3008 to 0.3541. (b) Speakers and listeners probably perceive sermon length differently (just as, say, students and lecturers have different perceptions of the length of a class period).
8.30. A 90% confidence interval would be narrower: The margin of error will be smaller (by a
factor of 1.645/2.576) if we are willing to be less confident that we have included p. The
90% confidence interval is 0.6570 to 0.6830—narrower than the 99% confidence interval
(0.6496 to 0.6904) from Exercise 8.17.
8.31. Recall the rule of thumb from Chapter 5: Use the Normal approximation if np ≥ 10 and
n(1 − p) ≥ 10. We use p0 (the value specified in H0 ) to make our decision.
(a) No: np0 = 6. (b) Yes: np0 = 18 and n(1 − p0 ) = 12. (c) Yes: np0 = n(1 − p0 ) = 50.
(d) No: np0 = 2.
8.33. With p̂ = 0.69, SE_p̂ = 0.02830 and the 95% confidence interval is 0.6345 to 0.7455.
8.34. With p̂ = 0.583, SE_p̂ = 0.03023 and the 95% confidence interval is 0.5237 to 0.6423.
8.35. We estimate p̂ = 594/2533 = 0.2345, SE_p̂ = 0.00842, and the 95% confidence interval is 0.2180 to 0.2510.
8.36. (a) We estimate p̂ = 1434/2533 = 0.5661, SE_p̂ = 0.00985, and the 95% confidence interval is 0.5468 to 0.5854. (b) Pride or embarrassment might lead respondents to claim that their
income was above $25,000 even if it were not. Consequently, it would not be surprising
if the true proportion p were lower than the estimate p̂. (There may also be some who
would understate their income, out of humility or mistrust of the interviewer. While this
would seem to have less of an impact, it makes it difficult to anticipate the overall effect of
untruthful responses.) (c) Respondents would have little reason to lie about pet ownership;
the few that might lie about it would have little impact on our conclusions. The number of
untruthful responses about income is likely to be much larger and have a greater impact.
8.37. We estimate p̂ = 110/125 = 0.88, SE_p̂ = 0.02907, and the 95% confidence interval is 0.8230 to 0.9370.
8.38. (a) p̂ = 542/1711 = 0.3168; about 31.7% of bicyclists aged 15 or older killed between 1987 and 1991 had alcohol in their systems at the time of the accident. (b) SE_p̂ = sqrt(p̂(1 − p̂)/1711) = 0.01125; the 99% confidence interval is p̂ ± 2.576·SE_p̂ = 0.2878 to 0.3457. (c) No: We do not know, for example, what percent of cyclists who were not involved in fatal accidents had alcohol in their systems. (d) p̂ = 386/1711 = 0.2256, SE_p̂ = 0.01010, and the 99% confidence interval is 0.1996 to 0.2516.
8.40. With no prior knowledge of the value of p (the proportion of “Yes” responses), take p* = 0.5: n = (1.96/(2(0.15)))² = 42.7—use n = 43.
8.41. As a quick estimate, we can observe that to cut the margin of error in half, we must quadruple the sample size, from 43 to 172. Using the sample-size formula, we find n = (1.96/(2(0.075)))² = 170.7—use n = 171. (The difference in the two answers is due to rounding.)
8.42. Using p* = 0.25 (based on previous surveys), we compute n = (1.96/0.1)²·(0.25)(0.75) = 72.03, so we need a sample of 73 students.
8.43. The required sample sizes are found by computing (1.96/0.1)²·p*(1 − p*) = 384.16·p*(1 − p*). To be sure that we meet our target margin of error, we should take the largest sample indicated: n = 97 or larger.
8.44. n = (1.96/0.02)²·(0.15)(0.85) = 1224.51—use n = 1225.
8.45. With p1 = 0.4, n1 = 25, p2 = 0.5, and n2 = 30, the mean and standard deviation of the sampling distribution of D = p̂1 − p̂2 are

    µ_D = p1 − p2 = −0.1   and   σ_D = sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2) = 0.1339
8.46. (a) With p1 = 0.4, n1 = 100, p2 = 0.5, and n2 = 120, the mean and standard deviation of the sampling distribution of D = p̂1 − p̂2 are

    µ_D = p1 − p2 = −0.1   and   σ_D = sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2) = 0.0670

(b) µ_D is unchanged, while σ_D is halved.
8.47. (a) The means are µ_p̂1 = p1 and µ_p̂2 = p2. The standard deviations are

    σ_p̂1 = sqrt(p1(1 − p1)/n1)   and   σ_p̂2 = sqrt(p2(1 − p2)/n2)

(b) µ_D = µ_p̂1 − µ_p̂2 = p1 − p2. (c) σ_D² = σ_p̂1² + σ_p̂2² = p1(1 − p1)/n1 + p2(1 − p2)/n2.
8.48. With p̂w = 44/100 = 0.44 and p̂m = 79/140 = 0.5643, we estimate the difference to be p̂m − p̂w = 0.1243. The standard error of the difference is

    SE_D = sqrt(p̂w(1 − p̂w)/100 + p̂m(1 − p̂m)/140) = 0.06496

so the 95% confidence interval for pm − pw is 0.1243 ± (1.96)(0.06496) = −0.0030 to 0.2516.
Note: We followed the text’s practice of subtracting the smaller proportion from the larger one, as described at the top of page 494.
8.49. Let us call the proportions favoring Commercial B qw and qm. Our estimates of these proportions are the complements of those found in Exercise 8.48; for example, q̂w = 56/100 = 0.56 = 1 − p̂w. Consequently, the standard error of the difference q̂w − q̂m is the same as that for p̂m − p̂w: SE_D = sqrt(q̂w(1 − q̂w)/100 + q̂m(1 − q̂m)/140) = 0.06496. The margin of error is therefore also the same, and the 95% confidence interval for qw − qm is (q̂w − q̂m) ± (1.96)(0.06496) = −0.0030 to 0.2516.
Note: As in the previous exercise, we followed the text’s practice of subtracting the smaller proportion from the larger one.
8.50. The pooled estimate of the proportion is p̂ = (44 + 79)/(100 + 140) = 0.5125. For testing H0: pm = pw versus Ha: pm ≠ pw, we have SE_Dp = sqrt(p̂(1 − p̂)(1/100 + 1/140)) = 0.06544 and the test statistic is z = (p̂m − p̂w)/SE_Dp = 1.90, for which the two-sided P-value is 0.0576. This is not quite enough evidence to reject H0 at the 5% level.
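A Python sketch of this pooled two-proportion z test (not part of the original solution; assumes scipy), using the counts from Exercise 8.48 (79 of 140 men and 44 of 100 women):
    from math import sqrt
    from scipy import stats
    x1, n1, x2, n2 = 79, 140, 44, 100
    p1, p2 = x1/n1, x2/n2
    pooled = (x1 + x2) / (n1 + n2)                            # 0.5125
    se = sqrt(pooled * (1 - pooled) * (1/n1 + 1/n2))          # 0.06544
    z = (p1 - p2) / se                                        # about 1.90
    print(z, 2 * stats.norm.sf(abs(z)))                       # two-sided P about 0.058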
8.51. Because the sample proportions would tend to support the alternative hypothesis
( pm > pw ), the P-value is half as large (P = 0.0288), which would be enough to reject H0
at the 5% level.
8.52. (a) The filled-in table is below.
                  Population   Sample   Count of    Sample
    Population    proportion    size    successes  proportion
        1             p1        2822       198       0.0702
        2             p2        1553       295       0.1900
(b) The estimated difference is p̂2 − p̂1 = 0.1198. (c) Large-sample methods should be appropriate, because we have large, independent samples from two populations. (d) With SE_D = 0.01105, the 95% confidence interval is 0.1198 ± 0.02167 = 0.0981 to 0.1415. (e) The estimated difference is about 12.0%, and the
interval is about 9.8% to 14.1%. (f) It is hard to imagine why the months for each survey
would affect the interpretation. (Of course, just because we cannot guess what the impact
would be does not mean there is no impact.)
8.53. For H0: p1 = p2, the pooled estimate of the proportion is p̂ = (198 + 295)/(2822 + 1553) = 0.1127. The standard error is SE_Dp = 0.00999, and the test statistic is z = 0.1198/0.00999 = 11.99. The alternative hypothesis was not specified in this exercise; for either p1 ≠ p2 or p1 < p2, the P-value associated with z = 11.99 would be tiny. (For the alternative p1 > p2, P would be
nearly 1, and we would not reject H0 ; however, it is hard to imagine why we would suspect
that podcast use had decreased from 2006 to 2008.)
8.54. (a) No; this is a ratio of proportions, not people. In addition, these are sample
proportions, so they are only estimates of the population proportions. If the size of the
population (Internet users) remained roughly constant from 2006 to 2008, we can say that
about 2.7 times as many people are downloading podcasts. (b) We are quite confident
that the 2008 proportion exceeds the 2006 proportion by at least 0.098, so making the
(extremely reasonable) assumption that the number of Internet users did not decrease, we are
nearly certain that there are more people downloading podcasts.
8.55. (a) The filled-in table is below. The values of X1 and X2 are estimated as (0.54)(1063) and (0.89)(1064).
                  Population   Sample   Count of    Sample
    Population    proportion    size    successes  proportion
        1             p1        1063       574       0.54
        2             p2        1064       947       0.89
(b) The estimated difference is p̂2 − p̂1 = 0.35. (c) Large-sample methods should be appropriate, because we have large, independent samples from two populations. (d) With SE_D = 0.01805, the 95% confidence interval is 0.35 ± 0.03537 = 0.3146 to 0.3854. (e) The estimated difference is about 35%, and the
interval is about 31.5% to 38.5%. (f) A possible concern is that adults were surveyed before
Christmas, while teens were surveyed before and after Christmas. It might be that some of
those teens may have received game consoles as gifts, but eventually grew tired of them.
8.56. The pooled estimate of the proportion is p̂ = [(0.54)(1063) + (0.89)(1064)]/(1063 + 1064) = 0.7151. For testing H0: p1 = p2 versus Ha: p1 ≠ p2, we have SE_Dp = sqrt(p̂(1 − p̂)(1/1063 + 1/1064)) = 0.01957 and the test statistic is z = (p̂2 − p̂1)/SE_Dp = 17.88. The P-value is essentially 0; we have no doubt that the two proportions are different (specifically, that the teen proportion is higher).
8.57. (a) The filled-in table is below. The values of X1 and X2 are estimated as (0.73)(1063) and (0.76)(1064).
                  Population   Sample   Count of    Sample
    Population    proportion    size    successes  proportion
        1             p1        1063       776       0.73
        2             p2        1064       809       0.76
(b) The estimated difference is p̂2 − p̂1 = 0.03. (c) Large-sample methods should be appropriate, because we have large, independent samples from two populations. (d) With SE_D = 0.01889, the 95% confidence interval is 0.03 ± 0.03702 = −0.0070 to 0.0670. (e) The estimated difference is about 3%, and the interval is about −0.7% to 6.7%. (f) As in the solution to Exercise 8.55, a possible concern is that adults were surveyed before Christmas.
8.58. The pooled estimate of the proportion is p̂ = [(0.73)(1063) + (0.76)(1064)]/(1063 + 1064) = 0.7450. For testing H0: p1 = p2 versus Ha: p1 ≠ p2, we have SE_Dp = sqrt(p̂(1 − p̂)(1/1063 + 1/1064)) = 0.01890 and the test statistic is z = (p̂2 − p̂1)/SE_Dp = 1.59. The P-value is 0.1118 (Table A) or 0.1125 (software); either way, there is not enough evidence to conclude that the proportions are different.
8.59. No; this procedure requires independent samples from different populations. We have one
sample (of teens).
8.61. (a) H0 should refer to p1 and p2 (population proportions) rather than p̂1 and p̂2 (sample
proportions). (b) Knowing p̂1 = p̂2 does not tell us that the success counts are equal
(X 1 = X 2 ) unless the sample sizes are equal (n 1 = n 2 ). (c) Confidence intervals only
account for random sampling error.
8.62. (a) The mean of D = p̂1 − p̂2 is µ_D = p1 − p2 = 0.4 − 0.5 = −0.1 (as before). The standard deviation is

    σ_D = sqrt(p1(1 − p1)/50 + p2(1 − p2)/1000) = 0.07106

(b) The mean of p̂1 − 0.5 is also −0.1; the standard deviation is the same as that of p̂1: sqrt(p1(1 − p1)/50) = 0.06928. (c) The standard deviation of p̂2 is only 0.01581, so it will
typically (95% of the time) differ from its mean (0.5) by no more than about 0.032. (d) If
one sample is very large, that estimated proportion will be more accurate, and most of the
variation in the difference will come from the variation in the other proportion.
8.63. Pet owners had the lower proportion of women, so we call them “population 2”: p̂2 = 285/595 = 0.4790. For non-pet owners, p̂1 = 1024/1939 = 0.5281. SE_D = 0.02341, so the 95% confidence interval is 0.0032 to 0.0950.
8.65. With equal sample sizes, the pooled estimate of the proportion is p̂ = 0.255, the average of p̂1 = 0.29 and p̂2 = 0.22. This can also be computed by taking X1 = (0.29)(1421) = 412.09 and X2 = (0.22)(1421) = 312.62, so p̂ = (X1 + X2)/(1421 + 1421). The standard error for a significance test is SE_Dp = 0.01635, and the test statistic is z = 4.28 (P < 0.0001); we conclude that the proportions are different. The standard error for a confidence interval is SE_D = 0.01630, and the 95% confidence interval is 0.0381 to 0.1019. The interval gives us an idea of how large the difference is: Music downloads dropped 4% to 10%.
8.66. The table below shows the results from the previous exercise, and those with different
sample sizes. For part (iii), two answers are given, corresponding to the two ways one could
interpret which is the “first sample size.”
n1 n2 p̂ SE Dp z SE D Confidence interval
8.65 1421 1421 0.255 0.01635 4.28 0.01630 0.0381 to 0.1019
(i) 1000 1000 0.255 0.01949 3.59 0.01943 0.0319 to 0.1081
(ii) 1600 1600 0.255 0.01541 4.54 0.01536 0.0399 to 0.1001
(iii) 1000 1600 0.2469 0.01738 4.03 0.01770 0.0353 to 0.1047
1600 1000 0.2631 0.01775 3.94 0.01733 0.0360 to 0.1040
As one would expect, we see in (i) and (ii) that smaller samples result in smaller z (weaker
evidence) and wider intervals, while larger samples have the reverse effect. The results of
(iii) show that the effect of varying unequal sample sizes is more complicated.
8.67. (a) We find p̂1 = 73/91 = 0.8022 and p̂2 = 75/109 = 0.6881. For a confidence interval, SE_D = 0.06093, so the 95% confidence interval for p1 − p2 is (0.8022 − 0.6881) ± (1.96)(0.06093) = −0.0053 to 0.2335. (b) The question posed was, “Do high-tech companies tend to offer stock options more often than other companies?” Therefore, we test H0: p1 = p2 versus Ha: p1 > p2. With p̂1 = 0.8022, p̂2 = 0.6881, and p̂ = (73 + 75)/(91 + 109) = 0.74, we find SE_Dp = 0.06229, so z = (p̂1 − p̂2)/SE_Dp = 1.83. This gives P = 0.0336. (c) We
have fairly strong evidence that high-tech companies are more likely to offer stock options.
However, the confidence interval tells us that the difference in proportions could be very
small, or as large as 23%.
8.68. With p̂2002 = 0.4780 and p̂2004 = 0.3750, the standard error for a confidence interval is SE_D = 0.00550. The 90% confidence interval for the difference p2002 − p2004 is (0.4780 − 0.3750) ± 1.645·SE_D = 0.0939 to 0.1120.
8.69. (a) p̂f = 48/60 = 0.8, so SE_p̂ = 0.05164 for females. p̂m = 52/132 = 0.39, so SE_p̂ = 0.04253 for males. (b) SE_D = sqrt(0.05164² + 0.04253²) = 0.06690, so the interval is (p̂f − p̂m) ± 1.645·SE_D, or 0.2960 to 0.5161. There is (with high confidence) a considerably higher percent of juvenile references to females than to males.
8.70. (a) p̂1 = 515/1520 = 0.3388 for men, and p̂2 = 27/191 = 0.1414 for women. SE_D = 0.02798, so the 95% confidence interval for p1 − p2 is 0.1426 to 0.2523. (b) The female contribution is larger because the sample size for women is much smaller. (Specifically, p̂1(1 − p̂1)/n1 = 0.0001474, while p̂2(1 − p̂2)/n2 = 0.0006355.) Note that if the sample sizes had been similar, the male contribution would have been larger (assuming the proportions remained the same) because the numerator term pi(1 − pi) is greater for men than women.
8.71. We test H0: p1 = p2 versus Ha: p1 ≠ p2. With p̂1 = 0.5281, p̂2 = 0.4790, and p̂ = (1024 + 285)/(1939 + 595) = 0.5166, we find SE_Dp = 0.02342, so z = (p̂1 − p̂2)/SE_Dp = 2.10. This gives P = 0.0360—significant evidence (at the 5% level) that a higher proportion of non-pet owners are women.
8.73. (a) While there is only a 5% chance of any interval being wrong, we have six (roughly independent) chances to make that mistake. (b) For 99.2% confidence, use z* = 2.65. (Using software, z* = 2.6521, or 2.6383 using the exact value of 0.05/6 = 0.0083.) (c) The margin of error for each interval is z*·SE_p̂, so each interval is about 1.35 times wider than in the previous exercise. (If intervals are rounded to three decimal places, as below, the results are the same regardless of the value of z* used.)
    Genre        Interval
    Racing       0.705 to 0.775
    Puzzle       0.684 to 0.756
    Sports       0.643 to 0.717
    Action       0.632 to 0.708
    Adventure    0.622 to 0.698
    Rhythm       0.571 to 0.649
8.74. (a) The proportion is p̂ = 0.042, so X = (0.042)(15,000) = 630 of the households in the sample are wireless only. (b) SE_p̂ = 0.00164, so the 95% confidence interval is 0.042 ± 0.00321 = 0.0388 to 0.0452.
8.75. (a) The proportion is p̂ = 0.164, so X = (0.164)(15,000) = 2460 of the households in the sample are wireless only. (b) SE_p̂ = 0.00302, so the 95% confidence interval is 0.164 ± 0.00593 = 0.1581 to 0.1699. (c) The estimate is 16.4%, and the interval is 15.8%
to 17.0%. (d) The difference in the sample proportions is D = 0.164 − 0.042 = 0.122.
. .
(e) SE D = 0.00344, so the margin of error is 1.96 SE D = 0.00674. (The confidence interval
is therefore 0.1153 to 0.1287.)
8.76. (a) The “relative risk” is 0.164/0.042 = 3.90. A better term might be “relative rate” (of being wireless only). (b) A possible summary: From 2003 to 2007, the proportion of
wireless-only households increased by nearly four times. (With software, and/or the
methods of Chapter 14, one could also determine a confidence interval for this ratio,
as in Example 8.11.) (c) Both illustrate (in different ways) a change in the proportion.
(d) Preferences will vary. Students might note that ratios are often used in news reports; for
example, “the risk of complications is twice as high for people taking drug A compared to
those taking drug B.”
8.77. With p̂1 = 0.43, p̂2 = 0.32, and n1 = n2 = 1430, we have SE_D = 0.01799, so the 95% confidence interval is (0.43 − 0.32) ± 0.03526 = 0.0747 to 0.1453.
8.78. The pooled estimate of the proportion is p̂ = 0.375 (the average of p̂1 and p̂2 , because
. .
the sample sizes were equal). Then SE Dp = 0.01811, so z = (0.43 − 0.32)/SE Dp = 6.08, for
which P < 0.0001.
8.79. (a) and (b) The revised confidence intervals and z statistics are in the table below.
(c) While the interval and z statistic change slightly, the conclusions are roughly the same.
Note: Even if the second sample size were as low as 100, the two proportions would be
significantly different, albeit less so (z = 2.15, P = 0.0313).
n2 SE D m.e. Interval p̂ SE Dp z P
1000 0.01972 0.03866 0.0713 to 0.1487 0.3847 0.02006 5.48 < 0.0001
2000 0.01674 0.03281 0.0772 to 0.1428 0.3659 0.01668 6.59 < 0.0001
8.80. With p̂ = 994/1430 = 0.6951, we have SE_p̂ = 0.01217, so the 95% confidence interval is p̂ ± 1.96·SE_p̂ = 0.6951 ± 0.0239 = 0.6712 to 0.7190.
8.81. Student answers will vary. Shown below is the margin of error arising for sample sizes ranging from 500 to 2300; a graphical summary is not shown, but a good choice would be a plot of margin of error versus sample size.
       n       m.e.
      500    0.04035
      800    0.03190
     1100    0.02721
     1430    0.02386
     1700    0.02188
     2000    0.02018
     2300    0.01881
8.82. With p̂ = 0.58, the standard error is SE_p̂ = sqrt((0.58)(0.42)/1048) = 0.01525, so the margin of error for 90% confidence is 1.645·SE_p̂ = 0.02508, and the interval is 0.5549 to 0.6051.
8.83. With p̂m = 0.59 and p̂w = 0.56, the standard error is SE_D = 0.03053, the margin of error for 95% confidence is 1.96·SE_D = 0.05983, and the confidence interval for pm − pw is −0.0298 to 0.0898.
8.84. (a) The table below summarizes the margins of error m.e. = 1.96·sqrt(p̂(1 − p̂)/n):
p̂ m.e. 95% confidence interval
Current Downloading less 38% 6.05% 31.95% to 44.05%
downloaders Use P2P networks 33.33% 5.88% 27.45% to 39.21%
(n = 247) Use e-mail/IM 24% 5.33% 18.67% to 29.33%
Use music-related sites 20% 4.99% 15.01% to 24.99%
Use paid services 17% 4.68% 12.32% to 21.68%
All users Have used paid services 7% 1.35% 5.65% to 8.35%
(n = 1371) Currently use paid services 3% 0.90% 2.10% to 3.90%
(b) Obviously, students’ renditions of the above information in a paragraph will vary.
(c) Student opinions may vary on this. Personally, I lean toward B, although I would be
inclined to report two margins of error: “no more than 6%” for the current downloaders and
“no more than 1.4%” for all users.
8.85. (a) People have different symptoms; for example, not all who wheeze consult a doctor. (b) In the table (below), we find for “sleep” that p̂1 = 45/282 = 0.1596 and p̂2 = 12/164 = 0.0732, so the difference is p̂1 − p̂2 = 0.0864. Therefore, SE_D = 0.02982 and the margin of error for 95% confidence is 0.05844. Other computations are performed in like manner. (c) It is reasonable to expect that the bypass proportions would be higher—that is, we expect more improvement where the pollution decreased—so we could use the alternative p1 > p2. (d) For “sleep,” we find p̂ = (45 + 12)/(282 + 164) = 0.1278 and SE_Dp = 0.03279. Therefore, z = (0.1596 − 0.0732)/SE_Dp = 2.64. Other computations are similar. Only the “sleep”
difference is significant. (e) 95% confidence intervals are shown below. (f) Part (b) showed
improvement relative to control group, which is a better measure of the effect of the bypass,
because it allows us to account for the improvement reported over time even when no
change was made.
Bypass minus congested Bypass
Complaint p̂1 − p̂2 95% CI z P p̂ 95% CI
Sleep 0.0864 0.0280 to 0.1448 2.64 0.0042 0.1596 0.1168 to 0.2023
Number 0.0307 −0.0361 to 0.0976 0.88 0.1897 0.1596 0.1168 to 0.2023
Speech 0.0182 −0.0152 to 0.0515 0.99 0.1600 0.0426 0.0190 to 0.0661
Activities 0.0137 −0.0395 to 0.0670 0.50 0.3100 0.0925 0.0586 to 0.1264
Doctor −0.0112 −0.0796 to 0.0573 −0.32 0.6267 0.1174 0.0773 to 0.1576
Phlegm −0.0220 −0.0711 to 0.0271 −0.92 0.8217 0.0474 0.0212 to 0.0736
Cough −0.0323 −0.0853 to 0.0207 −1.25 0.8950 0.0575 0.0292 to 0.0857
8.86. (a) The number of orders completed in five days or less before the changes was X1 = (0.16)(200) = 32. With p̂1 = 0.16, SE_p̂ = 0.02592, and the 95% confidence interval for p1 is 0.1092 to 0.2108. (b) After the changes, X2 = (0.9)(200) = 180. With p̂2 = 0.9, SE_p̂ = 0.02121, and the 95% confidence interval for p2 is 0.8584 to 0.9416. (c) SE_D = 0.03350 and the 95% confidence interval for p2 − p1 is 0.6743 to 0.8057, or about 67.4% to 80.6%.
8.87. With p̂ = 0.56, SE_p̂ = 0.01433, so the margin of error for 95% confidence is 1.96·SE_p̂ = 0.02809.
8.88. (a) X1 = 121 = (0.903)(134) die-hard fans and X2 = 161 = (0.679)(237) less loyal fans watched or listened as children. (b) p̂ = (121 + 161)/(134 + 237) = 0.7601 and SE_Dp = 0.04615, so we find z = 4.85 (P < 0.0001)—strong evidence of a difference in childhood experience. (c) For a 95% confidence interval, SE_D = 0.03966 and the interval is 0.1459 to 0.3014. If students work with the rounded proportions (0.903 and 0.679), the 95% confidence interval is 0.1463 to 0.3017.
8.92. (a) p̂ = 463/975 = 0.4749, SE_p̂ = 0.01599, and the 95% confidence interval is 0.4435 to 0.5062. (b) Expressed as percents, the confidence interval is 44.35% to 50.62%. (c) Multiply the upper and lower limits of the confidence interval by 37,500: about 16,632 to 18,983 students.
8.94. With sample sizes of nw = (0.52)(1200) = 624 women and nm = 576 men, we test H0: pm = pw versus Ha: pm ≠ pw. Assuming there were Xm = 0.62nm = 357.12 men and Xw = 0.51nw = 318.24 women who thought that parents put too little pressure on students, the pooled estimate is p̂ = 0.5628, SE_Dp = 0.02866, and the test statistic is z = 3.84. This is strong evidence (P < 0.0001) that a higher proportion of men have this opinion.
To construct a 95% confidence interval for pm − pw, we have SE_D = 0.02845, yielding the interval 0.0542 to 0.1658.
8.95. The difference becomes more significant (i.e., the P-value decreases) as the sample size increases. For small sample sizes, the difference between p̂1 = 0.5 and p̂2 = 0.4 is not significant, but with larger sample sizes, we expect that the sample proportions should be better estimates of their respective population proportions, so p̂1 − p̂2 = 0.1 suggests that p1 ≠ p2.
       n       z        P
       40     0.90    0.3681
       50     1.01    0.3125
       80     1.27    0.2041
      100     1.42    0.1556
      400     2.84    0.0045
      500     3.18    0.0015
     1000     4.49    0.0000
8.96. The table and graph below show the large-sample margins of error. The margin of error
decreases as sample size increases, but the rate of decrease is noticeably less for large n.
       n       m.e.
       40    0.2079
       50    0.1859
       80    0.1470
      100    0.1315
      400    0.0657
      500    0.0588
     1000    0.0416
[Graph: margin of error (95% confidence) versus sample size, 0 to 1000.]
8.97. (a) Using either trial and error, or the formula derived in part (b), we find that at least n = 342 is needed. (b) Generally, the margin of error is m = z*·sqrt(p̂1(1 − p̂1)/n + p̂2(1 − p̂2)/n); with p̂1 = p̂2 = 0.5, this is m = z*·sqrt(0.5/n). Solving for n, we find n = (z*/m)²/2.
8.98. We must assume that we can treat the births recorded during these two times as
independent SRSs. Note that the rules of thumb for the Normal approximation are not
satisfied here; specifically, three birth defects are less than ten. Additionally, one might call
into question the assumption of independence, because there may have been multiple births
to the same set of parents included in these counts (either twins/triplets/etc., or “ordinary”
siblings).
If we carry out the analysis in spite of these issues, we find p̂1 = 16/414 = 0.03865 and p̂2 = 3/228 = 0.01316. We might then find a 95% confidence interval: SE_D = 0.01211, so the
interval is p̂1 − p̂2 ± (1.96)(0.01211) = 0.00175 to 0.04923. Note that this does not take into
account the presumed direction of the difference. (This setting does meet our requirements
for the plus four method, for which p̃1 = 0.04086 and p̃2 = 0.01739, SE D = 0.01298, and
the 95% confidence interval is −0.0020 to 0.0489.)
8.99. (a) p0 = 143,611/181,535 = 0.7911. (b) p̂ = 339/870 = 0.3897, σ_p̂ = 0.0138, and z = (p̂ − p0)/σ_p̂ = −29.1, so P = 0 (regardless of whether Ha is p < p0 or p ≠ p0). This is very strong evidence against H0; we conclude that Mexican Americans are underrepresented on juries. (c) p̂1 = 339/870 = 0.3897, while p̂2 = (143,611 − 339)/(181,535 − 870) = 0.7930. Then p̂ = 0.7911 (the value of p0 from part (a)), SE_Dp = 0.01382, and z = −29.2—and again, we have a tiny P-value and reject H0.
9.1. (a) The conditional distributions are given in the table below. For example, given that Explanatory = 1, the distribution of the response variable is 70/200 = 35% “Yes” and 130/200 = 65% “No.” (b) The graphical display might take the form of a bar graph like the one described below, but other presentations are possible. (c) One notable feature is that when Explanatory = 1, “No” is more common, but “Yes” and “No” are closer to being evenly split when Explanatory = 2.

   Response    Explanatory = 1   Explanatory = 2
   Yes              35%               45%
   No               65%               55%

[Bar graph: percents of “Yes” and “No” responses for the two levels of the explanatory variable.]
9.2. (a) The expected count for the first cell is (160)(200)/400 = 80. (b) This X² statistic has df = (2 − 1)(2 − 1) = 1. (c) Because 3.84 < X² < 5.02, the P-value is between 0.025 and 0.05.
9.4. The nine terms are shown in the table below. For example, the first term is (69 − 51.90)²/51.90 = 5.6341. These terms add up to about 14.1558; the slight difference is due to the rounding of the expected values reported in Example 9.10.

   Fruit                    Physical activity
   consumption      Low     Moderate   Vigorous
   Low             5.6341    0.2230     0.3420
   Medium          0.6256    0.2898     0.0153
   High            6.1280    0.0091     0.8889
9.5. The table below summarizes the bounds for the P-values, and also gives the exact P-values (given by software). In each case, df = (r − 1)(c − 1).

          X²    Size of table   df   Crit. values (Table F)    Bounds for P        Actual P
   (a)   5.32      2 by 2        1     5.02 < X² < 5.41       0.02 < P < 0.025      0.0211
   (b)   2.7       2 by 2        1     2.07 < X² < 2.71       0.10 < P < 0.15       0.1003
   (c)  25.2       4 by 5       12    24.05 < X² < 26.22      0.01 < P < 0.02       0.0139
   (d)  25.2       5 by 4       12    24.05 < X² < 26.22      0.01 < P < 0.02       0.0139
9.6. The Minitab output shown below gives X² = 54.307, df = 1, and P < 0.0005, indicating significant evidence of an association.

Minitab output
             Men      Women     Total
   Yes       1392      1748      3140
            1215.19   1924.81
   No        3956      6723     10679
            4132.81   6546.19
   Total     5348      8471     13819
   ChiSq = 25.726 + 16.241 +
            7.564 +  4.776 = 54.307
   df = 1, p = 0.000
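The same analysis can be reproduced with a short Python sketch (scipy, not the text's software); correction=False matches the uncorrected X² that Minitab reports:

    import numpy as np
    from scipy.stats import chi2_contingency

    counts = np.array([[1392, 1748],    # Yes: men, women
                       [3956, 6723]])   # No:  men, women
    chi2, p, df, expected = chi2_contingency(counts, correction=False)
    print(round(chi2, 3), df, round(p, 5))   # 54.307  1  (P is essentially 0)
    print(np.round(expected, 2))             # matches the expected counts above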
9.8. The table below lists the observed counts, the population proportions, the expected counts,
and the chi-square contributions (for the next exercise). Each expected count is the product
of the proportion and the sample size 1567; for example, (0.172)(1567) = 269.524 for
California.
State AZ CA HI IN NV OH
Observed count 167 257 257 297 107 482
Proportion 0.105 0.172 0.164 0.188 0.070 0.301
Expected count 164.535 269.524 256.988 294.596 109.690 471.667
Chi-square contribution 0.0369 0.5820 0.0000 0.0196 0.0660 0.2264
9.9. The expected counts are in the table above, rounded to four decimal places as in Example 9.15; for example, for California, we have (257 − 269.524)²/269.524 = 0.5820. The six values add up to 0.93 (rounded to two decimal places).
9.10. The chi-square goodness of fit statistic is X² = 15.2 with df = 5, for which 0.005 < P < 0.01 (software gives 0.0096). The details of the computation are given in the table below; note that there were 475 M&M’s in the bag.

               Expected    Expected   Observed
               frequency    count      count      O − E    (O − E)²/E
   Brown         0.13       61.75        61       −0.75      0.0091
   Yellow        0.14       66.5         59       −7.5       0.8459
   Red           0.13       61.75        49      −12.75      2.6326
   Orange        0.20       95           77      −18         3.4105
   Blue          0.24      114          141       27         6.3947
   Green         0.16       76           88       12         1.8947
                                        475                 15.1876
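A minimal Python sketch of this goodness-of-fit computation (scipy, not the text's software):

    import numpy as np
    from scipy.stats import chisquare

    observed = np.array([61, 59, 49, 77, 141, 88])             # brown ... green
    claimed  = np.array([0.13, 0.14, 0.13, 0.20, 0.24, 0.16])  # advertised color mix
    expected = claimed * observed.sum()                        # n = 475

    stat, p = chisquare(observed, f_exp=expected)
    print(round(stat, 4), round(p, 4))    # about 15.1876 and 0.0096 (df = 5)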
9.11. (a) The two-way table is below; for example, for April 2001, (0.05)(2250) = 112.5 and (0.95)(2250) = 2137.5.

                            Date of Survey
   Broadband?   April 2001   April 2004   March 2007   April 2008
   Yes             112.5        540          1080         1237.5
   No             2137.5       1710          1170         1012.5

(b) Under the null hypothesis that the proportions have not changed, the expected counts are (0.33)(2250) = 742.5 (across the top row) and (0.67)(2250) = 1507.5 (across the bottom row), because the average of the four broadband percents is (5% + 24% + 48% + 55%)/4 = 33%. (We take the unweighted average because we have assumed that the sample sizes were equal.) The test statistic is X² = 1601.8 with df = 3, for which P < 0.0001. Not surprisingly, we reject H0. (c) The average of the last two broadband percents is (48% + 55%)/2 = 51.5%, so if the proportions are equal, the expected counts are (0.515)(2250) = 1158.75 (top row) and (0.485)(2250) = 1091.25 (bottom row). The test statistic is X² = 22.07 with df = 1, for which P < 0.0001.
Note: This test is equivalent to testing H0: p1 = p2 versus Ha: p1 ≠ p2 using the methods of Chapter 8. We find pooled estimate p̂ = 0.515, SE_Dp = 0.01490, and z = (0.48 − 0.55)/SE_Dp = −4.70. (Note that z² = X².)
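A small Python check of the note above for part (c)—the chi-square statistic for this 2 × 2 table equals the square of the pooled two-proportion z statistic (numpy only; the counts come from the assumed equal sample sizes of 2250):

    import numpy as np

    n = 2250
    p1, p2 = 0.48, 0.55                       # March 2007 and April 2008
    observed = np.array([[p1 * n, p2 * n],
                         [(1 - p1) * n, (1 - p2) * n]])

    p_pool = (p1 + p2) / 2                    # 0.515, since the sample sizes are equal
    expected = np.array([[p_pool * n] * 2, [(1 - p_pool) * n] * 2])
    X2 = ((observed - expected) ** 2 / expected).sum()

    se_pool = np.sqrt(p_pool * (1 - p_pool) * (2 / n))
    z = (p1 - p2) / se_pool
    print(round(X2, 2), round(z, 2), round(z**2, 2))   # 22.07  -4.7  22.07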
9.12. (a) The two-way table is below; for example, for April 2001, (0.41)(2250) = 922.5 and (0.59)(2250) = 1327.5.

                          Date of Survey
   Dialup?   April 2001   April 2004   March 2007   April 2008
   Yes          922.5        675          360           270
   No          1327.5       1575         1890          1980

(b) Under the null hypothesis that the proportions have not changed, the expected counts are (0.2475)(2250) = 556.875 (across the top row) and (0.7525)(2250) = 1693.125 (across the bottom row).
9.13. Students may experiment with a variety of scenarios, but they should find that regardless of what they try, the conclusion is the same.
9.15. (a) The 3 × 2 table is below. (b) The percents of disallowed small, medium, and large claims are (respectively) 6/57 = 10.5%, 5/17 = 29.4%, and 1/5 = 20%. (c) In the 3 × 2 table, the expected count for large/not allowed is too small (5 · 12/79 = 0.76). (d) The null hypothesis is “There is no relationship between claim size and whether a claim is allowed.” (e) As a 2 × 2 table (with the second row 16 “yes” and 6 “no”), we find X² = 3.456 with df = 1, for which P ≈ 0.063.

              Allowed?
   Stratum   Yes   No   Total
   Small      51    6     57
   Medium     12    5     17
   Large       4    1      5
   Total      67   12     79
9.16. (a) In the table below, the estimated numbers of disallowed claims in the populations are found by multiplying the sample proportion by the population size; for example, (6/57) · 3342 = 351.8 claims. (b) For each stratum, let p̂ be the sample proportion, n be the sample size, and N be the population size. The standard error for the sample is SE_p̂ = sqrt(p̂(1 − p̂)/n), and the standard error for the population estimate is N · SE_p̂. The margins of error depend on the desired confidence level; for 95% confidence, we should double the population standard errors.
9.17. The table below shows the given information translated into a 3 × 2 table. For example, in Year 1, about (0.423)(2408) = 1018.584 students received DFW grades, and the rest—(0.577)(2408) = 1389.416 students—passed. To test H0: the DFW rate has not changed, we have X² = 307.8, df = 2, P < 0.0001—very strong evidence that the DFW rate has changed.

   Year      DFW        Pass
   1       1018.584   1389.416
   2        578.925   1746.075
   3        423.074   1702.926
9.18. (a) The table of approximate counts is below. Because the reported percents were rounded to the nearest whole percent, the total sample size is not 719. (b) With the counts as in the table, X² = 15.75, df = 3, and P = 0.0013. If students round the counts, or attempt to adjust the numbers in the first column so the numbers add up to 719, the value of X² will change slightly, but the P-value remains small, and the conclusion is the same. (c) We have strong enough evidence to conclude that there is an association between class attendance and DFW rates. (d) Association is not proof of causation. However, by comparing the observed counts with the expected counts, we can see that the data are consistent with that scenario; for example, among students with the highest attendance rates, more passed than expected (355.74 observed, 336.33 expected), and fewer failed (91.8 observed, 111.2 expected).

   Attendance         ABC      DFW     Total
   Less than 50%     10.78      9       19.78
   51% to 74%        43.12     25.2     68.32
   75% to 94%       134.75     54      188.75
   95% or more      355.74     91.8    447.54
   Total            544.39    180      724.39
9.20. (a) The approximate counts are shown below; for example, among those students in trades, (0.45)(942) = 423.9 took government loans and (0.55)(942) = 518.1 did not. (b) We have strong enough evidence to conclude that there is an association between field of study and taking government loans; the test statistic is X² = 97.44 (with unrounded counts) or 97.55 (with rounded counts), with df = 5, for which P is very small. (c) Overall, 53.3% of these students took government loans; students in trades and “other” fields of study were slightly less likely, and those in the service field were slightly more likely. A bar graph would be a good choice for a graphical summary.

                  Government loans
   Field          Yes        No       Total
   Trades         423.9      518.1       942
   Design         317.47     281.53      599
   Health        2878.7     2355.3      5234
   Media/IT      1780.9     1457.1      3238
   Service        826.8      551.2      1378
   Other         1081       1219        2300
   Total         7308.77    6382.23   13,691
9.21. (a) The approximate counts are shown below; for example, among those students in trades, (0.2)(942) = 188.4 relied on parents, family, or spouse, and (0.8)(942) = 753.6 did not. (b) We have strong enough evidence to conclude that there is an association between field of study and getting money from parents, family, or spouse; the test statistic is X² = 544.0 (with unrounded counts) or 544.8 (with rounded counts), with df = 5, for which P is very small. (c) Overall, 25.4% of these students relied on family support; students in media/IT and service fields were slightly less likely, and those in the design and “other” fields were slightly more likely. A bar graph would be a good choice for a graphical summary.

               Parents, family, spouse
   Field          Yes         No       Total
   Trades         188.4      753.6        942
   Design         221.63     377.37       599
   Health        1360.84    3873.16      5234
   Media/IT       518.08    2719.92      3238
   Service        248.04    1129.96      1378
   Other          943       1357         2300
   Total         3479.99   10,211.01   13,691
9.22. (a) For example, 63/(63 + 309) = 16.94% of the smallest banks offer RDC. The bar graph described below is one possible graphical summary. (b) To test H0: no association between bank size and offering RDC, we have X² = 96.3 with df = 2, for which P is tiny. We have very strong evidence of an association.

[Bar graph: percent of banks offering RDC by asset size—16.94% (under $100 million), 30.89% ($101–200 million), 56.85% ($201 million or more).]
9.23. (a) Of the high exercisers, 151/(151 + 148) = 50.5% get enough sleep.
9.24. (a) The marginal totals are given in the table below. (b) The most appropriate description is the conditional distribution by gender (the explanatory variable): 25.05% of males, and 69.02% of females, admitted to lying. (c) Females are much more likely to have lied (or at least, to admit to lying). (d) Not surprisingly, this difference is highly significant.

   Lied?     Male     Female    Total
   Yes       3,228    10,295    13,523
   No        9,659     4,620    14,279
   Total    12,887    14,915    27,802

[Bar graph: percent who admitted lying, by gender (25.05% of males, 69.02% of females).]
9.25. (a) The marginal totals are given in the table below. (b) The most appropriate description is the conditional distribution by gender (the explanatory variable): 91% of males, and 95% of females, agreed that trust and honesty are essential. (c) Females are slightly more likely to agree that trust and honesty are essential.

   Agreed?    Male     Female    Total
   Yes       11,724    14,169    25,893
   No         1,163       746     1,909
   Total     12,887    14,915    27,802

[Bar graph: percent who agree that trust is essential, by gender (91% of males, 95% of females).]
9.26. The main problem is that this is not a two-way table. Specifically, each of the 119
students might fall into several categories: They could appear on more than one row if
they saw more than one of the movies and might even appear more than once on a given
row (for example, if they have both bedtime and waking symptoms arising from the same
movie).
Another potential problem is that this is a table of percents rather than counts. However,
because we were given the value of n for each movie title, we could use that information to
determine the counts for each category; for example, it appears that 20 of the 29 students who watched Poltergeist had short-term bedtime problems because 20/29 = 68.96% (perhaps the reported value of 68% was rounded incorrectly). If we determine all of these counts in
this way (and note several more apparent rounding errors in the process), those counts add
up to 200, so we see that students really were counted more than once.
If the values of n had not been given for each movie, then we could not do a chi-squared
analysis even if this were a two-way table.
[Bar graphs: proportion of students by age group (15–19, 20–24, 25–34, 35 and over) and proportion of students by enrollment status (full-time, part-time).]
The conditional distributions of age group for full-time and for part-time students:

   Age group      Full-time   Part-time
   15–19            0.2964      0.0610
   20–24            0.4763      0.2254
   25–34            0.1522      0.3458
   35 and over      0.0752      0.3678

[Stacked bar graph of these two distributions, full-time versus part-time.]
9.28. (a) Of all students aged 20 to 24 years, 3254/6925 = 46.99% are men and the rest (3671/6925 = 53.01%) are women. Shown below are two possible graphical displays. In the bar graph on the left, the bars represent the proportion of all students (in this age range) in each gender. Alternatively, because the two percents represent parts of a single whole, we can display the distribution as a pie chart like that in the middle. (b) Among male students, 2719/3254 = 83.56% are full-time and the rest (535/3254 = 16.44%) are part-time. Among female students, those numbers are 2991/3671 = 81.48% and 680/3671 = 18.52%. Men in this age range are (very slightly) more likely to be full-time students. The bar graph below on the right shows the proportions of full-time students side by side; note that a pie graph would not be appropriate for this display because the two proportions represent parts of two different wholes. (c) For the full-time row, the expected counts are (5710)(3254)/6925 = 2683.08 and (5710)(3671)/6925 = 3026.92. (d) Using df = 1, we see that X² = 5.17 falls between 5.02 and 5.41, so 0.02 < P < 0.025 (software gives 0.023). This is significant evidence (at the 5% level) that there is a difference in the conditional distributions.
[Bar graph of the proportion of all students (aged 20 to 24) who are male and female, and bar graph of the proportion of full-time students among males and among females.]
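A brief Python sketch (scipy, not the text's software) reproducing parts (c) and (d) of Exercise 9.28 from the four counts:

    import numpy as np
    from scipy.stats import chi2_contingency

    #                   male  female
    counts = np.array([[2719, 2991],    # full-time
                       [ 535,  680]])   # part-time
    chi2, p, df, expected = chi2_contingency(counts, correction=False)
    print(np.round(expected[0], 2))      # [2683.08 3026.92], as in part (c)
    print(round(chi2, 2), round(p, 3))   # 5.17  0.023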
9.29. (a) The percent who have lasting waking symptoms is the total of the first column divided by the grand total: 69/119 = 57.98%. (b) The percent who have both waking and bedtime symptoms is the count in the upper left divided by the grand total: 36/119 = 30.25%. (c) To test H0: there is no relationship between waking and bedtime symptoms versus Ha: there is a relationship, we find X² = 2.275 (df = 1) and P = 0.132. We do not have enough evidence to conclude that there is a relationship.

Minitab output
             WakeYes   WakeNo   Total
   BedYes       36        33      69
               40.01     28.99
   BedNo        33        17      50
               28.99     21.01
   Total        69        50     119
   ChiSq = 0.402 + 0.554 +
           0.554 + 0.765 = 2.275
   df = 1, p = 0.132
9.30. The table below gives df = (r − 1)(c − 1), bounds for P, and software P-values.

          X²    Size of table   df   Crit. values (Table F)     Bounds for P        Software P
   (a)   1.25      2 by 2        1        X² < 1.32              P > 0.25             0.2636
   (b)  18.34      4 by 4        9    16.92 < X² < 19.02      0.025 < P < 0.05        0.0314
   (c)  24.21      2 by 8        7    22.04 < X² < 24.32      0.001 < P < 0.0025      0.0010
   (d)  12.17      5 by 3        8    12.03 < X² < 13.36      0.10 < P < 0.15         0.1438
9.32. To construct such a table, we can start by choosing values for the row and column sums r1, r2, r3, c1, c2, as well as the grand total N. Note that N = r1 + r2 + r3 = c1 + c2, so we only have four choices to make. Then find each count a, b, c, d, e, f by taking the corresponding row total, times the corresponding column total, divided by the grand total. For example, a = r1 × c1/N and d = r2 × c2/N. Of course, these counts should be whole numbers, so it may be necessary to make adjustments in the row and column totals to meet this requirement.

   a    b   | r1
   c    d   | r2
   e    f   | r3
   c1   c2  | N

The simplest such table would have all six counts a, b, c, d, e, f equal to one another (which would arise if we start with r1 = r2 = r3 and c1 = c2).
9.33. (a) Different graphical presentations are possible; one is shown below. More women perform volunteer work; the notably higher percent of women who are “strictly voluntary” participants accounts for the difference. (The “court-ordered” and “other” percents are similar for men and women.) (b) Either by adding the three “participant” categories or by subtracting from 100% the non-participant percentage, we find that 40.3% of men and 51.3% of women are participants. The relative risk of being a volunteer is therefore 51.3%/40.3% = 1.27.
[Stacked bar graph: percent of men and of women who are non-volunteers, strictly voluntary participants, court-ordered participants, or other participants.]
9.35. (a) The missing entries (shown in the table below) are found by subtracting the number who have tried low-fat diets from the given totals. (b) Viewing gender as explanatory, compute the conditional distributions of low-fat diet for each gender: 35/181 = 19.34% of women and 8/105 = 7.62% of men have tried low-fat diets. (c) The test statistic is X² = 7.143 (df = 1), for which P = 0.008. We have strong evidence of an association; specifically, women are more likely to try low-fat diets.

                       Gender
   Low-fat diet?   Women    Men
   Yes               35       8
   No               146      97
   Total            181     105

Minitab output
           Women     Men    Total
   Yes       35        8       43
            27.21    15.79
   No       146       97      243
           153.79    89.21
   Total    181      105      286
   ChiSq = 2.228 + 3.841 +
           0.394 + 0.680 = 7.143
   df = 1, p = 0.008

[Bar graph: percent who have tried low-fat diets, by gender.]
9.37. (a) As the conditional distribution of model dress for each age group has been given to us, it only remains to display this distribution graphically. One such presentation is shown below. (b) In order to perform the significance test, we must first recover the counts from the percents. For example, there were (0.723)(1006) = 727 non-sexual ads in young adult magazines. The remainder of these counts can be seen in the Minitab output below, where we see X² = 2.59, df = 1, and P = 0.108—not enough evidence to conclude that age group affects model dress.

Minitab output
            Young    Mature    Total
   1          727      383      1110
            740.00   370.00
   2          279      120       399
            266.00   133.00
   Total     1006      503      1509

[Bar graph: percent of sexual ads for each magazine age group.]
9.38. (a) Subtract the “agreed” counts from the sample sizes to get the “disagreed” counts. The table is in the Minitab output below. (The output has been slightly altered to have more descriptive row and column headings.) We find X² = 2.67, df = 1, and P = 0.103, so we do not have enough evidence to conclude that students and nonstudents differ.

Minitab output
          Students   Non-st   Total
   Agr       22        30       52
            26.43     25.57
   Dis       39        29       68
            34.57     33.43
9.40. In Exercise 9.15, we are comparing three populations (model 1): small, medium, and
large claims. In Exercise 9.23, we test for independence (model 2) between amount of sleep
and level of exercise. In Exercise 9.24, we test for independence between gender and lying
to teachers. In Exercise 9.39, one could argue for either answer. If we chose three separate
random samples from each division, then we are comparing three populations (model 1). If
a single random sample of student athletes was chosen, and then we classified each student
by division and by gambling response, this is a test for independence (model 2).
Note: For some of these problems, either answer may be acceptable, provided a
reasonable explanation is given. The distinctions between the models can be quite difficult to
make since the difference between several populations might, in fact, involve classification
by a categorical variable. In many ways, it comes down to how the data were collected.
For example, in Exercise 9.15, we were told that the data came from a stratified random
sample—which means that the three groups were treated as separate populations. Of course,
the difficulty is that the method of collecting data may not always be apparent, in which
case we have to make an educated guess. One question we can ask to educate our guess is
whether we have data that can be used to estimate the (population) marginal distributions.
9.41. The Minitab output below shows both the two-way table (column and row headings have been changed to be more descriptive) and the results for the significance test: X² = 12.0, df = 1, and P = 0.001, so we conclude that gender and flower choice are related. The count of 0 does not invalidate the test: Our smallest expected count is 6, while the text says that “for 2 × 2 tables, we require that all four expected cell counts be 5 or more.”

Minitab output
           Female    Male    Total
   bihai      20       0       20
             14.00    6.00
   no         29      21       50
             35.00   15.00
   Total      49      21       70
   ChiSq = 2.571 + 6.000 +
           1.029 + 2.400 = 12.000
   df = 1, p = 0.001
Minitab output
Eng Mgmt LA Other Total
Bio 13 25 158 202 398
25.30 34.56 130.20 207.95
Chem 16 15 19 64 114
7.25 9.90 37.29 59.56
Math 3 11 20 38 72
4.58 6.25 23.55 37.62
Phys 9 5 14 33 61
3.88 5.30 19.96 31.87
Total 41 56 211 337 645
ChiSq = 5.979 + 2.642 + 5.937 + 0.170 +
10.574 + 2.630 + 8.973 + 0.331 +
0.543 + 3.608 + 0.536 + 0.004 +
6.767 + 0.017 + 1.777 + 0.040 = 50.527
df = 9, p = 0.000
9.46. Note that the given counts actually form a three-way table (classified by adhesive, side, and checks). Therefore, this analysis should not be done as if the counts come from a 2 × 4 two-way table; for one thing, no conditional distribution will answer the question of interest (how to avoid face checks). Nonetheless, many students may do this analysis, for which they will find X² = 6.798, df = 3, and P = 0.079.
A better approach is to rearrange the table as shown below. The conditional distributions across the rows will then give us information about avoiding face checks; the graph below illustrates this. We find X² = 45.08, df = 3, and P < 0.0005, so we conclude that the appearance of face checks is related to the adhesive/side combination—specifically, we recommend the PVA/tight combination.

                  Face checks
                   No    Yes
   PVA/loose       10     54
   PVA/tight       44     20
   UF/loose        21     43
   UF/tight        37     27

Another approach (not quite as good as the previous one) is to perform two separate analyses—say, one for loose side, and one for tight side. These computations show that UF is better than PVA for loose side (X² = 5.151, df = 1, P = 0.023), but there is no significant difference for tight side (X² = 1.647, df = 1, P = 0.200). We could also do separate analyses for PVA (X² = 37.029, df = 1, P < 0.0005) and UF (X² = 8.071, df = 1, P = 0.005), from which we conclude that for either adhesive, the tight side has fewer face checks. (Minitab output below.)
Minitab output
             NoChk    Chk    Total
   PVA-L       10      54      64
              28.00   36.00
   PVA-T       44      20      64
              28.00   36.00
   UF-L        21      43      64
              28.00   36.00
   UF-T        37      27      64
              28.00   36.00
   Total      112     144     256
   ChiSq = 11.571 + 9.000 +
            9.143 + 7.111 +
            1.750 + 1.361 +
            2.893 + 2.250 = 45.079
   df = 3, p = 0.000

[Bar graph: percent of boards with face checks for each adhesive/side combination (PVA/loose, PVA/tight, UF/loose, UF/tight).]
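A short Python sketch of the recommended 4 × 2 analysis (scipy, not the text's software):

    import numpy as np
    from scipy.stats import chi2_contingency

    #                  no check  check
    counts = np.array([[10, 54],    # PVA / loose
                       [44, 20],    # PVA / tight
                       [21, 43],    # UF  / loose
                       [37, 27]])   # UF  / tight
    chi2, p, df, expected = chi2_contingency(counts)
    print(round(chi2, 3), df, p)    # about 45.079, df = 3, P < 0.0005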
9.48. (a) The bar graph (described below) compares how often each adverse event was reported in the echinacea and placebo groups. The table gives the two sample proportions for each event, together with the z statistic and P-value for comparing them.

   Event           p̂1       p̂2       z      P
   Itchiness      0.0386   0.0189   1.57   0.1154
   Rash           0.0712   0.0270   2.74   0.0061
   “Hyper”        0.0890   0.0622   1.35   0.1756
   Diarrhea       0.1128   0.0919   0.92   0.3595
   Vomiting       0.0653   0.0568   0.47   0.6357
   Headache       0.0979   0.0649   1.61   0.1068
   Stomachache    0.1543   0.1108   1.71   0.0875

[Bar graph: percent of children reporting each adverse event (itchiness, rash, “hyper,” diarrhea, vomiting, headache, stomachache, drowsiness, other, any), for the treatment and placebo groups.]
(e) We would expect multiple observations on the same child to be dependent, so the
assumptions for our analysis are not satisfied. Examination of the data reveals that the
results for both groups are quite similar, so we are inclined to agree with the authors
that there are no statistically significant differences. (f) Student opinions about the
criticisms of this study will vary. The third criticism might be dismissed as sounding
like conspiracy-theory paranoia, but the other three address the way that echinacea was
administered; certainly we cannot place too much faith in a clinical trial if it turns out that
the treatments were not given properly!
9.49. The chi-square goodness of fit statistic is X² = 3.7807 with df = 3, for which P > 0.25 (software gives 0.2861), so there is not enough evidence to conclude that this university’s distribution is different. The details of the computation are given in the table below; note that there were 210 students in the sample.

                Expected    Expected   Observed
                frequency    count      count     O − E    (O − E)²/E
   Never          0.43        90.3        79      −11.3      1.4141
   Sometimes      0.35        73.5        83        9.5      1.2279
   Often          0.15        31.5        36        4.5      0.6429
   Very often     0.07        14.7        12       −2.7      0.4959
                                         210                 3.7807
9.50. The chi-square goodness of fit statistic is X² = 3.4061 with df = 4, for which P > 0.25 (software gives 0.4923), so we have no reason to doubt that the numbers follow a Normal distribution. The details of the computation are given in the table below. The table entries from Table A for −0.6, −0.1, 0.1, and 0.6 are (respectively) 0.2743, 0.4602, 0.5398, and 0.7257. Then, for example, the expected frequency in the interval −0.6 to −0.1 is 0.4602 − 0.2743 = 0.1859.

                       Expected    Expected   Observed
                       frequency    count      count      O − E    (O − E)²/E
   z ≤ −0.6              0.2743     137.2       139        1.85      0.0250
   −0.6 < z ≤ −0.1       0.1859      93.0       102        9.05      0.8811
   −0.1 < z ≤ 0.1        0.0796      39.8        41        1.20      0.0362
   0.1 < z ≤ 0.6         0.1859      93.0        78      −14.95      2.4045
   z > 0.6               0.2743     137.2       140        2.85      0.0592
                                                                     3.4061
9.55. (a) Each quadrant accounts for one-fourth of the area, so we expect it to contain one-fourth of the 100 trees. (b) Some random variation would not surprise us; we no more expect exactly 25 trees per quadrant than we would expect to see exactly 50 heads when flipping a fair coin 100 times. (c) The table below shows the individual computations, from which we obtain X² = 10.8, df = 3, and P = 0.0129. We conclude that the distribution is not random.

   Observed   Expected   (o − e)²/e
      18         25         1.96
      22         25         0.36
      39         25         7.84
      21         25         0.64
     100                   10.8
Chapter 10 Solutions
10.1. The given model was µ y = 43.4+2.8x, with standard deviation σ = 4.3. (a) The slope is
2.8. (b) When x increases by 1, µ y increases by 2.8. (Or equivalently, if x increases by 2,
µ y increases by 5.6, etc.) (c) When x = 7, µ y = 43.4+2.8(7) = 63. (d) Approximately 95%
of observed responses would fall in the interval µ y ± 2σ = 63 ± 2(4.3) = 63 ± 8.6 = 54.4 to
71.6.
10.3. Example 10.5 gives the confidence interval −0.969 to −0.341 for the slope β1. Recall that the slope is the change in predicted BMI when x (i.e., PA) changes by +1. (a) If PA increases by 1, we expect BMI to change by β1, so the 95% confidence interval for the change is −0.969 to −0.341—that is, a decrease of 0.341 to 0.969 kg/m². (b) If PA decreases by 1, we expect BMI to change by −β1, so the 95% confidence interval for the change is an increase of 0.341 to 0.969 kg/m². (c) If PA increases by 0.5, we expect BMI to change by 0.5β1, so the 95% confidence interval for the change is a decrease of 0.1705 to 0.4845 kg/m².
10.4. The given prediction interval is 16.4 to 31.0 kg/m². This interval is 2t*·SE_ŷ units wide, where t* = 1.9845 for df = 98. Therefore, SE_ŷ = 3.68. By examining Figure 10.8, we can judge that the prediction intervals for x = 9 and x = 10 are roughly the same width, so the standard errors should be roughly the same.
Note: For the first question, students might take t* = 2, in which case SE_ŷ = 3.65. For the second question, when x changes from 9 to 10, SE_ŷ increases by about 0.006, so it is quite reasonable to say they are about the same.
10.5. (a) The plot on the following page suggests a linear increase. (b) The regression equation is ŷ = −4566.24 + 2.3x. (c) The fitted values and residuals are given in the table on the following page. Squaring the residuals and summing gives 0.952, so the standard error is s = sqrt(0.952/(n − 2)) = sqrt(0.3173) = 0.5633. (d) Given x (the year), spending comes from a N(µ_y, σ) distribution, where µ_y = β0 + β1x. The estimates of β0, β1, and σ are b0 = −4566.24, b1 = 2.3, and s = 0.5633. (e) We first note that x̄ = 2005 and Σ(xi − x̄)² = 10, so SE_b1 = s/√10 = 0.1781. We have df = n − 2 = 3, so t* = 3.182, and the 95% confidence interval for β1 is b1 ± t*·SE_b1 = 2.3 ± 0.5667 = 1.733 to 2.867. This gives the rate of increase of R&D spending: between 1.733 and 2.867 billion dollars per year.
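A minimal Python sketch of the arithmetic in part (e) (scipy is used only for the t critical value; the sum of squares of 10 corresponds to five consecutive years centered at 2005):

    import numpy as np
    from scipy.stats import t

    n, b1, s = 5, 2.3, 0.5633
    sum_sq_x = 10.0                      # sum of (x_i - xbar)^2
    se_b1 = s / np.sqrt(sum_sq_x)        # about 0.1781
    t_star = t.ppf(0.975, df=n - 2)      # 3.182
    ci = b1 + np.array([-1, 1]) * t_star * se_b1
    print(round(se_b1, 4), np.round(ci, 3))   # 0.1781 [1.733 2.867]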
10.6. (a) The variables x and y are reversed: Slope gives the change in y for a change in x.
(b) The population regression line has intercept β0 and slope β1 (not b0 and b1 ). (c) The
estimate µ̂_y = b0 + b1x* is more accurate when x* is close to x̄, so the width of the confidence interval grows with |x* − x̄|.
10.7. (a) The parameters are β0 , β1 , and σ ; b0 , b1 , and s are the estimates of those
parameters. (b) H0 should refer to β1 (the population slope) rather than b1 (the estimated
slope). (c) The confidence interval will be narrower than the prediction interval because the
confidence interval accounts only for the uncertainty in our estimate of the mean response,
while the prediction interval must also account for the random error of an individual
response.
10.8. The table below gives two sets of answers: those found with critical values from
Table D, and those found with software. The approach taken only makes a noticeable
difference in part (c), where (with Table D) we take df = 80 rather than df = 98. In each
case, the margin of error is t ∗ SEb1 = 0.58t ∗ , with df = n − 2.
                 Table D                           Software
       df    b1     t*       Interval               t*       Interval
   (a)  23   1.1   2.069   −0.1000 to 2.3000      2.0687   −0.0998 to 2.2998
   (b)  23   2.1   2.069    0.9000 to 3.3000      2.0687    0.9002 to 3.2998
   (c)  98   1.1   1.990   −0.0542 to 2.2542      1.9845   −0.0510 to 2.2510
10.9. The test statistic is t = b1 /SEb1 = b1 /0.58, with df = n − 2. The tests for parts (a) and
(c) are not quite significant at the 5% level, while the test for part (b) is highly significant.
This is consistent with the confidence intervals from the previous exercise.
df b1 t P (Table D) P (software)
(a) 23 1.1 1.90 0.05 < P < 0.10 0.0705
(b) 23 2.1 3.62 0.001 < P < 0.002 0.0014
(c) 98 1.1 1.90 0.05 < P < 0.10* 0.0608
*Note that for (c), if we use Table D, we take df = 80.
10.10. (a) The plot (below, left) shows a strong linear relationship with no striking outliers. (b) The regression line (shown on the plot) is ŷ = 1133 + 1.6924x. (c) The residual plot (below, right) shows no clear cause for concern. (d) A stemplot (below) or histogram shows two large residuals (one positive, one negative). (e) To test for a relationship, we test H0: β1 = 0 versus Ha: β1 ≠ 0 (or equivalently, use ρ in place of β1). (f) The test statistic and P-value are given in the Minitab output below: t = 10.55, P < 0.0005. We have strong evidence of a non-zero slope.

Stemplot of residuals:
   −2 | 7
   −2 |
   −1 | 876
   −1 | 11
   −0 | 966
   −0 | 433330
    0 | 0011344
    0 | 66789
    1 | 02234
    1 |
    2 |
    2 | 5
Minitab output: Regression of 2008 tuition on 2000 tuition
The regression equation is y2008 = 1133 + 1.69 y2000
Predictor Coef Stdev t-ratio p
Constant 1132.8 701.4 1.61 0.116
y2000 1.6924 0.1604 10.55 0.000
s = 1134 R-sq = 78.2% R-sq(adj) = 77.5%
[Scatterplot of 2008 tuition and fees versus 2000 tuition and fees ($1000) with the regression line, and plot of residuals ($1000) versus 2000 tuition and fees.]
10.11. (a) From the Minitab output above, we have SE_b1 = 0.1604. With df = 30, t* = 2.042, so the 95% confidence interval for β1 is 1.6924 ± t*·SE_b1 = 1.3649 to 2.0199. This slope means that a $1 difference in tuition in 2000 changes 2008 tuition by between $1.36 and $2.02. (It might be easier to understand expressed like this: If the costs of two schools differed by $1000 in the year 2000, then in 2008, they would differ by between $1365 and $2020.) (b) Regression explains r² = 78.2% of the variation in 2008 tuition. (c) When x = 5100, the estimated 2008 tuition is ŷ = 1133 + 1.6924(5100) = $9764. (d) When x = 8700, the estimated 2008 tuition is ŷ = 1133 + 1.6924(8700) = $15,857. (Software reports $15,856; the difference is due to rounding.) (e) The 2000 tuition at Stat U is similar to others in the data set, while Moneypit U was considerably more expensive in 2000, so that prediction requires extrapolation.
10.12. (a) β0 is the population intercept, 0.8. This says that the mean overseas return is 0.8%
when the U.S. return is 0%. (b) β1 is the population slope, 0.46. This says that when the
U.S. return changes by 1%, the mean overseas return changes by 0.46%. (c) The full model is yi = 0.8 + 0.46xi + εi, where yi and xi are observed overseas and U.S. returns in a given year, and the εi are independent N(0, σ) variables. The residual terms εi allow for variation in overseas returns when U.S. returns remain the same.
10.14. (a) One issue is that correlation coefficients assume a linear association between two variables; if two variables x and y are whole numbers, the only linear relationships between them would have correlation 1 or −1. (b) To test H0: ρ = 0 versus Ha: ρ ≠ 0, we compare t = r√(n − 2)/√(1 − r²) (with n = 98) to a t(96) distribution. The table on the following page shows t and P; entries marked * or ** are significant at the α = 0.05 level. (c) Entries marked ** are significant at 0.05/15 = 0.0033; that eliminates three significant associations, although two (ASSESS/XTRA and ASSESS/HAND) were barely eliminated, and perhaps should not be completely dismissed. (d) We might hesitate to apply the results more broadly because of the artificial setting of these interviews, and the fact that the subjects were undergraduate students.
10.15. Regression inference does not require that the variables x and y be Normal; it is the errors (the deviations from the line) that should be Normal. (c) The scatterplot is shown below; note that incentive pay is the explanatory variable. There is a weak positive linear relationship. (d) The regression equation is ŷ = 6.247 + 0.1063x. (Not surprisingly, regression only explains 15.3% of the variation in rating.) (e) Residual analysis might include a histogram or stemplot, a plot of residuals versus incentive pay, and perhaps a Normal quantile plot; the first two of these items are shown below. The residuals are slightly right-skewed. In addition, we note that for incentive pay less than about 30%, most residuals are greater than about −7, but extend up to +15.

[Scatterplot of overall rating versus incentives as percent of salary.]
[Histograms of incentives as percent of salary and of overall player rating.]

                                      Min     Q1      M       Q3      Max
   Incentives as percent of salary     0     0.306   1.43    17.65   85.01
   Overall player rating               0     2.25    6.31    12.69   27.88

[Histogram of the residuals and plot of residuals versus incentives as percent of salary for the regression in Exercise 10.15.]
10.16. (a) The regression equation is ŷ = 2.2043 + 0.02084x (Minitab output on the following page). (b) Shown below are a histogram of the residuals, and a plot of residuals versus incentive pay. These show no clear reasons for concern; in particular, the distribution of residuals is considerably less skewed than those in Exercise 10.15. The residual spread is also more consistent for low incentive pay. (c) The estimated slope is b1 = 0.02084 with SE_b1 = 0.003419. Whether we use df = 201 (and software) or df = 100 (and Table D), the 95% confidence interval is b1 ± t*·SE_b1 = 0.0141 to 0.0276. Therefore, if the incentive portion of salary increases by 1%, sqrt(rating) increases between 0.0141 and 0.0276. (d) The predicted ratings are shown in the table below. (e) In the plot below, we see that the models give similar estimates for high incentive pay, but the square-root model is lower for low incentive pay. (f) Based on residuals, this model seems to be better. (There is no clear reason to form a preference based on predicted values.)
[Histogram of the residuals and plot of residuals versus incentives as percent of salary for the regression of sqrt(rating) on incentive pay.]
   Incentive          Predicted rating
     pay       Model #1 (10.15)   Model #2 (10.16)
      0%             6.25               4.86
     20%             8.37               6.87
     40%            10.50               9.23
     60%            12.63              11.94
     80%            14.75              14.99

[Plot of the predicted ratings from the two models against incentives as percent of salary.]
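A tiny Python sketch reproducing the table of predicted ratings from the two fitted equations (Exercises 10.15 and 10.16):

    for pay in (0, 20, 40, 60, 80):
        rating1 = 6.247 + 0.1063 * pay              # model 1: rating on incentive pay
        rating2 = (2.2043 + 0.02084 * pay) ** 2     # model 2: sqrt(rating) on incentive pay
        print(f"{pay:3d}%  {rating1:6.2f}  {rating2:6.2f}")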
10.17. (a) 22 of these 30 homes sold for more than their assessed values. This was “an SRS of 30 properties,” so it should be reasonably representative, so the larger population should be similar (or at least it was at the time of the sample). (b) The scatterplot (below, left) shows a moderately strong linear association. (c) The regression line ŷ = 21.50 + 0.9468x is included on the scatterplot. (d) The plot of residuals versus assessed value (below, right) shows no obvious unusual features. The house with the highest assessed value (which also stands out in the original scatterplot) may be influential. (e) A stemplot (below) or histogram looks reasonably Normal, although there are two high residuals that stand apart from the rest. (f) There are no clear violations of the assumptions—at least, none severe enough to cause too much concern.

Stemplot of residuals:
   −3 | 3
   −2 | 6543
   −1 | 9422
   −0 | 984430
    0 | 1134578
    1 | 1236
    2 | 26
    3 |
    4 | 17
[Scatterplot of sales price versus assessed value ($1000) with the regression line, and plot of residuals versus assessed value.]
10.18. (a) and (b) The table below gives the predicted selling prices and residuals for these three houses. (c) From the Minitab output in the previous solution, we have b0 = 21.50, SE_b0 = 15.28, b1 = 0.94682, and SE_b1 = 0.08064. For 95% confidence with df = 28, we have t* = 2.048, so the intervals are −9.79 to 52.79 (intercept) and 0.78 to 1.11 (slope). (d) The two confidence intervals in part (c) include the values 0 (for the intercept) and 1 (for the slope), so we cannot reject y = x as unreasonable.

   Assessed value   Selling price   Predicted price   Residual
        155             142.9           168.26         −25.36
        220             224.0           229.80          −5.80
        285             286.0           291.34          −5.34
Note: In Exercise 10.17(a), we noted that 22 of the 30 homes in the sample sold for
more than their assessed value, and some students might fall back on that to answer part (d)
of this question. However, the confidence intervals in part (c) suggest that we do not have
enough evidence to reject the null hypothesis that the model is y = x.
10.19. (a) The plot (below, left) is roughly linear and increasing. The number of tornadoes in 2004 (1819) is noticeably high, as is the 2008 count (1691) to a lesser extent. (b) The regression equation is ŷ = −28,438 + 14.82x; both the slope and intercept are significantly different from 0. In the Minitab output on the following page, we see SE_b1 = 1.463. With t* = 2.0049 for df = 54, the confidence interval for β1 is b1 ± t*·SE_b1 = 14.82 ± 2.93 = 11.89 to 17.76 tornadoes per year. (c) In the plot (below, right), we see that the scatter might be greater in recent years, and the 2004 residual is particularly high. (d) Based on a stemplot of the residuals (below), the 2004 residual is an outlier; the other residuals appear to be roughly Normal. (e) Without the 2004 count, the regression equation is ŷ = −26,584 + 13.88x. The estimated slope decreases by almost one tornado per year.

Stemplot of residuals:
   −3 | 520
   −2 | 931
   −1 | 99843310
   −0 | 9887654443211110
    0 | 001223556778
    1 | 001224
    2 | 011789
    3 | 6
    4 |
    5 | 5
[Plot of tornado count versus year with the regression line, and plot of residuals versus year.]
10.20. (a) The scatterplot (below) shows a linear relationship with no striking outliers, so regression seems to be appropriate. (b) The regression equation (shown on the scatterplot) is ŷ = 11.81 + 0.7754x. (c) Student summaries will vary. The mpg values are certainly similar, but one notable difference is that all but three of the computer values are higher than the driver’s values, and the mean computer mpg is about 2.7 mpg higher than the mean driver mpg. Additionally, the slope of the regression line is about 0.78, meaning that (on average) a 1 mpg change in the driver’s value corresponds to a 0.78 mpg change for the computer. The intercept, however, is about 11.8 mpg, suggesting that the computer’s value is generally higher when the driver’s value is small.

[Scatterplot of the computer’s mpg versus the driver’s mpg, with the regression line.]
10.21. (a) About r² = 8.41% of the variability in AUDIT score is explained by (a linear regression on) gambling frequency. (b) With r = 0.29 and n = 908, the test statistic for H0: ρ = 0 versus Ha: ρ ≠ 0 is t = r√(n − 2)/√(1 − r²) = 9.12 (df = 906), for which P is very small. (c) Nonresponse is a problem because the students who did not answer might have different characteristics from those who did. Because of this, we should be cautious about considering these results to be representative of all first-year students at this university, and even more cautious about extending these results to the broader population of all first-year students.
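A hedged Python sketch of the correlation test in part (b) (scipy for the t distribution, not the text's software):

    import math
    from scipy.stats import t

    r, n = 0.29, 908
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)   # about 9.12
    p_value = 2 * t.sf(abs(t_stat), df=n - 2)
    print(round(t_stat, 2), p_value)   # 9.12 and an extremely small P-value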
10.22. (a) Stemplots are shown below. x (watershed area) is right-skewed; x̄ = 28.2857 km², sx = 17.7142 km². y (IBI) is left-skewed; ȳ = 65.9388, sy = 18.2796. (b) The scatterplot (below, left) shows a weak positive association, with more scatter in y for small x. (c) yi = β0 + β1xi + εi, i = 1, 2, ..., 49; the εi are independent N(0, σ) variables. (d) The hypotheses are H0: β1 = 0 versus Ha: β1 ≠ 0. (e) See the Minitab output below. The regression equation is IBI = 52.92 + 0.4602 Area, and the estimated standard deviation is s = 16.53. For testing the hypotheses in part (d), t = 3.42 and P = 0.001. (f) The residual plot (below, right) again shows that there is more variation for small x. (g) As we can see from a stemplot and/or a Normal quantile plot (both below), the residuals are somewhat left-skewed but otherwise seem reasonably close to Normal. (h) Student opinions may vary. The two apparent deviations from the model are (i) a possible change in standard deviation as x changes and (ii) possible non-Normality of error terms.

[Stemplots (split stems) of watershed area and of IBI.]
[Scatterplot of IBI versus watershed area (km²) with the regression line, and plot of residuals versus watershed area.]
Stemplot of residuals:
   −3 | 2200
   −2 | 8
   −2 | 42
   −1 | 9665
   −1 | 3
   −0 | 8885
   −0 | 433100
    0 | 223334
    0 | 666789
    1 | 022334
    1 | 6799
    2 | 0024
    2 | 5

[Normal quantile plot of the residuals.]
10.23. (a) The stemplot of percent forested is shown below; see the solution to the previous exercise for the stemplot of IBI. x (percent forested) is right-skewed; x̄ = 39.3878%, sx = 32.2043%. y (IBI) is left-skewed; ȳ = 65.9388, sy = 18.2796. (b) The scatterplot (below, left) shows a weak positive association, with more scatter in y for small x. (c) yi = β0 + β1xi + εi, i = 1, 2, ..., 49; the εi are independent N(0, σ) variables. (d) The hypotheses are H0: β1 = 0 versus Ha: β1 ≠ 0. (e) See the Minitab output below. The regression equation is IBI = 59.91 + 0.1531 Forest, and the estimated standard deviation is s = 17.79. For testing the hypotheses in (d), t = 1.92 and P = 0.061. (f) The residual plot (below, right) shows a slight curve—the residuals seem to be (very) slightly lower in the middle and higher on the ends. (g) As we can see from a stemplot and/or a Normal quantile plot (both below), the residuals are left-skewed. (h) Student opinions may vary. The three apparent deviations from the model are (i) a possible change in standard deviation as x changes, (ii) possible curvature of residuals, and (iii) possible non-Normality of error terms.

Stemplot of percent forested:
    0 | 00000033789
    1 | 0014778
    2 | 125
    3 | 123339
    4 | 133799
    5 | 229
    6 | 38
    7 | 599
    8 | 069
    9 | 055
   10 | 00
[Scatterplot of IBI versus percent forested with the regression line, and plot of residuals versus percent forested.]
Stemplot of residuals:
   −3 | 55
   −3 | 4
   −2 | 988
   −2 | 0
   −1 | 985
   −1 | 2110
   −0 | 99887
   −0 | 410
    0 | 134
    0 | 557899
    1 | 01122333
    1 | 55678
    2 | 044
    2 | 78

[Normal quantile plot of the residuals.]
10.24. The first model (using watershed area to predict IBI) is preferable because the
regression was significant (P = 0.001 versus P = 0.061) and explained a higher proportion
of the variation in IBI (19.9% versus 7.3%).
10.25. The precise results of these changes depend on which observation is changed. (There
are six observations which had 0% forest and two which had 100% forest.) Specifically, if
we change IBI to 0 for one of the first six observations, the resulting P-value is between
0.019 (observation 6) and 0.041 (observation 3). Changing one of the last two observations
changes the P-value to 0.592 (observation 48) or 0.645 (observation 49).
In general, the first change decreases P (that is, the relationship is more significant)
because it accentuates the positive association. The second change weakens the association,
so P increases (the relationship is less significant).
(The line is shown in the scatterplot below.) The slope is significantly different from 0: t = 4.82, P < 0.0005. (b) Without the four points from the bottom of the scatterplot, the regression equation is ŷ = −33.40 + 0.8818x, s = 15.18. (This is the dashed line in the scatterplot.) The slope is again significantly different from 0: t = 6.57, P < 0.0005. With the outliers removed, the line changes slightly; the most significant change is the decrease in the estimated standard deviation s. This correspondingly makes t larger (i.e., b1 is more significantly different from 0) and makes the regression line more useful for prediction (r² increases from 28.9% to 44.4%). Of course, we should not arbitrarily remove data points; more investigation is needed to determine why these students’ reading scores were so much lower than we would expect based on their IQs.

[Scatterplot of reading score versus IQ, with regression lines fitted with and without the four low outliers.]
10.29. (a) Stemplots are shown below; both variables are right-skewed. For pure tones, x̄ = 106.20 and sx = 91.76 spikes/second, and for monkey calls, ȳ = 176.57 and sy = 111.85 spikes/second. (b) There is a moderate positive association; the third point (circled in the scatterplot) has the largest residual; the first point (marked with a square) is an outlier for tone response. (c) With all 37 points, CALL = 93.9 + 0.778 TONE and s = 87.30; the test of β1 = 0 gives t = 4.91, P < 0.0001. (d) Without the first point, ŷ = 101 + 0.693x, s = 88.14, t = 3.18. Without the third point, ŷ = 98.4 + 0.679x, s = 80.69, t = 4.49. With neither, ŷ = 116 + 0.466x, s = 79.46, t = 2.21. The line changes a bit, but always has a slope significantly different from 0.

[Stemplots of pure tone response, monkey call response, and the residuals, and scatterplot of monkey call response versus pure tone response (spikes/second).]
10.31. (a) The stemplots (below, left) are fairly symmetric. For x (MOE), x̄ = 1,799,180 and sx = 329,253; for y (MOR), ȳ = 11,185 and sy = 1980. (b) The plot (below, right) shows a moderately strong, positive, linear relationship. Because we would like to predict MOR from MOE, we should put MOE on the x axis. (c) The model is yi = β0 + β1xi + εi, i = 1, 2, ..., 32; the εi are independent N(0, σ) variables. The regression equation is MOR = 2653 + 0.004742 MOE, s = 1238. The slope is significantly different from 0: t = 7.02 (df = 30), P < 0.0001. (d) Assumptions appear to be met: A stemplot of the residuals shows one slightly low (not quite an outlier), but acceptable, and the plot of residuals against MOE (not shown) does not suggest any particular pattern.

[Stemplots of MOE, MOR, and the residuals, and scatterplot of MOR (thousands) versus MOE (millions).]
10.32. (a) The 95% confidence interval gives a range of values for the mean MOR of many pieces of wood with MOE equal to 2,400,000. The prediction interval gives a range of values for the MOR of one piece of wood with MOE equal to 2,400,000. (b) The prediction interval will include more values because the confidence interval accounts only for the uncertainty in our estimate of the mean response, while the prediction interval must also account for the random error of an individual response. (c) With the regression equation MOR = 2653 + 0.004742 MOE, the predicted mean response when x = MOE = 2,400,000 is µ̂_y = 14,034. The Minitab output below gives the two intervals, along with SE_µ̂ (“Stdev.fit”).
10.35. (a) Aside from the one high point (70 months of service, and wages 97.6801), …

[Scatterplot of wages (rescaled) versus months of service.]
10.36. The table below summarizes the regression results with the outlier excluded, and those
with all points. Minitab output for both regressions is shown above. (a) The intercept and
slope estimates change very little, but the estimate of σ increases from 10.21 to 11.98.
(b) With the outlier, the t statistic decreases (because s has increased), and the P-value
increases slightly—although it is still significant at the 5% level. (c) The interval width
2t ∗ SEb1 increases from 0.1030 to 0.1207—roughly the same factor by which s increased.
(Because the degrees of freedom change from 57 to 58, t ∗ decreases from 2.0025 to 2.0017,
but the change in s has a much greater impact.)
b0 b1 s t P Interval width
Outlier excluded 43.383 0.07325 10.21 2.85 0.006 0.1030
All points 44.213 0.07310 11.98 2.42 0.018 0.1207
10.38. (a) ŷ = −61.12 + 9.3187(18) = 107, for a prediction of 2.9107 m. (b) This is an example of extrapolation—trying to make a prediction outside the range of given x-values. Minitab reports that a 95% prediction interval for ŷ when x* = 18 is about 62.6 to 150.7. The width of the interval is an indication of how unreliable the prediction is.
Note: Minitab’s “Stdev.Fit” value of 19.56 is SE_µ̂, so SE_ŷ = sqrt(s² + SE²_µ̂) = 20.00, which agrees with the margin for the prediction interval: t*·SE_ŷ = (2.201)(20.00) = 44.02.
10.40. A negative association makes sense here: If the price of beer is above average, fewer
students can afford to drink, while more drinking happens when beer is cheaper.
Note: The fact that the correlation is relatively small indicates that the price of beer is not a crucial factor in determining the prevalence of binge-drinking. In particular, a straight-line relationship with the cost of beer only explains about r² = 13% of the variation in binge-drinking rates.
10.41. To test H0: ρ = 0 versus Ha: ρ ≠ 0, we compute t = r√(n − 2)/√(1 − r²) = −4.16. Comparing this to the appropriate t distribution, we find that the correlation is significantly different from 0.
10.42. (a) Scatterplot below, left. (b) Scatterplot below, right. (c) The regression equation is ŷ = −827.66 + 0.4200x with s = 0.2016. For 95% confidence with df = 6, t* = 2.4469, so with b1 = 0.4200 and SE_b1 = 0.006524, the confidence interval is 0.4041 to 0.4360.
Note: If students use a common logarithm (rather than a natural logarithm, as we have done), everything would be multiplied by about 0.4343: The vertical scale on the graph would be from 0 to about 6, the regression line would be ŷ = −359.45 + 0.1824x, and the interval would be 0.1755 to 0.1893.
[Plots of DRAM capacity (kbits × 10⁶) versus year and of log(DRAM capacity) versus year.]
10.46. s = sqrt(MSE) = 14.8985 and r² = SSM/SST = 4560.6/8556.0 = 0.5330.
10.47. As sx = sqrt(Σ(xi − x̄)²/19) = 19.99%, we have sqrt(Σ(xi − x̄)²) = sx·√19 = 87.1344%, so:
SE_b1 = s/sqrt(Σ(xi − x̄)²) = 14.8985/87.1344 = 0.1710
Alternatively, note that we have F = 20.55 and b1 = 0.775. Because t² = F, we know that t = 4.5332 (take the positive square root, because t and b1 have the same sign). Then SE_b1 = b1/t = 0.1710. (Note that with this approach, we do not need to know that sx = 19.99%.)
Finally, with df = 18, t* = 2.1009 for 95% confidence, so the 95% confidence interval is 0.775 ± 0.3592 = 0.4158 to 1.1342.
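A minimal Python sketch of the second approach—recovering SE(b1) from the F statistic and then building the interval (scipy for the t critical value; n = 20 follows from the 19 in the sx formula above):

    import math
    from scipy.stats import t

    b1, F, n = 0.775, 20.55, 20
    t_stat = math.sqrt(F)            # 4.5332, taking the sign of b1 (positive)
    se_b1 = b1 / t_stat              # about 0.1710
    t_star = t.ppf(0.975, df=n - 2)  # 2.1009
    print(round(se_b1, 4),
          round(b1 - t_star * se_b1, 4), round(b1 + t_star * se_b1, 4))
    # 0.171  0.4158  1.1342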
10.48. (a) With x̄ = 80.9, sx = 17.2, ȳ = 43.5, sy = 20.3, and r = 0.68, we find:
b1 = (0.68)(20.3/17.2) = 0.8026
b0 = 43.5 − (0.8026)(80.9) = −21.4270
(Answers may vary slightly due to rounding.) The regression equation is therefore GHP = −21.4270 + 0.8026 FVC. (b) Testing β1 = 0 is equivalent to testing ρ = 0, so the test statistic is t = r√(n − 2)/√(1 − r²) = 6.43 (df = 48), for which P < 0.0005. The slope (correlation) is significantly different from 0.
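A short Python sketch of Exercise 10.48 (scipy for the P-value; n = 50 follows from the stated df = 48):

    import math
    from scipy.stats import t

    xbar, sx = 80.9, 17.2     # FVC
    ybar, sy = 43.5, 20.3     # GHP
    r, n = 0.68, 50

    b1 = r * sy / sx                          # about 0.8026
    b0 = ybar - b1 * xbar                     # about -21.43
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)   # about 6.43
    print(round(b1, 4), round(b0, 4), round(t_stat, 2),
          2 * t.sf(t_stat, df=n - 2))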
10.49. Use the formula t = r√(n − 2)/√(1 − r²) with r = 0.6. For n = 20, t = 3.18 with df = 18, for which the two-sided P-value is P = 0.0052. For n = 10, t = 2.12 with df = 8, for which the two-sided P-value is P = 0.0667. With the larger sample size, r should be a better estimate of ρ, so we are less likely to get r = 0.6 unless ρ is really not 0.
[Plot of residuals versus length of service (months); the accompanying text notes that some values are lower than we would predict from the regression.]
(a) One point stands out as an outlier for the SAT score (420). Since this SAT score is so low, this point may be influential. No other points fall outside the pattern. (b) The regression equation is ŷ = 1.626 + 0.02137x. The slope is significantly different from 0: t = 10.78 (df = 58), for which P < 0.0005. (c) r = 0.8167.

[Scatterplot of ACT score versus SAT score, with the regression line.]
10.52. (a) The means are identical (21.133). (b) For the observed ACT scores, sy = 4.714; for the fitted values, sŷ = 3.850. (c) For z = 1, the SAT score is x̄ + sx = 912.7 + 180.1 = 1092.8. The predicted ACT score is ŷ = 25 (Minitab reports 24.983), which gives a standard score of about 1 (using the standard deviation of the predicted ACT scores). (d) For z = −1, the SAT score is x̄ − sx = 912.7 − 180.1 = 732.6. The predicted ACT score is ŷ = 17.3 (Minitab reports 17.285), which gives a standard score of about −1. (e) It appears that the standard score of the predicted value is the same as the explanatory variable’s standard score. (See note below.)
Note: (a) This will always be true because Σi ŷi = Σi (b0 + b1xi) = n·b0 + b1·Σi xi = n(ȳ − b1x̄) + b1·n·x̄ = n·ȳ. (b) The standard deviation of the predicted values will be sŷ = |r|·sy; in this case, sŷ = (0.8167)(4.714). To see this, observe that the variance of the predicted values is (1/(n − 1)) Σi (ŷi − ȳ)² = (1/(n − 1)) Σi (b1xi − b1x̄)² = b1²sx² = r²sy². (e) For a given standard score z, note that ŷ = b0 + b1(x̄ + z·sx) = ȳ − b1x̄ + b1x̄ + b1·z·sx = ȳ + z·r·sy. If r > 0, the standard score for ŷ equals z; if r < 0, the standard score is −z.
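A small Python check of the Note above—for any simple linear regression, the fitted values have mean ȳ and standard deviation |r|·sy. The data here are simulated (numpy only), not the SAT/ACT data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(900, 180, size=60)
    y = 1.6 + 0.021 * x + rng.normal(0, 3, size=60)   # arbitrary example data

    b1, b0 = np.polyfit(x, y, 1)
    fitted = b0 + b1 * x
    r = np.corrcoef(x, y)[0, 1]

    print(np.isclose(fitted.mean(), y.mean()))                      # True
    print(np.isclose(fitted.std(ddof=1), abs(r) * y.std(ddof=1)))   # True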
(a) The slope of the new line is a1 = sy/sx = 4.7137/180.1 = 0.02617, and the intercept is a0 = ȳ − a1x̄ = −2.7522. (b) The new line is dashed. (c) For example, the first prediction is −2.7522 + (0.02617)(1000) = 23.42. Up to rounding error, the mean and standard deviation of the predicted scores are the same as those of the ACT scores: ȳ = 21.13 and sy = 4.7137.
Note: The usual least-squares line minimizes the total squared vertical distance from the points to the line. If instead we seek to minimize the total of Σi |hi·vi|, where hi is the horizontal distance and vi is the vertical distance, we obtain the line ŷ = a0 + a1x—except that we must choose the sign of a1 to be the same as the sign of r. (It would hardly be the “best line” if we had a positive slope with a negative association.) If r = 0, either sign will do.

[Scatterplot of ACT versus SAT with the least-squares line (solid) and the new line (dashed).]
[Scatterplots of weight (100 g) versus length (cm), width (cm), length² (cm²), and width² (cm²).]
10.56. (a) The regression line is WEIGHT = −115.10 + 3.1019 (LENGTH)(WIDTH), with s = 41.69 and r² = 0.986. (b) As measured by r², this model explains 98.6% of the variation in weight.

[Scatterplot of weight (100 g) versus length times width (cm²), with the regression line.]
Minitab output: Regression of weight on length*width
The regression equation is weight = -115 + 3.10 lenwid
Predictor Coef Stdev t-ratio p
Constant -115.10 21.87 -5.26 0.000
lenwid 3.1019 0.1179 26.32 0.000
s = 41.69 R-sq = 98.6% R-sq(adj) = 98.4%
10.61. (a) These intervals (in the table below) overlap quite a bit. (b) These quantities can be computed from the data, but it is somewhat simpler to recall that they can be found from the sample standard deviations sx,w and sx,m:
sx,w √11 = 6.8684 √11 ≈ 22.78   and   sx,m √6 = 6.6885 √6 ≈ 16.38
The women's SEb1 is smaller in part because it is divided by a larger number. (c) In order to reduce SEb1 for men, we should choose our new sample to include men with a wider variety of lean body masses. (Note that just taking a larger sample will reduce SEb1; it is reduced even more if we choose subjects who will increase sx,m.)
b1 SEb1 df t∗ Interval
Women 24.026 4.174 10 2.2281 14.7257 to 33.3263
Men 16.75 10.20 5 2.5706 −9.4699 to 42.9699
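Each interval in this table is b1 ± t*·SEb1; a minimal Python sketch that reproduces both rows, with scipy supplying the t critical values:

Python sketch: confidence interval for a regression slope
from scipy import stats

def slope_ci(b1, se_b1, df, level=0.95):
    t_star = stats.t.ppf(0.5 + level / 2, df)    # t* for the given confidence level
    return b1 - t_star * se_b1, b1 + t_star * se_b1

print(slope_ci(24.026, 4.174, 10))   # women: about (14.726, 33.326)
print(slope_ci(16.75, 10.20, 5))     # men:   about (-9.470, 42.970)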
10.62. Scatterplots, and portions of the Minitab outputs, are shown below. The equations are:
For all points:                MPG = −7.796 + 7.8742 LOGMPH
For speed ≤ 30 mph:            MPG = −9.786 + 8.5343 LOGMPH
For fuel efficiency ≤ 20 mpg:  MPG = −4.282 + 6.6854 LOGMPH
Students might make a number of observations about the effects of the restrictions; for
example, the estimated coefficients (and their standard errors) change quite a bit.
[Figure: scatterplots of fuel efficiency (MPG) against log(speed in miles per hour) for the two restricted data sets (speed ≤ 30 mph and fuel efficiency ≤ 20 MPG)]
11.1. (a) The response variable is math GPA. (b) The number of cases is n = 106. (c) There
were p = 4 explanatory variables. (d) The explanatory variables were SAT Math, SAT
Verbal, class rank, and mathematics placement score.
11.2. (a) ŷ = −3.8 + 7.3(3) − 2.1(1) = 16. (b) No: We can compute predicted values for any
values of x1 and x2 . (Of course, it helps if they are close to those in the data set.) (c) This
is determined by the coefficient of x1 : An increase of two units in x1 results in an increase
of (7.3)(2) = 14.6 units in ŷ.
11.3. (a) The fact that the coefficients are all positive indicates that math GPA should increase
when any explanatory variable increases (as we would expect). (b) With n = 86 cases and
p = 4 variables, DFM = p = 4 and DFE = n − p − 1 = 81. (c) In the following table, each
t statistic is the estimate divided by the standard error; the P-values are computed from a t
distribution with df = 81. (The t statistic for the intercept was not required for this exercise,
but is included for completeness.)
Variable Estimate SE t P
Intercept −0.764 0.651 −1.1736 0.2440
SAT Math 0.00156 0.00074 2.1081 0.0381
SAT Verbal 0.00164 0.00076 2.1579 0.0339
HS rank 1.470 0.430 3.4186 0.0010
Bryant placement 0.889 0.402 2.2114 0.0298
All four coefficients are significantly different from 0 (although the intercept is not).
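The t statistics and P-values in this table are simply estimate/SE referred to a t(81) distribution; a short Python sketch that reproduces them:

Python sketch: t statistics and two-sided P-values for the coefficient table
from scipy import stats

df = 81   # n - p - 1 = 86 - 4 - 1
coefs = {"Intercept": (-0.764, 0.651), "SAT Math": (0.00156, 0.00074),
         "SAT Verbal": (0.00164, 0.00076), "HS rank": (1.470, 0.430),
         "Bryant placement": (0.889, 0.402)}
for name, (est, se) in coefs.items():
    t = est / se
    p = 2 * stats.t.sf(abs(t), df)
    print(f"{name:18s} t = {t:7.4f}   P = {p:.4f}")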
11.5. The correlations are found in Figure 11.4 and are summarized in the table below. Of the 15 possible scatterplots to be made from these six variables, three are shown below as examples. The pairs with the largest correlations are generally easy to pick out. The whole-number scale for high school grades causes point clusters in those scatterplots and makes it difficult to determine the strength of the association. For example, in the plot of HSS versus HSE below, the circled point represents 9 of the 224 students. One might guess that these three scatterplots show relationships of roughly equal strength, but because of the overlapping points, the correlations are quite different; from left to right, they are 0.2517, 0.4365, and 0.5794.
        SATM     SATV     HSM      HSS      HSE
GPA     0.2517   0.1145   0.4365   0.3294   0.2890
SATM             0.4639   0.4535   0.2405   0.1083
SATV                      0.2211   0.2617   0.2437
HSM                                0.5757   0.4469
HSS                                         0.5794
[Figure: scatterplots of SATM against GPA, HSM against GPA, and HSS against HSE]
11.6. The regression equation is given in the Minitab output below. The whole-number scale
for high school grades means that the predicted values also come in clusters. All but 21
students had both HSM and HSE above 5, so for all three plots, there are few residuals on
the left half.
[Figure: plots of the residuals against HSM, HSE, and the predicted values]
11.7. The table below gives two sets of answers: those found with critical values from Table D
and those found with software. In each case, the estimated coefficient is b1 = 6.4 with
standard error SEb1 = 3.1, and the margin of error is t ∗ SEb1 , with df = n − 3 for parts (a)
and (b), and df = n − 4 for parts (c) and (d). (The Table D interval for part (d) uses
df = 100.)
              Table D                            Software
 n     df    t*      Interval                    t*       Interval
(a)  27   24   2.064   0.0016 to 12.7984        2.0639   0.0019 to 12.7981
(b)  53   50   2.009   0.1721 to 12.6279        2.0086   0.1735 to 12.6265
(c)  27   23   2.069   −0.0139 to 12.8139       2.0687   −0.0128 to 12.8128
(d) 124  120   1.984   0.2496 to 12.5504        1.9799   0.2622 to 12.5378
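The "Software" columns can be reproduced directly (up to rounding); a minimal Python sketch:

Python sketch: the software intervals in the table above
from scipy import stats

b1, se = 6.4, 3.1
for label, df in [("(a)", 24), ("(b)", 50), ("(c)", 23), ("(d)", 120)]:
    t_star = stats.t.ppf(0.975, df)
    print(f"{label} df = {df:3d}  t* = {t_star:.4f}  "
          f"{b1 - t_star * se:.4f} to {b1 + t_star * se:.4f}")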
11.8. For all four settings, the test statistic is t = b1/SEb1 = 6.4/3.1 ≈ 2.065, with df = n − 3
for parts (a) and (b) and df = n − 4 for parts (c) and (d). The P-values are 0.0499, 0.0442,
0.0504, and 0.0411. At the 5% significance level, we would reject the null hypothesis for
each test except (c); the test is barely significant for (a), and barely not significant for (c).
(This is consistent with the confidence intervals from the previous exercise.)
11.9. (a) H0 should refer to β2 (the population coefficient) rather than b2 (the estimated
coefficient). (b) This sentence should refer to the squared multiple correlation. (c) A small
P implies that at least one coefficient is different from 0.
11.10. (a) Multiple regression only assumes Normality of the error terms (residuals), not
the explanatory variables. (The explanatory variables do not even need to be random
variables.) (b) A small P-value tells us that the model is significant (useful for prediction)
but does not measure its explanatory power (the accuracy of those predictions). The squared
multiple correlation R 2 is a measure of explanatory power. (c) For example, if x1 and x2 are
significantly correlated with each other and with the response variable, it might turn out that
the coefficient of x1 is statistically significant and the coefficient of x2 is not. (d) R is not
the average correlation; if it were, adding additional variables might make R closer to 0. R² tells us the total explanatory power of the entire model.
Note: The statement for part (c) is a paraphrase of the “Caution” on page 602 of the
text. As a simple illustration of how this might happen, suppose that the response variable
y = ax1 + b (with little or no error term), where all observed values of x1 are positive, and
the second explanatory variable is x2 = x12 . The correlation between y and x2 might be very
large, but in a multiple regression model with x1 , the coefficient of x2 will almost certainly
not be significant.
11.13. We have p = 8 explanatory variables and n = 795 observations. (a) The ANOVA F test
has degrees of freedom DFM = p = 8 and DFE = n − p − 1 = 786. (b) This model explains only R² ≈ 7.84% of the variation in energy-drink consumption; it is not very predictive.
(c) A positive (negative) coefficient means that large values of that variable correspond to
higher (lower) energy-drink consumption. Therefore, males and Hispanics consume energy
drinks more frequently, and consumption increases with risk-taking scores. (d) Within a
group of students with identical (or similar) values of those other variables, energy-drink
consumption increases with increasing jock identity and increasing risk taking.
11.14. No (or at least, not necessarily). It is possible that, although no individual coefficient
is significant, the whole group (or some subset) is. Recall that the t tests “assess the
significance of each predictor variable assuming that all other predictors are included in the
regression equation.” If one variable is removed from the model (because its t statistic is
not significant), we can no longer use the other t statistics to draw conclusions about the
remaining coefficients.
table on the right. (d) The relationship is still positive after adjusting for RB. When gene ex-
pression increases by 1, popularity increases by 0.204 in Model 1, and by 0.161 in Model 2
(with RB fixed).
11.16. (a) All three correlations are quite high: year and tornado count (0.8095), population and tornado count (0.8180), and year and population (0.9981). The solution to Exercise 10.19 shows a scatterplot of tornadoes versus year; the other two scatterplots are shown on the following page. Because of the near-perfect linear relationship between year and population, the plot of tornadoes versus population looks nearly identical to the plot of tornadoes versus year (apart from horizontal scale). (b) The regression equation is COUNT = 63677 − 33.91 YEAR + 0.0191 CENSUS. (Minitab output on the following page.)
(c) The only cause for concern in the analysis is the extremely high count from 2004, which
is visible in all the plots. The plots versus year and versus population are nearly identical,
apart from scale; neither plot shows any striking patterns. The Normal quantile plot (along
with a stemplot of the residuals) suggests no serious deviations from Normality. (d) To
look for a linear increase over time, we test H0: β1 = 0 versus Ha: β1 ≠ 0, where β1 is the coefficient of YEAR in our model. The test statistic is t = −1.46 (P = 0.149), so we cannot
reject H0 . With population included, the predictive information in year is made redundant.
(That is, once we know the population, the additional information from year does not
appreciably improve our estimate of tornado count.)
[Figure: scatterplots of tornado count against year and against population (millions); plots of the residuals against year and against population; a stemplot of the residuals; and a Normal quantile plot of the residuals]
[Figure: plots of the residuals against (PA − 8.614) and (PA − 8.614)², with a stemplot and Normal quantile plot of the residuals]
11.18. (a) All distributions are skewed to the right (stemplots follow). Student choices of
summary statistics may vary; five-number summaries are a good choice because of the
skewness, but some may also give means and standard deviations. Notice especially how the
skewness is apparent in the five-number summaries.
Variable x s Min Q1 M Q3 Max
Total billing 8.2524 6.7441 1.6 2.85 6.7 11.35 29.5
Number of architects 10.5714 8.9026 1.0 5.00 7.0 16.50 39.0
Number of engineers 6.8095 10.7964 0.0 0.00 2.0 9.00 36.0
Number of staff 59.9048 55.8891 9.0 15.50 58.0 71.00 240.0
(b) Correlation coefficients are given with the scatterplots (below). All pairs of variables are positively correlated. (c) The regression equation is Billing = 0.7799 + 0.0143 Arch − 0.1364 Eng + 0.1377 Staff, and the standard error is s = 1.935. (Minitab output below.) (d) The plots of residuals versus the explanatory variables (not shown) reveal no particular causes for concern. A stemplot of the residuals (below) is somewhat right-skewed; this can also be seen in a Normal quantile plot (not shown). (e) The predicted billing for HCO is 3.028 (million dollars).
[Stemplots of total billing, number of architects, number of engineers, number of staff, and the residuals]
[Figure: scatterplots of total billing against number of architects (r = 0.7841), number of engineers (r = 0.8194), and number of staff (r = 0.9587), and of number of architects against number of engineers (r = 0.4569), number of architects against number of staff (r = 0.7579), and number of engineers against number of staff (r = 0.9018)]
Minitab output: Regression of total billing on numbers of architects, engineers, and staff
TotalBil = 0.780 + 0.014 N_Arch - 0.136 N_Eng + 0.138 N_Staff
Predictor Coef Stdev t-ratio p
Constant 0.7799 0.7126 1.09 0.289
N_Arch 0.0143 0.1252 0.11 0.910
N_Eng -0.1364 0.1558 -0.88 0.393
N_Staff 0.13773 0.04104 3.36 0.004
s = 1.935 R-sq = 93.0% R-sq(adj) = 91.8%
Prediction for 3 architects, 1 engineer, 17 staff members
Fit Stdev.Fit 95.0% C.I. 95.0% P.I.
3.028 0.566 ( 1.833, 4.223) ( -1.227, 7.283)
11.19. (a) In the two scatterplots (below), we see a moderate positive linear relationship for small banks. For large banks, the relationship is very weak. (b) For small banks, Wages = 35.9 + 0.1042 LOS, with R² ≈ 46.6% and s ≈ 7.026. (c) For large banks, Wages = 49.5 + 0.0560 LOS, with R² ≈ 3.5% and s ≈ 13.02. (d) The large-bank regression is not significant (nor is it useful for prediction).
[Figure: scatterplots of wages against length of service for small banks and for large banks]
11.20. (a) Note that most statistical software provides a way to define new variables like this.
(b) The regression equation is Wages = 35.9 + 0.1042 LOS + 13.6 SIZE1 − 0.0483 LOSSIZE1.
(c) The intercept and coefficient of LOS in this equation are the same as in the small-bank
regression from the previous exercise. (d) Up to rounding error, these two sums equal the
intercept and coefficient of LOS in the large-bank regression: Adding the intercept and SIZE1
coefficient gives 35.9 + 13.6 = 49.5, and adding the LOS and LOSSIZE1 coefficients gives
0.1042 − 0.0483 = 0.0559. (e) Large banks (SIZE1 = 1) have a larger intercept, suggesting
that, on the average, they offer higher starting wages (for employees with LOS = 0).
However, they also have a smaller slope, meaning that (on the average) wages increase
faster at smaller banks.
11.21. (a) Budget and Opening are right-skewed; Theaters and Opinion are roughly symmetric
(slightly left-skewed). Five-number summaries are best for skewed distributions, but all
possible numerical summaries are given here.
Variable x s Min Q1 M Q3 Max
Budget 61.81 52.47 6.5 20.0 45.0 85.0 185.0
Opening 28.59 31.89 1.1 10.0 18.6 32.1 158.4
Theaters 2785 921 808.0 2123.0 2808.0 3510.0 4366.0
Opinion 6.440 1.064 3.6 5.9 6.6 7.0 8.9
A worthwhile observation is that for all four variables, the maximum observation comes
from The Dark Knight. (b) Correlation coefficients are given with the scatterplots (below).
All pairs of variables are positively correlated. The Budget/Theaters and Opening/Theaters
relationships appear to be curved; the others are reasonably linear.
[Figure: scatterplots of U.S. revenue ($millions) against budget ($millions), opening-weekend revenue ($millions), theaters, and opinion]
Minitab output: Regression of U.S. Revenue on budget, opening, theaters, and opinion
USRevenue = - 67.7 + 0.135 Budget + 3.02 Opening - 0.00223 Theaters
+ 10.3 Opinion
Predictor Coef Stdev t-ratio p
Constant -67.72 24.14 -2.81 0.009
Budget 0.13511 0.09776 1.38 0.177
Opening 3.0165 0.1461 20.65 0.000
Theaters -0.002229 0.005299 -0.42 0.677
Opinion 10.262 3.032 3.38 0.002
s = 15.69 R-sq = 98.1% R-sq(adj) = 97.8%
[Stemplot of the residuals and plot of the residuals against theaters]
11.25. (a) Using the full model, the 95% prediction interval is $86.86 to $154.91 million.
(b) With the reduced model, the interval is $89.93 to $155.00 million. (c) The intervals
are very similar; as we saw in the previous exercise, there is little additional predictive
information from the two variables we removed.
Note: According to http://www.imdb.com/title/tt0425061/business, the
actual U.S. revenue for Get Smart was $130.3 million.
Minitab output: Predicting U.S. revenue for Get Smart (full model)
Fit Stdev.Fit 95.0% C.I. 95.0% P.I.
120.89 5.58 ( 109.48, 132.29) ( 86.86, 154.91)
Predicting U.S. revenue for Get Smart (reduced model)
Fit Stdev.Fit 95.0% C.I. 95.0% P.I.
122.46 2.86 ( 116.64, 128.29) ( 89.93, 155.00)
11.26. (a) The two films are Yes Man and Hancock, which made (respectively) $36.7 and
$34.2 million more than predicted. (The easiest way to find these two movies is to find the
two largest residuals of the reduced-model regression.) (b) With those movies removed, the
regression equation is
USRevenue = −75.72 + 3.1038 Opening + 11.112 Opinion
The coefficients are significant (t = 39.15 and t = 4.75, both with P < 0.0005). (c) Both
coefficients decreased slightly, meaning that a change in either variable makes a slightly smaller change in the predicted revenue. Another observation: R² is slightly larger for this
regression (98.6% versus 97.9%). This does not mean that this regression is better; rather,
removing the outliers means that there is less variation to explain. (d) A stemplot and
quantile plot (below) suggest no reasons for concern.
Minitab output: Regression of U.S. revenue on opening and opinion (outliers removed)
USRevenu = - 75.7 + 3.10 Opening + 11.1 Opinion
Predictor Coef Stdev t-ratio p
Constant -75.72 14.44 -5.24 0.000
Opening 3.10379 0.07928 39.15 0.000
Opinion 11.112 2.341 4.75 0.000
s = 13.17 R-sq = 98.6% R-sq(adj) = 98.5%
[Stemplot and Normal quantile plot of the residuals]
11.27. (a) The PEER distribution is left-skewed; the other two distributions are irregular
(stemplots below). Student choices of summary statistics may vary; both five-number
summaries and means/standard deviations are given below. (b) Correlation coefficients are
given below the scatterplots. PEER and FtoS are negatively correlated, FtoS and CtoF are
positively correlated, and the other correlation is very small.
Variable x s Min Q1 M Q3 Max
Peer review score 79.60 18.37 39 61 85 97 100
Faculty/student ratio 61.88 28.23 18 29 67 89 100
Citations/faculty ratio 63.84 25.23 17 40 66 86 100
[Figure: scatterplots of faculty/student ratio against peer review score (r = −0.1143), citation/faculty ratio against peer review score (r = 0.0045), and citation/faculty ratio against faculty/student ratio (r = 0.5801)]
11.28. (a) All three scatterplots are on the following page. The plot versus peer review score is
much more linear than the other two. (b) The correlations are given below the scatterplots.
Not surprisingly, the PEER correlation is greatest.
Note: The fact that the scatterplots do not all suggest linear associations does not mean
that a multiple regression is inappropriate. Even if the data exactly fit a multiple regression
model, the pairwise scatterplots will not necessarily appear to be linear.
[Figure: scatterplots of overall score against peer review score (r = 0.8073), faculty/student ratio (r = 0.0637), and citation/faculty ratio (r = 0.2691)]
11.29. (a) The model is OVERALLi = β0 + β1 PEERi + β2 FtoSi + β3 CtoFi + εi, where the εi are independent N(0, σ) random variables. (b) The regression equation is:
OVERALL = 18.85 + 0.5746 PEER + 0.0013 FtoS + 0.1369 CtoF
(c) For the confidence intervals, take bi ± t* SEbi, with t* = 1.9939 (for df = 71). These intervals have been added to the Minitab output below. The second interval contains 0, because that coefficient is not significantly different from 0. (d) The regression explains R² ≈ 72.2% of the variation in overall score. The estimate of σ is s ≈ 7.043.
11.30. (a) Between GPA and IQ, r ≈ 0.634 (so straight-line regression explains r² ≈ 40.2% of the variation in GPA). Between GPA and self-concept, r ≈ 0.542 (so regression explains r² ≈ 29.4% of the variation in GPA). Since gender is categorical, the correlation between GPA and gender is not meaningful. (b) The model is GPAi = β0 + β1 IQi + β2 SCi + εi, where the εi are independent N(0, σ) random variables. (c) Regression gives the equation GPA = −3.88 + 0.0772 IQ + 0.0513 SC. Based on the reported value of R², the regression explains 47.1% of the variation in GPA. (So adding self-concept to the model only adds about 6.9% to the variation explained by the regression.) (d) We test H0: β2 = 0 versus Ha: β2 ≠ 0. The test statistic t = 3.14 (df = 75) has P = 0.002; we conclude that the coefficient of self-concept is not 0.
11.31. (a) All distributions are skewed to varying degrees—GINI and CORRUPT to the right,
the other three to the left. CORRUPT, DEMOCRACY, and LIFE have the most skewness.
Student choices of summary statistics may vary; five-number summaries are a good choice
because of the skewness, but some may also give means and standard deviations.
Variable x s Min Q1 M Q3 Max
LSI 6.2597 1.2773 2.5 5.5 6.1 7.35 8.2
GINI 37.9399 8.8397 24.70 32.65 35.95 42.750 60.10
CORRUPT 4.8861 2.4976 1.7 2.85 4.0 7.3 9.7
DEMOCRACY 4.2917 1.6799 0.5 3.0 5.0 5.5 6.0
LIFE 71.9450 9.0252 44.28 70.39 73.16 78.765 82.07
Notice especially how the skewness is apparent in the five-number summaries.
(b) Correlation coefficients are given below the scatterplots. GINI is negatively (and weakly)
correlated to the other four variables, while all other correlations are positive and more
substantial (0.533 or more).
[Stemplots of LSI, GINI, CORRUPT, DEMOCRACY, and LIFE, and the ten pairwise scatterplots; the correlations shown with the plots are summarized below.]
             GINI       CORRUPT    DEMOCRACY    LIFE
LSI         −0.1394     0.7218      0.6084      0.8073
GINI                   −0.3845     −0.1875     −0.3828
CORRUPT                             0.7271      0.6559
DEMOCRACY                                       0.5331
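A correlation display like the one summarized above is a one-line computation in most packages. A minimal Python sketch, assuming a data file and column names (both hypothetical here) for the five variables:

Python sketch: pairwise correlation matrix
import pandas as pd

# Hypothetical file and column names standing in for the exercise's data set.
happiness = pd.read_csv("happiness.csv")
print(happiness[["LSI", "GINI", "CORRUPT", "DEMOCRACY", "LIFE"]].corr().round(4))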
that same information but contributes it more efficiently than does DEMOCRACY. Recall
from the previous solution that DEMOCRACY and CORRUPT had correlation 0.7271.)
Shown on the next page are stemplots of the residuals for all four regressions; the first
distribution is clearly skewed, but the other three show no severe deviations from Normality.
A full analysis of the residuals for each regression would require a total of 10 scatterplots;
shown are six of these plots which suggest possible problems with the assumptions. The
first five show signs of non-constant standard deviations, and the last shows a hint of
curvature.
[Figure: residual plots for Models 3 and 4 against LIFE, DEMOCRACY, and CORRUPT]
11.33. (a) The coefficients, standard errors, t statistics, and P-values are given in the Minitab
output shown with the solution to the previous exercise. (b) Student observations will vary.
For example, the t statistic for the GINI coefficient grows from t = −1.18 (P = 0.243) to
t = 3.92 (P < 0.0005). The DEMOCRACY t is 3.27 in the third model (P < 0.0005) but
drops to 0.60 (P = 0.552) in the fourth model. (c) A good choice is to use GINI, LIFE, and
CORRUPT (Minitab output on the following page). All three coefficients are significant, and
R 2 = 77.3% is nearly the same as the fourth model from previous exercise. However, a
scatterplot of the residuals versus CORRUPT (not shown) still looks quite a bit like the final
scatterplot shown in the previous solution, suggesting a slightly curved relationship, which
would violate the assumptions of our model.
11.34. (a) Stemplots (below) show that all four variables are right-skewed to some degree.
Variable x s Min Q1 M Q3 Max
VO+ 985.806 579.858 285.0 513.0 870.0 1251.0 2545.0
VO– 889.194 427.616 254.0 536.0 903.0 1028.0 2236.0
OC 33.416 19.610 8.1 17.9 30.2 47.7 77.9
TRAP 13.248 6.528 3.3 8.8 10.3 19.0 28.8
(b) Correlations and scatterplots (below) show that all six pairs of variables are positively
associated. The strongest association is between VO+ and VO– and the weakest is between
OC and VO–.
[Figure: scatterplots of VO+ against VO−, OC, and TRAP, and of VO− against OC, VO− against TRAP, and TRAP against OC]
11.36. (a) The regression equation is VO+ = 334.0 + 19.505 OC, with s ≈ 443.3 and R² ≈ 0.435; the test statistic for the slope is t = 4.73 (P < 0.0005), so we conclude the slope is not zero. The plot of residuals against OC suggests a slight downward curve on the right end, as well as increasing scatter as OC increases. The residuals are also somewhat right-skewed. A stemplot and Normal quantile plot of the residuals are not shown here but could be included as part of the analysis.
[Figure: plot of the residuals against OC]
(b) The regression equation is VO+ = 57.7 + 6.415 OC + 53.87 TRAP, with s ≈ 376.3 and R² ≈ 0.607. The coefficient of OC is not significantly different from 0 (t = 1.25, P = 0.221), but the coefficient of TRAP is (t = 3.50, P = 0.002). This is consistent with the correlations found in the solution to Exercise 11.34: TRAP is more highly correlated with VO+, and is also highly correlated with OC, so it is reasonable that, if TRAP is present in the model, little additional information is gained from OC.
Model                                                        R²      s
1  VO+ = 334.0 + 19.505 OC                                   0.435   443.3
   (OC: SE = 4.127, t = 4.73, P < 0.0005)
2  VO+ = 57.7 + 6.415 OC + 53.87 TRAP                        0.607   376.3
   (OC: SE = 5.125, t = 1.25, P = 0.221; TRAP: SE = 15.39, t = 3.50, P = 0.002)
3  VO+ = −243.5 + 8.235 OC + 6.61 TRAP + 0.975 VOminus       0.884   207.8
   (OC: SE = 2.840, t = 2.90, P = 0.007; TRAP: SE = 10.33, t = 0.64, P = 0.528;
    VOminus: SE = 0.1211, t = 8.05, P < 0.0005)
4  VO+ = −234.1 + 9.404 OC + 1.019 VOminus                   0.883   205.6
   (OC: SE = 2.150, t = 4.37, P < 0.0005; VOminus: SE = 0.0986, t = 10.33, P < 0.0005)
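The four models in this table can be fit with any multiple regression routine. A minimal sketch using statsmodels, with a hypothetical file name and column names (VOplus, VOminus, OC, TRAP) standing in for the actual data set:

Python sketch: fitting the four candidate models
import pandas as pd
import statsmodels.formula.api as smf

bones = pd.read_csv("bone_markers.csv")   # hypothetical file and column names
for formula in ["VOplus ~ OC",
                "VOplus ~ OC + TRAP",
                "VOplus ~ OC + TRAP + VOminus",
                "VOplus ~ OC + VOminus"]:
    fit = smf.ols(formula, data=bones).fit()
    # fit.params, fit.bse, fit.tvalues, fit.pvalues give the coefficient details;
    # print R-squared and s (the residual standard error) as in the table above.
    print(formula, round(fit.rsquared, 3), round(fit.mse_resid ** 0.5, 1))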
11.37. Stemplots (below) show that all four variables are noticeably less skewed.
Variable x s Min Q1 M Q3 Max
LVO+ 6.7418 0.5555 5.652 6.240 6.768 7.132 7.842
LVO– 6.6816 0.4832 5.537 6.284 6.806 6.935 7.712
LOC 3.3380 0.6085 2.092 2.885 3.408 3.865 4.355
LTRAP 2.4674 0.4978 1.194 2.175 2.332 2.944 3.360
Correlations and scatterplots (on the following page) show that all six pairs of variables are
positively associated. The strongest association is between LVO+ and LVO– and the weakest
is between LOC and LVO–. The regression equations for these transformed variables are
given in the table below, along with significance test results. Residual analysis for these
regressions is not shown.
The final conclusion is the same as for the untransformed data: When we use all three
explanatory variables to predict LVO+, the coefficient of LTRAP is not significantly different
from 0 and we then find that the model that uses LOC and LVO– to predict LVO+ is nearly
as good (in terms of R 2 ), making it the best of the bunch.
[Figure: pairwise scatterplots of the log-transformed variables log(VO+), log(VO−), log(OC), and log(TRAP)]
11.38. Refer to the solution to Exercise 11.34 for the scatterplots. Note that, in this case, it
really makes the most sense to use TRAP (rather than OC) to predict VO– (because it is the
appropriate biomarker), but many students might miss that detail. Both single-explanatory
variable models are given in the first table on the following page. Residual analysis plots are
not included. Our conclusion here is similar to the conclusion in Exercises 11.36 and 11.37:
The best model is to use OC and VO+ to predict VO–.
11.39. Refer to the solution to Exercise 11.37 for the scatterplots. As in the previous exercise,
the more logical single-variable model would be to use LTRAP to predict LVO–, but many
students might miss that detail. Both single-explanatory variable models are given in the
second table on the following page. Residual analysis plots are not included. This time, we
might conclude that the best model is to predict LVO– from LVO+ alone; neither biomarker
variable makes an indispensable contribution to the prediction.
11.40. (a) Histograms are below; all distributions are sharply right-skewed.
Variable x s Min Q1 M Q3 Max
PCB 68.4674 59.3906 6.0996 29.8305 47.9596 91.7140 318.746
PCB52 0.9580 1.5983 0.0200 0.2180 0.4770 0.8925 9.060
PCB118 3.2563 3.0191 0.2360 1.4800 2.4200 3.8950 18.900
PCB138 6.8268 5.8627 0.6400 2.9700 4.9200 8.7150 32.300
PCB180 4.1584 4.9864 0.3950 1.1950 2.6900 4.5900 31.500
(b) Scatterplots and correlations are on the following page. All pairs of variables are positively associated, although some only weakly. In general, even when the association is strong, the plots show more variation for large values of the two variables. If we test H0: ρ = 0 versus Ha: ρ ≠ 0 for these correlations, we find that P < 0.0005 for eight of them. The PCB52/PCB138 correlation is less significant (r ≈ 0.3009, t = 2.58, P = 0.0120), and the PCB52/PCB180 correlation is not significantly different from 0 (r ≈ 0.0869, t = 0.71, P = 0.4775).
[Figure: histograms of PCB, PCB52, PCB118, PCB138, and PCB180, and the pairwise scatterplots. The correlations visible with the plots are PCB/PCB52 = 0.5964; PCB52/PCB118 = 0.6849, PCB52/PCB138 = 0.3009, PCB52/PCB180 = 0.0869; PCB118/PCB138 = 0.7294, PCB118/PCB180 = 0.4374; and PCB138/PCB180 = 0.8823.]
with s = 6.382 and R 2 = 0.989. All coefficients are significantly different from 0, although
the constant 0.937 is not (t = 0.76, P = 0.449). That makes some sense—if none of these
four congeners are present, it might be somewhat reasonable to predict that the total amount
of PCB is 0. (c) The residuals appear to be roughly Normal, but with two outliers. There are
no clear patterns when plotted against the explanatory variables (these plots are not shown).
11.44. (a) Because TEQ is defined as the sum TEQPCB + TEQDIOXIN + TEQFURAN, we have
β0 = 0 and β1 = β2 = β3 = 1. (b) The error terms are all zero, so they have no scatter;
therefore, σ = 0. (c) Results will vary slightly with software, but except for rounding error,
the regression confirms the values in parts (a) and (b).
with s = 0.9576 and R 2 = 0.677. Only the constant and the PCB118 coefficient are sig-
nificantly different from 0; see Minitab output below. Residuals (stemplot on the right) are
slightly right-skewed and show no clear patterns when plotted with the explanatory variables
(not shown).
11.46. (a) Results will vary with software. (b) Different software may produce different results, but (presumably) all software will ignore those 16 specimens, which is probably not a good approach. (c) The summary statistics and stemplots (below) are based on natural logarithms; for common logarithms, divide mean and standard deviation by 2.3026. For LPCB126, the zero terms were replaced with 0.0026, and ln 0.0026 ≈ −5.9522, which accounts for the odd appearance of its stemplot.
            x̄         s
LPCB28     −1.3345    1.1338
LPCB52     −0.7719    1.1891
LPCB118     0.8559    0.8272
LPCB126    −4.8457    0.7656
LPCB138     1.6139    0.8046
LPCB153     1.7034    0.9012
LPCB180     0.9752    0.9276
LPCB        3.9170    0.8020
LTEQ        0.8048    0.5966
11.47. (a) The correlations (all positive) are listed in the table below. The largest correlation
is 0.956 (LPCB and LPCB138); the smallest (0.227, for LPCB28 and LPCB180) is not
quite significantly different from 0 (t = 1.91, P = 0.0607) but, with 28 correlations, such a P-value could easily arise by chance, so we would not necessarily conclude that ρ ≠ 0.
Rather than showing all 28 scatterplots—which are all fairly linear and confirm the positive
associations suggested by the correlations—we have included only two of the interesting
ones: LPCB against LPCB28 and LPCB against LPCB126. The former is notable because of
one outlier (specimen 39) in LPCB28; the latter stands out because of the “stack” of values
in the LPCB126 data set that arose from the adjustment of the zero terms. (The outlier in
LPCB28 and the stack in LPCB126 can be seen in other plots involving those variables; the
two plots shown are the most appropriate for using the PCB congeners to predict LPCB, as
the next exercise asks.) (b) All correlations are higher with the transformed data. In part,
this is because these scatterplots do not exhibit the “greater scatter in the upper right” that
was seen in many of the scatterplots of the original data.
LPCB28 LPCB52 LPCB118 LPCB126 LPCB138 LPCB153 LPCB180
LPCB52 0.795
LPCB118 0.533 0.671
LPCB126 0.272 0.331 0.739
LPCB138 0.387 0.540 0.890 0.792
LPCB153 0.326 0.519 0.780 0.647 0.922
LPCB180 0.227 0.301 0.654 0.695 0.896 0.867
LPCB 0.570 0.701 0.906 0.729 0.956 0.905 0.829
[Figure: scatterplots of log(PCB) against log(PCB28) and against log(PCB126)]
11.48. Student results will vary with how many different models they try, and what tradeoff
they consider between “good” (in terms of large R 2 ) and “simple” (in terms of the number
of variables included in the model). The first Minitab output on the next page, produced
with the BREG (best regression) command, gives some guidance as to likely answers; it
shows the best models with one, two, three, four, five, six, and seven explanatory variables.
We can see, for example, that if all variables are used, R 2 = 0.975, but we can achieve
similar values of R 2 with fewer variables. The best regressions with two, three, and four
explanatory variables are shown in the Minitab output on the next page.
11.49. Using Minitab’s BREG (best regression) command for guidance, we see that there is
little improvement in R 2 beyond models with four explanatory variables. The best models
with two, three, and four variables are given in the Minitab output below.
11.50. The degree of change in these elements of a regression can be readily seen by
comparing the three regression results shown in the solution to Exercise 11.48; they will be
even more visible if students have explored more models in their search for the best model.
Student explanations might include observations of changes in particular coefficients from
one model to another and perhaps might attempt to paraphrase the text’s comments about
why this happens.
11.52. The plots show positive associations between the variables. The correlations and P-values are in the plots; all correlations are positive (as expected) and significantly different from 0. (Recall that the P-values are correct if the two variables are Normally distributed, in which case t = r√(n − 2)/√(1 − r²) has a t(n − 2) distribution if ρ = 0.)
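That test is easy to carry out directly. A minimal Python sketch; the example call uses r = 0.3009 from the solution to Exercise 11.40, with n = 69 (n is not stated in this excerpt, but is the sample size implied by the t and P quoted there):

Python sketch: t test for a correlation
import math
from scipy import stats

def corr_test(r, n):
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    return t, 2 * stats.t.sf(abs(t), n - 2)    # two-sided P-value

print(corr_test(0.3009, 69))   # about t = 2.58, P = 0.012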
[Figure: scatterplots of Taste against Acetic, H2S, and Lactic, and of Acetic against H2S, Acetic against Lactic, and H2S against Lactic]
[Figures for the simple linear regressions of Taste on Acetic, Taste on H2S, and Taste on Lactic: scatterplots with the fitted lines, and plots of the residuals against Acetic, H2S, and Lactic for each model]
that it does not add significantly to the model when H2S is used because Acetic and H2S
are correlated (in fact, r = 0.618 for these two variables). This model does a better job than
any of the three simple linear regression models, but it is not much better than the model
with H2S alone (which explained 57.1% of the variation in Taste)—as we might expect
from the t-test result.
12.1. (a) H0 says the population means are all equal. (b) Experiments are best for establishing
causation. (c) ANOVA is used to compare means (and assumes that the variances are equal).
(d) Multiple comparisons procedures are used when we wish to determine which means
are significantly different, but have no specific relations in mind before looking at the data.
(Contrasts are used when we have prior expectations about the differences.)
12.2. (a) If we reject H0 , we conclude that at least one mean is different from the rest.
(b) One-way ANOVA is used to compare two or more means. (When only two means are to be compared, we usually use a two-sample t test.) (c) Two-way ANOVA is used to examine the
effect of two explanatory variables (which have two or more values) on a response variable
(which is assumed to have a Normal distribution, meaning that it can take any value, at least
in theory).
12.3. We were given sample sizes n1 = 23, n2 = 20, and n3 = 28 and standard deviations s1 = 5, s2 = 5, and s3 = 6. (a) Yes: The guidelines for pooling standard deviations say that the ratio of largest to smallest should be less than 2; we have 6/5 = 1.2 < 2. (b) Squaring the three standard deviations gives s1² = 25, s2² = 25, and s3² = 36. (c) sp² = (22s1² + 19s2² + 27s3²)/(22 + 19 + 27) ≈ 29.3676. (d) sp = √sp² ≈ 5.4192.
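The same computation in code; a minimal Python sketch of the pooled standard deviation:

Python sketch: pooled standard deviation for one-way ANOVA
import math

def pooled_sd(ns, sds):
    num = sum((n - 1) * s**2 for n, s in zip(ns, sds))   # sum of (n_i - 1) s_i^2
    den = sum(n - 1 for n in ns)                          # N - I
    return math.sqrt(num / den)

print(round(pooled_sd([23, 20, 28], [5, 5, 6]), 4))   # 5.4192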
12.5. (a) This sentence describes between-group variation. Within-group variation is the
variation that occurs by chance among members of the same group. (b) The sums of squares
(not the mean squares) in an ANOVA table will add. (c) The common population standard
deviation σ (not its estimate sp ) is a parameter. (d) A small P means the means are not all
the same, but the distributions may still overlap quite a bit. (See the “Caution” immediately
preceding this exercise in the text.)
12.6. The answers are found in Table E (or using software) with p = 0.05 and degrees of
freedom I − 1 and N − I . (a) I = 4, N = 16, df 3 and 12: F > 3.49 (software: 3.4903).
(b) I = 4, N = 24, df 3 and 20: F > 3.10 (software: 3.0984). (c) I = 4, N = 32, df 3
and 28: F > 2.95 (software: 2.9467). (d) As the degrees of freedom increase, values from
an F distribution tend to be smaller (closer to 1), so smaller values of F are statistically
significant. In terms of ANOVA conclusions, we have learned that with smaller samples
(fewer observations per group), the F statistic needs to be fairly large in order to reject H0 .
12.7. Assuming the t (ANOVA) test establishes that the means are different, contrasts and multiple comparisons provide no further useful information. (With two means, there is only
one comparison to make, and it has already been made by the t test.)
12.8. (a) The stated hypothesis is µ50% < (1/2)(µ0% + µ100%), so we use the contrast ψ = (1/2)(µ0% + µ100%) − µ50%, with coefficients 0.5, −1, and 0.5. The hypotheses can then be stated H0: ψ = 0 versus Ha: ψ > 0. (b) The estimated contrast is c = (1/2)(50 + 120) − 75 = 10 cm³, with standard error SEc = sp√(0.25/40 + 1/40 + 0.25/40) ≈ 5.8095, so the test statistic is t ≈ 10/5.8095 ≈ 1.7213 with df = 117. The one-sided P-value is P = 0.0439, so this is significant at α = 0.05, but not at α = 0.01.
Note: We wrote the contrast so that it would be positive when Ha is true (in
keeping with the text’s advice). We could also test this hypothesis using the contrast
ψ = µ50% − 12 (µ0% + µ100% ), or even ψ = µ0% + µ100% − 2µ50% . The resulting t statistic
is the same (except possibly in sign) regardless of the way the contrast is stated.
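The contrast computations follow the usual pattern c = Σ aᵢ x̄ᵢ and SEc = sp√(Σ aᵢ²/nᵢ). A minimal Python sketch; sp = 30 is an assumption here, chosen because it reproduces the SEc = 5.8095 used above:

Python sketch: one-sided t test for a contrast
import math
from scipy import stats

def contrast_test(coefs, means, ns, sp, df):
    c = sum(a * m for a, m in zip(coefs, means))
    se = sp * math.sqrt(sum(a**2 / n for a, n in zip(coefs, ns)))
    t = c / se
    return c, se, t, stats.t.sf(t, df)    # one-sided P-value for Ha: psi > 0

print(contrast_test([0.5, -1, 0.5], [50, 75, 120], [40, 40, 40], 30, 117))
# c = 10, SE about 5.81, t about 1.72, P about 0.044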
12.11. (a) I = 3 and N = 33, so the degrees of freedom are 2 and 30. F = 127/50 = 2.54. Comparing to the F(2, 30) distribution in Table E, we find 2.49 < F < 3.32, so 0.050 < P < 0.100. (Software gives P ≈ 0.0957.) (b) I = 4 and N = 32, so the degrees of freedom are 3 and 28. F = (58/3)/(182/28) ≈ 2.9744. Comparing to the F(3, 28) distribution in Table E, we find 2.95 < F < 3.63, so 0.025 < P < 0.050. (Software gives P ≈ 0.0486.)
[Figure: F(2, 30) and F(3, 28) density curves with the observed statistics F = 2.54 and F = 2.97 marked]
12.12. (a) Yes: The guidelines for pooling standard deviations say that the ratio of largest to smallest should be less than 2; we have 42/28 = 1.5. (b) Squaring the three standard deviations gives s1² = 1369, s2² = 784, and s3² = 1764. (c) sp² = (28s1² + 31s2² + 120s3²)/(28 + 31 + 120) ≈ 1532.49. (d) sp = √sp² ≈ 39.15. (e) Because the third sample was nearly twice as large as the other two put together, the pooled standard deviation is closest to s3.
12.13. (a) Response: egg cholesterol level. Populations: chickens with different diets or drugs.
I = 3, n 1 = n 2 = n 3 = 25, N = 75. (b) Response: rating on five-point scale. Populations:
the three groups of students. I = 3, n 1 = 31, n 2 = 18, n 3 = 45, N = 94. (c) Response: quiz
score. Populations: students in each TA group. I = 3, n 1 = n 2 = n 3 = 14, N = 42.
12.14. (a) Response: time to complete VR path. Populations: children using different
navigation methods. I = 4, n i = 10 (i = 1, 2, 3, 4), N = 40. (b) Response:
calcium content of bone. Populations: chicks eating diets with differing pesticide levels.
I = 5, n i = 13 (i = 1, 2, 3, 4, 5), N = 65. (c) Response: total sales between 11:00 a.m.
and 2:00 p.m. Populations: customers responding to one of four sample offers. I = 4,
n i = 5 (i = 1, 2, 3, 4) and N = 20.
12.15. For all three situations, the hypotheses are H0 : µ1 = µ2 = µ3 versus Ha : at least one
mean is different. The degrees of freedom are DFG = DFM = I − 1 (“model” or “between
groups”), DFE = DFW = N − I (“error” or “within groups”), and DFT = N − 1 (“total”).
The degrees of freedom for the F test are DFG and DFE.
12.17. (a) This sounds like a fairly well-designed experiment, so the results should at least
apply to this farmer’s breed of chicken. (b) It would be good to know what proportion of
the total student body falls in each of these groups—that is, is anyone overrepresented in
this sample? (c) How well a TA teaches one topic (power calculations) might not reflect that
TA’s overall effectiveness.
12.18. (a) This sounds like a fairly well-designed experiment, assuming the subjects come
from a group which is representative of the population. (We assume that this teaching tool is
intended for use with children and that the children used in the experiment were themselves
deaf.) (b) This should at least give information about pesticide effect on bone calcium in
chicks. It might not apply to adult chickens, or other species of birds. (c) The results might
extend to similar sandwich shops, and perhaps to other times of day, or to weekend sales.
12.19. (a) With I = 3 and N = 120, we have df 2 and 117. (b) To use Table E, we compare
to df 2 and 100; with F > 5.02, we conclude that P < 0.001. Software gives P = 0.0003.
(c) Haggling and bargaining behavior is probably linked to the local culture, so we should
hesitate to generalize these results beyond similar informal shops in Mexico.
12.22. We have I = 4 groups with N = 620. With the given group means, the overall mean is
x̄ = (130 · 2.93 + 248 · 3.00 + 174 · 3.01 + 68 · 3.39)/N ≈ 3.0309
(a) DFG = I − 1 = 3 and DFE = N − I = 616. (b) The groups sum of squares is
SSG = 130(2.93 − x̄)² + 248(3.00 − x̄)² + 174(3.01 − x̄)² + 68(3.39 − x̄)² ≈ 10.4051
(c) F = MSG/MSE = (10.4051/3)/(797.25/616) ≈ 2.68. (d) Software gives P ≈ 0.0461, so we have enough evidence to reject H0 at the 5% significance level. (e) The mean for the “other” group appears to be higher than the means of the first three groups (which are similar).
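The arithmetic of parts (b) through (d) can be reproduced in a few lines; SSE = 797.25 is the value used in part (c):

Python sketch: one-way ANOVA F statistic from group sizes and means
from scipy import stats

ns, means = [130, 248, 174, 68], [2.93, 3.00, 3.01, 3.39]
N, I = sum(ns), len(ns)
xbar = sum(n * m for n, m in zip(ns, means)) / N
ssg = sum(n * (m - xbar)**2 for n, m in zip(ns, means))
F = (ssg / (I - 1)) / (797.25 / (N - I))
print(round(ssg, 4), round(F, 2), round(stats.f.sf(F, I - 1, N - I), 4))   # 10.4051, 2.68, 0.0461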
rearranged in any order.) (b) Yes; the largest-to-smallest standard deviation ratio is 40.39/26.09 ≈ 1.55. (c) The degrees of freedom are I − 1 = 2 and N − I = 44. From Table E (with df 2 and 40), we have 0.025 < P < 0.050; software gives P = 0.0427. The difference in means is (barely) significant at the 5% level.
[Figure: plot of average price for DSL, cable, and fiber providers]
The one-sided P-value is P = 0.0181, so this is significant at α = 0.05, but not at α = 0.01.
12.25. (a) Use matched pairs t methods; we examine the change in reaction time for each
subject. (b) No: We cannot use ANOVA methods because we do not have four independent
samples. (The same group of subjects performed each of the four tasks.)
12.26. We have x̄1 = 61.62, s1 = 13.75, n1 = 71, x̄2 = 46.47, s2 = 7.94, and n2 = 37. For the pooled t procedure, we find sp ≈ 12.09 and t = 6.18 (df = 106, P < 0.0001). The Minitab output below shows that F = 38.17 (= t², up to rounding error).
12.27. (a) With I = 4 and N = 2290, the degrees of freedom are DFG = I − 1 = 3 and DFE = N − I = 2286. (b) MSE = sp² = 4.6656, so F = MSG/MSE = 11.806/4.6656 ≈ 2.5304. (c) The F(3, 1000) entry in Table E gives 0.05 < P < 0.10; software gives P ≈ 0.0555.
sp² = 3(s1² + s2² + · · · + s5²)/(3 + 3 + 3 + 3 + 3) = 159/15 = 10.6
and sp = √10.6 ≈ 3.2558. (c) The degrees of freedom are DFG = I − 1 = 4 and DFE = N − I = 15. (d) Comparing to an F(4, 15) distribution in Table E, we see that 3.80 < F < 4.89, so 0.010 < P < 0.025; software gives P ≈ 0.0165. We have significant evidence that the means are not all the same.
[Figure: plot of the five treatment-group means (Placebo, Low A, High A, Low B, High B)]
average grade decreases as the number of accommodations increases.
[Figure: plot of mean grade against number of accommodations]
(b) Having too many decimal points is distracting; in this situation, no useful information is gained by having more than one or two digits after the decimal point. For example, the first mean and standard deviation would be more effectively presented as 2.79 and 0.85. (c) The largest-to-smallest SD ratio is slightly over 2 (about 2.009), so pooling is not advisable. (If we pool in spite of this, we find sp ≈ 0.8589.) (d) Eliminating data points (without a legitimate reason) is always risky, although we could run the analysis with and without them. Combining the last three groups would be a bad idea if the data suggested that grades rebounded after 2 accommodations (i.e., if the average grades were higher for 3 and 4 accommodations), but as that is not the case, lumping 2, 3, and 4 accommodations seems reasonable. (e) ANOVA is not appropriate for these data, chiefly because we do not have 245 independent observations. (f) There may be a number of local factors (for example, student demographics or teachers' attitudes toward accommodations) which affected grades; these effects might not be the same elsewhere. (g) One weakness is that we do not have a control group for comparison; that is, we cannot tell what grades these students (or a similar group) would have had without accommodations.
12.32. (a) The largest-to-smallest SD ratios are 2.84, 1.23, and 1.14, so the text's guidelines are satisfied for intensity and recall, but not for frequency. (b) As in the previous exercise, I = 5 and N = 410, so we use an F(4, 405) distribution. From the F(4, 200) distribution in Table E, we can conclude that P < 0.001 for all three response variables. With software, we find that the P-values are much smaller; all are less than 0.00002. We conclude that, for each variable, we have strong evidence that some group mean is different. (This conclusion is cautious in the case of frequency because of our concern about the standard deviations.) (c) The table below shows one way of summarizing the means. For each variable, it attempts to identify low (underlined), medium, and high (boldface) values of that variable. Hispanic Americans were higher than other groups for all four variables. Asian Americans were low for all variables (the lowest in all but global score). Japanese were low on all but global score, while European Americans and Indians were in the middle for all but global score. (d) The results might not generalize to, for example, subjects who are from different parts of their countries or not in a college or university community. (e) Create a two-way table with counts of men and women in each cultural group. The Minitab output below gives X² = 11.353, df = 4, and P = 0.023, so we have evidence (significant at α = 0.05) that the gender mix was not the same for all cultures. Specifically, Hispanic Americans and European Americans had higher percentages of women, which might further affect how much we can generalize the results.

Minitab output: Chi-square test
         Women     Men   Total
   1        38       8      46
         31.64   14.36
   2        22      11      33
         22.70   10.30
   3        57      34      91
         62.59   28.41
   4       102      58     160
        110.05   49.95
   5        63      17      80
         55.02   24.98
Total      282     128     410
ChiSq = 1.279 + 2.817 +
        0.021 + 0.047 +
        0.499 + 1.100 +
        0.589 + 1.297 +
        1.156 + 2.547 = 11.353
df = 4, p = 0.023
12.33. Because the descriptions of these contrasts do not specify an expected direction for the comparison, the subtraction could be done either way (in the order shown, or in the opposite order). (a) ψ1 = µ2 − (1/2)(µ1 + µ4). (b) ψ2 = (1/3)(µ1 + µ2 + µ4) − µ3.
12.34. Neither the descriptions in Exercise 12.33 nor the background information in Example 12.25 seem to give any indication of an expected direction for the contrasts, so we have given two-sided alternatives. If students give a one-sided alternative, they should explain why they did so. (a) With ψ1 = µ2 − (1/2)(µ1 + µ4) and ψ2 = (1/3)(µ1 + µ2 + µ4) − µ3, we test H0: ψi = 0 versus Ha: ψi ≠ 0 (for i = 1 or 2). (b) The estimated contrasts are c1 ≈ 0.195 and c2 ≈ 0.48. (c) The pooled estimate of the standard deviation sp is either 1.6771 or 1.6802 (see the note at the end of this solution), so SEc1 ≈ 0.3093 or 0.3098, and SEc2 ≈ 0.2929 or 0.2934. (d) Neither contrast is significantly different from 0 (with a two-sided alternative). For comparing brown eyes to the other colors, t1 ≈ 0.630 or 0.629, with df = 218, for which P ≈ 0.5290 or 0.5298. For gaze up versus gaze down, t2 ≈ 1.639 or 1.636, with df = 218, for which P ≈ 0.1026 or 0.1033. (e) The confidence intervals are ci ± t* SEci, where t* = 1.984 (Table D) or 1.971 (software). This gives roughly −0.41 to 0.80 for ψ1 and −0.10 to 1.06 for ψ2.
Note: The simplest way to find the pooled standard deviation sp is to use the value 1.6771 reported by SAS and Minitab in Figure 12.11 (or take √MSE from the Excel output). Some students might compute it by hand from the numbers given in Example 12.25, which gives 1.6802. The difference is due to rounding; note that the reported standard deviation for brown eyes should be 1.72 rather than 1.73. In the end, our conclusions are the same either way.
12.35. See the solution to Exercise 1.87 for stemplots. The means, standard deviations, and
standard errors (all in millimeters) are given below. We reject H0 and conclude that at least
one mean is different (F = 259.12, df 2 and 51, P < 0.0005).
12.36. (a) Below are summary statistics and a plot of means; side-by-side stemplots are on the following page. Students might also use five-number summaries to describe the data, but with small samples and relatively unskewed distributions, they give us little additional information. (b) ANOVA gives F ≈ 5.63 (df 2 and 49) and P ≈ 0.0063—strong evidence of a difference in means. (c) Because preference ratings are whole numbers, the underlying distributions cannot be Normal, but apart from that, the stemplots and summary statistics show no particular causes for concern. On the following page are stemplots of the residuals, which show the expected granularity (due to the ratings being whole numbers). With such small samples, it is difficult to make any further judgments about Normality. (d) The three test statistics are
t12 = (2.95 − 4.00) / (sp√(1/20 + 1/22)) ≈ −3.18,   t13 = (2.95 − 3.10) / (sp√(1/20 + 1/10)) ≈ −0.36,   t23 = (4.00 − 3.10) / (sp√(1/22 + 1/10)) ≈ 2.21
Results will vary with the method used and the overall significance level. Using the Bonferroni method with α = 0.05 (and three comparisons), we have t** ≈ 2.479, so only groups 1 and 2 are significantly different.

Group    n     x̄      s
1       20    2.95   0.945
2       22    4.00   0.926
3       10    3.10   1.524

[Figure: plot of average preference rating by group]
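A minimal Python sketch of the Bonferroni comparisons in part (d); sp ≈ 1.068 is not quoted in the solution, but is the pooled standard deviation implied by the summary statistics above:

Python sketch: Bonferroni pairwise comparisons
import math
from scipy import stats

means, ns = {1: 2.95, 2: 4.00, 3: 3.10}, {1: 20, 2: 22, 3: 10}
sp, dfe, alpha, k = 1.068, 49, 0.05, 3      # sp is an assumption (see lead-in)
t_star = stats.t.ppf(1 - alpha / (2 * k), dfe)   # Bonferroni critical value, about 2.479
for i, j in [(1, 2), (1, 3), (2, 3)]:
    t = (means[i] - means[j]) / (sp * math.sqrt(1 / ns[i] + 1 / ns[j]))
    print(f"groups {i},{j}: t = {t:5.2f}  significant: {abs(t) > t_star}")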
12.38. (a) Statistics and plots are below. (b) The standard deviations satisfy the text's guidelines for pooling. One concern is that all three distributions are slightly left-skewed and the youngest nonfiction death is an outlier. (c) ANOVA gives F = 6.56 (df 2 and 120) and P = 0.002, so we conclude that at least one mean is different. (d) The appropriate contrast is ψ1 = (1/2)(µnov + µnf) − µp. (This is defined so that ψ1 > 0 if poets die younger. This is not absolutely necessary but is in keeping with the text's advice.) The null hypothesis is H0: ψ1 = 0; the Yeats quote hardly seems like an adequate reason to choose a one-sided alternative, but students may have other opinions. For the test, we compute c ≈ 10.9739, SEc ≈ 3.0808, and t ≈ 3.56 with df = 120. The P-value is very small regardless of whether Ha is one- or two-sided, so we conclude that the contrast is positive (and poets die young). (e) For this comparison, the contrast is ψ2 = µnov − µnf, and the hypotheses are H0: ψ2 = 0 versus Ha: ψ2 ≠ 0. (Because the alternative is two-sided, the subtraction in this contrast can go either way.) For the test, we compute c ≈ −5.4272, SEc ≈ 3.4397, and t ≈ −1.58 with df = 120. This gives P = 0.1172; the difference between novelists and nonfiction writers is not significant. (f) With three comparisons and df = 120, the Bonferroni critical value is t** ≈ 2.4280. The pooled standard deviation is sp ≈ 14.4592, so the differences, standard errors, and t values are:
x̄nov − x̄p = 8.2603,     SEnov−p = sp√(1/67 + 1/32) ≈ 3.1071,    t ≈ 2.66
x̄nov − x̄nf = −5.4272,   SEnov−nf = sp√(1/67 + 1/24) ≈ 3.4397,   t ≈ −1.58
x̄p − x̄nf = −13.6875,    SEp−nf = sp√(1/32 + 1/24) ≈ 3.9044,     t ≈ −3.51
The first and last t statistics are greater (in absolute value) than t**, so those differences are significant. The second difference is the same one tested in the contrast of part (e); the standard error and the conclusion are the same.
              n      x̄        s        sx̄
Novels       67   71.4478  13.0515   1.5945
Poems        32   63.1875  17.2971   3.0577
Nonfiction   24   76.8750  14.0969   2.8775

[Figure: plot of average age at death (years) by type of writing (Novels, Poems, Nonfiction)]

Minitab output: Analysis of Variance on age at death
Source    DF      SS     MS     F      p
Writer     2    2744   1372  6.56  0.002
Error    120   25088    209
Total    122   27832
12.39. (a) The means, standard deviations, and standard errors are given below (all in grams per cm²). (b) All three distributions appear to be reasonably close to Normal, and the standard deviations are suitable for pooling. (c) ANOVA gives F = 7.72 (df 2 and 42) and P = 0.001, so we conclude that the means are not all the same. (d) With df = 42, 3 comparisons, and α = 0.05, the Bonferroni critical value is t** = 2.4937. The pooled standard deviation is sp ≈ 0.01437 and the standard error of each difference is SE_D = sp√(1/15 + 1/15) ≈ 0.005246, so two means are significantly different if they differ by t** SE_D ≈ 0.01308. The high-dose mean is significantly different from the other two. (e) Briefly: High doses of kudzu isoflavones increase BMD.
            n    x       s        sx
Control    15   0.2189  0.01159  0.002992
Low dose   15   0.2159  0.01151  0.002972
High dose  15   0.2351  0.01877  0.004847

Minitab output: Analysis of Variance on BMD
Source  DF  SS        MS        F     p
Factor   2  0.003186  0.001593  7.72  0.001
Error   42  0.008668  0.000206
Total   44  0.011853

[Means plot omitted: BMD (g/cm²) for the Control, Low dose, and High dose groups.]
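As a check on part (d), the short Python sketch below (not from the original solution) computes the Bonferroni critical value and the minimum significant difference from the MSE in the Minitab output above:

from math import sqrt
from scipy import stats

n, dfe = 15, 42                # per-group sample size and error df
alpha, m = 0.05, 3             # overall significance level, number of comparisons
mse = 0.000206                 # from the Minitab output above

t_star = stats.t.ppf(1 - alpha / (2 * m), dfe)   # Bonferroni critical value, about 2.49
sp = sqrt(mse)                                   # pooled standard deviation
se_d = sp * sqrt(1 / n + 1 / n)                  # SE of a difference of two means
print(t_star, sp, se_d, t_star * se_d)           # MSD is about 0.013 g/cm^2

Only the high-dose mean (0.2351) differs from the control and low-dose means by more than this amount.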
12.40. (a) With I = 5 and N = 315, the F tests have df 4 and 310. (b) The response variable
does not need to be Normally distributed; rather, the deviations from the mean within each
group should be Normal. (c) By comparing each F statistic to 2.401—the 5% critical
value for an F(4, 310) distribution—we see that the means are significantly different for
the first three questions. (We can also compute the P-value for each F statistic to reach
this conclusion.) (d) A possible plot is shown below; this could also be split into three
separate plots. For the first question, the leisure group appears to have the most interest in
experiencing Hawaiian culture. For the second question, the sports and leisure groups had
a lower preference for a group tour, while fraternal associations had a higher preference.
For the third question, the leisure group is most interested in ocean sports, and the business
group is least interested.
[Means plot omitted: mean response (1 = strongly disagree, 7 = strongly agree) for the culture, group tour, and ocean sports questions by visitor group.]
12.41. (a) The mean responses were not significantly different for the last question. (b) Taking
the square roots of the given values of MSE gives the values of sp . For the Bonferroni
method with α = 0.05, df = 310, and 10 comparisons, t ∗∗ = 2.827. Only the largest
difference within each set of means is significant:
t14 = (3.97 − 5.33)/(1.8058√(1/34 + 1/26)) ≈ −2.891   experience culture—honeymoon and leisure groups
t23 = (3.38 − 2.39)/(1.6855√(1/56 + 1/105)) ≈ 3.550   group tour—fraternal association/sports groups
t45 = (5.33 − 4.02)/(2.0700√(1/26 + 1/94)) ≈ 2.856    ocean sports—leisure/business groups
12.45. (a) Pooling is reasonable: The ratio is 0.824/0.657 ≈ 1.25. For the pooled standard deviation, we compute
sp² = (488s1² + 68s2² + 211s3²)/(488 + 68 + 211) ≈ 0.5902
so sp = √0.5902 ≈ 0.7683. (b) Comparing F = 17.66 to an F(2, 767) distribution, we find P < 0.001.
[Sketch omitted: F(2, 767) density with the marked points 2.31, 3.01, and 4.63 and the observed F = 17.66.]
Sketches of this distribution will vary; in the graph, the three marked points are the 10%, 5%, and 1% critical values, so we can see that the observed value lies well above the bulk of this distribution. (c) For the contrast ψ = µ2 − (1/2)(µ1 + µ3), we test H0: ψ = 0 versus Ha: ψ > 0. We find c ≈ 0.585 with SEc ≈ 0.0977, so t = c/SEc ≈ 5.99 with df = 767, and P < 0.0001.
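The pooled-variance formula in part (a) and the tail probability in part (b) can be verified with a few lines of code. The sketch below is ours, not part of the solution; the three standard deviations are placeholders (only their ratio 0.824/0.657 is stated above), so the pooled value it prints is illustrative rather than exact:

from math import sqrt
from scipy import stats

def pooled_sd(dfs, sds):
    """Pooled standard deviation: sp^2 = sum(df_i * s_i^2) / sum(df_i)."""
    sp2 = sum(d * s ** 2 for d, s in zip(dfs, sds)) / sum(dfs)
    return sqrt(sp2)

# Degrees of freedom 488, 68, 211 are from the solution; the s values are made up
print(pooled_sd([488, 68, 211], [0.824, 0.657, 0.70]))

# Part (b): upper tail probability of F = 17.66 on (2, 767) df
print(stats.f.sf(17.66, 2, 767))    # far below 0.001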
12.46. (a) The linear-trend coefficients (plot below, left) fall on a line. If µ1 = µ2 = · · · = µ5 ,
then the linear-trend contrast ψ1 = −2µ1 − 1µ2 + 0µ3 + 1µ4 + 2µ5 = 0. (b) The
quadratic-trend coefficients (plot below, right) fall on a parabola. If all µi are equal, then
ψ2 = 2µ1 − 1µ2 − 2µ3 − 1µ4 + 2µ5 = 0. If µi = 5i, then
ψ2 = 2 · 5 − 1 · 10 − 2 · 15 − 1 · 20 + 2 · 25 = 0
(c) The sample contrasts are c2 ≈ −3.36 and c3 ≈ 1.12. (d) The standard errors are SEc2 ≈ 0.8306 and SEc3 ≈ 0.6425, so the test statistics are t2 ≈ −3.364 and t3 ≈ 1.740.
With df = 129, the P-values are 0.001 and 0.0842. Combined with the linear-trend result
from Example 12.20 (t = −0.18, P = 0.861), we see that we have significant evidence for a
quadratic trend, but not for a linear or cubic trend.
[Plots omitted: linear contrast coefficients (left) fall on a line and quadratic contrast coefficients (right) fall on a parabola, both plotted against i.]
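A quick numeric check of part (b), not part of the original solution: the quadratic-trend contrast is zero both for equal means and for means that increase linearly, while the linear-trend contrast vanishes only in the first case.

import numpy as np

linear    = np.array([-2, -1, 0, 1, 2])     # linear-trend coefficients
quadratic = np.array([ 2, -1, -2, -1, 2])   # quadratic-trend coefficients

equal  = np.array([12.0] * 5)               # any common mean value
inline = 5.0 * np.arange(1, 6)              # mu_i = 5i, as in part (b)

print(linear @ equal, quadratic @ equal)    # both 0
print(quadratic @ inline, linear @ inline)  # 0 and 50: only the linear contrast detects the trend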
12.48. (a) The residuals appear to be reasonably Normal. (b) With df = 27, three comparisons,
and α = 0.05, the Bonferroni critical value is t** = 2.5525. The pooled standard deviation is sp ≈ 21.5849, and the standard error of each difference is SE_D = sp√(1/10 + 1/10) ≈ 9.6531, so two means are significantly different if they differ by t**·SE_D ≈ 24.6390. The high-jump
mean is significantly different from the other two.
[Plots omitted: stemplot and Normal quantile plot of the residuals, and means plot of bone density for the Control, Low jump, and High jump groups.]
12.50. (a) There are no clear violations of Normality, but the number of residuals is so
small that it is difficult to draw any conclusions. (b) With df = 9, three comparisons, and
α = 0.05, the Bonferroni critical value is t** = 2.9333. The pooled standard deviation is sp ≈ 0.5305, and the standard error of each difference is SE_D = sp√(1/4 + 1/4) ≈ 0.3751, so two means are significantly different if they differ by t**·SE_D ≈ 1.1003. The iron mean is
significantly higher than the other two.
[Plots omitted: stemplot and Normal quantile plot of the residuals, and means plot of iron content for the Aluminum, Clay, and Iron pots.]
12.51. (a) Pooling is risky because 8.66/2.89 = 3 > 2. (b) ANOVA gives F = 137.94 (df 5 and 12), for which P < 0.0005. We reject H0 and conclude that not all means are equal.

       n   x      s
ECM1   3   65.0%  8.6603%
ECM2   3   63.3%  2.8868%
ECM3   3   73.3%  2.8868%
MAT1   3   23.3%  2.8868%
MAT2   3    6.6%  2.8868%
MAT3   3   11.6%  2.8868%

Minitab output: Analysis of Variance on Gpi
Source     DF   SS       MS      F       p
Treatment   5   13411.1  2682.2  137.94  0.000
Error      12     233.3    19.4
Total      17   13644.4
12.52. (a) The residuals have one low outlier, and a lot of granularity, so Normality is
difficult to assess. (b) With df = 12, 15 comparisons, and α = 0.05, the Bonferroni critical value is t** ≈ 3.6489. The pooled standard deviation is sp ≈ 4.4096%, and the standard error of each difference is SE_D = sp√(1/3 + 1/3) ≈ 3.6004%, so two means are significantly different if they differ by t**·SE_D ≈ 13.1375%. The three ECM means are significantly higher than the three MAT means. (c) The contrast is ψ = (1/3)(µECM1 + µECM2 + µECM3) − (1/3)(µMAT1 + µMAT2 + µMAT3), and the hypotheses are H0: ψ = 0 versus Ha: ψ ≠ 0. For the test, we compute c ≈ 53.33%, SEc ≈ 2.0787%, and t ≈ 25.66 with df = 12. This has a tiny P-value; the difference between ECM and MAT is
highly significant. This is consistent with the Bonferroni results from part (b).
[Plots omitted: stemplot and Normal quantile plot of the residuals, and means plot of mean % Gpi cells for the six treatments (ECM1–ECM3, MAT1–MAT3).]
12.53. Let µ1 be the placebo mean, µ2 and µ3 be the means for low and high doses of Drug A, and µ4 and µ5 be the means for low and high doses of Drug B. Recall that sp ≈ 3.2558. (a) The first contrast is ψ1 = µ1 − (1/2)(µ2 + µ4); the second is ψ2 = (µ3 − µ2) − (µ5 − µ4). (b) The estimated contrasts are c1 = 14.00 − 0.5(15.25) − 0.5(15.75) = −1.5 and c2 = (18.25 − 15.25) − (22.50 − 15.75) = −3.75. The respective standard errors are:
SEc1 = sp√(1/4 + 0.25/4 + 0 + 0.25/4 + 0) ≈ 1.9937 and
SEc2 = sp√(0 + 1/4 + 1/4 + 1/4 + 1/4) = sp ≈ 3.2558
(c) Neither contrast is significant (t1 ≈ −0.752 and t2 ≈ −1.152, for which the one-sided
P-values are 0.2317 and 0.1337). We do not have enough evidence to conclude that low
doses increase activity level over a placebo, nor can we conclude that activity level changes
due to increased dosage are different between the two drugs.
12.54. (a) Below. (b) To test H0 : µ1 = · · · = µ4 versus Ha : not all µi are equal, ANOVA
(Minitab output below) gives F = 967.82 (df 3 and 351), which has P < 0.0005. We
conclude that not all means are equal; specifically, the “Placebo” mean is much higher than
the other three means.
[Table of means and Minitab ANOVA output referenced in parts (a) and (b) omitted.]
12.55. (a) The plot (below) shows granularity (which varies between groups), but that should
not make us question independence; it is due to the fact that the scores are all integers.
(b) The ratio of the largest to the smallest standard deviations is 1.595/0.931 = 1.714—less
than 2. (c) Apart from the granularity, the quantile plots (on the following page) are
reasonably straight. (d) Again, apart from the granularity, the residual quantile plot (below,
right) looks pretty good.
[Plots omitted: residuals versus case number, and Normal quantile plot of the residuals.]
[Normal quantile plots of the scores for the four treatment groups omitted.]
12.56. We have six comparisons to make, and df = 351, so the Bonferroni critical value with α = 0.05 is t** ≈ 2.6533. The pooled standard deviation is sp = √MSE ≈ 1.1958; the
differences, standard errors, and t statistics are below. The only nonsignificant difference is
between the two Pyr treatments (meaning the second application of the shampoo is of little
benefit). The Keto shampoo mean is the lowest; the placebo mean is by far the highest.
12.60. There is no effect on the test statistic, df, P-value, and conclusion. The degrees of
freedom are not affected, because the number of groups and sample sizes are unchanged;
meanwhile, the SS and MS values change (by a factor of b²), but this change does not affect F because the factors of b² cancel out in the ratio F = MSG/MSE. With the same
F- and df values, the P-value and conclusion are necessarily unchanged.
Proof of these statements is not too difficult, but it requires careful use of the SS
formulas. For most students, a demonstration with several choices of a and b would
probably be more convincing than a proof. However, here is the basic idea: Using results
of Chapter 1, we know that the means undergo the same transformation as the data
(x̄i* = a + b·x̄i), while the standard deviations are changed by a factor of |b|. Let x̄ be the average of all the data; note that x̄* = a + b·x̄. Now SSG = Σ(i=1 to I) ni(x̄i − x̄)², so:
SSG* = Σi ni(x̄i* − x̄*)² = Σi ni(b·x̄i − b·x̄)² = Σi ni·b²(x̄i − x̄)² = b²·SSG
Similarly, we can establish that SSE* = b²·SSE and SST* = b²·SST. Since the MS values are merely SS values divided by the (unchanged) degrees of freedom, these also change by a factor of b².
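Here is one such demonstration as a Python sketch (the data are made up; only the idea, not the numbers, comes from the solution): running one-way ANOVA on a data set and on a + b·(data) gives the same F statistic and P-value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(mu, 2.0, size=8) for mu in (10.0, 12.0, 11.0)]   # made-up data

a, b = 32.0, 1.8                                    # any linear transformation x* = a + b*x
transformed = [a + b * g for g in groups]

print(stats.f_oneway(*groups))
print(stats.f_oneway(*transformed))                 # same F and P, up to rounding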
12.61. A table of means and standard deviations is below. Quantile plots are not shown, but
apart from the granularity of the scores and a few possible outliers, there are no marked
deviations from Normality. Pooling is reasonable for both PRE1 and PRE2; the ratios are
1.24 and 1.48.
For both PRE1 and PRE2, we test H0 : µ B = µ D = µ S versus Ha : at least one mean is
different. Both tests have df 2 and 63. For PRE1, F = 1.13 and P = 0.329; for PRE2,
F = 0.11 and P = 0.895. There is no reason to believe that the mean pretest scores differ
between methods.
PRE1 PRE2
Method n x s x s
Basal 22 10.5 2.9721 5.27 2.7634
DRTA 22 9.72 2.6936 5.09 1.9978
Strat 22 9.136 3.3423 4.954 1.8639
[Means plots omitted: mean scores by teaching method (Basal, DRTA, Strat).]
12.63. The scatterplot (below, left) suggests that a straight line is not the best choice of a
model. Regression gives the formula
predicted Score = 4.432 − 0.000102 Friends
Not surprisingly, the slope is not significantly different from 0 (t = −0.28, P = 0.782).
The regression only explains 0.1% of the variation in score. The residual plot (below, right)
is nearly identical to the first scatterplot, and suggests (as that did) that a quadratic model
might be a better choice.
Note: If one fits a quadratic model, it does better (and has significant coefficients), but it
still only explains 8.3% of the variation in attractiveness.
[Plots omitted: attractiveness score versus number of friends, and residuals versus number of friends.]
12.64. The pooled standard deviation sp is found by looking at the spread of each observation
about its group mean x i . The “total” standard deviation s given in Exercise 12.30 is the
spread about the grand mean (the mean of all the data values, ignoring distinctions between
groups). When we ignore group differences, we have more variation (uncertainty) in our
data, so s is almost always larger than sp .
This can be made clearer (to sufficiently mathematical students) by noting that the total
variance s 2 can be found in the ANOVA table:
Just as sp² = SSE/DFE = MSE, s² = SST/DFT = MST.
(The total mean square is not included in the ANOVA table but is easily computed from the
values on the bottom line.) Because SSM + SSE = SST, we always have SSE ≤ SST, with
equality only when the model is completely worthless (that is, when all group means equal
the grand mean, so that SSM = 0). Because DFE < DFT, it might be that MSE ≥ MST but
that does not happen very often.
12.66. With σ = 7 and means µ1 = 40, µ2 = 47, and µ3 = 43, we have µ̄ = (40 + 47 + 43)/3 ≈ 43.33 and noncentrality parameter:
λ = n Σ(µi − µ̄)²/σ² = (10)[(40 − 43.33)² + (47 − 43.33)² + (43 − 43.33)²]/49 ≈ (10)(24.67)/49 ≈ 5.034
(The value of λ in the G•Power output below is slightly different due to rounding.) The
degrees of freedom and critical value are the same as in Example 12.27: df 2 and 27,
F ∗ = 3.35. Software reports the power as about 46%. Samples of size 10 are not adequate
for this alternative; we should increase the sample size so that we have a better chance of
detecting it. (For example, samples of size 20 give nearly 80% power for this alternative.)
G•Power output
Post-hoc analysis for "F-Test (ANOVA)", Global, Groups: 3:
Alpha: 0.0500
Power (1-beta): 0.4606
Effect size "f": 0.4096
Total sample size: 30
Critical value: F(2,27) = 3.3541
Lambda: 5.0332
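The same power calculation can be done without G•Power; the Python sketch below (ours, not part of the solution) uses the noncentral F distribution:

from scipy import stats

sigma, n = 7.0, 10
mus = [40.0, 47.0, 43.0]
mu_bar = sum(mus) / len(mus)

lam = n * sum((m - mu_bar) ** 2 for m in mus) / sigma ** 2   # noncentrality, about 5.03
df1, df2 = len(mus) - 1, len(mus) * (n - 1)                  # 2 and 27
f_crit = stats.f.ppf(0.95, df1, df2)                         # about 3.354

power = stats.ncf.sf(f_crit, df1, df2, lam)                  # P(F > F* under the alternative)
print(lam, f_crit, power)                                    # power should be roughly 0.46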
12.67. (a) Sampling plans will vary but should attempt to address how cultural groups will be
determined: Can we obtain such demographic information from the school administration?
Do we simply select a large sample then poll each student to determine if he or she belongs
to one of these groups? (b) Answers will vary with choice of Ha and desired power. For
example, with the alternative µ1 = µ2 = 4.4, µ3 = 5, and standard deviation σ = 1.2,
three samples of size 75 will produce power 0.89. (See G•Power output below.) (c) The
report should make an attempt to explain the statistical issues involved; specifically, it should
convey that sample sizes are sufficient to detect anticipated differences among the groups.
G•Power output
Post-hoc analysis for "F-Test (ANOVA)", Global, Groups: 3:
Alpha: 0.0500
Power (1-beta): 0.8920
Effect size "f": 0.2357
Total sample size: 225
Critical value: F(2,222) = 3.0365
Lambda: 12.4998
12.68. Recommended sample sizes will vary with choice of Ha and desired power. For
example, with the alternative µ1 = µ2 = 0.22, µ3 = 0.24, and standard deviation σ = 0.015,
three samples of size 10 will produce power 0.84, and samples of size 15 increase the power
to 0.96. (See G•Power output below.) The report should make an attempt to explain the
statistical issues involved; specifically, it should convey that sample sizes are sufficient to
detect anticipated differences among the groups.
G•Power output
Post-hoc analysis for "F-Test (ANOVA)", Global, Groups: 3:
Alpha: 0.0500
Power (1-beta): 0.8379
Effect size "f": 0.6285
Total sample size: 30
Critical value: F(2,27) = 3.3541
Lambda: 11.8504
Note: Accuracy mode calculation.
12.69. The design can be similar, although the types of music might be different. Bear in
mind that spending at a casual restaurant will likely be less than at the restaurants examined
in Exercise 12.28; this might also mean that the standard deviations could be smaller. A
pilot study might be necessary to get an idea of the size of the standard deviations. Decide
how big a difference in mean spending you would want to detect, then do some power
computations.
Chapter 13 Solutions
13.1. (a) Two-way ANOVA is used when there are two factors (explanatory variables). (The
outcome [response] variable is assumed to have a Normal distribution, meaning that it can
take any value, at least in theory.) (b) Each level of A should occur with all three levels
of B. (Factor A has two levels.) (c) The RESIDUAL part of the model represents the error.
(d) DFAB = (I − 1)(J − 1).
13.2. (a) Parallel profiles imply that there is no interaction. (b) It is not necessary that all
sample sizes be the same. (The standard deviations must all be the same.) (c) sp2 is found
by pooling the sample variances for each SRS. (d) The main effects can give useful
information even in the presence of an interaction.
13.3. (a) A large value of the AB F statistic indicates that we should reject the hypothesis
of no interaction. (b) The relationship is backwards: Mean squares equal sum of squares
divided by degrees of freedom. (c) Under H0 , the ANOVA test statistics have an F
distribution. (d) If the sample sizes are not the same, the sums of squares may not add
for “some methods of analysis.” (See the ‘Caution’ on page 680; for more detail, see
http://afni.nimh.nih.gov/sscc/gangc/SS.html.)
(b) No interaction (the lines are perfectly parallel). (c) Yes: The factor-A means increase under B1, and decrease under B2. (d) Yes: When A changes from level 2 to level 3, the means increase under B1 and decrease under B2.
[Plots omitted: group means against level of factor A (A1, A2, A3) for factor-B levels B1 and B2 in each part.]
13.6. The answers are found in Table E (or using software) with P = 0.05. (a) We have I = 2,
J = 4 and N = 24, so DFA = 1 and DFE = 16. We would reject H0 if F > 4.49 (software
gives 4.4940). (b) We have I = J = 4 and N = 32, so DFAB = 9 and DFE = 16. We
would reject H0 if F > 2.54 (software: 2.5377). (c) We have I = J = 2 and N = 204, so
DFAB = 1 and DFE = 200. We would reject H0 if F > 3.89 (software: 3.8884).
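The critical values quoted in 13.6 come from the F distributions named there; a minimal Python sketch (ours) that reproduces them:

from scipy import stats

for dfn, dfd in [(1, 16), (9, 16), (1, 200)]:   # parts (a), (b), (c)
    print(dfn, dfd, stats.f.ppf(0.95, dfn, dfd))
# Expect approximately 4.494, 2.538, and 3.888.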
13.7. (a) The factors are gender (I = 2) and age (J = 3). The response variable is the percent
of pretend play. The total number of observations is N = (2)(3)(11) = 66. (b) The factors
are time after harvest (I = 5) and amount of water (J = 2). The response variable is
the percent of seeds germinating. The total number of observations is N = 30 (3 lots of
seeds in each of the 10 treatment combinations). (c) The factors are mixture (I = 6) and
freezing/thawing cycles (J = 3). The response variable is the strength of the specimen. The
total number of observations is N = 54. (d) The factors are training programs (I = 4) and
the number of days to give the training (J = 2). The response variable is not specified, but
presumably is some measure of the training’s effectiveness. The total sample size is N = 80.
13.8. The table below summarizes the degrees of freedom for each source.

                           (a)  (b)  (c)  (d)
I =                         2    5    6    4
J =                         3    2    3    2
N =                        66   30   54   80
Source
  A:  I − 1 =               1    4    5    3
  B:  J − 1 =               2    1    2    1
  AB: (I − 1)(J − 1) =      2    4   10    3
  Error: N − IJ =          60   20   36   72
short history, and decreases it (very slightly) for customers with long history. Note that either variable could be on the horizontal axis in the plot of means. (b) The marginal means are

Short history 6.245      No thank-you note 6.61
Long history  7.45       Thank-you note    7.085

For example, (5.69 + 6.80)/2 = 6.245. The history marginal means convey the fact that repurchase intent is higher for customers with long history. The thank-you note marginal means suggest that a thank-you note increases repurchase intent, but they are harder to interpret because of the interaction.
[Means plot omitted: repurchase intent by transaction history (Short, Long) for the thank-you note and no-note conditions.]
13.10. With I = J = 2 levels for each factor, the three missing entries in the DF column are
all 1. The MS entries are computed as SS/DF, and the F statistics are MS/MSE. Comparing each
test statistic to an F(1, 160) distribution gives the P-values.
Source DF SS MS F P-value
Transaction history 1 61.445 61.445 12.94 0.0004
Thank-you statement 1 21.810 21.810 4.59 0.0336
Interaction 1 15.404 15.404 3.24 0.0736
Error 160 759.904 4.7494
The interaction is not quite significant, but the two main effects are.
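The arithmetic behind the table can be reproduced directly; the Python sketch below (ours, not part of the solution) rebuilds the MS, F, and P-value columns from the DF and SS entries:

from scipy import stats

dfe, sse = 160, 759.904
mse = sse / dfe                                  # about 4.7494
effects = {"Transaction history": (1, 61.445),
           "Thank-you statement": (1, 21.810),
           "Interaction":         (1, 15.404)}

for name, (df, ss) in effects.items():
    F = (ss / df) / mse
    print(name, round(F, 2), round(stats.f.sf(F, df, dfe), 4))
# Expect F = 12.94, 4.59, 3.24 with P = 0.0004, 0.0336, 0.0736.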
13.12. (a) The plot suggests a gender effect: Men had higher postexercise blood pressure (BP) than women. There also
13.15. Marginal means are listed in the table below. In each case, we find the average of the
four means for each level of the characteristic. For example, for Humor, we have
No humor: (3.04 + 5.36 + 2.84 + 3.08)/4 = 3.58
Humor: (5.06 + 5.55 + 1.95 + 3.27)/4 = 3.9575
The presence of humor slightly increases mean satisfaction. The process and outcome effects
appear to be greater (that is, the change in mean satisfaction is greater).
Marginal means
Humor Process Outcome
No 3.58 Favorable 4.7525 Favorable 4.315
Yes 3.9575 Unfavorable 2.785 Unfavorable 3.2225
cooking than males. Attitudes in France are less positive than those in the U.S. and Canada. (b) While the means plot is not perfectly parallel, it is not clear that it indicates an interaction. If an interaction is present, it is that the female/male difference is greatest in Canada, and least in France.
[Means plot omitted: average attitude toward cooking by culture (Canada, United States, France) for females and males.]
13.18. (a) With a total sample size of N = 811, and six groups, we have df = 805. (b) There are 6 means, so there are (6 · 5)/2 = 15 comparisons. (c) Using sp ≈ 1.7746 from the previous exercise, the t statistics are tij = (x̄i − x̄j)/SEij, where SEij = sp√(1/ni + 1/nj). The complete set of differences, standard errors, and t values is listed below; the eight largest differences (marked with an asterisk) are significant.
Note: If doing these computations by hand, it is best to start with the largest differences
and work down until one finds a difference that is not significant. (One should then check a
few more, as there might be one or two more significant differences remaining because the
standard errors vary with sample size.) In this set of data, for example, there is little reason
to check for a significant difference between the Canadian male, U.S. male, and French
female means.
Means Difference SE t
Female/Canada – Male/Canada 1.31 0.1960 6.6828 *
Female/Canada – Female/U.S. 0.34 0.1759 1.9334
Female/Canada – Male/U.S. 1.27 0.2107 6.0262 *
Female/Canada – Female/France 1.32 0.2272 5.8088 *
Female/Canada – Male/France 2.01 0.2223 9.0406 *
Male/Canada – Female/U.S. −0.97 0.2071 −4.6839 *
Male/Canada – Male/U.S. −0.04 0.2374 −0.1685
Male/Canada – Female/France 0.01 0.2522 0.0397
Male/Canada – Male/France 0.70 0.2478 2.8251
Female/U.S. – Male/U.S. 0.93 0.2211 4.2067 *
Female/U.S. – Female/France 0.98 0.2369 4.1376 *
Female/U.S. – Male/France 1.67 0.2321 7.1937 *
Male/U.S. – Female/France 0.05 0.2638 0.1895
Male/U.S. – Male/France 0.74 0.2596 2.8508
Female/France – Male/France 0.69 0.2731 2.5262
13.19. Means plots are below. Possible observations: Except for female responses to purchase
intention, means decreased from Canada to the United States to France. Females had
higher means than men in almost every case, except for French responses to credibility
and purchase intention (suggesting a modest interaction). Gender differences in France are
considerably smaller than in either Canada or the United States.
[Means plots omitted: mean responses by culture (Canada, United States, France) for females and males, for attitude toward functional foods, credibility of information about functional foods, and purchase intention for functional foods.]
13.20. Opinions of undergraduate students might be similar to a large segment of the young
adult population, but this sample is probably not an unbiased representation of that group.
Filling out the surveys in class might also affect the usefulness of the responses in some
way (although it is difficult to predict what that effect might be).
13.21. (a) The marginal means (as well as the individual cell means) are in the table below.
The first two means suggest that the intervention group showed more improvement than the
control group. (b) Interaction means that the mean number of actions changes differently
over time for the two groups. We see this in the plot below because the lines connecting the
means are not parallel.
13.23. We have I = 3, J = 2, and N = 30, so the degrees of freedom are DFA = 2, DFB = 1,
DFAB = 2, and DFE = 24. This allows us to determine P-values (or to compare to
Table E), and we find that there are no significant effects (although B is close):
FA = 1.87 has df 2 and 24, so P = 0.1759
FB = 3.49 has df 1 and 24, so P = 0.0740
FAB = 2.14 has df 2 and 24, so P = 0.1396
13.24. (a) Based on the given P-values, the interaction and the main effect of B are significant
at α = 0.05. (b) In order to summarize the results, we would need to know the number of
levels for each factor (I and J ) and the sample sizes in each cell (n i j ). We also would want
to know the sample cell means x i j so that we could interpret the significant main effect and
the nature of the interaction.
13.25. (a) The means are nearly parallel, and show little
13.27. (a) Plot on the right. (b) There seems to be a fairly large difference between the means based on how much
13.28. The “Other” category had the lowest mean HS math
[Means plot omitted: mean HS math grade by gender (Males, Females).]
(right) and from the marginal means (CS: 605, EO: 624.5, O: 566). Males had higher mean scores in CS and O, while females are slightly higher in EO; this indicates an interaction. Overall, the marginal means by gender are 611.7 (males) and 585.3 (females).
[Means plot omitted: mean scores by gender (Males, Females) for the CS, EO, and O groups.]
13.30. A study today might include a category for those who declared a major such as
Information Technology (which probably did not exist at the time of the initial study). Some
variables that might be useful to consider: grade in first programming course, high school
physics grades, etc.
(d) There appears to be an interaction: Individuals send more money to groups, while groups
send more money to individuals. (e) Compare the statistics to an F(1, 105) distribution. The
three P-values are 0.0033 (sender), 0.9748 (responder), and 0.1522 (interaction). Only the
main effect of sender is significant.
13.33. Yes; the iron-pot means are the highest, and the F statistic for testing the effect of
the pot type is very large. (In this case, the interaction does not weaken any evidence that
iron-pot foods contain more iron; it only suggests that while iron pots increase iron levels in
all foods, the effect is strongest for meats.)
[Plot omitted: means with error bars for each pot/food combination (Alum, Clay, Iron × Meat, Leg., Veg.).]
plot on the right are drawn with this length (above and below each mean), so two means are significantly different if the “dot” for one
mean does not fall within the other mean’s error bars. For example, we find that iron/meat
is significantly larger than everything else, and iron/legumes is significantly different from
everything except iron/vegetable. These conclusions are consistent with the results of the
two-way ANOVA.
Diameter (mm)
Time 1 (8:00AM) Time 2 (11:00AM) Time 3 (3:00PM)
Tool x s x s x s
1 25.0307 0.001155 25.0280 0 25.0260 0
2 25.0167 0.001155 25.0200 0.002000 25.0160 0
3 25.0063 0.001528 25.0127 0.001155 25.0093 0.001155
4 25.0120 0 25.0193 0.001155 25.0140 0.004000
5 24.9973 0.001155 25.0060 0 25.0003 0.001528
13.36. All means and standard deviations will change by a factor of 0.04; the plot is identical
to that in Exercise 13.35, except that the vertical scale is different. All SS and MS values
change by a factor of 0.04² = 0.0016, but the F- (and P-) values are the same.
13.37. (a) All three F values have df 1 and 945, and the P-values are < 0.001, < 0.001, and
0.1477. Gender and handedness both have significant effects on mean lifetime, but there is
no significant interaction. (b) Women live about 6 years longer than men (on the average),
while right-handed people average 9 more years of life than left-handed people. “There is no
interaction” means that handedness affects both genders in the same way, and vice versa.
13.38. (a) Fseries = 7.02 with df 3 and 61; this has P = 0.0004. Fholder = 1.96 with df 1 and
61; this has P = 0.1665. Finteraction = 1.24 with df 3 and 61; this has P = 0.3026. Only
the series had a significant effect; the presence or absence of a holder and series/holder
interaction did not significantly affect the mean radon reading. (b) Because the ANOVA
indicates that these means are significantly different, we conclude that detectors produced in
different production runs give different readings for the same radon level. This inconsistency
may indicate poor quality control in production.
Note: In the initial printing of the text, the total sample size (N = 69) was not given,
without which we cannot determine the denominator degrees of freedom for part (a).
13.39. (a) & (b) The table below lists the means and standard deviations (the latter in
parentheses) of the nitrogen contents of the plants. The two plots below suggest that plant 1
and plant 3 have the highest nitrogen content, plant 2 is in the middle, and plant 4 is the
lowest. (In the second plot, the points are so crowded together that no attempt was made
to differentiate among the different water levels.) There is no consistent effect of water
level on nitrogen content. Standard deviations range from 0.0666 to 0.3437, for a ratio of
5.16—larger than we like. (c) Minitab output below. Both main effects and the interaction
are highly significant.
Amount of water per day
Species 50mm 150mm 250mm 350mm 450mm 550mm 650mm
1 3.2543 2.7636 2.8429 2.9362 3.0519 3.0963 3.3334
(0.2287) (0.0666) (0.2333) (0.0709) (0.0909) (0.0815) (0.2482)
2 2.4216 2.0502 2.0524 1.9673 1.9560 1.9839 2.2184
(0.1654) (0.1454) (0.1481) (0.2203) (0.1571) (0.2895) (0.1238)
3 3.0589 3.1541 3.2003 3.1419 3.3956 3.4961 3.5437
(0.1525) (0.3324) (0.2341) (0.2965) (0.2533) (0.3437) (0.3116)
4 1.4230 1.3037 1.1253 1.0087 1.2584 1.2712 0.9788
(0.1738) (0.2661) (0.1230) (0.1310) (0.2489) (0.0795) (0.2090)
[Plots omitted: mean percent nitrogen versus water level and versus species (plotting symbols are the species numbers).]
13.41. For each water level, there is highly significant evidence of variation in nitrogen level among plant species (Minitab output below). For each water level, we have df = 32, 6 comparisons, and α = 0.05, so the Bonferroni critical value is t** = 2.8123. (If we take into account that there are 7 water levels, so that overall we are performing 6 × 7 = 42 comparisons, we should take t** = 3.5579.) The table below gives the pooled standard deviations sp, the standard errors of each difference SE_D = sp√(1/9 + 1/9), and the “minimum significant difference” MSD = t**·SE_D (two means are significantly different if they differ by at least this amount). MSD1 uses t** = 2.8123, and MSD2 uses t** = 3.5579.

Water level    sp      SE_D    MSD1    MSD2
1            0.1824  0.0860  0.2418  0.3059
2            0.2274  0.1072  0.3015  0.3814
3            0.1912  0.0902  0.2535  0.3208
4            0.1991  0.0939  0.2640  0.3340
5            0.1994  0.0940  0.2643  0.3344
6            0.2318  0.1093  0.3073  0.3887
7            0.2333  0.1100  0.3093  0.3913

As it happens, for either choice of MSD,
the only nonsignificant differences are between species 1 and 3 for water levels 1, 4, and 7.
(These are the three closest pairs of points in the plot from the solution to Exercise 13.39.)
Therefore, for every water level, species 4 has the lowest nitrogen level and species 2 is next.
For water levels 1, 4, and 7, species 1 and 3 are statistically tied for the highest level; for the
other levels, species 3 is the highest, with species 1 coming in second.
13.42. The F statistics for all four ANOVAs are significant, and all four regressions are
significant as well. However, the regressions all have low R 2 (varying from 6.4% to 27.3%),
and plots indicate that a straight line is not really appropriate except perhaps for plant 3
(which had the highest R 2 value).
[Plots omitted: percent nitrogen versus water level with fitted regression lines.]
13.43. (a) & (b) The tables on the following page list the means and standard deviations (the
latter in parentheses). The means plots show that biomass (both fresh and dry) increases
with water level for all plants. Generally, plants 1 and 2 have higher biomass for each water
level, while plants 3 and 4 are lower. Standard deviation ratios are quite high for both fresh and dry biomass: 108.01/6.79 ≈ 15.9 and 35.76/3.12 ≈ 11.5. (c) Minitab output below. For
both fresh and dry biomass, main effects and the interaction are significant. (The interaction
for fresh biomass has P = 0.04; other P-values are smaller.)
Fresh biomass
Species 50mm 150mm 250mm 350mm 450mm 550mm 650mm
1 109.095 165.138 168.825 215.133 258.900 321.875 300.880
(20.949) (29.084) (18.866) (42.687) (45.292) (46.727) (29.896)
2 116.398 156.750 254.875 265.995 347.628 343.263 397.365
(29.250) (46.922) (13.944) (59.686) (54.416) (98.553) (108.011)
3 55.600 78.858 90.300 166.785 164.425 198.910 188.138
(13.197) (29.458) (28.280) (41.079) (18.646) (33.358) (18.070)
4 35.128 58.325 94.543 96.740 153.648 175.360 158.048
(11.626) (6.789) (13.932) (24.477) (22.028) (32.873) (70.105)
Dry biomass
Species 50mm 150mm 250mm 350mm 450mm 550mm 650mm
1 40.565 63.863 71.003 85.280 103.850 136.615 120.860
(5.581) (7.508) (6.032) (10.868) (15.715) (16.203) (17.137)
2 34.495 57.365 79.603 95.098 106.813 103.180 119.625
(11.612) (6.149) (13.094) (25.198) (18.347) (25.606) (35.764)
3 26.245 31.865 36.238 64.800 64.740 74.285 67.258
(6.430) (11.322) (11.268) (9.010) (3.122) (12.277) (7.076)
4 15.530 23.290 37.050 34.390 48.538 61.195 53.600
(4.887) (3.329) (5.194) (11.667) (5.658) (12.084) (25.290)
13.44. Both sets of residuals have a high outlier (observation #53); observation #52 is a low
outlier for fresh biomass. The other residuals look reasonably Normal.
[Normal quantile plots of the residuals for fresh and dry biomass omitted.]
13.45. For each water level, there is highly significant evidence of variation in biomass level
(both fresh and dry) among plant species (Minitab output below). For each water level, we
have df = 12, 6 comparisons, and α = 0.05, so the Bonferroni critical value is t ∗∗ = 3.1527.
(If we take into account that there are 7 water levels, so that overall we are performing
6 × 7 = 42 comparisons, we should take t** = 4.2192.) The table below gives the pooled standard deviations sp, the standard errors of each difference SE_D = sp√(1/4 + 1/4), and the “minimum significant difference” MSD = t**·SE_D (two means are significantly different if they differ by at least this amount). MSD1 uses t** = 3.1527, and MSD2 uses t** = 4.2192.
Rather than give a full listing of which differences are significant, we note that plants 3 and
4 are not significantly different, nor are 1 and 3 (except for one or two water levels). All
other plant combinations are significantly different for at least three water levels. For fresh
biomass, plants 2 and 4 are different for all levels, and for dry biomass, 1 and 4 differ for
all levels.
Fresh biomass Dry biomass
Water level sp SE D MSD1 MSD2 sp SE D MSD1 MSD2
1 20.0236 14.1588 44.6382 50.3764 7.6028 5.3760 16.9487 19.1274
2 31.4699 22.2526 70.1552 79.1735 7.6395 5.4019 17.0305 19.2197
3 19.6482 13.8934 43.8012 49.4318 9.5103 6.7248 21.2010 23.9263
4 43.7929 30.9663 97.6265 110.1762 15.5751 11.0133 34.7213 39.1846
5 38.2275 27.0310 85.2197 96.1746 12.5034 8.8412 27.8734 31.4565
6 59.3497 41.9666 132.3068 149.3147 17.4280 12.3235 38.8518 43.8462
7 66.7111 47.1719 148.7174 167.8348 23.7824 16.8167 53.0176 59.8329
13.46. The F statistics for all eight ANOVAs are significant, and all eight regressions are
significant as well. Unlike the nitrogen level (Exercises 13.39 through 13.42), all of these
regressions have reasonably large values of R 2 , and the scatterplots suggest that a straight
line is an appropriate model for the relationship.
[Plots omitted: fresh and dry biomass versus water level with fitted regression lines for each plant species.]
13.48. The table and plot of the means below suggest that, within a given gender, students
who stay in the sciences have higher HSS grades than those who end up in the “Other”
group. Males have a slightly higher mean in the CS group, but females have the edge in the
other two. Normal quantile plots show no great deviations from Normality, apart from the
granularity of the grades (most evident among women in EO). In the ANOVA, both main
effects and the interaction are all significant. Residual analysis (not shown) shows that they
are left-skewed.
                     Major
Gender          CS       EO       Other
Male     n =    39       39       39
         x =    8.6667   7.9231   7.4359
         s =    1.2842   2.0569   1.7136
Female   n =    39       39       39
         x =    8.3846   9.2308   7.8205
         s =    1.6641   0.7057   1.8046

[Means plot omitted: mean HSS grade by major (CS, EO, Other) for males and females.]
13.49. The table and plot of the means suggest that females have higher HSE grades than
males. For a given gender, there is not too much difference among majors. Normal quantile
plots show no great deviations from Normality, apart from the granularity of the grades
(most evident among women in EO). In the ANOVA, only the effect of gender is significant.
Residual analysis (not shown) reveals some causes for concern; for example, the variance
does not appear to be constant.
                     Major
Gender          CS       EO       Other
Male     n =    39       39       39
         x =    7.7949   7.4872   7.4103
         s =    1.5075   2.1505   1.5681
Female   n =    39       39       39
         x =    8.8462   9.2564   8.6154
         s =    1.1364   0.7511   1.1611

[Means plot omitted: mean HSE grade by major (CS, EO, Other) for males and females.]
[Normal quantile plots of HSE grades for the gender-by-major groups omitted.]
13.50. The table and plot of the means suggest that students who stay in the sciences have
higher mean GPAs than those who end up in the “Other” group. Both genders have similar
mean GPAs in the EO group, but in the other two groups, females perform better. Normal
quantile plots show no great deviations from Normality, apart from a few low outliers in
the two EO groups. In the ANOVA, sex and major are significant, while there is some (not
quite significant) evidence for the interaction.
                     Major
Gender          CS       EO       Other
Male     n =    39       39       39
         x =    2.7474   3.0964   2.0477
         s =    0.6840   0.5130   0.7304
Female   n =    39       39       39
         x =    2.9792   3.0808   2.5236
         s =    0.5335   0.6481   0.7656

[Means plot omitted: mean GPA by major (CS, EO, Other) for males and females.]
13.51. The table and plot of the means below suggest that students who stay in the sciences
have higher mean SATV scores than those who end up in the “Other” group. Female CS
and EO students have higher scores than males in those majors, but males have the higher
mean in the Other group. Normal quantile plots suggest some right-skewness in the “Women
in CS” group and also some non-Normality in the tails of the “Women in EO” group. Other
groups look reasonably Normal. In the ANOVA table, only the effect of major is significant.
                     Major
Gender          CS        EO        Other
Male     n =    39        39        39
         x =    526.949   507.846   487.564
         s =    100.937    57.213   108.779
Female   n =    39        39        39
         x =    543.385   538.205   465.026
         s =     77.654   102.209    82.184

[Means plot omitted: mean SATV score by major (CS, EO, Other) for males and females.]
[Normal quantile plots of SATV scores for the gender-by-major groups omitted.]
Chapter 14 Solutions
14.1. If p = 0.5, then odds = p/(1 − p) = 0.5/0.5 = 1, or “1 to 1.”
14.2. If odds = 3, then p = odds/(odds + 1) = 3/(3 + 1) = 3/4.
14.3. We have p̂men = 63/110 ≈ 0.5727 and p̂women = 60/130 = 6/13 ≈ 0.4615. Therefore:
oddsmen = (63/110)/(47/110) = 63/47 ≈ 1.3404, and
oddswomen = (6/13)/(7/13) = 6/7 ≈ 0.8571, or “6 to 7”
Note: The odds can also be computed without first finding p̂; for example, 63 men preferred Commercial A and 47 preferred Commercial B, so oddsmen = 63/47.
14.4. The odds for selecting Commercial B would be the reciprocal of the odds for
Commercial A: odds*men = 47/63 ≈ 0.7460 and odds*women = 7/6 ≈ 1.1667.
14.5. With oddsmen = 63/47 and oddswomen = 6/7, we have log(oddsmen) ≈ 0.2930 and log(oddswomen) ≈ −0.1542.
Note: You may wish to remind students to use the natural logarithm, called “ln” by
Excel and most calculators. A student who mistakenly uses the common (base-10) logarithm
instead of the natural logarithm will get 0.1272 and −0.0669 as answers.
14.6. With odds*men = 47/63 and odds*women = 7/6, we have log(odds*men) ≈ −0.2930 and log(odds*women) ≈ 0.1542.
Note: Because these odds were the reciprocals of those from Exercise 14.3, the log odds
are the opposites (negations) of those found in Exercise 14.5. A student who mistakenly
uses the common (base-10) logarithm instead of the natural logarithm will get −0.1272 and
0.0669 as answers.
14.7. The model is y = log(odds) = β0 + β1x. If x = 1 for men and 0 for women, we need:
log(pmen/(1 − pmen)) = β0 + β1  and  log(pwomen/(1 − pwomen)) = β0
We estimate b0 = log(oddswomen) ≈ −0.1542 and b1 = log(oddsmen) − b0 ≈ 0.4471, so the regression equation is log(odds) = −0.1542 + 0.4471x.
If x = 0 for men and 1 for women, we estimate b0 = log(oddsmen) ≈ 0.2930 and b1 = log(oddswomen) − b0 ≈ −0.4471, so the regression equation is log(odds) = 0.2930 − 0.4471x.
The estimated odds ratio is either e^0.4471 = oddsmen/oddswomen ≈ 1.5638 if x = 1 for men, or e^−0.4471 = oddswomen/oddsmen ≈ 0.6395 if x = 1 for women.
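The indicator-variable fit in 14.7 can be reproduced from the counts alone; the Python sketch below is ours (not part of the solution) and uses the Commercial A counts from Exercise 14.3:

from math import log, exp

odds_men   = 63 / 47    # 63 of 110 men preferred Commercial A
odds_women = 60 / 70    # 60 of 130 women preferred Commercial A

# Coding x = 1 for men and x = 0 for women
b0 = log(odds_women)
b1 = log(odds_men) - log(odds_women)
print(b0, b1, exp(b1))  # about -0.154, 0.447, and odds ratio 1.564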
14.8. Because of the relationships between the (log) odds for selecting Commercial A and the
(log) odds for selecting Commercial B, noted in the solutions to Exercises 14.4 and 14.6,
these coefficients are the opposites (negations) of, and the odds ratios are reciprocals of,
those found in the solution to the previous exercise.
The model is y = log(odds) = β0 + β1x. If x = 1 for men and 0 for women, we need:
log(pmen/(1 − pmen)) = β0 + β1  and  log(pwomen/(1 − pwomen)) = β0
We estimate b0 = log(odds*women) ≈ 0.1542 and b1 = log(odds*men) − b0 ≈ −0.4471, so the regression equation is log(odds) = 0.1542 − 0.4471x.
If x = 0 for men and 1 for women, we estimate b0 = log(odds*men) ≈ −0.2930 and b1 = log(odds*women) − b0 ≈ 0.4471, so the regression equation is log(odds) = −0.2930 + 0.4471x.
The estimated odds ratio is either e^−0.4471 = odds*men/odds*women ≈ 0.6395 if x = 1 for men, or e^0.4471 = odds*women/odds*men ≈ 1.5638 if x = 1 for women.
14.9. (a) The appropriate test would be a chi-square test with df = 5. (b) The logistic
regression model has no error term. (c) H0 should refer to β1 (the population slope) rather
than b1 (the estimated slope). (d) The interpretation of coefficients is affected by correlations
among explanatory variables.
14.10. (a) β1 = 3 means that log(odds) increases by 3 when x increases by 1. This means the
odds increase by a factor of e³ ≈ 20. (b) β0 is the log-odds of an event. (c) The odds of an
event is the ratio of the event’s probability and its complement.
Note: For part (a), it is difficult to make a simple statement about the effect on the
probability when odds increases by a factor of 20. With a little algebra, we can start with
the formula p = odds/(odds + 1) and find that the new probability is p* = 20·odds/(20·odds + 1) = 20/(19 + 1/p).
14.12. Use the formula given in Exercise 14.2: For each estimated odds value, the estimated probability is p̂ = odds/(odds + 1). (a) 2.8452/3.8452 ≈ 0.7399. (b) 6.1405/7.1405 ≈ 0.8600. (c) 9.9249/10.9249 ≈ 0.9085.
14.13. (a) For each column, divide the “yes” entry by the total to find p̂. (b) For each p̂, compute odds = p̂/(1 − p̂). (c) Finally, take log(odds).
p̂low = 88/1169 ≈ 0.0753     oddslow ≈ 0.0814     log(oddslow) ≈ −2.5083
p̂high = 112/1246 ≈ 0.0899   oddshigh ≈ 0.0988    log(oddshigh) ≈ −2.3150
14.14. (a) p̂1 = 108/142 ≈ 0.7606 for exclusive-territory firms. (b) p̂2 = 15/28 ≈ 0.5357 for other firms. (c) odds1 = p̂1/(1 − p̂1) ≈ 3.1765 and odds2 = p̂2/(1 − p̂2) ≈ 1.1538. (d) log(odds1) ≈ 1.1558 and log(odds2) ≈ 0.1431. (Be sure to use the natural logarithm for this computation.)
14.15. (a) b0 = log(oddslow) ≈ −2.5083 and b1 = log(oddshigh) − log(oddslow) ≈ 0.1933. (b) The fitted model is log(odds) = −2.5083 + 0.1933x. (c) The odds ratio is oddshigh/oddslow = e^b1 ≈ 1.2132 (or 0.0988/0.0814 ≈ 1.2132). (d) The relative risk from Example 9.7 was 1.19—very close to this odds ratio.
14.16. (a) b0 = log(odds2) ≈ 0.1431 and b1 = log(odds1) − log(odds2) ≈ 1.0127. (b) The fitted model is log(odds) = 0.1431 + 1.0127x. (c) The odds ratio is odds1/odds2 = e^b1 ≈ 2.7529.
14.17. Shown below is Minitab output. (a) The slope is significantly different from 0
(z = 2.37, P = 0.018), but the constant is not (z = 0.38, P = 0.706). (b) With b1 = 1.0127,
SEb1 ≈ 0.4269, and z* = 1.96, the 95% confidence interval for β1 is 0.176 to 1.849. (c) Exponentiating gives the interval e^0.176 ≈ 1.19 to e^1.849 ≈ 6.36.
Minitab output
Predictor Coef SE Coef Z P Ratio Lower Upper
Constant 0.143101 0.378932 0.38 0.706
Exclusive
Yes 1.01267 0.426920 2.37 0.018 2.75 1.19 6.36
14.18. Recall that, by properties of exponents, e^a/e^b = e^(a−b). Therefore:
odds(x+1)/odds(x) = (e^(−11.0391) · e^(3.1709(x+1)))/(e^(−11.0391) · e^(3.1709x)) = e^(3.1709(x+1) − 3.1709x) = e^(3.1709(x+1−x)) = e^3.1709
14.19. With b1 ≈ 3.1088 and SEb1 ≈ 0.3879, the 99% confidence interval is b1 ± 2.576·SEb1 ≈ b1 ± 0.9992, or 2.1096 to 4.1080.
14.20. To find the confidence interval for the odds ratio, we first make a confidence interval for the slope b1 and then transform (exponentiate) it: b1 ± z*·SEb1 = 3.1088 ± (1.96)(0.3879) ≈ 2.3485 to 3.8691, so the odds ratio interval is e^2.3485 ≈ 10.470 to e^3.8691 ≈ 47.898. Up to
rounding error, this agrees with the software output.
14.21. (a) z = 3.1088/0.3879 ≈ 8.01. (b) z² ≈ 64.23, which agrees with the value of X² given by SPSS and SAS. (c) The sketches are below. For both the Normal and chi-square distributions, the test statistics are quite extreme, consistent with the reported P-value.
[Sketches omitted: a Normal curve with z ≈ 8.01 marked and a chi-square curve with X² ≈ 64.23 marked.]
14.22. Shown in the table below are the coefficients for the full model (from Example 14.11),
as well as the three two-variable models and the three one-variable models (one of which
appeared in Example 14.6). P-values for individual coefficients are given in parentheses
below the coefficients. The X 2 statistics for the one-variable models are not shown; most
software will not produce this, because the P-value for the coefficient measures the overall
significance of the model.
Coefficient of:
Constant   LOpening   Theaters   Opinion     X²      df  P
−2.0132    2.1467     −0.0010    −0.1095     12.716   3  0.0053
           (0.0277)   (0.2748)   (0.8083)
−2.7164    2.1319     −0.0010                12.656   2  0.0018
           (0.0286)   (0.2805)
−2.7154    1.3091                −0.0710     11.432   2  0.0033
           (0.0066)              (0.8672)
−2.1815               0.00096    −0.0065      5.442   2  0.0658
                      (0.0332)   (0.9858)
−3.1658    1.3083
           (0.0070)
−2.2212               0.00096
                      (0.0329)
−0.1230                          0.0822
                                 (0.8030)
14.23. An odds ratio greater than 1 means a higher probability of a low tip. Therefore: The
odds favor a low tip from senior adults, those dining on Sunday, those who speak English
as a second language, and French-speaking Canadians. Diners who drink alcohol and lone
males are less likely to leave low tips. For example, for a senior adult, the odds of leaving a
low tip were 1.099 (for a probability of 0.5236).
14.25. (a) For men’s magazines, the odds ratio confidence interval includes 1. This indicates
that this explanatory variable has no effect on the probability that a model’s clothing is not
sexual, which is consistent with our failure to reject H0 for men’s magazines in the previous
exercise. For all other explanatory variables, the odds ratio interval does not include 1,
equivalent to the significant evidence against H0 for those variables. (b) The odds that the
model’s clothing is not sexual are 1.27 to 2.16 times higher for magazines targeted at mature
adults, 2.74 to 5.01 times higher when the model is male, and 1.11 to 2.23 times higher
for magazines aimed at women. (These statements can also be made in terms of the odds
that the model’s clothing is sexual; for example, those odds are 1.27 to 2.16 times higher
for magazines targeted at young adults, and so forth.) (c) The summary might note that it
is easier to interpret the odds ratio rather than the regression coefficients because of the
difficulty of thinking in terms of a log-odds scale.
14.26. (a) p̂1 = 463/1000 = 0.463. (b) odds1 = p̂1/(1 − p̂1) ≈ 0.8622. (c) p̂2 = 537/1000 = 0.537. (d) odds2 = p̂2/(1 − p̂2) ≈ 1.1598. (e) The odds in parts (b) and (d) are reciprocals—their product is 1. (Likewise, the probabilities in (a) and (c) are complements—their sum is 1.)
14.27. (a) p̂hi = 73/91 ≈ 0.8022 and oddshi = p̂hi/(1 − p̂hi) ≈ 4.05. (b) p̂non = 75/109 ≈ 0.6881 and oddsnon = p̂non/(1 − p̂non) ≈ 2.2059. (c) The odds ratio is oddshi/oddsnon ≈ 1.8385. The odds of a high-tech company offering stock options are about 1.84 times those for a non-high-tech firm.
14.28. (a) log(oddshi) ≈ 1.4001 and log(oddsnon) ≈ 0.7911. (b) log(oddsnon) = β0 and log(oddshi) = β0 + β1, so we find the estimates of β0 and β1 from the observed log-odds: b0 = log(oddsnon) ≈ 0.7911 and b1 = log(oddshi) − log(oddsnon) ≈ 0.6090. (c) e^b1 = e^0.6090 ≈ 1.8385, as we found in Exercise 14.27(c).
14.29. (a) With b1 ≈ 0.6090 and SEb1 ≈ 0.3347, the 95% confidence interval is b1 ± 1.96·SEb1 = b1 ± 0.6560, or −0.0470 to 1.2650. (b) Exponentiating the confidence limits
gives the interval 0.9540 to 3.5430. (c) Because the confidence interval for β1 contains 0, or
equivalently because 1 is in the interval for the odds ratio, we could not reject H0 : β1 = 0 at
the 5% level. There does not appear to be a significant difference between the odds of stock
options for high-tech and other firms.
Note: Software reports z = 1.820 and a P-value of 0.0688, which are nearly
identical to the results for a two-proportion z test with the same counts (z = −1.832 and
P = 0.0669)—see the solution to Exercise 8.67. For large samples, these two tests should
give similar results.
14.30. Minitab output is on the following page. All proportions, odds, odds ratios, and
parameter estimates (b0 and b1 ) are unchanged. Because the standard error is smaller, the
95% confidence interval is narrower: b1 ± 1.96SEb1 = b1 ± 0.4637, or 0.1452 to 1.0727. The
odds-ratio interval is therefore 1.1563 to 2.9233. Because 0 is not in the confidence interval
for β1 and 1 is not in the odds-ratio interval, we have significant evidence of a difference in
the odds between the two types of companies.
Note: For testing H0 : β1 = 0, software reports z = 2.573 and P = 0.0101. For
comparison, the test of p1 = p2 yields z = 2.591 and P = 0.0096.
14.32. (a) For female references, p̂w = 48/60 = 0.8, giving oddsw = p̂w/(1 − p̂w) = 4 (“4 to 1”). (b) For male references, p̂m = 52/132 ≈ 0.39, giving oddsm = p̂m/(1 − p̂m) = 0.65 (“13 to 20”). (c) The odds ratio is oddsw/oddsm ≈ 6.1538. (The odds of a juvenile reference are more than six times greater for females.)
14.33. (a) The interval is b1 ± 1.96·SEb1, or 0.2452 to 1.2558. (b) X² = (0.7505/0.2578)² ≈ 8.47. This gives a P-value between 0.0025 and 0.005. (c) We have strong evidence that there is a real (significant) difference in risk between the two groups.
14.34. (a) The interval is b1 ± 1.96·SEb1, or 1.0946 to 2.5396. (b) X² = (1.8171/0.3686)² ≈ 24.3. This gives P < 0.0005. (c) We have strong evidence that there is a real (significant) difference in juvenile references between male and female references.
14.35. (a) The estimated odds ratio is e^b1 ≈ 2.1181 (as we found in Exercise 14.31).
Exponentiating the interval for β1 in Exercise 14.33(a) gives the odds-ratio interval from
about 1.28 to 3.51. (b) We are 95% confident that the odds of death from cardiovascular
disease are about 1.3 to 3.5 times greater in the high blood pressure group.
14.37. (a) The model is log(pi/(1 − pi)) = β0 + β1xi, where xi = 1 if the ith person is over 40, and 0 if he/she is under 40. (b) pi is the probability that the ith person is terminated; this model assumes that the probability of termination depends on age (over/under 40). In this case, that seems to have been the case, but we might expect that other factors were taken into consideration. (c) The estimated odds ratio is e^b1 ≈ 3.859. (Of course, we can also get this from (41/765)/(7/504).) We can also find, for example, a 95% confidence interval for b1: b1 ± 1.96·SEb1 = 0.5409 to 2.1599. Exponentiating this translates to a 95% confidence interval for the odds ratio: 1.7176 to 8.6701. The odds of being terminated are 1.7 to 8.7 times greater for those over 40. (d) Use a multiple logistic regression model, for example, log(pi/(1 − pi)) = β0 + β1x1,i + β2x2,i.
14.38. (a) Positive coefficients indicate increasing odds (and increasing probability), and
negative coefficients indicate decreasing odds. Therefore, the traits that make an individual
more likely to use the Internet are those listed in the rightmost column of the table below.
(The increase for having children is not significant.) (b) The odds ratios are given in the table below; for example, e^−0.063 ≈ 0.9389 for Age. (c) The estimated log(odds) for this individual would be
−0.063(23) + 0.013(50) + 0.367(1) − 0.222(1) + 1.080(1) + 0.285(0) + 0.049(0) = 0.426
so the estimated odds would be e^0.426 ≈ 1.5311. (d) The estimated probability is p = odds/(odds + 1) ≈ 0.6049.
14.39. It is difficult to find the needed probabilities from the numbers as given; this is made easier if we first convert the given information into a two-way table, shown below.

             Eats fruit?
Active?     Yes    No    Total
Yes         169    494    663
No           68    403    471
Total       237    897   1134

The proportions meeting the activity guidelines are p̂fruit = 169/237 ≈ 0.7131 and p̂no = 494/897 ≈ 0.5507, so oddsfruit ≈ 2.4853 and oddsno ≈ 1.2258. Then log(oddsfruit) ≈ 0.9104 and log(oddsno) ≈ 0.2036, so b0 ≈ 0.2036, b1 ≈ 0.7068, and the model is log(odds) = 0.2036 + 0.7068x. Software reports SEb1 ≈ 0.1585 and z ≈ 4.46 for testing H0: β1 = 0. A 95% confidence interval for the odds ratio is 1.49 to 2.77.
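The “software reports” step can be reproduced by hand from the two-way table. The Python sketch below is ours; it uses the usual large-sample standard error of a log odds ratio from a 2×2 table, sqrt(1/a + 1/b + 1/c + 1/d), which matches the SE quoted above:

from math import log, sqrt, exp

# Counts from the two-way table above (rows: active yes/no, columns: eats fruit yes/no)
a, b = 169, 494      # active and eats fruit / active and does not
c, d = 68, 403       # not active and eats fruit / not active and does not

b1 = log((a / c) / (b / d))                # slope = log odds ratio, about 0.707
se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # about 0.159
z  = b1 / se                               # about 4.46
print(b1, se, z, (exp(b1 - 1.96 * se), exp(b1 + 1.96 * se)))   # CI roughly (1.49, 2.77)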
14.40. (a) For females, p̂f = 708/1294 ≈ 0.5471. For males, p̂m = 788/1862 ≈ 0.4232. (b) The odds for females and males are oddsf = p̂f/(1 − p̂f) ≈ 1.2082 and oddsm = p̂m/(1 − p̂m) ≈ 0.7337, so the odds ratio is 1.2082/0.7337 ≈ 1.6467. (c) The model is log(odds) = β0 + β1x, with x = 0 for male and x = 1 for female. (These values for x make the slope positive, because the odds are higher for females.) (d) e^b1 ≈ 1.6467, as we found in part (b). (e) The 95% confidence interval for β1 is b1 ± 1.96·SEb1 = 0.3559 to 0.6417. Exponentiating gives a 95% confidence interval for the odds ratio: 1.4275 to 1.8997. Female odds of reducing spending are 1.4 to 1.9 times those of males.
14.41. For each group, the probability, odds, and log(odds) of being overweight are
p̂no = 65080/238215 ≈ 0.2732,   oddsno = p̂no/(1 − p̂no) ≈ 0.3759,   log(oddsno) ≈ −0.9785
p̂FF = 83143/291152 ≈ 0.2856,   oddsFF = p̂FF/(1 − p̂FF) ≈ 0.3997,   log(oddsFF) ≈ −0.9170
With x = 0 for no fast food and x = 1 for fast food, the logistic regression equation is log(odds) = −0.9785 + 0.0614x. Software reports SEb1 ≈ 0.006163, and for testing
H0 : β1 = 0 we have z = 9.97, leaving little doubt that the slope is not 0. A 95% confidence
interval for the odds ratio is 1.0506 to 1.0763; the odds of being overweight for students at
schools close to fast-food restaurants are about 1.05 to 1.08 times greater than for students
at schools that are not close to fast food.
14.42. (a) The researchers were adjusting for violations of independence: The samples could
have included multiple students from the same school. (b) All of those other variables
could have a connection to being overweight. If the researchers had not controlled for
these variables, then their results might have been weakened (made less significant) if, for
example, they had a slightly higher number of female students from rural schools, or a small
number of non-exercising males in urban schools.
14.43. Portions of SAS and GLMStat output are given on the following page.
(a) The X² statistic for testing this hypothesis is 33.65 (df = 3), which has
P < 0.0001. We conclude that at least one coefficient is not 0. (b) The fitted model is
log(odds) = −6.053 + 0.3710 HSM + 0.2489 HSS + 0.03605 HSE. The standard errors of the
three coefficients are 0.1302, 0.1275, and 0.1253, giving respective 95% confidence intervals
0.1158 to 0.6262, −0.0010 to 0.4988, and −0.2095 to 0.2816. (c) Only the coefficient of
HSM is significantly different from 0, though HSS may also be useful.
Note: In the multiple regression case study of Chapter 11, HSM was also the only
significant explanatory variable among high school grades, and HSS was not even close to
significant. See Figure 11.5 on page 603 of the text.
14.44. Portions of SAS and GLMStat output are given below. (a) The X² statistic for testing
this hypothesis is 14.2 (df = 2), which has P = 0.0008. We conclude that at least one
coefficient is not 0. (b) The model is log(odds) = −4.543+0.003690 SATM +0.003527 SATV.
The standard errors of the two coefficients are 0.001913 and 0.001751, giving respective
95% confidence intervals −0.000059 to 0.007439, and 0.000095 to 0.006959. (The first
coefficient has a P-value of 0.0537 and the second has P = 0.0440.) (c) We (barely) cannot
reject βSATM = 0—though because 0 is just in the confidence interval, we are reluctant to
discard SATM. Meanwhile, we conclude that βSATV ≠ 0.
Note: By contrast, with multiple regression of GPA on SAT scores, we found SATM
useful but not SATV. See Figure 11.8 on page 607 of the text.
14.45. The coefficients and standard errors for the fitted model are on the following page. Note
that the tests requested in parts (a) and (b) are not available with all software packages.
(a) The X² statistic for testing this hypothesis is given by SAS as 19.2256 (df = 3); because
P = 0.0002, we reject H0 and conclude that high school grades add a significant amount
to the model with SAT scores. (b) The X² statistic for testing this hypothesis is 3.4635
(df = 2); because P = 0.1770, we cannot reject H0; SAT scores do not add significantly to
the model with high school grades. (c) For modeling the odds of HIGPA, high school grades
(specifically HSM, and to a lesser extent HSS) are useful, while SAT scores are not.
14.46. (a) The fitted model is log(odds) = −0.6124 + 0.0609 Gender; the coefficient
of gender is not significantly different from 0 (z = 0.21, P = 0.8331). (b) Now,
log(odds) = −5.214 + 0.3028 Gender + 0.004191 SATM + 0.003447 SATV. In this model,
gender is still not significant (P = 0.3296). (c) Gender is not useful for modeling the odds
of HIGPA.
GLMStat output: Gender only
estimate se(est) z ratio Prob>|z|
1 Constant -0.6124 0.4156 -1.474 0.1406
2 Gender 6.087e-2 0.2889 0.2107 0.8331
Gender and SAT scores
estimate se(est) z ratio Prob>|z|
1 Constant -5.214 1.362 -3.828 0.0001
2 Gender 0.3028 0.3105 0.9750 0.3296
3 SATM 4.191e-3 1.987e-3 2.109 0.0349
4 SATV 3.447e-3 1.760e-3 1.958 0.0502
14.47. The models reported below are for the odds of death, as requested in the instructions. If
a student models odds of survival, or codes the indicator variables for hospital and condition
differently, his or her answers will be slightly different from these (but the conclusions
should be the same). (a) The fitted model is log(odds) = −3.892 + 0.4157 Hospital,
using 1 for Hospital A and 0 for Hospital B. With b1 = 0.4157 and SE_b1 = 0.2831,
we find that z = 1.47 or X² = 2.16 (P = 0.1420), so we do not have evidence to
suggest that β1 is not 0. A 95% confidence interval for β1 is −0.1392 to 0.9706 (this
interval includes 0). We estimate the odds ratio to be e^b1 ≈ 1.515, with confidence
interval 0.87 to 2.64 (this includes 1, since β1 might be 0). (b) The fitted model is
log(odds) = −3.109 − 0.1320 Hospital − 1.266 Condition; as before, use 1 for Hospital A
and 0 for Hospital B, 1 for good condition and 0 for poor. The estimated odds ratio is
e^b1 ≈ 0.8764, with confidence interval 0.48 to 1.60. (c) In neither case is the effect of
Hospital significant. However, we can see the effect of Simpson’s paradox in the coefficient
of Hospital, or equivalently in the odds ratio. In the model with Hospital alone, this
coefficient was positive and the odds ratio was greater than 1, meaning Hospital A patients
have higher odds of death. When condition is added to the model, this coefficient is negative
and the odds ratio is less than 1, meaning Hospital A patients have lower odds of death.
GLMStat output: Hospital only
estimate se(est) z ratio Prob>|z|
1 Constant -3.892 0.2525 -15.41 <0.0001
2 Hosp 0.4157 0.2831 -1.469 0.1420
15.1. The rankings are shown below. Group A ranks are 1, 2, 4, 6, and 8; Group B ranks are 3, 5, 7, 9,
and 10.
Group  Rooms  Rank
A        30     1
A        68     2
B       240     3
A       243     4
B       329     5
A       448     6
B       540     7
A       552     8
B       560     9
B       780    10
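A Python sketch of this ranking (not part of the original solution; the room counts are those listed above):

from scipy.stats import rankdata

rooms  = [30, 68, 240, 243, 329, 448, 540, 552, 560, 780]
groups = ["A", "A", "B", "A", "B", "A", "B", "A", "B", "B"]

ranks = rankdata(rooms)                                  # 1 through 10; no ties here
ranks_A = [r for r, g in zip(ranks, groups) if g == "A"]
print(ranks_A, sum(ranks_A))                             # ranks 1, 2, 4, 6, 8; rank sum W = 21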
15.2. The list of ranks is not shown because it is nearly identical to the one shown in the
previous solution; the only change needed is to change 780 to 4003 in the last line. The
ranks assigned to each group are exactly the same.
15.4. Changing the data does not change the hypotheses, so they are the same as in the
previous solution. Additionally, because the assigned ranks did not change, the test statistic
is still W = 21.
Minitab output: Wilcoxon rank sum (Mann-Whitney) confidence interval and test
GrpA N = 5 Median = 243.0
GrpB N = 5 Median = 540.0
Point estimate for ETA1-ETA2 is -228.0
96.3 Percent C.I. for ETA1-ETA2 is (-537.0,208.0)
W = 21.0
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.2101
15.6. Because the ranks and test statistic are unchanged, all answers are the same as those
given in the previous solution.
15.7. (a) For example, 99/254 = 39.0% of self-employed workers are completely satisfied. The
complete table of percents is below; a bar graph of these percents (by job satisfaction category,
for self-employed and not self-employed workers) is not reproduced. Overall, self-employed
workers are more satisfied than the other group. (b) See the Minitab output below: X² = 15.641
with df = 3, for which P = 0.001. We can reject H0 and conclude that job satisfaction and job
type (self-employed or not) are not independent.
Self-employed?  Completely satisfied  Mostly satisfied  Mostly dissatisfied  Completely dissatisfied
Yes             39.0%                 55.9%             3.1%                 2.0%
No              28.2%                 61.2%             8.2%                 2.3%
Minitab output: Chi-square test
Expected counts are printed below observed counts
C2 C3 C4 C5 Total
1 99 142 8 5 254
77.83 152.53 18.06 5.58
2 250 542 73 20 885
271.17 531.47 62.94 19.42
Total 349 684 81 25 1139
ChiSq = 5.760 + 0.727 + 5.606 + 0.059 +
1.653 + 0.209 + 1.609 + 0.017 = 15.641
df = 3, p = 0.001
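The same test can be run with a few lines of Python (not part of the original solution; the observed counts are those in the output above):

import numpy as np
from scipy.stats import chi2_contingency

# Observed counts (rows: self-employed yes/no; columns: completely satisfied ... completely dissatisfied)
observed = np.array([[99, 142, 8, 5],
                     [250, 542, 73, 20]])

chi2, p, df, expected = chi2_contingency(observed)
print(chi2, df, p)   # about 15.64, df = 3, P = 0.001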
15.8. (a) Summary statistics for the two distributions are:
        x̄       s     Min    Q1      M         Q3      Max
Men    15,287  7710   5180   9,951   12,423.5  21,791  29,920
Women  17,085  7858   7694   10,592  15,275.5  22,376  32,291
Various graphical summaries are possible; one choice is a back-to-back stemplot (not
reproduced). The men's mean and median are lower than the women's, but the
stemplots don't suggest a substantial difference. Neither distribution has extreme skewness
or outliers. (b) The Wilcoxon test statistic is W = 99 with two-sided P = 0.6776 (Minitab
output not shown here). We do not have enough evidence to conclude that there is a
difference between genders in words spoken.
        x̄       s     Min   Q1      M       Q3      Max
Men    14,060  9065    695  7464.5  11,118  22,740  36,345
Women  14,252  6515   2363  8345.5  14,602  18,050  32,291
Minitab output: Wilcoxon rank sum test
Mwords N = 37 Median = 11118
Wwords N = 41 Median = 14602
W = 1421.0
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.6890
Two-sample t test
N Mean StDev SE Mean
Mwords 37 14060 9065 1490
Wwords 41 14252 6515 1017
T-Test mu Mwords = mu Wwords (vs not =): T= -0.11 P=0.92 DF= 64
15.10. (a) We find W = 26 and P = 0.0152 (Minitab output on the following page). We
have strong evidence against the hypothesis of identical distributions; we conclude that
the weed-free yield is higher. (b) For testing H0: µ0 = µ9 versus Ha: µ0 > µ9, we find
x̄0 = 170.2, s0 ≈ 5.4216, x̄9 = 157.575, s9 ≈ 10.1181, and t = 2.20, which gives
P = 0.0423 (df = 4.6). We have fairly strong evidence that the mean yield is higher with no
weeds—but the evidence is not quite as strong as in (a). (c) Both tests still reach the same
conclusion, so there is no "practically important impact" on our conclusions. The Wilcoxon
evidence is slightly weaker: W = 22, P = 0.0259. The t-test evidence is slightly stronger:
t = 2.79, df = 3, P = 0.0341. The new statistics for the 9-weeds-per-meter group are
x̄9 ≈ 162.633 and s9 ≈ 0.2082; these are substantial changes for each value.
15.11. (a) Normal quantile plots are not shown. The score 0.00 for child 8 seems to be a low
outlier (although with only five observations, such judgments are questionable). (b) For
testing H0: µ1 = µ2 versus Ha: µ1 > µ2, we have x̄1 = 0.676, s1 ≈ 0.1189, x̄2 = 0.406,
and s2 ≈ 0.2675. Then, t = 2.062, which gives P = 0.0447 (df = 5.5). We have some
evidence that high-progress readers have higher mean scores. (c) We test:
H0: Scores for both groups are identically distributed
vs. Ha: High-progress children systematically score higher
for which we find W = 36 and P = 0.0473 or 0.0463—significant evidence (at α = 0.05)
against the hypothesis of identical distributions. This is equivalent to the conclusion reached
in part (b).
15.12. (a) Normal quantile plots are not shown. The score 0.54 for child 3 seems to be a low
outlier. (b) For testing H0: µ1 = µ2 versus Ha: µ1 > µ2, we have x̄1 = 0.768, s1 ≈ 0.1333,
x̄2 = 0.516, s2 ≈ 0.2001. Then, t = 2.344, which gives P = 0.0259 (df = 6.97). We have
fairly strong evidence that high-progress readers have higher mean scores. (c) We test:
H0 : Scores for both groups are identically distributed
vs. Ha : High-progress children systematically score higher
for which we find W = 38 and P = 0.0184. This is evidence against H0, slightly stronger
than that found in part (b).
15.14. (a) The outline is shown below. (b) We consider score improvements (posttest minus
pretest). The means, medians, and standard deviations are:
Treatment: x̄ = 11.4, M = 11.5, s ≈ 3.1693
Control:   x̄ = 8.25, M = 7.5,  s ≈ 3.6936
A back-to-back stemplot is one way to compare the distributions graphically. Both of these
comparisons support the idea that the positive subliminal message resulted in higher test
scores. (c) We have W = 114, for which P = 0.0501 (or 0.0494, adjusted for ties). This is
just about significant at α = 0.05, and at least warrants further study.
Minitab output: Wilcoxon rank sum test
Trtmt N = 10 Median = 11.500
Ctrl N = 8 Median = 7.500
W = 114.0
Test of ETA1 = ETA2 vs. ETA1 > ETA2 is significant at 0.0501
The test is significant at 0.0494 (adjusted for ties)
Outline for part (a): Random assignment divides the subjects into Group 1 (10 subjects,
Treatment 1: subliminal message) and Group 2 (8 subjects, Treatment 2: neutral message);
the test score change is then observed for each group.
15.15. (a) Back-to-back stemplots (not reproduced) show that unlogged plots appear to have a
greater number of species. (b) We test H0: There is no difference in the number of species on
logged and unlogged plots versus Ha: Unlogged plots have a greater variety of species. The
Wilcoxon test gives W = 159 and P = 0.0298 (0.0290, adjusted for ties). We conclude that the
observed difference is significant; unlogged plots really do have a greater number of species.
Minitab output: Wilcoxon rank sum test
Unlogged N = 12 Median = 18.500
Logged N = 9 Median = 15.000
W = 159.0
Test of ETA1 = ETA2 vs. ETA1 > ETA2 is significant at 0.0298
The test is significant at 0.0290 (adjusted for ties)
15.16. For the Wilcoxon test, we have W = 579, for which P = 0.0064 (0.0063, adjusted for
ties). The evidence is slightly stronger with the Wilcoxon test than for the t and permutation
tests.
Minitab output: Wilcoxon rank sum test
Trtmt N = 21 Median = 53.00
Ctrl N = 23 Median = 42.00
W = 579.0
Test of ETA1 = ETA2 vs. ETA1 > ETA2 is significant at 0.0064
The test is significant at 0.0063 (adjusted for ties)
15.18. We test:
H0 : Food scores and activities scores have the same distribution
vs. Ha : Food scores are higher
The differences, and their ranks, are:
Spa   Food score   Activities score   Difference   Rank
1 90.9 93.8 −2.9 4
2 92.3 92.3 0.0 –
3 88.6 91.4 −2.8 3
4 81.8 95.0 −13.2 6
5 85.7 89.2 −3.5 5
6 88.9 88.2 0.7 1
7 81.0 81.8 −0.8 2
In fact, it is not necessary to give the complete set of rankings; the only observations we
need to make are (1) there is only one positive difference and (2) it is the smallest (in
absolute value) of all the nonzero differences. Therefore, W + = 1.
Note: In assigning ranks, differences of 0 are ignored; see the comment in the text
toward the bottom of page 735. If a student mistakenly assigns a rank of 1 to 0, they would
find W + = 2 (or perhaps 3 if they erroneously count 0 as a “positive difference”).
15.19. We test:
H0 : Food scores and activities scores have the same distribution
vs. Ha : Food scores are higher
The differences, and their ranks, are:
Spa   Food score   Activities score   Difference   Rank
1 77.3 95.7 −18.4 6
2 85.7 78.0 7.7 2
3 84.2 87.2 −3.0 1
4 85.3 85.3 0.0 –
5 83.7 93.6 −9.9 5
6 84.6 76.0 8.6 4
7 78.5 86.3 −7.8 3
There are two positive differences (ranks 2 and 4), so W+ = 6. For the Normal approximation,
µ_W+ = 7(7 + 1)/4 = 14 and σ_W+ = √[7(7 + 1)(14 + 1)/24] = √35 ≈ 5.9161. A student who
mistakenly assigns a rank to the 0 difference (see the note in the previous solution) would
presumably take W+ = 8 (or 9), so they would compute the approximate P-value as
P(W+ ≥ 7.5) = P(Z ≥ −1.10) = 0.8643 or P(W+ ≥ 8.5) = P(Z ≥ −0.93) = 0.8238.
While these are close to the right answer (and lead to the same conclusion), they are not
quite correct. In other situations, failing to ignore differences of 0 may lead to the wrong
conclusion.
Minitab output: Wilcoxon signed rank test (median = 0 versus median > 0)
N FOR WILCOXON ESTIMATED
N TEST STATISTIC P-VALUE MEDIAN
Diff 7 6 6.0 0.853 -3.450
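A Python sketch of this test (not part of the original solution; the scores are taken from the table above). Note that scipy's wilcoxon, like the text, drops the zero difference:

from scipy.stats import wilcoxon

food       = [77.3, 85.7, 84.2, 85.3, 83.7, 84.6, 78.5]
activities = [95.7, 78.0, 87.2, 85.3, 93.6, 76.0, 86.3]
diff = [f - a for f, a in zip(food, activities)]

# zero_method="wilcox" drops the zero difference (spa 4), as in the text;
# with a one-sided alternative, the reported statistic is W+ (ranks of positive differences).
res = wilcoxon(diff, zero_method="wilcox", alternative="greater")
print(res.statistic, res.pvalue)   # W+ = 6; one-sided P about 0.84 to 0.85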
15.22. (a) Five of the six subjects rated drink A higher, by between 2 and 8 points. The
subject who rated drink B higher only gave it a 2-point edge. (b) For testing H0: µd = 0
versus Ha: µd ≠ 0, we have x̄ ≈ 4.167 and s ≈ 3.6560, so t = 2.79 (df = 5) and
P = 0.0384—enough evidence to reject H0. (c) For testing H0: Ratings have the same
distribution for both drinks versus Ha: One drink is systematically rated higher, we have
W+ = 19.5 and P = 0.075—not quite significant at α = 0.05. (d) For a sample this small,
the Wilcoxon test has low power. (See the related note in the solution to Exercise 15.24.)
Minitab output: Matched pairs t test
Variable N Mean StDev SE Mean T P-Value
Diff 6 4.17 3.66 1.49 2.79 0.038
Wilcoxon signed rank test
N FOR WILCOXON ESTIMATED
N TEST STATISTIC P-VALUE MEDIAN
Diff 6 6 19.5 0.075 5.000
15.23. (a) With this additional subject, six of the seven subjects rated drink A
higher, and (as before) the subject who preferred drink B only gave it a
2-point edge. (b) For testing H0: µd = 0 versus Ha: µd ≠ 0, we have
x̄ ≈ 7.8571 and s ≈ 10.3187, so t = 2.01 (df = 6) and P = 0.0906. (c) For
testing H0: Ratings have the same distribution for both drinks versus Ha: One
drink is systematically rated higher, we have W+ = 26.5 and P = 0.043.
(d) The new data point is an outlier (a stemplot of the differences is not reproduced),
which may make the t procedure inappropriate. This also increases the standard deviation of
the differences, which makes t insignificant. The Wilcoxon test is not sensitive to outliers,
and the extra data point makes it powerful enough to reject H0.
Minitab output: Matched pairs t test
Variable N Mean StDev SE Mean T P-Value
Diff 7 7.86 10.32 3.90 2.01 0.091
Wilcoxon signed rank test
N FOR WILCOXON ESTIMATED
N TEST STATISTIC P-VALUE MEDIAN
Diff 7 7 26.5 0.043 5.500
15.24. (a) The differences (treatment minus control) were 0.01622, 0.01102, and 0.01607. The
mean difference was x̄ ≈ 0.01444, and s ≈ 0.002960. The fact that all are positive supports
the idea that there was more growth in the treated plots. (b) For testing H0: µ = 0 versus
Ha: µ > 0, with µ the mean (treatment minus control) difference, we have t = x̄/(s/√3) ≈ 8.45,
df = 2, and P = 0.0069. We conclude that growth was greater in treated plots. (c) The
Wilcoxon statistic is W + = 6, for which P = 0.091. We would not reject H0 (which states
that there is no difference among pairs). (d) A low-power test has a low probability of
rejecting H0 when it is false.
Minitab output: Wilcoxon signed rank test (median = 0 versus median > 0)
N FOR WILCOXON ESTIMATED
N TEST STATISTIC P-VALUE MEDIAN
Diff 3 3 6.0 0.091 0.01485
Note: With only three pairs, the Wilcoxon signed rank test can never give a P-value smaller
than 0.091. This is one difference between some nonparametric tests and parametric tests
like the t test: With the t test, the power improves when we consider alternatives that are
farther from the null hypothesis; for example, if H0 says µ = 0, we have higher power for
the alternative µ = 10 than for µ = 5. With the Wilcoxon signed rank test, all alternatives
look the same; the values of W + and P would be the same if the three differences had been
100, 200, and 300.
Also, note that the “estimated median” in the Minitab output (0.01485) is not the same
as the median of the three differences (0.01607). The process of computing this point
estimate is not discussed in the text, but we will illustrate it for this simple case: The
Wilcoxon estimated median is the median of the set of Walsh averages of the differences.
This set consists of every possible pairwise average (xi + xj )/2 for i ≤ j; note that this
includes i = j, in which case the average is xi . In general, there are n(n + 1)/2 such
averages, so with n = 3 differences, we have 6 Walsh averages: the three differences
(0.01622, 0.01102, and 0.01607) and the averages of each pair of distinct differences
(0.013545, 0.01362, and 0.016145). The median of 0.01102, 0.013545, 0.01362, 0.01607,
0.016145, and 0.01622 is 0.014845.
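A short Python sketch of the Walsh-average computation just described (not part of the original solution; the differences are those listed above):

import numpy as np

diffs = [0.01622, 0.01102, 0.01607]   # treatment minus control differences

# All Walsh averages (x_i + x_j)/2 for i <= j, including i = j
walsh = [(diffs[i] + diffs[j]) / 2
         for i in range(len(diffs)) for j in range(i, len(diffs))]

print(sorted(walsh))      # the six averages listed above
print(np.median(walsh))   # 0.014845, the "estimated median" reported by Minitab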
15.25. We examine the heart-rate increase (final minus resting) from low-rate exercise;
our hypotheses are H0: median = 0 versus Ha: median > 0. The statistic is W+ = 10
(the first four differences are positive, and the fifth is 0, so we drop it). We compute
P = P(W+ ≥ 9.5) = P((W+ − 5)/2.739 ≥ (9.5 − 5)/2.739) = P(Z ≥ 1.64) = 0.0505. This is right on the
borderline of significance: It is fairly strong evidence that heart rate increases, but (barely)
not significant at 5%.
Minitab output: Wilcoxon signed rank test (median = 0 versus median > 0)
N FOR WILCOXON ESTIMATED
N TEST STATISTIC P-VALUE MEDIAN
LowDiff 5 4 10.0 0.050 7.500
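The Normal-approximation arithmetic can be checked with a short Python sketch (not part of the original solution):

import math
from scipy.stats import norm

n = 4          # nonzero differences (the zero difference is dropped)
w_plus = 10    # ranks of the positive differences: 1 + 2 + 3 + 4

mu = n * (n + 1) / 4                                # 5
sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # about 2.739

z = (w_plus - 0.5 - mu) / sigma                     # continuity correction: P(W+ >= 9.5)
print(z, norm.sf(z))                                # about 1.64 and 0.050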
15.26. (a) We first find the Final − Resting differences for both exercise rates (Low: 15, 9,
6, 9, 0; Medium: 21, 24, 15, 15, 18), then compute the differences of these differences
(6, 15, 9, 6, 18). To this last list of differences, we apply the Wilcoxon signed rank test.
The hypotheses are H0 : median = 0 versus Ha : median > 0. (The rank sum test is not
appropriate because we do not have two independent samples.) (b) The statistic is W + = 15
(all five differences were positive), and the reported P-value is 0.030—fairly strong evidence
that medium-rate exercise increases are greater than low-rate exercise increases.
Minitab output: Wilcoxon signed rank test (median = 0 versus median > 0)
N FOR WILCOXON ESTIMATED
N TEST STATISTIC P-VALUE MEDIAN
LowMed 5 5 15.0 0.030 10.50
15.27. For testing H0 : median = 0 versus Ha : median > 0, the Wilcoxon statistic is W + = 119
(14 of the 15 differences were positive, and the one negative difference was the smallest
in absolute value), and P < 0.0005—very strong evidence that there are more aggressive
incidents during moon days. This agrees with the results of the t and permutation tests.
(See the note in the solution to Exercise 15.24 for an explanation of the estimated median
reported by Minitab.)
Minitab output: Wilcoxon signed rank test (median = 0 versus median > 0)
N FOR WILCOXON ESTIMATED
N TEST STATISTIC P-VALUE MEDIAN
diff 15 15 119.0 0.000 2.570
15.28. There are 17 nonzero differences; only one is negative (it is one of the six differences of
magnitude 6, each of which has average rank 14.5).
Diff:  1    1    2   2   2   3    3    3    3    3    3    6     6     6     6     6     6
Rank:  1.5  1.5  4   4   4   8.5  8.5  8.5  8.5  8.5  8.5  14.5  14.5  14.5  14.5  14.5  14.5
This gives W+ = 138.5. (Note that the only tie we really need to worry about is the last
group; all other ties involve only positive differences.)
15.30. If we compute Haiti content minus factory content (so that a negative
difference means that the amount of vitamin C decreased), we find that the
mean change is −5.33 and the median is −6. (See the note in the solution
to Exercise 15.24 for an explanation of the estimated median reported by
Minitab.) The stemplot (not reproduced) is right-skewed. There are five positive differences;
the Wilcoxon statistic is W+ = 37, for which P < 0.0005. The differences
are systematically negative, so vitamin C content is lower in Haiti.
Minitab output: Wilcoxon signed rank test (median = 0 versus median < 0)
            N FOR  WILCOXON            ESTIMATED
         N  TEST   STATISTIC  P-VALUE  MEDIAN
Change  27  27     37.0       0.000    -5.500
15.31. (a) The Wilcoxon statistic is W + = 0 (all of the differences were less than 16), for
which P = 0—very strong evidence against H0 . We conclude that the median weight
gain is less than 16 pounds. (b) Minitab (output below) gives the interval 3.75 to 5.90 kg
for the median weight gain. (For comparison, in the solution to Exercise 7.32, the 95%
confidence interval for the mean µ was about 3.80 to 5.66 kg. See the note in the solution
to Exercise 15.24 for an explanation of the estimated median reported by Minitab.)
Minitab output: Wilcoxon signed rank test (median = 16 versus median < 16)
N FOR WILCOXON ESTIMATED
N TEST STATISTIC P-VALUE MEDIAN
Diff 16 16 0.0 0.000 4.800
Wilcoxon signed rank confidence interval
ESTIMATED ACHIEVED
N MEDIAN CONFIDENCE CONFIDENCE INTERVAL
Diff 16 4.80 94.8 ( 3.75, 5.90)
15.34. (a) The Kruskal-Wallis test (Minitab output below) gives H = 8.73, df = 4, and
P = 0.069—not significant at α = 0.05. Note, however, that the rankings clearly suggest
that vitamin C content decreases over time; the samples are simply too small to achieve
significance even with such seemingly strong evidence. (See also a related comment in
the solution to Exercise 15.24.) (b) The more accurate P-value is more in line with the
apparent strength of the evidence, and does change our conclusion. With it, we reject H0
and conclude that the distribution changes over time.
Minitab output: Kruskal-Wallis test
LEVEL NOBS MEDIAN AVE. RANK Z VALUE
0 2 48.705 9.5 2.09
1 2 41.955 7.5 1.04
3 2 21.795 5.5 0.00
5 2 12.415 3.5 -1.04
7 2 8.320 1.5 -2.09
OVERALL 10 5.5
H = 8.73 d.f. = 4 p = 0.069
15.35. (a) Diagram below. (b) Stemplots (not reproduced) suggest greater density for high-jump
rats and a greater spread for the control group. (c) H = 10.66 with P = 0.005. We conclude
that bone density differs among the groups. ANOVA tests H0: all means are equal, assuming
Normal distributions with the same standard deviation. For Kruskal-Wallis, the null hypothesis
is that the distributions are the same (but not necessarily Normal). (d) There is strong evidence
that the three groups have different bone densities; specifically, the high-jump group has the
highest average rank (and the highest density), the low-jump group is in the middle, and the
control group is lowest.
Minitab output: Kruskal-Wallis test
LEVEL NOBS MEDIAN AVE. RANK Z VALUE
Ctrl 10 601.5 10.2 -2.33
Low 10 606.0 13.6 -0.81
High 10 637.0 22.6 3.15
OVERALL 30 15.5
H = 10.66 d.f. = 2 p = 0.005
H = 10.68 d.f. = 2 p = 0.005 (adjusted for ties)
Outline for part (a): Random assignment divides the 30 rats into Group 1 (10 rats,
Treatment 1: control), Group 2 (10 rats, Treatment 2: low jump), and Group 3 (10 rats,
Treatment 3: high jump); bone density is then observed for each group.
15.36. (a) For ANOVA, H0 : µ1 = µ2 = µ3 = µ4 versus Ha : Not all µi are equal. For
Kruskal-Wallis, H0 says that the distribution of the trapped insect count is the same for
all board colors; the alternative is that the count is systematically higher for some colors.
(b) In the order given, the medians are 46.5, 15.5, 34.5, and 15 insects; it appears that
yellow is most effective, green is in the middle, and white and blue are least effective. The
Kruskal-Wallis test statistic is H = 16.95, with df = 3; the P-value is 0.001, so we have
strong evidence that color affects the insect count (that is, the difference we observed is
statistically significant).
Minitab output: Kruskal-Wallis test
LEVEL NOBS MEDIAN AVE. RANK Z VALUE
Lemon 6 46.50 21.2 3.47
White 6 15.50 7.3 -2.07
Green 6 34.50 14.8 0.93
Blue 6 15.00 6.7 -2.33
OVERALL 24 12.5
H = 16.95 d.f. = 3 p = 0.001
H = 16.98 d.f. = 3 p = 0.001 (adjusted for ties)
15.37. (a) I = 4, n i = 6, N = 24. (b) The table below lists color, insect count, and rank.
There are only three ties (and the second could be ignored, as both of those counts are for
white boards). The Ri (rank sums) are:
Yellow 17 + 20 + 21 + 22 + 23 + 24 = 127
White 3 + 4 + 5.5 + 9.5 + 9.5 + 12.5 = 44
Green 7 + 14 + 15 + 16 + 18 + 19 = 89
Blue 1 + 2 + 5.5 + 8 + 11 + 12.5 = 40
(c) H = [12/(24·25)] · [(127² + 44² + 89² + 40²)/6] − 3(25) = 91.953 − 75 = 16.953.
Under H0, this has approximately the chi-squared distribution with df = I − 1 = 3;
comparing to this distribution tells us that 0.0005 < P < 0.001.
Color: B  B  W  W  W   B   G  B  W   W   B   W    B    G  G  G  Y  G  G  Y  Y  Y  Y  Y
Count: 7  11 12 13 14  14  15 16 17  17  20  21   21   25 32 37 38 39 41 45 46 47 48 59
Rank:  1  2  3  4  5.5 5.5 7  8  9.5 9.5 11  12.5 12.5 14 15 16 17 18 19 20 21 22 23 24
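A Python sketch of this computation (not part of the original solution; the counts are grouped by color as read from the table above). Note that scipy's kruskal applies the tie correction, so it reports the adjusted value:

from scipy.stats import rankdata, kruskal

yellow = [38, 45, 46, 47, 48, 59]
white  = [12, 13, 14, 17, 17, 21]
green  = [15, 25, 32, 37, 39, 41]
blue   = [7, 11, 14, 16, 20, 21]

groups = [yellow, white, green, blue]
pooled = [x for g in groups for x in g]
ranks = rankdata(pooled)          # average ranks are used for tied counts
N = len(pooled)

# H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with R_i the rank sum of group i
total, start = 0.0, 0
for g in groups:
    R = ranks[start:start + len(g)].sum()
    total += R ** 2 / len(g)
    start += len(g)
H = 12 / (N * (N + 1)) * total - 3 * (N + 1)
print(H)                                     # about 16.953 (no tie correction)

print(kruskal(yellow, white, green, blue))   # H about 16.98 with tie correction, P = 0.001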
15.38. (a) Stemplots (not reproduced) appear to suggest that logging reduces the number of
species per plot and that recovery is slow (the 1-year-after and 8-years-after stemplots are
similar). The logged stemplots have some outliers and appear to have more spread than the
unlogged stemplot. The medians are 18.5, 12.5, and 15. (b) For testing H0: all medians are
equal versus Ha: at least one median is different, we have H = 9.31, df = 2, and P = 0.010
(or 0.009, adjusted for ties). This is good evidence of a difference among the groups.
15.39. (a) Yes, the data support this statement: The percent of high-SES subjects who have
never smoked (68/211 = 32.2%) is higher than those percents for middle- and low-SES
subjects (17.3% and 23.7%, respectively), and the percent of current smokers among high-SES
subjects (51/211 = 24.2%) is lower than among the middle- (42.3%) and low- (46.2%) SES
groups. (b) X² = 18.510 with df = 4, for which P = 0.001. There is a significant relationship
between SES and smoking behavior. (c) H = 12.72 with df = 2, so P = 0.002—or, after
adjusting for ties, H = 14.43 and P = 0.001. The observed differences are significant; some
SES groups smoke systematically more.
Minitab output: Chi-square test
        Never  Former  Curr   Total
High     68     92      51     211
        58.68  83.57   68.75
Mid       9     21      22      52
        14.46  20.60   16.94
Low      22     28      43      93
        25.86  36.83   30.30
Total    99    141     116     356
ChiSq = 1.481 + 0.850 + 4.584 +
        2.062 + 0.008 + 1.509 +
        0.577 + 2.119 + 5.320 = 18.510
df = 4, p = 0.001
Kruskal-Wallis test
LEVEL    NOBS  MEDIAN  AVE. RANK  Z VALUE
High      211   2.000      162.4    -3.56
Mid        52   2.000      203.6     1.90
Low        93   2.000      201.0     2.46
OVERALL   356              178.5
H = 12.72  d.f. = 2  p = 0.002
H = 14.43  d.f. = 2  p = 0.001  (adjusted for ties)
…to make such a graph for them; we can simply observe that 7 of those 10 calls took
5 hours, which is quite different from the distribution for Verizon customers. (A histogram
of repair response times in hours, with frequencies up to about 40, is not reproduced.)
The means and medians tell the same story:
Verizon: x̄_V ≈ 1.7263 hr, M_V = 1 hr
CLEC:    x̄_C = 3.8 hr,    M_C = 5 hr
(b) The distributions are sharply skewed, and the sample sizes are quite different; the t test is
not reliable in situations like this. The Wilcoxon rank-sum test gives W = 4778.5, which is
highly significant (P = 0.0026 or 0.0006). We have strong evidence that response times for
Verizon customers are shorter. It is also possible to apply the Kruskal-Wallis test (with two
groups). While the P-values are slightly different (P = 0.005, or 0.001 adjusted for ties), the
conclusion is the same: We have strong evidence of a difference in response times.
Minitab output: Wilcoxon rank sum test
Verizon N = 95 Median = 1.000
CLEC N = 10 Median = 5.000
W = 4778.5
Test of ETA1 = ETA2 vs. ETA1 < ETA2 is significant at 0.0026
The test is significant at 0.0006 (adjusted for ties)
Kruskal-Wallis test
LEVEL NOBS MEDIAN AVE. RANK Z VALUE
1 95 1.000 50.3 -2.80
2 10 5.000 78.7 2.80
OVERALL 105 53.0
H = 7.84 d.f. = 1 p = 0.005
H = 10.54 d.f. = 1 p = 0.001 (adjusted for ties)
…the latter exercise requests the same analysis for ANOVA. The means, standard deviations,
and medians (all in millimeters) are:
Variety   n    x̄        s       M
bihai    16   47.5975  1.2129  47.12
red      23   39.7113  1.7988  39.16
yellow   15   36.1800  0.9753  36.11
(Side-by-side boxplots of flower length by Heliconia variety, roughly 35 to 47 mm, are not
reproduced.)
The appropriate rank test is a Kruskal-Wallis test of
H0 : all three varieties have the same length distribution versus Ha : at least one variety is
systematically longer or shorter. We reject H0 and conclude that at least one species has dif-
ferent lengths (H = 45.35, df = 2, P < 0.0005).
Minitab output: Kruskal-Wallis test
LEVEL NOBS MEDIAN AVE. RANK Z VALUE
1 16 47.12 46.5 5.76
2 23 39.16 26.7 -0.32
3 15 36.11 8.5 -5.51
OVERALL 54 27.5
H = 45.35 d.f. = 2 p = 0.000
H = 45.36 d.f. = 2 p = 0.000 (adjusted for ties)
15.45. Use the Wilcoxon rank sum test with a two-sided alternative. For meat, W = 15 and
P = 0.4705, and for legumes, W = 10.5 and P = 0.0433 (or 0.0421). There is no evidence
of a difference in iron content for meat, but for legumes the evidence is significant at
α = 0.05.
Minitab output: Wilcoxon rank sum test for meat
Alum N = 4 Median = 2.050
Clay N = 4 Median = 2.375
W = 15.0
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.4705
Cannot reject at alpha = 0.05
Wilcoxon rank sum test for legumes
Alum N = 4 Median = 2.3700
Clay N = 4 Median = 2.4550
W = 10.5
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.0433
The test is significant at 0.0421 (adjusted for ties)
15.47. (a) The three pairwise comparisons are bihai-red, bihai-yellow, and red-yellow. (b) The
test statistics and P-values are given in the Minitab output below; all P-values are reported
as 0 to four decimal places. (c) All three are easily significant at the overall 0.05 level.
Minitab output: Wilcoxon rank sum test for bihai – red
bihai N = 16 Median = 47.120
red N = 23 Median = 39.160
W = 504.0
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.0000
Wilcoxon rank sum test for bihai – yellow
bihai N = 16 Median = 47.120
yellow N = 15 Median = 36.110
W = 376.0
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.0000
Wilcoxon rank sum test for red – yellow
red N = 23 Median = 39.160
yellow N = 15 Median = 36.110
W = 614.0
Test of ETA1 = ETA2 vs. ETA1 ~= ETA2 is significant at 0.0000
16.2. The standard deviation of a sample measures the spread of that sample. The standard
error of a sample mean (or other statistic) estimates how much the mean would vary, if you
were to take the means of many samples from the same population. The SE is smaller by a
factor of √n.
16.3. (a) The bootstrap samples from the sample (that is, the data), not the population. (b) The
bootstrap samples with replacement. (c) The sample size should be equal to the original
sample. (d) The bootstrap distribution is usually similar to the sampling distribution in shape
and spread, but not in center.
16.4. The bootstrap distribution is (usually) close to Normal, so we expect the sampling
distribution to also be close to Normal.
(Bootstrap distribution of mean(IQ) not reproduced.)
16.5. The bootstrap distribution is (usually) close to Normal, with some positive skewness. We
expect the sampling distribution to be close to Normal.
(Normal quantile plot and bootstrap distribution of mean(CO2) not reproduced.)
16.6. Due to the small sample size, the bootstrap distribution shows some discreteness (note
the small “stair-steps” in the quantile plot). This particular bootstrap distribution looks
reasonably Normal, but with a sample size this small, the sample skewness may vary
substantially, so we cannot say if the sampling distribution is really skewed.
Note: The small-sample variability in skewness is not discussed until Section 16.3.
Bootstrap methods are not well-suited to datasets this small.
(Normal quantile plot and bootstrap distribution of mean(Hours) not reproduced.)
16.7. The bootstrap distribution suggests that the sampling distribution should be close to
Normal.
(Normal quantile plot and bootstrap distribution of mean song length, in seconds, not reproduced.)
16.9. In each case, SEboot will vary. To get an idea of how much variability one might
observe, a range of "typical" bootstrap SEs is given, based on 500 trials. (a) For the IQ data,
s ≈ 14.8009, so SE_x̄ ≈ 1.9108. SEboot will typically be between 1.77 and 2.01. (b) For
the CO2 data, s ≈ 4.8222, so SE_x̄ ≈ 0.6960. SEboot will typically be between about 0.64 and
0.74. (c) For the video-watching data, s ≈ 3.8822, so SE_x̄ ≈ 1.3726. SEboot will typically
be between about 1.20 and 1.36—almost certainly an underestimate. The bootstrap is biased
downward for estimating standard error by a factor of about √((n − 1)/n), which is about
0.94 when n = 8.
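The resampling idea behind SEboot can be sketched in a few lines of Python (this is not from the text; the data values are made up for illustration):

import numpy as np

rng = np.random.default_rng(1)
data = np.array([2.0, 3.5, 4.0, 5.5, 7.0, 8.0, 9.5, 14.5])   # hypothetical sample, n = 8

B = 1000
boot_means = np.array([rng.choice(data, size=len(data), replace=True).mean()
                       for _ in range(B)])

se_boot = boot_means.std(ddof=1)                     # bootstrap standard error of the mean
se_formula = data.std(ddof=1) / np.sqrt(len(data))   # s / sqrt(n)
print(se_boot, se_formula)                           # SEboot is usually a bit below s/sqrt(n)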
…mal in the tails; both the quantile plot and the histogram (on the following page) are
clearly skewed to the right. (A histogram of call length, 0 to 3000, with frequencies up to
about 60, is not reproduced.)
(Normal quantile plot and bootstrap distribution of mean(Callcenter80 length) not reproduced.)
16.11. (a) The CALLCENTER20 bootstrap distribution is slightly skewed to the right, but it is
considerably less skewed than the CALLCENTER80 bootstrap distribution. (b) The standard
error for the smaller data set is much smaller: For CALLCENTER20, the standard error is
almost always between 21 and 25, and for CALLCENTER80, it is almost always between
35 and 41.
Note: The difference in standard errors is primarily because the sample standard
deviation for the CALLCENTER20 data is much smaller (103.8 versus 342.0).
(Normal quantile plot and bootstrap distribution of mean(Callcenter20 length) not reproduced.)
16.12. (a) The mean of the sample is x̄ = 13.7567, so the bootstrap bias estimate is
13.762 − 13.7567 = 0.0053. (b) SEboot = 4.725. (We do not need to divide the given
value by anything; it is already the estimated standard deviation of the sample mean.) (c) For
df = 5, the appropriate critical value is t* = 2.571, so the 95% bootstrap confidence interval
is x̄ ± t* SEboot = 13.7567 ± 12.146 = 1.6107 to 25.9027.
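A quick check of this arithmetic in Python (the bootstrap mean and SEboot are the values quoted above; the sample size n = 6 is implied by df = 5):

from scipy.stats import t

xbar = 13.7567        # sample mean (given above)
boot_mean = 13.762    # mean of the bootstrap distribution (given above)
se_boot = 4.725       # bootstrap standard error (given above)
n = 6                 # sample size implied by df = 5

bias = boot_mean - xbar              # about 0.0053
tstar = t.ppf(0.975, df=n - 1)       # about 2.571
lo = xbar - tstar * se_boot
hi = xbar + tstar * se_boot
print(bias, lo, hi)                  # about 1.61 to 25.90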
16.13. See also the solution to Exercise 16.8. (a) The bootstrap distribution is skewed; a
t interval might not be appropriate. (b) The bootstrap t interval is x̄ ± t* SEboot, where
x̄ = 354.1 sec, t* = 2.0096 for df = 49, and SEboot is typically between 39.5 and 46.5. This
gives the range of intervals shown below. (c) The interval reported in Example 7.11 was
266.6 to 441.6 seconds.
Typical ranges:
SEboot   39.5 to 46.5
t lower  260.7 to 274.7
t upper  433.5 to 447.5
16.14. (a) Based on 1000 resamples, SEboot is almost always between 3.8 and 4.5. (b) The
bootstrap distribution looks reasonably Normal, with no appreciable bias, so a bootstrap t
interval is appropriate. Typical ranges are shown below. (c) The t interval reported in
Example 7.15 (page 439) was 1.2 to 18.7.
Typical ranges:
SEboot   3.9 to 4.5
t lower  0.88 to 2.09
t upper  17.82 to 19.03
(Normal quantile plot and bootstrap distribution of diffmeans(DRP) not reproduced.)
16.15. The summary statistics given in Example 16.6 include standard deviations
s1 ≈ 14.7 min for Verizon and s2 ≈ 19.5 min for CLEC, so SE_D = √(s1²/n1 + s2²/n2) ≈ 4.0820.
(Computation from the original data gives SE_D ≈ 4.0827.) The standard error reported by the
S-PLUS bootstrap routine (shown in the text following that example) is 4.052.
16.16. See also the solution to Exercise 16.6. (a) The bootstrap distribution found in the
solution to Exercise 16.6 looks reasonably Normal. (b) The likely range of bootstrap intervals
is shown below. (c) With x̄ = 6.75 hr, s ≈ 3.8822, and t* = 2.365 (df = 7), the usual 95%
confidence interval is 3.50 to 10.00, so the bootstrap interval is almost always narrower than
the usual interval.
Typical ranges:
SEboot   1.2 to 1.4
t lower  3.5 to 3.9
t upper  9.6 to 10
16.17. See also the solution to Exercise 16.10. (a) The bootstrap bias is typically between −4
and 4, which is small relative to x̄ = 196.575 min. (b) Ranges for the bootstrap interval are
given below. (c) SE_x̄ ≈ 38.2392, while SEboot ranges from about 35 to 41. The usual t
interval is 120.46 to 272.69 min.
Typical ranges:
Bias     −4 to 4
SEboot   35 to 41
t lower  114 to 127
t upper  266 to 279
16.18. (a) The two distributions should be similar (although the default plots may differ
depending on what software is used). (b) Typical ranges for these quantities are shown below.
(c) The interval in Example 16.5 was (210.19, 277.81).
Typical ranges:
x̄(25%)   242.7 to 246.4
Bias     −1.3 to 2.4
SEboot   16 to 19
t lower  205.5 to 211.5
t upper  276.5 to 282.5
(Normal quantile plot and bootstrap distribution of trim25(Real estate price) not reproduced.)
(Normal quantile plot and bootstrap distribution of stdev(Real estate price) not reproduced.)
16.20. (a) Stemplots and boxplots of the North and South DBH data (copied from an earlier
solution) and the bootstrap distribution of North DBH − South DBH are not reproduced here.
16.21. (a) The data appear to be roughly Normal, though with the typical random gaps and
bunches that usually occur with relatively small samples. It appears from both the histogram
and quantile plot that the mean is slightly larger than zero, but the difference is not large
enough to rule out a N(0, 1) distribution. (b) The bootstrap distribution is extremely close to
Normal with no appreciable bias. (c) SE_x̄ ≈ 0.1308, and the usual t interval is −0.1357 to
0.3854. Typical results for SEboot and the bootstrap interval are shown below.
Typical ranges:
Bias     −0.016 to 0.02
SEboot   0.12 to 0.14
t lower  −0.16 to −0.11
t upper  0.36 to 0.41
(Normal quantile plot and histogram of the Normal78 data not reproduced.)
16.22. Based on a quantile plot and histogram, the bootstrap distribution is quite non-Normal.
(Normal quantile plot and bootstrap distribution of median(Real estate price) not reproduced.)
16.23. Because the scores are all between 1 and 10, there can be no extreme outliers, so
standard t methods should be safe. The bootstrap distribution appears to be quite Normal,
with little bias. The usual t interval is 4.9256 to 6.8744, which is in the range of typical
bootstrap intervals.
Typical ranges:
Bias     −0.07 to 0.06
SEboot   0.44 to 0.53
t lower  4.8 to 5.0
t upper  6.8 to 7.0
(Bootstrap distribution of mean(Luck) not reproduced.)
16.24. (a) The sample standard deviation is s ≈ 4.4149 mpg. (b) The typical range for SEboot
is in the table below. (c) SEboot is quite large relative to s, suggesting that s is not a very
accurate estimate. (d) There is substantial negative bias and some skewness, so a t interval is
probably not appropriate.
Typical ranges:
Bias     −0.22 to −0.09
SEboot   0.55 to 0.65
t lower  3.07 to 3.26
t upper  5.57 to 5.76
(Normal quantile plot and bootstrap distribution of stdev(Fuel efficiency) not reproduced.)
(Normal quantile plot and histogram of the Billionaire wealth data, and the bootstrap
distribution of trim25(Billionaire wealth), not reproduced.)
16.26. (a) The bootstrap distribution for the CLEC mean is strongly right-skewed with mean
16.5, with bias and SEboot in the following ranges:
Bias     −0.40 to 0.42
SEboot   3.6 to 4.3
t lower  7.6 to 9.0
t upper  24.0 to 25.4
For comparison, the bootstrap distribution for the ILEC mean in Figure 16.3 is barely skewed,
with a mean of 8.41 and SEboot = 0.367. (b) Note that SEboot is much larger for
CLEC than for ILEC. Because the ILEC bootstrap means vary so little, when we compute
x̄_ILEC − x̄_CLEC, it is the latter term that primarily determines the shape of the distribution of
the difference. Because of the minus sign, the right skewness of the CLEC means cause the
difference to be left-skewed.
(Normal quantile plot and bootstrap distribution of mean(CLEC repair time) not reproduced.)
16.27. (a) x̄ would have a Normal distribution with mean 8.4 and standard deviation 14.7/√n.
(b) and (c) Histograms below. The values of SEboot will be quite variable, both because of
variation in the original sample, and variation due to resampling. (d) Student answers will
vary, depending on their samples. There may be some skewness (right or left) for smaller
samples. SEboot should be roughly halved each time the sample size increases by a factor of
4, although for n = 10 and n = 40, the size of SEboot can vary considerably.
(Histograms of mean(x10), mean(x40), and mean(x160) not reproduced.)
16.28. (a) The mean is 8.4, and the standard deviation is 14.7/√n. (b) and (c) Histograms on
the following page. The values of SEboot will be quite variable, both because of variation
in the original sample, and variation due to resampling. (d) Student answers may vary,
depending on their samples. There may be substantial right skewness, and some irregularity,
for smaller samples (the n = 10 histogram below is an extreme case), but the distribution
should be closer to Normal for large samples. There will typically be little or no bias.
SEboot should be roughly halved each time the sample size increases by a factor of 4,
although for n = 10 and n = 40, the size of SEboot may vary considerably.
(Histograms of mean(repair10), mean(repair40), and mean(repair160) not reproduced.)
16.29. Student answers should vary depending on their samples. They should notice that the
bootstrap distributions are approximately Normal for larger sample sizes. For small samples,
the sample could be skewed one way or the other in Exercise 16.27, and most should be
right skewed for Exercise 16.28. Some of that skewness should come through into the
bootstrap distribution.
16.30. For a 90% bootstrap percentile confidence interval, we choose the 5th and 95th
percentiles of the bootstrap distribution. For an 80% interval, use the 10th and 90th
percentiles.
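A minimal Python sketch of the percentile method (not from the text; the data here are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=8.4, size=50)   # hypothetical right-skewed sample

boot_means = np.array([rng.choice(data, size=len(data), replace=True).mean()
                       for _ in range(2000)])

print(np.percentile(boot_means, [5, 95]))    # 90% bootstrap percentile interval
print(np.percentile(boot_means, [10, 90]))   # 80% bootstrap percentile interval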
16.31. (a) The bootstrap distribution looks close to Normal (though that does not mean much
with this small sample). The bias is small. (b) The typical range of bootstrap t intervals is
shown below. (c) The bootstrap percentile interval is much narrower than the bootstrap t
interval. (It is typically 70% to 80% as wide.)
Typical ranges:
Bias              −0.54 to 0.54
SEboot            4.35 to 5.01
t lower           0.89 to 2.57
t upper           24.94 to 26.63
Percentile lower  3.7 to 5.9
Percentile upper  22.1 to 24.5
Note: The reason that the percentile interval is narrower in this setting is that the bootstrap
distribution has "heavy tails," visible in the slight curvature on the edges of the quantile plot.
This inflates the standard error, and therefore makes the t interval wider than it should be.
(Normal quantile plot and bootstrap distribution of mean(repair6) not reproduced.)
16.32. (a) The sample standard deviation is s ≈ 14.8 with n = 60, so SE_x̄ ≈ 1.9108. The
usual t confidence interval is x̄ ± 2.001 SE_x̄ = 114.9833 ± 3.8235 = 111.16 to 118.81.
(b) The bootstrap distribution appears to be reasonably close to Normal, except perhaps in
the tails. Ranges for SEboot and the bootstrap t interval are given below. (c) The intervals
agree fairly well, although resampling variation can produce different results. Here the
formula interval would be fine.
Typical ranges:
Bias     −0.19 to 0.23
SEboot   1.78 to 2.04
t lower  110.9 to 111.5
t upper  118.5 to 119.1
(Normal quantile plot and bootstrap distribution of mean(IQ), with boot t, percentile, BCa,
and usual t intervals marked, not reproduced.)
16.33. (a) The bootstrap percentile and t intervals are very similar, suggesting that the t
intervals are acceptable. (b) Every interval (percentile and t) includes 0.
Typical ranges:
t lower           −0.16 to −0.11
t upper           0.36 to 0.41
Percentile lower  −0.17 to −0.09
Percentile upper  0.35 to 0.42
Note: In the solution to Exercise 16.31, the percentile intervals were always 70% to 80% as
wide as the t intervals (because of the heavy tails of that bootstrap distribution). In this case,
the width of the percentile interval is 93% to 106% of the width of the t interval.
…right, but has no extreme outliers, so standard t procedures should be safe unless high
accuracy is needed. (A histogram of the data30 values is not reproduced.) (b) The usual t
interval is x̄ ± 2.045 SE_x̄ = 99.8 ± 3.05 = 96.75 to 102.85. (c) The bootstrap distribution
is very close to Normal with no appreciable bias; a t interval should be accurate. Ranges are
in the table below. (d) The bootstrap percentile interval (ranges below) is typically similar to
the bootstrap t interval, and both are similar to the standard t interval. We conclude that the
usual t interval is accurate.
Note: The width of the percentile interval is typically 90% to 103% of the width of the t interval.
Typical ranges:
Bias              −0.15 to 0.15
SEboot            1.34 to 1.60
t lower           96.7 to 97.0
t upper           102.6 to 102.9
Percentile lower  96.6 to 97.4
Percentile upper  102.3 to 103.2
(Normal quantile plot and bootstrap distribution of mean(data30) not reproduced.)
16.35. These intervals are given on page 16-37 of the text: The percentile interval is (−0.128,
0.356), and the bootstrap t interval is (−0.144, 0.358). The differences between them are small
relative to the widths of the intervals, so they do not indicate appreciable skewness.
16.36. (a) These intervals will vary in a manner similar to the bootstrap t and percentile
intervals; see the solution to Exercise 16.34 for likely ranges. (b) BCa and tilting intervals
are typically similar to the standard t interval (96.75 to 102.85), again suggesting that the
usual t interval is accurate. (c) For a quick check, we might use the percentile interval. For
a more accurate check, we should use a BCa or tilting interval (if they are available).
16.37. Typical ranges for the BCa interval are shown below; the tilting interval will be
similar. Most intervals are fairly similar to the bootstrap t and percentile intervals from
Example 16.10, suggesting that the simpler intervals are adequate.
Typical ranges:
BCa lower  −0.19 to −0.11
BCa upper  0.32 to 0.41
16.38. As was previously noted in the solutions to Exercises 16.8 and 16.13, the skewness
is a cause for concern. The lower limit of the percentile interval is generally larger than the
lower limit of the bootstrap t interval. The BCa interval is almost always shifted to the right
relative to both the t and percentile intervals. The t and percentile intervals are inaccurate
here; the more sophisticated BCa or tilting intervals are more reliable.
Typical ranges:
t lower           260.7 to 274.7
t upper           433.5 to 447.5
Percentile lower  270.2 to 287.0
Percentile upper  432.8 to 463.3
BCa lower         281.4 to 300.0
BCa upper         450.9 to 516.2
(Bootstrap distribution of mean(Seconds), with boot t, percentile, and BCa intervals marked,
not reproduced.)
16.39. The percentile interval is shifted to the right relative to the bootstrap t interval. The
more accurate intervals are shifted even further to the right.
Typical ranges:
t lower           114 to 127
t upper           266 to 279
Percentile lower  127 to 140
Percentile upper  267 to 298
BCa lower         137 to 152
BCa upper         292 to 371
(Bootstrap distribution of mean(Callcenter80 length), with boot t, percentile, and BCa intervals
marked, not reproduced.)
16.40. The percentile interval is typically shifted to the left relative to the others; the BCa
and tilting intervals are farther to the right. Based on these differences, the BCa or tilting
interval should be used.
Typical ranges:
t lower           139.4 to 162.7
t upper           470.9 to 494.2
Percentile lower  122.8 to 151.3
Percentile upper  437.7 to 479.5
BCa lower         162.3 to 202.3
BCa upper         484.7 to 609.1
(Bootstrap distribution of stdev(Real estate price), with boot t, percentile, and BCa intervals
marked, not reproduced.)
16.41. The bootstrap distribution for the smaller sample is less skewed. The standard t interval
is 80.34 to 177.46; the bootstrap t interval is similar, and the other bootstrap intervals are
generally narrower and shifted to the right.
Note: Generally, a smaller sample should result in less regularity—that is, more
skewness, larger standard error, etc. In this case, the smaller sample does not contain the
nine highest call lengths, many of which would be considered outliers. Those increase the
skewness of the bootstrap distribution for CALLCENTER80.
Typical ranges:
t lower           78.0 to 84.4
t upper           173.3 to 179.8
Percentile lower  80.8 to 92.8
Percentile upper  169.7 to 181.6
BCa lower         83.1 to 96.2
BCa upper         173.0 to 191.9
(Bootstrap distribution of mean(Callcenter20 length), with boot t, percentile, and BCa intervals
marked, not reproduced.)
16.42. See also the solution to Exercise 16.26. (a) The bootstrap distribution shows strong
right skewness, making the formula t, bootstrap t, and (to a lesser extent) the percentile
intervals unreliable. The BCa and tilting intervals adjust for skewness, so they should be
accurate. (b) Typical ranges for these intervals are shown on the right. Relative to the
sample mean x̄ ≈ 16.5, the BCa and tilting intervals are much more asymmetrical than
the other intervals, because they take into account the skewness in the data. The bootstrap
t ignores the skewness, and the percentile interval only catches part of the skewness. In
practical terms, a t interval would tend to underestimate the true value: It would not stretch
far enough to the right, so it would have a probability of missing the population mean
higher than 5%. This is true to a lesser extent for the percentile interval.
16.43. The observed difference is x̄_ILEC − x̄_CLEC ≈ −8.1. Ranges for all three intervals are
given below. Because of the left skew of the bootstrap distribution, the t interval does not
reach far enough to the left and reaches too far to the right, meaning that the interval would
be too high too often, effectively overestimating where the true difference lies. This may
also be true for the percentile interval, but considerably less so.
Typical ranges:
t lower  −16.5 to −15.4
t upper  −0.8 to 0.3
(Bootstrap distribution of ILEC mean − CLEC mean, with boot t, percentile, and BCa intervals
marked, not reproduced.)
(Bootstrap distribution of corr(bots, spams/day), with boot t, percentile, and BCa intervals
marked, not reproduced.)
16.46. We should resample whole observations. If the data are stored in a spreadsheet with
observations in rows and the x and y variables in two columns, then we should pick a
random sample of rows with replacement. When a row is picked, we put the whole row into
a bootstrap data set. By doing so, we maintain the relationship between x and y.
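A brief Python sketch of this row-resampling idea (not from the text; the paired data are hypothetical, and the correlation is bootstrapped as an example):

import numpy as np

rng = np.random.default_rng(2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # hypothetical paired data
y = np.array([2.1, 1.9, 3.6, 3.8, 5.3, 4.9, 7.2, 7.4])

n, B = len(x), 1000
boot_corr = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)                   # resample row indices with replacement
    boot_corr[b] = np.corrcoef(x[idx], y[idx])[0, 1]   # each (x, y) pair stays together

print(np.corrcoef(x, y)[0, 1], boot_corr.std(ddof=1))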
16.47. (a) The regression equation for predicting salary (in $millions) is ŷ = 0.8125 + 7.7170x,
where x is batting average. (The slope is not significantly different from 0: t = 0.744,
P = 0.461.) (b) The bootstrap distribution is reasonably close to Normal, suggesting that any
of the intervals would be reasonably accurate. Ranges for the intervals are given below.
(c) These results are consistent with our conclusion about correlation: All intervals include
zero, which corresponds to no (linear) relationship between batting average and salary.
Typical ranges:
Bias              −0.56 to 1.35
SEboot            9.26 to 10.6
t lower           −13.6 to −10.8
t upper           26.3 to 29.0
Percentile lower  −12.7 to −7.7
Percentile upper  26.0 to 31.7
BCa lower         −13.6 to −7.6
BCa upper         24.7 to 33.0
(Normal quantile plot and bootstrap distribution of the regression line slope, with boot t,
percentile, and BCa intervals marked, not reproduced.)
16.48. (a) The residuals versus bots plot suggests that spread increases with the number of
bots—a violation of the conditions for regression. The small sample size makes it hard to
draw conclusions from the quantile plot (it is not very linear, but that is primarily because of
one point). (b) The bootstrap distribution is decidedly non-Normal; we should not use the t
interval. (The percentile interval is more reliable, but is still somewhat risky because of the
shape of the bootstrap distribution.) (c) Ranges for all three intervals are given in the table
below. As expected, the bootstrap t interval is inaccurate, but the percentile and BCa intervals
are quite similar. With t* = 2.306 for df = 8, the standard t interval is
0.1705 ± t*(0.03189) = 0.0969 to 0.2440—shifted to the right relative to BCa (and percentile).
Typical ranges:
Bias              −0.02 to −0.01
SEboot            0.045 to 0.053
t lower           0.050 to 0.067
t upper           0.274 to 0.290
Percentile lower  0.028 to 0.049
Percentile upper  0.207 to 0.216
BCa lower         0.029 to 0.053
BCa upper         0.207 to 0.224
Note: Because H0 : β1 = 0 is equivalent to H0 : ρ = 0, it might be tempting to think that
bootstrapping the slope and bootstrapping the correlation should be equivalent. Although
the distributions are usually similar, the resampling process also changes sx and sy , which
complicates the relationship between ρ and β1 .
(Plots of residuals versus spam bots, a Normal quantile plot of the residuals, and the Normal
quantile plot and bootstrap distribution of the regression line slope, with boot t, percentile,
and BCa intervals marked, are not reproduced.)
16.49. (a) The tails of the residuals are somewhat heavier than we would expect from a
Normal distribution. (b) The bootstrap distribution is reasonably close to Normal, so the
bootstrap t should be fairly accurate. (c) Ranges for all three bootstrap intervals are given in
the table below; they all give similar results. With t* = 2.074 for df = 22, the standard t
interval is 1.1159 ± t*(0.0181) = 1.0784 to 1.1534. The bootstrap intervals are fairly close
to this.
Typical ranges:
Bias              −0.0008 to 0.0026
SEboot            0.016 to 0.019
t lower           1.07 to 1.09
t upper           1.14 to 1.16
Percentile lower  1.08 to 1.09
Percentile upper  1.14 to 1.16
BCa lower         1.07 to 1.09
BCa upper         1.14 to 1.16
[Figure: Residuals versus debt in 2006, and Normal quantile plot of the residuals.]
[Figure: Normal quantile plot and bootstrap distribution of the regression line slope.]
[Figure: Bootstrap distributions of mean(aggdiff−all) and mean(aggdiff−trimmed).]
16.51. No, because we believe that one population has a smaller spread, but in order to pool
the data, the permutation test requires that both populations be the same when H0 is true.
16.52. The standard error is about √((0.04)(0.96)/250) ≈ 0.0124. We should not feel comfortable
declaring this to be significant at the 5% level, because 0.04 is less than one SE below 0.05.
[Figure: Bootstrap distribution of 2002 price − 2001 price.]
… Ha : µILEC < µCLEC, we find t ≈ −3.25, df ≈ 10.71, and P ≈ 0.004. (c) Based on 1000 resamples
(each of size 1000), the P-value typically ranges from 0.001 to 0.010. The permutation test does not
require Normal distributions, and gives more accurate answers in the case of skewness. A plot of the
permutation distribution shows there is substantial skewness. (d) The difference is significant at the 5%
level (and usually at the 1% level).
[Figure: Normal quantile plot and permutation distribution of mean(ILEC time − CLEC time).]
16.56. The standard deviations are approximately √(P(1 − P)/n), giving the results below.

  Study      n        P       Standard deviation
  DRP        1,000    0.015   0.00384
  Verizon    500,000  0.0183  0.00019
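As a quick check of the arithmetic, a minimal Python sketch of √(P(1 − P)/n) for the two studies:

```python
from math import sqrt

for study, n, P in [("DRP", 1000, 0.015), ("Verizon", 500_000, 0.0183)]:
    print(study, round(sqrt(P * (1 - P) / n), 5))
# prints 0.00384 for DRP and 0.00019 for Verizon
```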
16.57. (a) The two populations should be the same shape, but skewed—or otherwise clearly
non-Normal—so that the t test is not appropriate. (b) Either test is appropriate if the two
populations are both Normal with the same standard deviation. (c) We can use a t test, but
not a permutation test, if both populations are Normal with different standard deviations.
16.58. With resamples of size 1000, the P-value typically ranges from 0.53 to 0.68. If the
price distributions in 2001 and 2002 were the same, then a difference in medians as large as
we observed would occur more than half the time. We conclude that the difference is easily
explained by chance, and is therefore not statistically significant.
Note: The permutation distribution (below) appears to be bimodal, but that does not
affect our conclusion.
[Figure: Normal quantile plot and permutation distribution of ILEC median − CLEC median.]
… Ha : µ > 0, where µ is the population mean gain for students attending the summer language
institute. We find t ≈ 3.86, df = 19, and P ≈ 0.0005. (b) The quantile plot (below) looks odd because we
have a small sample, and all differences are integers. (c) The P-value is almost always less than 0.002.
Both tests lead to the same conclusion: The difference is statistically significant.
Note: The text states that “the histogram [of the permutation distribution] looks a bit
odd.” In fact, different software produces different default histograms, some of which look
fine. (This statement was made about the default histogram produced by S-PLUS.) To avoid
potential confusion, check what your software does, and (if necessary) tell students to ignore
that part of the question.
[Figure: Normal quantile plot and permutation distribution of mean(gain).]
[Figure: Normal quantile plot and permutation distribution of mean(Computer − Driver).]
[Figure: Bootstrap distribution of corr(Salary, Average).]
16.63. For testing H0 : σ1 = σ2 versus Ha : σ1 ≠ σ2, the permutation test P-value will almost always be
between 0.065 and 0.095. In the solution to Exercise 7.105, we found F ≈ 1.50 with df = 29 and 29, for
which P ≈ 0.2757—three or four times as large. In this case, the permutation test P-value is smaller,
which is typical of short-tailed distributions.
[Figure: Permutation distribution of var(north) / var(south).]
16.64. (a) The group means (in µmol/L) are x̄1 ≈ 0.778 (group 1, uninfected) and x̄2 ≈ 0.620 (group 2,
infected). (b) If R = µ1/µ2 is the ratio of the population means, we test H0 : R = 1 versus Ha : R > 1. The
one-sided P-value is typically between 0.014 and 0.21. The permutation distribution (below) is centered
near 1 with standard deviation approximately 0.11; it is roughly Normal with some right skewness. We
expect the permutation distribution to be centered at about 1, because that is the null hypothesis value
for the ratio.
Note: Our permutation resampling will, on the average, produce x̄1 = x̄2, so it seems "obvious" that the
ratio x̄1/x̄2 should equal 1 on the average. Of course, one should beware of accepting the "obvious"; in
general, the expected value of a ratio is not equal to the ratio of the expected values, although it will
often (as in this case) be close.
[Figure: Permutation distribution of mean(uninf.) / mean(inf.).]
16.65. For the permutation test, we must resample in a way that is consistent with the null
hypothesis. Hence we pool the data—assuming that the two populations are the same—and
draw samples (without replacement) for each group from the pooled data. For the bootstrap,
we do not assume that the two populations are the same, so we sample (with replacement)
from each of the two datasets separately, rather than pooling the data first.
Note: Shown below is the bootstrap distribution for the ratio of means; comparing this
with the permutation distribution from the previous solution illustrates the effect of the
resampling method.
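The contrast can be sketched in a few lines of Python (hypothetical data; NumPy assumed). The permutation resample pools and splits without replacement; the bootstrap resamples each group separately with replacement:

```python
import numpy as np

rng = np.random.default_rng(1)
group1 = rng.normal(0.78, 0.15, size=10)   # hypothetical "uninfected" values
group2 = rng.normal(0.62, 0.15, size=12)   # hypothetical "infected" values

def permutation_ratio(g1, g2):
    # Pool the data (consistent with H0: the populations are the same),
    # then split the pool at random, without replacement.
    pooled = rng.permutation(np.concatenate([g1, g2]))
    return pooled[:len(g1)].mean() / pooled[len(g1):].mean()

def bootstrap_ratio(g1, g2):
    # No pooling: resample each group separately, with replacement.
    b1 = rng.choice(g1, size=len(g1), replace=True)
    b2 = rng.choice(g2, size=len(g2), replace=True)
    return b1.mean() / b2.mean()

perm_dist = [permutation_ratio(group1, group2) for _ in range(2000)]
boot_dist = [bootstrap_ratio(group1, group2) for _ in range(2000)]
```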
16.66. (a) Ranges for the BCa interval (based on 1000 resamples) are given below. Note that some of
these intervals include 1, suggesting that for some resamples, we could not reject H0 : R = 1. (b) The
bootstrap distribution (shown with the solution to the previous problem) is right-skewed, with relatively
little bias. Typically, the percentile interval is shifted slightly to the right of the BCa interval, which means
that it suggests slightly stronger evidence against H0 : R = 1.

Typical ranges:
  Bias               −0.01 to 0.02
  SEboot              0.12 to 0.15
  BCa lower           0.97 to 1.06
  BCa upper           1.47 to 1.58
  Percentile lower    0.99 to 1.05
  Percentile upper    1.49 to 1.58

Note: If students use a larger resample size, the intervals show less variability. With 10,000 resamples,
both the BCa and percentile intervals will almost certainly (barely) not include 1; in particular, the BCa
lower bound is typically between 1.003 and 1.023.
16.67. (a) The resampled CLEC standard deviation is sometimes 0, so for best results (that
is, to avoid infinite ratios), put that standard deviation in the numerator. Both bootstrap
distributions are shown below. (We do not need quantile plots to confirm that these
distributions are non-Normal.) Regardless of which ratio we use, the resulting P-value is
close to 0.37. (b) The difference in the P-values is evidence of the inaccuracy of the F test;
these distributions clearly do not satisfy the Normality assumption.
[Figure: Bootstrap distributions of sd(CLEC) / sd(ILEC) and sd(ILEC) / sd(CLEC).]
[Figure: Normal quantile plot and permutation distribution of 2008 prop. − 2006 prop.]
16.69. The bootstrap distribution looks quite Normal, and (as a consequence) all of the
bootstrap confidence intervals are similar to each other, and also are similar to the standard
(large-sample) confidence interval: 0.0981 to 0.1415.
Note: At the time these solutions were written, R’s bootstrapping package would fail if
asked to find the BCa confidence interval for this exercise.
[Figure: Normal quantile plot and bootstrap distribution of 2008 prop. − 2006 prop.]
16.70. See the solution to Exercise 7.65 for stemplots and summary statistics, and the unpooled
test; the solution to Exercise 7.86 gives the details for the pooled test. In Exercise 7.65,
the text suggested that there was a prior suspicion that the sad group would be willing to
pay more, so we used a one-sided alternative. Here we use a two-sided alternative. (a) The
unpooled t statistic is t ≈ −4.30 with df ≈ 26.48, for which the two-sided P-value is
P ≈ 0.0002. (b) The pooled test gives t ≈ −4.10, df = 2, and P ≈ 0.0003. (c) Apart from
some granularity (visible in the quantile plot), the permutation distribution is reasonably
Normal. The observed difference is about three standard deviations above the mean, and
P < 0.002 (nearly always). (d) Student preferences will vary. Note that the permutation test
is safest because it makes the fewest assumptions about the populations, while the pooled
t test makes the most assumptions. Given the identical conclusions and the similarity of
strength of the evidence, there is little reason to have a strong preference here.
[Figure: Permutation distribution of mean(Sad − Neutral).]
16.72. To compare the means, we need a matched-pairs permutation test, which gives
a P-value near 0.78—no reason to suspect a systematic difference in the operators’
measurements.
There is not really a legitimate way to compare the spreads with the data we have. Most
of the variation in each operator’s measurements can be attributed to variation in the subjects
being measured, rather than variation due to the operator’s abilities. Nonetheless, we can
do this comparison by randomly swapping (or not) observations within each matched pair,
and then examining the ratio of variances (or equivalently, the standard deviations). The
.
permutation distribution of the variance ratio is given below; it has P = 0.66. In both cases
there is not statistically significant evidence of a difference between the operators. The
differences could easily arise by chance; even larger differences would occur by chance
more than half the time.
Note: For a legitimate comparison of the spreads for the two operators, we would want
[Figure: Permutation distributions of the mean difference and of the variance ratio for the two operators.]
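A minimal Python sketch of the within-pair swapping described above (hypothetical paired measurements; swapping the two values within a pair is the same as flipping the sign of that pair's difference):

```python
import numpy as np

rng = np.random.default_rng(2)
op1 = rng.normal(50, 10, size=20)          # hypothetical operator 1 measurements
op2 = op1 + rng.normal(0, 1, size=20)      # hypothetical operator 2 measurements
diffs = op1 - op2

observed = diffs.mean()
reps = 5000
count = 0
for _ in range(reps):
    signs = rng.choice([-1, 1], size=len(diffs))   # swap (or not) within each pair
    if abs((signs * diffs).mean()) >= abs(observed):
        count += 1
print("two-sided P-value:", count / reps)
```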
16.73. See the solution to Exercise 16.18 for another view of the bootstrap distribution. Ranges for the
bootstrap t, percentile, and BCa intervals are given below. All have similar upper endpoints, but the
lower endpoint for the bootstrap t is typically less than the others (because it ignores the skewness in
the bootstrap distribution). It appears that any of the other intervals—including the percentile
interval—would be more reliable.

Typical ranges:
  t lower            205.5 to 211.5
  t upper            276.5 to 282.5
  Percentile lower   208.4 to 216.5
  Percentile upper   276.2 to 288.8
  BCa lower          207.9 to 217.4
  BCa upper          276.0 to 293.6

[Figure: Bootstrap distribution of trim25(Price).]
16.74. See the solution to Exercise 16.5 for the bootstrap distribution. (a) The bootstrap distribution is
approximately Normal, with a very slight right skew. A t interval should be acceptable unless high
accuracy is needed. (b) Ranges for the t and BCa intervals are given below. (c) The BCa interval is
typically shifted to the right of the t interval, as we might expect because of the slight right skew.

Typical ranges:
  t lower     3.1 to 3.4
  t upper     5.8 to 6.1
  BCa lower   3.2 to 3.6
  BCa upper   5.9 to 6.5
16.75. All answers (including the shape of the bootstrap distribution) will depend strongly on
the initial sample of uniform random numbers. The median M of these initial samples will
be between about 0.36 and 0.64 about 95% of the time; this is the center of the bootstrap t
confidence interval. (a) For a uniform distribution on 0 to 1, the population median is 0.5.
Most of the time, the bootstrap distribution is quite non-Normal; three examples are shown
below. (b) SEboot typically ranges from about 0.04 to 0.12 (but may vary more than that,
depending on the original sample). The bootstrap t interval is therefore roughly M ± 2SEboot .
(c) The more sophisticated BCa and tilting intervals may or may not be similar to the
bootstrap t interval. The t interval is not appropriate because of the non-Normal shape of
the bootstrap distribution, and because SEboot is unreliable for the sample median (it depends
strongly on the sizes of the gaps between the observations near the middle).
Note: Based on 5000 simulations of this exercise, the bootstrap t interval M ± 2SEboot
will capture the true median (0.5) only about 94% of the time (so it slightly underperforms
its intended 95% confidence level). In the same test, both the percentile and BCa intervals
included 0.5 over 95% of the time, while at the same time being narrower than the
bootstrap t interval nearly two-thirds of the time. These two measures (achieved confidence
level, and width of confidence interval) both confirm the superiority of the other intervals.
The bootstrap percentile, BCa, and tilting intervals do fairly well despite the high
variability in the shape of the bootstrap distribution. They give answers similar to the exact
rank-based confidence intervals obtained by inverting hypothesis tests. One variation of
tilting intervals matches the exact intervals.
[Figure: Three example bootstrap distributions of median(uniform50).]
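A sketch of this simulation in Python (one uniform sample of size 50; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
sample = rng.uniform(0, 1, size=50)        # population median is 0.5
M = np.median(sample)

boot_medians = np.array([
    np.median(rng.choice(sample, size=50, replace=True))
    for _ in range(2000)
])
se_boot = boot_medians.std(ddof=1)
print("bootstrap t interval:", M - 2 * se_boot, "to", M + 2 * se_boot)
print("percentile interval:", np.percentile(boot_medians, [2.5, 97.5]))
```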
16.76. (a) The difference appears to be quite large; it should be significant. (b) The two-sided
permutation P-value is about 0.000655. With 1000 resamples, students will typically get a P-value of no
more than 0.004. (c) We conclude that there is significant evidence that the mean ages differ. The t test
P-value is similar to the (true) permutation P-value, although student estimates of the latter might be
too high.

   Male    Female
        1  9
        2  01
     3  2  2233
     5  2  4
     6  2
   899  2  9
    01  3
    22  3
     5  3

Note: Some software will compute the exact permutation test P-value; it is 110/167,960 ≈ 0.000655.
This is about three times larger than the standard t test result: P ≈ 0.000223.
[Figure: Normal quantile plot and permutation distribution of mean(male age − female age).]
16.77. See the solution to Exercise 2.33 for a scatterplot. The permutation distribution (found by
permuting one variable and computing the correlation) is roughly Normal, and the observed correlation
(r ≈ 0.878) lies far out on the high tail, about three standard deviations above the mean (0). We
conclude there is a significant positive relationship.
[Figure: Bootstrap distribution of corr(distress, activity).]
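The "permute one variable" recipe can be sketched as follows (hypothetical arrays standing in for the distress and activity measurements; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
activity = rng.normal(0, 1, size=15)                      # hypothetical data
distress = 0.9 * activity + rng.normal(0, 0.5, size=15)   # hypothetical data

r_obs = np.corrcoef(distress, activity)[0, 1]
reps = 5000
perm_r = np.array([
    np.corrcoef(distress, rng.permutation(activity))[0, 1]
    for _ in range(reps)
])
# One-sided P-value for a positive association:
print((np.sum(perm_r >= r_obs) + 1) / (reps + 1))
```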
16.79. (a) The 2001 data is slightly skewed, but close to Normal given the sample size (50). The 2000
data is strongly right-skewed with two high outliers; a sample of size 20 is probably not enough to
compensate. (b) The two-sided P-value for the permutation test is approximately 0.28. We conclude
that there is not strong evidence that the mean selling prices are different for all Seattle real estate in
2000 and in 2001.

  2000 prices          2001 prices
   1  3346899           0  5677
   2  001488            1  0134445799
   3  3669              2  0011123444677899
   4  8                 3  1123457
   5                    4  25556788
   6                    5  017
   7                    6  8
   8                    7  1
   9
  10
  11  0
  12
  13
  14
  15
  16
  17
  18  4
[Figure: Normal quantile plot and permutation distribution of 2000 mean − 2001 mean.]
16.80. (a) See the solution to Exercise 1.41 for stemplots. Summary statistics (all in units of minutes):

            x̄       s      Min   Q1    M     Q3    Max
  Men     117.2   74.24     0    60   120   150   300
  Women   165.2   56.51    60   120   175   180   360

(b) The (unpooled) two-sample t test of H0 : µF = µM versus Ha : µF ≠ µM gives t ≈ 2.82, df ≈ 54.2, and
P ≈ 0.0067—a significant difference. A 95% confidence interval for the difference µF − µM is 13.85 to
82.15 minutes. (c) A two-sided permutation test for the difference of means typically gives P no more
than 0.02 (with 1000 resamples). The bootstrap distribution is slightly skewed; confidence intervals are
similar to the standard t interval, although the percentile and BCa intervals are sometimes shifted to
the left.

Typical ranges:
  t lower            12.0 to 17.0
  t upper            79.0 to 84.0
  Percentile lower    8.8 to 18.9
  Percentile upper   75.4 to 85.2
  BCa lower           7.4 to 20.8
  BCa upper          75.1 to 87.9
[Figure: Normal quantile plot and permutation distribution of female mean − male mean.]
[Figure: Normal quantile plot and bootstrap distribution of female mean − male mean.]
16.81. The permutation and bootstrap distributions for the difference in medians are extremely
non-Normal, with many gaps and multiple peaks. In this situation, we have conflicting results: The
permutation test gives fairly strong evidence of a difference (the two-sided P-value is roughly 0.032),
but the BCa interval for the difference in medians nearly always includes 0.

Typical ranges:
  t lower              9.6 to 16.5
  t upper             93.5 to 100.4
  Percentile lower       0 to 30
  Percentile upper      90 to 100
  BCa lower          −32.5 to 0
  BCa upper             75 to 90
[Figure: Normal quantile plot and permutation distribution of female median − male median.]
[Figure: Normal quantile plot and bootstrap distribution of female median − male median.]
16.82. The standard test for equality of variances gives F ≈ 0.58 with df 29 and 29, for which
P ≈ 0.1477. (a) Using a permutation test, the two-sided P-value is about 0.226. Ranges for the
bootstrap intervals are below; the bootstrap t is a bad choice for this sharply skewed distribution.
(b) The variances are equal if and only if the standard deviations are equal. Any conclusion about the
ratio of variances from the bootstrap and permutation distributions has an equivalent conclusion about
the ratio of standard deviations. (c) We have strong evidence that the means of the two distributions
are different, but cannot reject H0 : σF = σM. The evidence regarding the medians is mixed.

Typical ranges:
  t lower            −0.22 to 0.02
  t upper             1.14 to 1.38
  Percentile lower    0.17 to 0.23
  Percentile upper    1.24 to 1.62
  BCa lower           0.21 to 0.27
  BCa upper           1.42 to 2.42
[Figure: Normal quantile plot and permutation distribution of var(Female) / var(Male).]
[Figure: Bootstrap distribution of var(Female) / var(Male).]
16.83. See Exercise 8.55 for more details about this survey. The bootstrap distribution appears to be
close to Normal; bootstrap intervals are similar to the large-sample interval (0.3146 to 0.3854).

Typical ranges:
  t lower            0.31 to 0.32
  t upper            0.38 to 0.39
  Percentile lower   0.30 to 0.32
  Percentile upper   0.38 to 0.39

Note: At the time of this writing, R's bootstrapping package would not compute the BCa intervals for
this exercise.
[Figure: Normal quantile plot and bootstrap distribution of teen prop. − adult prop.]
16.84. The bootstrap distribution is slightly skewed, but close enough to Normal that there is little
difference among the interval methods.

Typical ranges:
  t lower            1.54 to 1.56
  t upper            1.73 to 1.76
  Percentile lower   1.54 to 1.57
  Percentile upper   1.73 to 1.77

Note: At the time of this writing, R's bootstrapping package would not compute the BCa intervals for
this exercise.
[Figure: Normal quantile plot and bootstrap distribution of teen prop. / adult prop.]
16.85. (a) This is the usual way of computing percent change: 89/54 − 1 = 0.65. (b) Subtract
1 from the confidence interval found in Exercise 16.84; this typically gives an interval
similar to 0.55 to 0.75. (c) Preferences will vary.
16.86. (a) Jocko’s mean estimate is $1827.5; the other Typical ranges
garage’s mean is $1715. The matched-pairs t interval Bias −3.7 to 3.7
for the difference is $112.5 ± $88.52 = $23.98 to SEboot 34.7 to 39.9
$201.02. (b) Because these are matched pairs, we re- t lower 22.3 to 34.0
sample the differences. The distribution is reasonably t upper 191.0 to 202.7
close to Normal; ranges for the bootstrap intervals are Percentile lower 25.1 to 47.6
Percentile upper 175 to 195
on the right. (c) The bootstrap t interval is similar to
BCa lower 20 to 45
the standard t interval; the other intervals are typically BCa upper 170 to 195
narrower.
[Figure: Bootstrap distribution of mean(Jocko − Other).]
16.87. (a) The mean ratio is 1.0596; the usual t interval is 1.0596 ± (2.262)(0.02355) = 1.0063 to
1.1128. The bootstrap distribution for the mean is close to Normal, and the bootstrap confidence
intervals (typical ranges below) are usually similar to the usual t interval, but slightly narrower.
Bootstrapping the median produces a clearly non-Normal distribution; the bootstrap t interval should
not be used for the median. (Ranges for median intervals are not given.) (b) The ratio of means is
1.0656; the bootstrap distribution is noticeably skewed, so the bootstrap t is not a good choice, but the
other methods usually give intervals similar to 0.75 to 1.55. Also shown below is the bootstrap
distribution for the ratio of the medians. It is considerably less erratic than the median ratio, but we
have still not included these confidence intervals. (c) For example, the usual t interval from part (a)
could be summarized by the statement, "On average, Jocko's estimates are 1% to 11% higher than
those from other garages."

Typical ranges:
(a) Mean ratio
  t lower            1.00 to 1.02
  t upper            1.10 to 1.12
  Percentile lower   1.00 to 1.03
  Percentile upper   1.09 to 1.11
  BCa lower          1.00 to 1.03
  BCa upper          1.09 to 1.11
(b) Ratio of means
  t lower            0.59 to 0.68
  t upper            1.46 to 1.54
  Percentile lower   0.69 to 0.78
  Percentile upper   1.45 to 1.64
  BCa lower          0.69 to 0.78
  BCa upper          1.45 to 1.66
[Figure: Normal quantile plots and bootstrap distributions of mean(Jocko / Other) and median(Jocko / Other).]
[Figure: Bootstrap distributions of the ratio of means and of median(Jocko) / median(Other).]
17.4. Possible examples of special causes might include: traffic, number of passengers on the
shuttle (especially if the shuttle makes several stops along the way), mechanical problems
with the shuttle.
17.5. The center line is at µ = 85 seconds. The control limits should be at µ ± 3σ/√6 = 85 ± 3(17/√6),
which means about 64.18 and 105.82 seconds.
17.6. (a) With n = 5, the center line is unchanged (85 seconds), but the control limits are now
µ ± 3σ/√5 ≈ 62.18 and 107.82 seconds. (b) With n = 7, the center line is unchanged (85 seconds), but
the control limits are now µ ± 3σ/√7 ≈ 65.72 and 104.28 seconds. (c) To convert to minutes, divide the
original center line and control limits by 60: The center line is 1.417 minutes, and the control limits are
1.070 and 1.764 minutes.
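Both exercises use the same formula, µ ± 3σ/√n; a minimal Python sketch with the values above:

```python
from math import sqrt

def xbar_limits(mu, sigma, n):
    # Center line and 3-sigma control limits for an x-bar chart.
    margin = 3 * sigma / sqrt(n)
    return mu - margin, mu, mu + margin

for n in (6, 5, 7):   # Exercise 17.5 uses n = 6; Exercise 17.6 uses n = 5 and n = 7
    lcl, cl, ucl = xbar_limits(85, 17, n)
    print(n, round(lcl, 2), round(ucl, 2))
```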
17.8. Common causes of variation might include the time it takes to call in the order, to make
the pizza, and to deliver it. Examples of special causes might include heavy traffic or
waiting for a train (causing delays in delivery), high demand for pizza (for example, during
events like the Super Bowl), etc.
17.9. The most common problems are related to the application of the color coat; that should
be the focus of our initial efforts.
[Figure: Pareto charts of the percent of paint-finish problems by type and the percent of losses by DRG.]
17.10. These DRGs account for a total of 80.5% of all losses. Certainly the first two (209 and
116) should be among those that are studied first; some students may also include 107, 462,
and so on.
17.11. Possible causes could include delivery delays due to traffic or a train, high demand
during special events, and so forth.
17.12. (a) The center line is at µ = 72°F; the control limits should be at µ ± 3σ/√5, which means about
71.46°F and 72.54°F. (b) For n = 5, c4 = 0.94 and B6 = 1.964, so the center line for the s chart is
(0.94)(0.4) = 0.376°F, and the control limits are 0 and 0.7856°F.
17.13. (a) For the x̄ chart, the center line is at µ = 1.028 lb; the control limits should be at µ ± 3σ/√3,
which means about 0.9864 and 1.0696 lb. (b) For n = 3, c4 = 0.8862 and B6 = 2.276, so the center line
for the s chart is (0.8862)(0.024) = 0.02127 lb, and the control limits are 0 and 0.05462 lb. (c) The
control charts are below. (d) Both charts suggest that the process is in control.
[Figure: x̄ and s control charts for Exercise 17.13.]
17.14. (a) Common causes might include processing time, normal workload fluctuation, or
postal delivery time. (b) s-type special causes might include a new employee working in the
personnel department. (c) Special causes affecting x might include a sudden large influx of
applications or perhaps introducing a new filing system for applications.
… on the right and below. Points outside control limits are marked with an "X." (c) Set B is from the
in-control process. The process mean shifted suddenly for Set A; it appears to have changed on about
the 11th or 12th sample. The mean drifted gradually for the process in Set C.
[Figure: x̄ charts for data sets A, B, and C.]
17.16. (a) For the x chart, the center line is 11.5, and the control limits are 11.2 and 11.8 (as
in Exercise 17.15). For n = 4, c4 = 0.9213 and B6 = 2.088, so the center line for the s
chart is (0.9213)(0.2) = 0.18426, and the control limits are 0 and 0.4176. (b) The s chart is
certainly out of control at sample 7 (and was barely in control at sample 6). After that, there
are a number of out-of-control points. The x chart is noticeably out of control at sample 8.
(c) A change in the mean does not affect the s chart; the effect on the x chart is masked
by the change in σ: Because of the increased variability, the sample means are sometimes
below the UCL even after the process mean shifts.
[Figure: x̄ and s control charts for Exercise 17.16; out-of-control points marked with "x".]
17.17. For the s chart with n = 6, we have c4 = 0.9515, B5 = 0.029, and B6 = 1.874, so the center line is
(0.9515)(0.001) = 0.0009515 inch, and the control limits are 0.000029 and 0.001874 inch. For the x̄
chart, the center line is µ = 0.87 inch, and the control limits are µ ± 3σ/√6 ≈ 0.87 ± 0.00122 ≈ 0.8688
and 0.8712 inch.
17.18. (a) For n = 5, we have c4 = 0.94, B5 = 0, and B6 = 1.964, so the center line is
(0.94)(0.127) = 0.11938, and the control limits are 0 and 0.249428. (b) The center line is µ = 4.22, and
the control limits are µ ± 3σ/√5 ≈ 4.0496 to 4.3904.
17.19. For the x chart, the center line is 43, and the control limits are 25.91 and 60.09.
For n = 5, c4 = 0.9400 and B6 = 1.964, so the center line for the s chart is
(0.9400)(12.74) = 11.9756, and the control limits are 0 and 25.02. The control charts
(below) show that sample 5 was above the UCL on the s chart, but it appears to have been
special cause variation, as there is no indication that the samples that followed it were out of
control.
[Figure: x̄ and s control charts for Exercise 17.19.]
17.20. The new type of yarn would appear on the x chart because it would cause a shift in the
mean pH. (It might also affect the process variability and therefore show up on the s chart.)
Additional water in the kettle would change the pH for that kettle, which would change the
mean pH and also change the process variability, so we would expect that special cause to
show up on both the x and s charts.
17.22. c = 3.090. (Looking at Table A, there appear to be three possible answers—3.08, 3.09,
or 3.10. Software gives the answer 3.090232. . . .)
17.23. The usual 3σ limits are µ ± 3σ/√n for an x̄ chart and (c4 ± 3c5)σ for an s chart. For 2σ limits,
simply replace "3" with "2." (a) µ ± 2σ/√n. (b) (c4 ± 2c5)σ.
17.24. (a) The R chart monitors the variability (spread) of the process. (b) The R chart is
commonly used because R is easier to compute (by hand) than s. (c) The x control limits
are affected because we estimate process spread using R instead of s.
17.25. (a) Shrinking the control limits would increase the frequency of false alarms, because
the probability of an out-of-control point when the process is in control will be higher
(roughly 5% instead of 0.3%). (b) Quicker response comes at the cost of more false alarms.
(c) The runs rule is better at detecting gradual changes. (The one-point-out rule is generally
better for sudden, large changes.)
17.26. (a) Either (ii) or (iii), depending on whether the deterioration happens quickly or
gradually. We would not necessarily expect that this deterioration would result in a
change in variability (s or R). (b) (i) s or R chart: A change in precision suggests altered
variability (s or R), but not necessarily a change in center (x). (c) (i) s or R chart:
Assuming there are other (fluent) customer service representatives answering the phones, this
new person would have unusually long times, which should most quickly show up as an
increase in variability. (d) (iii) A run on the x chart: “The runs signal responds to a gradual
shift more quickly than the one-point-out signal.”
17.27. We estimate σ̂ to be s̄/0.9213 ≈ 1.1180, so the x̄ chart has center line x̄ = 47.2 and control limits
x̄ ± 3σ̂/√4 ≈ 45.523 and 48.877. The s chart has center line s̄ = 1.03 and control limits 0 and
2.088σ̂ ≈ 2.3344.
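A short sketch of the same calculation (the constants c4 = 0.9213 and B6 = 2.088 for n = 4 are those used above; x̄ = 47.2 and s̄ = 1.03 as given):

```python
from math import sqrt

xbarbar, sbar = 47.2, 1.03
n, c4, B6 = 4, 0.9213, 2.088       # constants for samples of size 4, as used above

sigma_hat = sbar / c4              # estimated process standard deviation
margin = 3 * sigma_hat / sqrt(n)
print("x-bar chart limits:", xbarbar - margin, "and", xbarbar + margin)  # about 45.52 and 48.88
print("s chart UCL:", B6 * sigma_hat)                                    # about 2.33
```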
17.28. To estimate µ and σ, we compute x̄ ≈ 1.0299 lb and s̄ ≈ 0.0222 lb from the sample means and
standard deviations given in Table 17.3. µ̂ = x̄ is our estimate of µ; this is about 0.002 lb greater than
the historical value (1.028 lb). To estimate σ, we use σ̂ = s̄/c4 = 0.0222/0.8862 ≈ 0.0251 lb—about
4.6% greater than the historical value (0.024 lb). Both of these differences are so small that, even if
they are statistically significant, it seems unlikely that they suggest any noteworthy change in this
process.
17.30. (a) Sketches will vary quite a bit; many students will struggle with the implications of
this situation on the appearance of the two charts. The two charts below were produced
using a much more sophisticated approach than most students would take; they arose from
a simulation taking samples of size 6 (3 from each clerk), where the experienced clerk’s
processing time (in minutes) is N (2, 0.5) and the new hire’s processing time is N (4, 0.8).
The center lines and control limits were estimated from the data. (b) For example, this
would be acceptable if we are concerned with overall processing time and are not interested
in individual processing times. In particular, it would not be appropriate to compute
tolerance limits for this situation because the individual measurements do not have a Normal
distribution.
[Figure: Simulated x̄ and s control charts for the two-clerk invoice process.]
Note: The situation described here demonstrates the problems that can arise when we do
not carefully consider the question of “rational subgroups” in our sampling design; see the
discussion on page 17-30.
This situation is related to the issue of distinguishing “within-groups” variation
from “between-groups” variation, as discussed in Chapter 12 (One-Way ANOVA). The
within-groups variation is variation in invoice processing time for each clerk, and the
between-groups variation is the difference between their processing times. In this case,
though, we are not paying attention to the explanatory variable (which clerk processed the
invoice), so all we see is a mixture of the two sources of variation. If—as was the case
here—the two clerks were fairly consistent so that within-groups variation is small, the
sample standard deviations are most affected by the between-groups variation.
Note that both charts show less variation than we typically see; nearly all the points are
no more than 1 or 2 standard deviations from the center line. To begin to understand why,
imagine an extreme case with no within-groups variation—where one clerk always takes
exactly 2 minutes, and the other always takes exactly 4 minutes. Then each sample would
contain the 6 numbers 2, 2, 2, 4, 4, and 4, so x̄ = 3 and s ≈ 1.0954 for all samples, and the
control charts would have no variation at all.
17.31. (a) Average the 20 sample means and standard deviations and estimate µ to be µ̂ = x̄ = 2750.7
and σ to be σ̂ = s̄/c4 = 345.5/0.9213 ≈ 375.0. (b) In the s chart shown in Figure 17.7, most of the points
fall below the center line.
[Figure: Control charts (sample means and standard deviations versus sample number).]
17.33. If the manufacturer practices SPC, that provides some assurance that the phones are
roughly uniform in quality—as the text says, “We know what to expect in the finished
product.” So, assuming that uniform quality is sufficiently high, the purchaser does not need
to inspect the phones as they arrive because SPC has already achieved the goal of that
inspection: to avoid buying many faulty phones. (Of course, a few unacceptable phones may
be produced and sold even when SPC is practiced—but inspection would not catch all such
phones anyway.)
17.34. The standard deviation of all 120 measurements is s ≈ $811.53, and the mean is x̄ ≈ $6442.4
(the same as the grand mean—as it must be, provided all the individual samples were the same size).
The natural tolerances are x̄ ± 3s ≈ $4007.8 to $8877.0.
… trustworthy.
[Figure: Normal quantile plot of the losses.]
Note: We might also assess Normality with a histogram or stemplot; this looks reasonably Normal, but
we see that the number of losses between $6000 and $6500 is noticeably higher than we might expect
from a Normal distribution. In fact, the smallest and largest losses were $4727 and $8794. These are
both within the tolerances, but note that the minimum is quite a bit more than the lower limit of the
tolerances ($4008). The large number of losses between $6000 and $6500 makes the mean slightly
lower and therefore lowers both of the tolerance limits.
17.36. (a) About 99.9% meet the old specifications: If X is the water resistance on a randomly chosen
jacket, then:
P(1000 < X < 4000) = P((1000 − 2750)/383.8 < Z < (4000 − 2750)/383.8) = P(−4.56 < Z < 3.26) ≈ 0.9994
(b) About 97.4% meet the new specifications:
P(1500 < X < 3500) = P((1500 − 2750)/383.8 < Z < (3500 − 2750)/383.8) = P(−3.26 < Z < 1.95) ≈ 0.9738
17.37. If we shift the process mean to 2500 mm, about 99% will meet the new specifications:
P(1500 < X < 3500) = P((1500 − 2500)/383.8 < Z < (3500 − 2500)/383.8) = P(−2.61 < Z < 2.61) ≈ 0.9910
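These Normal probabilities can be reproduced directly; a minimal sketch using SciPy (not part of the text's solution):

```python
from scipy.stats import norm

def prop_in_spec(lsl, usl, mu, sigma):
    # Proportion of a Normal(mu, sigma) output falling inside [lsl, usl].
    return norm.cdf(usl, mu, sigma) - norm.cdf(lsl, mu, sigma)

print(prop_in_spec(1000, 4000, 2750, 383.8))   # about 0.999 (Exercise 17.36a)
print(prop_in_spec(1500, 3500, 2750, 383.8))   # about 0.974 (Exercise 17.36b)
print(prop_in_spec(1500, 3500, 2500, 383.8))   # about 0.991 (Exercise 17.37)
```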
17.38. (a) The means (1.2605 and 1.2645) agree exactly with those given; the standard deviations are
the same up to rounding. (b) The s chart tracks process spread. For the 30 samples, we have
s̄ ≈ 0.0028048, so σ̂ = s̄/c4 = s̄/0.7979 ≈ 0.003515; the center line is s̄, and the control limits are
B5σ̂ = 0 and B6σ̂ = 2.606σ̂ ≈ 0.009161. Short-term variation seems to be in control. (c) For the x̄ chart,
which monitors the process center, the center line is x̄ = 1.26185, and the control limits are
x̄ ± 3σ̂/√2 ≈ 1.2544 to 1.2693. The control chart shows that the process is in control.
[Figure: s and x̄ control charts for Exercise 17.38.]
17.39. The mean of the 17 in-control samples is x̄ = 43.4118, and the standard deviation is 11.5833, so
the natural tolerances are x̄ ± 3s ≈ 8.66 to 78.16.
17.40. There were no out-of-control points, so we estimate the mean of the process using
µ̂ = x̄ = 1.26185. The estimated standard deviation is computed from the 60 individual data points; this
gives s ≈ 0.003328. The natural tolerances are x̄ ± 3s ≈ 1.2519 to 1.2718.
17.41. Only about 44% of meters meet the specifications. Using the mean (43.4118) and standard
deviation (11.5833) found in the solution to Exercise 17.39:
P(44 < X < 64) = P((44 − 43.4118)/11.5833 < Z < (64 − 43.4118)/11.5833) = P(0.05 < Z < 1.78) ≈ 0.4426
17.42. There is no clear deviation from Normality apart from granularity due to the limited
accuracy of the recorded measurements.
[Figure: Normal quantile plots of the phantom measurements and the distance measurements.]
17.43. The limited precision of the measurements shows up in the granularity (stair-step
appearance) of the graph. Aside from this, there is no particular departure from Normality.
17.44. The standard deviation of all 60 weights is s ≈ 0.0224 lb, and the mean is x̄ ≈ 1.02996 lb (the
same as the grand mean, except for rounding error). The natural tolerances are x̄ ± 3s ≈ 0.9627 to
1.0972 lb.
17.46. (a) For the 21 samples, we have s̄ ≈ 0.2786, so σ̂ = s̄/c4 = 0.2786/0.9213 ≈ 0.3024; the center
line is s̄, and the control limits are B5σ̂ = 0 and B6σ̂ = (2.088)(0.3024) ≈ 0.6313. Short-term variation
seems to be in control. (b) For the x̄ chart, the center line is 0 and the control limits are
±3σ̂/√4 ≈ ±0.4536. The x̄ chart suggests that the process mean has drifted. (Only the first four
out-of-control points are marked.) One possible cause for the increase in the mean is that the machine
that makes the bearings is gradually drifting out of adjustment.
[Figure: s and x̄ control charts for Exercise 17.46; out-of-control points marked with "x".]
17.47. (a) (ii) A sudden change in the x chart: This would immediately increase the amount of
time required to complete the checks. (b) (i) A sudden change (decrease) in s or R because
the new measurement system will remove (or decrease) the variability introduced by human
error. (c) (iii) A gradual drift in the x chart (presumably a drift up, if the variable being
tracked is the length of time to complete a set of invoices).
17.49. The process is no longer the same as it was during the downward trend (from the
1950s into the 1980s). In particular, including those years in the data used to establish the
control limits results in a mean that is too high to use for current winning times, and a
standard deviation that includes variation attributable to the “special cause” of the changing
conditioning and professional status of the best runners. Such special cause variation should
not be included in a control chart.
… 184.2 pounds. The first four points are above the upper control limit; there are no runs (above or
below the center line) longer than five. The overall impression is that Joe's weight returns to being "in
control"; it decreases fairly steadily, and the last 12 points are between the control limits.
[Figure: Chart of Joe's weight by week.]
17.51. LSL and USL are specification limits on the individual observations. This means that
they do not apply to averages and that they are specified as desired output levels, rather than
being computed based on observation of the process. LCL and UCL are control limits for
the averages of samples drawn from the process. They may be determined from past data,
or independently specified, but the main distinction is that the purpose of control limits is to
detect whether the process is functioning “as usual,” while specification limits are used to
determine what percentage of the outputs meet certain specifications (are acceptable for use).
17.52. In each graph below, the large tick marks are 3σ apart, and the smaller tick marks are
1σ apart, and the target is marked as “T.” Because Cp = 1.5, the specification limits are 9σ
apart, located at T ± 4.5σ . The first two graphs could be flipped (i.e., the peak of the curve
could be closer to the LSL than the USL). (a) Cpk = 0.5 means that the nearer specification
limit is (0.5)(3σ ) = 1.5σ above (or below) the mean. (b) Cpk = 1.0 means that the nearer
specification limit is 3σ above (or below) the mean. (c) Cpk = 1.5 means that the nearer
specification limit is (1.5)(3σ ) = 4.5σ above (or below) the mean—so that µ falls exactly
on the target (halfway between the specification limits).
Note: At the end of Example 17.16, the text notes that Cp = Cpk means “the process is
properly centered”—that is, µ equals the target.
17.53. For computing Ĉpk, note that the estimated process mean (2750.7 mm) lies closer to the USL.
(a) Ĉp = (4000 − 1000)/(6 × 383.8) ≈ 1.3028 and Ĉpk = (4000 − 2750.7)/(3 × 383.8) ≈ 1.0850.
(b) Ĉp = (3500 − 1500)/(6 × 383.8) ≈ 0.8685 and Ĉpk = (3500 − 2750.7)/(3 × 383.8) ≈ 0.6508.
17.54. (a) With the original specifications, Ĉp ≈ 1.3028 (unchanged from the previous exercise, because
Ĉp does not depend on µ) and Ĉpk = (4000 − 2500)/(3 × 383.8) ≈ 1.3028. (b) Once again, Ĉp ≈ 0.8685
is unchanged. Ĉpk = (3500 − 2500)/(3 × 383.8) ≈ 0.8685.
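Both exercises apply Cp = (USL − LSL)/(6σ) and Cpk = min(USL − µ, µ − LSL)/(3σ); a minimal Python sketch:

```python
def capability(lsl, usl, mu, sigma):
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk

print(capability(1000, 4000, 2750.7, 383.8))   # Exercise 17.53(a): about (1.30, 1.09)
print(capability(1500, 3500, 2750.7, 383.8))   # Exercise 17.53(b): about (0.87, 0.65)
print(capability(1500, 3500, 2500.0, 383.8))   # Exercise 17.54(b): about (0.87, 0.87)
```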
17.55. In the solution to Exercise 17.44, we found that the mean and standard deviation of all 60
weights are x̄ ≈ 1.02996 lb and s ≈ 0.0224 lb. (a) Ĉp = (1.10 − 0.94)/(6 × 0.0224) ≈ 1.1901 and
Ĉpk = (1.10 − 1.03)/(3 × 0.0224) ≈ 1.0418. (These were computed with the unrounded values of x̄ and
s; rounding will produce slightly different results.) (b) Customers typically will not complain about a
package that was too heavy.
17.56. A change to the process mean would not change Ĉp, but we could increase Ĉpk by centering the
process mean between the specification limits, at µ = (1.10 + 0.94)/2 = 1.02 lb. With that change, Ĉpk
increases to 1.1901 (the same as Ĉp before the change).
Note: The effect of this change is hard to predict if we suspect that the weight
measurements are non-Normal, but the data do not suggest any such problems (see the
solution to Exercise 17.45). Additionally, decreasing the process mean might have the
undesirable effect of increasing customer dissatisfaction (see part (b) of the previous
exercise).
17.57. (a) Cpk = (0.75 − 0.25)/(3σ) ≈ 0.5767. 50% of the output meets the specifications. (b) LSL and
USL are 0.865 standard deviations above and below the mean, so the proportion meeting specifications
is P(−0.865 < Z < 0.865) ≈ 0.6130. (c) The relationship between Cpk and the proportion of the output
meeting specifications depends on the shape of the distribution.
17.58. In the solution to Exercise 17.31, we found σ̂ = s̄/c4 ≈ 375.0; from this, we compute
Ĉpk = (3500 − 2500)/(3 × 375.0) ≈ 0.8889, which is larger than the previous value (0.8685).
17.59. See also the solution to Exercise 17.43. (a) Use the mean and standard deviation of the 85
remaining observations: µ̂ = x̄ = 43.4118 and σ̂ = s = 11.5833. (b) Ĉp = 20/(6σ̂) ≈ 0.2878 and
Ĉpk = 0 (because µ̂ is outside the specification limits). This process has very poor capability: The mean
is too low and the spread too great. Only about 46% of the process output meets specifications.
[Figure: Sketch of the process distribution with LSL and USL marked.]
17.60. See also the solution to Exercise 17.34. (a) About 89.5%: For the 120 observations in Table 17.7,
we find µ̂ = x̄ ≈ $6442.4 and σ̂ = s ≈ $811.53. Therefore, we estimate
P($4500 < X < $7500) = P(−2.39 < Z < 1.30) = 0.9032 − 0.0084 = 0.8948.
(b) Ĉp = (7500 − 4500)/(6 × 811.53) ≈ 0.6161. (c) Ĉpk = (7500 − 6442.4)/(3 × 811.53) ≈ 0.4344.
17.61. We have x̄ = 22.005 mm and s = 0.009 mm, so we assume that an individual bearing diameter X
follows a N(22.005, 0.009) distribution. (a) About 85.3% meet specifications:
P(21.985 < X < 22.015) = P((21.985 − 22.005)/0.009 < Z < (22.015 − 22.005)/0.009)
  = P(−2.22 < Z < 1.11) = 0.9868 − 0.1335 = 0.8533.
(b) Ĉpk = (22.015 − 22.005)/(3 × 0.009) ≈ 0.3704.
17.62. (a) This is unlikely to have any beneficial effect; it would result in more frequent
adjustments, but these would often be unnecessary and so might degrade capability. Control
limits are for correcting special-cause variation, not common-cause variation. (b) If some of
the nonconforming bearings are due to operator error, further training may have the effect of
reducing σ and increasing Cpk . Part (d) offers a slightly different viewpoint. (c) Assuming
the new machine has less variability (smaller σ ), this should improve the process capability.
(d) The number of nonconforming bearings produced by an operator is (for the most part)
a result of random variation within the system; no incentive can cause the operator to do
better than the system allows. (e) Better raw material should (presumably) result in better
product, so this should improve the capability.
17.64. (a) The graph below shows the mean shifted toward the USL; it could also be shifted toward the
LSL. As in the graph in the previous problem, tick marks are σ units apart. (b) Cpk = 4.5σ/(3σ) = 1.5.
Six-sigma quality does not mean that Cpk ≥ 2; the latter is a stronger requirement. (c) The desired
probability is 1 − P(−7.5 < Z < 4.5), for which software gives 3.4 × 10⁻⁶, or about 3.4 out-of-spec parts
per million.
[Figure: Normal curve with the mean shifted toward the USL; LSL and USL marked.]
17.65. Students will have varying justifications for the sampling choice. Choosing six calls
per shift gives an idea of the variability and mean for the shift as a whole. If we took
six consecutive calls (at a randomly chosen time), we might see additional variability
in x because sometimes those six calls might be observed at particularly busy times
(when a customer has to wait for a long time until a representative is available or when a
representative is using the restroom).
17.66. (a) For n = 6, we have c4 = 0.9515, B5 = 0.029, and B6 = 1.874. With s̄ = 29.985 seconds, we
compute σ̂ = s̄/c4 ≈ 31.5134 seconds, so the initial s chart has center line s̄ and control limits
B5σ̂ ≈ 0.9139 and B6σ̂ ≈ 59.0561 seconds. There are four out-of-control points, from samples 28, 39,
42, and 46. (b) With the remaining 46 samples, s̄ = 24.3015, so σ̂ = s̄/c4 ≈ 25.54 seconds, and the
control limits are B5σ̂ ≈ 0.741 and B6σ̂ ≈ 47.86 seconds. There are no more out-of-control points. (The
second s chart is not shown.) (c) We have center line x̄ = 29.2087 seconds, and control limits
x̄ ± 3σ̂/√6 ≈ −2.072 and 60.489 seconds. (The lower control limit should be ignored or changed to 0.)
The x̄ chart has no out-of-control points.
[Figure: s and x̄ control charts for Exercise 17.66; out-of-control points marked with "x".]
17.67. The outliers are 276 seconds (sample 28), 244 seconds (sample 42), and 333 seconds
(sample 46). After dropping those outliers, the standard deviations drop to 9.284, 6.708, and
31.011 seconds. (Sample #39, the other out-of-control point, has two moderately large times,
144 and 109 seconds; if they are removed, s drops to 3.416.)
17.68. For those 10 days, there were 961 absences and 10 · 987 = 9870 person-days available for
work, so p̄ = 961/9870 ≈ 0.09737, and:
CL = p̄ = 0.09737, control limits: p̄ ± 3√(p̄(1 − p̄)/987) ≈ 0.06906 and 0.12567
987
17.69. (a) For those 10 months, there were 1028 overdue invoices out of 28,400 total invoices
1028 .
(opportunities), so p = = 0.03620. (b) The center line and control limits are:
28, 400
p(1 − p)
CL = p = 0.03620, control limits: p ± 3 = 0.02568 and 0.04671
2840
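Both p charts use the same limits, p̄ ± 3√(p̄(1 − p̄)/n); a minimal Python sketch:

```python
from math import sqrt

def p_chart_limits(pbar, n):
    # Center line and 3-sigma limits for a p chart; a negative LCL is set to 0.
    margin = 3 * sqrt(pbar * (1 - pbar) / n)
    return max(pbar - margin, 0.0), pbar, pbar + margin

print(p_chart_limits(961 / 9870, 987))       # Exercise 17.68: about (0.069, 0.097, 0.126)
print(p_chart_limits(1028 / 28400, 2840))    # Exercise 17.69: about (0.026, 0.036, 0.047)
```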
17.70. Based on 3.22 complaints per 1000 passengers, the center line is p̄ = 3.22/1000 = 0.00322, and
the control limits are p̄ ± 3√(p̄(1 − p̄)/2500), which means about −0.000179 and 0.006619. As the
problem says, we take LCL = 0.
… is 0.0189. For the first operator, the control limits are those found in the previous solution: 0.00063
and 0.03717. For the second operator, the control limits are 0.0189 ± 3√((0.0189)(0.9811)/400), which
yields −0.00153 (use 0) and 0.03933.
[Figure: p charts for the two operators over 8 hourly samples.]
Note: We could simplify this control chart with a couple of practical observations. First, we would not be
concerned if the proportion of broken eggs were too low, so we could take the first operator's LCL to
be 0. In addition, for the first (second) operator, the hourly proportion will be a multiple of 0.002
(0.0025), so it will exceed the UCL if p̂ ≥ 0.038 (p̂ ≥ 0.04).
17.73. The center line is at p̄ = 163/36,480 ≈ 0.004468; the control limits should be at
p̄ ± 3√(p̄(1 − p̄)/1520), which means about −0.00066 (use 0) and 0.0096.
… center line is p̄ = 0.3555, and the control limits are:
p̄ ± 3√(p̄(1 − p̄)/921.8) ≈ 0.3082 and 0.4028
The p chart suggests that absentee rates are in control. (c) For October, the limits are 0.3088 and
0.4022; for June, they are 0.3072 and 0.4038. These limits appear as solid lines on the p chart, but
they are not substantially different from the control limits found in (b). Unless n varies a lot from
sample to sample, it is sufficient to use n̄.
[Figure: p chart of the absentee rate by month.]
17.77. (a) p̄ = 8000/1,000,000 = 0.008. We expect about (500)(0.008) = 4 defective orders per month.
(b) The center line and control limits are:
CL = p̄ = 0.008, control limits: p̄ ± 3√(p̄(1 − p̄)/500) ≈ −0.00395 and 0.01995
(We take the lower control limit to be 0.) It takes at least ten bad orders in a month to be out of control
because (500)(0.01995) = 9.975.
17.78. Control charts focus on ensuring that the process is consistent, not that the product
is good. An in-control process may consistently produce some percentage of low-quality
products. Keeping a process in control allows one to detect shifts in the distribution of the
output (which may have been caused by some correctable error); it does not help in fixing
problems that are inherent to the process.
… complaints; that is, he or she could be counted in several categories. (b) Clearly, top priority should
be given to the process of creating, correcting, and adjusting invoices, as the three most common
complaints involved invoices.
[Figure: Pareto chart of the percent of complaints by category.]
17.80. (a) Use x and s charts to track the time required. (b) Use a p chart to track the
acceptance percentage. (c) Use x and s charts to track the thickness. (d) Use a p chart to
track the proportion of dropped calls.
17.81. On one level, these two events are similar: Points below the LCL on an x (s) chart
suggest that the process mean (standard deviation) may have decreased. The difference is in
the implications of such a decrease (if not due to a special cause). For the mean, a decrease
might signal a need to recalibrate the process in order to keep meeting specifications (that
is, to bring the process back into control). A decrease in the standard deviation, on the other
hand, typically does not indicate that adjustment or recalibration is necessary, but it will
require re-computation of the x chart control limits.
17.82. This situation calls for a p chart with center line p̄ = 6/1000 = 0.006 and control limits
p̄ ± 3√(p̄(1 − p̄)/350) = 0.006 ± 0.01238. We take LCL = 0, and the UCL is 0.0184. (In order to exceed
this UCL, we would need to reject at least 7 of the 350 lots.)
… control chart in the solution to the previous problem. Meanwhile, x̄ = 834.5, and the control limits are
x̄ ± 3σ̂/√3 ≈ 821.86 to 847.14. The x̄ chart gives no indication of trouble—the process seems to be in
control.
[Figure: x̄ control chart.]
17.85. (a) As was found in the previous exercise, σ̂ = s̄/c4 ≈ 7.295. Therefore, Cp = 50/(6σ̂) ≈ 1.1423.
This is a fairly small value of Cp; the specification limits are just barely
wider than the 6σ̂ width of the process distribution, so if the mean wanders too far from
830, the capability will drop. (b) If we adjust the mean to be close to 830 mm × 10−4
(the center of the specification limits), we will maximize Cpk . Cpk is more useful when the
mean is not in the center of the specification limits. (c) The value of σ̂ used for determining
Cp was estimated from the values of s from our control samples. These are for estimating
short-term variation (within those samples) rather than the overall process variation. To
get a better estimate of the latter, we should instead compute the standard deviation s of
the individual measurements used to obtain the means and standard deviations given in
Table 17.11 (specifically, the 60 measurements remaining after dropping samples 1 and
10). These numbers are not available. (See “How to cheat on Cpk ” on page 17–44 of
Chapter 17.)
17.86. About 99.94%: With σ̂ = 7.295 and mean 830, we compute
P(805 < X < 855) = P(−3.43 < Z < 3.43) ≈ 0.9994.
17.87. (a) Use a p chart, with center line p̄ = 15/5000 = 0.003 and control limits p̄ ± 3√(p̄(1 − p̄)/100),
or 0 to 0.0194. (b) There is little useful information to be gained from keeping a p chart: If the
proportion remains at 0.003, about 74% of samples will yield a proportion of 0, and about 22% of
proportions will be 0.01. To call the process out of control, we would need to see two or more
unsatisfactory films in a sample of 100.
17.88. Assuming x̄ is (approximately) Normally distributed, the probability that it would fall within the 1σ
level is about 0.68, so the probability that it does this 15 times is about 0.68¹⁵ ≈ 0.0031.
17.89. Several interpretations of this problem are possible, but for most reasonable
interpretations, the probability is about 0.3%. From the description, it seems reasonable
to assume that all three points are inside the control limits; otherwise, the one-point-out
rule would take effect. Furthermore, the phrase “two out of three” could be taken to mean
either “exactly two out of three,” or “at least two out of three.” (Given what we are trying to
detect, the latter makes more sense, but students may have other ideas.)
For the kth point, we name the following events:
• Ak = "that point is no more than 2σ/√n from the center line,"
(The factor of 1/2 accounts for the second and third points being on the same side of the
center line.) If the “other” point is the second or third point, this probability is the same,
so if we interpret “two out of three” as meaning “exactly two out of three,” then the total
probability is three times the above number:
P(false out-of-control signal from an in-control process) ≈ 0.31% (or 0.26%)
With the (more-reasonable) interpretation "at least two out of three":
P(false out-of-control signal) = ½ P(A1 ∩ B2 ∩ B3) + ½ P(B1 ∩ A2 ∩ B3) + ½ P(B1 ∩ B2) ≈ 0.32% (or 0.27%)