Triola Cap 1

Solved exercises

Uploaded by

Odilia Guimaraes

Chapter 2

Summarizing and Graphing Data

2-2 Frequency Distributions

1. No. The first class frequency, for example, tells us only that there were 18 pennies with
weights in the 2.40-2.49 grams class, but there is no way to tell the exact values of those 18
weights.
2. The sum of the relative frequencies should be 1.00 when proportions are used, and it should be
100% when percentages are used.
3. No. This is not a relative frequency distribution because the sum of the percentages is not
100%. It appears that each respondent was asked to indicate whether he downloaded the four
types of material (and so the sum of the percentages could be anywhere from 0% to 400%),
and not to place himself in one of the four categories (in which case the table would be a
relative frequency distribution and the sum of the percentages would be 100%).
4. The gap in the frequencies suggests the table includes heights from two different populations.
Considering the values, it appears that the two populations are elementary students and
faculty/staff personnel at the school.
5. a. Class width: subtracting the first two lower class limits, 14 - 10 = 4.
b. Class midpoints: the first class midpoint is (10+13)/2 = 11.5, and the others can be obtained
by adding the class width to get 11.5, 15.5, 19.5, 23.5, 27.5.
c. Class boundaries: the boundary between the first and second class is (13+14)/2 = 13.5, and
the others can be obtained by adding or subtracting the class width to get 9.5, 13.5, 17.5,
21.5, 25.5, 29.5.
6. a. Class width: subtracting the first two lower class limits, 6 - 2 = 4.
b. Class midpoints: the first class midpoint is (2+5)/2 = 3.5, and the others can be obtained by
adding the class width to get 3.5, 7.5, 11.5, 15.5.
c. Class boundaries: the boundary between the first and second class is (5+6)/2 = 5.5, and the
others can be obtained by adding or subtracting the class width to get 1.5, 5.5, 9.5, 13.5,
17.5.
7. a. Class width: subtracting the first two lower class limits, 1.00 - 0.00 = 1.00.
b. Class midpoints: the first class midpoint is (0.00+0.99)/2 = 0.495, and the others can be
obtained by adding the class width to get 0.495, 1.495, 2.495, 3.495, 4.495.
c. Class boundaries: the boundary between the first and second class is (0.99+1.00)/2 = 0.995,
and the others can be obtained by adding or subtracting the class width to get -0.005, 0.995,
1.995, 2.995, 3.995, 4.995.
8. a. Class width: subtracting the first two lower class limits, 1.00 - 0.00 = 1.00.
b. Class midpoints: the first class midpoint is (0.00+0.99)/2 = 0.495, and the others can be
obtained by adding the class width to get 0.495, 1.495, 2.495, 3.495, 4.495, 5.495.
c. Class boundaries: the boundary between the first and second class is (0.99+1.00)/2 = 0.995,
and the others can be obtained by adding or subtracting the class width to get -0.005, 0.995,
1.995, 2.995, 3.995, 4.995, 5.995.
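
The arithmetic in exercises 5-8 follows one pattern, which the short Python sketch below illustrates for exercise 5. The class limits 10-13, 14-17, ..., 26-29 are inferred from the midpoints and boundaries in the answer, and the function name class_summary is a hypothetical helper, not something from the text.

```python
def class_summary(lower_limits, upper_limits):
    """Return (class width, class midpoints, class boundaries)."""
    # Class width: difference between consecutive lower class limits.
    width = lower_limits[1] - lower_limits[0]
    # Midpoint: average of each class's lower and upper limits.
    midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]
    # Boundaries sit halfway across the gap between adjacent classes.
    gap = lower_limits[1] - upper_limits[0]
    boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]
    return width, midpoints, boundaries


# Exercise 5 (limits inferred from the answer above).
width, midpoints, boundaries = class_summary([10, 14, 18, 22, 26],
                                             [13, 17, 21, 25, 29])
print(width)       # 4
print(midpoints)   # [11.5, 15.5, 19.5, 23.5, 27.5]
print(boundaries)  # [9.5, 13.5, 17.5, 21.5, 25.5, 29.5]
```

The same call with the limits of exercises 6-8 should give the values listed in those answers, up to ordinary floating-point rounding for the decimal limits.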

Copyright 2010 Pearson Education, Inc. Publishing as Addison-Wesley.



9. a. Strict interpretation: No; because there are more values at the upper end, there is not
symmetry.
b. Loose interpretation: Yes; there is a concentration of frequencies at the middle and a
tapering off in both directions.
10. a. Strict interpretation: No; the concentration of values is at the upper end.
b. Loose interpretation: No; the concentration of values is at the upper end.
11. The requested table is given below. Obtain each relative frequency by dividing the
given frequency by 25, the total number of observations in each table. The total line is not
necessary.
The non-filtered cigarettes have much more tar. Yes, the filters appear to be effective in
reducing the amount of tar.
Relative Frequency Comparison for #11
                 cigarette type
tar (mg)     non-filtered   filtered
2-5               0%            8%
6-9               0%            8%
10-13             4%           24%
14-17             0%           60%
18-21            60%            0%
22-25            28%            0%
26-29             8%            0%
total           100%          100%
12. The requested table is given below. Obtain each relative frequency by dividing
the given frequency by 62, the total number of observations in each table. The total line is
not necessary. [Due to rounding, percentages actually sum to 100.1%.]
While the weights cover approximately the same range, it appears that the weights for the
plastic are slightly smaller.
Relative Frequency Comparison for #12
                  discard type
weight (lbs)    metal     plastic
0.00-0.99        8.1%      22.6%
1.00-1.99       41.9%      32.3%
2.00-2.99       24.2%      33.9%
3.00-3.99       19.4%       6.5%
4.00-4.99        6.5%       3.2%
5.00-5.99        0.0%       1.6%
total            100%       100%
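
As an illustration of the division described in exercises 11 and 12, the Python sketch below recomputes two columns of relative frequencies. The raw frequencies are not printed in this manual; the lists used here are back-computed from the percentages (percentage times 25 or 62) and should be treated as assumptions, as should the helper name relative_frequencies.

```python
def relative_frequencies(freqs, digits=1):
    """Convert class frequencies to percentages rounded to `digits` places."""
    total = sum(freqs)
    return [round(100 * f / total, digits) for f in freqs]


# Assumed frequencies, back-computed from the percentages in the tables.
non_filtered = [0, 0, 1, 0, 15, 7, 2]   # tar classes 2-5 ... 26-29, n = 25
metal = [5, 26, 15, 12, 4, 0]           # weight classes 0.00-0.99 ... 5.00-5.99, n = 62

tar_pct = relative_frequencies(non_filtered)
metal_pct = relative_frequencies(metal)
print(tar_pct)    # [0.0, 0.0, 4.0, 0.0, 60.0, 28.0, 8.0]
print(metal_pct)  # [8.1, 41.9, 24.2, 19.4, 6.5, 0.0]

# Rounding each entry separately is why the metal column totals 100.1%.
print(round(sum(metal_pct), 1))  # 100.1
```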

NOTE: For cumulative tables, this manual uses upper class boundaries in the "less than" column.
Consider exercise #13, for example, to understand why this is done. Conceptually, weights occur on
a continuum and the integer values reported are assumed to be the nearest whole number
representation of the precise measure. An exact weight of 17.7, for example, would be reported as
18 and fall in the third class. The values in the second class, therefore, are better described as
"less than 17.5" (using the upper class boundary) than as "less than 18" (using the lower class
limit of the next class). This distinction is crucial in the construction of pictorial
representations in the next section. To present a visually simpler table, however, it is common
practice to follow the example in the text and use the lower class limit of the next class.
Regardless of the "less than" label, the final cumulative frequency must equal the total sample
size. The sum of the cumulative frequency column has no meaning and should never be included.


13. Obtain the cumulative frequency values by adding the given frequencies.
tar (mg) in
non-filtered       cumulative
cigarettes         frequency
less than 13.5          1
less than 17.5          1
less than 21.5         16
less than 25.5         23
less than 29.5         25

14. Obtain the cumulative frequency values by adding the given frequencies.
tar (mg) in
filtered           cumulative
cigarettes         frequency
less than 5.5           2
less than 9.5           4
less than 13.5         10
less than 17.5         25
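
The running totals in a cumulative table can be sketched in Python as below for exercise 13. The class frequencies 1, 0, 15, 7, 2 are an assumption: they are back-computed as differences of the cumulative column, since the given frequencies are not reprinted in this manual.

```python
from itertools import accumulate

# Upper class boundaries used as the "less than" labels in exercise 13.
upper_boundaries = [13.5, 17.5, 21.5, 25.5, 29.5]
# Assumed class frequencies (differences of the cumulative column).
frequencies = [1, 0, 15, 7, 2]

# Each cumulative frequency is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))
for boundary, count in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {count}")

# The last cumulative frequency must equal the sample size.
print(cumulative[-1] == sum(frequencies))  # True
```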

15. Obtain the relative frequencies by dividing the given frequencies by the total of 2223.
                       relative
category               frequency
male survivors           16.2%
males who died           62.8%
female survivors         15.5%
females who died          5.5%
                        100.0%

16. Obtain the relative frequencies by dividing the given frequencies by the total of 570.
                              relative
those whose smoking           frequency
continued after the gum         33.5%
stopped after the gum           10.4%
continued after the patch       46.1%
stopped after the patch         10.0%
                               100.0%

17. The requested table is given below. The frequency distribution of the last digits shows
unusually high numbers of 0s and 5s. This is typical for data that have been rounded off to
convenient values. It appears that the heights were reported and not actually measured.
digit    frequency
0            9
1            2
2            1
3            3
4            1
5           15
6            2
7            0
8            3
9            1
            37

18. The requested table is given below. The data are assumed to relate to the 1979 nuclear
power plant accident at Three Mile Island. Such data are important because they can be helpful
in detecting potentially dangerous situations and in making recommendations for future action.
level of
strontium-90    frequency
110-119             2
120-129             2
130-139             5
140-149             9
150-159            13
160-169             6
170-179             2
180-189             1
                   40

19. The requested table is given below.
nicotine (mg)    frequency
1.0-1.1             14
1.2-1.3              4
1.4-1.5              3
1.6-1.7              3
1.8-1.9              1
                    25

20. The requested table is given below. The values appear to be lower than the unfiltered ones
in exercise #19.
nicotine (mg)    frequency
0.2-0.3              1
0.4-0.5              1
0.6-0.7              1
0.8-0.9              8
1.0-1.1             12
1.2-1.3              2
                    25


21. The requested table is given below. No, the voltages do not appear to follow a normal
distribution; instead of being concentrated near the middle of the distribution, the values
appear to be rather evenly distributed.
voltage (volts)    frequency
123.3-123.4           10
123.5-123.6            9
123.7-123.8           10
123.9-124.0           10
124.1-124.2            1
                      40

22. The requested table is given below. Yes, the voltages do appear to follow a normal
distribution; there are many values near the center of the distribution, and the frequencies
diminish toward either end. The values appear to be higher than those in exercise #21.
voltage (volts)    frequency
123.9-124.0            2
124.1-124.2            1
124.3-124.4            6
124.5-124.6            9
124.7-124.8           13
124.9-125.0            5
125.1-125.2            4
                      40

23. The requested table is given below. While over half of the screws are within 0.01 inches of
the claimed value (28 of 50 fall between 0.74 and 0.76), there are over twice as many screws
below that range as there are above it (15 vs. 7). It appears that there might be a slight
tendency to err on the side of making the screws too small.
length (in)      frequency
0.720-0.729          5
0.730-0.739         10
0.740-0.749         11
0.750-0.759         17
0.760-0.769          7
                    50

24. The requested table is given below. Yes, the weights appear to have a distribution that is
approximately normal. These weights are considerably higher than the weights in exercise #7.
weight (lbs)     frequency
1.00-4.99            8
5.00-8.99           21
9.00-12.99          22
13.00-16.99          8
17.00-20.99          3
                    62

25. The requested table is given below. The ratings appear to have a distribution that is not
normal. While there is a maximum score with progressively smaller frequencies on either side
of the maximum, the distribution is definitely not symmetric (i.e., the maximum score is not
near the middle, but at the upper end of the distribution).
FICO score    frequency
400-449           1
450-499           1
500-549           5
550-599           8
600-649          12
650-699          16
700-749          19
750-799          27
800-849          10
850-899           1
                100


26. The requested tables are given below. In each case the relative frequencies were obtained by
dividing the observed frequencies by 36.
REGULAR COKE
                   relative
weight (lbs)       frequency
0.7900-0.7949         2.8%
0.7950-0.7999         0.0%
0.8000-0.8049         2.8%
0.8050-0.8099         8.3%
0.8100-0.8149        11.1%
0.8150-0.8199        47.2%
0.8200-0.8249        16.7%
0.8250-0.8299        11.1%
                    100.0%
DIET COKE
                   relative
weight (lbs)       frequency
0.7750-0.7799        11.1%
0.7800-0.7849        36.1%
0.7850-0.7899        41.7%
0.7900-0.7949        11.1%
                    100.0%
There are two significant differences between the data sets: the weights for Regular Coke are
considerably larger than those for Diet Coke, and the weights for Regular Coke cover a much
wider range than those for Diet Coke. This suggests that the sweetener in Regular Coke adds
weight to the product and does not distribute evenly throughout the product. As the company
produces more Regular Coke than Diet Coke, another possibility is that the harder-working
machines filling the Regular Coke may not be holding their tolerance as well and a wider
range in volume dispensed might account for the wider range of weights for Regular Coke.

27. The requested table is given below.
weight (g)         frequency
6.0000-6.0499          2
6.0500-6.0999          3
6.1000-6.1499         10
6.1500-6.1999          8
6.2000-6.2499          6
6.2500-6.2999          7
6.3000-6.3499          3
6.3500-6.3999          1
                      40

28. The requested table is given below. The post-1964 quarters appear to have weights that are
lighter (due to their different metallic composition) and spread over a smaller range (due to
their fewer years in circulation).
weight (g)         frequency
5.5000-5.5499          3
5.5500-5.5999          9
5.6000-5.6499         11
5.6500-5.6999          9
5.7000-5.7499          7
5.7500-5.7999          1
                      40

29. The requested table is given below.
blood group    frequency
O                 22
A                 20
B                  5
AB                 3
                  50

30. The requested table is given below.
main cause          frequency
bad track              23
faulty equipment        9
human error            12
other                   6
                       50


31. The frequency distributions including and excluding the outlier are given below. In general, an
outlier can add several rows to a frequency distribution. Even though most of the added rows
have frequency zero, the table tends to suggest that these are possible values, thus distorting
the reader's mental image of the distribution.
0.0111 CANS (with the outlier)
weight (lbs)    frequency
200-219             6
220-239             5
240-259            12
260-279            36
280-299            87
300-319            28
320-339             0
340-359             0
360-379             0
380-399             0
400-419             0
420-439             0
440-459             0
460-479             0
480-499             0
500-519             1
                  175
0.0111 CANS (without the outlier)
weight (lbs)    frequency
200-219             6
220-239             5
240-259            12
260-279            36
280-299            87
300-319            28
                  174

32. Let n = the number of data values and let x = the number of classes.
Either (1) solve the given formula x = 1 + (log n)/(log 2) for n to get n = 2^(x-1),
or (2) use trial-and-error by entering various values for n.
Use the values x = 5.5, 6.5, 7.5, ... to get the cut-off values for n shown below. Assuming n is
at least 16, use the cut-off values to complete the table as follows.
x       n = 2^(x-1)     n            ideal # of classes
5.5        22.63        16-22               5
6.5        45.25        23-45               6
7.5        90.51        46-90               7
8.5       181.02        91-181              8
9.5       362.04        182-362             9
10.5      724.08        363-724            10
11.5     1448.15        725-1448           11
12.5     2896.31        1449-2896          12
NOTE: Either the cut-off value method or the trial-and-error method indicates that
for n < 22.63, x rounds to 5.
for 22.63 < n < 45.25, x rounds to 6.
for 45.25 < n < 90.51, x rounds to 7.
for 90.51 < n < 181.02, x rounds to 8.
etc.
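
Both methods in exercise 32 can be checked with a few lines of Python. This is only a sketch: the function names ideal_classes and cutoff are hypothetical, and rounding to the nearest integer is taken from the answer's use of half-integer cut-offs.

```python
import math


def ideal_classes(n):
    """Evaluate x = 1 + (log n)/(log 2) and round to the nearest integer."""
    return round(1 + math.log(n) / math.log(2))


def cutoff(x):
    """Invert the formula: n = 2^(x-1)."""
    return 2 ** (x - 1)


# Method 1: cut-off values at x = 5.5, 6.5, ..., 12.5.
cutoffs = [round(cutoff(k + 0.5), 2) for k in range(5, 13)]
print(cutoffs)

# Method 2: trial and error, finding the largest n that still rounds to each x.
largest_n = {}
for n in range(16, 2897):
    largest_n[ideal_classes(n)] = n
print(largest_n)
```

The two printed results should agree with the table: each cut-off value, truncated to an integer, is the largest n in the corresponding row.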


2-3 Histograms

1. The pulse rate data have been organized into 7 classes. Examining the frequency distribution
requires consideration of 14 pieces of information: the 7 class labels, and the 7 class
frequencies. The histogram efficiently presents the same information in one visual image and
gives all the relevant CVDOT (center, variation, distribution shape, outlier, [time is not
relevant for these data]) details in an intuitive format.
2. Not necessarily. Depending on how the potential subjects were approached, a voluntary
response sample of health data might fail to be representative of the general population in
the following ways.
(a) Thinking they might receive free health information, those with health problems might be
more likely to volunteer.
(b) To avoid looking bad when compared with their peers, those with health problems might be
less likely to volunteer.
(c) The pool of potential volunteers may have been approached and/or identified so as to be
more homogeneous in some manner (racially, ethnically, etc.) than the general population,
and hence the sample data would not reflect the true range of values in the general
population.
3. The data set is small enough that the individual numbers can be examined; they do not require
summarization in a figure. The data set is not large enough for a histogram to reveal the true
nature of the distribution; the histogram will essentially be a repeat of the individual numbers.
4. In ordinary language, "normal" refers to that which is most common; in statistics, "normal"
refers to a specific pattern of values. A normal distribution is approximately bell-shaped
(i.e., bunching up in the middle, and tapering off symmetrically at either end). Determining
whether a distribution is approximately bell-shaped requires subjective judgment.
NOTE: For exercises 5-8, the following values are used to answer the questions. It appears that the
midpoints of the first 3 classes are 5,000 and 10,000 and 15,000. It appears that the heights of the
5 bars are 2, 30, 8, 15, 5.
5. a. Adding the heights of all the bars, the total number is 2+30+8+15+5 = 60.
b. Adding the heights of the two rightmost bars, the number over 20,000 miles is 15+5 = 20.
6. a. Subtracting the first two midpoints, the class width is 10,000 - 5,000 = 5,000 miles.
b. The upper class boundary of the first class is the average of the first two class midpoints,
(5,000+10,000)/2 = 7500. The lower class boundary of the first class is the upper class
boundary minus the class width, 7500 - 5000 = 2500. While it is unclear whether a reading
of exactly 7500 miles would fall into the first or second class, for example, the approximate
lower and upper limits of the first class are 2500 miles and 7500 miles.
7. a. The minimum possible miles traveled is the lower class boundary associated with the
leftmost bar, 2500 miles.
b. The maximum possible number of miles traveled is the upper class boundary associated
with the rightmost bar, 42,500 miles.
8. The histogram appears to include mileage amounts from two different populations. These
most likely represent automobiles which are driven in and out of the city each day but are
parked during the day (cars belonging to commuters) and automobiles that are driven during
the day (taxis, messenger and/or delivery cars).
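
The arithmetic in exercises 5 and 6 can be written out as a small Python sketch. It uses only the values quoted in the NOTE above (the first three midpoints and the five bar heights); the variable names are illustrative.

```python
# Values read from the histogram, as stated in the NOTE for exercises 5-8.
midpoints = [5_000, 10_000, 15_000]   # first three class midpoints (miles)
heights = [2, 30, 8, 15, 5]           # heights of the five bars

# Exercise 5: totals come from adding bar heights.
total = sum(heights)                  # all automobiles
over_20000 = sum(heights[-2:])        # the two rightmost bars

# Exercise 6: width from adjacent midpoints, boundaries from their average.
width = midpoints[1] - midpoints[0]
upper_first = (midpoints[0] + midpoints[1]) / 2
lower_first = upper_first - width

print(total, over_20000)                 # 60 20
print(width, lower_first, upper_first)   # 5000 2500.0 7500.0
```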


9. The histogram is given below. The digits 0 and 5 occur disproportionately more than the
others. This is typical for data that have been rounded off to convenient values. It appears
that the heights were reported and not actually measured.
[Histogram: "Last Digit of Student Heights"; horizontal axis: digit (0 to 9); vertical axis: frequency]

10. The histogram is given below. The true class midpoints are 114.5, 124.5, 134.5, etc. The
manual follows the text in presenting a histogram that communicates the information
in an appropriate, though approximate, manner.
[Histogram: "Radiation in Baby Teeth"; horizontal axis: Strontium-90 (millibecquerels), bars labeled 115 to 185; vertical axis: frequency]

11. The histogram is given below at the left.


[Histograms, side by side: "Results for Non-Filtered Cigarettes" (left; class midpoints 1.05 to 1.85) and "Results for Filtered Cigarettes" (right; class midpoints 0.25 to 1.25); horizontal axis: nicotine (mg); vertical axis: frequency]

12. The histogram is given above at the right. For a better comparison, the two figures are placed
side by side. The nicotine amounts appear to be substantially lower for the filtered cigarettes.


13. The histogram is given below at the left. No, the voltages do not appear to follow a normal
distribution; instead of being concentrated near the middle of the distribution, the values
appear to be rather evenly distributed.
[Histograms, side by side: "Home Voltage Measurements" (left; class midpoints 123.35 to 124.15) and "Generator Voltage Measurements" (right; class midpoints 123.95 to 125.15); horizontal axis: volts; vertical axis: frequency]

14. The histogram is given above at the right. For a better comparison, the two figures are placed
side by side. Yes, the voltages do appear to follow a normal distribution; there are many
values near the center of the distribution, and the frequencies diminish toward either end. The
values appear to be higher than those in exercise #13.

15. The histogram is given below. The true class boundaries are 0.7195, 0.7295, 0.7395, etc. The
manual follows the text in presenting a histogram that communicates the information in an
appropriate, though approximate, manner. While the 0.75 label appears reasonably accurate
in that all but 5 of the screws were within 0.02 of that value, it appears that there are slightly
more screws below the labeled value than above the labeled value and that the values extended
farther below the labeled value than above the labeled value.
[Histogram: "3/4 Inch Screws"; horizontal axis: length (inches), 0.72 to 0.77; vertical axis: frequency]

16. The histogram is given below. The true class midpoints are 2.995, 6.995, 10.995, etc. The
manual follows the text in presenting a histogram that communicates the information in an
appropriate, though approximate, manner. Yes, the weights appear to have a distribution that
is approximately normal.


[Histogram: "Discarded Paper"; horizontal axis: weight (lbs), bars labeled 3, 7, 11, 15, 19; vertical axis: frequency]

17. The histogram is given below. The true class boundaries are 399.5, 449.5, 499.5, etc. The
manual follows the text in presenting a histogram that communicates the information in an
appropriate, though approximate, manner. The ratings appear to have a distribution that is not
normal. While there is a maximum score with progressively smaller frequencies on either side
of the maximum, the distribution is definitely not symmetric (i.e., the maximum score is not
near the middle, but at the upper end of the distribution).
[Histogram: "Credit Rating"; horizontal axis: FICO score, 400 to 900; vertical axis: frequency]

18. The two relative frequency histograms are given below. For the sake of comparison, the same
horizontal and vertical axes have been used for both histograms.
[Relative frequency histograms, side by side: REGULAR COKE (left) and DIET COKE (right); horizontal axis: weight (lbs), 0.78 to 0.83; vertical axis: percent, 0 to 50]

Each set of weights appears to have a distribution that is approximately normal, but there are
two significant differences between the two sets: the weights for Regular Coke are
considerably larger than those for Diet Coke, and the weights for Regular Coke cover a much
wider range than those for Diet Coke. This suggests that the sweetener in Regular Coke adds
weight to the product and does not distribute evenly throughout the product. Since the
company produces more Regular Coke than Diet Coke, another possibility is that the harder-
working machines filling the Regular Coke may not be holding their tolerance as well and a
wider range in volume dispensed might account for the wider range of weights for Regular
Coke.
19. The histogram is given below at the left. The true class boundaries are 5.99995, 6.04995,
6.09995, etc. The manual follows the text in presenting a histogram that communicates the
information in an appropriate, though approximate, manner.
[Histograms, side by side: "Pre-1964 Quarters" (left; 6.00 to 6.40) and "Post-1964 Quarters" (right; 5.50 to 5.80); horizontal axis: weight (grams); vertical axis: frequency]

20. The histogram is given above at the right. For a better comparison, the two figures are placed
side by side. The true class boundaries are 5.49995, 5.54995, 5.59995, etc. The manual
follows the text in presenting a histogram that communicates the information in an appropriate,
though approximate, manner. The post-1964 quarters appear to have weights that are lighter
(due to their different metallic composition) and spread over a smaller range (due to their
fewer years in circulation).

21. The back-to-back relative frequency histograms are given below. The pulse rates of the males
tend to be lower than those of the females.
[Back-to-back relative frequency histograms: PULSE RATES; classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109, 110-119, 120-129; WOMEN on the left and MEN on the right; horizontal axes: percent relative frequency, 0 to 40]


22. The two requested histograms are given below. They give very different visual images of the
shape of the distribution. An outlier can have a significant effect on the histogram.
[Histogram: WITH THE OUTLIER; horizontal axis: weight (lbs), 210 to 490; vertical axis: frequency, 0 to 90]
[Histogram: WITHOUT THE OUTLIER; horizontal axis: weight (lbs), 210 to 310; vertical axis: frequency, 0 to 90]

2-4 Statistical Graphics

1. The dotplot permits identification of each original value and is easier to construct. The dotplot
gives an accurate visual impression of the proportion of the data within any selected range of
values; while the polygon is limited to impressions concerning the specified classes (and only
the heights at the class midpoints, and not the areas under the lines, give an accurate visual
impression of those proportions).
2. A scatterplot requires paired data from two quantitative variables: typically either two pieces
of data from each experimental unit (e.g., a child's height and weight), or data from two
different sets for which each value from one set may be appropriately associated with a value
from the second set (e.g., the weight of the male child and the weight of the female child from
mixed-gender twins). The scatterplot can reveal the nature of the relationship between the two
variables.
3. Using relative frequencies allows direct comparison of the two polygons. When two sets of
data have different sample sizes, the larger data set will naturally have higher frequencies and
direct comparison of the heights of the two polygons does not give meaningful information.
4. Since categories in a Pareto chart are ordered according to frequency, Pareto charts clearly
show the relative positions of the categories under investigation. In addition, the Pareto chart
is based on height and the pie chart is based on area and it is easier to compare heights than
areas.


5. The dotplot is given below. The Strontium-90 levels appear to have a spread-out normal
distribution, a wide range of values clustered around 150 and occurring with less frequency at
the extremes.

[Dotplot: POST-1979 BABY TEETH; horizontal axis: Strontium-90 (millibecquerels), 120 to 190]

6. The stemplot is given below. The Strontium-90 levels appear to have a normal distribution
clustered around 150.
Strontium-90 (mBq)
11 | 46
12 | 89
13 | 03678
14 | 022455579
15 | 0001111256688
16 | 133569
17 | 02
18 | 8

7. The frequency polygon is given below at the left.


NOTE: The frequencies are plotted at the class midpoints, which are not integer values. The
polygon must begin and end at zero at the midpoints of the adjoining classes that contain no
data values.
[Figures, side by side: POLYGON FOR POST-1979 BABY TEETH (left; frequency plotted at class midpoints 104.5 to 194.5) and OGIVE FOR POST-1979 BABY TEETH (right; cumulative frequency plotted at class boundaries 109.5 to 189.5); horizontal axes: Strontium-90 (millibecquerels)]

8. The ogive is given above at the right. Using the figure: move up from 150 on the horizontal
scale to intersect the graph, then move left to intersect the vertical scale at 18. This indicates
there were approximately 18 data values which would have been recorded as being below 150,
which agrees with the actual data values.
NOTE: Ogives always begin on the vertical axis at zero and end at n, the total number of data
values. All cumulative values are plotted at the upper class boundaries.
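
The claim that the ogive reading "agrees with the actual data values" can be checked against the stemplot in exercise 6. The Python sketch below expands the stemplot into individual values, counts those below 150, and rebuilds the ogive points at the class boundaries; the variable names are illustrative.

```python
# Stemplot from exercise 6: stem (tens) -> leaves (units digits).
stemplot = {
    11: "46", 12: "89", 13: "03678", 14: "022455579",
    15: "0001111256688", 16: "133569", 17: "02", 18: "8",
}
values = [10 * stem + int(leaf) for stem, leaves in stemplot.items() for leaf in leaves]

# Number of values that would be recorded as being below 150.
below_150 = sum(v < 150 for v in values)
print(len(values), below_150)  # 40 18

# Ogive points: cumulative count of values below each class boundary.
boundaries = [109.5 + 10 * k for k in range(9)]   # 109.5, 119.5, ..., 189.5
ogive = [sum(v < b for v in values) for b in boundaries]
print(ogive)  # [0, 2, 4, 9, 18, 31, 37, 39, 40]
```

The ogive starts at 0 and ends at n = 40, as the NOTE above requires.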


9. The stemplot is given below. The weights appear to be approximately normally distributed,
except perhaps for the necessary lower truncation at zero.
weight (pounds)
0. | 12356677888999
1. | 11234444444445556678
2. | 001111113334668888999
3. | 9345
4. | 36
5. | 2

10. The dotplot is given below. The weights appear to be approximately normally distributed,
except perhaps for the presence of a few high values.

[Dotplot: DISCARDED PLASTIC; horizontal axis: weight (pounds), 0.0 to 4.9]

11. The ogive is given below at the left. Using the figure: move up from 4 on the horizontal
scale to intersect the graph, then move left to intersect the vertical scale at 59. This indicates
there were approximately 59 data values which would have been recorded as being below 4,
which agrees with the actual data values.
NOTE: Ogives always begin on the vertical axis at zero and end at n, the total number of data
values. All cumulative values are plotted at the upper class boundaries.
[Figures, side by side: OGIVE FOR DISCARDED PLASTIC (left; cumulative frequency plotted at class boundaries -0.005 to 5.995) and POLYGON FOR DISCARDED PLASTIC (right; frequency plotted at class midpoints -0.505 to 6.495); horizontal axes: weight (pounds)]

12. The frequency polygon is given above at the right.


NOTE: The frequencies are plotted at the class midpoints, which have one more decimal place
than the original data. The polygon must begin and end at zero at the midpoints of the
adjoining classes that contain no data values.


13. The Pareto chart is given below.

[Pareto chart: PARETO CHART FOR UNDERGRADUATE ENROLLMENTS; bars in the order Public 4-Year, Public 2-Year, Private 4-Year, Private 2-Year; horizontal axis: TYPE OF INSTITUTION; vertical axis: percent]

14. The pie chart is given below. The slices of the pie may appear in any order and in any
position, but their relative sizes must be as shown. The Pareto chart is more effective than the
pie chart. While it is clear which bar in the Pareto chart is the tallest, it is not clear which area
in the pie chart is the largest.

[Pie chart: PIE CHART FOR UNDERGRADUATE ENROLLMENTS; slices by TYPE OF INSTITUTION: Public 4-Year, Public 2-Year, Private 4-Year, Private 2-Year]


15. The pie chart is given below. The slices of the pie may appear in any order and in any
position, but their relative sizes must be as shown. There were 1231 total responses, and the
central angle of the pie chart for each category was determined as follows.
Interview: 452/1231 = 36.7%, and 36.7% of 360° is 132°
Resume: 297/1231 = 24.1%, and 24.1% of 360° is 87°
Reference Checks: 143/1231 = 11.6%, and 11.6% of 360° is 42°
Cover Letter: 141/1231 = 11.5%, and 11.5% of 360° is 41°
Interview Follow-Up: 113/1231 = 9.2%, and 9.2% of 360° is 33°
Screening Call: 85/1231 = 6.9%, and 6.9% of 360° is 25°
[Pie chart: PIE CHART FOR JOB APPLICATION MISTAKES; slices by AREA OF THE MISTAKE: Interview, Resume, Reference Checks, Cover Letter, Interview Follow-Up, Screening Call]
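
The central angles for exercise 15 can be recomputed with the Python sketch below, using the response counts quoted in the answer. One difference from the hand calculation is an assumption of this sketch: it multiplies the unrounded proportion by 360 rather than the rounded percentage, which gives the same whole-degree angles here.

```python
# Response counts quoted in exercise 15.
responses = {
    "Interview": 452,
    "Resume": 297,
    "Reference Checks": 143,
    "Cover Letter": 141,
    "Interview Follow-Up": 113,
    "Screening Call": 85,
}
total = sum(responses.values())

angles = {}
for category, count in responses.items():
    share = count / total                      # relative frequency
    angles[category] = round(360 * share)      # central angle in whole degrees
    print(f"{category}: {100 * share:.1f}% -> {angles[category]} degrees")

# The central angles of a pie chart must add up to a full circle.
print(total, sum(angles.values()))  # 1231 360
```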

16. The Pareto Chart is given below. The Pareto chart is more effective than the pie chart.

[Pareto chart: PARETO CHART OF JOB APPLICATION MISTAKES; bars in the order Interview, Resume, Reference Checks, Cover Letter, Interview Follow-Up, Screening Call; horizontal axis: AREA OF THE MISTAKE; vertical axis: frequency, 0 to 500]


17. The pie chart is given below at the left. The slices of the pie may appear in any order
and in any position, but their relative sizes must be as shown.
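[Figures, side by side: pie chart of BLOOD GROUPS (left; slices O, A, B, AB) and Pareto chart of BLOOD GROUPS (right; bars in the order O, A, B, AB; vertical axis: frequency)]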

18. The Pareto chart is given above at the right.

19. The Pareto chart is given below at the left.


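[Figures, side by side: Pareto chart of CAUSES OF TRAIN DERAILMENTS (left; bars in the order Bad Track, Human Error, Faulty Equipment, Other; vertical axis: frequency) and pie chart of CAUSES OF TRAIN DERAILMENTS (right; slices Bad Track, Human Error, Faulty Equipment, Other)]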

20. The pie chart is given above at the right. The slices of the pie may appear in any order
and in any position, but their relative sizes must be as shown.

21. The simple, unmodified scatterplot is given below. There appears to be a slight tendency for
cigarettes with more tar to also have more CO.
[Scatterplot of CO vs Tar; horizontal axis: Tar, 10 to 28; vertical axis: CO, 14 to 18]


NOTE: The above scatterplot shows only 9 data points, even though there were 25 pairs of
tar/CO data points in the original sample. Since the scatterplot actually shows less than half
the information contained in the sample, it may not provide an accurate picture of the data.
This is caused by duplicate values: the pairs (22,14), (23,15), and (27,16) each appear 2 times,
and the (20,16) pair appears 14 times! Two modifications that adjust for this phenomenon are
shown below. The scatterplot on the left inserts numbers to tell how many data points are
represented by dots that indicate duplicate values. The scatterplot on the right shows the true
number of dots. The same effect can also be obtained by using dots whose sizes are proportional
to the number of duplicate values each represents. The modified scatterplots indicate that there
appears to be no relationship between the amounts of tar and CO.
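Counting how many data points share each dot can be sketched as follows. This is only an illustration under stated assumptions: the list contains just the duplicated pairs named in the note, because the five pairs that occur once are not listed in the text, and the variable names are hypothetical.

```python
from collections import Counter

# Only the duplicated (tar, CO) pairs named in the note are listed here;
# the five pairs that occur once are not given in the text and are omitted.
pairs = [(22, 14)] * 2 + [(23, 15)] * 2 + [(27, 16)] * 2 + [(20, 16)] * 14

counts = Counter(pairs)  # maps each distinct pair to its number of occurrences

for (tar, co), n in sorted(counts.items()):
    # n can be printed beside the dot or used to scale the size of the dot
    print(f"tar={tar}, CO={co}: {n} data points share this dot")

print(len(pairs), "data points collapse to", len(counts), "distinct dots")
```

Adding the five single pairs would bring the totals to the 25 data points and 9 distinct dots described in the note.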
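[Figure, left: Scatterplot of CO vs Tar with counts (2, 2, 2, and 14) printed beside the dots that represent duplicate values; horizontal axis: Tar (10 to 28); vertical axis: CO (14 to 18)]
[Figure, right: Scatterplot of CO vs Tar showing the true number of dots; horizontal axis: Tar (10 to 28); vertical axis: CO (14 to 18)]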

22. The scatterplot is given below. It appears that more energy is used on days when it is very
cold (for heating) or very warm (for air conditioning).
[Figure: HOME ENERGY CONSUMPTION scatterplot; horizontal axis: Average Daily Temperature (Fahrenheit), 20 to 80; vertical axis: kWh, 2000 to 4500]


23. The time series graph is given below. Note that the given years are not evenly spaced.
[Figure: TRANSISTORS PER SQUARE INCH time series graph; horizontal axis: Year (1970 to 2005); vertical axis: Transistors (1000's), up to 400000]

24. The time series graph is given below. The graph does not appear to show linear growth
(constant slope) over the entire time period, but it does appear that there was linear growth
during certain periods (e.g., since 1999).
[Figure: US CELL PHONE SUBSCRIPTIONS time series graph; horizontal axis: Year (1985 to 2005); vertical axis: subscribers (1000's), up to 200000]

25. The multiple bar graph is given below. As the population increases, the numbers of
marriages and divorces will automatically increase. To identify any change in marriage and
divorce patterns, one needs to examine the rates. This is analogous to using percents (or
relative frequencies) instead of frequencies to compare categories for two samples of different
sizes. The marriage rate appears to have remained fairly constant, with a possible slight
decrease in recent years. The divorce rate appears to have steadily grown, with a possible
slight decrease in recent years.
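The reason for comparing rates instead of counts can be sketched as follows. The numbers are hypothetical and chosen only for illustration (they are not the exercise data), and the function name `rate_per_1000` is an assumption.

```python
def rate_per_1000(events, population):
    """Events per 1000 people, so years with different populations are comparable."""
    return 1000 * events / population

# Hypothetical illustration (not the exercise data): the count doubles,
# but so does the population, so the rate is unchanged.
early = rate_per_1000(events=500_000, population=50_000_000)
late = rate_per_1000(events=1_000_000, population=100_000_000)

print(early, late)
```

In this made-up case the raw count suggests a large increase while the rate shows no change at all, which is why the graph plots rates.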


[Figure: MARRIAGE AND DIVORCE RATES multiple bar graph; vertical axis: rate per 1000 (0 to 12); horizontal axis: years 1900 to 2000 by decade, with one bar for the marriage rate (M) and one for the divorce rate (D) in each year]

26. The multiple bar graph is given below. The females consistently outnumber the males, and the
numbers of both genders are gradually increasing over time.
[Figure: GENDERS OF STUDENTS multiple bar graph; vertical axis: students (1000's), 0 to 12000; horizontal axis: years 2004 to 2010, with one bar for males (M) and one for females (F) in each year]

27. The back-to-back stemplot is given below. The pulse rates for men appear to be lower than the
pulse rates of women.
           PULSE RATES
         Women |    | Men
               |  5 | 666666
  888884444000 |  6 | 00000004444444888
66666622222222 |  7 | 22222266
   88888000000 |  8 | 44448888
             6 |  9 | 6
             4 | 10 |
               | 11 |
             4 | 12 |
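A back-to-back stemplot like the one above can be built programmatically. The sketch below is one possible approach, not the method used to produce the original figure; the function names are hypothetical, and the data lists are read back from the stems and leaves shown above (stem = tens, leaf = ones).

```python
from collections import defaultdict

def leaves_by_stem(values):
    """Group the ones digits (leaves) of integer values by their stems (value // 10)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(str(v % 10))
    return groups

def back_to_back(left, right, left_name, right_name):
    """Return the lines of a back-to-back stemplot; left leaves read outward from the stem."""
    lg, rg = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(lg), min(rg)), max(max(lg), max(rg)) + 1)
    width = max(len(left_name), max(len(g) for g in lg.values()))
    lines = [f"{left_name:>{width}} | stem | {right_name}"]
    for s in stems:
        left_leaves = "".join(reversed(lg.get(s, [])))
        right_leaves = "".join(rg.get(s, []))
        lines.append(f"{left_leaves:>{width}} | {s:>4} | {right_leaves}".rstrip())
    return lines

# Pulse rates read back from the stemplot above.
men = [56]*6 + [60]*7 + [64]*7 + [68]*3 + [72]*6 + [76]*2 + [84]*4 + [88]*4 + [96]
women = ([60]*3 + [64]*4 + [68]*5 + [72]*8 + [76]*6 + [80]*6 + [88]*5
         + [96, 104, 124])

for line in back_to_back(women, men, "Women", "Men"):
    print(line)
```

Sorting before grouping keeps the leaves in increasing order, and reversing the left-hand leaves makes both sides read outward from the stem.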


28. a. The next two rows of the expanded stemplot come from the original 70s row as follows.
          7 | 22222222
          7 | 666666
    b. The completed condensed stemplot is as follows.
       FEMALE PULSE RATES
        stem | leaves
         6-7 | 000444488888*22222222666666
         8-9 | 00000088888*6
       10-11 | 4*
       12-13 | 4*

2-5 Critical Thinking: Bad Graphs

1. The illustration uses two-dimensional objects (dollar bills) to represent a one-dimensional
variable (purchasing power). If the illustration uses a dollar bill with 1/2 the original length and
1/2 the original width to represent 1/2 the original purchasing power, then the illustration is
misleading (because 1/2 the length and 1/2 the width translate into 1/4 the area and give the
visual impression of 25% instead of 50%). But if the illustration uses a dollar bill with 1/2 the
area (i.e., with .707 of the original length and .707 of the original width) to represent 1/2 the
original purchasing power, then the illustration conveys the proper visual impression.
2. No. Since the data come from a voluntary response sample, they may not be representative of the
population. Since the sample may not be representative, even sound graphing techniques will
not necessarily provide an accurate understanding of the population.
3. No. Results should be presented in a way that is fair and objective so that the reader has the
reliable information necessary to reach his own conclusion.
4. No, the resulting graph is not misleading. Since the variable of interest (area) is two-
dimensional, it is appropriate to use corresponding two-dimensional figures to make
comparisons.
5. No. The illustration uses two-dimensional objects to represent a one-dimensional variable
(weight). The average male weight is 172/137 = 1.255 times the average female weight.
Making a two-dimensional figure 1.255 times taller and 1.255 times wider increases the area
by a factor of (1.255)^2 = 1.58 and gives a misleading visual impression.
6. The graph creates the impression that men have salaries that are more than twice that of
women. The distortion occurs because the vertical scale does not start at zero. A graph that
depicts the data fairly is given at the right.
[Figure: bar graph Average Teaching Salaries at Private Colleges and Universities; vertical axis: salary ($1000's), 0 to 80; bars for women and men]


7. The average income for men is about 1.4 times the average income for women. Making the
men's pictograph 1.4 times as wide and 1.4 times as high as the women's produces a men's
image with (1.4)^2 = 1.96 times the area of the women's image. Since it is the area that gives
the visual impression in a two-dimensional figure, the men's average income appears to be
almost twice the women's average income. A graph that depicts the data fairly is given at
the right.
[Figure: bar graph Average Full-Time Incomes for Persons 18 and Older; vertical axis: income ($1000's), 0 to 60; bars for women and men]
8. The oil consumption for the USA is about 3.7 times the oil consumption for Japan. Making
the USA's pictograph 3.7 times larger than Japan's in three dimensions produces an image for
the USA with (3.7)^3, or about 50, times the volume of the image for Japan. Since it is the
perceived volume that gives the visual impression in the given figure, the consumption for the
USA appears to be 50 times that for Japan. A graph depicting the data fairly is given at the
right.
[Figure: bar graph Daily Oil Consumption; vertical axis: millions of barrels, 0 to 20; bars for USA and Japan]
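The area and volume arithmetic used in exercises 1, 5, 7, and 8 of this section can be checked with a short sketch. The function names are hypothetical, and the code is only an illustration of the scaling rule, not part of the original solutions.

```python
def visual_ratio(linear_ratio, dimensions):
    """Factor by which area (2) or volume (3) grows when every linear dimension is scaled."""
    return linear_ratio ** dimensions

def fair_linear_scale(data_ratio, dimensions):
    """Linear scale factor that makes the area or volume itself proportional to the data."""
    return data_ratio ** (1 / dimensions)

print(round(visual_ratio(1.255, 2), 2))     # weights, exercise 5
print(round(visual_ratio(1.4, 2), 2))       # incomes, exercise 7
print(round(visual_ratio(3.7, 3), 1))       # oil consumption, exercise 8
print(round(fair_linear_scale(0.5, 2), 3))  # half the area, exercise 1
```

The second function runs the rule in reverse: to show one-half the purchasing power with one-half the area, each side is scaled by the square root of 0.5.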
9. The graph in the text makes it appear that the braking distance for the Acura RL is more
than twice that of the Volvo S80. The actual difference is about 60 feet, and the Acura RL
distance is about 192/133 = 1.44 times that of the Volvo S80. The exaggeration of differences
is caused by the fact that the distance scale does not start at zero. A graph that depicts the
data fairly is given at the right.
[Figure: bar graph BRAKING DISTANCES; vertical axis: feet, 0 to 200; bars for Acura RL, Honda Accord, and Volvo S80]
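How a scale that does not start at zero inflates the apparent ratio can be sketched as follows. The starting value of 100 is a hypothetical choice made only for illustration, since the text does not state where the original graph's scale begins; the function name is also an assumption.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights for values a and b when the scale begins at axis_start."""
    return (a - axis_start) / (b - axis_start)

acura, volvo = 192, 133  # braking distances in feet, from the ratio quoted above

print(round(apparent_ratio(acura, volvo), 2))  # scale starting at zero

# 100 is a hypothetical starting value chosen only for illustration;
# the text does not state where the original graph's scale begins.
print(round(apparent_ratio(acura, volvo, axis_start=100), 2))
```

With the scale starting at zero the drawn ratio equals the true ratio; any positive starting value below the smaller distance makes the taller bar look disproportionately large.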


10. The graph given in the text is misleading because it gives the visual impression that
the number of adoptions has more than doubled. Not starting the vertical axis at zero
exaggerates the differences between categories. A graph that depicts the data fairly is given
at the right.
[Figure: bar graph ADOPTIONS FROM CHINA; vertical axis: number of adoptions, 0 to 7000; horizontal axis: year (2000 and 2005)]

11. The given figure is misleading because the backside of the head is not visible. Categories
extending to the backside of the head will not have as much area showing as comparable
categories shown at the front of the head. A regular pie chart would give the relative sizes of
the categories in an undistorted manner. A better graph would be a bar chart with the vertical
axis starting at 0, and the categories given in order by age. There is a natural ordering of
the categories that can be preserved with a bar chart but is hidden in a pie chart, which
ends up placing the first and last categories side by side.
12. For easier comparison, the two graphs are given side by side with the same horizontal scale.
a. The graph below on the left exaggerates the differences between categories by not starting
the vertical scale at zero.
b. The graph below on the right depicts the data fairly.
[Figure, left: bar graph DISTRIBUTION OF U.S. UNDERGRADUATES; vertical axis: percentage of students, 30 to 65; bars for 2-year colleges and 4-year colleges]
[Figure, right: bar graph DISTRIBUTION OF U.S. UNDERGRADUATES; vertical axis: percentage of students, 0 to 70; bars for 2-year colleges and 4-year colleges]

Statistical Literacy and Critical Thinking

1. When investigating the distribution of a data set, a histogram is more effective than a
frequency distribution. Both figures contain the same information, but the visual impact of the
histogram presents that information in a more efficient and more understandable manner.
2. When investigating changes over a period of years, a time series graph would be more
effective than a histogram. A histogram would indicate the frequency with which different
amounts occurred, but by ignoring the years in which those amounts occurred it would give no
information about changes over time.
3. Using two-dimensional figures to compare one-dimensional variables exaggerates differences
whenever the areas of the two-dimensional figures are not proportional to the amounts being
portrayed. Making the height and width proportional to the amounts being portrayed creates a
distorted picture because it is area that makes the visual impression on the reader and a
two-fold increase in height and width produces a four-fold increase in area.
4. The highest histogram bars should be near the center, with the heights of the bars diminishing
toward each end. The figure should be approximately symmetric.

Chapter Quick Quiz

1. 10 - 0 = 10. The class width may be found by subtracting consecutive lower class limits.
2. Assuming the data represent values reported to the nearest integer, the class boundaries for the
first class are -0.5 and 9.5.
3. No. All that can be said is that there are 27 data values somewhere within that class.
4. False. A normal distribution is bell-shaped, with the middle classes having higher frequencies
than the classes at the extremes. The distribution for a balanced die will be flat, with each
class having about the same frequency.
5. Variation.
6. 52, 52, 59. The 5 to the left of the stem represents the tens digit associated with the ones digits
to the right of the stem.
7. Scatterplot. The data is two-dimensional, requiring separate axes for each variable (shoe size
and height).
8. True. The vertical scale for the relative frequency histogram will be the values of the
frequency histogram divided by the sample size n.
9. A histogram reveals the shape of the distribution of the data.
10. Pareto chart. When there is no natural order for the categories, placing them in the order of
their frequencies shows the relative importance without losing the nature of any significant
relationships between the categories.

Review Exercises

1. The frequency distribution is given below. The pulse rates for the males appear to be
lower than those for the females.

MALE PULSE RATES
beats per minute   frequency
     50-59              6
     60-69             17
     70-79              8
     80-89              8
     90-99              1
     total             40
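The relative frequencies for this table follow by dividing each class frequency by the sample size n. The sketch below is an illustration only; the dictionary and variable names are hypothetical, while the frequencies are copied from the table.

```python
# Frequency distribution of the male pulse rates, copied from the table above.
frequencies = {"50-59": 6, "60-69": 17, "70-79": 8, "80-89": 8, "90-99": 1}

n = sum(frequencies.values())  # sample size

# Relative frequency = class frequency / n
relative = {label: f / n for label, f in frequencies.items()}

for label, r in relative.items():
    print(f"{label}: {r:.3f} ({100 * r:.1f}%)")
```

The relative frequencies sum to 1, which is the check described at the start of Section 2-2.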


2. The histogram is given below. The basic shape is similar to the histogram for the females, but
the male pulse rates appear to be lower.

[Figure: MALE PULSE RATES histogram; horizontal axis: beats per minute (50 to 100); vertical axis: Frequency (0 to 18)]

3. The dotplot is given below at the left. It shows that the male pulse rates appear to be lower
than those for the females.
[Figure, left: dotplot MALE PULSE RATES; horizontal axis: beats per minute (60 to 96)]
Stemplot, right (beats per minute):
5 | 666666
6 | 00000004444444888
7 | 22222266
8 | 44448888
9 | 6

4. The stemplot is given above at the right. It shows that the male pulse rates appear to be lower
than those of the females.
5. The scatterplot is given below.
[Figure: CAR WEIGHT AND BRAKING DISTANCE scatterplot; horizontal axis: weight (lbs), 3300 to 4200; vertical axis: braking distance (ft), 125 to 145]


6. The time-series graph is given below.


[Figure: ANNUAL SUNSPOT NUMBERS time series graph; horizontal axis: year (1980 to 2004); vertical axis: sunspot number (0 to 160)]

7. The graph is misleading because the vertical axis does not start at zero, causing it to exaggerate
the differences between the categories. A graph that correctly illustrates the acceleration times
is given below.
[Figure: bar graph CAR ACCELERATION TIMES; vertical axis: seconds (0 to 6); bars for Volvo XC-90, Audi Q7, VW Passat, and BMW 3 Series]

8. a. 25. The difference between the first two lower class limits is 125 - 100 = 25.
b. 100 and 124. These are the values given in the first row of the table.
c. 99.5 and 124.5. Values within these boundaries will round to the whole numbers given by
the class limits.
d. No. The distribution is not symmetric; the class with the largest frequency is near the right
end of the distribution.
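The class width and class boundaries in parts (a) through (c) can be computed from the class limits. The sketch below assumes whole-number data and uses a hypothetical helper name; the midpoint follows the definition used in Section 2-2 (the average of the lower and upper class limits).

```python
def class_summary(lower_limits, first_upper_limit):
    """Class width plus the first class's boundaries and midpoint."""
    width = lower_limits[1] - lower_limits[0]   # difference of consecutive lower limits
    lower, upper = lower_limits[0], first_upper_limit
    gap = lower_limits[1] - upper               # gap between adjacent classes (1 for whole numbers)
    boundaries = (lower - gap / 2, upper + gap / 2)
    midpoint = (lower + upper) / 2
    return width, boundaries, midpoint

# First two lower class limits and the first upper class limit from parts (a) and (b).
print(class_summary([100, 125], 124))
```

Calling the same function with lower limits 0 and 10 and upper limit 9 gives the width of 10 and the boundaries -0.5 and 9.5 found in the Chapter Quick Quiz.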

Cumulative Review Exercises

1. Yes. The sum of the relative frequencies is 100%.


2. Nominal. The yes-no-maybe responses are categories only; they do not provide numerical
measures of any quantity, nor do they have any natural ordering.
3. The actual numbers of responses are as follows. Note that 884 + 433 + 416 = 1733.
Yes: (0.51)(1733) = 884. No: (0.25)(1733) = 433. Maybe: (0.24)(1733) = 416
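These counts can be verified with a few lines of code. The sketch is only a check of the arithmetic above; the variable names are hypothetical.

```python
n = 1733
proportions = {"Yes": 0.51, "No": 0.25, "Maybe": 0.24}

# Multiply each reported proportion by n and round to the nearest whole response.
counts = {answer: round(p * n) for answer, p in proportions.items()}

print(counts, sum(counts.values()))
```

Rounded counts do not always add back to n; here they do, which agrees with the note that 884 + 433 + 416 = 1733.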
4. Voluntary Response Sample. A voluntary response sample is not likely to be representative of
the population of all executives, but rather only of those executives who had strong feelings
about the topic and/or had enough free time to respond to such a survey.
5. a. A random sample is a sample in which every member of the population has an equal chance
of being selected.
b. A simple random sample of size n is a sample in which every possible sample of size n has
an equal chance of being selected.
c. Yes, it is a random sample because every person in the population of 300,000,000 has the
same chance of being selected. No, it is not a simple random sample because all possible
groups of 1000 do not have the same chance of being selected; in fact, a group of 1000
composed of the oldest person in each of the first 1000 of the 300,000 groups has no
chance of being selected.
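The distinction in part (c) can be seen by enumeration on a scaled-down example. The sketch below is hypothetical: it uses 12 people in 4 groups of 3 in place of the 300,000 groups of 1000, and it assumes the sampling scheme is to choose one whole group at random.

```python
from itertools import combinations

# Scaled-down, hypothetical analogue: 12 people in 4 groups of 3,
# with one whole group chosen at random (assumed sampling scheme).
population = list(range(12))
groups = [tuple(population[i:i + 3]) for i in range(0, 12, 3)]

# Each person belongs to exactly one group, so each is selected with probability 1/4.
chance = {p: sum(p in g for g in groups) / len(groups) for p in population}

# A simple random sample would allow every possible set of 3 people.
possible = set(combinations(population, 3))
reachable = set(groups)

print(set(chance.values()))  # every person has the same chance
print(len(reachable), "of", len(possible), "possible samples can occur")
```

Every individual has the same chance, so the sample is random, but a set such as (0, 3, 6), which takes one person from each of three groups, can never occur, so it is not a simple random sample.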
6. a. 100. The difference between the first two lower class limits is 100 - 0 = 100.
b. -0.5 and 99.5. Values within these boundaries will round to the whole numbers given by
the class limits.
c. 11/40 = 0.275, or 27.5%. The total of the frequencies is 40.
d. Ratio. Differences between the data values are meaningful and there is a meaningful zero.
e. Quantitative. The data values are measurements of the cotinine levels.
7. The histogram is given below. Using a strict interpretation of the criteria, the cotinine levels
do not appear to be normally distributed; the values appear to be concentrated in the lower
portion of the distribution, with very few values in the upper portion.
NOTE: The histogram bars extend from class boundary to class boundary. We follow the text
and for clarity of presentation use the labels 0, 100, 200, etc. instead of the more cumbersome
-0.5, 99.5, 199.5 etc.
[Figure: COTININE LEVELS OF SMOKERS histogram; horizontal axis: ng/mL (0 to 500); vertical axis: frequency (0 to 14)]

8. Statistic. A statistic is a measurement of some characteristic of a sample, while a parameter is
a measurement of some characteristic of the entire population.
