Descriptive Statistics
In this chapter, you will study numerical
and graphical ways to describe and display
your data. This area of statistics is
called "Descriptive Statistics." You will
learn how to calculate, and even more
importantly, how to interpret these
measurements and graphs.
• A statistical graph is a tool
that helps you learn about the
shape or distribution of a
sample or a population.
• A graph can be a more
effective way of presenting
data than a mass of numbers
because we can see where
data clusters and where there
are only a few data values.
• Some of the types of graphs that are
used to summarize and organize data
are: the dot plot, the bar graph, the
histogram, the stem-and-leaf plot,
the frequency polygon (a type of
broken line graph), the pie chart,
and the box plot.
Stem-and-Leaf Graphs (Stemplots), Line Graphs, and Bar Graphs
One simple graph,
the stem-and-leaf
graph or stemplot,
comes from the field of
exploratory data
analysis. It is a good
choice when the data
sets are small.
To create the plot, divide each observation of data
into a stem and a leaf. The leaf consists of a final
significant digit. For example, 23 has stem two and
leaf three. The number 432 has stem 43 and leaf
two. Likewise, the number 5,432 has stem 543 and
leaf two. The decimal 9.3 has stem nine and leaf
three. Write the stems in a vertical line from
smallest to largest. Draw a vertical line to the right
of the stems. Then write the leaves in increasing
order next to their corresponding stem.
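The splitting rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `split_stem_leaf` and `stemplot` are made up for this example, and it assumes every value is written as text with the same number of decimal places, so that the final digit is always the leaf.

```python
from collections import defaultdict

def split_stem_leaf(value):
    """Split a number written as text into (stem, leaf).

    The leaf is the final digit; the stem is everything before it.
    Commas and the decimal point are ignored (assumption: all values
    in a data set have the same number of decimal places).
    """
    digits = value.replace(",", "").replace(".", "")
    return int(digits[:-1] or "0"), int(digits[-1])

def stemplot(values):
    """Return the rows of a stemplot, smallest stem first."""
    leaves = defaultdict(list)
    for value in values:
        stem, leaf = split_stem_leaf(value)
        leaves[stem].append(leaf)
    rows = []
    # Every stem between the smallest and largest is listed, even if empty.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        rows.append(f"{stem} {row}".rstrip())
    return rows

# The four examples from the paragraph above.
for text in ["23", "432", "5,432", "9.3"]:
    print(text, "->", split_stem_leaf(text))
```

Calling `stemplot` on a list of values written as text returns one row per stem, with the leaves in increasing order.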
For Susan Dean's spring pre-calculus class, scores for the first exam were as follows (smallest
to largest):
33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90;
92; 94; 94; 94; 94; 96; 100
Table 2.1 Stem-and-Leaf Graph
Stem Leaf
3 3
4 299
5 355
6 1378899
7 2348
8 03888
9 0244446
10 0
The stemplot is a quick way to graph data and gives an exact picture
of the data. You want to look for an overall pattern and any outliers.
An outlier is an observation of data that does not fit the rest of the
data. It is sometimes called an extreme value. When you graph an
outlier, it will appear not to fit the pattern of the graph. Some
outliers are due to mistakes (for example, writing down 50 instead of
500) while others may indicate that something unusual is happening.
It takes some background information to explain outliers, so we will
cover them in more detail later.
The data are the distances (in kilometers) from a home to local supermarkets.
Create a stemplot using the data:
1.1; 1.5; 2.3; 2.5; 2.7; 3.2; 3.3; 3.3; 3.5; 3.8; 4.0; 4.2; 4.5; 4.5;
4.7; 4.8; 5.5; 5.6; 6.5; 6.7; 12.3
Stem Leaf
1 15
2 357
3 23358
4 025578
5 56
6 57
7
8
9
10
11
12 3
Buchanan 65 Coolidge 51
•Table 2.5 Presidential Age at Death
President        Age    President        Age    President        Age
Washington       67     Lincoln          56     Hoover           90
J. Adams         90     A. Johnson       66     F. Roosevelt     63
Jefferson        83     Grant            63     Truman           88
Madison          85     Hayes            70     Eisenhower       78
Monroe           73     Garfield         49     Kennedy          46
J. Q. Adams      80     Arthur           56     L. Johnson       64
Jackson          78     Cleveland        71     Nixon            81
Van Buren        79     B. Harrison      67     Ford             93
W. H. Harrison   68     Cleveland        71     Reagan           93
Tyler            71     McKinley         58
Polk             53     T. Roosevelt     60
Taylor           65     Taft             72
Fillmore         74     Wilson           67
Pierce           64     Harding          57
Buchanan         77     Coolidge         60
•Solution
     Ages at Inauguration   Stem   Ages at Death
                998777632    4     69
8777766655554444422111110    5     366778
               9854421110    6     003344567778
                             7     0011147889
                             8     01358
                             9     0033
Line Graph
Age groups   Number of Facebook users   Proportion (%) of Facebook users
13–25        65,082,280                 45%
26–44        53,300,200                 36%
45–64        27,885,100                 19%
Activity:
The following data show the distances (in miles) from the
homes of off-campus statistics students to the college.
Create a stemplot using the data and identify any outliers:
• 0.5; 0.7; 1.1; 1.2; 1.2; 1.3; 1.3; 1.5; 1.5; 1.7; 1.7; 1.8; 1.9;
2.0; 2.2; 2.5; 2.6; 2.8; 2.8; 2.8; 3.5; 3.8; 4.4; 4.8; 4.9; 5.2;
5.5; 5.7; 5.8; 8.0
Histogram
• A histogram consists of contiguous (adjoining)
boxes. It has both a horizontal axis and a
vertical axis. The horizontal axis is labeled with
what the data represents (for instance, distance
from your home to school). The vertical axis is
labeled either frequency or relative
frequency (or percent frequency or
probability). The graph will have the same
shape with either label. The histogram (like the
stemplot) can give you the shape of the data,
the center, and the spread of the data.
• The relative frequency is equal to the frequency for an observed value
of the data divided by the total number of data values in the sample.
(Remember, frequency is defined as the number of times an answer
occurs.) If:
• f = frequency
• n = total number of data values (or the sum of the individual frequencies), and
• RF = relative frequency,
• then:
RF=f / n
For example, if three students in
Mr. Ahab's English class of 40
students received from 90% to
100%, then, f = 3, n = 40,
and RF = f/n = 3 /40 = 0.075.
7.5% of the students received
90–100%. 90–100% are
quantitative measures.
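As a quick check of the arithmetic, the formula RF = f / n can be written as a one-line function. The name `relative_frequency` is just an illustrative choice; the numbers are the ones from Mr. Ahab's class above.

```python
def relative_frequency(f, n):
    """Relative frequency: frequency f divided by the total number of data values n."""
    return f / n

# Three of the 40 students scored from 90% to 100%.
rf = relative_frequency(3, 40)
print(f"RF = {rf}, which is {rf:.1%} of the class")
```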
• To construct a histogram, first decide how
many bars or intervals, also called classes, will represent the data.
• Many histograms consist of five to 15 bars or classes for clarity.
• Choose a starting point for the first interval to be less than the
smallest data value.
• A convenient starting point is a lower value carried out to one
more decimal place than the value with the most decimal places.
• For example, if the value with the most decimal places is 6.1 and this
is the smallest value, a convenient starting point is 6.05 (6.1 – 0.05 =
6.05). We say that 6.05 has more precision. If the value with the most
decimal places is 2.23 and the lowest value is 1.5, a convenient starting
point is 1.495 (1.5 – 0.005 = 1.495). If the value with the most decimal
places is 3.234 and the lowest value is 1.0, a convenient starting point is
0.9995 (1.0 – 0.0005 = 0.9995).
If all the data happen to be integers and the smallest
value is two, then a convenient starting point is 1.5 (2 –
0.5 = 1.5). Also, when the starting point and other
boundaries are carried to one additional decimal place,
no data value will fall on a boundary. The next two
examples go into detail about how to construct a
histogram using continuous data and how to create a
histogram using discrete data.
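The rule for a convenient starting point can be sketched as follows. This is a minimal sketch, assuming the data are supplied as text so that the number of decimal places is preserved; the function name `convenient_start` is hypothetical, and Python's `decimal` module is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

def convenient_start(values):
    """Smallest value minus 0.5, 0.05, 0.005, ... (one more decimal place
    than the value with the most decimal places)."""
    numbers = [Decimal(v) for v in values]
    # Number of decimal places in the most precise value (0 for integers).
    places = max(max(-d.as_tuple().exponent, 0) for d in numbers)
    offset = Decimal(5) / Decimal(10) ** (places + 1)
    return min(numbers) - offset

print(convenient_start(["6.1", "7.4"]))     # most decimal places: one
print(convenient_start(["2.23", "1.5"]))    # most decimal places: two
print(convenient_start(["3.234", "1.0"]))   # most decimal places: three
print(convenient_start(["2", "5", "9"]))    # all integers
```

The second value in each list is made up so that the list has more than one entry; only the smallest value and the value with the most decimal places affect the result.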
•EXAMPLE
•The following data are the heights (in inches to the nearest half inch) of 100 male
semiprofessional soccer players. The heights are continuous data, since height is
measured.
60; 60.5; 61; 61; 61.5
63.5; 63.5; 63.5
64; 64; 64; 64; 64; 64; 64; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5
66; 66; 66; 66; 66; 66; 66; 66; 66; 66; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5;
66.5; 66.5; 66.5; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67.5; 67.5; 67.5; 67.5;
67.5; 67.5; 67.5
68; 68; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69.5; 69.5; 69.5; 69.5; 69.5
70; 70; 70; 70; 70; 70; 70.5; 70.5; 70.5; 71; 71; 71
72; 72; 72; 72.5; 72.5; 73; 73.5
74
•The smallest data value is 60. Since the data with the most decimal places has
one decimal (for instance, 61.5), we want our starting point to have two decimal
places. Since the numbers 0.5, 0.05, 0.005, etc. are convenient numbers, use
0.05 and subtract it from 60, the smallest value, for the convenient starting point.
•60 – 0.05 = 59.95 which is more precise than, say, 61.5 by one decimal place.
The starting point is, then, 59.95.
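•Next, calculate the width of each bar or class interval: subtract the starting
point from the ending value and divide by the number of bars. The largest value
is 74, so the ending value is 74 + 0.05 = 74.05. With eight bars:
(74.05 − 59.95) / 8 = 1.7625 ≈ 1.76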
•NOTE
•We will round up to two and make each bar or class
interval two units wide. Rounding up to two is one
way to prevent a value from falling on a boundary.
Rounding to the next number is often necessary even
if it goes against the standard rules of rounding. For
this example, using 1.76 as the width would also
work. A guideline that is followed by some for the
number of bars or class intervals is to take the square
root of the number of data values and then round to
the nearest whole number, if necessary. For example,
if there are 150 values of data, take the square root of
150 and round to 12 bars or intervals.
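The width calculation and the square-root guideline from the note can be checked with a short sketch. The function names `class_width` and `suggested_bars` are illustrative only, and rounding the width up with `math.ceil` is one way to express "round up to two", not the only option.

```python
import math
from decimal import Decimal

def class_width(start, end, bars):
    """Raw width of each class interval: (end - start) / bars."""
    return (Decimal(end) - Decimal(start)) / bars

def suggested_bars(n):
    """Guideline used by some: square root of the number of data values,
    rounded to the nearest whole number."""
    return round(math.sqrt(n))

raw = class_width("59.95", "74.05", 8)   # soccer-player heights, eight bars
width = math.ceil(raw)                   # round up to the next whole number
print(f"raw width {raw}, rounded up to {width}")
print(f"suggested bars for 150 values: {suggested_bars(150)}")
```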
The boundaries are:
•59.95
•59.95 + 2 = 61.95
•61.95 + 2 = 63.95
•63.95 + 2 = 65.95
•65.95 + 2 = 67.95
•67.95 + 2 = 69.95
•69.95 + 2 = 71.95
•71.95 + 2 = 73.95
•73.95 + 2 = 75.95
The heights 60 through 61.5 inches are in the interval 59.95–61.95. The
heights that are 63.5 are in the interval 61.95–63.95. The heights that are
64 through 64.5 are in the interval 63.95–65.95. The heights 66 through 67.5
are in the interval 65.95–67.95. The heights 68 through 69.5 are in the
interval 67.95–69.95. The heights 70 through 71 are in the interval
69.95–71.95. The heights 72 through 73.5 are in the interval 71.95–73.95.
The height 74 is in the interval 73.95–75.95.
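The boundaries and the number of heights in each interval can be tallied with a short sketch. The dictionary `height_counts` is a compact re-encoding of the 100 heights listed in the example (each height mapped to how many players have it). The variable names are illustrative, and `Decimal` is used so that no height lands on a boundary through floating-point error.

```python
from decimal import Decimal

# Heights from the example, stored as {height: number of players}.
height_counts = {
    "60": 1, "60.5": 1, "61": 2, "61.5": 1, "63.5": 3,
    "64": 7, "64.5": 8, "66": 10, "66.5": 11, "67": 12, "67.5": 7,
    "68": 2, "69": 10, "69.5": 5, "70": 6, "70.5": 3, "71": 3,
    "72": 3, "72.5": 2, "73": 1, "73.5": 1, "74": 1,
}

start, width, bars = Decimal("59.95"), Decimal(2), 8
boundaries = [start + width * i for i in range(bars + 1)]

# Count how many heights fall in each class interval.
frequencies = [0] * bars
for height, count in height_counts.items():
    index = int((Decimal(height) - start) // width)
    frequencies[index] += count

n = sum(frequencies)
for low, high, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{low}-{high}: frequency {f}, relative frequency {f / n:.2f}")
```

The relative frequencies printed here are the bar heights for a histogram with relative frequency on the vertical axis.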
•The following histogram
displays the heights on
the x-axis and relative
frequency on the y-axis.
•EXAMPLE 2.8
•Create a histogram for the following data: the number of books bought by 50 part-
time college students at ABC College.
•The number of books is discrete data, since books are counted.
1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1
2; 2; 2; 2; 2; 2; 2; 2; 2; 2
3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3
4; 4; 4; 4; 4; 4
5; 5; 5; 5; 5
6; 6
•Eleven students buy one book. Ten
students buy two books. Sixteen
students buy three books. Six
students buy four books. Five
students buy five books. Two
students buy six books.
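The counts above can be reproduced with a short tally. This sketch also applies the integer rule described earlier (smallest value minus 0.5) to get class boundaries; ending at the largest value plus 0.5 is an assumption made here so that each whole number sits in the middle of its own bar.

```python
from collections import Counter

# Number of books bought by each of the 50 students.
books = [1] * 11 + [2] * 10 + [3] * 16 + [4] * 6 + [5] * 5 + [6] * 2

frequency = Counter(books)
n = len(books)

# Boundaries one unit apart, starting 0.5 below the smallest value.
lowest, highest = min(books), max(books)
boundaries = [lowest - 0.5 + i for i in range(highest - lowest + 2)]

for value in sorted(frequency):
    f = frequency[value]
    print(f"{value} book(s): frequency {f}, relative frequency {f / n:.2f}")
print("boundaries:", boundaries)
```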
The following data are the shoe sizes of 50 male students. The sizes
are discrete data since shoe size is measured in whole and half units
only. Construct a histogram and calculate the width of each bar or
class interval. Suppose you choose six bars.
9; 9; 9.5; 9.5; 10; 10; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 10.5;
10.5; 10.5; 10.5
11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5;
11.5; 11.5; 11.5; 11.5
12; 12; 12; 12; 12; 12; 12; 12.5; 12.5; 12.5; 12.5; 14