MC Math 13 Module 4
3rd-yr mathematics
Descriptive Statistics
2.1 Three Popular Data Displays
LEARNING OBJECTIVE
Suppose 30 students in a statistics class took a test and made the following scores:
86 80 25 77 73 100 69 93 70 83
90 83 70 73 76 90 71 95 100 87
40 58 68 69 73 90 92 74 78 97
How did the class do on the test? A quick glance at the set of 30 numbers does
not immediately give a clear answer. However, the data set may be reorganized
and rewritten to make relevant information more visible. One way to do so is to
construct a stem and leaf diagram as shown in Figure 2.1 "Stem and Leaf
Diagram". The numbers in the tens place, from 2 through 9, and additionally the
number 10, are the “stems,” and are arranged in numerical order from top to
bottom to the left of a vertical line. The number in the units place in each
measurement is a “leaf,” and is placed in a row to the right of the corresponding
stem, the number in the tens place of that measurement. Thus the three leaves 9,
8, and 9 in the row headed with the stem 6 correspond to the three exam scores in
the 60s, 69 (in the first row of data), 68 (in the third row), and 69 (also in the
third row). The display is made even more useful for some purposes by
rearranging the leaves in numerical order, as shown in Figure 2.2 "Ordered Stem
and Leaf Diagram". Either way, with the data reorganized, certain information of
interest becomes apparent immediately. There are two perfect scores; three
students made scores under 60; most students scored
in the 70s, 80s and 90s; and the overall average is probably in the high 70s or
low 80s.
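The construction described above can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function name `stem_and_leaf` and its `ordered` flag are hypothetical, and the scores are listed row by row so that the unordered leaves come out in the order they appear in the data.

```python
from collections import defaultdict

# The 30 exam scores from the text, listed row by row.
scores = [
    86, 80, 25, 77, 73, 100, 69, 93, 70, 83,
    90, 83, 70, 73, 76, 90, 71, 95, 100, 87,
    40, 58, 68, 69, 73, 90, 92, 74, 78, 97,
]

def stem_and_leaf(data, ordered=True):
    """Map each stem (the tens part) to its list of leaves (the units digit)."""
    leaves = defaultdict(list)
    for x in data:
        leaves[x // 10].append(x % 10)
    rows = {}
    # Include every stem from the smallest to the largest, even an empty one.
    for stem in range(min(leaves), max(leaves) + 1):
        row = leaves.get(stem, [])
        rows[stem] = sorted(row) if ordered else list(row)
    return rows

diagram = stem_and_leaf(scores)
for stem, row in diagram.items():
    print(f"{stem:>2} | {' '.join(map(str, row))}")
```

Calling `stem_and_leaf(scores, ordered=False)` keeps the leaves in data order instead, so the row for stem 6 reads 9, 8, 9 as described in the text.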
In this example the scores have a natural stem (the tens place) and leaf (the ones
place). One could spread the diagram out by splitting each tens place number into
lower and upper categories. For example, all the scores in the 80s may be
represented on two separate stems, lower 80s and upper 80s:
8 | 0 3 3
8 | 6 7
The definitions of stems and leaves are flexible in practice. The general purpose
of a stem and leaf diagram is to provide a quick display of how the data are
distributed across the range of their values; some improvisation could be
necessary to obtain a diagram that best meets that goal.
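As a rough sketch of one such improvisation, the snippet below splits each stem into a lower and an upper half. The cut used here (leaves 0 to 4 on the lower stem, 5 to 9 on the upper stem) is an assumption, since the text only says "lower" and "upper", and the function name `split_stem_and_leaf` is hypothetical.

```python
def split_stem_and_leaf(data):
    """Return rows keyed by (stem, half): half 0 holds leaves 0-4, half 1 holds leaves 5-9."""
    rows = {}
    for x in sorted(data):
        key = (x // 10, 0 if x % 10 < 5 else 1)
        rows.setdefault(key, []).append(x % 10)
    return rows

# The five scores in the 80s from the exam data.
eighties = [80, 83, 83, 86, 87]
for (stem, half), leaves in split_stem_and_leaf(eighties).items():
    print(stem, "|", " ".join(map(str, leaves)))
```

For the scores in the 80s this prints the two rows `8 | 0 3 3` and `8 | 6 7`.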
Note that all of the original data can be recovered from the stem and leaf
diagram. This will not be true in the next two types of graphical displays.
Frequency Histograms
The stem and leaf diagram is not practical for large data sets, so we need a
different, purely graphical way to represent data. A frequency histogram is such
a device.
We will illustrate it using the same data set from the previous subsection. For the
30 scores on the exam, it is natural to group the scores on the standard ten-point
scale, and count the number of scores in each group. Thus there are two 100s,
seven scores in the 90s, five in the 80s, and so on. We then construct the diagram
shown in Figure 2.3 "Frequency Histogram" by drawing for each group, or class,
a vertical bar whose length is the number of observations in that group. In our
example, the bar labeled 100 is 2 units long, the bar labeled 90 is 7 units long,
and so on. While the individual data values are lost, we know the number in each
class. This number is called the frequency of the class, hence the name
frequency histogram.
Frequency: Of a class of measurements, the number of measurements in the data set that are in the class.
Figure 2.3 Frequency Histogram
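The counting step behind the frequency histogram can be sketched as follows. This is an illustrative sketch only: each score is labeled by the lower limit of its ten-point class (so 100 forms its own class), and the text bars stand in for the vertical bars of the figure.

```python
from collections import Counter

# The 30 exam scores from the text.
scores = [
    86, 80, 25, 77, 73, 100, 69, 93, 70, 83,
    90, 83, 70, 73, 76, 90, 71, 95, 100, 87,
    40, 58, 68, 69, 73, 90, 92, 74, 78, 97,
]

# Label each score by the lower limit of its ten-point class.
frequency = Counter((x // 10) * 10 for x in scores)

# One row per class: a bar of '#' whose length is the class frequency.
for label in sorted(frequency):
    print(f"{label:>3}: {'#' * frequency[label]} ({frequency[label]})")
```

Classes with no observations (here the 30s) simply do not appear in `frequency`; a plotted histogram would show them as a gap.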
In our example of the exam scores in a statistics class, five students scored in the
80s. The number 5 is the frequency of the group labeled “80s.” Since there are 30
students in the entire statistics class, the proportion who scored in the 80s is
5/30. The number 5/30, which could also be expressed as 0.1666… ≈ 0.1667, or as
16.67%, is the relative frequency of the group labeled “80s.” Every group (the 70s, the
80s, and so on) has a relative frequency. We can thus construct a diagram by
drawing for each group, or class, a vertical bar whose length is the relative
frequency of that group. For example, the bar for the 80s will have length 5/30
unit, not 5 units. The diagram is a relative frequency histogram for the data,
and is shown in Figure 2.4 "Relative Frequency Histogram". It is exactly the
same as the frequency histogram except that the vertical axis in the relative
frequency histogram is not frequency but relative frequency.
Relative frequency: Of a class of measurements, the proportion of all measurements in the data set that are in the class.
Relative frequency histogram: A graphical device showing how data are distributed across the range of their values by collecting them into classes and indicating the proportion of measurements in each class.
The same procedure can be applied to any collection of numerical data. Classes
are selected, the relative frequency of each class is noted, the classes are arranged
and indicated in order on the horizontal axis, and for each class a vertical bar,
whose length is the relative frequency of the class, is drawn. The resulting
display is a relative frequency histogram for the data. A key point is that now if each vertical
bar has width 1 unit, then the total area of all the bars is 1 or 100%.
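The general procedure can be sketched in Python as below. The function name `relative_frequencies` and its `class_width` parameter are hypothetical, and the sketch assumes integer data grouped into equal-width classes. Exact fractions are used so that the total can be checked without rounding error.

```python
from collections import Counter
from fractions import Fraction

def relative_frequencies(data, class_width=10):
    """Relative frequency of each class, keyed by the class's lower limit."""
    n = len(data)
    counts = Counter((x // class_width) * class_width for x in data)
    return {label: Fraction(counts[label], n) for label in sorted(counts)}

# The 30 exam scores from the text.
scores = [
    86, 80, 25, 77, 73, 100, 69, 93, 70, 83,
    90, 83, 70, 73, 76, 90, 71, 95, 100, 87,
    40, 58, 68, 69, 73, 90, 92, 74, 78, 97,
]

rel = relative_frequencies(scores)
print(rel[80], float(rel[80]))   # relative frequency of the 80s
print(sum(rel.values()))         # total area when every bar has width 1
```

The relative frequency of the 80s comes out as 5/30, which `Fraction` reduces to 1/6, and the relative frequencies of all classes add up to exactly 1.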
Although the histograms in Figure 2.3 "Frequency Histogram" and Figure 2.4
"Relative Frequency Histogram" have the same appearance, the relative
frequency histogram is more important for us, and it will be relative frequency
histograms that will be used repeatedly to represent data in this text. To see why
this is so, reflect on what it is that you are actually seeing in the diagrams that
quickly and effectively communicates information to you about the data. It is the
relative sizes of the bars. The bar labeled “70s” in either figure takes up 1/3 of the
total area of all the bars, and although we may not think of this consciously, we
perceive the proportion 1/3 in the figures, indicating that a third of the grades
were in the 70s. The relative frequency histogram is important because the
labeling on the vertical axis reflects what is important visually: the relative sizes
of the bars.
When the size n of a sample is small, only a few classes can be used in
constructing a relative frequency histogram. Such a histogram might look
something like the one in panel (a) of Figure 2.5 "Sample Size and Relative
Frequency Histograms". If the sample size n were increased, then more classes
could be used in constructing a relative frequency histogram and the vertical bars
of the resulting histogram would be finer, as indicated in panel (b) of Figure 2.5
"Sample Size and Relative Frequency Histograms". For a very large sample the
relative frequency histogram would look very fine, like the one in panel (c) of Figure
2.5 "Sample Size and Relative Frequency Histograms". If the sample size were
to increase indefinitely then the corresponding relative frequency histogram
would be so fine that it would look like a smooth curve, such as the one in panel
(d) of Figure 2.5 "Sample Size and Relative Frequency Histograms".
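The effect of sample size can be explored with a small simulation. Everything specific here is an assumption made for illustration: the samples are drawn from a normal distribution with mean 75 and standard deviation 12, the number of classes is set to the square root of n (a common rule of thumb, not something the text prescribes), and the function name `fine_histogram` is hypothetical.

```python
import math
import random

def fine_histogram(sample, num_classes):
    """Relative frequencies for num_classes equal-width classes spanning the sample."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for x in sample:
        # The largest observation would land one past the end, so clamp it.
        i = min(int((x - lo) / width), num_classes - 1)
        counts[i] += 1
    return [c / len(sample) for c in counts]

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (30, 300, 30000):
    sample = [rng.gauss(75, 12) for _ in range(n)]  # hypothetical population
    k = round(math.sqrt(n))                         # assumed rule for the number of classes
    results[n] = fine_histogram(sample, k)
    print(n, k, round(max(results[n]), 4))
```

As n grows, the number of classes grows and the tallest bar shrinks, which is the sense in which the bars become finer and the outline approaches a smooth curve.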
Figure 2.5 Sample Size and Relative Frequency Histograms
KEY TAKEAWAYS
Graphical representations of large data sets provide a quick overview of the nature of
the data.
A population or a very large data set may be represented by a smooth curve. This
curve is a very fine relative frequency histogram in which the exceedingly narrow
vertical bars have been omitted.
When a curve derived from a relative frequency histogram is used to describe a data
set, the proportion of data with values between two numbers a and b is the area
under the curve between a and b, as illustrated in Figure 2.6 "A Very Fine Relative
Frequency Histogram".
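The last takeaway can be checked on the exam scores with a coarse histogram instead of a smooth curve. In this sketch (the helper `bar_area` is hypothetical, and each bar is taken to have width 1 unit), the proportion of scores from a = 70 up to but not including b = 90 is found twice: once by adding the areas of the bars between a and b, and once by counting directly.

```python
from fractions import Fraction

# The 30 exam scores from the text.
scores = [
    86, 80, 25, 77, 73, 100, 69, 93, 70, 83,
    90, 83, 70, 73, 76, 90, 71, 95, 100, 87,
    40, 58, 68, 69, 73, 90, 92, 74, 78, 97,
]
n = len(scores)

def bar_area(lower, width=10):
    """Area of the bar for the class [lower, lower + width), with bar width 1 unit."""
    in_class = sum(1 for x in scores if lower <= x < lower + width)
    return Fraction(in_class, n)

a, b = 70, 90
area = sum(bar_area(lower) for lower in range(a, b, 10))    # bars for the 70s and 80s
direct = Fraction(sum(1 for x in scores if a <= x < b), n)  # direct count
print(area, direct)
```

Both values equal 1/2: half of the class scored in the 70s or 80s, and that is exactly the share of the total bar area lying between 70 and 90.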
EXERCISES
BASIC
93 75 76 82 100
53 70 70 82 85
4. Construct a stem and leaf diagram, a frequency histogram, and a relative
frequency histogram for the following data set. For the histograms use
classes 6.0–6.9, 7.0–7.9, and so on.
8.5 8.2 7.0 7.0 4.9
6.5 8.2 7.6 1.5 9.3
APPLICATIONS
1.
17 23 46 31 17 19 26 31 42 5
21 32 36 37 37 38 41 40 19 12
7 48 29 39 42 38 41 32 36 35
11 20 57 20 34 39 42 63 39 12 72 13 43
78 95 34 25 13 20 39 38 27 70 51 45
2.
LOOKING AHEAD