DescribingDataGraphically Lesson
This lesson includes an overview of the subject, instructor notes, and example exercises using
Minitab.
Statistics is the discipline concerned with the optimal acquisition (where garbage in equals
garbage out) and analysis of data in order to model a population or process.
We can begin to analyze a data set by describing it both numerically and graphically. This lesson
considers important graphical summaries of data, including dotplots, histograms, and stem-
and-leaf plots. In this lesson, we are only considering quantitative (numeric) data, not qualitative
(categorical) data. For the data sets of interest, we will select only one variable of interest; that is,
we will be working with univariate data, not bivariate or multivariate data. Note that boxplots are
discussed in a separate lesson.
Prerequisites
This lesson requires knowledge of basic graphing techniques. In Minitab, graphs will be
constructed on single and multiple columns of data.
Learning Targets
Time Required
It will take the instructor 20 minutes in class to introduce the graphical summaries. We
recommend starting the activity sheet in class so that students can ask the instructor questions
while working on it. The exercises on the activity sheet will take an additional 40 minutes, and
they can be used as homework or quiz problems.
Materials Required
Assessment
The activity sheet contains exercises for students to assess their understanding of the learning
targets for this lesson.
Possible Extensions
This lesson provides good introductory examples for students new to statistics. The instructor
may want to do the Describing Data Numerically lesson first so that students know how
quantitative measures of center and spread for a data set are calculated.
References
Dotplots
One of the most basic graphs for a single data set is the dotplot.
Definition: Dotplots or dot diagrams represent each observation by a dot on a single numerical
axis.
The dotplot:
Minitab instructions for creating dotplots will be discussed later in this lesson.
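Outside Minitab, the definition above can be illustrated with a few lines of Python. This is only a minimal sketch: the function name text_dotplot and the ten cracked-egg counts are hypothetical and are not taken from the lesson's worksheet. The axis runs down the page, and each observation is drawn as one dot next to its value.

```python
from collections import Counter

def text_dotplot(values):
    """Return a text dotplot: one row per axis value, one dot per observation."""
    counts = Counter(values)
    lines = []
    # Walk the whole axis so that values with no observations still get a row.
    for x in range(min(values), max(values) + 1):
        lines.append(f"{x:>3} | " + "." * counts[x])
    return "\n".join(lines)

# Hypothetical cracked-egg counts for 10 cartons (not the worksheet data).
cracked = [0, 0, 0, 0, 1, 1, 1, 2, 2, 5]
print(text_dotplot(cracked))
```

Because every value on the axis gets a row, gaps in the data (here, no cartons with 3 or 4 cracked eggs) stay visible, which is one of the things a dotplot is meant to show.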
Example 1
You take a trip to your local grocery store to pick up a carton of a dozen eggs. You open a
carton, and there is one cracked egg inside. This gets you thinking – how many of the cartons on
the grocery store shelf contain cracked eggs? Further, how many of the eggs in each carton are
cracked? You select 30 cartons of a dozen eggs and record the number of cracked eggs in each
carton.
Below is the raw data, sorted from smallest to largest, for the sample of size n = 30 cartons.
Below the data is its dotplot. This data is in the column “Broken Eggs per Carton” in the Minitab
worksheet DescribingDataGraphically_Lesson.mtw.
As seen in the dotplot, the data is skewed, where skewness is the extent to which the data is
not symmetrical. We say the data is positively skewed or right skewed because the “tail” of the
graph pulls to the right.
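The direction of skewness can also be checked numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment), which is one common definition; the skewness statistic reported by Minitab may use a different adjustment. The function name and both data sets are hypothetical.

```python
def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical samples (not the worksheet data).
right_skewed = [0, 0, 0, 0, 1, 1, 1, 2, 2, 5]
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(moment_skewness(right_skewed) > 0)          # positive: tail pulls right
print(abs(moment_skewness(symmetric)) < 1e-12)    # essentially zero
```

A positive coefficient corresponds to a right (positive) skew, a negative coefficient to a left skew, and a value near zero to a roughly symmetric data set.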
Histograms
One of the most popular graphs for a single set of data is the histogram.
Definition: A histogram is a graphical way to display the frequency of data points within a
particular data set.
The histogram:
• Is typically used for larger data sets, such as n > 50. It is not a good graph if you only
  have a few data points.
• It’s best to keep the bins the same width; otherwise, the histogram can be hard to read.
• Be careful not to use too many or too few bins. In Minitab, you are able to adjust the
  binning structure, which is useful in displaying the features of a data set that you want
  your reader to notice.
• ALWAYS CAREFULLY LABEL ALL PARTS OF A GRAPH – including axes, titles, and units!
  Assume your reader knows nothing about your data and is gathering information about
  it from your graph.
Below is a histogram of the number of “Broken Eggs per Carton.” The bins contain their left
endpoints. That is, if x is a data value, then the first bin from 0 to 1 contains x’s such that 0 ≤ x <
1. In this histogram, all of the 0’s are in the first bin, all of the 1’s are in the second bin, etc.
[Figure: Histogram of Broken Eggs per Carton. Vertical axis: Frequency; horizontal axis: Broken Eggs per Carton.]
Minitab instructions for creating histograms will be discussed later in this lesson.
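The left-endpoint rule described above (a value x goes in the bin with a ≤ x < b) can be sketched in Python as follows. The function name bin_counts, its parameters, and the data are hypothetical; the sketch only illustrates the counting rule and is not how Minitab is implemented.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in left-closed bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for x in values:
        k = int((x - start) // width)   # bin whose left endpoint is at or below x
        if 0 <= k < n_bins:             # ignore values outside the binned range
            counts[k] += 1
    return counts

# Hypothetical cracked-egg counts for 10 cartons (not the worksheet data).
cracked = [0, 0, 0, 0, 1, 1, 1, 2, 2, 5]

print(bin_counts(cracked, start=0, width=1, n_bins=6))   # [4, 3, 2, 0, 0, 1]
print(bin_counts(cracked, start=0, width=2, n_bins=3))   # [7, 2, 1]
```

With a bin width of 1, all of the 0's land in the first bin and all of the 1's in the second, as in the histogram described above. Widening the bins to 2 merges neighboring counts and hides some of the detail, which is why the choice of binning matters.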
Example 2
Below are the ages at which the U.S. presidents began their first (non-consecutive) terms,
listed in order from George Washington to Barack Obama.
[Figure: Histogram of Presidents Ages. Vertical axis: Frequency; horizontal axis: Presidents Ages.]
Bins contain their left endpoints.
As seen in the histogram, the data is symmetric about its center. Because the data is symmetric,
the mean and median ages will be close to each other.
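The link between shape and the two measures of center can be checked with Python's standard statistics module. The two small data sets below are hypothetical and chosen only to show the contrast; they are not the "Presidents Ages" data.

```python
from statistics import mean, median

# Hypothetical samples (not the worksheet data).
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [0, 0, 0, 0, 1, 1, 1, 2, 2, 5]

# Symmetric data: the mean and median coincide.
print(mean(symmetric), median(symmetric))

# Right-skewed data: the long right tail pulls the mean above the median.
print(mean(right_skewed), median(right_skewed))
```

For real data the agreement is rarely exact, so "close to each other" is the most that should be expected from a roughly symmetric histogram.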
Stem-and-Leaf Plots
The last graph in this lesson is the stem-and-leaf plot. This graph is often new to students
taking statistics for the first time.
Definition: A stem-and-leaf plot is similar to a histogram, but is turned on its side. Instead of
displaying bins, a stem-and-leaf plot displays digits from the actual data values to denote the
frequency of each value.
1. Select one or more digits for the stem values. The trailing digits become the leaves.
Example 3
Using the “Presidents Ages” data set from Example 2, construct a stem-and-leaf plot.
(a) Choose stem units as tens and the leaf units as ones. Increment the stem rows by twos.
(b) Choose stem units as tens and the leaf units as ones. Increment the stem rows by tens.
Both stem-and-leaf plots show that the data is symmetric about its center.
However, the stem-and-leaf plot in part (a) displays the shape and spread of the data better
than the plot in part (b).
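The effect of the increment can be sketched in Python. The functions stem_and_leaf and render are hypothetical helpers, and the sketch assumes two-digit whole-number data with an increment that divides 10 evenly (2, 5, or 10), so that no row straddles two stems. The nine ages used are the youngest nine ages quoted in the plot notes later in this lesson, not the full data set.

```python
def stem_and_leaf(values, increment):
    """Return rows of (stem, leaves) with tens as stems and ones as leaves.

    Assumes non-negative whole numbers and an increment that divides 10.
    """
    values = sorted(values)
    first = (values[0] // increment) * increment   # lower bound of the first row
    rows = []
    for low in range(first, values[-1] + 1, increment):
        # Each row holds the values v with low <= v < low + increment.
        leaves = [v % 10 for v in values if low <= v < low + increment]
        rows.append((low // 10, leaves))
    return rows

def render(rows):
    """Format rows as 'stem | leaves', one row per line."""
    return "\n".join(
        f"{stem} | " + "".join(str(d) for d in leaves) for stem, leaves in rows
    )

# The nine youngest ages listed in this lesson's plot notes.
ages = [42, 43, 46, 46, 47, 47, 48, 49, 49]

print(render(stem_and_leaf(ages, increment=2)))
print()
print(render(stem_and_leaf(ages, increment=10)))
```

With an increment of 2 the nine ages spread over four rows, one of them empty. With an increment of 10 they collapse into a single row, which shows why a small increment displays shape and spread better.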
Minitab Calculations
The graphs described in the first three examples can be constructed in Minitab. We’ll use the
“Presidents Ages” data in the Minitab worksheet DescribingDataGraphically_Lesson.mtw to
make a dotplot, histogram, and stem-and-leaf plot. The data are in column C2.
Minitab produces the histogram shown above on the left. By default, Minitab’s bins are defined
by their center values, or midpoints. It’s hard to read the histogram with midpoints because you
can’t easily tell where each bin starts and ends. The histogram shown above on the right defines
bins by their boundary values, or cutpoints.
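For equal-width bins, the two descriptions are related by half a bin width: each cutpoint lies halfway between neighboring midpoints. The sketch below makes that conversion explicit. The function name and the midpoint values are hypothetical and are not read from the lesson's graphs.

```python
def cutpoints_from_midpoints(midpoints):
    """Return the bin boundaries for equal-width bins given their midpoints."""
    width = midpoints[1] - midpoints[0]   # assumes at least two equal-width bins
    half = width / 2
    # One boundary below the first midpoint, then one above each midpoint.
    return [midpoints[0] - half] + [m + half for m in midpoints]

# Hypothetical midpoints for four bins that are each 6 years wide.
print(cutpoints_from_midpoints([45, 51, 57, 63]))   # [42.0, 48.0, 54.0, 60.0, 66.0]
```

Four midpoints produce five cutpoints, because every bin needs both a left and a right boundary.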
When using cutpoints instead of midpoints, Minitab constructs histograms such that the bins
include their left cutpoints. It’s important to state this in the graph to alleviate confusion as to
whether bins contain their right or left cutpoints. We can display this fact as a Footnote at the
bottom of the histogram graph.
Here are some notes for interpreting the stem-and-leaf plot for “Presidents Ages”:
• Each row of the stem-and-leaf plot displays the count, stem, and leaf.
  o The 1st line of the plot above has the count as 2, the stem as 4, and the leaves as 2
    and 3.
  o The 2nd line has the count as 2, the stem as 4, and no leaves.
  o The 3rd line has the count as 6, the stem as 4, and leaves 6, 6, 7, 7.
• The row that contains the median has parentheses around the count. The count for this
  row represents the number of data values (or leaves) in the row.
  o The 7th row contains the median, and the count is denoted by (9). This means there
    are 9 data values in this row.
• The count for a row before the median represents the total count for that row and the
  rows before it.
  o The 1st line’s count is 2, meaning there are two data values less than or equal to 43.
  o The 2nd line’s count is 2, meaning there are two data values less than or equal to 45.
  o The 3rd line’s count is 6, meaning there are six data values less than or equal to 47.
• The count for a row after the median represents the total count for that row and the
  rows after it.
  o The 8th line’s count is 18, meaning there are eighteen data values greater than or
    equal to 56.
  o The 9th line’s count is 11, meaning there are eleven data values greater than or equal
    to 58.
  o The 10th line’s count is 10, meaning there are ten data values greater than or equal to
    60.
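The counting rules for the leftmost column can be sketched in Python. The function depth_column is a hypothetical helper that takes the number of leaves in each row. Its handling of the case where no single row contains the median (using the smaller of the two running totals) is an assumption consistent with the rules described in this lesson, not a statement of Minitab's internal algorithm. The row counts are also hypothetical, apart from the first three rows (2, 0, and 4 leaves), which match the plot notes.

```python
def depth_column(row_counts):
    """Return the count label for each stem-and-leaf row, top to bottom."""
    n = sum(row_counts)
    middle = (n + 1) / 2      # position of the median among the sorted values
    labels = []
    from_top = 0
    for count in row_counts:
        from_top += count                     # this row plus all rows before it
        from_bottom = n - from_top + count    # this row plus all rows after it
        if from_top >= middle and from_bottom >= middle:
            labels.append(f"({count})")       # the median lies in this row
        else:
            labels.append(str(min(from_top, from_bottom)))
    return labels

# Hypothetical leaf counts per row; the first three rows match the notes (2, 0, 4).
print(depth_column([2, 0, 4, 5, 3, 1]))   # ['2', '2', '6', '(5)', '4', '1']

# An even number of values whose two middle values fall in different rows:
print(depth_column([3, 2, 2, 3]))         # ['3', '5', '5', '3']
```

In the second call no label is parenthesized, because the median falls between two rows rather than inside one.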
Before ending this lesson, let’s look at another stem-and-leaf plot of “Presidents Ages” using an
increment of 5 years, instead of 2 years as shown above:
Here are some notes for interpreting this stem-and-leaf plot for “Presidents Ages”:
• The 1st stem is 4 with two leaves: 2 and 3. This means that one president was age 42 and
  one president was age 43 at the beginning of their first terms.
• The 2nd stem is 4 with seven leaves: 6, 6, 7, 7, 8, 9, 9. This means that seven presidents
  began their first terms at ages 46, 46, 47, 47, 48, 49, and 49.
• Twenty-two presidents began their first terms at age 54 or younger.
• The median of the presidents’ ages is 54.5. It is the average of 54, the largest age in
  row 3, and 55, the smallest age in row 4. Since the median does not fall within a single
  row, no row count is shown in parentheses.
• Since the increment value is 5, the difference in value between stems is 5 years. In other
  words, the stem-and-leaf bins have a width of 5 years. The first line (bin) represents
  presidents whose ages were from 40 through 44; the second line represents presidents
  whose ages were from 45 through 49, etc.
• You can choose to increment by another value, such as 10, if you want a different view of
  the stem-and-leaf plot. With an increment of 10, however, there are too few stems for the
  plot to be as informative as it is with increments of 2 or 5.