
Graphing - Distributions

This document discusses different methods for graphing data, including frequency tables, pie charts, bar charts, and comparing distributions. It provides examples and guidelines for when each method is most effective. Frequency tables organize raw data into categories and relative frequencies. Pie charts show relative frequencies but work best for small numbers of categories, while bar charts can display either frequencies or relative frequencies and are better for larger numbers of categories or comparing distributions. Other types of graphs like histograms and stem-and-leaf displays are also introduced. Guidelines emphasize using the graph type most suited to the number of categories and clearly displaying the information without unnecessary complexity.

Uploaded by

lemuel sardual
Copyright
© All Rights Reserved

Graphing Distributions

Graphing data is the first and often most important step in data analysis. In this day of computers,
researchers all too often see only the results of complex computer analyses without ever taking a close
look at the data themselves. This is all the more unfortunate because computers can create many types of
graphs quickly and easily.

Learning Objectives
1. Create a frequency table
2. Determine when pie charts are valuable and when they are not
3. Create and interpret bar charts
4. Identify common graphical mistakes
Frequency Tables
All of the graphical methods shown in this section are derived from frequency tables. Table 1 shows a
frequency table for the results of the iMac study; it shows the frequencies of the various response
categories. It also shows the relative frequencies, which are the proportion of responses in each category.
For example, the relative frequency for “None” is 85/500 = 0.17.

Table 1. Frequency Table for the iMac Data.

Previous Ownership    Frequency    Relative Frequency
None                         85                  0.17
Windows                      60                  0.12
Macintosh                   355                  0.71
Total                       500                  1.00
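
The relative frequencies in Table 1 can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `relative_frequencies` is an illustrative choice, and the counts are the ones listed in Table 1.

```python
def relative_frequencies(counts):
    """Return each category's proportion of the total count."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Frequencies from Table 1 (iMac study).
imac_counts = {"None": 85, "Windows": 60, "Macintosh": 355}
imac_relative = relative_frequencies(imac_counts)

# Print category, frequency, and relative frequency in three columns.
for category, proportion in imac_relative.items():
    print(f"{category:<10} {imac_counts[category]:>4} {proportion:.2f}")
```

Dividing by the total guarantees that the relative frequencies sum to 1, which is a quick check on any frequency table.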
Pie Charts
The pie chart in Figure 1 shows the results of the iMac study. In a pie chart, each category is represented
by a slice of the pie. The area of the slice is proportional to the percentage of responses in the category.
This is simply the relative frequency multiplied by 100. Although most iMac purchasers were Macintosh
owners, Apple was encouraged by the 12% of purchasers who were former Windows users, and by the
17% of purchasers who were buying a computer for the first time.
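
As a rough illustration of how slice sizes follow from relative frequencies, the sketch below converts the iMac proportions into percentages and central angles. The angle calculation (proportion times 360 degrees) is an assumption about how a circular pie is usually drawn, not something stated in the study, and `pie_slices` is a hypothetical helper name.

```python
# Relative frequencies from the iMac frequency table.
imac_relative = {"None": 0.17, "Windows": 0.12, "Macintosh": 0.71}


def pie_slices(relative):
    """Map each category to (percentage, central angle in degrees)."""
    return {name: (share * 100, share * 360) for name, share in relative.items()}


slices = pie_slices(imac_relative)
for name, (percent, angle) in slices.items():
    print(f"{name:<10} {percent:5.1f}%  {angle:6.1f} degrees")
```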

Pie charts are effective for displaying the relative frequencies of a small number of categories. They are not
recommended, however, when you have a large number of categories. Pie charts can also be confusing
when they are used to compare the outcomes of two different surveys or experiments. In an influential book
on the use of graphs, Edward Tufte asserted “The only worse design than a pie chart is several of them.”

Bar Charts
Bar charts can also be used to represent frequencies of different categories. A bar chart of the iMac
purchases is shown in Figure 2. Frequencies are shown on the Y-axis and the type of computer previously
owned is shown on the X-axis. Typically, the Y-axis shows the number of observations in each category
rather than the percentage of observations in each category as is typical in pie charts.

Comparing Distributions
Often we need to compare the results of different surveys, or of different conditions within the same overall
survey. In this case, we are comparing the “distributions” of responses between the surveys or conditions.
Bar charts are often excellent for illustrating differences between two distributions. Figure 3 shows the
number of people playing card games at the Yahoo web site on a Sunday and on a Wednesday in the
spring of 2001. We see that there were more players overall on Wednesday compared to Sunday. The
number of people playing Pinochle was nonetheless the same on these two days. In contrast, there were
about twice as many people playing hearts on Wednesday as on Sunday. Facts like these emerge clearly
from a well-designed bar chart.

[Figure 3 image: horizontal bar chart with paired bars for Wednesday and Sunday; games shown: Poker,
Blackjack, Bridge, Gin, Cribbage, Hearts, Canasta, Pinochle, Euchre, Spades; X-axis ticks at 0, 2000,
4000, and 6000.]

Figure 3. A bar chart of the number of people playing different card games on Sunday and
Wednesday.

The bars in Figure 3 are oriented horizontally rather than vertically. The horizontal format is useful when
you have many categories because there is more room for the category labels. We’ll have more to say
about bar charts when we consider numerical quantities later in this chapter.

Some graphical mistakes to avoid


Don’t get fancy! People sometimes add features to graphs that don’t help to convey their information. For
example, 3-dimensional bar charts such as the one shown in Figure 4 are usually not as effective as their
two-dimensional counterparts.

[Figure 4 image: three-dimensional bar chart of the iMac data; categories None, Windows, and Macintosh;
Y-axis ticks from 0 to 400 in steps of 50.]

Finally, we note that it is a serious mistake to use a line graph when the X-axis contains merely qualitative
variables. A line graph is essentially a bar graph with the tops of the bars represented by points joined by
lines (the rest of the bar is suppressed). Figure 7 inappropriately shows a line graph of the card game data
from Yahoo. The drawback to Figure 7 is that it gives the false impression that the games are naturally
ordered in a numerical way when, in fact, they are ordered alphabetically.

Figure 7. A line graph used inappropriately to depict the number of people playing different card games on
Sunday and Wednesday.

Summary
Pie charts and bar charts can both be effective methods of portraying qualitative data. Bar charts are better
when there are more than just a few categories and for comparing two or more distributions. Be careful to
avoid creating misleading graphs.

Graphing Quantitative Variables

Quantitative variables are distinguished from categorical (sometimes called qualitative) variables such as
favorite color, religion, city of birth, and favorite sport, in which there is no ordering or measuring involved.
There are many types of graphs that can be used to portray distributions of quantitative variables.
The upcoming sections cover the following types of graphs: (1) stem and leaf displays, (2) histograms, (3)
frequency polygons, (4) box plots, (5) bar charts, (6) line graphs, (7) dot plots, and (8) scatter plots
(discussed in a different chapter). Some graph types such as stem and leaf displays are best suited for
small to moderate amounts of data, whereas others such as histograms are best suited for large amounts of
data. Graph types such as box plots are good at depicting differences between distributions. Scatter plots
are used to show the relationship between two variables.

Stem and Leaf Displays


Learning Objectives
1. Create and interpret basic stem and leaf displays
2. Create and interpret back-to-back stem and leaf displays
3. Judge whether a stem and leaf display is appropriate for a given data set
A stem and leaf display is a graphical method of displaying data. It is particularly useful when your data are
not too numerous. In this section, we will explain how to construct and interpret this kind of graph.
A stem and leaf display of the data is shown in Figure 1. The left portion of Figure 1 contains the stems.
They are the numbers 3, 2, 1, and 0, arranged as a column to the left of the bars. Think of these numbers
as 10’s digits. A stem of 3, for example, can be used to represent the 10’s digit in any of the numbers from
30 to 39. The numbers to the right of the bar are leaves, and they represent the 1’s digits. Every leaf in the
graph therefore stands for the result of adding the leaf to 10 times its stem.
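
The stem and leaf rule (value = 10 times the stem plus the leaf) can be sketched in Python as follows. The data are hypothetical and are not the data set plotted in Figure 1; the function name `stem_and_leaf` and the `|` separator are illustrative choices.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display lines for non-negative integers, highest stem first."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # value == 10 * stem + leaf
        leaves[stem].append(leaf)
    top, bottom = max(leaves), min(leaves)
    # Include stems that have no leaves so gaps in the data stay visible.
    return [
        f"{stem}|{''.join(str(leaf) for leaf in leaves.get(stem, []))}"
        for stem in range(top, bottom - 1, -1)
    ]


# Hypothetical data, not the data set shown in Figure 1.
sample = [4, 6, 12, 14, 14, 18, 21, 23, 29, 33, 37]
for line in stem_and_leaf(sample):
    print(line)
```

Each printed row holds one stem, so the length of a row shows how many observations fall in that range of ten.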

Histograms
Learning Objectives
1. Create a grouped frequency distribution
2. Create a histogram based on a grouped frequency distribution
3. Determine an appropriate bin width
A histogram is a graphical method for displaying the shape of a distribution. It is particularly useful when
there are a large number of observations. We begin with an example consisting of the scores of 642
students on a psychology test. The test consists of 197 items each graded as “correct” or “incorrect.” The
students' scores ranged from 46 to 167.
The first step is to create a frequency table. Unfortunately, a simple frequency table would be too
big, containing over 100 rows. To simplify the table, we group scores together as shown in Table 1.
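
Grouping scores into class intervals can be sketched as below. The scores are hypothetical (the 642 test scores are not reproduced here), and the interval width of 10 with a lower limit of 39.5 is an assumption chosen to match intervals such as 39.5 to 49.5 that the chapter uses for these test scores.

```python
import math


def grouped_frequencies(scores, lower=39.5, width=10):
    """Return (start, end, count) for each consecutive class interval."""
    indexes = [math.floor((score - lower) / width) for score in scores]
    table = []
    # Walk every interval from the lowest to the highest, keeping empty ones.
    for i in range(min(indexes), max(indexes) + 1):
        start = lower + i * width
        table.append((start, start + width, indexes.count(i)))
    return table


# Hypothetical scores, not the actual psychology test data.
scores = [46, 48, 52, 57, 61, 66, 66, 73, 88, 91, 167]
table = grouped_frequencies(scores)
for low, high, freq in table:
    print(f"{low:5.1f} to {high:5.1f}: {freq}")
```

Using limits that end in .5 means no whole-number score can fall exactly on a boundary between two intervals.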

The histogram makes it plain that most of the scores are in the middle of the distribution, with fewer scores
in the extremes. You can also see that the distribution is not symmetric: the scores extend to the right
farther than they do to the left. The distribution is therefore said to be skewed. (We'll have more to say
about shapes of distributions in Chapter 3.)
Histograms can be based on relative frequencies instead of actual frequencies. Histograms based
on relative frequencies show the proportion of scores in each interval rather than the number of scores. In
this case, the Y-axis runs from 0 to 1 (or somewhere in between if there are no extreme proportions). You
can change a histogram based on frequencies to one based on relative frequencies by (a) dividing each
class frequency by the total number of observations, and then (b) plotting the quotients on the Y-axis
(labeled as proportion).

Frequency Polygons
Learning Objectives
1. Create and interpret frequency polygons
2. Create and interpret cumulative frequency polygons
3. Create and interpret overlaid frequency polygons
Frequency polygons are a graphical device for understanding the shapes of distributions. They serve the
same purpose as histograms, but are especially helpful for comparing sets of data. Frequency polygons
are also a good choice for displaying cumulative frequency distributions.
The first label on the X-axis is 35. This represents an interval extending from 29.5 to 39.5. Since the lowest
test score is 46, this interval has a frequency of 0. The point labeled 45 represents the interval from 39.5 to
49.5. There are three scores in this interval. There are 147 scores in the interval that surrounds 85.
You can easily discern the shape of the distribution from Figure 1. Most of the scores are between
65 and 115. It is clear that the distribution is not symmetric inasmuch as good scores (to the right) trail off
more gradually than poor scores (to the left). In the terminology of Chapter 3 (where we will study shapes of
distributions more systematically), the distribution is skewed.

[Figure 1 image: Y-axis “Frequency”; X-axis “Test Score” labeled at interval midpoints starting at 35 in
steps of 10.]
Figure 1. Frequency polygon for the psychology test scores.

A cumulative frequency polygon for the same test scores is shown in Figure 2. The graph is the same as
before except that the Y value for each point is the number of students in the corresponding class interval
plus all numbers in lower intervals. For example, there are no scores in the interval labeled “35,” three in
the interval “45,” and 10 in the interval “55.” Therefore, the Y value corresponding to “55” is 13. Since 642
students took the test, the cumulative frequency for the last interval is 642.
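
The running total described above can be sketched with `itertools.accumulate`. Only the first three class frequencies named in the text are used; the variable names are illustrative.

```python
from itertools import accumulate

# Interval labels and class frequencies for the first three intervals in the text.
labels = [35, 45, 55]
frequencies = [0, 3, 10]

# Each cumulative value is the class frequency plus all lower-interval frequencies.
cumulative = list(accumulate(frequencies))
for label, total in zip(labels, cumulative):
    print(f"interval {label}: cumulative frequency {total}")
```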

[Figure 2 image: Y-axis “Cumulative Frequency”; X-axis “Test Score” labeled at interval midpoints starting
at 35 in steps of 10.]
Figure 2. Cumulative frequency polygon for the psychology test scores.

Frequency polygons are useful for comparing distributions. This is achieved by overlaying the frequency
polygons drawn for different data sets. Figure 3 provides an example. The data come from a task in which
the goal is to move a computer cursor to a target on the screen as fast as possible. On 20 of the trials, the
target was a small rectangle; on the other 20, the target was a large rectangle. Time to reach the target was
recorded on each trial. The two distributions (one for each target) are plotted together in Figure 3. The
figure shows that, although there is some overlap in times, it generally took longer to move the cursor to the
small target than to the large one.

Box Plots
Learning Objectives
1. Define basic terms including hinges, H-spread, step, adjacent value, outside value, and far out value
2. Create a box plot
3. Create parallel box plots
4. Determine whether a box plot is appropriate for a given data set
There are several steps in constructing a box plot. The first relies on the 25th, 50th, and 75th
percentiles in the distribution of scores. Figure 1 shows how these three statistics are used. For each
gender we draw a box extending from the 25th percentile to the 75th percentile. The 50th percentile is
drawn inside the box. Therefore, the bottom of each box is the 25th percentile, the top is the 75th
percentile, and the line in the middle is the 50th percentile.
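
A minimal sketch of finding the three percentiles that define the box is shown below. The scores are hypothetical, and the sketch relies on the `inclusive` method of Python's `statistics.quantiles`; percentile conventions differ between textbooks and programs, so the values may not match the definition used in this chapter for every data set.

```python
import statistics


def box_summary(scores):
    """Return the 25th, 50th, and 75th percentiles used to draw the box."""
    q25, q50, q75 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"bottom": q25, "middle": q50, "top": q75}


# Hypothetical scores for a single group.
scores = [14, 15, 16, 17, 18, 19, 20, 22, 23]
print(box_summary(scores))
```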


Variations on box plots


Statistical analysis programs may offer options on how box plots are created. For example, the box plots in
Figure 6 are constructed from our data but differ from the previous box plots in several ways.
1. Outliers are not marked.
2. The means are indicated by green lines rather than plus signs.
3. The mean of all scores is indicated by a gray line.
4. Individual scores are represented by dots. Since the scores have been rounded to the nearest second,
any given dot might represent more than one score.
5. The box for the women is wider than the box for the men because the widths of the boxes are
proportional to the number of subjects of each gender (31 women and 16 men).

Bar Charts
Learning Objectives
1. Create and interpret bar charts
2. Judge whether a bar chart or another graph such as a box plot would be more appropriate
In the section on qualitative variables, we saw how bar charts could be used to illustrate the frequencies of
different categories. For example, the bar chart shown in Figure 1 shows how many purchasers of iMac
computers were previous Macintosh users, previous Windows users, and new computer purchasers.

[Figure 1 image: vertical bar chart; X-axis “Previous Computer” with categories None, Windows, and
Macintosh; Y-axis “Number of Buyers” from 0 to 400.]
Figure 1. iMac buyers as a function of previous computer ownership.

In this section we show how bar charts can be used to present other kinds of quantitative information, not
just frequency counts. The bar chart in Figure 2 shows the percent increases in the Dow Jones, Standard
and Poor 500 (S & P), and Nasdaq stock indexes from May 24th, 2000 to May 24th, 2001. Notice that both
the S & P and the Nasdaq had “negative increases,” which means that they decreased in value. In this bar
chart, the Y-axis is not frequency but rather the signed quantity percentage increase.
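
A signed percent increase can be computed as in the sketch below. The index levels are hypothetical placeholders, not the actual 2000 and 2001 values, and `percent_increase` is an illustrative name.

```python
def percent_increase(start_value, end_value):
    """Signed percent change from start_value to end_value."""
    return (end_value - start_value) / start_value * 100


# Hypothetical index levels, not real market data.
hypothetical = {"Index A": (1000.0, 1050.0), "Index B": (2000.0, 1500.0)}
for name, (start, end) in hypothetical.items():
    print(f"{name}: {percent_increase(start, end):+.1f}%")
```

A value that falls over the period yields a negative result, which is what the text calls a “negative increase.”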

Bar charts are particularly effective for showing change over time. Figure 3, for example, shows the percent
increase in the Consumer Price Index (CPI) over four three-month periods. The fluctuation in inflation is
apparent in the graph.

Bar charts are often used to compare the means of different experimental conditions. Figure 4 shows the
mean time it took one of us (DL) to move the cursor to either a small target or a large target. On average,
more time was required for small targets than for large ones.

Although bar charts can display means, we do not recommend them for this purpose. Box plots should be used instead since they provide more information than bar charts
without taking up more space. For example, a box plot of the cursor-movement data is shown in Figure 5.
You can see that Figure 5 reveals more about the distribution of movement times than does Figure 4.

Line Graphs
Learning Objectives
1. Create and interpret line graphs
2. Judge whether a line graph would be appropriate for a given data set
A line graph is a bar graph with the tops of the bars represented by points joined by lines (the rest of the bar
is suppressed). For example, Figure 1 was presented in the section on bar charts and shows changes in
the Consumer Price Index (CPI) over time.

A line graph of these same data is shown in Figure 2. Although the figures are similar, the line graph
emphasizes the change from period to period.

Line graphs are appropriate only when both the X- and Y-axes display ordered (rather than qualitative)
variables. Although bar graphs can also be used in this situation, line graphs are generally better at
comparing changes over time. Figure 3, for example, shows percent increases and decreases in five
components of the CPI. The figure makes it easy to see that medical costs had a steadier progression than
the other components. Although you could create an analogous bar chart, its interpretation would not be as
easy.

[Figure 3 image: Y-axis “CPI % Increase”; X-axis July 2000, October 2000, January 2001, April 2001; one
line each for Housing, Medical Care, Food and Beverage, Recreation, and Transportation.]

Figure 3. A line graph of the percent change in five components of the CPI over time.

Let us stress that it is misleading to use a line graph when the X-axis contains merely qualitative variables.
Figure 4 inappropriately shows a line graph of the card game data from Yahoo, discussed in the section on
qualitative variables. The defect in Figure 4 is that it gives the false impression that the games are naturally
ordered in a numerical way.

[Figure 4 image: Y-axis “Number of Players”; X-axis lists the games alphabetically: Blackjack, Bridge,
Canasta, Cribbage, Euchre, Gin, Hearts, Pinochle, Poker, Spades; one line each for Wednesday and Sunday.]


Figure 4. A line graph, inappropriately used, depicting the number of people playing different card games
on Wednesday and Sunday.

Dot Plots
by David M. Lane

Prerequisites
• Chapter 2: Bar Charts

Learning Objectives
1. Create and interpret dot plots
2. Judge whether a dot plot would be appropriate for a given data set

Dot plots can be used to display various types of information. Figure 1 uses a dot plot to display the number
of M & M's of each color found in a bag of M & M's. Each dot represents a single M & M. From the figure,
you can see that there were 3 blue M & M's, 19 brown M & M's, etc.

Figure 1. A dot plot showing the number of M & M's of various colors in a bag of M & M's.
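
A text-only version of this kind of dot plot, with one dot per observation, can be sketched as follows. The blue and brown counts come from the text; the green and red counts are hypothetical, and `dot_plot` is an illustrative name.

```python
def dot_plot(counts, dot="*"):
    """Return one text row per category, with one dot per observation."""
    width = max(len(name) for name in counts)
    return [f"{name:<{width}} {dot * n}" for name, n in counts.items()]


# Blue and brown counts are from the text; green and red are hypothetical.
mm_counts = {"Blue": 3, "Brown": 19, "Green": 6, "Red": 8}
rows = dot_plot(mm_counts)
for row in rows:
    print(row)
```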

The dot plot in Figure 2 shows the number of people playing various card games on the Yahoo
website on a Wednesday. Unlike Figure 1, the location rather than the number of dots represents the
frequency.

[Figure 2 image: one row per game (Spades, Euchre, Pinochle, Canasta, Hearts, Cribbage, Gin, Bridge,
Blackjack, Poker); X-axis from 1000 to 7000.]

Figure 2. A dot plot showing the number of people playing various card games on a Wednesday.

The dot plot in Figure 3 shows the number of people playing on a Sunday and on a Wednesday.
This graph makes it easy to compare the popularity of the games separately for the two days, but does not
make it easy to compare the popularity of a given game on the two days.

[Figure 3 image: two panels, Sunday and Wednesday, each with one row per game (Spades, Euchre, Pinochle,
Canasta, Hearts, Cribbage, Gin, Bridge, Blackjack, Poker); shared X-axis from 1000 to 7000.]

Figure 3. A dot plot showing the number of people playing various card games on a Sunday and on a
Wednesday.

[Figure 4 image: a single panel with one row per game (Spades, Euchre, Pinochle, Canasta, Hearts, Cribbage,
Gin, Bridge, Blackjack, Poker) and separate markers for Sunday and Wednesday; X-axis from 1000 to 7000.]

Figure 4. An alternate way of showing the number of people playing various card games on a Sunday and
on a Wednesday.
The dot plot in Figure 4 makes it easy to compare the days of the week for specific games while still
portraying differences among games.

Statistical Literacy
by Seyd Ercan and David Lane

Prerequisites
• Chapter 2: Graphing Distributions

Fox News aired the line graph below showing the number unemployed during four quarters between 2007
and 2010.

What do you think?


Does Fox News' line graph provide misleading information? Why or why not?

Think about this before continuing:

There are major flaws with the Fox News graph. First, the title of the graph is
misleading. Although the data show the number unemployed, Fox News’ graph is
titled "Job Loss by Quarter." Second, the intervals on the X-axis are misleading.
Although there are 6 months between September 2008 and March 2009 and 15
months between March 2009 and June 2010, the intervals are represented in the
graph by very similar lengths. This gives the false impression that unemployment
increased steadily.
The graph presented below is corrected so that distances on the
X-axis are proportional to the number of days between the
dates. This graph shows clearly that the rate of increase in the number
unemployed is greater between September 2008 and March 2009 than it is
between March 2009 and June 2010.
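
One way to space points so that X-axis distances are proportional to elapsed days is sketched below. The sketch assumes each label falls on the first day of its month, which the text does not specify, and `x_positions` is a hypothetical helper name.

```python
from datetime import date

# Assumption: each labeled month is placed on its first day.
points = [date(2008, 9, 1), date(2009, 3, 1), date(2010, 6, 1)]


def x_positions(dates):
    """Days elapsed since the first date, for use as X coordinates."""
    return [(d - dates[0]).days for d in dates]


positions = x_positions(points)
# Gap in days between each pair of consecutive dates.
gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
print(positions)
print(gaps)
```

Plotting against these positions makes the 15-month gap about two and a half times as wide as the 6-month gap instead of giving both similar lengths.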

References
Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd ed.) (p.
178). Cheshire, CT: Graphics Press.

Exercises

Prerequisites
• All material presented in the Graphing Distributions chapter

1. Name some ways to graph quantitative variables and some ways to graph qualitative variables.

2. Based on the frequency polygon displayed below, the most common test grade was around what score?
Explain.

3. An experiment compared the ability of three groups of participants to remember briefly presented chess
positions. The data are shown below. The numbers represent the number of pieces correctly
remembered from three chess positions. Create side-by-side box plots for these three groups. What can
you say about the differences between these groups from the box plots?

22.1    32.5    40.1
22.3    37.1    45.6
26.2    39.1    51.2
29.6    40.5    56.4
31.7    45.5    58.1
33.5    51.3    71.1
38.9    52.6    74.9
39.7    55.7    75.9
43.2    55.9    80.3
43.2    57.7    85.3

4. You have to decide between displaying your data with a histogram or with a stem and leaf display. What
factor(s) would affect your choice?

5. In a box plot, what percent of the scores are between the lower and upper hinges?

6. A student has decided to display the results of his project on the number of hours people in various
countries slept per night. He compared the sleeping patterns of people from the US, Brazil, France, Turkey,
China, Egypt, Canada, Norway, and Spain. He was planning on using a line graph to display this data. Is a
line graph appropriate? What might be a better choice for a graph?

7. For the data from the 1977 Stat. and Biom. 200 class for eye color, construct:
a. pie graph
b. horizontal bar graph
c. vertical bar graph
d. a frequency table with the relative frequency of each eye color

!) 88
' 87
! ;
!+ 8
(Question submitted by J. Warren, UNH)

8. A graph appears below showing the number of adults and children who prefer each type of soda. There
were 130 adults and kids surveyed. Discuss some ways in which the graph below could be improved.

9. Which of the box plots on the graph has a large positive skew? Which has a large negative skew?

Question from Case Studies

Angry Moods (AM) case study

10. (AM) Is there a difference in how much males and females use aggressive behavior to improve an
angry mood? For the “Anger-Out” scores:
a. Create parallel box plots.
b. Create back-to-back stem and leaf displays. (You may have trouble finding a computer to do this,
so you may have to do it by hand. Use a fixed-width font such as Courier.)

11. (AM) Create parallel box plots for the Anger-In scores by sports participation.

12. (AM) Plot a histogram of the distribution of the Control-Out scores.

13. (AM) Create a bar graph comparing the mean Control-In score for the athletes and the non-athletes.
What would be a better way to display this data?

14. (AM) Plot parallel box plots of the Anger Expression Index by sports participation. Does it look like
there are any outliers? Which group reported expressing more anger?

Flatulence (F) case study

15. (F) Plot a histogram of the variable “per day.”

16. (F) Create parallel box plots of “how long” as a function of gender. Why is the 25th percentile not
showing? What can you say about the results?

17. (F) Create a stem and leaf plot of the variable “how long.” What can you say about the shape of the
distribution?

Physicians’ Reactions (PR) case study

18. (PR) Create box plots comparing the time expected to be spent with the average-weight and
overweight patients.

19. (PR) Plot histograms of the time spent with the average-weight and overweight patients.

20. (PR) To which group does the patient with the highest expected time belong?

Smiles and Leniency (SL) case study

21. (SL) Create parallel box plots for the four conditions.

22. (SL) Create back to back stem and leaf displays for the false smile and neutral conditions. (It may be
hard to find a computer program to do this for you, so be prepared to do it by hand).

ADHD Treatment (AT) case study

23. (AT) Create a line graph of the data. Do certain dosages appear to be more effective than others?

24. (AT) Create a stem and leaf plot of the number of correct responses of the participants after taking the
placebo (d0 variable). What can you say about the shape of the distribution?

25. (AT) Create box plots for the four conditions. You may have to rearrange the data to get a computer
program to create the box plots.

SAT and College GPA (SG) case study

26. (SG) Create histograms and stem and leaf displays of both high-school grade point average and
university grade point average. In what way(s) do the distributions differ?

27. The April 10th issue of the Journal of the American Medical Association reports a study on the effects
of anti-depressants. The study involved 340 subjects who were being treated for major depression.
The subjects were randomly assigned to receive one of three treatments: St. John’s wort (an herb),
Zoloft (Pfizer’s cousin of Lilly’s Prozac) or placebo for an 8-week period. The following are the mean
scores (approximately) for the three groups of subjects over the eight-week experiment. The first
column is the baseline. Lower scores mean less depression. Create a graph to display these means.

22.5  19.1  17.9  17.1  16.2  15.1  12.1  12.3
23.0  20.2  18.2  18.0  16.5  16.1  14.2  13.0
22.4  19.2  16.6  15.5  14.2  13.1  11.8  10.5

28. For the graph below of heights of singers in a large chorus, what word starting with the letter “B” best
describes the distribution?

29. Pretend you are constructing a histogram for describing the distribution of salaries for individuals who
are 40 years or older, but are not yet retired. (a) What is on the Y-axis? Explain. (b) What is on the
X-axis? Explain. (c) What would be the probable shape of the salary distribution? Explain why.
