5.1 Visual Displays of Data
MODULE 4
Introduction
The science of statistics deals with the collection, analysis, interpretation, and
presentation of data.
We see and use data in our everyday lives. Organizing and summarizing data is
called descriptive statistics.
When collecting data, we use either
a population (which includes all items of interest), or
a sample (which includes some, but ordinarily not all, of the items in the population).
From the sample data, we can calculate a statistic. A statistic is a number that
represents a property of the sample. For example, if we consider one math class to be
a sample of the population of all math classes, then the average number of points
earned by students in that one math class at the end of the term is an example of a
statistic. The statistic is an estimate of a population parameter. A parameter is a
number that is a property of the population. Since we considered all math classes to
be the population, then the average number of points earned per student over all the
math classes is an example of a parameter.
One of the main concerns in the field of statistics is how accurately a statistic
estimates a parameter. The accuracy really depends on how well the sample
represents the population. The sample must contain the characteristics of the
population in order to be a representative sample. We are interested in both the
sample statistic and the population parameter in inferential statistics.
If we let X equal the number of points earned by one math student at the end of a
term, then X is a numerical variable. If we let Y be a person's party affiliation, then
some examples of Y include Liberal, Conservative, and Independent. Y is a categorical
variable. We could do some math with values of X (calculate the average number of
points earned, for example), but it makes no sense to do math with values of Y
(calculating an average party affiliation makes no sense).
These notes include material from Introductory Statistics by B. Illowsky, S. Dean ...
Access for free at https://openstax.org/details/books/introductory-statistics
Licensed by OpenStax under Creative Commons Attribution License v4.0
MATH10064
Ex. 1
Determine what the key terms refer to in the following study. We want to know the
average (mean) amount of money first year college students spend at ABC College on
school supplies that do not include books. We randomly survey 100 first year students
at the college. Three of those students spent $150, $200, and $225, respectively.
>>
The population is all first year students attending ABC College this term.
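The sample is the 100 first year students at ABC College who were randomly surveyed.
(A different choice, such as all students enrolled in one section of a beginning
statistics course at ABC College, could also serve as a sample, although it may not
represent the entire population.)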
The parameter is the average (mean) amount of money spent (excluding books) by
first year college students at ABC College this term.
The statistic is the average (mean) amount of money spent (excluding books) by first
year college students in the sample.
The variable could be the amount of money spent (excluding books) by one first year
student. Let X = the amount of money spent (excluding books) by one first year
student attending ABC College.
The data are the dollar amounts spent by the first year students. Examples of the
data are $150, $200, and $225.
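As a small illustration of how a statistic is computed from data, the sketch below averages only the three dollar amounts quoted in the example. The actual statistic would use all 100 surveyed amounts, which are not listed here, and the variable names are our own.

```python
from statistics import mean

# Only the three amounts quoted in the example; the other 97 survey
# responses are not given, so this illustrates the calculation only.
amounts = [150, 200, 225]

sample_mean = mean(amounts)
print(f"Mean of the three listed amounts: ${sample_mean:.2f}")
```

The same calculation applied to every first year student at ABC College would give the parameter rather than a statistic.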
Data
Data may come from a population or from a sample. Small letters like x or y generally
are used to represent data values.
Quantitative data are always numbers and are the result of counting or measuring
attributes of a population.
For example: amount of money, pulse rate, weight, number of people living in your
town, and number of students who take a statistics course.
All data that are the result of counting are called quantitative discrete data. These
data take on only certain numerical values. If you count the number of phone calls
you receive for each day of the week, you might get values such as zero, one, two, or
three.
All data that are the result of measuring are quantitative continuous data assuming
that we can measure accurately.
Measuring angles in radians might result in such numbers as $\pi/6$, $\pi/3$, $\pi/2$,
$3\pi/2$, $3\pi/4$, and so on.
If you and your friends carry backpacks with books in them to school, the numbers
of books in the backpacks are discrete data and the weights of the backpacks are
continuous data.
Once you have collected data, what will you do with it?
For example, suppose you are interested in buying a house in a particular area. You
may have no clue about the house prices, so you might ask your real estate agent to
give you a sample data set of prices. Looking at all the prices in the sample often is
overwhelming. A better way might be to look at the median price and the variation of
prices.
Your agent might also provide you with a graph of the data.
A statistical graph is a tool that helps you learn about the shape or distribution of a
sample or a population. A graph can be a more effective way of presenting data than
a mass of numbers because we can see where data clusters and where there are only
a few data values.
Newspapers and the Internet use graphs to show trends and to enable readers to
compare facts and figures quickly. Statisticians often graph data first to get a picture
of the data. Then, more formal tools may be applied.
Some of the types of graphs that are used to summarize and organize data are the
dot plot, the bar graph, the histogram, the stem-and-leaf plot, the frequency polygon
(a type of broken line graph), the pie chart, and the box plot.
Frequency Table
Once you have a set of data, you will need to organize it so that you can analyze how
frequently each value occurs in the set. We can record the number of times each data
value occurs (using tally marks) and construct a frequency table. All of the graphical
methods that follow are based on frequency tables.
>>
Grade   Tally                     Frequency
A       ||||| ||||| |             11
B       ||||| ||||| ||||| |       16
C       ||||| ||||| ||            12
F       |                         1
When we collect quantitative data we take a look at the raw data using frequency
distributions. The frequency distribution for any variable is the count of how many
participants have reported certain values of that variable. We then present the frequency
distribution in a table.
For example, the following are data values from a survey on the number of pets a
household has.
1, 0, 4, 3, 3, 2, 2, 2, 1, 0, 0, 1, 0, 1, 2, 2, 1, 2
>>
Number of Pets   Tally     Frequency
No pets          ||||      4
One pet          |||||     5
Two pets         ||||| |   6
Three pets       ||        2
Four pets        |         1
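The counting step behind this frequency table can be sketched in a few lines of Python. This is only one possible way to do it; it uses the survey responses listed above and the standard-library Counter class, and the variable names are our own.

```python
from collections import Counter

# Survey responses from the example above: number of pets per household.
pets = [1, 0, 4, 3, 3, 2, 2, 2, 1, 0, 0, 1, 0, 1, 2, 2, 1, 2]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(pets)

print("Pets  Tally   Frequency")
for value in sorted(frequency):
    count = frequency[value]
    # One tally stroke per observation.
    print(f"{value:<5} {'|' * count:<7} {count}")
```

The frequencies should add up to the number of responses (18 here), which is a quick check that no value was missed.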
When there is a large number of distinct values, or when the data are continuous (not
restricted to whole numbers), it is harder to create a frequency table with one row per
value. In this case, we group the values of the variable into intervals of equal width
and report the frequency for each interval.
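The grouping into equal intervals can be sketched as follows. This is a minimal illustration under stated assumptions: the backpack weights, the starting value 2.0, and the interval width 2.0 are all invented for the example and do not come from any data set in these notes.

```python
# Hypothetical backpack weights in kilograms, invented for illustration only.
weights = [2.4, 3.1, 3.8, 4.0, 4.6, 5.2, 5.5, 5.9, 6.3, 7.1, 7.4, 8.8]

# Assumed classes: [2, 4), [4, 6), [6, 8), [8, 10).
low, width, n_classes = 2.0, 2.0, 4

counts = [0] * n_classes
for w in weights:
    # Index of the interval [low + k*width, low + (k+1)*width) that contains w.
    k = int((w - low) // width)
    counts[k] += 1

for k, count in enumerate(counts):
    start = low + k * width
    print(f"{start:.1f} to under {start + width:.1f}: {count}")
```

Each interval includes its lower boundary but not its upper one, so a value such as 4.0 is counted exactly once.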
Ex. 2
The following data represent the number of employees at 35 restaurants in a certain city.
Using this data, create a frequency polygon.
21, 22, 22, 21, 18, 25, 23, 21, 23, 24, 20, 21, 19, 20, 20, 22, 21, 18,
21, 20, 21, 20, 19, 21, 19, 23, 20, 21, 23, 20, 23, 21, 27, 21, 19
>>
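One possible way to obtain the points of the frequency polygon is sketched below. It assumes that each distinct number of employees is treated as its own class (no grouping into intervals) and that the polygon is closed with a zero frequency on each side. Both are our assumptions, not necessarily the intended solution.

```python
from collections import Counter

employees = [
    21, 22, 22, 21, 18, 25, 23, 21, 23, 24, 20, 21, 19, 20, 20, 22, 21, 18,
    21, 20, 21, 20, 19, 21, 19, 23, 20, 21, 23, 20, 23, 21, 27, 21, 19,
]

frequency = Counter(employees)

# A frequency polygon joins the points (value, frequency) with line segments.
# Values with no observations (here 26) get frequency 0, and one extra
# zero point is added below the minimum and above the maximum.
values = range(min(employees) - 1, max(employees) + 2)
points = [(v, frequency[v]) for v in values]

for value, count in points:
    print(f"{value:>2}  {count:>2}  {'*' * count}")
```

Plotting these points with the value on the x-axis and the frequency on the y-axis, and joining consecutive points with straight lines, gives the polygon.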
Bar graphs consist of bars that are separated from each other. The bars can be
rectangles or they can be rectangular boxes (used in three-dimensional plots), and
they can be vertical or horizontal.
For example, by the end of 2011 Facebook had over 146 million users in the United
States. The following table presents these users by three age groups, giving the
number of users in each age group and the proportion (%) of users in each age group.
The bar graph below shows the age groups represented on the x-axis and proportions
on the y-axis.
The pie chart below shows proportions of Facebook users as part of a whole circle
split per age groups.
When we want to see how different parts compare with each other, we may use a bar
graph. When we are more interested in how a specific part compares with the whole,
we may choose a pie chart to represent the data.
Contingency Table
A contingency table provides a way of portraying data that can facilitate calculating
probabilities. The table helps in determining conditional probabilities quite easily.
The table displays sample values in relation to two different variables that may be
dependent or contingent on one another.
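To illustrate how a contingency table supports probability calculations, here is a minimal sketch. The counts and the category names (study mode and employment) are hypothetical and invented purely for illustration; they are not data from these notes.

```python
# Hypothetical counts, invented for illustration only.
# Rows: study mode; columns: whether the student has a job.
table = {
    "full-time": {"works": 30, "does not work": 20},
    "part-time": {"works": 40, "does not work": 10},
}

total = sum(sum(row.values()) for row in table.values())
works_total = sum(row["works"] for row in table.values())

# Marginal probability: P(works) uses the whole table.
p_works = works_total / total

# Conditional probability: P(works | full-time) uses only the full-time row.
full_time = table["full-time"]
p_works_given_full_time = full_time["works"] / sum(full_time.values())

print(f"P(works) = {p_works:.2f}")
print(f"P(works | full-time) = {p_works_given_full_time:.2f}")
```

The conditional probability restricts attention to one row (or column) of the table, which is why the table layout makes it easy to read off.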
The following table presents admission data for ABC College in two different programs:
Histograms
One advantage of a histogram is that it can readily display large data sets. A rule of
thumb is to use a histogram when the data set consists of 100 values or more.
One simple graph, the stem-and-leaf plot or stemplot, comes from the field of
exploratory data analysis. It is a good choice when the data sets are small. To create
the plot, divide each observation of data into a stem and a leaf. The leaf consists of a
final significant digit. For example, 23 has stem two and leaf three. The number 432
has stem 43 and leaf two. Likewise, the number 5,432 has stem 543 and leaf two.
The decimal 9.3 has stem nine and leaf three. Write the stems in a vertical line from
smallest to largest. Draw a vertical line to the right of the stems. Then write the
leaves in increasing order next to their corresponding stem.
For example, the scores on the first exam in Susan Dean's spring pre-calculus class
were as follows (smallest to largest):
33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88;
88; 90; 92; 94; 94; 94; 94; 96; 100
>>
Stem-and-leaf of first exam scores N = 31
 1    3 | 3
 4    4 | 299
 7    5 | 355
14    6 | 1378899
(4)   7 | 2348
13    8 | 03888
 8    9 | 0244446
 1   10 | 0
Leaf Unit = 1
The first column gives cumulative counts from each end of the data; the count in
parentheses marks the row that contains the median.
The stemplot shows that most scores fell in the 60s, 70s, 80s, and 90s. Eight out of
the 31 scores, or approximately 26% (8/31), were in the 90s or 100, a fairly high
number of As.
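The construction steps described above (split each value into a stem and a leaf, list the stems in order, then write the leaves in increasing order) can be sketched in Python as follows. This is one possible implementation for whole-number data with a leaf unit of 1; the function name stem_and_leaf is our own.

```python
from collections import defaultdict

# The 31 first exam scores listed above.
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

def stem_and_leaf(data):
    """Return {stem: sorted list of leaves} for whole numbers (leaf unit = 1)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)   # e.g. 100 -> stem 10, leaf 0
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
# Print every stem from smallest to largest, including any empty ones.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Stems with no observations are still printed, so gaps in the data remain visible in the plot.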
Another way of constructing the stem-and-leaf plot for this data is to split each
stem into two parts (one that holds leaf values 0 to 4, and another that holds leaf
values 5 to 9).
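A hedged sketch of the split-stem version for the same exam scores is shown below. The function name split_stem_plot and the choice to skip empty rows outside the range of the data are our own.

```python
# The 31 first exam scores listed above.
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

def split_stem_plot(data):
    """Map (stem, half) -> sorted leaves; half 0 holds 0-4, half 1 holds 5-9."""
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    return rows

rows = split_stem_plot(scores)
first, last = min(rows), max(rows)
for stem in range(first[0], last[0] + 1):
    for half in (0, 1):
        if (stem, half) < first or (stem, half) > last:
            continue  # skip empty rows before the first or after the last value
        leaves = "".join(map(str, rows.get((stem, half), [])))
        print(f"{stem:>2} | {leaves}")
```

Each stem now appears on two rows, which spreads the leaves out and can make the shape of the distribution easier to see.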
A side-by-side stem-and-leaf plot allows a comparison of the two data sets in two
columns. In a side-by-side stem-and-leaf plot, two sets of leaves share the same
stem. The leaves are to the left and to the right of the stems.
For example, the following table shows the ages of US presidents at their
inauguration and at their death. Construct a side-by-side stem-and-leaf plot using
this data.
President      Age at Inauguration  Death Age    President      Age at Inauguration  Death Age
Washington     57                   67           Cleveland      55                   71
J. Adams       61                   90           McKinley       54                   58
Jefferson      57                   83           T. Roosevelt   42                   60
Madison        57                   85           Taft           51                   72
Monroe         58                   73           Wilson         56                   67
J.Q. Adams     57                   80           Harding        55                   57
Jackson        61                   78           Coolidge       51                   60
Van Buren      54                   79           Hoover         54                   90
W.H. Harrison  68                   68           F. Roosevelt   51                   63
Tyler          51                   71           Truman         60                   88
Polk           49                   53           Eisenhower     62                   78
Taylor         64                   65           Kennedy        43                   46
Fillmore       50                   74           L. Johnson     55                   64
Pierce         48                   64           Nixon          56                   81
Buchanan       65                   77           Ford           61                   93
Lincoln        52                   56           Carter         52                   95
A. Johnson     56                   66           Reagan         69                   93
Grant          46                   63           G.H.W. Bush    64                   94
Hayes          54                   70           Clinton        47
Garfield       49                   49           G.W. Bush      54
Arthur         51                   56           Obama          47
Cleveland      47                   71           D. Trump       70
B. Harrison    55                   67
>>
Stem-and-leaf of Age at Inauguration N = 45 (left) and Death Age N = 41 (right)
Age at Inauguration | Stem | Death Age
                 32 |  4   |
            9987776 |  4   | 69
      4444422111110 |  5   | 3
       877776665555 |  5   | 6678
            4421110 |  6   | 003344
                985 |  6   | 567778
                  0 |  7   | 0111234
                    |  7   | 7889
                    |  8   | 013
                    |  8   | 58
                    |  9   | 00334
                    |  9   | 5
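The side-by-side plot can also be built programmatically. The sketch below is one possible approach, using the ages from the table above with each stem split into two rows (leaves 0 to 4 and 5 to 9). The function name split_leaves and the column width are our own choices.

```python
# Ages copied from the table above (left column first, then right column).
inauguration = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65,
                52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51,
                54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 47, 54, 47, 70]
death = [67, 90, 83, 85, 73, 80, 78, 79, 68, 71, 53, 65, 74, 64, 77, 56, 66,
         63, 70, 49, 56, 71, 67, 71, 58, 60, 72, 67, 57, 60, 90, 63, 88, 78,
         46, 64, 81, 93, 95, 93, 94]

def split_leaves(data):
    """Map (stem, half) -> sorted leaves; half 0 holds 0-4, half 1 holds 5-9."""
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows.setdefault((stem, leaf // 5), []).append(leaf)
    return rows

left, right = split_leaves(inauguration), split_leaves(death)
keys = sorted(set(left) | set(right))

lines = []
for stem in range(keys[0][0], keys[-1][0] + 1):
    for half in (0, 1):
        # Left leaves are reversed so they increase outward from the shared stem.
        left_leaves = "".join(map(str, reversed(left.get((stem, half), []))))
        right_leaves = "".join(map(str, right.get((stem, half), [])))
        lines.append(f"{left_leaves:>13} | {stem} | {right_leaves}")
print("\n".join(lines))
```

Because both sets of leaves share the same stems, the two distributions can be compared row by row: inauguration ages cluster in the 50s, while death ages are spread more widely.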