
MATH10064 Frosina Stojanovska-Pocuca, Mohawk College

MODULE 4
5.1 VISUAL DISPLAYS OF DATA

Introduction

The science of statistics deals with the collection, analysis, interpretation, and
presentation of data.
We see and use data in our everyday lives. Organizing and summarizing data is
called descriptive statistics.
When collecting data, we use either
• a population (includes all items of interest), or
• a sample (includes some, but ordinarily not all, of the items in the population).

From the sample data, we can calculate a statistic. A statistic is a number that
represents a property of the sample. For example, if we consider one math class to be
a sample of the population of all math classes, then the average number of points
earned by students in that one math class at the end of the term is an example of a
statistic. The statistic is an estimate of a population parameter. A parameter is a
number that is a property of the population. Since we considered all math classes to
be the population, the average number of points earned per student over all the
math classes is an example of a parameter.

One of the main concerns in the field of statistics is how accurately a statistic
estimates a parameter. The accuracy really depends on how well the sample
represents the population. The sample must contain the characteristics of the
population in order to be a representative sample. We are interested in both the
sample statistic and the population parameter in inferential statistics.

A variable, notated by capital letters such as X and Y, is a characteristic of interest
for each person or thing in a population.

Variables may be numerical or categorical.
• Numerical variables take on values with equal units, such as weight in
kilograms and time in hours.
• Categorical variables place the person or thing into a category.

If we let X equal the number of points earned by one math student at the end of a
term, then X is a numerical variable. If we let Y be a person's party affiliation, then
some examples of Y include Liberal, Conservative, and Independent. Y is a categorical
variable. We could do some math with values of X (calculate the average number of
points earned, for example), but it makes no sense to do math with values of Y
(there is no such thing as an average party affiliation).

These notes include material from Introductory Statistics by B. Illowsky, S. Dean.
Access for free at https://openstax.org/details/books/introductory-statistics
Licensed by OpenStax under Creative Commons Attribution License v4.0

Ex. 1
Determine what the key terms refer to in the following study. We want to know the
average (mean) amount of money first-year college students spend at ABC College on
school supplies that do not include books. We randomly survey 100 first-year students
at the college. Three of those students spent $150, $200, and $225, respectively.
>>
The population is all first-year students attending ABC College this term.
The sample is the 100 first-year students at ABC College who were randomly surveyed.
The parameter is the average (mean) amount of money spent (excluding books) by
first-year college students at ABC College this term.
The statistic is the average (mean) amount of money spent (excluding books) by
first-year college students in the sample.
The variable could be the amount of money spent (excluding books) by one first-year
student. Let X = the amount of money spent (excluding books) by one first-year
student attending ABC College.
The data are the dollar amounts spent by the first-year students. Examples of the
data are $150, $200, and $225.
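To make the statistic concrete, here is a minimal Python sketch of the arithmetic behind a sample mean. It uses only the three amounts quoted in the example; the real statistic would average all 100 surveyed students, whose other values are not given, so the printed number illustrates the calculation rather than being the study's statistic.

```python
from statistics import mean

# Only the three dollar amounts quoted in Ex. 1; the full sample has 100 students,
# whose remaining values are not given in the notes.
quoted_amounts = [150, 200, 225]

# A statistic is a number computed from sample data, here the sample mean.
sample_mean = mean(quoted_amounts)
print(f"Mean of the quoted amounts: ${sample_mean:.2f}")
```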

Data

Data may come from a population or from a sample. Small letters like x or y are generally
used to represent data values.

Most data can be put into the following categories:
• Qualitative
• Quantitative

Qualitative data are the result of categorizing or describing attributes of a population.
Hair color, blood type, ethnic group, the car a person drives, and the street a person
lives on are examples of qualitative data. Qualitative data are generally described by
words or letters. For instance, hair color might be black, dark brown, light brown,
blonde, gray, or red. Blood type might be AB+, O-, or B+. Researchers often prefer to
use quantitative data over qualitative data because quantitative data lend themselves
more easily to mathematical analysis. For example, it does not make sense to find an
average hair color or blood type.

Quantitative data are always numbers and are the result of counting or measuring
attributes of a population.

For example: amount of money, pulse rate, weight, number of people living in your
town, and number of students who take a statistics course.


Quantitative data may be either discrete or continuous.

All data that are the result of counting are called quantitative discrete data. These
data take on only certain numerical values. If you count the number of phone calls
you receive for each day of the week, you might get values such as zero, one, two, or
three.

All data that are the result of measuring are quantitative continuous data assuming
that we can measure accurately.
Measuring angles in radians might result in such numbers as $\frac{\pi}{6}$, $\frac{\pi}{3}$, $\frac{\pi}{2}$, $\frac{3\pi}{2}$, $\frac{3\pi}{4}$, and so on.

If you and your friends carry backpacks with books in them to school, the numbers
of books in the backpacks are discrete data and the weights of the backpacks are
continuous data.

Visual Displays of Data

Once you have collected data, what will you do with it?

Data can be described and presented in many different formats.

For example, suppose you are interested in buying a house in a particular area. You
may have no clue about the house prices, so you might ask your real estate agent to
give you a sample data set of prices. Looking at all the prices in the sample is often
overwhelming. A better way might be to look at the median price and the variation of
prices.

Your agent might also provide you with a graph of the data.

A statistical graph is a tool that helps you learn about the shape or distribution of a
sample or a population. A graph can be a more effective way of presenting data than
a mass of numbers because we can see where the data cluster and where there are only
a few data values.

Newspapers and the Internet use graphs to show trends and to enable readers to
compare facts and figures quickly. Statisticians often graph data first to get a picture
of the data. Then, more formal tools may be applied.

Some of the types of graphs that are used to summarize and organize data are the
dot plot, the bar graph, the histogram, the stem-and-leaf plot, the frequency polygon
(a type of broken line graph), the pie chart, and the box plot.


Frequency Table

Once you have a set of data, you will need to organize it so that you can analyze how
frequently each value occurs in the set. We can record the number of times each data
value occurs (using tally marks) and construct a frequency table. All of the graphical
methods that follow are based on frequency tables.

For example, the following data represent grades of 40 students.


ABCCCBAABFACABBAABAAACCCBBBBCBCBBCBCBBAC

>>
Grade Tally Frequency
A |||| |||| | 11
B |||| |||| |||| | 16
C |||| |||| || 12
F | 1
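Tallying by hand is the classic method. As a minimal sketch, the same frequency table can be produced in Python with the standard-library `collections.Counter`; the printed layout is an illustrative choice, not part of the original notes.

```python
from collections import Counter

# Grades of the 40 students, exactly as listed above.
grades = "ABCCCBAABFACABBAABAAACCCBBBBCBCBBCBCBBAC"

# Counter tallies how many times each grade occurs.
frequency = Counter(grades)

print("Grade  Frequency")
for grade in sorted(frequency):
    print(f"{grade:<5}  {frequency[grade]}")
print("Total ", sum(frequency.values()))
```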

When we collect quantitative data, we look at the raw data using frequency
distributions. The frequency distribution of a variable is the count of how many
participants reported each value of that variable. We then present the frequency
distribution in a table.

For example, the following data values come from a survey on the number of pets a
household has.
1, 0, 4, 3, 3, 2, 2, 2, 1, 0, 0, 1, 0, 1, 2, 2, 1, 2

>>
Number of Pets Tally Frequency
No pets |||| 4
One pet |||| 5
Two pets |||| | 6
Three pets || 2
Four pets | 1
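A minimal Python sketch of the same frequency distribution, again using `collections.Counter`. It lists every whole number from the smallest to the largest response, so a value nobody reported would still appear with frequency 0, and it adds a relative-frequency column (frequency divided by the 18 responses) as an optional extra.

```python
from collections import Counter

# Survey responses: number of pets per household (18 households).
pets = [1, 0, 4, 3, 3, 2, 2, 2, 1, 0, 0, 1, 0, 1, 2, 2, 1, 2]

counts = Counter(pets)

# Every value from smallest to largest; Counter returns 0 for unseen values.
table = [(value, counts[value]) for value in range(min(pets), max(pets) + 1)]

print("Number of pets  Frequency  Relative frequency")
for value, freq in table:
    print(f"{value:<14}  {freq:<9}  {freq / len(pets):.3f}")
```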

When there are many values, or when the data are continuous quantitative data (not
whole numbers), a frequency table of individual values is hard to create. In this case,
we report frequencies for equal-width intervals (classes) of the data.

Ex. 2
The following data represent the number of employees at 35 restaurants in a certain city.
Using these data, create a frequency polygon.

21, 22, 22, 21, 18, 25, 23, 21, 23, 24, 20, 21, 19, 20, 20, 22, 21, 18,
21, 20, 21, 20, 19, 21, 19, 23, 20, 21, 23, 20, 23, 21, 27, 21, 19
>>


Number of Employees   Frequency
18 – 20               6
20 – 22               18
22 – 24               8
24 – 26               2
26 – 28               1
Each class includes the values x with lower limit ≤ x < upper limit; for example, the
class 18 – 20 contains the restaurants with 18 or 19 employees.
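Grouping into classes can be sketched in Python as follows. `grouped_frequencies` is a hypothetical helper written for this illustration; it counts values in half-open classes lower ≤ x < upper, with the starting value 18 and width 2 taken from the table. Since the exercise asks for a frequency polygon, the sketch also prints each class midpoint, assuming the common convention of plotting each class frequency above its midpoint and joining the points with line segments.

```python
# Number of employees at the 35 restaurants in Ex. 2.
employees = [21, 22, 22, 21, 18, 25, 23, 21, 23, 24, 20, 21, 19, 20, 20, 22, 21, 18,
             21, 20, 21, 20, 19, 21, 19, 23, 20, 21, 23, 20, 23, 21, 27, 21, 19]


def grouped_frequencies(data, start, width, n_classes):
    """Return (lower, upper, frequency) for classes with lower <= x < upper."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        frequency = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, frequency))
    return table


classes = grouped_frequencies(employees, start=18, width=2, n_classes=5)

# A frequency polygon plots each frequency above its class midpoint.
for lower, upper, frequency in classes:
    midpoint = (lower + upper) / 2
    print(f"{lower} - {upper}: frequency {frequency}, midpoint {midpoint}")
print("Total:", sum(f for _, _, f in classes))
```

Changing `width` and `n_classes` regroups the same data without touching the counting logic.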
Bar Graphs and Pie Charts

Bar graphs consist of bars that are separated from each other. The bars can be
rectangles or they can be rectangular boxes (used in three-dimensional plots), and
they can be vertical or horizontal.

For example, the following table shows the number of Facebook users in the United States
by the end of 2011.

The total of over 146 million users is broken down into three age groups, showing the
number of users and the proportion (%) of users in each age group.

Age Group   Number of Facebook Users   Proportion (%) of Facebook Users
13 – 25     65 082 280                 45%
26 – 44     53 300 200                 36%
45 – 64     27 885 100                 19%

[Figure: the bar graph shows the age groups on the x-axis and the proportions on the y-axis; the pie chart shows the proportions of Facebook users as parts of a whole circle, split by age group.]

When we are interested in how different parts compare with each other, we may use
a bar graph. When we are more interested in how a specific part compares with the
whole, we may choose a pie chart to represent the data.
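The proportions in the table can be recomputed from the counts with a short Python sketch. It also converts each proportion into a pie-chart slice angle (proportion × 360°), which is how a pie chart divides the circle. Note that the first group works out to about 44.5%, which the table rounds to 45%.

```python
# Facebook users in the United States by age group (table above).
users = {
    "13 - 25": 65_082_280,
    "26 - 44": 53_300_200,
    "45 - 64": 27_885_100,
}

total = sum(users.values())

# Bar heights are the proportions; pie slices turn each proportion into degrees.
proportions = {group: n / total for group, n in users.items()}
angles = {group: 360 * p for group, p in proportions.items()}

print(f"Total users: {total:,}")
for group in users:
    print(f"{group}: {100 * proportions[group]:.1f}% -> {angles[group]:.1f} degrees")
```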


Contingency Table

A contingency table provides a way of portraying data that can facilitate calculating
probabilities. The table helps in determining conditional probabilities quite easily.
The table displays sample values in relation to two different variables that may be
dependent or contingent on one another.

The following table presents admission data for two different programs at ABC College:

          Physical Sciences   Social Sciences   Total
Female    450                 1202              1652
Male      1110                987               2097
Total     1560                2189              3749

The marginal distribution of students applying to ABC College is 1560/3749 in physical
sciences and 2189/3749 in social sciences.
The conditional distribution of female applicants is 450/1652 in physical sciences
and 1202/1652 in social sciences, while the conditional distribution of male
applicants is 1110/2097 in physical sciences and 987/2097 in social sciences.
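A minimal Python sketch of both calculations, assuming the counts are stored as a nested dictionary (an illustrative choice): a marginal proportion divides a column total by the grand total, and a conditional proportion divides a cell by its row total. The fractions are printed unreduced so they match the ones in the text.

```python
# Admission counts from the contingency table (rows: sex, columns: program).
counts = {
    "Female": {"Physical Sciences": 450, "Social Sciences": 1202},
    "Male": {"Physical Sciences": 1110, "Social Sciences": 987},
}
programs = ["Physical Sciences", "Social Sciences"]

row_totals = {sex: sum(row.values()) for sex, row in counts.items()}
col_totals = {p: sum(row[p] for row in counts.values()) for p in programs}
grand_total = sum(row_totals.values())

# Marginal distribution: column total / grand total.
print("Marginal distribution of program:")
for p in programs:
    print(f"  {p}: {col_totals[p]}/{grand_total} = {col_totals[p] / grand_total:.3f}")

# Conditional distribution: cell / row total.
for sex, row in counts.items():
    print(f"Conditional distribution of program for {sex} applicants:")
    for p in programs:
        print(f"  {p}: {row[p]}/{row_totals[sex]} = {row[p] / row_totals[sex]:.3f}")
```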

Histograms

One advantage of a histogram is that it can readily display large data sets. A rule of
thumb is to use a histogram when the data set consists of 100 values or more.

A histogram consists of contiguous (adjoining) boxes. It has both a horizontal axis
and a vertical axis. The horizontal axis is labeled with what the data represent (for
instance, distance from your home to school). The vertical axis is labeled either
frequency or relative frequency (or percent frequency). The graph will have the same
shape with either label.

For example, the following histograms represent the number of employees at the 35
restaurants in a certain city (the data from Ex. 2):

[Figure: (a) using 5 classes of width 2; (b) using 10 classes of width 1.]
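Because the figure is not reproduced here, the following Python sketch draws both versions as sideways text histograms, one `#` per restaurant. `histogram_bins` and `draw` are hypothetical helpers; classes are half-open (lower ≤ x < upper), as in Ex. 2. Comparing the two outputs shows how the class width changes the level of detail.

```python
from collections import Counter

# Number of employees at the 35 restaurants (same values as Ex. 2).
employees = [21, 22, 22, 21, 18, 25, 23, 21, 23, 24, 20, 21, 19, 20, 20, 22, 21, 18,
             21, 20, 21, 20, 19, 21, 19, 23, 20, 21, 23, 20, 23, 21, 27, 21, 19]


def histogram_bins(data, start, width, n_classes):
    """Frequencies for adjoining classes [start + i*width, start + (i+1)*width).

    Values outside the n_classes classes are ignored, so start, width and
    n_classes should be chosen to cover the whole data range.
    """
    counts = Counter((x - start) // width for x in data)
    return [(start + i * width, start + (i + 1) * width, counts[i]) for i in range(n_classes)]


def draw(bins):
    # A sideways text histogram: one '#' per restaurant in each class.
    for lower, upper, freq in bins:
        print(f"{lower:>2}-{upper:<2} | {'#' * freq} ({freq})")


wide = histogram_bins(employees, start=18, width=2, n_classes=5)
narrow = histogram_bins(employees, start=18, width=1, n_classes=10)

print("(a) 5 classes of width 2")
draw(wide)
print("(b) 10 classes of width 1")
draw(narrow)
```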


Stem-and-Leaf Plot (Stemplots)

One simple graph, the stem-and-leaf plot or stemplot, comes from the field of
exploratory data analysis. It is a good choice when the data sets are small. To create
the plot, divide each observation of data into a stem and a leaf. The leaf consists of a
final significant digit. For example, 23 has stem two and leaf three. The number 432
has stem 43 and leaf two. Likewise, the number 5,432 has stem 543 and leaf two.
The decimal 9.3 has stem nine and leaf three. Write the stems in a vertical line from
smallest to largest. Draw a vertical line to the right of the stems. Then write the
leaves in increasing order next to their corresponding stem.

For example, the scores on the first exam in Susan Dean's spring pre-calculus class
were as follows (smallest to largest):

33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88;
88; 90; 92; 94; 94; 94; 94; 96; 100
>>
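Stem-and-leaf of first exam scores   N = 31
  1     3   3
  4     4   299
  7     5   355
 14     6   1378899
 (4)    7   2348
 13     8   03888
  8     9   0244446
  1    10   0
Leaf Unit = 1

The first column gives the depth: the cumulative number of scores counted from the
nearer end of the data set. The row that contains the median shows its own count in
parentheses.

The stemplot shows that most scores fell in the 60s, 70s, 80s, and 90s. Eight out of
the 31 scores, or approximately 26% (8/31), were in the 90s or 100, a fairly high
number of As.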
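The construction described at the start of this section (leaf = last digit, stem = everything before it) can be sketched in Python for whole-number data with a leaf unit of 1. `stem_and_leaf` is a hypothetical helper; it does not compute the depth column that the Minitab-style plot shows. The last line re-checks the count of scores in the 90s or 100.

```python
from collections import defaultdict

# First exam scores from the example (31 values, smallest to largest).
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74, 78,
          80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]


def stem_and_leaf(data):
    """Map each stem to its leaves, assuming whole numbers and a leaf unit of 1.

    The leaf is the last digit and the stem is everything before it, so
    33 -> stem 3, leaf 3 and 100 -> stem 10, leaf 0.
    """
    leaves = defaultdict(str)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem] += str(leaf)
    # Keep empty stems between the smallest and largest so gaps stay visible.
    return {stem: leaves.get(stem, "") for stem in range(min(leaves), max(leaves) + 1)}


plot = stem_and_leaf(scores)
for stem, row in plot.items():
    print(f"{stem:>3} | {row}")

high = sum(1 for x in scores if x >= 90)
print(f"{high} of {len(scores)} scores ({high / len(scores):.0%}) are in the 90s or 100")
```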

Another way of constructing the stem-and-leaf plot for these data is to split each stem
into two parts (one holding leaf values 0 to 4 and the other holding 5 to 9).

Stem-and-leaf of first exam scores   N = 31
  1     3   3
  1     3
  2     4   2
  4     4   99
  5     5   3
  7     5   55
  9     6   13
 14     6   78899
 (3)    7   234
 14     7   8
 13     8   03
 11     8   888
  8     9   024444
  2     9   6
  1    10   0
Leaf Unit = 1
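Splitting stems only changes which row a leaf goes to. In this Python sketch (a hypothetical helper, assuming whole-number data and a leaf unit of 1), row key 2·stem holds leaves 0 to 4 and 2·stem + 1 holds leaves 5 to 9; empty rows between the first and last are kept, like the empty second row for stem 3 in the plot.

```python
from collections import defaultdict

# First exam scores from the example (31 values).
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74, 78,
          80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]


def split_stem_and_leaf(data):
    """Rows of (stem, leaves) with each stem split into a low line (leaves 0-4)
    and a high line (leaves 5-9); assumes whole numbers, leaf unit 1."""
    rows = defaultdict(str)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        # Row key 2*stem is the low line, 2*stem + 1 the high line.
        rows[2 * stem + (leaf >= 5)] += str(leaf)
    return [(key // 2, rows.get(key, "")) for key in range(min(rows), max(rows) + 1)]


for stem, row in split_stem_and_leaf(scores):
    print(f"{stem:>3} | {row}")
```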


Side-by-Side Stem-and-Leaf Plot

A side-by-side stem-and-leaf plot allows a comparison of the two data sets in two
columns. In a side-by-side stem-and-leaf plot, two sets of leaves share the same
stem. The leaves are to the left and to the right of the stems.

For example, the following tables show the ages of US presidents at their
inauguration and at their death. Construct a side-by-side stem-and-leaf plot using
these data.
President   Age at Inauguration   Death Age   President   Age at Inauguration   Death Age
Washington 57 67 Cleveland 55 71
J. Adams 61 90 McKinley 54 58
Jefferson 57 83 T. Roosevelt 42 60
Madison 57 85 Taft 51 72
Monroe 58 73 Wilson 56 67
J.Q. Adams 57 80 Harding 55 57
Jackson 61 78 Coolidge 51 60
Van Buren 54 79 Hoover 54 90
W.H. Harrison 68 68 F. Roosevelt 51 63
Tyler 51 71 Truman 60 88
Polk 49 53 Eisenhower 62 78
Taylor 64 65 Kennedy 43 46
Fillmore 50 74 L. Johnson 55 64
Pierce 48 64 Nixon 56 81
Buchanan 65 77 Ford 61 93
Lincoln 52 56 Carter 52 95
A. Johnson 56 66 Reagan 69 93
Grant 46 63 G.H.W. Bush 64 94
Hayes 54 70 Clinton 47
Garfield 49 49 G.W. Bush 54
Arthur 51 56 Obama 47
Cleveland 47 71 D. Trump 70
B. Harrison 55 67
>>
Left leaves: Age at Inauguration (N = 45); right leaves: Death Age (N = 41)
           32 | 4 |
      9987776 | 4 | 69
4444422111110 | 5 | 3
 877776665555 | 5 | 6678
      4421110 | 6 | 003344
          985 | 6 | 567778
            0 | 7 | 0111234
              | 7 | 7889
              | 8 | 013
              | 8 | 58
              | 9 | 00334
              | 9 | 5
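A Python sketch of the side-by-side construction, using split stems as in the plot. The ages are typed in from the presidents table (Cleveland appears twice, as in the table; presidents with no death age are omitted from `death`). `split_rows` and `side_by_side` are hypothetical helpers; left leaves are written largest first so that both sets of leaves grow outward from the shared stems.

```python
from collections import defaultdict

# Ages from the presidents table: left half of the table, then right half.
inauguration = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46,
                54, 49, 51, 47, 55,
                55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64,
                47, 54, 47, 70]
death = [67, 90, 83, 85, 73, 80, 78, 79, 68, 71, 53, 65, 74, 64, 77, 56, 66, 63, 70,
         49, 56, 71, 67,
         71, 58, 60, 72, 67, 57, 60, 90, 63, 88, 78, 46, 64, 81, 93, 95, 93, 94]


def split_rows(data):
    """Leaves grouped by split-stem row key: 2*stem (0-4) or 2*stem + 1 (5-9)."""
    rows = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        rows[2 * stem + (leaf >= 5)].append(leaf)
    return rows


def side_by_side(left_data, right_data):
    """Rows of (left leaves, stem, right leaves) sharing one set of split stems."""
    left, right = split_rows(left_data), split_rows(right_data)
    keys = set(left) | set(right)
    table = []
    for key in range(min(keys), max(keys) + 1):
        # Left leaves grow outward from the stem, so they are written largest first.
        left_leaves = "".join(str(d) for d in sorted(left.get(key, []), reverse=True))
        right_leaves = "".join(str(d) for d in right.get(key, []))
        table.append((left_leaves, key // 2, right_leaves))
    return table


rows = side_by_side(inauguration, death)
print(f"{'Inauguration':>13} | Stem | Death")
for left_leaves, stem, right_leaves in rows:
    print(f"{left_leaves:>13} | {stem:^4} | {right_leaves}")
```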
