Data Presentation and Visualization


New Era University - Virtual Learning Environment

College of Accountancy

Chapter Two: Data Presentation and Visualization

Luzale D. Henson, R.E.E., M.B.A., Ph.D.


Describing Data: Frequency Distributions and
Graphic Presentation

Learning Outcomes
When you have completed this chapter, you will be able to:
1. Describe different sources of data and how to gather them.
2. Organize data into a frequency distribution.
3. Describe a set of data in texts, tables, and graphs.
4. Read and interpret tables and graphs.
5. Portray a data set in a histogram, frequency polygon, and cumulative
frequency polygon.
6. Present data using such graphic techniques as line charts, bar charts, and
pie charts.
7. Present application software that visualizes data.

Presentation of Data
 Textual Presentation: Data presented in
paragraph or sentences
▪ Example: Math test scores of 15 students out of 50 items:
35, 35, 38, 40, 42, 42, 44, 45, 45, 47, 48, 49, 50, 50, 50
 Tabular Presentation: Data presented
using tables
▪ Table number, table title, column headers, row
classifiers
▪ Source notes
▪ Frequency
▪ Total Frequency
▪ Percentage Frequency

Presentation of Data
 What is your typical day?
 If you were to break up your schedule
into categories, what would it look like?
[Pie chart: Time Use per Day for an Average Part-Time Graduate Student.
Categories: Sleeping, Leisure/Sports, Working, Educational Activities,
Eating/Drinking, Grooming, Traveling, Others.
Slice values shown: 8, 6, 2.5, 2, 2, 1.5, 1.2, 0.8]

Presentation of Data
Parts of a table: table number, table title, column headers, row classifier,
source note.

Table 1
Population of Students in NEU High School
According to Year Level

Year Level    No. of Students (Frequency)    Percentage Frequency
1st year      350                            35.0
2nd year      250                            25.0
3rd year      250                            25.0
4th year      150                            15.0
              N = 1,000
Source: NEU Registrar

Frequency Distribution
 Frequency distribution: A grouping of data
into categories showing the number of
observations in each mutually exclusive
category.

EXAMPLE 1 Ungrouped data

 A professor wishes to determine the amount of
studying students do. He selects a random sample
of 30 students and determines the number of
hours each student studies per week: 15.0, 23.7,
19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6,
12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1,
15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6.
 Organize the data into a frequency distribution.
What is the size of the class interval?

Frequency Distribution
 Class mark (midpoint): A point that divides
a class into two equal parts. This is the
average between the upper and lower class
limits.
 Class interval: For classes of the same size,
the class interval is obtained by subtracting
the lower limit of a class from the lower
limit of the next class.
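
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the two classes (8-12 and 13-17) are sample values.

```python
def class_mark(lower, upper):
    """Midpoint of a class: the average of its lower and upper limits."""
    return (lower + upper) / 2


def class_interval(lower_limits):
    """For equal-sized classes: lower limit of the next class minus
    the lower limit of the current class."""
    return lower_limits[1] - lower_limits[0]


classes = [(8, 12), (13, 17)]  # sample classes (lower limit, upper limit)
marks = [class_mark(lo, hi) for lo, hi in classes]
width = class_interval([lo for lo, _ in classes])

print("Class marks:", marks)
print("Class interval:", width)
```

Note that the class interval is taken between consecutive lower limits (13 - 8), not between the limits of a single class (12 - 8).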

EXAMPLE 1 continued
Consider the classes 8-12 and 13-17. The class marks are 10 and 15,
respectively. The class interval is 5 (13 - 8).
[Frequency tables: the data grouped two ways, as 6 classes with a class
interval of 5 and as 9 classes with a class interval of 3]

Suggestions on Constructing a
Frequency Distribution
 Class intervals used in the frequency
distribution should be equal.
 A class mark is the midpoint of the class
interval. The class interval 12.1-17 has approximately 15
as its class mark ((12.1 + 17)/2 = 14.55).
 Suggested class interval:
i = (highest value - lowest value)/number of classes.
▪ Max - Min = 33.8 - 10.3 = 23.5
▪ Select the number of classes: 6
▪ Select the width: 5 (23.5/6 is about 3.9, so 5 is a convenient width)
▪ 6 times 5 = 30, which covers the range of 23.5
▪ Make adjustments to the class limits as needed
 The square root of n is also a reasonable guideline
for the number of classes for n < 125. An even
number of classes is often advantageous.
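
The arithmetic behind these suggestions can be checked with a short Python sketch. It uses the 30 study-hour values from Example 1. Rounding the square root of n up to the next whole number is an assumption of this sketch, and the width of 5 is entered by hand as the convenient value chosen in the slide.

```python
import math

hours = [15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
         17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
         10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6]

n = len(hours)
data_range = max(hours) - min(hours)       # highest value - lowest value
num_classes = math.ceil(math.sqrt(n))      # square-root guideline, rounded up (assumption)
raw_width = data_range / num_classes       # suggested class interval i
width = 5                                  # convenient width chosen by hand
covers_range = num_classes * width >= data_range

print("n =", n)
print("range =", round(data_range, 1))
print("number of classes =", num_classes)
print("suggested width =", round(raw_width, 2))
print("chosen width covers the range:", covers_range)
```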

Frequency Distribution in SPSS

Steps:
1. Go to Transform→ Recode into Different Variables.
2. Move the variable “Hours” to the Numeric Variable dialogue box.
3. Select Old and New Values.
4. Following the class intervals suggested earlier, the lowest class
ranges from 8-12 hrs. Hence, choose Range, LOWEST through
value and input the upper limit 12, meaning 12 hrs and below.
5. Go to New Value and select Value. Type 1 as the new value.
6. Press Add.
7. For the next class interval, 12.1-17, choose Range and input the range by
typing first the lower limit 12.1, then the upper limit 17 in the ‘through’
box.
8. Go to New Value and select Value. Type 2 as the new value.
Then press Add.
9. Follow the same procedure until the 5th class interval, 27.1-32 hrs.
10. For the last class interval, 32.1-37, choose Range, value through
HIGHEST and type the lower limit 32.1.
11. Go to New Value and select Value. Type 6 as the new value.
12. Press Add, then press Continue.
13. Pressing Continue in step 12 brings you back to the Recode into
Different Variables dialogue box. Type Hours_Recode under Name.
14. Press Change.
15. Then press OK.
Notice that in the Data View, a new variable is created with the variable
name Hours_Recode.
16. Go to Variable View.
17. Click the Values column for the new variable.
18. In the Value Labels dialogue box, type the values 1 to 6 (Value) and
the matching labels 8-12 hrs, 12.1-17 hrs, etc. (Label). Press Add
for each category. This adds a descriptive label to each class interval
when SPSS generates an output.
19. Press OK.
20. To generate a report, go to
Analyze→ Descriptive Statistics→ Frequencies.
21. Drag the variable Hours_Recode to the Variable(s) box and press OK.

An Output View will be activated and will display a frequency table.

Relative Frequency Distribution
[Frequency table output, with the following callouts:]
 3.33% of the students spent 8-12 hours studying per week.
 About 76.7% of the students spent 8-22 hours studying per week.
 Only 10.0% of the students spent 27.1-37 hours studying per week.
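
The percentages quoted above can be reproduced outside SPSS. The sketch below is a plain-Python illustration, not the SPSS procedure itself. It assumes the six classes 8-12, 12.1-17, 17.1-22, 22.1-27, 27.1-32, and 32.1-37, and places each value in the first class whose upper limit it does not exceed.

```python
hours = [15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
         17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
         10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6]

upper_limits = [12, 17, 22, 27, 32, 37]
labels = ["8-12", "12.1-17", "17.1-22", "22.1-27", "27.1-32", "32.1-37"]


def class_index(value):
    """Index of the first class whose upper limit is not exceeded."""
    for i, upper in enumerate(upper_limits):
        if value <= upper:
            return i
    raise ValueError(f"{value} lies above the last class")


# Class frequencies: count the observations falling in each class.
freq = [0] * len(upper_limits)
for h in hours:
    freq[class_index(h)] += 1

# Relative frequencies: class frequency divided by the number of observations.
n = len(hours)
rel = [100 * f / n for f in freq]

for label, f, r in zip(labels, freq, rel):
    print(f"{label:>8} hrs  {f:2d}  {r:6.2f}%")
```

Adding the first three relative frequencies gives the share of students who studied 8-22 hours, and adding the last two gives the share for 27.1-37 hours.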

Contingency Table
 Also referred to as a cross tabulation or cross tab, it is used to record
and analyze the relationship between two or more variables in
matrix form.
Table 2
Opinion of Buyers on the New Product

Choice                     Men    Women    Children    Row Totals
Like the Product           50     56       45          151
Indifferent                23     16       12          51
Do not like the Product    43     55       40          138
Column Totals              116    127      97          340

Contingency Table
Table 2
Percentages for the Opinion of Buyers on the New Product

Choice                     Men          Women        Children     Row Totals
Like the Product           50 (33%)     56 (37%)     45 (30%)     151 (44%)
                           (43%)        (44%)        (46%)
Indifferent                23 (45%)     16 (31%)     12 (24%)     51 (15%)
                           (20%)        (13%)        (12%)
Do not like the Product    43 (31%)     55 (40%)     40 (29%)     138 (41%)
                           (37%)        (43%)        (41%)
Column Totals              116 (34%)    127 (37%)    97 (29%)     340

Legend: The percentage beside each count is with respect to the row total.
The percentage on the line below is with respect to the column total.
Percentages beside the row and column totals are with respect to the grand total.

Reading cell [R1,C1]: Thirty-three per cent (33%) of those who like the
product are men. Forty-three per cent (43%) of men like the product.
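
The row and column percentages in a contingency table follow directly from the counts. The Python sketch below recomputes them from the counts in Table 2; the variable names are illustrative only.

```python
groups = ["Men", "Women", "Children"]
counts = {
    "Like the Product": [50, 56, 45],
    "Indifferent": [23, 16, 12],
    "Do not like the Product": [43, 55, 40],
}

# Marginal totals.
row_totals = {choice: sum(row) for choice, row in counts.items()}
col_totals = [sum(row[j] for row in counts.values()) for j in range(len(groups))]
grand_total = sum(row_totals.values())

# Percentage of each cell with respect to its row total.
row_pct = {c: [round(100 * v / row_totals[c]) for v in row]
           for c, row in counts.items()}
# Percentage of each cell with respect to its column total.
col_pct = {c: [round(100 * v / col_totals[j]) for j, v in enumerate(row)]
           for c, row in counts.items()}

for choice in counts:
    print(choice)
    print("  counts:", counts[choice], "row total:", row_totals[choice])
    print("  % of row total:   ", row_pct[choice])
    print("  % of column total:", col_pct[choice])
print("Column totals:", col_totals, "grand total:", grand_total)
```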

Pattern of Variability
 Stem-and-Leaf Display: A statistical
technique for displaying a set of data. Each
numerical value is divided into two parts:
the leading digits become the stem and the
trailing digits the leaf.
 Note: An advantage of the stem-and-leaf
display over a frequency distribution is we
do not lose the identity of each
observation.

EXAMPLE 2
 The following are scores of a student’s 19 math
quizzes: 76, 74, 82, 96, 66, 76, 78, 72, 52, 68, 86, 84, 62,
76, 78, 92, 82, 74, 88. Construct a stem-and-leaf
chart for the data.

Stem    Leaf
5       2
6       2 6 8
7       2 4 4 6 6 6 8 8
8       2 2 4 6 8
9       2 6

The same data with each stem split in two:

Class      Stem    Leaf
(50-54)    5       2
(55-59)    5
(60-64)    6       2
(65-69)    6       6 8
(70-74)    7       2 4 4
(75-79)    7       6 6 6 8 8
(80-84)    8       2 2 4
(85-89)    8       6 8
(90-94)    9       2
(95-99)    9       6

Weights of 50 College Students (lb)
Notice 2 overlapping distributions.

Stem    Leaf
09      8
10      1 8 8
11      0 2 5 5 6 8 8
12      0 0 0 8 9
13      2 5 7
14      2 3 5 8
15      0 4 4 5 7 8
16      1 2 2 5 7 8
17      0 0 6 6 7
18      3 4 6 8
19      0 1 5 5
20      5
21      5

Back-to-back stem-and-leaf display

Female             Stem    Male
8                  09
1 8 8              10
0 2 5 5 6 8 8      11
0 0 0 8 9          12
2 5 7              13
2                  14      3 5 8
                   15      0 4 4 5 7 8
                   16      1 2 2 5 7 8
                   17      0 0 6 6 7
                   18      3 4 6 8
                   19      0 1 5 5
                   20      5
                   21      5
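
A stem-and-leaf display for two-digit scores can be built by splitting each value into its tens digit (stem) and units digit (leaf). The following Python sketch does this for the 19 quiz scores of Example 2. The function name is hypothetical, and the approach assumes whole-number values with a single trailing digit.

```python
from collections import defaultdict

scores = [76, 74, 82, 96, 66, 76, 78, 72, 52, 68, 86, 84, 62,
          76, 78, 92, 82, 74, 88]


def stem_and_leaf(values):
    """Map each stem (leading digit) to its sorted list of leaves (trailing digit)."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(scores)
print("Stem | Leaf")
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem:>4} | {leaves}")
```

Because every leaf is kept, the original observations can be read back from the display, which is the advantage noted above.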

Pattern of Variability
 Dot-Plot Display: A statistical technique
for displaying a set of data. Each dot
represents a data value positioned along a
scale.
 Note: It has the same advantage as the
stem-and-leaf display: the identity of
each observation is not lost.

EXAMPLE 2
 The following are scores of a student’s 19 math
quizzes: 76, 74, 82, 96, 66, 76, 78, 72, 52, 68, 86, 84, 62,
76, 78, 92, 82, 74, 88. Construct a dot plot display
for the data.
[Dot plot: the 19 scores shown as stacked dots along a horizontal scale
from 50 to 100, with a vertical count scale from 1 to 3]

Graphic Presentation of Data

 Provides a strong visual impression of
distributions or trends.
 It adds interest to pages of text or tables.
 It has the following labels:
a. The figure number and title are written below the
graph.
b. Percentages should be indicated in a pie chart.
c. Axes should be labeled in bar graphs.

Graphic Presentation of a
Frequency Distribution
 The three commonly used graphic forms
are histograms, frequency polygons, and
cumulative frequency distributions (ogives).
 Histogram: A graph in which the classes
are marked on the horizontal axis and the
class frequencies on the vertical axis. The
class frequencies are represented by the
heights of the bars and the bars are
drawn adjacent to each other. It is used
for continuous variables.

Histogram for Hours Spent Studying

Figure 1. Hours Spent Studying
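
Where no charting software is at hand, the shape of a histogram can be previewed as text. The Python sketch below is a rough stand-in, not the SPSS chart: it draws horizontal bars instead of vertical ones, and it assumes the class frequencies obtained by counting the 30 study-hour values into the classes 8-12 through 32.1-37.

```python
labels = ["8-12", "12.1-17", "17.1-22", "22.1-27", "27.1-32", "32.1-37"]
freq = [1, 10, 12, 4, 2, 1]  # class frequencies counted from the study-hours data


def text_histogram(labels, freq, symbol="#"):
    """Return one text line per class; the bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * f} ({f})"
            for label, f in zip(labels, freq)]


lines = text_histogram(labels, freq)
print("Hours Spent Studying")
print("\n".join(lines))
```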

Graphic Presentation of a
Frequency Distribution
 A frequency polygon consists of line
segments connecting the points formed
by the class midpoint and the class
frequency.
 It is used for continuous variables

Frequency Polygon for Hours Spent Studying
Go to Graphs→ Legacy Dialogs→ Line→ Choose Simple→ Press Define.
Drag Hours_Recode to Category Axis and press OK.

Graphic Presentation of a
Frequency Distribution
 A cumulative frequency distribution
(ogive) is used to determine how many
or what proportion of the data values are
below or above a certain value.
◦ Less than cumulative frequency
◦ Greater than cumulative frequency
◦ Cumulative relative frequency =>percentage

Less Than Cumulative Frequency Distribution for Hours Studying
The class frequencies are accumulated from the lowest class interval
to the highest class interval.
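
As a small illustration of how the cumulative columns are formed, the Python sketch below accumulates class frequencies in both directions. It assumes the six class frequencies obtained by counting the 30 study-hour values into the classes with upper limits 12, 17, 22, 27, 32, and 37.

```python
from itertools import accumulate

upper_limits = [12, 17, 22, 27, 32, 37]
freq = [1, 10, 12, 4, 2, 1]  # class frequencies counted from the study-hours data

# Less-than cumulative frequency: running total starting with the first class.
less_than = list(accumulate(freq))
# Greater-than cumulative frequency: running total starting with the last class.
greater_than = list(accumulate(reversed(freq)))[::-1]
# Cumulative relative frequency, expressed as a percentage.
n = sum(freq)
cum_pct = [round(100 * c / n, 1) for c in less_than]

for upper, lt, gt, pct in zip(upper_limits, less_than, greater_than, cum_pct):
    print(f"up to {upper:>2} hrs: {lt:2d} students ({pct:5.1f}%), "
          f"this class or higher: {gt:2d}")
```

The last less-than value always equals the total number of observations, which is a quick check on the arithmetic.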
Bar Chart

 A bar chart can be used to depict any of
the levels of measurement (nominal,
ordinal, interval, or ratio). It is used for
discrete and qualitative variables.
 Bar graphs are used for data sets which are
grouped with few (10 or fewer) classes.
 EXAMPLE 3: Construct a bar chart for
the number of unemployed people per
100,000 population for selected cities in
2005.

EXAMPLE 3 continued

City                Number of unemployed per 100,000 population
Atlanta, GA         7300
Boston, MA          5400
Chicago, IL         6700
Los Angeles, CA     8900
New York, NY        8200
Washington, D.C.    8900

Bar Chart for the Unemployment Data
[Bar chart: number of unemployed per 100,000 population (vertical axis,
0 to 10,000) by city (horizontal axis). Bar labels: Atlanta 7300,
Boston 5400, Chicago 6700, Los Angeles 8900, New York 8200, Washington 8900]

Pie Chart
 A pie chart is especially useful in displaying a
relative frequency distribution. A circle is
divided proportionally to the relative
frequency and portions of the circle are
allocated for the different groups.
 EXAMPLE 4: A sample of 200 runners were
asked to indicate their favorite type of
running shoe.

EXAMPLE 4 continued

 Draw a pie chart based on the following information.

Type of shoe    # of runners
Nike            92
Adidas          49
Reebok          37
Asics           13
Other           9
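
Before drawing a pie chart, each count is converted to a relative frequency and then to a slice angle out of 360 degrees. The Python sketch below works this out for the running-shoe counts in the table above. It is a minimal illustration and does not draw the chart itself.

```python
runners = {"Nike": 92, "Adidas": 49, "Reebok": 37, "Asics": 13, "Other": 9}

total = sum(runners.values())
# For each shoe type: (percentage of the sample, slice angle in degrees).
slices = {shoe: (100 * count / total, 360 * count / total)
          for shoe, count in runners.items()}

print(f"Total runners: {total}")
for shoe, (pct, angle) in slices.items():
    print(f"{shoe:<7} {pct:5.1f}%  {angle:6.1f} degrees")
```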

Pie Chart for Running Shoes
[Pie chart with slice labels: Nike 45%, Adidas 24%, Reebok 19%, Puma 7%,
Others 5%]

Pareto Diagram
A bar graph with the bars arranged from the most numerous category
to the least numerous category. It includes a line graph displaying the
cumulative percentages and counts for the bars.
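[Pareto chart: bars show the count per category (left axis) and a line
shows the cumulative per cent (right axis)]

Count       4,678    1,560    1,426    1,134    690
Per cent    49%      16%      15%      12%      7%
Cum %       49%      66%      81%      93%      100%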
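
The rows of a Pareto table can be derived from the raw counts alone: sort the counts in descending order, convert each to a percentage of the total, and accumulate. The Python sketch below uses the five counts shown above. The categories are not named in the slide, so only the counts are used, and they are deliberately entered out of order to show the sorting step.

```python
from itertools import accumulate

counts = [1426, 4678, 690, 1560, 1134]  # entered out of order on purpose

ordered = sorted(counts, reverse=True)   # most numerous category first
total = sum(ordered)
pct = [round(100 * c / total) for c in ordered]
cum_pct = [round(100 * c / total) for c in accumulate(ordered)]

print("Count   ", ordered)
print("Per cent", pct)
print("Cum %   ", cum_pct)
```

The cumulative percentages are rounded from the running totals rather than by adding the already-rounded percentages, which avoids compounding rounding errors.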
Present your findings this way
[Two separate bar charts: "Students' Assessment of His Research Adviser
BEFORE RE Started" (n = 44, mean = 3.52) and "Students' Assessment of His
Research Adviser AFTER RE" (n = 44, mean = 3.68). Each chart shows the
percentage of students giving each of five ratings: not a good teacher and
mentor; below average as a teacher and mentor; about average as a teacher
and mentor; above average as a teacher and mentor; outstanding teacher and
mentor.]
Or this way
[One clustered bar chart comparing "Before the research started" and "After
the research" across the same five ratings, each worded "I feel that my
adviser is ...". Bar labels shown: 43%, 34%, 32%, 25%, 25%, 20%, 7%, 7%,
5%, 2%.]
Data Visualization
Analogy: Data visualization presents information the way the brain
processes it.

Why use it? Visual representations, such as charts or
graphs, make complex data easier to understand. A
visual summary of information makes it easier to identify
patterns and trends than looking through thousands of rows
on a spreadsheet. It is stylised, scalable, and conveys
significant meaning more effectively than other ways of
presenting data.

Note: Slides on Data Visualization will be shown during the
synchronous lecture.

Educ 201A – LDH - New Era University


Assessment:
1. In a frequency distribution the categories must
a. Be mutually exclusive. b. Have at least 5 observations.
c. Be of the same size. d. Be of nominal scale.
2. To determine the class interval
a. Divide the class frequencies in half.
b. Divide the class frequency by the number of observations.
c. Find the difference between consecutive lower class limits.
d. Count the number of observations in the class.
3. The class frequency is
a. The number of observations in each class.
b. The difference between consecutive lower class limits.
c. Always contains at least 5 observations.
d. Usually a multiple of the lower limit of the first class.
4. A research organization is making a study of the selling price of
home computers. There are 45 computers in the study. How many
classes would you recommend?
a. 10 b. 20 c. 6 d. 3
5. To find the class midpoint
a. Divide class interval in half & add the result to the lower limit.
b. Find the difference between consecutive lower limits.
c. Count the number of observations in the class.
d. Divide the class frequency by the number of observations.
6. Which of the following is not a guideline for a frequency
distribution?
a. Avoid open-ended classes.
b. Have more than 5 but less than 15 classes.
c. Make the lower limit of the first class a multiple of the class
interval.
d. Have more than 5 observations in each class.
7. To convert a frequency distribution to a relative frequency
distribution
a. Find the difference between consecutive lower class limits.
b. Divide the class frequency by the total number of observations.
c. Divide the lower limit of the first class by the class interval.
d. Multiply the class frequency by 100.

8. For a stem-and-leaf display


a. Arrange the leaf values from smallest to largest.
b. Make sure the stem value is only one digit.
c. Do not allow stems with no leaf values.
d. All of the above.

9. The difference between a histogram and a frequency polygon is


a. The frequency polygon is reported as a percent.
b. The histogram employs bars, whereas the midpoints are
connected for a frequency polygon.
c. Bars cannot be adjacent in a histogram.
d. Open-ended classes can be accommodated with a frequency
polygon.
10. In a less-than cumulative frequency polygon
a. The class frequencies are converted to a percent.
b. There must be at least 5 observations in each class.
c. We add the class frequencies starting with the first class.
d. All of the above.
11. On a scale of 1-4, with 4 being the highest, rate your degree of
confidence that you can explain well the topic under this module to
your friend/classmate.
4 – very confident
3 – confident
2 – not so confident
1 – not confident at all
Item for Research: Get to know your friends online!
Gather at least 7 different sets of data from your 20 friends (2 nominal,
3 ordinal, 1 interval, and 1 ratio level data).
1. Write a brief explanation of the data you gathered. Why did you gather such
data? What do you want to know from your friends?
2. Organize your data in a contingency table.
3. Construct a graph representing your data.
4. Write at least two relevant findings based on the results of the data gathered.
5. Give conclusions based on the findings.
6. Present your output in PowerPoint slides and send it to NEU-VLE.
