MATH 121 Chapter 2 Frequency Distribution Graphs
Chapter 2: Frequency Distribution and Graphs
Learning Objectives
After completing this chapter, the students will be able to:
o Define some basic terms used in the formulation of a frequency distribution.
o Organize data into a frequency distribution.
o Compute the midpoint of each class in a frequency distribution table.
o Compare and contrast raw data and frequency distribution.
o Construct a stem-and-leaf plot.
o Represent frequency distributions graphically using histograms, frequency polygons, and
cumulative frequency polygons (ogives).
o Draw the data using a Pareto chart, bar chart, pie chart, time series graph, pictograph, and
scatter plot.
o Give the importance of graphs in statistics.
o Construct the different graphs and charts.
Chapter Outline
2.1 Introduction
2.2 Defining Some Terms
2.3 Constructing Frequency Distribution
Categorical Frequency Distribution
Determining Class Interval
Grouped Frequency Distribution
2.4 Stem-and-Leaf Plot
2.5 Graphing Frequency Distribution
Histogram
Frequency Polygon
Cumulative Frequency Polygon (or Ogive)
2.6 Other Types of Graphs
Pareto Chart
Bar Chart
Pie Chart
Time Series Graph
Pictograph
Scatter Plot
2.7 Guideline for Developing Good Graphs/Charts
Statistics: The only Science that enables different experts using the same figure to
draw different conclusions.
– Evan Esar
2.1 Introduction
• Raw data is data that is arranged in neither ascending nor descending order.
• Range is the difference between the highest value and the lowest value in a distribution.
• Frequency distribution is the organization of data in tabular form, using
mutually exclusive classes and showing the number of observations in each.
• Class Limits (or Apparent Limits) are the highest and lowest values describing a
class.
• Class Boundaries (or Exact Limits) are the upper and lower values of a class in a
grouped frequency distribution; they have one more decimal place than the class
limits and end with the digit 5.
• Class Interval (or Class Width) is the distance between the lower class boundary
and the upper class boundary; it is denoted by the symbol i.
• Frequency (F) is the number of values in a specific class of a frequency
distribution.
• Relative Frequency (RF) is the value obtained when the frequency of each class
of the frequency distribution is divided by the total number of values.
• Percentage is obtained by multiplying the relative frequency by 100%.
• Cumulative Frequency (CF) is the sum of the frequencies accumulated up to the
upper boundary of a class in a frequency distribution.
• Class Midpoint (or Class Mark) is the point halfway between the class limits of
each class and is representative of the data within that class.
2.3 Constructing Frequency Distribution
A grouped frequency distribution is used when the range of the data set is large; the
data must be grouped into classes, whether they are categorical or interval data. For
interval data, the classes are more than one unit in width. The procedure for constructing
a frequency distribution is discussed in the succeeding sections.
Example: Twenty applicants were given a performance evaluation appraisal. The data
set is
Solution:
Step 3: Convert the tallied data into numerical frequencies.
Class Tally Frequency Percent
High IIII-II 7
Average IIII-III 8
Low IIII 5
Step 4: Determine the percentage. The percentage is computed using the formula:
% = (f / n) × 100%, where f is the frequency of the class and n is the total number of values.
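The percentage formula can be checked with a few lines of Python. This is only a sketch: the frequencies are the ones in the tally table above, and the variable names are illustrative.

```python
# Frequencies taken from the tally table above (High, Average, Low).
frequencies = {"High": 7, "Average": 8, "Low": 5}

# n is the total number of values (the twenty applicants).
n = sum(frequencies.values())

# % = (f / n) x 100 for every class; multiplying first keeps the arithmetic exact here.
percentages = {label: f * 100 / n for label, f in frequencies.items()}

for label, f in frequencies.items():
    print(f"{label:<8} {f:>2} {percentages[label]:>6.1f}%")
```

The percentages of all classes should add up to 100%.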
1. Rule 1. To determine the number of classes, use the smallest positive integer k
such that 2^k ≥ n, where n is the total number of observations. Using Formula 2-1, we
can obtain the ideal class interval.
Suggested Class Interval (i) = Range / Number of classes = (HV − LV) / k (Formula 2-1)
2. Rule 2. Another way to determine the class interval is to apply Formula 2-2.
Suggested Class Interval = Range / (1 + 3.322 (logarithm of total frequencies)) (Formula 2-2)
3. Rule 3. Another guideline to determine the class interval is to have an ideal number
of classes, then apply Formula 2-3.
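As a rough illustration of Rule 1 and Rule 2, the sketch below applies Formula 2-1 and Formula 2-2 to the salary data set used in the grouped example of this chapter (n = 80, HV = 33.70, LV = 14.10). The function names are illustrative, and the logarithm in Formula 2-2 is assumed to be the common (base-10) logarithm.

```python
import math

def classes_by_2k_rule(n):
    """Rule 1: smallest positive integer k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def interval_formula_2_1(high, low, k):
    """Formula 2-1: range divided by the number of classes."""
    return (high - low) / k

def interval_formula_2_2(high, low, n):
    """Formula 2-2, assuming a base-10 logarithm of the total frequency."""
    return (high - low) / (1 + 3.322 * math.log10(n))

n, high, low = 80, 33.70, 14.10
k = classes_by_2k_rule(n)
i_rule1 = interval_formula_2_1(high, low, k)
i_rule2 = interval_formula_2_2(high, low, n)
print(k, round(i_rule1, 2), round(i_rule2, 2))
```

Both rules give a suggested interval a little below 3 for this data set, so rounding up leads to the same class width.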
C. Grouped Frequency Distribution
P18.80 P22.00 P23.40 P24.30 P27.00 P27.90 P31.00 P26.00 P20.80 P17.00
20.00 22.60 23.40 24.50 27.00 29.30 32.10 26.10 21.00 17.30
20.25 22.75 23.70 24.70 27.40 30.10 33.70 26.30 21.60 17.80
18.40 21.90 23.00 23.85 26.80 27.80 30.80 25.00 20.40 15.50
18.70 21.90 23.20 24.10 26.90 27.90 30.90 25.20 20.50 15.70
17.95 21.75 22.90 23.70 26.50 27.50 30.60 24.75 20.25 14.10
18.35 21.80 22.90 23.70 26.50 27.60 30.75 25.00 20.30 14.30
20.20 22.80 23.50 24.60 27.30 29.50 32.90 26.20 21.30 17.40
Solution:
Step 1: Arrange the raw data in ascending or descending order. In this particular example
we arrange the raw data in ascending order, which makes the data easier to
tally.
P14.10 P17.95 P20.25 P21.75 P22.90 P23.70 P24.75 P26.50 P27.50 P30.60
14.30 18.35 20.30 21.80 22.90 23.70 25.00 26.50 27.60 30.75
15.50 18.40 20.40 21.90 23.00 23.85 25.00 26.80 27.80 30.80
15.70 18.70 20.50 21.90 23.20 24.10 25.20 26.90 27.90 30.90
17.00 18.80 20.80 22.00 23.40 24.30 26.00 27.00 27.90 31.00
17.30 20.00 21.00 22.60 23.40 24.50 26.10 27.00 29.30 32.10
17.40 20.20 21.30 22.75 23.50 24.60 26.20 27.30 29.50 32.90
17.80 20.25 21.60 22.80 23.70 24.70 26.30 27.40 30.10 33.70
• Determine the number of classes.
The objective is to use just enough classes. We can determine the number
of classes (k) using the “2 to the k rule”: select the smallest number k
such that 2^k (2 raised to the power of k) is greater than the number of
observations (n). In our example, there are 80 call center agents (n = 80).
If we try k = 6, which means we would use 6 classes, then 2^k = 2^6 = 64,
somewhat less than 80. Thus, 6 is not enough classes. If we try k = 7, then
2^k = 2^7 = 128, which is greater than 80.
Therefore, the recommended number of classes is 7.
Generally, the class interval (or width) should be equal for all classes. The
classes must cover all the values in the raw data (that is, from lowest to highest).
The class interval is obtained using the formula:
Suggested Class Interval = Range / Number of Classes = (HV – LV) / k = 19.60 / 7 = 2.80 ≈ 3
Note: Round the value of the interval up to the nearest whole number if there is a remainder.
The starting point can be the smallest data value or any convenient
number less than the smallest data value. In our case, 14 is used.
Add the interval (or width) to the lowest score taken as the starting
point to obtain the lower limit of the next class. Keep adding until there
are 7 classes; the lower limits are 14, 17, 20, 23, 26, 29, and 32.
To obtain the upper class limits, subtract one unit from the lower
limit of the second class to obtain the upper limit of the first class.
That is, 17 – 1 = 16. Then add the interval (or width) to each upper limit
to obtain all the upper limits.
Class Limits
14 – 16
17 – 19
20 – 22
23 – 25
26 – 28
29 – 31
32 – 34
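The class limits listed above can be reproduced with a short Python sketch. It assumes whole-number limits, so that "one unit" is 1; the function name is illustrative.

```python
def class_limits(start, width, k):
    """Build k (lower, upper) class limits from a starting point and a class width."""
    # Lower limits: keep adding the width to the starting point.
    lowers = [start + j * width for j in range(k)]
    # Upper limits: one unit below the next lower limit (unit assumed to be 1).
    uppers = [lower + width - 1 for lower in lowers]
    return list(zip(lowers, uppers))

limits = class_limits(start=14, width=3, k=7)
for lower, upper in limits:
    print(f"{lower} – {upper}")
```

The same function with start=18 and width=9 gives the class limits used for the travel agency ages in Example 2.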
• Set the class boundaries in each class. To obtain the class boundaries, we
need to subtract 0.5 from each lower class limit and add 0.5 to each upper
class limit.
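A minimal sketch of this step in Python, using the seven class limits of the salary example; the variable names are illustrative.

```python
# Class limits of the salary example (in thousands).
limits = [(14, 16), (17, 19), (20, 22), (23, 25), (26, 28), (29, 31), (32, 34)]

# Subtract 0.5 from each lower limit and add 0.5 to each upper limit.
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in limits]

for (lower, upper), (lb, ub) in zip(limits, boundaries):
    print(f"{lower} – {upper}    {lb} – {ub}")

# The upper boundary of one class equals the lower boundary of the next,
# so the classes leave no gaps between them.
no_gaps = all(boundaries[j][1] == boundaries[j + 1][0] for j in range(len(boundaries) - 1))
```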
Step 5: Determine the relative frequency. It can be found by dividing each frequency by
the total frequency.
Step 6: Determine the percentage. It can be found by multiplying each relative
frequency by 100%.
Step 7: Determine the cumulative frequencies. The cumulative frequency can be found by
adding the frequency in each class to the total frequencies of the classes
preceding that class.
Step 8: Determine the midpoints. The midpoint can be found by getting the average of the
upper limit and lower limit of each class.
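Steps 5 to 8 can be sketched in Python as follows. The class frequencies (4, 9, 16, 23, 17, 8, 3) are the ones that appear in the cumulative frequency table of the salary example in this chapter; the variable names are illustrative.

```python
from itertools import accumulate

limits = [(14, 16), (17, 19), (20, 22), (23, 25), (26, 28), (29, 31), (32, 34)]
frequencies = [4, 9, 16, 23, 17, 8, 3]

n = sum(frequencies)                                       # total frequency
relative = [f / n for f in frequencies]                    # Step 5
percent = [rf * 100 for rf in relative]                    # Step 6
cumulative = list(accumulate(frequencies))                 # Step 7
midpoints = [(lower + upper) / 2 for lower, upper in limits]  # Step 8

print("Class    f     RF       %    CF   Midpoint")
for (lower, upper), f, rf, pct, cf, mid in zip(
    limits, frequencies, relative, percent, cumulative, midpoints
):
    print(f"{lower}-{upper}  {f:>3}  {rf:.4f}  {pct:>6.2f}  {cf:>3}  {mid:>6.1f}")
```

The last cumulative frequency should equal the total number of values, which is a quick check on the tally.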
Example 2: SJS Travel Agency, a nationwide local travel agency, offers special rates during
the summer period. The owner wants additional information on the ages of the people
taking travel tours. A random sample of 50 customers taking travel tours last summer
revealed these ages.
18 29 42 57 61 67 37 49 53 47
24 34 45 58 63 70 39 51 54 48
28 36 46 60 66 77 40 52 56 49
19 31 44 58 62 68 38 50 54 48
27 36 46 59 64 74 39 51 55 48
Solution:
18 29 37 42 47 49 53 57 61 67
19 31 38 44 48 50 54 58 62 68
24 34 39 45 48 51 54 58 63 70
27 36 39 46 48 51 55 59 64 74
28 36 40 46 49 52 56 60 66 77
Step 2: Determine the classes.
• Select a starting point for the lowest class limit. The lowest value in the data set is
18; this will also serve as our starting point.
• Set the individual class limits. We add 9 to each lower class limit until
reaching the number of classes (18, 27, 36, 45, 54, 63, and 72). To obtain the
upper class limits, subtract one unit from the lower limit of the second
class to obtain the upper limit of the first class. That is, 27 – 1 = 26. Then add the
interval (or width) to each upper limit to obtain all the upper limits (26, 35, 44, 53,
62, 71, and 80).
Class Limits
18 – 26
27 – 35
36 – 44
45 – 53
54 – 62
63 – 71
72 – 80
• Set the class boundaries for each class. To obtain the class boundaries, we need to
subtract 0.5 from each lower class limit and add 0.5 to each upper class limit.
Class Limits Class Boundaries
18 – 26 17.5 – 26.5
27 – 35 26.5 – 35.5
36 – 44 35.5 – 44.5
45 – 53 44.5 – 53.5
54 – 62 53.5 – 62.5
63 – 71 62.5 – 71.5
72 – 80 71.5 – 80.5
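With the classes fixed, the tallying of the 50 ages can be sketched in Python. The ages are copied from the example and the class limits from the table above; the variable names are illustrative.

```python
ages = [
    18, 29, 42, 57, 61, 67, 37, 49, 53, 47,
    24, 34, 45, 58, 63, 70, 39, 51, 54, 48,
    28, 36, 46, 60, 66, 77, 40, 52, 56, 49,
    19, 31, 44, 58, 62, 68, 38, 50, 54, 48,
    27, 36, 46, 59, 64, 74, 39, 51, 55, 48,
]
limits = [(18, 26), (27, 35), (36, 44), (45, 53), (54, 62), (63, 71), (72, 80)]

# Count how many ages fall between the lower and upper limit of each class.
frequencies = [sum(lower <= age <= upper for age in ages) for lower, upper in limits]

for (lower, upper), f in zip(limits, frequencies):
    print(f"{lower} – {upper}  {f}")
```

Because the ages are whole numbers, counting against the class limits gives the same result as counting against the class boundaries.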
Step 6: Determine the percentage.
A statistician named John Tukey introduced the stem-and-leaf plot. This method, to
some extent, overcomes the loss of actual observations brought about
by the histogram. The advantage of the stem-and-leaf plot over the histogram is that we
can see the actual observations.
The stem is the leading digit or digits and the leaf is the trailing digit. The stem is
placed in the first column and the leaf in the second column.
Example 1: SJS Travel Agency, a nationwide local travel agency, offers special rates during
the summer period. The owner wants additional information on the ages of people taking
travel tours. A random sample of 50 customers taking travel tours last
summer revealed these ages.
18 29 37 42 47 49 53 57 61 67
19 31 38 44 48 50 54 58 62 68
24 34 39 45 48 51 54 58 63 70
27 36 39 46 48 51 55 59 64 74
28 36 40 46 49 52 56 60 66 77
Construct a stem-and-leaf plot.
Solution:
The stems (leading digits) for the raw data are 1, 2, 3, 4, 5, 6, and 7. The leaves for each
stem (trailing digits) are recorded in the same row and are rank-ordered to form a
stem-and-leaf plot.
Stem   Leaf
1      8, 9
2      4, 7, 8, 9
3      1, 4, 6, 6, 7, 8, 9, 9
4      0, 2, 4, 5, 6, 6, 7, 8, 8, 8, 9, 9
5      0, 1, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9
6      0, 1, 2, 3, 4, 6, 7, 8
7      0, 4, 7
Stem = tens digit (leading digits); Leaf = units digit (trailing digits).
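A stem-and-leaf plot like this one can be built with a short Python sketch. It assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the units digit; the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group the units digits (leaves) of the sorted values under their tens digit (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

ages = [
    18, 29, 37, 42, 47, 49, 53, 57, 61, 67,
    19, 31, 38, 44, 48, 50, 54, 58, 62, 68,
    24, 34, 39, 45, 48, 51, 54, 58, 63, 70,
    27, 36, 39, 46, 48, 51, 55, 59, 64, 74,
    28, 36, 40, 46, 49, 52, 56, 60, 66, 77,
]

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```

Sorting the values first is what makes the leaves in each row rank-ordered.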
When the data set contains a large number of values, drawing conclusions from an
ordered array or stem-and-leaf plot is often difficult. We need graphs or charts in
such situations. There are a number of graphs or charts that show numerical data visually.
These include the histogram, frequency polygon, and cumulative frequency polygon (ogive).
In this section, we discuss several graphical methods that are used for interval data.
The most important of these graphical methods is the histogram. The histogram is a powerful
graphical technique used to summarize interval data, and it also helps explain an
important aspect of probability.
A. Histogram
A histogram is a graph in which classes are marked on the horizontal axis (x-axis)
and the class frequencies are represented on the vertical axis (y-axis). The heights of the
bars represent the class frequencies, and the bars are drawn adjacent to each other.
However, the histogram focuses on the frequency of each class and sacrifices
whatever information was contained in the actual observations.
B. Frequency Polygon
A frequency polygon is a graph that displays the data using points which are
connected by lines. The frequencies are represented by the heights of the points at the
midpoints of the classes. The vertical axis represents the frequency of the distribution
while the horizontal axis represents the midpoints of the frequency distribution.
a. Constructing a histogram
Step 1: Find the midpoints of each class.
Step 2: Draw and label the x-axis and y-axis.
Step 3: Represent the frequency on the y-axis and the midpoints on the x-axis.
Step 4: Use the frequency to represent the height and draw the vertical bars.
The class frequencies are scaled along the vertical axis and the class midpoints
along the horizontal axis. From Figure 2.1 we note that there are 4 employees in the
class with midpoint ₱15,000, that is, ₱14,000-₱16,000. Therefore, the height of the column
for class ₱14,000-₱16,000 is 4. Applying the same procedure to the other classes, we obtain
the graph below.
As the histogram shows, the class with the greatest number of data values (23) is
₱23,000-₱25,000, followed by ₱26,000-₱28,000 with 17. The graph also has one peak
with the data clustering around it.
[Figure: Histogram for Call Center Agents' Salary. x-axis: Salary (in Thousand), class midpoints 15 to 33; y-axis: Frequency, 0 to 25]
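As a rough stand-in for the drawn histogram, the bar heights can be printed as rows of text in Python. This is only a sketch: the midpoints and frequencies are those of the salary example, and the text layout is an illustrative choice, not the chart itself.

```python
# Class midpoints (salary in thousands) and class frequencies of the salary example.
midpoints = [15, 18, 21, 24, 27, 30, 33]
frequencies = [4, 9, 16, 23, 17, 8, 3]

# One row per class; the number of '#' marks plays the role of the bar height.
for midpoint, f in zip(midpoints, frequencies):
    print(f"{midpoint:>3} | {'#' * f} ({f})")

# The class with the tallest bar is the peak of the histogram.
peak_midpoint = midpoints[frequencies.index(max(frequencies))]
print("Peak at class midpoint", peak_midpoint)
```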
Step 4: Connect the adjacent points with the line segments. Draw a line back to the x-
axis at the beginning and end of the graph.
[Figure: Frequency polygon. x-axis: Salary (in Thousand), class midpoints 15 to 33]
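The points joined by the frequency polygon can be listed with a short Python sketch. It assumes that the line is brought back to the x-axis one class width before the first midpoint and one class width after the last midpoint, which is a common convention and not something stated in the steps above.

```python
midpoints = [15, 18, 21, 24, 27, 30, 33]
frequencies = [4, 9, 16, 23, 17, 8, 3]

# Class width, taken from the spacing of the midpoints.
width = midpoints[1] - midpoints[0]

# Zero-frequency points at both ends close the polygon on the x-axis (assumed convention).
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```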
Class Limits Class Boundaries Frequency cf
14-16 13.5-16.5 4 4
17-19 16.5-19.5 9 13
20-22 19.5-22.5 16 29
23-25 22.5-25.5 23 52
26-28 25.5-28.5 17 69
29-31 28.5-31.5 8 77
32-34 31.5-34.5 3 80
Step 3: Represent the cumulative frequency on the y-axis and the upper class boundaries
on the x-axis.
Step 4: Connect the adjacent points with line segments.
[Figure: Cumulative frequency polygon (ogive). x-axis: Real Limit (Salary in Thousand), upper class boundaries 16.5 to 34.5]
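The points of the ogive can be computed from the table above with a short Python sketch. Starting the curve at the lower boundary of the first class with a cumulative frequency of 0 is an assumed convention; the variable names are illustrative.

```python
from itertools import accumulate

upper_boundaries = [16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5]
frequencies = [4, 9, 16, 23, 17, 8, 3]

# Running total of the frequencies gives the cf column.
cumulative = list(accumulate(frequencies))

# Assumed starting point: lower boundary of the first class, nothing accumulated yet.
points = [(13.5, 0)] + list(zip(upper_boundaries, cumulative))

for x, cf in points:
    print(x, cf)
```

The last point always sits at the total number of values, here 80.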
A. Pareto Chart
A Pareto chart is a graph used to represent a frequency distribution for
categorical (or nominal-level) data. The frequencies are displayed by the heights of
vertical bars, which are arranged in order from highest to lowest.
F. Scatter Plot
A scatter plot is used to examine possible relationships between two numerical
variables. The two variables are plotted on the x-axis and y-axis.
Now we will illustrate how to construct the pareto chart, bar chart, pie chart, time
series graph, pictograph, and scatter plot using the succeeding examples.
Example 1: Using the information in the table about the favourite snacks of 870 youths,
construct a pareto chart, bar chart, and pie chart.
Products Sales
Junk Foods 135
Candy 250
Ice Cream 185
Chocolate 210
Others 90
Solution:
a. Constructing a Pareto Chart
Products Sales
Candy 250
Chocolate 210
Ice Cream 185
Junk Foods 135
Others 90
Step 2: Draw and label the x-axis (Products) and y-axis (Sales).
Step 3: Construct the chart by arranging the frequencies from highest to lowest, from
left to right. Make bars of the same width and draw each height corresponding
to its frequency. Figure 2.4 shows the Pareto chart of the favorite snacks of the
youth.
[Figure 2.4: Pareto chart, "Favorite Snacks". x-axis: Products (Candy, Chocolate, Ice Cream, Junk Foods, Others); y-axis: Sales (in Millions), 0 to 300]
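The ordering that distinguishes a Pareto chart from a plain bar chart can be sketched in Python. The data are the values of the snacks table in this example; the variable names are illustrative.

```python
# Values from the table of the example, in their original order.
sales = {"Junk Foods": 135, "Candy": 250, "Ice Cream": 185, "Chocolate": 210, "Others": 90}

# Arrange the categories from highest to lowest frequency (left to right on the chart).
pareto_order = sorted(sales.items(), key=lambda item: item[1], reverse=True)

for product, value in pareto_order:
    print(f"{product:<11} {value}")
```

A bar chart of the same data would keep the categories in their original order.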
Step 1: Draw and label the x-axis (Products) and y-axis (Sales).
Step 2: Make bars of the same width and draw each height corresponding to its
frequency. Figure 2.5 shows the Bar Chart on the favorite snacks of the youth.
[Figure 2.5: Bar chart, "Favorite Snacks"]
Step 1: Since there are 360˚ in a circle, the frequency of each class must be converted into
a proportional part of the circle. This conversion is done by applying the formula
Degrees = (f / n) (360˚)
Hence the following conversions are obtained. The degrees should total 360˚.
Candy (250 / 870) (360˚) = 103˚
Chocolate (210 / 870) (360˚) = 87˚
Ice Cream (185 / 870) (360˚) = 77˚
Junk Foods (135 / 870) (360˚) = 56˚
Others (90 / 870) (360˚) = 37˚
Step 2: Each frequency must also be converted to a percentage; the percentages should
total 100%. This conversion is done by applying the formula
Percentage = (f / n) (100%)
Candy (250 / 870) (100%) = 29%
Chocolate (210 / 870) (100%) = 24%
Ice Cream (185 / 870) (100%) = 21%
Junk Foods (135 / 870) (100%) = 16%
Others (90 / 870) (100%) = 10%
Step 3: Using a protractor, graph each section and write its name and corresponding
percentage, as shown in Figure 2.6.
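The conversions in Steps 1 and 2 can be checked with a short Python sketch. The values come from the snacks table of this example; rounding to whole degrees and whole percents follows the worked figures above, and the variable names are illustrative.

```python
sales = {"Candy": 250, "Chocolate": 210, "Ice Cream": 185, "Junk Foods": 135, "Others": 90}
n = sum(sales.values())  # 870 youths in total

# Degrees = (f / n)(360) and Percentage = (f / n)(100), rounded to whole numbers.
degrees = {product: round(f / n * 360) for product, f in sales.items()}
percent = {product: round(f / n * 100) for product, f in sales.items()}

for product in sales:
    print(f"{product:<11} {degrees[product]:>3}˚ {percent[product]:>3}%")
```

For this data set the rounded values add up to exactly 360˚ and 100%. With other data, rounding can leave the totals off by a unit, so the totals are worth checking before drawing the sectors.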
Example 2: Using the information in the table below about the dollar to peso exchange
rate from January to December of 2010, construct a time series graph.
Solution:
[Figure: Time series graph, "Peso-US Dollar Exchange Rate"]
Example 3: VSAS Realty Inc. is a real estate company that develops housing in Rizal
province. The information in the table shows the number of houses
constructed from 2005 to 2009. Construct a pictograph.
Solution:
Step 1: Draw and label the x-axis and y-axis.
Step 2: Label the x-axis for years and y-axis for Number of Houses.
Step 3: Draw house symbols that represent the number of houses.
[Figure 2.8: Pictograph for Example 3. x-axis: Year (2006 to 2010); y-axis: no. of houses (100 to 800). Legend: one house symbol = 100 houses]
Example 4: The owner of a chain of halo-halo stores would like to study the effect of
atmospheric temperature on sales during the summer season. A random
sample of 12 days is selected, with the data given as follows:
Day 1 2 3 4 5 6 7 8 9 10 11 12
Temperature(˚F) 79 76 78 84 90 83 93 94 97 85 88 82
Total Sales 147 143 147 168 206 155 192 211 209 187 200 150
Solution:
[Figure: Scatter plot. x-axis: Temperature (X), 0 to 90; y-axis: Sales (Y), 0 to 225]
Good graphical displays tell what the data are conveying. Sadly, many graphs or
charts shown in newspapers and magazines are misleading, incorrect, or so complicated
that they should not be used. To develop good graphs and charts, there are some
guidelines to bear in mind, such as