Module 2 - Distribution

The document provides an overview of frequency distributions and their graphical representations. It discusses ungrouped and grouped frequency distributions, and describes how to create a frequency distribution table from a set of data. Specifically, it explains how to tally the occurrences of each unique value in the data, count the frequency of each value, and calculate the relative frequencies as percentages of the total counts. The goal is to help students learn how to organize and summarize data using frequency distributions.
Advance Statistics: Self-Learning Module for College Students 1


Module 2 Distributions

Module Overview
Graphing data is the first and often most important step in data analysis. In this day
of computers, researchers all too often see only the results of complex computer analyses
without ever taking a close look at the data themselves. This is all the more unfortunate
because computers can create many types of graphs quickly and easily.
This module is divided into two lessons, namely frequency distributions and the graphing
of qualitative and quantitative distributions, as given by the outline below:

I. Frequency Distribution
a. Ungrouped and Grouped Frequency Distribution
b. Cumulative Absolute Frequency
c. Cumulative Relative Frequency
II. Visual Presentation of Quantitative Variables
a. Stem-and-Leaf Plots
b. Histograms
c. Frequency Polygons
d. Bar Charts
e. Line Graphs

Lesson 1: Frequency Distribution

Learning Outcomes
At the end of this lesson you shall be able to:
1. define what a statistical distribution is;
2. describe an ungrouped and grouped frequency distribution; and
3. write a complete grouped and ungrouped frequency distribution.

Pre-Assessment
Before you begin this module, you must first have answered the Pre-Assessment
posted in Google Classroom. If you are unable to connect to Google Classroom,
you can access the Pre-Assessment by using this link:
https://bre.is/uoXn5fUW or by scanning the QR code presented. You can
access the Pre-Assessment using your laptop or your mobile device.

Discussion
Marvin Y. Arce
All Rights Reserved
2020

A distribution is a description of the set of possible values that a random variable
can take. This can be done by noting the absolute or relative frequency. A distribution can
be illustrated in terms of a table, or in terms of a graph.

Frequency Distribution
A frequency distribution is a representation, either in a graphical or tabular format,
that displays the number of observations within a given interval. The interval size depends
on the data being analyzed and the goals of the analysis. The intervals must be mutually
exclusive, meaning that no observation can fall into more than one interval, and
exhaustive, meaning that every observation falls into some interval, so that together the
intervals cover everything that can possibly happen.
The tables below are examples of a frequency distribution:

Table 2.1 Results of a single, hypothetical experiment in which an “unweighted” die is
tossed 6000 times.

Face of die    Number of times face turns up
1              968
2              1027
3              1018
4              996
5              1007
6              984

Table 2.2 Number of days on which measurable rain occurs in a specific year, in five
hypothetical towns.

Town name      Number of days in a year with measurable precipitation
Happyville     108
Joytown        86
Wonderdale     198
Sunnywater     259
Rainy Glen     18

In both of the above examples (the first showing the results of 6000 die tosses and
the second showing the days with precipitation in five hypothetical towns), the scenarios
are portrayed with frequency as the dependent variable. Whenever frequency is portrayed
as the dependent variable in a distribution, that distribution is called a frequency
distribution.

UNGROUPED FREQUENCY DISTRIBUTION


The simplest way to tabulate the die toss results as a frequency distribution is to
combine all the tosses and show the total frequency for each die face 1 through 6. A
hypothetical example of this result, called an ungrouped frequency distribution, is shown in
Table 2.3. We don’t care about the weighting characteristics of each individual die, but only
about potential biasing of the entire set.

Table 2.3 An ungrouped frequency distribution showing the results of a single,
hypothetical experiment in which five different dice, some “weighted” and some not, are
each tossed 6000 times.

Face of die    Number of times face turns up
1              4857
2              4999
3              4626
4              5362
5              4947
6              5209

The table above shows that, without regard to the weighting characteristics of the
individual dice, there is some bias in favor of faces 4 and 6, some bias against faces 1
and 3, and little or no bias either for or against faces 2 and 5.
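
The reading of Table 2.3 can be checked with a short Python sketch. It assumes that an unbiased set of dice would show each face equally often (30000 / 6 = 5000 times), which is our own yardstick for "bias" and is not stated in the table itself.

```python
# Observed counts from Table 2.3 (five dice, 6000 tosses each).
observed = {1: 4857, 2: 4999, 3: 4626, 4: 5362, 5: 4947, 6: 5209}

total_tosses = sum(observed.values())      # all tosses combined
expected = total_tosses / len(observed)    # count per face if there were no bias (assumption)

# Deviation of each face from the unbiased expectation.
deviation = {face: count - expected for face, count in observed.items()}

for face, diff in deviation.items():
    print(f"face {face}: observed {observed[face]}, deviation {diff:+.0f}")
```

Positive deviations (faces 4 and 6) indicate faces that turned up more often than expected, negative ones (faces 1 and 3) less often, and values near zero (faces 2 and 5) little or no bias.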

Example:
Create a frequency distribution table for the given data below:

1, 3, 6, 4, 5, 6, 3, 4, 6, 3, 6

Solution:
From the given data we can observe the following:
• 1 appeared only once, so the frequency of the data point 1 is 1
• 6 appeared four times, hence the frequency of data point 6 is 4
• there are 5 distinct data points: 1, 3, 4, 5, and 6
• the total number of data values, n, is 11

Step 1: Arrange the data in ascending order.

1, 3, 3, 3, 4, 4, 5, 6, 6, 6, 6

Step 2: Create the first three columns of your table. The first column is designated as
x (the data points), the second is the tally column (one tick mark for each
occurrence of a data point) and the third is f (the frequency of a data point).

Step 3: Put the distinct data points (1, 3, 4, 5 and 6) in the x column. Count each
occurrence of the data point by putting a tick mark on the corresponding row
in the tally column. The f column is simply the number of tick marks for each
data point in the tally column.

data:         1 | 3, 3, 3 | 4, 4 | 5 | 6, 6, 6, 6
occurrences:  1 | 3       | 2    | 1 | 4

x    tally    f
1    |        1
3    |||      3
4    ||       2
5    |        1
6    ||||     4

Step 4: Find Σf (read as “sigma f” or “summation of f”), which is simply the sum of the
occurrences. This must be equal to the number of data values, n.

Σf = 1 + 3 + 2 + 1 + 4 = 11

x    tally    f
1    |        1
3    |||      3
4    ||       2
5    |        1
6    ||||     4
              Σf = 11

Step 5: Add another column for the relative frequency (f%). The relative frequency
for each data point (f%_x) is computed by dividing the frequency at that data
point (f_x) by Σf, then multiplying the result by 100%.

We will shorten the table by removing the tally column. Determine f_x
at each data point. Data point 1 (x = 1) occurs once, hence f_1 is 1. Data point 6
(x = 6) has 4 occurrences, hence f_6 is 4.

x    f
1    f_1 = 1
3    f_3 = 3
4    f_4 = 2
5    f_5 = 1
6    f_6 = 4
     Σf = 11

Next, we compute f% at each data point.

f%_1 = (f_1 / Σf) × 100% = (1 / 11) × 100% = 0.0909 × 100% = 9.09%

f%_3 = (f_3 / Σf) × 100% = (3 / 11) × 100% = 0.2727 × 100% = 27.27%

f%_4 = (f_4 / Σf) × 100% = (2 / 11) × 100% = 0.1818 × 100% = 18.18%

f%_5 = (f_5 / Σf) × 100% = (1 / 11) × 100% = 0.0909 × 100% = 9.09%

f%_6 = (f_6 / Σf) × 100% = (4 / 11) × 100% = 0.3636 × 100% = 36.36%

Add the f% column, then write the respective relative frequency for each
data point. Add the f% values (Σf%); the result should be equal to 100% (or
very near to 100%, since we rounded during the computations above).

Σf% = 9.09% + 27.27% + 18.18% + 9.09% + 36.36% = 99.99% ≈ 100.00%

x    f          f%
1    1          9.09%
3    3          27.27%
4    2          18.18%
5    1          9.09%
6    4          36.36%
     Σf = 11    Σf% = 100.00%
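
The five steps above can be sketched in Python. This is only an illustration using the standard library `collections.Counter`; the function name `ungrouped_table` is our own and is not part of the module.

```python
from collections import Counter

data = [1, 3, 6, 4, 5, 6, 3, 4, 6, 3, 6]

def ungrouped_table(values):
    """Return rows (x, f, f%) sorted by x, with f% rounded to two decimals."""
    counts = Counter(values)   # plays the role of the tally column
    n = len(values)            # total number of data values
    return [(x, f, round(f / n * 100, 2)) for x, f in sorted(counts.items())]

table = ungrouped_table(data)
for x, f, pct in table:
    print(f"{x:>2} {f:>3} {pct:6.2f}%")
print("sum of f =", sum(f for _, f, _ in table))
```

As in the worked example, the rounded percentages may add up to slightly less or more than 100%.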

Activity 2.1: Ungrouped Frequency Distribution

The data below are the results of a 10-item Math test taken by 27 students. Construct a
complete ungrouped frequency distribution table based on the given data.

10 4 10 9 10 7 6 5 9

3 4 9 10 9 3 10 7 2

3 4 7 8 3 1 2 8 7

GROUPED FREQUENCY DISTRIBUTION


When we are dealing with a distribution where the number of data values, n, is large,
say 30 or more, it is better to group the data into classes.
Suppose we want to distribute the scores of 52 students in a 100-item Math test. If we
group the scores into 7 classes and then determine the number of students whose scores
fall in each class, we are using a grouped frequency distribution. The result of this
grouping is shown in Table 2.4.

Table 2.4. The raw scores and the tabulated grouped frequency distribution
of the hypothetical scores of 52 students in a Math test.

18 23 40 28 53 71 87 77 63 43
23 14 45 68 77 85 74 63 52 47
44 35 27 30 43 48 56 67 83 66
50 44 32 25 42 50 71 75 48 42
28 46 59 23 65 28 42 44 45 55
57 78 82

Classes    Class Boundaries    Class Mark    Frequencies    Relative Frequencies
(X)        (B)                 (M)           (f)            (f%)
14 – 24    13.5 – 24.5         19            5              9.615%
25 – 35    24.5 – 35.5         30            8              15.385%
36 – 46    35.5 – 46.5         41            11             21.154%
47 – 57    46.5 – 57.5         52            10             19.231%
58 – 68    57.5 – 68.5         63            7              13.462%
69 – 79    68.5 – 79.5         74            7              13.462%
80 – 90    79.5 – 90.5         85            4              7.692%
                                             Σf = 52        100.00%

Class
In a grouped frequency distribution, a class or interval takes the place of a data
point in the distribution. The number of classes in a particular distribution can be
determined using the 2^k ≥ n rule, where k is the number of classes or intervals and n is
the total number of cases (data). Table 2.4 uses 7 classes, which satisfies the rule
because 2^7 = 128 ≥ 52. (The smallest k that satisfies the rule for n = 52 is 6, since
2^6 = 64 ≥ 52, so Table 2.4 uses one class more than this minimum.)
A class also has class limits and a class width or class size (represented by i).
The class limits determine which data values go into the class. There are two
limits for each class: the lower limit, the smallest value that can be counted into
the class, and the upper limit, the largest value that can be counted into the class.
The class width or class size is the number of data values included in a class. This is
usually determined by subtracting consecutive lower limits or consecutive upper limits.
The i is the same for all classes. Let us take for example the first class in Table 2.4, 14
– 24. Counting from 14 to 24, including 14, will produce 11 data values (14, 15, 16,
17, . . ., 24), hence the class size of the interval is 11. Also, considering the
consecutive classes 25 – 35 and 36 – 46, subtracting the lower limits (36 – 25) as
well as the upper limits (46 – 35) gives 11.

Class Boundary
The class boundary or true boundary separates one class in a grouped
frequency distribution from another. The boundaries have one more decimal place
than the raw data and therefore do not appear in the data. There is no gap between
the upper boundary of one class and the lower boundary of the next class. The lower
class boundary is found by subtracting 0.5 units from the lower class limit and the
upper class boundary is found by adding 0.5 units to the upper class limit.
For example, the class boundary of the class 69 – 79 is 68.5 – 79.5. The lower
class boundary is computed by subtracting 0.5 from the lower limit of the interval, 69 –
0.5 = 68.5. On the other hand, the upper class boundary is computed by adding 0.5
to the upper class limit, 79 + 0.5 = 79.5.
Also notice that there is no gap between the upper class boundary and the
lower class boundary of two consecutive classes. Take for example the boundaries
of the neighboring classes 47 – 57 and 58 – 68. Notice that there is a gap
between the upper limit of the 47 – 57 class (57) and the lower limit of the 58 – 68
class (58). But if we look at their respective class boundaries, we will see that
there is no gap between the upper class boundary of the 47 – 57 class (57.5) and the
lower class boundary of the 58 – 68 class (57.5). This is important when we want to
present the distribution as a histogram, which we will discuss later.

Class Mark
The class mark or class midpoint is the middlemost data value in a particular
class. Let us take for example the 36 – 46 class with 35.5 – 46.5 class boundary. If
we write the data values contained in this class and determine the middlemost
value, we will find the class mark (M):

36 37 38 39 40 [41] 42 43 44 45 46

The bracketed value, 41, is the middlemost value and hence the class mark for the
36 – 46 class.

Another method of identifying the class mark for a particular class is by
adding the lower limit (LL) and the upper limit (UL) of the class and then dividing
the sum by 2. Hence, using this method, the class mark of class 36 – 46 (M_36–46) is:

LL_36–46 = 36, UL_36–46 = 46

M_36–46 = (LL_36–46 + UL_36–46) / 2 = (36 + 46) / 2 = 82 / 2 = 41

We can also use the class boundaries in identifying the class mark using the
method above. So, if the class boundary of the 36 – 46 class is 35.5 (Lower
Boundary, LB) – 46.5 (Upper Boundary, UB), M_36–46 is:

LB_36–46 = 35.5, UB_36–46 = 46.5

M_36–46 = (LB_36–46 + UB_36–46) / 2 = (35.5 + 46.5) / 2 = 82 / 2 = 41
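
The class boundary and class mark rules can be sketched in Python as follows. The function names and the `unit` parameter (the precision of the raw data, 1 for whole numbers) are our own illustrative choices.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Boundaries lie half a unit below the lower limit and half a unit above the upper limit."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

def class_mark(low, high):
    """Midpoint of a class, computed from either its limits or its boundaries."""
    return (low + high) / 2

lb, ub = class_boundaries(36, 46)
mark_from_limits = class_mark(36, 46)
mark_from_boundaries = class_mark(lb, ub)
print(lb, ub, mark_from_limits, mark_from_boundaries)
```

Both ways of computing the class mark agree because the boundaries move the same distance away from the limits on each side.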

Example:

The following data are the resting heart rate measurements for a group of men aged 25
and above. Construct a grouped frequency table for the given data.

62 53 74 59 55 61 64 54 68
58 51 75 65 77 78 74 81 76
78 67 82 52 51 57 68 49 63
72 56 69 57 74 66 76 58 66
53 57 81 64 78 64 68 64 65

Solution:

Step 1: Arrange the data values in ascending order. Count the number of cases n and
identify the range (R) by subtracting the lowest data value (LDV) from the
highest data value (HDV).

49 51 51 52 53 53 54 55 56
57 57 57 58 58 59 61 62 63
64 64 64 64 65 65 66 66 67
68 68 68 69 72 74 74 74 75
76 76 77 78 78 78 81 81 82


n=45
HDV =82
LDV =49

R=HDV −LDV =82−49=33

Step 2: Identify the number of classes or intervals k and the class width i.

To determine the number of classes, we are going to use the 2^k ≥ n rule. We are
going to find the smallest value of k which makes 2^k equal to or higher than n (45).
When k = 6, 2^k = 2^6 = 64. Since 64 is the first power of 2 that is at least 45
(2^5 = 32 is too small), the number of classes needed for our table is 6.

To determine the class width i, we divide the range R by the number of classes k
and round up to the nearest whole number. (Some books require that this value
always be rounded up to the next odd number. We will not follow this since
it will highly affect the selection of k using the 2^k ≥ n rule.) The class width for our
example is:

i = R / k = 33 / 6 = 5.5, use 6
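
Step 2 can be sketched in Python as below. The function names are our own, and the sketch assumes "round up" means taking the ceiling of R / k.

```python
import math

def number_of_classes(n):
    """Smallest k such that 2**k >= n (the 2^k >= n rule)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def class_width(data_range, k):
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil(data_range / k)

n, hdv, ldv = 45, 82, 49
k = number_of_classes(n)
i = class_width(hdv - ldv, k)
print("k =", k, "i =", i)
```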

Step 3: Determine the starting point and use this to write the classes (X) column.
The starting point is LDV − 1.

LDV − 1 (49 − 1 = 48) is the lower limit of the first class. To find the lower limit
of the next class, add i to the starting point. Find the lower limits of the
succeeding classes by doing the same procedure.

LDV − 1 = 49 − 1 = 48
48 + i = 48 + 6 = 54
54 + i = 54 + 6 = 60
60 + i = 60 + 6 = 66
66 + i = 66 + 6 = 72
72 + i = 72 + 6 = 78

We determine the upper limit of the first class by adding i − 1 to the starting
point. We can find the upper limit of the next class by adding i to the upper limit
of the first class. Use this procedure to determine the upper limits of the
succeeding classes.

48 + (i − 1) = 48 + 5 = 53
53 + i = 53 + 6 = 59
59 + i = 59 + 6 = 65
65 + i = 65 + 6 = 71
71 + i = 71 + 6 = 77
77 + i = 77 + 6 = 83

Step 4: Write the class boundary (B) column. Subtract 0.5 from the lower limits and
add 0.5 to the upper limits of each class.

X          Lower Class Boundary    Upper Class Boundary
48 – 53    48 − 0.5 = 47.5         53 + 0.5 = 53.5
54 – 59    54 − 0.5 = 53.5         59 + 0.5 = 59.5
60 – 65    60 − 0.5 = 59.5         65 + 0.5 = 65.5
66 – 71    66 − 0.5 = 65.5         71 + 0.5 = 71.5
72 – 77    72 − 0.5 = 71.5         77 + 0.5 = 77.5
78 – 83    78 − 0.5 = 77.5         83 + 0.5 = 83.5

X B
48 – 53 47.5 – 53.5
54 – 59 53.5 – 59.5
60 – 65 59.5 – 65.5
66 – 71 65.5 – 71.5
72 – 77 71.5 – 77.5
78 – 83 77.5 – 83.5
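
Steps 3 and 4 can be sketched in Python. The helper name `build_classes` is hypothetical; it follows the module's convention of starting at LDV − 1 and assumes whole-number data, so the boundaries sit 0.5 away from the limits.

```python
def build_classes(ldv, width, k):
    """Return k (lower limit, upper limit) pairs, starting at LDV - 1."""
    start = ldv - 1
    classes = []
    for j in range(k):
        lower = start + j * width    # add i to get each next lower limit
        upper = lower + width - 1    # upper limit is i - 1 above the lower limit
        classes.append((lower, upper))
    return classes

classes = build_classes(ldv=49, width=6, k=6)
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in classes]

for (lo, hi), (lb, ub) in zip(classes, boundaries):
    print(f"{lo} – {hi}    {lb} – {ub}")
```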

Step 5: Find the class mark (M) for each class. The class mark is the average of the
limits or of the boundaries. To avoid repeating the computation, identify
the class mark of the first class, then add i to get the class mark of each
succeeding class.

M of the first class = (48 + 53) / 2 or (47.5 + 53.5) / 2 = 101 / 2 = 50.5

X          B              M       Computation
48 – 53    47.5 – 53.5    50.5    101 / 2
54 – 59    53.5 – 59.5    56.5    50.5 + i = 50.5 + 6
60 – 65    59.5 – 65.5    62.5    56.5 + i = 56.5 + 6
66 – 71    65.5 – 71.5    68.5    62.5 + i = 62.5 + 6
72 – 77    71.5 – 77.5    74.5    68.5 + i = 68.5 + 6
78 – 83    77.5 – 83.5    80.5    74.5 + i = 74.5 + 6

Step 6: Add the frequency (f ) column. Count the data values that will fall to a
particular class. You can use a tally column for this.

49 51 51 52 53 53 54 55 56
57 57 57 58 58 59 61 62 63
64 64 64 64 65 65 66 66 67
68 68 68 69 72 74 74 74 75
76 76 77 78 78 78 81 81 82

X          tally         f
48 – 53    ||||| |       6
54 – 59    ||||| ||||    9
60 – 65    ||||| ||||    9
66 – 71    ||||| ||      7
72 – 77    ||||| |||     8
78 – 83    ||||| |       6

To check if your tally is correct, find the sum of the frequencies (Σf). This value must be
equal to n.
Σf = 6 + 9 + 9 + 7 + 8 + 6 = 45 = n

X B M f
48 – 53 47.5 – 53.5 50.5 6
54 – 59 53.5 – 59.5 56.5 9
60 – 65 59.5 – 65.5 62.5 9
66 – 71 65.5 – 71.5 68.5 7
72 – 77 71.5 – 77.5 74.5 8
78 – 83 77.5 – 83.5 80.5 6
Σ f =45

Step 7: Find the relative frequency (f %).


X          f    f%
48 – 53    6    f%_48–53 = (f_48–53 / Σf) × 100% = (6 / 45) × 100% = 0.1333 × 100% = 13.33%
54 – 59    9    f%_54–59 = (f_54–59 / Σf) × 100% = (9 / 45) × 100% = 0.2 × 100% = 20.00%
60 – 65    9    f%_60–65 = (f_60–65 / Σf) × 100% = (9 / 45) × 100% = 0.2 × 100% = 20.00%
66 – 71    7    f%_66–71 = (f_66–71 / Σf) × 100% = (7 / 45) × 100% = 0.1556 × 100% = 15.56%
72 – 77    8    f%_72–77 = (f_72–77 / Σf) × 100% = (8 / 45) × 100% = 0.1778 × 100% = 17.78%
78 – 83    6    f%_78–83 = (f_78–83 / Σf) × 100% = (6 / 45) × 100% = 0.1333 × 100% = 13.33%
           Σf = 45

The complete table is presented below:

X B M f f%
48 – 53 47.5 – 53.5 50.5 6 13.33%
54 – 59 53.5 – 59.5 56.5 9 20.00%
60 – 65 59.5 – 65.5 62.5 9 20.00%
66 – 71 65.5 – 71.5 68.5 7 15.56%
72 – 77 71.5 – 77.5 74.5 8 17.78%
78 – 83 77.5 – 83.5 80.5 6 13.33%
Σ f =45 Σ f %=100.0 %
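
Steps 5 to 7 can be put together in a short Python sketch. The function name `grouped_table` is our own; the class limits are the ones derived in Step 3, and values are assigned to classes by comparing them with the class boundaries.

```python
heart_rates = [
    62, 53, 74, 59, 55, 61, 64, 54, 68,
    58, 51, 75, 65, 77, 78, 74, 81, 76,
    78, 67, 82, 52, 51, 57, 68, 49, 63,
    72, 56, 69, 57, 74, 66, 76, 58, 66,
    53, 57, 81, 64, 78, 64, 68, 64, 65,
]
class_limits = [(48, 53), (54, 59), (60, 65), (66, 71), (72, 77), (78, 83)]

def grouped_table(values, limits):
    """Return rows (lower, upper, class mark, f, f%) for each class."""
    n = len(values)
    rows = []
    for lower, upper in limits:
        # A value belongs to the class when it lies between the class boundaries.
        f = sum(1 for v in values if lower - 0.5 < v < upper + 0.5)
        mark = (lower + upper) / 2
        rows.append((lower, upper, mark, f, round(f / n * 100, 2)))
    return rows

rows = grouped_table(heart_rates, class_limits)
for lower, upper, mark, f, pct in rows:
    print(f"{lower} – {upper}  {mark:5.1f}  {f:2d}  {pct:6.2f}%")
print("sum of f =", sum(r[3] for r in rows))
```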
Activity 2.2: Grouped Frequency Distribution

The data below are the heights (in centimeters) of 100 children aged 5 to 12. Construct
a grouped frequency table for the presented data.

44.96 33.27 37.57 40.41 46.15 34.10 38.23 46.35 38.18 42.91
32.62 31.48 39.88 31.35 48.02 33.98 46.18 32.60 36.90 39.06
43.48 47.97 41.16 45.88 31.88 32.22 39.34 42.65 35.71 45.77
49.27 47.32 46.29 40.71 33.47 47.42 40.51 35.80 38.09 36.58
37.42 36.49 31.12 33.98 38.42 33.86 47.74 43.51 43.03 45.41
40.08 32.31 34.12 48.27 32.46 49.41 35.12 38.07 43.86 44.61
32.58 49.27 45.23 43.81 43.27 30.09 48.11 32.87 40.18 39.80
32.53 44.39 31.29 30.07 38.29 46.16 40.43 46.39 42.08 34.64
35.29 36.43 34.42 42.88 32.67 49.54 37.25 31.04 49.14 45.34
45.04 42.24 31.68 48.18 32.96 33.85 44.92 31.80 38.67 49.43

HINT:
• Use whole number class limits.
• Adjustment must be made at the starting point to accommodate the largest data
value.
• Use the class boundaries in identifying frequencies.

Cumulative Frequency
CUMULATIVE ABSOLUTE FREQUENCY
A cumulative absolute frequency is the number of data values which fall at or below
a given data point (for ungrouped data) or class/interval (for grouped data). It may be
found from the frequencies in either of two ways: by adding up the frequencies of all
values smaller than or equal to the point of interest, or by adding the absolute
frequency of a data value to the cumulative frequency of the value immediately
below it.

Example:
Let us consider our example for ungrouped data and we will add the cumulative
absolute frequency column.

x f f%
1 1 9.09%
3 3 27.27%
4 2 18.18%
5 1 9.09%
6 4 36.36%
Σ f =¿ 11 Σ f %=100.00 %
The cumulative absolute frequency (Cf) of the first class is equal to its absolute
frequency (f). This means that there is 1 data value in the first class.
The cumulative absolute frequency of each succeeding class is the sum of its
frequency and the frequencies of all the classes before it.

x f Cf
1 1 1
3 3 4
4 2 6
5 1 7
6 4 11
Σ f =¿ 11

Alternatively, we can get the cumulative absolute frequency of a class by adding the
frequency of that class to the cumulative frequency of the previous class.

x    f    Cf
1    1    1
3    3    4
4    2    6
5    1    7     (f of 5 + Cf of 4 = 1 + 6 = 7, the cumulative frequency of x = 5)
6    4    11    (the cumulative absolute frequency of the last class should be equal to Σf)
     Σf = 11

The same method can be applied to a grouped frequency distribution.

X          B              M       f    Cf
48 – 53    47.5 – 53.5    50.5    6    6     (same as the f of the 48 – 53 class)
54 – 59    53.5 – 59.5    56.5    9    15    (9 + 6)
60 – 65    59.5 – 65.5    62.5    9    24
66 – 71    65.5 – 71.5    68.5    7    31
72 – 77    71.5 – 77.5    74.5    8    39
78 – 83    77.5 – 83.5    80.5    6    45    (same as Σf)
                                  Σf = 45
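
The running total described above is what `itertools.accumulate` from the Python standard library computes. The sketch below applies it to the frequency columns of both worked examples; the variable names are our own.

```python
from itertools import accumulate

ungrouped_f = [1, 3, 2, 1, 4]      # f for x = 1, 3, 4, 5, 6
grouped_f = [6, 9, 9, 7, 8, 6]     # f for classes 48 – 53 through 78 – 83

# Each Cf is the previous Cf plus the current frequency.
ungrouped_cf = list(accumulate(ungrouped_f))
grouped_cf = list(accumulate(grouped_f))

print(ungrouped_cf)
print(grouped_cf)
```

In both cases the last cumulative frequency equals Σf, which is a convenient check on the column.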


CUMULATIVE RELATIVE FREQUENCY


The cumulative relative frequency tells the percentage of the data that falls in or below
a particular class. It is simply the sum of the relative frequency of a particular class and
the cumulative relative frequency of the preceding class.
The procedure of obtaining the cumulative relative frequency is the same as that of
the cumulative absolute frequency.

Example: For ungrouped frequency distribution

x    f          f%               Cf%
1    1          9.09%            9.09%      (same as the f% of x = 1)
3    3          27.27%           36.36%     (9.09% + 27.27%)
4    2          18.18%           54.54%     (36.36% + 18.18%)
5    1          9.09%            63.63%     (54.54% + 9.09%)
6    4          36.36%           100.00%    (equal to Σf%)
     Σf = 11    Σf% = 100.00%

Example: For grouped frequency distribution

X B M f Cf f% Cf %
48 – 53 47.5 – 53.5 50.5 6 6 13.33% 13.33%
54 – 59 53.5 – 59.5 56.5 9 15 20.00% 33.33%
60 – 65 59.5 – 65.5 62.5 9 24 20.00% 53.33%
66 – 71 65.5 – 71.5 68.5 7 31 15.56% 68.89%
72 – 77 71.5 – 77.5 74.5 8 39 17.78% 86.67%
78 – 83 77.5 – 83.5 80.5 6 45 13.33% 100.00%
Σ f =45
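
As a cross-check, the cumulative relative frequency can also be computed directly from the cumulative counts, as in the Python sketch below (the function name is our own). This is a slightly different route from the one in the tables above, which add already rounded percentages, so the ungrouped example gives 54.55% and 63.64% here instead of 54.54% and 63.63%. The difference is only accumulated rounding.

```python
from itertools import accumulate

def cumulative_relative(frequencies):
    """Cf% for each class, computed as (cumulative count / n) * 100, rounded to 2 decimals."""
    n = sum(frequencies)
    return [round(cf / n * 100, 2) for cf in accumulate(frequencies)]

ungrouped_cf_pct = cumulative_relative([1, 3, 2, 1, 4])
grouped_cf_pct = cumulative_relative([6, 9, 9, 7, 8, 6])

print(ungrouped_cf_pct)
print(grouped_cf_pct)
```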

Activity 2.3: Cumulative Frequency


Complete the frequency distribution tables in Activity 2.1 and 2.2 by adding cumulative
absolute frequency and cumulative relative frequency columns.

Answer Key

Use the following answers/solutions to check your work. You must send a verification, such
as a photograph of your solutions, to your instructor.

Activity 2.1: Ungrouped Frequency Distribution

X f f%
1 1 3.70%
2 2 7.41%
3 4 14.81%
4 3 11.11%
5 1 3.70%
6 1 3.70%
7 4 14.81%
8 2 7.41%
9 4 14.81%
10 5 18.52%
Σ f =27 Σ f %=100.00 %

Activity 2.2: Grouped Frequency Distribution

n=100
HDV =49.54
LDV =30.07
R=19.47
k = 7 (2^k ≥ n; 2^7 ≥ 100; 128 ≥ 100)

X B M f f%
30 – 32 29.5 – 32.5 31 13 13.00%
33 – 35 32.5 – 35.5 34 19 19.00%
36 – 38 35.5 – 38.5 37 15 15.00%
39 – 41 38.5 – 41.5 40 12 12.00%
42 – 44 41.5 – 44.5 43 12 12.00%
45 – 47 44.5 – 47.5 46 17 17.00%
48 – 50 47.5 – 50.5 49 12 12.00%
Σ f =100 Σ f %=100.00 %

Activity 2.3: Cumulative Frequency

A.
X f f% Cf Cf %
1 1 3.70 1 3.70
2 2 7.41 3 11.11
3 4 14.81 7 25.93
4 3 11.11 10 37.04
5 1 3.70 11 40.74
6 1 3.70 12 44.44
7 4 14.81 16 59.26
8 2 7.41 18 66.67
9 4 14.81 22 81.48
10 5 18.52 27 100.00
Σ f =27 Σ f %=100.00

B.
X f f% Cf Cf %
30 – 32 13 13.00% 13 13.00
33 – 35 19 19.00% 32 32.00
36 – 38 15 15.00% 47 47.00
39 – 41 12 12.00% 59 59.00
42 – 44 12 12.00% 71 71.00
45 – 47 17 17.00% 88 88.00
48 – 50 12 12.00% 100 100.00
Σ f =100 Σ f %=100.00 %
The B and M columns are omitted from this solution.


Lesson 2: Visual Presentation of Quantitative Variables


Learning Outcomes
At the end of this lesson you shall be able to:
1. create and interpret basic and back-to-back stem-and-leaf displays and judge
whether a stem-and-leaf display is appropriate for a given data set;
2. create a histogram based on a grouped frequency distribution; and
3. create and interpret frequency, cumulative frequency and overlaid frequency
polygons.

Discussion
There are many types of graphs that can be used to portray distributions of
quantitative variables. These are some of the graphs that we can use to present
quantitative variables: (1) stem-and-leaf plots, (2) histograms, (3) frequency polygons, (4)
box plots (discussed in a different module), (5) bar charts, (6) line graphs, and (7) scatter
plots (discussed in a different module). Some graph types, such as stem-and-leaf displays,
are best suited for small to moderate amounts of data, whereas others, such as histograms,
are best suited for large amounts of data.

STEM-AND-LEAF PLOTS
Stem-and-leaf plots are a method for showing the frequency with which certain
classes of values occur. You could make a frequency distribution table or a histogram for
the values, or you can use a stem-and-leaf plot and let the numbers themselves show
pretty much the same information.

Basic Stem-and-Leaf Plots


Basic stem-and-leaf plots present the data from a single observation or experiment.

Example:
The data below are the total accumulated scores of 31 basketball players in a
particular league.

37 33 33 32 29 28 28 23
22 22 22 21 21 21 20 20
19 19 18 18 18 18 16 15
14 14 14 12 12 9 6

We will try to construct a stem-and-leaf plot for the data using the steps below.

Step 1: Arrange the data in either ascending or descending order. Since our data is
already arranged in descending order, we will skip this part.


Step 2: Write the stem and leaf column.

Stem Leaf

Step 3: Identify the “stems”. In our given data the stems are the left-most digits (the
tens digits). We will have four stems: 3, 2, 1 and 0 (0 for the data values that
do not have a tens digit). Write down the stems in the stem column.

Stem Leaf
3
2
1
0

Step 4: Write down the “leaves”. The leaves are the ones digits of the data
values belonging to a particular stem. For example, the number 37 belongs to
the “3” stem and has a “leaf” of 7.

Stem | Leaf
3    | 7, 3, 3, 2     (these entries are the data values 37, 33, 33 and 32)
2    | 0, 0, 1, 1, 1, 2, 2, 2, 3, 8, 8, 9
1    | 2, 2, 4, 4, 4, 5, 6, 8, 8, 8, 8, 9, 9
0    | 6, 9

To further understand the concept of a stem-and-leaf plot, let us analyze the solution
to our example. The left portion of our solution contains the stems. They are the numbers 3,
2, 1, and 0. Think of these numbers as tens digits. A stem of 3, for example, can be used to
represent the tens digit in any of the numbers from 30 to 39. The numbers to the right of
the bar are leaves, and they represent the ones digits. Every leaf in the graph therefore stands
for the result of adding the leaf to 10 times its stem.
In the top row, the four leaves to the right of stem 3 are 7, 3, 3, and 2. Combined with
the stem, these leaves represent the numbers 37, 33, 33, and 32, which are the
accumulated scores of the first four players. The next row has a stem of 2 and 12 leaves.
Together, they represent 12 data points, namely, two occurrences of 20, three occurrences
of 21, three occurrences of 22, one occurrence of 23, two occurrences of 28, and one
occurrence of 29. We leave it to you to figure out what the third row represents. The fourth
row has a stem of 0 and two leaves. It stands for the last two entries in our data, namely 9
and 6. (The latter two numbers may be thought of as 09 and 06.)
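
The construction described above can be sketched in Python. The function name `stem_and_leaf` is our own. It splits each score into a tens digit (stem) and a ones digit (leaf) with `divmod`, and lists the leaves of every stem in ascending order, so the top row prints as 2, 3, 3, 7 rather than in the order of the raw data.

```python
from collections import defaultdict

scores = [37, 33, 33, 32, 29, 28, 28, 23,
          22, 22, 22, 21, 21, 21, 20, 20,
          19, 19, 18, 18, 18, 18, 16, 15,
          14, 14, 14, 12, 12, 9, 6]

def stem_and_leaf(values):
    """Map each stem (tens digit) to the sorted list of its leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 37 -> stem 3, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem in sorted(plot, reverse=True):
    print(stem, "|", ", ".join(str(leaf) for leaf in plot[stem]))
```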


One purpose of a stem-and-leaf display is to clarify the shape of the distribution. You
can see many facts about the accumulated scores of the 31 players more easily in the
stem-and-leaf plot than in tabulated data. For example, by looking at the stems and the
shape of the plot, you can tell that most of the players had accumulated scores between 10
and 29, with a few having more and a few having less. The precise accumulated
scores can be determined by examining the leaves.
We can make our figure even more revealing by splitting each stem into two parts.
Figure 2.1 shows the result of dividing each stem into two parts. The top row is reserved for
numbers from 35 up to 39 and holds only the accumulated score of 37 by the first player.
The second row is reserved for the numbers from 30 up to 34 and holds the 32, 33, and 33
scores made by the next three players in the table.

Stem | Leaf
3    | 7
3    | 2, 3, 3
2    | 8, 8, 9
2    | 0, 0, 1, 1, 1, 2, 2, 2, 3
1    | 5, 6, 8, 8, 8, 8, 9, 9
1    | 2, 2, 4, 4, 4
0    | 6, 9
Figure 2.1: Stem-and-leaf plot of the Accumulated
Scores of 31 basketball players
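
A split-stem plot like Figure 2.1 can be sketched in Python as follows. The function name and the (stem, upper half) key are our own choices: leaves 5 to 9 go to the upper half of a stem and leaves 0 to 4 to the lower half.

```python
scores = [37, 33, 33, 32, 29, 28, 28, 23,
          22, 22, 22, 21, 21, 21, 20, 20,
          19, 19, 18, 18, 18, 18, 16, 15,
          14, 14, 14, 12, 12, 9, 6]

def split_stem_and_leaf(values):
    """Map (stem, is_upper_half) to sorted leaves; the upper half holds leaves 5 to 9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault((stem, leaf >= 5), []).append(leaf)
    return rows

rows = split_stem_and_leaf(scores)
# Sorting the keys in reverse lists each stem's upper half before its lower half.
for stem, upper in sorted(rows, reverse=True):
    print(stem, "|", ", ".join(str(leaf) for leaf in rows[(stem, upper)]))
```

Only halves that actually contain data are printed, which is why stem 0 appears once, as in the figure.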

Activity 2.4: Basic Stem-and-Leaf Plot 1

Complete a basic stem-and-leaf plot for the following scores in a 100-item Science test.

73 42 67 78 99 84 91 82 86 94

Example:
The reaction times of students engaged in mobile gaming were measured. The results,
in seconds, are given below. Construct a stem-and-leaf plot for the given data.

7.6 8.1 9.2 6.8 5.9 6.2 6.1
5.8 7.3 8.1 8.8 7.4 7.7 8.2

Solution:
Arrange the data in ascending order.

5.8, 5.9, 6.1, 6.2, 6.8, 7.3, 7.4, 7.6, 7.7, 8.1, 8.1, 8.2, 8.8, 9.2


Identify the stems. In this data set the stems are the one’s digit, hence 5, 6, 7, 8 and 9.
Write these numbers in the stem column.

Stem Leaf
5
6
7
8
9

Complete the plot by adding the leaves. The leaves are the tenths digits of the data
values belonging to a particular stem. We will put an 8 and a 9 in the “5 stem” to
represent 5.8 and 5.9. Do this to complete the leaf column.

Stem Leaf
5 8, 9
6 1, 2, 8
7 3, 4, 6, 7
8 1, 1, 2, 8
9 2
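
The same steps can be checked with a short Python sketch. It is a minimal illustration under stated assumptions: the function name stem_and_leaf is hypothetical, and every value is assumed to have exactly one decimal place, so the ones digit is the stem and the tenths digit is the leaf.

```python
# Reaction times in seconds, as given in the example.
times = [7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1,
         5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2]

def stem_and_leaf(values):
    """Return {stem: [leaves]} for values with one decimal place."""
    plot = {}
    for value in sorted(values):             # arrange in ascending order first
        # Work in whole tenths so floating-point rounding cannot shift a digit.
        stem, leaf = divmod(round(value * 10), 10)
        plot.setdefault(stem, []).append(leaf)
    return plot

plot = stem_and_leaf(times)

print("Stem Leaf")
for stem in sorted(plot):
    print(stem, ", ".join(str(leaf) for leaf in plot[stem]))
```

A stem that has no data values would not appear in this output; when drawing the plot by hand, such a stem is still written with an empty leaf row.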

Activity 2.5: Basic Stem-and-Leaf Plot 2

A study was conducted to determine the average size of kidney stones in patients
35 years old and above who have a sedentary lifestyle. The data, in centimeters, are given
below:

0.55 1.21 0.59 1.09 1.30 1.35 0.85 0.64 0.91


0.67 1.30 0.82 1.06 0.76 1.13 0.65 1.34 0.50
1.17 1.12 1.09 0.61 0.80 1.20 1.26 1.20 0.64
1.02 0.89 1.12 0.88 0.58 0.51 1.20 1.12 0.85
1.38

Construct a stem-and-leaf plot for the given data.

Hint: Use the key: 0(5) | 1 is read as 0.51 and 1(1) | 2 is read as 1.12

Back-to-Back Stem-and-Leaf Plot


There is a variation of stem and leaf displays that is useful for comparing
distributions. The two distributions are placed back to back along a common column of
stems. The result is a “back-to-back stem-and-leaf plot.”
Example:
The following are the scores of two classes, Class A and Class B, on a 50-item test.
Construct a back-to-back stem-and-leaf plot of the test results.


Class A Class B

29 14 19 22 23 33 32 31 37 16 13 33
37 34 17 42 37 16 15 29 26 29 37 24
32 21 34 10 41 25 32 25 34 24 20 31
8 17 17 25 33 44 29 16 17 17 39 35
16 22 11 11 11 11

Step 1: Arrange the data values of both sets in the same order. For our example we
will arrange the data values in ascending order.

Class A 8, 10, 14, 16, 17, 17, 17, 19, 21, 22, 23, 25, 25, 29, 32, 33, 34, 34,
37, 37, 41, 42

Class B 13, 15, 16, 16, 16, 17, 17, 20, 22, 24, 24, 25, 26, 29, 29, 29, 31, 31,
32, 32, 33, 34, 35, 37, 37, 39

Step 2: Set up the stem and leaf columns. Since we have two data sets, we need three
columns. The middle column is for the stems and the remaining two columns
are for the leaves of the two data sets.

Class A Stem Class B

Step 3: Determine a suitable stem for the data. In our example, we use the tens
digit as the stem, hence we have 0, 1, 2, 3, and 4. Write these numbers in the
stem column.


Class A Stem Class B


0
1
2
3
4

Step 4: Write the leaves for each side.

Class A Stem Class B


8 0
0, 4, 6, 7, 7, 7, 9 1 7, 7, 6, 6, 6, 5, 3
1, 2, 3, 5, 5, 9 2 9, 9, 9, 6, 5, 4, 4, 2, 0
2, 3, 4, 4, 7, 7 3 9, 7, 7, 5, 4, 3, 2, 2, 1, 1
1, 2 4

The plot shows that Class A has both the lowest overall score (8) and the highest
overall score (42). Comparing the two sides, we can see that the scores in Class B are mostly
concentrated at the “3 stem,” while the scores in Class A are spread fairly evenly over the
1, 2, and 3 stems. Based on this comparison, and without considering other factors, we can
say that Class B generally performed better on the test than Class A.
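
Steps 1 to 4 can be sketched in Python as follows. This is only an illustration: the function names leaves_by_stem and back_to_back are hypothetical, and the data are the sorted lists from Step 1. The sketch follows the common convention that leaves on both sides increase as they move away from the stem, so the leaves may appear in a different order than in a hand-drawn plot.

```python
# Sorted scores from Step 1.
class_a = [8, 10, 14, 16, 17, 17, 17, 19, 21, 22, 23, 25, 25, 29,
           32, 33, 34, 34, 37, 37, 41, 42]
class_b = [13, 15, 16, 16, 16, 17, 17, 20, 22, 24, 24, 25, 26, 29, 29, 29,
           31, 31, 32, 32, 33, 34, 35, 37, 37, 39]

def leaves_by_stem(values):
    """Return {tens digit: [ones digits in ascending order]}."""
    groups = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups.setdefault(stem, []).append(leaf)
    return groups

def back_to_back(left, right):
    """Return one (left leaves, stem, right leaves) row per stem."""
    left_groups = leaves_by_stem(left)
    right_groups = leaves_by_stem(right)
    lowest = min(min(left_groups), min(right_groups))
    highest = max(max(left_groups), max(right_groups))
    rows = []
    for stem in range(lowest, highest + 1):
        # Reverse the left side so its leaves grow away from the stem.
        left_text = " ".join(str(leaf) for leaf in reversed(left_groups.get(stem, [])))
        right_text = " ".join(str(leaf) for leaf in right_groups.get(stem, []))
        rows.append((left_text, stem, right_text))
    return rows

rows = back_to_back(class_a, class_b)
width = max(len(left) for left, _, _ in rows)
print(f"{'Class A':>{width}} | Stem | Class B")
for left, stem, right in rows:
    print(f"{left:>{width}} |  {stem}   | {right}")
```

Counting the leaves on each side of a stem gives the same comparison described above: Class B has most of its scores on the 3 stem, while Class A is spread over the 1, 2, and 3 stems.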

Activity 2.6: Back-to-Back Stem-and-Leaf Plot

The following are the results of a reaction time test on two groups of teenagers. One
group is composed of students who play mobile games and the other of students who do
not. The results of the test are shown in the table below:

Mobile Gamers Non-Mobile Gamers

2.09 1.67 1.55 1.53 1.86 1.70 1.32 1.40 1.97 1.74 1.39 1.73
1.68 2.15 1.52 1.93 1.82 1.71 1.56 1.77 1.50 1.88 1.43 1.91
1.83 1.60 1.89 1.82 1.57 2.00 1.34 1.96 1.84 1.53 1.42 1.49
1.94 1.90 1.82 0.00 0.00 0.00

Construct a back-to-back stem-and-leaf plot

Hint: Use the key: 1(2) | 3 is read as 1.23

HISTOGRAMS
A histogram is a plot that lets you discover, and show, the underlying frequency
distribution (shape) of a set of continuous data, so that the data can be inspected for
its underlying distribution. In a histogram, it is the area of the bar that indicates the
frequency of occurrences for each class. This means that the height of the bar does not
necessarily indicate how many scores fall within each class; it is the height multiplied
by the width of the class that gives the frequency for that class. The height of the bars
is often mistaken for the frequency because many histograms have classes of equal width,
and in that case the height of each bar does reflect the frequency.
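
To see why area, not height, carries the frequency, consider the small Python sketch below. The three classes and their frequencies are made-up numbers chosen only for illustration, and bar_height is a hypothetical helper name. The third class is three times as wide as the others.

```python
# Hypothetical classes: (lower boundary, upper boundary, frequency).
classes = [
    (0, 10, 20),
    (10, 20, 30),
    (20, 50, 30),   # three times as wide as the other two classes
]

def bar_height(lower, upper, frequency):
    """Height that makes the bar's area (height x width) equal to the frequency."""
    return frequency / (upper - lower)

for lower, upper, frequency in classes:
    width = upper - lower
    height = bar_height(lower, upper, frequency)
    print(f"{lower}-{upper}: width={width}, height={height}, area={height * width}")
```

The second and third classes have the same frequency (30), yet the third bar is only one third as tall because it is three times as wide. When all classes share the same width, dividing by the width changes every bar by the same factor, which is why the height can then be read directly as the frequency.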

Example:
The frequency distribution table below shows the scores of 642 students on a
psychology test. The test consists of 197 items, each graded as "correct" or "incorrect." The
students' scores ranged from 46 to 167. Table 2.5 contains only the interval (X), class
boundary (B), and frequency (f) columns because this is the only information we need to
construct a histogram.

Table 2.5: Distribution of scores of 642 students in a 197-item psychology test.

X            B                 f
40 – 49      39.5 – 49.5       3
50 – 59      49.5 – 59.5       10
60 – 69      59.5 – 69.5       53
70 – 79      69.5 – 79.5       107
80 – 89      79.5 – 89.5       147
90 – 99      89.5 – 99.5       130
100 – 109    99.5 – 109.5      78
110 – 119    109.5 – 119.5     59
120 – 129    119.5 – 129.5     36
130 – 139    129.5 – 139.5     11
140 – 149    139.5 – 149.5     6
150 – 159    149.5 – 159.5     1
160 – 169    159.5 – 169.5     1
                               Σf = 642

Step 1: Draw the axes of your histogram. For the x-axis use the class boundaries so
that there will be no gaps between the bars later on. The y-axis is for the
frequencies. Choose a suitable distance between the points on your y-axis. For
our example we will use a 10-unit distance between each point.

[Figure: blank axes for the histogram. The y-axis (Frequencies) is marked from 10 to 150 in steps of 10; the x-axis (Class Boundaries) is marked from 39.5 to 179.5 in steps of 10.]

Step 2: For each class, mark the height of its frequency above its two class
boundaries. Use the distribution table as a guide.

[Figure note: the marked line represents the intersection of the frequency (3) of the
interval 40 – 49 and the class boundaries of the same interval.]


Step 3: Enclose the marked intersections with bars.

The histogram makes it plain that most of the scores are in the middle of the
distribution, with fewer scores in the extremes. You can also see that the
distribution is not symmetric: the scores extend to the right farther than they do to
the left.
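
The three steps can be mirrored in a short Python sketch that turns Table 2.5 into bars and prints a rough text histogram. This is an illustration only, not the Excel procedure: the function name histogram_bars is hypothetical, and each "#" stands for about 5 students, a scale chosen here just to keep the rows short.

```python
# Class boundaries 39.5, 49.5, ..., 169.5 and frequencies from Table 2.5.
boundaries = [39.5 + 10 * i for i in range(14)]
frequencies = [3, 10, 53, 107, 147, 130, 78, 59, 36, 11, 6, 1, 1]

def histogram_bars(boundaries, frequencies):
    """Return (left boundary, right boundary, height) for each bar.

    Adjacent bars share a boundary, so the bars touch with no gaps.
    """
    return [(boundaries[i], boundaries[i + 1], f) for i, f in enumerate(frequencies)]

bars = histogram_bars(boundaries, frequencies)

# Rough text histogram: one '#' per 5 students (rounded).
for left, right, height in bars:
    print(f"{left:>5} - {right:<5} | {'#' * round(height / 5)} {height}")

# The tallest bar marks the class with the most scores.
modal_bar = max(bars, key=lambda bar: bar[2])
print("Tallest bar:", modal_bar)
```

The tallest bar is the 79.5 to 89.5 class. Four bars lie to its left and eight to its right, which is the asymmetry described above.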
We can also create histograms using Microsoft Excel. To learn the procedure, as
well as how to activate MS Excel’s Data Analysis ToolPak for Office 2003, 2007, 2010,
and 2019 (Office 365), go to the links provided below. Select only the video
appropriate for the Office version that you have. You can also access these files
in your Google Classroom:

1. Activating Data Analysis ToolPak Office 2003:
   File Name: Activating Data Analysis Add-in in Excel 2003
   Link: https://bit.ly/3cfHM8k

2. Activating Data Analysis ToolPak Office 2007:
   File Name: Activating Data Analysis Add-in in Excel 2007
   Link: https://bit.ly/2RGPSh3

3. Activating Data Analysis ToolPak Office 2010 and 2016:
   File Name: Activating Data Analysis Add-in in Excel 2010, 2016
   Link: https://bit.ly/35SuaPf

4. Activating Data Analysis ToolPak Office 2018 and 2019 (Office 365):
   File Name: Activating Data Analysis Add-in in Excel 2018, 2019 (365)
   Link: https://bit.ly/2FEghtz

5. Creating histogram in MS Excel:
   File Name: Constructing Histogram Using MS Excel
   Link: https://bit.ly/3mBbp8V

Activity 2.7: Histogram

Construct a histogram manually using the data given below:

X B f
5–9 4.5 – 9.5 5
10 – 14 9.5 – 14.5 11
15 – 19 14.5 – 19.5 15
20 – 24 19.5 – 24.5 27
25 – 29 24.5 – 29.5 78
30 – 34 29.5 – 34.5 63
35 – 39 34.5 – 39.5 11
40 – 44 39.5 – 44.5 8

Construct a histogram of the data above using MS Excel. The data values are in the
file at the given link (you can also download it from your Google Classroom):
File Name: Module 2 - Lesson 2 Activity 2.7 Excel File
Link: https://bit.ly/2RGIDpc


FREQUENCY POLYGON
Frequency polygons are a graphical device for understanding the shapes of
distributions. (They should not be confused with ogives, which plot cumulative
frequencies.) They serve the same purpose as histograms, but are especially
helpful for comparing sets of data.
A frequency polygon is a graph constructed by using line segments to join points
plotted at the midpoint of each interval, or bin. The heights of the points represent the
frequencies. A frequency polygon can be created from the histogram or by calculating the
midpoints of the bins from the frequency distribution table.

Example:
We will use the same table from the previous example and construct a frequency
polygon for it. But for this example, we are going to add the midpoint (M ) column.

Table 2.5: Distribution of scores of 642 students in a 197-item psychology test.

X            B                 M        f
40 – 49      39.5 – 49.5       44.5     3
50 – 59      49.5 – 59.5       54.5     10
60 – 69      59.5 – 69.5       64.5     53
70 – 79      69.5 – 79.5       74.5     107
80 – 89      79.5 – 89.5       84.5     147
90 – 99      89.5 – 99.5       94.5     130
100 – 109    99.5 – 109.5      104.5    78
110 – 119    109.5 – 119.5     114.5    59
120 – 129    119.5 – 129.5     124.5    36
130 – 139    129.5 – 139.5     134.5    11
140 – 149    139.5 – 149.5     144.5    6
150 – 159    149.5 – 159.5     154.5    1
160 – 169    159.5 – 169.5     164.5    1
                                        Σf = 642

Step 1: Create the x- and y-axes of your graph. The x-axis contains the
midpoints of the classes or intervals and the y-axis is for the frequencies.
It is good practice to include one extra midpoint below the first class
interval (34.5) and one above the last class interval (174.5). These midpoints
(with 0 frequencies) are needed to close the graph, making it a polygon.
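
The midpoints and the two closing points can be computed with a short Python sketch. It is a minimal illustration: the function name polygon_points is hypothetical, and the sketch assumes that all classes have the same width, as they do in Table 2.5.

```python
# Class limits 40-49, 50-59, ..., 160-169 and frequencies from Table 2.5.
class_limits = [(40 + 10 * i, 49 + 10 * i) for i in range(13)]
frequencies = [3, 10, 53, 107, 147, 130, 78, 59, 36, 11, 6, 1, 1]

def polygon_points(class_limits, frequencies):
    """Return the (midpoint, frequency) points of a closed frequency polygon."""
    # Midpoint M of each class: (lower limit + upper limit) / 2.
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    width = midpoints[1] - midpoints[0]          # assumes equal class widths
    # Add one zero-frequency midpoint on each side to close the polygon.
    xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

points = polygon_points(class_limits, frequencies)
for midpoint, frequency in points:
    print(midpoint, frequency)
```

The first and last points printed are the added midpoints with a frequency of 0; connecting all the points in order gives the polygon built in the following steps.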


[Figure note: the first and last midpoints on the x-axis are NOT included in the
frequency distribution table. The f at these classes is 0. We added these midpoints
to close the polygon.]

Step 2: Put a point (or a dot) on the intersection of the midpoints and the
frequencies for each of the classes.


Step 3: Connect the points with a solid line.



You can also create a frequency polygon from an existing histogram. Just mark the
midpoint of the top of each bar, then connect the marks with line segments.


Activity 2.8: Frequency Polygon

Construct a frequency polygon using the data given below:

X B f
5–9 4.5 – 9.5 5
10 – 14 9.5 – 14.5 11
15 – 19 14.5 – 19.5 15
20 – 24 19.5 – 24.5 27
25 – 29 24.5 – 29.5 78
30 – 34 29.5 – 34.5 63
35 – 39 34.5 – 39.5 11
40 – 44 39.5 – 44.5 8
