MODULE-2 Descriptive Statistics
2021-2022
Summer Term
Instructor: Mary Jane Bugarin - Tolentino, LPT
Last time, you were introduced to some basic concepts in statistics: the basic terms and
definitions, the different sampling techniques and ways of gathering data, ways of presenting
data using frequency tables, and the vital statistical rates and ratios. Data may be organized
using tables, while a summary of the data may be displayed using graphs and charts.
Another method of summarizing data is to compute numbers, such as an average, that
describe a set of data. Numbers that are used to describe sets of data are called descriptive
measures. We also intend to analyze a set of data by describing its spread or dispersion. All of
these are discussed in this module.
Topics
Schedule:
Instructional Materials:
• ULS CLMS
• Pre-recorded lectures
• PDF files
LEARNING OUTCOMES
By the end of this module, you should be able to:
1. find the measures of central tendency and variability of both ungrouped and grouped data;
2. describe data by using measures of central tendency and measures of variability; and
3. interpret Likert-type questions by applying measures of central tendency.
MOTIVATION
Ever wonder how your grades were computed? How the so-called General Weighted Average, a
crucial figure for students vying for Latin honors, is obtained? An average is a value around which
the other values in a set of data tend to cluster. In this module, we will discuss the averages, or
measures of central tendency: the mean, the median, and the mode.
Aside from tables and graphs, another way of describing a set of data is by stating a single numerical value
associated with it. This value is where all the other values in a distribution tend to cluster. It is called the average or
measure of central tendency. There are three kinds of average: the mean, the median, and the mode.
Mean
The mean (also known as the arithmetic mean)
➢ It is the sum of measures divided by the number of measures in a variable.
➢ It is symbolized as 𝑥̅ (read as x bar).
➢ Used to describe a set of data where the measures cluster or concentrate at a point. As the measures
cluster around each other, a single value appears to represent distinctively the total measures.
➢ It is, however, affected by extreme measures, that is, very high or very low measures can easily
change the value of the mean.
To find the mean of ungrouped data:
$\bar{x} = \dfrac{\sum x}{n}$
Where:
$\sum x$ = the summation of x (sum of the measures)
$n$ = number of values of x
To find the weighted mean:
$\bar{x}_{wm} = \dfrac{\sum fx}{N}$
Where:
$\sum fx$ = summation of the product of the frequency (f) and score (x)
$N$ = total frequency
Example 1: The grades in Chemistry of 10 students are 87, 84, 85, 85, 86, 90, 79, 82, 78, 76.
What is the average grade of the 10 students?
Solution:
$\bar{x} = \dfrac{87 + 84 + 85 + 85 + 86 + 90 + 79 + 82 + 78 + 76}{10} = \dfrac{832}{10} = 83.2$
Example 2: Find the mean salary for a small company that pays monthly salaries to its employees as shown
in the frequency distribution.
Salary (x)        Number of Employees (f)    fx
Php7,000.00        8                          56,000
Php8,000.00       11                          88,000
Php9,250.00       14                         129,500
Php10,500.00       9                          94,500
Php17,000.00       2                          34,000
Php25,000.00       1                          25,000
Total             45                         ∑fx = 427,000
Solution:
$\bar{x} = \dfrac{\sum fx}{N} = \dfrac{427\,000}{45} = 9\,488.89$
The average monthly salary that the company pays is Php9,488.89.
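The two mean formulas above can be checked with a few lines of Python. This is a minimal sketch using the data from Examples 1 and 2; the variable names are illustrative and not part of the module.

```python
# Ungrouped mean: sum of the measures divided by the number of measures.
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]
mean_grade = sum(grades) / len(grades)

# Weighted mean: sum of f*x divided by the total frequency N.
salaries = [7000, 8000, 9250, 10500, 17000, 25000]   # x
employees = [8, 11, 14, 9, 2, 1]                      # f
total_fx = sum(f * x for f, x in zip(employees, salaries))
n_total = sum(employees)
weighted_mean = total_fx / n_total

print(mean_grade)
print(total_fx, n_total, round(weighted_mean, 2))
```

The weighted mean treats each salary as if it were listed once per employee, which is why the divisor is the total frequency and not the number of salary levels.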
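Median
The median, symbolized as $\tilde{x}$, is the middle value when the measures are arranged in increasing or
decreasing order. If the number of measures is even, the median is the average of the two middle measures.
Example 1: The number of books borrowed in the library from Monday to Friday last week were 58, 60,
54, 35, and 97 respectively. Find the median.
Solution: Arrange the number of books borrowed in increasing order: 35, 54, 58, 60, 97.
$\tilde{x} = 58$
The median is 58.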
Example 2: Cora’s quizzes for the second quarter are 8, 7, 6, 10, 9, 5, 9, 6, 10, and 7. Find the median.
Solution: Arrange the scores in increasing order: 5, 6, 6, 7, 7, 8, 9, 9, 10, 10.
Since the number of measures is even, the median is the average of the two middle scores.
$\tilde{x} = \dfrac{7 + 8}{2} = 7.5$
The median score of Cora’s quizzes is 7.5.
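As a quick check of the two median examples, the sketch below uses `statistics.median` from the Python standard library, which sorts the data and averages the two middle values when the count is even. The variable names are illustrative.

```python
from statistics import median

books = [58, 60, 54, 35, 97]                  # odd number of measures
quizzes = [8, 7, 6, 10, 9, 5, 9, 6, 10, 7]    # even number of measures

median_books = median(books)      # middle value of the sorted list
median_quizzes = median(quizzes)  # average of the two middle values

print(median_books, median_quizzes)
```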
Mode
Mode is the measure or value which occurs most frequently in a set of data. It is the value with the greatest
frequency.
To find the mode of a given set of data, take note of the following:
➢ Select the measure that appears most often in the set.
➢ If two or more measures appear the same number of times, and that frequency is greater than the
frequency of any other measure, then each of these values is a mode.
➢ If every measure appears the same number of times, then the set of data has no mode.
➢ The distribution is unimodal if there is only one mode, bimodal if there are two, and trimodal if there are three.
The mode uses the symbol $\hat{x}$.
Example 1: The shoe sizes of 10 randomly selected students in a class are 6, 5, 4, 6, 4, 5, 6, 7, 7 and 6.
What is the mode?
Answer: $\hat{x} = 6$. The mode is 6 since it is the shoe size that occurred the greatest number of times.
Example 2: The sizes of 9 classes in a certain school are 50, 52, 55, 49, 51, 54, 55, 53 and 54.
What is the mode?
Answer: $\hat{x} = 54$ and $55$. The modes are 54 and 55 since the two measures occurred the same number
of times, more often than any other measure. The distribution is bimodal.
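The mode examples can be reproduced with `statistics.multimode` (Python 3.8 or later), which returns every value that shares the highest frequency. One caveat on this sketch: when every measure appears equally often, `multimode` returns all of them, whereas the rule above says such a set has no mode, so that case would need a separate check.

```python
from statistics import multimode

shoe_sizes = [6, 5, 4, 6, 4, 5, 6, 7, 7, 6]
class_sizes = [50, 52, 55, 49, 51, 54, 55, 53, 54]

# sorted() only makes the order of the reported modes predictable.
shoe_modes = sorted(multimode(shoe_sizes))
class_modes = sorted(multimode(class_sizes))

print(shoe_modes)    # one mode: unimodal
print(class_modes)   # two modes: bimodal
```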
Example: Compute the mean of the scores of the students in a Mathematics test.
Class Frequency
46 – 50 1
41 – 45 5
36 – 40 11
31 – 35 12
26 – 30 11
21 – 25 5
16 – 20 2
11 – 15 1
The frequency distribution for the data is given below. The columns $x_m$ and $fx_m$ are added.
Class      f     $x_m$    $fx_m$
46 – 50     1     48       48
41 – 45     5     43      215
36 – 40    11     38      418
31 – 35    12     33      396
26 – 30    11     28      308
21 – 25     5     23      115
16 – 20     2     18       36
11 – 15     1     13       13
          N = 48         $\sum (fx_m) = 1549$
To find $x_m$ (the class mark):
$x_m = \dfrac{\text{lower limit} + \text{upper limit}}{2}$
Example: in 46 – 50, $x_m = \dfrac{46 + 50}{2} = 48$
$\bar{x} = \dfrac{\sum (fx_m)}{N} = \dfrac{1549}{48} = 32.27$
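The class-mark computation for this example can be sketched in Python as follows. Each class is written as a hypothetical `(lower limit, upper limit, frequency)` tuple; that layout is an assumption made for illustration only.

```python
# (lower limit, upper limit, frequency) for each class interval
classes = [(46, 50, 1), (41, 45, 5), (36, 40, 11), (31, 35, 12),
           (26, 30, 11), (21, 25, 5), (16, 20, 2), (11, 15, 1)]

n_total = sum(f for _, _, f in classes)

# Class mark x_m = (lower limit + upper limit) / 2; accumulate f * x_m.
sum_fxm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)

grouped_mean = sum_fxm / n_total
print(n_total, sum_fxm, round(grouped_mean, 2))
```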
The mean of grouped data can also be computed using coded deviations from an assumed mean:
$\bar{x} = x_0 + \left[\dfrac{\sum fx_c}{N}\right] i$
Where: $\bar{x}$ = mean
$x_0$ = assumed mean
$f$ = frequency
$x_c$ = coded value
$N$ = total frequency
$i$ = size of the class interval
➢ Any class mark can be taken as the assumed mean. The class containing $x_0$ is given a deviation
of 0.
➢ Consecutive positive integers are then assigned to the classes with higher values, and consecutive
negative integers to the classes with lower values.
Example: Compute the mean of the scores of the students in a Mathematics test.
Class      Frequency
46 – 50     1
41 – 45     5
36 – 40    11
31 – 35    12
26 – 30    11
21 – 25     5
16 – 20     2
11 – 15     1
The frequency distribution for the data is given below. The columns $x_m$, $x_c$ and $fx_c$ are added.
Solution:
Class      f     $x_m$    $x_c$    $fx_c$
46 – 50     1     48       3        3
41 – 45     5     43       2       10
36 – 40    11     38       1       11
31 – 35    12     33       0        0
26 – 30    11     28      -1      -11
21 – 25     5     23      -2      -10
16 – 20     2     18      -3       -6
11 – 15     1     13      -4       -4
          N = 48                  $\sum (fx_c) = -7$
To find $x_c$:
• Choose any class mark to be the assumed mean (in this case we choose 33).
• The coded value ($x_c$) assigned to 33 is zero (0).
• Consecutive positive integers are assigned, starting from zero (0), to the classes with higher values.
• Consecutive negative integers are assigned, starting from zero (0), to the classes with lower values.
• Be mindful of the arrangement of the class intervals, whether descending or ascending.
$\bar{x} = x_0 + \left[\dfrac{\sum (fx_c)}{N}\right] i = 33 + \left(\dfrac{-7}{48}\right) 5 = 32.27$
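The coded-deviation method can be sketched the same way. The code below assumes equal class sizes and integer class marks, and it derives each coded value as (class mark − assumed mean) ÷ class size, which reproduces the integers assigned by hand; the names are illustrative.

```python
# (lower limit, upper limit, frequency); equal class size is assumed
classes = [(46, 50, 1), (41, 45, 5), (36, 40, 11), (31, 35, 12),
           (26, 30, 11), (21, 25, 5), (16, 20, 2), (11, 15, 1)]
width = 5            # i, the size of the class interval
assumed_mean = 33    # x_0, the class mark of 31 - 35

sum_fxc = 0
n_total = 0
for lo, hi, f in classes:
    class_mark = (lo + hi) // 2
    coded = (class_mark - assumed_mean) // width   # x_c: ..., -1, 0, 1, ...
    sum_fxc += f * coded
    n_total += f

coded_mean = assumed_mean + (sum_fxc / n_total) * width
print(sum_fxc, n_total, round(coded_mean, 2))
```

Choosing a different class mark as the assumed mean changes the coded values but not the final mean, which is a convenient way to check a hand computation.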
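The frequency distribution for the data is given below. The columns for the lower boundary (lb) and the
“less than” cumulative frequency (“<” cf) are added. The classes may be arranged from highest to lowest:
Class      f     lb      “<” cf
46 – 50     1    45.5    48
41 – 45     5    40.5    47
36 – 40    11    35.5    42
31 – 35    12    30.5    31
26 – 30    11    25.5    19
21 – 25     5    20.5     8
16 – 20     2    15.5     3
11 – 15     1    10.5     1
or from lowest to highest:
Class      f     lb      “<” cf
11 – 15     1    10.5     1
16 – 20     2    15.5     3
21 – 25     5    20.5     8
26 – 30    11    25.5    19
31 – 35    12    30.5    31
36 – 40    11    35.5    42
41 – 45     5    40.5    47
46 – 50     1    45.5    48
Compute $\frac{N}{2}$. Since $\frac{N}{2} = \frac{48}{2} = 24$, the class interval containing the 24th measure in the “less than”
cumulative frequency is the median class. In this case, the median class is the 31 – 35 class interval.
$\tilde{x} = x_{lb} + \left[\dfrac{\frac{N}{2} - cf_b}{f_m}\right] i$
Here $x_{lb} = 30.5$ is the lower boundary of the median class, $cf_b = 19$ is the cumulative frequency of the
class just below the median class, $f_m = 12$ is the frequency of the median class, and $i = 5$ is the class size.
$\tilde{x} = 30.5 + \left[\dfrac{24 - 19}{12}\right] 5 = 32.58$
The median score is 32.58.
∴ 50% of the students got a score above 32.58, and 50% of the students got a score below 32.58.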
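The grouped-median steps can be sketched as a single pass over the classes in ascending order. The sketch assumes integer class limits, so each lower boundary is the lower limit minus 0.5; the tuple layout and names are illustrative.

```python
# (lower limit, upper limit, frequency), arranged from lowest to highest
classes = [(11, 15, 1), (16, 20, 2), (21, 25, 5), (26, 30, 11),
           (31, 35, 12), (36, 40, 11), (41, 45, 5), (46, 50, 1)]
width = 5                                   # i, the class size
half = sum(f for _, _, f in classes) / 2    # N / 2

cf_below = 0          # "less than" cumulative frequency before the current class
grouped_median = None
for lo, hi, f in classes:
    if cf_below + f >= half:                # this class contains the N/2-th measure
        lower_boundary = lo - 0.5
        grouped_median = lower_boundary + (half - cf_below) / f * width
        break
    cf_below += f

print(cf_below, round(grouped_median, 2))
```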
$\hat{x} = x_{lb} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] i$
where: $x_{lb}$ = lower boundary of the modal class.
$\Delta_1$ = difference between the frequencies of the modal class and the next lower class.
$\Delta_2$ = difference between the frequencies of the modal class and the next upper class.
$i$ = size of the class interval.
The modal class is the class interval with the highest frequency.
Example:
Class      f     lb
46 – 50     1    45.5
41 – 45     5    40.5
36 – 40    11    35.5
31 – 35    12    30.5
26 – 30    11    25.5
21 – 25     5    20.5
16 – 20     2    15.5
11 – 15     1    10.5
Since class 31 – 35 has the highest frequency, the modal class is 31 – 35.
$\hat{x} = x_{lb} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] i$
$= 30.5 + \left[\dfrac{12 - 11}{(12 - 11) + (12 - 11)}\right] 5$
$= 30.5 + [0.5]\,5$
$= 30.5 + 2.5$
$= 33$
The modal score is 33.
Reminder: take note of the arrangement of the class interval if it is from highest to lowest or from
lowest to highest.
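The grouped-mode formula can be sketched in Python as below. The classes are listed from lowest to highest so that the "next lower" and "next upper" classes are simply the neighbors in the list. Treating a missing neighbor as frequency 0 and taking the first class when several share the highest frequency are assumptions of this sketch, not rules stated in the module.

```python
# (lower limit, upper limit, frequency), arranged from lowest to highest
classes = [(11, 15, 1), (16, 20, 2), (21, 25, 5), (26, 30, 11),
           (31, 35, 12), (36, 40, 11), (41, 45, 5), (46, 50, 1)]
width = 5   # i, the class size

freqs = [f for _, _, f in classes]
k = freqs.index(max(freqs))                 # position of the modal class

f_lower = freqs[k - 1] if k > 0 else 0      # next lower class
f_upper = freqs[k + 1] if k < len(freqs) - 1 else 0   # next upper class
delta1 = freqs[k] - f_lower
delta2 = freqs[k] - f_upper

lower_boundary = classes[k][0] - 0.5        # integer class limits assumed
grouped_mode = lower_boundary + delta1 / (delta1 + delta2) * width
print(grouped_mode)
```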
The table below is the summary and interpretation of the mean responses to Likert-type statements 1 - 3.
The columns 5 to 1 give the number of respondents who chose each scale value.
Statement    5     4     3     2     1     $\bar{x}$     Interpretation of $\bar{x}$
1.          36    51    18     0     1     4.14    Agree
2.          18    44    37     8     1     3.65    Agree
3.          18    48    28     0     1     3.86    Agree
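Each mean in the table is a weighted mean, with the scale values 5 to 1 as the scores and the response counts as the frequencies. The sketch below recomputes them; the dictionary layout is illustrative, and the verbal interpretation is left out because the module's interpretation scale is not shown here.

```python
scale = [5, 4, 3, 2, 1]
responses = {              # statement number -> counts for 5, 4, 3, 2, 1
    1: [36, 51, 18, 0, 1],
    2: [18, 44, 37, 8, 1],
    3: [18, 48, 28, 0, 1],
}

likert_means = {}
for statement, counts in responses.items():
    total_fx = sum(x * f for x, f in zip(scale, counts))
    likert_means[statement] = round(total_fx / sum(counts), 2)

print(likert_means)
```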
The earlier part of this module dealt with the concepts of measurements that describe the middle or the center of the
distribution. But the measures of central tendency do not describe how the observations spread out from the center
of the distribution. In this lesson, we will deal with the measures of variability or spread of a distribution.
Definition
MEASURES OF VARIABILITY OR DISPERSION
These are measures of the average distance of each observation from the center of the
distribution. They measure the homogeneity or heterogeneity of a particular group.
Consider the following measurements, in liters, for two samples of apple juice in tetra packs filled by companies A
and B.
Sample A Sample B
0.95 1.06
1.00 1.01
0.92 0.88
1.03 0.91
1.10 1.14
𝑥̅ = 1.00 𝑥̅ = 1.00
[Figure: the Company A and Company B measurements plotted on a common scale from 0.8 to 1.2 liters]
Both samples have the same mean, 1.00 liters. It is quite obvious that company A packed apple juice with a
more uniform content than company B. We say that the variability or the dispersion of the observations from the
mean is less for sample A than for sample B. Therefore, in buying apple juice, we would feel more confident that the
tetra pack we select will be closer to the advertised mean if we buy from company A.
Range
The range is the simplest measure of variability. It is the difference between the largest and smallest
measurement.
R = H - L
The main disadvantage of the range is that it does not consider every measure in the data.
Examples:
1. The IQs of 5 members of a family are 108, 112, 127, 118 and 113. Find the range.
R = H - L = 127 - 108 = 19
2. The range of each of the sets of scores of the three students is as follows:
Student A: H = 98, L = 92, R = 98 - 92 = 6
Student B: H = 97, L = 90, R = 97 - 90 = 7
Student C: H = 97, L = 90, R = 97 - 90 = 7
Observe that two students are tied: Students B and C have the same range, although the scores between their
extremes may differ. This indicates that the range is not a reliable measure of dispersion. It is a poor
measure of dispersion, particularly if the size of the sample or population is large. It considers only the extreme
values and tells us nothing about the distribution of numbers in between.
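For ungrouped data, R = H - L is a one-line computation. The short sketch below applies it to the IQ scores in the first example; the variable names are illustrative.

```python
iqs = [108, 112, 127, 118, 113]

highest = max(iqs)   # H
lowest = min(iqs)    # L
iq_range = highest - lowest

print(highest, lowest, iq_range)
```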
Variance
Since the range considers only two scores in the data set, it is an unreliable measure of variability. It cannot be
used to directly compare two sets of data, and it is an unstable measure of variability, especially for a very
large sample.
Definition
Variance is the average of the squared deviations from the mean. The formulas for finding the variance for
ungrouped data are:
Population Variance:
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Sample Variance:
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Where:
$x$ = individual value/score from the raw data
$\bar{x}$ = the sample mean
$N$ = population size
$n$ = sample size
$\mu$ = population mean
$\sigma^2$ = the population variance
$s^2$ = the sample variance
Consider the following:
For male group
Sample A (x)    $(x - \bar{x})$    $(x - \bar{x})^2$
 65             -19              361
 75              -9               81
 85               1                1
 95              11              121
100              16              256
                                 $\Sigma = 820$
a. Treating the data as a population, the variance is
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N} = \dfrac{820}{5} = 164$ square units
b. Treating the data as a sample, the variance is
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{820}{5 - 1} = \dfrac{820}{4} = 205$ square units
For female group
Sample B (x)    $(x - \bar{x})$    $(x - \bar{x})^2$
82              -2               4
83              -1               1
84               0               0
85               1               1
86               2               4
                                 $\Sigma = 10$
a. Treating the data as a population, the variance is
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N} = \dfrac{10}{5} = 2$ square units
b. Treating the data as a sample, the variance is
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{10}{5 - 1} = \dfrac{10}{4} = 2.5$ square units
Note this
Using the variance as a measure of variability for the two sets of grades, the males showed more variability in
performance. Note that the higher the variance, the more variable or far apart the values are from each other.
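Both variance formulas are available in Python's standard `statistics` module: `pvariance` divides by N (population) and `variance` divides by n − 1 (sample). The sketch below applies them to the two sets of grades; the variable names are illustrative.

```python
from statistics import pvariance, variance

male = [65, 75, 85, 95, 100]
female = [82, 83, 84, 85, 86]

male_pop_var = pvariance(male)         # sum of squared deviations / N
male_sample_var = variance(male)       # sum of squared deviations / (n - 1)
female_pop_var = pvariance(female)
female_sample_var = variance(female)

print(male_pop_var, male_sample_var)
print(female_pop_var, female_sample_var)
```

The sample variance is always the larger of the two for the same data because it divides the same sum of squares by a smaller number.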
Standard Deviation
Since the variance is computed from squared deviations, it is expressed in squared units, which makes it hard
to picture against the original data. Hence, we extract the square root of the variance, defining another
measure of variability called the standard deviation.
Definition
The standard deviation is the square root of the variance.
Population Standard Deviation:
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Sample Standard Deviation:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Where:
$x$ = individual value/score from the raw data
$\bar{x}$ = the sample mean
$N$ = population size
$n$ = sample size
$\mu$ = population mean
$\sigma$ = the population standard deviation
$s$ = the sample standard deviation
For male group
a. Treating the data as a population, the standard deviation is
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}} = \sqrt{\dfrac{820}{5}} = \sqrt{164} = 12.81$
b. Treating the data as a sample, the standard deviation is
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{820}{5 - 1}} = \sqrt{\dfrac{820}{4}} = \sqrt{205} = 14.32$
For female group
a. Treating the data as a population, the standard deviation is
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}} = \sqrt{\dfrac{10}{5}} = \sqrt{2} = 1.41$
b. Treating the data as a sample, the standard deviation is
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{10}{5 - 1}} = \sqrt{\dfrac{10}{4}} = \sqrt{2.5} = 1.58$
Based on the results, the male group is more dispersed than the female group.
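The standard deviations for the two groups can be checked with `pstdev` (population) and `stdev` (sample) from the standard `statistics` module. This is a minimal sketch with illustrative variable names.

```python
from statistics import pstdev, stdev

male = [65, 75, 85, 95, 100]
female = [82, 83, 84, 85, 86]

male_pop_sd = round(pstdev(male), 2)        # square root of the population variance
male_sample_sd = round(stdev(male), 2)      # square root of the sample variance
female_pop_sd = round(pstdev(female), 2)
female_sample_sd = round(stdev(female), 2)

print(male_pop_sd, male_sample_sd)
print(female_pop_sd, female_sample_sd)
```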
Coefficient of Variation
When comparing two sets of data with different units, you cannot directly compare their variances or standard
deviations. You have to use the coefficient of variation.
Definition
The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage. It is
used to compare the variability of two or more sets of data even when they are expressed in different units of
measurement.
$cv = \dfrac{sd}{\bar{x}} \cdot 100\%$
Where:
$cv$ = coefficient of variation
$sd$ = standard deviation
$\bar{x}$ = mean
Guiding Principle
The lesser the value of the measure, the more consistent, the more homogeneous and the less scattered are
the observations in the set of data.
Example: The following are the times taken (in minutes) to complete a homework by two groups of students
in a day:
Group A: 24, 26, 33, 37, 29, 31
Group B: 26, 28, 33, 35, 37, 29, 27, 25
a. Find the range.
b. Find the coefficient of variation.
c. What can you say about the average (mean) time of Group A and Group B in completing the homework?
d. Which group finishes the homework at an approximately uniform rate?
Solution:
a. Find the range. R = H – L
Group A (in increasing order): 24, 26, 29, 31, 33, 37
R = 37 – 24
R = 13
Group B (in increasing order): 25, 26, 27, 28, 29, 33, 35, 37
R = 37 – 25
R = 12
b. Find the coefficient of variation.
Group A
Step 1. Find the mean.
$\bar{x} = \dfrac{24 + 26 + 29 + 31 + 33 + 37}{6} = 30$
Step 2. Construct a table for easier computation later.
x      $x - \bar{x}$         $(x - \bar{x})^2$
24     24 – 30 = -6    $(-6)^2$ = 36
26     -4              16
29     -1               1
31      1               1
33      3               9
37      7              49
                       $\sum (x - \bar{x})^2 = 112$
Step 3. Solve for the standard deviation.
$sd = \sqrt{\dfrac{112}{6 - 1}} = 4.73$
Step 4. Solve for the coefficient of variation or cv.
$cv = \dfrac{sd}{\bar{x}} \cdot 100\% = \dfrac{4.73}{30} \cdot 100\% = 15.77\%$
Group B
Step 1. Find the mean.
$\bar{x} = \dfrac{25 + 26 + 27 + 28 + 29 + 33 + 35 + 37}{8} = 30$
Step 2. Construct a table for easier computation later.
x      $x - \bar{x}$         $(x - \bar{x})^2$
25     25 – 30 = -5    $(-5)^2$ = 25
26     -4              16
27     -3               9
28     -2               4
29     -1               1
33      3               9
35      5              25
37      7              49
                       $\sum (x - \bar{x})^2 = 138$
Step 3. Solve for the standard deviation.
$sd = \sqrt{\dfrac{138}{8 - 1}} = 4.44$
Step 4. Solve for the coefficient of variation or cv.
$cv = \dfrac{sd}{\bar{x}} \cdot 100\% = \dfrac{4.44}{30} \cdot 100\% = 14.80\%$
PCOA 022 – Statistical Analysis with Software Application S.Y. 2021-2022
c. What can you say about the average (mean) time of Group A and Group B in completing the homework?
The average time for Group A and Group B in completing the homework is the same, 30 minutes.
d. Which group finishes the homework at an approximately uniform rate?
Group B finishes the homework at a more uniform rate since it has a lower coefficient of
variation than Group A.
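The comparison of the two groups can be recomputed with the sketch below, which uses the sample standard deviation (`statistics.stdev`, divisor n − 1) as in the worked solution. The helper name `coefficient_of_variation` is illustrative. Because the code carries full precision instead of rounding the standard deviation first, its percentages can differ slightly in the last digit from a hand computation.

```python
from statistics import mean, stdev

group_a = [24, 26, 33, 37, 29, 31]
group_b = [26, 28, 33, 35, 37, 29, 27, 25]

def coefficient_of_variation(data):
    """Return the sample standard deviation as a percentage of the mean."""
    return stdev(data) / mean(data) * 100

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)

print(mean(group_a), mean(group_b))
print(round(cv_a, 2), round(cv_b, 2))
print("More uniform group:", "A" if cv_a < cv_b else "B")
```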
Part 5: Measures of Variability (grouped data)
Range
The RANGE of a frequency distribution is simply the difference between the upper class boundary of the
highest class interval and the lower class boundary of the lowest class interval.
𝑹𝒂𝒏𝒈𝒆 = 𝑼𝒑𝒑𝒆𝒓 𝑪𝒍𝒂𝒔𝒔 𝑩𝒐𝒖𝒏𝒅𝒂𝒓𝒚 − 𝑳𝒐𝒘𝒆𝒓 𝑪𝒍𝒂𝒔𝒔 𝑩𝒐𝒖𝒏𝒅𝒂𝒓𝒚
Example: Scores in Second Periodical Test of I – Faith in Mathematics I
Scores      Frequency
46 - 50      1
41 - 45     10
36 - 40     10
31 - 35     16
26 - 30      9
21 - 25      4
Solution:
Upper class boundary = 50.5
Lower class boundary = 20.5
R = 50.5 − 20.5
R = 30
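The range of a frequency distribution can be sketched as follows, assuming integer class limits so that each class boundary lies 0.5 beyond the corresponding limit. The tuple layout is illustrative.

```python
# (lower limit, upper limit, frequency) for each class interval
classes = [(46, 50, 1), (41, 45, 10), (36, 40, 10),
           (31, 35, 16), (26, 30, 9), (21, 25, 4)]

upper_boundary = max(hi for _, hi, _ in classes) + 0.5   # highest class
lower_boundary = min(lo for lo, _, _ in classes) - 0.5   # lowest class
grouped_range = upper_boundary - lower_boundary

print(upper_boundary, lower_boundary, grouped_range)
```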
Variance and Standard Deviation
For large quantities of data, the variance and standard deviation are computed using a frequency distribution with
columns for the midpoint value, the product of the frequency and midpoint value for each interval, the deviation
and its square, and finally the product of the frequency and the squared deviation.
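Example: The table is the distribution of the number of mistakes 50 students made in factoring 20
quadratic equations. Compute the standard deviation.
Number of Mistakes    Frequency
18 – 20                2
15 – 17                5
12 – 14                6
 9 – 11               10
 6 – 8                15
 3 – 5                 8
 0 – 2                 4
Total                 50
Mean
$\bar{x} = \dfrac{\sum (fx_m)}{N} = \dfrac{437}{50} = 8.74$
Solution:
Number of Mistakes    f     $x_m$    $fx_m$    $x_m - \bar{x}$    $(x_m - \bar{x})^2$    $f(x_m - \bar{x})^2$
18 – 20                2    19      38      10.26     105.2676      210.5352
15 – 17                5    16      80       7.26      52.7076      263.5380
12 – 14                6    13      78       4.26      18.1476      108.8856
 9 – 11               10    10     100       1.26       1.5876       15.8760
 6 – 8                15     7     105      -1.74       3.0276       45.4140
 3 – 5                 8     4      32      -4.74      22.4676      179.7408
 0 – 2                 4     1       4      -7.74      59.9076      239.6304
Total                 50           437                              1063.62
$s^2 = \dfrac{\sum f(x_m - \bar{x})^2}{n - 1} = \dfrac{1063.62}{49} = 21.706531 \approx 21.71$
$s = \sqrt{\dfrac{\sum f(x_m - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{1063.62}{49}} = \sqrt{21.706531} \approx 4.66$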
Example: The table is the distribution of the number of mistakes 50 students made in factoring 20 quadratic
equations. Compute the standard deviation. (Alternative solution using the sums $\sum fx_m$ and $\sum fx_m^2$.)
Solution:
Number of Mistakes    f     $x_m$    $fx_m$    $fx_m^2$
18 – 20                2    19      38       722
15 – 17                5    16      80      1280
12 – 14                6    13      78      1014
 9 – 11               10    10     100      1000
 6 – 8                15     7     105       735
 3 – 5                 8     4      32       128
 0 – 2                 4     1       4         4
Total                 50           437      4883
$s^2 = \dfrac{N \sum fx_m^2 - \left(\sum fx_m\right)^2}{N(N - 1)} = \dfrac{50(4883) - (437)^2}{50(50 - 1)} = \dfrac{53\,181}{2450} \approx 21.706531$
$s = \sqrt{21.706531} \approx 4.66$
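The two routes to the grouped sample variance, the deviation form and the form that uses only the column totals, can be compared with the sketch below. It assumes integer class marks and the same `(lower limit, upper limit, frequency)` layout used for illustration; none of the names come from the module.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class interval
classes = [(18, 20, 2), (15, 17, 5), (12, 14, 6), (9, 11, 10),
           (6, 8, 15), (3, 5, 8), (0, 2, 4)]

rows = [((lo + hi) // 2, f) for lo, hi, f in classes]   # (class mark x_m, f)
n = sum(f for _, f in rows)
sum_fx = sum(f * xm for xm, f in rows)
sum_fx2 = sum(f * xm ** 2 for xm, f in rows)
mean = sum_fx / n

# Deviation form: sum of f * (x_m - mean)^2, divided by n - 1.
sum_sq_dev = sum(f * (xm - mean) ** 2 for xm, f in rows)
var_deviation = sum_sq_dev / (n - 1)

# Computing form: uses only the column totals.
var_computing = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

sd = sqrt(var_computing)
print(sum_fx, sum_fx2, mean)
print(round(var_deviation, 6), round(var_computing, 6), round(sd, 2))
```

The computing form avoids the column of rounded deviations, which makes it less prone to arithmetic slips when the work is done by hand.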