
MODULE-2 Descriptive Statistics


PCOA 022 – Statistical Analysis with Software Application S.Y.

2021-2022
Summer Term
Instructor: Mary Jane Bugarin - Tolentino, LPT

Module 2 : Descriptive Statistics


INTRODUCTION

Last time, you were introduced to some basic concepts in statistics, including basic terms and
definitions, sampling techniques and ways to gather data, ways to present data using frequency
tables, and the vital statistical rates and ratios. Data may be organized using tables and
summarized using graphs and charts.
Another method of summarizing data is to compute numbers, such as an average, that
describe a set of data. Numbers that are used to describe sets of data are called descriptive
measures. We also analyze a set of data by describing its spread or dispersion. All of
these are discussed in this module.

Topics

❖ Measures of Central Tendency (ungrouped data)


❖ Measures of Central Tendency (grouped data)
❖ Measures of Variability or dispersion (ungrouped data)
❖ Measures of Variability or dispersion (grouped data)
INSTRUCTIONAL MATERIALS

Schedule:

• 2nd – 3rd Week

Instructional Materials:

• ULS CLMS
• Pre-recorded lectures
• PDF files

LEARNING OUTCOMES

By the end of this module, you should be able to:

1. find the measures of central tendency, and variability of both ungrouped and grouped data;
2. describe data by using measures of central tendency and measures of variability; and
3. interpret Likert-type questions by applying measures of central tendency.

MOTIVATION

Ever wonder how your grades were computed? How the so-called General Weighted Average, a
crucial figure for students vying for Latin honors, is obtained? An average is a value around which
the other values in a set of data tend to cluster. In this module, we will discuss the averages, or
measures of central tendency: the mean, the median, and the mode.

CONTENT

Measures of Central Tendency (ungrouped data)

Aside from tables and graphs, another way of describing a set of data is by stating a single numerical value
associated with it. This value is where all the other values in a distribution tend to cluster. It is called the average or
measure of central tendency. There are three kinds of average: the mean, the median, and the mode.

Mean
The mean (also known as the arithmetic mean)
➢ It is the sum of measures divided by the number of measures in a variable.
➢ It is symbolized as 𝑥̅ (read as x bar).
➢ Used to describe a set of data where the measures cluster or concentrate at a point. As the measures
cluster around each other, a single value appears to represent distinctively the total measures.
➢ It is, however, affected by extreme measures, that is, very high or very low measures can easily
change the value of the mean.
To find the mean of ungrouped data:

$\bar{x} = \dfrac{\sum x}{n}$

Where:
∑x = the summation of x (sum of the measures)
n = number of values of x

To find the weighted mean:

$\bar{x}_{wm} = \dfrac{\sum fx}{N}$

Where:
∑fx = summation of the product of the frequency (f) and score (x)
N = total frequency

Example 1: The grades in Chemistry of 10 students are 87, 84, 85, 85, 86, 90, 79, 82, 78, 76.
What is the average grade of the 10 students?
Solution:

$\bar{x} = \dfrac{87 + 84 + 85 + 85 + 86 + 90 + 79 + 82 + 78 + 76}{10} = \dfrac{832}{10} = 83.2$

The average grade of the 10 students is 83.2

Example 2: Find the mean salary for a small company that pays monthly salaries to its employees as shown
in the frequency distribution.
Salary (x)       Number of Employees (f)    fx
Php7,000.00       8                          56,000
Php8,000.00      11                          88,000
Php9,250.00      14                         129,500
Php10,500.00      9                          94,500
Php17,000.00      2                          34,000
Php25,000.00      1                          25,000
Total            N = 45                     ∑fx = 427,000

$\bar{x} = \dfrac{\sum fx}{N} = \dfrac{427\,000}{45} = 9\,488.89$

The average monthly salary that the company pays is Php9,488.89.
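
The two mean formulas can be checked with a short Python sketch. It uses the data from Examples 1 and 2; the variable names are illustrative and not part of the module.

```python
from statistics import mean

# Example 1: Chemistry grades of 10 students (ungrouped data)
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]
mean_grade = mean(grades)  # sum of the measures divided by n

# Example 2: monthly salaries (x) and number of employees (f)
salaries = [7000, 8000, 9250, 10500, 17000, 25000]
employees = [8, 11, 14, 9, 2, 1]

total_fx = sum(f * x for f, x in zip(employees, salaries))  # sum of fx
total_n = sum(employees)                                     # N
weighted_mean = total_fx / total_n

print(mean_grade)
print(round(weighted_mean, 2))
```

The weighted mean is just the ordinary mean with each salary counted once per employee, which is why the frequencies act as weights.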

Median
The median is the middle entry or term in a set of data arranged in either increasing or decreasing order. The
median is a positional measure: it depends on the number of measures and their order, not on the size of the
extreme values.
To find the median of a given set of data, take note of the following:

➢ Arrange the data in either increasing or decreasing order.


➢ Locate the middle value. If the number of cases is odd, the middle value is the median.
➢ If the number of cases is even, take the arithmetic mean of the two middle measures.
The median uses the symbol $\tilde{x}$.

Example 1: The number of books borrowed in the library from Monday to Friday last week were 58, 60,
54, 35, and 97 respectively. Find the median.

Solution: Arrange the number of books borrowed in increasing order: 35, 54, 58, 60, 97.

$\tilde{x} = 58$

The median is 58.

Example 2: Cora’s quizzes for the second quarter are 8, 7, 6, 10, 9, 5, 9, 6, 10, and 7. Find the median.

Solution: 5, 6, 6, 7, 7, 8, 9, 9, 10, 10

Since the number of measures is even, the median is the average of the two middle scores.

$\tilde{x} = \dfrac{7 + 8}{2} = 7.5$

The median score of Cora’s quizzes is 7.5.
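
As a quick check, the sketch below reproduces both median examples with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the count is even. The variable names are illustrative.

```python
from statistics import median

# Example 1: books borrowed from Monday to Friday (odd number of cases)
books = [58, 60, 54, 35, 97]
median_books = median(books)

# Example 2: Cora's quiz scores (even number of cases)
quizzes = [8, 7, 6, 10, 9, 5, 9, 6, 10, 7]
median_quizzes = median(quizzes)  # mean of the two middle scores, 7 and 8

print(median_books, median_quizzes)
```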

Mode
Mode is the measure or value which occurs most frequently in a set of data. It is the value with the greatest
frequency.
To find the mode of a given set of data, take note of the following:
➢ Select the measure that appears most often in the set.
➢ If two or more measures appear the same number of times, and that frequency is
greater than that of any other measure, then each of these values is a mode.
➢ If every measure appears the same number of times, then the set of data has no mode.
➢ The distribution is unimodal if there is only one mode, bimodal if two, and trimodal if three.
The mode uses the symbol $\hat{x}$.
Example 1: The shoe sizes of 10 randomly selected students in a class are 6, 5, 4, 6, 4, 5, 6, 7, 7 and 6.
What is the mode?

Answer: $\hat{x} = 6$

The mode is 6 since it is the shoe size that occurred the greatest number of times.

Example 2: The sizes of 9 classes in a certain school are 50, 52, 55, 49, 51, 54, 55, 53 and 54.
Find the mode.

Answer: $\hat{x} = 54$ and $55$

The modes are 54 and 55 since the two measures occurred the same number
of times, more often than any other measure. The distribution is bimodal.
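
The mode examples can be verified with `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest frequency. This is only a sketch; sorting the result is a presentation choice, not part of the module.

```python
from statistics import multimode

# Example 1: shoe sizes of 10 students (unimodal)
shoe_sizes = [6, 5, 4, 6, 4, 5, 6, 7, 7, 6]
shoe_modes = sorted(multimode(shoe_sizes))

# Example 2: sizes of 9 classes (bimodal)
class_sizes = [50, 52, 55, 49, 51, 54, 55, 53, 54]
class_modes = sorted(multimode(class_sizes))

print(shoe_modes, class_modes)
```

Note that `multimode` returns all values when every value occurs equally often, whereas the module's rule calls such a data set one with no mode.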


Measures of Central Tendency (grouped data)


The Mean of Grouped Data Using the Class Marks
When the number of items in a set of data is too big, items are grouped for convenience. The mean of
grouped data is computed using the formula:

$\bar{x} = \dfrac{\sum (f x_m)}{N}$

where: $\bar{x}$ = mean
f = frequency of each class
$x_m$ = class mark
$\sum (f x_m)$ = summation of the product of the frequency and the class mark
N = sum of all the frequencies

Example: Compute the mean of the scores of the students in a Mathematics test.
Class Frequency
46 – 50 1
41 – 45 5
36 – 40 11
31 – 35 12
26 – 30 11
21 – 25 5
16 – 20 2
11 – 15 1

The frequency distribution for the data is given below. The columns $x_m$ and $f x_m$ are added.

Class      f     x_m    f·x_m
46 – 50     1    48       48
41 – 45     5    43      215
36 – 40    11    38      418
31 – 35    12    33      396
26 – 30    11    28      308
21 – 25     5    23      115
16 – 20     2    18       36
11 – 15     1    13       13
Total    N = 48          ∑(f·x_m) = 1549

To find $x_m$:

$x_m = \dfrac{\text{lower limit} + \text{upper limit}}{2}$

Example: in 46 – 50, $x_m = \dfrac{46 + 50}{2} = 48$

$\bar{x} = \dfrac{\sum (f x_m)}{N} = \dfrac{1549}{48} = 32.27$

The average score of the students in the Mathematics test is 32.27.
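
The class-mark method can be sketched in Python as follows. The tuple layout `(lower limit, upper limit, frequency)` is an assumption made for this illustration.

```python
# (lower limit, upper limit, frequency) for each class of the Mathematics test
classes = [(46, 50, 1), (41, 45, 5), (36, 40, 11), (31, 35, 12),
           (26, 30, 11), (21, 25, 5), (16, 20, 2), (11, 15, 1)]

# Class mark: midpoint of the lower and upper limits
class_marks = [(lower + upper) / 2 for lower, upper, _ in classes]
frequencies = [f for _, _, f in classes]

sum_fxm = sum(f * xm for f, xm in zip(frequencies, class_marks))
n = sum(frequencies)
grouped_mean = sum_fxm / n

print(sum_fxm, n, round(grouped_mean, 2))
```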

The Mean of Grouped Data Using the Coded Deviation
An alternative formula for computing the mean of grouped data makes use of coded deviation:

$\bar{x} = x_0 + \left[\dfrac{\sum f x_c}{N}\right] i$

Where: $\bar{x}$ = mean
$x_0$ = assumed mean
f = frequency
$x_c$ = coded value
N = total frequency
i = size of the class interval

➢ Any class mark can be considered as the assumed mean. The class chosen to contain $x_0$ is given a 0
deviation.
➢ Subsequently, consecutive positive integers are assigned to the classes above it (higher scores) and
consecutive negative integers to the classes below it (lower scores).

Example: Compute the mean of the scores of the students in a Mathematics test.
Class      Frequency
46 – 50     1
41 – 45     5
36 – 40    11
31 – 35    12
26 – 30    11
21 – 25     5
16 – 20     2
11 – 15     1

The frequency distribution for the data is given below. The columns $x_m$, $x_c$ and $f x_c$ are added.

Solution:
Class      f     x_m    x_c    f·x_c
46 – 50     1    48      3       3
41 – 45     5    43      2      10
36 – 40    11    38      1      11
31 – 35    12    33      0       0
26 – 30    11    28     -1     -11
21 – 25     5    23     -2     -10
16 – 20     2    18     -3      -6
11 – 15     1    13     -4      -4
Total    N = 48                 ∑(f·x_c) = −7

To find $x_c$:
• Choose any class mark to be considered as the assumed mean (in this case we choose 33).
• The coded value ($x_c$) assigned to 33 is zero (0).
• Consecutive positive integers are assigned to the classes with higher scores than the assumed-mean class.
• Consecutive negative integers are assigned to the classes with lower scores than the assumed-mean class.
• Be mindful of the arrangement of the class intervals, whether descending or ascending.

$\bar{x} = x_0 + \left[\dfrac{\sum (f x_c)}{N}\right] i = 33 + \left(\dfrac{-7}{48}\right) 5 = 32.27$
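
A minimal sketch of the coded-deviation method, assuming the same assumed mean (33) and class size (5) as in the example. The coded value of each class is computed here as the number of class widths between its class mark and the assumed mean, which reproduces the integers assigned by hand.

```python
# (lower limit, upper limit, frequency) for each class
classes = [(46, 50, 1), (41, 45, 5), (36, 40, 11), (31, 35, 12),
           (26, 30, 11), (21, 25, 5), (16, 20, 2), (11, 15, 1)]

assumed_mean = 33   # class mark of the 31 - 35 class
class_size = 5      # size of the class interval (i)

sum_fxc = 0
n = 0
for lower, upper, f in classes:
    class_mark = (lower + upper) // 2
    coded_value = (class_mark - assumed_mean) // class_size  # 0 at the assumed mean
    sum_fxc += f * coded_value
    n += f

coded_mean = assumed_mean + (sum_fxc / n) * class_size

print(sum_fxc, round(coded_mean, 2))
```

Both methods give the same mean; coding only keeps the intermediate numbers small when computing by hand.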

The Median of Grouped Data
The median is the middle value in a set of quantities. It separates an ordered set of data into two equal parts.
Half of the quantities are found above the median and the other half are found below it.
In computing for the median of grouped data, the following formula is used:

$\tilde{x} = x_{lb} + \left[\dfrac{\frac{N}{2} - cf_b}{f_m}\right] i$

where: $\tilde{x}$ = median
$x_{lb}$ = the lower boundary of the median class
N = total frequency
$cf_b$ = the cumulative frequency of the class immediately below the median class
$f_m$ = frequency of the median class
i = size of the class interval

The median class is the class that contains the $\left(\frac{N}{2}\right)^{th}$ score. This can be located under the “<” cf column of
the cumulative frequency distribution.
Example:
Compute the median of the scores of the students in a Mathematics test.
Class      Frequency
46 – 50     1
41 – 45     5
36 – 40    11
31 – 35    12
26 – 30    11
21 – 25     5
16 – 20     2
11 – 15     1

Reminders:
• To find lb, subtract 0.5 from the lower limit.
• To find the “<” cf, start with N.
➢ N = 48, so in 46 – 50, the “<” cf is 48.
➢ To get the next “<” cf, subtract the f of the class interval above from its “<” cf. For example, for 41 – 45,
the “<” cf above is 48 and the f above is 1, so 48 – 1 = 47. Hence, the “<” cf of 41 – 45 is 47.
➢ In 36 – 40, 47 – 5 = 42; therefore the “<” cf is 42.
➢ Do the same for the succeeding class intervals.
➢ The “<” cf and f of the last class interval should be the same.
➢ If the class intervals are arranged from lowest to highest, start from the bottom.

The frequency distribution for the data is given below. The columns for lb and “less than” cumulative
frequency are added.

Class      f     lb      “<” cf
46 – 50     1    45.5    48
41 – 45     5    40.5    47
36 – 40    11    35.5    42
31 – 35    12    30.5    31
26 – 30    11    25.5    19
21 – 25     5    20.5     8
16 – 20     2    15.5     3
11 – 15     1    10.5     1

The same table may also be arranged from the lowest class (11 – 15) to the highest class (46 – 50); the
values of lb and “<” cf for each class are unchanged.

Compute $\frac{N}{2}$. Since $\frac{48}{2} = 24$, the class interval containing the 24th score in the less than cumulative
frequency will be the median class. In this case the median class is the 31 – 35 class interval.

$\tilde{x} = x_{lb} + \left[\dfrac{\frac{N}{2} - cf_b}{f_m}\right] i = 30.5 + \left[\dfrac{24 - 19}{12}\right] 5 = 32.58$

The median score is 32.58.

∴ 50% of the students got a score above 32.58, and 50% of the students got a score below 32.58.
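
The steps for the grouped median can be sketched as below. The classes are listed from lowest to highest so that a running total gives the “<” cf directly; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency), arranged from lowest to highest class
classes = [(11, 15, 1), (16, 20, 2), (21, 25, 5), (26, 30, 11),
           (31, 35, 12), (36, 40, 11), (41, 45, 5), (46, 50, 1)]

n = sum(f for _, _, f in classes)
half = n / 2

cumulative = 0  # "<" cf of all classes below the current one (cf_b)
for lower, upper, f in classes:
    if cumulative + f >= half:          # this class contains the (N/2)th score
        lower_boundary = lower - 0.5
        class_size = upper - lower + 1
        grouped_median = lower_boundary + ((half - cumulative) / f) * class_size
        break
    cumulative += f

print(lower_boundary, cumulative, round(grouped_median, 2))
```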

The Mode of Grouped Data
The mode of grouped data can be approximated using the following formula:

$\hat{x} = x_{lb} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] i$

where: $x_{lb}$ = lower boundary of the modal class
$\Delta_1$ = difference between the frequencies of the modal class and the next lower class
$\Delta_2$ = difference between the frequencies of the modal class and the next upper class
i = size of the class interval

The modal class is the class interval with the highest frequency.

Example:

1. Compute the mode of the scores of the students in a Mathematics test.


Class      Frequency
46 – 50     1
41 – 45     5
36 – 40    11
31 – 35    12
26 – 30    11
21 – 25     5
16 – 20     2
11 – 15     1

The frequency distribution for the data is given below. The column for lb is added.

Class      f     lb
46 – 50     1    45.5
41 – 45     5    40.5
36 – 40    11    35.5
31 – 35    12    30.5
26 – 30    11    25.5
21 – 25     5    20.5
16 – 20     2    15.5
11 – 15     1    10.5

Since class 31 – 35 has the highest frequency, the modal class is 31 – 35.

$\hat{x} = x_{lb} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] i$
$= 30.5 + \left[\dfrac{12 - 11}{(12 - 11) + (12 - 11)}\right] 5$
$= 30.5 + [0.5]\,5$
$= 30.5 + 2.5$
$= 33$

The mode score is 33.

33 is the most common score among the students in the Math test.

Reminder: take note of the arrangement of the class interval if it is from highest to lowest or from
lowest to highest.
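
A short sketch of the grouped-mode approximation follows. It assumes the modal class is not the first or last class, so that both neighboring classes exist; handling those edge cases is left out.

```python
# (lower limit, upper limit, frequency), arranged from lowest to highest class
classes = [(11, 15, 1), (16, 20, 2), (21, 25, 5), (26, 30, 11),
           (31, 35, 12), (36, 40, 11), (41, 45, 5), (46, 50, 1)]

frequencies = [f for _, _, f in classes]
modal_index = frequencies.index(max(frequencies))  # class with the highest frequency
lower, upper, f_modal = classes[modal_index]

delta_1 = f_modal - frequencies[modal_index - 1]  # modal class vs. next lower class
delta_2 = f_modal - frequencies[modal_index + 1]  # modal class vs. next upper class

lower_boundary = lower - 0.5
class_size = upper - lower + 1
grouped_mode = lower_boundary + (delta_1 / (delta_1 + delta_2)) * class_size

print(grouped_mode)
```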

The Questionnaire & The Likert-Scale
A Likert-type question is used if the researcher wants to know the feelings or opinions of the respondents
regarding any topic or issue of interest.
Below are examples of Likert-type statements. The respondents choose the number which best
represents their feelings regarding each statement. Remember that the statements are grouped according to
a theme.

Choices                   Likert-type Mean Interpretation
5 – strongly agree        4.50 – 5.00 – strongly agree
4 – agree                 3.50 – 4.49 – agree
3 – somewhat agree        2.50 – 3.49 – somewhat agree
2 – disagree              1.50 – 2.49 – disagree
1 – strongly disagree     1.00 – 1.49 – strongly disagree

Each statement below is answered on the scale 5 4 3 2 1.

Items 1 – 3 refer to students’ personal confidence in learning statistics

1. I am sure that I can learn statistics.

2. I think I can handle different lessons in statistics.

3. I can get good grades in statistics.

Items 4 – 6 refer to the students’ perception on statistics as a subject

4. I think statistics is a worthwhile, necessary subject.

5. I will use statistics in many ways as a professional.

6. I’ll need a good understanding of statistics for my research work.

Items 7 – 9 refer to students’ attitudes on the use of computers in learning

7. Computer makes learning fun and easy


8. I think working with computers would be enjoyable and stimulating

9. Computers help a lot in learning

The table below is the summary and interpretation of the mean responses to the Likert-type statements 1 – 3.

Item     5     4     3     2     1     x̄       Interpretation of x̄
1.       36    51    18    0     1     4.14    Agree
2.       18    44    37    8     1     3.65    Agree
3.       18    48    28    0     1     3.86    Agree
T        72    143   83    8     3     3.88    Agree


To solve for $\bar{x}$, use the formula for the weighted mean.

For item no. 1:

$\bar{x} = \dfrac{5(36) + 4(51) + 3(18) + 2(0) + 1(1)}{36 + 51 + 18 + 0 + 1} = \dfrac{439}{106} = 4.14$

• The same process is applied to the rest of the items.
• To get the total mean of 3.88, add the computed means for items 1 – 3 and divide by the number of items:

$\dfrac{4.14 + 3.65 + 3.86}{3} = 3.88$

• To interpret the mean, refer to the Likert-type Mean Interpretation: 4.14 falls in the range 3.50 – 4.49, which is
interpreted as agree.
∴We can say that most of the students agree that they can learn statistics (based on item no. 1). In general, (referring to
the total) we can conclude that most of the students agree that they are personally confident in learning statistics .
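
The Likert computation can be sketched in Python as follows, using the response counts for items 1 – 3. The function names and the dictionary layout are illustrative assumptions; the cut-off values come from the Likert-type Mean Interpretation table.

```python
SCALE = [5, 4, 3, 2, 1]  # strongly agree ... strongly disagree

# Number of respondents choosing 5, 4, 3, 2, 1 for each item
responses = {
    1: [36, 51, 18, 0, 1],
    2: [18, 44, 37, 8, 1],
    3: [18, 48, 28, 0, 1],
}

def weighted_mean(counts):
    """Weighted mean of the scale values, using the counts as frequencies."""
    return sum(w * c for w, c in zip(SCALE, counts)) / sum(counts)

def interpret(mean_value):
    """Map a mean to its label using the interpretation ranges."""
    if mean_value >= 4.50:
        return "strongly agree"
    if mean_value >= 3.50:
        return "agree"
    if mean_value >= 2.50:
        return "somewhat agree"
    if mean_value >= 1.50:
        return "disagree"
    return "strongly disagree"

item_means = {item: round(weighted_mean(c), 2) for item, c in responses.items()}
total_mean = round(sum(item_means.values()) / len(item_means), 2)

for item, m in item_means.items():
    print(item, m, interpret(m))
print("T", total_mean, interpret(total_mean))
```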
Measures of Variability (ungrouped data)

The earlier part of this module dealt with the concepts of measurements that describe the middle or the center of the
distribution. But the measures of central tendency do not describe how the observations spread out from the center
of the distribution. In this lesson, we will deal with the measures of variability or spread of a distribution.

Definition
MEASURES OF VARIABILITY OR DISPERSION

These are measures of the average distance of each observation from the center of the
distribution. They measure the homogeneity or heterogeneity of a particular group.

Consider the following measurements, in liters, for two samples of apple juice in tetra packs packed by
companies A and B.

Sample A      Sample B
0.95          1.06
1.00          1.01
0.92          0.88
1.03          0.91
1.10          1.14
x̄ = 1.00      x̄ = 1.00


How far apart are the measurements from one another?

[Figure: dot plots of the Company A and Company B measurements on a common axis from 0.8 to 1.2 liters]

Both samples have the same mean, 1.00 liters. It is quite obvious that company A packed apple juice with a
more uniform content than company B. We say that the variability or the dispersion of the observations from the
mean is less for sample A than for sample B. Therefore, in buying apple juice, we would feel more confident that the
tetra pack we select will be closer to the advertised mean if we buy from company A.

A small measure of variability would indicate that the data are:
1. clustered closely around the mean
2. more homogeneous
3. less variable
4. more consistent
5. more uniformly distributed

A big measure of variability would indicate that the data are:
1. far away from the mean
2. heterogeneous
3. more variable
4. less consistent
5. less uniformly distributed

There are 4 major measures under the measures of variability. These are:
1. Range (R): the difference between the highest value and the lowest value in a set of data.
Formula: R = H - L
2. Standard Deviation (σ for population and s for sample)
Calculator: σ or s
3. Variance (σ² for population and s² for sample): the square of the standard deviation.
4. Coefficient of Variation (cv): used to compare the variability of two or more sets of data having different
units.
Formula: cv = standard deviation divided by the mean, expressed as a percentage


Range
The range is the simplest measure of variability. It is the difference between the largest and smallest
measurement.

R = H - L

where R = Range, H = Highest measure, L = Lowest Measure

The main disadvantage of the range is that it does not consider every measure in the data.

Examples:

1. The IQs of 5 members of a family are 108, 112, 127, 118 and 113. Find the range.

Solution: The range of the IQs is 127 - 108 = 19.

2. The ranges of the sets of scores of three students are as follows:

Student       H     L     R = H - L
Student A     98    92    98 - 92 = 6
Student B     97    90    97 - 90 = 7
Student C     97    90    97 - 90 = 7

Observe that two students are tied, even though their individual scores may be distributed very differently between
the extremes. This indicates that the range is not a reliable measure of dispersion. It is a poor
measure of dispersion, particularly if the size of the sample or population is large. It considers only the extreme
values and tells us nothing about the distribution of numbers in between.
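
As a small illustration of R = H - L, the sketch below computes the range for the IQ example. The helper name `value_range` is hypothetical.

```python
def value_range(data):
    """Range: highest measure minus lowest measure."""
    return max(data) - min(data)

iqs = [108, 112, 127, 118, 113]
iq_range = value_range(iqs)

print(iq_range)
```

Only the two extreme values enter the computation, which is why very different data sets can share the same range.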


Variance

Since the range considers only two scores in the data set, it is an unreliable measure of variability: it cannot
be used to directly compare two sets of data, and it is unstable, especially for a very large sample.

Definition
Variance is the average of the squared deviations from the mean. The formulas for finding the variance for
ungrouped data are:

Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Where:
x = individual value/score from the raw data
x̄ = the sample mean
N = total population
n = number of values in the sample
μ = population mean
σ² = the population variance
s² = the sample variance

Consider the following grades of two groups, each with a mean of 84.

For male group

Sample A (x)    (x − x̄)    (x − x̄)²
65              -19         361
75               -9          81
85                1           1
95               11         121
100              16         256
                            Σ = 820

a. Treating the data as population, the variance is

$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N} = \dfrac{820}{5} = 164$ square units

b. Treating the data as sample, the variance is

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{820}{5 - 1} = \dfrac{820}{4} = 205$ square units

For female group

Sample B (x)    (x − x̄)    (x − x̄)²
82               -2           4
83               -1           1
84                0           0
85                1           1
86                2           4
                            Σ = 10

a. Treating the data as population, the variance is

$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N} = \dfrac{10}{5} = 2$ square units

b. Treating the data as sample, the variance is

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{10}{5 - 1} = \dfrac{10}{4} = 2.5$ square units

Note this
Using the variance as a measure of variability for the two sets of grades, the males showed more variability in
performance. Note that the higher the variance, the more variable or far apart the values are from each other.
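
The variances of the two groups can be checked with Python's standard `statistics` module, where `pvariance` divides by N (population) and `variance` divides by n − 1 (sample). The variable names are illustrative.

```python
from statistics import pvariance, variance

male = [65, 75, 85, 95, 100]
female = [82, 83, 84, 85, 86]

male_pop_var = pvariance(male)        # sum of squared deviations / N
male_sample_var = variance(male)      # sum of squared deviations / (n - 1)
female_pop_var = pvariance(female)
female_sample_var = variance(female)

print(male_pop_var, male_sample_var)
print(female_pop_var, female_sample_var)
```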


Standard Deviation
Since the variance is obtained from squared deviations, it is expressed in squared units, which makes it hard to
picture the actual spread of the data set. Hence, extract the square root of the variance, defining another
measure of variability called the standard deviation.

Definition
The standard deviation is the square root of the variance.

Population Standard Deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

Sample Standard Deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Where:
x = individual value/score from the raw data
x̄ = the sample mean
N = total population
n = number of values in the sample
μ = population mean
σ = the population standard deviation
s = the sample standard deviation

For male group
a. Treating the data as population, the standard deviation is

$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}} = \sqrt{\dfrac{820}{5}} = \sqrt{164} = 12.81$

b. Treating the data as sample, the standard deviation is

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{820}{5 - 1}} = \sqrt{\dfrac{820}{4}} = \sqrt{205} = 14.32$

For female group
a. Treating the data as population, the standard deviation is

$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}} = \sqrt{\dfrac{10}{5}} = \sqrt{2} = 1.41$

b. Treating the data as sample, the standard deviation is

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{10}{5 - 1}} = \sqrt{\dfrac{10}{4}} = \sqrt{2.5} = 1.58$

Based on the results, the grades of the male group are more dispersed than those of the female group.
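
The standard deviations can likewise be checked with `statistics.pstdev` (population) and `statistics.stdev` (sample). This is a sketch using the same two groups of grades; results are rounded to two decimals for comparison with the hand computation.

```python
from statistics import pstdev, stdev

male = [65, 75, 85, 95, 100]
female = [82, 83, 84, 85, 86]

male_sigma = round(pstdev(male), 2)      # square root of the population variance
male_s = round(stdev(male), 2)           # square root of the sample variance
female_sigma = round(pstdev(female), 2)
female_s = round(stdev(female), 2)

print(male_sigma, male_s)
print(female_sigma, female_s)
```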

Coefficient of Variation
When comparing two sets of data with different units, you cannot use the variance or the standard deviation.
You have to use the coefficient of variation.

Definition

The coefficient of variation is the ratio of the standard deviation to the mean. It is used to compare
the variability of two or more sets of data even when they are expressed in different units of
measurement.

$cv = \dfrac{sd}{\bar{x}} \cdot 100\%$

Where:
cv = coefficient of variation
sd = standard deviation
x̄ = mean

Guiding Principle
The smaller the value of the measure, the more consistent, the more
homogeneous and the less scattered are the observations in the set of data.


Example: The following are the time taken (in minutes) to complete a homework by the two groups of students
in a day:
Group A :24, 26, 33, 37, 29, 31
Group B : 26 , 28, 33, 35, 37, 29 , 27, 25
a. Find the range.
b. Find the coefficient of variation.
c. What can you say on the average (mean) time of Group A and Group B in completing the homework?
d. Which group finishes the homework at approximately uniform rate?
Solution:
a. Find the range.

Group A: 24, 26, 29, 31, 33, 37
R = H – L = 37 – 24 = 13

Group B: 25, 26, 27, 28, 29, 33, 35, 37
R = H – L = 37 – 25 = 12

b. Find the coefficient of variation.

Group A

Step 1. Find the mean.

$\bar{x} = \dfrac{24 + 26 + 29 + 31 + 33 + 37}{6} = 30$

Step 2. Construct a table for easier computation later.

x      x − x̄             (x − x̄)²
24     24 – 30 = -6      (-6)² = 36
26     -4                16
29     -1                 1
31      1                 1
33      3                 9
37      7                49
                         ∑(x − x̄)² = 112

Step 3. Solve for the standard deviation.

$sd = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{112}{6 - 1}} = 4.73$

Step 4. Solve for the coefficient of variation (cv).

$cv = \dfrac{sd}{\bar{x}} \cdot 100\% = \dfrac{4.73}{30} \cdot 100\% = 15.77\%$

Group B

Step 1. Find the mean.

$\bar{x} = \dfrac{25 + 26 + 27 + 28 + 29 + 33 + 35 + 37}{8} = 30$

Step 2. Construct a table for easier computation later.

x      x − x̄             (x − x̄)²
25     25 – 30 = -5      (-5)² = 25
26     -4                16
27     -3                 9
28     -2                 4
29     -1                 1
33      3                 9
35      5                25
37      7                49
                         ∑(x − x̄)² = 138

Step 3. Solve for the standard deviation.

$sd = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{138}{8 - 1}} = 4.44$

Step 4. Solve for the coefficient of variation (cv).

$cv = \dfrac{sd}{\bar{x}} \cdot 100\% = \dfrac{4.44}{30} \cdot 100\% = 14.80\%$

c. What can you say on the average (mean) time of Group A and Group B in completing the homework?

The average time for Group A and Group B in completing the homework is the same.

d. Which group finishes the homework at approximately uniform rate?

Group B finishes the homework at an approximately uniform rate since it has a smaller coefficient of
variation than Group A.
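
The comparison can be reproduced directly from the raw times with the sketch below. The helper `coefficient_of_variation` is hypothetical and uses the sample standard deviation (n − 1). Because it does not round the standard deviation before dividing, the last decimal place may differ slightly from a hand computation that rounds at each step.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean, as a percentage."""
    return stdev(data) / mean(data) * 100

group_a = [24, 26, 33, 37, 29, 31]
group_b = [26, 28, 33, 35, 37, 29, 27, 25]

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)

print(round(cv_a, 2), round(cv_b, 2))
print("More uniform group:", "A" if cv_a < cv_b else "B")
```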

Measures of Variability (grouped data)
Range
The RANGE of a frequency distribution is simply the difference between the upper class boundary of the
highest class interval and the lower class boundary of the lowest class interval.
𝑹𝒂𝒏𝒈𝒆 = 𝑼𝒑𝒑𝒆𝒓 𝑪𝒍𝒂𝒔𝒔 𝑩𝒐𝒖𝒏𝒅𝒂𝒓𝒚 − 𝑳𝒐𝒘𝒆𝒓 𝑪𝒍𝒂𝒔𝒔 𝑩𝒐𝒖𝒏𝒅𝒂𝒓𝒚
Example: Scores in Second Periodical Test of I – Faith in Mathematics I

Scores      Frequency
46 - 50      1
41 - 45     10
36 - 40     10
31 - 35     16
26 - 30      9
21 - 25      4

Solution:
Upper class boundary = 50.5
Lower class boundary = 20.5
R = 50.5 − 20.5
R = 30
Variance and Standard Deviation
For large quantities, the variance and standard deviation are computed using a frequency distribution with
columns for the midpoint value, the product of the frequency and midpoint value for each interval, the deviation
and its square, and finally the product of the frequency and the squared deviation.
Example: The table is the distribution of the number of mistakes 50 students made in factoring 20
quadratic equations. Compute the standard deviation.

Number of Mistakes    Frequency
18 – 20                2
15 – 17                5
12 – 14                6
9 – 11                10
6 – 8                 15
3 – 5                  8
0 – 2                  4
Total                 50

Mean

$\bar{x} = \dfrac{\sum (f x_m)}{N} = \dfrac{437}{50} = 8.74$


Solution:

Number of Mistakes    f     x_m    f·x_m    x_m − x̄    (x_m − x̄)²    f(x_m − x̄)²
18 – 20                2    19      38      10.26      105.2676       210.54
15 – 17                5    16      80       7.26       52.7076       263.54
12 – 14                6    13      78       4.26       18.1476       108.89
9 – 11                10    10     100       1.26        1.5876        15.88
6 – 8                 15     7     105      -1.74        3.0276        45.41
3 – 5                  8     4      32      -4.74       22.4676       179.74
0 – 2                  4     1       4      -7.74       59.9076       239.63
Total                 50           437                                1063.62


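Variance

Population: $\sigma^2 = \dfrac{\sum f(x_m - \mu)^2}{N} = \dfrac{1063.62}{50} = 21.2724 \approx 21.27$

Sample: $s^2 = \dfrac{\sum f(x_m - \bar{x})^2}{n - 1} = \dfrac{1063.62}{49} = 21.706531 \approx 21.71$

Standard Deviation

Population: $\sigma = \sqrt{\dfrac{\sum f(x_m - \mu)^2}{N}} = \sqrt{\dfrac{1063.62}{50}} = \sqrt{21.2724} \approx 4.61$

Sample: $s = \sqrt{\dfrac{\sum f(x_m - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{1063.62}{49}} = \sqrt{21.706531} \approx 4.66$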
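
The grouped-data computation can be sketched as follows, treating every score in a class as if it were equal to the class mark. The tuple layout `(lower limit, upper limit, frequency)` is an assumption made for this illustration.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for the number of mistakes
classes = [(18, 20, 2), (15, 17, 5), (12, 14, 6), (9, 11, 10),
           (6, 8, 15), (3, 5, 8), (0, 2, 4)]

marks = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)
grouped_mean = sum(f * xm for f, xm in zip(freqs, marks)) / n

# Sum of f times the squared deviation of each class mark from the mean
sum_sq = sum(f * (xm - grouped_mean) ** 2 for f, xm in zip(freqs, marks))

pop_variance = sum_sq / n
sample_variance = sum_sq / (n - 1)
pop_sd = sqrt(pop_variance)
sample_sd = sqrt(sample_variance)

print(round(grouped_mean, 2), round(sum_sq, 2))
print(round(pop_variance, 2), round(pop_sd, 2))
print(round(sample_variance, 2), round(sample_sd, 2))
```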

Alternate Formula for SAMPLE VARIANCE & STANDARD DEVIATION

$s^2 = \dfrac{n \sum f x_m^2 - \left(\sum f x_m\right)^2}{n(n - 1)}$

$s = \sqrt{\dfrac{n \sum f x_m^2 - \left(\sum f x_m\right)^2}{n(n - 1)}}$

Where: s² = sample variance
s = sample standard deviation
n = number of values in the sample (total frequency)
f = frequency
$x_m$ = class mark or midpoint

Example: The table is the distribution of the number of mistakes 50 students made in factoring 20 quadratic
equations. Compute the standard deviation.

Number of Mistakes    Frequency
18 – 20                2
15 – 17                5
12 – 14                6
9 – 11                10
6 – 8                 15
3 – 5                  8
0 – 2                  4
Total                 50

Solution:

Number of Mistakes    f     x_m    f·x_m    f·x_m²
18 – 20                2    19      38       722
15 – 17                5    16      80      1280
12 – 14                6    13      78      1014
9 – 11                10    10     100      1000
6 – 8                 15     7     105       735
3 – 5                  8     4      32       128
0 – 2                  4     1       4         4
Total                 50           437      4883

$s^2 = \dfrac{n \sum f x_m^2 - \left(\sum f x_m\right)^2}{n(n - 1)} = \dfrac{50(4883) - (437)^2}{50(50 - 1)} = \dfrac{53\,181}{2450} \approx 21.706531$

$s = \sqrt{\dfrac{n \sum f x_m^2 - \left(\sum f x_m\right)^2}{n(n - 1)}} = \sqrt{21.706531} \approx 4.66$
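
The alternate formula needs only two running sums, ∑f·x_m and ∑f·x_m², so it can be sketched with integer arithmetic until the final division. The variable names are illustrative.

```python
from math import sqrt

# (class mark, frequency) for the number of mistakes
marks_and_freqs = [(19, 2), (16, 5), (13, 6), (10, 10), (7, 15), (4, 8), (1, 4)]

n = sum(f for _, f in marks_and_freqs)
sum_fx = sum(f * xm for xm, f in marks_and_freqs)        # sum of f * x_m
sum_fx2 = sum(f * xm ** 2 for xm, f in marks_and_freqs)  # sum of f * x_m squared

numerator = n * sum_fx2 - sum_fx ** 2
sample_variance = numerator / (n * (n - 1))
sample_sd = sqrt(sample_variance)

print(sum_fx, sum_fx2, numerator)
print(round(sample_variance, 6), round(sample_sd, 2))
```

This form avoids computing the deviations from the mean, so there is no rounding of the mean to carry through the table.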

• Descriptive statistics is a statistical procedure concerned with describing the
characteristics and properties of a group of persons, places or things.
• Among the measurements falling under descriptive statistics are the measures of
central tendency, the measures of variability, summation, and other items which help in
describing a data set.
• There are three measures of central tendency, namely the mean, the median, and the mode, which
are used for both grouped and ungrouped data.
• Measures of variability or dispersion are measures of the average distance of each
observation from the center of the distribution. They measure the homogeneity or
heterogeneity of a particular group. They include:
1. Range: R = H – L
2. Standard Deviation (σ for population and s for sample)
3. Variance (σ² for population and s² for sample): the square of the standard deviation.
4. Coefficient of Variation (cv): used to compare the variability of two or more sets of data having
different units.

