Module Stats, 2022
Alaminos, Laguna
S.Y. 2021-2022
MODULE
Statistics and Probability
_________________________________
Name
EXAMPLE 3.2.1
To paraphrase Benjamin Disraeli: "There are lies, darn lies, and DAM STATISTICS."
Compute the mean, median and mode for the following DAM STATISTICS:
A measure of central tendency is a number that represents the typical value in a collection
of numbers. Three familiar measures of central tendency are the mean, the median, and
the mode.
We will let n represent the number of data points in the distribution. Then
Mean = (sum of all the data points)/n
Median = "middle" data point (or average of two middle data points) when the data
points are arranged in numerical order.
Mode = the value that occurs most often (if there is such a value).
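In EXAMPLE 3.2.1 the distribution has 7 data points, so n = 7, and
Mean = (756 + 726 + 710 + 568 + 564 + 440 + 440)/7 = 4204/7 ≈ 600.6.
Using the MEAN, we can say that the typical dam on this list is about 600.6 feet tall.
We can also use the MEDIAN to describe the typical dam height. In order to find the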
median we must first list the data points in numerical order:
756, 726, 710, 568, 564, 440, 440
Because there are 7 data points, the median is the fourth (middle) number on this list, 568.
It is therefore also reasonable to say that on this list the typical dam is 568 feet tall.
We can also use the MODE to describe the typical dam height. Since the number 440
occurs more often than any of the other numbers on this list, the mode is 440.
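If you want to check these three measures by computer, Python's standard `statistics` module provides `mean`, `median` and `mode`. The sketch below assumes the seven dam heights listed above (in feet); the variable names are only illustrative.

```python
import statistics

# Dam heights (feet) from EXAMPLE 3.2.1
heights = [756, 726, 710, 568, 564, 440, 440]

mean_height = statistics.mean(heights)      # sum of the data points divided by n
median_height = statistics.median(heights)  # middle value after sorting
mode_height = statistics.mode(heights)      # most frequent value

print(f"n = {len(heights)}")
print(f"mean   = {mean_height:.2f}")
print(f"median = {median_height}")
print(f"mode   = {mode_height}")
```

Running it prints a mean of about 600.57 feet, a median of 568 feet and a mode of 440 feet.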
EXAMPLE 3.2.2
Survey question: How many semester hours are you taking this semester?
Responses: 15, 12, 18, 12, 15, 15, 12, 18, 15, 16
If the data points have been arranged numerically, we can use this fact to efficiently find
the median.
EXAMPLE
24, 25, 28, 31, 33, 33, 36, 42, 42, 48, 51, 57, 57, 68, 75, 79, 79, 79, 85
SOLUTION
The numbers are already in numerical order. The position of the "middle of the list" is:
(n+1)/2 = (19+1)/2 = 20/2 =10
Thus, the tenth number will be the median. We count until we arrive at the tenth number.
24, 25, 28, 31, 33, 33, 36, 42, 42, [48], 51, 57, 57, 68, 75, 79, 79, 79, 85
The tenth number is 48, so the median is 48.
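The (n + 1)/2 rule can be written as a short function. This is an illustrative sketch (the name `median_by_position` is hypothetical, not from the module); when the position ends in .5 it averages the two numbers on either side of it, as the module does later for frequency tables.

```python
import math

def median_by_position(data):
    """Median via the (n + 1)/2 position rule, with positions counted from 1."""
    ordered = sorted(data)                      # the rule needs numerical order
    n = len(ordered)
    position = (n + 1) / 2                      # e.g. 10 when n = 19, 130.5 when n = 260
    lower = ordered[math.floor(position) - 1]   # number at the position (or just before it)
    upper = ordered[math.ceil(position) - 1]    # same number, or the one just after it
    return (lower + upper) / 2                  # a whole-number position gives lower == upper

scores = [24, 25, 28, 31, 33, 33, 36, 42, 42, 48, 51, 57, 57, 68, 75, 79, 79, 79, 85]
print(median_by_position(scores))   # 48.0
```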
EXAMPLE 3.2.3
Compute the mean, median, and mode for this distribution of test scores:
92, 68, 80, 68, 84
FREQUENCY TABLES
EXAMPLE 3.2.5
Find the mean, median and mode for the following collection of responses to the
question: "How many parking tickets have you received this semester?"
1, 1, 0, 1, 2, 2, 0, 0, 0, 3, 3, 0, 3, 3, 0, 2, 2, 2, 1, 1, 4, 1, 1, 0, 3, 0, 0, 0, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1,
4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 3, 3, 0, 3, 3, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 3, 3,
3, 2, 3, 3, 1, 1, 1, 2, 2, 2, 4, 5, 5, 4, 4, 1, 1, 1, 4, 1, 1, 1, 3, 3, 5, 3, 3, 3, 2, 3, 3, 0, 0, 0, 0, 3, 3,
3, 3, 3, 3, 0, 2, 2, 2, 2, 1, 1, 1, 3, 1, 0, 0, 0, 1, 1, 3, 1, 1, 1, 2, 2, 2, 4, 2, 2, 2, 1, 1, 1, 1, 0, 0, 2,
2, 3, 3, 2, 2, 3, 2, 0, 0, 1, 1, 3, 3, 3, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 0, 1, 1, 1, 3, 1, 1, 1, 2, 2,
2, 1, 1, 1, 2, 1, 1, 1, 3, 3, 5, 3, 3, 1, 1, 1, 3, 3, 3, 3, 1, 1, 1, 4, 1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 2, 2, 5,
5, 2, 3, 3, 4, 4, 3, 2, 2, 2, 1, 5, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 0, 1, 1, 1, 3, 3, 3, 3, 3
Arranged in numerical order, the same 260 responses are:
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5
We can summarize these data in a frequency table:
Value Frequency
0 27
1 96
2 58
3 54
4 18
5 7
Now this table conveys everything that was significant about the distribution of data that
we presented at the beginning of this example. When working with frequency tables,
recall this fundamental fact:
The numbers in the "value" column indicate which numbers appear on the original list of
data. The numbers in the "frequency" column tell how many times the corresponding
value appears on the original list of data.
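Building a frequency table amounts to counting how many times each value appears. A minimal sketch with Python's `collections.Counter`, using only the first ten responses from the list above to keep it short; reading the table back (each value repeated as many times as its frequency) recovers the original data in numerical order.

```python
from collections import Counter

# First ten responses from EXAMPLE 3.2.5 (the full list has 260 responses)
responses = [1, 1, 0, 1, 2, 2, 0, 0, 0, 3]

frequency = Counter(responses)           # value -> number of times it appears

print("Value  Frequency")
for value in sorted(frequency):          # list the values in numerical order
    print(f"{value:>5}  {frequency[value]:>9}")

# Reading the table back: repeat each value "frequency" times
expanded = [value for value in sorted(frequency) for _ in range(frequency[value])]
print(expanded)   # [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
```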
Now we find the mean, median and mode for the data in the table.
MODE
The mode, if it exists, is easiest to find. For data presented in a frequency table, the mode
is the value associated with the greatest frequency (if there is a greatest frequency).
In this case, the greatest frequency is 96 and the associated value is "1," so the mode is
"1." More students received 1 parking ticket than any of the other possibilities.
MEAN
To find the mean, we must have a convenient way to determine the sum of all the data
points, and also a convenient way to determine n, the number of data points in the
distribution. We may be tempted to merely add the six numbers in the "value" column,
and divide by six, but that would be incorrect, because it would fail to take into account
the facts that the distribution includes many more than just six data points and that the
various values do not all occur with the same frequency.
To find n in a case like this, we find the sum of the numbers in the "frequency" column:
n = 27 + 96 + 58 + 54 + 18 + 7 = 260. This makes sense when we recall that the frequencies
tell how many times each of the values occurs.
Finding the sum of all 260 data points is simpler than it may at first seem, when we recall
what the table represents. For example, since the value 0 has a frequency of 27, when we
took the sum of all of the zeroes in the distribution, that subtotal would be (0)(27) = 0.
Likewise, the second row in the table shows us that the value 1 appears 96 times in the
distribution, so when we took the sum of all of the ones, we would get a subtotal of
(1)(96) = 96.
Continuing in this vein, the next row of the table tells us that the value 2 appears in the
distribution 58 times, so when we took the sum of all of the twos from the list, we would
have a subtotal of (2)(58) = 116.
This indicates that in order to find the sum of all of the data points in a frequency table,
we find the sum of all subtotals formed by multiplying a value times its frequency.
$$\text{Mean} = \frac{(0)(27) + (1)(96) + (2)(58) + (3)(54) + (4)(18) + (5)(7)}{27 + 96 + 58 + 54 + 18 + 7} = \frac{481}{260} = 1.85$$
Summarizing the process described above, we have the following general rule for
determining the mean for data in a frequency table:
Mean = S/n
where n, the number of data points in the distribution, is obtained by finding the sum of
all of the frequencies, and S, the sum of the data points, is found by adding all of the
subtotals formed by multiplying a value by its associated frequency.
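The mode and mean rules for a frequency table can be sketched in Python as follows. The table is stored as a dictionary mapping each value to its frequency (an assumption of this sketch, not a notation used in the module), and the function names are hypothetical.

```python
# Frequency table from EXAMPLE 3.2.5: value -> frequency
table = {0: 27, 1: 96, 2: 58, 3: 54, 4: 18, 5: 7}

def frequency_table_modes(freq_table):
    """Values with the greatest frequency (more than one means no single mode)."""
    greatest = max(freq_table.values())
    return [value for value, f in freq_table.items() if f == greatest]

def frequency_table_mean(freq_table):
    """Mean = S/n for data given as a frequency table."""
    n = sum(freq_table.values())                             # sum of the frequencies
    s = sum(value * f for value, f in freq_table.items())    # sum of value x frequency
    return s / n

print(frequency_table_modes(table))   # [1]
print(sum(table.values()))            # 260
print(frequency_table_mean(table))    # 1.85
```

Storing the table as a dictionary keeps each value next to its frequency, which is exactly the pairing the "value" and "frequency" columns express.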
MEDIAN
To find the median, we need to recall that we are trying to find the middle number (or
two middle numbers) in a list of 260 numbers. Recall from earlier that the position of the
middle number is (n+1)/2.
In this case, the position of the middle number is 261/2, or 130.5.
Since 130.5 is located between 130 and 131, the median will be the average of the 130th
and 131st numbers in the ordered list (by the way, since the values are arranged
numerically in the table as we read from top to bottom, this data has already been
ordered). We need to count through the table until we find the 130th and 131st numbers.
This is done as follows, by taking into account the cumulative frequency as the values in
the various rows are "read" from the table. To make a column for cumulative frequency,
we pretend that we are reading data from a list, in numerical order. If we were to do so,
the first 27 numbers on the list would all be "0." This means that after all of the 0s are
read from the list, we would have read a total of 27 numbers. We say that the value 0 has
a cumulative frequency of 27:
Value   Frequency   Cumulative Frequency
0 27 27
1 96
2 58
3 54
4 18
5 7
Next we would read all of the 1s from the list. There are 96 of them. Thus, after we have
read all the 0s and 1s from the list, we would have read a total of 27 + 96 = 123 numbers.
We summarize this by saying that the value 1 has a cumulative frequency of 123.
Continuing this process, after having read all of the 0s and 1s from the list, we would
read all of the 2s from the list. Since the value 2 appears on the list 58 times, after we
read all of the 0s, 1s and 2s from the list, we will have read a total of 27 + 96 + 58 = 181
numbers. We say that the value 2 has a cumulative frequency of 181.
At this point we will stop. Remember, the reason we started making the column for
cumulative frequency was so that we could locate the 130th and 131st numbers, in order
to determine the median. Now we see that the 130th and 131st numbers are both 2s (the
cumulative frequency column tells us, in fact, that the 124th through 181st numbers on
the list are all 2s).
The two middle numbers are both 2s, so the median is 2.
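From the previous example we generalize to form this rule for determining the median
for data in a frequency table (this rule assumes that the values appear in the table in
numerical order):
Median = the number in position (n + 1)/2, where n is the sum of the frequencies. Make a
cumulative frequency column and read down it until the cumulative frequency first reaches
that position. If (n + 1)/2 is a whole number, the median is the value in that row; if it
ends in .5, the median is the average of the numbers in the positions just below and just
above it (for example, the 130th and 131st numbers when the position is 130.5).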
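The cumulative-frequency procedure can be sketched the same way. This hypothetical helper reads the rows in numerical order, keeps a running (cumulative) frequency, and records the value of each row that contains one of the middle positions; the dictionary form of the table is an assumption of the sketch.

```python
import math

def frequency_table_median(freq_table):
    """Median of data given as {value: frequency}, via (n + 1)/2 and cumulative frequency."""
    n = sum(freq_table.values())
    position = (n + 1) / 2
    wanted = [math.floor(position), math.ceil(position)]   # the middle position(s)
    found = []
    cumulative = 0
    for value in sorted(freq_table):                # read the rows in numerical order
        cumulative += freq_table[value]             # cumulative frequency of this row
        while wanted and wanted[0] <= cumulative:   # a middle position falls in this row
            found.append(value)
            wanted.pop(0)
    return (found[0] + found[1]) / 2

parking = {0: 27, 1: 96, 2: 58, 3: 54, 4: 18, 5: 7}
print(frequency_table_median(parking))   # 2.0
```

When (n + 1)/2 is a whole number, both entries of `wanted` are the same position, so the same value is recorded twice and the average is that value.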
EXAMPLE 3.2.6
The frequency table below represents the distribution of scores on a ten-point quiz.
Compute the mean, median, and mode for this distribution.
Quiz Scores
Value Frequency
5 6
6 8
7 14
8 22
9 28
10 36
EXAMPLE 3.2.7
The frequency table below represents the distribution, according to age, of students in a
certain class.
Value Frequency
18 31
19 48
20 60
21 50
22 33
EXAMPLE 3.2.8
The relative frequency table below shows the distribution of scores on a quiz in the
course Quantum Electrodynamics For Liberal Arts. Find the mean, median and mode.
Score   Relative Frequency
4 .03
5 .10
6 .04
7 .12
8 .33
9 .20
10 .18
EXAMPLE 3.2.9
A number of people invested $1000 each in the Gomer Family of Mutual Funds. The
frequency table below shows the current values of those investments. Compute the mean,
median and mode.
Value Frequency
0 48
50 42
75 31
100 28
150 22
EXAMPLE 3.2.10
A number of people invested $1000 each in the Gomer Family of Mutual Funds. The
frequency table below shows the current values of those investments after Gomer hit the
trifecta at the dog track, and hit the Cash 5 jackpot. Compute the mean, median and
mode.
Value Frequency
0 48
50 42
75 31
100 28
150 22
2,876,423 1
If we compare the previous two examples, we see that the two distributions are nearly
identical, except that the distribution in EXAMPLE 3.2.10 contains one extra number
(2,876,423) that is significantly greater than any of the other numbers in the distribution.
(A number that is significantly greater or significantly less than most of the other
numbers in a distribution is called an extreme value or outlier.)
Notice that including this extreme value had a huge effect on the mean of the distribution
(which increased from $61.55 to $16,784.58) but had no effect whatsoever on either the
median or the mode. Also notice that in the distribution in EXAMPLE 3.2.10, the mean is
not a good representation of the typical value in the distribution. This illustrates an
important general fact:
Of the three measures of central tendency (mean, median, mode), the mean is the
measure that is most likely to be distorted by the presence of extreme values.
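The effect of the extreme value can be reproduced by expanding each frequency table back into a list and applying Python's `statistics` functions. A sketch; `expand` is a hypothetical helper, and the tables are the ones from EXAMPLES 3.2.9 and 3.2.10.

```python
import statistics

def expand(freq_table):
    """Turn {value: frequency} back into the full list of data points."""
    return [value for value, f in freq_table.items() for _ in range(f)]

gomer = {0: 48, 50: 42, 75: 31, 100: 28, 150: 22}     # EXAMPLE 3.2.9
gomer_jackpot = {**gomer, 2_876_423: 1}              # EXAMPLE 3.2.10: one extreme value added

for label, table in [("3.2.9", gomer), ("3.2.10", gomer_jackpot)]:
    data = expand(table)
    print(label,
          f"mean = {statistics.mean(data):,.2f}",
          f"median = {statistics.median(data)}",
          f"mode = {statistics.mode(data)}")
```

The printed mean jumps from 61.55 to 16,784.58, while the median stays at 50 and the mode at 0.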
EXAMPLE 3.2.11
The annual earnings for employees of a certain restaurant are given below:
12 laborers earn $8000 each.
10 laborers earn $9000 each.
4 supervisors earn $11000 each.
The owner/manager earns $240,000.
Of the three measures of central tendency, which will be the least accurate representation
of "typical earnings?"
2. Find the mean of the following data: 12, 10, 15, 10, 16, 12, 10, 15, 15, 13
A. 13 B. 12.5 C. 15 D. 12.8
3. Find the mode of the following data: 20, 14, 12, 14, 26, 16, 18, 19, 14
A. 14 B. 17 C. 26 D. 16
5. Find the median of the following data: 25, 20, 30, 30, 20, 24, 24, 30, 31
A. 20 B. 26 C. 25 D. 30
7. Find the mean of the following data: 20, 24, 24, 24, 22, 22, 24, 22, 23, 25
A. 23.5 B. 23 C. 24 D.
9. Find the mean of the following data: 0, 5, 30, 25, 16, 18, 19, 26, 0, 20, 28
A. 0 B. 18 C. 19 D. 17
10. Find the median of the following data: 9, 6, 12, 5, 17, 3, 9, 5, 10, 2, 8, 7
A. 6.5 B. 7.5 C. 6 D. 7.75
The table below shows the distribution of scores on Quiz #6 in MGF1106 for Sections 01-08,
Spring 1999. Refer to the table for exercises 11 - 14.
Score   Frequency
6       11
8       26
10      27
12      32
14      31
16      12
18      15
20      7
11. Select the statement that is correct.
A. 6 people had scores of 11.
B. 27 people had scores of 10.
C. A and B are both correct.
D. All of these are false.
What to Know
Let’s begin with interesting and exploratory activities that would lead to the basic
concepts of measures of variability. You will learn to interpret, draw conclusions and make
recommendations.
After these activities, the learners shall be able to answer the question, “How can I
make use of the representations and descriptions of a given set of data in real-life
situations?”
The lesson on measures of variability will tell you how the values are scattered or
clustered about the typical value.
It is quite possible to have two sets of observations with the same mean or median that
differ in the amount of spread about the mean. Do the following activity.
Activity 1
WHICH TASTES BETTER?
A housewife surveyed canned ham for a special family affair.
She picked 5 cans each from two boxes packed by company A and
company B. Both boxes have the same weight. Consider the
following weights in kilograms of the canned Ham packed by the two
companies (sample A and sample B).
Help the housewife choose the best sample by doing the following procedure.
Measures other than the mean may provide additional information about the same data.
These are the measures of dispersion.
Measures of dispersion or variability refer to the spread of the values about the mean.
These are important quantities used by statisticians in evaluation. Smaller dispersion of scores
arising from the comparison often indicates more consistency and more reliability.
The most commonly used measures of dispersion are the range, the average deviation,
the standard deviation, and the variance.
What to Process
Here you will be provided with enabling activities that you have to go through to
validate your understanding of measures of variability after the activities in the What to
Know phase. These would answer the question “How can I make use of the
representations and descriptions of a given set of data in real-life situations?”
The Range
The range is the simplest measure of variability. It is the difference between the largest
value and the smallest value.
R=H–L
Finding the range of wages: Range = Highest wage – Lowest wage
Comparing the two ranges, you will note that the wages of workers of factory B have a greater
range than the wages of workers of factory A. These ranges tell us that the wages of workers of
factory B are more scattered than the wages of workers of factory A.
Look closely at the wages of workers of factory B. You will see that, except for the highest
wage of 672, the wages of the workers are more consistent than the wages in A. Without the
highest wage of 672, the range would be 480 – 400 = 80, whereas if you exclude the highest
wage of 575 in A, the range would be 520 – 380 = 140.
Can you now say that the wages of workers of factory B are more scattered or variable
than the wages of workers of factory A?
This shows that the range is not a stable measure of variability, because its value can
fluctuate greatly with a change in just a single value, either the highest or the lowest.
ACTIVITY 1
2. The range of each set of scores of the three students is as follows:
Ana H = 98, L = 92, R = 98 – 92 = 6
Josie H = 97, L = 90, R = 97 – 90 = 7
Lina H = 98, L = 89, R = 98 – 89 = 9
a. What have you observed about the range of the scores of the three students?
b. What does it tell you?
3. Consider the following sets of scores: Find the range and the median.
Set A Set B
3 3
4 7
5 7
6 7
8 8
9 8
10 8
12 9
15 15
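To check your answers to item 3, the range R = H – L and the median can be computed as below. A minimal sketch; `data_range` is a hypothetical name (Python's built-in `range` means something else, so a different name is used).

```python
import statistics

# Sets of scores from item 3 above
set_a = [3, 4, 5, 6, 8, 9, 10, 12, 15]
set_b = [3, 7, 7, 7, 8, 8, 8, 9, 15]

def data_range(data):
    """Range R = H - L: the highest value minus the lowest value."""
    return max(data) - min(data)

for name, data in [("Set A", set_a), ("Set B", set_b)]:
    print(name, "range =", data_range(data), "median =", statistics.median(data))
```

Both sets come out with a range of 12 and a median of 8, even though seven of the nine scores in Set B lie between 7 and 9.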
TRY THIS!
ACTIVITY 2
A. Compute the range for each set of numbers.
1. If the range of the set of scores is 29 and the lowest score is 18, what is the
highest score?
2. If the range of the set of scores is 14, and the highest score is 31, what is the
lowest score?
3. The reaction times for a random sample of 9 subjects to a stimulant were recorded
as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 3.4 seconds. Calculate the range.
4. Two students have the following grades in six math tests. Compute the mean and
the range. Tell something about the two sets of scores.
Pete Ricky
82 88
98 94
86 89
80 87
100 92
94 90
The Average Deviation
The dispersion of a set of data about the average of these data is the average deviation
or mean deviation.
$$\text{A.D.} = \frac{\sum |x - \bar{x}|}{N}$$
where A.D. is the average deviation;
$x$ is the individual score;
$\bar{x}$ is the mean; and
$N$ is the number of scores.
$|x - \bar{x}|$ is the absolute value of the deviation from the mean.
Example:
Find the average deviation of the following data: 12, 17, 13, 18, 18, 15, 14, 17, 11
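1. Find the mean: $\bar{x} = \frac{12 + 17 + 13 + 18 + 18 + 15 + 14 + 17 + 11}{9} = \frac{135}{9} = 15$
2. Find the absolute difference between each score and the mean.
$|x - \bar{x}|$ = |12 − 15| = 3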
= |17 − 15| = 2
= |13 − 15| = 2
= |18 − 15| = 3
= |18 − 15| = 3
= |15 − 15| = 0
= |14 − 15| = 1
= |17 − 15| = 2
= |11 − 15| = 4
$\sum |x - \bar{x}| = 20$
$x$   $\bar{x}$   $|x - \bar{x}|$
12 15 3
17 15 2
13 15 2
18 15 3
18 15 3
15 15 0
14 15 1
17 15 2
11 15 4
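$\sum |x - \bar{x}| = 20$
3. Divide the sum by the number of scores, N = 9:
$$\text{A.D.} = \frac{\sum |x - \bar{x}|}{N} = \frac{20}{9} \approx 2.22$$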
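The steps of the example (find the mean, add the absolute deviations, divide by N) translate directly into code. A sketch, with the hypothetical function name `average_deviation`:

```python
def average_deviation(data):
    """A.D. = (sum of |x - mean|) / N."""
    n = len(data)
    mean = sum(data) / n                              # step 1: the mean
    total = sum(abs(x - mean) for x in data)          # step 2: sum of absolute deviations
    return total / n                                  # step 3: divide by N

scores = [12, 17, 13, 18, 18, 15, 14, 17, 11]
print(sum(scores) / len(scores))            # 15.0
print(round(average_deviation(scores), 2))  # 2.22
```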
TRY THIS!
ACTIVITY 3
Find the average deviation of the following:
The average deviation is a better measure of variability than the range because it uses
every score, not just the highest and the lowest. However, it does not lend itself readily to
mathematical treatment for deeper analysis.
The Standard Deviation
IN PAIRS
ACTIVITY 4
Compute the standard deviation of the set of test scores: {39, 10,
24, 16, 19, 26, 29, 30, 5}.
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$
g. Summarize the procedure in computing the standard deviation.
From the activity, you have learned how to compute the standard deviation.
Like the average deviation, standard deviation differentiates sets of scores with equal
averages. But the advantage of the standard deviation over the mean deviation is that it has
several applications in inferential statistics.
To compute the standard deviation of ungrouped data, we use the formula:
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$
where SD is the standard deviation;
$x$ is the individual score;
$\bar{x}$ is the mean; and
$N$ is the number of scores.
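The formula can be checked against Student C's scores from the comparison that follows. This sketch divides by N, as the module's formula does; in Python's `statistics` module that corresponds to `pstdev`, while `stdev` divides by N − 1 (the sample standard deviation) and would give a slightly larger value. The function name `standard_deviation` is hypothetical.

```python
import math
import statistics

def standard_deviation(data):
    """SD = square root of (sum of (x - mean)^2) / N, dividing by N as in the formula above."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)

student_c = [95, 94, 93, 96, 92]                   # Student C's quiz scores
print(round(standard_deviation(student_c), 1))     # 1.4
print(math.isclose(standard_deviation(student_c), statistics.pstdev(student_c)))  # True
```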
In the next discussion, you will learn more about the importance of using the standard
deviation.
Compare the standard deviation of the scores of the three students in their Mathematics
quizzes.
Solution:
Student A:
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{34}{5}} = \sqrt{6.8} \approx 2.6$$
Student B:
∑x 92 + 92 + 96 + 95 + 90
x= N = 5 = 94
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{8}{5}} = \sqrt{1.6} \approx 1.3$$
Student C:
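$$\bar{x} = \frac{\sum x}{N} = \frac{95 + 94 + 93 + 96 + 92}{5} = \frac{470}{5} = 94$$
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{10}{5}} = \sqrt{2} \approx 1.4$$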
The result of the computation of the standard deviation of the scores of the three students
can be summarized as:
SD (A) = 2.6
SD (B) = 1.3
SD (C) = 1.4
The standard deviation of the scores can be illustrated below by plotting the scores on
the number line.
Graphically, a standard deviation of 2.6 means most of the scores are within 2.6 units
of the mean. Standard deviations of 1.3 and 1.4 suggest that most of the scores are within
1.3 and 1.4 units of the mean, respectively.
The scores of Student B are clustered closest to the mean. This shows that Student B's
scores are the most consistent of the three sets of scores.
ACTIVITY 5
A. Compute the standard deviation for each set of numbers.
1. (12, 13, 14, 15, 16, 17, 18)
2. (7, 7, 8, 12, 14, 14, 14, 14, 15, 15)
3. (12, 12, 13, 13, 13, 13, 13, 15, 19, 20, 20)
4. (12, 13, 17, 22, 22, 23, 25, 26)
5. (23, 25, 27, 27, 32, 32, 36, 38)
B. The reaction times for a random sample of nine subjects to a stimulant were recorded as
2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 3.4 seconds. Calculate the range and standard
deviation.
C. Suppose two classes achieved the following grades on a math test. Find the range and
the standard deviation.
ACTIVITY 6
The grades of a student in nine quizzes are 78, 80, 80, 82, 85, 85, 85, 88, 90. Calculate
the mean and standard deviation using a scientific calculator.
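Procedure
1. Set the calculator to statistics mode. A table with columns x and f(x) is displayed.
2. Input the values of x:
78 = 80 = 82 = 85 = 88 = 90 =
3. Input the corresponding frequencies in the f(x) column:
1 = 2 = 1 = 3 = 1 = 1
The displayed output:
     x    f(x)
1    78   1
2    80   2
3    82   1
4    85   3
5    88   1
6    90   1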
Answer: mean ≈ 83.67, SD ≈ 3.74
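The calculator's x / f(x) table can be mimicked by weighting each value by its frequency. A sketch, assuming (as the module's SD formula does) division by N; this gives the 3.74 in the answer, whereas dividing by N − 1 would give about 3.97.

```python
import math

# Values and frequencies as entered in the calculator's x / f(x) table (ACTIVITY 6)
values = [78, 80, 82, 85, 88, 90]
frequencies = [1, 2, 1, 3, 1, 1]

n = sum(frequencies)                                                   # 9 quizzes
mean = sum(x * f for x, f in zip(values, frequencies)) / n
variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / n
sd = math.sqrt(variance)                                               # divide by N, then take the root

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")   # n = 9, mean = 83.67, SD = 3.74
```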
In the next discussion, you will learn about another measure of variability.
The Variance
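The variance ($\sigma^2$) of a set of data is equal to $\frac{1}{N}$ times the sum of the squares
of the data minus the square of their mean, that is, $\sigma^2 = \frac{1}{N}\sum x^2 - \bar{x}^2$.
It is the square of the standard deviation, and it can also be computed from the deviations
from the mean: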
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$$
where $\sigma^2$ is the variance;
$N$ is the total number of observations;
$x$ is the raw score; and
$\bar{x}$ is the mean of the data.
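The module describes the variance both as 1/N times the sum of the squares minus the square of the mean and as the average squared deviation from the mean. The sketch below computes both for the ACTIVITY 6 grades, showing that they agree and that the square root gives the standard deviation. The function names are hypothetical; for very large data the shortcut form can lose precision in floating-point arithmetic because it subtracts two large, nearly equal numbers.

```python
import math

def variance(data):
    """sigma^2 = (sum of (x - mean)^2) / N"""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_shortcut(data):
    """sigma^2 = (1/N) * (sum of x^2) - mean^2"""
    n = len(data)
    mean = sum(data) / n
    return sum(x * x for x in data) / n - mean ** 2

grades = [78, 80, 80, 82, 85, 85, 85, 88, 90]     # ACTIVITY 6 grades
print(round(variance(grades), 6))               # 14.0
print(round(variance_shortcut(grades), 6))      # 14.0
print(round(math.sqrt(variance(grades)), 2))    # 3.74
```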
Variance is not only useful, it can be computed with ease and it can also be broken into
two or more component sums of squares that yield useful information.
ACTIVITY 7
The table shows the daily sales in peso of two sari-sari stores near a school.
Store A Store B
300 300
310 120
290 500
301 100
299 490
295 110
305 300
300 480