
Module - Data Management (Part 2)

The document discusses measures of variability or dispersion for data sets. It defines range, variance, and standard deviation as common measures of variability. Range is the difference between the highest and lowest values in a data set. Variance and standard deviation take into account all values in a data set and how spread out they are from the mean. The document provides examples of calculating range, variance, and standard deviation for two sample data sets to demonstrate how set B has higher variability than set A.

Uploaded by

Nikki Jean Hona
Copyright
© © All Rights Reserved

DATA MANAGEMENT (STATISTICS)

Objectives
1. Recognize the basic terms of statistics.
2. Determine and apply the measures of central tendency, variability, and
position.
3. Apply the measures of central tendency, and variability in normal
distribution.
4. Determine the linear regression and correlation of the set of data.
Lesson Proper

LESSON 4. Measures of Variability/ Dispersion/Spread/ Scattering

Let us first compare the two sets of data.


                     Set A            Set B
Data                 2, 3, 3, 4, 8    2, 3, 3, 3, 9
Mean ($\bar{x}$)     4                4
Median ($\tilde{x}$) 3                3
Mode ($\hat{x}$)     3                3

The two sets contain different values, yet their mean, median, and mode are the same.
Measures of central tendency alone therefore cannot tell the two sets apart. To see how
they differ, we use measures of variability (also called dispersion, spread, or
scattering). These descriptive measures tell how spread out the data are.

Like the measures of central tendency (mean, median, and mode), measures of variability
come in several types, such as the range, the variance, and the standard deviation. The
larger a measure of variability, the more spread out the data are. Similarly, the
smaller the measure, the closer the data values are to one another.

Ungrouped Data
1. Range
The range is the difference between the highest value (HV) and the lowest value (LV).
$$R = HV - LV$$
The range of Set A is $R = 8 - 2 = 6$, while the range of Set B is $R = 9 - 2 = 7$. This
implies that the data in Set A are closer together than the data in Set B.

The range is not as reliable as the other measures of variability because it does not
consider the data between the highest and lowest values. The other measures of
variability take these data into account.

Mathematics in the Modern World – Data Management (Part 2) – Madrazo, A. (2020), [email protected] | 1
For instance, consider the data below. Both sets have the same range ($R = 8 - 2 = 6$),
but the behavior of the data between the lowest value and the highest value is not the
same.

Set C            Set D
2, 2, 2, 2, 8    2, 3, 4, 6, 8
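
The range computations above can be checked with a short Python sketch. The function name `value_range` and the dictionary layout are illustrative choices, not part of the module.

```python
def value_range(data):
    """Range of ungrouped data: highest value minus lowest value (R = HV - LV)."""
    return max(data) - min(data)


sets = {
    "A": [2, 3, 3, 4, 8],
    "B": [2, 3, 3, 3, 9],
    "C": [2, 2, 2, 2, 8],
    "D": [2, 3, 4, 6, 8],
}

ranges = {name: value_range(values) for name, values in sets.items()}
print(ranges)  # {'A': 6, 'B': 7, 'C': 6, 'D': 6}
```

Sets A, C, and D share the same range even though their middle values behave differently, which is why the range alone is a weak measure of variability.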

2. Variance
Variance of Population
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Where:
$\sigma^2$ (lowercase Greek letter sigma, squared) is the population variance
$\sum$ (uppercase Greek letter sigma) means summation
$x$ is each datum
$\mu$ (lowercase Greek letter mu) is the mean of the population
$N$ is the population size

Variance of Sample
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Where:
$\sum$ (uppercase Greek letter sigma) means summation
$s^2$ is the sample variance
$x$ is each datum
$\bar{x}$ is the mean of the sample
$n$ is the sample size

Use the population formula if the data come from the whole population. Likewise, use
the sample formula if the data come from a sample.

Let us determine the variance of each set, assuming the data come from a sample.

                     Set A            Set B
Data                 2, 3, 3, 4, 8    2, 3, 3, 3, 9
Mean ($\bar{x}$)     4                4
Median ($\tilde{x}$) 3                3
Mode ($\hat{x}$)     3                3

Set A
$x$    $x - \bar{x}$     $(x - \bar{x})^2$
2      $2 - 4 = -2$      $(-2)^2 = 4$
3      $3 - 4 = -1$      $(-1)^2 = 1$
3      $3 - 4 = -1$      $(-1)^2 = 1$
4      $4 - 4 = 0$       $(0)^2 = 0$
8      $8 - 4 = 4$       $(4)^2 = 16$
       $\bar{x} = 4$     $\sum (x - \bar{x})^2 = 22$

There are five data points (i.e., 2, 3, 3, 4, and 8), so $n = 5$.
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{22}{5 - 1} = \frac{22}{4} = \frac{11}{2} = 5.5$$
The variance of Set A is 5.5. Let us also determine the variance of Set B.
Set B
$x$    $x - \bar{x}$     $(x - \bar{x})^2$
2      $2 - 4 = -2$      $(-2)^2 = 4$
3      $3 - 4 = -1$      $(-1)^2 = 1$
3      $3 - 4 = -1$      $(-1)^2 = 1$
3      $3 - 4 = -1$      $(-1)^2 = 1$
9      $9 - 4 = 5$       $(5)^2 = 25$
       $\bar{x} = 4$     $\sum (x - \bar{x})^2 = 32$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{32}{5 - 1} = \frac{32}{4} = 8$$
The variance of Set B is 8, compared with 5.5 for Set A. Since the variance of Set B is
larger than the variance of Set A, the data in Set B are more spread out. Likewise, the
data in Set A are closer to one another than the data in Set B.
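
The table-based computation above can be mirrored step by step in Python. This is a minimal sketch; the helper names `sample_variance` and `population_variance` are illustrative, and the standard library's `statistics.variance` is used only as a cross-check.

```python
from statistics import mean, variance


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(data)
    squared_deviations = [(x - x_bar) ** 2 for x in data]
    return sum(squared_deviations) / (len(data) - 1)


def population_variance(data):
    """Same sum of squared deviations, divided by N instead of n - 1."""
    mu = mean(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


set_a = [2, 3, 3, 4, 8]
set_b = [2, 3, 3, 3, 9]

print(sample_variance(set_a), sample_variance(set_b))          # 5.5 8.0
print(population_variance(set_a), population_variance(set_b))  # 4.4 6.4
print(variance(set_a) == sample_variance(set_a))               # True
```

Dividing by N instead of n - 1 gives a smaller value for the same data, so it matters which formula is chosen.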

3. Standard Deviation
Standard Deviation of Population
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Where:
$\sigma$ (lowercase Greek letter sigma) is the population standard deviation
$\sum$ (uppercase Greek letter sigma) means summation
$x$ is each datum
$\mu$ (lowercase Greek letter mu) is the mean of the population
$N$ is the population size

Standard Deviation of Sample
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Where:
$\sum$ (uppercase Greek letter sigma) means summation
$s$ is the sample standard deviation
$x$ is each datum
$\bar{x}$ is the mean of the sample
$n$ is the sample size

Use the population formula if the data come from the whole population. Likewise, use
the sample formula if the data come from a sample.
Standard deviation and variance are related. The square root of the variance is equal
to the standard deviation. Similarly, the square of the standard deviation is equal to
the variance.

Therefore:
Set A
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{5.5} \approx 2.35$$

Set B
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{8} \approx 2.83$$
Similarly, the results reveal that the data in Set B are more dispersed than the data
in Set A.
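
As a quick check, Python's standard `statistics` module provides `stdev` (sample, divides by n - 1) and `pstdev` (population, divides by N). The sketch below assumes the two sets are treated as samples, as in the worked example; the dictionary names are illustrative.

```python
from statistics import pstdev, stdev

set_a = [2, 3, 3, 4, 8]
set_b = [2, 3, 3, 3, 9]

# Sample standard deviation: square root of the sample variance.
sample_sd = {"A": stdev(set_a), "B": stdev(set_b)}

# Population standard deviation, shown only for comparison.
population_sd = {"A": pstdev(set_a), "B": pstdev(set_b)}

print({name: round(value, 2) for name, value in sample_sd.items()})
# {'A': 2.35, 'B': 2.83}
```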

Grouped Data

Let us consider the same problem we used in measures of central tendency. The ages
of the first 50 persons who entered the mall were tallied, as shown below. Determine
the range, variance, and standard deviation of their ages.
Age Frequency
10 – 19 5
20 – 29 20
30 – 39 10
40 – 49 7
50 – 59 8
Total n=50

1. Range
$$R = \text{upper class boundary of the highest class interval} - \text{lower class boundary of the lowest class interval}$$
To solve, first determine the class boundaries.
Age Frequency Boundaries
10 – 19 5
20 – 29 20 19.5 – 29.5
30 – 39 10
40 – 49 7
50 – 59 8
Total n=50

For example, consider the class 20 – 29. The lower boundary is the average of the lower
limit of the class and the upper limit of the class below it:
$$\frac{20 + 19}{2} = \frac{39}{2} = 19.5$$
Similarly, the upper boundary is the average of the upper limit of the class and
the lower limit of the class above it:
$$\frac{29 + 30}{2} = \frac{59}{2} = 29.5$$
Therefore, the class boundaries are 19.5 – 29.5. From the previous discussions, the
class interval (class width) is 10 (e.g. 40 – 30 = 10). Simply add or subtract the class
interval to determine the other boundaries. For example:
19.5 + 10 = 29.5 and 29.5 + 10 = 39.5, which gives the boundaries 29.5 – 39.5
19.5 - 10 = 9.5 and 29.5 - 10 = 19.5, which gives the boundaries 9.5 – 19.5

Continue the same process to determine the remaining class boundaries.
Age        Frequency    Boundaries
10 – 19    5            9.5 – 19.5
20 – 29    20           19.5 – 29.5
30 – 39    10           29.5 – 39.5
40 – 49    7            39.5 – 49.5
50 – 59    8            49.5 – 59.5
Total      n=50

The lower boundary of the lowest class is 9.5, and the upper boundary of the highest
class is 59.5.
$$R = 59.5 - 9.5 = 50$$
The range is 50.
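
The boundary rule can be sketched in Python as follows. The tuple layout `(lower limit, upper limit, frequency)` and the variable names are assumptions made for illustration. The sketch relies on the fact that the gap between consecutive classes (20 - 19 = 1) is the same throughout the table.

```python
# Each class is (lower limit, upper limit, frequency), in ascending order.
classes = [(10, 19, 5), (20, 29, 20), (30, 39, 10), (40, 49, 7), (50, 59, 8)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]  # 20 - 19 = 1

# Each boundary sits halfway across that gap.
boundaries = [(lower - gap / 2, upper + gap / 2) for lower, upper, _ in classes]

# Range of grouped data: highest upper boundary minus lowest lower boundary.
grouped_range = boundaries[-1][1] - boundaries[0][0]

print(boundaries[0], boundaries[-1])  # (9.5, 19.5) (49.5, 59.5)
print(grouped_range)                  # 50.0
```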
2. Variance
Variance of Population
$$\sigma^2 = \frac{\sum f(x - \mu)^2}{N}$$
Where:
$\sigma^2$ (lowercase Greek letter sigma, squared) is the population variance
$\sum$ (uppercase Greek letter sigma) means summation
$f$ is the frequency of the class
$x$ is the class mark (midpoint) of each class
$\mu$ (lowercase Greek letter mu) is the mean of the population
$N$ is the population size

Variance of Sample
$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1}$$
Where:
$\sum$ (uppercase Greek letter sigma) means summation
$f$ is the frequency of the class
$s^2$ is the sample variance
$x$ is the class mark (midpoint) of each class
$\bar{x}$ is the mean of the sample
$n$ is the sample size

In the previous discussions (measures of central tendency) we already computed the
mean, which is $\bar{x} = 33.1$. Likewise, the class mark was already determined in the
previous discussions. The class mark ($x$) is the average of the lower limit and the
upper limit of each class. We will use these to complete the table.

Age        f     x       $x - \bar{x}$            $(x - \bar{x})^2$         $f(x - \bar{x})^2$
10 – 19    5     14.5    14.5 - 33.1 = -18.6      $(-18.6)^2 = 345.96$      5(345.96) = 1,729.8
20 – 29    20    24.5    24.5 - 33.1 = -8.6       $(-8.6)^2 = 73.96$        20(73.96) = 1,479.2
30 – 39    10    34.5    1.4                      1.96                      19.6
40 – 49    7     44.5    11.4                     129.96                    909.72
50 – 59    8     54.5    21.4                     457.96                    3,663.68
Total      n=50                                                             $\sum f(x - \bar{x})^2 = 7,802$

$$\sum f(x - \bar{x})^2 = 1,729.8 + 1,479.2 + 19.6 + 909.72 + 3,663.68 = 7,802$$

$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{7,802}{50 - 1} = \frac{7,802}{49} \approx 159.22$$
The variance of the sample is 159.22.
3. Standard Deviation
Standard Deviation of Population
$$\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{N}}$$
Where:
$\sigma$ (lowercase Greek letter sigma) is the population standard deviation
$\sum$ (uppercase Greek letter sigma) means summation
$f$ is the frequency of the class
$x$ is the class mark (midpoint) of each class
$\mu$ (lowercase Greek letter mu) is the mean of the population
$N$ is the population size

Standard Deviation of Sample
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$$
Where:
$\sum$ (uppercase Greek letter sigma) means summation
$s$ is the sample standard deviation
$f$ is the frequency of the class
$x$ is the class mark (midpoint) of each class
$\bar{x}$ is the mean of the sample
$n$ is the sample size

We already know that the variance and the standard deviation are related: the standard
deviation is the square root of the variance. We can therefore determine the standard
deviation from the variance.
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{159.22} \approx 12.62$$
This gives us a standard deviation of 12.62.
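
The grouped-data computation can be reproduced with the sketch below. It assumes, as the module does, that the data are a sample and that every value in a class is represented by its class mark. The variable names are illustrative.

```python
from math import sqrt

# Each class is (lower limit, upper limit, frequency).
classes = [(10, 19, 5), (20, 29, 20), (30, 39, 10), (40, 49, 7), (50, 59, 8)]

n = sum(f for _, _, f in classes)                          # total frequency
class_marks = [(lower + upper) / 2 for lower, upper, _ in classes]
frequencies = [f for _, _, f in classes]

# Grouped mean: sum of f * x divided by n.
grouped_mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Sum of f * (x - mean)^2 over all classes.
weighted_squares = sum(
    f * (x - grouped_mean) ** 2 for f, x in zip(frequencies, class_marks)
)

grouped_variance = weighted_squares / (n - 1)              # sample formula
grouped_sd = sqrt(grouped_variance)

print(round(grouped_mean, 1))       # 33.1
print(round(weighted_squares, 2))   # 7802.0
print(round(grouped_variance, 2))   # 159.22
print(round(grouped_sd, 2))         # 12.62
```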

Measures of Location (Position)

Measures of location or position are used to locate the relative position of a data
value in the data set. These include standard scores, percentiles, deciles, and
quartiles.

1. Standard Scores
A standard score, or z-score, tells how many standard deviations a data value
is above or below the mean for a specific distribution of values. If the standard
score is zero, the data value is equal to the mean. If it is positive, the data
value is above the mean. If it is negative, the data value is below the mean.

It is obtained by subtracting the mean from the value and dividing the result by
the standard deviation. The symbol for a standard score is z. The formula is:
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
The formula for samples is:
$$z = \frac{x - \bar{x}}{s}$$
The formula for populations is:
$$z = \frac{x - \mu}{\sigma}$$
The z-score represents the number of standard deviations a data value falls above
or below the mean.

Example. Angelo’s score in Literature is 88, compared with a class mean of 80 and a
standard deviation of 3. His score in Mathematics is 90, with a class mean of 95 and
a standard deviation of 5. In which subject did Angelo perform better?

Solution:
                      Literature    Mathematics
Score/value (x)       88            90
Mean                  80            95
Standard Deviation    3             5

Literature:
$$z = \frac{x - \bar{x}}{s} = \frac{88 - 80}{3} = \frac{8}{3} = 2.\overline{6} \approx 2.67$$
Mathematics:
$$z = \frac{x - \bar{x}}{s} = \frac{90 - 95}{5} = \frac{-5}{5} = -1.0$$
The results show that Angelo’s score in Literature is 2.67 standard deviations above
the mean, while his score in Mathematics is 1 standard deviation below the mean
($z = -1.0$). This implies that, relative to his class, he performed better in Literature
than in Mathematics.
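
The comparison can be sketched in a few lines of Python. The function name `z_score` and the variable names are illustrative.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation


z_literature = z_score(88, 80, 3)
z_mathematics = z_score(90, 95, 5)

# The higher z-score marks the better performance relative to the class.
better_subject = "Literature" if z_literature > z_mathematics else "Mathematics"

print(round(z_literature, 2), z_mathematics, better_subject)
# 2.67 -1.0 Literature
```

Although the raw Mathematics score (90) is higher than the raw Literature score (88), the z-scores show the opposite ranking once each class's mean and spread are taken into account.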

Illustration of the result in Literature ($\bar{x} = 80$, $s = 3$):

z         -3     -2     -1     0      1      2      2.67     3
s         -3s    -2s    -1s    0s     1s     2s     2.67s    3s
Scores    71     74     77     80     83     86     88       89

Illustration of the result in Mathematics ($\bar{x} = 95$, $s = 5$):

z         -3     -2     -1     0      1      2      3
s         -3s    -2s    -1s    0s     1s     2s     3s
Scores    80     85     90     95     100    105    110

Ungrouped Data
A group of students obtained the following scores in their Statistics quiz:
4, 9, 7, 14, 10, 8, 12, 15, 6, 11
Determine the 1st and 3rd quartiles, the 3rd and 7th deciles, and the 25th and 75th
percentiles.

2. Quartiles
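Quartiles divide the ordered data into 4 parts or quarters ($Q_1, Q_2, Q_3, Q_4$). Every
part or quarter is equivalent to 1/4 or 25% of the data.
$$Q_k = \frac{k}{4}(n + 1)$$
Where $k$ is the partition ($k = 1, 2, 3, 4$), and $n$ is the number of terms. The formula
gives the position (rank) of the quartile in the ordered data, not its value. Round off
to the nearest whole number.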

First, arrange the scores in ascending order. Then solve for the quartiles.
4, 6, 7, 8, 9, 10, 11, 12, 14, 15
There are 10 scores ($n = 10$).

First Quartile ($Q_1$):
$$Q_1 = \frac{1}{4}(10 + 1) = \frac{1}{4}(11) = \frac{11}{4} = 2.75 \approx 3$$
Third Quartile ($Q_3$):
$$Q_3 = \frac{3}{4}(10 + 1) = \frac{3}{4}(11) = \frac{33}{4} = 8.25 \approx 8$$

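From the result, $Q_1$ is the 3rd term, which is 7, and $Q_3$ is the 8th term, which is 12.
The position of $Q_2$ is the difference between the positions of $Q_3$ and $Q_1$:
$$8.25 - 2.75 = 5.5 \approx 6$$
This is a difference of positions. It should not be confused with the interquartile
range, which is the difference of the quartile values ($Q_3 - Q_1 = 12 - 7 = 5$).

The position of $Q_2$ can also be solved with the formula:
$$Q_2 = \frac{2}{4}(10 + 1) = \frac{1}{2}(11) = \frac{11}{2} = 5.5 \approx 6$$
This implies that the 2nd quartile is the 6th term, which is 10. The first 6 terms (4, 6,
7, 8, 9, 10) make up roughly the lower 50% of the scores.

4, 6, 7, 8, 9, 10, 11, 12, 14, 15
$Q_1 = 7$ (3rd term), $Q_2 = 10$ (6th term), $Q_3 = 12$ (8th term)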

This means that the first 3 terms (4, 6, 7) make up roughly the lowest 25% of the
scores. Similarly, the first 8 terms (4, 6, 7, 8, 9, 10, 11, 12) make up roughly the
lowest 75% of the scores.

The interpolation technique can be used to get the exact score at a fractional
position. For example, $Q_3$ is not actually the 8th term but the 8.25th term. What is
the score at the 8.25th term?

Interpolation
The 3rd quartile ($Q_3$) is the 8.25th term, which lies between the 8th term and the 9th
term (specifically, a quarter of the way from the 8th term to the 9th term). First, get
the difference of the scores between the 9th term and the 8th term:
$$14 - 12 = 2$$
Then multiply the difference of the scores (2) by the excess after the 8th term (0.25):
$$2(0.25) = 0.5$$
Then add the result to the 8th term:
$$12 + 0.5 = 12.5$$
Therefore, the 3rd quartile ($Q_3$) is the 8.25th term, with a score of 12.5. This is
the actual position and the actual score. The same can be done for the other quartiles.
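
The interpolation steps can be sketched as a small Python function. The name `quartile_by_interpolation` is illustrative, and the sketch only covers $k = 1, 2, 3$, since the position for $k = 4$ falls beyond the last term.

```python
import math


def quartile_by_interpolation(sorted_scores, k):
    """Return (position, score) of the k-th quartile, k in {1, 2, 3}.

    The position is k/4 * (n + 1); a fractional position is resolved by
    linear interpolation between the two neighbouring terms.
    """
    n = len(sorted_scores)
    position = k / 4 * (n + 1)            # 1-based rank, possibly fractional
    lower_rank = math.floor(position)     # term just below the position
    excess = position - lower_rank        # e.g. 0.25 for the 8.25th term
    lower_value = sorted_scores[lower_rank - 1]
    if excess == 0:
        return position, float(lower_value)
    upper_value = sorted_scores[lower_rank]
    return position, lower_value + excess * (upper_value - lower_value)


scores = sorted([4, 9, 7, 14, 10, 8, 12, 15, 6, 11])
for k in (1, 2, 3):
    print(k, quartile_by_interpolation(scores, k))
# 1 (2.75, 6.75)
# 2 (5.5, 9.5)
# 3 (8.25, 12.5)
```

The interpolated values differ from the rounded-position values (7, 10, and 12) because rounding snaps each quartile to the nearest actual score.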

3. Deciles
Deciles divide the ordered data into 10 parts ($D_1, D_2, \ldots, D_9, D_{10}$). Each part is
equivalent to 1/10 or 10% of the data.
$$D_k = \frac{k}{10}(n + 1)$$
Where $k$ is the partition ($k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10$), and $n$ is the number of
terms. The formula gives the position (rank) of the decile in the ordered data. Round
off to the nearest whole number.

From the problem, we are to find the 3rd and 7th deciles.
4, 6, 7, 8, 9, 10, 11, 12, 14, 15

Third Decile ($D_3$):
$$D_3 = \frac{3}{10}(10 + 1) = \frac{3}{10}(11) = \frac{33}{10} = 3.3 \approx 3$$
Seventh Decile ($D_7$):
$$D_7 = \frac{7}{10}(10 + 1) = \frac{7}{10}(11) = \frac{77}{10} = 7.7 \approx 8$$

From the results, the 3rd decile is the 3rd term, which is 7. The 7th decile is the 8th
term, which is 12. The interpolation technique can be used to get the score at the
actual position.

4, 6, 7, 8, 9, 10, 11, 12, 14, 15
$D_3 = 7$ (3rd term), $D_7 = 12$ (8th term)

Roughly speaking, the 1st quartile corresponds to the 2.5th decile, the 2nd quartile to
the 5th decile, and the 3rd quartile to the 7.5th decile.

4. Percentiles
Percentiles divide the ordered data into 100 parts ($P_1, P_2, P_3, \ldots, P_{99}, P_{100}$).
Each part is equivalent to 1/100 or 1% of the data.
$$P_k = \frac{k}{100}(n + 1)$$
Where $k$ is the partition ($k = 1, 2, 3, \ldots, 99, 100$), and $n$ is the number of terms.
The formula gives the position (rank) of the percentile in the ordered data. Round off
to the nearest whole number.

From the given, we are to find the 25th and 75th percentiles.
4, 6, 7, 8, 9, 10, 11, 12, 14, 15

25th Percentile ($P_{25}$):
$$P_{25} = \frac{25}{100}(10 + 1) = \frac{1}{4}(11) = \frac{11}{4} = 2.75 \approx 3$$
75th Percentile ($P_{75}$):
$$P_{75} = \frac{75}{100}(10 + 1) = \frac{3}{4}(11) = \frac{33}{4} = 8.25 \approx 8$$

The 25th percentile is the 3rd term, which is 7. The 75th percentile is the 8th term,
which is 12. The interpolation technique can also be used to get the actual score at
the actual position.

4, 6, 7, 8, 9, 10, 11, 12, 14, 15
$P_{25} = 7$ (3rd term), $P_{75} = 12$ (8th term)

Furthermore, the 50th percentile is the same as the 2nd quartile or the 5th decile,
which is the 6th term. The 6th term is 10.
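
Quartiles, deciles, and percentiles all use the same position rule with a different number of parts (4, 10, or 100). The sketch below makes that explicit. The function names are illustrative, and the rounding is implemented as "round half up" (so 5.5 becomes 6), which is an assumption about the module's rounding convention; Python's built-in `round` rounds halves to the nearest even number instead.

```python
import math


def rounded_position(k, parts, n):
    """Position k/parts * (n + 1), rounded half up to the nearest whole rank."""
    return math.floor(k / parts * (n + 1) + 0.5)


def fractile(sorted_scores, k, parts):
    """Return (rank, score) of the k-th fractile when the data are cut into `parts`."""
    rank = rounded_position(k, parts, len(sorted_scores))
    return rank, sorted_scores[rank - 1]


scores = sorted([4, 9, 7, 14, 10, 8, 12, 15, 6, 11])

results = {
    "Q1": fractile(scores, 1, 4),
    "Q2": fractile(scores, 2, 4),
    "Q3": fractile(scores, 3, 4),
    "D3": fractile(scores, 3, 10),
    "D7": fractile(scores, 7, 10),
    "P25": fractile(scores, 25, 100),
    "P50": fractile(scores, 50, 100),
    "P75": fractile(scores, 75, 100),
}
for name, (rank, score) in results.items():
    print(name, rank, score)
```

In this data set $Q_1$ and $P_{25}$ land on the same term, as do $Q_2$ and $P_{50}$, and $Q_3$ and $P_{75}$.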

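Quartile    Decile       Percentile
            $D_1$        $P_{10}$
            $D_2$        $P_{20}$
$Q_1$       $D_{2.5}$    $P_{25}$
            $D_3$        $P_{30}$
            $D_4$        $P_{40}$
$Q_2$       $D_5$        $P_{50}$
            $D_6$        $P_{60}$
            $D_7$        $P_{70}$
$Q_3$       $D_{7.5}$    $P_{75}$
            $D_8$        $P_{80}$
            $D_9$        $P_{90}$
$Q_4$       $D_{10}$     $P_{100}$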
Grouped Data
The computation of quartiles, deciles, and percentiles in grouped data is the same as
the computation for the median of grouped data.
Problem. The scores of 50 students in Statistics are shown in the table below.
Score Frequency
41 - 45 9
36 – 40 13
31 – 35 15
26 – 30 10
21 – 25 3
Total 50

Determine the following:


a. 𝑄2
b. 𝐷4
c. 𝑃67

1. Quartile
$$Q_k = lb + \left(\frac{\frac{kn}{4} - cf_b}{f}\right) i$$
Where:
$Q_k$ is the $k$th quartile
$lb$ is the lower boundary of the class that contains the quartile
$k$ is the quartile number ($k = 1, 2, 3, 4$)
$n$ is the total frequency
$cf_b$ is the cumulative frequency before the class
$f$ is the frequency of the class
$i$ is the class interval (class width)

Solution. To solve for the 2nd quartile (𝑄2 ) complete first the table.
Score Frequency 𝑐𝑓
41 - 45 9 50
36 – 40 13 41
31 – 35 15 28
26 – 30 10 13
21 – 25 3 3
Total 50

Then solve for $\frac{kn}{4}$:
$$\frac{kn}{4} = \frac{2(50)}{4} = \frac{100}{4} = 25$$
The class 31 – 35 holds the 14th to 28th ranks, which include the 25th rank. Therefore,
the 2nd quartile lies in this class.
Score Frequency 𝑐𝑓
41 - 45 9 50
36 – 40 13 41
31 – 35 15 28
26 – 30 10 13
21 – 25 3 3
Total 50
The lower boundary ($lb$) is halfway between 30 and 31, which is 30.5. The frequency
($f$) of the class is 15. The cumulative frequency before the class ($cf_b$) is 13. The
class interval is 5 (e.g. $31 - 26$ or $35 - 30$). Therefore:
$$Q_2 = lb + \left(\frac{\frac{kn}{4} - cf_b}{f}\right) i$$
$$Q_2 = 30.5 + \left(\frac{\frac{2(50)}{4} - 13}{15}\right) 5$$
$$Q_2 = 30.5 + \left(\frac{25 - 13}{15}\right) 5$$
$$Q_2 = 30.5 + \left(\frac{12}{15}\right) 5 = 30.5 + \left(\frac{4}{5}\right) 5$$
$$Q_2 = 30.5 + 4 = 34.5$$
The 2nd quartile is 34.5.

2. Decile
$$D_k = lb + \left(\frac{\frac{kn}{10} - cf_b}{f}\right) i$$
Where:
$D_k$ is the $k$th decile
$lb$ is the lower boundary of the class that contains the decile
$k$ is the decile number ($k = 1, 2, 3, \ldots, 9, 10$)
$n$ is the total frequency
$cf_b$ is the cumulative frequency before the class
$f$ is the frequency of the class
$i$ is the class interval (class width)
Score Frequency 𝑐𝑓
41 - 45 9 50
36 – 40 13 41
31 – 35 15 28
26 – 30 10 13
21 – 25 3 3
Total 50
To solve for the 4th decile, start with $\frac{kn}{10}$:
$$\frac{kn}{10} = \frac{4(50)}{10} = 4(5) = 20$$
The 20th rank also belongs to the class 31 – 35, which holds the 14th to 28th ranks.
As in the quartile solution earlier, $lb = 30.5$, $cf_b = 13$, $f = 15$, and $i = 5$.
Hence:
$$D_4 = 30.5 + \left(\frac{\frac{4(50)}{10} - 13}{15}\right) 5$$
$$D_4 = 30.5 + \left(\frac{20 - 13}{15}\right) 5 = 30.5 + \left(\frac{7}{15}\right) 5 = 30.5 + \frac{7}{3}$$
$$D_4 \approx 30.5 + 2.33 \approx 32.83$$
The 4th decile is approximately 32.83.

3. Percentile
$$P_k = lb + \left(\frac{\frac{kn}{100} - cf_b}{f}\right) i$$
Where:
$P_k$ is the $k$th percentile
$lb$ is the lower boundary of the class that contains the percentile
$k$ is the percentile number ($k = 1, 2, 3, \ldots, 99, 100$)
$n$ is the total frequency
$cf_b$ is the cumulative frequency before the class
$f$ is the frequency of the class
$i$ is the class interval (class width)
Score Frequency 𝑐𝑓
41 - 45 9 50
36 – 40 13 41
31 – 35 15 28
26 – 30 10 13
21 – 25 3 3
Total 50
To solve for the 67th percentile, start with $\frac{kn}{100}$:
$$\frac{kn}{100} = \frac{67(50)}{100} = \frac{3,350}{100} = 33.5$$
The rank 33.5 falls in the class 36 – 40, since it holds the 29th to 41st ranks.
$$lb = \frac{36 + 35}{2} = \frac{71}{2} = 35.5$$
$n = 50$, $cf_b = 28$, $f = 13$, $i = 5$
$$P_{67} = lb + \left(\frac{\frac{kn}{100} - cf_b}{f}\right) i$$
$$P_{67} = 35.5 + \left(\frac{\frac{67(50)}{100} - 28}{13}\right) 5$$
$$P_{67} = 35.5 + \left(\frac{33.5 - 28}{13}\right) 5 = 35.5 + \left(\frac{5.5}{13}\right) 5 = 35.5 + \frac{27.5}{13}$$
$$P_{67} \approx 35.5 + 2.12 \approx 37.62$$
The 67th percentile is approximately 37.62.
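
The three grouped-data formulas differ only in the divisor (4, 10, or 100), so one function can cover all of them. The sketch below is illustrative: the name `grouped_fractile` and the tuple layout are assumptions, and the classes are listed in ascending order so the cumulative frequency can be built while scanning.

```python
def grouped_fractile(classes, k, parts):
    """k-th fractile of grouped data: lb + ((k*n/parts - cf_b) / f) * i.

    classes: list of (lower limit, upper limit, frequency), ascending.
    parts:   4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    width = classes[1][0] - classes[0][0]   # class interval i
    gap = classes[1][0] - classes[0][1]     # gap between adjacent class limits
    target = k * n / parts                  # the rank kn/parts
    cf_before = 0                           # cumulative frequency before the class
    for lower, _, f in classes:
        if cf_before + f >= target:         # the target rank falls in this class
            lower_boundary = lower - gap / 2
            return lower_boundary + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("k/parts must not exceed 1")


scores = [(21, 25, 3), (26, 30, 10), (31, 35, 15), (36, 40, 13), (41, 45, 9)]

q2 = grouped_fractile(scores, 2, 4)
d4 = grouped_fractile(scores, 4, 10)
p67 = grouped_fractile(scores, 67, 100)
print(round(q2, 2), round(d4, 2), round(p67, 2))  # 34.5 32.83 37.62
```

For $P_{67}$ the rank is $kn/100 = 33.5$, so the numerator inside the parentheses is $33.5 - 28 = 5.5$.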

Box – and - Whisker Plot

Three steps in creating box – and – whisker plot:


1. Calculate the values of the five – number summary.
2. Draw and translate data sets to and from a box – and – whisker plot.
3. Interpret the shape of a box – and – whisker plot.
To illustrate, let us consider the scenario below.
Suppose Maria’s scores in Calculus quizzes are 15, 2, 8, 9, 5, 5, 13, 10, 7, 9, and 4.
Draw a box-and-whisker plot and interpret the result.
Step 1. Five – number summary
The five – number summary is composed of:
a. Least value;
b. Greatest value;
c. First Quartile (𝑄1 );
d. Second Quartile (𝑄2 ); and
e. Third Quartile (𝑄3 )
First, arrange the given scores in increasing order:
2, 4, 5, 5, 7, 8, 9, 9, 10, 13, 15
The least value is 2 and the greatest value is 15. There are 11 scores ($n = 11$).

First Quartile ($Q_1$):
$$Q_1 = \frac{1}{4}(11 + 1) = \frac{1}{4}(12) = \frac{12}{4} = 3$$
The 3rd term is 5 (lower quartile).

Second Quartile ($Q_2$):
$$Q_2 = \frac{2}{4}(11 + 1) = \frac{1}{2}(12) = \frac{12}{2} = 6$$
The 6th term is 8 (median).

Third Quartile ($Q_3$):
$$Q_3 = \frac{3}{4}(11 + 1) = \frac{3}{4}(12) = \frac{36}{4} = 9$$
The 9th term is 10 (upper quartile).
2, 4, 5, 5, 7, 8, 9, 9, 10, 13, 15
L = 2 (1st term), $Q_1 = 5$ (3rd term), $Q_2 = 8$ (6th term), $Q_3 = 10$ (9th term), H = 15 (11th term)
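
The five-number summary can be cross-checked with Python's standard library. The `"exclusive"` method of `statistics.quantiles` places the quartiles at positions $k(n + 1)/4$, which matches the position formula used in this module; the variable names below are illustrative.

```python
from statistics import quantiles

quiz_scores = sorted([15, 2, 8, 9, 5, 5, 13, 10, 7, 9, 4])

# Quartile cut points at positions k * (n + 1) / 4 for k = 1, 2, 3.
q1, q2, q3 = quantiles(quiz_scores, n=4, method="exclusive")

five_number_summary = (min(quiz_scores), q1, q2, q3, max(quiz_scores))
print(five_number_summary)  # (2, 5.0, 8.0, 10.0, 15)
```

These five values are the positions of the left whisker end, the three vertical lines of the box, and the right whisker end.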

Step 2. Draw and translate data sets to and from a box – and – whisker plot.
a) In the number line, draw congruent vertical lines positioned at 𝑄1 , 𝑄2, 𝑎𝑛𝑑 𝑄3.

b) Horizontally connect endpoints of 𝑄1 , 𝑎𝑛𝑑 𝑄3 lines to create box.

c) Connect the least and highest value to the box. These lines are called whiskers.

Step 3. Interpret the shape of a box – and – whisker plot.


Observations.
• The third quarter (from the median to the upper quartile) is the narrowest section,
so the data there are densely concentrated.
• The upper whisker is more widely spread than the other sections.
This shows that the data are slightly skewed to the right, because the data to the
right of the box are more spread out than the data to the left.

LESSON 5. Normal Distribution

In a normal distribution, the mean, median, and mode coincide, and the data form a
“bell curve”. A normal distribution is a perfectly symmetric, mound-shaped
distribution. If data are normally distributed, most of the data are densely
concentrated toward the center. Note that equal mean, median, and mode indicate
symmetry, but they do not by themselves prove that data are normally distributed.

Characteristics of normal distribution.


1. The mean, median, and mode coincide (are equal to one another).
2. The shape is like a bell.
3. The mean, median, and mode divide the curve into two equal parts: the lower 50%
of the data and the upper 50% of the data.
4. The curve is asymptotic to the horizontal axis. This means that the curve
approaches but never touches the x-axis.

[Figure: bell-shaped curve with the mean, median, and mode at the center, dividing the
area into the lower 50% and the upper 50%.]

Empirical Rule

1. The area from $z = -1.0$ to $z = 1.0$ is approximately 68% of the total area.
2. The area from $z = -2.0$ to $z = 2.0$ is approximately 95% of the total area.
3. The area from $z = -3.0$ to $z = 3.0$ is approximately 99.7% of the total area.

Source: Almukkahal, R., et. al. (2016). CK-12 Advanced Probability and Statistics Concepts.
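
The three percentages of the empirical rule can be verified numerically with `statistics.NormalDist` from Python's standard library, whose `cdf` method returns the area to the left of a value. The variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -k and +k standard deviations for k = 1, 2, 3.
areas = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```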

Problem. Suppose the scores of 250 students in a certain exam are normally distributed
with a mean of 30 and a standard deviation of 6. Answer the following.
1. How many students scored less than 25? This can be written as $P(x < 25)$, the
probability that a score is less than 25.
2. How many students scored higher than 36? $P(x > 36)$
3. How many students scored from 24 to 44? $P(24 < x < 44)$
Solution.
Use the z-score:
$$z = \frac{x - \bar{x}}{s} \quad \text{or} \quad z = \frac{x - \mu}{\sigma}$$
1. $P(x < 25)$
$$z = \frac{x - \bar{x}}{s} = \frac{25 - 30}{6} = \frac{-5}{6} \approx -0.83$$
Use the Standard Normal Distribution table (see Appendix).

The computed z is −0.83, which is split into −0.8 and −0.03. Since z is negative, use
the table for negative z-scores. Locate −0.8 in the leftmost column of the table and
−0.03 in the top row. The value at their intersection is the area (probability) under
the normal curve to the left of z.

Source: http://onlinestatbook.com/2/calculators/normal_dist.html

The intersection is 0.20327 or 20.327%. This is the area under the normal curve to the
left of $z = -0.83$.

[Figure: normal curve with the area to the left of $z = -0.83$ ($x = 25$) shaded, equal to
0.20327 or 20.33%; the mean is $\bar{x} = 30$.]

Since there are 250 students, about 20.33% of them scored less than 25. Therefore:
$$250 \times 0.20327 \approx 51$$
About 51 students scored less than 25. The result is rounded to a whole number
because the number of students is a count.

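2. $P(x > 36)$
$$z = \frac{x - \bar{x}}{s} = \frac{36 - 30}{6} = \frac{6}{6} = 1.00$$
Locate 1.0 and 0.00 in the table for positive z-scores. We get 0.84134 or 84.134%,
which is the area to the left of $z = 1.00$.

[Figure: normal curve with the mean $\bar{x} = 30$ and $x = 36$ ($z = 1.00$) marked; 84.13% of the
area lies to the left of $z = 1.00$ and 15.87% lies to the right.]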

But we need the number of students who scored higher than 36. This is the area of the
other section, to the right of $z = 1.00$, which is $100\% - 84.13\% = 15.87\%$. Therefore:
$$250 \times 0.1587 \approx 40$$
About 40 students scored higher than 36.

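3. $P(24 < x < 44)$
$$z = \frac{x - \bar{x}}{s} = \frac{24 - 30}{6} = \frac{-6}{6} = -1.00$$
Locate: −1.0 and 0.00

$$z = \frac{x - \bar{x}}{s} = \frac{44 - 30}{6} = \frac{14}{6} \approx 2.33$$
Locate: 2.3 and 0.03

The area to the left of $z = -1.00$ is 0.15866 or 15.87%. The area to the left of
$z = 2.33$ is 0.99010 or 99.01%.

[Figure: normal curve with $x = 24$ ($z = -1.00$), $\bar{x} = 30$, and $x = 44$ ($z = 2.33$) marked; 15.87%
of the area lies to the left of $z = -1.00$, 99.01% lies to the left of $z = 2.33$, and 83.14%
lies between them.]

To get the area in between, subtract the smaller area from the larger area.
$$99.01\% - 15.87\% = 83.14\%$$
So, 83.14% or 0.8314 of the students scored from 24 to 44. Therefore:
$$250 \times 0.8314 \approx 208$$
About 208 students scored from 24 to 44.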
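
The three answers can be checked without a printed table by using `statistics.NormalDist` from Python's standard library. This sketch works directly with the raw scores (mean 30, standard deviation 6), so it uses unrounded z-values; the probabilities therefore differ slightly from the table lookups, but the rounded student counts agree. The variable names are illustrative.

```python
from statistics import NormalDist

exam_scores = NormalDist(mu=30, sigma=6)
students = 250

p_below_25 = exam_scores.cdf(25)                        # area to the left of 25
p_above_36 = 1 - exam_scores.cdf(36)                    # area to the right of 36
p_24_to_44 = exam_scores.cdf(44) - exam_scores.cdf(24)  # area between 24 and 44

# Expected number of students in each region, rounded to whole students.
counts = [round(students * p) for p in (p_below_25, p_above_36, p_24_to_44)]
print(counts)  # [51, 40, 208]
```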

LESSON 6. Linear Regression

Linear regression is a way of determining the equation of the relationship between two
variables, given that the two variables are really related. Draw a scatter diagram
(scatter plot) to approximate the relationship between the variables. The line of most
interest is called the line of best fit or the least-squares regression line. It is the
line for a set of bivariate data that minimizes the sum of the squares of the vertical
deviations from each point to the line.

The Formula for the Least – Squares Line


Given the ordered pairs of related variables $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$, the
equation will be
$$\hat{y} = ax + b$$
Where:
$$a = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} \quad \text{and} \quad b = \bar{y} - a\bar{x}$$
$\bar{x}$ is the mean of the $x$ values, and $\bar{y}$ is the mean of the $y$ values.

Let us determine the equation of the line that approximately models the relationship
of the following points:
A(1,2), B(2,3), C(3,3), D(4,6), E(5,4)

[Figure: scatter plot of points A to E.]

$x$            $y$            $xy$            $x^2$
1              2              1(2) = 2        1
2              3              6               $2^2 = 4$
3              3              9               9
4              6              24              16
5              4              20              25
$\sum x = 15$  $\sum y = 18$  $\sum xy = 61$  $\sum x^2 = 55$

$$\bar{x} = \frac{15}{5} = 3 \qquad \bar{y} = \frac{18}{5} = 3.6$$

$$a = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} = \frac{5(61) - (15)(18)}{5(55) - (15)^2} = \frac{305 - 270}{275 - 225} = \frac{35}{50} = \frac{7}{10} = 0.7$$

$$b = \bar{y} - a\bar{x} = 3.6 - \frac{7}{10}(3) = 3.6 - 2.1 = \frac{3}{2} = 1.5$$
Therefore, the equation of the least-squares regression line is:
$$\hat{y} = \frac{7}{10}x + \frac{3}{2} \quad \text{or} \quad \hat{y} = 0.7x + 1.5$$

[Figure: scatter plot of points A to E with the regression line.]

The line is increasing, with a slope (steepness or inclination) of $\frac{7}{10}$ or 0.7 per
unit of $x$ and a y-intercept of $\frac{3}{2}$ or 1.5.
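
The least-squares formulas can be applied directly in Python. This is a minimal sketch with illustrative variable names; it computes the sums from the five points, where $\sum x^2 = 1 + 4 + 9 + 16 + 25 = 55$.

```python
points = [(1, 2), (2, 3), (3, 3), (4, 6), (5, 4)]  # A, B, C, D, E

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)

# a = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2),  b = y_bar - a * x_bar
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * (sum_x / n)

print(sum_x, sum_y, sum_xy, sum_x2)           # 15 18 61 55
print(round(slope, 2), round(intercept, 2))   # 0.7 1.5
```

The fitted line passes through the point of means $(\bar{x}, \bar{y}) = (3, 3.6)$, since $0.7(3) + 1.5 = 3.6$.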

Linear Correlation Coefficient

The linear correlation coefficient, denoted by $r$, is used to determine the strength of
a linear relationship between two variables. Given the ordered pairs of related
variables $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$, the equation will be
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2} \cdot \sqrt{n(\sum y^2) - (\sum y)^2}}$$
If $r$ is positive, as one variable increases, the other one also increases.
If $r$ is negative, as one variable increases, the other one decreases.
The closer $|r|$ is to 1, the stronger the linear relationship between the variables,
while $r = 0$ means there is no linear correlation.

Further interpretation. The following are the accepted guidelines for interpreting the
correlation coefficient:

$r$                                      Interpretation
0                                        No linear relationship
+1                                       Perfect positive linear relationship (as one variable
                                         increases, the other variable increases)
-1                                       Perfect negative linear relationship (as one variable
                                         increases, the other variable decreases)
Between 0 and 0.3 (0 and -0.3)           Weak positive (negative) linear relationship
Between 0.3 and 0.7 (-0.3 and -0.7)      Moderate positive (negative) linear relationship
Between 0.7 and 1.0 (-0.7 and -1.0)      Strong positive (negative) linear relationship
Source: Ratner, B. (2009). The correlation coefficient: Its values range between +1/−1, or do they?

Let us consider the same example. After determining the equation of the line of best
fit for the given data, let us now determine the correlation of the data.
A(1,2), B(2,3), C(3,3), D(4,6), E(5,4)

[Figure: scatter plot of points A to E.]

The first 4 columns were already filled in for the previous solution. Here we need an
additional column for $y^2$.

$x$            $y$            $xy$            $x^2$           $y^2$
1              2              1(2) = 2        1               $2^2 = 4$
2              3              6               $2^2 = 4$       $3^2 = 9$
3              3              9               9               9
4              6              24              16              36
5              4              20              25              16
$\sum x = 15$  $\sum y = 18$  $\sum xy = 61$  $\sum x^2 = 55$ $\sum y^2 = 74$

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2} \cdot \sqrt{n(\sum y^2) - (\sum y)^2}}$$
$$r = \frac{5(61) - (15)(18)}{\sqrt{5(55) - (15)^2} \cdot \sqrt{5(74) - (18)^2}}$$
$$r = \frac{305 - 270}{\sqrt{275 - 225} \cdot \sqrt{370 - 324}} = \frac{35}{\sqrt{50} \cdot \sqrt{46}}$$
$$r = \frac{35}{\sqrt{2,300}} = \frac{35}{10\sqrt{23}} = \frac{7}{2\sqrt{23}} = \frac{7\sqrt{23}}{46} \approx 0.73$$

The computed $r$ is +0.73, which means that the variables have a strong positive linear
relationship.
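
The correlation formula can be evaluated the same way as the regression formulas. The sketch below uses illustrative variable names and computes every sum from the five points, including $\sum x^2 = 55$ and $\sum y^2 = 74$.

```python
from math import sqrt

points = [(1, 2), (2, 3), (3, 3), (4, 6), (5, 4)]  # A, B, C, D, E

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)
sum_y2 = sum(y * y for _, y in points)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(sum_x2, sum_y2)  # 55 74
print(round(r, 2))     # 0.73
```

By the interpretation table, a value between 0.7 and 1.0 indicates a strong positive linear relationship.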

References
Almukkahal, R., et al. (2016). CK-12 Advanced Probability and Statistics Concepts.
Flexbook: next generation textbook.
Australian Bureau of Statistics (2013). What is Variable? Retrieved 04 June 2020 from
https://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+variables#:~:text=A%20variable%20is%20any%20characteristics,type%20are%20examples%20of%20variables.
Bluman, A. G. (2018). Elementary Statistics: A Step by Step Approach, Tenth Edition,
ISBN 978 – 1 – 259 -75533. McGraw-Hill Education, New York City, USA.
Retrieved 03 June 2020 from https://b-ok.asia/book/5009088/f236d3
Dataceutics, Inc. (2018). Sir Ronald Aylmer Fisher – The Father of Modern Statistics.
Retrieved 06 June 2020 from
https://www.dataceutics.com/blog/2018/7/24/sir-ronald-aylmer-fisher-the-father-of-modern-statistics
Encyclopaedia Britannica, Inc. (2020). Sir Ronald Aylmer Fisher. Retrieved 06 June
2020 from https://www.britannica.com/science/physical-anthropology
Gupta, S. (2014). Sampling Methods. Retrieved 06 June 2020 from
https://www.slideshare.net/shubhanshug1/seminar-sampling-methods?qid=d1f11eda-cdd5-44b8-81de-f0cd88637e6e&v=&b=&from_search=1
Ratner, B. (2009). The correlation coefficient: Its values range between +1/−1, or do
they? Springer Nature Switzerland. Retrieved 17 June 2020 from
https://doi.org/10.1057/jt.2009.5
Tejada, J.J. & Punzalan, R. B. (2012). On the Misuse of Slovin’s Formula. The Philippine
Statistician, Vol. 61, No. 1, pp. 129 – 136. Retrieved 06 May 2020 from
https://www.psai.ph/docs/publications/tps/tps_2012_61_1_9.pdf
Weiss, N. A. (2012). Elementary Statistics, 8th Edition, ISBN 978 – 0- 321 – 69123 - 1.
Pearson Education, Inc., Boston, USA. Retrieved 03 June 2020 from
https://b-ok.asia/book/1236722/d339a2
http://onlinestatbook.com/2/calculators/normal_dist.html

Appendix A

http://onlinestatbook.com/2/calculators/normal_dist.html

Appendix B

http://onlinestatbook.com/2/calculators/normal_dist.html