Lecture Week 3

Fractiles are statistical measures that divide a distribution into equal parts. The most common fractiles are quartiles, which divide data into four equal parts (Q1, Q2, Q3), deciles which divide data into ten equal parts, and percentiles which divide data into 100 equal parts. Formulas are provided to calculate fractiles for both ungrouped and grouped data distributions based on cumulative frequencies and class boundaries. An example is shown calculating Q1, D6, and P95 for grouped weight data of jackfruits sold at a supermarket.

Uploaded by

Jhovelle Ansay
Copyright
© All Rights Reserved

Fractiles

Fractiles are measures of location or position which include not only central location but also any position
based on the number of equal divisions in a given distribution. If we divide the distribution into four equal
divisions, then we have quartiles, denoted by Q1, Q2, and Q3. The most commonly used fractiles are the
quartiles, deciles, and percentiles.

Fractiles for Ungrouped Data


Quartiles
Quartiles divide a distribution into four equal parts. For example, Q1, or the first quartile, locates the point
which is greater than 25% of the items in a distribution.

Q3 is the 3rd quartile. Q3 = (3N/4)th item
This means that 75% of the observations lie below this value.
Q2 is the 2nd quartile. Q2 = (2N/4)th item, or the median
Q1 is the 1st quartile. Q1 = (N/4)th item
Deciles
Deciles are values that divide a distribution into 10 equal parts.
D1 is the 1st decile. D1 = (N/10)th item
D3 is the 3rd decile. D3 = (3N/10)th item
D5 is the 5th decile. D5 = (5N/10)th item, or the median
Percentiles
Percentiles are values that divide the distribution into 100 equal parts. P10, or the tenth percentile, is the
value that is higher than 10% of the items in the distribution.
P1 is the 1st percentile. P1 = (N/100)th item
P25 is the 25th percentile. P25 = (25N/100)th item, or Q1
P50 is the 50th percentile. P50 = (50N/100)th item, or the median
P67 is the 67th percentile. P67 = (67N/100)th item
Example 1:
Calculate Q1, Q2, Q3, D1, D4, D5, D7, P10, P25, P50, and P70 for the following IQ scores.
87 90 95 96 97 98 98 99
100 100 100 100 100 101 101 102
102 102 103 104 105 107 110

Q1 = (N/4)th = (23/4)th = 5.75th item, which is 98.
Note: For an odd number of observations, when the item number is fractional, take the next higher item. Since the 5th
item is 97, the 5.75th item is the next value, which is 98. This means that the score of 98 is higher than 25% of the
items in the distribution. If the number of cases is even, take the point midway between the two items that the
fractional item number falls between (for the median, the two items at the middle of the distribution).

Q2 = (2N/4)th = (2(23)/4)th = (23/2)th = 11.5th item, which is 100.
This means that the score of 100 is higher than 50% of the items in the distribution.

Q3 = (3N/4)th = (3(23)/4)th = (69/4)th = 17.25th item, which is 102.
D1 = (N/10)th = (23/10)th = 2.3th item, which is 95.
D4 = (4N/10)th = (4(23)/10)th = (92/10)th = 9.2th item, which is 100.
D5 = (5N/10)th = (5(23)/10)th = (115/10)th = 11.5th item, which is 100.
D7 = (7N/10)th = (7(23)/10)th = (161/10)th = 16.1th item, which is 102.
P10 = (10N/100)th = (10(23)/100)th = (23/10)th = 2.3th item, which is 95.
P25 = (25N/100)th = (25(23)/100)th = (23/4)th = 5.75th item, which is 98.
P50 = (50N/100)th = (50(23)/100)th = (23/2)th = 11.5th item, which is 100.
P70 = (70N/100)th = (70(23)/100)th = (1,610/100)th = 16.1th item, which is 102.
Note that the median is equal to Q2, D5, and P50.
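The item-number rule used in Example 1 can be sketched in Python. This is only an illustration of the rule stated in the note above, not a standard library routine: the function name `fractile_ungrouped` and its parameters are made up for this sketch, and the handling of a whole item number (take that item directly) is an assumption, since the notes only describe fractional item numbers.

```python
def fractile_ungrouped(data, k, parts):
    """Return the k-th fractile when the data are cut into `parts` equal parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    values = sorted(data)
    n = len(values)
    # Item number k*N/parts, split into whole part and remainder so that
    # no floating-point comparison is needed.
    whole, remainder = divmod(k * n, parts)
    if remainder == 0:
        # Assumption: a whole item number points directly at that item.
        return values[max(whole, 1) - 1]
    if n % 2 == 1:
        # Odd N: a fractional item number moves up to the next item.
        return values[whole]  # index `whole` is item number whole + 1
    # Even N: midpoint of the two items on either side of the item number.
    lower = values[max(whole, 1) - 1]
    upper = values[whole]
    return (lower + upper) / 2


iq_scores = [87, 90, 95, 96, 97, 98, 98, 99,
             100, 100, 100, 100, 100, 101, 101, 102,
             102, 102, 103, 104, 105, 107, 110]

q1 = fractile_ungrouped(iq_scores, 1, 4)     # 5.75th item
d7 = fractile_ungrouped(iq_scores, 7, 10)    # 16.1th item
p50 = fractile_ungrouped(iq_scores, 50, 100) # 11.5th item
print(q1, d7, p50)
```

Other textbooks and software packages use different interpolation rules, so their quartiles may differ slightly from the ones obtained here.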

Fractiles for Grouped Data

Quartiles
Qk = LL + i [(kN/4 – cf) / fm]
where Qk = kth quartile
LL = lower class boundary of the kth quartile class
cf = less than cumulative frequency below the kth quartile class
fm = frequency of the kth quartile class
i = class size
N = total number of observations
Deciles
Dk = LL + i [(kN/10 – cf) / fm]
where Dk = kth decile
LL = lower class boundary of the kth decile class
cf = less than cumulative frequency below the kth decile class
fm = frequency of the kth decile class
i = class size
N = total number of observations
Percentiles
Pk = LL + i [(kN/100 – cf) / fm]
where Pk = kth percentile
LL = lower class boundary of the kth percentile class
cf = less than cumulative frequency below the kth percentile class
fm = frequency of the kth percentile class
i = class size
N = total number of observations
Example 2:
Find the Q1, D6, and P95 of the data in table 1.
Table 1
Weights of 50 Pieces of Jackfruits Sold in Supermarket Y
Weights (in pounds)    No. of Pieces    <CF
50 – 54                5                5
55 – 59                19               24
60 – 64                22               46
65 – 69                3                49
70 – 74                1                50
Solution:

1. Q1 = (N/4)th = (50/4)th = 12.5th item.
The 1st quartile class is 55–59 since it is where the 12.5th item is found. Hence, LL = 54.5, cf = 5, fm = 19, N = 50, and
i = 5.
Q1 = LL + i [(N/4 – cf) / fm]
= 54.5 + 5 [(12.5 – 5) / 19]
= 54.5 + 1.97
= 56.47

2. D6 = (6N/10)th = (6(50)/10)th = 6(5)th = 30th item.
Thus, the 6th decile class is 60–64 since it is where the 30th item is found. LL = 59.5, cf = 24, fm = 22, i = 5, and N = 50.
D6 = LL + i [(6N/10 – cf) / fm]
= 59.5 + 5 [(30 – 24) / 22]
= 59.5 + 1.36
= 60.86

3. P95 = (95N/100)th = (95(50)/100)th = (95/2)th = 47.5th item.
Thus, the 95th percentile class is 65–69 since it is where the 47.5th item is found. LL = 64.5, cf = 46, fm = 3, i = 5, and N = 50.
P95 = LL + i [(95N/100 – cf) / fm]
= 64.5 + 5 [(47.5 – 46) / 3]
= 64.5 + 2.5
= 67
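The grouped-data formula can be checked with a short Python sketch. The function name `fractile_grouped` and its arguments are hypothetical, and the code assumes whole-number class limits, so that each lower class boundary lies 0.5 below the lower class limit, as in Table 1.

```python
def fractile_grouped(lower_limits, freqs, width, k, parts):
    """k-th fractile of grouped data: LL + i * (kN/parts - cf) / fm."""
    n = sum(freqs)
    target = k * n / parts  # item number kN/parts
    cf = 0                  # cumulative frequency below the current class
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= target:
            # Assumption: integer class limits, boundary is 0.5 below the limit.
            ll = lower - 0.5
            return ll + width * (target - cf) / f
        cf += f
    raise ValueError("k must not exceed parts")


# Table 1: weights of 50 jackfruits (lower class limits and frequencies).
lower_limits = [50, 55, 60, 65, 70]
freqs = [5, 19, 22, 3, 1]

q1 = fractile_grouped(lower_limits, freqs, 5, 1, 4)
d6 = fractile_grouped(lower_limits, freqs, 5, 6, 10)
p95 = fractile_grouped(lower_limits, freqs, 5, 95, 100)
print(round(q1, 2), round(d6, 2), round(p95, 2))
```

The loop finds the fractile class as the first class whose cumulative frequency reaches the item number, which is the same lookup done by eye in the solution.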

Measures of Variation
Events of nature vary from time to time. People keep on changing their location, motion, physical
appearance, skin reaction to different chemicals, height, weight, hair color, eye color, ideas, and even values in
life. Usually, the heights of a group of people of the same race tend to converge to a certain common value. For
example, if the mean height of Filipino males is approximately 5 feet and 6 inches, then this means that most
Filipino male adults have heights that are clustering about this value. The extent of the clustering of the heights of
the Filipino males about a central value is known as variation. The measures of variation will enable you to know
how varied the observations are, whether there are extreme values in the distribution, or whether the values are
very close to each other. If the measure of variation is zero, it means that there is no variation at all and that the
observations are all alike, or homogeneous. Otherwise, they are heterogeneous. The common measures of
variation are the range, mean absolute deviation, variance, standard deviation, coefficient of variation, quartile
deviation, and the percentile range.
Range
The range is the simplest form of measuring the variation of a distribution. To get the range, subtract the
lowest score or observation from the highest score.

R = Highest Observation – Lowest Observation


Example 1:
A group of scientists went on an expedition to the Sierra Madre mountain range in the Philippines to study the different
species of plants existing in that area. The ages of the scientists are 34, 35, 45, 56, 32, 25, and 40. What is the
range of their ages?
Solution:
Highest Age = 56
Lowest Age = 25
R = Highest – Lowest
= 56 – 25
= 31
Therefore, the range of their ages is 31.
If the size of the population or sample is large, the range is not an excellent measure of variation because
it will only consider the highest and the lowest values and will not tell anything about the values between them. If
one is interested in the position of each observation relative to the mean of the set of data, other measures of
variation may be necessary. One such measure is the mean absolute deviation.
Mean Absolute Deviation
To find the mean absolute deviation, subtract the mean score from each raw score, then, using the
absolute values of the differences, get the sum of the results. The sum is called the sum of the absolute deviations
from the mean. Next, divide this sum by N, the total number of cases. In symbols,

MAD = Σ|x – x̄| / N (for ungrouped data)
where MAD = mean absolute deviation
x = raw score
x̄ = mean score
N = number of observations

MAD = Σf|x – x̄| / N (for grouped data)
where MAD = mean absolute deviation
f = frequency
x = class mark
x̄ = mean score
N = number of observations
Example 2:
Take the MAD of the ages of the scientists in example 1.
Solution:
The ages are 34, 35, 45, 56, 32, 25, and 40.
Mean Age: x̄ = (34 + 35 + 45 + 56 + 32 + 25 + 40) / 7 = 267 / 7 = 38.14
x        x – x̄      |x – x̄|
34       -4.14      4.14
35       -3.14      3.14
45       6.86       6.86
56       17.86      17.86
32       -6.14      6.14
25       -13.14     13.14
40       1.86       1.86
Total               53.14
MAD = 53.14 / 7 = 7.59
Therefore, the mean absolute deviation is 7.59.
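As a quick check of the range and MAD computations for the scientists' ages, here is a minimal Python sketch. The function names are chosen for this illustration only.

```python
def value_range(data):
    """Range: highest observation minus lowest observation."""
    return max(data) - min(data)


def mean_absolute_deviation(data):
    """MAD for ungrouped data: sum of |x - mean| divided by N."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


ages = [34, 35, 45, 56, 32, 25, 40]
print(value_range(ages))                        # range of the ages
print(round(mean_absolute_deviation(ages), 2))  # MAD rounded to two decimals
```

The code carries the mean at full precision instead of rounding it to 38.14 first; for this data the result still rounds to the same two-decimal value.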

Variance
Variance is another measure of variation which can be used instead of the range. The variance considers
the deviation of each observation from the mean. To obtain the variance of a distribution, first, square the
deviation from the mean of each raw score and add them together. Then, divide the resulting sum by N, the
total number of cases, for a population, or by N – 1 for a sample.
1. Population Variance for Ungrouped Data

σ² = Σ(x – μ)² / N
where σ² = population variance
x = raw score
μ = population mean
N = number of observations

2. Sample Variance for Ungrouped Data

V = Σ(x – Mn)² / (N – 1)
where V = sample variance
x = raw score
Mn = sample mean
N = number of observations

3. Population Variance for Grouped Data

σ² = Σf(x – μ)² / N
where σ² = population variance
f = frequency
x = class mark
μ = population mean
N = number of observations

4. Sample Variance for Grouped Data

V = [NΣfx² – (Σfx)²] / [N(N – 1)]
where V = sample variance
N = number of observations
f = frequency
x = class mark
Except when specified that the population variance is to be used, you will always use the sample variance formula
in the examples and exercises throughout the book.

Example 3:
Find the population and sample variances of the following distribution: 34, 35, 45, 56, 32, 25, and 40
Solution:

x̄ = 267 / 7 = 38.14
x        |x – x̄|    (x – x̄)²
34       4.14       17.1396
35       3.14       9.8596
45       6.86       47.0596
56       17.86      318.9796
32       6.14       37.6996
25       13.14      172.6596
40       1.86       3.4596
Total 267  53.14    606.86

1. Population Variance
σ² = Σ(x – μ)² / N
= 606.86 / 7
= 86.69
2. Sample Variance
V = Σ(x – Mn)² / (N – 1)
= 606.86 / 6
= 101.14
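The two ungrouped variance formulas differ only in the divisor, which a short Python sketch makes explicit. The function names are illustrative and not part of the notes.

```python
def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


ages = [34, 35, 45, 56, 32, 25, 40]
print(round(population_variance(ages), 2))
print(round(sample_variance(ages), 2))
```

Python's standard `statistics` module offers `pvariance` and `variance` for the same two quantities, which can serve as a cross-check.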
Example 4:
Compute for the population and sample variances for the data in table 1.
Table 1
IQ Scores
IQ Scores    f     x      fx        x²        fx²        f(x – x̄)²
75 – 79      10    77     770       5,929     59,290     1,876.9
80 – 84      12    82     984       6,724     80,688     908.28
85 – 89      25    87     2,175     7,569     189,225    342.25
90 – 94      34    92     3,128     8,464     287,776    57.46
95 – 99      19    97     1,843     9,409     178,771    754.11
100 – 104    15    102    1,530     10,404    156,060    1,915.35
Total     N = 115         10,430              951,810    5,854.35
Solution:
x̄ = 10,430 / 115 = 90.70
Sample Variance
V = [NΣfx² – (Σfx)²] / [N(N – 1)]
= [115(951,810) – (10,430)²] / [115(115 – 1)]
= (109,458,150 – 108,784,900) / 13,110
= 51.35
Population Variance
σ² = Σf(x – μ)² / N
= 5,854.35 / 115
= 50.91
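The grouped computations in Example 4 can be reproduced with a small Python sketch. The function name `grouped_variances` is hypothetical; it takes the class marks and frequencies from the IQ table and returns both variances, using the shortcut formula for the sample variance and the deviation formula for the population variance.

```python
def grouped_variances(class_marks, freqs):
    """Return (population variance, sample variance) for grouped data."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(class_marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, freqs))
    # Sample variance: [N*sum(fx^2) - (sum(fx))^2] / [N(N - 1)]
    sample = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
    # Population variance: sum(f * (x - mean)^2) / N
    mean = sum_fx / n
    population = sum(f * (x - mean) ** 2
                     for x, f in zip(class_marks, freqs)) / n
    return population, sample


iq_marks = [77, 82, 87, 92, 97, 102]
iq_freqs = [10, 12, 25, 34, 19, 15]

pop_var, samp_var = grouped_variances(iq_marks, iq_freqs)
print(round(pop_var, 2), round(samp_var, 2))
```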
Standard Deviation
The standard deviation, σ for a population or SD for a sample, is the square root of the value of the variance.
In symbols,
Population Standard Deviation (σ)
σ = √σ²
Sample Standard Deviation (SD)
SD = √V
Unless specified, the sample standard deviation will be used in all the examples and exercises throughout the
book.
Example 5:
Compute for the population and sample standard deviations for the data in table 1.
Solution:
Population Variance
σ² = 50.91
Therefore, the value of the population standard deviation is
σ = √50.91 = 7.14
Sample Variance
V = 51.35
The sample standard deviation is
SD = √51.35 = 7.17
Example 6:
Find the standard deviation for the distribution in table 2.

Table 2
Scores in the Statistics Final Exam
Class Interval    f      x     fx        fx²
27 – 29           12     28    336       9,408
30 – 32           23     31    713       22,103
33 – 35           60     34    2,040     69,360
36 – 38           45     37    1,665     61,605
39 – 41           51     40    2,040     81,600
42 – 44           75     43    3,225     138,675
45 – 47           28     46    1,288     59,248
48 – 50           33     49    1,617     79,233
51 – 53           18     52    936       48,672
54 – 56           10     55    550       30,250
Total             355          14,410    600,154

x̄ = 14,410 / 355 = 40.59
V = [355(600,154) – (14,410)²] / [355(355 – 1)]
= 5,406,570 / 125,670
= 43.02
SD = √43.02
= 6.56
Therefore, the standard deviation of the score is 6.56.
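The steps of Example 6 can be sketched in Python as follows. The function name `grouped_sample_sd` is made up for this illustration; it applies the sample variance formula for grouped data and then takes the square root.

```python
import math


def grouped_sample_sd(class_marks, freqs):
    """Sample standard deviation of grouped data via the shortcut formula."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(class_marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, freqs))
    variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
    return math.sqrt(variance)


# Table 2: class marks and frequencies of the final exam scores.
exam_marks = [28, 31, 34, 37, 40, 43, 46, 49, 52, 55]
exam_freqs = [12, 23, 60, 45, 51, 75, 28, 33, 18, 10]

sd = grouped_sample_sd(exam_marks, exam_freqs)
print(round(sd, 2))
```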

Coefficient of Variation
When it is necessary to compare the variability of two or more groups, the task is easy if the means are
the same. For example, you can easily compare which group is more varied in height between the following
groups:
Group 1: mean = 156 cm, standard deviation = 6
Group 2: mean = 156 cm, standard deviation = 10
Clearly, one can say that Group 2 is more varied because it has a higher standard deviation. The task
becomes more difficult if the means are not equal or the units are different, such as when comparing the weights
of two groups belonging to different age brackets or different genders. To compare the variability of the weights of
9 girls, having a mean weight of 100 pounds and a standard deviation of 5, with that of the weights of 12 boys,
having a mean of 160 pounds and a standard deviation of 8, a statistic called the coefficient of variation could help
you. The formula is given by:
CV = (SD / x̄) × 100%
where SD = standard deviation
x̄ = mean
Since SD and x̄ have the same units, their units cancel out, and so CV has no unit.
Example 7:
Suppose two groups of students are to be compared in terms of height.
Group     Mean Height    Standard Deviation    CV
Male      162 cm         10 cm                 6.17%
Female    148 cm         4 cm                  2.70%
Solution:
Male CV = (10 / 162) × 100% = 6.17%
Female CV = (4 / 148) × 100% = 2.70%
Comparing the relative variations in height of the male and female students, it can be seen that the heights of the
male students have a higher coefficient of variation than those of the female students. Thus, the male students’
heights are more varied.
Example 8:
Compare the variability of the heights and weights of the students given in the following data:

                      Mean      SD       CV
Height (in cm)        168 cm    12 cm    7.14%
Weight (in pounds)    200 lb    20 lb    10.00%
From the results, it can be seen that the weights of the students are more varied than the heights.
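A one-line function is enough to reproduce the CV values in Examples 7 and 8. The name `coefficient_of_variation` is chosen for this sketch. Applied to the girls and boys mentioned earlier (mean 100 lb with SD 5, and mean 160 lb with SD 8), the formula gives 5% for both groups, so their weights are equally varied in relative terms even though the standard deviations differ.

```python
def coefficient_of_variation(sd, mean):
    """CV in percent: (SD / mean) x 100."""
    return 100 * sd / mean


groups = {
    "male height": (10, 162),
    "female height": (4, 148),
    "student height": (12, 168),
    "student weight": (20, 200),
    "girls weight": (5, 100),
    "boys weight": (8, 160),
}

for name, (sd, mean) in groups.items():
    print(f"{name}: {coefficient_of_variation(sd, mean):.2f}%")
```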

Quartile Deviation
The quartile deviation is another way of determining the spread of a distribution in terms of quartiles. The
quartile deviation formula is shown below:

QD = (Q3 – Q1) / 2
where QD = quartile deviation
Q3 = 3rd quartile
Q1 = 1st quartile
Example 9:
Find the QD of the following scores:
23 25 25 30 35 39 40 44 47 51 60
Solution:

Q3 = (3N/4)th = (3(11)/4)th = 8.25th item. Thus, Q3 = 47.
Q1 = (N/4)th = (11/4)th = 2.75th item. Thus, Q1 = 25.
QD = (47 – 25) / 2 = 22 / 2 = 11
Hence, the QD is 11.
Example 10:
Find the QD of the car battery lives (in years).
1.2 1.4 1.6 2.2 2.5 2.8 3.0 3.0 3.1 4.4
Solution:

Q3 = (3N/4)th = (3(10)/4)th = 7.5th item, which is 3.0 years (the value is midway between the 7th and the 8th
items, which is 3.0 in this example).
Q1 = (N/4)th = (10/4)th = 2.5th item, which is 1.5 years (since the number of cases is even, the mean between the
2nd and the 3rd item, which is 1.5, is taken).
QD = (3.0 – 1.5) / 2 = 1.5 / 2 = 0.75
Hence, the QD is 0.75.
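Examples 9 and 10 show the two cases of the item-number rule (odd and even N), which the following Python sketch puts side by side. The helper names `quartile` and `quartile_deviation` are hypothetical, and treating a whole item number as pointing directly at that item is an assumption, since neither example hits that case.

```python
def quartile(data, k):
    """k-th quartile of ungrouped data using the item-number rule."""
    values = sorted(data)
    n = len(values)
    whole, remainder = divmod(k * n, 4)  # item number kN/4
    if remainder == 0:
        # Assumption: a whole item number points directly at that item.
        return values[max(whole, 1) - 1]
    if n % 2 == 1:
        return values[whole]             # odd N: next higher item
    lower = values[max(whole, 1) - 1]    # even N: midpoint of the
    upper = values[whole]                # two neighbouring items
    return (lower + upper) / 2


def quartile_deviation(data):
    """QD = (Q3 - Q1) / 2."""
    return (quartile(data, 3) - quartile(data, 1)) / 2


scores = [23, 25, 25, 30, 35, 39, 40, 44, 47, 51, 60]
battery_lives = [1.2, 1.4, 1.6, 2.2, 2.5, 2.8, 3.0, 3.0, 3.1, 4.4]

print(quartile_deviation(scores))
print(round(quartile_deviation(battery_lives), 2))
```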
Example 11:
Find the QD of the scores in the following table:
Table 3
Scores in a Statistics Final Exam
Class Interval    f      x     fx       <CF    Lower Boundary    Upper Boundary
27 – 29           12     28    336      12     26.5              29.5
30 – 32           23     31    713      35     29.5              32.5
33 – 35           60     34    2,040    95     32.5              35.5
36 – 38           45     37    1,665    140    35.5              38.5
39 – 41           51     40    2,040    191    38.5              41.5
42 – 44           75     43    3,225    266    41.5              44.5
45 – 47           28     46    1,288    294    44.5              47.5
48 – 50           33     49    1,617    327    47.5              50.5
51 – 53           18     52    936      345    50.5              53.5
54 – 56           10     55    550      355    53.5              56.5
N = 355
Solution:

For Q1: N/4 = 355/4 = 88.75, hence, LL = 32.5, cf = 35, i = 3, and fm = 60.
Q1 = 32.5 + 3 [(88.75 – 35) / 60] = 32.5 + 2.69 = 35.19

For Q3: 3N/4 = 3(355)/4 = 266.25, hence, LL = 44.5, cf = 266, i = 3, and fm = 28.
Q3 = 44.5 + 3 [(266.25 – 266) / 28] = 44.5 + 0.027 = 44.53
QD = (44.53 – 35.19) / 2 = 9.34 / 2 = 4.67
Hence, the quartile deviation is 4.67.

Percentile Range
The percentile range, PR, is the difference between the 90th percentile (P90) and the 10th percentile (P10). In
symbols,
PR = P90 – P10
Example 12:
The following data represent the scores of students in a Physics final examination:
100 100 111 111 112 120 121 122 123
130 132 133 135 140 145 145 146 150
150 155 160 164 165 165 170 171 175 180
Calculate the percentile range of the scores.

Solution:

P90 = (90N/100)th = (90(29)/100)th = 26.1th item, which is 175.
P10 = (10N/100)th = (10(29)/100)th = 2.9th item, which is 111.
PR = P90 – P10
= 175 – 111
= 64
Hence, the percentile range of the scores is 64.
