C.V and Quantiles
The standard deviation is the absolute measure of dispersion. Its relative measure is called the
standard coefficient of dispersion or coefficient of standard deviation. It is defined as:
Coefficient of Standard Deviation = $\frac{S}{\bar{X}}$
Coefficient of Variation
The most important of all the relative measures of dispersion is the coefficient of variation.
This word is variation not variance. There is no such thing as coefficient of variance. The
coefficient of variation (C.V) is defined as:
Coefficient of Variation = $\frac{S}{\bar{X}} \times 100$
Thus C.V is the value of S when 𝑋̅ is assumed equal to 100. It is a pure number and the unit
of observation is not mentioned with its value. It is written in percentage form like 20% or
25%. When its value is 20%, it means that when the mean of the observations is assumed
equal to 100, their standard deviation will be 20. The C.V is used to compare the dispersion
in different sets of data particularly the data which differ in their means or differ in their units
of measurement. The wages of workers may be in dollars and the consumption of meat in
families may be in kilograms. The standard deviation of wages in dollars cannot be compared
with the standard deviation of amount of meat in kilograms. Both the standard deviations
need to be converted into a coefficient of variation for comparison. Suppose the value
of C.V for wages is 10% and the value of C.V for kilograms of meat is 25%. This means that
the wages of the workers are relatively consistent: they lie close to the overall average
wage. The families, however, consume meat in quite different quantities. Some families consume
very small quantities of meat and others consume large quantities. We say that
there is greater variation in their consumption of meat: the observations on the quantity of
meat are more dispersed, or more variable. Because the units cancel in the ratio of S to the mean, the C.V is a unitless quantity.
Example:
Calculate the coefficient of standard deviation and coefficient of variation for the following
sample data: 2, 4, 8, 6, 10, and 12.
Solution:
X      (X − X̄)²
2      (2 − 7)² = 25
4      9
8      1
6      1
10     9
12     25

$\bar{X} = \frac{\sum X}{n} = \frac{42}{6} = 7$
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{70}{5} = 14$
$S = \sqrt{S^2} = \sqrt{14} = 3.7417$
Coefficient of Standard Deviation = $\frac{3.7417}{7} = 0.5345$
Coefficient of Variation = $\frac{3.7417}{7} \times 100 \approx 53.5\%$
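The arithmetic in this example can be checked with a short Python sketch. The function name `coefficient_of_variation` is an illustrative choice, not part of the notes; the code assumes the sample (n − 1) divisor used in the solution above.

```python
import math

def coefficient_of_variation(data):
    """Return (mean, sample SD, coefficient of SD, C.V in percent)."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance with the n - 1 divisor, as in the worked example.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    sd = math.sqrt(variance)
    return mean, sd, sd / mean, sd / mean * 100

mean, sd, coef_sd, cv = coefficient_of_variation([2, 4, 8, 6, 10, 12])
print(f"mean = {mean}, S = {sd:.4f}, "
      f"coefficient of SD = {coef_sd:.4f}, C.V = {cv:.2f}%")
```

Because the C.V is a ratio, multiplying every observation by a constant (for example, changing the unit of measurement) leaves the result unchanged.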
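Which of the three subjects shows the highest variation and which shows the
lowest variation in marks?
Solution:
Coefficient of variation (C.V) = $\frac{\sigma}{\bar{x}} \times 100\%$
Mathematics:
$\bar{x} = 56$, $\sigma = 12$
C.V = $\frac{12}{56} \times 100\% = 21.43\%$
Science:
$\bar{x} = 65$, $\sigma = 14$
C.V = $\frac{14}{65} \times 100\% = 21.54\%$
Social Science:
$\bar{x} = 60$, $\sigma = 10$
C.V = $\frac{10}{60} \times 100\% = 16.67\%$
Science has the largest C.V, so it shows the highest variation in marks. Social Science has the smallest C.V, so it shows the lowest variation.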
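As a quick check of the comparison, the following Python sketch computes the C.V of each subject from the means and standard deviations given in the solution and picks out the largest and smallest. The dictionary layout and the helper name `cv_percent` are assumptions made for illustration.

```python
# (mean, standard deviation) for each subject, taken from the solution above.
subjects = {
    "Mathematics": (56, 12),
    "Science": (65, 14),
    "Social Science": (60, 10),
}

def cv_percent(mean, sd):
    """Coefficient of variation in percent: (sd / mean) * 100."""
    return sd / mean * 100

cvs = {name: cv_percent(mean, sd) for name, (mean, sd) in subjects.items()}
highest = max(cvs, key=cvs.get)   # subject with the largest C.V
lowest = min(cvs, key=cvs.get)    # subject with the smallest C.V

for name, value in cvs.items():
    print(f"{name}: C.V = {value:.2f}%")
print("highest variation:", highest)
print("lowest variation:", lowest)
```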
Temperature A    A²            Temperature B    B²
18               324           11               121
20               400           14               196
22               484           15               225
24               576           17               289
26               676           18               324
∑A = 110         ∑A² = 2460    ∑B = 75          ∑B² = 1155
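Solution:
$S^2 = \frac{1}{n-1}\left\{\sum X^2 - \frac{(\sum X)^2}{n}\right\}$
Temperature of city A:
$\bar{A} = \frac{\sum A}{n} = \frac{110}{5} = 22$
$S^2 = \frac{1}{n-1}\left\{\sum A^2 - \frac{(\sum A)^2}{n}\right\} = \frac{1}{5-1}\left\{2460 - \frac{110^2}{5}\right\} = \frac{40}{4} = 10$
$S = \sqrt{S^2} = \sqrt{10} = 3.1623$
Coefficient of Variation = $\frac{3.1623}{22} \times 100 = 14.3741\%$
Temperature of city B:
$\bar{B} = \frac{\sum B}{n} = \frac{75}{5} = 15$
$S^2 = \frac{1}{n-1}\left\{\sum B^2 - \frac{(\sum B)^2}{n}\right\} = \frac{1}{5-1}\left\{1155 - \frac{75^2}{5}\right\} = \frac{30}{4} = 7.5$
$S = \sqrt{S^2} = \sqrt{7.5} = 2.7386$
Coefficient of Variation = $\frac{2.7386}{15} \times 100 = 18.2574\%$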
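The shortcut formula for the variance, which needs only ∑X and ∑X², can be sketched in Python as follows. The function name `cv_from_sums` is hypothetical. The code does not round S before dividing, so the last decimal place may differ slightly from a hand calculation that uses the rounded value of S.

```python
import math

def cv_from_sums(values):
    """Mean, variance, SD and C.V (%) via S^2 = (sum X^2 - (sum X)^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)                   # sum of X
    total_sq = sum(x * x for x in values)  # sum of X squared
    variance = (total_sq - total ** 2 / n) / (n - 1)
    sd = math.sqrt(variance)
    mean = total / n
    return mean, variance, sd, sd / mean * 100

city_a = [18, 20, 22, 24, 26]
city_b = [11, 14, 15, 17, 18]
for name, temps in (("A", city_a), ("B", city_b)):
    mean, var, sd, cv = cv_from_sums(temps)
    print(f"City {name}: mean = {mean}, S^2 = {var}, S = {sd:.4f}, C.V = {cv:.2f}%")
```

City B has the larger C.V, so its temperatures vary more relative to their mean even though its standard deviation is smaller.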
Question:
The average marks scored by two students Sathya and Vidhya in 5 subjects are 460 and 480
with standard deviation 4.6 and 2.4 respectively. Who is more consistent in performance?
Page 125
QUANTILES
Median:
The median is the value that divides the ordered data set into two equal parts:
50% of the observations lie below it and 50% lie above it.
Quartiles:
The quartiles are the three values that divide the ordered data set into four equal parts.
Minimum | part 1 | Q1 | part 2 | Q2 (Median) | part 3 | Q3 | part 4 | Maximum
Quartile    Data below    Data above
Q1          25%           75%
Q2          50%           50%
Q3          75%           25%
𝑄1 = 1st Quartile
𝑄2 = 2nd Quartile = Median
𝑄3 = 3rd Quartile
How to calculate Quartiles: Use the following steps for calculating quartiles for small data sets.
For large data sets, we use a frequency distribution to find quartiles.
• Step 1: Sort the data in ascending order (from smallest to largest).
• Step 2: Calculate the position $i\left(\frac{n}{4}\right)$, where i is the particular quartile you wish to calculate and n is the
sample size, i.e.
Position of $Q_1 = \frac{n}{4}$
Position of $Q_2 = 2\left(\frac{n}{4}\right) = \frac{n}{2}$ (the median)
Position of $Q_3 = \frac{3n}{4}$
• Step 3: If the position found in Step 2 is an integer k, the quartile is the mean of the data values in positions k and k+1. If it is
not an integer, round it up to the next integer and use the value in that position.
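The three steps above can be sketched in Python as follows. The function name `quartile` is an illustrative assumption, and the rule implemented is the positional rule of these notes, which can give slightly different answers from the interpolation methods used by statistical software. The data are the stock prices used in this section's example.

```python
import math

def quartile(data, i):
    """i-th quartile (i = 1, 2, 3) by the positional rule described above."""
    values = sorted(data)             # Step 1: ascending order
    n = len(values)
    position = i * n / 4              # Step 2: position i(n/4)
    if position.is_integer():         # Step 3: integer position k
        k = int(position)
        # mean of the values in (1-based) positions k and k + 1
        return (values[k - 1] + values[k]) / 2
    k = math.ceil(position)           # Step 3: otherwise round up
    return values[k - 1]

prices = [10, 7, 20, 12, 5, 15, 9, 18, 4, 12, 8, 14]
for i in (1, 2, 3):
    print(f"Q{i} =", quartile(prices, i))
```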
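Example: Use the following set of stock prices (in dollars): 10, 7, 20, 12, 5, 15, 9, 18, 4, 12,
8, 14. Find the 1st quartile, the 3rd quartile and the median.
Solution:
First sort the data in ascending order:
Value:     4    5    7    8    9    10   12   12   14   15    18    20
Position:  1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th  11th  12th
To find the median:
Position of $Q_2 = \frac{n}{2} = \frac{12}{2} = 6$, which is an integer, so the median is the mean of the 6th and 7th values: $Q_2 = \frac{10 + 12}{2} = 11$.
Applying the same rule to the other two quartiles:
Position of $Q_1 = \frac{n}{4} = \frac{12}{4} = 3$, which is an integer, so $Q_1 = \frac{7 + 8}{2} = 7.5$.
Position of $Q_3 = \frac{3n}{4} = \frac{3 \times 12}{4} = 9$, which is an integer, so $Q_3 = \frac{14 + 15}{2} = 14.5$.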
DECILES
The deciles are the nine values that divide the ordered data set into ten equal parts.
Minimum | D1 | D2 | D3 | D4 | D5 (Median) | D6 | D7 | D8 | D9 | Maximum
Decile    Data below    Data above
D1        10%           90%
D2        20%           80%
D3        30%           70%
D4        40%           60%
D5        50%           50%
D6        60%           40%
D7        70%           30%
D8        80%           20%
D9        90%           10%
There are 9 deciles, which divide the data set into 10 equal parts.
10% of the data set lies between the minimum value and $D_1$, and 90% lies between $D_1$ and the
maximum value. The same reading applies to all the deciles, i.e. $D_2$, $D_3$, $D_4$, $D_5$ = median,
$D_6$, $D_7$, $D_8$, $D_9$.
How to calculate Deciles: Use the following steps for calculating deciles for small data sets. For
large data sets, we use a frequency distribution to find deciles.
• Step 1: Sort the data in ascending order (from smallest to largest).
• Step 2: Calculate the position $i\left(\frac{n}{10}\right)$, where i is the particular decile you wish to calculate and n is the
sample size, i.e.
Position of $D_1 = \frac{n}{10}$, $D_2 = \frac{2n}{10}$, $D_3 = \frac{3n}{10}$, $D_4 = \frac{4n}{10}$, $D_5 = \frac{5n}{10} = \frac{n}{2}$ (the median),
$D_6 = \frac{6n}{10}$, $D_7 = \frac{7n}{10}$, $D_8 = \frac{8n}{10}$, $D_9 = \frac{9n}{10}$
• Step 3: If the position found in Step 2 is an integer k, the decile is the mean of the data values in positions k and k+1. If it is
not an integer, round it up to the next integer and use the value in that position.
Example: Use the following set of weights of 15 apples (in grams): 2.0, 2.7, 2.4, 2.9, 4.7,
4.5, 3.5, 3.7, 4.6, 4.9, 1.9, 2.5, 3.0, 4.1, and 4.6. Find $Q_1$, $D_3$, $D_7$, $Q_3$ and $D_9$.
Solution:
First sort the data in ascending order:
Value:     1.9  2.0  2.4  2.5  2.7  2.9  3.0  3.5  3.7  4.1   4.5   4.6   4.6   4.7   4.9
Position:  1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th  11th  12th  13th  14th  15th
To find $D_3$ and $D_9$:
Position of $D_3 = \frac{3n}{10} = \frac{3 \times 15}{10} = 4.5$, which is not an integer, so we round up to 5: $D_3$ is the 5th value, 2.7.
Position of $D_9 = \frac{9n}{10} = \frac{9 \times 15}{10} = 13.5$, which is not an integer, so we round up to 14: $D_9$ is the 14th value, 4.7.
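The same positional rule works for any quantile, so one general function can cover all five quantities requested in this example. The sketch below is an illustration under that assumption: the name `quantile` and its `parts` parameter (4 for quartiles, 10 for deciles, 100 for percentiles) are not from the notes. Integer division is used so that the test for an integer position is exact.

```python
def quantile(data, i, parts):
    """i-th quantile when the ordered data are split into `parts` equal parts."""
    values = sorted(data)
    n = len(values)
    # Position i*n/parts written as quotient k plus a remainder.
    k, remainder = divmod(i * n, parts)
    if remainder == 0:
        # Integer position k: mean of the values in positions k and k + 1.
        return (values[k - 1] + values[k]) / 2
    # Non-integer position: round up to position k + 1 (0-based index k).
    return values[k]

apples = [2.0, 2.7, 2.4, 2.9, 4.7, 4.5, 3.5, 3.7,
          4.6, 4.9, 1.9, 2.5, 3.0, 4.1, 4.6]
print("Q1 =", quantile(apples, 1, 4))
print("D3 =", quantile(apples, 3, 10))
print("D7 =", quantile(apples, 7, 10))
print("Q3 =", quantile(apples, 3, 4))
print("D9 =", quantile(apples, 9, 10))
```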
PERCENTILES
[Figure: percentage of the data below and above selected percentiles, e.g. 50% / 50%, 60% / 40%, 75% / 25%, 98% / 2%, 99% / 1%]
There are 99 percentiles, which divide the data set into 100 equal parts.
1% of the data set lies between the minimum value and $P_1$, and 99% lies between $P_1$ and the
maximum value. The same reading applies to
all the percentiles, i.e. $P_2$, $P_3$, $P_4$, $P_5$, …, $P_{10}$, …, $P_{25}$, …, $P_{50} = Q_2 = D_5$ = median, …,
$P_{55}$, …, $P_{60}$, …, $P_{98}$, $P_{99}$, as shown in the given figure.
The percentiles include all the other quantiles as special cases:
1. $P_{10} = D_1$, $P_{20} = D_2$, $P_{30} = D_3$, …, $P_{90} = D_9$
2. $P_{25} = Q_1$
3. $P_{50} = D_5 = Q_2$ = median
4. $P_{75} = Q_3$
How to calculate Percentiles: Use the following steps for calculating percentiles for small data sets.
For large data sets, we use a frequency distribution to find percentiles.
• Step 1: Sort the data in ascending order (from smallest to largest).
• Step 2: Calculate the position $i\left(\frac{n}{100}\right)$, where i is the particular percentile you wish to calculate and n is the
sample size, i.e.
Position of $P_1 = \frac{n}{100}$, $P_2 = \frac{2n}{100}$, $P_3 = \frac{3n}{100}$, $P_4 = \frac{4n}{100}$, …, $P_{10} = \frac{10n}{100} = \frac{n}{10}$ (same as $D_1$), …
$P_{25} = \frac{25n}{100} = \frac{n}{4}$ (same as $Q_1$), …, $P_{50} = \frac{50n}{100} = \frac{n}{2}$ (the median), …, $P_{75} = \frac{75n}{100} = \frac{3n}{4}$ (same as $Q_3$), …,
$P_{99} = \frac{99n}{100}$
• Step 3: If the position found in Step 2 is an integer k, the percentile is the mean of the data values in positions k and k+1. If it
is not an integer, round it up to the next integer and use the value in that position.
INTERPRETATION OF PERCENTILES
Suppose you take the NTS test for admission and your result card reports a percentile of
85. This means that 85% of the students scored lower than you in the test and 15%
scored higher than you.
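This interpretation can be illustrated with a small Python sketch. The scores below are invented purely for illustration and are not real test data; the function name `percent_below` is likewise an assumption.

```python
def percent_below(scores, your_score):
    """Percentage of scores strictly below `your_score`."""
    below = sum(1 for s in scores if s < your_score)
    return 100 * below / len(scores)

# Hypothetical scores of 20 candidates (made up for this illustration).
scores = [35, 42, 48, 51, 55, 58, 60, 63, 66, 70,
          72, 74, 77, 79, 81, 83, 85, 88, 91, 96]

# A candidate who scored 88 has 17 of the 20 scores below them.
print(percent_below(scores, 88))
```

Here 17 of the 20 candidates scored below 88, so a score of 88 sits at the 85th percentile of this made-up group.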
Page 84: 3.42, 3.43
Weight (milligrams)   No. of seeds (f)   CF                    Class Boundaries
                                                               LCB       UCB
10-24.9               16                 16                    9.95      24.95
25-39.9               68                 16+68=84              24.95     39.95
40-54.9               204                84+204=288            39.95     54.95
55-69.9               233                288+233=521           54.95     69.95
70-84.9               240                521+240=761           69.95     84.95
85-99.9               655                761+655=1416          84.95     99.95
100-114.9             803                1416+803=2219         99.95     114.95
115-129.9             294                2219+294=2513         114.95    129.95
130-144.9             21                 2513+21=2534          129.95    144.95
145-159.9             4                  2534+4=2538           144.95    159.95
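$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - C\right)$
where l is the lower class boundary of the class containing the quantile, h is its class width, f is its frequency, and C is the cumulative frequency of the preceding class.
$\frac{n}{4} = \frac{2538}{4} = 634.5$, which falls in the class 70-84.9 (the first class whose CF is at least 634.5), so l = 69.95, f = 240, C = 521 and h = 84.95 − 69.95 = 15.
$Q_1 = 69.95 + \frac{15}{240}\left(\frac{2538}{4} - 521\right)$
$Q_1 = 77.0438$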
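$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - C\right)$
$\frac{3n}{4} = \frac{3(2538)}{4} = 1903.5$, which falls in the class 100-114.9, so l = 99.95, f = 803, C = 1416 and h = 114.95 − 99.95 = 15.
$Q_3 = 99.95 + \frac{15}{803}\left(\frac{3(2538)}{4} - 1416\right)$
$Q_3 = 109.0565$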
$D_3 = l + \frac{h}{f}\left(\frac{3n}{10} - C\right)$
$\frac{3n}{10} = \frac{3(2538)}{10} = 761.4$, h = 99.95 − 84.95 = 15
$D_3 = 84.95 + \frac{15}{655}\left(\frac{3(2538)}{10} - 761\right)$
$D_3 = 84.95916$
$P_{95} = l + \frac{h}{f}\left(\frac{95n}{100} - C\right)$
$\frac{95n}{100} = \frac{95(2538)}{100} = 2411.1$, h = 129.95 − 114.95 = 15
$P_{95} = 114.95 + \frac{15}{294}\left(\frac{95(2538)}{100} - 2219\right)$
$P_{95} = 124.751$
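The grouped-data calculations above all follow the same pattern: locate the first class whose cumulative frequency reaches the target position, then interpolate inside it. A minimal Python sketch under that reading is given below; the function name `grouped_quantile` and its parameters are illustrative assumptions, and equal class widths are assumed.

```python
def grouped_quantile(lower_bounds, width, freqs, i, parts):
    """Quantile from a frequency table: l + (h / f) * (i*n/parts - C)."""
    n = sum(freqs)
    target = i * n / parts            # e.g. n/4 for Q1, 95n/100 for P95
    cumulative = 0                    # C: cumulative frequency before the class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= target:  # first class whose CF reaches the target
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("target position exceeds the total frequency")

# Lower class boundaries and frequencies from the seed-weight table.
lcb = [9.95, 24.95, 39.95, 54.95, 69.95, 84.95, 99.95, 114.95, 129.95, 144.95]
freqs = [16, 68, 204, 233, 240, 655, 803, 294, 21, 4]

print(f"Q1  = {grouped_quantile(lcb, 15, freqs, 1, 4):.4f}")
print(f"Q3  = {grouped_quantile(lcb, 15, freqs, 3, 4):.4f}")
print(f"D3  = {grouped_quantile(lcb, 15, freqs, 3, 10):.5f}")
print(f"P95 = {grouped_quantile(lcb, 15, freqs, 95, 100):.3f}")
```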
Using box plots we can better understand our data by examining its distribution: its centre
(the median), its spread (the interquartile range), its skewness and its outliers. A box plot
packs all of this information about our data into a single concise diagram.
• The left and right sides of the box are the lower and upper quartiles. The box covers
the interquartile interval, where the middle 50% of the data is found.
• The vertical line that splits the box in two is the median. Sometimes, the mean is also
indicated by a dot or a cross on the box plot.
• The whiskers are the two lines outside the box, which go from the minimum to the
lower quartile (the start of the box) and from the upper quartile (the end of the
box) to the maximum.
Identifying outliers
In a box plot the whiskers are generally limited to 1.5 times the inter-quartile range beyond the
lower and upper quartiles. Anything outside the whiskers is considered an outlier.
A variation of the box and whisker plot restricts the length of the whiskers to a maximum of
1.5 times the interquartile range. That is, the whisker reaches the value that is the furthest
from the centre while still being inside a distance of 1.5 times the interquartile range from the
lower or upper quartile. Data points that are outside this interval are represented as points on
the graph and considered potential outliers.
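The 1.5 × IQR rule can be sketched in Python as follows. The function names are hypothetical, and the quartiles are found with the positional rule of these notes (software such as R may use a different quartile definition and so report slightly different limits). The data set is the one used in the example that follows.

```python
def positional_quartile(values, i):
    """i-th quartile of already-sorted values by the positional rule of these notes."""
    k, remainder = divmod(i * len(values), 4)
    if remainder == 0:
        return (values[k - 1] + values[k]) / 2  # mean of positions k and k + 1
    return values[k]                            # round up to position k + 1

def box_plot_summary(data):
    """Quartiles, 1.5 * IQR limits, outliers and whisker ends."""
    values = sorted(data)
    q1 = positional_quartile(values, 1)
    q3 = positional_quartile(values, 3)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower or x > upper]
    inside = [x for x in values if lower <= x <= upper]
    # Whiskers end at the most extreme values that are not outliers.
    return q1, q3, lower, upper, outliers, (inside[0], inside[-1])

data = [5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102]
q1, q3, lower, upper, outliers, whiskers = box_plot_summary(data)
print("Q1 =", q1, "Q3 =", q3)
print("limits:", lower, upper)
print("outliers:", outliers)
print("whisker ends:", whiskers)
```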
Example:
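For the following data set, identify any outliers and draw a box-and-whisker plot.
{5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102}
$n = 15$
Value:     5    40   42   46   48   49   50   50   52   53    55    56    58    75    102
Position:  1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th  11th  12th  13th  14th  15th
($Q_1$, $Q_2$ and $Q_3$ are the 4th, 8th and 12th values.)
Lower Quartile: position of $Q_1 = \frac{n}{4} = \frac{15}{4} = 3.75$, rounded up to the 4th value, so $Q_1 = 46$
Upper Quartile: position of $Q_3 = \frac{3n}{4} = \frac{3(15)}{4} = 11.25$, rounded up to the 12th value, so $Q_3 = 56$
Interquartile range = $Q_3 - Q_1 = 56 - 46 = 10$, so $1.5 \times 10 = 15$.
Lower limit = $Q_1 - 15 = 31$ and upper limit = $Q_3 + 15 = 71$.
Hence any value above 71 or below 31 is an outlier. Since 5 is less
than 31 and 75 and 102 are greater than 71, there are 3 outliers. The box-and-whisker
plot is as shown. Note that 40 and 58 are shown as the ends of the whiskers.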
Identify Skewness
We can also identify the skewness of our data by observing the shape of the box plot. If the
box plot is symmetric, our data is distributed symmetrically, as it would be under a normal
distribution (symmetry alone does not prove that the data is normal). If our box plot is
not symmetric, our data is skewed. You can get a better understanding by
looking at the diagrams below:
Here is a box plot with respect to the distribution curve (Positively skewed)
The three box and whisker plots of chart 4.5.2.1 have been created using R software. What
can you say about the three distributions?
Complete a given table
The centre of distribution A is the lowest of the three distributions (median is 0.11).
The distribution is positively skewed, because the whisker and half-box are longer on
the right side of the median than on the left side.
The centre of distribution C is the highest of the three distributions (median is 0.88).
The distribution C is negatively skewed because the whisker and half-box are longer
on the left side of the median than on the right side.
All three distributions include potential outliers. Let’s take distribution A, for example. The
interquartile range is Q3 - Q1 = 0.32 – 0.02 = 0.30. According to the definition used by the
function in R software, all values higher than Q3 + 1.5 x (Q3 - Q1) = 0.32 + 1.5 x 0.30 = 0.77
are outside the right whisker and indicated by a circle. There are two potential outliers in
distribution A.