
C.V and Quantiles

The document discusses the coefficient of standard deviation and coefficient of variation. The coefficient of standard deviation is defined as the standard deviation divided by the mean. The coefficient of variation is defined as the standard deviation divided by the mean and multiplied by 100. It is a dimensionless quantity used to compare the dispersion of different data sets that may have different units or means. An example is provided to demonstrate calculating the coefficient of standard deviation and coefficient of variation.

Uploaded by Nosheen Ramzan
Copyright © All Rights Reserved

Coefficient of Standard Deviation

The standard deviation is an absolute measure of dispersion. Its relative measure is called the coefficient of standard deviation (or the standard coefficient of dispersion). It is defined as:

$$\text{Coefficient of Standard Deviation} = \frac{S}{\bar{X}}$$

where S is the standard deviation and X̄ is the mean.

Coefficient of Variation
The most important of all the relative measures of dispersion is the coefficient of variation. Note that the term is variation, not variance: there is no such thing as a "coefficient of variance". The coefficient of variation (C.V) is defined as:

$$\text{Coefficient of Variation} = \frac{S}{\bar{X}} \times 100$$

Thus the C.V is the value of S when X̄ is assumed equal to 100. It is a pure number, so no unit of measurement is attached to its value; it is written as a percentage, such as 20% or 25%. A C.V of 20% means that if the mean of the observations were 100, their standard deviation would be 20. The C.V is used to compare the dispersion of different data sets, particularly data sets that differ in their means or in their units of measurement. For example, the wages of workers may be measured in dollars while the consumption of meat by families is measured in kilograms. The standard deviation of wages in dollars cannot be compared directly with the standard deviation of meat consumption in kilograms; both must first be converted into coefficients of variation. Suppose the C.V for wages is 10% and the C.V for meat consumption is 25%. This means that the wages of workers are relatively consistent: they stay close to the overall average wage. The families, however, consume meat in quite different quantities; some consume very little and others consume a lot. We say that there is greater variation in their consumption of meat: the observations on meat consumption are more dispersed. This comparison is possible because the C.V is a unitless quantity.

Example:
Calculate the coefficient of standard deviation and coefficient of variation for the following
sample data: 2, 4, 8, 6, 10, and 12.

Solution:

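X      (X − X̄)²
2      (2 − 7)² = 25
4      (4 − 7)² = 9
8      (8 − 7)² = 1
6      (6 − 7)² = 1
10     (10 − 7)² = 9
12     (12 − 7)² = 25
ΣX = 42     Σ(X − X̄)² = 70

$$\bar{X} = \frac{\sum X}{n} = \frac{42}{6} = 7$$

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{70}{5} = 14$$

$$S = \sqrt{S^2} = \sqrt{14} = 3.7417$$

$$\text{Coefficient of Standard Deviation} = \frac{3.7417}{7} = 0.5345$$

$$\text{Coefficient of Variation} = \frac{3.7417}{7} \times 100 = 53.45\%$$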
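The worked example can be checked with a few lines of Python. This is a minimal sketch; the function name `dispersion_summary` is hypothetical, and it assumes the sample standard deviation (divisor n − 1), as used above.

```python
import math

def dispersion_summary(data):
    """Return (mean, S, coefficient of SD, C.V in %) for a sample."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance with divisor n - 1, as in the worked example.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    s = math.sqrt(variance)
    coef_sd = s / mean        # coefficient of standard deviation
    cv = coef_sd * 100        # coefficient of variation, in percent
    return mean, s, coef_sd, cv

mean, s, coef_sd, cv = dispersion_summary([2, 4, 8, 6, 10, 12])
print(f"mean = {mean}, S = {s:.4f}, coef. of SD = {coef_sd:.4f}, C.V = {cv:.2f}%")
```

The coefficient of standard deviation and the C.V differ only by the factor 100, so the sketch computes one from the other.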

Q.1 The mean and standard deviation of marks obtained by 40 students of a class in three subjects, Mathematics, Science and Social Science, are given below.

Subject            Mean (x̄)    Standard deviation (σ)
Mathematics        56          12
Science            65          14
Social Science     60          10

Which of the three subjects shows the highest variation and which shows the lowest variation in marks?

Solution:

Mathematics:
Coefficient of variation (C.V) = (σ/x̄) ⋅ 100%
x̄ = 56, σ = 12
C.V = (12/56) ⋅ 100% = 0.2143 ⋅ 100% = 21.43%

Science:
Coefficient of variation (C.V) = (σ/x̄) ⋅ 100%
x̄ = 65, σ = 14
C.V = (14/65) ⋅ 100% = 0.2154 ⋅ 100% = 21.54%

Social Science:
Coefficient of variation (C.V) = (σ/x̄) ⋅ 100%
x̄ = 60, σ = 10
C.V = (10/60) ⋅ 100% = 0.1667 ⋅ 100% = 16.67%

The highest variation is in Science (21.54%) and the lowest variation is in Social Science (16.67%).
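The same comparison can be sketched in Python: compute the C.V of each subject from its mean and standard deviation, then pick the largest and smallest. The dictionary layout and variable names are illustrative assumptions, not part of the original notes.

```python
# Mean and standard deviation of marks for each subject (from Q.1).
subjects = {
    "Mathematics": (56, 12),
    "Science": (65, 14),
    "Social Science": (60, 10),
}

# C.V = (sigma / mean) * 100, in percent
cv = {name: sd / mean * 100 for name, (mean, sd) in subjects.items()}
for name, value in cv.items():
    print(f"{name}: C.V = {value:.2f}%")

highest = max(cv, key=cv.get)   # subject with the largest relative spread
lowest = min(cv, key=cv.get)    # subject with the smallest relative spread
print("Highest variation:", highest)
print("Lowest variation:", lowest)
```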
Q.2 The temperatures of two cities, A and B, during a winter season are given below.

Temperature A    A²     Temperature B    B²
18               324    11               121
20               400    14               196
22               484    15               225
24               576    17               289
26               676    18               324
ΣA = 110    ΣA² = 2460    ΣB = 75    ΣB² = 1155

Find which city is more consistent in temperature.

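Solution:

The sample variance is computed with the shortcut formula

$$S^2 = \frac{1}{n-1}\left\{\sum X^2 - \frac{(\sum X)^2}{n}\right\}$$

Temperature of city A:

$$\bar{A} = \frac{\sum A}{n} = \frac{110}{5} = 22$$

$$S^2 = \frac{1}{n-1}\left\{\sum A^2 - \frac{(\sum A)^2}{n}\right\} = \frac{1}{5-1}\left\{2460 - \frac{110^2}{5}\right\} = \frac{40}{4} = 10$$

$$S = \sqrt{S^2} = \sqrt{10} = 3.1623$$

$$\text{Coefficient of Variation} = \frac{3.1623}{22} \times 100 = 14.3741\%$$

Temperature of city B:

$$\bar{B} = \frac{\sum B}{n} = \frac{75}{5} = 15$$

$$S^2 = \frac{1}{n-1}\left\{\sum B^2 - \frac{(\sum B)^2}{n}\right\} = \frac{1}{5-1}\left\{1155 - \frac{75^2}{5}\right\} = \frac{30}{4} = 7.5$$

$$S = \sqrt{S^2} = \sqrt{7.5} = 2.7386$$

$$\text{Coefficient of Variation} = \frac{2.7386}{15} \times 100 = 18.2574\%$$

Since city A has the smaller coefficient of variation (14.37% < 18.26%), city A is more consistent in temperature.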
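The shortcut formula needs only n, ΣX and ΣX², which is why the table lists the squares. A minimal Python sketch of that route, with the hypothetical helper name `cv_from_sums`:

```python
import math

def cv_from_sums(n, sum_x, sum_x2):
    """C.V (%) from n, sum of X and sum of X^2 via the shortcut variance formula."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # S^2 = {ΣX² − (ΣX)²/n} / (n − 1)
    return math.sqrt(variance) / mean * 100

temps_a = [18, 20, 22, 24, 26]
temps_b = [11, 14, 15, 17, 18]

cv_a = cv_from_sums(len(temps_a), sum(temps_a), sum(t * t for t in temps_a))
cv_b = cv_from_sums(len(temps_b), sum(temps_b), sum(t * t for t in temps_b))
print(f"City A: C.V = {cv_a:.2f}%")
print(f"City B: C.V = {cv_b:.2f}%")
print("More consistent:", "A" if cv_a < cv_b else "B")   # smaller C.V = more consistent
```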

Question:
The average marks scored by two students, Sathya and Vidhya, in 5 subjects are 460 and 480, with standard deviations 4.6 and 2.4 respectively. Who is more consistent in performance?

Page 125: 4.25, 4.26, 4.27, 4.28, 4.29 and 4.30

QUANTILES
Median:

The median is the value that divides the ordered data set into two equal parts.

[Figure: number line from Minimum to Maximum, with 50% of the data on each side of the Median.]

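Quartiles:
Quartiles are the three values that divide the ordered data set into four equal parts.

[Figure: number line from Minimum to Maximum marking Q1, Q2 (median) and Q3; 25% of the data lie below Q1 and 75% above it, 50% lie on each side of Q2, and 75% lie below Q3 and 25% above it.]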

𝑄1 = 1st Quartile
𝑄2 = 2nd Quartile = Median
𝑄3 = 3rd Quartile

How to calculate quartiles: Use the following steps to calculate quartiles for small data sets.
For large data sets, we use a frequency distribution to find quartiles.
• Step 1: Sort the data in ascending order (from smallest to largest)
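• Step 2: Calculate the position i(n/4), where i is the quartile you wish to calculate (1, 2 or 3) and n is the sample size, i.e.

$$\text{Position of } Q_1 = \frac{n}{4}, \qquad \text{Position of } Q_2 = 2\left(\frac{n}{4}\right) = \frac{n}{2} \;(\text{median}), \qquad \text{Position of } Q_3 = \frac{3n}{4}$$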

• Step 3: If the position from Step 2 is an integer k, the quartile is the mean of the data values in positions k and k + 1. If the position is not an integer, round it up to the next integer and use the value in that position.
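The three steps can be sketched in Python as below. This is a minimal sketch of the textbook position rule described above, not a library routine; the function name `quantile_position_rule` and its parameters are hypothetical. Library functions such as NumPy's `percentile` or Python's `statistics.quantiles` use interpolation methods and may give slightly different values.

```python
def quantile_position_rule(data, k, m):
    """k-th of the m-quantiles (m = 4 quartiles, 10 deciles, 100 percentiles; k < m)."""
    xs = sorted(data)                 # Step 1: sort in ascending order
    n = len(xs)
    # Step 2: position k*n/m (1-based); integer arithmetic avoids float error.
    if (k * n) % m == 0:
        # Step 3a: integer position i -> mean of the values at positions i and i + 1
        i = k * n // m
        return (xs[i - 1] + xs[i]) / 2
    # Step 3b: non-integer position -> round up and take that value
    i = (k * n + m - 1) // m          # ceiling of k*n/m
    return xs[i - 1]

prices = [10, 7, 20, 12, 5, 15, 9, 18, 4, 12, 8, 14]
q1, q2, q3 = (quantile_position_rule(prices, k, 4) for k in (1, 2, 3))
print(q1, q2, q3)
```

Using the integer test `(k * n) % m == 0` rather than comparing floats keeps the "is the position an integer?" decision exact.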

Example: Use the following set of stock prices (in dollars): 10, 7, 20, 12, 5, 15, 9, 18, 4, 12,
8, 14 Find the 1st Quartile, 3rd Quartile and median.
Solution:
• First sort the data in ascending order:

Value:     4    5    7    8    9    10   12   12   14   15    18    20
Position:  1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th  11th  12th

• There are 12 values, so n = 12.

• To find Q1: position = n/4 = 12/4 = 3. Since 3 is an integer,

$$Q_1 = \frac{3\text{rd value} + 4\text{th value}}{2} = \frac{7 + 8}{2} = 7.5$$

• To find Q3: position = 3n/4 = (3 × 12)/4 = 9. Since 9 is an integer,

$$Q_3 = \frac{9\text{th value} + 10\text{th value}}{2} = \frac{14 + 15}{2} = 14.5$$

• To find the median: position = n/2 = 12/2 = 6. Since 6 is an integer,

$$Q_2 = \text{median} = \frac{6\text{th value} + 7\text{th value}}{2} = \frac{10 + 12}{2} = 11$$

DECILES
Deciles are the nine values that divide the ordered data set into ten equal parts.

[Figure: number line from Minimum to Maximum marking D1, D2, …, D9, with D5 = median; 10% of the data lie below D1 and 90% above it, 20% below D2 and 80% above it, and so on up to 90% below D9 and 10% above it.]

• There are 9 deciles, which divide the data set into 10 equal parts.
• About 10% of the data lie between the minimum and D1, and about 90% lie between D1 and the maximum. Likewise, 20% of the data lie below D2, 30% below D3, and so on for D4, D5 = median, D6, D7, D8 and D9, as shown in the figure.
How to calculate deciles: Use the following steps to calculate deciles for small data sets. For large data sets, we use a frequency distribution to find deciles.
• Step 1: Sort the data in ascending order (from smallest to largest).
• Step 2: Calculate the position i(n/10), where i is the decile you wish to calculate (1 to 9) and n is the sample size, i.e.

$$D_1 = \frac{n}{10},\; D_2 = \frac{2n}{10},\; D_3 = \frac{3n}{10},\; D_4 = \frac{4n}{10},\; D_5 = \frac{5n}{10} = \frac{n}{2}$$

$$D_6 = \frac{6n}{10},\; D_7 = \frac{7n}{10},\; D_8 = \frac{8n}{10},\; D_9 = \frac{9n}{10}$$

• Step 3: If the position from Step 2 is an integer k, the decile is the mean of the data values in positions k and k + 1. If the position is not an integer, round it up to the next integer and use the value in that position.

Example: Use the following weights of 15 apples (in grams): 2.0, 2.7, 2.4, 2.9, 4.7, 4.5, 3.5, 3.7, 4.6, 4.9, 1.9, 2.5, 3.0, 4.1 and 4.6. Find Q1, D3, D7, Q3 and D9.
Solution:
• First sort the data in ascending order:

Value:     1.9  2.0  2.4  2.5  2.7  2.9  3.0  3.5  3.7  4.1   4.5   4.6   4.6   4.7   4.9
Position:  1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th  11th  12th  13th  14th  15th

• There are 15 values, so n = 15.

• To find Q1: position = n/4 = 15/4 = 3.75. Since 3.75 is not an integer, round up to 4, so Q1 = 4th value = 2.5.

• To find Q3: position = 3n/4 = (3 × 15)/4 = 11.25. Since 11.25 is not an integer, round up to 12, so Q3 = 12th value = 4.6.

• To find D3: position = 3n/10 = (3 × 15)/10 = 4.5. Since 4.5 is not an integer, round up to 5, so D3 = 5th value = 2.7.

• To find D7: position = 7n/10 = (7 × 15)/10 = 10.5. Since 10.5 is not an integer, round up to 11, so D7 = 11th value = 4.5.

• To find D9: position = 9n/10 = (9 × 15)/10 = 13.5. Since 13.5 is not an integer, round up to 14, so D9 = 14th value = 4.7.
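The same position rule with m = 4 (quartiles) and m = 10 (deciles) reproduces this example. The helper `quantile_position_rule` is a hypothetical sketch of the rule in the notes, repeated here so the block runs on its own.

```python
def quantile_position_rule(data, k, m):
    """k-th of the m-quantiles: position k*n/m; integer -> average with next, else round up."""
    xs = sorted(data)
    n = len(xs)
    if (k * n) % m == 0:
        i = k * n // m
        return (xs[i - 1] + xs[i]) / 2
    i = (k * n + m - 1) // m          # round the position up
    return xs[i - 1]

apples = [2.0, 2.7, 2.4, 2.9, 4.7, 4.5, 3.5, 3.7, 4.6, 4.9,
          1.9, 2.5, 3.0, 4.1, 4.6]
results = {
    "Q1": quantile_position_rule(apples, 1, 4),
    "D3": quantile_position_rule(apples, 3, 10),
    "D7": quantile_position_rule(apples, 7, 10),
    "Q3": quantile_position_rule(apples, 3, 4),
    "D9": quantile_position_rule(apples, 9, 10),
}
print(results)
```

Because n = 15 gives non-integer positions for all five quantiles here, every result is a single data value rather than an average.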


PERCENTILES
Percentiles are the 99 values that divide the ordered data set into 100 equal parts.

[Figure: number line from Minimum to Maximum marking P1, P2, …, P10, …, P25, …, P50 (median), …, P60, …, P75, …, P98, P99; k% of the data lie below Pk and (100 − k)% above it.]

• There are 99 percentiles, which divide the data set into 100 equal parts.
• About 1% of the data lie between the minimum and P1, and about 99% lie between P1 and the maximum. Likewise, k% of the data lie below Pk for every percentile P2, P3, P4, P5, …, P10, …, P25, …, P50 = Q2 = D5 = median, …, P55, …, P60, …, P98, P99, as shown in the figure.
• Percentiles include all the other quantiles as special cases:
1. P10 = D1, P20 = D2, P30 = D3, …, P90 = D9
2. P25 = Q1
3. P50 = D5 = Q2 = median
4. P75 = Q3
How to calculate percentiles: Use the following steps to calculate percentiles for small data sets.
For large data sets, we use a frequency distribution to find percentiles.
• Step 1: Sort the data in ascending order (from smallest to largest).
• Step 2: Calculate the position i(n/100), where i is the percentile you wish to calculate (1 to 99) and n is the sample size, i.e.

$$P_1 = \frac{n}{100},\; P_2 = \frac{2n}{100},\; P_3 = \frac{3n}{100},\; P_4 = \frac{4n}{100},\; \ldots,\; P_{10} = \frac{10n}{100} = \frac{n}{10} = D_1,\; \ldots$$

$$P_{25} = \frac{25n}{100} = \frac{n}{4} = Q_1,\; \ldots,\; P_{50} = \frac{50n}{100} = \frac{n}{2} = \text{median},\; \ldots,\; P_{75} = \frac{75n}{100} = \frac{3n}{4} = Q_3,\; \ldots,\; P_{99} = \frac{99n}{100}$$

• Step 3: If the position from Step 2 is an integer k, the percentile is the mean of the data values in positions k and k + 1. If the position is not an integer, round it up to the next integer and use the value in that position.
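Because P10, P25, P50 and P75 have exactly the same positions as D1, Q1, the median and Q3, the position rule must return identical values for each pair. A minimal Python check on the stock-price data from the quartile example; `quantile_position_rule` and `pct` are hypothetical helpers sketching the rule above.

```python
def quantile_position_rule(data, k, m):
    """k-th of the m-quantiles by the position rule (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    if (k * n) % m == 0:              # integer position: average it with the next value
        i = k * n // m
        return (xs[i - 1] + xs[i]) / 2
    i = (k * n + m - 1) // m          # non-integer position: round up
    return xs[i - 1]

prices = [10, 7, 20, 12, 5, 15, 9, 18, 4, 12, 8, 14]   # stock prices from the quartile example

def pct(k):
    """k-th percentile of the stock prices."""
    return quantile_position_rule(prices, k, 100)

pairs = {
    "P10 vs D1": (pct(10), quantile_position_rule(prices, 1, 10)),
    "P25 vs Q1": (pct(25), quantile_position_rule(prices, 1, 4)),
    "P50 vs median": (pct(50), quantile_position_rule(prices, 2, 4)),
    "P75 vs Q3": (pct(75), quantile_position_rule(prices, 3, 4)),
}
for name, (p, q) in pairs.items():
    print(name, p, q, p == q)
```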
INTERPRETATION OF PERCENTILES
If you take the NTS test for admission, your result card reports your percentile, say 85%. This means that 85% of the students who took the test scored lower than you, and the remaining 15% scored the same as or higher than you.
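The interpretation above amounts to counting the share of scores below yours. A minimal Python sketch; the function name `percentile_rank` and the 20 test scores are hypothetical, chosen only to illustrate an 85% result.

```python
def percentile_rank(scores, your_score):
    """Percent of scores strictly below your_score."""
    below = sum(1 for s in scores if s < your_score)
    return 100 * below / len(scores)

# Hypothetical scores of 20 candidates: 40, 42, ..., 78 (illustrative only).
scores = list(range(40, 80, 2))
rank = percentile_rank(scores, 74)   # 17 of the 20 scores are below 74
print(f"Your score of 74 is at the {rank:.0f}th percentile")
```

With this definition, the remaining 15% are the scores equal to or above yours, which matches the interpretation of the result card.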
Page 84: 3.42, 3.43
Weight (milligrams)   No. of seeds (f)   CF                   LCB       UCB
10–24.9               16                 16                   9.95      24.95
25–39.9               68                 16 + 68 = 84         24.95     39.95
40–54.9               204                84 + 204 = 288       39.95     54.95
55–69.9               233                288 + 233 = 521      54.95     69.95
70–84.9               240                521 + 240 = 761      69.95     84.95
85–99.9               655                761 + 655 = 1416     84.95     99.95
100–114.9             803                1416 + 803 = 2219    99.95     114.95
115–129.9             294                2219 + 294 = 2513    114.95    129.95
130–144.9             21                 2513 + 21 = 2534     129.95    144.95
145–159.9             4                  2534 + 4 = 2538      144.95    159.95

(CF = cumulative frequency; LCB and UCB = lower and upper class boundaries.)

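$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - C\right)$$

where l is the lower class boundary of the Q1 class, h is the class width, f is the frequency of that class and C is the cumulative frequency of the class before it.

Here n/4 = 2538/4 = 634.5. The first cumulative frequency to reach 634.5 is 761, so the Q1 class is 70–84.9, with l = 69.95, h = 84.95 − 69.95 = 15, f = 240 and C = 521.

$$Q_1 = 69.95 + \frac{15}{240}\left(\frac{2538}{4} - 521\right) = 69.95 + \frac{15}{240}(634.5 - 521) = 77.04375$$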

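$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - C\right)$$

Here 3n/4 = 3(2538)/4 = 1903.5. The first cumulative frequency to reach 1903.5 is 2219, so the Q3 class is 100–114.9, with l = 99.95, h = 114.95 − 99.95 = 15, f = 803 and C = 1416.

$$Q_3 = 99.95 + \frac{15}{803}\left(\frac{3(2538)}{4} - 1416\right) = 99.95 + \frac{15}{803}(1903.5 - 1416) = 109.0565$$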

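$$D_3 = l + \frac{h}{f}\left(\frac{3n}{10} - C\right)$$

Here 3n/10 = 3(2538)/10 = 761.4. The cumulative frequency of the 70–84.9 class is 761, just short of 761.4, so the D3 class is 85–99.9, with l = 84.95, h = 99.95 − 84.95 = 15, f = 655 and C = 761.

$$D_3 = 84.95 + \frac{15}{655}\left(\frac{3(2538)}{10} - 761\right) = 84.95 + \frac{15}{655}(761.4 - 761) = 84.95916$$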

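$$P_{95} = l + \frac{h}{f}\left(\frac{95n}{100} - C\right)$$

Here 95n/100 = 95(2538)/100 = 2411.1. The first cumulative frequency to reach 2411.1 is 2513, so the P95 class is 115–129.9, with l = 114.95, h = 129.95 − 114.95 = 15, f = 294 and C = 2219.

$$P_{95} = 114.95 + \frac{15}{294}\left(\frac{95(2538)}{100} - 2219\right) = 114.95 + \frac{15}{294}(2411.1 - 2219) = 124.751$$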
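The four grouped-data calculations follow one pattern: find the first class whose cumulative frequency reaches k·n/m, then interpolate within it. A minimal Python sketch of that formula; the function name `grouped_quantile` and its parameters are hypothetical, and it assumes equal class widths h, as in this table.

```python
def grouped_quantile(lcb, freqs, h, k, m):
    """k-th m-quantile of grouped data: l + (h / f) * (k*n/m - C)."""
    n = sum(freqs)
    target = k * n / m                # e.g. n/4 for Q1, 95n/100 for P95
    cum = 0                           # C: cumulative frequency before the current class
    for l, f in zip(lcb, freqs):
        if cum + f >= target:         # first class whose CF reaches the target
            return l + h / f * (target - cum)
        cum += f
    raise ValueError("k must not exceed m")

lcb = [9.95, 24.95, 39.95, 54.95, 69.95, 84.95, 99.95, 114.95, 129.95, 144.95]
freqs = [16, 68, 204, 233, 240, 655, 803, 294, 21, 4]
h = 15                                # common class width

q1 = grouped_quantile(lcb, freqs, h, 1, 4)
q3 = grouped_quantile(lcb, freqs, h, 3, 4)
d3 = grouped_quantile(lcb, freqs, h, 3, 10)
p95 = grouped_quantile(lcb, freqs, h, 95, 100)
print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, D3 = {d3:.4f}, P95 = {p95:.4f}")
```

Scanning cumulative frequencies in order is what makes D3 land in the 85–99.9 class: 761 falls short of 761.4, so the loop moves on to the next class.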

Box and whisker plot

The box and whisker plot, sometimes simply called the box plot, is a type of graph that helps visualize the five-number summary (minimum, lower quartile, median, upper quartile and maximum). It doesn't show the distribution in as much detail as a histogram does, but it is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set. A box plot is ideal for comparing distributions because the centre, spread and overall range are immediately apparent.

Using box plots, we can better understand our data: its distribution, its centre (median), its spread (interquartile range and range), its skewness and any potential outliers. A box plot packs all of this information into a single concise diagram.

In a box and whisker plot:

• The left and right sides of the box are the lower and upper quartiles. The box covers the interquartile interval, which contains the middle 50% of the data.

• The vertical line that splits the box in two is the median. Sometimes the mean is also indicated by a dot or a cross on the box plot.

• The whiskers are the two lines outside the box: one goes from the minimum to the lower quartile (the start of the box), and the other from the upper quartile (the end of the box) to the maximum.
Identifying outliers
In a box plot, the whiskers are commonly limited to 1.5 times the interquartile range (IQR) beyond the quartiles. Any observation outside the whiskers is considered a potential outlier.

This variation of the box and whisker plot restricts the length of each whisker to a maximum of 1.5 times the interquartile range. That is, each whisker reaches the value farthest from the centre that still lies within 1.5 × IQR of the lower or upper quartile. Data points outside this interval are plotted as individual points and considered potential outliers.

Example:
For the following data set, identify any outliers and draw a box-and-whisker plot.
{5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102}
n = 15

Value:     5    40   42   46   48   49   50   50   52   53    55    56    58    75    102
Position:  1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th  11th  12th  13th  14th  15th

Median = Q2: position n/2 = 7.5, rounded up to 8, so Q2 = 8th value = 50

Lower quartile = Q1: position n/4 = 15/4 = 3.75, rounded up to 4, so Q1 = 4th value = 46

Upper quartile = Q3: position 3n/4 = 3(15)/4 = 11.25, rounded up to 12, so Q3 = 12th value = 56

Interquartile range = IQR = Q3 − Q1 = 56 − 46 = 10

Lower limit = Q1 − (1.5 × IQR) = 46 − (1.5 × 10) = 31

Upper limit = Q3 + (1.5 × IQR) = 56 + (1.5 × 10) = 71

Conclusion

Hence any value above 71 or below 31 is an outlier. Since 5 is less than 31, and 75 and 102 are greater than 71, there are 3 outliers. The box-and-whisker plot is as shown. Note that 40 and 58, the most extreme values still inside the limits, are shown as the ends of the whiskers, with the outliers plotted separately.
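The outlier check can be sketched in Python using the same quartile position rule as the notes. This is a minimal sketch; `quantile_position_rule` and `box_plot_summary` are hypothetical helpers, and plotting software may compute quartiles differently, so its whisker ends can differ slightly.

```python
def quantile_position_rule(data, k, m):
    """k-th of the m-quantiles: position k*n/m; integer -> average with next, else round up."""
    xs = sorted(data)
    n = len(xs)
    if (k * n) % m == 0:
        i = k * n // m
        return (xs[i - 1] + xs[i]) / 2
    return xs[(k * n + m - 1) // m - 1]

def box_plot_summary(data):
    """Quartiles, 1.5 x IQR limits, whisker ends and outliers."""
    xs = sorted(data)
    q1, q2, q3 = (quantile_position_rule(xs, k, 4) for k in (1, 2, 3))
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lower <= x <= upper]
    outliers = [x for x in xs if x < lower or x > upper]
    # Whiskers end at the most extreme observations inside the limits.
    return {"Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
            "limits": (lower, upper), "whiskers": (inside[0], inside[-1]),
            "outliers": outliers}

data = [5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102]
summary = box_plot_summary(data)
print(summary)
```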

Identify Skewness
We can also identify the skewness of our data from the shape of the box plot. If the box plot is symmetric, the data are approximately symmetric, as in a normal distribution. If the box plot is not symmetric, the data are skewed: a longer whisker and half-box on the right of the median indicate positive skew, and a longer whisker and half-box on the left indicate negative skew. The diagrams below illustrate this.
[Figure: a box plot drawn against its distribution curve (positively skewed).]

Question to be practiced – Comparison of three box and whisker plots

The three box and whisker plots of chart 4.5.2.1 have been created using R software. What can you say about the three distributions? Complete the following table:

Measurement            Distribution A    Distribution B    Distribution C
Minimum                0.00
Lower quartile (Q1)
Median (Q2)
Upper quartile (Q3)
Maximum                0.86

• The centre of distribution A is the lowest of the three distributions (its median is 0.11). Distribution A is positively skewed, because the whisker and half-box are longer on the right side of the median than on the left side.

• Distribution B is approximately symmetric, because both half-boxes are almost the same length (0.11 on the left side and 0.10 on the right side). It is the most concentrated distribution because its interquartile range is 0.21, compared to 0.30 for distribution A and 0.26 for distribution C.

• The centre of distribution C is the highest of the three distributions (its median is 0.88). Distribution C is negatively skewed, because the whisker and half-box are longer on the left side of the median than on the right side.

All three distributions include potential outliers. Take distribution A, for example. Its interquartile range is Q3 − Q1 = 0.32 − 0.02 = 0.30. According to the definition used by the box plot function in R, all values higher than Q3 + 1.5 × (Q3 − Q1) = 0.32 + 1.5 × 0.30 = 0.77 lie outside the right whisker and are indicated by a circle. There are two potential outliers in distribution A.
