M3 Nirali
CHAPTER - 5
STATISTICS, CORRELATION AND REGRESSION
5.1 INTRODUCTION
In recent decades, statistics has found a place in almost every major sphere of human activity, particularly in the
fields of engineering and science. Everything dealing with the collection, presentation, processing, analysis and interpretation
of numerical data belongs to the field of statistics. The collection and processing of data is usually referred to as a statistical survey.
Before any major project is undertaken, a statistical survey is a must. Work actually starts only when the statistical survey
gives the green signal.
For example, if a dam is to be constructed on a river, many aspects have to be taken into account. Foremost is the selection
of the dam site. To make a proper choice, it may be necessary to consider the average rainfall in the catchment area for the past,
say, 100 years, the extent of the area which may be submerged, the population which is going to benefit, the availability of
labour and many other aspects. A good statistical survey should be able to answer all these questions. Similar considerations and
a similar statistical survey are needed whenever a new industry is to be started. The success of such a project depends to a great
extent upon a sound statistical survey. Apart from these basic considerations, modern statistical techniques are widely used in
quality control, in meeting the reliability needs of the highly complex products of space technology, and in operations
research.
The aim of this work is to introduce readers to the simple aspects of collection, classification and enumeration of numerical
data, which are essential for the development of the modern statistical techniques used in engineering fields.
ENGINEERING MATHEMATICS – III (Comp. Engg. and IT Group) (S-II) (5.2) STATISTICS, CORRELATION AND REGRESSION
Consider Table 5.1, which shows various values of x and f. It shows that the value x = 1 is recorded twice, x = 4 occurs
six times, x = 8 occurs four times, etc.
The total number of observations is ∑f = 45. In statistical language, this table means that x = 1 has frequency 2, x = 4 has
frequency 6 and so on. This way of arranging data is called a frequency distribution. In the above example, the range of the
variate is from x = 1 to x = 10. When the range is wide and the total number of observations is very large, the data can be
expressed in a still more compact form by dividing the range into class intervals.
Table 5.1
x f
1 2
2 3
3 5
4 6
5 10
6 6
7 4
8 4
9 3
10 2
– ∑ f = 45
Next, consider Table 5.2. Here the range of the variate (0, 100) is divided into 10 class intervals, each of width 10. The class
interval 0 – 10 has width 10, lower limit 0 and upper limit 10. Here (10 + 0)/2 = 5 is the middle value of the class interval and
16 is the frequency corresponding to this class interval. The middle value x = 5 represents the class interval (0 – 10), and f = 16 is
taken as the frequency of the variate x. This way of representing the data is called a grouped frequency distribution. In this type of
presentation, the class intervals must be well defined. One way of defining the class interval is that all values from x = 0
up to, but not including, 10 are placed in the class interval 0 – 10. The total frequency of all such observations is 16, which is the
frequency of the class interval 0 – 10, or equivalently the frequency of the variate x = 5.
Table 5.2
C.I. Mid-value Frequency
(Class interval) x f
0 – 10 5 16
10 – 20 15 18
20 – 30 25 20
30 – 40 35 22
40 – 50 45 40
50 – 60 55 45
60 – 70 65 35
70 – 80 75 20
80 – 90 85 19
90 – 100 95 15
Total – ∑f = 250
Similarly, all the observations with value x = 10 and above but less than 20 are included in the class interval 10 – 20
and so on. A slight change is made in the definition of the last class interval: all values from x = 90 up to and including
100 are included in the class interval 90 – 100. ∑f = 250 gives the total frequency, which is sometimes denoted by N.
In presenting the data in Grouped frequency distribution form, the following points must be noted :
(i) The class interval must be well defined that is there must not be any ambiguity about the inclusion of value of x in one
or the other class interval. In the Table 5.2, the way of defining class interval enables us to put x = 10 in the class
interval 10 – 20 while x = 100 is put in the interval 90 – 100.
(ii) The class intervals must be exhaustive that is no observation should escape classification. For this, the entire range of
observations should be divided into well defined class intervals.
(iii) The width of the class interval should be uniform as far as possible.
(iv) The number of class intervals should be neither too large nor too small. Depending upon the range of the variate x and
the total frequency of observations, the range is generally divided into about 10 to 25 class intervals.
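The boundary convention described for Table 5.2 (lower limit included, upper limit excluded, except in the last class) can be sketched in Python as follows. This is only an illustration: the function names and the sample observations are hypothetical and do not come from the text.

```python
def class_index(x, lower=0, upper=100, width=10):
    """Return the 0-based index of the class interval containing x.

    Each class is [a, a + width) except the last one, which also
    includes its upper limit, as described for Table 5.2.
    """
    if not (lower <= x <= upper):
        raise ValueError("observation lies outside the range of the variate")
    n_classes = int((upper - lower) // width)
    # The top value (x == upper) is placed in the last class.
    return min(int((x - lower) // width), n_classes - 1)


def grouped_frequencies(data, lower=0, upper=100, width=10):
    """Count how many observations fall in each class interval."""
    n_classes = int((upper - lower) // width)
    freq = [0] * n_classes
    for x in data:
        freq[class_index(x, lower, upper, width)] += 1
    return freq


# Hypothetical observations chosen to exercise the boundary rule.
sample = [0, 9.5, 10, 19, 20, 90, 100]
print(grouped_frequencies(sample))
```

With this rule x = 10 is counted in the class 10 – 20 and x = 100 in the class 90 – 100, so the classes are both unambiguous and exhaustive.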
Sometimes an additional column of cumulative frequency (c.f.) supplements the grouped frequency distribution or frequency
distribution table.
In Table 5.3, the number 76 against x = 35 shows the total frequency up to and including the class represented by x = 35, which is
the middle value of the interval (30 – 40).
Table 5.3
C.I. Mid-value (x) Frequency (f) Cumulative Frequency (c. f.)
0 – 10 5 16 16
10 – 20 15 18 34
20 – 30 25 20 54
30 – 40 35 22 76
40 – 50 45 40 116
50 – 60 55 45 161
60 – 70 65 35 196
70 – 80 75 20 216
80 – 90 85 19 235
90 – 100 95 15 250
Total – ∑f = 250 N = 250
This type of cumulative frequency is also called the less than cumulative frequency. If we reverse the process, that is, compute
the cumulative sum of frequencies from the highest class to the lowest class, then the result is called the more than
cumulative frequency.
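As a small illustration, both kinds of cumulative frequency can be computed from the frequency column of Table 5.3. The sketch below uses Python's standard `itertools.accumulate`; the variable names are our own.

```python
from itertools import accumulate

freq = [16, 18, 20, 22, 40, 45, 35, 20, 19, 15]  # f column of Table 5.3

# "Less than" c.f.: running total from the lowest class upward.
less_than = list(accumulate(freq))

# "More than" c.f.: running total from the highest class downward,
# reversed back so that entry i still belongs to class i.
more_than = list(accumulate(reversed(freq)))[::-1]

print(less_than)   # last entry equals N = 250
print(more_than)   # first entry equals N = 250
```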
3. Cumulative Frequency Curve or The Ogive : Taking the upper limits of the classes as x co-ordinates and the corresponding
cumulative frequencies as y co-ordinates, if the points are plotted and then joined by a free-hand curve, the result is called an
ogive (See Fig. 5.2).
[Fig. 5.2 : Ogive. x-axis : Variable x (0 to 100); y-axis : Cumulative frequency (0 to 270)]
ILLUSTRATIONS
Ex. 1 : Find the Arithmetic mean for the following distribution :
x 0 1 2 3 4 5 6 7 8 9 10
f 4 5 12 12 13 16 15 13 12 5 6
Dividing by ∑f throughout,
$$\frac{\sum fd}{\sum f} = \frac{\sum fx}{\sum f} - \frac{\sum fA}{\sum f}$$
i.e.
$$\frac{\sum fd}{\sum f} = \bar{x} - A\,\frac{\sum f}{\sum f}$$
(A, being constant, is taken outside the ∑ notation)
$$\frac{\sum fd}{\sum f} = \bar{x} - A$$
or
$$\bar{x} = A + \frac{\sum fd}{\sum f} = A + \bar{d} \qquad [\bar{d} \text{ is the mean of the variable } d]$$
The numbers fd and ∑fd are smaller than fx and ∑fx, which reduces the calculations.
For a grouped frequency distribution, the calculations can be reduced further by taking
$$u = \frac{x - A}{h} \quad \text{or} \quad u = \frac{d}{h}$$
which gives hu = x – A.
Then, proceeding as before, we get
$$h\,\frac{\sum fu}{\sum f} = \bar{x} - A$$
or
$$\bar{x} = A + h\,\frac{\sum fu}{\sum f}$$
This formula is mostly used for grouped frequency distributions, where h is chosen equal to the width of the class
interval.
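The agreement between the direct formula and the step-deviation formula can be checked numerically. The sketch below applies both to the data of Table 5.2; the choice A = 45 is our own assumption (any convenient value works), and h = 10 is the class width.

```python
mid = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]      # mid-values x (Table 5.2)
freq = [16, 18, 20, 22, 40, 45, 35, 20, 19, 15]    # frequencies f (Table 5.2)

N = sum(freq)

# Direct method: mean = sum(f x) / sum(f)
direct = sum(f * x for f, x in zip(freq, mid)) / N

# Step-deviation method: u = (x - A) / h, mean = A + h * sum(f u) / sum(f)
A, h = 45, 10   # assumed mean A (our choice) and class width h
sum_fu = sum(f * (x - A) / h for f, x in zip(freq, mid))
shortcut = A + h * sum_fu / N

print(direct, shortcut)
```

Both routes give the same mean; the second only works with the small integers u = –4, …, 5, which is why it is preferred for hand calculation.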
Ex. 2 : Calculate arithmetic mean for the following frequency distribution :
Observations (x) 103 110 112 118 95
Frequency (f) 4 6 10 12 3
0 – 10 8
10 – 20 20
20 – 30 14
30 – 40 16
40 – 50 20
50 – 60 25
60 – 70 13
70 – 80 10
80 – 90 5
90 – 100 2
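∴ $\bar{x}$, the mean of the first set, is given by
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_{n_1}}{n_1}$$
$$\therefore\; n_1 \bar{x} = x_1 + x_2 + \dots + x_{n_1}$$
Similarly, for the second set, $n_2 \bar{y} = y_1 + y_2 + \dots + y_{n_2}$.
Hence, by definition, the joint arithmetic mean $\bar{z}$ is given by
$$\bar{z} = \frac{(x_1 + x_2 + x_3 + \dots + x_{n_1}) + (y_1 + y_2 + y_3 + \dots + y_{n_2})}{n_1 + n_2}$$
$$\therefore\; \bar{z} = \frac{n_1 \bar{x} + n_2 \bar{y}}{n_1 + n_2} \qquad \dots (A)$$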
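For two frequency distributions with total frequencies ∑f = N1 and ∑f = N2, the means $\bar{x}$, $\bar{y}$ of the two sets are given by
$$\bar{x} = \frac{\sum fx}{N_1} \qquad \therefore\; N_1 \bar{x} = \sum fx$$
$$\bar{y} = \frac{\sum fy}{N_2} \qquad \therefore\; N_2 \bar{y} = \sum fy$$
Hence $\bar{z}$, the combined mean, is given by
$$\bar{z} = \frac{\sum fx + \sum fy}{N_1 + N_2} = \frac{N_1 \bar{x} + N_2 \bar{y}}{N_1 + N_2} \qquad \dots (A)$$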
Ex. 4 : Marks obtained in the Applied Mechanics paper by a group of Computer students and a group of Electronics students are
given in the following tables :
Group (A) of Computer students :
Marks Obtained No. of Students
0 – 10 5
10 – 20 6
20 – 30 15
30 – 40 15
40 – 50 9
∑f = 50
Group (B) of Electronics students :
Marks Obtained No. of Students
0 – 10 8
10 – 20 15
20 – 30 18
30 – 40 13
40 – 50 6
∑f = 60
Find the Combined mean of the two groups.
Sol. : For group (A) :
C.I. Mid-value x f f × x
0 – 10 5 5 25
10 – 20 15 6 90
20 – 30 25 15 375
30 – 40 35 15 525
40 – 50 45 9 405
Total – N1 = ∑ f = 50 ∑ fx = 1420
For group (B) :
C.I. Mid-value x f f × x
0 – 10 5 8 40
10 – 20 15 15 225
20 – 30 25 18 450
30 – 40 35 13 455
40 – 50 45 6 270
Total – N2 = ∑ f = 60 ∑ fy = 1440
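Mean $\bar{x}$ of group (A) is given by,
$$\bar{x} = \frac{\sum fx}{\sum f} \;\Rightarrow\; \bar{x} \sum f = N_1 \bar{x} = \sum fx = 1420$$
Mean $\bar{y}$ of group (B) is given by,
$$\bar{y} = \frac{\sum fy}{\sum f} \;\Rightarrow\; \bar{y} \sum f = N_2 \bar{y} = \sum fy = 1440$$
Combined mean $\bar{z}$ is given by,
$$\bar{z} = \frac{N_1 \bar{x} + N_2 \bar{y}}{N_1 + N_2} = \frac{1420 + 1440}{50 + 60} = \frac{2860}{110} = 26$$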
Ex. 5 : The arithmetic mean of the weights of 100 boys is 50 kg and the arithmetic mean of the weights of 50 girls is 45 kg.
Calculate the arithmetic mean of the combined group of boys and girls.
Sol. : Let $\bar{X}$ and N1 be the mean and size of the group of boys, and $\bar{Y}$ and N2 be the mean and size of the group of girls. So
N1 = 100, $\bar{X}$ = 50, N2 = 50, $\bar{Y}$ = 45. Hence, the combined mean is
$$\bar{Z} = \frac{N_1 \bar{X} + N_2 \bar{Y}}{N_1 + N_2} = \frac{(100 \times 50) + (50 \times 45)}{100 + 50} = \frac{7250}{150} = 48.3333$$
Ex. 6 : The mean weekly salary paid to 300 employees of a firm is ₹ 1,470. There are 200 male employees and the remaining
are females. If the mean salary of males is ₹ 1,505, obtain the mean salary of females.
Sol. : Suppose $\bar{X}$ and N1 are the mean and size of the group of males, $\bar{Y}$ and N2 are the mean and size of the group of females, and
$\bar{Z}$ is the mean of all the employees considered together.
Now,
$$\bar{Z} = \frac{N_1 \bar{X} + N_2 \bar{Y}}{N_1 + N_2}$$
$$\therefore\; 1470 = \frac{(200 \times 1505) + (100 \times \bar{Y})}{200 + 100}$$
$$\therefore\; 1470 = \frac{301000 + 100\bar{Y}}{300}$$
$$\therefore\; 441000 = 301000 + 100\bar{Y}$$
$$\therefore\; 4410 = 3010 + \bar{Y}$$
∴ $\bar{Y}$ = ₹ 1,400
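The combined-mean formula (A) used in Ex. 4 to Ex. 6 can be sketched in Python as below. The function names are hypothetical and chosen only for this illustration; the numbers are those of the three examples.

```python
def combined_mean(groups):
    """groups: list of (size, mean) pairs; returns the combined mean, formula (A)."""
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total


def missing_group_mean(n_total, mean_total, n_known, mean_known):
    """Solve formula (A) for the mean of the remaining group."""
    n_other = n_total - n_known
    return (n_total * mean_total - n_known * mean_known) / n_other


print(combined_mean([(50, 1420 / 50), (60, 1440 / 60)]))   # data of Ex. 4
print(combined_mean([(100, 50), (50, 45)]))                # data of Ex. 5
print(missing_group_mean(300, 1470, 200, 1505))            # data of Ex. 6
```

Note that the combined mean is a weighted mean of the group means, with the group sizes as weights; it equals the simple average of the two means only when the groups are of equal size.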
5.4.2 Geometric Mean
The geometric mean of a set of n observations x1, x2, …, xn is given by the nth root of their product.
Thus the Geometric Mean (G.M.) is given by,
$$\text{G.M.} = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$
In case of the frequency distribution
x x1 x2 x3 ……… xn
f f1 f2 f3 ……… fn
$$\text{G.M.} = G = \left( x_1^{f_1} \cdot x_2^{f_2} \cdot x_3^{f_3} \cdots x_n^{f_n} \right)^{1/N}, \quad \text{where } N = \sum f$$
To calculate it, we denote the G.M. by G and take logarithms of both sides :
$$\log G = \frac{1}{N} \log \left( x_1^{f_1} \cdot x_2^{f_2} \cdot x_3^{f_3} \cdots x_n^{f_n} \right) = \frac{1}{N} \left[ f_1 \log x_1 + f_2 \log x_2 + \dots + f_n \log x_n \right] = \frac{1}{N} \sum f \log x$$
or
$$G = \operatorname{antilog} \left( \frac{1}{N} \sum f \log x \right)$$
It is seen that the logarithm of G is the arithmetic mean of the logarithms of the given values. In case of a grouped frequency
distribution, x is taken as the mid-value of the class interval.
For two sets of observations $(x_1, x_2, \dots, x_{n_1})$, $(y_1, y_2, \dots, y_{n_2})$ with geometric means G1, G2, it can be established that
$$\log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$$
where, G is the combined or common geometric mean of the two series.
It may be noted here that if one of the observations is zero, the geometric mean becomes zero, and if one of the observations is
negative, the geometric mean may become imaginary. Naturally, calculation of the geometric mean becomes meaningless in such cases.
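The logarithmic route to the geometric mean can be sketched as follows. This is a minimal illustration: the function name and the sample data are hypothetical, natural logarithms are used (any base gives the same G), and non-positive observations are rejected in line with the remark above.

```python
import math


def geometric_mean(values, freqs=None):
    """G = antilog((1/N) * sum(f * log x)); all values must be positive."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped data: every f = 1
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive observations")
    N = sum(freqs)
    mean_log = sum(f * math.log(x) for x, f in zip(values, freqs)) / N
    return math.exp(mean_log)              # exp is the antilog of the natural log


# Hypothetical data, not taken from the text.
print(geometric_mean([2, 8]))           # square root of 2 * 8
print(geometric_mean([2, 8], [3, 1]))   # fourth root of 2^3 * 8
```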
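5.4.3 Harmonic Mean
The Harmonic Mean (H.M.) of a set of observations (x1, x2, ……, xn) is the reciprocal of the arithmetic mean of the reciprocals of the
observations. Thus
$$\text{H.M.} = \frac{n}{\sum \frac{1}{x}}$$
and, in case of a frequency distribution,
$$\text{H.M.} = \frac{N}{\sum \frac{f}{x}}, \quad \text{where } N = \sum f$$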
5.4.4 Median
The median of a distribution is the value of the variable (or variate) which divides it into two equal parts. It is the value such that
the number of observations above it is equal to the number of observations below it. The median is sometimes called a positional
average.
In case of ungrouped data, if the number of observations n is odd, then the median is the middle value, that is the
$\left( \frac{n + 1}{2} \right)$th observation of the set after the observations are arranged in ascending or descending order. For an even number of
observations, it is the arithmetic mean of the two middle terms, given by
$$\frac{1}{2} \left[ \text{value of the } \left( \frac{n}{2} \right)\text{th observation} + \text{value of the } \left( \frac{n}{2} + 1 \right)\text{th observation} \right]$$
In case of a frequency distribution
x x1 x2 x3 ……… xn
f f1 f2 f3 ……… fn
where, ∑f = N, we prepare the cumulative frequency column. Then we consider the cumulative frequency (c.f.) equal to or just
greater than N/2; the corresponding value of x is the median.
ILLUSTRATIONS
Ex. 7 : For the ordered arrangement of n = 8 observations x = 1, 5, 9, 11, 21, 24, 27, 30, the middle terms are 11 and 21
and median = (11 + 21)/2 = 16. For the ordered arrangement of n = 7 observations x = 35, 35, 36, 37, 38, 39, 40, the middle
term is 37, so the median is 37.
x 1 3 5 7 9 11 13 15 17
f 3 6 8 12 16 16 15 10 5
The value of c.f. just greater than 45.5 is 61, the corresponding value of x is 11 and thus, median is 11.
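In case of a grouped frequency distribution, the class corresponding to the c.f. just greater than N/2 is called the median class
and the value of the median is obtained by the formula :
$$\text{Median} = l + \frac{h}{f} \left( \frac{N}{2} - c \right)$$
where, l is the lower limit of the median class,
f is the frequency of the median class,
h is the width of the median class,
c is the cumulative frequency of the class preceding the median class.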
Ex. 9 : Wages earned in Rupees per day by the labourers are given by the table :
Wages in ₹ 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of Labourers 5 8 13 10 8
Find the median of the distribution.
Sol. :
Wages in ₹ (C.I.) No. of Labourers (f) c.f.
10 – 20 5 5
20 – 30 8 13
30 – 40 13 26
40 – 50 10 36
50 – 60 8 44
Total ∑f = N = 44 –
Here $\frac{N}{2} = \frac{44}{2} = 22$.
Cumulative frequency (c.f.) just greater than 22 is 26 and the corresponding class is 30 – 40.
Using the formula to calculate the median, with
l = 30, f = 13, h = 10, N/2 = 22, c = 13,
$$\text{Median} = 30 + \frac{10}{13} (22 - 13) = 30 + \frac{10}{13} (9) = 30 + \frac{90}{13} = 36.923$$
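The median calculation for grouped data can be sketched in Python as below, using the data of Ex. 9. The function name is hypothetical, equal class widths are assumed, and c denotes the cumulative frequency of the classes before the median class, as in the worked solution (c = 13).

```python
def grouped_median(lower_limits, freqs, h):
    """Median = l + (h / f) * (N/2 - c) for classes of equal width h."""
    N = sum(freqs)
    if N == 0:
        raise ValueError("empty distribution")
    c = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if c + f >= N / 2:          # first class whose c.f. reaches N/2
            return l + (h / f) * (N / 2 - c)
        c += f


# Wages data of Ex. 9: classes 10-20, ..., 50-60
median = grouped_median([10, 20, 30, 40, 50], [5, 8, 13, 10, 8], 10)
print(round(median, 3))
```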
5.4.5 Mode
It is the value of the variate which occurs most frequently in a set of observations, or is the value of variate corresponding to
maximum frequency.
We note that the general shape of the frequency curve is bell shaped in the majority of situations. Thus, the frequency is initially
small, increases to a maximum and then declines. The value on the x-axis at which the maximum or peak of the frequency curve
appears is the mode.
In the data 35, 38, 40, 39, 35, 36, 37, it can be clearly seen that the observation 35 has the maximum frequency, hence
it is the mode.
Consider the following frequency distribution :
x 10 11 12 13 14 15
f 2 5 10 21 12 13
Since maximum frequency is associated with observation 13, the mode is 13.
In case of a grouped frequency distribution, the mode is given by the formula :
$$\text{Mode} = l + h \times \frac{(f_1 - f_0)}{(f_1 - f_0) - (f_2 - f_1)} = l + h \times \frac{(f_1 - f_0)}{2 f_1 - f_0 - f_2}$$
Here, l is the lower limit of the modal class
h is the width of the modal class
f1 is the frequency of the modal class
f0 is the frequency of the class preceding the modal class
f2 is the frequency of the class succeeding the modal class.
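As an illustration of the grouped-mode formula, the sketch below applies it to the distribution of Table 5.2. Applying the formula to that table is our own choice of example, the function name is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption made here, not a rule stated in the text.

```python
def grouped_mode(lower_limits, freqs, h):
    """Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2) for classes of width h."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # class before (assumed 0 if none)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0       # class after (assumed 0 if none)
    # Note: the denominator is zero when f0 = f1 = f2; the formula does not apply then.
    return lower_limits[i] + h * (f1 - f0) / (2 * f1 - f0 - f2)


lower = list(range(0, 100, 10))                      # lower limits 0, 10, ..., 90
freq = [16, 18, 20, 22, 40, 45, 35, 20, 19, 15]      # frequencies of Table 5.2
mode = grouped_mode(lower, freq, 10)
print(round(mode, 3))
```

Here the modal class is 50 – 60 with f1 = 45, f0 = 40 and f2 = 35, so the mode lies one third of the way into the class.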
ILLUSTRATION
Ex. 1 : Calculate Arithmetic mean and Mean deviation of the following frequency distribution :
x 1 2 3 4 5 6
f 3 4 8 6 4 2
Sol. : Preparing the table :
x f x×f x–A |x – A | f×|x–A|
1 3 3 – 2.37 2.37 7.11
2 4 8 – 1.37 1.37 5.48
3 8 24 – 0.37 0.37 2.96
4 6 24 0.63 0.63 3.78
5 4 20 1.63 1.63 6.52
6 2 12 2.63 2.63 5.26
Total ∑f = 27 ∑ f x = 91 – – ∑f × |x – A| = 31.11
$$\text{A.M.} = A = \bar{x} = \frac{\sum fx}{\sum f} = \frac{91}{27} = 3.37 \text{ (approximately)}$$
$$\text{Mean deviation} = \frac{\sum f \times |x - A|}{\sum f} = \frac{31.11}{27} = 1.152 \text{ (approximately)}$$
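The table above can be reproduced with a few lines of Python. This is only a check of the arithmetic in Ex. 1; the variable names are our own.

```python
x = [1, 2, 3, 4, 5, 6]
f = [3, 4, 8, 6, 4, 2]

N = sum(f)
mean = sum(fi * xi for fi, xi in zip(f, x)) / N

# Mean deviation about the arithmetic mean: sum(f * |x - mean|) / sum(f)
mean_deviation = sum(fi * abs(xi - mean) for fi, xi in zip(f, x)) / N

print(round(mean, 2), round(mean_deviation, 3))
```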
(ii) Standard Deviation : Standard deviation is defined as the positive square root of the arithmetic mean of the squares of
the deviations of the given values from their arithmetic mean. It is denoted by the symbol σ.
For the variate x which takes n values x1, x2, … xn,
$$\text{S.D.} = \sigma = \sqrt{\frac{1}{n} \sum (x - \bar{x})^2}$$
For a frequency distribution (x, f), i.e. for the variate x which takes n values x1, x2, … xn with corresponding frequencies f1, f2, …, fn,
$$\text{S.D.} = \sigma = \sqrt{\frac{1}{N} \sum f (x - \bar{x})^2}$$
where, $\bar{x}$ is the A.M. of the distribution and N = ∑ f.
(iii) Variance : The square of the standard deviation is called the variance, denoted by Var (x).
For the variate x which takes the values x1, x2, …, xn,
$$\text{Variance} = \text{Var}(x) = \sigma^2 = \frac{1}{n} \sum (x - \bar{x})^2$$
For a frequency distribution (x, f), i.e. for the variate x which takes n values x1, x2, …, xn with corresponding frequencies f1, f2, …, fn,
$$\text{Variance} = \text{Var}(x) = \sigma^2 = \frac{1}{N} \sum f (x - \bar{x})^2$$
The step of squaring the deviations $(x - \bar{x})$ overcomes the drawback of ignoring the signs in the mean deviation. The standard
deviation is also suitable for further mathematical treatment. Moreover, among all the measures of dispersion, the standard deviation
is least affected by fluctuations of sampling; hence it is considered the most reliable measure of dispersion.
(iv) Root mean square deviation is given by
$$\text{R.M.S.} = S = \sqrt{\frac{1}{N} \sum f (x - A)^2}$$
where, A is any arbitrary number (reference number).
Also, the square of the root mean square deviation is called the mean square deviation, denoted by S².
$$\text{M.S.D.} = S^2 = \frac{1}{N} \sum f (x - A)^2$$
When A = $\bar{x}$, the arithmetic mean, the root mean square deviation becomes equal to the standard deviation.
(v) For computation purpose the above formulae can be simplified as follows :
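Method of Calculating σ :
For the variate x which takes n values x1, x2, …, xn,
$$\begin{aligned}
\sigma^2 &= \frac{1}{n} \sum (x - \bar{x})^2 \\
&= \frac{1}{n} \sum (x^2 - 2x\bar{x} + \bar{x}^2) \\
&= \frac{1}{n} \sum x^2 - 2\bar{x}^2 + \bar{x}^2
\end{aligned}$$
$$\therefore\; \sigma^2 = \frac{1}{n} \sum x^2 - \left( \frac{1}{n} \sum x \right)^2 \qquad \dots (1)$$
Similarly, for a frequency distribution,
$$\begin{aligned}
\sigma^2 &= \frac{1}{N} \sum f (x - \bar{x})^2 \\
&= \frac{1}{N} \sum f x^2 - 2\bar{x}^2 + \bar{x}^2 \qquad \left[ \because\; \frac{\sum fx}{N} = \bar{x},\; \frac{\sum f}{N} = 1 \right] \\
&= \frac{1}{N} \sum f x^2 - \bar{x}^2
\end{aligned}$$
$$\therefore\; \sigma^2 = \frac{1}{N} \sum f x^2 - \left( \frac{1}{N} \sum f x \right)^2 \qquad \dots (2)$$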
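Usually, the product terms fx and fx² are large; hence, to reduce the volume of calculations further, we proceed as follows :
$$\begin{aligned}
\sigma^2 &= \frac{1}{N} \sum f (x - \bar{x})^2 \\
&= \frac{1}{N} \sum f (x - A + A - \bar{x})^2 \qquad \text{(where } A \text{ is an arbitrary number)} \\
&= \frac{1}{N} \sum f \left[ (x - A)^2 + 2 (x - A)(A - \bar{x}) + (A - \bar{x})^2 \right] \\
&= \frac{1}{N} \sum f (x - A)^2 + \frac{2}{N} (A - \bar{x}) \sum f (x - A) + (A - \bar{x})^2 \, \frac{\sum f}{N}
\end{aligned}$$
Let d = x – A; then, using $\bar{x} = A + \frac{1}{N} \sum fd$,
$$\begin{aligned}
\sigma^2 &= \frac{1}{N} \sum f d^2 + \frac{2}{N} \left( A - A - \frac{1}{N} \sum fd \right) \sum fd + \left( A - A - \frac{1}{N} \sum fd \right)^2 \cdot 1 \\
&= \frac{1}{N} \sum f d^2 - \frac{2}{N^2} \left( \sum fd \right)^2 + \frac{1}{N^2} \left( \sum fd \right)^2 \\
&= \frac{1}{N} \sum f d^2 - \frac{1}{N^2} \left( \sum fd \right)^2
\end{aligned}$$
$$\therefore\; \sigma^2 = \frac{1}{N} \sum f d^2 - \left( \frac{\sum fd}{N} \right)^2 \qquad \dots (3)$$
or
$$\sigma = \sqrt{\frac{1}{N} \sum f d^2 - \left( \frac{\sum fd}{N} \right)^2} \qquad \dots (4)$$
The terms fd, fd² are numerically smaller than fx, fx², and use of formula (4) reduces the calculations considerably in
obtaining σ.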
In dealing with data presented in grouped frequency distribution form, to reduce the calculations further, we put
$$u = \frac{x - A}{h}$$
where h is generally taken as the width of the class interval.
Thus u = d/h or d = hu. Putting d = hu in formula (3),
$$\sigma^2 = \frac{1}{N} \sum f h^2 u^2 - \left( \frac{\sum f h u}{N} \right)^2$$
$$\therefore\; \sigma = h \sqrt{\frac{1}{N} \sum f u^2 - \left( \frac{\sum f u}{N} \right)^2} \qquad \dots (5)$$
Formula (5) is quite useful for data presented in grouped frequency distribution form.
(vi) Relation Between σ and S : By definition, we have
$$\begin{aligned}
S^2 &= \frac{1}{N} \sum f (x - A)^2 \\
&= \frac{1}{N} \sum f (x - \bar{x} + \bar{x} - A)^2 \\
&= \frac{1}{N} \sum f \left[ (x - \bar{x})^2 + 2 (x - \bar{x})(\bar{x} - A) + (\bar{x} - A)^2 \right] \\
&= \frac{1}{N} \sum f (x - \bar{x})^2 + 2 (\bar{x} - A) \, \frac{1}{N} \sum f (x - \bar{x}) + (\bar{x} - A)^2 \, \frac{\sum f}{N}
\end{aligned}$$
Note that $(\bar{x} - A)$, being constant, is taken outside the summation.
Now since
$$\frac{1}{N} \sum f (x - \bar{x}) = \frac{1}{N} \sum fx - \bar{x} \cdot \frac{1}{N} \sum f = \bar{x} - \bar{x} = 0$$
$$\therefore\; S^2 = \frac{1}{N} \sum f (x - \bar{x})^2 + (\bar{x} - A)^2, \quad \text{as } \sum f = N$$
Thus, S² = σ² + d², where d = $\bar{x}$ – A … (6)
If $\bar{x}$ = A, then d = 0 and S² is least. Thus, the mean square deviation (S²), and consequently the root mean square
deviation (S), are least when deviations are taken from A = $\bar{x}$.
(vii) Coefficient of Variation : The relative measure of standard deviation is called the coefficient of variation and is given by
$$\text{C.V.} = \frac{\sigma}{\text{A.M.}} \times 100 \qquad \dots (7)$$
For comparing the variability of two series, we calculate coefficient of variation for each series.
ILLUSTRATION
Ex. 1 : Runs scored in 10 matches of current IPL season by two batsmen A and B are tabulated as under
Batsman A 46 34 52 78 65 81 26 46 19 47
Batsman B 59 25 81 47 73 78 42 35 42 10
For Batsman B
x d = x – 42 d²
59 17 289
25 –17 289
81 39 1521
47 5 25
73 31 961
78 36 1296
42 0 0
35 –7 49
42 0 0
10 –32 1024
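∑ d = 72 ∑ d² = 5454
$$\bar{x}_B = 42 + \frac{\sum d}{10} = 42 + 7.2 = 49.2$$
$$\sigma_B = \sqrt{\frac{\sum d^2}{N} - \left( \frac{\sum d}{N} \right)^2} = \sqrt{545.4 - 51.84} = \sqrt{493.56} = 22.22$$
$$\text{Coefficient of variation for B} = \frac{\sigma_B}{\text{A.M.}} \times 100 = \frac{22.22}{49.2} \times 100 = 45.16$$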
Conclusion : The A.M. for A is slightly higher than the A.M. for B, so A is slightly better, and the coefficient of variation for A is less
than that of B.
∴ A is more consistent.
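The comparison can be checked directly from the raw scores. The sketch below is an illustration with a function name of our own; it uses the divisor n for σ, as in the text, and works from unrounded values, so the last decimal place may differ slightly from a hand calculation that rounds σ first.

```python
import math


def coefficient_of_variation(values):
    """Return (mean, sigma, C.V.) with sigma computed using divisor n."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, sigma, 100 * sigma / mean


batsman_a = [46, 34, 52, 78, 65, 81, 26, 46, 19, 47]
batsman_b = [59, 25, 81, 47, 73, 78, 42, 35, 42, 10]

for name, runs in (("A", batsman_a), ("B", batsman_b)):
    mean, sigma, cv = coefficient_of_variation(runs)
    print(name, round(mean, 2), round(sigma, 2), round(cv, 2))
```

The batsman with the smaller coefficient of variation is the more consistent one.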
Ex. 2 : Fluctuations in the Aggregate of marks obtained by two groups of students are given below. Find out which of the two
shows greater variability. (Dec. 2012)
Group A 518 519 530 530 544 542 518 550 527 527 531 550 550 529 528
Group B 825 830 830 819 814 814 844 842 842 826 832 835 835 840 840
Sol. : To solve this problem, we have to determine the coefficient of variation, (σ / A.M.) × 100, in each case. First we present the data
in frequency distribution form.
For Group A :
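x f d = x – 530 d² fd fd²
518 2 –12 144 –24 288
519 1 –11 121 –11 121
527 2 –3 9 –6 18
528 1 –2 4 –2 4
529 1 –1 1 –1 1
530 2 0 0 0 0
531 1 1 1 1 1
542 1 12 144 12 144
544 1 14 196 14 196
550 3 20 400 60 1200
Total ∑ f = 15 – – ∑ fd = 43 ∑ fd² = 1973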
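$$\text{A.M.} = \bar{x}_A = 530 + \frac{\sum fd}{\sum f} = 530 + \frac{43}{15} = 532.866$$
$$\sigma_A = \sqrt{\frac{1}{N} \sum fd^2 - \left( \frac{\sum fd}{N} \right)^2} = \sqrt{\frac{1973}{15} - \left( \frac{43}{15} \right)^2} = \sqrt{131.533 - 8.218} = 11.105$$
$$\text{Coefficient of variation} = \frac{\sigma_A}{\text{A.M.}} \times 100 = \frac{11.105}{532.866} \times 100 = 2.0840$$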
For Group B :
x f d = x – 830 d² fd fd²
814 2 – 16 256 – 32 512
819 1 – 11 121 – 11 121
825 1 –5 25 –5 25
826 1 –4 16 –4 16
830 2 0 0 0 0
832 1 2 4 2 4
835 2 5 25 10 50
840 2 10 100 20 200
842 2 12 144 24 288
844 1 14 196 14 196
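Total ∑ f = 15 – – ∑ fd = 18 ∑ fd² = 1412
$$\text{A.M.} = \bar{x}_B = 830 + \frac{18}{15} = 831.2$$
$$\sigma_B = \sqrt{\frac{1412}{15} - \left( \frac{18}{15} \right)^2} = \sqrt{94.133 - 1.44} = 9.628$$
$$\text{Coefficient of variation} = \frac{\sigma_B}{\text{A.M.}} \times 100 = \frac{9.628}{831.2} \times 100 = 1.158$$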
Coefficient of variation of group A is greater than that of group B.
∴ Group A has greater variability, or Group B is more consistent.
Ex. 3 : Calculate standard deviation for the following frequency distribution. Decide whether A.M. is good average.
Wages in Rupees earned per day 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of labourers 5 9 15 12 10 3
Sol. : Preparing the table for the purpose of calculations.
Wages Earned (C.I.) Mid-value x Frequency f u = (x – 25)/10 fu fu²
0 – 10 5 5 –2 – 10 20
10 – 20 15 9 –1 –9 9
20 – 30 25 15 0 0 0
30 – 40 35 12 1 12 12
40 – 50 45 10 2 20 40
50 – 60 55 3 3 9 27
Total – ∑ f = 54 – ∑ fu = 22 ∑ fu² = 108
Using formula (5),
$$\sigma = 10 \sqrt{\frac{1}{54} \times 108 - \left( \frac{22}{54} \right)^2} = 10 \sqrt{2 - 0.166} = 13.54 \text{ approximately}$$
In this problem,
$$\text{A.M.} = 25 + h \, \frac{\sum fu}{N} = 25 + 10 \, (0.4074) = 29.074$$
Conclusion : σ = 13.54 is quite a large value compared with the mean, so the arithmetic mean 29.074 is not a good average.
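The step-deviation calculation of Ex. 3 can be reproduced and cross-checked against the definition of σ. The sketch below is only a check of the arithmetic; the variable names are our own, with A = 25 and h = 10 as in the table.

```python
import math

mid = [5, 15, 25, 35, 45, 55]      # mid-values x
freq = [5, 9, 15, 12, 10, 3]       # frequencies f
A, h = 25, 10                      # reference value and class width

N = sum(freq)
u = [(x - A) / h for x in mid]
sum_fu = sum(f * ui for f, ui in zip(freq, u))
sum_fu2 = sum(f * ui ** 2 for f, ui in zip(freq, u))

# Mean and standard deviation from the step deviations u
mean = A + h * sum_fu / N
sigma = h * math.sqrt(sum_fu2 / N - (sum_fu / N) ** 2)

# Cross-check against the definition: sigma^2 = (1/N) * sum f (x - mean)^2
sigma_direct = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freq, mid)) / N)

print(round(mean, 3), round(sigma, 2), round(sigma_direct, 2))
```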