M3 Nirali

This document discusses the importance of statistics in engineering and science, emphasizing the necessity of statistical surveys before project initiation. It covers the collection and classification of data, differentiating between primary and secondary data, and introduces methods for presenting data, such as frequency distribution and graphical representations. Additionally, it highlights the concept of central tendency and various methods for calculating the mean, which is crucial for interpreting data effectively.


UNIT III : STATISTICS

CHAPTER - 5
STATISTICS, CORRELATION AND REGRESSION

5.1 INTRODUCTION
In recent decades, statistics has found a place in almost every major field of human activity, particularly in
Engineering and Science. Everything dealing with the collection, presentation, processing, analysis and interpretation
of numerical data belongs to the field of statistics. The collection and processing of data is usually referred to as a
statistical survey. Before any major project is undertaken, a statistical survey is a must; work actually starts only when
the survey gives the green signal.
For example, if a dam is to be constructed on a river, many aspects have to be taken into account. Foremost is the selection
of the dam site. For making a proper choice, it may be necessary to consider the average rainfall in the catchment area for
the past, say, 100 years, the extent of the area which may be submerged, the population which is going to benefit, the
availability of labour and many other aspects. A good statistical survey should be able to answer all these questions. Similar
considerations and a statistical survey are needed whenever a new industry is to be started. The success of such a project
depends to a great extent upon a sound statistical survey. Apart from these basic considerations, modern statistical techniques
are widely used in quality control, in meeting the reliability needs of the highly complex products of space technology, and in
operations research.
The aim of this chapter is to introduce the reader to the simple aspects of collection, classification and enumeration of
numerical data, which are essential for the development of the modern statistical techniques used in engineering fields.

5.2 COLLECTION AND CLASSIFICATION OF DATA


According to the method of collection, data are of two types : (a) Primary data, (b) Secondary data. Primary data, also
called raw data, are the result of a survey or investigation carried out directly, for example through questionnaires. Data
taken from sources already collected by some other agency, such as media reports, office records, bulletins, magazines and
websites, are called secondary data.
Data collected in a statistical survey or by some kind of experimentation is usually large in size and is in a form
which is not very useful for arriving at any specific conclusions. The first task is to present this data in a proper form. As a
first step, the data, which is generally in the form of numerical observations, is arranged in either ascending or descending
order. For example, the set of 20 observations 45, 35, 0, 10, 0, 51, 81, 71, 95, 17, 97, 21, 26, 86, 100, 55, 46, 56, 37, 92 is
rearranged in ascending order as 0, 0, 10, 17, 21, 26, 35, 37, 45, 46, 51, 55, 56, 71, 81, 86, 92, 95, 97, 100.
This way of presentation immediately reveals that the minimum value of the observations is 0 and the maximum is 100. It also
indicates that the observations are well spread out over the interval (0, 100). In different experiments, these observations
could carry different meanings. In one experiment, these figures may indicate the number of syntax errors committed by a group
of 20 students in their first attempt to write a computer program. In another, they may indicate marks obtained out of 100 by a
group of 20 students in the paper of numerical computational methods. In an altogether different context, they may indicate
rainfall in centimetres in a certain catchment area for the past 20 years. For the development of statistical techniques it is
unimportant what exactly is represented by these observations. In the presentation of data, these observations are
represented by the symbol x, called in statistical language a variable (variate).
After arranging the data in ascending or descending order, to make it more compact (or further classified), it is presented in
a tabular form consisting of columns headed by the symbols x and f. The column headed by x consists of the various observations
recorded in the experiment, arranged in proper order, and the column headed by f contains entries which indicate the number of
times each particular value of x occurs.

ENGINEERING MATHEMATICS – III (Comp. Engg. and IT Group) (S-II) (5.2) STATISTICS, CORRELATION AND REGRESSION

Consider Table 5.1, which shows various values of x and f. It shows that the value x = 1 is recorded twice, x = 4 occurs
six times, x = 8 occurs four times, etc.
The total number of observations is ∑f = 45. In statistical language, this table means that x = 1 has frequency 2, x = 4 has
frequency 6 and so on. This way of arranging data is called a frequency distribution. In the above example, the range of
the variate is from x = 1 to x = 10. When the range is wide and the total number of observations is very large, the data can be
expressed in a still more compact form by dividing the range into class intervals.
Table 5.1
x f
1 2
2 3
3 5
4 6
5 10
6 6
7 4
8 4
9 3
10 2
– ∑ f = 45
Next, consider Table 5.2. Here the range of the variate (0, 100) is divided into 10 class intervals, each of width 10. The class
interval 0 – 10 has width 10, lower limit 0 and upper limit 10. Here (0 + 10)/2 = 5 is the middle value of the class interval and
16 is the frequency corresponding to this class interval. The middle value x = 5 represents the class interval 0 – 10, and f = 16
is taken as the frequency of the variate x = 5. This way of representing the data is called a grouped frequency distribution. In
this type of presentation, the class intervals must be well defined. One way of defining the class intervals is that all the
values of x equal to 0 and above but less than 10 are included in the class interval 0 – 10. The total frequency of all such
observations is 16, and this is the frequency of the class interval 0 – 10, or the frequency of the variable (variate) x = 5.
Table 5.2
C.I. Mid-value Frequency
(Class interval) x f
0 – 10 5 16
10 – 20 15 18
20 – 30 25 20
30 – 40 35 22
40 – 50 45 40
50 – 60 55 45
60 – 70 65 35
70 – 80 75 20
80 – 90 85 19
90 – 100 95 15
Total – ∑f = 250
Similarly, all the observations having the value x = 10 and above but less than 20 are included in the class interval 10 – 20
and so on. A slight change is made in the definition of the last class interval : all the values of x equal to 90 and above and
less than or equal to 100 are included in the class interval 90 – 100. ∑f = 250 gives the total frequency, which is sometimes
denoted by N.
In presenting data in grouped frequency distribution form, the following points must be noted :
(i) The class intervals must be well defined, that is, there must not be any ambiguity about the inclusion of a value of x in
one or the other class interval. In Table 5.2, the way of defining the class intervals puts x = 10 in the class
interval 10 – 20, while x = 100 is put in the interval 90 – 100.
(ii) The class intervals must be exhaustive, that is, no observation should escape classification. For this, the entire range of
observations should be divided into well defined class intervals.

(iii) The width of the class intervals should be uniform as far as possible.

(iv) The number of class intervals should neither be too large nor too small. Depending upon the range of the variate x and
the total frequency of observations, the range is divided into about 10 to 25 class intervals.
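
The grouping rule described above (lower limit included, upper limit excluded, except in the last class) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `group_into_classes` and the class width of 20 are assumptions made for the example, and the data are the 20 observations quoted earlier in this section.

```python
def group_into_classes(values, lower, upper, width):
    """Count observations per class interval [a, a + width).

    The last interval also includes its upper limit, as in Table 5.2.
    """
    n_classes = (upper - lower) // width
    counts = [0] * n_classes
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"{v} lies outside the range {lower} to {upper}")
        # Integer division locates the class; the top value joins the last class.
        index = min(int((v - lower) // width), n_classes - 1)
        counts[index] += 1
    return [((lower + i * width, lower + (i + 1) * width), c)
            for i, c in enumerate(counts)]


# The 20 observations of Section 5.2, grouped with an assumed class width of 20.
observations = [45, 35, 0, 10, 0, 51, 81, 71, 95, 17,
                97, 21, 26, 86, 100, 55, 46, 56, 37, 92]
table = group_into_classes(observations, 0, 100, 20)
counts = [f for _, f in table]  # [4, 4, 5, 1, 6]
for (low, high), f in table:
    print(f"{low:3d} - {high:3d} : {f}")
```

Every observation falls in exactly one class (the intervals are well defined and exhaustive), so the class frequencies add up to the number of observations.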

Sometimes an additional column of cumulative frequency (c.f.) supplements the frequency distribution or grouped frequency
distribution table.

In Table 5.3, the number 76 against x = 35 shows the total frequency up to and including the class interval 30 – 40, whose
middle value is x = 35.

Table 5.3
C.I. Mid-value (x) Frequency (f) Cumulative Frequency (c. f.)
0 – 10 5 16 16
10 – 20 15 18 34
20 – 30 25 20 54
30 – 40 35 22 76
40 – 50 45 40 116
50 – 60 55 45 161
60 – 70 65 35 196
70 – 80 75 20 216
80 – 90 85 19 235
90 – 100 95 15 250
Total – ∑f = 250 N = 250
Cumulative frequencies of this type are also called less than cumulative frequencies. If we reverse the process, that is,
compute the cumulative sum of frequencies from the highest class to the lowest class, then the cumulative frequencies
obtained are called more than cumulative frequencies.
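
As a small illustration, both kinds of cumulative frequency can be computed from the class frequencies of Table 5.2. The sketch below uses Python's standard `itertools.accumulate`; the variable names are chosen here for the example only.

```python
from itertools import accumulate

# Class frequencies of Table 5.2 (classes 0 - 10, 10 - 20, ..., 90 - 100).
frequencies = [16, 18, 20, 22, 40, 45, 35, 20, 19, 15]

# "Less than" type: running total from the lowest class upwards.
less_than = list(accumulate(frequencies))

# "More than" type: running total from the highest class downwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)   # [16, 34, 54, 76, 116, 161, 196, 216, 235, 250]
print(more_than)   # [250, 234, 216, 196, 174, 134, 89, 54, 34, 15]
```

The first list reproduces the c.f. column of Table 5.3; the last entry of the first list and the first entry of the second both equal the total frequency N = 250.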

5.3 GRAPHICAL REPRESENTATION OF DATA


To observe the data at a glance, it is exhibited by the following graphical methods :
1. Histogram : A histogram is drawn by constructing rectangles over the class intervals, such that the areas of the rectangles
are proportional to the class frequencies. If the class intervals are of equal width, the heights of the rectangles will be
proportional to the class frequencies themselves; otherwise they would be proportional to the ratios of the frequencies to the
widths of the classes (See Fig. 5.1).
2. Frequency Polygon : Consider the set of points (x, f), where x is the middle value of the class interval and f is the
corresponding frequency. If these points are joined by straight lines, they form a frequency polygon. It is shown by dotted
lines in Fig. 5.1.
Fig. 5.1 : Histogram and frequency polygon (horizontal axis : variable x; vertical axis : frequency)

3. Cumulative Frequency Curve or The Ogive : Taking the upper limits of the classes as x co-ordinates and the corresponding
cumulative frequencies as y co-ordinates, if the points are plotted and then joined by a free hand curve, the curve obtained is
called an ogive (See Fig. 5.2).
Fig. 5.2 : Ogive (horizontal axis : variable x; vertical axis : cumulative frequency)

5.4 LOCATION OF CENTRAL TENDENCY


After collecting the data and arranging it in the proper order in the form of a frequency distribution or grouped frequency
distribution, the next task is to study this data carefully and to draw valid conclusions. If the data collected relates to marks
obtained by the students in a Mathematics paper, it should be able to reveal the general performance of the students. Whether the
class contains a large number of good students or the overall calibre of the students is mediocre must be inferred from the data.
If the numerical data collected relates to an industrial project, the success of the project will depend upon the appropriate
conclusions drawn from the study of this data. The first step in this direction is the location of central tendency, that is,
finding what the data represents by and large. Whether the data is favourable to a particular project or not will depend upon the
criterion that is decided upon, but the overall picture must be exhibited. This overall picture or central tendency of the data
is known by obtaining what we call the Mean or Average. There are various methods to calculate the mean or the average. Depending
upon the project under study, a particular method is selected. The various measures of central tendency are as given below :
(1) Arithmetic mean (2) Geometric mean (3) Harmonic mean
(4) Median (5) Mode.
Among these, the arithmetic mean, geometric mean and harmonic mean are called mathematical averages, and the median and mode
are called positional averages.
Out of these, the Arithmetic mean is of the greatest importance and serves the purpose in many cases. Now, we see how these
measures are calculated.
5.4.1 Arithmetic Mean
Consider the variate x which takes n values x1, x2, x3, ……, xn (a set of n observations). The Arithmetic mean (A.M.), denoted
by $\bar{x}$, is the sum of the observations divided by the number of observations :
\[ \text{A.M.} = \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n} \]
If the data is presented in the form of a frequency distribution
x x1 x2 x3 …… xn
f f1 f2 f3 …… fn

then the sum of the observations is f1 x1 + f2 x2 + … + fn xn and the arithmetic mean $\bar{x}$ is given by
\[ \text{A.M.} = \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{N} \]
where, N = ∑ f = f1 + f2 + …… + fn is the total frequency.

ILLUSTRATIONS
Ex. 1 : Find the Arithmetic mean for the following distribution :
x 0 1 2 3 4 5 6 7 8 9 10
f 4 5 12 12 13 16 15 13 12 5 6

Sol. : Writing the tabulated values as :


x f x × f
0 4 0
1 5 5
2 12 24
3 12 36
4 13 52
5 16 80
6 15 90
7 13 91
8 12 96
9 5 45
10 6 60
Total ∑f = 113 ∑ fx = 579
\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{579}{113} = 5.12 \ \text{(approximately)} \]
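
The calculation of Ex. 1 can be checked with a short Python sketch of the formula $\bar{x} = \sum fx / \sum f$. The function name `arithmetic_mean` is chosen here only for illustration.

```python
def arithmetic_mean(values, frequencies):
    """Arithmetic mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_frequency = sum(frequencies)
    return sum(f * x for x, f in zip(values, frequencies)) / total_frequency


# Data of Ex. 1.
x = list(range(11))                               # 0, 1, ..., 10
f = [4, 5, 12, 12, 13, 16, 15, 13, 12, 5, 6]
mean = arithmetic_mean(x, f)
print(round(mean, 2))                             # 5.12
```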
To reduce the calculations, we consider the variable d = x – A, where A is the middle value, or a value near it, in the range of
the variable x. A is sometimes called the assumed mean.
Now we can write
\[ fd = fx - fA \quad \text{or} \quad \sum fd = \sum fx - \sum fA \]
Dividing by ∑f throughout,
\[ \frac{\sum fd}{\sum f} = \frac{\sum fx}{\sum f} - \frac{\sum fA}{\sum f} = \bar{x} - A\,\frac{\sum f}{\sum f} = \bar{x} - A \]
(A, being constant, is taken outside the ∑ notation), so that
\[ \bar{x} = A + \frac{\sum fd}{\sum f} = A + \bar{d} \]
where $\bar{d}$ is the mean of the variable d.
The numbers fd and ∑ fd are smaller than fx and ∑ fx, which reduces the calculations.
For grouped frequency distributions, a further reduction in the calculations can be achieved by taking
\[ u = \frac{x - A}{h} \quad \text{or} \quad u = \frac{d}{h} \]
that gives hu = x – A.
Then proceeding as before, we get
\[ h\,\frac{\sum fu}{\sum f} = \bar{x} - A \quad \text{or} \quad \bar{x} = A + h\,\frac{\sum fu}{\sum f} \]
This formula is mostly used for grouped frequency distributions, where h is chosen to be equal to the width of the class
interval.
Ex. 2 : Calculate arithmetic mean for the following frequency distribution :
Observations (x) 103 110 112 118 95
Frequency (f) 4 6 10 12 3
Sol. : We solve the problem by the direct method.


x f fx
103 4 103 × 4 = 412
110 6 110 × 6 = 660
112 10 112 × 10 = 1120
118 12 118 × 12 = 1416
95 3 95 × 3 = 285
Total N = 35 ∑ fx = 3893
\[ \therefore \ \bar{x} = \frac{\sum fx}{\sum f} = \frac{3893}{35} = 111.2286 \]
Ex. 3 : Marks obtained in a paper of statistics are given in the following table.
Marks Obtained No. of Students

0 – 10 8

10 – 20 20

20 – 30 14

30 – 40 16

40 – 50 20

50 – 60 25

60 – 70 13

70 – 80 10

80 – 90 5

90 – 100 2

Find the Arithmetic mean of the distribution.


Sol. : Preparing the table with A = 45, h = 10 and u = (x – 45)/10 :
C.I. Mid-value x f u = (x – 45)/10 f × u
0 – 10 5 8 –4 –32
10 – 20 15 20 –3 –60
20 – 30 25 14 –2 –28
30 – 40 35 16 –1 –16
40 – 50 45 20 0 0
50 – 60 55 25 1 25
60 – 70 65 13 2 26
70 – 80 75 10 3 30
80 – 90 85 5 4 20
90 – 100 95 2 5 10
Total – ∑ f = 133 – ∑ fu = –25
\[ \bar{x} = A + h\,\frac{\sum fu}{\sum f} = 45 + 10\left(\frac{-25}{133}\right) = 45 - \frac{250}{133} = 43.12 \]
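
The step-deviation formula $\bar{x} = A + h \sum fu / \sum f$ can be checked against the direct formula with the data of Ex. 3. The following Python sketch is illustrative only; the function name `mean_step_deviation` is an assumption of this example.

```python
def mean_step_deviation(mid_values, frequencies, assumed_mean, width):
    """Mean via u = (x - A) / h, i.e. x_bar = A + h * sum(f * u) / sum(f)."""
    u = [(x - assumed_mean) / width for x in mid_values]
    sum_fu = sum(f * ui for f, ui in zip(frequencies, u))
    return assumed_mean + width * sum_fu / sum(frequencies)


# Data of Ex. 3: class mid-values and frequencies, with A = 45 and h = 10.
mid_values = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
frequencies = [8, 20, 14, 16, 20, 25, 13, 10, 5, 2]

step_mean = mean_step_deviation(mid_values, frequencies, 45, 10)
direct_mean = sum(f * x for x, f in zip(mid_values, frequencies)) / sum(frequencies)

print(round(step_mean, 2))     # 43.12
print(round(direct_mean, 2))   # 43.12
```

Both routes give the same mean; the step-deviation form merely keeps the intermediate products small, which matters for hand calculation.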

Combined Arithmetic Mean (Mean of composite series)


Consider two sets of data
1. $x_1, x_2, \dots, x_{n_1}$ containing n1 items
2. $y_1, y_2, \dots, y_{n_2}$ containing n2 items

The mean $\bar{x}$ of the first set is given by
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_{n_1}}{n_1} \qquad \therefore \ n_1 \bar{x} = x_1 + x_2 + \dots + x_{n_1} \]
and the mean $\bar{y}$ of the second set is given by
\[ \bar{y} = \frac{y_1 + y_2 + y_3 + \dots + y_{n_2}}{n_2} \qquad \therefore \ n_2 \bar{y} = y_1 + y_2 + \dots + y_{n_2} \]
Hence, by definition, the joint arithmetic mean $\bar{z}$ is given by
\[ \bar{z} = \frac{(x_1 + x_2 + x_3 + \dots + x_{n_1}) + (y_1 + y_2 + y_3 + \dots + y_{n_2})}{n_1 + n_2} \]
\[ \therefore \ \bar{z} = \frac{n_1 \bar{x} + n_2 \bar{y}}{n_1 + n_2} \qquad \dots \text{(A)} \]
The above result can be generalized to k (k ≥ 2) groups.

(A) gives the combined Arithmetic Mean (A.M.) of the composite series.

The same type of formula holds good for sets of data presented in frequency distribution form. Consider two sets of data :
      Set 1              Set 2
   x       f          y       f
   x1      f1         y1      f1
   x2      f2         y2      f2
   x3      f3         y3      f3
   …       …          …       …
   xn      fn         yn      fn
   ∑f = N1            ∑f = N2

The means $\bar{x}$, $\bar{y}$ of the two sets are given by
\[ \bar{x} = \frac{\sum fx}{N_1} \quad \therefore \ N_1 \bar{x} = \sum fx \]
\[ \bar{y} = \frac{\sum fy}{N_2} \quad \therefore \ N_2 \bar{y} = \sum fy \]
Hence, the combined mean $\bar{z}$ is given by
\[ \bar{z} = \frac{\sum fx + \sum fy}{N_1 + N_2} = \frac{N_1 \bar{x} + N_2 \bar{y}}{N_1 + N_2} \qquad \dots \text{(A)} \]

Ex. 4 : Marks obtained in paper of Applied Mechanics by a group of Computer and Electronics students are as given in following
tables :
Group (A) of Computer students :
Marks Obtained No. of Students
0 – 10 5
10 – 20 6
20 – 30 15
30 – 40 15
40 – 50 9
∑f = 50
Group (B) of Electronics students :
Marks Obtained No. of Students
0 – 10 8
10 – 20 15
20 – 30 18
30 – 40 13
40 – 50 6
∑f = 60
Find the Combined mean of the two groups.
Sol. : For group (A) :
C.I. Mid-value x f f × x
0 – 10 5 5 25
10 – 20 15 6 90
20 – 30 25 15 375
30 – 40 35 15 525
40 – 50 45 9 405
Total – N1 = ∑ f = 50 ∑ fx = 1420
For group (B) :
C.I. Mid-value x f f × x
0 – 10 5 8 40
10 – 20 15 15 225
20 – 30 25 18 450
30 – 40 35 13 455
40 – 50 45 6 270
Total – N2 = ∑ f = 60 ∑ fy = 1440

The mean $\bar{x}$ of group (A) is given by
\[ \bar{x} = \frac{\sum fx}{\sum f} \ \Rightarrow \ \bar{x} \sum f = N_1 \bar{x} = \sum fx = 1420 \]
The mean $\bar{y}$ of group (B) is given by
\[ \bar{y} = \frac{\sum fy}{\sum f} \ \Rightarrow \ \bar{y} \sum f = N_2 \bar{y} = \sum fy = 1440 \]
The combined mean $\bar{z}$ is given by
\[ \bar{z} = \frac{N_1 \bar{x} + N_2 \bar{y}}{N_1 + N_2} = \frac{1420 + 1440}{50 + 60} = \frac{2860}{110} = 26 \]

Ex. 5 : The arithmetic mean of the weights of 100 boys is 50 kg and the arithmetic mean of the weights of 50 girls is 45 kg.
Calculate the arithmetic mean of the combined group of boys and girls.
Sol. : Let $\bar{X}$ and N1 be the mean and size of the group of boys and $\bar{Y}$ and N2 be the mean and size of the group
of girls. So N1 = 100, $\bar{X}$ = 50, N2 = 50, $\bar{Y}$ = 45. Hence, the combined mean is
\[ \bar{Z} = \frac{N_1 \bar{X} + N_2 \bar{Y}}{N_1 + N_2} = \frac{(100 \times 50) + (50 \times 45)}{100 + 50} = \frac{7250}{150} = 48.3333 \]
Ex. 6 : The mean weekly salary paid to 300 employees of a firm is ₹ 1,470. There are 200 male employees and the remaining
are females. If the mean salary of males is ₹ 1,505, obtain the mean salary of females.
Sol. : Suppose $\bar{X}$ and N1 are the mean and size of the group of males, $\bar{Y}$ and N2 are the mean and size of the
group of females, and $\bar{Z}$ is the mean of all the employees considered together. Here N1 = 200 and N2 = 300 – 200 = 100.
\[ \text{Now,} \quad \bar{Z} = \frac{N_1 \bar{X} + N_2 \bar{Y}}{N_1 + N_2} \]
\[ \therefore \ 1470 = \frac{(200 \times 1505) + 100\,\bar{Y}}{200 + 100} = \frac{301000 + 100\,\bar{Y}}{300} \]
\[ \therefore \ 441000 = 301000 + 100\,\bar{Y} \]
\[ \therefore \ 4410 = 3010 + \bar{Y} \]
\[ \therefore \ \bar{Y} = 1400, \ \text{i.e. the mean salary of females is ₹ 1,400} \]
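
Formula (A) for the combined mean, in its form for k groups, can be sketched as follows. The function name `combined_mean` is chosen here for illustration; the figures are those of Ex. 4, Ex. 5 and Ex. 6.

```python
def combined_mean(sizes, means):
    """Combined mean of k groups: sum(N_i * mean_i) / sum(N_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Ex. 4: group sizes 50 and 60, with group totals 1420 and 1440.
ex4 = combined_mean([50, 60], [1420 / 50, 1440 / 60])

# Ex. 5: 100 boys with mean 50 kg and 50 girls with mean 45 kg.
ex5 = combined_mean([100, 50], [50, 45])

# Ex. 6: formula (A) solved for the unknown mean of the second group.
female_mean = (300 * 1470 - 200 * 1505) / 100

print(round(ex4, 4), round(ex5, 4), female_mean)   # 26.0 48.3333 1400.0
```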
5.4.2 Geometric Mean
The geometric mean of a set of n observations x1, x2, …, xn is the nth root of their product.
Thus the Geometric Mean (G.M.) is given by
\[ \text{G.M.} = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} \]
In case of a frequency distribution
x x1 x2 x3 ……… xn
f f1 f2 f3 ……… fn
\[ \text{G.M.} = G = \left( x_1^{f_1} \cdot x_2^{f_2} \cdot x_3^{f_3} \cdots x_n^{f_n} \right)^{1/N}, \quad \text{where } N = \sum f \]
To calculate it, denoting the G.M. by G and taking logarithms of both sides,
\[ \log G = \frac{1}{N} \log \left( x_1^{f_1} \cdot x_2^{f_2} \cdot x_3^{f_3} \cdots x_n^{f_n} \right) = \frac{1}{N} \left[ f_1 \log x_1 + f_2 \log x_2 + \dots + f_n \log x_n \right] = \frac{1}{N} \sum f \log x \]
\[ \text{or} \quad G = \text{antilog} \left( \frac{1}{N} \sum f \log x \right) \]
It is seen that the logarithm of G is the arithmetic mean of the logarithms of the given values. In case of a grouped frequency
distribution, x is taken as the mid-value of the class interval.
For two sets of observations $(x_1, x_2, \dots, x_{n_1})$, $(y_1, y_2, \dots, y_{n_2})$ with geometric means G1, G2, it can be
established that
\[ \log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2} \]
where, G is the combined or common geometric mean of the two series.
It may be noted here that if one of the observations is zero, the geometric mean becomes zero, and if some of the observations
are negative, the geometric mean may become imaginary. Naturally, calculation of the geometric mean becomes meaningless in such
cases.
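
The logarithmic form $\log G = \frac{1}{N} \sum f \log x$ translates directly into code. The sketch below uses natural logarithms from Python's standard `math` module in place of log tables (the base does not affect the result); the function name and the sample values 2, 4, 8 are hypothetical, chosen only for illustration.

```python
import math


def geometric_mean(values, frequencies=None):
    """G = antilog((1/N) * sum(f * log x)), defined for positive values only."""
    if frequencies is None:
        frequencies = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive observations")
    total_frequency = sum(frequencies)
    mean_log = sum(f * math.log(x) for x, f in zip(values, frequencies)) / total_frequency
    return math.exp(mean_log)


# Hypothetical observations: the cube root of 2 * 4 * 8 = 64 is 4.
g = geometric_mean([2, 4, 8])
print(round(g, 6))   # 4.0
```

Rejecting zero or negative observations mirrors the remark above that the geometric mean is not meaningful in those cases.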

5.4.3 Harmonic Mean

The Harmonic mean (H.M.) of a set of observations (x1, x2, ……, xn) is the reciprocal of the arithmetic mean of the reciprocals
of the given values. Thus, H.M. or H is given by
\[ \text{H.M.} = H = \frac{1}{\dfrac{1}{n} \left( \dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \dots + \dfrac{1}{x_n} \right)} \]
In case of a frequency distribution (x, f),
\[ H = \frac{1}{\dfrac{1}{N} \sum \dfrac{f}{x}} \]
where, N = ∑ f
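
A minimal Python sketch of the harmonic mean follows. The function name and the sample values 1, 2, 4 are hypothetical and serve only to illustrate the formula $H = N / \sum (f/x)$, which is the expression above with the fraction cleared.

```python
def harmonic_mean(values, frequencies=None):
    """H = N / sum(f / x): reciprocal of the mean of the reciprocals."""
    if frequencies is None:
        frequencies = [1] * len(values)
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when an observation is zero")
    total_frequency = sum(frequencies)
    return total_frequency / sum(f / x for x, f in zip(values, frequencies))


# Hypothetical observations: 3 / (1 + 1/2 + 1/4) = 12/7.
h = harmonic_mean([1, 2, 4])
print(round(h, 4))   # 1.7143
```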
5.4.4 Median

The median of a distribution is the value of the variable (or variate) which divides it into two equal parts. It is the value
such that the number of observations above it is equal to the number of observations below it. Sometimes, the median is called
a positional average.
In case of ungrouped data, if the number of observations n is odd, then the median is the middle value, which is the
$\left( \frac{n+1}{2} \right)^{\text{th}}$ observation after the observations are arranged in ascending or descending order. For
an even number of observations, it is the arithmetic mean of the two middle terms, given by
\[ \text{Median} = \frac{1}{2} \left[ \text{value of the } \left( \frac{n}{2} \right)^{\text{th}} \text{ observation} + \text{value of the } \left( \frac{n}{2} + 1 \right)^{\text{th}} \text{ observation} \right] \]
For data presented in the form of a frequency distribution :

x x1 x2 x3 ……… xn

f f1 f2 f3 ……… fn

where, ∑f = N
we prepare the cumulative frequency column. Then we consider the cumulative frequency (c.f.) equal to or just greater than
N/2; the corresponding value of x is the median.

ILLUSTRATIONS

Ex. 7 : For the ordered arrangement of n = 8 observations x = 1, 5, 9, 11, 21, 24, 27, 30, the middle terms are 11 and 21
and the median = (11 + 21)/2 = 16. For the ordered arrangement of n = 7 observations x = 35, 35, 36, 37, 38, 39, 40, the
middle term is 37 and the median = 37 (4th observation).

Ex. 8 : Obtain the median of the distribution :

x 1 3 5 7 9 11 13 15 17
f 3 6 8 12 16 16 15 10 5

Sol. : Preparing the table as :


x f c.f.
1 3 3
3 6 9
5 8 17
7 12 29
9 16 45
11 16 61
13 15 76
15 10 86
17 5 91
Total ∑f = 91 –
Here the total frequency N = 91; N/2 = 45.5.

The value of c.f. just greater than 45.5 is 61, the corresponding value of x is 11 and thus, the median is 11.
In case of a grouped frequency distribution, the class corresponding to the c.f. just greater than N/2 is called the median
class and the value of the median is obtained by the formula :
\[ \text{Median} = l + \frac{h}{f} \left( \frac{N}{2} - c \right) \]
where, l is the lower limit of the median class.
f is the frequency of the median class.
h is the width of the median class.
c is the c.f. of the class preceding the median class.

Ex. 9 : Wages earned in Rupees per day by the labourers are given by the table :
Wages in ₹ 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of Labourers 5 8 13 10 8
Find the median of the distribution.
Sol. :
Wages in ₹ (C.I.) No. of Labourers (f) c.f.
10 – 20 5 5
20 – 30 8 13
30 – 40 13 26
40 – 50 10 36
50 – 60 8 44
Total ∑f = N = 44 –
Here N/2 = 44/2 = 22.
The cumulative frequency (c.f.) just greater than 22 is 26 and the corresponding class is 30 – 40.
Using the formula to calculate the median, with l = 30, f = 13, h = 10, N/2 = 22, c = 13,
\[ \text{Median} = 30 + \frac{10}{13} (22 - 13) = 30 + \frac{10}{13} (9) = 30 + \frac{90}{13} = 36.923 \]
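
The grouped median formula $l + \frac{h}{f} \left( \frac{N}{2} - c \right)$ can be sketched in Python as below and checked on the data of Ex. 9. The function name `grouped_median` is an assumption of this example, and the classes are assumed to have equal width.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of a grouped distribution: l + (h / f) * (N / 2 - c)."""
    half = sum(frequencies) / 2
    cumulative = 0                      # c.f. of the classes already passed
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f > half:       # first class whose c.f. exceeds N / 2
            return lower + (width / f) * (half - cumulative)
        cumulative += f
    raise ValueError("the distribution has no observations")


# Data of Ex. 9: classes 10 - 20, ..., 50 - 60 with their frequencies.
median = grouped_median([10, 20, 30, 40, 50], 10, [5, 8, 13, 10, 8])
print(round(median, 3))   # 36.923
```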
5.4.5 Mode
It is the value of the variate which occurs most frequently in a set of observations, that is, the value of the variate
corresponding to the maximum frequency.
We note that the general nature of the frequency curve is bell shaped in the majority of situations. Thus, initially the
frequency is small, it increases and reaches a maximum, and then it declines. The value on the x-axis at which the maximum or
peak of the frequency curve appears is the mode.
For the data 35, 38, 40, 39, 35, 36, 37, it can be clearly seen that the observation 35 has the maximum frequency (it occurs
twice); hence it is the mode.
For the following frequency distribution
x 10 11 12 13 14 15
f 2 5 10 21 12 13
the maximum frequency 21 is associated with the observation 13, so the mode is 13.
In case of a grouped frequency distribution, the Mode is given by the formula :
\[ \text{Mode} = l + h \times \frac{(f_1 - f_0)}{(f_1 - f_0) - (f_2 - f_1)} = l + h \times \frac{(f_1 - f_0)}{2 f_1 - f_0 - f_2} \]
Here, l is the lower limit of the modal class
h is the width of the modal class
f1 is the frequency of the modal class
f0 is the frequency of the class preceding the modal class
f2 is the frequency of the class succeeding the modal class.

ILLUSTRATION

Ex. 10 : Find the Mode for the following distribution :


C.I. 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
f 4 7 8 12 25 18 10
Sol. : Here the C.I. 40 – 50, for which f = 25 is the maximum frequency, is the modal class.
l = 40, h = 10, f1 = 25, f0 = 12, f2 = 18
\[ \text{Mode} = 40 + \frac{10\,(25 - 12)}{(2 \times 25 - 12 - 18)} = 40 + \frac{130}{20} = 40 + 6.5 = 46.5 \]
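
The grouped mode formula can be sketched in Python and checked on Ex. 10. The function name `grouped_mode` is chosen for illustration. Treating a missing neighbouring class as having frequency 0 when the modal class is the first or last one is an assumption of this sketch; the text does not discuss that case.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Mode of a grouped distribution: l + h * (f1 - f0) / (2 * f1 - f0 - f2)."""
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])  # modal class
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0                      # assumed 0 at the start
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0   # assumed 0 at the end
    return lower_limits[i] + width * (f1 - f0) / (2 * f1 - f0 - f2)


# Data of Ex. 10: classes 0 - 10, ..., 60 - 70.
mode = grouped_mode([0, 10, 20, 30, 40, 50, 60], 10, [4, 7, 8, 12, 25, 18, 10])
print(mode)   # 46.5
```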
So far we have considered various ways in which an average can be calculated. It is clear that no single average is suitable
for all types of data. The Arithmetic mean, Geometric mean and Harmonic mean are rigidly defined and are based on all the
observations; they are suitable for further mathematical treatment. They are not much affected by fluctuations of sampling. In
fact, among all the averages, the Arithmetic mean is least affected by fluctuations. The Geometric mean becomes zero if any one
of the observations is zero. The Geometric and Harmonic means are not easy to understand and are difficult to compute. They give
greater importance to small items and are useful when small items have to be given a very high weightage. The Median and Mode
are not amenable to algebraic treatment. Their main advantage is that they are not affected by extreme values, but compared to
the Arithmetic mean they are affected much more by fluctuations of sampling. All the averages have merits and demerits, but the
Arithmetic mean, because of its simplicity and its stability, is much more familiar to a layman. It has wide applications in
statistical theory and is considered the best among all the averages.

5.5 MEASURES OF DISPERSION


After calculating the average by any of the five methods discussed in the previous section, the question arises whether the
average calculated gives correct information about the central tendency of the data, the purpose for which it is calculated. The
main point to be discussed is whether the average is a true representative of the data or not. As an illustration, consider the
two sets of observations.
(i) 5, 10, 15, 20, 25.
(ii) 13, 14, 15, 16, 17.
The Arithmetic mean of both these sets is 15. It is obvious that 15 is a better average for the second set than for the first,
because the observations in the second set are much closer to the value 15 as compared to the first set. In the second set, the
values of the variate are much less scattered or dispersed from the mean as compared to the first. We note that the average
remains a good representative if the dispersion is small (i.e. the observations are close to it). Thus, dispersion decides the
reliability of the average.
There are two widely accepted ways of measuring the degree of scatter from the mean. These are :
(i) Mean deviation
(ii) Standard deviation.
These are the measures of dispersion which decide whether the average truly represents the given data or not. Besides
these two standard measures, there are other measures of dispersion such as the range and the quartile deviation or
semi-interquartile range. But these are of less consequence, since the range uses only the two extreme items and the quartile
deviation depends upon only two positional values.
We shall now discuss the two measures of dispersion mentioned earlier.
(i) Mean Deviation : The arithmetic mean of the absolute deviations from any average is called the mean deviation about
that average. For the variate x which takes n values x1, x2, …, xn, the mean deviation from the average A (usually the
Arithmetic mean $\bar{x}$, or at most the median or mode) is given by
\[ \text{Mean deviation} = \frac{1}{n} \sum |x - A| \]
For a frequency distribution (x, f), i.e. a variate which takes n values x1, x2, …, xn with corresponding frequencies f1, f2, …, fn,
\[ \text{Mean deviation} = \frac{1}{N} \sum f\,|x - A| \]
where, N = ∑ f is the total frequency and |x – A| represents the modulus or absolute value of the deviation (x – A), ignoring
the negative sign. It can be broadly stated that when the mean deviation is a small number, the average is a good representative
of the data.

ILLUSTRATION
Ex. 1 : Calculate Arithmetic mean and Mean deviation of the following frequency distribution :
x 1 2 3 4 5 6
f 3 4 8 6 4 2
Sol. : Preparing the table :
x f x×f x–A |x – A | f×|x–A|
1 3 3 – 2.37 2.37 7.11
2 4 8 – 1.37 1.37 5.48
3 8 24 – 0.37 0.37 2.96
4 6 24 0.63 0.63 3.78
5 4 20 1.63 1.63 6.52
6 2 12 2.63 2.63 5.26
Total ∑f = 27 ∑ fx = 91 – – ∑ f × |x – A| = 31.11
\[ \text{A.M.} = A = \bar{x} = \frac{\sum fx}{\sum f} = \frac{91}{27} = 3.37 \ \text{(approximately)} \]
\[ \text{Mean deviation} = \frac{\sum f \times |x - A|}{\sum f} = \frac{31.11}{27} = 1.152 \ \text{(approximately)} \]
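
The mean deviation of the distribution in this example can be reproduced with a short Python sketch. The function name `mean_deviation` and its optional argument `about` are chosen here for illustration; when `about` is omitted the deviations are taken from the arithmetic mean.

```python
def mean_deviation(values, frequencies, about=None):
    """(1/N) * sum(f * |x - A|); A defaults to the arithmetic mean."""
    total_frequency = sum(frequencies)
    if about is None:
        about = sum(f * x for x, f in zip(values, frequencies)) / total_frequency
    return sum(f * abs(x - about) for x, f in zip(values, frequencies)) / total_frequency


# Data of the example: x = 1, ..., 6 with frequencies f.
x = [1, 2, 3, 4, 5, 6]
f = [3, 4, 8, 6, 4, 2]
md = mean_deviation(x, f)
print(round(md, 3))   # 1.152
```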

(ii) Standard Deviation : Standard deviation is defined as the positive square root of the arithmetic mean of the squares of
the deviations of the given values from their arithmetic mean. It is denoted by the symbol σ.
For the variate x which takes n values x1, x2, …, xn,
\[ \text{S.D.} = \sigma = \sqrt{\frac{1}{n} \sum (x - \bar{x})^2} \]
For a frequency distribution (x, f), i.e. for a variate x which takes n values x1, x2, …, xn with corresponding frequencies
f1, f2, …, fn,
\[ \text{S.D.} = \sigma = \sqrt{\frac{1}{N} \sum f (x - \bar{x})^2} \]
where, $\bar{x}$ is the A.M. of the distribution and N = ∑ f.
(iii) Variance : The square of the standard deviation is called the variance, denoted by Var (x).
For the variate x which takes the values x1, x2, …, xn,
\[ \text{Variance} = \text{Var}(x) = \sigma^2 = \frac{1}{n} \sum (x - \bar{x})^2 \]
For a frequency distribution (x, f), i.e. for a variate x which takes n values x1, x2, …, xn with corresponding frequencies
f1, f2, …, fn,
\[ \text{Variance} = \text{Var}(x) = \sigma^2 = \frac{1}{N} \sum f (x - \bar{x})^2 \]
The step of squaring the deviations $(x - \bar{x})$ overcomes the drawback of ignoring the signs in the Mean deviation. Standard
deviation is also suitable for further mathematical treatment. Moreover, among all the measures of dispersion, standard deviation
is the least affected by fluctuations of sampling; hence it is considered the most reliable measure of dispersion.
(iv) Root mean square deviation is given by
\[ \text{R.M.S.} = S = \sqrt{\frac{1}{N} \sum f (x - A)^2} \]
where, A is any arbitrary number (reference number).
Also, the square of the root mean square deviation is called the Mean square deviation, denoted by S2.
\[ \text{M.S.D.} = S^2 = \frac{1}{N} \sum f (x - A)^2 \]
When A = $\bar{x}$, the Arithmetic mean, the Root mean square deviation becomes equal to the standard deviation.
(v) For computation purpose the above formulae can be simplified as follows :
Method of Calculating σ :
For variate x which takes n values x1, x2, …, xn
1 −
σ2 = ∑ (x − x )2
n
1 − −
= ∑ (x2 − 2x x + x 2)
n
1 − −
= ∑ x2 − 2 x 2 + x 2
n
2

σ2 =
1
∑ x2 −
1 ∑ x … (1)
n 

n

For frequency distribution (x, f),


2
1
σ2 =
N
∑f (x – –x)
–2
=
1
N
∑f (x2 – 2x –x + x )
– 2 ∑f
1 2x
= ∑ f x2 – ∑ fx + –x .
N N N

=
1 – 2
∑ f x2 – 2 x + x
– 2
‡ ∑ fx – ∑f
= x, = 1
N  N N 
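$$\begin{aligned}
\sigma^2 &= \frac{1}{N} \sum f x^2 - \bar{x}^2 \\
\text{i.e.}\quad \sigma^2 &= \frac{1}{N} \sum f x^2 - \left(\frac{1}{N} \sum f x\right)^2 \qquad \ldots (2)
\end{aligned}$$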
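Usually, the product terms $fx$ and $fx^2$ are large; hence, to reduce the volume of calculations further, we proceed as follows :
$$\begin{aligned}
\sigma^2 &= \frac{1}{N} \sum f (x - \bar{x})^2 \\
&= \frac{1}{N} \sum f (x - A + A - \bar{x})^2, \quad \text{(where $A$ is an arbitrary number)} \\
&= \frac{1}{N} \sum f \left[(x - A)^2 + 2 (x - A)(A - \bar{x}) + (A - \bar{x})^2\right] \\
&= \frac{1}{N} \sum f (x - A)^2 + \frac{2}{N} (A - \bar{x}) \sum f (x - A) + (A - \bar{x})^2 \, \frac{\sum f}{N}
\end{aligned}$$
Let $d = x - A$; then, using $\bar{x} = A + \frac{1}{N} \sum f d$,
$$\begin{aligned}
\sigma^2 &= \frac{1}{N} \sum f d^2 + 2 \left(A - A - \frac{1}{N} \sum f d\right) \frac{1}{N} \sum f d + \left(A - A - \frac{1}{N} \sum f d\right)^2 \cdot 1 \\
&= \frac{1}{N} \sum f d^2 - \frac{2}{N^2} \left(\sum f d\right)^2 + \frac{1}{N^2} \left(\sum f d\right)^2 \\
&= \frac{1}{N} \sum f d^2 - \frac{1}{N^2} \left(\sum f d\right)^2
\end{aligned}$$
$$\sigma^2 = \frac{1}{N} \sum f d^2 - \left(\frac{\sum f d}{N}\right)^2 \qquad \ldots (3)$$
or
$$\sigma = \sqrt{\frac{1}{N} \sum f d^2 - \left(\frac{\sum f d}{N}\right)^2} \qquad \ldots (4)$$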
Terms $fd$, $fd^2$ are numerically smaller as compared to $fx$, $fx^2$, and use of formula (4) reduces the calculations considerably in
obtaining $\sigma$.
In dealing with data presented in grouped frequency distribution form and to reduce the calculations further, we put
$$u = \frac{x - A}{h},$$
where $h$ is generally taken as the width of the class interval.
Thus $u = \dfrac{d}{h}$ or $d = hu$. Putting $d = hu$ in formula (3),
$$\sigma^2 = \frac{1}{N} \sum f h^2 u^2 - \left(\frac{\sum f h u}{N}\right)^2$$
$$\sigma = h \sqrt{\frac{1}{N} \sum f u^2 - \left(\frac{\sum f u}{N}\right)^2} \qquad \ldots (5)$$
Formula (5) is quite useful for data presented in grouped frequency distribution form.
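As a quick check that formulas (2), (4) and (5) are the same quantity computed with smaller numbers, here is a short Python sketch. The grouped data, the choice $A = 25$, $h = 10$, and the function names are assumptions made for illustration only; they are not taken from the text.

```python
import math

def sigma_direct(xs, fs):
    """Formula (2): sqrt((1/N) * sum(f x^2) - ((1/N) * sum(f x))^2)."""
    N = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return math.sqrt(sum_fx2 / N - (sum_fx / N) ** 2)

def sigma_assumed_mean(xs, fs, A):
    """Formula (4): the same expression with d = x - A in place of x."""
    ds = [x - A for x in xs]
    return sigma_direct(ds, fs)

def sigma_step_deviation(xs, fs, A, h):
    """Formula (5): h times the same expression with u = (x - A) / h."""
    us = [(x - A) / h for x in xs]
    return h * sigma_direct(us, fs)

# Hypothetical class mid-values and frequencies (illustration only).
xs = [5, 15, 25, 35, 45]
fs = [2, 5, 8, 4, 1]

s2 = sigma_direct(xs, fs)
s4 = sigma_assumed_mean(xs, fs, A=25)
s5 = sigma_step_deviation(xs, fs, A=25, h=10)
print(s2, s4, s5)  # the three values agree up to floating-point rounding
```

The three functions reuse one expression on purpose: shifting by $A$ leaves $\sigma$ unchanged, and dividing by $h$ scales it by $1/h$, which is why formula (5) multiplies by $h$ at the end.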
(vi) Relation Between σ and S : By definition, we have
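$$\begin{aligned}
S^2 &= \frac{1}{N} \sum f (x - A)^2 \\
&= \frac{1}{N} \sum f (x - \bar{x} + \bar{x} - A)^2 \\
&= \frac{1}{N} \sum f \left[(x - \bar{x})^2 + 2 (x - \bar{x})(\bar{x} - A) + (\bar{x} - A)^2\right] \\
&= \frac{1}{N} \sum f (x - \bar{x})^2 + 2 (\bar{x} - A) \, \frac{1}{N} \sum f (x - \bar{x}) + (\bar{x} - A)^2 \, \frac{\sum f}{N}
\end{aligned}$$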
Note that $(\bar{x} - A)$, being constant, is taken outside the summation.
Now, since
$$\frac{1}{N} \sum f (x - \bar{x}) = \frac{1}{N} \sum f x - \bar{x} \cdot \frac{1}{N} \sum f = \bar{x} - \bar{x} = 0,$$
$$\therefore\quad S^2 = \frac{1}{N} \sum f (x - \bar{x})^2 + (\bar{x} - A)^2, \quad \text{as } \sum f = N.$$
Thus,
$$S^2 = \sigma^2 + d^2, \quad \text{where } d = \bar{x} - A \qquad \ldots (6)$$
If $\bar{x} = A$, then $S^2$ is least, as $d = 0$. Thus the Mean square deviation ($S^2$), and consequently the Root mean square deviation ($S$), are least when deviations are taken from $A = \bar{x}$.

(vii) Coefficient of Variation : The relative measure of standard deviation is called coefficient of variation and is given by

$$\text{C.V.} = \frac{\sigma}{\text{A.M.}} \times 100 \qquad \ldots (7)$$

For comparing the variability of two series, we calculate coefficient of variation for each series.

ILLUSTRATION

Ex. 1 : Runs scored in 10 matches of current IPL season by two batsmen A and B are tabulated as under
Batsman A 46 34 52 78 65 81 26 46 19 47
Batsman B 59 25 81 47 73 78 42 35 42 10

Decide who is the better batsman and who is more consistent.

Sol.: For Batsman A :


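| $x$ | $d = x - 46$ | $d^2$ |
|---|---|---|
| 46 | 0 | 0 |
| 34 | -12 | 144 |
| 52 | 6 | 36 |
| 78 | 32 | 1024 |
| 65 | 19 | 361 |
| 81 | 35 | 1225 |
| 26 | -20 | 400 |
| 46 | 0 | 0 |
| 19 | -27 | 729 |
| 47 | 1 | 1 |
| Total | $\sum d = 34$ | $\sum d^2 = 3920$ |

$$\bar{x}_A = 46 + \frac{\sum d}{10} = 46 + 3.4 = 49.4$$
$$\sigma_A = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{392 - 11.56} = 19.50 \qquad (N = 10)$$
$$\text{Coefficient of variation for A} = \frac{\sigma_A}{\text{A.M.}} \times 100 = \frac{19.50}{49.4} \times 100 = 39.47$$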
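For Batsman B :

| $x$ | $d = x - 42$ | $d^2$ |
|---|---|---|
| 59 | 17 | 289 |
| 25 | -17 | 289 |
| 81 | 39 | 1521 |
| 47 | 5 | 25 |
| 73 | 31 | 961 |
| 78 | 36 | 1296 |
| 42 | 0 | 0 |
| 35 | -7 | 49 |
| 42 | 0 | 0 |
| 10 | -32 | 1024 |
| Total | $\sum d = 72$ | $\sum d^2 = 5454$ |

$$\bar{x}_B = 42 + \frac{\sum d}{10} = 42 + 7.2 = 49.2$$
$$\sigma_B = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{545.4 - 51.84} = \sqrt{493.56} = 22.22$$
$$\text{Coefficient of variation for B} = \frac{\sigma_B}{\text{A.M.}} \times 100 = \frac{22.22}{49.2} \times 100 = 45.16$$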
Conclusion : The A.M. for A (49.4) is slightly higher than the A.M. for B (49.2), so A is the slightly better batsman. The coefficient
of variation for A (39.47) is less than that of B (45.16).
∴ A is also more consistent.
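The hand computation in Ex. 1 can be reproduced with a few lines of Python. This is a sketch only; the function name `mean_sigma_cv` is a hypothetical helper, while the run data and the assumed means 46 and 42 are those used in the solution. Because the code does not round intermediate values, the last decimal place may differ slightly from the figures in the solution.

```python
import math

def mean_sigma_cv(values, assumed_mean):
    """Mean, S.D. and C.V. of raw values, using deviations d = x - assumed_mean."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    mean = assumed_mean + sum(d) / n
    sigma = math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)
    cv = sigma / mean * 100
    return mean, sigma, cv

runs_a = [46, 34, 52, 78, 65, 81, 26, 46, 19, 47]
runs_b = [59, 25, 81, 47, 73, 78, 42, 35, 42, 10]

mean_a, sigma_a, cv_a = mean_sigma_cv(runs_a, 46)
mean_b, sigma_b, cv_b = mean_sigma_cv(runs_b, 42)

print(f"A: mean = {mean_a:.1f}, sigma = {sigma_a:.2f}, C.V. = {cv_a:.2f}")
print(f"B: mean = {mean_b:.1f}, sigma = {sigma_b:.2f}, C.V. = {cv_b:.2f}")
print("More consistent:", "A" if cv_a < cv_b else "B")
```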
Ex. 2 : Fluctuations in the Aggregate of marks obtained by two groups of students are given below. Find out which of the two
shows greater variability. (Dec. 2012)

Group A 518 519 530 530 544 542 518 550 527 527 531 550 550 529 528

Group B 825 830 830 819 814 814 844 842 842 826 832 835 835 840 840
Sol. : To solve this problem, we have to determine the coefficient of variation $\dfrac{\sigma}{\text{A.M.}} \times 100$ in each case. First we present the data
in frequency distribution form.
For Group A :
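| $x$ | $f$ | $d = x - 530$ | $d^2$ | $fd$ | $fd^2$ |
|---|---|---|---|---|---|
| 518 | 2 | -12 | 144 | -24 | 288 |
| 519 | 1 | -11 | 121 | -11 | 121 |
| 527 | 2 | -3 | 9 | -6 | 18 |
| 528 | 1 | -2 | 4 | -2 | 4 |
| 529 | 1 | -1 | 1 | -1 | 1 |
| 530 | 2 | 0 | 0 | 0 | 0 |
| 531 | 1 | 1 | 1 | 1 | 1 |
| 542 | 1 | 12 | 144 | 12 | 144 |
| 544 | 1 | 14 | 196 | 14 | 196 |
| 550 | 3 | 20 | 400 | 60 | 1200 |
| Total | $\sum f = 15$ | – | – | $\sum fd = 43$ | $\sum fd^2 = 1973$ |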
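$$\text{A.M.} = \bar{x}_A = 530 + \frac{\sum fd}{\sum f} = 530 + \frac{43}{15} = 532.866$$
$$\sigma_A = \sqrt{\frac{1}{N} \sum f d^2 - \left(\frac{\sum f d}{N}\right)^2} = \sqrt{\frac{1973}{15} - \left(\frac{43}{15}\right)^2} = \sqrt{131.533 - 8.218} = 11.105$$
$$\text{Coefficient of variation} = \frac{\sigma_A}{\text{A.M.}} \times 100 = \frac{11.105}{532.866} \times 100 = 2.0840$$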
For Group B :
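| $x$ | $f$ | $d = x - 830$ | $d^2$ | $fd$ | $fd^2$ |
|---|---|---|---|---|---|
| 814 | 2 | -16 | 256 | -32 | 512 |
| 819 | 1 | -11 | 121 | -11 | 121 |
| 825 | 1 | -5 | 25 | -5 | 25 |
| 826 | 1 | -4 | 16 | -4 | 16 |
| 830 | 2 | 0 | 0 | 0 | 0 |
| 832 | 1 | 2 | 4 | 2 | 4 |
| 835 | 2 | 5 | 25 | 10 | 50 |
| 840 | 2 | 10 | 100 | 20 | 200 |
| 842 | 2 | 12 | 144 | 24 | 288 |
| 844 | 1 | 14 | 196 | 14 | 196 |
| Total | $\sum f = 15$ | – | – | $\sum fd = 18$ | $\sum fd^2 = 1412$ |

$$\text{A.M.} = \bar{x}_B = 830 + \frac{18}{15} = 831.2$$
$$\sigma_B = \sqrt{\frac{1412}{15} - \left(\frac{18}{15}\right)^2} = \sqrt{94.133 - 1.44} = 9.628$$
$$\text{Coefficient of variation} = \frac{\sigma_B}{\text{A.M.}} \times 100 = \frac{9.628}{831.2} \times 100 = 1.158$$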
Coefficient of variation of group A is greater than that of group B.
∴ Group A has greater variability, or Group B is more consistent.
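The frequency tables of Ex. 2 can also be built mechanically from the raw marks. The following Python sketch uses `collections.Counter` for the tally; the function name `cv_from_frequencies` is a hypothetical helper, and the assumed means 530 and 830 are the ones chosen in the solution.

```python
import math
from collections import Counter

def cv_from_frequencies(values, assumed_mean):
    """Tally the values into (x, f) pairs, then apply formula (4) with d = x - A."""
    freq = Counter(values)
    N = sum(freq.values())
    sum_fd = sum(f * (x - assumed_mean) for x, f in freq.items())
    sum_fd2 = sum(f * (x - assumed_mean) ** 2 for x, f in freq.items())
    mean = assumed_mean + sum_fd / N
    sigma = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2)
    return sum_fd, sum_fd2, mean, sigma, sigma / mean * 100

group_a = [518, 519, 530, 530, 544, 542, 518, 550, 527, 527, 531, 550, 550, 529, 528]
group_b = [825, 830, 830, 819, 814, 814, 844, 842, 842, 826, 832, 835, 835, 840, 840]

fd_a, fd2_a, mean_a, sigma_a, cv_a = cv_from_frequencies(group_a, 530)
fd_b, fd2_b, mean_b, sigma_b, cv_b = cv_from_frequencies(group_b, 830)

print(f"A: sum fd = {fd_a}, sum fd^2 = {fd2_a}, sigma = {sigma_a:.3f}, C.V. = {cv_a:.3f}")
print(f"B: sum fd = {fd_b}, sum fd^2 = {fd2_b}, sigma = {sigma_b:.3f}, C.V. = {cv_b:.3f}")
```

The totals $\sum fd$ and $\sum fd^2$ printed here match the "Total" rows of the two tables, which is a convenient way to catch tallying mistakes before computing $\sigma$.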
Ex. 3 : Calculate the standard deviation for the following frequency distribution. Decide whether the A.M. is a good average.
Wages in Rupees earned per day 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of labourers 5 9 15 12 10 3
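Sol. : Preparing the table for the purpose of calculations.

| Wages Earned C.I. | Mid-value $x$ | Frequency $f$ | $u = \dfrac{x - 25}{10}$ | $fu$ | $fu^2$ |
|---|---|---|---|---|---|
| 0 – 10 | 5 | 5 | -2 | -10 | 20 |
| 10 – 20 | 15 | 9 | -1 | -9 | 9 |
| 20 – 30 | 25 | 15 | 0 | 0 | 0 |
| 30 – 40 | 35 | 12 | 1 | 12 | 12 |
| 40 – 50 | 45 | 10 | 2 | 20 | 40 |
| 50 – 60 | 55 | 3 | 3 | 9 | 27 |
| Total | – | $\sum f = 54$ | – | $\sum fu = 22$ | $\sum fu^2 = 108$ |

Using formula (5),
$$\sigma = 10 \sqrt{\frac{1}{54} \times 108 - \left(\frac{22}{54}\right)^2} = 10 \sqrt{2 - 0.166} = 13.54 \text{ approximately}$$
In this problem,
$$\text{A.M.} = 25 + h \, \frac{\sum fu}{N} = 25 + 10 \, (0.4074) = 29.074$$
Conclusion : $\sigma = 13.54$ is quite a large value relative to the mean, so the arithmetic mean 29.074 is not a good average.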
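The step-deviation working of Ex. 3 can be sketched in Python as follows. The variable names are illustrative; $A = 25$ and $h = 10$ are the values used in the table. The coefficient of variation computed at the end is an addition for illustration (the solution itself does not compute it) and is one possible way to quantify how large $\sigma$ is relative to the mean.

```python
import math

# Class intervals and frequencies from the question.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 9, 15, 12, 10, 3]
A, h = 25, 10  # assumed mean and class width, as in the table

mids = [(lo + hi) / 2 for lo, hi in classes]   # mid-values x
us = [(x - A) / h for x in mids]               # step deviations u

N = sum(freqs)
sum_fu = sum(f * u for f, u in zip(freqs, us))
sum_fu2 = sum(f * u * u for f, u in zip(freqs, us))

sigma = h * math.sqrt(sum_fu2 / N - (sum_fu / N) ** 2)  # formula (5)
mean = A + h * sum_fu / N
cv = sigma / mean * 100  # not computed in the text; shown for illustration

print(f"N = {N}, sum fu = {sum_fu}, sum fu^2 = {sum_fu2}")
print(f"sigma = {sigma:.2f}, A.M. = {mean:.3f}, C.V. = {cv:.1f}")
```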
