
Module 1

This document provides an introduction to statistics and data collection. It discusses key concepts such as the definition of statistics, main functions of statistics including data collection, presentation, analysis and interpretation. It also describes different types of statistical data like primary and secondary data. Various methods for collecting primary data and constructing frequency distributions are explained. Finally, it discusses measures of central tendency like the arithmetic mean, median and mode to analyze data.

Uploaded by

p4544468

INTRODUCTION TO STATISTICS & DATA COLLECTION

Statistics: Statistics is the "science which deals with the collection, analysis and interpretation of numerical data".

Main Functions of Statistics:
➢ Collection of Data
➢ Presentation of Data
➢ Analysis of Data
➢ Interpretation of results
Types of Statistical Data:

Primary Data: Primary data are those collected directly from the units or
individuals concerned, and which have never been used for any purpose earlier.
Example:
If a researcher wants to know the impact of the noon meal scheme on school
children, he has to undertake a survey and collect data on the opinions of
parents and children by asking relevant questions. Data collected in this way,
for the purpose at hand, are called primary data.
Methods for Collecting Primary Data:
The primary data can be collected by the following five methods.
1. Direct personal interviews
2. Indirect Oral interviews
3. Information from correspondents
4. Mailed questionnaire method
5. Schedules sent through enumerators

Secondary Data:
Secondary data are data that have already been collected by some individual or
agency and statistically treated to draw certain conclusions, and that are now
used and analyzed again to extract some other information.
Frequency Distribution:
A frequency distribution is a series in which observations with similar or
closely related values are put into separate groups, the groups being arranged
in order of magnitude.

A frequency distribution is constructed for three main reasons:
• To facilitate the analysis of data.
• To estimate frequencies of the unknown population distribution from the
distribution of sample data.
• To facilitate the computation of various statistical measures.
Discrete frequency distribution

Marks             20   30   40   50   60   70
No. of Students    8   12   20   10    6    4

Grouped frequency distribution

Marks             0-9  10-19  20-29  30-39  40-49  50-59
No. of Students    12     18     27     20     17      6

Continuous frequency distribution

Marks             0-10  10-20  20-30  30-40  40-50  50-60
No. of Students     12     18     27     20     17      6
Formation of Frequency Distribution
Classification according to class-intervals:
(i) Class Limits
(ii) Class-interval
(iii) Class-frequency
(iv) Class Mid-point
(v) Exclusive method
(vi) Inclusive method
Data Analysis:

➢ Measures of Central tendency or average

➢ Measures of Variation or dispersion

➢ Measures of Skewness

➢ Measures of Kurtosis

Measures of Central tendency or average
Quite often the entries in a data set cluster around a central (or middle)
value. This behavior of the data set is called the central tendency. The main
challenge is to locate the central value around which the clustering takes
place.
❑ Arithmetic Mean
❑ Median
❑ Mode
❑ Geometric Mean
❑ Harmonic Mean
Arithmetic Mean (A.M.)
A.M.: The arithmetic mean of a set of observations is their sum divided by the
number of observations.

Simple A.M.: $\bar{X} = \dfrac{x_1 + x_2 + \dots + x_N}{N} = \dfrac{\sum_{i=1}^{N} x_i}{N}$

N - Number of observations.

Ex: The monthly incomes of 10 families in a city are:

1600, 1560, 1440, 1530, 1670, 1860, 1750, 1910, 1490, 1800.

$\bar{X} = \dfrac{\sum_{i=1}^{10} x_i}{10} = \dfrac{16610}{10} = 1661$
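The simple arithmetic mean above can be checked with a few lines of Python. This is only a minimal sketch; the function name `arithmetic_mean` is an illustrative choice, not part of the original notes.

```python
# Monthly incomes of the 10 families from the example above.
incomes = [1600, 1560, 1440, 1530, 1670, 1860, 1750, 1910, 1490, 1800]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_income = arithmetic_mean(incomes)
print(mean_income)  # 1661.0
```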
A.M. For Discrete Series:
Direct Method: $\bar{X} = \dfrac{\sum_{i=1}^{n} f_i x_i}{N}$
f - Frequency of the given set of observations
$N = \sum_{i=1}^{n} f_i$ = Total number of observations

Ex: The following data represent the marks obtained by 60 students of a class. Obtain the average marks.

Marks             20   30   40   50   60   70
No. of Students    8   12   20   10    6    4
Marks (x)   No. of Students (f)    fx
20            8                    160
30           12                    360
40           20                    800
50           10                    500
60            6                    360
70            4                    280
Total        N = 60                $\sum fx$ = 2460

$\bar{X} = \dfrac{\sum f_i x_i}{N} = \dfrac{2460}{60} = 41$
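The direct method for a frequency distribution can be sketched in Python as follows. The helper name `frequency_mean` is hypothetical; for a continuous series the same function could be used by passing the class mid-points as `xs`.

```python
# Marks and number of students from the discrete series example above.
marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]


def frequency_mean(xs, fs):
    """Direct method: sum of f*x divided by the total frequency N."""
    total_fx = sum(f * x for x, f in zip(xs, fs))
    n = sum(fs)
    return total_fx / n


average_marks = frequency_mean(marks, students)
print(average_marks)  # 41.0
```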
A.M. For Continuous Series:
Direct Method: $\bar{X} = \dfrac{\sum_{i=1}^{n} f_i x_i}{N}$
f - Frequency of the given set of observations
x - mid-point of each class
$N = \sum_{i=1}^{n} f_i$ = Total number of observations

Ex: Obtain the arithmetic mean for the following data.

Marks             0-10  10-20  20-30  30-40  40-50  50-60
No. of Students     12     18     27     20     17      6
Marks    No. of Students (f)   Mid-point (x)    fx
0-10     12                     5                60
10-20    18                    15               270
20-30    27                    25               675
30-40    20                    35               700
40-50    17                    45               765
50-60     6                    55               330
Total    N = 100                                $\sum fx$ = 2800

$\bar{X} = \dfrac{\sum f_i x_i}{N} = \dfrac{2800}{100} = 28$
Using the Deviation:
If the values of x or f are large, calculating the A.M. by the above formula is
time-consuming and tedious. The arithmetic is reduced to a great extent by
taking the deviations of the given values from an arbitrary point 'A':

Let $d_i = x_i - A$, so that $f_i d_i = f_i (x_i - A)$. Then

$\bar{X} = A + \dfrac{1}{N} \sum_{i=1}^{n} f_i d_i$

or, with step deviations $d_i = \dfrac{x_i - A}{h}$,

$\bar{X} = A + \dfrac{h}{N} \sum_{i=1}^{n} f_i d_i$

h (or i) - common magnitude (width) of the classes
Ex:
C.I.         0-8  8-16  16-24  24-32  32-40  40-48
Frequency      8     7     16     24     15      7

Taking A = 28 and h = 8:

C.I.     Mid-Value (x)   Frequency (f)   d = (x − A)/h    fd
0-8       4               8              -3              -24
8-16     12               7              -2              -14
16-24    20              16              -1              -16
24-32    28              24               0                0
32-40    36              15               1               15
40-48    44               7               2               14
Total                    77                              -25

$\bar{X} = A + \dfrac{h}{N} \sum_{i=1}^{n} f_i d_i = 28 + \dfrac{8 \times (-25)}{77} = 25.403$
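The step-deviation calculation can be reproduced with a short Python sketch. The function name `step_deviation_mean` and the representation of classes as `(lower, upper)` tuples are assumptions made for illustration.

```python
def step_deviation_mean(classes, freqs, assumed_mean, width):
    """Mean = A + (h / N) * sum(f * d), with d = (mid-point - A) / h."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    d = [(m - assumed_mean) / width for m in mids]
    n = sum(freqs)
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    return assumed_mean + width * sum_fd / n


classes = [(0, 8), (8, 16), (16, 24), (24, 32), (32, 40), (40, 48)]
freqs = [8, 7, 16, 24, 15, 7]

mean = step_deviation_mean(classes, freqs, assumed_mean=28, width=8)
print(round(mean, 3))  # 25.403
```

Any class mid-point can serve as A; choosing one near the centre of the table simply keeps the values of d small.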
Median:
The median of a distribution is the value of the variable which divides it into two equal parts.
It is the value which exceeds and is exceeded by the same number of observations. Thus the
median is called a "positional average".

Evaluation of Median: For ungrouped data,

(i) Odd number of observations: the median is the middle value after the values have
been arranged in ascending or descending order of magnitude.
(ii) Even number of observations: the median is the arithmetic mean of the two middle
terms after the values have been arranged in ascending or descending order of magnitude.

Ex: Find the median of the values 25, 20, 15, 35, 18.
Ex: Find the median of the values 8, 20, 50, 25, 15, 30.
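The two rules for ungrouped data can be sketched in Python and applied to the two exercises above. The function name `median` is an illustrative choice; Python's standard `statistics.median` behaves the same way.

```python
def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([25, 20, 15, 35, 18]))     # 20
print(median([8, 20, 50, 25, 15, 30]))  # 22.5
```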
For Grouped data:

(i) Discrete Frequency distribution:

a) Find $\dfrac{N+1}{2}$, where N = Total Frequency = $\sum_{i=1}^{n} f_i$.
b) Find the (less than) cumulative frequency (c.f.) just greater than $\dfrac{N+1}{2}$.
c) The corresponding value of x is the median.

Ex: Calculate the median of the distribution.

x   1   2   3   4   5   6   7   8   9
f   8  10  11  16  20  25  15   9   6
x    f    c.f.
1    8      8
2   10     18
3   11     29
4   16     45
5   20     65
6   25     90
7   15    105
8    9    114
9    6    120
N = 120

$\dfrac{N+1}{2} = 60.5$

The cumulative frequency (c.f.) just greater than $\dfrac{N+1}{2}$ is 65, and the value of x
corresponding to 65 is 5.

∴ Median is 5.
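The cumulative-frequency rule can be sketched in Python as below. The name `discrete_median` is hypothetical. The comparison uses `>=` so that a cumulative frequency exactly equal to (N+1)/2 also selects that value of x, which is an assumption about the boundary case that the notes do not spell out.

```python
from itertools import accumulate


def discrete_median(xs, fs):
    """Return the x whose cumulative frequency first reaches (N + 1) / 2."""
    n = sum(fs)
    target = (n + 1) / 2
    for x, cf in zip(xs, accumulate(fs)):
        if cf >= target:
            return x
    raise ValueError("empty distribution")


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
fs = [8, 10, 11, 16, 20, 25, 15, 9, 6]

print(list(accumulate(fs)))     # [8, 18, 29, 45, 65, 90, 105, 114, 120]
print(discrete_median(xs, fs))  # 5
```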
(ii) Continuous Frequency Distribution:
In the case of a continuous frequency distribution, the class corresponding to the c.f.
just greater than $\dfrac{N}{2}$ is called the median class, and the value of the median is obtained
by the following formula:

Median = $l + \dfrac{h}{f}\left(\dfrac{N}{2} - c\right)$

where l - is the lower limit of the median class,
f - is the frequency of the median class,
h - is the magnitude of the median class,
c - is the c.f. of the class preceding the median class.

Note: The median formula can be used only for continuous classes without
any gaps, i.e., for exclusive type classifications.
Ex: Find the median of the following data:

Wages (in Rs.)    2000-3000  3000-4000  4000-5000  5000-6000  6000-7000
No. of workers            3          5         20         10          5
Solution:
Wages (in Rs.)   No. of Employees   c.f.
2000-3000         3                   3
3000-4000         5                   8
4000-5000        20                  28
5000-6000        10                  38
6000-7000         5                  43
N = 43

$\dfrac{N}{2} = \dfrac{43}{2} = 21.5$

The cumulative frequency just greater than 21.5 is 28 and the corresponding
class is 4000-5000. Thus the median class is 4000-5000.

l = 4000; h = 1000; f = 20; c = 8

Median = $4000 + \dfrac{1000}{20}(21.5 - 8)$
∴ Median = 4675.
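The median formula for a continuous series can be sketched in Python as follows. The function name `grouped_median` and the `(lower, upper)` tuple layout are illustrative assumptions; the classes are assumed to be of the exclusive type with non-zero frequencies.

```python
def grouped_median(classes, freqs):
    """Median = l + (h / f) * (N/2 - c) for exclusive (gap-free) classes."""
    n = sum(freqs)
    half = n / 2
    cum = 0  # c.f. of the classes before the current one
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= half:  # this is the median class
            return lo + (hi - lo) / f * (half - cum)
        cum += f
    raise ValueError("empty distribution")


wage_classes = [(2000, 3000), (3000, 4000), (4000, 5000),
                (5000, 6000), (6000, 7000)]
workers = [3, 5, 20, 10, 5]

median_wage = grouped_median(wage_classes, workers)
print(median_wage)  # 4675.0
```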
MODE
Definition: Mode is the value of the variable which is predominant
in the series.

1. For a discrete frequency distribution, the mode is the value of x
corresponding to the maximum frequency.

x   1   2   3   4   5   6   7   8
f   4   9  16  25  22  15   7   3

Here the maximum frequency is 25, so the mode is x = 4.
2. For a continuous frequency distribution:

Mode = $l + \dfrac{h(f_1 - f_0)}{(f_1 - f_0) - (f_2 - f_1)} = l + \dfrac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$

l - lower limit of the modal class
h - magnitude of the modal class
$f_1$ - frequency of the modal class
$f_0$ and $f_2$ - frequencies of the classes preceding and succeeding the modal class
Example: Find the mode.

C.I.        0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency      5      8      7     12     28     20     10     10

Here the maximum frequency is 28.

Thus the modal class is 40-50.

Mode = $40 + \dfrac{10(28 - 12)}{2 \times 28 - 12 - 20} = 46.67$
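The mode formula for a continuous series can be sketched in Python as below. The name `grouped_mode` is hypothetical, and raising an error when the modal class has no neighbour on one side, or when the maximum frequency is not unique, is a design choice of this sketch rather than something prescribed by the notes.

```python
def grouped_mode(classes, freqs):
    """Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2)."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # index of modal class
    # The formula needs a class on each side of a unique modal class.
    if k == 0 or k == len(freqs) - 1 or freqs.count(freqs[k]) > 1:
        raise ValueError("formula not applicable to this table")
    lo, hi = classes[k]
    f0, f1, f2 = freqs[k - 1], freqs[k], freqs[k + 1]
    return lo + (hi - lo) * (f1 - f0) / (2 * f1 - f0 - f2)


classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]

mode = grouped_mode(classes, freqs)
print(round(mode, 2))  # 46.67
```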
A distribution having only one mode is called unimodal.
If it contains more than one mode, it is called bimodal or multimodal.

Note: In the following three cases, the mode cannot be obtained by using the above
formula:
(a) When the highest frequency is observed at the beginning of the frequency table.
(b) When the highest frequency is observed at the end of the frequency table.
(c) When two or more class intervals contain the same maximum frequency.

However, in the above three cases, the mode can be obtained by using either the
'grouping method' or the empirical relationship between the arithmetic mean,
median and mode:
Mode = 3 Median – 2 Mean
Example: Calculate the mean, median and mode for the following data.

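Wages (in lakhs)   15-19  20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-59  60-64
No. of workers        31     47     59     78    104    113     81     60     52     25

(The classes are of the inclusive type, so they are first converted to the
continuous classes 14.5-19.5, 19.5-24.5, ... before the median and mode
formulas are applied.)

Mean = 39.52
Median = 39.77
Mode = 40.6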
Example: Calculate the Mean, Median and Mode for the
following data.

Variable 10-13 13-16 16-19 19-22 22-25 25-28 28-31 31-34 34-37 37-40
Frequency 8 15 27 51 75 54 36 18 9 7

Mean = 24.19
Median = 23.96
Mode = 23.6

Geometric mean (G.M.)
Harmonic mean (H.M.)
          Series A   Series B   Series C
            200        200          1
            200        205        989
            200        202          2
            200        203          3
            200        190          5
Mean        200        200        200
Measure of Dispersion
The degree to which numerical data tend to spread about an
average value is called variation or dispersion of the data.

Example: Two distributions giving the weekly wages of 200 persons
may have the same mean value, say Rs. 100.

In one distribution, most of the observations may be centered around the
mean value 100, with a few others away from 100.

In the other distribution, a large number of observations may be above
150 and another large number of observations may be below 50,
with only a few between 50 and 100, and still the mean may be 100.
From this example,
• The two distributions, although they have the same mean, are not identical.
• In one, the items are nearer to the mean, and in the other they are
spread away from the mean.
• Two distributions may also have the same median, but the deviations of
the observations from the median may be of different types in the two
distributions.

To study this aspect of distributions, another characteristic, called
dispersion, is needed.
There are two kinds of measures of dispersion:
Absolute measure of dispersion
Relative measure of dispersion

Absolute measure of dispersion
An absolute measure of dispersion indicates the amount of variation in a set
of values in terms of the units of observation.
Example
When rainfall on different days is available in mm, any absolute measure of
dispersion gives the variation in rainfall in mm.

Relative measure of dispersion
Relative measures of dispersion are free from the units of measurement of
the observations.
They are pure numbers. They are used to compare the variation in two or
more sets that have different units of measurement.
Absolute Measure of Dispersion:
Range
Quartile Deviation
Mean Deviation
Standard Deviation

Relative Measure of Dispersion:
Co-efficient of Range
Co-efficient of Quartile deviation
Co-efficient of Mean deviation
Co-efficient of variation
Range
Definition: The difference between the value of the largest item and the
value of the smallest item in the distribution.
Range = L − S
L - Largest value, S - Smallest value

The relative measure corresponding to range is called the coefficient of
range:
Coefficient of Range = $\dfrac{L - S}{L + S}$
Example 1: The following are the prices of shares of a company from
Monday to Saturday:

Days         Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
Price (Rs.)     200      210        208       160     220       250

Calculate the range and its coefficient.

Solution: Range = L – S = 250 – 160 = 90
Range = Rs. 90
Coefficient of Range = $\dfrac{L - S}{L + S} = \dfrac{250 - 160}{250 + 160} = \dfrac{90}{410} = 0.22$
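The range and its coefficient for Example 1 can be checked with a short Python sketch. The variable names are illustrative only.

```python
# Share prices (Rs.) from Example 1.
prices = {"Monday": 200, "Tuesday": 210, "Wednesday": 208,
          "Thursday": 160, "Friday": 220, "Saturday": 250}

largest = max(prices.values())   # L
smallest = min(prices.values())  # S

price_range = largest - smallest                        # absolute measure, in Rs.
coefficient = price_range / (largest + smallest)        # relative measure, unit-free

print(price_range)            # 90
print(round(coefficient, 2))  # 0.22
```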
In a frequency distribution, the range is calculated by taking the difference
between the upper limit of the highest class and the lower limit of the
lowest class.

Example 2:
Marks             10-20  20-30  30-40  40-50  50-60  60-70
No. of Students      12     18     27     20     17      6

Range = L – S = 70 – 10 = 60
Coefficient of Range = $\dfrac{L - S}{L + S} = \dfrac{70 - 10}{70 + 10} = \dfrac{60}{80} = 0.75$
Partitions:
These are the values which divide the series into a number of equal
parts.
Quartiles: The three points which divide the series into four equal
parts are called quartiles. These are Q1, Q2 (median), Q3.

Deciles: The nine points which divide the series into ten equal parts
are called deciles. These are D1, D2, …, D5 (median), …, D9.

Percentiles: The ninety-nine points which divide the series into a
hundred equal parts are called percentiles. Here P50 is the median.
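Quartiles
First Quartile (Q1) = Size of the $\dfrac{N+1}{4}$th item (discrete series)
Q1 = Size of the $\dfrac{N}{4}$th item (continuous series)

Q1 = $l + \dfrac{\frac{N}{4} - c.f.}{f} \times i$

Third Quartile (Q3) = Size of the $3\left(\dfrac{N+1}{4}\right)$th item (discrete series)
Q3 = Size of the $\dfrac{3N}{4}$th item (continuous series)

Q3 = $l + \dfrac{\frac{3N}{4} - c.f.}{f} \times i$

Here l is the lower limit of the quartile class, c.f. is the cumulative
frequency of the class preceding it, f is the frequency of the quartile class
and i is the class width.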
Quartiles: Grouped data-Discrete series

Example: Calculate Q1 and Q3 for the following data.

Roll No.   1   2   3   4   5   6   7
Marks     20  28  40  12  30  15  50

Solution: Marks in ascending order: 12 15 20 28 30 40 50

Q1 = Size of the $\dfrac{N+1}{4}$th item = Size of the $\dfrac{7+1}{4}$ = 2nd item.
Size of the 2nd item is 15. Hence Q1 = 15.

Q3 = Size of the $3\left(\dfrac{N+1}{4}\right)$th item = Size of the $3\left(\dfrac{7+1}{4}\right)$ = 6th item.
Size of the 6th item is 40. Hence Q3 = 40.
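The positional rule used in this example can be sketched in Python. The helper names `item_at` and `quartiles` are hypothetical. The notes only show whole-number positions; the linear interpolation used here for fractional positions is an assumption of this sketch, and at least three observations are assumed.

```python
def item_at(ordered, position):
    """Value at a 1-based position; fractional positions are interpolated
    linearly between neighbours (an assumption, not stated in the notes)."""
    lower = int(position)
    frac = position - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def quartiles(values):
    """Q1 and Q3 as the (N+1)/4-th and 3(N+1)/4-th items (needs N >= 3)."""
    ordered = sorted(values)
    n = len(ordered)
    return item_at(ordered, (n + 1) / 4), item_at(ordered, 3 * (n + 1) / 4)


marks = [20, 28, 40, 12, 30, 15, 50]
q1, q3 = quartiles(marks)
print(q1, q3)  # 15 40
```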
Quartiles: Grouped data - Continuous series

Q1 = $l + \dfrac{\frac{N}{4} - c.f.}{f} \times i$

Q3 = $l + \dfrac{\frac{3N}{4} - c.f.}{f} \times i$
Example: Compute the value of Q1 and Q3 for the following data:

C.I.   10-20  20-30  30-40  40-50  50-60  60-70  70-80
f         12     19      5     10      9      6      6

Solution:
Marks    Frequency   Cumulative Frequency
10-20    12          12
20-30    19          31
30-40     5          36
40-50    10          46
50-60     9          55
60-70     6          61
70-80     6          67
N = 67
Q1 = Size of the $\dfrac{N}{4}$th item = Size of the $\dfrac{67}{4}$ = 16.75th item.
Q1 lies in the interval 20-30.

Q1 = $l + \dfrac{\frac{N}{4} - c.f.}{f} \times i$, with l = 20, N/4 = 16.75, c.f. = 12, f = 19, i = 10

Q1 = $20 + \dfrac{\frac{67}{4} - 12}{19} \times 10$ = 20 + 2.5 = 22.5

Hence Q1 = 22.5

Q3 = Size of the $\dfrac{3N}{4}$th item = Size of the $\dfrac{3 \times 67}{4}$ = 50.25th item.
Q3 lies in the class 50-60.

Q3 = $l + \dfrac{\frac{3N}{4} - c.f.}{f} \times i$, with l = 50, 3N/4 = 50.25, c.f. = 46, f = 9, i = 10

Q3 = $50 + \dfrac{50.25 - 46}{9} \times 10$ = 50 + 4.72 = 54.72

Hence Q3 = 54.72
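The interpolation used for Q1 and Q3 follows one pattern, which can be sketched as a single Python function. The name `grouped_quantile` and its `fraction` parameter are assumptions made for illustration; classes are taken as exclusive `(lower, upper)` tuples with non-zero frequencies.

```python
def grouped_quantile(classes, freqs, fraction):
    """l + (fraction*N - c.f.) / f * i, where c.f. is the cumulative
    frequency of the classes before the one containing the item."""
    n = sum(freqs)
    target = fraction * n
    cum = 0
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    raise ValueError("fraction must lie between 0 and 1")


classes = [(10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80)]
freqs = [12, 19, 5, 10, 9, 6, 6]

q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
print(round(q1, 2), round(q3, 2))  # 22.5 54.72
```

With `fraction=0.5` the same function reproduces the median formula for a continuous series.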
Deciles: $D_4$ = Size of the $4\left(\dfrac{N+1}{10}\right)$th item in individual and discrete series.
$D_4$ = Size of the $\dfrac{4N}{10}$th item in continuous series.

Percentiles: $P_{60}$ = Size of the $60\left(\dfrac{N+1}{100}\right)$th item in discrete series.
$P_{60}$ = Size of the $\dfrac{60N}{100}$th item in continuous series.
$D_4$ = Size of the $\dfrac{4N}{10}$th item = $\dfrac{4 \times 67}{10}$ = 26.8th item.
$D_4$ lies in the interval 20-30.

$D_4 = l + \dfrac{\frac{4N}{10} - c.f.}{f} \times i$, with l = 20, 4N/10 = 26.8, c.f. = 12, f = 19, i = 10

$D_4$ = 27.79

$P_{60}$ = Size of the $\dfrac{60N}{100}$th item = $\dfrac{60 \times 67}{100}$ = 40.2th item.
$P_{60}$ lies in the interval 40-50.

$P_{60} = l + \dfrac{\frac{60N}{100} - c.f.}{f} \times i$, with l = 40, 60N/100 = 40.2, c.f. = 36, f = 10, i = 10

$P_{60}$ = 44.2
Quartile Deviation
Definition: The average amount by which the two quartiles differ from the
median.

Quartile Deviation (Q.D.) = $\dfrac{Q_3 - Q_1}{2}$

• In a symmetrical distribution, Median ± Q.D. covers exactly 50 per cent of the observations.

• When Q.D. is very small, it describes high uniformity or small variation
of the central 50% of the items, and a high Q.D. means that the variation among
the central items is large.
Relative measure of Q.D.

Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

It can be used to compare the degree of variation in different distributions.
Example: Calculate the value of Q.D. and the coefficient of Q.D.
from the following data.

Roll No.   1   2   3   4   5   6   7
Marks     20  28  40  12  30  15  50

Solution: Marks in ascending order: 12 15 20 28 30 40 50

Q1 = Size of the $\dfrac{N+1}{4}$th item = Size of the $\dfrac{7+1}{4}$ = 2nd item.
Size of the 2nd item is 15. Hence Q1 = 15.

Q3 = Size of the $3\left(\dfrac{N+1}{4}\right)$th item = Size of the $3\left(\dfrac{7+1}{4}\right)$ = 6th item.
Size of the 6th item is 40. Hence Q3 = 40.

∴ Q.D. = $\dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 15}{2} = 12.5$

Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 15}{40 + 15} = 0.455$
Example: Compute the value of Q.D. and its coefficient from
the following data.

Marks             10   20   30   40   50   60
No. of Students    4    7   15    8    7    2

Solution:
Marks   Frequency   Cumulative frequency
10       4            4
20       7           11
30      15           26
40       8           34
50       7           41
60       2           43

Q1 = Size of the $\dfrac{N+1}{4}$th item = Size of the $\dfrac{43+1}{4}$ = 11th item.
Size of the 11th item is 20. Hence Q1 = 20.

Q3 = Size of the $3\left(\dfrac{N+1}{4}\right)$th item = Size of the $3\left(\dfrac{43+1}{4}\right)$ = 33rd item.
Size of the 33rd item is 40. Hence Q3 = 40.

Q.D. = $\dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 20}{2} = 10$

Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 20}{40 + 20} = 0.333$
Example: Compute the value of Q.D. and the coefficient of Q.D.
from the following data.

C.I.   10-20  20-30  30-40  40-50  50-60  60-70  70-80
f         12     19      5     10      9      6      6

Solution:
Marks    Frequency   Cumulative Frequency
10-20    12          12
20-30    19          31
30-40     5          36
40-50    10          46
50-60     9          55
60-70     6          61
70-80     6          67
N = 67

Q1 = Size of the $\dfrac{N}{4}$th item = Size of the $\dfrac{67}{4}$ = 16.75th item.
Q1 lies in the interval 20-30.

Q1 = $l + \dfrac{\frac{N}{4} - c.f.}{f} \times i$, with l = 20, N/4 = 16.75, c.f. = 12, f = 19, i = 10

Q1 = $20 + \dfrac{\frac{67}{4} - 12}{19} \times 10$ = 20 + 2.5 = 22.5

Hence Q1 = 22.5

Q3 = Size of the $\dfrac{3N}{4}$th item = Size of the $\dfrac{3 \times 67}{4}$ = 50.25th item.
Q3 lies in the class 50-60.

Q3 = $l + \dfrac{\frac{3N}{4} - c.f.}{f} \times i$, with l = 50, 3N/4 = 50.25, c.f. = 46, f = 9, i = 10

Q3 = $50 + \dfrac{50.25 - 46}{9} \times 10$ = 50 + 4.72 = 54.72

Hence Q3 = 54.72

Q.D. = $\dfrac{Q_3 - Q_1}{2} = \dfrac{54.72 - 22.5}{2} = 16.11$

Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{54.72 - 22.5}{54.72 + 22.5} = 0.4172$
Mean Deviation
Definition: M.D. is the average difference between the observations in a distribution and
the median or mean of that series.
Or
Mean deviation is the arithmetic mean of the deviations of a series computed from
any measure of central tendency, i.e., the mean, median or mode, with all the deviations
taken as positive.

Mean Deviation or M.D. = $\dfrac{\sum |D|}{N}$

where |D| is the deviation from the median, ignoring signs.
For individual observations:
(i) Compute the median of the series.
(ii) Take the deviations of the items from the median, ignoring ± signs, and denote
these deviations by |D|.
(iii) Obtain the total of these deviations, $\sum |D|$.
(iv) Divide the total obtained in step (iii) by the number of observations to
get the value of the mean deviation.

Relative Measure of M.D.:

Co-efficient of M.D. = $\dfrac{M.D.}{\text{Median}}$
Example: Calculate the mean deviation of the two income
groups.

I (Rs.)   II (Rs.)
4000      3000
4200      4000
4400      4200
4600      4400
4800      4600
          4800
          5800
Solution:
Group I            Group II
Rs.     |D|        Rs.     |D|
4000    400        3000    1400
4200    200        4000     400
4400      0        4200     200
4600    200        4400       0
4800    400        4600     200
                   4800     400
                   5800    1400
N = 5, $\sum |D|$ = 1200     N = 7, $\sum |D|$ = 4000

Mean deviation, Group I: M.D. = $\dfrac{\sum |D|}{N}$

Median = Size of the $\dfrac{N+1}{2}$th item = $\dfrac{5+1}{2}$ = 3rd item. Size of the 3rd item = 4400.

M.D. = $\dfrac{1200}{5}$ = 240

i.e., the average deviation of the individual incomes from the median income
is Rs. 240.

Mean deviation, Group II: M.D. = $\dfrac{\sum |D|}{N}$

Median = Size of the $\dfrac{N+1}{2}$th item = $\dfrac{7+1}{2}$ = 4th item.
Size of the 4th item = 4400.

M.D. = $\dfrac{4000}{7}$ = 571.43

i.e., the average deviation of the individual incomes from the
median income is Rs. 571.43.

Co-efficient of M.D. (Group I) = $\dfrac{M.D.}{\text{Median}} = \dfrac{240}{4400}$ = 0.055
Co-efficient of M.D. (Group II) = $\dfrac{571.43}{4400}$ = 0.13
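The four steps for individual observations can be sketched in Python and applied to the two income groups. The function name `mean_deviation_about_median` is an illustrative choice; `statistics.median` is from the Python standard library.

```python
from statistics import median


def mean_deviation_about_median(values):
    """Average of the absolute deviations |D| taken from the median."""
    med = median(values)
    return sum(abs(v - med) for v in values) / len(values)


group_1 = [4000, 4200, 4400, 4600, 4800]
group_2 = [3000, 4000, 4200, 4400, 4600, 4800, 5800]

md_1 = mean_deviation_about_median(group_1)
md_2 = mean_deviation_about_median(group_2)
print(md_1)            # 240.0
print(round(md_2, 2))  # 571.43

# Relative measure: coefficient of M.D. = M.D. / Median
print(round(md_1 / median(group_1), 3))  # 0.055
print(round(md_2 / median(group_2), 2))  # 0.13
```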
Mean deviation – Discrete Series

M.D. = $\dfrac{\sum f |D|}{N}$

(i) Compute the median of the series.
(ii) Take the deviations of the items from the median, ignoring ± signs, and
denote these deviations by |D|.
(iii) Multiply these deviations by the respective frequencies and
obtain the total, $\sum f |D|$.
(iv) Divide the total obtained in step (iii) by the number of
observations to get the value of the mean deviation.
Example: The numbers of telephone calls received at an exchange in 245
successive one-minute intervals are shown in the following
frequency distribution. Compute the mean deviation about
the median.

Number of Calls    0    1    2    3    4    5    6    7
Frequency         14   21   25   43   51   40   39   12
Solution:
No. of Calls    f     c.f.   |D|   f|D|
0              14      14     4     56
1              21      35     3     63
2              25      60     2     50
3              43     103     1     43
4              51     154     0      0
5              40     194     1     40
6              39     233     2     78
7              12     245     3     36
N = 245                       $\sum f|D|$ = 366

Median = Size of the $\dfrac{N+1}{2}$th item = $\dfrac{245+1}{2}$ = 123rd item.
Hence the median value is 4.

M.D. = $\dfrac{\sum f |D|}{N} = \dfrac{366}{245} = 1.49$
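The telephone-call example can be reproduced with a short Python sketch. The variable names are illustrative, and the median is located with the cumulative-frequency rule for a discrete series.

```python
from itertools import accumulate

calls = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [14, 21, 25, 43, 51, 40, 39, 12]

n = sum(freqs)
target = (n + 1) / 2  # position of the median item (123rd)

# First value whose cumulative frequency reaches the median position.
median_calls = next(x for x, cf in zip(calls, accumulate(freqs)) if cf >= target)

# M.D. = sum(f * |D|) / N, with |D| taken from the median.
sum_f_abs_d = sum(f * abs(x - median_calls) for x, f in zip(calls, freqs))
mean_deviation = sum_f_abs_d / n

print(median_calls)              # 4
print(sum_f_abs_d)               # 366
print(round(mean_deviation, 2))  # 1.49
```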
In Continuous Series:
We have to obtain the mid-points of the various classes and take
the deviations of these mid-points from the median.

That is, |D| = |median – mid-point|

Example: Find the mean deviation from the mean and from the median.

Marks             0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of Students     20     25     32     40     42     35     10      8
Example: Calculate the coefficient of mean deviation from the
following data:

Marks             10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
No. of Students       2      6     12     18     25     20     10      7

Mean deviation = 12.94
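Standard deviation

(i) Individual observations $x_i$; i = 1, 2, …, N:

Variance = $\sigma^2 = \dfrac{\sum (x_i - \bar{x})^2}{N} = \dfrac{1}{N}\sum x_i^2 - \left(\dfrac{1}{N}\sum x_i\right)^2 = \dfrac{1}{N}\sum x_i^2 - \bar{x}^2$

(ii) Discrete or continuous frequency distribution $x_i \mid f_i$; i = 1, 2, …, n:

$\sigma^2 = \dfrac{1}{N}\sum_i f_i (x_i - \bar{x})^2 = \dfrac{1}{N}\sum f_i x_i^2 - \left(\dfrac{1}{N}\sum f_i x_i\right)^2$ or $\dfrac{1}{N}\sum f_i d_i^2 - \left(\dfrac{1}{N}\sum f_i d_i\right)^2$

where $d_i = x_i - A$. With step deviations $d_i = (x_i - A)/h$, the last expression is multiplied by $h^2$.

Standard deviation = $\sqrt{\text{variance}}$:

$\sigma = \sqrt{\dfrac{1}{N}\sum (x_i - \bar{x})^2}$ or $\sigma = \sqrt{\dfrac{1}{N}\sum f_i (x_i - \bar{x})^2}$

$\bar{x}$ - Arithmetic mean of the distribution

Note: $\bar{X} = A + \dfrac{1}{N}\sum_{i=1}^{n} f_i d_i$ or $\bar{X} = A + \dfrac{h}{N}\sum_{i=1}^{n} f_i d_i$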
Coefficient of Variation

Coefficient of Variation: C.V. = $\dfrac{\sigma}{\bar{x}} \times 100$ (Relative Measure)
Example: The scores of two players A and B in ten innings during a
certain season are:

A   32  28  47  63  71  39  10  60  96  14
B   19  31  48  53  67  90  10  62  40  80

Find which of the two players, A or B, is more consistent in scoring.
Solution: Calculation of Coefficient of Variation

X     (X − X̄)   (X − X̄)²     Y     (Y − Ȳ)   (Y − Ȳ)²
32    -14         196         19    -31         961
28    -18         324         31    -19         361
47     +1           1         48     -2           4
63    +17         289         53     +3           9
71    +25         625         67    +17         289
39     -7          49         90    +40        1600
10    -36        1296         10    -40        1600
60    +14         196         62    +12         144
96    +50        2500         40    -10         100
14    -32        1024         80    +30         900
ΣX = 460   0     6500         ΣY = 500   0     5968

$\bar{X} = \dfrac{460}{10} = 46$        $\bar{Y} = \dfrac{500}{10} = 50$

$\sigma_A^2 = \dfrac{\sum (x_i - \bar{x})^2}{N} = \dfrac{6500}{10} = 650$        $\sigma_B^2 = \dfrac{\sum (y_i - \bar{y})^2}{N} = \dfrac{5968}{10} = 596.8$
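$\sigma_A = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{N}} = 25.5$        $\sigma_B = \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{N}} = 24.43$

C.V.(A) = $\dfrac{\sigma_A}{\bar{x}} \times 100$ = 55.43        C.V.(B) = $\dfrac{\sigma_B}{\bar{y}} \times 100$ = 48.86

Summary:
$\sum X = 460$; $\sum (X_i - \bar{x}) = 0$; $\sum (X_i - \bar{x})^2 = 6500$
$\sum Y = 500$; $\sum (Y_i - \bar{y}) = 0$; $\sum (Y_i - \bar{y})^2 = 5968$
$\sigma_A = 25.5$, $\sigma_B = 24.43$
C.V.(A) = 55.43, C.V.(B) = 48.86

Since C.V.(B) is smaller than C.V.(A), player B is the more consistent scorer.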
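The comparison of the two players can be sketched in Python using the standard library. The helper name `coefficient_of_variation` is hypothetical; `pstdev` is the population standard deviation (divisor N), which matches the formula used in these notes.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """C.V. = (sigma / mean) * 100, with sigma computed using divisor N."""
    return pstdev(values) / mean(values) * 100


player_a = [32, 28, 47, 63, 71, 39, 10, 60, 96, 14]
player_b = [19, 31, 48, 53, 67, 90, 10, 62, 40, 80]

cv_a = coefficient_of_variation(player_a)
cv_b = coefficient_of_variation(player_b)

# cv_a comes out near 55.42; the worked solution's 55.43 uses sigma rounded to 25.5.
print(round(cv_a, 2), round(cv_b, 2))
print("More consistent:", "A" if cv_a < cv_b else "B")  # B
```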
Example: Suppose that samples of polythene bags from two manufacturers, A and B,
are tested by a prospective buyer for bursting pressure, with the following results:

Bursting Pressure (lb.)    Number of Bags
                            A     B
5.0 – 9.9                   2     9
10.0 – 14.9                 9    11
15.0 – 19.9                29    18
20.0 – 24.9                54    32
25.0 – 29.9                11    27
30.0 – 34.9                 5    13

Which set of bags has the highest average bursting pressure?
Which has more uniform pressure? If prices are the same,
which manufacturer's bags would be preferred by the buyer? Why?
For Manufacturer A

Bursting Pressure (lb.)   m       f     d = (m − 17.45)/5    fd    fd²
4.95-9.95                  7.45    2    -2                   -4      8
9.95-14.95                12.45    9    -1                   -9      9
14.95-19.95               17.45   29     0                    0      0
19.95-24.95               22.45   54     1                   54     54
24.95-29.95               27.45   11     2                   22     44
29.95-34.95               32.45    5     3                   15     45
N = 110                                 $\sum fd$ = 78     $\sum fd^2$ = 160
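$\bar{X}_A = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{N} \times i$

Here, A = 17.45, $\sum fd$ = 78, N = 110, i = 5

$\bar{X}_A = 17.45 + \dfrac{78}{110} \times 5 = 21$

$\sigma_A = \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2} \times i$

$= \sqrt{1.455 - 0.503} \times 5 = 4.88$

C.V. = $\dfrac{\sigma_A}{\bar{x}} \times 100$ = 23.24%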
For Manufacturer B

Bursting Pressure (lb.)   m       f     d = (m − 17.45)/5    fd    fd²
4.95-9.95                  7.45    9    -2                  -18     36
9.95-14.95                12.45   11    -1                  -11     11
14.95-19.95               17.45   18     0                    0      0
19.95-24.95               22.45   32    +1                  +32     32
24.95-29.95               27.45   27    +2                  +54    108
29.95-34.95               32.45   13    +3                  +39    117
N = 110                                 $\sum fd$ = 96     $\sum fd^2$ = 304
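$\bar{X}_B = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{N} \times i$

Here, A = 17.45, $\sum fd$ = 96, N = 110, i = 5

$\bar{X}_B = 17.45 + \dfrac{96}{110} \times 5 = 21.81$

$\sigma_B = \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2} \times i$

$= \sqrt{2.764 - 0.762} \times 5 = 7.075$

C.V. = $\dfrac{\sigma_B}{\bar{x}} \times 100$ = 32.44%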
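$\bar{X}_A = 21$        $\bar{X}_B = 21.81$

$\sigma_A = 4.88$        $\sigma_B = 7.07$

C.V.(A) = 23.24%        C.V.(B) = 32.44%

Since the average bursting pressure is higher for manufacturer B,
the bags of manufacturer B have the higher average bursting pressure.
Since C.V.(A) is smaller than C.V.(B), the bags of manufacturer A have the
more uniform bursting pressure.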
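The step-deviation calculation of the mean, standard deviation and C.V. for both manufacturers can be sketched in Python. The function name `grouped_mean_sd` is hypothetical. Because the sketch does not round intermediate values, its C.V. figures can differ from the worked solution in the second decimal place.

```python
from math import sqrt


def grouped_mean_sd(mids, freqs, assumed_mean, width):
    """Mean and standard deviation by the step-deviation method."""
    n = sum(freqs)
    d = [(m - assumed_mean) / width for m in mids]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = assumed_mean + width * sum_fd / n
    sd = width * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd


mids = [7.45, 12.45, 17.45, 22.45, 27.45, 32.45]
bags_a = [2, 9, 29, 54, 11, 5]
bags_b = [9, 11, 18, 32, 27, 13]

mean_a, sd_a = grouped_mean_sd(mids, bags_a, assumed_mean=17.45, width=5)
mean_b, sd_b = grouped_mean_sd(mids, bags_b, assumed_mean=17.45, width=5)
cv_a = sd_a / mean_a * 100
cv_b = sd_b / mean_b * 100

print(round(mean_a, 2), round(sd_a, 2))  # about 21.0 and 4.88
print(round(mean_b, 2), round(sd_b, 2))  # about 21.81 and 7.07
print(cv_a < cv_b)  # True: A is the more uniform of the two
```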
C.I.     f        C.I.     f
0-5     10        0-5     10
5-10    30        5-10    40
10-15   60        10-15   30
15-20   60        15-20   90
20-25   30        20-25   20
25-30   10        25-30   10

For both distributions, $\bar{X} = 15$ and $\sigma = 6$.
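[Figure: histograms of two frequency distributions over the classes 0-5 to 25-30, with frequencies 10, 30, 60, 60, 30, 10 and 10, 40, 30, 90, 20, 10; both have $\bar{x} = 15$ and $\sigma = 6$.]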
Skewness
When a series is not symmetrical it is said to be asymmetrical or skewed.
A distribution is said to be 'skewed' when the mean and median
fall at different points in the distribution, and the balance (or centre of
gravity) is shifted to one side or the other, to the left or to the right.

Skewness is the lack of symmetry of a distribution.
Dispersion is concerned with the amount of variation rather than with its
direction.
Skewness tells us about the direction of the variation or the departure from
symmetry.

Types of Skewness:
(i) Symmetrical Distribution
(ii) Positively Skewed Distribution (Mean > Mode)
(iii) Negatively Skewed Distribution (Mean < Mode)
Measure of Skewness
Absolute measure of skewness (Sk):

Sk = $\bar{X}$ − Mode

Relative measures of skewness:
(i) Karl Pearson's coefficient of skewness.
(ii) Bowley's coefficient of skewness.
(iii) Measure of skewness based on moments.
(i) Karl Pearson's coefficient of skewness.

Coefficient of Skewness: Sk = $\dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \dfrac{\bar{X} - Mo}{\sigma}$

In practice, it is rare for the value of Sk to exceed the limits of ±1.

Using the empirical relation Mode = 3 Median – 2 Mean,

Sk = $\dfrac{3(\bar{X} - \text{Median})}{\sigma}$

This measure can vary between ±3.
(ii) Bowley's coefficient of skewness
It is based on quartiles. In a symmetrical distribution the first and third
quartiles are equidistant from the median:

Q1 ---- Median ---- Q3

That is, the third quartile is the same distance above
the median as the first quartile is below it:
$Q_3 - \text{Median} = \text{Median} - Q_1$
or $Q_3 + Q_1 - 2\,\text{Median} = 0$

∴ Sk = $\dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

This measure is called the quartile measure of skewness and varies
between ±1.
(iii) Moments
A moment is a measure of a force with respect to its tendency to
produce rotation.
The strength of the tendency depends on the amount of force and the
distance from the origin of the point at which the force is exerted.

Definition: Let the symbol $x_i = (X_i - \bar{X})$ be used to represent the deviation of any
item in a distribution from the arithmetic average of that distribution.
The arithmetic means of the various powers of these deviations in any
distribution are called the moments of the distribution.
Central Moments (Moments about the Arithmetic Mean):

First Moment: $\mu_1 = \dfrac{\sum (X_i - \bar{X})}{N}$ (the sum of the deviations from the A.M. is always zero, so $\mu_1 = 0$)

Second Moment: $\mu_2 = \dfrac{\sum (X_i - \bar{X})^2}{N} = \sigma^2$ = Variance

Third Moment: $\mu_3 = \dfrac{\sum (X_i - \bar{X})^3}{N}$

Fourth Moment: $\mu_4 = \dfrac{\sum (X_i - \bar{X})^4}{N}$
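For a frequency distribution:

First Moment: $\mu_1 = \dfrac{\sum f_i (X_i - \bar{X})}{N}$

Second Moment: $\mu_2 = \dfrac{\sum f_i (X_i - \bar{X})^2}{N} = \sigma^2$ = Variance

Third Moment: $\mu_3 = \dfrac{\sum f_i (X_i - \bar{X})^3}{N}$ or $\dfrac{\sum f_i x_i^3}{N}$

Fourth Moment: $\mu_4 = \dfrac{\sum f_i (X_i - \bar{X})^4}{N}$ or $\dfrac{\sum f_i x_i^4}{N}$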
Conversion of moments about an arbitrary origin into
moments about the mean

$\mu_1 = \mu_1' - \mu_1' = 0$

$\mu_2 = \mu_2' - (\mu_1')^2$

$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3$

$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
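Non-central Moments (Moments about the Assumed Mean)
Where the actual mean is a fraction, it is difficult to calculate moments by applying the
above formulae. In such cases we can first compute moments about an arbitrary origin (A).

Individual observations; frequency distribution:

$\mu_1' = \dfrac{\sum (X_i - A)}{N} = \bar{X} - A$ ; $\mu_1' = \dfrac{\sum f_i (X_i - A)}{N}$

Mean = $\bar{X} = \mu_1' + A$

$\mu_2' = \dfrac{\sum (X_i - A)^2}{N}$ ; $\mu_2' = \dfrac{\sum f_i (X_i - A)^2}{N}$

$\mu_3' = \dfrac{\sum (X_i - A)^3}{N}$ ; $\mu_3' = \dfrac{\sum f_i (X_i - A)^3}{N}$

$\mu_4' = \dfrac{\sum (X_i - A)^4}{N}$ ; $\mu_4' = \dfrac{\sum f_i (X_i - A)^4}{N}$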
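$\beta_1$ (beta one) = $\dfrac{\mu_3^2}{\mu_2^3}$ (Coeff. of Skewness)

$\beta_2$ (beta two) = $\dfrac{\mu_4}{\mu_2^2}$ (Coeff. of Kurtosis)

$\gamma_1$ (gamma one) = $\pm\sqrt{\beta_1}$, taking the sign of $\mu_3$ (Coeff. of Skewness)

$\gamma_2$ (gamma two) = $\beta_2 - 3$ (Coeff. of Kurtosis)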
Marks    Frequency
0-10      5
10-20    20
20-30    15
30-40    45
40-50    10
50-60     5

Find the measure of skewness based on moments.
$\bar{X} = 30$

Mode = $30 + \dfrac{10(45 - 15)}{2 \times 45 - 15 - 10} = 34.6154$

Absolute measure of skewness (Sk):

Sk = $\bar{X}$ − Mode = −4.6154
Taking A = 35 and i = 10:

Marks    Mid-point (m)    f    d = (m − 35)/10    fd    fd²    fd³    fd⁴
0-10      5               5    -3                -15     45   -135    405
10-20    15              20    -2                -40     80   -160    320
20-30    25              15    -1                -15     15    -15     15
30-40    35              45     0                  0      0      0      0
40-50    45              10    +1                 10     10     10     10
50-60    55               5    +2                 10     20     40     80
N = 100     $\sum fd$ = -50     $\sum fd^2$ = 170     $\sum fd^3$ = -260     $\sum fd^4$ = 830

$\mu_1' = \dfrac{\sum f_i d_i}{N} \times i = -5$

Mean = $\bar{X} = \mu_1' + A = 30$

$\mu_2' = \dfrac{\sum f_i d_i^2}{N} \times i^2 = 170$

$\mu_3' = \dfrac{\sum f_i d_i^3}{N} \times i^3 = -2600$

$\mu_4' = \dfrac{\sum f_i d_i^4}{N} \times i^4 = 83000$
$\mu_2 = \mu_2' - (\mu_1')^2 = 145$ = variance = $\sigma^2$

$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 = -300$

$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 54625$

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = 0.02952$ ; $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = 2.5981$

$\gamma_1 = -\sqrt{\beta_1} = -0.172$ (negative because $\mu_3 < 0$) ; $\gamma_2 = \beta_2 - 3 = -0.4019$

Since $\gamma_1$ is negative, the distribution is negatively skewed.
Since $\gamma_2$ is negative, the curve is platykurtic.
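The moment calculations above can be cross-checked by computing the central moments directly from the class mid-points. This Python sketch is only illustrative; the function name `central_moments` is hypothetical, and $\gamma_1$ is computed as $\mu_3 / \mu_2^{3/2}$, which equals $\pm\sqrt{\beta_1}$ with the sign of $\mu_3$.

```python
def central_moments(mids, freqs):
    """Mean and the 2nd, 3rd and 4th moments about the mean."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(mids, freqs)) / n

    def mu(r):
        return sum(f * (m - mean) ** r for m, f in zip(mids, freqs)) / n

    return mean, mu(2), mu(3), mu(4)


mids = [5, 15, 25, 35, 45, 55]
freqs = [5, 20, 15, 45, 10, 5]

mean, mu2, mu3, mu4 = central_moments(mids, freqs)
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
gamma1 = mu3 / mu2 ** 1.5  # carries the sign of mu3
gamma2 = beta2 - 3

print(mean, mu2, mu3, mu4)                # 30.0 145.0 -300.0 54625.0
print(round(beta1, 5), round(beta2, 4))   # 0.02952 2.5981
print(round(gamma1, 3), round(gamma2, 4)) # -0.172 -0.4019
```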
Kurtosis
Kurtosis enables us to have an idea about the 'flatness' or 'peakedness'
in the region about the mode of a frequency curve.

"Convexity of the frequency curve"

It is measured by the coefficient $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ or $\gamma_2 = \beta_2 - 3$.

(i) $\beta_2 = 3$, i.e., $\gamma_2 = 0$ : mesokurtic curve
(ii) $\beta_2 < 3$, i.e., $\gamma_2 < 0$ : platykurtic curve
(iii) $\beta_2 > 3$, i.e., $\gamma_2 > 0$ : leptokurtic curve