Module 1
COLLECTION
➢ Presentation of Data
➢ Analysis of Data
➢ Interpretation of results
Types of Statistical Data:
Primary Data: Primary data are those which are collected from the units or
individuals directly and these data have never been used for
any purpose earlier.
Example:
If a researcher wants to know the impact of the noon meal scheme on school
children, he has to undertake a survey and collect data on the opinions
of parents and children by asking relevant questions. Data collected in this
way for the purpose at hand are called primary data.
Methods for Collecting Primary Data:
The primary data can be collected by the following five methods.
1. Direct personal interviews
2. Indirect Oral interviews
3. Information from correspondents
4. Mailed questionnaire method
5. Schedules sent through enumerators
Secondary Data:
Secondary data are data that were collected by some individual or agency and
statistically treated to draw certain conclusions, and that are now used and
analyzed again to extract some other information.
Frequency Distribution:
Frequency distribution is a series when a number of observations with similar or
closely related values are put in separate bunches or groups, each group being in
order of magnitude in a series.
(ii) Class-interval
(iii) Class-frequency
Data Analysis:
➢ Measures of Skewness
➢ Measures of Kurtosis
Measures of Central tendency or average
Quite often the entries in a data set cluster around a central
(or middle) value. This behavior of the data set is called the central
tendency. The main challenge is to locate the central value around which the
clustering takes place.
❑ Arithmetic Mean
❑ Median
❑ Mode
❑ Geometric Mean
❑ Harmonic Mean
Arithmetic Mean (A.M.)
A.M.: The arithmetic mean of a set of observations is their sum
divided by the number of observations.
Simple A.M.: $\bar{X} = \dfrac{x_1 + x_2 + \dots + x_n}{N} = \dfrac{\sum_{i=1}^{n} x_i}{N}$
N - Number of observations.
A.M. For Discrete Series:
Direct Method: $\bar{X} = \dfrac{\sum_{i=1}^{n} f_i x_i}{N}$
f - Frequency of the given set of observations
$N = \sum_{i=1}^{n} f_i$ = Total number of observations
Marks             20  30  40  50  60  70
No. of Students    8  12  20  10   6   4
Marks   No. of Students (f)   fx
20       8                    160
30      12                    360
40      20                    800
50      10                    500
60       6                    360
70       4                    280
Total   N = 60                $\sum fx = 2460$
$\bar{X} = \dfrac{\sum f_i x_i}{N} = \dfrac{2460}{60} = 41$
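The direct method for a discrete series can be checked with a few lines of Python. This is only an illustrative sketch; the variable names (`marks`, `freq`, `n_total`, `sum_fx`) are chosen here and do not come from the slides.

```python
# Marks and number of students from the discrete-series table.
marks = [20, 30, 40, 50, 60, 70]
freq = [8, 12, 20, 10, 6, 4]

n_total = sum(freq)                                # N = sum of frequencies
sum_fx = sum(f * x for f, x in zip(freq, marks))   # sum of f * x
mean = sum_fx / n_total                            # direct method

print(n_total, sum_fx, mean)
```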
A.M. For Continuous Series:
Direct Method: $\bar{X} = \dfrac{\sum_{i=1}^{n} f_i x_i}{N}$
f - Frequency of the given set of observations
x - mid-point of each class
$N = \sum_{i=1}^{n} f_i$ = Total number of observations
Marks   No. of Students (f)   Mid-point (x)   fx
0-10    12                     5               60
10-20   18                    15              270
20-30   27                    25              675
30-40   20                    35              700
40-50   17                    45              765
50-60    6                    55              330
Total   N = 100                               $\sum fx = 2800$
$\bar{X} = \dfrac{\sum f_i x_i}{N} = \dfrac{2800}{100} = 28$
Using the Deviation:
If the values of x or f are large, calculating the A.M. by the above
formula is time-consuming and tedious.
The arithmetic is reduced to a great extent by taking the deviations
of the given values from an arbitrary point ‘A’:
Let $d_i = x_i - A \Rightarrow f_i d_i = f_i (x_i - A)$
$\bar{X} = A + \dfrac{1}{N} \sum_{i=1}^{n} f_i d_i$
or, with step deviations $d_i = \dfrac{x_i - A}{h}$,
$\bar{X} = A + \dfrac{h}{N} \sum_{i=1}^{n} f_i d_i$
h (or i) - common magnitude (width) of the class intervals
Ex:
C.I.        0-8  8-16  16-24  24-32  32-40  40-48
Frequency    8    7     16     24     15     7
Taking A = 28 and h = 8:
C.I.    Mid-Value (x)   Frequency (f)   d = (x − A)/h   fd
0-8      4               8              -3              -24
8-16    12               7              -2              -14
16-24   20              16              -1              -16
24-32   28              24               0                0
32-40   36              15               1               15
40-48   44               7               2               14
Total                   77                              -25
$\bar{X} = A + \dfrac{h}{N} \sum_{i=1}^{n} f_i d_i = 28 + \dfrac{8 \times (-25)}{77} \approx 25.403$
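As a cross-check, the step-deviation method should agree with the direct method on the same data. The sketch below computes both; the names `mean_step` and `mean_direct` are illustrative only.

```python
# Mid-values and frequencies of the classes 0-8, 8-16, ..., 40-48.
mid_values = [4, 12, 20, 28, 36, 44]
freq = [8, 7, 16, 24, 15, 7]
A, h = 28, 8                      # assumed mean and class width

n_total = sum(freq)
d = [(x - A) / h for x in mid_values]            # step deviations
sum_fd = sum(f * di for f, di in zip(freq, d))   # sum of f * d

mean_step = A + h * sum_fd / n_total             # step-deviation method
mean_direct = sum(f * x for f, x in zip(freq, mid_values)) / n_total

print(sum_fd, round(mean_step, 4), round(mean_direct, 4))
```

Both methods give the same mean, which is the point of the shortcut: only the arithmetic changes, not the result.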
Median:
Median of a distribution is the value of the variable which divides it into two equal parts.
It is the value which exceeds and is exceeded by the same number of observations. Thus the
median is called a “positional average”.
Ex: Find the median of the values 25, 20, 15, 35, 18.
Ex: Find the median of the values 8, 20, 50, 25, 15, 30.
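The two exercises above can be worked by sorting the values and picking the middle item (or the average of the two middle items when the count is even). The helper `median_of` below is a hypothetical name used only for this sketch.

```python
def median_of(values):
    """Median of individual observations: sort, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd count: single middle item
        return ordered[mid]
    # even count: average of the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2

median_odd = median_of([25, 20, 15, 35, 18])        # 15, 18, 20, 25, 35
median_even = median_of([8, 20, 50, 25, 15, 30])    # 8, 15, 20, 25, 30, 50

print(median_odd, median_even)
```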
For Grouped data:
Ex:
x   1   2   3   4   5   6   7   8   9
f   8  10  11  16  20  25  15   9   6
x    f    c.f.
1    8      8
2   10     18
3   11     29
4   16     45
5   20     65
6   25     90
7   15    105
8    9    114
9    6    120
N = 120
$\dfrac{N+1}{2} = 60.5$
The cumulative frequency (c.f.) just greater than $\dfrac{N+1}{2}$ is 65, and the value of x
corresponding to 65 is 5.
∴ Median is 5.
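The cumulative-frequency lookup for a discrete series can be sketched as follows. The code assumes the values of x are already in ascending order, as in the table; variable names are illustrative.

```python
from itertools import accumulate

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
f = [8, 10, 11, 16, 20, 25, 15, 9, 6]

cum_freq = list(accumulate(f))     # running totals (c.f. column)
n_total = cum_freq[-1]
position = (n_total + 1) / 2       # position of the median item

# First value whose cumulative frequency reaches the median position.
median = next(xi for xi, c in zip(x, cum_freq) if c >= position)

print(cum_freq, position, median)
```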
(ii) Continuous Frequency Distribution:
In the case of a continuous frequency distribution, the class corresponding to the c.f.
just greater than $\dfrac{N}{2}$ is called the median class, and the value of the median is obtained
by the following formula:
$\text{Median} = l + \dfrac{h}{f}\left(\dfrac{N}{2} - c\right)$
where l is the lower limit of the median class, h its width, f its frequency, and c the
cumulative frequency of the class preceding it.
Note: The median formula can be used only for continuous classes without
any gaps, i.e., for exclusive type classifications.
Ex: Find the median of the following data:
Wages (in Rs.)   2000-3000  3000-4000  4000-5000  5000-6000  6000-7000
No. of workers   3          5          20         10         5
Solution:
Wages (in Rs.)   No. of Employees   c.f.
2000-3000         3                  3
3000-4000         5                  8
4000-5000        20                 28
5000-6000        10                 38
6000-7000         5                 43
N = 43
$\dfrac{N}{2} = \dfrac{43}{2} = 21.5$
The cumulative frequency just greater than 21.5 is 28 and the corresponding
class is 4000-5000. Thus the median class is 4000-5000.
l = 4000; h = 1000; f = 20; c = 8
$\text{Median} = 4000 + \dfrac{1000}{20}(21.5 - 8)$
∴ Median = 4675.
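The same solution can be traced in code: walk through the classes, accumulate the frequencies, and apply the formula at the first class whose cumulative frequency exceeds N/2. This is a minimal sketch that assumes equal class widths; the names are illustrative.

```python
lower_limits = [2000, 3000, 4000, 5000, 6000]
freq = [3, 5, 20, 10, 5]
h = 1000                           # common class width

n_total = sum(freq)
half = n_total / 2

c = 0                              # c.f. of the classes before the current one
median = None
for l, f in zip(lower_limits, freq):
    if c + f > half:               # this is the median class
        median = l + (h / f) * (half - c)
        break
    c += f

print(half, c, median)
```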
MODE
Definition: Mode is the value of the variable which is predominant
in the series.
2. For Continuous Frequency distribution:
Example: Find mode
Frequency 5 8 7 12 28 20 10 10
$\text{Mode} = 40 + \dfrac{10(28-12)}{2 \times 28 - 12 - 20} = 46.67$
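The calculation above has the form lower limit + width × (f1 − f0) / (2·f1 − f0 − f2), where f1 is the highest frequency and f0, f2 are the frequencies of the classes before and after it. The sketch below reproduces it. Since the class intervals are not shown with the example, the code assumes classes of width 10 starting at 0 (so that the fifth class starts at 40, as in the worked line); that starting point is an assumption, not something stated on the slide.

```python
freq = [5, 8, 7, 12, 28, 20, 10, 10]
h = 10                # class width, taken from the worked calculation
first_lower = 0       # ASSUMPTION: classes are 0-10, 10-20, ..., 70-80

k = freq.index(max(freq))             # index of the modal class
l = first_lower + k * h               # lower limit of the modal class
f1, f0, f2 = freq[k], freq[k - 1], freq[k + 1]

mode = l + h * (f1 - f0) / (2 * f1 - f0 - f2)

print(l, round(mode, 2))
```

The indexing `freq[k - 1]` and `freq[k + 1]` only works when the modal class is neither first nor last, which matches the restriction the notes place on this formula.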
A distribution having only one mode is called unimodal.
If it has more than one mode, it is called bimodal or multimodal.
Note: In the following three cases, the mode cannot be obtained by using the above
formula:
(a) When the highest frequency is observed at the beginning of the frequency table.
(b) When the highest frequency is observed at the end of the frequency table.
(c) When two or more class intervals contain the same maximum frequency.
However, in these three cases the mode can be obtained by using either the
‘grouping method’ or the empirical relationship between the arithmetic mean,
median and mode.
The empirical relationship between mean, median and mode is
Mode = 3 Median – 2 Mean
Example: Calculate the mean, median and mode for the following data.
Wages (in lakhs)   15-19  20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-59  60-64
No. of workers     31     47     59     78     104    113    81     60     52     25
Mean = 39.52
Median = 39.77
Mode = 40.6
Example: Calculate the Mean, Median and Mode for the
following data.
Variable 10-13 13-16 16-19 19-22 22-25 25-28 28-31 31-34 34-37 37-40
Frequency 8 15 27 51 75 54 36 18 9 7
Mean = 24.19
Median = 23.96
Mode = 23.6
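The three stated answers for the 10-13, …, 37-40 data can be reproduced with the grouped-data formulas used earlier in this module. The sketch below assumes equal class widths of 3 and uses the mode formula in the form applied in the earlier worked example; all names are illustrative.

```python
lower_limits = list(range(10, 40, 3))          # 10, 13, ..., 37
freq = [8, 15, 27, 51, 75, 54, 36, 18, 9, 7]
h = 3                                          # class width

n_total = sum(freq)

# Mean from class mid-points (direct method).
mean = sum(f * (l + h / 2) for l, f in zip(lower_limits, freq)) / n_total

# Median: first class whose cumulative frequency exceeds N/2.
c = 0
for l, f in zip(lower_limits, freq):
    if c + f > n_total / 2:
        median = l + (h / f) * (n_total / 2 - c)
        break
    c += f

# Mode: class with the highest frequency and its two neighbours.
k = freq.index(max(freq))
f1, f0, f2 = freq[k], freq[k - 1], freq[k + 1]
mode = lower_limits[k] + h * (f1 - f0) / (2 * f1 - f0 - f2)

print(round(mean, 2), round(median, 2), round(mode, 2))
```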
Geometric mean (G.M)
Harmonic mean (H.M)
        Series A   Series B   Series C
        200        200          1
        200        205        989
        200        202          2
        200        203          3
        200        190          5
Mean =  200        200        200
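The three series share the same mean but are spread very differently, which is why an average alone does not describe a data set. A quick sketch comparing the mean with the spread between the largest and smallest values (names are illustrative):

```python
series = {
    "A": [200, 200, 200, 200, 200],
    "B": [200, 205, 202, 203, 190],
    "C": [1, 989, 2, 3, 5],
}

summary = {}
for name, values in series.items():
    mean = sum(values) / len(values)
    spread = max(values) - min(values)     # largest value minus smallest value
    summary[name] = (mean, spread)
    print(f"Series {name}: mean = {mean}, largest - smallest = {spread}")
```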
Measure of Dispersion
The degree to which numerical data tend to spread about an
average value is called variation or dispersion of data.
There are two kinds of measures of dispersion:
Absolute measure of dispersion
Relative measure of dispersion
Absolute measure of dispersion
An absolute measure of dispersion indicates the amount of variation in a set
of values in terms of the units of observation.
Example
When rainfall on different days is available in mm, any absolute measure of
dispersion gives the variation in rainfall in mm.
Relative measure of dispersion
Relative measures of dispersion are free from the units of measurement of
the observations.
They are pure numbers. They are used to compare the variation in two or
more sets which have different units of measurement.
Absolute Measure of Dispersion:
Range
Quartile Deviation
Mean Deviation
Standard Deviation
Range
Definition: Difference between the value of the smallest item and the
value of the largest item in the distribution.
$\text{Range} = L - S$
L - Largest value, S - Smallest value
Example 1: The following are the prices of shares of a company from
Monday to Saturday:
In a frequency distribution, range is calculated by taking the difference
between the lower limit of the lowest class and the upper limit of the
highest class.
Range = L – S = 70 – 10 = 60
$\text{Coefficient of Range} = \dfrac{L-S}{L+S} = \dfrac{70-10}{70+10} = \dfrac{60}{80} = 0.75$
Partitions:
These are the values which divide the series into a number of equal
parts.
Quartiles: The three points which divide the series into four equal
parts are called quartiles. These are Q1, Q2 (median), Q3.
Deciles: The nine points which divide the series into ten equal parts
are called deciles. These are D1, D2, …, D5 (median), …, D9.
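Quartiles
First Quartile: $Q_1$ = Size of the $\dfrac{N+1}{4}$th item (Discrete series)
$Q_1$ = Size of the $\dfrac{N}{4}$th item (Continuous series)
$Q_1 = l + \dfrac{\frac{N}{4} - c.f.}{f} \times i$
Third Quartile: $Q_3$ = Size of the $3\left(\dfrac{N+1}{4}\right)$th item (Discrete series)
$Q_3$ = Size of the $\dfrac{3N}{4}$th item (Continuous series)
$Q_3 = l + \dfrac{\frac{3N}{4} - c.f.}{f} \times i$
Here l is the lower limit of the quartile class, c.f. the cumulative frequency of the
preceding class, f the frequency of the quartile class and i the class width.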
Example:
Quartiles: Grouped data-Discrete series
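Example : Calculate Q1 and Q3 for the following data.
Roll No.   1   2   3   4   5   6   7
Marks     20  28  40  12  30  15  50
Marks arranged in ascending order: 12, 15, 20, 28, 30, 40, 50
$Q_1$ = Size of the $\dfrac{N+1}{4}$th item = Size of the $\dfrac{7+1}{4}$th item = 2nd item.
Size of 2nd item is 15. Hence Q1 = 15.
$Q_3$ = Size of the $3\left(\dfrac{N+1}{4}\right)$th item = Size of the $3\left(\dfrac{7+1}{4}\right)$th item = 6th item.
Size of 6th item is 40. Hence Q3 = 40.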
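A short sketch of the positional rule for individual observations follows. The helper `item_at` is hypothetical; its linear interpolation for fractional positions is a common convention that the slides do not state, and it is not exercised by this example because both positions are whole numbers.

```python
def item_at(ordered, pos):
    """Value at 1-based position pos in sorted data.

    ASSUMPTION: fractional positions are interpolated linearly
    between neighbouring items (not specified in the notes).
    """
    lo = int(pos)
    frac = pos - lo
    if frac == 0:
        return ordered[lo - 1]
    return ordered[lo - 1] + frac * (ordered[lo] - ordered[lo - 1])

marks = [20, 28, 40, 12, 30, 15, 50]
ordered = sorted(marks)            # quartiles need ascending order
n = len(ordered)

q1 = item_at(ordered, (n + 1) / 4)         # 2nd item
q3 = item_at(ordered, 3 * (n + 1) / 4)     # 6th item

print(ordered, q1, q3)
```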
Quartiles: Grouped data - Continuous series
$Q_1 = l + \dfrac{\frac{N}{4} - c.f.}{f} \times i$
$Q_3 = l + \dfrac{\frac{3N}{4} - c.f.}{f} \times i$
Example : Compute the value of Q1 and Q3 for following data:
f 12 19 5 10 9 6 6
Solution:
Marks   Frequency   Cumulative Frequency
10-20   12          12
20-30   19          31
30-40    5          36
40-50   10          46
50-60    9          55
60-70    6          61
70-80    6          67
N = 67
$Q_1$ = Size of the $\dfrac{N}{4}$th item = Size of the $\dfrac{67}{4}$th item = 16.75th item.
Q1 lies in the interval 20-30.
$Q_1 = l + \dfrac{\frac{N}{4} - c.f.}{f} \times i$, with l = 20, N/4 = 16.75, c.f. = 12, f = 19, i = 10
$Q_1 = 20 + \dfrac{\frac{67}{4} - 12}{19} \times 10 = 20 + 2.5 = 22.5$
Hence Q1 = 22.5
$Q_3$ = Size of the $\dfrac{3N}{4}$th item = Size of the $\dfrac{3 \times 67}{4}$th item = 50.25th item.
Q3 lies in the class 50-60.
$Q_3 = l + \dfrac{\frac{3N}{4} - c.f.}{f} \times i$, with l = 50, 3N/4 = 50.25, c.f. = 46, f = 9, i = 10
$Q_3 = 50 + \dfrac{50.25 - 46}{9} \times 10 = 50 + 4.72 = 54.72$
Hence Q3 = 54.72
Deciles: $D_4$ = Size of the $4\left(\dfrac{N+1}{10}\right)$th item in individual and discrete series.
$D_4$ = Size of the $\dfrac{4N}{10}$th item in continuous series.
Percentiles: $P_{60}$ = Size of the $60\left(\dfrac{N+1}{100}\right)$th item in discrete series.
$P_{60}$ = Size of the $\dfrac{60N}{100}$th item in continuous series.
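$D_4$ = Size of the $\dfrac{4N}{10}$th item = $\dfrac{4 \times 67}{10}$ = 26.8th item.
D4 lies in the interval 20-30.
$D_4 = l + \dfrac{\frac{4N}{10} - c.f.}{f} \times i$, with l = 20, 4N/10 = 26.8, c.f. = 12, f = 19, i = 10
$D_4 = 20 + \dfrac{26.8 - 12}{19} \times 10 = 27.79$
$P_{60}$ = Size of the $\dfrac{60N}{100}$th item = $\dfrac{60 \times 67}{100}$ = 40.2th item.
P60 lies in the interval 40-50.
$P_{60} = l + \dfrac{\frac{60N}{100} - c.f.}{f} \times i$, with l = 40, 60N/100 = 40.2, c.f. = 36, f = 10, i = 10
$P_{60} = 40 + \dfrac{40.2 - 36}{10} \times 10 = 44.2$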
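Quartiles, deciles and percentiles of a continuous series all use the same interpolation, differing only in the target position k·N/parts. The sketch below wraps this in one hypothetical helper, `partition_value`, and applies it to the marks distribution with N = 67.

```python
lower_limits = [10, 20, 30, 40, 50, 60, 70]
freq = [12, 19, 5, 10, 9, 6, 6]
width = 10                                   # class width i

def partition_value(k, parts):
    """k-th of `parts` partition values, e.g. (1, 4) for Q1, (60, 100) for P60."""
    n_total = sum(freq)
    target = k * n_total / parts             # position of the required item
    cf = 0                                   # c.f. of the preceding classes
    for l, f in zip(lower_limits, freq):
        if cf + f > target:                  # class containing the item
            return l + (target - cf) / f * width
        cf += f
    raise ValueError("target position lies beyond the last class")

q1 = partition_value(1, 4)
q3 = partition_value(3, 4)
d4 = partition_value(4, 10)
p60 = partition_value(60, 100)

print(round(q1, 2), round(q3, 2), round(d4, 2), round(p60, 2))
```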
Quartile Deviation
Definition: Average amount by which the two quartiles differ from the
median.
$\text{Quartile Deviation (Q.D.)} = \dfrac{Q_3 - Q_1}{2}$
Relative measure of Q.D.
$\text{Coefficient of Q.D.} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
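Example : Calculate the value of Q.D. and the coefficient of Q.D.
from the following data.
Roll No.   1   2   3   4   5   6   7
Marks     20  28  40  12  30  15  50
Marks arranged in ascending order: 12, 15, 20, 28, 30, 40, 50
$Q_1$ = Size of the $\dfrac{N+1}{4}$th item = Size of the $\dfrac{7+1}{4}$th item = 2nd item.
Size of 2nd item is 15. Hence Q1 = 15.
$Q_3$ = Size of the $3\left(\dfrac{N+1}{4}\right)$th item = Size of the $3\left(\dfrac{7+1}{4}\right)$th item = 6th item.
Size of 6th item is 40. Hence Q3 = 40.
$\therefore \text{Q.D.} = \dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 15}{2} = 12.5$
$\text{Coefficient of Q.D.} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 15}{40 + 15} = 0.455$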
Example : Compute the value of Q.D. and its coefficient from
the following data.
Marks             10  20  30  40  50  60
No. of Students    4   7  15   8   7   2
Solution:
Marks   Frequency   Cumulative frequency
10       4           4
20       7          11
30      15          26
40       8          34
50       7          41
60       2          43
$Q_1$ = Size of the $\dfrac{N+1}{4}$th item = Size of the $\dfrac{43+1}{4}$th item = 11th item.
Size of 11th item is 20. Hence Q1 = 20.
$Q_3$ = Size of the $3\left(\dfrac{N+1}{4}\right)$th item = Size of the $3\left(\dfrac{43+1}{4}\right)$th item = 33rd item.
Size of 33rd item is 40. Hence Q3 = 40.
$\text{Q.D.} = \dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 20}{2} = 10$
$\text{Coefficient of Q.D.} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 20}{40 + 20} = 0.333$
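For a discrete series the quartile positions are looked up in the cumulative-frequency column. The sketch below repeats the calculation; `item_at` is a hypothetical helper that returns the first value whose cumulative frequency reaches the requested position.

```python
from itertools import accumulate

marks = [10, 20, 30, 40, 50, 60]
freq = [4, 7, 15, 8, 7, 2]

cum_freq = list(accumulate(freq))
n_total = cum_freq[-1]

def item_at(position):
    """Value of the item at the given 1-based position in the series."""
    return next(x for x, c in zip(marks, cum_freq) if c >= position)

q1 = item_at((n_total + 1) / 4)          # 11th item
q3 = item_at(3 * (n_total + 1) / 4)      # 33rd item

qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(q1, q3, qd, round(coeff_qd, 3))
```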
Example : Compute the value of Q.D. and coefficient of Q.D.
from the following data
f 12 19 5 10 9 6 6
Solution:
Marks   Frequency   Cumulative Frequency
10-20   12          12
20-30   19          31
30-40    5          36
40-50   10          46
50-60    9          55
60-70    6          61
70-80    6          67
N = 67
$Q_1$ = Size of the $\dfrac{N}{4}$th item = Size of the $\dfrac{67}{4}$th item = 16.75th item.
Q1 lies in the interval 20-30.
$Q_1 = l + \dfrac{\frac{N}{4} - c.f.}{f} \times i$, with l = 20, N/4 = 16.75, c.f. = 12, f = 19, i = 10
$Q_1 = 20 + \dfrac{\frac{67}{4} - 12}{19} \times 10 = 20 + 2.5 = 22.5$
Hence Q1 = 22.5
$Q_3$ = Size of the $\dfrac{3N}{4}$th item = Size of the $\dfrac{3 \times 67}{4}$th item = 50.25th item.
Q3 lies in the class 50-60.
$Q_3 = l + \dfrac{\frac{3N}{4} - c.f.}{f} \times i$, with l = 50, 3N/4 = 50.25, c.f. = 46, f = 9, i = 10
$Q_3 = 50 + \dfrac{50.25 - 46}{9} \times 10 = 50 + 4.72 = 54.72$
Hence Q3 = 54.72
$\text{Q.D.} = \dfrac{Q_3 - Q_1}{2} = \dfrac{54.72 - 22.5}{2} = 16.11$
$\text{Coefficient of Q.D.} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{54.72 - 22.5}{54.72 + 22.5} = 0.4172$
Mean Deviation
Definition: M.D. is the average difference between the observations in a distribution and
the median or mean of that series.
Or
Mean deviation is the arithmetic mean of the deviations of a series computed from
any measure of central tendency (the mean, median or mode), with all the deviations
taken as positive.
$\text{M.D.} = \dfrac{\sum |D|}{N}$
where |D| is the deviation from the median, ignoring signs.
For individual observations:
(i) Compute the median of the series.
(ii) Take the deviations of the items from the median, ignoring ± signs, and denote
these deviations by |D|.
(iii) Obtain the total of these deviations, $\sum |D|$.
(iv) Divide the total obtained in step (iii) by the number of observations to
get the value of the mean deviation.
$\text{Co-efficient of M.D.} = \dfrac{\text{M.D.}}{\text{Median}}$
Example : Calculate the mean deviation of the two income
groups.
I (Rs.)   II (Rs.)
4000      3000
4200      4000
4400      4200
4600      4400
4800      4600
          4800
          5800
Solution:
Group I           Group II
Rs.     |D|       Rs.     |D|
4000    400       3000    1400
4200    200       4000     400
4400      0       4200     200
4600    200       4400       0
4800    400       4600     200
                  4800     400
                  5800    1400
N = 5, $\sum |D| = 1200$    N = 7, $\sum |D| = 4000$
Mean deviation, Group I: $\text{M.D.} = \dfrac{\sum |D|}{N}$
Median = $\dfrac{N+1}{2}$th item = $\dfrac{5+1}{2}$ = 3rd item. Size of the 3rd item = 4400.
$\text{M.D.} = \dfrac{1200}{5} = 240$
i.e., the average deviation of the individual incomes from the median income
is Rs. 240.
Mean deviation, Group II: $\text{M.D.} = \dfrac{\sum |D|}{N}$
Median = $\dfrac{N+1}{2}$th item = $\dfrac{7+1}{2}$ = 4th item.
Size of the 4th item = 4400.
$\text{M.D.} = \dfrac{4000}{7} = 571.43$
i.e., the average deviation of the individual incomes from the
median income is Rs. 571.43.
Co-efficient of M.D. (Group I) = $\dfrac{\text{M.D.}}{\text{Median}} = \dfrac{240}{4400} = 0.055$
Co-efficient of M.D. (Group II) = $\dfrac{571.43}{4400} = 0.13$
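The four steps for individual observations translate directly into code. The function `mean_deviation` below is an illustrative helper, not part of the original notes; it uses the standard-library `statistics.median`.

```python
from statistics import median

def mean_deviation(values):
    """Return (median, mean deviation about the median, coefficient of M.D.)."""
    med = median(values)
    total_abs_dev = sum(abs(v - med) for v in values)   # sum of |D|
    md = total_abs_dev / len(values)
    return med, md, md / med

group_1 = [4000, 4200, 4400, 4600, 4800]
group_2 = [3000, 4000, 4200, 4400, 4600, 4800, 5800]

med_1, md_1, coeff_1 = mean_deviation(group_1)
med_2, md_2, coeff_2 = mean_deviation(group_2)

print(med_1, md_1, round(coeff_1, 3))
print(med_2, round(md_2, 2), round(coeff_2, 2))
```

Note that the divisor for Group II is its own N = 7, which gives 4000 / 7 ≈ 571.43.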
Mean deviation – Discrete Series
$\text{M.D.} = \dfrac{\sum f|D|}{N}$
(i) Compute median of the series.
Example : The number of telephone calls received at an exchange in 245
successive one-minute intervals is shown in the following
frequency distribution. Compute the mean deviation about
the median.
Number of Calls   0   1   2   3   4   5   6   7
Frequency        14  21  25  43  51  40  39  12
Solution:
No. of Calls   f     c.f.   |D|   f|D|
0              14     14    4     56
1              21     35    3     63
2              25     60    2     50
3              43    103    1     43
4              51    154    0      0
5              40    194    1     40
6              39    233    2     78
7              12    245    3     36
N = 245, $\sum f|D| = 366$
Median = Size of the $\dfrac{N+1}{2}$th item = $\dfrac{245+1}{2}$ = 123rd item.
Hence the median value is 4.
$\text{M.D.} = \dfrac{\sum f|D|}{N} = \dfrac{366}{245} = 1.49$
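The telephone-call example can be reproduced as a sketch: find the median from the cumulative frequencies, then weight each absolute deviation by its frequency. Variable names are illustrative.

```python
from itertools import accumulate

calls = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [14, 21, 25, 43, 51, 40, 39, 12]

cum_freq = list(accumulate(freq))
n_total = cum_freq[-1]

# Median: value of the (N + 1)/2-th item.
position = (n_total + 1) / 2
median = next(x for x, c in zip(calls, cum_freq) if c >= position)

# Frequency-weighted sum of absolute deviations from the median.
sum_f_abs_d = sum(f * abs(x - median) for x, f in zip(calls, freq))
md = sum_f_abs_d / n_total

print(median, sum_f_abs_d, round(md, 2))
```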
In Continuous Series:
We have to obtain the mid-points of the various classes and take
the deviations of these mid-points from median.
Example: Find mean deviation from mean and median
Marks             0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of Students   20    25     32     40     42     35     10     8
Example: Calculate the coefficient of mean deviation from the
following data:
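Standard deviation
For the frequency distribution $x_i \mid f_i$; i = 1, 2, …, n,
$\text{Variance} = \sigma^2 = \dfrac{\sum (x_i - \bar{x})^2}{N} = \dfrac{1}{N}\sum x_i^2 - \left(\dfrac{1}{N}\sum x_i\right)^2 = \dfrac{1}{N}\sum x_i^2 - \bar{x}^2$
$\text{Standard deviation} = \sqrt{\text{variance}}$: $\sigma = \sqrt{\dfrac{1}{N}\sum (x_i - \bar{x})^2}$ or $\sigma = \sqrt{\dfrac{1}{N}\sum f_i (x_i - \bar{x})^2}$
Coefficient of Variation: $\text{C.V.} = \dfrac{\sigma}{\bar{x}} \times 100$ (Relative Measure)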
Example: The scores of two players A and B in ten innings during a
certain season are:
A   32  28  47  63  71  39  10  60  96  14
B   19  31  48  53  67  90  10  62  40  80
Solution: Calculation of Coefficient of Variation
X     $(X - \bar{X})$   $(X - \bar{X})^2$   Y     $(Y - \bar{Y})$   $(Y - \bar{Y})^2$
32    -14               196                 19    -31               961
28    -18               324                 31    -19               361
47    +1                1                   48    -2                4
63    +17               289                 53    +3                9
71    +25               625                 67    +17               289
39    -7                49                  90    +40               1600
10    -36               1296                10    -40               1600
60    +14               196                 62    +12               144
96    +50               2500                40    -10               100
14    -32               1024                80    +30               900
$\sum X = 460$   0      6500                $\sum Y = 500$   0      5968
$\bar{X} = \dfrac{460}{10} = 46$, $\bar{Y} = \dfrac{500}{10} = 50$
$\sigma_A^2 = \dfrac{\sum (x_i - \bar{x})^2}{N} = \dfrac{6500}{10} = 650$
$\sigma_B^2 = \dfrac{\sum (y_i - \bar{y})^2}{N} = \dfrac{5968}{10} = 596.8$
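$\sigma_A = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{N}} = 25.5$, $\sigma_B = \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{N}} = 24.43$
$\text{C.V.(A)} = \dfrac{\sigma_A}{\bar{x}} \times 100 = 55.43$, $\text{C.V.(B)} = \dfrac{\sigma_B}{\bar{y}} \times 100 = 48.86$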
$\sum X = 460$; $\sum (X_i - \bar{x}) = 0$; $\sum (X_i - \bar{x})^2 = 6500$
$\sum Y = 500$; $\sum (Y_i - \bar{y}) = 0$; $\sum (Y_i - \bar{y})^2 = 5968$
$\sigma_A = 25.5$, $\sigma_B = 24.43$
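The coefficient of variation for the two players can be recomputed as below. This is a sketch using the population formulas (divisor N) from this section; the helper name `describe` is hypothetical. The slides round σ_A to 25.5 before dividing, so the unrounded C.V.(A) printed here can differ from 55.43 in the last decimal place.

```python
import math

def describe(scores):
    """Return (mean, standard deviation, coefficient of variation in %)."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / n   # divisor N
    sd = math.sqrt(variance)
    return mean, sd, sd / mean * 100

player_a = [32, 28, 47, 63, 71, 39, 10, 60, 96, 14]
player_b = [19, 31, 48, 53, 67, 90, 10, 62, 40, 80]

mean_a, sd_a, cv_a = describe(player_a)
mean_b, sd_b, cv_b = describe(player_b)

print(mean_a, round(sd_a, 2), round(cv_a, 2))
print(mean_b, round(sd_b, 2), round(cv_b, 2))
```

Since C.V.(B) is smaller than C.V.(A), player B's scores vary less relative to the average.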
Example: Suppose that samples of polythene bags from two manufacturers, A and B,
are tested by a prospective buyer for bursting pressure, with the following results:
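$\bar{X}_A = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{N} \times i$
$\bar{X}_A = 17.45 + \dfrac{78}{110} \times 5 \approx 21$
$\sigma_A = \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2} \times i$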
For Manufacturer B
Bursting Pressure (lb.)   m       f    d = (m − 17.45)/5   fd     $fd^2$
4.95-9.95                  7.45    9   -2                  -18     36
9.95-14.95                12.45   11   -1                  -11     11
14.95-19.95               17.45   18    0                    0      0
19.95-24.95               22.45   32   +1                  +32     32
24.95-29.95               27.45   27   +2                  +54    108
29.95-34.95               32.45   13   +3                  +39    117
N = 110, $\sum fd = 96$, $\sum fd^2 = 304$
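$\bar{X}_B = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{N} \times i$
$\bar{X}_B = 17.45 + \dfrac{96}{110} \times 5 = 21.81$
$\sigma_B = \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2} \times i$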
$\bar{X}_A = 21$, $\bar{X}_B = 21.81$
$\sigma_A = 4.88$, $\sigma_B = 7.07$
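The figures for Manufacturer B follow from the step-deviation table. A minimal sketch, with the step deviations d entered exactly as tabulated (A = 17.45, class width 5); the variable names are illustrative.

```python
import math

freq = [9, 11, 18, 32, 27, 13]
d = [-2, -1, 0, 1, 2, 3]          # step deviations (m - 17.45) / 5 from the table
A, width = 17.45, 5

n_total = sum(freq)
sum_fd = sum(f * di for f, di in zip(freq, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freq, d))

mean_b = A + sum_fd / n_total * width
sd_b = math.sqrt(sum_fd2 / n_total - (sum_fd / n_total) ** 2) * width

print(n_total, sum_fd, sum_fd2, round(mean_b, 2), round(sd_b, 2))
```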
C.I.    f       C.I.    f
0-5     10      0-5     10
5-10    30      5-10    40
10-15   60      10-15   30
15-20   60      15-20   90
20-25   30      20-25   20
25-30   10      25-30   10
$\bar{X} = 15$, $\sigma = 6$ (for both distributions)
[Figure: histograms of the two frequency distributions above; both have $\bar{x} = 15$ and $\sigma = 6$.]
Skewness
When a series is not symmetrical it is said to be asymmetrical or skewed.
A distribution is said to be ‘skewed’ when the mean and median
fall at different points in the distribution, and the balance (or centre of
gravity) is shifted to one side or the other, to the left or right.
Dispersion is concerned with the amount of variation rather than with its
direction.
Skewness tells us about the direction of the variation or the departure from
symmetry.
Measure of Skewness
Absolute measure of skewness (Sk):
$S_k = \bar{X} - \text{Mode}$
(i) Karl Pearson’s coefficient of skewness.
Coefficient of Skewness: $S_k = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \dfrac{\bar{X} - M_o}{\sigma}$
In practice, however, the value of Sk rarely exceeds the limits of ±1.
(ii) Bowley’s coefficient of skewness
It is based on quartiles. In a symmetrical distribution the first and third
quartiles are equidistant from the median, i.e. $Q_3 - \text{Median} = \text{Median} - Q_1$.
$\therefore S_k = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
This measure is called the quartile measure of skewness and varies
between ±1.
(iii) Moments
Moment is a measure of a force with respect to its tendency to
provide rotation.
The strength of the tendency depends on the amount of force and the
distance from the origin of the point at which the force is exerted.
Central Moments (Moments about the Arithmetic Mean):
First Moment: $\mu_1 = \dfrac{\sum (X_i - \bar{X})}{N}$ (the sum of the deviations from the A.M. is always zero, so $\mu_1 = 0$)
Second Moment: $\mu_2 = \dfrac{\sum (X_i - \bar{X})^2}{N} = \sigma^2$ = Variance
Third Moment: $\mu_3 = \dfrac{\sum (X_i - \bar{X})^3}{N}$
Fourth Moment: $\mu_4 = \dfrac{\sum (X_i - \bar{X})^4}{N}$
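For a frequency distribution:
First Moment: $\mu_1 = \dfrac{\sum f_i (X_i - \bar{X})}{N}$
Second Moment: $\mu_2 = \dfrac{\sum f_i (X_i - \bar{X})^2}{N} = \sigma^2$ = Variance
Third Moment: $\mu_3 = \dfrac{\sum f_i (X_i - \bar{X})^3}{N}$ or $\dfrac{\sum f_i x_i^3}{N}$
Fourth Moment: $\mu_4 = \dfrac{\sum f_i (X_i - \bar{X})^4}{N}$ or $\dfrac{\sum f_i x_i^4}{N}$
where $x_i = X_i - \bar{X}$ denotes the deviation from the mean.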
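Conversion of moments about an Arbitrary origin into
Moments about mean
$\mu_1 = \mu_1' - \mu_1' = 0$
$\mu_2 = \mu_2' - (\mu_1')^2$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$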
Non-central Moments (Moments about the Assumed Mean)
Where the actual mean is a fraction, it is difficult to calculate moments by applying the
above formulae. In such cases we can first compute moments about an arbitrary origin (A).
$\mu_1' = \dfrac{\sum (X_i - A)}{N} = \bar{X} - A$; for a frequency distribution, $\mu_1' = \dfrac{\sum f_i (X_i - A)}{N}$
Mean = $\bar{X} = \mu_1' + A$
$\beta_1$ (beta one) $= \dfrac{\mu_3^2}{\mu_2^3}$ (Coeff. of Skewness)
$\beta_2$ (beta two) $= \dfrac{\mu_4}{\mu_2^2}$ (Coeff. of Kurtosis)
Marks Frequency
0-10 5
10-20 20
20-30 15
30-40 45
40-50 10
50-60 5
$\bar{X} = 30$
$\text{Mode} = 30 + \dfrac{10(45-15)}{2 \times 45 - 15 - 10} = 34.6154$
$S_k = \bar{X} - \text{Mode} = -4.6154$
Marks   Mid point (m)   f    d = (m − 35)/10   fd    $fd^2$   $fd^3$   $fd^4$
0-10     5               5   -3                -15    45      -135     405
10-20   15              20   -2                -40    80      -160     320
20-30   25              15   -1                -15    15       -15      15
30-40   35              45    0                  0     0         0       0
40-50   45              10   +1                 10    10        10      10
50-60   55               5   +2                 10    20        40      80
N = 100, $\sum fd = -50$, $\sum fd^2 = 170$, $\sum fd^3 = -260$, $\sum fd^4 = 830$
$\mu_1' = \dfrac{\sum f_i d_i}{N} \times i = -5$
$\mu_2' = \dfrac{\sum f_i d_i^2}{N} \times i^2 = 170$
$\mu_3' = \dfrac{\sum f_i d_i^3}{N} \times i^3 = -2600$
$\mu_4' = \dfrac{\sum f_i d_i^4}{N} \times i^4 = 83000$
Mean = $\bar{X} = \mu_1' + A = 30$
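$\mu_2 = \mu_2' - (\mu_1')^2 = 170 - 25 = 145$ = variance $= \sigma^2$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = -2600 - 3(170)(-5) + 2(-5)^3 = -300$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 83000 - 52000 + 25500 - 1875 = 54625$
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = 0.02952$; $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = 2.5981$
$\gamma_1 = \pm\sqrt{\beta_1} = -0.172$ (taking the sign of $\mu_3$); $\gamma_2 = \beta_2 - 3 = -0.4019$
Kurtosis is measured by the coefficient $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ or $\gamma_2 = \beta_2 - 3$:
(i) $\beta_2 = 3$, i.e., $\gamma_2 = 0$ : mesokurtic curve
(ii) $\beta_2 < 3$, i.e., $\gamma_2 < 0$ : platykurtic curve
(iii) $\beta_2 > 3$, i.e., $\gamma_2 > 0$ : leptokurtic curve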
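The whole moments calculation for the marks distribution (A = 35, class width 10) can be traced in one short sketch. The conversion formulas for the third and fourth central moments are the standard ones; variable names such as `raw` and `curve` are illustrative.

```python
mid_points = [5, 15, 25, 35, 45, 55]
freq = [5, 20, 15, 45, 10, 5]
A, width = 35, 10

n_total = sum(freq)
d = [(m - A) // width for m in mid_points]     # step deviations -3 .. +2

# Raw moments about A: mu_r' = (sum f * d^r / N) * width^r
raw = [
    sum(f * di ** r for f, di in zip(freq, d)) * width ** r / n_total
    for r in range(1, 5)
]
m1, m2, m3, m4 = raw

# Central moments from raw moments.
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
gamma1 = mu3 / mu2 ** 1.5       # carries the sign of mu3
gamma2 = beta2 - 3

# Classify the curve from beta2 (exact equality to 3 is rare with real data).
curve = "mesokurtic" if beta2 == 3 else ("platykurtic" if beta2 < 3 else "leptokurtic")

print(raw, mu2, mu3, mu4)
print(round(beta1, 5), round(beta2, 4), round(gamma1, 3), round(gamma2, 4), curve)
```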