Statistics:
Statistics is "the science which deals with the collection, analysis and interpretation of numerical data".
09-01-2023 1
Main Functions of Statistics:
➢ Collection of Data
➢ Presentation of Data
➢ Analysis of Data
➢ Interpretation of results
Types of Statistical Data:
o Primary Data: Primary data are those which are collected from the units or
individuals directly and these data have never been used for
any purpose earlier.
o Secondary Data: Secondary data are those which have been collected by some individual or
agency and statistically treated to draw certain conclusions,
and which are now used and analyzed again to extract some
other information.
❖ Population
❖ Sample
❖ Parameter
❖ Statistic
❖ Sampling
❖ Random Sampling
❖ Non-Random Sampling
▪ Variable
▪ Frequency
▪ Discrete frequency distribution
▪ Continuous frequency distribution
Marks:             20  30  40  50  60  70
No. of Students:    8  12  20  10   6   4
Formation of Frequency Distribution
Classification according to class-intervals:
(i) Class Limits
(ii) Class-interval
(iii) Class-frequency
Data Analysis:
➢ Measures of Skewness
➢ Measures of Kurtosis
Measures of Central tendency or average
❑ Arithmetic Mean
❑ Median
❑ Mode
❑ Geometric Mean
❑ Harmonic Mean
Arithmetic Mean (A.M.)
A.M.: The arithmetic mean of a set of observations is their sum
divided by the number of observations.
Simple A.M.: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i$
N - Number of observations.
A.M. For Discrete Series:
Direct Method: $\bar{X} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$
f - Frequency of the given set of observations
$N = \sum_{i=1}^{n} f_i$ = Total number of observations
Marks (x)   No. of Students (f)   fx
20           8                    160
30          12                    360
40          20                    800
50          10                    500
60           6                    360
70           4                    280
            $N = 60$              $\sum fx = 2460$
$\bar{X} = \frac{2460}{60} = 41$
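The direct method for a discrete series can be checked with a few lines of Python. This is only a sketch: the function name `discrete_mean` and the variable names are illustrative and do not come from the slides.

```python
def discrete_mean(values, freqs):
    """Arithmetic mean of a discrete frequency distribution (direct method)."""
    n = sum(freqs)                                      # N = sum of f_i
    total = sum(f * x for x, f in zip(values, freqs))   # sum of f_i * x_i
    return total / n


marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]

mean_marks = discrete_mean(marks, students)
print(mean_marks)  # 41.0
```

The same function works for a continuous series if the class mid-points are passed as `values`.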
A.M. For Continuous Series:
Direct Method: $\bar{X} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$
f - Frequency of each class
x - mid-point of each class
$N = \sum_{i=1}^{n} f_i$ = Total number of observations
Marks    No. of Students (f)   Mid-point (x)   fx
0-10     12                     5               60
10-20    18                    15              270
20-30    27                    25              675
30-40    20                    35              700
40-50    17                    45              765
50-60     6                    55              330
         $N = 100$                             $\sum fx = 2800$
$\bar{X} = \frac{2800}{100} = 28$
Using the Deviation:
If the values of x or f are large, calculating the A.M. by the above
formula is time-consuming and tedious.
The arithmetic is reduced to a great extent by taking the deviations
of the given values from an arbitrary point 'A':
Let $d_i = x_i - A \Rightarrow f_i d_i = f_i (x_i - A)$. Then
$\bar{X} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i$
or, with step deviations $d_i = \frac{x_i - A}{h}$,
$\bar{X} = A + \frac{h}{N}\sum_{i=1}^{n} f_i d_i$
h (or i) - common magnitude (width) of the class intervals
Ex:
C.I.:        0-8   8-16   16-24   24-32   32-40   40-48
Frequency:     8      7      16      24      15       7
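As a rough illustration of the step-deviation method on the class intervals 0-8, ..., 40-48 with frequencies 8, 7, 16, 24, 15, 7, the sketch below takes A = 28 and h = 8. The choice of A is an assumption made here for illustration (any mid-point would do); the function name is also hypothetical.

```python
def step_deviation_mean(midpoints, freqs, assumed_mean, width):
    """A.M. via step deviations d_i = (x_i - A) / h."""
    n = sum(freqs)
    steps = [(x - assumed_mean) / width for x in midpoints]
    sum_fd = sum(f * d for f, d in zip(freqs, steps))
    return assumed_mean + width * sum_fd / n


class_limits = [(0, 8), (8, 16), (16, 24), (24, 32), (32, 40), (40, 48)]
freqs = [8, 7, 16, 24, 15, 7]
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

# A = 28 is an arbitrary assumed mean chosen for this sketch.
mean_step = step_deviation_mean(midpoints, freqs, assumed_mean=28, width=8)

# Cross-check against the direct method.
mean_direct = sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

print(round(mean_step, 2), round(mean_direct, 2))  # 25.4 25.4
```

Both methods give the same mean; the deviation method only makes the hand arithmetic smaller.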
Median:
Median of a distribution is the value of the variable which divides it
into two equal parts.
It is the value which exceeds and is exceeded by the same number of
observations. Thus the median is called a “positional average”.
For Grouped data:
(i) Discrete Frequency Distribution:
Ex:
x:   1   2   3   4   5   6   7   8   9
f:   8  10  11  16  20  25  15   9   6
x    f     c.f.
1     8      8
2    10     18
3    11     29
4    16     45
5    20     65
6    25     90
7    15    105
8     9    114
9     6    120
     N = 120
The cumulative frequency (c.f.) just greater than $\frac{N+1}{2} = 60.5$ is 65, and the value
corresponding to 65 is 5.
∴ Median is 5.
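The cumulative-frequency search for the median of a discrete series can be sketched as follows. The function name is illustrative. The comparison uses "greater than or equal to" so that the case where a c.f. exactly equals (N+1)/2 is also handled; that detail is an assumption of this sketch, not something stated on the slide.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    target = (sum(freqs) + 1) / 2          # position of the middle item
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= target:                   # first c.f. that reaches the position
            return x


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
f = [8, 10, 11, 16, 20, 25, 15, 9, 6]

print(list(accumulate(f)))    # [8, 18, 29, 45, 65, 90, 105, 114, 120]
print(discrete_median(x, f))  # 5
```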
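(ii) Continuous Frequency Distribution:
In the case of a continuous frequency distribution, the class corresponding to the
c.f. just greater than $\frac{N}{2}$ is called the median class, and the value of the median is
obtained by the following formula:
Median = $l + \frac{h}{f}\left(\frac{N}{2} - c\right)$
where l is the lower limit of the median class, h its width, f its frequency, and c the
cumulative frequency of the class preceding the median class.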
Ex: Find the median of the following data:
Wages (in Rs.):    2000-3000   3000-4000   4000-5000   5000-6000   6000-7000
No. of workers:        3           5          20          10           5
Solution:
Wages (in Rs.)   No. of Employees   c.f.
2000-3000         3                  3
3000-4000         5                  8
4000-5000        20                 28
5000-6000        10                 38
6000-7000         5                 43
                 N = 43
$\frac{N}{2} = 21.5$. The cumulative frequency just greater than 21.5 is 28 and the corresponding
class is 4000-5000. Thus the median class is 4000-5000.
l = 4000; h = 1000; f = 20; c = 8
Median = $4000 + \frac{1000}{20}(21.5 - 8)$
∴ Median = 4675.
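The wages example can be reproduced with a short sketch of the grouped-median calculation. It assumes equal class widths and classes listed in ascending order; the function and variable names are illustrative only.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous frequency distribution: l + (h / f) * (N/2 - c)."""
    half_n = sum(freqs) / 2
    c = 0                                  # c.f. of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if c + f >= half_n:                # this class is the median class
            return l + (width / f) * (half_n - c)
        c += f


lower_limits = [2000, 3000, 4000, 5000, 6000]
workers = [3, 5, 20, 10, 5]

median_wage = grouped_median(lower_limits, 1000, workers)
print(median_wage)  # 4675.0
```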
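Quartiles
First Quartile (Q1) = Size of $\left(\frac{N+1}{4}\right)$th item (Discrete series)
Q1 = Size of $\left(\frac{N}{4}\right)$th item (Continuous series)
Q1 = $l + \frac{h}{f}\left(\frac{N}{4} - c\right)$
Third Quartile (Q3) = Size of $3\left(\frac{N+1}{4}\right)$th item (Discrete series)
Q3 = Size of $\left(\frac{3N}{4}\right)$th item (Continuous series)
Q3 = $l + \frac{h}{f}\left(\frac{3N}{4} - c\right)$
where l, h and f are the lower limit, width and frequency of the class containing the quartile,
and c is the cumulative frequency of the preceding class.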
Example 3: Calculate Q1 and Q3 for the following data.
Roll No.:   1   2   3   4   5   6   7
Marks:     20  28  40  12  30  15  50
Marks in ascending order: 12, 15, 20, 28, 30, 40, 50
Q1 = Size of $\left(\frac{N+1}{4}\right)$th item = Size of $\left(\frac{7+1}{4}\right)$th item = 2nd item.
Size of 2nd item is 15. Hence Q1 = 15.
Q3 = Size of $3\left(\frac{N+1}{4}\right)$th item = Size of $3\left(\frac{7+1}{4}\right)$th item = 6th item.
Size of 6th item is 40. Hence Q3 = 40.
Example 4: Compute the value of Q1 and Q3 for the following data:
Marks:  10-20  20-30  30-40  40-50  50-60  60-70  70-80
f:         12     19      5     10      9      6      6
Solution:
Marks    Frequency   Cumulative Frequency
10-20    12          12
20-30    19          31
30-40     5          36
40-50    10          46
50-60     9          55
60-70     6          61
70-80     6          67
         N = 67
Q1 = Size of $\left(\frac{N}{4}\right)$th item = Size of $\left(\frac{67}{4}\right)$th item = 16.75th item.
Q1 lies in the interval 20-30.
Q1 = $20 + \frac{16.75 - 12}{19} \times 10$ = 20 + 2.5 = 22.5
Hence Q1 = 22.5
Q3 = Size of $\left(\frac{3N}{4}\right)$th item = Size of $\left(\frac{3 \times 67}{4}\right)$th item = 50.25th item.
Q3 lies in the class 50-60.
Q3 = $50 + \frac{50.25 - 46}{9} \times 10$ = 50 + 4.72 = 54.72
Hence Q3 = 54.72
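The interpolation used for Q1 and Q3 in a continuous series is the same for any partition value, so it can be sketched as one helper that takes the fraction of N to locate (0.25 for Q1, 0.75 for Q3). The helper name `grouped_quantile` is hypothetical and equal class widths are assumed.

```python
def grouped_quantile(lower_limits, width, freqs, fraction):
    """Partition value of a continuous series: l + (fraction*N - c) / f * h."""
    position = fraction * sum(freqs)       # e.g. N/4 for Q1, 3N/4 for Q3
    c = 0                                  # c.f. of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if c + f >= position:              # class containing the required item
            return l + (position - c) / f * width
        c += f


lower_limits = [10, 20, 30, 40, 50, 60, 70]
freqs = [12, 19, 5, 10, 9, 6, 6]

q1 = grouped_quantile(lower_limits, 10, freqs, 0.25)
q3 = grouped_quantile(lower_limits, 10, freqs, 0.75)
print(round(q1, 2), round(q3, 2))  # 22.5 54.72
```

Passing `fraction=0.4` or `fraction=0.6` would give the fourth decile or the sixtieth percentile in the same way.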
Deciles: $D_4$ = Size of $4\left(\frac{N+1}{10}\right)$th item in individual and discrete series.
$D_4$ = Size of $\left(\frac{4N}{10}\right)$th item in continuous series.
Percentiles: $P_{60}$ = Size of $60\left(\frac{N+1}{100}\right)$th item in discrete series.
$P_{60}$ = Size of $\left(\frac{60N}{100}\right)$th item in continuous series.
Ex: With N = 67 (the marks distribution above),
$D_4$ = Size of $\left(\frac{4N}{10}\right)$th item = $\frac{4 \times 67}{10}$ = 26.8th item.
$D_4$ lies in the interval 20-30.
MODE
Definition: Mode is the value of the variable which is predominant
in the series.
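For Continuous Frequency distribution:
Mode = $l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$
where l is the lower limit of the modal class, h its width, $f_1$ its frequency, and $f_0$ and $f_2$
the frequencies of the classes preceding and succeeding the modal class.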
Ex:
C.I.:        0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency:      5      8      7     12     28     20     10     10
The modal class is 40-50.
Mode = $40 + \frac{10(28 - 12)}{2(28) - 12 - 20} = 40 + \frac{160}{24}$ = 46.67
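A minimal sketch of the mode calculation for a continuous series is given below. It assumes the standard interpolation formula Mode = l + h(f1 - f0) / (2 f1 - f0 - f2), equal class widths, and a single modal class; the function name is illustrative.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a continuous series: l + h * (f1 - f0) / (2*f1 - f0 - f2)."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                    # preceding class frequency
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0       # succeeding class frequency
    return lower_limits[k] + width * (f1 - f0) / (2 * f1 - f0 - f2)


lower_limits = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]

mode = grouped_mode(lower_limits, 10, freqs)
print(round(mode, 2))  # 46.67
```

If two classes share the highest frequency, `max` simply picks the first, so the result should not be trusted for bimodal data.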
A distribution having only one mode is called unimodal.
If it contains more than one mode, it is called bimodal or multimodal.
In that case the value of the mode cannot be determined by the above formula and
hence the mode is ill-defined.
1) Calculate the mean, median and mode for the following data.
Wages (in lakhs):  15-19  20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-59  60-64
No. of workers:       31     47     59     78    104    113     81     60     52     25
Mean = 39.52
Median = 39.77
Mode = 40.6
2) Calculate the Mean, Median and Mode for the following data.
Variable 10-13 13-16 16-19 19-22 22-25 25-28 28-31 31-34 34-37 37-40
Frequency 8 15 27 51 75 54 36 18 9 7
Mean = 24.19
Median = 23.96
Mode = 23.6
Series A Series B Series C
200 200 1
200 205 989
200 202 2
200 203 3
200 190 5
Mean = 200 200 200
Measure of Dispersion
❖ Scatteredness (homogeneity or heterogeneity)
Absolute Measure of Dispersion:
Range
Definition: Difference between the value of the smallest item and the
value of the largest item in the distribution.
Range = 𝑳 − 𝑺
L – Largest Value, S- Smallest Value
Example 1: The following are the prices of shares of a company from
Monday to Saturday:
In a frequency distribution, the range is calculated by taking the difference
between the lower limit of the lowest class and the upper limit of the
highest class.
Range = L – S = 70 – 10 = 60
Coefficient of Range = $\frac{L - S}{L + S} = \frac{70 - 10}{70 + 10}$ = 0.75
Quartile Deviation
Definition: Average amount by which the two quartiles differ from the
median.
Quartile Deviation (Q.D.) = $\frac{Q_3 - Q_1}{2}$
Relative measure of Q.D.
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
Example 3: Calculate the value of Q.D. and the coefficient of Q.D.
from the following data.
Roll No.:   1   2   3   4   5   6   7
Marks:     20  28  40  12  30  15  50
Marks in ascending order: 12, 15, 20, 28, 30, 40, 50
Q1 = Size of $\left(\frac{N+1}{4}\right)$th item = Size of $\left(\frac{7+1}{4}\right)$th item = 2nd item.
Size of 2nd item is 15. Hence Q1 = 15.
Q3 = Size of $3\left(\frac{N+1}{4}\right)$th item = Size of $3\left(\frac{7+1}{4}\right)$th item = 6th item.
Size of 6th item is 40. Hence Q3 = 40.
∴ Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{40 - 15}{2}$ = 12.5
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 15}{40 + 15}$ = 0.455
Example 4: Compute the value of Q.D. and its coefficient from
the following data.
Marks:             10  20  30  40  50  60
No. of Students:    4   7  15   8   7   2
Solution:
Marks   Frequency   Cumulative frequency
10       4           4
20       7          11
30      15          26
40       8          34
50       7          41
60       2          43
Q1 = Size of $\left(\frac{N+1}{4}\right)$th item = Size of $\left(\frac{43+1}{4}\right)$th item = size of 11th item.
Size of 11th item is 20. Hence Q1 = 20.
Q3 = Size of $3\left(\frac{N+1}{4}\right)$th item = Size of $3\left(\frac{43+1}{4}\right)$th item = size of 33rd item.
Size of 33rd item is 40. Hence Q3 = 40.
Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2}$ = 10
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20}$ = 0.333
Example 4: Compute the value of Q.D. and coefficient of Q.D.
from the following data
Marks:  10-20  20-30  30-40  40-50  50-60  60-70  70-80
f:         12     19      5     10      9      6      6
Solution:
Marks    Frequency   Cumulative Frequency
10-20    12          12
20-30    19          31
30-40     5          36
40-50    10          46
50-60     9          55
60-70     6          61
70-80     6          67
         N = 67
Q1 = Size of $\left(\frac{N}{4}\right)$th item = $\frac{67}{4}$ = 16.75th item.
Q1 lies in the interval 20-30.
Q1 = $20 + \frac{16.75 - 12}{19} \times 10$ = 20 + 2.5 = 22.5
Hence Q1 = 22.5
Q3 = Size of $\left(\frac{3N}{4}\right)$th item = $\frac{3 \times 67}{4}$ = 50.25th item.
Q3 lies in the class 50-60.
Q3 = $50 + \frac{50.25 - 46}{9} \times 10$ = 50 + 4.72 = 54.72
Hence Q3 = 54.72
Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{54.72 - 22.5}{2}$ = 16.11
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{54.72 - 22.5}{54.72 + 22.5}$ = 0.4172
Mean Deviation
Definition: M.D. is the average difference between the observations in a distribution and
the median or mean of that series.
Mean Deviation or M.D. = $\frac{\sum |D|}{N}$
where $|D|$ is the deviation from the median, ignoring signs.
Relative Measure of M.D.:
Co-efficient of M.D. = $\frac{\text{M.D.}}{\text{Median}}$
Example 5: Calculate the mean deviation of the two income
groups.
I (Rs.)   II (Rs.)
4000      3000
4200      4000
4400      4200
4600      4400
4800      4600
          4800
          5800
Solution:
Group I            Group II
Rs.     |D|        Rs.     |D|
4000    400        3000    1400
4200    200        4000     400
4400      0        4200     200
4600    200        4400       0
4800    400        4600     200
                   4800     400
                   5800    1400
N = 5   $\sum |D| = 1200$     N = 7   $\sum |D| = 4000$
Mean deviation, Group I: M.D. = $\frac{\sum |D|}{N}$
Median = Size of $\left(\frac{N+1}{2}\right)$th item = $\frac{5+1}{2}$ = 3rd item. Size of the 3rd item = 4400.
M.D. = $\frac{1200}{5}$ = 240
i.e., the average deviation of the individual incomes from the median income
is Rs. 240.
Mean deviation, Group II: M.D. = $\frac{\sum |D|}{N}$
Median = Size of $\left(\frac{N+1}{2}\right)$th item = $\frac{7+1}{2}$ = 4th item.
Size of the 4th item = 4400.
M.D. = $\frac{4000}{7}$ = 571.43
i.e., the average deviation of the individual incomes from the
median income is Rs. 571.43.
Co-efficient of M.D. (Group I) = $\frac{\text{M.D.}}{\text{Median}} = \frac{240}{4400}$ = 0.055
Co-efficient of M.D. (Group II) = $\frac{571.43}{4400}$ = 0.13
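The two income groups can be checked with a short sketch that uses Python's standard `statistics.median`. The helper name `mean_deviation_about_median` is illustrative only.

```python
from statistics import median


def mean_deviation_about_median(values):
    """Return (median, mean deviation about the median) for an individual series."""
    med = median(values)
    md = sum(abs(x - med) for x in values) / len(values)   # sum|D| / N
    return med, md


group_1 = [4000, 4200, 4400, 4600, 4800]
group_2 = [3000, 4000, 4200, 4400, 4600, 4800, 5800]

for name, incomes in (("I", group_1), ("II", group_2)):
    med, md = mean_deviation_about_median(incomes)
    # median, M.D., and coefficient of M.D. = M.D. / median
    print(name, med, round(md, 2), round(md / med, 3))
```

Both groups have the same median (4400), but Group II shows more than twice the mean deviation of Group I.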
Mean deviation – Discrete Series
M.D. = $\frac{\sum f|D|}{N}$
Example 6: The number of telephone calls received at an exchange in 245
successive one-minute intervals are shown in the following
frequency distribution. Compute the mean deviation about
the median.
Number of Calls:   0   1   2   3   4   5   6   7
Frequency:        14  21  25  43  51  40  39  12
Solution:
No. of Calls   f     c.f.   |D|   f|D|
0              14     14    4     56
1              21     35    3     63
2              25     60    2     50
3              43    103    1     43
4              51    154    0      0
5              40    194    1     40
6              39    233    2     78
7              12    245    3     36
               N = 245            $\sum f|D| = 366$
Median = Size of $\left(\frac{N+1}{2}\right)$th item = size of $\left(\frac{245+1}{2}\right)$th item = size of 123rd item.
Hence the median value is 4.
M.D. = $\frac{\sum f|D|}{N} = \frac{366}{245}$ = 1.49
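The telephone-calls example can be reproduced with the sketch below. The median is located from the cumulative frequencies and the deviations are weighted by frequency; the function names are hypothetical.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Value whose cumulative frequency first reaches the (N+1)/2-th item."""
    target = (sum(freqs) + 1) / 2
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= target:
            return x


def discrete_mean_deviation(values, freqs):
    """Return (median, sum f|D| / N) with D measured from the median."""
    med = discrete_median(values, freqs)
    n = sum(freqs)
    md = sum(f * abs(x - med) for x, f in zip(values, freqs)) / n
    return med, md


calls = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [14, 21, 25, 43, 51, 40, 39, 12]

med, md = discrete_mean_deviation(calls, freqs)
print(med, round(md, 2))  # 4 1.49
```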
In Continuous Series:
We have to obtain the mid-points of the various classes and take
the deviations of these mid-points from median.
Calculate the coefficient of mean deviation from the following data:
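Standard deviation
For the frequency distribution $x_i \mid f_i$; i = 1, 2, …, n,
Variance = $\sigma^2 = \frac{1}{N}\sum (x_i - \bar{x})^2 = \frac{1}{N}\sum x_i^2 - \left(\frac{1}{N}\sum x_i\right)^2 = \frac{1}{N}\sum x_i^2 - \bar{x}^2$
Standard deviation = $\sqrt{\text{variance}}$:
$\sigma = \sqrt{\frac{1}{N}\sum (x_i - \bar{x})^2}$ or $\sigma = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2}$
Coefficient of Variation
C.V. = $\frac{\sigma}{\bar{x}} \times 100$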
3) The scores of two players A and B in ten innings during a certain
season are:
A:  32  28  47  63  71  39  10  60  96  14
B:  19  31  48  53  67  90  10  62  40  80
Solution: Calculation of Coefficient of Variation
$X$    $X - \bar{X}$   $(X - \bar{X})^2$    $Y$    $Y - \bar{Y}$   $(Y - \bar{Y})^2$
32     -14             196                  19     -31             961
28     -18             324                  31     -19             361
47      +1               1                  48      -2               4
63     +17             289                  53      +3               9
71     +25             625                  67     +17             289
39      -7              49                  90     +40            1600
10     -36            1296                  10     -40            1600
60     +14             196                  62     +12             144
96     +50            2500                  40     -10             100
14     -32            1024                  80     +30             900
$\sum X = 460$   0    6500                  $\sum Y = 500$   0    5968
$\bar{X} = \frac{460}{10} = 46$ ; $\bar{Y} = \frac{500}{10} = 50$
$\sigma_A^2 = \frac{\sum (x_i - \bar{x})^2}{N} = \frac{6500}{10} = 650$ ; $\sigma_B^2 = \frac{\sum (y_i - \bar{y})^2}{N} = \frac{5968}{10} = 596.8$
$\sigma_A = 25.5$ ; $\sigma_B = 24.43$
C.V.(A) = $\frac{\sigma_A}{\bar{x}} \times 100$ = 55.43 ; C.V.(B) = $\frac{\sigma_B}{\bar{y}} \times 100$ = 48.86
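The coefficient of variation for the two players can be sketched with the standard library's `statistics.mean` and `statistics.pstdev` (the population standard deviation, which divides by N as in the slides). The helper name is illustrative. The unrounded result for A is about 55.42; the slide value of 55.43 arises from rounding the standard deviation to 25.5 first.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """C.V. = (sigma / mean) * 100, with sigma computed by dividing by N."""
    return pstdev(values) / mean(values) * 100


scores_a = [32, 28, 47, 63, 71, 39, 10, 60, 96, 14]
scores_b = [19, 31, 48, 53, 67, 90, 10, 62, 40, 80]

cv_a = coefficient_of_variation(scores_a)
cv_b = coefficient_of_variation(scores_b)
print(round(cv_a, 2), round(cv_b, 2))
```

A smaller C.V. means less relative variability, so on these figures B's scores are the more consistent of the two.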
4) Suppose that samples of polythene bags from two manufacturers, A and B, are
tested by a prospective buyer for bursting pressure, with the following results:
$\bar{X}_A = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times i$
$\bar{X}_A = 17.45 + \frac{78}{110} \times 5 = 21$
For Manufacturer B
Bursting Pressure (lb.)   m       f     d     fd     fd²
4.95-9.95                  7.45    9    -2    -18     36
9.95-14.95                12.45   11    -1    -11     11
14.95-19.95               17.45   18     0      0      0
19.95-24.95               22.45   32    +1    +32     32
24.95-29.95               27.45   27    +2    +54    108
29.95-34.95               32.45   13    +3    +39    117
N = 110 ; $\sum fd = 96$ ; $\sum fd^2 = 304$
$\bar{X}_B = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times i$
$\bar{X}_B = 17.45 + \frac{96}{110} \times 5 = 21.81$
$\bar{X}_A = 21$ ; $\bar{X}_B = 21.81$
$\sigma_A = 4.88$ ; $\sigma_B = 7.07$
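The figures for Manufacturer B can be reproduced from the step deviations in the table. The sketch below assumes the usual step-deviation form of the standard deviation, sigma = i * sqrt(sum(f d^2)/N - (sum(f d)/N)^2), which is not written out on the slides; the function name is hypothetical.

```python
import math


def step_deviation_mean_sd(freqs, steps, assumed_mean, width):
    """Mean and standard deviation from step deviations d = (m - A) / i."""
    n = sum(freqs)
    mean_d = sum(f * d for f, d in zip(freqs, steps)) / n        # sum(fd) / N
    mean_d2 = sum(f * d * d for f, d in zip(freqs, steps)) / n   # sum(fd^2) / N
    mean = assumed_mean + width * mean_d
    sd = width * math.sqrt(mean_d2 - mean_d ** 2)
    return mean, sd


freqs_b = [9, 11, 18, 32, 27, 13]
steps_b = [-2, -1, 0, 1, 2, 3]          # d, measured from A = 17.45 in units of i = 5

mean_b, sd_b = step_deviation_mean_sd(freqs_b, steps_b, assumed_mean=17.45, width=5)
print(round(mean_b, 2), round(sd_b, 2))  # 21.81 7.07
```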
C.I.     f        C.I.     f
0-5      10       0-5      10
5-10     30       5-10     40
10-15    60       10-15    30
15-20    60       15-20    90
20-25    30       20-25    20
25-30    10       25-30    10
$\bar{X} = 15$
$\sigma = 6$
[Figure: frequency diagrams of the two distributions above; both have $\bar{x} = 15$ and $\sigma = 6$]
Skewness
When a series is not symmetrical it is said to be asymmetrical or skewed.
A distribution is said to be ‘skewed’ when the mean and median
fall at different points in the distribution, and the balance (or centre of
gravity) is shifted to one side or the other to left or right.
Dispersion is concerned with the amount of variation rather than with its
direction.
Skewness tells us about the direction of the variation or the departure from
symmetry.
Types of Skewness:
(i) Symmetrical Distribution
(mean, median and mode are equal, there is no skewness)
(ii) Positively Skewed Distribution
(Mean >median> Mode)
(iii) Negatively Skewed Distribution
(Mean <median< Mode)
• Methods of ascertaining Skewness
Skewness can be studied graphically and mathematically.
Sk = $\bar{X}$ - Mode
(i) Karl Pearson's coefficient of skewness.
Coefficient of Skewness: Sk = $\frac{\text{Mean} - \text{Mode}}{\text{S.D.}} = \frac{\bar{X} - \text{Mode}}{\sigma}$
In practice, it is rare for the value of Sk to exceed the limits of ±1.
When the mode is ill-defined: Sk = $\frac{3(\bar{X} - \text{Median})}{\sigma}$
(ii) Bowley's coefficient of skewness
It is based on quartiles. In a symmetrical distribution the first and third
quartiles are equidistant from the median: $Q_3 - \text{Median} = \text{Median} - Q_1$.
Sk = $\frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
This measure is called the quartile measure of skewness and varies
between ±1.
(iii) Moments
Moment is a measure of a force with respect to its tendency to
provide rotation.
The strength of the tendency depends on the amount of force and the
distance from the origin of the point at which the force is exerted.
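Central Moments (Moments about the Arithmetic Mean):
First Moment: $\mu_1 = \frac{\sum (X_i - \bar{X})}{N}$ (the sum of the deviations from the
A.M. is always zero, so $\mu_1 = 0$)
Second Moment: $\mu_2 = \frac{\sum (X_i - \bar{X})^2}{N}$
Third Moment: $\mu_3 = \frac{\sum (X_i - \bar{X})^3}{N}$
Fourth Moment: $\mu_4 = \frac{\sum (X_i - \bar{X})^4}{N}$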
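For a frequency distribution:
First Moment: $\mu_1 = \frac{\sum f_i (X_i - \bar{X})}{N}$
Second Moment: $\mu_2 = \frac{\sum f_i (X_i - \bar{X})^2}{N}$
Third Moment: $\mu_3 = \frac{\sum f_i (X_i - \bar{X})^3}{N}$
Fourth Moment: $\mu_4 = \frac{\sum f_i (X_i - \bar{X})^4}{N}$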
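Conversion of moments about an Arbitrary origin into
Moments about mean
$\mu_1 = \mu_1' - \mu_1' = 0$
$\mu_2 = \mu_2' - (\mu_1')^2$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$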
Non-central Moments (Moments about the Assumed Mean)
Where the actual mean is a fraction, it is difficult to calculate moments by applying the
above formulae. In such cases we can first compute moments about an arbitrary origin (A).
$\mu_1' = \frac{\sum (X_i - A)}{N} = \bar{X} - A$ ; for a frequency distribution, $\mu_1' = \frac{\sum f_i (X_i - A)}{N}$
Mean = $\bar{X} = \mu_1' + A$
$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ (beta one) ; $\gamma_1 = \sqrt{\beta_1}$ (Coeff. of Skewness)
Marks Frequency
0-10 5
10-20 20
20-30 15
30-40 45
40-50 10
50-60 5
$\bar{X} = 30$
Mode = $30 + \frac{10(45 - 15)}{2(45) - 15 - 10}$ = 34.6154
Sk = $\bar{X}$ - Mode = -4.6154
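Marks    Mid point (m)   f     d = (m − 35)/10   fd     fd²    fd³     fd⁴
0-10      5               5    -3                -15     45    -135    405
10-20    15              20    -2                -40     80    -160    320
20-30    25              15    -1                -15     15     -15     15
30-40    35              45     0                  0      0       0      0
40-50    45              10    +1                 10     10      10     10
50-60    55               5    +2                 10     20      40     80
N = 100 ; $\sum fd = -50$ ; $\sum fd^2 = 170$ ; $\sum fd^3 = -260$ ; $\sum fd^4 = 830$
$\mu_1' = \frac{\sum f_i d_i}{N} \times i = -5$
Mean = $\bar{X} = \mu_1' + A = 30$
$\mu_2' = \frac{\sum f_i d_i^2}{N} \times i^2 = 170$
$\mu_3' = \frac{\sum f_i d_i^3}{N} \times i^3 = -2600$
$\mu_4' = \frac{\sum f_i d_i^4}{N} \times i^4 = 83000$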
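$\mu_2 = \mu_2' - (\mu_1')^2 = 145$ = variance = $\sigma^2$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = -300$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 54625$
$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0.02952$ ; $\beta_2 = \frac{\mu_4}{\mu_2^2} = 2.5981$
$\gamma_1 = -\sqrt{\beta_1} = -0.172$ (negative, since $\mu_3 < 0$) ; $\gamma_2 = \beta_2 - 3 = -0.4019$
Kurtosis is measured by the coefficient $\beta_2 = \frac{\mu_4}{\mu_2^2}$ or $\gamma_2 = \beta_2 - 3$.
(i) $\beta_2 = 3$, i.e., $\gamma_2 = 0$ : mesokurtic curve
(ii) $\beta_2 < 3$, i.e., $\gamma_2 < 0$ : platykurtic curve
(iii) $\beta_2 > 3$, i.e., $\gamma_2 > 0$ : leptokurtic curve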
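The moment calculations for the marks distribution (frequencies 5, 20, 15, 45, 10, 5 with A = 35 and class width 10) can be sketched as follows. The conversion formulas used for the third and fourth central moments are the standard ones; the function and variable names are illustrative and not from the slides.

```python
import math


def central_moments_from_steps(freqs, steps, width):
    """Central moments mu2, mu3, mu4 from step deviations d = (m - A) / i."""
    n = sum(freqs)
    # Raw moments about the assumed mean A: mu_r' = i^r * sum(f d^r) / N
    m1, m2, m3, m4 = (
        width ** r * sum(f * d ** r for f, d in zip(freqs, steps)) / n
        for r in (1, 2, 3, 4)
    )
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


freqs = [5, 20, 15, 45, 10, 5]
steps = [-3, -2, -1, 0, 1, 2]            # d = (m - 35) / 10

mu2, mu3, mu4 = central_moments_from_steps(freqs, steps, width=10)
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
gamma1 = math.copysign(math.sqrt(beta1), mu3)   # takes the sign of mu3
gamma2 = beta2 - 3

if gamma2 > 0:
    shape = "leptokurtic"
elif gamma2 < 0:
    shape = "platykurtic"
else:
    shape = "mesokurtic"

print(mu2, mu3, mu4)
print(round(beta1, 5), round(beta2, 4), round(gamma1, 3), round(gamma2, 4), shape)
```

For this data the sketch gives a slightly negative skewness and a platykurtic curve.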