BS Unit 3


UNIT III

Syllabus:

Measures of Dispersion – Range, Quartile Deviation, Mean Deviation & Standard Deviation with coefficients

Skewness – Bowley’s and Karl Pearson’s methods

Measures of Dispersion

Averages or measures of central tendency give us an idea of the concentration of observations about the
central part of the distribution. In spite of their great utility in statistical analysis, they have their own
limitations. If we are given only the average of a series of observations, we cannot form a complete idea
of the distribution, since there may exist a number of distributions whose averages are the same but which
differ widely from each other in a number of ways. The following example illustrates this point.

Let us consider the following three series A, B and C of 9 items each.

Series   Items                                  Total   Mean
A        15  15  15  15  15  15  15  15  15     135     15
B        11  12  13  14  15  16  17  18  19     135     15
C         6  10  13  14  15  17  19  20  21     135     15

All the three series A, B and C have the same size (n = 9) and same mean viz., 15. Thus, if we are given
that the mean of a series of 9 observations is 15, we cannot determine if we are talking of the series A, B
or C. In fact, any series of 9 items with total 135 will give mean 15. Thus, we may have a large number
of series with entirely different structures and compositions but having the same mean.

From the above illustration it is obvious that the measures of central tendency are inadequate to describe
the distribution completely. Thus the measures of central tendency must be supported and supplemented
by some other measures. One such measure is Dispersion.

The literal meaning of dispersion is “scatteredness”. We study dispersion to get an idea of the homogeneity
or heterogeneity of the distribution. In the above illustration, we say that series A is stationary, i.e., it
is constant and shows no variability. Series B is slightly dispersed and series C is relatively more
dispersed. We say that series B is more homogeneous than series C, or that series C is more
heterogeneous than series B.
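The illustration can be checked with a few lines of Python. This is only a sketch: the helper name `summarize` is hypothetical, and the gap between the largest and smallest item is used here merely as a crude first indication of spread.

```python
from statistics import mean

# Series A, B and C exactly as tabulated above (9 items each).
SERIES = {
    "A": [15, 15, 15, 15, 15, 15, 15, 15, 15],
    "B": [11, 12, 13, 14, 15, 16, 17, 18, 19],
    "C": [6, 10, 13, 14, 15, 17, 19, 20, 21],
}


def summarize(values):
    """Return (total, mean, largest - smallest) for a list of numbers."""
    return sum(values), mean(values), max(values) - min(values)


for name, values in SERIES.items():
    total, average, spread = summarize(values)
    print(f"Series {name}: total={total}, mean={average}, largest-smallest={spread}")
```

All three series share the same total and mean, while the gap between the extreme items differs from one series to the next.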

Definitions

“Dispersion is the measure of the variation of the items.” – A.L. Bowley

“Dispersion is a measure of the extent to which the individual items vary.” – L.R. Connor

“Dispersion or spread is the degree of the scatter or variation of the variables about a central value.” –
B.C. Brooks and W.F.L. Dick

Objectives or Significance of measures of dispersion

The main objectives of studying measures of dispersion are as follows:

1. To find out reliability of an average.

The measures of variation enable us to find out if the average is representative of the data. Dispersion
gives us an idea about the spread of the observations about an average value. If the dispersion is
small, it means that the given data values are closer to the central (average) value and hence the
average may be regarded as reliable in the sense that it provides a fairly good estimate of the
corresponding population average. If the dispersion is large, then the data values are more deviated
from the central value, thereby implying that the average is not representative of the data and hence
not quite reliable.

2. To control the variation of data from the central value.

The measures of variation help us to determine the causes and the nature of variation, so as to control
the variation itself.

3. To compare two or more sets of data regarding their variability.

The relative measures of dispersion may be used to compare two or more distributions, even if they
are measured in different units, as regards their variability or uniformity.

4. To obtain other statistical measures for further analysis of data.

The measures of variation are used for computing other statistical measures which are used
extensively in correlation, regression analysis etc.

Characteristics of an ideal measure of dispersion

A good measure of dispersion should possess the following characteristics:

1. It should be rigidly defined.
2. It should be easy to calculate and easy to understand.
3. It should be based on all the observations.
4. It should be amenable to further mathematical treatment.
5. It should be least affected by sampling fluctuations.
6. It should not be affected much by extreme observations.

Absolute and Relative Measures of dispersion


The measures of dispersion which are expressed in terms of the original units of a series are termed as
Absolute measures. Such measures are not suitable for comparing the variability of the two distributions
which are expressed in different units of measurement. On the other hand, Relative measures of
dispersion are obtained as ratios or percentages and are thus pure numbers independent of the units of
measurement. For comparing the variability of two distributions (even if they’re measured in the same
units), we compute the relative measures of dispersion instead of the absolute measures of dispersion.

Measures of Dispersion

The various measures of dispersion are:

i. Range
ii. Quartile Deviation or Semi-Interquartile Range.
iii. Mean deviation
iv. Standard deviation.
v. Lorenz curve.

The first two measures viz., Range and Quartile Deviation are termed as positional measures since they
depend upon the values of the variable at particular position of the distribution. The last measure viz.,
Lorenz curve is a graphic method of studying variability.

RANGE

Range is the simplest of all the measures of dispersion. It is defined as the difference between two
extreme observations of the distribution. In other words, range is the difference between the largest
(maximum) and the smallest (minimum) observation of the distribution. Thus,

$$\text{Range} = X_{\max} - X_{\min}$$

where $X_{\max}$ is the largest observation and $X_{\min}$ is the smallest observation of the variable values.

In the case of a grouped frequency distribution (for discrete values) or a continuous frequency distribution, range
is defined as the difference between the upper limit of the highest class and the lower limit of the lowest
class.

Range as defined in the above formula is the absolute measure of range. A relative measure of range, also
termed as the coefficient of range, is defined as:

$$\text{Coefficient of Range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}$$
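As a minimal sketch of the two formulae, the snippet below computes the range and the coefficient of range for series B and C from the opening illustration. The function names are hypothetical, and the coefficient assumes that the sum of the two extreme observations is not zero.

```python
def value_range(values):
    """Absolute range: largest observation minus smallest observation."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range: (X_max - X_min) / (X_max + X_min)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


series_b = [11, 12, 13, 14, 15, 16, 17, 18, 19]
series_c = [6, 10, 13, 14, 15, 17, 19, 20, 21]

for label, data in (("B", series_b), ("C", series_c)):
    print(label, value_range(data), round(coefficient_of_range(data), 3))
```

Because the coefficient is a ratio, it is a pure number and can be compared across series measured in different units.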

Example:
Merits and Demerits of Range

Range is the simplest though crude measure of dispersion.

Merits

1. It is easy to understand and easy to calculate.
2. It gives a quick measure of variability.

Demerits

1. It is not based on all the observations.
2. It is very much affected by extreme observations.
3. It only gives a rough idea of spread of observations.
4. It does not give any idea about the pattern of the distribution. There can be two distributions with
the same range but with different patterns of distribution.
5. It is very much affected by sampling fluctuations.
6. It is not capable of being treated mathematically.
7. It cannot be calculated for a distribution with open ends.

Uses

In spite of its many serious demerits, range is useful in the following situations:

1. Range is used in industry for the statistical quality control of the manufactured product by the
construction of the R-chart, i.e., the control chart for range.
2. Range is by far the most widely used measure of variability in our day-to-day life. For example,
the answer to questions about the ‘daily sales in a departmental store’, the ‘monthly wages of workers in a
factory’, etc. is usually given as probable limits, in the form of a range.
3. Range is also used as a very convenient measure by the meteorological department for weather
forecasts, since the public is primarily interested in knowing the limits within which the temperature
is likely to vary on a particular day.

QUARTILE DEVIATION OR SEMI INTER – QUARTILE RANGE

It is a measure of dispersion based on the upper quartile Q3 and the lower quartile Q1.

Inter – quartile Range = Q3 – Q1

Quartile deviation is obtained from inter-quartile range on dividing by 2 and hence is also known as
semi inter-quartile range. Thus

$$\text{Quartile Deviation (Q.D.)} = \frac{Q_3 - Q_1}{2}$$

The above measure is an absolute measure of dispersion. Relative measure of Quartile deviation is
known as coefficient of quartile deviation and is given by:

$$\text{Coefficient of Q.D.} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
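The following sketch computes the quartile deviation and its coefficient for series C from the opening illustration. The notes do not say how quartiles are located for raw data, so the snippet assumes the common convention of taking the (n + 1)/4-th and 3(n + 1)/4-th ordered items with linear interpolation, which is what the `"exclusive"` method of Python's `statistics.quantiles` does. The function name is hypothetical.

```python
from statistics import quantiles


def quartile_deviation(values):
    """Return (Q.D., coefficient of Q.D.) for raw observations."""
    # Assumption: quartiles sit at the (n + 1)/4-th and 3(n + 1)/4-th
    # ordered items, with linear interpolation ("exclusive" method).
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


series_c = [6, 10, 13, 14, 15, 17, 19, 20, 21]
qd, coefficient = quartile_deviation(series_c)
print(f"Q.D. = {qd}, coefficient of Q.D. = {coefficient:.3f}")
```

A different quartile convention would give slightly different values of Q1 and Q3, and hence of the quartile deviation.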

Example:
Merits and Demerits of Quartile Deviation

Merits

1. It is rigidly defined.
2. It is easy to understand and compute.
3. It is not affected by extreme observations and hence a suitable measure of dispersion when a
distribution is highly skewed.
4. It can be calculated even for a distribution with open ends.

Demerits

1. Since it is not based on all observations, it is not a reliable measure of dispersion.
2. It is very much affected by the fluctuations of sampling.
3. It is not capable of being treated mathematically.

MEAN DEVIATION OR AVERAGE DEVIATION

An important requirement of a measure of dispersion is that it should be based on all the observations.
Range and Quartile deviation do not satisfy this requirement. Mean deviation is a measure of dispersion
based on all the observations. It is defined as the arithmetic mean of the absolute deviations of
observations from a central value like mean, median or mode. Here the dispersion in each observation is
measured by its deviation from a central value. This deviation will be positive for an observation greater
than the central value and negative for less than it. If these deviations are added as such, the positive
deviations tend to cancel the negative deviations and the resulting sum will not reflect a correct
magnitude of the total spread of observations. This difficulty is tackled by taking the absolute value of the
deviations.

Calculation of Mean Deviation

The following are the formulae for the computation of the mean deviation (M.D.) of an individual series of
observations X1, X2, …, Xn:

1. $\text{M.D. from Mean } (\bar{X}) = \frac{\sum |X - \bar{X}|}{n}$
2. $\text{M.D. from Median} = \frac{\sum |X - \text{Median}|}{n}$
3. $\text{M.D. from Mode} = \frac{\sum |X - \text{Mode}|}{n}$

Example:

Calculate Mean Deviation from mean and median for the following data of heights (in inches) of 10
persons.

60, 62, 70, 69, 63, 65, 60, 68, 63, 64

Also calculate their respective coefficients.

Solution:

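Calculation of M.D. from X̅

$$\bar{X} = \frac{60 + 62 + 70 + 69 + 63 + 65 + 60 + 68 + 63 + 64}{10} = \frac{644}{10} = 64.4 \text{ inches}$$

X       |X − X̅|
60      4.4
62      2.4
70      5.6
69      4.6
63      1.4
65      0.6
60      4.4
68      3.6
63      1.4
64      0.4
Total   ∑|X − X̅| = 28.8

$$\text{M.D. from } \bar{X} = \frac{\sum |X - \bar{X}|}{n} = \frac{28.8}{10} = 2.88 \text{ inches}$$

$$\text{Coefficient of M.D. from } \bar{X} = \frac{\text{M.D. from } \bar{X}}{\bar{X}} = \frac{2.88}{64.4} = 0.045$$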

Calculation of M.D. from Median:

Arranging the observations in order of magnitude, we have

60, 60, 62, 63, 63, 64, 65, 68, 69, 70

$$\text{Median} = \frac{63 + 64}{2} = 63.5 \text{ inches}$$

X       |X − Median|
60      3.5
60      3.5
62      1.5
63      0.5
63      0.5
64      0.5
65      1.5
68      4.5
69      5.5
70      6.5
Total   ∑|X − Median| = 28

$$\text{M.D. from Median} = \frac{\sum |X - \text{Median}|}{n} = \frac{28}{10} = 2.8 \text{ inches}$$

$$\text{Coefficient of M.D. from } M_d = \frac{2.8}{63.5} = 0.044$$
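The hand calculation for the ten heights can be reproduced with a short script. This is a sketch only; `mean_deviation` is a hypothetical helper that accepts any central value, so the same function serves for the mean and the median.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


heights = [60, 62, 70, 69, 63, 65, 60, 68, 63, 64]  # inches

x_bar = mean(heights)
m_d = median(heights)

md_mean = mean_deviation(heights, x_bar)
md_median = mean_deviation(heights, m_d)

print(f"Mean = {x_bar}, M.D. from mean = {md_mean:.2f}, coefficient = {md_mean / x_bar:.3f}")
print(f"Median = {m_d}, M.D. from median = {md_median:.2f}, coefficient = {md_median / m_d:.3f}")
```

The coefficients are obtained by dividing each mean deviation by the central value it was measured from.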

In the case of an ungrouped frequency distribution, the observations X1, X2, …, Xn occur with respective
frequencies f1, f2, …, fn such that ∑f = N. The corresponding formulae for M.D. can be written as:

1. $\text{M.D. from Mean} = \frac{\sum f|X - \bar{X}|}{N}$
2. $\text{M.D. from Median} = \frac{\sum f|X - \text{Median}|}{N}$
3. $\text{M.D. from Mode} = \frac{\sum f|X - \text{Mode}|}{N}$

The above formulae are also applicable to a grouped frequency distribution, where the symbols X1, X2,
…, Xn denote the mid-values of the first, second, …, nth classes respectively.

Coefficient of Mean Deviation

The above formulae for mean deviation give an absolute measure of dispersion. The formulae for the relative
measure, termed the coefficient of mean deviation, are given below:

1. $\text{Coefficient of M.D. from Mean} = \frac{\text{M.D. from Mean}}{\text{Mean}}$
2. $\text{Coefficient of M.D. from Median} = \frac{\text{M.D. from Median}}{\text{Median}}$
3. $\text{Coefficient of M.D. from Mode} = \frac{\text{M.D. from Mode}}{\text{Mode}}$

Example:

In the table below, f is the number of workers, X is the mid-value of the class and u = (X − 475)/50.

Weekly wages    f     X     u    fu   c.f.   |X − X̅|   f|X − X̅|   |X − Md|   f|X − Md|
200 – 250       7   225    -5   -35      7    227.25    1590.75       230       1610
250 – 300      13   275    -4   -52     20    177.25    2304.25       180       2340
300 – 350      15   325    -3   -45     35    127.25    1908.75       130       1950
350 – 400      24   375    -2   -48     59     77.25    1854.00        80       1920
400 – 450      36   425    -1   -36     95     27.25     981.00        30       1080
450 – 500      50   475     0     0    145     22.75    1137.50        20       1000
500 – 550      25   525     1    25    170     72.75    1818.75        70       1750
550 – 600      10   575     2    20    180    122.75    1227.50       120       1200
600 – 650       8   625     3    24    188    172.75    1382.00       170       1360
650 – 700       6   675     4    24    194    222.75    1336.50       220       1320
700 – 750       4   725     5    20    198    272.75    1091.00       270       1080
750 – 800       2   775     6    12    200    322.75     645.50       320        640
Total     N = 200               -91                    17277.50                 17250

$$\bar{X} = A + h \cdot \frac{\sum fu}{N}$$

A = 475, h = 50, N = 200, ∑fu = -91

$$\bar{X} = 475 + 50 \times \frac{(-91)}{200} = 452.25$$

$$\text{M.D. from Mean} = \frac{\sum f|X - \bar{X}|}{N} = \frac{17277.5}{200} = 86.39$$

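Calculation of M.D. from Median

$$M_d = l + \frac{h}{f}\left(\frac{N}{2} - C\right)$$

where l is the lower limit of the median class, h is its width, f is its frequency and C is the cumulative
frequency of the class preceding it.

The median class is 450 – 500; l = 450; h = 50; f = 50; N = 200; C = 95

$$M_d = 450 + \frac{50}{50} \times \left(\frac{200}{2} - 95\right) = 455$$

$$\text{M.D. from Median} = \frac{\sum f|X - M_d|}{N} = \frac{17250}{200} = 86.25$$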
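The grouped calculation above can be sketched in Python as follows. The function names and the `(lower, upper, frequency)` tuple layout are assumptions made for illustration; the mean is computed directly from the mid-values rather than through the step deviation u, which gives the same result.

```python
# (lower limit, upper limit, frequency) for each wage class in the example above.
WAGE_CLASSES = [
    (200, 250, 7), (250, 300, 13), (300, 350, 15), (350, 400, 24),
    (400, 450, 36), (450, 500, 50), (500, 550, 25), (550, 600, 10),
    (600, 650, 8), (650, 700, 6), (700, 750, 4), (750, 800, 2),
]


def grouped_mean(classes):
    """Arithmetic mean using class mid-values weighted by frequency."""
    total = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total


def grouped_median(classes):
    """Median by interpolation: l + (h / f) * (N / 2 - C)."""
    total = sum(f for _, _, f in classes)
    cumulative = 0  # C: cumulative frequency of the classes before the current one
    for lo, hi, f in classes:
        if cumulative + f >= total / 2:
            return lo + (hi - lo) / f * (total / 2 - cumulative)
        cumulative += f
    raise ValueError("classes must contain at least one observation")


def grouped_mean_deviation(classes, center):
    """Mean deviation of the class mid-values about `center`."""
    total = sum(f for _, _, f in classes)
    return sum(f * abs((lo + hi) / 2 - center) for lo, hi, f in classes) / total


x_bar = grouped_mean(WAGE_CLASSES)
m_d = grouped_median(WAGE_CLASSES)
print(f"Mean = {x_bar}, M.D. from mean = {grouped_mean_deviation(WAGE_CLASSES, x_bar):.2f}")
print(f"Median = {m_d}, M.D. from median = {grouped_mean_deviation(WAGE_CLASSES, m_d):.2f}")
```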

Example:

Calculate mean deviation from median and its coefficient from the given data:

X :   0    1    2    3    4    5    6    7    8    9

f :  15   45   91  162  110   95   82   26   13    2

Solution:

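Calculation of Mean Deviation

X       f     Less than c.f.   |X − 4|   f|X − 4|
0       15     15              4           60
1       45     60              3          135
2       91    151              2          182
3      162    313              1          162
4      110    423              0            0
5       95    518              1           95
6       82    600              2          164
7       26    626              3           78
8       13    639              4           52
9        2    641              5           10
Total  641                                938

Since $\frac{N}{2} = \frac{641}{2} = 320.5$, the cumulative frequency just greater than 320.5 is 423, and the
corresponding value of X is 4. ∴ Md = 4.

$$\text{Thus, M.D. from Median} = \frac{\sum f|X - \text{Median}|}{N} = \frac{938}{641} = 1.46$$

$$\text{Coefficient of M.D.} = \frac{\text{M.D. from Median}}{\text{Median}} = \frac{1.46}{4} = 0.365$$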
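For a discrete frequency distribution the median is read off the cumulative frequencies. The sketch below follows that rule under the assumption that the values are listed in ascending order; the function names are hypothetical.

```python
def discrete_median(values, freqs):
    """Value whose cumulative frequency is the first to exceed N / 2.

    Assumes `values` is sorted in ascending order.
    """
    half = sum(freqs) / 2
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative > half:
            return x
    raise ValueError("frequencies must sum to a positive number")


def weighted_mean_deviation(values, freqs, center):
    """Frequency-weighted mean of the absolute deviations from `center`."""
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / sum(freqs)


X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
F = [15, 45, 91, 162, 110, 95, 82, 26, 13, 2]

m_d = discrete_median(X, F)
md = weighted_mean_deviation(X, F, m_d)
print(f"Median = {m_d}, M.D. = {md:.2f}, coefficient = {md / m_d:.3f}")
```

The script divides the unrounded mean deviation by the median, so its coefficient (about 0.366) differs in the third decimal from the hand calculation, which uses the rounded value 1.46.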

Merits, Demerits and Uses of Mean Deviation


Merits

1. It is easy to understand and compute.
2. It is based on all the observations.
3. It is less affected by extreme observations vis-à-vis range or standard deviation.
4. It is not much affected by sampling fluctuations.

Demerits

1. It is not capable of further mathematical treatment. Since mean deviation is the arithmetic mean
of absolute value of deviations, it is not very convenient to be algebraically manipulated. This
necessitates a search for a measure of dispersion which is capable of being subjected to further
mathematical treatment.
2. It is not a well-defined measure of dispersion since deviations can be taken from any measure of
central tendency.

Uses

The mean deviation is a very useful measure of dispersion when sample is small and no elaborate analysis
of data is needed. Since standard deviation gives more importance to extreme observations the use of
mean deviation is preferred in statistical analysis of certain economic, business and social phenomena.

STANDARD DEVIATION

Standard deviation, usually denoted by the Greek letter σ (small sigma), was first suggested by Karl Pearson
as a measure of dispersion in 1893. It is defined as the positive square root of
the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean.
The square of the standard deviation is known as the variance. Thus, if X1, X2, …, Xn is a set of n
observations, then its standard deviation is given by:

$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

where $\bar{X} = \frac{\sum X}{n}$ is the arithmetic mean of the n observations.

If X̅ is a whole number, the above method is appropriate. If X̅ is not a whole number, the S.D. can be
conveniently computed by using the transformed form of the above formula, given below:

$$\sigma = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$$

where n is the number of observations.
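The transformed formula follows from the definition by expanding the square. A short derivation, using only the symbols already defined (with X̅ = ∑X / n):

$$\frac{1}{n}\sum (X - \bar{X})^2 = \frac{1}{n}\sum X^2 - 2\bar{X}\cdot\frac{1}{n}\sum X + \bar{X}^2 = \frac{\sum X^2}{n} - \bar{X}^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2$$

Taking the positive square root of both sides gives the transformed form of σ.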

In the case of a frequency distribution, the standard deviation is given by:

$$\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{N}}$$

where X is the value of the variable or the mid-value of the class (in the case of grouped or continuous
frequency distributions); f is the corresponding frequency of the value X; N = ∑f is the total frequency;
and

$$\bar{X} = \frac{\sum fX}{N}$$

is the arithmetic mean of the distribution.

The S.D. of an ungrouped or grouped frequency distribution can also be computed using the formula:

$$\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2} = \sqrt{\text{Mean of squares} - \text{Square of the mean}}$$

The variance is given by $\sigma^2$.

Example:

Calculate standard deviation of the following data:

X: 10 11 12 13 14 15 16 17 18

f: 2 7 10 12 15 11 10 6 3

Solution:

Calculation of Standard Deviation

X       X²     f     fX     fX²
10      100     2     20     200
11      121     7     77     847
12      144    10    120    1440
13      169    12    156    2028
14      196    15    210    2940
15      225    11    165    2475
16      256    10    160    2560
17      289     6    102    1734
18      324     3     54     972
Total          76   1064   15196

$$\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2} = \sqrt{\frac{15196}{76} - \left(\frac{1064}{76}\right)^2} = \sqrt{199.95 - (14)^2} = \sqrt{3.95} = 1.99$$
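The same computation can be sketched in Python with the "mean of squares minus square of the mean" form. The function name `frequency_sd` is hypothetical.

```python
from math import sqrt


def frequency_sd(values, freqs):
    """Standard deviation of a frequency distribution.

    Uses sigma = sqrt(sum(f*X^2)/N - (sum(f*X)/N)^2).
    """
    n = sum(freqs)
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / n
    mean_value = sum(f * x for x, f in zip(values, freqs)) / n
    return sqrt(mean_of_squares - mean_value ** 2)


X = [10, 11, 12, 13, 14, 15, 16, 17, 18]
F = [2, 7, 10, 12, 15, 11, 10, 6, 3]

sigma = frequency_sd(X, F)
print(f"N = {sum(F)}, sigma = {sigma:.2f}, variance = {sigma ** 2:.2f}")
```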

Short-cut method to calculate S.D

The formulae to calculate standard deviation using the short-cut method are given below:

1. $$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$$

where d = X – A and A is the assumed mean, usually taken as a value lying in the middle
portion of the X values.

2. $$\sigma = h \times \sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2}$$

where $u = \frac{X - A}{h}$; h is the class width or a common factor of the X values; A is the assumed
mean.

Example:
∴ σ X =14.24

Coefficient of Variation

The standard deviation is an absolute measure of dispersion and is expressed in the same units as the
variable X. A relative measure of dispersion based on the standard deviation is known as the coefficient of
standard deviation and is given by

$$\text{Coefficient of Standard Deviation} = \frac{\sigma}{\bar{X}}$$

The coefficient of standard deviation expressed as a percentage is called the coefficient of variation.
This measure, introduced by Karl Pearson, is used to compare the variability, homogeneity, stability,
uniformity or consistency of two or more sets of data. The data set with the higher coefficient
of variation is said to be more dispersed, i.e., less uniform and less consistent.

$$\text{Coefficient of Variation} = \frac{\sigma}{\bar{X}} \times 100$$

Example:

Calculate standard deviation and its coefficient of variation from the following data:

Measurements :  0 – 5   5 – 10   10 – 15   15 – 20   20 – 25

Frequency    :    4        1        10         3         2

Solution:

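Calculation of X̅ and σ

Class intervals   Frequency (f)   Mid-values (X)    u    fu   fu²
0 – 5                   4              2.5         -2    -8    16
5 – 10                  1              7.5         -1    -1     1
10 – 15                10             12.5          0     0     0
15 – 20                 3             17.5          1     3     3
20 – 25                 2             22.5          2     4     8
Total                  20                                -2    28

Here, $u = \frac{X - 12.5}{5}$, so that A = 12.5 and h = 5.

$$\bar{X} = A + \frac{\sum fu}{N} \times h = 12.5 - \frac{5 \times 2}{20} = 12$$

$$\sigma = h \times \sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2} = 5 \times \sqrt{\frac{28}{20} - \left(\frac{-2}{20}\right)^2} = 5.89$$

$$\text{Coefficient of Variation (CV)} = \frac{\sigma}{\bar{X}} \times 100 = \frac{5.89}{12} \times 100 = 49$$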
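A minimal sketch of the step-deviation method used in this example is given below. The function name and argument order are assumptions made for illustration; the assumed mean A = 12.5 and width h = 5 are those of the worked solution.

```python
from math import sqrt


def step_deviation_stats(mid_values, freqs, assumed_mean, width):
    """Return (mean, standard deviation, coefficient of variation in %)."""
    n = sum(freqs)
    # u = (X - A) / h for every class mid-value
    u = [(x - assumed_mean) / width for x in mid_values]
    sum_fu = sum(f * ui for f, ui in zip(freqs, u))
    sum_fu2 = sum(f * ui * ui for f, ui in zip(freqs, u))
    mean_value = assumed_mean + width * sum_fu / n
    sd = width * sqrt(sum_fu2 / n - (sum_fu / n) ** 2)
    return mean_value, sd, sd / mean_value * 100


MID_VALUES = [2.5, 7.5, 12.5, 17.5, 22.5]
FREQS = [4, 1, 10, 3, 2]

mean_value, sd, cv = step_deviation_stats(MID_VALUES, FREQS, assumed_mean=12.5, width=5)
print(f"Mean = {mean_value}, S.D. = {sd:.2f}, C.V. = {cv:.0f}%")
```

Any other choice of assumed mean gives the same mean and standard deviation, since the shift cancels out in both formulae.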

Merits, Demerits and Uses of Standard Deviation

Merits

1. It is a rigidly defined measure of dispersion.
2. It is based on all the observations.
3. It is capable of being treated mathematically. For example, if the standard deviations of a number of
groups are known, their combined standard deviation can be computed.
4. It is not very much affected by the fluctuations of sampling and is therefore widely used in
sampling theory and tests of significance.

Demerits

1. As compared to quartile deviation, range, etc., it is difficult to understand and calculate.
2. It gives more importance to extreme observations.

Uses

1. Standard deviation can be used to compare the dispersions of two or more distributions when
their units of measurements and arithmetic means are same.
2. It is used to test the reliability of mean. It may be pointed out here that the mean of a distribution
with lower S.D is said to be more reliable.
SKEWNESS

Skewness of a distribution refers to its asymmetry. The symmetry of a distribution implies that, for a given
deviation from a central value, there is an equal number of observations on either side of it. If the
distribution is asymmetrical or skewed, its frequency curve has a prolonged tail either towards the
left or towards the right. Thus, the skewness of a distribution is defined as the departure from
symmetry.

In a symmetrical distribution, mean, median and mode are equal and the ordinate at mean divides the
frequency curve into two parts such that one part is the mirror image of the other. Positive skewness
results if some observations of high magnitude are added to a symmetrical distribution so that the right
hand tail of the frequency curve gets elongated. In such a situation, we have Mode< Median < Mean.
Similarly, negative skewness results when some observations of low magnitude are added to the
distribution so that left hand tail of the frequency curve gets elongated and we have Mode > Median >
Mean.

Measure of Skewness

A measure of skewness gives the extent and direction of skewness of a distribution. As in case of
dispersion, we can define absolute and relative measures of skewness. Two measures of skewness are:

i. Measures of Skewness based on Mean (X̅), Median (Md), Mode (Mo):

This measure was suggested by Karl Pearson. According to this method, the difference
between the mean X̅ and the mode Mo can be taken as an absolute measure of skewness in a distribution,
i.e., absolute measure of skewness = X̅ - Mo.
Alternatively, when the mode is ill-defined and the distribution is moderately skewed, the
above measure can be approximately expressed as 3(X̅ - Md).

A relative measure, known as Karl Pearson’s Coefficient of Skewness, is given by

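$$S_k = \frac{\bar{X} - M_o}{\sigma} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$

And if the mode is ill-defined,

$$S_k = \frac{3(\bar{X} - M_d)}{\sigma} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$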

Note that, if Sk > 0, the distribution is positively skewed.

If Sk< 0, the distribution is negatively skewed and


If Sk = 0, the distribution is symmetrical.
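As an illustration, the median-based form of Karl Pearson's coefficient can be sketched as below and applied to the ten heights from the mean deviation example and to the symmetrical series B. The function name is hypothetical; `statistics.pstdev` is used because it divides by n, matching the definition of σ given in these notes.

```python
from statistics import mean, median, pstdev


def pearson_skewness_median(values):
    """Karl Pearson's coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / pstdev(values)


heights = [60, 62, 70, 69, 63, 65, 60, 68, 63, 64]
series_b = [11, 12, 13, 14, 15, 16, 17, 18, 19]

print(f"Heights:  Sk = {pearson_skewness_median(heights):.3f}")
print(f"Series B: Sk = {pearson_skewness_median(series_b):.3f}")
```

The heights have a mean above the median, so the coefficient is positive; series B is symmetrical about 15, so the coefficient is zero.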

ii. Measure of Skewness based on Quartiles

This measure suggested by Bowley, is based upon the fact that Q1 and Q3 are equidistant from
median of a symmetrical distribution, i.e., Q3 – Md = Md – Q1. Therefore,
(Q3 – Md ) – (Md – Q1) can be taken as an absolute measure of skewness.

The relative measure, known as Bowley’s Coefficient of Skewness, is defined as

$$S_Q = \frac{(Q_3 - M_d) - (M_d - Q_1)}{(Q_3 - M_d) + (M_d - Q_1)} = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1}$$

The value of SQ will lie between -1 and +1.
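A short sketch of Bowley's coefficient for raw data follows. As the notes do not fix a quartile convention, it assumes the (n + 1)/4 positions with linear interpolation (the `"exclusive"` method of `statistics.quantiles`); the function name is hypothetical, and the data are series B and C from the opening illustration.

```python
from statistics import quantiles


def bowley_skewness(values):
    """Bowley's coefficient: (Q3 + Q1 - 2 * Md) / (Q3 - Q1)."""
    # Assumption: quartiles located with the "exclusive" (n + 1) method.
    q1, m_d, q3 = quantiles(values, n=4, method="exclusive")
    return (q3 + q1 - 2 * m_d) / (q3 - q1)


series_b = [11, 12, 13, 14, 15, 16, 17, 18, 19]
series_c = [6, 10, 13, 14, 15, 17, 19, 20, 21]

print(f"Series B: SQ = {bowley_skewness(series_b)}")
print(f"Series C: SQ = {bowley_skewness(series_c)}")
```

Because only the quartiles and the median enter the formula, the coefficient is unaffected by the extreme observations in either tail.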

Example:
Example:
