Bast 503 Lect 5

This document discusses measures of dispersion and relative standing in data distributions. It defines dispersion as a measure of how items vary from the central value in a distribution. Common measures of dispersion discussed include range, inter-quartile range, mean deviation, and standard deviation. The inter-quartile range is presented as a robust measure of dispersion that is less influenced by outliers than range. Advantages and disadvantages of different dispersion measures are provided.

MEASURES OF DISPERSION, RELATIVE STANDING AND SHAPE

INTRODUCTION
• Measures of central tendency give us a bird's-eye view of the entire data. They are called averages of the first order and serve to locate the centre of the distribution,
• but they do not reveal how the items are spread out on either side of the central value.
• The measure of the scattering of items in a distribution about the average is called dispersion.

INTRODUCTION
• Dispersion measures the extent to which the items
vary from some central value.
• Measures of dispersion or variation measure only the
degree but not the direction of the variation.
• The measures of dispersion are also called averages
of the second order because they are based on the
deviations of the different values from the mean or
other measures of central tendency which are called
averages of the first order.

DEFINITION
• In the words of Bowley “Dispersion is the measure of
the variation of the items”
• According to Connor, “Dispersion is a measure of the extent to which the individual items vary”
• According to Simpson and Kafka, “The measures of the scatterness of a mass of figures in a series about an average is called measure of variation, or dispersion”.
• According to Spiegel, “The degree to which numerical data tend to spread about an average value is called the variation, or dispersion of the data”.
Measures of Variability or Dispersion

• The dispersion of a distribution reveals how the observations are spread out or scattered on each side of the centre.
• Measuring the dispersion, scatter, or variation of a distribution is as important as locating its central tendency.
• If the dispersion is small, it indicates high uniformity of the observations in the distribution.
• Absence of dispersion in the data indicates perfect uniformity. This situation arises when all observations in the distribution are identical.
• If this were the case, a description of any single observation would suffice.
Example: number of minutes 20 clients waited to see a consulting doctor
• Consultant doctor X: mean waiting time 14.6 minutes
• Consultant doctor Y: mean waiting time 14.6 minutes
• What is the difference in the two series?

Significance of Measuring Variation
• It shows how far an average is representative of the mass of data.
• It is used to determine uniformity and consistency.
• It is used to determine the nature and cause of variation in order to control the variation.
• It enables a comparison of two or more series with regard to their variability.
Objective of dispersion
• The value of dispersion tells us how reliable a measure of central tendency is.
• Usually, a small value of dispersion indicates that the measure of central tendency is more reliable for the series, and vice versa.
• Many powerful analytical tools in statistics such as
correlation analysis, the testing of hypothesis,
analysis of variance, the statistical quality control,
regression analysis are based on measure of
variation of one kind or another.
Characteristic of an Ideal Measure of
Dispersion
• It should be rigidly defined.
• It should be easy to understand and easy to calculate.
• It should be based on all the observations of the data.
• It should be easily subjected to further mathematical treatment.
• It should be least affected by sampling fluctuations.
• It should not be unduly affected by extreme values.
How dispersions are measured?
• Absolute: measures the dispersion in the original unit of the data.
– Variability in two or more distributions can be compared provided they are given in the same unit and have the same average.
• Relative: the measure of dispersion is free from the unit of measurement of the data.
– It is the ratio of a measure of absolute dispersion to the average from which the absolute deviations are measured.
– It is called the coefficient of dispersion.
Measures of Dispersion

– Range
– Inter-Quartile deviation
– Mean deviation
– Standard deviation
Range
• The simplest and crudest measure of
dispersion is the range.
• This is defined as the difference between the
largest and the smallest values in the
distribution.
• If $x_1, x_2, \ldots, x_n$ are the values of observations in a sample, then the range (R) of the variable X is given by:
$$R(x_1, x_2, \ldots, x_n) = \max\{x_1, x_2, \ldots, x_n\} - \min\{x_1, x_2, \ldots, x_n\}$$
Range
• Symbolically for ungrouped data it is represented by:
R=H–L
Where, R = Range
H = Highest value in the observation
L = Lowest value in the observation

• In the case of grouped data, the range is estimated by taking the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval.
Range
Grouped data:
Size of Item    Frequency
20-40           7
40-60           11
60-80           30
80-100          17
100-120         5
Total           70

Range = 120 - 20 = 100

Ungrouped data:
Set 1    Set 2
8        30
10       35
20       42
9        50
15       32
10       49
13       39
28       33

Coefficient of Range = ?
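
As a hedged illustration, the short Python sketch below computes the range and a coefficient of range for the two sets of values in the table above. It assumes the usual relative form (H − L)/(H + L), which the slide itself does not spell out; the function names are hypothetical.

```python
def value_range(values):
    """Range R = H - L: highest value minus lowest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range (H - L) / (H + L); this formula is an assumption, not given in the source."""
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest)


set_1 = [8, 10, 20, 9, 15, 10, 13, 28]
set_2 = [30, 35, 42, 50, 32, 49, 39, 33]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    print(name, value_range(data), round(coefficient_of_range(data), 3))

# Grouped table: upper limit of the highest class (120), lower limit of the lowest class (20)
print("Grouped", 120 - 20, round((120 - 20) / (120 + 20), 3))
```

Both sets have the same absolute range, so only the relative measure separates them; this is the kind of comparison a coefficient of dispersion is meant for.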
When To Use the Range
• The range is used when
– you have ordinal data or
– you are presenting your results to people with little or no
knowledge of statistics
– It is used to find the variability of data quickly.
– When the sample size is small, it is considered quite
adequate to measure the variability.
• The range is rarely used in scientific work as it is fairly insensitive
– It depends on only two scores in the set of data, the largest (XL) and the smallest (XS)
– Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
Merits and Demerits of Range
• It is the simplest and easiest to compute, understand and interpret.
• It is a crude measure since it does not take into account all individual observations.
• It is subject to wide fluctuation from sample to sample drawn from the same population. Addition or removal of a single extreme value at the upper or lower end of a data series can alter the range to a great extent.
• In an open-ended distribution, it is not possible to compute the range.
Characteristics of Range
• Simplest measure of dispersion
• It is not based on all the observations.
• Unduly affected by the extreme values and
fluctuations of sampling.
• The range may increase with the size of the set of
observations
• Gives an idea of the variability very quickly
Inter-quartile Range or Quartile Deviation
• The IQR is a measure of variability, based on dividing a data set into
quartiles.
• Quartiles divide a rank-ordered data set into four equal parts. The
values that separate parts are called the first, second, and third
quartiles; and they are denoted by Q1, Q2, and Q3, respectively.
• The interquartile range (IQR), also called the midspread or middle 50%, or technically the H-spread, is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles:
IQR = Q3 − Q1.
• In other words, the IQR is the first quartile subtracted from the third
quartile.
Inter-quartile Range or Quartile Deviation
Finding the IQR requires the following steps:
1. Arrange the data set from lowest to highest.
2. Find the values of the 3rd quartile (Q3) and the 1st quartile (Q1) by using the locator formula c = k(n)/4, where k is the quartile number and n is the number of values (observations).
Rule of thumb for c
a) If c is a whole number, the quartile value is the average of the values at positions c and c + 1.
b) If c is not a whole number, the quartile value is the value at the position obtained when c is rounded off.
3. Subtract Q1 from Q3.
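
The steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, and rule (b) is assumed to mean rounding c up to the next whole position, which is one common reading of this locator rule. Other quartile conventions give slightly different answers.

```python
import math


def quartile(values, k):
    """Return the k-th quartile (k = 1, 2 or 3) using the locator c = k * n / 4."""
    data = sorted(values)              # step 1: arrange from lowest to highest
    n = len(data)
    c = k * n / 4                      # step 2: locator position (1-based)
    if c == int(c):
        c = int(c)
        # rule (a): average of the values at positions c and c + 1
        return (data[c - 1] + data[c]) / 2
    # rule (b): assumed here to mean rounding c up to the next whole position
    return data[math.ceil(c) - 1]


def interquartile_range(values):
    """Step 3: IQR = Q3 - Q1."""
    return quartile(values, 3) - quartile(values, 1)


# Two sets with the same range (8) but very different spread in the middle
print(interquartile_range([1, 1, 1, 1, 9]))
print(interquartile_range([1, 3, 5, 7, 9]))
```

Under these assumptions the first set has an IQR of 0 and the second an IQR of 4, even though their ranges are identical.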
Percentiles, Quartiles (Measure of Relative
Standing) and Inter-quartile Range
Semi Inter-quartile Range or Quartile Deviation
• The quartile deviation is half of the IQR. It can be found by dividing the IQR by 2.
QD = IQR/2 = (Q3 – Q1)/2
• When the QD is small, there is little deviation among the central 50% of items. If it is large, the central 50% of items have a large variation.
• Coefficient of quartile deviation: the IQR or the QD is an absolute measure of dispersion and can be converted into a relative measure of dispersion using
(Q3 – Q1)/(Q3 + Q1)
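
A small sketch of the two formulas above, using hypothetical quartile values (Q1 = 9.5 and Q3 = 17.5 are illustrative only, and the function names are not from the source):

```python
def quartile_deviation(q1, q3):
    """QD = (Q3 - Q1) / 2, an absolute measure in the units of the data."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1), a unit-free relative measure."""
    return (q3 - q1) / (q3 + q1)


q1, q3 = 9.5, 17.5  # hypothetical quartiles, for illustration only
print(quartile_deviation(q1, q3))
print(round(coefficient_of_quartile_deviation(q1, q3), 3))
```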
IQR
• Merits:
– It is superior to the range as a measure of dispersion.
– It has special utility in measuring variation in open-ended distributions or where the data may be ranked but not measured quantitatively.
– Useful in erratic or badly skewed distributions.
– The quartile deviation is not affected by the presence of extreme values.
• Limitations:
– As the value of the quartile deviation does not depend upon every item of the series, it cannot be regarded as a good method of measuring dispersion.
– It is not capable of mathematical manipulation.
– Its value is very much affected by sampling fluctuation.
The mean deviation
• The mean deviation is an average of absolute deviations of
individual observations from the central value of a series.
• It uses an average but ignores signs and hence appears unmethodical.
• MD is based on all values and hence cannot be calculated for open-ended distributions.
• MD can be calculated from the mean as well as from the median, for ungrouped data using the direct method and for continuous distributions using the assumed mean method or the short-cut method.
• The average used is either the arithmetic mean or the median.
Actual and absolute deviations from mean

• For individual series: $X_1, X_2, \ldots, X_n$

$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$

• For discrete series: $X_1, X_2, \ldots, X_n$ with corresponding frequencies $f_1, f_2, \ldots, f_n$

$$MD = \frac{\sum f_i |x_i - \bar{x}|}{N}, \quad N = \sum f_i$$
Mean Deviation for continuous grouped data

• If $m_1, m_2, \ldots, m_n$ are the class mid-points with corresponding frequencies $f_1, f_2, \ldots, f_n$:

$$MD_{\bar{x}} = \frac{\sum f_i |m_i - \bar{x}|}{N}, \quad N = \sum f_i$$

• Coefficient of MD = MD / average
– The average is the one from which the deviations are calculated. The coefficient is a relative measure of dispersion and is comparable to similar measures of other series.
Problem – Find the MD for the following data
Class Frequency
0-10 7
10-20 12
20-30 18
30-40 25
40-50 16
50-60 14
60-70 8
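
One way to work this problem is sketched below in Python. It assumes the deviations are taken from the arithmetic mean and that each class is represented by its mid-point; the variable names are illustrative.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [7, 12, 18, 25, 16, 14, 8]

# Class mid-points m_i stand in for the values inside each class
midpoints = [(low + high) / 2 for low, high in classes]

total = sum(freqs)                                               # N = sum of f_i
mean = sum(f * m for f, m in zip(freqs, midpoints)) / total      # x-bar
mean_deviation = sum(f * abs(m - mean) for f, m in zip(freqs, midpoints)) / total
coefficient_of_md = mean_deviation / mean

print(total, mean, round(mean_deviation, 2), round(coefficient_of_md, 3))
```

With these assumptions N = 100, the mean is 35.5 and the mean deviation about the mean is 13.22.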
Merits and limitation of MD
• Simplicity – simple to understand and easy to
calculate
• It is based on each and every data
• Less affected by the extreme value
Drawback
– Algebraic signs are ignored
– May not give very accurate result
– Not capable of further algebraic treatment
– Rarely used
Standard Deviation
• Each of the earlier measures of dispersion suffers from one demerit or another:
• Range – it suffers from a serious drawback: it considers only 2 values and neglects all the other values of the series.
• Quartile deviation considers only the central 50% of the items and ignores the other 50% of items in the series.
• Mean deviation is no doubt an improved measure, but it ignores negative signs without any basis.
Standard Deviation
• Introduced by Karl Pearson in 1893.
• Widely used measure of dispersion
• Karl Pearson, after observing all these things, has given us a
more scientific formula for calculating or measuring
dispersion.
• The standard deviation is a very important concept in
Statistics since it is the basic tool for summarizing the amount
of randomness in a situation.
• Also known as root mean square deviation
• Denoted by the small Greek letter σ (sigma)
• It is the positive square root of the average of squares of
deviations of the observations from the mean.
Standard Deviation
• It is the positive square root of the average of squares of
deviations of the observations from the mean.
• While calculating SD we take the deviations of individual observations from their AM and then square each of them. The sum of the squares is divided by the total number of observations. The square root of this quotient is known as the standard deviation.
• It is always calculated from the arithmetic mean; the median and mode are not used.
Standard Deviation
• Standard deviation tells you how spread out the data is.
• It is a measure of how far each observed value is from the
mean.
• In a normal (bell-shaped) distribution, about 95% of values will be within 2 standard deviations of the mean.
Difference between MD and SD
• Algebraic signs are ignored in the calculation of MD, whereas in the calculation of SD the signs are taken into account.
• MD can be computed from either the mean or the median, whereas SD is always computed from the AM because the sum of the squares of the deviations of items from the AM is the least.
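Standard Deviation for individual observation
• Deviation from actual mean

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} \quad\text{or}\quad \sqrt{\frac{\sum d^2}{N}} \quad\text{or}\quad \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$$

where $d = x - \bar{x}$

• Deviation from assumed mean

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$$

where $d = x - A$ and A is the assumed mean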
Steps in calculating the standard deviation
• For each value, find its distance to the mean
• For each value, find the square of this distance
• Find the sum of these squared values
• Divide the sum by the number of values in the data
set
• Find the square root of this
Problem: Find the mean respiration rate per minute and its standard deviation when in 4 cases the rate was found to be: 16, 13, 17 and 22.

x: 16, 13, 17, 22

$$\sigma = \sqrt{\frac{\sum (x - \bar{X})^2}{N}} = \sqrt{\frac{\sum d^2}{N}}$$
Example: Blood serum cholesterol levels of 10 persons are as follows: 240, 260, 290, 245, 255, 288, 272, 263, 277, 251. Calculate the standard deviation with the help of an assumed mean.
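
The two problems above can be worked with a short Python sketch. The first function follows the listed steps (deviation from the actual mean); the second uses the assumed mean formula. The function names and the choice of 260 as assumed mean are illustrative assumptions, since the slides do not fix a value for A.

```python
import math


def sd_direct(values):
    """Deviation from the actual mean: sqrt(sum((x - mean)^2) / N)."""
    n = len(values)
    mean = sum(values) / n
    squared = [(x - mean) ** 2 for x in values]   # squared distance of each value to the mean
    return mean, math.sqrt(sum(squared) / n)


def sd_assumed_mean(values, assumed_mean):
    """Deviation from an assumed mean A: sqrt(sum(d^2)/N - (sum(d)/N)^2), with d = x - A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


respiration = [16, 13, 17, 22]
cholesterol = [240, 260, 290, 245, 255, 288, 272, 263, 277, 251]

mean_r, sd_r = sd_direct(respiration)
sd_c = sd_assumed_mean(cholesterol, 260)   # A = 260 is an arbitrary, hypothetical choice

print(mean_r, round(sd_r, 2))
print(round(sd_c, 2))
```

The choice of assumed mean does not change the result; any A gives the same standard deviation as the direct method, which is why a round number close to the data is usually picked.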
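Standard Deviation for discrete series
• Deviation from actual mean

$$\sigma = \sqrt{\frac{\sum f x^2}{N}}, \quad\text{where } x = X - \bar{X}$$

• Deviation from assumed mean

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}, \quad\text{where } d = X - A$$

• Step deviation method

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i, \quad\text{where } d = \frac{X - A}{i}$$

where i is the common interval
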
Find the standard deviation for the following discrete data using the assumed mean method

x    10   11   12   13   14   15   16
f    2    7    11   15   10   4    1

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$$

Working table columns: x | f | d = x − A | fd
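
A possible working of this problem in Python, taking A = 13 as the assumed mean (an illustrative choice; the slide leaves A open):

```python
import math

x = [10, 11, 12, 13, 14, 15, 16]
f = [2, 7, 11, 15, 10, 4, 1]
A = 13                                    # assumed mean (hypothetical choice)

N = sum(f)
d = [xi - A for xi in x]                  # d = x - A
sum_fd = sum(fi * di for fi, di in zip(f, d))
sum_fd2 = sum(fi * di * di for fi, di in zip(f, d))

sigma = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2)
print(N, sum_fd, sum_fd2, round(sigma, 3))
```

Here N = 50, Σfd = −10 and Σfd² = 92, so the variance is 1.84 − 0.04 = 1.80.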
Find the standard deviation for the following discrete data using the step deviation method

BP (mmHg)   102  106  110  114  118  122  126
Days        2    9    25   35   17   10   1

Working table columns: x | f | d = (x − A)/4 | fd

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i$$
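
The step deviation method for the blood pressure data can be sketched as below. The function name is hypothetical, and A = 114 is an assumed choice of origin; the interval i = 4 comes from the spacing of the BP values.

```python
import math


def sd_step_deviation(values, freqs, assumed_mean, interval):
    """sigma = i * sqrt(sum(f d^2)/N - (sum(f d)/N)^2), with d = (x - A) / i."""
    n = sum(freqs)
    d = [(x - assumed_mean) / interval for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return interval * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


bp = [102, 106, 110, 114, 118, 122, 126]
days = [2, 9, 25, 35, 17, 10, 1]

sigma = sd_step_deviation(bp, days, assumed_mean=114, interval=4)
print(round(sigma, 2))
```

Dividing by the common interval keeps the d values small whole numbers, and multiplying by i at the end restores the original units (mmHg).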
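Standard Deviation for Group Data

• SD is:

$$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}, \quad\text{where } \bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

• Simplified formula

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i, \quad d = \frac{m - A}{i}$$

where i = class interval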
Find the standard deviation for the following grouped data using the step deviation method

IQ                10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of students   5      12     15     20     10     4      2

Working table columns: IQ | f | m | d = (m − A)/10 | fd
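
As a cross-check on the step deviation working, the sketch below computes the same standard deviation directly from the class mid-points and compares it with the step deviation formula. A = 45 and i = 10 are assumed choices; the variable names are illustrative.

```python
import math

classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
f = [5, 12, 15, 20, 10, 4, 2]
m = [(low + high) / 2 for low, high in classes]      # class mid-points

N = sum(f)
mean = sum(fi * mi for fi, mi in zip(f, m)) / N

# Direct formula: sqrt(sum(f (m - mean)^2) / N)
sigma_direct = math.sqrt(sum(fi * (mi - mean) ** 2 for fi, mi in zip(f, m)) / N)

# Step deviation formula with assumed mean A = 45 and class interval i = 10
A, i = 45, 10
d = [(mi - A) / i for mi in m]
sum_fd = sum(fi * di for fi, di in zip(f, d))
sum_fd2 = sum(fi * di * di for fi, di in zip(f, d))
sigma_step = i * math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

print(N, round(mean, 2), round(sigma_direct, 2), round(sigma_step, 2))
```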
Mathematical Properties of SD
• Combined Standard Deviation
• The sum of squares of the deviations of items in the series from their AM is a minimum.
Mathematical Properties of SD
Relation between Measures of
Dispersion
Coefficient of Variation
• The relative measure corresponding to SD is known as the coefficient of variation (CV).
• Developed by Karl Pearson
• Used where we want to compare the variability of two or more series
• The coefficient of variation is computed as the ratio of the standard deviation of the distribution to the mean of the same distribution, usually expressed as a percentage: CV = (σ / mean) × 100.
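
To illustrate how the coefficient of variation lets two series in different units be compared, the sketch below applies it to the respiration rate and cholesterol figures used earlier in this lecture. Expressing the ratio as a percentage and the function name are assumptions of this sketch.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (population SD / arithmetic mean) * 100, as a percentage."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


respiration = [16, 13, 17, 22]                                    # breaths per minute
cholesterol = [240, 260, 290, 245, 255, 288, 272, 263, 277, 251]  # serum cholesterol

cv_resp = coefficient_of_variation(respiration)
cv_chol = coefficient_of_variation(cholesterol)
print(round(cv_resp, 2), round(cv_chol, 2))
```

Although the cholesterol series has the larger standard deviation in absolute terms, its CV is smaller, so relative to its mean it is the less variable of the two.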
Characteristics of Standard Deviation
• SD is very satisfactory and most widely used measure of
dispersion
• Amenable for mathematical manipulation
• It is independent of origin, but not of scale
• If SD is small, there is a high probability of getting a value close to the mean; if it is large, the values lie farther away from the mean
• Does not ignore the algebraic signs and is less affected by fluctuations of sampling
Example-2: Find Standard Deviation of Group Data

x_i     f_i    f_i x_i    f_i x_i²    x_i − x̄    (x_i − x̄)²    f_i (x_i − x̄)²
3       2      6          18          -3          9              18
5       3      15         75          -1          1              3
7       2      14         98          1           1              2
8       2      16         128         2           4              8
9       1      9          81          3           9              9
Total   10     60         400         -           -              40

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{60}{10} = 6$$

$$\sigma = \sqrt{\frac{\sum f (x - \bar{X})^2}{N}} = \sqrt{\frac{40}{10}} = 2$$
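
The table totals can be checked with a few lines of Python. The sketch also shows that the column Σf·x² = 400 leads to the same answer through the equivalent form Σfx²/N − x̄²; variable names are illustrative.

```python
import math

x = [3, 5, 7, 8, 9]
f = [2, 3, 2, 2, 1]

N = sum(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, x))
sum_fx2 = sum(fi * xi * xi for fi, xi in zip(f, x))
mean = sum_fx / N

# Form 1: squared deviations from the mean, weighted by frequency
sum_f_dev2 = sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x))
sigma_1 = math.sqrt(sum_f_dev2 / N)

# Form 2: mean of squares minus square of the mean
sigma_2 = math.sqrt(sum_fx2 / N - mean ** 2)

print(N, sum_fx, sum_fx2, mean, sum_f_dev2, sigma_1, sigma_2)
```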
Variance
• The term was used to describe the square of the SD by R. A. Fisher in 1918.
• It is of great importance in advanced work, where it is possible to split the total variation into several parts, each attributable to one of the factors causing variation in the original series.
• Variance measures how far a data set is spread out.
• Defined as “the average of the squared differences from the mean”: $\sigma^2 = \sum (x - \bar{x})^2 / N$
• Also defined as the square of the SD: variance $= \sigma^2$
Variance
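• In a frequency distribution where deviations are taken from an assumed mean, the variance can be calculated using

$$\sigma^2 = \left[\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2\right] \times i^2$$

• where $d = (m - A)/i$ and i = class interval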
Problem
• Calculate the range, the variance and the standard deviation of the following sample data: vitamin E concentration (mmol/l) in 12 heifers showing clinical signs of an unusual myopathy:
4.2 3.3 7 6.9 5.1 3.4 2.5 8.6 3.5 2.9 4.9 5.4

$\bar{x} = 57.7/12 \approx 4.81$
Range = max - min = 8.6 - 2.5 = 6.1

X       x − x̄     (x − x̄)²
4.2     -0.61      0.37
3.3     -1.51      2.28
7       2.19       4.80
6.9     2.09       4.38
5.1     0.29       0.09
3.4     -1.41      1.98
2.5     -2.31      5.33
8.6     3.79       14.38
3.5     -1.31      1.71
2.9     -1.91      3.64
4.9     0.09       0.01
5.4     0.59       0.35
Total              39.31

$$\text{variance} = \frac{\sum (x - \bar{x})^2}{n} = \frac{39.31}{12} = 3.28$$

$$\sigma = \sqrt{3.28} = 1.81$$
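
The worked figures can be reproduced with Python's standard `statistics` module. The sketch below also prints the n − 1 (sample) versions for comparison; that divisor is not used in the working above and is shown only as a commonly used alternative.

```python
import statistics

vitamin_e = [4.2, 3.3, 7, 6.9, 5.1, 3.4, 2.5, 8.6, 3.5, 2.9, 4.9, 5.4]

data_range = max(vitamin_e) - min(vitamin_e)
mean = statistics.mean(vitamin_e)

# Divisor n, as in the working above
variance_n = statistics.pvariance(vitamin_e)
sd_n = statistics.pstdev(vitamin_e)

# Divisor n - 1 (sample variance), shown for comparison only
variance_n1 = statistics.variance(vitamin_e)
sd_n1 = statistics.stdev(vitamin_e)

print(round(data_range, 1), round(mean, 2))
print(round(variance_n, 2), round(sd_n, 2))
print(round(variance_n1, 2), round(sd_n1, 2))
```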
Standard Error(SE)
• The standard error (SE) is very similar to the standard deviation, as both are measures of spread.
• The standard error is the standard deviation of the sampling distribution of a statistic, most commonly the sample mean.
• Standard error uses statistics (sample data); standard deviation uses parameters (population data).
• The standard error is a statistical term that measures the accuracy with which a sample represents a population.
• A sample mean generally deviates from the actual mean of the population; the standard error measures the typical size of this deviation.
• Defined as
$$SE = \frac{\sigma}{\sqrt{n}}$$
Need of SE
• Used as an instrument in testing of hypothesis
• Provide an idea about unreliability of a sample
• Helps in determining the limits within which
values are expected to lie
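
A minimal sketch of the standard error of the mean, SE = σ/√n, using the vitamin E sample from the earlier problem. Because the population σ is unknown here, the sketch assumes it is estimated by the sample standard deviation with divisor n − 1; that choice is an assumption of this example.

```python
import math
import statistics

vitamin_e = [4.2, 3.3, 7, 6.9, 5.1, 3.4, 2.5, 8.6, 3.5, 2.9, 4.9, 5.4]

n = len(vitamin_e)
s = statistics.stdev(vitamin_e)        # sample SD (divisor n - 1), standing in for sigma
standard_error = s / math.sqrt(n)      # SE of the sample mean

print(n, round(s, 2), round(standard_error, 3))
```

Since SE shrinks with √n, quadrupling the sample size would halve the standard error for the same spread in the data.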
Problem: Find Standard Deviation and variance of Group Data

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$$

X         d = (X − 240)    d²
240.12
240.13
240.15
240.12
240.17
240.15
240.17
240.16
240.22
240.21
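
One way to fill in the d and d² columns and finish the problem is sketched below, using 240 as the assumed mean as in the table header. Values are rounded only for display, because subtracting 240 in floating point leaves tiny rounding residue.

```python
import math

X = [240.12, 240.13, 240.15, 240.12, 240.17,
     240.15, 240.17, 240.16, 240.22, 240.21]
A = 240                                   # assumed mean from the table header

N = len(X)
d = [x - A for x in X]                    # d = X - 240
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

variance = sum_d2 / N - (sum_d / N) ** 2
sigma = math.sqrt(variance)

print(round(sum_d, 2), round(sum_d2, 4))
print(round(variance, 5), round(sigma, 4))
```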
Problem
Calculate the Variance, Standard Deviation, Mean Deviation from Median, Mode, Arithmetic Mean, Geometric Mean, Harmonic Mean, Coefficient of Variation and Standard Error from the data on gestation period of crossbred cows given below, for each grade individually and for the pooled data, using the formulae for Variance, Standard Deviation and Standard Error.

FxH FxBxH FxJxH


278 273 285 282 274 290 271 292 264 285 271 265 299 272 276
289 287 270 277 285 276 275 308 273 266 280 266 279 281 263
254 283 275 286 276 278 299 278 286 280 270 279 273 305 284
285 293 283 287 286 279 280 284 285 270 278 269 285 268 279
272 292 281 285 290 272 287 275 276 281 278 283 279 271 274
261 282 282 255 283 276 278 279 270 276 284 266 281 279 283
283 309 281 288 288 275 288 291 274 279 274 278 280 279 280
283 285 281 272 277 283 272 285 282 260 283 275 276 286 279
273 285 270 292 253 282 285 257 266 286 280 281 276 286 275
295 269 288 281 287 270 294 279 269 299 277 298 277 277 260
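
A hedged sketch of how the requested statistics could be computed for any one list of gestation periods is given below. The function name is hypothetical, divisor n is assumed for variance and SD, and the sample passed in is simply the first row of the table, used purely for illustration: the text does not make clear which columns belong to which grade, so the grouping has to be supplied by the reader.

```python
import math
import statistics


def summarize(values):
    """Return the summary measures asked for in the problem (population divisor n assumed)."""
    n = len(values)
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    return {
        "arithmetic_mean": mean,
        "geometric_mean": statistics.geometric_mean(values),
        "harmonic_mean": statistics.harmonic_mean(values),
        "median": median,
        "modes": statistics.multimode(values),          # every value tied for most frequent
        "variance": statistics.pvariance(values),
        "standard_deviation": sd,
        "mean_deviation_from_median": sum(abs(x - median) for x in values) / n,
        "coefficient_of_variation": sd / mean * 100,
        "standard_error": sd / math.sqrt(n),
    }


# First row of the table only; not a grade-wise or pooled result
first_row = [278, 273, 285, 282, 274, 290, 271, 292, 264, 285, 271, 265, 299, 272, 276]
summary = summarize(first_row)
for name, value in summary.items():
    print(name, value)
```

Calling `summarize` once per grade and once on the concatenated lists would give the grade-wise and pooled figures the problem asks for.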
