Measures of Dispersion


How spread out is the

data?

Variability = Spread = Dispersion


Variability

• Measures of central tendency: one single value that represents the entire data.
• Measures of variation: give additional information to judge the reliability of our measure of central tendency.
Variability

• For two groups the average value may be the same, yet there can still be wide disparities between them.
• Measures of variation describe the extent to which the observations vary from the average value.
Graph (continued)

When the data are widely spread, the central location is less representative of the data as a whole than it would be for data more closely centered around the mean, as in the first curve.
Dispersion

• There are problems peculiar to widely dispersed data.
• We may wish to compare the dispersion of various samples.
• A wide spread of values away from the center may represent an unacceptable risk.

Financial analysts
The dispersion of a firm's earnings, varying from extremely high to low or even negative levels, means higher risk to stockholders and creditors.

Quality control experts
They analyze the dispersion of a product's quality levels. A drug that is average in purity but ranges from very pure to highly impure may endanger lives!
In matters of health
Variations in body temperature, pulse beat and blood pressure are basic guides to diagnosis. Treatment aims to control their variation.

In industrial production
Control of quality variation.

In social problems
Inequality in the distribution of income and wealth, etc.
Common measures of dispersion or variability
• Range
• Mean Deviation
• Quartile Deviation
• Variance
• Standard Deviation
Range

• The range is the distance between the smallest and the largest value in the set.
• Range = Highest value − Lowest value
• Quick and easy, but it reflects only the extremes and may be distorted by one extreme value.
Quartile deviation

• Range is affected by extreme items.
• To reduce this effect, the interquartile range is defined.
• Quartile deviation is an absolute measure of variation.
• Interquartile range = Q3 − Q1
• Quartile deviation: $\text{Q.D.} = \dfrac{Q_3 - Q_1}{2}$
• Coefficient of quartile deviation: $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
You are given the frequency distribution of 292 workers
of a factory according to their average weekly income.
Calculate quartile deviation and its coefficient from the
following data
Weekly income No. of workers
Below 1350 8
1350-1370 16
1370-1390 39
1390-1410 58
1410-1430 60
1430-1450 40
1450-1470 22
1470-1490 15
1490-1510 15
1510-1530 9
1530 & above 10
Weekly income No. of workers C.f.
Below 1350 8 8
1350-1370 16 24
1370-1390 39 63
1390-1410 58 121
1410-1430 60 181
1430-1450 40 221
1450-1470 22 243
1470-1490 15 258
1490-1510 15 273
1510-1530 9 282
1530 & above 10 292

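N = 292
Q1 = (N/4)th observation = 73rd observation
Class interval 1390-1410, with c = 63 and f = 58

$$Q_1 = l_1 + \frac{\frac{N}{4} - c}{f}\,(l_2 - l_1) = 1390 + \frac{73 - 63}{58} \times 20 = 1393.448$$

and similarly Q3 = 1449.

$$\text{Coeff. of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{1449 - 1393.448}{1449 + 1393.448} = 0.020$$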
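
The interpolation can be checked with a short Python sketch. It is only an illustration: the function name `grouped_quantile` is hypothetical, and the limits 1330 and 1550 for the two open-ended classes are assumptions (neither quartile falls in those classes, so the choice does not change the result). It also computes the quartile deviation itself, which the exercise asks for.

```python
# Class boundaries and frequencies from the weekly-income table.
# The open-ended first and last classes are given assumed limits
# (1330 and 1550); neither quartile falls in them.
lower = [1330, 1350, 1370, 1390, 1410, 1430, 1450, 1470, 1490, 1510, 1530]
upper = [1350, 1370, 1390, 1410, 1430, 1450, 1470, 1490, 1510, 1530, 1550]
freq = [8, 16, 39, 58, 60, 40, 22, 15, 15, 9, 10]


def grouped_quantile(position, lower, upper, freq):
    """Interpolate the value at a cumulative position within grouped data."""
    cumulative = 0
    for l1, l2, f in zip(lower, upper, freq):
        if cumulative + f >= position:
            # cumulative is c, the total frequency below this class
            return l1 + (position - cumulative) / f * (l2 - l1)
        cumulative += f
    raise ValueError("position exceeds total frequency")


n = sum(freq)  # 292
q1 = grouped_quantile(n / 4, lower, upper, freq)
q3 = grouped_quantile(3 * n / 4, lower, upper, freq)
quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)

print(round(q1, 3), round(q3, 3))
print(round(quartile_deviation, 3), round(coefficient, 3))
```

With these figures the quartile deviation comes to about 27.78, half the interquartile range of roughly 55.55.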
Variance and Standard Deviation

• These measures use the squares of the deviations, thus avoiding the inconvenience of absolute values.

Standard Deviation

• Introduced by Karl Pearson
• Also known as root mean square deviation
• Most widely used
• The standard deviation of the population is designated by the lowercase Greek letter sigma, σ.
• The standard deviation of the sample is designated by the lowercase Latin letter s.

Variance

• The variance of the population is the square of the standard deviation, so it is designated by lowercase sigma squared, σ².
• The variance of the sample is similarly designated by lowercase s squared, s².
Variance is defined as the mean of the squared deviations.

$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$$

Variance: Computational Formula

$$\sigma^2 = \frac{\sum X^2}{N} - \bar{X}^2$$
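
The computational formula follows from expanding the square in the definition. A short derivation, using only the symbols above and the fact that the mean is $\bar{X} = \sum X / N$:

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum (X - \bar{X})^2
          = \frac{1}{N}\sum \left(X^2 - 2X\bar{X} + \bar{X}^2\right) \\
         &= \frac{\sum X^2}{N} - 2\bar{X}\,\frac{\sum X}{N} + \bar{X}^2
          = \frac{\sum X^2}{N} - 2\bar{X}^2 + \bar{X}^2 \\
         &= \frac{\sum X^2}{N} - \bar{X}^2
\end{aligned}
```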
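Standard Deviation

• Simply take the square root of the variance.

$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

Standard Deviation: Computational Formula

$$\sigma = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2}$$

Grouped data

$$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}}, \qquad N = \sum f$$

Grouped data: computational formula

$$\sigma = \sqrt{\frac{\sum f X^2}{N} - \bar{X}^2}$$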
Coefficient of Variation
• Absolute measure of variation = S.D.
• Relative measure of variation = coefficient of variation
• Used to compare the variability of two or more series.
• A higher coefficient of variation indicates less consistent, less uniform, less stable or less homogeneous data.

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100$$
Combined Standard Deviation

$$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$$

where
$\sigma_{12}$ = combined standard deviation
$\sigma_1$ = standard deviation of the first group
$\sigma_2$ = standard deviation of the second group
$d_1 = \bar{X}_1 - \bar{X}_{12}$
$d_2 = \bar{X}_2 - \bar{X}_{12}$
$\bar{X}_{12}$ = combined mean
Standard Deviation Represents

• a sort of average variability, or deviation, from the mean
• is in the same units as the mean.

Standard Deviation

• If the mean = 80 and σ = 5, a typical value lies about 5 units from the mean of 80.
• If we are measuring length, the mean might be 80 ft, and σ is then 5 ft.
• If we are measuring scores, the mean might be 80 points and σ is 5 points.
• This would be reported by saying the mean is 80 plus or minus a standard deviation of 5.
• A little more than 2/3 of the values in a normal distribution will be within 1 standard deviation above and below the mean, here between 75 and 85.
• For a mean of 120 with an SD of 25, the same interval runs from 95 to 145.
Properties

• The sum of the squares of the deviations is a minimum when the deviations are taken from the mean.
• For a symmetrical, bell-shaped (normal) distribution:
  • Mean ± 1σ covers 68.27% of observations
  • Mean ± 2σ covers 95.45% of observations
  • Mean ± 3σ covers 99.73% of observations
Using the Empirical Rule

• Suppose that the variable Math SAT scores is bell-shaped with a mean of 500 and a standard deviation of 90. Then,
  • 68% of all test takers scored between 410 and 590 (500 ± 90).
  • 95% of all test takers scored between 320 and 680 (500 ± 180).
  • 99.7% of all test takers scored between 230 and 770 (500 ± 270).
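
The intervals and percentages above can be reproduced with a few lines of Python. This is a minimal sketch that assumes an exactly normal distribution; the helper names `coverage_within` and `interval` are illustrative, not from the slides.

```python
import math


def coverage_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def interval(mean, sd, k):
    """Return the (low, high) bounds of mean ± k standard deviations."""
    return (mean - k * sd, mean + k * sd)


mean, sd = 500, 90  # Math SAT example from the text
for k in (1, 2, 3):
    low, high = interval(mean, sd, k)
    print(f"±{k}σ: {low} to {high}, about {coverage_within(k):.2%} of scores")
```

The 68%, 95% and 99.7% figures in the rule are rounded versions of the 68.27%, 95.45% and 99.73% that this calculation gives.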

Chap 3-37
Which Measure of variation to use?

• The type of data available
  • If the observations are few in number or include extreme values: avoid S.D.
  • If there are gaps around the observations: avoid Q.D.
  • If there are open-ended classes: prefer Q.D.
Shape of a Distribution

• Describes how data are distributed
• Measures of shape: symmetric or skewed

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean

Which Measure of variation to use?

• The purpose of investigation
  • Practically all advanced statistical methods deal with S.D.
  • If not specified: use S.D.
General Descriptive Stats Using
Microsoft Excel
1. Select Tools.
2. Select Data Analysis.
3. Select Descriptive Statistics and
click OK.

4. Enter the cell range.
5. Check the Summary
Statistics box.
6. Click OK

Excel output

Microsoft Excel
descriptive statistics output,
using the house price data:

House Prices:

$2,000,000
500,000
300,000
100,000
100,000

Sample statistics versus population parameters

Measure               Population parameter    Sample statistic
Mean                  μ                       X̄
Variance              σ²                      S²
Standard deviation    σ                       S

Minitab Output

Descriptive Statistics: House Price


Total
Variable Count Mean SE Mean StDev Variance Sum Minimum
House Price 5 600000 357771 800000 6.40000E+11 3000000 100000

N for
Variable Median Maximum Range Mode Skewness Kurtosis
House Price 300000 2000000 1900000 100000 2.01 4.13
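
The figures in this output can be reproduced with Python's standard `statistics` module, shown here as a rough check rather than a replacement for the Minitab or Excel steps. The StDev and Variance reported above match the sample formulas, which divide by n − 1, whereas the σ formulas given earlier divide by N, so the sketch prints both for comparison.

```python
import math
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean = statistics.mean(house_prices)
sample_variance = statistics.variance(house_prices)        # divides by n - 1
sample_sd = statistics.stdev(house_prices)
population_variance = statistics.pvariance(house_prices)   # divides by N
population_sd = statistics.pstdev(house_prices)
standard_error = sample_sd / math.sqrt(len(house_prices))
value_range = max(house_prices) - min(house_prices)

print(mean, sample_variance, sample_sd, round(standard_error))
print(population_variance, round(population_sd))
print(statistics.median(house_prices), statistics.mode(house_prices), value_range)
```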

The Five Number Summary

The five numbers that help describe the center, spread and shape of data are:
• Xsmallest
• First Quartile (Q1)
• Median (Q2)
• Third Quartile (Q3)
• Xlargest

Relationships among the five-number summary and distribution shape

Comparison                                  Left-Skewed   Symmetric   Right-Skewed
Median − Xsmallest vs. Xlargest − Median    >             ≈           <
Q1 − Xsmallest vs. Xlargest − Q3            >             ≈           <
Median − Q1 vs. Q3 − Median                 >             ≈           <
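
As an illustration, the five-number summary and the three comparisons can be computed for the house price data used earlier. This is a sketch under two assumptions: quartiles are taken with the `inclusive` method of `statistics.quantiles` (packages differ in their quartile conventions), and the hypothetical helper `compare` treats only exact equality as symmetric, where the table allows approximate equality.

```python
import statistics

house_prices = sorted([2_000_000, 500_000, 300_000, 100_000, 100_000])

# Quartile conventions differ between packages; "inclusive" is one common choice.
q1, median, q3 = statistics.quantiles(house_prices, n=4, method="inclusive")
smallest, largest = house_prices[0], house_prices[-1]
print("Five-number summary:", smallest, q1, median, q3, largest)


def compare(left, right):
    """Classify one pair of distances from the table above."""
    if left > right:
        return "left-skewed"
    if left < right:
        return "right-skewed"
    return "symmetric"


print(compare(median - smallest, largest - median))
print(compare(q1 - smallest, largest - q3))
print(compare(median - q1, q3 - median))
```

On a sample this small the three comparisons need not agree: the first two point to right skew, while the middle half of the data is balanced around the median.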

Review

OK, let’s see how you do.

Variability refers to which of the following?
A. the number of scores in a sample.
B. the precision of the sample mean.
C. the spread of the sample scores.
D. none of the above.
Answer: C
A measure of variability is:
A. the mode
B. the standard deviation
C. the median
D. none of the above.
Answer: B
The difference between the highest and lowest values is:
A. deviation
B. midpoint
C. range
D. median
Answer: C
The mean of the squared deviations is better known as:
A. the variance
B. the standard deviation
C. the range
D. the mean deviation
Answer: A
What is the standard deviation of 6, 13, 10, 2, 4?
A. 0
B. 2.5
C. 4.0
D. 5.0
What is the variance?
What is the range?

Variance: Computational Formula

$$s^2 = \frac{\sum X^2}{N} - \bar{X}^2$$

What is the standard deviation of 6, 13, 10, 2, 4?
• The mean is 7, so the squared mean is 49; N is 5.
• The sum of the squared values is 325.
• So the variance is 325/5 − 49 = 65 − 49 = 16.
• The standard deviation is the square root of 16, which is 4 (option C).
• What is the range? 13 − 2 = 11.
Example 1

• A researcher interested in the influence of anxiety on concentration first incites his subjects and then asks them to count backwards by three, from 257.
• He counts the number of mistakes made in two minutes.
• The data on 7 subjects are as follows: 5, 5, 6, 7, 8, 8, 10
• What is the variance? What is the standard deviation?

Solution
• mean = 7, N = 7, ΣX = 49, ΣX² = 363
• Variance = 363/7 − 49 = 2.86
• Standard deviation = 1.69
• That is, if the counts are roughly normal, about 2/3 of the cases made between 5.31 and 8.69 errors in two minutes.
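
The arithmetic can be verified with a small Python sketch of the computational formula. The function name `variance_computational` is illustrative; it divides by N, as the slides do.

```python
import math


def variance_computational(values):
    """Population variance via the shortcut: mean of squares minus squared mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


mistakes = [5, 5, 6, 7, 8, 8, 10]
variance = variance_computational(mistakes)
sd = math.sqrt(variance)
mean = sum(mistakes) / len(mistakes)
print(round(variance, 2), round(sd, 2))
print(round(mean - sd, 2), round(mean + sd, 2))
```

The same function applied to the review data 6, 13, 10, 2, 4 gives 16. The shortcut is convenient by hand, but it subtracts two similar numbers, so with very large values the definitional formula is numerically safer.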
Example 2

• The mean and standard deviation of 10 observations were found to be 9.5 and 2.5 respectively. Later on an additional observation became available. This was 15.0 and was included. Find the mean and the S.D. of the 11 observations.
A. 10 & 3
B. 11 & 2.86
C. 13 & 3.1
D. 10 & 2.86
Answer: D
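
One way to work this out is to recover ΣX and ΣX² from the given mean and S.D., add the new observation, and recompute. The sketch below does that; the function name `add_observation` is hypothetical and the S.D. is assumed to be the population form (dividing by N), as in the formulas above.

```python
import math


def add_observation(n, mean, sd, new_value):
    """Update a population mean and SD after one extra observation is included."""
    total = n * mean                          # recover ΣX
    total_sq = n * (sd ** 2 + mean ** 2)      # recover ΣX² from σ² = ΣX²/N − X̄²
    n += 1
    total += new_value
    total_sq += new_value ** 2
    new_mean = total / n
    new_sd = math.sqrt(total_sq / n - new_mean ** 2)
    return new_mean, new_sd


new_mean, new_sd = add_observation(10, 9.5, 2.5, 15.0)
print(round(new_mean, 2), round(new_sd, 2))
```

The result, a mean of 10 and an S.D. of about 2.86, corresponds to option D.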
Example 3

• A collar manufacturer is considering the production of a new style of collar to attract young men. The following statistics of neck circumference are available, based on measurements of a typical group of college students.

Mid value (inches):  12.0  12.5  13.0  13.5  14.0  14.5  15.0  15.5  16.0
Number of students:     2    16    36    60    76    37    18     3     2

• Compute the standard deviation and use the criterion ±3σ from the mean to determine the largest and smallest size of collar he should make in order to meet the needs of practically all the customers, bearing in mind that collars are worn on average ½ inch longer than neck size.
A. 11.655 & 15.945
B. 11.555 & 16.4
C. 12.155 & 16.445
D. 12 & 16.7
Answer: C
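
A sketch of the calculation in Python, using the grouped-data formula with N = Σf. The helper `grouped_mean_sd` and the variable `allowance` are illustrative names.

```python
import math

mid_values = [12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0]  # inches
students = [2, 16, 36, 60, 76, 37, 18, 3, 2]


def grouped_mean_sd(values, freqs):
    """Mean and population SD of grouped data: sqrt(Σf(X − X̄)² / N)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, math.sqrt(variance)


mean, sd = grouped_mean_sd(mid_values, students)
allowance = 0.5  # collars are worn about half an inch longer than neck size
smallest = mean - 3 * sd + allowance
largest = mean + 3 * sd + allowance
print(round(mean, 3), round(sd, 3))
print(round(smallest, 2), round(largest, 2))
```

Without intermediate rounding the limits come to about 12.16 and 16.45 inches, closest to option C. The small gap from 12.155 and 16.445 presumably reflects rounding the mean to 13.8 and σ to 0.715 in the hand calculation.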
Example 4

Suppose that samples of polythene bags from two manufacturers, A and B, are tested by a prospective buyer for bursting pressure, with the following results.

Bursting pressure (lbs)    Number of bags
                           A      B
5.0-10                     2      9
10.0-15                    9     11
15.0-20                   29     18
20.0-25                   54     32
25.0-30                    1     27
30.0-35                    5     13

• Which set of bags has the higher average bursting pressure? Which has more uniform pressure? If prices are the same, which manufacturer’s bags would be preferred by the buyer? Why?
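
One way to approach the question is to compute the mean, S.D. and coefficient of variation for each manufacturer from the class midpoints. The sketch below uses the frequencies exactly as printed in the table; the helper name `summarize` is hypothetical.

```python
import math

midpoints = [7.5, 12.5, 17.5, 22.5, 27.5, 32.5]   # class midpoints, lbs
bags_a = [2, 9, 29, 54, 1, 5]                      # frequencies as printed in the table
bags_b = [9, 11, 18, 32, 27, 13]


def summarize(values, freqs):
    """Return mean, population SD and coefficient of variation (%) for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_sq = sum(f * x * x for x, f in zip(values, freqs)) / n
    sd = math.sqrt(mean_sq - mean ** 2)
    return mean, sd, sd / mean * 100


mean_a, sd_a, cv_a = summarize(midpoints, bags_a)
mean_b, sd_b, cv_b = summarize(midpoints, bags_b)
print(f"A: mean={mean_a:.2f} sd={sd_a:.2f} cv={cv_a:.1f}%")
print(f"B: mean={mean_b:.2f} sd={sd_b:.2f} cv={cv_b:.1f}%")
```

On these figures B has the higher average bursting pressure, while A has the lower coefficient of variation and is therefore more uniform. Which to prefer depends on whether the buyer values strength or consistency more.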
Example 5

• In two factories A and B engaged in the same industry, the average monthly wages and S.D. are as follows:

Factory    Average monthly wages (Rs.)    S.D. of wages    No. of wage earners
A          4600                           500              100
B          4900                           400              80

• Which factory, A or B, pays the larger amount as monthly wages?
• Which factory shows greater variability in the distribution of wages?
• What are the mean and S.D. of all the workers in the two factories taken together?
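
The three questions can be answered with a short Python sketch. It assumes "larger amount" means the total wage bill (workers × mean wage) and uses the coefficient of variation for variability; `combined_mean_sd` is an illustrative name for the combined standard deviation formula.

```python
import math


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combine two groups using the combined standard deviation formula."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - mean12, mean2 - mean12
    variance12 = (n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2)
    return mean12, math.sqrt(variance12)


# Factory figures from the table: (wage earners, mean wage, S.D.)
n_a, mean_a, sd_a = 100, 4600, 500
n_b, mean_b, sd_b = 80, 4900, 400

total_a, total_b = n_a * mean_a, n_b * mean_b          # total monthly wage bill
cv_a, cv_b = sd_a / mean_a * 100, sd_b / mean_b * 100  # relative variability, %
mean_all, sd_all = combined_mean_sd(n_a, mean_a, sd_a, n_b, mean_b, sd_b)

print(total_a, total_b)
print(round(cv_a, 2), round(cv_b, 2))
print(round(mean_all, 2), round(sd_all, 2))
```

Factory A pays the larger total (Rs. 460,000 against Rs. 392,000) and also shows the greater relative variability; the combined mean is about Rs. 4733.33 with an S.D. of about Rs. 481.89.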
Example 6

• For a group of 50 male workers, the mean and S.D. of their daily wages are Rs. 63 and Rs. 9 respectively. For a group of 40 female workers, these are Rs. 54 and Rs. 6 respectively. Find the S.D. of daily wages for the combined group of 90 workers.
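
A worked solution using the combined standard deviation formula, with the male workers as group 1 and the female workers as group 2:

```latex
\begin{aligned}
\bar{X}_{12} &= \frac{50(63) + 40(54)}{50 + 40} = \frac{5310}{90} = 59 \\
d_1 &= 63 - 59 = 4, \qquad d_2 = 54 - 59 = -5 \\
\sigma_{12} &= \sqrt{\frac{50(9^2) + 40(6^2) + 50(4^2) + 40(-5)^2}{90}}
             = \sqrt{\frac{4050 + 1440 + 800 + 1000}{90}}
             = \sqrt{81} = 9
\end{aligned}
```

The combined S.D. is therefore Rs. 9.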
Variability or Dispersion

• Range
• Standard Deviation
• Variance
Measure of skewness

Skewness is an indicator of lack of symmetry in a data set.
A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

Figure: positively and negatively skewed distributions

Karl Pearson’s coefficient of skewness

Pearson’s measure of skewness = (Mean − Mode) / Standard deviation
If the mode is difficult to calculate, the coefficient can be calculated as
3(Mean − Median) / Standard deviation
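
As a rough illustration, both versions of Pearson's coefficient can be computed for the house price data from the Excel and Minitab slides. Using the sample S.D. here is an assumption, since the formula does not say which S.D. to use.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean = statistics.mean(house_prices)
median = statistics.median(house_prices)
mode = statistics.mode(house_prices)
sd = statistics.stdev(house_prices)  # sample S.D.; an assumption, the text does not say which to use

skew_mode = (mean - mode) / sd
skew_median = 3 * (mean - median) / sd
print(round(skew_mode, 3), round(skew_median, 3))
```

Both values are positive, indicating right skew. They differ from each other and from the 2.01 in the Minitab output, which presumably uses a moment-based definition of skewness.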
Kurtosis

• Kurtosis is the degree of peakedness of a distribution. It indicates the flatness of the distribution curve at the top.
• In general, the more peaked the distribution, the greater its kurtosis; the flatter the distribution, the smaller its kurtosis.

Peaked (leptokurtic) and flat (platykurtic)