
Lecture III-Measures of Dispersion


Summary Statistics:

Measures of Dispersion

Adapted by Dr. Awosanya


Descriptive Statistics

This session will address the following topics:
• Calculation and interpretation of measures of variability
– Range
– Inter-quartile range
– Standard deviation
– Standard error of the mean
• Application of appropriate measures of dispersion
Measures of Spread, Dispersion, Variability

• In addition to a measure of central tendency, in describing a distribution it is important to provide information concerning the relative position of other data points in the sample (that is, a measure of spread or variability).
Range: the simplest measure = highest value minus lowest value

• Take a sample of 10 heights (70, 95, 100, 103, 105, 107, 110, 112, 115, 140 cm)
Lowest (minimum) value = 70 cm
Highest (maximum) value = 140 cm
Range is therefore 140 – 70 = 70 cm
Simple to understand but far from perfect. Why?
· The range is derived from the two extreme values. It says nothing about the values in between.
· Not stable (as sample size increases the range can change dramatically)
· It does not lend itself to further statistical analysis.
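A minimal Python sketch of the range calculation, using the ten heights above. The second list, `clustered`, is made-up data added only to illustrate that two quite different samples can share the same range.

```python
# Heights (cm) from the example above.
heights = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]

# Hypothetical second sample (not from the lecture): same extremes,
# but the values in between are bunched near the two ends.
clustered = [70, 72, 74, 75, 76, 134, 135, 136, 138, 140]


def value_range(values):
    """Return the range: highest value minus lowest value."""
    return max(values) - min(values)


heights_range = value_range(heights)
clustered_range = value_range(clustered)

print("Range of heights:", heights_range)
print("Range of clustered sample:", clustered_range)
```

Both samples give the same range even though their values are spread very differently, which is why the range alone is a weak summary.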
[Figure 8. Two distributions with the same range but different mean and variability (y-axis: number of people)]
• Percentiles: Those values in a series of observations, arranged in ascending order of magnitude, which divide the distribution into 100 equal parts (thus the median is the 50th percentile).

• Quartiles: The values which divide a series of observations, arranged in ascending order, into 4 equal parts. (Thus the 2nd Quartile is the Median).

• The Interquartile Range represents the central portion of the distribution and is calculated as the difference between the third quartile and the first quartile. This range includes about one-half of the observations in the set, leaving one quarter of the observations on each side.
Median and quartiles
Sort the data in increasing order.

The median is the middle value (if n is odd) or the average of the two middle values (if n is even). It is a measure of the “center” of the data.

Quartiles: divide the set of ordered values into 4 equal parts
Q2 = second quartile = median

first 25% | second 25% | third 25% | fourth 25%
          Q1           Q2          Q3

IQR = Interquartile range = Q3 − Q1
Measures of Data Variability

• Interquartile Range
– the difference between the score representing the 75th percentile and the score representing the 25th percentile
– Arrange the observations in ascending order
– Find the positions of Q1 and Q3
– Identify the values; the inter-quartile range = Q3 − Q1
– Example: 29, 31, 24, 29, 30, 25 (n = 6)
– Arrange: 24, 25, 29, 29, 30, 31

» Q1 = value at position (n+1)/4 = 7/4 = 1.75
» Q1 = 24 + 0.75 × (25 − 24) = 24.75

» Q3 = value at position 3(n+1)/4 = 21/4 = 5.25
» Q3 = 30 + 0.25 × (31 − 30) = 30.25

» Q3 − Q1 = 30.25 − 24.75 = 5.5
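The position rule used in this example can be sketched in Python as follows. The function name `quantile_n_plus_1` is a label chosen here for illustration. It assumes the (n+1) position rule with linear interpolation between neighbouring values; other textbooks and software packages use slightly different rules and can give slightly different quartiles.

```python
import math
import statistics


def quantile_n_plus_1(values, fraction):
    """Value at position fraction * (n + 1) in the sorted data.

    Positions are 1-based; a fractional position is resolved by
    linear interpolation between the two neighbouring values.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = fraction * (n + 1)
    if position <= 1:          # before the first observation
        return ordered[0]
    if position >= n:          # beyond the last observation
        return ordered[-1]
    lower = math.floor(position)
    weight = position - lower
    return ordered[lower - 1] + weight * (ordered[lower] - ordered[lower - 1])


data = [29, 31, 24, 29, 30, 25]
q1 = quantile_n_plus_1(data, 0.25)
q3 = quantile_n_plus_1(data, 0.75)
iqr = q3 - q1
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)

# Cross-check: the standard library's default ('exclusive') method
# uses the same (n + 1) positions.
library_quartiles = statistics.quantiles(data, n=4)
print("statistics.quantiles:", library_quartiles)
```

Here Q1 sits at position 1.75 and Q3 at position 5.25, so each is found three quarters or one quarter of the way between two neighbouring sorted values.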
So how do we get a single mathematical measure to summarise the variability of an observed set of values?

• The most frequently used and most informative measure is the VARIANCE and its related functions

• The variance is computed in stages:

• 1. Calculate the mean as a measure of central location (MEAN): $\bar{x}$

• 2. Calculate the difference between each observation and the mean (DEVIATION): $(x - \bar{x})$
• 3. Next square the differences (SQUARED DEVIATION): $(x - \bar{x})^2$

• Q. What is the effect of this?
- Negative and positive deviations will not cancel each other out.
- Values further from the mean have a bigger impact.
• 4. Sum up these squared deviations (SUM OF THE SQUARED DEVIATIONS): $\sum (x - \bar{x})^2$

• 5. Divide this SUM OF THE SQUARED DEVIATIONS by the total number of observations minus 1 (n−1) to give the VARIANCE:

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$$

This is a measure of the variability of the data
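The five stages can be traced in a short Python sketch. As an illustration it reuses the ten heights from the range example earlier in the lecture, and the variable names are chosen here for readability.

```python
import statistics

heights = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]
n = len(heights)

# Stage 1: the mean.
mean = sum(heights) / n

# Stage 2: deviation of each observation from the mean.
deviations = [x - mean for x in heights]

# Stage 3: squared deviations (signs no longer cancel).
squared_deviations = [d ** 2 for d in deviations]

# Stage 4: sum of the squared deviations.
sum_of_squares = sum(squared_deviations)

# Stage 5: divide by n - 1 to obtain the sample variance.
variance = sum_of_squares / (n - 1)

print("mean =", round(mean, 2))
print("sum of raw deviations =", round(sum(deviations), 6))
print("sum of squared deviations =", round(sum_of_squares, 2))
print("variance =", round(variance, 2))

# Cross-check against the standard library's sample variance.
library_variance = statistics.variance(heights)
```

The raw deviations sum to zero apart from floating-point rounding, which is why stage 3 squares them before they are added up.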

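Why divide by n − 1?

This is an adjustment for the fact that the sample mean is only an estimate of the true population mean. Deviations measured from the sample mean tend to be slightly smaller than deviations from the true mean, so dividing by n − 1 rather than n makes the variance slightly bigger to compensate.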
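A small simulation can make the effect of the n − 1 divisor visible. This is only a sketch under stated assumptions: the population is taken to be normal with a true variance of 1, each sample has 5 observations, and the seed and number of repeats are arbitrary choices made here.

```python
import random

rng = random.Random(0)       # fixed seed so the run is repeatable
TRUE_VARIANCE = 1.0          # assumed population variance
SAMPLE_SIZE = 5
REPEATS = 20000


def sum_of_squared_deviations(sample):
    """Sum of squared deviations from the sample's own mean."""
    sample_mean = sum(sample) / len(sample)
    return sum((x - sample_mean) ** 2 for x in sample)


total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(REPEATS):
    sample = [rng.gauss(0.0, TRUE_VARIANCE ** 0.5) for _ in range(SAMPLE_SIZE)]
    ss = sum_of_squared_deviations(sample)
    total_div_n += ss / SAMPLE_SIZE
    total_div_n_minus_1 += ss / (SAMPLE_SIZE - 1)

avg_div_n = total_div_n / REPEATS
avg_div_n_minus_1 = total_div_n_minus_1 / REPEATS

print("average estimate dividing by n:    ", round(avg_div_n, 3))
print("average estimate dividing by n - 1:", round(avg_div_n_minus_1, 3))
print("true variance:                     ", TRUE_VARIANCE)
```

Averaged over many samples, dividing by n underestimates the true variance, while dividing by n − 1 lands close to it.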
Measures of Data Variability

• Standard Deviation
– The standard deviation is the square root of the average squared deviation from the mean

$$SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Equivalent computational form:

$$SD = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$$
Calculating Standard Deviation

Score   Mean   Deviation*   Squared deviation
8       9.67   -1.67        2.79
25      9.67   15.33        235.01
7       9.67   -2.67        7.13
5       9.67   -4.67        21.81
8       9.67   -1.67        2.79
3       9.67   -6.67        44.49
10      9.67   0.33         0.11
12      9.67   2.33         5.43
9       9.67   -0.67        0.45
Sum of squared deviations = 320.01
* Deviation from the mean (score − mean)

• Standard Deviation = Square root(sum of squared deviations / (n-1))
= Square root(320.01/(9-1))
= Square root(40) = 6.32
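The worked example can be checked with a short Python sketch that computes the standard deviation of the nine scores in two ways: from the deviations, and from the sums of x and x² (the computational form). Variable names are chosen here for illustration.

```python
import math
import statistics

scores = [8, 25, 7, 5, 8, 3, 10, 12, 9]
n = len(scores)
mean = sum(scores) / n

# Definitional form: square root of sum of squared deviations over n - 1.
sum_of_squares = sum((x - mean) ** 2 for x in scores)
sd_definition = math.sqrt(sum_of_squares / (n - 1))

# Computational form: uses only the sum of x and the sum of x squared.
sum_x = sum(scores)
sum_x_squared = sum(x * x for x in scores)
sd_computational = math.sqrt((n * sum_x_squared - sum_x ** 2) / (n * (n - 1)))

print("mean =", round(mean, 2))
print("SD (definition)    =", round(sd_definition, 2))
print("SD (computational) =", round(sd_computational, 2))

# Cross-check against the standard library's sample standard deviation.
library_sd = statistics.stdev(scores)
```

Without intermediate rounding the sum of squared deviations is exactly 320; the 320.01 in the table comes from rounding each row to two decimals.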
• The Normal Distribution: The symmetrical clustering of values around a central location is called the normal distribution. The bell-shaped curve that results when a normal distribution is graphed is called the normal curve.

• For distributions that are approximately normal (that is, unimodal, symmetrical and having a bell-shaped curve) the standard deviation and the mean together provide sufficient information to describe the distribution completely.

• In a normal distribution there is a useful property.

• Approx. 68% of individual observations fall within 1 SD of the Mean (+/- 1 SD)

• Approx. 95% of individual observations fall within 2 SD of the Mean (+/- 2 SD)

• Approx. 99.7% of individual observations fall within 3 SD of the Mean (+/- 3 SD)

• Many statistical tests that we use are based on the assumption that the variable that we are studying is normally distributed in the population.

• Using the Standard Deviation (SD) we can
– Describe the “normal” range
– Compare the degree of variability in the distribution of a factor between two populations or two different variables in the same population.
• A problem arises where the means of two distributions are different, because the SD must be judged relative to the mean.

• One measure that takes the different means into account and allows a fair comparison of the variability in the data is the COEFFICIENT OF VARIATION (CV)

• The CV expresses the SD as a percentage of the mean

$CV = (SD / \bar{x}) \times 100$

The coefficients of variation can be compared directly.
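A minimal Python sketch of the CV calculation. For illustration it compares two data sets that appear in separate examples of this lecture (the ten heights and the nine scores); pairing them is a choice made here, not something the lecture does.

```python
import statistics


def coefficient_of_variation(values):
    """SD expressed as a percentage of the mean (sample SD, n - 1 divisor)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]
scores = [8, 25, 7, 5, 8, 3, 10, 12, 9]

cv_heights = coefficient_of_variation(heights_cm)
cv_scores = coefficient_of_variation(scores)

print(f"heights: SD = {statistics.stdev(heights_cm):.2f}, CV = {cv_heights:.1f}%")
print(f"scores:  SD = {statistics.stdev(scores):.2f}, CV = {cv_scores:.1f}%")
```

The heights have the larger SD in absolute terms, but relative to their mean they vary far less than the scores, which is the comparison the CV is designed to make. The CV is only meaningful when the mean is positive and well away from zero.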


Normal Curve: Properties and Use

[Figure: the normal curve drawn over a histogram; y-axis: number of observations for each interval]

[Figure: normal curve on SD scale]

The x-axis expresses the data values in a standardized format. Note the zero at the center of the graph. This
point represents the mean. The points on the x-axis, +1 and -1, represent data values which are one standard
deviation above and below the mean, respectively.
Area Indicates Proportion of Sample
Area under the Curve

Note the area under the curve in the figure above. It shows that 47.5 percent of the
observations fall between the middle point, where z=0, and the point almost two
standard deviations above the mean (z=+1.96).
Area under the Curve and Z-score

The characteristics of the normal curve make it useful to calculate z-scores, an index of the
distance from the mean in units of standard deviations.
z-score = (score - mean) / standard deviation
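The z-score formula and the area under the curve can be sketched in Python. For illustration the example reuses the nine scores from the standard deviation slide, and `z_score` is a helper name chosen here.

```python
import statistics
from statistics import NormalDist


def z_score(score, mean, standard_deviation):
    """Distance of a score from the mean, in units of standard deviations."""
    return (score - mean) / standard_deviation


scores = [8, 25, 7, 5, 8, 3, 10, 12, 9]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)

z_of_25 = z_score(25, mean, sd)
print(f"z-score of 25: {z_of_25:.2f}")

# Area under the standard normal curve between z = 0 and z = +1.96.
standard_normal = NormalDist()
area_0_to_196 = standard_normal.cdf(1.96) - standard_normal.cdf(0.0)
print(f"area between z = 0 and z = 1.96: {area_0_to_196:.3f}")
```

The score of 25 lies about 2.4 standard deviations above the mean, and the area between z = 0 and z = +1.96 is about 0.475, matching the 47.5 percent quoted for the figure.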
Scores and Normal Curve
Standard Error of the Mean

• mean
– arithmetic sum of data divided by number of observations

• standard deviation
– index of variability (spread) of data about the mean

• z-score
– distance from mean in standard deviation units
z = (x-mean)/sd

• normal curve
– bell-shaped curve that relates probability to z-scores
Sample Mean
• In a typical situation, a sample might be taken and the mean and standard deviation computed. From these data, one will want to infer that the population values are identical or at least similar. In other words, it is hoped that the sample data reflect the population data.

Now, change your thinking from a single sample and consider the situation where you take many samples and determine a mean and standard deviation for each sample. The obtained mean values would themselves follow an approximately normal distribution, centred on the population mean but less spread out than the raw scores.
Multiple Sample Means
Distribution of Sample Means

• One could obtain a standard deviation of sample means which would describe the variability or spread of sample means about the true population mean.

In a practical situation, however, there is only one sample mean.

One hopes this sample mean is near the real population mean.

Wouldn't it be nice to have an estimate of the standard deviation of sample means which describes the spread of sample means?
Standard Error of the Mean

• Divide the standard deviation by the square root of the number of observations.
– standard error of mean = standard deviation / square root(n)

$$SE = \frac{\sigma}{\sqrt{n}}$$

• The resulting estimate of the standard deviation of sample means is called the standard error of the mean and can be interpreted in a manner similar to the standard deviation of raw scores.
– For example, the probability of obtaining a sample mean which lies more than 1.96 standard errors from the population mean (outside the -1.96 to +1.96 range) is 5 out of 100
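A simulation can show that the standard deviation of many sample means is close to the standard deviation divided by the square root of n. This is a sketch under stated assumptions: the population values (mean 100, SD 15), the sample size, the number of samples and the seed are all hypothetical choices made here.

```python
import math
import random
import statistics

rng = random.Random(42)      # fixed seed so the run is repeatable
POPULATION_MEAN = 100.0      # hypothetical population values
POPULATION_SD = 15.0
SAMPLE_SIZE = 25
NUMBER_OF_SAMPLES = 2000

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(NUMBER_OF_SAMPLES):
    sample = [rng.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

# Spread of the sample means observed in the simulation.
observed_sd_of_means = statistics.stdev(sample_means)

# Standard error predicted by the formula SD / sqrt(n).
theoretical_se = POPULATION_SD / math.sqrt(SAMPLE_SIZE)

# In practice only one sample is available, so the SE is estimated from it.
one_sample = [rng.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(SAMPLE_SIZE)]
estimated_se = statistics.stdev(one_sample) / math.sqrt(SAMPLE_SIZE)

print("SD of simulated sample means:", round(observed_sd_of_means, 2))
print("SD / sqrt(n):                ", round(theoretical_se, 2))
print("SE estimated from one sample:", round(estimated_se, 2))
```

The first two numbers should agree closely; the third varies from sample to sample because it relies on a single sample's standard deviation.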
Applying the Standard Error
• Suppose that the population mean of male serum uric acid levels is 5.4 mg per 100 ml and the standard deviation is 1. If you drew 100 samples of 25 men each and computed 100 sample means, how many of those means would you expect to fall within the range 5.4 − 1.96 × SE to 5.4 + 1.96 × SE, where SE = 1/√25 = 0.2 (that is, about 5.01 to 5.79)? The answer is about 95.

If you drew a sample of 25 men and found a mean serum uric acid level of 8.2, would you assume this was "significantly" different from the population mean? Yes, because a mean of that magnitude would occur fewer than 5 times in 100.
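The arithmetic behind this example can be sketched in Python. It assumes that the interval is built from the standard error (SD divided by the square root of the sample size, here 1/√25 = 0.2) and that sample means are normally distributed around the population mean.

```python
import math
from statistics import NormalDist

population_mean = 5.4    # mg per 100 ml
population_sd = 1.0
sample_size = 25

# Standard error of the mean for samples of 25 men.
standard_error = population_sd / math.sqrt(sample_size)

# Range expected to contain about 95% of sample means.
lower = population_mean - 1.96 * standard_error
upper = population_mean + 1.96 * standard_error

# Assumed sampling distribution of the mean: normal, centred on 5.4.
sampling_distribution = NormalDist(population_mean, standard_error)
proportion_inside = sampling_distribution.cdf(upper) - sampling_distribution.cdf(lower)
expected_count = 100 * proportion_inside

# How unusual is an observed sample mean of 8.2?
observed_mean = 8.2
z = (observed_mean - population_mean) / standard_error

print(f"standard error = {standard_error:.2f}")
print(f"95% range for sample means: {lower:.2f} to {upper:.2f}")
print(f"expected number of 100 sample means inside: {expected_count:.0f}")
print(f"z for a sample mean of 8.2: {z:.1f}")
```

A sample mean of 8.2 lies about 14 standard errors above 5.4, far beyond the 1.96 cut-off, so it would be judged significantly different from the population mean.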
Choosing the Measures of
Central Location and Dispersion
Exercise

• Determine the first and third quartiles and interquartile range for the following data

– 0, 3, 0, 7, 2, 1, 0, 1, 5, 2, 4, 2, 8, 1, 3, 0, 1, 2, 1
