Week 2 Measures of Dispersion II


Measures of Dispersion
The first step involved in data analysis is the description or summary of
one or several properties of the data - as descriptive statistics.

The two most common numerical descriptive measures are measures of central tendency and measures of variability / dispersion.

A statistic is a numerical index that describes or summarizes some characteristic of a given sample data set, while a parameter is a numerical descriptive measure for a population.
Commonly used statistics are the mean, mode and
median which are all averages and are jointly referred
to as measures of central tendency.

They show how data group towards the center.


Mean: The arithmetic average of the data set.

Calculated as the sum of the values divided by the number of observations.

Exercise

A sample of 15 overdue accounts in a large department store yields the following amounts ($) due: 55.20; 4.88; 271.95; 18.06; 180.29; 365.29; 28.16; 399.11; 807.80; 44.14; 97.47; 9.98; 61.61; 56.89 and 82.73

a) Determine the mean amount due for the 15 accounts sampled.

b) If there are a total of 150 overdue accounts, use the sample mean to
predict the total amount overdue for all 150 accounts.
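
The calculation in this exercise can be sketched in a few lines of Python. This is only an illustration of the definition of the mean given above; the variable names are chosen here and are not part of the original exercise.

```python
# Amounts due ($) for the 15 sampled overdue accounts (from the exercise).
amounts = [55.20, 4.88, 271.95, 18.06, 180.29, 365.29, 28.16, 399.11,
           807.80, 44.14, 97.47, 9.98, 61.61, 56.89, 82.73]

# a) Sample mean = sum of the values / number of observations.
sample_mean = sum(amounts) / len(amounts)

# b) Predicted total = sample mean x number of accounts in the population.
total_accounts = 150
predicted_total = sample_mean * total_accounts

print(f"Sample mean: ${sample_mean:.2f}")
print(f"Predicted total for {total_accounts} accounts: ${predicted_total:.2f}")
```

The prediction in part b) assumes that the 15 sampled accounts are representative of all 150 overdue accounts.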
Median: Middle value in a given data set when arranged in order of
magnitude

Mode: Most frequently occurring value in a data set

Exercise

An experiment was conducted to measure the effectiveness of a new procedure for pruning. Each of 13 workers was assigned the task of pruning an acre of avocados. The productivity, measured in worker-hours/acre, was recorded for each person as: 4.4; 4.9; 4.2; 4.4; 4.8; 4.9; 4.8; 4.5; 4.3; 4.8; 4.7; 4.4 and 4.2. Determine the mode and median productivity for the group.
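
A minimal sketch of how the median and mode could be checked with Python's standard `statistics` module. `multimode` is used rather than `mode` because a data set can have more than one most frequent value; the variable names are illustrative.

```python
import statistics

# Productivity (worker-hours/acre) for the 13 workers in the exercise.
hours = [4.4, 4.9, 4.2, 4.4, 4.8, 4.9, 4.8, 4.5, 4.3, 4.8, 4.7, 4.4, 4.2]

# Median: middle value once the data are arranged in order of magnitude.
median_hours = statistics.median(hours)

# Mode(s): every value that occurs with the highest frequency.
modes = statistics.multimode(hours)

print("Sorted data:", sorted(hours))
print("Median:", median_hours)
print("Mode(s):", modes)
```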
Measures of Dispersion / Variation / Spread

It is not sufficient to describe a data set using only measures of central tendency, such as the mean or the median.

Measures of variation show how the data is spread or dispersed and include the range and standard deviation.

Range

Difference between the highest and lowest values in a data set.

However, the range does not show whether the highest and lowest values are common or extreme values.
Mean Deviation

Shows how each observation or data point differs from the mean.

Calculated as the difference between the value of the observation and the mean.

However, the sum of these deviations from the mean is always equal to zero, so the sum gives no new information about the data set.
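
The claim that the deviations from the mean always sum to zero can be checked numerically. The sketch below reuses the pruning data from the earlier exercise purely as an example; in floating-point arithmetic the sum comes out as a value extremely close to zero rather than exactly zero.

```python
# Productivity data (worker-hours/acre) from the pruning exercise.
hours = [4.4, 4.9, 4.2, 4.4, 4.8, 4.9, 4.8, 4.5, 4.3, 4.8, 4.7, 4.4, 4.2]

mean_hours = sum(hours) / len(hours)

# Deviation of each observation = data value - mean.
deviations = [x - mean_hours for x in hours]

# Positive and negative deviations cancel, so the total is (numerically) zero.
total_deviation = sum(deviations)

print("Deviations:", [round(d, 3) for d in deviations])
print("Sum of deviations:", round(total_deviation, 10))
```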
Standard Deviation
Shows how data is spread about the mean (above and below).

It uses the square of each deviation from the mean, so that negative and positive deviations do not cancel each other out.

The resultant squared deviations are summed and averaged to obtain the variance, by dividing the summation by:

• the total number of observations (N) in a population
• the total number of observations less 1 (n-1) in a sample

The standard deviation is then obtained by calculating the square root of the variance.
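Population standard deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where $\mu$ is the population mean and $N$ is the number of observations in the population.

Sample standard deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where $\bar{x}$ is the sample mean and $n$ is the number of observations in the sample.

The commonly used measure is the sample standard deviation.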


Interpretation and application of the standard deviation:

A small standard deviation implies that the mean is a typical or common value in the data set, or that the values of the data set are close to the mean.

This is ideal for quality control in manufacturing, where it is a sign of low risk.

It is also used in assessing the volatility of investment returns, e.g. bond markets have low volatility while stock markets have high volatility.
A large standard deviation indicates that the mean is not typical, and that the observations are far from the mean.

In that case, other statistical measures are needed to get a better understanding of the characteristics of the data set.
Steps for calculating the standard deviation

1) Mean = Sum of values / Number of values

2) Mean deviation = Data value – Mean

3) Square the mean deviations and add them up

4) Variance = Sum of squared deviations / (n – 1)

5) Standard deviation = Square root of the variance
Given the following data sets:
a) 54, 36, 78, 91

b) Ten friends scored the following marks in their end-of-year math exam: 23%, 37%, 45%, 49%, 56%, 63%, 63%, 70%, 72% and 82%

Calculate the following:

i) Mean
ii) Mean deviation for each observation
iii) Sum of the squared mean deviations
iv) Variance of the data set
v) Standard deviation of the data set
Answers to Question b)

No.     Score   Mean deviation   (Mean deviation)²
1       23      -33              1089
2       37      -19              361
3       45      -11              121
4       49      -7               49
5       56      0                0
6       63      7                49
7       63      7                49
8       70      14               196
9       72      16               256
10      82      26               676
Total   560                      2846

Mean score = 560/10 = 56%

Variance = 2,846/(10-1) ≈ 316.22
Standard deviation = square root of 316.22 ≈ 17.78
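
The five steps for calculating the standard deviation can be sketched in Python as follows, using the ten exam marks from question b). The variable names are illustrative; the population variance (dividing by n instead of n-1) is shown only for comparison.

```python
import math

# Exam marks (%) of the ten friends from question b).
scores = [23, 37, 45, 49, 56, 63, 63, 70, 72, 82]
n = len(scores)

# 1) Mean = sum of values / number of values.
mean_score = sum(scores) / n

# 2) Mean deviation = data value - mean, for each observation.
deviations = [x - mean_score for x in scores]

# 3) Square the deviations and add them up.
sum_squared = sum(d ** 2 for d in deviations)

# 4) Sample variance = sum of squared deviations / (n - 1).
sample_variance = sum_squared / (n - 1)

# 5) Standard deviation = square root of the variance.
sample_sd = math.sqrt(sample_variance)

# For comparison: treating the ten marks as a whole population divides by n.
population_sd = math.sqrt(sum_squared / n)

print(f"Mean: {mean_score}")
print(f"Sum of squared deviations: {sum_squared}")
print(f"Sample variance: {sample_variance:.2f}")
print(f"Sample standard deviation: {sample_sd:.2f}")
print(f"Population standard deviation: {population_sd:.2f}")
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n-1 divisor and can be used to cross-check the result.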
Normal distribution

Many data sets are normally distributed: they are symmetric (occurrences are equally distributed above and below the average) and have the following characteristics:

• 68.3% of the population is contained within 1 standard deviation from the mean.
• 95.4% of the population is contained within 2 standard deviations from the mean.
• 99.7% of the population is contained within 3 standard deviations from the mean.
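
The three percentages above can be reproduced from the cumulative distribution function of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is chosen here for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {share_within(k) * 100:.1f}%")
```

Because every normal distribution can be rescaled to the standard normal, the same shares apply whatever the mean and standard deviation are.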
The normal distribution curve is also referred to as the Gaussian
Distribution (Gaussian Curve) or bell-shaped curve. 

Manufacturing processes and natural occurrences frequently create this type of distribution, a unimodal bell curve.

Many statistical analyses assume that the data set has a normal distribution unless otherwise specified.
If a normal distribution has a mean of 75 and a standard deviation of 10,
95% of the distribution can be found between which two values?

A) 0, 95

B) 65, 85

C) 55, 95

D) 45, 105

Answer: C. 95% of the distribution (area under the curve) lies within 2 standard deviations of the mean. Therefore 75 - 20 = 55 is the lower value and 75 + 20 = 95 is the upper value.
If a normal distribution has a mean of 35 and a variance of 25, 68% of
the distribution can be found between which two values?

A) 30, 40

B) 25, 45

C) 0, 70

D) 20, 50

Answer: A. 68% of the distribution (area under the curve) is about +/- 1
standard deviation from the mean. The standard deviation is the square
root of the variance and therefore = 5. Therefore 35 - 5 = 30 is the lower
value and 35 + 5 = 40 is the upper value.
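
Both of the questions above reduce to computing mean ± k standard deviations. A minimal sketch, with a helper function `interval_around_mean` that is named here only for illustration:

```python
import math

def interval_around_mean(mean, sd, k):
    """Return (lower, upper) bounds lying k standard deviations from the mean."""
    return (mean - k * sd, mean + k * sd)

# Mean 75, standard deviation 10: about 95% lies within 2 standard deviations.
first = interval_around_mean(75, 10, 2)

# Mean 35, variance 25: the standard deviation is the square root of the variance.
second = interval_around_mean(35, math.sqrt(25), 1)

print("About 95% between:", first)
print("About 68% between:", second)
```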
A distribution of measurements for the length of widgets was found to
have a mean of 92.0mm and a standard deviation of 2.50mm.
Approximately what percent of measurements are between 87.00mm and
97.00mm?

A) 100%

B) 68%

C) 95%

D) 99%

Answer: C. The measurements of 87.00mm and 97.00mm are two standard deviations away from the mean of 92.00mm. Therefore about 95% of the values recorded are between 87.00mm and 97.00mm.
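
The widget answer can be checked by converting the two limits to z-scores (number of standard deviations from the mean) and evaluating the normal cumulative distribution function. This sketch assumes the widget lengths are normally distributed, which the question implies but does not state; the variable names are illustrative.

```python
from statistics import NormalDist

mean_length = 92.0   # mm
sd_length = 2.5      # mm
lower, upper = 87.0, 97.0

# z-score = (value - mean) / standard deviation.
z_lower = (lower - mean_length) / sd_length
z_upper = (upper - mean_length) / sd_length

# Share of measurements between the two limits under a normal model.
widgets = NormalDist(mu=mean_length, sigma=sd_length)
share_between = widgets.cdf(upper) - widgets.cdf(lower)

print(f"z-scores: {z_lower}, {z_upper}")
print(f"Share between {lower}mm and {upper}mm: {share_between * 100:.1f}%")
```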
Exercise Four

Ten friends scored the following marks in their end-of-year math exam: 23%, 37%, 45%, 49%, 56%, 63%, 63%, 70%, 72% and 82%

i) How many students are likely to score above 73.77% (about one standard deviation above the mean)?

ii) How many students are likely to score less than 73.77%?
Exercise Five

A battery has an expected life of 46 months with a standard deviation of 4 months. With 100 batteries, how many would you expect to last up to 54 months? Assume a normal distribution.

Mean = 46 months

Note: 1 standard deviation is equal to 4 months, therefore 8 months is two standard deviations, and 54 months lies two standard deviations above the mean.

Answer: 50% of the batteries fall below the mean and 47.72% fall between the mean and two standard deviations above it, so (50 + 47.72) = 97.72% of 100 ≈ 98 batteries

Therefore, batteries that should last 54 months and over will be 2.28% of 100 = 2.28, i.e. about 2 batteries.
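
The battery calculation can be sketched with the normal cumulative distribution function instead of a printed table. The variable names below are illustrative, and the normal distribution is the assumption stated in the exercise.

```python
from statistics import NormalDist

battery_life = NormalDist(mu=46, sigma=4)   # months
n_batteries = 100
limit = 54                                  # months, i.e. mean + 2 standard deviations

# Fraction of batteries expected to last up to the limit.
share_up_to_limit = battery_life.cdf(limit)

expected_up_to_limit = n_batteries * share_up_to_limit
expected_beyond_limit = n_batteries * (1 - share_up_to_limit)

print(f"Expected to last up to {limit} months: {expected_up_to_limit:.2f}")
print(f"Expected to last longer: {expected_beyond_limit:.2f}")
```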


Exercise Six
A set of 100 scores is normally distributed with a mean of 23 and a standard deviation of 4.69.

a) How many scores are expected to be lower than 18.31 (one standard deviation below the mean)?

b) How many of the 100 scores are expected to be below 32.38 (two standard deviations above the mean)?


Exercise Seven

Given a set of 36 months of medical leave data that has a mean of 12 days per month and a standard deviation of 4.24 days, how many months are expected to have less than 16.24 days per month medical leave?
Exercise

A researcher is comparing two multiple-choice tests with different conditions. In the first test, a typical multiple-choice test is administered. In the second test, alternative choices (i.e. correct and incorrect answers) are randomly assigned to test takers. The results from the two tests are:

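Comparison of the two test results is difficult. Comparing standard deviations doesn’t really work, because the means are also different.
Coefficient of variation

The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean: CV = (standard deviation / mean) × 100.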

Regular test: CV = 17.03

Randomized answers: CV = 28.35

The results from the second test have more variability than from the first.
In finance, the coefficient of variation allows investors to determine how
much volatility, or risk, is assumed in comparison to the amount of
return expected from investments.

The lower the CV, the better the risk-return trade-off.
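
A minimal sketch of the coefficient of variation, computed as the sample standard deviation divided by the mean and expressed as a percentage. Since the test results of the example above are not reproduced here, the sketch uses two data sets from earlier exercises (the exam marks and the pruning productivity), which have different units and different means; the function name is chosen for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Exam marks (%) and pruning productivity (worker-hours/acre) from earlier exercises.
scores = [23, 37, 45, 49, 56, 63, 63, 70, 72, 82]
hours = [4.4, 4.9, 4.2, 4.4, 4.8, 4.9, 4.8, 4.5, 4.3, 4.8, 4.7, 4.4, 4.2]

cv_scores = coefficient_of_variation(scores)
cv_hours = coefficient_of_variation(hours)

print(f"CV of exam marks: {cv_scores:.2f}%")
print(f"CV of pruning productivity: {cv_hours:.2f}%")
```

Because the CV is unit-free, the two values can be compared directly even though the underlying measurements are not on the same scale.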


SKEWNESS
The skewness of a distribution is defined as the lack of symmetry. This is
calculated as:
In a symmetrical distribution, the Mean, Median and Mode are equal to
each other and the mean divides the distribution into two equal parts.

A perfectly symmetrical / normally distributed data set will have a skewness of 0.

This is shown using a smooth curve as in the following figure:


If the distribution is skewed, having a long tail in one direction and a
single peak, the mean is pulled in the direction of the tail and the median
falls between the mode and the mean as illustrated below.
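
Several definitions of skewness are in use. The sketch below assumes the moment-based definition (third central moment divided by the second central moment raised to the power 3/2), which may differ from the formula intended on the slide. It uses the overdue-accounts data from the first exercise, which has a long tail towards large amounts; the function and variable names are illustrative.

```python
import statistics

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# A symmetric data set has skewness 0.
symmetric = [1, 2, 3, 4, 5]

# Overdue amounts ($) from the first exercise: a long tail to the right.
amounts = [55.20, 4.88, 271.95, 18.06, 180.29, 365.29, 28.16, 399.11,
           807.80, 44.14, 97.47, 9.98, 61.61, 56.89, 82.73]

print("Skewness (symmetric data):", skewness(symmetric))
print("Skewness (overdue amounts):", round(skewness(amounts), 2))
print("Mean:", round(statistics.mean(amounts), 2),
      "Median:", statistics.median(amounts))
```

For the overdue amounts the mean is larger than the median, which matches the description of the mean being pulled in the direction of the tail.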
KURTOSIS

Whereas skewness measures the lack of symmetry of the frequency curve of a distribution, kurtosis is a measure of the relative peakedness of its frequency curve.

Kurtosis is calculated as:

Frequency curves can be divided into three categories depending upon the shape of their peak.
The three shapes are termed as Leptokurtic, Mesokurtic and Platykurtic as shown in the following figure.
• The kurtosis (measured as excess kurtosis) of a normal distribution is 0; such distributions are called mesokurtic.

• If the kurtosis is less than zero, then the distribution is light in the tails and is called a platykurtic distribution.

• If the kurtosis is greater than zero, then the distribution has heavier tails and is called a leptokurtic distribution.
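
The three categories can be illustrated with a short sketch. It assumes the moment-based excess kurtosis (fourth central moment divided by the squared second central moment, minus 3), which is the convention under which a normal distribution scores 0; the two small data sets and the function names are made up for illustration.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (an assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

def classify(kurtosis, tolerance=1e-9):
    """Name the shape of the distribution from its excess kurtosis."""
    if abs(kurtosis) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if kurtosis > 0 else "platykurtic"

# Hypothetical data: evenly spread values (light tails) ...
flat = [1, 2, 3, 4, 5]
# ... and values concentrated at the centre with two extreme observations.
heavy_tailed = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]

for name, data in (("flat", flat), ("heavy-tailed", heavy_tailed)):
    k = excess_kurtosis(data)
    print(f"{name}: excess kurtosis = {k:.2f} ({classify(k)})")
```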
