
Lecture 03

This document covers central tendency and dispersion in statistics, focusing on measures such as mean, median, mode, range, variance, standard deviation, and coefficient of variation. It explains how to calculate these measures and their characteristics, including their sensitivity to outliers and applicability to different types of data. The document also provides examples and tips for choosing the appropriate measures based on data characteristics.

Uploaded by

Helen Nikulina

Statistics and Econometrics

Lesson 3: Central Tendency and Dispersion

Zhejin ZHAO

Statistical Analysis for Business & Economics: Spring 2011


Introduction
Recall Lesson 2, where we used graphical techniques to
describe data:
[Figure: histogram of waiting time. x-axis: waiting time in bins 0~5, 5~10, 10~15, 15~20, 20~25, 25~30, 30~35; y-axis: frequency]

While this histogram provides some new insight, other interesting questions (e.g. what is the average waiting time?) remain unanswered.

Introduction: Summarizing Distributions

3 Measures of Central Tendency

Statistic | Formula | Excel Formula | Pro | Con
Mean | Average of all the data | =AVERAGE(Data) | Familiar and uses all the data information. | Influenced by extreme values.
Median | Middle value in sorted array | =MEDIAN(Data) | Robust when extreme data values exist. | May not be influenced by extreme values.

3 Measures of Central Tendency (cont’d)

E.g. 1 2 2 2 2 4 7
Mean? Median? Mode?
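
As a quick check of the question above, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative and not part of the lecture.

```python
import statistics

data = [1, 2, 2, 2, 2, 4, 7]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

The single large value 7 pulls the mean above the median and the mode, which both sit at 2.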

Notations

• When referring to the number of observations in a population, we use the uppercase letter N.
• When referring to the number of observations in a sample, we use the lowercase letter n.
• The mean of a population is denoted by $\mu$: parameter or statistic?
• The mean of a sample is denoted by $\bar{x}$: parameter or statistic?

Characteristics of the Mean
Note: the mean is very sensitive to exceptionally large or small observations, called outliers.
Example
• As soon as a billionaire moves into a neighborhood, the average household income rises above what it was previously!

Characteristics of the Mean

• Imagine Ming Yao were in this class: what would happen to the mean height of the class?

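The effect of a single outlier on the mean can be sketched numerically. The class heights below are made up for illustration, and the tall visitor's height of 229 cm is an assumption; neither comes from the lecture.

```python
import statistics

# Hypothetical class heights in cm (illustrative values only).
heights = [160, 165, 168, 170, 172, 175, 180]
tall_visitor = 229  # assumed height in cm of one exceptionally tall person

with_outlier = heights + [tall_visitor]

mean_before = statistics.mean(heights)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(heights)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

With these numbers the mean moves by more than 7 cm while the median moves by 1 cm.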
Median

• Defined as the value below which 50% of the observations in the data set lie and above which the other 50% lie.
• The median separates the upper and lower halves of the sorted observations.
• If n is odd, the median is the middle observation in the sorted data array.
• If n is even, the median is the average of the middle two observations in the sorted data array.

Median (cont’d)
Example 1
Consider the following n = 6 data values:
11 12 15 17 21 32

What is the median?

For even n, Median = $\frac{x_{n/2} + x_{n/2+1}}{2}$
n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4
M = (x_3 + x_4)/2 = (15 + 17)/2 = 16

Position of the median in the sorted data: 11 12 15 [16] 17 21 32

Median (cont’d)
Example 2
Consider the following n = 7 data values:
12 23 23 25 27 34 41

What is the median?

For odd n, Median = $x_{(n+1)/2}$
(n+1)/2 = (7+1)/2 = 8/2 = 4
M = x_4 = 25

Position of the median in the sorted data: 12 23 23 [25] 27 34 41

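The position formulas for even and odd n can be sketched as one small function. `median_by_position` is a hypothetical helper written for this illustration; it assumes a non-empty list and converts the slides' 1-based positions to Python's 0-based indexes.

```python
def median_by_position(values):
    """Median via the position formulas on the slides (1-based positions)."""
    x = sorted(values)
    n = len(x)
    if n % 2 == 1:
        # odd n: the observation at position (n + 1) / 2
        return x[(n + 1) // 2 - 1]
    # even n: average of the observations at positions n/2 and n/2 + 1
    return (x[n // 2 - 1] + x[n // 2]) / 2

example_1 = median_by_position([11, 12, 15, 17, 21, 32])      # n = 6
example_2 = median_by_position([12, 23, 23, 25, 27, 34, 41])  # n = 7

print(example_1, example_2)
```

In practice `statistics.median` from the standard library gives the same results; the hand-written version only makes the position arithmetic visible.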
Median (cont’d)

• The median is insensitive to extreme data values.

• For example, consider the following quiz scores for 3 students:


Adrian’s scores: 20, 40, 70, 75, 80    Mean = 57, Median = 70, Total = 285
Dustin’s scores: 60, 65, 70, 90, 95    Mean = 76, Median = 70, Total = 380
Josh’s scores: 50, 65, 70, 75, 90    Mean = 70, Median = 70, Total = 350

• In the above case, is the median informative?

Mode

• Defined as the most frequently occurring data value.
• A data set may have multiple modes or no mode.
  cf. Mean and median: unique to a data set

Mode (cont’d)

An example

Consider the following quiz scores for 4 students:

Scarlett’s scores: 60, 70, 70, 70, 80    Mean = 70, Median = 70, Mode = 70
Johan’s scores: 45, 45, 70, 90, 100    Mean = 70, Median = 70, Mode = 45
Tim’s scores: 50, 60, 70, 80, 90    Mean = 70, Median = 70, Mode = none
Ryan’s scores: 50, 50, 70, 90, 90    Mean = 70, Median = 70, Modes = 50, 90

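The four cases above can be reproduced with a short sketch. `modes` is a hypothetical helper that follows the slide's convention of reporting no mode when every value occurs only once; note that the standard library's `statistics.multimode` would instead return all five of Tim's scores.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if every value occurs once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # slide convention (assumed here): no repeated value, no mode
    return sorted(v for v, c in counts.items() if c == top)

scores = {
    "Scarlett": [60, 70, 70, 70, 80],
    "Johan": [45, 45, 70, 90, 100],
    "Tim": [50, 60, 70, 80, 90],
    "Ryan": [50, 50, 70, 90, 90],
}
result = {name: modes(s) for name, s in scores.items()}

for name, m in result.items():
    print(name, m if m else "none")
```

All four students share the same mean and median, so only the mode tells these score patterns apart.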
Mean, Median, Mode

If a distribution is symmetric, the mean, median and mode may coincide.

[Figure: symmetric distribution; labels: mean, median, mode]

Mean, Median, Mode

If a distribution is asymmetrical, say skewed to the left or to the right, the three measures may differ. E.g.:

[Figure: skewed distribution; labels: mode, median, mean]

Practice

1, 2, 3, 4, 2, 2, 3, 4, 5, 2

1. Calculate the mean, median and mode of the data set.
2. Add 2 to each observation and recalculate the mean, median and mode.
3. Multiply each original observation by 2 and recalculate the mean, median and mode.
4. Describe how each measure of center changed with the different operations.

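One way to check your answers to the practice questions is the sketch below. `summarize` is a hypothetical helper, not part of the lecture, and the script gives the results away, so try the exercise by hand first.

```python
import statistics

data = [1, 2, 3, 4, 2, 2, 3, 4, 5, 2]

def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

original = summarize(data)
shifted = summarize([x + 2 for x in data])  # step 2: add 2 to each observation
scaled = summarize([x * 2 for x in data])   # step 3: multiply each observation by 2

print("original:", original)
print("shifted: ", shifted)
print("scaled:  ", scaled)
```

Adding a constant shifts all three measures by that constant, and multiplying by a constant multiplies all three by it.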
Tips: Which Measures to Use?

• For ordinal and nominal data, calculating the mean is not valid.
• The median is appropriate for ordinal data.
• For nominal data, the mode is useful for identifying the most frequent category.

Outline of Dispersion

• Dispersion is the “spread” of data points about the center of the distribution.
• Measures of dispersion:
  • Range
  • Variance
  • Standard deviation
  • Coefficient of variation

Range

• The difference between the largest and smallest observations:
  Range = $x_{\max} - x_{\min}$
• An example: Tina’s homework scores
  85 98 87 83 84 7 86
  Range = 98 – 7 = 91
• Drawback: determined by only the two extreme values

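The range calculation and its drawback can be sketched with Tina's scores. Dropping the single low score is an illustration added here, not a step from the lecture.

```python
scores = [85, 98, 87, 83, 84, 7, 86]  # Tina's homework scores

data_range = max(scores) - min(scores)  # 98 - 7

# Illustration only: remove the single lowest score and recompute.
lowest = min(scores)
without_low = [s for s in scores if s != lowest]
range_without_low = max(without_low) - min(without_low)  # 98 - 83

print(data_range, range_without_low)
```

One unusual score accounts for most of the range, which is why the range alone says little about how the other observations are spread.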
Variance

• The population variance is defined as the sum of squared deviations around the mean divided by the population size:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

• For the sample variance, we divide by n – 1 instead of n; otherwise it would tend to underestimate the unknown population variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Note: the denominator is the sample size (n) minus one!

• Drawback: its units are the square of the data's units, so it is hard to interpret.

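The two variance formulas differ only in the denominator. The sketch below applies both to Josh's quiz scores from the median example; treating the same five scores first as a population and then as a sample is an assumption made purely to show the difference.

```python
scores = [50, 65, 70, 75, 90]  # Josh's quiz scores from the median example

n = len(scores)
mean = sum(scores) / n
sum_sq_dev = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

population_variance = sum_sq_dev / n    # divide by N
sample_variance = sum_sq_dev / (n - 1)  # divide by n - 1

print(sum_sq_dev, population_variance, sample_variance)
```

Dividing by n – 1 always gives the larger of the two values, and the gap shrinks as n grows.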
Standard Deviation

• The square root of the variance.
• Explains how individual values in a data set vary from the mean.
• Units of measure are the same as X.

Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Standard Deviation (cont’d)

• Excel’s built-in functions are:

Statistic | Excel population formula | Excel sample formula
Variance | =VARP(Array) | =VAR(Array)
Standard deviation | =STDEVP(Array) | =STDEV(Array)

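For readers working outside Excel, Python's standard `statistics` module offers functions that play the same roles. The pairing in the comments is offered as a rough correspondence, not an official mapping, and the data are Tina's homework scores from the range example.

```python
import statistics

array = [85, 98, 87, 83, 84, 7, 86]  # Tina's homework scores

pop_var = statistics.pvariance(array)    # plays the role of =VARP(Array)
sample_var = statistics.variance(array)  # plays the role of =VAR(Array)
pop_sd = statistics.pstdev(array)        # plays the role of =STDEVP(Array)
sample_sd = statistics.stdev(array)      # plays the role of =STDEV(Array)

print(f"variance: population {pop_var:.2f}, sample {sample_var:.2f}")
print(f"std dev:  population {pop_sd:.2f}, sample {sample_sd:.2f}")
```

The sample versions are always the larger of each pair because they divide by n – 1 rather than n.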
Calculating a Standard Deviation

• Consider the following five quiz scores for Stephanie. (Table 412)
• Now, calculate the sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{2380}{5 - 1}} = \sqrt{595} = 24.39$$

Calculating a Standard Deviation

• The standard deviation is nonnegative because deviations around the mean are squared.
• When every observation is exactly equal to the mean, the standard deviation is zero.
• Standard deviations can be large or small, depending on the units of measure.
• Compare standard deviations only for data sets measured in the same unit.

Coefficient of Variation

• The coefficient of variation of a set of observations is the standard deviation of the observations divided by their mean, that is:
  • Population coefficient of variation: $CV = \sigma / \mu$
  • Sample coefficient of variation: $CV = s / \bar{x}$
• This coefficient provides a proportionate measure of variation, which is free of units.
• It measures relative dispersion.

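Computing the sample coefficient of variation from raw data can be sketched as follows. `sample_cv` is a hypothetical helper, and reusing two of the quiz-score sets from the median example is a choice made here for illustration.

```python
import statistics

def sample_cv(values):
    """Sample coefficient of variation s / x-bar, as a plain ratio."""
    # Not meaningful when the mean is zero (division by zero).
    return statistics.stdev(values) / statistics.mean(values)

adrian = [20, 40, 70, 75, 80]  # quiz scores from the median example
dustin = [60, 65, 70, 90, 95]

cv_adrian = sample_cv(adrian)
cv_dustin = sample_cv(dustin)

print(f"Adrian: {cv_adrian:.3f}, Dustin: {cv_dustin:.3f}")
```

Adrian's scores are more spread out relative to their mean, so his CV comes out larger than Dustin's.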
Coefficient of Variation (cont’d)

Example 1: data sets with the same units

• Let’s compare two stocks you’re interested in:

Stock | Mean | Standard deviation (s)
“Penny” Stock (A) | $2.00 | $0.04
Regular Stock (B) | $100.00 | $2.00

• Question: Which stock is more unstable?

Coefficient of Variation (cont’d)

• In terms of absolute dispersion, stock B seems to be more unstable. However, to compare dispersion we usually measure it relative to the mean, as we do with returns. Using the CV, we have:

Stock A: CV = ($0.04/$2) × 100 = 2%
Stock B: CV = ($2/$100) × 100 = 2%

Relative to their price levels, the two stocks are equally variable.

Statistic | Formula | Excel | Pro
Coefficient of variation (CV) | $100 \times \frac{s}{\bar{x}}$ | None | Measures relative variation in percent so can compare data sets even with different units. (Unit free)

Coefficient of Variation (cont’d)

Example 2: data sets with different units

• Compare variability for the following two variables.

National SAT mean = 1000 vs. GPA mean = 2.5
National SAT s.d. = 200 vs. GPA s.d. = 1.2

• Again, the CV is a useful tool for comparison here. Without it, the two standard deviations cannot be compared directly because they are measured in different units.

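The comparison in Example 2 can be worked out with the percent form of the CV. `cv_percent` is a hypothetical helper name used only for this sketch.

```python
def cv_percent(std_dev, mean):
    """Coefficient of variation in percent: 100 * s / x-bar."""
    return 100 * std_dev / mean

cv_sat = cv_percent(200, 1000)  # National SAT: s.d. 200, mean 1000
cv_gpa = cv_percent(1.2, 2.5)   # GPA: s.d. 1.2, mean 2.5

print(f"SAT CV = {cv_sat:.0f}%, GPA CV = {cv_gpa:.0f}%")
```

On this relative scale GPA (about 48%) varies more than SAT (20%), even though its standard deviation is far smaller in absolute terms.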
Measures of dispersion
Measure | Population | Sample
Size | N | n
Mean | $\mu$ | $\bar{x}$
Variance | $\sigma^2$ | $s^2$
Standard deviation | $\sigma$ | $s$
Coefficient of variation | $CV = \sigma / \mu$ | $CV = s / \bar{x}$

Tips: Which Measures to Use?

• If the data are symmetric, with no serious outliers, you can use the range and standard deviation.
• If comparing variation across two data sets, use the coefficient of variation.
• The measures of variability introduced in this section can be used only for numerical data.