Chapter 2
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1
Chapter Goals
After completing this chapter, you should be able to:
■ Compute and interpret the mean, median, and mode for a
set of data
■ Find the range, variance, standard deviation, and
coefficient of variation and know what these values mean
■ Apply the empirical rule to describe the variation of
population values around the mean
■ Explain the weighted mean and when to use it
■ Explain how a least squares regression line estimates a
linear relationship between two variables
Chapter Topics
■ Measures of central tendency, variation, and
shape
■ Mean, median, mode, geometric mean
■ Quartiles
Describing Data Numerically
[Diagram: numerical measures covered in this chapter, including mode, variance, standard deviation, and coefficient of variation]
2.1
Measures of Central Tendency
Overview
Central Tendency
Arithmetic Mean
■ The arithmetic mean (mean) is the most
common measure of central tendency
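■ For a population of N values:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
where $x_1, \ldots, x_N$ are the population values and N is the population size
■ For a sample of size n:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
where $x_1, \ldots, x_n$ are the observed values and n is the sample size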
Arithmetic Mean
(continued)
[Figure: two dot plots on a 0 to 10 axis; left panel Mean = 3, right panel Mean = 4]
Median
■ In an ordered list, the median is the “middle”
number (50% above, 50% below)
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both panels]
Finding the Median
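A minimal sketch of locating the median by position, assuming the common 0.50(n + 1) position rule, which is consistent with the 0.25(n + 1) rule this chapter uses for the first quartile. The function name `median_by_position` and the sample values are illustrative assumptions, not taken from the slides.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule on ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = (n + 1) / 2              # 1-based position in the ranked list
    lower = ranked[int(position) - 1]
    if position == int(position):       # odd n: a single middle value
        return lower
    upper = ranked[int(position)]       # even n: average the two middle values
    return (lower + upper) / 2


# Hypothetical data, for illustration only
odd_sample = [7, 1, 3, 9, 5]            # n = 5, median is in position 3
even_sample = [4, 8, 1, 6]              # n = 4, median is in position 2.5
print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```

Note that the rule gives the position of the median in the ranked data, not its value.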
Mode
■ A measure of central tendency
■ Value that occurs most often
■ Not affected by extreme values
■ Used for either numerical or categorical data
■ There may be no mode
■ There may be several modes
[Figure: two dot plots; left panel on a 0 to 14 axis with Mode = 9, right panel on a 0 to 6 axis with No Mode]
Review Example
■ Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Review Example:
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000
■ Mean: ($3,000,000/5) = $600,000
■ Median: middle value of ranked data = $300,000
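The summary statistics for the five house prices can be checked with a short Python sketch. It uses the standard library `statistics` module; the variable names are illustrative choices.

```python
import statistics

# House prices from the review example
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # sum of values / number of values
median_price = statistics.median(house_prices)  # middle value of the ranked data
mode_price = statistics.mode(house_prices)      # most frequently occurring value

print(f"Mean:   ${mean_price:,.0f}")
print(f"Median: ${median_price:,.0f}")
print(f"Mode:   ${mode_price:,.0f}")
```

The single $2,000,000 house pulls the mean well above the median, which is why the three measures disagree for this data set.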
Which measure of location
is the “best”?
Shape of a Distribution
Geometric Mean
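■ Geometric mean
■ Used to measure the rate of change of a variable over time
$\bar{x}_g = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = (x_1 \times x_2 \times \cdots \times x_n)^{1/n}$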
Example
Example
(continued)
■ Arithmetic mean rate of return: misleading result
■ Geometric mean rate of return: more accurate result
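A minimal sketch of why the two averages differ, using hypothetical returns (a 50% loss followed by a 100% gain) that are an assumption for illustration and not necessarily the figures from the slide. It also assumes the geometric mean is taken over growth factors (1 + r), then converted back to a rate.

```python
import math


def arithmetic_mean_return(returns):
    """Simple average of the per-period rates of return."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """n-th root of the product of growth factors (1 + r), minus 1."""
    growth_factors = [1 + r for r in returns]
    return math.prod(growth_factors) ** (1 / len(returns)) - 1


# Hypothetical returns: the investment halves, then doubles
returns = [-0.50, 1.00]

print(f"Arithmetic mean return: {arithmetic_mean_return(returns):.1%}")
print(f"Geometric mean return:  {geometric_mean_return(returns):.1%}")
```

An investment that halves and then doubles ends where it started, so the 0% geometric mean describes the outcome, while the 25% arithmetic mean overstates it.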
2.2
Measures of Variability
Variation
Same center,
different variation
Range
■ Range = largest observation - smallest observation
Example (dot plot on a 0 to 14 axis, smallest value 1, largest value 14):
Range = 14 - 1 = 13
Disadvantages of the Range
■ Ignores the way in which data are distributed
[Figure: two differently distributed dot plots on a 7 to 12 axis; Range = 12 - 7 = 5 in both panels]
■ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
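The outlier example above can be reproduced with a few lines of Python. The helper name `data_range` is an illustrative choice; the two data sets are the ones listed on the slide.

```python
def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)


# First 24 values are shared by both data sets on the slide
shared = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = shared + [5]
with_outlier = shared + [120]

print(data_range(without_outlier))
print(data_range(with_outlier))
```

Changing a single observation out of 25 moves the range from 4 to 119, even though the other 24 values are identical.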
Interquartile Range
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
Interquartile range = Q3 – Q1 = 57 – 30 = 27
Quartiles
■ Quartiles split the ranked data into 4 segments with
an equal number of values per segment
[Figure: ranked data divided into four segments by Q1, Q2, and Q3]
■ The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
■ Q2 is the same as the median (50% are smaller, 50% are
larger)
■ Only 25% of the observations are greater than the third
quartile
Quartile Formulas
Quartiles
(n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values:
Q1 = 12.5
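A minimal sketch of the position rule, assuming Q2 and Q3 sit at the 0.50(n + 1) and 0.75(n + 1) positions by analogy with Q1, and assuming linear interpolation between neighbouring ranked values when the position is fractional. The `quartile` helper and the nine data values are hypothetical; the values were chosen only so that the 2nd and 3rd ranked values are 12 and 13, matching Q1 = 12.5.

```python
def quartile(values, q):
    """Value at the q * (n + 1) position of the ranked data (1-based)."""
    ranked = sorted(values)
    n = len(ranked)
    position = q * (n + 1)
    lower_rank = int(position)          # rank at or just below the position
    fraction = position - lower_rank
    if lower_rank < 1:                  # position falls before the first value
        return ranked[0]
    if lower_rank >= n:                 # position falls at or past the last value
        return ranked[-1]
    lower = ranked[lower_rank - 1]
    upper = ranked[lower_rank]
    return lower + fraction * (upper - lower)


# Hypothetical ranked sample with n = 9
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1 = quartile(sample, 0.25)   # position 2.5
q2 = quartile(sample, 0.50)   # position 5
q3 = quartile(sample, 0.75)   # position 7.5
print(q1, q2, q3, "IQR =", q3 - q1)
```

Other textbooks and software packages use slightly different quartile conventions, so results for small samples can differ from this sketch.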
Population Variance
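■ Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
where $\mu$ is the population mean, N is the population size, and $x_i$ is the i-th value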
Sample Variance
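■ Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
where $\bar{x}$ is the sample mean, n is the sample size, and $x_i$ is the i-th value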
Population Standard Deviation
■ Most commonly used measure of variation
■ Shows variation about the mean
■ Has the same units as the original data
■ Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample Standard Deviation
■ Most commonly used measure of variation
■ Shows variation about the mean
■ Has the same units as the original data
■ Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Calculation Example:
Sample Standard Deviation
Sample data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
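The calculation for this sample can be sketched step by step in Python, assuming the usual sample formula with an n - 1 denominator. The variable names are illustrative.

```python
import math

# Sample data from the slide
sample = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(sample)
mean = sum(sample) / n                                   # 128 / 8 = 16
squared_deviations = [(x - mean) ** 2 for x in sample]   # these sum to 130
sample_variance = sum(squared_deviations) / (n - 1)      # 130 / 7
sample_std_dev = math.sqrt(sample_variance)

print(f"mean = {mean}, s^2 = {sample_variance:.4f}, s = {sample_std_dev:.4f}")
```

The result, s of about 4.3095, is in the same units as the original observations.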
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis with the same mean and different spread]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
Advantages of Variance and
Standard Deviation
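Coefficient of Variation
■ Expresses the standard deviation as a percentage of the mean
■ Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
■ Sample: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$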
Comparing Coefficient
of Variation
■ Stock A:
■ Average price last year = $50
■ Standard deviation = $5
■ Stock B:
■ Average price last year = $100
■ Standard deviation = $5
Both stocks have the same standard deviation, but stock B is less variable relative to its price
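The comparison can be made concrete with a short sketch that computes the coefficient of variation as the standard deviation divided by the mean, times 100%. The function name is an illustrative choice; the inputs are the figures given for the two stocks.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(f"Stock A: CV = {cv_a:.0f}%")
print(f"Stock B: CV = {cv_b:.0f}%")
```

A $5 standard deviation is 10% of stock A's average price but only 5% of stock B's.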
Using Microsoft Excel
Using Excel
■ Select data / data analysis / descriptive statistics
Using Excel
■ Enter input
range details
■ Click OK
Excel output
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
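As a rough cross-check on the spreadsheet tool, a comparable summary can be sketched in Python with the standard library. The `describe` helper is hypothetical, and the field names are an assumption chosen to resemble the labels a descriptive statistics report typically shows; the variance and standard deviation use the sample (n - 1) formulas.

```python
import statistics


def describe(values):
    """Return a dictionary of common descriptive statistics for a sample."""
    return {
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": statistics.stdev(values),   # sample, n - 1
        "Sample Variance": statistics.variance(values),   # sample, n - 1
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": len(values),
    }


house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
summary = describe(house_prices)
for label, value in summary.items():
    print(f"{label:<20}{value:>18,.0f}")
```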
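Chebychev’s Theorem
■ For any population with mean μ and standard deviation σ, and any k > 1, at least 100[1 - (1/k^2)]% of the observations fall within the interval μ ± kσ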
Chebychev’s Theorem
(continued)
At least                      within
(1 - 1/1.5^2) = 55.6%         k = 1.5 (μ ± 1.5σ)
(1 - 1/2^2) = 75%             k = 2 (μ ± 2σ)
(1 - 1/3^2) = 89%             k = 3 (μ ± 3σ)
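The bounds in the table follow directly from the expression 1 - 1/k^2, as this short sketch shows. The function name `chebyshev_lower_bound` is an illustrative choice.

```python
def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    print(f"k = {k}: at least {bound:.1%} of values lie within mu +/- {k} sigma")
```

These are guaranteed minimums for any distribution, so actual data often has a larger share of values inside each interval.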
The Empirical Rule
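■ If the data distribution is bell-shaped, then the interval μ ± 1σ contains about 68% of the values in the population or the sample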
The Empirical Rule
■ μ ± 2σ contains about 95% of the values in
the population or the sample
■ μ ± 3σ contains almost all (about 99.7%) of the
values in the population or the sample
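The 68%, 95%, and 99.7% figures can be checked against an exactly normal distribution, which is one specific bell-shaped case and an assumption of this sketch. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {share_within(k):.1%}")
```

Compare these shares with the Chebychev bounds, which hold for any distribution and are therefore much lower for the same k.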
2.3
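Weighted Mean
■ The weighted mean of a set of data is
$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
where $w_i$ is the weight of the i-th observation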
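A minimal sketch of a weighted mean, computed as the sum of weight times value divided by the sum of the weights. The function name and the data (course scores weighted by credit hours) are hypothetical and serve only as an illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data: three course scores weighted by credit hours
scores = [80, 90, 70]
credit_hours = [3, 4, 3]

print(weighted_mean(scores, credit_hours))
```

With equal weights the result reduces to the ordinary arithmetic mean.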
Approximations for Grouped Data
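Suppose data are grouped into K classes, with
frequencies $f_1, f_2, \ldots, f_K$, and the midpoints of the
classes are $m_1, m_2, \ldots, m_K$
■ For a sample of n observations, the mean is approximately
$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$, where $n = \sum_{i=1}^{K} f_i$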
Approximations for Grouped Data
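Suppose data are grouped into K classes, with
frequencies $f_1, f_2, \ldots, f_K$, and the midpoints of the
classes are $m_1, m_2, \ldots, m_K$
■ For a sample of n observations, the variance is approximately
$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$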
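A minimal sketch of the grouped-data approximations, assuming each observation in a class is represented by the class midpoint and that the variance uses the sample (n - 1) denominator. The function names and the three classes below are hypothetical.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate mean: sum of f_i * m_i divided by n = sum of f_i."""
    n = sum(frequencies)
    return sum(f * m for m, f in zip(midpoints, frequencies)) / n


def grouped_variance(midpoints, frequencies):
    """Approximate sample variance: sum of f_i * (m_i - mean)^2 over (n - 1)."""
    n = sum(frequencies)
    mean = grouped_mean(midpoints, frequencies)
    return sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)) / (n - 1)


# Hypothetical classes 0-10, 10-20, 20-30 with their midpoints and counts
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

approx_mean = grouped_mean(midpoints, frequencies)
approx_variance = grouped_variance(midpoints, frequencies)
print(f"mean = {approx_mean}, variance = {approx_variance:.2f}")
```

These are approximations because the individual values inside each class are replaced by the midpoint.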
2.4
The Sample Covariance
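■ The covariance measures the direction of the linear relationship
between two variables; its magnitude depends on the units of measurement, so it does not by itself indicate strength
■ Sample covariance: $Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$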
Interpreting Covariance
Coefficient of Correlation
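■ Measures the relative strength of the linear relationship
between two variables
■ Sample correlation coefficient: $r = \frac{s_{xy}}{s_x s_y}$, where $s_{xy}$ is the sample covariance and $s_x$, $s_y$ are the sample standard deviations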
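A minimal sketch of the sample covariance and the correlation coefficient, assuming the usual n - 1 denominators and defining r as the covariance divided by the product of the two sample standard deviations. The function names and the paired data are hypothetical.

```python
import math


def sample_covariance(x, y):
    """Sum of (x_i - mean_x) * (y_i - mean_y), divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_std_dev(values):
    """Sample standard deviation, the square root of the variance of a variable with itself."""
    return math.sqrt(sample_covariance(values, values))


def correlation(x, y):
    """Covariance rescaled by both standard deviations, so it is unit free."""
    return sample_covariance(x, y) / (sample_std_dev(x) * sample_std_dev(y))


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * value for value in x]   # same data in different units

print(sample_covariance(x, y), round(correlation(x, y), 4))
print(sample_covariance(x_rescaled, y), round(correlation(x_rescaled, y), 4))
```

Rescaling x multiplies the covariance by 100 but leaves r unchanged, which is why r is the better gauge of strength.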
Features of
Correlation Coefficient, r
■ Unit free
■ Ranges between –1 and 1
■ The closer to –1, the stronger the negative linear
relationship
■ The closer to 1, the stronger the positive linear
relationship
■ The closer to 0, the weaker the linear
relationship
Scatter Plots of Data with Various
Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +1, r = +.3, and r = 0]
Using Excel to Find
the Correlation Coefficient
■ Select Data / Data Analysis
Using Excel to Find
the Correlation Coefficient
(continued)
Interpreting the Result
■ r = .733
■ There is a relatively
strong positive linear
relationship between
test score #1
and test score #2
Chapter Summary
■ Described measures of central tendency
■ Mean, median, mode
■ Illustrated the shape of the distribution
■ Symmetric, skewed
■ Described measures of variation
■ Range, interquartile range, variance and standard deviation,
coefficient of variation
■ Discussed measures of grouped data
■ Calculated measures of relationships between
variables
■ covariance and correlation coefficient
Online topic: Outliers