Business Statistics, 4e: by Ken Black
Chapter 3
Descriptive Statistics
Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons.
Learning Objectives
• Distinguish between measures of central
tendency, measures of variability, measures
of shape, and measures of association.
• Understand the meanings of mean, median,
mode, quartile, percentile, and range.
• Compute mean, median, mode, percentile,
quartile, range, variance, standard deviation,
and mean absolute deviation on ungrouped
data.
• Differentiate between sample and
population variance and standard deviation.
Learning Objectives -- Continued
• Understand the meaning of standard
deviation as it is applied by using the
empirical rule and Chebyshev’s theorem.
• Compute the mean, median, standard
deviation, and variance on grouped data.
• Understand box and whisker plots,
skewness, and kurtosis.
• Compute a coefficient of correlation and
interpret it.
Measures of Central Tendency:
Ungrouped Data
• Measures of central tendency yield
information about “particular places or
locations in a group of numbers.”
• Common Measures of Location
– Mode
– Median
– Mean
– Percentiles
– Quartiles
Mode
• The most frequently occurring value in a
data set
• Applicable to all levels of data
measurement (nominal, ordinal, interval,
and ratio)
Mode -- Example
• The mode is 44.
• There are more 44s than any other value.
• Data:
35 41 44 45
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Median
• Middle value in an ordered array of
numbers.
• Applicable for ordinal, interval, and ratio
data
• Not applicable for nominal data
• Unaffected by extremely large and
extremely small values.
Median: Computational Procedure
• First Procedure
– Arrange the observations in an ordered array.
– If there is an odd number of terms, the median
is the middle term of the ordered array.
– If there is an even number of terms, the median
is the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array is
given by (n+1)/2.
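The two procedures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `median` is an assumption, and the 17-value array is the one used in the odd-number example.

```python
def median(values):
    """Median via the slide's procedure: order the data, then use position (n + 1) / 2."""
    ordered = sorted(values)              # arrange the observations in an ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = (n + 1) / 2                # 1-based position of the median
    if n % 2 == 1:
        # odd number of terms: the position is a whole number, take that term
        return ordered[int(position) - 1]
    # even number of terms: the position ends in .5, average the two middle terms
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_data = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
print(median(odd_data))         # 17 terms: the 9th term
print(median(odd_data[:-1]))    # 16 terms: average of the 8th and 9th terms
```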
Median: Example
with an Odd Number of Terms
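Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
• There are 17 terms in the ordered array.
• Position of the median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, 15.
Population Mean
$$ \mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6 $$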
Sample Mean
$$ \bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167 $$
Percentiles
• Measures of central tendency that divide a
group of data into 100 parts
• At least n% of the data lie below the nth
percentile, and at most (100 - n)% of the data
lie above the nth percentile
• Example: 90th percentile indicates that at least
90% of the data lie below it, and at most 10%
of the data lie above it
• The median and the 50th percentile have the
same value.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
Percentiles: Computational Procedure
• Organize the data into an ascending ordered
array.
• Calculate the percentile location, where P is the percentile of interest and n is the number of observations:
$$ i = \frac{P}{100}(n) $$
• Determine the percentile’s location and its
value.
• If i is a whole number, the percentile is the
average of the values at the i and (i+1)
positions.
• If i is not a whole number, the percentile is at
the (i+1) position in the ordered array.
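A minimal Python sketch of this procedure follows. The function name `percentile` and its argument order are assumptions made for illustration; the raw data are the eight values from the percentile example.

```python
def percentile(values, p):
    """Value of the p-th percentile using the location index i = (p / 100) * n."""
    ordered = sorted(values)              # ascending ordered array
    n = len(ordered)
    if n == 0 or not 0 < p < 100:
        raise ValueError("need non-empty data and 0 < p < 100")
    i = p * n / 100                       # percentile location index
    if i == int(i):
        # i is a whole number: average the values at positions i and i + 1 (1-based)
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    # i is not a whole number: use the whole-number part of i + 1 as the position
    return ordered[int(i + 1) - 1]


raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw, 30))    # i = 2.4, so the 3rd value of the ordered array
```

Other statistical packages often interpolate between neighbouring values instead, so their percentiles can differ slightly from this rule.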
Percentiles: Example
• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
• Location of the 30th percentile:
$$ i = \frac{30}{100}(8) = 2.4 $$
• The location index, i, is not a whole number; i+1 =
2.4+1=3.4; the whole number portion is 3; the
30th percentile is at the 3rd location of the array;
the 30th percentile is 13.
Quartiles
• Measures of central tendency that divide a group
of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second
quartile
• Q3: 75% of the data set is below the third quartile
• Q1 is equal to the 25th percentile
• Q2 is located at 50th percentile and equals the
median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the
data set
Quartiles
[Figure: Q1, Q2, and Q3 dividing the ordered data into four subgroups]
Quartiles: Example
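• Ordered array: 106, 109, 114, 116, 121, 122, 125, 129
• Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
• Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
• Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$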
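The quartile example can be reproduced with a short Python sketch that treats Q1, Q2, and Q3 as the 25th, 50th, and 75th percentiles. The names `quartiles` and `at_percentile` are illustrative assumptions, not taken from the slides.

```python
def quartiles(values):
    """Return (Q1, Q2, Q3) as the 25th, 50th, and 75th percentiles."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of empty data are undefined")

    def at_percentile(p):
        i = p * n / 100                    # location index
        if i == int(i):
            # whole number: average the values at positions i and i + 1
            i = int(i)
            return (ordered[i - 1] + ordered[i]) / 2
        # otherwise: position is the whole-number part of i + 1
        return ordered[int(i + 1) - 1]

    return at_percentile(25), at_percentile(50), at_percentile(75)


data = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

With eight observations every location index is a whole number, which is why all three quartiles are averages and none is a member of the data set.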
Variability
[Figure: two data sets compared, one showing variability and one showing no variability]
Measures of Variability:
Ungrouped Data
• Measures of variability describe the spread
or the dispersion of a set of data.
• Common Measures of Variability
– Range
– Interquartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation
Range
• The difference between the largest and the
smallest values in a set of data
• Simple to compute
• Range = Largest - Smallest = 48 - 35 = 13
• Data:
35 41 44 45
39 43 44 46
40 43 44 46
40 43 45 48
Interquartile Range
Interquartile Range = Q3 - Q1
Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
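• Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
• Deviations from the mean: -8, -4, 3, 4, 5
[Figure: number line from 0 to 20 marking the deviations -8, -4, +3, +4, and +5 from the mean]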
Mean Absolute Deviation
• Average of the absolute deviations from the
mean
X      X - μ    |X - μ|
5       -8        +8
9       -4        +4
16      +3        +3
17      +4        +4
18      +5        +5
Sum      0        24
$$ M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8 $$
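As a quick check of the table, the mean absolute deviation can be computed directly. This is a minimal sketch; the function name `mean_absolute_deviation` is an assumption for illustration.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("M.A.D. of empty data is undefined")
    mean = sum(values) / n
    deviations = [x - mean for x in values]      # these always sum to zero
    return sum(abs(d) for d in deviations) / n   # so absolute values are averaged instead


data = [5, 9, 16, 17, 18]
print(mean_absolute_deviation(data))
```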
Population Variance
• Average of the squared deviations from the
arithmetic mean
X      X - μ    (X - μ)²
5       -8        64
9       -4        16
16      +3         9
17      +4        16
18      +5        25
Sum      0       130
$$ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0 $$
Population Standard Deviation
• Square root of the variance
X      X - μ    (X - μ)²
5       -8        64
9       -4        16
16      +3         9
17      +4        16
18      +5        25
Sum      0       130
$$ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0 $$
$$ \sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1 $$
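The population variance and standard deviation for the five-value data set can be verified with the following sketch. Function names are illustrative assumptions; only the standard library is used.

```python
import math


def population_variance(values):
    """Average of the squared deviations from the mean (divide by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of empty data is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [5, 9, 16, 17, 18]
print(population_variance(data))
print(round(population_std_dev(data), 1))
```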
Sample Variance
• Average of the squared deviations from the
arithmetic mean
X        X - X̄    (X - X̄)²
2,398     625     390,625
1,844      71       5,041
1,539    -234      54,756
1,311    -462     213,444
7,092       0     663,866   (sums)
$$ S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67 $$
Sample Standard Deviation
• Square root of the sample variance
X        X - X̄    (X - X̄)²
2,398     625     390,625
1,844      71       5,041
1,539    -234      54,756
1,311    -462     213,444
7,092       0     663,866   (sums)
$$ S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67 $$
$$ S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41 $$
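The sample versions differ from the population versions only in the divisor, n - 1 instead of N. A minimal sketch using the four values from the table (function names are assumptions for illustration):

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [2398, 1844, 1539, 1311]
print(round(sample_variance(data), 2))
print(round(sample_std_dev(data), 2))
```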
Uses of Standard Deviation
• Indicator of financial risk
• Quality Control
– construction of quality control charts
– process capability studies
• Comparing populations
– household incomes in two cities
– employee absenteeism at two plants
Standard Deviation as an
Indicator of Financial Risk
Annualized Rate of Return
Financial Security    Mean (μ)    Standard Deviation (σ)
A                     15%         3%
B                     15%         7%
Empirical Rule
• Data are normally distributed (or approximately
normal)
Distance from the Mean    Percentage of Values Falling Within Distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7
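The percentages in the empirical rule can be checked against the normal distribution itself. The sketch below is an illustration rather than part of the slides; it relies on the standard identity that the proportion of a normal distribution within k standard deviations of the mean is erf(k / √2), and the function name `normal_within` is an assumption.

```python
import math


def normal_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # print the distance and the percentage of values inside it
    print(f"mu +/- {k} sigma: {100 * normal_within(k):.2f}%")
```

The rule's 68, 95, and 99.7 are rounded versions of these values.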
Chebyshev’s Theorem
• Applies to all distributions
$$ P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2} \quad \text{for } k > 1 $$
Chebyshev’s Theorem
• Applies to all distributions
Number of Standard    Distance from    Minimum Proportion of Values
Deviations            the Mean         Falling Within Distance
K = 2                 μ ± 2σ           1 - 1/2² = 0.75
K = 3                 μ ± 3σ           1 - 1/3² = 0.89
K = 4                 μ ± 4σ           1 - 1/4² = 0.94
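The table entries follow directly from the bound 1 - 1/k². A minimal sketch (the function name `chebyshev_minimum` is an assumption for illustration):

```python
def chebyshev_minimum(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    # k, then the guaranteed minimum proportion rounded as in the table
    print(k, round(chebyshev_minimum(k), 2))
```

Because the bound holds for any distribution, it is weaker than the empirical rule: at k = 2 it guarantees at least 75% where the normal distribution gives about 95%.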
Coefficient of Variation
• Ratio of the standard deviation to the mean,
expressed as a percentage
• Measurement of relative dispersion
$$ C.V. = \frac{\sigma}{\mu}(100) $$
Coefficient of Variation
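Data set 1: μ1 = 29, σ1 = 4.6
Data set 2: μ2 = 84, σ2 = 10
$$ C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86 $$
$$ C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90 $$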
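The comparison can be reproduced with a short sketch; the function name `coefficient_of_variation` is an assumption for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100


cv1 = coefficient_of_variation(4.6, 29)
cv2 = coefficient_of_variation(10, 84)
print(round(cv1, 2), round(cv2, 2))
```

The second data set has the larger standard deviation (10 versus 4.6) but the smaller relative dispersion, which is the point of using a relative measure.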
Measures of Central Tendency
and Variability: Grouped Data
• Measures of Central Tendency
– Mean
– Median
– Mode
• Measures of Variability
– Variance
– Standard Deviation
Mean of Grouped Data
• Weighted average of class midpoints
• Class frequencies are the weights
$$ \mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i} $$
Calculation of Grouped Mean
Class Interval    Frequency (f)    Class Midpoint (M)    fM
20-under 30             6                 25             150
30-under 40            18                 35             630
40-under 50            11                 45             495
50-under 60            11                 55             605
60-under 70             3                 65             195
70-under 80             1                 75              75
Total                  50                               2150
$$ \mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0 $$
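The grouped mean is a frequency-weighted average of class midpoints, which a few lines of Python make explicit. The `(lower, upper, frequency)` tuple layout and the name `grouped_mean` are assumptions chosen for this sketch.

```python
def grouped_mean(classes):
    """Weighted average of class midpoints, with class frequencies as weights."""
    total = sum(freq for _, _, freq in classes)              # sum of f
    if total <= 0:
        raise ValueError("total frequency must be positive")
    weighted = sum(freq * (lower + upper) / 2                # f * M for each class
                   for lower, upper, freq in classes)
    return weighted / total


# (lower limit, upper limit, frequency) for each class interval
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]
print(grouped_mean(classes))
```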
Median of Grouped Data
$$ \text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) $$
Where:
L = the lower limit of the median class
cfp = cumulative frequency of class preceding the median class
fmed = frequency of the median class
W = width of the median class
N = total of frequencies
Median of Grouped Data -- Example
Class Interval    Frequency    Cumulative Frequency
20-under 30            6                6
30-under 40           18               24
40-under 50           11               35
50-under 60           11               46
60-under 70            3               49
70-under 80            1               50
N = 50
$$ Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909 $$
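The grouped median calculation can be sketched as a scan over the cumulative frequencies to find the median class, followed by the interpolation formula. The tuple layout and the name `grouped_median` are assumptions for illustration.

```python
def grouped_median(classes):
    """Median of grouped data; classes are (lower, upper, frequency) in ascending order."""
    total = sum(freq for _, _, freq in classes)      # N
    if total <= 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cumulative = 0                                   # cfp for the class being examined
    for lower, upper, freq in classes:
        if cumulative + freq >= half:
            # this is the median class: interpolate within it
            width = upper - lower                    # W
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("median class not found")


classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]
print(round(grouped_median(classes), 3))
```

Here N/2 = 25 first falls inside the 40-under 50 class, whose preceding cumulative frequency is 24.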
Mode of Grouped Data
Variance and Standard Deviation
of Grouped Data
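Population:
$$ \sigma^2 = \frac{\sum f(M - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2} $$
Sample:
$$ S^2 = \frac{\sum f(M - \bar{X})^2}{n - 1}, \qquad S = \sqrt{S^2} $$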
Population Variance and Standard
Deviation of Grouped Data
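[Table: columns Class Interval, f, M, fM, (M - μ), (M - μ)², f(M - μ)²]
$$ \sigma^2 = \frac{\sum f(M - \mu)^2}{N} = \frac{7200}{50} = 144 $$
$$ \sigma = \sqrt{\sigma^2} = \sqrt{144} = 12 $$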
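The totals above can be reproduced in code. The sketch below assumes the frequency table is the same six-class table used for the grouped mean and median (an assumption, although its totals of N = 50 and 7200 agree with this slide); the function name is illustrative.

```python
import math


def grouped_population_variance(classes):
    """Population variance of grouped data from (lower, upper, frequency) classes."""
    total = sum(freq for _, _, freq in classes)                  # N
    if total <= 0:
        raise ValueError("total frequency must be positive")
    mids = [((lower + upper) / 2, freq) for lower, upper, freq in classes]
    mu = sum(freq * mid for mid, freq in mids) / total           # grouped mean
    return sum(freq * (mid - mu) ** 2 for mid, freq in mids) / total


# assumed to be the same classes as in the grouped mean example
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]
variance = grouped_population_variance(classes)
print(variance, math.sqrt(variance))
```

For sample data the same sum of squares would be divided by n - 1 instead of N.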
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values in one side of a distribution
• Kurtosis
– Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal shape
– Platykurtic: flat and spread out
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness
Skewness
Coefficient of Skewness
• Summary measure for skewness
$$ S = \frac{3(\mu - M_d)}{\sigma} $$
• If S < 0, the distribution is negatively skewed
(skewed to the left).
• If S = 0, the distribution is symmetric (not
skewed).
• If S > 0, the distribution is positively skewed
(skewed to the right).
Coefficient of Skewness
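Distribution 1: μ1 = 23, Md1 = 26, σ1 = 12.3
$$ S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73 $$
Distribution 2: μ2 = 26, Md2 = 26, σ2 = 12.3
$$ S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0 $$
Distribution 3: μ3 = 29, Md3 = 26, σ3 = 12.3
$$ S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73 $$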
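The three cases can be evaluated with a one-line function. This is a minimal sketch; the name `skewness_coefficient` is an assumption for illustration.

```python
def skewness_coefficient(mean, median, std_dev):
    """Coefficient of skewness: S = 3 * (mean - median) / standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / std_dev


# (mean, median, standard deviation) for the three distributions
cases = [(23, 26, 12.3), (26, 26, 12.3), (29, 26, 12.3)]
for mean, median, std_dev in cases:
    s = skewness_coefficient(mean, median, std_dev)
    label = "negative" if s < 0 else "positive" if s > 0 else "symmetric"
    print(round(s, 2), label)
```

The sign depends only on whether the mean lies below or above the median; the standard deviation scales the size of the coefficient.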
[Figure: leptokurtic, mesokurtic, and platykurtic distribution shapes]
Box and Whisker Plot
• Five specific values are used:
– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set
• Inner Fences
– IQR = Q3 - Q1
– Lower inner fence = Q1 - 1.5 IQR
– Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR
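The fence formulas can be sketched as follows. As an illustration, the quartiles Q1 = 111.5 and Q3 = 123.5 are borrowed from the earlier quartile example; the function name `fences` and the dictionary layout are assumptions.

```python
def fences(q1, q3):
    """Inner and outer fences of a box and whisker plot from the quartiles."""
    iqr = q3 - q1                                     # interquartile range
    return {
        "iqr": iqr,
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),    # (lower, upper) inner fences
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),    # (lower, upper) outer fences
    }


result = fences(111.5, 123.5)
print(result)
```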
Box and Whisker Plot
[Figure: box and whisker plot marking the Minimum, Q1, Q2, Q3, and Maximum]
Skewness: Box and Whisker Plots,
and Coefficient of Skewness
[Figure: box and whisker plots for distributions with S < 0, S = 0, and S > 0]
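Coefficient of Correlation
$$ r = \frac{SSXY}{\sqrt{(SSX)(SSY)}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} $$
$$ -1 \leq r \leq 1 $$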
Three Degrees of Correlation
[Figure: three scatter plots illustrating r < 0, r > 0, and r = 0]
Computation of r for
the Economics Example (Part 1)
Day  Interest (X)  Futures Index (Y)  X²  Y²  XY
1 7.43 221 55.205 48,841 1,642.03
2 7.48 222 55.950 49,284 1,660.56
3 8.00 226 64.000 51,076 1,808.00
4 7.75 225 60.063 50,625 1,743.75
5 7.60 224 57.760 50,176 1,702.40
6 7.63 223 58.217 49,729 1,701.49
7 7.68 223 58.982 49,729 1,712.64
8 7.67 226 58.829 51,076 1,733.42
9 7.59 226 57.608 51,076 1,715.34
10 8.07 235 65.125 55,225 1,896.45
11 8.03 233 64.481 54,289 1,870.99
12 8.00 241 64.000 58,081 1,928.00
Summations 92.93 2,725 720.220 619,207 21,115.07
Computation of r
for the Economics Example (Part 2)
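$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{21{,}115.07 - \frac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619{,}207 - \frac{(2725)^2}{12}\right]}} = .815 $$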
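The same computation can be run on the twelve data pairs from the table. This is a minimal sketch of the sums-based formula; the function name `pearson_r` and the variable names are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of correlation via the computational (sums) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    ss_xy = sum_xy - sum_x * sum_y / n        # numerator
    ss_x = sum_x2 - sum_x ** 2 / n            # first bracket under the root
    ss_y = sum_y2 - sum_y ** 2 / n            # second bracket under the root
    return ss_xy / math.sqrt(ss_x * ss_y)


interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]
r = pearson_r(interest, futures)
print(round(r, 3))
```

The first bracket is a small difference of two large numbers (about 720.22 minus 719.67), so rounding the intermediate sums too early in a hand calculation can noticeably change r.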
Scatter Plot and Correlation Matrix
for the Economics Example
[Figure: scatter plot of Futures Index (vertical axis, 220 to 245) against Interest (horizontal axis, 7.40 to 8.20)]