Evans Analytics2e PPT 04
Notation
Measures of Location
Measures of Dispersion
Standardization
Proportions for Categorical Variables
Measures of Association
Outliers
Notation
Population: all items of interest for a particular decision or investigation. Examples:
◦ all married drivers over 25 years old
◦ all subscribers to Netflix
Sample: a subset of a population.
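Measures of Location
Sample mean: the sum of the observations divided by the number of observations. For the 94 values in cells B2:B95:
=SUM(B2:B95)/COUNT(B2:B95)
Mean = $2,471,760/94 = $26,295.32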
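Median: with 94 observations (an even number), the median is the average of the two middle values (the 47th and 48th in sorted order):
Median = ($15,562.50 + $15,750.00)/2 = $15,656.25
=MEDIAN(B2:B95)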
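The two Excel calculations above can be mirrored in plain Python. This is a minimal sketch: the 94 data values themselves are not reproduced here, so the last two lines only re-check the slide's arithmetic from its reported totals and its two middle values. The function names are illustrative.

```python
def sample_mean(values):
    """Same arithmetic as =SUM(range)/COUNT(range)."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Re-check the slide's numbers from the totals and middle values it reports.
print(round(2_471_760 / 94, 2))        # 26295.32
print(median([15_750.00, 15_562.50]))  # 15656.25
```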
Person Age
1 17.00
2 21.00
3 15.00
4 18.00
5 999.00
6 22.00
7 11.00
8 25.00
Mean 141.00
Median 19.50
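Distribution is important! The single implausible age of 999 pulls the mean up to 141.00, while the median (19.50) stays close to the typical ages.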
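The age table can be recomputed with Python's standard `statistics` module to show how strongly one extreme value moves the mean compared with the median. This is a small sketch; treating 999 as the value to drop is an assumption made only for the comparison.

```python
import statistics

# Ages from the table above; 999 is the implausible value.
ages = [17, 21, 15, 18, 999, 22, 11, 25]

print(f"mean = {statistics.mean(ages):.2f}")      # mean = 141.00
print(f"median = {statistics.median(ages):.2f}")  # median = 19.50

# Drop the implausible value to see how much each measure changes.
clean = [a for a in ages if a != 999]
print(f"mean = {statistics.mean(clean):.2f}")      # mean = 18.43
print(f"median = {statistics.median(clean):.2f}")  # median = 18.00
```

The median barely moves (19.50 to 18.00), while the mean drops from 141.00 to 18.43.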
Measures of Dispersion
Dispersion refers to the degree of variation in
the data; that is, the numerical spread (or
compactness) of the data.
Key measures:
◦ Range
◦ Interquartile range
◦ Variance
◦ Standard deviation
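As a rough illustration of three of these measures, the sketch below computes the interquartile range, sample variance, and sample standard deviation of the age data from the mean/median example using Python's `statistics` module (Python 3.8+). The "inclusive" quartile method interpolates linearly between sorted values, which appears to match Excel's QUARTILE.INC. Other quartile conventions can give slightly different values.

```python
import statistics

# Ages from the mean/median example (999 is the implausible value).
ages = [17, 21, 15, 18, 999, 22, 11, 25]

# Quartiles by linear interpolation between sorted values.
q1, _, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

# Sample variance and standard deviation divide by n - 1;
# statistics.pvariance / pstdev are the population versions (divide by n).
s2 = statistics.variance(ages)
s = statistics.stdev(ages)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")   # Q1 = 16.5, Q3 = 22.75, IQR = 6.25
print(f"sample variance = {s2:.2f}")          # sample variance = 120208.86
print(f"sample standard deviation = {s:.2f}") # sample standard deviation = 346.71
```

The IQR, which uses only the middle half of the data, is unaffected by the 999, while the variance and standard deviation are dominated by it.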
The range is the simplest measure of dispersion: the difference between the maximum value and the minimum value in the data set.
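A minimal sketch of the range, applied to the age data from the mean/median example. Because the range uses only the two most extreme values, a single unusual observation can dominate it. The function name `data_range` is illustrative. In Excel the same value is =MAX(range)-MIN(range).

```python
# Ages from the mean/median example; 999 is the implausible value.
ages = [17, 21, 15, 18, 999, 22, 11, 25]

def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

print(data_range(ages))                           # 988
print(data_range([a for a in ages if a != 999]))  # 14
```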
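Chebyshev's theorem: for any data set, the proportion of values that lie within k standard deviations of the mean is at least 1 - 1/k², for any k > 1.
Examples:
◦ For k = 2: at least ¾, or 75%, of the data lie within two standard deviations of the mean
◦ For k = 3: at least 8/9, or about 89%, of the data lie within three standard deviations of the mean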
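The bound can be checked against real data. The sketch below compares Chebyshev's guaranteed minimum 1 - 1/k² with the fraction of the age data (from the mean/median example) that actually falls within k sample standard deviations of the mean. The helper names are hypothetical. Raising an error for k ≤ 1 is a design choice, since the bound says nothing useful there.

```python
import statistics

def chebyshev_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed fraction of values within k sample standard deviations of the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(1 for x in values if abs(x - m) <= k * s) / len(values)

ages = [17, 21, 15, 18, 999, 22, 11, 25]
for k in (2, 3):
    print(k, round(chebyshev_bound(k), 3), fraction_within(ages, k))
# 2 0.75 0.875
# 3 0.889 1.0
```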
Standardization
A standardized value, commonly called a z-score,
provides a relative measure of the distance an
observation is from the mean, which is independent of
the units of measurement.
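The z-score for the ith observation in a data set is calculated as

$$ z_i = \frac{x_i - \bar{x}}{s} $$

where x̄ is the mean and s is the standard deviation. In Excel, with the mean in cell B97 and the standard deviation in cell B98:
=(B2 - $B$97)/$B$98, or
=STANDARDIZE(B2,$B$97,$B$98).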
Standardized values have a mean of 0 and a standard deviation of 1.
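A minimal sketch of the z-score formula in Python, using the sample standard deviation as in the formula above. It performs the same arithmetic as Excel's STANDARDIZE(x, mean, standard_dev) when given the data's own mean and standard deviation. The function name is illustrative, and the age data is reused from the mean/median example.

```python
import statistics

def z_scores(values):
    """Standardize each value: (x - mean) / sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]

ages = [17, 21, 15, 18, 999, 22, 11, 25]
z = z_scores(ages)
print([round(v, 2) for v in z])

# Up to floating-point rounding, the z-scores have mean 0 and standard deviation 1.
print(statistics.mean(z), statistics.stdev(z))
```

The age of 999 gets a z-score of about 2.47, while the other ages all sit near -0.35.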
Proportions for Categorical Variables
The proportion, denoted by p, is the fraction of
data that have a certain characteristic.
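A short sketch of computing p for a categorical variable: count the observations that have the characteristic and divide by the total number of observations. The survey responses below are hypothetical and only illustrate the calculation. In Excel, a comparable calculation would be =COUNTIF(range,"Yes")/COUNTA(range).

```python
# Hypothetical categorical data (illustrative only), one response per observation.
responses = ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]

def proportion(values, category):
    """p = (number of observations in the category) / (total number of observations)."""
    return sum(1 for v in values if v == category) / len(values)

print(proportion(responses, "Yes"))  # 0.625
```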
Outliers
For normally distributed data there is only about a 0.3% chance of seeing an observation more than 3 standard deviations from the mean; for any distribution, Chebyshev's theorem limits that chance to at most 1/9, or about 11%.
This suggests that month 12 is statistically different from the rest of the data.
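The ±3 standard deviation rule can be sketched as a z-score filter. The twelve monthly values below are hypothetical (they are not the data behind the slide), chosen so that month 12 stands out. The function name and the default threshold of 3 are illustrative.

```python
import statistics

# Hypothetical monthly values (not the slide's data); month 12 is unusually large.
monthly = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 200]

def flag_outliers(values, threshold=3.0):
    """Return 1-based positions whose z-score lies outside +/- threshold."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [i for i, x in enumerate(values, start=1) if abs((x - m) / s) > threshold]

print(flag_outliers(monthly))  # [12]
```

With few observations, a single extreme value also inflates the standard deviation, so the rule can miss it. For example, the age of 999 in the earlier table has a z-score of only about 2.47.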