Dr. K. M. Salah Uddin Associate Professor Dept. of MIS, DU
[Overview figure: numerical measures, including the mode, variance, and coefficient of variation]
Measures of Central Tendency
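Overview
Central tendency:
Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Median: midpoint of the ranked values
Mode: most frequently observed value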
Arithmetic Mean
The arithmetic mean (mean) is the most
common measure of central tendency
For a sample of size n:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
Example: for the values 1, 2, 3, 4, 5 the mean is 15/5 = 3; replacing 5 with 10 gives 1, 2, 3, 4, 10 and a mean of 20/5 = 4. The mean is affected by extreme values (outliers).
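The mean calculation for the two small data sets 1, 2, 3, 4, 5 and 1, 2, 3, 4, 10 can be sketched in a few lines of Python. The function name `arithmetic_mean` is a hypothetical helper chosen for illustration; it simply divides the sum of the values by their count.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values (hypothetical helper)."""
    return sum(values) / len(values)


data_1 = [1, 2, 3, 4, 5]
data_2 = [1, 2, 3, 4, 10]  # same as data_1, but 5 replaced by the extreme value 10

mean_1 = arithmetic_mean(data_1)  # 15 / 5
mean_2 = arithmetic_mean(data_2)  # 20 / 5
print(mean_1, mean_2)
```

A single extreme value shifts the mean from 3 to 4 even though four of the five values are unchanged.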
Median
In an ordered array, the median is the “middle”
number (50% above, 50% below)
Example: the median of 1, 2, 3, 4, 5 and the median of 1, 2, 3, 4, 10 are both 3; unlike the mean, the median is not affected by extreme values.
Median position $= \frac{n+1}{2}$ position in the ordered data
If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of
the two middle numbers
Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data.
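The position rule and the odd/even cases above can be sketched directly. This is a minimal illustration under the assumption that the data fit in memory and can be sorted; `median_by_position` is a hypothetical name, not a library function.

```python
def median_by_position(values):
    """Median via the 1-based (n + 1) / 2 position in the ranked data (hypothetical helper)."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # odd n: (n + 1) / 2 is a whole-number position, take that ranked value
        position = (n + 1) // 2
        return ranked[position - 1]
    # even n: the position falls halfway between two ranked values, average them
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2


print(median_by_position([1, 2, 3, 4, 10]))  # odd n
print(median_by_position([4, 1, 3, 2]))      # even n, unsorted input
```

Note that the position (n + 1) / 2 is used only to locate the value; the function returns the ranked value itself.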
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical
(nominal) data
There may be no mode
There may be several modes
[Figure: one data set with Mode = 9; another with No Mode]
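A sketch of finding the mode, including the "no mode" and "several modes" cases, using `collections.Counter`. It assumes the convention that there is no mode when every value occurs exactly once; `modes` is a hypothetical helper, and the numeric data below are made up for illustration.

```python
from collections import Counter


def modes(values):
    """All most-frequent values; [] means no mode (assumed: no value repeats)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 3, 4, 5]))                            # no mode
print(modes([1, 9, 9, 4, 9, 12]))                        # hypothetical data, one mode
print(modes(["red", "blue", "red", "blue", "green"]))    # categorical data, two modes
```

Because it only counts occurrences, the same function works for categorical (nominal) data as well as numerical data.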
Review Example
Five houses on a hill by the beach
House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Review Example: Summary Statistics
House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (sum = $3,000,000)
Mean: $3,000,000 / 5 = $600,000
Median: middle value of the ranked data = $300,000
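The summary statistics for the five house prices can be checked with Python's standard `statistics` module. The mode is computed as well, as an extra illustration of the measure defined earlier.

```python
import statistics

# House prices from the review example, in dollars
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # 3,000,000 / 5
median_price = statistics.median(prices)  # middle of the ranked values
mode_price = statistics.mode(prices)      # most frequent value

print(f"mean   ${mean_price:,}")
print(f"median ${median_price:,}")
print(f"mode   ${mode_price:,}")
```

The single $2,000,000 house pulls the mean to twice the median, which is why the median is often preferred for skewed data such as prices.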
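Quartiles
Quartiles split the ranked data at Q1, Q2, and Q3 into four segments, each containing 25% of the observations.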
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are
larger)
Only 25% of the observations are greater than the third
quartile
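Quartile Formulas
Find a quartile by locating the value in the appropriate position of the ranked data, where n is the number of observed values:
First quartile position: $Q_1 = \frac{n+1}{4}$ ranked value
Second quartile position: $Q_2 = \frac{n+1}{2}$ ranked value (the median position)
Third quartile position: $Q_3 = \frac{3(n+1)}{4}$ ranked value
When a position falls halfway between two ranked values, use the value halfway between them.
Q1 and Q3 are measures of noncentral location
Q2 = median, a measure of central tendency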
Quartiles
(continued)
Example:
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = (12 + 13)/2 = 12.5
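The quartile example can be reproduced with `statistics.quantiles` (Python 3.8+). Its `method="exclusive"` option places the k-th quartile at position k(n + 1)/4 and interpolates linearly between neighbouring ranked values. For positions ending in .5 this matches the "halfway" rule; for other fractional positions linear interpolation may differ from rounding rules used in some textbooks, so treat it as an approximation of the slide's procedure.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ordered sample from the example, n = 9

# Positions: Q1 at 2.5, Q2 at 5, Q3 at 7.5 in the ranked data
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)
```

Q3 sits at position 3(9 + 1)/4 = 7.5, halfway between the 7th value (18) and the 8th value (21).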
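Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Geometric mean rate of return: $\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$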
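A sketch of both geometric-mean formulas using `math.prod` (Python 3.8+). The helper names are hypothetical, returns are assumed to be given as decimals (0.25 for 25%), and the two-year return sequence is a made-up illustration.

```python
import math


def geometric_mean(values):
    """(X1 * X2 * ... * Xn) ** (1/n); assumes all values are positive."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))


def geometric_mean_rate_of_return(returns):
    """[(1 + R1)(1 + R2)...(1 + Rn)] ** (1/n) - 1, returns given as decimals."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Hypothetical investment: loses 50% in year 1, then gains 100% in year 2
returns = [-0.50, 1.00]
arithmetic_return = sum(returns) / len(returns)
geometric_return = geometric_mean_rate_of_return(returns)
print(arithmetic_return, geometric_return)
print(geometric_mean([2, 8]))
```

The arithmetic mean return of 25% overstates performance here: the investment ends exactly where it started, which the geometric mean rate of return (0%) reflects.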
[Figure: two distributions with the same center but different variation]
Range
Simplest measure of variation
Difference between the largest and the smallest values in a set of data:
Range $= X_{\text{largest}} - X_{\text{smallest}}$
Example: for data ranging from 1 to 14, Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two data sets spread differently between 7 and 12]
Range = 12 - 7 = 5 for both data sets
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
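The outlier comparison above is easy to reproduce. This sketch rebuilds the two 25-value data sets from the listed counts (eleven 1s, eight 2s, four 3s, one 4, then either 5 or 120); `data_range` is a hypothetical helper name.

```python
def data_range(values):
    """Range = largest value - smallest value (hypothetical helper)."""
    return max(values) - min(values)


base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = base + [5]
with_outlier = base + [120]

print(len(without_outlier), data_range(without_outlier))
print(len(with_outlier), data_range(with_outlier))
```

Changing one value out of 25 moves the range from 4 to 119, because the range depends only on the two extreme values.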
Interquartile Range
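Interquartile range (IQR) = Q3 - Q1, the range of the middle 50% of the data
Example:
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70; each of the four segments holds 25% of the data
Interquartile range = 57 - 30 = 27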
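To see why the interquartile range is less sensitive to outliers than the range, this sketch computes Q3 - Q1 for the two 25-value data sets listed under Disadvantages of the Range. It assumes quartiles at positions k(n + 1)/4 with linear interpolation (`statistics.quantiles` with `method="exclusive"`); `iqr` is a hypothetical helper name.

```python
import statistics


def iqr(values):
    """Interquartile range Q3 - Q1, quartiles at positions k(n + 1)/4."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
iqr_without = iqr(base + [5])
iqr_with = iqr(base + [120])
print(iqr_without, iqr_with)
```

Replacing 5 with 120 does not move Q1 or Q3, so the IQR stays the same while the range jumps from 4 to 119.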
Variance
Sample variance:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Where $\bar{X}$ = mean
n = sample size
Xi = ith value of the variable X
Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Is the square root of the variance
Has the same units as the original data
Sample standard deviation:
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
Calculation Example:
Sample Standard Deviation
Sample data (Xi): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
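The calculation can be carried through step by step. The deviations from the mean 16 are -6, -4, -2, -1, 1, 2, 2, 8; their squares sum to 130, so S = sqrt(130 / 7), roughly 4.3095. The sketch below follows the sample formula with the n - 1 divisor and cross-checks the result against `statistics.stdev`.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n  # 128 / 8

squared_deviations = [(x - mean) ** 2 for x in data]
ss = sum(squared_deviations)       # sum of squared deviations
sample_variance = ss / (n - 1)     # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

# The standard library uses the same n - 1 formula
assert math.isclose(sample_sd, statistics.stdev(data))
print(mean, ss, round(sample_sd, 4))
```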
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
[Figure: dot plots of Data A, B, and C on a scale from 11 to 21; same mean, different spread]
Advantages of Variance and Standard Deviation
Coefficient of variation:
$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Comparing Coefficients of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
$CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{5}{50} \cdot 100\% = 10\%$
Stock B:
Average price last year = $100
Standard deviation = $5
$CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{5}{100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
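The stock comparison reduces to one line of arithmetic per stock. In this sketch `coefficient_of_variation` is a hypothetical helper that returns the CV as a percentage; prices and standard deviations are taken from the example, in dollars.

```python
def coefficient_of_variation(sd, mean):
    """CV = (S / mean) * 100%, returned as a percentage (hypothetical helper)."""
    return 100 * sd / mean


cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B
print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
```

Because the CV is unit-free, it can compare variability across data sets with different means or even different units, which the standard deviation alone cannot.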
Z Scores
$Z = \frac{X - \bar{X}}{S}$
Z Scores
(continued)
Example:
If the mean is 14.0 and the standard deviation is 3.0,
what is the Z score for the value 18.5?
$Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5$
The value 18.5 is 1.5 standard deviations above the mean.
(A negative Z-score would mean that a value is less
than the mean)
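A minimal Z-score sketch reproducing the example. `z_score` is a hypothetical helper; the second call uses an illustrative value of 9.5 to show a negative Z-score.

```python
def z_score(x, mean, sd):
    """Standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_above = z_score(18.5, 14.0, 3.0)  # example from the slide
z_below = z_score(9.5, 14.0, 3.0)   # illustrative value below the mean
print(z_above, z_below)
```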
Shape of a Distribution
Population mean:
$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Where μ = population mean
N = population size
Xi = ith value of the variable X
Population Variance
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
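The only computational difference between the population and sample formulas is the divisor: N for a population, n - 1 for a sample. This sketch treats the eight values from the sample calculation example as if they were a whole population, purely for illustration, and compares the hand-written formula with `statistics.pvariance` and `statistics.variance`. `population_variance` is a hypothetical helper.

```python
import math
import statistics


def population_variance(values):
    """sigma^2 = sum((Xi - mu)^2) / N, dividing by N rather than N - 1."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


# Assumption for illustration: these eight values are the entire population
values = [10, 12, 14, 15, 17, 18, 18, 24]
sigma2 = population_variance(values)  # 130 / 8
sigma = math.sqrt(sigma2)
print(sigma2, sigma)
print(statistics.pvariance(values), statistics.variance(values))  # N vs n - 1 divisor
```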
The Empirical Rule
In a bell-shaped distribution:
μ ± 1σ contains about 68% of the values in the population or the sample
μ ± 2σ contains about 95% of the values in the population or the sample
μ ± 3σ contains about 99.7% of the values in the population or the sample
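The 68 / 95 / 99.7 percentages come from the normal distribution. Assuming the data are exactly normal, they can be recovered with `statistics.NormalDist` (Python 3.8+) by taking the probability mass between -k and +k standard deviations of a standard normal.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of a normal distribution lying within mu +/- k*sigma
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"mu +/- {k} sigma: {share:.4f}")
```

For data that are not bell-shaped these shares need not hold, which is the motivation for a distribution-free bound such as Chebyshev's.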
Chebyshev Rule