Week 02 Descriptive Statistics
Chap 3-1
Measures of Central Tendency:
The Mean
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
where n = sample size and X_1, ..., X_n = observed values
Chap 3-2
Measures of Central Tendency:
The Mean
(continued)
First data set: 11 12 13 14 15
Mean = (11 + 12 + 13 + 14 + 15) / 5 = 65 / 5 = 13
Second data set: 11 12 13 14 20
Mean = (11 + 12 + 13 + 14 + 20) / 5 = 70 / 5 = 14
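The two calculations above can be checked with a minimal Python sketch. The function name `sample_mean` is an illustrative choice, not something from the slides; the data are the two sets shown above.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the observed values divided by the sample size n."""
    return sum(values) / len(values)

data_a = [11, 12, 13, 14, 15]   # first data set from the slide
data_b = [11, 12, 13, 14, 20]   # same data, largest value replaced by 20

mean_a = sample_mean(data_a)    # 65 / 5
mean_b = sample_mean(data_b)    # 70 / 5
print(mean_a, mean_b)
```

Changing a single extreme value moves the mean from 13 to 14, which is why the mean is described as sensitive to extreme values.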
Chap 3-3
Measures of Central Tendency:
The Median
[Figure: two dot plots on an axis from 11 to 20]
Median = 13 for both data sets
Chap 3-4
Measures of Central Tendency:
Locating the Median
The location of the median when the values are in numerical order
(smallest to largest):
\[ \text{Median position} = \frac{n+1}{2} \text{ position in the ordered data} \]
If the number of values is odd, the median is the middle number
Note that \frac{n+1}{2} is not the value of the median, only the position of
the median in the ranked data
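As a rough illustration of the position rule, the sketch below sorts the data, computes the (n + 1)/2 position, and returns the value found there. The function names are hypothetical. The even-n branch (averaging the two middle values) is the usual convention and is an assumption here, since the surviving slide text only states the odd case.

```python
def median_position(n):
    """1-based position of the median in the ordered data: (n + 1) / 2."""
    return (n + 1) / 2

def median(values):
    ordered = sorted(values)          # smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take the middle value.
        return ordered[int(median_position(n)) - 1]
    # Even n (assumed convention): the position ends in .5,
    # so average the two values on either side of it.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_position(5))             # position, not the median itself
print(median([11, 12, 13, 14, 20]))   # value at that position
```

For n = 5 the position is 3, and the value in third place is the median.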
Chap 3-5
Measures of Central Tendency:
The Mode
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical (nominal)
data
There may be no mode
There may be several modes
[Figure: two dot plots. On the axis from 0 to 14: Mode = 9. On the axis from 0 to 6: No Mode.]
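The properties listed for the mode can be sketched in a few lines of Python. The helper `modes` and the sample lists are hypothetical, since the slide's own data points did not survive extraction. The sketch treats "every value occurs once" as "no mode" and assumes a non-empty input.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often; an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 3, 5, 9, 9, 9, 12]))      # one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))       # no mode
print(modes([1, 1, 2, 2, 3]))             # several modes
print(modes(["red", "blue", "red"]))      # works for categorical data too
```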
Chap 3-6
Measures of Central Tendency:
Review Example
Chap 3-7
Measures of Central Tendency:
Which Measure to Choose?
Chap 3-8
Measures of Central Tendency:
Summary
Central Tendency
Arithmetic mean: \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
Median: middle value in the ordered array
Mode: most frequently observed value
Chap 3-9
Measures of Variation
Variation
Example (data plotted on an axis from 0 to 14; smallest value 1, largest value 13):
Range = 13 - 1 = 12
Chap 3-11
Measures of Variation:
Why The Range Can Be Misleading
[Figure: two data sets plotted on an axis from 7 to 12]
Range = 12 - 7 = 5 for both data sets
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
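The outlier example above is easy to reproduce. This is a minimal sketch; `value_range` is an illustrative name, and the two lists are the 25 values shown on the slide.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# 11 ones, 8 twos, 4 threes, then the last two values from the slide
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(base))
print(value_range(with_outlier))
```

Only one of the 25 values differs between the two lists, yet the range changes from 4 to 119.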
Chap 3-12
Measures of Variation:
The Sample Variance
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \]
Where \bar{X} = arithmetic mean
n = sample size
X_i = ith value of the variable X
Chap 3-14
Measures of Variation:
The Sample Standard Deviation
Sample standard deviation:
\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} \]
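A short Python sketch of the sample variance and sample standard deviation, written directly from the definitions with the n - 1 divisor. The function names are illustrative, and the data set (11, 12, 13, 14, 15) is borrowed from the earlier mean example rather than from this slide.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [11, 12, 13, 14, 15]
s2 = sample_variance(data)   # squared deviations 4 + 1 + 0 + 1 + 4 = 10, divided by 4
s = sample_std(data)
print(s2, s)
```

The variance is in squared units of the data, while the standard deviation is back in the original units.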
Chap 3-15
Measures of Variation:
Comparing Standard Deviations
Chap 3-16
Locating Extreme Outliers:
Z-Score
\[ Z = \frac{X - \bar{X}}{S} \]
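The Z-score formula can be sketched as follows, assuming S is the sample standard deviation (n - 1 divisor). The function name `z_scores` is hypothetical, and the data are the second data set from the mean example, used here only for illustration.

```python
import math

def z_scores(values):
    """Z = (X - mean) / S for every value, with S the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / s for x in values]

data = [11, 12, 13, 14, 20]   # mean 14, largest value 20
z = z_scores(data)
for x, score in zip(data, z):
    print(x, round(score, 3))
```

A Z-score states how many standard deviations a value lies above (positive) or below (negative) the mean, so the scores always sum to zero.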
Chap 3-18
General Descriptive Stats Using
Microsoft Excel Functions
Chap 3-19
General Descriptive Stats Using
Microsoft Excel Data Analysis Tool
1. Select Data.
2. Select Data Analysis.
3. Select Descriptive Statistics and click OK.
Chap 3-20
General Descriptive Stats Using
Microsoft Excel
Chap 3-21
Excel output
Microsoft Excel descriptive statistics output, using the house price data:

House Prices:
$2,000,000
500,000
300,000
100,000
100,000

House Prices
Mean                  600000
Standard Error        357770.8764
Median                300000
Mode                  100000
Standard Deviation    800000
Sample Variance       640,000,000,000
Kurtosis              4.1301
Skewness              2.0068
Range                 1900000
Minimum               100000
Maximum               2000000
Sum                   3000000
Count                 5
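Most of the Excel figures above can be cross-checked with Python's standard `statistics` module. This is a sketch, not a replacement for the Excel tool: kurtosis and skewness are omitted because the standard library has no function for them, and the standard error is computed here as S divided by the square root of n.

```python
import math
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]  # house price data

n = len(prices)
mean = statistics.mean(prices)
median = statistics.median(prices)
mode = statistics.mode(prices)
std_dev = statistics.stdev(prices)        # sample standard deviation (n - 1)
variance = statistics.variance(prices)    # sample variance (n - 1)
std_error = std_dev / math.sqrt(n)        # standard error of the mean
price_range = max(prices) - min(prices)

print("Mean", mean)
print("Standard Error", round(std_error, 4))
print("Median", median)
print("Mode", mode)
print("Standard Deviation", std_dev)
print("Sample Variance", variance)
print("Range", price_range)
print("Minimum", min(prices), "Maximum", max(prices))
print("Sum", sum(prices), "Count", n)
```

The single $2,000,000 house pulls the mean (600,000) well above the median (300,000), which matches the positive skewness in the Excel output.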
Chap 3-22
Numerical Descriptive
Measures for a Population
Chap 3-23
Numerical Descriptive Measures
for a Population: The mean µ
The population mean is the sum of the values in
the population divided by the population size, N
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} \]
Where μ = population mean
N = population size
X_i = ith value of the variable X
Chap 3-24
Numerical Descriptive Measures
For A Population: The Variance σ²
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]
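The only computational difference from the sample formulas is the divisor: N for a population, n - 1 for a sample. The sketch below makes that visible. The function names are illustrative, and the five values are treated as a complete population purely for the sake of the example.

```python
import math

def population_variance(values):
    """Sum of squared deviations from mu, divided by the population size N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n

def population_std(values):
    """Population standard deviation: square root of the population variance."""
    return math.sqrt(population_variance(values))

population = [11, 12, 13, 14, 15]   # assumed to be the whole population
sigma2 = population_variance(population)   # 10 / 5
sigma = population_std(population)
print(sigma2, sigma)
```

With the same five values treated as a sample, the variance would be 10 / 4 = 2.5 instead of 10 / 5 = 2.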
Chap 3-26
Sample statistics versus
population parameters
Chap 3-27
Quartiles
Quartiles split the ranked data into 4 segments with
an equal number of values per segment
Q1 Q2 Q3
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% of the observations
are smaller and 50% are larger)
Only 25% of the observations are greater than the third
quartile
Chap 3-28
Quartile Measures:
Locating Quartiles
Chap 3-29
Quartile Measures:
Calculation Rules
Chap 3-30
Quartile Measures
Calculating The Quartiles: Example
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = (12+13)/2 = 12.5
Measures like Q1, Q3, and IQR that are not influenced
by outliers are called resistant measures
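The quartile example can be reproduced with `statistics.quantiles` from the Python standard library (Python 3.8 or later). Its `method="exclusive"` option places quartile i at position i(n + 1)/4 and interpolates linearly between neighbouring values. That it matches the slide's calculation rule in every case is an assumption; for this data set, where the positions are whole numbers or end in .5, the two agree.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # ordered array from the slide, n = 9
n = len(data)

# 1-based positions of Q1, Q2, Q3 in the ranked data
positions = [i * (n + 1) / 4 for i in (1, 2, 3)]

# "exclusive" uses the (n + 1)-based positions computed above
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

print(positions)
print(q1, q2, q3)
```

Q1 sits at position 2.5, halfway between the second and third ranked values (12 and 13), giving 12.5 as on the slide.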
Chap 3-32
Calculating The Interquartile
Range
Example:
X minimum    Q1    Median (Q2)    Q3    X maximum
   12        30        45         57       70
Interquartile range = 57 - 30 = 27
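As a small sketch, the interquartile range can be read straight off a five-number summary. The dictionary layout and key names are illustrative choices; the numbers are the ones in the example above.

```python
# Five values from the example: minimum, Q1, median (Q2), Q3, maximum
summary = {"min": 12, "Q1": 30, "median": 45, "Q3": 57, "max": 70}

iqr = summary["Q3"] - summary["Q1"]           # spread of the middle 50% of the data
full_range = summary["max"] - summary["min"]  # spread of all the data

print("Interquartile range:", iqr)
print("Range:", full_range)
```

The interquartile range ignores everything below Q1 and above Q3, so a more extreme minimum or maximum would change the range but leave the interquartile range at 27.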
Chap 3-33
The Five-Number Summary
Chap 3-34
Five Number Summary and
The Boxplot
Chap 3-35
Five Number Summary:
Shape of Boxplots
If data are symmetric around the median then the box
and central line are centered between the endpoints
Chap 3-36
Distribution Shape and
The Boxplot
[Figure: three boxplots, each marked with Q1, Q2, and Q3]
Chap 3-37
Measures Of The Relationship Between
Two Numerical Variables
The Covariance
The Coefficient of Correlation
Chap 3-38
The Covariance
The covariance measures the strength of the linear
relationship between two numerical variables (X & Y)
\[ \mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1} \]
Only concerned with the strength of the relationship
No causal effect is implied
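A minimal Python sketch of the sample covariance, written from the formula above. The function name and both data sets are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of observations")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]          # hypothetical data
y_up = [2, 4, 5, 4, 5]       # tends to rise with x
y_down = [5, 4, 5, 4, 2]     # tends to fall as x rises

print(sample_covariance(x, y_up))     # positive
print(sample_covariance(x, y_down))   # negative
```

The sign shows the direction in which the two variables move together. The magnitude depends on the units of X and Y, so it is hard to interpret on its own.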
Chap 3-39
Interpreting Covariance
Covariance between two variables:
cov(X,Y) > 0: X and Y tend to move in the same direction
Chap 3-40
Coefficient of Correlation
Measures the relative strength of the linear
relationship between two numerical variables
Sample coefficient of correlation:
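\[ r = \frac{\mathrm{cov}(X, Y)}{S_X S_Y} \]
where
\[ \mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1} \]
\[ S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}} \]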
Chap 3-41
Features of the
Coefficient of Correlation
The population coefficient of correlation is referred to as ρ.
The sample coefficient of correlation is referred to as r.
Both ρ and r have the following features:
Unit free
Ranges between –1 and 1
The closer to –1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
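The features listed above can be checked numerically with a short sketch that computes r as cov(X, Y) divided by the product of the two sample standard deviations. The function name and the data are hypothetical.

```python
import math

def sample_correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y), all computed with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

x = [1, 2, 3, 4, 5]                      # hypothetical data
y = [2, 4, 5, 4, 5]

r = sample_correlation(x, y)
r_rescaled = sample_correlation([100 * v for v in x], y)   # change the units of X
r_perfect_pos = sample_correlation(x, [2 * v + 1 for v in x])
r_perfect_neg = sample_correlation(x, [-v for v in x])

print(round(r, 4), round(r_rescaled, 4))
print(round(r_perfect_pos, 4), round(r_perfect_neg, 4))
```

Rescaling X leaves r unchanged (unit free), and data lying exactly on a rising or falling straight line give r = 1 or r = -1.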
Chap 3-42
Scatter Plots of Sample Data with
Various Coefficients of Correlation
[Figure: five scatter plots of Y against X, with r = -1, r = -.6, r = +1, r = +.3, and r = 0]
Chap 3-43
The Coefficient of Correlation Using
Microsoft Excel Function
Chap 3-44
The Coefficient of Correlation Using
Microsoft Excel Data Analysis Tool
1. Select Data
2. Choose Data Analysis
3. Choose Correlation &
Click OK
Chap 3-45
The Coefficient of Correlation
Using Microsoft Excel
Chap 3-46
Excel
Chap 1-47