3 Descriptive Statistics - Numerical
Slide 1
Measures of Location
Mean
Median
Mode
Percentiles
Quartiles
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
Slide 2
Mean
Slide 3
Sample Mean $\bar{x}$
$$\bar{x} = \frac{\sum x_i}{n}$$
where $n$ is the number of observations in the sample.
Slide 4
Population Mean $\mu$
$$\mu = \frac{\sum x_i}{N}$$
where $N$ is the number of observations in the population.
Slide 5
Sample Mean
Slide 6
Sample Mean
$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$
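The sample mean is a sum divided by a count. A minimal Python sketch, assuming a small hypothetical sample (the 70 observations behind the slide's total of 34,356 are not reproduced in this deck), followed by a check of the slide's arithmetic:

```python
from statistics import mean

# Hypothetical sample; these values are NOT from the slides.
sample = [425, 440, 450, 465, 480]

# Sample mean: sum of the observations divided by their count n.
x_bar = sum(sample) / len(sample)

# The standard library's mean() gives the same result.
assert x_bar == mean(sample)

# Re-checking the slide's arithmetic: 34,356 / 70.
slide_mean = round(34356 / 70, 2)

print(x_bar)       # mean of the hypothetical sample
print(slide_mean)  # mean reported on the slide
```

The population mean uses the same computation; only the symbols ($\mu$ and $N$) and the interpretation change.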
Slide 7
Median
Slide 8
Median
26 18 27 12 14 27 19 (7 observations)
12 14 18 19 26 27 27 (in ascending order)
Median = 19
Slide 9
Median
26 18 27 12 14 27 30 19 (8 observations)
12 14 18 19 26 27 27 30 (in ascending order)
Median = (19 + 26)/2 = 22.5
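The two cases above (odd and even number of observations) can be checked with Python's standard library. This is only a sketch using the slide data; `statistics.median` sorts the values itself.

```python
from statistics import median

odd = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations

# Odd count: the middle value of the sorted data.
print(median(odd))   # 19

# Even count: the average of the two middle values, 19 and 26.
print(median(even))  # 22.5
```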
Slide 10
Median
Slide 11
Trimmed Mean
Slide 12
Mode
Slide 13
Mode
Slide 14
Percentiles
Slide 15
Percentiles
Arrange the data in ascending order.
i = (p/100)n
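A sketch of how the index $i = (p/100)n$ might be turned into a percentile value. The rule for using $i$ is an assumption here, since the slide that would state it is missing: if $i$ is not a whole number, round it up and take the value in that position; if it is a whole number, average the values in positions $i$ and $i + 1$. The function name `percentile` is illustrative, and the data are the eight observations from the median example.

```python
import math

def percentile(data, p):
    """p-th percentile (integer p, 0 < p < 100) from the index i = (p/100) * n.

    Assumed convention (not shown on the slide): a non-integer i is
    rounded up; an integer i means averaging positions i and i + 1.
    """
    values = sorted(data)          # step 1: arrange in ascending order
    n = len(values)
    if (p * n) % 100 != 0:         # i = (p/100) * n is not a whole number
        i = math.ceil(p * n / 100)
        return values[i - 1]       # positions are 1-based, lists are 0-based
    i = p * n // 100               # i is a whole number
    return (values[i - 1] + values[i]) / 2

data = [26, 18, 27, 12, 14, 27, 30, 19]

print(percentile(data, 80))  # i = 6.4, rounded up to position 7
print(percentile(data, 50))  # i = 4, average of positions 4 and 5
```

Other conventions exist (statistical packages often interpolate between positions), so results from a library may differ slightly from this rule.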
Slide 17
80th Percentile
Slide 18
Quartiles
Slide 19
Third Quartile
Slide 20
If the average income in Hong Kong, adjusted for inflation, has been going up over the last 10 years, can we conclude that citizens are, in general, better off economically?
Not necessarily!
If the median income has gone down over the same period, then a typical person is actually worse off, even though the people at the top have been making more money.
A statistician has his head in the oven and his feet in the refrigerator. When he is asked how he feels, he says, “On average, pretty good!”
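The mean-versus-median point can be made concrete with a small sketch. The income figures below are entirely hypothetical (in thousands, seven people per year) and are chosen only to show that the mean can rise while the median falls.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands); NOT real Hong Kong data.
earlier = [20, 22, 25, 28, 30, 35, 100]
later = [18, 20, 22, 24, 30, 40, 200]

# The mean rises because the top income doubles...
print(round(mean(earlier), 2), round(mean(later), 2))

# ...while the median, the "typical" person, falls.
print(median(earlier), median(later))
```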
Slide 21
Measures of Variability
Slide 22
Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Slide 23
Range
Slide 24
Range
Slide 25
Interquartile Range
Slide 26
Interquartile Range
Slide 27
Variance
Slide 28
Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{(for a sample)}$$
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{(for a population)}$$
Slide 29
Standard Deviation
Slide 30
Standard Deviation
$$s = \sqrt{s^2} \quad \text{(for a sample)}$$
$$\sigma = \sqrt{\sigma^2} \quad \text{(for a population)}$$
Slide 31
Coefficient of Variation
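$$\left( \frac{s}{\bar{x}} \times 100 \right)\% \quad \text{(for a sample)}$$
$$\left( \frac{\sigma}{\mu} \times 100 \right)\% \quad \text{(for a population)}$$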
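The variance, standard deviation, and coefficient of variation build on one another, which a short sketch can show. The five data values are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
from statistics import mean, variance, pvariance, stdev

# Hypothetical sample; these values are NOT from the slides.
data = [46, 54, 42, 46, 32]
n = len(data)
x_bar = mean(data)

# Sample variance by the formula: squared deviations divided by n - 1.
s2_manual = sum((x - x_bar) ** 2 for x in data) / (n - 1)

s2 = variance(data)       # library version, divides by n - 1
sigma2 = pvariance(data)  # treats the data as a population, divides by N
s = stdev(data)           # square root of the sample variance
cv = s / x_bar * 100      # coefficient of variation, in percent

print(s2_manual, s2, sigma2)
print(s, round(cv, 2))
```

Dividing by $n - 1$ rather than $N$ is the only difference between the sample and population formulas, which is why `variance` and `pvariance` give different results on the same values.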
Slide 32
Sample Variance, Standard Deviation,
And Coefficient of Variation
Example: Hotel Room Rates
• Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$
Slide 33
Slide 3, Part B
Descriptive Statistics: Numerical Measures
Measures of Distribution Shape, Relative Location,
and Detecting Outliers
Exploratory Data Analysis
Measures of Association Between Two Variables
The Weighted Mean and
Working with Grouped Data
Slide 34
Measures of Distribution Shape,
Relative Location, and Detecting Outliers
Distribution Shape
z-Scores
Chebyshev’s Theorem
Empirical Rule
Detecting Outliers
Slide 35
Distribution Shape: Skewness
Slide 36
Distribution Shape: Skewness
[Figure: distribution plot; vertical axis from 0 to .25 in steps of .05]
Slide 37
Distribution Shape: Skewness
[Figure: distribution plot; vertical axis from 0 to .25 in steps of .05]
Slide 38
Distribution Shape: Skewness
[Figure: distribution plot; vertical axis from 0 to .25 in steps of .05]
Slide 39
Distribution Shape: Skewness
[Figure: distribution plot; vertical axis from 0 to .25 in steps of .05]
Slide 40
Distribution Shape: Skewness
Slide 41
Distribution Shape: Skewness
[Figure: distribution plot; vertical axis from 0 to .25 in steps of .05]
Slide 42
z-Scores
$$z_i = \frac{x_i - \bar{x}}{s}$$
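A z-score expresses each observation as a number of standard deviations above or below the mean. A minimal sketch, assuming a small hypothetical sample whose mean is 44 and whose sample standard deviation is 8:

```python
from statistics import mean, stdev

# Hypothetical sample; these values are NOT from the slides.
data = [46, 54, 42, 46, 32]

x_bar = mean(data)   # sample mean
s = stdev(data)      # sample standard deviation

# z-score of each observation: (x_i - x_bar) / s
z = [(x - x_bar) / s for x in data]

for x, zi in zip(data, z):
    print(x, round(zi, 2))
```

A negative z-score marks a value below the mean, and the z-scores of any data set sum to zero (up to rounding).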
Slide 43
z-Scores
Slide 44
z-Scores
Slide 45
Chebyshev’s Theorem
Slide 46
Chebyshev’s Theorem
Slide 47
Chebyshev’s Theorem
Slide 48
Empirical Rule
Slide 49
Empirical Rule
Slide 50
Empirical Rule
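[Figure: bell-shaped curve over x, with the horizontal axis marked at µ – 3σ, µ – 2σ, µ – 1σ, µ, µ + 1σ, µ + 2σ, µ + 3σ]
68.26% of the values lie within µ ± 1σ
95.44% of the values lie within µ ± 2σ
99.72% of the values lie within µ ± 3σ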
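The empirical-rule percentages come from the normal distribution and can be reproduced numerically. The sketch below uses the identity $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$ for a standard normal variable; the helper name `share_within` is illustrative.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The computed shares round to 68.27%, 95.45%, and 99.73%. The slide's figures are 0.01 percentage points lower, which is most likely the effect of rounding in a printed normal table.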
Slide 51
Detecting Outliers
Slide 52
Detecting Outliers
Slide 53
Exploratory Data Analysis
Slide 54
Five-Number Summary
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value
Slide 55
Five-Number Summary
Slide 56
Box Plot
Slide 57
Box Plot
[Figure: box plot on an axis from 400 to 625 in steps of 25]
Q1 = 445, Q2 = 475, Q3 = 525
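The quartiles shown for the box plot are enough to compute the interquartile range and outlier limits. The sketch below assumes the common convention of placing the limits 1.5 × IQR beyond the quartiles; that multiplier is an assumption, since it does not appear in the surviving slide text.

```python
# Quartiles as shown for the box plot.
q1, q2, q3 = 445, 475, 525

# Interquartile range: the width of the box.
iqr = q3 - q1

# Assumed convention: limits 1.5 * IQR below Q1 and above Q3.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

print(iqr, lower_limit, upper_limit)  # 80 325.0 645.0
```

Under this convention, any observation below the lower limit or above the upper limit would be flagged as a possible outlier.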
Slide 58
Box Plot
Slide 59
Box Plot
Slide 60
Box Plot
[Figure: box plot on an axis from 400 to 625 in steps of 25]
Slide 63
Covariance
Slide 64
Covariance
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{(for samples)}$$
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{(for populations)}$$
Slide 65
Correlation Coefficient
Slide 66
Correlation Coefficient
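$$r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{(for samples)}$$
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{(for populations)}$$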
Slide 67
Correlation Coefficient
Slide 68
Covariance and Correlation Coefficient
Slide 69
Covariance and Correlation Coefficient
            $x_i$    $y_i$    $(x_i - \bar{x})$    $(y_i - \bar{y})$    $(x_i - \bar{x})(y_i - \bar{y})$
            277.6    69        10.65               -1.0                 -10.65
            259.5    71        -7.45                1.0                  -7.45
            269.1    70         2.15                0                     0
            267.0    70         0.05                0                     0
            255.6    71       -11.35                1.0                 -11.35
            272.9    69         5.95               -1.0                  -5.95
Average     267.0    70.0                          Total                -35.40
Std. Dev.   8.2192   .8944
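The table's totals can be carried through to the sample covariance and correlation coefficient. This sketch uses the six (x, y) pairs from the table and divides the covariance by the product of the two sample standard deviations, which is the usual Pearson definition.

```python
from statistics import mean, stdev

# The six (x, y) pairs from the table.
x = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
y = [69, 71, 70, 70, 71, 69]

n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Sample covariance: sum of cross-products of deviations over n - 1.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample correlation: covariance scaled by both standard deviations.
r_xy = s_xy / (stdev(x) * stdev(y))

print(round(s_xy, 2))   # -35.40 / 5
print(round(r_xy, 4))
```

The covariance works out to about -7.08 and the correlation to about -0.96, a strong negative linear relationship. The table shows the mean of x rounded to 267.0; the deviations are computed from the unrounded mean, 266.95.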
Slide 70
Covariance and Correlation Coefficient
Slide 71
The Weighted Mean and
Working with Grouped Data
Weighted Mean
Mean for Grouped Data
Variance for Grouped Data
Standard Deviation for Grouped Data
Slide 72
Weighted Mean
Slide 73
Weighted Mean
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where:
$x_i$ = value of observation $i$
$w_i$ = weight for observation $i$
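A minimal sketch of the weighted mean formula. The values and weights are hypothetical (for instance, a unit cost and the quantity bought at that cost) and serve only to illustrate the computation.

```python
# Hypothetical data: x_i is a cost per unit, w_i the quantity purchased.
values = [3.00, 3.40, 2.80, 2.90, 3.25]
weights = [1200, 500, 2750, 1000, 800]

# Weighted mean: sum of w_i * x_i divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Unweighted mean of the same values, for comparison.
simple_mean = sum(values) / len(values)

print(round(weighted_mean, 2), round(simple_mean, 2))
```

The weighted mean is pulled toward the values with the largest weights, so it can differ noticeably from the simple mean of the same values.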
Slide 74
Grouped Data
Slide 75
Mean for Grouped Data
Sample Data
$$\bar{x} = \frac{\sum f_i M_i}{n}$$
Population Data
$$\mu = \frac{\sum f_i M_i}{N}$$
where:
$f_i$ = frequency of class $i$
$M_i$ = midpoint of class $i$
Slide 76
Sample Mean for Grouped Data
Slide 77
Sample Mean for Grouped Data
$$\bar{x} = \frac{34{,}525}{70} = 493.21$$
This approximation differs by $2.41 from the actual sample mean of $490.80.
Slide 78
Variance for Grouped Data
$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$
$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$
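The grouped-data formulas treat every observation in a class as if it sat at the class midpoint. A sketch with a hypothetical three-class frequency table (the class table used in the slides is not reproduced in this text):

```python
import math

# Hypothetical frequency table: class midpoints M_i and frequencies f_i.
midpoints = [10, 20, 30]
frequencies = [2, 5, 3]

n = sum(frequencies)  # sample size is the sum of the class frequencies

# Grouped sample mean: sum of f_i * M_i divided by n.
x_bar = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Grouped sample variance: sum of f_i * (M_i - x_bar)^2 divided by n - 1.
s2 = sum(f * (m - x_bar) ** 2 for f, m in zip(frequencies, midpoints)) / (n - 1)

# Grouped sample standard deviation.
s = math.sqrt(s2)

print(x_bar, round(s2, 2), round(s, 2))
```

Because midpoints stand in for the actual values, these results only approximate the mean and variance that would be computed from the raw data.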
Slide 79
Sample Variance for Grouped Data
Slide 80
Sample Variance for Grouped Data
Slide 81
Check Your Understanding
What can you conclude from the following data set?
Variable A Variable B
1 12
6 18
23 25
28 43
55 52
56 73
64 75
66 94
A. The correlation coefficient equals -0.86, so the two variables have a strong negative linear relationship.
B. The correlation coefficient equals 0.05, so the two variables have little linear relationship.
C. The correlation coefficient equals 0.95, so the two variables have a strong positive linear relationship.
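To check an answer, the sample correlation coefficient for the eight pairs can be computed directly. This is a sketch that applies the covariance and standard deviation formulas from the earlier slides; the variable names are illustrative.

```python
from statistics import mean, stdev

a = [1, 6, 23, 28, 55, 56, 64, 66]     # Variable A
b = [12, 18, 25, 43, 52, 73, 75, 94]   # Variable B

n = len(a)
a_bar, b_bar = mean(a), mean(b)

# Sample covariance of A and B.
s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

# Sample correlation coefficient.
r = s_ab / (stdev(a) * stdev(b))

print(round(r, 2))  # 0.95
```

Both variables increase together across the table, so a coefficient close to +1 is what the data suggest even before computing it.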
Slide 82