Topic 3: Descriptive Statistics
Linh Nghiem
MATH1905
Overview
Favorite Sport   Number of students (frequency)   Percent frequency (%)
Football         8                                25
Basketball       8                                25
Baseball         3                                9.375
Tennis           1                                3.125
Soccer           5                                15.625
Others           7                                21.875
Total            32                               100
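The percent column is each count divided by the total, times 100. A minimal sketch of that arithmetic, using the counts from the table (the names `counts` and `percent` are illustrative only):

```python
# Counts taken from the Favorite Sport table above.
counts = {
    "Football": 8,
    "Basketball": 8,
    "Baseball": 3,
    "Tennis": 1,
    "Soccer": 5,
    "Others": 7,
}

total = sum(counts.values())  # total number of students

# Percent frequency = 100 * frequency / total
percent = {sport: 100 * freq / total for sport, freq in counts.items()}

for sport, pct in percent.items():
    print(f"{sport:<12}{counts[sport]:>3}{pct:>9.3f}")
print(f"{'Total':<12}{total:>3}{sum(percent.values()):>9.3f}")
```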
Pie chart
[Figure: pie chart of Favorite Sport. Football 25%, Basketball 25%, Baseball 9.375%, Tennis 3.125%, Soccer 15.625%, Others 21.875%]
Bar chart
[Figure: bar chart of Favorite Sport. x-axis: Football, Basketball, Baseball, Tennis, Soccer, Others; y-axis: Number of students (frequency)]
Grouped bar chart
[Figure: grouped bar chart of Favorite Sport by Gender. x-axis: Football, Basketball, Baseball, Tennis, Soccer, Others; y-axis: Number of students; groups: Women, Men]
Summarizing Quantitative Data
Main descriptions
• Location:
- Mean, median and mode
- Relative standing: quartiles, percentiles
• Variability:
- Standard deviation
- Range and interquartile range
• Shape:
- Symmetry and skewness
- Uni-modal and multi-modal
Mean
• Sample mean:
\[ \bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i \]
• Example with n = 25 observations:
\[ \bar{x} = \frac{40 + 20 + 40 + \dots + 42 + 40}{25} = 34.24 \]
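As a quick illustration of the sample mean formula, here is a minimal sketch. The slide's 25 observations are not listed in full, so the list `values` below is hypothetical data chosen only to show the calculation.

```python
import statistics

# Hypothetical data (NOT the 25 marks from the slide, which are elided there).
values = [3, 5, 8, 4]

n = len(values)
x_bar = sum(values) / n  # (1/n) * sum of x_i

print(x_bar)
# The standard library gives the same result.
print(statistics.mean(values))
```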
Median
Sorted marks
20 40 42 45 46 48
50 55 62 64 76 80
We have n = 12 observations, so the two middle observations are the 6th and the 7th in the sorted data:
\[ \text{Median} = \frac{48 + 50}{2} = 49 \]
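The middle-value rule can be sketched in a few lines of Python. The marks are the twelve sorted marks from this slide; the helper name `median` is illustrative only.

```python
import statistics

# Sorted marks from the slide.
marks = [20, 40, 42, 45, 46, 48, 50, 55, 62, 64, 76, 80]

def median(data):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: average the (n/2)-th and (n/2 + 1)-th observations (1-based).
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(marks))             # average of the 6th and 7th marks
print(statistics.median(marks))  # standard-library equivalent
```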
Mode
Percentiles
[Figure: the 20th percentile marked on the sorted data, with 20% of the observations below it and 80% above it]
• First, second, third quartiles: p = .25, .50, .75 respectively.
- Median = 50th percentile = second quartile.
- Denoted as Q1, Q2, and Q3 respectively.
[Figure: Q1 = first quartile (25th percentile), Q2 = second quartile (50th percentile, median), Q3 = third quartile (75th percentile)]
Calculating percentiles
• If we have n observations, the location of the pth percentile is given by
\[ L_p = (n + 1)\frac{p}{100} \]
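A minimal sketch of this location rule, interpolating linearly between the two neighbouring observations when the location is not a whole number (the same interpolation the quartile calculation in these slides uses). The function name `percentile` and the clamping at the ends of the data are assumptions made for illustration. The data are the sorted marks from the Median slide.

```python
import math

def percentile(sorted_data, p):
    """p-th percentile using the location L_p = (n + 1) * p / 100."""
    n = len(sorted_data)
    location = (n + 1) * p / 100     # 1-based position in the sorted data
    lower = math.floor(location)     # observation just below the location
    fraction = location - lower      # how far to move toward the next one
    # Assumption: clamp to the smallest/largest value if the location
    # falls outside 1..n (the slides do not cover this case).
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)

marks = [20, 40, 42, 45, 46, 48, 50, 55, 62, 64, 76, 80]
print(percentile(marks, 50))  # location 6.5, halfway between 48 and 50
```

With n = 12 and p = 50 the location is 6.5, so the result is halfway between the 6th and 7th marks, which agrees with the median computed by hand.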
Sample variance
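\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right) \]
\[ \bar{x} = \frac{40 + 20 + 80 + \dots + 64}{12} = 52.75 \]
\[ s^2 = \frac{1}{12-1}\left\{(40 - 52.75)^2 + (20 - 52.75)^2 + \dots + (64 - 52.75)^2\right\} = 277.3561 \]
\[ s = \sqrt{277.3561} = 16.65 \]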
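The two forms of the variance formula (sum of squared deviations, and the shortcut with the sum of squares) can be checked against each other numerically. This is a sketch on hypothetical data, since the slide's twelve values are not listed in full; the names `data`, `s2_definition` and `s2_shortcut` are illustrative.

```python
import math
import statistics

# Hypothetical data (NOT the marks from the slide, which are elided there).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Shortcut: (sum of x_i^2 - n * x_bar^2) / (n - 1).
s2_shortcut = (sum(x ** 2 for x in data) - n * x_bar ** 2) / (n - 1)

s = math.sqrt(s2_definition)  # sample standard deviation

print(s2_definition, s2_shortcut, s)
print(statistics.variance(data))  # standard-library sample variance
```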
Range and interquartile range
• Range: difference between maximum and minimum
• Interquartile range (IQR): difference between third and first quartile.
Sorted marks
20 40 42 45 46 48
50 55 64 65 76 80
Range = 80 - 20 = 60
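\[ L_{75} = (12 + 1) \times \frac{75}{100} = 9.75, \quad Q_3 = 64 + (65 - 64) \times 0.75 = 64.75 \]
\[ L_{25} = (12 + 1) \times \frac{25}{100} = 3.25, \quad Q_1 = 42 + (45 - 42) \times 0.25 = 42.75 \]
IQR = Q3 - Q1 = 64.75 - 42.75 = 22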
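The range and quartiles for these marks can be reproduced with Python's standard library. As far as I can tell, `statistics.quantiles` with its default `method="exclusive"` places the quartiles at positions based on n + 1, which matches the location rule used here; treat that correspondence as an assumption to verify against the library documentation.

```python
import statistics

# Sorted marks from the slide.
marks = [20, 40, 42, 45, 46, 48, 50, 55, 64, 65, 76, 80]

data_range = max(marks) - min(marks)

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(marks, n=4, method="exclusive")
iqr = q3 - q1

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3)
print("IQR:", iqr)
```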
[Figure: three histograms (y-axis: Frequency). Left, symmetric: Mean = Median ≈ 0. Middle, right-skewed: Mean = 0.95 > median = 0.68. Right, left-skewed: Mean = 12.8 < median = 13.2.]
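The relationship between mean and median shown in the histograms can be illustrated on small samples. The three lists below are hypothetical data invented for this sketch, not the data behind the histograms.

```python
import statistics

# Hypothetical samples chosen to show each shape.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 4, 20]       # long tail of large values
left_skewed = [1, 17, 18, 18, 19, 19, 20]   # long tail of small values

for name, sample in [
    ("symmetric", symmetric),
    ("right-skewed", right_skewed),
    ("left-skewed", left_skewed),
]:
    mean = statistics.mean(sample)
    median = statistics.median(sample)
    print(f"{name}: mean = {mean}, median = {median}")
```

In these samples the extreme values pull the mean toward the tail, while the median stays near the bulk of the data.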
Unimodal, bimodal, and multimodal
Boxplot
[Figure: boxplot of hwy]
• E.g.: for two stocks A and B, we want to see how their returns move
with each other.
- A positive covariance implies that when the return on A increases
(decreases), the return on B tends to increase (decrease) as well.
- A negative covariance implies that when the return on A increases
(decreases), the return on B tends to decrease (increase).
Covariance and correlation
\[ \mathrm{Cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}\right) \]
\[ r_{XY} = \frac{\mathrm{Cov}(X, Y)}{s_x s_y} \]
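A minimal sketch of these two formulas in plain Python. The paired values `x` and `y` are hypothetical and are not the stock returns from the example that follows; the function names are illustrative.

```python
import math

def sample_cov(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Correlation: covariance divided by the product of standard deviations."""
    s_x = math.sqrt(sample_cov(x, x))  # Cov(X, X) is the sample variance of X
    s_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
# Shortcut form: (sum of x_i * y_i - n * x_bar * y_bar) / (n - 1).
cov_shortcut = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (n - 1)

print(sample_cov(x, y), cov_shortcut)
print(sample_corr(x, y))
```

Dividing by the two standard deviations removes the units, so the correlation always lies between -1 and 1.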
Example: Rates of return (%) for two stocks X and Y
Scatterplots
Covariance and correlation
[Figure: two scatterplots, each labeled r = 0.96]
Correlation does not imply causation
Summary