Class 1 - 20th August 2024 - Descriptive Statistics
MEDIAN
Examples:
1. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22}, N = 9 (odd). Sort the values from smallest to largest and take the middle (5th) value: 0 0 5 7 8 9 12 14 22; median = 8.
2. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33}, N = 10 (even). Sort the values from smallest to largest; the median is the simple average of the two middle values, 8 and 9: 0 0 5 7 8 9 12 14 22 33; median = (8 + 9) ÷ 2 = 8.5.
MEAN
MODE
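The three measures of central location can be checked with Python's standard-library `statistics` module. Below is a minimal sketch using the two data sets from the median examples. The helper name `central_location` is hypothetical and is used only for illustration.

```python
import statistics

def central_location(data):
    """Return (mean, median, mode) of a list of numbers."""
    return (
        statistics.mean(data),    # arithmetic mean: sum of values / number of values
        statistics.median(data),  # middle value, or average of the two middle values
        statistics.mode(data),    # most frequent value (first one found if tied)
    )

odd = [0, 7, 12, 5, 14, 8, 0, 9, 22]       # N = 9
even = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]  # N = 10

print(central_location(odd))
print(central_location(even))
```

`statistics.median` sorts the data itself, so the values do not need to be arranged first.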
Geometric Mean
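Used to average a growth rate or rate of change that varies from period to period.
Ex: the return on an investment over several periods.
R_i denotes the rate of return in period i, and R_g denotes the geometric mean rate of return over n periods:
$$(1 + R_g)^n = (1 + R_1)(1 + R_2)(1 + R_3)\cdots(1 + R_n)$$
Solving for R_g:
$$R_g = \sqrt[n]{(1 + R_1)(1 + R_2)\cdots(1 + R_n)} - 1$$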
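A minimal sketch of the geometric mean rate of return. It assumes per-period returns are given as decimals (0.10 for 10%). The two-period investment below is hypothetical: it doubles and is then halved, so it ends where it started. The geometric mean correctly reports 0, while the arithmetic mean suggests a 25% average return.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean rate of return R_g for per-period returns R_1..R_n.

    Solves (1 + R_g)**n = (1 + R_1)(1 + R_2)...(1 + R_n) for R_g.
    """
    if not returns:
        raise ValueError("need at least one period")
    n = len(returns)
    growth = math.prod(1 + r for r in returns)  # total growth factor over n periods
    return growth ** (1 / n) - 1

# Hypothetical investment: +100% in period 1, -50% in period 2.
returns = [1.00, -0.50]
print(sum(returns) / len(returns))     # arithmetic mean: 0.25
print(geometric_mean_return(returns))  # geometric mean: 0.0
```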
Measures of Variability
Measures of variability (MoV) are needed because measures of central location (MoCL) do not give the entire picture: they do not show how spread out the data are.
For example, two classes can have the same mean but different variability.
Range
Range = Max − Min
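To illustrate "same mean, different variability", here is a small sketch with two hypothetical classes. The marks are made up, and the helper `data_range` is a hypothetical name chosen so that it does not shadow Python's built-in `range`.

```python
import statistics

def data_range(data):
    """Range = largest observation minus smallest observation."""
    return max(data) - min(data)

# Hypothetical marks for two classes with the same mean (60) but different spread.
class_a = [58, 59, 60, 61, 62]
class_b = [30, 45, 60, 75, 90]

for name, marks in [("A", class_a), ("B", class_b)]:
    print(name, "mean =", statistics.mean(marks), "range =", data_range(marks))
```

Both classes have a mean of 60, but the ranges are 4 and 60.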
Variance
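Population variance is denoted by σ²; sample variance is denoted by s².
Variance of a population:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Variance of a sample:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$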
Standard Deviation
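A minimal sketch of the population and sample variance formulas, which divide by N and n − 1 respectively. The standard deviation is taken as the square root of each. The data values are hypothetical. The results are cross-checked against the standard-library `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

def population_variance(data):
    """sigma^2 = sum((x_i - mu)^2) / N"""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)"""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

sigma2 = population_variance(data)
s2 = sample_variance(data)
print(sigma2, math.sqrt(sigma2))  # population variance and standard deviation: 4.0 2.0
print(s2, math.sqrt(s2))          # sample variance and standard deviation (slightly larger)

# Cross-check against the standard library.
assert math.isclose(sigma2, statistics.pvariance(data))
assert math.isclose(s2, statistics.variance(data))
```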
How the standard deviation is interpreted depends on the general shape of the distribution of the data set. If the histogram is bell shaped, the Empirical Rule can be used.
Empirical Rule
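Empirical Rule states that, for a bell-shaped histogram:
o approximately 68% of all observations fall within one standard deviation of the mean;
o approximately 95% fall within two standard deviations of the mean;
o approximately 99.7% fall within three standard deviations of the mean.
* Flagging observations that lie more than three standard deviations from the mean is a common way to identify outliers; cutoffs of 2 or 2.5 standard deviations can be used as well.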
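A minimal sketch of the "k standard deviations from the mean" outlier check. Two assumptions apply: the sample standard deviation is used, and the data are hypothetical (nine similar values and one extreme value). The helper name `flag_outliers` is also hypothetical.

```python
import statistics

def flag_outliers(data, k=3.0):
    """Return observations more than k standard deviations from the mean.

    Assumes the sample standard deviation s; k = 3 is the usual cutoff,
    with 2 or 2.5 as common alternatives.
    """
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs(x - x_bar) > k * s]

# Hypothetical data: nine similar values and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]
print(flag_outliers(data, k=3.0))  # []
print(flag_outliers(data, k=2.5))  # [50]
```

In this small example the extreme value inflates s itself (s ≈ 12.26). As a result, 50 is only 34.8 above the mean, which is less than 3s, and it is not flagged at k = 3. It is flagged at k = 2.5.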
Chebysheff's Theorem
Applies to all histograms, not just bell-shaped ones like the Empirical Rule.
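The proportion of observations that lie within k standard deviations of the mean is at least
$$1 - \frac{1}{k^2} \quad \text{for } k > 1$$
For k = 2, at least 1 − 1/2² = 3/4 of all observations lie within 2 standard deviations of the mean. This is a lower bound, compared with the Empirical Rule's approximation of 95% for bell-shaped histograms.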
Example: There are 5,000 numbers whose standard deviation is 18. At least how many of the numbers will be …
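A minimal sketch of Chebysheff's lower bound. It uses exact fractions so that the minimum count of observations is not thrown off by floating-point rounding. The helper names are hypothetical. n = 5,000 comes from the example; the values of k are illustrative.

```python
import math
from fractions import Fraction

def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("the bound 1 - 1/k^2 requires k > 1")
    k = Fraction(k)  # exact arithmetic, e.g. Fraction(2.5) == Fraction(5, 2)
    return 1 - 1 / k**2

def min_count_within(n, k):
    """At least this many of n observations lie within k standard deviations of the mean."""
    return math.ceil(n * chebysheff_lower_bound(k))

print(chebysheff_lower_bound(2))  # 3/4
print(min_count_within(5000, 2))  # 3750
print(min_count_within(5000, 3))  # 4445
```

The count is rounded up because a proportion of at least 8/9 of 5,000 (4,444.4...) means at least 4,445 whole observations.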
Coefficient of Variation
The coefficient of variation is the standard deviation divided by the mean of the observations.
o Population: CV = σ/μ
o Sample: cv = s/x̄
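Because the coefficient of variation has no units, it can be used to compare variability across data sets with different means or units. Below is a minimal sketch with hypothetical data. The helper name and its `sample` flag are hypothetical, and the mean is assumed to be nonzero.

```python
import statistics

def coefficient_of_variation(data, sample=True):
    """cv = s / x_bar for a sample, CV = sigma / mu for a population (mean must be nonzero)."""
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    return sd / statistics.mean(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(coefficient_of_variation(data, sample=False))  # 0.4
print(coefficient_of_variation(data))                # sample version, slightly larger
```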
Example:
Data: 55, 49, 43, 45, 34, 23, 38, 30; arranged in order: 23, 30, 34, 38, 43, 45, 49, 55 (n = 8).
25th percentile: 23, 30 | 34, 38, 43, 45, 49, 55: 2 below, 6 above (between 30 and 34).
50th percentile: 23, 30, 34, 38 | 43, 45, 49, 55: 4 below, 4 above (between 38 and 43).
75th percentile: 23, 30, 34, 38, 43, 45 | 49, 55: 6 below, 2 above (between 45 and 49).
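The example locates each percentile by counting observations below and above it. Software must also choose how to interpolate between the two neighbouring values, and conventions differ, so results can vary slightly. This sketch uses the default method of the standard-library `statistics.quantiles`. That method ("exclusive") places the P-th percentile at position (n + 1)P/100. Each result falls in the gap identified above.

```python
import statistics

data = [55, 49, 43, 45, 34, 23, 38, 30]

# quantiles() sorts the data itself; n=4 returns the 25th, 50th and 75th percentiles.
# The default method="exclusive" locates the P-th percentile at position (n + 1) * P / 100.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # 31.0 40.5 48.0
```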
5-Number Summary
o Minimum
o First (lower) quartile, Q1
o Second quartile (median), Q2
o Third (upper) quartile, Q3
o Maximum
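A minimal sketch of the 5-number summary. It assumes the quartiles are computed with `statistics.quantiles` using its default method; other quartile conventions can give slightly different Q1 and Q3. The function name is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method
    return min(data), q1, q2, q3, max(data)

print(five_number_summary([55, 49, 43, 45, 34, 23, 38, 30]))  # (23, 31.0, 40.5, 48.0, 55)
```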
Interquartile Range
Measures the spread of the middle 50% of the observations.
IQR = Q3 − Q1
A large IQR means Q1 and Q3 are far apart, indicating high variability.
Unlike the range, the IQR is largely unaffected by outliers.
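The contrast between the IQR and the range can be seen by introducing one extreme value. In this sketch, the largest observation of the percentile example is hypothetically mistyped as 550. Quartiles again come from `statistics.quantiles` with its default method.

```python
import statistics

def iqr(data):
    """Interquartile range Q3 - Q1 (quartiles via statistics.quantiles, default method)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

original = [55, 49, 43, 45, 34, 23, 38, 30]
with_outlier = [550, 49, 43, 45, 34, 23, 38, 30]  # hypothetical: 55 mistyped as 550

for data in (original, with_outlier):
    print("range =", max(data) - min(data), " IQR =", iqr(data))
```

The range jumps from 32 to 527, while the IQR stays at 17.0.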
Box Plots
[Figure: box plot labelled with whiskers and quartiles Q1, Q2, Q3]
Stroop Interference
Interference arising from a conflict between what is asked and what is written.
Measures of Linear Relationship
Three numerical measures describe the strength and direction of the linear relationship between two variables:
o Covariance
o Coefficient of correlation
o Coefficient of determination (not discussed)
Covariance
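o Variables move in the same direction (both increase or both decrease): large positive number.
o Variables move in opposite directions: large negative number.
o No particular pattern: small number (close to 0).
* It is often hard to judge whether a covariance is large, because its size depends on the units of the variables. The coefficient of correlation helps with this.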
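The notes do not show a formula for covariance. This sketch assumes the usual sample definition s_xy = Σ(x_i − x̄)(y_i − ȳ)/(n − 1), which is also what `statistics.covariance` (Python 3.10+) computes. The data are hypothetical. The last line shows why the size of a covariance is hard to judge: rescaling y by 100 multiplies the covariance by 100 even though the relationship is unchanged.

```python
def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)"""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
print(sample_covariance(x, [2, 4, 6, 8, 10]))            # same direction: 5.0
print(sample_covariance(x, [10, 8, 6, 4, 2]))            # opposite directions: -5.0
print(sample_covariance(x, [200, 400, 600, 800, 1000]))  # same pattern, y rescaled by 100: 500.0
```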
Coefficient of correlation
Fixed range from −1 to +1.
Close to +1: strong positive linear relationship.
Close to −1: strong negative linear relationship.
Close to 0: no straight-line relationship.
Summary of Symbols
                            Population   Sample
Size                        N            n
Mean                        μ            x̄
Variance                    σ²           s²
Standard Deviation          σ            s
Coefficient of Variation    CV           cv
Covariance                  σxy          sxy
Coefficient of Correlation  ρ            r
The standard deviation is the square root of the variance.
The coefficient of variation is the standard deviation divided by the mean.