Class 1 - 20th August 2024 - Descriptive Statistic

The document provides an overview of descriptive statistics, covering measures of central location, variability, relative standing, and linear relationships. Key concepts include median, mean, standard deviation, variance, and the empirical rule, along with methods for identifying outliers and analyzing data distribution. It also includes practical examples and definitions to aid understanding of these statistical measures.

Uploaded by

Ghina Shaikh

Table of Contents

Unit 1 – Descriptive Statistics
  Measure of Central Location
    Median
      Examples
    Mean, Median and Mode
    Geometric Mean
  Measure of Variability
    Range
    Variance
    Standard Deviation
    Empirical Rule
    3-Sigma Approach to Find Outliers
    Chebysheff's Theorem
    Co-efficient of Variation
    Analysis of the Shape
  Measures of Relative Standing
    5-Number Summary
    Interquartile Range
    Box Plots
    Stroop Interference
  Measures of Linear Relationship
    Covariance
    Coefficient of correlation
  Summary of Symbols

Unit 1 – Descriptive Statistics
Measure                            Tool
Measures of Central Location       Mean, Median, Mode
Measures of Variability            Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Relative Standing      Percentiles, Quartiles
Measures of Linear Relationship    Covariance, Correlation, Determination, Least Squares Line

Measure of Central Location


Median
To find the median, place all the observations in order; the observation that falls
in the middle is the median.

Examples:
1. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22}, N = 9 (odd). Sort in ascending order and take the
middle (5th) value: 0 0 5 7 8 9 12 14 22; median = 8.

2. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33}, N = 10 (even). Sort in ascending order; the median
is the simple average of the two middle values, 8 and 9: 0 0 5 7 8 9 12 14 22 33; median = (8 + 9) ÷ 2 = 8.5.

* Sample and population medians are computed the same way.
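
The two examples above can be checked with a short Python sketch. The function name median_by_hand is made up for illustration; it simply follows the sort-and-take-the-middle steps described in the notes.

```python
from statistics import median

odd_data = [0, 7, 12, 5, 14, 8, 0, 9, 22]        # N = 9 (odd)
even_data = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]   # N = 10 (even)


def median_by_hand(values):
    """Sort the observations and return the middle one.

    With an even count, return the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_hand(odd_data))    # 8
print(median_by_hand(even_data))   # 8.5

# The standard library gives the same answers.
print(median(odd_data), median(even_data))
```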

Mean, Median and Mode

[Figure: mean, median, and mode]

Geometric Mean
• Used when the growth rate (rate of change) varies from period to period.
• Example: the return on an investment over several periods.
• R_i denotes the rate of return in period i (i = 1, ..., n).
• The geometric mean R_g is defined by
  $(1 + R_g)^n = (1 + R_1)(1 + R_2)(1 + R_3) \cdots (1 + R_n)$
• Solving for R_g:
  $R_g = \sqrt[n]{(1 + R_1)(1 + R_2)(1 + R_3) \cdots (1 + R_n)} - 1$
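
As a rough illustration of the formula for R_g, the sketch below uses a hypothetical two-period investment that gains 100% and then loses 50%. The returns and the function name are assumptions made for the example, not figures from the class.

```python
import math


def geometric_mean_return(returns):
    """Return R_g such that (1 + R_g)^n equals the product of (1 + R_i)."""
    n = len(returns)
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / n) - 1


# Hypothetical returns: +100% in period 1, -50% in period 2.
returns = [1.0, -0.5]

arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)

print(arithmetic)  # 0.25
print(geometric)   # 0.0
```

The investment ends exactly where it started, which the geometric mean of 0 reflects, while the arithmetic mean of 25% overstates the growth.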

Measure of Variability
• Measures of variability are needed because measures of central location do not give
the whole picture: they do not show how spread out the data are.
• Example: two classes can have the same mean but different variability.
Range
Range = Max – Min

Variance
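• Population variance is denoted by σ².
• Sample variance is denoted by s².

Variance of population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Variance of sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$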
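
A minimal sketch of the two variances, assuming the usual definitions (divide by N for a population, by n − 1 for a sample). The data set and the function names are hypothetical and chosen only so the arithmetic is easy to follow. The standard deviation is the square root of the variance.

```python
import math

# Hypothetical observations; their mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]


def sum_of_squared_deviations(values):
    """Sum of (x - mean)^2 over all observations."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def population_variance(values):
    """Treat the values as the whole population: divide by N."""
    return sum_of_squared_deviations(values) / len(values)


def sample_variance(values):
    """Treat the values as a sample: divide by n - 1."""
    return sum_of_squared_deviations(values) / (len(values) - 1)


print(population_variance(data))             # 4.0
print(math.sqrt(population_variance(data)))  # 2.0
print(sample_variance(data))                 # 32/7, about 4.571
```

Python's statistics module offers pvariance, variance, pstdev, and stdev for the same quantities.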

Standard Deviation
• The standard deviation (the square root of the variance) is used to comment on the general shape of the distribution of a data set.
• If the histogram is bell-shaped, the Empirical Rule can be used.

Empirical Rule
The Empirical Rule states:

1. Approximately 68% of all observations fall within one standard deviation of the mean.
2. Approximately 95% of all observations fall within two standard deviations of the mean.
3. Approximately 99.7% of all observations fall within three standard deviations of the mean.
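
The three percentages can be reproduced from the normal distribution itself. The sketch below assumes a perfectly normal (bell-shaped) distribution and uses Python's statistics.NormalDist; the helper name share_within is made up for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```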

3-Sigma Approach to Find Outliers


For a normal distribution, the interval of 3 sigmas on either side of the mean covers 99.7% of the
area. So if we reject values that fall more than 3 sigmas from the mean, we discard only about
0.3% of the values, which are most likely the outliers.

* This is a common way to identify outliers. A cutoff of 2 or 2.5 sigmas can be used as well.
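
One possible way to apply the 3-sigma approach in Python is sketched below. The data are hypothetical (twenty values near 50 plus one value of 150), the function name is made up, and the sample standard deviation is used as the estimate of sigma, which is an assumption.

```python
from statistics import mean, stdev


def find_outliers(values, cutoff=3.0):
    """Return the values lying more than `cutoff` standard deviations from the mean."""
    centre = mean(values)
    sigma = stdev(values)  # sample standard deviation, used as the estimate of sigma
    return [x for x in values if abs(x - centre) > cutoff * sigma]


# Hypothetical data: twenty values near 50 and one extreme value.
data = [48, 49, 50, 51, 52] * 4 + [150]

print(find_outliers(data))              # [150]
print(find_outliers(data, cutoff=2.5))  # [150]
```

In very small data sets a single extreme value inflates the standard deviation so much that it may never exceed the 3-sigma cutoff, so the approach works best with a reasonable number of observations.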

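Chebysheff's Theorem
• Applies to all histograms, not just bell-shaped ones as the Empirical Rule does.
• k is the number of standard deviations from the mean.
• Theorem: the proportion of observations that lie within k standard deviations of the mean is at least
  $1 - \frac{1}{k^2}$ for $k > 1$
• For k = 2, at least ¾ of all observations lie within 2 standard deviations of the mean. This is
a guaranteed lower bound for any histogram, whereas the Empirical Rule's approximately 95% applies only to bell-shaped ones.

Example: There are 5,000 numbers whose standard deviation is 18. At least how many of the
numbers will be within the range shown?

[Figure: number line with the mean at 80 and the range extending 25 on either side, from 55 to 105; each end is about 1.389 standard deviations from the mean]

k = (distance from the mean to the end of the given range) / (standard deviation of all the observations)
k = 25/18 ≈ 1.389 (number of standard deviations)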
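
Taking the example's figures (5,000 numbers, standard deviation 18, a range reaching 25 on either side of the mean), the bound can be worked out as in the sketch below. The function name is made up for illustration.

```python
def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound 1 - 1/k^2 is only stated for k > 1")
    return 1 - 1 / k**2


total_numbers = 5000
standard_deviation = 18
distance_from_mean = 25

k = distance_from_mean / standard_deviation  # about 1.389
share = chebysheff_lower_bound(k)            # about 0.4816
at_least = share * total_numbers             # about 2408

print(round(k, 3), round(share, 4), round(at_least))
```

So at least about 2,408 of the 5,000 numbers lie within 25 of the mean, whatever the shape of the histogram.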

Co-efficient of Variation
• The coefficient of variation is the standard deviation divided by the mean of the observations.
o Population: CV = σ / μ
o Sample: cv = s / x̄
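
A small sketch of both versions of the coefficient of variation, using a hypothetical data set whose mean is 5. The function names are made up for the example.

```python
from statistics import mean, pstdev, stdev

# Hypothetical observations; mean 5, population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]


def population_cv(values):
    """CV = sigma / mu, treating the values as the whole population."""
    return pstdev(values) / mean(values)


def sample_cv(values):
    """cv = s / x-bar, treating the values as a sample."""
    return stdev(values) / mean(values)


print(population_cv(data))        # 0.4
print(round(sample_cv(data), 4))
```

Because it is a ratio, the coefficient of variation allows data sets with very different means to be compared on the same scale.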

Analysis of the Shape


• Skewness measures the asymmetry of the distribution.
o Skewness = 0: Symmetric around the mean
o Skewness < 0: Skewed to the left (negatively skewed)
o Skewness > 0: Skewed to the right (positively skewed)
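
The notes do not give a formula for skewness. The sketch below assumes one common definition, the moment coefficient (the average cubed deviation divided by the cube of the population standard deviation), and uses three hypothetical data sets to show the three cases.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2^1.5 (an assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2**1.5


symmetric = [1, 2, 3, 4, 5]      # hypothetical, mirror image around 3
right_tail = [1, 2, 3, 4, 10]    # hypothetical, one large value stretches the right
left_tail = [1, 7, 8, 9, 10]     # hypothetical, one small value stretches the left

print(skewness(symmetric))        # 0.0
print(skewness(right_tail) > 0)   # True
print(skewness(left_tail) < 0)    # True
```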

Measures of Relative Standing


• Provide information on the position of particular values relative to the rest of the data set.
• Percentile: the Pth percentile is the value for which P percent of the observations are less than
that value and (100 − P)% are greater than that value.
o Example: 60th percentile means 60% are below and 40% are above.
• Quartiles: 25th percentile = Q1 (lower quartile) | 50th percentile = Q2 (second quartile) | 75th percentile = Q3 (upper
quartile)

Example:

55, 49, 43, 45, 34, 23, 38, 30 → arranged in order: 23, 30, 34, 38, 43, 45, 49, 55

In the lines below, "|" marks where the percentile falls.

• 25th percentile: 23, 30 | 34, 38, 43, 45, 49, 55: 2 below, 6 above
• 50th percentile: 23, 30, 34, 38 | 43, 45, 49, 55: 4 below, 4 above
• 75th percentile: 23, 30, 34, 38, 43, 45 | 49, 55: 6 below, 2 above
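
The quartiles of the eight example values can be located with Python's statistics.quantiles. Textbooks and software use slightly different interpolation rules, so the exact cut points below (from the function's default "exclusive" method) are one convention among several; the counts below and above each quartile are what the example is about.

```python
from statistics import quantiles

data = [55, 49, 43, 45, 34, 23, 38, 30]

# n=4 splits the ordered data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)
print(q1, q2, q3)  # 31.0 40.5 48.0

for label, cut in (("Q1", q1), ("Q2", q2), ("Q3", q3)):
    below = sum(1 for x in data if x < cut)
    above = sum(1 for x in data if x > cut)
    print(label, below, "below,", above, "above")
# Q1 2 below, 6 above
# Q2 4 below, 4 above
# Q3 6 below, 2 above
```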

5-Number Summary
• The minimum
• First (lower) quartile, Q1
• Second quartile, Q2
• Third (upper) quartile, Q3
• The maximum

Interquartile Range
• Measures the spread of the middle 50% of the observations.
• IQR = Q3 – Q1
• A large value means Q1 and Q3 are far apart, which indicates high variability.
• Unlike the range, it is not affected by outliers.
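
A sketch of the 5-number summary and the interquartile range for the eight values 55, 49, 43, 45, 34, 23, 38, 30. The quartiles come from statistics.quantiles with its default method, which is one of several conventions, and the function name is made up for the example.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


data = [55, 49, 43, 45, 34, 23, 38, 30]

minimum, q1, q2, q3, maximum = five_number_summary(data)
iqr = q3 - q1

print(minimum, q1, q2, q3, maximum)  # 23 31.0 40.5 48.0 55
print(iqr)                           # 17.0
print(maximum - minimum)             # 32 (the range, for comparison)
```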
Box Plots

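[Figure: box plot with the box drawn at Q1, Q2, and Q3 and a whisker on each side]

• Left whisker ends at Max(S, Q1 − 1.5 × IQR)
• Right whisker ends at Min(L, Q3 + 1.5 × IQR)

Here S is the smallest observation and L is the largest observation.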
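
The whisker rules can be sketched as a small function. The function name is made up; S and L are taken to be the smallest and largest observations, and the quartile values in the usage lines (Q1 = 31, Q3 = 48) are illustrative. The second call uses a hypothetical largest value of 120 to show a whisker being cut short.

```python
def whisker_ends(smallest, largest, q1, q3):
    """Return (left end, right end) of the box plot whiskers."""
    iqr = q3 - q1
    left = max(smallest, q1 - 1.5 * iqr)
    right = min(largest, q3 + 1.5 * iqr)
    return left, right


# No extreme values: the whiskers reach the smallest and largest observations.
print(whisker_ends(smallest=23, largest=55, q1=31, q3=48))   # (23, 55)

# Hypothetical largest value of 120: the right whisker stops at Q3 + 1.5 * IQR.
print(whisker_ends(smallest=23, largest=120, q1=31, q3=48))  # (23, 73.5)
```

Observations beyond the whisker ends are the ones a box plot flags as possible outliers.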

Stroop Interference
• Conflict between what is asked and what is written.
Measures of Linear Relationship
• Three numerical measures of linear relationship describe the strength and
direction of the linear relationship between two variables:
o Covariance
o Coefficient of correlation
o Coefficient of determination (not discussed)

Covariance
• Variables move in the same direction (both increase or both decrease): large positive number.
• Variables move in opposite directions: large negative number.
• No particular pattern: small number.

* The size of the covariance is often hard to interpret. The coefficient of correlation helps with this.
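
The notes do not write out the covariance formula. The sketch below assumes the usual sample covariance, the sum of (x − x̄)(y − ȳ) divided by n − 1, and uses hypothetical data to show the sign in each direction.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y_same_direction = [2, 4, 6, 8, 10]   # rises as x rises
y_opposite = [10, 8, 6, 4, 2]         # falls as x rises

print(sample_covariance(x, y_same_direction))  # 5.0
print(sample_covariance(x, y_opposite))        # -5.0
```

Multiplying every y value by 100 would multiply the covariance by 100 as well, which is why its size alone says little about the strength of the relationship.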

Coefficient of correlation
• Fixed range from −1 to +1.
• Strong positive linear relationship: close to +1.
• Strong negative linear relationship: close to −1.
• No straight-line relationship: close to 0.
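
A sketch of the sample coefficient of correlation, assuming the usual definition r = s_xy / (s_x · s_y), that is, the covariance divided by the product of the two standard deviations. The data sets are hypothetical and the function name is made up.

```python
import math


def correlation(xs, ys):
    """Sample coefficient of correlation r = s_xy / (s_x * s_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical data.
x = [1, 2, 3, 4, 5]

print(round(correlation(x, [2, 4, 6, 8, 10]), 6))  # 1.0, perfect positive line
print(round(correlation(x, [10, 8, 6, 4, 2]), 6))  # -1.0, perfect negative line
print(abs(correlation(x, [2, 1, 3, 1, 2])) < 1e-9)  # True, no straight-line pattern
```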

Summary of Symbols
                             Population   Sample
Size                         N            n
Mean                         µ            x̄
Variance                     σ²           s²
Standard Deviation           σ            s
Coefficient of Variation     CV           cv
Covariance                   σxy          sxy
Coefficient of Correlation   ρ            r

• The standard deviation is the square root of the variance.
• The coefficient of variation is the standard deviation divided by the mean.
