Lecture 4 - Test of Outliers and Test of Skewness


Applied Statistics

Dr. Hossameldin Ahmed


Assistant Professor of Econometrics Applied Statistics

Lecture Four
Test of Outliers - Box Plot

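[Figure: box plot. The box spans Q1 to Q3; LB and UB mark the lower and upper bounds; asterisks beyond the bounds mark outliers.]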

• Lower Bound (LB) = Q1 – 1.5 × IQR

• Upper Bound (UB) = Q3 + 1.5 × IQR
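
As a minimal sketch of the two bounds above, the following Python functions compute LB and UB from quartiles that are assumed to be already known, and list the values falling outside them. The function names and the sample data are hypothetical and chosen only for illustration.

```python
def box_plot_bounds(q1, q3):
    """Return (lower_bound, upper_bound) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values lying below LB or above UB."""
    lb, ub = box_plot_bounds(q1, q3)
    return [v for v in values if v < lb or v > ub]


# Hypothetical data with Q1 = 10 and Q3 = 20, for illustration only.
data = [1, 10, 12, 15, 18, 20, 45]
lb, ub = box_plot_bounds(q1=10, q3=20)      # IQR = 10
print(lb, ub)                               # -5.0 35.0
print(find_outliers(data, q1=10, q3=20))    # [45]
```

Values equal to a bound are not flagged here; only values strictly beyond LB or UB count as outliers in this sketch.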
Test of Skewness
$$\text{Skewness Coefficient (SC)} = \frac{3\,(\text{Mean} - \text{Median})}{\text{SD}}$$

• Symmetric: SC is within 0 ± 0.5 (from -0.5 to +0.5)

• Positively skewed (skewed to the right): SC is greater than +0.5

• Negatively skewed (skewed to the left): SC is less than -0.5
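
The coefficient and its three cut-offs can be sketched in Python as follows. The function names and the summary statistics used in the call are hypothetical; only the formula and the ±0.5 thresholds come from the slide.

```python
def skewness_coefficient(mean, median, sd):
    """Skewness coefficient: 3 * (mean - median) / sd."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 3 * (mean - median) / sd


def classify_skewness(sc, threshold=0.5):
    """Classify SC using the +/-0.5 cut-offs from the slide."""
    if sc > threshold:
        return "positively skewed"
    if sc < -threshold:
        return "negatively skewed"
    return "symmetric"


# Hypothetical summary statistics, for illustration only.
sc = skewness_coefficient(mean=60.0, median=50.0, sd=20.0)
print(sc, classify_skewness(sc))   # 1.5 positively skewed
```

A value of exactly +0.5 or -0.5 is treated as symmetric, matching the "from -0.5 to +0.5" range.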


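Test for Outliers

• Yes (outliers found): the data are treated as skewed → use the Median and the IQR
• No (no outliers): run the Test of skewness
    - Symmetric → use the Mean and the SD
    - Skewed → use the Median and the IQR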
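
One possible reading of this decision chart, written as a small Python function: outliers lead directly to the median and IQR, and otherwise the skewness coefficient decides between mean/SD and median/IQR. The function name, its parameters, and the reuse of the ±0.5 cut-off are assumptions made for this sketch.

```python
def choose_measures(has_outliers, sc=None, threshold=0.5):
    """Return (central measure, dispersion measure) for a dataset.

    has_outliers: result of the box-plot test for outliers.
    sc: skewness coefficient, needed only when there are no outliers.
    """
    if has_outliers:
        # Outliers found: data are treated as skewed.
        return ("median", "IQR")
    if sc is None:
        raise ValueError("sc is required when there are no outliers")
    if -threshold <= sc <= threshold:
        # No outliers and symmetric.
        return ("mean", "SD")
    # No outliers but skewed.
    return ("median", "IQR")


print(choose_measures(has_outliers=True))             # ('median', 'IQR')
print(choose_measures(has_outliers=False, sc=0.1))    # ('mean', 'SD')
print(choose_measures(has_outliers=False, sc=-0.9))   # ('median', 'IQR')
```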
Example
a) Does this sample contain any extreme values? Justify your answer with a
suitable test.
Answer
Test for the outliers - Box Plot
Step 1: put the values in order from the smallest to the largest

50 65 67 70 72 75 77 80 82 112

Step 2: location of Q1 = ¼ (n + 1) = ¼ (10 + 1) = 2.75

Value of Q1 = Start + ratio × distance = 65 + 0.75 (67 – 65) = 66.5 million $


Step 3: location of Q3 = ¾ (n + 1) = ¾ (10 + 1) = 8.25
Value of Q3 = Start + ratio × distance = 80 + 0.25 (82 – 80) = 80.5 million $

Step 4: IQR = Q3 – Q1 = 80.5 – 66.5 = 14 million $

Step 5:
Lower bound = Q1 – 1.5 × IQR = 66.5 – (1.5 × 14) = 45.5 million $
Upper bound = Q3 + 1.5 × IQR = 80.5 + (1.5 × 14) = 101.5 million $
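
Steps 2 to 5 can be reproduced with a short Python sketch that locates each quartile at position fraction × (n + 1) and interpolates with "start + ratio × distance". The helper name `quartile_by_position` is hypothetical; the cross-check uses the standard library's `statistics.quantiles` with `method="exclusive"`, which is also based on (n + 1).

```python
import math
import statistics


def quartile_by_position(sorted_values, fraction):
    """Value at 1-based position fraction * (n + 1) in ordered data."""
    n = len(sorted_values)
    position = fraction * (n + 1)
    if not 1 <= position <= n:
        raise ValueError("sample too small for this fraction")
    rank = math.floor(position)          # rank of the 'start' value
    ratio = position - rank              # fractional part of the location
    start = sorted_values[rank - 1]
    if ratio == 0:
        return float(start)
    distance = sorted_values[rank] - start
    return start + ratio * distance


profits = sorted([50, 65, 67, 70, 72, 75, 77, 80, 82, 112])  # million $
q1 = quartile_by_position(profits, 0.25)
q3 = quartile_by_position(profits, 0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
print(q1, q3, iqr, lower_bound, upper_bound)  # 66.5 80.5 14.0 45.5 101.5

# Cross-check against the standard library's (n + 1)-based quartiles.
check_q1, _, check_q3 = statistics.quantiles(profits, n=4, method="exclusive")
print(check_q1, check_q3)                     # 66.5 80.5
```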
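[Figure: box plot of the sample with the lower bound at 45.5 and the upper bound at 101.5; the value 112 is marked with an asterisk.]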

Comment: ???
Example
b) According to your conclusion in part (a), calculate the best central and
the best absolute dispersion measure.
Answer
IQR = 14 million $

Comment: The IQR of profits is 14 million $, which is the range of the middle
50% of the ordered dataset after excluding the lowest and the highest
25% of the ordered dataset.
Median
Step 1: put the values in order from the smallest to the largest

50 65 67 70 72 75 77 80 82 112
Step 2: location of the median (even sample size) – Case 2
$$\frac{n}{2} = \frac{10}{2} = 5 \quad\text{and}\quad \frac{10}{2} + 1 = 6$$
Step 3: value of the median= (72+75)/2 = 73.5 million $
Example
c) Assuming that the outlier(s) were not present, what would be the best central measure?
Answer
After removing 112
Median
Step 1: put the values in order from the smallest to the largest

50 65 67 70 72 75 77 80 82
Step 2: location of the median (odd sample size)
$$\frac{n+1}{2} = \frac{9+1}{2} = 5$$
Step 3: value of the median= 72 million $
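
The two median cases worked in this example (even and odd sample size) can be sketched in one Python function. The name `median_by_location` is hypothetical; the data are the profit values from the example, with and without 112.

```python
def median_by_location(values):
    """Median via the location rules: (n + 1) / 2 for odd n,
    the average of positions n / 2 and n / 2 + 1 for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")
    if n % 2 == 1:
        # Odd n: single middle value at 1-based position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the values at 1-based positions n / 2 and n / 2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


with_outlier = [50, 65, 67, 70, 72, 75, 77, 80, 82, 112]   # n = 10
without_outlier = [50, 65, 67, 70, 72, 75, 77, 80, 82]     # n = 9
print(median_by_location(with_outlier))      # 73.5
print(median_by_location(without_outlier))   # 72
```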
Xi      X̄        Xi − X̄     (Xi − X̄)^2
50      70.89    -20.89     436.35
65      70.89     -5.89      34.68
67      70.89     -3.89      15.12
70      70.89     -0.89       0.79
72      70.89      1.11       1.23
75      70.89      4.11      16.90
77      70.89      6.11      37.35
80      70.89      9.11      83.01
82      70.89     11.11     123.46


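$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = 70.89$$

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1} = \frac{748.89}{9-1} = 93.61$$

$$\text{Standard deviation } (s) = \sqrt{\text{var}} = \sqrt{93.61} = 9.68 \text{ million \$}$$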
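
As a check on the arithmetic, the mean, sum of squared deviations, sample variance, and standard deviation for the nine values (112 removed) can be recomputed with a few lines of Python. Variable names are illustrative only.

```python
import math

profits = [50, 65, 67, 70, 72, 75, 77, 80, 82]   # million $, 112 removed
n = len(profits)

mean = sum(profits) / n
squared_deviations = [(x - mean) ** 2 for x in profits]
sum_of_squares = sum(squared_deviations)
variance = sum_of_squares / (n - 1)              # sample variance, n - 1
sd = math.sqrt(variance)

print(round(mean, 2), round(sum_of_squares, 2),
      round(variance, 2), round(sd, 2))
# 70.89 748.89 93.61 9.68
```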
$$\text{Skewness Coefficient} = \frac{3\,(\text{Mean} - \text{Median})}{\text{SD}} = \frac{3\,(70.89 - 72)}{9.68} = -0.34$$
Comment: ???
Coefficient of Variation
Can be used to compare the variability of two or more sets of
data measured in different units.

$$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$$

Rule: The lower the CV, the higher the level of homogeneity.
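
A minimal Python sketch of the comparison, using the sample standard deviation from the standard library. The function name and both datasets are hypothetical and serve only to show two variables measured in different units.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (S / mean) * 100, with S the sample standard deviation."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


# Hypothetical datasets in different units, for illustration only.
heights_cm = [160, 170, 180]
weights_kg = [50, 70, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 2), round(cv_weights, 2))   # 5.88 28.57

# The dataset with the lower CV is the more homogeneous one.
more_homogeneous = "heights" if cv_heights < cv_weights else "weights"
print(more_homogeneous)                              # heights
```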
