Lecture 2 & 3 - Numerical Presentation
Lecture Three
Numerical Presentation
The main goal is to summarize all the values in a given dataset in one or more
values, so that looking at these values tells us what happened in the dataset.
Measure     Population (parameter)    Sample (statistic)
Mean        \mu                       \bar{x}
Variance    \sigma^2                  s^2
Size        N                         n
Example: if you want to know your mark and I tell you that the marks are
normally distributed, is that a clear answer to your question?
If I tell you that the minimum mark is 80, is that a clear answer to your
question?
Central Measures
The main goal is to summarize all the values in one value around which the
majority of the values lie.
The central measures are: Mean, Median, and Mode.
Mean
X_i    Value
X_1    1 million
X_2    2 million
X_3    3 million
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Profits in million $: 92, 85, 88, 95
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{92 + 85 + 88 + 95}{4} = 90 \text{ million \$} \]
Comment: the mean of the profits is 90 million $, which represents the value at the center of the dataset, around which the majority of the values lie.
If we add a company with a profit of zero, the mean will be
\[ \bar{X} = \frac{92 + 85 + 88 + 95 + 0}{5} = 72 \text{ million \$} \]
There is a big difference between 72 and zero.
If we add another company with a profit of zero, the mean will be
\[ \bar{X} = \frac{92 + 85 + 88 + 95 + 0 + 0}{6} = 60 \text{ million \$} \]
Can we depend on 60 to represent the data?
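The three mean calculations above can be reproduced with a short Python sketch. The function name `mean` and the variable `profits` are illustrative names, not part of the lecture.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

profits = [92, 85, 88, 95]        # profits in million $

print(mean(profits))              # four companies
print(mean(profits + [0]))        # one extra company with zero profit
print(mean(profits + [0, 0]))     # two extra companies with zero profit
```

Each added zero pulls the mean further from the original four profits, which is the sensitivity to extreme values that the question above points at.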
Outlier
Technical problem
Advantages Disadvantages
Values: 92 85 88 95 0
Step 1: put the values in order (smallest to largest)
0 85 88 92 95
Step 2: location of the median (odd sample size), Case 1
\[ \frac{n+1}{2} = \frac{5+1}{2} = 3 \] (third value), so the median is 88
Values: 0 85 88 92 95 400
Step 2: location of the median (even sample size), Case 2
\[ \frac{n}{2} = \frac{6}{2} = 3 \quad \text{and} \quad \frac{n}{2} + 1 = \frac{6}{2} + 1 = 4 \]
Step 3: value of the median = (88+92)/2 = 90
Comment: the median of the profits is 90 million $, which represents the value at the 50%
position of the ordered dataset.
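The median steps can be sketched in Python as follows. The function name `median` is an illustrative choice; the logic follows the odd and even cases described above.

```python
def median(values):
    """Median via the lecture's steps: order the values, locate, then read off."""
    ordered = sorted(values)              # Step 1: smallest to largest
    n = len(ordered)
    if n % 2 == 1:                        # Case 1: odd sample size
        location = (n + 1) // 2           # 1-based location of the median
        return ordered[location - 1]
    lower = ordered[n // 2 - 1]           # Case 2: value at location n/2
    upper = ordered[n // 2]               # value at location n/2 + 1
    return (lower + upper) / 2

print(median([92, 85, 88, 95, 0]))        # odd case
print(median([0, 85, 88, 92, 95, 400]))   # even case
```

Note that the extreme values 0 and 400 do not move the median away from the bulk of the data.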
Median
Advantages Disadvantages
Grades
A D A B B A C A A C A
Mode: A
A D B A A F B A D B B
Mode: A & B
A F B C D
Mode: no mode
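The three grade examples can be checked with a small Python sketch. The function name `modes` is illustrative, and treating "no value repeats" as "no mode" is an assumption chosen to match the third example.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                      # assumption: no repeated value -> no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes("A D A B B A C A A C A".split()))
print(modes("A D B A A F B A D B B".split()))
print(modes("A F B C D".split()))
```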
Mode
Misleading value
Mode
Profits of Shark company in the last 6 weeks
Advantages Disadvantages
The main goal is to evaluate how far the values are from each other and
how far they are from the center of the dataset. As a result, we can evaluate
how homogeneous the dataset is.
(Figure: the values 85 and 95 on either side of the center, 90 million.)
Absolute Dispersion Measures
Case of homogeneity: 98 100 95 92 96 94
Case of heterogeneity: 85 74 93 20 100 0 94 52
The absolute dispersion measures are: Range, Variance and Standard Deviation, and Inter-quartile range.
Range
Profits in million $
Comment: the range of the profits is 10 million $, which represents the distance between the minimum profit and the maximum profit.
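As a minimal sketch, the range is the largest value minus the smallest value. The data below are assumed to be the profits from the mean example (92, 85, 88, 95), which is consistent with the range of 10 in the comment; the function name `value_range` is illustrative.

```python
def value_range(values):
    """Range: distance between the largest and the smallest value."""
    return max(values) - min(values)

profits = [92, 85, 88, 95]    # assumed dataset, in million $
print(value_range(profits))
```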
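Advantages Disadvantages
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
Here \( x_i - \bar{x} \) are the deviations around the mean (not from the mean), and they sum to zero:
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]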
Variance and Standard Deviation
x_i    \bar{x}    x_i - \bar{x}    (x_i - \bar{x})^2
92     90          2               4
88     90         -2               4
95     90          5               25
85     90         -5               25
Sum                                58
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{58}{4-1} = 19.33 \]
Standard deviation (s) = \( \sqrt{\text{var}} = \sqrt{19.33} = 4.4 \) million $
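The table and formula above can be reproduced with a short Python sketch. The function name `sample_variance` is illustrative; it divides by n - 1, as in the formula for S^2.

```python
import math

def sample_variance(values):
    """Sample variance: squared deviations around the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

profits = [92, 88, 95, 85]            # profits in million $
s2 = sample_variance(profits)         # 58 / 3
s = math.sqrt(s2)                     # standard deviation

print(round(s2, 2), round(s, 1))
```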
Variance and Standard Deviation
Mean = 90 million $
SD = 4.4 million $
Mean - SD = 90 - 4.4 = 85.6 million $
Mean + SD = 90 + 4.4 = 94.4 million $
Variance and Standard Deviation
Comment:
Disadvantages
Advantages
Inter Quartile Range
(Figure: the ordered data from the smallest value to the largest value, with the first quartile (Q1) at 25% and the third quartile (Q3) at 75%.)
Inter Quartile Range (IQR)
67, 72, 65, 77, 75, 70, 80, 82, 50, 112
Step 1: put the values in order from the smallest to the largest
50 65 67 70 72 75 77 80 82 112
Comment: Q1 of the profits is 66.5 million $, which represents the value at the 25% position of the ordered dataset.
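A minimal Python sketch of the quartile calculation follows. It assumes the quartiles are located at positions (n+1)/4 and 3(n+1)/4 of the ordered data with linear interpolation; this rule is an assumption, but it reproduces the Q1 of 66.5 given in the comment. The standard library function `statistics.quantiles` with `method="exclusive"` uses this positioning.

```python
from statistics import quantiles

profits = [67, 72, 65, 77, 75, 70, 80, 82, 50, 112]   # million $

# quantiles() sorts the data itself and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(profits, n=4, method="exclusive")
iqr = q3 - q1                                         # inter quartile range

print(q1, q3, iqr)
```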
Inter Quartile Range (IQR)
50 65 67 70 72 75 77 80 82 112
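(Figure: box plot of the ordered data showing Q1, Q3, the lower bound (LB) and the upper bound (UB).)
\[ \text{Skewness Coefficient} = \frac{3\,(\text{Mean} - \text{Median})}{\text{SD}} \]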
Choice of measures:
Outliers?   Shape        Central measure   Dispersion measure
Yes                      Median            IQR
No          Symmetric    Mean              SD
No          Skewed       Median            IQR
Example
a) Does this sample contain any extreme values? Justify your answer with a
suitable test.
Answer
Test for the outliers - Box Plot
Step 1: put the values in order from the smallest to the largest
50 65 67 70 72 75 77 80 82 112
(Figure: box plot with lower bound 45.5 and upper bound 101.5; the value 112 is marked with *.)
Comment: ???
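The bounds in the box plot test can be sketched in Python. The sketch assumes the usual 1.5 x IQR rule (LB = Q1 - 1.5 IQR, UB = Q3 + 1.5 IQR) and the same quartile positioning as `statistics.quantiles` with `method="exclusive"`; both are assumptions, chosen because they reproduce the bounds 45.5 and 101.5 shown above. The function name `outlier_bounds` is illustrative.

```python
from statistics import quantiles

def outlier_bounds(values):
    """Lower and upper bounds of the box plot test (assumed 1.5 * IQR rule)."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

profits = [50, 65, 67, 70, 72, 75, 77, 80, 82, 112]   # million $
lower, upper = outlier_bounds(profits)

# Any value outside [lower, upper] is flagged as an extreme value.
outliers = [x for x in profits if x < lower or x > upper]
print(lower, upper, outliers)
```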
Example
b) According to your conclusion in part (a), calculate the best central and
the best absolute dispersion measure.
Answer
IQR = 14 million $
50 65 67 70 72 75 77 80 82 112
Step 2: location of the median (even sample size), Case 2
\[ \frac{n}{2} = \frac{10}{2} = 5 \quad \text{and} \quad \frac{10}{2} + 1 = 6 \]
Step 3: value of the median = (72+75)/2 = 73.5 million $
Example
c) Assuming that the outlier(s) are not present in the sample, what would be the best central measure?
Answer
After removing 112
Median
Step 1: put the values in order from the smallest to the largest
50 65 67 70 72 75 77 80 82
Step 2: location of the median (odd sample size)
\[ \frac{n+1}{2} = \frac{9+1}{2} = 5 \]
Step 3: value of the median = 72 million $
X_i    \bar{X}    X_i - \bar{X}    (X_i - \bar{X})^2
\[ \text{Skewness Coefficient} = \frac{3\,(70.89 - 72)}{9.68} = -0.34 \]
Comment: ???
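The skewness coefficient for the nine remaining values can be checked with a short Python sketch. It uses the standard library functions `statistics.mean`, `statistics.median` and `statistics.stdev` (sample standard deviation, dividing by n - 1); the function name `skewness_coefficient` is illustrative.

```python
from statistics import mean, median, stdev

def skewness_coefficient(values):
    """3 * (mean - median) / SD, with SD the sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

profits = [50, 65, 67, 70, 72, 75, 77, 80, 82]   # after removing 112

print(round(mean(profits), 2), median(profits), round(stdev(profits), 2))
print(round(skewness_coefficient(profits), 2))
```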
Coefficient of Variation
Can be used to compare the variability of two or more sets of
data measured in different units.
\[ CV = \frac{S}{\bar{X}} \times 100\% \]
Rule: the lower the CV, the higher the level of homogeneity.
Coefficient of Variation
Question Two: The prices of stock A and stock B were recorded over several months as
follows.
Stock A: 10 10 12 10 11 11 10 11 10 9
Stock B: 9 10 12 7 10 16 10 15 10
The standard deviation of stock A is 0.843. The variance of stock B is
8.933 and its mean is 10.6.
Which stock would you prefer to buy, and why? Comment on the results.
Stock A
\[ \bar{X} = \frac{10 + 10 + 12 + 10 + 11 + 11 + 10 + 11 + 10 + 9}{10} = 10.4 \]
\[ CV_A = \frac{S}{\bar{X}} \times 100\% = \frac{0.843}{10.4} \times 100\% \approx 8.11\% \]
Stock B
\[ S = \sqrt{8.933} \approx 2.99 \]
\[ CV_B = \frac{S}{\bar{X}} \times 100\% = \frac{2.99}{10.6} \times 100\% \approx 28.2\% \]
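The two coefficients of variation can be recomputed with a minimal Python sketch that uses only the summary statistics given in the question (standard deviation and mean for stock A, variance and mean for stock B). The function name `coefficient_of_variation` is illustrative.

```python
import math

def coefficient_of_variation(sd, mean):
    """CV = S / X-bar * 100, expressed in percent."""
    return sd / mean * 100

cv_a = coefficient_of_variation(0.843, 10.4)              # stock A
cv_b = coefficient_of_variation(math.sqrt(8.933), 10.6)   # stock B: SD = sqrt(variance)

print(round(cv_a, 2), round(cv_b, 2))
print("lower CV:", "Stock A" if cv_a < cv_b else "Stock B")
```

By the rule stated earlier, the stock with the lower CV is the more homogeneous of the two.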
Comment