Lecture 2 & 3 - Numerical Presentation

The document discusses numerical presentation in applied statistics, focusing on summarizing datasets using central measures like mean, median, and mode, as well as absolute dispersion measures such as range, variance, and standard deviation. It highlights the importance of understanding outliers and their impact on statistical measures, along with methods to test for skewness and outliers. The document also provides examples and calculations to illustrate these concepts.

Uploaded by

Michael Yousry


Applied Statistics

Dr. Aya Ahmed

Assistant Professor of Econometrics Applied Statistics

Lecture Three
Numerical Presentation

The main goal is to summarize all the values in a given dataset in one or more values, so that by looking at these values we can tell what happened in the dataset.

What? How? When?

Numerical Presentation

              Population          Sample
              (parameter)         (statistic)
Mean          \(\mu\)             \(\bar{x}\)
Variance      \(\sigma^2\)        \(s^2\)
Size          \(N\)               \(n\)
Numerical Presentation

Example: Suppose you want to know your mark and I tell you that the marks are normally distributed. Is that a clear answer to your question?
If I tell you that the minimum mark is 80, is that a clear answer to your question?
Central Measures

The main goal is to summarize all the values in one value around which the majority of the values lie.

Central Measures

Can one value represent all the values?

** Only when every value in the dataset is the same **

Central measures: Mean, Median, Mode
Mean

What does it indicate?

It is the value at the center of the dataset, around which the majority of the values lie.
Mean

\(X_i\)    Value
\(X_1\)    1 million
\(X_2\)    2 million
\(X_3\)    3 million

\[ \text{Mean} = \frac{\text{Sum of the values}}{\text{Count of the values}} \]

\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Mean

\[ \text{Mean} = \frac{\text{Sum of the values}}{\text{Count of the values}}, \qquad \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Profits in million $
92, 85, 88, 95

\[ \bar{X} = \frac{92 + 85 + 88 + 95}{4} = 90 \text{ million \$} \]

Comment: the mean of the profits is 90 million $, which represents the value at the center of the dataset, around which the majority of the values lie.
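The mean calculation above can be checked with a short Python sketch. The variable names are illustrative only, and the standard-library `statistics.mean` is used purely as a cross-check on the manual formula.

```python
from statistics import mean

# Profits in million $ from the slide.
profits = [92, 85, 88, 95]

# Mean = sum of the values / count of the values.
manual_mean = sum(profits) / len(profits)

# The standard library should agree with the manual formula.
library_mean = mean(profits)

print(manual_mean, library_mean)
```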

Mean

If we add a company that has a profit of zero, the mean will be

\[ \bar{X} = \frac{92 + 85 + 88 + 95 + 0}{5} = 72 \text{ million \$} \]

There is a big difference between 72 and zero.

If we add another company that has a profit of zero, the mean will be

\[ \bar{X} = \frac{92 + 85 + 88 + 95 + 0 + 0}{6} = 60 \text{ million \$} \]

Can we depend on 60 to represent the data?
Outlier

It is a value whose nature differs from that of the other values in the given dataset.
By removing the outlier:
1- The sample size becomes smaller
2- The estimates become less reliable
3- We do not only remove a value; we also remove from the sample a feature that is found in the population
Outlier
When can we remove the outlier?

• Technical problem

• Data-entry mistake

Mean

Advantages
• Easy to calculate
• Easy to explain
• Takes all the values into calculation

Disadvantages
• It is affected by outliers
Median

What does it indicate?

It is the value at 50% distance of the ordered dataset

Median

92 85 88 95 0
Step 1: put the values in order (smallest → largest)

0 85 88 92 95
Step 2: location of the median (odd sample size) – Case 1

\[ \frac{n+1}{2} = \frac{5+1}{2} = 3 \quad \text{(third value)} \]

Step 3: value of the median = 88 million $

Comment: the median of the profits is 88 million $, which represents the value at 50% distance of the ordered dataset.
Median
92 85 88 95 0 400
Step 1: put the values in order (smallest → largest)

0 85 88 92 95 400
Step 2: location of the median (even sample size) – Case 2

\[ \frac{n}{2} = \frac{6}{2} = 3 \quad \text{and} \quad \frac{n}{2} + 1 = \frac{6}{2} + 1 = 4 \]

Step 3: value of the median = (88 + 92)/2 = 90
Comment: the median of the profits is 90 million $, which represents the value at 50% distance of the ordered dataset.
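The two location rules (odd and even sample size) can be sketched in Python as below. The function name `median_by_location` is a hypothetical helper, not something from the lecture; it simply follows the three steps on the slides.

```python
def median_by_location(values):
    """Median via the location rules: (n+1)/2 for odd n, n/2 and n/2 + 1 for even n."""
    ordered = sorted(values)              # Step 1: smallest -> largest
    n = len(ordered)
    if n % 2 == 1:
        location = (n + 1) // 2           # Step 2 (odd): 1-based location
        return ordered[location - 1]      # Step 3: value at that location
    lower = ordered[n // 2 - 1]           # value at location n/2
    upper = ordered[n // 2]               # value at location n/2 + 1
    return (lower + upper) / 2            # Step 3: average of the two values

odd_case = median_by_location([92, 85, 88, 95, 0])
even_case = median_by_location([92, 85, 88, 95, 0, 400])
print(odd_case, even_case)
```

Note that the extreme value 400 moves the median only from 88 to 90, which illustrates why the median is described as less sensitive to outliers.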
Median

Advantages
• Easy to calculate
• Easy to explain
• It is less sensitive to outliers

Disadvantages
• It concentrates on the location more than the value
• It does not take all the values in the dataset into calculation
• It is not applicable to qualitative data, especially when the data is nominal
Mode

What does it indicate?

It is the most frequent / repeated value(s)

Mode

Grades
A D A B B A C A A C A
Mode: A
A D B A A F B A D B B
Mode: A & B
A F B C D
Mode: no mode
Mode

Profits of Shark company in the last 6 weeks

0 0 500 120 125 36

Mode = 0, which is a misleading value
Mode
Profits of Shark company in the last 6 weeks

10 20 500 120 125 36

No value is repeated, so the mode fails to provide a value
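The mode examples (grades and Shark company profits) can be reproduced with the sketch below. The helper `modes` is hypothetical; it wraps the standard-library `statistics.multimode` and, as an assumption matching the slides' convention, reports no mode when every value appears exactly once.

```python
from statistics import multimode

def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    values = list(values)
    # Slide convention (assumed here): if nothing repeats, there is no mode.
    if len(set(values)) == len(values):
        return []
    return multimode(values)

grades_one_mode = modes("A D A B B A C A A C A".split())
grades_two_modes = modes("A D B A A F B A D B B".split())
grades_no_mode = modes("A F B C D".split())

shark_misleading = modes([0, 0, 500, 120, 125, 36])   # mode exists but is misleading
shark_no_value = modes([10, 20, 500, 120, 125, 36])   # nothing repeats

print(grades_one_mode, grades_two_modes, grades_no_mode)
print(shark_misleading, shark_no_value)
```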

Mode

Advantages
• Easy to calculate
• Easy to explain
• It is applicable to qualitative data

Disadvantages
• It is not preferred with continuous variables because it may:
  • Fail to provide a value
  • Give misleading values
Absolute Dispersion Measures

The main goal is to evaluate how far the values are from each other and how far they are from the center of the dataset. As a result, we can evaluate whether the values are homogeneous or heterogeneous.

Absolute Dispersion Measures

[Figure: center at 90 million, with the values 85 and 95 on either side]
Absolute Dispersion Measures

98 100 95 92 96 94

Case of homogeneity

85 74 93 20 100 0 94 52

Case of heterogeneity
Absolute Dispersion Measures

• Range
• Variance and Standard Deviation
• Inter-quartile range
Range

What does it indicate?

It is the distance between the min. value and the max. value.

Range

Profits in million $

92, 85, 88, 95

Range = Max. Value – Min. Value
= 95 – 85 = 10 million $

Comment: the range of the profits is 10 million $, which represents the distance between the min. profit (85 million $) and the max. profit (95 million $).

Range

                     Company (1)    Company (2)
Range of salaries    30,000         30,000
Min. salary          10,000         20,000
Max. salary          40,000         50,000

• The range is meaningless until it is linked with the min. and max. values.
• It can't be used to compare two or more datasets.
• It is affected by outliers.
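A minimal Python sketch of the range, using the profit figures and the two salary ranges from the slides. The function name `value_range` is illustrative only.

```python
def value_range(values):
    """Range = max. value - min. value."""
    return max(values) - min(values)

profits = [92, 85, 88, 95]                  # million $
profit_range = value_range(profits)

# Two companies with different min. and max. salaries (from the slide).
company_1 = (10_000, 40_000)
company_2 = (20_000, 50_000)

# Both ranges are 30,000, so the range alone cannot tell the companies apart.
same_range = value_range(company_1) == value_range(company_2)

print(profit_range, same_range)
```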
Range

Advantages
• Easy to calculate
• Easy to explain
• It combines the tails of the dataset

Disadvantages
• It takes only two values into calculation
• It does not provide us with the average distance around the mean
• It is affected by outliers
Variance and Standard Deviation

Average distance around the mean
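Variance and Standard Deviation

\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]

\(x_i - \bar{x}\): deviations around the mean (not from the mean). These deviations always sum to zero, \(\sum_{i=1}^{n} (x_i - \bar{x}) = 0\), which is why they are squared before averaging.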
Variance and Standard Deviation

\(x_i\)    \(\bar{x}\)    \(x_i - \bar{x}\)    \((x_i - \bar{x})^2\)
92         90             2                    4
88         90             -2                   4
95         90             5                    25
85         90             -5                   25
Sum                                            58

\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{58}{4-1} = 19.33 \]

Standard deviation (s) = \(\sqrt{\text{var}} = \sqrt{19.33}\) = 4.4 million $
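The table above can be reproduced step by step in Python. This is a sketch with illustrative variable names; `statistics.variance` and `statistics.stdev` (both use the \(n-1\) denominator) serve only as a cross-check.

```python
from math import sqrt
from statistics import stdev, variance

profits = [92, 88, 95, 85]                                 # million $

x_bar = sum(profits) / len(profits)                        # mean
deviations = [x - x_bar for x in profits]                  # x_i - x_bar
squared_deviations = [d ** 2 for d in deviations]          # (x_i - x_bar)^2

sample_variance = sum(squared_deviations) / (len(profits) - 1)   # divide by n - 1
sample_sd = sqrt(sample_variance)

print(sum(deviations))                 # deviations around the mean cancel out
print(round(sample_variance, 2), round(sample_sd, 1))
```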
Variance and Standard Deviation

Mean = 90 million $
SD = 4.4 million $

90 – 4.4 = 85.6 million $
90 + 4.4 = 94.4 million $
Variance and Standard Deviation

Comment:

- The SD of the profits is 4.4 million $, which represents the average distance around the mean profit (90 million $).
- As a result, the majority of the values range from 85.6 million $ to 94.4 million $ on average.
Variance and Standard Deviation

Advantages
• Easy to calculate
• Easy to explain
• It takes all values into calculation

Disadvantages
• It is affected by outliers, because the main component in its calculation is the mean, whose main drawback is being impacted by outliers
Inter Quartile Range (IQR)

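[Diagram: the ordered dataset from the smallest value to the largest value. The first quartile (Q1) sits at 25% and the third quartile (Q3) at 75%. The lowest 25% lies below Q1, the highest 25% lies above Q3, and the inter-quartile range between Q1 and Q3 covers 50% of the ordered dataset.]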
Inter Quartile Range (IQR)

67, 72, 65, 77, 75, 70, 80, 82, 50, 112
Step 1: put the values in order from the smallest to the largest

50 65 67 70 72 75 77 80 82 112

Step 2: location of Q1 = ¼ (n + 1) = ¼ (10 + 1) = 2.75

The location 2.75 lies between the 2nd value (65) and the 3rd value (67), so:

Value of Q1 = Start + ratio × distance = 65 + 0.75 (67 – 65) = 66.5 million $

Comment: Q1 of profits is 66.5 million $, which represents the value at 25% distance of the ordered dataset.
Inter Quartile Range (IQR)

50 65 67 70 72 75 77 80 82 112

Step 3: location of Q3 = ¾ (n + 1) = ¾ (10 + 1) = 8.25

The location 8.25 lies between the 8th value (80) and the 9th value (82), so:

Value of Q3 = Start + ratio × distance = 80 + 0.25 (82 – 80) = 80.5 million $
Comment: Q3 of profits is 80.5 million $, which represents the value at 75% distance of the ordered dataset.
Step 4: IQR = Q3 – Q1 = 80.5 – 66.5 = 14 million $
Comment: IQR of profits is 14 million $, which represents the range of 50% distance of the ordered dataset after excluding the lowest and the highest 25% of the ordered dataset.
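The "Start + ratio × distance" rule can be sketched as a small Python helper. The name `value_at_location` is hypothetical, and the helper assumes the location falls between 1 and n. As a cross-check, the standard-library `statistics.quantiles` with its default `method="exclusive"` applies the same (n + 1) location rule.

```python
from statistics import quantiles

def value_at_location(ordered, location):
    """Start + ratio * distance for a 1-based, possibly fractional location.

    Assumes 1 <= location <= len(ordered).
    """
    whole = int(location)                 # position of the start value
    ratio = location - whole              # fractional part of the location
    start = ordered[whole - 1]
    if ratio == 0:
        return start
    distance = ordered[whole] - start     # gap to the next ordered value
    return start + ratio * distance

profits = sorted([67, 72, 65, 77, 75, 70, 80, 82, 50, 112])   # Step 1
n = len(profits)

q1 = value_at_location(profits, (n + 1) / 4)        # location 2.75
q3 = value_at_location(profits, 3 * (n + 1) / 4)    # location 8.25
iqr = q3 - q1

library_q1, _, library_q3 = quantiles(profits, n=4)  # default method is "exclusive"
print(q1, q3, iqr)
```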
Test of Outliers - Box Plot

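[Box plot: a box from Q1 to Q3 with the lower bound (LB) and upper bound (UB) on either side; values beyond the bounds are marked with * as outliers]

• Lower Bound (LB) = Q1 – 1.5 IQR

• Upper Bound (UB) = Q3 + 1.5 IQR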
Test of Skewness

\[ \text{Skewness Coefficient (SC)} = \frac{3\,(\text{Mean} - \text{Median})}{\text{SD}} \]

• Symmetric: SC = 0 ± 0.5 (from –0.5 to +0.5)

• Positively skewed (skewed to the right): SC is greater than +0.5

• Negatively skewed (skewed to the left): SC is less than –0.5

Test for Outliers

• Yes (outliers found): the data is treated as skewed → Median and IQR
• No (no outliers found): apply the test of skewness
  • Symmetric → Mean and SD
  • Skewed → Median and IQR
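The decision flow above can be written as a small function. This is only a sketch: the name `recommended_measures` and its arguments are hypothetical, and the ±0.5 thresholds are the ones stated in the test of skewness.

```python
def recommended_measures(has_outliers, skewness_coefficient=None):
    """Return the (central measure, dispersion measure) suggested by the flow chart."""
    if has_outliers:
        # Outliers found: treat the data as skewed.
        return ("median", "IQR")
    if skewness_coefficient is None:
        raise ValueError("without outliers, the skewness coefficient is required")
    if -0.5 <= skewness_coefficient <= 0.5:
        # Symmetric data: the mean and SD are suitable.
        return ("mean", "SD")
    # Skewed to the right or to the left.
    return ("median", "IQR")

print(recommended_measures(True))
print(recommended_measures(False, 0.2))
print(recommended_measures(False, 0.8))
```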
Example
a) Does this sample contain any extreme values? Justify your answer with a suitable test.
Answer
Test for the outliers - Box Plot
Step 1: put the values in order from the smallest to the largest

50 65 67 70 72 75 77 80 82 112

Step 2: location of Q1 = ¼ (n + 1) = ¼ (10 + 1) = 2.75

Value of Q1 = Start + ratio × distance = 65 + 0.75 (67 – 65) = 66.5 million $


Example

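LB = Q1 – 1.5 IQR = 66.5 – 1.5 (14) = 45.5
UB = Q3 + 1.5 IQR = 80.5 + 1.5 (14) = 101.5
[Box plot: the value 112 is marked with * beyond the upper bound of 101.5]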

Comment: ???
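The bounds on the box plot can be reproduced with the sketch below, which takes Q1 = 66.5 and Q3 = 80.5 from the earlier steps. The function name `outlier_bounds` is illustrative only.

```python
def outlier_bounds(q1, q3):
    """LB = Q1 - 1.5 IQR and UB = Q3 + 1.5 IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

profits = [50, 65, 67, 70, 72, 75, 77, 80, 82, 112]   # million $

lower_bound, upper_bound = outlier_bounds(66.5, 80.5)

# Any value outside [LB, UB] is flagged as an outlier.
outliers = [x for x in profits if x < lower_bound or x > upper_bound]

print(lower_bound, upper_bound, outliers)
```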
Example
b) According to your conclusion in part (a), calculate the best central and
the best absolute dispersion measure.
Answer
IQR = 14 million $

Comment: IQR of profits is 14 million $, which represents the range of 50% distance of the ordered dataset after excluding the lowest and the highest 25% of the ordered dataset.
Example
b) According to your conclusion in part (a), calculate the best central and the
best absolute dispersion measure.
Answer
Median
Step 1: put the values in order from the smallest to the largest

50 65 67 70 72 75 77 80 82 112
Step 2: location of the median (even sample size) – Case 2

\[ \frac{n}{2} = \frac{10}{2} = 5 \quad \text{and} \quad \frac{10}{2} + 1 = 6 \]

Step 3: value of the median = (72 + 75)/2 = 73.5 million $
Example
c) Assuming that the outlier(s) were not present in the sample, what would be the best central measure?
Answer
After removing 112

Median
Step 1: put the values in order from the smallest to the largest

50 65 67 70 72 75 77 80 82
Step 2: location of the median (odd sample size)

\[ \frac{n+1}{2} = \frac{9+1}{2} = 5 \]

Step 3: value of the median = 72 million $

\(x_i\)    \(\bar{x}\)    \(x_i - \bar{x}\)    \((x_i - \bar{x})^2\)
50         70.89          -20.89               436.35
65         70.89          -5.89                34.68
67         70.89          -3.89                15.12
70         70.89          -0.89                0.79
72         70.89          1.11                 1.23
75         70.89          4.11                 16.90
77         70.89          6.11                 37.35
80         70.89          9.11                 83.01
82         70.89          11.11                123.46

Example

\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = 70.89 \]

\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{748.89}{9-1} = 93.61 \]

Standard deviation (s) = \(\sqrt{\text{var}} = \sqrt{93.61}\) = 9.68 million $
Example

\[ \text{Skewness Coefficient} = \frac{3\,(\text{Mean} - \text{Median})}{\text{SD}} = \frac{3\,(70.89 - 72)}{9.68} = -0.34 \]
Comment: ???
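The skewness coefficient for the nine remaining profits can be checked with the sketch below. The function name `skewness_coefficient` is illustrative; the mean, median and sample standard deviation come from Python's `statistics` module.

```python
from statistics import mean, median, stdev

def skewness_coefficient(values):
    """SC = 3 * (mean - median) / SD, with SD using the n - 1 denominator."""
    return 3 * (mean(values) - median(values)) / stdev(values)

profits = [50, 65, 67, 70, 72, 75, 77, 80, 82]   # million $, after removing 112

sc = skewness_coefficient(profits)
is_symmetric = -0.5 <= sc <= 0.5                 # thresholds from the test of skewness

print(round(mean(profits), 2), median(profits), round(stdev(profits), 2))
print(round(sc, 2), is_symmetric)
```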
Coefficient of Variation
Can be used to compare the variability of two or more sets of data measured in different units.

\[ CV = \left( \frac{S}{\bar{X}} \right) \times 100\% \]

Rule: the lower the CV, the higher the level of homogeneity.
Coefficient of Variation

Question Two: The prices of stock A and stock B were recorded over several months as follows.
Stock A: 10 10 12 10 11 11 10 11 10 9
Stock B: 9 10 12 7 10 16 10 15 10
The standard deviation of stock A is 0.843. The variance of stock B is 8.933 and its mean is 10.6.
Which stock would you prefer to buy, and why? Comment on the results.
Stock A

\[ CV = \left( \frac{S}{\bar{X}} \right) \times 100\% \]

\[ \bar{X} = \frac{10 + 10 + 12 + 10 + 11 + 11 + 10 + 11 + 10 + 9}{10} = 10.4 \]

\[ C.V_A = \frac{0.843}{10.4} \times 100 \approx 8.11\% \]
Stock B

\[ CV = \left( \frac{S}{\bar{X}} \right) \times 100\% \]

\[ S = \sqrt{8.933} \approx 2.99 \]

\[ C.V_B = \frac{2.99}{10.6} \times 100 \approx 28.2\% \]
Comment

Since \(C.V_A < C.V_B\), the prices of stock A are more homogeneous than the prices of stock B. As a result, we would prefer to buy stock A.
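The comparison of the two stocks can be sketched in Python using only the summary figures given in the question (SD of stock A, variance and mean of stock B) and the mean of stock A computed from its prices. The function name `coefficient_of_variation` is illustrative.

```python
from math import sqrt

def coefficient_of_variation(sd, mean_value):
    """CV = (S / X-bar) * 100, expressed as a percentage."""
    return sd / mean_value * 100

stock_a = [10, 10, 12, 10, 11, 11, 10, 11, 10, 9]
mean_a = sum(stock_a) / len(stock_a)                     # 10.4

cv_a = coefficient_of_variation(0.843, mean_a)           # SD of A given in the question
cv_b = coefficient_of_variation(sqrt(8.933), 10.6)       # variance and mean of B given

# Rule: the lower the CV, the higher the level of homogeneity.
preferred = "A" if cv_a < cv_b else "B"

print(round(cv_a, 1), round(cv_b, 1), preferred)
```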
Thank you

See you next lecture
