Unit 1 Computational Statistics
Introduction to Statistics
What is Statistics
• The practice or science of collecting,
interpreting and analyzing numerical data in
large quantities, especially for the purpose of
inferring proportions in a whole from those
in a representative sample.
• Collection of methods for planning
experiments, obtaining data and then
organizing, summarizing, presenting,
analyzing, interpreting & drawing conclusions.
The mean is affected by outliers.
Statistical Data- Categorical, Numerical (Continuous)
Mean $ 189,848.18
Median $ 55,000.00
Mode $ 64,000.00
Task-2:
-Income is an example where the mean can be misleading. The correct
measure to use depends on the research that you are conducting.
-Research on income usually reports the median income rather than the
mean income.
-A few individuals earn much more than everyone else. They are the
outliers, and they pull the mean far away from the typical value.
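The effect of a single outlier on the mean can be sketched in a few lines of Python. The income figures below are hypothetical (they are not the data behind the slide's Mean/Median/Mode values); the point is only that the mean moves a lot while the median stays put.

```python
from statistics import mean, median

# Hypothetical incomes (assumption, not from the slides):
# nine typical earners plus one very high earner.
incomes = [40_000, 45_000, 50_000, 52_000, 55_000,
           55_000, 58_000, 60_000, 64_000, 1_500_000]

mean_income = mean(incomes)      # pulled upward by the outlier
median_income = median(incomes)  # middle of the sorted values

print(f"mean   = {mean_income:,.2f}")
print(f"median = {median_income:,.2f}")
```

With these numbers the mean lands far above what nine of the ten people earn, which is why the median is usually the better summary for income.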
Measure of Asymmetry- SKEWNESS
• Zero/ No skew:
• Negative/ Left skew:
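As a rough illustration of the sign of skewness, the sketch below computes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second). The slides do not say which skewness formula they use, so this choice and both sample lists are assumptions.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data (assumption): one symmetric list, one with a long left tail.
symmetric = [1, 2, 3, 4, 5]
left_skewed = [1, 7, 8, 9, 10]

skew_symmetric = skewness(symmetric)
skew_left = skewness(left_skewed)

print(skew_symmetric)  # zero skew
print(skew_left)       # negative value: left skew
```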
Measures of Variability
• Variance
• Standard Deviation
• Coefficient of variation
Standard Deviation
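• Variance is expressed in squared units, so its values are often
large and hard to interpret.
• SD is the square root of the variance, so it is smaller and in the
same units as the data.
• SD is the preferred measure of variability (for a
single dataset), as it is directly interpretable.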
Standard deviation is denoted by:
• σ_X or σ (for population)
• s_X or s (for sample)
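The two notations correspond to two slightly different calculations: the population version divides by N, the sample version by n − 1. The slides only give the symbols, so the sketch below is an illustration using Python's standard `statistics` module and a hypothetical data list.

```python
from statistics import pstdev, stdev

# Hypothetical data (assumption, not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(data)  # population SD: divides by N
s = stdev(data)       # sample SD: divides by n - 1

print(f"sigma (population) = {sigma:.3f}")
print(f"s     (sample)     = {s:.3f}")
```

For the same data the sample value is always a little larger, because it divides the same sum of squares by a smaller number.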
Mean – the simple average of the given data values:
●Example: 4, 5, 9, 3, 15, 6
●Mean x̄ = (4+5+9+3+15+6)/6 = 42/6 = 7
Variance: a measure of how far the data points differ from the mean
● Marks of Student A : 30, 50, 70, 100, 100
● Marks of Student B: 70, 70, 70, 70, 70
●The means (averages) of the two students’ marks are:
○Marks of Student A : mean = 70
○Marks of Student B : mean = 70
●But we know that the two data sets are not identical!
●So, variance will show how they differ.
●We want a way to represent this difference between the two
datasets numerically.
How to Calculate variance?
No.    Score of A (X)
1      30
2      50
3      70
4      100
5      100
Total  350
No.    X      X − X̄
1      30     30 − 70 = −40
2      50     50 − 70 = −20
3      70     70 − 70 = 0
4      100    100 − 70 = 30
5      100    100 − 70 = 30
Total  350
Example 1- Variance
No.    Score X   X − X̄            (X − X̄)²
1      30        30 − 70 = −40    1600
2      50        50 − 70 = −20    400
3      70        70 − 70 = 0      0
4      100       100 − 70 = 30    900
5      100       100 − 70 = 30    900
Total  350                        3800
Variance of Student A = 3800/5 = 760
Example 1- Variance
No.    Score of B (X)   X − X̄          (X − X̄)²
1      70               70 − 70 = 0    0
2      70               70 − 70 = 0    0
3      70               70 − 70 = 0    0
4      70               70 − 70 = 0    0
5      70               70 − 70 = 0    0
Total  350                             0
Variance of Student B = 0/5 = 0
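The table steps for the two students (subtract the mean, square, average) can be sketched in Python as follows. The helper name `population_variance` is made up for this illustration; it divides by N, as the slides do.

```python
def population_variance(values):
    """Average of the squared deviations from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # column X - mean
    squared = [d ** 2 for d in deviations]    # column (X - mean)^2
    return sum(squared) / n

student_a = [30, 50, 70, 100, 100]
student_b = [70, 70, 70, 70, 70]

var_a = population_variance(student_a)
var_b = population_variance(student_b)

print(var_a)  # 760.0
print(var_b)  # 0.0
```

Both students have the same mean of 70, but the variance separates them: A's marks are spread out, B's do not vary at all.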
Example 2- Variance
Drive  Mark  Mathew
1      28    27
2      22    27
3      21    28
4      26    6
5      18    27
Which driver was more
consistent?
Example 2- Variance
Drive   Mark's Score X   X − X̄   (X − X̄)²
1       28               5       25
2       22               −1      1
3       21               −2      4
4       26               3       9
5       18               −5      25
Totals  115                      64
X̄ = (28+22+21+26+18)/5 = 23
Example 2- Variance
Drive   Mathew's Score X   X − X̄   (X − X̄)²
1       27                 4       16
2       27                 4       16
3       28                 5       25
4       6                  −17     289
5       27                 4       16
Totals  115                        362
Mark’s Variance = 64 / 5 = 12.8
Mathew’s Variance = 362 / 5 = 72.4
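The same comparison can be checked with `statistics.pvariance` from the Python standard library, which computes the population variance (dividing by N, as in the tables). Treating the driver with the smaller variance as the more consistent one is the reading the example suggests.

```python
from statistics import pvariance

# Scores from the Example 2 table.
scores = {
    "Mark":   [28, 22, 21, 26, 18],
    "Mathew": [27, 27, 28, 6, 27],
}

# Population variance per driver.
variances = {name: pvariance(values) for name, values in scores.items()}

# The more consistent driver is the one with the smaller variance.
most_consistent = min(variances, key=variances.get)

for name, var in variances.items():
    print(f"{name}: variance = {var:.1f}")
print("More consistent:", most_consistent)
```

Both drivers have the same mean of 23, so the difference comes entirely from Mathew's single score of 6.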
●Population standard deviation (the square root of the variance):
$\sigma = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N}}$
Example – Standard
Deviation
Drive   Mark's Score X   X − X̄   (X − X̄)²
1       28               5       25
2       22               −1      1
3       21               −2      4
4       26               3       9
5       18               −5      25
Totals  115                      64
Mark’s Variance = 64 / 5 = 12.8
Mark’s Standard Deviation = √12.8 ≈ 3.58
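A short sketch of the last step, taking the square root of the population variance, applied to both drivers' scores. It assumes the population form (divide by N) used throughout these examples.

```python
import math

def population_sd(values):
    """Square root of the population variance (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)

mark = [28, 22, 21, 26, 18]
mathew = [27, 27, 28, 6, 27]

sd_mark = population_sd(mark)
sd_mathew = population_sd(mathew)

print(f"Mark's SD   = {sd_mark:.2f}")
print(f"Mathew's SD = {sd_mathew:.2f}")
```

Unlike the variance, these values are in the same units as the scores themselves, which makes them easier to read against the mean of 23.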
Example- Variance & Standard
Deviation
●You have just measured the heights of your dogs (in mm)
● The heights (at the shoulders) are: 600mm, 470mm,
170mm, 430mm and 300mm.
●Find out the Mean, the Variance, and the Standard Deviation.
●Your first step is to find the Mean:
● Mean = (600 + 470 + 170 + 430 + 300)/5
● Mean = 1970/5
● Mean = 394
●Now we calculate each dog's difference from the
Mean: 206, 76, −224, 36 and −94 (in mm)
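●To calculate the Variance, take each difference, square it, and
then average the result:
● Variance σ² = (206² + 76² + (−224)² + 36² + (−94)²)/5
= 108520/5
= 21704
●The Standard Deviation is the square root of the Variance:
● σ = √21704
= 147.32...
= 147 (to the nearest mm)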
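The whole dog-height calculation, step by step, as a minimal Python sketch. Variable names are illustrative; the variance divides by the number of dogs (5), matching the worked numbers.

```python
import math

heights_mm = [600, 470, 170, 430, 300]

# Step 1: the mean.
mean_mm = sum(heights_mm) / len(heights_mm)

# Step 2: each dog's difference from the mean.
differences = [h - mean_mm for h in heights_mm]

# Step 3: square the differences and average them (population variance).
variance = sum(d ** 2 for d in differences) / len(heights_mm)

# Step 4: the standard deviation is the square root of the variance.
sd_mm = math.sqrt(variance)

print("mean        =", mean_mm)
print("differences =", differences)
print("variance    =", variance)
print("sd          =", round(sd_mm), "mm (to the nearest mm)")
```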
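●The Standard Deviation is useful because it gives a yardstick for what
counts as a typical height. Now we can show which heights are within one
Standard Deviation (147mm) of the Mean, that is, between 247mm and 541mm:
470mm, 430mm and 300mm are within this range, while 600mm and 170mm are not.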
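The "within one Standard Deviation" check can be sketched with the standard `statistics` module. It uses the unrounded population SD (about 147.32 mm), which gives the same grouping as the rounded 147 mm here.

```python
from statistics import mean, pstdev

heights_mm = [600, 470, 170, 430, 300]

mu = mean(heights_mm)       # 394
sigma = pstdev(heights_mm)  # population SD, about 147.32

lower, upper = mu - sigma, mu + sigma

# Split the dogs by whether their height falls inside [mean - SD, mean + SD].
within = [h for h in heights_mm if lower <= h <= upper]
outside = [h for h in heights_mm if not (lower <= h <= upper)]

print(f"range: {lower:.0f} mm to {upper:.0f} mm")
print("within one SD :", within)
print("outside one SD:", outside)
```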