Basic Statistics II
Arithmetic mean (AM) = sum of the values divided by the number of values:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
where $\bar{X}$ is read as "X-bar" and $n$ is the sample size.
Measure of Central Tendency: The Mean (continued)
Disadvantage: the AM is affected by extreme values (outliers)
Data 1: 1, 2, 3, 4, 5
Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 2: 1, 2, 3, 4, 20
Mean = (1 + 2 + 3 + 4 + 20)/5 = 30/5 = 6
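A minimal Python sketch of this effect, using the two data sets above and nothing beyond the built-in `sum` and `len`:

```python
data_1 = [1, 2, 3, 4, 5]
data_2 = [1, 2, 3, 4, 20]  # same values, except the last one is an outlier

# Arithmetic mean = sum of the values / number of values
mean_1 = sum(data_1) / len(data_1)
mean_2 = sum(data_2) / len(data_2)

print(mean_1, mean_2)  # 3.0 6.0
```

Changing a single value from 5 to 20 doubles the mean from 3 to 6.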
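• Mean of grouped data: $\bar{X} = \frac{\sum fX}{n}$
• $\sum fX$ = sum of the cross products of the frequency $f$ of each class with the midpoint $X$ of that class
• $n$ = total number of observations = $\sum f$ = total frequency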
Solution
Lower   Upper   Midpoint (x)   Frequency (f)   fx
0       1       0.5            1               0.5
1       2       1.5            4               6
2       3       2.5            8               20
3       4       3.5            7               24.5
4       5       4.5            3               13.5
5       6       5.5            2               11
Total                          25              75.5
Mean = 75.5 / 25 = 3.02
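The solution table can be reproduced with a short Python sketch. The class limits and frequencies are copied from the table; the function name `grouped_mean` is only an illustrative choice.

```python
# (lower limit, upper limit, frequency) for each class in the solution table
classes = [(0, 1, 1), (1, 2, 4), (2, 3, 8), (3, 4, 7), (4, 5, 3), (5, 6, 2)]

def grouped_mean(rows):
    """Return sum(f * x) / sum(f), where x is the midpoint of each class."""
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in rows)
    n = sum(f for _, _, f in rows)
    return total_fx / n

print(round(grouped_mean(classes), 2))  # 3.02
```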
Example of Mean
Mean 142.25
Weighted Mean
• A weighted mean is a kind of average in which, instead of each data point contributing equally to the final mean, some data points carry more weight than others.
• It gives an average that takes into account the importance of each value to the overall result.
• Let $x_1, x_2, \ldots, x_n$ be the data points with weights $w_1, w_2, \ldots, w_n$. Then the weighted mean is given by
$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
• Find the average cost of labour per hour for each of the products
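A minimal sketch of a weighted mean (sum of weight times value, divided by the sum of the weights) in Python. The wage and hour figures below are hypothetical and are not the data of the exercise; the function name `weighted_mean` is also just an illustrative choice.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical figures, NOT from the slides: hourly wage of three grades of
# labour, and the hours of each grade that go into one unit of a product.
hourly_wage = [5.0, 7.0, 9.0]
hours_used = [1, 2, 5]

cost_per_hour = weighted_mean(hourly_wage, hours_used)
simple_average = sum(hourly_wage) / len(hourly_wage)
print(cost_per_hour, simple_average)  # 8.0 7.0
```

Because most of the hours come from the most expensive grade, the weighted cost per hour (8.0) is higher than the simple average of the three wages (7.0).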
GM = (7 × 8 × 10 × 12 × 18)^(1/5) = 10.388
Year   Interest Rate (%)   Return at the end of year   Return with agg rate
1      7                   107
2      8                   115.56
3      10                  127.12
4      12                  142.37
5      18                  168
GM     10.388                                          163.91
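The figures in the table can be checked with a short Python sketch. It assumes an initial amount of 100, which is what the year-1 return of 107 at 7% implies.

```python
import math

rates = [7, 8, 10, 12, 18]  # yearly interest rates (%) from the table

# Geometric mean of the five rates: fifth root of their product
gm = math.prod(rates) ** (1 / len(rates))

# Compound an initial 100 year by year at the actual rates
balance = 100.0
for r in rates:
    balance *= 1 + r / 100

# Compound the same 100 for five years at the single GM rate
balance_gm = 100.0 * (1 + gm / 100) ** len(rates)

print(round(gm, 3), round(balance, 2), round(balance_gm, 2))
```

The single GM rate gives about 163.91 after five years, against about 168 from compounding the actual yearly rates, matching the last two columns of the table.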
Relation
AM ≥ GM ≥ HM
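A quick numerical check of this relation, as a sketch using Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`). HM is taken here to be the harmonic mean, and the data are the five interest rates from the geometric mean example; the relation holds for positive values.

```python
from statistics import mean, geometric_mean, harmonic_mean

rates = [7, 8, 10, 12, 18]  # all values must be positive

am = mean(rates)            # arithmetic mean
gm = geometric_mean(rates)  # geometric mean
hm = harmonic_mean(rates)   # harmonic mean

print(round(am, 3), round(gm, 3), round(hm, 3))
print(am >= gm >= hm)  # True
```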
Measure of Central Tendency: The Median
In an ordered array, the middle number is the median, i.e. 50% of the data lie above it and 50% below.
Data 1: 1, 2, 3, 4, 5 Median = 3
Data 2: 1, 2, 3, 4, 20 Median = 3
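A short sketch with Python's standard `statistics` module, showing that replacing 5 by 20 leaves the median unchanged:

```python
from statistics import median

data_1 = [1, 2, 3, 4, 5]
data_2 = [1, 2, 3, 4, 20]  # the largest value is replaced by an outlier

# median() sorts the data and returns the middle value
median_1 = median(data_1)
median_2 = median(data_2)

print(median_1, median_2)  # 3 3
```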
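Median for grouped data
Median = $l + \frac{\frac{n}{2} - cf}{f} \times c$
where $l$ = lower limit of the median class
$n$ = total frequency
$cf$ = cumulative frequency of the classes preceding the median class
$f$ = frequency of the median class
$c$ = class interval of the median class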
Median
Advantages
• Not affected by extreme values
• Can be computed when there are open-ended classes
• Can be computed for ordinal categorical data
Disadvantages
• Arraying (ordering) the data is time consuming
• The mean is easier to use for estimating a population parameter
Measure of Central Tendency: The Mode
Value that occurs most often
• Not affected by extreme values
• Can be computed in case of categorical data or numerical data
• There may be no mode
• There may be several modes
Mode for grouped data
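Mode = $l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$
where $l$ = lower limit of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
$c$ = class interval of the modal class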
Class     Frequency
0-100     5
100-200   8
200-300   4
300-400   6
400-500   2
500-600   11   (modal class)
600-700   5
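A sketch of the grouped-data mode calculation for this table in Python. It assumes the usual formula, mode = lower limit + (f1 - f0) / (2·f1 - f0 - f2) × class interval, where f1 is the frequency of the modal class and f0, f2 are the frequencies of the classes before and after it. The function name `grouped_mode` is only an illustrative choice.

```python
# (lower limit, upper limit, frequency) for each class in the table
table = [
    (0, 100, 5), (100, 200, 8), (200, 300, 4), (300, 400, 6),
    (400, 500, 2), (500, 600, 11), (600, 700, 5),
]

def grouped_mode(rows):
    """Mode of grouped data, interpolated inside the modal class."""
    # The modal class is the class with the highest frequency
    i = max(range(len(rows)), key=lambda k: rows[k][2])
    lower, upper, f1 = rows[i]
    f0 = rows[i - 1][2] if i > 0 else 0              # class before
    f2 = rows[i + 1][2] if i < len(rows) - 1 else 0  # class after
    c = upper - lower                                # class interval
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * c

print(round(grouped_mode(table), 2))  # 560.0
```

Here the modal class is 500-600 with f1 = 11, f0 = 2, f2 = 5 and c = 100, so the mode is 500 + (9/15) × 100 = 560.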
Measure of Central Tendency: Which measure to use?
• The mean is generally used unless extreme values exist
• The median is used when there are outliers in the data
• In some situations it makes sense to use both the mean and the median
Read as: sigma
Interfractile Range
Variance 41.49717514
SD 6.441830108
Measure of Variations: Comparing Standard Deviations
Formulae: (Mean-Mode)/SD, (Mean-Median)/SD,
Locating Extreme Outliers: Z-score
• Example-
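A minimal sketch of locating extreme outliers with Z-scores in Python, where Z = (X - mean) / SD. The data are hypothetical (not from the slides), and the cut-off of |Z| > 3 is a commonly used convention that is assumed here.

```python
from statistics import mean, stdev

# Hypothetical data, NOT from the slides: 19 ordinary values and one extreme value
data = [10, 11, 12, 13, 12, 11, 10, 12, 13, 11,
        12, 10, 11, 13, 12, 11, 12, 10, 13, 45]

x_bar = mean(data)
s = stdev(data)  # sample standard deviation (divides by n - 1)

# Z-score of each value: how many standard deviations it lies from the mean
z_scores = [(x - x_bar) / s for x in data]

THRESHOLD = 3.0  # assumed cut-off for an "extreme" outlier
extreme = [x for x, z in zip(data, z_scores) if abs(z) > THRESHOLD]
print(extreme)  # [45]
```

Only the value 45 has a Z-score beyond the cut-off (about 4.2); every other value lies within half a standard deviation of the mean.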
Relation between two numerical variables
1. Covariance
2. Coefficient of correlation
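Both measures can be sketched in a few lines of Python. This assumes the sample versions (dividing by n - 1) and uses small hypothetical data, not the test scores from the slides; the function names are illustrative.

```python
import math

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Coefficient of correlation r = cov(X, Y) / (S_X * S_Y)."""
    s_x = math.sqrt(sample_covariance(x, x))  # cov(X, X) is the variance of X
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical data, NOT from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r = correlation(x, y)
print(cov_xy, round(r, 4))
```

The covariance depends on the units of X and Y, while r is unit-free and always lies between -1 and +1, which is why r is the easier of the two to interpret.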
Features of the coefficient of correlation
Interpreting the coefficient
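Test Score #1 (X)   Test Score #2 (Y)
78                  82
92                  88
86                  91
95                  90
83                  92

[Scatter plot: Test Score #1 (x) on the horizontal axis against Test Score #2 on the vertical axis, both axes running from 70 to 100]

Correlation matrix
                Test Score #1   Test Score #2
Test Score #1   1
Test Score #2   0.293643524     1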