Lecture2
• Central tendency is a statistical measure that represents an entire distribution or
dataset with a single value.
• The three common measures (mean, median, and mode) each summarize the dataset using a
single value.
MEAN
[Figure: data points on a number line with ticks at 10, 20, 30, 40; one point is labeled as an outlier]
TRIMMED MEAN
• The average of all values after dropping a fixed number of extreme values from both
ends.
• Often preferable to the ordinary mean when outliers are present, because it reduces
the effect of extreme values (outliers).
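A minimal sketch of the trimmed mean described above. The function name `trimmed_mean`, the parameter `k` (number of values dropped from each end), and the sample marks are illustrative assumptions, not part of the slides.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2k < len(values)")
    data = sorted(values)
    kept = data[k:len(data) - k]  # drop k values from each end
    return sum(kept) / len(kept)


# Illustrative marks with one very low value (10).
marks = [10, 60, 65, 65, 65, 70, 70, 70, 75, 80]
ordinary = sum(marks) / len(marks)
trimmed = trimmed_mean(marks, 1)  # drops 10 and 80
print(ordinary, trimmed)
```

With `k = 1` the low value 10 no longer pulls the average down, which is the effect the second bullet refers to.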
MEDIAN
• The median is the middle number on a sorted list of the data.
• If there is an even number of data values, the middle value is one that is not actually
in the data set, but rather the average of the two values that divide the sorted data
into upper and lower halves.
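• If n is odd, median = $\left(\frac{n+1}{2}\right)$th value → 10, 15, 18, 22, 25, 28, 33, 43, 50 → 25
• If n is even, median = $\frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1\right)\text{th value}\right]$ → 10, 15, 18, 22, 25, 28, 33, 43, 50, 55 →
(25+28)/2 = 26.5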
• Not influenced by the extreme values (Outliers).
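The odd and even cases can be sketched in a few lines of Python. The function name `median` is chosen here for illustration; the two lists are the ones used on this slide.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]  # the ((n+1)/2)th value in 1-based counting
    return (data[mid - 1] + data[mid]) / 2  # average of (n/2)th and (n/2 + 1)th values


odd_case = median([10, 15, 18, 22, 25, 28, 33, 43, 50])
even_case = median([10, 15, 18, 22, 25, 28, 33, 43, 50, 55])
print(odd_case, even_case)
```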
PRACTICE PROBLEM - 1
• A group of 10 students appeared in a test. Their obtained marks are given below.
60, 70, 80, 75, 65, 70, 80, 70, 65, 65
Sorted Order
60 65 65 65 70 70 70 75 80 80
Median = 70
Mean = 70
PRACTICE PROBLEM - 2
• A group of 10 students appeared in a test. Their obtained marks are given below.
60, 70, 80, 75, 65, 70, 100, 70, 65, 65
Mean = 720/10 = 72
Without considering the 100, mean = 620/9 = 68.89
Median = 70
PRACTICE PROBLEM - 3
• A group of 10 students appeared in a test. Their obtained marks are given below.
60, 70, 80, 75, 65, 70, 10, 70, 65, 65
Mean = 630/10 = 63
Without considering the 10, mean = 620/9 = 68.89
Median = 67.5
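The three practice problems can be checked with Python's standard `statistics` module. This is only a verification sketch; the dictionary name `problems` is an illustrative choice.

```python
from statistics import mean, median

# Marks from Practice Problems 1 to 3.
problems = {
    1: [60, 70, 80, 75, 65, 70, 80, 70, 65, 65],
    2: [60, 70, 80, 75, 65, 70, 100, 70, 65, 65],
    3: [60, 70, 80, 75, 65, 70, 10, 70, 65, 65],
}

for number, marks in problems.items():
    print(number, mean(marks), median(marks))

# Mean of Problem 3 after removing the extreme value 10.
without_outlier = [m for m in problems[3] if m != 10]
print(round(mean(without_outlier), 2))
```

A single extreme mark shifts the mean from 70 to 72 or 63, while the median moves only from 70 to 67.5 at most.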
WEIGHTED MEAN
• Some values are intrinsically more variable than others, and highly variable
observations are given a lower weight.
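A minimal sketch of a weighted mean, assuming the usual definition (sum of weight × value divided by the sum of the weights). The measurements and weights below are hypothetical; the slides do not give a dataset for this definition.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical readings: the third one is considered less reliable,
# so it receives a lower weight.
readings = [10.0, 12.0, 20.0]
weights = [4, 4, 1]
result = weighted_mean(readings, weights)
print(result)
```

Here the low-weight reading 20.0 contributes less, so the weighted mean (12.0) sits below the ordinary mean (14.0).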
PRACTICE PROBLEM - 4
• In a software project, there could be several risks and these risks can be categorized
according to their level of severity. Calculate the expected value of the Damage.
• $E(X) = \sum_i p_i x_i$, where $x_i$ is the damage of risk $i$ and $p_i$ is its weight (probability).

Risk ID   Severity Level     Damage
1         Extreme (5)        250000
2         Significant (4)    150000
3         Moderate (3)       100000
4         Less (2)           50000
5         Insignificant (1)  10000
WEIGHTED MEDIAN
• Instead of the middle number, the weighted median is a value such that the sum of
the weights is equal for the lower and upper halves of the sorted list.
• X = 15 17 20 22 24
• 17 is the maximum value of the lower half
• 20 is the minimum value of the upper half
• Weighted Median = (17+20) / 2 = 18.5
• Median = 20
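The weights for this example are not shown in the text. The sketch below uses hypothetical weights (3, 3, 2, 2, 2), chosen only because they balance the two halves in the way the bullets describe; the function name `weighted_median` is also an assumption.

```python
def weighted_median(values, weights):
    """Value at which the weight below and the weight above are balanced."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0
    for i, (value, weight) in enumerate(pairs):
        running += weight
        # Exactly half of the weight lies at or below this value:
        # average it with the next value, as in the slide's example.
        if 2 * running == total and i + 1 < len(pairs):
            return (value + pairs[i + 1][0]) / 2
        if 2 * running > total:
            return value
    raise ValueError("values must be non-empty and weights positive")


x = [15, 17, 20, 22, 24]
hypothetical_weights = [3, 3, 2, 2, 2]  # assumed, not from the slides
print(weighted_median(x, hypothetical_weights))
print(weighted_median(x, [1, 1, 1, 1, 1]))  # equal weights give the ordinary median
```

With these weights the lower half {15, 17} and the upper half {20, 22, 24} each carry a total weight of 6, so the result is (17 + 20) / 2.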
MODE
DISPERSION
MEAN ABSOLUTE DEVIATION
• The mean of the absolute value of the deviations from the mean.
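As a small illustration of this definition, the sketch below computes the mean absolute deviation for a hypothetical list of values (the data and the function name are assumptions for illustration).

```python
def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("no data")
    center = sum(values) / n
    return sum(abs(x - center) for x in values) / n


data = [2, 4, 6, 8, 10]  # hypothetical values; mean is 6
mad = mean_absolute_deviation(data)
print(mad)
```

The deviations from the mean are 4, 2, 0, 2, 4, so the result is 12 / 5.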
(SAMPLE) VARIANCE
• The sum of squared deviations from the mean divided by n – 1, where n is the number of
data values.
• Approximately the average of the squared deviations.
• Why is the denominator n – 1 instead of n? The deviations are measured from the sample
mean rather than the unknown population mean, so dividing by n tends to underestimate the
population variance. Dividing by n – 1 makes the estimate slightly larger and gives an
unbiased estimate of the population variance.
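The effect of the denominator can be seen on a small hypothetical dataset. The sketch below computes both versions by hand and compares them with `statistics.variance` (n – 1) and `statistics.pvariance` (n) from the standard library; the data values are an assumption for illustration.

```python
from statistics import pvariance, variance

data = [2, 4, 6, 8, 10]  # hypothetical values
n = len(data)
mean = sum(data) / n
squared_deviations = sum((x - mean) ** 2 for x in data)

sample_var = squared_deviations / (n - 1)  # divides by n - 1
population_var = squared_deviations / n    # divides by n

print(sample_var, variance(data))
print(population_var, pvariance(data))
```

The n – 1 version is always the larger of the two, and the gap shrinks as n grows.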
PRACTICE EXAMPLE
MEDIAN OF MEDIAN ABSOLUTE DEVIATION
• The median of the absolute value of the deviations from the median.
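A short sketch of this measure using `statistics.median`. The data are hypothetical and include one extreme value to show that the result is barely affected by it.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(x - center) for x in values)


data = [1, 2, 3, 4, 100]  # hypothetical values with one outlier
result = median_absolute_deviation(data)
print(result)
```

The median is 3 and the absolute deviations are 2, 1, 0, 1, 97, whose median is 1.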
SUMMARY OF DISPERSION MEASURES
PRACTICE EXAMPLE
• Find the mean, median, mode, range, variance, standard deviation, mean absolute deviation and
median of the median absolute deviation for the following list of values:
FINDING PERCENTILE
[Figure: scale from 0 to 100 with the lower quartile, median, and upper quartile marked]
8, 9, 10, 10, 10, 11, 11, 11, 12, 13
• Total data points, n = 10
• 50th percentile = 50% X 10 = 5th value = 10
• Median = (10+11)/2 = 10.5
8, 9, 10, 10, 10, 11, 11, 11, 12, 13, 15, 16, 16, 17, 20
• Total data points, n = 15
• 50th percentile = 50% X 15 = 7.5th value = 11
• Median = 8th value = 11
• 20th percentile = 20% X 15 = 3rd value = 10
• 75th percentile = 75% X 15 = 11.25th value = 15 + (16-15)X0.25 = 15.25
• 95th percentile = 95% X 15 = 14.25th value = 17 + (20-17)X0.25 = 17.75
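The position rule used in these bullets (position = p% × n, with linear interpolation when the position is fractional) can be sketched as follows. This mirrors the slide's convention only; statistical libraries use several different percentile conventions and may return slightly different values. The function name and the clamping at the ends are assumptions.

```python
import math


def percentile(values, p):
    """p-th percentile using position = p * n / 100 (1-based), interpolating linearly."""
    data = sorted(values)
    n = len(data)
    pos = p * n / 100
    lower = math.floor(pos)
    frac = pos - lower
    if lower < 1:   # position falls before the first value
        return data[0]
    if lower >= n:  # position falls at or after the last value
        return data[-1]
    # data[lower - 1] is the lower-th value in 1-based counting
    return data[lower - 1] + (data[lower] - data[lower - 1]) * frac


d1 = [8, 9, 10, 10, 10, 11, 11, 11, 12, 13]
d2 = [8, 9, 10, 10, 10, 11, 11, 11, 12, 13, 15, 16, 16, 17, 20]
print(percentile(d1, 50))
print(percentile(d2, 20), percentile(d2, 50), percentile(d2, 75), percentile(d2, 95))
```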
INTERQUARTILE RANGE
• The difference between the 75th percentile and the 25th percentile.
• IQR = Q3 – Q1
• Range = max - min
ESTIMATING MEAN, MEDIAN FOR A GROUPED DATASET
Class Interval
1. Exclusive Interval
2. Inclusive Interval
EXAMPLE: EXCLUSIVE INTERVAL
85, 125, 210, 180, 160, 155, 135, 100 …..
CONVERTING INCLUSIVE TO EXCLUSIVE
Seconds Frequency
50.5 – 55.5 2
55.5 – 60.5 7
60.5 – 65.5 8
65.5 – 70.5 4
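The original inclusive intervals are not shown in the text. Assuming they were 51–55, 56–60, 61–65 and 66–70, the usual conversion widens each class by half of the gap between consecutive classes. The sketch below applies that rule; the function name and the assumed inclusive bounds are illustrative.

```python
def to_exclusive(intervals):
    """Convert inclusive (lower, upper) class bounds to exclusive ones.

    Assumes the classes are sorted and separated by the same gap.
    """
    if len(intervals) < 2:
        raise ValueError("need at least two classes to measure the gap")
    gap = intervals[1][0] - intervals[0][1]  # e.g. 56 - 55 = 1
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in intervals]


inclusive = [(51, 55), (56, 60), (61, 65), (66, 70)]  # assumed original classes
exclusive = to_exclusive(inclusive)
print(exclusive)
```

After the conversion, the upper bound of each class equals the lower bound of the next, so no value can fall between classes.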
FINDING MEAN, MEDIAN, VARIANCE FOR A GROUPED DATASET
Seconds   Frequency (Fi)   Mid point (Xi)   Fi * Xi   Fi * (Xi – Xbar)^2   Cumulative frequency (CFi)
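The columns above can be filled in programmatically. The sketch below uses the Seconds/Frequency classes from the converted table and the standard grouped-data estimates: mean = Σ Fi·Xi / n, median = L + ((n/2 – CF) / f) × h for the median class, and variance = Σ Fi·(Xi – Xbar)² / (n – 1). Dividing by n – 1 is an assumption carried over from the sample variance slide; the slide itself does not show the denominator.

```python
# (lower bound, upper bound, frequency) for each class
classes = [(50.5, 55.5, 2), (55.5, 60.5, 7), (60.5, 65.5, 8), (65.5, 70.5, 4)]

n = sum(f for _, _, f in classes)
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

# Grouped mean: every observation is approximated by its class midpoint.
grouped_mean = sum(f * x for (_, _, f), x in zip(classes, midpoints)) / n

# Grouped variance (assumed n - 1 denominator).
grouped_variance = sum(
    f * (x - grouped_mean) ** 2 for (_, _, f), x in zip(classes, midpoints)
) / (n - 1)


def grouped_median(classes):
    """Interpolate inside the class where the cumulative frequency reaches n/2."""
    total = sum(f for _, _, f in classes)
    half = total / 2
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("classes must contain at least one observation")


print(grouped_mean, grouped_median(classes), grouped_variance)
```

These are estimates, since the individual observations inside each class are not known.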
• Prepare a grouped dataset based on the above mentioned table for estimating mean, median and variance
of the first innings score.
• The class interval must be within 10 to 20.
• You can use inclusive/exclusive interval.
• Also estimate mean, median and variance of the first innings score for each ground and then compare
them.
CORRELATION
• Correlation is a statistic that measures the degree to which two variables move in
relation to each other.
• Correlation is Positive when the values increase together.
• Correlation is Negative when one value decreases as the other increases.
POSITIVE AND NEGATIVE CORRELATION
PEARSON’S CORRELATION COEFFICIENT
EXAMPLE
SCATTER PLOT ON DATASET
[Figure: scatter plot of the dataset; Y axis from 0 to 120, horizontal axis from 0 to 70]
FINDING THE VALUE OF R
X    Y    Xi – Xbar   Yi – Ybar   (Xi – Xbar)(Yi – Ybar)   (Xi – Xbar)^2   (Yi – Ybar)^2
43   99   1.833333    18          33                       3.361111        324
21   65   -20.1667    -16         322.6667                 406.6944        256
25   79   -16.1667    -2          32.33333                 261.3611        4
42   75   0.833333    -6          -5                       0.694444        36
57   87   15.83333    6           95                       250.6944        36
59   81   17.83333    0           0                        318.0278        0
Mean of X (Xbar) = 41.1667
Mean of Y (Ybar) = 81
Σ (Xi – Xbar)(Yi – Ybar) = 478
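The remaining step can be sketched in Python, assuming the standard definition of Pearson's coefficient, r = Σ(Xi – Xbar)(Yi – Ybar) / sqrt(Σ(Xi – Xbar)² · Σ(Yi – Ybar)²). The formula slide is not reproduced in the text, so this definition and the function name are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from the sums of deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


X = [43, 21, 25, 42, 57, 59]
Y = [99, 65, 79, 75, 87, 81]
r = pearson_r(X, Y)
print(round(r, 4))
```

For this table the sums are 478, about 1240.83 and 656, which gives r of roughly 0.53, a moderate positive correlation.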
USEFUL RESOURCES
THANK YOU