Measures of Location
Central tendency is the tendency of the values of a random variable to cluster around a middle value. It is measured
using the mean, median, or mode. A measure of central tendency describes where the centre of a data distribution lies,
and it is used to summarize the data set.
1. The mode : ungrouped data: the value which occurs with the highest frequency
2. The modal class : grouped data: the class with the highest frequency density
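3. The mean : ungrouped data: $\bar{x} = \frac{\sum x}{N}$; grouped data: $\bar{x} = \frac{\sum fx}{\sum f}$
Example (ungrouped): 0.9, 1.4, 2.8, 3.1, 5.6
$\bar{x} = \frac{0.9 + 1.4 + 2.8 + 3.1 + 5.6}{5} = 2.76$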
Example (grouped): Find the mean of each of the following tables.

Solution (number of instruments):
Number of instruments, x    Frequency, f    fx
1                           11              11
2                           10              20
3                           5               15
4                           3               12
5                           1               5
                            $\sum f = 30$   $\sum fx = 63$
$\bar{x} = \frac{63}{30} = 2.1$

Solution (speed):
Speed     Mid-interval value, x    Frequency, f     fx
21-25     23                       22               506
26-30     28                       48               1344
31-35     33                       25               825
36-45     40.5                     16               648
46-60     53                       9                477
                                   $\sum f = 120$   $\sum fx = 3800$
$\bar{x} = \frac{3800}{120} = 31\frac{2}{3}$
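The grouped-mean calculation above can be checked with a short Python sketch. The function name `grouped_mean` is hypothetical and chosen only for illustration; the numbers are taken from the two solution tables, with each speed class represented by its mid-interval value.

```python
def grouped_mean(values, freqs):
    """Return sum(f * x) / sum(f) for paired values (or class midpoints) and frequencies."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / total_f


# Table 1: number of instruments, x, with frequency, f
instruments_mean = grouped_mean([1, 2, 3, 4, 5], [11, 10, 5, 3, 1])

# Table 2: speed classes, each represented by its mid-interval value
speed_classes = [(21, 25), (26, 30), (31, 35), (36, 45), (46, 60)]
midpoints = [(low + high) / 2 for low, high in speed_classes]
speed_mean = grouped_mean(midpoints, [22, 48, 25, 16, 9])

print(round(instruments_mean, 2))  # 2.1
print(round(speed_mean, 2))        # 31.67
```

For grouped classes the result is only an estimate, because every value in a class is treated as if it were equal to the class midpoint.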
4. The median : ungrouped data: arrange the values in order of increasing size.
If n is odd, the median = the $\frac{1}{2}(n+1)$th value.
If n is even, the median = $\frac{\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1\right)\text{th value}}{2}$.
The first quartile, Q1, and third quartile, Q3:
If n is odd, find the median and delete it from the list, then split the remaining data into a lower half and an upper
half. The median of the lower half is Q1 and the median of the upper half is Q3.
If n is even, split the data into a lower half and an upper half. The median of the lower half is Q1 and the median of
the upper half is Q3.
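The rule for the median and quartiles of ungrouped data can be sketched in Python as follows. The function names and the two sample lists are illustrative assumptions, not data from this handout. Note that other textbooks and software use slightly different quartile conventions, so library functions may return different values.

```python
def median(sorted_values):
    """Median of a list that is already sorted in increasing order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]  # the (n + 1) / 2-th value
    # average of the (n / 2)-th and (n / 2 + 1)-th values
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3); needs at least two values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # when n is odd the median itself is left out of both halves
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


# Hypothetical data sets, one with odd n and one with even n
print(quartiles([3, 7, 8, 5, 12, 14, 21, 13, 18]))  # (6.0, 12, 16.0)
print(quartiles([1, 3, 4, 6, 8, 9]))                # (3, 5.0, 8)
```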
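The median, lower quartile and upper quartile: grouped data: use a cumulative frequency graph to estimate the values.
[Figure: cumulative frequency graph. Reading across from cumulative frequencies $\frac{1}{4}n$, $\frac{1}{2}n$ and $\frac{3}{4}n$ on the vertical axis gives Q1, the median M and Q3 on the horizontal (X) axis.]
Using formula: $\text{Median} = L + \frac{\frac{N}{2} - F}{f} \times c$
Where L = Lower class boundary of median class
N = Total frequency
F = previous cumulative frequency of the median class
f = Frequency of median class
c = class size of median class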
Comparison of the mean, median and mode
The mean is the best average when the data are fairly symmetrical and there are no extreme values, so that it
represents a typical value. However, the median is preferable to the mean as an average when there are values which are
not typical (outliers) or the distribution is skewed. The mode is not a very useful measure of the centre, because it is
simply whichever value happens to be repeated most often, and that value need not lie near the middle of the data.
Mean = median = mode: the distribution is symmetrical.
Mean < median < mode: the distribution is skewed to the left (negatively skewed).
Mean > median > mode: the distribution is skewed to the right (positively skewed).
Measure of Spread
Range = largest value - smallest value
Interquartile range = Q3 - Q1
Box-and-whisker plots
[Figure: box-and-whisker plots marked with the smallest value, Q1, the median (Q2), Q3, the largest value, and an outlier plotted separately.]
Q3-Q2 ≈ Q2-Q1 (median is in the centre of the box) --> the distribution is symmetrical
Q3-Q2 > Q2-Q1 (median is nearer to the left of the box) --> the distribution is positively skewed
Q3-Q2 < Q2-Q1 (median is nearer to the right of the box) --> the distribution is negatively skewed
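The comparison of Q3-Q2 with Q2-Q1 can be written as a small Python sketch. The function name `box_shape`, the tolerance parameter `tol` (one possible reading of "≈") and the quartile values are all assumptions made for illustration.

```python
def box_shape(q1, q2, q3, tol=0.0):
    """Describe the shape suggested by the position of the median (q2) in the box."""
    upper = q3 - q2  # width of the box to the right of the median
    lower = q2 - q1  # width of the box to the left of the median
    if abs(upper - lower) <= tol:
        return "symmetrical"
    return "positively skewed" if upper > lower else "negatively skewed"


# Hypothetical quartile values
print(box_shape(2, 5, 8))  # symmetrical
print(box_shape(2, 3, 8))  # positively skewed
print(box_shape(2, 7, 8))  # negatively skewed
```

In practice a small positive `tol` is sensible, since real quartiles are rarely exactly equidistant from the median.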
Outliers
lower inner fence: Q1 - 1.5*(Q3-Q1)
upper inner fence: Q3 + 1.5*(Q3-Q1)
lower outer fence: Q1 - 3*(Q3-Q1)
upper outer fence: Q3 + 3*(Q3-Q1)
If no value lies beyond the inner fences, the data set does not contain any values which could be said to be outliers.
A point beyond an inner fence on either side is considered a mild outlier. A point beyond an outer fence is considered an
extreme outlier.
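The fence rules can be sketched in Python as below. The function names and the quartile values Q1 = 10 and Q3 = 20 are hypothetical, and "beyond" is read here as strictly outside the fence.

```python
def fences(q1, q3):
    """Return the inner and outer fences computed from the quartiles."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }


def classify(x, q1, q3):
    """Label a value as an extreme outlier, a mild outlier, or neither."""
    f = fences(q1, q3)
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "extreme outlier"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "mild outlier"
    return "not an outlier"


# Hypothetical quartiles: inner fences at -5 and 35, outer fences at -20 and 50
print(fences(10, 20))
print(classify(30, 10, 20))  # not an outlier
print(classify(40, 10, 20))  # mild outlier
print(classify(60, 10, 20))  # extreme outlier
```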
The standard deviation gives us a "standard" way of knowing what is normal, and what is extra large or extra
small. Rottweilers are tall dogs. And Dachshunds are a bit short ... but don't tell them!
Variance, $\sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}$ or $\sigma^2 = \frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2$
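The two forms of the variance formula give the same result, which can be checked numerically with a Python sketch. The helper names are hypothetical; the data are the number-of-instruments frequencies from the grouped-mean example earlier in this handout.

```python
import math


def variance_definition(xs, fs):
    """Variance as sum(f * (x - mean)^2) / sum(f)."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n


def variance_shortcut(xs, fs):
    """Variance as sum(f * x^2) / sum(f) - mean^2."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * x ** 2 for x, f in zip(xs, fs)) / n - mean ** 2


xs = [1, 2, 3, 4, 5]     # number of instruments
fs = [11, 10, 5, 3, 1]   # frequencies

v1 = variance_definition(xs, fs)
v2 = variance_shortcut(xs, fs)
sd = math.sqrt(v1)       # standard deviation is the square root of the variance

print(round(v1, 4), round(v2, 4))  # 1.2233 1.2233
```

The second form is usually quicker by hand because it needs only the totals of f, fx and fx².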
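Example:
Cumulative frequencies (cf): 0, 6, 22, 46, 71, 88
$\text{Median} = L + \frac{\frac{N}{2} - F}{f} \times c = 20 + \frac{\frac{88}{2} - 22}{24} \times 10 = 29.17$ (2 d.p.)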
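The interpolation in this example can be reproduced with a one-line Python function. The name `grouped_median` is hypothetical; the arguments follow the symbols in the formula, with the values L = 20, N = 88, F = 22, f = 24 and c = 10 read from the worked example.

```python
def grouped_median(L, N, F, f, c):
    """Estimate the median of grouped data by linear interpolation.

    L: lower class boundary of the median class
    N: total frequency
    F: cumulative frequency before the median class
    f: frequency of the median class
    c: class size of the median class
    """
    return L + (N / 2 - F) / f * c


estimate = grouped_median(L=20, N=88, F=22, f=24, c=10)
print(round(estimate, 2))  # 29.17
```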
What will happen to the mean, median, mode, range, variance and standard deviation if...?

Case 1: a constant X is added to each observation
mean --> mean+X
median --> median+X
mode --> mode+X
range --> nothing (same range)
Variance --> nothing (same)
Standard Deviation --> nothing (same)

Example: The numbers are 3,4,4,5 (4 cases): mean = 4, median = 4, mode = 4.
Add a constant 3 to each observation: 6,7,7,8. Mean = 7, Median = 7, Mode = 7.
They are all 3 more than what they were. The standard deviation will not change if you add a constant. You can verify
this.

Case 2: each observation is multiplied by a constant X
mean --> mean*X
median --> median*X
mode --> mode*X
range --> range*|X|
Standard Deviation --> |X|*SD
Variance --> variance*X^2

Example: Let us multiply each observation by 2: 3,4,4,5 becomes 6,8,8,10.
After multiplication: Mean = 8, Median = 8, Mode = 8.
The mean, median, and mode are also multiplied by 2. The standard deviation will also be multiplied by 2.
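Both cases can be verified on the data 3, 4, 4, 5 with Python's standard `statistics` module. This is a minimal sketch: the helper name `summary` is hypothetical, and the population versions `pvariance` and `pstdev` are used on the assumption that the variance is defined by dividing by the total frequency, as in the formula earlier in this handout.

```python
import statistics as st


def summary(data):
    """Collect the measures of location and spread discussed above."""
    return {
        "mean": st.mean(data),
        "median": st.median(data),
        "mode": st.mode(data),
        "range": max(data) - min(data),
        "variance": st.pvariance(data),  # population variance (divide by n)
        "sd": st.pstdev(data),           # population standard deviation
    }


data = [3, 4, 4, 5]
shifted = [x + 3 for x in data]  # Case 1: add 3 to each observation
scaled = [2 * x for x in data]   # Case 2: multiply each observation by 2

for label, values in [("original", data), ("add 3", shifted), ("times 2", scaled)]:
    print(label, summary(values))
```

The printed summaries show the mean, median and mode moving to 7 after the shift and to 8 after the scaling, while the variance stays at 0.5 after the shift and becomes 2.0 (that is, 0.5 × 2²) after the scaling.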
Exercise:
1. i) Draw a histogram and describe the skewness of the distribution. ii) Draw a cumulative frequency graph, then estimate
the value of the median and the interquartile range. iii) Find the mean, variance and standard deviation for each of the following.
2. i) Find the median, mean, standard deviation, range and interquartile range of each of the following data sets.
ii) Construct a box-and-whisker plot, then calculate the inner and outer fences. State, giving a reason, whether there
are any outliers, then comment on the shape of the distribution.
a) 7 4 14 9 12 2 19 6 15 30 4 8 9 10 7
b) 7.6 4.8 1.2 6.9 4.8 7.2 8.1 10.3 4.8 6.7 4.9 6.7 1.0 5.3
3. The 30 members of the Darton town orchestra each recorded the amount of individual practice, x hours, they did in the
first week of June. The results are summarised as follows:
$\sum x = 225, \quad \sum x^2 = 1755$
The mean and standard deviation of the number of hours of practice undertaken by the members of the Darton orchestra
in this week were μ and σ respectively.
Find a) μ and
b) σ.
Two new people joined the orchestra and the number of hours of individual practice they did in the first week of June
were μ-2σ and μ+2σ.
c) State, giving reasons, whether the effect of including these two members was to increase, decrease or leave
unchanged the mean and standard deviation.
4. Thirty children were given a task to perform and the times taken were recorded, each to the next whole number of
minutes above the actual time. The results were as follows:
12 20 14 17 17 8 19 13 27 13
16 18 10 7 22 16 11 18 13 6
16 12 14 23 15 8 10 17 16 19
5. As part of a detailed study of its workforce, a large company selected a random sample of 100 male employees and
recorded the length of time each employee had been with the company. The histogram illustrates the distribution
produced.
a) copy and complete the following table.
Time (Years) Number of males
0-
2-
5-
10-
15-
20-30
b) calculate estimates of the median and quartile times for the sample
c) an equivalent random sample of 100 female employees gave calculated estimates of 3.2 years for the median, and 1.8
years and 8.6 years for the quartile times. The longest serving woman in the sample had been with the company for 20
years. The sample also included a woman who had very recently joined the company. Draw adjacent box plots to
compare the distributions of the lengths of time male employees and female employees had been with the company.
d) List one difference between the two distributions as illustrated by the box plots.