Lesson 5
OF
AGRICULTURE & TECHNOLOGY
JKUAT SODeL
Nairobi, Kenya
E-mail: [email protected]
STA 2100 Probability and Statistics I
LESSON 5
Numerical Summaries of Data (Grouped
Frequency Distributions)
Learning outcomes
Upon completing this topic, you should be able to:
Calculate and interpret the measures of central tendency
©2014
5.1. Introduction
In our last lecture we looked at the measures of central tendency,
measures of dispersion/variability and location for ungrouped
data. We saw different ways of computing the mean, that is
a Frequency Distribution), where we showed how to construct
a frequency distribution (also known as grouped frequency) table.
Therefore, in this lesson, we shall go straight to the use of
these tables assuming that we are now familiar with the con-
classes or categories and determine the number of individuals
belonging to each class or group or category. Such a
number is what we refer to as class frequency. When data
are arranged in classes together with the corresponding
Example. Consider the following data, which give the masses
of 100 male JKUAT IT students as recorded in the frequency
table below.
JKUAT: Setting trends in higher Education, Research and Innovation
Mass (kg)   No. of students   Relative frequency   Cumulative frequency
60-62              5                 0.05                    5
63-65             18                 0.18                   23
66-68             42                 0.42                   65
69-71             27                 0.27                   92
72-74              8                 0.08                  100
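The last two columns of the table can be rebuilt from the class frequencies alone. The following is a minimal sketch, assuming the frequencies are typed in by hand; the variable names are illustrative and not part of the lesson.

```python
from itertools import accumulate

# Class frequencies for the mass classes 60-62, 63-65, 66-68, 69-71, 72-74.
freqs = [5, 18, 42, 27, 8]

total = sum(freqs)                        # N, the total number of students
rel_freqs = [f / total for f in freqs]    # relative frequency = f / N
cum_freqs = list(accumulate(freqs))       # running total of the frequencies

for f, r, c in zip(freqs, rel_freqs, cum_freqs):
    print(f, r, c)
```

The final cumulative frequency always equals N, which is a quick check that no class has been left out.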
Class intervals and class limits: The symbol showing a class
such as 60-62 is called a class interval. The numbers 60 and
62 are called class limits. The value 60 is the Lower Class
Limit (LCL) while 62 is the Upper Class Limit (UCL).
Class size or interval or width: This is the difference between
the upper class boundary (UCB) and the lower class boundary (LCB);
for instance, 62.5 − 59.5 = 3.
Class mark/mid-mark: This is the midpoint of the class interval,
obtained by taking the average of the UCB and LCB. In
Example. The speeds, to the nearest mile per hour, of 120
vehicles passing a checkpoint were recorded and grouped as
follows:
Speed (mph)    21-25   26-30   31-35   36-45   46-60
No. of cars      22      48      25      16       9
Estimate the mean of this distribution.
Solution
First, we need to work out the mid-interval values for the
first interval 21-25 using the LCB = 20.5 and UCB = 25.5
Speed (mph)   Midpoint, x     f      fx
21-25            23          22     506
26-30            28          48    1344
31-35            33          25     825
36-45            40.5        16     648
46-60            53           9     477
Total                       120    3800
Hence, mean $\bar{x} = \sum fx / \sum f = 3800/120 = 31\tfrac{2}{3}$
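The calculation can be checked with a short script. This is only a sketch: the function name `grouped_mean` is illustrative, and the fourth class is taken to be 36-45, an assumption that is consistent with the stated total of 3800.

```python
def grouped_mean(boundaries, freqs):
    """Estimate the mean from (LCB, UCB) pairs and class frequencies."""
    # Class mark = average of the lower and upper class boundaries.
    midpoints = [(lo + hi) / 2 for lo, hi in boundaries]
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return total_fx / sum(freqs)

# Classes 21-25, 26-30, 31-35, 36-45 (assumed), 46-60, given as class boundaries.
speed_boundaries = [(20.5, 25.5), (25.5, 30.5), (30.5, 35.5),
                    (35.5, 45.5), (45.5, 60.5)]
speed_freqs = [22, 48, 25, 16, 9]

speed_mean = grouped_mean(speed_boundaries, speed_freqs)
print(round(speed_mean, 2))
```

Because every vehicle in a class is treated as if it travelled at the class mark, the result is an estimate of the mean rather than its exact value.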
$x_i = a + d_i$, where $d_i$ is the deviation of $x_i$ from the assumed
mean $a$.
Thus, the mean $\bar{x} = \sum f_i x_i / \sum f_i = \sum f_i (a + d_i) / \sum f_i$;
expanding this and simplifying the expression, we get
$\bar{x} = a + \sum f_i d_i / \sum f_i$, which we can simply write as
$\bar{x} = a + \sum fd / \sum f$ without the subscripts.
So the mean, $\bar{x}$ = assumed mean + mean deviation from the assumed mean.
where $\bar{u}$ is the mean of $u$.
The formula $\bar{x} = a + c\bar{u}$ is what we refer to as the coding
method for computing the mean and other measures from
a frequency distribution table. It is mainly useful when class
Mass (kg)   No. of students, f   Class mark, x   d = x − a   u = d/3    fu
60-62               5                61             -6         -2      -10
63-65              18                64             -3         -1      -18
66-68              42                67              0          0        0
69-71              27                70              3          1       27
72-74               8                73              6          2       16
Total          $\sum f = 100$                                     $\sum fu = 15$
Thus, using the coding formula we have $\bar{x} = a + c\bar{u}$, where
$\bar{u} = \sum fu / N = 15/100 = 0.15$ (remember $\sum f = N$),
implying $\bar{x} = a + c\bar{u} = 67 + 3(0.15) = 67.45$ kg
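The coding method is easy to verify numerically. Below is a minimal sketch using the assumed mean a = 67 and class width c = 3 from the table; the function name `coded_mean` is illustrative only.

```python
def coded_mean(class_marks, freqs, a, c):
    """Mean via the coding method: x_bar = a + c * u_bar, with u = (x - a) / c."""
    u = [(x - a) / c for x in class_marks]
    u_bar = sum(f * ui for f, ui in zip(freqs, u)) / sum(freqs)
    return a + c * u_bar

mass_marks = [61, 64, 67, 70, 73]   # class marks of 60-62, ..., 72-74
mass_freqs = [5, 18, 42, 27, 8]

mass_mean = coded_mean(mass_marks, mass_freqs, a=67, c=3)
print(round(mass_mean, 2))  # 67.45
```

Any class mark may serve as the assumed mean; choosing one near the centre of the table simply keeps the coded values u small.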
5.2.3. Median for a grouped frequency
As previously mentioned, there is loss of individual identities;
hence we cannot calculate the exact value of the median, but it
can be estimated using two methods.
this)
Using the interpolation formula
Given grouped frequency data, the best that we can do is to estimate
the group/class that contains the median item and hence
obtain the ‘theoretical’ value. To achieve this objective, we proceed
as follows:
Step 1: Form a Cumulative Frequency (CF) column
Step 2: Find N/2
Step 3: Find the F value that first exceeds N/2; this
identifies the median class M
Step 4: Calculate the median using the formula
$\text{median} = L_M + \left(\frac{N/2 - F_{M-1}}{f_M}\right) C_M$
Where:
$L_M$ is the lower class boundary of the median class
Age in Years 20-25 25-30 30-35 35-40 40-45 45-50
No. of reps 2 14 29 43 33 9
Solution:
Using the procedure illustrated above, we have
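The four steps of the interpolation procedure can be sketched in code as follows. This is an illustration under stated assumptions: the function name `grouped_median` is hypothetical, all classes are assumed to share one width, and the formula is taken with the cumulative frequency of the preceding class subtracted from N/2.

```python
from itertools import accumulate

def grouped_median(lower_boundaries, width, freqs):
    """Interpolated median of a grouped frequency table with equal class widths."""
    n = sum(freqs)
    cum = list(accumulate(freqs))                         # Step 1: CF column
    half = n / 2                                          # Step 2: N/2
    m = next(i for i, F in enumerate(cum) if F > half)    # Step 3: median class
    F_prev = cum[m - 1] if m > 0 else 0                   # CF of the class before it
    # Step 4: interpolate within the median class.
    return lower_boundaries[m] + (half - F_prev) / freqs[m] * width

age_lower = [20, 25, 30, 35, 40, 45]    # lower boundaries of 20-25, ..., 45-50
age_freqs = [2, 14, 29, 43, 33, 9]

age_median = grouped_median(age_lower, 5, age_freqs)
print(round(age_median, 2))
```

For this table N/2 = 65 and the cumulative frequencies are 2, 16, 45, 88, 121, 130, so the median class is 35-40.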
5.2.4. Mode for a grouped frequency:
Sometimes a set of data is obtained where it is appropriate to
measure a representative value in terms of ‘popularity’. The
mode of a set of data is that value which occurs most often or
Step 3: Calculate $D_2$ = difference between the largest frequency
and the frequency immediately following it.
Step 4: Use the interpolation formula
$\text{mode} = L_m + \left(\frac{D_1}{D_1 + D_2}\right) C_m$
Where:
$D_2 = 43 - 33 = 10$
$L_m = 35$
$C_m = 5$
Hence, mode $= 35 + \left(\frac{14}{14 + 10}\right)5 = 35 + \left(\frac{14}{24}\right)5 = 37.92$ years
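The same interpolation can be written as a small function. In this sketch, D1 is taken to be the difference between the largest frequency and the frequency immediately before it, which matches the value 14 = 43 − 29 used in the worked figures; the function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Interpolated mode of a grouped frequency table with equal class widths."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])    # index of the modal class
    before = freqs[m - 1] if m > 0 else 0                 # frequency just before
    after = freqs[m + 1] if m < len(freqs) - 1 else 0     # frequency just after
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    return lower_boundaries[m] + d1 / (d1 + d2) * width

age_lower = [20, 25, 30, 35, 40, 45]    # lower boundaries of 20-25, ..., 45-50
age_freqs = [2, 14, 29, 43, 33, 9]

age_mode = grouped_mode(age_lower, 5, age_freqs)
print(round(age_mode, 2))  # 37.92
```

If two classes tie for the largest frequency, this sketch picks the first one; a bimodal table would need separate treatment.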
5.3.1. Variance and Standard deviation
In most cases, we are interested in a measure that can
be used for further statistical analysis of a set of data. In that
case, the variance and standard deviation are measures that can
cessful sales made by the salesmen employed by a large micro-
computer firm in a particular quarter. Calculate the standard
deviation of the number of sales.
No. of sales           0 to 4   5 to 9   10 to 14   15 to 19   20 to 24   25 to 29
No. of salesmen, f        1       14        23         21         15          6
Solution:
We can solve this problem by first finding the midpoints, then
computing the mean and then the variance.
No. of sales   No. of salesmen, f   Midpoint, x     fx      $fx^2$
0 to 4                  1                2            2         4
5 to 9                 14                7           98       686
10 to 14               23               12          276      3312
15 to 19               21               17          357      6069
20 to 24               15               22          330      7260
25 to 29                6               27          162      4374
Total                  80                          1225    21,705
Hence, mean, $\bar{x} = 1225/80 = 15.31$ sales
sd $= \sqrt{\frac{21705}{80} - (15.3125)^2} = \sqrt{271.31 - 234.47} = \sqrt{36.84} \approx 6.1$ sales
Note: In this case we have assumed that we are dealing with
the whole population, hence we divide by
n and not n − 1.
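As a check on the arithmetic, the population standard deviation of the sales data can be computed directly from the midpoints and frequencies. This is a sketch only; the function name `grouped_sd` is illustrative, and it divides by n, in line with the note above.

```python
import math

def grouped_sd(midpoints, freqs):
    """Population standard deviation: sqrt(sum(f*x^2)/n - mean^2)."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    mean_of_squares = sum(f * x * x for f, x in zip(freqs, midpoints)) / n
    return math.sqrt(mean_of_squares - mean ** 2)

sales_midpoints = [2, 7, 12, 17, 22, 27]    # midpoints of 0-4, 5-9, ..., 25-29
sales_freqs = [1, 14, 23, 21, 15, 6]

sales_sd = grouped_sd(sales_midpoints, sales_freqs)
print(round(sales_sd, 1))  # 6.1
```

Working with the unrounded mean inside the function avoids the small rounding drift that creeps in when 15.31 is squared by hand.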
We can use the coding method to find the standard deviation
figures, we have
$= \sum f(a + cu - a - c\bar{u})^2 / \sum f$; simplifying the equation we
have
$= c^2 \sum f(u - \bar{u})^2 / \sum f$
ample 1. Using the above formulas, we can obtain the standard
deviation as follows:
First, remember, we already obtained the mean x̄ = 67.45kg
Mass (kg)   No. of students, f   Class mark, x   $x - \bar{x}$   $(x - \bar{x})^2$   $f(x - \bar{x})^2$
60-62               5                61            -6.45          41.6025           208.0125
Alternatively, we can use the coding method.
Example. Use the coding method to find the standard
deviation of the same data set.
Mass (kg)   No. of students, f   Class mark, x   u = (x − 67)/3    fu    $fu^2$
60-62               5                61               -2          -10      20
63-65              18                64               -1          -18      18
66-68              42                67                0            0       0
69-71              27                70                1           27      27
72-74               8                73                2           16      32
Total             100                                              15      97
Again, we can say that we already obtained the mean $\bar{x} = 67.45$ kg
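Thus, variance $= c^2\left[\left(\sum fu^2 / \sum f\right) - \bar{u}^2\right] = 9\left[97/100 - 0.15^2\right] = 8.5275$,
so the standard deviation is $\sqrt{8.5275} \approx 2.92$ kg.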
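The coded variance can be confirmed with a few lines of code. The sketch below assumes a = 67 and c = 3, the values that produce the coded column u = −2, …, 2; the function name `coded_variance` is illustrative.

```python
import math

def coded_variance(class_marks, freqs, a, c):
    """Population variance via coding: c^2 * (sum(f*u^2)/N - u_bar^2)."""
    n = sum(freqs)
    u = [(x - a) / c for x in class_marks]
    u_bar = sum(f * ui for f, ui in zip(freqs, u)) / n
    mean_u_sq = sum(f * ui * ui for f, ui in zip(freqs, u)) / n
    return c ** 2 * (mean_u_sq - u_bar ** 2)

mass_marks = [61, 64, 67, 70, 73]
mass_freqs = [5, 18, 42, 27, 8]

mass_var = coded_variance(mass_marks, mass_freqs, a=67, c=3)
mass_sd = math.sqrt(mass_var)
print(round(mass_var, 4), round(mass_sd, 2))  # 8.5275 2.92
```

The shift a has no effect on the spread, which is why only the scale factor c appears (squared) in the final formula.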
5.3.2. Mean absolute deviation (MAD)
Also known as the mean deviation, it is defined by $\sum f|x - \bar{x}| / \sum f$
or $\sum f|x - \bar{x}| / N$,
where $|x - \bar{x}|$ is the absolute difference between the value
|x| = −x if x < 0
For instance, | − 56| = 56 , |9| = 9 or | − 3.8| = 3.8
Example. Given the data in Example 3, obtain the MAD
for the data.
Solution:
Mass (kg)   No. of students, f   Class mark, x   $x - \bar{x}$   $|x - \bar{x}|$   $f|x - \bar{x}|$
60-62               5                61            -6.45           6.45            32.25
63-65              18                64            -3.45           3.45            62.1
66-68              42                67            -0.45           0.45            18.9
69-71              27                70             2.55           2.55            68.85
72-74               8                73             5.55           5.55            44.4
Total          $\sum f = 100$                                                        226.5
Hence, MAD = 226.5/100 = 2.265 ≈ 2.27 kg
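The column of products f|x − x̄| is easy to get wrong by hand, so it is worth recomputing. A minimal sketch follows, assuming the mass data with class marks 61 to 73; the function name `grouped_mad` is illustrative.

```python
def grouped_mad(class_marks, freqs):
    """Mean absolute deviation about the mean: sum(f * |x - x_bar|) / N."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, class_marks)) / n
    return sum(f * abs(x - mean) for f, x in zip(freqs, class_marks)) / n

mass_marks = [61, 64, 67, 70, 73]
mass_freqs = [5, 18, 42, 27, 8]

# Row-by-row products f * |x - 67.45|, useful for checking the table column.
products = [f * abs(x - 67.45) for f, x in zip(mass_freqs, mass_marks)]
mass_mad = grouped_mad(mass_marks, mass_freqs)
print([round(p, 2) for p in products], round(mass_mad, 3))
```

Printing the individual products alongside the total makes it straightforward to spot which row a discrepancy comes from.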
5.4. Measures of location/Position
5.4.1. Quartiles and Percentiles
We have already discussed how to find the median of grouped
data. The process of obtaining the quartiles and percentiles in
$Q_2$ is the $\frac{2}{4}n$th value
$Q_3$ is the $\frac{3}{4}n$th value
Exercise 1. Jua Kali Solicitors monitored the time spent
on consultations with a random sample of 120 of their clients.
Time (min)       10-14  15-19  20-24  25-29  30-34  35-44  45-59  60-89  90-119
No. of clients     2      5     17     33     27     25      7      3      1
(a) Obtain the estimates of the median and quartiles of this
distribution.
(b) Comment on the skewness of the distribution.
5.5. Combining sets of Data
There are some instances where we have been given (a) the number
of observations, (b) the mean and (c) the standard deviation
for each data set, but we need to combine the data.
(b) Find the mean and standard deviation of the number of
errors per page for the 250 pages.
Solution:
(a) $\bar{x} = \sum x / n = 920/200 = 4.6$
pages
Thus, total number of errors $= \sum x + \sum y = 920 + 220 = 1140$
Mean $= 1140/250 = 4.56$ and
(standard deviation)$^2 = \left(\sum x^2 + \sum y^2\right)/250 - 4.56^2$
$= \frac{5032 + 1210}{250} - 4.56^2 = 4.1744$
Standard deviation $= \sqrt{4.1744} = 2.04$ (3 s.f.)
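Combining two data sets only needs each set's size, sum and sum of squares. The sketch below uses the totals from this solution; the sizes 200 and 50 are inferred from the divisor 200 in part (a) and the combined 250 pages, and the function name `combine` is illustrative.

```python
import math

def combine(n1, sum_x, sum_x2, n2, sum_y, sum_y2):
    """Mean and population sd of two data sets pooled into one."""
    n = n1 + n2
    mean = (sum_x + sum_y) / n
    variance = (sum_x2 + sum_y2) / n - mean ** 2
    return mean, math.sqrt(variance)

# First set: 200 pages; second set: 50 pages (sizes inferred, see above).
combined_mean, combined_sd = combine(200, 920, 5032, 50, 220, 1210)
print(round(combined_mean, 2), round(combined_sd, 2))  # 4.56 2.04
```

If only the mean and standard deviation of each set are given, the sums can be recovered first from sum = n × mean and sum of squares = n × (sd² + mean²).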
5.6. Summary
The relationship between mean, median and mode is as follows.
The median lies between the mean and mode but closer
to the mean by a factor of 2 to 1. Hence the relationship
Learning Activities
Briefly show how you can use the coding method to obtain
the mean and standard deviation of a simple frequency
distribution table.
Solutions to Exercises
Exercise 1. For grouped continuous data with n = 120,
$Q_1$ is the $\frac{1}{4}n$th value, i.e. the 30th value,
$Q_2$ is the $\frac{2}{4}n$th value, i.e. the 60th value,
Time           CF
9.5-14.5         2
14.5-19.5        7
19.5-24.5       24
24.5-29.5       57
29.5-34.5       84
34.5-44.5      109
44.5-59.5      116
59.5-89.5      119
89.5-119.5     120
For solution (a)
Q1 lies in the interval 24.5-29.5 (width=5)
There are 33 items in this interval
So $Q_1 = 24.5 + \frac{6}{33} \times 5 = 25.4$ min
$Q_2$ lies in the interval 29.5-34.5 (width=5)
There are 27 items in this interval
So $Q_2 = 29.5 + \frac{3}{27} \times 5 \approx 30$ min
Q3 lies in the interval 34.5-44.5 (width=10)
$\left(\frac{N/2 - F_{M-1}}{f_M}\right) C_M$
Solution (b)
Q3 − Q2 = 6.9 min, Q2 − Q1 = 4.6 min
Since $Q_3 - Q_2 > Q_2 - Q_1$, the distribution has a positive
skew
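All three quartiles come from the same interpolation, with N/2 replaced by pN for p = 1/4, 2/4 and 3/4. The sketch below applies it to the class boundaries and frequencies of this exercise; the function name `grouped_quantile` is illustrative, and unequal class widths are handled by reading the width from each pair of boundaries.

```python
from itertools import accumulate

def grouped_quantile(boundaries, freqs, p):
    """Interpolated p-quantile (0 < p < 1) of a grouped frequency table."""
    target = p * sum(freqs)
    cum = list(accumulate(freqs))
    k = next(i for i, F in enumerate(cum) if F >= target)   # class holding the item
    lo, hi = boundaries[k]
    F_prev = cum[k - 1] if k > 0 else 0
    return lo + (target - F_prev) / freqs[k] * (hi - lo)

time_boundaries = [(9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5),
                   (29.5, 34.5), (34.5, 44.5), (44.5, 59.5), (59.5, 89.5),
                   (89.5, 119.5)]
time_freqs = [2, 5, 17, 33, 27, 25, 7, 3, 1]

q1, q2, q3 = (grouped_quantile(time_boundaries, time_freqs, p)
              for p in (0.25, 0.5, 0.75))
print(round(q1, 1), round(q2, 1), round(q3, 1))
positively_skewed = (q3 - q2) > (q2 - q1)
```

Comparing the two inter-quartile gaps in code gives the same conclusion as part (b): the upper gap is the larger one.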
Exercise 2. $\sum x = 101.4$, $\sum x^2 = 102.83$, $n = 100$
Therefore, $\bar{x} = \sum x / n = 101.4/100 = 1.014$
So the mean volume is 1.014 litres.
$s = \sqrt{\sum x^2 / n - \bar{x}^2} = \sqrt{102.83/100 - 1.014^2} = 0.0101...$