Biostatistics
4
Measures of averages: mean, median, mode, geometric mean, harmonic mean; computation of the above statistics for raw and grouped data; merits and demerits.
Measures of location: percentiles, quartiles; computation of the above statistics for raw and grouped data.
In the study of a population with respect to a characteristic in which we are interested, we may get a
large number of observations. It is not possible to grasp any idea about the characteristic by
looking at all the observations. So it is better to get one number for the whole group. That number
must be a good representative of all the observations, giving a clear picture of the
characteristic. Such a representative number can be a central value for all these observations. This
central value is called a measure of central tendency, an average, or a measure of location.
There are five averages. Among them the mean, median and mode are called simple averages, and
the other two, the geometric mean and the harmonic mean, are called special averages.
Arithmetic mean or mean
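The arithmetic mean, or simply the mean, of a variable is defined as the sum of the
observations divided by the number of observations. It is denoted by the symbol $\bar{x}$. If the
variable x assumes n values $x_1, x_2, \ldots, x_n$, then the mean is given by
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$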
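Grouped Data
The mean for grouped data is obtained from the following formula (direct method):
$$\bar{x} = \frac{\sum f x}{n}$$
where x is the value (or the mid-point of the class), f is its frequency and n is the total frequency.
The short-cut method uses deviations from an assumed mean A:
$$\bar{x} = A + \frac{\sum f d}{n} \times c, \qquad d = \frac{x - A}{c}$$
Where
A = any value in x
n = total frequency
c = width of the class interval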
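Example 2
Given the following frequency distribution, calculate the arithmetic mean.

Marks (x)              :  64   63   62   61   60   59
Number of students (f) :   8   18   12    9    7    6

Solution

    x      f      fx    d = x - A     fd
   64      8     512        2         16
   63     18    1134        1         18
   62     12     744        0          0
   61      9     549       -1         -9
   60      7     420       -2        -14
   59      6     354       -3        -18
Total     60    3713                  -7

Direct method
$$\bar{x} = \frac{\sum fx}{n} = \frac{3713}{60} = 61.88$$
Short-cut method
Here A = 62, and c = 1 because the marks are single values.
$$\bar{x} = A + \frac{\sum fd}{n} \times c = 62 + \frac{-7}{60} = 61.88$$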
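The two methods for a discrete frequency distribution can be checked with a short Python sketch. The function names `mean_direct` and `mean_shortcut` are illustrative only; the data are the marks and student counts of Example 2, and the class width defaults to 1 because the marks are single values.

```python
def mean_direct(values, freqs):
    """Direct method: sum(f * x) / sum(f)."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n


def mean_shortcut(values, freqs, assumed_mean, width=1):
    """Short-cut method: A + (sum(f * d) / n) * c, with d = (x - A) / c."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / width for x, f in zip(values, freqs))
    return assumed_mean + (sum_fd / n) * width


marks = [64, 63, 62, 61, 60, 59]
students = [8, 18, 12, 9, 7, 6]

print(round(mean_direct(marks, students), 2))        # 61.88
print(round(mean_shortcut(marks, students, 62), 2))  # 61.88
```

Both methods give the same answer for any choice of the assumed mean A; a value near the centre of the data only keeps the hand arithmetic small.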
Example 3
For the frequency distribution of seed yield of sesamum given in the table, calculate the mean yield
per plot.

Yield per plot (in g) :  64.5-84.5   84.5-104.5   104.5-124.5   124.5-144.5
No. of plots          :      3           5             7             20

Solution

Yield (in g)     No. of plots (f)    Mid x    d = (x - A)/c     fd
64.5-84.5               3             74.5         -1           -3
84.5-104.5              5             94.5          0            0
104.5-124.5             7            114.5          1            7
124.5-144.5            20            134.5          2           40
Total                  35                                       44

Here A = 94.5 and c = 20.
The mean yield per plot is
Direct method:
$$\bar{x} = \frac{\sum fx}{n} = \frac{4187.5}{35} = 119.64 \text{ g}$$
Short-cut method:
$$\bar{x} = A + \frac{\sum fd}{n} \times c = 94.5 + \frac{44}{35} \times 20 = 119.64 \text{ g}$$
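For a continuous distribution the same calculation works on class mid-points. The sketch below assumes equal class widths and uses the sesamum yield classes of Example 3; the variable names are illustrative.

```python
# Class limits and frequencies from Example 3 (yield per plot in g).
classes = [(64.5, 84.5), (84.5, 104.5), (104.5, 124.5), (124.5, 144.5)]
plots = [3, 5, 7, 20]

mids = [(low + high) / 2 for low, high in classes]  # class mid-points
n = sum(plots)                                      # total frequency
c = classes[0][1] - classes[0][0]                   # common class width
A = 94.5                                            # assumed mean

# Direct method: sum(f * x) / n
direct = sum(f * x for x, f in zip(mids, plots)) / n

# Short-cut method: A + (sum(f * d) / n) * c, with d = (x - A) / c
steps = [(x - A) / c for x in mids]
shortcut = A + sum(f * d for d, f in zip(steps, plots)) / n * c

print(round(direct, 2), round(shortcut, 2))  # 119.64 119.64
```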
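Example 4
If the weights of sorghum ear heads are 45, 60, 48, 100, 65 g, calculate the median.
Solution
Here n = 5
First arrange the observations in ascending order:
45, 48, 60, 65, 100
$$\text{Median} = \text{size of the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the 3rd item} = 60 \text{ g}$$
Example 5
If the weights of sorghum ear heads are 5, 48, 60, 65, 65, 100 g, calculate the median.
Solution
Here n = 6, which is even, so the median is the average of the (n/2)th and (n/2 + 1)th items of the ordered data:
$$\text{Median} = \frac{\text{3rd item} + \text{4th item}}{2} = \frac{60 + 65}{2} = 62.5 \text{ g}$$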
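The raw-data rule (middle item for odd n, average of the two middle items for even n) can be sketched in Python as follows. `median_raw` is an illustrative name; the standard library's `statistics.median` applies the same rule.

```python
def median_raw(observations):
    """Median of raw data: middle item, or mean of the two middle items."""
    data = sorted(observations)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # ((n + 1) / 2)th item
    return (data[mid - 1] + data[mid]) / 2    # average of (n/2)th and (n/2 + 1)th items


print(median_raw([45, 60, 48, 100, 65]))      # 60
print(median_raw([5, 48, 60, 65, 65, 100]))   # 62.5
```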
Grouped data
In a grouped distribution, values are associated with frequencies. Grouping can be in the
form of a discrete frequency distribution or a continuous frequency distribution. Whatever
the type of distribution, cumulative frequencies have to be calculated to know the total
number of items.
Discrete series
Step1: Find cumulative frequencies.
Step2: Find $\frac{n+1}{2}$.
Step3: See in the cumulative frequencies the value just greater than $\frac{n+1}{2}$.
Step4: Then the corresponding value of x is the median.
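Example 6
The following data pertain to the number of insects per plant. Find the median number of insects
per plant.

Number of insects per plant (x) :  1   2   3   4   5   6   7   8   9  10  11  12
No. of plants (f)               :  1   3   5   6  10  13   9   5   3   2   2   1

Solution

    x      f     cf
    1      1      1
    2      3      4
    3      5      9
    4      6     15
    5     10     25
    6     13     38
    7      9     47
    8      5     52
    9      3     55
   10      2     57
   11      2     59
   12      1     60
Total     60

Median = size of the $\left(\frac{n+1}{2}\right)^{\text{th}}$ item = size of the 30.5th item.
Here the number of observations is even. Therefore median = average of the (n/2)th item and the
(n/2+1)th item, that is, of the 30th and 31st items. The cumulative frequency just greater than 30.5 is 38, so both items
take the value x = 6, and
Median = (6 + 6)/2 = 6 insects per plant.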
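The cumulative-frequency rule for a discrete series can be sketched in Python. `median_discrete` is an illustrative name. The sketch returns the first x whose cumulative frequency reaches (n + 1)/2, which is a simplification: when the two middle items fall on different x values it returns the larger one instead of averaging them.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    cum = list(accumulate(freqs))   # running totals of the frequencies
    n = cum[-1]                     # total frequency
    position = (n + 1) / 2          # position of the middle item
    for x, cf in zip(values, cum):
        if cf >= position:          # first cumulative frequency reaching the position
            return x


insects = list(range(1, 13))
plants = [1, 3, 5, 6, 10, 13, 9, 5, 3, 2, 2, 1]
print(median_discrete(insects, plants))  # 6
```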
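Continuous series
Step1: Find cumulative frequencies.
Step2: Find $\frac{n}{2}$.
Step3: See in the cumulative frequencies the value just greater than $\frac{n}{2}$. The corresponding
class interval is called the median class. Then apply the formula
$$\text{Median} = l + \frac{\frac{n}{2} - m}{f} \times c$$
where
l = lower limit of the median class
m = cumulative frequency preceding the median class
f = frequency of the median class
c = width of the class interval
n = total frequency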
Example 7
For the frequency distribution of weights of sorghum ear-heads given in the table below, calculate
the median.

Weights of ear     No. of ear     Less than     Cumulative
heads (in g)       heads (f)      class         frequency (m)
60-80                  22          <80              22
80-100                 38          <100             60
100-120                45          <120            105
120-140                35          <140            140
140-160                24          <160            164
Total                 164

Solution
$$\frac{n}{2} = \frac{164}{2} = 82$$
It lies between 60 and 105. Corresponding to 60 the less than class is 100 and corresponding to
105 the less than class is 120. Therefore the median class is 100-120. Its lower limit is 100.
Here l = 100, n = 164, f = 45, c = 20 and m = 60.
$$\text{Median} = l + \frac{\frac{n}{2} - m}{f} \times c = 100 + \frac{82 - 60}{45} \times 20 = 100 + 9.78 = 109.78 \text{ g}$$
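The median of a continuous distribution can be sketched in Python, assuming the usual interpolation formula Median = l + ((n/2 - m) / f) × c, where l is the lower limit of the median class, m the cumulative frequency before it, f its frequency and c the class width. `median_grouped` is an illustrative name and equal class widths are assumed.

```python
from itertools import accumulate


def median_grouped(lower_limits, width, freqs):
    """Interpolated median for equal-width classes."""
    cum = list(accumulate(freqs))
    n = cum[-1]
    half = n / 2
    for i, cf in enumerate(cum):
        if cf >= half:                       # median class found
            m = cum[i - 1] if i > 0 else 0   # cumulative frequency before the class
            return lower_limits[i] + (half - m) / freqs[i] * width


lower = [60, 80, 100, 120, 140]
ear_heads = [22, 38, 45, 35, 24]
print(round(median_grouped(lower, 20, ear_heads), 2))  # 109.78
```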
Merits of Median
1. Median is not influenced by extreme values because it is a positional average.
2. Median can be calculated in case of distribution with open-end intervals.
3. Median can be located even if the data are incomplete.
Demerits of Median
1. A slight change in the series may bring drastic change in median value.
2. In case of even number of items or continuous series, median is an estimated value other than
any value in the series.
3. It is not suitable for further mathematical treatment except its use in calculating mean
deviation.
4. It does not take into account all the observations.
Mode
The mode is the value in a distribution that occurs most frequently. It is an
actual value, which has the highest concentration of items in and around it. It shows the centre of
concentration of the frequencies around a given value. Therefore, it is preferred where the purpose
is to know the point of highest concentration. It is, thus, a positional measure.
The mode is of great importance in agriculture, for example to find the typical height of a crop variety,
the most common source of irrigation in a region, or the most disease-prone paddy variety. Thus the mode
is an important measure in the case of qualitative data.
Solution
The maximum frequency is 16. The corresponding x value is 75.
mode = 75 gms.
Continuous distribution
Locate the highest frequency; the class corresponding to that frequency is called the modal class.
Then apply the formula
$$\text{Mode} = l + \frac{f_s}{f_p + f_s} \times c$$
Where $l$ = lower limit of the modal class
$f_p$ = the frequency of the class preceding the modal class
$f_s$ = the frequency of the class succeeding the modal class
and
c = class interval
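Example 10
For the frequency distribution of weights of sorghum ear-heads given in the table below, calculate
the mode.

Weights of ear heads (g)    No. of ear heads (f)
60-80                               22
80-100                              38
100-120                             45
120-140                             35
140-160                             20
Total                              160

Solution
The highest frequency is 45, so the modal class is 100-120.
$$\text{Mode} = l + \frac{f_s}{f_p + f_s} \times c$$
Here $l = 100$, $f_p = 38$, $f_s = 35$ and $c = 20$.
$$\text{Mode} = 100 + \frac{35}{38 + 35} \times 20 = 100 + \frac{700}{73} = 100 + 9.589 = 109.589 \text{ g}$$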
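The result 109.589 is reproduced by the formula Mode = l + (f_s / (f_p + f_s)) × c, where f_p and f_s are the frequencies of the classes before and after the modal class. The Python sketch below assumes that formula and equal class widths; `mode_grouped` is an illustrative name. Other textbooks use a different interpolation based on frequency differences, which would give a different value.

```python
def mode_grouped(lower_limits, width, freqs):
    """Mode = l + f_s / (f_p + f_s) * c for equal-width classes."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f_p = freqs[i - 1] if i > 0 else 0                 # frequency of preceding class
    f_s = freqs[i + 1] if i < len(freqs) - 1 else 0    # frequency of succeeding class
    # Assumes f_p + f_s > 0, i.e. the modal class has at least one non-empty neighbour.
    return lower_limits[i] + f_s / (f_p + f_s) * width


lower = [60, 80, 100, 120, 140]
ear_heads = [22, 38, 45, 35, 20]
print(round(mode_grouped(lower, 20, ear_heads), 3))  # 109.589
```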
Geometric mean
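The geometric mean of a series containing n observations is the nth root of the product of the
values.
If $x_1, x_2, \ldots, x_n$ are the observations, then
$$G.M. = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$$
$$\log GM = \frac{1}{n}\log(x_1 x_2 \cdots x_n) = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{\sum \log x_i}{n}$$
$$GM = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right)$$
For grouped data
$$GM = \text{Antilog}\left(\frac{\sum f \log x_i}{n}\right)$$
GM is used in studies like bacterial growth, cell division, etc.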
Example 11
If the weights of sorghum ear heads are 45, 60, 48, 100, 65 g, find the geometric mean.

Weight of ear head x (g)    Log x
45                          1.653
60                          1.778
48                          1.681
100                         2.000
65                          1.813
Total                       8.925

Solution
Here n = 5
$$GM = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right) = \text{Antilog}\left(\frac{8.925}{5}\right) = \text{Antilog}(1.785) = 60.95 \text{ g}$$
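The log-table calculation can be mirrored in Python with `math.log10`. `geometric_mean` is an illustrative name here (the standard library also provides `statistics.geometric_mean`). Because the code uses full-precision logarithms rather than three-decimal table values, its result agrees with 60.95 only up to rounding in the second decimal place.

```python
import math


def geometric_mean(observations):
    """Antilog of the mean of the base-10 logarithms."""
    logs = [math.log10(x) for x in observations]
    return 10 ** (sum(logs) / len(logs))


weights = [45, 60, 48, 100, 65]
print(round(geometric_mean(weights), 1))
```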
Grouped Data
Example 12
Find the Geometric mean for the following
Weight of sorghum (x)
50
65
75
80
95
100
Solution

Weight of        No. of ear
sorghum (x)      heads (f)      Log x      f x log x
50                   5          1.699        8.495
63                  10          1.799       17.990
65                   5          1.813        9.065
130                 15          2.114       31.710
135                 15          2.130       31.950
Total               50                      99.210

Here n = 50
$$GM = \text{Antilog}\left(\frac{\sum f \log x_i}{n}\right) = \text{Antilog}\left(\frac{99.21}{50}\right) = \text{Antilog}(1.9842) = 96.43$$
Continuous distribution
Example 13
For the frequency distribution of weights of sorghum ear-heads given in the table below,
calculate the geometric mean.

Weights of ear heads (in g)    No. of ear heads (f)
60-80                                  22
80-100                                 38
100-120                                45
120-140                                35
140-160                                20
Total                                 160

Solution

Weights of ear      No. of ear
heads (in g)        heads (f)      Mid x     Log x     f log x
60-80                   22           70      1.845      40.59
80-100                  38           90      1.954      74.25
100-120                 45          110      2.041      91.85
120-140                 35          130      2.114      73.99
140-160                 20          150      2.176      43.52
Total                  160                             324.20

Here n = 160
$$GM = \text{Antilog}\left(\frac{\sum f \log x_i}{n}\right) = \text{Antilog}\left(\frac{324.2}{160}\right) = \text{Antilog}(2.02625) = 106.23 \text{ g}$$
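For grouped data the logarithm of each mid-point is weighted by its frequency. The sketch below assumes the formula GM = Antilog(Σ f log x / n) and uses the class mid-points of this example; `geometric_mean_grouped` is an illustrative name. With full-precision logarithms the result differs from the table-based 106.23 in the second decimal place.

```python
import math


def geometric_mean_grouped(values, freqs):
    """Antilog of the frequency-weighted mean of base-10 logarithms."""
    n = sum(freqs)
    weighted_logs = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (weighted_logs / n)


mid_points = [70, 90, 110, 130, 150]
ear_heads = [22, 38, 45, 35, 20]
print(round(geometric_mean_grouped(mid_points, ear_heads), 1))
```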
Harmonic mean (H.M)
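The harmonic mean of a set of observations is defined as the reciprocal of the arithmetic
average of the reciprocals of the given values. If $x_1, x_2, \ldots, x_n$ are n observations,
$$H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
For a frequency distribution,
$$H.M. = \frac{n}{\sum f \left(\frac{1}{x}\right)}, \qquad n = \sum f$$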
Example 13
From the given data 5, 10, 17, 24, 30 calculate the H.M.

   x       1/x
   5      0.2000
  10      0.1000
  17      0.0588
  24      0.0417
  30      0.0333
Total     0.4338

$$H.M. = \frac{n}{\sum \frac{1}{x}} = \frac{5}{0.4338} = 11.526$$
Example 14
The number of tomatoes per plant is given below. Calculate the harmonic mean.

Number of tomatoes per plant :  20   21   22   23   24   25
Number of plants             :   4    2    7    1    3    1

Solution

Number of tomatoes     No. of
per plant (x)          plants (f)      1/x       f (1/x)
20                         4          0.0500     0.2000
21                         2          0.0476     0.0952
22                         7          0.0454     0.3178
23                         1          0.0435     0.0435
24                         3          0.0417     0.1251
25                         1          0.0400     0.0400
Total                     18                     0.8216

$$H.M. = \frac{n}{\sum f \left(\frac{1}{x}\right)} = \frac{18}{0.8216} = 21.91$$
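Both harmonic mean calculations can be sketched with one Python function, assuming H.M. = n / Σ(f / x), with every frequency equal to 1 for raw data. `harmonic_mean` is an illustrative name (the standard library also offers `statistics.harmonic_mean`). The code keeps full precision in the reciprocals, so its results can differ from hand calculations with four-decimal reciprocals in the second decimal place.

```python
def harmonic_mean(values, freqs=None):
    """n / sum(f / x); raw data are treated as having frequency 1 each."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))


# Raw data: 5, 10, 17, 24, 30
print(round(harmonic_mean([5, 10, 17, 24, 30]), 1))

# Grouped data: tomatoes per plant and number of plants
tomatoes = [20, 21, 22, 23, 24, 25]
plants = [4, 2, 7, 1, 3, 1]
print(round(harmonic_mean(tomatoes, plants), 1))
```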
Merits of H.M
1. It is rigidly defined.
2. It is defined on all observations.
3. It is amenable to further algebraic treatment.
4. It is the most suitable average when it is desired to give greater weight to smaller observations
and less weight to the larger ones.
Demerits of H.M
1. It is not easily understood.
2. It is difficult to compute.
3. It is only a summary figure and may not be the actual item in the series
4. It gives greater importance to small items and is therefore, useful only when small items have
to be given greater weightage.
5. It is rarely used in grouped data.
Percentiles
The percentile values divide the distribution into 100 parts, each containing 1 percent of
the cases. The xth percentile is that value below which x percent of the values in the distribution fall.
It may be noted that the median is the 50th percentile.
For raw data, first arrange the n observations in increasing order. Then the xth percentile
is given by
$$P_x = \left(\frac{x\,(n+1)}{100}\right)^{\text{th}} \text{ item}$$
For a frequency distribution the xth percentile is given by
$$P_x = l + \frac{\frac{x \cdot N}{100} - cf}{f} \times C$$
Where
$l$ = lower limit of the percentile class, which contains the xth percentile value (x · N / 100)
$cf$ = cumulative frequency up to the class preceding the percentile class
$f$ = frequency of the percentile class
C = class interval
N = total number of observations
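Percentile for Raw Data or Ungrouped Data
Example 15
The following are the paddy yields (kg/plot) from 14 plots:
30, 32, 35, 38, 40, 42, 48, 49, 52, 55, 58, 60, 62 and 65 (after arranging in ascending order). The
computation of the 25th percentile (Q1) and the 75th percentile (Q3) is given below:
$$P_{25} = \left(\frac{25\,(14+1)}{100}\right)^{\text{th}} \text{ item} = 3.75^{\text{th}} \text{ item} = 35 + 0.75\,(38 - 35) = 37.25 \text{ kg}$$
$$P_{75} = \left(\frac{75\,(14+1)}{100}\right)^{\text{th}} \text{ item} = 11.25^{\text{th}} \text{ item} = 58 + 0.25\,(60 - 58) = 58.5 \text{ kg}$$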
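The raw-data rule can be sketched in Python, assuming the xth percentile is the (x(n + 1)/100)th item of the ordered data, with linear interpolation between neighbouring items when the position is fractional. `percentile_raw` is an illustrative name. Library routines such as NumPy's percentile use other position conventions by default and may return slightly different values.

```python
def percentile_raw(observations, p):
    """p-th percentile as the (p * (n + 1) / 100)th ordered item, interpolated."""
    data = sorted(observations)
    n = len(data)
    position = p * (n + 1) / 100   # 1-based, possibly fractional position
    k = int(position)              # whole part of the position
    frac = position - k            # fractional part
    if k < 1:
        return data[0]
    if k >= n:
        return data[-1]
    return data[k - 1] + frac * (data[k] - data[k - 1])


yields = [30, 32, 35, 38, 40, 42, 48, 49, 52, 55, 58, 60, 62, 65]
print(percentile_raw(yields, 25))  # 3.75th item: 35 + 0.75 * (38 - 35) = 37.25
print(percentile_raw(yields, 75))  # 11.25th item: 58 + 0.25 * (60 - 58) = 58.5
```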
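Example 16
The frequency distribution of weights of 190 sorghum ear-heads is given below. Compute the 25th
percentile and the 75th percentile.

Weight of ear heads (in g)    No. of ear heads
40-60                                6
60-80                               28
80-100                              35
100-120                             55
120-140                             30
140-160                             15
160-180                             12
180-200                              9
Total                              190

Solution

Weight of ear       No. of         Less than     Cumulative
heads (in g)        ear heads      class         frequency
40-60                   6           <60               6
60-80                  28           <80              34
80-100                 35           <100             69
100-120                55           <120            124
120-140                30           <140            154
140-160                15           <160            169
160-180                12           <180            181
180-200                 9           <200            190
Total                 190

For $P_{25}$, $\frac{25 \times 190}{100} = 47.5$, and for $P_{75}$, $\frac{75 \times 190}{100} = 142.5$.
The value 47.5 lies between 34 and 69. Therefore, the percentile class is 80-100. Hence,
$$P_{25} = 80 + \frac{47.5 - 34}{35} \times 20 = 80 + 7.71 \text{ or } 87.71 \text{ g}$$
Similarly, the value 142.5 lies between 124 and 154, so the percentile class is 120-140 and
$$P_{75} = 120 + \frac{142.5 - 124}{30} \times 20 = 120 + 12.33 \text{ or } 132.33 \text{ g}$$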
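The grouped calculation can be sketched in Python, assuming the interpolation formula P_x = l + ((x·N/100 - cf) / f) × C, with cf the cumulative frequency before the percentile class. `percentile_grouped` is an illustrative name and equal class widths are assumed.

```python
from itertools import accumulate


def percentile_grouped(lower_limits, width, freqs, p):
    """Interpolated p-th percentile for equal-width classes."""
    cum = list(accumulate(freqs))
    total = cum[-1]
    target = p * total / 100
    for i, cf in enumerate(cum):
        if cf >= target:                         # percentile class found
            below = cum[i - 1] if i > 0 else 0   # cumulative frequency before it
            return lower_limits[i] + (target - below) / freqs[i] * width


lower = [40, 60, 80, 100, 120, 140, 160, 180]
ear_heads = [6, 28, 35, 55, 30, 15, 12, 9]
print(round(percentile_grouped(lower, 20, ear_heads, 25), 2))  # 87.71
print(round(percentile_grouped(lower, 20, ear_heads, 75), 2))  # 132.33
```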
Quartiles
The quartiles divide the distribution into four parts. There are three quartiles. The second
quartile divides the distribution into two halves and is therefore the same as the median. The first
(lower) quartile (Q1) marks off the first one-fourth, and the third (upper) quartile (Q3) marks off the
first three-fourths. It may be noted that the second quartile equals both the median and the 50th
percentile.
Raw or ungrouped data
First arrange the given data in increasing order and use the formulae for Q1 and Q3;
the quartile deviation, Q.D., is then given by
$$Q.D. = \frac{Q_3 - Q_1}{2}$$
Where
$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item and } Q_3 = 3\left(\frac{n+1}{4}\right)^{\text{th}} \text{ item}$$
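Example 18
Compute the quartiles for the data given below (grains/panicle): 25, 18, 30, 8, 15, 5, 10, 35, 40, 45
Solution
Arranged in ascending order: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45. Here n = 10.
$Q_1 = \left(\frac{10+1}{4}\right)^{\text{th}}$ item
= (2.75)th item
= 2nd item + $\frac{3}{4}$ (3rd item - 2nd item)
= 8 + $\frac{3}{4}$ (10 - 8)
= 8 + $\frac{3}{4}$ x 2
= 8 + 1.5
= 9.5
$Q_3$ = 3 x (2.75)th item
= (8.25)th item
= 8th item + $\frac{1}{4}$ (9th item - 8th item)
= 35 + $\frac{1}{4}$ (40 - 35)
= 35 + 1.25
= 36.25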
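The quartile rule for raw data can be sketched in Python, assuming Q1 is the ((n + 1)/4)th item and Q3 the (3(n + 1)/4)th item of the ordered data, with linear interpolation for fractional positions. The names `item_at` and `quartiles_raw` are illustrative. For n = 10 the positions are 2.75 and 8.25.

```python
def item_at(sorted_data, position):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    k = int(position)
    frac = position - k
    if k < 1:
        return sorted_data[0]
    if k >= len(sorted_data):
        return sorted_data[-1]
    return sorted_data[k - 1] + frac * (sorted_data[k] - sorted_data[k - 1])


def quartiles_raw(observations):
    """Return (Q1, Q3, quartile deviation) for raw data."""
    data = sorted(observations)
    n = len(data)
    q1 = item_at(data, (n + 1) / 4)
    q3 = item_at(data, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2


grains = [25, 18, 30, 8, 15, 5, 10, 35, 40, 45]
print(quartiles_raw(grains))  # (9.5, 36.25, 13.375)
```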
Discrete Series
Step1: Find cumulative frequencies.
Step2: Find $\frac{n+1}{4}$.
Step3: See in the cumulative frequencies the value just greater than $\frac{n+1}{4}$; then the
corresponding value of x is Q1.
Step4: Find $3\left(\frac{n+1}{4}\right)$.
Step5: See in the cumulative frequencies the value just greater than $3\left(\frac{n+1}{4}\right)$; then the
corresponding value of x is Q3.
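Example 19
Compute the quartiles for the data given below (insects/plant).

x :   5    8   12   15   19   24   30
f :   4    3    2    4    5    2    4

Solution

    x      f     cf
    5      4      4
    8      3      7
   12      2      9
   15      4     13
   19      5     18
   24      2     20
   30      4     24
Total     24

Here n = 24, so $\frac{n+1}{4} = 6.25$. The cumulative frequency just greater than 6.25 is 7, and the corresponding value of x is 8. Therefore Q1 = 8.
$3\left(\frac{n+1}{4}\right) = 18.75$. The cumulative frequency just greater than 18.75 is 20, and the corresponding value of x is 24. Therefore Q3 = 24.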
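The discrete-series steps can be sketched in Python. `quartiles_discrete` is an illustrative name; the sketch picks the first x whose cumulative frequency reaches (n + 1)/4 for Q1 and 3(n + 1)/4 for Q3, and uses the seven (x, f) pairs given in the statement of this example.

```python
from itertools import accumulate


def quartiles_discrete(values, freqs):
    """Q1 and Q3 of a discrete frequency distribution via cumulative frequencies."""
    cum = list(accumulate(freqs))
    n = cum[-1]

    def pick(position):
        # First value whose cumulative frequency reaches the position.
        for x, cf in zip(values, cum):
            if cf >= position:
                return x
        return values[-1]

    return pick((n + 1) / 4), pick(3 * (n + 1) / 4)


insects = [5, 8, 12, 15, 19, 24, 30]
plants = [4, 3, 2, 4, 5, 2, 4]
print(quartiles_discrete(insects, plants))  # (8, 24)
```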
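Continuous series
Step1: Find cumulative frequencies.
Step2: Find $\frac{n}{4}$.
Step3: See in the cumulative frequencies the value just greater than $\frac{n}{4}$; then the
corresponding class interval is called the first quartile class.
Step4: Find $3\left(\frac{n}{4}\right)$.
Step5: See in the cumulative frequencies the value just greater than $3\left(\frac{n}{4}\right)$; then the
corresponding class interval is called the 3rd quartile class. Then apply the respective formulae
$$Q_1 = l_1 + \frac{\frac{n}{4} - m_1}{f_1} \times c_1, \qquad Q_3 = l_3 + \frac{3\left(\frac{n}{4}\right) - m_3}{f_3} \times c_3$$
where, for the respective quartile class, $l$ is the lower limit, $m$ the cumulative frequency preceding the class, $f$ the frequency of the class and $c$ the width of the class interval.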
No. of students (f)     Cumulative frequency (cf)
        11                       11
        18                       29
        25                       54
        28                       82
        30                      112
        33                      145
        22                      167
        15                      182
        12                      194
        10                      204
Total  204
Questions
1. The middle value of an ordered series is called
a)2nd quartile
b) 5th decile
c) 50th percentile
b) bimodal
c) Trimodal
d) All of these