Biostat English
1 OBJECTIVES
1.2 INTRODUCTION
We welcome the reader who wishes to learn biostatistics. In this chapter we introduce
you to the subject. First we define statistics and biostatistics, and then we give examples
where biostatistical techniques are useful. These examples show that biostatistics is
important in advancing our biological knowledge; biostatistics helps to evaluate many
life-and-death issues in medicine.
We advise you to read the examples carefully and then ask yourself, “What can be
inferred from the information presented?” What would you do with the data after they are
collected? How can they be presented, and what can you learn from them? We want you to realize
that biostatistics is a tool that can be used to benefit you and society.
There is no royal road to biostatistics. You need to be involved. You need to work
hard. You need to think. If you analyze the actual data, the result will be a powerful tool that
has immediate practical uses. Our main purpose is to develop thought patterns in your mind
that are useful in evaluating information in all areas of your life.
2. To choose the best therapy, a physician must compare the prognosis, or future course, of a
patient under several therapies. A therapy may be a success, a failure, or somewhere in
between; the evaluation of the chance of each occurrence necessarily enters into the decision.
Statistics is the science which deals with the collection, classifying, presenting, comparing
and interpreting numerical data collected to throw light on any sphere of enquiry (Lovitt).
The science of statistics is a most useful servant, but only of great value to those who
understand its proper use (W.I. King). Statistics provides tools and techniques for research
workers (A.M. Mood). Planning is the order of the day and without statistics planning is
inconceivable (L.H.C. Tippett).
Statistics may be defined as a science of numerical information which employs the process of
measurement and collection, classification, analysis, decision making and communication of
results in a manner understandable and verifiable by others (Cecil H. Meyers).
1. Some statistical methods are used more heavily in biostatistics than in other fields. For
example, a general statistical textbook would not discuss the life-table method of analyzing
survival data, which is important in many biostatistical applications. The topics in this book are
adapted to the applications in mind.
2. Examples are drawn from the biological, medical, and health care areas; this helps you
maintain motivation. It also helps you in understanding how to apply statistical methods.
3. A third reason for a book on biostatistics is to teach the material to the audience of health
professionals. In this case, the interaction between students and teacher, but especially among
the students themselves, is of great value in learning and applying the subject matter.
Some of the statistical symbols which are useful to biostatistics students are:
In larger universities where both a statistics and a biostatistics department exist, the
degree of integration between the two departments may range from the bare minimum to
very close collaboration. In general, the difference between a statistics program and a
biostatistics program is twofold: (i) statistics departments will often host
theoretical/methodological research, which is less common in biostatistics programs, and (ii)
statistics departments have lines of research that may include biomedical applications but
also other areas such as industry (quality control), business and economics, and biological
areas other than medicine.
There is a special need for the subject of biostatistics because it is related to areas such
as medicine, pharmacy, forestry and agriculture, which are essential for the betterment
of society.
The importance and application of statistics in the field of biology is increasing day
by day. Why is this so? The reason is that in biology the interplay of causal and response
variables follows laws that are not in the classic mold of 19th century physical science. In
that century, biologists such as Robert Mayer, Helmholtz, and others, in trying to show that
biological processes were nothing but physicochemical phenomena, helped create the
impression that the experimental methods and natural philosophy that had led to such
dramatic progress in the physical sciences should be imitated fully in biology.
Many biologists even to this day have retained the tradition of strictly mechanistic
and deterministic thinking, while physicists, as their science became more
refined and came to deal with ever more elementary particles, began to resort to statistical
approaches. In biology most phenomena are affected by many causal factors, uncontrollable
in their variation and often unidentifiable. Statistics is needed to measure such variable
phenomena with a predictable error and to ascertain the reality of minute but important
differences.
A Biostatistics centre could jointly organize working groups, the seminar series,
computing infrastructure and possibly consulting and clinical trials coordinating centre
services. The main objective of the centre would be to estimate, collaborate on, and circulate
results of research in a particular subspecialty for the following reasons:
The most critical short-term problem in the field of biostatistics is the information
system. We need to incorporate modern, web-based technologies into the everyday
workings of the department of biostatistics. We need reliable and accessible systems that
are competitive with those available to other departments of statistics and biostatistics. We
are likely to build collaborations with computer science students.
1.5.1 DATA
The information collected from census or surveys or from other sources is called raw data.
The word data means information. The adjective raw attached to data indicates that the
information collected cannot be used directly. It has to be converted into a more suitable form
before it begins to make sense and can be utilized gainfully. Raw data is like raw rice. Raw rice has
to be cooked properly and tastefully before it is eaten and digested. Similarly, raw data has to
be processed, for example by data tabulation, which gives meaning to the information collected.
Data are tabulated by (1) manual procedure, (2) mechanical procedure or (3) computer feeding.
In the preparation of tables the following principles are followed:
(i) A rough draft of the table should be prepared first. Before drawing out the
final table, rough draft should be examined carefully.
(ii) Headings of the rows and columns should be brief and clear.
(iii) Title, note, row and column are made specific, connoting meaning or
expressions.
(iv) Numbers of class intervals are decided as per aims of study which should not
be too small or too big.
(v) Symbols used, should be explained.
(vi) Tabulated data should specify the units of their measurements.
(vii) The sources from which data are obtained should be given.
Tabulated data will give some information and also allow for further analysis.
The columns and rows in a table strain the eye, and data presented in tabular form
may leave a poor visual impression. Well tabulated data can instead be
represented in the form of a picture, diagram or figure, which helps comparison
by giving a good visual impression. The representation of quantitative data
through charts and diagrams is known as graphical representation of statistical data. A
picture is said to be more effective than words for describing a particular thing or
phenomenon. The main objective of a diagram is to help the eye grasp a series of numbers
and the meaning of the data, and so to assist the intelligence.
There are various types of graphs in the form of charts and diagrams. Some of
them are:
The simplest type of graph that can be used to represent categorical data is the bar
diagram. It is also called a columnar diagram. Bar diagrams are drawn with
columns of equal width. In this diagram we show the categories of the variable on the X-axis
and the frequencies on the Y-axis on graph paper. A bar is drawn for each category of the
variable, and the height of the bar represents the frequency of that category.
Since the data are qualitative or quantitative of discrete type, the bars should not
touch each other and there should be an equal gap between two successive bars.
The following rules are observed while constructing a bar diagram:
Month: Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec
Patients: 285 315 250 289 386 410 452 620 421 186 450 500
Figure-1.5 Simple bar diagram
When two components are grouped in one set of variables, or different variables of one
component are put together, they are represented by a double bar diagram. In this
method, the different variables are shown with different rectangles. In the
above example, the patients were divided into two categories, male and female, and the data
are given below:
Month: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Male 100 250 150 189 270 200 350 275 215 86 300 200
Female: 185 115 100 100 116 210 102 345 206 100 150 300
Figure: double bar diagram of monthly patients (legend: Male, Female)
In a multiple bar diagram, the proportions of the subgroups within two or more categories
are represented by dividing each bar in proportion to the subgroups. It is also
advisable to take each bar as 100% and give each subcategory its proportion within the
bar.
Pie diagram is another graphical method for the representation of categorical data. (It
should not be confused with the mathematical constant π, the ratio of the circumference of a
circle to its diameter, which is approximately 22/7.) It is drawn to depict the total value of
the given attribute using a circle. In the pie chart, a circle (total 360°) is divided into
sectors with areas proportional to the frequencies or the relative frequencies of the
categories of a variable. Dividing the circle into the corresponding angles in degrees then
represents the subsets of the data. Hence, it is also called a Divided Circle Diagram.
Example 2. A household with a monthly salary of Rs. 7200 plans its budget for a month as
given below:
The total of the data corresponds to 360°. Let x° be the angle at the centre for item A; then,
to draw the pie chart for the data given in the above example, we find the angle for each category.
Calculation of Angles
For Food:
$$\text{Angle at centre} = \frac{f}{\sum f} \times 360^\circ = \frac{3000}{7200} \times 360^\circ = 150^\circ$$
Here f = frequency of food and ∑f = total frequency.
For Rent:
$$\text{Angle at centre} = \frac{f}{\sum f} \times 360^\circ = \frac{800}{7200} \times 360^\circ = 40^\circ$$
Similarly, we can calculate the remaining angles, and the total of the angles column should
always come to 360°.
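The angle calculation can be sketched in a few lines of Python. This is only an illustration: the function name `sector_angle` is hypothetical, and only the three amounts quoted in the text (Food, Rent, Miscellaneous) are used, since the remaining budget items are not reproduced here.

```python
def sector_angle(frequency, total):
    """Angle at the centre (in degrees) for one category of a pie chart."""
    return frequency / total * 360

# Amounts taken from the example; the other budget items are not listed here.
total_budget = 7200
known_items = {"Food": 3000, "Rent": 800, "Miscellaneous": 700}

angles = {name: sector_angle(amount, total_budget)
          for name, amount in known_items.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

When every category of the budget is included, the angles returned by such a function add up to 360.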
Table-2
Miscellaneous 700 35
Total 7200 360
Figure-1.7 Pie chart
1.8.2.3 Histogram
1. Class intervals must be exclusive. If the intervals are in inclusive form, convert them
to the exclusive form.
2. Draw rectangles with class intervals as bases and the corresponding frequencies as
heights.
3. If the intervals are equal, then the height of each rectangle is proportional to the
corresponding class frequency.
4. If the intervals are unequal, then the area of each rectangle is proportional to the
corresponding class frequency density.
Example 3. Draw a histogram for the following data showing the class interval and their
corresponding frequencies.
Frequency 4 10 18 8 6
Figure-1.8 Histogram
Example 4. Following is the distribution of shops according to the number of wage - earners
employed at a shopping complex.
Number of wage-earners   Number of shops   Frequency density
Under 5                  18                3.6
5 – 10                   27                5.4
10 – 20                  24                2.4
20 – 30                  20                2.0
30 – 50                  16                0.8
Illustrate the above table by a histogram, showing clearly how you deal with the unequal
class intervals.
Solution. When the class intervals are unequal, we construct each rectangle with the class
intervals as base and frequency density as height.
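The frequency densities used as rectangle heights can be checked with a short Python sketch. It assumes that the class "Under 5" means 0 to 5, which is consistent with the density 3.6 shown in the table; the function name is illustrative only.

```python
# (lower limit, upper limit, number of shops); "Under 5" is assumed to mean 0-5.
classes = [(0, 5, 18), (5, 10, 27), (10, 20, 24), (20, 30, 20), (30, 50, 16)]

def frequency_density(lower, upper, frequency):
    """Frequency per unit of class width: the rectangle height in the histogram."""
    return frequency / (upper - lower)

densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

for (lo, hi, f), density in zip(classes, densities):
    print(f"{lo}-{hi}: frequency {f}, density {density:.1f}")
```

Because height times width gives back the frequency, the area of each rectangle stays proportional to the class frequency.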
Figure- 1.9
In a frequency distribution, the mid-value of each class is obtained. Then on the graph
paper, the frequency is plotted against the corresponding mid-value. These points are joined
by straight lines. These straight lines may be extended in both directions to meet the X-axis
to form a polygon. If the points are instead joined by a smooth freehand curve, the result is
called a frequency curve.
Example 5. The growth rate of different crops like rice, wheat, birth rates, death rates and
life expectancy are given in the following table. Make a frequency polygon from it.
Class interval   Mid-value   Frequency
40 – 44          42          3
45 – 49          47          10
50 – 54          52          12
55 – 59          57          15
60 – 64          62          7
65 – 69          67          5
Figure-1.10
1.8.2.5 Pictograms
A pictograph uses pictures or images to present data. The pictures give a quick
idea of the frequency of a characteristic, and fractions can also be marked on them, e.g., a bus
for transport, a man for cases, a cot for hospital beds, etc. Pictographs are widely used by
government and private organizations. The chief advantage of this method is its visual appeal.
1.8.2.6 Line chart
It is most widely used in medical science. It shows trends over time. Data having
some order, such as the age-wise incidence of a disease, can be represented by a line chart. It is drawn
by taking one variable on the horizontal X-axis and the other variable on the vertical Y-axis.
This graph shows the effect of one variable on the other variable, e.g., age-specific incidence
of cancer among males of Delhi.
If we plot the less than cumulative frequencies, rather than the frequencies, against the
upper limits of the classes, the curve obtained on joining these points by a freehand curve is
called the less than cumulative frequency curve, or ogive, or less than ogive. If we plot the
more than cumulative frequencies against the lower limits of the
classes, the curve obtained on joining these points by a freehand curve is called the more than
cumulative frequency curve. The advantage of these curves is that they enable us to answer
queries related to the frequency distribution of the variable.
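As a small illustration, the two sets of cumulative frequencies can be computed as follows. The class limits and frequencies are borrowed from the Example 5 table earlier in this section; the variable names are hypothetical.

```python
from itertools import accumulate

# Class limits and frequencies borrowed from the Example 5 table.
lower_limits = [40, 45, 50, 55, 60, 65]
upper_limits = [44, 49, 54, 59, 64, 69]
frequencies = [3, 10, 12, 15, 7, 5]

# "Less than" cumulative frequencies: running totals from the first class.
less_than = list(accumulate(frequencies))

# "More than" cumulative frequencies: running totals from the last class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

# Points to plot: (upper limit, less than c.f.) and (lower limit, more than c.f.).
less_than_points = list(zip(upper_limits, less_than))
more_than_points = list(zip(lower_limits, more_than))
print(less_than_points)
print(more_than_points)
```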
It is the simplest way of representing bivariate data. For a bivariate
distribution (x, y), if the values of the variables X and Y are plotted with x along the X-axis and
y along the Y-axis in the xy-plane, the diagram of dots so obtained is called a
scatter diagram.
1.9 SUMMARY
From the study of this chapter the students came to know the definitions of statistics
and biostatistics, the scope and applications of biostatistics. The students studied and learnt
about data. What is data? What are the types of data? The classification of different types of
data provides knowledge to treat different types of data. We learn from the study of this
chapter the different steps necessary for adopting any sampling procedure and the two types
of error involved in the collection of sample and complete census. We learn definitions of
2.1 OBJECTIVES
From the study of this chapter the students will be able:
1. To know about the measures of central tendency- mean, median and mode.
2. To know the merits and demerits and uses of these measures.
3. To know about different methods of measuring mean, median and mode.
4. To know which measure is better to use in which situation.
5. To know the advantages of short cut methods of computing mean.
2.2 INTRODUCTION
In the previous chapter, we discussed data collection, data organization and data
representation techniques. The data representation techniques, such as frequency histograms and
frequency polygons, introduced the concept of the shape of distributions of data. For example, a
frequency polygon illustrated the distribution of body mass index data. We now expand on these
concepts from Chapter 1 by defining measures of central tendency.
Measures of central tendency, as the name suggests, are numerical measurements of the
central part of the distribution. Measures of central tendency are also called averages or measures
of location because they show the location of the centre of the distribution from which the data
were sampled. According to Professor Bowley, averages are “statistical constants which enable
us to comprehend in a single effort the significance of the whole.” In other words, these are
numbers that tell us where the majority of values in the distribution are located. An example is the
average marks in a distribution of marks of all the students of a class. The averages which are
commonly used in biostatistics are as follows:
2.3 MEAN
Mean or arithmetic mean of a series of data is the ratio of the sum of the observations to
the number of observations. If $x_1, x_2, \ldots, x_n$ are the observations of a series then their arithmetic
mean is given by
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (1)$$
And if the corresponding frequencies $f_1, f_2, \ldots, f_n$ of the variables $x_1, x_2, \ldots, x_n$ are given, then
the arithmetic mean is defined as a ratio in which the numerator is the sum of products of the
variables with their frequencies and the denominator is the sum of the frequencies.
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{\sum f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{N} \qquad (2)$$
where $N = \sum f_i$.
Mean of individual items is given by the ratio of the sum of items to the number of items
as given in formula (1).
Example 1. Find the arithmetic mean of the triglyceride values found in the blood samples of
10 patients in a hospital: 25, 30, 21, 55, 47, 10, 15, 17, 45, 35.
Solution. Let $\bar{x}$ be the average triglyceride value. Since these are individual items, their mean
can be computed by the formula
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
$$= \frac{25 + 30 + 21 + 55 + 47 + 10 + 15 + 17 + 45 + 35}{10} = \frac{300}{10} = 30$$
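Formula (1) translates directly into code. The following minimal sketch uses the ten triglyceride values from the solution above; the function name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(values):
    """Formula (1): sum of the observations divided by their number."""
    return sum(values) / len(values)

triglycerides = [25, 30, 21, 55, 47, 10, 15, 17, 45, 35]
mean_value = arithmetic_mean(triglycerides)
print(mean_value)
```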
characteristic and $f_1, f_2, \ldots, f_n$ be their corresponding frequencies then the arithmetic mean is
given by the formula (2). The computation procedure for mean can be easily understood with the
help of the example given below.
Example 2. The distribution of marks of 50 students of B.Sc. class in a botany semester
examination is given below. Find the average of marks.
Marks (x) 12 23 25 35 45 15 40
Frequency(f) 3 10 12 10 2 8 5
Solution. Since this is a discrete distribution so the average of marks is given by the formula (2).
For the computation of average marks we prepare the following table:
Marks (x)   Frequency (f)   fx
12          3               36
23          10              230
25          12              300
35          10              350
45          2               90
15          8               120
40          5               200
Total       ∑f = 50         ∑fx = 1326
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{\sum f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{N} = \frac{1326}{50} = 26.52$$
This shows that the average need not be one of the values present in the
data, and that it need not be an integer even when all the marks are integers.
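Formula (2) for a discrete frequency distribution can be sketched as below, using the marks and frequencies of Example 2. The function name `frequency_mean` is an illustrative choice.

```python
def frequency_mean(values, frequencies):
    """Formula (2): sum of f*x divided by the total frequency N."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / sum(frequencies)

marks = [12, 23, 25, 35, 45, 15, 40]
frequencies = [3, 10, 12, 10, 2, 8, 5]

average_marks = frequency_mean(marks, frequencies)
print(round(average_marks, 2))
```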
2.3.3 MEAN IN CONTINUOUS DISTRIBUTION
In the case of a continuous distribution, class intervals and their
corresponding frequencies are given. First of all we find the mid-values of these classes and treat them
as the variable values. Then we apply formula (2) for the calculation of the arithmetic mean.
The procedure will be clear from the following example.
Example 3. For the data given in the below table on systolic BP of 68 patients, calculate the
arithmetic mean.
Table 2.
90-100 3 140-150 11
100-110 5 150-160 9
110-120 7 160-170 6
120-130 10 170-180 2
130-140 15
90-100 3 95 285
Total ∑f = 68 ∑ fx = 9220
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N} = \frac{9220}{68} = 135.6 \text{ mmHg}$$
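The mid-value procedure for a continuous distribution can be sketched as follows, using the systolic BP classes of Example 3. The function name `grouped_mean` is hypothetical.

```python
# (lower limit, upper limit, frequency) for the systolic BP classes of Example 3.
bp_classes = [
    (90, 100, 3), (100, 110, 5), (110, 120, 7), (120, 130, 10), (130, 140, 15),
    (140, 150, 11), (150, 160, 9), (160, 170, 6), (170, 180, 2),
]

def grouped_mean(classes):
    """Treat each class mid-value as x and apply formula (2)."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

mean_bp = grouped_mean(bp_classes)
print(round(mean_bp, 1))
```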
The short-cut method is applied for the computation of the mean when the variable values
and their frequencies are large. To make the computations easy we take a middle value among the
given values of x as an assumed mean and subtract this assumed mean from all the values of x.
This assumed mean is also called the provisional mean. The formula for the calculation of the
arithmetic mean is then as follows:
$$\bar{x} = A + \frac{\sum d}{n} \qquad (3)$$
Step 1. Take any observation (generally, middle value if we arrange the values in
ascending or descending order of magnitude) of the individual series as assumed mean A.
Step 2. Find the deviation of the values of variate x from assumed mean A, i.e., calculate
the differences d= x- A
Step 3. Find the sum of d and use above formula (3), we find the value of mean.
If the frequencies corresponding to the variate values are given, then we use
the formula for mean as follows:
$$\bar{x} = A + \frac{\sum fd}{N} \qquad (4)$$
Example 4. The marks of the 7 students of a class in a test are as given below:
Solution. Let us take assumed mean A=25. Now we prepare the table for the computation
of mean as given below:
X d = x- 25
12 -13
15 -10
22 -3
25 0
35 10
40 15
45 20
Total ∑ d = 19
Arithmetic mean
$$\bar{x} = A + \frac{\sum d}{n} = 25 + \frac{19}{7} = 25 + 2.71 = 27.71$$
Thus the average of marks of the given 7 students of the class is 27.71
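A minimal sketch of formula (3), using the seven marks listed in the table above and the assumed mean A = 25. The function name is illustrative; the last line compares the result with the direct mean to show that the short-cut gives the same value.

```python
def assumed_mean_method(values, assumed_mean):
    """Formula (3): A + (sum of deviations d = x - A) / n."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)

marks = [12, 15, 22, 25, 35, 40, 45]
shortcut_result = assumed_mean_method(marks, 25)
direct_result = sum(marks) / len(marks)

print(round(shortcut_result, 2), round(direct_result, 2))
```

Any value of A gives the same mean; a middle value is chosen only to keep the deviations small.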
Example 5. Ten patients were examined for uric acid test. The operation was performed
1050 times and the frequencies so obtained for different number of patients (x) are shown
in the table given below. Compute the arithmetic mean by short- cut method.
x: 0 1 2 3 4 5 6 7 8 9 10
Solution. Let 5 be the assumed mean. Now we prepare the table for the calculation of
mean.
X Frequency (f) d = x- 5 fd
0 2 -5 -10
1 8 -4 -32
2 43 -3 -129
3 133 -2 -266
4 207 -1 -207
5 260 0 0
6 213 1 213
7 120 2 240
8 54 3 162
9 9 4 36
10 1 5 5
Total ∑f = 1050 ∑ fd = 12
Arithmetic mean
$$\bar{x} = A + \frac{\sum fd}{N} = 5 + \frac{12}{1050} = 5 + 0.0114 = 5.0114$$
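Formula (4) can be checked against the table of Example 5 with a short sketch. The function name `shortcut_mean` is hypothetical; it returns both the mean and the intermediate total ∑fd so that the table can be verified.

```python
def shortcut_mean(values, frequencies, assumed_mean):
    """Formula (4): A + (sum of f*d) / N, with d = x - A."""
    total_fd = sum(f * (x - assumed_mean) for x, f in zip(values, frequencies))
    return assumed_mean + total_fd / sum(frequencies), total_fd

x_values = list(range(11))  # x = 0, 1, ..., 10
frequencies = [2, 8, 43, 133, 207, 260, 213, 120, 54, 9, 1]

mean_x, total_fd = shortcut_mean(x_values, frequencies, assumed_mean=5)
print(total_fd, round(mean_x, 4))
```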
The step deviation method can be used for grouped data. When all the classes are of equal width (say h) in
continuous data, or the values of x are at equal intervals in discrete grouped data, we may
simplify the calculations by taking d = (x − A)/h in the short-cut method. The formula for the
calculation of the mean then becomes
$$\bar{x} = A + \frac{\sum fd}{N} \times h$$
Here, the symbols have the same meaning as in the short-cut method above and h is the gap
between two successive values of x, or the class interval.
Example 6. Find the mean by step deviation method for the data of blood pressure of 68 patients
as given in the following table.
BP(mmHg) (x) 90 100 110 120 130 140 150 160 170
Frequency ( f) 3 5 7 10 15 11 9 6 2
Solution. We take assumed mean A= 130 and here interval between any two values of x is 10,
i.e., h= 10. Now prepare the table for the computation of mean.
BP (x)   Frequency (f)   d = (x − 130)/10   fd
90       3               -4                 -12
100      5               -3                 -15
110      7               -2                 -14
120      10              -1                 -10
130      15              0                  0
140      11              1                  11
150      9               2                  18
160      6               3                  18
170      2               4                  8
Total    ∑f = 68                            ∑fd = 4
Arithmetic mean
$$\bar{x} = A + \frac{\sum fd}{N} \times h = 130 + \frac{4}{68} \times 10 = 130 + 0.588 = 130.588 \text{ mmHg}$$
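The step deviation calculation of Example 6 can be sketched as follows, with A = 130 and h = 10 as in the solution. The function name is an illustrative choice.

```python
def step_deviation_mean(values, frequencies, assumed_mean, step):
    """Step deviation method: A + (sum of f*d / N) * h, with d = (x - A) / h."""
    total_fd = sum(f * (x - assumed_mean) / step
                   for x, f in zip(values, frequencies))
    return assumed_mean + total_fd / sum(frequencies) * step

bp_values = [90, 100, 110, 120, 130, 140, 150, 160, 170]
frequencies = [3, 5, 7, 10, 15, 11, 9, 6, 2]

mean_bp = step_deviation_mean(bp_values, frequencies, assumed_mean=130, step=10)
print(round(mean_bp, 3))
```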
90-100 3 95 -4 -12
130-140 15 135 0 0
140-150 11 145 1 11
150-160 9 155 2 18
160-170 6 165 3 18
170-180 2 175 4 8
Total ∑f = 68 ∑ fd = 4
Arithmetic mean
$$\bar{x} = A + \frac{\sum fd}{N} \times h = 135 + \frac{4}{68} \times 10 = 135 + 0.588 = 135.588 \text{ mmHg}$$
Example 8. In a study on patients of typhoid fever the following data are obtained. Find the
arithmetic mean.
Age in years 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89
No. of cases 1 0 1 10 17 38 9 3
Solution. This is inclusive type data; first of all we convert it to exclusive type data. The
procedure for converting inclusive type data to exclusive type data is as follows:
We see that the upper limit of the first class is 19 and the lower limit of the second
class is 20, and their difference is 20 − 19 = 1. Now subtract half of the difference, i.e., 0.5, from
each lower limit and add 0.5 to each upper limit. We also see that this difference is the same for
every class. So the new classes are 9.5-19.5, 19.5-29.5 and so on.
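The conversion from inclusive to exclusive classes can be sketched as below. The function name `to_exclusive` is hypothetical, and the sketch assumes, as in this example, that the gap between consecutive classes is the same throughout.

```python
def to_exclusive(classes):
    """Widen each inclusive class by half the gap between consecutive classes."""
    gap = classes[1][0] - classes[0][1]  # e.g. 20 - 19 = 1
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

inclusive_classes = [(10, 19), (20, 29), (30, 39), (40, 49),
                     (50, 59), (60, 69), (70, 79), (80, 89)]

exclusive_classes = to_exclusive(inclusive_classes)
print(exclusive_classes[:2])
```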
Now for the calculation of mean any method discussed above can be used. Here we
apply step deviation method.
Age (years)   No. of cases (f)   Mid-value (x)   d = (x − 44.5)/10   fd
9.5-19.5      1                  14.5            -3                  -3
19.5-29.5     0                  24.5            -2                  0
29.5-39.5     1                  34.5            -1                  -1
39.5-49.5     10                 44.5            0                   0
49.5-59.5     17                 54.5            1                   17
59.5-69.5     38                 64.5            2                   76
69.5-79.5     9                  74.5            3                   27
79.5-89.5     3                  84.5            4                   12
Total         ∑f = 79                                                ∑fd = 128
Arithmetic mean
$$\bar{x} = A + \frac{\sum fd}{N} \times h = 44.5 + \frac{128}{79} \times 10 = 44.5 + 16.2 = 60.7$$
In the computation of an arithmetic mean some items may be more important than others. In
such cases weights should be given to the items according to their importance. For
example, if we want to have an idea of the change in the cost of living of a group of people in a
certain locality, then the simple mean of the prices of the commodities consumed by them
will not do, since all the commodities are not equally important, e.g., wheat, rice and pulses
are more important than cigarettes, tea, confectionery, etc.
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$$
Example 9. The following table gives the platelet count (in lakh/cmm) from the analysis of
the blood samples on five different days in a pathology laboratory. Find the average platelet
count per patient.
Day 1 2 3 4 5
Platelet count 0.50 0.75 1.00 1.50 2.00
Solution. The table for the calculation of weighted mean is given by:
x      w    wx
0.50   65   32.5
0.75   80   60.0
1.00   95   95.0
1.50   90   135
2.00   70   140
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{462.5}{400} = 1.156$$
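The weighted mean can be sketched in a few lines, using the values and weights from the calculation table of Example 9. The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

platelet_counts = [0.50, 0.75, 1.00, 1.50, 2.00]  # x, in lakh/cmm
weights = [65, 80, 95, 90, 70]                    # w, from the calculation table

result = weighted_mean(platelet_counts, weights)
print(round(result, 3))
```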
If $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_m$ are the means of m series of sizes $n_1, n_2, \ldots, n_m$ respectively, then their
combined arithmetic mean $\bar{x}$ is given by:
$$\bar{x} = \frac{\sum n_i \bar{x}_i}{\sum n_i}; \quad i = 1, 2, \ldots, m$$
Example 10. There are 40 male and 10 female employees in a firm. The mean salary of male
employees is Rs.520 and that of female employees Rs. 420. Find the combined average
salary of all the employees.
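Solution. Here, $n_1 = 40$, $n_2 = 10$, $\bar{x}_1 = 520$, $\bar{x}_2 = 420$.
Combined mean
$$\bar{x} = \frac{\sum n_i \bar{x}_i}{\sum n_i} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{520 \times 40 + 420 \times 10}{40 + 10} = \frac{25000}{50} = 500$$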
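The combined mean formula works for any number of groups. A minimal sketch with the figures of Example 10 follows; the function name `combined_mean` is hypothetical.

```python
def combined_mean(sizes, means):
    """Combined mean of several series: sum of n_i * mean_i over sum of n_i."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

group_sizes = [40, 10]    # male and female employees
group_means = [520, 420]  # mean salaries in Rs.

overall_mean = combined_mean(group_sizes, group_means)
print(overall_mean)
```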
Sometimes a mean has been computed using a wrong value in place of the
actual value. We then replace the wrong value with the correct value in the total and
recompute to get the correct mean. The procedure will be clear from the example given below.
Example 11. A student calculates the mean of 20 observations as 25.2. Later on he found
that he misread one observation 34 in place of 43, find the correct mean.
$$\bar{x} = \frac{\sum x}{n} \quad \text{or} \quad \sum x = n\bar{x} = 20 \times 25.2 = 504$$
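The correction step can be sketched as follows. The sketch reads Example 11 as saying that 34 was recorded where the true observation was 43; that reading, and the function name `corrected_mean`, are assumptions.

```python
def corrected_mean(n, reported_mean, wrong_value, correct_value):
    """Remove the wrong value from the total, add the correct one, re-average."""
    reported_sum = n * reported_mean
    corrected_sum = reported_sum - wrong_value + correct_value
    return corrected_sum / n

# Assumption: 34 was used in place of the true observation 43.
result = corrected_mean(n=20, reported_mean=25.2, wrong_value=34, correct_value=43)
print(round(result, 2))
```

Under this reading the corrected total is 504 − 34 + 43 = 513, so the corrected mean is 513/20 = 25.65.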
Merits:
6. It is easily understandable.
6. It may lead to wrong conclusions if the details of the data are not given. For
example, the marks of two students in three successive tests are respectively 30, 40, 50
and 50, 40, 30. The average score of both students is the same, so we might conclude that
both students are at the same level, while in fact the first is improving and the second is deteriorating.
Uses of Mean:
4. A businessman uses it for computing per unit profit, output per person, average
expenditure and average profit per week or per month, etc.
2.4 MEDIAN
Median of a distribution is the middle most value of the variable if the values of the
variable are arranged in ascending or descending order of their magnitude. The median
divides the observations of the variable in such a way that half of the observations of the
variable lie above the median and half below this. Median is thus called a positional average
because it locates at the middle of the observations. But if the number of observations is even
then after arrangement there will be two middle values and the median will be the average of
these two middle values.
2.4.1 MEDIAN IN INDIVIDUAL SERIES
Case 1. If n is odd then the middle most, i.e., ((n+1)/2)th term value is the median.
Case 2. If n is even then there are two middle terms, the (n/2)th and the (n/2 + 1)th, and the median
is given by:
$$M_e = \frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2} + 1\right)\text{th term}}{2}$$
Example 12. The marks of 9 children in a test exam are: 12, 23, 34, 11, 14, 15, 13, 16, 45.
Arranged in ascending order the marks are 11, 12, 13, 14, 15, 16, 23, 34, 45. Since n = 9 is odd, the median is the (9+1)/2 th, i.e., 5th term value, i.e., 15.
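Example 13. The blood LDL levels (in mg/dl) in the blood samples of 12 patients
are: 5, 19, 42, 11, 50, 30, 21, 0, 22, 52, 36, 27
Arranged in ascending order the values are 0, 5, 11, 19, 21, 22, 27, 30, 36, 42, 50, 52. Here n = 12 is even, so
$$M_e = \frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2} + 1\right)\text{th term}}{2} = \frac{\left(\frac{12}{2}\right)\text{th term} + \left(\frac{12}{2} + 1\right)\text{th term}}{2}$$
$$= \frac{6\text{th term} + 7\text{th term}}{2} = \frac{22 + 27}{2} = \frac{49}{2} = 24.5$$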
So the median is 24.5 mg/dl, which does not belong to the data. In the case of an even number of
observations, the median need not be one of the observed values.
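Both cases of the median in an individual series can be sketched in one function. The function name `median` is an illustrative choice; the data are those of Examples 12 and 13.

```python
def median(values):
    """Median of an individual series: sort, then take the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n+1)/2)th term (index `middle` when counting from 0).
        return ordered[middle]
    # Even n: average of the (n/2)th and (n/2 + 1)th terms.
    return (ordered[middle - 1] + ordered[middle]) / 2

marks = [12, 23, 34, 11, 14, 15, 13, 16, 45]
ldl = [5, 19, 42, 11, 50, 30, 21, 0, 22, 52, 36, 27]

print(median(marks), median(ldl))
```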
characteristic and $f_1, f_2, \ldots, f_n$ be their corresponding frequencies then for the calculation of
median we calculate the cumulative frequencies. The median is calculated with the help of
the following steps.
Step 2. Find the running totals of the frequencies, called cumulative frequencies and denoted by c.f.
Step 3. Find N/2, where N = ∑f.
Step 4. Find the cumulative frequency just greater than N/2. The value of x corresponding to this
cumulative frequency is the required median.
X 21 15 17 9 5 7 8 10
F 2 5 3 4 5 1 6 12
Solution. For calculating the median we arrange the values of x in ascending order and then
prepare the cumulative frequency table as follows:
x f c.f.
5 5 5
7 1 6
8 6 12
9 4 16
10 12 28
15 5 33
17 3 36
21 2 38
N = ∑ f = 38
Here N/2 = 38/2= 19 and cumulative frequency just greater than 19 is 28. The value of x
corresponding to cumulative frequency 28 is 10. So the median of the given data is 10.
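The cumulative frequency procedure can be sketched as follows, using the x and f values of the example above. The function name `discrete_median` is hypothetical.

```python
def discrete_median(values, frequencies):
    """Return the x whose cumulative frequency is the first to exceed N/2."""
    half_total = sum(frequencies) / 2
    cumulative = 0
    # Arrange the values of x in ascending order, keeping their frequencies.
    for x, f in sorted(zip(values, frequencies)):
        cumulative += f
        if cumulative > half_total:
            return x

x_values = [21, 15, 17, 9, 5, 7, 8, 10]
frequencies = [2, 5, 3, 4, 5, 1, 6, 12]

print(discrete_median(x_values, frequencies))
```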
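When the data are in class interval form, the class corresponding to the c.f. just greater than
N/2 is called the median class and the median is computed by the following formula:
$$M_e = L + \frac{\frac{N}{2} - C}{f} \times h$$
where L = lower limit of the median class, f = frequency of the median class, C = cumulative
frequency of the class preceding the median class, h = width of the median class and
N = total frequency.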
Example15. The following table gives the distribution of weights of 100 persons. Find the
median of this data.
Weight 40-45 45-50 50-55 55-60 60-65 65-70 70-75 75-80 80-85 85-90
Frequency 1 3 6 10 15 25 15 10 11 4
40-45 1 1
45-50 3 4
50-55 6 10
55-60 10 20
60-65 15 35
65-70 25 60
70-75 15 75
75-80 10 85
80-85 11 96
85-90 4 100
Total N = ∑ f = 100
Here N/2 = 100/2 = 50 and the cumulative frequency just greater than 50 is 60. The class
corresponding to cumulative frequency 60 is 65-70. So class 65-70 is the median class. Now the
median is given by:
$$M_e = L + \frac{\frac{N}{2} - C}{f} \times h$$
$$M_e = 65 + \frac{50 - 35}{25} \times 5 = 65 + \frac{15}{25} \times 5 = 65 + 3 = 68$$
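The median of grouped data can be sketched as below, using the weight distribution of Example 15. The function name `grouped_median` is an illustrative choice; C is taken as the cumulative frequency of the class before the median class, as in the worked solution.

```python
def grouped_median(classes):
    """Me = L + (N/2 - C) / f * h for the first class whose c.f. exceeds N/2."""
    half_total = sum(f for _, _, f in classes) / 2
    cumulative = 0  # C: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f > half_total:
            return lower + (half_total - cumulative) / f * (upper - lower)
        cumulative += f

# (lower limit, upper limit, frequency) for the weights of 100 persons.
weight_classes = [
    (40, 45, 1), (45, 50, 3), (50, 55, 6), (55, 60, 10), (60, 65, 15),
    (65, 70, 25), (70, 75, 15), (75, 80, 10), (80, 85, 11), (85, 90, 4),
]

print(grouped_median(weight_classes))
```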
Merits:
8. In case of qualitative data, e.g., beauty, honesty, intelligence, etc. it is the best
measure of central tendency.
Demerit:
2. It is a positional average and is based only on the middle term. It does not use all
the observations of the data.
Uses:
2.5 MODE
Mode is the most frequent item of the series, i.e., in a given set of observations the item
or observation which is repeated the maximum number of times, and around which the other
observations cluster, is called the mode. For example, the average height of an Indian male is 5 feet 6
inches; the average size of the shoes of an Indian male is number 7, etc. Mode is also known as
norm.
Unimodal: If the data of a distribution has only one mode then the distribution is
called unimodal.
Bimodal: If we find that there are two items in a distribution which have the same
number of repetitions, then these two items are the modes and the distribution is
called bimodal.
Trimodal: Similarly, in a distribution, if there are three such items that they have the
same frequency then these three items are called the modes of the distribution and the
distribution is called trimodal.
Ill- defined mode: If there exists more than one mode in a distribution, then mode is
called ill-defined.
2.5.2 MODE IN INDIVIDUAL SERIES
In the case of an individual series the mode is the most frequent observation. This is clear from the
following example.
2, 3, 4, 7, 9, 3, 2, 1, 5, 3, 6, 3, 8, 3
Solution. In the given series the observation 3 is repeated maximum number of times (5) so
the mode of the given series is 3.
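For an individual series the mode can be found by counting repetitions. The sketch below uses `collections.Counter` from the Python standard library on the series given above.

```python
from collections import Counter

series = [2, 3, 4, 7, 9, 3, 2, 1, 5, 3, 6, 3, 8, 3]

# most_common(1) returns the (value, count) pair with the highest count.
mode_value, repetitions = Counter(series).most_common(1)[0]
print(mode_value, repetitions)
```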
In case of discrete frequency distribution, mode is the value of the variable which has the
maximum frequency. Consider the following example:
Variable (x) 2 5 7 9 11 25 35 43 52
Frequency (f) 1 3 4 8 25 12 11 10 8
Solution: Here we see that in the given distribution, the value 11 of the variable has the maximum
frequency 25. So the mode of this distribution is 11.
When the distribution is irregular, i.e., the frequencies are increasing and decreasing in
an irregular pattern, or the difference between the maximum frequency and the frequency
succeeding or preceding it is small and the observations are concentrated on either side,
the mode cannot be determined merely by inspection. In such a case, we apply
the grouping method for the computation of the mode. The procedure of the grouping method will
be clear from the following example.
Frequency (f) 1 3 4 5 7 10 11 10 9 14 7 5
Solution. Here we see that initially the frequencies are increasing from 1 to 11 and then
decreasing but the frequency 14 of the variable value 11 is again increasing and then
decreasing up to frequency 5. This distribution shows an irregular pattern. So for the
calculation of mode we apply the grouping method of mode. For this we prepare a table and
the procedure of preparing the table is explained below the table.
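Each group total is written against the first value of x in its group; column (i) holds the frequencies f.
Variable (x)   Column (i)   Column (ii)   Column (iii)   Column (iv)   Column (v)   Column (vi)
2              1            4             -              8             -            -
3              3            -             7              -             12           -
4              4            9             -              -             -            16
5              5            -             12             22            -            -
6              7            17            -              -             28           -
7              10           -             21             -             -            31
8              11           21            -              30            -            -
9              10           -             19             -             33           -
10             9            23            -              -             -            30
11             14           -             21             26            -            -
12             7            12            -              -             -            -
13             5            -             -              -             -            -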
The grouping table is prepared from the frequencies of the distribution. Column (i)
contains the original frequencies. Column (ii) is prepared by adding the frequencies two
by two, as 1+3 = 4, 4+5 = 9 and so on. Column (iii) is prepared by adding the frequencies
two by two, leaving out the first frequency. Column (iv) is prepared by adding the
frequencies three by three. Column (v) is prepared by adding the frequencies three by
three, leaving out the first frequency, and column (vi) by adding them three by three,
leaving out the first two frequencies. In each column, mark the maximum value in bold
type. The grouping table is given above; the column maxima are collected in the following
analysis table:
Column number    Maximum frequency    Value(s) of x
(1)              (2)                  (3)
i                14                   11
ii               23                   10, 11
iii              21                   7, 8 and 11, 12
iv               30                   8, 9, 10
v                33                   9, 10, 11
vi               31                   7, 8, 9
In the analysis table, column (1) lists the columns of the grouping table (Table 12) in
order, column (2) shows the maximum frequency in each of those columns, and column (3)
shows the value or values of x that contribute to that maximum. In column (3) the value 11
occurs the maximum number of times, so 11 is the mode of the above distribution.
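The grouping and analysis tables can be reproduced programmatically. The sketch below is one possible reading of the procedure, not a definitive implementation: the function name `grouping_mode` is our own, incomplete trailing groups are dropped, and when two groups tie for a column maximum (as in column (iii) here) both are counted, which is an assumption about how ties are handled.

```python
from collections import Counter

x = list(range(2, 14))
f = [1, 3, 4, 5, 7, 10, 11, 10, 9, 14, 7, 5]

def grouping_mode(x, f):
    """Return (votes, modes) from the six-column grouping method."""
    # (group size, leading frequencies skipped) for columns (i) to (vi)
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    votes = Counter()
    for size, skip in columns:
        # Only complete groups are summed; a short trailing group is ignored.
        groups = [
            (sum(f[i:i + size]), x[i:i + size])
            for i in range(skip, len(f) - size + 1, size)
        ]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:          # assumption: tied groups all count
                votes.update(members)
    top = max(votes.values())
    return votes, [value for value in x if votes[value] == top]

votes, modes = grouping_mode(x, f)
print(modes, votes[11])  # [11] 4
```

With this data the value 11 is picked up by columns (i), (ii), (iii) and (v), one more time than any other value.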
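$$ M_o = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h $$
where L = lower limit of the modal class, $f_1$ = frequency of the modal class,
$f_0$ = frequency of the class preceding the modal class, $f_2$ = frequency of the class
succeeding the modal class, and h = width of the modal class.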
Example 19. The following table shows blood pressure classes and their frequencies. Find
the mode of this distribution.
Table 14.
Blood pressure    Frequency
70-80             2
80-90             4
90-100            14
100-110           35
110-120           32
120-130           28
130-140           12
140-150           5
Solution. The table shows that the maximum frequency is 35 and the corresponding class is
100-110, so 100-110 is the modal class. To compute the mode we use the following
formula:
$$ M_o = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h $$
Here L = lower limit of the modal class = 100, $f_1 = 35$ (frequency of the modal class),
$f_0 = 14$ (preceding class), $f_2 = 32$ (succeeding class) and h = 10 (class width).
$$ \therefore\ M_o = 100 + \frac{(35 - 14) \times 10}{70 - 14 - 32} = 100 + \frac{210}{24} = 100 + 8.75 = 108.75 $$
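As a check on the arithmetic, the formula can be applied to Table 14 in code. This is a minimal sketch; the function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumed convention that does not arise in this example.

```python
def grouped_mode(classes, freqs):
    """Mode of a grouped distribution; classes is a list of (lower, upper) limits."""
    k = max(range(len(freqs)), key=freqs.__getitem__)    # index of the modal class
    lower, upper = classes[k]
    h = upper - lower                                     # width of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                     # assumed 0 if no preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0        # assumed 0 if no succeeding class
    return lower + (f1 - f0) * h / (2 * f1 - f0 - f2)

classes = [(70, 80), (80, 90), (90, 100), (100, 110),
           (110, 120), (120, 130), (130, 140), (140, 150)]
freqs = [2, 4, 14, 35, 32, 28, 12, 5]

mode = grouped_mode(classes, freqs)
print(mode)  # 108.75
```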
Merits:
5. It can be computed for distributions with unequal class intervals, provided the
modal class and the classes preceding and succeeding it are of equal width.
Demerits:
3. In some cases the mode is ill-defined: it is not possible to find a single clear
mode. Some series have two modes and some have more than two.
5. If the modal class and the classes preceding and succeeding it are of unequal
width, the mode cannot be determined.
Uses:
2.6 SUMMARY
This chapter introduced central tendency and its measures. We defined the three
measures of central tendency: mean, median and mode. We studied different methods of
computing the mean, including the weighted mean and the combined mean. We learnt how to
calculate the mean, median and mode for an individual series, a discrete distribution and
a continuous distribution, and we studied the grouping method for the mode. We also
studied the merits, demerits and uses of the mean, median and mode, and from them learnt
which method and which measure suit a particular situation.
3.3.4 STANDARD DEVIATION
The standard deviation is the best measure of variability for describing the scatter of
data values. It is denoted by σ. A small standard deviation indicates a high degree of
homogeneity in the data values; conversely, a large standard deviation indicates great
heterogeneity in the data values.
It is defined as the positive square root of the arithmetic mean of the squares of the
deviations of the values, the deviations being taken from their arithmetic mean.
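Let the variable X under study take the n values $x_1, x_2, \ldots, x_n$. Their standard
deviation is given by the following formula:
$$ \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum d^2}{n}}, \quad \text{where } d = x_i - \bar{x} $$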
Step1. Compute the arithmetic mean $\bar{x}$ of the series.
Step2. Compute the deviations of the series values from the mean, i.e., compute
$d = x_i - \bar{x}$.
Step3. Compute the squares of the values obtained in step 2, i.e., compute
$d^2 = (x_i - \bar{x})^2$.
Step4. Find the sum of the values obtained in step 3 and divide it by the number of
values, i.e., compute $\frac{\sum d^2}{n} = \frac{\sum (x_i - \bar{x})^2}{n}$.
Step5. Take the square root of the value obtained in step 4. This is the required value
of the standard deviation.
x             $d = x_i - \bar{x}$    $d^2 = (x_i - \bar{x})^2$
12            -8                     64
15            -5                     25
17            -3                     9
21            1                      1
28            8                      64
27            7                      49
∑x = 120                             ∑d² = 212
Arithmetic mean $\bar{x} = \frac{\sum x}{n} = \frac{120}{6} = 20$
Standard deviation $\sigma = \sqrt{\frac{\sum d^2}{n}} = \sqrt{\frac{212}{6}} = \sqrt{35.33} = 5.94$
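The five steps can be followed line by line in a short Python sketch. The variable names are ours; the comparison with `statistics.pstdev` from the standard library, which also divides by n, is included only as a cross-check.

```python
import math
import statistics

values = [12, 15, 17, 21, 28, 27]
n = len(values)

mean = sum(values) / n                          # Step 1: arithmetic mean
deviations = [v - mean for v in values]         # Step 2: d = x_i - mean
squared = [d ** 2 for d in deviations]          # Step 3: d^2
mean_square = sum(squared) / n                  # Step 4: sum(d^2) / n
sigma = math.sqrt(mean_square)                  # Step 5: square root

print(f"{mean:g} {sigma:.2f}")                  # 20 5.94
# Cross-check against the population standard deviation from the standard library.
print(abs(sigma - statistics.pstdev(values)) < 1e-9)
```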
This method is applied when the mean is a fraction, because then the deviations and
their squares make the calculation laborious. In this case we take the deviations of the
values from an assumed mean.
Let d = x - A, where A is the assumed mean. The formula for the standard deviation is
then:
$$ \sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} $$
where n is the number of observations.
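Why any assumed mean A gives the same result can be seen from a short derivation. This is a sketch added for illustration; the symbol $\bar{d}$ for the mean of the deviations is our own notation.

```latex
\begin{aligned}
d_i = x_i - A \;&\Rightarrow\; \bar{x} = A + \bar{d}, \qquad \bar{d} = \frac{\sum d_i}{n} \\
x_i - \bar{x} &= d_i - \bar{d} \\
\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}
  &= \frac{\sum (d_i - \bar{d})^2}{n}
   = \frac{\sum d_i^2}{n} - 2\bar{d}\,\frac{\sum d_i}{n} + \bar{d}^2
   = \frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2
\end{aligned}
```

The correction term $\left(\frac{\sum d}{n}\right)^2$ vanishes exactly when A equals the true mean, which recovers the direct formula.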
We follow these steps to compute the S.D. in this case:
Step1. Choose an assumed mean A.
Step2. Compute the deviations of the series values from the assumed mean, i.e.,
compute $d = x_i - A$.
Step3. Find the total of the step 2 values, i.e., find $\sum d$.
Step4. Divide the value of step 3 by the number of values n and square the result, i.e.,
compute $\left(\frac{\sum d}{n}\right)^2$.
Step5. Compute the squares of the values obtained in step 2, i.e., compute
$d^2 = (x_i - A)^2$.
Step6. Find the sum of the values obtained in step 5 and divide it by the number of
values, i.e., compute $\frac{\sum d^2}{n} = \frac{\sum (x_i - A)^2}{n}$.
Step7. Subtract the value of step 4 from the value of step 6 and then take the square
root of the difference.
Example8. Find the standard deviation of the data in Example 7 above by the short-cut method.
Solution. Let us take 21 as assumed mean A. Now we prepare the following table for the
computation of standard deviation.
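x        d = (x - 21)    d²
12       -9              81
15       -6              36
17       -4              16
21       0               0
28       7               49
27       6               36
Total    ∑d = -6         ∑d² = 218
$$ \therefore\ \sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{218}{6} - (-1)^2} = \sqrt{36.33 - 1} = \sqrt{35.33} = 5.94 $$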
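The short-cut calculation can be checked numerically. In this sketch the function name `sd_assumed_mean` is hypothetical; it is evaluated with A = 21 as in the example and with A = 20 (the true mean) to show that the choice of A does not change the result.

```python
import math

def sd_assumed_mean(values, A):
    """Standard deviation from deviations d = x - A about an assumed mean A."""
    n = len(values)
    d = [v - A for v in values]
    sum_d = sum(d)
    sum_d2 = sum(di * di for di in d)
    return math.sqrt(sum_d2 / n - (sum_d / n) ** 2)

values = [12, 15, 17, 21, 28, 27]
sd_a21 = sd_assumed_mean(values, 21)   # assumed mean used in the example
sd_a20 = sd_assumed_mean(values, 20)   # the true mean, for comparison

print(f"{sd_a21:.2f} {sd_a20:.2f}")    # 5.94 5.94
```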
characteristic and $f_1, f_2, \ldots, f_n$ be their corresponding frequencies; then the
standard deviation can be calculated with the help of these methods:
The procedures of the three methods will be clear with the help of the examples.
$$ \sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}} $$
Example9. Calculate the standard deviation of the distribution of marks of the B.Sc.
botany class students. The data is given below:
Solution. For the calculation of standard deviation we prepare the following table:
X     f     fx     $d = x - \bar{x}$    $(x - \bar{x})^2$    $f(x - \bar{x})^2$
12    2     24     -12                  144                  288
18    1     18     -6                   36                   36
17    3     51     -7                   49                   147
15    5     75     -9                   81                   405
20    1     20     -4                   16                   16
25    2     50     1                    1                    2
32    10    320    8                    64                   640
42    1     42     18                   324                  324
N = 25      ∑fx = 600                                        $\sum f(x - \bar{x})^2 = 1858$
Arithmetic mean $\bar{x} = \frac{\sum fx}{\sum f} = \frac{600}{25} = 24$
Standard deviation $\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{N}} = \sqrt{\frac{1858}{25}} = \sqrt{74.32} = 8.62$
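The table columns map directly onto a few list operations. A minimal sketch with our own variable names, using the marks and frequencies of this example:

```python
import math

x = [12, 18, 17, 15, 20, 25, 32, 42]   # marks
f = [2, 1, 3, 5, 1, 2, 10, 1]          # number of students

N = sum(f)
mean = sum(fi * xi for fi, xi in zip(f, x)) / N              # sum(fx) / N
sum_sq = sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x))  # sum f (x - mean)^2
sigma = math.sqrt(sum_sq / N)

print(N, f"{mean:g} {sum_sq:g} {sigma:.2f}")  # 25 24 1858 8.62
```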
$$ \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}, \quad \text{where } d = x - A $$
Example10. For the above example 9 apply assumed mean method for computing the
standard deviation.
Solution. Let us assume A= 20. Now for the calculation of standard deviation we prepare the
following table:
X     f     d = (x - A)    fd     fd²
12    2     -8             -16    128
18    1     -2             -2     4
17    3     -3             -9     27
15    5     -5             -25    125
20    1     0              0      0
25    2     5              10     50
32    10    12             120    1440
42    1     22             22     484
N = 25                     ∑fd = 100    ∑fd² = 2258
Standard deviation
$$ \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} = \sqrt{\frac{2258}{25} - \left(\frac{100}{25}\right)^2} = \sqrt{90.32 - 16} = \sqrt{74.32} = 8.62 $$
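$$ \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times h, \quad \text{where } d = \frac{x - A}{h} $$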
The procedure of the method will be clear from the example given below:
Example11. The daily high blood pressure of a patient on 100 days is given below:
BP (mmHg):      102   106   110   114   118   122   126
No. of days:    3     9     25    35    17    10    1
Solution. Let us take the assumed mean A= 114. Here common interval h= 4. Now we
prepare the following table for the calculation of standard deviation.
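BP (mmHg)    f     d = (x - 114)/4    fd     fd²
102          3     -3                 -9     27
106          9     -2                 -18    36
110          25    -1                 -25    25
114          35    0                  0      0
118          17    1                  17     17
122          10    2                  20     40
126          1     3                  3      9
N = 100                               ∑fd = -12    ∑fd² = 154
$$ \text{S.D.} = \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times h = \sqrt{\frac{154}{100} - \left(\frac{-12}{100}\right)^2} \times 4 = \sqrt{1.5256} \times 4 \approx 4.94 $$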
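The assumed mean and step deviation methods differ only in the divisor h, so one function can cover both. The sketch below is illustrative; `sd_step_deviation` is a hypothetical name, and h = 1 is used to reproduce the assumed mean method of Example 10.

```python
import math

def sd_step_deviation(x, f, A, h=1):
    """Standard deviation of a frequency distribution using d = (x - A) / h."""
    N = sum(f)
    d = [(xi - A) / h for xi in x]
    sum_fd = sum(fi * di for fi, di in zip(f, d))
    sum_fd2 = sum(fi * di * di for fi, di in zip(f, d))
    # Scale back by h because the deviations were measured in units of h.
    return math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * h

# Example 10: marks, assumed mean A = 20 (h = 1)
marks = [12, 18, 17, 15, 20, 25, 32, 42]
students = [2, 1, 3, 5, 1, 2, 10, 1]
sd_marks = sd_step_deviation(marks, students, A=20)

# Example 11: blood pressure, A = 114, h = 4
bp = [102, 106, 110, 114, 118, 122, 126]
days = [3, 9, 25, 35, 17, 10, 1]
sd_bp = sd_step_deviation(bp, days, A=114, h=4)

print(f"{sd_marks:.2f} {sd_bp:.2f}")  # 8.62 4.94
```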
In a continuous distribution we find the mid-values of the classes and treat them as
the values of the variable x. All three methods discussed in the previous section can then
be applied, but the step deviation method is generally used. The formula is the same as
for a discrete distribution. The procedure is described in the example given below.
Example12. Calculate the standard deviation for the following table giving the age
distribution of 542 persons of a city.
Age in years: 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80 80 – 90
Solution. To calculate the standard deviation, let us take d = (x - 55)/10, that is,
assumed mean A = 55 and common class interval h = 10. Now we prepare the following table:
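Age (years)    Mid-value (x)    f      d = (x - 55)/10    fd      fd²
20-30          25               3      -3                 -9      27
30-40          35               61     -2                 -122    244
40-50          45               132    -1                 -132    132
50-60          55               153    0                  0       0
60-70          65               140    1                  140     140
70-80          75               51     2                  102     204
80-90          85               2      3                  6       18
Total                           N = 542                   ∑fd = -15    ∑fd² = 765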
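$$ \text{S.D.} = \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times h = \sqrt{\frac{765}{542} - \left(\frac{-15}{542}\right)^2} \times 10 = \sqrt{1.4107} \times 10 = 11.88 $$
Hence the standard deviation of age of the given distribution is 11.88 years.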
Merits:
1. It is rigidly defined.
2. It uses all the observations of the data in calculation.
3. It is used in correlation.
4. It is least affected by sampling fluctuations.
5. It is suitable for further mathematical treatments.
6. It is the best measure of variability.
Demerits:
3.4.1 VARIANCE
The coefficient of variation is the best measure for comparing the variability of two
series or populations, even when their units of measurement differ. This comparison is
possible because it is a unit-free measure. It is presented as a percentage and
is expressed as:
Coefficient of variation (C.V.) = $\frac{\sigma}{\bar{x}} \times 100$, where the notations have their usual meaning.
A series with a smaller C.V. is called more consistent or more homogeneous, i.e., its
values lie closer to the mean of the series. A series with a larger C.V. is called more
variable or more heterogeneous, i.e., its values lie farther from the mean of the
series.
$\bar{x} = 24$, $\sigma = 6$
Solution. Coefficient of variation (C.V.) = $\frac{\sigma}{\bar{x}} \times 100 = \frac{6}{24} \times 100 = 25\%$
Example14. The following data show the mean and standard deviation of the systolic BP
and weight of 10 persons:
                      BP      Weight
Mean                  120     60
Standard deviation    15      4.5
Solution. To compare the two characteristics we find the C.V. of each.
C.V. for BP = $\frac{\sigma}{\bar{x}} \times 100 = \frac{15}{120} \times 100 = 12.5\%$
C.V. for weight = $\frac{\sigma}{\bar{x}} \times 100 = \frac{4.5}{60} \times 100 = 7.5\%$
The coefficient of variation of BP is greater than the coefficient of variation of
weight, so BP is more variable than the weight of the given persons.
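The comparison can be expressed as a small helper. This is an illustrative sketch; the function name `coefficient_of_variation` is our own, and the inputs are the means and standard deviations quoted in Example 14.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation as a percentage: sd / mean * 100."""
    return sd / mean * 100

cv_bp = coefficient_of_variation(mean=120, sd=15)
cv_weight = coefficient_of_variation(mean=60, sd=4.5)

print(f"C.V. BP = {cv_bp:.1f}%, C.V. weight = {cv_weight:.1f}%")
# The characteristic with the larger C.V. is the more variable one.
more_variable = "BP" if cv_bp > cv_weight else "weight"
print(more_variable)  # BP
```

Because the C.V. is unit-free, the two values can be compared directly even though BP and weight are measured in different units.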