Biostat English

This document discusses the objectives, definitions, symbols, scope, and applications of biostatistics. The objectives include defining biostatistics and statistics, understanding statistical symbols, and learning about data collection techniques and representation. Biostatistics is defined as applying statistical methods to biological areas like medicine and health research. It has broad scope and applications in fields like medicine, pharmacy, agriculture, and environmental science by aiding in experiment design, data analysis, and interpreting results.

Uploaded by

Devashish Vyas
Copyright © All Rights Reserved

1.1 OBJECTIVES

Following are the objectives of this chapter:

1. To know the definitions of Statistics and Biostatistics.

2. To know about some statistical symbols.

3. To know the scope and applications of Biostatistics.

4. To know about data, its collection and collection techniques.

5. To know about organization and representation of data by many graphical techniques
such as histogram, pie chart, frequency polygon etc.

1.2 INTRODUCTION

We welcome the reader who wishes to learn biostatistics. In this chapter we introduce
you to the subject. We first define statistics and biostatistics, and then give examples
where biostatistical techniques are useful. These examples show that biostatistics plays
an important role in advancing our biological knowledge; biostatistics helps to evaluate many
life-and-death issues in medicine.

We advise you to read the examples carefully and then ask yourself, “What can be
inferred from the information presented?” What would you do with the data after they are
collected? How can they be presented, and what can you learn from them? We want you to
realize that biostatistics is a tool that can be used to benefit you and society.

There is no royal road to biostatistics. You need to be involved. You need to work
hard. You need to think. If you analyze the actual data, the result will be a powerful tool that
has immediate practical uses. Our main purpose is to develop thought patterns in your mind
that are useful in evaluating information in all areas of your life.

1.3 DEFINITIONS OF STATISTICS AND BIOSTATISTICS


Much of the joy and pain in life arises in situations that involve considerable
uncertainty. The following two situations show why the study of statistics and
biostatistics is necessary.
1. Parents of a child with a genetic defect consider whether or not they should have another
child. They will base their decision on the chance that the next child will have the same
defect.

2. To choose the best therapy, a physician must compare the prognosis, or future course, of a
patient under several therapies. A therapy may be a success, a failure, or somewhere in
between; the evaluation of the chance of each occurrence necessarily enters into the decision.

1.3.1 DEFINITION OF STATISTICS

Statistics is the science which deals with the collection, classifying, presenting, comparing
and interpreting numerical data collected to throw light on any sphere of enquiry - Lovitt.

The science of statistics is a most useful servant, but only of great value to those who
understand its proper use - W.I. King. Statistics provides tools and techniques for research
workers - A.M. Mood. Planning is the order of the day and without statistics planning is
inconceivable - L.H.C. Tippett.

Statistics may be defined as a science of numerical information which employs the process of
measurement and collection, classification, analysis, decision making and communication of
results in a manner understandable and verifiable by others - Cecil H. Meyers.

1.3.2 DEFINITION OF BIOSTATISTICS

Biostatistics is the application of statistical methods to biological areas.
Biological laboratory experiments, medical research (including clinical research), and health
services research all use statistical methods. Many other biological disciplines rely on
statistical methodology.

There are three reasons for focusing on biostatistics:

1. Some statistical methods are used more heavily in biostatistics than in other fields. For
example, a general statistical textbook would not discuss the life-table method of analyzing
survival data, which is important in many biostatistical applications. The topics in this book
are adapted to the applications in mind.

2. Examples are drawn from the biological, medical, and health care areas; this helps you
maintain motivation. It also helps you in understanding how to apply statistical methods.

3. A third reason for a book on biostatistics is to teach the material to the audience of health
professionals. In this case, the interaction between students and teacher, but especially among
the students themselves, is of great value in learning and applying the subject matter.

1.4 STATISTICAL SYMBOL

1.4.1 STATISTICAL SYMBOL

Some of the statistical symbols which are useful to biostatistics students are:

f: Frequency of the variate

x̄ : Arithmetic Mean of a given set of values or of a distribution

Me : Median of a given set of values or of a distribution

Mo : Mode of a given set of values or of a distribution

σ : Standard Deviation of a given set of values or of a distribution

σ² : Variance of a given set of values or of a distribution

Σ : Sum of all the values of a given set

Q.D.: Quartile deviation of a given set of values or of a distribution

M.D.: Mean deviation of a given set of values or of a distribution

1.4.2 SCOPE OF BIOSTATISTICS

Biostatistics is the application of statistics in different fields of biology. The science
of biostatistics includes the design of biological experiments, especially in medicine,
pharmacy, agriculture, forestry, environmental science, fishery, etc.; the collection,
summarization, and analysis of data from those experiments; and the interpretation of, and
inference from, the results. A major branch of this is medical biostatistics, which is
exclusively concerned with health and medical sciences.

In the current world, the scope of biostatistics is increasing rapidly. Almost all
educational programmes in biostatistics are at postgraduate level. They are most often
found in schools of public health, affiliated with schools of medicine, forestry, or
agriculture, or as a focus of application in departments of statistics.

In larger universities where both a statistics and a biostatistics department exist, the
degree of integration between the two departments may range from the bare minimum to
very close collaboration. In general, the difference between a statistics program and a
biostatistics program is twofold: (i) statistics departments will often host
theoretical/methodological research, which is less common in biostatistics programs, and (ii)
statistics departments have lines of research that may include biomedical applications but
also other areas such as industry (quality control), business and economics, and biological
areas other than medicine.

There is a special need for the subject of biostatistics because it is related to such areas
as medicine, pharmacy, forestry, agriculture, etc., which are essential for the betterment
of society.

1.4.3 APPLICATION OF BIOSTATISTICS

The importance and application of statistics in the field of biology is increasing day
by day. Why is this so? The reason is that in biology the interplay of causal and response
variables follows laws that are not in the classic mold of 19th century physical science. In
that century, biologists such as Robert Mayer, Helmholtz, and others, in trying to show that
biological processes were nothing but physicochemical phenomena, helped create the
impression that the experimental methods and natural philosophy that had led to such
dramatic progress in the physical sciences should be imitated fully in biology.

Many biologists even to this day have retained the tradition of strictly mechanistic
and deterministic concepts of thinking, while physicists, as their science became more
refined and came to deal with ever more elementary particles, began to resort to statistical
approaches. In biology most phenomena are affected by many causal factors, uncontrollable
in their variation and often unidentifiable. Statistics is needed to measure such variable
phenomena with a predictable error and to ascertain the reality of minute but important
differences.

A biostatistics centre could jointly organize working groups, a seminar series,
computing infrastructure and possibly consulting and clinical trials coordinating centre
services. The main objective of the centre would be to estimate, collaborate on, and circulate
results of research in a particular subspecialty in the following areas:

1. Statistical methods for longitudinal studies;
2. Statistical genetics;
3. Foundations of inference;
4. Bayesian biostatistics;
5. Biostatistical practice and education.

The most critical short-term problem in the field of biostatistics is the information
system. We need to incorporate modern, web-based technologies into the everyday
workings of the department of biostatistics. We need reliable and accessible systems that
are competitive with those available to other departments of statistics and biostatistics.
We will likely need to build collaborations with computer science students.

1.5 DATA AND ITS TYPES

1.5.1 DATA

The information collected from a census, from surveys or from other sources is called raw data.
The word data means information. The adjective raw attached to data indicates that the
information collected cannot be used directly. It has to be converted into a more suitable form
before it begins to make sense and can be utilized gainfully. Raw data is like raw rice. Raw rice
has to be cooked properly and tastefully before it is eaten and digested. Similarly, raw data has
to be processed, for example by data tabulation, which gives meaning to the information
collected. Data are tabulated by (1) manual procedure, (2) mechanical procedure, or (3) computer
feeding. In the preparation of tables, the following principles are followed:

(i) A rough draft of the table should be prepared first. Before drawing up the
final table, the rough draft should be examined carefully.
(ii) Headings of the rows and columns should be brief and clear.
(iii) The title, notes, rows and columns should be specific, so that their meaning
is clear.
(iv) The number of class intervals should be decided according to the aims of the
study; it should be neither too small nor too big.
(v) Symbols used should be explained.
(vi) Tabulated data should specify the units of measurement.
(vii) The sources from which the data are obtained should be given.

1.8.2 REPRESENTATION OF DATA

Tabulated data give some information and also allow for further analysis.
However, the columns and rows of a table strain the eye, and data presented in
tabular form may leave a poor visual impression. Well-tabulated data can therefore be
represented in the form of a picture, diagram or figure, which helps comparison
through a good visual impression. The representation of quantitative data
through charts and diagrams is known as graphical representation of statistical data. A
picture is said to be more effective than words for describing a particular thing or
phenomenon. The main objective of a diagram is to help the eye grasp a series of numbers
and the meaning of a series of data, and also to assist the intelligence.

There are various types of graphs in the form of charts and diagrams. Some of
them are:

1. Bar diagram, 2. Pie chart, 3. Histogram, 4. Frequency polygon and
Frequency curve, 5. Pictograms, 6. Line chart, 7. Cumulative frequency
curve, 8. Scatter diagram

1.8.2.1 Bar diagram

The simplest type of graph that can be used to represent categorical data is the bar
diagram. It is also called a columnar diagram. Bar diagrams are drawn with
columns of equal width. In this diagram we show the categories of the variable on the
X-axis and the frequencies on the Y-axis on graph paper. A bar is drawn for each category
of the variable, and the height of the bar represents the frequency of that category.
Since the data are qualitative, or quantitative of discrete type, the bars should not
touch each other and there should be an equal gap between two successive bars.
The following rules should be observed while constructing a bar diagram:

(a) The width of all the bars or columns should be the same.

(b) All the bars should be placed at equal intervals/distance.

The following types of bar graphs are possible:

(a) Simple bar graph

(b) Double bar graph

(c) Multiple bar graphs

We will illustrate each of these graphs by the following illustrations:

(a) Simple Bar Diagram

A simple bar diagram is constructed for an immediate comparison. It is advisable to
arrange the given data set in an ascending or descending order and plot the data variables
accordingly. For example, the number of OPD patients with a particular disease recorded at a
base hospital in each month of the year 2012 is given below.

Month:     Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Patients:  285  315  250  289  386  410  452  620  421  186  450  500

Figure-1.5 Simple bar diagram (X-axis: month, Jan to Dec; Y-axis: number of patients, 0 to 700)

(b) Double Bar Diagram

When two components are grouped in one set of variables, or different variables of one
component are put together, they are represented by a double bar diagram. In this
method, the different variables are shown in a single bar with different rectangles.
Continuing the above example, the patients were divided into two categories, male and
female, and the data are given below:

Month:   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Male:    100  250  150  189  270  200  350  275  215   86  300  200
Female:  185  115  100  100  116  210  102  345  206  100  150  300

Figure-1.6 Double bar diagram (legend: Male, Female; X-axis: month, Jan to Dec; Y-axis: number of patients, 0 to 350)

(c) Multiple Bar Diagram

A multiple bar diagram shows the proportions of subgroups within two or more categories:
each category is represented by a bar, and each subgroup is given its proportion within the
bar. It is also advisable to take each bar as 100% and to give each subcategory its
proportion within the bar.

1.8.2.2 Pie Chart

A pie diagram is another graphical method for the representation of categorical data. The
name refers to the circular shape of the diagram, which is cut into slices like a pie; it
should not be confused with the mathematical constant π, which is defined as the ratio of the
circumference of a circle to its diameter and is approximately equal to 22/7. The diagram
depicts the total value of the given attribute using a circle.
In the pie chart, a circle (total 360°) is divided into sectors with areas proportional to the
frequencies or the relative frequencies of the categories of a variable. Dividing the circle
into the corresponding angles then represents the subsets of the data. Hence, it is also
called a Divided Circle Diagram.
Example 2. A household with a monthly salary of Rs. 7200 plans its budget for a month as
given below:

Item:          Food   Rent   Education   Savings   Misc.   Total
Amount (Rs.):  3000    800        1200      1500     700    7200

Make a pie chart for this data.

Solution. First of all we find the angle of each sector as follows:

The total of the data corresponds to 360°. Let x° be the angle at the centre for item A; then,
for the data given in the above example, we find the angle of each category in order to draw
the pie chart.

Calculation of Angles

For Food:

$$\text{Angle at centre} = \frac{f}{\sum f} \times 360^\circ = \frac{3000}{7200} \times 360^\circ = 150^\circ$$

Here f = frequency of food and ∑f = total frequency.

For Rent:

$$\text{Angle at centre} = \frac{f}{\sum f} \times 360^\circ = \frac{800}{7200} \times 360^\circ = 40^\circ$$

Similarly, we can calculate the remaining angles, and the total of the angles column should
always come to 360°.

Table-2

Item             Amount (Rs.)   Angle (degrees)
Food (A)         3000           150
Rent (B)          800            40
Education (C)    1200            60
Savings (D)      1500            75
Miscellaneous     700            35
Total            7200           360

Figure-1.7 Pie chart
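
The angle calculation above can be sketched in a few lines of Python. This is only an illustration: the names `budget` and `sector_angles` are chosen here and are not part of the text; the amounts are those of Example 2.

```python
# Monthly budget from Example 2 (amounts in Rs.).
budget = {"Food": 3000, "Rent": 800, "Education": 1200, "Savings": 1500, "Misc.": 700}


def sector_angles(amounts):
    """Return the central angle in degrees for each category: f / sum(f) * 360."""
    total = sum(amounts.values())
    return {item: amount / total * 360 for item, amount in amounts.items()}


angles = sector_angles(budget)
for item, angle in angles.items():
    print(f"{item}: {angle:.0f} degrees")
```

A quick check that the angles add up to 360 is a useful safeguard against arithmetic slips before drawing the chart.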

1.8.2.3 Histogram

A two dimensional frequency density diagram is called a histogram. A histogram is a
diagram which represents the class interval and frequency in the form of a rectangle. There
will be as many adjoining rectangles as there are class intervals. There are two types of
histograms:

(1) Histogram with equal class intervals


(2) Histogram with unequal class intervals

To draw a histogram, you should follow the steps stated below:

1. Class intervals must be exclusive. If the intervals are in inclusive form, convert them
to the exclusive form.

2. Draw rectangles with class intervals as bases and the corresponding frequencies (or, for
unequal intervals, frequency densities) as heights.

3. If the intervals are equal, then the height of each rectangle is proportional to the
corresponding class frequency.

4. If the intervals are unequal, then the height of each rectangle is proportional to the
corresponding frequency density (frequency divided by class width), so that the area of each
rectangle is proportional to the class frequency.

Example 3. Draw a histogram for the following data showing the class intervals and their
corresponding frequencies.

Class interval:  0-5   5-10   10-15   15-20   20-25
Frequency:         4     10      18       8       6

Figure-1.8 Histogram

Example 4. Following is the distribution of shops according to the number of wage earners
employed at a shopping complex.

Table-3 showing the distribution of wage earners

Number of wage earners   No. of shops   Frequency density
Under 5                  18             3.6
5 – 10                   27             5.4
10 – 20                  24             2.4
20 – 30                  20             2.0
30 – 50                  16             0.8

Illustrate the above table by a histogram, showing clearly how you deal with the unequal
class intervals.

Solution. When the class intervals are unequal, we construct each rectangle with the class
intervals as base and frequency density as height.

Frequency density = Frequency/ Class width

Figure- 1.9
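
As a rough check of the frequency density column in Table-3, the formula can be applied in Python. The sketch assumes that the open class "Under 5" starts at 0, which the text does not state explicitly; the names `classes` and `frequency_density` are illustrative.

```python
# (lower limit, upper limit, number of shops) from Table-3.
# Assumption: the class "Under 5" is treated as 0-5.
classes = [(0, 5, 18), (5, 10, 27), (10, 20, 24), (20, 30, 20), (30, 50, 16)]


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
for (lo, hi, f), density in zip(classes, densities):
    print(f"{lo}-{hi}: frequency {f}, density {density:.1f}")
```

The densities, not the raw frequencies, are the rectangle heights when the class widths differ.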

1.8.2.4 Frequency Polygon and Frequency Curve

In a frequency distribution, the mid-value of each class is obtained. Then on the graph
paper, the frequency is plotted against the corresponding mid-value. These points are joined
by straight lines. These straight lines may be extended in both directions to meet the X - axis
to form a polygon. If these points are joined by a free hand smooth curve then it is called
Frequency curve.

Example 5. The growth rate of different crops like rice, wheat, birth rates, death rates and
life expectancy are given in the following table. Make a frequency polygon from it.

Table-4 Showing class interval and frequency

Class interval   Mid Marks   Frequency
40 – 44          42           3
45 – 49          47          10
50 – 54          52          12
55 – 59          57          15
60 – 64          62           7
65 – 69          67           5

Figure-1.10 Frequency polygon

1.8.2.5 Pictograms

A pictograph uses pictures or images to present data. Pictographs give a quick
idea of the frequency of a characteristic, and fractions can also be marked on the pictures,
e.g., a bus for transport, a man for cases, a cot for hospital beds, etc. They are widely
used by government and private organizations. The chief advantage of this method is its
attractiveness.

1.8.2.6 Line chart

The line chart is widely used in medical science. It shows trends over time. Data having
some order, such as the age-wise incidence of a disease, can be represented by a line chart.
It is drawn by taking one variable on the horizontal X-axis and the other variable on the
vertical Y-axis. This graph shows how one variable changes with the other, e.g., the
age-specific incidence of cancer among males of Delhi.

1.8.2.7 Cumulative frequency curve

If we plot the less than cumulative frequencies, rather than the frequencies, against the
upper limits of the classes, the curve obtained on joining these points by a free hand curve
is called the less than cumulative frequency curve, or ogive, or less than ogive. If we plot
the more than cumulative frequencies, rather than the frequencies, against the lower limits
of the classes, the curve obtained on joining these points by a free hand curve is called the
more than cumulative frequency curve. The advantage of this curve is that it enables us to
answer queries related to the frequency distribution of the variable, such as how many
observations lie below or above a given value.
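
The two kinds of cumulative frequencies can be illustrated with a short Python sketch. It reuses the class intervals and frequencies of Example 3 purely for illustration (the text does not work this example); only the points to be plotted are computed, not the curve itself.

```python
from itertools import accumulate

# Class limits and frequencies reused from Example 3 (an illustrative choice).
lower_limits = [0, 5, 10, 15, 20]
upper_limits = [5, 10, 15, 20, 25]
frequencies = [4, 10, 18, 8, 6]

# "Less than" cumulative frequencies: running total from the first class.
less_than = list(accumulate(frequencies))

# "More than" cumulative frequencies: running total from the last class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

# Points for the less than ogive: (upper limit, cumulative frequency).
less_than_points = list(zip(upper_limits, less_than))
# Points for the more than ogive: (lower limit, cumulative frequency).
more_than_points = list(zip(lower_limits, more_than))

print(less_than_points)
print(more_than_points)
```

The last "less than" value and the first "more than" value both equal the total frequency, which gives a simple consistency check.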

1.8.2.8 Scatter diagram

It is the simplest way of representing bivariate data. For a bivariate
distribution (x, y), if the values of the variables X and Y are plotted along the X-axis and
the Y-axis respectively in the xy-plane, the diagram of dots so obtained is called a
scatter diagram.

1.9 SUMMARY
From the study of this chapter the students came to know the definitions of statistics
and biostatistics, the scope and applications of biostatistics. The students studied and learnt
about data. What is data? What are the types of data? The classification of different types of
data provides knowledge to treat different types of data. We learn from the study of this
chapter the different steps necessary for adopting any sampling procedure and the two types
of error involved in the collection of sample and complete census. We learn definitions of
2.1 OBJECTIVES
From the study of this chapter the students will be able:

1. To know about the measures of central tendency: mean, median and mode.
2. To know the merits, demerits and uses of these measures.
3. To know about different methods of computing mean, median and mode.
4. To know which measure is better to use in which situation.
5. To know the advantages of short-cut methods of computing the mean.

2.2 INTRODUCTION
In the previous chapter, we discussed data collection, data organization and data
representation techniques. The data representation techniques, such as frequency histograms and
frequency polygons, introduced the concept of the shape of distributions of data. For example, a
frequency polygon illustrated the distribution of body mass index data. We expand on these
concepts from Chapter 1 by defining measures of central tendency.

Measures of central tendency, as the name suggests, are numerical measurements of the
central part of the distribution. Measures of central tendency are also called averages or measures
of location because they show the location of the centre of the distribution from which the data
were sampled. According to Professor Bowley, averages are “statistical constants which enable
us to comprehend in a single effort the significance of the whole.” In other words, these are
numbers that tell us where the majority of values in the distribution are located. An example is
the average marks in a distribution of marks of all the students of a class. The averages which
are commonly used in biostatistics are as follows:

1. Mean or arithmetic mean 2. Median 3. Mode.

2.3 MEAN
Mean or arithmetic mean of a series of data is the ratio of the sum of the observations to
the number of observations. If $x_1, x_2, \ldots, x_n$ are the observations of a series then their
arithmetic mean is given by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (1)$$

And if the corresponding frequencies $f_1, f_2, \ldots, f_n$ of the variables $x_1, x_2, \ldots, x_n$ are
given, then the arithmetic mean is defined as the ratio in which the numerator is the sum of the
products of the variables with their frequencies and the denominator is the sum of the frequencies.

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{\sum f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{N} \qquad (2)$$

where $N = \sum f_i$ = sum of frequencies.

2.3.1 MEAN OF INDIVIDUAL ITEMS

Mean of individual items is given by the ratio of the sum of items to the number of items
as given in formula (1).

Example 1. Find the arithmetic mean of the triglyceride values found in the blood samples of
10 patients in a hospital:

25, 30, 21, 55, 47, 10, 15, 17, 45, 35

Solution. Let $\bar{x}$ be the average triglyceride value. Since these are individual items, their
mean can be computed by the formula

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{25 + 30 + 21 + 55 + 47 + 10 + 15 + 17 + 45 + 35}{10} = \frac{300}{10} = 30$$
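
Formula (1) translates directly into code. The following Python sketch recomputes Example 1; the function name `arithmetic_mean` is chosen here for illustration.

```python
# Triglyceride values of the 10 patients in Example 1.
triglycerides = [25, 30, 21, 55, 47, 10, 15, 17, 45, 35]


def arithmetic_mean(values):
    """Formula (1): sum of the observations divided by their number."""
    return sum(values) / len(values)


mean_triglycerides = arithmetic_mean(triglycerides)
print(mean_triglycerides)
```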

2.3.2 MEAN IN DISCRETE FREQUENCY DISTRIBUTION

If $x_1, x_2, \ldots, x_n$ are the observations in a discrete distribution according to some
characteristic and $f_1, f_2, \ldots, f_n$ are their corresponding frequencies, then the arithmetic
mean is given by formula (2). The computation procedure for the mean can be easily understood
with the help of the example given below.
Example 2. The distribution of marks of 50 students of a B.Sc. class in a botany semester
examination is given below. Find the average of marks.

Marks (x):      12   23   25   35   45   15   40
Frequency (f):   3   10   12   10    2    8    5

Solution. Since this is a discrete distribution, the average of marks is given by formula (2).
For the computation of average marks we prepare the following table:

Table 1. For calculation of Mean in Discrete Distribution

Marks (x)   Frequency (f)   fx
12           3               36
23          10              230
25          12              300
35          10              350
45           2               90
15           8              120
40           5              200
Total       ∑f = 50         ∑fx = 1326

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{\sum f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{N} = \frac{1326}{50} = 26.52$$

It is clear from this example that the average need not be one of the values present in the
data, and that it need not be an integer even though all the marks are integers.
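
The tabular calculation with formula (2) can be sketched in Python as follows. The names `marks`, `frequencies` and `frequency_mean` are illustrative; the data are those of the marks example above.

```python
# Marks and their frequencies from the botany examination example.
marks = [12, 23, 25, 35, 45, 15, 40]
frequencies = [3, 10, 12, 10, 2, 8, 5]


def frequency_mean(values, freqs):
    """Formula (2): sum of f*x divided by the sum of the frequencies."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    total_f = sum(freqs)
    return total_fx / total_f


mean_marks = frequency_mean(marks, frequencies)
print(round(mean_marks, 2))
```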
2.3.3 MEAN IN CONTINUOUS DISTRIBUTION

In case of continuous distribution, there are given class intervals and their
corresponding frequencies. First of all we find the mid values of these classes and treat them
as the variable values. Now we apply the formula (2) for the calculation of arithmetic mean.
The procedure will be clear from the following example.

Example 3. For the data given in the table below on the systolic BP of 68 patients, calculate
the arithmetic mean.

Table 2.

Systolic BP (mmHg)   Frequency (f)
90-100                3
100-110               5
110-120               7
120-130              10
130-140              15
140-150              11
150-160               9
160-170               6
170-180               2

Solution. For the calculation of the mean we prepare the following table:

Table 3. For calculation of Mean in Continuous Distribution

Systolic BP (mmHg)   Frequency (f)   Mid Value (x)   fx
90-100                3               95              285
100-110               5              105              525
110-120               7              115              805
120-130              10              125             1250
130-140              15              135             2025
140-150              11              145             1595
150-160               9              155             1395
160-170               6              165              990
170-180               2              175              350
Total                ∑f = 68                         ∑fx = 9220

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N} = \frac{9220}{68} = 135.6 \text{ mmHg}$$
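
For a continuous distribution the only extra step is taking the mid value of each class. A minimal Python sketch, using the systolic BP classes of Example 3 (the names `bp_classes` and `grouped_mean` are illustrative):

```python
# (lower limit, upper limit, frequency) for the systolic BP data of Example 3.
bp_classes = [(90, 100, 3), (100, 110, 5), (110, 120, 7), (120, 130, 10),
              (130, 140, 15), (140, 150, 11), (150, 160, 9), (160, 170, 6),
              (170, 180, 2)]


def grouped_mean(classes):
    """Treat each class mid value as x and apply formula (2)."""
    total_fx = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    total_f = sum(f for _, _, f in classes)
    return total_fx / total_f


mean_bp = grouped_mean(bp_classes)
print(round(mean_bp, 1))
```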

2.3.4 SHORT-CUT METHOD FOR MEAN

The short-cut method for computing the mean is applied when the variable values
and their frequencies are large. To make the computations easy, we take a middle value among
the given values of x as the assumed mean and subtract this assumed mean from all the values
of x. This assumed mean is also called the provisional mean. The arithmetic mean is then
given by:

$$\bar{x} = A + \frac{\sum d}{n} \qquad (3)$$

where A = assumed mean, n = number of observations in the given data, and

d = x - A = deviation of each variate value from the assumed mean A.

Steps of computation for short- cut method:

Step 1. Take any observation (generally, middle value if we arrange the values in
ascending or descending order of magnitude) of the individual series as assumed mean A.

Step 2. Find the deviation of the values of variate x from assumed mean A, i.e., calculate
the differences d= x- A

Step 3. Find the sum of d and use formula (3) above to find the value of the mean.

If the frequencies corresponding to the variate values are given, then we use
the following formula for the mean:

$$\bar{x} = A + \frac{\sum f d}{N} \qquad (4)$$

where N = ∑f = sum of frequencies. Here we find the product of f and d for each value.

If the data are continuous, we take the mid values as x and then d = x - A. Now
apply formula (4) above. The procedure will be clear from the examples
given below.

Example 4. The marks of the 7 students of a class in a test are as given below:

12, 15, 22, 25, 35, 40, 45

Find the mean by short-cut method.

Solution. Let us take the assumed mean A = 25. Now we prepare the table for the computation
of the mean as given below:

Table-4 Mean for individual data by short cut method

x       d = x - 25
12      -13
15      -10
22       -3
25        0
35       10
40       15
45       20
Total   ∑d = 19

$$\bar{x} = A + \frac{\sum d}{n} = 25 + \frac{19}{7} = 25 + 2.71 = 27.71$$

Thus the average of marks of the given 7 students of the class is 27.71.
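
The three steps of the short-cut method for individual items can be sketched in Python. The function name `short_cut_mean` is an illustrative choice; the result is compared with the direct mean to show that the choice of A does not change the answer.

```python
# Marks of the 7 students in Example 4.
marks = [12, 15, 22, 25, 35, 40, 45]


def short_cut_mean(values, assumed_mean):
    """Formula (3): A + (sum of deviations from A) / n."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


mean_short_cut = short_cut_mean(marks, assumed_mean=25)
mean_direct = sum(marks) / len(marks)
print(round(mean_short_cut, 2), round(mean_direct, 2))
```

Any other assumed mean, for instance 22 or 35, gives the same result; a middle value simply keeps the deviations small.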

Example 5. Ten patients were examined for uric acid test. The operation was performed
1050 times and the frequencies so obtained for different numbers of patients (x) are shown
in the table given below. Compute the arithmetic mean by the short-cut method.

x:   0   1    2     3     4     5     6     7    8   9   10
f:   2   8   43   133   207   260   213   120   54   9    1

Solution. Let 5 be the assumed mean. Now we prepare the table for the calculation of
the mean.

Table-5 Mean for discrete grouped data by short cut method

x       Frequency (f)   d = x - 5   fd
0         2             -5           -10
1         8             -4           -32
2        43             -3          -129
3       133             -2          -266
4       207             -1          -207
5       260              0             0
6       213              1           213
7       120              2           240
8        54              3           162
9         9              4            36
10        1              5             5
Total   ∑f = 1050                   ∑fd = 12

$$\bar{x} = A + \frac{\sum f d}{N} = 5 + \frac{12}{1050} = 5 + 0.0114 = 5.0114$$

Thus the average for the uric acid test data is 5.0114.
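
Formula (4) for data with frequencies can be checked in the same way. The sketch below uses the data of Example 5; the name `short_cut_frequency_mean` is illustrative.

```python
# Values of x and their frequencies from Example 5.
x_values = list(range(11))  # 0, 1, ..., 10
frequencies = [2, 8, 43, 133, 207, 260, 213, 120, 54, 9, 1]


def short_cut_frequency_mean(values, freqs, assumed_mean):
    """Formula (4): A + (sum of f*d) / N, with d = x - A."""
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    return assumed_mean + sum_fd / sum(freqs)


sum_fd = sum(f * (x - 5) for x, f in zip(x_values, frequencies))
mean_x = short_cut_frequency_mean(x_values, frequencies, assumed_mean=5)
print(sum_fd, round(mean_x, 4))
```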

2.3.5 STEP DEVIATION METHOD OF MEAN

The step deviation method can be used for grouped data. When all the classes are of equal
width (say h) in continuous data, or the values of x are at equal intervals in discrete grouped
data, we may simplify the calculations by taking d = (x - A)/h in the short-cut method. The
formula for the calculation of the mean then becomes

$$\bar{x} = A + \frac{\sum f d}{N} \times h$$

Here, the symbols have the same meaning as in the short-cut method above and h is the gap
between two successive values of x, or the class width.

Example 6. Find the mean by the step deviation method for the data of blood pressure of 68
patients as given in the following table.

BP (mmHg) (x):   90   100   110   120   130   140   150   160   170
Frequency (f):    3     5     7    10    15    11     9     6     2

Solution. We take the assumed mean A = 130, and here the interval between any two successive
values of x is 10, i.e., h = 10. Now prepare the table for the computation of the mean.

Table 6. Step Deviation Method of Mean in discrete grouped data

BP (mmHg)   Frequency (f)   d = (x - 130)/10   fd
90           3              -4                 -12
100          5              -3                 -15
110          7              -2                 -14
120         10              -1                 -10
130         15               0                   0
140         11               1                  11
150          9               2                  18
160          6               3                  18
170          2               4                   8
Total       ∑f = 68                            ∑fd = 4

$$\bar{x} = A + \frac{\sum f d}{N} \times h = 130 + \frac{4}{68} \times 10 = 130 + 0.588 = 130.588 \text{ mmHg}$$

Thus the average for BP is 130.588 mmHg.

Example 7. For Example 3, calculate the arithmetic mean by the step deviation method.

Solution. For the calculation of the mean the table is given below:

Table-7. Mean by Step Deviation Method in Continuous Data

Systolic BP (mmHg)   Frequency (f)   Mid Value (x)   d = (x - 135)/10   fd
90-100                3               95             -4                 -12
100-110               5              105             -3                 -15
110-120               7              115             -2                 -14
120-130              10              125             -1                 -10
130-140              15              135              0                   0
140-150              11              145              1                  11
150-160               9              155              2                  18
160-170               6              165              3                  18
170-180               2              175              4                   8
Total                ∑f = 68                                            ∑fd = 4

$$\bar{x} = A + \frac{\sum f d}{N} \times h = 135 + \frac{4}{68} \times 10 = 135 + 0.588 = 135.588 \text{ mmHg}$$

Thus the average systolic BP is 135.588 mmHg.
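
The step deviation calculation of Example 7 can be sketched in Python as follows. The function name `step_deviation_mean` and its parameter names are illustrative; the result is compared with the direct value 9220/68 obtained in Example 3.

```python
# Mid values and frequencies of the systolic BP classes (Examples 3 and 7).
mid_values = [95, 105, 115, 125, 135, 145, 155, 165, 175]
frequencies = [3, 5, 7, 10, 15, 11, 9, 6, 2]


def step_deviation_mean(values, freqs, assumed_mean, h):
    """A + (sum of f*d) / N * h, with step deviations d = (x - A) / h."""
    sum_fd = sum(f * (x - assumed_mean) / h for x, f in zip(values, freqs))
    return assumed_mean + sum_fd / sum(freqs) * h


mean_bp = step_deviation_mean(mid_values, frequencies, assumed_mean=135, h=10)
print(round(mean_bp, 3))
```

Dividing by h keeps the deviations as small integers, which is what makes the hand calculation shorter; the final value is unchanged.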

Example 8. In a study on patients of typhoid fever the following data are obtained. Find the
arithmetic mean.

Age in years:   10-19   20-29   30-39   40-49   50-59   60-69   70-79   80-89
No. of cases:       1       0       1      10      17      38       9       3

Solution. This is inclusive type data; first of all we convert it to exclusive type data. The
procedure for converting inclusive type data to exclusive type data is as follows:

We see that the upper limit of the first class is 19 and the lower limit of the second
class is 20, and their difference is 20 - 19 = 1. Now subtract half of the difference, i.e., 0.5,
from the lower limit of each class and add 0.5 to the upper limit. This difference is the same
for each of the classes, so the new classes are 9.5-19.5, 19.5-29.5 and so on.

Now for the calculation of the mean any method discussed above can be used. Here we
apply the step deviation method.

Table 8. Mean by Step Deviation Method in Inclusive Data

Age (years)   Frequency (f)   Mid Value (x)   $d = \frac{x - 44.5}{10}$   fd

9.5-19.5 1 14.5 -3 -3

19.5-29.5 0 24.5 -2 0
29.5-39.5 1 34.5 -1 -1

39.5-49.5 10 44.5 0 0

49.5-59.5 17 54.5 1 17

59.5-69.5 38 64.5 2 76

69.5-79.5 9 74.5 3 27

79.5-89.5 3 84.5 4 12

Total ∑f = 79 ∑ fd = 128

Arithmetic mean $\bar{x} = A + \frac{\sum fd}{N} \times h = 44.5 + \frac{128}{79} \times 10 = 44.5 + 16.2 = 60.7$ years
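
The conversion from inclusive to exclusive classes in Example 8 can be sketched as below. The helper name `to_exclusive` is hypothetical, and the sketch assumes that the gap between adjacent classes is the same throughout, as it is in this example.

```python
def to_exclusive(classes):
    """Turn inclusive classes such as (10, 19) into exclusive ones such as (9.5, 19.5)."""
    gap = classes[1][0] - classes[0][1]  # 20 - 19 = 1 in Example 8
    half = gap / 2
    # Subtract half the gap from each lower limit and add it to each upper limit
    return [(lower - half, upper + half) for lower, upper in classes]


ages = [(10, 19), (20, 29), (30, 39), (40, 49),
        (50, 59), (60, 69), (70, 79), (80, 89)]
cases = [1, 0, 1, 10, 17, 38, 9, 3]

boundaries = to_exclusive(ages)
mids = [(lower + upper) / 2 for lower, upper in boundaries]
mean_age = sum(f * x for f, x in zip(cases, mids)) / sum(cases)
print(boundaries[0], round(mean_age, 1))  # (9.5, 19.5) 60.7
```

The mean is computed here directly from the mid values; it agrees with the step deviation result because both are the same arithmetic mean.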

2.3.6 WEIGHTED MEAN

In computation of arithmetic mean some items are more important than the others, in
such cases the weightage should be given to the items according to their importance. For
example if we want to have an idea of the change in cost of living of group of people of a
certain locality, then the simple mean of the prices of the commodities consumed by them
will not do, since all the commodities are not equally important, e.g., wheat, rice and pulses
are more important than cigarettes, tea, confectionery, etc.

If $x_1, x_2, \ldots, x_n$ are the variate values of a distribution and $w_1, w_2, \ldots, w_n$ are their
corresponding weights, then the weighted mean is given by:

$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$

Example 9. The following table gives the platelet count (in lakh/cmm) from the analysis of
the blood samples on five different days in a pathology laboratory. Find the average platelet
count per patient.

Day 1 2 3 4 5

Platelet count (in lakh/cmm) (x) 0.50 0.75 1.00 1.50 2.00

No. of patients (w) 65 80 95 90 70

Solution. The table for the calculation of weighted mean is given by:

Table 9. Table for Weighted Mean

Platelet count (x)   No. of patients (w)   wx

0.50 65 32.5

0.75 80 60.0

1.00 95 95.0

1.50 90 135

2.00 70 140

Total ∑ w = 400 ∑ wx = 462.5

$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{462.5}{400} = 1.156$

Thus, the average platelet count per patient is 1.156 lakh/cmm.
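
As a quick check of Example 9, the weighted mean formula can be written as a short Python sketch. The function name `weighted_mean` and the variable names are illustrative assumptions.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


platelets = [0.50, 0.75, 1.00, 1.50, 2.00]  # x, in lakh/cmm
patients = [65, 80, 95, 90, 70]             # w, number of patients

print(round(weighted_mean(platelets, patients), 3))  # 1.156
```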

2.3.7 COMBINED MEAN

If $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_m$ are the means of m series of sizes $n_1, n_2, \ldots, n_m$ respectively, then their
combined arithmetic mean $\bar{x}$ is given by:

$\bar{x} = \frac{\sum n_i \bar{x}_i}{\sum n_i}; \quad i = 1, 2, \ldots, m$

Example 10. There are 40 male and 10 female employees in a firm. The mean salary of male
employees is Rs.520 and that of female employees Rs. 420. Find the combined average
salary of all the employees.
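Solution. Here, $n_1 = 40$, $n_2 = 10$, $\bar{x}_1 = 520$, $\bar{x}_2 = 420$

Combined mean $\bar{x} = \frac{\sum n_i \bar{x}_i}{\sum n_i} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{520 \times 40 + 420 \times 10}{40 + 10} = \frac{25000}{50} = 500$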

Hence, the average salary of all the employees is Rs. 500.
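
The combined mean is a weighted mean in which the group sizes act as weights. A minimal sketch for Example 10, with the hypothetical function name `combined_mean`:

```python
def combined_mean(means, sizes):
    """Combined mean of several series: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)


# Example 10: 40 male employees (mean Rs. 520), 10 female employees (mean Rs. 420)
print(combined_mean([520, 420], [40, 10]))  # 500.0
```

Note that the simple average of the two group means, (520 + 420)/2 = 470, would be wrong here because the groups are of unequal size.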

2.3.8 CORRECTED MEAN

Sometimes a mean has been calculated using a wrongly recorded value while the
actual value was different. In that case we replace the wrong value with the correct value in
the total and recompute the mean. The procedure will be clear from the example given below.

Example 11. A student calculates the mean of 20 observations as 25.2. Later on he found
that he had misread one observation as 34 instead of 43. Find the correct mean.

Solution. We know the mean of an individual series is given by:

$\bar{x} = \frac{\sum x}{n}$, or $\sum x = n\bar{x} = 20 \times 25.2 = 504$

But he misread 43 as 34. So the correct total is $\sum x = 504 - 34 + 43 = 513$.

So the correct mean = 513/20 = 25.65
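
The correction in Example 11 amounts to adjusting the total and dividing again. A small sketch, where `corrected_mean` and its argument names are assumptions for illustration:

```python
def corrected_mean(reported_mean, n, wrong, correct):
    """Recover the total, swap the misread value for the true one, and re-average."""
    total = reported_mean * n            # sum(x) = n * x_bar
    return (total - wrong + correct) / n


print(round(corrected_mean(25.2, 20, wrong=34, correct=43), 2))  # 25.65
```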

2.3.9 MERITS, DEMERITS AND USES OF MEAN

Merits:

1. Mean is rigidly defined.

2. It can be calculated easily by a non mathematical person also.

3. It is based upon all the observations.

4. Among all the averages, it is affected least by fluctuations of sampling.

5. It is the best measure to compare two or more series.

6. It is easily understandable.

7. It is the most widely used method of central tendency.


Demerits:

1. It is affected much by extreme values.

2. It cannot be calculated in case of open end classes.

3. It cannot be calculated in case of qualitative data such as intelligence, beauty, etc.

4. In an extremely asymmetrical distribution, mean is not a suitable measure of central
tendency.

5. It cannot be calculated if any observation is missing.

6. It may lead to wrong conclusions if the details of the data are not given. For
example, the marks of two students in three successive tests are respectively 30, 40, 50
and 50, 40, 30. The average score of both students is the same (40), so we might conclude that
both students are at the same level, whereas the first is improving and the second is deteriorating.

Uses of Mean:

1. It is very much used in practical situations.

2. A common man uses it for computing his monthly budget.

3. It is very much used in sampling and inference.

4. A businessman uses it for computing per unit profit, output per person, average
expenditure and average profit per week or per month, etc.

2.4 MEDIAN

Median of a distribution is the middle most value of the variable when the values of the
variable are arranged in ascending or descending order of their magnitude. The median
divides the observations of the variable in such a way that half of the observations
lie above the median and half below it. Median is thus called a positional average
because it is located at the middle of the ordered observations. If the number of observations
is even, then after arrangement there will be two middle values and the median will be the
average of these two middle values.
2.4.1 MEDIAN IN INDIVIDUAL SERIES

Arrange the n observations in ascending or descending order of magnitude.
Now there can be two cases:

Case 1. If n is odd, then the value of the middle most term, i.e., the $\left(\frac{n+1}{2}\right)$th term, is the median.

Case 2. If n is even, then there are two middle terms, the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2} + 1\right)$th, and the median
is given by:

$M_e = \frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2} + 1\right)\text{th term}}{2}$

Example 12. The marks of 9 children in a test exam are: 12, 23, 34, 11, 14, 15, 13, 16, 45.

Find the median of the marks.

Solution. Arrange the given observations in ascending order of magnitude, we get

11, 12, 13, 14, 15, 16, 23, 34, 45

Here the number of observations n =9, i.e., odd.

So the median is the (9+1)/2 th, i.e., 5th term value, i.e., 15.

Example 13. The blood LDL levels (in mg/dl) found in the blood samples of 12 patients
are: 5, 19, 42, 11, 50, 30, 21, 0, 22, 52, 36, 27

Find the median of the data.

Solution. On arranging the given observations in ascending order of magnitude, we get,

0, 5, 11, 19, 21, 22, 27, 30, 36, 42, 50, 52

Here the number of observations n = 12, i.e., even. So the median is given by

$M_e = \frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2} + 1\right)\text{th term}}{2} = \frac{\left(\frac{12}{2}\right)\text{th term} + \left(\frac{12}{2} + 1\right)\text{th term}}{2}$

$= \frac{6\text{th term} + 7\text{th term}}{2} = \frac{22 + 27}{2} = \frac{49}{2} = 24.5$

So the median is 24.5 mg/dl, which does not belong to the data. Thus, in case of an even number of
observations, the median need not be one of the observed values.
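
The two cases for an individual series can be sketched in Python as follows. The function name `median` is chosen for this illustration; Python lists are indexed from 0, so the 1-based term numbers of the text are shifted by one.

```python
def median(values):
    """Median of an individual series, covering both the odd and even cases."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1)/2)th term
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of (n/2)th and (n/2 + 1)th terms


print(median([12, 23, 34, 11, 14, 15, 13, 16, 45]))             # 15   (Example 12)
print(median([5, 19, 42, 11, 50, 30, 21, 0, 22, 52, 36, 27]))   # 24.5 (Example 13)
```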

2.4.2 MEDIAN IN DISCRETE FREQUENCY DISTRIBUTION

If x1 , x 2 ,......x n are the observations in a discrete distribution according to some

characteristic and f1 , f 2 ,.... f n be their corresponding frequencies then for the calculation of
median we calculate the cumulative frequencies. The median is calculated with the help of
the following steps.

Working steps for median

Step 1. Arrange the given values in ascending order of magnitude.

Step 2. Find the cumulative frequencies (the running totals of the frequencies), denoted by c.f.

Step 3. Find $\frac{N}{2}$, where $N = \sum f$.

Step 4. Find the cumulative frequency just greater than $\frac{N}{2}$. The value of x corresponding to this
cumulative frequency is the required median.

Example14. Find the median for the following data.

X 21 15 17 9 5 7 8 10

F 2 5 3 4 5 1 6 12

Solution. For calculating the median we arrange the values of x in ascending order and then
prepare the cumulative frequency table as follows:

Table10. Median in Discrete Distribution

x f c.f.

5 5 5

7 1 6
8 6 12

9 4 16

10 12 28

15 5 33

17 3 36

21 2 38

N = ∑ f = 38

Here N/2 = 38/2= 19 and cumulative frequency just greater than 19 is 28. The value of x
corresponding to cumulative frequency 28 is 10. So the median of the given data is 10.
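
The working steps for the median of a discrete frequency distribution can be sketched as below, using the data of Example 14. The function name `discrete_median` is an assumption made for this illustration.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))            # Step 1: ascending order of x
    xs = [x for x, _ in pairs]
    cum = list(accumulate(f for _, f in pairs))   # Step 2: cumulative frequencies
    half = cum[-1] / 2                            # Step 3: N / 2
    for x, cf in zip(xs, cum):                    # Step 4: first c.f. greater than N / 2
        if cf > half:
            return x


x = [21, 15, 17, 9, 5, 7, 8, 10]
f = [2, 5, 3, 4, 5, 1, 6, 12]
print(discrete_median(x, f))  # 10
```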

2.4.3 MEDIAN IN CONTINUOUS FREQUENCY DISTRIBUTION

When the data is in class interval form, the class corresponding to the c.f. just greater than
N/2 is called the median class and the median is computed by the following formula:

$M_e = L + \frac{\frac{N}{2} - C}{f} \times h$

Where L = lower limit of the median class

C = cumulative frequency just before the median class

N = total frequency

f = frequency of the median class

h = magnitude (width) of the median class

Example15. The following table gives the distribution of weights of 100 persons. Find the
median of this data.

Weight 40-45 45-50 50-55 55-60 60-65 65-70 70-75 75-80 80-85 85-90
Frequency 1 3 6 10 15 25 15 10 11 4

Solution. For computing the median we prepare the following table:

Table11. Median in Continuous Distribution

Weight (in kg) (x) Frequency (f) Cumulative


Frequency (c.f.)

40-45 1 1

45-50 3 4

50-55 6 10

55-60 10 20

60-65 15 35

65-70 25 60

70-75 15 75

75-80 10 85

80-85 11 96

85-90 4 100

Total N = ∑ f = 100

Here N/2 = 100/2 = 50 and the cumulative frequency just greater than 50 is 60. The class
corresponding to cumulative frequency 60 is 65-70. So class 65-70 is the median class. Now
the median is given by:

$M_e = L + \frac{\frac{N}{2} - C}{f} \times h$

here L= lower limit of the median class= 65


C= cumulative frequency just before the median class=35

N= Total of frequency =100

f = frequency of the median class=25

h = magnitude of the median class=5

$M_e = 65 + \frac{50 - 35}{25} \times 5 = 65 + \frac{15}{25} \times 5 = 65 + 3 = 68$

So the median of the given data is 68 kg.
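
The median formula for a continuous distribution can be sketched as follows with the data of Example 15. The function name `grouped_median` is hypothetical, and the sketch assumes classes of equal width given by their lower limits.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous distribution: Me = L + (N/2 - C) / f * h."""
    n = sum(freqs)
    cum_before = 0                       # C, cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f > n / 2:       # first class whose c.f. exceeds N/2
            return lower + (n / 2 - cum_before) * width / f
        cum_before += f


lowers = list(range(40, 90, 5))          # 40-45, 45-50, ..., 85-90
freqs = [1, 3, 6, 10, 15, 25, 15, 10, 11, 4]
print(grouped_median(lowers, 5, freqs))  # 68.0
```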

2.4.4 MERITS, DEMERITS AND USES OF MEDIAN

Merits:

1. Median is rigidly defined.

2. It is not affected at all from extreme values.

3. It is easy to understand and to calculate.

4. In case of individual series data it can be located merely by inspection.

5. It can be calculated in case of open end classes.

6. Its graphical representation is also possible.

7. It can be computed even if the classes are of unequal interval.

8. In case of qualitative data, e.g., beauty, honesty, intelligence, etc. it is the best
measure of central tendency.

Demerits:

1. It is not amenable to algebraic treatment.

2. It is a positional average and is based only on the middle term. It does not use all
the observations of the data.

3. In case of irregular distribution, it is not a good measure.


4. In case of even number of observations it cannot be determined exactly, it can be
estimated only by the average of the two middle terms.

5. In comparison to mean it is affected much by fluctuations of sampling.

Uses:

1. It is a good measure if numerical measurements are not possible.

2. In case of qualitative data where the observations cannot be determined


quantitatively, it is the only average.

3. It is generally used in studying the average intelligence or average honesty of a


group of people.

2.5 MODE

Mode is the most frequent item of the series, i.e., in a given set of observations the item
or observation which is repeated the maximum number of times, and around which all other
observations cluster, is called the mode. For example, the average height of an Indian male is 5 feet 6
inches; the average size of the shoes of an Indian male is number 7, etc. Mode is also known as
norm.

2.5.1 TYPES OF MODE OF A DISTRIBUTION

Unimodal: If the data of a distribution has only one mode then the distribution is
called unimodal.

Bimodal: If we find that there are two items in a distribution which have the same
number of repetitions, then these two items are the modes and the distribution is
called bimodal.

Trimodal: Similarly, in a distribution, if there are three such items that they have the
same frequency then these three items are called the modes of the distribution and the
distribution is called trimodal.

Ill- defined mode: If there exists more than one mode in a distribution, then mode is
called ill-defined.
2.5.2 MODE IN INDIVIDUAL SERIES

In case of an individual series, mode is the most frequent observation. It is clear from the
following example.

Example 16. Find the mode of the series given below:

2, 3, 4, 7, 9, 3, 2, 1, 5, 3, 6, 3, 8, 3

Solution. In the given series the observation 3 is repeated maximum number of times (5) so
the mode of the given series is 3.

2.5.3 MODE IN DISCRETE FREQUENCY DISTRIBUTION

In case of discrete frequency distribution, mode is the value of the variable which has the
maximum frequency. Consider the following example:

Example 17. Find the mode of the following frequency distribution:

Variable (x) 2 5 7 9 11 25 35 43 52

Frequency (f) 1 3 4 8 25 12 11 10 8

Solution. Here we see that in the given distribution, the variable value 11 has the maximum
frequency 25. So the mode of this distribution is 11.

2.5.4 GROUPING METHOD OF MODE

When the distribution is irregular, i.e., the frequencies are increasing and decreasing in
an irregular pattern, or the difference between the maximum frequency and the frequency
succeeding or preceding it is small and the observations are concentrated on either side,
the mode cannot be determined merely by inspection. In such a case, we apply
the grouping method for the computation of mode. The procedure of the grouping method will
be clear from the following example.

Example 18. Find the mode of the following distribution.


Variable (x) 2 3 4 5 6 7 8 9 10 11 12 13

Frequency (f) 1 3 4 5 7 10 11 10 9 14 7 5

Solution. Here we see that initially the frequencies increase from 1 to 11 and then
decrease, but the frequency rises again to 14 at the variable value 11 and then
decreases to 5. This distribution shows an irregular pattern. So for the
calculation of mode we apply the grouping method of mode. For this we prepare a table and
the procedure of preparing the table is explained below the table.

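Table 12. Table for grouping the frequencies

Each grouped frequency is followed, in parentheses, by the values of x it covers; * marks the
maximum in each column.

Variable (x)                           2   3   4   5   6   7   8   9   10   11   12   13

Column (i), frequency (f)              1   3   4   5   7   10   11   10   9   14*   7   5

Column (ii), pairs from x = 2          4 (2, 3)   9 (4, 5)   17 (6, 7)   21 (8, 9)   23* (10, 11)   12 (12, 13)

Column (iii), pairs from x = 3         7 (3, 4)   12 (5, 6)   21* (7, 8)   19 (9, 10)   21* (11, 12)

Column (iv), triples from x = 2        8 (2-4)   22 (5-7)   30* (8-10)   26 (11-13)

Column (v), triples from x = 3         12 (3-5)   28 (6-8)   33* (9-11)

Column (vi), triples from x = 4        16 (4-6)   31* (7-9)   30 (10-12)
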
The table above is prepared from the frequencies of the distribution as follows. Column (i)
contains the original frequencies. Column (ii) is prepared by adding the frequencies two by
two, as 1+3 = 4; 4+5 = 9 and so on. Column (iii) is prepared by adding the
frequencies two by two leaving the first frequency. Column (iv) is prepared by adding the
frequencies three by three. Column (v) is prepared by adding the frequencies three by three
leaving the first frequency and column (vi) is prepared by adding the frequencies three by
three leaving the first two frequencies. In each column the maximum frequency is marked.

Now to find the mode we prepare the following analysis table:

Table 13. Analysis Table

Column number (1)   Maximum frequency (2)   Value(s) of x related to the maximum frequency (3)

i 14 11

ii 23 10, 11

iii 21, 21 7, 8, 11, 12

iv 30 8, 9, 10

v 33 9, 10, 11

vi 31 7, 8, 9

In the analysis table column number (1) shows the columns serially from the above table 12,
column number (2) shows the maximum frequency from the same table 12 and column
number (3) shows the value of x related to the maximum frequency or the values of x which
contributes in the maximum frequency. Finally, in column number (3) of the analysis table
we see that the value 11 is repeated maximum number of times. So 11 is the mode of the
above distribution.
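
The grouping method and its analysis table can be sketched in Python as below. This is an illustrative sketch only; the function name `grouping_mode` is an assumption, and ties for the maximum within a column are all counted, as is done for column (iii) in the analysis table.

```python
from collections import Counter


def grouping_mode(values, freqs):
    """Count how often each x contributes to a column maximum in the grouping table."""
    votes = Counter()
    # (group size, number of leading frequencies left out) for columns (i) to (vi)
    for size, skip in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [
            (sum(freqs[i:i + size]), values[i:i + size])
            for i in range(skip, len(freqs) - size + 1, size)
        ]
        best = max(total for total, _ in groups)
        for total, xs in groups:
            if total == best:        # every tied maximum contributes its x values
                votes.update(xs)
    return votes


x = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
f = [1, 3, 4, 5, 7, 10, 11, 10, 9, 14, 7, 5]
votes = grouping_mode(x, f)
print(votes.most_common(1))  # [(11, 4)]
```

The value 11 appears four times in the analysis table, more than any other value, so it is taken as the mode.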

2.5.5 MODE IN CONTINUOUS FREQUENCY DISTRIBUTION

In case of a grouped continuous frequency distribution, the class with the maximum frequency
is the modal class and for the computation of mode we use the following
formula:

$M_o = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$

Where L = lower limit of the modal class

h = magnitude (width) of the modal class

$f_1$ = frequency of the modal class

$f_2$ = frequency of the class succeeding the modal class

$f_0$ = frequency of the class preceding the modal class

For a moderately asymmetrical distribution the mode can be calculated by a formula
given by Karl Pearson as follows:

Mode = 3 Median - 2 Mean

Example 19. Following table shows the blood pressure and the frequency related to it. Find
the mode of this distribution.

Table 14.

C.I. Frequency C.I. frequency

70-80 2 110-120 32

80-90 4 120-130 28

90-100 14 130-140 12

100-110 35 140-150 5
Solution. From the table it is clear that maximum frequency is 35 and the related class is the
100-110. So 100-110 is the modal class. Now to compute the mode we use the following
formula:

$M_o = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$
Here L= lower limit of the modal class= 100

h= magnitude of the modal class= 10

f1 = frequency of the modal class= 35

f 2 = frequency of the class succeeding the modal class= 32

f 0 = frequency of the class preceding the modal class= 14

$M_o = 100 + \frac{35 - 14}{70 - 14 - 32} \times 10 = 100 + \frac{210}{24} = 100 + 8.75 = 108.75$

So the mode of the given distribution is 108.75 mmHg.
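
The mode formula used in Example 19 can be checked with a short sketch. The function name `grouped_mode` and its parameter names are assumptions; f0, f1 and f2 follow the notation of the text.

```python
def grouped_mode(lower, width, f0, f1, f2):
    """Mode of a continuous distribution: Mo = L + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    return lower + (f1 - f0) * width / (2 * f1 - f0 - f2)


# Example 19: modal class 100-110, preceding frequency 14, succeeding frequency 32
print(grouped_mode(100, 10, f0=14, f1=35, f2=32))  # 108.75
```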

2.5.6 MERITS, DEMERITS AND USES OF MODE

Merits:

1. Mode is easy to understand and to calculate.

2. It is not affected by extreme values.

3. It can be determined graphically.

4. In some cases it can be located by inspection only.

5. It can be computed for the distributions of unequal class intervals provided the
modal class; the class preceding the modal class and succeeding the modal class are of
equal width.

6. It represents the most frequent value of the distribution, practically it is very


useful.
Demerits:

1. It is not based upon all the observations.

2. It is not subjected to algebraic treatments, i.e., we cannot compute the combined


mode if we have the modes of the two series.

3. In some cases mode is ill defined. In some cases it is not possible to find a clear
mode. Some series have two modes and some more than two modes.

4. As compared to mean, mode is affected much by fluctuations of sampling; it is an


unstable measure of central tendency.

5. If the modal class or the class preceding or succeeding the modal class are of
unequal width, it cannot be determined.

6. There are different formulas for the calculation of mode.

Uses:

1. It is used to find the ideal size; it is very useful in business forecasting.

2. It is very useful in ready-made market, e.g., shoes, shirts, jeans etc.

3. It is very useful in commercial management.

2.6 SUMMARY

The study of this chapter provides us the knowledge of central tendency and measures
of central tendency. From the study of this chapter we came to know the definitions of the
measures of central tendency as mean, median and mode. We studied and learnt different
methods of computing mean. We learnt about weighted mean and combined mean. We learnt
how we can calculate the mean, median and mode in case of individual series, in case of
discrete distribution and continuous distribution. We studied the grouping method of mode.
We also studied the merits, demerits and uses of mean, median and mode. From the study of
these methods, merits, demerits and uses we came to know which method is suitable in which
situation, and which measure is suitable for a particular situation. Overall, we
learnt a lot about measures of central tendency.
3.3.4 STANDARD DEVIATION

For describing the scatteredness of the data values the best measure of variability is the
standard deviation. It is denoted by σ . If standard deviation in a data is small, it means there
is high degree of homogeneity in the data values and vice versa if the value of standard
deviation is large, it means there is a large heterogeneity in the data values.

It is defined as the positive square root of the arithmetic mean of the squares of the deviations
of the values, when the deviations are taken from their arithmetic mean.

3.3.4.1 Standard Deviation for Individual Series:

Let the variable under study X take the n values $x_1, x_2, \ldots, x_n$; their standard deviation
is given by the following formula:

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$ or $\sigma = \sqrt{\frac{\sum d^2}{n}}$, where $d = x_i - \bar{x}$

The steps of the procedure are as follows:

Step 1. Compute the arithmetic mean $\bar{x}$ of the given series.

Step 2. Compute the deviations of the series values from the mean, i.e., compute
$d = x_i - \bar{x}$.

Step 3. Compute the squares of the values obtained in step 2, i.e., compute $d^2 = (x_i - \bar{x})^2$.

Step 4. Find the sum of the values obtained in step 3 and divide it by the number of values, i.e.,
compute $\frac{\sum d^2}{n} = \frac{\sum (x_i - \bar{x})^2}{n}$.

Step 5. Take the positive square root of the value obtained in step 4. This is the required value
of the standard deviation.

The procedure will be clear from the example given below:

Example7. Compute the standard deviation of the following series:

12, 15, 17, 21, 28, 27,


Solution. For the computation of standard deviation we prepare the following table:

Table 7. S.D. for Individual Series

x   $d = x_i - \bar{x}$   $d^2 = (x_i - \bar{x})^2$

12 -8 64

15 -5 25

17 -3 9

21 1 1

28 8 64

27 7 49

$\sum x = 120$   $\sum d^2 = \sum (x_i - \bar{x})^2 = 212$

Arithmetic mean $\bar{x} = \frac{\sum x}{n} = \frac{120}{6} = 20$

Standard deviation $\sigma = \sqrt{\frac{\sum d^2}{n}} = \sqrt{\frac{212}{6}} = \sqrt{35.33} = 5.94$

Hence the standard deviation of the given series is 5.94.
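
The five steps for an individual series can be sketched in Python as follows. The function name `population_sd` is an assumption for this illustration; like the formula in the text, it divides by n (not by n - 1).

```python
from math import sqrt


def population_sd(values):
    """Standard deviation as the root of the mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n                            # Step 1
    squared = [(x - mean) ** 2 for x in values]       # Steps 2 and 3
    return sqrt(sum(squared) / n)                     # Steps 4 and 5


print(round(population_sd([12, 15, 17, 21, 28, 27]), 2))  # 5.94
```

Python's standard library function `statistics.pstdev` computes the same quantity.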

3.3.4.2 Short-Cut Method of Standard Deviation

This method is applied when mean is in fractional form because in that case the
deviations and their squares make the calculations difficult. So in this case we take the
deviations of the values from an assumed mean.

Let d = x - A, where A is the assumed mean. In this case the formula for the standard
deviation is:

$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$, where n is the number of observations.

We follow these steps for the computation of the S.D. in this case:

Step 1. Take any value of the series as the assumed mean A.

Step 2. Compute the deviations of the series values from the assumed mean, i.e.,
compute $d = x_i - A$.

Step 3. Find the total of the step 2 values, i.e., find the total of d, i.e., $\sum d$.

Step 4. Divide the value of step 3 by the number of values n and find its square, i.e.,
$\left(\frac{\sum d}{n}\right)^2$.

Step 5. Compute the squares of the values obtained in step 2, i.e., compute $d^2 = (x_i - A)^2$.

Step 6. Find the sum of the values obtained in step 5 and divide it by the number of values, i.e.,
compute $\frac{\sum d^2}{n} = \frac{\sum (x_i - A)^2}{n}$.

Step 7. Subtract the value of step 4 from the value of step 6 and then take its square root.
This is the required value of the standard deviation.

The procedure will be clear from the example given below:

Example 8. Find the standard deviation in the above example 7 by the short-cut method.

Solution. Let us take 21 as assumed mean A. Now we prepare the following table for the
computation of standard deviation.

Table 8. For Short-Cut Method of S.D.

x   d = (x - 21)   $d^2$

12 -9 81

15 -6 36

17 -4 16
21 0 0

28 7 49

27 6 36

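Total   $\sum d = -6$   $\sum d^2 = 218$

$\therefore \sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{218}{6} - \left(\frac{-6}{6}\right)^2} = \sqrt{36.33 - 1} = \sqrt{35.33} = 5.94$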

3.3.4.3 Standard Deviation in Discrete Frequency Distribution

If $x_1, x_2, \ldots, x_n$ are the observations in a discrete distribution according to some
characteristic and $f_1, f_2, \ldots, f_n$ are their corresponding frequencies, then the standard deviation can
be calculated with the help of these methods:

1. Actual mean method

2. Assumed mean method

3. Step deviation method

The procedures of the three methods will be clear with the help of the examples.

1. Actual mean method:

For this we use the following formula:

$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}}$

Where $N = \sum f$ = total frequency

Example9. Calculate the standard deviation of the distribution of marks of the B.Sc.
botany class students. The data is given below:

x: 12, 18, 17, 15, 20, 25, 32, 42

f: 2, 1, 3, 5, 1, 2, 10, 1

Solution. For the calculation of standard deviation we prepare the following table:

Table 9. S.D. for Discrete Frequency Distribution by Actual Mean

x   f   fx   $d = x - \bar{x}$   $(x - \bar{x})^2$   $f(x - \bar{x})^2$

12 2 24 -12 144 288

18 1 18 -6 36 36

17 3 51 -7 49 147

15 5 75 -9 81 405

20 1 20 -4 16 16

25 2 50 1 1 2

32 10 320 8 64 640

42 1 42 18 324 324

N = 25   $\sum fx = 600$   $\sum f(x - \bar{x})^2 = 1858$

Arithmetic mean $\bar{x} = \frac{\sum fx}{\sum f} = \frac{600}{25} = 24$

Standard deviation $\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}} = \sqrt{\frac{1858}{25}} = \sqrt{74.32} = 8.62$

Hence the standard deviation of the given distribution is 8.62.

2. Assumed Mean Method:


In this method we take a middle value of x as the assumed mean A and then apply the
following formula:

$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$; where d = x - A and the other symbols have their usual meaning.

The procedure of this method is shown in the example below:

Example10. For the above example 9 apply assumed mean method for computing the
standard deviation.

Solution. Let us assume A= 20. Now for the calculation of standard deviation we prepare the
following table:

Table10. S.D. for Discrete Frequency Distribution by Assumed Mean

X f d = ( x − A) fd fd 2

12 2 -8 -16 128

18 1 -2 -2 4

17 3 -3 -9 27

15 5 -5 -25 125

20 1 0 0 0

25 2 5 10 50

32 10 12 120 1440

42 1 22 22 484

N = 25   $\sum fd = 100$   $\sum fd^2 = 2258$

Standard deviation $\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{2258}{25} - \left(\frac{100}{25}\right)^2}$

$= \sqrt{90.32 - 16} = \sqrt{74.32} = 8.62$

Hence the standard deviation of the given distribution is 8.62.
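
The following sketch applies the assumed mean formula to the data of Examples 9 and 10 and shows that the result does not depend on the choice of A. The function name `sd_assumed_mean` is an assumption made for this illustration.

```python
from math import sqrt


def sd_assumed_mean(values, freqs, assumed_mean):
    """sigma = sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2), with d = x - A."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    sum_fd2 = sum(f * (x - assumed_mean) ** 2 for x, f in zip(values, freqs))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


marks = [12, 18, 17, 15, 20, 25, 32, 42]
students = [2, 1, 3, 5, 1, 2, 10, 1]

print(round(sd_assumed_mean(marks, students, 20), 2))  # 8.62 (assumed mean, Example 10)
print(round(sd_assumed_mean(marks, students, 24), 2))  # 8.62 (actual mean, Example 9)
```

With A equal to the actual mean 24, the correction term is zero and the formula reduces to the actual mean method.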

3. Step Deviation Method:


This method is applied when the values have some common interval (say h). We
divide the deviations by this common interval, i.e., take $d = \frac{x - A}{h}$, and apply the following formula:

$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times h$

The procedure of the method will be clear from the example given below:

Example11. Daily high blood pressure of a patient on 100 days is given below:

Bp (mmHg): 102 106 110 114 118 122 126

No. of days: 3 9 25 35 17 10 1

Calculate the standard deviation of the above data.

Solution. Let us take the assumed mean A= 114. Here common interval h= 4. Now we
prepare the following table for the calculation of standard deviation.

Table 11. S.D. for Discrete Frequency Distribution by Step Deviation

BP (mmHg)   f   $d = \frac{x - 114}{4}$   fd   $fd^2$

102 3 -3 -9 27

106 9 -2 -18 36

110 25 -1 -25 25

114 35 0 0 0
118 17 1 17 17

122 10 2 20 40

126 1 3 3 9

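Total   N = 100   $\sum fd = -12$   $\sum fd^2 = 154$

S.D. $= \sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times h = \sqrt{\frac{154}{100} - \left(\frac{-12}{100}\right)^2} \times 4$

$= \sqrt{1.54 - 0.0144} \times 4 = 1.235 \times 4 = 4.94$ mmHg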
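
The step deviation method of Example 11 can be sketched as follows. The function name `sd_step_deviation` is hypothetical; with `step=1` the same function reproduces the short-cut (assumed mean) method.

```python
from math import sqrt


def sd_step_deviation(values, freqs, assumed_mean, step):
    """sigma = sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2) * h, with d = (x - A) / h."""
    n = sum(freqs)
    d = [(x - assumed_mean) / step for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * step


bp = [102, 106, 110, 114, 118, 122, 126]
days = [3, 9, 25, 35, 17, 10, 1]
print(round(sd_step_deviation(bp, days, assumed_mean=114, step=4), 2))  # 4.94
```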

3.3.4.4 Standard Deviation in Continuous Frequency Distribution

In case of a continuous distribution we find the mid values of the classes and treat them
as the variable values x. In this case we can apply all the three methods discussed in the previous
section, but generally the step deviation method is applied. The formula is the same as in the case of
the discrete distribution. The procedure is described in the example given below.

Example12. Calculate the standard deviation for the following table giving the age
distribution of 542 persons of a city.

Age in years: 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80 80 – 90

No. of members: 3 61 132 153 140 51 2

Solution. For the calculation of standard deviation, let us take d = ( x − 55) / 10 . Here we let

assumed mean A = 55 and common interval (h) = 10. Now we prepare the following table:

Table 12. S.D. in Continuous Distribution by Step Deviation Method

Age group   Mid-value (x)   Frequency (f)   $d = \frac{x - 55}{10}$   fd   $fd^2$

20-30 25 3 -3 -9 27
30-40 35 61 -2 -122 244

40-50 45 132 -1 -132 132

50-60 55 153 0 0 0

60-70 65 140 1 140 140

70-80 75 51 2 102 204

80-90 85 2 3 6 18

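Total   N = 542   $\sum fd = -15$   $\sum fd^2 = 765$

S.D. $= \sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times h$

$= \sqrt{\frac{765}{542} - \left(\frac{-15}{542}\right)^2} \times 10 = \sqrt{1.4114 - 0.0008} \times 10 = 1.1877 \times 10 = 11.88$

Hence the standard deviation of age of the given distribution is 11.88 years.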

3.3.4. Merits, Demerits and Uses of Standard Deviation

Merits:

1. It is rigidly defined.
2. It uses all the observations of the data in calculation.
3. It is used in correlation.
4. It is affected least by fluctuation of sampling.
5. It is suitable for further mathematical treatments.
6. It is the best measure of variability.

Demerits:

1. Its calculation is difficult in comparison to other measures of dispersion.


2. It is sensitive to extreme values.
3. It is not easily understandable for a common person.
Uses:

1. It is best measure of comparison of variability.


2. It is used in partitioning the variation between groups and within groups in analysis of
variance and design of experiments.
3. It is used with mean in normal distribution for finding the areas.
4. It shows best dispersion of values from the mean.
5. It is very much used in medical field.

3.4 VARIANCE AND COEFFICIENT OF VARIATION

3.4.1 VARIANCE

It is just the square of the standard deviation. It is denoted by σ 2 . In other words


variance is the arithmetic mean of the squares of the deviations, when deviations are
taken from their arithmetic mean.

3.4.2 COEFFICIENT OF VARIATION

It is the best measure for comparing the variability of two series or populations.
The units of measurement of the two populations may be different. This comparison is
possible because it is a unit free measure. It is presented as a percentage and
is expressed as:

Coefficient of variation (C.V.) $= \frac{\sigma}{\bar{x}} \times 100$; where the notations have their usual meaning.

A series having a smaller C.V. is called more consistent or more homogeneous, i.e., the
values of the series are closer to the mean of the series. If the C.V. of a series is larger,
it is called more variable or, in other words, more heterogeneous, i.e., the values of
the series lie far apart from the mean of the series.

Example13. Calculate the coefficient of variation of the distribution of marks of the


B.Sc. botany class students. Given the following information:

Average marks $\bar{x} = 24$; standard deviation of marks $\sigma = 6$

Solution. Coefficient of variation (C.V.) $= \frac{\sigma}{\bar{x}} \times 100 = \frac{6}{24} \times 100 = 25\%$

Hence the C.V. of the marks is 25%.

Example14. The following data shows the mean and standard deviation on systolic BP
and weight of 10 persons as:

BP Weight

Mean S.D. Mean S.D.

120 15 60 4.5

Compare the two characteristics.

Solution. For comparison of the two characteristics we find the C.V. of these
characteristics.

C.V. for BP $= \frac{\sigma}{\bar{x}} \times 100 = \frac{15}{120} \times 100 = 12.5\%$

C.V. for Weight $= \frac{\sigma}{\bar{x}} \times 100 = \frac{4.5}{60} \times 100 = 7.5\%$

We see that the coefficient of variation of BP is more than the coefficient of variation
of weight, so BP is more variable than the weight of the given persons.
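
The comparison in Example 14 can be sketched in Python as below. The function name `coefficient_of_variation` is an assumption made for this illustration.

```python
def coefficient_of_variation(mean, sd):
    """C.V. = (sigma / mean) * 100, a unit free percentage."""
    return sd * 100 / mean


cv_bp = coefficient_of_variation(mean=120, sd=15)
cv_weight = coefficient_of_variation(mean=60, sd=4.5)
print(cv_bp, cv_weight)  # 12.5 7.5
print("BP" if cv_bp > cv_weight else "Weight", "is more variable")  # BP is more variable
```

Because the C.V. has no units, BP in mmHg and weight in kg can be compared directly.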
