
Biostatistics Chapter

The document discusses statistics and biostatistics concepts. It explains that statistics is the quantitative analysis of data from various sources like health studies. Biostatistics applies statistical methods to analyze biological and medical data. Some key uses of biostatistics mentioned are evaluating public health programs, identifying disease symptoms and signs, and assessing the efficacy of new drugs. The document also covers frequency distribution tables and defines important biostatistics terminology like class intervals, continuous vs. discontinuous classes, and open end classes.

Uploaded by

Senen
Copyright
© © All Rights Reserved

14.1. INTRODUCTION
Statistics is quantitative information about any kind of data, such as birth and
death rates, health management, production of medicine, and the profit and loss
of different industries. In other words, it is the collection, compilation,
presentation, analysis and interpretation of both qualitative and quantitative
information about data. For example, in a family planning study, the number of
children born can be collected from married women. The characteristics to be
measured depend on the objectives of the study.
For example, in a study of knowledge of child health care, child mortality and
child fertility, the characteristics to be recorded are the age of the mother,
education of the mother, number of children born, duration of marriage, period
of breastfeeding, number of dead children, and whether proper vaccination was
done in a given month.

14.2. BIOSTATISTICS CONCEPT


The term biostatistics is used when statistical tools are applied to the
analysis of data that come from biological organisms.

Importance of Statistics in Biotechnology


It applies to aggregates of facts.
Statistics are affected by a multiplicity of causes.
The facts should be capable of being related to each other, so that the
relationships between them can be inferred.
Introduction to Biotechnology and Biostatistics

A reasonable standard of accuracy should be maintained for the analysis of data.
Statistics helps in presenting small to large quantities of data.
It provides methods for the comparison of data.

bank, army, buildings, plants and animals, etc.


It provides material for administrators to serve as a guide in shaping or
planning future policies.
Limitations of Statistics
It can be used to analyze only collective matters, not single events.
It applies only to quantitative data, not to qualitative data.
Its results are uncertain when the sample is biased.
Its results hold true only on average, in the long run.
Uses of Biostatistics
Public Health and Medicine Community

To evaluate a vaccine, the number of attacks or deaths among the vaccinated
subjects is compared with that among the unvaccinated ones.
To measure the number of people affected by public health problems.
To evaluate the role of causative factors through epidemiological studies.
Medicine
To identify the symptoms and signs of a syndrome or a disease. For example, in
typhoid, cough is found by chance, whereas fever is found in almost every case.
To test the association between attributes such as cancer and smoking.

among the people who are taking for the analyzes.


To test a drug by finding the percentage of patients cured and the percentage
who died after treatment, in comparison with a control group.
Anatomy and Physiology
Check the correlation between the two factors or variables, e.g.,

observation of normal in a population.


Biostatistics

To check out the means or average proportions of normal or two


places in different mode.
To compare the healthy or unhealthy population in data.
Pharmacology
To find the potency of a new drug with respect to a standard drug (the standard
drug is taken as the reference against which tested drugs are compared).
To compare different drugs, or two successive dosages of the same drug, to find
which is more effective for the prevention of disease.
To find the action of a drug: the drug is given to animals or humans to see
whether the changes that occur are due to the drug or due to chance.

14.3 FREQUENCY DISTRIBUTION


A frequency distribution is a mathematical function showing the number of
instances in which a variable takes each of its possible values in an experiment.
A statistical table that shows the values of a variable arranged in order of
magnitude, either individually or in groups, with the corresponding frequencies
side by side, is known as a frequency distribution (Tables 14.2 and 14.3).

Table 14.2: Simple Frequency Distribution

Subject Marks obtained


Biostatistics 23
Cell biology 45
Molecular biology 34
Microbiology 56
Environmental biology 45
Molecular virology 65
Biodiversity 56
Plant genetic resources 78
Animal biotechnology 75
Plant biotechnology 34
Recombinant DNA technology 56
Animal biosciences 55
Applied Microbiology 77
Immunology 56
Pharmaceutical chemistry 68
Chemistry 79
Intellectual property rights 67
Entrepreneurship 56

Table 14.3: Group Frequency Distribution

Age in years Frequency (No. of persons)


10–15 10
15–20 15
20–25 20
25–30 25
30–35 30
35–40 35
40–45 40
45–50 34
50–55 43
55–60 37
60–65 34
65–70 24
70–75 9
75–80 8
80–85 6
85–90 5
90–95 4
95–100 7

Points to remember when forming a frequency distribution

1. The number of classes should be neither too small nor too large.
2. The classes should normally be of equal width.
3. The classes should be mutually exclusive, i.e., non-overlapping.
4. The classes must be exhaustive, i.e., every raw observation must be included
in one of the classes.
5.
Terms to remember for a grouped frequency distribution
1. Cumulative frequency
2. Class frequency
3. Total frequency
4. Percentage frequency
5. Frequency density
6. Class marks
7. Class width
8. Class boundaries
9. Class interval
10. Class limits

Class or Class Interval


When a large number of observations varying over a wide range is available, they
are classified into several groups according to the size of the values. Each of
these groups, defined by an interval, is known as a class or class interval.

Continuous: A class interval that does not contain its upper boundary is known
as a continuous class interval.
A class interval of the form 10–20 in a continuous classification contains
values from 10 to less than 20. For example:

Range
0–10 From 0 to less than 10
10–20 From 10 to less than 20
20–30 From 20 to less than 30
30–40 From 30 to less than 40
40–50 From 40 to less than 50
50–60 From 50 to less than 60
60–70 From 60 to less than 70
70–80 From 70 to less than 80
80–90 From 80 to less than 90
90–100 From 90 to less than 100

Discontinuous: A class interval that includes both of its end values is called
a discontinuous class interval. A class interval of the form 0–9 in a
discontinuous classification contains values from 0 to 9, both inclusive.
For example:
Range
0–9 From 0 to 9
10–19 From 10 to 19
20–29 From 20 to 29
30–39 From 30 to 39
40–49 From 40 to 49
50–59 From 50 to 59
60–69 From 60 to 69
70–79 From 70 to 79
80–89 From 80 to 89
90–99 From 90 to 99

Continuous class intervals are generally formed for continuous (non-integral)
values, e.g., weight in kg and amounts in rupees. Discontinuous class intervals
are generally formed for discrete (integral) values, e.g., marks.

Open End Class


A class in which one end is not specified is known as an open-end class. A
frequency distribution may have either one or two open-end classes.

0–50 90
50–100 100
100–150 80
150–200 50
200–250 40
250–300 150
300–350 50
350–400 30
400–450 40
450–500 50
500–550 40
550–600 30
600–650 20
650–700 10

Class Limits
In the construction of a grouped frequency distribution, the class intervals
must be defined by pairs of numbers such that the upper end of one class does
not coincide with the lower end of the immediately following class.
The two numbers used to specify the limits of a class interval, for the purpose
of tallying the original observations into the various classes, are called
class limits:
The smaller value of the pair is known as the lower class limit.
The larger value of the pair is known as the upper class limit.

Class Boundaries
In the measurement of continuous variables, all data are recorded to the nearest
unit or integer value. The most extreme values that would ever be included in a
class interval are known as class boundaries. These are the real or actual
limits of a class interval.
The lower extreme point is known as the lower class boundary.
The higher extreme point is known as the upper class boundary.

Calculation
If x is the gap between the upper class limit of any class interval and the
lower class limit of the next class interval, then

$$\text{Upper class boundary} = \text{upper class limit} + \tfrac{1}{2}x$$

$$\text{Lower class boundary} = \text{lower class limit} - \tfrac{1}{2}x$$
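The boundary rule can be checked with a short Python sketch. The function name `class_boundaries` and the sample class limits (15–19, 20–24, 25–29) are illustrative choices, and the sketch assumes the gap x is the same between every pair of consecutive classes.

```python
def class_boundaries(limits):
    """Convert (lower, upper) class limits to class boundaries.

    Assumes the classes are in ascending order and that the gap x between
    one class and the next is the same throughout.
    """
    gap = limits[1][0] - limits[0][1]   # x: next lower limit minus this upper limit
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in limits]


limits = [(15, 19), (20, 24), (25, 29)]
boundaries = class_boundaries(limits)
for (lower, upper), (lb, ub) in zip(limits, boundaries):
    print(f"{lower}-{upper}: boundaries {lb} to {ub}")
```

With a gap of 1 between classes, each limit moves outward by 0.5, so the upper boundary of one class coincides with the lower boundary of the next.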

Note
Class limits are used only for the construction of the grouped frequency
distribution. In all statistical calculations and diagrams involving the end
points of classes (e.g., median, mode, histogram and ogives), class boundaries
are used.
Class Mark
It is the mid value of a class interval, lying exactly at the middle of the
class.
It lies halfway between the class limits or between the class boundaries.
Class mark = (Lower class limit + Upper class limit)/2
It is used as the representative value of the class interval in the calculation
of the mean, standard deviation, mean deviation, etc.
Class Width
It is the length or range of a class interval, i.e., the difference between the
upper and lower class boundaries.
Class Frequency and Total Frequency

The number of observations falling within a class is known as the class
frequency or simple frequency. The sum of all the class frequencies is called
the total frequency.

Percentage Frequency
The frequency of a class interval expressed as a percentage of the total
frequency is known as the percentage frequency.

Relative Frequency
The relative frequency of a class is the ratio of its class frequency to the
total frequency. It is not expressed as a percentage. It is used to compare two
or more frequency distributions, or two or more items in the same frequency
distribution.

Relative frequency = Class frequency / Total frequency

Frequency Density
The frequency density of a class interval is its frequency per unit width. It shows
the concentration of frequency in a class. It is used in drawing histogram when
the classes are of unequal width.

Frequency density = Class frequency / Width of the class

Table 14.4: Class Interval, Class Frequency, Class Limits, Class Boundaries, Class Marks, Class Width, Frequency Density and Relative Frequency

Class     Class      Class limits    Class boundaries  Class  Class  Frequency  Relative
interval  frequency  Lower   Upper   Lower    Upper    mark   width  density    frequency
15–19     18         15      19      14.5     19.5     17     5      3.6        0.18
20–24     34         20      24      19.5     24.5     22     5      6.8        0.34
25–29     21         25      29      24.5     29.5     27     5      4.2        0.21
30–34     12         30      34      29.5     34.5     32     5      2.4        0.12
35–39     9          35      39      34.5     39.5     37     5      1.8        0.09
40–44     6          40      44      39.5     44.5     42     5      1.2        0.06
45–49     0          45      49      44.5     49.5     47     5      0          0
50–54     0          50      54      49.5     54.5     52     5      0          0
55–59     0          55      59      54.5     59.5     57     5      0          0
60–64     0          60      64      59.5     64.5     62     5      0          0
65–69     0          65      69      64.5     69.5     67     5      0          0
Total     100        -       -       -        -        -      -      -          1.00
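The derived columns of Table 14.4 follow directly from the definitions of class mark, class width, frequency density and relative frequency. The Python sketch below recomputes them for the six non-empty classes; the variable names are illustrative, and the gap between classes is assumed to be constant.

```python
# Class limits and class frequencies of the non-empty classes in Table 14.4.
classes = [((15, 19), 18), ((20, 24), 34), ((25, 29), 21),
           ((30, 34), 12), ((35, 39), 9), ((40, 44), 6)]

total = sum(f for _, f in classes)
gap = classes[1][0][0] - classes[0][0][1]   # x: gap between consecutive classes

rows = []
for (lower, upper), f in classes:
    lower_b = lower - gap / 2               # lower class boundary
    upper_b = upper + gap / 2               # upper class boundary
    width = upper_b - lower_b               # class width
    rows.append({
        "interval": f"{lower}-{upper}",
        "mark": (lower + upper) / 2,        # class mark
        "width": width,
        "density": f / width,               # frequency density
        "relative": f / total,              # relative frequency
    })

for r in rows:
    print(r["interval"], r["mark"], r["width"],
          round(r["density"], 2), round(r["relative"], 2))
```

Because the relative frequencies are shares of the total frequency, they must add up to 1.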

Cumulative Frequency Distribution:


The cumulative frequency corresponding to a class is the sum of all the
frequencies up to and including that class. It is obtained by adding the
frequency of that class to the frequencies of all the previous classes.
It is of two types:
1. Less than cumulative frequency:
The number of observations up to a given value is called the less than
cumulative frequency.
2. More than cumulative frequency:
The number of observations greater than a given value is called the more than
cumulative frequency.
Table 14.5: Cumulative Frequencies

Class interval  Frequency  Less than  More than
30–40           8          8          105
40–50           12         20         97
50–60           20         40         85
60–70           25         65         65
70–80           18         83         40
80–90           17         100        22
90–100          5          105        5
Total           105
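Both kinds of cumulative frequency can be obtained by running sums over the class frequencies. The Python sketch below uses the class frequencies of Table 14.5; the variable names are illustrative. The less than column accumulates from the first class downward, and the more than column accumulates from the last class upward.

```python
from itertools import accumulate

intervals = ["30-40", "40-50", "50-60", "60-70", "70-80", "80-90", "90-100"]
freqs = [8, 12, 20, 25, 18, 17, 5]

# Less than: running total from the first class.
less_than = list(accumulate(freqs))

# More than: running total from the last class, then put back in class order.
more_than = list(accumulate(reversed(freqs)))[::-1]

for row in zip(intervals, freqs, less_than, more_than):
    print(*row)
```

The last less than value and the first more than value both equal the total frequency, which is a quick consistency check on any cumulative frequency table.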

Uses
To find the number of observations less than or more than any given value.
To find percentiles, quartiles and the median (Dr. Pranav Kumar Banerjee,
A Textbook of Biometry).

Theoretical Distribution
The normal distribution was first discovered in 1733 by the mathematician De
Moivre, who obtained this continuous distribution as a limiting case of the
binomial distribution. It is also called the Gaussian distribution, after Karl
Friedrich Gauss, who used the normal curve to describe the theory of accidental
errors of measurement involved in the calculation of the orbits of heavenly
bodies. It is a continuous probability distribution that is bell shaped,
unimodal and symmetrical.
When the normal distribution of a variable is represented graphically, it takes
the shape of a symmetrical curve called the normal curve.
Properties of the normal distribution and normal probability curve:
It has two parameters, viz., the mean (µ) and the standard deviation (σ). The
curve is bell shaped and symmetrical about the line x = µ.
It has only one mode, occurring at µ, i.e., it is unimodal.
The mean, median and mode coincide at the center (x = µ) because the curve is
symmetrical and single peaked.
Mean = Median = Mode = µ
The normal curve has asymptotic tails, i.e., they progressively approach the
abscissa or x-axis.
The range is unlimited in both directions, but as the distance from µ
increases, the curve approaches the horizontal axis (X) more and more closely
and never touches it.
The curve changes from concave to convex and vice versa at the points of
inflection, x = µ ± σ.
The first quartile (Q1) and the third quartile (Q3) are equidistant from µ.
Q3 – µ = µ – Q1
Quartile deviation = (Q3 – Q1)/2 = 0.6745σ
Q1 = µ – 0.6745σ
Q3 = µ + 0.6745σ
Thus the middle 50% of the normal distribution lies between Z = –0.6745 and
Z = +0.6745.
The normal distribution is bilaterally symmetrical, so it is free from skewness.
Skewness = 0
Kurtosis (excess) = 0
The maximum ordinate (y) lies at the mean, i.e., at x = µ, and the height of
the curve decreases as x moves away from µ on either side.
Since mean = median = µ, the ordinate at x = µ (or Z = 0) divides the whole
area into two equal parts.
The areas to the right and to the left of the ordinate at x = µ (Z = 0) are
0.5 each.
The mathematical equation of the curve is completely determined if the mean (µ)
and the standard deviation (σ) are known.
The curve always remains symmetrical about the maximum ordinate, but a change
in the value of the mean (µ) or the standard deviation (σ) changes its position
or spread.
The maximum ordinate is at the mean (µ), and the ordinates at various standard
deviation distances from the mean bear a fixed ratio to the ordinate at the
mean when the total area under the normal curve is equal to unity.
The following table gives the area under the normal probability curve for some
important values of Z.

Z = ± 0.6745 50% = 0.50


Z = ±1.0 68.27% = 0.6827
Z = ±1.96 95% = 0.95
Z = ±2.00 95.45% = 0.9545
Z = ±2.58 99% = 0.99
Z = ±3.0 99.73% = 0.9973
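The areas in this table can be reproduced from the standard normal distribution. The Python sketch below uses the identity that the area between –z and +z equals erf(z/√2), with `math.erf` from the standard library; the function name `area_within` is an illustrative choice.

```python
import math


def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return math.erf(z / math.sqrt(2))


for z in (0.6745, 1.0, 1.96, 2.0, 2.58, 3.0):
    print(f"Z = ±{z}: area = {area_within(z):.4f}")
```

The values 1.96 and 2.58 are the conventional rounded cut-offs for 95% and 99%, so the computed areas agree with the table to two decimal places rather than exactly.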

Uses of normal distribution


It is applied in sampling theory.
It can be used to approximate the binomial and Poisson distributions.
It has considerable application in statistical quality control.
It is widely used in the testing of statistical hypotheses.
It has mathematical properties that make it popular and comparatively easy to
manipulate for use in the social and natural sciences.

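The normal distribution that best fits an observed distribution is the one that
has the same mean, standard deviation and size (n) as the latter.
The mean and the standard deviation of the observed data are computed.
The deviation of each class mid value (Xm) from the mean is then transformed
into a Z score:

$$Z = \frac{X_m - \bar{X}}{SD}$$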
Example 1:
The following data give the glucose concentration of 80 patients of Midnapure
district hospital. Calculate the mean and the standard deviation of the given
data.
Class Interval 100–109 110–119 120–129 130–139 140–149 150–159 160–169
Frequency 6 11 10 17 16 13 7

Solution:
Let the assumed mean be A = 134.5; the class interval is i = 10.

Class interval  Mid value (Xm)  Frequency (f)  d = Xm – A           d' = d/i  fd'   fd'2
100–109         104.5           6              104.5–134.5 = -30    -3        -18   54
110–119         114.5           11             114.5–134.5 = -20    -2        -22   44
120–129         124.5           10             124.5–134.5 = -10    -1        -10   10
130–139         134.5           17             134.5–134.5 = 0      0         0     0
140–149         144.5           16             144.5–134.5 = +10    +1        +16   16
150–159         154.5           13             154.5–134.5 = +20    +2        +26   52
160–169         164.5           7              164.5–134.5 = +30    +3        +21   63
Total                           80                                            13    239

$$\bar{X} = A + \frac{\sum fd'}{n} \times i = 134.5 + \frac{13}{80} \times 10 = 134.5 + 1.625 = 136.125 \approx 136.1$$

$$SD = i \times \sqrt{\frac{\sum fd'^2}{n} - \left(\frac{\sum fd'}{n}\right)^2} = 10 \times \sqrt{\frac{239}{80} - \left(\frac{13}{80}\right)^2} = 10 \times 1.72 = 17.2$$

The deviation of each mid value (Xm) from the mean is then transformed into a
Z score, which is entered in the following table.

Class interval  Frequency (f)  Mid value (Xm)  x = Xm – mean            Z = x/SD             y       Y
100–109         6              104.5           104.5 – 136.1 = -31.6    -31.6/17.2 = -1.84   0.0734  3.4
110–119         11             114.5           114.5 – 136.1 = -21.6    -21.6/17.2 = -1.26   0.1804  8.4
120–129         10             124.5           124.5 – 136.1 = -11.6    -11.6/17.2 = -0.67   0.3187  14.8
130–139         17             134.5           134.5 – 136.1 = -1.6     -1.6/17.2 = -0.09    0.3973  18.5
140–149         16             144.5           144.5 – 136.1 = +8.4     8.4/17.2 = 0.49      0.3538  16.5
150–159         13             154.5           154.5 – 136.1 = +18.4    18.4/17.2 = 1.07     0.2251  10.5
160–169         7              164.5           164.5 – 136.1 = +28.4    28.4/17.2 = 1.65     0.1023  4.8

Neglecting the algebraic sign of the Z score, the height y of the ordinate at
each Z score is read from the unit normal curve table.
The ordinate Y of the fitted curve is then obtained for each Z score by
multiplying its y value by i × n/SD (i = class interval, n = total frequency,
SD = standard deviation). For example, for the class 110–119:
Y = y × i × n/SD
= 0.1804 × 10 × 80/17.2
= 0.1804 × 46.51
= 8.39
≈ 8.4
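The whole of Example 1 can be retraced in a few lines of Python. The sketch below is one way to do it, not the textbook's own procedure: it computes the ordinate y from the standard normal density formula instead of reading it from a printed table, and it rounds the mean, the SD and each Z score at the same points as the worked solution so that the results are comparable. All variable names are illustrative.

```python
import math

mids = [104.5, 114.5, 124.5, 134.5, 144.5, 154.5, 164.5]   # class mid values Xm
freqs = [6, 11, 10, 17, 16, 13, 7]                         # class frequencies f
A, i = 134.5, 10                                           # assumed mean, class interval

n = sum(freqs)
steps = [(x - A) / i for x in mids]                        # step deviations d'
sum_fd = sum(f * d for f, d in zip(freqs, steps))
sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))

mean = A + sum_fd / n * i
sd = i * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


def ordinate(z):
    """Height of the standard normal curve at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# Round as in the worked solution before fitting.
mean_r, sd_r = round(mean, 1), round(sd, 1)
fitted = []
for x in mids:
    z = round((x - mean_r) / sd_r, 2)
    fitted.append(round(ordinate(z) * i * n / sd_r, 1))    # Y = y * i * n / SD

print(mean, round(sd, 1))
print(fitted)
```

Comparing `fitted` with the observed frequencies class by class shows how closely the data follow a normal curve with the same mean, SD and size.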

14.4. VARIABLES
A characteristic that differs from individual to individual is known as a
variable. Information about the whole group is obtained by examining its
members individual by individual. There are two types of variable:
Qualitative variable
Quantitative variable

14.5. QUALITATIVE VARIABLE


A qualitative variable is measured by a qualitative property, such as the type
of disease in humans, rank of doctors, occupation of persons, color, or
educational qualification. It is also called an attribute.

14.6. QUANTITATIVE VARIABLE


A quantitative variable is measured on a numerical basis, such as the weight of
boys, height of girls, blood pressure of patients, number of marks obtained in
class, age of a baby, and number of patients in a hospital. It is further
divided into two types: discrete variable and continuous variable.
14.7. DISCRETE VARIABLE


It takes only integral values, such as the number of patients admitted to a
hospital in a day.

14.8. CONTINUOUS VARIABLE


It can take integral or fractional values, such as the age and height of boys.
According to the way it is recorded, a variable is known as a random variable
or a non-random variable.

14.9. RANDOM VARIABLE


The values are obtained from units selected by a random process; for example,
out of 500 students in a school, n = 100 students are selected by a random
process for an age investigation.

14.10. NON-RANDOM VARIABLE


The values are not selected randomly; for example, the entire student population
of a school is selected for an age investigation. Data can be collected from the
whole population or from some of its units. There are two methods of data
collection: (i) sample survey method; and (ii) census method.
Discrete Variable:
Serial Numbers Number of Tigers
1 4
2 5
3 6
4 2
5 5
6 5
7 7
Continuous Variable:

Serial Numbers Weight of the Tiger


1 86.023 kg
2 105.976 kg
3 104.034 kg
4 160.89 kg
5 145.087 kg
6 128.568 kg
7 100.572 kg

14.11. SAMPLE SURVEY METHOD


In this method, data are collected from only some selected units under
investigation; for example, the quality of the data can be ensured.

14.12. CENSUS METHOD


In this method, data are collected from every unit of the population under
investigation; it therefore requires more money and time.

14.13. GRAPHICAL REPRESENTATION OF DATA


Two types of data, distinguished by the method of collection, are discussed here:
Primary data; and
Secondary data.

14.14. PRIMARY DATA


Data collected by personal interview or by mail are called primary data or raw
data. Example: the amount of fasting blood sugar of some persons (male and
female).

14.14.1. Primary Data Collection


Schedules sent through investigators


Direct personal observation
Questionnaires sent by email
Indirect oral presentation

14.14.2. Questionnaire
This is a proforma containing a sequence of questions for a statistical enquiry.
It is used for the collection of primary data from individual persons through
their responses.

14.14.3. Primary Data Advantages


It provides detailed information, whereas information may be suppressed in
secondary data.
It is free from errors.
It often includes information on the methods used to procure the data.
It is cost effective.
It is a less time-consuming process.
It gives accurate results for data analysis.

14.14.4. Population
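A population is a group of people, or of elements under study, on which
measurements are made and which share some common fundamental characteristics.
Finite: The population consists of a limited, countable number of units.
Infinite: The population consists of an endless succession of values, e.g., the
number of plants in the ocean.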

14.15. SECONDARY DATA


Data collected from official records, from published work or by a second person
are known as secondary data. Example: census reports.
Statistical Error:
Errors can arise during the collection of data. The error shows the extent to
which the observed value of a quantity exceeds the true value.
Error = Observed value – True value
Types:
Biased errors: errors due to the personal prejudices or bias of the investigator.
Unbiased errors: errors that enter a statistical enquiry due to chance causes.
Array
Presenting data in ascending order of magnitude is known as an array.
Tally
Each observation is recorded by a vertical tally mark (|); every fifth
observation is recorded by a diagonal (\) tally mark running across the
preceding four tally marks.
Example: Form a frequency table for the following values of a variable.
1, 4, 7, 8, 5, 8, 4, 9, 23, 56, 23, 45, 67, 78, 25, 4, 8, 9, 4, 1, 23, 78, 56, 9, 45, 78,
25, 67, 25, 78, 45, 23.
Solution:
Variable Tally Frequency
1 II 2
4 IIII 4
5 I 1
7 I 1
8 III 3
9 III 3
23 IIII 4
25 III 3
45 III 3
56 II 2
67 II 2
78 IIII 4
Total 32
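Tallying by hand is error-prone, so it is worth checking a frequency table by counting the raw values programmatically. The Python sketch below does this for the 32 values of the example using `collections.Counter`; the simple row of strokes it prints does not draw the diagonal fifth stroke, which is not needed here because no value occurs more than four times.

```python
from collections import Counter

data = [1, 4, 7, 8, 5, 8, 4, 9, 23, 56, 23, 45, 67, 78, 25, 4,
        8, 9, 4, 1, 23, 78, 56, 9, 45, 78, 25, 67, 25, 78, 45, 23]

freq = Counter(data)                  # value -> number of occurrences

print("Variable  Tally  Frequency")
for value in sorted(freq):            # array: ascending order of magnitude
    print(f"{value:>8}  {'|' * freq[value]:<5}  {freq[value]}")
print("Total", sum(freq.values()))
```

The frequencies must add up to the number of raw observations, which gives a quick check that no value was missed or counted twice.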

14.16. BIOSTATISTICS
Biostatistics covers the statistics of marriage, birth and death rates,
migration, family planning, level of education, health care of pregnant women,
and many other problems affecting the welfare of mankind.

14.17. PRESENTATION OF DATA


Data collected through a survey are called raw data. Raw data are not always
suitable for proper statistical analysis, so they have to be classified,
tabulated and presented by graphs and diagrams.

14.18. DIFFERENT GRAPHS AND DIAGRAM


There are many important diagrams and graphs used in biostatistics. The
representation of data through charts and diagrams is known as graphical
representation of data.

Advantages
It is easily understood.
The data are presented in a more attractive form.
It shows the tendency and trends of the values of the variable.
It is useful for detecting mistakes.
It easily shows the relationship between two data sets.
It is universally applicable.
It helps in assimilating data quickly.

Limitations
It does not show all the facts.
It can reveal only the approximate position.
It can take a lot of time to prepare.

There are different types of graphs, diagrams and charts, such as:
line diagram
bar diagram
pie diagram
These three are used for the presentation of qualitative data.
stem-and-leaf plot
histogram
ogive (cumulative frequency polygon)
scatter diagram
frequency polygon

Statistics is important for all students from graduation to PhD level. Here we
discuss statistics using some important software packages that make the work
easier and less time-consuming. Excel is basic statistical software that
students of every discipline should know. After Excel, some more advanced
statistical tools, such as Origin Pro, are also discussed in this chapter. Data
in the form of raw scores are known as ungrouped data; when they are organized
into a frequency distribution, they are referred to as grouped data. Separate
methods are used to represent these two types of data.

14.19. LINE DIAGRAM

X-axis by a line. The resultant diagram is called Line diagram.


Example 1: The following data represent the effect of fluoride concentration
(0, 25, 50, 75 and 100 mg kg-1 NaF) on root length. Represent the data by a
line diagram using Excel.
Solution:
First open Excel and enter the data in an Excel sheet (Figure 14.1). After
creating the line diagram, go to the layout page and select error bars with
standard error.

Select the data by clicking on the cells.

Go to the insert page.

Click on line diagram and choose a graph.

Many styles of line diagram are available; select the one you want and click
“OK.”

Once the graph appears, go to the design page and select any design you like to
make the presentation of the data more attractive.

The graph now appears in the chosen design.

Now go to the layout page.

Select “error bars with standard error” to show the standard error of the data
on the graph.

Go to gridlines, select “horizontal lines,” and select “None.”

Go to data tables and select “below,” so that the data are shown below the
graph.

Go to data labels and select “above,” so that the values are shown above the
line symbols.

Go to legend and select “show legend on top.”

Go to axis titles and select “Title below Axis.”

Figure 14.1: Screenshot showing the data input in an Excel sheet and the
resulting line diagram.
Introduction to Biotechnology and Biostatistics

14.20. BAR DIAGRAM


This diagram is similar to a line diagram, but each value is shown by a
rectangle instead of a line.
Example 1: The following data represent the effect of fluoride concentration
(0, 25, 50, 75 and 100 mg kg-1 NaF) on root length. Represent the data by a bar
diagram using Excel.
Solution: Same as for the line diagram (Figure 14.2).

Figure 14.2: Screenshot showing the data input in an Excel sheet and the
resulting bar diagram.

14.21. PIE DIAGRAM


This diagram is used to present the values of different levels of qualitative
data: the data are converted to angles, and the angles are drawn within a
circle. The total angle of a circle is 360°. The diagram is made in the same
way as above (Figure 14.3): enter the data, select the data, go to the insert
page and choose pie diagram, “3-D diagram.”

Figure 14.3: Screenshot showing the data input in an Excel sheet and the
resulting pie diagram.

14.22. STEM AND LEAF PLOT

In a stem and leaf plot, each stem is shown on the left and each leaf is a
digit displayed to the right of it. Each leaf represents a separate data value.

14.23 HISTOGRAM
A histogram is a graphical representation of the distribution of numerical
data. It is an estimate of the probability distribution of a continuous
variable and was first introduced by Karl Pearson.

14.24 OGIVES
An ogive is a graph showing the curve of a cumulative distribution function.
The points plotted are the upper class limits and the corresponding cumulative
frequencies.

14.25. SCATTER DIAGRAM


In this diagram the two sets of values are assumed to be correlated; for
example, the concentration of F affects root length (if the F concentration is
increased, root length is reduced, so there is a correlation between the two).

1. Enter the data in Excel cells, then select the data to form the graph.
When the graph appears, right click on the dots.
2. Select the Trendline option in the layout options.
3. In the Trendline options, select the last two options: display equation
on chart and display R-squared value on chart.
4. Note down the displayed equation and calculate the x value from
the given equation (in the form y = mx + c).
5.
6. R2 = 0.821 (Figure 14.4).
Go to the insert page, select scatter, and choose the type of scatter diagram
you want.

Right click on the trendline; in the new window that opens, select linear,
display equation on chart and display R-squared value.

Figure 14.4: Screenshot showing the data input in an Excel sheet and the
resulting scatter diagram.
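The equation and R-squared value that a linear trendline reports come from an ordinary least-squares fit of y = mx + c. The Python sketch below shows that calculation. The function name `linear_fit` is an illustrative choice, and the root-length values are hypothetical numbers invented for this sketch, since the chapter's data appear only in the screenshot.

```python
def linear_fit(xs, ys):
    """Least-squares line y = m*x + c and its R-squared value."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx                               # slope
    c = mean_y - m * mean_x                     # intercept
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, c, 1 - ss_res / ss_tot


fluoride = [0, 25, 50, 75, 100]                 # mg kg-1 NaF, as in the example
root_length = [10.0, 8.5, 7.1, 6.0, 4.2]        # hypothetical values, for illustration only

m, c, r2 = linear_fit(fluoride, root_length)
print(f"y = {m:.4f}x + {c:.4f}, R2 = {r2:.3f}")

# Step 4 of the list: solve y = m*x + c for x at a chosen y.
y_target = 7.0
x_at_target = (y_target - c) / m
print(f"x at y = {y_target}: {x_at_target:.1f}")
```

A negative slope corresponds to the statement that root length falls as the fluoride concentration rises, and an R-squared close to 1 means the straight line describes the points well.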

Methods of Presentation of Statistical Data


Textual presentation: data are presented in descriptive form.
Tabular presentation: data are presented in table form. It is of two
types:
Simple form
Complex form
Example 1: Simple form: Number of students in HIMT College Greater Noida,
Dr. M.P.S College, Agra, and IIMT Aligarh.

HIMT College Greater Noida 4567
Dr M.P.S College, Agra 6000
IIMT Aligarh 5500

Example 2: Complex form: Number of students in different courses in different
colleges.

HIMT College Greater Noida 123 342 145 140
Dr M.P.S College, Agra 234 230 202 130
IIMT Aligarh 231 123 250 145
(a) Graphical presentation


Statistical Tables: A statistical table is a systematic arrangement of
quantitative data under appropriate heads in rows and columns.
Parts of a table
TITLE
STUB
CAPTION
BOX HEAD
SOURCE
FOOT NOTE
Features of a good table
1. The title must be clear and concise and give a precise idea of the table
contents.
2. The items in the table should be arranged logically.
3. Special notes should be given at the end of the table to resolve any
confusing entry.
4. The table should contain all the necessary details.
5. Columns and sub-columns should be separated by single or double ruling, etc.
6. Figures to be compared should be kept as close to each other as possible in
the table.
7. The table should be well proportioned in breadth and length.
8. Units of measurement and abbreviations should be shown clearly at the top of
the column or below in the “Note” text line.
9. The pattern of a table is given in Table 14.1.
Table 14.1: Geo-Statistics of Individual Layers of Groundwater Quality
Parameters (*WHO/Indian Standard for Drinking Purpose), Banasthali, Tonk,
Rajasthan, India

Mean Value

1 Ca+2+Mg+2 0.00 7.63 2.76 1.05 15–40 meq/L 40.000
2 Mn 0.32 4.85 1.99 1.11 5.00 High 0.90 to 0.70
3 Fe 4.18 9.79 6.55 0.92 100.0 Medium High 0.70 to 0.50
4 Cu 0.48 5.32 2.31 1.19 50.00 Medium 0.50 to 0.35
5 Zn 0.38 2.12 1.01 0.20 5.00 Low Below 0.35
6 Cl 2.66 24.17 9.02 4.04 1.00 meq/L
7 CO3 0.00 3.33 1.10 0.57 meq/L
8 HCO3 1.83 19.60 7.12 3.11 meq/L
9 pH 7.30 9.39 8.49 0.32 6.50–8.50 8.500
10 EC 2.73 29.05 9.68 3.70 <3 dS/m
11 RSC 0.00 20.82 5.63 3.73 1.25–2.50 meq/L
12 F 5.89 40.45 23.17 3.78 1 mg/L High 0.50 to 1.50

Note: Values are the mean of three replicates. S.No. – Serial number; SD –
Standard deviation; EC – Electrical conductivity (dS m-1); Ca – Calcium; Cl –
Chloride; CO3 – Carbonate; HCO3 – Bicarbonate; RSC – Residual sodium carbonate;
Mg – Magnesium; S – Sulfur; P – Phosphorus; K – Potassium; Fe – Iron; Mn –
Manganese; Zn – Zinc; Cu – Copper; F – Fluoride.

Parameter
A measure based on all the units in the population is known as a parameter.
Example: population mean and population standard deviation.
It characterizes the population.
It cannot be worked out directly.
Its value is constant.

Statistic
A measure based on the sample observations is known as a statistic.
Example: sample mean and sample standard deviation.
It characterizes the sample.
It can be worked out directly.
Its value is variable, i.e., it varies from sample to sample.

x Mean
S Standard deviation
S2 Variance
r p

State whether the variable is discrete (discontinuous) or continuous in each of the following cases:
The number of individuals in a family.
The number of gallons of water in a washing machine.
The lifetime of television tubes produced by a company.
Solution: (i) discrete (ii) continuous (iii) discrete; and (iv) continuous.
BA Punjab University, 1976.

14.26. SOME IMPORTANT FACTS

14.27. DESCRIPTIVE STATISTICS


Descriptive statistics uses simple graphics as well as simple calculations to summarize the collected data. It involves different measures such as mean, median, mode, range, variance, standard deviation, standard error, etc.
Introduction to Biotechnology and Biostatistics

14.28. MEAN
The mean is the sum of the collected data divided by the number of observations in the data set: $\bar{X} = \frac{\sum X}{N}$.
Marks obtained by boys participating in cricket games:
3, 3, 4, 5, 3, 6, 2, 1, 7, 8 and 10 (N = 11)
Mean = (3+3+4+5+3+6+2+1+7+8+10)/11 = 52/11 = 4.73
Example 1: Marks obtained in biostatistics by 10 BSc Biotechnology students of HIMT College, Greater Noida.

S. No. Marks
1 67
2 69
3 66
4 68
5 72
6 63
7 76
8 65
9 70
10 74
N = 10, $\sum X$ = 690

Solution:

Mean = $\sum X / N$ = 690/10 = 69
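
The calculation above can also be sketched in a few lines of Python. This is only an illustration; the function name `arithmetic_mean` is an assumption made for this sketch and does not come from the text.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    return sum(values) / len(values)


# Marks of the 10 students from Example 1
marks = [67, 69, 66, 68, 72, 63, 76, 65, 70, 74]
print(arithmetic_mean(marks))  # 690 / 10
```

The same function can be applied to the cricket marks (N = 11) given earlier.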

14.29 CALCULATE AVERAGE


Select a new column for the average (mean), click on a cell, choose the AVERAGE option, select the data and press the Enter key (Figure 14.5).
Choose AVERAGE.
Select the data.
After selecting the data, type ")" after D6.
After typing ")", press the Enter key.
Select the cell, click on its lower right corner and drag downward to obtain the average values of all the data; this is a fast process.

Figure 14.5: Screenshot represents the average values of data


Example 2: Find the arithmetic mean from the following frequency table.
Marks 30 40 50 60 70 80 90
No. of students 15 20 10 15 20 15 5

Solution: Here we apply $\bar{X} = \frac{\sum fx}{\sum f}$

Marks (x) No. of students (f) fx
30 15 450
40 20 800
50 10 500
60 15 900
70 20 1400
80 15 1200
90 5 450
$\sum f$ = 100, $\sum fx$ = 5700

$\bar{X}$ = 5700/100 = 57
The arithmetic mean of x is 57.
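
As a minimal sketch, the frequency-table mean can be computed in Python as follows. The function name `frequency_mean` is hypothetical and used only for illustration.

```python
def frequency_mean(values, frequencies):
    """Mean of a discrete frequency table: sum(f * x) / sum(f)."""
    total_fx = sum(x * f for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_fx / total_f


marks = [30, 40, 50, 60, 70, 80, 90]
students = [15, 20, 10, 15, 20, 15, 5]
print(frequency_mean(marks, students))  # 5700 / 100
```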

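Calculation of mean in continuous series

Example 3: Calculate the mean for the following data:

Class interval 10–20 20–30 30–40 40–50 50–60 60–70
Frequency 3 5 10 15 5 12

Solution:

Class interval Mid value (x) Frequency (f) fx
10–20 15 3 45
20–30 25 5 125
30–40 35 10 350
40–50 45 15 675
50–60 55 5 275
60–70 65 12 780
$\sum f$ = 50, $\sum fx$ = 2250

Here we apply
$\bar{X} = \frac{\sum fx}{\sum f}$ = 2250/50 = 45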

Example: Calculate the arithmetic mean of the daily wages from the following data.

Wages in rupees 10–20 20–30 30–40 40–50 50–60 60–70
Number of workers 5 10 30 20 15 10

Solution:

Wages Mid value (x) Frequency (f) fx
10–20 15 5 75
20–30 25 10 250
30–40 35 30 1050
40–50 45 20 900
50–60 55 15 825
60–70 65 10 650
$\sum f$ = 90, $\sum fx$ = 3750

Here we apply
$\bar{X} = \frac{\sum fx}{\sum f}$ = 3750/90 = 41.67

Example: Calculate the arithmetic mean for the following data.

Class interval 10–20 20–30 30–40 40–50 50–60 60–70 70–80 80–90 90–100
Frequency 2 7 17 29 29 10 3 2 1

Solution:

Class interval Mid value (x) Frequency (f) fx
10–20 15 2 30
20–30 25 7 175
30–40 35 17 595
40–50 45 29 1305
50–60 55 29 1595
60–70 65 10 650
70–80 75 3 225
80–90 85 2 170
90–100 95 1 95
$\sum f$ = 100, $\sum fx$ = 4840

Here we apply
$\bar{X} = \frac{\sum fx}{\sum f}$ = 4840/100 = 48.4
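
For a continuous series the mid value of each class stands in for x. The following Python sketch illustrates this under that assumption; the function name `grouped_mean` and the tuple layout for class limits are choices made here, not part of the text.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of a continuous series, using class mid values as x."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    total_fx = sum(x * f for x, f in zip(midpoints, frequencies))
    return total_fx / sum(frequencies)


# Daily wages example: classes 10-20 ... 60-70
wage_classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
workers = [5, 10, 30, 20, 15, 10]
print(round(grouped_mean(wage_classes, workers), 2))
```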

Short cut method:

It is applied when the values and frequencies of the variable are quite large. An assumed mean "a" is taken as that value of x which corresponds to the middle value of the frequency distribution.
In the case of ungrouped data:
$\bar{X} = a + \frac{\sum d}{N}$
where a = assumed mean
N = number of items
d = (x − a) = deviation of any variate from a.

Example 1: Find the mean weight of the following students by the short cut method. The weights are in kg.
67 69 66 68 63 76 72 74 70 65

Solution:
Let us assume 68 as the mean.
X X − a = d
67 67 – 68 = -1
69 69 – 68 = +1
66 66 – 68 = -2
68 68 – 68 = 0
63 63 – 68 = -5
76 76 – 68 = +8
72 72 – 68 = +4
74 74 – 68 = +6
70 70 – 68 = +2
65 65 – 68 = -3

$\sum d$ = +21 – 11 = 10
N = 10
$\bar{X} = a + \frac{\sum d}{N}$ = 68 + 10/10
= 68 + 1 = 69
The mean weight is 69 kg.

Example 2: Find the mean height of the 8 students by the short cut method. The heights are in centimeters.
59 65 69 63 61 71 73 67

Solution:
Let us assume 65 as the mean.
X X − a = d
59 59 – 65 = -6
65 65 – 65 = 0
69 69 – 65 = +4
63 63 – 65 = -2
61 61 – 65 = -4
71 71 – 65 = +6
73 73 – 65 = +8
67 67 – 65 = +2

$\sum d$ = +20 – 12 = 8
N = 8
$\bar{X} = a + \frac{\sum d}{N}$
= 65 + 8/8 = 65 + 1
= 66 cm
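
The short cut method for ungrouped data can be sketched in Python as below. The function name `assumed_mean_average` is hypothetical; whatever assumed mean is chosen, the result equals the ordinary arithmetic mean.

```python
def assumed_mean_average(values, assumed_mean):
    """Short cut method: mean = a + sum(d) / N, with d = x - a."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


weights = [67, 69, 66, 68, 63, 76, 72, 74, 70, 65]  # kg, assumed mean 68
heights = [59, 65, 69, 63, 61, 71, 73, 67]          # cm, assumed mean 65
print(assumed_mean_average(weights, 68))
print(assumed_mean_average(heights, 65))
```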

In the case of grouped data:
$\bar{X} = a + \frac{\sum fd}{N}$
where fd = product of the frequency and the corresponding deviation

Example: Find the arithmetic mean by the short cut method for the following data.

Wages in rupees 10–20 20–40 40–50 50–70 70–80 80–100
Number of persons 5 15 25 35 12 8

Solution:
Let us assume 55 as the mean.
Wages Mid value (x) f d = x − a fd
10–20 15 5 15 – 55 = -40 -200
20–40 30 15 30 – 55 = -25 -375
40–50 45 25 45 – 55 = -10 -250
50–70 60 35 60 – 55 = +5 +175
70–80 75 12 75 – 55 = +20 +240
80–100 90 8 90 – 55 = +35 +280
N = $\sum f$ = 100, $\sum fd$ = -825 + 695 = -130

$\bar{X} = a + \frac{\sum fd}{N}$
= 55 – 130/100
= 55 – 1.3
= 53.7

Step deviation method:

When the class intervals in grouped data are equal, the calculations can be simplified further by dividing the deviations by a common factor equal to the width of the class interval.
$\bar{X} = a + \frac{\sum fd}{N} \times i$
where a = assumed mean
d = (x − a)/i = step deviation of any variate from "a"
i = width of the class interval
N = number of observations
Example: Find the arithmetic mean.

Height of the student 60–62 63–65 66–68 69–71 72–74
Number of students 15 54 126 81 24

Solution:
Let us take 67 as the assumed mean.
Height Mid value (x) f x − a d = (x − a)/i fd
60–62 61 15 -6 -2 -30
63–65 64 54 -3 -1 -54
66–68 67 126 0 0 0
69–71 70 81 +3 +1 +81
72–74 73 24 +6 +2 +48
N = $\sum f$ = 300, $\sum fd$ = 45

$\sum fd$ = 45
a = 67
N = 300
i = 3
$\bar{X} = a + \frac{\sum fd}{N} \times i$
= 67 + 45/300 × 3 = 67 + 0.45 = 67.45 inch.
Example: Calculate the average marks by the step deviation method.
Marks 0–10 10–20 20–30 30–40 40–50 50–60
Number of students 40 25 50 35 30 20
Solution:
Let us take 35 as the assumed mean.
Marks Mid value (x) f x − a d = (x − a)/i fd
0–10 5 40 -30 -3 -120
10–20 15 25 -20 -2 -50
20–30 25 50 -10 -1 -50
30–40 35 35 0 0 0
40–50 45 30 +10 +1 +30
50–60 55 20 +20 +2 +40

$\sum fd$ = -220 + 70 = -150
a = 35
N = 200
i = 10
$\bar{X} = a + \frac{\sum fd}{N} \times i$
= 35 – 150/200 × 10
= 35 – 7.5 = 27.5
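
A minimal Python sketch of the step deviation method follows. The function name `step_deviation_mean` and its argument order are assumptions made for illustration only.

```python
def step_deviation_mean(midpoints, frequencies, assumed_mean, width):
    """Mean = a + (sum(f * d) / N) * i, with d = (x - a) / i."""
    n = sum(frequencies)
    sum_fd = sum(f * (x - assumed_mean) / width
                 for x, f in zip(midpoints, frequencies))
    return assumed_mean + sum_fd / n * width


# Heights example: a = 67, i = 3
heights_mean = step_deviation_mean([61, 64, 67, 70, 73],
                                   [15, 54, 126, 81, 24], 67, 3)
# Marks example: a = 35, i = 10
marks_mean = step_deviation_mean([5, 15, 25, 35, 45, 55],
                                 [40, 25, 50, 35, 30, 20], 35, 10)
print(round(heights_mean, 2), round(marks_mean, 2))
```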
Merits, Demerits, and Uses of Mean
Merits
It has the simplest formula which is understandable easily and easy
to compute.

for a single problem.


It is based on all the observations.
The mean is typical, i.e., it balances the values on either side.
It is the best measure for comparing two series.
It is calculated from the values and does not depend upon their position.


Demerits
If any of the values is not known, it cannot be calculated.
The value is affected by extreme values.
It cannot be determined for qualitative data such as honesty, love, beauty, etc.
Uses
It is always obtained by mean.
It is used in practical statistics.
Common people use the mean for calculating the average marks obtained by students.

14.29. CALCULATE AVERAGE


The average of a set of numbers is simply the sum of the numbers divided by the total number of values in the set. For example, suppose we want the average of 24, 55, 17, 87 and 100. Simply find the sum of the numbers, 24+55+17+87+100 = 283, and divide by 5 to get 56.6.

14.30. MEDIAN
The median is the value that divides the given data into two equal parts, such that half of the observations lie below it and the other half lie above it. It is the middle-most point or the central value of the variable in a set of observations when the observations are arranged in either ascending or descending order of magnitude.
"It is the value of that in a series which divides the series into two equal parts, one part consisting of all values greater than it." (Prof. Ghosh & Chowdhury).

Simple series (ungrouped data)


The data are arranged in either ascending or descending order. If the number of observations is odd, the middle-most value is the median. However, if the number is even, the arithmetic mean of the two middle-most items is taken as the median.
When "n" is odd, the ((n+1)/2)th value is the median:
M = ((n+1)/2)th term.
When "n" is even, there are two middle terms, the (n/2)th and the (n/2 + 1)th. The median is the average of these two terms:
$M = \frac{(n/2)\text{th term} + (n/2 + 1)\text{th term}}{2}$
Example 1. Find the median of the following numbers:
(a) 21, 12, 49, 37, 88, 46, 55, 74, 63
(b) 88, 72, 33, 29, 70, 86, 54, 91, 61, 57.
Solution: (a) Arrange the data in order: 12, 21, 37, 46, 49, 55, 63, 74, 88.
In this data the number of items is n = 9 (odd).
Median = ((n+1)/2)th term = 5th term = 49.
(b) Arrange the data in order: 29, 33, 54, 57, 61, 70, 72, 86, 88, 91.
In this data the number of items is n = 10 (even).
Median = average of the (n/2)th and (n/2 + 1)th terms
= average of the (10/2)th and (10/2 + 1)th terms
= average of the 5th and 6th terms
M = (61 + 70)/2 = 131/2 = 65.5
Median is 65.5
Example 2. Find the median of the number of runs scored by 11 players of the cricket team of HIMT Greater Noida: 5, 19, 42, 11, 50, 30, 21, 0, 52, 36, 27.
Solution:
Arrange the values in ascending order:
0, 5, 11, 19, 21, 27, 30, 36, 42, 50, 52.
Here the number of items is 11 (odd).
M = ((n+1)/2)th item
= ((11+1)/2)th item
= 6th item, i.e., 27
= 27 runs.
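
The odd and even cases for ungrouped data can be sketched in Python as follows. The function name `median_of` is an assumption for this illustration; Python's standard `statistics.median` gives the same result.

```python
def median_of(values):
    """Median of ungrouped data: middle item, or mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                       # odd: ((n + 1) / 2)th item
        return ordered[middle]
    # even: average of the (n/2)th and (n/2 + 1)th items
    return (ordered[middle - 1] + ordered[middle]) / 2


runs = [5, 19, 42, 11, 50, 30, 21, 0, 52, 36, 27]
print(median_of(runs))
```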
(c) Grouped data:
(1) Discrete series: The data are arranged in either ascending or descending order. The corresponding frequencies are shown in a table, and the cumulative frequencies are also prepared and shown in the table.
The median is given by
M = value of the ((N+1)/2)th item
where N = $\sum f$, the total of the frequencies.

Example 1. Find the median for the following data:

Income in rupees 100 150 80 200 250 180 Total
No. of persons 24 26 16 20 6 30 122
Solution:
Arrange the data in ascending order and then form the cumulative frequencies.

Income in rupees No. of persons (f) Cumulative frequency (cf)
80 16 16
100 24 40
150 26 66
180 30 96
200 20 116
250 6 122
According to the table, N = 122 (even),
so the median (M) = average of the (N/2)th and (N/2 + 1)th items
= average of the (122/2)th and (122/2 + 1)th items
= average of the 61st and 62nd items
Both the 61st and the 62nd items lie in the cumulative frequency interval 41 to 66. Therefore the median is 150.
Example 2: Calculate the median for the following data:
Number of students 6 16 7 4 2 8
Marks 20 25 50 9 80 40
Solution:
Arrange the marks in ascending order and then form the cumulative frequencies.

Marks No. of students (f) Cumulative frequency (cf)
9 4 4
20 6 10
25 16 26
40 8 34
50 7 41
80 2 43

Here N = 43 (odd).
Median (M) = ((N+1)/2)th item
= ((43+1)/2)th item
= 22nd item
In the table, all the items from the 11th to the 26th have the value 25.
Since the 22nd item lies in this interval, the median is 25.
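
As a minimal sketch, the cumulative-frequency search for a discrete series can be written in Python as below. The function names `discrete_median` and `item_at` are hypothetical and introduced only for this illustration.

```python
def discrete_median(values, frequencies):
    """Median of a discrete series located through cumulative frequencies."""
    pairs = sorted(zip(values, frequencies))   # ascending order of values
    n = sum(frequencies)

    def item_at(k):
        """Value of the k-th item (1-based) in the ordered series."""
        cumulative = 0
        for value, freq in pairs:
            cumulative += freq
            if k <= cumulative:
                return value

    if n % 2 == 1:
        return item_at((n + 1) // 2)
    return (item_at(n // 2) + item_at(n // 2 + 1)) / 2


print(discrete_median([100, 150, 80, 200, 250, 180], [24, 26, 16, 20, 6, 30]))
print(discrete_median([20, 25, 50, 9, 80, 40], [6, 16, 7, 4, 2, 8]))
```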

(2) Continuous series:

The data are given in a table in the form of frequencies with class intervals. The cumulative frequency is found for each class.
The class in which the cumulative frequency N/2 lies is called the median class.
The median is then calculated as
$M = L + \frac{N/2 - C}{f_m} \times i$
where
L = lower limit of the class in which the median lies
$f_m$ = frequency of the median class
C = cumulative frequency of the class preceding the median class
i = width of the class interval
N = total of the frequencies.
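
The formula can be sketched in Python as follows. The function name `continuous_median` is an assumption, and the class boundaries and frequencies in the usage lines are illustrative data chosen for this sketch.

```python
def continuous_median(boundaries, frequencies):
    """Median of a continuous series: L + (N/2 - C) / f_m * i."""
    half = sum(frequencies) / 2
    cumulative = 0                      # C: cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= half:      # this is the median class
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f


classes = [(19.5, 24.5), (24.5, 29.5), (29.5, 34.5),
           (34.5, 39.5), (39.5, 44.5), (44.5, 49.5)]
freqs = [7, 9, 12, 6, 4, 2]
print(round(continuous_median(classes, freqs), 3))
```

Here N/2 = 20 falls in the class 29.5–34.5, so L = 29.5, C = 16, f_m = 12 and i = 5.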
Merits and Demerits of the Median
Merits
It is not as popular as the mean, but it is easily understood.
It is not affected by variation in the magnitude of the extreme values.
The value of the median can be ascertained graphically from ogives.
It is the best measure for qualitative data such as intelligence, beauty, etc.
It can be determined even by inspection in many cases.
Demerits
The data must be arranged in order for the calculation of the median.
Being a positional average, it does not depend on each and every observation.
It is not capable of further algebraic treatment.

Median on Excel
Birth weight of girls = 3.2, 2.5, 2.8, 2.2, 3.0 (N = 5)
The median of the given data is 2.8 (Figure 14.6).
Click on a cell and type "=".
After typing "=", type "me".
Now choose "MEDIAN".
Select the data.
After selecting the data, type the ")" symbol.
After pressing the Enter key, select the cell containing the median, move the cursor to its lower right corner until a sign appears, and drag it down.

Figure 14.6: Screenshot represents the median values of data.


14.31. MODE
The mode is the value in the data set which occurs most frequently. For example, let us consider the ages of boys:
8, 5, 10, 9, 5, 8, 10, and 8
Here 8 occurs the maximum number of times (three times). Therefore the mode is 8. In Excel, the data are entered as described above for the median, and the same steps are followed to obtain the mode (Figure 14.7).
According to Croxton and Cowden, "The mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the most typical value."
(A) Ungrouped data (simple series): The mode can be determined by locating the value which occurs the maximum number of times. It is that value of the variable which corresponds to the largest frequency.
Example 1: Find out the Mode of the following data.
1,3, 1, 3, 3, 5, 3, 3, 1, 5, 3, 3, 4, 5, 4, 2, 3, 2, 3, 7, 6, 3, 2, 5, 2, 3, 3, 2, 6, 2, 3, 2, 4, 2, 3.

Solution:
Let prepare the table
Values Number of items (f)
1 3
2 8
3 14
4 3
5 4
6 2
7 1

Here the value 3 occurs 14 times, more often than any other value, so the mode is 3.
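
The frequency count in this example can be sketched with Python's standard `collections.Counter`. The function name `mode_with_count` is an assumption made for illustration.

```python
from collections import Counter


def mode_with_count(values):
    """Return the most frequent value together with its frequency."""
    value, count = Counter(values).most_common(1)[0]
    return value, count


data = [1, 3, 1, 3, 3, 5, 3, 3, 1, 5, 3, 3, 4, 5, 4, 2, 3, 2,
        3, 7, 6, 3, 2, 5, 2, 3, 3, 2, 6, 2, 3, 2, 4, 2, 3]
print(mode_with_count(data))
```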
Example 2: A Khadim shop in Bombay sold 100 pairs of shoes on a certain day with the following distribution. Find the mode of the distribution.
Size of the shoe 4 5 6 7 8 9 10
Number of pairs 10 15 20 35 16 3 1

Solution: Size 7 has the maximum frequency, 35, so 7 is the mode of the distribution.


Example 3: Find the mode and mean of the numbers.
Solution:
Prepare the table with frequencies.
Value (x) Number of items (f) fx
1 2 2
2 2 4
3 3 9
4 2 8
5 2 10
7 1 7
N = $\sum f$ = 12, $\sum fx$ = 40

Mean:
$\bar{X} = \frac{\sum fx}{N}$ = 40/12 = 3.33

The table shows that the value 3 has the maximum frequency, 3, so 3 is the mode of the numbers.
(B) Grouped data (Discrete series): The mode is determined by inspection.
An error of judgment is possible in cases where the difference between the maximum frequency and the frequency preceding or succeeding it is very small and the items are heavily concentrated on either side. In such cases the mode is found by preparing a grouping table and an analysis table.
Grouping table features: It has six columns.
Column I: The original frequencies are written and the maximum frequency is marked.
Column II: The frequencies of column I are combined two by two and the maximum total is marked in bold type.
Column III: Leaving the first frequency of column I, the others are combined two by two and the maximum total is again marked in bold type.
Column IV: The frequencies of column I are combined three by three and the maximum total is marked in bold type.
Column V: Leaving the first frequency of column I, the others are combined three by three and the maximum total is marked in bold type.
Column VI: Leaving the first two frequencies of column I, the others are combined three by three and the maximum total is marked in bold type.
Example 1. Calculate the mode of the frequency distribution.
Height in inches 58 59 60 61 62 63 64 65 66 67 Total
No. of persons 4 6 5 10 20 22 24 6 2 1 100

Solution:
Grouping table:
Column I (frequency): 58: 4; 59: 6; 60: 5; 61: 10; 62: 20; 63: 22; 64: 24; 65: 6; 66: 2; 67: 1
Column II (grouping of two): 58–59: 10; 60–61: 15; 62–63: 42; 64–65: 30; 66–67: 3
Column III (grouping of two, leaving the first): 59–60: 11; 61–62: 30; 63–64: 46; 65–66: 8
Column IV (grouping of three): 58–60: 15; 61–63: 52; 64–66: 32
Column V (grouping of three, leaving the first): 59–61: 21; 62–64: 66; 65–67: 9
Column VI (grouping of three, leaving the first two): 60–62: 35; 63–65: 52

Analysis table:
Column Sizes of the items having maximum frequency
I 64
II 62, 63
III 63, 64
IV 61, 62, 63
V 62, 63, 64
VI 63, 64, 65
Total 61: 1; 62: 3; 63: 5; 64: 4; 65: 1

Since the size 63 occurs the maximum number of times, i.e., 5 times, the mode is 63.
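
The grouping and analysis tables can be sketched in Python as below. This is only one possible way to code the six columns; the function name `grouping_mode` is hypothetical, and the sketch assumes the series has at least three items.

```python
def grouping_mode(values, frequencies):
    """Mode by the grouping method (columns I to VI plus analysis table)."""
    n = len(values)
    votes = {v: 0 for v in values}
    # (group size, number of leading frequencies left out) for columns I-VI
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, offset in columns:
        starts = range(offset, n - size + 1, size)
        totals = {s: sum(frequencies[s:s + size]) for s in starts}
        best = max(totals.values())
        for s, total in totals.items():
            if total == best:              # group with the maximum total
                for i in range(s, s + size):
                    votes[values[i]] += 1  # one mark in the analysis table
    top = max(votes.values())
    return [v for v, c in votes.items() if c == top]


heights = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67]
persons = [4, 6, 5, 10, 20, 22, 24, 6, 2, 1]
print(grouping_mode(heights, persons))
```

Note that the largest single frequency (24, at 64) is not the mode here; the grouping shifts it to 63.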
(C) Continuous Series:

The modal class of a grouped frequency distribution is determined by inspection or with the help of a grouping table. If the class intervals are not continuous, the class limits have to be transformed into class boundaries. If the class widths are not equal, the intervals have to be made equal and the frequencies of such classes adjusted, assuming an equal distribution throughout the class.
Within the modal class, apply the relationship
$\text{Mode} = L_1 + \frac{d_1}{d_1 + d_2} \times i$

where,
L1 = lower boundary of the modal class
d1 = difference between the largest frequency and the frequency of the class preceding the modal class
d2 = difference between the largest frequency and the frequency of the class following the modal class
i = width of the class
fm = maximum frequency
f1 = frequency of the class just preceding the modal class
f2 = frequency of the class just following the modal class
d1 = fm – f1
d2 = fm – f2
$\text{Mode} = L_1 + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times i$
$= L_1 + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times i$

Example 1: Find the mode of the following data:

Marks 1–5 6–10 11–15 16–20 21–25
Number of students 7 10 16 32 24
Solution:
Class limits Class boundaries Mid value Frequency
1–5 0.5–5.5 3 7
6–10 5.5–10.5 8 10
11–15 10.5–15.5 13 16
16–20 15.5–20.5 18 32
21–25 20.5–25.5 23 24
The maximum frequency is 32 and it lies in the class 15.5–20.5.
Thus the modal class is 15.5–20.5.
L1 = 15.5
fm = 32
f1 = 16
f2 = 24
i = 5
$\text{Mode} = L_1 + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times i$
= 15.5 + 16/(64 – 40) × 5
= 15.5 + 16/24 × 5
= 15.5 + 3.33
= 18.83.
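
The substitution above can be sketched as a small Python function. The name `continuous_mode` and its parameter names are assumptions made for this illustration.

```python
def continuous_mode(lower_boundary, fm, f1, f2, width):
    """Mode = L1 + (fm - f1) / (2*fm - f1 - f2) * i for the modal class."""
    return lower_boundary + (fm - f1) / (2 * fm - f1 - f2) * width


# Modal class 15.5-20.5 with fm = 32, f1 = 16, f2 = 24, i = 5
print(round(continuous_mode(15.5, 32, 16, 24, 5), 2))
```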
In Excel, enter the values in the same way as for the median and follow the same instructions as for the mean and median.
Same as median

Same as median

Figure 14.7: Screenshot represents the mode values of data.

Merits and Demerits of Mode


Merits
It is obtained by inspection.
It is not affected by extreme values.
It can be calculated for distributions with open-end classes.
It can be easily understood.
It can be used easily for the qualitative analysis.
It can be found graphically.

Demerits
In this case large number of observation is available and there is no

It cannot be treated algebraically.


It is a peculiar measure of central tendency.
The data must be arranged in the form of a frequency distribution for the calculation of the mode.

Partition Values
When a series is required to be divided into more than two equal parts, the dividing places are known as partition values.
Percentiles
Deciles
Quartiles
Percentiles: The values which divide the total number of observations into one hundred equal parts. There are 99 percentiles, P1, P2, P3, ..., P99, called the first percentile, second percentile, third percentile, etc.
Deciles: The values which divide the total number of observations into ten equal parts. There are nine deciles, viz., D1, D2, ..., D9, called the first, second and third deciles, etc.
Quartiles: The values which divide the total number of observations into four equal parts. Therefore there are three quartiles.
First quartile (Lower quartile): Q1
Second quartile (Middle quartile): Q2
Third quartile (Upper quartile): Q3
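
As an illustration, Python's standard `statistics.quantiles` function returns the cut points for any number of equal parts. The data below are illustrative, and the exact values depend on the interpolation method the function uses by default, so only the number of cut points is stressed here.

```python
import statistics

# Illustrative data: 11 scores
scores = [0, 5, 11, 19, 21, 27, 30, 36, 42, 50, 52]

quartiles = statistics.quantiles(scores, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(scores, n=10)       # D1 ... D9
percentiles = statistics.quantiles(scores, n=100)  # P1 ... P99

print(len(quartiles), len(deciles), len(percentiles))
print(quartiles)
```

The second quartile Q2 coincides with the median of the data.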
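Geometric Mean

The nth root of the product of n observations is called the geometric mean. The geometric mean cannot be used if any of the values is zero or negative.
Geometric mean (GM) = $\sqrt[n]{X_1 \times X_2 \times X_3 \times \dots \times X_n}$
n = number of observations
X1, X2, X3, ... = variable values.
Example 1: Find the G.M. of the three numbers 8, 36, 48.
Solution:
G.M. = $\sqrt[3]{8 \times 36 \times 48} = \sqrt[3]{2^9 \times 3^3}$
= 2 × 2 × 2 × 3
= 24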
Merits and Demerits of Geometric Mean
Merits
It is less affected by extreme values than the arithmetic mean.
It is capable of algebraic treatment.
It is based on all the observations.
It is used in microbiology.
It is highly useful for averaging ratios and percentages and for determining the rate of change.
It is important for index numbers.
Demerits
It cannot be computed if any of the values is zero or negative.
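Harmonic Mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the individual observations.
Thus for observations X1, X2, X3, ..., Xn
$\text{H.M.} = \frac{N}{1/X_1 + 1/X_2 + 1/X_3 + \dots + 1/X_n} = \frac{N}{\sum (1/X)}$

Example: Find the average rate of motion in the case of a person who rides the first km at 10 km an hour, the next km at 8 km an hour and the third km at 6 km an hour.
Solution:
Harmonic mean is the proper average.
N = 3
H.M. = 3/(1/10 + 1/8 + 1/6)
= 3/((12 + 15 + 20)/120)
= 3/(47/120)
= 360/47
= 7.66 km an hour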
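
Both averages can be sketched in Python. The function name `geometric_mean_of` is an assumption for this illustration; `math.prod` and `statistics.harmonic_mean` are standard-library functions.

```python
import math
import statistics


def geometric_mean_of(values):
    """n-th root of the product of n positive observations."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.prod(values) ** (1 / len(values))


print(round(geometric_mean_of([8, 36, 48]), 2))          # G.M. example
print(round(statistics.harmonic_mean([10, 8, 6]), 2))    # H.M. example
```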
Uses
Its use is very limited.
It is used in problems involving time, rate and price.
It gives less weight to large items and more weight to small items.

Merits and Demerits of Harmonic Mean


Merits
It is based on all the observations of the series.
It is suitable for algebraic treatment.

Demerits
The values cannot be computed when there are both positive and
negative items.
It is not popular.

Example: Find the relation between A.M., G.M. and H.M.
Solution:
For any given set of positive observations, the A.M. is greater than or equal to the G.M., and the G.M. is greater than or equal to the H.M.:
A.M. ≥ G.M. ≥ H.M.
They are equal only when all the observations are equal:
$\bar{X}$ = G.M. = H.M.
For example, we take two positive items 6 and 6:
Mean (A.M.) = (6 + 6)/2 = 6
G.M. = $\sqrt{6 \times 6}$ = 6
H.M. = 2/(1/6 + 1/6)
= 2/(1/3)
= 6
So, $\bar{X}$ = G.M. = H.M.
But when the sizes vary, the mean (A.M.) will be greater than the geometric mean and the geometric mean will be greater than the harmonic mean. This is because of the property of the geometric mean to give larger weight to smaller items and of the harmonic mean to give the largest weight to the smallest item.
$\bar{X}$ > G.M. > H.M.
For example, we take two positive items 4 and 9:
Mean (A.M.) = (4 + 9)/2 = 6.5
G.M. = $\sqrt{4 \times 9}$ = 6
H.M. = 2/(1/4 + 1/9)
= 2/(13/36) = (2 × 36)/13
= 5.5
So, A.M. > G.M. > H.M., i.e., 6.5 > 6 > 5.5.
Questions: Which type of average would be suitable?
i. Average sales for various years ?
ii. Sale of shirts with collar size in Cm 36, 37, 35, 36, 33, 36 ?
iii. Size of agriculture holdings ?
iv. Per capita income in several countries ?
v. Runs scored by a player in different matches ?
vi. Comparison of intelligence of students ?
vii. Marks of candidates obtained in an examination ?
viii. Size of the shoes sold at a shop ?
Answers: (i) Mean (ii) Mode (iii) Mode (iv) Mean (v) Mean (vi) Median (vii) Median (viii) Mode.

14.32. STANDARD DEVIATION


The standard deviation concept was introduced by Karl Pearson in 1893. Standard deviation is used to measure the amount of variation in a set of data. During data collection or experiments, multiple sets of data are usually collected or calculated to minimize the chance of experimental error.
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$
It is the square root of the arithmetic mean of the squares of the deviations from the arithmetic mean. In short, S.D. may be defined as the "Root Mean Square of deviation from Mean."

Let x1, x2, x3, ..., xn be a set of observations and $\bar{x}$ their arithmetic mean.

Deviations from the mean = (x1 – $\bar{x}$), (x2 – $\bar{x}$), (x3 – $\bar{x}$), ..., (xn – $\bar{x}$).
Squared deviations from the mean = (x1 – $\bar{x}$)², (x2 – $\bar{x}$)², (x3 – $\bar{x}$)², ..., (xn – $\bar{x}$)².
Mean square deviation from the mean, i.e.,
$\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n} = \frac{\sum (x - \bar{x})^2}{n}$

Root mean square deviation from the mean, i.e., standard deviation
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Coefficient of Standard Deviation:

It is the ratio of the standard deviation to its arithmetic mean, i.e.,
Coefficient of standard deviation = $\frac{\sigma}{\bar{X}}$
Calculation of standard deviation:
Simple series:
Calculate the mean.
Find the difference of each observation from the mean.
Square the differences of the observations from the mean.
Add the squared values to get the sum of the squares.
Divide by the number of observations and take the square root.
$\text{S.D.} (\sigma) = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{\sum d^2}{n}}$
where d = (X – $\bar{X}$)
X = value of the variable
$\bar{X}$ = arithmetic mean
n = total number of observations.


Example 1. Find the standard deviation.
11 12 13 14 15 16 17 18 19 20 21
Solution:
The arithmetic mean is
$\bar{X}$ = (11+12+13+14+15+16+17+18+19+20+21)/11
$\bar{X}$ = 176/11
= 16
Let us calculate the standard deviation.
X d = X – $\bar{X}$ d²
11 11 – 16 = -5 25
12 12 – 16 = -4 16
13 13 – 16 = -3 9
14 14 – 16 = -2 4
15 15 – 16 = -1 1
16 16 – 16 = 0 0
17 17 – 16 = +1 1
18 18 – 16 = +2 4
19 19 – 16 = +3 9
20 20 – 16 = +4 16
21 21 – 16 = +5 25
N = 11, $\sum d^2$ = 110

$\text{S.D.} (\sigma) = \sqrt{\frac{\sum d^2}{N}} = \sqrt{110/11} = \sqrt{10}$
= 3.16
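
The steps for a simple series can be sketched in Python as below. The function name `population_sd` is an assumption for this illustration; it divides by n, as the formula in the text does.

```python
import math


def population_sd(values):
    """Root mean square deviation from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


observations = list(range(11, 22))   # 11, 12, ..., 21
print(round(population_sd(observations), 2))
```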

14.33. SHORT CUT METHOD


This method is used for the calculation of the standard deviation when the mean has a fractional value.
$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$
where d = X – A
A = assumed mean
n = total number of observations
Example 1: Find the standard deviation of the following items.
48 43 65 57 31 60 37 48 59 78
Solution:
Calculate the standard deviation.
Value (X) d = X – A d²
48 48 – 57 = -9 81
43 43 – 57 = -14 196
65 65 – 57 = +8 64
57 57 – 57 = 0 0
31 31 – 57 = -26 676
60 60 – 57 = +3 9
37 37 – 57 = -20 400
48 48 – 57 = -9 81
59 59 – 57 = +2 4
78 78 – 57 = +21 441
$\sum d$ = -78 + 34 = -44, $\sum d^2$ = 1952

Here the assumed mean A = 57
$\sum d^2$ = 1952
n = 10
$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{1952/10 - (-44/10)^2} = \sqrt{195.2 - 19.36} = \sqrt{175.84}$ = 13.26

(b) Standard deviation (Grouped data) Discrete Series:

(i) Direct Method
$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{n}}$
$\bar{X}$ = arithmetic mean
f = frequency
n = number of items ($\sum f$)
(ii) Short Cut Method
When the mean has a fractional value, the following formula is used:
$\sigma = \sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2}$
d = X – A
A = assumed mean
Example 1: Find the mean and standard deviation of the following data.
Size of item 10 11 12 13 14 15 16
Frequency 2 7 11 15 10 4 1
Solution: Prepare the following table, taking 13 as the assumed mean:

Size (X) f d = X – A fd fd²
10 2 10 – 13 = -3 -6 18
11 7 11 – 13 = -2 -14 28
12 11 12 – 13 = -1 -11 11
13 15 13 – 13 = 0 0 0
14 10 14 – 13 = +1 10 10
15 4 15 – 13 = +2 8 16
16 1 16 – 13 = +3 3 9
n = $\sum f$ = 50, $\sum fd$ = -10, $\sum fd^2$ = 92

$\bar{X} = A + \frac{\sum fd}{n}$
= 13 – 10/50
= 13 – 0.2
= 12.8
$\bar{X}$ = 12.8
$\sigma = \sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2} = \sqrt{92/50 - (-10/50)^2} = \sqrt{1.84 - 0.04} = \sqrt{1.8}$
= 1.342
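Standard deviation in continuous series
(a) Direct Method
$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}}$
X = mid value
$\bar{X}$ = A.M.
f = frequency

(b) Short Cut Method:
$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$
d = (X – A)/i
A = assumed mean
N = total frequency
i = class width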
Example 1: Calculate the mean, median, S.D., variance and coefficient of variation of the following items.
Heights in inches 95–105 105–115 115–125 125–135 135–145
Number of children 19 23 36 70 52
Solution:
Let the assumed mean be 130.

Class Mid value (X) No. of children (f) Cf d = (X – 130)/10 fd fd²
95–105 100 19 19 -3 -57 19 × 9 = 171
105–115 110 23 42 -2 -46 23 × 4 = 92
115–125 120 36 78 -1 -36 36 × 1 = 36
125–135 130 70 148 0 0 70 × 0 = 0
135–145 140 52 200 +1 +52 52 × 1 = 52
N = 200, $\sum fd$ = -139 + 52 = -87, $\sum fd^2$ = 351

Mean = $A + \frac{\sum fd}{N} \times i$
= 130 + (-87/200) × 10
= 130 – 4.35
= 125.65
S.D. = $\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$
= $\sqrt{351/200 - (-87/200)^2}$ × 10
= $\sqrt{1.755 - 0.1892}$ × 10
= 1.2513 × 10
= 12.513
Median = $L + \frac{N/2 - C}{f_m} \times i$
Median class 125–135: = 125 + (200/2 – 78)/70 × 10
= 125 + (100 – 78)/70 × 10
= 125 + 22/7
= 125 + 3.14
= 128.14
Variance = (S.D.)² = (12.513)²
= 156.58
Coefficient of variation = S.D./Mean × 100
= 1251.3/125.65
= 9.96
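
A minimal Python sketch of the short cut (step deviation) method for the mean and standard deviation of grouped data is given below. The function name `step_deviation_mean_sd` is hypothetical; a discrete series can be handled with a width of 1.

```python
import math


def step_deviation_mean_sd(midpoints, frequencies, assumed_mean, width):
    """Return (mean, S.D.) using d = (X - A) / i and divisor N."""
    n = sum(frequencies)
    d = [(x - assumed_mean) / width for x in midpoints]
    sum_fd = sum(f * di for f, di in zip(frequencies, d))
    sum_fd2 = sum(f * di * di for f, di in zip(frequencies, d))
    mean = assumed_mean + sum_fd / n * width
    sd = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width
    return mean, sd


mean, sd = step_deviation_mean_sd([100, 110, 120, 130, 140],
                                  [19, 23, 36, 70, 52], 130, 10)
print(round(mean, 2), round(sd, 2), round(sd / mean * 100, 2))
```

The last printed figure is the coefficient of variation, S.D./mean × 100.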

Standard Error
The sampling distribution of any statistic has its own mean, standard deviation, etc. The sample estimate of a statistic will differ from the population parameter.
The difference between a particular sample estimate and the population value is called the sampling error, and its typical size is measured by the standard error.
Standard error can be calculated by
$\text{S.E.}_{\bar{X}} = \frac{\text{S.D.}}{\sqrt{n}}$
$\text{S.E.}_{\bar{X}}$ = standard error of the mean
S.D. = standard deviation
n = size of the sample
If the samples have the same standard deviation, then
S.E. of $(\bar{X}_1 - \bar{X}_2) = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
$\bar{X}_1$ and $\bar{X}_2$ = means of the samples from the two populations
n1 and n2 = sizes of the samples
Factors affecting or controlling the Standard Error:
The sample size: increasing the size of the sample decreases the S.E.
The nature of the statistic: e.g., mean, variance, etc.
The standard deviation: the value of the S.E. varies directly with the size of the S.D.
Uses:
To measure the extent of the sampling error in the mean.
To calculate the size of the sample and also to determine whether the sample is drawn from a known population or not.
Example 1: Find the standard error of the mean when the S.D. of a sample of 36 cases is 23.61.
Solution:

S.D. = 23.61
n = 36
S.E. = S.D./$\sqrt{n}$ = 23.61/$\sqrt{36}$
= 23.61/6
= 3.935.
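
The standard error calculation can be sketched in Python as follows. The function name `standard_error` is an assumption made for illustration.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: S.D. divided by the square root of n."""
    return sd / math.sqrt(n)


print(round(standard_error(23.61, 36), 3))
# A larger sample with the same S.D. gives a smaller standard error
print(round(standard_error(23.61, 144), 3))
```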

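Example: Calculate the mean, S.D., variance and coefficient of variation of the interorbital width (mm) of a sample of 100 pigeons.

Class Interval 11–13 14–16 17–19 20–22 23–25
Frequency 8 20 40 25 7
Solution:
Class Mid value (x) f fx d = x – $\bar{x}$ d² fd²
11–13 12 8 96 -6.09 37.1 296.7
14–16 15 20 300 -3.09 9.55 190.96
17–19 18 40 720 -0.09 0.0081 0.324
20–22 21 25 525 2.91 8.47 211.7
23–25 24 7 168 5.91 34.93 244.49
N = 100, $\sum fx$ = 1809, $\sum fd^2$ = 944.17

Mean ($\bar{x}$) = $\frac{\sum fx}{N}$
= 1809/100
= 18.09
S.D. = $\sqrt{\frac{\sum fd^2}{N}} = \sqrt{944.17/100}$
= 3.07
Variance = (S.D.)² = 9.4249
Coefficient of variation = S.D./Mean × 100
= 3.07/18.09 × 100
= 16.97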
Example 2: Find the mean and S.D. from the following frequency distribution.
Scores 20–22 23–25 26–28 29–31 32–34 35–37 38–40
Frequency 2 5 7 13 8 4 1
Solution:
Class Mid value (x) f fx d = x – $\bar{x}$ d² fd²
20–22 21 2 42 -8.7 75.69 151.38
23–25 24 5 120 -5.7 32.49 162.45
26–28 27 7 189 -2.7 7.29 51.03
29–31 30 13 390 0.3 0.09 1.17
32–34 33 8 264 3.3 10.89 87.12
35–37 36 4 144 6.3 39.69 158.76
38–40 39 1 39 9.3 86.49 86.49
N = 40, $\sum fx$ = 1188, $\sum fd^2$ = 698.40

Mean ($\bar{x}$) = $\frac{\sum fx}{N}$ = 1188/40
= 29.7
S.D. = $\sqrt{\frac{\sum fd^2}{N}} = \sqrt{698.40/40}$
= 4.18
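
The direct method used in these two examples can be sketched in Python as below. The function name `grouped_mean_sd` is hypothetical, and class mid values are assumed to represent every observation in the class.

```python
import math


def grouped_mean_sd(midpoints, frequencies):
    """Direct method: mean = sum(fx)/N, S.D. = sqrt(sum(f * d^2) / N)."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    sum_fd2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))
    return mean, math.sqrt(sum_fd2 / n)


pigeon_mean, pigeon_sd = grouped_mean_sd([12, 15, 18, 21, 24], [8, 20, 40, 25, 7])
score_mean, score_sd = grouped_mean_sd([21, 24, 27, 30, 33, 36, 39],
                                       [2, 5, 7, 13, 8, 4, 1])
print(round(pigeon_mean, 2), round(pigeon_sd, 2))
print(round(score_mean, 2), round(score_sd, 2))
```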
Example: Calculate the mean and median from the following frequency distribution.
Scores 20–24 25–29 30–34 35–39 40–44 45–49
Frequency 7 9 12 6 4 2
Solution:
Class Mid value (x) Class boundaries f fx cf
20–24 22 19.5–24.5 7 154 7
25–29 27 24.5–29.5 9 243 16
30–34 32 29.5–34.5 12 384 28
35–39 37 34.5–39.5 6 222 34
40–44 42 39.5–44.5 4 168 38
45–49 47 44.5–49.5 2 94 40

Mean ($\bar{x}$) = $\frac{\sum fx}{N}$ = 1265/40 = 31.625

Median = $L_1 + \frac{N/2 - C}{f} \times i$
N/2 = 40/2
= 20
L1 = 29.5
f = 12, C = 16, i = 5
Median = 29.5 + (20 – 16)/12 × 5
= 29.5 + 4/12 × 5
= 29.5 + 1.667
= 31.167
Example: Calculate the mean, median and S.D. from the following distribution.
Scores 10–19 20–29 30–39 40–49 50–59 60–69 70–79 80–89 90–99
Frequency 2 5 3 5 8 12 25 30 10
Solution:
Class Mid value (x) Class boundaries f fx cf d = x – $\bar{x}$ d² fd²
10–19 14.5 9.5–19.5 2 29 2 -55.8 3113.64 6227.28
20–29 24.5 19.5–29.5 5 122.5 7 -45.8 2097.64 10488.2
30–39 34.5 29.5–39.5 3 103.5 10 -35.8 1281.64 3844.92
40–49 44.5 39.5–49.5 5 222.5 15 -25.8 665.64 3328.2
50–59 54.5 49.5–59.5 8 436 23 -15.8 249.64 1997.12
60–69 64.5 59.5–69.5 12 774 35 -5.8 33.64 403.68
70–79 74.5 69.5–79.5 25 1862.5 60 4.2 17.64 441
80–89 84.5 79.5–89.5 30 2535 90 14.2 201.64 6049.2
90–99 94.5 89.5–99.5 10 945 100 24.2 585.64 5856.4
n = 100, $\sum fx$ = 7030, $\sum fd^2$ = 38636

Mean ($\bar{x}$) = $\frac{\sum fx}{n}$
= 7030/100
= 70.30
Median = $L_1 + \frac{n/2 - C}{f} \times i$
= 69.5 + (50 – 35)/25 × 10
= 69.5 + 150/25
= 69.5 + 6
= 75.5
S.D. = $\sqrt{\frac{\sum fd^2}{n}} = \sqrt{38636/100}$
= 19.656
Merits and Demerits of the Standard Deviation
Merits

measure of dispersion.
It is used in correlation.

It is based on all the observations.


Demerits
It gives more weightage to extreme values
It is not simple to understand.

Uses

It is also helpful for the calculation of the standard error.


It is used for the summaries of the deviations of a large distribution
from mean.

14.34. CALCULATE STANDARD DEVIATION ON EXCEL
Select a new column for the SD, click on a cell, type STDEV and choose the option, then select the data and press the Enter key. The screenshots will help you to calculate the standard deviation; the steps are the same as for the mean (Figure 14.8).

Figure 14.8: Screenshot represents the standard deviation values of data.

14.34. VARIANCE
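Variance is the square of the standard deviation ($\sigma^2$).
Variance = (S.D.)² = $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$

Covariance:
It is the average product of the simultaneous deviations of the variables from their respective means.
$\text{COV}_{XY} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$ or $\frac{\sum (x - \bar{x})(y - \bar{y})}{n}$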
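
Variance and covariance can be sketched in Python as follows. The function names and the `sample` flag (which switches the divisor from n to n − 1) are assumptions made for this illustration, and the x and y values are illustrative data.

```python
def variance_of(values, sample=False):
    """Mean squared deviation; divisor n, or n - 1 when sample=True."""
    n = len(values)
    mean = sum(values) / n
    total = sum((x - mean) ** 2 for x in values)
    return total / (n - 1 if sample else n)


def covariance_of(x, y, sample=False):
    """Average product of simultaneous deviations from the two means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (n - 1 if sample else n)


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(variance_of(list(range(11, 22))))   # square of the S.D. of 11..21
print(covariance_of(x, y), covariance_of(x, y, sample=True))
```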
Variance on Excel
Select a new column for the variance, click on a cell, type VAR, choose the option and press the Enter key; the steps are the same as for the standard deviation (Figure 14.9).

Figure 14.9: Screenshot represents the Variance of data.

Frequency Values of Variable


The different observations are known as values of variable. The values of
variable obtained by observations are known as observed values or observation.

14.35 CORRELATION (R)


Select a new column for the correlation, click on a cell, type CORREL, choose the option and then press the Enter key.
The correlation coefficient ranges from +1 (perfect positive correlation) through 0 (no correlation) to –1 (perfect negative correlation) (Figure 14.10).
After calculating the correlation, we can determine the probability that the observed correlation occurred by chance, i.e., we can conduct a significance test. Most often this is used to determine whether our hypothesis is a real one and not a chance occurrence. There are two hypotheses: (1) the null hypothesis and (2) the alternative hypothesis.
1. Null hypothesis
The hypothesis set up for testing is known as the null hypothesis. It is denoted by H0. It is written as follows:
H0: µ = µ0
2. Alternative hypothesis
The statement against the null hypothesis is known as the alternative hypothesis. Thus, if the null hypothesis is H0: µ = µ0, the alternative hypothesis is HA: µ ≠ µ0 or HA: µ < µ0 or HA: µ > µ0.
The level of significance is generally taken as 0.05 for agriculture and 0.01 for the pharmaceutical industry. For example, if we get a correlation value r = 0.972 that is significant at the chosen level, we accept the alternative hypothesis and reject the null hypothesis.
In Excel, choose the cell, type "cor" and select CORREL. Then follow the same instructions as for the standard deviation.

Figure 14.10: Screenshot represents the correlation values of data.
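
Outside Excel, the correlation coefficient can be sketched in Python as below. This assumes the usual product-moment (Pearson) formula, which the text does not spell out; the function name `pearson_r` and the data are illustrative.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation coefficient between two variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfectly increasing together
print(pearson_r(x, [10, 8, 6, 4, 2]))   # one falls as the other rises
```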

14.36. REGRESSION
It is used to denote the estimation or prediction of the average value of one variable
for a specified value of the other variable. One of the variables is known as the
independent or explanatory variable and the other is called the dependent or
explained variable.
(“It is the measure of the average relationship between two or more variables in
terms of the original units of the data”; M.M. Blair.)
Regression Lines
When bivariate data are plotted as points on graph paper, it will be found that the
concentration of points follows a certain pattern showing the relationship between
the variables. When the trend of the points is found to be linear, we determine the
best-fitting straight line. The lines used to obtain the best estimate of one
variable for given values of the other are called regression lines.

The regression line of Y on X has the form Y = bX + a, where b is the slope and a
is the intercept of that line.

The change in the dependent variable (Y) per unit change in the independent
variable (X) is known as the regression coefficient.
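As a hedged illustration of a regression line of the form Y = bX + a, the sketch below fits the slope b and intercept a by ordinary least squares. The text names only a graphic and an algebraic method, so least squares is an assumption here, and `fit_line` and the data values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope b and intercept a of the line Y = bX + a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # change in Y per unit change in X
    a = mean_y - b * mean_x    # value of Y where the line meets X = 0
    return b, a

# Hypothetical data lying exactly on Y = 2X + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

b, a = fit_line(x, y)
predicted = b * 6 + a   # estimate of Y for a new X value of 6
print(b, a, predicted)
```

Here b plays the role of the regression coefficient, and the fitted line is used to predict Y for a given X.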

Types of Regression
Simple regression
The dependent variable is a function of a single independent variable.
Multiple regression
The dependent variable is a function of two or more independent variables.
Linear regression
The dependent variable is linearly related to the predictor (independent
variable). It forms a straight line.
Nonlinear regression
The dependent variable has a nonlinear relationship with the independent variable.
It forms a sigmoid or hyperbolic curve.
Properties of Regression
The dependent variable is expressed as a function of the independent variable.

dependent and independent variable.


Regression predicts.

Methods of Studying Regression


There are two methods:
Graphic Method
Algebraic Method

Useful Properties of Regression

It is a mathematical measure showing the average relationship
between two variables.
Sometimes both variables are randomly collected.
It expresses the cause-and-effect relationship between the variables.
It is used for prediction of one value with respect to the other given
values.
It has wide application, as it studies linear and nonlinear relationships
between the variables.
It can show that a decrease in one variable is associated with an
increase in the other variable.

14.37 NULL HYPOTHESIS


A statistical hypothesis which is set up and whose validity is tested for possible
rejection on the basis of sample observations is known as the null hypothesis.
It is denoted by H0.
It is tested against the alternatives.
“Null hypothesis is the hypothesis which is to be tested for possible rejection
under the assumption it is true” (Prof. R. A. Fisher).

14.38. ALTERNATIVE HYPOTHESIS


The negation of the null hypothesis is called the alternative hypothesis.
It is not tested directly; its acceptance (rejection) depends on the rejection
(acceptance) of the null hypothesis.
The alternative hypothesis contradicts the null hypothesis.

Statistical Hypothesis
The statement or assertion about the statistical population or the value of its
parameters is called statistical hypothesis.
There are two types of hypothesis:
Simple
Composite

1. Simple hypothesis
The hypothesis which specifies the population completely is called simple
hypothesis.

2. Composite hypothesis
The hypothesis which does not specify the population completely is called
composite hypothesis.

Rejection Region
The set of values of the test statistic which leads to rejection of the null hypothesis
is called the rejection region (critical region) of the test. The probability that a
true null hypothesis is rejected by the test is often referred to as the “size” of the
critical region.
On the other hand, the set of values which leads to acceptance of the null hypothesis
is called the “acceptance region.”

Statistics Test
After the null hypothesis and the alternative hypothesis are set up, the test
statistic is computed; it is based on a probability distribution. It is used
to test whether the null hypothesis should be accepted or rejected.

The maximum probability with which a true null hypothesis is rejected is
known as the level of significance of the test.

consequence of statistical decision for the farming decision rules.

or ½ (0.05) % are also used.

Degrees of Freedom
The degrees of freedom are the number of values in a sample that are free to vary
without affecting the mean; the number is an integer.
When data are given as a series of variables in a row or column, or as frequencies
in the cells of a contingency table, the number of values that can be calculated
independently is called the degrees of freedom (df).

Calculation of Degrees of Freedom

If the data are given in the form of a series of variables in a row or column, then
the degrees of freedom (df) = (the number of items in the series) – 1, i.e., df = n – 1,
where “n” is the number of observations.
If the frequencies are put in the cells of a contingency table, the degrees
of freedom are the product of (number of rows less one) and (number of
columns less one), i.e.,

df = (R – 1)(C – 1)

where “R” is the number of rows and “C” is the number of columns.

Conditions for using the Chi-Square test:

The sample should be drawn on a random basis.
The data should be in absolute, not relative, terms.
The test depends on the degrees of freedom.
The observations making up the sample should be independent of each other.
If the χ² test is applied to a fourfold table, it will not give a reliable
result with one degree of freedom if the expected value in any cell
is less than 5.
In such cases Yates' correction is necessary to apply the χ² test.
The total number of observations should be large.

Types of chi-square test


Homogeneity Chi-Square.
Contingency Chi-Square.

Goodness of Fit (Pearsonian χ²):

It tests whether the actual (observed) numbers or frequencies are similar to, or in
“good agreement” with, the expected (theoretical) numbers or frequencies.

$\chi^2 = \sum \frac{(O - E)^2}{E}$

where “E” = expected frequency and “O” = observed frequency.

Example: Mendel reported the following results of garden pea crosses. Test each for
goodness of fit to the stated hypothesis.

Cross                                    Progeny              Hypothesis
(a) Green × Yellow pods (F2)             428 : 152            3 : 1
(b) Violet red × White flower (F1)       47 : 40              1 : 1
(c) Round yellow × Wrinkled green (F1)   31 : 26 : 27 : 26    1 : 1 : 1 : 1

Solution (a)
Null Hypothesis = 3: 1
Alternative Hypothesis = 1: 1
Calculation =

Observed (O)   Expected (E)   (O-E)             (O-E)2   (O-E)2/E
428            435            428 – 435 = -7    49       49/435 = 0.113
152            145            152 – 145 = 7     49       49/145 = 0.338
Total = 580                                              χ² = 0.451

Critical value: The critical value of χ² at 0.05 and for 2–1 = 1 degree of freedom is 3.84.
Decision: The calculated value of chi-square (χ²) = 0.451 < critical value
of χ² for 1 df = 3.84, so the null hypothesis is accepted, i.e., the variation is
non-significant and the data agree with the result of an F2 monohybrid cross.
Solution (b)
Null Hypothesis = 1: 1
Alternative Hypothesis = 3: 1
Calculation =

Observed (O)   Expected (E)   (O-E)               (O-E)2   (O-E)2/E
47             87/2 = 43.5    47 – 43.5 = 3.5     12.25    12.25/43.5 = 0.282
40             87/2 = 43.5    40 – 43.5 = -3.5    12.25    12.25/43.5 = 0.282
Total = 87                                                 χ² = 0.563

Critical value: The critical value of chi-square at 0.05 and for 2–1 = 1 degree of
freedom is 3.84.
Decision: The calculated value of chi-square (χ²) = 0.563 < critical value of
χ² for 1 df = 3.84, so the null hypothesis is accepted, i.e., the variation is
non-significant.

Solution (c)
Null Hypothesis = 1: 1: 1: 1
Alternative Hypothesis = not 1: 1: 1: 1
Calculation =

Observed (O)   Expected (E)    (O-E)                (O-E)2   (O-E)2/E
31             110/4 = 27.5    31 – 27.5 = 3.5      12.25    12.25/27.5 = 0.445
26             110/4 = 27.5    26 – 27.5 = -1.5     2.25     2.25/27.5 = 0.082
27             110/4 = 27.5    27 – 27.5 = -0.5     0.25     0.25/27.5 = 0.009
26             110/4 = 27.5    26 – 27.5 = -1.5     2.25     2.25/27.5 = 0.082
Total = 110                                                  χ² = 0.618

Critical value
The critical value of chi-square at 0.05 and for 4–1 = 3 degrees of freedom is 7.82.
Decision
The calculated chi-square value (χ²) = 0.618 < critical value of χ² for 3 df = 7.82, so
the null hypothesis is accepted.

Uses of chi square test


Test of homogeneity
Test of independence of attributes.
Test of Goodness of Fit.
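The goodness-of-fit calculations for the three pea crosses worked above can be checked with a short sketch. It assumes the expected counts come from splitting the total in the hypothesised ratio and takes the 0.05 critical values (3.84 for 1 df, 7.82 for 3 df) from the text; the function name `chi_square_gof` is hypothetical.

```python
def chi_square_gof(observed, ratio):
    """Return (chi-square, degrees of freedom) for observed counts
    tested against an expected ratio such as [3, 1]."""
    total = sum(observed)
    parts = sum(ratio)
    expected = [total * r / parts for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df

# Critical values at the 0.05 level quoted in the text, keyed by df.
CRITICAL_AT_005 = {1: 3.84, 3: 7.82}

crosses = [
    ("Green x Yellow pods (F2)", [428, 152], [3, 1]),
    ("Violet red x White flower (F1)", [47, 40], [1, 1]),
    ("Round yellow x Wrinkled green (F1)", [31, 26, 27, 26], [1, 1, 1, 1]),
]

for name, observed, ratio in crosses:
    chi2, df = chi_square_gof(observed, ratio)
    verdict = "accept H0" if chi2 < CRITICAL_AT_005[df] else "reject H0"
    print(f"{name}: chi-square = {chi2:.3f}, df = {df}, {verdict}")
```

In all three crosses the calculated value falls below the critical value, so the null hypothesis is accepted.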

14.39. CENTRAL TENDENCY


Central tendency is the tendency of the given values of a random variable to cluster
round its mean, mode or median.

14.40 MEASURES OF VARIATION


Variability is essentially a normal character and a biological
phenomenon. It is an important characteristic indicating the extent to which
observations vary among themselves.
There are three main types of variability
Biological variability
Experimental variability
Real variability

Measures of variability
It is helpful to find out how individual observations are dispersed around
the mean of a large series.
The variability of a given set is zero if and only if all observations are equal;
it is positive when observations are unequal.
Measurement and variability are both of fundamental importance in the
biological sciences.
Dispersion

Absolute measure of dispersion


1. It is expressed in the same statistical unit in which the original data
are given.
2. It is used for the comparison of two sets of observations provided
the variables are expressed in the same units and of the same
average size.
3. If the sets of data are given in dissimilar units, the absolute
measures of dispersion are not comparable.
Relative measures of dispersions
1. It may also be used to compare the relative accuracy of data.
2. It is the ratio of a measure of absolute dispersion to an appropriate
measure of central value and it is expressed in pure number.
3. It is independent of the units of measurement.
Good measures of dispersion

It should be easily calculated.
It should be based on all observations.
It should be amenable to algebraic treatment.
It should not be affected by extreme items.
It should have sampling stability.

Range
It is the simplest measure of dispersion. It is the difference between the value of
the smallest item and the largest item included in the distribution.
Range (R) = Largest value (L) – Smallest value (S)
The relative measure corresponding to range is the coefficient of range:

Coefficient of range = (L – S)/(L + S)
Example: Find the range of the daily wages of 8 persons in a family given
below.

Rupees   10   11.50   12   21   6.75   18   13   20

Solution:
Here the largest value is Rs. 21 and the smallest value is Rs. 6.75.
R = 21 – 6.75 = 14.25
Example 2: Find the range of the following:
Class            10–19   20–29   30–39   40–49   50–79   80–99
No. of persons   5       15      25      35      15      5

Solution:
The classes are of the discontinuous type, so we change the class limits to class
boundaries. The lower class boundary of the lowest class = 9.5 (S) and the upper
class boundary of the highest class = 99.5 (L).
The range (R) = 99.5 – 9.5 = 90.

Days     Mon   Tue   Wed   Thu   Fri   Sat
Prices   20    21    23    16    25    22

Solution:
R = L – S
Here L = 25, S = 16
R = 25 – 16 = 9
Coefficient of range = (L – S)/(L + S)
= (25 – 16)/(25 + 16)
= 9/41 = 0.2195
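The range and its relative measure for the price data above can be reproduced with a few lines. This is a minimal sketch; the function names are hypothetical.

```python
def value_range(values):
    """Range R = largest value L minus smallest value S."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative measure of range: (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

# Daily prices, Monday to Saturday, from the example above.
prices = [20, 21, 23, 16, 25, 22]

print(value_range(prices))                     # 9
print(round(coefficient_of_range(prices), 4))  # 0.2195
```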
Merits and Demerits of Range
Merits
It takes little time to calculate.
It is simple to calculate.
It is easy to understand.
Demerits
It does not depend on all observations.
It is based only on the largest and smallest values.
It is highly affected by extreme values.
It cannot be calculated from a frequency distribution with open
classes.
Uses
1. Estimating the Fluctuations in Prices: It is useful for studying price
variation in stocks and shares.
2. Weather Forecasts: It is used in determining the difference between
minimum and maximum temperature for predicting the variation of
temperature in a day.
3. Quality Control: It plays an important role in preparing control
charts in the methods of statistical quality control.

Mean Deviation
The mean deviation is also called the average deviation. It is the average difference
between the items in a distribution and the median or mean of that series.
It may be taken about the mean or about the median.

Coefficient of mean deviation
It is the ratio of the mean deviation to its arithmetic mean or median, multiplied by 100.
C.M.D = (MD / Mean or Median) × 100

Calculation of Mean Deviation

Ungrouped data:
Calculate the mean or median. Calculate the deviation of each item from it,
denoting it by “D” and ignoring the sign, positive (+) or negative (-). Then

$MD = \frac{\sum |D|}{n}$

where “n” is the total number of observations.

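Example: Calculate the mean deviation for the following items.

X   10   11   12   13   14
F   3    12   18   12   3

Solution:
N = Σf = 48 and mean = Σfx/N = 576/48 = 12, so the deviations |D| = |X – 12| are
2, 1, 0, 1, 2 and Σf|D| = 6 + 12 + 0 + 12 + 6 = 36.
MD = Σf|D|/N
= 36/48
= 0.75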
Example: Find the mean deviation about the median of the following data: 13, 84, 68,
24, 96, 139, 84, 27.
Solution:
Here there is an even number of observations, viz., 8, so the median is the average
of the two middlemost observations.
Let us arrange the data:

13   24   27   68   84   84   96   139

Median = (68 + 84)/2
= 152/2
= 76

X       |X – Median| = |D|
13      76 – 13 = 63
24      76 – 24 = 52
27      76 – 27 = 49
68      76 – 68 = 8
84      84 – 76 = 8
84      84 – 76 = 8
96      96 – 76 = 20
139     139 – 76 = 63
N = 8   Σ|D| = 271

Mean deviation about the median = Σ|D|/N
= 1/8 × 271 = 33.88
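Both mean-deviation examples above can be verified with a short sketch. It assumes the deviation is taken about the median for the raw data and about the mean for the frequency table, as in the worked solutions; the function names are hypothetical.

```python
from statistics import median

def mean_deviation(values, center):
    """Average absolute deviation of the values from a chosen center."""
    return sum(abs(v - center) for v in values) / len(values)

def mean_deviation_freq(xs, fs):
    """Mean deviation about the mean for values xs with frequencies fs."""
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / n

# Ungrouped example: deviation about the median.
data = [13, 84, 68, 24, 96, 139, 84, 27]
med = median(data)
md_median = mean_deviation(data, med)

# Frequency example: X = 10..14 with frequencies 3, 12, 18, 12, 3.
md_freq = mean_deviation_freq([10, 11, 12, 13, 14], [3, 12, 18, 12, 3])

print(med, md_median, md_freq)
```

The results agree with the worked values: median 76, mean deviation 33.875 (33.88 after rounding) and 0.75.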


Grouped data:
1. Discrete series:
Calculate the mean or median and take the deviation of each item from it,
ignoring signs. Multiply each deviation by its frequency f, total the products,
and divide by Σf to obtain the value of the mean deviation.

$MD = \frac{\sum f|D|}{\sum f}$, where $D = X - \bar{X}$ and $\sum f = N$

2. Continuous Series: Calculate the mean or the median of the series and take
the deviation of the items from it; the deviation is taken from the mid value
of each class. Multiply the deviations by the frequencies and obtain the total.
Divide the total by the total number of observations.

Merits and Demerits of Mean Deviation

Merits
It is relatively simple.
It is easy to compute.
It is easy to understand.
It is less affected by the value of extreme items as compared to
the standard deviation.
Demerits
It is rarely used in social sciences.
It yields the best result only when taken from the median.
This method may not yield accurate results.
It is not suitable for further algebraic treatment.
Example 1. Calculate the mean deviation from the mean from the following
data.
Marks             0–10   10–20   20–30   30–40   40–50   Total
No. of students   5      8       15      16      6       50
Solution:

Marks    Mid value (X)   f         fX            |D| = |X – 27|     f|D|
0–10     5               5         25            |5 – 27| = 22      110
10–20    15              8         120           |15 – 27| = 12     96
20–30    25              15        375           |25 – 27| = 2      30
30–40    35              16        560           |35 – 27| = 8      128
40–50    45              6         270           |45 – 27| = 18     108
Total                    Σf = 50   ΣfX = 1350                       Σf|D| = 472

Mean = 1350/50
= 27
MD = Σf|D|/Σf
= 472/50
= 9.44
Example: From the following frequency distribution calculate the value of the
lower quartile (Q1), median (Q2) and upper quartile (Q3).
Marks in Mathematics   10–19   20–29   30–39   40–49   50–59   60–69   Total
Frequency              8       11      15      17      12      7       70
Solution:
Position of Q1 = N/4 = 17.5
Position of Q2 = N/2 = 35
Position of Q3 = 3N/4 = 52.5

Class    Frequency   Lower boundary   Cumulative frequency below boundary
10–19    8           9.5              0
20–29    11          19.5             8      (Q1 class, N/4 = 17.5)
30–39    15          29.5             19
40–49    17          39.5             34     (Q2 class, 2N/4 = 35)
50–59    12          49.5             51     (Q3 class, 3N/4 = 52.5)
60–69    7           59.5             63
                     69.5             70 = N

$Q_1 = L_1 + \frac{N/4 - F_1}{f_1} \times i$
L1 = lower boundary of the quartile class
F1 = cumulative frequency below the quartile class
f1 = frequency of the quartile class
i = width of the class interval

Q1 = 19.5 + (17.5 – 8)/11 × 10
= 19.5 + 95/11
= 19.5 + 8.6
= 28.1
Q2 = 39.5 + (35 – 34)/17 × 10
= 39.5 + 10/17
= 39.5 + 0.59
= 40.09
Q3 = 49.5 + (52.5 – 51)/12 × 10
= 49.5 + 1.5 × 10/12
= 49.5 + 1.25
= 50.75
Rounded to whole marks:
Q1 = 28
Q2 = 40
Q3 = 51

Example: Find the quartile deviation and express it as a percentage of the median
for the following frequency distribution.

Class interval   10–15   15–20   20–25   25–30   30–40   40–50   50–60   60–70
Frequency        4       12      16      22      10      8       6       4       = 82

Solution:

Class interval   Frequency   Lower boundary   Cumulative frequency below boundary
10–15            4           10               0
15–20            12          15               4
20–25            16          20               16     (Q1 class, N/4 = 20.5)
25–30            22          25               32     (Q2 class, N/2 = 41)
30–40            10          30               54     (Q3 class, 3N/4 = 61.5)
40–50            8           40               64
50–60            6           50               72
60–70            4           60               78
                             70               82 = N

Total frequency N = 82
N/4 = 20.5
N/2 = 41
3N/4 = 61.5
Q1 = L1 + (N/4 – F1)/f × i
Q1 = 20 + (20.5 – 16)/16 × 5
= 20 + 4.5/16 × 5
= 20 + 22.5/16
= 20 + 1.4 = 21.4
Q2 = L2 + (N/2 – F)/f × i
= 25 + (41 – 32)/22 × 5 = 25 + 45/22 = 25 + 2.05 = 27.05
Q3 = 30 + (61.5 – 54)/10 × 10
= 30 + 7.5 = 37.5
Quartile deviation (Q) = (Q3 – Q1)/2
= (37.5 – 21.4)/2 = 16.1/2 = 8.05
Quartile deviation as a percentage of the median = Q/Q2 × 100
= 8.05/27.05 × 100
= 29.76%.
Example: Find the quartile deviation of the following distribution?
Class Interval 40–45 45–50 50–55 55–60 60–65 65–70
Frequency 10 22 28 20 12 8
Solution:

Class interval   Frequency   Lower boundary   Cumulative frequency below boundary
40–45            10          40               0
45–50            22          45               10     (Q1 class, N/4 = 25)
50–55            28          50               32     (Q2 class)
55–60            20          55               60     (Q3 class, 3N/4 = 75)
60–65            12          60               80
65–70            8           65               92
                             70               100 = N

Q1 = L1 + (N/4 – F)/f × i
N/4 = 25, F = 10
L1 = 45
f = 22
i = 5
Q1 = 45 + (25 – 10)/22 × 5
= 45 + 15/22 × 5
= 45 + 3.4
= 48.4
Q3 = 55 + (75 – 60)/20 × 5 = 55 + 3.75
= 58.75
Quartile Deviation (Q) = (Q3 – Q1)/2
= (58.75 – 48.4)/2
= 10.35/2
= 5.175
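The interpolation used in the quartile examples can be written once and reused. The sketch below assumes the classes are given by their boundaries and that values are spread evenly within a class, which is what the formula Q = L + (position – F)/f × i implies; `grouped_quantile` is a hypothetical name.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """Quantile of grouped data by linear interpolation.

    boundaries: class boundaries, one more entry than freqs
    freqs:      class frequencies
    fraction:   0.25 for Q1, 0.5 for Q2, 0.75 for Q3
    """
    target = fraction * sum(freqs)   # position N/4, N/2 or 3N/4
    cumulative = 0                   # F: cumulative frequency below the class
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower = boundaries[i]
            width = boundaries[i + 1] - lower
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

# Distribution from the last example (classes 40-45 ... 65-70).
boundaries = [40, 45, 50, 55, 60, 65, 70]
freqs = [10, 22, 28, 20, 12, 8]

q1 = grouped_quantile(boundaries, freqs, 0.25)
q3 = grouped_quantile(boundaries, freqs, 0.75)
quartile_deviation = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2))
```

Without intermediate rounding the quartile deviation comes to about 5.17; the worked solution gives 5.175 because Q1 is first rounded to 48.4.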

14.41 BINOMIAL DISTRIBUTION


The binomial distribution was discovered by the Swiss mathematician James Bernoulli
(1654–1705). It is derived from the process called a Bernoulli trial.
It is the discrete probability distribution obtained when the probability
(p) of the happening of an event is the same in all the trials and there are only
two possible events in each trial.
It is a theoretical probability distribution because it can be worked out theoretically
using the series of terms of the binomial expansion.

Conditions Under Binomial Distribution

The trials or events must be repeated under the same conditions.
The events or trials must be independent, i.e., the happening of one
event or trial must not affect the happening of other events.
The variable should be discrete, i.e., the values of X should be
1, 2, 3, 4, or 5, etc.; it can never be 1.5, 2.5, 3.7, 5.5, etc.
Each trial should result in either success or failure, so that a dichotomy exists,
i.e., the happening of an event has two possible outcomes.

Properties of Binomial Distribution


It represents a discrete probability distribution.
It has two parameters: “p” (or “q”), the probability of success (or failure),
and “n,” the number of trials.
The parameter “n” is always an integer.
The mean (µ), SD (σ), variance (σ²) and C.D of a binomial distribution of cases
of the class having the proportion “p” in the population are obtained from the
sample size (n) and the proportions (p and q) of the cases in the two classes.
Mean: $\mu = np$
SD: $\sigma = \sqrt{npq}$
Variance: $\sigma^2 = npq$
Kurtosis and skewness of a binomial distribution depend on the
proportions of p and q in the population.
Skewness: $\beta_1 = \frac{(q - p)^2}{npq}$
Kurtosis: $\beta_2 = 3 + \frac{1 - 6pq}{npq}$
It is symmetrical if p = q = 0.5
It is positively skewed if p < 0.5
It is negatively skewed if p > 0.5

Computational Binomial Probabilities

The multinomial distribution may be regarded as a generalization of the binomial
distribution for accommodating any number of variables: when there are more than
two mutually exclusive outcomes of a trial, the observations lead to the
multinomial distribution.
Suppose that E1, E2, E3, …, Ek are k mutually exclusive and exhaustive outcomes of a
trial with respective probabilities p1, p2, p3, …, pk. The probability that they
will occur k1, k2, k3, …, kk times respectively in N trials is

$P = \frac{N!}{k_1!\, k_2!\, k_3! \cdots k_k!}\; p_1^{k_1} p_2^{k_2} \cdots p_k^{k_k}$

where k1 + k2 + k3 + … + kk = N and
$C = \frac{N!}{k_1!\, k_2!\, k_3! \cdots k_k!}$ = the number of permutations of the events E1, E2, …, Ek
(1) Single term of the expansion:
It represents the number of ways in which the conditions of each term may
be met; it is expressed by
$^nC_k = \frac{n!}{k!\,(n-k)!}$
(2) Bernoulli expansion:
It gives the probability of X cases falling in one of two classes out of a total
number of events.
n = total number of events
p = probability of the first class
q = probability of the second class
X = number of cases in the p class
n – X = number of cases in the q class
The probability P(X) is expressed by the Bernoulli expansion
$P(X) = \frac{n!}{X!\,(n-X)!}\; p^X q^{n-X}$
(3) Binomial expansion:
It has a total number of events or trials with probability of occurrence of
success or failure.
n = total number of events/trials
p = probability of occurrence of success
q = probability of occurrence of failure

$(p + q)^n = p^n + {}^nC_1\, p^{n-1} q + {}^nC_2\, p^{n-2} q^2 + {}^nC_3\, p^{n-3} q^3 + \cdots + q^n$

The probability of r successes in n trials is given by
$P(r) = {}^nC_r\, p^r q^{n-r}$
It has mean = np and variance = npq.

Mean, Variance and Standard Deviation of Binomial Distribution
The probability distribution of the binomial distribution for r successes in n
events or trials is given by $P(r) = {}^nC_r\, p^r q^{n-r}$.
So, mean (µ) = np
variance (σ²) = npq
standard deviation (σ) = √(npq)
n = number of independent events
p = probability of success
q = probability of failure
Example 1: Calculate the probability of getting a head three times when a
coin is tossed 5 times.
Solution:
When a coin is tossed there are only two outcomes, either head or tail.
Therefore it follows the binomial distribution. The probability of getting a head
is “p” and of a tail is “q.” The probability of getting a head 3 times can be
calculated in the following way.

n = 5, X = 3, p = ½ = 0.5, q = ½ = 0.5
P(X) = n! p^X q^(n–X) / [X! (n–X)!]
= 5! (0.5)^3 (0.5)^(5–3) / [3! (5–3)!]
= (5 × 4 × 3 × 2 × 1)/(3 × 2 × 1 × 2 × 1) × (0.5)^3 × (0.5)^2
= 10 × (0.5)^5
= 10 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5
= 10 × 0.03125
= 0.3125
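The coin-tossing result can be checked with a small sketch of the binomial probability P(X) = nCx p^x q^(n–x). It uses `math.comb` from the Python standard library (Python 3.8 or later); `binomial_pmf` is a hypothetical name.

```python
from math import comb

def binomial_pmf(n, x, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Three heads in five tosses of a fair coin.
three_heads = binomial_pmf(5, 3, 0.5)
print(three_heads)  # 0.3125

# The probabilities over all possible outcomes add up to 1.
total = sum(binomial_pmf(5, x, 0.5) for x in range(6))
print(total)

# Mean and variance from the formulas np and npq.
n, p = 5, 0.5
print(n * p, n * p * (1 - p))
```

The same function gives 0.375 for two boys among four children, i.e., `binomial_pmf(4, 2, 0.5)`.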
Example 2: Find the probability of two male and two female children in a family
of four children by applying the binomial theorem.
Solution:
Total number of children = 2 + 2 = 4
Male = 2
Female = 2
Probability of a male child (p) = ½ = 0.5
Probability of a female child (q) = ½ = 0.5
According to the binomial expansion:
(p + q)^4 = p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4
The term for two males and two females is 6p^2q^2
= 6 × (0.5)^2 × (0.5)^2
= 6 × 0.5 × 0.5 × 0.5 × 0.5
= 6 × 0.0625
= 0.375
So, the probability of two boys and two girls in a family of four is 0.375.
Example 3: Two hundred families with three children each are sampled at random
from the population of Arambagh subdivision. How many families do we expect
to have (a) no girls, (b) one girl, (c) two girls? Assume the sex ratio to be 1:1.
Solution:
Probability of a girl = probability of a boy = ½
Let g stand for girls and b for boys.
Now we expand the binomial (g + b)^n with n = 3:
(g + b)^3 = g^3 + 3g^2b + 3gb^2 + b^3
(a) No girls relates to the b^3 term:
½ × ½ × ½ = 1/8; 1/8 × 200 = 25 families
(b) One girl relates to the 3gb^2 term:
3 × ½ × (½)^2 = 3 × ½ × ¼ = 3/8; 3/8 × 200 = 75 families
(c) Two girls relates to the 3g^2b term:
3 × (½)^2 × ½ = 3 × ¼ × ½ = 3/8; 3/8 × 200 = 75 families
Example 4: A plant breeder has 45 different inbred strains of pea plants. How
many different hybrids can be obtained from a total of 45 plants?
Solution:
Each hybrid combines two strains, so
n = 45, r = 2
According to the formula:
nCr = n!/[r! (n – r)!]
= 45!/[2! (45 – 2)!]
= (45 × 44 × 43!)/(2 × 1 × 43!)
= (45 × 44)/2
= 45 × 22 = 990
Example 5: In families with two children in Serampore subdivision where
both parents are heterozygous for albinism, what proportion of these
families would be expected to have (a) neither child with albinism, (b) one
child with albinism, (c) both children with albinism?
Solution:
Let the symbol “a” stand for albinism and “A” for normal.
Expand the binomial:
(A + a)^2 = A^2 + 2Aa + a^2
The parents are heterozygous, so the probability of a normal child is ¾ and of an
albino child is ¼.
(a) Both children normal, i.e., A^2 = (3/4)^2 = 9/16
(b) One of the two children with albinism, i.e., 2Aa = 2 × ¾ × ¼ = 6/16
(c) Both children with albinism, i.e., a^2 = (1/4)^2 = 1/16.
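Example 5: Consider the parents of a Sinha Roy family in which both
of them are heterozygous for a severe genetic syndrome that is autosomal
recessive. Of their six children, five have the syndrome and one is normal.
How unlucky is this family?

Solution:
n = 6
p = ¾ (probability of a normal child)
q = ¼ (probability of an affected child)
Number of children with the disease (t) = 5
Number of normal children (s) = 1
P = n! × p^s × q^t / (s! × t!)
= 6!/(1! × 5!) × (3/4)^1 × (1/4)^5
= 6 × 3/4 × 1/1024
= 18/4096
= 0.00439
= 0.0044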
Example 6: A couple is heterozygous for albinism (Aa). What is the
probability that (a) 4 out of 6 children born to them are normal; (b) 4 are
normal and 2 albino out of 6 children?

Solution:
Let “a” = allele for albinism
A = allele for normal skin color
Heterozygous parents are expected to have ¾ normal children and ¼ albino children.
p (A) = ¾
q (a) = ¼ and n = 6
Probability of 4 children being normal:
P = (3/4)^4 = 3/4 × 3/4 × 3/4 × 3/4 = 81/256 = 0.316
For 4 normal and 2 albino: n = 6, s = 4 and t = 2
P = n! × p^s × q^t / (s! × t!)
= 6!/(4! × 2!) × (3/4)^4 × (1/4)^2
= (6 × 5)/(2 × 1) × 81/256 × 1/16
= 15 × 81/256 × 1/16 = 1215/4096 = 0.2966
= 0.297
Through binomial expansion:
(p + q)^6 = p^6 + 6p^5q + 15p^4q^2 + 20p^3q^3 + 15p^2q^4 + 6pq^5 + q^6
P = 15p^4q^2
= 15 × (3/4)^4 × (1/4)^2
= 15 × 81/256 × 1/16
= 1215/4096
= 0.2966
= 0.297
Example 6: Four babies were born in Aligarh general hospital. (a) What
was the chance that two would be boys and two girls? (b) What was the chance
that all four would be girls?
Solution:
The probabilities of boys and girls were each ½ = 0.5
p = q = ½
(a) n = 4, s = 2 (girls) and t = 2 (boys)
P = n! × p^s × q^t / (s! × t!)
= 4!/(2! × 2!) × (1/2)^2 × (1/2)^2
= (4 × 3)/(2 × 1) × 1/4 × 1/4
= 6 × ¼ × ¼
= 3/8
The probability was 3/8.
(b) (p + q)^4 = p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4
The term for four girls is q^4 = (1/2)^4 = 1/16

Example 7: There are eight children in a family where both parents are
heterozygous for albinism. What mathematical expression predicts the
probability that six are normal and two are albinos?
Solution:
As both parents are heterozygous, the probability of a normal child is ¾ and of an
albino child is ¼, i.e., p = ¾
q = ¼
n = 8
s = 6 and t = 2
P = n! × p^s × q^t / (s! × t!)
= 8!/(6! × 2!) × (3/4)^6 × (1/4)^2
= (8 × 7 × 6!)/(2 × 1 × 6!) × (3/4)^6 × (1/4)^2
= 28 × (3/4)^6 × (1/4)^2
= 28 × 729/4096 × 1/16
= 7 × 729/16384
= 5103/16384
= 0.31146
Example 17: A multiple allelic system is known to consist of seven alleles.
Assuming that this is a diploid species, how many different genotypes could
exist in the population?
Solution:
Number of possible genotypes = number of different allelic combinations
(heterozygotes) + number of genotypes with two identical alleles (homozygotes)
= n!/[k! (n – k)!] + n
n = 7
k = 2 (alleles per genotype)
= 7!/[2! (7 – 2)!] + 7 = (7 × 6 × 5!)/(2 × 5!) + 7 = 21 + 7
= 28 genotypes
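The two counting examples (hybrids from 45 strains and genotypes from seven alleles) both rest on nCr. A minimal sketch using the standard-library `math.comb` (Python 3.8 or later) is shown below; the function names are hypothetical.

```python
from math import comb

def hybrid_count(strains):
    """Number of distinct pairs of strains, i.e., nC2."""
    return comb(strains, 2)

def genotype_count(alleles):
    """Diploid genotypes for a multiple-allelic system:
    heterozygotes (nC2) plus homozygotes (n)."""
    return comb(alleles, 2) + alleles

print(hybrid_count(45))    # 990
print(genotype_count(7))   # 28
```

The genotype count equals n(n + 1)/2, which gives the same 28 for n = 7.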
Poisson Distribution
The Poisson distribution was derived by the French mathematician Simeon Denis
Poisson (1837).
It represents the distribution of discrete random variables for rare events
whose probability of occurrence is very small but for which the number of
events/trials is very large, where the binomial expression formula can be used in
determining theoretical probabilities.

It describes rare events, and its mean and variance are equal.


Characteristics
The events are independent and random.
It is a limiting form of the binomial distribution.
It may be expected in cases where the chance of any individual event
being a success is small.
It has a single parameter, the mean of the distribution.
The mean and the variance are equal.
It is a discrete probability distribution because it is a probability
distribution of whole numbers (0, 1, 2, 3, …, n) of events.
It is positively skewed, and the skewness declines as the value of the mean rises.
It is leptokurtic, and the kurtosis decreases as the mean increases.
Conditions under which the Poisson distribution is used
It is applicable when the number of observations or events is very
large but the probability of success is very small.
The random variable should be discrete.
A dichotomy exists, i.e., the happening of the events must be divided
into two classes, viz., success or failure, occurrence or nonoccurrence.
p should be small, i.e., close to zero.
The events are independent, i.e., the happening of one event does not affect the
happening of another event.
Computation of Poisson distribution
A random variable (X) is said to have a Poisson distribution if its probability
distribution is
$P(X) = \frac{e^{-m} m^x}{x!}$
P(X) = probability of x successes
x = value of the variable (such as 0, 1, 2, 3, …, n)
e = constant 2.7183 (base of natural logarithms)
m = mean of the distribution

Number of successes (X)   0       1           2            3            r             n             Total
Probability P(X)          e^–m    e^–m m/1!   e^–m m²/2!   e^–m m³/3!   e^–m m^r/r!   e^–m m^n/n!   1

Sum of the probabilities
= e^–m + e^–m m/1! + e^–m m²/2! + e^–m m³/3! + … + e^–m m^r/r! + …
= e^–m (1 + m/1! + m²/2! + m³/3! + … + m^r/r! + …)
= e^–m e^m = e^0 = 1
Mean = m = np
Skewness given by (ß1) = 1/m
Kurtosis given by (ß2) = 3 + 1/m
Variance = m
Examples of Poisson Distribution

treatment in water.
The number of bacterial colonies per unit area of a given culture
on a slide seen under the microscope.
The emission of radioactive particles.
The number of mistakes committed by a good typist per page.
The number of buses passing along a certain road (M.G. Road,
Agra).
The number of cases of disease or deaths from cancer or heart attack in the
hospitals of a city (e.g., Agra) in one year.
Example 1: In a Biotechnology book of 520 pages, 390 typographical errors occur.
What is the probability that a random sample of 5 pages will contain no error?

Solution:
Here n = 5; the book has 520 pages and 390 typographical errors.
Therefore probability (p) = 390/520
= 0.75
Mean m = np = 5 × 0.75
= 3.75
Using the Poisson probability law
P(r) = e^–m m^r / r! = e^–3.75 (3.75)^r / r!
Probability of zero errors:
P(0) = e^–3.75 (3.75)^0 / 0!
= e^–3.75
= 0.0235
Example 1: The Biostatistics book with 585 pages contains 43 typographical
errors. If these errors are randomly distributed throughout the book, what
is the probability that 10 pages, selected at random, will be free from errors?
Solution:
Here n = 10
The book has 585 pages and 43 typographical errors.
Therefore probability p = 43/585 = 0.0735
Mean (m) = np = 10 × 0.0735
= 0.735
Poisson distribution: P(r) = e^–m m^r / r!
= e^–0.735 × (0.735)^r / r!
Probability of zero errors: P(0) = e^–0.735 × (0.735)^0 / 0!
= e^–0.735 × 1
= e^–0.735
= 0.4795
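The Poisson probabilities in the book-error examples can be checked with a short sketch of P(x) = e^–m m^x / x!. It uses only the standard-library `math` module; `poisson_pmf` is a hypothetical name.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """Probability of exactly x events when the mean number of events is m."""
    return exp(-m) * m ** x / factorial(x)

# 43 errors in 585 pages; mean number of errors in a sample of 10 pages.
m = 10 * 43 / 585
p_no_error = poisson_pmf(0, m)
print(round(m, 3), round(p_no_error, 4))

# 390 errors in 520 pages; sample of 5 pages.
m2 = 5 * 390 / 520
print(round(poisson_pmf(0, m2), 4))
```

For a Poisson variable the mean and the variance are both equal to m, so no separate variance calculation is needed.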

ß1, ß2, µ3, µ4


Solution:
Here m = 4
Variance = m = 4
Standard deviation = √m = 2
Skewness (ß1) = 1/m
= 1/4
= 0.25
Kurtosis (ß2)
= 3 + 1/m
= 3 + ¼
= 3.25
µ3 = m
= 4
µ4 = m + 3m²
= 4 + 3 (4)²
= 4 + 3 × 16
= 4 + 48 = 52
Example 4: The following data are obtained from the vector cytogenetic
research laboratory of Dr. M.P.S College Agra.

X   Frequency (f)
0   20
1   26
2   16
3   4
4   2

What is the mean?
What is e^–m?
What is P(0)?
Solution:
Mean = Σfx/Σf
= (0 + 26 + 32 + 12 + 8)/(20 + 26 + 16 + 4 + 2)
= 78/68
= 1.147
= 1.15
Mean (X) = m = 1.15
e^–1.15 = 0.32
P(0) = e^–m m^0/0!
= 0.32 × 1/1 = 0.32
Example 5: In a family of 8 children where both parents are heterozygous
for albinism, what mathematical expression predicts the probability that
six are normal and two are albino?
Solution:
Both parents are heterozygous, so the probability of a normal child is 3/4 and of
an albino child is 1/4,
i.e., p = 3/4, q = 1/4, n = 8, s = 6 and t = 2
P = n! × p^s × q^t / (s! × t!)
= 8!/(6! × 2!) × (3/4)^6 × (1/4)^2
= (8 × 7 × 6!)/(2 × 1 × 6!) × (3/4)^6 × (1/4)^2
= 28 × (3/4)^6 × (1/4)^2
= 28 × 729/4096 × 1/16
= 7 × 729/16384
= 5103/16384
= 0.31146

14.42. SKEWNESS
A distribution is said to be symmetrical when the mean, median and mode coincide.
Its curve has three parts, a left tail, a middle part and a right tail, and the right
and left tails are of equal length. Skewness is used to denote the extent of asymmetry
in the data. When the frequency distribution is not symmetrical it is said to be
skewed. The meaning of skewness is “lack of symmetry.” A symmetrical distribution
therefore has zero skewness.
Characteristics
1. It may be positive or negative.
Positive skewness:
The curve of the distribution has a longer tail toward the right, i.e.,
the higher values of the variable.
Mean > Median > Mode
Negative skewness:
The curve of the distribution has a longer tail toward the left, i.e., the
lower values of the variable.
Mean < Median < Mode
2. Here the mean, median and mode fail to coincide. Both the median and
the mean are displaced from the mode toward the skewed tail.
In terms of quartiles:
Q3 – Q2 > Q2 – Q1 (Positively skewed)
Q2 – Q1 > Q3 – Q2 (Negatively skewed)
Measures of Skewness
They indicate not only the extent of skewness in numerical terms
but also the direction, i.e., the manner in which the deviations are
distributed.
Measures of asymmetry are called measures of skewness.
The absolute measures are known as measures of skewness.
They tell us the extent of asymmetry and whether it is positive or negative.
Absolute skewness = Mean – Mode
Mean > Mode (Positive skewness)
Mode > Mean (Negative skewness)
$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$
where µ3 = 3rd moment and µ2 = 2nd moment.
There are three important measures of relative skewness.
1. Karl Pearson's coefficient of skewness
Sk = (Mean – Mode)/Standard deviation
Sk = 3 (Mean – Median)/Standard deviation
2. Bowley's coefficient of skewness
Sk = (Q3 – 2Q2 + Q1)/(Q3 – Q1) (Q1 = first quartile; Q2 = second quartile; Q3 = third quartile)
3. Kelly's coefficient of skewness
Sk = (P90 + P10 – 2 Median)/(P90 – P10) (P10 = 10th percentile; P90 = 90th percentile)

14.43. KURTOSIS AND MOMENT


It is used to describe the degree of peakedness of a frequency distribution
compared to that of the normal distribution.
Characteristics
1. It measures the peakedness of a curve relative to the normal curve.
2. It is also called a measure of the convexity of the curve.
3. It distinguishes three broad patterns of peakedness: leptokurtic,
mesokurtic and platykurtic.
A curve that is neither flat nor sharply peaked, i.e., the normal curve, is
called mesokurtic.
A curve with a higher and sharper peak and a narrower body than the
normal curve is known as leptokurtic.
A curve with a flatter top than the normal curve is called platykurtic.


Measures of Kurtosis
The measure of kurtosis of a frequency distribution is based upon the fourth moment
about the mean of the distribution.
$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\sigma^4}$
ß2 = 3: Mesokurtic
ß2 < 3: Platykurtic
ß2 > 3: Leptokurtic

Importance of Skewness
It tells the direction and extent of asymmetry in a series.
It provides us an idea about the nature and degree of concentration of
items.

Dispersion | Skewness
It shows the spread of individual values about the mean, i.e., the central value. | It shows the departure from symmetry, i.e., the direction of variation.
It shows the degree of variability. | It shows whether the concentration is at the higher or the lower values.
It is a type of average of deviations, an average of the second order. | It is not an average, but it is measured by the use of the mean, median and mode.
It judges the truthfulness of the central tendencies. | It judges the truthfulness of the central tendencies.

than normal curve.


It also depends on the shape of the top of a frequency curve.
Moments
Moments are used in mechanics, physics, etc. They are also applied in statistics,
where they describe the various characteristics of a frequency distribution, viz.,
central tendency, dispersion, skewness and kurtosis.

A moment about the mean is the average of a given power of the deviations
taken from the mean of the distribution.


Role of Moments:
Moment | Property | What it measures
First moment about the mean, µ1 = Σ(X – X̄)/N | It is always zero, i.e., µ1 = 0 | It relates to the mean of the distribution.
Second moment, µ2 = Σ(X – X̄)²/N | About the mean, it is the variance of the distribution, σ² | It measures the variance, i.e., the spread of the different terms in a distribution.
Third moment, µ3 = Σ(X – X̄)³/N | It deals with skewness | It gives an idea about the degree of skewness present.
Fourth moment, µ4 = Σ(X – X̄)⁴/N | It highlights the height of the frequency distribution, whether it is more peaked or more flat-topped than normal | It measures kurtosis.
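
The four moments about the mean, and the coefficients β1 = µ3²/µ2³ and β2 = µ4/µ2² built from them, can be computed directly from their definitions. This is a minimal sketch; the data values and the function names are hypothetical, chosen only for illustration.

```python
def central_moment(data, r):
    """r-th moment about the mean: sum((x - mean)**r) / N."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

def classify_kurtosis(beta2, tol=1e-9):
    """Label a curve from beta2, using 3 as the normal (mesokurtic) value."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"

# Hypothetical observations with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu1 = central_moment(data, 1)  # always zero
mu2 = central_moment(data, 2)  # variance
mu3 = central_moment(data, 3)  # sign gives the direction of skewness
mu4 = central_moment(data, 4)  # used for kurtosis

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2

print(mu1, mu2, mu3, mu4)
print(round(beta1, 4), round(beta2, 4), classify_kurtosis(beta2))
```

For these values µ3 is positive, so the series is positively skewed, and β2 is below 3, so the curve is platykurtic.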


Example 1: The coefficient of skewness = 3, Mean = 90, Median = 80; Find the
value of S.D.
Solution:
Sk = 3 (Mean – Median)/S.D
S.D = 3 (Mean – Median)/Sk
S.D = 3 (90 -80)/ 3
= 3 ×10/ 3
= 10

14.44 SET THEORY AND PROBABILITY

A set is a well-defined collection of objects; each object of the set is called
an element or member.


The sets are usually denoted by capital letters (e.g., A, B, C and D) and their
elements are denoted by small letters (e.g., a, b, c and d). The elements are
enclosed within curly brackets { } and separated by commas.
Example 1: The collection of all consonants of the English alphabet, and the
collection of all odd numbers less than 50.

Finite set:
A set with a countable number of elements, e.g., {…, 4, 6 and 8}. A set whose
elements cannot be counted is an infinite set.

Null set
A set that does not contain any elements at all is called a null set. It is also
called a void set or empty set.
There is only one such set.
It is denoted by ∅ (or { }).
Example 1: The set of persons who can jump to a height of 5 miles is a null set,
because no one can jump to such a height.
Unit Set
A set having only one element; it is also known as a singleton set.
Equal Set
Two sets A and B are called equal if they have the same elements; this is
written A = B.
Equivalent Set
Two sets A and B are equivalent if their numbers of elements, i.e., cardinal
numbers, are equal, e.g., A = {2, 4, 6} and B = {a, b, c}.
Here n(A) = n(B) = 3.
Cardinal Number
The number of elements in a set A is called its cardinal number and is denoted
by n(A).
The cardinal number of the null set is zero.

E.g.: A = {2, 3} has 2 elements, so n(A) = 2

B = {a, e, i, o, u} has 5 elements, so n(B) = 5
Subsets and Superset
For two sets A and B, if each element of set A is also an element of set B,
then set A is called a subset of B and set B is a superset of A.
This is written A ⊆ B and read as “A is contained in B” or “B contains A.”
Example: The set A = {2, 5} is a subset of the set B = {2, 5, 7}.
Here all the elements of A are also elements of B.
On the other hand, B is a superset of set A.
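
The set relations described in this section can be checked with a short sketch using Python's built-in set type. The variable names are only illustrative; the element values are the ones used in the examples above.

```python
A = {2, 5}
B = {2, 5, 7}
C = {"a", "b"}
null_set = set()  # the empty (null) set

# Cardinal numbers: n(A), n(B) and n of the null set.
n_A, n_B, n_null = len(A), len(B), len(null_set)

# Subset and superset: every element of A is also an element of B.
a_is_subset_of_b = A.issubset(B)
b_is_superset_of_a = B.issuperset(A)

# Equal sets have the same elements; the order of writing does not matter.
a_equals_reordered = (A == {5, 2})

# Equivalent sets have the same cardinal number, not necessarily the same elements.
a_equivalent_to_c = (len(A) == len(C))

print(n_A, n_B, n_null)
print(a_is_subset_of_b, b_is_superset_of_a, a_equals_reordered, a_equivalent_to_c)
```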

14.45. COMPARATIVE STATISTICS


Here we compare variables. We can find out whether the variables are basically
the same, meaning they could originate from the same population, or whether they
are significantly different, meaning they have a different origin. There are two
tests for this: ANOVA (analysis of variance) and the t-test (Figure 14.11).

14.46. CHI SQUARE TEST


Chi-square is a quantity used to compare an “observed” ratio with an “expected”
or “theoretical” ratio and to determine how closely the former fits the latter.
It is a statistical test for deciding whether the deviation of the observations
from the hypothesis is significant or not significant.
In scientific research, we first make a hypothesis and do experiments; after
completion of the experiment we analyze the data to see whether the hypothesis
holds. The chi-square test is used for this analysis of our observations.

It measures the difference between “observed” frequencies and “theoretical” or
“expected” frequencies and thus determines whether the difference between the
observed and theoretical frequencies is due to the error of sampling, i.e., due
to chance.
It is computed on the basis of frequencies in a sample, and thus the value of
chi-square so obtained is a statistic. Chi-square is not a parametric test, as its
value is not derived from the observations in a population. Hence the chi-square
test is a non-parametric test.
Chi-square (χ²) test:
A statistical test to determine how far the observed numbers deviate from those
“expected” or “theoretical” numbers under a particular hypothesis.
χ² = Σ (|O – E| – ½)²/E
where ½ = 0.5 is Yates' correction (without the correction, χ² = Σ (O – E)²/E)
O = observed frequencies
E = expected frequencies
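
The formula can be applied as in the following sketch. The observed and expected counts are hypothetical (100 individuals against an expected 1:1 ratio), and the function name is only illustrative. The critical value 3.84 is the usual table value of chi-square for 1 degree of freedom at the 5% level.

```python
def chi_square(observed, expected, yates=False):
    """Sum of (|O - E| - c)**2 / E over all classes, with c = 0.5 if yates else 0."""
    correction = 0.5 if yates else 0.0
    total = 0.0
    for o, e in zip(observed, expected):
        total += (abs(o - e) - correction) ** 2 / e
    return total

# Hypothetical counts: 45 and 55 observed where 50 and 50 are expected.
observed = [45, 55]
expected = [50, 50]

chi2_plain = chi_square(observed, expected)
chi2_yates = chi_square(observed, expected, yates=True)

critical_value = 3.84  # table value for df = 1 at the 5% level
print(chi2_plain, chi2_yates)
print("significant" if chi2_plain > critical_value else "not significant")
```

Both values are below the table value here, so the deviation from the expected ratio would be regarded as due to chance.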
Important characteristics of chi-square test
Chi-square will be zero if the difference in each pair is zero, and it takes a
larger value the more the observed frequency and expected frequency in each
pair are unequal.

It is a statistic, not a parameter.

It is never negative, as the difference in each pair is squared.

Useful Points
Hypothesis test
It is a procedure used to decide whether to “accept” or “reject” the
hypothesis under consideration (i.e., null hypothesis).

14.47. STUDENT DISTRIBUTION TEST


The t-test is used with small samples (n < 30). It was worked out by W. S.
Gosset, whose pen name was “Student”; hence the test is called Student's
t-test.

The t statistic is the ratio of the difference between the sample mean and the
true (population) mean, or between two sample means, expressed in terms of the
standard error.
t = Difference between sample means/Standard error of the difference between means
t = (X̄1 – X̄2)/SE
where X̄1 and X̄2 = Means
SE = Standard error
Types
Paired t-test
Unpaired t-test
Conditions for applying the test
Random samples are collected from a normal population.
The population variances are regarded as equal when testing the
equality of two population means.
Samples are smaller than 30.
Some adjustments in the degrees of freedom are made in the case of
two samples.
t-distribution properties
The distribution curve varies with the degrees of freedom.
It is a symmetrical distribution with mean zero.
The graph is similar to that of the normal distribution.
It has a greater spread than the normal distribution.
The larger the number of degrees of freedom, the more closely the “t”
distribution resembles the standard normal distribution.
t-distribution applications
It is used to test the difference between the means of two samples.
It is used to test the difference between two sample means when the
population variances are equal but unknown.
It is used to test a single mean when the population variance is unknown.
Calculate the t-value:
t = (X̄1 – X̄2)/SE(X̄1 – X̄2)
X̄1 and X̄2 = Means
Sx1 and Sx2 = SD
n1 and n2 = sizes of the samples
SE of (X̄1 – X̄2) = S √(1/n1 + 1/n2)
where the pooled SD is S = √[(Σ(X1 – X̄1)² + Σ(X2 – X̄2)²)/(n1 + n2 – 2)]
Determine the pooled degrees of freedom from the formula
df = (n1 – 1) + (n2 – 1)
= n1 + n2 – 2
Compare the calculated value with the table value at the particular degrees
of freedom.
Example: A group of 13 children was given the usual diet plus vitamin “A”
and “D” tablets, while a second comparable group of 12 children took only
the usual diet. After 12 months, the gain in weight in pounds was noted as
given in the table. Can you say that vitamins A and D were responsible for the
difference?
A 5 3 4 3 2 6 3 2 3 6 7 5 3
B 1 3 2 4 2 1 3 4 3 2 2 3
Solution:
Null Hypothesis:
Vitamins A and D are not responsible for the difference in weight gain.
Alternative hypothesis:
Vitamins A and D are responsible for the difference in weight gain.
S.No | Gr A (X) | X – X̄ = D1 | D1² | Gr B (Y) | Y – Ȳ = D2 | D2²
1 | 5 | 5 – 4 = 1 | 1 | 1 | 1 – 2.5 = –1.5 | 2.25
2 | 3 | 3 – 4 = –1 | 1 | 3 | 3 – 2.5 = 0.5 | 0.25
3 | 4 | 4 – 4 = 0 | 0 | 2 | 2 – 2.5 = –0.5 | 0.25
4 | 3 | 3 – 4 = –1 | 1 | 4 | 4 – 2.5 = 1.5 | 2.25
5 | 2 | 2 – 4 = –2 | 4 | 2 | 2 – 2.5 = –0.5 | 0.25
6 | 6 | 6 – 4 = 2 | 4 | 1 | 1 – 2.5 = –1.5 | 2.25
7 | 3 | 3 – 4 = –1 | 1 | 3 | 3 – 2.5 = 0.5 | 0.25
8 | 2 | 2 – 4 = –2 | 4 | 4 | 4 – 2.5 = 1.5 | 2.25
9 | 3 | 3 – 4 = –1 | 1 | 3 | 3 – 2.5 = 0.5 | 0.25
10 | 6 | 6 – 4 = 2 | 4 | 2 | 2 – 2.5 = –0.5 | 0.25
11 | 7 | 7 – 4 = 3 | 9 | 2 | 2 – 2.5 = –0.5 | 0.25
12 | 5 | 5 – 4 = 1 | 1 | 3 | 3 – 2.5 = 0.5 | 0.25
13 | 3 | 3 – 4 = –1 | 1 | | |
Total | | | ΣD1² = 32 | | | ΣD2² = 11

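Gr A: n1 = 13, X̄ = 4, ΣD1² = 32
Gr B: n2 = 12, Ȳ = 2.5, ΣD2² = 11
S = √[(ΣD1² + ΣD2²)/(n1 + n2 – 2)] = √[(32 + 11)/23] = √1.87 = 1.37
SE = S × √(1/n1 + 1/n2) = 1.37 × √(1/13 + 1/12)
= 1.37 × 0.4
= 0.548 = 0.55
t = (X̄ – Ȳ)/SE
= (4 – 2.5)/0.55
= 1.5/0.55 = 2.73
df = n1 + n2 – 2 = 23
The critical value of t for df = 23 at the 5% level is 2.07.
Decision: the calculated t = 2.73 is greater than t(.05, 23) = 2.07, so the null
hypothesis (that the vitamins made no difference) is rejected; the difference in
weight gain between the two groups is significant.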
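
The same calculation can be repeated without intermediate rounding. The sketch below uses the weight gains of groups A and B from the example and the pooled-variance formulas of this section; the function name is only illustrative. Because nothing is rounded on the way, the t-value differs slightly in the second decimal place from the hand calculation.

```python
import math

group_a = [5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3]  # usual diet plus vitamins
group_b = [1, 3, 2, 4, 2, 1, 3, 4, 3, 2, 2, 3]     # usual diet only

def unpaired_t(x, y):
    """Unpaired t with pooled SD; returns (t, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    mean1, mean2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - mean1) ** 2 for v in x)  # sum of squared deviations, group 1
    ss2 = sum((v - mean2) ** 2 for v in y)  # sum of squared deviations, group 2
    df = n1 + n2 - 2
    pooled_sd = math.sqrt((ss1 + ss2) / df)
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, df

t_value, df = unpaired_t(group_a, group_b)
critical_value = 2.07  # table value for df = 23 at the 5% level

print(round(t_value, 2), df)
print("reject null hypothesis" if abs(t_value) > critical_value else "accept null hypothesis")
```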

14.48. Z-TEST
The deviation from the mean in a normal distribution curve is called the relative
or standard normal deviate and is given the symbol “Z.” It is measured in terms of
SD and indicates how much an observation is bigger or smaller than the mean
in units of SD. So “Z” is the ratio
Z = (Observation – Mean)/SD = (X – X̄)/SD
When applied to sampling variability, the difference between a sample
estimate and that of the population is expressed in terms of SE instead of SD. The
ratio between the observed difference and the SE is called “Z.”
Conditions for “Z”
The data must be quantitative.
The data should be assumed to follow a normal distribution.
The data must be randomly collected.
The sample size must be larger than 30.

For a single sample mean (X̄):
Z = (X̄ – µ)/SE(X̄)

For the difference between two sample means:
Z = (X̄1 – X̄2)/SE(X̄1 – X̄2)

14.49. F-TEST OR FISHER’S F TEST



The F-test was originated by R. A. Fisher. It is also called the variance ratio
test, as it involves the comparison of sample variances.
F-test: it tests the hypothesis that the variances derived from two samples are
equal.
F = S1²/S2² (conventionally the larger variance is placed in the numerator)
Assumptions of the F-test:
The variance should be equal for all groups.
Each value should be independent.
The values should be normally distributed.
Uses of the F-test
It tests whether two independent estimates of the population variance are
homogeneous.
It tests whether two independent samples have been drawn from normal
populations with the same variance (σ²).
It tests the equality of population variances.
df = (n1 – 1) for the numerator and (n2 – 1) for the denominator.
If the calculated F-ratio is smaller than the table value, the null hypothesis
is accepted; this indicates that the samples are drawn from the same population.
It is used to calculate the F statistic.
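
A variance ratio can be computed as in the sketch below. The two samples are hypothetical, and the function name is only illustrative. Sample variances (divisor n – 1) are used, and the larger variance is placed in the numerator. The table value 6.39 is the usual 5% value of F for 4 and 4 degrees of freedom.

```python
import statistics

def f_ratio(x, y):
    """Return (F, (df numerator, df denominator)) with the larger variance on top."""
    var_x = statistics.variance(x)  # sample variance, divisor n - 1
    var_y = statistics.variance(y)
    if var_x >= var_y:
        return var_x / var_y, (len(x) - 1, len(y) - 1)
    return var_y / var_x, (len(y) - 1, len(x) - 1)

# Hypothetical measurements from two independent samples.
sample_1 = [10.0, 12.0, 14.0, 16.0, 18.0]
sample_2 = [11.0, 12.0, 13.0, 14.0, 15.0]

f_value, dfs = f_ratio(sample_1, sample_2)
table_value = 6.39  # 5% table value of F for df = (4, 4)

print(f_value, dfs)
print("accept null hypothesis" if f_value < table_value else "reject null hypothesis")
```

Here the calculated F is smaller than the table value, so the two variances would be regarded as homogeneous.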

14.50. T TEST
One-Tailed t Test
A test of a statistical hypothesis in which the alternative hypothesis is one
sided is called a one-tailed test or one-sided test.
It may be a right- or left-tailed test.
If we want to know whether one particular drug is better than the other,
it will be a one-tailed test.
Two-Tailed t Test
It is a test of a statistical hypothesis in which the rejection region is
represented by both sides (tails) of the standard normal curve.
Example: Testing whether nourished (healthy) children are different from
undernourished (unhealthy) children, in either direction.
Example: The mean IQ of a sample of 1600 children was 99. Is it likely
that this was a random sample from a population with mean IQ 100 and
standard deviation 15?
Solution:
Null hypothesis: the sample has been drawn from the population with mean
IQ 100.
Alternative hypothesis: the sample has not been drawn from that population.

Here n = 1600
X̄ = 99
µ = 100
SD = 15
SE = SD/√n = 15/√1600 = 15/40 = 0.375
Z = (X̄ – µ)/SE = (99 – 100)/0.375 = –2.67
Since |Z| = 2.67 is greater than 1.96 (the two-tailed critical value at the 5%
level), the null hypothesis is rejected; the sample has not been drawn from a
population with mean 100 and SD 15.
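
The arithmetic of this example can be checked with a few lines. This is only a sketch; the function name is illustrative, and 1.96 is the usual two-tailed critical value of the standard normal deviate at the 5% level.

```python
import math

def z_for_sample_mean(sample_mean, population_mean, sd, n):
    """Z = (sample mean - population mean) / SE, where SE = SD / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

# Values from the IQ example: n = 1600, sample mean 99, mu = 100, SD = 15.
z = z_for_sample_mean(99, 100, 15, 1600)
critical_value = 1.96  # two-tailed, 5% level

print(round(z, 2))
print("reject null hypothesis" if abs(z) > critical_value else "accept null hypothesis")
```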

14.51. ANOVA (ANALYSIS OF VARIANCES)


It is a powerful statistical procedure for determining whether differences in
means are statistically significant.
Variance (σ²): It is an absolute measure of dispersion of raw scores around
the sample (group) mean, i.e., the dispersion of the scores resulting from their
varying differences (error terms) from the means.
The square of the standard deviation is called the variance and is denoted by
σ².
Mean Square: The measure of variability used in the analysis of variance is
called a “mean square.” It is the sum of squared deviations from the mean
divided by the degrees of freedom.
Mean square = Sum of squared deviations from mean/Degrees of freedom
Assumptions in the analysis of variance:
The effects of the various components are additive.
The errors occur at random and are independent of each other within the groups.
The population is normally distributed with a common variance.
The samples are independently drawn.
Techniques for the analysis of variance
One-way ANOVA: A single independent variable is involved. Example:
the effect of a pesticide (independent variable) on the oxygen consumption
(dependent variable) in a sample of insects.
Two-way ANOVA: Two independent variables are involved.
Example: the effect of a number of groups of pesticides on the oxygen
consumption of a sample of insects.
Procedure:
The short-cut method, based on the sums of squares of the individual
values, is more convenient and is usually used.
The calculations in the direct method are lengthy as well as time
consuming, so the direct method is not popular in practice.
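
A one-way analysis of variance can be sketched from the definition Mean square = Sum of squares/Degrees of freedom. The oxygen consumption readings below are hypothetical values for three pesticide treatments, and the function name is only illustrative. The sketch uses the direct method (deviations from means) because it is the easiest to read in code.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Sum of squares between groups and within groups.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    ms_between = ss_between / df_between  # mean square between groups
    ms_within = ss_within / df_within     # mean square within groups (error)
    return ms_between / ms_within, df_between, df_within

# Hypothetical oxygen consumption readings under three pesticide treatments.
treatments = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova(treatments)
print(round(f_value, 2), df_between, df_within)
```

The calculated F is then compared with the table value of F for the two degrees of freedom, as in the F-test.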

14.52. NON-PARAMETRIC STATISTICS

Parameter
It is a numerical index or summary value, such as the mean, median, standard
deviation or variance of a variable, for the entire population.
Non-parametric test
These tests are mathematical procedures for treating standard problems when
the assumption of normality is replaced by a more general assumption about the
distribution function. They are also called distribution-free tests.
Parametric test
The most commonly used statistical methods are called parametric because
they involve testing the values of a parameter (mean, median or standard
deviation).
Characteristics
It can be computed by a very simple method.
It does not require a normal distribution of the variables.
It can be used for very small samples.
It works without using any pre-computed statistic as an estimate of a
parameter.
It can be done with very few assumptions.
Merits and demerits of non-parametric test
Merits
It can be applied to all types of data.
It does not need pre-computed statistics.
It has a greater range of applicability.
It does not require laborious and lengthy calculations.
It is generally simple to understand and very easy to compute and
apply.
Demerits
The procedure lacks power.
It sometimes pays a price for its freedom from assumptions.

Types of non-parametric test

Mann-Whitney U-test or rank sum test
The test is also known as the U-test.
It is used to compare two independent sample groups.

Kruskal-Wallis test or H-test
It is the rank-based one-way ANOVA and is interpreted using critical
chi-square values.
It is a kind of rank sum test.
The sign test for paired data
A pair is discarded if the difference between the paired observations is zero.
It is based on the direction (+ or – sign) of the difference in a pair of
observations and not on the numerical magnitudes.
One sample run test
This test deals with the randomness with which the sample items have
been selected.
It is based on the order in which the sample observations are obtained.
Kolmogorov-Smirnov test
It tests the agreement between the observed frequency distribution and a
theoretical distribution.
It is also known as the K-test, i.e., another method of measuring goodness
of fit.
It is more powerful and easier to apply.

Kendall's test for concordance
It measures the agreement between sets of rankings of individuals.
It is applicable when two sets of rankings of individuals are available.
Median test for independent samples
It is used to compare two or more independent groups using a common
median of those groups.

Wilcoxon signed rank test
It accounts for the magnitude of differences between paired values and not only
their sign. It is useful in comparing two populations.
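
As an illustration of the sign test for paired data, the sketch below counts the + and – signs, discards pairs with zero difference, and obtains a two-sided probability from the binomial distribution with p = ½. The before and after readings are hypothetical, and the function name is only illustrative.

```python
from math import comb

def sign_test(before, after):
    """Return (number of + signs, number of - signs, two-sided p-value)."""
    # Pairs with zero difference are discarded.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    minus = n - plus
    k = min(plus, minus)
    # Probability of a split at least this extreme when + and - are equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

# Hypothetical paired readings on eight individuals.
before = [10, 12, 9, 11, 13, 10, 12, 11]
after = [12, 14, 9, 13, 15, 11, 14, 10]

plus, minus, p_value = sign_test(before, after)
print(plus, minus, p_value)
```

Only the signs enter the calculation; the sizes of the differences are ignored, which is why the Wilcoxon signed rank test is preferred when the magnitudes matter.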

14.53. IMPORTANT FORMULAS



Average Formula:
Let a1, a2, a3, a4, a5, …, an be a set of numbers; Average = (a1 + a2 + a3 + a4 + a5 + … + an)/n
Adding fractions: a/b + c/d = (ad + bc)/bd
Subtracting fractions: a/b – c/d = (ad – bc)/bd
Multiplying fractions: a/b × c/d = ac/bd
Dividing fractions: (a/b)/(c/d) = a/b × d/c = ad/bc
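
These identities can be verified with exact fractions. The sketch below uses Python's standard fractions module; the values of a, b, c and d and the list of numbers are arbitrary illustrations.

```python
from fractions import Fraction

a, b, c, d = 1, 2, 3, 4  # arbitrary values; b, c and d must not be zero

x, y = Fraction(a, b), Fraction(c, d)

add_ok = (x + y == Fraction(a * d + b * c, b * d))
sub_ok = (x - y == Fraction(a * d - b * c, b * d))
mul_ok = (x * y == Fraction(a * c, b * d))
div_ok = (x / y == Fraction(a * d, b * c))

numbers = [2, 4, 6, 8]
average = Fraction(sum(numbers), len(numbers))  # (a1 + a2 + ... + an) / n

print(add_ok, sub_ok, mul_ok, div_ok, average)
```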
15.1. SIGMA PLOT
SigmaPlot is used for preparing graphs, analysis of variance and most
presentations of data for analysis. The software can read multiple formats; we
can directly paste data from Excel into a SigmaPlot worksheet and make graphs
easily.

15.2. SPSS (STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES)

It is a widely used program for statistical analysis in science and various research
fields (market researchers, education researchers, and health researchers).

15.3. ORIGIN PRO SOFTWARE


Origin is data analysis and graphing software. Nowadays Origin is used to
create attractive graphs with different colors, more easily than with other
software. It has an easy-to-use interface for beginners, with high-quality
graphing features. Graph quality is much higher than in the above-mentioned
software. Origin simply starts with a built-in graph template, and then we can
easily customize each and every element of a graph according to our need. We can
easily add additional axes, panels or layers to a graph page in Origin. We can
also save the settings of a graph as a custom template for repeated use. It
is column oriented, which is not common for the above-mentioned software.
It also provides several statistical analysis tools like descriptive statistics
(basic statistics) and one-way and two-way analysis of variance (ANOVA),
