Biostatistics Chapter
INTRODUCTION
Statistics is quantitative information about any kind of data, such as birth and
death rates, health management, production of medicine, and the profit and loss
of different industries. In other words, it is the collection, compilation,
presentation, analysis and interpretation of both qualitative and quantitative
information. For example, in a family planning study, the number of children
born can be collected from married women. The characteristics to be measured
depend on the objectives of the study.
For example, in a study of knowledge of child health care, child mortality and
fertility, the characteristics to be recorded include age of mother, education
of mother, number of children born, duration of marriage, period of
breastfeeding, number of dead children, and whether proper vaccination was
done in a given month.
Pharmaceutical chemistry 68
Chemistry 79
Intellectual property rights 67
Entrepreneurship 56
3. Total frequency
4. Percentage frequency
5. Frequency density
6. Class marks
7. Class width
8. Class boundaries
9. Class interval
10. Class limits
Continuous: A class interval that does not contain its upper boundary
is known as a continuous class interval.
A class interval of the form 10–20 in a continuous class contains values
from 10 to less than 20. For example:
Range
0–10 From 0 to less than 10
10–20 From 10 to less than 20
20–30 From 20 to less than 30
30–40 From 30 to less than 40
40–50 From 40 to less than 50
50–60 From 50 to less than 60
60–70 From 60 to less than 70
70–80 From 70 to less than 80
80–90 From 80 to less than 90
90–100 From 90 to less than 100
20–29 From 20 to 29
30–39 From 30 to 39
40–49 From 40 to 49
50–59 From 50 to 59
60–69 From 60 to 69
70–79 From 70 to 79
80–89 From 80 to 89
90–99 From 90 to 99
0–50 90
50–100 100
100–150 80
150–200 50
200–250 40
250–300 150
300–350 50
350–400 30
400–450 40
450–500 50
500–550 40
550–600 30
600–650 20
650–700 10
Class Limits
In constructing a grouped frequency distribution, the class intervals must be
defined by pairs of numbers such that the upper end of one class does not
coincide with the lower end of the immediately following class.
The two numbers used to specify the limits of a class interval, for the
purpose of tallying the original observations into the various classes, are
called class limits:
The smaller value of the pair is known as the lower class limit.
The larger value of the pair is known as the upper class limit.
Class Boundaries
In measurements of continuous variables, all data are recorded to the nearest
unit or integer value. The most extreme values which would ever be included in
a class interval are known as class boundaries. These are the real or actual
limits of a class interval.
The lower extreme point is known as the lower class boundary.
The higher extreme point is known as the upper class boundary.
Calculation
If x is the gap between the upper class limit of any class interval and the
lower class limit of the next class interval, then
Lower class boundary = Lower class limit – x/2
Upper class boundary = Upper class limit + x/2
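As a rough illustration, the conversion from class limits to class boundaries can be sketched in Python. The sketch assumes the usual rule that half of the gap x is subtracted from each lower class limit and added to each upper class limit, and that the gap is the same between all classes. The function name `class_boundaries` and the sample limits are illustrative only.

```python
def class_boundaries(class_limits):
    """Convert (lower limit, upper limit) pairs into class boundaries."""
    # x is the gap between one class's upper limit and the next class's lower limit
    # (assumed here to be the same for every pair of neighbouring classes).
    gap = class_limits[1][0] - class_limits[0][1]
    half_gap = gap / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in class_limits]


# Illustrative class limits of the discontinuous type.
limits = [(20, 24), (25, 29), (30, 34)]
boundaries = class_boundaries(limits)
print(boundaries)
```

With class boundaries, the upper boundary of one class coincides with the lower boundary of the next, which is what histograms and ogives require.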
Note
Class limits are used only for the construction of the grouped frequency
distribution, but in all statistical calculations and diagrams involving end
points of classes (e.g., median, mode, histogram and ogives) class boundaries
are used.
Class Mark
It is the mid value of a class interval, lying exactly at the middle of the
class interval.
It lies halfway between the class limits or between the class
boundaries.
Class mark = (Lower class limit + Upper class limit)/2
It is used as the representative value of the class interval in the
calculation of the mean, standard deviation, mean deviation, etc.
Class Width
It is the length or range of a class interval, i.e., the difference between
the upper and lower class boundaries.
Introduction to Biotechnology and Biostatistics
Percentage Frequency
The frequency of a class interval expressed as a percentage of the total
frequency.
Relative Frequency
The frequency of a class expressed as a ratio of the total frequency. It is not
expressed as a percentage. It is used to compare two or more frequency
distributions, or two or more items in the same frequency distribution.
Relative frequency = Class frequency/Total frequency
Frequency Density
The frequency density of a class interval is its frequency per unit width. It
shows the concentration of frequency in a class. It is used in drawing a
histogram when the classes are of unequal width.
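The three measures above can be computed together, as in the following minimal sketch. The class intervals and frequencies are made-up illustrative values (not taken from this chapter), and the function name `frequency_measures` is hypothetical.

```python
def frequency_measures(classes):
    """classes: list of (lower boundary, upper boundary, frequency)."""
    total = sum(freq for _, _, freq in classes)
    rows = []
    for lower, upper, freq in classes:
        width = upper - lower
        rows.append({
            "class": f"{lower}-{upper}",
            "relative": freq / total,          # ratio of the total frequency
            "percentage": 100 * freq / total,  # same ratio as a percentage
            "density": freq / width,           # frequency per unit width
        })
    return rows


# Illustrative classes of unequal width.
rows = frequency_measures([(0, 10, 5), (10, 20, 15), (20, 40, 20), (40, 80, 10)])
for row in rows:
    print(row)
```

Note that the class 20–40 has the largest frequency but not the largest frequency density, which is why density is preferred for histograms with unequal class widths.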
Uses
To analyze the number of observations less than or more than any
given value.
Theoretical Distribution
The normal distribution was first discovered in 1733 by the mathematician De
Moivre. He obtained this continuous distribution as a limiting case of the
binomial distribution. The normal distribution is also called the Gaussian
distribution, named after Carl Friedrich Gauss, who used the normal curve to
describe the theory of accidental errors of measurement involved in the
calculation of orbits of
Skewness = 0
Kurtosis = 0
The maximum ordinate (y) lies at the mean, i.e., at x = µ.
decreases.
Since the mean = median = µ, the ordinate at x = µ (or Z = 0)
at the mean when the total area under the normal curve is equal
to unity.
The following table gives the area under the normal
probability curve for some important values of Z.
It is the normal distribution which fits best with an observed distribution and
Solution:
Let the assumed mean be A = 134.5 and the class width i = 10.
Class interval  Mid value (X)  Frequency (f)  X – A  d = (X – A)/i  fd  fd²
100–109 104.5 6 104.5 – 134.5 = -30 -3 -18 54
110–119 114.5 11 114.5 – 134.5 = -20 -2 -22 44
120–129 124.5 10 124.5 – 134.5 = -10 -1 -10 10
130–139 134.5 17 134.5 – 134.5 = 0 0 0 0
140–149 144.5 16 144.5 – 134.5 = +10 +1 +16 16
150–159 154.5 13 154.5 – 134.5 = +20 +2 +26 52
160–169 164.5 7 164.5 – 134.5 = +30 +3 +21 63
Total N = 80 Σfd = 13 Σfd² = 239
Mean X̄ = A + Σfd/N × i
= 134.5 + 13/80 × 10
= 134.5 + 1.625
= 136.125
= 136.1
S.D. = i × √(Σfd²/N – (Σfd/N)²)
= 10 × √(239/80 – (13/80)²)
= 10 × 1.72
= 17.2
The deviation of each mid value (Xm) from the mean (X̄) is then transformed
into a Z score, which is entered in the following table.
Mid value (X)  Frequency (f)  Z  y  Y
Neglecting the algebraic sign of the Z score, the height y of the ordinate at
each Z score is then recorded from the unit normal curve table.
The expected frequency (Y) is obtained for each Z score by multiplying its y
value by i × n/SD (i = class interval, n = total frequency, SD = standard
deviation), for example
Y = y × i × n/SD
= 0.1804 × 10 × 80/17.2
= 0.1804 × 46.51
= 8.39
= 8.4
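The fitting procedure above can be sketched in Python as a check on the arithmetic. This is only an illustrative sketch: it assumes the step-deviation formulas for the mean and standard deviation, and it computes the ordinate y from the standard normal density instead of reading it from a printed table, so the last digits may differ slightly from table values. All function names are hypothetical.

```python
import math


def step_deviation_stats(midpoints, freqs, assumed_mean, width):
    """Mean and S.D. of grouped data by the step-deviation method."""
    n = sum(freqs)
    d = [(x - assumed_mean) / width for x in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = assumed_mean + sum_fd / n * width
    sd = width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd


def normal_ordinate(z):
    """Height y of the unit normal curve at a given Z score."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def expected_frequency(x, mean, sd, n, width):
    """Expected frequency Y = y * i * n / SD for a class with mid value x."""
    z = (x - mean) / sd
    return normal_ordinate(z) * width * n / sd


midpoints = [104.5, 114.5, 124.5, 134.5, 144.5, 154.5, 164.5]
freqs = [6, 11, 10, 17, 16, 13, 7]
mean, sd = step_deviation_stats(midpoints, freqs, assumed_mean=134.5, width=10)
y_expected = expected_frequency(114.5, mean, sd, n=sum(freqs), width=10)
print(round(mean, 3), round(sd, 2), round(y_expected, 1))
```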
14.4. VARIABLES
These characteristics differ from individual to individual, so they are known
as variables. There are two types of variable. Information about the whole
group is obtained by examining part of the group, individual by individual.
Qualitative variable
Quantitative variable
Continuous Variable:
14.14.2. Questionnaire
This is a proforma containing a sequence of questions for the statistical
enquiry. It is used for the collection of primary data from individual persons
through their responses.
14.14.4. Population
It is a group of people or study elements on which measurements are made and
which share some common fundamental characteristics.
Finite:
Infinite: The population consists of an endless succession of values, e.g.,
number of plants in the ocean.
(\) tally mark and running diagonally across the four tally marks.
Example: Form a frequency table for the following values of a variable.
1, 4, 7, 8, 5, 8, 4, 9, 23, 56, 23, 45, 67, 78, 25, 4, 8, 9, 4, 1, 23, 78, 56, 9, 45, 78,
25, 67, 25, 78, 45, 23.
Solution:
Variable Tally Frequency
1 II 2
4 IIII 4
5 I 1
7 I 1
8 III 3
9 III 3
23 IIII 4
25 III 3
45 III 3
56 II 2
67 II 2
78 IIII 4
Total 32
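A frequency table of this kind can be checked by counting the values programmatically. The following is a minimal sketch using Python's standard `collections.Counter`; it assumes that the entry printed as "8. 9" in the data is the two values 8 and 9.

```python
from collections import Counter

# Data from the example (reading "8. 9" as the two values 8 and 9).
values = [1, 4, 7, 8, 5, 8, 4, 9, 23, 56, 23, 45, 67, 78, 25, 4,
          8, 9, 4, 1, 23, 78, 56, 9, 45, 78, 25, 67, 25, 78, 45, 23]

frequency = Counter(values)

# One row per distinct value: value, tally marks, frequency.
for value in sorted(frequency):
    print(value, "I" * frequency[value], frequency[value])
print("Total", sum(frequency.values()))
```

The total of the frequency column must equal the number of observations, which is a quick way to detect tallying mistakes.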
14.16. BIOSTATISTICS
Biostatistics deals with statistics of marriage, birth and death rates,
migration, family planning, level of education, health care of pregnant women
and many more problems which affect the welfare of mankind.
It is easily understood.
The data are presented in a more attractive form.
It shows the tendency and trends of values of the variable.
It is useful to detect mistakes.
It easily shows the relationship between two data sets.
It has universal applicability.
It is helpful for the quick assimilation of data.
There are different types of graphs in the form of diagrams and charts such as:
line diagram
bar diagram
pie diagram
These three are used for qualitative presentation of data
stem-and-leaf plot
histogram
ogive (cumulative frequency polygon)
scatter diagram
frequency polygon
Statistics is an important subject for all students from graduation to PhD
level. Here we discuss statistics using some important software packages which
make the work easier and less time-consuming. Excel is basic statistical
software which should be known to every student of any discipline. After
Excel, some other advanced statistical tools, such as Origin Pro, are also
discussed in this chapter. Data in the form of raw scores are known as
ungrouped data; when they are organized into a frequency distribution they are
referred to as grouped data. Separate modes and methods are used to represent
these two types of data.
Many styles of line diagram are available; select the design you want and
click “OK.”
After you get the graph, go to the Design page and select any design you want,
so that your graph presents the data attractively.
Select “error bars with standard error” to show the standard error of each
data point on the graph.
Go to Data Table and select the option to show it below the chart, so that
your data are shown with the graph.
Go to Data Labels and select “Above” so that the values are shown above the
line symbols.
Figure 14.1: Screenshot showing the data entered in an Excel sheet and the
resulting line diagram.
Figure 14.2: Screenshot showing the data entered in an Excel sheet and the
resulting bar diagram.
Figure 14.3: Screenshot showing the data entered in an Excel sheet and the
resulting pie diagram.
Each leaf is a digit displayed to the right of its stem. Each leaf represents
a separate data value.
14.23 HISTOGRAM
It is a graphical representation of the distribution of numerical data. It is
an estimate of the probability distribution of a continuous variable and was
first introduced by Karl Pearson.
14.24 OGIVES
It is the graph showing the curve of a cumulative distribution function. The
points plotted are the upper class limit and the corresponding cumulative
frequency.
1. Enter the data in Excel cells, then select the data to create the graph.
When the graph appears, right-click on the data points.
2. Select the Trendline option in the Layout options.
3. In the Trendline options, select the last two options: "Display equation
on chart" and "Display R-squared value on chart."
4. Note down the displayed equation and calculate the x value from
the given equation (of the form y = mx + c).
5.
6. R² = 0.821 (Figure 14.4).
Go to the Insert tab, select Scatter, and choose the scatter style you want
for the graph. Right-click on the trendline; in the new window that opens,
select Linear, "Display equation on chart" and "Display R-squared value."
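The linear trendline and R-squared value that Excel displays can be reproduced with the ordinary least-squares formulas. The sketch below is an illustration under that assumption; the data points are made up, and the function name `linear_fit` is hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares line y = m*x + c and its R-squared value."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, c, r_squared = linear_fit(xs, ys)

# Step 4 of the procedure: solve y = m*x + c for x at a given y.
y_given = 5.0
x_value = (y_given - c) / m
print(m, c, r_squared, x_value)
```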
Figure 14.4: Screenshot showing the data entered in an Excel sheet and the
resulting scatter diagram.
BOX HEAD
SOURCE
FOOT NOTE
Features of good table
1. The title must be clear and concise, giving a precise idea of
the table contents.
2. The items in the table should be arranged logically.
3. Special notes should be given at the end of the table to resolve
any confusing entry.
4. All the necessary details should be contained in the table.
5. Columns and sub-columns should be distinguished, e.g., by single or
double ruling.
6. Figures to be compared should be kept as close together as possible
in the table.
7. The table should be well proportioned in breadth and length.
8. Units of measurement and abbreviations should be shown clearly at
the top of the column or below in the “Note” text line.
9. The pattern of a table is given in Table 14.1.
Table 14.1: Geo-Statistics of Individual Layers of Groundwater Quality
Parameters (*WHO/Indian Standard for Drinking Purpose) Banasthali, Tonk,
Rajasthan, India
Mean Value
Note: Values are the mean of three replicates; S.No. – Serial number; SD –
Standard deviation; Electrical conductivity – EC (dSm-1); Calcium – Ca;
Chloride – Cl; Carbonate – CO3; Bicarbonate – HCO3; Residual sodium
carbonate – RSC,
Parameter: A measure based on all the units in the population is known as a parameter.
Statistic: A measure based on the sample observations is known as a statistic.
Parameter example: population mean and population standard deviation.
Statistic example: sample mean and sample standard deviation.
Parameter: It characterizes the population.
Statistic: It characterizes the sample.
Parameter: It cannot be worked out directly.
Statistic: It can be worked out directly.
Parameter: Its value is constant.
Statistic: It is a variable calculated from the sample, i.e., its value varies from sample to sample.
x Mean
S Standard deviation
S2 Variance
r p
14.28. MEAN
The mean is obtained by adding all the collected data values and dividing the
sum by the number of values. It is
1 67
2 69
3 66
4 68
5 72
6 63
7 76
8 65
9 70
10 74
N = 10, Total = 690
Solution:
Mean = Total/N = 690/10 = 69
Choose AVERAGE.
Select the data.
Select the cell, click on its lower-right corner and drag downwards to get the
average values of all the data; this is a fast process.
x f fx
30 15 450
40 20 800
50 10 500
60 15 900
70 20 1400
80 15 1200
90 5 450
Σf = 100 Σfx = 5700
X̄ = Σfx/Σf = 5700/100 = 57
Frequency 3 5 10 15 5 12
Solution:
Class interval  Mid value (x)  Frequency (f)  fx
10–20 15 3 45
20–30 25 5 125
30–40 35 10 350
40–50 45 15 675
50–60 55 5 275
60–70 65 12 780
Σf = 50 Σfx = 2250
Here we apply
X̄ = Σfx/Σf = 2250/50 = 45
Examples: Calculate the arithmetic mean for the daily wages from the following
data.
Solution:
Class interval  Mid value (x)  Frequency (f)  fx
10–20 15 5 75
20–30 25 10 250
30–40 35 30 1050
40–50 45 20 900
50–60 55 15 825
60–70 65 10 650
Σf = 90 Σfx = 3750
Here we apply
X̄ = Σfx/Σf = 3750/90 = 41.67
Solution:
Class interval  Mid value (x)  Frequency (f)  fx
10–20 15 2 30
20–30 25 7 175
30–40 35 17 595
40–50 45 29 1305
50–60 55 29 1595
60–70 65 10 650
70–80 75 3 225
80–90 85 2 170
90–100 95 1 95
Σf = 100 Σfx = 4840
Here we apply
X̄ = Σfx/Σf = 4840/100 = 48.4
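The direct method used in the three solutions above (mid value times frequency, summed and divided by the total frequency) can be sketched as follows. The function name `grouped_mean` is illustrative; the class intervals and frequencies are those of the worked examples.

```python
def grouped_mean(classes):
    """Arithmetic mean of grouped data; classes = (lower, upper, frequency)."""
    total_f = sum(f for _, _, f in classes)
    # Each class is represented by its mid value (class mark).
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


first = [(10, 20, 3), (20, 30, 5), (30, 40, 10), (40, 50, 15), (50, 60, 5), (60, 70, 12)]
wages = [(10, 20, 5), (20, 30, 10), (30, 40, 30), (40, 50, 20), (50, 60, 15), (60, 70, 10)]
third = [(10, 20, 2), (20, 30, 7), (30, 40, 17), (40, 50, 29), (50, 60, 29),
         (60, 70, 10), (70, 80, 3), (80, 90, 2), (90, 100, 1)]

mean_first = grouped_mean(first)
mean_wages = grouped_mean(wages)
mean_third = grouped_mean(third)
print(mean_first, round(mean_wages, 2), mean_third)
```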
taken as that value of x, which corresponds to the middle value of the frequency
distribution.
Such as in the case of ungrouped data
Example 1: Find the mean weight of the following students by short cut
method whose weights are in kg.
67 69 66 68 63 76 72 74 70 65
Solution:
Let the assumed mean be a = 68.
X d = X – a
67 67 – 68 = -1
69 69 – 68 = +1
66 66 – 68 = -2
68 68 – 68 = 0
63 63 – 68 = -5
76 76 – 68 = +8
72 72 – 68 = +4
74 74 – 68 = +6
70 70 – 68 = +2
65 65 – 68 = -3
Σd = +21 – 11 = 10
N = 10
X̄ = a + Σd/N = 68 + 10/10
= 68 + 1 = 69
Mean weight = 69 kg
Example 2: Find the mean height of the 8 students by the short cut method,
whose heights are in centimeters.
59 65 69 63 61 71 73 67
Solution:
Let the assumed mean be a = 65.
X d = X – a
59 59 – 65 = -6
65 65 – 65 = 0
69 69 – 65 = +4
63 63 – 65 = -2
61 61 – 65 = -4
71 71 – 65 = +6
73 73 – 65 = +8
67 67 – 65 = +2
Σd = +20 – 12 = 8
N = 8
X̄ = a + Σd/N
= 65 + 8/8 = 65 + 1
= 66 cm
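The short cut (assumed mean) method can be sketched in a few lines of Python. The function name `mean_by_assumed_mean` is illustrative; the data are the weights and heights from the two examples.

```python
def mean_by_assumed_mean(values, assumed_mean):
    """Short cut method: mean = a + (sum of deviations from a) / N."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


weights = [67, 69, 66, 68, 63, 76, 72, 74, 70, 65]
heights = [59, 65, 69, 63, 61, 71, 73, 67]

mean_weight = mean_by_assumed_mean(weights, assumed_mean=68)
mean_height = mean_by_assumed_mean(heights, assumed_mean=65)
print(mean_weight, mean_height)
```

Any assumed mean gives the same result; a value near the middle of the data simply keeps the deviations small.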
Example: Find out the arithmetic mean by short cut method for the following
data.
Solution:
Let assume 55 as a mean
fd
10–20 15 5 15–55 = -40 -200
20–40 30 15 30–55 = -25 -375
X̄ = a + Σfd/N
= 55 – 130/100
= 55 – 1.3
= 53.7
Solution:
Let us take 67 as the assumed mean.
Class interval  Mid value (X)  Frequency (f)  X – a  d = (X – a)/i  fd
60–62 61 15 -6 -2 -30
63–65 64 54 -3 -1 -54
66–68 67 126 0 0 0
69–71 70 81 +3 +1 +81
72–74 73 24 +6 +2 +48
N = 300 Σfd = 45
Σfd = 45
a = 67
N = 300
i = 3
X̄ = a + Σfd/N × i
= 67 + 45/300 × 3 = 67 + 0.45 = 67.45 inches.
Example: Calculate the average marks by the step deviation method.
Marks 0–10 10–20 20–30 30–40 40–50 50–60
Number of 40 25 50 35 30 20
students
Solution:
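Let us take 35 as the assumed mean.
Marks  Mid value (X)  Number of students (f)  d = (X – a)/i  fd
0–10 5 40 -3 -120
10–20 15 25 -2 -50
20–30 25 50 -1 -50
30–40 35 35 0 0
40–50 45 30 +1 +30
50–60 55 20 +2 +40
Σfd = -220 + 70 = -150
a = 35
N = 200
i = 10
X̄ = a + Σfd/N × i
= 35 – 150/200 × 10
= 35 – 7.5 = 27.5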
Merits, Demerits, and Uses of Mean
Merits
It has the simplest formula which is understandable easily and easy
to compute.
14.30. MEDIAN
It is the value which divides the given data into two equal parts, such that
half of the observations are below it and the other half are above it. It is
the middle-most point or central value of the variable in a set of
observations when the observations are arranged in either ascending or
descending order of magnitude.
“It is the value of that in a series which divides the series into two equal parts,
one part consisting of all values greater than it.” (Prof. Ghosh & Chowdhury).
No. of 24 26 16 20 6 30 122
persons
Solution:
Arrange the data in ascending order and then form the cumulative
frequencies.
Value  Frequency (f)  Cumulative frequency (cf)
80 16 16
100 24 40
150 26 66
180 30 96
200 20 116
250 6 122
According to the table, n = 122 (even),
so the median (M) = average of the (n/2)th and (n/2 + 1)th items
= average of the (122/2)th and (122/2 + 1)th items
= average of the 61st and 62nd items
Both the 61st and 62nd items lie in the cumulative frequency interval 41 to
66. Therefore the median is 150.
Example 2: Calculate the median for the following data:
Number of 6 16 7 4 2 8
students
Marks 20 25 50 9 80 40
Solution:
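Let us arrange the marks in ascending order and then form the cumulative
frequencies.
Marks  Frequency (f)  Cumulative frequency (cf)
9 4 4
20 6 10
25 16 26
40 8 34
50 7 41
80 2 43
Here n = 43 (odd), so the median is the value of the ((n + 1)/2)th = 22nd
item. The 22nd item lies in the cumulative frequency interval 11 to 26.
Therefore the median is 25.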
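The cumulative-frequency procedure for the median of a discrete series can be sketched as below. This is an illustrative sketch, assuming the usual rule of taking the ((n + 1)/2)th item for odd n and the average of the (n/2)th and (n/2 + 1)th items for even n; the function name is hypothetical, and the data are those of the two examples.

```python
def median_from_frequencies(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))  # ascending order of the variable
    n = sum(freqs)

    def item(position):
        # Return the value whose cumulative frequency first reaches `position`.
        running = 0
        for value, freq in pairs:
            running += freq
            if position <= running:
                return value
        raise ValueError("position outside the data")

    if n % 2 == 1:
        return item((n + 1) // 2)
    return (item(n // 2) + item(n // 2 + 1)) / 2


median_1 = median_from_frequencies([80, 100, 150, 180, 200, 250],
                                   [16, 24, 26, 30, 20, 6])
median_2 = median_from_frequencies([20, 25, 50, 9, 80, 40],
                                   [6, 16, 7, 4, 2, 8])
print(median_1, median_2)
```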
Median on Excel
Birth weight of girls = 3.2, 2.5, 2.8, 2.2, 3.0 (N=5)
After pressing Enter, select the cell containing the median, place the cursor
on the small square at its lower-right corner, and drag it.
14.31. MODE
It is the value in the data set which occurs most frequently. For example,
let us consider the ages of boys, where the ages are
8, 5, 10, 9, 5, 8, 10, and 8
Here 8 occurs the maximum number of times. Therefore the mode is 8, because 8
occurs most frequently in the given data set. In Excel, the data are entered
in the same way as described for the median, and the same steps are followed
to get the value of the mode (Figure 14.7).
According to Croxton and Cowden, “The mode of the distribution is the
value at the point around which the items tend to be most heavily concentrated.
It may be regarded as the most typical value.”
Ungrouped data (simple series): The mode can be determined by locating
the value which occurs the maximum number of times. It is the
value of the variable which corresponds to the largest frequency.
Example 1: Find out the Mode of the following data.
1,3, 1, 3, 3, 5, 3, 3, 1, 5, 3, 3, 4, 5, 4, 2, 3, 2, 3, 7, 6, 3, 2, 5, 2, 3, 3, 2, 6, 2, 3, 2, 4, 2, 3.
Solution:
Let prepare the table
Values Number of items (f)
1 3
2 8
3 14
4 3
5 4
6 2
7 1
Here the value 3 is repeated 14 times and is the most frequent, so the mode is 3.
Example 2: A Khadim shop in Bombay sold 100 pairs of shoes on a certain
day with the following distribution. Find the mode of the
distribution.
Size of the 4 5 6 7 8 9 10
shoe
Number of 10 15 20 35 16 3 1
pairs
Here is Mean
N
X = 40/12 = 3.33
The table indicates that the number “3” has the maximum frequency;
therefore “3” is the mode of the numbers.
(B) Grouped data (Discrete series): Mode is determined by inspection.
An error of judgment is possible in cases where the difference between the
maximum frequency and the frequency preceding or succeeding it is very small
and the items are heavily concentrated on either side. In such cases the mode
is found by preparing a grouping table and an analysis table.
Grouping table features: It has six columns.
Column I: The original frequencies are written and the maximum frequency is
marked.
Column II: The frequencies of column I are combined two by two and the
maximum frequency is marked in bold type.
Column III: Leaving the first frequency of column I, the others are combined
two by two and the maximum is again marked in bold type.
Column IV: The frequencies of column I are combined three by three and the
maximum frequency is marked in bold type.
Column V: Leaving the first frequency of column I, the others are combined
three by three and
Solution:
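Grouping table (I = frequency; II = grouping of two; III = grouping of two
leaving the first frequency; IV = grouping of three; V = grouping of three
leaving the first frequency; VI = grouping of three leaving the first two
frequencies):
Height   I    II   III  IV   V    VI
58       4    10   -    15   -    -
59       6    -    11   -    21   -
60       5    15   -    -    -    35
61       10   -    30   52   -    -
62       20   42   -    -    66   -
63       22   -    46   -    -    52
64       24   30   -    32   -    -
65       6    -    8    -    9    -
66       2    3    -    -    -    -
67       1    -    -    -    -    -
Analysis table:
Columns  Size of the items having maximum frequency
         58   59   60   61   62   63   64   65   66   67
I        -    -    -    -    -    -    I    -    -    -
II       -    -    -    -    I    I    -    -    -    -
III      -    -    -    -    -    I    I    -    -    -
IV       -    -    -    I    I    I    -    -    -    -
V        -    -    -    -    I    I    I    -    -    -
VI       -    -    -    -    -    I    I    I    -    -
Total    -    -    -    1    3    5    4    1    -    -
Since the size 63 occurs the maximum number of times, i.e., 5 times, the
mode is 63.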
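The grouping and analysis tables can be reproduced programmatically. The sketch below is one possible reading of the method: for each of the six columns it forms the groups, finds the group with the largest total, and credits every size inside that group; the size credited most often is taken as the mode. The function name `grouping_method` is hypothetical.

```python
from collections import Counter


def grouping_method(sizes, freqs):
    """Tally how often each size falls in a maximum-frequency group."""
    tally = Counter()
    # (group width, number of leading frequencies left out) for columns I to VI.
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for width, offset in columns:
        groups = [(sum(freqs[start:start + width]), start)
                  for start in range(offset, len(freqs) - width + 1, width)]
        best = max(total for total, _ in groups)
        for total, start in groups:
            if total == best:
                for k in range(start, start + width):
                    tally[sizes[k]] += 1
    return tally


heights = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67]
frequencies = [4, 6, 5, 10, 20, 22, 24, 6, 2, 1]
analysis = grouping_method(heights, frequencies)
mode = analysis.most_common(1)[0][0]
print(dict(analysis), mode)
```

Note that inspection alone would suggest 64 (the largest single frequency, 24), whereas the grouping method points to 63.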
(C) Continuous Series:
Mode = L1 + d1/(d1 + d2) × i
where,
L1 = Lower boundary of the modal class
d1 = Difference between the largest frequency and the frequency of the class preceding the modal class
d2 = Difference between the largest frequency and the frequency of the class following the modal class
i = Class width
fm = Maximum frequency
f1 = Frequency of the class just preceding the modal class
f2 = Frequency of the class just following the modal class
d1 = fm – f1
d2 = fm – f2
Mode = L1 + (fm – f1)/((fm – f1) + (fm – f2)) × i
= L1 + (fm – f1)/(2fm – f1 – f2) × i
= 15.5 + 16/(64 – 40) × 5
= 15.5 + 16/24 × 5
= 15.5 + 3.33
= 18.83.
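The interpolation formula for the mode of a continuous series can be sketched as a small function. The frequencies used below (fm = 32, f1 = 16, f2 = 24) are not stated in the text; they are assumed values chosen only because they are consistent with the worked figures 16/(64 – 40). The function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_boundary, fm, f1, f2, width):
    """Mode = L1 + d1 / (d1 + d2) * i for the modal class."""
    d1 = fm - f1  # excess of the modal frequency over the preceding class
    d2 = fm - f2  # excess of the modal frequency over the following class
    return lower_boundary + d1 / (d1 + d2) * width


# Assumed frequencies, consistent with 16/(64 - 40) in the worked calculation.
mode = grouped_mode(15.5, fm=32, f1=16, f2=24, width=5)
print(round(mode, 2))
```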
In Excel, enter the values in the same way as for the median and follow the
same instructions as given for the mean and median.
Same as median
Same as median
Demerits
In this case large number of observation is available and there is no
Partition Values
When we need to divide a series into more than two equal parts, the dividing
points are known as partition values.
Percentiles
Deciles
Quartiles
Percentiles: The values which divide the total number of observations into
one hundred equal parts. There are 99 percentiles, P1, P2, P3, …, P99, called
the first percentile, second percentile, third percentile, etc.
Deciles: The values which divide the total number of observations into ten
equal parts. There are nine deciles, viz., D1, D2, …, D9, called the first,
second and third deciles, etc.
Quartiles: The values which divide the total number of observations into
four equal parts. Therefore there are three quartiles.
First quartile (Lower quartile): Q1
Second quartile (Middle quartile): Q2
Third quartile (Upper quartile): Q3
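Partition values can be obtained with Python's standard `statistics.quantiles` function, as in the following minimal sketch. The data set is illustrative only, and the default interpolation rule of that function (the "exclusive" method) is an assumption that may differ slightly from hand methods.

```python
import statistics

# Illustrative data: eleven observations.
data = list(range(11, 22))

quartiles = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # D1 ... D9
percentiles = statistics.quantiles(data, n=100)  # P1 ... P99

print(quartiles)
print(len(deciles), len(percentiles))
```

The second quartile, the fifth decile and the fiftieth percentile all coincide with the median.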
Geometric Mean
geometric mean. The geometric mean cannot be used if any of the values is zero
or negative.
Geometric mean (GM) = (X1 × X2 × X3 × … × Xn)^(1/n)
n = number of observations
X1, X2, X3, … = Variable values.
Example 1: Find the G.M. of the three numbers 8, 36, 48.
Solution:
G.M. = (8 × 36 × 48)^(1/3)
= (2³ × 2² × 3² × 2⁴ × 3)^(1/3) = (2⁹ × 3³)^(1/3)
= 2 × 2 × 2 × 3
= 24
Merits and Demerits of Geometric Mean
Merits
It is less affected by extreme values than the arithmetic mean.
It is capable of algebraic treatment.
negative values.
Harmonic Mean
individual observations.
Thus, for observations X1, X2, X3, …, Xn:
H.M. = N/(1/X1 + 1/X2 + 1/X3 + … + 1/Xn)
= N/Σ(1/X)
Example: Find the average rate of motion in the case of a person who rides
km at 6 km an hour.
Solution:
Harmonic mean is the proper average.
N = 3, H.M. = 3/(1/10 + 1/8 + 1/6)
= 3/((12 + 15 + 20)/120)
= 3/(47/120)
= 360/47
= 7.66 km an hour
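The harmonic mean of the three speeds used in the solution (10, 8 and 6 km an hour) can be checked with the standard `statistics.harmonic_mean` function. The sketch assumes, as the harmonic mean requires, that the same distance is covered at each speed.

```python
import statistics

# Speeds taken from the solution above; equal distances are assumed.
speeds = [10, 8, 6]

hm_builtin = statistics.harmonic_mean(speeds)

# The same value from the definition N / sum(1/X).
hm_formula = len(speeds) / sum(1 / x for x in speeds)

print(round(hm_builtin, 2), round(hm_formula, 2))
```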
Uses
It is very limited.
It is used in problems involving time, rate and price.
It gives less weight to large items and more weight to small item.
Demerits
The values cannot be computed when there are both positive and
negative items.
It is not popular.
Examples: Find out the relation between A.M, G.M. and H.M.
Solution:
For any given set of positive observations, the A.M. is greater than or equal
to the G.M., and the G.M. is greater than or equal to the H.M.
If all the items are equal, for example 6 and 6:
H.M. = 2/(1/6 + 1/6)
= 2/(1/3)
= 6
So, X̄ = G.M. = H.M.
But if the sizes vary, the mean (A.M.) will be greater than the geometric
mean, and the geometric mean will be greater than the harmonic mean. This is
because the geometric mean gives larger weight to smaller items, and the
harmonic mean gives the largest weight to the smallest items.
X̄ > G.M. > H.M.
For example:
We take two positive items, 4 and 9.
Mean (A.M.) = (4 + 9)/2 = 6.5
G.M. = √(4 × 9) = 6
H.M. = 2/(1/4 + 1/9)
= 2/(13/36) = 2 × 36/13
= 5.54
So, A.M. > G.M. > H.M., i.e., 6.5 > 6 > 5.54.
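The relation between the three averages can be verified numerically with Python's standard `statistics` module. This is only a small check on the examples in this section (the items 4 and 9, two equal items of 6, and the numbers 8, 36, 48); the helper name `three_means` is illustrative.

```python
import statistics


def three_means(values):
    """Return (A.M., G.M., H.M.) for a list of positive values."""
    return (statistics.mean(values),
            statistics.geometric_mean(values),
            statistics.harmonic_mean(values))


am, gm, hm = three_means([4, 9])
print(am, round(gm, 2), round(hm, 2))

# When all items are equal, the three averages coincide.
am_eq, gm_eq, hm_eq = three_means([6, 6])

# Geometric mean of 8, 36 and 48.
gm_three = statistics.geometric_mean([8, 36, 48])
```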
Questions: Which type of average would be suitable?
i. Average sales for various years ?
ii. Sale of shirts with collar size in Cm 36, 37, 35, 36, 33, 36 ?
iii. Size of agriculture holdings ?
iv. Per capita income in several countries ?
v. Runs scored by a player in different matches ?
vi. Comparison of intelligence of students ?
vii. Marks of candidates obtained in an examination ?
viii. Size of the shoes sold at a shop ?
Answers: (i) Mean (ii) Mode (iii) Mode (iv) Mean (v) Mean (vi) Median (vii) Median (viii) Mode.
X  d = X – X̄  d²
11 11 – 16 = -5 25
12 12 – 16 = -4 16
13 13 – 16 = -3 9
14 14 – 16 = -2 4
15 15 – 16 = -1 1
16 16 – 16 = 0 0
17 17 – 16 = +1 1
18 18 – 16 = +2 4
19 19 – 16 = +3 9
20 20 – 16 = +4 16
21 21 – 16 = +5 25
N = 11 Σd² = 110
S.D. (σ) = √(Σd²/N) = √(110/11) = √10
= 3.16
S.D. (σ) = √(Σd²/n – (Σd/n)²)
Where d = X – A
A = Assumed mean
n = Total number of observations
Example 1: Find the standard deviation of the following items.
48 43 65 57 31 60 37 48 59 78
Solution:
Calculate the standard deviation, taking 57 as the assumed mean (A).
Value (X)  d = X – A  d²
48 48 – 57 = -9 81
43 43 – 57 = -14 196
65 65 – 57 = +8 64
57 57 – 57 = 0 0
31 31 – 57 = -26 676
60 60 – 57 = +3 9
37 37 – 57 = -20 400
48 48 – 57 = -9 81
59 59 – 57 = +2 4
78 78 – 57 = +21 441
Total Σd = -78 + 34 = -44 Σd² = 1952
n = 10
S.D. = √(Σd²/n – (Σd/n)²)
= √(1952/10 – (-44/10)²)
= √(195.2 – 19.36)
= √175.84
= 13.26
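The assumed mean formula for the standard deviation can be checked against the standard library, as in this minimal sketch. It assumes the population form of the formula (division by n), which is what `statistics.pstdev` computes; the function name `sd_assumed_mean` is illustrative.

```python
import math
import statistics


def sd_assumed_mean(values, assumed_mean):
    """S.D. = sqrt(sum(d^2)/n - (sum(d)/n)^2), with d = X - A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return math.sqrt(sum(di * di for di in d) / n - (sum(d) / n) ** 2)


items = [48, 43, 65, 57, 31, 60, 37, 48, 59, 78]
sd_short_cut = sd_assumed_mean(items, assumed_mean=57)
sd_library = statistics.pstdev(items)
print(round(sd_short_cut, 2), round(sd_library, 2))
```

The result does not depend on the assumed mean chosen; the correction term (Σd/n)² removes its effect.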
Example 1: Find out the mean and standard deviation of the following
data.
Size of item 10 11 12 13 14 15 16
Frequency 2 7 11 15 10 4 1
Solution: Prepare the following table:
Size of item (X)  Frequency (f)  d = X – A  fd  fd²
10 2 10 – 13 = -3 -6 18
11 7 11 – 13 = -2 -14 28
12 11 12 – 13 = -1 -11 11
13 15 13 – 13 = 0 0 0
14 10 14 – 13 = +1 10 10
15 4 15 – 13 = +2 8 16
16 1 16 – 13 = +3 3 9
N = 50 Σfd = -10 Σfd² = 92
Mean X̄ = A + Σfd/N
= 13 – 10/50
= 13 – 0.2
= 12.8
X̄ = 12.8
S.D. = √(Σfd²/N – (Σfd/N)²) = √(92/50 – (-10/50)²) = √(1.84 – 0.04)
= 1.342
Standard deviation in continuous series
(a) Direct Method
X = Mid value
X̄ = A.M.
f = Frequency
d = (X – A)/i
A = Assumed mean
N = Total frequency
i = Class width
Example 1: Calculate the mean, median, S.D., variance and coefficient of
variation of the following items.
Heights in inches 95–105 105–115 115–125 125–135 135–145
Number of 19 23 36 70 52
children
Solution:
Let the assumed mean be 130.
Height (inches)  Mid value (X)  Number of children (f)  Cf  d = (X – 130)/10  fd  fd²
95–105 100 19 19 -3 -57 19 × 9 = 171
105–115 110 23 42 -2 -46 23 × 4 = 92
115–125 120 36 78 -1 -36 36 × 1 = 36
125–135 130 70 148 0 0 70 × 0 = 0
135–145 140 52 200 +1 +52 52 × 1 = 52
N = 200 Σfd = -139 + 52 = -87 Σfd² = 351
Mean = A + Σfd/N × i
= 130 + (-87/200) × 10
= 130 – 4.35
= 125.65
S.D. = i × √(Σfd²/N – (Σfd/N)²)
= 10 × √(351/200 – (-87/200)²)
= 10 × √(1.755 – 0.1892)
= 10 × 1.2513
= 12.513
Median = L1 + (N/2 – C)/f × i
Median class 125–135: Median = 125 + (200/2 – 78)/70 × 10
= 125 + (100 – 78)/70 × 10
= 125 + 22/7
= 125 + 3.14
= 128.14
Variance = (S.D.)² = (12.513)²
= 156.58
Coefficient of variation = S.D./Mean × 100 = 12.513/125.65 × 100
= 9.96
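As a cross-check on hand calculations of this kind, the mean, variance, standard deviation and coefficient of variation of the grouped data can be computed directly from the mid values, without an assumed mean. This is a sketch under the assumption that each class is represented by its mid value and that the population formulas (division by N) are intended; the function name is hypothetical.

```python
import math


def grouped_summary(classes):
    """classes = (lower, upper, frequency); returns mean, variance, S.D., C.V."""
    n = sum(f for _, _, f in classes)
    mids = [(lower + upper) / 2 for lower, upper, _ in classes]
    freqs = [f for _, _, f in classes]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
    sd = math.sqrt(variance)
    cv = sd / mean * 100  # coefficient of variation, in percent
    return mean, variance, sd, cv


heights = [(95, 105, 19), (105, 115, 23), (115, 125, 36),
           (125, 135, 70), (135, 145, 52)]
mean, variance, sd, cv = grouped_summary(heights)
print(round(mean, 2), round(variance, 2), round(sd, 3), round(cv, 2))
```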
Standard Error
The sampling distribution of any statistic will have its own mean, standard
deviation, etc. The sample estimate of a statistic will differ from the
population parameter.
The difference between a particular sample estimate and the population value
is called the sampling error; the standard deviation of the sampling
distribution is called the standard error.
Standard error can be calculated by
S.E. = S.D./√n
S.D. = 23.61
n = 36
S.E. = 23.61/√36 = 23.61/6
= 3.935.
Mean (x̄) = Σfx/N
= 1809/100
= 18.09
S.D. = 3.07
Variance = (S.D.)² = 9.4249
Coefficient of variation = S.D./Mean × 100 = 3.07/18.09 × 100
= 16.97
Example 2: Find out the mean and S.D from the following frequency
distribution?
Scores 20–22 23–25 26–28 29–31 32–34 35–37 38–40
Frequency 2 5 7 13 8 4 1
Solution:
Mid f d2 fd2
Mean = Σfx/N = 1188/40
= 29.7
S.D. = √(Σfd²/N)
= 4.18
Example: Calculate the mean and median from the following frequency
distribution.
Scores 20–24 25–29 30–34 35–39 40–44 45–49
Frequency 7 9 12 6 4 2
Solution:
f cf
20–24 22 19.5–24.5 7 154 7
25–29 27 24.5–29.5 9 243 16
N/2 = 40/2
= 20
L1 = 29.5
f = 12, C = 16, i = 5
Median = L1 + (N/2 – C)/f × i
= 29.5 + (20 – 16)/12 × 5
= 29.5 + 4/12 × 5
= 29.5 + 1.667
= 31.167
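The interpolation formula for the median of a continuous series, Median = L1 + (N/2 – C)/f × i, can be sketched as follows. The function expects class boundaries (not class limits), and its name is illustrative. The first data set is the score distribution of this example; the second is the distribution of heights with classes 95–105 to 135–145.

```python
def grouped_median(classes):
    """classes = (lower boundary, upper boundary, frequency), in ascending order."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # C: cumulative frequency before the median class
    for lower, upper, freq in classes:
        if cumulative + freq >= half:
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("empty distribution")


scores = [(19.5, 24.5, 7), (24.5, 29.5, 9), (29.5, 34.5, 12),
          (34.5, 39.5, 6), (39.5, 44.5, 4), (44.5, 49.5, 2)]
heights = [(95, 105, 19), (105, 115, 23), (115, 125, 36),
           (125, 135, 70), (135, 145, 52)]

median_scores = grouped_median(scores)
median_heights = grouped_median(heights)
print(round(median_scores, 3), round(median_heights, 2))
```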
Example: Calculate the mean, median and S.D from the following
distribution?
Scores 10–19 20–29 30–39 40–49 50–59 60–69 70–79 80–89 90–99
Frequency 2 5 3 5 8 12 25 30 10
Solution:
Mid f d2 fd2
N = 100, Σfx = 7030, Σfd² = 38636
Mean = Σfx/N = 7030/100
= 70.30
Median = L1 + (N/2 – C)/f × i
= 69.5 + (50 – 35)/25 × 10
= 69.5 + 150/25
= 69.5 + 6
= 75.5
S.D. = √(Σfd²/N) = √(38636/100)
= 19.656
Merits and Demerits of the Standard Deviation
Merits
measure of dispersion.
It is used in correlation.
Uses
Same as mean
Same as mean
Same as mean
Same as mean
Same as mean
14.34. VARIANCE
Variance = Σ(X – X̄)²/N
Variance = (S.D.)²
Covariance:
It is the average product of the simultaneous deviations of the variables from
their respective means.
COV(X, Y) = Σ(X – X̄)(Y – Ȳ)/(n – 1) or
COV(X, Y) = Σ(X – X̄)(Y – Ȳ)/n
Variance on excel
Select a new column for the variance, click on a cell and type =VAR(, then
select the data range and press Enter (Figure 14.9).
14.36. REGRESSION
It is used to denote estimation or prediction of the average value of one
variable for a specified value of the other variable. One of the variables is
known as the independent or explanatory variable and the other is called the
dependent or explained variable.
(It is the measure of the average relationship between two or more variables
in terms of the original units of the data; M.M. Blair).
Regression Lines
When bivariate data are plotted as points on graph paper, the concentration of
points follows a certain pattern showing the relationship between the
variables. When the trend of the points is found to be linear, we determine
the best-fitting straight lines. The lines used to obtain the best estimates
of one variable for given values of the other are called regression lines.
bx +a.
of that line.
change per unit change in some other independent variable (X) is known as
Types of Regression
Simple regression:
Dependent variable is a function of a single independent variable.
Multiple regression
Dependent variable is a function of two or more independent variables.
Linear regression
Dependent variable is linearly correlated with the predictor (independent
variable). It forms the straight line.
Nonlinear regression
Dependent variable has a nonlinear correlation with the independent variable. It
forms sigmoid or hyperbolic curve.
Properties of Regression
The expression of the dependent variable is applied as a function of
independent variable.
Statistical Hypothesis
The statement or assertion about the statistical population or the value of its
parameters is called statistical hypothesis.
There are two types of hypothesis:
Simple
Composite
1. Simple hypothesis
The hypothesis which specifies the population completely is called simple
hypothesis.
2. Composite hypothesis
The hypothesis which does not specify the population completely is called
composite hypothesis.
Rejection Region
The set of values of the test statistic which lead to rejection of the null
hypothesis is called the rejection region (critical region) of the test. The
probability that the null hypothesis is rejected by the test when it is true
is referred to as the “size” of the critical region.
On the other hand, the set of values which lead to acceptance of the null
hypothesis is called the “Acceptance region.”
Statistics Test
After the null hypothesis and alternative hypothesis have been set up, the
test statistic is computed; it is based on a probability distribution. It is
used to test whether the null hypothesis should be accepted or rejected.
Degrees of Freedom
It is the number of values in a sample which are free to vary without
affecting the mean; it is an integer.
The number of data values given in the form of a series of variables in a row
or column, or the number of frequencies put in the cells of a contingency
table, which can be calculated independently is called the degrees of freedom
and is denoted by
χ² = Σ(O – E)²/E
Example: Mendel reported the results of the garden pea test each for
Solution:
Cross Progeny Hypothesis
Green × Yellow Pods (F2) 428: 152 3: 1
Violet red × White Flower (F1) 47: 40 1: 1
Round yellow × Wrinkled green (F1) 31: 26: 27: 26 1: 1: 1: 1
Solution (a)
Null Hypothesis = 3: 1
Alternative Hypothesis = 1: 1
Calculation:
Observed (O) Expected (E) (O-E) (O-E)2 (O-E)2/E
428 435 428 – 435 = -7 49 49/435 = 0.113
152 145 152 – 145 = 7 49 49/145 = 0.338
Total = 580 580 χ² = 0.451
The critical value of χ² at 0.05 and for 2 – 1 = 1 degree of freedom is 3.84.
Decision: the calculated value of chi-square (χ²) = 0.451 < critical value
of χ² for 1 df = 3.84, so the null hypothesis is accepted, i.e., the
variation in the data is not significant. So it is the result of an F2
monohybrid cross.
Solution (b)
Null Hypothesis = 1: 1
Alternative Hypothesis = 3: 1
Calculation =
Critical value: The critical value of chi-square at 0.05 and for 2 – 1 = 1
degree of freedom is 3.84.
Decision: the calculated value of chi-square (χ²) = 0.562 < critical value of
χ² for 1 df = 3.84, so the null hypothesis is accepted, i.e., the variation is
non-significant.
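Solution (c)
Null Hypothesis = 1: 1: 1: 1
Alternative Hypothesis = the ratio is not 1: 1: 1: 1
Calculation:
Observed (O)   Expected (E)   (O – E)   (O – E)²   (O – E)²/E
31             27.5           3.5       12.25      0.445
26             27.5           –1.5      2.25       0.082
27             27.5           –0.5      0.25       0.009
26             27.5           –1.5      2.25       0.082
Total = 110                                        χ² = 0.618
Critical value
The critical value of chi-square at 0.05 and for 4 – 1 = 3 df is 7.82.
Decision
The calculated chi-square value (χ² = 0.618) is less than the critical value of χ² for 3 df (7.82), so
the null hypothesis is accepted.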
Measures of Variability
They help to find out how individual observations are dispersed around
the mean of a large series.
The variability of a given set is zero if and only if all observations are equal;
it is positive when the observations are unequal.
Measurement and variability are both of fundamental importance in the
biological sciences.
Dispersion
Range
It is the simplest measure of dispersion. It is the difference between the values of
the largest item and the smallest item included in the distribution.
Range (R) = Largest value (L) – Smallest value (S)
The relative measure corresponding to range is the coefficient of range:
Coefficient of range = (L – S)/(L + S)
Example: Find out the range of daily wages of 8 persons in a family given
below.
Solution:
Since the classes are of the discontinuous type, we change the class limits to class
boundaries. Then the lower class boundary of the lowest class = 9.5 (S) and the upper class
boundary of the highest class = 99.5 (L).
The range (R) = 99.5 – 9.5 = Rs. 90.
Solution:
R = L – S
Here L = 25, S = 16
R = 25 – 16 = 9
Coefficient of range = (L – S)/(L + S)
= (25 – 16)/(25 + 16)
= 9/41 = 0.22
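The range and its coefficient can be sketched in a few lines. The wage list below is hypothetical (the original data are not reproduced here); it is chosen only so that the largest and smallest values match L = 25 and S = 16 from the solution.

```python
def value_range(values):
    """Range R = largest value - smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure: (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


# Hypothetical daily wages of 8 persons; only max = 25 and min = 16 come from the text.
wages = [16, 18, 19, 20, 21, 22, 24, 25]

print(value_range(wages))                     # 9
print(round(coefficient_of_range(wages), 3))  # 0.22
```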
Merits and Demerits of Range
Merits
It takes little time to calculate.
It is simple to calculate.
It is easy to understand.
Demerits
It does not depend on all observations.
It is based only on the largest and smallest values.
It is highly affected by extreme values.
It cannot be calculated from a frequency distribution with open-end
classes.
Uses
1. Estimating the Fluctuations in Prices: It is useful for studying the price
variation in stocks and shares.
2. Weather Forecasts: It is preferably used in determining the
Mean Deviation
The mean deviation is also called the average deviation. It is the average of the absolute
differences between the items in a distribution and the median or mean of that series.
It may be taken about the mean.
It may be taken about the median.
Coefficient of mean deviation: It is the ratio of the mean deviation to the arithmetic mean or median, multiplied by 100.
C.M.D = (MD × 100)/(Mean or Median)
Solution:
MD = Σf|D|/N
= 36/48
= 0.75
Example: Find out the mean deviation about the median of the following data:
13, 84, 68, 24, 96, 139, 84, 27.
Solution:
Here there is an even number of observations, viz. 8, so the median is the average
of the two middlemost observations.
Let us arrange the data:
13 24 27 68 84 84 96 139
Median = (68 + 84)/2
= 152/2
= 76
X      |X – Median| = D
13     76 – 13 = 63
24     76 – 24 = 52
27     76 – 27 = 49
68     76 – 68 = 8
84     84 – 76 = 8
84     84 – 76 = 8
96     96 – 76 = 20
139    139 – 76 = 63
N = 8  ΣD = 271
MD = ΣD/N = 271/8 = 33.875
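The steps of this example (sort, find the median, average the absolute deviations) can be sketched as follows. The function name is illustrative; `statistics.median` from the Python standard library handles the even-count case by averaging the two middle values.

```python
from statistics import median


def mean_deviation_about_median(values):
    """Average of the absolute deviations of the values from their median."""
    centre = median(values)
    return sum(abs(x - centre) for x in values) / len(values)


data = [13, 84, 68, 24, 96, 139, 84, 27]

print(median(data))                       # 76.0
print(mean_deviation_about_median(data))  # 33.875
```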
For a frequency distribution, the value of the mean deviation is
MD = Σf|D|/N
where D = X – X̄ and Σf = N.
In a grouped distribution the deviation is taken from the mid value of each class. Multiply the
deviations by the frequencies and obtain the total. Then divide the total
by the total number of observations.
X̄ = 1350/50
= 27
MD = Σf|D|/N
= 472/50
= 9.44
Example: From the following frequency distribution calculate the value of the lower
quartile (Q1), median (Q2) and upper quartile (Q3).
Marks in Mathematics   10–19   20–29   30–39   40–49   50–59   60–69   Total
Frequency              8       11      15      17      12      7       70
Solution:
Q1 corresponds to N/4, Q2 to N/2 and Q3 to 3N/4.
Class    Frequency   Lower boundary   Cumulative frequency below   Quartile class
10–19    8           9.5              0
20–29    11          19.5             8                            Q1 (N/4 = 17.5)
30–39    15          29.5             19
40–49    17          39.5             34                           Q2 (2N/4 = 35)
50–59    12          49.5             51                           Q3 (3N/4 = 52.5)
60–69    7           59.5             63
                     69.5             70 = N
Q1 = L1 + ((N/4 – F1)/f1) × i
L1 = Lower boundary of the quartile class
F1 = Cumulative frequency below the quartile class
f1 = Frequency of the quartile class
i = Width of the class interval
Q1 = 19.5 + ((17.5 – 8)/11) × 10
= 19.5 + 95/11
= 19.5 + 8.6
= 28.1
Q2 = 39.5 + ((35 – 34)/17) × 10
= 39.5 + 10/17
= 39.5 + 0.59
= 40.09
= 40
Q3 = 49.5 + ((52.5 – 51)/12) × 10
= 49.5 + (1.5 × 10)/12
= 49.5 + 1.25
= 50.75
= 51
Q1 = 28
Q2 = 40
Q3 = 51
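The interpolation formula used for the quartiles can be written as one small function. This is a sketch under the assumption that all classes have the same width; the function and parameter names are chosen here for illustration.

```python
def grouped_quartile(lower_boundaries, frequencies, width, k):
    """k-th quartile (k = 1, 2, 3) of a grouped frequency distribution.

    Implements Q = L1 + ((k*N/4 - F1) / f1) * i, where F1 is the cumulative
    frequency below the quartile class and f1 its own frequency.
    """
    n = sum(frequencies)
    target = k * n / 4
    cumulative = 0  # cumulative frequency below the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("target position lies beyond the last class")


boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [8, 11, 15, 17, 12, 7]

for k in (1, 2, 3):
    print(k, round(grouped_quartile(boundaries, freqs, 10, k), 2))
```

Before rounding to whole marks, the three values come out near 28.14, 40.09 and 50.75.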
Solution:
10–15 4 10 0
15–20 12 15 4
Class    Frequency   Lower limit   Cumulative frequency below   Quartile class
40–45    10          40            0
45–50    22          45            10                           Q1 (N/4 = 25)
50–55    28          50            32                           Q2
55–60    20          55            60                           Q3 (3N/4 = 75)
60–65    12          60            80
65–70    8           65            92
                     70            100 = N
Q1 = L1 + ((N/4 – F)/f) × i
N/4 = 25, F = 10
L1 = 45
f = 22
i = 5
Q1 = 45 + ((25 – 10)/22) × 5
= 45 + (15/22) × 5
= 45 + 3.4
= 48.4
Q3 = 55 + ((75 – 60)/20) × 5 = 55 + 3.75
= 58.75
Quartile Deviation (Q) = (Q3 – Q1)/2
= (58.75 – 48.4)/2
= 10.35/2
= 5.175
Variance (σ²) = npq
Kurtosis and skewness of the binomial distribution depend on the
proportions p and q in the population.
Skewness (β1) = (q – p)²/npq
Kurtosis (β2) = 3 + (1 – 6pq)/npq
It is symmetrical if p = q = 0.5
It is positively skewed if p < 0.5
It is negatively skewed if p > 0.5
expressed by
$nC_k = n! / (k!\,(n-k)!)$
(2) Bernoulli expansion:
It gives the probability of a particular division of a total number of events
between two classes.
n = total number of events
p = probability of the first class
q = probability of the second class
X = number of cases in the p class
n – X = number of cases in the q class
The probability P(X) is expressed by the Bernoulli expansion
$P(X) = \dfrac{n!}{X!\,(n-X)!}\, p^{X} q^{\,n-X}$
(3) Binomial expansion:
It involves a total number of events or trials, each with a probability of
success or failure.
n = total number of events/trials
p = probability of occurrence of success
q = probability of occurrence of failure
$(p + q)^n = \sum_{x=0}^{n} \dfrac{n!}{x!\,(n-x)!}\, p^{x} q^{\,n-x}$
For two boys and two girls among four children (n = 4, p = q = 0.5):
P = 6 × (0.5)² × (0.5)²
= 6 × 0.5 × 0.5 × 0.5 × 0.5
= 6 × 0.0625
= 0.375
So, the probability of two boys and two girls in a family is 0.375.
Example 3: Two hundred families with three children each are sampled at random
from the population of Arambagh subdivision. How many families do we expect
to have (a) no girls, (b) one girl, (c) two girls? Assume the sex ratio to be 1: 1.
Solution:
Probability of a girl = probability of a boy = ½
Let g stand for girls and b for boys.
Now we expand the binomial (g + b) with n = 3:
(g + b)³ = g³ + 3g²b + 3gb² + b³
(a) No girls relates to the b³ term:
½ × ½ × ½ = 1/8; 1/8 × 200 = 25 families
(b) One girl relates to the 3gb² term:
3 × ½ × (½)² = 3 × ½ × ¼ = 3/8; 3/8 × 200 = 75 families
(c) Two girls relates to the 3g²b term:
3 × (½)² × ½ = 3 × ¼ × ½ = 3/8; 3/8 × 200 = 75 families
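The expected numbers of families can be reproduced from the binomial probabilities directly. A minimal sketch; `binomial_pmf` is a name chosen here, and `math.comb` (Python 3.8+) supplies the binomial coefficient.

```python
from math import comb


def binomial_pmf(n, x, p):
    """Probability of exactly x successes in n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


families = 200
for girls in range(3):
    expected = families * binomial_pmf(3, girls, 0.5)
    print(girls, "girl(s):", expected)  # 25.0, 75.0, 75.0

# Two boys and two girls among four children.
print(binomial_pmf(4, 2, 0.5))  # 0.375
```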
Example 4: A plant breeder has 45 different inbred strains of pea plants. How
many different hybrids can be obtained from a total of 45 strains?
Solution:
Each hybrid is formed from two strains, so
n = 45, r = 2
According to the formula:
nCr = n!/(r! (n – r)!)
= 45!/(2! (45 – 2)!)
= (45 × 44 × 43!)/(2 × 1 × 43!)
= (45 × 44)/2
= 45 × 22 = 990
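As a quick check of the combination formula, the factorial expression can be evaluated directly and compared with the standard-library `math.comb`. The helper name `n_choose_r` is illustrative only.

```python
from math import comb, factorial


def n_choose_r(n, r):
    """Number of ways of choosing r items from n: n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))


hybrids = n_choose_r(45, 2)
print(hybrids)                  # 990
print(hybrids == comb(45, 2))   # True
```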
Example 5: Consider families with two children in Serampore subdivision where
both parents are heterozygous for albinism. What proportion of these
families would be expected to have (a) neither child with albinism, (b) one
child with albinism, (c) both children with albinism?
Solution:
Let the symbol “a” stand for albinism and “A” for normal.
Expand the binomial:
(A + a)² = A² + 2Aa + a²
The parents are heterozygous, so the probability of a normal child is ¾ and of an
albino child ¼.
(a) Both children are normal, i.e., A² = (3/4)² = 9/16
(b) One of the two children has albinism, i.e., 2Aa = 2 × ¾ × ¼ =
6/16
(c) Both children have albinism, i.e., a² = (1/4)² = 1/16.
Example 5: Consider the parents of a Sinha Roy family in which both
of them are heterozygous for a severe genetic syndrome, that is autosomal
P = 15 p⁴q²
= 15 × (3/4)⁴ × (1/4)²
= 15 × 81/256 × 1/16
= 1215/4096
= 0.2966
= 0.297
Example 6: Four babies were born in Aligarh general hospital. (a) What
was the chance that two would be boys and two girls? (b) What was the chance
that all four would be girls?
Solution:
The probabilities of a boy and of a girl are each ½ = 0.5
p = ½
q = ½
(a) n = 4, s = 2 and t = 2, where s is the number of girls and t the number of boys.
P = n!/(s! t!) × p^s × q^t
= 4!/(2! 2!) × (1/2)² × (1/2)²
= (4 × 3 × 2!)/(2 × 1 × 2!) × ¼ × ¼
= 6 × ¼ × ¼
= 3/8
The probability is 3/8.
(b) (p + q)⁴ = p⁴ + 4p³q + 6p²q² + 4pq³ + q⁴
P = q⁴ = (1/2)⁴ = 1/16
Example 7: There are eight children in a family where both parents are
heterozygous for albinism. What mathematical expression predicts the
probability that six are normal and two are albinos?
Solution:
As both parents are heterozygous, the probability of a normal child is ¾ and of an albino child ¼,
i.e., p = ¾
q = ¼
n = 8
s = 6 and t = 2
P = n!/(s! t!) × p^s × q^t
= 8!/(6! 2!) × (3/4)⁶ × (1/4)²
= (8 × 7 × 6!)/(6! × 2 × 1) × (3/4)⁶ × (1/4)²
= 28 × (3/4)⁶ × (1/4)²
= 28 × 3/4 × 3/4 × 3/4 × 3/4 × 3/4 × 3/4 × 1/4 × 1/4
= (7 × 729)/16384
= 5103/16384
= 0.31146
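The fraction arithmetic in these two-class problems can be carried out exactly with Python's `fractions.Fraction`, which avoids rounding until the final step. The function name `two_class_probability` is chosen here for illustration.

```python
from fractions import Fraction
from math import factorial


def two_class_probability(s, t, p, q):
    """P = n!/(s! t!) * p**s * q**t with n = s + t."""
    n = s + t
    coefficient = factorial(n) // (factorial(s) * factorial(t))
    return coefficient * p**s * q**t


normal, albino = Fraction(3, 4), Fraction(1, 4)
prob = two_class_probability(6, 2, normal, albino)

print(prob)                   # 5103/16384
print(round(float(prob), 5))  # 0.31146
```

Calling `two_class_probability(2, 2, Fraction(1, 2), Fraction(1, 2))` gives 3/8, the two-boys-two-girls case for four babies.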
Example 17: A multiple allelic system is known to consist of seven alleles.
Assuming that this is a diploid species, how many different genotypes could
exist in the population?
Solution:
Number of possible genotypes = number of different allelic combinations
(heterozygotes) + number of genotypes with two identical alleles (homozygotes)
= n!/(k! (n – k)!) + n
n = 7
k = 2 (heterozygotes)
= 7!/(2! (7 – 2)!) + 7 = (7 × 6 × 5!)/(2 × 5!) + 7 = 21 + 7
= 28 genotypes
Poisson Distribution
The Poisson distribution was derived by the French mathematician Siméon Denis
Poisson (1837).
It is the distribution of a discrete random variable describing rare events,
whose probability of occurrence is very small but for which the number of events/trials is
very large.
The probabilities of all possible values add up to 1:
$e^{-m} + e^{-m}\,m/1! + e^{-m}\,m^2/2! + e^{-m}\,m^3/3! + \dots + e^{-m}\,m^r/r! + \dots$
$= e^{-m}\,(1 + m/1! + m^2/2! + m^3/3! + \dots + m^r/r! + \dots)$
$= e^{-m} e^{m} = e^{0} = 1$
Mean = m = np
treatment in water.
The number of bacterial colonies in a given culture per unit area
on a microscopic slide, as seen under a microscope.
The emission of radioactive particles.
The number of mistakes committed by a good typist per page.
The number of buses passing through a certain road (M.G. Road,
Agra).
The number of cases of disease or deaths from cancer or heart attack in the
hospitals of a city (such as Agra) in one year.
Example 1: In a biotechnology book of 520 pages, 390 typographical errors occur.
= 0.4795
=2
Skewness (β1) = 1/m
= 1/4
= 0.25
Kurtosis (β2)
= 3 + 1/m
= 3 + ¼
= 3.25
µ3 = m
= 4
µ4 = m + 3m²
= 4 + 3 (4)²
= 4 + 3 × 16
= 4 + 48 = 52
Example 4: The following data are obtained from the vector cytogenetic
research laboratory of Dr. M.P.S. College, Agra.
X    Frequency (f)
0    20
1    26
2    16
3    4
4    2
What is the mean?
What is e^-m?
What is P(0)?
Solution:
Mean = ΣfX/Σf
= (0 + 26 + 32 + 12 + 8)/(20 + 26 + 16 + 4 + 2)
= 78/68
= 1.147
= 1.15
Mean (X̄) = m = 1.15
e^-1.15 = 0.32
P(0) = e^-m × m⁰/0!
= 0.32 × 1/1 = 0.32
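The mean, e^-m and P(0) for this table can be computed as below. This is a sketch: the dictionary `counts` simply restates the table, and `poisson_pmf` is a name chosen here for the formula P(x) = e^-m m^x / x!.

```python
from math import exp, factorial

# Value of X -> observed frequency, as in the table.
counts = {0: 20, 1: 26, 2: 16, 3: 4, 4: 2}

total = sum(counts.values())                          # 68
mean = sum(x * f for x, f in counts.items()) / total  # 78/68


def poisson_pmf(x, m):
    """Poisson probability of observing x events when the mean is m."""
    return exp(-m) * m**x / factorial(x)


p0 = poisson_pmf(0, mean)
print(round(mean, 2))  # 1.15
print(round(p0, 2))    # 0.32
```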
Example 5: In a family of 8 children where both parents are heterozygous
for albinism, what mathematical expression predicts the probability that
six are normal and two are albino?
Solution:
Both parents are heterozygous, the probabilities of normal is 3/4 and albinos
1/4.
i.e.,
14.42. SKEWNESS
A distribution is said to be symmetrical when the mean, median and mode coincide.
It has three parts: a left tail, a middle part and a right tail, and the right and left tails are
of equal length. Skewness is used to denote the extent of asymmetry in the data. When the
frequency distribution is not symmetrical it is said to be skewed. The meaning
of skewness is “lack of symmetry.” A symmetrical distribution therefore has
zero skewness.
Characteristics
2. It may be positive or negative.
Positive skewness:
The curve of the distribution has a longer tail toward the right, i.e.,
the higher values of the variable.
Mean > Median > Mode
Negative skewness:
The curve of the distribution has a longer tail toward the left, i.e., the
lower values of the variable.
Mean < Median < Mode
2. Here the mean, median and mode fail to coincide. Both the median and the
mean are displaced from the mode toward the skewed tail.
Therefore
Q3 – Q2 > Q2 – Q1 (Positively skewed)
Q2 – Q1 > Q3 – Q2 (Negatively skewed).
Measures of Skewness
They indicate not only the extent of skewness in numerical terms
but also its direction, i.e., the manner in which the deviations are
distributed.
Measures of asymmetry are normally called measures of skewness.
The absolute measures are known as measures of skewness.
They tell us the extent of asymmetry and whether it is positive or negative.
Absolute skewness = Mean – Mode
Mean > Mode (Positive skewness)
Mode > Mean (Negative skewness)
$\beta_1 = \mu_3^2 / \mu_2^3$
where µ3 = 3rd moment and µ2 = 2nd moment.
There are three important measures of relative skewness.
1. Karl Pearson's coefficient of skewness
Sk = (Mean – Mode)/Standard deviation
Sk = 3 (Mean – Median)/Standard deviation
2. Bowley's coefficient of skewness
Sk = (Q3 – 2Q2 + Q1)/(Q3 – Q1) (Q1 = First quartile; Q2 = Second quartile; Q3 = Third quartile)
3. Kelly's coefficient of skewness
Sk = (P90 + P10 – 2 Median)/(P90 – P10) (P10 = 10th percentile; P90 = 90th percentile)
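The three relative measures listed above can be sketched as plain functions. All input values in the usage lines are hypothetical and serve only to show the sign convention: a positive result indicates a longer right tail.

```python
def pearson_skewness(mean, median, sd):
    """Sk = 3 (Mean - Median) / Standard deviation."""
    return 3 * (mean - median) / sd


def quartile_skewness(q1, q2, q3):
    """Sk = (Q3 - 2 Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


def percentile_skewness(p10, median, p90):
    """Sk = (P90 + P10 - 2 Median) / (P90 - P10)."""
    return (p90 + p10 - 2 * median) / (p90 - p10)


# Hypothetical summary values, not taken from the text.
print(pearson_skewness(mean=52, median=50, sd=12))   # 0.5  (positive skew)
print(quartile_skewness(q1=20, q2=30, q3=50))        # about 0.33
print(percentile_skewness(p10=10, median=30, p90=50))  # 0.0 (symmetrical)
```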
curve.
The curve which has a higher and sharper peak is called leptokurtic.
Importance of Skewness
It tells the direction and extent of asymmetry in a series.
It provides us an idea about the nature and degree of concentration of
items.
Difference between Dispersion and Skewness
1. Dispersion: It shows the spread of individual values about the mean, i.e., the central value.
   Skewness: It shows the departure from symmetry, i.e., the direction of variation.
2. Dispersion: It shows the degree of variability.
   Skewness: It shows whether the concentration is in the higher or the lower values.
3. Dispersion: It is a type of average of deviations, an average of the second order.
   Skewness: It is not an average, but it is measured by the use of mean, median and mode.
4. Dispersion: It judges the truthfulness of the central tendencies.
   Skewness: It judges the truthfulness of the central tendencies.
Finite set:
Null Set
A set that does not contain any elements at all is called a null set. It is also called the void
set or empty set.
There is only one such set.
It is denoted by ∅.
Example 1: The set of persons who can jump to a height of 5 miles is a null set
because no one can jump to such a height.
Unit Set
A set having only one element is called a unit set; it is also known as a singleton set.
Equal Set
Two sets A and B are called equal if they have the same elements. When A
Equivalent Set
Two sets A and B are equivalent if their numbers of elements, i.e., their cardinal
numbers, are equal, e.g., A = {2, 4, 6} and B = {a, b, c}.
Here n (A) = n (B) = 3.
Cardinal Number
The number of elements in a set A is called its cardinal number and is denoted
by n (A).
The cardinal number of the null set is zero.
the former fits the latter, so it involves all the statistical tests used to decide
whether the observations are significant or not significant with respect to the hypothesis.
In scientific research, we first make a hypothesis and do experiments. After
completion of the experiment we analyze the data to see whether the hypothesis is correct or
not, and the chi-square test is used for the analysis of our observations.
Useful Points
Hypothesis test
mean and true mean or population mean expressed in terms of the standard
error.
t = Difference between sample means/Standard error of the difference between means
t = (X̄1 – X̄2)/SE
where X̄1 and X̄2 = Means of the two samples
SE = Standard error
Types
Paired t-Test
Unpaired t-Test
Conditions for applying the test
Random samples are collected from a normal population.
The population variances are regarded as equal when testing the
equality of two population means.
Sample sizes are less than 30.
Some adjustments in the degrees of freedom are made in the case of
two samples.
Properties of the t distribution
The distribution curve varies with the degrees of freedom.
It is a symmetrical distribution with mean zero.
Compare the calculated value with the table value at the particular degrees
of freedom.
Example: Thirteen children (group A) were given a usual diet plus vitamin “A”
and “D” tablets, while a second comparable group of 12 children (group B) was given
the usual diet only. After 12 months, the gain in weight in pounds was noted as
given in the table. Can you say that vitamins A and D were responsible for the
difference?
A 5 3 4 3 2 6 3 2 3 6 7 5 3
B 1 3 2 4 2 1 3 4 3 2 2 3
Solution:
Null Hypothesis:
Vitamins A and D are not responsible for the difference in weight gain.
Alternative hypothesis:
Vitamins A and D are responsible for the difference in weight gain.
S. no.   Gr A (X)   D1 = X – X̄     D1²   Gr B (Y)   D2 = Y – Ȳ        D2²
1        5          5 – 4 = 1      1     1          1 – 2.5 = –1.5    2.25
2        3          3 – 4 = –1     1     3          3 – 2.5 = 0.5     0.25
3        4          4 – 4 = 0      0     2          2 – 2.5 = –0.5    0.25
4        3          3 – 4 = –1     1     4          4 – 2.5 = 1.5     2.25
5        2          2 – 4 = –2     4     2          2 – 2.5 = –0.5    0.25
6        6          6 – 4 = 2      4     1          1 – 2.5 = –1.5    2.25
7        3          3 – 4 = –1     1     3          3 – 2.5 = 0.5     0.25
8        2          2 – 4 = –2     4     4          4 – 2.5 = 1.5     2.25
9        3          3 – 4 = –1     1     3          3 – 2.5 = 0.5     0.25
10       6          6 – 4 = 2      4     2          2 – 2.5 = –0.5    0.25
11       7          7 – 4 = 3      9     2          2 – 2.5 = –0.5    0.25
12       5          5 – 4 = 1      1     3          3 – 2.5 = 0.5     0.25
13       3          3 – 4 = –1     1
Total    ΣX = 52                   ΣD1² = 32   ΣY = 30                ΣD2² = 11
Gr A: n1 = 13, X̄ = 52/13 = 4, ΣD1² = 32
Gr B: n2 = 12, Ȳ = 30/12 = 2.5, ΣD2² = 11
Combined SD = √((ΣD1² + ΣD2²)/(n1 + n2 – 2)) = √(43/23) = 1.37
SE = SD × √(1/n1 + 1/n2)
= 1.37 × 0.4
= 0.548 = 0.55
t = (X̄ – Ȳ)/SE = (4 – 2.5)/0.55
= 1.5/0.55 = 2.73
With n1 + n2 – 2 = 23 degrees of freedom, the table value of t at the 5% level is 2.07. The
calculated t is larger, so the difference in weight gain between the two groups is significant.
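The unpaired t computation can be sketched in plain Python, following the same steps: group means, sums of squared deviations, combined SD, standard error, then t. The function name `unpaired_t` is illustrative; without intermediate rounding the statistic comes out slightly above the hand-rounded figure.

```python
from math import sqrt


def unpaired_t(a, b):
    """Return (t, degrees of freedom) for two independent samples,
    assuming equal population variances (pooled standard deviation)."""
    n1, n2 = len(a), len(b)
    mean_a, mean_b = sum(a) / n1, sum(b) / n2
    ss_a = sum((x - mean_a) ** 2 for x in a)  # sum of squared deviations, group A
    ss_b = sum((y - mean_b) ** 2 for y in b)  # sum of squared deviations, group B
    pooled_sd = sqrt((ss_a + ss_b) / (n1 + n2 - 2))
    standard_error = pooled_sd * sqrt(1 / n1 + 1 / n2)
    return (mean_a - mean_b) / standard_error, n1 + n2 - 2


group_a = [5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3]
group_b = [1, 3, 2, 4, 2, 1, 3, 4, 3, 2, 2, 3]

t_value, df = unpaired_t(group_a, group_b)
print(round(t_value, 2), df)  # 2.74 23
```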
14.48. Z-TEST
The deviation from the mean in a normal distribution curve is called the relative or
standard normal deviate and is given the symbol “Z.” It is measured in terms of
SD and indicates by how much an observation is bigger or smaller than the mean
in units of SD. So “Z” is the ratio
Z = (Observation – Mean)/SD = (X – X̄)/SD
When it is applied to sampling variability, the difference between a sample
estimate and that of the population is expressed in terms of SE instead of SD. The
ratio between the observed difference and the SE is called “Z.”
Conditions for “Z”
The data must be quantitative.
The data should be assumed to follow a normal distribution.
The data must be collected randomly.
The sample size must be larger than 30.
For a sample mean (X̄):
Z = (X̄ – µ)/SE (X̄)
14.50. T TEST
One-Tailed t Test
A test of a statistical hypothesis in which the alternative hypothesis is one sided is
called a one-tailed test or one-sided test.
It may be a right-tailed or left-tailed test.
If we want to know whether one particular drug is better than the other,
it will be a one-tailed test.
Two-Tailed t Test
It is a test of a statistical hypothesis based on a rejection region represented by both
tails of the standard normal curve.
Example: Nourished or healthy children look different from
unnourished or unhealthy children.
Example: The mean I.Q. of a sample of 1600 children was 99. Is it likely
that this was a random sample from a population with mean I.Q. 100 and
standard deviation 15?
Solution:
Null hypothesis: The sample has been drawn from the population with mean
I.Q. 100.
Alternative hypothesis: The sample has not been drawn from that population.
Here n = 1600
X̄ = 99
µ = 100
SD = 15
SE (X̄) = SD/√n = 15/√1600 = 15/40 = 0.375
Z = (X̄ – µ)/SE (X̄) = (99 – 100)/0.375 = –2.67
Since |Z| = 2.67 is greater than 1.96, the critical value at the 5% level,
the null hypothesis is rejected; the sample has not been drawn from a population
with mean 100 and SD 15.
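The Z calculation for a sample mean can be sketched as follows. The cut-off 1.96 is the usual two-tailed critical value of the standard normal curve at the 5% level; the function name `z_for_sample_mean` is chosen here for illustration.

```python
from math import sqrt


def z_for_sample_mean(sample_mean, population_mean, sd, n):
    """Z = (sample mean - population mean) / (SD / sqrt(n))."""
    standard_error = sd / sqrt(n)
    return (sample_mean - population_mean) / standard_error


CRITICAL_Z = 1.96  # two-tailed critical value at the 5% level

z = z_for_sample_mean(sample_mean=99, population_mean=100, sd=15, n=1600)
print(round(z, 2))  # -2.67
print("reject H0" if abs(z) > CRITICAL_Z else "accept H0")  # reject H0
```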
Variance (σ²): It is an absolute measure of the dispersion of raw scores around
the sample (group) mean, i.e., the dispersion of the scores resulting from their
varying differences (error terms) from the mean.
The square of the standard deviation is called the variance and is denoted by
σ².
Mean Square: The measure of variability used in the analysis of variance is
called a “mean square.”
It is the sum of squared deviations from the mean divided by the degrees of freedom.
Mean square = Sum of squared deviations from the mean/Degrees of freedom
Assumptions in the analysis of variance:
The effects of the various components are additive.
The errors occur at random and are independent of each other in the groups.
The population is normally distributed with a common variance.
The samples are independently drawn.
Techniques for the analysis of variance
One-way ANOVA: A single independent variable is involved. Example:
the effect of a pesticide (independent variable) on the oxygen consumption
(dependent variable) in a sample of insects.
Two-way ANOVA: Two independent variables are involved.
Example: A number of groups of pesticides are involved in the oxygen
consumption of a sample of insects.
Procedure:
The short-cut method is more convenient.
It is based on the sums of the squares of the
individual values, which are usually used.
The calculations in the direct method are lengthy as
well as time consuming, so it is not popular in practice for all
experiments.
Parameter
It is the numerical index or summary value like mean, median and standard
deviation or variance of a variable for the entire population.
Non-parametric test
These tests or methods are mathematical procedures concerned with the treatment
of standard problems when the assumption of normality is replaced by a general
assumption concerning the distribution function. They are also called distribution-free
tests.
Parametric test
The most commonly used statistical methods are called parametric because
they involve testing the values of a parameter (mean, median or standard
deviation).
Characteristics
It can be computed by very simple method.
It does not require normal distribution of the variables.
It can be used for very small sample.
It works out without using any pre-computed statistic as an estimate of
parameter.
It can be done with very little assumption.
Merits and demerits of non-parametric tests
Merits
They can be applied to all types of data.
They do not need pre-computed statistics.
They have a greater range of applicability.
They do not require laborious and lengthy calculations.
They are generally simple to understand and very easy to compute and
apply.
Demerits
The procedure lacks power.
individuals.
It is applicable when two sets of ranking individuals are available.
Median test for independent sample
Average Formula:
Let a1, a2, a3, a4, a5, …, an be a set of numbers. Average = (a1 + a2 + a3 + a4 + a5 + … + an)/n
Adding fractions: a/b + c/d = (ad + bc)/bd
Subtracting fractions: a/b – c/d = (ad – bc)/bd
Multiplying fractions: a/b × c/d = ac/bd
Dividing fractions: (a/b)/(c/d) = a/b × d/c = ad/bc
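These identities can be spot-checked with exact rational arithmetic. A minimal sketch using `fractions.Fraction` from the Python standard library; the values of a, b, c and d are arbitrary.

```python
from fractions import Fraction

a, b, c, d = 2, 3, 5, 7  # arbitrary non-zero integers for the check

x, y = Fraction(a, b), Fraction(c, d)

added = x + y        # should equal (ad + bc)/bd
subtracted = x - y   # should equal (ad - bc)/bd
multiplied = x * y   # should equal ac/bd
divided = x / y      # should equal ad/bc

print(added, subtracted, multiplied, divided)  # 29/21 -1/21 10/21 14/15

# Average of a set of numbers: the sum divided by the count.
numbers = [4, 8, 15, 16, 23, 42]
average = sum(numbers) / len(numbers)
print(average)  # 18.0
```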
15.1. SIGMA PLOT
SigmaPlot is used to prepare graphs, to carry out analysis of variance and for most presentation
of data for analysis. This software can read multiple formats; we can directly
paste data from Excel into a SigmaPlot worksheet and make graphs easily.