
Ug Stat Pract Manual

The document provides guidelines for constructing a frequency table from a set of data. It outlines the key steps: 1) determining the number of classes using Sturge's rule, 2) calculating the class interval based on the range and number of classes, 3) choosing class limits based on whether the variable is continuous or discrete, and 4) forming classes and determining class frequencies either through the exclusive or inclusive method. The frequency table summarizes the data distribution into meaningful classes for analysis.

Dr. A. K. Parida, Ph.D.

Associate Professor

Department of Agricultural Statistics


College of Agriculture (OUAT), Bhubaneswar-3

PREFACE
Statistics is of great importance to teaching, research and
extension in agriculture and allied sciences. Knowledge of the
subject is immensely helpful to teachers, scientists, students and
research scholars in their areas of study and application. We
collect data from different sources, by different methods and for
different purposes. As these data are random in nature, they must
be processed by appropriate statistical methods before valid
conclusions can be drawn and correct decisions taken. No doubt the
voluminous data so generated can be handled with computers and
software. But the fundamental concepts, procedures, principles and
techniques of statistics still play a vital role in arriving at
valid and meaningful conclusions.
This practical manual has been prepared to acquaint students and
teachers with the basic concepts of statistical principles and
procedures of calculation, as per the syllabi of the 4th Deans'
Committee of ICAR for undergraduate courses in agriculture and
allied sciences. The manuscript draws on my long years of teaching
experience and was prepared at the persuasion of students and
teachers of the university. The contents have been compiled from
many textbooks, journals, manuals and the internet, and I
acknowledge the help of those sources. I welcome comments from
users of this manual on any addition, deletion or improvement for
the future. I hope the practical manual will be useful to students
and research workers.
I also thank the authorities for providing funds from the XIth
ICAR development grant for printing the manual.

Date: March 25, 2009 Amulya Kumar Parida


CONTENTS

I. Statistical methods
1.1 Construction of Frequency Table
1.2 Graphical representation of frequency distribution
1.3 Measures of central tendency or central value - Arithmetic Mean, Geometric Mean, Harmonic Mean, Median, Mode, Quartile, Decile and Percentiles
1.4 Measures of dispersion of a frequency distribution - Mean deviation, Standard Deviation, Variance, and Coefficient of Variation (C.V.)
1.4 Moments and Measure of skewness and kurtosis
1.5 Testing of Hypothesis or Test of Significance or decision rule
1.6 Standard normal deviate (SND) or Z tests or Large Sample Tests - for single mean and difference of two means
1.7 Small Sample Tests - test of 2 variances, test for single mean, two independent means and two dependent means
1.8 Chi-square test (χ²) - Goodness-of-fit and independence or association of attributes
1.9 Correlation and regression - Pearson’s correlation coefficient and its test, Spearman's Rank correlation coefficient; fitting of regression equations of two variables Y and X

II. DESIGN AND ANALYSIS OF EXPERIMENTS
2.1 Basic concepts on design of experiments - Analysis of variance: one-way and two-way classification
2.2 Analysis of data in completely randomized design (CRD): unequal replications, equal replications
2.3 Analysis of data in randomised complete block design (RCBD)
2.4 Analysis of data in Latin square design (LSD)
2.5 Missing plot technique in design of Experiments
2.6 Analysis of data in RCBD with one missing observation
2.7 Analysis of data in LSD with one missing observation

III. SAMPLING TECHNIQUES
3.1 Principal steps in a sample survey
3.2 Simple random sampling (SRS): Selection of sampling units from a Population
3.3 Parameter estimation in SRS: SRSWOR, SRSWR
3.4 Stratified sampling
3.5 Systematic sampling

APPENDIX: STATISTICAL TABLES (t, F, χ², r, Z, random number)
Table-1(a): Critical values for t-distribution
Table-1(b): Critical values for t-distribution (One & Two-tailed)
Table-2: Critical values for F-distribution
Table-3: χ² (Chi-Squared) Distribution: Critical Values of χ²
Table-4: Critical value for Correlation coefficients (Simple or Partial)
Table-5: Percentage points of the normal distribution, Z
Table-6: Random numbers


UG Practical Manual on Statistics

PRACTICAL MANUAL ON STATISTICS

Two major practical aspects of scientific investigations are the
collection of data and the interpretation of the collected data. The data
may be generated through a sample survey on a naturally existing
population or a designed experiment on a hypothetical population. The
collected data are condensed and useful information is extracted through
techniques of statistical inference. This manual deals with the statistical
methods and techniques used for objectively tabulating data, computing
step by step and drawing valid inferences, in a form useful to
undergraduate students.
General Objective: To impart knowledge to the students on basic
concepts and statistical techniques applied in agriculture and allied
sciences.
Specific objectives:
By the end of the practical exercises, the students will be able to:
1. Become acquainted with the practical applications of statistical
techniques in agriculture.
2. Apply statistical techniques independently and draw valid
conclusions from them.

I. STATISTICAL METHODS

1.1. Construction of frequency table


A frequency table meaningfully summarizes a set of observations in
tabular form so as to bring out the essential information contained in
it. A tabular arrangement of data by classes together with the
corresponding class frequencies is called a frequency distribution or
frequency table.
There are two types of frequency table.
i. Exclusive type
ii. Inclusive type
The frequency table of exclusive type (lower limit value included and
upper limit value excluded) is formed when the data are continuous, and
it is called a continuous distribution. The frequency table of inclusive
type (both lower and upper limit values included) is used when the data
are discrete or discontinuous, and it is called a discontinuous / discrete
distribution.

Department of Agricultural Statistics, OUAT Page-1



Procedure:
The following steps are used to construct a frequency table from a set
of data.
Step-1. Determination of number of classes
Usually the number of classes should be between 5 and 15; otherwise the
information contained in the data may be lost. Sturges' rule may be used
to determine the number of classes, K:
K = 1 + 3.322 log10 N, where N = No. of observations
Step-2. Determination of magnitude of class interval (CI)
From the given set of observations, locate the maximum (Max) and
minimum (Min) values.
Then, Range = Max − Min
and the CI or class width (d) will be: d = (Max − Min) / K
If 'd' has a decimal value, take a nearby integral value as the class
width (rounding up ensures that the K classes cover the whole range).

Step-3. Choice of class limits or class boundaries

First, check whether the variable is continuous or discrete.
Measurements such as height, weight and volume are continuous variables;
counts such as no. of trees and no. of students are discrete variables.
Use the exclusive method of frequency distribution if the variable is
continuous and the inclusive method if the variable is discrete.

Step-4. Formation of classes:

a. Exclusive method: From the first class, the subsequent classes are
made by adding d to both the lower and upper limits, e.g. if the first
class is L to L+d then the second class is L+d to L+2d and so on.
Example: 10 to 15, 15 to 20, 20 to 25 etc.
b. Inclusive method: From the first class, the subsequent classes are
made by adding (d+1) instead of d to both the lower and upper limits,
e.g. if the first class is L to L+d then the second class is [L+(d+1)] to
[L+(2d+1)] and so on.
Example: 10 to 15, 16 to 21, 22 to 27 etc.
Step-5. Determination of class frequency
The class frequency is the number of observations that fall in a class.
The class frequencies are determined with the help of tally marks (|).
Step-6. Construction of frequency distribution table

The frequency table has the following headings.

Classes Tally mark Frequency

(1) (2) (3)

The classes are formed starting with the minimum value of the set of
observations, each class having the class width (d). Tally marks are then
entered against the classes, observation by observation, from the first
observation to the end of the data. When a fifth tally mark is required
in a class, it is drawn as a slash (/) across, or a bar (¯) over, the
preceding group of four tally marks. Finally, the tally marks of each
class are counted and entered as the frequency of the class in the last
column.

Problem-1. Construct the frequency distribution table with the following
30 observations.

10 (Min), 15, 17, 20, 21, 16, 17, 18, 20, 31, 35 (Max), 13, 12, 15, 14,
12, 15, 17, 14, 13, 15, 14, 13, 14, 20, 19, 18, 28, 24, 25.

Solution:

(i). No. of classes, K = 1 + 3.322 log10 N, where N = 30
K = 1 + 3.322 × log10 30
= 1 + 3.322 × 1.4771
= 1 + 4.91 = 5.91 ≈ 6.
(ii). Class size, d = (Max − Min) / K
d = (35 − 10) / 6 = 25 / 6 = 4.17, taken as 5.
a. Exclusive method:

Table-1. Construction of frequency distribution table with CI=5

Class Tally marks Frequency


10-15 IIII IIII 10
15-20 IIII IIII I 11
20-25 IIII 5
25-30 II 2
30-35 I 1
35-40 I 1
Total 30


b. Inclusive method:

Table-2. Construction of frequency distribution table with CI=5

Class Tally mark Frequency


10-14 IIII IIII 10
15-19 IIII IIII I 11
20-24 IIII 5
25-29 II 2
30-34 I 1
35-39 I 1
Total 30
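
The steps of Problem-1 can be checked with a short Python sketch. The function names `sturges_classes` and `frequency_table` are illustrative and not part of the manual; the sketch assumes integer data, rounds K to the nearest integer and rounds the class width up, as was done in the solution above.

```python
import math

# The 30 observations of Problem-1.
data = [10, 15, 17, 20, 21, 16, 17, 18, 20, 31, 35, 13, 12, 15, 14,
        12, 15, 17, 14, 13, 15, 14, 13, 14, 20, 19, 18, 28, 24, 25]

def sturges_classes(n):
    """Number of classes K = 1 + 3.322 log10(N), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))

def frequency_table(values, width, k, inclusive=False):
    """Return [(lower, upper, frequency), ...] for k classes starting at min(values)."""
    start = min(values)
    table = []
    for j in range(k):
        lower = start + j * width
        upper = lower + width  # upper limit is excluded from the class
        freq = sum(1 for v in values if lower <= v < upper)
        # For integer data, the inclusive class lower..(upper - 1) holds the same values.
        table.append((lower, upper - 1 if inclusive else upper, freq))
    return table

k = sturges_classes(len(data))                   # number of classes
width = math.ceil((max(data) - min(data)) / k)   # class width, rounded up
exclusive_table = frequency_table(data, width, k)
inclusive_table = frequency_table(data, width, k, inclusive=True)

for lower, upper, freq in exclusive_table:
    print(f"{lower}-{upper}: {freq}")
```

With these data the sketch reproduces the class frequencies of Table-1 and the class limits of Table-2.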

1.2. Graphical representation of frequency distribution

Graphical representation of the observations gives a better
understanding of how the observations are distributed. The frequency
distribution can be represented in the form of a Histogram, Frequency
polygon, Frequency curve and Ogive.

Procedure:
a. Histogram: A histogram is a set of vertical bars in a 2-dimensional
graph whose areas are proportional to the frequencies of the classes.
It is drawn by taking the classes on the X-axis and drawing bars of
the corresponding class frequencies along the Y-axis.
b. Frequency polygon: It is made by joining the mid-points of the tops
of the bars of the histogram with straight lines.
c. Frequency curve: A frequency curve is a graphical representation of
frequencies corresponding to their variate values by a smooth
free-hand curve. It is made when the CI of each class is small
enough to allow a smooth curve, and can be drawn by joining the
points of the frequency polygon free-hand.
d. Ogive: It is a graph of the variate values against their
corresponding cumulative frequencies. Its shape is like an elongated
“S”. An ogive is prepared using the ‘less than’ type or the ‘more
than’ type of cumulative frequencies, or both.
These graphs are easily made from a frequency table of exclusive type.
If a frequency table is of inclusive type, it is first converted into
exclusive type and then the graphs are drawn.
Cumulative frequency is the running sum of the class frequencies,
accumulated downward (less than type) or upward (more than type)
through the classes of the frequency table.


Problem-2. Construct the Histogram, Frequency Polygon, Frequency curve
and Ogive of the following frequency distribution on the length of 60
sorghum ear heads (cm).

Class (Length)  : 18-20  21-23  24-26  27-29  30-32  33-35  36-38
No. of ear heads: 4      10     14     16     10     4      2

Solution:
As the given frequency table is of inclusive type, classes of exclusive
type are first formed for continuity of classes, and then both types of
cumulative frequencies are computed.

Table-3. Cumulative frequency table

Class   Exclusive class   Mid value   Frequency   Cumulative frequency
                                                  Less than   Greater than
18-20   17.5-20.5         19          4           4           60
21-23   20.5-23.5         22          10          14          56
24-26   23.5-26.5         25          14          28          46
27-29   26.5-29.5         28          16          44          32
30-32   29.5-32.5         31          10          54          16
33-35   32.5-35.5         34          4           58          6
36-38   35.5-38.5         37          2           60          2

Fig. 1. HISTOGRAM Fig. 2. FREQUENCY POLYGON

Fig. 3. FREQUENCY CURVE Fig. 4. OGIVE(1-less type, 2-more type)
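
The columns of Table-3 can be reproduced with the following Python sketch. The variable names are illustrative; the conversion to exclusive classes assumes that half of the gap between successive inclusive classes (here 0.5) is subtracted from each lower limit and added to each upper limit.

```python
from itertools import accumulate

# Inclusive classes and frequencies of Problem-2.
classes = [(18, 20), (21, 23), (24, 26), (27, 29), (30, 32), (33, 35), (36, 38)]
freqs = [4, 10, 14, 16, 10, 4, 2]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]

# Exclusive class boundaries and mid values.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in classes]
mid_values = [(lo + hi) / 2 for lo, hi in classes]

# 'Less than' type: running total from the first class downward.
less_than = list(accumulate(freqs))
# 'More than' type: running total from the last class upward.
more_than = list(accumulate(reversed(freqs)))[::-1]

for (lo, hi), m, f, lt, mt in zip(boundaries, mid_values, freqs, less_than, more_than):
    print(f"{lo}-{hi}  mid={m}  f={f}  less than={lt}  more than={mt}")
```

Plotting the less-than values against the upper boundaries and the more-than values against the lower boundaries gives the two ogives.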


Exercise: Construct a frequency distribution table, histogram, frequency
polygon, frequency curve and ogive for the following data and interpret
the results.
25, 32, 45, 8, 24, 42, 22, 12, 9, 15, 26, 35, 23, 41, 47, 18, 44, 37, 27,
46, 38, 24, 43, 46, 10, 21, 36, 45, 22, 18.

1.3. Measures of central tendency or central value

Central tendency or central value is the property of a distribution of
data by which a single central value is computed to represent all the
other values. It is commonly measured by the Arithmetic Mean (or Mean),
Geometric Mean, Harmonic Mean, Median and Mode.

Procedure:

Mean or Arithmetic Mean (A.M.)

The arithmetic mean is the sum of the observations divided by the total
number of observations.
i. For a series of data: If the series has ‘n’ values of a variable ‘X’,
i.e. x1, x2, ..., xn, the Arithmetic Mean (A.M.) is given by:

$$\text{A.M.} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\text{Sum of values}}{\text{No. of values}}, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

ii. For ungrouped frequency distribution:

Suppose the values x1, x2, ..., xn occur with frequencies f1, f2, ..., fn;
then the A.M. is given by:

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad N = \sum_{i=1}^{n} f_i$$

iii. For grouped frequency distribution:

If the data are grouped according to different class intervals, the mid
value of each class is taken as an approximation to the value of the
variable representing that class. If m1, m2, ..., mn represent the mid
values of ‘n’ classes of the variable ‘X’ and f1, f2, ..., fn represent
the corresponding frequencies, the Arithmetic Mean of X is

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i}$$

a). Short-cut method (or change of origin):

If di = (xi − A), where A = any arbitrary value (called origin), then

$$\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$$

b). Step-deviation method (or change of origin and scale):

If ui = (xi − A) / h, where A = any arbitrary value (called origin) and
h = magnitude of class interval (or scale), then

$$\bar{X} = A + \frac{h}{N}\sum_{i=1}^{n} f_i u_i$$

For grouped data, xi in methods (a) and (b) is the mid value mi of the
class.

Geometric Mean (G.M.)

Geometric mean is the ‘n-th’ root of the product of all ‘n’ values.
i. For a series of data: If the values of the variable are x1, x2, ..., xn,
then the Geometric mean of ‘x’ is:

$$G = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$

Alternatively,

$$\log_{10} G = \frac{1}{n}\sum_{i=1}^{n} \log_{10} x_i \quad \text{or} \quad G = \operatorname{Antilog}\left[\frac{1}{n}\sum_{i=1}^{n} \log_{10} x_i\right]$$

ii. For ungrouped frequency distribution:

If the values x1, x2, ..., xn occur with frequencies f1, f2, ..., fn
respectively, then

$$G = \left(x_1^{f_1}\, x_2^{f_2} \cdots x_n^{f_n}\right)^{\frac{1}{f_1 + f_2 + \cdots + f_n}} \quad \text{or} \quad G = \operatorname{Antilog}\left[\frac{1}{N}\sum_{i=1}^{n} f_i \log_{10} x_i\right]$$

where N = f1 + f2 + ... + fn.

iii. For grouped frequency distribution:

$$G = \left(m_1^{f_1}\, m_2^{f_2} \cdots m_n^{f_n}\right)^{\frac{1}{f_1 + f_2 + \cdots + f_n}} \quad \text{or} \quad G = \operatorname{Antilog}\left[\frac{1}{N}\sum_{i=1}^{n} f_i \log_{10} m_i\right]$$

where N = f1 + f2 + ... + fn and m1, m2, ..., mn are the mid-values of
the classes.



Harmonic mean (H.M.)
The Harmonic Mean is the reciprocal of the mean of the reciprocals of
the observations.

i. For a series of data: If x1, x2, ..., xn are values of a given
variable, then the Harmonic Mean is:

$$\text{H.M.} = \frac{1}{\dfrac{1}{n}\left(\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}\right)} = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}$$

ii. For ungrouped frequency distribution:

If x1, x2, ..., xn occur with the frequencies f1, f2, ..., fn
respectively, then

$$\text{H.M.} = \frac{\sum_{i} f_i}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \cdots + \dfrac{f_n}{x_n}} = \frac{\sum_{i} f_i}{\sum_{i} (f_i / x_i)}$$

iii. For grouped frequency distribution:

$$\text{H.M.} = \frac{\sum_{i} f_i}{\sum_{i} (f_i / m_i)}$$

where m1, m2, ..., mn are the mid-values of the classes.

Problem-3. The frequency distribution of weight (g) of 180 sorghum
ear-heads is given in the following table. Calculate the A.M., G.M. and
H.M.

Table-4. Frequency distribution of sorghum ear heads

Weight of ear head in g (X)   No. of ear heads (f)
40-60                         6
60-80                         28
80-100                        35
100-120                       50
120-140                       30
140-160                       10
160-180                       12
180-200                       9
Total                         180


Solution:

Table-5. Computation of mean (A.M.) by direct method, short-cut method
and step-deviation method (A = 110, h = 20)

Class (X)   Mid value (mi)   fi      fi.mi   A     di = mi − A   fi.di   ui = (mi − A)/h   fi.ui
40-60       50               6       300           -60           -360    -3                -18
60-80       70               28      1960          -40           -1120   -2                -56
80-100      90               35      3150          -20           -700    -1                -35
100-120     110              50      5500    110   0             0       0                 0
120-140     130              30      3900          20            600     1                 30
140-160     150              10      1500          40            400     2                 20
160-180     170              12      2040          60            720     3                 36
180-200     190              9       1710          80            720     4                 36
Total       -                N=180   Σfi.mi = 20060   -   -      Σfi.di = 260   -          Σfi.ui = 13

The mean weight of ear head is given by:

i. Direct method:

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i m_i}{N} = \frac{20060}{180} = 111.44\ \text{g}$$

ii. Short-cut method:

$$\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} = 110 + \frac{260}{180} = 110 + 1.44 = 111.44\ \text{g}$$

iii. Step-deviation method:

$$\bar{X} = A + \frac{h}{N}\sum_{i=1}^{n} f_i u_i = 110 + \frac{20}{180} \times 13 = 110 + 1.44 = 111.44\ \text{g}$$


Table-6. Computation of Geometric mean (G.M.)

Class (x)   Mid value (mi)   Frequency (fi)   log10 mi   fi × log10 mi
40-60       50               6                1.69       10.14
60-80       70               28               1.84       51.52
80-100      90               35               1.95       68.25
100-120     110              50               2.04       102.00
120-140     130              30               2.11       63.30
140-160     150              10               2.17       21.70
160-180     170              12               2.23       26.76
180-200     190              9                2.27       20.43
Total       -                180              -          364.1

$$\log_{10} G = \frac{\sum_{i=1}^{n} f_i \log_{10} m_i}{\sum_{i=1}^{n} f_i} = \frac{364.1}{180} = 2.02; \qquad G = \operatorname{Antilog}(2.02) = 104.71\ \text{g}$$

Table-7. Computation of Harmonic Mean (H.M.)

Class (x)   Mid value (mi)   Frequency (fi)   fi/mi
40-60       50               6                0.12
60-80       70               28               0.40
80-100      90               35               0.38
100-120     110              50               0.45
120-140     130              30               0.23
140-160     150              10               0.06
160-180     170              12               0.07
180-200     190              9                0.04
Total       -                N=180            Σ(fi/mi) = 1.75

$$\text{Harmonic mean (H.M.)} = \frac{\sum f_i}{\sum (f_i / m_i)} = \frac{180}{1.75} = 102.85\ \text{g}$$


Conclusion: From the above calculation, the Arithmetic Mean (A.M.),
Geometric Mean (G.M.) and Harmonic Mean (H.M.) of weight of sorghum
ear-heads are 111.44 g, 104.71 g and 102.85 g respectively, and the
relation obtained is A.M. > G.M. > H.M.
Note: In general, for positive values, A.M. ≥ G.M. ≥ H.M., with equality
only when all the values are equal.
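
As a cross-check, the three means of Problem-3 can be computed from the mid values with the Python sketch below (variable names are illustrative). It works with unrounded logarithms and reciprocals, so the G.M. and H.M. come out near 106.3 g and 101.2 g, a little different from the hand values, which are based on table entries cut to two decimals; the ordering A.M. > G.M. > H.M. is unchanged.

```python
import math

# Mid values and frequencies of Table-4 (Problem-3).
mids = [50, 70, 90, 110, 130, 150, 170, 190]
freqs = [6, 28, 35, 50, 30, 10, 12, 9]
N = sum(freqs)

# Arithmetic mean, direct method: sum(f * m) / N.
am = sum(f * m for m, f in zip(mids, freqs)) / N

# Arithmetic mean, step-deviation method with origin A and scale h.
A, h = 110, 20
am_step = A + h * sum(f * (m - A) / h for m, f in zip(mids, freqs)) / N

# Geometric mean: antilog of the weighted mean of log10(m).
gm = 10 ** (sum(f * math.log10(m) for m, f in zip(mids, freqs)) / N)

# Harmonic mean: N divided by the sum of f / m.
hm = N / sum(f / m for m, f in zip(mids, freqs))

print(f"A.M. = {am:.2f}, G.M. = {gm:.2f}, H.M. = {hm:.2f}")
```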

Median, Quartile, Decile and Percentiles

In a frequency distribution (arranged in increasing or decreasing
order), the median is the value that has half of the observations above
it and half below it. Similarly, Quartiles, Deciles and Percentiles are
the values of the variate which divide the total frequency into 4, 10
and 100 equal parts respectively.
Procedure:
Prepare a cumulative frequency table and then calculate i.N/4, i.N/10 or
i.N/100 to find the ith Quartile class, ith Decile class or ith
Percentile class respectively. For Quartiles, i = 1, 2, 3; for Deciles,
i = 1, 2, ..., 9; and for Percentiles, i = 1, 2, ..., 99.

Formula:

$$C_T = L_0 + \frac{h}{f_i}\left(\frac{i \cdot N}{x} - \text{c.f.}\right)$$

where, CT = the required ith Quartile, Decile or Percentile
L0 = Lower limit of the ith Quartile class in case of ith Quartile,
the ith Decile class in case of ith Decile, or
the ith Percentile class in case of ith Percentile
h = Width of the frequency distribution class
fi = Frequency of the ith Quartile or ith Decile or ith Percentile
class
N = Total frequency = (Σfi)
c.f. = Less than cumulative frequency of the class preceding the ith
Quartile or ith Decile or ith Percentile class
x = 4 or 10 or 100 for Quartiles, Deciles and Percentiles,
respectively.
How to find a quartile/decile/percentile class?
In a frequency table, to find the ith Quartile class / ith Decile
class / ith Percentile class, compute i.N/4 or i.N/10 or i.N/100
respectively. Then locate the first class in the table whose
corresponding c.f. is more than this value.
Problem-4. Find the Median (2nd Quartile), lower Quartile (1st Quartile),
7th Decile and 85th Percentile of the frequency distribution given below:

Marks in statistics: below 10  10-20  20-30  30-40  40-50  50-60  60-70  above 70
No. of students    : 8         12     20     32     30     28     12     4

Solution:
Table-8. Computation of Median, Quartile, Decile and Percentile

Marks in Statistics (X)   No. of Students (fi)   Less than cumulative frequency (c.f.)
<10                       8                      8
10-20                     12                     20
20-30                     20                     40
30-40                     32                     72
40-50                     30                     102
50-60                     28                     130
60-70                     12                     142
>70                       4                      146 = N

(i) Median = 2nd quartile, denoted by Q2, i.e. i = 2.
So, for i = 2, i.N/4 = 2 × 146/4 = 146/2 = 73.
Hence the median class is 40-50, the first class whose c.f. (= 102) is
more than 73.

$$\text{Median} = L_0 + \frac{h}{f_i}\left(\frac{N}{2} - \text{c.f.}\right) = 40 + \frac{10}{30}(73 - 72) = 40 + 0.33 = 40.33$$

(ii) First Quartile = Q1. Here, i = 1.
So, for i = 1, i.N/4 = 1 × 146/4 = 36.5.
Hence the Q1 class is 20-30, the first class whose c.f. (= 40) is more
than 36.5.

$$Q_1 = L_0 + \frac{h}{f_i}\left(\frac{N}{4} - \text{c.f.}\right) = 20 + \frac{10}{20}(36.5 - 20) = 20 + 8.25 = 28.25$$

(iii) Seventh Decile = D7. Here, i = 7.
So, for i = 7, i.N/10 = 7 × 146/10 = 102.2.
And the 7th Decile class is 50-60.

$$D_7 = L_0 + \frac{h}{f_i}\left(\frac{7N}{10} - \text{c.f.}\right) = 50 + \frac{10}{28}(102.2 - 102.0) = 50 + 0.07 = 50.07$$

(iv) 85th Percentile = P85. Here, i = 85.
So, for i = 85, i.N/100 = 85 × 146/100 = 124.1.
And the 85th Percentile class is 50-60.

$$P_{85} = L_0 + \frac{h}{f_i}\left(\frac{85N}{100} - \text{c.f.}\right) = 50 + \frac{10}{28}(124.1 - 102) = 50 + 7.89 = 57.89$$
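
The four results of Problem-4 follow from one formula, which the Python sketch below applies. The function name `grouped_quantile` is illustrative, and the open-ended classes "below 10" and "above 70" are assumed to be 0-10 and 70-80 so that every class has width 10; that assumption does not affect these four answers, since none of them falls in an open-ended class.

```python
def grouped_quantile(lowers, freqs, h, i, x):
    """i-th quantile of a grouped distribution: x = 4, 10 or 100.

    Applies L0 + (h / f) * (i * N / x - c.f.) to the first class whose
    less-than cumulative frequency exceeds i * N / x.
    """
    n = sum(freqs)
    target = i * n / x
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lowers, freqs):
        if cf + f > target:
            return lower + (h / f) * (target - cf)
        cf += f
    raise ValueError("i must be smaller than x")

# Lower class limits (first class assumed to start at 0) and frequencies.
lowers = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [8, 12, 20, 32, 30, 28, 12, 4]

median = grouped_quantile(lowers, freqs, 10, 2, 4)
q1 = grouped_quantile(lowers, freqs, 10, 1, 4)
d7 = grouped_quantile(lowers, freqs, 10, 7, 10)
p85 = grouped_quantile(lowers, freqs, 10, 85, 100)
print(round(median, 2), round(q1, 2), round(d7, 2), round(p85, 2))
```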

Mode of a frequency distribution


The Mode is the value of the variate which occurs most frequently in
the data set. In a frequency table, the modal class is the class which
has the greatest frequency.
Procedure:
i. For a series or ungrouped data: The observation which has the highest
frequency, i.e. the value which occurs the maximum number of times, is
the mode.
ii. For grouped data:

Formula:

$$\text{Mode}\ (M_O) = L_0 + \frac{f - f_p}{2f - f_p - f_s} \times h$$

where, L0 = lower limit of the modal class
f = frequency of the modal class
fp = frequency of the class preceding the modal class
fs = frequency of the class succeeding the modal class
h = width of the frequency distribution class
Note: The class which has the highest frequency is the modal class.

Problem-5. Compute the Modal value of the wages of workers in a farm
from the following frequency distribution.

Wages (Rs.)   No. of workers
30-35         12
35-40         18
40-45         22
45-50         27
50-55         17
55-60         23
60-65         29
65-70         8

Solution:

Modal class = class with the maximum frequency (= 29), i.e. 60-65

$$\text{Mode} = L_0 + \frac{f - f_p}{2f - f_p - f_s} \times h$$

L0 = lower limit of the modal class = 60
f = frequency of the modal class = 29
fp = frequency of the class preceding the modal class = 23
fs = frequency of the class succeeding the modal class = 8
h = class size = 5

$$\text{Mode} = 60 + \frac{(29 - 23) \times 5}{2 \times 29 - 23 - 8} = 60 + \frac{6}{27} \times 5 = 60 + 1.11 = 61.11$$
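
The same computation can be sketched in Python. The function name `grouped_mode` is illustrative; the sketch assumes equal class widths and treats a missing neighbouring class (when the modal class is the first or last one) as having frequency 0.

```python
def grouped_mode(lowers, freqs, h):
    """Mode of a grouped distribution: L0 + (f - fp) / (2f - fp - fs) * h."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    f = freqs[k]
    fp = freqs[k - 1] if k > 0 else 0               # frequency of the preceding class
    fs = freqs[k + 1] if k < len(freqs) - 1 else 0  # frequency of the succeeding class
    return lowers[k] + (f - fp) / (2 * f - fp - fs) * h

# Lower class limits and frequencies of Problem-5.
lowers = [30, 35, 40, 45, 50, 55, 60, 65]
freqs = [12, 18, 22, 27, 17, 23, 29, 8]

mode = grouped_mode(lowers, freqs, 5)
print(round(mode, 2))
```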
1.4. Measures of dispersion of a frequency distribution
The literal meaning of dispersion is scatteredness. We study dispersion
to get an idea of the homogeneity or heterogeneity of the distribution,
i.e. the scatter of the observations about a central value. There are
several measures of dispersion, and each provides specific information
about the scatter of values in a distribution. A measure of dispersion
reported along with the mean gives more information about the data than
the mean alone. The measures of dispersion are Range, Quartile
Deviation, Mean Deviation, Standard Deviation, Variance and Coefficient
of Variation.


Mean deviation from a particular value ‘A’ (Mean or Median or Mode) of
a frequency distribution

Procedure:
Mean deviation is defined as the arithmetic mean of the absolute
deviations of the variate values from a particular measure of location.
The mean deviation may be taken about the Mean, about the Median or
about the Mode.

In a frequency distribution,

$$\text{M.D.} = \frac{1}{N}\sum_{i=1}^{n} f_i \left| x_i - A \right|$$

where, x1, x2, ..., xn are the values of the classes or the mid-values of
the classes with frequencies f1, f2, ..., fn,
N = Total frequency = Σfi (i = 1 to n), and
A = either Mean or Median or Mode.

Problem-6. Compute the Mean Deviation from the Mean from the
following data.

Wages (Rs.) Number of labourers


60-70 5
50-60 10
40-50 20
30-40 8
20-30 3

Solution:
Table-9. Computation of Mean Deviation from Mean

Wages (Rs.)   Mid value (xi)   Number of labourers (fi)   fi.xi   |d| = |x − mean|   f|d|
60-70         65               5                          325     18.70              93.50
50-60         55               10                         550     8.70               87.00
40-50         45               20                         900     1.30               26.00
30-40         35               8                          280     11.30              90.40
20-30         25               3                          75      21.30              63.90
Total                          46                         2130    -                  360.80

$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{2130}{46} = 46.30$$

$$\text{Mean Deviation from mean} = \frac{\sum f\,|d|}{\sum f} = \frac{360.80}{46} = 7.843$$

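The computation of Table-9 can be checked with a small Python sketch. The function name `mean_deviation` is illustrative; when no value is supplied for `about`, the deviation is taken about the arithmetic mean. Because the mean is not rounded to 46.30 first, the result may differ from the hand value in the third decimal place.

```python
def mean_deviation(values, freqs, about=None):
    """Mean of the absolute deviations |x - A|, weighted by the frequencies."""
    n = sum(freqs)
    if about is None:
        # Default location A: the arithmetic mean of the distribution.
        about = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - about) for x, f in zip(values, freqs)) / n

# Mid values and frequencies of Problem-6.
mids = [65, 55, 45, 35, 25]
freqs = [5, 10, 20, 8, 3]

md = mean_deviation(mids, freqs)
print(round(md, 2))
```

Passing the median or the mode as `about` gives the mean deviation about that measure instead.
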

Standard Deviation, Variance and Coefficient of Variation (C.V.)


Procedure:
The arithmetic mean of the squares of the deviations of the variate
values from their arithmetic mean is defined as the Variance. The
positive square root of the Variance is called the Standard Deviation
(S.D.).

Coefficient of Variation (C.V.) is the magnitude of variation relative
to the magnitude of the arithmetic mean. It is defined as the ratio of
standard deviation to arithmetic mean, expressed as a percentage.

There are two methods for calculation of Standard deviation:
i). Direct method
ii). Short-cut method (by change of origin and scale)

i. Direct method:
Step 1 : Calculate the mid value (xi) for grouped data
Step 2 : Calculate fi.xi of each class and finally Σfi.xi
Step 3 : Calculate xi² and fi.xi² and finally Σfi.xi²
Step 4 : Calculate S.D. (σ) by using the formula

$$\text{S.D.} = \sigma = +\sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2}, \qquad \text{where } N = \sum f_i$$

and

$$\text{Variance} = \sigma^2 = \frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2$$

ii. Short-cut method or Step deviation method:
Step 1 : Calculate the mid value (xi) for grouped data
Step 2 : Calculate the deviation value (di), where
di = (xi − A) / c, A = any arbitrary value or mean, c = class size
Step 3 : Calculate fi.di and fi.di² and finally Σfi.di and Σfi.di²
Step 4 : Calculate S.D. by using the formula

$$\text{S.D.} = \sigma = c \times \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$$

So,

$$\text{Variance} = \sigma^2 = c^2 \left[\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2\right]$$

$$\text{Coefficient of Variation, C.V.} = \frac{\text{S.D.}}{\text{Mean}} \times 100 = \frac{\sigma}{\bar{X}} \times 100$$

Standard deviation is an absolute measure of dispersion, whereas C.V.
is a relative measure of dispersion expressed as a percentage, used for
comparing two or more data sets.
Problem-7. Compute the Standard Deviation, Variance and C.V. from the
following data.

Size of the holding (ha)   No. of farmers
2.5-3.5                    1000
3.5-4.5                    2300
4.5-5.5                    3600
5.5-6.5                    2400
6.5-7.5                    1700
7.5-8.5                    3000
8.5-9.5                    500

Solution:
Table-10. Calculation table for Standard Deviation

Size of holding (ha)   Mid value (xi)   fi      fi.xi   fi.xi²   di = (xi − A) for A = 6   fi.di   fi.di²
2.5-3.5                3                1000    3000    9000     -3                        -3000   9000
3.5-4.5                4                2300    9200    36800    -2                        -4600   9200
4.5-5.5                5                3600    18000   90000    -1                        -3600   3600
5.5-6.5                6                2400    14400   86400    0                         0       0
6.5-7.5                7                1700    11900   83300    1                         1700    1700
7.5-8.5                8                3000    24000   192000   2                         6000    12000
8.5-9.5                9                500     4500    40500    3                         1500    4500
Total                                   14500   85000   538000                             -2000   40000

a). Direct method:

$$\text{S.D.} = \sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2} = \sqrt{\frac{538000}{14500} - \left(\frac{85000}{14500}\right)^2} = \sqrt{37.103 - 34.364} = \sqrt{2.739} = 1.655$$

b). Step Deviation Method (c = 1):

i.
$$\text{S.D.} = c \times \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2} = 1 \times \sqrt{\frac{40000}{14500} - \left(\frac{-2000}{14500}\right)^2} = \sqrt{2.758 - 0.019} = \sqrt{2.739} = 1.655$$

ii. Variance = (S.D.)² = (1.655)² = 2.739

iii. Coefficient of Variation, C.V. = (S.D. / Mean) × 100

$$\text{Here, Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{85000}{14500} = 5.862$$

$$\therefore \text{C.V.} = \frac{\text{S.D.}}{\text{Mean}} \times 100 = \frac{1.655}{5.862} \times 100 = 28.23\%$$

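Both methods of Problem-7 can be verified with the Python sketch below (variable names are illustrative). It uses the population form of the variance given above, dividing by N, and keeps full precision throughout, so the C.V. may differ from the hand value in the second decimal place.

```python
import math

# Mid values and frequencies of Table-10 (Problem-7).
mids = [3, 4, 5, 6, 7, 8, 9]
freqs = [1000, 2300, 3600, 2400, 1700, 3000, 500]
N = sum(freqs)

# Direct method: variance = sum(f x^2)/N - (sum(f x)/N)^2.
sum_fx = sum(f * x for x, f in zip(mids, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(mids, freqs))
mean = sum_fx / N
variance = sum_fx2 / N - mean ** 2
sd = math.sqrt(variance)

# Step-deviation method with origin A and class size c.
A, c = 6, 1
d = [(x - A) / c for x in mids]
sum_fd = sum(f * di for di, f in zip(d, freqs))
sum_fd2 = sum(f * di * di for di, f in zip(d, freqs))
sd_step = c * math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

# Coefficient of variation as a percentage.
cv = sd / mean * 100

print(f"S.D. = {sd:.3f}, Variance = {variance:.3f}, C.V. = {cv:.2f}%")
```
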
Moments, skewness and kurtosis

First four moments about mean of a frequency distribution

Procedure:
Generally there are two types of moments.

1). Moments about mean (μr)

$$\mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{\sum f_i}$$

2). Moments about origin (μ'r)

$$\mu'_r = \frac{\sum f_i d_i^r}{\sum f_i}, \qquad \text{where } d_i = x_i - A \text{ and } A = \text{any arbitrary value}$$

By step deviation method,

$$\mu'_r = h^r\, \frac{\sum f_i d_i^r}{\sum f_i}, \qquad \text{where } d_i = \frac{x_i - A}{h}$$

Moments about mean are:

$$\mu_1 = 0$$
$$\mu_2 = \mu'_2 - (\mu'_1)^2$$
$$\mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3$$
$$\mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^2 - 3(\mu'_1)^4$$


Measure of Skewness and Kurtosis for a frequency distribution

Skewness is defined as lack of symmetry about the central value.
Measures of skewness indicate the direction and extent of skewness
(skewed to the left or right). There are two methods to find a measure
of skewness from a given frequency table.
First method - Karl Pearson coefficient of Skewness
Step-1. Find out Mean, Mode and S.D.
Step-2. Calculate the measure of Skewness by using the formula given by
Karl Pearson,

$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$

Second method - For wide class of frequency distribution
Step-1. Find the 2nd and 3rd moments about mean
Step-2. Calculate the measure of Skewness,

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \gamma_1 = \pm\sqrt{\beta_1} \ \text{(taking the sign of } \mu_3\text{)}$$

$$\text{where } \mu_2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}, \qquad \mu_3 = \frac{\sum f_i (x_i - \bar{x})^3}{\sum f_i}$$

If β1 = 0 or γ1 = 0, the distribution is symmetrical; otherwise it is
skewed to the left or right according as the sign of μ3 is -ve or +ve.

Kurtosis is a measure of the peakedness or flatness of the curve of a
distribution. Kurtosis is of three types - Platykurtic, Leptokurtic and
Mesokurtic. Kurtosis can be computed by the following steps.
Step-1. Find out the 2nd and 4th moments about the mean of the distribution
Step-2. Calculate Kurtosis as,

$$\beta_2 = \frac{\mu_4}{\mu_2^2} \quad \text{or} \quad \gamma_2 = \beta_2 - 3$$

where μ4 = 4th central moment (about mean) and
μ2 = 2nd central moment (about mean).
If β2 = 3 or γ2 = 0, the distribution is normal, i.e. mesokurtic;
if β2 > 3 or γ2 > 0, the distribution is more peaked, i.e. leptokurtic;
if β2 < 3 or γ2 < 0, the distribution is more flattened, i.e. platykurtic.

Problem-8. Calculate the four moments about mean and find out the
measures of Skewness & Kurtosis from the following table.

Class Interval: 10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency     : 3      7      4      14     8      6      3

Solution:

Table-11. Calculation of moments (A = 45, h = 10)

Class interval   Frequency (fi)   Mid value (xi)   di = (xi − A)/h   fi.di   fi.di²   fi.di³   fi.di⁴
10-20            3                15               -3                -9      27       -81      243
20-30            7                25               -2                -14     28       -56      112
30-40            4                35               -1                -4      4        -4       4
40-50            14               45               0                 0       0        0        0
50-60            8                55               1                 8       8        8        8
60-70            6                65               2                 12      24       48       96
70-80            3                75               3                 9       27       81       243
Total            45                                                  2       118      -4       706

From the table,

$$\mu'_1 = h\,\frac{\sum f_i d_i}{\sum f_i} = 10 \times \frac{2}{45} = 0.44$$

$$\mu'_2 = h^2\,\frac{\sum f_i d_i^2}{\sum f_i} = 100 \times \frac{118}{45} = 262.22$$

$$\mu'_3 = h^3\,\frac{\sum f_i d_i^3}{\sum f_i} = 1000 \times \frac{-4}{45} = -88.89$$

$$\mu'_4 = h^4\,\frac{\sum f_i d_i^4}{\sum f_i} = 10000 \times \frac{706}{45} = 156888.89$$

$$\therefore \mu_2 = \mu'_2 - (\mu'_1)^2 = 262.22 - (0.44)^2 = 262.22 - 0.19 = 262.03$$

$$\mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3 = -88.89 - 3(262.22)(0.44) + 2(0.44)^3 = -88.89 - 346.13 + 0.17 = -434.85$$

$$\mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^2 - 3(\mu'_1)^4$$
$$= 156888.89 - 4(-88.89)(0.44) + 6(262.22)(0.44)^2 - 3(0.44)^4$$
$$= 156888.89 + 156.45 + 304.59 - 0.11 = 157349.82$$

So,

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-434.85)^2}{(262.03)^3} = \frac{189094.52}{17990906.67} = 0.0105$$

$$\text{Skewness} = \gamma_1 = -\sqrt{\beta_1} = -\sqrt{0.0105} = -0.10 \quad \text{(negative because } \mu_3 \text{ is negative)}$$

$$\text{Kurtosis} = \beta_2 = \frac{\mu_4}{(\mu_2)^2} = \frac{157349.82}{(262.03)^2} = \frac{157349.82}{68659.72} = 2.29$$

By the moment method, the Skewness and Kurtosis of the given
distribution are -0.10 and 2.29 respectively.
So, it is concluded that the distribution of the data is not
symmetrical: it is slightly skewed to the left, as γ1 = -0.10 (β1 =
0.0105) and the sign of μ3 is -ve. The distribution is also not normal:
it is less peaked (platykurtic), as β2 = 2.29 is less than 3.

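The moment method of Problem-8 can be checked numerically. The following is a minimal Python sketch, not part of the original manual; the function name grouped_moments is an illustrative assumption. It computes the central moments directly from the class mid-values rather than through the step-deviation shortcut, so it may differ from hand working in the last decimal place.

```python
# Sketch: central moments, skewness and kurtosis of a grouped frequency table.
# Data: class mid-values and frequencies of Problem-8.

def grouped_moments(mids, freqs):
    """Return (mu2, mu3, mu4), the 2nd to 4th moments about the mean."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n

    def central(r):
        return sum(f * (x - mean) ** r for f, x in zip(freqs, mids)) / n

    return central(2), central(3), central(4)


mids = [15, 25, 35, 45, 55, 65, 75]
freqs = [3, 7, 4, 14, 8, 6, 3]

mu2, mu3, mu4 = grouped_moments(mids, freqs)
beta1 = mu3 ** 2 / mu2 ** 3
gamma1 = mu3 / mu2 ** 1.5      # square root of beta1, carrying the sign of mu3
beta2 = mu4 / mu2 ** 2

print("mu2, mu3, mu4:", round(mu2, 2), round(mu3, 2), round(mu4, 2))
print("gamma1, beta2:", round(gamma1, 2), round(beta2, 2))
```

A negative gamma1 indicates a left-skewed distribution, and beta2 below 3 indicates a platykurtic one.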
Exercise: The following are the 405 soybean plant heights collected from
a particular plot.

Plant height (cm.)    8-12  13-17  18-22  23-27  28-32  33-37  38-42  43-47  48-52  53-57
No. of plants (f_i)   6     17     25     86     125    77     55     9      4      1

Compute:
i). A.M., G.M., H.M., Median, Mode
ii). Mean Deviation from mean, S.D., Variance, C.V.
iii). Coefficient of Skewness and Kurtosis
iv). Interpret the above results for soybean

1.5. Testing of Hypothesis or Test of Significance or decision rule

An estimate based on sample values does not equal the true value in
the population, because of the inherent variation in the population.
Different samples drawn from the same population give different
estimates of the true value. It therefore has to be verified whether the
difference between the sample estimate and the population value is due
to sampling fluctuation or is a real difference. If the difference is due
to sampling fluctuation only, it can safely be said that the sample
belongs to the population under question; if the difference is real, we
have every reason to believe that the sample may not belong to the
population under question.
Steps involved in test of hypothesis:
1) The null and alternative hypotheses are formulated
2) The test statistic is constructed
3) The level of significance is fixed
4) The table (critical) value is found from the tables for the given
level of significance
5) The null hypothesis is rejected at the given level of significance
if the value of the test statistic is greater than the critical value.
Otherwise the null hypothesis is accepted.
6) In the case of rejection the variation in the estimates is called
'significant' variation. In the case of acceptance the variation in
the estimates is called 'not significant'.

1.6. Standard normal deviate (SND) or Z tests or Large Sample Tests

If the sample size n ≥ 30, it is considered a large sample, and if
n < 30, a small sample; accordingly there are large sample and small
sample tests.
SND Test or One Sample (Z-test) for single mean
Case-I: Population standard deviation (σ) is known
Assumptions:
1. Population is normally distributed
2. The sample is drawn at random
Conditions:
1. Population standard deviation σ is known
2. Size of the sample is large (n ≥ 30)
Procedure: Let x1, x2, ..., xn be a random sample of size n from a
normal population with mean μ and variance σ². Let $\bar{x}$ be the mean of
the sample of size n.
Null Hypothesis is H0: μ = μ0 (a specified value)
and the alternative is H1: μ ≠ μ0 (two-tail)
Under H0, the test statistic is
$Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$
i.e. the above statistic follows the Normal Distribution with mean '0' and
variance '1'.
If Zcal ≤ Ztab at 5% level of significance, H0 is accepted and hence we
conclude that there is no significant difference between the
population mean and the one specified in H0 as μ0.
Problem-9. A sample of 900 leaves has a mean of 3.4 cm and S.D. of
2.61 cm. Is the sample drawn from a large population of mean 3.25
cm?
Solution:
Here, Null Hypothesis is H0: μ = μ0 = 3.25
and the alternative is H1: μ ≠ μ0 (two-tail)
Given $\bar{x}$ = 3.4, μ0 = 3.25, σ = 2.61 and n = 900
Putting the values in the formula, we get
$Z = \dfrac{3.4 - 3.25}{2.61/\sqrt{900}} = \dfrac{0.15}{0.087} = 1.72$
The tabulated value of Z at 5% is 1.96.
So, Z calculated is less than tabulated. Hence, H0 is accepted i.e.
the sample drawn is from a large population of mean 3.25 cm.
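The single-mean Z-test of Problem-9 can be sketched in a few lines of Python. This is an illustrative sketch only; the function name z_single_mean is an assumption, and the critical value 1.96 is the two-tail 5% value quoted in the manual.

```python
import math


def z_single_mean(sample_mean, mu0, sd, n):
    """Z statistic for H0: mu = mu0 with standard deviation sd and sample size n."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


# Problem-9: 900 leaves, mean 3.4 cm, S.D. 2.61 cm, hypothesised mean 3.25 cm
z_cal = z_single_mean(3.4, 3.25, 2.61, 900)

Z_TAB_5PCT_TWO_TAIL = 1.96  # two-tail critical value at the 5% level
h0_rejected = abs(z_cal) > Z_TAB_5PCT_TWO_TAIL

print(round(z_cal, 2), "reject H0" if h0_rejected else "accept H0")
```

For a one-tail alternative, the same statistic is compared with the one-tail critical value instead.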
Exercise: A herd of 1500 steer was fed a special high-protein grain for a
month. A random sample of 29 was weighed and had gained an average
of 6.7 kgs. If the standard deviation of weight gain for the entire herd is
7.1kgs., test the hypothesis that the average weight gain per steer for the
month was more than 5 kgs. (Hints: H 0: μ = 5 H 1: μ > 5, Zcal=1.289)
Case-II: If σ is not known

Null hypothesis (H0): μ = μ0

Under H0, the test statistic is

$Z = \dfrac{|\bar{x} - \mu_0|}{s/\sqrt{n}} \sim N(0,1)$

Where, $s = \sqrt{\dfrac{1}{n}\sum x^2 - \left(\dfrac{\sum x}{n}\right)^2}$ and the x's are sample observations.
If Zcal ≤ Ztab at 5% level of significance, H0 is accepted and hence
we conclude that there is no significant difference between the
population mean and the one specified in H0; otherwise we do not accept
H0.
The table below gives some critical values of $Z_\alpha$:

Level of        Critical value of $Z_\alpha$
significance    Two-tail    One-tail
10%             1.645       1.28
5%              1.96        1.645
1%              2.58        2.33

SND test for two sample means or Z-test of significance for
difference of two means
Case-I: when σ is known
Procedure:

Let $\bar{x}_1$ be the mean of a random sample of size n1 from a population
with mean μ1 and variance σ1², and let $\bar{x}_2$ be the mean of a random
sample of size n2 from another population with mean μ2 and variance
σ2².
The hypothesis is, H0: μ1 = μ2 and H1: μ1 ≠ μ2 (two-tail)
i.e. the null hypothesis states that the means of the two populations
from which the samples are drawn are identical. Under the null
hypothesis the test statistic becomes
$Z = \dfrac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$

i.e. the above statistic follows the Normal Distribution with mean '0' and
variance '1'.
If σ1² = σ2² = σ² (say), i.e. both populations have the same standard
deviation (or variance), then the test statistic becomes
$Z = \dfrac{|\bar{x}_1 - \bar{x}_2|}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$

If Zcal ≤ Ztab at 5% level of significance, H0 is accepted, otherwise
rejected.
If H0 is accepted, there is no significant difference between the
two population means, i.e. the means are taken to be identical.

Problem-10. The average panicle length of 60 paddy plants in field
No.1 is 18.5 cm and that of 70 paddy plants in field No.2 is 20.3 cm,
with common S.D. of 1.15 cm. Test whether there is a significant
difference between the two paddy fields w.r.t. mean panicle length.
Solution:

Hypothesis, H0: There is no significant difference between the means
of the two paddy fields w.r.t. panicle length, i.e. μ1 = μ2
Under H0, the test statistic becomes

$Z = \dfrac{|\bar{x}_1 - \bar{x}_2|}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$
where,

$\bar{x}_1$ = 18.5, $\bar{x}_2$ = 20.3, n1 = 60, n2 = 70, σ = 1.15

Substituting the given values in the formula, we get
$Z = \dfrac{|18.5 - 20.3|}{1.15\sqrt{\dfrac{1}{60} + \dfrac{1}{70}}} = \dfrac{1.8}{0.2023} = 8.90$

Conclusion: At 5% level of significance 8.90 > 1.96 (table value), and
hence H0 is rejected, i.e. there is a significant difference between the
mean panicle lengths of the two paddy populations.

Example: The amount of a certain trace element in blood is known to
vary with a standard deviation of 14.1 ppm (parts per million) for male
blood donors and 9.5 ppm for female donors. Random samples of 75
male and 50 female donors yield concentration means of 28 and 33 ppm,
respectively. Test whether the population means of concentrations of the
element are the same for men and women, assuming unequal variances.
(Hints: H0: μ1 = μ2, H1: μ1 ≠ μ2, Zcal = -2.37)

Case-II: when S.D. of both populations not known

The above methods are followed only after estimating the S.D. of the two
populations from the sample observations as:
$S_1 = \sqrt{\dfrac{1}{n_1}\sum x_1^2 - \left(\dfrac{\sum x_1}{n_1}\right)^2}$, $S_2 = \sqrt{\dfrac{1}{n_2}\sum x_2^2 - \left(\dfrac{\sum x_2}{n_2}\right)^2}$

Where x1 and x2 are the independent sample observations with sizes n1
and n2 from the two normal populations respectively.
The pooled variance (S2) or S.D.(S) is computed as:
S2=
Problem-11. A breeder wants to investigate whether the number of
filled grains per panicle is the same in a new variety of paddy ACM.5
and an old variety ADT.36. To verify this, random samples of 50 plants of
ACM.5 and 60 plants of ADT.36 were selected from the experimental
fields. The following results were obtained:

For ACM.5       For ADT.36
Mean = 139.4    Mean = 112.9
S1 = 26.864     S2 = 20.1096
n1 = 50         n2 = 60

Test whether the claim of the breeder is correct.
Solution:
The hypothesis is, H0: μ1 = μ2 and H1: μ1 ≠ μ2 (two-tail)
Assuming that the two population variances are unequal, and estimating
them by the sample values, put the given values in the formula
$Z = \dfrac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \dfrac{|139.4 - 112.9|}{\sqrt{\dfrac{(26.864)^2}{50} + \dfrac{(20.1096)^2}{60}}} = \dfrac{26.5}{4.60} = 5.76$
Calculated value of Z > Table value of Z at 5% level of significance
(= 1.96), so H0 is rejected. We conclude that the number of filled grains
per panicle is
significantly different in the two varieties ACM.5 and ADT.36.
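The two-sample Z statistic used in Problem-10 and Problem-11 can be sketched as below. This is an illustrative Python sketch, not the manual's own procedure; the function name z_two_means is an assumption. A common S.D. is handled by passing the same value for both samples, which reduces the formula to the equal-variance form.

```python
import math


def z_two_means(mean1, mean2, sd1, sd2, n1, n2):
    """Z statistic for H0: mu1 = mu2 with a separate S.D. for each sample."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Problem-10: common S.D. of 1.15 cm, so the same value is passed twice
z10 = z_two_means(18.5, 20.3, 1.15, 1.15, 60, 70)

# Problem-11: S.D.s estimated from the two samples
z11 = z_two_means(139.4, 112.9, 26.864, 20.1096, 50, 60)

Z_TAB_5PCT_TWO_TAIL = 1.96
for label, z in (("Problem-10", z10), ("Problem-11", z11)):
    decision = "reject H0" if abs(z) > Z_TAB_5PCT_TWO_TAIL else "accept H0"
    print(label, round(abs(z), 2), decision)
```

The sign of Z only reflects which sample mean is larger; the two-tail decision uses its absolute value.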

1.7. Small Sample Tests

These tests are applicable when the sample size n < 30.

Test of hypothesis on equality of two variances (Snedecor’s F-test or
variance ratio test)

Let x1, x2, …, xn1 be a sample of size n1 drawn from a normal
population with variance σx² and y1, y2, …, yn2 be another sample of size
n2 drawn independently from a normal population with variance σy² for
the same variable under study. We are interested to know whether the
two samples are drawn from two different normal populations or
belong to the same normal population w.r.t. the variance, i.e. the scatter
of the observations.
Procedure:

Step-1. The Assumptions in F-test:
i. Parent population must be normal.
ii. Samples are independent.
Step-2. Take the null hypothesis
$H_0: \sigma_x^2 = \sigma_y^2$ against the alternate hypothesis $H_1: \sigma_x^2 \neq \sigma_y^2$
Step-3. Choose the level of significance i.e. 5% or 1%.
Step-4. Choose the location of Critical region i.e. one tailed or two tailed
test.
Step-5. Compute the observed value of F as:
$F = \dfrac{S_x^2}{S_y^2}$ with (n1 - 1) and (n2 - 1) d.f. if $S_x^2 > S_y^2$ (the greater value is taken in the numerator)
Where, $S_x^2 = \dfrac{\sum (x_i - \bar{x})^2}{n_1 - 1}$, $S_y^2 = \dfrac{\sum (y_i - \bar{y})^2}{n_2 - 1}$
Step-6. Compare the observed value with the tabular value.
Step-7. If Fcal > Ftab then the null hypothesis is rejected and the result
is significant.
If Fcal ≤ Ftab then the null hypothesis is accepted and the result is not
significant.
Problem-12. Two independent samples on dry weight (g) of plants were
observed from two populations as:
Sample–1 (x): 39, 41, 43, 41, 45, 39, 42, 44
Sample–2 (y): 40, 42, 40, 44, 39, 38, 40

Do the estimates of the population variances differ significantly?

Solution:

The Hypothesis is:

$H_0: \sigma_x^2 = \sigma_y^2$ (the populations have the same variance)

$H_1: \sigma_x^2 \neq \sigma_y^2$

Level of significance, α = 0.05

Test Statistic, $F = \dfrac{S_x^2}{S_y^2}$, where $S_x^2 = \dfrac{\sum (x_i - \bar{x})^2}{n_1 - 1}$ and $S_y^2 = \dfrac{\sum (y_i - \bar{y})^2}{n_2 - 1}$

Table-12. Calculation of variances

Obs. No.  x    y    (x - x̄)  (y - ȳ)  (x - x̄)²  (y - ȳ)²
1         39   40   -2.75    -0.43    7.5625    0.1837
2         41   42   -0.75    1.57     0.5625    2.4694
3         43   40   1.25     -0.43    1.5625    0.1837
4         41   44   -0.75    3.57     0.5625    12.7551
5         45   39   3.25     -1.43    10.5625   2.0408
6         39   38   -2.75    -2.43    7.5625    5.8980
7         42   40   0.25     -0.43    0.0625    0.1837
8         44   -    2.25     -        5.0625    -
Total     334  283  -        -        33.5      23.7143

$\bar{x} = \dfrac{\sum x_i}{n_1} = \dfrac{334}{8} = 41.75$, $\bar{y} = \dfrac{\sum y_i}{n_2} = \dfrac{283}{7} = 40.43$

$S_x^2 = \dfrac{\sum (x_i - \bar{x})^2}{n_1 - 1} = \dfrac{33.5}{7} = 4.7857$, $S_y^2 = \dfrac{\sum (y_i - \bar{y})^2}{n_2 - 1} = \dfrac{23.7143}{6} = 3.9524$

$F = \dfrac{S_x^2}{S_y^2} = \dfrac{4.7857}{3.9524} = 1.21$
As n1 = 8 and n2 = 7, for 7 and 6 degrees of freedom at α = 0.05
the critical value of ‘F’ is 4.21. Since the calculated value of F = 1.21 is
less than the critical value (= 4.21), H0 is accepted i.e. the estimates of
the population variances do not differ significantly. It is concluded that
the two samples have been drawn from the same population or the
variances of the two populations are same.
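The variance-ratio calculation of Problem-12 can be reproduced with a short Python sketch. The function names sample_variance and variance_ratio are illustrative assumptions; the critical F value still has to be read from an F table for the returned degrees of freedom.

```python
def sample_variance(values):
    """Sample variance with divisor (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def variance_ratio(x, y):
    """Return (F, numerator d.f., denominator d.f.), larger variance on top."""
    var_x, var_y = sample_variance(x), sample_variance(y)
    if var_x >= var_y:
        return var_x / var_y, len(x) - 1, len(y) - 1
    return var_y / var_x, len(y) - 1, len(x) - 1


# Problem-12: dry weight (g) of plants in two independent samples
x = [39, 41, 43, 41, 45, 39, 42, 44]
y = [40, 42, 40, 44, 39, 38, 40]

f_cal, df_num, df_den = variance_ratio(x, y)
print(round(f_cal, 2), "with", df_num, "and", df_den, "d.f.")
```

Placing the larger variance in the numerator keeps F at or above 1, so only the upper tail of the F table is needed.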

Test for single mean (Student’s t-test)

This test is used to test whether the sample mean ($\bar{x}$) differs
significantly from the hypothetical value of the population mean μ0.
Procedure:
Step-1.
Let x1, x2, …, xn be a random sample of size n drawn from a population
with the following assumptions:
i. Parent population must be normal.
ii. The sample is random.
iii. The population standard deviation is unknown.
iv. The sample size must be < 30.
Step-2.
Take Null hypothesis $H_0: \mu = \mu_0$
Alternate hypothesis $H_1: \mu \neq \mu_0$ (or $\mu > \mu_0$ or $\mu < \mu_0$ for a one tailed test)
Step-3. Fix the level of significance as 5% or 1%.
Step-4. Choose the location of Critical Region i.e. one tailed or two tailed.
Step-5. Compute the sample statistic (observed) of Student's t-test.
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with (n-1) degrees of freedom
Where,
$\bar{x}$ = Sample mean
$\mu_0$ = Specified population mean
s = Sample standard deviation, i.e. $s = \sqrt{\dfrac{\sum_i (x_i - \bar{x})^2}{n - 1}}$
Step-6. Compare the sample statistic with the tabulated value.
Step-7. Decision Rule
i. If t(cal) > t(tab) then significant and the null hypothesis is rejected.
ii. If t(cal) ≤ t(tab) then not significant and the null hypothesis is accepted.

Problem-13. Ten animals are fed with an animal feed. The gains in
wt. (kg) of the animals are given below. A negative value indicates loss in
weight. Test whether there is a significant gain in weight as a result of
consumption of that particular animal feed.
Animal No.      1   2   3   4   5   6  7  8   9  10
Gain in Wt.(x)  25  10  11  13  12  8  5  13  7  -4

Solution:

Null hypothesis $H_0: \mu = 0$ (i.e. there is no gain in weight)
$H_1: \mu > 0$ (i.e. there is gain in weight)
This is a case of one tailed test.
Table-13. Calculation for t-Statistic
Animal No.  Gain in wt.(x)  (x - x̄)  (x - x̄)²
1           25              15        225
2           10              0         0
3           11              1         1
4           13              3         9
5           12              2         4
6           8               -2        4
7           5               -5        25
8           13              3         9
9           7               -3        9
10          -4              -14       196
Total       Σx = 100                  Σ(x - x̄)² = 482

Mean $\bar{x} = \dfrac{\sum x}{n} = \dfrac{100}{10} = 10$
and $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Where $\bar{x} = 10$, $\mu_0 = 0$, $n = 10$

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{482}{9}} = 7.32$
$t = \dfrac{10 - 0}{7.32/\sqrt{10}} = 4.32$
Since the calculated t-value of 4.32 is more than the table value of
t = 1.833 at 5% level of significance for 9 d.f. for a one tail test, the null
hypothesis is rejected and the alternate hypothesis is accepted. So, we can
conclude that there is +ve gain in wt. due to consumption of the
particular feed.
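The one-sample t statistic of Problem-13 can be sketched in Python as follows. The function name t_single_mean is an illustrative assumption; the table value 1.833 is the one quoted in the problem for 9 d.f.

```python
import math


def t_single_mean(values, mu0):
    """Return (t, d.f.) for H0: mu = mu0 from raw sample values."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (mean - mu0) / (s / math.sqrt(n)), n - 1


# Problem-13: gain in weight (kg) of ten animals
gains = [25, 10, 11, 13, 12, 8, 5, 13, 7, -4]
t_cal, df = t_single_mean(gains, 0)

T_TAB_ONE_TAIL_5PCT = 1.833  # one-tail 5% value for 9 d.f., as quoted in the problem
significant = t_cal > T_TAB_ONE_TAIL_5PCT

print(round(t_cal, 2), "with", df, "d.f.;", "significant" if significant else "not significant")
```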

Exercise: A random sample of height (ft.) of 10 trees from a forest was
observed. Test whether the mean height of trees of that forest is 100 ft. or
not at 5% level. (Hints: Calculated t=-0.62)

Test for difference of two means for Independent samples (Fisher’s
t-test)

This test is used to test the difference between two population
means on the basis of two independent sample means or to test whether
two samples have been drawn from the same population having same
mean.
Procedure:
Let x1, x2, …, xn1 be a random sample of size n1 drawn from a population
with mean $\mu_x$ and y1, y2, …, yn2 be another independent random sample
from a population with mean $\mu_y$, under the following assumptions.

i. Parent population must be normal.
ii. Samples are random and independent of each other.
Case-I: Population variance for both the samples same and unknown.
Step-1. Take Null hypothesis $H_0: \mu_x = \mu_y$
Step-2. Take Alternative hypothesis $H_1: \mu_x \neq \mu_y$
Step-3. Choose the level of significance either 5% or 1%.
Step-4. Choose the location of Critical region i.e. one tailed test or two
tailed test.
Step-5. Compute the sample t value (calculated) from the following formula
of Fisher’s t-test.
$t = \dfrac{\bar{x} - \bar{y}}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ with (n1+n2–2) d.f.

Here, $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}}$ is the estimated standard deviation
of the population
Where,
$\bar{x}$ = Sample mean of 1st sample, n1 = no. of observations of 1st sample
$\bar{y}$ = Sample mean of 2nd sample, n2 = no. of observations of 2nd sample

Step-6. Compare the calculated value with the table value.
Step-7. If t(cal) > t(tab) then the null hypothesis is rejected and it is significant.
If t(cal) ≤ t(tab) then the null hypothesis is accepted and it is not
significant.
Problem-14. The interest is to study the effect of two treatments A & B
on the yield of a crop, each of the treatments being repeated in 5 plots
and the yield/plot noted below.

Yield (in kg/plot)
Treatment-A (x)  9   10  13  11  7    $\bar{x}$ = 10
Treatment-B (y)  15  10  14  15  11   $\bar{y}$ = 13

Test whether the mean yields obtained as a result of these two treatments
differ significantly.
Solution:
Step-1. Null hypothesis,
$H_0: \mu_A = \mu_B$ (i.e. no significant difference between the two means)
Alternate Hypothesis,
$H_1: \mu_A \neq \mu_B$ (i.e. the two means differ significantly)

Step-2. This is a case of two-tailed test.
Step-3. The level of significance chosen is 5%.
Step-4.
Table-14. Calculation for Fisher’s–t Statistic
Sl. No.  x   y   (x - x̄)  (y - ȳ)  (x - x̄)²  (y - ȳ)²
1        9   15  -1       2        1         4
2        10  10  0        -3       0         9
3        13  14  3        1        9         1
4        11  15  1        2        1         4
5        7   11  -3       -2       9         4
Total    50  65  -        -        20        22

So,
$\bar{x} = \dfrac{50}{5} = 10$, $\bar{y} = \dfrac{65}{5} = 13$, $n_1 = n_2 = 5$ and

$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{20 + 22}{8}} = \sqrt{5.25} = 2.29$

Test Statistic, $t = \dfrac{\bar{x} - \bar{y}}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{10 - 13}{2.29\sqrt{\dfrac{1}{5} + \dfrac{1}{5}}} = \dfrac{-3}{2.29 \times 0.632} = \dfrac{-3}{1.449} = -2.07$

Step-5. The two tailed table value for ‘t’ at 5% significance level with 8
d.f. is 2.306. So, the absolute value of calculated t (2.07) is less than the
table value and hence the null
hypothesis is accepted. It is concluded that the two treatments do not
produce any significant difference in the mean yield.
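The pooled two-sample t statistic of Problem-14 can be sketched in Python as below. The function name t_two_independent is an illustrative assumption, and the sketch assumes equal population variances as in Case-I.

```python
import math


def t_two_independent(x, y):
    """Return (t, d.f.) for H0: mu_x = mu_y using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = sum(x) / n1, sum(y) / n2
    pooled_ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    s = math.sqrt(pooled_ss / (n1 + n2 - 2))
    return (mean_x - mean_y) / (s * math.sqrt(1 / n1 + 1 / n2)), n1 + n2 - 2


# Problem-14: yield (kg/plot) under treatments A and B
treatment_a = [9, 10, 13, 11, 7]
treatment_b = [15, 10, 14, 15, 11]

t_cal, df = t_two_independent(treatment_a, treatment_b)

T_TAB_TWO_TAIL_5PCT = 2.306  # two-tail 5% value for 8 d.f., as quoted in the problem
significant = abs(t_cal) > T_TAB_TWO_TAIL_5PCT

print(round(t_cal, 2), "with", df, "d.f.;", "significant" if significant else "not significant")
```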
Exercise: To assess the effect of inoculation with mycorrhiza on the height
growth of seedlings of a crop, 10 seedlings inoculated with
mycorrhiza (Group-1) and another 10 seedlings without inoculation (Group-2)
were collected from an experiment. The heights of seedlings obtained
under the two groups of seedlings were:

Plot      1    2     3    4     5     6     7     8     9     10
Group I   23   17.4  17   20.5  22.7  24    22.5  22.7  19.4  18.8
Group II  8.5  9.6   7.7  10.1  9.7   13.2  10.3  9.1   10.5  7.4
Under the assumption of equality of variance of seedling height in the two
groups, test the equality of means. (tcal=11.75)
Exercise: Using the data of the example of the F-test, test the equality of the 2 means.

Test for difference of two dependent sample means (paired t-test)

Procedure:

Let (x1, y1), (x2, y2), …, (xn, yn) be n paired observations of a sample from
a population with basic assumptions as follows:
i. Parent population must be normal.
ii. Samples are dependent and occur pair-wise.

Step-1. Take Null hypothesis: $H_0: \mu_x = \mu_y$ or $H_0: \mu_d = 0$ i.e. no difference
Step-2. Take Alternate hypothesis:
$H_1: \mu_x \neq \mu_y$ or $H_1: \mu_d \neq 0$ (or $\mu_d > 0$ or $\mu_d < 0$)
Step-3. Choose the level of significance either 5% or 1%.
Step-4. Choose the location of Critical region i.e. one tailed test or two
tailed test.
Step-5. Compute the observed t statistic from the following formula of the
paired t-test:
$t = \dfrac{\bar{d}}{s/\sqrt{n}}$ with (n - 1) d.f.
Where, $d_i = x_i - y_i$ and
$s = \sqrt{\dfrac{\sum (d_i - \bar{d})^2}{n - 1}}$ ($\bar{d}$ = mean of the 'd' variable)

Step-6. Compare the observed value with the tabular value.
Step-7. If t-calculated > t-tabulated then the null hypothesis is rejected and it is
significant, otherwise the null hypothesis is accepted.
Problem-15. Memory capacity of 9 students was tested before and after
training. Test at 5 per cent level of significance whether the training was
effective from the following scores.
Student     1   2   3  4  5  6   7   8   9
Before (x)  10  15  9  3  7  12  16  17  4
After (y)   12  17  8  5  6  11  18  20  3

Solution:
Here, marks obtained by the same batch of students in the two tests are
available. Hence, the marks are expected to be correlated. So, the paired
t-test will be appropriate. Taking the null hypothesis that the mean of
the differences is zero, we can write,
$H_0: \mu_x = \mu_y$, which is equivalent to testing $H_0: \mu_d = 0$

$H_1: \mu_x \neq \mu_y$

As we have matched pairs, we use the paired ‘t’-test, which is given by

$t = \dfrac{\bar{d}}{s/\sqrt{n}}$ with (n - 1) d.f.
Table-15. Calculation for paired-t
Student  Score (x)  Score (y)  Difference     di²
         xi         yi         di = (xi - yi)
1        10         12         -2             4
2        15         17         -2             4
3        9          8          1              1
4        3          5          -2             4
5        7          6          1              1
6        12         11         1              1
7        16         18         -2             4
8        17         20         -3             9
9        4          3          1              1
Total    -          -          -7             29

Here $\bar{d} = \dfrac{\sum d_i}{n} = \dfrac{-7}{9} = -0.778$

$s = \sqrt{\dfrac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\dfrac{\sum d_i^2 - n\bar{d}^2}{n - 1}} = \sqrt{\dfrac{29 - 9 \times (-0.778)^2}{9 - 1}} = \sqrt{2.944} = 1.716$

$t = \dfrac{\bar{d}}{s/\sqrt{n}} = \dfrac{-0.778}{1.716/\sqrt{9}} = \dfrac{-0.778}{0.572} = -1.36$

Table value of ‘t’ at 5% level for 8 d.f. is 2.306. The absolute value of
calculated t (1.36) is less than the table value. Hence, it is not
significant and the null hypothesis
is accepted. Hence we can conclude that the training was not effective.
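The paired t calculation of Problem-15 can be sketched in Python as follows. The function name t_paired is an illustrative assumption; differences are taken as d = x - y, matching the table in the solution.

```python
import math


def t_paired(x, y):
    """Return (t, d.f.) for H0: mean difference = 0, with d_i = x_i - y_i."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n
    s = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
    return d_bar / (s / math.sqrt(n)), n - 1


# Problem-15: memory scores of 9 students before and after training
before = [10, 15, 9, 3, 7, 12, 16, 17, 4]
after = [12, 17, 8, 5, 6, 11, 18, 20, 3]

t_cal, df = t_paired(before, after)

T_TAB_TWO_TAIL_5PCT = 2.306  # two-tail 5% value for 8 d.f., as quoted in the problem
significant = abs(t_cal) > T_TAB_TWO_TAIL_5PCT

print(round(t_cal, 2), "with", df, "d.f.;", "significant" if significant else "not significant")
```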

Exercise: Data pertaining to organic carbon (OC) content measured at two
different layers of 10 soil pits in a natural forest were collected
to study whether the OC content is the same or different in the two layers:

Organic carbon (%)
Soil pit     1     2     3     4     5     6     7     8     9     10
Layer 1 (x)  1.59  1.39  1.64  1.17  1.27  1.58  1.64  1.53  1.21  1.48
Layer 2 (y)  1.21  0.92  1.31  1.52  1.62  0.91  1.23  1.21  1.58  1.18

Analyse the data and draw your conclusion.

(Hints: sd² = 0.1486, tcal = 1.485)

1.8. Chi-square test (χ2)

The Chi-square test of significance is for testing the agreement
between observation and hypothesis (or expectation) where the data are
purely qualitative or enumerative in character. Such enumerative data are
characterized by the frequency of occurrence or non-occurrence of events
or attributes or categories expressed as counts or proportions or
percentages. The expected frequency in each category should
preferably be more than 5 and the total number of observations should be
large, say, more than 50.

χ2-test for Goodness-of-fit

This involves testing the significance of the difference between observed
frequencies and the frequencies expected on some prior hypothesis or
rule. If Oi is a set of observed frequencies and Ei is the corresponding set of
expected frequencies (i=1,2,…,n), Karl Pearson’s Chi-square (χ2) is
given by:
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$
Procedure:
Step-1. Check the following assumptions
i. Sample observations should be independent.
ii. The constraint on cell frequencies should be linear i.e. $\sum O_i = \sum E_i$
iii. The total frequency should be reasonably large.
iv. No theoretical (expected) cell frequency should be less than 5.
Step-2. Take the null hypothesis, $H_0: O_i = E_i$
Alternative hypothesis $H_1: O_i \neq E_i$
Step-3. Choose the level of significance either α = 5% or 1%.
Step-4. Choose the location of critical region i.e. one tailed or two tailed
Step-5. Compute the Chi-square value as per the formula.
Step-6. Compare the observed value with the tabular value and take the decision
as:
If χ2cal > χ2tab then the null hypothesis is rejected and significant at α.
If χ2cal ≤ χ2tab then the null hypothesis is accepted and non significant at α.
Problem-16. In a cross between parents of the genetic constitution AAbb
and aaBB the phenotypes in the F2 sample are classified as follows.
AB  Ab  aB  ab  Total
87  29  32  12  160
They are expected to occur in a 9:3:3:1 ratio.
Does the segregation agree with the theoretical ratio?
Solution:
Ho: The segregation agrees with the theoretical ratio
H1: The segregation does not agree with the theoretical ratio.
Level of Significance α = 0.05
Test Statistic is $\chi^2 = \sum_{i=1}^{4} \dfrac{(O_i - E_i)^2}{E_i}$ with 3 d.f.
The expected frequencies are computed on the basis of the
theoretical segregation ratio 9:3:3:1. The total is 9+3+3+1=16. We
expect ‘9’ out of ‘16’ to belong to the AB group, that is, the probability of AB
is $\dfrac{9}{16}$
The expected frequency of AB is therefore $\dfrac{9}{16} \times 160 = 90$
The expected frequency of Ab is $\dfrac{3}{16} \times 160 = 30$
The expected frequency of aB is $\dfrac{3}{16} \times 160 = 30$
And the expected frequency of ab is $\dfrac{1}{16} \times 160 = 10$

Table-16. Calculation for Chi-square value

Phenotype  Observed        Expected        (Oi-Ei)  (Oi-Ei)²  (Oi-Ei)²/Ei
           frequency (Oi)  frequency (Ei)
AB         87              90              -3       9         0.100
Ab         29              30              -1       1         0.033
aB         32              30              2        4         0.133
ab         12              10              2        4         0.400
Total      160             160                                χ2 = 0.667

The calculated χ2 value is 0.667, which is less than the critical value
of χ2 (with 3 d.f. at α = 0.05 it is 7.815). Therefore, the calculated χ2 value is
not significant. Hence we accept the null hypothesis and conclude that the
observed phenotypic ratio conforms to the theoretical segregation ratio of
9:3:3:1.
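The goodness-of-fit calculation of Problem-16 can be sketched in Python as follows. The function name chi_square_gof is an illustrative assumption; expected frequencies are obtained by splitting the observed total in the hypothesised ratio.

```python
def chi_square_gof(observed, ratio):
    """Return (chi-square, expected frequencies, d.f.) for a hypothesised ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected, len(observed) - 1


# Problem-16: F2 phenotypes AB, Ab, aB, ab tested against a 9:3:3:1 ratio
chi2_cal, expected, df = chi_square_gof([87, 29, 32, 12], [9, 3, 3, 1])

CHI2_TAB_5PCT_3DF = 7.815  # table value quoted in the problem
significant = chi2_cal > CHI2_TAB_5PCT_3DF

print(expected, round(chi2_cal, 3), "with", df, "d.f.")
```

Passing a ratio of equal parts (for example twelve 1s for twelve months) gives the equal-frequency hypothesis.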
Exercise: Data were collected on the number of insect species from an
undisturbed area of a Wildlife Sanctuary in different months to test
whether there are any significant differences between the numbers of
insect species found in different months. (Hints: we may state the null
hypothesis as the diversity in terms of number of insect species is the
same in all months and derive the expected frequencies in different
months accordingly). Test the data. (Ans. χ2=134.84)

Month Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. Total
Oi 67 115 118 72 67 77 75 63 42 24 32 52 804

χ2-test of independence or association of attributes

When individuals are classified simultaneously on the basis of two
variables or attributes or categories the resulting table of frequencies is
called an (r x c) contingency table i.e. r rows and c columns. The χ2 test
may be applied to a contingency table to find out if the variables are
independent or associated.
Procedure:
The χ2 value for this test may be obtained in two ways:
i. By estimating the value of Ei (Expected frequency) from the values of Oi
(Observed frequency) and applying χ2 as for goodness-of-fit.
ii. For a 2x2 contingency table, by a direct formula.

2 x 2 Contingency table
         Category
Group    I      II     Total
1        a      b      a+b
2        c      d      c+d
Total    a+c    b+d    N=a+b+c+d

The simple formula to calculate $\chi^2 = \dfrac{(ad - bc)^2 N}{(a+b)(c+d)(a+c)(b+d)}$ with 1 d.f.

Where a, b, c, d are the observed cell frequencies. If any of the expected cell
frequencies is less than 5, then a slightly modified formula is necessary.
The corrected formula for a 2x2 contingency table, called Yates’ Correction
for continuity, is:
$\chi^2 = \dfrac{\left(|ad - bc| - \dfrac{N}{2}\right)^2 N}{(a+b)(c+d)(a+c)(b+d)}$

Problem-17. In a survey of fertilizer practices in India each of 323 cotton
growing fields selected for survey was classified by the twin criteria of
irrigation practice (irrigated or non-irrigated) and the practice of manuring
(manured or un-manured), resulting in the following contingency table.

             Irrigated  Non-Irrigated  Total
Manured      75 (a)     35 (b)         110
Un-manured   115 (c)    98 (d)         213
Total        190        133            323

It is required to test whether the practice of irrigation and the
practice of manuring are independent or related (associated).
Solution:

Ho: the two factors irrigation and manuring are independent.
H1: the two factors irrigation and manuring are dependent or
associated.
First Method: Goodness-of-fit
The expected frequencies of each cell are calculated as:
The expected frequency of the cell (a) is $\dfrac{(a+b)(a+c)}{N} = \dfrac{110 \times 190}{323} = 64.7$
Cell (b) is $\dfrac{(a+b)(b+d)}{N} = \dfrac{110 \times 133}{323} = 45.3$
Cell (c) is $\dfrac{(c+d)(a+c)}{N} = \dfrac{213 \times 190}{323} = 125.3$
Cell (d) is $\dfrac{(c+d)(b+d)}{N} = \dfrac{213 \times 133}{323} = 87.7$

The χ2 is calculated using the formula
$\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$
which follows the χ2 distribution with (2 - 1) × (2 - 1) = 1 d.f.

Table-17. Calculation of chi-square value

             Irrigated    Non-irrigated  Total
Manured      75 (O1)      35 (O2)        110
             64.7 (E1)    45.3 (E2)
Un-manured   115 (O3)     98 (O4)        213
             125.3 (E3)   87.7 (E4)
Total        190          133            323

The χ2 value computed for the above table is
$\chi^2 = \sum_{i=1}^{4} \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{(75 - 64.7)^2}{64.7} + \dfrac{(35 - 45.3)^2}{45.3} + \dfrac{(115 - 125.3)^2}{125.3} + \dfrac{(98 - 87.7)^2}{87.7} = 6.03 \approx 6.0$
Second Method: Independence of attributes
$\chi^2 = \dfrac{(ad - bc)^2 N}{(a+b)(c+d)(a+c)(b+d)}$
$= \dfrac{(75 \times 98 - 35 \times 115)^2 \times 323}{(75+35)(115+98)(75+115)(35+98)}$
$= \dfrac{(3325)^2 \times 323}{592076100}$
$= 6.03 \approx 6.0$
The χ2 value computed by both the above methods is 6.03 (about 6.0). Since
each of the two factors, irrigation and manuring, has only two categories,
the d.f. for the above contingency table is one. The table value of χ2 with
1 d.f. at 5% level
of significance is 3.84. Here the calculated χ2 value is higher than the
table value, and so the null hypothesis of independence of the two factors
irrigation and manuring is rejected and it is concluded that they are mutually
related or associated.
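Both routes to the 2x2 chi-square of Problem-17 can be sketched in Python as below. The function names are illustrative assumptions. The optional Yates' correction follows the corrected formula stated in this section; flooring the corrected difference at zero is an added safeguard in this sketch, not part of the manual's formula.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2x2 table by the direct (ad - bc) formula, 1 d.f."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)  # Yates' correction; floor at 0 is this sketch's choice
    return diff ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_from_expected(a, b, c, d):
    """Same statistic via expected frequencies E = row total x column total / N."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            total += (observed[i][j] - e) ** 2 / e
    return total


# Problem-17: manured/un-manured by irrigated/non-irrigated cotton fields
chi2_direct = chi_square_2x2(75, 35, 115, 98)
chi2_expected = chi_square_from_expected(75, 35, 115, 98)

CHI2_TAB_5PCT_1DF = 3.84  # table value quoted in the problem
print(round(chi2_direct, 2), round(chi2_expected, 2), chi2_direct > CHI2_TAB_5PCT_1DF)
```

Without the correction, the two functions are algebraically identical, so they agree up to floating-point rounding.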
Exercise: The following table shows the result of inoculation against
cholera in a group of people. Examine the effect of inoculation in
controlling susceptibility to cholera. (Hints: apply Yates’ correction)

                Not attacked  Attacked
Inoculated      43            5
Not-inoculated  7             28


1.9. Correlation and regression

In many natural systems, changes in one attribute are accompanied
by changes in another attribute, and a definite relation exists between
the two. In other words, there is a correlation between the two variables.
For instance, several soil properties like nitrogen content, organic carbon
content or pH are correlated and exhibit simultaneous variation. Strong
correlation is found to occur between several morphometric features of a
tree. In such instances, an investigator may be interested in measuring
the strength of the relationship. Having made a set of paired observations
(xi,yi); i = 1, ..., n, from n independent sampling units, a measure of the
linear relationship between two variables can be obtained by a quantity
called Pearson’s product moment correlation coefficient or simply
correlation coefficient.
Correlation is the study of co-variation between two variables to
understand how closely the variables are related. In correlation analysis,
both the variables are assumed to be continuous and normally distributed.
For discovering and measuring the magnitude and direction of the
relationship between two variables we use the statistical tool known as
the correlation coefficient, whose range is -1 to +1. The + or – sign
indicates the direction of the relationship and the value gives the
magnitude or strength of the relationship between the two variables.
Regression is the functional relationship between two or more
variables and thereby provides a mechanism for prediction or forecasting.
When the relationship between two variables is a straight line it is called
simple linear regression.

Karl Pearson’s correlation coefficient and its test of significance


Procedure: Let (Xi,Yi); i = 1,2,3, ...n, be from n independent sampling
units of 2 quantitative variables.

a). Direct Method:


Step-1. Construct a table for finding X2, Y2 and XY values
Step-2. Calculate  X ,  XY ,  X 2 ,  Y 2
Step-3. Calculate Karl Pearson’s correlation coefficient by
n  XY   X .  Y
rxy =
n  X 2  ( X )2 . n  Y 2  ( Y )2 
b). Step deviation method (change of origin and scale):

Step-1. Calculate U and V, where
$$ U = \frac{X - A}{h}, \qquad V = \frac{Y - B}{k} $$

Department of Agricultural Statistics, OUAT Page-38


UG Practical Manual on Statistics

A and B are arbitrary values chosen from X and Y, and h and k are suitably
chosen scales.
Step-2. Construct a table for finding U, V, UV, U² and V².
Step-3. Calculate ΣU, ΣV, ΣUV, ΣU² and ΣV².
Step-4. Calculate the correlation coefficient by
$$ r_{uv} = \frac{n\sum UV - \sum U \sum V}{\sqrt{\left[n\sum U^2 - (\sum U)^2\right]\left[n\sum V^2 - (\sum V)^2\right]}} $$
OR
$$ r_{uv} = \frac{\sum UV - n\bar{U}\bar{V}}{\sqrt{\left(\sum U^2 - n\bar{U}^2\right)\left(\sum V^2 - n\bar{V}^2\right)}} $$
Where, Ū = ΣU/n and V̄ = ΣV/n.
Both methods give the same value, i.e. rxy = ruv.
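The equality rxy = ruv can be seen from a short derivation. This is only a sketch, and it assumes that the scales h and k are both positive, which the procedure above does not state explicitly.

```latex
\[
X = A + hU,\quad Y = B + kV
\;\Longrightarrow\;
X - \bar{X} = h\,(U - \bar{U}),\quad Y - \bar{Y} = k\,(V - \bar{V})
\]
\[
r_{xy}
= \frac{\sum (X-\bar{X})(Y-\bar{Y})}
       {\sqrt{\sum (X-\bar{X})^2 \,\sum (Y-\bar{Y})^2}}
= \frac{hk \sum (U-\bar{U})(V-\bar{V})}
       {\sqrt{h^2 \sum (U-\bar{U})^2 \; k^2 \sum (V-\bar{V})^2}}
= r_{uv}
\]
```

The factor hk cancels between numerator and denominator, so a change of origin (A, B) and of scale (h, k) leaves the correlation coefficient unchanged.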
Test of correlation coefficient:
Null hypothesis, H0: ρ = 0 and Alternative, H1: ρ ≠ 0
Here ρ is the correlation in the population and r is the estimate of ρ
from the sample observations.
Level of Significance, α = 0.05
Test statistic:
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim \text{Student's } t \text{ distribution with } (n-2) \text{ d.f.} $$
The absolute value of tcal is compared with ttab. If |tcal| ≤ ttab, then
H0 is accepted, i.e. r is not significant and the data give no evidence of
a linear relationship between the two variables (some other relationship,
e.g. a nonlinear one, may still exist). If |tcal| > ttab, then H1 is
accepted, i.e. r is significant and we say the two variables are linearly
related with the magnitude and direction of r.
Problem-18. The following data give the heights of fathers and their sons
in 10 families. Compute the correlation coefficient of the heights, test
its significance and give your conclusion.
Height of father (cm)   63  69  65  67  68  69  69  70  71  71
Height of son (cm)      65  63  63  65  67  67  68  71  61  69

Solution:

Table-17. Calculation of correlation coefficient
(X = height of father, Y = height of son; A = 68, B = 65)

           X      Y      X²      Y²      XY      U=X-A   V=Y-B   UV     U²     V²
           63     65     3969    4225    4095    -5      0       0      25     0
           69     63     4761    3969    4347    1       -2      -2     1      4
           65     63     4225    3969    4095    -3      -2      6      9      4
           67     65     4489    4225    4355    -1      0       0      1      0
           68     67     4624    4489    4556    0       2       0      0      4
           69     67     4761    4489    4623    1       2       2      1      4
           69     68     4761    4624    4692    1       3       3      1      9
           70     71     4900    5041    4970    2       6       12     4      36
           71     61     5041    3721    4331    3       -4      -12    9      16
           71     69     5041    4761    4899    3       4       12     9      16
Total      682    659    46572   43513   44963   2       9       21     60     93

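a). Direct Method:

$$ r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$
and putting in the values,
$$ r_{xy} = \frac{10 \times 44963 - (682 \times 659)}{\sqrt{(10 \times 46572 - 465124)(10 \times 43513 - 434281)}} = \frac{192}{\sqrt{596 \times 849}} = \frac{192}{711.34} = 0.27 $$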

b). Step Deviation method:

$$ \bar{U} = \frac{\sum U}{n} = 0.2; \quad \bar{V} = \frac{\sum V}{n} = 0.9; \quad \bar{U}^2 = 0.04; \quad \bar{V}^2 = 0.81 $$
$$ r_{uv} = \frac{\sum UV - n\bar{U}\bar{V}}{\sqrt{\left(\sum U^2 - n\bar{U}^2\right)\left(\sum V^2 - n\bar{V}^2\right)}} = \frac{21 - 10(0.18)}{\sqrt{[60 - 10(0.04)][93 - 10(0.81)]}} = \frac{19.2}{(7.72)(9.21)} = \frac{19.2}{71.10} = 0.27 $$

∴ The correlation coefficient between the heights of father and son is
0.27 by both methods.
Test of significance of r:

Putting the value of r in the formula, the t statistic is
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.27\sqrt{10-2}}{\sqrt{1-(0.27)^2}} = 0.79 $$

The ttab = 2.31 with 8 d.f. at the 5% level of significance.

So, tcal < ttab and H0 is accepted, i.e. r is not significant. It is
concluded that the heights of fathers and their sons are not linearly
related in these data, or we will say that an increase or decrease in the
height of the father does not indicate an increase or decrease in the
height of the son.
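The hand calculation of Problem-18 can be cross-checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names `pearson_r` and `t_statistic` are illustrative choices, and the tabulated value 2.31 is simply copied from the solution above rather than computed.

```python
import math

# Heights from Problem-18 (X = father, Y = son)
X = [63, 69, 65, 67, 68, 69, 69, 70, 71, 71]
Y = [65, 63, 63, 65, 67, 67, 68, 71, 61, 69]

def pearson_r(x, y):
    """Direct-method formula based on the raw sums of X, Y, XY, X^2, Y^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

def t_statistic(r, n):
    """t = r*sqrt(n-2)/sqrt(1-r^2), with (n-2) degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r_xy = pearson_r(X, Y)

# Step deviation method with A = 68, B = 65 and h = k = 1 (as in Table-17)
U = [x - 68 for x in X]
V = [y - 65 for y in Y]
r_uv = pearson_r(U, V)

t_cal = t_statistic(r_xy, len(X))
T_TAB = 2.31  # tabulated t for 8 d.f. at the 5% level, taken from the text

print("r_xy =", round(r_xy, 2), " r_uv =", round(r_uv, 2))
print("t_cal =", round(t_cal, 2), " significant:", abs(t_cal) > T_TAB)
```

Both methods go through the same function because the step deviation method only changes the inputs, not the formula.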
Exercise: The pH and organic carbon content were measured on soil samples
collected from 15 pits in natural forests, as given below:

Soil pit                1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
pH (x)                  5.7   6.1   5.2   5.7   5.6   5.1   5.8   5.5   5.4   5.9   5.3   5.4   5.1   5.1   5.2
Organic carbon (y) (%)  2.1   2.17  1.97  1.39  2.26  1.29  1.17  1.14  2.09  1.01  0.89  1.6   0.9   1.01  1.21

Compute a suitable statistic and test whether an increase in the pH of
the soil affects the organic carbon in that forest. (Hints: r = 0.3541 and
tcal = 1.3652)
Exercise: The following data contain 15 paired values of photosynthetic
rate (Y) and light interception (X) observed on leaves of a particular tree
species. The photosynthetic rate is the dependent variable and the quantity
of light is the independent variable. Study the linear relationship between
the two variables and test its significance.
Tree 1 2 3 4 5 6 7 8
X 0.7619 0.7684 0.7961 0.838 0.8381 0.8435 0.8599 0.9209
Y 7.58 9.46 10.76 11.51 11.68 12.68 12.76 13.73
Tree 9 10 11 12 13 14 15
X 0.9993 1.0041 1.0089 1.0137 1.0184 1.0232 1.028
Y 13.89 13.97 14.05 14.13 14.2 14.28 14.36

Spearman's Rank correlation coefficient

A rank correlation is any of several statistics that measure the
relationship between rankings of different ordinal variables or different
rankings of the same variable, where a "ranking" is the assignment of the
labels "first", "second", "third", etc. to different observations of a
particular variable. Like any correlation calculation, it is appropriate for
both continuous and discrete variables, including ordinal variables. A rank
correlation coefficient measures the degree of similarity between two
rankings, and can be used to assess the significance of the relation
between them: the test of significance of the rank correlation coefficient
shows whether the measured relationship is small enough to be likely a
coincidence. It is measured by Spearman's rank correlation coefficient, or
Spearman's rho, denoted by the Greek letter ρ (rho), which is a measure
of statistical dependence between two variables. It assesses how well the
relationship between two variables can be described by a monotonic
function, and lies in the interval [-1, +1]. An increasing rank correlation
coefficient implies increasing agreement between rankings. The coefficient
value can be interpreted as:
i. 1 if the agreement between the two rankings is perfect; the two
rankings are the same.
ii. 0 if the rankings are completely independent.
iii. −1 if the disagreement between the two rankings is perfect; one
ranking is the reverse of the other.
For a sample of size n, the n raw scores or values Xi,Yi are converted
to ranks xi,yi and ρ is computed. Identical values (rank ties or value
duplicates) are assigned a rank equal to the average of their positions in
the ascending order of the values.
The Spearman’s correlation coefficient for a sample is:
$$ r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} $$
Where, di = xi – yi (i = 1, 2, 3, ..., n), xi = rank of the 1st variable
and yi = rank of the 2nd variable.

Procedure:

When ties occur, the coefficient is corrected as
$$ r_s = 1 - \frac{6\left[\sum d_i^2 + \sum \frac{m(m^2-1)}{12}\right]}{n(n^2-1)} $$
Here, m = number of observations sharing the same rank in a tied group,
and the correction term is summed over all tied groups.
The following steps are applicable for finding the rank correlation:
Step-1. Rank all observations.
I. Ranking should be made from the highest to the lowest observation.
II. If two or more observations are equal in magnitude, then all of them
must carry the same rank (the average of the ranks they occupy).
Step-2. When a common rank is assigned to m observations of a variable,
then m(m² − 1)/12 is added to Σdi² in the numerator of the 2nd term of
the formula for the correlation coefficient.
Step-3. The sum of the rank differences, Σdi, should be equal to zero,
which is a check on the correctness of the calculation.
Problem-19. Find the rank correlation for the following data.

Preference share price (x)  73.2  85.8  78.9  75.8  77.2  81.2  83.8
Debenture price (y)         97.8  99.2  98.8  98.3  98.3  96.7  97.1
Determine the relationship between preference share price and debenture
price.
Solution:
Table-18. Calculation of rank correlation coefficient

Preference share   Rank x   Debenture    Rank y   di = xi-yi   di²
price (x)          (xi)     price (y)    (yi)
73.2               7        97.8         5        2            4
85.8               1        99.2         1        0            0
78.9               4        98.8         2        2            4
75.8               6        98.3         3.5      2.5          6.25
77.2               5        98.3         3.5      1.5          2.25
81.2               3        96.7         7        -4           16
83.8               2        97.1         6        -4           16
Total                                             0            Σdi² = 48.50

Here, y has 2 identical values (m = 2) and n = 7.

Therefore, the rank correlation (rs) is
$$ r_s = 1 - \frac{6\left[\sum d_i^2 + \frac{m(m^2-1)}{12}\right]}{n(n^2-1)} = 1 - \frac{6\left[48.5 + \frac{2(2^2-1)}{12}\right]}{7(7^2-1)} = 1 - \frac{6(48.5 + 0.5)}{336} = 1 - 0.875 = 0.125 $$
It is concluded that the two prices are only weakly (positively) related,
i.e. when one price increases, the other does not increase consistently.
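The ranking, tie correction and checks of Problem-19 can be sketched in Python as follows. The helper names (`average_ranks_desc`, `tie_correction`, `spearman_rs`) are illustrative, and the first debenture price is taken as 97.8, the value used in Table-18.

```python
def average_ranks_desc(values):
    """Rank from highest (rank 1) to lowest; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first_position = ordered.index(v) + 1   # best rank occupied by this value
        group_size = ordered.count(v)           # m = size of the tied group
        ranks.append(first_position + (group_size - 1) / 2)
    return ranks

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of tied values."""
    total = 0.0
    for v in set(values):
        m = values.count(v)
        if m > 1:
            total += m * (m * m - 1) / 12
    return total

def spearman_rs(x, y):
    """Return (rs, sum of d, sum of d^2) using the tie-corrected formula."""
    n = len(x)
    rank_x = average_ranks_desc(x)
    rank_y = average_ranks_desc(y)
    d = [a - b for a, b in zip(rank_x, rank_y)]
    sum_d2 = sum(di * di for di in d)
    correction = tie_correction(x) + tie_correction(y)
    rs = 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))
    return rs, sum(d), sum_d2

preference = [73.2, 85.8, 78.9, 75.8, 77.2, 81.2, 83.8]
debenture = [97.8, 99.2, 98.8, 98.3, 98.3, 96.7, 97.1]

rs, sum_d, sum_d2 = spearman_rs(preference, debenture)
print("sum d =", sum_d, " sum d^2 =", sum_d2, " rs =", rs)
```

The returned sum of d serves as the Step-3 check: it should be zero whenever the ranks have been assigned correctly.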

Exercise: In a survey, observations were taken on 10 persons for IQ (X)
and number of hours spent watching TV per week (Y), as below. Compute the
rank correlation and study whether an increase in the IQ of persons is
associated with the hours spent watching TV per week.

Person   IQ (X)   No. of hours spent in TV per week (Y)
1        106      7
2        86       0
3        100      27
4        101      50
5        99       28
6        103      29
7        97       20
8        113      12
9        112      6
10       110      17
(Hints: Ans. rs = −0.1757)

Fitting of regression equations of two variables Y and X

In regression analysis, both variables are assumed to be normally
distributed, and one of the variables represents the cause (independent or
explanatory variable) and the other the effect (dependent or response
variable). The relationship between two variables can be expressed as a
function known as regression. When only two variables are involved in
regression, the functional relationship is known as simple regression. If
the relationship between the two variables is linear, it is known as
simple linear regression.
For simple linear regression, two regression equations are given by:
Y on X : $Y - \bar{Y} = b_{yx}(X - \bar{X})$
X on Y : $X - \bar{X} = b_{xy}(Y - \bar{Y})$

Where, byx = regression coefficient of Y on X
bxy = regression coefficient of X on Y
$\bar{Y} = \sum Y / n$, $\bar{X} = \sum X / n$, n = No. of observations
Procedure:
Fitting of regression equations is carried out in two phases.
a). Calculation of regression coefficients (bYX and bXY)
i). Direct method:
Step-1. Construct a table to find X², Y² and XY.
Step-2. Compute ΣX, ΣY, ΣX², ΣY², ΣXY, Ȳ and X̄ from the table.
Step-3. Calculate the regression coefficients by the formulae:
$$ b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} $$
$$ b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2} $$

ii). Step deviation method:

Step-1. Reduce the values of X and Y to U and V, where A and B are
arbitrary values, h and k are suitable scales and
$$ U = \frac{X - A}{h}, \qquad V = \frac{Y - B}{k} $$
Step-2. Construct the table to compute U², V² and UV.
Step-3. Compute ΣU, ΣV, ΣUV, ΣU² and ΣV² from the table.
Step-4. Compute the regression coefficients by the formulae:
Regression coefficient of U on V,
$$ b_{UV} = \frac{n\sum UV - \sum U \sum V}{n\sum V^2 - (\sum V)^2} $$
Regression coefficient of V on U,
$$ b_{VU} = \frac{n\sum UV - \sum U \sum V}{n\sum U^2 - (\sum U)^2} $$
Where n = no. of pairs of observations
Step-5. Compute $b_{XY} = \frac{h}{k}\, b_{UV}$ and $b_{YX} = \frac{k}{h}\, b_{VU}$
b). Finding the regression equations
After estimating the values of X̄, Ȳ, bYX and bXY, put these values in
the following equations to obtain the regression equations:
$Y - \bar{Y} = b_{yx}(X - \bar{X})$ and $X - \bar{X} = b_{xy}(Y - \bar{Y})$
Problem-20. The following data give the monthly income and expenditure
on food of 10 families.
Income (x)       120  90  80  150  130  140  110  95  70  105
Expenditure (y)  40   36  40  45   40   44   45   38  50  35
Find the two linear regression equations and the correlation coefficient.
Solution:
Let $U = \frac{X - A}{h}$ and $V = \frac{Y - B}{k}$
Here, A = 110, h = 5; B = 40, k = 1

Table-19. Calculation of sums & sum of squares

Income (X)   Expenditure (Y)   U=(X-A)/h   V=(Y-B)/k   UV     U²     V²
120          40                2           0           0      4      0
90           36                -4          -4          16     16     16
80           40                -6          0           0      36     0
150          45                8           5           40     64     25
130          40                4           0           0      16     0
140          44                6           4           24     36     16
110          45                0           5           0      0      25
95           38                -3          -2          6      9      4
70           50                -8          10          -80    64     100
105          35                -1          -5          5      1      25
Total=1090   413               -2          13          11     246    211
Here n = 10
Regression coefficient of U on V:
$$ b_{UV} = \frac{n\sum UV - \sum U \sum V}{n\sum V^2 - (\sum V)^2} = \frac{10 \times 11 - (-2 \times 13)}{10 \times 211 - (13)^2} = \frac{110 + 26}{2110 - 169} = \frac{136}{1941} = 0.07 $$
Regression coefficient of V on U:
$$ b_{VU} = \frac{n\sum UV - \sum U \sum V}{n\sum U^2 - (\sum U)^2} = \frac{136}{10 \times 246 - (-2)^2} = \frac{136}{2456} = 0.055 $$

So,
$$ b_{xy} = \frac{h}{k}\, b_{UV} = \frac{5}{1}(0.07) = 0.35 $$
$$ b_{yx} = \frac{k}{h}\, b_{VU} = \frac{1}{5}(0.055) = 0.011 $$
∴ The regression coefficients of y on x and x on y are 0.011 and 0.35, respectively.

Therefore, the two regression equations and the correlation coefficient are:

i. Y on X : Y − 41.3 = 0.011(X − 109)
ii. X on Y : X − 109 = 0.35(Y − 41.3)
iii. Correlation of X & Y = √(0.011 × 0.35) = 0.062 (positive, as both
regression coefficients are positive)
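The two regression coefficients of Problem-20, and the conversion between the step deviation and direct methods, can be checked with the sketch below. The names `regression_coefficients` and `predict_y` are illustrative; A, h, B and k are the values chosen in the solution above.

```python
import math

income = [120, 90, 80, 150, 130, 140, 110, 95, 70, 105]    # X
expenditure = [40, 36, 40, 45, 40, 44, 45, 38, 50, 35]     # Y

def regression_coefficients(x, y):
    """Return (b_yx, b_xy): slope of y on x and slope of x on y."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    return numerator / (n * sxx - sx ** 2), numerator / (n * syy - sy ** 2)

# Direct method
b_yx, b_xy = regression_coefficients(income, expenditure)

# Step deviation method with A = 110, h = 5, B = 40, k = 1
A, h, B, k = 110, 5, 40, 1
U = [(x - A) / h for x in income]
V = [(y - B) / k for y in expenditure]
b_vu, b_uv = regression_coefficients(U, V)

x_bar = sum(income) / len(income)
y_bar = sum(expenditure) / len(expenditure)

# r is the geometric mean of the two coefficients and carries their common sign
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

def predict_y(x):
    """Regression line of Y on X: Y = y_bar + b_yx * (X - x_bar)."""
    return y_bar + b_yx * (x - x_bar)

print("b_yx =", round(b_yx, 3), " b_xy =", round(b_xy, 2), " r =", round(r, 3))
print("b_yx from step deviation =", round((k / h) * b_vu, 3))
```

Using `predict_y` on any income value gives the expenditure estimated from the Y on X line; the X on Y line should be used instead when income is to be estimated from expenditure.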

Exercise: Using the data on photosynthetic rate (Y) and light
interception (X) given in the correlation exercise, find the regression
equation of Y on X and estimate Y when X = 0.95.

II. DESIGN AND ANALYSIS OF EXPERIMENTS

2.1. Basic concepts on design of experiments

Planning an experiment to obtain appropriate data with respect to any
problem under investigation is known as ‘design of experiment’. It is a
complete sequence of steps taken well in advance to ensure that
appropriate data will be obtained in a way which permits an objective
analysis of the data leading to valid inferences with respect to the stated
problem. “Design of experiment” comprises the planning of experiments,
the analysis of the data/observations and the interpretation of the
results. The technique used for making inferences is known as the
“analysis of variance”. There are three basic principles of the design of
experiments: (i) Replication, (ii) Randomization and (iii) Local control.
(i). Replication: The repetition of treatments by applying them to more
than one experimental unit under investigation is known as replication.
Replication is necessary in order to get an estimate of the experimental
error, i.e. the variation caused by uncontrolled factors. Replication also
increases the precision with which treatment effects are estimated,
because the standard error of a treatment mean decreases as the number of
replications increases.
(ii). Randomization: Assigning treatments or factors to be tested to
experimental units according to a definite law of probability is known as
randomization. Under the principle of randomization, every experimental
unit has the same chance of receiving any one of the treatments under
study. For an objective comparison it is necessary that treatments are
allotted randomly to the various experimental units. Statistical procedures
employed in making inferences about treatments hold good only when the
treatments are allotted randomly to the experimental units.
(iii). Local control: Though every experiment should provide an estimate of
error variation, it is not desirable to have a large experimental error. The
experimental error can be reduced by making use of the fact that adjacent
areas in the field are relatively more homogeneous than those widely
separated. The aim of local control is to reduce the error by suitably
modifying the allocation of treatments to the experimental units on the
basis of previous knowledge.
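The three principles can be illustrated with a small allocation sketch in Python. It is only an illustration: the treatment labels, the numbers of replications and blocks, and the function names `crd_layout` and `rbd_layout` are all hypothetical, and in practice a random number table or any other valid random mechanism serves the same purpose.

```python
import random

def crd_layout(treatments, reps, seed=None):
    """Replication + randomization: each treatment occurs `reps` times and
    the whole set of units is shuffled (completely randomised allocation)."""
    rng = random.Random(seed)
    units = [t for t in treatments for _ in range(reps)]
    rng.shuffle(units)
    return units

def rbd_layout(treatments, n_blocks, seed=None):
    """Local control: units are grouped into blocks, every block receives each
    treatment once, and the order is randomised separately within each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)
        layout.append(block)
    return layout

treatments = ["A", "B", "C", "D", "E", "F"]   # hypothetical treatment labels

crd = crd_layout(treatments, reps=4, seed=1)
rbd = rbd_layout(treatments, n_blocks=4, seed=1)

print("CRD allocation over 24 units:", crd)
for number, block in enumerate(rbd, start=1):
    print("Block", number, ":", block)
```

Passing a seed only makes the layout reproducible for record keeping; leaving it as None gives a fresh random allocation on every run.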

Analysis of variance (ANOVA)


Analysis of variance is basically a technique of partitioning the
overall variation in the responses observed in an investigation into
different assignable sources of variation, some of which are specifiable
and others unknown. Further, it helps in testing whether the variation due
to any particular component is significant as compared to the residual
variation that can occur among the observational units.
Some important definitions for experimental designs

Treatment: In experimentation, the various objects of comparison are known
as treatments. In practice, a treatment may be a physical substance
(fertilizers, varieties of crops, animal breeds, feeds, etc.) or a
procedure or condition (methods of cultivation, sowing, housing
conditions, etc.) which is applied to experimental units to obtain a
response.
Experimental Unit: The basic objects on which the experiment is done are
known as experimental units.
Model: In statistics, a model is generally expressed in terms of symbols,
usually as a set of equations consisting of factors and treatments with a
random effect.
Fixed effect model: A model in which the factors are fixed effects and the
error effect is random is called a fixed effect model. A fixed effect model
with two factors is written as:
$$ y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk}, \qquad e_{ijk} \text{ i.i.d.} \sim N(0, \sigma_e^2) $$
Random effect model: A model in which the factors are random effects and
the error effect is random is called a random effect model.
Mixed effect model: A model in which some factors are fixed and some are
random, with a random error effect, is called a mixed effect model.
Hypothesis: Any assumption or statement about a population
characteristic is called a hypothesis. It may be parametric or
non-parametric.
Null hypothesis: It is the hypothesis which is tested for possible rejection
under the assumption that it is true.
Degrees of Freedom: The degrees of freedom correspond to the number
of independent deviations or contrasts that are available from the data.
They can be calculated by deducting the number of constants calculated
from the data from the number of values available.
Level of significance: This is the probability of rejecting the null
hypothesis when it is true, i.e. the size of the rejection region under H0.
It is generally denoted by the symbol α and is usually taken as 0.05 (5%)
or 0.01 (1%).
Basic assumptions for analysis of variance:
(i) All the effects of different sources of variation (e.g. treatment,
environment etc.) are additive.
(ii) Experimental errors are independent.
(iii) Experimental errors have common variance.
(iv) Experimental errors are normally distributed, at least
asymptotically, i.e. i.i.d. ~ N(0, σe²)

Analysis of variance of one-way classified data


Let there be n observations yij, grouped into t classes/treatments such
that the i-th group has ni observations, i.e.
i = 1, 2, 3, ..., t; j = 1, 2, 3, ..., ni and Σi ni = n,
and yij is the response of the j-th unit under the i-th treatment.

Layout:

                        Treatments
        1       2       ..      i       ..      t
        y11     y21     ..      yi1     ..      yt1
        y12     y22     ..      yi2     ..      yt2
        ..      ..      ..      ..      ..      ..
        y1j     y2j     ..      yij     ..      ytj
        ..      ..      ..      ..      ..      ..
        y1n1    y2n2    ..      yini    ..      ytnt
Total   T1      T2      ..      Ti      ..      Tt      Grand total = G
Mean    T1/n1   T2/n2   ..      Ti/ni   ..      Tt/nt   Grand mean = G/n

Model:
$$ y_{ij} = \mu + t_i + e_{ij} $$
where μ is a constant representing the general conditions to which all
the observations are subjected; ti is the unknown effect of the i-th class
to be estimated and the eij are independent random variables with zero
mean and constant variance σe².

Hypothesis: Under certain additional assumptions, analysis of variance
leads to testing the following hypotheses:
H0: t1 = t2 = ... = tt and H1: ti ≠ tj for at least one pair i and j

Analysis:
Step-1. Compute the Correction Factor, CF = G²/n
Step-2. Compute the Total Sum of Squares, $TSS = \sum_{i,j} y_{ij}^2 - CF$
Step-3. Compute the Treatment Sum of Squares, $TrSS = \sum_i \frac{T_i^2}{n_i} - CF$
Step-4. Compute the Error Sum of Squares, ESS = TSS - TrSS
Step-5. Prepare the ANOVA Table

Sources of variation   d.f.   SS     MSS                 Fcal      F (tab)
Treatments             t-1    TrSS   TMS = TrSS/(t-1)    TMS/EMS
Error                  n-t    ESS    EMS = ESS/(n-t)
Total                  n-1    TSS

Step-6. Compare the F values as:
If Fcal ≤ Ftab at the α level, then H0 is accepted, i.e. all treatment
effects are the same (not significant).
If Fcal > Ftab at the α level, then H1 is accepted, i.e. at least two
treatment effects are different (significant).
Step-7. If the test in the ANOVA is not significant, which means all the
treatments are equal in their effect, then stop further analysis as the
result is concluded. But if the test is significant, which means at least
two treatments differ in their effect, then proceed to compare the
differences of treatment effects by the Critical Difference (CD) or Least
Significant Difference (LSD) test.
CD Test:
i). Estimate the SE of the i-th treatment mean, $SE(m) = \sqrt{EMS / n_i}$
ii). Estimate the SE of the difference between the i-th and j-th treatment
means,
$$ SE(d) = \sqrt{EMS\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} $$
If ni = nj = r, then $SE(d) = \sqrt{\frac{2 \times EMS}{r}}$
iii). Compute CD = SE(d) x t, where t = tabulated t with error d.f. at the α level
iv). Compare the difference of any two treatment means (DTM) with the
CD value to find the significant differences between treatments. If a DTM
is less than or equal to CD, then the two treatments are not significantly
different; otherwise they are significantly different. All treatment pairs
are compared likewise.
Step-8. In order to find out the reliability of the experiment, the
coefficient of variation (CV) is computed as:
$$ CV = \frac{\sqrt{EMS}}{\text{Overall mean}} \times 100 $$
If the CV is 20% or less, it is an indication of better precision of the
experiment, and when the CV is more than 20% the experiment may be
repeated and efforts made to reduce the experimental error.

Analysis of variance of two-way classified data

Two-way ANOVA is carried out when the data are classified according to
two factors (two-way variability). For example, treatment as the first
factor and blocking as the second factor in agricultural experiments; feed
and housing condition in poultry; learning process and education standard
in social science; tree species and agro-climatic condition, etc. Let yij
be the response due to the i-th treatment (i = 1, 2, 3, ..., t) in the
j-th block (j = 1, 2, 3, ..., r) of a trial.

Layout: Let there be t treatments with r blocks or replications for
studying the response of a characteristic, y.

                     Replication
Treatment   r1    r2    ..    rr    Total   Mean
t1          Y11   Y12   ..    Y1r   T1      T1/r
t2          Y21   Y22   ..    Y2r   T2      T2/r
..          ..    ..    ..    ..    ..      ..
tt          Yt1   Yt2   ..    Ytr   Tt      Tt/r
Total       R1    R2    ..    Rr    G       M = G/rt

Model: The model for two-way classified data with one observation per
cell is:
$$ y_{ij} = \mu + t_i + b_j + e_{ij} $$
Hypothesis: Under certain additional assumptions, analysis of variance
leads to testing the following hypotheses:
H0: t1 = t2 = ... = tt and H1: ti ≠ tj for at least one pair i and j

Analysis:
Step-1. Compute the Correction Factor, CF = G²/rt
Step-2. Compute the Total Sum of Squares, $TSS = \sum_{i,j} y_{ij}^2 - CF$
Step-3. Compute the Treatment Sum of Squares, $TrSS = \sum_i \frac{T_i^2}{r} - CF$
Step-4. Compute the Replication Sum of Squares, $RSS = \sum_j \frac{R_j^2}{t} - CF$
Step-5. Compute the Error Sum of Squares, ESS = TSS – TrSS - RSS
Step-6. Prepare the ANOVA Table

Sources of variation   d.f.         SS     MS                        Fcal      F (tab)
Replication            r-1          RSS    RMS = RSS/(r-1)           RMS/EMS
Treatments             t-1          TrSS   TMS = TrSS/(t-1)          TMS/EMS
Error                  (r-1)(t-1)   ESS    EMS = ESS/[(r-1)(t-1)]
Total                  rt-1         TSS

Step-7. Compare the F values as:
If Fcal ≤ Ftab at the α level, then H0 is accepted, i.e. all treatment
effects are the same (not significant).
If Fcal > Ftab at the α level, then H1 is accepted, i.e. at least two
treatment effects are different (significant).
Step-8. If the test in the ANOVA is not significant, which means all the
treatments are equal in their effect, then stop further analysis as the
result is concluded. But if the test is significant, which means at least
two treatments differ in their effect, then proceed to compare the
differences of treatment effects by the Critical Difference (CD) or Least
Significant Difference (LSD) test as above.
Step-9. SE of a mean, $SE(m) = \sqrt{EMS / r}$ and
SE of the difference of 2 means, $SE(d) = \sqrt{2\,EMS / r}$
Step-10. $CV = \frac{\sqrt{EMS}}{M} \times 100$

2.2. Analysis of data in completely randomized design (CRD)

The simplest design, using only two essential principles of field
experimentation, viz. replication and randomization, is the completely
randomised design (CRD). This is a one-way classification of data. In this
design the whole of the experimental material is divided into a number of
experimental units depending on the number of treatments and the number of
replications for each treatment. The treatments are then allotted randomly
to the units of the entire homogeneous material, and observations on the
different characteristics or variables of interest are recorded. This
design is useful for laboratory or green house experiments where treatment
is the only variable of interest for comparison.
Procedure:
The analysis is the same as that of one-way classification, with the same
model, assumptions, hypothesis and steps of calculation.
Model, Yij = μ + ti + eij
Where, Yij is the value of the variate in the jth replicate of the ith
treatment (i = 1, 2, ..., t; j = 1, 2, ..., ri)
μ is the general mean effect
ti is the effect due to the ith treatment
eij is the random error, which is i.i.d. ~ N(0, σe²)
Step-1. The observations recorded on a variable y can be arranged as
follows:
Arrangement of observations of CRD

                          Treatment
               1       2       3       ...     t
               Y11     Y21     Y31     ...     Yt1
               Y12     Y22     Y32     ...     Yt2
               Y13     Y23     Y33     ...     Yt3
               ---     ---     ---     ...     ---
Total          T1      T2      T3      ...     Tt      GT
No. of Repl.   r1      r2      r3      ...     rt      n
Treat mean     T1/r1   T2/r2   T3/r3   ...     Tt/rt

Step-2. The hypothesis to be tested is:
H0: t1 = t2 = ... = tt and H1: ti ≠ tj for at least one pair i and j
Step-3. Analysis of data
i). Correction Factor (C.F.) = (GT)²/n
ii). Total Sum of Squares (TSS) = ΣΣ Y²ij – C.F.
= (Y²11 + Y²12 + ... + Y²t,rt) – C.F.
iii). Treatment Sum of Squares (TrSS) = (T1²/r1 + T2²/r2 + ... + Tt²/rt) – C.F.
iv). Error Sum of Squares (ESS) = TSS – TrSS

Step-4. Preparation of ANOVA table

Sources of variation   d.f.   SS     MSS                 Fcal      F (tab)
Treatments             t-1    TrSS   TMS = TrSS/(t-1)    TMS/EMS
Error                  n-t    ESS    EMS = ESS/(n-t)
Total                  n-1    TSS

Step-5. If the calculated value of F is greater than the table value
F(α; t-1, n-t), where α denotes the level of significance, the hypothesis
H0 is rejected and it can be inferred that some or all of the treatment
effects are significantly different.

Step-6. Calculation of standard errors and CD value for pair comparison:

(a). Estimated SE of the ith treatment mean, $SE(m) = \sqrt{EMS / r_i}$
(b). Estimated SE of the difference between the i-th and j-th treatment
means is
$$ SE(d) = \sqrt{EMS\left(\frac{1}{r_i} + \frac{1}{r_j}\right)} $$
If ri = rj = r, then $SE(d) = \sqrt{\frac{2 \times EMS}{r}}$
(c). CD = SE(d) x t
(d). The treatment means are arranged according to their ranks in
descending order. Using the CD value the bar chart is completed to
interpret the treatment comparisons.

CRD with unequal replications

Problem-21. A varietal trial on green gram was conducted in a green
house under CRD with five varieties V1, V2, V3, V4 and V5, replicated
3, 4, 5, 4 and 4 times, respectively. The data recorded on grain yield are
presented below.
Grain yield of green gram (kg/pot)

Varieties   V1      V2      V3      V4      V5
            1.6     2.5     1.3     2.0     1.6
            1.2     2.2     0.9     1.5     1.0
            1.5     2.4     0.8     1.6     0.8
            --      1.9     1.1     1.4     0.9
            --      --      1.0     --      --
Total       4.3     9.0     5.1     6.5     4.3
Repl        3       4       5       4       4
Mean        1.43    2.25    1.02    1.62    1.08
Variance    0.043   0.070   0.037   0.069   0.129

Analyse the data and find the variety with the highest grain yield.
Solution:

Step-1. Null hypothesis H0: T1 = T2 = T3 = T4 = T5, i.e. all varieties
give the same yield;
H1: at least two of the Ti differ, i.e. the varieties do not all give the
same yield.
Step-2. Calculation
i). C.F. = (29.2)²/20 = 42.6320
ii). TSS = [(1.6)² + (1.2)² + ... + (0.9)²] – C.F. = 47.840 – 42.632 = 5.208
iii). SS due to treatments (varieties) = TrSS or VSS
$$ = \left[\frac{(4.3)^2}{3} + \frac{(9.0)^2}{4} + \frac{(5.1)^2}{5} + \frac{(6.5)^2}{4} + \frac{(4.3)^2}{4}\right] - C.F. = 46.8003 - 42.6320 = 4.1683 $$
iv). ESS = TSS - VSS = 5.2080 – 4.1683 = 1.0397
Step-3. Construction of ANOVA table

Sources of variation   d.f.   SS       MSS      Fcal       F0.01
Variety                4      4.1683   1.0421   15.037**   4.893
Error                  15     1.0397   0.0693
Total                  19     5.2080
** Significant at 1% level

Step-4. Since the observed value of F is greater than the 1% tabulated F
value, the null hypothesis is rejected. It indicates that some of the
treatment pairs are different. So, the C.D. test is required for pairwise
comparison.
Step-5. Calculation of SE for V1 and V2
$$ SE(d) = \sqrt{EMS\left(\frac{1}{r_1} + \frac{1}{r_2}\right)} = \sqrt{0.0693\left(\frac{1}{3} + \frac{1}{4}\right)} = \sqrt{0.040423} = 0.2011 $$
The table value of t for α = 0.05 and 15 d.f. is 2.131
Hence, CD = (2.131) × (0.2011) = 0.4285
Similarly, the CD values of the other pairs are:
V1 and V3 = 0.4096,
V1 and V4; V1 and V5 = 0.4285
V2 and V3; V3 and V4; V3 and V5 = 0.3763
V2 and V4; V2 and V5; V4 and V5 = 0.3966.
Comparison of the differences between the mean yields of the varieties
with the corresponding CD values results in the following bar chart
(varieties arranged in descending order of mean yield).
V2 V4 V1 V5 V3
Here the pairs V4 and V1, V1 and V5, and V5 and V3 have mean differences
not exceeding their CD values and so are not significantly different.
Conclusion: It is concluded that the variety V2 is the best variety in giving
the highest grain yield, followed by V1 & V4 and then V3 & V5.
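The CRD analysis with unequal replications can be reproduced with a short Python sketch. The function name `one_way_anova` and the dictionary layout are illustrative choices; the tabulated t value (2.131 for 15 d.f. at the 5% level) is copied from the solution above, not computed.

```python
import math
from itertools import combinations

# Grain yield of green gram (kg/pot), Problem-21
yields = {
    "V1": [1.6, 1.2, 1.5],
    "V2": [2.5, 2.2, 2.4, 1.9],
    "V3": [1.3, 0.9, 0.8, 1.1, 1.0],
    "V4": [2.0, 1.5, 1.6, 1.4],
    "V5": [1.6, 1.0, 0.8, 0.9],
}

def one_way_anova(groups):
    """Sums of squares, mean squares and F for one-way classified data."""
    n = sum(len(obs) for obs in groups.values())
    t = len(groups)
    grand_total = sum(sum(obs) for obs in groups.values())
    cf = grand_total ** 2 / n                                   # correction factor
    tss = sum(y * y for obs in groups.values() for y in obs) - cf
    trss = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - cf
    ess = tss - trss
    tms, ems = trss / (t - 1), ess / (n - t)
    return {"CF": cf, "TSS": tss, "TrSS": trss, "ESS": ess,
            "TMS": tms, "EMS": ems, "F": tms / ems, "df": (t - 1, n - t)}

result = one_way_anova(yields)
print({key: (round(val, 4) if isinstance(val, float) else val)
       for key, val in result.items()})

T_TAB = 2.131  # tabulated t for 15 d.f. at the 5% level, taken from the text
means = {name: sum(obs) / len(obs) for name, obs in yields.items()}

# Pairwise CD test: each pair has its own CD because replications differ
for a, b in combinations(yields, 2):
    se_d = math.sqrt(result["EMS"] * (1 / len(yields[a]) + 1 / len(yields[b])))
    cd = T_TAB * se_d
    diff = abs(means[a] - means[b])
    verdict = "significant" if diff > cd else "not significant"
    print(a, b, "diff =", round(diff, 3), "CD =", round(cd, 4), verdict)
```

Because each pair of varieties has its own replication numbers, the CD is recomputed inside the loop rather than once for the whole experiment.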
Exercise: The following data are from a laboratory experiment in which
observations were made on the mycelial growth of different Rhizoctonia
solani isolates on PDA medium:

R. solani isolates   Mycelial growth
                     Repl. 1   Repl. 2   Repl. 3
RS-1                 29.0      28.0      29.0
RS-2                 33.5      31.5      29.0
RS-3                 26.5      30.0      ----
RS-4                 48.5      46.5      49.0
RS-5                 34.5      31.0      ----

Analyse the data and draw conclusions on the significance of the
differences among the Rhizoctonia solani isolates.

CRD with equal replications

Problem-22. In order to find out the yielding abilities of five varieties of
sesamum, an experiment was conducted in a poly house using a CRD with
four plots per variety. The observations are given in the table below.
Seed yield of sesamum (g/plot)

            Varieties
        1       2       3       4       5
        25      25      24      20      14
        21      28      24      17      15
        21      24      16      16      13
        18      25      21      19      11
Total   85      102     85      72      53
Mean    21.25   25.5    21.25   18.0    13.25

Analyse the data and draw conclusions on the varietal performance of the
different sesamum varieties.
Solution:
Step-1. Null hypothesis H0: V1 = V2 = ... = V5, H1: at least 2 varieties
are different.
Step-2. Calculation
(i). C.F. = (397)²/20 = 7880.45
(ii). TSS = [(25)² + (21)² + ... + (11)²] – C.F. = 8307 – 7880.45 = 426.55
(iii). Varieties SS = VSS = (1/4)(85² + 102² + ... + 53²) – C.F.
= 8211.75 – 7880.45 = 331.30
(iv). ESS = 426.55 – 331.30 = 95.25
Step-3. Construction of ANOVA table

Sources of variation   d.f.   SS       MSS      Fcal        Ftab
Varieties              4      331.30   82.825   13.043 **   4.893
Error                  15     95.25    6.350
Total                  19     426.55
** Significant at 1% level.

Step-4. Since the observed value of F is greater than the 1% tabulated
value, the null hypothesis is rejected. So, we proceed to the CD test.
$$ SE(d) = \sqrt{\frac{2 \times 6.350}{4}} = 1.7819 $$
The table value of t for α = 0.05 and 15 d.f. is 2.131
Hence, CD = (2.131) × (1.7819) = 3.7972 = 3.80
The arrangement of treatments according to their ranks (the bar chart)
will be: V2 V1 V3 V4 V5
Here V1, V3 and V4 do not differ significantly from one another
(differences of 0 and 3.25 are below the CD), V2 is significantly
superior to all other varieties and V5 is significantly inferior.
Conclusion: From the analysis, it is concluded that the variety V2 is the
best.
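With equal replications a single CD serves all pairs, so the bar chart can be built mechanically. The sketch below recomputes the ANOVA of Problem-22 and then groups the ranked means into "bars" of varieties whose range does not exceed the CD. The grouping loop is one simple way of drawing the bars and is an assumption of this sketch rather than a prescribed procedure; the t value 2.131 is copied from the text.

```python
import math

# Seed yield of sesamum (g/plot), Problem-22; r = 4 plots per variety
plots = {
    "V1": [25, 21, 21, 18],
    "V2": [25, 28, 24, 25],
    "V3": [24, 24, 16, 21],
    "V4": [20, 17, 16, 19],
    "V5": [14, 15, 13, 11],
}
r = 4
t = len(plots)
n = r * t

grand_total = sum(sum(obs) for obs in plots.values())
cf = grand_total ** 2 / n
tss = sum(y * y for obs in plots.values() for y in obs) - cf
vss = sum(sum(obs) ** 2 for obs in plots.values()) / r - cf
ess = tss - vss
ems = ess / (n - t)
f_cal = (vss / (t - 1)) / ems

T_TAB = 2.131                                 # t for 15 d.f. at 5%, from the text
se_d = math.sqrt(2 * ems / r)
cd = T_TAB * se_d
cv = math.sqrt(ems) / (grand_total / n) * 100  # coefficient of variation (%)

means = {name: sum(obs) / r for name, obs in plots.items()}
ranked = sorted(means, key=means.get, reverse=True)   # descending order of mean

# Each bar is a maximal run of ranked varieties whose range is within the CD
bars = []
for i, top in enumerate(ranked):
    j = i
    while j + 1 < len(ranked) and means[top] - means[ranked[j + 1]] <= cd:
        j += 1
    run = ranked[i:j + 1]
    if not bars or not set(run) <= set(bars[-1]):
        bars.append(run)

print("F =", round(f_cal, 3), " CD =", round(cd, 2), " CV% =", round(cv, 2))
print("Ranking:", ranked)
print("Bars (not significantly different within a bar):", bars)
```

Varieties appearing in the same bar are not significantly different from one another at the chosen level.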

Exercise: The data represent a set of observations on wood density
obtained from a randomly collected set of 7 stems of each of five cane
species.

             Species
Stem    1      2      3      4      5
1       0.58   0.53   0.49   0.53   0.57
2       0.54   0.63   0.55   0.61   0.64
3       0.38   0.68   0.58   0.53   0.63
4       0.32   0.55   0.54   0.47   0.68
5       0.52   0.45   0.41   0.41   0.61
6       0.41   0.59   0.63   0.58   0.74
7       0.47   0.65   0.58   0.44   0.71
Analyse the data and draw conclusions on the differences among the cane
species.

2.3. Analysis of data in randomised complete block design (RCBD or RBD) with one observation per cell

In order to control variability in one direction in the experimental
material, it is desirable to divide the experimental units into
homogeneous groups of units called blocks, laid out perpendicular to the
direction of variability. The treatments are randomly allocated within
each of these blocks. This procedure gives an arrangement of ‘t’
treatments in ‘r’ blocks such that each treatment occurs precisely once
in each block.

Procedure:

Department of Agricultural Statistics, OUAT Page-57


UG Practical Manual on Statistics

The analysis of a Randomised Complete Block Design is the one


similar to analysis of a two-way classified data. For analysis of this design
we use the linear additive model,
Yij =   t i  r j  eij

Where,  = the overall mean; ti = the ith treatment effect


rj=the jth replication effect, and
eij = the error term iid~ N (0.e2)

Step-1. The observations from a RBD can be arranged as follows:

Arrangement of data in RBD with t treatments and r replications

Replication
Treatment Total
1 2 3 …………. r
1 Y11 Y12 Y13 .………… Y1r T1
2 Y21 Y22 Y23 ..………. Y2r T2
3 Y31 Y32 Y33 .……….. Y3r T3
.……….. .……….. .……….. .……….. .……….. .……….. .………..
t Yt1 Yt2 Yt3 .……….. Ytr Tt
Total R1 R2 R3 ..……….. Rt GT

Step-2. The data can be analysed as:


(i). C.F. = (GT)2/rt
(ii). Total SS=TSS =  yij2 – C.F.
I
(iii). Replication SS= RSS=  R j  C.F.
2

t
I
(iv). Treatment SS= TrSS =  Ti  C.F.
2

r
(v). Error SS=ESS = TSS – RSS – TrSS
Step-3. We are interested in testing the hypothesis
$H_0: t_1 = t_2 = \dots = t_t$, against the alternative that at least two
of the $t_i$'s are not equal.
Step-4. ANOVA table
Sources of variation   d.f.          SS     MSS    Fcal        F(tab)
Replication            r-1           RSS    RMS    RMS/EMS
Treatment              t-1           TrSS   TMS    TMS/EMS
Error                  (r-1)(t-1)    ESS    EMS
Total                  rt-1          TSS
Step-5. If F-test shows that there is no significant difference between
replications, it indicates that RBD will not contribute to precision in
detecting treatment differences. In such situations the adoption of RBD in
preference to CRD is not advantageous.

Step-6. If the F-test shows a significant difference between treatments,
the critical difference (CD) can be used for comparing pairs of
treatments. The CD is given by:
$CD = t_\alpha \times SE(d)$
where $t_\alpha$ = table value of t at the $\alpha$ (0.01 or 0.05) level of
significance and error degrees of freedom, and
$SE(d) = \sqrt{\frac{2\,EMS}{r}}$
Based on the CD value the bar chart can be drawn and conclusions can be
written.
Problem-23. The plan and yield (kg/plot) of six paddy strains (A, B, C, D,
E, F) in a RBD experiment with four replications are shown below.
Block-I    Block-II   Block-III  Block-IV
A (12)     B (4)      B (7)      F (8)
E (14)     C (6)      C (9)      A (18)
C (11)     E (11)     D (9)      C (10)
D (7)      A (16)     E (15)     B (6)
B (5)      D (8)      F (12)     D (8)
F (10)     F (9)      A (14)     E (12)
(Figures in parentheses are yield observations)
Analyse the data and draw conclusions on paddy strains for yield
performance.
Solution:

Step-1. Null hypothesis H0 : TA = TB= ….= TF (All the varieties have the
same mean yield); H1 : At least 2 strains are different
Step-2. The data can be arranged in the following two-way classification.
Paddy yield (in kg/plot)
Treatment    Replication or Blocks        Treatment   Mean
             I      II     III    IV      Total
A            12     16     14     18      60          15
B            5      4      7      6       22          5.5
C            11     6      9      10      36          9
D            7      8      9      8       32          8
E            14     11     15     12      52          13
F            10     9      12     8       39          9.75
Rep. Total   59     54     66     62      GT=241

Step-3. Calculation. Here, N = r x t = 4 x 6 = 24.
(i). Correction factor, CF = $\frac{(GT)^2}{N} = \frac{(241)^2}{24} = 2420.04$
(ii). Total SS = TSS = $(12^2 + \dots + 8^2) - CF = 2717 - 2420.04 = 296.96$
(iii). Replication or Block SS = RSS =
$\frac{59^2 + 54^2 + 66^2 + 62^2}{6} - CF = 2432.83 - 2420.04 = 12.79$
(iv). Variety SS = VSS =
$\frac{60^2 + \dots + 39^2}{4} - CF = 2657.25 - 2420.04 = 237.21$
(v). Error SS = ESS = TSS - RSS - VSS = 296.96 - 12.79 - 237.21 = 46.96

Step-4. Construction of ANOVA Table

Sources of variation   d.f.         SS       MSS     Fcal       Ftab 5%   Ftab 1%
Block                  (r-1)=3      12.79    4.26    1.36 ns
Variety                (t-1)=5      237.21   47.44   15.15 **   2.90      4.56
Error                  15           46.96    3.13
Total                  (rt-1)=23    296.96
ns: not significant; **: significant at 1% level
Step-5. Since the calculated F value for variety is greater than the table
value of F for 5 and 15 d.f. at the 1% level, the conclusion is that the
varieties differ significantly at the 1% level, i.e. the varietal
differences are highly significant.
Step-6. Critical difference,
$CD = \sqrt{\frac{2 \times EMS}{r}} \times t_{0.05}\ \text{for 15 d.f.} = \sqrt{\frac{2 \times 3.13}{4}} \times 2.131 = 1.251 \times 2.131 = 2.67$
Step-7. The treatments arranged according to the rank of their mean
yields, from which the bar chart is prepared, are as follows:
Varieties:  A     E     F      C     D     B
Means:      15    13    9.75   9     8     5.5
Conclusion: The bar chart shows that varieties A and E are superior to B
and to C, D and F, while C, D and F are at par with one another with
respect to the yield performance of these 6 paddy strains.
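
The RBD computations of Problem-23 can also be checked by machine. The following is a minimal Python sketch, not part of the original manual: the function names `rbd_anova` and `critical_difference` are illustrative assumptions, and the table value of t (2.131 for 15 d.f.) is simply passed in from the solution above. It works without intermediate rounding, so its values may differ in the last digits from hand-rounded figures.

```python
import math

# Yields (kg/plot) from Problem-23: one list per strain, ordered by block I-IV.
yields = {
    "A": [12, 16, 14, 18],
    "B": [5, 4, 7, 6],
    "C": [11, 6, 9, 10],
    "D": [7, 8, 9, 8],
    "E": [14, 11, 15, 12],
    "F": [10, 9, 12, 8],
}

def rbd_anova(data):
    """Sums of squares, error mean square and F ratios for an RBD
    with one observation per cell (hypothetical helper)."""
    t = len(data)                        # number of treatments
    r = len(next(iter(data.values())))   # number of replications
    values = [y for row in data.values() for y in row]
    cf = sum(values) ** 2 / (r * t)      # correction factor
    tss = sum(y * y for y in values) - cf
    rep_totals = [sum(row[j] for row in data.values()) for j in range(r)]
    trt_totals = {k: sum(row) for k, row in data.items()}
    rss = sum(R * R for R in rep_totals) / t - cf
    trss = sum(T * T for T in trt_totals.values()) / r - cf
    ess = tss - rss - trss
    ems = ess / ((r - 1) * (t - 1))
    return {
        "CF": cf, "TSS": tss, "RSS": rss, "TrSS": trss, "ESS": ess,
        "EMS": ems,
        "F_rep": (rss / (r - 1)) / ems,
        "F_trt": (trss / (t - 1)) / ems,
        "means": {k: T / r for k, T in trt_totals.items()},
        "r": r,
    }

def critical_difference(ems, r, t_table):
    """CD = t_table x sqrt(2 EMS / r)."""
    return t_table * math.sqrt(2 * ems / r)

result = rbd_anova(yields)
cd = critical_difference(result["EMS"], result["r"], t_table=2.131)
for name in ("TSS", "RSS", "TrSS", "ESS", "F_rep", "F_trt"):
    print(name, round(result[name], 2))
print("CD", round(cd, 2))
```

Any pair of strain means in `result["means"]` whose difference exceeds `cd` is then declared significantly different.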
Exercise: In a field experiment laid out under RCBD, data were recorded on
seven provenances of Gmelina arborea for the girth at breast-height (gbh)
of the trees 6 years after planting.
gbh (cm) of trees in plots 6 years after planting

Treatment (Provenance)   Replication
                         I        II       III
1                        30.85    38.01    35.10
2                        30.24    28.43    35.93
3                        30.94    31.64    34.95
4                        29.89    29.12    36.75
5                        21.52    24.07    20.76
6                        25.38    32.14    32.19
7                        22.89    19.66    26.92

Analyse the data and draw conclusions on treatment differences.

2.4. Analysis of data in Latin square design (LSD)

This design controls heterogeneity in two directions in the experimental
material. In this design two restrictions are imposed by forming blocks in
two perpendicular directions, row-wise and column-wise. Treatments are
allotted in such a way that every treatment occurs once and only once in
each row and each column. Thus, a Latin square of 't' treatments is an
arrangement of t x t = t^2 cells such that every row and every column
contains every treatment precisely once. By this arrangement the error
variation can be reduced considerably further.

Procedure:

For analysis of these designs we use the linear additive model

$y_{ijk} = \mu + r_i + c_j + t_k + e_{ijk}$

where $y_{ijk}$ is the observation on the $k$th treatment in the $i$th row
and $j$th column ($i = 1, 2, \dots, s$; $j = 1, 2, \dots, s$;
$k = 1, 2, \dots, s$), $\mu$ is the general mean effect, $r_i$ is the effect
due to the $i$th row, $c_j$ is the effect due to the $j$th column, $t_k$ is
the effect due to the $k$th treatment and $e_{ijk}$ is the random error
component, which is assumed to be independently and identically normally
distributed with mean zero and a constant variance $\sigma_e^2$.

Analysis:

Let there be s treatments arranged in s rows and s columns. Then compute:
(i). $R_i$ = total of the $i$th row = $\sum_j y_{ijk}$
(ii). $C_j$ = total of the $j$th column = $\sum_i y_{ijk}$
(iii). $T_k$ = total of the $k$th treatment in the design
(iv). C.F. = $\frac{(GT)^2}{s^2}$, where GT is the grand total
(v). TSS (Total Sum of Squares) = $\sum_i \sum_j y_{ijk}^2 - \text{C.F.}$
(vi). RSS (Row Sum of Squares) = $\sum_i \frac{R_i^2}{s} - \text{C.F.}$
(vii). CSS (Column Sum of Squares) = $\sum_j \frac{C_j^2}{s} - \text{C.F.}$
(viii). TrSS (Treatment Sum of Squares) = $\sum_k \frac{T_k^2}{s} - \text{C.F.}$
(ix). ESS (Error Sum of Squares) = TSS - RSS - CSS - TrSS
(x). Hypothesis $H_0: t_1 = t_2 = \dots = t_s$ against $H_1$: the $t_k$'s are
not all equal
(xi). ANOVA Table

Sources     d.f.          SS     MSS                               Fcal
Row         (s-1)         RSS    $S_r^2$ = RSS/(s-1)
Column      (s-1)         CSS    $S_c^2$ = CSS/(s-1)
Treatment   (s-1)         TrSS   $S_t^2$ = TrSS/(s-1)              $S_t^2 / S_e^2$
Error       (s-1)(s-2)    ESS    $S_e^2$ = ESS/((s-1)(s-2))
Total       (s^2-1)       TSS

If the calculated value of F for treatment is greater than the table value
of F for (s-1) and (s-1)(s-2) d.f., the hypothesis $H_0$ is rejected. We can
infer that the treatment effects are significantly different. To detect
the differences, the CD test is performed.
The estimated SE of the difference between the $i$th and $j$th treatment
means is
$SE(d) = \sqrt{\frac{2 S_e^2}{s}}$
The critical difference (CD) can be calculated as
CD = SE(d) x t at error d.f.
The degrees of freedom for t are those for error. The treatment means are
computed as $T_k / s$ ($k = 1, 2, \dots, s$). These means can be compared
with the help of the CD value. Any two treatment means are said to differ
significantly if their difference is larger than the CD value.

Problem-24. An experiment was carried out on Sorghum with 5 varieties
(A, B, C, D & E) in a (5 x 5) LSD. The plan and grain yield (kg/plot) are
given below:

Rows           Columns                                       Row total
               I        II       III      IV       V
I              B (6)    A (11)   E (8)    D (6)    C (5)     36
II             A (9)    D (9)    C (4)    E (14)   B (10)    46
III            C (3)    B (8)    D (7)    A (12)   E (8)     38
IV             E (10)   C (5)    A (10)   B (7)    D (10)    42
V              D (8)    E (15)   B (9)    C (3)    A (18)    53
Column total   36       48       38       42       51        215
(Figures in parentheses are yield observations of the respective treatments)
Perform the ANOVA and compare the variety mean yields.


Solution:
Step-1. Hypothesis:
$H_0: \mu_A = \mu_B = \mu_C = \mu_D = \mu_E$
$H_1$: at least two of the variety means $\mu_A, \dots, \mu_E$ differ
Step-2. Yield (kg/plot) of varieties and their totals

A B C D E
11 6 5 6 8
9 10 4 9 14
12 8 3 7 8
10 7 5 10 10
18 9 3 8 15
Tk 60 40 20 40 55

Variety totals are: A=60, B=40; C=20; D=40; E= 55


Step-3. Calculation
(i). Grand total, GT = 215; total no. of observations, N = 25
(ii). No. of varieties, s = 5
(iii). Correction factor, C.F. = $\frac{(GT)^2}{N} = \frac{(215)^2}{25} = 1849$
(iv). Total Sum of Squares = TSS =
$\sum_i \sum_j y_{ijk}^2 - \text{C.F.} = (6^2 + 11^2 + \dots + 18^2) - 1849 = 2163 - 1849 = 314$
(v). Row Sum of Squares = RSS =
$\sum_i \frac{R_i^2}{s} - \text{C.F.} = \frac{36^2 + 46^2 + 38^2 + 42^2 + 53^2}{5} - 1849 = 1885.8 - 1849 = 36.8$
(vi). Column Sum of Squares = CSS =
$\sum_j \frac{C_j^2}{s} - \text{C.F.} = \frac{36^2 + 48^2 + 38^2 + 42^2 + 51^2}{5} - 1849 = 1881.8 - 1849 = 32.8$
(vii). Variety Sum of Squares = VrSS =
$\sum_k \frac{T_k^2}{s} - \text{C.F.} = \frac{60^2 + 40^2 + 20^2 + 40^2 + 55^2}{5} - 1849 = 2045 - 1849 = 196$
(viii). Error SS = ESS = TSS - RSS - CSS - VrSS
= 314 - 36.8 - 32.8 - 196 = 48.4
Step-4. Construction of ANOVA Table

Source of variation   df   SS      MSS    Fcal                     Ftab 5%   Ftab 1%
Rows                  4    36.8    9.2    (9.2/4.03) = 2.28 ns
Columns               4    32.8    8.2    (8.2/4.03) = 2.03 ns
Variety               4    196.0   49.0   (49/4.03) = 12.15 **     3.26      5.41
Error                 12   48.4    4.03
Total                 24   314
ns: not significant; **: significant at 1% level
Step-5. Comparing the F ratios for rows, columns and varieties with the
table value of F (for 4 and 12 d.f.), it is found that only the differences
among varietal means are highly significant.
Step-6. CD at 5% = SE(d) x t0.05 for 12 d.f.
= $\sqrt{\frac{2 \times 4.03}{5}} \times 2.18 = 1.27 \times 2.18 = 2.77$
The variety means are arranged according to their ranks, and the bar chart
is prepared by comparing the differences between means with the CD value.

Variety   A    E    B    D    C
Means     12   11   8    8    4
and the bar coding is: AE BD C
Conclusion: The analysis reveals that varietal differences are present.
Varieties A & E are at par; varieties B & D are also at par; but C differs
significantly from all the others in yield. Varieties A & E are the
best varieties for yield performance.
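
As a cross-check on Problem-24, the LSD sums of squares and the CD comparison can be sketched in Python. This is an illustrative sketch rather than the manual's own method: the function name `lsd_anova` and the variable names are assumptions, and the t value of 2.18 (12 d.f., 5% level) is taken from the solution above.

```python
import math
from itertools import combinations

# Problem-24 layout: (variety, yield in kg/plot) for each cell, row by row.
layout = [
    [("B", 6), ("A", 11), ("E", 8), ("D", 6), ("C", 5)],
    [("A", 9), ("D", 9), ("C", 4), ("E", 14), ("B", 10)],
    [("C", 3), ("B", 8), ("D", 7), ("A", 12), ("E", 8)],
    [("E", 10), ("C", 5), ("A", 10), ("B", 7), ("D", 10)],
    [("D", 8), ("E", 15), ("B", 9), ("C", 3), ("A", 18)],
]

def lsd_anova(square):
    """Sums of squares, error mean square and variety F ratio
    for an s x s Latin square (hypothetical helper)."""
    s = len(square)
    values = [y for row in square for _, y in row]
    cf = sum(values) ** 2 / s ** 2                     # correction factor
    row_totals = [sum(y for _, y in row) for row in square]
    col_totals = [sum(row[j][1] for row in square) for j in range(s)]
    trt_totals = {}
    for row in square:
        for name, y in row:
            trt_totals[name] = trt_totals.get(name, 0) + y
    tss = sum(y * y for y in values) - cf
    rss = sum(R * R for R in row_totals) / s - cf
    css = sum(C * C for C in col_totals) / s - cf
    trss = sum(T * T for T in trt_totals.values()) / s - cf
    ess = tss - rss - css - trss
    ems = ess / ((s - 1) * (s - 2))
    return {
        "TSS": tss, "RSS": rss, "CSS": css, "TrSS": trss, "ESS": ess,
        "EMS": ems, "F_trt": (trss / (s - 1)) / ems,
        "means": {k: T / s for k, T in trt_totals.items()},
    }

res = lsd_anova(layout)
cd = 2.18 * math.sqrt(2 * res["EMS"] / len(layout))   # CD at 5%

# Rank varieties by mean and list the pairs whose difference does not exceed CD.
ranked = sorted(res["means"], key=res["means"].get, reverse=True)
at_par = [(a, b) for a, b in combinations(ranked, 2)
          if abs(res["means"][a] - res["means"][b]) <= cd]
print(ranked, at_par, round(cd, 2))
```

The pairs collected in `at_par` correspond to the varieties joined by a common bar in the bar coding.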
Exercise: In a varietal trial on paddy to test the yielding ability of 5
varieties (A,B,C,D,E), an experiment was laid out in a 5x5 LSD. The
results are given below.
Grain yield of paddy (kg/plot)
D 39.0 A 24.1 E 26.1 B 37.0 C 42.2
E 21.2 B 38.1 A 24.0 C 39.3 D 33.1
C 35.6 E 33.5 B 38.1 D 40.8 A 24.2
A 30.8 C 31.1 D 46.7 E 28.7 B 44.9
B 44.3 D 29.6 C 41.1 A 26.3 E 24.4
Analyse the data and draw conclusion on yielding ability of paddy
varieties.

2.5. Missing plot technique in design of Experiments

Statistical concept: In agricultural field experiments, the experimenter
often encounters the situation that the observations of a particular plot
are lost, or are so much affected by some extraneous cause that it would
not be desirable to regard them as normal experimental observations. Such
data are generally analysed through the missing plot technique.
Statistical analysis of designs in which the observation on one or more
plots is missing is somewhat complicated, because the initially
symmetrical distribution of plots among the different treatments and
among the different blocks is disturbed. The analysis of such experiments,
however, can be carried out by one of the following methods.

(a) Estimating the missing value(s) using the Principle of least squares
i.e. minimizing the error sum of squares.
(b) Method of interaction
(c) Method of fitting constants, and
(d) Analysis of the data with missing observation by the technique of
analysis of covariance.

In the following, we shall use the first method of analysis of data with
one missing observation.

2.6. Analysis of data in RCBD with one missing observation

Procedure:
When any one observation of a character under study is missing, we
first estimate the missing observation, substitute the estimated value
in its place and then proceed with the analysis. The method consists of
selecting a value 'x' for the unknown missing value such that the error
variance is kept at a minimum.
Consider a randomized block design with t treatments and r
replications in which one observation is missing.
Let x be the value of the missing observation; it is estimated as:
$x = \frac{rB' + tT' - G'}{(r-1)(t-1)}$

where,
B' = total of available values of the replication that contains the missing
value
T' = total of available values of the treatment that contains the missing
value
G' = grand total of all the available values
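
The estimate of x can be motivated by a short derivation. The following sketch is added for illustration and assumes the usual RBD error sum of squares, written as a function of the inserted value x; the constant collects every term that does not involve x.

```latex
\begin{align*}
ESS(x) &= x^2 - \frac{(B' + x)^2}{t} - \frac{(T' + x)^2}{r}
          + \frac{(G' + x)^2}{rt} + \text{constant} \\
\frac{d\,ESS}{dx} &= 2x - \frac{2(B' + x)}{t} - \frac{2(T' + x)}{r}
          + \frac{2(G' + x)}{rt} = 0 \\
\Rightarrow\quad x\,(rt - r - t + 1) &= rB' + tT' - G' \\
\Rightarrow\quad x &= \frac{rB' + tT' - G'}{(r-1)(t-1)}
\end{align*}
```

The third line follows on multiplying the second by rt/2, and rt - r - t + 1 factorises as (r-1)(t-1).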
The analysis is then carried out as usual after substituting the
estimated value for the missing observation, with the following changes.
i). The d.f. for error and for total are each reduced by 1.
ii). The Treatment Sum of Squares is corrected by subtracting the bias,
$B = \frac{(B' + tT' - G')^2}{t(t-1)(r-1)^2}$
iii). Standard error for testing the significance of the difference between
treatment means:
(a). Standard error of the difference between two treatment means not
involving the missing value:
$SE(d) = \sqrt{\frac{2 S_e^2}{r}}$
where $S_e^2$ is the Error Mean Square (EMS).
(b). Standard error of the difference between two treatment means, one
of which involves the missing value:
$SE(d) = \sqrt{\frac{EMS}{r}\left[2 + \frac{t}{(r-1)(t-1)}\right]}$
Problem-25. To find out the best source of nitrogen at 60 kg/ha, an
experiment was conducted on paddy with 5 sources of nitrogen in 4
blocks. The yield data for the different treatments are given below.

Yield of grain (kg/plot)

Blocks   Ammonium   Ammonium   Chilean   Urea    Ammonium
         Sulphate   Chloride   nitrate           Sulphate Nitrate
         S1         S2         S3        S4      S5
I        25.4       32.5       37.5      22.5    20.5
II       17.3       --         25.4      14.7    21.5
III      22.4       28.4       30.1      23.5    23.5
IV       30.5       33.4       34.5      22.4    28.5

The observation relating to the application of Ammonium Chloride in the
second block is missing. Estimate the missing value and analyse the data.
Solution:
Step-1. Prepare the following two-way table between treatments and
blocks, treating the yield corresponding to S2 in the second block as
missing.

Treatment x Block table

Blocks   Treatments                                   Total
         1       2       3        4       5
I        25.4    32.5    37.5     22.5    20.5       138.4
II       17.3    --      25.4     14.7    21.5       78.9
III      22.4    28.4    30.1     23.5    23.5       127.9
IV       30.5    33.4    34.5     22.4    28.5       149.3
Total    95.6    94.3    127.5    83.1    94.0       494.5

Step-2. Estimate the missing value, x,
$x = \frac{rB' + tT' - G'}{(r-1)(t-1)} = \frac{(4 \times 78.9) + (5 \times 94.3) - 494.5}{3 \times 4} = 24.4$

Step-3. Insert the estimated missing value and carry out the analysis of
variance according to the usual procedure of RBD, except for subtracting 1
d.f. from the d.f. for total S.S. as well as from the d.f. for error S.S.
Step-4. Calculation of sums of squares
C.F. = $\frac{(GT)^2}{rt} = \frac{(518.9)^2}{20} = 13462.86$
Total S.S. = TSS = $\sum_i \sum_j y_{ij}^2 - \text{C.F.} = 14121.21 - 13462.86 = 658.35$
Block S.S. = BSS = $\sum_i \frac{B_i^2}{t} - \text{C.F.} = 13694.87 - 13462.86 = 232.01$
Treatment S.S. = TrSS = $\sum_j \frac{T_j^2}{r} - \text{C.F.} = 13806.73 - 13462.86 = 343.87$
Error S.S. = ESS = TSS - BSS - TrSS
= 658.35 - 232.01 - 343.87 = 82.47

While the error mean square is an unbiased estimate of the error
variance, the treatment S.S. is an overestimate and has to be corrected
by subtracting from it a bias, B:
$B = \frac{(B' + tT' - G')^2}{t(t-1)(r-1)^2} = \frac{(78.9 + 5 \times 94.3 - 494.5)^2}{5 \times 4 \times 9} = 17.36$

Corrected Treatment S.S. = 343.87 - 17.36 = 326.51


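Step-5. ANOVA Table
Sources                    d.f.   SS       MSS     F
Blocks                     3      232.01   77.34   10.31 **
Treatments (uncorrected)   4      343.87   85.97   11.46 **
Treatments (corrected)     4      326.51   81.63   10.88 **
Error                      11     82.47    7.50    --
Total (uncorrected)        18     658.35   --      --
** Significant at 1% level.
The corrected treatment mean square is tested against the same error mean
square (7.50), since only the treatment S.S. is biased by the insertion of
the estimated value.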
Step-6. Calculation of standard errors
(a). Standard error of the difference between two treatment means not
involving the missing value:
$SE(d) = \sqrt{\frac{2 S_e^2}{r}} = \sqrt{\frac{2 \times 7.50}{4}} = 1.936$ kg/plot
(b). Standard error of the difference between two treatment means, one
of which has a missing value:
$SE(d) = \sqrt{\frac{EMS}{r}\left[2 + \frac{t}{(r-1)(t-1)}\right]} = \sqrt{\frac{7.50}{4}\left[2 + \frac{5}{3 \times 4}\right]} = 2.13$ kg/plot
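
The missing-plot arithmetic of Problem-25 can be verified with a few lines of Python. This is a hedged sketch: the helper names `rcbd_missing_value`, `rcbd_bias` and `rcbd_se_diff` are illustrative assumptions, and the error mean square of 7.50 is taken from the ANOVA table above rather than recomputed.

```python
import math

def rcbd_missing_value(r, t, block_total, trt_total, grand_total):
    """Least-squares estimate x = (rB' + tT' - G') / ((r-1)(t-1))."""
    return (r * block_total + t * trt_total - grand_total) / ((r - 1) * (t - 1))

def rcbd_bias(r, t, block_total, trt_total, grand_total):
    """Upward bias in the treatment SS after inserting the estimate."""
    num = (block_total + t * trt_total - grand_total) ** 2
    return num / (t * (t - 1) * (r - 1) ** 2)

def rcbd_se_diff(ems, r, t, involves_missing):
    """SE of the difference between two treatment means."""
    if involves_missing:
        return math.sqrt(ems / r * (2 + t / ((r - 1) * (t - 1))))
    return math.sqrt(2 * ems / r)

r, t = 4, 5                          # blocks and treatments in Problem-25
B, T, G = 78.9, 94.3, 494.5          # B', T', G' from the two-way table

x = rcbd_missing_value(r, t, B, T, G)
bias = rcbd_bias(r, t, B, T, G)
se_plain = rcbd_se_diff(7.50, r, t, involves_missing=False)
se_missing = rcbd_se_diff(7.50, r, t, involves_missing=True)
print(round(x, 1), round(bias, 2), round(se_plain, 3), round(se_missing, 2))
```

The same helpers apply to any RCBD with a single missing plot once B', T' and G' have been read from the data table.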
Exercise: In an experiment under RCBD for comparing the fodder yield of 7
sorghum varieties, the following data were obtained:
Fodder yield (t/ha)

Variety   Replication
          I       II      III
V1        14.5    14.0    14.0
V2        16.5    16.9    16.7
V3        x       16.7    17.4
V4        17.6    16.9    17.5
V5        18.5    17.9    17.6
V6        19.3    18.3    18.8
V7        19.5    19.0    20.0
Here the observation on V3 in replication I is missing. Analyse the data
and draw your conclusion.

2.7. Analysis of data in LSD with one missing observation

Procedure:
Step-1. Estimate the missing value, x,
$x = \frac{t(R' + C' + T') - 2G'}{(t-1)(t-2)}$
where,
t = no. of treatments
R' = total of available values of the row containing the missing value
C' = total of available values of the column containing the missing value
T' = total of available values of the treatment containing the missing
value
G' = grand total of all available values
Step-2. The estimated missing value, x, is then inserted and the analysis
is carried out according to the usual procedure for LSD, except for
subtracting 1 d.f. from the d.f. for total S.S. and error S.S. and
computing the corrected treatment S.S. by subtracting the bias, B, given by
$B = \frac{[G' - R' - C' - (t-1)T']^2}{[(t-1)(t-2)]^2}$
Step-3. Standard errors for testing the significance of the difference
between two treatment means are obtained as follows:
a. SE of the difference between two treatment means not involving
the missing value,
$SE(d) = \sqrt{\frac{2 S_e^2}{t}}$
where $S_e^2$ is the error mean square.
b. SE of the difference between two treatment means, one of which
has a missing value,
$SE(d) = \sqrt{S_e^2\left[\frac{2}{t} + \frac{1}{(t-1)(t-2)}\right]}$

Problem-26. The data on grain yield of paddy from a varietal trial in a
5 x 5 Latin square design are shown in the following table. The yield of
variety C is missing from the second row.
Grain yield of paddy (kg/plot)
                                                 Row total
E 26      C 42      D 39      B 37      A 24     168
A 24      D 33      E 21      C x       B 38     116
D 47      B 45      A 31      E 29      C 31     183
B 38      A 24      C 36      D 41      E 34     173
C 41      E 24      B 44      A 26      D 30     165
Column total
176       168       171       133       157      805
Analyse the data and draw your conclusion.

Solution:

Step-1. We first estimate the missing value, x, as
$x = \frac{t(R' + C' + T') - 2G'}{(t-1)(t-2)} = \frac{5(116 + 133 + 150) - 2(805)}{(5-1)(5-2)} = \frac{385}{12} \approx 32$
Step-2. On substitution of the estimated value in the missing place, we
get the corrected totals as follows:
Total of second row = 148; Total of 4th column = 165
Total of treatment C = 182; Grand total = 837
Step-3. Calculate the various sums of squares as in a normal LSD:
CF = (GT)^2/t^2 = 28022.76
Total SS = TSS = 29399.00 - CF = 1376.24
Row SS = RSS = 28154.20 - CF = 131.44
Column SS = CSS = 28063.00 - CF = 40.24
Treatment SS = TrSS = 28925.00 - CF = 902.24
Error SS = ESS = TSS - RSS - CSS - TrSS = 302.32
Step-4. Upward bias, B
$B = \frac{[G' - R' - C' - (t-1)T']^2}{[(t-1)(t-2)]^2} = \frac{[805 - 116 - 133 - 4(150)]^2}{(4 \times 3)^2} = 13.44$
Corrected treatment SS = TrSS(Adj.) = 902.24 - 13.44 = 888.80
Step-5. Construction of ANOVA Table:
ANOVA Table

Sources of variation   d.f.   SS        MS        F
Row                    4      131.44    32.86     1.196
Column                 4      40.24     10.06     <1
Treatment (Adj.)       4      888.80    222.20    8.085
Error                  11     302.32    27.4836
Total                  23     1362.80

Step-6. Estimation of standard errors (SE), with t0.05 = 2.201 for 11
error d.f.:
a. SE of the difference between two treatment means not involving the
missing value:
$SE(d) = \sqrt{\frac{2 S_e^2}{t}} = \sqrt{\frac{2 \times 27.4836}{5}} = 3.3156$
$CD = 2.201 \times 3.3156 = 7.2976$

b. SE of the difference between two treatment means, one of which
involves the missing value:
$SE(d) = \sqrt{S_e^2\left[\frac{2}{t} + \frac{1}{(t-1)(t-2)}\right]} = \sqrt{27.4836\left[\frac{2}{5} + \frac{1}{(5-1)(5-2)}\right]} = \sqrt{13.2837} = 3.6447$
$CD = 2.201 \times 3.6447 = 8.0220$
Step-7. Arrange the variety means in descending order of value and
prepare the bar chart as:
Variety   B      D      C      E      A
Means     40.4   38.0   36.4   26.8   25.8

Conclusion: For yield performance, varieties B, D & C are at par and are
the best, followed by E & A, which are at par with each other.
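
The missing-value formulas for the LSD can be checked in the same way. The sketch below is illustrative only: the helper names `lsd_missing_value`, `lsd_bias` and `lsd_se_diff` are assumptions, and the error mean square (27.4836) is taken from the ANOVA table of Problem-26 rather than recomputed.

```python
import math

def lsd_missing_value(t, row_total, col_total, trt_total, grand_total):
    """x = [t(R' + C' + T') - 2G'] / [(t-1)(t-2)]."""
    num = t * (row_total + col_total + trt_total) - 2 * grand_total
    return num / ((t - 1) * (t - 2))

def lsd_bias(t, row_total, col_total, trt_total, grand_total):
    """Upward bias in the treatment SS after inserting the estimate."""
    num = grand_total - row_total - col_total - (t - 1) * trt_total
    return num ** 2 / ((t - 1) * (t - 2)) ** 2

def lsd_se_diff(ems, t, involves_missing):
    """SE of the difference between two treatment means."""
    if involves_missing:
        return math.sqrt(ems * (2 / t + 1 / ((t - 1) * (t - 2))))
    return math.sqrt(2 * ems / t)

t = 5
R, C, T, G = 116, 133, 150, 805      # R', C', T', G' from Problem-26

x = lsd_missing_value(t, R, C, T, G)
bias = lsd_bias(t, R, C, T, G)
se_plain = lsd_se_diff(27.4836, t, involves_missing=False)
se_missing = lsd_se_diff(27.4836, t, involves_missing=True)
print(round(x, 2), round(bias, 2), round(se_plain, 4), round(se_missing, 4))
```

Multiplying either standard error by the table value of t for the error d.f. gives the corresponding CD.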

Exercise: Estimate the missing value in the following LSD layout having 4
treatments A, B, C & D and analyse the data to draw conclusions.

A 12    C 19    B 10    D 8
C 18    B 12    D 6     A --
B 22    D 10    A 5     C 21
D 12    A 7     C 27    B 17

III. SAMPLING TECHNIQUES


Essentially, sampling consists of obtaining information from only a
part of a large group or population so as to infer about the whole
population. The object of sampling is thus to secure a sample which will
represent the population and reproduce the important characteristics of
the population under study as closely as possible. The principal
advantages of sampling as compared to complete enumeration of the
population are reduced cost, greater speed, greater scope and improved
accuracy. The smaller size of the sample makes the supervision more
effective. Moreover, it is important to note that the precision of the
estimates obtained from certain types of samples can be estimated from
the sample itself. The most 'convenient' method of sampling is that in
which the investigator selects a number of sampling units considered
'representative' of the whole population.

When sampling is performed so that every unit in the population
has some chance of being selected in the sample and the probability of
selection of every unit is known, the method of sampling is called
probability sampling. An example of probability sampling is random
selection, which implies a strict process of selection equivalent to that
of drawing lots and should be clearly distinguished from haphazard
selection. In this manual, any reference to sampling, unless otherwise
stated, will relate to some form of probability sampling. The probability
that any sampling unit will be selected in the sample depends on the
sampling procedure used. The important point to note is that the precision
and reliability of the estimates obtained from a sample can be evaluated
only for a probability sample. The object of designing a sample survey is
to minimise the error in the final estimates. Even if the sample is a
probability sample, the sample being based on observations on a part of
the population cannot, in general, exactly represent the population. The
average magnitude of the sampling errors of most of the probability
samples can be estimated from the data collected. The magnitude of the
sampling errors depends on the size of the sample, the variability within
the population and the sampling method adopted. Thus, if a probability
sample is used, it is possible to predetermine the size of the sample
needed to obtain desired and specified degree of precision. A sampling
scheme is determined by the size of sampling units, number of sampling
units to be used, the distribution of the sampling units over the entire
area to be sampled, the type and method of measurement in the selected
units and the statistical procedures for analysing the survey data. A
variety of sampling methods and estimating techniques developed to
meet the varying demands of the survey statistician accord the user a
wide selection for specific situations. One can choose the method or
combination of methods that will yield a desired degree of precision at
minimum cost.

3.1. Principal steps in a sample survey

In any sample survey, we must first decide on the type of data to
be collected and determine how adequate the results should be. Secondly,
we must formulate the sampling plan for each of the characters for which
data are to be collected. We must also know how to combine the sampling
procedures for the various characters so that no duplication of field work
occurs. Thirdly, the field work must be efficiently organised with adequate
provision for supervising the work of the field staff. Lastly, the analysis of
the data collected should be carried out using appropriate statistical
techniques and the report should be drafted giving full details of the basic
assumptions made, the sampling plan and the results of the statistical
analysis.

(i) Specification of the objectives of the survey: Careful consideration
must be given at the outset to the purposes for which the survey is to be
undertaken. The characteristics on which information is to be collected
and the degree of detail to be attempted should be fixed. If it is a survey
of trees, it must be decided as to what species of trees are to be
enumerated, whether only estimation of the number of trees under
specified diameter classes or, in addition, whether the volume of trees is
also proposed to be estimated. It must also be decided at the outset what
accuracy is desired for the estimates.

(ii) Construction of a frame of units: The first requirement of a
probability sample of any nature is the establishment of a frame. A frame
is a list of sampling units which can be unambiguously defined and
identified in the population. The sampling units may be compartments,
topographical sections, strips of a fixed width or plots of a definite
shape and size. The sampling frame is compiled from secondary sources such
as the revenue department or other related offices, books, journals or
records.

(iii) Choice of a sampling design: If it is agreed that the sampling design
should be such that it provides a statistically meaningful measure of
the precision of the final estimates, then the sample should be a
probability sample, in that every unit in the population should have a
known probability of being selected in the sample. The choice of units to
be enumerated from the frame of units should be based on some
objective rule which leaves nothing to the opinion of the field worker. The
determination of the number of units to be included in the sample and the
method of selection is also governed by the allowable cost of the survey
and the accuracy in the final estimates.

(iv) Organisation of the field work: The entire success of a sampling
survey depends on the reliability of the field work. Proper selection of
the personnel, intensive training, clear instructions and proper
supervision of the fieldwork are essential to obtain satisfactory results.
The field parties should correctly locate the selected units and record
the necessary measurements according to the specific instructions given.
The supervising staff should check a part of their work in the field and
satisfy themselves that the survey is carried out in its entirety as
planned.

(v) Analysis of the data: Depending on the sampling design used and the
information collected, proper formulae should be used in obtaining the
estimates, and the precision of the estimates should be computed. The
computations should be double-checked to safeguard accuracy in the
analysis.

(vi) Preliminary survey (pilot trials): The design of a sampling scheme for
a survey requires both knowledge of the statistical theory and experience
with data regarding the nature of the study area, the pattern of variability
and operational cost. If prior knowledge in these matters is not available,
a statistically planned small scale ‘pilot survey’ may have to be conducted
before undertaking any large scale survey in that area. Such exploratory
or pilot surveys will provide adequate knowledge regarding the variability
of the material and will afford opportunities to test and improve field
procedures, train field workers and study the operational efficiency of a
design. A pilot survey will also provide data for estimating the various
components of cost of operations in a survey like time of travel, time of
location and enumeration of sampling units, etc. The above information
will be of great help in deciding the proper type of design and intensity of
sampling that will be appropriate for achieving the objects of the survey.


Sampling terminology

Population: The word population is defined as the aggregate of units from
which a sample is chosen, e.g. all the plots, trees, plants, insects,
blocks, villages or people of the study area.

Sampling units: Sampling units are all the well defined units of the
population from which a sample is to be collected.

Sampling frame: A list of sampling units of a population of units.

Sample: One or more sampling units selected from a population according
to some specified procedure constitute a sample.

Sampling intensity or sampling fraction: It is the ratio of the number of
units in the sample to the number of units in the population (n/N).

Population total: Suppose a finite population consists of units
$U_1, U_2, \dots, U_N$. Let the value of the characteristic for the $i$-th
unit $U_i$ be denoted by $y_i$. The population total of the values $y_i$
($i = 1, 2, \dots, N$) is:
$Y = \sum_{i=1}^{N} y_i$

Population mean: The arithmetic mean or average of the $y_i$ values:
$\bar{Y} = \frac{Y}{N} = \frac{1}{N}\sum_{i=1}^{N} y_i$

Population variance: A measure of the variation between units of the
population is:
$S_y^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{Y})^2$
(some texts use the divisor N - 1 instead of N). It measures the variation
among the population units: large values indicate large variation between
units, and small values indicate that the values of the characteristic for
the units are close to the population mean. The square root of the
variance is known as the standard deviation.

Coefficient of variation: The ratio of the standard deviation to the mean,
expressed in percentage, is:

$C.V. = \frac{S_y}{\bar{Y}} \times 100$

Being unitless, it is used to compare the variability of two or more
populations or sets of observations.
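
The population quantities defined above can be illustrated with a small numerical sketch. The data below are hypothetical, the function name `population_summary` is an assumption, and the variance is computed with the divisor N, which is itself an assumption since some texts divide by N - 1.

```python
import math

def population_summary(values):
    """Return total, mean, variance (divisor N), standard deviation
    and coefficient of variation (%) of a finite population."""
    n_units = len(values)
    total = sum(values)
    mean = total / n_units
    variance = sum((y - mean) ** 2 for y in values) / n_units
    sd = math.sqrt(variance)
    cv = sd / mean * 100
    return total, mean, variance, sd, cv

# Hypothetical population of N = 5 plot yields (kg/plot).
population = [12, 15, 9, 14, 10]
total, mean, variance, sd, cv = population_summary(population)
print(total, mean, round(variance, 2), round(sd, 2), round(cv, 1))
```

Because the C.V. is a pure number, the same function can be applied to characters measured in different units and the results compared directly.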

Parameter: A function of the values of the units in the population, e.g.
the population mean, variance, C.V. and correlation are population
parameters. The problem in sampling theory is to estimate the
parameters from a sample by a procedure that makes it possible to
measure the precision of the estimates.

Estimator and estimate: Let the sample observations of size n be
$y_1, y_2, \dots, y_n$. Any function of the sample observations is called a
statistic. When a statistic is used to estimate a population parameter, the
statistic is called an estimator, e.g. the sample mean is an estimator of
the population mean. Any particular value of an estimator computed from an
observed sample is called an estimate.

Bias in estimation: A statistic t is said to be an unbiased estimator of a
population parameter $\theta$ if its expected value, E(t), is equal to
$\theta$. A sampling procedure based on a probability scheme gives rise to
a number of possible samples by repetition of the sampling procedure. If
the values of the statistic t are computed for each of the possible samples
and if the average of the values is equal to the population value
$\theta$, then t is said to be an unbiased estimator of $\theta$. In case
E(t) is not equal to $\theta$, the statistic t is said to be a biased
estimator of $\theta$ and the bias is given by
bias = $E(t) - \theta$.

Sampling variance: It is defined as the average magnitude over all
possible samples of the squares of deviations of the estimator from its
expected value and is given by $V(t) = E[t - E(t)]^2$.

The larger the sample and the smaller the variability between units in the
population, the smaller will be the sampling error and the greater will be
the confidence in the results.
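
The ideas of bias and sampling variance can be made concrete by listing every possible sample from a very small population. The sketch below is an illustration under stated assumptions: the population values are hypothetical, samples of size n = 2 are drawn without replacement with every combination equally likely, and the estimator t is the sample mean.

```python
from itertools import combinations

population = [2, 4, 6, 8]      # hypothetical values y_1, ..., y_N
n = 2                          # sample size

# The estimator t (sample mean) evaluated on every possible sample.
sample_means = [sum(s) / n for s in combinations(population, n)]

# E(t): average of t over all equally likely samples.
expected = sum(sample_means) / len(sample_means)

# Bias = E(t) - theta, where theta is the population mean.
theta = sum(population) / len(population)
bias = expected - theta

# V(t) = E[t - E(t)]^2, averaged over all possible samples.
sampling_variance = sum((m - expected) ** 2 for m in sample_means) / len(sample_means)

print(len(sample_means), expected, bias, round(sampling_variance, 4))
```

Here the bias is zero, which illustrates that the sample mean is an unbiased estimator of the population mean under this selection scheme.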

Standard error of an estimator: The square root of the sampling variance
of an estimator is known as the standard error of the estimator. The
standard error of an estimate divided by the value of the estimate is
called the relative standard error, which is usually expressed in
percentage.

Accuracy and precision: Precision of an estimate is the inverse of the
standard error or the sampling variance. Accuracy usually refers to the
size of the deviation of the sample estimate from the true population
value $\theta$, whereas precision refers to deviations from the mean
$\mu = E(t)$ of the estimator over repeated sampling; the bias is thus
measured by $\mu - \theta$. It is the accuracy of the sample estimate in
which we are chiefly interested, but it is the precision that we are able
to measure in most instances. We strive to design the survey and attempt
to analyse the data using appropriate statistical methods in such a way
that the precision is increased to the maximum and bias is reduced to the
minimum.

Confidence limits: If the estimator t is normally distributed (generally
valid for large samples), a confidence interval defined by a lower and
upper limit can be expected to include the population parameter $\theta$
with a specified probability level. The limits are given by

Lower limit = $t - Z_\alpha \sqrt{\hat{V}(t)}$

Upper limit = $t + Z_\alpha \sqrt{\hat{V}(t)}$

where $\hat{V}(t)$ is the estimate of the variance of t and $Z_\alpha$ is
the value of the normal deviate corresponding to a desired confidence
probability. When $Z_\alpha$ is taken as 1.96, we say that the chance of
the true value of $\theta$ being contained in the random interval is 95
per cent.

Some general remarks: Capital letters will be used to denote population
values and small letters to denote sample values. The symbol ‘cap’ (^)
above a symbol for a population value denotes its estimate based on
sample observations. While describing the different sampling methods,
the formulae for estimating only population mean and its sampling
variance are given. Two related parameters are population total and ratio
of the character under study (y) to some auxiliary variable (x). Estimates
of these related parameters can always be obtained from the mean by using
the following general relations.

$$\hat{Y} = N\,\hat{\bar{Y}}, \qquad \hat{R} = \frac{\hat{Y}}{X}$$

where $\hat{Y}$ = Estimate of the population total
N = Total number of units in the population
$\hat{R}$ = Estimate of the population ratio
X = Population total of the auxiliary variable

3.2. Simple random sampling (SRS)

A sampling procedure such that each possible combination of n
sampling units out of the population has the same chance of being
selected is referred to as simple random sampling. From theoretical
considerations, simple random sampling is the simplest form of sampling
and is the basis for many other sampling methods. Simple random
sampling is most applicable for the initial survey in an investigation and
for studies which involve sampling from a small area where the sample
size is relatively small. The irregular distribution of the sampling units in
an area in simple random sampling may be of great disadvantage where
accessibility is poor and the costs of travel and locating the plots are
considerably higher than the cost of enumerating the plot.

Selection of sampling units from a Population

In practice, a random sample is selected unit by unit. Two methods of
random selection for simple random sampling without replacement (WOR)
are explained in this section.

i). Lottery method: The units in the population are numbered 1 to N and
then N identical paper chits with numberings 1 to N are obtained and one
chit is chosen at random after shuffling the chits. The process is
repeated n times without replacing the chits selected. The units which
correspond to the numbers on the chosen chits form a simple random
sample of size n from the population of N units. In this way, the
probability of selecting any chit is the same for all the N chits.

ii). Selection based on random number tables: The procedure of selection
using the lottery method obviously becomes rather inconvenient
when N is large. To overcome this difficulty, we may use a table of
random numbers such as those published by Fisher and Yates, a sample of
which is given in the Appendix. The tables of random numbers have been
developed in such a way that the digits 0 to 9 appear independent of each
other and approximately equal number of times in the table. The simplest
way of selecting a random sample of required size consists in selecting a
set of n random numbers one by one from 1 to N in the random number
table and then taking the units bearing those numbers. This procedure
may involve a number of rejections since all the numbers more
than N appearing in the table are not considered for selection. In such
cases, the procedure is modified as follows. If N is a d-digit number, we
first determine the highest d-digit multiple of N, say N’. Then a random
number r is chosen from 1 to N’ and the unit having the serial number
equal to the remainder obtained on dividing r by N is considered as
selected. If the remainder is zero, the last unit is selected.

Problem-27: Select a simple random sample of n=5 units from a
population of size N=40.

Solution:
i). Serially number the population units from 1 to 40 (here 40 is a 2-digit number).
ii). Find the highest two-digit number divisible by 40, which is 80.
iii). Let us select the 5th column of the random number table (Table-5 of
Appendix).
iv). The value 39 (which is less than N=40) is selected as the 1st member of
the sample.
v). The next values in the column, 92 and 90, are rejected as they are >80.
vi). 27 is selected (which is in 1-40) as the 2nd sample unit.
vii). 00 leaves a remainder of zero, so the 40th unit is selected as the 3rd
sample unit.
viii). The next value is 74. Dividing it by 40, the remainder is 34. So the
34th unit is selected as the 4th sample unit.
ix). Next comes 07 and it is selected as the 5th sample unit.
So, the 5 sample units selected from the population of 40 members
are: 39, 27, 40, 34 & 7.
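
The remainder method used in Problem-27 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names `highest_multiple` and `remainder_method` are hypothetical, and the sketch assumes that the table entries 00 to N’-1 are the accepted range (so that every unit corresponds to the same number of entries) and that 00 counts as a remainder of zero, as in the worked example.

```python
def highest_multiple(N):
    """Largest multiple of N having the same number of digits as N."""
    d = len(str(N))
    return ((10 ** d - 1) // N) * N


def remainder_method(random_numbers, N, n):
    """Select n distinct units from 1..N by the remainder method.

    Assumption: entries 00..(N' - 1) are accepted and 00 is treated as
    remainder zero (the last unit), following the worked example.
    """
    N_prime = highest_multiple(N)
    selected = []
    for r in random_numbers:
        if r >= N_prime:          # reject entries beyond the usable range
            continue
        unit = r % N or N         # remainder zero means the last unit
        if unit not in selected:  # without replacement: skip repeats
            selected.append(unit)
        if len(selected) == n:
            break
    return selected


# Column entries quoted in Problem-27: 39, 92, 90, 27, 00, 74, 07
print(remainder_method([39, 92, 90, 27, 0, 74, 7], N=40, n=5))
# -> [39, 27, 40, 34, 7]
```

For the exercise with N = 112, `highest_multiple(112)` gives 896, so three-digit entries would be read from the table.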

Exercise: Select a random sample of 11 cows from a list of 112 milch
cows of a herd by using the random number table.

3.3. Parameter estimation in SRS

a). SRS WOR (without replacement)

Let y1, y2,… ,yn be the measurements on a particular characteristic
on n selected units in a sample from a population of N sampling units. It
can be shown in the case of simple random sampling without replacement
that the sample mean,

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

is an unbiased estimator of the population mean, $\bar{Y}$. An unbiased estimate
of the sampling variance of $\bar{y}$ is given by,

$$\hat{V}(\bar{y}) = \frac{N-n}{Nn}\, s_y^2$$

where,

$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$$

Assuming that the estimate $\bar{y}$ is normally distributed, a confidence
interval on the population mean $\bar{Y}$ can be set with the lower and upper
confidence limits defined by,

Lower limit $= \bar{y} - z\sqrt{\hat{V}(\bar{y})}$ and Upper limit $= \bar{y} + z\sqrt{\hat{V}(\bar{y})}$

where z is the table value which depends on how many observations
there are in the sample. If there are 30 or more observations we can read
the values from the table of the normal distribution. If there are fewer than
30 observations, the table value should be read from the table
of t distribution using n - 1 degrees of freedom.

b). SRS WR (with replacement)

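Let y1, y2,… ,yn be the measurements on a particular characteristic
on n selected units in a sample from a population of N sampling units with
replacement. Then,

1. Estimate of population mean, $\hat{\bar{Y}} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

2. Estimate of variance of sample mean, $\hat{V}(\hat{\bar{Y}}) = \frac{N-1}{Nn}\, S_y^2$

where $S_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$

3. Estimate of population total, $\hat{Y} = N\bar{y}$

4. Estimate of C.I. of population mean:

Lower limit, $\hat{\bar{Y}}_L = \bar{y} - Z\sqrt{\frac{N-1}{Nn}}\, S_y$

Upper limit, $\hat{\bar{Y}}_U = \bar{y} + Z\sqrt{\frac{N-1}{Nn}}\, S_y$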

Problem-28: A forest has been divided up into 1000 plots of 0.1 hectare
each and a simple random sample of 25 plots has been selected. For each
of these sample plots the wood volumes in m³ were recorded as:

Sample Obs.    1   2   3   4   5   6   7   8   9  10  11  12  13
Wood Volume    7   8   2   6   7  10   8   6   7   3   7   8   9
Sample Obs.   14  15  16  17  18  19  20  21  22  23  24  25
Wood Volume   11   8   4   7   7   8   7   7   5   8   8   7

Estimate the population mean, 95% C.I. of mean, C.V. and total volume
of wood in the forest by SRSWOR and SRSWR. Compare the efficiency of
the two methods.

Solution:

a). SRSWOR
Let the wood volume of the ith sampling unit (i=1,2,3,……,25) be designated
as yi.
Now, an unbiased estimate of the population mean is obtained using the
formula as:

$$\hat{\bar{Y}} = \bar{y} = \frac{1}{25}(7 + 8 + 2 + \dots + 7) = \frac{175}{25} = 7 \text{ m}^3$$

which is the mean wood volume per plot of 0.1 ha in the forest area.

An estimate ($s_y^2$) of the variance of individual values of y is obtained using
the formula as:

$$s_y^2 = \frac{1}{25-1}\left[(7-7)^2 + (8-7)^2 + \dots + (7-7)^2\right] = \frac{92}{24} = 3.833$$

Then the unbiased estimate of the sampling variance of the mean is

$$\hat{V}(\bar{y}) = \frac{1000-25}{1000 \times 25}(3.833) = 0.1495 \quad\text{and}\quad SE(\bar{y}) = \sqrt{0.1495} = 0.3867 \text{ m}^3$$

The relative standard error,

C.V. $= \frac{SE(\bar{y})}{\bar{y}}(100) = \frac{0.3867}{7}(100) = 5.52\ \%$

The confidence limits on the population mean, using the table value
t = 2.064 for 24 degrees of freedom (since n < 30), are:
Lower limit $= 7 - (2.064)(0.3867) = 6.20$
Upper limit $= 7 + (2.064)(0.3867) = 7.80$
The 95% confidence interval for the population mean is (6.20, 7.80) m³.
Thus, we are 95% confident that the confidence interval (6.20, 7.80)
m³ would include the population mean.
An estimate of the total wood volume in the forest area sampled can
easily be obtained by multiplying the estimate of the mean by the total
number of plots in the population. Thus,

$$\hat{Y} = N\bar{y} = 1000 \times 7 = 7000 \text{ m}^3$$

with a confidence interval of (6200, 7800) m³ obtained by
multiplying the confidence limits on the mean by N = 1000.

b). SRSWR
An unbiased estimate of the population mean is also obtained using the
formula as:

$$\hat{\bar{Y}} = \bar{y} = \frac{1}{25}\sum_{i=1}^{25} y_i = 7 \text{ m}^3$$

which is the mean wood volume per plot of 0.1 ha in the forest area.

An estimate ($S_y^2$) of the variance of individual values of y is also obtained
using the formula as:

$$S_y^2 = \frac{1}{25-1}\sum_{i=1}^{25}(y_i - \bar{y})^2 = 3.833$$

Now, the unbiased estimate of the sampling variance of the mean is

$$\hat{V}(\hat{\bar{Y}}) = \frac{N-1}{Nn}\, S_y^2 = \frac{1000-1}{1000 \times 25} \times 3.833 = 0.153167$$

and SE(est. of pop. mean) $= \sqrt{\frac{N-1}{Nn}}\, S_y = 0.391365$ m³

The relative standard error, C.V. = 0.3914 x 100/7 = 5.59%

The confidence limits on the population mean are:

$$\hat{\bar{Y}}_L = \bar{y} - Z\sqrt{\frac{N-1}{Nn}}\, S_y = 7 - 2.064 \times 0.3914 = 6.19$$

$$\hat{\bar{Y}}_U = \bar{y} + Z\sqrt{\frac{N-1}{Nn}}\, S_y = 7 + 2.064 \times 0.3914 = 7.81$$

Lower limit = 6.19
Upper limit = 7.81
The 95% confidence interval for the population mean is (6.19, 7.81) m³.
Thus, we are 95% confident that the confidence interval (6.19, 7.81)
m³ would include the population mean.
An estimate of the total wood volume in the forest area sampled can
easily be obtained by multiplying the estimate of the mean by the total
number of plots in the population. Thus,

$$\hat{Y} = N\bar{y} = 1000 \times 7 = 7000 \text{ m}^3$$

with a confidence interval of (6190, 7810) m³ obtained
by multiplying the confidence limits on the mean by N = 1000.
The efficiency of SRSWOR w.r.t. SRSWR = (0.1495/0.1532)x100 = 97.58%
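
The arithmetic of Problem-28 can be re-traced with a short script. The sketch below is only a cross-check under the formulas used in this section; the variable names are illustrative, and the table value 2.064 (t with 24 degrees of freedom) is typed in by hand rather than computed.

```python
from math import sqrt
from statistics import mean, variance

# Wood volumes (m3) of the 25 sample plots in Problem-28.
y = [7, 8, 2, 6, 7, 10, 8, 6, 7, 3, 7, 8, 9,
     11, 8, 4, 7, 7, 8, 7, 7, 5, 8, 8, 7]
N = 1000          # plots in the population
n = len(y)        # plots in the sample
t_value = 2.064   # Table-1(a): two-tailed 5 % point of t, n - 1 = 24 d.f.

y_bar = mean(y)   # estimate of the population mean
s2 = variance(y)  # sample variance with divisor n - 1

# Variance of the mean under the two schemes, as defined in the text.
v_wor = (N - n) / (N * n) * s2   # SRSWOR
v_wr = (N - 1) / (N * n) * s2    # SRSWR

for label, v in (("SRSWOR", v_wor), ("SRSWR", v_wr)):
    se = sqrt(v)
    lower, upper = y_bar - t_value * se, y_bar + t_value * se
    print(f"{label}: V = {v:.4f}, SE = {se:.4f}, "
          f"C.V. = {100 * se / y_bar:.2f} %, "
          f"95 % C.I. = ({lower:.2f}, {upper:.2f}), "
          f"total = {N * y_bar}")

print(f"V(SRSWOR) / V(SRSWR) x 100 = {100 * v_wor / v_wr:.2f} %")
```

Because the script carries full precision instead of the rounded intermediate values, the last digit of some results (for example the efficiency ratio) may differ slightly from the hand calculation.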

Exercise: In an agricultural survey the following data have been recorded
on holding size of land (in acres):

Sl. No.   Holding Size    Sl. No.   Holding Size    Sl. No.   Holding Size
   1         21.04          13          8.29          25         22.13
   2         12.59          14          7.27          26          1.68
   3         20.30          15          1.47          27         49.58
   4         16.16          16          1.12          28          1.68
   5         23.82          17         10.67          29          4.80
   6          1.79          18          5.94          30         12.72
   7         26.91          19          3.15          31          6.31
   8          7.41          20          4.84          32         14.18
   9          7.68          21          9.07          33         22.19
  10         66.55          22          3.69          34          2.50
  11        141.80          23         14.61          35         25.29
  12         28.12          24          1.10          36         20.99

Q.1. Draw a random sample of size n=10 from these 36 observations.
Q.2. Find the estimates of the population mean, its variance, the population
total, C.V. and the C.I. of the mean at 95% confidence by SRSWOR and SRSWR.
Q.3. Compare the relative precision of SRSWOR with SRSWR.

3.4. Stratified sampling

The basic idea in stratified random sampling is to divide a
heterogeneous population into sub-populations, usually known as strata,
each of which is internally homogeneous. A precise estimate
of any stratum mean can then be obtained from a small sample from that
stratum, and by combining such estimates, a precise estimate for the
whole population can be obtained. Stratified sampling provides a better
cross section of the population than the procedure of simple random
sampling. It may also simplify the organisation of the field work.
Geographical proximity is sometimes taken as the basis of stratification.
The assumption here is that geographically contiguous areas are often
more alike than areas that are far apart. Administrative convenience may
also dictate the basis on which the stratification is made. A fairly effective
method of stratification is to conduct a quick reconnaissance survey of the
area or pool the information already at hand and stratify the area
according to some characteristics like land, slope, breed, age, plant types,
stand density, site quality etc. If the characteristic under study is known
to be correlated with a supplementary variable for which actual data or at
least good estimates are available for the units in the population, the
stratification may be done using the information on the supplementary
variable. For instance, the rainfall estimates obtained at a previous
inventory of an area may be used for stratification of the population.

In stratified sampling, the variance of the estimator consists of only
the ‘within strata’ variation. Thus the larger the number of strata into
which a population is divided, the higher, in general, the precision, since it
is likely that, in this case, the units within a stratum will be more
homogeneous. For estimating the variance within strata, there should be
a minimum of 2 units in each stratum. The larger the number of strata
the higher will, in general, be the cost of enumeration. So, depending on
administrative convenience, cost of the survey and variability of the
characteristic under study in the area, a decision on the number of strata
will have to be arrived at.

Allocation and selection of the sample within strata

Let the population be divided into k strata of N1, N2 ,…, Nk units
respectively, and suppose that a sample of n units is to be drawn from the
population. The problem of allocation concerns the choice of the sample
sizes in the respective strata, i.e., how many units should be taken from
each stratum such that the total sample is n.

Other things being equal, a larger sample may be taken from a
stratum with a larger variance so that the variance of the estimates of
strata means gets reduced. The application of the above principle requires
advance estimates of the variation within each stratum. These may be
available from a previous survey or may be based on pilot surveys of a
restricted nature. Thus if this information is available, the sampling
fraction (ni/Ni) in each stratum may be taken proportional to the standard
deviation of each stratum.

In case the cost per unit of conducting the survey in each stratum is
known and is varying from stratum to stratum an efficient method of
allocation for minimum cost will be to take large samples from the
stratum where sampling is cheaper and variability is higher. To apply this
procedure one needs information on variability and cost of observation
per unit in the different strata.

Where information regarding the relative variances within strata and
cost of operations are not available, the allocation in the different strata
may be made in proportion to the number of units in them or the total
area of each stratum. This method is usually known as ‘proportional
allocation’.
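
The two allocation rules described above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the stratum sizes 29, 16 and 24 with n = 23 are borrowed from Problem-29, and the standard deviations passed to `neyman_allocation` are made-up values used only to show the effect of unequal variability. Taking the sampling fraction proportional to the stratum standard deviation is the same as taking nt proportional to Nt x St, which is commonly called Neyman allocation.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n units in proportion to the stratum sizes N_t."""
    N = sum(stratum_sizes)
    return [round(n * N_t / N) for N_t in stratum_sizes]


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n units in proportion to N_t * S_t.

    This makes the sampling fraction n_t / N_t proportional to the
    stratum standard deviation S_t.
    """
    weights = [N_t * S_t for N_t, S_t in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [round(n * w / total) for w in weights]


sizes = [29, 16, 24]                      # stratum sizes as in Problem-29
print(proportional_allocation(sizes, 23))

hypothetical_sds = [1.0, 0.6, 1.4]        # assumed values, not from the manual
print(neyman_allocation(sizes, hypothetical_sds, 23))
```

Rounding each stratum separately does not guarantee that the allocated sizes add up exactly to n, so in practice a unit may have to be added or dropped by hand.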

For the selection of units within strata, in general, any method
which is based on a probability selection of units can be adopted. But the
selection should be independent in each stratum. If independent random
samples are taken from each stratum, the sampling procedure will be
known as ‘stratified random sampling’. Other modes of selection
such as systematic sampling can also be adopted within the
different strata.

Estimation of mean and variance

We shall assume that the population of N units is first divided
into k strata of N1, N2,…,Nk units respectively. These strata are
non-overlapping and together they comprise the whole population, so that

N1 + N2 + ….. + Nk = N

When the strata have been determined, a sample is drawn from
each stratum, the selection being made independently in each stratum.
The sample sizes within the strata are denoted by n1, n2, …,
nk respectively, so that

n1 + n2 +…..+ nk = n

Let ytj (j = 1, 2,…., Nt ; t = 1, 2,..…k) be the value of the characteristic
under study for the j-th unit in the t-th stratum. Then,

i). the population mean in the t-th stratum is given by

$$\bar{Y}_t = \frac{1}{N_t}\sum_{j=1}^{N_t} y_{tj}, \quad (t = 1, 2, \dots, k)$$

The overall population mean is given by

$$\bar{Y} = \frac{1}{N}\sum_{t=1}^{k} N_t \bar{Y}_t$$

The estimate of the population mean, $\bar{Y}$, in this case will be obtained by

$$\hat{\bar{Y}} = \frac{1}{N}\sum_{t=1}^{k} N_t \bar{y}_t$$

Where,

$$\bar{y}_t = \frac{1}{n_t}\sum_{j=1}^{n_t} y_{tj}$$

ii). Estimate of the variance of $\hat{\bar{Y}}$ is given by

$$\hat{V}(\hat{\bar{Y}}) = \frac{1}{N^2}\sum_{t=1}^{k} N_t^2 \left(\frac{N_t - n_t}{N_t n_t}\right) s_t^2$$

Where,

$$s_t^2 = \frac{1}{n_t - 1}\sum_{j=1}^{n_t} (y_{tj} - \bar{y}_t)^2$$

Stratification, if properly done as explained in the previous sections,
will usually give lower variance for the estimated population total or mean
than a simple random sample of the same size. However, a stratified
sample taken without due care and planning may not be better than a
simple random sample.

Problem-29: A forest area consisting of 69 compartments was divided
into three strata containing compartments 1-29, compartments 30-45,
and compartments 46-69 and sample sizes of 10, 5 and 8 compartments
respectively were chosen at random from the three strata. The serial
numbers of the selected compartments in each stratum are given in
column (4) of the following Table. The corresponding observed volume of
the particular species in each selected compartment in m³/ha is shown in
column (5).

Table-20. Estimation of parameters under stratified sampling

Stratum   Total number of   Number of units   Selected sampling   Volume (m³/ha)   (ytj)²
number    units in the      sampled (nt)      unit number         (ytj)
          stratum (Nt)
(1)       (2)               (3)               (4)                 (5)              (6)

I                                              1                   5.40             29.16
                                              18                   4.87             23.72
                                              28                   4.61             21.25
                                              12                   3.26             10.63
                                              20                   4.96             24.60
                                              19                   4.73             22.37
                                               9                   4.39             19.27
                                               6                   2.34              5.48
                                              17                   4.74             22.47
                                               7                   2.85              8.12
Total     29                10                ..                  42.15            187.07

II                                            43                   4.79             22.94
                                              42                   4.57             20.88
                                              36                   4.89             23.91
                                              45                   4.42             19.54
                                              39                   3.44             11.83
Total     16                5                 ..                  22.11             99.10

III                                           59                   7.41             54.91
                                              50                   3.70             13.69
                                              49                   5.45             29.70
                                              58                   7.01             49.14
                                              54                   3.83             14.67
                                              69                   5.25             27.56
                                              52                   4.50             20.25
                                              47                   6.51             42.38
Total     24                8                 ..                  43.66            252.30

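Step-1. Compute the following quantities.

N = (29 + 16 + 24) = 69
n = (10 + 5 + 8) = 23
Stratum sample means: $\bar{y}_I$ = 4.215, $\bar{y}_{II}$ = 4.422, $\bar{y}_{III}$ = 5.458

Step-2. Estimation of the population mean

$$\hat{\bar{Y}} = \frac{1}{N}\sum_{t} N_t \bar{y}_t = \frac{(29)(4.215) + (16)(4.422) + (24)(5.458)}{69} = 4.70$$

Step-3. Estimation of the variance of $\hat{\bar{Y}}$

$$\hat{V}(\hat{\bar{Y}}) = \frac{1}{N^2}\sum_{t} N_t^2\left(\frac{N_t - n_t}{N_t n_t}\right)s_t^2 \quad\text{and}\quad SE(\hat{\bar{Y}}) = \sqrt{\hat{V}(\hat{\bar{Y}})}$$

Step-4. If we ignore the strata and assume that the same sample of
size n = 23 formed a simple random sample (WOR) from the population
of N = 69, the estimate of the population mean would reduce to

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{107.92}{23} = 4.69$$

Estimate of the variance of the mean is,

$$\hat{V}(\bar{y}) = \frac{N-n}{Nn}\, s^2$$

Where,

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$$

so that

$$SE(\bar{y}) = \sqrt{\hat{V}(\bar{y})} \quad\text{and}\quad \frac{SE(\bar{y})}{\bar{y}}(100) = \text{C.V.}$$

The gain in precision due to stratification with SRSWOR is computed by

$$\frac{\hat{V}(\bar{y})_{SRS}}{\hat{V}(\hat{\bar{Y}})_{st}}(100) = 121.8$$

Thus the gain in precision is 21.8%.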
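
The steps of Problem-29 can be cross-checked with the sketch below, which applies the stratified and SRSWOR formulas of this section to the volumes listed in Table-20. The variable names are illustrative. Since the script works from the unrounded observations, its results (a mean of about 4.70 and a gain in the neighbourhood of 121 to 122) may differ in the last digit from figures obtained with rounded intermediate values.

```python
from statistics import mean, variance

# Sample volumes (m3/ha) by stratum: (N_t, observations), from Table-20.
strata = {
    "I":   (29, [5.40, 4.87, 4.61, 3.26, 4.96, 4.73, 4.39, 2.34, 4.74, 2.85]),
    "II":  (16, [4.79, 4.57, 4.89, 4.42, 3.44]),
    "III": (24, [7.41, 3.70, 5.45, 7.01, 3.83, 5.25, 4.50, 6.51]),
}

N = sum(N_t for N_t, _ in strata.values())     # 69 compartments
n = sum(len(y) for _, y in strata.values())    # 23 sampled compartments

# Stratified estimate of the mean and of its variance.
mean_st = sum(N_t * mean(y) for N_t, y in strata.values()) / N
var_st = sum(
    N_t ** 2 * (N_t - len(y)) / (N_t * len(y)) * variance(y)
    for N_t, y in strata.values()
) / N ** 2

# The same 23 observations treated as one simple random sample (WOR).
pooled = [v for _, y in strata.values() for v in y]
mean_srs = mean(pooled)
var_srs = (N - n) / (N * n) * variance(pooled)

gain = 100 * var_srs / var_st
print(f"Stratified: mean = {mean_st:.3f}, V = {var_st:.4f}, "
      f"C.V. = {100 * var_st ** 0.5 / mean_st:.2f} %")
print(f"SRSWOR:     mean = {mean_srs:.3f}, V = {var_srs:.4f}, "
      f"C.V. = {100 * var_srs ** 0.5 / mean_srs:.2f} %")
print(f"Gain in precision = {gain:.1f}")
```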

Exercise: 2000 wheat cultivators’ holdings in a GP were stratified
according to their sizes and the results of the stratification are given below.

Stratum No.   No. of holdings   Mean area per    S.D. of area per
              (Nt)              holding (Ȳt)     holding (St)
    1             394               5.4               8.3
    2             461              16.3              13.3
    3             381              24.3              15.1
    4             334              34.5              19.8
    5             169              42.1              24.5
    6             113              50.1              26.0
    7             148              63.8              35.2

Estimate:
1. Mean of wheat area of the GP
2. Variance of mean of wheat area of GP
3. C.V. of area of GP
4. Mean area, variance of mean, and C.V. of GP if considered as SRSWOR
5. Gain in precision of stratification with SRSWOR

3.5. Systematic sampling

Systematic sampling employs a simple rule of selecting every k-th
unit starting with a number chosen at random from 1 to k (k=N/n) as the
random start. Let there be N sampling units in the population numbered 1
to N; then a systematic sample of n units is selected by taking the unit at
the random start and the others at an interval of k (called sampling interval)
from it. This type of sampling is often convenient in exercising control
over field work. Apart from these operational considerations, the
procedure of systematic sampling is observed to provide estimators more
efficient than simple random sampling under normal conditions. The
property of the systematic sample in spreading the sampling units evenly
over the population can be taken advantage of by listing the units so that
homogeneous units are put together or such that the values of the
characteristic for the units are in ascending or descending order of
magnitude i.e. in some order. For example, knowing the fertility trend of
the forest area the units (for example strips) may be listed along the
fertility trend.

Selection of a systematic sample

Consider a population of N=48 units. A sample of n=4 units is
needed. Here, k =(48/4)=12. If the random number selected from the set
of numbers from 1 to 12 is 11, then the units associated with serial
numbers 11, 23, 35 and 47 will be selected. This technique will generate
k systematic samples with equal probability.

In situations where N is not fully divisible by n, k is calculated as
the integer nearest to N/n. In this situation, the sample size is not
necessarily n; in some cases it may be n -1, so that the possible samples
are of unequal sizes.
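
The selection rule can be written as a one-line range. The sketch below is a minimal illustration; the function name `systematic_sample` is hypothetical, and the random start is passed in explicitly rather than drawn by the function so that the examples from the text can be reproduced.

```python
def systematic_sample(N, k, start):
    """Serial numbers start, start + k, start + 2k, ... not exceeding N."""
    if not 1 <= start <= k:
        raise ValueError("the random start must lie between 1 and k")
    return list(range(start, N + 1, k))


# N = 48, n = 4, k = 12 and random start 11, as in the text.
print(systematic_sample(48, 12, 11))     # [11, 23, 35, 47]

# When N is not a multiple of k the sample size depends on the start:
print(len(systematic_sample(195, 20, 8)))    # 10 units
print(len(systematic_sample(195, 20, 16)))   # 9 units
```

In practice the start would itself be drawn at random from 1 to k, for example from the random number table.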

Parameter estimation

The estimate for the population mean per unit is given by the sample
mean

$$\hat{\bar{Y}} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where n is the number of units in the sample.

One-dimensional Systematic sampling

In the case of systematic strip surveys or, in general, any one
dimensional systematic sampling, an approximation to the standard
error may be obtained from the differences between pairs of successive
units. If there are n units enumerated in the systematic sample, there will
be (n-1) differences. The variance per unit is therefore given by the sum
of squares of the differences divided by twice the number of differences.
Thus if y1, y2,…,yn are the observed values (say volume) for the n units in
the systematic sample and defining the first difference d(yi) as given
below,

$$d(y_i) = y_{i+1} - y_i ; \quad (i = 1, 2, \dots, n-1)$$

the approximate error variance of the mean per unit is estimated as

$$\hat{V}(\bar{y}) = \frac{1}{2n(n-1)}\sum_{i=1}^{n-1}\left[d(y_i)\right]^2$$

that is, the variance per unit, $\sum [d(y_i)]^2 / 2(n-1)$, divided by n.

Problem-30: The following Table gives the observed diameters of 10
trees selected by systematic selection of 1 in 20 trees from a stand
containing 195 trees in rows of 15 trees. The first tree was selected as the
8th tree from one of the outside edges of the stand starting from one
corner and the remaining trees were selected systematically by taking
every 20th tree switching to the nearest tree of the next row after the last
tree in any row is encountered.

Table-21. Tree diameter on a systematic sample of 10 trees from a plot

Tree No.   DBH (cm), yi   First difference, d(yi)
    8         14.8
   28         12.0              -2.8
   48         13.6              +1.6
   68         14.2              +0.6
   88         11.8              -2.4
  108         14.1              +2.3
  128         11.6              -2.5
  148          9.0              -2.6
  168         10.1              +1.1
  188          9.5              -0.6

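Solution:

Average diameter, $\bar{y} = \frac{1}{10}(14.8 + 12.0 + \dots + 9.5) = \frac{120.7}{10} = 12.07$ cm

The nine first differences can be obtained as shown in column (3) of the
Table. The error variance of the mean per unit is thus,

$$\hat{V}(\bar{y}) = \frac{(-2.8)^2 + (1.6)^2 + \dots + (-0.6)^2}{2 \times 10 \times 9} = \frac{36.39}{180} = 0.202167$$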
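
The successive-difference calculation of Problem-30 can be re-traced in a few lines. This sketch simply applies the rule described above (sum of squared first differences divided by twice the number of differences, then divided by n) to the diameters of Table-21; the variable names are illustrative.

```python
# DBH (cm) of the 10 systematically selected trees, in order of selection.
dbh = [14.8, 12.0, 13.6, 14.2, 11.8, 14.1, 11.6, 9.0, 10.1, 9.5]
n = len(dbh)

y_bar = sum(dbh) / n                                   # average diameter

# First differences d(y_i) = y_(i+1) - y_i; there are n - 1 of them.
d = [later - earlier for earlier, later in zip(dbh, dbh[1:])]

var_per_unit = sum(x ** 2 for x in d) / (2 * (n - 1))  # variance per unit
var_of_mean = var_per_unit / n                         # error variance of mean

print(f"mean = {y_bar:.2f} cm")
print(f"variance per unit = {var_per_unit:.4f}")
print(f"error variance of the mean = {var_of_mean:.6f}")
```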

k-Independent Systematic sampling of equal sample size

A difficulty with systematic sampling is that one systematic sample
by itself will not furnish valid assessment of the precision of the
estimates. With a view to have valid estimates of the precision, one may
resort to partially systematic samples. A theoretically valid method of
using the idea of systematic samples and at the same time leading to
unbiased estimates of the sampling error is to draw a minimum of two
systematic samples with independent random starts. If $\bar{y}_1$, $\bar{y}_2$, …,
$\bar{y}_m$ are m estimates of the population mean based on m independent
systematic samples, the combined estimate for population mean is:

$$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} \bar{y}_i$$

The estimate of the variance of $\bar{y}$ is given by

$$\hat{V}(\bar{y}) = \frac{1}{m(m-1)}\sum_{i=1}^{m} (\bar{y}_i - \bar{y})^2$$

Notice that the precision increases with the number of independent
systematic samples.

Problem-31: The data given in the following Table have one systematic
sample along with another systematic sample selected with independent
random starts. In the second sample, the first tree was selected as the
10th tree.

Table-22. Tree diameter on two independent systematic samples of 10
trees from a plot.

        Sample-1                      Sample-2
Tree No.   DBH (cm), yi       Tree No.   DBH (cm), yi
    8         14.8               10         13.6
   28         12.0               30         10.0
   48         13.6               50         14.8
   68         14.2               70         14.2
   88         11.8               90         13.8
  108         14.1              110         14.5
  128         11.6              130         12.0
  148          9.0              150         10.0
  168         10.1              170         10.5
  188          9.5              190          8.5

Solution:

Here, n=10, k=20 and N=200

The average diameter for the first sample is $\bar{y}_1$ = 12.07 cm and that for
the second sample is $\bar{y}_2$ = 12.19 cm. Combined estimate of population
mean ($\bar{y}$) is obtained as:

$$\bar{y} = \frac{1}{2}(12.07 + 12.19) = 12.13 \text{ cm}$$

The estimate of the variance of mean is obtained as:

$$\hat{V}(\bar{y}) = \frac{1}{2(2-1)}\left[(12.07 - 12.13)^2 + (12.19 - 12.13)^2\right] = 0.0036$$

$SE(\bar{y}) = \sqrt{0.0036}$ = 0.06 cm and C.V.=0.06x100/12.13=0.49%

Total= 200x12.13=2426 cm
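
The combined estimate from independent systematic samples can be checked with the sketch below, which uses the two samples of Table-22 and the variance formula for m independent sample means. The variable names are illustrative, and N = 200 is taken from the solution above.

```python
from math import sqrt
from statistics import mean

# DBH (cm) from the two independent systematic samples in Table-22.
sample_1 = [14.8, 12.0, 13.6, 14.2, 11.8, 14.1, 11.6, 9.0, 10.1, 9.5]
sample_2 = [13.6, 10.0, 14.8, 14.2, 13.8, 14.5, 12.0, 10.0, 10.5, 8.5]
N = 200  # population size used in the solution

sample_means = [mean(sample_1), mean(sample_2)]
m = len(sample_means)                 # number of independent samples

combined = mean(sample_means)         # combined estimate of the mean
var_combined = sum((y - combined) ** 2 for y in sample_means) / (m * (m - 1))
se = sqrt(var_combined)

print(f"sample means = {sample_means[0]:.2f}, {sample_means[1]:.2f}")
print(f"combined mean = {combined:.2f} cm, V = {var_combined:.4f}")
print(f"SE = {se:.2f} cm, C.V. = {100 * se / combined:.2f} %")
print(f"total = {N * combined:.0f} cm")
```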

Exercise: Given below are data for 10 systematic samples of size 4 from a
population of 40 units.

Systematic sample numbers


1 2 3 4 5 6 7 8 9 10
0 1 2 1 4 5 6 7 7 9
7 8 9 10 12 13 15 6 16 17
18 18 19 20 21 20 24 13 28 29
29 30 31 31 30 32 35 37 38 63

Work out the estimate of population mean, total, variance, C.V. and
relative precision of systematic sample with SRSWOR.

*****************XXX******************


APPENDIX
STATISTICAL TABLES (t, F, χ2, r, Z, random number)
Table-1(a): Critical values for t-distribution
Probability % Probability % Probability %
DF 0.01 0.05 DF 0.01 0.05 DF 0.01 0.05
1 63.657 12.706 41 2.701 2.020 81 2.637 1.990
2 9.925 4.303 42 2.698 2.018 82 2.637 1.989
3 5.841 3.182 43 2.695 2.017 83 2.636 1.989
4 4.604 2.776 44 2.692 2.016 84 2.635 1.989
5 4.032 2.571 45 2.689 2.014 85 2.634 1.988
6 3.707 2.447 46 2.687 2.013 86 2.634 1.987
7 3.499 2.365 47 2.684 2.012 87 2.633 1.987
8 3.355 2.306 48 2.682 2.011 88 2.632 1.987
9 3.250 2.262 49 2.679 2.010 89 2.632 1.987
10 3.169 2.228 50 2.677 2.008 90 2.631 1.987
11 3.106 2.201 51 2.675 2.007 91 2.630 1.986
12 3.055 2.179 52 2.673 2.006 92 2.630 1.986
13 3.012 2.160 53 2.671 2.005 93 2.629 1.986
14 2.977 2.145 54 2.670 2.004 94 2.629 1.986
15 2.947 2.131 55 2.668 2.004 95 2.628 1.986
16 2.921 2.120 56 2.666 2.003 96 2.628 1.985
17 2.898 2.110 57 2.664 2.002 97 2.627 1.985
18 2.878 2.101 58 2.663 2.002 98 2.626 1.984
19 2.861 2.093 59 2.661 2.001 99 2.626 1.984
20 2.845 2.086 60 2.660 2.000 100 2.625 1.984
21 2.831 2.080 61 2.658 1.999 105 2.623 1.983
22 2.819 2.074 62 2.657 1.998 110 2.621 1.982
23 2.807 2.069 63 2.656 1.998 115 2.619 1.981
24 2.797 2.064 64 2.654 1.997 120 2.617 1.980
25 2.787 2.060 65 2.653 1.996 125 2.616 1.979
26 2.779 2.056 66 2.652 1.996 130 2.614 1.978
27 2.771 2.052 67 2.651 1.995 135 2.613 1.978
28 2.763 2.048 68 2.650 1.995 140 2.611 1.977
29 2.756 2.045 69 2.649 1.994 145 2.610 1.976
30 2.750 2.042 70 2.647 1.994 150 2.609 1.976
31 2.744 2.040 71 2.646 1.993 160 2.607 1.975
32 2.738 2.037 72 2.645 1.993 170 2.605 1.974
33 2.733 2.035 73 2.644 1.993 180 2.603 1.973
34 2.728 2.033 74 2.643 1.993 190 2.602 1.973
35 2.723 2.030 75 2.643 1.992 200 2.601 1.972
36 2.719 2.028 76 2.642 1.992 250 2.596 1.969
37 2.715 2.026 77 2.641 1.991 300 2.592 1.968
38 2.711 2.024 78 2.640 1.991 350 2.590 1.967
39 2.707 2.022 79 2.639 1.991 400 2.588 1.966
40 2.704 2.021 80 2.638 1.990 ∞ 2.576 1.960

Table-1(b): Critical values for t-distribution (One & Two-tailed)



Percentage (P)

One-tailed Two-tailed

Degree of freedom (v) 5% 1% 5% 1%

1 6.31 31.8 12.7 63.7

2 2.92 6.96 4.30 9.92

3 2.35 4.54 3.18 5.84

4 2.13 3.75 2.78 4.60

5 2.02 3.36 2.57 4.03

6 1.94 3.14 2.45 3.71

7 1.89 3.00 2.36 3.50

8 1.86 2.90 2.31 3.36

9 1.83 2.82 2.26 3.25

10 1.81 2.76 2.23 3.17

11 1.80 2.72 2.20 3.11

12 1.78 2.68 2.18 3.05

13 1.77 2.65 2.16 3.01

14 1.76 2.62 2.14 2.98

15 1.75 2.60 2.13 2.95

16 1.75 2.58 2.12 2.92

17 1.74 2.57 2.11 2.90

18 1.73 2.55 2.10 2.88

19 1.73 2.54 2.09 2.86

20 1.72 2.53 2.09 2.85

22 1.72 2.51 2.07 2.82

24 1.72 2.49 2.06 2.80

26 1.71 2.48 2.06 2.78

28 1.70 2.47 2.05 2.76

30 1.70 2.46 2.04 2.75

35 1.69 2.44 2.03 2.72

40 1.68 2.42 2.02 2.70

45 1.68 2.41 2.01 2.69

50 1.68 2.40 2.01 2.68

55 1.67 2.40 2.00 2.67

60 1.67 2.39 2.00 2.66

∞ 1.64 2.33 1.96 2.58

Table-2: Critical values for F-distribution

Smaller MS Degrees of freedom for greater mean square (n1)


(n2) 1 2 3 4 5 6 7 8 9 10
1 5% 161.00 200.00 216.00 225.00 230.00 234.00 237.00 239.00 241.00 242.00
1% 4052.00 4999.00 5403.00 5625.00 5764.00 5859.00 5928.00 5981.00 6022.00 6056.00

2 5% 18.51 19.00 19.16 19.25 19.30 19.33 19.36 19.37 19.38 19.39
1% 98.49 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40
3 5% 10.13 9.55 9.28 9.12 9.01 8.94 8.88 8.84 8.81 8.78
1% 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.34 27.23
4 5% 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96
1% 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.54
5 5% 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.78 4.74
1% 16.26 13.27 12.06 11.39 10.97 10.67 10.45 10.29 10.15 10.05
6 5% 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06
1% 13.74 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87
7 5% 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.63
1% 12.25 9.55 8.45 7.85 7.46 7.19 7.00 6.84 6.71 6.62
8 5% 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.34
1% 11.26 8.65 7.59 7.01 6.63 6.37 6.19 6.03 5.91 5.82
9 5% 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.13
1% 10.56 8.02 6.99 6.42 6.06 5.80 5.62 5.47 5.35 5.26
10 5% 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.97
1% 10.04 7.56 6.55 5.99 5.64 5.39 5.21 5.06 4.95 4.85
11 5% 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85
1% 9.65 7.20 6.22 5.67 5.32 5.07 4.88 4.74 4.63 4.54
12 5% 4.75 3.88 3.49 3.26 3.11 3.00 2.92 2.85 2.80 2.76
1% 9.33 6.93 5.95 5.41 5.06 4.82 4.65 4.50 4.39 4.30
13 5% 4.67 3.80 3.41 3.18 3.02 2.92 2.84 2.77 2.72 2.67
1% 9.07 6.70 5.74 5.20 4.86 4.62 4.44 4.30 4.19 4.10
14 5% 4.60 3.74 3.34 3.11 2.96 2.85 2.77 2.70 2.65 2.60
1% 8.86 6.51 5.56 5.03 4.69 4.46 4.28 4.14 4.03 3.94
15 5% 4.54 3.68 3.29 3.06 2.90 2.79 2.70 2.64 2.59 2.55
1% 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80
16 5% 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49
1% 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69
17 5% 4.45 3.59 3.20 2.96 2.81 2.70 2.62 2.55 2.50 2.45
1% 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59

Table-2 (Continued…)
Critical values for F-distribution


Smaller Degrees of freedom for greater mean square (n1)


MS

(n2) 1 2 3 4 5 6 7 8 9 10
18 5% 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41
1% 8.28 6.01 5.09 4.58 4.25 4.01 3.85 3.71 3.60 3.51
19 5% 4.38 3.52 3.13 2.90 2.74 2.63 2.55 2.48 2.43 2.38
1% 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43
20 5% 4.35 3.49 3.10 2.87 2.71 2.60 2.52 2.45 2.40 2.35
1% 8.10 5.85 4.94 4.43 4.10 3.87 3.71 3.56 3.45 3.37
21 5% 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32
1% 8.02 5.78 4.87 4.37 4.04 3.81 3.65 3.51 3.40 3.31
22 5% 4.30 3.44 3.05 2.82 2.66 2.55 2.47 2.40 2.35 2.30
1% 7.94 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26
23 5% 4.28 3.42 3.03 2.80 2.64 2.53 2.45 2.38 2.32 2.28
1% 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21
24 5% 4.26 3.40 3.01 2.78 2.62 2.51 2.43 2.36 2.30 2.26
1% 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.25 3.17
25 5% 4.24 3.38 2.99 2.76 2.60 2.49 2.41 2.34 2.28 2.24
1% 7.77 5.57 4.68 4.18 3.86 3.63 3.46 3.32 3.21 3.13
26 5% 4.22 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22
1% 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.17 3.09
27 5% 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.30 2.25 2.20
1% 7.68 5.49 4.60 4.11 3.79 3.56 3.39 3.26 3.14 3.06
28 5% 4.20 3.34 2.95 2.71 2.56 2.44 2.36 2.29 2.24 2.19
1% 7.64 5.45 4.57 4.07 3.76 3.53 3.36 3.23 3.11 3.03
29 5% 4.18 3.33 2.93 2.70 2.54 2.43 2.35 2.28 2.22 2.18
1% 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.08 3.00
30 5% 4.17 3.32 2.92 2.69 2.53 2.42 2.34 2.27 2.21 2.16
1% 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.06 2.98
31 5% 4.16 3.31 2.91 2.68 2.52 2.41 2.33 2.26 2.20 2.15
1% 7.53 5.37 4.49 4.00 3.68 3.45 3.28 3.15 3.04 2.96
32 5% 4.15 3.30 2.90 2.67 2.51 2.40 2.32 2.25 2.19 2.14
1% 7.50 5.34 4.46 3.97 3.66 3.42 3.25 3.12 3.01 2.94
33 5% 4.14 3.29 2.89 2.66 2.50 2.39 2.31 2.24 2.18 2.13
1% 7.47 5.32 4.44 3.95 3.64 3.40 3.23 3.10 2.99 2.92
34 5% 4.13 3.28 2.88 2.65 2.49 2.38 2.30 2.23 2.17 2.12
1% 7.44 5.29 4.42 3.93 3.61 3.38 3.21 3.08 2.97 2.89

Table-2 (Continued…)
Critical values for F-distribution
Smaller Degrees of freedom for greater mean square (n1)
MS

(n2) 1 2 3 4 5 6 7 8 9 10
35 5% 4.12 3.27 2.87 2.64 2.49 2.37 2.29 2.22 2.16 2.11
1% 7.42 5.27 4.40 3.91 3.60 3.37 3.20 3.06 2.96 2.88
36 5% 4.11 3.26 2.86 2.63 2.48 2.36 2.28 2.21 2.15 2.10
1% 7.39 5.25 4.38 3.89 3.58 3.35 3.18 3.04 2.94 2.86
37 5% 4.11 3.26 2.86 2.63 2.47 2.36 2.27 2.20 2.15 2.10
1% 7.37 5.23 4.36 3.88 3.56 3.34 3.17 3.03 2.93 2.84
38 5% 4.10 3.25 2.85 2.62 2.46 2.35 2.26 2.19 2.14 2.09
1% 7.35 5.21 4.34 3.86 3.54 3.32 3.15 3.02 2.91 2.82
39 5% 4.09 3.24 2.85 2.62 2.46 2.35 2.26 2.19 2.13 2.08
1% 7.33 5.20 4.33 3.85 3.53 3.31 3.14 3.01 2.90 2.81
40 5% 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.07
1% 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.88 2.80
41 5% 4.08 3.23 2.84 2.61 2.45 2.33 2.25 2.18 2.12 2.07
1% 7.29 5.17 4.30 3.82 3.50 3.28 3.11 2.98 2.87 2.79
42 5% 4.07 3.22 2.83 2.60 2.44 2.32 2.24 2.17 2.11 2.06
1% 7.27 5.15 4.29 3.80 3.49 3.26 3.10 2.96 2.86 2.77
43 5% 4.07 3.22 2.83 2.60 2.44 2.32 2.24 2.17 2.11 2.06
1% 7.26 5.14 4.28 3.79 3.48 3.25 3.09 2.95 2.85 2.76
44 5% 4.06 3.21 2.82 2.59 2.43 2.31 2.23 2.16 2.10 2.05
1% 7.24 5.12 4.26 3.78 3.46 3.24 3.07 2.94 2.84 2.75
45 5% 4.06 3.21 2.82 2.59 2.43 2.31 2.23 2.15 2.10 2.05
1% 7.23 5.11 4.25 3.77 3.45 3.23 3.06 2.93 2.83 2.74
46 5% 4.05 3.20 2.81 2.58 2.42 2.30 2.22 2.14 2.09 2.04
1% 7.21 5.10 4.24 3.76 3.44 3.22 3.05 2.92 2.82 2.73
47 5% 4.05 3.20 2.81 2.58 2.42 2.30 2.22 2.14 2.09 2.04
1% 7.20 5.09 4.23 3.75 3.43 3.21 3.05 2.91 2.81 2.72
48 5% 4.04 3.19 2.80 2.57 2.41 2.30 2.21 2.14 2.08 2.03
1% 7.19 5.08 4.22 3.74 3.42 3.20 3.04 2.90 2.80 2.71
49 5% 4.04 3.19 2.80 2.57 2.41 2.30 2.21 2.14 2.08 2.03
1% 7.18 5.07 4.21 3.73 3.42 3.19 3.03 2.89 2.79 2.71
50 5% 4.03 3.18 2.79 2.56 2.40 2.29 2.20 2.13 2.07 2.02
1% 7.17 5.06 4.20 3.72 3.41 3.18 3.02 2.88 2.78 2.70

Table-2 (Continued…)
Critical values for F-distribution

Degrees of freedom for greater mean square (n1)
Smaller MS (n2) 1 2 3 4 5 6 7 8 9 10
55 5% 4.02 3.17 2.78 2.54 2.38 2.27 2.18 2.11 2.05 2.00
1% 7.12 5.01 4.16 3.68 3.37 3.15 2.98 2.85 2.75 2.66
60 5% 4.00 3.15 2.76 2.52 2.37 2.25 2.17 2.10 2.04 1.99
1% 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63
65 5% 3.99 3.14 2.75 2.51 2.36 2.24 2.15 2.08 2.02 1.98
1% 7.04 4.95 4.10 3.62 3.31 3.09 2.93 2.79 2.70 2.61
70 5% 3.98 3.13 2.74 2.50 2.35 2.23 2.14 2.07 2.01 1.97
1% 7.01 4.92 4.08 3.60 3.29 3.07 2.91 2.77 2.67 2.59
80 5% 3.96 3.11 2.72 2.48 2.33 2.21 2.12 2.05 1.99 1.95
1% 6.96 4.88 4.04 3.56 3.25 3.04 2.87 2.74 2.64 2.55
100 5% 3.94 3.09 2.70 2.46 2.30 2.19 2.10 2.03 1.97 1.92
1% 6.90 4.82 3.98 3.51 3.20 2.99 2.82 2.69 2.59 2.51
125 5% 3.92 3.07 2.68 2.44 2.29 2.17 2.08 2.01 1.95 1.90
1% 6.84 4.78 3.94 3.47 3.17 2.95 2.79 2.65 2.56 2.47
150 5% 3.91 3.06 2.67 2.43 2.27 2.16 2.07 2.00 1.94 1.89
1% 6.81 4.75 3.91 3.44 3.14 2.92 2.76 2.62 2.53 2.44
200 5% 3.89 3.04 2.65 2.41 2.26 2.14 2.05 1.98 1.92 1.87
1% 6.76 4.71 3.88 3.41 3.11 2.90 2.73 2.60 2.50 2.41
400 5% 3.86 3.02 2.62 2.39 2.23 2.12 2.03 1.96 1.90 1.85
1% 6.70 4.66 3.83 3.36 3.06 2.85 2.69 2.55 2.46 2.37
1000 5% 3.85 3.00 2.61 2.38 2.22 2.10 2.02 1.95 1.89 1.84
1% 6.66 4.62 3.80 3.34 3.04 2.82 2.66 2.53 2.43 2.34
∞ 5% 3.84 2.99 2.60 2.37 2.21 2.09 2.01 1.94 1.88 1.83
1% 6.64 4.60 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32

Table-2 (Continued…)
Critical values for F-distribution

Degrees of freedom for greater mean square (n1)
Smaller MS (n2) 11 12 13 14 15 16 17 18 19 20
1 5% 243.00 244.00 244.50 245.00 245.50 246.00 246.50 247.00 247.50 248.00
1% 6082.00 6106.00 6124.00 6142.00 6156.00 6169.00 6177.00 6186.00 6194.00 6208.00

2 5% 19.40 19.41 19.42 19.42 19.43 19.43 19.43 19.44 19.44 19.44
1% 99.41 99.42 99.42 99.43 99.43 99.44 99.44 99.45 99.45 99.45
3 5% 8.76 8.74 8.73 8.71 8.70 8.69 8.68 8.68 8.67 8.66
1% 27.13 27.05 26.99 26.92 26.88 26.83 26.80 26.76 26.73 26.69
4 5% 5.93 5.91 5.89 5.87 5.86 5.84 5.83 5.82 5.81 5.80
1% 14.45 14.37 14.31 14.24 14.20 14.15 14.11 14.07 14.04 14.02
5 5% 4.70 4.68 4.66 4.64 4.62 4.60 4.59 4.58 4.57 4.56
1% 9.96 9.89 9.81 9.77 9.73 9.68 9.65 9.62 9.58 9.55
6 5% 4.03 4.00 3.98 3.96 3.94 3.92 3.91 3.90 3.88 3.87
1% 7.79 7.72 7.66 7.60 7.56 7.52 7.49 7.46 7.42 7.39
7 5% 3.60 3.57 3.55 3.52 3.51 3.49 3.48 3.47 3.45 3.44
1% 6.54 6.47 6.41 6.35 6.31 6.27 6.24 6.21 6.18 6.15
8 5% 3.31 3.28 3.26 3.23 3.22 3.20 3.19 3.18 3.16 3.15
1% 5.74 5.67 5.62 5.56 5.52 5.48 5.46 5.42 5.39 5.36
9 5% 3.10 3.07 3.05 3.02 3.00 2.98 2.97 2.96 2.94 2.93
1% 5.18 5.11 5.06 5.00 4.96 4.92 4.89 4.86 4.83 4.80
10 5% 2.94 2.91 2.89 2.86 2.84 2.82 2.81 2.80 2.78 2.77
1% 4.78 4.71 4.66 4.60 4.56 4.52 4.49 4.47 4.44 4.41
11 5% 2.82 2.79 2.77 2.74 2.72 2.70 2.69 2.68 2.66 2.65
1% 4.46 4.40 4.35 4.29 4.25 4.21 4.18 4.16 4.13 4.10
12 5% 2.72 2.69 2.67 2.64 2.62 2.60 2.59 2.57 2.56 2.54
1% 4.22 4.16 4.11 4.05 4.02 3.98 3.95 3.92 3.89 3.86
13 5% 2.63 2.60 2.58 2.55 2.53 2.51 2.50 2.49 2.47 2.46
1% 4.02 3.96 3.92 3.85 3.82 3.78 3.75 3.73 3.70 3.67
14 5% 2.56 2.53 2.51 2.48 2.46 2.44 2.43 2.42 2.40 2.39
1% 3.86 3.80 3.75 3.70 3.66 3.62 3.59 3.57 3.54 3.51
15 5% 2.51 2.48 2.46 2.43 2.41 2.39 2.38 2.36 2.35 2.33
1% 3.73 3.67 3.61 3.56 3.52 3.48 3.45 3.42 3.39 3.36
16 5% 2.45 2.42 2.40 2.37 2.35 2.33 2.32 2.31 2.29 2.28
1% 3.61 3.55 3.50 3.45 3.41 3.37 3.34 3.31 3.28 3.25
17 5% 2.41 2.38 2.36 2.33 2.31 2.29 2.28 2.26 2.25 2.23
1% 3.52 3.45 3.40 3.35 3.31 3.27 3.24 3.22 3.19 3.16

Table-2 (Continued…)
Critical values for F-distribution

Degrees of freedom for greater mean square (n1)
Smaller MS (n2) 11 12 13 14 15 16 17 18 19 20
18 5% 2.37 2.34 2.32 2.29 2.27 2.25 2.24 2.22 2.21 2.19
1% 3.44 3.37 3.32 3.27 3.23 3.19 3.16 3.13 3.10 3.07
19 5% 2.34 2.31 2.29 2.26 2.24 2.21 2.20 2.18 2.17 2.15
1% 3.36 3.30 3.25 3.19 3.16 3.12 3.09 3.06 3.03 3.00
20 5% 2.31 2.28 2.26 2.23 2.21 2.18 2.17 2.15 2.14 2.12
1% 3.30 3.23 3.18 3.13 3.09 3.05 3.02 3.00 2.97 2.94
21 5% 2.28 2.25 2.23 2.20 2.18 2.15 2.14 2.12 2.11 2.09
1% 3.24 3.17 3.12 3.07 3.03 2.99 2.96 2.94 2.91 2.88
22 5% 2.26 2.23 2.21 2.18 2.16 2.13 2.12 2.10 2.09 2.07
1% 3.18 3.12 3.07 3.02 2.98 2.94 2.91 2.89 2.86 2.83
23 5% 2.24 2.20 2.17 2.14 2.12 2.10 2.09 2.07 2.06 2.04
1% 3.14 3.07 3.02 2.97 2.93 2.89 2.86 2.84 2.81 2.78
24 5% 2.22 2.18 2.16 2.13 2.11 2.09 2.07 2.06 2.04 2.02
1% 3.09 3.03 2.98 2.93 2.89 2.85 2.82 2.80 2.77 2.74
25 5% 2.20 2.16 2.14 2.11 2.09 2.07 2.05 2.04 2.02 2.00
1% 3.05 2.99 2.94 2.89 2.85 2.81 2.78 2.76 2.73 2.70
26 5% 2.18 2.15 2.13 2.10 2.08 2.05 2.04 2.02 2.01 1.99
1% 3.02 2.96 2.91 2.86 2.82 2.77 2.74 2.72 2.69 2.66
27 5% 2.16 2.13 2.11 2.08 2.06 2.03 2.02 2.00 1.99 1.97
1% 2.98 2.93 2.88 2.83 2.79 2.74 2.71 2.69 2.66 2.63
28 5% 2.15 2.12 2.09 2.06 2.04 2.02 2.01 1.99 1.98 1.96
1% 2.95 2.90 2.85 2.80 2.76 2.71 2.68 2.66 2.63 2.60
29 5% 2.14 2.10 2.08 2.05 2.03 2.00 1.99 1.97 1.96 1.94
1% 2.92 2.87 2.82 2.77 2.73 2.68 2.65 2.63 2.60 2.57
30 5% 2.12 2.09 2.06 2.04 2.02 1.99 1.98 1.96 1.95 1.93
1% 2.90 2.84 2.79 2.74 2.70 2.66 2.63 2.61 2.58 2.55
31 5% 2.11 2.08 2.05 2.03 2.01 1.98 1.97 1.95 1.94 1.92
1% 2.88 2.82 2.77 2.72 2.68 2.64 2.61 2.59 2.56 2.53
32 5% 2.10 2.07 2.05 2.02 2.00 1.97 1.96 1.94 1.93 1.91
1% 2.86 2.80 2.75 2.70 2.66 2.62 2.59 2.57 2.54 2.51
33 5% 2.09 2.06 2.04 2.01 1.99 1.96 1.95 1.93 1.92 1.90
1% 2.84 2.78 2.73 2.68 2.64 2.60 2.57 2.55 2.52 2.49
34 5% 2.08 2.05 2.03 2.00 1.98 1.95 1.94 1.92 1.91 1.89
1% 2.82 2.76 2.71 2.66 2.62 2.58 2.55 2.53 2.50 2.47

Table-2 (Continued…)
Critical values for F-distribution

Degrees of freedom for greater mean square (n1)
Smaller MS (n2) 11 12 13 14 15 16 17 18 19 20
35 5% 2.07 2.04 2.02 1.99 1.97 1.94 1.93 1.91 1.90 1.88
1% 2.80 2.74 2.69 2.64 2.60 2.56 2.53 2.51 2.48 2.45
36 5% 2.06 2.03 2.01 1.98 1.96 1.93 1.92 1.90 1.89 1.87
1% 2.78 2.72 2.67 2.62 2.58 2.54 2.51 2.49 2.46 2.43
37 5% 2.06 2.03 2.00 1.97 1.95 1.93 1.91 1.89 1.88 1.86
1% 2.77 2.71 2.66 2.61 2.57 2.53 2.50 2.47 2.44 2.41
38 5% 2.05 2.02 1.99 1.96 1.94 1.92 1.90 1.89 1.87 1.85
1% 2.75 2.69 2.64 2.59 2.55 2.51 2.48 2.46 2.43 2.40
39 5% 2.05 2.01 1.99 1.96 1.93 1.91 1.89 1.88 1.86 1.85
1% 2.74 2.68 2.63 2.58 2.54 2.50 2.48 2.45 2.42 2.38
40 5% 2.04 2.00 1.98 1.95 1.93 1.90 1.89 1.87 1.86 1.84
1% 2.73 2.66 2.61 2.56 2.53 2.49 2.46 2.43 2.40 2.37
41 5% 2.03 2.00 1.98 1.95 1.92 1.90 1.88 1.86 1.85 1.83
1% 2.72 2.65 2.60 2.55 2.51 2.48 2.45 2.42 2.39 2.36
42 5% 2.02 1.99 1.97 1.94 1.92 1.89 1.87 1.86 1.84 1.82
1% 2.70 2.64 2.59 2.54 2.50 2.46 2.43 2.41 2.38 2.35
43 5% 2.02 1.99 1.96 1.93 1.91 1.89 1.87 1.85 1.83 1.82
1% 2.69 2.63 2.58 2.53 2.49 2.45 2.42 2.39 2.36 2.33
44 5% 2.01 1.98 1.95 1.92 1.90 1.88 1.86 1.85 1.83 1.81
1% 2.68 2.62 2.57 2.52 2.48 2.44 2.41 2.38 2.35 2.32
45 5% 2.01 1.98 1.95 1.92 1.90 1.88 1.86 1.84 1.82 1.81
1% 2.67 2.61 2.56 2.51 2.47 2.43 2.40 2.37 2.34 2.31
46 5% 2.00 1.97 1.94 1.91 1.89 1.87 1.85 1.84 1.82 1.80
1% 2.66 2.60 2.55 2.50 2.46 2.42 2.39 2.36 2.33 2.30
47 5% 2.00 1.97 1.94 1.91 1.89 1.87 1.85 1.83 1.81 1.80
1% 2.65 2.59 2.54 2.49 2.45 2.41 2.38 2.35 2.32 2.29
48 5% 1.99 1.96 1.93 1.90 1.88 1.86 1.85 1.83 1.81 1.79
1% 2.64 2.58 2.53 2.48 2.44 2.40 2.37 2.34 2.31 2.28
49 5% 1.99 1.96 1.93 1.90 1.88 1.86 1.84 1.82 1.80 1.79
1% 2.63 2.57 2.52 2.47 2.43 2.40 2.36 2.33 2.30 2.27
50 5% 1.98 1.95 1.92 1.89 1.87 1.85 1.83 1.82 1.80 1.78
1% 2.62 2.56 2.51 2.46 2.43 2.39 2.36 2.33 2.29 2.26

Table-2 (Continued…)
Critical values for F-distribution

Degrees of freedom for greater mean square (n1)
Smaller MS (n2) 11 12 14 16 20 24 30 40 50 75
55 5% 1.97 1.93 1.88 1.83 1.76 1.72 1.67 1.61 1.58 1.52
1% 2.59 2.53 2.43 2.35 2.23 2.15 2.06 1.96 1.90 1.82
60 5% 1.95 1.92 1.86 1.81 1.75 1.70 1.65 1.59 1.56 1.50
1% 2.56 2.50 2.40 2.32 2.20 2.12 2.03 1.93 1.87 1.79
65 5% 1.94 1.90 1.85 1.80 1.73 1.68 1.63 1.57 1.54 1.49
1% 2.54 2.47 2.37 2.30 2.18 2.09 2.00 1.90 1.84 1.76
70 5% 1.93 1.89 1.84 1.79 1.72 1.67 1.62 1.56 1.53 1.47
1% 2.51 2.45 2.35 2.28 2.15 2.07 1.98 1.88 1.82 1.74
80 5% 1.91 1.88 1.82 1.77 1.70 1.65 1.60 1.54 1.51 1.45
1% 2.48 2.41 2.32 2.24 2.11 2.03 1.94 1.84 1.78 1.70
100 5% 1.88 1.85 1.79 1.75 1.68 1.63 1.57 1.51 1.48 1.42
1% 2.43 2.36 2.26 2.19 2.06 1.98 1.89 1.79 1.73 1.64
125 5% 1.86 1.83 1.77 1.72 1.65 1.60 1.55 1.49 1.45 1.39
1% 2.40 2.33 2.23 2.15 2.03 1.94 1.85 1.75 1.68 1.59
150 5% 1.85 1.82 1.76 1.71 1.64 1.59 1.54 1.47 1.44 1.37
1% 2.37 2.30 2.20 2.12 2.00 1.91 1.83 1.72 1.66 1.56
200 5% 1.83 1.80 1.74 1.69 1.62 1.57 1.52 1.45 1.42 1.35
1% 2.34 2.28 2.17 2.09 1.97 1.88 1.79 1.69 1.62 1.53
400 5% 1.81 1.78 1.72 1.67 1.60 1.54 1.49 1.42 1.38 1.32
1% 2.29 2.23 2.12 2.04 1.92 1.84 1.74 1.64 1.57 1.47
1000 5% 1.80 1.76 1.70 1.65 1.58 1.53 1.47 1.41 1.36 1.30
1% 2.26 2.20 2.09 2.01 1.89 1.81 1.71 1.61 1.54 1.44
∞ 5% 1.79 1.75 1.69 1.64 1.57 1.52 1.46 1.40 1.35 1.28
1% 2.24 2.18 2.07 1.99 1.87 1.79 1.69 1.59 1.52 1.41
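
Entries of Table-2 can be cross-checked numerically. The following is a minimal sketch in Python using only the standard library. The function names (f_cdf, f_critical) are illustrative and do not come from this manual. It assumes the standard relation between the F distribution and the regularized incomplete beta function, evaluates that function with a continued fraction, and locates the critical value by bisection.

```python
import math

def _cont_frac(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def guard(v):
        # Keep denominators away from zero inside the recurrence.
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _cont_frac(a, b, x) / a
    return 1.0 - front * _cont_frac(b, a, 1.0 - x) / b

def f_cdf(f, n1, n2):
    """P(F <= f) for n1 (greater mean square) and n2 (smaller mean square) d.f."""
    x = n1 * f / (n1 * f + n2)
    return reg_inc_beta(n1 / 2.0, n2 / 2.0, x)

def f_critical(alpha, n1, n2):
    """Upper-tail critical value: the f with P(F > f) = alpha, found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while f_cdf(hi, n1, n2) < target:  # expand the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, n1, n2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(f_critical(0.05, 1, 20), 2))  # compare with the 5% entry for n1 = 1, n2 = 20
```

A computed value that differs from a tabulated entry by more than rounding suggests a misprint in the table rather than a real difference.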

Table-3: χ2 (Chi-Squared) Distribution: Critical Values of χ2

Table-4: Critical value for Correlation coefficients (Simple or Partial)

Probability Probability Probability
DF 0.01 0.05 DF 0.01 0.05 DF 0.01 0.05
1 1.000 0.997 41 0.389 0.301 130 0.223 0.171
2 0.990 0.950 42 0.384 0.297 135 0.219 0.168
3 0.959 0.878 43 0.380 0.294 140 0.215 0.165
4 0.917 0.811 44 0.376 0.291 145 0.212 0.162
5 0.874 0.754 45 0.372 0.288 150 0.208 0.159
6 0.834 0.707 46 0.368 0.285 160 0.202 0.154
7 0.798 0.666 47 0.365 0.282 170 0.196 0.150
8 0.765 0.632 48 0.361 0.279 180 0.190 0.145
9 0.735 0.602 49 0.358 0.276 190 0.185 0.142
10 0.708 0.576 50 0.354 0.273 200 0.181 0.138
11 0.684 0.553 52 0.348 0.268 250 0.162 0.124
12 0.661 0.532 54 0.341 0.263 300 0.148 0.113
13 0.641 0.514 56 0.336 0.259 350 0.137 0.105
14 0.623 0.497 58 0.330 0.254 400 0.128 0.098
15 0.606 0.482 60 0.325 0.250 450 0.121 0.092
16 0.590 0.468 62 0.320 0.246 500 0.115 0.088
17 0.575 0.456 64 0.315 0.242 600 0.105 0.080
18 0.561 0.444 66 0.310 0.239 700 0.097 0.074
19 0.549 0.433 68 0.306 0.235 800 0.091 0.069
20 0.537 0.423 70 0.302 0.232 900 0.086 0.065
21 0.526 0.413 72 0.298 0.229 1000 0.081 0.062
22 0.515 0.404 74 0.294 0.226
23 0.505 0.396 76 0.290 0.223
24 0.496 0.388 78 0.286 0.220
25 0.487 0.381 80 0.283 0.217
26 0.478 0.374 82 0.280 0.215
27 0.470 0.367 84 0.276 0.212
28 0.463 0.361 86 0.273 0.210
29 0.456 0.355 88 0.270 0.207
30 0.449 0.349 90 0.267 0.205
31 0.442 0.344 92 0.264 0.203
32 0.436 0.339 94 0.262 0.201
33 0.430 0.334 96 0.259 0.199
34 0.424 0.329 98 0.256 0.197
35 0.418 0.325 100 0.254 0.195
36 0.413 0.320 105 0.248 0.190
37 0.408 0.316 110 0.242 0.186
38 0.403 0.312 115 0.237 0.182
39 0.398 0.308 120 0.232 0.178
40 0.393 0.304 125 0.228 0.174
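
The critical values of r in Table-4 are tied to Student's t through the standard relation t = r·sqrt(DF / (1 − r²)). The sketch below, with the illustrative function name t_from_r (not taken from this manual), converts a tabulated r back to t so that it can be compared with a two-sided t-table at the same DF and probability level.

```python
import math

def t_from_r(r, df):
    """Convert a correlation coefficient to the equivalent t statistic.

    Assumes |r| < 1; r = 1.000 (the rounded DF = 1 entry) would divide by zero.
    """
    return r * math.sqrt(df / (1.0 - r * r))

# Tabulated critical r at DF = 10, probability 0.05, is 0.576.
print(round(t_from_r(0.576, 10), 3))
```

For DF = 10 the result is close to 2.228, the usual two-sided 5% value of t for 10 degrees of freedom, which is the consistency one expects from this relation.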

Table-5: Percentage points of the normal distribution, Z

This table gives percentage points of the standard normal distribution. These are the values of z for which
a given percentage, P, of the standard normal distribution lies outside the range from -z to +z.

P (%) Z

90 0.1257

80 0.2533

70 0.3853

60 0.5244

50 0.6745

40 0.8416

30 1.0364

20 1.2816

15 1.4395

10 1.6449

5 1.9600

2 2.3263

1 2.5758

0.50 2.8070

0.25 3.0233

0.10 3.2905

0.01 3.8906
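
The entries of Table-5 follow from the definition above: if P % of the distribution lies outside (−z, +z), then P/200 lies in each tail. A minimal sketch using Python's standard library is given below; the function name z_for_two_sided_percent is illustrative only.

```python
from statistics import NormalDist

def z_for_two_sided_percent(p_percent):
    """Return z such that p_percent % of the standard normal lies outside (-z, +z)."""
    upper_tail = p_percent / 200.0  # half of P, expressed as a proportion
    return NormalDist().inv_cdf(1.0 - upper_tail)

for p in (90, 50, 5, 1):
    print(p, round(z_for_two_sided_percent(p), 4))
```

The printed values can be compared directly with the rows P = 90, 50, 5 and 1 of the table.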

Table-6: Random numbers

Each digit in the following table is independent of the others, and each of the digits 0 to 9 occurs with probability 1/10. The table was generated from a population in which the digits 0 to 9 were equally likely.

77 21 24 33 39 07 83 00 02 77 28 11 37 33

78 02 65 38 92 90 07 13 11 95 58 88 64 55

77 10 41 31 90 76 35 00 25 78 80 18 77 32

85 21 57 89 27 08 70 32 14 58 81 83 41 55

75 05 14 19 00 64 53 01 50 80 01 88 74 21

57 19 77 98 74 82 07 22 42 89 12 37 16 56

59 59 47 98 07 41 38 12 06 09 19 80 44 13

76 96 73 88 44 25 72 27 21 90 22 76 69 67

96 90 76 82 74 19 81 28 61 91 95 02 47 31

63 61 36 80 48 50 26 71 16 08 25 65 91 75

65 02 65 25 45 97 17 84 12 19 59 27 79 18

37 16 64 00 80 06 62 11 62 88 59 54 12 53

58 29 55 59 57 73 78 43 28 99 91 77 93 89

79 68 43 00 06 63 26 10 26 83 94 48 25 31

87 92 56 91 74 30 83 39 85 99 11 73 34 98

96 86 39 03 67 35 64 09 62 36 46 86 54 13

72 20 60 14 48 08 36 92 58 99 15 30 47 87

67 61 97 37 73 55 47 97 25 65 67 67 41 35

25 09 03 43 83 82 60 26 81 96 51 05 77 72

72 14 78 75 39 54 75 77 55 59 71 73 15 56

59 93 34 37 34 27 07 66 15 63 14 50 74 29

21 48 85 56 91 43 50 71 58 96 14 31 55 61

96 32 49 79 42 71 79 69 52 39 45 04 49 91

16 85 53 65 11 36 08 14 86 60 40 18 51 15

64 28 96 90 23 12 98 92 28 94 57 41 99 11

60 54 36 51 15 63 83 42 63 08 01 89 18 53

42 86 68 06 36 25 82 26 85 49 76 15 90 13

00 49 62 15 53 32 31 28 38 88 14 97 80 33

26 64 87 61 67 53 23 68 51 98 60 59 02 33

02 95 21 53 34 23 10 82 82 82 48 71 02 39

65 47 77 14 75 30 32 81 10 83 03 97 24 37

28 55 15 36 46 33 06 22 29 23 81 14 20 91

59 75 78 49 51 02 20 17 02 30 32 78 44 79

87 54 57 69 63 31 61 25 92 31 16 44 02 10

94 53 87 97 15 23 08 71 26 06 25 87 48 97

79 43 75 93 39 10 18 51 28 17 65 43 22 06

48 38 71 77 53 37 80 13 60 63 59 75 89 73

98 30 59 32 90 05 86 12 83 70 50 30 25 65

85 80 16 77 35 74 09 32 06 30 91 55 92 33

87 03 96 27 05 59 64 25 33 07 03 08 55 58
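
One common way to use such a table is to draw a simple random sample without replacement: read the two-digit entries in order, keep those that correspond to a unit number in the population, and skip entries that are out of range or already chosen. The sketch below illustrates this under the assumption that the population has at most 99 units numbered from 1; the function name draw_sample and the population size of 50 are hypothetical choices made for the example.

```python
def draw_sample(table_entries, population_size, sample_size):
    """Pick distinct unit numbers in 1..population_size from two-digit table entries."""
    selected = []
    for entry in table_entries:
        number = int(entry)
        # Skip 00, numbers beyond the population, and repeats.
        if number < 1 or number > population_size or number in selected:
            continue
        selected.append(number)
        if len(selected) == sample_size:
            break
    return selected

# First row of Table-6, read from left to right.
first_row = "77 21 24 33 39 07 83 00 02 77 28 11 37 33".split()
print(draw_sample(first_row, population_size=50, sample_size=5))
# prints [21, 24, 33, 39, 7]
```

Here 77 is skipped because it exceeds the population size, and reading continues along the row until five distinct units have been selected.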
