Teaching Material
Course Title: Statistical Methods
Credit Hrs. 2(1+1)
Semester-III
Compiled by:
Dr. S. N. Singh (Univ. Professor)
Dr. Fozia Homa (Asstt. Professor)
Mr. Subrat Keshori Behera (Asstt. Professor)
Department of Statistics, Mathematics & Computer Application
BIHAR AGRICULTURAL COLLEGE, SABOUR
BIHAR AGRICULTURAL UNIVERSITY, SABOUR
BHAGALPUR
PIN 813 210.
Course Content
Theory: Introduction to Statistics and its Applications in Agriculture, Graphical
Representation of Data, Measures of Central Tendency & Dispersion, Definition of
Probability, Addition and Multiplication Theorem (without proof). Simple Problems Based
on Probability. Binomial & Poisson Distributions, Definition of Correlation, Scatter Diagram,
Karl Pearson's Coefficient of Correlation. Linear Regression Equations. Introduction to Test
of Significance, One sample & two sample t-test for Means, Chi-Square Test of Independence
of Attributes in 2 x 2 Contingency Table. Introduction to Analysis of Variance, Analysis of
One Way Classification. Introduction to Sampling Methods, Sampling versus Complete
Enumeration, Simple Random Sampling with and without replacement, Use of Random
Number Tables for selection of Simple Random Sample.
Practical: Graphical Representation of Data, Measures of Central Tendency (Ungrouped
data) with Calculation of Quartiles, Deciles & Percentiles. Measures of Central Tendency
(Grouped data) with Calculation of Quartiles, Deciles & Percentiles. Measures of Dispersion
(Ungrouped Data). Measures of Dispersion (Grouped Data). Moments, Measures of
Skewness & Kurtosis (Ungrouped Data). Moments, Measures of Skewness & Kurtosis
(Grouped Data). Correlation & Regression Analysis. Application of One Sample t-test.
Application of Two Sample Fisher's test. Chi-Square test of Goodness of Fit. Chi-Square
test of Independence of Attributes for 2 x 2 contingency table. Analysis of Variance One Way
Classification. Analysis of Variance Two Way Classification. Selection of random sample
using Simple Random Sampling.
References:
1. Hand Book of Agricultural Statistics by S.R.S. Chandel.
2. Fundamentals of Mathematical Statistics (Vol. I & II) by S.C. Gupta and
V.K. Kapoor.
3. Mathematical Statistics by J.N. Kapur and H.C. Saxena.
4. Elements of Statistics by B.N. Asthana.
5. Elements of Statistics by E.B. Mode.
6. Statistical Methods for Agricultural Workers by V.G.Panse & P.V. Sukhatme.
7. Design and Analysis of Experiments by M.N. Das & N.C. Giri.
Introduction and Development of Statistics
The word 'Statistics' seems to have been derived from the Latin word 'Status', the
Italian word 'Statista' or the German word 'Statistik', each of which means a
"political state". In ancient times, governments used statistics to collect information
regarding the population and the "property or wealth" of the country.
Sir Ronald A. Fisher (1890-1962), known as the father of statistics, applied statistics to
various fields such as Genetics, Biometry, Education and Agriculture.
Definition of statistics: "These are the aggregates of facts affected to a marked extent by
multiplicity of causes, numerically expressed, enumerated or estimated according to
reasonable standards of accuracy, collected in a systematic manner for a predetermined
purpose and placed in relation to each other." (Prof. Horace Secrist)
When it is used in the plural, it means quantitative data.
When it is used in the singular, it is defined as the "science which deals with collection,
presentation, analysis and interpretation of numerical data". (Croxton and Cowden)
Purpose or Function of statistics:
1. To summarise a large mass of data into a few representative values.
2. To establish a relation among the data sets or within each data set.
Importance and Scope:
Statistics has wide applications in almost all sciences, social as well as physical: Planning,
Economics, Business, Industry, Meteorology, Education, War, Agriculture, Psychometry, etc.
Limitations of statistics:
* Statistics is not suited to the study of qualitative phenomena.
* Statistics does not study individuals.
* Statistical laws are not exact.
* Statistics is liable to be misused.
Application of Statistics in Agriculture
In Agriculture, statistics is used for the collection, presentation, analysis and interpretation
of numerical data. It is applied in the design of experiments through Analysis of Variance, and
various statistical tools are applied to problems such as:
* Suitable fertilizer dose
* Suitable varieties of different crops
* Date of sowing
* Method of transplanting
* Weather forecasting in meteorology
* Disease and insect pest forecasting
* Weather parameters (temperature, rainfall, sunshine, wind velocity, humidity, etc.)
* Yield of the different crops
* Yield attributes, morphological and biochemical traits
* Chemical and physical studies of soil
* Evaluation of pesticide efficacy
* Cost of cultivation
* Crop cutting experiments to estimate the yield of the different crops
* Pre-harvest forecast of yield based on biometrical characters and farmers' appraisal
* Forecasting of yield of different crops based on meteorological data
Statistical tools:
* Measures of central tendency, measures of dispersion, graphical representations
* Different sampling techniques in sample surveys
* Different tests of significance
* Correlation, regression, multiple correlation and multiple regression
* Rank correlation and more
Frequency distribution
It is an arrangement of variate values along with their respective frequencies.
Frequency: The frequency of a value is the number of times it occurs, i.e. "how frequently a
variable occurs".
Each class is defined by two boundaries. The lower boundary is called the lower limit and
the upper boundary is called the upper limit.
Range = Maximum value - Minimum value
Class interval = Upper limit - Lower limit
Mid value = (Upper limit + Lower limit)/2
Frequency density = Frequency / Class width
Relative frequency = Frequency / Total frequency
The following points may be kept in mind for classification of data:
(i) The classes should be clearly defined and free from ambiguity.
(ii) The classes should be exhaustive, i.e. each of the given values should be included in one
of the classes.
(iii) The classes should be mutually exclusive and non-overlapping.
(iv) The classes should be of equal width.
(v) Indeterminate classes and open-end classes ("less than" or "greater than") should be
avoided as far as possible.
(vi) The number of classes should be neither too large nor too small. It should preferably lie
between 5 and 15. Sturges gave the following formula for determining the approximate number
of classes: $k = 1 + 3.322 \log_{10} N$, where N is the total frequency.
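The classification rules above can be tried out on a small data set. The following Python sketch is illustrative only: the yield figures are hypothetical, the function names `sturges_classes` and `frequency_table` are made up for this example, and rounding k to the nearest whole number is an assumption (the notes only say the formula gives an approximate number of classes).

```python
import math

def sturges_classes(total_frequency):
    """Approximate number of classes: k = 1 + 3.322 * log10(N)."""
    k = 1 + 3.322 * math.log10(total_frequency)
    return round(k)  # rounding to the nearest integer is an assumption

def frequency_table(values, k):
    """Group values into k equal-width, mutually exclusive classes."""
    low, high = min(values), max(values)
    width = (high - low) / k          # class interval
    freqs = [0] * k
    for v in values:
        # Classes are [lower, upper); the last class also includes the maximum.
        idx = min(int((v - low) / width), k - 1)
        freqs[idx] += 1
    total = len(values)
    table = []
    for i, f in enumerate(freqs):
        lower = low + i * width
        upper = lower + width
        table.append({
            "lower": lower,
            "upper": upper,
            "mid_value": (lower + upper) / 2,
            "frequency": f,
            "relative_frequency": f / total,
            "frequency_density": f / width,
        })
    return table

# Hypothetical data: yields of 20 plots (units arbitrary).
yields = [12, 15, 10, 22, 25, 28, 31, 18, 19, 40,
          35, 27, 24, 21, 16, 33, 29, 23, 26, 20]
k = sturges_classes(len(yields))
table = frequency_table(yields, k)
for row in table:
    print(row)
```

Because every value falls in exactly one class, the frequencies add up to the total frequency and the relative frequencies add up to 1.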
Graphical Representation
A graphical representation consists of points plotted on graph paper. It makes
unwieldy data intelligible and conveys to the eye the general run of the
observations. Graphical representation also facilitates the comparison of two or more
frequency distributions.
Some important types of graphical representation are:
(i) Histogram
(ii) Frequency polygon
(iii) Frequency curve
Histogram: If the frequency distribution is not continuous, it is first converted into a
continuous distribution by subtracting 0.5 from the lower limit and adding 0.5 to the
upper limit of each class (this applies when consecutive classes are separated by a gap
of 1, e.g. 10-19, 20-29). In drawing the histogram of a continuous frequency distribution,
we first mark off the class intervals on the x-axis and the corresponding frequencies on the
y-axis by selecting a suitable scale. On each class interval we erect a rectangle with height
proportional to the frequency of the corresponding class interval, so that the area of the
rectangle is proportional to the frequency of the class. If, however, the classes are of
unequal width, then the heights of the rectangles will be proportional to the ratio of the
frequency to the width of the class. The diagram of contiguous rectangles so obtained is
called a histogram.
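The two computational steps in the histogram construction (making the classes continuous, and using frequency per unit width as the height when widths differ) can be sketched as follows. This is a minimal illustration in Python with hypothetical classes and frequencies; the function names are not from the notes.

```python
def to_continuous(classes, gap=1.0):
    """Convert inclusive classes such as 10-19, 20-29 into continuous ones.

    Half of the gap between consecutive classes (0.5 when the gap is 1)
    is subtracted from each lower limit and added to each upper limit.
    """
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

def rectangle_heights(classes, freqs):
    """Height of each rectangle = frequency / class width (frequency density)."""
    return [f / (upper - lower) for (lower, upper), f in zip(classes, freqs)]

# Hypothetical distribution with unequal class widths.
inclusive_classes = [(10, 19), (20, 29), (30, 49)]
freqs = [5, 8, 6]

continuous_classes = to_continuous(inclusive_classes)
heights = rectangle_heights(continuous_classes, freqs)
print(continuous_classes)  # [(9.5, 19.5), (19.5, 29.5), (29.5, 49.5)]
print(heights)
```

With these heights, the area of each rectangle (height times width) equals the class frequency, so the wide class 29.5-49.5 is not visually exaggerated.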
Frequency polygon: For an ungrouped distribution, the frequency polygon is obtained by
plotting points with the variate values as abscissae and the corresponding frequencies as
ordinates, and joining the points by straight lines. For a grouped
frequency distribution, the abscissae of the points are the mid values of the class intervals. The
frequency polygon so obtained should be extended to the base line (x-axis) at both ends
so that it meets the x-axis at the mid points of two hypothetical classes, the class before
the first class and the class after the last class, each assumed to have zero frequency.
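As an illustration of the grouped case, the sketch below lists the vertices of a frequency polygon, including the two closing points on the x-axis. The classes and frequencies are hypothetical, and the two hypothetical end classes are assumed to have the same width as the first and last classes respectively.

```python
def polygon_points(classes, freqs):
    """Vertices (mid value, frequency) of a frequency polygon for grouped data.

    Two extra points with zero frequency are added at the mid points of the
    hypothetical classes before the first and after the last class.
    """
    mids = [(lower + upper) / 2 for lower, upper in classes]
    first_lower, first_upper = classes[0]
    last_lower, last_upper = classes[-1]
    before = first_lower - (first_upper - first_lower) / 2
    after = last_upper + (last_upper - last_lower) / 2
    return [(before, 0)] + list(zip(mids, freqs)) + [(after, 0)]

# Hypothetical grouped distribution.
classes = [(0, 10), (10, 20), (20, 30)]
freqs = [4, 7, 3]
points = polygon_points(classes, freqs)
print(points)  # [(-5.0, 0), (5.0, 4), (15.0, 7), (25.0, 3), (35.0, 0)]
```

Joining these points in order by straight lines gives the polygon; a plotting library could be used for the drawing itself.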
Frequency curve: If the class intervals are of small width, the frequency polygon can be
approximated by a frequency curve, obtained by joining the points with a smooth freehand
curve. The frequency curve can also be obtained by drawing a smooth freehand curve through the
vertices of the frequency polygon.
Measures of Central Tendency
"Central tendency may be defined as a value of the variate which is thoroughly
representative of the series or the distribution as a whole." Measures of central tendency
give us an idea about the concentration of the values in the central part of the
distribution. The following are the measures of central tendency:
(i) Arithmetic mean or mean
(ii) Median
(iii) Mode
(iv) Geometric mean
(v) Harmonic mean
Characteristics of an ideal measure of central tendency:
i. It should be rigidly defined.
ii. It should be readily comprehensible and easy to calculate.
iii. It should be based upon all the observations.
iv. It should be suitable for further mathematical treatment.
v. It should be affected as little as possible by fluctuations of sampling.
vi. It should not be affected much by extreme values.
Arithmetic Mean: If $x_1, x_2, \dots, x_n$ are n observations, then the arithmetic mean is given by
A.M. $= \bar{x} = (x_1 + x_2 + \dots + x_n)/n$
In case of a frequency distribution,
Mean $= (x_1 f_1 + x_2 f_2 + \dots + x_n f_n)/N$, where $N = \sum_{i=1}^{n} f_i$
In case of a grouped or continuous frequency distribution, x is taken as the mid value of the
corresponding class.
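Properties of Arithmetic Mean:
1. A.M. is not independent of change of origin and scale; it is affected by both.
2. The algebraic sum of the deviations of a set of values from their arithmetic mean is zero.
3. The sum of the squares of the deviations of a set of values is minimum when taken about
the mean.
4. Combined mean: if $\bar{x}_1$ and $\bar{x}_2$ are the means of two series of sizes $n_1$ and $n_2$,
then $\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$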
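Properties 2 to 4 can be checked numerically. The sketch below is a small Python illustration on a hypothetical grouped distribution (mid values and frequencies are made up); the combined mean uses the standard formula (n1*mean1 + n2*mean2)/(n1 + n2), and the function names are chosen for this example only.

```python
def arithmetic_mean(xs, fs):
    """Mean of a frequency distribution: sum(f_i * x_i) / N, with N = sum(f_i)."""
    total = sum(fs)
    return sum(x * f for x, f in zip(xs, fs)) / total

def combined_mean(n1, mean1, n2, mean2):
    """Mean of two series pooled together."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

def sum_sq_dev(xs, fs, about):
    """Sum of squared deviations of the values from the point `about`."""
    return sum(f * (x - about) ** 2 for x, f in zip(xs, fs))

# Hypothetical grouped data: class mid values and frequencies.
mids = [5, 15, 25, 35]
freqs = [2, 3, 4, 1]

mean = arithmetic_mean(mids, freqs)
dev_sum = sum(f * (x - mean) for x, f in zip(mids, freqs))

print(mean)     # 19.0
print(dev_sum)  # 0.0  (property 2)
# Property 3: the sum of squares about the mean is smaller than about nearby points.
print(sum_sq_dev(mids, freqs, mean), sum_sq_dev(mids, freqs, 18), sum_sq_dev(mids, freqs, 20))
# Property 4: pooling this series (N = 10) with a second hypothetical series.
print(combined_mean(10, mean, 30, 23.0))  # 22.0
```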
Merits of Arithmetic Mean:
(i) It is rigidly defined.
(ii) It is easy to understand and easy to calculate.
(iii) It is based upon all the observations.
(iv) It is amenable to algebraic treatment.
(v) It is affected least by fluctuations of sampling. This property is sometimes
described by saying that the A.M. is a stable average.
Demerits:
1. Arithmetic mean is affected very much by extreme values.
2. It cannot be determined by inspection.
3. It cannot be used for qualitative characteristics like intelligence, honesty, beauty.
4. Arithmetic mean cannot be accurately obtained if even a single observation is missing or lost.
5. Arithmetic mean cannot be calculated if an extreme class is open.
Uses: It is generally used in all subjects of study, such as social and economic studies,
e.g. average cost of production, average price, average yield per acre, etc.
Median:
The median of a distribution is the value of the variable which divides it into two equal parts.
It is the value which exceeds and is exceeded by the same number of observations, i.e. it is
the value such that the number of observations above it is equal to the number of
observations below it.
Step-I: In case of ungrouped data, if the number of observations is odd, then the median is the
middle value after the values have been arranged in ascending or descending order of
magnitude.
Step-II: In case of an even number of observations there are two middle terms, and the median
is obtained by taking the arithmetic mean of these middle terms after arranging the series
in ascending or descending order.
Step-III: In case of a discrete frequency distribution, the median is obtained as follows:
(i) Construct the cumulative frequencies.
(ii) Find N/2, where $N = \sum_i f_i$.
(iii) Find the cumulative frequency (c.f.) just greater than N/2; the corresponding
value of x gives the median.
Step-IV: In case of a continuous frequency distribution, the class whose cumulative frequency
is just greater than N/2 is the median class, and the median is obtained by the formula
$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$
where l is the lower limit of the median class,
f is the frequency of the median class,
h is the magnitude (width) of the median class,
c is the cumulative frequency of the class preceding the median class,
and $N = \sum_i f_i$.
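Steps III and IV can be sketched in Python as below. The data are hypothetical and the function names are made up for illustration. For the continuous case the sketch uses the standard grouped-data formula Median = l + (h/f)(N/2 - c); it picks the first class whose cumulative frequency reaches N/2 (using "greater than or equal" rather than "just greater than" is an implementation choice that gives the same value and avoids an empty median class).

```python
def median_discrete(xs, fs):
    """Median of a discrete frequency distribution (xs in ascending order)."""
    half = sum(fs) / 2
    cum = 0
    for x, f in zip(xs, fs):
        cum += f                 # cumulative frequency up to x
        if cum > half:           # first c.f. just greater than N/2
            return x

def median_continuous(classes, fs):
    """Median of a continuous distribution: l + (h / f) * (N/2 - c)."""
    half = sum(fs) / 2
    c = 0                        # cumulative frequency preceding the current class
    for (lower, upper), f in zip(classes, fs):
        if c + f >= half:        # this is the median class
            h = upper - lower
            return lower + (h / f) * (half - c)
        c += f

# Hypothetical discrete distribution: N = 24, N/2 = 12, c.f. = 3, 8, 16, 22, 24.
print(median_discrete([1, 2, 3, 4, 5], [3, 5, 8, 6, 2]))   # 3

# Hypothetical continuous distribution: N = 40, median class 20-30, c = 13, f = 12.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]
print(median_continuous(classes, freqs))                   # 20 + (10/12) * 7
```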
Merits of median:
(i) It is rigidly defined.
(ii) It is easy to understand and to calculate.
(iii) It is not at all affected by extreme values.
(iv) It can be calculated for distributions with open-end classes.
Demerits of median:
(i) It is not amenable to algebraic treatment.
(ii) It is affected much by fluctuations of sampling.
(iii) In case of an even number of observations, the median cannot be determined exactly.
(iv) It is not based on all the observations.
Uses: 1. Median is the only average to be used while dealing with qualitative data, e.g. to find
the average intelligence or average honesty among a group of people.
2. It is to be used for determining the typical value in problems concerning the distribution
of wages, etc.
Mode: The mode is the value of the variable which occurs most frequently, i.e. whose frequency is
maximum.
In case of a continuous frequency distribution, the mode is given by:
$\text{Mode} = L + \frac{f - f_1}{2f - f_1 - f_2} \times h$
where L = lower limit of the modal class,
f = maximum frequency, i.e. the frequency of the modal class,
$f_1$ and $f_2$ = frequencies of the classes preceding and following the modal class respectively,
h = magnitude (width) of the modal class.
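A short Python sketch of the grouped-data mode is given below, using the standard formula Mode = L + h(f - f1)/(2f - f1 - f2) with f the modal frequency and f1, f2 the frequencies of the neighbouring classes. The data and the function name are hypothetical. When the modal class is the first or last class, the missing neighbour is taken to have zero frequency, which is an assumption of this sketch.

```python
def mode_continuous(classes, fs):
    """Mode of a continuous frequency distribution.

    Assumes equal class widths and a single modal class; if several classes
    share the maximum frequency, the first one is used.
    """
    i = max(range(len(fs)), key=lambda j: fs[j])   # index of the modal class
    lower, upper = classes[i]
    f = fs[i]                                      # frequency of the modal class
    f1 = fs[i - 1] if i > 0 else 0                 # preceding class
    f2 = fs[i + 1] if i < len(fs) - 1 else 0       # following class
    h = upper - lower
    # Note: the denominator is zero if f == f1 == f2 (no distinct modal class).
    return lower + h * (f - f1) / (2 * f - f1 - f2)

# Hypothetical distribution: modal class 20-30 with f = 12, f1 = 8, f2 = 10.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]
print(mode_continuous(classes, freqs))   # 20 + 10 * 4 / 6
```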
Merits:
1. It is readily comprehensible and easy to calculate.
2. It is not at all affected by extreme values.
3. It can be obtained simply by inspection.
4. It can be computed in case of open-end classes.
Demerits:
1. It is not rigidly defined; a distribution may have more than one mode. A distribution
with two modes is called bi-modal and a distribution with more than two modes is called
multi-modal.
2. It is not suitable for further mathematical treatment.
3. It is not based on all the observations.
4. It is affected to a great extent by fluctuations of sampling.
Uses: Mode is the average to be used in finding the ideal size, e.g. in business forecasting and
in the manufacture of ready-made garments, shoes, etc.
For a symmetrical distribution, the mean, median and mode coincide. If the distribution is
moderately asymmetrical, the mean, median and mode obey the following empirical relation:
Mean - Median = (1/3) (Mean - Mode)
i.e. Mode = 3 Median - 2 Mean
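The second form of the empirical relation follows from the first by simple rearrangement. The short derivation below, written in LaTeX, spells out the intermediate step:

```latex
\begin{align*}
\text{Mean} - \text{Median} &= \tfrac{1}{3}\,(\text{Mean} - \text{Mode}) \\
3\,\text{Mean} - 3\,\text{Median} &= \text{Mean} - \text{Mode} \\
\text{Mode} &= 3\,\text{Median} - 2\,\text{Mean}
\end{align*}
```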
DISPERSION
"Dispersion is the measure of the extent to which the individual items vary." (L.R. Connor)
Consider the series (i) 7, 8, 9, 10, 11 (ii) 3, 6, 9, 12, 15 (iii) 1, 5, 9, 13, 17.
In all these cases the number of observations is 5 and the mean is 9. If we are told only that
the mean of 5 observations is 9, we cannot form an idea as to whether it is the average of the
first series, the second series, the third series or any other series of 5 observations whose
sum is 45. Thus we see that the measures of central tendency are
inadequate to give us a complete idea of the distribution. They must be supported and
supplemented by some other measures. One such measure is dispersion.
The literal meaning of dispersion is 'scatteredness'. Dispersion gives us an idea about the
homogeneity or heterogeneity of the distribution. We say that series (i) is more homogeneous
(less dispersed) than series (ii) or (iii), or that series (iii) is more heterogeneous
(more scattered) than series (i) or (ii).
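The point of the three series can be checked with a few lines of Python. In this sketch series (i) is taken as 7, 8, 9, 10, 11, so that it has five observations summing to 45, and the standard deviation is the population form from the standard library (`statistics.pstdev`), used here only as one convenient measure of scatter.

```python
from statistics import mean, pstdev

series = {
    "(i)": [7, 8, 9, 10, 11],
    "(ii)": [3, 6, 9, 12, 15],
    "(iii)": [1, 5, 9, 13, 17],
}

summary = {}
for name, values in series.items():
    summary[name] = {
        "mean": mean(values),
        "range": max(values) - min(values),
        "sd": pstdev(values),   # population standard deviation (divisor n)
    }
    print(name, summary[name])
```

All three series have the same mean, yet the range and the standard deviation grow from series (i) to series (iii), which is exactly the difference the average alone cannot show.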
Characteristics of an ideal measure of dispersion:
i. It should be rigidly defined.
ii. It should be easy to calculate and easy to understand.
iii. It should be based on all the observations.
iv. It should be amenable to further mathematical treatment.
v. It should be affected as little as possible by fluctuations of sampling.
Following are the measures of dispersion:
1. Range
2. Quartile deviation or semi-interquartile range
3. Mean deviation
4. Standard deviation
1. Range: Range is the difference between the two extreme observations of the distribution. If A
and B are the greatest and the smallest observations respectively, then
Range = A - B
Example: for the observations 2, 4, 6, 8, 25, 30,
Range = 30 - 2 = 28
Range is not a reliable measure of dispersion as it is based upon only the two extreme
values.
2. Quartile deviation: $\text{Q.D.} = (Q_3 - Q_1)/2$
where $Q_1$ and $Q_3$ are the 1st and 3rd quartiles respectively.
It is not a reliable measure of dispersion as it is based on only the middle 50% of the distribution.
3. Mean deviation:
If $x_i \mid f_i$, $i = 1, 2, \dots, n$, is the frequency distribution, then the mean deviation
about an average A is given by
$\text{M.D.} = \frac{1}{N} \sum_i f_i \, |x_i - A|$, where $N = \sum_i f_i$
* Mean deviation is also not a reliable measure of dispersion, as the modulus sign makes every
deviation positive and so ignores the signs of the deviations.
* Mean deviation is least when measured from the median.
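The first three measures can be sketched in Python as follows, reusing the observations 2, 4, 6, 8, 25, 30 from the range example. The function names are illustrative, and the quartile deviation is computed from quartiles supplied by the caller (the values 10 and 30 below are hypothetical), because conventions for locating quartiles in ungrouped data differ.

```python
from statistics import mean, median

def value_range(values):
    """Range = greatest observation - smallest observation."""
    return max(values) - min(values)

def quartile_deviation(q1, q3):
    """Semi-interquartile range (Q3 - Q1) / 2 for given quartiles."""
    return (q3 - q1) / 2

def mean_deviation(values, about):
    """Mean of the absolute deviations of the values from the point `about`."""
    return sum(abs(x - about) for x in values) / len(values)

data = [2, 4, 6, 8, 25, 30]
md_mean = mean_deviation(data, mean(data))       # about the mean, 12.5
md_median = mean_deviation(data, median(data))   # about the median, 7.0

print(value_range(data))           # 28
print(quartile_deviation(10, 30))  # 10.0 (hypothetical quartiles)
print(md_mean, md_median)          # 10.0 8.5
```

The mean deviation about the median (8.5) is smaller than that about the mean (10.0), in line with the statement that mean deviation is least when measured from the median.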
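4. Standard deviation:
It is the positive square root of the arithmetic mean of the squares of the deviations from
the arithmetic mean.
Standard deviation is denoted by $\sigma$ (sigma).
$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ for ungrouped data.
$\sigma = \sqrt{\frac{1}{N} \sum_i f_i (x_i - \bar{x})^2}$ for grouped data, where $N = \sum_i f_i$.
Shortcut method:
$\sigma_x^2 = \sigma_d^2 = \frac{1}{N} \sum_i f_i d_i^2 - \left(\frac{1}{N} \sum_i f_i d_i\right)^2$, where $d_i = x_i - A$
$\sigma_x^2 = h^2 \sigma_d^2 = h^2 \left[\frac{1}{N} \sum_i f_i d_i^2 - \left(\frac{1}{N} \sum_i f_i d_i\right)^2\right]$, where $d_i = (x_i - A)/h$
where A = arbitrary value and
h = class interval.
It is a reliable measure of dispersion as it satisfies all the characteristics of an ideal
measure of dispersion.
Standard deviation (or variance) is independent of change of origin but not of scale.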
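The direct and shortcut computations can be compared numerically. The Python sketch below uses a hypothetical grouped distribution (mid values 5, 15, ..., 45), with A = 25 and h = 10 chosen for convenience; the function names are illustrative. It also checks the closing remark: adding a constant leaves the standard deviation unchanged, while multiplying by a constant rescales it.

```python
import math

def sd_direct(xs, fs):
    """sigma = sqrt( (1/N) * sum f_i (x_i - mean)^2 )."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n)

def sd_shortcut(xs, fs, a, h=1):
    """Shortcut method with d_i = (x_i - A) / h."""
    n = sum(fs)
    ds = [(x - a) / h for x in xs]
    mean_d = sum(f * d for d, f in zip(ds, fs)) / n
    mean_d2 = sum(f * d * d for d, f in zip(ds, fs)) / n
    return h * math.sqrt(mean_d2 - mean_d ** 2)

# Hypothetical grouped data: class mid values and frequencies (N = 40).
mids = [5, 15, 25, 35, 45]
freqs = [5, 8, 12, 10, 5]

direct = sd_direct(mids, freqs)
shortcut = sd_shortcut(mids, freqs, a=25, h=10)
print(direct, shortcut)   # both are sqrt(144.75)

shifted = sd_direct([x + 100 for x in mids], freqs)   # change of origin
scaled = sd_direct([3 * x for x in mids], freqs)      # change of scale
print(shifted, scaled)
```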
Coefficient of dispersion
Whenever we want to compare the variability of two series measured in different units, we
compute the coefficient of dispersion, not the measure of dispersion. The coefficient of
dispersion is independent of the unit of measurement.
3, 5, 7, 11, 15, 17 (cm)
4, 6, 8, 10, 12 (kg)
We can compare the above two series although they are measured in different units.
Following are the coefficients of dispersion:
1. Coefficient of dispersion based upon range $= \frac{A - B}{A + B}$
2. Coefficient of dispersion based upon quartile deviation $= \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
3. Coefficient of dispersion based upon mean deviation $= \frac{\text{M.D.}}{\text{Average from which it is calculated}}$
4. Coefficient of dispersion based upon S.D. $= \frac{\sigma}{\bar{x}}$
Coefficient of variation: It is 100 times the coefficient of dispersion based upon standard
deviation.
$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100\%$
Whenever we want to compare the variation in two series, we compute the coefficient of
variation of each series separately. The series having the larger C.V. is said
to be more variable than the other, and the series having the smaller C.V. is said
to be more consistent than the other.
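As an illustration, the coefficient of variation of the two series quoted above (one in cm, one in kg) can be computed as follows. This is a minimal Python sketch; it uses the population standard deviation (divisor n), matching the definition of standard deviation given in these notes, and the function name is illustrative.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = (standard deviation / mean) * 100, expressed in percent."""
    return pstdev(values) / mean(values) * 100

series_cm = [3, 5, 7, 11, 15, 17]   # measured in cm
series_kg = [4, 6, 8, 10, 12]       # measured in kg

cv_cm = coefficient_of_variation(series_cm)
cv_kg = coefficient_of_variation(series_kg)
print(round(cv_cm, 2), round(cv_kg, 2))

more_variable = "cm series" if cv_cm > cv_kg else "kg series"
print(more_variable)   # cm series
```

Since C.V. is a pure number, the comparison is meaningful even though the units differ: the series with the smaller C.V. (here the kg series) is the more consistent one.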
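Moments
The r-th moment of a variable X about the point x = A, usually denoted by $\mu_r'$, is given by:
$\mu_r' = \frac{1}{N} \sum_i f_i (x_i - A)^r = \frac{1}{N} \sum_i f_i d_i^r$, where $d_i = x_i - A$ and $N = \sum_i f_i$
The r-th moment of a variable X about the mean $\bar{x}$, usually denoted by $\mu_r$, is given by:
$\mu_r = \frac{1}{N} \sum_i f_i (x_i - \bar{x})^r = \frac{1}{N} \sum_i f_i z_i^r$, where $z_i = x_i - \bar{x}$
and $\mu_1 = \frac{1}{N} \sum_i f_i (x_i - \bar{x}) = 0$ (the algebraic sum of the deviations from the mean being zero).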
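The two definitions translate directly into code. The sketch below is a small Python illustration on a hypothetical symmetric frequency distribution; the function names `moment_about` and `central_moment` are chosen for this example.

```python
def moment_about(xs, fs, r, a):
    """r-th moment about the point A: (1/N) * sum f_i (x_i - A)^r."""
    n = sum(fs)
    return sum(f * (x - a) ** r for x, f in zip(xs, fs)) / n

def central_moment(xs, fs, r):
    """r-th moment about the mean: (1/N) * sum f_i (x_i - mean)^r."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return moment_about(xs, fs, r, mean)

# Hypothetical frequency distribution (N = 10, mean = 3).
xs = [1, 2, 3, 4, 5]
fs = [1, 2, 4, 2, 1]

print(moment_about(xs, fs, 1, 2))   # first moment about A = 2, equal to mean - A
print(central_moment(xs, fs, 1))    # 0.0, deviations from the mean cancel
print(central_moment(xs, fs, 2))    # second central moment, i.e. the variance
print(central_moment(xs, fs, 3))    # 0.0 for this symmetric distribution
```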