Statistical Methods

Teaching Material

Course Title: Statistical Methods
Credit Hrs.: 2(1+1)
Semester-III

Compiled by:
Dr. S. N. Singh (Univ. Professor)
Dr. Fozia Homa (Asstt. Professor)
Mr. Subrat Keshori Behera (Asstt. Professor)

Department of Statistics, Mathematics & Computer Application
BIHAR AGRICULTURAL COLLEGE, SABOUR
BIHAR AGRICULTURAL UNIVERSITY, SABOUR, BHAGALPUR, PIN 813 210

Course Content

Theory: Introduction to Statistics and its Applications in Agriculture, Graphical Representation of Data, Measures of Central Tendency & Dispersion, Definition of Probability, Addition and Multiplication Theorem (without proof), Simple Problems Based on Probability, Binomial & Poisson Distributions, Definition of Correlation, Scatter Diagram, Karl Pearson's Coefficient of Correlation, Linear Regression Equations, Introduction to Test of Significance, One Sample & Two Sample t-tests for Means, Chi-Square Test of Independence of Attributes in 2 x 2 Contingency Table, Introduction to Analysis of Variance, Analysis of One Way Classification, Introduction to Sampling Methods, Sampling versus Complete Enumeration, Simple Random Sampling with and without replacement, Use of Random Number Tables for selection of Simple Random Sample.

Practical: Graphical Representation of Data. Measures of Central Tendency (Ungrouped Data) with Calculation of Quartiles, Deciles & Percentiles. Measures of Central Tendency (Grouped Data) with Calculation of Quartiles, Deciles & Percentiles. Measures of Dispersion (Ungrouped Data). Measures of Dispersion (Grouped Data). Moments, Measures of Skewness & Kurtosis (Ungrouped Data). Moments, Measures of Skewness & Kurtosis (Grouped Data). Correlation & Regression Analysis. Application of One Sample t-test. Application of Two Sample Fisher's t-test. Chi-Square Test of Goodness of Fit. Chi-Square Test of Independence of Attributes for 2 x 2 Contingency Table. Analysis of Variance One Way Classification. Analysis of Variance Two Way Classification. Selection of random sample using Simple Random Sampling.

References:
1. Hand Book of Agricultural Statistics by S.R.S. Chandel.
2. Fundamentals of Mathematical Statistics (Vol. I & II) by S.C. Gupta and V.K. Kapoor.
3. Mathematical Statistics by J.N. Kapur and H.C. Saxena.
4. Elements of Statistics by B.N. Asthana.
5. Elements of Statistics by E.B. Mode.
6. Statistical Methods for Agricultural Workers by V.G. Panse & P.V. Sukhatme.
7. Design and Analysis of Experiments by M.N. Das & N.C. Giri.

Introduction and Development of Statistics

The word Statistics seems to have been derived from the Latin word 'Status', the Italian word 'Statista' or the German word 'Statistik', each of which means a "political state". In ancient times, governments used it to collect information regarding the population and the "property or wealth" of the country.

Sir Ronald A. Fisher (1890-1962) is known as the father of statistics. He applied statistics to various fields such as Genetics, Biometry, Education and Agriculture.

Definition of statistics: "These are the aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner, for a predetermined purpose and placed in relation to each other." (Prof. Horace Secrist)

When the word is used in the plural, it means quantitative data. When it is used in the singular, it is defined as the "science which deals with collection, presentation, analysis and interpretation of numerical data" (Croxton and Cowden).

Purpose or Function of statistics:
1. To summarise a large mass of data into a few representative values.
2. To establish a relation among the data sets or within each data set.

Importance and Scope: Statistics has wide applications in almost all sciences, social as well as physical: Planning, Economics, Business, Industry, Meteorology, Education, War, Agriculture, Psychometry etc.
Limitations of statistics:
* Statistics is not suited to the study of qualitative phenomena.
* Statistics does not study individuals.
* Statistical laws are not exact.
* Statistics is liable to be misused.

Application of Statistics in Agriculture

In agriculture, statistics is used for the collection, presentation, analysis and interpretation of numerical data. It is applied in the design of experiments through the analysis of variance, and various statistical tools are applied to study:
* Suitable fertilizer dose
* Suitable varieties of different crops
* Date of sowing
* Method of transplanting
* Weather forecasting in meteorology
* Disease and insect pest forecasting
* Weather parameters (temperature, rainfall, sunshine, wind velocity, humidity etc.)
* Yield of the different crops
* Yield attributes, morphological and biochemical traits
* Chemical and physical studies of soil
* Evaluation of pesticide efficacy
* Cost of cultivation
* Crop cutting experiments to estimate the yield of the different crops
* Pre-harvest forecast of yield based on biometrical characters and farmers' appraisal
* Forecasting of yield of different crops based on meteorological data

Statistical tools:
* Measures of central tendency, measures of dispersion, graphical representations
* Different sampling techniques in sample surveys
* Different tests of significance
* Correlation, regression, multiple correlation and multiple regression
* Rank correlation and more

Frequency distribution

A frequency distribution is an arrangement of variate values along with their respective frequencies.

Frequency: The term is derived from "how frequently a variable occurs". Each class is defined by two boundaries. The lower boundary is called the lower limit and the upper boundary is called the upper limit.
Range = Maximum value - Minimum value
Class interval = Upper limit - Lower limit
Mid value = (Upper limit + Lower limit)/2
Frequency density = Frequency / Class width
Relative frequency = Class frequency / Total frequency

The following points may be kept in mind for classification of data:
(i) The classes should be clearly defined and free from ambiguity.
(ii) The classes should be exhaustive, i.e. each of the given values should be included in one of the classes.
(iii) The classes should be mutually exclusive and non-overlapping.
(iv) The classes should be of equal width.
(v) Indeterminate classes, i.e. open end classes (less than or greater than), should be avoided as far as possible.
(vi) The number of classes should neither be too large nor too small. It should preferably lie between 5 and 15. Sturges gave the following formula for determining the approximate number of classes: $K = 1 + 3.322 \log_{10} N$, where N is the total frequency.

Graphical Representation

In a graphical representation the data are shown as points plotted on graph paper, which makes unwieldy data intelligible and conveys to the eye the general run of the observations. Graphical representation also facilitates the comparison of two or more frequency distributions.

Some important types of graphical representation are:
(i) Histogram
(ii) Frequency polygon
(iii) Frequency curve

Histogram: If the frequency distribution is not continuous, it is first converted into a continuous distribution by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit of each class (this assumes a gap of 1 between consecutive classes; in general, half the gap is used). In drawing the histogram of a continuous frequency distribution we first mark off the class intervals on the x-axis and the corresponding frequencies on the y-axis, selecting a suitable scale. On each class interval we erect a rectangle with height proportional to the frequency of the corresponding class interval, so that the area of the rectangle is proportional to the frequency of the class.
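The class-interval quantities and the rule for the number of classes given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the yield figures are invented sample data, the function names `sturges_classes` and `frequency_table` are hypothetical, the value of K is rounded to the nearest whole number, and the last class is treated as closed on the right so that the maximum value is included.

```python
import math

def sturges_classes(n):
    """Approximate number of classes, K = 1 + 3.322 * log10(N), rounded."""
    return round(1 + 3.322 * math.log10(n))

def frequency_table(values, k=None):
    """Build equal-width classes and return one dict per class."""
    n = len(values)
    k = k or sturges_classes(n)
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; no class width can be formed")
    width = (hi - lo) / k              # range divided by number of classes
    # Shared edges so that neighbouring classes meet exactly (mutually exclusive).
    edges = [lo + i * width for i in range(k)] + [hi]
    table = []
    for i in range(k):
        lower, upper = edges[i], edges[i + 1]
        if i == k - 1:                 # last class also includes the maximum
            freq = sum(1 for v in values if lower <= v <= upper)
        else:
            freq = sum(1 for v in values if lower <= v < upper)
        table.append({
            "lower": lower,
            "upper": upper,
            "mid_value": (lower + upper) / 2,
            "frequency": freq,
            "frequency_density": freq / (upper - lower),
            "relative_frequency": freq / n,
        })
    return table

# Hypothetical yields (quintals per hectare) of 20 plots; not from the source.
yields = [12, 15, 11, 18, 20, 22, 14, 16, 19, 25,
          13, 17, 21, 23, 15, 18, 16, 20, 24, 30]
table = frequency_table(yields)
for row in table:
    print(row)
```

With equal class widths the frequency and the frequency density differ only by a constant factor, so either can be used as the height of the rectangles.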
When the classes are of unequal width, the heights of the rectangles are proportional to the ratio of the frequency to the width of the class. The diagram of contiguous rectangles so obtained is called a histogram.

Frequency polygon: For an ungrouped distribution, the frequency polygon is obtained by plotting points with the variate values as abscissae and the corresponding frequencies as ordinates, and joining the points by straight lines. For a grouped frequency distribution the abscissae of the points are the mid values of the class intervals. The frequency polygon so obtained should be extended to the base line (x-axis) at both ends so that it meets the x-axis at the mid points of two hypothetical classes, the class before the first class and the class after the last class, each assumed to have zero frequency.

Frequency curve: If the class intervals are of small width, the frequency polygon can be approximated by a frequency curve, obtained by drawing a smooth freehand curve through the vertices of the frequency polygon.

Measures of Central Tendency

"Central tendency may be defined as a value of the variate which is thoroughly representative of the series or the distribution as a whole." Measures of central tendency give us an idea about the concentration of the values in the central part of the distribution. The following are the measures of central tendency:
(i) Arithmetic mean or mean
(ii) Median
(iii) Mode
(iv) Geometric mean
(v) Harmonic mean

Characteristics of an ideal measure of central tendency:
i. It should be rigidly defined.
ii. It should be readily comprehensible and easy to calculate.
iii. It should be based upon all the observations.
iv. It should be suitable for further mathematical treatment.
v. It should be affected as little as possible by fluctuations of sampling.
vi. It should not be affected much by extreme values.

Arithmetic Mean: If $x_1, x_2, \ldots, x_n$ are n observations, then the arithmetic mean is given by

$$\text{A.M.} = \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

In case of a frequency distribution,

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{N}, \quad \text{where } N = \sum_i f_i$$

In case of a grouped or continuous frequency distribution, x is taken as the mid value of the corresponding class.

Properties of Arithmetic Mean:
1. A.M. is not independent of change of origin and scale; it is affected by both. If $u_i = (x_i - A)/h$, then $\bar{x} = A + h\bar{u}$.
2. The algebraic sum of the deviations of a set of values from their arithmetic mean is zero.
3. The sum of the squares of the deviations of a set of values is minimum when taken about the mean.
4. Combined mean: if two series of sizes $n_1$ and $n_2$ have means $\bar{x}_1$ and $\bar{x}_2$, then $\bar{x} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$.

Merits of Arithmetic Mean:
(i) It is rigidly defined.
(ii) It is easy to understand and easy to calculate.
(iii) It is based upon all the observations.
(iv) It is amenable to algebraic treatment.
(v) It is least affected by fluctuations of sampling. This property is sometimes described by saying that the A.M. is a stable average.

Demerits:
1. Arithmetic mean is affected very much by extreme values.
2. It cannot be determined by inspection.
3. It cannot be used for qualitative characteristics like intelligence, honesty, beauty.
4. Arithmetic mean cannot be accurately obtained if even a single observation is missing or lost.
5. Arithmetic mean cannot be calculated if the extreme class is open.

Uses: It is generally used in all subjects of study, such as social and economic studies: average cost of production, average price, average yield per acre, etc.

Median: The median of a distribution is the value of the variable which divides it into two equal parts. It is the value which exceeds and is exceeded by the same number of observations, i.e. the number of observations above it is equal to the number of observations below it.

Step-I: In case of ungrouped data, if the number of observations is odd, the median is the middle value after the values have been arranged in ascending or descending order of magnitude.
Step-II: In case of an even number of observations there are two middle terms, and the median is obtained by taking the arithmetic mean of these middle terms after arranging the series in ascending or descending order.

Step-III: In case of a discrete frequency distribution, the median is obtained as follows:
(i) Construct the cumulative frequencies.
(ii) Find N/2, where $N = \sum_i f_i$.
(iii) See the cumulative frequency (c.f.) just greater than N/2; the corresponding value of x gives the median.

Step-IV: In case of a continuous frequency distribution, the median is obtained by the formula

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$$

where
l is the lower limit of the median class,
f is the frequency of the median class,
h is the magnitude (width) of the median class,
c is the cumulative frequency preceding the median class,
$N = \sum_i f_i$.

Merits of median:
(i) It is rigidly defined.
(ii) It is easy to understand and to calculate.
(iii) It is not at all affected by extreme values.
(iv) It can be calculated for distributions with open end classes.

Demerits of median:
(i) It is not amenable to algebraic treatment.
(ii) It is affected much by fluctuations of sampling.
(iii) In case of an even number of observations the median cannot be determined exactly.
(iv) It is not based on all the observations.

Uses:
1. Median is the only average to be used while dealing with qualitative data, e.g. to find the average intelligence or average honesty among a group of people.
2. It is to be used for determining the typical value in problems concerning the distribution of wages etc.

Mode: The mode is the value of the variable which occurs most frequently, i.e. whose frequency is maximum. In case of a continuous distribution the mode is given by

$$\text{Mode} = l + \frac{h\,(f_1 - f_0)}{2f_1 - f_0 - f_2}$$

where
l = lower limit of the modal class,
$f_1$ = frequency of the modal class (the maximum frequency),
$f_0$ and $f_2$ = frequencies of the classes preceding and following the modal class respectively,
h = magnitude (width) of the modal class.

Merits:
1. It is readily comprehensible and easy to calculate.
2. It is not at all affected by extreme values.
3. It can be obtained simply by inspection.
4. It can be computed in case of open end classes.

Demerits:
1. It is not rigidly defined. A distribution with two modes is called bi-modal and a distribution with more than two modes is called multi-modal.
2. It is not suitable for further mathematical treatment.
3. It is not based on all the observations.
4. It is affected to a great extent by fluctuations of sampling.

Uses: Mode is the average to be used in finding the ideal size, e.g. in business forecasting and in the manufacture of ready-made garments, shoe sizes etc.

For a symmetrical distribution the mean, median and mode coincide. If the distribution is moderately asymmetrical, the mean, median and mode obey the following empirical relations:

Mean - Median = (1/3)(Mean - Mode)
Mode = 3 Median - 2 Mean

DISPERSION

"Dispersion is the measure of the extent to which individual items vary." (L.R. Connor)

Consider the series
(i) 7, 8, 9, 10, 11
(ii) 3, 6, 9, 12, 15
(iii) 1, 5, 9, 13, 17

In all these cases the number of observations is 5 and the mean is 9. Given only the mean, we cannot tell whether it is the average of the first, second or third series, or of any other series of 5 observations whose sum is 45. Thus the measures of central tendency are inadequate to give us a complete idea of the distribution. They must be supported and supplemented by some other measures. One such measure is dispersion.

The literal meaning of dispersion is 'scatteredness'. Dispersion gives us an idea about the homogeneity or heterogeneity of the distribution. We say that series (i) is more homogeneous (less dispersed) than series (ii) or (iii), or that series (iii) is more heterogeneous (more scattered) than series (i) or (ii).

Characteristics of an ideal measure of dispersion:
i. It should be rigidly defined.
ii. It should be easy to calculate and easy to understand.
iii. It should be based on all the observations.
iv. It should be amenable to further mathematical treatment.
v. It should be affected as little as possible by fluctuations of sampling.

Following are the measures of dispersion:
1. Range
2. Quartile deviation or semi-interquartile range
3. Mean deviation
4. Standard deviation

(1) Range: Range is the difference between the two extreme observations of the distribution. If A and B are the largest and smallest values respectively, then

Range = A - B

Example: for 2, 4, 6, 8, 25, 30, Range = 30 - 2 = 28.

Range is not a reliable measure of dispersion as it is based upon only the two extreme values.

(2) Quartile deviation:

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

where $Q_1$ and $Q_3$ are the 1st and 3rd quartiles respectively. It is not a reliable measure of dispersion as it covers only the middle 50% of the distribution.

(3) Mean deviation: If $x_i \mid f_i$ is the frequency distribution, then the mean deviation about an average A is given by

$$\text{Mean deviation} = \frac{1}{N}\sum_i f_i\,|x_i - A|, \quad \text{where } N = \sum_i f_i$$

* Mean deviation is also not a reliable measure of dispersion, as the modulus sign makes every deviation positive and ignores its sign.
* Mean deviation is least when measured from the median.

(4) Standard deviation: It is the positive square root of the arithmetic mean of the squares of the deviations from the arithmetic mean. Standard deviation is denoted by $\sigma$ (sigma).

$$\sigma = \sqrt{\frac{1}{n}\sum_i (x_i - \bar{x})^2} \quad \text{for ungrouped data}$$

$$\sigma = \sqrt{\frac{1}{N}\sum_i f_i (x_i - \bar{x})^2} \quad \text{for grouped data}$$

Shortcut method:

$$\sigma_x^2 = \sigma_d^2 = \frac{1}{N}\sum_i f_i d_i^2 - \left(\frac{1}{N}\sum_i f_i d_i\right)^2, \quad \text{where } d_i = x_i - A$$

$$\sigma_x^2 = h^2 \sigma_d^2 = h^2\left[\frac{1}{N}\sum_i f_i d_i^2 - \left(\frac{1}{N}\sum_i f_i d_i\right)^2\right], \quad \text{where } d_i = \frac{x_i - A}{h}$$

where A = arbitrary value and h = class interval.

Standard deviation is a reliable measure of dispersion as it satisfies the characteristics of an ideal measure of dispersion. Standard deviation (or variance) is independent of change of origin but not of scale.

Coefficient of dispersion

Whenever we want to compare the variability of two series, we compute the coefficient of dispersion, not the measure of dispersion. The coefficient of dispersion is independent of the unit of measurement. For example:

3, 5, 7, 11, 15, 17 (cm)
4, 6, 8, 10, 12 (kg)

We can compare the variability of these two series although they are measured in different units.
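The point made earlier with the three series that share the mean 9 can be checked numerically. The following is a minimal sketch, not a prescribed procedure: series (i) is taken as 7, 8, 9, 10, 11 so that it has five observations with sum 45 as the text states, the helper names are hypothetical, and the standard deviation uses the divisor n as in the formula for ungrouped data.

```python
import math
import statistics

def value_range(xs):
    """Range = largest value - smallest value."""
    return max(xs) - min(xs)

def mean_deviation(xs, about):
    """Mean of the absolute deviations from the chosen average."""
    return sum(abs(x - about) for x in xs) / len(xs)

def std_dev(xs):
    """Positive square root of the mean squared deviation from the mean."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

series = {
    "(i)": [7, 8, 9, 10, 11],
    "(ii)": [3, 6, 9, 12, 15],
    "(iii)": [1, 5, 9, 13, 17],
}

for name, xs in series.items():
    m = statistics.mean(xs)
    print(name,
          "mean:", m,
          "median:", statistics.median(xs),
          "range:", value_range(xs),
          "mean deviation:", mean_deviation(xs, m),
          "s.d.:", round(std_dev(xs), 3))

# Change of origin leaves the standard deviation unchanged;
# change of scale multiplies it by the scale factor.
shifted = [x - 9 for x in series["(ii)"]]
scaled = [2 * x for x in series["(ii)"]]
print(std_dev(shifted), std_dev(scaled))
```

All three series have mean and median 9, while every measure of dispersion increases from series (i) to series (iii), which is why an average alone does not describe a distribution.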
Following are the coefficients of dispersion:

1. Coefficient of dispersion based upon range $= \dfrac{A - B}{A + B}$

2. Coefficient of dispersion based upon quartile deviation $= \dfrac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

3. Coefficient of dispersion based upon mean deviation $= \dfrac{\text{M.D.}}{\text{Average from which it is calculated}}$

4. Coefficient of dispersion based upon S.D. $= \dfrac{\sigma}{\bar{x}}$

Coefficient of variation: It is 100 times the coefficient of dispersion based upon standard deviation,

$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100\%$$

Whenever we want to compare the variation in two series, we compute the coefficient of variation of each series separately. The series having the larger C.V. is said to be more variable than the other, and the series having the smaller C.V. is said to be more consistent.

Moments

The r-th moment of a variable x about the point x = A, usually denoted by $\mu_r'$, is given by

$$\mu_r' = \frac{1}{N}\sum_i f_i (x_i - A)^r = \frac{1}{N}\sum_i f_i d_i^r, \quad \text{where } \sum_i f_i = N \text{ and } d_i = x_i - A$$

The r-th moment of a variable x about the mean $\bar{x}$, usually denoted by $\mu_r$, is given by

$$\mu_r = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^r = \frac{1}{N}\sum_i f_i z_i^r, \quad \text{where } z_i = x_i - \bar{x}$$

In particular,

$$\mu_1 = \frac{1}{N}\sum_i f_i (x_i - \bar{x}) = 0$$

(the algebraic sum of the deviations from the mean being zero).
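As an illustration of the coefficient of variation and of moments, the sketch below uses the two series quoted earlier in different units (3, 5, 7, 11, 15, 17 cm and 4, 6, 8, 10, 12 kg). It assumes ungrouped data, i.e. every frequency is 1 so that N = n; the function names and the arbitrary origin A = 6 are choices made only for this example.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def std_dev(xs):
    """Standard deviation with divisor n, as defined for ungrouped data."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def coefficient_of_variation(xs):
    """C.V. = (sigma / mean) * 100, a unit-free percentage."""
    return 100 * std_dev(xs) / mean(xs)

def raw_moment(xs, r, a):
    """r-th moment about the arbitrary point A = a."""
    return sum((x - a) ** r for x in xs) / len(xs)

def central_moment(xs, r):
    """r-th moment about the mean."""
    return raw_moment(xs, r, mean(xs))

lengths_cm = [3, 5, 7, 11, 15, 17]
weights_kg = [4, 6, 8, 10, 12]

cv_lengths = coefficient_of_variation(lengths_cm)
cv_weights = coefficient_of_variation(weights_kg)
print("C.V. of lengths (%):", round(cv_lengths, 2))
print("C.V. of weights (%):", round(cv_weights, 2))

# First two moments of the weights about the mean and about A = 6.
mu1 = central_moment(weights_kg, 1)        # zero: deviations from the mean cancel
mu2 = central_moment(weights_kg, 2)        # the variance
mu1_raw = raw_moment(weights_kg, 1, 6)
mu2_raw = raw_moment(weights_kg, 2, 6)
print(mu1, mu2, mu2_raw - mu1_raw ** 2)    # last value equals mu2 (shortcut method)
```

The series of lengths has the larger C.V., so it is the more variable of the two, and the weights are the more consistent, even though centimetres and kilograms cannot be compared directly.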