Sampling Theory and Method-301-500

Uploaded by

limfoohoat

STRATIFIED SAMPLING [Ch. 7]

7.14 TECHNIQUE OF POST-STRATIFICATION

The technique of post-stratification consists in dividing the population and the selected sample, at the estimation stage, into a certain number of strata ($K'$, say), termed post-strata, and estimating $\bar Y$ by the weighted mean of the estimators of the post-strata means, the weights being the proportions of units in the post-strata. That is, the estimator proposed is of the form

\[ \hat{\bar Y}_{pst} = \sum_{s=1}^{K'} \frac{N_s}{N}\,\frac{\hat Y_s}{\hat N_s}, \qquad (7.93) \]

where the subscript pst denotes post-stratification, $N_s/N$ is the proportion of units in the $s$-th post-stratum, and $\hat Y_s$ and $\hat N_s$ are the unbiased estimators of the $s$-th post-stratum total of $y$ and of the total number of units in that post-stratum respectively. The estimators $\hat Y_s$ and $\hat N_s$ are respectively obtained from the usual unbiased estimator $\hat Y$ by substituting the values $y$ and 1 respectively for sample units belonging to the $s$-th post-stratum and assigning the value 0 to those units not belonging to that post-stratum.

The estimator (7.93) is sometimes used in practice instead of the conventional estimator $\hat{\bar Y}$, which can be written as $\sum_s \hat Y_s/N$, since the post-stratified estimator $N_s\hat Y_s/(N\hat N_s)$ is expected to have a smaller sampling variance than the estimator $\hat Y_s/N$, though it is biased due to the use of the ratio estimator $\hat Y_s/\hat N_s$. Hence the problem to be faced in the case of post-stratification is whether the possible reduction in the sampling variance achieved through post-stratification is adequate to offset the bias introduced in the estimator.

The technique of post-stratification using a particular stratification variable can be resorted to even when stratified sampling based on another stratification variable has been used in drawing the original sample. This technique is also useful in obtaining estimates for domains of study by treating them as post-strata, if they are not already treated as selection strata.

The problem of post-stratification has been discussed by Hansen, Hurwitz and W. G.
Madow (1953) and Cochran (1963). Williams (1962) has given a simple procedure for finding approximations to the variance and the variance estimator of a post-stratified estimator on the basis of the variance and variance estimator of the original estimator. This procedure is based on the result (7.94) [equation illegible in the source], which holds when the sample size is fairly large (cf. Chapter 10). The variance of the post-stratified estimator $\hat{\bar Y}'$ and its variance estimator are obtained from the variance of the usual estimator $\hat{\bar Y}$ and its variance estimator by substituting $Y_{si}-\bar Y_s$ in place of $Y_{si}$ and $y_{si}-\bar y_s$ in place of $y_{si}$ respectively.

For instance, suppose post-stratification is resorted to when the original sampling design is unstratified srs wor. The variance of the usual estimator $\hat{\bar Y}$ and its variance estimator can be written as

\[ V(\hat{\bar Y}) = \frac{N-n}{Nn}\,\frac{1}{N-1}\sum_{s=1}^{K'}\sum_{i=1}^{N_s}(Y_{si}-\bar Y)^2 \qquad (7.95) \]

and

\[ v(\hat{\bar Y}) = \frac{N-n}{Nn}\,\frac{1}{n-1}\sum_{s=1}^{K'}\sum_{i=1}^{n_s}(y_{si}-\bar y)^2, \qquad (7.96) \]

where $N_s$ and $n_s$ are the numbers of population units and sample units falling in the $s$-th post-stratum. In this case the post-stratified estimator $\hat{\bar Y}'$ will be

\[ \hat{\bar Y}' = \frac{1}{N}\sum_{s=1}^{K'} N_s\,\bar y_s, \qquad (7.97) \]

where $\bar y_s$ is the mean of the $n_s$ sample observations falling in the $s$-th post-stratum. Its variance and variance estimator are approximately given by

\[ V(\hat{\bar Y}') \simeq \frac{N-n}{Nn}\,\frac{1}{N-1}\sum_{s=1}^{K'}\sum_{i=1}^{N_s}(Y_{si}-\bar Y_s)^2 \qquad (7.98) \]

and

\[ v(\hat{\bar Y}') \simeq \frac{N-n}{Nn}\,\frac{1}{n-1}\sum_{s=1}^{K'}\sum_{i=1}^{n_s}(y_{si}-\bar y_s)^2. \qquad (7.99) \]

These approximations to the variance and the variance estimator of the post-stratified estimator are likely to be close to the actual values only when the sample size is fairly large (cf. Problem 7.23, p. 292).

7.15 CONTROLLED SELECTION

In this section a brief discussion is given on the devices available for effecting controls beyond stratification in selecting the sample, so as to get estimates with greater precision per unit of cost than is possible in simple stratified sampling. R. Goodman and Kish (1950) suggested a process of selection termed controlled selection which, while retaining the probabilities of selection assigned to the units in a stratified sampling design, ensures greater probabilities of selection for some or all preferred combinations of $n$ out of $N$ units and consequently smaller probabilities of selection for some or all non-preferred combinations than in the original stratified sampling design. At this stage it may be mentioned that even stratified sampling and systematic sampling with a prespecified arrangement are also a kind of controlled selection, as in these designs the probabilities of some or all preferred combinations of units are increased and those of some or all non-preferred combinations are reduced as compared to an unstratified sampling design. However, in this section the term controlled selection is used in the sense of having controls beyond stratification for reducing the sampling variance of the estimator of the population parameter under study.

The fact that it is possible to introduce additional control in stratified sampling can be illustrated by the example of sampling one hospital from each of two size-strata given by Hess, Riedel and Fitzpatrick (1961).
Let A, B, C, D be the four large hospitals forming stratum 1 and let a, b, c, d and e be the five small hospitals forming stratum 2. If one hospital is to be selected from each stratum with srs, then the probability of selection of any unit is 0.25 in stratum 1 and 0.20 in stratum 2. Suppose it is further known that the hospitals A, B, a and b have ownership code 1 and that the hospitals C, D, c, d and e have ownership code 2. In controlled selection an attempt is made to increase the probability of getting samples with hospitals having different ownership codes, retaining at the same time the originally assigned probabilities of selection. One way of achieving this is to re-arrange the units in the two strata such that in stratum 1 the hospitals with ownership code 2 come first and then those with ownership code 1, and in stratum 2 the hospitals with code 1 come first and then those with code 2, and to select 2 hospitals systematically with probability proportional to their original probabilities of selection from all the units in the two strata taken together. All possible samples in stratified and controlled sampling are shown in Table 7.8, from which it is clear that the probability of getting a preferred sample having hospitals with different ownership codes is 0.90 in controlled selection as compared to 0.50 in stratified sampling.

As can be seen from the above example, in controlled selection an attempt is made to improve upon simple stratified sampling by making the selection in the different strata dependent on each other. It may be noted that in the above example we are conceptually trying to select 2 units from 4 deep strata formed on the basis of two stratification systems (one according to size and the other according to ownership) such that one unit is selected from each of the two size-strata and that the units in a sample are as heterogeneous as possible with respect to the second system of stratification.
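The probabilities quoted for the hospital example can be checked by direct enumeration. The following Python sketch is only an illustration under stated assumptions: it takes the systematic selection to use a random start $r$ in $[0, 1)$ and a sampling interval of 1 along the cumulated probabilities of the nine re-arranged units, and the function names (`stratified_samples`, `controlled_samples`, `preferred_probability`) are hypothetical helpers introduced here, not part of the original text.

```python
from fractions import Fraction
from itertools import product

# (name, ownership code, probability of selection), listed in the re-arranged
# order: stratum 1 has code 2 first, stratum 2 has code 1 first.
STRATUM_1 = [("C", 2, Fraction(1, 4)), ("D", 2, Fraction(1, 4)),
             ("A", 1, Fraction(1, 4)), ("B", 1, Fraction(1, 4))]
STRATUM_2 = [("a", 1, Fraction(1, 5)), ("b", 1, Fraction(1, 5)),
             ("c", 2, Fraction(1, 5)), ("d", 2, Fraction(1, 5)),
             ("e", 2, Fraction(1, 5))]
CODE = {name: code for name, code, _ in STRATUM_1 + STRATUM_2}


def stratified_samples():
    """One unit per stratum, selected independently with srs."""
    return {(u[0], v[0]): u[2] * v[2] for u, v in product(STRATUM_1, STRATUM_2)}


def controlled_samples():
    """Systematic pps selection of 2 units (assumed: start r in [0, 1), interval 1)."""
    # Place the 9 units end to end on [0, 2) in the re-arranged order.
    segments, start = [], Fraction(0)
    for name, _, prob in STRATUM_1 + STRATUM_2:
        segments.append((start, start + prob, name))
        start += prob

    def unit_at(point):
        return next(name for lo, hi, name in segments if lo <= point < hi)

    # The selected pair can only change where r or r + 1 crosses a segment boundary.
    cuts = sorted({lo % 1 for lo, _, _ in segments} | {Fraction(1)})
    samples = {}
    for left, right in zip(cuts, cuts[1:]):
        r = (left + right) / 2  # any start inside the interval gives the same pair
        pair = (unit_at(r), unit_at(r + 1))
        samples[pair] = samples.get(pair, Fraction(0)) + (right - left)
    return samples


def preferred_probability(samples):
    """Total probability of samples whose two hospitals have different ownership codes."""
    return sum(p for (u, v), p in samples.items() if CODE[u] != CODE[v])


print(preferred_probability(stratified_samples()))  # 1/2
print(preferred_probability(controlled_samples()))  # 9/10
```

Exact fractions are used so that the totals can be compared with 0.50 and 0.90 without rounding. Summing the probabilities of the controlled samples that contain a given hospital also recovers its original probability of selection (0.25 or 0.20), which is the property the procedure is meant to retain.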
Thus the problem of controlled selection may be posed in a general way as that of devising a method of selecting a sample, when the number of multiple or deep strata is more than the sample size, such that the allocation to two or more systems of strata and the original probabilities of selection for the units are ensured. Bryant, Hartley and Jessen (1960) have given an interesting and fairly simple solution to this problem and the procedure suggested by them is briefly described here.

TABLE 7.8. ALL POSSIBLE SAMPLES IN STRATIFIED AND CONTROLLED SAMPLING WITH THEIR PROBABILITIES OF SELECTION

Units and their probabilities of selection:

stratum   hospital*   ownership code   probability
   1         C              2             0.25
   1         D              2             0.25
   1         A              1             0.25
   1         B              1             0.25
   2         a              1             0.20
   2         b              1             0.20
   2         c              2             0.20
   2         d              2             0.20
   2         e              2             0.20

Stratified sampling (each of the 20 samples** has probability 0.05):

C2a1  C2b1  C2c2  C2d2  C2e2
D2a1  D2b1  D2c2  D2d2  D2e2
A1a1  A1b1  A1c2  A1d2  A1e2
B1a1  B1b1  B1c2  B1d2  B1e2

Controlled selection:

sample**   probability
  C2a1        0.20
  C2b1        0.05
  D2b1        0.15
  D2c2        0.10
  A1c2        0.10
  A1d2        0.15
  B1d2        0.05
  B1e2        0.20

* After re-arrangement for controlled selection.
** The subscript (written here as the digit following the letter) denotes the ownership code of the hospital. A preferred sample is one with the hospitals having different ownership codes.

Two-way Stratification: Suppose the population is stratified into $K$ and $K'$ strata on the basis of two stratification variables $x_1$ and $x_2$ respectively. Let $W_{ss'}$, $s = 1, 2, \ldots, K$ and $s' = 1, 2, \ldots, K'$, be the proportions of units in the $KK'$ deep strata formed by combining the two systems of strata, and let $W_{s.}$ and $W_{.s'}$ be the proportions of units in the $s$-th stratum of the first system and in the $s'$-th stratum of the second system respectively. In the case of proportional allocation, the sample sizes in the strata for the two systems are given by $\{nW_{s.}\}$ and $\{nW_{.s'}\}$ respectively. In the present case, it is assumed that the sample size $n$ is not large enough to ensure positive integral allocations to each of the $KK'$ multiple or deep strata, since if the sample size were large compared to the total number of
strata, the usual methods of allocation (e.g. $n_{ss'} = nW_{ss'}$) can be applied without much difficulty. Hence, instead of ensuring exact proportional allocation to the multiple strata, that is, $\{n_{ss'}\} = \{nW_{ss'}\}$, the suggested procedure ensures that the expected values of the allocations to the multiple strata are proportional to the number of units in them, that is, $E(n_{ss'}) = nW_{ss'}$ (or $n_{s.}n_{.s'}/n$).

The following steps are involved in this procedure:
(i) constructing a square of $n^2$ cells with $n$ rows and $n$ columns,
(ii) selecting $n$ of the cells with equal probability such that no two selected cells belong to the same row or column,
(iii) grouping the $n$ rows into $K$ strata such that the $s$-th stratum has an allocation of $n_{s.}$ units,
(iv) grouping the $n$ columns into $K'$ strata such that the $s'$-th stratum has an allocation of $n_{.s'}$ units, and
(v) taking the allocation in the $(ss')$-th deep stratum as the number of cells (say, $n_{ss'}$) selected in the joint group formed by the $s$-th group of (iii) and the $s'$-th group of (iv).

An unbiased estimator of the population mean in this case is given by

\[ \hat{\bar Y} = \frac{1}{n}\sum_{s=1}^{K}\sum_{s'=1}^{K'} a_{ss'}\,n_{ss'}\,\bar y_{ss'}, \qquad (7.100) \]

where $\bar y_{ss'}$ is the sample mean in the $(ss')$-th stratum and $a_{ss'} = n^2 W_{ss'}/(n_{s.}n_{.s'})$.

The above procedure may be illustrated by applying it to the example of sampling 2 hospitals considered earlier in this section. Suppose the 9 hospitals in the population are stratified by two systems of stratification on the basis of information on number of beds and ownership, and it is desired to have one sample hospital per stratum in the two systems. In this case there are 4 deep strata formed on a joint consideration of size and ownership, and two units are to be selected such that the marginal allocations of 1 unit per stratum for the two systems of stratification are achieved and that the expected allocations to the deep strata are equal to $n_{s.}n_{.s'}/n$, which is 1/2 in the present case. The application of the above procedure to this example would
require the selection of one hospital from size-stratum 1 and, if it happens to possess ownership code 1 (or 2), one hospital is to be selected from those having ownership code 2 (or 1) in size-stratum 2.

REFERENCES

Aoyama, H. (1954): A study of stratified random sampling; Ann. Inst. Stat. Math., 1-36.
Bowley, A. L. (1926): Measurement of the precision attained in sampling; Bull. Inter. Stat. Inst., 22, (1), 1-62.
Bryant, E. C., Hartley, H. O. and Jessen, R. J. (1960): Design and estimation in two-way stratification; J. Amer. Stat. Assn., 55, 105-124.
Chakravarthy, I. M. (1954): On the problem of planning a multi-stage survey for multiple correlated characters; Sankhya, 14, 211-216.
Cochran, W. G. (1961): Comparison of methods for determining stratum boundaries; Bull. Inter. Stat. Inst., 38, (2), 345-358.
Cochran, W. G. (1963): Sampling Techniques, Second Edition, Chapters 5 and 5A, John Wiley & Sons, New York.
Dalenius, T. (1950): The problem of optimum stratification-I; Skand. Akt., 33, 203-213.
Dalenius, T. and Gurney, M. (1951): The problem of optimum stratification-II; Skand. Akt., 34, 133-148.
Dalenius, T. (1952): The problem of optimum stratification in a special type of design; Skand. Akt., 35, 61-70.
Dalenius, T. (1953): Multivariate sampling problem; Skand. Akt., 36, 92-122.
Dalenius, T. (1957): Sampling in Sweden, Almqvist & Wiksell, Stockholm.
Dalenius, T. and Hodges, J. L. (Jr.) (1957): The choice of stratification points; Skand. Akt., 40, 193-203.
Dalenius, T. and Hodges, J. L. (Jr.) (1959): Minimum variance stratification; J. Amer. Stat. Assn., 54, 88-101.
Durbin, J. (1959): Review of the book Sampling in Sweden; J. Roy. Stat. Soc., (A), 122, 246-248.
Ekman, G. (1959): An approximation useful in univariate stratification; Ann. Math. Stat., 30, 219-229.
Evans, W. D. (1951): On stratification and optimum allocations; J. Amer. Stat. Assn., 46, 95-104.
Ghosh, S. P. (1958): A note on stratified random sampling with multiple characters; Bull. Cal. Stat. Assn., 8, 81-90.
Goodman, R. and Kish, L. (1950): Controlled selection, a technique in probability sampling; J. Amer. Stat. Assn., 45, 350-372.
Hansen, M. H., Hurwitz, W. N. and Madow, W. G. (1953): Sample Survey Methods and Theory, Volumes I & II, Chapter 5, John Wiley & Sons, New York.
Hess, I., Riedel, D. C. and Fitzpatrick, T. B. (1961): Probability Sampling of Hospitals and Patients; The University of Michigan, Ann Arbor.
Hess, I., Sethi, V. K. and Balakrishnan, T. R. (1966): Stratification, a practical investigation; J. Amer. Stat. Assn., 61, 74-90.
Mahalanobis, P. C. (1944): On large-scale sample surveys; Phil. Trans. Roy. Soc., 231, (B), 329-451.
Mahalanobis, P. C. (1952): Some aspects of the design of sample surveys; Sankhya, 12, 1-7.
Murthy, M. N. (1962): Variance and confidence interval estimation; Sankhya, 24, (B), 1-12.
Neyman, J. (1934): On the two different aspects of the representative method; J. Roy. Stat. Soc., 97, 558-625.
Sethi, V. K. (1963): A note on optimum stratification of populations for estimating the population means; Aust. J. Stat., 5, 20-33.
Stuart, A. (1954): A simple presentation of optimum sampling results; J. Roy. Stat. Soc., (B), 16, 238-241.
Sukhatme, P. V. (1935): Contributions to the theory of the representative method; J. Roy. Stat. Soc., Supplement, 2, 253-268.
Williams, W. H. (1962): On the variance of an estimator with post-stratification; J.
Amer. Stat. Assn., 57, 622-627.

COMPLEMENTS AND PROBLEMS

7.1 For a socio-economic survey, all the villages in a region, including the uninhabited ones, were grouped into 4 strata on the basis of their altitude above sea-level and population density, and from each stratum 10 villages were selected with srswr. The data on number of households in each of the sample villages are given in Table 7.9.

TABLE 7.9. NUMBER OF HOUSEHOLDS FOR 40 SAMPLE VILLAGES
(entries shown as ? are illegible in the source; the column of total numbers of villages is not legible)

stratum   total no. of   number of households in sample villages
          villages
  (1)        (2)         (3)   (4)   (5)   (6)   (7)   (8)   (9)   (10)  (11)  (12)
   1          ?           48    84    98     0    10    44     0     ?    13     0
   2          ?           50   147    62    87     ?   158   170     ?     ?   100
   3          ?          228   262   110   232   139   178   334     0    63   220
   4          ?           17    98    25     ?    35     0    25     4     ?     ?

(i) Estimate the total number of households (H) unbiasedly and estimate its rse.
(ii) Examine whether there has been any gain due to use of stratification as compared to unstratified srswr.
(iii) Compare the efficiency of the present allocation with that of the optimum allocation, keeping the total sample size fixed.

7.2 Using the data given in Table 7.10 (p. 288) and considering the size classes as strata, compare the efficiencies of the following alternative allocations of a sample of 3000 factories for estimating the total output. The sample is to be selected with srs wor within each stratum:
(a) proportional allocation;
(b) allocation proportional to total output; and
(c) optimum allocation.

7.3 A population of 112 villages has been divided into 3 strata having 51, 37 and 24 villages respectively on the basis of the type of available auxiliary information. We have a sample of 6 villages selected with srs wor from the first stratum, a ppswr sample of 5 villages from the second stratum and two linear systematic samples (of four villages each) selected without replacement from the third stratum. For each selected village, the total area under wheat (y) is observed.
The observed values and other relevant information are given in Table 7.11.

TABLE 7.10. DATA ON DISTRIBUTION OF FACTORIES BY NUMBER OF WORKERS AND ON AVERAGE OUTPUT AND STANDARD DEVIATION

sr.   size class          no. of      output per factory   standard deviation
no.   (no. of workers)    factories   (in '000 Rs.)        (in '000 Rs.)
(1)        (2)               (3)            (4)                  (5)
 1       1 - 49             18260           100                   80
 2      50 - 99              4315           250                  200
 3     100 - 249             2233           500                  600
 4     250 - 999             1057          1760                 1900
 5    1000 & above            567          2250                 2500

TABLE 7.11. AREA UNDER WHEAT y FOR ALL THE SAMPLE VILLAGES AND CULTIVATED AREA x FOR SAMPLE VILLAGES OF STRATUM 2
(digits shown as ? are illegible in the source)

sample    stratum 1     stratum 2        stratum 3 (y)
village       y          x      y      sample 1   sample 2
  (1)        (2)        (3)    (4)       (5)        (6)
   1          18        729    2?7       427        395
   2         101          ?    238       3?6        412
   3           5        870    359       481        503
   4           8        305    129       445        348
   5           8        569    223        -          -
   6          46         -      -         -          -

(Total cultivated area in stratum 2: 26912 acres; x and y are in acres.)

(i) Estimate the total area under wheat in each stratum separately and also in all the 3 strata taken together.
(ii) Obtain estimates of the rse's of these estimates, estimating unbiasedly their variances.

7.4 In a demographic survey it is proposed to have stratified sampling using the districts in a region as strata. The relevant data are given in Table 3.8 of Chapter 3 (p. 1).
(i) Assuming the cost of enumeration and tabulation per person to be 1/?th of a rupee [fraction illegible in the source] and the overhead cost to be Rs. 10000, determine the optimum values of $n_s$ that would minimize the sampling variance of the estimator of the overall population mean for a given expected total cost of Rs. 80000, when villages are selected with srswr from each stratum.
(ii) For the same value of the total sample size $n$ obtained in (i), find the values of $n_s$ when the allocation is made in proportion to $N_s$, and obtain the cost-efficiency of this procedure as compared to that of (i).

7.5 A survey is to be conducted for estimating the total number of literates in a town having three communities, some particulars of which are given in Table 7.12 based on the results of a pilot study.

TABLE 7.12. A
ROUGH IDEA OF THE TOTAL NUMBER OF PERSONS AND PROPORTIONS OF LITERATES

community   total number of persons   percentage of literates
    1              60 000                      40
    2              10 000                      80
    3              30 000                      60

(i) Treating the communities as strata and assuming srswr in each stratum, allocate a total sample size of 2000 persons to the strata in an optimum manner for estimating the overall proportion of literates in the town.
(ii) Estimate the efficiency of stratification as compared to unstratified sampling.

7.6 The analysis of variance of a population of 340 villages divided into 4 unequal strata is given in Table 7.13. Calculate the efficiency of stratification with proportional allocation and with srswr in each stratum as compared to unstratified sampling for estimating the area under wheat.

TABLE 7.13. ANALYSIS OF VARIANCE FOR AREA UNDER WHEAT

source of         degrees of   sum of squares                              mean square
variation         freedom                                                  (3)/(2)
   (1)              (2)            (3)                                        (4)
between strata        3        $\sum_s N_s(\bar Y_s-\bar Y)^2$               5400
within strata       336        $\sum_s\sum_i (Y_{si}-\bar Y_s)^2$              24
total               339        $\sum_s\sum_i (Y_{si}-\bar Y)^2$             71.58

7.7 Suppose a region is divided into two sub-regions having 1000 and 1500 persons, and the proportions of workers in manufacturing industries in the two sub-regions are likely to be about 30% and 70% respectively. Determine the total sample size required for estimating the overall population proportion within 5% of the true value with 95% confidence, if srswr is adopted in the two strata (sub-regions) using optimum allocation of the total sample size.

7.8 Assuming the number of units in the strata to be equal, show that in stratified sampling with srswr and with equal allocation, the sampling variance $V_{st}$ of the estimator of $\bar Y$ can be expressed as [expression illegible in the source], where $V$ is the variance in the case of unstratified srswr in estimating $\bar Y$.

7.9 In stratified srs wor the surveyor allocated the total sample size of $n$ units by mistake in proportion to $N_s\sigma_s^2$
instead of the usual optimum allocation proportional to $N_s\sigma_s$. How does this allocation compare with proportional and optimum allocations?

7.10 Suppose the objective is to estimate the difference between the rates of incidence of a particular disease in two villages, one a model village having $N_1$ persons and the other a neighbouring village having $N_2$ persons. Let $P_1$ and $P_2$ be the proportions of persons having the disease and let $C_1$ and $C_2$ be the average costs of medically examining a person in the two villages. Assuming the cost to be fixed at $C_0$, determine the optimum allocation of the total sample size to the two villages when srswr is adopted in each village.

7.11 Suppose $t_{s1}$ and $t_{s2}$ are unbiased estimates of the $s$-th stratum total ($s = 1, 2, \ldots, K$) based on 2 independent samples. Show that the following two estimators are unbiased for the variance of the combined estimator $\sum_s (t_{s1}+t_{s2})/2$ and compare their variances:

(i) $v_1 = \sum_s (t_{s1}-t_{s2})^2/4$ and (ii) $v_2 = \left(\sum_s t_{s1} - \sum_s t_{s2}\right)^2/4$.

(Murthy, M. N., Sankhya, 24, (B), (1962), 1-12)

7.12 A population of $N$ units is divided into two strata of sizes $N_1$ and $N_2$ units, and samples of $n_1$ and $n_2$ units are selected from each of the two strata with srswr. Show that the efficiency of the estimator of $\bar Y$ in this case compared to that of optimum allocation of the total sample size $(n_1+n_2)$
is not less than $4\phi/(1+\phi)^2$, where $\phi = n_1 n_2'/(n_2 n_1')$, $n_1'$ and $n_2'$ being the optimum allocation to the two strata. (Cochran, W. G., Sampling Techniques, (1953), Ch. 5, p. 79)

7.13 In stratified srs wor for estimating the overall population mean, obtain the sampling variances for the estimators in case of (1) proportional allocation and (2) optimum allocation based on a total sample size of $n$ units, assuming that $N_s$ is large enough for $N_s/(N_s-1)$ to be approximately unity. Compare these variances with that of the sample mean based on an unstratified sample of $n$ units drawn with srs wor.

7.14 Suppose a population of $N$ units is divided into $K$ strata at random such that the $s$-th stratum has a prespecified number of units $N_s$, $s = 1, 2, \ldots, K$, and within each stratum srs wor is adopted using proportional allocation. Show that the variance of the estimator of the overall population mean in this case would be equal to that of the sample mean in the case of unstratified srs wor with the same overall sample size.

7.15 Assuming the population of $N$ units to be drawn from super-populations with the following models, compare the expected variances of the estimators of $\bar Y$ (a) obtained by stratifying the population of $N$ units into $n$ strata of equal number of units and selecting one unit from each stratum with srs, and (b) based on a systematic sample of $n$ units, assuming $N$ to be a multiple of $n$:

(i) $E(Y_i) = \mu$, $V(Y_i) = \sigma^2$, $\mathrm{Cov}(Y_i, Y_j) = 0$, $i \ne j$;
(ii) $E(Y_i) = \alpha + \beta i$, $V(Y_i) = \sigma^2$, $\mathrm{Cov}(Y_i, Y_j) = 0$, $i \ne j$,

where $i, j = 1, 2, \ldots, N$. (Cochran, W. G., Sampling Techniques, (1963), Ch. 8, 215-217)

7.16 Assuming the finite population of $N$ units to be drawn from a super-population with the model $E(Y_i \mid X_i) = aX_i$, $V(Y_i \mid X_i) = \sigma^2 X_i^g$ and $\mathrm{Cov}(Y_i, Y_j \mid X_i, X_j) = 0$, show that in stratified sampling where the probability of inclusion of a unit in the sample is proportional to its size, the optimum allocation of the total sample size to the strata, which minimizes the expected variance of the estimator
\[ \hat Y = \sum_{s}\sum_{i=1}^{n_s} \frac{y_{si}}{\pi_{si}}, \]

where $\pi_{si}$ is the probability of inclusion in the sample of the $i$-th sample unit in the $s$-th stratum, is given by [expression for $n_s$ illegible in the source], where $X_s$ is the $s$-th stratum total for the size measure $x$. It may be noted that when $g = 2$, the optimum allocation becomes proportional to $X_s$. (Rao, T. J., (1966), unpublished)

7.17 In forming two strata in an optimum manner such that the second stratum consisting of $N_2$ large units is completely enumerated and a sample of $n_1\ (= n - N_2)$ units is selected with srs wor from the $N_1$ smaller units in the first stratum, show that the optimum point of stratification is given by [expression illegible in the source], where $\bar Y_1$ and $\sigma_1$ are the mean and the standard deviation of the first stratum. (Dalenius, T., Skand. Akt., 35, (1952), 61-70)

7.18 Suppose a population with a variable $y$ having the probability density function $f(y) = e^{-y}$ $(y > 0)$ is divided into two strata, stratum 1 defined by $y < y_0$ and stratum 2 by $y \ge y_0$. Derive the variance of the estimator of $\bar Y$ assuming proportional allocation and srswr in each stratum. Find the optimum value of $y_0$ which will minimize the variance and evaluate the optimum variance. (Dalenius, T., Skand. Akt., 33, (1950), 203-213)

7.19 Derive the results (7.80) and (7.81) given in Sub-section 7.10a for optimum and equal allocations. (Dalenius, T., Sampling in Sweden, (1957), Ch. 7, p. 168)
7.20 From the results (7.82) show that if $V(\bar x_{st})$ is inversely proportional to $K^2$,

\[ V(\bar y_{st}) \simeq \frac{\sigma_y^2}{n}\left[\frac{\rho^2}{K^2} + (1-\rho^2)\right]. \]

(Cochran, W. G., Sampling Techniques, (1963), Ch. 5A, p. 134)

7.21 In the case of stratified sampling schemes where one unit is selected from each stratum, the sampling variance is usually estimated by adopting the method of collapsed strata. This method consists in pairing the strata to form collapsed strata and estimating the sampling variance as if two units had been selected from each collapsed stratum. Suppose the proportion of units ($W_s$) is the same for each of the two strata forming the $s$-th pair ($s = 1, 2, \ldots, K/2$). Assuming srs within the strata, show that the variance estimator

\[ v(\hat{\bar Y}) = \sum_{s=1}^{K/2} W_s^2\,(y_{s1}-y_{s2})^2, \]

where $y_{s1}$ and $y_{s2}$ are the values of the units selected from the two strata forming the $s$-th collapsed stratum, over-estimates $V(\hat{\bar Y})$ and that the bias is small when the two strata forming the $s$-th pair have approximately the same mean ($s = 1, 2, \ldots, K/2$). (Cochran, W. G., Sampling Techniques, (1963), Ch. 5A, p. 14)

7.22 If the proportions of units ($W_{s1}$ and $W_{s2}$)
for the two strata forming the $s$-th collapsed stratum (Problem 7.21) are not the same for all $s$, consider the variance estimator

\[ v(\hat{\bar Y}) = \sum_{s=1}^{K/2} W_{s1}W_{s2}\,(y_{s1}-y_{s2})^2 \]

and obtain its bias. (Seth, G. R., J. Ind. Soc. Agr. Stat., 18, (1966), 1-3)

7.23 A sample of $n$ units is drawn with srs wor and two post-strata are formed at the estimation stage. There are two possibilities: (i) each post-stratum contains at least one sample unit, and (ii) one of the post-strata is empty, that is, contains no sample unit. Consider the estimators

(a) $\hat{\bar Y}' = W_1\bar y_1 + W_2\bar y_2$ and (b) $\hat{\bar Y}'' = aD_1\bar y_1 + (1-a)D_2\bar y_2$,

where $W_1$ and $W_2\ (= 1-W_1)$ are the proportions of units and $\bar y_1$ and $\bar y_2$ are the sample means for the two post-strata, $a$ is 1 or 0 according as stratum 2 or 1 is empty, and $D_1 = W_1/P_1$ and $D_2 = W_2/P_2 = (1-W_1)/(1-P_1)$, $P_1$ being the conditional probability that stratum 2 is empty given that possibility (ii) has occurred. Show that the estimators (a) and (b) are conditionally unbiased given that possibility (i) or (ii) has occurred respectively. Find the bias and the variance of $\hat{\bar Y}^* = A\hat{\bar Y}' + (1-A)\hat{\bar Y}''$, where $A$ is 1 or 0 according as possibility (i) or (ii) has occurred. (Fuller, W. A., J. Amer. Stat. Assn., 61, (1966), 1172-1183)

CHAPTER 8

Cluster Sampling

8.1 NEED FOR CLUSTER SAMPLING

Cluster sampling consists in forming suitable clusters of units and surveying all the units in a sample of clusters selected according to an appropriate sampling scheme. The advantages of cluster sampling from the point of view of cost arise mainly due to the fact that collection of data for nearby units is easier, faster, cheaper and more convenient than observing units scattered over a region. For instance, in a population survey it may be cheaper to collect data from all persons in a sample of households than from a sample of the same number of persons selected directly from all the persons.
Similarly, it would be operationally more convenient to survey all households situated in a sample of areas such as villages than to survey a sample of the same number of households selected at random from a list of all households. Another example of the utility of cluster sampling is provided by crop surveys, where locating a randomly selected farm or plot (a parcel of land) requires a considerable part of the total time taken for the survey, but once the plot is located, the time taken for identifying and surveying a few neighbouring plots will generally be only marginal. Because of its operational convenience and the possible reduction in cost, cluster sampling is resorted to in many surveys, using mutually exclusive or overlapping clusters formed by grouping nearby units or units which can be conveniently observed together.

In general, for a given total number of sampling units, cluster sampling is less efficient than sampling of individual units from the view-point of sampling variance, as the latter is expected to provide a better cross-section of the population than the former due to the usual tendency of units in a cluster to be similar. In fact, the sampling efficiency of cluster sampling is likely to decrease with increase in cluster size. However, cluster sampling is operationally more convenient and less costly than sampling of units directly, due to the possible saving in time for journey, identification, contact, etc., and hence in many practical situations the loss in sampling efficiency is likely to be offset by the reduction in cost.

In a general sense, any system of sampling may be regarded as a kind of cluster sampling, since in every sampling scheme the units are conceptually grouped to form samples (clusters) and one of them is selected with a certain specified probability. For instance, systematic sampling may be considered a particular case of cluster sampling, since in this case the population is divided into a number of clusters, each
cluster consisting of units distributed at a fixed interval (systematically) over the whole population, and one such cluster is selected at random. But by cluster sampling is usually meant sampling of clusters of units formed by grouping neighbouring units or units which can be conveniently surveyed together. It may be noted that the various sampling procedures, namely, srs, systematic sampling, pps and stratified sampling, discussed in the earlier chapters can be applied to sampling of clusters by treating the clusters themselves as sampling units.

8.2 SAMPLING OF EQUAL CLUSTERS

Let us first consider the case of clusters which are mutually exclusive and have an equal number of units. Though the size of natural clusters such as villages (clusters of households or persons) or branches of trees (clusters of leaves, flowers, fruits) usually varies over clusters, it is possible to have equal clusters when clusters are artificially formed. For instance, in a crop survey, we may consider clusters of two or more plots or other area units of a given size and shape as clusters, and in a household survey two or more neighbouring households may be grouped to form clusters. Similarly, in a production process in an industry, the number of items produced at regular intervals of time may be the same and the production at different intervals of time can be considered to constitute the clusters.

8.2a SAMPLING OF ONE CLUSTER

Suppose a finite population of $NM$ units is divided into $N$ mutually exclusive clusters of $M$ units each and one cluster is selected with srs for estimating the population mean

\[ \bar Y = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} Y_{ij} = \frac{1}{N}\sum_{i=1}^{N}\bar Y_i, \qquad (8.1) \]

where $Y_{ij}$ is the value of the $j$-th unit in the $i$-th cluster and $\bar Y_i$ is the mean of the $i$-th cluster. An unbiased estimator of $\bar Y$ is clearly given by the sample cluster mean. Its variance is given by

\[ V(\hat{\bar Y}_c) = \frac{1}{N}\sum_{i=1}^{N}(\bar Y_i-\bar Y)^2 = \sigma_b^2, \qquad (8.2) \]

where the subscript $c$ denotes that the estimator is based on a cluster sample and $\sigma_b^2$ stands for the between-cluster variance.
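As a small numerical check of (8.1) and (8.2), the sketch below enumerates every possible outcome of drawing one cluster with srs from a tiny population. The population values and the function names are hypothetical and chosen only for illustration; exact fractions are used so that the equalities can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical population: N = 3 clusters of M = 4 units each (invented values).
CLUSTERS = [[2, 4, 4, 6], [10, 12, 9, 13], [5, 5, 7, 3]]


def population_mean(clusters):
    """Mean of all N*M unit values, the left-hand form of (8.1)."""
    values = [y for cluster in clusters for y in cluster]
    return Fraction(sum(values), len(values))


def cluster_means(clusters):
    """Mean of each cluster."""
    return [Fraction(sum(cluster), len(cluster)) for cluster in clusters]


def between_cluster_variance(clusters):
    """Between-cluster variance as defined in (8.2)."""
    ybar = population_mean(clusters)
    means = cluster_means(clusters)
    return sum((m - ybar) ** 2 for m in means) / len(means)


def one_cluster_moments(clusters):
    """Expectation and variance of the sample cluster mean when one cluster
    is selected with equal probability 1/N."""
    means = cluster_means(clusters)
    n_clusters = len(means)
    expectation = sum(means) / n_clusters
    variance = sum((m - expectation) ** 2 for m in means) / n_clusters
    return expectation, variance


expectation, variance = one_cluster_moments(CLUSTERS)
print(expectation == population_mean(CLUSTERS))        # True: the estimator is unbiased
print(variance == between_cluster_variance(CLUSTERS))  # True: the variance equals sigma_b^2
```

With equal cluster sizes the average of the cluster means coincides with the overall mean, which is why the sampling variance of the cluster mean is exactly the between-cluster variance in this setting.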
Sampling of one cluster is being considered here mainly to bring out the implications of using cluster sampling from the view-point of sampling variance. It may be noted that it is not possible to estimate the variance of the estimator unbiasedly on the basis of a sample of one cluster, just as it is not possible to estimate unbiasedly the variance of a systematic sample estimator on the basis of a single sample. The question of sampling $n\ (\ge 2)$ clusters is considered in Sub-section 8.2b.

Comparing (8.2) with the variance of the sample mean $\bar y_r$ based on $M$ units drawn from $NM$ units with srswr, namely,

\[ V(\bar y_r) = \frac{\sigma^2}{M} = \frac{1}{NM^2}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar Y)^2, \qquad (8.3) \]

where the subscript $r$ denotes srswr and $\sigma^2$ is the total variance, we find that the sampling efficiency of cluster sampling as compared to srswr is

\[ E_c = \frac{V(\bar y_r)}{V(\hat{\bar Y}_c)} = \frac{\sigma^2}{M\sigma_b^2} = \frac{\sigma_b^2+\sigma_w^2}{M\sigma_b^2}, \qquad (8.4) \]

since $\sigma^2 = \sigma_b^2+\sigma_w^2$, where $\sigma_w^2$ is the within-cluster variance given by

\[ \sigma_w^2 = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar Y_i)^2. \]

From this, it can be seen that cluster sampling will be more efficient than srswr only when the total variance is greater than $M$ times the between-cluster variance $\sigma_b^2$, that is, when the within-cluster variance $\sigma_w^2$ is greater than $(M-1)$ times $\sigma_b^2$. This is not likely to be the case, since $\sigma_b^2$ will usually be larger due to the within-cluster homogeneity. Hence, purely from the point of view of sampling variance, cluster sampling is generally less efficient than srs, though there may be special situations where the former may be as efficient as or even more efficient than the latter.

The variance in (8.2) and the efficiency $E_c$ can be expressed in terms of the intraclass correlation coefficient $\rho_c$ between pairs of units within clusters. For, $\sigma_b^2$ can be written as

\[ \sigma_b^2 = \frac{1}{N}\sum_{i=1}^{N}(\bar Y_i-\bar Y)^2 = \frac{1}{NM^2}\left\{\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar Y)^2 + \sum_{i=1}^{N}\sum_{j\ne j'}(Y_{ij}-\bar Y)(Y_{ij'}-\bar Y)\right\}. \]

This may be written as

\[ \sigma_b^2 = \frac{\sigma^2}{M}\left[1+(M-1)\rho_c\right], \qquad (8.5) \]

where

\[ \rho_c = \frac{\sum_{i=1}^{N}\sum_{j\ne j'}(Y_{ij}-\bar Y)(Y_{ij'}-\bar Y)}{NM(M-1)\sigma^2}. \]

Hence,
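$$E_c = \frac{1}{1 + (M-1)\rho_c}. \tag{8.6}$$

Substituting $(\sigma^2 - \sigma_w^2)$ for $\sigma_b^2$ in (8.5), we get

$$\rho_c = 1 - \frac{M}{M-1}\cdot\frac{\sigma_w^2}{\sigma^2}.$$

Noting that 0 [...]

8.5a SIMPLE RANDOM SAMPLING

Suppose $n$ clusters are selected with srs wor and all the units in the sample clusters are surveyed. An unbiased estimator of $\bar{Y}$ is given by

$$\hat{\bar{Y}} = \frac{1}{nM'}\sum_{i=1}^{n} M_i\bar{y}_i = \frac{1}{n}\sum_{i=1}^{n}\frac{M_i}{M'}\,\bar{y}_i, \tag{8.28}$$

where $\bar{y}_i$ is the mean of the $i$-th sample cluster, since $M_i\bar{y}_i$ is the total for the $i$-th sample cluster and $\frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i$ is an unbiased estimator of the population total $Y$. When the value of $M_i$ is known only for the sample clusters and not for all the clusters, then $Y$ can be unbiasedly estimated by

$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i, \tag{8.29}$$

and $\bar{Y}$ may be estimated either by

$$\hat{\bar{Y}}' = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i, \tag{8.30}$$

which is considered later in this section, or by

$$\hat{\bar{Y}}'' = \sum_{i=1}^{n} M_i\bar{y}_i \Big/ \sum_{i=1}^{n} M_i, \tag{8.31}$$

which is a ratio of two random variables and hence is, in general, biased for $\bar{Y}$; such estimators are considered in Chapter 10. The sampling variance of $\hat{\bar{Y}}$ given in (8.28) and its unbiased variance estimator are

$$V(\hat{\bar{Y}}) = \frac{N-n}{N-1}\cdot\frac{\sigma_b'^2}{n} = \frac{N-n}{N-1}\cdot\frac{1}{nN}\sum_{i=1}^{N}\Big(\frac{M_i\bar{Y}_i}{M'} - \bar{Y}\Big)^2 \tag{8.32}$$

and

$$v(\hat{\bar{Y}}) = \frac{N-n}{N}\cdot\frac{s_b'^2}{n} = \frac{N-n}{N}\cdot\frac{1}{n(n-1)}\sum_{i=1}^{n}\Big(\frac{M_i\bar{y}_i}{M'} - \hat{\bar{Y}}\Big)^2. \tag{8.33}$$

Variance as Function of $\rho_c$

Substituting the value $\frac{1}{M_i}\sum_{j=1}^{M_i} Y_{ij}$ for $\bar{Y}_i$ in $\sigma_b'^2$, we get

$$\sigma_b'^2 = \frac{1}{NM'^2}\sum_{i=1}^{N}\Big(\sum_{j=1}^{M_i} Y_{ij} - M'\bar{Y}\Big)^2.$$

Adding and subtracting $M_i\bar{Y}$ inside the brackets and expanding, we get

$$NM'^2\sigma_b'^2 = \sum_{i=1}^{N}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y})^2 + \sum_{i=1}^{N}\sum_{j \ne k}(Y_{ij} - \bar{Y})(Y_{ik} - \bar{Y}) + \bar{Y}^2\sum_{i=1}^{N}(M_i - M')^2 + 2\bar{Y}\sum_{i=1}^{N} M_i(M_i - M')(\bar{Y}_i - \bar{Y}).$$

Noting that in this case the overall variance $\sigma^2$ and the intraclass correlation coefficient $\rho_c$ are given by

$$\sigma^2 = \frac{1}{NM'}\sum_{i=1}^{N}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y})^2$$

and

$$\rho_c = \frac{\sum_{i=1}^{N}\sum_{j \ne k}(Y_{ij} - \bar{Y})(Y_{ik} - \bar{Y})}{\sum_{i=1}^{N} M_i(M_i - 1)\,\sigma^2},$$

$V(\hat{\bar{Y}})$ can be written as

$$V(\hat{\bar{Y}}) = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{nM'}\Big[1 + \frac{\sum_{i=1}^{N} M_i(M_i - 1)}{NM'}\,\rho_c + \frac{\bar{Y}^2}{NM'\sigma^2}\sum_{i=1}^{N}(M_i - M')^2 + \frac{2\bar{Y}}{NM'\sigma^2}\sum_{i=1}^{N} M_i(M_i - M')(\bar{Y}_i - \bar{Y})\Big]. \tag{8.34}$$

It may be noted that when the cluster sizes are equal, that is, when $M_i = M' = M$ for all $i$, the expression (8.34) reduces to the expression (8.9) derived earlier.

The estimator of $\bar{Y}$ given in (8.30), namely, $\hat{\bar{Y}}' = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$, is biased, since its expected value is given by

$$E(\hat{\bar{Y}}') = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i,$$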
which, in general, is not equal to $\bar{Y}\ (= Y/NM')$ and the bias is

$$B(\hat{\bar{Y}}') = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i - \frac{1}{NM'}\sum_{i=1}^{N} M_i\bar{Y}_i = -\frac{1}{M'}\,\mathrm{Cov}(\bar{Y}_i, M_i).$$

This shows that the bias is expected to be small when $M_i$ and $\bar{Y}_i$ are not highly correlated. In such a case, it may be desirable to use this estimator, since, though biased, its mean square error, namely,

$$M(\hat{\bar{Y}}') = V(\hat{\bar{Y}}') + B^2(\hat{\bar{Y}}') = \frac{N-n}{N-1}\cdot\frac{1}{nN}\sum_{i=1}^{N}(\bar{Y}_i - \bar{Y}_N)^2 + \frac{1}{M'^2}\{\mathrm{Cov}(\bar{Y}_i, M_i)\}^2, \tag{8.35}$$

where $\bar{Y}_N = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i$ is the simple mean of the cluster means, is likely to be considerably less than the variance given in (8.32).

8.5b VARYING PROBABILITY SAMPLING

Since in many practical situations the cluster total for the estimation variable is likely to be positively correlated with the number of units in the cluster, it may be profitable to select the clusters with probability proportional to the number of units in the cluster instead of with equal probability, or to stratify first the clusters on the basis of their sizes and then to have srs within each stratum. Suppose $n$ clusters are selected with ppswr, size being the number of units in the cluster; then an unbiased estimator of $\bar{Y}$ is given by

$$\hat{\bar{Y}}_c = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i. \tag{8.36}$$

Since $\bar{y}_i$ takes the values $\{\bar{Y}_i\}$ with probabilities $\{M_i/NM'\}$, $i = 1, 2, \ldots, N$, we have

$$E(\hat{\bar{Y}}_c) = \frac{1}{n}\sum_{i=1}^{n} E(\bar{y}_i) = \sum_{i=1}^{N}\frac{M_i}{NM'}\,\bar{Y}_i = \bar{Y}$$

and

$$V(\hat{\bar{Y}}_c) = \frac{1}{n}\sum_{i=1}^{N}\frac{M_i}{NM'}(\bar{Y}_i - \bar{Y})^2. \tag{8.37}$$

An unbiased variance estimator can be derived by noting that unbiased estimators of $\sum_{i=1}^{N}\frac{M_i}{NM'}\bar{Y}_i^2$ and $\bar{Y}^2$
are given by $\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i^2$ and $\hat{\bar{Y}}_c^2 - v(\hat{\bar{Y}}_c)$ respectively and solving for $v(\hat{\bar{Y}}_c)$ after substituting these estimators in (8.37). That is,

$$v(\hat{\bar{Y}}_c) = \frac{1}{n}\Big\{\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i^2 - \hat{\bar{Y}}_c^2 + v(\hat{\bar{Y}}_c)\Big\}.$$

Hence, we get

$$v(\hat{\bar{Y}}_c) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(\bar{y}_i - \hat{\bar{Y}}_c)^2. \tag{8.38}$$

The efficiency of sampling $n$ unequal clusters with ppswr as compared to selection of $nM'$ units with srswr can be obtained by comparing (8.37) with the variance in the latter case, namely,

$$V(\bar{y}_r) = \frac{\sigma^2}{nM'} = \frac{1}{nM'}\cdot\frac{1}{NM'}\sum_{i=1}^{N}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y})^2. \tag{8.39}$$

Expanding $V(\bar{y}_r)$ after adding and subtracting $\bar{Y}_i$ within the brackets in $\sigma^2$, we get

$$V(\bar{y}_r) = \frac{1}{nM'}\cdot\frac{1}{NM'}\Big\{\sum_{i=1}^{N} M_i(\bar{Y}_i - \bar{Y})^2 + \sum_{i=1}^{N}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y}_i)^2\Big\} = \frac{1}{M'}\Big\{V(\hat{\bar{Y}}_c) + \frac{\sigma_w^2}{n}\Big\},$$

where $\sigma_w^2$ is the within-cluster variance given by

$$\sigma_w^2 = \frac{1}{NM'}\sum_{i=1}^{N} M_i\sigma_i^2, \qquad \sigma_i^2 = \frac{1}{M_i}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y}_i)^2.$$

Hence, the efficiency of cluster sampling is given by

$$E_c = \frac{V(\bar{y}_r)}{V(\hat{\bar{Y}}_c)} = \frac{1}{M'}\cdot\frac{1}{1 - (\sigma_w^2/\sigma^2)}. \tag{8.40}$$

To increase the efficiency further in the case of unequal cluster size, other sampling schemes, such as pps wor sampling and pps systematic sampling with a suitable arrangement of the clusters, may be used.

8.6 ILLUSTRATIVE EXAMPLES

In this section the results of an empirical study are given to illustrate the behaviour of the efficiency of cluster sampling with increase in cluster size for different sampling schemes. This study relating to the estimation of the total acreage under autumn paddy in a village is based on the same plot-wise data used in Sub-section 8.3c for determining the optimum cluster size for a fixed cost. The relative variances $(V/Y^2)$ for the estimators of acreage under paddy based on samples of clusters selected through (i) srs without replacement, (ii) simple systematic sampling, (iii) ppswr, size being geographical area, and (iv) pps systematic sampling have been calculated for different numbers of plots per cluster ($M$ = 1, 2, 5, 10, 20) keeping the total number of sample plots fixed at 40. The values of the relative variances as well as their efficiencies compared to direct sampling of the individual plots ($M = 1$) for the corresponding sampling schemes are presented in Table 8.4.
TABLE 8.4. BEHAVIOUR OF RELATIVE VARIANCE AND EFFICIENCY FOR DIFFERENT CLUSTER SIZES WHEN TOTAL NUMBER OF SAMPLE PLOTS IS FIXED.

[Table 8.4 gives, for cluster sizes 1, 2, 5, 10 and 20 (that is, 40, 20, 8, 4 and 2 sample clusters), the relative variance (rel. var.) and the efficiency (eff., %) under srs wor, systematic, ppswr and pps systematic sampling; total geographical area: 900 acres. The tabulated values are not recoverable from this copy.]

From this table we see that for all the sampling schemes the efficiency of cluster sampling decreases with increase in the cluster size when the total number of sample plots is fixed and that the rates of decrease in the efficiency are different for the different sampling schemes. For instance, the decrease in efficiency with increase in cluster size is much slower in case of srs and systematic sampling than in case of ppswr and pps systematic sampling.

This study of the efficiency of cluster sampling is realistic only under the assumption that the cost of survey is proportional to the total number of sample plots. But the cost of survey is usually not just proportional to the number of sample plots in case of cluster sampling. Taking the cost of survey to be proportional to $n + nM(C_2/C_1)$, the relative efficiency per unit of cost, namely,

$$E_c = \frac{V(\hat{Y}_1)}{V(\hat{Y}_M)}\cdot\frac{M(C_1 + C_2)}{C_1 + MC_2},$$

where $V(\hat{Y}_M)$ denotes the variance of the estimator for clusters of $M$ plots, has been worked out for different values of $C_2/C_1$ and the results are presented in Table 8.5.

TABLE 8.5. RELATIVE EFFICIENCY PER UNIT OF COST FOR DIFFERENT CLUSTER SIZES AND DIFFERENT SAMPLING SCHEMES
cluster   relative efficiency per unit cost (%), by sampling scheme and value of C_2/C_1
size      srs wor            systematic         ppswr              pps systematic
          0.1   0.3   0.5    0.1   0.3   0.5    0.1   0.3   0.5    0.1   0.3   0.5
(1)       (2)   (3)   (4)    (5)   (6)   (7)    (8)   (9)   (10)   (11)  (12)  (13)
1         100   100   100    100   100   100    100   100   100    100   100   100
2         …     138   127    165   146   135    101   …     …      93    82    76
5         234   166   …      272   …     159    88    …     …      46    32    …
10        303   179   138    283   167   128    …     …     …      …     …     …
20        283   143   105    …     98    72     …     …     …      …     …     …

From Table 8.5 it can be seen that the efficiency of cluster sampling per unit of cost increases with cluster size up to a certain stage, but thereafter decreases with further increase in cluster size for srs and systematic sampling. The cluster size for which the efficiency is the maximum is to be considered as the optimum cluster size. It may be noted that the decrease in efficiency in case of ppswr and pps systematic sampling has become more gradual than in Table 8.4 due to the use of a more realistic cost structure (cf. Problem 8.2, p. 314).

REFERENCES

HANSEN, M. H. and HURWITZ, W. N. (1942): Relative efficiencies of various sampling units in population enquiries; J. Amer. Stat. Assn., 37, 89-94.
JESSEN, R. J. (1942): Statistical investigation of a sample survey for obtaining farm facts; Iowa Agricultural Experimental Station Research Bulletin, No. 304.
MAHALANOBIS, P. C. (1940a): A sample survey of acreage under jute in Bengal; Sankhya, 4, 511-530.
MAHALANOBIS, P. C. (1940b): Report on the Sample Census of Jute, 1939; Indian Central Jute Committee.
MAHALANOBIS, P. C. (1943): General Report on the Sample Census of Area under Jute in Bengal, 1941; Indian Central Jute Committee.
MAHALANOBIS, P. C. (1944): On large-scale sample surveys; Phil. Trans. Roy. Soc., London, 231, (B), 329-451.
MAHALANOBIS, P. C. and SENGUPTA, J. M. (1951): On the size of sample cuts in crop-cutting experiments in the ISI, 1939-1950; Bull. Inter. Stat. Inst., 33, (2), 359-403.
SMITH, H. F. (1938): An empirical law describing heterogeneity in the yields of agricultural crops; J. Agri. Sci., 28, 1-23.
SUKHATME, P. V.
(1947): The problem of plot size in large-scale yield surveys; J. Amer. Stat. Assn., 42, 297-310, 460.
SUKHATME, P. V. (1953): Sampling Theory of Surveys with Applications; Chapter VI, Iowa State College Press, Ames, Iowa, Indian Society of Agricultural Statistics, New Delhi.

COMPLEMENTS AND PROBLEMS

8.1 In planning a sample survey for estimating the proportion $P$ of area under jute in a region, a pilot study was undertaken in which independent samples of clusters of different sizes ($x$) were taken up for estimating the values of $\sigma_x^2$ with a view to studying the relationship between $x$ and $\sigma_x^2$. The results obtained in this pilot survey are given in Table 8.6.

TABLE 8.6. ESTIMATES OF $\sigma_x^2/P(1-P)$ FOR DIFFERENT CLUSTER SIZES

cluster size x   sigma_x^2/P(1-P)      cluster size x   sigma_x^2/P(1-P)
(1)              (2)                   (1)              (2)
 1.00            0.1120                12.25            0.0454
 2.25            0.0883                16.00            …
 4.00            0.0659                25.00            0.0398
 6.25            0.0577                36.00            0.0342
 9.00            0.0505

Source: Mahalanobis, P. C., Phil. Trans. Roy. Soc., (B), 231, (1944), 329-451, Table 9, p. 412.

(i) Examine whether column (2) in Table 8.6 conforms to the relation $a/x^g$, where $a$ and $g$ are constants to be determined.
(ii) Assuming the cost function $C = 1000 + 2.1n + 0.7nx$, determine the optimum values of $x$ and $n$ for estimating $P$ when the cost is fixed at Rs. 10,000.

8.2 Using the data given in Table 8.4 (p. 311) about the relative variances of estimates of acreage under paddy for different cluster sizes in case of various sampling schemes, determine the optimum cluster sizes for these schemes when the cost is assumed to be proportional to the square root of the size of the cluster.

8.3
In a forest nursery, there are six rows each of length 434 feet in the bed. To arrive at a suitable sampling unit for estimating the total number of seedlings in the bed, the entire population was studied using four types of sampling unit: (a) one-foot length of single row, (b) two-foot length of single row, (c) one foot of the complete width of the bed and (d) two feet of the complete width of the bed. The results of this study are given in Table 8.7. Find out the optimum sampling unit after comparing the relative cost efficiencies of the four types of units considered here.

TABLE 8.7. DATA ON COST AND VARIANCE FOR FOUR TYPES OF SAMPLING UNIT.

type of unit     total number    variance     length of a row (in feet)
                 of units        per unit     covered in 15 minutes
(1)              (2)             (3)          (4)
one-foot row     2604             2.537        44
two-feet row     1302             6.746        62
one-foot bed      434            23.004        78
two-feet bed      217            68.558       108

8.4 Suppose in a study on cluster sampling, a sample of $n$ clusters of $M$ units each was selected with srswr. Let $b$ and $w$ be unbiased estimates of between-cluster and within-cluster variances. Assuming the sample size in terms of the number of units to be fixed, obtain an estimate of the relative efficiency of cluster sampling as compared to that of direct sampling of units by estimating the sampling variances in the two cases unbiasedly.

8.5 Derive the results (8.23) and (8.26) relating to the variance and efficiency of cluster sampling in estimating a population proportion.

8.6 Derive the result (8.34) and show that it reduces to (8.9) when the clusters are of equal size.

8.7 For examining the efficiency of sampling households instead of persons for estimating the proportion of males in a given area, the following simplifying assumptions are made: (i) each household consists of four persons (husband, wife and two children) and (ii) the sex of a child is binomially distributed.
Show that the intraclass correlation coefficient in this case is $(-1/6)$ and that the efficiency of sampling households compared to that of sampling persons is 200%. (Sukhatme, P. V., Sampling Theory of Surveys with Applications; (1953), Ch. VI, 248-250).

8.8 Let there be $N$ clusters of $M$ units each. When $n$ clusters are selected systematically for estimating the population mean per unit, derive the sampling variance of the estimator in terms of the intraclass correlation coefficient ($\rho_c$) between pairs of units in the clusters and that between pairs of clusters in the samples, assuming $N$ to be a multiple of $n$. (Madow, W. G., Ann. Math. Stat., 20, (1949), 333-354).

8.9 If the $NM$ units in a population are grouped at random to form $N$ clusters of $M$ units each, show that sampling $n$ clusters with srs wor would have the same efficiency as sampling $nM$ units with srs wor.

8.10 Let a finite population of $M_0$ units be divided into $N$ clusters with the $i$-th cluster having $M_i$ units. Suppose a sample of $m$ units is selected from the $M_0$ population units with srs wor and then the sample units are grouped according to the clusters to which they belong. Sampling of $n$ clusters from these clusters (including the empty ones) and observing only the originally sampled units in them is termed post cluster sampling. If $y$ is the sample total based on the values of the sample units in the $n$ selected clusters, show that the estimator $\hat{\bar{Y}} = Ny/nm$ is unbiased for the overall population mean $\bar{Y}$ and derive its sampling variance. (Ghosh, S. P., Ann. Math. Stat., 34, (1963), 587-597).

CHAPTER 9
Multi-stage Sampling

9.1 SAMPLING PROCEDURE

In Chapter 8, it has been stated that though cluster sampling is economical under certain circumstances, it is generally less efficient than sampling of individual units directly.
A compromise between cluster sampling and direct sampling of units can be achieved by selecting a sample of clusters and surveying only a sample of units in each sample cluster instead of completely enumerating all the units in the sample clusters. Such a procedure is known as two-stage sampling, since the units are selected in two stages. Here clusters are termed first stage units (fsu) or primary stage units (psu) and the ultimate observational units are termed second stage units (ssu) or ultimate stage units (usu). It may be noted that this procedure can be easily generalized to give rise to multi-stage sampling, where the sampling units at each stage are clusters of units of the next stage and the ultimate observational units are selected in stages, sampling at each stage being done from each of the sampling units or clusters selected in the previous stage. This procedure, being a compromise between uni-stage or direct sampling of units and cluster sampling, can be expected to be (i) more efficient than uni-stage sampling and less efficient than cluster sampling from considerations of operational convenience and cost, and (ii) less efficient than uni-stage sampling and more efficient than cluster sampling from the viewpoint of sampling variability, when the sample size in terms of number of ultimate units is fixed. It is of interest to note that an r-stage
It is of interest to note that an r-stage3138 MULTI STAGE SAMPLING {Cho design reduces to a stratified (r-1) stage desgn when all the fau's are included in the sample It may be mentioned that mult: stage sampling may be the only feasible procedure in a number of practical situations, where a satis- factory sampling frame of ultimate observational units is not readily available and the cost of obtammg such a framo is prohtbttive or where the cost of locating and physically adontvfymg the usw’s is considerable For instance, for conducting a socio economic survey In 8 region, where generally household 1s taken as the usu, a completo and up to date list of all the households m the region may not be available, whereas a list of villages or parishes and urban blocks which are groups of households may be readily available In such a case, a sample of villages and utban blocks may be selected frst and then a sample of households may be drawn from each selected village and urban block after making a complote ist of households un them It may happen that even a lst of villages 15 not available, but only a hist of all tehsils or counties (groups of villages) 1s available In this case a sample of households may be selected m three stages by selec ting fitst a samplo of tehstls (counties), then a sample of villages (parishes) from each selected techs! (county) after making a list of all the villages (parishes) in 1t and finally a sample of households from each selected village (parish) after listing all the households in 1t Since the selection 1s done m three atages, this procedure 1s termed. 
three-stage sampling. Here tehsils (counties) are taken as first stage units (fsu), villages (parishes) as second stage units (ssu) and households as third or ultimate stage units (tsu).

In practice, it usually happens that we have more information for groups of sampling units than for individual units. Hence, if these groups are taken as fsu's, the information available for them can be used in effecting good stratification or arrangement and in selecting the sample of fsu's. Further, since the ssu's are selected only from the sample fsu's, it would be practicable to collect some suitable information about the ssu's at the time of listing them and use this information for obtaining a better sample of ssu's. Because of this, it may be possible that a multi-stage design, where the information available at every stage is properly utilized, is more efficient than uni-stage sampling even from the point of view of sampling variability.

Multi-stage sampling has been found to be very useful in practice and this procedure is being currently used in a number of surveys. Mahalanobis (1940) used this sampling procedure in crop surveys carried out in Bengal during the period 1937-1941, and he had termed this procedure as nested sampling (Ganguli, 1941). Cochran (1939) and Hansen and Hurwitz (1943) have considered the use of this procedure in agricultural and population surveys respectively. Lahiri (1954) has discussed the use of multi-stage sampling in the Indian National Sample Survey, and Roy (1957) and D. Singh (1958) have considered the estimation of variance components for this sampling scheme.

Another type of sampling in stages consists in drawing a large sample of units in the first stage or phase, for which information on some auxiliary variable is collected, and then selecting a sub-sample of these units for the main survey using the auxiliary information for stratification, selection and estimation. This method of sampling is termed two-phase sampling.
If the ultimate sample is selected in two or more phases, the sampling procedure is termed multi-phase sampling. This is considered briefly in Section 9.12 and also in Chapters 10 and 11. This procedure also leads to reduction in cost as compared to uni-stage sampling, though not necessarily to the extent achieved in multi-stage sampling.

9.2 ESTIMATION AND SAMPLING VARIANCE

To illustrate the technique of building up estimators of population total and mean in the case of multi-stage sampling, let us consider the application of two-stage sampling to a population, where $M_0$ units are grouped into $N$ groups or clusters and the $i$-th cluster contains $M_i$ units, ($i = 1, 2, \ldots, N$). Let $Y_{ij}$ denote the value of the characteristic under consideration for the $j$-th unit in the $i$-th cluster. Then the population total $Y$ is given by

$$Y = \sum_{i=1}^{N} Y_i, \qquad \Big(Y_i = \sum_{j=1}^{M_i} Y_{ij}\Big). \tag{9.1}$$

Taking the $N$ clusters as fsu's and the units themselves as ssu's, we may sample $n$ fsu's with any given probability scheme and from the $i$-th selected fsu, $m_i$ ssu's may be selected with certain specified probabilities. Let $y_{ij}$ be the value of the $j$-th selected ssu in the $i$-th selected fsu ($j = 1, 2, \ldots, m_i$; $i = 1, 2, \ldots, n$). If the total values of the selected fsu's were known, it can be seen that it would be possible to get an estimator of $Y$ with the help of the probability scheme at the first stage as in cluster sampling. But in two-stage sampling, the actual totals of the selected fsu's are not known and hence they have to be estimated on the basis of the selected ssu's using the probability scheme adopted in
selecting them. That is, in cluster sampling the estimator is of the form $\hat{Y}_c = \sum_{i=1}^{n} a_i y_i$, where $y_i$ is the total of the $i$-th sample cluster and $a_i$ is the corresponding inflation factor. But in two-stage sampling the value of $y_i$ itself is to be estimated by

$$\hat{y}_i = \sum_{j=1}^{m_i} a_{ij}\,y_{ij},$$

where $a_{ij}$ is the inflation factor at the second stage selection and therefore the estimator of $Y$ takes the form

$$\hat{Y} = \sum_{i=1}^{n} a_i\hat{y}_i = \sum_{i=1}^{n} a_i\sum_{j=1}^{m_i} a_{ij}\,y_{ij}. \tag{9.2}$$

For instance, if the units at both the stages are selected with equal probability with or without replacement or circular systematically, an unbiased estimator of $Y$ is given by

$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij}.$$

Similarly, if $n$ fsu's are selected with probabilities $\{P_i\}$, ($i = 1, 2, \ldots, N$) with replacement or circular systematically, and $m_i$ ssu's are selected with probabilities $\{P_{ij}\}$, ($j = 1, 2, \ldots, M_i$) with replacement or circular systematically, an unbiased estimator of $Y$ is

$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{p_i}\cdot\frac{1}{m_i}\sum_{j=1}^{m_i}\frac{y_{ij}}{p_{ij}},$$

where $\{p_i\}$ and $\{p_{ij}\}$ denote the probabilities of the selected first and second stage units.

Extending this technique to the selection of the sample in more than two stages, we find that the estimator of the total and mean can be obtained by considering the estimators of the totals of the sample units at different stages built up from the next stage sample units. In the case of an r-stage design, the estimator will be of the form

$$\hat{Y} = \sum_{i_1=1}^{n} a_{i_1}\sum_{i_2=1}^{m_{i_1}} a_{i_1 i_2}\cdots\sum_{i_r=1}^{m_{i_1 i_2 \ldots i_{r-1}}} a_{i_1 i_2 \ldots i_r}\,y_{i_1 i_2 \ldots i_r}, \tag{9.3}$$

where $a_{i_1 i_2 \ldots i_j}$ is the inflation factor at the $j$-th stage and $m_{i_1 i_2 \ldots i_{j-1}}$ is the number of $j$-th stage units selected from a $(j-1)$-th stage unit, ($j = 1, 2, \ldots, r$), $m_{i_0}$ being $n$.

Since in multi-stage sampling the units are selected in stages by adopting a random or probability mechanism at each stage, the selection procedures at all the stages are to be considered in deriving the expected value and the variance of an estimator based on the observations made on a sample of usu's. This is usually done in stages starting from the ultimate stage and moving towards the first stage.
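The equal-probability estimator $\hat{Y} = \frac{N}{n}\sum (M_i/m_i)\sum y_{ij}$ can be checked for unbiasedness on a small population by averaging in stages, first over the second-stage samples and then over the first-stage samples. The following Python sketch assumes srs wor at both stages; the population values, the sample sizes and the function names are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical population: N = 3 fsu's containing M_i = 2, 3, 2 ssu's.
population = [[3, 5], [2, 4, 9], [7, 1]]
n = 2            # number of fsu's drawn (srs wor)
m = [1, 2, 1]    # m_i ssu's drawn from the i-th fsu whenever it is selected


def two_stage_estimate(sample, N):
    """Y_hat = (N/n) * sum_i (M_i/m_i) * sum_j y_ij.

    `sample` is a list of (M_i, selected y-values) pairs, one per sample fsu.
    """
    inflated = sum(Fraction(M_i, len(ys)) * sum(ys) for M_i, ys in sample)
    return Fraction(N, len(sample)) * inflated


def expected_estimate(population, n, m):
    """Average Y_hat over all equally likely first-stage samples and, within
    each of them, over all equally likely second-stage samples."""
    N = len(population)
    first_stage = list(combinations(range(N), n))
    outer = Fraction(0)
    for fsus in first_stage:
        per_fsu = [list(combinations(population[i], m[i])) for i in fsus]
        second_stage = list(product(*per_fsu))
        inner = sum(
            two_stage_estimate(
                [(len(population[i]), ys) for i, ys in zip(fsus, choice)], N
            )
            for choice in second_stage
        )
        outer += inner / len(second_stage)   # conditional expectation E_2
    return outer / len(first_stage)          # unconditional expectation E_1


true_total = sum(sum(fsu) for fsu in population)
print(expected_estimate(population, n, m) == true_total)
```

Exact rational arithmetic is used so that the comparison with the population total does not depend on floating-point rounding.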
The conditional expectation at the last stage is taken for a given set of selected penultimate stage units, the conditional expectation of this at the penultimate stage is taken for a given set of units selected in the previous stage, and this procedure is continued till the unconditional expected value of the conditional expectation of the estimator, taken over the second and subsequent stages of selection, is taken at the first or the primary stage.

The question of obtaining the expected value and sampling variance of estimators based on units selected through randomization at two or more stages has been considered briefly in Section 2.8 of Chapter 2 (p. 41), where it has been shown that the expected value and the sampling variance of the estimator $\hat{Y}$ based on a two-stage sample are symbolically given by

$$E_{12}(\hat{Y}) = E_1E_2(\hat{Y}), \tag{9.4}$$

and

$$V_{12}(\hat{Y}) = V_1E_2(\hat{Y}) + E_1V_2(\hat{Y}), \tag{9.5}$$

where $E_{12}$ and $V_{12}$ denote the expectation and the variance over the two stages, $E_1$ and $V_1$ the unconditional expectation and variance over the first stage and $E_2$ and $V_2$ the conditional expectation and variance over the second stage for a given sample of fsu's. Similarly proceeding for an r-stage design, we get

$$E(\hat{Y}) = E_1E_2\cdots E_r(\hat{Y}), \tag{9.6}$$

and

$$V(\hat{Y}) = V_1E_2\cdots E_{r-1}E_r(\hat{Y}) + E_1V_2E_3\cdots E_r(\hat{Y}) + \cdots + E_1E_2\cdots E_{r-1}V_r(\hat{Y}). \tag{9.7}$$

It is to be noted that in (9.5) the expression $V_1E_2(\hat{Y})$ is a measure of the variation between fsu's and the other expression $E_1V_2(\hat{Y})$ is a measure of the variation between ssu's within fsu's. In other words, these expressions are measures of the contribution to the total variance from the two stages of sampling. Similarly, in an r-stage design, the total variance consists of $r$ parts, each part giving the variation between units of a particular stage within the units of the previous stage.

9.3 TWO-STAGE SAMPLING WITH SRS

Suppose a sample of $n$ fsu's is selected with srs and from the $i$-th selected fsu a sample of $m_i$ ssu's is selected again with srs. An unbiased estimator of $Y$ is given by

$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i, \tag{9.8}$$

since
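$$E(\hat{Y}) = E_1E_2(\hat{Y}) = E_1\Big(\frac{N}{n}\sum_{i=1}^{n} Y_i\Big) = Y,$$

where $Y_i$ is the total of the $i$-th sample fsu.

9.3a SRS WOR AT BOTH THE STAGES

Suppose the sampling of units at both the stages is done with srs wor. The expression for $V(\hat{Y})$ can be derived by applying (9.5) and we get

$$V(\hat{Y}) = V_1E_2(\hat{Y}) + E_1V_2(\hat{Y}) = V_1\Big(\frac{N}{n}\sum_{i=1}^{n} Y_i\Big) + E_1\Big[\frac{N^2}{n^2}\sum_{i=1}^{n} M_i^2(1 - f_{2i})\frac{\sigma_i'^2}{m_i}\Big]$$

$$= N^2M'^2(1 - f_1)\frac{\sigma_b'^2}{n} + \frac{N}{n}\sum_{i=1}^{N} M_i^2(1 - f_{2i})\frac{\sigma_i'^2}{m_i}, \tag{9.9}$$

where

$$\sigma_b'^2 = \frac{1}{N-1}\sum_{i=1}^{N}\Big(\frac{Y_i}{M'} - \bar{Y}\Big)^2, \qquad \sigma_i'^2 = \frac{1}{M_i - 1}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y}_i)^2,$$

$f_1 = n/N$, $f_{2i} = m_i/M_i$ and $M'$ is the average number of ssu's per fsu.

An unbiased estimator of $V(\hat{Y})$ can be obtained if it is possible to estimate $\sigma_b'^2$ and $\sigma_i'^2$ unbiasedly. Since the ssu's within the fsu's are selected with srs wor, an unbiased estimator of $\sigma_i'^2$ is given by

$$s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}(y_{ij} - \bar{y}_i)^2$$

and hence the second term in (9.9) is unbiasedly estimated by

$$\frac{N^2}{n^2}\sum_{i=1}^{n} M_i^2(1 - f_{2i})\frac{s_i^2}{m_i}. \tag{9.10}$$

An unbiased estimator of $\sigma_b'^2$ can be obtained by estimating $\sum_{i=1}^{N} M_i^2\bar{Y}_i^2$ and $Y^2$ unbiasedly, for $\sigma_b'^2$ can be written as

$$\sigma_b'^2 = \frac{1}{(N-1)M'^2}\Big(\sum_{i=1}^{N} M_i^2\bar{Y}_i^2 - \frac{Y^2}{N}\Big).$$

Since $V_2(\bar{y}_i) = E_2(\bar{y}_i^2) - \bar{Y}_i^2$, an unbiased estimator of $\bar{Y}_i^2$ is given by

$$\bar{y}_i^2 - v_2(\bar{y}_i) = \bar{y}_i^2 - (1 - f_{2i})\frac{s_i^2}{m_i},$$

and hence that of $\sum_{i=1}^{N} M_i^2\bar{Y}_i^2$ is

$$\frac{N}{n}\sum_{i=1}^{n} M_i^2\Big\{\bar{y}_i^2 - (1 - f_{2i})\frac{s_i^2}{m_i}\Big\}.$$

An unbiased estimator of $Y^2$ is given by $\hat{Y}^2 - v(\hat{Y})$. Substituting these estimators in $\sigma_b'^2$ and simplifying, we get an unbiased estimator of $\sigma_b'^2$ as

$$s_b^2 - \frac{1}{nM'^2}\sum_{i=1}^{n} M_i^2(1 - f_{2i})\frac{s_i^2}{m_i}, \tag{9.11}$$

where

$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}\Big(\frac{M_i\bar{y}_i}{M'} - \hat{\bar{Y}}\Big)^2, \qquad \hat{\bar{Y}} = \hat{Y}/NM'.$$

Substituting in (9.9) this estimator and that obtained in (9.10), we get after simplification

$$v(\hat{Y}) = N^2M'^2(1 - f_1)\frac{s_b^2}{n} + \frac{N}{n}\sum_{i=1}^{n} M_i^2(1 - f_{2i})\frac{s_i^2}{m_i}. \tag{9.12}$$

It may be noted that for calculating the first term in (9.12) the value of $M'$ is not required, since it can be rewritten as

$$N^2(1 - f_1)\frac{1}{n(n-1)}\sum_{i=1}^{n}\Big(M_i\bar{y}_i - \frac{1}{n}\sum_{k=1}^{n} M_k\bar{y}_k\Big)^2.$$

From (9.10) and (9.11) it can be seen that an unbiased estimator of [...] is given by [...], and that an unbiased estimator of the first term of $V(\hat{Y})$ given in (9.9) is got by multiplying (9.11) by $N^2M'^2(1 - f_1)/n$.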
9.3b SRSWR AND SRS WOR AT THE TWO STAGES

In large-scale surveys, it is desirable to select the fsu's with replacement, since it enables us to get an estimate of the sampling variance of the estimator without having to calculate separately the estimates of the within and between components. Suppose the sampling is done with srswr at the first stage and with srs wor at the second stage; the estimator of $Y$ given in (9.8) remains unbiased and its sampling variance can be verified to be

$$V(\hat{Y}) = N^2M'^2\frac{\sigma_b^2}{n} + \frac{N}{n}\sum_{i=1}^{N} M_i^2(1 - f_{2i})\frac{\sigma_i'^2}{m_i}, \tag{9.13}$$

where $\sigma_b^2 = (N-1)\sigma_b'^2/N$. Noting that the $n$ unbiased estimates of $Y$ obtained from $n$ sample fsu's, namely,

$$\hat{Y}_i = NM_i\bar{y}_i, \qquad i = 1, 2, \ldots, n, \tag{9.14}$$

are statistically independent and have the same sampling variance, we get an unbiased estimator of $V(\hat{Y})$ as

$$v(\hat{Y}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(\hat{Y}_i - \hat{Y})^2 = N^2M'^2\frac{s_b^2}{n}, \tag{9.15}$$

where $s_b^2$ is as defined earlier.

9.3c SRSWR AT BOTH THE STAGES

Suppose the sampling at both the stages is done with srswr; then also the estimator given in (9.8) remains unbiased for $Y$ and its sampling variance is given by

$$V(\hat{Y}) = N^2M'^2\frac{\sigma_b^2}{n} + \frac{N}{n}\sum_{i=1}^{N} M_i^2\frac{\sigma_i^2}{m_i}, \tag{9.16}$$

where $\sigma_i^2 = (M_i - 1)\sigma_i'^2/M_i$. Since here too the fsu's are selected with replacement, an unbiased variance estimator is given by (9.15).

9.3d ESTIMATION OF POPULATION MEAN

In all the cases considered in this section an unbiased estimator of the population mean $\bar{Y} = Y/NM'$ can be easily obtained by dividing $\hat{Y}$ by $NM'$, if the value of $M'$ is known in advance. In that case, the expressions for the variance and the variance estimator can be obtained by dividing the corresponding expressions for $\hat{Y}$ by $N^2M'^2$.
If the value of $M'$ is not known in advance, then it has to be estimated by the mean of the number of ssu's in the $n$ sample fsu's and in this case the estimator of $\bar{Y}$ is given by

$$\hat{\bar{Y}} = \frac{\frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i}{\frac{N}{n}\sum_{i=1}^{n} M_i} = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{\sum_{i=1}^{n} M_i}. \tag{9.17}$$

This, being a ratio of two unbiased estimators, is generally biased and such estimators are considered in detail in Chapter 10.

9.3e UNI-STAGE AND CLUSTER SAMPLING

It is of interest to note that two-stage sampling reduces to cluster sampling if $m_i = M_i$ in the sampling schemes considered in Sub-sections 9.3a and 9.3b and hence the variances of estimators for cluster sampling can be obtained as special cases of the variances in the case of two-stage sampling by substituting $f_{2i} = 1$ in (9.9) and (9.13).

For comparing the efficiency of two-stage sampling with that of uni-stage sampling and cluster sampling for a given total sample size in terms of number of ultimate units, let us consider the simplified case where $M_i = M$ and $m_i = m$ for $i = 1, 2, \ldots, N$. In this case, the estimator of $\bar{Y}$ is given by

$$\hat{\bar{Y}} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij} \tag{9.18}$$

and its variance in two-stage sampling with srswr at the first stage and srs wor at the second stage becomes

$$V(\hat{\bar{Y}}) = \frac{\sigma_b^2}{n} + \frac{M-m}{M-1}\cdot\frac{\sigma_w^2}{nm}. \tag{9.19}$$

Noting that $\sigma_b^2$ and $\sigma_w^2$ can be expressed in terms of the population variance $\sigma^2$ and the intraclass correlation coefficient $\rho_c$ (cf. Sub-section 8.2a of Chapter 8), namely,

$$\sigma_b^2 = \frac{\sigma^2}{M}\{1 + (M-1)\rho_c\} \quad\text{and}\quad \sigma_w^2 = \frac{M-1}{M}\,\sigma^2(1 - \rho_c),$$

we get after simplification

$$V(\hat{\bar{Y}}_t) = \frac{\sigma^2}{nm}\{1 + (m-1)\rho_c\}, \tag{9.20}$$

where the subscript $t$ is used to denote two-stage sampling. If the total number of units is fixed at $nm$, then the variances of estimators of $\bar{Y}$ in cluster sampling and uni-stage sampling can be shown to be given by

$$V(\hat{\bar{Y}}_c) = \frac{\sigma^2}{nm}\{1 + (M-1)\rho_c\} \tag{9.21}$$

and

$$V(\hat{\bar{Y}}_r) = \frac{\sigma^2}{nm}, \tag{9.22}$$

where the subscripts $c$ and $r$ denote cluster sampling and uni-stage srs respectively. Comparing (9.21) and (9.22) with (9.20), we find that

$$V(\hat{\bar{Y}}_r) < V(\hat{\bar{Y}}_t) < V(\hat{\bar{Y}}_c),$$

if $\rho_c > 0$, which is likely to be the case in practice when nearby units are grouped to form the clusters or fsu's.
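The ordering of the three variances in (9.20) to (9.22) can be illustrated numerically. The sketch below is a direct transcription of those three formulas into Python; the function names and the input values ($\sigma^2 = 4$, $\rho_c = 0.2$, $n = 10$, $m = 4$, $M = 20$) are assumptions made purely for illustration.

```python
def variance_uni_stage(sigma2, n, m):
    """(9.22): uni-stage srs of n*m units."""
    return sigma2 / (n * m)


def variance_two_stage(sigma2, rho_c, n, m):
    """(9.20): n fsu's with m ssu's drawn from each."""
    return sigma2 / (n * m) * (1 + (m - 1) * rho_c)


def variance_cluster(sigma2, rho_c, n, m, M):
    """(9.21): complete clusters of M units, n*m units in all."""
    return sigma2 / (n * m) * (1 + (M - 1) * rho_c)


# Hypothetical inputs: total variance, intraclass correlation and sizes.
sigma2, rho_c = 4.0, 0.2
n, m, M = 10, 4, 20

v_r = variance_uni_stage(sigma2, n, m)
v_t = variance_two_stage(sigma2, rho_c, n, m)
v_c = variance_cluster(sigma2, rho_c, n, m, M)
print(v_r, v_t, v_c)
print(v_r < v_t < v_c)   # holds whenever rho_c > 0 and 1 < m < M
```

With a negative `rho_c` the inequalities are reversed, which corresponds to the special situations in which clustering improves on direct sampling of units.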
This shows that the sampling efficiency of two-stage sampling design is expected to be between those of the uni-stage srs and cluster sampling for fixed total sample size. If, in the above case, we adopt sampling with replacement at the second stage also, the sampling variance given in (9.19) reduces to

$$V(\hat{\bar{Y}}) = \frac{\sigma_b^2}{n} + \frac{\sigma_w^2}{nm}. \tag{9.23}$$

Cheat Sheet
163 pages
Sta 405 First Material
No ratings yet
Sta 405 First Material
17 pages
Rohatgi Expl
No ratings yet
Rohatgi Expl
192 pages
BiodiversityR PDF
No ratings yet
BiodiversityR PDF
128 pages
20240206165411011
No ratings yet
20240206165411011
54 pages
(GAM) Application PDF
No ratings yet
(GAM) Application PDF
30 pages
Stratified Sampling Notes
No ratings yet
Stratified Sampling Notes
7 pages
Binomial Distribution
No ratings yet
Binomial Distribution
16 pages
Estimation and Hypothesis
100% (2)
Estimation and Hypothesis
32 pages
Stats216 hw2
No ratings yet
Stats216 hw2
21 pages
Lecture 10 Randomized Complete Block Design Last Lecture
100% (1)
Lecture 10 Randomized Complete Block Design Last Lecture
4 pages
Element of Quiuality Control
No ratings yet
Element of Quiuality Control
41 pages
Estimation EMV
No ratings yet
Estimation EMV
37 pages
Linear Regression: Major: All Engineering Majors Authors: Autar Kaw, Luke Snyder
100% (1)
Linear Regression: Major: All Engineering Majors Authors: Autar Kaw, Luke Snyder
25 pages
Unsupervised Learning
No ratings yet
Unsupervised Learning
24 pages
BADM 572 Module 4 Study Session 7 April 2019
No ratings yet
BADM 572 Module 4 Study Session 7 April 2019
44 pages
Generalized Linear Models: Ariel Alonso Abad
No ratings yet
Generalized Linear Models: Ariel Alonso Abad
43 pages
When Should You Adjust Standard Errors For Clustering?: Alberto Abadie, Susan Athey, Guido Imbens, & Jeffrey Wooldridge
No ratings yet
When Should You Adjust Standard Errors For Clustering?: Alberto Abadie, Susan Athey, Guido Imbens, & Jeffrey Wooldridge
33 pages
Chapter 8 Fitting Parametric Regression Models: Required Data
No ratings yet
Chapter 8 Fitting Parametric Regression Models: Required Data
11 pages
Lec. Note E4
No ratings yet
Lec. Note E4
5 pages
MCMC Sheldon Ross
No ratings yet
MCMC Sheldon Ross
68 pages
281A Final Sol
No ratings yet
281A Final Sol
9 pages
1971 - Rand - Objective Criteria For The Evaluation of Clustering Methods
No ratings yet
1971 - Rand - Objective Criteria For The Evaluation of Clustering Methods
6 pages
Anintroductiontomachinelearning: Michaelclark Centerforsocialresearch Universityofnotredame
No ratings yet
Anintroductiontomachinelearning: Michaelclark Centerforsocialresearch Universityofnotredame
43 pages
12-Multiple Comparison Procedure
No ratings yet
12-Multiple Comparison Procedure
12 pages
River Audit
No ratings yet
River Audit
12 pages
Ejemplo de Inferencia Umvue
No ratings yet
Ejemplo de Inferencia Umvue
10 pages
Comandos
No ratings yet
Comandos
51 pages
Wiley - Student Solutions Manual To Accompany Introduction To Time Series Analysis and Forecasting - 978-0-470-43574-8
0% (1)
Wiley - Student Solutions Manual To Accompany Introduction To Time Series Analysis and Forecasting - 978-0-470-43574-8
3 pages
3 - Principles of Data Reduction
No ratings yet
3 - Principles of Data Reduction
14 pages
Sufficient Statistics - Problems - Solved - Xiang - Yin
No ratings yet
Sufficient Statistics - Problems - Solved - Xiang - Yin
5 pages
Binomial Distribution
No ratings yet
Binomial Distribution
26 pages
Axiomatic Probability and Concepts
No ratings yet
Axiomatic Probability and Concepts
6 pages
Cramer Raoh and Out 08
No ratings yet
Cramer Raoh and Out 08
13 pages
Solution CH # 5
No ratings yet
Solution CH # 5
39 pages
Clustering
No ratings yet
Clustering
8 pages
Chapter10 Sampling Two Stage Sampling
No ratings yet
Chapter10 Sampling Two Stage Sampling
21 pages
Multiple Regression Tutorial 3
100% (2)
Multiple Regression Tutorial 3
5 pages
AUsing R For Power Analysis PDF
No ratings yet
AUsing R For Power Analysis PDF
6 pages
Emergency Gate Specification
No ratings yet
Emergency Gate Specification
7 pages
20231208004112082
No ratings yet
20231208004112082
5 pages
Sufficient Statistics and Exponential Family
No ratings yet
Sufficient Statistics and Exponential Family
11 pages
STAT 480b Answer Key To Problem Set No. 4
No ratings yet
STAT 480b Answer Key To Problem Set No. 4
3 pages
Sampling Book Pages2
No ratings yet
Sampling Book Pages2
2 pages
Comparing The Areas Under Two or More Correlated Receiver Operating Characteristic Curves A Nonparametric Approach
No ratings yet
Comparing The Areas Under Two or More Correlated Receiver Operating Characteristic Curves A Nonparametric Approach
10 pages
Barrage Drawing
No ratings yet
Barrage Drawing
34 pages
A Famous Example in Genetic Modeling Tanner 1996 or Dempster Laird and Rubin 1977 Is A PDF
No ratings yet
A Famous Example in Genetic Modeling Tanner 1996 or Dempster Laird and Rubin 1977 Is A PDF
1 page
HW 03 Sol
No ratings yet
HW 03 Sol
9 pages
Coordinate Descent and Golden Selection Search
No ratings yet
Coordinate Descent and Golden Selection Search
2 pages
20240214130729305
No ratings yet
20240214130729305
1 page
20240208145546915
No ratings yet
20240208145546915
1 page

Sampling Theory and Method-301-500

Uploaded by

Sampling Theory and Method-301-500

Uploaded by

You might also like