Creative Commons Attribution-Noncommercial-Sharealike License
Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2009, The Johns Hopkins University and Saifuddin Ahmed. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided AS IS; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.
Sampling Procedure
Sampling involves two tasks:
1. How to select the elements?
2. How to estimate the population characteristics from the sampling units?
We use a randomization process for sample selection so that no element receives preferential treatment in selection, which could introduce selection bias.
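As a minimal sketch of a randomized draw, the snippet below uses Python's standard `random` module to draw a sample with replacement and one without replacement. The population labels, the sample size, and the seed are illustrative assumptions, not values from the course material.

```python
import random

def srswr(population, n, rng):
    """Simple random sample WITH replacement: a unit may be drawn more than once."""
    return rng.choices(population, k=n)

def srswor(population, n, rng):
    """Simple random sample WITHOUT replacement: every drawn unit is distinct."""
    return rng.sample(population, n)

rng = random.Random(2009)      # arbitrary fixed seed so the draws are reproducible
population = list("ABCD")      # hypothetical population labels
with_repl = srswr(population, 2, rng)
without_repl = srswor(population, 2, rng)
print(with_repl, without_repl)
```

Every unit has the same chance at each draw in both functions; the only difference is whether a drawn unit is returned to the population before the next draw.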
Obviously, in practice there is no reason to interview the same individual twice (except in reliability or validation studies). So why study simple random sampling with replacement (SRSWR)?
Mathematical properties
Number of possible samples (of n elements from a population of N elements):
SRSWR: $N^n$
SRSWOR (simple random sampling without replacement): ${}^{N}C_{n} = \binom{N}{n}$
Example: 2 elements from 4 (ABCD). (In how many ways can we draw 2 elements from a population of size 4?)
With SRSWR: $4^2 = 16$ samples:
AA, AB, AC, AD, BA, BB, BC, BD, CA, CB, CC, CD, DA, DB, DC, DD
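The count can be checked by brute force. This is a small sketch using `itertools.product` from the Python standard library; the variable names are illustrative.

```python
from itertools import product

population = "ABCD"
n = 2

# Every ordered draw of n elements, with repeats allowed (SRSWR).
srswr_samples = ["".join(s) for s in product(population, repeat=n)]

print(len(srswr_samples))   # N**n samples
print(srswr_samples)
```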
SRSWOR: $\binom{N}{n} = \frac{N!}{(N-n)!\,n!} = \frac{4!}{2!\,2!} = 6$
Starting from the 16 SRSWR samples, drop those with a repeated element (AA, BB, CC, DD). This leaves 12 ordered samples:
AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC
Ignoring the order of selection (AB and BA are the same sample) leaves 6 distinct samples:
AB, AC, AD, BC, BD, CD
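The SRSWOR count can be reproduced the same way. The sketch below, using only the standard library, lists the ordered samples without repeats and then the unordered ones, and compares the latter with `math.comb`.

```python
from itertools import combinations, permutations
from math import comb

population = "ABCD"
n = 2

# Ordered samples without a repeated element.
ordered = ["".join(s) for s in permutations(population, n)]
# Unordered samples: AB and BA count once.
unordered = ["".join(s) for s in combinations(population, n)]

print(len(ordered), ordered)
print(len(unordered), unordered)
print(comb(len(population), n))   # N! / ((N - n)! n!)
```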
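Probability of selecting a particular sample:
SRSWR: $1/N^n$, from $\frac{1}{N}\cdot\frac{1}{N}\cdots\frac{1}{N} = \frac{1}{N^n}$
SRSWOR: $1/\binom{N}{n}$, from $\frac{1}{N}\cdot\frac{1}{N-1}\cdots\frac{1}{N-n+1}\cdot n! = \frac{n!\,(N-n)!}{N!}$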
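A short numerical check of the two sample probabilities, using exact fractions. The function names are hypothetical and N = 4, n = 2 are taken from the ABCD example.

```python
from fractions import Fraction
from math import comb, factorial

def p_sample_srswr(N, n):
    """Probability of one particular ordered sample under SRSWR: (1/N)^n."""
    return Fraction(1, N) ** n

def p_sample_srswor(N, n):
    """Probability of one particular unordered sample under SRSWOR."""
    p_ordered = Fraction(1)
    for j in range(n):                    # draws have N, N-1, ..., N-n+1 candidates
        p_ordered *= Fraction(1, N - j)
    return p_ordered * factorial(n)       # n! orderings give the same sample

print(p_sample_srswr(4, 2))    # 1/16
print(p_sample_srswor(4, 2))   # 1/6
```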
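Element selection probability (inclusion probability):
Let $\pi_i$ be the inclusion probability for unit i. Then $\pi_i = \sum_{s \ni i} p(s)$, the sum of the probabilities of all samples that contain unit i.
SRSWR: $n/N$, from a probability of $1/N$ at each of the n draws: $\frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N} = \frac{n}{N}$. (Strictly, this sum is the expected number of times unit i is drawn; the probability that it is drawn at least once is $1 - (1 - 1/N)^n$, which is close to $n/N$ when $n/N$ is small.)
SRSWOR: $n/N$, from $\frac{1}{N} + \frac{N-1}{N}\cdot\frac{1}{N-1} + \cdots + \frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdots\frac{1}{N-n+1} = \frac{n}{N}$, where each of the n terms equals $1/N$.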
In the example, each element appears in 3 of the 6 possible SRSWOR samples (A appears in AB, AC, AD), so its selection probability is 3/6 = 1/2. This agrees with the element selection probability n/N: two elements sampled from a population of size 4 gives n/N = 2/4 = 1/2.
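The inclusion probability can be verified by enumerating every possible sample in the ABCD example. The sketch below does this for SRSWOR and, for comparison, computes two quantities for SRSWR: the expected number of times a unit is drawn and the probability that it is drawn at least once. The dictionary names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

population = "ABCD"
n = 2

# SRSWOR: share of the equally likely samples that contain each unit.
wor_samples = list(combinations(population, n))
inclusion_wor = {u: Fraction(sum(u in s for s in wor_samples), len(wor_samples))
                 for u in population}

# SRSWR: expected number of draws of each unit, and chance of at least one draw.
wr_samples = list(product(population, repeat=n))
expected_count_wr = {u: Fraction(sum(s.count(u) for s in wr_samples), len(wr_samples))
                     for u in population}
at_least_once_wr = {u: Fraction(sum(u in s for s in wr_samples), len(wr_samples))
                    for u in population}

print(inclusion_wor["A"], expected_count_wr["A"], at_least_once_wr["A"])
```

For this small population the with-replacement "at least once" probability (7/16) is visibly below n/N = 1/2; the gap shrinks as n/N gets small.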
In practice SRSWR is not attractive: we do not want to interview the same individuals more than once. In mathematical terms, however, it is simpler to relate the sample to the population under SRSWR. SRSWOR provides two additional advantages:
1. Elements are not repeated.
2. The variance of the estimate is smaller than under SRSWR with the same sample size.
$\operatorname{Var}(\bar{y}) = \frac{S^2}{n}\cdot\frac{N-n}{N-1}$
Although SRS is attractive for its simplicity, the design is not usually used in sample surveys in practice, for several reasons:
1. Lack of a listing frame: the method requires that a list of population elements be available, which is not the case for many populations.
2. Problem of small area estimation or domain analysis: with a small sample from a large population, some areas may not have enough sampled units for small area estimation or for domain analysis by variables of interest.
3. Not cost effective: SRS requires covering the whole population, which may reside in a large geographic area; interviewing a few sampled units spread sparsely over a large area would be very costly.
The term (1 - n/N) is called the finite population correction (fpc) and is the multiplying factor that converts the SRSWR variance into the SRSWOR variance; for large N it is approximately equal to (N - n)/(N - 1).
The finite population correction is always less than 1 (except when n is close to 0), which implies that the design effect of SRSWOR relative to SRSWR, $\mathrm{deff}_{SRSWOR}$, is always less than 1. That is why SRSWOR is more efficient than SRSWR. Deff is used extensively in design-based analysis to examine the efficiency of estimates.
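The effect of the correction can be checked exactly on a toy population by enumerating every possible sample. In the sketch below the four population values are made up for illustration, `sigma2` is the population variance with divisor N, and `S2` is the same quantity with divisor N - 1; these names are assumptions of the sketch, not notation taken from the slides.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(values):
    return Fraction(sum(values), len(values))

def variance_of_sample_mean(samples):
    """Exact variance of the sample mean over a set of equally likely samples."""
    means = [mean(s) for s in samples]
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means) / len(means)

Y = [1, 3, 4, 8]                 # hypothetical population values
N, n = len(Y), 2
sigma2 = sum((y - mean(Y)) ** 2 for y in Y) / N   # population variance, divisor N
S2 = sigma2 * N / (N - 1)                          # same, divisor N - 1

var_wr = variance_of_sample_mean(list(product(Y, repeat=n)))     # SRSWR
var_wor = variance_of_sample_mean(list(combinations(Y, n)))      # SRSWOR
deff = var_wor / var_wr

print(var_wr, var_wor, deff)
```

With divisor N the multiplying factor is (N - n)/(N - 1); with divisor N - 1 it is (1 - n/N) applied to S2/n. Both give the same SRSWOR variance, and the ratio `deff` is below 1.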
1. $T = \sum_{i=1}^{N} Y_i$
var(p) = or
832645 708009 305761 536405 217862 844905 670523 334920 885588 458268 277649 977683
573158 285644 995036 504168 782003 296231 707073 023934 384435 058670 076177 759956
467460 727733 740619 750032 409660 103727 049209 808901 129958 888935 482951 553916
838921 343305 054728 367682 155199 053603 830572 740693 303040 064613 876389 983998
171721 539264 746425 626278 129514 562252 337034 170372 264636 661404 898190 331578
152885 907568 713746 855480 484511 219726 716264 095017 858065 411861 927367 981306
Systematic Sampling
Systematic sampling, either by itself or in combination with some other method, may be the most widely used method of sampling.
In simple random sampling, the sampled units are distributed randomly over the list.
In systematic sampling, we force the sample to be spread evenly over the list (sampling frame):
First, consider the list as divided evenly into blocks.
In systematic sampling, only the first unit is selected at random; the rest are selected according to a predetermined pattern. To select a systematic sample of n units, the first unit is selected with a random start r between 1 and k, where k = N/n is the sampling interval. After the first unit is selected, every k-th unit is included. Here $1 \le r \le k$.
An example: let N = 100 and n = 10, so k = 100/10 = 10. The random start r is then selected between 1 and 10 (say, r = 7). The sample consists of the population units with serial indexes 7, 17, 27, ..., 97, i.e., r, r+k, r+2k, ..., r+(n-1)k.
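The selection rule can be sketched in a few lines. The function below is a hypothetical helper that assumes N is an exact multiple of n (so k is an integer) and uses 1-based serial indexes as in the example; passing `r` explicitly reproduces the worked case.

```python
import random

def systematic_sample(N, n, r=None, rng=random):
    """Return the serial indexes r, r+k, ..., r+(n-1)k with k = N/n."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                      # sampling interval
    if r is None:
        r = rng.randint(1, k)       # random start, 1 <= r <= k
    if not 1 <= r <= k:
        raise ValueError("the random start must satisfy 1 <= r <= k")
    return [r + j * k for j in range(n)]

print(systematic_sample(100, 10, r=7))   # the worked example with r = 7
print(systematic_sample(100, 10))        # a fresh random start
```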
Solution when k = N/n is not an integer (the figures here correspond to n = 175 and k of about 5.71):
If k = 5 is used, stop the selection when n = 175 is reached.
If k = 6 is used, treat the sampling frame as a circular list and continue the selection from the beginning of the list after exhausting the list during the first cycle.
An alternative procedure is to keep k non-integer and continue the sample selection as follows:
Let k = 5.71 and r = 4. The first sampled unit is the 4th in the list. The second is (4 + 5.71) = 9.71, truncated to the 9th in the list; the third is (4 + 2 × 5.71) = 15.42, truncated to the 15th in the list; and so on. (The last sampled unit is 4 + 5.71 × (175 - 1) = 997.54, truncated to the 997th in the list.)
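A minimal sketch of the non-integer interval procedure follows. It keeps the interval as an exact fraction (an implementation choice made here to avoid floating-point rounding near integer boundaries) and truncates each position to a list index, as in the worked numbers. The function name is hypothetical.

```python
from fractions import Fraction
from math import floor

def fractional_systematic_sample(k, r, n):
    """Positions floor(r + j*k) for j = 0, ..., n-1, with a non-integer interval k."""
    return [floor(r + j * k) for j in range(n)]

k = Fraction(571, 100)          # k = 5.71, held exactly
picks = fractional_systematic_sample(k, 4, 175)

print(picks[:3])    # first three selected positions
print(picks[-1])    # last selected position
```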
Advantages: systematic sampling has several attractive features:
1. Spreads the sample more evenly over the list than SRS
2. Simple to implement
3. May be started without a complete listing frame (say, interviewing every 9th patient coming to a clinic)
4. With an ordered list, the variance may be smaller than under SRS (see below for exceptions)
Disadvantages:
linear trend
Example: say we want to take a sample of size 10 from a population of 100. We select the first unit randomly, say the 85th element, and treat the list as circular. Our sample then consists of the following elements: 85, 95, 5, 15, 25, 35, 45, 55, 65, 75.
This procedure is not without criticism.
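The wrap-around in this example can be written with modular arithmetic. The helper below is a hypothetical sketch using 1-based positions; the interval k = 10 is inferred from N/n = 100/10.

```python
def circular_systematic_sample(N, n, k, r):
    """Start at position r, step by k, and wrap to the top of the list after position N."""
    return [((r - 1 + j * k) % N) + 1 for j in range(n)]

print(circular_systematic_sample(N=100, n=10, k=10, r=85))
```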
In systematic sampling, the selection probability of each element in the population is the same, i.e., the design is epsem (equal probability of selection method).
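The equal-probability property can be checked by listing all k possible systematic samples, one per random start, and counting how often each element appears. The values N = 100 and n = 10 are reused from the earlier example as an illustrative assumption.

```python
from collections import Counter
from fractions import Fraction

N, n = 100, 10
k = N // n                       # sampling interval, also the number of possible samples

# One systematic sample for every possible random start r = 1, ..., k.
all_samples = [[r + j * k for j in range(n)] for r in range(1, k + 1)]

# Each start is equally likely, so selection probability = appearances / k.
appearances = Counter(unit for sample in all_samples for unit in sample)
selection_prob = {unit: Fraction(count, k) for unit, count in appearances.items()}

print(set(selection_prob.values()))   # a single common value, 1/k = n/N
```

Every element belongs to exactly one of the k samples, which is why the probability is 1/k for all of them. Equal element probabilities do not make all samples of size n equally likely, though: only k of them can ever be drawn.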
This depends on how the elements are listed: randomly arranged or sorted in a particular fashion by a variable. If the listing is randomly ordered, you may view this as an SRS. Unfortunately this statement is oversimplified. Another factor will affect systematic sampling: intraclass correlation.