
Chapter 3: Simple Random Sampling and Systematic Sampling
Simple random sampling and systematic sampling provide the foundation for almost all of the
more complex sampling designs that are based on probability sampling. They are also usually
the easiest designs to implement. These two designs highlight a trade-off inherent in all sampling
designs: do we select sample units at random to minimize the risk of introducing biases into the
sample or do we select sample units systematically to ensure that sample units are well-
distributed throughout the population?

Both designs involve selecting n sample units from the N units in the population and can be
implemented with or without replacement.

Simple Random Sampling


When the population of interest is relatively homogeneous, simple random sampling works well: it provides estimates that are unbiased and reasonably precise. When little is known about a population in advance, such as in a pilot study, simple random sampling is a common design choice.

Advantages:

• Easy to implement
• Requires little advance knowledge about the target population

Disadvantages:

• Imprecise relative to other designs if the population is heterogeneous


• More expensive to implement than other designs if entities are clumped and the cost to
travel among units is appreciable

How it is implemented:

• Select n sample units at random from N available in the population

All units within the population must have the same probability of being selected, and every possible sample of size n drawn from the population must have an equal chance of being selected.

There are many strategies available for selecting a random sample. For large finite populations (i.e., those where every potential sampling unit can be identified in advance), this can involve generating pseudorandom numbers with a computer. For small finite populations it might involve using a table of random numbers, or even writing a unique identifier for every sample unit in the population on a scrap of paper, placing those scraps in a jar, shaking it, then selecting n scraps of paper from the jar blindly. The approach used for selecting the sample matters little provided there are no constraints on how the sample units are selected and all units have an equal chance of being selected.
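For illustration, a minimal sketch of the computer-based approach in Python, assuming the sampling frame simply numbers the units 1 through N (the function name and labels are illustrative, not part of the chapter):

```python
import random

def simple_random_sample(N, n, seed=None):
    """Draw a simple random sample (without replacement) of n unit labels
    from a sampling frame containing units numbered 1..N."""
    rng = random.Random(seed)       # pseudorandom number generator
    frame = range(1, N + 1)         # every potential sampling unit, identified in advance
    return rng.sample(frame, n)     # each possible sample of size n is equally likely

# Example: select n = 10 of N = 300 plots
print(simple_random_sample(N=300, n=10, seed=42))
```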

Estimating the Population Mean
The population mean (μ) is the true average number of entities per sample unit and is estimated with the sample mean (μ̂ or ȳ), which has an unbiased estimator:

$$\hat{\mu} = \frac{\sum_{i=1}^{n} y_i}{n}$$

where yᵢ is the value from each unit in the sample and n is the number of units in the sample.

The population variance (σ²) is estimated with the sample variance (s²), which has an unbiased estimator:

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

Variance of the estimate μ̂ is:

$$\widehat{\mathrm{var}}(\hat{\mu}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n}$$

The standard error of the estimate is the square root of the variance of the estimate, which, as always, is the standard deviation of the sampling distribution of the estimate. Standard error is a useful gauge of how precisely a parameter has been estimated and is a function of the variation inherent in the population (σ²) and the size of the sample (n).

Standard error of μ̂ is:

$$\mathrm{SE}(\hat{\mu}) = \sqrt{\left(\frac{N-n}{N}\right)\frac{s^2}{n}}$$

The quantity (N − n)/N is the finite population correction factor, which adjusts the variance of the estimator (not the variance of the population, which does not change with n) to reflect the amount of information that is known about the population through the sample. Simply, as the amount of information we know about the population through sampling increases, the remaining uncertainty decreases. Therefore, the correction factor reflects the proportion of the population that remains unknown. Consequently, as the number of sampling units measured (n) approaches the total number of sampling units in the population (N), the finite population correction factor approaches zero, so the amount of uncertainty in the estimate also approaches zero.

[Figure: the finite population correction factor (FPC) plotted against sample size n for N = 100; FPC declines from 1 at n = 0 to 0 at n = N (Fig. 2 - FPC.xls).]

When the sample size n is small relative to the population size N, the fraction of the population being sampled n/N also is small, therefore the correction factor has little effect on the variance of the estimator (Fig. 2 - FPC.xls). If the finite population correction factor is ignored, which is what we have to do when N is unknown, the effect on the variance of the estimator is slight when N is large. When N is small, however, the variance of the estimator can be overestimated appreciably.
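As an illustration of these formulas, here is a minimal sketch that computes μ̂, s², the FPC-adjusted variance, and the standard error from a simple random sample (the data and function name are hypothetical):

```python
import math

def estimate_mean(y, N):
    """Estimate the population mean from a simple random sample y of n
    measurements drawn from a population of N sampling units."""
    n = len(y)
    mu_hat = sum(y) / n                                   # sample mean
    s2 = sum((yi - mu_hat) ** 2 for yi in y) / (n - 1)    # sample variance
    var_hat = ((N - n) / N) * s2 / n                      # FPC-adjusted variance of the estimate
    return mu_hat, s2, var_hat, math.sqrt(var_hat)        # last value is the standard error

# Example: counts from n = 5 of N = 100 plots (made-up numbers)
print(estimate_mean([3, 7, 5, 4, 6], N=100))
```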

Example. (to be added)

Estimating the Population Total


Like the number of entities per sample unit, the total number of entities in the entire population is
another attribute estimated commonly. Unlike the population mean, however, estimating the
population total requires that we know the number of sampling units in a population, N.

The population total

$$\tau = \sum_{i=1}^{N} y_i = N\mu$$

is estimated with τ̂, which has an unbiased estimator:

$$\hat{\tau} = N\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} y_i$$

where N is the total number of sample units in a population, n is the number of units in the sample, and yᵢ is the value measured from each sample unit.

In studies of wildlife populations, the total number of entities in a population is often referred to as
“abundance” and is traditionally represented with the symbol N. Consequently, there is real
potential for confusing the number of entities in the population with the number of sampling units
in the sampling frame. Therefore, in the context of sampling theory, we’ll use τˆ to represent the
population total and N to represent the number of sampling units in a population. Later, when
addressing wildlife populations specifically, we’ll use N to represent abundance to remain
consistent with the literature in that field.

Because the estimator τˆ is simply the number of sample units in the population N times the
mean number of entities per sample unit, μ̂ , the variance of the estimate τˆ reflects both the
number of units in the sampling universe N and the variance associated with μ̂ . An unbiased
estimate for the variance of the estimate τˆ is:

$$\widehat{\mathrm{var}}(\hat{\tau}) = N^2\,\widehat{\mathrm{var}}(\hat{\mu}) = N^2\left(\frac{s^2}{n}\right)\left(\frac{N-n}{N}\right)$$

where s² is the estimated population variance.
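Because τ̂ is just N times μ̂, the computation extends directly from the mean estimator; a short hypothetical sketch:

```python
import math

def estimate_total(y, N):
    """Estimate the population total from a simple random sample y
    drawn from a population of N sampling units."""
    n = len(y)
    mu_hat = sum(y) / n
    s2 = sum((yi - mu_hat) ** 2 for yi in y) / (n - 1)
    tau_hat = N * mu_hat                                  # tau-hat = N * mu-hat
    var_tau = (N ** 2) * (s2 / n) * ((N - n) / N)         # N^2 * var(mu-hat)
    return tau_hat, math.sqrt(var_tau)                    # estimate and its standard error

# Example: estimated total for N = 100 plots from the same n = 5 counts
print(estimate_total([3, 7, 5, 4, 6], N=100))
```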

Example. (to be added)

Estimating a Population Proportion


If you are interested in the composition of a population, you could use a simple random sample to
estimate the proportion of the population p that is composed of elements with a particular trait,
such as the proportion of plants that flower in a given year, the proportion of juvenile animals
captured, the proportion of females in estrus, and so on. We will consider only classifications that
are dichotomous, meaning that an element in the population either has the trait of interest
(flowering) or it does not (not flowering); extending this idea to more complex classifications is
straightforward.

In the case of simple random sampling, the population proportion follows the mean exactly; that is, p = μ. If this idea is new to you, convince yourself by working through an example. Say we generate a sample of size 10, where 4 entities have a value of 1 and 6 entities have a value of 0 (e.g., 1 = presence of a trait, 0 = absence of a trait). The proportion of entities in the sample with the trait is 4/10 or 0.40, which is also the sample mean: (1+1+1+1+0+0+0+0+0+0)/10 = 4/10 = 0.40. Cosmic.

It follows that the population proportion (p) is estimated with the sample proportion ( p̂ ) which has
an unbiased estimator:

$$\hat{p} = \hat{\mu} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Because we are dealing with dichotomous proportions (a sample unit does or does not have the trait), the population variance σ² is computed from the variance of a binomial, which is the proportion of the population with the trait (p) times the proportion without that trait (1 − p), or p(1 − p). The estimate of the population variance s² is p̂(1 − p̂).

Variance of the estimate p̂ is:

$$\widehat{\mathrm{var}}(\hat{p}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n-1} = \left(\frac{N-n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1}$$

Standard error of p̂ is:

$$\mathrm{SE}(\hat{p}) = \sqrt{\left(\frac{N-n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1}}$$
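A brief sketch of the proportion estimator, coding each sampled unit as 1 if it has the trait and 0 otherwise (names and data are illustrative):

```python
import math

def estimate_proportion(y, N):
    """Estimate a population proportion from a simple random sample y of
    0/1 indicators drawn from a population of N sampling units."""
    n = len(y)
    p_hat = sum(y) / n                                          # sample proportion = sample mean
    var_hat = ((N - n) / N) * p_hat * (1 - p_hat) / (n - 1)     # FPC-adjusted variance
    return p_hat, math.sqrt(var_hat)                            # estimate and standard error

# Example: 4 of 10 sampled plants flowering, from N = 200 plants
print(estimate_proportion([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], N=200))
```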

Example. (to be added)

Determining Sample Sizes


How many sample units should we measure from the population so that we have confidence that
parameters have been estimated adequately?

Determining how many sample units (n) to measure requires that we establish the degree of precision required for the estimate we wish to generate. We denote this by B, the desired bound on the error of estimation, defined as the half-width of the confidence interval we want around the estimate that will result from the proposed sampling effort.

To establish the number of sample units to measure to estimate the population mean μ at a desired level of precision B with simple random sampling, we set Z × SE(ȳ) (the half-width of a confidence interval) equal to B and solve this expression for n. For simplicity we use Z, the upper α/2 point of the standard normal distribution (although we could use the Student's t distribution), where α is the same value used to establish the width of a confidence interval: the rate at which we are willing to tolerate Type I errors.

We set

$$B = Z\sqrt{\left(\frac{N-n}{N}\right)\frac{\sigma^2}{n}}$$

and solve for n:

$$n = \frac{1}{\dfrac{1}{n_0} + \dfrac{1}{N}}, \qquad n_0 = \frac{Z^2\sigma^2}{B^2}.$$

If we anticipate that n will be small relative to N, we can ignore the finite population correction factor and use the formula for n0 alone to gauge sample size.

Example: Estimate the average amount of money μ in a hospital's accounts receivable. Note, however, that no prior information exists with which to estimate the population variance σ², but we know that most receivables lie within a range of about $100 and that there are N = 1000 accounts. How many sample units are needed to estimate μ with a bound on the error of estimation of B = $3 at 95% confidence (α = 0.05, Z = 1.96) using simple random sampling?

Although it is ideal to have data with which to estimate σ², the range is often approximately equal to 4σ, so one-fourth of the range can be used as an approximate value of σ:

$$\sigma \approx \frac{\text{range}}{4} = \frac{100}{4} = 25$$

Substituting into the formula above, with Z rounded to 2:

$$n_0 = \frac{Z^2\sigma^2}{B^2} \approx \frac{2^2(25^2)}{3^2} = 277.78$$

$$n = \frac{1}{\dfrac{1}{n_0} + \dfrac{1}{N}} = \frac{1}{\dfrac{1}{277.78} + \dfrac{1}{1000}} = \frac{1}{0.0036 + 0.0010} \approx 217.4$$

Therefore, about 218 sample units are needed to estimate μ with a bound on the error of estimation of B = $3.
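The same calculation can be wrapped in a short function for reuse with other values of σ, B, Z, and N (a sketch; the function name is illustrative):

```python
import math

def sample_size_for_mean(sigma, B, N, Z=1.96):
    """Sample units needed to estimate a population mean to within +/- B,
    with confidence governed by Z, from a population of N units."""
    n0 = (Z ** 2) * (sigma ** 2) / (B ** 2)   # sample size ignoring the finite population
    n = 1 / (1 / n0 + 1 / N)                  # adjusted with the finite population correction
    return math.ceil(n)                       # round up to be conservative

# Hospital accounts example: sigma ~ range/4 = 25, B = $3, N = 1000
print(sample_size_for_mean(sigma=25, B=3, N=1000, Z=2))  # -> 218
```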

To establish the number of sample units to measure to estimate the population total τ at a desired level of precision B with simple random sampling, we set

$$B = Z\sqrt{N(N-n)\frac{\sigma^2}{n}}$$

and solve for n:

$$n = \frac{1}{\dfrac{1}{n_0} + \dfrac{1}{N}}, \qquad n_0 = \frac{N^2 Z^2\sigma^2}{B^2}.$$

And as with establishing n for the population mean, if N is large relative to n, the finite population correction factor can be ignored and the formula for sample size reduces to n0.

Example: What sample size is necessary to estimate the caribou population we examined earlier to within B = 2000 animals of the true total with 90% confidence (α = 0.10)?

Using s² = 919 from earlier and Z = 1.645, which is the upper α/2 = 0.10/2 = 0.05 point of the standard normal distribution:

$$n_0 = \frac{286^2(1.645^2)(919)}{2000^2} \approx 51.$$

To adjust for the size of the finite population:

$$n = \frac{1}{\dfrac{1}{51} + \dfrac{1}{286}} \approx 44.$$
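The analogous calculation for a population total, sketched the same way and checked against the caribou numbers above (function name is illustrative):

```python
import math

def sample_size_for_total(s2, B, N, Z=1.96):
    """Sample units needed to estimate a population total to within +/- B,
    with confidence governed by Z, from a population of N sampling units."""
    n0 = (N ** 2) * (Z ** 2) * s2 / (B ** 2)   # sample size ignoring the finite population
    n = 1 / (1 / n0 + 1 / N)                   # adjusted with the finite population correction
    return math.ceil(n)

# Caribou example: s^2 = 919, B = 2000 animals, N = 286 plots, 90% confidence
print(sample_size_for_total(s2=919, B=2000, N=286, Z=1.645))  # -> 44
```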

Systematic Sampling
Occasionally, selecting sample units at random can introduce logistical challenges that preclude
collecting data efficiently. If we suspect that the chances of introducing a bias are low or if ideal
dispersion of sample units throughout the population is a higher priority than minimizing potential
biases, then it might be most appropriate to choose samples non-randomly. As in simple random
sampling, systematic sampling is a type of probability sampling where each element in the
population has a known and equal probability of being selected. The probabilistic framework is
maintained through the selection of one or more random starting points. Although sometimes more convenient, systematic sampling provides less protection against introducing biases into the sample than simple random sampling does.

Estimators for systematic sampling and simple random sampling are identical; only the method of
sample selection differs. Therefore, systematic sampling is used most often to simplify the
process of selecting a sample or to ensure ideal dispersion of sample units throughout the
population.

Advantages:

• Easy to implement
• Maximum dispersion of sample units throughout the population
• Requires minimum knowledge of the population

Disadvantages:

• Less protection from possible biases


• Can be imprecise and inefficient relative to other designs if the population being sampled
is heterogeneous

How it is implemented:

• Choose a starting point at random


• Select samples at uniform intervals thereafter

1-in-k systematic sample

Most commonly, a systematic sample is obtained by randomly selecting 1 unit from the first k units in the population and every kth element thereafter. This approach is called a 1-in-k systematic sample with a random start. To choose k so that a sample of appropriate size is selected, calculate:

k = number of units in the population / number of sample units required

For example, if we plan to choose 40 plots from a field of 400 plots, k = 400/40 = 10, so this design would be a 1-in-10 systematic sample. The example in the figure is a 1-in-8 sample drawn from a population of N = 300; this yields n = 28. Note that the sample size drawn will vary and depends on the location of the first unit drawn.
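A minimal sketch of drawing a 1-in-k systematic sample with a random start (again with illustrative names):

```python
import random

def systematic_sample(N, k, seed=None):
    """Select a 1-in-k systematic sample from units numbered 1..N,
    starting at a unit chosen at random from the first k."""
    rng = random.Random(seed)
    start = rng.randint(1, k)               # random starting point among units 1..k
    return list(range(start, N + 1, k))     # every kth unit thereafter

# Example: a 1-in-10 sample from a field of N = 400 plots yields 40 plots
sample = systematic_sample(N=400, k=10, seed=1)
print(len(sample), sample[:5])
```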

Estimating the Population Mean

The population mean (μ) is estimated with:

$$\hat{\mu} = \frac{\sum_{i=1}^{n} y_i}{n}$$

The population variance (σ²) is estimated with:

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

Variance of the estimate μ̂ is:

$$\widehat{\mathrm{var}}(\hat{\mu}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n}$$

Standard error of μ̂ is:

$$\mathrm{SE}(\hat{\mu}) = \sqrt{\left(\frac{N-n}{N}\right)\frac{s^2}{n}}$$

Estimating the Population Total

The population total τ is estimated with:

$$\hat{\tau} = N\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} y_i$$

Variance of the estimate τ̂ is:

$$\widehat{\mathrm{var}}(\hat{\tau}) = N^2\,\widehat{\mathrm{var}}(\hat{\mu}) = N^2\left(\frac{s^2}{n}\right)\left(\frac{N-n}{N}\right)$$

Standard error of τ̂ is:

$$\mathrm{SE}(\hat{\tau}) = \sqrt{N^2\left(\frac{s^2}{n}\right)\left(\frac{N-n}{N}\right)}$$

Estimating the Population Proportion

The population proportion (p) is estimated with the sample proportion (p̂), which has an unbiased estimator:

$$\hat{p} = \hat{\mu} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Because we are estimating a dichotomous proportion, the population variance σ² is again computed from the variance of a binomial, which is the proportion of the population with the trait (p) times the proportion without that trait (1 − p), or p(1 − p). The estimate of the population variance s² is p̂(1 − p̂).

Variance of the estimate p̂ is:

$$\widehat{\mathrm{var}}(\hat{p}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n-1} = \left(\frac{N-n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1}$$

Examples. (to be added)
