
Chapter 3 Sampling Methods

The document discusses sampling methods for research. It describes how sampling is necessary when studying large populations due to constraints of time, money and resources. The key steps in sampling design are: defining the population, creating a sampling frame, determining the sampling unit, deciding sample size, identifying parameters of interest, considering budget constraints, and choosing a sampling procedure. Obtaining a good sample requires using comprehensive population listings, contacting community leaders for current lists, doing a quick census of small populations, or estimating the population size.

A good sample significantly tells the fortune or misfortune of the
population, but a bad sample abuses the population.

- Abdi-Khalil Edriss -

SAMPLING METHODS
I. Sampling
It is important to use appropriate sampling methods in order to be able to make proper
inferences about a population, and draw conclusions that are valid on the information
of interest. The sample must be selected to represent the entire population through
proper sampling techniques or methods.

All items in any field of inquiry constitute a 'universe' or a 'population'. A complete
enumeration of all items in the population is known as a census inquiry. In such an
inquiry, when all items are covered, no element of chance is left and the highest accuracy
is obtained. This type of inquiry, however, involves a great deal of time, money and
energy. Therefore, when the field of inquiry is large, this method becomes difficult to
adopt because of the resources involved. At times, this method is beyond the reach of
ordinary researchers. Government is often the only institution which can
get a complete enumeration carried out, and even then only rarely. For example, the population census in most
countries is carried out once in a decade.

Undertaking a census survey is often not possible. It is frequently possible, however,
to obtain sufficiently accurate results by studying only a part of the total population,
technically called a sample. The process of selecting a sample is called a sampling
technique. The sample selected should be as representative of
the total population as possible in order to produce a miniature cross-section.

A researcher must prepare a sample design for the study. The researcher must plan
how the sample should be selected and what size it should be. The sample
design is determined before data are collected, and it is the core of any research
undertaken.

3.1. What is sampling?

Sampling is the process of selecting a representative number of elements
from a population.

Sampling can be accomplished by identifying –

i. The group of people or study population from which we draw a sample;
ii. How the elements of this population should be selected;
iii. The number of objects or people in the sample; and
iv. The acceptable sampling error (that is, the difference
between the sample estimate and the actual population parameter, which
we seek to minimize). In a nutshell, these steps help us to obtain unbiased
estimates from a sample. Note that sampling error will always occur when a sample,
and not the whole population, is surveyed.

POINTS TO PONDER
The relationship between a sample and a population is established through the sample
statistic and the population parameter. We can be reasonably sure that the sample
statistic (e.g., $\bar{y}$, $S$, or $S^2$) is fairly close to the population parameter
(i.e., $\mu$, $\sigma$, or $\sigma^2$) if the sample elements are representative of the
population elements and if the analysis is done properly. Thus, by studying and
understanding the properties (or characteristics) of the sample elements, it would be
possible to generalize the estimated characteristics to the population elements.

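[Figure: sampling links the population to the sample. The population parameters
($\mu$, $\sigma$, or $\sigma^2$) are estimated by the sample estimates ($\bar{y}$, $S$, or $S^2$).]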

3.2. Why take a sample?

Ideally, one would measure indicators that are feasible, quantifiable,
acceptable, and sensitive in the whole population. Usually, however,
financial, logistical and geographical limitations mean that only small portions (or
samples) of the population can be studied. Therefore,
we take a statistically adequate sample (of proper size) that can
reflect most of the important characteristics of the population.
o There could be resource (time, finance, manpower, etc.) limitations
which would make it difficult to study the whole population.
o In some cases, tests may be destructive. For example, when we test
the breaking strength of materials, we must destroy them. A census
would mean complete destruction of materials. In such a case, we
must sample.
o Sampling provides much quicker results than does a census. When
the time between recognizing the need for information and the
required availability of that information is short, sampling helps to deliver
the information in time.
o Sampling is the only process possible if the population is infinite.
o There is also an argument that the quality of a study is often better
with sampling than with a census. The basis of the argument is that
sampling allows better interviewing, more
thorough investigation of missing, wrong, or suspicious information,
better supervision, and better processing than is possible with
complete coverage.

3.3. What are the steps in sampling design?


There are a number of steps in developing a sample design. The most important
points or procedures are as follows.

o Type of universe or population: The first step in developing any sample
design is to define the universe or the population. The universe or the
population can be finite or infinite. In a finite universe, the number of items
or elements is certain. Examples are the population of a city, the
number of workers in a factory, etc. In an infinite universe, the
number of items is infinite or cannot practically be counted. Examples are
listeners of a specific radio program, TV watchers, stars in the
universe, bacteria, etc.

o Source list or frame: This is also known as the sampling frame, from which the
sample is to be drawn. It contains the names of all items of a universe (for a
finite universe). A source list should be comprehensive, correct, reliable
and appropriate. It is extremely important for the source list to be as
representative of the population as possible. An example is
a comprehensive hierarchical list such as region-district-village-household.

o Sampling unit: A decision has to be taken concerning the sampling unit
before selecting the sample. A sampling unit may be a geographical unit such as a
region, province, district, village, etc., or a social unit such as a family,
household, school, etc., or it may be an individual.

o Size of sample: This refers to the number of items to be selected from the
universe or population to constitute a sample. The size of the sample should
be neither excessively large nor too small. It should be optimum. In order
to decide on the size of the sample to be selected, a researcher must take
into consideration the size of the population variance, the size of the population,
the parameter of interest in the research study, and the budgetary constraint.
Refer to Chapter 4 (Sample Size Determinations) for a detailed discussion of
what makes a sample size representative and sufficient.

o Parameters of interest: In determining the sample design, one must
consider the question of the specific population parameters which are of
interest. For example – population mean, variance, standard deviation,
proportion, etc.

o Budgetary constraint: In practice, costs have a major impact upon
decisions relating not only to the size of the sample but also to the type of
sample. This fact can even lead to non-probability samples. However,
using Neyman or probability allocation techniques, cost and sample size can be
balanced without compromising much of the information
coming out of the research. For details, refer to the Sample Size
Determinations chapter.

o Sampling procedure: Finally, the researcher must decide the type of
sample that can be properly applied. The researcher must decide on the
technique to be used in selecting the items for the sample based on the
population or the universe being used. There are several sample designs to
choose from. For a given sample size and a
given cost, the researcher should choose the design that results in the smallest
possible sampling error. Of course, the correct sampling method or methods must be
applied. These are discussed further in subsequent sections.

3.4. How do we get a good sample?

To obtain a good sample, we may jump start by –

o Using listings of larger population units such as states, provinces,
towns, districts, villages or census enumeration areas.
o Contacting community leaders, religious and political organizations for
current lists.
o Carrying out a quick census if the population under study is small.
o If these are not possible, estimate the population or at least the number
of households in each community and employ the suggested sampling
techniques.

POINTS TO PONDER
Make sure that all of the sampling units are listed in the sampling frame. If you refer to
several dated census listings, ensure that they are from approximately the same year.

II. Most Frequently used Sampling Methods


The most frequently used sampling methods are known as (1) random or probability
sampling methods, and (2) non-random or non-probability sampling methods.

3.5. What is a random or probability sampling method?


A random sampling method is a sampling technique that gives every element in a
population a known, non-zero chance of selection (an equal chance, in the simplest
designs) when drawing samples. Most scientific research uses random sampling methods.

3.6. What are the major probability sampling methods or techniques
in research methods?

The major random sampling techniques are –


i. Simple Random Sampling,
ii. Stratified Sampling,
iii. Systematic Sampling, and
iv. Cluster Sampling

A combination of these methods, usually referred to as the multistage
sampling method, is often used in surveys.

3.7. What are the main non-random sampling methods?


The major non-random sampling methods are –
i. Quota sampling
ii. Convenience sampling
iii. Purposive sampling or Judgment sampling
III. Simple Random Sampling Method

3.8. What is a Simple Random Sampling (SRS) method?

It is a method of selecting n units out of N population units, giving each
element (or item/person) an equal chance of being chosen. In practice, a
simple random sample is drawn unit by unit, with or without replacement,
from a population. For example, we could write the names or identification
numbers of all the communities, households or individuals on pieces of
paper, and randomly select the desired sample size by picking the required
number of papers.

However, when selecting a large number of sample elements, we nowadays use a table of
random numbers or computer-generated random numbers or samples
(for example, consult the Excel application software).

3.9. How do we draw a simple random sample?

Sampling units on an adequate frame must be numbered or identified so
that a randomization device, such as a random number table, can be used
to select specific units for the sample. Alternatively, draw numbers out of a hat or a
container of thoroughly mixed pieces of paper, or use a table of
random numbers or computer-generated random numbers (or samples).
(For example, consult the Excel application software or a statistical
package such as SPSS or STATA. Some sophisticated hand
calculators can generate random numbers as well.)
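
As an illustration of computer-generated sampling, the following is a minimal Python sketch, not a prescribed procedure. It assumes the frame is simply the unit numbers 1 to N; the function name `simple_random_sample` and the `seed` argument are hypothetical names introduced here for illustration. The population of 350 and sample of 40 anticipate the example used later in this chapter.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    # random.sample draws without replacement, so no unit is picked twice
    # and every unit has the same chance of being included.
    units = range(1, population_size + 1)
    return sorted(rng.sample(units, sample_size))


chosen = simple_random_sample(population_size=350, sample_size=40, seed=1)
print(chosen)
```

Sampling with replacement could instead be sketched with `rng.choices(units, k=sample_size)`, in which case the same unit may appear more than once.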

Formula from Excel


Excel > Function reference > Math and trigonometry
RAND

Returns an evenly distributed random real number greater than or equal to 0
and less than 1. A new random real number is returned every time the
worksheet is calculated.

To generate a random real number between 'a' and 'b', use:

RAND()*(b-a)+a
This formula can generate as many random numbers as you want. However,
one may alternatively opt to use a table of random numbers, such as the one given here
as an example.

Table 3.1: Table of Random Numbers


51772 74640 42331 29044 46621 62898 93582 04186 19640 87056
24033 23491 83587 06568 21960 21387 76105 10863 97453 90581
45939 60173 52078 25424 11645 55870 56974 37428 93507 94271
30586 02133 75797 45406 31041 86707 12973 17169 88116 42187
03585 79353 81938 82322 96799 85659 36081 50884 14070 74950
64937 03355 95863 20790 65304 55189 00745 65253 11822 15804
15630 64759 51135 98527 62586 41889 25439 88036 24034 67283
09448 56301 57683 30277 94623 85418 68829 06652 41982 49159
21631 91157 77331 60710 52290 16835 48653 71590 16159 14676
91097 17480 29414 06829 87843 28195 27279 47152 35683 47280
50532 25496 95652 42457 73547 76552 50020 24819 52984 76168
07136 40887 79971 54195 25708 51817 36732 72484 94923 75936
27989 64728 10744 08396 56242 90985 28868 99431 50995 20507
85184 73949 36601 46253 00477 25234 09908 36574 72139 70185
54398 21154 97810 36764 32869 11785 55261 59009 38714 38723
65544 34371 09591 07839 58892 92843 72828 91341 84821 63886
08263 65952 85762 64236 39238 18776 84303 99247 46149 03229
39817 67906 48236 16057 81812 15815 63700 85915 19219 45943
62257 04077 79443 95203 02479 30763 92486 54083 23631 05825
53298 90276 62545 21944 16530 03878 07516 95715 02526 33537

3.10. How do we use table of random numbers?

Suppose that we want to select a simple random sample of 40 people (or
other units) out of a population totaling 350.

i. To begin, number the members of the population: in this case from 1 to
350. Now the problem is to select 40 random numbers. Once we have
done so, our sample will consist of the people having the numbers we
have selected. (Note: it is not essential to actually number them, as
long as we are sure of the total. If we have them in a list, for example,
we can always count through the list after we have selected the
numbers.)

ii. The next step is to determine the number of digits we will need in the
random numbers we select. In our example, there are 350 members of
the population, so we will need 3-digit numbers to give everyone an equal
chance of being selected. (If there were 17480 members of the
population, we would need to select 5-digit numbers.) Thus, we want
to select 40 random numbers in the range from 001 to 350.

iii. Now turn to Table 3.1. Notice that there are several rows and columns of
5-digit numbers. The table represents a series of random numbers in
the range from 00000 to 99999. To use the table for our hypothetical
sample, we have to answer these questions:

o How will we create 3-digit numbers out of 5-digit numbers?
o What pattern will we follow in moving through the table to
select our numbers?
o Where will we start?

Each question has several satisfactory answers. The key is to create
a plan and follow it. Here is an example.

iv. To create 3-digit numbers from 5-digit numbers, let's agree to
select 5-digit numbers from the table but consider only the left-most
3 digits in each case.

v. We can also choose to progress through the table any way we want:
down the columns, up, across to the right or to the left, or
diagonally. Again, any of these plans will work just fine so long as
we stick to it. For convenience, let's agree to move down the
columns; when we reach the bottom of one column, we will go to the
top of the next; and so forth.

vi. Now, where do we start? You can close your eyes and point
anywhere in the table (Table 3.1). (I know it does not sound
scientific, but it works.) Say, I will pick the number in the 3rd row
of column 2. Start with that number.

vii. Let's suppose we decide to start with the 3rd number in column 2
(Table 3.1). You will see that the starting number is 21960. We
have selected 219 as our first random number, and we have 39
more to go. Moving down the second column, we select 254, 004,
095, and continue to the top of column 3: 234, 116, 033, 156, 302,
174, 107, 328, 078, and so on.

Note that any time you come across a number that lies outside your
range, skip it and continue on your way. The same applies if the
same number comes up more than once.

viii. That is the way to go. Keep up the procedure until you have
selected 40 random numbers. Returning to your list, your sample
consists of the units numbered 219, 254, 004, and so forth.
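
The plan in steps iv to viii (left-most 3 digits, move down the columns, skip out-of-range and repeated numbers) can be sketched in Python as follows. This is only an illustrative sketch: the function name `pick_from_table` and its parameters are hypothetical, and the small `table` below is just the first five rows of the first two columns of Table 3.1 as printed above, so the numbers it selects depend on that layout and starting point.

```python
def pick_from_table(table, n_needed, pop_size, start_row=0, start_col=0, digits=3):
    """Select unit numbers from a random number table, moving down columns.

    `table` is a list of rows, each row a list of digit strings.
    """
    n_rows, n_cols = len(table), len(table[0])
    chosen = []
    for col in range(start_col, n_cols):
        # Only the starting column begins part-way down; later columns
        # are read from the top.
        first_row = start_row if col == start_col else 0
        for row in range(first_row, n_rows):
            value = int(table[row][col][:digits])  # left-most digits only
            # Skip numbers outside 1..pop_size and numbers already chosen.
            if 1 <= value <= pop_size and value not in chosen:
                chosen.append(value)
                if len(chosen) == n_needed:
                    return chosen
    return chosen  # table exhausted before n_needed numbers were found


# First five rows of columns 1 and 2 of Table 3.1 (as printed above).
table = [
    ["51772", "74640"],
    ["24033", "23491"],
    ["45939", "60173"],
    ["30586", "02133"],
    ["03585", "79353"],
]

picked = pick_from_table(table, n_needed=4, pop_size=350)
print(picked)  # [240, 305, 35, 234]
```

Here 517 and 459 are skipped because they exceed 350, and reading continues at the top of the second column once the first is exhausted.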

POINTS TO PONDER
In simple random sampling, the selection of an individual or element is independent
of the selection of another individual or element. Under this sampling design, every
item of the population has an equal chance of inclusion in the sample. It is blind
chance alone that determines whether one item or the other is selected. Random
sampling is considered the best technique for selecting a representative sample. The
problem with this method is that it is time consuming. Always seek to minimize sampling
error, standard error and all sorts of statistical errors.

3.11. What are the main estimates of a sample?

Although sampling is undertaken for many purposes, interest centers most
frequently on four characteristics –
i. Mean,
ii. Variance,
iii. Proportion, and
iv. Confidence intervals of a sample.

3.12. How do we compute some of the characteristics of a Simple
Random Sample mentioned previously?

By way of numerical examples, here are the computations for the main
characteristics of a simple random sample.

NUMERICAL EXAMPLE
From a large class, a random sample of 10 grades was drawn: 60, 52, 95, 80, 54, 48,
75, 91, 40 and 85. Calculate a 95% confidence interval for the mean grade of the whole class.

The mean of a simple random sample is given as -

$$\bar{y} = \frac{\sum y_i}{n} = \frac{60 + 52 + 95 + 80 + 54 + 48 + 75 + 91 + 40 + 85}{10} = \frac{680}{10} = 68$$

POINTS TO PONDER - This is a sample mean that refers to the population mean
(or average) denoted by $\mu$. That is, $\bar{y}$ is an estimate for $\mu$.

The variance of a simple random sample is given as -

$$S^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}$$

$$= \frac{(60-68)^2 + (52-68)^2 + (95-68)^2 + (80-68)^2 + (54-68)^2 + (48-68)^2 + (75-68)^2 + (91-68)^2 + (40-68)^2 + (85-68)^2}{10 - 1}$$

$$= \frac{64 + 256 + 729 + 144 + 196 + 400 + 49 + 529 + 784 + 289}{9} = \frac{3440}{9} = 382.22$$

POINTS TO PONDER
The sample variance $S^2$ is an estimate for the population (class) variance denoted
by $\sigma^2$.

The standard deviation of a simple random sample -

Another characteristic of interest, associated with the sample variance,
is the sample standard deviation, which is calculated from the sample variance
as

$$S = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}} = \sqrt{S^2} = \sqrt{382.22} = 19.55$$

The confidence interval of a simple random sample -

To confidently (or reasonably) claim that the population mean (which is $\mu$) is
represented by the estimated sample mean (which is $\bar{y}$), we must construct an
interval estimate or confidence interval (usually denoted as CI) of the form -

$$\mu = \bar{y} \pm \text{sampling error} = \bar{y} \pm t_{0.025,\,9} \frac{S}{\sqrt{n}} = 68 \pm 2.262 \left( \frac{19.55}{\sqrt{10}} \right) = 68 \pm 13.98$$

where t refers to Student's t distribution, and for a 95% confidence interval with 9
degrees of freedom (n – 1 = 10 – 1 = 9), the t-value is 2.262 (refer to any
statistics book with a t-table).

Thus, with 95% confidence, we can conclude that the mean grade of the whole
class is between 54.02 and 81.98 (lower and upper bounds of the confidence
interval).
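
The same calculations can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the t-value of 2.262 is taken from the text above rather than computed, and the variable names are illustrative.

```python
import math
import statistics

grades = [60, 52, 95, 80, 54, 48, 75, 91, 40, 85]
n = len(grades)

mean = statistics.mean(grades)          # sample mean, y-bar
variance = statistics.variance(grades)  # sample variance, divisor n - 1
std_dev = math.sqrt(variance)           # sample standard deviation, S

t_value = 2.262  # t(0.025, 9 d.f.) read from a t-table, as quoted in the text
margin = t_value * std_dev / math.sqrt(n)  # the "sampling error" term
lower, upper = mean - margin, mean + margin

print(f"mean = {mean}, variance = {variance:.2f}, S = {std_dev:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Note that `statistics.variance` uses the n – 1 divisor of the sample variance formula, whereas `statistics.pvariance` would divide by n.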

IV. Stratified Sampling Method

3.13. What is a Stratified Sampling (SS) method?

It is a method of selecting n units from a population that has been divided into
sub-populations called strata. To obtain the sample, first divide the population
of N units into k non-overlapping sub-populations (or strata) of sizes N1, N2, ..., Nk,
and then select ni units from each stratum, so that the ni add up to n.
Here is an example.

Figure 3.1: Non-overlapping stratified regions in a country

[Map of a region divided into five districts: A (size N1), B (size N2), C (size N3),
D (size N4) and E (size N5).]

Referring to Figure 3.1, suppose we divide a region into five districts (that is, strata),
A, B, C, D and E based on some geographic characteristics. And, the corresponding
size of the non-overlapping sub-populations (or strata) are given as –

N1 = 200, N2 = 300, N3 = 500, N4 = 600 and N5 = 400

Note that $N = \sum N_i = 200 + 300 + 500 + 600 + 400 = 2000$. Suppose we want to
take a sample of size n = 100 out of the entire population, N = 2000; then from each stratum
we may sample in proportion to its size –

$$n_1 = n \cdot \frac{N_1}{N} = 100 \times \frac{200}{2000} = 10 \quad \text{from A (i.e., out of } N_1 = 200\text{)}$$

and, similarly, n2 = 15 from B (out of N2), n3 = 25 from C (out of N3), n4 = 30 from D (out of
N4) and n5 = 20 from E (out of N5).

Therefore, n = n1 + n2 + n3 + n4 + n5 = 100
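
The proportional allocation above can be sketched in Python as follows. The function name `proportional_allocation` is a hypothetical helper introduced for illustration; the stratum sizes are those of the example.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of n to strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())  # N, the whole population
    return {name: n * size / total for name, size in stratum_sizes.items()}


strata = {"A": 200, "B": 300, "C": 500, "D": 600, "E": 400}
allocation = proportional_allocation(strata, n=100)
print(allocation)  # {'A': 10.0, 'B': 15.0, 'C': 25.0, 'D': 30.0, 'E': 20.0}
```

In this example every allocation is a whole number; in general the results would need rounding, and the rounded values should be checked to make sure they still add up to n.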

However, where strata differ not only in size but also in variability, it is
reasonable to take larger samples from the more variable strata and smaller samples
from the less variable strata. We can then account for both (differences in stratum size
and differences in stratum variability) with a disproportionate sampling design,
using the formula:

$$n_i = \frac{n \, N_i \, \sigma_i}{N_1 \sigma_1 + N_2 \sigma_2 + \dots + N_k \sigma_k}$$

where $\sigma_i$ denotes the standard deviation of the ith stratum, $N_i$ denotes the size of the
ith stratum, and $n_i$ denotes the sample size of the ith stratum.

Using the previous example, assume a population is divided into five strata so that N1
= 200, N2 = 300, N3 = 500, N4 = 600 and N5 = 400. The respective standard deviations are
$\sigma_1 = 5$, $\sigma_2 = 7$, $\sigma_3 = 10$, $\sigma_4 = 15$, $\sigma_5 = 9$. How should a sample of size n = 100 be allocated
to the five strata, considering optimum allocation?

Applying disproportionate sampling design, the optimum sample size for each stratum
is calculated as follows –

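$$n_1 = \frac{100(200)(5)}{(200)(5) + (300)(7) + (500)(10) + (600)(15) + (400)(9)} = \frac{100000}{20700} = 4.83 \approx 5$$

$$n_2 = \frac{100(300)(7)}{(200)(5) + (300)(7) + (500)(10) + (600)(15) + (400)(9)} = \frac{210000}{20700} = 10.14 \approx 10$$

$$n_3 = \frac{100(500)(10)}{(200)(5) + (300)(7) + (500)(10) + (600)(15) + (400)(9)} = \frac{500000}{20700} = 24.15 \approx 24$$

$$n_4 = \frac{100(600)(15)}{(200)(5) + (300)(7) + (500)(10) + (600)(15) + (400)(9)} = \frac{900000}{20700} = 43.48 \approx 44$$

$$n_5 = \frac{100(400)(9)}{(200)(5) + (300)(7) + (500)(10) + (600)(15) + (400)(9)} = \frac{360000}{20700} = 17.39 \approx 17$$

Note that n4 is rounded up to 44 so that the five allocations add up to n = 100.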
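
The optimum allocation can be sketched in Python as follows. The allocation formula is the one given above; the rounding step is an assumption made here for illustration (a largest-remainder rule that keeps the total equal to n), and the function name `optimum_allocation` is hypothetical.

```python
import math


def optimum_allocation(sizes, std_devs, n):
    """Allocate n in proportion to N_i * sigma_i for each stratum.

    Returns the exact (fractional) allocations and whole-number
    allocations that add up to n.
    """
    weights = [N_i * s_i for N_i, s_i in zip(sizes, std_devs)]
    total = sum(weights)  # N_1*sigma_1 + ... + N_k*sigma_k
    exact = [n * w / total for w in weights]

    # Assumed rounding rule (largest remainder): round everything down,
    # then give the leftover units to the largest fractional parts.
    rounded = [math.floor(x) for x in exact]
    leftover = n - sum(rounded)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - rounded[i], reverse=True)
    for i in order[:leftover]:
        rounded[i] += 1
    return exact, rounded


sizes = [200, 300, 500, 600, 400]
std_devs = [5, 7, 10, 15, 9]
exact, rounded = optimum_allocation(sizes, std_devs, n=100)
print([round(x, 2) for x in exact])
print(rounded, sum(rounded))
```

Other rounding conventions are possible; whichever is used, the final stratum sample sizes should add up to the intended total n.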

POINTS TO PONDER
(1) In stratified sampling, each stratum is homogeneous internally and heterogeneous
with other strata. (2) The more strata used, the closer you come to maximizing inter-
strata differences and minimizing intra-stratum variances.

3.14. How to form strata?


The strata are formed on the basis of common characteristic(s) of the items to
be put in each stratum. This means that the strata should be formed in such a way
as to ensure that elements are most homogeneous within each stratum and most
heterogeneous between the different strata. Strata are purposively formed,
usually based on the past experience and personal judgment of the researcher.
Frequently used strata include urban vs. rural areas, and high-density vs.
medium-density vs. low-density areas.

3.15. How do we draw a stratified sample?

As shown earlier, we can draw a stratified random sample by separating
the population elements into non-overlapping groups, called strata, and
then selecting subjects or elements from each stratum by using a simple
random method or systematic sampling method.

For example, we may want to measure indicators for special groups (sub-
populations) within our study area such as urban and rural, female-headed and
male-headed households, ethnic groups, religious groups, etc. To ensure that
such special groups are adequately represented in our sample, we create a
stratum for each group and select units from each stratum separately.

POINTS TO PONDER
In practice, we first make inferences (deal with characteristics) about each stratum,
and finally about all the strata combined.

3.16. When do we use stratified sampling methods?

When a population has relatively large variability between strata, stratified
sampling is much more efficient than simple random sampling. If
stratification is not performed, large groups (such as regions, states,
provinces, etc.) will have more sampled units than the small groups. To achieve
equal precision for each group, we select the same number of units, say
number of households, in each sub-group, and later weight the results
according to the actual proportion of the national population in each group
(states, provinces, regions, etc.).

POINTS TO PONDER
The weighting of results is needed because of unequal sample sizes (small and large
groups) as strata (or groups) are formed.

3.17. How do we perform the weighting of results that are obtained
due to unequal sample size in each stratum?

Frequently the sample size in each stratum is not proportional to the actual
population. This imbalance must be corrected by weighting when the data set is
analyzed. For example, if urban dwellers constitute 10%
of a country's population but our stratification procedure results in a
sample with 20% belonging to that category, the sample data will have to
be weighted to produce national results. Another remedial procedure is to
use the Probability Proportional to Size Sampling (PPS) or Proportional
Stratification (PS) method, which we will discuss later.
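
To make the weighting idea concrete, here is a minimal Python sketch. It assumes two strata, urban and rural, with the 10% and 20% urban shares from the example above; the rural shares follow by subtraction, and the stratum means are hypothetical values chosen purely for illustration. Each stratum's weight is taken as its population share divided by its sample share.

```python
# Shares from the example: urban is 10% of the population but 20% of the sample.
population_share = {"urban": 0.10, "rural": 0.90}
sample_share = {"urban": 0.20, "rural": 0.80}

# Hypothetical stratum means of some indicator (illustrative values only).
stratum_mean = {"urban": 80.0, "rural": 50.0}

# Weight = population share / sample share, so over-sampled groups count less.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

unweighted = sum(sample_share[g] * stratum_mean[g] for g in sample_share)
weighted = sum(sample_share[g] * weights[g] * stratum_mean[g] for g in sample_share)

print(weights)
print(f"unweighted mean = {unweighted:.1f}, weighted mean = {weighted:.1f}")
```

With these illustrative values the unweighted mean leans toward the over-sampled urban group, while the weighted mean restores each group to its share of the national population.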

Practical advantages of Stratified Sampling

3.18. What are some of the advantages of stratified sampling?

First, the variance of the estimator of the population mean is usually reduced
because the variance of the observations within each stratum is usually
smaller than that of the overall population. In other words, it is possible to extract
more precise estimates from a homogeneous sub-population (in which measurements vary
little) than from a heterogeneous population (in which measurements vary a lot).
Second, the cost of collecting and analyzing the data is often reduced by
the separation of a large population into smaller strata (administrative
convenience).
Third, separate estimates can be obtained for each individual stratum
without selecting another sample and hence without additional cost.

POINTS TO PONDER
The main advantages of stratified sampling are: (i) For the same sample size, more
reliable information is obtained if the population is stratified than if the
population is sampled as a whole. (ii) Comparisons between strata are easy. This is so because a
separate but similar survey is done in each stratum.

Probability Proportional to Size Sampling (PPS)

3.19. What is special about Probability Proportional to Size Sampling (PPS)
or Proportional Stratification?

PPS takes into consideration the size of each stratum. The PPS technique
automatically removes the imbalances of sample size in stratified sampling.
It is also known as Proportional Stratification.

3.20. When do we apply PPS method?

PPS is applied when some strata are considerably larger than others, that is, when the
number of basic sampling units (say, households) varies from stratum to stratum. For
example, the regions of a country may vary considerably in population density (some
highly populated while others are sparsely populated).

If SRS or Systematic Sampling is used for sampling villages, both large
and small communities will have the same probability of being included,
which is incorrect. One way of compensating for these differences is to
choose elements proportional to the sizes of the strata using the PPS method.

POINTS TO PONDER
The PPS method ensures that communities with larger populations have a proportionately
greater chance of containing a selected cluster than small communities. This type of
sample is self-weighting, which simplifies the analysis and improves the
representativeness of the sample.

3.21. How do we perform PPS or select a sample using Proportional
Stratification?¹

To sample communities with the PPS method, let's explain the techniques or
procedures via a numerical example as follows.

Step 1
o Assign a number to each community or village
o List the population size of each community or village
o List the cumulative population of each community, that is, the sum of
the population of that community plus the populations of all the
communities above it.

¹ Refer to the end of this chapter for an additional PPS example, courtesy of the National Statistics Office (NSO),
Malawi. This real application shows the combination of SRS, SS and PPS that was used by NSO
for National Sample Survey of Agriculture (NSSA) data collection.

Step 2
o Suppose the population comprises 10 villages and the population sizes
are as shown in Table 3.2.

Table 3.2: Cumulative population

Village Population size Cumulative


1 500 500
2 400 900
3 600 1500
4 1000 2500
5 1400 3900
6 1700 5600
7 400 6000
8 800 6800
9 550 7350
10 1100 8450

Step 3
o Calculate the sampling interval: divide the total cumulative population
by the number of clusters (or strata) required.

Now, suppose we need to select 5 villages (say, clusters) from a total population
of 8450. This gives

Sampling interval = 8450/5 = 1690

Step 4
o Select a random number that is less than or equal to the sampling
interval. That is, choose a number between 1 and 1690 by using the table of
random numbers. Say, the chosen number is 1410.

Step 5
o Look at the cumulative village table and locate the first cluster (or stratum)
by finding the first village whose cumulative population equals or exceeds this
random number. In our example, the first cluster would be located in
village 3, whose cumulative population is 1500.

Step 6
o Add the sampling interval to the random number:
1690 + 1410 = 3100

Step 7
o Choose the community whose cumulative population exceeds 3100.
Thus, the second cluster will be located in village 5.

Step 8
o Identify the location of each subsequent cluster by adding the sampling
interval to the number that located the previous cluster (say, 3100 + 1690,
4790 + 1690, etc.). We stop when we have located as many clusters as
we need.
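
Steps 1 to 8 can be sketched in Python as follows. This is an illustrative sketch only: the function name `pps_systematic` is hypothetical, the village populations are those of Table 3.2, and the random start of 1410 is the one chosen in Step 4 (in real use it would be drawn at random between 1 and the sampling interval).

```python
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(populations, n_clusters, random_start):
    """Locate clusters with probability proportional to size.

    Returns the sampling interval and the (1-based) village numbers selected.
    """
    cumulative = list(accumulate(populations))  # Step 1: cumulative populations
    interval = cumulative[-1] / n_clusters      # Step 3: sampling interval
    selected = []
    for k in range(n_clusters):
        target = random_start + k * interval    # Steps 6 and 8: add the interval
        # Steps 5 and 7: first village whose cumulative population
        # equals or exceeds the target number.
        index = bisect_left(cumulative, target)
        selected.append(index + 1)
    return interval, selected


populations = [500, 400, 600, 1000, 1400, 1700, 400, 800, 550, 1100]
interval, villages = pps_systematic(populations, n_clusters=5, random_start=1410)
print(interval)  # 1690.0
print(villages)  # [3, 5, 6, 8, 10]
```

Note that a village larger than the sampling interval can be hit more than once, in which case it would contain more than one cluster.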

3.22. How do we calculate some frequently used characteristics of a
stratified sample?

Let's discuss these characteristics by working out some numerical
examples.

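Table 3.3: Important Formulas

Calculations of mean and sample variance of stratified sampling

          Population                                             Stratified Sampling
Stratum   Proportion of the    Mean                     Variance         Sample size in   Sample mean
          whole population                                               each stratum
1         $p_1$                $\mu_1$                  $\sigma_1^2$     $n_1$            $\bar{y}_1 = \sum_i y_{1i} / n_1$
2         $p_2$                $\mu_2$                  $\sigma_2^2$     $n_2$            $\bar{y}_2 = \sum_i y_{2i} / n_2$
.         .                    .                        .                .                .
j         $p_j$                $\mu_j$                  $\sigma_j^2$     $n_j$            $\bar{y}_j = \sum_i y_{ji} / n_j$
.         .                    .                        .                .                .
m         $p_m$                $\mu_m$                  $\sigma_m^2$     $n_m$            $\bar{y}_m = \sum_i y_{mi} / n_m$
Total     $\sum p_j = 1$       $\mu = \sum p_j \mu_j$                    $n = \sum n_j$   $\bar{y}_{ss} = \sum p_j \bar{y}_j$

The stratified sample mean $\bar{y}_{ss}$ estimates the population mean $\mu$.

Note that $p$ stands for proportion, $\mu$ is the population mean, $\sigma^2$ is the population variance, $n$
is the sample size, and $\bar{y}$ is the sample mean.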

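Furthermore, the variance of the stratified sample mean can be estimated from the
sample variances of the strata using the following formula.

$$\operatorname{var}(\bar{y}_{ss}) = p_1^2 \operatorname{var}(\bar{y}_1) + p_2^2 \operatorname{var}(\bar{y}_2) + p_3^2 \operatorname{var}(\bar{y}_3) + \dots + p_m^2 \operatorname{var}(\bar{y}_m)$$

$$= \frac{p_1^2 \sigma_1^2}{n_1} + \frac{p_2^2 \sigma_2^2}{n_2} + \frac{p_3^2 \sigma_3^2}{n_3} + \dots + \frac{p_m^2 \sigma_m^2}{n_m}$$

$$\approx \frac{p_1^2 s_1^2}{n_1} + \frac{p_2^2 s_2^2}{n_2} + \frac{p_3^2 s_3^2}{n_3} + \dots + \frac{p_m^2 s_m^2}{n_m}$$

In practice, since the population variance, $\sigma_j^2$, is unknown, we replace it by the
sample variance, $s_j^2$, in the formula as shown above.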

NUMERICAL EXAMPLE
Suppose a Rural Credit Institution took a simple random sample of 1100 of the
300,000 farmers to whom it lends agricultural credit. The amounts y that the farmers owed the
Rural Credit Institution were summarized, in Malawi Kwacha, by an average of $\bar{y}$ =
MK7500 and a standard deviation of s = MK2000.

Now, since stratifying would improve accuracy, the population and sample were
sorted according to the total repayment. This gave the data shown in Table 3.4 as
follows.

Table 3.4: Data on stratification

Stratum (defined by     Known number     Sample number     Mean repayment        Standard deviation
total repayment         of farmers in    in each           in MK in each         of repayment in
amount, A, in MK)       each stratum     stratum, n_i      stratum, ȳ_i          each stratum, s_i

0 < A ≤ 5000            150,000          n1 = 700          ȳ1 = 2500             s1 = 1500
5000 < A ≤ 10000        100,000          n2 = 300          ȳ2 = 6000             s2 = 2500
10000 < A               50,000           n3 = 100          ȳ3 = 10,000           s3 = 5000

Total                   300,000          n = 1100

Using data in Table 3.4, the Rural Credit Institution needs to know how much the
farmers owed it.

Solutions
First we need the population proportions p_j (j = 1, 2, 3; that is, one for each of the
three strata). They are calculated as follows:

p1 = 150,000/300,000 = 0.50
p2 = 100,000/300,000 = 0.33
p3 = 50,000/300,000 = 0.17

Then we can calculate the stratified sample mean as follows.

The mean of a stratified random sample is given as

$$\bar{y}_{ss} = p_1 \bar{y}_1 + p_2 \bar{y}_2 + p_3 \bar{y}_3$$
= 0.50 (2500) + 0.33 (6000) + 0.17 (10,000)
= 4930

So this is the average amount of repayment (in Malawi Kwacha) the farmers
should pay to the Rural Credit Institution.

The variance of a stratified random sample is given as

$$\operatorname{Var}(\bar{y}_{ss}) = \frac{p_1^2 s_1^2}{n_1} + \frac{p_2^2 s_2^2}{n_2} + \frac{p_3^2 s_3^2}{n_3}$$

= 0.50² (1500)²/700 + 0.33² (2500)²/300 + 0.17² (5000)²/100

= 803.57 + 2268.75 + 7225

= 10,297.32

And the standard deviation of the estimate (its standard error) is √10,297.32 = 101.48

The 95% confidence interval (C.I.) for the population mean repayment, $\mu$, is

$$\bar{y}_{ss} \pm t_{0.025,\,1100} \sqrt{\operatorname{Var}(\bar{y}_{ss})}$$
= 4930 ± 1.96 (101.48)
= 4930 ± 198.90

From these results we can easily estimate the total amount of Malawi Kwacha (MK)
the farmers owe the institution. The total is calculated as

Total Amount = 300,000 farmers x (MK4930 ± MK198.90)
= MK1,479,000,000 ± MK59,670,000

Hence, with 95% confidence we can conclude that the farmers owe the Rural Credit
Institution between MK1,419,330,000 and MK1,538,670,000 (about MK1.4 to 1.5 billion).
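
The stratified calculation above can be sketched in a few lines of Python. This is only an illustration of the formulas in Table 3.3, not part of the original survey; the function name stratified_summary and the variable names are our own, and the proportions are the rounded values (0.50, 0.33, 0.17) used in the hand calculation.

```python
import math

# Rounded proportions and sample summaries from the worked example (Table 3.4):
# (proportion p_j, sample size n_j, sample mean in MK, sample sd in MK)
strata = [
    (0.50, 700, 2500.0, 1500.0),
    (0.33, 300, 6000.0, 2500.0),
    (0.17, 100, 10000.0, 5000.0),
]
population_size = 300_000  # total number of farmers


def stratified_summary(strata, z=1.96):
    """Return the stratified mean, its estimated variance and the z * SE half-width."""
    mean = sum(p * ybar for p, _, ybar, _ in strata)
    variance = sum(p ** 2 * sd ** 2 / n for p, n, _, sd in strata)
    half_width = z * math.sqrt(variance)
    return mean, variance, half_width


mean, variance, half_width = stratified_summary(strata)
total_low = population_size * (mean - half_width)
total_high = population_size * (mean + half_width)

print(f"stratified mean  : {mean:.2f}")
print(f"variance of mean : {variance:.2f}")
print(f"95% half-width   : {half_width:.2f}")
print(f"total owed (MK)  : {total_low:,.0f} to {total_high:,.0f}")
```

Because the code does not round the standard error before multiplying by 1.96, its half-width can differ from the hand calculation in the second decimal place.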

NUMERICAL EXAMPLE
Suppose the Ministry of Health and Population decided to interview a sample of 50
pregnant women from the 3 major hospitals in the country, taking simple random
samples of size n1 = 25 from Queen Elizabeth Hospital, n2 = 15 from Lilongwe Central
Hospital and n3 = 10 from Mzuzu Hospital. The three hospitals are the strata. The
results of the interviews, with delivery and stay time measured in days, are given
below.

Other related "useful facts" are also given as follows.

                       Queen Elizabeth Hospital   Lilongwe Central Hospital   Mzuzu Hospital
Sample mean            ȳ1 = 1.5 days              ȳ2 = 1.5 days               ȳ3 = 2 days
Standard deviation     s1 = 0.52 days             s2 = 0.6 days               s3 = 0.7 days
Stratum size           N1 = 500                   N2 = 350                    N3 = 150

Estimate the mean delivery and stay time (in days),

a. For all pregnant women who went to the three hospitals


b. For all pregnant women who went to Lilongwe Central Hospital only
c. In both cases, parts [a] and [b], place a 95% bound of error of estimation
d. Suppose it costs the government MK3000 per day for delivery and stay in
Lilongwe Central Hospital, estimate the total cost incurred in part [b].

Solutions
Since the data come from 3 strata (hospitals), we apply the stratified sampling calculations.

a. The overall mean estimate is

$$\bar{y}_{ss} = p_1 \bar{y}_1 + p_2 \bar{y}_2 + p_3 \bar{y}_3$$

First, N = N1 + N2 + N3 = 500 + 350 + 150 = 1000,

and p1 = N1/N = 500/1000 = 0.5; p2 = N2/N = 350/1000 = 0.35 and
p3 = N3/N = 150/1000 = 0.15.

Hence, $\bar{y}_{ss}$ = 0.5 (1.5) + 0.35 (1.5) + 0.15 (2)
= 0.75 + 0.525 + 0.3
= 1.575 delivery and stay days

b. Simply read the answer from the data table: the estimate is ȳ2 = 1.5 delivery
and stay days.

c. The bound on the error of estimation for part [a] is $2\sqrt{\operatorname{Var}(\bar{y}_{ss})}$, where

$$\operatorname{Var}(\bar{y}_{ss}) = \frac{p_1^2 s_1^2}{n_1} + \frac{p_2^2 s_2^2}{n_2} + \frac{p_3^2 s_3^2}{n_3}$$

= (0.5)² (0.52)²/25 + (0.35)² (0.6)²/15 + (0.15)² (0.7)²/10

= 0.0027 + 0.0029 + 0.0011

= 0.0067

thus, $2\sqrt{\operatorname{Var}(\bar{y}_{ss})}$ = 2 √0.0067 = 0.164

For part [b], the bound on the error of estimation is

$$t_{0.05,\,n} \frac{s_2}{\sqrt{n_2}} = (1.75) \frac{0.6}{\sqrt{15}} = 0.271$$

d. The total cost for Lilongwe Central Hospital (part b) is

Total cost = cost per day x N2 x ($\bar{y}_2 \pm t_{0.05,\,n}\, s_2/\sqrt{n_2}$)
= K3000 x 350 x (1.5 days ± 0.271 days)
= K1,575,000 ± K284,550

Conclusion: The total estimated cost at Lilongwe Central Hospital is between about
MK1.29 million and MK1.86 million.

V. Systematic Sampling

3.23. What is systematic sampling?

It is a method of selecting every kth unit from a list of the N units in the
population, which may be arranged in some regular order (say, from smallest to
largest observation). The first unit is selected at random from the first k units,
and then every kth unit thereafter is taken. Systematic sampling can be done in
either a linear or a circular fashion.

An element of randomness is introduced into this kind of sampling by
using random numbers to pick the unit with which to start. The
following steps will help:
o Assign a sequence number to each member of the population.
o Determine the skip interval by dividing the number of units in the
population by the sample size: I = P/S, where I is the interval or skip, P
is the population size, and S is the sample size.
o Select a starting point in a random digit table (it must be between 1
and I).
o Include that item in the sample and select every Ith item thereafter
until the total sample has been selected.

For example, if we want to take a sample of 100 from a population of 2000 members, the
interval is 20 (i.e. 2000/100). The starting point must be selected randomly from the
interval 1 to 20. Then, every 20th item will be part of the sample. If the starting point
is 5, then the sample must include elements with sequence numbers of 5, 25, 45, 65,
85, 105, 125 and so forth.
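
The steps above can be sketched as a short Python function. This is a minimal illustration of linear systematic sampling, assuming the population size is an exact multiple of the sample size; the function name systematic_sample and its parameters are hypothetical names chosen for this example.

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Return the sequence numbers (1-based) of a linear systematic sample."""
    interval = population_size // sample_size  # skip interval I = P / S
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, interval)  # random start between 1 and I
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the skip interval")
    return [start + i * interval for i in range(sample_size)]


# The example from the text: P = 2000, S = 100, starting point 5
fixed = systematic_sample(2000, 100, start=5)
print(fixed[:7])

# In practice the starting point is drawn at random
drawn = systematic_sample(2000, 100, rng=random.Random(2024))
print(drawn[0], drawn[-1])
```

Passing start explicitly reproduces the worked example, while leaving it out draws the starting point at random, as the steps require.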

POINTS TO PONDER
In systematic sampling, the selection of the first unit determines the whole sample.
The advantage of this sampling technique is that the sample is spread evenly over the
entire population. It is also an easier and less costly method of sampling and can be
conveniently used even with large populations.
However, if there is a hidden periodicity in the population, systematic sampling will
prove to be an inefficient method of sampling.

3.24. What are random, ordered and periodic populations?

A population is random if the elements of the population are in random order.

A population is ordered if the elements within the population are ordered
in magnitude according to some scheme.

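A population is periodic if the elements of the population have cyclical
variation. In that case, the elements that fall in the same systematic sample
tend to be alike, that is, they are positively correlated (note that a
correlation coefficient can never be greater than one).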

3.25. When do we choose systematic sample?

A systematic sample is preferable when the population is ordered
numerically in decreasing or increasing order.

When the population is in random order, systematic sampling and simple
random sampling are equivalent, and either design can be used.

Note that care must be used in applying systematic sampling to periodic populations.
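
The danger with periodic populations can be seen in a small simulation. The sketch below uses a made-up population (364 daily values with a weekly cycle; the numbers are hypothetical and chosen only for illustration) and compares every possible 1-in-7 systematic sample with every possible 1-in-5 systematic sample.

```python
from statistics import mean

# Hypothetical daily values with a 7-day cycle (Monday to Sunday), repeated for 52 weeks
weekly_pattern = [10, 12, 11, 13, 15, 30, 25]
population = weekly_pattern * 52  # 364 elements


def systematic_means(values, k):
    """Mean of each of the k possible 1-in-k systematic samples."""
    return [mean(values[start::k]) for start in range(k)]


true_mean = mean(population)

# Skip interval equal to the period: each sample contains one weekday only
means_k7 = systematic_means(population, 7)
worst_k7 = max(abs(m - true_mean) for m in means_k7)

# Skip interval that does not match the period: weekdays are mixed in each sample
means_k5 = systematic_means(population, 5)
worst_k5 = max(abs(m - true_mean) for m in means_k5)

print(f"population mean          : {true_mean:.2f}")
print(f"worst error with k = 7   : {worst_k7:.2f}")
print(f"worst error with k = 5   : {worst_k5:.2f}")
```

When the skip interval coincides with the period, each systematic sample sees only one point of the cycle, so the sample mean depends entirely on the random start. Choosing an interval that does not divide the period, or using several random starts, is one way of exercising the care mentioned above.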

3.26. What are the advantages of systematic sampling?

It is easier to perform in the field and hence is less subject to selection
errors by field-workers than are either simple random samples or stratified
samples, especially if a good frame is not available.

It can provide greater information per unit cost than simple random
sampling can provide.

A systematic sample is generally spread more uniformly over the entire
population and thus provides more information about the population than
an equivalent amount of data contained in a simple random sample.

POINTS TO PONDER
Systematic sampling seems about as precise as the corresponding stratified sampling,
but the difference is that in systematic sampling the sample unit occurs at the same
relative position; while in stratified sampling it is determined by randomization within
stratum.

3.27. How do we compute the characteristics of a systematic sample?

Important Formulas

An estimate of the population mean, $\mu$, is given by

$$\bar{y}_{sys} = \frac{\sum_i y_i}{n}$$

The estimated variance of $\bar{y}_{sys}$ is given by

$$\operatorname{Var}(\bar{y}_{sys}) = \frac{s^2}{n} \left( \frac{N - n}{N} \right), \qquad s^2 = \frac{\sum_i y_i^2 - \left( \sum_i y_i \right)^2 / n}{n - 1}$$

The estimate of the population total is given by

$$N \bar{y}_{sys}$$

NUMERICAL EXAMPLE
A horticulturalist has 2000 experimental hybrid maize plants of a new variety under
study. He wants to estimate the total yield of the maize plants from a 1-in-10
systematic sample, which gives a sample of 200 plants. The data from this survey are
listed in Table 3.5. Calculate the sample mean and variance, and place a bound on the
error of estimation.

Table 3.5: Data on hybrid maize plants

Maize sampled          Number of corn heads
(position number)      on the plant, y          y²
3                      3                        9
13                     4                        16
23                     2                        4
33                     1                        1
.                      .                        .
.                      .                        .
.                      .                        .
173                    6                        36
183                    5                        25
193                    10                       100
Total                  Σ y_i = 800              Σ y_i² = 4000

Solution

The mean of a systematic random sample is given as

$$\bar{y}_{sys} = \frac{\sum_i y_i}{n} = \frac{800}{200} = 4$$

so the estimated total is $N \bar{y}_{sys}$ = 2000 x 4 = 8000 corn heads.

The estimated variance of the mean of a systematic random sample is given as

$$\operatorname{Var}(\bar{y}_{sys}) = \frac{s^2}{n} \left( \frac{N - n}{N} \right), \qquad s^2 = \frac{\sum_i y_i^2 - \left( \sum_i y_i \right)^2 / n}{n - 1}$$

$$s^2 = \frac{4000 - 800^2 / 200}{200 - 1} = \frac{800}{199} = 4.02$$

$$\operatorname{Var}(\bar{y}_{sys}) = \frac{4.02}{200} \left( \frac{2000 - 200}{2000} \right) = 0.0181$$

A bound on the error of estimation is

$$2 \sqrt{0.0181} = 0.269$$

To summarize, we estimate the average number of heads (corn heads on each plant) to be
about 4. We are quite confident that the error of estimation is less than about 0.27
heads per plant.
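
As a check on the arithmetic, the calculation can be sketched in Python. The sketch assumes the estimated variance of the systematic sample mean takes the form (s²/n)(N − n)/N, with s² the usual sample variance computed from the two totals in Table 3.5; the function name systematic_summary is our own.

```python
import math


def systematic_summary(sum_y, sum_y2, n, N):
    """Mean, sample variance, variance of the mean and 2-SE bound from sample totals."""
    ybar = sum_y / n
    s2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)      # sample variance
    var_ybar = (s2 / n) * ((N - n) / N)           # with finite population correction
    bound = 2 * math.sqrt(var_ybar)
    return ybar, s2, var_ybar, bound


# Totals from Table 3.5: sum of y = 800, sum of y squared = 4000, n = 200, N = 2000
ybar, s2, var_ybar, bound = systematic_summary(800, 4000, 200, 2000)
total = 2000 * ybar

print(f"mean per plant    : {ybar:.2f}")
print(f"sample variance   : {s2:.4f}")
print(f"variance of mean  : {var_ybar:.4f}")
print(f"bound on error    : {bound:.3f}")
print(f"estimated total   : {total:.0f}")
```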

VI. Cluster Sampling

3.28. What is cluster sampling?

It is a probability sampling method in which each sampling unit is a
collection or cluster of elements, which are close together. For example, it
may be a collection or group of households that are geographically close
together. In cluster sampling, the total population is divided into a number
of relatively small subdivisions, which are themselves clusters of still
smaller units, and then some of these clusters are randomly selected for
inclusion in the overall sample.

For example, if the total area of interest happens to be a big one, a
convenient way in which a sample can be taken is to divide the area into a
number of smaller non-overlapping areas and then to randomly select a
number of these smaller areas (clusters), with the ultimate sample
consisting of all units in these small areas or clusters.

3.29. How do we draw a cluster sample?

The first task in cluster sampling is to specify appropriate clusters. For
example, we may divide the city into regions such as blocks, areas or clusters of
elements, and then select a simple random sample of blocks from the
population (here is where we apply probability theory, when selecting
the random blocks from the total blocks available; an example will be given
later).

For example, a number of villages (or clusters) may be initially selected
from a sampling frame (list of villages). One or more villages are randomly
chosen from the total number of villages or clusters. Within each village or
cluster, we may interview all households, or we may select only a sample,
either by random selection or by another method within the cluster.

POINTS TO PONDER
If clusters happen to be some geographic subdivisions, in that case cluster sampling is
better known as area sampling. In other words, cluster designs, where the primary
sampling unit represents a cluster of units based on geographic area, are
distinguished as area sampling.

3.30. When is cluster sampling an effective design for obtaining information at minimum cost?

When a good frame listing the population elements is either
unavailable or very costly to obtain.

When the cost of obtaining observations increases as the distance separating the
elements increases.

3.31. What is the main difference between optimal construction of strata and construction of clusters?

Strata are to be as homogeneous (alike) as possible within, but one stratum
should differ as much as possible from another with respect to the
characteristics being measured. Clusters, on the other hand, should be as
heterogeneous (different) as possible within, and one cluster should look
very much like another in order for the economic advantages of cluster
sampling to pay off.

Summary of differences between stratified sampling and cluster sampling

Stratified sampling
o Divide the population into a few subgroups, each with many elements in it.
The subgroups are selected according to some criterion that is related to
the variables under study.
o Secure homogeneity within subgroups and heterogeneity between subgroups.
o Randomly choose elements from within each subgroup.

Cluster sampling
o Divide the population into many subgroups, each with a few elements in it.
The subgroups are selected according to some criterion of ease or
availability in data collection.
o Secure heterogeneity within subgroups and homogeneity between subgroups,
but we usually get the reverse.
o First, randomly choose a number of subgroups, and secondly choose elements
within the randomly selected clusters or subgroups.

3.32. What is a two-stage cluster sample?

As seen in the summary above, a two-stage cluster sample is obtained by
first selecting a probability sample of clusters and then selecting a
probability sample of elements from each sampled cluster.

3.33. When do we use two-stage sampling?

We use it when a cluster contains too many elements to obtain a measurement
on each, or when it contains elements so nearly alike that measurement of only a
few elements provides information on the entire cluster. When either
situation occurs, the experimenter can select a probability sample of
clusters and then take a probability sample of elements within each cluster.
The result is a two-stage cluster sample.

Generally, it is employed because of cost-effectiveness or because no adequate frame
for elements is available. Why is it cost-effective? It is cost-effective because cluster
sampling reduces cost by concentrating the survey in selected clusters. It is, however,
generally less precise than a simple random sample of the same number of elements.

3.34. How do we draw a two-stage cluster sample?

To select the sample, we first obtain a frame listing all clusters in the
population.
Second, from the list or the frame, we draw a simple random sample of
clusters, using the random sampling procedures (Refer to simple random
sampling method discussed previously).
Third, we obtain frames that list all elements in each of the sampled
clusters.
Finally, we select a simple random sample of elements or items from each
of these frames.
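
The four steps above can be sketched in Python as follows. The frame used here (10 schools with 40 students each) is entirely hypothetical, as are the function and variable names; the sketch also keeps all element lists in one dictionary for brevity, whereas in the field only the frames of the sampled clusters would need to be compiled.

```python
import random


def two_stage_sample(cluster_frames, n_clusters, n_per_cluster, seed=None):
    """Draw a simple random sample of clusters, then of elements within each."""
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters from the frame of clusters
    chosen = rng.sample(sorted(cluster_frames), n_clusters)
    # Stage 2: simple random sample of elements from each sampled cluster's frame
    return {
        cluster: rng.sample(cluster_frames[cluster],
                            min(n_per_cluster, len(cluster_frames[cluster])))
        for cluster in chosen
    }


# Hypothetical frame: 10 schools (clusters), each listing 40 students (elements)
frames = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(1, 41)]
    for i in range(1, 11)
}

sample = two_stage_sample(frames, n_clusters=3, n_per_cluster=5, seed=1)
for school, students in sample.items():
    print(school, students)
```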

For example, consider sampling with Probability Proportional to the Cluster Size. When
the cluster sampling units do not have the same number or approximately the same
number of elements, it is considered appropriate to use a random selection process
in which the probability of each cluster being included in the sample is proportional to
the size of the cluster. For this purpose, we have to list the number of elements in each
cluster, irrespective of the method of ordering the clusters. Then, we must sample
systematically the appropriate number of elements from the cumulative totals.

Consider the following data comprising the number of households in 20 villages. If
we were to select a sample of 30 households, using villages as clusters and selecting
within clusters proportional to size, how many households from each village should
be selected? (Use a starting point of 5, randomly picked between 1 and 20, since the
interval is 20, i.e. 600/30 = 20.)

Data and sample households selected from each village are as follows.
Village   Number of households   Cumulative total   Sample households selected
A 20 20 5 (first household, randomly selected)
B 40 60 25, 45
C 10 70 65
D 25 95 85
E 30 125 105, 125
F 100 225 145, 165, 185, 205, 225
G 60 285 245, 265, 285
H 25 310 305
I 30 340 325
J 15 355 345
K 10 365 365
L 5 370 None
M 60 430 385, 405, 425
N 40 470 445, 465
O 30 500 485
P 15 515 505
Q 20 535 525
R 15 550 545
S 30 580 565
T 20 600 585

Notes:
There are 600 households, from which 30 households were selected.
The sampling interval is calculated as Interval = 600/30 = 20, which is added to
the starting point 5 to determine the household selection in each village.
Successive increments of 20 from the first household (household number 5) are
made until 30 households have been selected.
The selected households from each village are given in the last
column of the table. For example, 2 households were selected in village B, 5
households in village F and none in village L.
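
The selection in the table can be reproduced with a short Python sketch. It walks through the cumulative totals with a fixed interval and assigns each selected household number to the first village whose cumulative total reaches it. The function name pps_systematic is our own, and the starting point 5 is passed in explicitly to match the example rather than drawn at random.

```python
from bisect import bisect_left
from itertools import accumulate

villages = list("ABCDEFGHIJKLMNOPQRST")
households = [20, 40, 10, 25, 30, 100, 60, 25, 30, 15,
              10, 5, 60, 40, 30, 15, 20, 15, 30, 20]


def pps_systematic(sizes, sample_size, start):
    """Systematic selection along the cumulative totals (probability proportional to size)."""
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] // sample_size
    picks = [start + i * interval for i in range(sample_size)]
    selected = [[] for _ in sizes]
    for pick in picks:
        # first cluster whose cumulative total is at least the picked number
        selected[bisect_left(cumulative, pick)].append(pick)
    return interval, selected


interval, selected = pps_systematic(households, sample_size=30, start=5)
by_village = dict(zip(villages, selected))

for name, picks in by_village.items():
    print(name, len(picks), picks if picks else "None")
```

Larger villages automatically receive more of the selected household numbers, which is the sense in which the selection is proportional to size.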

3.35. What are the advantages of two-stage cluster sampling over the other designs?

A frame listing all elements in the population may be impossible or costly
to obtain, whereas a list of all clusters may be easy to obtain. For example,
to compile a list of all secondary students in the country would be
expensive and time consuming, but a list of secondary schools could be
readily acquired.

The cost of obtaining data may be inflated by travel costs if the sampled
elements are spread over a large geographic area. Thus, to sample clusters
of elements that are physically closer together is often economical.

3.36. What are some of the problems we face when selecting a cluster sample?

The first problem in selecting a sample is the choice of appropriate
clusters. Two conditions are desirable: first, geographic proximity of the
elements within a cluster and second, cluster sizes that are convenient to
administer.

The selection of appropriate clusters also depends on whether we want to
sample a few clusters and many elements from each or many clusters and a
few elements from each.

And ultimately, the choice is based on costs.

3.37. What are some of the things we should be aware of when
using two-stage sampling in order to obtain accurate
information about the population characteristics?

Large clusters tend to possess heterogeneous elements, and hence a large
sample is required from each in order to acquire accurate estimates of
population parameters. In contrast, small clusters frequently contain
relatively homogeneous elements, in which case accurate information on
the characteristics of a cluster can be obtained by selecting a small sample
from each cluster.

3.38. What is design effect in cluster sampling? And, why is it a
unique problem when applying cluster sampling?

Design effect is a measure of the implicit clustering error in cluster sampling
methods. It arises because, in cluster sampling, each respondent
is not chosen independently of the other respondents; the resulting increase in
sampling error is known as the design effect.

The mathematical expression of this implicit clustering error, called the design
effect, is given as

Design Effect = (variance of the estimate under cluster sampling)
                / (variance of the estimate under a simple random sample of the same size)

This effect depends both on the degree of similarity among respondents
within a cluster and on the size of the clusters. Larger clusters will lead to
a greater design effect than small ones.
To compensate for this error, sample sizes calculated under simple random
sampling assumptions are multiplied by the design effect in order to
preserve the power and precision of the study.
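
The adjustment can be sketched in Python as follows. The first function simply multiplies a simple-random-sample size by the design effect. The second uses a commonly quoted approximation, 1 + (m − 1)ρ, where m is the average cluster size and ρ is the intra-cluster correlation; this approximation is not given in the text above and is included only as an assumption to illustrate why larger clusters give larger design effects. The input values (384, 30, 0.05) are hypothetical.

```python
import math


def inflate_sample_size(n_srs, design_effect):
    """Sample size for a cluster survey, given the simple-random-sample size."""
    return math.ceil(n_srs * design_effect)


def approx_design_effect(avg_cluster_size, intra_cluster_corr):
    """Assumed approximation: deff = 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * intra_cluster_corr


n_srs = 384  # hypothetical size from a simple random sample calculation

n_default = inflate_sample_size(n_srs, 2.0)   # design effect of 2.0
deff = approx_design_effect(30, 0.05)         # 30 respondents per cluster, rho = 0.05
n_custom = inflate_sample_size(n_srs, deff)

print(n_default, round(deff, 2), n_custom)
```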

POINTS TO PONDER
Based upon experience or published cluster surveys, a design effect of 2.0 is allowed
for most variables. Some variables need more: for example, a design effect of 10 for
water and sanitation variables, and 3.0 for health variables like goiter. Other
indicators strongly affected by clustering include vaccine coverage, which is
associated with distance from health centers and with local immunization campaigns.
Homogeneity is particularly severe for indicators of the incidence of infectious
diseases such as measles, which spread from child to child.

Some advantages of cluster sampling include reduced time and travel costs, as well
as, simplified field work and ease of both field supervision and survey administration.
This is important since better supervision of interviewers will result in improved data
quality.

3.39. How do we measure some of the characteristics of a cluster
sample?

Important Formulas

Use the following notation to discuss the mean, variance and error bound of a
cluster sample. Let

N = the number of clusters in the population
n = the number of clusters selected in a simple random sample
$m_j$ = the number of elements in cluster j, j = 1, 2, ..., N
$\bar{m} = \sum_{j=1}^{n} m_j / n$ = the average cluster size across the selected clusters
$M = \sum_{j=1}^{N} m_j$ = the number of elements across all clusters (the population)
$y_j$ = the total of all the observations in the jth cluster

The estimator of the population mean $\mu$ is the sample mean $\bar{y}$, which is
given by

$$\bar{y} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} m_j}$$

thus, the mean takes the form of a ratio estimator.

The estimated variance of $\bar{y}$ has the form of the variance of a ratio estimator,
given as

$$\operatorname{Var}(\bar{y}) = \left( \frac{N - n}{N n \bar{m}^2} \right) \left( \frac{\sum_{j=1}^{n} (y_j - \bar{y} m_j)^2}{n - 1} \right)$$

The bound on the error of estimation is given as

$$2 \sqrt{\operatorname{Var}(\bar{y})}$$

NUMERICAL EXAMPLE
A city is divided into 150 areas (or clusters), and 25 of these areas were sampled
using the cluster sampling method; interviews were conducted in the sampled clusters.
For each of the 25 sampled clusters the data on income are given in Table 3.6. Note
that the clusters are numbered on a city map, with numbers from 1 to 150. Use these
data (Table 3.6) to estimate per capita income in the city, and place a bound on the
error of estimation.

Table 3.6: Clusters, population size and income in a city

Cluster   Number of      Total income      Cluster   Number of      Total income
j         residents m_j  per cluster y_j   j         residents m_j  per cluster y_j

1         5              5400              14        8              4100
2         6              4300              15        3              5300
3         2              8500              16        7              5000
4         3              5000              17        8              3200
5         8              4500              18        6              2200
6         5              6500              19        4              4500
7         7              7500              20        5              3700
8         6              4000              21        5              5100
9         6              5200              22        6              3000
10        5              6500              23        3              3900
11        4              5000              24        9              4100
12        11             12100             25        10             4700
13        8              9600

Totals: Σ m_j = 150, Σ y_j = 132900

The best estimate of the population mean is given by the ratio of the cluster
totals to the cluster sizes:

$$\bar{y} = \frac{\sum y_j}{\sum m_j} = \frac{132900}{150} = 886$$

To compute the variance of the cluster sample we first need the following sums:

$$\sum y_j^2 = y_1^2 + y_2^2 + y_3^2 + \dots + y_{25}^2$$
= (5400)² + (4300)² + (8500)² + … + (4700)²
= 820,550,000

$$\sum m_j^2 = m_1^2 + m_2^2 + m_3^2 + \dots + m_{25}^2$$
= 5² + 6² + 2² + … + 10²
= 1024

$$\sum y_j m_j$$ = (5400)(5) + (4300)(6) + (8500)(2) + … + (4700)(10)
= 821,500

The following equality is easily established:

$$\sum (y_j - \bar{y} m_j)^2 = \sum y_j^2 - 2 \bar{y} \sum y_j m_j + \bar{y}^2 \sum m_j^2$$

Substituting the numbers into the right-hand side of the equation yields

$$\sum (y_j - \bar{y} m_j)^2 = 820{,}550{,}000 - 2(886)(821{,}500) + (886)^2 (1024)$$

= 820,550,000 – 1,455,698,000 + 803,835,904

= 168,687,904

and estimating the average cluster size by $\bar{m}$ (the mean of the number of residents
in all the sampled clusters) using the formula

$$\bar{m} = \frac{\sum m_j}{n} = \frac{150}{25} = 6$$

Thus, the variance of the cluster sample is given by

$$\operatorname{Var}(\bar{y}) = \left( \frac{N - n}{N n \bar{m}^2} \right) \left( \frac{\sum (y_j - \bar{y} m_j)^2}{n - 1} \right)$$

= [(150 – 25)/((150)(25)(6²))] x [168,687,904/24] = 6508.02

and the estimate of the mean with a bound on the error of estimation (the confidence
interval of the cluster sample) is given by

$$\bar{y} \pm 2 \sqrt{\operatorname{Var}(\bar{y})} = 886 \pm 2 \sqrt{6508.02} = 886 \pm 161.34$$

Therefore, the best estimate of the average per capita income is MK886, and the error
of estimation should be less than MK161.34 with probability
close to 95%. If this bound on the error of estimation appears rather large, sampling
more clusters (and consequently increasing the sample size) could reduce the sampling
error, in general.
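
The whole cluster calculation can be checked with a short Python sketch. It assumes the ratio estimator and the variance formula Var(ȳ) = [(N − n)/(N n m̄²)] x [Σ(y_j − ȳ m_j)²/(n − 1)] stated under Important Formulas; the data are the 25 clusters of Table 3.6, and the function name cluster_ratio_estimate is our own.

```python
import math

# Table 3.6: residents (m_j) and total income (y_j) for the 25 sampled clusters
m = [5, 6, 2, 3, 8, 5, 7, 6, 6, 5, 4, 11, 8,
     8, 3, 7, 8, 6, 4, 5, 5, 6, 3, 9, 10]
y = [5400, 4300, 8500, 5000, 4500, 6500, 7500, 4000, 5200, 6500, 5000, 12100, 9600,
     4100, 5300, 5000, 3200, 2200, 4500, 3700, 5100, 3000, 3900, 4100, 4700]
N = 150  # number of clusters in the city


def cluster_ratio_estimate(m, y, N):
    """Ratio estimate of the mean per element, its variance and the 2-SE bound."""
    n = len(m)
    ybar = sum(y) / sum(m)
    m_bar = sum(m) / n
    ss = sum((yj - ybar * mj) ** 2 for yj, mj in zip(y, m))
    variance = (N - n) / (N * n * m_bar ** 2) * ss / (n - 1)
    return ybar, ss, variance, 2 * math.sqrt(variance)


ybar, ss, variance, bound = cluster_ratio_estimate(m, y, N)

print(f"per capita income : {ybar:.2f}")
print(f"sum of squares    : {ss:,.0f}")
print(f"variance          : {variance:.2f}")
print(f"bound on error    : {bound:.2f}")
```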

3.40. What if a list of households is unavailable?

If lists are not available and cannot be created by carrying out a quick
census, or by consulting community leaders, then select one household as
the starting point, followed by selecting successive households to ensure
that the sample is as representative as possible.

3.41. What are the techniques of selecting households?

The exact technique for the selection of the households will depend upon
field conditions. However, here are some commonly used methods in the
field.

Find a central point in the community, such as the market. Then randomly
select a direction from the central point and count the number of
households between the central point and the edge of town in that
direction. Randomly select one of these houses to be the starting point of
the survey. The remaining households in the sample should then be
selected to give as widespread coverage as possible of the community that
is consistent with practicability.

Note that to improve the representativeness of the sample, choose a
random number, k, and visit the kth closest household, or select all the
households at random. Each household selection process is continued until
the required cluster sample size has been surveyed.

In villages containing dwellings of several households, a specific
procedure is required. If these are infrequent, it is best to select all the
households within the selected dwelling as this prevents multiple-household
dwellings from being under-represented in the sample. If they
are a rare occurrence, they should be treated as a cluster.

In large communities, several central locations in different parts of the
community should be identified as starting points from which to perform
separate cluster surveys.

3.42. Compare and summarize probability sampling designs.

Design: Simple random sampling
Random selection: Sample members are selected individually from the population.
Other characteristics: Each population element has an equal chance of being selected.
Disadvantage: Requires a listing of population elements; expensive and requires more
time to implement.

Design: Systematic random sampling
Random selection: The initial sample member is individually selected.
Other characteristics: Designation of the initial sample member determines the entire
sample.
Disadvantage: Periodicity within the population may skew the sample and the results.

Design: Stratified random sampling
Random selection: Sample members are selected individually within each of the
subpopulations or strata.
Other characteristics: All strata are represented in the sample, most frequently by
proportional allocation.
Disadvantage: Creating strata on the population and randomly selecting elements from
each stratum is expensive.

Design: Cluster random sampling
Random selection: Clusters are selected from the larger population of clusters, and
elements are chosen from the randomly selected clusters.
Other characteristics: All members of a selected cluster are included in the sample,
but not all clusters are included.
Disadvantage: Often lower statistical efficiency (more error) due to subgroups being
homogeneous rather than heterogeneous, and the design effect problem (elements are not
independently selected).

So far, we have explained sampling methods embodying the feature of randomness,
whereby every unit in the population has an equal chance of being selected or has a
calculable chance of being included in the sample. However, there are also sampling
procedures with no randomness requirement. These non-random methods are
sometimes useful (see footnote 2) but not necessarily reliable.

VII. Non-Random Sampling Methods

3.43. What is a non-random sampling method?


A non-random sampling method is a method that does not follow
randomization or does not give equal chance of selecting the elements in a
population.

3.44. What are the major non-random sampling methods?


These are –
Quota sampling
Convenience sampling
Purposive sampling or Judgment sampling

3.45. What is Quota sampling?


A method in which groups are identified (by age, sex, political affiliation,
religion, etc.) and elements are sampled from each group at the interviewer's
discretion; such selection of respondents by the investigator does not happen
with random sampling methods.

Note that quota sampling has the features of stratified sampling and uses some
approximate information on the number of units in each stratum; however, the
number of units and the choice of sample units are left to the interviewer
him/herself.

Footnote 2: Such quota sampling methods find their greatest application in opinion
polls, political elections and so forth. In addition, they have the advantage of being
cheap and administratively simple.

3.46. What are the major weaknesses of quota sampling despite its usefulness?
Its major weaknesses are (1) the lack of randomization, and (2) the
intentional or conscious errors that can arise because units are selected by the
interviewer. These are recipes for unreliable and inefficient estimates.

3.47. What is Convenience sampling?


It is a method in which the units or elements to be included in a sample are selected at
the convenience or discretion of the interviewer/investigator rather than with any
pre-specified probability of being selected. For example, the interviewer may
select his/her friends, relatives and acquaintances to perform certain quick
research. A TV station may conduct an opinion survey near a bus station, in a
market or on campus. In a certain village, the interviewer may spin a Coca-Cola
bottle and pick households in the direction in which the bottle neck is
pointing! It is an unscientific and unethical way of selecting households.

It is very important to understand that such methods do not ensure the
representativeness of a sample; hence, interpreting the estimates and
drawing conclusions from them is simply dangerous!

3.48. What is Purposive sampling or Judgment sampling?


This is a sampling method in which an investigator/interviewer chooses the units of the
sample that she/he feels are most representative of the population with respect
to the population characteristics. For example, a lecturer may select 2 or 3
students from a class who he/she feels represent the class, and conduct an
opinion survey on certain issues related to the class. Or, an interviewer may
select households in the village within the vicinity of his/her accommodation, or pick
households close to the main road rather than going into the villages, across
the river or behind the mountains.

Again, great care should be taken in interpreting the estimates and
drawing conclusions. The validity or reliability of the results depends on the
judgment of the investigator!

3.49. What are non-sampling errors?
Non-sampling errors, usually known as 'biases', are often more serious than sampling
errors. They arise in a number of ways, sometimes intentionally
(consciously) and at other times unintentionally (unconsciously).

For example, a poorly defined population, the mistaken assumption that all
random sampling methods are the same, reporting bias, non-response bias and
'not reaching' randomly selected households are but a few consciously
introduced non-sampling errors. Other, unintentional non-sampling errors arise
from sensitive questions posed to respondents, for example, the age of a
woman, the income of a farmer or a business person, and marital status.

It is therefore very important to train and supervise interviewers, as well as to
design a questionnaire with clear questions and translations, if a non-English
language is used.

3.50. What are survey errors?


i. Random sampling errors
Random: chance variation
o The difference between the sample value and the true value of the
population mean.
o Cannot be eliminated but can be reduced by increasing sample size.

Unbiased and consistent estimators

ii. Systematic sampling errors


Sample design error—problem in sample design or sample procedures
o Frame errors; e.g., sample drawn from small geographic area.
o Population specification errors
o Selection errors
Flaws in the execution of the sample design.
Biased and non-consistent estimators

iii. Measurement errors


Variation between the true value and the information actually obtained
o Surrogate information error
o Interview error or interviewer bias
o Measurement instrument bias
o Processing error
o Non-response bias
o Refusal rate
o Response bias
   - Deliberate falsification
   - Unconscious misrepresentation
3.51. What is an experiment?
An experiment:
o The researcher changes an explanatory, independent, or
experimental variable (EV) to observe changes in the dependent
variable (DV).

Example:
EV = price DV = total sales
EV = advertising DV = market share
EV = program DV = income

Observing the effect
o Pre- and post-surveys are often used to observe the change in
experiments conducted in social sciences.

Experiments in questionnaires
o The effect of information
o The order of question
o The wording of questions
o Different question formats
E.g., open-ended vs. close-ended, recollection

VIII. Comprehensive Sampling: Agricultural Survey in Malawi


(Courtesy of National Statistical Office (NSO), Malawi)

This is an actual application of various random sampling methods. It shows the
combination of SRS, SS and PPS that was used by the NSO for the National Sample Survey
of Agriculture (NSSA) data collection in 1992.

Purposes, Design and Collection of the data

1. Introduction
Data on the organization and structure of the smallholder agriculture sector in
Malawi
Done once every ten years
Helps the government to formulate plans to improve the productivity of the
smallholder sector
NSSA is an integrated, multiple-visit questionnaire which consists of six modules
i. Household Composition
ii. Garden Details
iii. Livestock Numbers
iv. Food security and Nutrition
v. Employment status of household head, migration and
household assets (SDA A) and Health (SDA B)
vi. Extension

2. Purposes of the Data


NSSA was intended to provide basic data requirements for
Estimation of national aggregate of crop production for use in compilation of
national accounts
Monitoring the effect of agricultural extension programmes as well as
evaluation of the National Rural Development Programmes (NRDP) at Rural
Development Projects (RDP), Agricultural Development Division (ADD) and
National levels, and
Establishment of priorities and formulation of socio-economic policy in the
smallholder sector of the rural economy

3. Sample Design
Malawi is divided into eight ADDs, which are in turn divided into 30 RDPs
For sample selection purposes, the country was divided into 107 strata based
on ecological features (soil type, cropping pattern, rainfall, etc.)
The stratum boundaries never crossed RDP or EA boundaries. This ensured
that every stratum contained a complete set of EAs, every RDP contained a
complete set of strata, and every ADD contained a complete set of RDPs
The sampling methodology was a two-stage stratified sample design

i. The Primary Sampling Units were EAs


ii. The Secondary Sampling Units were Households

The EAs were selected with Probability Proportional to Size (PPS) of the EA.
The measure of size being total population of the EA as found in the 1987
Population and Housing Census
A simple random procedure was employed in the selection of the sample
households within the selected EAs

NUMERICAL EXAMPLE
The sample size was set at 600 EAs. This sample was found to be statistically
adequate to give reliable results at the RDP level with coefficient of variation, CV
10%.
The number of EAs to be selected per stratum was determined by the square
root of the size of the stratum where the stratum size was given by the sum of
the population of all the EAs within the stratum.

Assume 4 strata: S1, S2, S3 and S4

Allocation of EAs to Strata

Stratum     Pop      √Pop
S1          100      10
S2           81       9
S3          225      15
S4           16       4
---------------------------------------------
            P=422    S=38

Note: Why √Pop? The square root of the population is taken to avoid over- or
under-representation of strata. Otherwise, large strata can be over-represented and
small strata can be under-represented. Basically, it is one of the procedures for
increasing precision in statistical analysis. Now, assume N = national sample size of EAs.

The square root allocation was used, where the number of EAs to be selected in a
stratum was arrived at as follows.

S1 = N x 10/38 S2 = N x 9/38 S3 = N x 15/38 S4 = N x 4/38
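
The square root allocation above can be sketched in a few lines. This is a minimal illustration rather than the NSO's actual procedure: the function names are hypothetical, N = 38 is chosen only so that the results come out as whole numbers, and a real allocation would still need a rounding rule.

```python
import math

def sqrt_allocation(stratum_pops, total_eas):
    """Allocate total_eas across strata in proportion to sqrt(population)."""
    roots = {s: math.sqrt(p) for s, p in stratum_pops.items()}
    total_root = sum(roots.values())  # 38 for the four strata in the text
    return {s: total_eas * r / total_root for s, r in roots.items()}

def proportional_allocation(stratum_pops, total_eas):
    """Allocate in direct proportion to population, for comparison only."""
    total_pop = sum(stratum_pops.values())  # 422 for the four strata
    return {s: total_eas * p / total_pop for s, p in stratum_pops.items()}

pops = {"S1": 100, "S2": 81, "S3": 225, "S4": 16}
N = 38  # hypothetical national sample size of EAs

print(sqrt_allocation(pops, N))          # S1: 10, S2: 9, S3: 15, S4: 4
print(proportional_allocation(pops, N))  # S3 gets more, S4 gets fewer
```

Comparing the two outputs shows the dampening effect: the largest stratum (S3) receives fewer EAs and the smallest (S4) receives more than under proportional allocation.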

First Stage Selection Procedures

Each stratum comprised a complete set of enumeration areas, as shown in S3.
Note that the EAs were the units used for sampling in the 1987 Census, with
boundaries drawn to represent geographical areas with roughly equivalent
populations (i.e., about 250 dwelling units per EA). The EAs were then grouped
into agro-ecological zones using the categories developed by the recent land
resources appraisal.

Procedures

The EAs in a stratum were listed, and their populations were shown together with
their cumulative populations. For example,

EA      POP      CUMULATIVE POP
1       200      200  (1-200)
2       181      381  (201-381)
3       215      596  (382-596)
4       300      896  (597-896)
.       ...      and so on
----------------------------------------------------------------
Total            5000
The EAs are then selected with Probability Proportional to Size (PPS).

Suppose we were to select 10 EAs from S3 (say, from S3 = N x √Pop / total √Pop)

Step 1:
o Divide 5000 by 10, which equals 500 (sample interval). Pick a random
number between 1 and 500, say RND = 210

Step 2:
Check along the cumulative total and see in which one 210 falls. The EA against
this cumulative total is then selected. In this case, EA2

Step 3:
Add 500 (sample interval) to 210, you get 710. Check against which cumulative
total 710 falls. EA4 is selected.

Step 4:
Repeat step 3. That is, add 500 to 710 to get 1210, add 500 to 1210 to get 1710,
and so on until all the required EAs are selected (in this case, 10 for S3).
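
Steps 1 to 4 can be sketched as a short routine. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the first four EA populations come from the listing above, and the remaining sixteen populations are invented fillers chosen only so that the stratum total is 5000.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(populations, n_select, random_start=None):
    """Select n_select units with probability proportional to size by
    stepping a fixed interval along the cumulative population totals."""
    cumulative = list(accumulate(populations))
    interval = cumulative[-1] // n_select           # 5000 // 10 = 500
    if random_start is None:
        random_start = random.randint(1, interval)  # RND in Step 1
    selected = []
    for k in range(n_select):
        point = random_start + k * interval         # 210, 710, 1210, ...
        # Index of the first EA whose cumulative total reaches the point.
        selected.append(bisect_left(cumulative, point) + 1)  # 1-based EA number
    return selected

# EAs 1-4 are from the text; the other sixteen values are hypothetical.
ea_pops = [200, 181, 215, 300] + [256] * 8 + [257] * 8
chosen = pps_systematic(ea_pops, n_select=10, random_start=210)
print(chosen[:2])  # [2, 4]: EA2 and EA4, as in Steps 2 and 3
```

Note that an EA whose population exceeds the sample interval could be hit more than once; how to handle that case is a design decision the text does not cover.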

It is important to note that enumerators listed all households in the selected EAs,
and from the household list of each EA, 20 households were selected by simple
random sampling. Also note that the enumerators screened out households
with no cultivated garden and/or livestock. The total number of households for the
survey was 600 EAs times 20 households, which equals 12,000 households.
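
The second-stage selection can be sketched as follows. This is a hedged illustration: the listing and its field names (`has_garden`, `has_livestock`) are hypothetical, and the screening rule is interpreted here as dropping households that have neither a garden nor livestock.

```python
import random

def select_households(listing, n_households=20, seed=None):
    """Screen the EA listing, then draw a simple random sample
    without replacement from the eligible households."""
    # Assumed screening rule: keep households with a garden or livestock.
    eligible = [hh for hh in listing if hh["has_garden"] or hh["has_livestock"]]
    rng = random.Random(seed)
    return rng.sample(eligible, min(n_households, len(eligible)))

# Hypothetical listing of 250 households in one selected EA.
listing = [
    {"id": i, "has_garden": i % 10 != 0, "has_livestock": i % 3 == 0}
    for i in range(1, 251)
]
sample = select_households(listing, seed=1992)
print(len(sample))  # 20 households in this EA
print(600 * 20)     # 12000 households in the whole survey
```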

SUMMARY OF STATISTICAL METHODS

Some of the statistical methods used in designing the NSSA data collection were:

Probability of selection of an EA
= Pop EA/Pop Stratum = $P_{EA}/P_S$

Probability of selection of a household
= number of selected HH/total number of HH listed in the EA = $h/H$

Overall probability of selection = $(P_{EA}/P_S) \times (h/H)$,
since the two events are independent.

The weight/inflation factor is given by the inverse of the overall
probability of selection, that is,

$$w = \frac{P_S H}{P_{EA}\, h}$$

Estimation of a total area, Y, is done as follows.

Let $y_i$ = area of land a sampled household $i$ has; then the stratum total is
estimated from one EA by

$$\hat{Y} = \sum_i \frac{P_S H}{P_{EA}\, h}\, y_i$$

Hence, we note that each EA gives an independent estimate of the total of the
stratum it belongs to. Therefore, if n EAs are selected in a single stratum,

$$\hat{Y}_{Stratum} = \frac{1}{n} \sum_{EA} \left[ \frac{P_S H}{P_{EA}\, h} \sum_i y_i \right]$$

and

$$Var(\hat{Y}_{Stratum}) = \frac{\sum_{EA} \left[ \frac{P_S H}{P_{EA}\, h} \sum_i y_i \right]^2 - \left[ \sum_{EA} \frac{P_S H}{P_{EA}\, h} \sum_i y_i \right]^2 / n}{n - 1}$$
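
As a rough sketch of how the inflation factor (PS x H)/(PEA x h) stated above turns sample areas into a stratum estimate, the code below weights each EA's sampled households and averages the EA-level estimates. All figures and names are hypothetical; this is not the NSO's processing code.

```python
def inflation_factor(pop_ea, pop_stratum, h_selected, h_listed):
    """Inverse of the overall selection probability (PEA/PS) x (h/H)."""
    return (pop_stratum * h_listed) / (pop_ea * h_selected)

def stratum_total_estimate(ea_samples, pop_stratum):
    """Average of the independent stratum-total estimates from each EA."""
    estimates = []
    for ea in ea_samples:
        w = inflation_factor(ea["pop_ea"], pop_stratum,
                             len(ea["areas"]), ea["h_listed"])
        estimates.append(w * sum(ea["areas"]))  # one estimate per EA
    return sum(estimates) / len(estimates)

# Hypothetical stratum of population 5000 with two sampled EAs,
# each contributing 20 households with land areas in hectares.
ea_samples = [
    {"pop_ea": 200, "h_listed": 50, "areas": [1.0] * 20},
    {"pop_ea": 300, "h_listed": 60, "areas": [1.5] * 20},
]
print(inflation_factor(200, 5000, 20, 50))       # 62.5
print(stratum_total_estimate(ea_samples, 5000))  # mean of the two EA estimates
```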

NOTES OF THE MODULES (Data Collection)


i. Household Composition survey
Conducted in three rounds (October, February and June)
Assessed changes in hired labour and off-farm economic activities
pursued by household members
NSSA focused exclusively on smallholders in the agriculture sector,
excluding the estate sub-sector. Note that smallholder and estate are
essentially land tenure, not socio-economic, categories.

o An estate is defined as a holding which has been officially registered as
leasehold land, such as tobacco estates.
o A smallholder cultivates customary land which has been allocated to
his family and descendants by the village headman, but which may not
be sold and is restricted in what crops may be grown and where the
product of certain crops can be sold.

ii. The Garden Survey


Started immediately after the first round of the household composition
survey
Involved measuring all the gardens that were cultivated by the
sample household members during the cropping seasons
Recorded the area under crops, the crops grown and the material inputs
used to grow them
Yield subplots (YSPs) of 50 square meters were laid in each plot which
had one of the following crops: maize, rice, groundnuts, pulses,
sorghum, wheat and sunflower. Note that the crop was harvested by
the enumerator and the produce was weighed and recorded.
The survey recorded the area of land cultivated by the household, not
land owned.

iii. The Livestock Survey


Conducted in three rounds
Done in order to monitor possible changes in the numbers of livestock
Important indicator, since livestock (particularly cattle) are a
convenient asset which can easily be sold to provide cash in a crisis or
in the wet season (January to April) to buy food
Livestock sales are probably a more sensitive indicator of household
income than change in other assets like ploughs, hoes, etc.

iv. Food Security Module


Administered twice (July/August) to capture household food security
status both before and after harvest
Questioned households about food expenditure and consumption
Provided data on production, sales and prices, though not on buyers or
where the burley was sold.

v. SDA Modules (Employment A and Health B)


Adapted from the SDA priority survey developed by the World Bank
to monitor the impact of structural adjustment policies
Priority survey (PS) is a rapid survey with two main objectives
i. Identifying policy target groups
ii. Producing a series of key indicators of household welfare
SDA A
Questions about employment and household income are disaggregated
by gender
Provides information on burley's importance as a source of household
income. Provided that burley growers can be identified accurately,
it will be possible to explore the linkages between burley and
other indicators of household welfare.

SDA B
Health

vi. Extension Module


Similar to National Extension Monitoring Survey and Agricultural
Services project of Ministry of Agriculture
Club membership and extension training and visit system

========================================================
MENTAL GYMNASTICS
CHAPTER THREE
=======================================================

1. Distinguish between the following pairs of terms.

a) Sampling and Sampling error
b) Simple random sampling and Cluster sampling
c) Sampling units and Sampling frame
d) Random sampling methods and Non-random sampling methods
e) Probability Proportional to Size (PPS) sampling and Stratified sampling

2. True, False or Uncertain. Support your answer.


a) The relationship between a sample and a population is established through
sample statistic and population parameter.
b) A sample size of n > 30 is considered to be large enough for any research.
c) Simple random sampling is a non-probability sampling method.
d) Design Effect is an implicit error when performing stratified sampling.
e) The larger the sample, the smaller the sampling error.

3. Why sampling?

4. Why does Design Effect occur? And how do we take care of it?

5. In order to reduce the standard error by 30%, the sample size should be increased
by a factor of what?

6. A certain institution has determined the standard error of the sampling distribution
of the mean for proposed food security research with 200 households. However, this
standard error is thrice the acceptable level when compared with previous
experience. What can be done to get a representative sample and an acceptable
standard error of the mean?

7. A random sample of 200 households is selected from a total of 2500 households.
The sample mean of income is found to be x̄ = 1500 Malawi Kwacha (MK), and
the sample variance is s² = 220. Estimate μ, the average income for all 2500
households, and its confidence interval.

8. A large company is concerned about the time per week lost due to absenteeism.
The company employs N = 400 employees, and the time log sheets of a simple
random sample of n = 40 employees show that the average amount of time lost is 10
hours, with a sample variance s² = 3.1. Estimate the total number of person-hours
lost per week, and its confidence interval.

9. The average amount of groundnut production μ for smallholder farmers must be
estimated. No prior data are available to estimate the population variance, but
most groundnut production lies within a range of 200 kgs. There are 500 smallholder
groundnut farmers. Find the sample size needed to estimate μ with a standard error
of 30 kgs.

10. An agricultural economist wants to estimate the average amount of fertilizer
smallholder farmers received from all sources. The economist selects 20 villages
at random from 120 villages that received fertilizer in a district. The data collected
are reported as follows:

Village   Number of     Total fertilizer    Village   Number of     Total fertilizer
          smallholder   received (kgs)                smallholder   received (kgs)
          farmers                                     farmers
1         30            1600                11        35            2000
2         25            1250                12        40            2100
3         35            1490                13        50            2500
4         22            1380                14        35            1590
5         20            1500                15        20            1200
6         15            1430                16        19             850
7         10             900                17        32            1490
8         30            1620                18        22             920
9         22            1545                19        25             950
10        18            1700                20        22            1000

Estimate the average amount that would be received by all the smallholder
farmers in the district, and estimate a 95% confidence interval. (Hint: use the
cluster sampling method.)

11. An economic survey is designed to estimate the average amount spent on
electricity by households in Lilongwe city. Since no list of households is
available, cluster sampling is used, with the Areas of Lilongwe city forming the
clusters. A simple random sample of 20 Areas is selected from the 58 Areas in the
city. Interviews with the households in the selected Areas revealed the total cost of
electricity per month as follows.

Area   Number of     Total expenses    Area   Number of     Total expenses
       households    (MK)                     households    (MK)
41     50            49600             11     45             32000
25     45            62520             22     45             29100
47     69            54090             18     40             42500
4      32            34880             14     45             51590
15     40            15000             25     26             31200
36     25            23505             9      39             61850
27     30            39010             17     42             51490
5      40            46200             2      32             90020
1      20            21545             29     50             29505
10     22            31700             12     62            100000

Estimate the average amount a household in the city spends on electricity, and
calculate the confidence interval.
