Intro To Sample Survey Lecture Note
Chapter 2
2 Introduction to Sample Survey
2.1 Basic Concepts and Uses of Sample Surveys
A survey is a scientific study that deals with an existing population of units typified by persons,
institutions, or physical objects. A survey attempts to acquire knowledge by observing the
population as it naturally exists and making quantitative statements about aggregate population
characteristics. By stating a survey is the study of a population as it naturally exists, we mean to
exclude experimental studies in which the material to be studied is manipulated by the researcher
and the results observed. We will consider a survey to include censuses in which attempts are made
to study all members of the population and sample surveys in which a scientific sample of the
population is studied. Methods used in surveys are used in other areas of scientific study, and there
is no universally accepted definition for a survey. In some studies surveys include both observing
techniques and experimental treatments. In undertaking surveys, it is often difficult or even
impossible for researchers to study very large populations. Hence, they select a smaller portion,
a sample of the population, for study.
Researchers who conduct sample surveys use sampling techniques and use the information they collect
from the sample to make inferences about the population as a whole. When sampling is done well,
the inferences made concerning the population can be quite reliable. The sample survey is the most
widely used data collection technique for an extensive range of subjects, for both research and
administrative purposes. Sample surveys are used to develop, test, and refine research hypotheses in
different disciplines such as sociology, social psychology, demography, political science, economics,
education, and public health. Central governments make considerable use of surveys to inform them
of the conditions of their populations in terms of employment and unemployment, income and
expenditure, housing conditions, education, nutrition, health, travel patterns, and many other
subjects. They also conduct surveys of organizations such as manufacturers, retail outlets, farms,
schools, and hospitals. Market researchers carry out surveys to identify markets for products, to
discover how the products are used and how they perform in practice and to determine consumer
reactions. Opinion polls keep track of the popularity of political leaders and their parties and
measure public opinion on a variety of topical issues.
2.2 Planning for sample survey
For a survey to yield the desired results, particular attention must be paid to the preparations that
precede the field work. All surveys require careful and judicious preparation if they
are to be successful. However, the amount of planning will vary depending on the type of survey and
the materials and information required. The development of an adequate survey plan requires sufficient
time and resources, and a planning cycle of two years is common for a complex survey.
The planning of a sample survey has three major steps.
A. Survey design and preparation
Setting objectives of survey
Sample design
Preparation of sampling frames
Decision on types of survey
Preparing survey instruments
Planning time table
Survey budget proposal
Conducting pilot survey
B. Data collection
Organization of field work
Locating respondents
Collecting information
C. Survey analysis
Preparation for processing (data files and structures, data checking, coding, data entry, …)
Performing statistical analysis (descriptive and inferential statistics)
Presenting methods and findings in study report
2.3 Advantages and disadvantages of sample surveys vs. census
Censuses and sample surveys are the two major ways of collecting information for statistical
purposes. As described above, a sample survey gathers data about characteristics of
interest from only a sample of the population, selected according to the rules of statistical
sampling techniques. A census collects information about characteristics of interest from all
units in the population under study. The advantages of a sample survey over complete coverage
(a census) have become obvious in recent years and need only be stated briefly.
1. Sample survey saves money.
It is possible to collect information from sample households and obtain estimates that
reasonably approximate the actual characteristics of a large population. It is obviously cheaper
to gather information from 500 households rather than from 10,000 households. Therefore, the
cost of obtaining information through a sample would be a lot less than obtaining it through a
census. Note, however, that the cost per unit is higher in a sample survey than in a census.
2. Sample survey saves labor.
A smaller staff is required both for fieldwork and for data processing. The number of
enumerators, supervisors, data editors, and data entry clerks required is much smaller in
a sample survey than in a census.
3. Sample survey saves time.
Sample survey requires a smaller scale of operations at all stages and it reduces data collection
and processing time.
4. Sample survey provides higher level of accuracy.
This accuracy can be achieved through more selective recruiting of interviewers and
supervisors, more extensive training programs, a closer supervision of the personnel involved
and a more efficient monitoring of the field work.
5. Sample survey could be the only option for the study in some specialized areas.
For example, some information of a technical nature requires highly trained
personnel and specialized equipment, as in the medical field. Observation or experimentation may
also be destructive in nature, as in testing industrial products (for example, measuring the average
burning life of light bulbs) or testing the quality of wine, beer, etc. In such cases sampling is the
only feasible means of study.
6. Other practical advantages
These include tests of census procedures, evaluation of the census, quality control at field level
and at data processing level, advance tabulation, and updating of census results.
2.4 Source of the data
Statistical data may be classified into two basic types: primary data and secondary data.
1. Primary (original) data are data collected to meet the specific needs of the problem
at hand. Primary data are collected by the immediate user(s) of the data expressly for the
experiment or survey being conducted. It is this data that we will normally be referring to when
we talk about “collecting data”. The nature and type of primary data required would depend
largely on the study objectives and vary from one field to another.
2. Secondary data are already existing information that was previously collected and
reported by some individual or organization for their own purposes; at a later stage at least
some of that data is made available to other individuals and organizations.
Secondary data can be obtained rapidly and inexpensively. It may be of considerable value,
although the exact value will depend upon the type of study being carried out.
2.1. Uses of Secondary Data
Sometimes data requirements are fully met from available secondary sources, in which case there is
no need to carry out a survey and generate primary data. The general rule is to exhaust all possible
secondary sources before deciding to embark on a comprehensive plan for collecting primary data. In
particular, secondary data may provide a context (geographic, temporal, social) for primary data,
may provide validation that allows us to assess the quality and consistency of the primary data, and
may act as a substitute for primary data.
Administrative records are the main source of secondary data, but there are various other internal
and external sources such as records, reports, books, periodicals, newspapers, and academic studies.
Apart from saving time and cost, secondary data are less subject to intentional bias and may be the
only alternative when the information is inaccessible and impossible to gather through primary data
collection.
2.2. Advantages and disadvantages of secondary data
2.2.1. Advantages
Secondary data are usually available more cheaply, and their collection is generally significantly
quicker and easier than collecting the same data 'from scratch'.
Existing data are likely to be available in a more convenient form, involving dial-up access
rather than dust removal.
They can give us access to otherwise unavailable organizations, individuals or locations.
They allow researchers to extend the 'time base' of their study by providing data about the
earlier state of the system being studied.
Secondary data are likely to be pre-processed, which eliminates the time-consuming analysis
stage.
2.2.2. Disadvantages
The method by which secondary data were collected is often unknown to the user of the
data. This means that we are forced to rely on the skills and integrity of the people who
collected and analyzed the data.
We may have little or no direct knowledge of the processing methods employed, and we
will rarely have access to the original raw data to check them, so accuracy cannot be verified.
Secondary data quickly become outdated in an ever-changing environment.
There may be differences in classification or measurement.
Chapter 3
3 Multistage Sampling: Two-Stage Equal Cluster Sampling
3.1 Multi-stage Sample Design
For studies of large and geographically dispersed populations it is more convenient to use a multi-
stage sampling design. It is particularly appropriate where a large scale survey is to be conducted,
and where for logistic and organizational reasons it is convenient for the sample to be grouped
together in a more limited number of geographical areas, rather than being spread thinly and
dispersed across the whole country.
Multistage sampling is adopted in a number of situations:
Sampling frames may not be available for all the ultimate observational units in the entire
population.
A multistage sampling plan may be more convenient than a single stage sample of the
ultimate units, as the cost of surveying and supervising a large scale survey can be very
high due to travel, identification and contact, etc.
It can be a convenient means of reducing response errors and improving sampling efficiency
by reducing the intra-class correlation coefficient observed in natural sampling units.
In unstratified multistage sampling, the sample is selected in stages: the population is divided
into a number of PSUs, which are sampled; the selected PSUs are then subdivided into a number
of smaller second stage units, which are again sampled; and the process continues until the ultimate
sampling units are reached. For multistage simple random sampling, the selection design at each
stage is SRS, with equal selection probabilities at each stage. For example, for a two stage simple
random sample the selection method is SRS at the first stage with equal probability (1/total PSUs)
and SRS at the second stage, again with equal probability (1/total SSUs); the method is described
in short as SRS/SRS. For multistage varying probability sampling with a two stage design, the
selection method could be probability proportional to size (PPS) at both sampling stages (PPS/PPS)
or PPS at the first stage and SRS at the second stage (PPS/SRS); a similar procedure can be followed
for more than two stages.
A basic principle of scientific sampling is that every sampling unit must have a known, positive
probability of being selected. Where the probabilities are equal, the sample design is known as
self-weighting and the formulae for calculating estimates are relatively straightforward. Where the
sample design is not self-weighting, the data relating to different sample units have to be
weighted.
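As a hedged illustration of self-weighting, consider a two stage SRS/SRS design. The symbols below are assumed for illustration and are not taken from these notes: n of the N PSUs are selected, and m_i of the M_i second stage units are selected within sampled PSU i. Under simple random sampling without replacement at both stages, the overall selection probability of a second stage unit in PSU i, and the corresponding design weight, would be:

```latex
\[
\pi_i = \frac{n}{N} \cdot \frac{m_i}{M_i},
\qquad
w_i = \frac{1}{\pi_i} = \frac{N}{n} \cdot \frac{M_i}{m_i}.
\]
```

The design is self-weighting when m_i/M_i is the same in every PSU, for example when all clusters have equal size M and the same number m is drawn from each, so that every unit carries the same weight NM/(nm).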
Self-weighting Design in two stage sampling
In multi-stage sampling, there are different sample designs to choose from. For example, for a two
stage sample with a constant sampling fraction, an appropriate sampling design would be SRS or
linear systematic sampling (LSS) at the first stage for the selection of PSUs, and again SRS or LSS
at the second stage to select the second stage units. When simple random sampling is used at both
stages, the design is denoted SRS/SRS.
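The two stage SRS/SRS selection described above can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `two_stage_srs`, the frame of enumeration areas and households, and the sample sizes are all hypothetical, and equal cluster sizes are assumed so that the design is self-weighting.

```python
import random

def two_stage_srs(clusters, n_psu, m_ssu, seed=None):
    """Select n_psu PSUs by SRS, then m_ssu units by SRS within each selected PSU.

    `clusters` maps a PSU label to the list of its second stage units.
    """
    rng = random.Random(seed)
    # Stage 1: SRS without replacement of PSU labels.
    selected_psus = rng.sample(sorted(clusters), n_psu)
    sample = {}
    for psu in selected_psus:
        # Stage 2: SRS without replacement of SSUs inside the selected PSU.
        sample[psu] = rng.sample(clusters[psu], m_ssu)
    return sample

# Hypothetical frame: 6 enumeration areas (PSUs), each listing 10 households (SSUs).
frame = {f"EA{i}": [f"EA{i}-HH{j}" for j in range(1, 11)] for i in range(1, 7)}
sample = two_stage_srs(frame, n_psu=3, m_ssu=4, seed=1)

# With equal cluster sizes every household has the same overall selection
# probability, (3/6) * (4/10), so no differential weighting is needed.
overall_probability = (3 / 6) * (4 / 10)
```

Fixing `seed` makes the draw reproducible, which is useful for documenting how a sample was selected.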
Simple random sampling (SRS):
It is the simplest kind of sampling method. It requires as a sampling frame a list of sampling units
(households, farmers, institutions, or whatever else is being used) in any convenient order. The
items listed must be numbered in sequence, starting from one for the first item at the head of the
list and continuing up to the number of items listed. A table of random numbers is needed to
obtain a random selection of these numbers; the items that have been given the selected
numbers form the sample chosen for the survey. The use of random numbers ensures that the
sample units are chosen entirely by chance, without being influenced by any person's unconscious
preferences. In a table of random numbers, each number within the range has an equal chance or
probability of selection. Since each element in the sample frame is given one number, each unit has
an equal chance of selection for the sample. The sampling could be performed with or without
replacement.
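As a rough illustration of the procedure, the sketch below replaces the table of random numbers with Python's pseudo-random generator. The frame of 20 households and the function name `simple_random_sample` are hypothetical, and the draw is made without replacement.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of n units without replacement from a list frame."""
    rng = random.Random(seed)
    # Number the listed units 1..N, as the text describes.
    numbered = dict(enumerate(frame, start=1))
    # Draw n distinct serial numbers; every number is equally likely to be chosen.
    chosen_numbers = rng.sample(range(1, len(frame) + 1), n)
    return [(number, numbered[number]) for number in sorted(chosen_numbers)]

households = [f"household_{i}" for i in range(1, 21)]  # hypothetical frame, N = 20
srs = simple_random_sample(households, n=5, seed=42)
```

For sampling with replacement, `rng.choices(range(1, N + 1), k=n)` could be used instead, in which case the same unit may be drawn more than once.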
Linear systematic sample (LSS): It is operationally a convenient method of selecting a sample.
In a systematic sample we decide the sample size n from a population of size N. In this case,
however, the population has to be ordered in some way, such as points along a path or in simple
numerical order. We choose a starting point in the sequence by selecting the rth unit from one end
of the sequence, where r is chosen at random between 1 and k. We then take
the rest of the sample by repeatedly adding k to r, where k is an integer equal to N/n, or to the
next lowest integer below N/n if this division does not produce a whole number. We do this until we
reach the end of the sequence. One way of envisioning a systematic sample is to think of the sample
frame as a 'row' of units, and the sample as a sequence of equally spaced 'stops' along the row, as
shown below.
Population (N): 1, 2, ..., N
Sample (n): r, r + k, r + 2k, ..., r + (n - 1)k (until the desired sample size n is reached)
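The selection rule can be written out as a short Python sketch. The function name and the example values N = 100 and n = 10 are assumptions chosen for illustration; the sketch assumes n is no larger than N.

```python
import random

def linear_systematic_sample(N, n, seed=None):
    """Return the serial numbers selected by linear systematic sampling."""
    rng = random.Random(seed)
    k = N // n             # sampling interval: integer part of N/n
    r = rng.randint(1, k)  # random start between 1 and k, inclusive
    # Selected units: r, r + k, r + 2k, ..., r + (n - 1)k
    return [r + j * k for j in range(n)]

selected = linear_systematic_sample(N=100, n=10, seed=7)
```

Because the last selected number is at most nk, which cannot exceed N, the selection never runs past the end of the frame.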
Cluster sampling: Clusters can be defined as sampling units containing several elements that occur
in groups naturally or are formed artificially. A cluster has listing units associated with it; the
units can be geographical, temporal, or spatial in nature. Thus, cluster sampling can be defined as
any sampling plan that uses a frame consisting of clusters of listing units. In single stage cluster
sampling, we select a sample of clusters and completely cover all units within the selected clusters.
Clusters can be selected by a variety of sampling techniques. For example, we can select a sample
of clusters by SRS, by systematic sampling, or by sampling with PPS.
The important reasons for using cluster sampling are feasibility and economy. If the only sampling
frames readily available for the target population are lists of clusters, then the only feasible
method of sampling is cluster sampling. This is often the case in surveys of human populations, where
compiling lists of all households for the purpose of the survey rarely seems feasible in terms of
time and resources. Listing costs and traveling costs are almost always lowest for cluster sampling.
The disadvantage of cluster sampling is that the standard errors of estimates obtained from this
design are often high compared with those obtained from samples of the same number of listing
units chosen by other sampling designs. Therefore, one can choose the sampling design that gives
the lowest possible standard errors at a specified cost or, conversely, the sampling design that
yields, at the lowest cost, estimates having specified standard errors.
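A minimal sketch of single stage cluster sampling follows. It assumes clusters are selected by SRS without replacement and uses the standard expansion estimator of a population total, (N/n) times the sum of the sampled cluster totals. The villages, their values, and the function name are hypothetical.

```python
import random

def cluster_sample_total(cluster_values, n_clusters, seed=None):
    """Select clusters by SRS, observe every unit in them, and estimate the population total."""
    rng = random.Random(seed)
    N = len(cluster_values)  # number of clusters in the frame
    chosen = rng.sample(sorted(cluster_values), n_clusters)
    # Complete coverage: every listing unit in a selected cluster contributes.
    cluster_totals = [sum(cluster_values[c]) for c in chosen]
    # Expansion estimator: scale the sampled cluster totals by N / n.
    estimate = N / n_clusters * sum(cluster_totals)
    return chosen, estimate

# Hypothetical frame: four villages with a household-level count for each household.
villages = {"V1": [3, 5, 4], "V2": [6, 2], "V3": [4, 4, 5, 3], "V4": [7, 1, 2]}
chosen, estimated_total = cluster_sample_total(villages, n_clusters=2, seed=3)
```

When all clusters are selected, the estimate reduces to the true total, which is a convenient sanity check on the estimator.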
Stratified random sampling:
Definition: The process of splitting the sample to take account of possible sub-populations is called
stratification, and such techniques are called stratified sampling methods.
Stratified sampling is a technique that uses any relevant information that might be available
in order to increase efficiency. Stratified sampling involves the division or stratification of a
population by partitioning the sampling frame into non-overlapping and relatively homogeneous
groups called strata. The selection of samples can be performed independently in each of those
strata.
Stratified random sampling is a sampling plan in which a population is divided into L mutually
exclusive and exhaustive strata, and a simple random sample of n_h elements is taken separately and
independently within each stratum. Let N_1, N_2, ..., N_L represent the number of sampling units
within each stratum, and n_1, n_2, ..., n_L represent the number of randomly selected sampling units
within each stratum. Then the total number of possible stratified random samples is equal to
\[ \binom{N_1}{n_1} \binom{N_2}{n_2} \cdots \binom{N_L}{n_L} \le \binom{N}{n}, \]
where N = N_1 + N_2 + ... + N_L and n = n_1 + n_2 + ... + n_L.
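As a small numerical check of this count, the sketch below multiplies the binomial coefficients for each stratum and compares the result with the number of unrestricted simple random samples of the same total size. The stratum sizes (6 and 4) and sample sizes (2 and 1) are made-up values.

```python
from math import comb, prod

def count_stratified_samples(stratum_sizes, sample_sizes):
    """Number of distinct stratified random samples: product of C(N_h, n_h) over strata."""
    return prod(comb(N_h, n_h) for N_h, n_h in zip(stratum_sizes, sample_sizes))

N_h = [6, 4]  # hypothetical stratum sizes
n_h = [2, 1]  # hypothetical sample sizes within each stratum

stratified = count_stratified_samples(N_h, n_h)  # C(6, 2) * C(4, 1) = 15 * 4 = 60
unrestricted = comb(sum(N_h), sum(n_h))          # C(10, 3) = 120
```

Stratification restricts the set of possible samples to those with the prescribed number of units in each stratum, which is why the stratified count is the smaller of the two.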
Stratified random sampling, in particular, involves dividing the population into strata, and then
selecting simple random samples from each of the strata. Stratification variables may be geographic
(region, province, rural/urban, zone) or non-geographic (income, age, sex, number of employees, etc.).
It should be kept in mind that stratification is limited only to those items of information, which are
available on the frame.
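A sketch of stratified random selection is given below, assuming the stratification variable is recorded on the frame as the text requires. The frame of 100 units labelled urban or rural, the sample sizes, and the function name are hypothetical.

```python
import random

def stratified_random_sample(frame, stratum_of, sample_sizes, seed=None):
    """Partition the frame into strata and draw an independent SRS within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        # Group units by the stratification variable recorded on the frame.
        strata.setdefault(stratum_of(unit), []).append(unit)
    # Independent SRS without replacement in every stratum.
    return {h: rng.sample(units, sample_sizes[h]) for h, units in strata.items()}

# Hypothetical frame: (serial number, residence) pairs, 30 urban and 70 rural units.
frame = [(i, "urban" if i <= 30 else "rural") for i in range(1, 101)]
sample = stratified_random_sample(
    frame,
    stratum_of=lambda unit: unit[1],
    sample_sizes={"urban": 3, "rural": 7},
    seed=11,
)
```

The sample sizes used here are proportional to the stratum sizes (10 percent in each stratum); other allocations only require changing `sample_sizes`.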
The Purpose of Stratified Sampling
Stratified sampling is used in certain types of surveys because it combines the conceptual simplicity
of simple random sampling with potentially significant gains in reliability. Basically there are four
major reasons for resorting to stratification:
1. The principal objective of stratification is to reduce sampling error. Under certain
conditions the variances of the sample estimates may be decreased, which means precision
may be increased over simple random sample.
In a stratified sample, the sampling error depends on the population variance existing within
the strata but not between the strata. For this reason it pays to create strata with low
internal variability. It follows that the more homogeneous the groups, the greater the
precision of the sample estimate.
2. In some cases, separate estimates are required at the stratum level. For example, in
household surveys estimates may be required by province, income group, occupation, age
group, urban size group, educational category, etc.
3. Stratified sampling is administratively convenient. It can enable a survey organization to
control the distribution of fieldwork among its regional offices. Also, for large complex
surveys, it can facilitate sample design work by enabling such work to be carried out within
operationally manageable units.
4. Sometimes, different parts of the population may call for different sampling procedures.
For example :
Frames may be incomplete: lists may be available for different parts of
the population, and in this case an incomplete list will need to be supplemented.
Physical distribution of a population such as private households, institutions (military
camps, prisons and hostels), densely or sparsely populated areas may call for different
procedures. A different procedure may be used to sample persons in sparsely populated
rural areas than that used for the more densely populated urban areas.
The diverse nature of the sampling units of the population may call for different
procedures. For instance, a study of employees may require stratification into literate and
illiterate groups, and a study of young people between 14 and 18 years may require separating
those at school from those who are not at school.
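Reason 1 above can be illustrated with the standard textbook variance of the stratified sample mean under simple random sampling without replacement within each stratum. The symbols W_h (stratum weight) and S_h^2 (variance of the study variable within stratum h) are introduced here for illustration and do not appear elsewhere in these notes:

```latex
\[
\operatorname{Var}(\bar{y}_{st})
  = \sum_{h=1}^{L} W_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h},
\qquad
W_h = \frac{N_h}{N}.
\]
```

Only the within-stratum variances S_h^2 enter the formula, which is why strata with low internal variability give more precise estimates.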
The strategy employed for constructing strata involves, first, determining the population parameter
we are interested in estimating and, second, stratifying the population with respect to another
variable that is thought to be associated with the variable of interest. For example, in a farm
expenditure survey, it is desirable to stratify the population of farms on the basis of information
that is highly related to the cost of running a farm. One may consider size (acreage) and type of
farming (crops, livestock, other, etc.) as appropriate stratification variables.
The major disadvantage of stratified sampling is that it may take more time to select the sample
than would be the case for simple random sampling. More time is involved because complete
frames are necessary within each of the strata and each stratum must be sampled.
Chapter 4
4 Preparation of Sampling Frames
4.1 Definition
In its simplest form, a sampling frame is a listing of the units from which the sample selection is
to be made at any stage of sampling. The units in the frame may be either areas or units of objects
covering the items being investigated in a survey. The units in the frame may be large or small
areas, households, persons, farms, or any identifiable items, and the frame is accordingly known as
an area frame or a list frame.
The frame consists of materials, procedures, and devices that identify, distinguish, and allow access
to the elements of the target population. The frame is composed of a finite set of units to which the
probability sampling scheme is applied. Rules or mechanisms for linking the frame units to the
population elements are an integral part of the frame. The frame also includes auxiliary information
(measure of size, demographic information) used for special sampling techniques, such as
stratification and probability proportional to size sample selections, or special estimation techniques
like ratio or regression estimation.
In multistage sampling the sampling units used at the first stage of sampling are called primary
sampling units (PSUs). Those used at the final (ultimate) stage are called ultimate sampling units
(USUs). In designs with three or more stages, units used for the intermediate stages are called
secondary or second stage sampling units (SSUs), third stage sampling units and so on. Therefore,
for surveys with multistage sample designs, a frame is needed for each stage of selection.
For example, for the three stage design the sampling units for household survey are:
PSUs: districts (woredas)
SSUs: EA (kebeles)
USUs: housing units (households).
Any sampling frame used for the first stage of selection must cover the entire survey population
(the designated PSUs). At subsequent stages of selection, frames are needed only for the sample
units selected at the preceding stage. In the above case, a list of districts (woredas) would be
needed for first stage sample selection. Lists of EAs (kebeles) would be needed for the second stage,
but only for the sample districts (woredas). For the final stage, lists of housing units (households)
are required only for the sample EAs (kebeles). In this study the term secondary sampling frame will
be used for frames that are developed specifically for the second and subsequent stages of sample
selection.
Non-area units include housing units, households, persons, nomadic tribes, institutions,
construction camps, and other items, and these units must have a clear definition.
Coverage: The coverage objective of the frame or frames used for a survey is to provide access to
all of the elementary units in the survey population, and to do so in such a way that every one of
those units has a known (or knowable) probability of selection in the sample for the survey.
Access is achieved by sampling from the frames, usually through two or more stages of selection,
and by the use of rules of association that link elementary units to the units that were selected at
the final stage of selection, i.e., the USUs.
Media: Sampling frames may be stored on either print or electronic media. For a frame stored on an
electronic medium, it is relatively easy to produce a printout of the entire frame or any portion
desired, and to organize it in any desired format.
Content: The frame contains a record for each frame unit. The only item that is absolutely
indispensable is a unique identifier for each unit. If a unit is selected, the numerical identifier
provides the means of access to the unit in order to perform subsequent sampling operations or to
collect survey data. The numerical identifier will be linked with other identifiers such as place
names or addresses of housing units, either in the frame itself or on maps or other auxiliary
materials.
Additional information: There are a number of possible reasons for collecting additional information
during the construction of a sampling frame. One arises when the definition of the universe or the
sampling unit to be covered is rather complicated to apply under field conditions; in that case
classificatory information is gathered during the frame listing, and the final decision as to which
units are to be excluded or included can be made at a later stage. Another common reason is
stratification and allocation, for which the stratifying information must be gathered and
recorded during the frame listing.
error and the cost of producing survey estimates; the most efficient survey design is the one that
produces the desired level of precision at the lowest possible cost. Perhaps the most important of
these properties is the inclusion of accurate and up-to-date supplemental information for each frame
unit. Measures of size, such as population, number of households, number of agricultural holders and
other measures of size, are useful. Measures of size can be used in the following ways:
To construct sampling units
To form strata of units classified by size
To determine the allocation of sample PSUs to strata
To select units with probability proportionate to size (PPS)
As auxiliary variables for ratio or regression estimates.
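One of the uses listed above, selection with probability proportionate to size, can be sketched with the cumulative total method. This is a hedged illustration that assumes sampling with replacement and integer measures of size; the enumeration areas, their household counts, and the function name are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_with_replacement(sizes, n, seed=None):
    """Select n units with probability proportional to size (cumulative total method)."""
    rng = random.Random(seed)
    labels = list(sizes)
    # Running totals of the measures of size, in frame order.
    cumulative = list(accumulate(sizes[label] for label in labels))
    total = cumulative[-1]
    selected = []
    for _ in range(n):
        # Draw a random integer between 1 and the total size, then pick the unit
        # whose cumulative range contains it.
        draw = rng.randint(1, total)
        selected.append(labels[bisect_left(cumulative, draw)])
    return selected

# Hypothetical measures of size: number of households in each enumeration area.
ea_sizes = {"EA1": 120, "EA2": 80, "EA3": 300, "EA4": 500}
pps_sample = pps_with_replacement(ea_sizes, n=2, seed=5)
```

On each draw a unit is chosen with probability equal to its size divided by the total size, so EA4 is selected with probability 500/1000 in this example.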
Other properties of frame that facilitate the use of efficient sample designs include:
Choice of sampling units available- organizing the frame units in a hierarchical structure
and assigning identifiers to frame units.
Good quality maps of units available- showing the boundaries of each unit
Easy to manipulate/ process- computerization of the frame
c) Cost Related Properties
The preparation of sampling frames can be an expensive exercise. Low cost of frame development
can best be achieved by treating the development, maintenance and updating of frames for census
and household surveys as a single integrated ongoing process. If two alternative frame sources
would result in the same quality and efficiency, the one with lower cost of development, use and
maintenance would obviously be preferred, i.e., low cost of acquisition/preparation, low cost of use,
and low cost of maintenance. The choice of frame for a survey must be based on assessing the cost of
using that frame and the total error of the survey estimates when that particular frame is used.
Above all, the cost of frame preparation must be considered at the planning stage and must be
budgeted for. These costs are likely to be a significant proportion of the total cost and relate to
an element of the survey work which is critically important in determining the eventual quality of
the survey results.
In summary, the sampling frame plays a central role in the design of a sample survey. It determines
how well a population is covered, affects the method of enumeration and influences the efficiency
with which a sample is designed. A frame becomes more valuable if it contains some supplementary
information that can be used to improve sampling and estimation procedures.
The structure of the frame, the information it contains, and the quality of that information will
determine the type of sample designs and estimation procedures that can be used in a survey.
Simple frames lacking auxiliary information support simple sample designs. For example, if the list
contains no information other than the identity of the elements, typically very simple sample
designs are used for selecting the sample. A simple random sample may be selected, or if the list is
large, a systematic sample or a systematic sample of clusters may be used.
Many sample designs use auxiliary data to produce more efficient samples. Complex sample
designs that are more efficient than simple random sampling, such as those employing stratification,
probability proportional to size sample selection, or special estimation techniques such as ratio and
regression estimators, require additional information beyond the identity of the target elements.
The sampling frame must be accurate and free from defects. It should be exhaustive (no units
omitted), non-repetitive, and up to date (a current or fresh list must be available); the units
should be clearly identifiable without ambiguity, and the units in the list must be traceable in the
field.
Chapter 5
5 Sample Design
5.1 Sampling Methods
The general aim of all sampling methods is to obtain a sample that is representative of the target
population. By this we mean that, as much as possible, the information derived from the sample
survey is the same as we would find if we carried out a census of the target population, allowing for
inevitable variation in the estimates due to imprecision.
When selecting a sampling method we need some minimal prior knowledge of the target
population; with this and some reasonable assumptions we can estimate the sample size required to
estimate population characteristics with acceptable precision and accuracy.
How we actually decide which sampling units will be chosen makes up the sampling method.
Sampling methods can be categorized according to the approach they take to the probability of a
particular unit being included. Most sampling methods attempt to select units such that each has a
definable probability of being chosen. Moreover, most of these methods also attempt to ensure that
each unit has the same chance of being included as every other unit in the sample frame. All
methods that adopt this general approach are called probability sampling methods.
The basis of probability sampling is the selection of sampling units to make up the sample based on
defining the chance that each unit in the sample frame will be included. If we have 100 units in the
frame and decide that we should have a sample size of 10, we can define the probability of each unit
being selected as one in ten, or 0.1 (assuming each unit has the same chance). As we shall see next,
there are various methods that we can use to select the units.
It is an important feature of probability sampling that each time we apply the same method to the
same sample frame we will generally obtain a different sample. For a finite population we can use
simple combinatorial arithmetic to calculate how many samples we can draw from a particular sample
frame such that no two samples are identical. It turns out that from any population of N objects we
can draw NCn = N!/(n!(N - n)!) different samples, each of which contains n sampling units. In fact,
in probability
sampling we are concerned with the probability of each sample being chosen rather than with the
probability of choosing individual units. If each sample is equally likely to be selected, then each
sampling unit automatically has the same chance of being included as every other sampling unit.
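The counting argument can be checked numerically for the example of 100 units and a sample of 10 used earlier. The sketch below assumes sampling without replacement with every sample equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

N, n = 100, 10

# Number of distinct samples of size n that can be drawn from N units.
number_of_samples = comb(N, n)

# Samples that contain one particular unit: the other n - 1 members are
# chosen from the remaining N - 1 units.
samples_with_unit = comb(N - 1, n - 1)

# If every sample is equally likely, a unit's inclusion probability is the
# share of samples that contain it, which simplifies to n / N.
inclusion_probability = Fraction(samples_with_unit, number_of_samples)
```

Here `inclusion_probability` equals 1/10, matching the value of 0.1 quoted for a sample of 10 from a frame of 100 units.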
of the economic or social group, or other classification of the population covered by the
survey;
III. A clear specification of the desired information to be collected in statistical terms, i.e., to
determine the data requirements.
IV. The level of breakdowns by which the results are to be tabulated; regions, age groups, sexes,
residences and any other economic and social classification.
V. The level of accuracy desired or the specification of tolerable errors: the accuracy of a
survey estimate is generally taken to mean the closeness of the estimate to an exact or “true
value”, which is nearly always unknown. The error of a particular survey estimate is the
difference between that estimate and the true value of the quantity being estimated. It arises
from sampling errors and non-sampling errors, both of which need due consideration at the design stage.
Since sample size determination requires the desired confidence level and
margin of error, it is important to specify these by weighing the cost against the precision
required.
VI. The kind of results expected, and the users as well as the uses of the data;
VII. Timeliness, i.e., how soon the results are needed: the utility of survey results falls off gradually
with the passage of time following the data collection stage of the survey. The rate at which
utility declines over time depends on the content and objective of the survey. For example,
the results of political polls relating to specific elections, of surveys for the revision of the
minimum wage, and of monthly labor force surveys are needed very quickly. Users of survey data often press for
timeliness at the expense of accuracy. Therefore, one has to produce timely data to
facilitate their actual use while maintaining the responsibility to produce accurate data.
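Item V above ties sample size to the desired confidence level and margin of error. As a minimal sketch of that link, the code below uses the common textbook formula for estimating a proportion under simple random sampling, n0 = z^2 p(1 - p)/e^2, with an optional finite population correction. The formula, the function name, and the default p = 0.5 (the most conservative choice) are assumptions made for illustration; these notes do not state which formula they use.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5, population=None):
    """Sample size needed to estimate a proportion under simple random sampling.

    Assumed formula: n0 = z^2 * p * (1 - p) / e^2, optionally reduced by the
    finite population correction n = n0 / (1 + (n0 - 1) / N).
    """
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)


# A 5% margin of error at 95% confidence, with and without a known population size
print(sample_size_for_proportion(0.05))
print(sample_size_for_proportion(0.05, population=10_000))
```

Halving the margin of error roughly quadruples the required sample, which is why precision has to be weighed against cost at the design stage.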
b) Sampling plan:
There are different ways of designing a sample survey, but the idea of an optimum design starts with
sampling features such as the selection process and the estimation procedure. The selection process
deals with the preparation of sampling frames, sample size determination, choice of design to be
used, and sample selection method. The estimation procedure involves the process for computing
the sample statistics and calculating the reliability of these estimates. The purpose is to develop a
sample design that would meet reliability requirements at the lowest possible cost, or alternatively,
to produce the most reliable estimates for a fixed expenditure of resources.
Balance the required level of precision against the resources available for conducting the
survey.
Give due consideration to the likely tradeoffs between sampling and non-sampling errors
(a larger sample size reduces sampling error, but it may have the effect of increasing
non-sampling errors).
The standard errors can be obtained by taking the square root of each variance. If a two-stage
design is used with PPS at the first stage and SRS at the second, the estimation procedure would be as follows. At the first stage, kebeles
are selected with PPS (the size measure being the number of households). At the second stage, a fixed sample of
households is drawn from each sample kebele by SRS or systematic sampling, i.e., mi = m is the same for all
sample kebeles (constant sample size). Then the estimation procedure would be:
Standard errors, which are obtained by taking the square root of the variance, can be used for
further estimation and evaluation.
Notes on the notation used:
n is the number of first-stage units in the sample
H is the total number of sub-units (households) in the survey population
mi is the number of second-stage sub-units (households) within the ith first-stage unit in the sample
yij is the observation for the jth sub-unit within the ith first-stage unit
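The estimation formulas for this design are not reproduced above. The sketch below therefore assumes the standard estimator for PPS selection with replacement at the first stage: each sample kebele gives its own estimate of the population total, H times its sample mean, and these are averaged over the n kebeles. The variance is estimated from the spread of those n estimates. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pps_two_stage_total(H, samples):
    """Estimate a population total from a two-stage PPS/SRS sample.

    H       -- total number of households in the survey population
    samples -- one list per sample kebele, holding the observations y_ij

    Assumes first-stage selection with replacement and probability
    proportional to the number of households (a labeled assumption).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two first-stage units are needed for a variance estimate")
    # Each kebele yields its own estimate of the population total: H * ybar_i
    kebele_estimates = [H * mean(y_i) for y_i in samples]
    total_hat = mean(kebele_estimates)
    # Between-kebele variance estimator for with-replacement PPS sampling
    var_hat = sum((z - total_hat) ** 2 for z in kebele_estimates) / (n * (n - 1))
    return total_hat, var_hat, sqrt(var_hat)


# Hypothetical data: 3 sample kebeles, 2 households each, H = 1000 households
total_hat, var_hat, se = pps_two_stage_total(1000, [[2, 4], [3, 5], [4, 6]])
print(total_hat, round(se, 2))
```

With a constant second-stage sample size, every sampled household carries the same weight under this estimator, so the estimated total is simply H times the overall sample mean.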
Chapter 6:
6 Methods of collecting the Data
6.1 Time Dimension in Survey
Surveys are classified into two types according to the time of data collection: longitudinal surveys
and cross-sectional surveys.
Longitudinal surveys gather information at different points in time in order to study changes over
an extended period of time. Three different designs are used in longitudinal surveys: panel studies,
trend studies, and cohort studies.
Panel studies are studies in which the same subjects are surveyed at different times over an
extended period. The investigator observes exactly the same people, groups, or organizations across time
periods. In a trend study, different people from the same general population are surveyed at
different times. In a cohort study, a specific population is followed over a length of time.
Cross-sectional surveys study a cross section (sample) of a population at a single point in time.
This is usually the simplest and least costly alternative. Its disadvantage is that it cannot capture social
processes or change.
Interviewing (face-to-face, telephone)
A face-to-face interview is a social process that involves the interviewer and the respondent. It is the
process in which the interviewer meets the respondents, explains the purpose of the study, puts forward
a set of questions and records the answers. It is widely used in economic and social surveys.
Information may be collected by interview for various reasons. It may be information which could
be measured directly but would require too much time or too great a use of manpower or funds to
do so, in which case the probably less accurate interview method is used instead. It may be information
that cannot be directly observed or measured because it relates to the past. It may be information
about the respondent’s own knowledge, opinions, perceptions or attitudes.
Some advantages of face-to-face interviews:
Face-to-face interviews have the highest response rate and permit the longest
questionnaires.
Interviewers control the sequence of questions and can use some probes.
Respondents are likely to answer all the questions themselves.
Interviewers can also observe the surroundings and can use nonverbal communication and
visual aids.
Well-trained interviewers can ask all types of questions, including complex questions.
The disadvantages of this method may include the following:
Cost is high: the training, travel, supervision, and personnel costs for interviewers can be
high.
Interviewer bias is also high in this method.
The appearance, tone of voice, question wording, and so forth of the interviewer may affect
the respondent.
The use of telephone interviewing for social surveys has increased substantially in developed countries
in recent years because of the high penetration of telephones. Its major advantages are
lower cost and faster completion, with a relatively high response rate. The phone permits the survey
to reach people who would not open their doors to an interviewer, but who might be willing to talk
on the telephone. There may be less interviewer bias and less social desirability bias than with
personal interviews. The main disadvantage of this method is that there is less opportunity for
establishing rapport with the respondent than in a face-to-face situation. Another disadvantage is that
households without telephones and those with unlisted numbers are automatically excluded from
the survey, which may bias results. Those who use call blocking may simply ignore
calls from the survey’s unfamiliar number.
We have to note that the use of this method is unpopular and very limited in developing countries.
CHAPTER 7:
For example: ‘Which crops do you grow?’ The question does not specify any particular season,
crops, or plots, and hence many answers are possible. It is open for discussion. ‘Why did you say you
would not buy imported cooking oil when it is available in the market?’ Again, this could be
discussed since the reasons could be quality, taste, price, etc.
The advantages of open-ended responses are:
They permit an unlimited number of possible answers, some of which may not have been considered at
the initial stage of the question design.
Respondents can answer in detail and can qualify and clarify responses by expressing them in
their own words.
Unanticipated findings can be discovered.
They permit creativity, self-expression, and richness of detail.
They may be used when there are too many response categories to list on a questionnaire.
They are useful when the questions are too complex to reduce to a few standard responses.
The disadvantages of open-ended responses are:
Much irrelevant information may be collected.
The answers are not standardized and are therefore difficult to compare and to analyze
statistically.
Coding responses is difficult.
They require a higher level of skill on the part of the data collector, since responses are
written verbatim.
More time, thought, and effort are necessary for completion.
The forms are often bulky because answers take up a lot of space in the questionnaire.
Closed-ended question
A closed-ended question is one where a predetermined list of alternative responses is presented to
the respondent for checking the appropriate one(s). It implies that the respondents’ answers are
restricted in some way to a limited range of alternatives. A closed-ended question falls into one of
two categories: dichotomous question and multiple-choice question.
A dichotomous question contains two alternatives in the predetermined list of responses.
Examples are yes-no, true-false, agree-disagree, like-dislike, fair-unfair and so on. A
multiple-choice question offers more than two responses in the predetermined list of alternative responses.
There are two categories of multiple-choice questions: the single-coded question, where the respondents
are permitted to check one and only one response; and the multi-coded question, where the respondent
may select as many responses as are applicable.
Example: a) Do you have a bank account? Yes = 1, No = 2
b) How many children have you ever given birth to?
1 = 1-2 2 = 3-4 3 = 5-6 4 = 7-8 5 = more than 8
c) Which type of soft drink(s) does your household consume?
1 = Pepsi-Cola 2 = Coca-Cola 3 = Mirinda 4 = Fanta 5 = Sprite
6 = Seven-up 7 = others, specify______________________.
d) Has the road construction activity had an impact on your access to public services (health,
education, market, etc.)? Yes = 1, No = 2
If the answer is ‘Yes’, explain the impact. _____________________
The choice can be made by making a mark alongside a category, by entering a numeric value, or by
selecting a code from a code list. Setting response categories requires skill and experience in the
area of study; coded categories are well suited to computer processing.
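To illustrate why coded categories suit computer processing, the sketch below stores the code lists of examples (a) and (c) as dictionaries and checks recorded answers against them. The dictionary and function names are hypothetical and are not part of any particular survey package.

```python
# Code lists taken from examples (a) and (c) above
BANK_ACCOUNT = {1: "Yes", 2: "No"}  # single-coded question
SOFT_DRINKS = {                     # multi-coded question
    1: "Pepsi-Cola", 2: "Coca-Cola", 3: "Mirinda", 4: "Fanta",
    5: "Sprite", 6: "Seven-up", 7: "Others",
}


def validate_single(code, code_list):
    """A single-coded answer must be exactly one code from the list."""
    return code in code_list


def validate_multi(codes, code_list):
    """A multi-coded answer is a non-empty set of distinct codes from the list."""
    return (
        len(codes) > 0
        and len(set(codes)) == len(codes)
        and all(code in code_list for code in codes)
    )


def decode(codes, code_list):
    """Translate recorded codes back into their category labels."""
    return [code_list[code] for code in codes]


print(validate_single(1, BANK_ACCOUNT))     # a valid single-coded answer
print(validate_multi([2, 8], SOFT_DRINKS))  # code 8 is not in the code list
print(decode([2, 5], SOFT_DRINKS))
```

Checks of this kind can catch some recording errors at data entry, for example a code that does not exist in the list, although they cannot detect a valid code entered for the wrong category.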
The advantages of closed response categories are:
It is easier and quicker for respondents to answer.
The answers of different respondents are easier to standardize and to compare.
The answers are easier to code and to analyze statistically.
The meaning of the question is often made clearer by the response categories.
The answers are relatively complete as long as all relevant categories are specified.
Respondents are more likely to answer questions about sensitive topics.
The disadvantages of closed responses are:
Respondents can guess at answers when they don’t know, since they have the categories to
guide them.
The appropriate category may be missing from the schedule.
Failure to understand the question is less easily detected than with an open-ended question.
A poorly planned list may act as a constraint, leaving correct answers not catered for.
Too few categories may fail to differentiate between important groups, and enumerator error
(placing the tick in the wrong box by accident) will be more common.
7.5 Question phrasing and common problems which arise with question phrasing
Another aspect of questionnaire design that needs serious consideration is phrasing of the question.
The information required should be well and clearly defined at each stage at which a question is
posed: initial definition and explanation in the survey manual; text in the questionnaire; precise
units for physical measurement; and verbal phrasing by the enumerator.
At each stage the question should have:
A clear meaning
The same meaning to every person asked and the researcher,
An answer which the respondent knows,
An answer which can be given clearly and unambiguously by the respondent.
a) Leading questions
A leading question is one whose wording leads the respondent to choose one response over another.
The presentation of questions should be neutral. The form of the question should not
indicate a preferred or ‘correct’ answer. For example, the questions ‘You don’t smoke, do you?’
and ‘Do you buy the fertilizer recommended by the extension worker?’ lead respondents to state
that they do not smoke in the first case, and to feel, in the second case, that they should buy the fertilizer
recommended by the extension worker and that they are wrong if they fail to do so.
b) Multiple questions
Multiple (double-barreled) questions are questions which combine two or more distinct questions
into one single question. For example: ‘Do you like listening to the radio and watching television?’
‘Do you have a tractor or plough?’ ‘Does this company have pension and health insurance benefits?’
In such cases the respondent would be confused and undecided as to which answer to offer.
The best way to avoid confusion is to replace double questions with two or more single questions
and then to ask only one question at a time.
c) Ambiguous question
Ambiguity, confusion, and vagueness must be avoided in a question, since otherwise different people will
understand the question differently and its interpretation will depend on the individual
respondent. The question ‘What is your income?’ could mean weekly, monthly, or annual; family
or personal; from salary or from all sources; for this year or last year. The question ‘Do you drink
beer frequently?’ is ambiguous because the word ‘frequently’ does not specify a fixed time
reference. Vague words and phrases like ‘kind of’, ‘fairly’, ‘generally’, ‘often’, ‘regularly’, etc.,
should be avoided.
d) Probing questions
Probing is not easy. A delicate balance has to be struck between persistence and rudeness. Very
often the respondent does not want to tell the truth. In some cultures it is socially acceptable to tell
lies to close friends, never mind strangers. The enumerator working on a repeated-visit survey has
to maintain a working relationship with the respondent and cannot permit the need to resolve minor
contradictions on a few questions to disrupt the relationship. In some cases unbelievable data have
to be accepted, and it is helpful if some method is agreed for the enumerator to draw attention to
this on the form.
e) Use simple language
The language of a question should be simple. The aim in the question wording is to communicate
with respondents as nearly as possible in their own language. Thus the wording of the question
must be appropriate to the respondent. Questions should avoid the use of technical terms and jargon,
which the respondent may not understand. Where it is necessary to use technical or legal terms, one
should provide definitions and explanations.
For example, instead of asking ‘Do you use inorganic fertilizer?’ it is better to specify types, brand names or
colloquial terms with which the respondent will be familiar. Also use terms which the respondent
will understand and which will not cause offence. For example, terms such as ‘peasant’, ‘tribe’ or
‘witchdoctor’ may cause offence.
f) Sensitive topics
In some cultures people do not like to discuss private matters openly. Sensitive questions are apt to
be irritating, threatening, or embarrassing to the respondent. Such questions are prone to normative
answers, that is, answers which confirm that the respondent acts within the social rules of society even if
that particular individual sometimes acts outside these rules. In a society which generally condemns
drunkenness, a question about drunkenness might generate denial even if drunkenness sometimes
does occur. Under these circumstances it may be useful to word the question so that there is some
assumption that the activity does take place. Thus, rather than ask ‘Do you ever get drunk?’ we might
ask ‘How often do you get drunk?’ The assumption in the question that the respondent might sometimes get
drunk may ease the guilt of the respondent and generate a more truthful answer.
Questions on age, physical or mental disability, deaths in households, income, sexual behavior,
and family planning are generally regarded as sensitive issues.
Special attention should be given during field testing of the questionnaire to identify particularly
sensitive questions and how they can be improved by rewording or better interviewing procedure.
CHAPTER 8
8.1 Pre–tests
It is difficult to plan a survey without a good deal of knowledge of its subject matter, the population
it is to cover, the way people will react to questions and even the possible answers they are likely to
give. Particularly for large-scale surveys, it should be the general rule to conduct pretests and pilot
surveys in order to answer the following questions.
How is one to estimate how long the survey will take, how many interviews will be needed,
and how much money it will cost?
How, without trial interviews, can one be sure that the questions will be as meaningful to the
average respondent as to the survey expert?
How is one to decide which questions are worth asking at all?
Pretests and pilot surveys are standard practice with professional survey bodies and are widely used
in research surveys.
The pretest is a preliminary application of the data gathering technique for the purpose of
determining its adequacy. This may take the form of a series of small pretests on isolated problems
of the design. For example, in the testing of questionnaires, pretesting refers to one or more series of
interviews conducted on successive drafts of the questionnaire for the purpose of identifying and
correcting errors and shortcomings. Its objective is to evaluate the general receptivity and feasibility
of the questionnaire, and to identify specific problems of communication between the interviewer and
the respondent in terms of specific questions or items of information sought.
If it is properly done, it is likely to lead to changes to the survey forms and manuals, and to the
procedures and organizational arrangements. It is therefore necessary to allow enough time to
analyze the results and observations from it, and produce revised materials and arrangements in
good time for the start of the main survey operations.
CHAPTER 9:
8.2 Preparing Budgets
Budget preparation involves the assignment of costs to each survey activity. The main expenditure
items include:
Office wages and salaries (administration, executive personnel, quality control, data
processing);
Survey materials;
Supervisory and interviewing costs (enumerators’, supervisors’ and field officers’ salaries
and allowances);
Supplies for the reproduction of questionnaires, forms and manuals, and other stationery;
Transport costs;
Computer services;
Sampling design costs;
Other administrative costs (office rentals, overheads recovery), etc.
Preparation of preliminary budget estimates is a priority activity that should be planned and
executed at an early stage. The budget will depend on the survey design, including the levels of
precision desired for various estimates, as well as on the geographical and other classifications for the
presentation of the results, and the operational conditions prevailing in the region.
2. Field personnel
a) Salaries
50 enumerators for 2 months at 400 birr per month 40,000
10 Field supervisors for 3 months at 600 birr per month 18,000
10 Drivers for 3 months at 350 birr per month 10,500
Sub-total 68,500
b) Allowances
50 Enumerators for 1.5 months at 25 birr per day 56,250
10 Field supervisors for 2 months at 30 birr per day 18,000
10 Drivers for 2 months at 25 birr per day 15,000
50 Guides for 2 days at 10 birr per day 1,000
Sub- total allowances 90,250
Total field personnel 158,750
4. Stationery
Printing of forms, questionnaires 12,000
Pens, pencils, sharpeners, erasers, rulers 500
Report production 1,500
Manuals 2,500
Total stationery 16,500
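The budget lines above are simple products of headcount, duration and rate, so they can be rechecked with a short script. This is a sketch under two assumptions that the table does not state: a month is counted as 30 days for daily allowances, and the guides are paid for 2 days, which is what the listed 1,000 birr implies.

```python
DAYS_PER_MONTH = 30  # assumption: allowances are paid for 30 days per month

# (item, number of people, months, birr per month)
salaries = [
    ("Enumerators", 50, 2, 400),
    ("Field supervisors", 10, 3, 600),
    ("Drivers", 10, 3, 350),
]

# (item, number of people, days, birr per day)
allowances = [
    ("Enumerators", 50, int(1.5 * DAYS_PER_MONTH), 25),
    ("Field supervisors", 10, 2 * DAYS_PER_MONTH, 30),
    ("Drivers", 10, 2 * DAYS_PER_MONTH, 25),
    ("Guides", 50, 2, 10),  # assumption: 2 days, matching the listed 1,000 birr
]

# Stationery amounts in birr, as listed in the table
stationery = [12_000, 500, 1_500, 2_500]

salary_total = sum(people * months * rate for _, people, months, rate in salaries)
allowance_total = sum(people * days * rate for _, people, days, rate in allowances)
field_personnel_total = salary_total + allowance_total
stationery_total = sum(stationery)

print("Salaries:", salary_total)
print("Allowances:", allowance_total)
print("Total field personnel:", field_personnel_total)
print("Total stationery:", stationery_total)
```

Keeping the unit costs in a structure like this makes it easy to revise the preliminary budget when the number of enumerators or the length of fieldwork changes.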