Intro To Sample Survey Lecture Note

Uploaded by

yonasante2121
Copyright
© © All Rights Reserved
1 Chapter 1

Chapter 2
2 Introduction to Sample Survey
2.1 Basic Concepts and Uses of Sample Surveys
A survey is a scientific study that deals with an existing population of units typified by persons,
institutions, or physical objects. A survey attempts to acquire knowledge by observing the
population as it naturally exists and making quantitative statements about aggregate population
characteristics. By stating a survey is the study of a population as it naturally exists, we mean to
exclude experimental studies in which the material to be studied is manipulated by the researcher
and the results observed. We will consider a survey to include censuses in which attempts are made
to study all members of the population and sample surveys in which a scientific sample of the
population is studied. Methods used in surveys are used in other areas of scientific study, and there
is no universally accepted definition for a survey. In some studies surveys include both observing
techniques and experimental treatments. In undertaking surveys, it is often difficult or even
impossible for researchers to study very large populations. Hence, they select a smaller portion,
a sample of the population, for study.
Researchers who conduct sample surveys use sampling techniques and use the information they collect
from the sample to make inferences about the population as a whole. When sampling is well done,
the inferences made concerning the population can be quite reliable. The sample survey is the most
widely used data collection technique for an extensive range of subjects, for both research and
administrative purposes. Sample surveys are used to develop, test, and refine research hypotheses in
disciplines such as sociology, social psychology, demography, political science, economics,
education, and public health. Central governments make considerable use of surveys to inform them
of the conditions of their populations in terms of employment and unemployment, income and
expenditure, housing conditions, education, nutrition, health, travel patterns, and many other
subjects. They also conduct surveys of organizations such as manufacturers, retail outlets, farms,
schools, and hospitals. Market researchers carry out surveys to identify markets for products, to
discover how the products are used and how they perform in practice and to determine consumer
reactions. Opinion polls keep track of the popularity of political leaders and their parties and
measure public opinion on a variety of topical issues.
2.2 Planning for sample survey
For a survey to yield the desired results, particular attention must be paid to the preparations that
precede the field work. All surveys require careful and judicious preparation if they
are to be successful. However, the amount of planning will vary depending on the type of survey and
the materials and information required. The development of an adequate survey plan requires
sufficient time and resources, and a planning cycle of two years is common for a complex survey.
The planning of a sample survey has three major steps.
A. Survey design and preparation
• Setting objectives of survey
• Sample design
• Preparation of sampling frames
• Decision on types of survey
• Preparing survey instruments
• Planning time table
• Survey budget proposal
• Conducting pilot survey
B. Data collection
• Organization of field work
• Locating respondents
• Collecting information
C. Survey analysis
• Preparation for processing (data files and structures, data checking, coding, data entry…)
• Performing statistical analysis (descriptive and inferential statistics)
• Presenting methods and findings in study report
2.3 Advantages and disadvantages of sample surveys vs. census
Censuses and sample surveys are the two major ways of collecting information for statistical
purposes. As described above, a sample survey refers to the gathering of data about characteristics of
interest from only a sample of the population, selected according to the rules of some statistical
technique. A census refers to the collection of information about characteristics of interest from all
units in the population under study. The advantages of a sample survey over complete coverage
(a census) have become obvious in recent years and need only be stated briefly.
1. Sample survey saves money.
It is possible to collect information from sample households and obtain estimates that
reasonably approximate the actual characteristics of a large population. It is obviously cheaper
to gather information from 500 households rather than from 10,000 households. Therefore, the
cost of obtaining information through a sample would be a lot less than obtaining it through a
census. Note, however, that the cost per unit is usually higher in a sample survey than in a census.
2. Sample survey saves labor.
A smaller staff is required both for fieldwork and for data processing. The number of
enumerators, supervisors, data editors, and data entry clerks required is much smaller in a
sample survey than in a census.
3. Sample survey saves time.
Sample survey requires a smaller scale of operations at all stages and it reduces data collection
and processing time.
4. Sample survey can provide a higher level of accuracy.
Because the operation is smaller, accuracy can be improved through more selective recruiting of
interviewers and supervisors, more extensive training programs, closer supervision of the personnel
involved, and more efficient monitoring of the field work.
5. Sample survey could be the only option for the study in some specialized areas.
For example, some information of a technical nature requires highly trained
personnel and specialized equipment, as in the medical field. Observation or experimentation may also
be destructive in nature, as when testing industrial products (for example, measuring the average
burning life of bulbs or testing the quality of wine, beer, etc.). In such cases sampling is the only
feasible means of study.
6. Other practical advantages
These include tests of census procedures, evaluation of the census, quality control at the field level
and at the data processing level, advance tabulation, and updating of census results.
2.4 Source of the data
Statistical data may be classified into two basic types: primary data and secondary data.
1. Primary (original) data refers to data collected to meet the specific needs of the problem
at hand. Primary data are collected by the immediate user(s) of the data expressly for the
experiment or survey being conducted. It is these data that we will normally be referring to when
we talk about "collecting data". The nature and type of primary data required depend
largely on the study objectives and vary from one field to another.
2. Secondary data refers to already existing information that has previously been collected and
reported by some individual or organization for their own purposes; at a later stage at least
some of that data is made available to other individuals and organizations.
Secondary data can be obtained rapidly and inexpensively. It may be of considerable value,
although the exact value will depend upon the type of study being carried out.
2.4.1 Uses of Secondary Data
Sometimes data requirements are met entirely from available secondary sources, in which case there
is no need to execute a survey and generate primary data. The general rule is to exhaust all possible
means of exploring secondary data before mounting a comprehensive plan for primary data. In
particular, secondary data may provide a context (geographic, temporal, social) for primary data,
may provide validation that allows us to assess the quality and consistency of the primary data, or
may act as a substitute for primary data.
Administrative records are the main source of secondary data, but there are various other internal and
external sources, such as records, reports, books, periodicals, newspapers, and academic studies. Apart
from saving time and cost, secondary data are less subject to intentional bias and may be the only
alternative when the information is inaccessible and impossible to gather through primary data
collection.
2.4.2 Advantages and disadvantages of secondary data
2.4.2.1 Advantages
• Secondary data are usually available more cheaply, and their collection is generally significantly
quicker and easier than collecting the same data "from scratch".
• Existing data are likely to be available in a more convenient form, involving dial-up access
rather than dust removal.
• They can give us access to otherwise unavailable organizations, individuals, or locations.
• They allow researchers to extend the "time base" of their study by providing data about the
earlier state of the system being studied.
• Secondary data are likely to be pre-processed, which eliminates the time-consuming analysis
stage.
2.4.2.2 Disadvantages
• The method by which secondary data were collected is often unknown to the user of the
data. This means that we are forced to rely on the skills and integrity of the people who
collected and analyzed the data.
• We may have little or no direct knowledge of the processing methods employed, and we
will rarely have access to the original raw data to check them, so accuracy cannot be verified.
• Secondary data quickly become outdated in an ever-changing environment.
• Classifications or measurement methods may differ from those the study requires.

Chapter 3
3 Multistage Sampling: Two-Stage Equal Cluster Sampling
3.1 Multi-stage Sample Design
For studies of large and geographically dispersed populations it is more convenient to use a multi-
stage sampling design. It is particularly appropriate where a large scale survey is to be conducted,
and where for logistic and organizational reasons it is convenient for the sample to be grouped
together in a more limited number of geographical areas, rather than being spread thinly and
dispersed across the whole country.
Multistage sampling is adopted in a number of situations:
• Sampling frames may not be available for all the ultimate observational units in the entire
population.
• A multistage sampling plan may be more convenient than a single-stage sample of the
ultimate units, as the cost of surveying and supervising a large-scale survey can be very
high due to travel, identification, contact, etc.
• It can be a convenient means of reducing response errors and improving sampling efficiency
by reducing the intra-class correlation coefficient observed in natural sampling units.
In unstratified multistage sampling, the sample is selected in stages: the population is divided
into a number of primary sampling units (PSUs), which are sampled; the selected PSUs are then
subdivided into a number of smaller second-stage units (SSUs), which are again sampled; and the
process continues until the ultimate sampling units are reached. For multistage simple random
sampling, the selection design at each stage is simple random sampling (SRS), with an equal
selection probability for every unit at that stage. For example, for a two-stage simple
random sample the selection method is SRS at the first stage with equal probability (1/total PSUs) and
SRS at the second stage, again with equal probability (1/total SSUs); this design is described
in short as SRS/SRS. For multistage varying probability sampling with a two-stage design, the
selection method could be probability proportional to size (PPS) at both sampling stages (PPS/PPS)
or PPS at the first stage and SRS at the second stage (PPS/SRS); a similar procedure can be followed
for more than two stages.
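The two-stage SRS/SRS selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name two_stage_srs, the village and household labels, and the sample sizes are all hypothetical and do not come from these notes.

```python
import random

def two_stage_srs(psus, m, n_per_psu, seed=None):
    """Select m PSUs by SRS, then n_per_psu SSUs by SRS inside each selected PSU.

    psus: dict mapping a PSU label to the list of its second-stage units.
    Both stages sample without replacement.
    """
    rng = random.Random(seed)
    selected_psus = rng.sample(sorted(psus), m)            # stage 1
    return {
        psu: rng.sample(psus[psu], min(n_per_psu, len(psus[psu])))  # stage 2
        for psu in selected_psus
    }

# Hypothetical population: 6 villages (PSUs) with 8 households (SSUs) each.
population = {
    f"village_{i}": [f"v{i}_hh{j}" for j in range(1, 9)] for i in range(1, 7)
}
sample = two_stage_srs(population, m=3, n_per_psu=4, seed=1)
for psu, households in sample.items():
    print(psu, households)
```

Only the selected villages need a household list, which is the practical attraction of sampling in stages.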
A basic principle of scientific sampling is that every sampling unit must have a known, positive
probability of being selected. Where the probabilities are equal, the sample design is known as
self-weighting and the formulae for calculating estimates are relatively straightforward. Where the
sample design is not self-weighting, the data relating to different sample units have to be
weighted.
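To make the idea of self-weighting concrete, the sketch below computes selection probabilities and the corresponding weights (taken here as the inverse of the selection probability, a standard convention that these notes do not state explicitly) for two hypothetical two-stage SRS/SRS designs. The symbols M, m, N_i, n_i and all the numbers are assumptions made for illustration.

```python
from fractions import Fraction

def inclusion_probability(M, m, N_i, n_i):
    """P(unit selected) = P(its PSU is selected) * P(unit selected | PSU selected)."""
    return Fraction(m, M) * Fraction(n_i, N_i)

# Hypothetical design: M = 50 PSUs in the population, m = 10 of them sampled.
M, m = 50, 10
# (PSU size N_i, units sampled n_i) for three hypothetical sampled PSUs
constant_fraction = [(200, 20), (120, 12), (80, 8)]   # 10% taken in every PSU
fixed_take = [(200, 10), (120, 10), (80, 10)]         # 10 units taken per PSU

def weights(design):
    """Weight of a sampled unit = 1 / its selection probability."""
    return [1 / inclusion_probability(M, m, N_i, n_i) for N_i, n_i in design]

def is_self_weighting(design):
    """A design is self-weighting when every unit carries the same weight."""
    return len(set(weights(design))) == 1

print(weights(constant_fraction), is_self_weighting(constant_fraction))
print(weights(fixed_take), is_self_weighting(fixed_take))
```

With a constant second-stage sampling fraction every unit has weight 50, so no weighting is needed in estimation; with a fixed take per PSU the weights differ between PSUs and must be applied.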
Self-weighting design in two-stage sampling
In multistage sampling, there are different sample designs to choose from. For example, for a
two-stage sample with a constant sampling fraction, an appropriate sampling design would be SRS or
linear systematic sampling (LSS) at the first stage for selection of PSUs and again SRS or LSS at
the second stage to select the second-stage units. When simple random sampling is used at both
stages, the design is denoted SRS/SRS.
Simple random sampling (SRS):
It is the simplest kind of sampling method. It requires as a sampling frame a list of sampling units
(households, farmers, institutions, or whatever else is being used) in any convenient order. The
items listed must be numbered in sequence, starting from one for the first item at the head of the list
and continuing up to the total number of items listed. A table of random numbers is then used to
obtain a random selection of these numbers, and the items that have been given the selected
numbers form the sample chosen for the survey. The use of random numbers ensures that the
sample units are chosen entirely by chance, without being influenced by any person's unconscious
preferences. In a table of random numbers, each number within the range has an equal chance or
probability of selection. Since each element in the sampling frame is given one number, each unit has
an equal chance of selection for the sample. The sampling could be performed with or without
replacement.
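The numbering-and-drawing procedure can be sketched as follows, with Python's pseudo-random generator standing in for the table of random numbers. The function name simple_random_sample and the household labels are hypothetical.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Number the frame items 1..N and draw n serial numbers at random."""
    rng = random.Random(seed)
    numbered = dict(enumerate(frame, start=1))        # serial number -> item
    if replace:
        chosen = [rng.randint(1, len(frame)) for _ in range(n)]
    else:
        chosen = rng.sample(range(1, len(frame) + 1), n)
    return [(serial, numbered[serial]) for serial in chosen]

# Hypothetical frame of 20 households listed in any convenient order.
frame = [f"household_{i}" for i in range(1, 21)]
srs = simple_random_sample(frame, 5, seed=42)
print(srs)
```

Passing replace=True allows the same serial number to be drawn more than once, which corresponds to sampling with replacement.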
Linear systematic sampling (LSS): it is an operationally convenient method of selecting a sample.
In a systematic sample we decide on the sample size n to be drawn from a population of size N. In
this case, however, the population has to be organized in some way, such as points along a path or in
simple numerical order. We choose a starting point in the sequence by selecting the rth unit from one
end of the sequence, where r is chosen randomly between 1 and k. We then take
the rest of the sample by repeatedly adding k to r, where k is an integer equal to N/n, or to the next
lowest integer below N/n if the division does not produce a whole number. We do this until we
reach the end of the sequence. One way of envisioning a systematic sample is to think of the sampling
frame as a "row" of units, and the sample as a sequence of equally spaced "stops" along the row, as
shown below.
Population units: 1, 2, ..., N
Selected units: r, r+k, r+2k, ..., r+(n-1)k (n units in total)
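A minimal sketch of linear systematic selection is given below. The function name and the values N = 103 and n = 10 are hypothetical; the sketch stops after n units, so when N is not a multiple of n a few units at the end of the list can never be selected.

```python
import random

def linear_systematic_sample(N, n, seed=None):
    """Return the serial numbers r, r+k, ..., r+(n-1)k with k = floor(N/n)."""
    k = N // n                       # sampling interval
    rng = random.Random(seed)
    r = rng.randint(1, k)            # random start between 1 and k inclusive
    return [r + j * k for j in range(n)]

lss = linear_systematic_sample(N=103, n=10, seed=5)
print(lss)
```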
Cluster sampling: clusters can be defined as sampling units containing several elements that occur
in groups naturally or are formed artificially. A cluster has listing units associated with it, and the
units can be geographical, temporal, or spatial in nature. Thus, cluster sampling can be defined as
any sampling plan that uses a frame consisting of clusters of listing units. In single-stage cluster
sampling, we select a sample of clusters and completely cover all units within the selected clusters.
Clusters can be selected by a variety of sampling techniques. For example, we can select a sample
of clusters by SRS, by systematic sampling, or by sampling with PPS.
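The following sketch illustrates single-stage cluster sampling with clusters chosen by SRS. The village data are invented, and the estimator of the population total, (number of clusters in the population / number of clusters sampled) × (sum of sampled cluster totals), is the usual expansion estimator rather than a formula taken from these notes.

```python
import random

def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Select clusters by SRS and keep every unit inside each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: clusters[c] for c in chosen}

def estimate_total(sampled, total_clusters):
    """Expansion estimator of the population total from sampled cluster totals."""
    cluster_totals = [sum(values) for values in sampled.values()]
    return total_clusters / len(sampled) * sum(cluster_totals)

# Hypothetical household sizes in four villages (clusters); true total is 38.
clusters = {"A": [3, 4, 5], "B": [2, 6], "C": [4, 4, 4, 1], "D": [5]}
sampled = one_stage_cluster_sample(clusters, 2, seed=3)
print(sampled, estimate_total(sampled, total_clusters=len(clusters)))
```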
The important reasons for using cluster sampling are feasibility and economy. If the only sampling
frames readily available for the target population are lists of clusters, then the only feasible method
of sampling is cluster sampling. This is often the case for surveys of human populations, where
compiling a complete list of households for the purpose of the survey is rarely feasible in terms of
time and resources. Listing costs and traveling costs are almost always lowest for cluster sampling.
The disadvantage of cluster sampling is that the standard errors of estimates obtained from this
design are often high compared with those obtained from samples of the same number of listing
units chosen by other sampling designs. Therefore, one should choose the sampling design that gives
the lowest possible standard errors at a specified cost or, conversely, the sampling design that yields,
at the lowest cost, estimates having specified standard errors.
Stratified random sampling:
Definition: The process of splitting the sample to take account of possible sub-populations is called
stratification, and such techniques are called stratified sampling methods.
Stratified sampling is a technique that uses any relevant information that might be available
in order to increase efficiency. Stratified sampling involves the division or stratification of a
population by partitioning the sampling frame into non-overlapping and relatively homogeneous
groups called strata. The selection of samples can be performed independently in each of those
strata.
Stratified random sampling is a sampling plan in which a population is divided into L mutually
exclusive and exhaustive strata, and a simple random sample of n_h elements is taken separately and
independently within each stratum h. Let N_1, N_2, ..., N_L represent the number of sampling units
within each stratum, and n_1, n_2, ..., n_L represent the number of randomly selected sampling units
within each stratum, with N = N_1 + ... + N_L and n = n_1 + ... + n_L. Then the total number of
possible stratified random samples is
\[ \binom{N_1}{n_1} \binom{N_2}{n_2} \cdots \binom{N_L}{n_L} \le \binom{N}{n}, \]
where \binom{N}{n} is the number of possible simple random samples of size n drawn without
replacement from the whole population.
Stratified random sampling, in particular, involves dividing the population into strata, and then
selecting simple random samples from each of the strata. Stratification variables may be geographic
(region, province, rural/urban, zone) or non-geographic (income, age, sex, number of employees, etc.).
It should be kept in mind that stratification is limited to those items of information that are
available on the frame.
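As a small numerical check of the count of possible stratified random samples given above, the sketch below multiplies the within-stratum binomial coefficients and compares the result with the number of simple random samples of the same total size. The stratum sizes (4 and 6) and sample sizes (2 and 3) are hypothetical.

```python
from math import comb, prod

def n_stratified_samples(stratum_sizes, sample_sizes):
    """Product over strata of C(N_h, n_h)."""
    return prod(comb(N_h, n_h) for N_h, n_h in zip(stratum_sizes, sample_sizes))

stratum_sizes = [4, 6]     # N_1, N_2 (hypothetical)
sample_sizes = [2, 3]      # n_1, n_2 (hypothetical)

n_strat = n_stratified_samples(stratum_sizes, sample_sizes)   # C(4,2) * C(6,3)
n_srs = comb(sum(stratum_sizes), sum(sample_sizes))           # C(10,5)
print(n_strat, n_srs)
```

Stratification rules out the samples that over- or under-represent a stratum, which is why there are fewer possible stratified samples than simple random samples.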
The Purpose of Stratified Sampling
Stratified sampling is used in certain types of surveys because it combines the conceptual simplicity
of simple random sampling with potentially significant gains in reliability. Basically there are four
major reasons for resorting to stratification:
1. The principal objective of stratification is to reduce sampling error. Under certain
conditions the variances of the sample estimates may be decreased, which means precision
may be increased over simple random sampling.
In a stratified sample, the sampling error depends on the population variance existing within
the strata but not between the strata. For this reason it pays to create strata with low
internal variability. It follows that the more homogeneous the groups, the greater the
precision of the sample estimate.
2. In some cases, separate estimates are required at the stratum level. For example, in
household surveys estimates may be required by province, income group, occupation, age
group, urban size group, educational category, etc.
3. Stratified sampling is administratively convenient. It can enable a survey organization to
control the distribution of fieldwork among its regional offices. Also, for large complex
surveys, it can facilitate sample design work by enabling such work to be carried out within
operationally manageable units.
4. Sometimes, different parts of the population may call for different sampling procedures.
For example:
• Frames may be incomplete: lists may be available for different parts of
the population, in which case an incomplete list will need to be supplemented.
• The physical distribution of a population, such as private households, institutions (military
camps, prisons, and hostels), and densely or sparsely populated areas, may call for different
procedures. A different procedure may be used to sample persons in sparsely populated
rural areas than that used for the more densely populated urban areas.
• The diverse nature of the sampling units of the population may call for different
procedures. For instance, a study of employees may require stratification into literate and
illiterate, and a study of young people between 14 and 18 years may require separating those at
school from those who are not at school.
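The first reason in the list above (precision depends on the variability within strata, not between them) can be illustrated numerically. The sketch assumes the standard textbook variance formulas for the sample mean under SRS without replacement and under stratified random sampling; these formulas and the rural/urban income figures are assumptions made for illustration and are not taken from these notes.

```python
from statistics import variance

def var_srs_mean(values, n):
    """Variance of the sample mean under SRS without replacement: (1 - n/N) S^2 / n."""
    N = len(values)
    return (1 - n / N) * variance(values) / n

def var_stratified_mean(strata, sample_sizes):
    """Sum over strata of W_h^2 (1 - n_h/N_h) S_h^2 / n_h, with W_h = N_h / N."""
    N = sum(len(values) for values in strata)
    total = 0.0
    for values, n_h in zip(strata, sample_sizes):
        N_h = len(values)
        total += (N_h / N) ** 2 * (1 - n_h / N_h) * variance(values) / n_h
    return total

# Hypothetical incomes: each stratum is homogeneous, but the strata differ widely.
rural = [10, 12, 11, 13, 9, 11]
urban = [30, 32, 29, 31, 33, 31]

v_srs = var_srs_mean(rural + urban, n=6)
v_strat = var_stratified_mean([rural, urban], sample_sizes=[3, 3])
print(round(v_srs, 3), round(v_strat, 3))
```

With the same total sample size of 6, the stratified design has a far smaller variance here because the large difference between the rural and urban means no longer contributes to the sampling error.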
The strategy employed for constructing strata involves, first, determining the population parameter
we are interested in estimating and, second, stratifying the population with respect to another
variable that is thought to be associated with the variable of interest. For example, in a farm
expenditure survey, it is desirable to stratify the population of farms on the basis of information
that is highly related to the cost of running a farm. One may consider size (acreage) and type of
farming (crops, livestock, other, etc.) as appropriate stratification variables.
The major disadvantage of stratified sampling is that it may take more time to select the sample
than would be the case for simple random sampling. More time is involved because complete
frames are necessary within each of the strata and each stratum must be sampled.

Chapter 4
4 Preparation of Sampling Frames
4.1 Definition
In its simplest form, a sampling frame is a listing of the units from which the sample selection is to
be made at any stage of sampling. The units in the frame may be either areas or units of objects
covering the items being investigated in a survey: large or small
areas, households, persons, farms, or any identifiable items. Accordingly, a frame is generally known
as an area frame or a list frame.
The frame consists of materials, procedures, and devices that identify, distinguish, and allow access
to the elements of the target population. The frame is composed of a finite set of units to which the
probability sampling scheme is applied. Rules or mechanisms for linking the frame units to the
population elements are an integral part of the frame. The frame also includes auxiliary information
(measure of size, demographic information) used for special sampling techniques, such as
stratification and probability proportional to size sample selections, or special estimation techniques
like ratio or regression estimation.
In multistage sampling the sampling units used at the first stage of sampling are called primary
sampling units (PSUs). Those used at the final (ultimate) stage are called ultimate sampling units
(USUs). In designs with three or more stages, units used for the intermediate stages are called
secondary or second stage sampling units (SSUs), third stage sampling units and so on. Therefore,
for surveys with multistage sample designs, a frame is needed for each stage of selection.
For example, for the three stage design the sampling units for household survey are:
PSUs: districts (woredas)
SSUs: EA (kebeles)
USUs: housing units (households).
Any sampling frame used for the first stage of selection must cover the entire survey population
(the designated PSUs). At subsequent stages of selection, frames are needed only for the sample
units selected at the preceding stage. In the above case, a list of districts (woredas) would be needed
for first-stage sample selection. Lists of EAs (kebeles) would be needed for the second stage, but only
for the sample districts (woredas). For the final stage, lists of housing units (households) are required
only for the sample EAs (kebeles). In this study the term secondary sampling frame will be used for
frames that are developed specifically for the second and subsequent stages of sample selection.
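The point that secondary frames are needed only for units selected at the preceding stage can be sketched as follows. The helper build_frame is a hypothetical stand-in for field listing, and the woreda, kebele, and household labels and counts are invented for illustration.

```python
import random

rng = random.Random(7)

# First-stage frame: must cover every PSU (woreda) in the survey population.
woreda_frame = ["W1", "W2", "W3", "W4", "W5"]

frames_built = []   # records which secondary frames were actually prepared

def build_frame(parent, n_units, prefix):
    """Stand-in for field listing: return the list of sub-units of `parent`."""
    frames_built.append(parent)
    return [f"{parent}-{prefix}{i}" for i in range(1, n_units + 1)]

# Stage 1: sample woredas from the complete first-stage frame.
sample_woredas = rng.sample(woreda_frame, 2)

# Stage 2: kebele lists are prepared only for the sampled woredas.
sample_kebeles = []
for woreda in sample_woredas:
    kebele_frame = build_frame(woreda, 6, "K")
    sample_kebeles += rng.sample(kebele_frame, 2)

# Stage 3: household lists are prepared only for the sampled kebeles.
sample_households = []
for kebele in sample_kebeles:
    household_frame = build_frame(kebele, 30, "H")
    sample_households += rng.sample(household_frame, 5)

print(len(frames_built), "secondary frames built for", len(sample_households), "households")
```

Under these assumptions only 6 secondary frames are prepared (2 kebele lists and 4 household lists), instead of the 5 kebele lists and 30 household lists a complete listing would require.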

4.2 Basic Consideration in the Choice of Sampling Frames


The choice of suitable frames for all stages of sample selection is a critical aspect of the design of
surveys. The population coverage, the stages of sampling, the stratification used, the process of
selection itself: every aspect of design is influenced by the sampling frames. Key considerations in
the choice of sampling frames, regardless of the stage of sampling for which they are used, include
the following: intended use, frame units, coverage, media, content, and additional information.
Intended uses: sampling frames are used for sample selection and for making estimates based on
sample data. The choice of the sampling method to be used at each stage of selection is limited by
the information available for each frame unit at that stage. If the information consists only of
attributes (e.g. urban/rural classification, identification of higher level units), it is necessary to use
an equal probability selection method with or without stratification. However, if quantitative
information or a measure of size (e.g. counts of persons or households from a recent census) is
available for all or virtually all frame units, this information can be used in connection with sample
selection or estimation, or both.
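When a measure of size is recorded for every frame unit, selection with probability proportional to size becomes possible. The sketch below uses the cumulative-total method with a systematic step, one common way of doing PPS selection; the notes do not prescribe a particular method, and the unit labels and household counts are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(units, sizes, n, seed=None):
    """Systematic PPS selection on the cumulated measures of size.

    A unit whose size exceeds the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    cumulative = list(accumulate(sizes))          # running totals of size
    interval = cumulative[-1] / n                 # total size / sample size
    start = rng.uniform(0, interval)              # random start within one interval
    points = [start + j * interval for j in range(n)]
    last = len(units) - 1
    # each point selects the unit whose cumulative range contains it
    return [units[min(bisect_left(cumulative, p), last)] for p in points]

# Hypothetical frame: four enumeration areas with household counts from a census.
units = ["EA-A", "EA-B", "EA-C", "EA-D"]
sizes = [100, 300, 50, 550]
print(pps_systematic(units, sizes, n=2, seed=11))
```

Here the interval is 500, so EA-D (size 550) is certain to be selected, while EA-C (size 50) is selected only when the random start happens to fall inside its narrow range.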
Frame units: frame units are sampling units included in the frame. The kinds of units in frames
used for surveys include:
• Area units such as administrative subdivisions, census enumeration areas, land areas
(segments), and others. Area units cover specified land areas with defined boundaries.
• Non-area units include housing units, households, persons, nomadic tribes, institutions,
construction camps, and other items, and these units must have a clear definition.
Coverage: the coverage objective of the frame or frames used for a survey is to provide access to all
of the elementary units in the survey population and to do so in such a way that every one of those
units has a known (or knowable) probability of selection in the sample for the survey.
Access is achieved by sampling from the frames, usually through two or more stages of selection,
and by the use of rules of association that link elementary units to the units that were selected at the
final stage of selection, i.e., the USUs.
Media: sampling frames may be stored on either print or electronic media. For a frame stored on an
electronic medium, it is relatively easy to produce a printout of the entire frame or any portion
desired, and to organize it in any desired format.
Content: the frame contains a record for each frame unit. The only item that is absolutely
indispensable is a unique identifier for each unit. If a unit is selected, the numerical identifier
provides the means of access to the unit in order to perform subsequent sampling operations or to
collect survey data. The numerical identifier will be linked with other identifiers, such as place
names or addresses of housing units, either in the frame itself or on maps or other auxiliary
materials.
Additional information: there are a number of possible reasons for collecting additional information
during the construction of a sampling frame. One arises when the definition of the universe or the
sampling unit to be covered is too complicated to apply under field conditions; classificatory
information is then gathered during the frame listing, and the final decision as to which
units are to be excluded or included is made at a later stage. Another common reason is
stratification and allocation, for which the stratifying information must be gathered and
recorded during the frame listing.

4.3 Desirable Properties of Frames


The properties can be grouped into three major categories: properties related to quality, those
related to efficiency, and those related to cost.
a) Quality Related Properties
Quality related properties of a frame are those properties which make it possible to minimize
non-sampling errors, especially coverage errors that might occur because of deficiencies in the frame.
Desirable quality related properties are that the frame:
• Consists of well-defined units: area units should have recognized boundaries
that are clearly delineated on various types of maps, and for non-area units a precise
standard definition of the unit should be established.
• Has adequate identifiers: usually frame units will have both unique numerical
identifiers (primary identifiers) and other identifiers, such as names and addresses
(secondary identifiers). The units in the list must be traceable in the field.
• Is complete: the completeness of a sampling frame concerns the extent to which
the intended coverage is actually achieved and the extent to which the desired information
for each frame unit is included in the frame. Incompleteness and duplication in a
frame can introduce bias into the survey estimates.
• Is up to date: for frames that are to be used more than once, procedures must be
developed for periodic updating, since some units are likely to
change with time.
• Has stable units: if there is a choice with respect to the kind of units to be used in
a frame, it is preferable to choose the most stable kinds of units available, i.e. those that
are least subject to change in number, definition, and size.
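Some of the quality properties listed above (adequate identifiers, completeness, absence of duplication) can be screened automatically when the frame is held in electronic form. The sketch below is only an illustration: the record layout, the field names unit_id, name, and address, and the sample records are hypothetical.

```python
from collections import Counter

def check_frame(records, id_key="unit_id", required=("name", "address")):
    """Report basic quality problems in a list frame held as a list of dicts."""
    ids = [rec.get(id_key) for rec in records]
    counts = Counter(i for i in ids if i is not None)
    return {
        # positions of records with no primary identifier
        "missing_id": [pos for pos, i in enumerate(ids) if i is None],
        # primary identifiers that appear more than once (duplication)
        "duplicate_ids": sorted(i for i, c in counts.items() if c > 1),
        # positions of records lacking a secondary identifier
        "incomplete": [pos for pos, rec in enumerate(records)
                       if any(not rec.get(field) for field in required)],
    }

# Hypothetical list frame with three deliberate defects.
frame = [
    {"unit_id": 1, "name": "Abebe", "address": "Kebele 01, House 12"},
    {"unit_id": 2, "name": "Sara", "address": ""},
    {"unit_id": 2, "name": "Sara", "address": "Kebele 01, House 15"},
    {"unit_id": None, "name": "Kedir", "address": "Kebele 02, House 3"},
]
report = check_frame(frame)
print(report)
```

A check of this kind cannot detect units that are missing from the frame altogether; undercoverage has to be assessed against an independent source or by field verification.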
b) Efficiency Related Properties
Efficiency related properties of frames are those qualities that make possible and facilitate the use
of efficient survey designs. Efficiency in this context refers to the relationship between sampling

error and the cost of producing survey estimates; the most efficient survey design is the one that
produces the desired level of precision at the lowest possible cost. Perhaps the most important of
these properties is the inclusion of accurate and up-to-date supplemental information for each frame
unit. Measures of size, such as population, number of households, number of agricultural holders, and
other size measures, are useful. Measures of size can be used in the following ways:
• To construct sampling units
• To form strata of units classified by size
• To determine the allocation of sample PSUs to strata
• To select units with probability proportionate to size (PPS)
• As auxiliary variables for ratio or regression estimates.
Other properties of a frame that facilitate the use of efficient sample designs include:
• Choice of sampling units available: organizing the frame units in a hierarchical structure
and assigning identifiers to frame units.
• Good quality maps of units available: showing the boundaries of each unit.
• Easy to manipulate/process: computerization of the frame.
c) Cost Related Properties
The preparation of sampling frames can be an expensive exercise. Low cost of frame development
can best be achieved by treating the development, maintenance, and updating of frames for censuses
and household surveys as a single integrated ongoing process. If two alternative frame sources
would result in the same quality and efficiency, the one with the lower cost of acquisition or
preparation, use, and maintenance would obviously be preferred. The choice of frame for a survey
must be based on assessing the cost of using that frame and the total error of the survey estimates
when that particular frame is used.
Above all, the cost of frame preparation must be considered at the planning stage and must be
budgeted for. These costs are likely to be a significant proportion of the total cost and relate to an
element of the survey work which is critically important in determining the eventual quality of the
survey results.
In summary, the sampling frame plays a central role in the design of a sample survey. It determines
how well a population is covered, affects the method of enumeration and influences the efficiency
with which a sample is designed. A frame becomes more valuable if it contains some supplementary
information, which can be used to improve sampling and estimation procedures.
The structure of the frame, the information it contains, and the quality of that information will
determine the type of sample designs, and estimation procedures that can be used in a survey.
Simple frames lacking auxiliary information support simple sample designs. For example, if the list
contains no information other than the identity of the elements, typically very simple sample
designs are used for selecting the sample. A simple random sample may be selected, or if the list is
large, a systematic sample or a systematic sample of clusters may be used.
Many sample designs use auxiliary data to produce more efficient samples. Complex sample
designs that are more efficient than simple random sampling, such as those employing stratification,
probability proportional to size sample selection, or special estimation techniques such as ratio and
regression estimators, require additional information beyond the identity of the target elements.
The sampling frame must be accurate and free from defects. It should be exhaustive (no units
omitted), non-repetitive (no unit listed more than once), and up to date; the units should be
clearly identifiable without ambiguity, and the units in the list must be traceable in the field.

CHAPTER 5:
5 SAMPLE DESIGN
5.1 Sampling Methods
The general aim of all sampling methods is to obtain a sample that is representative of the target
population. By this we mean that, as much as possible, the information derived from the sample
survey is the same as we would find if we carried out a census of the target population, allowing for
inevitable variation in the estimates due to imprecision.
When selecting a sampling method we need some minimal prior knowledge of the target
population; with this and some reasonable assumptions we can estimate a sample size required to
achieve a reasonable estimate, with acceptable precision, and accuracy of population characteristics.
How we actually decide which sampling units will be chosen makes up the sampling method.
Sampling methods can be categorized according to the approach they take to the probability of a
particular unit being included. Most sampling methods attempt to select units such that each has a
definable probability of being chosen. Moreover, most of these methods also attempt to ensure that
each unit has the same chance of being included as every other unit in the sample frame. All
methods that adopt this general approach are called probability sampling methods.
The basis of probability sampling is the selection of sampling units to make up the sample based on
defining the chance that each unit in the sample frame will be included. If we have 100 units in the
frame and decide on a sample size of 10, we can define the probability of each unit
being selected as one in ten, or 0.1 (assuming each unit has the same chance). As we shall see next,
there are various methods that we can use to select the units.
It is an important feature of probability sampling that each time we apply the same method to the same
sample frame we will generally obtain a different sample. For a finite population we can use simple
combinatorial arithmetic to calculate how many samples we can draw from a particular sample
frame such that no two samples are identical. It turns out that from any population of N objects we
can draw $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$ different samples, each of which contains n sampling units. In fact, in probability
sampling we are concerned with the probability of each sample being chosen rather than with the
probability of choosing individual units. If each sample is equally likely to be selected, then each
sampling unit automatically has the same chance of being included as every other sampling unit.
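The counting argument can be checked numerically. The sketch below uses the frame of 100 units and sample of 10 from the example above; the variable names are illustrative assumptions. It counts all possible samples and then the samples that contain one particular unit, whose ratio is that unit's inclusion probability when every sample is equally likely.

```python
from math import comb

N, n = 100, 10  # frame size and sample size from the example above

# Number of distinct samples of size n that can be drawn from N units.
num_samples = comb(N, n)

# Samples that contain one given unit: fix that unit, choose the other n - 1.
samples_with_unit = comb(N - 1, n - 1)

# With equally likely samples, the inclusion probability is the ratio n / N.
inclusion_prob = samples_with_unit / num_samples

print(f"{num_samples:.3e} possible samples, inclusion probability {inclusion_prob:.2f}")
```

There are roughly 1.73 x 10^13 possible samples here, and the inclusion probability comes out as 0.1, matching the one-in-ten figure.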

5.2 Choice of Sample Design


A sample design is a joint effort of the survey statistician and other experts such as subject matter
specialists, data users, and survey executing agencies. Statisticians usually require information from
other experts in order to propose a sample design that will meet the required specification of the
users at the lowest possible cost. The issues they should discuss and agree on
include the objectives of the survey, the variables to be measured, the type of estimates
required, the levels of reliability and validity needed for the estimates, and any restrictions placed on
the survey with respect to timeliness and cost.
a) Setting objectives and preliminary investigation of the survey: The survey objective should
be clearly specified and precisely stated at the outset. Other issues related to the objective and
relevant to the survey must be assessed at an early stage of the design. Depending on the scope and
topics of the survey, these may cover the following.
I. A clear formulation of the problem and its cause as a precise and definite
statement that clarifies exactly what is to be investigated and why it should be investigated.
This leads to the type of survey to be followed: exploratory (explores a new topic),
descriptive (presents a picture of the specific details of a situation, social setting, or
relationship), or explanatory (explains why something occurs; looks for causes and reasons).
II. Identification and definition of the population to be studied (target population and the kinds
of units to be covered), and a description of the coverage, such as geographic area, branch
of the economy, social group, or other classification of the population covered by the
survey;
III. A clear specification of the desired information to be collected in statistical terms, i.e., to
determine the data requirements.
IV. The level of breakdowns by which the results are to be tabulated; regions, age groups, sexes,
residences and any other economic and social classification.
V. The level of accuracy desired or the specification of tolerable errors: the accuracy of a
survey estimate is generally taken to mean the closeness of the estimate to an exact or "true
value", which is nearly always unknown. The error of a particular survey estimate is the
difference between that estimate and the true value of the quantity being estimated. It arises from
sampling errors and non-sampling errors, both of which need due consideration at the design stage.
Since sample size determination requires the desired confidence level and
margin of error, it is important to specify these factors by considering the cost and precision
required.
VI. The kind of results expected, and the users as well as the uses of the data;
VII. Timeliness - how soon the results are needed: the utility of survey results falls off gradually
with the passage of time following the data collection stage of the survey. The rate at which
utility declines over time depends on the content and objective of the survey. For example,
the results of political polls relating to specific elections, surveys for the revision of the minimum
wage, and monthly labor force surveys are needed very quickly. Users of survey data often press for
timeliness at the expense of accuracy. Therefore, one has to produce timely data to
facilitate their actual use while maintaining the responsibility to produce accurate data.
b) Sampling plan:
There are different ways of designing a sample survey, but the idea of optimum design started with
the sampling features such as selection process and estimation procedures. The selection process
deals with the preparation of sampling frames, sample size determination, choice of design to be
used, and sample selection method. The estimation procedure involves the process for computing
the sample statistics and calculating the reliability of these estimates. The purpose is to develop a
sample design that would meet reliability requirements at the lowest possible cost, or alternatively,
to produce the most reliable estimates for a fixed expenditure of resources.

5.3 Selection process:


After making an assessment of survey objectives, the kinds of topic to be covered, description of
coverage, reporting levels, and other issues as discussed above, the next step in selection process is
to make a choice of design.
Choice of design: there are different sample designs, which are likely to be appropriate for
different types of survey and in different circumstances. They range from the simplest kind of sample
survey (simple random sampling, SRS) to more complex large-scale sample survey designs
(multi-stage sample design). In general, there are two approaches to sampling stages:
single-stage sample design (unstratified or stratified) and multi-stage sample design (unstratified or
stratified).

5.4 Sample size estimation


The sample size for a survey must be decided upon at the planning stage, together with the sample
design. If done properly, the correct estimation of sample size is a significant statistical exercise.
The sample size required depends upon three factors: the level of precision required in the estimate,
which requires specifying the acceptable margin of error and the confidence level; the level of
variability of the variables to be estimated, which could be measured by the standard error or
coefficient of variation; and the sample design to be used, since different designs will produce
different levels of precision for the same sample size, or conversely require different sample sizes
for the same level of precision.
Sometimes we bypass the statistical process by adopting an ad hoc approach of using a fixed sample
proportion (such as 10% of the population size) or sample size (such as 100). In relatively large
populations (say at least 2000) this will normally produce results that are no worse than those
produced by a sample based on a carefully calculated sample size (provided, of course, that the
sample units that make up the 10% sample are properly selected, so that they are representative of
the population).
The basis for calculating the size of samples is that there is a minimum sample size required for a
given population to provide estimates with an acceptable level of precision. Any sample larger
than this minimum size (if chosen properly) should yield results that are no less precise, but not
necessarily more precise, than those of the minimum sample. This means that, although we may choose to use a larger
sample for other reasons, there is no statistical basis for thinking that it will provide better results.
On the other hand, a sample size less than the minimum will almost certainly produce results with a
lower level of precision. Again, there may be other external factors that make it necessary to use a
sample below this minimum. If the sample is too small the estimate will be too imprecise, but if the
sample is too large there will be more work without a necessary increase in precision.
But remember that we are primarily interested in accuracy. Our aim in sampling is to get an
accurate estimate of the population's characteristics from measuring the sample's characteristics.
The main controlling factor in deciding whether the estimates will be accurate is how representative
the sample is. Using a small sample increases the possibility that the sample will not be
representative, but a sample that is larger than the minimum calculated sample size does not
necessarily increase the probability of getting a representative sample. As with precision, a
larger-than-necessary sample may be used, but is not justified on statistical grounds. Of course, both an
appropriate sample size and a proper sampling technique are required. If the sampling process is
carried out correctly, using an effective sample size, the sample will be representative and the
estimates it generates will be useful.
Assumptions
In estimating sample sizes we need to make the following assumptions:
 The estimates produced by a set of samples from the same population are normally
distributed. A well-designed random sample is the sampling method that will most usually
produce such a distribution.
 We can decide on the required accuracy of the sample estimate. For example, if we decide
that accuracy has to be 5%, the estimated value must be within five percent either way of
the 'true' value, within the margin of error defined in the next assumption.
 We can decide on a margin of error ($\alpha$) for the estimate, usually expressed as a probability
of error (5%, or 0.05). This means that in an acceptably small number of cases (e.g. five out
of a hundred) our sample estimate is not within the accuracy range of the population
value defined in the last assumption.
 We can provide a value for the population variance ($S^2$) of the variable being estimated.
This is a measure of how much variation there is within the population in the value of the
property we are trying to estimate. In general we will require a larger sample to accurately
estimate something that is very variable, whilst something that has a similar value for all
members of the population will require a markedly smaller sample. As we shall discuss
shortly, although we almost never have a value for the population variance there are various
ways of obtaining an estimate for use in calculating sample sizes.
Based on these assumptions, several formulae have been developed for estimating
minimum sample sizes.
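One widely used formula of this kind, for estimating a mean under simple random sampling, is $n_0 = z^2 S^2 / d^2$, optionally followed by the finite population correction $n = n_0 / (1 + n_0/N)$. The sketch below is an illustration under that assumption and is not necessarily the formula these notes have in mind. The function name `sample_size_mean`, the symbol `d` (the tolerated absolute error, in the units of the variable), and the example figures are all hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(S2, d, alpha=0.05, N=None):
    """Minimum sample size for estimating a mean under simple random sampling.

    S2    -- assumed population variance of the variable
    d     -- tolerated absolute error (half-width of the interval)
    alpha -- probability of error, so the confidence level is 1 - alpha
    N     -- population size; if given, a finite population correction is applied
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    n0 = z ** 2 * S2 / d ** 2                # size for a very large population
    if N is not None:
        n0 = n0 / (1 + n0 / N)               # finite population correction
    return ceil(n0)


# Hypothetical inputs: variance 400 (standard deviation 20), tolerated error 2 units.
n_large = sample_size_mean(S2=400, d=2)
n_finite = sample_size_mean(S2=400, d=2, N=2000)
```

The required size grows in proportion to the variance and falls with the square of the tolerated error, which is the pattern described in the last assumption above.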
Another area that needs attention in sample size determination arises when several variables are equally
important in a particular survey; the precision requirements for each of these will then produce a
different estimate of the sample size needed. In this case, one should make an assessment to come
up with a single estimated sample size. This includes:
 Estimating the sample size for the different variables
 Balancing the required level of precision and the resources available for conducting the
survey
 Giving due consideration to the likely tradeoffs between sampling and non-sampling errors
(a larger sample size reduces sampling error, but it may have the effect of increasing
non-sampling errors).

5.5 Estimation procedures


Estimating population characteristics is a major objective in surveys. Population estimates are
calculated from sample data and reported together with an indication of the precision of the
estimate, obtained from the sample variance. Typical estimates are totals, means, ratios, and
proportions; their standard errors are required so that confidence limits can be placed
on the estimates and tests of significance can be carried out.
The calculation of population estimates depends on the type of sampling design used for the
survey. Depending on the design, estimates are raised from the sample level to
the population level by multiplying by the inverse of the sampling fraction.
The more complicated the design in terms of the number of stages, variation in sampling fraction
and sampling with or without replacement, the more complicated will be the algebra for
the calculations. For example, if the design is a single-stage simple random sample, then the estimation
procedures for the estimates with their variances are:
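For reference, the standard textbook forms under simple random sampling without replacement are sketched below. The symbols $N$ (population size), $n$ (sample size), $y_i$ (observed value for the $i$th sampled unit) and $s^2$ (sample variance) are assumptions made here and may differ from the notation originally intended at this point.

```latex
\begin{align*}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} y_i,
  & \hat{Y} &= N\,\bar{y}, \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2, \\
v(\bar{y}) &= \left(1 - \frac{n}{N}\right)\frac{s^2}{n},
  & v(\hat{Y}) &= N^2\, v(\bar{y}).
\end{align*}
```

Here $\hat{Y} = N\bar{y}$ is the sample total $n\bar{y}$ multiplied by $N/n$, the inverse of the sampling fraction.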

The standard errors can be obtained by taking the square root of each variance. If a two-stage
design is used with PPS/SRS, the estimation procedure would be as follows. At the first stage, kebeles
are selected with PPS (size being the number of households). At the second stage, a fixed sample of
households is drawn from each sample kebele by SRS or systematic sampling, i.e., $m_i = m$ is the same for all
sample kebeles (constant sample size). Then the estimation procedure would be:

The corresponding variance will be

Standard errors, which can be obtained by taking the square root of the variance, can be used for
further estimation and evaluation.
Notation used:
$n$ is the number of first-stage units in the sample
$H$ is the total number of sub-units (households) in the survey population
$m_i$ is the number of second-stage sub-units (households) within the $i$th first-stage unit in the sample
$y_{ij}$ is the observation for the $j$th sub-unit within the $i$th unit
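Using this notation, one standard estimator for the two-stage design can be sketched as follows. It assumes that first-stage units are drawn with probability proportional to size with replacement and that the same number of households is taken in each sampled unit. Under those assumptions the design is self-weighting: the estimated total is $\hat{Y} = \frac{H}{n}\sum_i \bar{y}_i$ and its variance estimate is $v(\hat{Y}) = \frac{H^2}{n(n-1)}\sum_i (\bar{y}_i - \bar{y})^2$, where $\bar{y}_i$ is the mean of the sampled households in unit $i$ and $\bar{y}$ is the mean of the $\bar{y}_i$. The function name and the data are hypothetical, and the notes may intend a different variant.

```python
from math import sqrt
from statistics import mean


def two_stage_pps_total(clusters, H):
    """Estimate a population total from a PPS (with replacement) / SRS design.

    clusters -- one list of household observations y_ij per sampled first-stage unit
    H        -- total number of households in the survey population
    """
    cluster_means = [mean(y) for y in clusters]  # ybar_i for each sampled unit
    n = len(cluster_means)                       # number of first-stage units
    ybar = mean(cluster_means)                   # mean of the unit means
    total = H * ybar                             # estimated population total
    # Variance from the spread of unit means around their overall mean.
    var = H ** 2 * sum((m - ybar) ** 2 for m in cluster_means) / (n * (n - 1))
    return total, var, sqrt(var)


# Hypothetical data: 3 sampled kebeles, 4 households observed in each.
sample = [[4, 6, 5, 5], [3, 2, 4, 3], [7, 6, 8, 7]]
total, var, se = two_stage_pps_total(sample, H=1000)
```

Because only the between-unit spread enters the variance estimate, at least two first-stage units are needed.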

Chapter 6:
6 Methods of collecting the Data
6.1 Time Dimension in Survey
Surveys are classified into two types according to the time of data collection: longitudinal surveys
and cross-sectional surveys.
Longitudinal surveys gather information at different points in time in order to study changes over an
extended period of time. Three different designs are used in longitudinal surveys: panel studies,
trend studies, and cohort studies.
Panel studies are studies in which the same subjects are surveyed at different times over an
extended period. The investigator observes exactly the same people, groups, or organizations across time
periods. In a trend study, different people from the same general population are surveyed at
different times. In a cohort study, a specific population is followed over a length of time.
Cross-sectional surveys study a cross section (sample) of a population at a single point in time.
This is usually the simplest and least costly alternative. Its disadvantage is that it cannot capture social
processes or change.

6.2 Data Gathering Techniques


The objective of the survey, the nature of the items of information, the operational feasibility and
cost will often determine the method of data collection. Of the various methods of collecting the
data just a few of them are outlined below.
We distinguish three basically different methods of collecting data. These are:
 Extraction of data from records
 Self-administered questionnaire
 Direct investigation: measurement (observation) of the subject, and
interviewing (face-to-face, telephone)
The data collection method will be determined by the nature of the required information, and our
first step is to decide which of these three methods to use.

6.2.1 Extraction of data from records


It is usually possible to answer some of the questions a survey is intended to cover from available
data. For example, a mass of information about the populations studied by social surveys is available
in historical documents, statistical reports, records of institutions and other sources; it is up to the
surveyor to derive what help he/she can from it. However, one must first consider carefully its
suitability for the purpose. One must critically assess the population coverage, the definitions used,
how accurate the information is, and whether it is sufficiently up to date.
In addition, government departments, institutions like hospitals and prisons, professional institutes
and business firms possess a mass of information relating to individuals, but it is not generally available
to the outside researcher since the information is compiled for quite another purpose. There are
some areas where records are the only available source of information, such as financial information,
distributive trade, educational and transport data, in which close cooperation with the concerned
institution is vital to have access to the required data.
Information from records may serve as a complement for analysis and can be used as a base for
preliminary investigation. Therefore, it is advisable to examine exhaustively what is available in
records before launching any survey.

6.2.2 Self – administered questionnaire


The mail or self-administered questionnaire is a method of data collection in which researchers
give questionnaires with instructions directly to respondents, or mail them to respondents, who read
the instructions and questions, record their answers, and then hand the questionnaire back or return
it by mail to the data collecting agency.
This type of survey has many advantages, which include:
 It is the cheapest method and can be conducted by a single researcher
 A researcher can send questionnaires to a wide geographical area
 The respondent can complete the questionnaire when it is convenient and can check
personal records if necessary
 Mail questionnaires offer anonymity and avoid interviewer bias.
 They are very effective, and response rates may be high for a target population that is well
educated or has a strong interest in the topic or the survey organization.
The disadvantages of this method may include:
 A low response rate is the biggest problem, and the process of returning the questionnaire may
be unnecessarily extended. The researcher can raise response rates by sending
non-respondents reminder letters, but this adds to the time and cost of data collection.
 A researcher cannot control the conditions under which a mail questionnaire is completed.
 Researchers cannot visually observe the respondent's reactions to questions, physical
characteristics, or settings.
 A mail questionnaire is not suitable for an illiterate community.
Therefore, the use of this method is limited to predominantly literate societies, as the method
requires a clear understanding of the survey concepts through reading and writing. Because
of this, its use is limited to developed countries with a high percentage of literacy.

6.2.3 Direct investigation - measurement (observation) and interviewing (face-to-face, telephone)

Measurement or observation of the subject, and interviewing a respondent to obtain a report
on the matter, are two approaches which are by no means mutually exclusive. It is very common indeed to
find both being used in the same survey. Some topics can only be investigated by one or the other
approach, but many can be investigated using either, and in such cases it is necessary to assess
which is more suitable in the circumstances of the particular study. Therefore, the type of question
and the nature and status of the topic will determine whether a required piece of information can be
gathered by the measurement or the interview approach.
Measurements or Observations
Information on a topic can be gathered by measurement if it is physically measurable or
observable. Common types of data collected by observation and measurement include:
 Land area measurement
 Crop output measurement
 Anthropometric measurement
 Animal weight gain
 Instrument recordings or readings (e.g. rainfall, temperature, etc)
 Physical measurement or examination of people
 Counts of human, animal and plant populations
 Direct observations of work
 Exchange activities (e.g. purchase and sale prices).
Data collection by measurement can be undertaken in several ways. Some of these are:
 The direct measurement of physical characteristic using an instrument;
 The observation of people engaged in an activity; and
 Recording of relevant aspects of their activities

Interviewing (face-to-face, telephone)
A face-to-face interview is a social process that involves the interviewer and the respondent. It is the
process in which the interviewer meets the respondent, explains the purpose of the study, asks
a set of questions and records the answers. It is widely used in economic and social surveys.
Information may be collected by interview for various reasons. It may be information which could
be measured directly but would require too much time or too great a use of manpower or funds to
do so, in which case the probably less accurate interview method is used instead. It may be information
that cannot be directly observed or measured because it relates to the past. It may be information
about the respondent's own knowledge, opinions, perceptions or attitudes.
Some advantages of face-to-face interviews:
 Face-to-face interviews have the highest response rate and permit the longest
questionnaires.
 Interviewers control the sequence of questions and can use some probes.
 The respondent is likely to answer all the questions alone.
 Interviewers can also observe the surroundings and can use nonverbal communication and
visual aids.
 Well-trained interviewers can ask all types of questions, including complex questions.
The disadvantages of this method may include the following:
 Cost is high: the training, travel, supervision, and personnel costs for interviews can be
high.
 Interviewer bias is also high in this method.
 The appearance, tone of voice, question wording, and so forth of the interviewer may affect
the respondent.
The use of telephone interviewing for social surveys has increased substantially in developed countries
in recent years because of the high penetration of telephones. Its major advantages are
lower cost and faster completion, with a relatively high response rate. The phone permits the survey
to reach people who would not open their doors to an interviewer, but who might be willing to talk
on the telephone. There may be less interviewer bias and less social desirability bias than with
personal interviews. The main disadvantage of this method is that there is less opportunity for
establishing rapport with the respondent than in a face-to-face situation. Another disadvantage is that
households without telephones and those with unlisted numbers are automatically excluded from
the survey, which may bias results. Those who use call blocking may simply ignore
calls from the survey's unfamiliar number.
We have to note that the use of this method is unpopular and very limited in developing countries.

CHAPTER 7:

7 INSTRUMENT OF DATA COLLECTION


7.1 Type of instruments
A data collection instrument is a document used for gathering and recording data in a survey.
Basically there are two types of instruments to collect data: the structured questionnaire and
the unstructured questionnaire.
The first type of instrument, the structured questionnaire, used mostly in formal sample surveys, is a
formalized schedule or form that contains a carefully formulated set of questions for
information gathering. In other words, a structured questionnaire is one of the instruments used in
data collection, and it contains written questions that people respond to directly on the
questionnaire form itself, with or without the aid of an interviewer. In a structured questionnaire, all
questions are prearranged in some specified order and the range of possible responses for each
question is provided.
The second type is a checklist of topics (unstructured questionnaire), used mostly in qualitative
surveys, when enquiries are not appropriate for structured questionnaires. An unstructured
questionnaire contains mostly open-ended questions. This type of instrument is used in an informal
or exploratory survey and is designed in the form of survey guides, tally sheets, observational forms,
field notes, outlines of questions, etc.
Most questionnaires used in sample surveys combine structured and unstructured questions. Since
the questionnaire is the main data collection instrument in a formal sample survey, this chapter will
discuss the issues involved in questionnaire design and other activities related to it.

7.2 Principles of Designing Questionnaire


All surveys involve presenting respondents with a series of questions to be answered. The questions
may be simple single-item measures or complex multiple-item scales. In whatever form they exist,
survey data, especially socioeconomic survey data, are basically what people say to the investigator
in response to a question.
One major contributory element in maintaining data quality in a formal sample survey
is the questionnaire design. In this approach, the questionnaire needs to be structured, and its design is
critical because survey analysis depends on the completeness of the topics covered. A well-designed
questionnaire enables us to ask the respondents the same questions in the same way,
and their answers must be recorded and coded uniformly so that data can be aggregated across the
sample.
Error-free data transfer requires clear, comprehensive questions, good enumeration, and clearly set
out answers. Much of this process depends on good questionnaire design. The form must cater for
coding and subsequent data entry for processing. In this respect there are some questionnaire design
principles that link interviewing and data processing.
 Regarding the content one must include the minimum number of topics to meet the
objectives. Because of resource and time constraints we should focus on items of direct and
major interest and avoid collection of any non-essential information.
 Time for the interview is another factor that must be kept reasonable and this limits the
number of questions
 The questions must be easy for the respondents to understand and to answer accurately
and clearly.
 The questionnaire should be easy to use as an interview guide for the enumerator and as an
instrument for recording answers.
 The questionnaire should be self-contained, including identification of the enumerator,
respondent, date of interview and any other reference information such as geographical
identification and other field details.
 It should be designed in such a way that the recorded answers can easily be edited, coded
and transferred on to a computer file for data processing, tabulation and statistical analysis.
 The flow, structure and length of questionnaire should encourage and keep the interest of
the respondent.
 Careful thought should be given to the quality of presentation material such as paper, the
size of the sheets used, the clarity of printing and spaces provided for recording answer.
The process of design is creative, and one should develop strong preferences for particular styles of
layout and phraseology, since there is no single prescription around which a form can be modeled.
A typical sequence of activities to design a form would have the following pattern.
 Draw up a list of question topics from a mixture of theoretical models, empirical
information, research evidence and terms of reference for the study;
 For each topic phrase the specific information required;
 List them in a logical order, following either a chronological or sequential pattern;
 Decide for each question how to record the interview response;
 Make a first draft layout on the style of paper to be used;
 Test the design on model respondents;
 Prepare a pilot draft for a pilot or test survey;
 Modify the form from the results of the test;
 Finalize the design and layout; and
 Review as many times as possible the number of questions finally listed.
Form design is largely a compromise between opposing criteria: layout for collection versus layout
for data processing. Layout for collection is the ease, speed and accuracy with which the
questionnaire can be completed in the field, while layout for data processing is the ease, speed and
accuracy with which information from the questionnaire can be processed for analysis. One should
give equal attention to both aspects –collection and data processing.

7.3 Type of Questions


Two basic types of questions can be used in questionnaires, open-ended questions and closed-ended
questions, depending on the amount of freedom given to the respondent in offering responses.
The type of questions for use will be determined by the form of responses sought, the nature of the
respondents and their ability to answer the questions.
Open-ended question
An open-ended (unstructured, free response) question is one which allows the respondent to
answer it freely in his or her own words, and to express any ideas generated from the question itself.
Open implies that the respondent is permitted to answer in any form and at any length without any
limitation on the range or complexity of the answer to the question asked.
Open-ended questions are most often associated with exploratory or informal surveys, in which the
investigator does not know the likely response from the units of study. They need a checklist of topics,
guidelines or unstructured questions.

For example: "Which crops do you grow?" The question does not specify any particular season,
crops or plots, and hence many answers are possible. It is open for discussion. "Why did you say you
would not buy imported cooking oil when it is available in the market?" Again this could be
discussed, since the reasons could be quality, taste, price, etc.
The advantages of open-ended responses are:
 They permit an unlimited number of possible answers, which may not be considered at the
initial stage of the questions' design.
 Respondent can answer in detail and can qualify and clarify responses by expressing in
his/her own words.
 Unanticipated findings can be discovered.
 They permit creativity, self-expression, and richness of detail.
 They may be used when there are too many response categories to list on a questionnaire.
 They are useful when the questions are too complex to reduce to a few standard responses.
The disadvantages of open-ended responses are:
 Much irrelevant information may be collected.
 The answers are not standardized and are therefore difficult to compare and to analyze
statistically.
 Coding responses is difficult.
 They require a higher level of skill on the part of the data collector, since responses are
written verbatim.
 More time, thought, and effort are necessary for completion.
 The forms are often bulky because answers take up a lot of space in the questionnaire.
Closed-ended question
A closed-ended question is one where a predetermined list of alternative responses is presented to
the respondent, who checks the appropriate one(s). It implies that the respondents' answers are
restricted in some way to a limited range of alternatives. Closed-ended questions fall into one of
two categories: dichotomous questions and multiple-choice questions.
A dichotomous question contains two alternatives in the predetermined list of responses.
Examples are yes-no, true-false, agree-disagree, like-dislike, fair-unfair and so on. A multiple-choice
question offers more than two responses in the predetermined list of alternative responses.
There are two categories of multiple-choice questions: single-coded questions, where the respondent
is permitted to check one and only one response; and multi-coded questions, which allow the respondent
to select as many responses as are applicable.
Example: a) Do you have a bank account? Yes = 1, No = 2
b) How many children have ever been born to you?
1 = 1-2   2 = 3-4   3 = 5-6   4 = 7-8   5 = more than 8
c) Which type of soft drink(s) does your household consume?
1 = Pepsi-Cola   2 = Coca-Cola   3 = Mirinda   4 = Fanta   5 = Sprite
6 = Seven-Up   7 = Others, specify ______________________

d) Has the road construction activity had an impact on your access to public services (health,
education, market, etc.)? Yes = 1, No = 2
If the answer is 'Yes', explain the impact. _____________________
The choice can be made by making a mark alongside a category, by entering a numeric value, or by
selecting a code from a code list. Setting categories of responses requires skill and experience in the
area of study, and coded categories suit computer processing.
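As a minimal sketch of how coded categories lend themselves to computer processing, the example questions can be represented as code lists. The numeric codes are those of examples a) and c); the Python names used here (`BANK_ACCOUNT_CODES`, `SOFT_DRINK_CODES`, `code_single`, `code_multi`) are hypothetical and introduced only for illustration.

```python
# Code lists taken from examples a) and c); the variable names are illustrative.
BANK_ACCOUNT_CODES = {"Yes": 1, "No": 2}
SOFT_DRINK_CODES = {
    "Pepsi-Cola": 1, "Coca-Cola": 2, "Mirinda": 3, "Fanta": 4,
    "Sprite": 5, "Seven-Up": 6, "Others": 7,
}


def code_single(answer, code_list):
    """Return the code of a single-coded answer; reject answers not in the list."""
    if answer not in code_list:
        raise ValueError(f"{answer!r} is not in the code list")
    return code_list[answer]


def code_multi(answers, code_list):
    """Return the sorted codes of a multi-coded answer (several responses allowed)."""
    return sorted({code_single(answer, code_list) for answer in answers})


# A single-coded question accepts exactly one answer ...
print(code_single("Yes", BANK_ACCOUNT_CODES))
# ... while a multi-coded question accepts every response that applies.
print(code_multi(["Fanta", "Pepsi-Cola"], SOFT_DRINK_CODES))
```

Rejecting an answer that is not in the code list is a design choice of this sketch; in a real questionnaire such an answer would usually be recorded under the 'Others, specify' category.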
The advantages of closed response categories are:
 They are easier and quicker for respondents to answer.
 The answers of different respondents are easier to standardize and to compare.
 The answers are easier to code and to analyze statistically.
 The question's meaning is often made clearer by the response categories.
 The answers are relatively complete as long as all relevant categories are specified.
 Respondents are more likely to answer questions about sensitive topics.
The disadvantages of closed response categories are:
 Respondents can guess at answers when they don't know, since they have the categories to
guide them.
 The appropriate category may be missing from the schedule.
 Failure to understand the question is less easily detected than with an open-ended question.
 A poorly planned list may act as a constraint, leaving no room for correct answers that were not catered for.
 Too few categories may fail to differentiate between important groups, and enumerator error
(placing the tick in the wrong box by accident) will be more common.

7.4 Question Layout


In questionnaire design, as a general principle, questions should be presented in a logical order
designed to follow a natural sequence. Four basic alternatives are found in the layout of questions.
 A verbatim listing of every question, with complete wording and instructions on the
progression of the respondent through the form. It is commonly found in forms that are
designed for self-enumeration or where it is critical to the study that precise wording is used
at every interview. It can lead to lengthy and complex questionnaires and is rarely found.
 A listing of questions in a specific order, but without full or precise wording of the
questions or instructions for progression through the form. The form is normally
accompanied by a detailed reference manual, in which questions are specified in full and
examples given. The form will be completed by a trained enumerator and hence, careful
training is necessary to ensure that enumerators follow the guidelines when they interview
respondents.
 A tabular row-and-column format in which spaces are indicated for responses, usually in
coded form, without any specification of questions. Question order is indicated by the
sequence of response categories. It is designed specifically to accommodate the needs of
data processing. In this case, a reference manual is very important, and comprehensive
training and experienced enumerators are essential if it is to produce satisfactory results.
 A checklist of topics, indicating key facts to be covered, but with answers recorded either in
an unstructured way in a field notebook or in a simplified row/column table. The checklist
approach can be used in an informal study and requires experienced workers or
professionals.

7.5 Question phrasing and common problems which arise with question phrasing
Another aspect of questionnaire design that needs serious consideration is phrasing of the question.
The information required should be well and clearly defined at each stage at which a question is
posed: the initial definition and explanation in the survey manual; the text in the questionnaire; the precise
units for physical measurement; and the verbal phrasing used by the enumerator.
At each stage the question should have:
 A clear meaning
 The same meaning to every person asked and the researcher,
 An answer which the respondent knows,
 An answer which can be given clearly and unambiguously by the respondent.
a) Leading questions
A leading question is one that, by its wording, leads the respondent to choose one response over another.
The presentation of questions should be neutral. The form of the question should not
indicate a preferred or 'correct' answer. For example, the questions 'You don't smoke, do you?'
and 'Do you buy the fertilizer recommended by the extension worker?' lead respondents to state
that they do not smoke in the first case, and to feel that they should buy the fertilizer recommended by the
extension worker, and are wrong if they fail to do so, in the second case.
b) Multiple questions
Multiple (double-barreled) questions are questions which combine two or more distinct questions
into a single question. For example: 'Do you like listening to the radio and watching television?'
'Do you have a tractor or plough?' 'Does this company have pension and health insurance benefits?'
In such cases the respondent would be confused and undecided as to which answer to offer.
The best way to avoid confusion is to replace double questions with two or more single questions
and to ask only one question at a time.
c) Ambiguous questions
Ambiguity, confusion, and vagueness must be avoided in a question, since different people will
understand the question differently and, in effect, its interpretation will depend on the individual
respondent. The question 'What is your income?' could mean weekly, monthly, or annual; family
or personal; from salary or from all sources; for this year or last year. The question 'Do you drink
beer frequently?' is ambiguous because the word 'frequently' does not specify a fixed time
reference. Vague words and phrases like 'kind of', 'fairly', 'generally', 'often', 'regularly', etc.,
should be avoided.
d) Probing questions
Probing is not easy. A delicate balance has to be struck between persistence and rudeness. Very
often the respondent does not want to tell the truth. In some cultures it is socially acceptable to tell
lies to close friends, never mind strangers. The enumerator working on a repeated visit survey has
to maintain a working relationship with the respondent and cannot permit the need to resolve minor
contradictions on a few questions to disrupt the relationship. In some cases unbelievable data have
to be accepted, and it is helpful if some method is agreed for the enumerator to draw attention to
this on the form.
e) Use simple language
The language of a question should be simple. The aim in question wording is to communicate
with respondents as nearly as possible in their own language. Thus the wording of the question
must be appropriate to the respondent. Questions should avoid the use of technical terms and jargon
which the respondent may not understand. Where it is necessary to use technical or legal terms, one
should provide definitions and explanations.
For example, instead of asking 'Do you use inorganic fertilizer?', it is better to specify types, brand names or
colloquial terms with which the respondent will be familiar. Also use terms which the respondent
will understand and which will not cause offence. For example, terms such as 'peasant', 'tribe' or
'witchdoctor' may cause offence.
f) Sensitive topics
In some cultures people do not like to discuss private matters openly. Sensitive questions are apt to
be irritating, threatening, or embarrassing to the respondent. Such questions are prone to normative
answers, that is, answers which confirm that the respondent acts within the rules of society even if
that particular individual sometimes acts outside these rules. In a society which generally condemns
drunkenness, a question about drunkenness might generate denial even if drunkenness sometimes
does occur. Under these circumstances it may be useful to word the question so that there is some
assumption that the activity does take place. Thus, rather than ask 'Do you ever get drunk?', we might
ask 'How often do you get drunk?' The assumption in the question that the respondent might sometimes get
drunk may ease the guilt of the respondent and generate a more truthful answer.
Questions on age, physical or mental disability, deaths in households, income, sexual behavior
and family planning are generally regarded as sensitive issues.
Special attention should be given during field testing of the questionnaire to identify particularly
sensitive questions and how they can be improved by rewording or better interviewing procedure.

7.6 Choice of the Reference Period


During questionnaire design, the choice of an appropriate time reference period is an extremely
important consideration. The time reference period is the specified length of time for which the
respondent is asked to give information about events occurring within it. The choice of reference
period depends on the method of inquiry, the expected frequency of transactions, the manner of
account keeping, and the recall lapse. In general, the more recent and the shorter the reference period, the
better the information is likely to be.

CHAPTER 8

8 PRE-TESTS AND PILOT SURVEY

8.1 Pre–tests
It is difficult to plan a survey without a good deal of knowledge of its subject matter, the population
it is to cover, the way people will react to questions and even the possible answers they are likely to
give. Particularly for large-scale surveys it should be the general rule to conduct pretests and pilot
surveys in order to answer the following questions.
 How is one to estimate how long the survey will take, how many interviews will be needed, and
how much money it will cost?
 How, without trial interviews, can one be sure that the questions will be as meaningful to the
average respondent as to the survey expert?
 How is one to decide which questions are worth asking at all?
Pretests and pilot surveys are standard practice with professional survey bodies and are widely used
in research surveys.
The pretest is a preliminary application of the data gathering technique for the purpose of
determining its adequacy. This may take the form of a series of small pre-tests on isolated problems
of the design. For example in testing of questionnaires, pre-testing refers to one or more series of
interviews conducted on successive drafts of the questionnaire for the purpose of identifying and
correcting errors and shortcomings. Its objective is to evaluate the general receptivity and feasibility
of the questionnaire, and to identify specific problems of communication between the interviewer and
the respondent in terms of specific questions or items of information sought.

8.2 Pilot study


A pilot survey or pilot study is generally a full-scale dress rehearsal of the survey. A major purpose
of the pilot study is to check whether the organization and arrangements of the survey actually work
satisfactorily. The whole of the survey operation, in all its aspects, must be tested out on a small scale.
This approach thus checks the administrative and organizational arrangements in general, the
arrangements for the supply and distribution of all the resources and equipment needed for the
survey, as well as the fieldwork operations, the survey forms and manual, the sample size
determination and the data processing.
The pilot survey should proceed through all the stages and operations of the survey proper, but on a small scale in
a few selected localities. These localities should be chosen to cover as complete a range as
possible of the types of area and of the population characteristics to be covered by the survey.
But there may not be enough resources or time to cover as much as needed, in which case the priority is to
cover a few areas but over a broad range of characteristics. Thus, the size and design of the pilot
survey is a matter of convenience, time and money. It should be large enough to fulfill the above
functions.
Since the purpose of the pilot study is to identify weaknesses and problems with the survey
materials, procedures and arrangements, the senior technical staff should be closely involved in
the pilot study to observe all stages of the work as it is being done under field conditions. In other
words, the survey forms and procedures must be observed under operational conditions in the field if
problems are to be correctly identified and appropriate solutions found.

If the pilot study is properly done, it is likely to lead to changes to the survey forms and manuals, and to the
procedures and organizational arrangements. It is therefore necessary to allow enough time to
analyze its results and observations, and to produce revised materials and arrangements in
good time for the start of the main survey operations.

8.3 Specific uses of pilot survey


The pilot survey has many benefits, particularly if the survey is to be conducted for the first time.
In general it provides guidance on:
 The adequacy of the sampling frame from which it is proposed to select the sample.
 The estimates necessary for determining the size of sample needed in the actual survey so
that the final estimates may be made with stated precision.
 The non-response rate to be expected, i.e., the probable numbers of refusals and
non-contacts can be roughly estimated from the pilot survey or pretests, and ways of reducing
non-response can be sought.
 Making a sensible choice from alternative methods of collecting the data (observation, mail
questionnaires, interviews, etc.).
 The adequacy of the questionnaire, which is probably the most valuable function of the pilot
survey.
 The efficiency of the instructions and general briefing of interviewers.
 The codes chosen for pre-coded questions, which may help to decide the alternative
answers to be allowed for in the coding.
 The probable cost and duration of the main survey and of its various stages.
 The efficiency of the organization in the field, in the office and in the communication
between the two.
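One way in which pilot estimates can feed into the sample size is sketched below. The sketch assumes simple random sampling, a proportion as the quantity of interest, no finite population correction, and the common large-sample formula n = z^2 p(1 - p) / e^2, inflated for the expected non-response. Neither the formula nor the input values come from these notes; the function name `sample_size_for_proportion` and all figures are hypothetical.

```python
import math


def sample_size_for_proportion(p_pilot, margin, nonresponse_rate, z=1.96):
    """Sample size for estimating a proportion under simple random sampling.

    p_pilot          -- proportion observed in the pilot survey
    margin           -- desired margin of error (half-width of the interval)
    nonresponse_rate -- share of selected units expected not to respond
    z                -- normal quantile; 1.96 corresponds to about 95% confidence
    """
    n_respondents = z ** 2 * p_pilot * (1 - p_pilot) / margin ** 2
    # Select more units than needed so that enough remain after non-response.
    n_selected = n_respondents / (1 - nonresponse_rate)
    return math.ceil(n_respondents), math.ceil(n_selected)


# Hypothetical pilot results: 30% of units have the characteristic,
# 10% non-response, and a desired margin of error of 5 percentage points.
needed, to_select = sample_size_for_proportion(0.30, 0.05, 0.10)
print(needed, to_select)
```

A more complex design (stratified or cluster sampling) would need a design-specific formula; the point of the sketch is only that both the variability and the non-response rate observed in the pilot enter the calculation.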

CHAPTER 9

9 SURVEY COST ESTIMATION


9.1 Time Scheduling
Once there is an agreement to proceed with the survey, a planning timetable should be drawn up in
order to facilitate planning and budgeting. Scheduling for field operations must take into account
two key aspects:
 List of survey activities; and
 Approximate time needed to perform each activity
The important activities to be carried out from the beginning to the end must be listed at the
planning phase to ensure that no activity is overlooked. Each activity should be listed
against the approximate time needed to perform it. Be realistic about the time
necessary to complete each stage of the work. The following time schedule is an example for a
personal interview study.
Activities                                         Time needed
1. Formulate survey objectives ___________________ 1 week
2. Formulate survey methodology and design _______ 1 week
3. Recruit and train interviewers ________________ 3 weeks
4. Draft pilot questionnaire _____________________ 2 weeks
5. Pretest questionnaire _________________________ 2 weeks
6. Finalize questionnaire ________________________ 3 weeks
7. Compile sampling frame ________________________ 7 weeks
8. Select the sample _____________________________ 2 weeks
9. Collect the data ______________________________ 5 weeks
10. Transfer data to machine-readable media ______ 2 weeks
11. Process the data _____________________________ 3 weeks
12. Prepare the report ___________________________ 1 week
Total ____________________________________________ 32 weeks
Some of the activities can be performed simultaneously, while others require the completion of other
activities first. The schedule should therefore be presented in the form of a chart so that the required
time can easily be estimated.
A bar chart approach to presenting the time schedule was developed by Henry L. Gantt. For the
activities listed above it shows a more realistic time span, in which 21 weeks are required instead
of 32 weeks, as illustrated below.
The following chart is an example of a Gantt chart of the study activities.
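How overlapping activities shorten the schedule can be sketched with a small earliest-finish calculation. The durations are those listed in the time schedule; the dependency list (`PREDECESSORS`) is a hypothetical assumption made only for illustration, since the notes do not state which activities may run in parallel, and the activity labels and the function name `earliest_finish` are likewise illustrative.

```python
# Durations in weeks, as listed in the time schedule.
DURATION = {
    "objectives": 1, "methodology": 1, "train interviewers": 3,
    "draft questionnaire": 2, "pretest": 2, "finalize questionnaire": 3,
    "sampling frame": 7, "select sample": 2, "collect data": 5,
    "transfer data": 2, "process data": 3, "report": 1,
}

# Hypothetical dependencies (an assumption of this sketch, not given in the notes):
# an activity can start once all activities listed for it are finished.
PREDECESSORS = {
    "methodology": ["objectives"],
    "sampling frame": ["objectives"],
    "draft questionnaire": ["methodology"],
    "train interviewers": ["draft questionnaire"],
    "pretest": ["draft questionnaire"],
    "finalize questionnaire": ["pretest"],
    "select sample": ["sampling frame"],
    "collect data": ["train interviewers", "finalize questionnaire", "select sample"],
    "transfer data": ["collect data"],
    "process data": ["transfer data"],
    "report": ["process data"],
}


def earliest_finish(durations, predecessors):
    """Earliest finish week of every activity when independent activities overlap."""
    finish = {}

    def visit(name):
        if name not in finish:
            # An activity starts when its latest predecessor has finished.
            start = max((visit(p) for p in predecessors.get(name, [])), default=0)
            finish[name] = start + durations[name]
        return finish[name]

    for name in durations:
        visit(name)
    return finish


finish = earliest_finish(DURATION, PREDECESSORS)
sequential_weeks = sum(DURATION.values())  # one activity after another
overlapped_weeks = max(finish.values())    # activities overlap where allowed
print(sequential_weeks, overlapped_weeks)
```

With these assumed dependencies the sketch gives 32 weeks for strictly sequential work and 21 weeks when activities overlap, in line with the figures quoted in the text; a different set of dependencies would give a different span.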

9.2 Preparing Budgets
Budget preparation involves the assignment of a cost to each survey activity. The main expenditure
items include:
 Office wages and salaries (administration, executive personnel, quality control, data
processing);
 Survey materials;
 Supervisory and interviewing costs (enumerators', supervisors' and field officers' salaries
and allowances);
 Supplies for the reproduction of questionnaires, forms and manuals, and other stationery;
 Transport costs;
 Computer services;
 Sampling design costs;
 Other administrative costs (office rentals, overhead recovery), etc.

Preparation of preliminary budget estimates is a priority activity that should be planned and
executed at an early stage. The budget will depend on the survey design, including the levels of
precision desired for various estimates, as well as on the geographical and other classifications for the
presentation of the results, and the operational conditions prevailing in the region.

Example of Budget Preparation for a Survey (amounts in Birr)


1. Office experts
1 survey director for 1 month at 10,000 birr per month              10,000
1 field organizer for 1 month at 6,000 birr per month                6,000
1 survey statistician for 1 month at 6,000 birr per month            6,000
Sub-total                                                           22,000

2. Field personnel
a) Salaries
50 enumerators for 2 months at 400 birr per month                   40,000
10 field supervisors for 3 months at 600 birr per month             18,000
10 drivers for 3 months at 350 birr per month                       10,500
Sub-total salaries                                                  68,500
b) Allowances
50 enumerators for 1.5 months at 25 birr per day                    56,250
10 field supervisors for 2 months at 30 birr per day                18,000
10 drivers for 2 months at 25 birr per day                          15,000
50 guides for 2 days at 10 birr per day                              1,000
Sub-total allowances                                                90,250
Total field personnel                                              158,750

3. Equipment and transport


Office equipment and furniture                                      80,000
Rent of vehicles                                                   150,000
Running costs, maintenance, insurance                                6,000
Enumerators' equipment                                              10,000
Data processing equipment                                            4,000
Miscellaneous                                                        3,000
Total equipment and transport                                      253,000

4. Stationery
Printing of forms, questionnaires                                   12,000
Pens, pencils, sharpeners, erasers, rulers                             500
Report production                                                    1,500
Manuals                                                              2,500
Total stationery                                                    16,500

5. Data Processing staff


1 data expert for 2 months at 3,000 birr per month                   6,000
5 data editors and coders for 20 days at 1,500 birr per month each   5,000
2 data entry clerks for 15 days at 2,000 birr per month each         2,000
Total data processing staff                                         13,000

Total budget                                                       463,250

6. Contingency (for unforeseen activities)
Approximately 10%                                                   46,325
Grand total                                                        509,575
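The arithmetic behind such a budget can be sketched in a few lines. The section totals are those of the example; the helper names (`salary_cost`, `allowance_cost`) and the 30-day month used to convert daily allowances are assumptions of this sketch, the latter inferred from the enumerators' allowance line (50 enumerators x 45 days x 25 birr = 56,250).

```python
DAYS_PER_MONTH = 30  # assumption: daily allowances are paid for 30 days per month


def salary_cost(people, months, birr_per_month):
    """Salary line item: number of staff x months x monthly rate."""
    return people * months * birr_per_month


def allowance_cost(people, months, birr_per_day):
    """Allowance line item: number of staff x days in the field x daily rate."""
    return round(people * months * DAYS_PER_MONTH * birr_per_day)


# Line items of the field personnel section, recomputed from their descriptions.
salaries = salary_cost(50, 2, 400) + salary_cost(10, 3, 600) + salary_cost(10, 3, 350)
enumerator_allowance = allowance_cost(50, 1.5, 25)

# Section totals as given in the example budget (in Birr).
SECTION_TOTALS = {
    "Office experts": 22_000,
    "Field personnel": 158_750,
    "Equipment and transport": 253_000,
    "Stationery": 16_500,
    "Data processing staff": 13_000,
}
CONTINGENCY_PERCENT = 10

total_budget = sum(SECTION_TOTALS.values())
contingency = total_budget * CONTINGENCY_PERCENT // 100
grand_total = total_budget + contingency
print(salaries, enumerator_allowance, total_budget, contingency, grand_total)
```

Keeping each line item as a product of quantity, duration and rate makes it easy to re-run the budget when the number of enumerators or the length of fieldwork changes.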

