CH 4
4.0. Introduction
This unit explains the nature of sampling and how to determine the appropriate sample design. In
fact, sampling is common in daily activities, but most of these samples are not scientific. This
unit discusses the concept of sampling and the sampling process as a central aspect of economic
research.
Sampling is defined in terms of the population to be studied. A population, or universe, is any
complete group of people, companies, colleges or the like that share some set of characteristics.
When a distinction is made between population and universe, it is on the basis of whether the
group is finite (population), or infinite (universe).
4.1. Difference between Census and Sampling
Definition of Census
A census is a well-organized procedure of gathering, recording and analyzing information
regarding the members of the population. It is an official and complete count of the universe,
in which every unit of the universe is included in the collection of data. Here, universe means
any region (a city or country) or group of people from which the data can be acquired.
Because the entire population is enumerated, this method requires substantial funds, time and
labour for gathering information. The method is useful for finding, for example, the ratio of
males to females, the ratio of literate to illiterate people, or the ratio of people living in
urban areas to people living in rural areas.
Definition of Sampling
We define sampling as the process in which a fraction of the population is selected to
represent the characteristics of the larger group. This method is used for statistical testing
where it is not possible to consider all members or observations because the population is very
large. As statistical inferences are based on the sample observations, the selection of an
appropriate representative sample is of utmost importance. The sample selected should reflect
the entire universe and not only a particular section of it. On the basis of the data collected
from the representative sample, conclusions are drawn about the whole population. For instance,
a company places an order for raw material after checking only a sample of it.
Census and sampling are two methods of collecting survey data about a population, and both are
used by many countries. A census is a quantitative research method in which all the members of
the population are enumerated. Sampling, on the other hand, is the method most widely used in
statistical testing: a subset that represents the entire group is selected from a large
population. A census implies complete enumeration of the study objects, whereas sampling means
enumeration of a subgroup of elements chosen for participation.
The main differences between census and sampling are discussed in the points below:
1. The census is a systematic method that collects and records data about all the members
of the population. Sampling selects a subset of the population to represent the entire
group in all its characteristics.
2. The census is also known as a complete enumeration survey method. In contrast,
sampling is known as a partial enumeration survey method.
3. In a census, each and every unit of the population is researched. In sampling, only a
handful of items is selected from the population for research.
4. A census is a very time-consuming method of survey, whereas a sample survey takes
much less time.
5. The census method requires high capital investment, as it involves the research and
collection of all the values of the population. Sampling, by contrast, is a comparatively
economical method.
The table below summarizes the difference between census and sampling
4.2. Sampling
The process of sampling involves a procedure that uses a small number of items or parts of the
whole population to make conclusions regarding the whole population.
The purpose of sampling is to enable researchers to estimate some unknown characteristics of the
population.
Why Sample? In a scientific study, when the objective is to estimate an unknown population
value, why should a sample be taken rather than a complete census? Sampling cuts costs, reduces
labor requirements, and gathers information quickly. Samples, if properly selected, are accurate
in most cases. When the population elements are homogeneous, samples are representative of the
population. Samples are accurate only when researchers have taken care to draw representative
samples properly.
Simply, a sampling frame is the source material or device from which a sample is drawn. It is a
list of all those within a population who can be sampled, and may include individuals,
households or institutions. The sampling frame consists of a listing of all possible sampling
units.
Sampling frame error: a sampling frame error occurs when certain sample elements are excluded
or when the entire population is not represented in the sampling frame. The sampling frame may
mistakenly include elements that are not members of the ideal target population, or some
elements of the ideal target population may not be listed in the sampling frame. There is
therefore a possibility of drawing sample elements from outside the target population.
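The two kinds of sampling frame error described above can be illustrated with a small sketch. The household identifiers below are made up for illustration and are not drawn from any real frame; the point is only that comparing the frame with the target population shows which units can never be selected and which units do not belong.

```python
# Hypothetical target population and sampling frame (identifiers are invented).
target_population = {"household_01", "household_02", "household_03",
                     "household_04", "household_05"}
sampling_frame = {"household_01", "household_02", "household_03",
                  "household_98", "household_99"}

# Target units missing from the frame can never be selected.
missing_from_frame = target_population - sampling_frame

# Frame units outside the target population may be selected by mistake.
outside_target = sampling_frame - target_population

print(sorted(missing_from_frame))  # ['household_04', 'household_05']
print(sorted(outside_target))      # ['household_98', 'household_99']
```

In practice the target population is usually not fully listed, so this comparison can only be approximated, for example by checking the frame against an independent source.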
Significance
Significance is the percent chance that a relationship found in the data is just due to an unlucky
(unsuccessful) sample, so that if we took another sample we might find nothing. That is,
significance is the chance of a Type I error: the chance of concluding we have a relationship
when we do not. Social scientists often use the .05 level as a cutoff: if there is a 5% or smaller
chance that a relationship is just due to chance, we conclude the relationship is real
(technically, we reject the null hypothesis that the strength of the relationship is not different
from zero).
Significance testing is not appropriate for non-random samples or for enumerations/censuses.
We would like to make similar inferences for non-random samples, but that is impossible. Any
relationship, no matter how small, is a true relationship (barring measurement error) for an
enumeration.
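The idea that the .05 level is the chance of a Type I error can be checked with a small simulation. This is only a sketch under stated assumptions: the population is normal with a true mean of zero and a known standard deviation of 1, so the null hypothesis is true by construction, and a simple z-test is used as the significance test. The function name and parameters are illustrative.

```python
import math
import random
import statistics

def type_one_error_rate(n=30, repeats=4000, z_crit=1.96, seed=3):
    """Share of random samples that wrongly reject a true null hypothesis."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(repeats):
        # The null hypothesis is true: the population mean really is 0 (sd = 1).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic for the sample mean with known standard deviation 1.
        z = statistics.fmean(sample) / (1.0 / math.sqrt(n))
        if abs(z) > z_crit:
            false_positives += 1  # "significant" result found by chance alone
    return false_positives / repeats

rate = type_one_error_rate()
print(rate)  # expected to be close to 0.05
```

Roughly one sample in twenty shows a "significant" result even though no relationship exists, which is what the .05 cutoff means.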
Confidence interval
Confidence intervals are directly related to coefficients of significance. For a given variable in a
given sample, one could compute the standard error; assuming a normal distribution, the 95%
confidence interval is the sample estimate plus or minus 1.96 times the standard error. If a very
large number of samples were taken, and a (possibly different) estimated mean and corresponding
95% confidence interval were constructed from each sample, then 95% of these confidence
intervals would contain the true population value, assuming random sampling.
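The repeated-sampling interpretation above can be sketched in a few lines. The population values (mean 50, standard deviation 10), the sample size and the function names are assumptions made for illustration; the interval is the sample mean plus or minus 1.96 standard errors, as described in the text.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (lower, upper): sample mean plus or minus z standard errors."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * std_error, mean + z * std_error

def coverage_rate(true_mean=50.0, true_sd=10.0, n=100, repeats=2000, seed=1):
    """Share of repeated random samples whose interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = mean_confidence_interval(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repeats

rate = coverage_rate()
print(rate)  # expected to be close to 0.95
```

Any single interval either contains the true value or it does not; the 95% refers to the long-run share of intervals that do.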
B) Systematic Sampling
Systematic sampling is a sampling procedure in which an initial starting point is selected by a
random process, and then every nth item on the list is selected. Suppose a researcher wants to
take a sample of 1000 from a list consisting of 200,000 names of companies. With systematic
sampling, every 200th name from the list would be drawn (200,000 / 1000 = 200).
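The company-list example can be sketched as follows. The list of names is generated for illustration, and the function name and its parameters are assumptions, not part of any standard library.

```python
import random

def systematic_sample(items, sample_size, seed=None):
    """Pick a random start, then take every nth item (n = sampling interval)."""
    if not 0 < sample_size <= len(items):
        raise ValueError("sample_size must be between 1 and len(items)")
    rng = random.Random(seed)
    interval = len(items) // sample_size   # e.g. 200,000 // 1000 = 200
    start = rng.randrange(interval)        # random start within the first interval
    return [items[start + i * interval] for i in range(sample_size)]

# Hypothetical list of 200,000 company names.
companies = [f"Company {i}" for i in range(1, 200_001)]
sample = systematic_sample(companies, 1000, seed=42)
print(len(sample))  # 1000
```

Only the starting point is random. If the list has a periodic pattern that matches the interval, the sample may be unrepresentative, so the ordering of the list deserves a check before this method is used.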
C) Stratified Sampling
Stratified sampling is a probability sampling procedure in which simple random sub-samples are
drawn from within different strata that are more or less equal on some characteristic.
The first step, for both stratified and quota sampling, is choosing strata on the basis of existing
information, such as classifying retail outlets on the basis of annual sales volume. In the second
step (the process of selecting sampling units within the strata), a sub-sample is drawn using
simple random sampling within each stratum. The result is a more efficient sample: random
sampling error is reduced with the use of stratified sampling.
Proportional stratified sample is a stratified sample in which the number of sampling units
drawn from each stratum is in proportion to the population size of that stratum.
Disproportional stratified sample is a stratified sample in which the sample size for each stratum
is allocated according to analytical considerations. That is, sample size for each stratum is not
allocated in proportion to the population size.
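A proportional stratified sample can be sketched as below, using the retail-outlet example. The three strata and their sizes (600 small, 300 medium and 100 large outlets) are hypothetical, as are the function and variable names.

```python
import random

def proportional_stratified_sample(strata, total_sample_size, seed=None):
    """Draw a simple random sub-sample from each stratum, sized by its share."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Allocation proportional to the stratum's share of the population.
        # Rounding can make the total differ slightly from total_sample_size.
        k = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical retail outlets classified by annual sales volume.
outlets = {
    "small": [f"S{i}" for i in range(600)],
    "medium": [f"M{i}" for i in range(300)],
    "large": [f"L{i}" for i in range(100)],
}
sample = proportional_stratified_sample(outlets, 100, seed=7)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)  # {'small': 60, 'medium': 30, 'large': 10}
```

A disproportional design would replace the allocation line with sizes chosen on analytical grounds, for example sampling more heavily from the small stratum of large outlets.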
D) Cluster Sampling
Cluster sampling is an economically efficient sampling technique in which the primary sampling
unit is not an individual element in the population but a large cluster of elements (for example,
a city). The area sample is a popular type of cluster sample: it is a cluster sample in which the
primary sampling unit is a geographical area. Clusters are used when no lists of the sample
population are available.
The problem with random sampling methods when we have to sample a population that is
dispersed across a wide geographic region is that you will have to cover a lot of ground
geographically in order to get to each of the units you sampled. Imagine taking a simple random
sample of all the residents of New York State in order to conduct personal interviews. By the
luck of the draw you will wind up with respondents who come from all over the state. Your
interviewers are going to have a lot of traveling to do. It is for precisely this problem that cluster
or area random sampling was invented.
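A one-stage cluster sample can be sketched as follows: whole clusters are selected at random, and every element in a selected cluster is included. The county names and resident identifiers are invented for illustration, and the function name is an assumption.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and keep every element inside them."""
    rng = random.Random(seed)
    # The primary sampling unit is the cluster, not the individual element.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical counties, each with a list of resident identifiers.
counties = {
    f"County {letter}": [f"{letter}-{i}" for i in range(50)]
    for letter in "ABCDEFGH"
}
selected = cluster_sample(counties, 2, seed=11)
print(sorted(selected))  # names of the two counties interviewers must visit
```

Interviewers now travel to two counties instead of all eight, which is the cost saving that motivates the method.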
The four methods we've covered so far -- simple, stratified, systematic and cluster -- are the
simplest random sampling strategies. In most real applied social research, we would use
sampling methods that are considerably more complex than these simple variations. The most
important principle here is that we can combine the simple methods described earlier in a variety
of useful ways that help us address our sampling needs in the most efficient and effective manner
possible. When we combine sampling methods, we call this multi-stage sampling.
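One common combination is a two-stage design: clusters are selected first, and a simple random sample is then drawn within each selected cluster. The sketch below assumes hypothetical regions and household identifiers; the function name and parameters are illustrative only.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: sample units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        units = clusters[name]
        # Simple random sample inside the selected cluster.
        result[name] = rng.sample(units, min(n_per_cluster, len(units)))
    return result

# Hypothetical regions, each with a list of household identifiers.
regions = {
    f"Region {r}": [f"R{r}-H{i}" for i in range(40)]
    for r in range(1, 7)
}
sample = two_stage_sample(regions, n_clusters=2, n_per_cluster=5, seed=5)
print({name: len(units) for name, units in sample.items()})
```

The second stage could equally use systematic or stratified selection; the choice depends on what lists are available inside each cluster.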
Sampling error
Sampling error is the difference between the sample result and the result of a census conducted
by identical procedures. Sampling error occurs because of chance variation in the scientific
selection of sampling units. There is always a slight difference between the true population value
and the sample value, hence a small sampling error. Sampling error is a function of sample size:
as sample size increases, sampling error decreases.
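The relationship between sample size and sampling error can be seen in a small simulation. The population below is artificial (10,000 values generated with mean 100 and standard deviation 15), and the function name is an assumption; the sketch measures the average absolute gap between the sample mean and the population mean for several sample sizes.

```python
import random
import statistics

def average_sampling_error(population, n, repeats=500, seed=0):
    """Average absolute gap between sample mean and population mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # simple random sample of size n
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

# Artificial population used only for illustration.
pop_rng = random.Random(123)
population = [pop_rng.gauss(100.0, 15.0) for _ in range(10_000)]

errors_by_size = {n: average_sampling_error(population, n) for n in (25, 100, 400)}
for n, err in errors_by_size.items():
    print(n, round(err, 2))
```

The error shrinks as the sample grows, but quadrupling the sample size only roughly halves it, because sampling error falls with the square root of the sample size.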
Sampling errors are random variations in the sample estimates around the true population
parameters. Sampling errors can be calculated only for probability or random samples. The
difference between sampling and non-sampling error is that the extent of the former can be
estimated from the sample variation, whereas the latter cannot.
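The measurement of sampling error is usually called the precision of the sampling plan.
Sampling error is related to confidence intervals. The Y% confidence interval for the
population mean is
$\bar{X} \pm z \, \dfrac{\sigma}{\sqrt{n}}$
where $\bar{X}$ is the sample mean, $z$ is the standard normal value for the chosen confidence
level (1.96 for 95%), $\sigma$ is the standard deviation and $n$ is the sample size.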
Non-sampling error is error resulting from some imperfect aspect of the research design that
causes response errors, or from a mistake in the execution of the research. It also comes from
such sources as sample bias, mistakes in recording responses, and non-responses from persons
who were not contacted or who refused to participate. Non-sampling error includes non-coverage
error, sampling of the wrong population, non-response, instrument error and interviewer error.
(1) Non-coverage error – sampling frame defects: this occurs when some part of the population is
not included in the sample. Non-coverage error also occurs when the lists used for sampling are
incomplete or outdated. For instance:
- Omission of part of the intended population: soldiers, students living on campus,
people in hospitals, prisoners, etc. are typically excluded from national samples.
- These omissions are unlikely to affect national results by more than 1%.
(2) The wrong population is sampled: Researchers must always be sure the group being
sampled is drawn from the population they want to generalize about (intended population).
E.g. 1: Drawing a sample of college students to generalize about all college-age persons.
E.g. 2: Surveying swimmers at the city pool to determine whether the admission price is so high
as to discourage use of the pool. Potential users who have already found the price too high will
not be among the swimmers.
(3) The response rate is low (non-response): Some people refuse to be interviewed because they
are ill, are too busy, or simply do not trust the interviewer.
(4) Instrument error: The word “instrument” in sampling survey jargon means the device by
which we collect data, usually a questionnaire filled out by the respondent. Different wording of
a question can lead to different answers being given by a respondent. When a question is badly
worded, the resulting error is called instrument error. For example, leading questions or
carelessly worded questions may be misinterpreted by some respondents.
(5) Interviewer error: This occurs when some characteristic of the interviewer, such as age or
sex, affects the way in which respondents answer questions. For example, questions about racial
discrimination might be answered differently depending on the racial group of the interviewer.