Module 18 The Need For Sampling
LEARNING OUTCOMES
At the end of the module, you are expected to exhibit the following competencies:
1. Recognize the concepts of random sampling.
2. Distinguish between parameter and statistic.
3. Recognize that the size of the sample (not the fraction of the population sampled) determines the precision
of estimates from a probability sample.
IMPORTANT CONCEPTS
A sample survey is a method of systematically gathering information on a segment of the population, such as
individuals, families, wildlife, farms, business firms, and unions of workers, for the purpose of inferring
quantitative descriptors of the attributes of the population. Compared with a census, a sample offers several advantages:
• Cost. A sample often provides useful and reliable information at a much lower cost than a census. For extremely
large populations, conducting a census can even be impractical. In fact, the difficulty of analyzing complete
census data led to summarizing a census by taking a “sample” of returns.
• Timeliness. A sample usually provides more timely information because less data has to be collected and
processed. This is particularly important when information is needed quickly.
• Accuracy. A sample often provides information as accurate as, or more accurate than, a census, because data
errors can typically be controlled better in smaller operations.
• Detailed information. Sample surveys allow more time to be spent obtaining detailed information than censuses
do. In a census, we can often obtain only stock data, not flow data. For instance, agricultural production
cannot be generated from censuses.
• Destructive testing. When a test involves the destruction of an item, sampling must be used. Battery life tests
must use sampling because something must be left to sell!
Probability Sampling
If data are to be used to make decisions about a population, then how the data are collected is critical. For sample
data to provide reliable information about a population of interest, the sample must be representative of that
population. Using chance to select samples from the population helps make the samples representative.
If every member of the population has a known, nonzero chance of being selected into the sample, then the sample
is called a probability sample. Probability samples are meant to ensure that the segment taken is representative of
the entire population. Examples include the Family Income and Expenditure Survey (FIES), the Labor Force Survey,
and the Quarterly Survey of Establishments, all conducted by the Philippine Statistics Authority (PSA). Opinion polls
conducted by non-government organizations with established track records, such as Social Weather Stations and
Pulse Asia, likewise use chance methods to select their survey respondents.
Data collected from these probability sampling-based surveys yield estimates of characteristics of the population
that these surveys attempt to describe.
a. Simple random sampling (SRS) gives every possible sample of a given size an equal chance of being picked, so
every member of the population also has an equal chance of being included in the sample. Selection may be
with replacement (a selected individual or unit is returned to the frame for possible reselection) or without
replacement (a selected individual or unit is not returned to the frame). This sampling method requires a list
of the elements of the population called the sampling frame. In agricultural surveys or surveys of
establishments, the sampling frame may be a list frame, an area frame, or a mixture of both. Samples may be
drawn using a table of random numbers or a computer random number generator.
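As a minimal sketch of the two selection schemes just described, the Python standard library's `random.Random.sample` draws without replacement and `random.Random.choices` draws with replacement. The frame of household labels below is hypothetical, and the fixed seed only makes the draw reproducible.

```python
import random

def srs_without_replacement(frame, n, rng):
    """Draw n distinct units; each unit can appear at most once."""
    return rng.sample(frame, n)

def srs_with_replacement(frame, n, rng):
    """Draw n units independently; a unit may be selected more than once."""
    return rng.choices(frame, k=n)

# Hypothetical sampling frame: households labelled H001 to H050
frame = [f"H{i:03d}" for i in range(1, 51)]
rng = random.Random(2024)  # fixed seed so the draw can be reproduced

without = srs_without_replacement(frame, 10, rng)
with_repl = srs_with_replacement(frame, 10, rng)
print("Without replacement:", without)
print("With replacement:   ", with_repl)
```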
b. Stratified sampling is an extension of simple random sampling that ensures different homogeneous groups in
the population, called strata, are represented in the sample. To obtain a stratified sample, the population is
divided into two or more strata based on common characteristics. An SRS is then selected from each stratum,
often with sample sizes proportional to the stratum sizes, and the samples from the strata are combined into
one. This is a common technique when sampling from a population of voters, stratifying across racial or
socio-economic classes. When thinking of using stratification, the following questions must be asked:
Are there different groups within the population?
Are these differences important to the investigation?
Usually, stratified sampling is done when the population is divided into several subgroups with common
characteristics. The population may be divided into urban and rural locations (as dwellings in rural areas may tend
to be more homogeneous than dwellings in urban areas); the student population may be divided by year level; or
the workers in a hospital may be categorized by occupation: nurse, doctor, janitor, secretary.
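A minimal sketch of stratified sampling with proportional allocation, assuming an SRS without replacement within each stratum. The urban and rural frames below are hypothetical, and `stratified_sample` is an illustrative helper, not a standard function.

```python
import random

def stratified_sample(strata, n, rng):
    """Stratified sampling with proportional allocation.

    Each stratum h receives round(n * N_h / N) units, drawn by simple random
    sampling without replacement. (Rounding can make the total differ slightly
    from n when the proportions are not exact.)
    """
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical frames: 600 urban and 400 rural dwellings
strata = {
    "urban": [f"U{i:03d}" for i in range(600)],
    "rural": [f"R{i:03d}" for i in range(400)],
}
rng = random.Random(11)
result = stratified_sample(strata, n=50, rng=rng)
print({name: len(units) for name, units in result.items()})  # {'urban': 30, 'rural': 20}
```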
c. In systematic sampling, elements are selected from the population at a uniform interval that is measured
in time, order, or space.
Typically, a desired sample size n is decided first. The frame of N units is then divided into n groups of k units
each, where k = N/n. One unit is randomly selected from the first group, and every kth unit thereafter is also
selected. For instance, in Figure 3-02.2, consider a population of 20 trees and a sample size of 4; then
k = 20/4 = 5, and the frame is divided into 4 groups of 5 trees. Suppose that the fourth tree in the first group is
chosen; then every fifth tree thereafter is also chosen, giving trees 4, 9, 14, and 19.
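The steps above can be sketched as follows, assuming for simplicity that N is a multiple of n. The helper name `systematic_sample` and its `start` parameter are illustrative; passing `start=4` reproduces the 20-tree example.

```python
import random

def systematic_sample(frame, n, start=None, rng=None):
    """Select every k-th unit after a random start in the first group.

    k = N // n (assumes N is a multiple of n). `start` is the 1-based position
    chosen in the first group; if omitted, it is drawn at random from 1..k.
    """
    N = len(frame)
    k = N // n
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)  # random position within the first group
    return [frame[start - 1 + j * k] for j in range(n)]

trees = list(range(1, 21))  # trees numbered 1 to 20
print(systematic_sample(trees, n=4, start=4))  # [4, 9, 14, 19]
print(systematic_sample(trees, n=4, rng=random.Random(3)))  # random start, still 5 apart
```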
d. Cluster sampling divides the population into groups called clusters, selects a random sample of clusters,
and then subjects the sampled clusters to complete enumeration; that is, everyone in the sampled
clusters is made part of the sample.
Clusters in the population may be based on convenience in the collection of data. For example, in a village,
clusters can be blocks of houses. In a school, the clusters can be the sections. In a dormitory, clusters can be the
rooms. In a city or municipality, the clusters can be the different barangays. Cluster sampling is used so that data
need not be collected over a wide geographic area, thus saving resources. For instance, instead of taking a simple
random sample of households from all over a town, clusters of dwellings can be selected from different barangays
so that the cost of data collection is minimized.
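A minimal sketch of one-stage cluster sampling: clusters are chosen by SRS, then every unit in each chosen cluster is taken. The barangay names and household labels are hypothetical.

```python
import random

def cluster_sample(clusters, m, rng):
    """One-stage cluster sampling: pick m clusters by SRS, then take every unit in them."""
    chosen = rng.sample(sorted(clusters), m)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical town: barangays (clusters) and the households in each
clusters = {
    "Barangay A": ["A1", "A2", "A3", "A4"],
    "Barangay B": ["B1", "B2", "B3"],
    "Barangay C": ["C1", "C2", "C3", "C4", "C5"],
    "Barangay D": ["D1", "D2"],
    "Barangay E": ["E1", "E2", "E3"],
}
rng = random.Random(5)
sampled = cluster_sample(clusters, m=2, rng=rng)
for name, households in sampled.items():
    print(name, households)  # every household in a chosen barangay is enumerated
```

Note that the final sample size depends on which clusters are drawn, since clusters differ in size.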
Example:
Suppose you want to estimate the mean grade point average (GPA) of learners at a certain higher educational
institution, and you decide that an appropriate sample size is n = 100. To estimate the mean GPA, you can use
simple random sampling to select 100 learners and average their GPAs. However, since freshman GPAs tend to be
lower than senior GPAs, you may want to make sure that all year levels are represented, so you decide to use a
stratified sample.
According to the university’s registrar, the student population consists of 35% freshmen, 30% sophomores, 20%
juniors, and 15% seniors. Take a sample from each stratum proportional to its size; specifically, take simple
random samples of 35 freshmen, 30 sophomores, 20 juniors, and 15 seniors. Then average the GPAs of these
learners to estimate the mean GPA of the entire university.
Instead of year level, you could also form subgroups of the student population based on academic major,
assuming that each student has exactly one major. When stratifying into subgroups, the subgroups must be
mutually exclusive. If they are not, some subjects will have a higher chance of being chosen because they
belong to more than one subgroup.
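The proportional allocation in the GPA example can be checked with a short sketch. The registrar shares come from the example; the GPA values are simulated with hypothetical stratum centres and are not real data. The sketch also shows why simply averaging all 100 GPAs works here: under proportional allocation, the pooled average equals the stratified estimate (the sum of stratum share times stratum sample mean).

```python
import random
from statistics import mean

# Stratum shares reported by the registrar in the example
shares = {"freshmen": 0.35, "sophomores": 0.30, "juniors": 0.20, "seniors": 0.15}
n = 100

# Proportional allocation: n_h = n * W_h
allocation = {level: round(n * w) for level, w in shares.items()}
print(allocation)  # {'freshmen': 35, 'sophomores': 30, 'juniors': 20, 'seniors': 15}

def stratified_mean(shares, samples):
    """Stratified estimate of the mean: sum over strata of W_h times the stratum sample mean."""
    return sum(shares[h] * mean(samples[h]) for h in shares)

# Simulated GPAs, NOT real data; the centres are hypothetical and rise with year level
rng = random.Random(7)
centre = {"freshmen": 2.6, "sophomores": 2.8, "juniors": 3.0, "seniors": 3.2}
samples = {h: [rng.gauss(centre[h], 0.3) for _ in range(allocation[h])] for h in shares}

pooled = mean([g for h in samples for g in samples[h]])
print(f"stratified estimate {stratified_mean(shares, samples):.3f}")
print(f"pooled average      {pooled:.3f}")  # equal under proportional allocation
```

If the strata were sampled at different rates, the pooled average would no longer equal the stratified estimate, and the weighted form should be used.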
Statistics is different from Mathematics. The essential paradigm in Statistics is induction (from the particular to
the general), while Mathematics uses deduction (from the general to the particular). Modern Statistics aims to
develop tools that allow scientifically valid inference from samples to the populations from which they came.
Non-probability Sampling
Non-probability or judgment sampling is the generic name for several sampling methods in which some units in the
population have no chance of being selected into the sample, or in which the inclusion probabilities cannot be
computed. Generally, the procedure involves arbitrary selection of “typical” or “representative” units about
which information is to be obtained. A few types of non-probability samples are listed below:
a. Haphazard or accidental sampling involves an unsystematic selection of sample units. Some disciplines
like archaeology, history, and even medicine draw conclusions from whatever items are made available.
Some disciplines like astronomy, experimental physics, and chemistry often do not care about the
“representativeness” of their specimens.
b. In convenience sampling, the sample consists of units that are easy or expedient for the sampler to obtain.
c. In volunteer sampling, the sample units are volunteers, typically in studies where the measurement process is
painful or troublesome for respondents.
d. Purposive sampling pertains to having an expert select a representative sample based on his own
subjective judgment. For instance, in Accounting, a sample audit of ledgers may be taken of certain
weeks (which are viewed as typical). Many agricultural surveys also adopt this procedure for lack of a
specific sampling frame.
e. In quota sampling, sample units are picked for convenience, but interviewers are given certain quotas (such as
the number of persons to interview). This design is especially common in market research.
f. In snowball sampling, additional sample units are identified by asking previously picked sample units for
people they know who can be added to the sample. This is usually done when the characteristic being studied is
rare or the population is hard to access.
Survey Errors
When collecting data, whether through sample surveys or censuses, a variety of survey errors may arise. This is
why it is crucial to design the data collection process very carefully. Censuses may also overcount or undercount
certain portions of the population of interest. Household censuses in the Philippines, for instance, have often
been contentious because of undercounts and overcounts and their political implications, since congressional
seats and the Internal Revenue Allotment (IRA) depend on population counts. Conclusions based on
non-probability samples, such as telephone polls on early morning television shows, SMS polls, or surveys on
Facebook, do not carry the same weight as those based on probability samples. A probability sample uses chance
to make the sample much more representative of the population, which is not true of non-probability samples.
• In the conduct of sample surveys, sampling error is roughly the difference between the value of a sample
statistic and the value of the population parameter that would have been obtained had a census been
conducted. This difference arises from the chance process that determines which particular units in the
population are included in the sample. The error can be positive or negative, small or large, but its typical size
decreases as the sample size increases. This error can be estimated and reported along with the sample
statistic. Since estimates of a parameter from a probability sample vary from sample to sample, the variation
in the estimates serves as a measure of sampling error. Statisticians can say, for instance, that in 2000 the FIES
indicated that 39.5 percent of the entire Filipino population was poor, and that there are 95 chances in 100 that
a full census would reveal a value within 0.4% of the stated figure. Likewise, the approval ratings of the
President obtained from an opinion poll of about 1,200 respondents selected through chance methods are, in
theory, within a margin of error of about 3 percentage points of the actual approval ratings.
• Another type of error that statisticians consider in the collection of data is non-sampling error, of which there
are many specific types. One is selection bias, the systematic tendency of a survey to exclude a particular group
of units. Selection bias produces coverage errors, which arise if, for example, we assume that the respondents in
a telephone poll on an early morning television show reflect the entire population of voters. In fact, telephone
polls in the Philippines at best represent only the population of telephone subscribers, which is only a small
minority of the target population of all Filipinos. Current television and radio polls conducted by a number of
media stations reflect only the population of those who are watching or listening to the show and who persist in
phoning in their views. Thus, there is a serious issue of coverage. The same is true of Internet-based and SMS
surveys. Even a carefully done Internet survey will reflect only those who have Internet access, which is currently
not the majority of Filipino households. To illustrate coverage and other non-sampling errors, consider the
following case.
In 1936, the Literary Digest, a famous magazine in the United States, conducted a survey of its subscribers as well
as telephone subscribers to predict the outcome of the presidential race. The Digest erroneously predicted that the
then-incumbent President Franklin D. Roosevelt would receive only 43% of the vote and thus lose to the challenger,
Kansas Governor Alfred Landon, when in actuality Landon received only 38% of the total vote. (The Digest went
bankrupt thereafter.) Meanwhile, George Gallup, who had set up his own polling organization, correctly forecast
Roosevelt’s victory from a sample of only 50,000 people.
A post-mortem analysis revealed coverage errors arising from biases in sample selection. The Literary Digest’s list
of targeted respondents was taken from telephone books, magazine subscriptions, club membership lists, and
automobile registrations. Inadvertently, the Digest targeted well-to-do voters, who were predominantly
Republican and tended to vote for the Republican candidate. The sample had a built-in bias favoring one group
over another. This is called selection bias. In addition, there was non-response bias: of the 10 million people
targeted for the survey, only 2.4 million responded. A response rate of 24% is far too low to yield reliable
estimates of population parameters, because nonrespondents may hold views that differ considerably from those
of respondents.
Here, we see that obtaining a large number of respondents does not cure procedural defects; it only repeats
them over and over again! When choosing a sample, biases such as selection bias and nonresponse bias should be
avoided. In practice, however, nonresponse bias can be hard to avoid, since some people will fill out surveys and
others will not, even if incentives are provided.
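The point that a large sample does not cure a biased selection procedure can be illustrated with a small simulation. The electorate below is entirely hypothetical (100,000 voters, of whom 60% support candidate A); it is not a reconstruction of the 1936 data. A very large sample drawn only from the well-to-do group misses badly, while a much smaller simple random sample lands close to the truth.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate of 100,000 voters (1 = supports candidate A, 0 = otherwise).
# The 25,000 "well-to-do" voters support A at 30%; the other 75,000 at 70%.
well_to_do = [1] * 7_500 + [0] * 17_500
others = [1] * 52_500 + [0] * 22_500
population = well_to_do + others
true_share = sum(population) / len(population)  # 0.6

# Huge but biased sample: drawn only from the well-to-do group
biased = rng.sample(well_to_do, 20_000)
# Much smaller simple random sample from the whole electorate
srs = rng.sample(population, 1_000)

print(f"true share      {true_share:.3f}")
print(f"biased n=20000  {sum(biased) / len(biased):.3f}")
print(f"SRS    n=1000   {sum(srs) / len(srs):.3f}")
```

Increasing the biased sample toward all 25,000 well-to-do voters would only make its estimate settle more firmly near 30%, not near the true 60%.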
To remedy biases (failures of a sample to represent the population) resulting from “convenience” errors,
polling organizations have since resorted to probability-based methodologies for selecting samples, where
subjects are chosen according to known probabilities. These probabilities, in turn, allow us to compute the
number of population members each sampled respondent effectively represents. Randomization, or using
chance-based procedures to select respondents, is the best guarantee against selection bias. However, it is
important first to define the target population and obtain a suitable sampling frame, i.e., a list of the units in
the target population, and to design the survey carefully so that it is representative of the target population.
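A minimal sketch of how known selection probabilities translate into the number of population units each respondent represents. The two-stratum design and its counts are hypothetical: rural households are deliberately oversampled, so the unweighted proportion is misleading, while weighting each respondent by N/n (the reciprocal of its inclusion probability n/N under SRS within its stratum) recovers a population-level estimate.

```python
# Hypothetical two-stratum design in which rural households are deliberately oversampled
design = {
    # N = stratum size, n = sample size, yes = sampled households with the characteristic
    "urban": {"N": 8_000, "n": 200, "yes": 120},
    "rural": {"N": 2_000, "n": 200, "yes": 60},
}

weighted_yes = 0.0
total_weight = 0.0
for name, s in design.items():
    # Under SRS within a stratum, the inclusion probability is n/N,
    # so each respondent represents w = N/n population units.
    w = s["N"] / s["n"]
    print(f"{name}: inclusion probability {s['n'] / s['N']:.3f}, weight {w:.0f}")
    weighted_yes += w * s["yes"]
    total_weight += w * s["n"]

unweighted = sum(s["yes"] for s in design.values()) / sum(s["n"] for s in design.values())
weighted = weighted_yes / total_weight
print(f"unweighted proportion {unweighted:.3f}")  # 0.450
print(f"weighted proportion   {weighted:.3f}")    # 0.540
```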
Other possible sources of bias in sample surveys that one should be cautious about include:
• wording of questions, which can influence the response enormously
• the sensitivity of a survey topic (e.g., income, sex and illegal behavior)
• interviewer biases in selecting respondents or in the responses generated because of the appearance and
demeanor of the interviewer
• non-response bias, which arises when targeted respondents opt not to provide information in the survey
As pointed out earlier, statistics generated from a sample survey are subject to both non-sampling and
sampling errors. Sampling errors arise because only part of the population is observed, so there is likely to be
some difference between the sample statistic and the true value of the population parameter (the value you
would have obtained had a census been conducted). To learn more about this difference, or sampling error, and
consequently establish the reliability of the sample statistic, you have to understand the chance process involved
in sample selection. For this purpose, you analyze the sampling distribution, that is, the distribution of all possible
values that the point estimate could take under repeated sampling, and, where necessary, approximate it.
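A sampling distribution can be approximated by simulation. In this sketch the population is assumed to be the faces of a fair die (mean 3.5, standard deviation about 1.708), and the helper `sample_means` is illustrative. Theory predicts that the sample means center on 3.5 with a standard deviation of about 1.708 divided by the square root of n, i.e. roughly 1.21, 0.54, and 0.31 for n = 2, 10, and 30.

```python
import random
from statistics import fmean, pstdev

def sample_means(n, reps, rng):
    """Means of `reps` independent samples, each of n rolls of a fair die."""
    return [fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]

rng = random.Random(42)
summary = {}
for n in (2, 10, 30):
    means = sample_means(n, 5_000, rng)
    summary[n] = (fmean(means), pstdev(means))
    print(f"n={n:2d}  average of sample means={summary[n][0]:.3f}  SD={summary[n][1]:.3f}")
```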
When estimating, you should know something about the population to which you want to generalize. One
characteristic of the population that is often estimated is the mean, so the population mean is frequently the
parameter of interest. There can be several estimators of the population mean, including the sample mean,
sample median, sample mode, and sample midrange. Similarly, there can be several estimators of the population
variance $\sigma^2$. Given sample data $X_1, X_2, \ldots, X_n$, let $\bar{X}$ denote the sample mean (the sum of
the data divided by the sample size n):
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Then the sample variance defined with denominator n - 1,
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2,$$
is an unbiased estimator of $\sigma^2$ when the observations are independent (for example, under simple random
sampling with replacement).
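Why divide by n - 1 rather than n can be checked by simulation. This sketch assumes independent rolls of a fair die (population variance 35/12, about 2.917) and uses the standard library's `statistics.variance` (denominator n - 1) and `statistics.pvariance` (denominator n). Averaged over many samples, the first lands near 2.917, while the second lands near (n - 1)/n times that value, i.e. it systematically underestimates.

```python
import random
from statistics import fmean, pvariance, variance

rng = random.Random(0)
sigma2 = 35 / 12  # population variance of one fair-die roll
n, reps = 5, 20_000

s2_values, biased_values = [], []
for _ in range(reps):
    x = [rng.randint(1, 6) for _ in range(n)]
    s2_values.append(variance(x))       # denominator n - 1
    biased_values.append(pvariance(x))  # denominator n

print(f"target sigma^2      {sigma2:.3f}")
print(f"average with n - 1  {fmean(s2_values):.3f}")
print(f"average with n      {fmean(biased_values):.3f}")  # near (n - 1)/n * sigma^2
```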
As was earlier pointed out, a good estimator must possess two desirable properties: accuracy and precision.
• Accuracy is a measure of how close the estimates are to the parameter. It can be measured by bias, i.e., the
difference between the expected value of the estimator and the true value of the parameter. An estimator is said
to be unbiased if its bias is zero; otherwise, it is biased. When the bias is positive, the estimator tends to
overestimate the parameter; when it is negative, the estimator tends to underestimate it.
• Precision is a measure of how close the estimates are to each other. The variance of the estimator, or its
standard error, measures how precise the estimator is: the smaller the standard error of an estimator, the more
precise the estimator.
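Precision depends mainly on the sample size n, not on the fraction of the population sampled. For a simple random sample, the standard error of a sample proportion p is approximately sqrt(p(1 - p)/n), multiplied by the finite population correction sqrt((N - n)/(N - 1)) when sampling without replacement from N units, and an approximate 95% margin of error is 1.96 times the standard error. This sketch assumes simple random sampling (real national polls use more complex designs, so it is only an approximation) and the worst case p = 0.5; the population sizes are illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96, N=None):
    """Approximate 95% margin of error for a proportion from an SRS of size n.

    If the population size N is given, the finite population correction
    sqrt((N - n) / (N - 1)) is applied (sampling without replacement).
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if N is not None:
        moe *= math.sqrt((N - n) / (N - 1))
    return moe

print(f"{margin_of_error(1200):.4f}")  # 0.0283, about 3 percentage points
# Same n, very different population sizes: the margin barely changes
for N in (100_000, 1_000_000, 60_000_000):
    print(N, f"{margin_of_error(1200, N=N):.4f}")
# Quadrupling n halves the margin of error
print(f"{margin_of_error(4800):.4f}")
```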
In general, we want an estimator to be both accurate and precise. We can illustrate precision and accuracy with
an analogy. Let the parameter be the bull’s eye of a target and the estimates of the parameter be the arrows shot
by an archer. The first target (1) in the figure below illustrates a precise but inaccurate estimator. The second
target (2) shows an archer, or estimator, that is accurate but not precise. The third target (3) shows one that is
both precise and accurate, while the last target (4) shows an estimator that is neither accurate nor precise.
Example: The sample mean (of a simple random sample) is an estimator of the population mean that is both
accurate and precise. Its expected value equals the population mean, so it is unbiased and, consequently,
accurate. It is precise because statistical theory shows that it has the smallest standard error among unbiased
estimators that are linear combinations of the observations (and, for normally distributed populations, among all
unbiased estimators). These properties make the sample mean a good estimator of the population mean.
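A sketch comparing two estimators of the population mean, the sample mean and the sample median, by their empirical bias and standard error. It assumes a hypothetical normal population (mean 50, standard deviation 10) and samples of size 25; `bias_and_se` is an illustrative helper. Both estimators are close to unbiased here, but the sample mean's standard error (theory: 10/5 = 2) is smaller than the median's. For heavy-tailed populations, this comparison can come out differently.

```python
import random
from statistics import fmean, median, pstdev

def bias_and_se(estimates, true_value):
    """Empirical bias (average estimate minus truth) and standard error (spread of the estimates)."""
    return fmean(estimates) - true_value, pstdev(estimates)

rng = random.Random(123)
mu, sigma, n, reps = 50.0, 10.0, 25, 5_000  # hypothetical normal population

mean_estimates, median_estimates = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mean_estimates.append(fmean(sample))
    median_estimates.append(median(sample))

for label, est in (("sample mean", mean_estimates), ("sample median", median_estimates)):
    bias, se = bias_and_se(est, mu)
    print(f"{label:13s} bias={bias:+.3f}  SE={se:.3f}")
```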
PRACTICE SKILLS
1. Identify the population, parameter of interest, the sampling frame, the sample, the sampling method, and
any potential sources of biases in the following studies:
a. The producers of a television show asked Facebook users on the show’s Facebook page about their
sentiments (favorable, unfavorable, neutral) regarding a segment of the show.
b. A question posted on the website of a daily newspaper in the Philippines asked visitors of the site to
indicate their voter preference for the next presidential election.
c. In March 2015, Pulse Asia reported that the leading urgent concerns of Filipinos are inflation control
(46%), the increase of workers' pay (44%), and the fight against government corruption (40%). On the
other hand, Filipinos are least concerned with national territorial integrity (5%), terrorism (5%), and
charter change (4%). The nationwide survey was conducted from March 1 to 7, 2015, with 1,200
respondents.
d. A sample survey of persons with disability (PWDs) was designed to be representative of PWDs, by making
use of PWD registers from local government units, but an assessment suggested the registers were
severely undercovering PWDs. The design was adjusted to make use of snowball sampling where existing
sampled PWDs would identify other future subjects from among their acquaintances. The study
attempted to examine the proportion of PWDs who were poor.
2. Identify the sampling method used in each of the following situations.
a. The teacher randomly selects 20 boys and 15 girls from a batch of learners to be members of a group that
will go on a field trip.
b. A sample of 10 mice is selected at random from a set of 40 mice to test the effect of a certain medicine.
c. In a certain seminar, all members of two of the five groups are asked what they think about the president.
d. A barangay health worker asks every fourth house in the village for the ages of the children living in that
household.
e. A sales clerk for a clothing brand asks people who come up to her whether they own an article of clothing
from her brand.
f. A psychologist asks his patient, who suffers from depression, whether he knows other people with the
same condition, so he can include them in his study.
g. A toothpaste brand manager asks the ten dentists whose clinics are closest to his office whether they use a
particular brand of toothpaste.