Module 3 Sources of Data Sampling Procedures
Objectives
2. Oral Testimony
This category includes interviews with administrators, teachers and other school
employees, students and relatives, school patrons, or lay citizens, and members of
governing bodies.
3. Relics
This category includes buildings, furniture, teaching materials, equipment, murals,
decorative pictures, textbooks, examinations, and samples of student work.
1. Reports of a person who relates the testimony of an actual witness of, or participant in
an event
2. Writer of the secondary source who was not on the scene of the event but merely
reported what the person who was there said or wrote.
3. Textbooks and encyclopedias
1. Subjective Method. This can be done by asking people questions either directly or
indirectly. The direct or interview method is a method of asking questions orally to
respondents, hence, there is a face-to-face interaction or encounter between the
interviewer and the interviewee, the purpose of which is to get more in-depth information
about perceptions, insights, attitudes, experiences or beliefs. Although it is useful for
gaining insights and context into a topic, it is time-consuming, expensive, and more
susceptible to interviewer bias than other data collection methods.
The indirect or questionnaire method is a method of asking questions with the use of
a questionnaire. A questionnaire is a set of questions for gathering information from
respondents who are required to fill out the forms themselves. Questionnaires can be
hand-carried to the respondents or sent by mail and collected later or returned by stamped
addressed envelope. They can also be sent and retrieved electronically (by email).
sense that the researcher controls how subjects are assigned to groups and which
treatments each group receives.
3. Use of Existing Records. This method makes use of data that were previously gathered
by another person and were compiled and made available by institutions or agencies.
Population is the totality or aggregate of the persons, things, or phenomena that we want to describe
and draw conclusions about. Some populations are small in size while others are large.
When a population is small, complete enumeration may be used for a study or investigation, and any
numeric quantity derived to describe the population is called a parameter.
When the population is very large, collecting information from every unit may be impossible,
time-consuming, or costly. Using a sample, which is any part or subset of a population, is
less expensive and makes data collection faster. Because there is less data to handle, it is also
easier to check and improve the accuracy and quality of the data.
Many statistical studies are conducted on samples, which are used as bases to describe or draw inferences
about a population. Any characteristic that describes the sample is called a statistic. It is a quantity,
calculated from a sample of data, used to estimate a parameter. For example, the average of the data in a
sample is used to give information about the overall average in the population from which that sample was
drawn. Parameters are often estimated since their value is generally unknown, especially when the
population is large enough that it is impossible or impractical to obtain measurements for all units.
Parameters are normally represented by Greek letters; for example, the population mean and population
variance are represented by µ and σ², respectively. On the other hand, the sample mean
and variance, two of the most common statistics derived from samples, are denoted by the symbols x̄ and
s², respectively.
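The distinction between parameters and statistics can be sketched in a few lines of Python. This is only an illustration: the ten scores below are hypothetical values, not data from this module, and the sample is simply the first five units.

```python
import statistics

# Hypothetical population of 10 test scores (illustrative values only).
population = [72, 85, 90, 66, 78, 81, 94, 70, 88, 76]

# Parameters: computed from every unit in the population.
mu = statistics.mean(population)           # population mean, µ
sigma2 = statistics.pvariance(population)  # population variance, σ² (divides by N)

# A sample: any part or subset of the population (here, the first five units).
sample = population[:5]

# Statistics: computed from the sample and used to estimate the parameters.
x_bar = statistics.mean(sample)            # sample mean, x̄
s2 = statistics.variance(sample)           # sample variance, s² (divides by n - 1)

print(f"mu = {mu}, sigma^2 = {sigma2}")
print(f"x_bar = {x_bar}, s^2 = {s2}")
```

Note that the sample values differ from the population values; a statistic only estimates its parameter.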
Sampling Techniques
Samples can be drawn from a population using either of two general types of sampling: Probability
Sampling and Nonprobability Sampling.
Probability Sampling is one in which every unit in the population has a known chance (greater than zero) of
being selected in the sample. The chances need not be equal for all units; they only need to be known
and nonzero. Probability sampling includes: simple random sampling, systematic sampling,
stratified sampling, multistage sampling and cluster sampling.
1. Simple Random Sampling is a method of sampling wherein all units in the population are given
equal chance to be included in the sample. It is most appropriate when the entire population from
which the sample is taken is homogeneous. This is accomplished by drawing lots or by using a
table of random numbers.
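Drawing lots can also be done by computer. The following is a minimal sketch, assuming a hypothetical sampling frame of 500 student ID numbers; the function name and the seed are illustrative choices, not part of this module.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)    # a seed makes the draw reproducible
    return rng.sample(units, n)  # sampling without replacement

# Hypothetical sampling frame: student ID numbers 1 to 500.
frame = list(range(1, 501))
chosen = simple_random_sample(frame, 25, seed=42)
print(sorted(chosen))
```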
If your subjects are in a random order then systematic sampling is equivalent to random
sampling.
3. Stratified Sampling is a probability sampling method wherein the entire population is divided
into different subgroups or strata, then the final subjects are randomly selected proportionally and
independently from the different strata. Stratified sampling techniques are generally used when the
population is heterogeneous, or dissimilar, where certain homogeneous, or similar, sub-populations
can be isolated (strata). It is designed to organize the population into homogeneous subsets before
sampling. If sampling is done using simple random sampling, the method is called stratified
random sampling and if it is done using systematic sampling, it is called stratified systematic
sampling. The most common strata used in stratified random sampling are age, gender,
socioeconomic status, religion, nationality and educational attainment.
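A minimal sketch of stratified random sampling follows. It assumes a hypothetical frame of 1,000 students with sex as the stratifying variable, and it gives each stratum a share of the sample proportional to its size. Rounding the shares can make the total differ slightly from n in other cases, so treat this as an illustration rather than a complete procedure.

```python
import random
from collections import defaultdict

def stratified_random_sample(units, stratum_of, n, seed=None):
    """Split units into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    total = len(units)
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / total)    # proportional share of the sample
        sample[name] = rng.sample(members, n_h)  # independent draw within the stratum
    return sample

# Hypothetical frame: 600 female and 400 male students, as (id, sex) pairs.
frame = [(i, "female") for i in range(600)] + [(i, "male") for i in range(600, 1000)]
sample = stratified_random_sample(frame, lambda unit: unit[1], n=50, seed=7)
print({name: len(members) for name, members in sample.items()})
```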
4. Cluster Sampling is a sampling method used when "natural" groupings are evident in a statistical
population. It is a method of sampling wherein the entire population is divided into clusters.
5. Multistage Sampling is a sampling process involving several stages in which units at each
subsequent stage are subsampled from previously selected larger units. At the first stage, large
groups or clusters of population units are selected. These clusters are designed to contain more
units than are required for a final sample. At the second stage, units are sampled from the selected
clusters to derive the final sample. If more than two stages are used, the process of selecting
"sub-clusters" within clusters continues until the final sample is achieved. The same practical
considerations apply to multistage sampling as to cluster sampling. Multistage sampling is
generally used when it is costly or impossible to form a list of all the units in the target population.
This is our most sophisticated sampling strategy and it is often used in large epidemiological
studies. To obtain a representative national sample, researchers may select zip codes at random
from each state. Within these zip codes, streets are randomly selected.
Stage 2: Barangays
Barangays are selected from within the selected municipalities or towns.
Stage 3: Houses
Houses are selected from within the selected barangays.
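The stages can be sketched in Python as nested random draws. The frame below (5 towns, 4 barangays per town, 20 houses per barangay) and the number of units taken at each stage are hypothetical, chosen only to show the mechanics.

```python
import random

def multistage_sample(frame, n_towns, n_barangays, n_houses, seed=None):
    """Select towns, then barangays within those towns, then houses within those barangays."""
    rng = random.Random(seed)
    selected = []
    for town in rng.sample(sorted(frame), n_towns):                    # first stage
        for brgy in rng.sample(sorted(frame[town]), n_barangays):      # second stage
            for house in rng.sample(frame[town][brgy], n_houses):      # third stage
                selected.append((town, brgy, house))
    return selected

# Hypothetical frame: town -> barangay -> list of houses.
frame = {
    f"Town {t}": {
        f"Barangay {t}-{b}": [f"House {t}-{b}-{h}" for h in range(1, 21)]
        for b in range(1, 5)
    }
    for t in range(1, 6)
}

sample = multistage_sample(frame, n_towns=2, n_barangays=2, n_houses=5, seed=1)
print(len(sample))
```

Only the selected towns and barangays need a complete list of their sub-units, which is why this design helps when a full list of the population is not available.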
Nonprobability sampling is any sampling method where some elements of the population have
no chance to be included in the sample.
attempt to zero in on the target group, interviewing whoever is available. In purposive
sampling, we sample with a purpose in mind. The researcher chooses the sample based on
who they think would be appropriate for the study. For example, you are interested in
studying cognitive processing speed of young adults who have suffered closed head brain
injuries in automobile accidents. This would be a difficult population to find. Hence, purposive
sampling would be the only option.
3. Snowball sampling is a technique for developing a sample where existing study subjects
recruit future subjects from among their acquaintances. Thus the sample group appears to
grow like a rolling snowball. As the sample builds up, you gather enough data for your
research. This sampling technique is often used for hidden populations that are difficult for
researchers to access, such as drug users. Because sample
members are not selected from a sampling frame, snowball samples are subject to numerous
biases. For example, people who have many friends are likely to identify more
respondents than those who have very few friends.
Sample Calculation:
(http:www.research-advisors.com/tools/SampleSize.htm)
There are various formulas for calculating the required sample size, depending on whether the
data collected are categorical or quantitative (e.g., to estimate a proportion or a
mean). These formulas require knowledge of the variance or proportion in the population and a
determination of the maximum desirable error, as well as the acceptable Type I error risk (which
fixes the confidence level).
It is possible to use one of them to construct a table that suggests the optimal sample size
given a population size, a specific margin of error, and a desired confidence level. This can help
researchers avoid the formulas altogether. The table in
http:www.research-advisors.com/tools/SampleSize.htm
shown on page 11, presents the results of one set of these calculations. It may be used to determine
the appropriate sample size for almost any study.
Many researchers (and research texts) suggest that the first column within the table should
suffice (Confidence Level = 95%, Margin of Error = 5%). To use these values, simply find the
size of the population in the leftmost column (use the next highest value if your exact population
size is not listed). The value in the next column is the sample size that is required to generate a
Margin of Error of ±5% for any population proportion.
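The following formula can also be used to determine the sample size, where N is the population size
and e is the margin of error expressed as a decimal. For example, with N = 5,000 and e = 0.025:
n = N / (1 + N * e²)
  = 5000 / (1 + 5000 * 0.025²) = 1,212.12 ≈ 1,212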
The formula yields a sample size (1,212) that is slightly greater than the 1,176 shown in the
table, so it errs on the side of a larger sample.
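The computation above can be checked with a short Python function. The function name is an illustrative choice; the formula itself is the one given in this section (it is often referred to as Slovin's formula).

```python
def simplified_sample_size(N, e):
    """Sample size from n = N / (1 + N * e^2), with e as a decimal margin of error."""
    return N / (1 + N * e ** 2)

n = simplified_sample_size(N=5000, e=0.025)
print(round(n, 2))  # about 1212.12, which rounds to 1,212
```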
Consider the following problem. You are conducting a survey to estimate a population mean or
proportion. The sampling method is simple random sampling, without replacement. You want
your survey to provide a specified level of precision.
To choose the right sample size for a simple random sample, you need to define the following
inputs.
You will also need to know the variance of the population, σ². Given these inputs, the following
formulas find the smallest sample size that provides the desired level of precision.
Sample statistic   Population size   Sample size
Mean               Known             n = { z² * σ² * [ N / (N - 1) ] } / { ME² + [ z² * σ² / (N - 1) ] }
Mean               Unknown           n = ( z² * σ² ) / ME²
Proportion         Known             n = [ ( z² * p * q ) + ME² ] / [ ME² + z² * p * q / N ]
Proportion         Unknown           n = [ ( z² * p * q ) + ME² ] / ( ME² )
Here z is the critical standard score, ME is the margin of error, N is the population size, p is the
population proportion, and q = 1 - p.
This approach works when the sample size is relatively large (greater than or equal to 30). Use
the first or third formulas when the population size is known. When the population size is large
but unknown, use the second or fourth formulas.
For proportions, the sample size requirements vary, based on the value of the proportion. If you
are unsure of the right value to use, set p equal to 0.5. This will produce a conservative sample
size estimate; that is, the sample size will produce at least the precision called for and may
produce better precision.
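To see why p = 0.5 is the conservative choice, the sketch below evaluates the fourth formula (proportion, population size unknown) for several values of p. The margin of error of 0.04 and z = 1.96 are assumed inputs for illustration only.

```python
def n_for_proportion(p, me, z=1.96):
    """Sample size for a proportion when the population size is large but unknown."""
    q = 1 - p
    return (z ** 2 * p * q + me ** 2) / me ** 2

# Required sample size for several assumed proportions (ME = 0.04, z = 1.96).
sizes = {p: n_for_proportion(p, me=0.04) for p in (0.1, 0.25, 0.5, 0.75, 0.9)}
for p, n in sizes.items():
    print(p, round(n, 1))
```

The product p * q is largest at p = 0.5, so that value gives the largest required sample size among the choices.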
Sample Problem
At the end of every school year, a certain country administers a reading test to a simple
random sample drawn without replacement from a population of 100,000 third graders. Over
the last five years, students who took the test correctly answered 75% of the test questions.
What sample size should you use to achieve a margin of error equal to plus or minus 4%,
with a confidence level of 95%?
Specify the margin of error. This was given in the problem definition. The margin of
error is plus or minus 4% or 0.04.
Specify the confidence level. This was also given. The confidence level is 95% or 0.95.
Compute alpha. Alpha is equal to one minus the confidence level. Thus, alpha = 1 -
0.95 = 0.05.
Determine the critical standard score (z). Since this is an estimation problem, the
critical standard score is the value for which the cumulative probability is 1 - alpha/2 =
1 - 0.05/2 = 0.975. To find that value, we use the Normal Calculator. Recall that the
distribution of standard scores has a mean of 0 and a standard deviation of 1.
Therefore, we plug the following entries into the normal calculator: Cumulative probability = 0.975;
Mean = 0; and Standard deviation = 1. The calculator tells us that the value of the
standard score is 1.96.
And finally, we assume that the population proportion p is equal to its past value over
the previous 5 years. That value is 0.75. Given these inputs, we can find the smallest
sample size n that will provide the required margin of error.
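Because the population size is known and we are estimating a proportion, we use the third
formula, with q = 1 - p = 0.25:
n = [ ( z² * p * q ) + ME² ] / [ ME² + z² * p * q / N ]
n = [ ( 1.96² * 0.75 * 0.25 ) + 0.0016 ] / [ 0.0016 + 1.96² * 0.75 * 0.25 / 100,000 ]
n = 449.2
Therefore, to achieve a margin of error of plus or minus 4 percent, we will need to survey
450 students, using simple random sampling.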
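The steps of this sample problem can be reproduced in Python. This is a sketch, assuming the standard library's NormalDist takes the place of the Normal Calculator mentioned above; the function name is an illustrative choice.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, me, confidence, N):
    """Smallest n for estimating a proportion when the population size N is known."""
    alpha = 1 - confidence                   # e.g. 1 - 0.95 = 0.05
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical standard score, about 1.96
    q = 1 - p
    return (z ** 2 * p * q + me ** 2) / (me ** 2 + z ** 2 * p * q / N)

n = sample_size_proportion(p=0.75, me=0.04, confidence=0.95, N=100_000)
print(round(n, 1), math.ceil(n))  # always round the sample size up
```

The result is rounded up rather than to the nearest integer, because a smaller sample would not reach the required margin of error.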
The precision and cost of a stratified design are influenced by the way that sample elements are
allocated to strata.
One approach is proportionate stratification. With proportionate stratification, the sample size of each
stratum is proportionate to the population size of the stratum. Strata sample sizes are determined by
the following equation:
nh = ( Nh / N ) * n
where nh is the sample size for stratum h, Nh is the population size for stratum h, N is total population
size, and n is total sample size.
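As an illustration, the equation can be applied with a few lines of Python. The three grade levels and their enrolment figures are hypothetical.

```python
def proportionate_allocation(stratum_sizes, n):
    """Apply nh = (Nh / N) * n to every stratum."""
    N = sum(stratum_sizes.values())  # total population size
    return {h: n * N_h / N for h, N_h in stratum_sizes.items()}

# Hypothetical strata: enrolment per grade level, total sample of 100.
stratum_sizes = {"Grade 4": 1200, "Grade 5": 800, "Grade 6": 500}
allocation = proportionate_allocation(stratum_sizes, n=100)
print(allocation)
```

When the results are not whole numbers, they have to be rounded, and the rounded values should be checked so that they still add up to n.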
Another approach is disproportionate stratification, which can be a better choice (e.g., less cost, more
precision) if sample elements are assigned correctly to strata. To take advantage of disproportionate
stratification, researchers need to answer such questions as:
Given a fixed budget, how should sample be allocated to get the most precision from a
stratified sample?
Given a fixed sample size, how should sample be allocated to get the most precision from a
stratified sample?
Given a fixed budget, what is the most precision that I can get from a stratified sample?
Given a fixed sample size, what is the most precision that I can get from a stratified sample?
What is the smallest sample size that will provide a given level of survey precision?
What is the minimum cost to achieve a given level of survey precision?
Given a particular sample allocation plan, what level of precision can I expect?
And so on.
Stat Trek's Sample Size Calculator can help you find the right sample allocation plan for your stratified
design. You specify your main goal - maximize precision, minimize cost, stay within budget, etc.
Based on your goal, the calculator prompts you for the necessary inputs and handles all computations
automatically. It tells you the best sample size for each stratum. The calculator creates a summary
report that lists key findings, including the margin of error, and describes analytical techniques. The
calculator is free. You can find the Sample Size Calculator in Stat Trek's main menu under the Stat
Tools tab.
The ideal sample allocation plan would provide the most precision for the least cost. Optimal
allocation does just that. Based on optimal allocation, the best sample size for stratum h would be:
nh = n * [ ( Nh * σh ) / sqrt( ch ) ] / [ Σ ( Ni * σi ) / sqrt( ci ) ]
where nh is the sample size for stratum h, n is total sample size, Nh is the population size for
stratum h, σh is the standard deviation of stratum h, and ch is the direct cost to sample an individual
element from stratum h. The sum in the denominator runs over all strata i. Note that ch does not
include indirect costs, such as overhead costs.
The effect of the above equation is to sample more heavily from a stratum when the cost to sample an
element from the stratum is low, the population size of the stratum is large, or the variability within
the stratum is large.
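A minimal sketch of optimal allocation follows. It assumes the usual cost-weighted form, in which each stratum's share of the sample is proportional to Nh * σh / sqrt(ch). The two strata and their sizes, standard deviations, and unit costs are hypothetical.

```python
import math

def optimal_allocation(strata, n):
    """Allocate n in proportion to Nh * sd_h / sqrt(c_h) for each stratum."""
    weights = {
        h: N_h * sd_h / math.sqrt(c_h)
        for h, (N_h, sd_h, c_h) in strata.items()
    }
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# Hypothetical strata: (population size, standard deviation, cost per sampled unit).
strata = {
    "A": (1000, 10.0, 4.0),   # cheaper to sample
    "B": (1000, 10.0, 16.0),  # four times as costly to sample
}
allocation = optimal_allocation(strata, n=90)
print(allocation)
```

The two strata are identical except for cost, and the cheaper stratum receives twice as many sample units.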
How to Maximize Precision, Given a Stratified Sample With a Fixed Sample Size
Sometimes, researchers want to find the sample allocation plan that provides the most precision,
given a fixed sample size. The solution to this problem is a special case of optimal allocation,
called Neyman allocation.
The equation for Neyman allocation can be derived from the equation for optimal allocation by
assuming that the direct cost to sample an individual element is equal across strata. Based on
Neyman allocation, the best sample size for stratum h would be:
nh = n * ( Nh * σh ) / [ Σ ( Ni * σi ) ]
where nh is the sample size for stratum h, n is total sample size, Nh is the population size for
stratum h, and σh is the standard deviation of stratum h.
This section presents a sample problem that illustrates how to maximize precision, given a fixed
sample size and a stratified sample. (In a subsequent lesson, we re-visit this problem and see how
stratified sampling compares to other sampling methods.)
Problem 1
At the end of every school year, a country administers a reading test to a sample of 36 third graders.
The school system has 20,000 third graders, half boys and half girls. The results from last year's test
are shown in the table below.
To maximize precision, how many sampled students should be boys and how many should be
girls?
What is the mean reading achievement level in the population?
Compute the confidence interval.
Find the margin of error.
Solution: The first step is to decide how to allocate sample in order to maximize precision. Based on
Neyman allocation, the best sample size for stratum h is:
nh = n * ( Nh * σh ) / [ Σ ( Ni * σi ) ]
where nh is the sample size for stratum h, n is total sample size, Nh is the population size for
stratum h, and σh is the standard deviation of stratum h. By this equation, the number of boys in the
sample is:
nboys = 21.83
Therefore, to maximize precision, the total sample of 36 students should consist of 22 boys and (36 -
22) = 14 girls.
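The allocation can be sketched in Python as follows. The stratum sizes come from the problem (10,000 boys and 10,000 girls). The two standard deviations are assumptions: the results table is not reproduced in this text, so illustrative values were chosen to be consistent with the figure of about 21.8 boys quoted above.

```python
def neyman_allocation(strata, n):
    """Allocate n in proportion to Nh * sd_h for each stratum (equal sampling costs)."""
    weights = {h: N_h * sd_h for h, (N_h, sd_h) in strata.items()}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# (population size, standard deviation); the standard deviations are ASSUMED values.
strata = {"boys": (10_000, 10.27), "girls": (10_000, 6.66)}
allocation = neyman_allocation(strata, n=36)
print({h: round(v, 2) for h, v in allocation.items()})
```

Because both strata have the same population size, the stratum with the larger standard deviation receives the larger share of the sample.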
The remaining questions can be answered during the process of computing the confidence interval.
Elsewhere on this website, we described how to compute a confidence interval. We employ that
process below.
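Identify a sample statistic. For this problem, we use the overall sample mean to estimate the
population mean. To compute the overall sample mean, we use the following equation (which
was introduced in a previous lesson):
x̄ = Σ ( Nh / N ) * x̄h
where x̄h is the sample mean of stratum h. Since boys and girls each make up half of the
population, each stratum mean receives a weight of 0.5, which gives:
x̄ = 75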
Therefore, based on data from the sample strata, we estimate that the mean reading
achievement level in the population is equal to 75.
Select a confidence level. In this analysis, the confidence level is defined for us in the problem.
We are working with a 95% confidence level.
Find the margin of error. Elsewhere on this site, we show how to compute the margin of
error when the sampling distribution is approximately normal. The key steps are shown below.
Find standard deviation or standard error. The equation to compute the standard error
was introduced in a previous lesson. We use that equation here:
SE = (1 / N) * sqrt { Σ [ Nh² * ( 1 - nh/Nh ) * sh² / nh ] }
SE = 1.41
where sh is the sample standard deviation of stratum h. Thus, the standard deviation of the
sampling distribution (i.e., the standard error) is 1.41.
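The standard error equation can be sketched in Python as shown below. The stratum sizes (10,000 each) and sample sizes (22 boys, 14 girls) come from the problem. The stratum standard deviations are assumed values, since the results table is not reproduced in this text; they were chosen to be consistent with the reported SE of 1.41.

```python
import math

def stratified_standard_error(strata):
    """SE = (1/N) * sqrt( sum of Nh^2 * (1 - nh/Nh) * sh^2 / nh over all strata )."""
    N = sum(N_h for N_h, _, _ in strata)
    total = sum(
        N_h ** 2 * (1 - n_h / N_h) * s_h ** 2 / n_h
        for N_h, n_h, s_h in strata
    )
    return math.sqrt(total) / N

# (population size, sample size, standard deviation); standard deviations are ASSUMED.
strata = [
    (10_000, 22, 10.27),  # boys
    (10_000, 14, 6.66),   # girls
]
se = stratified_standard_error(strata)
print(round(se, 2))
```

The factor (1 - nh/Nh) is the finite population correction; it is close to 1 here because each stratum sample is a tiny fraction of its stratum.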
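Find critical value. The critical value is a factor used to compute the margin of error.
We express the critical value as a z-score. To find the critical value, we take these
steps.
Compute alpha: α = 1 - 95/100 = 0.05
Find the critical probability: p* = 1 - α/2 = 1 - 0.05/2 = 0.975
Find the z-score that has a cumulative probability of 0.975: z = 1.96
Compute the margin of error: ME = critical value * standard error = 1.96 * 1.41 = 2.76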
Specify the confidence interval. The range of the confidence interval is defined by the sample
statistic ± margin of error. And the uncertainty is denoted by the confidence level. Thus, with
this sample design, we are 95% confident that the mean reading achievement in the population is
75 ± 2.76.
In summary, given a total sample size of 36 students, we can get the greatest precision from a
stratified sample if we sample 22 boys and 14 girls. This results in a 95% confidence interval of 72.24
to 77.76. The margin of error is 2.76.
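The summary figures can be verified with a short Python sketch that starts from the estimate (75) and the standard error (1.41) reported in the solution. The standard library's NormalDist is assumed here as the source of the critical value.

```python
from statistics import NormalDist

def confidence_interval(estimate, se, confidence):
    """Return (lower limit, upper limit, margin of error) for a normal sampling distribution."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for 95%
    me = z * se                              # margin of error
    return estimate - me, estimate + me, me

low, high, me = confidence_interval(estimate=75, se=1.41, confidence=0.95)
print(round(low, 2), round(high, 2), round(me, 2))
```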
Benefits of Sampling
Activity/Assignment
With all the experiences that you have had from the past to the present, the literature that
you have read, and the social issues that you have heard about:
1. List all possible researchable areas in your field or
specialization.
2. Identify the most pressing problem from the list you made in
number 1 and make a diagram, drawing, or anything else that
shows the role of theory in the research process.
3. Write at least one research question.
4. Write a brief explanation about your diagram/drawing.
5. Identify the respondents and other sources of data of your study.
6. What sampling method will you use? (that is, if sampling is
necessary) Explain why.
References:
Best, J. W., & Kahn, J. V. (2003). Research in Education (9th ed.). A Pearson
Education Company, Boston, USA.
McNabb, D. E. (2015). Research Methods for Political Science: Quantitative
and Qualitative Approaches (2nd ed.). Routledge, Taylor and Francis Group, London and New
York.
Sample Calculation:
(http:www.research-advisors.com/tools/SampleSize.htm)