Module 2 - Sampling
What is sampling?
For example, if a drug manufacturer would like to research the adverse side
effects of a drug on the country’s population, it is almost impossible to conduct
a research study that involves everyone. In this case, the researcher selects
a sample of people from each demographic and then studies them, which gives
indicative feedback on the drug’s behavior.
1. Probability sampling:
From the responses received, management will now be able to know whether
employees in that organization are happy or not about the amendment.
2. Non-probability sampling:
● Cluster sampling:
Using the probability sampling method, the bias in the sample derived
from a population is negligible to non-existent. The selection of the
sample mainly depicts the understanding and the inference of the
researcher. Probability sampling leads to higher quality data collection
as the sample appropriately represents the population.
● Diverse Population:
● Convenience sampling:
● Quota sampling:
Depending on the type of research, the researcher can apply quotas based on
the sampling frame. It is not necessary for the researcher to divide the quotas
equally. He/she divides the quotas as per his/her need (as shown in the
example where the researcher interviews 350 employed and only 150
unemployed individuals). Random sampling can be conducted to reach out to
the respondents.
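The quota example can be sketched in a few lines of Python. This is only an illustration: the sampling frame, the field name "status" and the helper quota_sample are hypothetical, and the quotas (350 employed, 150 unemployed) are the ones from the example. Respondents are picked at random within each group until that group's quota is filled.

```python
import random

def quota_sample(frame, quotas, key, seed=0):
    """Randomly pick respondents from `frame` until each group's quota is filled."""
    rng = random.Random(seed)
    sample = []
    for group, quota in quotas.items():
        members = [person for person in frame if person[key] == group]
        if len(members) < quota:
            raise ValueError(f"not enough {group!r} respondents in the frame")
        # Random selection inside the group, as the text allows.
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical sampling frame: 2,000 people, half employed, half unemployed.
frame = [
    {"id": i, "status": "employed" if i % 2 == 0 else "unemployed"}
    for i in range(2000)
]

# Unequal quotas, chosen by the researcher (numbers from the example).
quotas = {"employed": 350, "unemployed": 150}

respondents = quota_sample(frame, quotas, key="status")
counts = {g: sum(p["status"] == g for p in respondents) for g in quotas}
print(counts)  # {'employed': 350, 'unemployed': 150}
```

The quotas are passed in as a plain dictionary, so the researcher can change the split without touching the selection logic.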
● Create a hypothesis:
The non-probability method is used when there are budget and time
constraints, and some preliminary data must be collected. Since the survey
design is not rigid, it is easier to pick respondents at random and have
them take the survey or questionnaire.
We have looked at the different types of sampling methods above and their
subtypes. To encapsulate the whole discussion, though, the significant
differences between probability sampling methods and non-probability
sampling methods are as below:
Sampling Errors
Sampling errors may create distortions in the results, leading users to draw
incorrect conclusions. When analysts do not select samples that represent the
entire population, the sampling errors are significant.
Since there is a fault in the data collection, the results obtained from sampling
become invalid. Furthermore, when a sample is not selected randomly, or the
selection is based on bias, it fails to represent the whole population, and
sampling errors will certainly occur.
Since the exact population parameter is not known, sampling errors for
samples are generally unknown. However, analysts can use analytical
methods to measure the amount of variation caused by sampling errors.
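One common analytical measure of this variation is the standard error of the mean, estimated from the sample itself as the sample standard deviation divided by the square root of the sample size. The sketch below is an illustration only: the sample is simulated, and the 1.96 multiplier assumes an approximately normal sampling distribution and a 95% confidence level.

```python
import math
import random
import statistics

def standard_error(values):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical sample of 100 measurements (simulated, not real data).
rng = random.Random(42)
sample = [rng.gauss(50, 10) for _ in range(100)]

mean = statistics.mean(sample)
se = standard_error(sample)
margin = 1.96 * se  # approximate 95% margin of error

print(f"mean = {mean:.2f}, SE = {se:.2f}, 95% margin = +/- {margin:.2f}")
```

The margin of error gives a range around the sample mean in which the unknown population mean is likely to lie, even though the exact sampling error remains unknown.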
Non-Sampling Error
1. Random errors
Random errors are errors that cannot be accounted for and just happen. In
statistical studies, random errors are generally assumed to offset one
another, so they are of little to no concern.
2. Systematic errors
Systematic errors affect the sample of the study and, as a result, will often
create useless data. A systematic error is consistent and repeatable, so the
study’s creators must take great care to mitigate such an error.
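The difference between the two kinds of error can be illustrated with a small simulation. All numbers here are assumptions chosen for the illustration: a true value of 100, random noise with a standard deviation of 5, and a constant bias of +2 standing in for a systematic error such as a miscalibrated instrument.

```python
import random
import statistics

rng = random.Random(1)
TRUE_VALUE = 100.0   # assumed true value being measured
BIAS = 2.0           # assumed constant systematic error
N = 10_000           # number of simulated measurements

# Random errors only: noise scatters symmetrically around the true value.
random_only = [TRUE_VALUE + rng.gauss(0, 5) for _ in range(N)]

# Random errors plus a systematic error: every reading is shifted by BIAS.
with_bias = [TRUE_VALUE + BIAS + rng.gauss(0, 5) for _ in range(N)]

random_err = statistics.mean(random_only) - TRUE_VALUE
systematic_err = statistics.mean(with_bias) - TRUE_VALUE

print(f"average error, random only:   {random_err:+.3f}")
print(f"average error, with the bias: {systematic_err:+.3f}")
```

Averaging many measurements pushes the random error toward zero, but the systematic error stays at roughly the size of the bias no matter how many measurements are taken.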
Non-sampling errors can occur from several aspects of a study. The most
common non-sampling errors include errors in data entry, biased questions
and decision-making, non-responses, false information, and inappropriate
analysis.
Types of Non-Sampling Errors
There are several types of non-sampling errors, including:
1. Non-response error
A non-response error is caused by the differences between the people who
choose to participate compared to the people who do not participate in a
given survey. In other words, it exists when people are given the option to
participate but choose not to; therefore, their survey results are not
incorporated into the data.
2. Measurement error
A measurement error refers to all errors relating to the measurement of each
sampling unit, as opposed to errors relating to how they were selected. The
error often arises when there are confusing questions, low-quality data due to
sampling fatigue (i.e., someone is tired of taking a survey), and low-quality
measurement tools.
3. Interviewer error
Interviewer error occurs when the interviewer (or administrator) makes an
error when recording a response. In qualitative research, an interviewer may
lead a respondent to answer a certain way. In quantitative research, an
interviewer may ask the question differently, which leads to a different result.
4. Adjustment error
An adjustment error describes a situation where the analysis of the data
adjusts it so that it is not entirely accurate. Forms of adjustment error include
errors with weighting the data, data cleaning, and imputation.
5. Processing error
A processing error arises when there is a problem with processing the data
that causes an error of some kind. Examples include data that were entered
incorrectly or a data file that is corrupt.
1. Sampling error can arise even when no apparent mistake has been made, as
opposed to non-sampling error, which arises when a mistake occurs.
For example, a researcher may pay the individual a bonus depending on the
accuracy of their data entry, or they may film all interviews to ensure that the
interviewer stays on topic and on script.
Sampling Distribution
What Is a Sampling Distribution?
A sampling distribution is a probability distribution of a statistic obtained from a large number of
samples drawn from a specific population. The sampling distribution of a given population is the
distribution of frequencies of the range of different outcomes that could possibly occur for a
statistic of that population.
In statistics, a population is the entire pool from which a statistical sample is drawn. A population
may refer to an entire group of people, objects, events, hospital visits, or measurements. A
population can thus be said to be an aggregate observation of subjects grouped together by a
common feature.
For example, a medical researcher who wants to compare the average weight of all babies
born in North America from 1995 to 2005 to those born in South America within the same time
period cannot, within a reasonable amount of time, draw the data for the entire population of over
a million childbirths that occurred over the ten-year time frame. He will instead use the
weights of, say, 100 babies in each continent to draw a conclusion. The weights of the 200 babies
used are the sample, and the average weight calculated is the sample mean.
Now suppose that instead of taking just one sample of 100 newborn weights from each
continent, the medical researcher takes repeated random samples from the general population
and computes the sample mean for each sample group. So, for North America, he pulls up
newborn weights recorded in the US, Canada and Mexico as follows: four samples of 100
from select hospitals in the US, five samples of 70 from Canada and three samples of 150 from
Mexico, for a total of 1,200 weights of newborn babies grouped in 12 sets. He also collects a
sample of 100 birth weights from each of the 12 countries in South America.
The average weights computed for the sample sets together form the sampling distribution of the
mean. The mean is not the only statistic that can be calculated from a sample. Other statistics, such as the standard deviation,
variance, proportion, and range can be calculated from sample data. The standard deviation
and variance measure the variability of the sampling distribution.
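The North America part of the example can be sketched as follows. The weights are simulated, not real data: a population mean of 7 pounds and a standard deviation of 1.2 pounds are assumptions made only for the illustration. The set sizes (four of 100, five of 70, three of 150) are the ones from the example.

```python
import random
import statistics

rng = random.Random(7)

# Sample sizes from the example: 4 x 100 (US), 5 x 70 (Canada), 3 x 150 (Mexico).
set_sizes = [100] * 4 + [70] * 5 + [150] * 3

# Simulated birth weights in pounds (assumed mean 7.0, assumed sd 1.2).
sample_sets = [[rng.gauss(7.0, 1.2) for _ in range(n)] for n in set_sizes]

# One sample mean per set; together they sketch the sampling distribution of the mean.
sample_means = [statistics.mean(s) for s in sample_sets]
total_weights = sum(len(s) for s in sample_sets)

print(total_weights, len(sample_means))  # 1200 12
print("spread of the sample means:", round(statistics.stdev(sample_means), 3))
```

The standard deviation of the 12 sample means is a rough, empirical view of the variability of the sampling distribution.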
The number of observations in a population, the number of observations in a sample and the
procedure used to draw the sample sets determine the variability of a sampling distribution.
While the mean of a sampling distribution is equal to the mean of the population, the standard
error depends on the standard deviation of the population, the size of the population and the
size of the sample.
Knowing how spread apart the means of the sample sets are from each other and from
the population mean gives an indication of how close a sample mean is to the population
mean. The standard error of the sampling distribution decreases as the sample size increases.
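The effect of sample size on the standard error can be checked with a short simulation. This is a sketch under stated assumptions: the population of 20,000 weights is simulated (mean 7 pounds, standard deviation 1.2 pounds), and the helper empirical_standard_error is a hypothetical name. For each sample size, many samples are drawn and the standard deviation of their means is taken as the empirical standard error.

```python
import random
import statistics

def empirical_standard_error(population, sample_size, repeats, rng):
    """Standard deviation of the means of `repeats` random samples."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

rng = random.Random(3)

# Simulated population of birth weights in pounds (assumed, not real data).
population = [rng.gauss(7.0, 1.2) for _ in range(20_000)]

results = {
    n: empirical_standard_error(population, n, repeats=500, rng=rng)
    for n in (10, 100, 1000)
}

for n, se in results.items():
    print(f"sample size {n:>4}: standard error ~ {se:.3f}")
```

Each tenfold increase in sample size should shrink the standard error by roughly the square root of ten, which is what the printed values can be compared against.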
Special Considerations
A population or a single sample set of numbers will often have an approximately normal
distribution. However, because a sampling distribution is built from multiple sets of
observations, it will not necessarily have a bell-curved shape.
Following our example, the population average weight of babies in North America and in South
America has a normal distribution because some babies will be underweight (below the mean)
or overweight (above the mean), with most babies falling in between (around the mean). If the
average weight of newborns in North America is seven pounds, the sample mean weight in
each of the 12 sets of sample observations recorded for North America will be close to seven
pounds as well.
However, if you graph the averages calculated in each of the 12 sample groups, the
resulting shape may resemble a uniform distribution, and it is difficult to predict with certainty what
the actual shape will turn out to be. The more samples the researcher draws from the population
of over a million weight figures, the more the graph will start forming a normal distribution.
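A text histogram makes this visible. The sketch below is an illustration with simulated weights (assumed mean 7 pounds, assumed standard deviation 1.2 pounds, 100 weights per sample); the helpers sample_means and bin_counts are hypothetical names. It compares the graph of 12 sample means with the graph of 2,000 sample means.

```python
import random
import statistics

def sample_means(num_samples, sample_size, rng):
    """Mean weight of each of `num_samples` simulated samples."""
    return [
        statistics.mean(rng.gauss(7.0, 1.2) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

def bin_counts(values, low, high, bins):
    """Count values per equal-width bin; out-of-range values go to the end bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        index = int((v - low) / width)
        counts[min(max(index, 0), bins - 1)] += 1
    return counts

rng = random.Random(11)
LOW, HIGH, BINS = 6.5, 7.5, 10

counts_few = bin_counts(sample_means(12, 100, rng), LOW, HIGH, BINS)
counts_many = bin_counts(sample_means(2000, 100, rng), LOW, HIGH, BINS)

for label, counts in (("12 samples", counts_few), ("2,000 samples", counts_many)):
    print(label)
    for i, c in enumerate(counts):
        start = LOW + i * (HIGH - LOW) / BINS
        bar = "#" * round(40 * c / max(counts))
        print(f"  {start:.1f} lb | {bar}")
```

With only 12 means the bars are ragged and the shape is hard to read; with 2,000 means the bars pile up around seven pounds and thin out toward both ends.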