Lecture PDF On Sampling and Coding
[Figure: labels Data, Statistics, Information]
But:
• Where, then, does data come from?
• How is it gathered?
• How do we ensure it is accurate?
• Is the data reliable?
• Is it representative of the population from which it was drawn?
Sample
• A sample is a subset of a larger population of objects, individuals, households, businesses, organizations and so forth. In short, a sample is the subset of the population being studied, from which data is collected to make inferences about the population.
• Sampling is the process of selecting a representative group of individuals from the population to be included in the sample. Sampling enables researchers to estimate unknown characteristics of the population in question.
• A finite group is called a population, whereas a non-finite (infinite) group is called a universe.
• A census is an investigation of all the individual elements of a population.
[Figure: a sample drawn from a population]
Sampling
1. Identify the population you want to study.
Sampling Process
[Figure: sampling process, with labels Sampling Frame, Sample, Inference]
2. Identify the sampling units: The next step is to identify the sampling
units. The sampling units are the individual elements or units that will be
included in the sample. For example, if the population is students in a
university, the sampling units could be individual students, classes, or
departments.
3. Obtain a list of sampling units: Once the sampling units have been
identified, a list of all the sampling units should be obtained. This can be
done by obtaining a roster or list of all members of the population, or by
creating a list through other means such as internet searches or directories.
Practical approach to determine the expected sample frame
4. Evaluate the sampling frame: The sampling frame should be evaluated
to determine its adequacy and representativeness. The sampling frame
should include all members of the population, and should be
representative of the population in terms of relevant characteristics such
as age, gender, education, income, and so on.
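The coverage part of this evaluation can be sketched in code when a reference list of population members is available. The example below is only an illustration under that assumption: the function name `evaluate_frame` and the identifiers are hypothetical, and real frames are rarely this tidy.

```python
from collections import Counter

def evaluate_frame(frame, population):
    """Compare a sampling frame with a reference population list.

    Both arguments are lists of unit identifiers (an assumption made
    for this sketch).
    """
    frame_set, pop_set = set(frame), set(population)
    counts = Counter(frame)
    return {
        # Population members missing from the frame (undercoverage).
        "missing": sorted(pop_set - frame_set),
        # Frame entries that are not in the population (overcoverage).
        "ineligible": sorted(frame_set - pop_set),
        # Units listed more than once in the frame.
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
        # Share of the population that the frame covers.
        "coverage": len(pop_set & frame_set) / len(pop_set),
    }

# Hypothetical identifiers, for illustration only.
population = ["s01", "s02", "s03", "s04", "s05"]
frame = ["s01", "s02", "s02", "s04", "x99"]
report = evaluate_frame(frame, population)
print(report)
```

Representativeness on characteristics such as age or income would need a separate comparison of distributions, which this sketch does not attempt.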
• And…
– The larger the sample needs to be to adequately describe the population: we need more observations to be able to make accurate inferences.
What is Sampling?
• Sampling is the process of selecting observations (a sample)
to provide an adequate description and robust inferences of
the population
– The sample is representative of the population.
• Sample or target population: the aggregation of the population from which the sample is actually drawn (e.g., MBA students and faculty in the 2008-09 academic year).
• Sample frame: a specific list that closely approximates all elements in the population; from this list the researcher selects units to create the study sample (e.g., a database of MBA students and faculty in 2008-09).
• Sample: a set of cases drawn from a larger pool and used to make generalizations about the population.
Probability Sampling
• Probability sampling is a sampling method in which each member of the population has a known, nonzero chance of being selected for the sample. The chances need not be equal in every design.
• The simple random sample, in which each member of the population has
an equal probability of being selected, is the best-known probability
sample.
• A sample must be representative of the population with respect to the
variables of interest.
• A sample is more likely to be representative of the population from which it is selected if each member of the population has an equal chance (probability) of being selected.
• Probability samples are generally more accurate than non-probability samples
– They remove conscious and unconscious bias in who gets selected.
• Probability samples allow us to estimate the accuracy of the sample.
• Probability samples permit the estimation of population parameters.
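As a rough illustration of estimating accuracy, the sketch below computes a sample mean and its standard error for a simple random sample drawn without replacement, using the usual finite population correction. The population values are made up, and the function name `mean_with_standard_error` is hypothetical.

```python
import math
import random
import statistics

def mean_with_standard_error(sample, population_size):
    """Estimate a population mean and its standard error from a
    simple random sample drawn without replacement."""
    n = len(sample)
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)  # sample variance (n - 1 divisor)
    fpc = 1 - n / population_size           # finite population correction
    return mean, math.sqrt(fpc * variance / n)

rng = random.Random(0)                      # fixed seed, only for repeatability
population = [rng.gauss(50, 10) for _ in range(1000)]  # made-up data
sample = rng.sample(population, 100)
mean, se = mean_with_standard_error(sample, len(population))
print(f"estimate = {mean:.2f}, approx. 95% interval = "
      f"({mean - 1.96 * se:.2f}, {mean + 1.96 * se:.2f})")
```

The interval uses the normal approximation, which is an assumption that is reasonable for a sample of this size but not for very small samples.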
Types of Probability Sampling
Simple Random Sample
Cluster sampling
Systematic
Simple Random Sample
• Every subset of a specified size n from the
population has an equal chance of being
selected
Simple Random Sampling…
• Simple random sampling is a probability sampling technique used
to create a sample of individuals or items from a larger population.
• In simple random sampling, every individual or item in the
population has an equal and independent chance of being selected
for the sample.
• Drawing three names from a hat containing all the names of the students in the class is an example of a simple random sample: any group of three names is as likely to be picked as any other group of three names.
• VERY EASY TO DEFINE!
• VERY, VERY DIFFICULT TO DO!
• Random sample of 100 Coke bottles today at the Coke plant.
• Random sample of 50 pine trees in a 1000-acre forest.
• Random sample of 5 deer in a national forest.
Simple Random Sampling…
A government income tax auditor must choose a sample of 5 of
11 returns to audit…[Can do many different ways]
Generate                     Sorted
Person    Random #           Rank   Person    Random #
baker     0.87487            1      mark      0.08350
george    0.89068            2      ralph     0.11597
ralph     0.11597            3      joe       0.24662
mary      0.58635            4      sally     0.34346
sally     0.34346            5      aaron     0.37239
joe       0.24662                   andrea    0.47609
andrea    0.47609                   greg      0.53542
mark      0.08350                   mary      0.58635
greg      0.53542                   kim       0.73809
kim       0.73809                   baker     0.87487
aaron     0.37239                   george    0.89068
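The generate-and-sort procedure in the auditor example can be sketched as follows. The names come from the example; the function name `sample_by_random_sort` is hypothetical, and each run produces different random numbers unless a seed is supplied.

```python
import random

returns = ["baker", "george", "ralph", "mary", "sally", "joe",
           "andrea", "mark", "greg", "kim", "aaron"]

def sample_by_random_sort(units, n, rng):
    """Attach a uniform random number to each unit, sort by that
    number, and keep the first n units."""
    keyed = [(rng.random(), unit) for unit in units]
    keyed.sort()
    return [unit for _, unit in keyed[:n]]

rng = random.Random()          # pass a seed here for a repeatable draw
audit = sample_by_random_sort(returns, 5, rng)
print(audit)

# Standard-library equivalent: random.sample draws without replacement.
print(rng.sample(returns, 5))
```

Both approaches give every group of 5 returns the same chance of being chosen, which is the defining property of a simple random sample.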
Simple Random Sampling Vs. Complex Random Sampling
• Simple random sampling involves randomly selecting individual units from the population to create a sample. This can be done using a variety of methods, including random number generators or random number tables. For example, if you wanted to survey students at a university, you could randomly select students from a list of all enrolled students to create your sample.
• Complex random sampling designs, on the other hand, involve
multiple stages of sampling to create a sample. This may be necessary
when the population is large or when it is not feasible to obtain a
complete list of all units in the population. In a complex random
sampling design, the population is divided into smaller groups or
clusters, and a sample of clusters is randomly selected. Then, a sample
of individuals or items is selected from within each selected cluster. For
example, if you wanted to survey households in a city, you could divide
the city into neighborhoods, randomly select a sample of
neighborhoods, and then randomly select households within each
selected neighborhood to create your sample.
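The neighborhood example can be sketched as a two-stage design. This is a minimal illustration: the city data is invented, the function name `two_stage_sample` is hypothetical, and a real design would also need weights to account for unequal cluster sizes.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: randomly pick clusters. Stage 2: randomly pick units
    within each picked cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        k = min(n_per_cluster, len(units))   # small clusters are taken whole
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical city: 6 neighborhoods with 20 households each.
city = {f"neighborhood_{i}": [f"hh_{i}_{j}" for j in range(20)]
        for i in range(1, 7)}
rng = random.Random(42)                      # fixed seed for repeatability
picked = two_stage_sample(city, n_clusters=2, n_per_cluster=5, rng=rng)
for neighborhood, households in picked.items():
    print(neighborhood, households)
```

Only the selected neighborhoods need a household list, which is why this design is practical when no complete list of all households exists.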
Systematic Sampling
• The sampling interval tells the researcher how to select elements from the frame (1 in k elements is selected).
– It depends on the sample size needed: k = frame size / sample size.
• Example:
– You have a sampling frame (list) of 10,000 people and you need a sample of 1,000 for your study. What sampling interval should you follow?
– k = 10,000 / 1,000 = 10, so select every 10th person listed (1 in 10 persons).
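The interval calculation in this example can be sketched as below. The random starting point within the first interval is an assumption (the example only says to take every 10th person), and the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, rng):
    """Select every k-th element of the frame after a random start,
    where k = len(frame) // n is the sampling interval."""
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size exceeds frame size")
    start = rng.randrange(k)             # random start in 0 .. k-1
    return frame[start::k][:n]

frame = list(range(1, 10001))            # stand-in for a list of 10,000 people
sample = systematic_sample(frame, 1000, random.Random(7))
print(len(sample), sample[:3])
```

If the list has a periodic pattern that lines up with the interval, a systematic sample can be biased, so the ordering of the frame is worth checking first.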
• Method (stratified sampling):
– Divide the population by certain characteristics into homogeneous subgroups (strata) (e.g., PhD students, Masters students, Bachelors students).
– Elements within each stratum are homogeneous, but heterogeneous across strata.
– A simple random or a systematic sample is taken from each stratum, typically sized in proportion to that stratum's share of the population.
STRATIFIED SAMPLING
[Figure: a population (N = 1000) divided into two strata. Equal intensity: Stratum 1 n = 500, Stratum 2 n = 500. Proportional to size: Stratum 1 n = 400, Stratum 2 n = 600.]
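Proportional allocation across strata can be sketched as follows. The stratum names follow the student example in the method above; the stratum sizes and the function name `proportional_stratified_sample` are hypothetical.

```python
import random

def proportional_stratified_sample(strata, n, rng):
    """Draw a simple random sample from each stratum, sized in
    proportion to the stratum's share of the population."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Rounding may shift the overall total by a unit or two.
        n_h = round(n * len(units) / total)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical frame: 1000 students in three strata.
strata = {
    "PhD": [f"phd_{i}" for i in range(100)],
    "Masters": [f"msc_{i}" for i in range(300)],
    "Bachelors": [f"bsc_{i}" for i in range(600)],
}
sample = proportional_stratified_sample(strata, 100, random.Random(1))
print({name: len(units) for name, units in sample.items()})
# prints {'PhD': 10, 'Masters': 30, 'Bachelors': 60}
```

An equal-allocation design would instead draw the same number from every stratum, which over-represents small strata and requires weighting at the analysis stage.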
Cluster sampling
[Figure: a population divided into Sections 1 to 5]
Cluster Sample
• The population is divided into subgroups (clusters) such as families. A simple random sample of the subgroups is taken, and then all members of each selected cluster are surveyed.
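A one-stage cluster sample of this kind can be sketched as below. The families are invented for illustration and the function name `cluster_sample` is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and return every member of
    each selected cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical families acting as clusters.
families = {
    "family_a": ["a1", "a2", "a3"],
    "family_b": ["b1", "b2"],
    "family_c": ["c1", "c2", "c3", "c4"],
    "family_d": ["d1"],
}
surveyed = cluster_sample(families, 2, random.Random(3))
print(surveyed)
```

Randomness enters only at the cluster level here; nobody inside a selected cluster is left out.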
Cluster sampling
• Some populations are spread out (over a state or country).
• Quota sampling
– Description: Participants are selected based on pre-specified characteristics to achieve a certain distribution in the sample.
– Strength: Useful for studying specific subgroups of the population.
– Weakness: Potential bias in sample selection; may not be representative of the population.
– Example: Recruiting equal numbers of males and females for a survey.
• Snowball sampling
– Description: Participants are recruited through referrals from other participants in the study.
– Strength: Useful for hard-to-reach populations.
– Weakness: Potential bias in sample selection; may not be representative of the population.
– Example: Recruiting individuals who have overcome addiction through referrals from others who have overcome addiction.
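The quota sampling example (equal numbers of males and females) can be sketched as below. The volunteers and the function name `quota_sample` are hypothetical. Note that people are accepted in arrival order, with no random selection, which is where the potential bias comes from.

```python
def quota_sample(volunteers, quotas, key):
    """Accept volunteers in arrival order until each group's quota
    is filled. No random selection is involved."""
    remaining = dict(quotas)
    accepted = []
    for person in volunteers:
        group = key(person)
        if remaining.get(group, 0) > 0:
            accepted.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break                       # every quota is full
    return accepted

# Hypothetical volunteers in the order they showed up.
volunteers = [("ann", "F"), ("bob", "M"), ("cat", "F"), ("dee", "F"),
              ("eli", "M"), ("fay", "F"), ("gus", "M")]
picked = quota_sample(volunteers, {"F": 2, "M": 2}, key=lambda p: p[1])
print(picked)
# prints [('ann', 'F'), ('bob', 'M'), ('cat', 'F'), ('eli', 'M')]
```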
Strengths and Weaknesses of Basic Sampling Techniques
Non-Response Error