Ch. 5 Reading Notes
Chapter 5 Notes
03/03/2021
Sampling Distribution
Sampling error: the difference between what your sample says and what the population
actually is. It arises because only a portion of the population is observed
o Ex: 80% of sample approves of Trump, but 50% of population approves. Sampling
error is difference between 80% and 50%
o So, you need a way to measure the amount of error/uncertainty in the estimate
to report a margin of error
How to calculate uncertainty: take lots of different samples (ex: ask 4 groups of 10
people if they approve of Trump), add up the results, divide by the number of groups:
o Group 1 approval = 80% (0.80), group 2 = 30%, group 3 = 40%, group 4 = 60%. So:
(0.80 + 0.30 + 0.40 + 0.60)/4 = 0.525
Closer to real value (0.5) – so it’s better to have more samples
o If you plot all the averages of all the samples, you’ll start seeing a normal (bell)
curve. Sampling distribution: theoretical frequency distribution of a statistic
generated from an infinite number of samples drawn from a population
The sampling distribution of a sample mean or proportion is approximately
normal for any observable variable, no matter the concept, as long as the
samples are reasonably large – basis for inferential statistics. Mean of sampling
distribution = population parameter
See page 106 graphs
o Expected value: the mean/average value of a sample statistic based on
repeated samples from a population: E(p) = P
Little p = estimated sample proportion. The equation says the expected (or
long-run, or average) value of sample proportions = the population
proportion (P)
Best guess of the value of the population parameter is the value of the
sample statistic
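A rough Python sketch of the idea above, not from the textbook. It first repeats the four-group average from the notes, then simulates many samples of 10 people from a population where the true approval is assumed to be P = 0.5. The function names, the seed, and the choice of 10,000 samples are all made up for illustration.

```python
import random
import statistics

def sample_proportion(population_p, sample_size, rng):
    """Proportion of 'approve' answers in one random sample of the given size."""
    approvals = sum(rng.random() < population_p for _ in range(sample_size))
    return approvals / sample_size

def simulate_sampling_distribution(population_p, sample_size, num_samples, seed=0):
    """Draw many samples and return the list of their sample proportions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sample_proportion(population_p, sample_size, rng)
            for _ in range(num_samples)]

# The four groups from the notes: (0.80 + 0.30 + 0.40 + 0.60) / 4
groups = [0.80, 0.30, 0.40, 0.60]
mean_of_groups = sum(groups) / len(groups)

# Assumed population: true approval P = 0.5, samples of 10 people each
proportions = simulate_sampling_distribution(
    population_p=0.5, sample_size=10, num_samples=10_000
)
mean_of_proportions = statistics.mean(proportions)

print(round(mean_of_groups, 3))       # average of the 4 groups
print(round(mean_of_proportions, 3))  # should land very close to P = 0.5
```

With only 4 groups the average (0.525) is already near 0.5. With thousands of samples the average of the sample proportions sits almost exactly on P, which is what E(p) = P says.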
Sample Size and Margin of Error
Large samples are more likely to represent population because small samples have
higher chances of excluding certain groups
o Ex: you might have bad luck and not get any women, or black people, or older
people, etc. in the samples
Larger sample = more likely to include all types of people/be truly
representative of population
o Margin of error drops a lot when sample size increases, but increasing beyond a
certain point won’t give much marginal benefit:
Will cost more than it will improve accuracy/precision of study to
increase sample beyond a certain point
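A small sketch of the diminishing returns described above. The notes don't give a formula, so this assumes the standard large-sample 95% margin of error for a proportion under simple random sampling, z * sqrt(p(1 - p)/n) with z = 1.96. The sample sizes are arbitrary examples.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p with sample size n.

    Assumes simple random sampling and a large enough sample (standard formula,
    not taken from the notes).
    """
    return z * math.sqrt(p * (1 - p) / n)

# p = 0.5 gives the widest margin, so it is the usual worst case
for n in (100, 400, 1600, 6400):
    print(n, round(margin_of_error(0.5, n), 4))
```

Under this formula the sample has to be four times bigger each time just to cut the margin of error in half, which is why very large samples stop being worth the cost.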
Sampling Methods
Samples must be obtained according to certain rules
Element (aka: unit of analysis): a single occurrence, realization, or instance of the
objects or entities being studied. A particular case or entity about which info is collected
o Ex: presidential approval rating survey – individual American adults (survey
respondents) are the elements
o In simple cases, sampling unit = element. In more complicated sampling designs,
sampling unit may be a collection of elements.
Sampling unit: the entity listed in a sampling frame
Sampling frame: a list from which sampling units are drawn into a sample; it must
be specified clearly. It is the population from which a sample is actually drawn. Ideally it is the same
as the total population of interest to a study (which is usually not possible)
A population can be stratified – subdivided into groups of similar elements – before a
sample is drawn. Each stratum is a subgroup of a population that shares 1 or more
characteristics
o Ex: population = campaign speeches. Strata = dividing speeches into campaign
years (this group of speeches was made in this year)
o Chosen strata are usually characteristics/attributes thought to be related to the
dependent variables under study
As samples become less representative of population, inferences about the population
become less valid
o Ex: the population has 50 characteristics, but your sample only reflects 40 of them.
More characteristics reflected = more valid inferences.
o But it’s super hard to include EVERY characteristic of a population
Closer it is to real population, the better
Types of Samples
Purpose of samples is to make inferences about the population from a smaller group. If
sampling frame is incomplete/inappropriate, sample bias happens
o Sample bias: whenever some elements of a population are systematically
excluded from a sample. Usually due to incomplete sampling frame or a
nonprobability method of selecting elements
o Sample is unrepresentative of the population of interest and inaccurate
conclusions about the population may be drawn
Sample bias makes it important to distinguish between probability sample and
nonprobability sample:
o Probability sample: a sample for which each element in the population has a
known probability of being included in the sample
This knowledge allows a researcher to calculate how accurately the
sample reflects the population
o Nonprobability sample: sample in which each element in the population has an
unknown probability of being selected
Probability of selection is required for the use of statistical theory to
make inferences.
o Probability samples > nonprobability samples (Because you can use statistical
theory to make inferences on the former but not on the latter)
1) Simple Random Samples: each element and combination of elements has an equal
chance of being selected
o Ex: drawing names from a hat – each name has equal chance of being drawn. Ex:
assigning each element a number and drawing numbers at random
o Requires a list of the members of the population in the form of a sampling
frame
Ex: if you’re studying countries and you need to pick a few countries to
study out of all 195 countries, you need a list of all countries to pick from
o Pro of SRS: as the sample gets larger and larger, the sample will share the
characteristics of the population because every element has equal chance of
being selected
Problem is that obtaining a sampling frame that is the same as the
population is not always easy/possible
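A minimal sketch of a simple random sample for the countries example, assuming the sampling frame is just a list of 195 placeholder names (`country_1` ... `country_195` are made up, not real country names). `random.sample` picks without replacement, so every country has the same chance of being drawn.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n elements without replacement; every element has an equal chance."""
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: a list of all 195 countries
frame = [f"country_{i}" for i in range(1, 196)]

sample = simple_random_sample(frame, n=10, seed=42)
print(sample)
```

Note the code only works because the full list exists, which is exactly the requirement (and the problem) mentioned above.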
2) Systematic Random Samples: elements are selected from a list at predetermined
intervals
o Sometimes easier than Simple RS. It also requires a list of the target pop, but the
list is randomized to maintain a random sample
o Sampling interval: the “skip”, i.e. the number of elements between elements that
are drawn: k = N/n, where N = population size and n = desired sample size
Ex: we want to pick countries to study out of 195 countries. If we want
a sample size of n = 10, we would divide the total by 10 to get the
sampling interval k: k = 195/10 = 19.5. Round up to 20. So,
starting at a random point, we would take every 20th country until we had
a sample of 10
Ex: if we start at country #11, the next would be country #31, #51,
etc.
o Useful for when we’re dealing with a long list of population elements. But it can
result in a biased sample
If elements on the list have been ranked according to a characteristic,
you’ll get biased sample
If the list contains a pattern that corresponds to the sampling interval,
you’ll get bias (doesn’t happen often, but must be considered)
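A short sketch of the systematic sample from the countries example, assuming the list is simply the numbers 1 to 195 and the interval is rounded up as in the notes. The function name and the 1-based `start` argument are made up for illustration; in practice the start would be picked at random.

```python
import math

def systematic_sample(sampling_frame, n, start):
    """Take every k-th element beginning at the 1-based position `start`.

    k = N / n rounded up, as in the notes. Stops at the end of the list.
    """
    k = math.ceil(len(sampling_frame) / n)           # 195 / 10 = 19.5 -> 20
    positions = range(start, len(sampling_frame) + 1, k)
    return [sampling_frame[pos - 1] for pos in positions][:n]

countries = list(range(1, 196))  # stand-in for a list of 195 countries

picked = systematic_sample(countries, n=10, start=11)
print(picked)  # [11, 31, 51, 71, 91, 111, 131, 151, 171, 191]
```

One thing this shows: because k was rounded up, a start later than #15 runs off the end of the list before reaching 10 countries (start = 16 gives only 9), so the starting point and rounding need some care.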
3) Stratified Sample: probability sample in which elements are divided into groups,
called strata, based on a characteristic, and elements are selected from each stratum in
proportion to its representation in the total population
o Sampling units are divided into strata with each unit appearing in only one
stratum. Then a simple random sample or systematic RS is taken from each
stratum
o Can be proportionate or disproportionate
Proportionate: use stratified sample in which each stratum is represented
in proportion to its size in the population (ex: divide into states, but São
Paulo is bigger than Acre, so you draw in proportion to population)
Disproportionate: select a stratified sample in which elements sharing a
characteristic are under or overrepresented (ex: if you’re trying to study a
specific group, you can overrepresent them)
o Characteristics to stratify should have theoretical importance in study – create
strata that are meaningful for the project
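A sketch of a proportionate stratified sample, assuming two made-up strata (a large state with 900 elements and a small one with 100) and a total sample of 50. Each stratum gets a share of the sample equal to its share of the population, then a simple random sample is drawn inside it. Names and sizes are hypothetical.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Draw from each stratum in proportion to its size in the population.

    `strata` maps a stratum name to the list of its elements. Shares are
    rounded, so the total can be off by a little for awkward sizes.
    """
    rng = random.Random(seed)
    total = sum(len(elements) for elements in strata.values())
    sample = {}
    for name, elements in strata.items():
        share = round(n * len(elements) / total)   # stratum's slice of n
        sample[name] = rng.sample(elements, share)  # SRS within the stratum
    return sample

# Hypothetical strata: one big state, one small state
strata = {
    "large_state": [f"large_{i}" for i in range(900)],
    "small_state": [f"small_{i}" for i in range(100)],
}

sample = proportionate_stratified_sample(strata, n=50, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
# {'large_state': 45, 'small_state': 5}
```

For a disproportionate sample you would set the share per stratum by hand instead (ex: 25 and 25) to overrepresent the small group.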
4) Cluster Samples: used when a list of elements doesn’t exist and creating one wouldn’t
be feasible. It’s a probability sample in which sampling frame initially consists of clusters
of elements
o Since only some elements are going to be selected in a sample, it is unnecessary
to secure a list of all elements in the population
o Groups/clusters of elements are identified and listed as sampling units. Then, a
sample is drawn from this list of sampling units. Then, elements are identified
and sampled in the sampling units only
Ex: to conduct interviews with people, you need a small sample (because
interviews are time consuming). So you choose 100 random
neighborhoods, then 10 random streets in the neighborhoods then 10
random houses in the streets – conduct interviews in those houses only
o The houses chosen are random, so it’s a random sample, but the cluster process
reduced the geographic spread of respondents and saved resources
You don’t need to know the total number of people in the city before
starting the cluster process because each house has an equal probability
of being selected
Probability of your house being selected = probability of your
neighborhood being selected times probability of your street being
selected within that neighborhood times probability of your house being
selected within that street
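A worked sketch of the multiplication above. The notes only say how many are chosen at each stage (100 neighborhoods, 10 streets, 10 houses), so the totals here (500 neighborhoods, 40 streets per neighborhood, 50 houses per street) are invented, and every cluster is assumed to be the same size with a simple random draw at each stage.

```python
from fractions import Fraction

def selection_probability(stages):
    """Multiply the stage-by-stage chances of being picked.

    `stages` is a list of (number chosen, number available) pairs,
    one pair per stage of the cluster sample.
    """
    prob = Fraction(1)
    for chosen, available in stages:
        prob *= Fraction(chosen, available)
    return prob

# Hypothetical totals; only the "chosen" numbers come from the notes
stages = [
    (100, 500),  # 100 of 500 neighborhoods
    (10, 40),    # 10 of 40 streets in a chosen neighborhood
    (10, 50),    # 10 of 50 houses on a chosen street
]

p_house = selection_probability(stages)
print(p_house)         # 1/100
print(float(p_house))  # 0.01
```

If clusters differ in size, houses in different neighborhoods would not have equal chances, so the equal-probability claim depends on that assumption.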
Systematic, stratified and cluster (2, 3 and 4) are often more practical than simple
random sample (1)
o In each case, the probability of being selected is known, so the accuracy of the
sample can be determined
o The type of sample chosen depends on the resources you have and the
availability of an accurate and comprehensive list of elements in a well-defined
target population
Nonprobability Samples: sample for which each element in the total population has an
unknown probability of being selected.
o Used when probability samples (which are better because they represent a large
population accurately and it’s possible to calculate how close an estimated
characteristic is to the population value) can’t be used (ex: too expensive)
Sometimes you can learn more by studying carefully selected and
perhaps unusual cases than by studying representative ones
Ex: studying undocumented immigrants. There isn’t a list of undoc
people, so you just have to work with who you can find, which isn’t
representative
o Convenience sample: a nonprobability sample in which the selection of elements
is determined by the researcher’s convenience.
o Purposive sample: researcher exercises considerable discretion over what
observations to study because the goal is typically to study a diverse and usually
limited number of observations rather than to analyze a sample that represents
the population
o Quota sample: elements are sampled in proportion to their representation in the
population (similar to proportionate stratified sampling)
Difference is that elements in the quota sample are not chosen in a
probabilistic way – they’re chosen in a purposive or convenient way
Usually biased
o Snowball sample: respondents are used to identify other people who might
qualify for inclusion in the sample
These people are interviewed and asked to supply names for further
investigating, and the sample builds like a snowball
Problem: asking people who know each other to join the study
means you’ll probably get people from the same social circles
with similar characteristics
Continue the process until enough people are interviewed. Very useful
when studying rare/difficult-to-locate populations (like undocumented immigrants)
Conclusion
If cost isn’t a major consideration and the validity of measures will not suffer, it’s
generally better to collect data for the complete target population than use a sample
If cost/validity dictate that a sample be drawn, a probability sample is usually preferable
to a nonprobability sample
o Accuracy of sample estimates can be determined only for probability samples.
o If the desire to represent a target population accurately is not a major concern or
is impossible to achieve, then a nonprobability sample can be used
Probability samples yield estimates of the target population. All samples are subject to
sampling error
o No sample, no matter how well drawn, can provide an exact measurement of an
attribute of, or relationship within, the target population
Statistical theory gives us methods for making inferences about unknown parameters and
for objectively measuring the probabilities of making inferential errors
o This info allows researchers and scientific community to judge the tenability of
many empirical claims
See page 117 for list of terms with definitions