Ch. 5 Reading Notes

This document summarizes key concepts about sampling from Chapter 5. It discusses how samples are used to make inferences about populations since studying entire populations is often impossible. The quality of a sample depends on how representative it is of the population and how members are selected. Larger, random samples produce more accurate estimates. Probability samples in which each member has a known chance of selection allow statistical inferences, unlike nonprobability samples. Simple random sampling involves giving each member an equal chance of being picked.

Uploaded by

Helena Rocha

PLSC 270

Chapter 5 Notes
03/03/2021

Chapter 5 – Sampling: Introduction


 Ideally, we would study entire populations to answer research questions, but resources
are limited, so we sometimes need to use a sample
o Advantage of a sample: less costly, fewer resources. Disadvantage: info based on a
sample is less accurate and/or more subject to error than population info
o Problem: can a sample of 1,000 people really tell us anything about millions of
people?
 Must find the best possible sample to reduce uncertainty and increase
the applicability of the study

The Basics of Sampling


 Population does not necessarily mean people; it can be a set of countries, corporations,
gov’t agencies, years, events, etc.
o Either way, population has to be clearly, carefully, fully defined and must be
relevant to research question.
o Ex: It’s impossible to interview everyone for a survey, so you choose a few
members of a population to investigate
 Sample → any subset of units collected in some manner from a population. A subset of
observations or cases drawn from a specified population
o Sample size and how members of sample are chosen determine quality (aka
accuracy and reliability) of inferences about the whole population
 Important to clarify the method of selection and number of observations
to be drawn
o Most interesting attributes in empirical research are numerical or quantitative
indicators (ex: averages, percentages, etc.). Characteristics of interest can be
examined/measured once sample is gathered – sample statistics
 Sample statistic → estimator of a population characteristic or attribute
that is calculated from sample data. Used to approximate the
corresponding population parameters/values

How Do We Use a Sample to Learn about a Population?


 Always assume there will be a margin of error when we report a sample statistic
(difference between the sample statistic and the actual population parameter)
o Ex: if in a random sample of people, 54% approve of Trump, it means that in the
population, approximately 54% of people approve of Trump
 Researchers sacrifice some precision whenever they rely on samples.
How much is sacrificed depends on how sample was drawn and its size
o Loss of precision/accuracy usually comes from chance. But if proper
procedures are followed and certain assumptions hold, the sample is more accurate/precise
 Difficulty: figuring out how far off the estimate is likely to be
 Statistical inference → mathematical theory and techniques for making conjectures
about the unknown characteristics (parameters) of populations based on samples
o Goal is to make supportable conjectures about the unknown characteristics of a
population based on sample statistics
o Studying statistics involves precisely defining what “supportable” means. Key to
making inferences from sample statistic = sampling distribution

Sampling Distribution
 Sampling error → difference between what your sample says and what the population
actually is. Arises because only a portion of the population is observed
o Ex: 80% of sample approves of Trump, but 50% of population approves. Sampling
error is difference between 80% and 50%
o So, you need a way to measure the amount of error/uncertainty in the estimate
to report a margin of error
 How to calculate uncertainty: take lots of different samples (ex: ask 4 groups of 10
people if they approve of Trump), add it up, divide by number of groups:
o Group 1 approval = 80% (0.80), group 2 = 30%, group 3 = 40%, group 4 = 60%. So:
(0.80 + 0.30 + 0.40 + 0.60)/4 = 0.525
 Closer to real value (0.5) – so it’s better to have more samples
o If you plot the averages of all the samples, you’ll start seeing a normal (bell)
curve. Sampling distribution → theoretical frequency distribution of a statistic
generated from an infinite number of samples drawn from a population
 The sampling distribution of a mean or proportion is approximately normal
when samples are large enough, no matter the concept being measured – basis
for inferential statistics. Mean of sampling distribution = population parameter
 See page 106 graphs
o Expected value → the mean/average value of a sample statistic based on
repeated samples from a population: E(p) = P
 Little p = estimated sample proportion. Equation → the expected (or
long-run or average) value of sample proportions = the population
proportion (P)
 Best guess of the value of the population parameter is the value of the
sample statistic
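
The averaging example and the E(p) = P idea above can be checked with a small simulation. This is only a sketch under stated assumptions: the true approval rate (0.5) and sample size (10) come from the example in these notes, while the function names, the seed, and the choice of 10,000 repeated samples are made up for illustration.

```python
import random
import statistics

def sample_proportion(population_p, n, rng):
    """Proportion of 'approve' answers in one random sample of size n."""
    return sum(rng.random() < population_p for _ in range(n)) / n

def simulate_sampling_distribution(population_p, n, num_samples, seed=0):
    """Draw many independent samples and return the list of sample proportions."""
    rng = random.Random(seed)
    return [sample_proportion(population_p, n, rng) for _ in range(num_samples)]

# The four groups from the notes: their average is the combined estimate.
groups = [0.80, 0.30, 0.40, 0.60]
four_group_average = sum(groups) / len(groups)  # about 0.525

# Many more samples (hypothetical count): the mean of the sample
# proportions settles close to the population proportion P = 0.5.
proportions = simulate_sampling_distribution(population_p=0.5, n=10, num_samples=10_000)
long_run_mean = statistics.mean(proportions)
```

Plotting a histogram of `proportions` would show the bell-like shape described above, centered near the population value.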
Sample Size and Margin of Error
 Large samples are more likely to represent population because small samples have
higher chances of excluding certain groups
o Ex: you might have bad luck and not get any women, or black people, or older
people, etc. in the samples
 Larger sample = more likely to include all types of people/be truly
representative of population
o Margin of error drops a lot when sample size increases, but increasing beyond a
certain point won’t give much marginal benefit:
 Will cost more than it will improve accuracy/precision of study to
increase sample beyond a certain point
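
The diminishing return from larger samples can be seen with the usual normal-approximation margin of error for a proportion, z * sqrt(p(1 - p)/n). This formula is not given in these notes; it is the standard textbook approximation for a simple random sample, used here as an assumption, with z = 1.96 for a 95% confidence level and p = 0.5 as the worst case.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion (assumes a simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes: each step costs more but buys less precision.
for n in (100, 400, 1000, 4000, 10000):
    print(n, round(100 * margin_of_error(0.5, n), 1), "percentage points")
```

Going from 100 to 400 respondents cuts the margin of error in half (about 9.8 to about 4.9 points), but getting from about 3.1 points at n = 1,000 down to about 1 point takes roughly 10,000 respondents.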
Sampling Methods
 Samples must be obtained according to certain rules
 Element (aka: unit of analysis) → single occurrence, realization, or instance of the
objects or entities being studied. A particular case or entity about which info is collected
o Ex: presidential approval rating survey – individual American adults (survey
respondents) are the elements
o In simple cases, sampling unit = element. In more complicated sampling designs,
sampling unit may be a collection of elements.
 Sampling unit → the entity listed in a sampling frame
 Sampling frame → a list from which sampling units are drawn into a sample; it must
be specified clearly. In effect, the population from which a sample is actually drawn.
Ideally it is the same as the total population of interest to a study (which is usually
not possible)
 A population can be stratified – subdivided into groups of similar elements – before a
sample is drawn. Each stratum is a subgroup of a population that shares 1 or more
characteristics
o Ex: population = campaign speeches. Strata = dividing speeches into campaign
years (this group of speeches was made in this year)
o Chosen strata are usually characteristics/attributes thought to be related to the
dependent variables under study
 As samples become less representative of population, inferences about the population
become less valid
o Ex: if the population has 50 characteristics and your sample only captures 40, the
sample is less representative. More characteristics captured = more valid inferences.
o But it’s super hard to include EVERY characteristic of a population
 Closer it is to real population, the better

Types of Samples
 Purpose of samples is to make inferences about the population from a smaller group. If
sampling frame is incomplete/inappropriate, sample bias happens
o Sample bias: whenever some elements of a population are systematically
excluded from a sample. Usually due to incomplete sampling frame or a
nonprobability method of selecting elements
o Sample is unrepresentative of the population of interest and inaccurate
conclusions about the population may be drawn
 Sample bias makes it important to distinguish between probability sample and
nonprobability sample:
o Probability sample: a sample for which each element in the population has a
known probability of being included in the sample
 This knowledge allows a researcher to calculate how accurately the
sample reflects the population
o Nonprobability sample: sample in which each element in the population has an
unknown probability of being selected
 Probability of selection is required for the use of statistical theory to
make inferences.
o Probability samples > nonprobability samples (Because you can use statistical
theory to make inferences on the former but not on the latter)
 1) Simple Random Samples: each element and combination of elements has an equal
chance of being selected
o Ex: drawing names from a hat – each name has equal chance of being drawn. Ex:
assigning numbers strategy
o Requires a list of the members of the population in the form of a sampling
frame
 Ex: if you’re studying countries and you need to pick a few countries to
study out of all 195 countries, you need a list of all countries to pick from
o Pro of SRS → as the sample gets larger and larger, the sample will share the
characteristics of the population because every element has equal chance of
being selected
 Problem is that obtaining a sampling frame that is the same as the
population is not always easy/possible
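
A simple random sample from a sampling frame can be sketched with Python's standard `random.sample`, which draws without replacement so that every element (and every combination of elements) is equally likely. The frame below is a placeholder list of 195 numbered "countries", and the function name and seed are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each with equal chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Placeholder sampling frame: a real study would use an actual list of countries.
countries = [f"Country {i}" for i in range(1, 196)]
chosen = simple_random_sample(countries, 10, seed=42)
```

This is the "names from a hat" idea in code: the frame must list every member of the population before the draw can happen.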
 2) Systematic Random Samples: elements are selected from a list at predetermined
intervals
o Sometimes easier than Simple RS. It also requires a list of the target pop, but the
list is randomized to maintain a random sample
o Sampling interval: the “skip,” or number of elements between elements that
are drawn → k = N/n, N = population size and n = desired sample size
 Ex: say we want to pick countries out of 195 countries to study. If we want
a sample size of n = 10, we would divide the total by 10 to get the
sampling interval k – k = 195/10 = 19.5. Round up to 20. So,
starting at a random point, we would take every 20th country until we had
a sample of 10
 Ex: if we start at country #11, the next would be country #31, #51,
etc.
o Useful for when we’re dealing with a long list of population elements. But it can
result in a biased sample
 If elements on the list have been ranked according to a characteristic,
you’ll get biased sample
 If the list contains a pattern that corresponds to the sampling interval,
you’ll get bias (doesn’t happen often, but must be considered)
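
The country example above can be sketched as follows. This is a minimal illustration, not a full procedure: the function name and the optional `start` argument are hypothetical, positions are counted from 1 as in the notes, and the random start is assumed to fall within the first interval.

```python
import math
import random

def systematic_sample(frame, n, start=None, rng=None):
    """Take every k-th element of the frame, with k = N/n rounded up."""
    N = len(frame)
    k = math.ceil(N / n)              # sampling interval, e.g. 195/10 -> 20
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)     # random starting position in 1..k
    # Walk the list at interval k, using 1-based positions.
    return [frame[pos - 1] for pos in range(start, N + 1, k)][:n]

countries = list(range(1, 196))       # positions 1..195 stand in for the country list
picked = systematic_sample(countries, 10, start=11)
```

Starting at #11 gives positions 11, 31, 51 and so on up to 191. Because 195 is not a multiple of 20, a start between 16 and 20 runs off the end of the list and returns only 9 elements, which is one practical cost of rounding the interval up.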
 3) Stratified Sample: probability sample in which elements are divided into groups,
called strata, based on a characteristic, and elements are selected from each stratum in
proportion to its representation in the total population
o Sampling units are divided into strata with each unit appearing in only one
stratum. Then a simple random sample or systematic RS is taken from each
stratum
o Can be proportionate or disproportionate
 Proportionate: use stratified sample in which each stratum is represented
in proportion to its size in the population (ex: divide into states, but São
Paulo is bigger than Acre, so you draw in proportion to population)
 Disproportionate: select a stratified sample in which elements sharing a
characteristic are under or overrepresented (ex: if you’re trying to study a
specific group, you can overrepresent them)
o Characteristics to stratify should have theoretical importance in study – create
strata that are meaningful for the project
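
Proportionate stratified sampling can be sketched in two steps: split the total sample size across strata by their share of the population, then take a simple random sample inside each stratum. The function names and stratum sizes below are hypothetical, and the largest-remainder rounding is just one reasonable way to make the allocations add up to n.

```python
import random

def proportionate_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    exact = {name: n * size / total for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    # Hand out the units lost to rounding down, largest remainder first.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata, n, seed=None):
    """Simple random sample within each stratum, sized by proportionate allocation."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    alloc = proportionate_allocation(sizes, n)
    return {name: rng.sample(units, alloc[name]) for name, units in strata.items()}

# Hypothetical strata: a large, a medium, and a small state.
sizes = {"large": 600, "medium": 300, "small": 100}
allocation = proportionate_allocation(sizes, 50)
```

A disproportionate design would replace `proportionate_allocation` with hand-picked stratum sample sizes, for example to overrepresent the small state.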
 4) Cluster Samples: used when a list of elements doesn’t exist and creating one wouldn’t
be feasible. It’s a probability sample in which sampling frame initially consists of clusters
of elements
o Since only some elements are going to be selected in a sample, it is unnecessary
to secure a list of all elements in the population
o Groups/clusters of elements are identified and listed as sampling units. Then, a
sample is drawn from this list of sampling units. Then, elements are identified
and sampled in the sampling units only
 Ex: to conduct interviews with people, you need a small sample (because
interviews are time consuming). So you choose 100 random
neighborhoods, then 10 random streets in the neighborhoods then 10
random houses in the streets – conduct interviews in those houses only
o The houses chosen are random, so it’s a random sample, but the cluster process
reduced the geographic spread of respondents and saved resources
 You don’t need to know the total number of people in the city before
starting the cluster process because each house has a known probability
of being selected
 Probability of your house being selected = probability of your
neighborhood being selected times probability of your street being
selected within that neighborhood times probability of your house
being selected within that street
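
The multiplication of stage probabilities can be worked through with exact fractions. The numbers selected at each stage (100 neighborhoods, 10 streets, 10 houses) come from the example above; the totals available at each stage (500 neighborhoods, 40 streets per neighborhood, 50 houses per street) are invented for illustration, as is the function name.

```python
from fractions import Fraction

def inclusion_probability(stages):
    """Multiply the selection probability at each stage of a multistage cluster sample."""
    prob = Fraction(1)
    for selected, available in stages:
        prob *= Fraction(selected, available)
    return prob

# (number selected, number available) at each stage; totals are hypothetical.
stages = [(100, 500),  # neighborhoods in the city
          (10, 40),    # streets within a chosen neighborhood
          (10, 50)]    # houses within a chosen street
p_house = inclusion_probability(stages)  # 1/5 * 1/4 * 1/5
```

Under these assumed totals a given house has a 1 in 100 chance of being interviewed, and that figure comes entirely from the stage counts without a list of every house in the city.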
 Systematic, stratified and cluster (2, 3 and 4) are often more practical than simple
random sample (1)
o In each case, the probability of being selected is known, so the accuracy of the
sample can be determined
o The type of sample chosen depends on the resources you have and the
availability of an accurate and comprehensive list of elements in a well-defined
target population
 Nonprobability Samples: sample for which each element in the total population has an
unknown probability of being selected.
o Used when probability samples (which are better because they represent a large
population accurately and it’s possible to calculate how close an estimated
characteristic is to the population value) can’t be used (ex: too expensive)
 Sometimes you can learn more by studying carefully selected and
perhaps unusual cases than by studying representative ones
 Ex: studying undocumented immigrants. There isn’t a list of undoc
people, so you just have to work with who you can find, which isn’t
representative
o Convenience sample: a nonprobability sample in which the selection of elements
is determined by the researcher’s convenience.
o Purposive sample: researcher exercises considerable discretion over what
observations to study because the goal is typically to study a diverse and usually
limited number of observations rather than to analyze a sample that represents
the population
o Quota sample: elements are sampled in proportion to their representation in the
population (similar to proportionate stratified sampling)
 Difference is that elements in the quota sample are not chosen in a
probabilistic way – they’re chosen in a purposive or convenient way
 Usually biased
o Snowball sample: respondents are used to identify other people who might
qualify for inclusion in the sample
 These people are interviewed and asked to supply names for further
investigation, and the sample builds like a snowball
 Problem: asking people who know each other to join the study
means you’ll probably get people from the same social circles →
similar characteristics
 Continue the process until enough people are interviewed. Very useful
when studying rare/difficult to locate populations (like undocumented immigrants)

Conclusion
 If cost isn’t a major consideration and the validity of measures will not suffer, it’s
generally better to collect data for the complete target population than use a sample
 If cost/validity dictate that a sample be drawn, a probability sample is usually preferable
to a nonprobability sample
o Accuracy of sample estimates can be determined only for probability samples.
o If the desire to represent a target population accurately is not a major concern or
is impossible to achieve, then a nonprobability sample can be used
 Probability samples yield estimates of the target population. All samples are subject to
sampling error
o No sample, no matter how well drawn, can provide an exact measurement of an
attribute of, or relationship within, the target population
 Statistical theory gives us methods for making inferences about unknown parameters and
for objectively measuring the probabilities of making inferential errors
o This info allows researchers and scientific community to judge the tenability of
many empirical claims
 See page 117 for list of terms with definitions
