Sampling Techniques
Sample types and sample size
Emilie van Haute
Université libre de Bruxelles
In research, the term population refers to a well-defined set of UNITS OF ANALYSIS that
are the focus of the study. The number of units that make up the population is
symbolized by an uppercase N. These units of analysis can correspond to a set of
individuals, countries, organizations, agencies, events, news items, years, scores, books,
decisions, reforms, laws, etc. Let us consider a researcher interested in the study of how
members of parliaments (henceforth, MPs) conceive their roles as citizens’
representatives. The population of the study (N) includes all members of parliaments in
the world (46,552 according to the Global Parliamentary Report; http://archive.ipu.org/, consulted on November 2, 2017).
However, researchers may restrict their data collection to a sample of that population
for convenience or necessity if they lack the time and resources to collect data for the
entire population. Therefore, a sample is “any subset of units collected from a
population” (Johnson and Reynolds 2012: 224). Its size is denoted by a lowercase n. The
units that make up the sample are also referred to as ‘elements’ or ‘individuals’. In our
example, the researcher may lack the time, resources or the willingness to collect data
on all MPs in the world and will proceed to select a number of MPs (n) from this
population.
Research sampling techniques refer to the CASE SELECTION strategy, i.e. the process and
methods used to select a subset of units from a population (in our example, selecting
MPs). While sampling techniques reduce the costs of data collection, they induce a loss
in terms of comprehensiveness and accuracy, compared to working on the entire
population. The data collected are subject to errors or BIAS. Two main decisions determine the size of the margin of error and whether the results of a sample study can be generalized to the entire population with accuracy: the choice of sample type and of sample size.
Types of samples
In order to apply the findings derived from the sample to the general population (with a
known sampling error), representative samples must be drawn using probability sample
methods.
A probability sample is a sample for which the probability of selecting each UNIT OF
ANALYSIS (in the population) is known. In our example, probability sampling indicates
the known probability that each MP in the world will be chosen for the subset of
selected MPs. This allows the researcher to calculate how accurately the sample reflects
22
http://archive.ipu.org/, consulted on November 2, 2017.
216
the population and to infer or generalize the results to the population with a known
margin of error. The four main probability sample methods are: simple random
sampling, systematic sampling, stratified sampling and cluster sampling.
In a simple random sample, each UNIT OF ANALYSIS has an equal chance of being
included in the sample. Units are randomly selected, preferably using a computerized
system to reduce human interference, which may subconsciously introduce patterns
and, therefore, BIAS (Hibberts et al. 2012: 55-56). In our example, the researcher could
use a random computerized technique to select MPs (from the world list) to include in
the sample (for additional examples of simple random techniques, see Johnson and
Reynolds 2016).
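To illustrate, a minimal Python sketch of a simple random draw for the MP example (the identifiers and the sample size n = 400 are hypothetical placeholders, not figures from the text):

```python
import random

# Hypothetical sampling frame: one identifier per MP in the world (N units).
population = [f"MP_{i}" for i in range(1, 46553)]  # N = 46,552

random.seed(42)                             # fixed seed so the draw can be reproduced
sample = random.sample(population, k=400)   # n = 400; each MP has an equal chance of selection

print(len(sample), sample[:5])
```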
In a stratified sample, UNITS OF ANALYSIS are divided into groups based on one or more characteristics. If a sample is composed of individuals, these are usually socio-demographic characteristics. Units are then selected within each group, using a simple random or systematic sampling technique. Commonly, stratified samples are proportionate, i.e. the sample size of each group is proportional to that group's size in the population. The sample thus resembles the population as closely as possible in terms of these characteristics. This is a way of avoiding BIAS when generalizing about the entire population, if we know from prior studies that these characteristics affect the object under study. Stratified samples can also be disproportionate, i.e. the sample size of each group differs from its proportion in the population. Here, certain groups may be over- or under-represented to compensate for small groups, which would otherwise generate a small n in the sample. This design is used if the researcher intends to conduct analyses on the subgroups. In this case, the researcher applies weights when generalizing about the population in order to
compensate for these choices. In our example, all of the world’s MPs could be grouped
by country and MPs to include in the sample could be randomly selected in each
country. The number of MPs per country to include in the sample could be
proportionate to the size of that country in relation to the total population of MPs in the
world (proportionate stratified sample). However, the researcher could decide to over-
represent MPs from smaller countries (e.g. Micronesia, which has just 14 MPs) and
under-represent MPs from larger countries (e.g. China, where the congress has 2,924
members) to ensure that there are enough MPs from each country (disproportionate
stratified sample). In this case, when analysing the results, the researcher will have to
use weights to correct the over- or under-representation of certain groups in the
sample.
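A hedged sketch of both stratified designs, assuming a hypothetical frame mapping countries to their MPs (the Belgian figure is illustrative); the design weight N_h/n_h computed at the end is the standard correction used when generalizing from a disproportionate sample:

```python
import random

# Hypothetical sampling frame grouped by stratum (country -> list of MPs).
strata = {
    "Micronesia": [f"FM_{i}" for i in range(1, 15)],    # 14 MPs
    "China": [f"CN_{i}" for i in range(1, 2925)],        # 2,924 MPs
    "Belgium": [f"BE_{i}" for i in range(1, 211)],       # illustrative figure
}
N = sum(len(mps) for mps in strata.values())

def proportionate_sample(strata, n_total):
    """Each stratum's sample size is proportional to its share of the population."""
    out = {}
    for country, mps in strata.items():
        n_h = max(1, round(n_total * len(mps) / N))
        out[country] = random.sample(mps, n_h)
    return out

def disproportionate_sample(strata, n_per_stratum):
    """Equal allocation: over-represents small countries, under-represents large ones."""
    return {c: random.sample(mps, min(n_per_stratum, len(mps)))
            for c, mps in strata.items()}

print({c: len(s) for c, s in proportionate_sample(strata, n_total=60).items()})

sample = disproportionate_sample(strata, n_per_stratum=10)
# Design weight N_h / n_h restores each stratum's true weight when generalizing.
weights = {c: len(strata[c]) / len(sample[c]) for c in sample}
print(weights)
```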
In a cluster sample, units that share one or more characteristics are put into groups.
Then, only certain groups are randomly selected. Within each selected group, units are
selected using random sampling. In our example, the researcher could randomly select a
certain number of countries and then randomly select MPs in these countries. Cluster
samples increase error, but can be useful to reduce the costs of data collection (e.g.
when the UNITS OF ANALYSIS are geographically spread out). In our example, only
selecting MPs from certain countries would reduce the costs of data collection. Cluster
samples can also be used if information about the full population is not available.
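A minimal sketch of two-stage cluster sampling under the same kind of hypothetical frame: countries (clusters) are first drawn at random, then MPs are drawn within the selected countries (the country sizes are approximate and purely illustrative):

```python
import random

# Hypothetical frame: country -> list of MPs (each country is a cluster of units).
clusters = {
    "Ghana": [f"GH_{i}" for i in range(1, 276)],
    "Uruguay": [f"UY_{i}" for i in range(1, 131)],
    "Norway": [f"NO_{i}" for i in range(1, 170)],
    "Nepal": [f"NP_{i}" for i in range(1, 276)],
}

random.seed(1)
# Stage 1: randomly select a subset of clusters (here 2 of the 4 countries).
selected_countries = random.sample(list(clusters), k=2)
# Stage 2: simple random sample of MPs within each selected cluster.
sample = {c: random.sample(clusters[c], k=20) for c in selected_countries}

print(selected_countries, {c: len(s) for c, s in sample.items()})
```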
Probability samples are generally preferred because they increase accuracy and allow
inference or generalization about the population. However, they are more expensive
and not always an option because they require a list of all the units of analysis included
in the population (and their characteristics, for stratified and cluster samples). In
contrast, non-probability samples are samples in which each element in the population
has an unknown probability of being included in the sample. In this case, inference or
generalization cannot be conducted with a known margin of error. Therefore, they are
generally not recommended in SURVEY RESEARCH. However, they may be useful for
EXPERIMENTATION, INTERVIEW TECHNIQUES, exploratory and qualitative research or if
the target population is impossible to identify. The four main non-probability sample
methods are: convenience sampling, purposive sampling, quota sampling and snowball
sampling.
In a convenience sample, UNITS OF ANALYSIS are included in the sample because they are readily available and easy to select. In our example, the
researcher could include MPs from their country of origin and from neighbouring
countries in the sample.
In a quota sample, units of analysis are divided into groups based on one or more
characteristics. Then, units are selected within each group, using a purposive or
convenience technique, which may or may not be in proportion to their distribution in
the population. This technique is similar to a (dis)proportionate stratified sample, except
that units are not randomly selected within the groups, but selected by purposive or
convenience methods. In our example, this would mean dividing MPs into subgroups
based on their country or party of origin, and then conveniently selecting MPs in these
groups (the sample may or may not be proportional to their size in the population).
Sample size
When building a sample, a key decision relates to sample size. The larger the sample, the smaller the errors. At full sample size (i.e. when the entire population is included in the study), there is no sampling error.
Contrary to common belief, sample size is not usually determined by population size, unless the population is rather small. In fact, the required sample size grows much more slowly than the population and levels off for large populations (see Table 7); since most quantitative research focuses on large populations, population size is often irrelevant. The factors that do
determine sample size, however, are: the degree of heterogeneity of the population
(more heterogeneity requires a larger sample); the expected differences between
groups in the population (smaller expected differences require a larger sample); the
sampling technique used (more complex sampling techniques require a larger sample);
the type of analyses to be conducted (subgroup analyses require a larger sample); the
margin of error the researcher is ready to tolerate (lower margin of error requires a
larger sample); and the expected response rate (in the case of a survey, a low response
rate calls for a large sample).
Table 7. Relation between population size and sample size, based on margin of error (simple random sampling)

Population size       Sample size for a margin of error of…
                      10%     5%      1%
100                   50      80      99
500                   81      218     476
1,000                 88      278     906
10,000                96      370     4,900
100,000               96      383     8,763
1,000,000 and more    97      384     9,513
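The values in Table 7 can be reproduced with the standard sample-size formula for a simple random sample at a 95% confidence level, maximum variance (p = 0.5) and a finite population correction; the snippet below is an illustrative sketch of that calculation, not the chapter's own computation:

```python
import math

def sample_size(N, margin_of_error, z=1.96, p=0.5):
    """Required n for a simple random sample: Cochran's formula with a
    finite population correction (z=1.96 ~ 95% confidence; p=0.5 is the
    most conservative assumption about the population proportion)."""
    n0 = z**2 * p * (1 - p) / margin_of_error**2      # infinite-population sample size
    return math.ceil(n0 / (1 + (n0 - 1) / N))          # finite population correction

print(sample_size(1_000, 0.05))       # 278, as in Table 7
print(sample_size(1_000_000, 0.01))   # 9,513, as in Table 7
```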
References
Chambers, Ray L. and Robert G. Clark. 2012. An Introduction to Model-Based Survey
Sampling with Applications. Oxford: Oxford University Press.
Govindarajulu, Zakkula. 1999. Elements of Sampling Theory and Methods. Upper Saddle
River, N.J.: Prentice Hall.
Hibberts, Mary, R. Burke Johnson and Kenneth Hudson. 2012. Survey Sampling Techniques. In Gideon, Lior. Ed. Handbook of Survey Methodology for the Social Sciences. New York: Springer, pp. 53-74.
Johnson, Janet B., H.T. Reynolds and Jason D. Mycoff. 2016. Political Science Research Methods. 8th ed. Los Angeles and London: Sage/CQ Press.
Lohr, Sharon J. 2009. Sampling: Design and Analysis. Pacific Grove, Calif: Duxbury Press.
Noy, Chaim. 2006. Sampling Knowledge: The Hermeneutics of Snowball Sampling in
Qualitative Research. International Journal of Social Research Methodology
11(4): 327-344.
Teddlie, Charles and Fen Yu. 2007. Mixed Methods Sampling. A Typology with Examples.
Journal of Mixed Methods Research 1(1): 77-100.
Scientific Realism
Heikki Patomäki
University of Helsinki
Metaphysical realists have long maintained that the world is real, and while reality may
be independent of concepts, our concepts can make references to it and its essences.
Medieval nominalists denied this and maintained instead that abstract concepts or
universals are names only. For a nominalist, only particular or concrete beings exist.
Thus to name a particular four-legged creature “a dog” is just a human convention.
The debate between realism and nominalism assumed new meanings after
pragmatically successful breakthroughs in physics, chemistry, biology and medicine from
Newton to Maxwell and from Jenner to Darwin. Especially since the 19th century,
nominalists have been allied with modern day empiricists, who tend to treat scientific
theories instrumentally (theories are just tools indicating means to achieve ends); while
realists have fixed their eyes on the realisticness of scientific theories (the task of
theories is to depict reality). Both parties agree that modern scientific theories work
better than earlier forms of human understanding. The question is: is this because they
capture real essences and properties of the world? Or is it because we can formulate
theories that may work and accord with observations, but have no necessary bearing on
any deeper understanding of reality? Is the success of our theories just a miracle?
After the heyday of empiricism in the interwar period and its immediate aftermath [see
POSITIVISM AND POST-POSITIVISM], many critical reactions to empiricism seemed to
suggest scientific realism. For instance, Quine (1951) contested the idea of atomistic
facts and Bunge (1959) the idea that CAUSATION is only or mainly about empirically
regular connections between two events or observables. It was moreover widely
concurred that scientific theories make references to things that cannot be directly
observed (or at least seen), and thus emerged the issue of the status of non-
observables.
Social sciences seem to pose difficulties for scientific realism. Can meaning, history and
the social world really exist independently of us humans and our meanings and
conventions? Moreover, in the late 20th century many fields of social sciences were
commonly conceived in terms of inter-paradigm debates [see PARADIGM AND
RESEARCH PROGRAM]. What could concepts such as verisimilitude mean in scholarly
contexts where there appear to be several drastically different competing scientific
theories and approaches? Which one of them is approaching the truth? Many varieties
of post-POSITIVISM have adopted a strong anti-realist stand. According to Bunge (1993)
this has been a regressive move. Bunge stresses not only the reality of social systems
and macrosocial phenomena, but also of subjective experiences. “A realist should be
willing, nay eager, to admit the relevance of feeling, belief and interest to social action,
but he will insist that they be studied objectively” (p.211).
If realist objectivism is possible also in social sciences, what then explains the drastically
different opinions of say John K. Galbraith and Milton Friedman in economics? Bunge
maintains that while it is true that any given body of empirical data can be “covered” by
several different HYPOTHESES, for a realist an approximate empirical fit is only an
indicator of truth. A realist also requires compatibility with a comprehensive theory. A
theory with the greatest explanatory power is likely to be complex and refer to deep
mechanisms. Although the empirical underdetermination of theory can explain some of
the differences between Galbraith and Friedman, Bunge accuses Friedman of
philosophical “fictionalism”, of licensing assumptions that do not correspond to
anything outside theorists’ imagination. This kind of fictionalism is often freely admitted by rational choice theorists and by anyone theorizing in “as if” terms and under “ceteris paribus” conditions. Bunge’s point is that assumptions can be false and appearances misleading, and thus that fictionalism is untenable.
Scientific realism has at times been criticised from the point of view of social
constructivism, though many forms of social constructivism are realist. For instance,
Searle (1995) distinguishes between “brute facts” and “institutional facts”. A piece of
paper in my hand is a physical fact, but it is a social agreement that it is also a 20 euro
note. We impose aesthetic, practical and social functions on objects in terms of
collective intentionality (i.e. shared beliefs, desires and intentions) and constitutive
rules: X counts as Y in context C. We can refer to institutional facts as objectively as to
brute facts; and the correspondence theory of truth is valid for both.
Of the founders of social sciences, at least Karl Marx and Emile Durkheim can be
plausibly argued to have been scientific realists – and there are realist readings of Max
Weber too. In modern social scientific contexts, the issue of non-observables concerns
the existential status of beliefs, desires and intentions; background competences;
collective intentionality; social structures and systems; and macrosocial and
macrohistorical phenomena. As scientific realists tend to defend the real existence of (at
least some of) these, it seems clear that a scientific realist can be a methodological but
not an ontological individualist [see METHODOLOGICAL INDIVIDUALISM AND HOLISM].
From a realist viewpoint, however, the meaning of the former is unclear because
realism is opposed to fictionalist as-if thinking [see ONTOLOGY].
References
Bunge, Mario. 1959. Causality: The Place of the Causal Principle in Modern Science.
Cambridge MA: Harvard University Press.
Bunge, Mario. 1993. Realism and Antirealism in Social Science. Theory and Decision
35:207-235.
Niiniluoto, Ilkka. 2000. Critical Scientific Realism. Oxford: Oxford University Press.
Putnam, Hilary. 1981. Reason, Truth and History. Cambridge: Cambridge University
Press.
Quine, Willard. 1951. Two Dogmas of Empiricism. The Philosophical Review 60:20-43.
Searle, John. 1995. The Construction of Social Reality. New York: The Free Press.
Van Fraassen, Bas C. 1980. The Scientific Image. Oxford: Clarendon Press.
Scope conditions
A Potential Escape from Systematic Theory Falsification
Mathilde Gauquelin
Laval University and Ghent University
Three possible outcomes are foreseen in the relation between theory, scope conditions
and empirical results: if scope limitations are satisfied, inconsistent evidence should
falsify the conditional formulation, because the latter should have been true under the
stated limitations. However, if scope limitations are not satisfied, the evidence can only
be irrelevant, because the proposition was not meant to apply outside the stated
limitations and thus cannot be verified or falsified. Finally, if a theoretical formulation is
in fact true, findings of negative evidence will imply that some scope limitation is
unsatisfied, because the data should otherwise have supported the formulation (Walker
and Cohen 1985: 291-293). Walker and Cohen thus clarify that scope statements are
metatheoretical, in that they do not serve to make assertions about the relationship
between certain theoretical concepts or VARIABLES. Rather, they only affect the class of
individuals or units to which the proposition applies, regardless of the proposed
relationships between these elements (Walker and Cohen 1985: 291-293).
It logically follows that scope conditions also provide a framework to indicate where
evidence can be generalized, based on the idea that if a proposition is true within its
defined scope, there are always “additional but unknown circumstances to which [it]
applies”. The scientist’s task then becomes to generalize the theory by expanding its
scope of applicability to cover these circumstances. This process is similar to fixing
ceteris paribus parameters in economic theory: as relative certainty about a causal
relationship under given parameters is achieved, the relationship can subsequently be
tested under new parameters. Conversely, this also implies that if a proposition is
proven false under less restrictive scope conditions, it has reached its potential for
generalization and cannot see its scope further expanded (Walker and Cohen 1985: 295-
296).
More generally, there is a push among scholars to encourage the explicit statement of
scope conditions to formulate clearer and more precise theoretical statements. Goertz
thus remarks that many scientists implicitly assume conditions that directly limit the
scope of their research in practice, but are seldom explicitly discussed (Goertz 2005:
196). In practice, however, it is possible that some research methods have become
more closely associated with the definition of scope conditions. Levy explains that
although both quantitative and qualitative methods could deal with an ONTOLOGICAL
view of the world as complex and nuanced, many believe that qualitative methods are
better suited to comprehend this complexity and state scope conditions more explicitly.
He suggests that quantitative methods may be more oriented towards universalist as
opposed to contextual assumptions, due to their unquestioned use of complete
statistical populations, whereas qualitative methods tend to focus on the specificities
and differences among cases (Levy 2007: 204-205).
Beyond the general choice of method, scope conditions also concern CASE SELECTION.
Goertz explains that scope conditions aim to meet causal homogeneity for case
selection, which refers to a set of cases where a given change on the independent
variable is expected to have the same average net effect on the dependent variable
across units (King, Keohane and Verba 1994: 91-93). However, Goertz expresses the
concern that scope conditions’ strict rooting in theory could lead to the involuntary
exclusion of relevant cases or, conversely, the inclusion of irrelevant cases. As an
alternative to Walker and Cohen’s exclusion of any case, either negative or positive, that
does not meet the scope conditions’ causal homogeneity standard, he thus offers the
“possibility principle”, which aims to exclude negative cases that do fall within the
defined scope conditions, but do not provide useful information for theory testing, thus
allowing for a measure of adaptation to empirical realities without sacrificing theoretical
soundness (Goertz 2005: 194). Although scope conditions remain widely used in the social sciences, promising further refinements could thus make them even more relevant.
References
Goertz, Gary. 2005. Social Science Concepts: A User’s Guide. Princeton: Princeton University Press.
Hempel, Carl Gustav. 1965. Aspects of Scientific Explanation: And Other Essays in the Philosophy of Science. New York: Free Press.
King, Gary, Robert O. Keohane and Sidney Verba. 1994. Designing Social Inquiry:
Scientific Inference in Qualitative Research. Princeton: Princeton University
Press.
Levy, Jack S. 2007. Qualitative Methods and Cross-Method Dialogue in Political Science.
Comparative Political Studies 40(2): 196-214.
Nome, Martin Austvoll. 2013. When Do Commitment Problems Not Cause War? Turkey
and Cyprus, 1964 versus 1974. International Area Studies Review, 16(1): 50-73.
Soifer, Hillel David. 2012. The Causal Logic of Critical Junctures. Comparative Political
Studies, 45(12): 1572-1597.
Walker, Henry A. and Bernard P. Cohen. 1985. Scope Statements: Imperatives for
Evaluating Theory. American Sociological Review 50(3): 288-301.
Sequence Analysis
Being earnest with time
Thomas Collas
University of Louvain
&
Philippe Blanchard
University of Warwick
Sequence analysis (SA) refers to a set of tools used to represent, compare and explain
sequences, i.e. ordered lists of elements. Job careers (succession of positions) within a
profession, an organization or a whole country are typical examples of social sequences.
Various other topics have been analysed through SA, such as steps in traditional English
dances, country-level adoption of welfare policies over one century, individual or family
time-diaries, or sequences of bursts of political violence in the American Deep South
(see Abbott 1995 for an encompassing review of literature; Blanchard et al. 2014 for
recent developments).
Sequence analysts conceive the social world as happening in narratives, i.e. structures
with a beginning, a middle and an end. They unveil typical temporal structures and
reveal the temporal thickness of actions and institutions.
Andrew Abbott played a pioneering role in the diffusion of the method. With colleagues,
he introduced optimal matching analysis (OMA), a tool to compare sequences borrowed
from computer science and previously adapted to DNA comparisons. Abbott’s
methodological work on SA was part of a wider methodological and ontological thinking
on social processes (Abbott 1995).
Composition of sequences
In its simplest form, a sequence is a series of spells, also called “moments” or
“episodes”, defined on a more or less refined chronological scale (years, months, etc.) or
as a simple order (first, second, etc.). Each spell is chosen from an “alphabet” (or
“universe”, “space”) of all possible states observed in a population of sequences. For
example, the career of a top executive member of a major Eurozone company can be
described as the series of one-year spells OOPPPPPFFFFFFFPPFFFFF over 1990-2010,
using three of the four states in the alphabet F=financial sector, I=industrial sector, P=politics and administration, and O=other (Blanchard et al., under review). From a political economy perspective, focusing on economic sectors makes it possible to analyse the rise of financial activities in actors’ careers and their centrality in modern capitalistic economies, at the expense of other sectors. Had the focus instead been on the factors of professional success in individual careers, one would rather look at successive ranks and promotions.
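As an illustration, a short Python sketch (using the career string quoted above) of how such a sequence can be stored both as a string of yearly states and as a list of spells obtained by run-length encoding:

```python
from itertools import groupby

# One-year spells of the top executive's career, 1990-2010 (21 years).
career = "OOPPPPPFFFFFFFPPFFFFF"
alphabet = {"F": "financial sector", "I": "industrial sector",
            "P": "politics and administration", "O": "other"}

# Run-length encoding: the same sequence as a list of (state, duration) spells.
spells = [(state, len(list(run))) for state, run in groupby(career)]
print(spells)   # [('O', 2), ('P', 5), ('F', 7), ('P', 2), ('F', 5)]
```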
The detailed longitudinal data SA requires call for meticulous collection, whether using
archives, ad hoc surveys (especially life-history calendars), web-scraped CVs or web logs
(see PROSOPOGRAPHY).
In contrast with approaches centred on transition probabilities between states (such as Markov models), SA focuses on the structures drawn by the successions of spells within sequences.
A wide collection of tools summarizes the diversity of states inside sequences, the order in which they occur, their recurrence, the sub-sequences they form (PFP and PPF both include the sub-sequences PP and PF), the time spent in each state, and the number and kinds of transitions between states. Synthetic metrics taking several aspects into account assess the internal variety of sequences, or their “complexity”.
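A hedged sketch of a few of these descriptive measures computed on the career quoted above; the last line uses normalized state entropy as one simple indicator of internal variety, which is only one of several synthetic metrics proposed in the SA literature:

```python
import math
from collections import Counter

career = "OOPPPPPFFFFFFFPPFFFFF"

# Time spent in each state.
time_in_state = Counter(career)                                  # {'F': 12, 'P': 7, 'O': 2}
# Transitions between successive yearly spells (number and kinds).
transitions = [(a, b) for a, b in zip(career, career[1:]) if a != b]

def has_subsequence(seq, sub):
    """Is sub present as a (possibly non-contiguous) ordered sub-sequence of seq?"""
    it = iter(seq)
    return all(ch in it for ch in sub)

# Normalized Shannon entropy of the state distribution (0 = a single state throughout).
n = len(career)
entropy = -sum((c / n) * math.log(c / n) for c in time_in_state.values())
entropy /= math.log(len(time_in_state))

print(time_in_state, len(transitions), has_subsequence(career, "PF"), round(entropy, 2))
```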
Sequences are both successions of states and successions of transitions between states.
An expanding body of research is dedicated to the second aspect, studying for example
transitions from employment to unemployment or retirement.
More details on these tools are available in the introduction to the widely-used R package TraMineR (Gabadinho et al. 2011), which has helped to spread and routinise SA in recent years.
Clustering procedures – a vast and autonomous methodological territory – are often
applied to OMA-calculated pairwise distances in order to group similar sequences and to
separate dissimilar ones. One may then describe clusters with sequential and non-
sequential covariates, and look within clusters for prototypical sequences. Methods of
dimensionality reduction (such as multidimensional scaling) also help to make sense of
pairwise distances (for a synthesis of comparison methods and their critiques, see Brzinsky-Fay and Kohler, eds., 2010).
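A simplified sketch of this workflow: an optimal-matching-style edit distance with constant insertion/deletion and substitution costs (real applications usually derive substitution costs from theory or from observed transition rates), followed by hierarchical clustering of the pairwise distances; the second and third careers are invented for illustration:

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def om_distance(s1, s2, indel=1.0, sub=2.0):
    """Edit distance between two state sequences with constant costs
    (a simplified stand-in for optimal matching)."""
    m, n = len(s1), len(s2)
    d = np.zeros((m + 1, n + 1))
    d[:, 0] = np.arange(m + 1) * indel
    d[0, :] = np.arange(n + 1) * indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if s1[i - 1] == s2[j - 1] else sub
            d[i, j] = min(d[i - 1, j] + indel,      # deletion
                          d[i, j - 1] + indel,      # insertion
                          d[i - 1, j - 1] + cost)   # substitution (or match)
    return d[m, n]

careers = ["OOPPPPPFFFFFFFPPFFFFF",   # the executive quoted above
           "PPPPPPPPPPFFFFFFFFFFF",   # invented: politics, then finance
           "FFFFFFFFFFFFFFFFFFFFF"]   # invented: finance throughout

# Condensed vector of pairwise distances, then average-linkage clustering.
dist = [om_distance(a, b) for i, a in enumerate(careers) for b in careers[i + 1:]]
clusters = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
print(squareform(dist))
print(clusters)   # e.g. two groups of similar careers
```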
Multichannel sequences
A pitfall of the above definition of sequences is their one-dimensionality. Students of the life course agree that histories of social entities unfold in multiple and linked areas. For
instance, job careers are probably better understood when linked to residential and
family histories. These constitute different “dimensions” or “channels” of a same
sequence. Different ways to analyse multichannel sequences have been implemented
(for a seminal proposition, see Pollock 2007). However, such captivating perspectives
demand quality multidimensional longitudinal data that are often unavailable (see
CROSS-SECTIONAL AND LONGITUDINAL REASONING).
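One common implementation strategy, used with refinements in multiple-sequence approaches such as Pollock (2007), is to combine the channels into a single sequence over an expanded alphabet; a hypothetical sketch with a job channel and a family channel of equal length:

```python
# Hypothetical parallel channels for the same person, one spell per year.
job    = "UUEEEEEEEE"   # U = unemployed, E = employed
family = "SSSSMMMMCC"   # S = single, M = married, C = married with children

# Combine the channels into one sequence over an expanded alphabet, e.g. 'E+M'.
combined = [f"{j}+{f}" for j, f in zip(job, family)]
print(combined)
# ['U+S', 'U+S', 'E+S', 'E+S', 'E+M', 'E+M', 'E+M', 'E+M', 'E+C', 'E+C']
```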
Analyses of turning points assume that events are not idiosyncratic but make sense in relation to
stable trajectories. Tools combining SA and event history analysis offer means to
address such questions by identifying sequential patterns and exploring their
relationships to isolated events.
References
Abbott, Andrew. 1995. Sequence analysis: new methods for old ideas. Annual Review of Sociology 21(1): 93-113.
Blanchard, Philippe, Bühlmann, Felix, Gauthier, Jacques-Antoine (Eds.). 2014. Advances
in sequence analysis: theory, method, applications. London: Springer.
Blanchard, Philippe, Dudouet, François-Xavier, Mahut, Dominique, Vion, Antoine. Under
review. The financial crisis and the withdrawal of the financial elite. A structural
and sequential approach.
Brzinsky-Fay, Christian and Ulrich Kohler (Eds.). 2010. New Developments in Sequence
Analysis. Sociological Methods & Research 38(3): 359-512.
Cornwell, Benjamin. 2015. Social sequence analysis: Methods and applications.
Cambridge: Cambridge University Press.
Gabadinho, Alexis, Ritschard, Gilbert, Studer, Matthias, Müller, Nicolas S. 2011. Mining
sequence data in R with the TraMineR package: A user’s guide. Geneva:
University of Geneva.
Pollock, Gary. 2007. Holistic trajectories: a study of combined employment, housing and
family careers by using multiple-sequence analysis. Journal of the Royal
Statistical Society Series A. 170: 167-183.