Michael L. Morrison, William M. Block, M. Dale Strickland, Bret A. Collier, Markus J. Peterson
Wildlife Study Design
Springer Series on Environmental Management
Series Editors: Bruce N. Anderson, Robert W. Howarth, Lawrence R. Walker

Second Edition
Dr. Michael L. Morrison
Texas A&M University
College Station, TX, USA
[email protected]

Dr. William M. Block
Rocky Mountain Research Station, USDA Forest Service
Flagstaff, AZ, USA

Series Editors:

Bruce N. Anderson
Planreal Australasia
Keilor, Victoria 3036, Australia
[email protected]

Robert W. Howarth
Program in Biogeochemistry and Environmental Change
Cornell University, Corson Hall
Ithaca, NY 14853, USA
[email protected]

Lawrence R. Walker
Department of Biological Science
University of Nevada Las Vegas
Las Vegas, NV 89154, USA
[email protected]
Preface
We developed the first edition of this book because we perceived a need for a
compilation on study design with application to studies of the ecology, conserva-
tion, and management of wildlife. We felt that the need for coverage of study design
in one source was strong, and although a few books and monographs existed on
some of the topics that we covered, no single work attempted to synthesize the
many facets of wildlife study design.
We decided to develop this second edition because our original goal – synthesis
of study design – remains strong, and because we each gathered a substantial body
of new material with which we could update and expand each chapter. Several of
us also used the first edition as the basis for workshops and graduate teaching,
which provided us with many valuable suggestions from readers on how to improve
the text. In particular, Morrison received a detailed review from the graduate stu-
dents in his “Wildlife Study Design” course at Texas A&M University. We also
paid heed to the reviews of the first edition that appeared in the literature.
As with the first edition, we think this new edition is a useful textbook for
advanced undergraduate and graduate students and a valuable guide and reference
for scientists and resource managers. Thus, we see this book being used by students
in the classroom, by practicing professionals taking workshops on study design,
and as a reference by anyone interested in this topic. Although we focus our exam-
ples on terrestrial vertebrates, the concepts provided herein have applicability to
most ecological studies of flora and fauna.
We approached this book from both a basic and applied perspective. The topics
we cover include most of the important areas in statistics, but we were unable to go
into great detail regarding statistical methodology. However, we included sufficient
details for the reader to understand the concepts. Actual application might require
additional reading. To facilitate additional research on the topics, we included
extensive literature reviews on most of the areas covered.
A primary change in the second edition was division of the original Chap. 1 into
two new chapters. Chapter 1 now focuses on philosophical issues as they relate to
science. The philosophy of science provides a logical framework for generating
meaningful and well-defined questions based on existing theory and the results of
previous studies. It also provides a framework for combining the results of one’s
study into the larger body of knowledge about wildlife and for generating new
questions, thus completing the feedback loop that characterizes science. The new
Chapter 2 retains many of the elements present in the first chapter of the original
edition, but has been fully revised. In this new Chapter 2, we focus on the concept
of basic study design, including variable classification, the necessity of randomiza-
tion and replication in wildlife study design, and the three major types of designs
in decreasing order of rigor (i.e., manipulative experiments, quasi-experiments, and
observational studies).
Throughout the remaining chapters we expanded our use of examples and the
accompanying literature. In particular, we added considerable new material on
detection probabilities, adaptive cluster methods, double sampling, sampling of
rare species, and effect size and power. We expanded our coverage of impact
assessment with recent literature on disturbance and recovery. One of the changes
highlighted by student reviewers of the first edition was the need for more material
on what to do “when things go wrong.” That is, what can one do to recover a study
when the wonderful design put down on paper cannot be fully implemented in the
field, or when some event (e.g., natural catastrophe or just plain bad luck) reduces
your sample size? We also added a glossary to assist in reviewing key terminology
used in study design, as requested by student reviewers.
We thank Janet Slobodien, Editor, Ecology and Environmental Science,
Springer Science + Business Media, for guiding both editions through to publica-
tion; and also Tom Brazda of Springer for assisting with the final compilation and
editing of the book. Joyce Vandewater is thanked for patiently working with us to
create and standardize the graphics. We thank the reviewers selected by Springer
for providing valuable comments that strengthened this edition. Angela Hallock,
Texas A&M University, completed the task of securing copyright permissions for
material used in the text. Nils Peterson, Damon Hall, and Tarla Rai Peterson pro-
vided incisive reviews of Chapter 1 that greatly improved the final version.
We also thank those who assisted with the first edition, because the valuable
comments they made were retained through to this new edition: Rudy King, Rocky
Mountain Research Station, US Forest Service; Lyman McDonald, Western
EcoSystems Technology, Inc. In particular we thank first edition co-author William L.
Kendall for his valuable contributions.
Contents
3 Experimental Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2 Principles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.3 Philosophies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.3.1 Design/Data-based Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.3.2 Model-based Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.3.3 Mixtures of Design/Data-based and Model-based
Analyses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.4 Replication, Randomization, Control, and Blocking . . . . . . . . . . . . . 81
3.4.1 Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.4.2 Randomization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4.3 Control and Error Reduction. . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.5 Practical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.6 Single-factor Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.6.1 Paired and Unpaired . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.6.2 Completely Randomized Design . . . . . . . . . . . . . . . . . . . . . . 86
3.6.3 Randomized Complete Block Design. . . . . . . . . . . . . . . . . . . 87
3.6.4 Incomplete Block Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.6.5 Latin Squares Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.7 Multiple-factor Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.7.1 Factorial Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.7.2 Two-factor Designs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.7.3 Multiple-factor Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.7.4 Higher Order Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Glossary
Format of glossary: Key terms used throughout the text are listed below with a brief
definition; cross-reference to associated terms is provided where appropriate. The
number(s) following each term refers to the chapter(s) in which the term is defined
or otherwise discussed.
Levels (6) Levels are measures of a resource, such as abundance, diversity, community structure, and reproductive rates. Hence, levels are quantifiable on an objective scale and can be used to estimate means and variances and to test hypotheses.

Level-by-time interaction (6) The term “level” refers to the fact that specific categories (levels) of the impact are designated; used in a level-by-time design. Contrast with trend-by-time interaction.

Local extinction probability (1) The probability that a species currently present in a biotic community will not be present by the next time period.

Logical empiricism (1) See logical positivism; reflects the affinity of later members of this movement for the writings of Locke, Berkeley, and Hume.

Logical positivism (1) An early twentieth-century philosophical movement holding that all meaningful statements are either (1) analytic (e.g., mathematical equations) or (2) conclusively verifiable, or at least confirmable, by observation and experiment, and that all other statements are therefore cognitively meaningless.

Longitudinal studies (3) Repeated measures experiments common in wildlife telemetry studies, environmental impact studies, habitat use and selection studies, studies of blood chemistry, and many other forms of wildlife research, where logistics typically lead to repeated measures of data from study plots or study organisms.

Long-term study (3, 5, 7) A study that continues “…for as long as the generation time of the dominant organism or long enough to include examples of the important processes that structure the ecosystem under study… the length of study is measured against the dynamic speed of the system being studied” (Strayer et al. 1986).
Magnitude of anticipated effect (3) The magnitude of the perturbation or the
importance of the effect to the biology of the
species, which often determines the level of
concern and the required level of precision.
Manipulative studies (3) Studies that include control of the experimental conditions; there are always two or …
Optimal study design (6) If you know what type of impact will occur,
when and where it will occur, and have the
ability to gather pretreatment data, you are
in an optimal situation to design the study.
Contrast with suboptimal study design.
Overdispersion (3, 4) A statistical phenomenon in which the observed variance of the data is larger than the variance predicted by the model. Fairly common in analyses using Poisson and binomial regression techniques.
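The overdispersion just defined can be illustrated with a quick variance-to-mean check. This is a minimal sketch; the mixed-rate counts below are simulated purely for illustration and are not data from the text:

```python
import math
import random

random.seed(1)

def poisson(lam):
    """Draw one Poisson(lam) variate by inversion of the CDF."""
    u, p, k = random.random(), math.exp(-lam), 0
    cum = p
    while u > cum and k < 1000:  # guard against floating-point edge cases
        k += 1
        p *= lam / k
        cum += p
    return k

# Counts whose underlying rate varies among sampling units: a common
# source of extra-Poisson (overdispersed) variation in wildlife counts.
counts = [poisson(random.choice([1.0, 8.0])) for _ in range(2000)]

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
ratio = var / mean  # ~1 for pure Poisson data; > 1 signals overdispersion
print(round(ratio, 2))
```

For a single Poisson rate the variance-to-mean ratio hovers near 1; mixing two rates inflates the variance well beyond the mean, which is the pattern that diagnostics for Poisson and binomial regression flag as overdispersion.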
Paired study design (3) A study that typically evaluates changes in
study units paired for similarity.
P-value (2, 3) The probability of obtaining a test statistic at least as extreme as the one observed, conditional on the null hypothesis being true.
Panmictic populations (1) Populations in which interactions between individuals, including potential mating opportunities, are relatively continuous throughout the space occupied by the population.

Paradigm (1, 3) A term employed by Thomas S. Kuhn to characterize a scientific tradition, including its philosophy, theory, experiments, methods, publications, and applications. Paradigms govern what he called normal science (see above). The term also has come to describe a given worldview in common parlance.

Parameter (2, 3) A quantity that defines a certain characteristic of an ecological system or population.

Personal opinion (9) A decision based on personal biases and experiences. Contrast with expert opinion.
Pilot study (1, 2, 5, 8) A pilot study is a full-scale dress rehearsal
of the study plan and includes data collec-
tion, data processing, and data analyses,
thus allowing thorough evaluation of all
aspects of the study including initial sample
size and power analyses. A pilot study is
often done with a much larger sample than
a pretest period. Such studies are especially
useful when initiating longer-term studies.
Population (1, 3) See biological population, sampled popu-
lation, and target population.
Postmodernism (1) It is a truism that postmodernism is indefinable. It can be described as a cultural zeitgeist of …
1.1 Introduction
understanding the nature of the entity being studied (ontology), what constitutes
knowledge and how it is acquired (epistemology), and why one thinks the research
question is valuable, the approach ethical, and the results important (axiology).
Moreover, the philosophy of science provides a logical framework for generating
meaningful and well-defined questions based on existing theory and the results of
previous studies. It also provides a framework for combining the results of one’s
study into the larger body of knowledge about wildlife and for generating new
questions, thus completing the feedback loop that characterizes science. For these
reasons, we outline how scientific methodology helps us acquire valuable knowledge, both in general and specifically regarding wildlife. We end the chapter with a
brief discussion of terminology relevant to the remaining chapters.
The result, Theocharis and Psimopoulos feared, was that “having lost their monopoly
in the production of knowledge, scientists have also lost their privileged status in soci-
ety” and the governmental largess to which they had become accustomed (p. 597).
Since the 1960s, entire academic subdisciplines devoted to critiquing science,
and refereed journals associated with these endeavors, have become increasingly
common and influential. The more radical members of this group often are called
postmodernists. It is probably fair to say, however, that most scientists either were
blissfully unaware of these critiques, or dismissed them as so much leftwing aca-
demic nonsense.
By the 1990s, however, other scientists began to join Theocharis and Psimopoulos
with concerns about what they perceived to be attacks on the validity and value of
science. Paul R. Gross and Norman Levitt (1994), with Higher Superstition: The
Academic Left and Its Quarrels with Science, opened a frontal attack on critical studies of science. They argued that scholars in critical science studies knew little about
science and used sloppy scholarship to grind political axes. Both the academic and
mainstream press gave Higher Superstition substantial coverage, and “the science
wars” were on.
In 1995, the New York Academy of Sciences hosted a conference entitled “The
Flight from Science and Reason” (see Gross et al. (1997) for proceedings). These
authors, in general, were also highly critical of what they perceived to be outra-
geous, politically motivated postmodern attacks on science. Social Text, a critical
theory journal, prepared a special 1996 issue titled “Science Wars” in response to
these criticisms. Although several articles made interesting points, if the essay by
physicist Alan D. Sokal had not been included, most scientists and the mainstream
media probably would have paid little attention. Sokal (1996b) purportedly argued
that quantum physics supported trendy postmodern critiques of scientific objectiv-
ity. He simultaneously revealed elsewhere that his article was a parody perpetrated
to see whether the journal editors would “publish an article liberally salted with
nonsense if (a) it sounded good and (b) it flattered the editors’ ideological precon-
ceptions” (Sokal 1996a, p. 62). The “Sokal affair,” as the hoax and its aftermath
came to be known, brought the science wars to the attention of most scientists and
humanists in academia through flurries of essays and letters to editors of academic
publications. A number of books soon followed that addressed the Sokal affair and
the science wars from various perspectives and with various degrees of acrimony
(e.g., Sokal and Bricmont 1998; Koertge 1998; Hacking 1999; Ashman and
Barringer 2001). At the same time, the public, no doubt already somewhat cynical
about academic humanists and scientists alike, read their fill about the science wars
in the mainstream media. Like all wars, there were probably no winners. A more
relevant question is the degree to which all combatants lost.
A student of wildlife science might well ask, “How can Karl Popper be one of
the more notorious enemies of science?” and “If science is an objective, rational
enterprise addressing material realities, how can there be any argument about the
nature of scientific knowledge, let alone the sometimes vicious attacks seen in the
science wars?” These are fair questions. Our discussion of ontology, epistemology,
and axiology in science, making up the remainder of Sect. 1.2, should help answer
these and related questions and simultaneously serve as a brief philosophical foun-
dation for the rest of the book.
If asked to define reality, most contemporary scientists would probably find the
question somewhat silly. After all, is not reality the state of the material universe
around us? In philosophy, ontology is the study of the nature of reality, being, or
existence. Since Aristotle (384–322 B.C.), the empiricist tradition of philosophy
has held that material reality was indeed largely independent of human thought and
best understood through experience. Science is still informed to a large degree
through this empiricist perception. Rationalism, however, has an equally long tradi-
tion in philosophy. Rationalists such as Pythagoras (ca. 582–507 B.C.), Socrates
(ca. 470–399 B.C.), and Plato (427/428–348 B.C.) argued that the ideal, grounded
in reason, was in many ways more “real” than the material. From this perspective,
the criterion for reality was not sensory experience, but instead was intellectual and
deductive. Certain aspects of this perspective are still an integral part of modern
science. For many contemporary philosophers, social scientists, and humanists,
however, reality is ultimately a social construction (Berger and Luckmann 1966).
That is, reality is to some degree contingent upon human perceptions and social
interactions (Lincoln and Guba 1985; Jasinoff et al. 1995). While philosophers
voiced arguments consistent with social constructionism as far back as the writing
of Heraclitus (ca. 535–475 B.C.), this perspective toward the nature of being
became well established during the mid-twentieth century.
Whatever the precise nature of reality, knowledge is society’s accepted portrayal
of it. Over the centuries, societies have – mistakenly or not – accessed knowledge
through a variety of methods, including experience, astrology, experimentation,
religion, science, and mysticism (Rosenberg 2000; Kitcher 2001). Because the
quest for knowledge is fundamental to wildlife science, we now flesh out the per-
mutations of knowing and knowledge acquisition.
1.2.3 Knowledge
What is knowledge, how is knowledge acquired, and what is it that we know? These
are the questions central to epistemology, the branch of Western philosophy that
studies the nature and scope of knowledge. The type of knowledge typically dis-
cussed in epistemology is propositional, or “knowing-that” as opposed to “knowing-
how,” knowledge. For example, in mathematics, one “knows that” 2 + 2 = 4, but
“knows how” to add.
In Plato’s dialogue Theaetetus (Plato [ca. 369 B.C.] 1973), Socrates concluded
that knowledge was justified true belief. Under this definition, for a person to know
a proposition, it must be true and he or she must simultaneously believe the propo-
sition and be able to provide a sound justification for it. For example, if your friend
said she knew that a tornado would level her house in exactly 365 days, and the
destruction indeed occurred precisely as predicted, she still would not have known
of the event 12 months in advance because she could not have provided a rational
justification for her belief despite the fact that it turned out later to be true. On the
other hand, if she said she knew a tornado would level her house sometime within
the next 20 years, and showed you 150 years of records indicating that houses in
her neighborhood were severely damaged by tornadoes approximately every 20
years, her statement would count as knowledge and tornado preparedness might be
in order. This definition of knowledge survived without serious challenge by phi-
losophers for thousands of years. It is also consistent with how most scientists per-
ceive knowledge today.
Also during this period, philosophers informed by the rationalist tradition were
busily honing their epistemological perspective. René Descartes (1596–1650),
Baruch Spinoza (1632–1677), Gottfried Leibniz (1646–1716), and others are often
associated with this epistemological tradition and were responsible for integrating
mathematics into philosophy. For rationalists, reason takes precedence over experi-
ence for acquiring knowledge and, in principle, all knowledge can be acquired
through reason alone. In practice, however, rationalists realized this was unlikely
except in mathematics.
Philosophers during the Classical era probably would not have recognized any
crisp distinction between empiricism and rationalism. The seventeenth century
debate between Robert Boyle (1627–1691) and Thomas Hobbes (1588–1679)
regarding Boyle’s air pump experiments and the existence of vacuums fleshed out
this division (Shapin and Schaffer 1985). Hobbes argued that only self-evident
truths independent of the biophysical could form knowledge, while Boyle promoted
experimental verification, where knowledge was reliably produced in a laboratory
and independent of the researcher (Latour 1993). Even in the seventeenth century,
many rationalists found empirical science important, and some empiricists were
closer to Descartes methodologically and theoretically than were certain rationalists (e.g., Spinoza and Leibniz). Further, Immanuel Kant (1724–1804) began as a rationalist, then studied Hume and developed an influential blend of rationalist and
empiricist traditions. At least two important combinations of empiricism and cer-
tain aspects of rationalism followed.
One of these syntheses, pragmatism, remains the only major American philo-
sophical movement. Pragmatism originated with Charles Sanders Peirce (1839–
1914) in the early 1870s and was further developed and popularized by William
James (1842–1910), John Dewey (1859–1952), and others. Peirce, James, and
Dewey all were members of The Metaphysical Club in Cambridge, Massachusetts,
during the 1870s and undoubtedly discussed pragmatism at length. Their perspec-
tives on pragmatism were influenced by Kant, Mill, and Georg W.F. Hegel (1770–
1831), respectively (Haack and Lane 2006, p. 10), although other thinkers such as
Bacon and Hume were undoubtedly influential as well. James perceived pragma-
tism as a synthesis of what he termed the “tough-minded empiricist” (e.g., “materi-
alistic, pessimistic, … pluralistic, skeptical”), and “tender-minded rationalist” (e.g.,
“idealistic, optimistic, … monistic, dogmatical”) traditions of philosophy (1907,
p. 12). Similarly, Dewey argued that pragmatism represented a marriage between
the best of empiricism and rationalism (Haack 2006, pp. 33–40). James (1912, pp.
41–44) maintained the result of this conjunction was a “radical empiricism” that
must be directly experienced. As he put it,
To be radical, an empiricism must neither admit into its constructions any element that is
not directly experienced, nor exclude from them any element that is directly experienced.
… a real place must be found for every kind of thing experienced, whether term or relation,
in the final philosophic arrangement. (p. 42)
active fields of philosophy today. For this reason, there are several versions of neo-
pragmatism that differ in substantive ways from the classical pragmatism of Peirce,
James, Dewey, or George H. Mead (1863–1931). However, philosophers who con-
sider themselves pragmatists generally hold that truth, knowledge, and theory are
inexorably connected with practical consequences, or real effects.
The other important philosophical blend of empiricism and rationalism, logical
positivism (later members of this movement called themselves logical empiricists),
emerged during the 1920s and 1930s from the work of Moritz Schlick (1882–1936)
and his Vienna Circle, and Hans Reichenbach (1891–1953) and his Berlin Circle
(Rosenberg 2000). Logical positivists maintain that a statement is meaningful only
if it is (1) analytical (e.g., mathematical equations) or (2) can reasonably be verified
empirically. To logical positivists, ethics and aesthetics, for example, are meta-
physical and thus scientifically meaningless because one cannot evaluate such argu-
ments analytically or empirically. A common, often implicit assumption of those
informed by logical positivism is that given sufficient ingenuity, technology, and
time, scientists can ultimately come to understand material reality in all its com-
plexity. Similarly, the notion that researchers should work down to the ultimate
elements of the system of interest (to either natural or social scientists), and then
build the causal relationships back to eventually develop a complete explanation of
the universe in question, tends to characterize logical positivism as well. The
recently completed mapping of the human genome and the promised medical breakthroughs related to this genomic map characterize this tendency.
The publication of Karl R. Popper’s (1902–1994) Logik der Forschung by the
Vienna Circle in 1934 (given a 1935 imprint) called into question the sufficiency of
logical positivism. After the chaos of WWII, Popper translated the book into
English and published it as The Logic of Scientific Discovery in 1959. Popper’s
(1962) perspectives were further developed in Conjectures and Refutations: The
Growth of Scientific Knowledge. Unlike most positivists, Popper was not concerned
with distinguishing meaningful from meaningless statements or verification, but
rather distinguishing scientific from metaphysical statements using falsification.
For him, metaphysical statements were unfalsifiable, while scientific statements
could potentially be falsified. On this basis, scientists should ignore metaphysical
contentions; instead, they should deductively derive tests for hypotheses that could
lead to falsification. This approach often is called the hypothetico–deductive model
of science. Popper argued that hypotheses that did not withstand a rigorous test
should immediately be rejected and researchers should then move on to alternatives
that were more productive. He acknowledged, however, that metaphysical state-
ments in one era could become scientific later if they became falsifiable (e.g., due
to changes in technology). Under Popper’s model of science, while material reality
probably exists, the best scientists can do is determine what it is not, by systemati-
cally falsifying hypotheses related to the topic of interest. Thus, for Popperians,
knowledge regarding an issue is approximated by the explanatory hypothesis that
has best survived substantive experimental challenges to date. From this perspec-
tive, often called postpositivism, knowledge ultimately is conjectural and can be
modified based on further investigation.
Table 1.1 The purpose, logical definition, and verbal description of inductive, deductive, and retroductive reasoning, given the precondition a, postcondition b, and the rule R1: a → b (a therefore b; after Menzies 1996)

Method         Purpose          Definition(a,b)   Description
Induction      Determining R1   a → b ⇒ R1        Learning the rule (R1) after numerous examples of a and b
Deduction      Determining b    a ∧ R1 ⇒ b        Using the rule (R1) and its precondition (a) to deterministically make a conclusion (b)
Retroduction   Determining a    b ∧ R1 ⇒ a        Using the postcondition (b) and the rule (R1) to hypothesize the precondition (a) that could best explain the observed postcondition (b)

(a) →, ∧, and ⇒ signify “therefore,” “and,” and “logically implies,” respectively
(b) Note that deduction and retroduction employ the same form of logical statement to determine either the post- or precondition, respectively
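The three rows of Table 1.1 can be mimicked in code. In this toy sketch (the function names and the burned-habitat example are hypothetical, chosen only to make the logic concrete), the rule R1 is represented as a mapping from preconditions to postconditions:

```python
def induce(examples):
    """Induction: learn the rule (R1) after numerous examples of a and b."""
    rule = {}
    for a, b in examples:
        rule[a] = b  # each observed "a therefore b" pair contributes to R1
    return rule

def deduce(rule, a):
    """Deduction: use the rule (R1) and its precondition (a) to conclude b."""
    return rule[a]

def retroduce(rule, b):
    """Retroduction: use the postcondition (b) and the rule (R1) to
    hypothesize the precondition(s) (a) that could explain the observation."""
    return [a for a, post in rule.items() if post == b]

# Hypothetical field observations: habitat condition -> population response
examples = [("burned", "decline"), ("unburned", "stable"), ("burned", "decline")]

R1 = induce(examples)            # {"burned": "decline", "unburned": "stable"}
print(deduce(R1, "burned"))      # decline
print(retroduce(R1, "decline"))  # ['burned']
```

Note that deduction is deterministic given the rule, whereas retroduction only proposes candidate explanations; several preconditions could map to the same postcondition.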
Axiology is the study of value or quality. The nature, types, and criteria of values
and value judgments are critical to science. At least three aspects of value are
directly relevant to our discussion: (1) researcher ethics, (2) personal values
researchers bring to science, and (3) how we determine the quality of research.
Ethics in science runs the gamut from humane and appropriate treatment of
animal or human subjects to honesty in recording, evaluating, and reporting data.
Plagiarism, fabrication and falsification of data, and misallocation of credit by sci-
entists are all too often news headlines. While these ethical problems are rare, any
fraud or deception by scientists undermines the entire scientific enterprise. Ethical
concerns led the National Academy of Sciences (USA) to form the Committee on
the Conduct of Science to provide guidelines primarily for students beginning
careers in scientific research (Committee on the Conduct of Science 1989). All
graduate students should read the updated and expanded version of this report
(Committee on Science, Engineering, and Public Policy 1995). It also serves as a
brief refresher for more seasoned scientists.
Perhaps two brief case studies will help put bookends around ethical issues and
concerns in science. The first involves Hwang Woo-suk’s meteoric rise to the pin-
nacle of fame as a stem-cell researcher, and his even more rapid fall from grace. He
and his colleagues published two articles in Science reporting truly remarkable
results in 2004 and 2005. These publications brought his laboratory, Seoul National
University, and South Korea to the global forefront in stem cell research, and
Professor Hwang became a national hero nearly overnight. The only problem was
that Hwang and his coauthors fabricated data used in the two papers (Kennedy
2006). Additional ethical problems relating to sources of human embryos also soon
surfaced. In less than a month (beginning on 23 December 2005), a governmental
probe found the data were fabricated, Dr. Hwang admitted culpability and resigned
his professorship in disgrace, and the editors of Science retracted the two articles
with an apology to referees and those attempting to replicate the two studies. This
episode was a severe disgrace for Professor Hwang, Seoul National University, the
nation of South Korea, and the entire scientific community.
Although breaches of ethics similar to those in the previous example receive
considerable media attention and near universal condemnation, ethical problems in
science often are more insidious and thus less easily recognized and condemned.
Wolff-Michael Roth and Michael Bowen (2001) described an excellent example of
the latter. They used ethnographic approaches to explore the enculturation process
of upper division undergraduate and entry level graduate student researchers begin-
ning their careers in field ecology. These students typically had little or no direct
supervision at their study areas and had to grapple independently with the myriad
problems inherent to fieldwork. Although they had reproduced experiments as part
of highly choreographed laboratory courses (e.g., chemistry), these exercises prob-
ably were more a hindrance than a help. In these choreographed exercises, the cor-
rect results were never in doubt, only the students’ ability to reproduce them was in
question. Roth and Bowen found that the desire to obtain the “right” or expected
results carried over to fieldwork. Specifically, one student was to replicate a 17-year-
old study. He had a concise description of the layout, including maps. Unfortunately,
he was unable to interpret the description and maps well enough to lay out transects
identical to those used previously, despite the fact that most of the steel posts mark-
ing the original transects still were in place (he overlooked the effects of topo-
graphical variation and other issues). He knew the layout was incorrect, as older
trees were not where he expected them to be. Instead of obtaining expert assistance
and starting over, he bent “linear” transects to make things work out, assumed the
previous researcher had incorrectly identified trees, and that published field guides
contained major errors. “ ‘Creative solutions,’ ‘fibbing,’ and differences that ‘do not
matter’ characterized his work …” (p. 537). He also hid a major error out of con-
cern for grades. As he put it
I am programmed to save my ass. And saving my ass manifests itself in getting the best
mark I can by compromising the scruples that others hold dear. … That's what I am made
of. That is what life taught me. (p. 543)
Of course, his “replication” was not a replication at all, but this fact would not be
obvious to anyone reading a final report. Roth and Bowen (2001) concluded that
… the culture of university ecology may actually encourage students to produce ‘creative
solutions’ to make discrepancies disappear. The pressures that arise from getting right
answers encourage students to ‘fib’ and hide the errors that they know they have commit-
ted. (p. 552)
While this example of unethical behavior by a student researcher might not seem
as egregious as the previous example, it actually is exactly the same ethical prob-
lem; both researchers produced data fraudulently so that their work would appear
better than it actually was for purposes of self-aggrandizement.
Another important axiological area relates to the values researchers bring to sci-
ence. For example, Thomas Chrowder Chamberlin (1890; 1843–1928) argued that
scientists should make use of multiple working hypotheses to help protect them-
selves from their own biases and to ensure they did not develop tunnel vision. John
R. Platt (1964) rediscovered Chamberlin’s contention and presented it to a new
generation of scientists (see Sect. 1.4.1 for details). That researchers’ values
impinge to some degree upon their science cannot be doubted. This is one of the
reasons philosophers such as Kuhn, Lakatos, and Feyerabend maintained there
were cultural aspects of science regardless of scientists’ attempts to be “objective”
and “unbiased” (see Sect. 1.2.3.1). Moreover, scientists’ values are directly relevant
to social constructionism and thence critical studies of science.
In a general sense, science is a process used to learn how the world works. As dis-
cussed in Sect. 1.2, humans have used a variety of approaches for explaining the
world around them, including mysticism, religion, sorcery, and astrology, as well
as science. The scientific revolution propelled science to the forefront during the
last few centuries, and despite its shortcomings, natural science (the physical and
life sciences) has been remarkably effective in explaining the world around us
(Haack 2003). What is it about the methods of natural science that has proven so
successful? Here, we address this question for natural sciences in general and wild-
life science in particular. We begin this task by discussing research studies designed
to evaluate research hypotheses or conceptual models. We end this section by con-
textualizing how impact assessment and studies designed to inventory or monitor
species of interest fit within the methods of natural science. The remainder of the
book addresses specifics as they apply to wildlife science.
We avoided the temptation to label Sect. 1.3 “The Scientific Method.” After all, as
Sect. 1.2 amply illustrates, there is no single philosophy, let alone method, of sci-
ence. As philosopher of science Susan Haack (2003, p. 95) put it, “Controlled
Table 1.2 Typical steps used in the process of conducting natural science
1 Observe the system of interest
2 Identify a broad research problem or general question of interest
3 Conduct a thorough review of the refereed literature
4 Identify general research objectives
5 In light of these objectives, theory, published research results, and possibly a pilot study,
formulate specific research hypotheses and/or a conceptual model
6 Design (1) a manipulative experiment to test whether conclusions derived deductively from
each research hypothesis are supported by data or (2) another type of study to evaluate one
or more aspects of each hypothesis or the conceptual model
7 Obtain peer reviews of the research proposal and revise as needed
8 Conduct a pilot study if needed to ensure the design is practicable. If necessary, circle back
to steps 6 or 5
9 Conduct the study
10 Analyze the data
11 Evaluate and interpret the data in light of the hypotheses or model being evaluated. Draw
conclusions based on data evaluation and interpretation as well as previously published
literature
12 Publish results in refereed outlets and present results at scientific meetings
13 In light of the results and feedback from the scientific community, circle back and repeat
the process beginning with steps 5, 4, or even steps 3, 2, or 1, as appropriate
scientific literature, one can inductively derive rules of association among classes
of facts based on theory regarding how some aspect of the system works (see
Guthery (2004) for a discussion of facts and science). Similarly, one can retroduc-
tively derive hypotheses that account for interesting phenomena observed (see
Guthery et al. (2004) for a discussion of hypotheses in wildlife science). The devel-
opment of research hypotheses and/or conceptual models that explain observed
phenomena is a key attribute of the scientific process.
Step 6 (Table 1.2) is the principal topic of this book; we discuss the details in
subsequent chapters. In general, one either designs a manipulative experiment to
test whether conclusions derived deductively from one or more research hypotheses
are supported by data, or designs another type of study to evaluate one or more
aspects of each hypothesis or conceptual model. There are basic principles of
design that are appropriate for any application, but researchers must customize the
details to fit specific objectives, the scope of their study, and the system or subsys-
tem being studied. It is critically important at this juncture to formally draft a
research proposal and have it critically reviewed by knowledgeable peers. It is also
important to consider how much effort will be required to achieve the study objec-
tives. This is an exercise in approximation and requires consideration of how the
researcher will analyze collected data, but can help identify cases where the effort
required is beyond the capabilities and budget of the investigator, and perhaps
thereby prevent wasted effort. Pilot studies can be critical here; they help research-
ers determine whether data collection methods are workable and appropriate, and
also serve as sources of data for sample size calculations. We consider sample size
further in Sect. 2.5.7.
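As an illustration of the sort of back-of-the-envelope approximation involved, the sketch below uses a standard two-sample normal approximation to show how pilot-study data can feed into a sample size calculation. The standard deviation and detectable difference are hypothetical values, and this is one common textbook formula rather than a method the authors prescribe:

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate animals needed per group to detect a difference
    delta between two means, given common SD sigma (two-sided
    normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Suppose a pilot study suggests SD ~ 10 units and a 5-unit
# difference would be biologically meaningful:
print(n_per_group(sigma=10, delta=5))  # 63 per group
```

Note how quickly the required effort grows as the detectable difference shrinks; halving delta quadruples the sample size, which is exactly the kind of result that can reveal a study is beyond an investigator's budget before effort is wasted.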
Once the design is evaluated and revised, the researcher conducts the study and
analyzes the resulting data (steps 9–10, Table 1.2). In subsequent chapters, we dis-
cuss practical tips and pitfalls in conducting wildlife field studies, in addition to
general design considerations. We do not emphasize analytic methods because an
adequate exposition of statistical methodology is beyond the scope of this book.
Regardless, researchers must consider some aspects of statistical inference during
the design stage. In fact, the investigator should think about the entire study proc-
ess, including data analysis and even manuscript preparation (including table and
figure layout), in as much detail as possible from the beginning. This will have
implications for study design, especially sampling effort.
On the basis of the data analysis, predictions derived from the hypotheses or
conceptual models are compared with the observed results, interpretations are
made, and conclusions are drawn (step 11, Table 1.2). The researcher then compares
and contrasts these results and conclusions with those of similar work published in
the refereed literature. Researchers then must present their results at professional
meetings and publish them in refereed journals. A key aspect of science is obtaining
feedback from other scientists. It is difficult to adequately accomplish this goal
without publishing in scientific journals. Remember, if a research project was worth
conducting in the first place, the results are worth publishing in a refereed outlet.
We hasten to add that field research studies, in particular, sometimes do not work
out as planned. This fact does not necessarily imply that the researcher did not learn
Natural resource management agencies often implement field studies to collect data
needed for management decision making (often required to do so by statute) rather
than to test hypotheses or evaluate conceptual models. For example, agencies may
Thus far, we primarily have addressed natural science generally. Here we attempt
to place wildlife science more specifically within the context of the philosophy of
natural science. One way wildlife scientists have contextualized their discipline is
by comparing what is actually done to what they consider to be ideal based on the
philosophy of science. As we have seen in Sect. 1.2, however, the ideal was some-
what a moving target during the twentieth century. Additionally, the understandable
tendency of wildlife scientists to cite one another's second-, third-, or fourth-hand
summaries of Popper's or Kuhn's ideas, for example, rather than read these philoso-
phers' writings themselves, further clouded this target. For this reason, many pub-
lications citing Popper or Kuhn do not accurately represent these authors’ ideas.
Here we discuss a few critiques of science by scientists that influenced how
researchers conduct wildlife science. We then attempt to contextualize where wild-
life science falls today within the philosophy of natural science.
biology and high-energy physics, made much more rapid scientific progress and
significant breakthroughs than did those working in other natural sciences because
they utilized an approach he called strong inference. Platt maintained that strong
inference was nothing more than an updated version of Bacon’s method of induc-
tive inference. Specifically, he argued, researchers should (1) inductively develop
multiple alternative hypotheses (after Chamberlin 1890), (2) deduce from these a
critical series of outcomes for each hypothesis, then devise a crucial experiment or
series of experiments that could lead to the elimination of one or more of the
hypotheses (after Popper 1959, 1962), (3) obtain decisive results through experi-
mentation, and (4) recycle the procedure to eliminate subsidiary hypotheses. He
also argued that these New Baconians used logic trees to work out what sort of
hypotheses and questions they should address next. He provided numerous exam-
ples of extraordinarily productive scientists whom he felt had used this approach.
As Platt concluded (1964, p. 352)
The man to watch, the man to put your money on, is not the man who wants to make “a
survey” or a “more detailed study” but the man with the notebook, the man with the alter-
native hypotheses and the crucial experiments, the man who knows how to answer your
Question of disproof and is already working on it.
Rowland H. Davis (2006) maintained that while Platt’s (1964) essay was influential
in an array of natural and social sciences, it probably had its greatest impact in ecol-
ogy. One reason was that in 1983, the American Naturalist prepared a dedicated
issue titled “A Round Table on Research in Ecology and Evolutionary Biology” that
included some of the most highly cited theoretical papers in ecology to that date.
Some of these authors directly suggested that researchers use Platt’s method of
strong inference to address their theoretical questions (Quinn and Dunham 1983;
Simberloff 1983) and others made similar suggestions somewhat less directly
(Roughgarden 1983; Salt 1983; Strong 1983). Several other essays invoking aspects
of Platt’s approach also appeared in ecology and evolutionary biology outlets dur-
ing the 1980s (e.g., Romesburg 1981; Atkinson 1985; Loehle 1987; Wenner 1989).
There is little doubt that wildlife ecology and conservation researchers were
inspired directly or indirectly to improve the sophistication of their study designs
by Platt’s essay.
Strong inference (Platt 1964) was not without problems, however, including
some that were quite serious. Only one year after its publication, a physicist and a
historian (Hafner and Presswood 1965) demonstrated in Science that historical evi-
dence did not support the contention that strong inference had been used in the
high-energy physics examples that Platt provided. Instead, they maintained “… that
strong inference is an idealized scheme to which scientific developments seldom
conform” (p. 503). More recently, two psychologists concluded that (1) Platt failed
to demonstrate that strong inference was used more frequently in rapidly versus
slowly progressing sciences, (2) Platt’s historiography was fatally flawed, and (3)
numerous other scientific approaches had been used as or more successfully than
strong inference (O’Donohue and Buchanan 2001). Davis (2006, p. 247) concluded
that “… the strongest critiques of his [Platt’s] recommendations were entirely
justified.”
One might logically ask why Platt’s essay was so influential, given its many
shortcomings. The answer, as Davis (2006, p. 238) put it, is “that the article was
more an inspirational tract than the development of a formal scientific methodol-
ogy.” It was effective because it “imparted to many natural and social scientists an
ambition to test hypotheses rather than to prove them” (p. 244). Davis concluded
that the value of “Strong Inference” was that it “encouraged better ideas, better
choices of research problems, better model systems, and thus better science overall,
even in the fields relatively resistant to the rigors of strong inference” (p. 248). This
undoubtedly was true for wildlife science.
Numerous influential essays more directly targeting how wildlife scientists
should conduct research also appeared during the last few decades. For example,
H. Charles Romesburg (1981) pointed out that wildlife scientists had used induction
to generate numerous rules of association among classes of facts, and had retroduc-
tively developed many intriguing hypotheses. Unfortunately, he argued, these
“research hypotheses either are forgotten, or they gain credence and the status of
laws through rhetoric, taste, authority, and verbal repetition” (p. 295). He recom-
mended that wildlife science attempt to falsify retroductively derived research
hypotheses more often using the hypothetico–deductive approach to science cham-
pioned by Popper (1959, 1962) and discussed by Platt (1964). Similarly, Stuart H.
Hurlbert (1984) maintained that far too many ecological researchers, when attempt-
ing to implement the hypothetico–deductive method using replicated field experi-
ments, actually employed pseudoreplicated designs (see Sect. 2.2 for details).
Because of these design flaws, he argued, researchers were much more likely to
find differences between treatments and controls than actually existed.
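A small simulation can illustrate Hurlbert's point. The sketch below is a deliberately simplified model with one treatment unit and one control unit, each carrying its own plot-level random effect, and with illustrative parameter values of our choosing. Subsamples within each unit are (wrongly) treated as independent replicates, and we count how often a spurious "treatment effect" appears even though no true effect exists:

```python
import random
import statistics

random.seed(1)

def one_trial(n_sub=20, plot_sd=1.0, sub_sd=0.5):
    """One 'experiment' with a single treatment plot and a single
    control plot, and NO true treatment effect. Each plot has its own
    random plot-level effect; the n_sub subsamples within each plot
    are (wrongly) treated as independent replicates."""
    t_plot = random.gauss(0, plot_sd)
    c_plot = random.gauss(0, plot_sd)
    t = [t_plot + random.gauss(0, sub_sd) for _ in range(n_sub)]
    c = [c_plot + random.gauss(0, sub_sd) for _ in range(n_sub)]
    diff = statistics.mean(t) - statistics.mean(c)
    naive_se = (statistics.variance(t) / n_sub
                + statistics.variance(c) / n_sub) ** 0.5
    return abs(diff) > 2 * naive_se  # spurious "significant" effect?

false_positive_rate = sum(one_trial() for _ in range(2000)) / 2000
print(f"{false_positive_rate:.2f}")  # far above the nominal 0.05
```

The naive standard error reflects only subsample-to-subsample noise, not the plot-level variation that actually dominates the difference between the two units, which is why the false-positive rate is so badly inflated.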
One of the difficulties faced by wildlife science and ecology is that ecological
systems typically are middle-number systems: systems with too many parts for a
complete individual accounting (census), but too few for those parts to be replaced
by averages (an approach used successfully in high-energy physics) without
yielding fuzzy results (Allen and Starr 1982; O'Neill et al. 1986).
For this reason, wildlife scientists often rely on statistical approaches or modeling
to make sense of these problematic data. Thence, the plethora of criticisms regard-
ing how wildlife scientists evaluate data should come as no surprise. For example,
Romesburg (1981) argued that wildlife scientists had a “fixation on statistical meth-
ods” and noted that “scientific studies that lacked thought … were dressed in quan-
titative trappings as compensation" (p. 307). Robert K. Swihart and Norman A. Slade
(1985) argued that sequential locations of radiotelemetered animals often lacked
statistical independence and that many researchers evaluated such data inappropri-
ately. Douglas H. Johnson (1995) maintained that ecologists were too easily swayed
by the allure of nonparametric statistics and used these tools when others were
more appropriate. Patrick D. Gerard and others (1998) held that wildlife scientists
should not use retrospective power analysis in the manner The Journal of Wildlife
Management editors had insisted they should (The Wildlife Society 1995). Steve
Cherry (1998), Johnson (1999), and David R. Anderson and others (2000) main-
tained that null hypothesis significance testing was typically used inappropriately
in wildlife science and related fields, resulting in far too many p-values in refereed
journals (see Sect. 2.5.2 for details). Anderson (2001, 2003) also made a compel-
ling argument that wildlife field studies relied far too much on (1) convenience
sampling and (2) index values. Since one leads to the other and neither is based
on probabilistic sampling designs, there is no valid way to make inference to the
population of interest or assess the precision of these parameter estimates. Finally,
Fred S. Guthery and others (2001, 2005) argued that wildlife scientists still ritualis-
tically applied statistical methods and that this tended to transmute means (statisti-
cal tools) into ends. They also echoed the view of previous critics (e.g., Romesburg
1981; Johnson 1999) that wildlife scientists should give scientific hypotheses and
research hypotheses a much more prominent place in their research programs,
while deemphasizing statistical hypotheses and other statistical tools because they
are just that – tools. These and similar critiques will be dealt with in more detail in
subsequent chapters. The take home message is that wildlife science is still strug-
gling to figure out how best to conduct its science, and where to position itself
within the firmament of the natural sciences.
Even if we ignore the serious deficiencies Kuhn, Lakatos, Feyerabend, and other
philosophers found in Popper’s model of science (see Sect. 1.2.3.1), and argue that
hypothesis falsification defines science, there is still a major disconnect between
this ideal and what respected wildlife researchers actually do. For example,
although most wildlife scientists extol the hypothetico–deductive model of science,
Fig. 1.1 represents common study designs actually employed by wildlife researchers.
Fig. 1.1 The potential for various wildlife study designs to produce conclusions with high cer-
tainty (few plausible alternative hypotheses) and widespread applicability (diversity of popula-
tions where inferences apply). Reproduced from Garton et al. (2005), with kind permission from
The Wildlife Society
Only a few of these can be construed as clearly Popperian. This does not imply the
remaining designs are not useful; in fact, much of the remainder of this book deals
with how to implement these and related designs. Rather, although Popper's postposi-
tivist model of science is sometimes useful, it is often insufficient for the scope of
wildlife science.
Is there a philosophical model of science that better encompasses what wildlife
researchers do? Yes, there probably are several. For example, Lakatos’ (1970)
attempt to reconcile Popper's (1959, 1962) and Kuhn's (1962) representations of sci-
ence resulted in what we suspect many wildlife scientists assume to be Popper's
model of science. Lakatos' formulation still gives falsification its due, but it also
makes a place for historically obvious paradigm shifts and addresses other potential
deficiencies in Popper's model (see Sect. 1.2.3.1 for details). Lakatos' model, how-
ever, still cannot encompass the array of designs represented in Fig. 1.1 (and dis-
cussed in subsequent chapters). Although Feyerabend’s (1975, 1978) “anything
goes” approach to science certainly can cover any contingency, it offers wildlife
scientists little philosophical guidance.
Haack (2003) developed a model of scientific evidence that offers a unified
philosophical foundation for natural science. Further, Fig. 1.1 makes perfect
sense in light of this model. Essentially, she argues that, from an epistemological
perspective (see Sect. 1.2.3.1), natural science is a pragmatic undertaking. Her
retro-classical version of American pragmatism places science firmly within the
empiricist sphere of epistemology as well, given the central role of experience.
She developed an apt analogy, beginning in the early 1990s (Haack 1990, 1993),
which should help contextualize her model. Haack (2003) maintained that natural
science research programs are conducted in much the way one completes a cross-
word puzzle, with warranted scientific claims anchored by experiential evidence
(analogous to clues) and enmeshed in reasons (analogous to the matrix of com-
pleted entries). As she put it
How reasonable a crossword entry is depends not only on how well it fits with the clue and
any already-completed intersecting entries, but also on how plausible those other entries
are, independent of the entry in question, and how much of the crossword has been com-
pleted. Analogously, the degree of warrant of a [scientific] claim for a person at a time
depends not only on how supportive his evidence is, but also on how comprehensive it is,
and how secure his reasons are, independent of the claim itself. (p. 67)
(p. 107). Further, just as someone completing a crossword puzzle might make inap-
propriate entries, and be forced to rethink their approach, scientists are fallible as
well. In fact, learning from mistaken results, concepts, or theories, and having to
begin certain aspects of a research program repeatedly, seems to characterize natu-
ral science (Hafner and Presswood 1965; Haack 2003).
We hasten to point out that others noted the puzzle-like nature of natural science
prior to Haack (1990, 1993, 2003). For example, Albert Einstein (1879–1955;
1936, pp. 353–354) wrote that
The liberty of choice [of scientific concepts and theories], however, is of a special kind; it
is not in any way similar to the liberty of a writer of fiction. Rather, it is similar to that of
a man engaged in solving a well-designed word puzzle. He may, it is true, propose any
word as the solution; but, there is only one word which really solves the puzzle in all its
forms. It is an outcome of faith that nature – as she is perceptible to our five senses – takes
the character of such a well-formulated puzzle. The successes reaped up to now by science
do, it is true, give a certain encouragement to this faith.
The target population of a wildlife study could include a broad array of biological
entities. It is important to be clear and specific in defining what that entity is. It is
just as important to identify this entity before the study begins as it is to explain it in
a manuscript or report at the end, because the sampled population, and thus the
sample, stems directly from the target population. If some part of the target popula-
tion is ignored when setting up the study, there will be no chance of sampling that
portion; statistical inference to the entire population of interest therefore cannot be
drawn appropriately, and any inference to the target population becomes strictly a
matter of professional judgment.
If a target population is well defined, and the desirable situation where the sam-
pled population matches the target population is achieved, then the statistical infer-
ence will be valid, regardless of whether the target matches an orthodox definition
of a biological grouping in wildlife science. Nevertheless, we believe that reviewing
general definitions of biological groupings will assist the reader in thinking about
the target population he or she would like to study.
In ecology, a population is a group of individuals of one species in an area at a
given time (Begon et al. 2006, p. 94). We assume these individuals have the poten-
tial to breed with one another, implying there is some chance they will encounter
one another. Dale R. McCullough (1996, p. 1) describes the distinction between
panmictic populations, where the interactions between individuals (including
potential mating opportunities) are relatively continuous throughout the space
occupied by the population, and metapopulations. A metapopulation (Levins 1969,
1970) is a population subdivided into segments occupying patches of habitat in a
fragmented landscape. An environment hostile to the species of interest separates
these patches. Movement, and presumably gene flow, between patches is inhibited,
but still occurs. Metapopulations provide a good example of where the design of a
population study could go awry. Suppose a metapopulation consists of sources and
sinks (Pulliam 1988), where the species remains on all patches and the metapopula-
tion is stable, but those that are sinks have low productivity and must rely on
dispersal from the sources to avoid local extinction. If an investigator considers
individuals on just one patch to constitute the entire population, then a demographic
study of this subpopulation could be misleading, as it could not address subpopulations
on other patches. By considering only natality and survival of this subpopulation,
the investigator might conclude that the population will either grow exponentially
(if a source) or decline to extinction (if a sink).
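The source-sink reasoning above can be sketched numerically. The growth rates, dispersal fraction, and starting sizes below are purely illustrative, not estimates for any real species:

```python
def project(years=50, disperse=True):
    """Two-patch source-sink sketch: the source grows (lambda = 1.2)
    and the sink declines (lambda = 0.8). With dispersal, 10% of the
    source population moves to the sink each year. All numbers are
    illustrative."""
    source, sink = 100.0, 100.0
    for _ in range(years):
        source *= 1.2
        sink *= 0.8
        if disperse:
            movers = 0.1 * source
            source -= movers
            sink += movers
    return source, sink

# Viewed alone, the sink subpopulation appears doomed (lambda < 1),
# yet with dispersal from the source it persists:
print(project(disperse=False)[1] < 1.0)   # True: sink alone goes extinct
print(project(disperse=True)[1] > 100.0)  # True: sink persists with dispersal
```

A demographic study confined to the sink patch would measure lambda = 0.8 and predict extinction, which is precisely the misleading conclusion described above.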
John Macnab (1985) argued that wildlife science was plagued with “slippery
shibboleths,” or code words having different meanings for individuals or subgroups
within the field. “Significance” is as slippery as any shibboleth in wildlife science.
We typically use this term in one of three ways: biological, statistical, or social sig-
nificance. All too often, authors either do not specify what they mean when they
say something is significant, or appear to imply that simply because results are (or
are not) statistically significant, they must also be (or not be) significant biologi-
cally and/or socially.
lation, community, and habitat. The statistical concepts will be applied to the
biological ones (i.e., the set of experimental or sampling units will be identified),
based on the objectives of the study. We can divide wildlife studies into those
whose objectives focus on groupings of animals and those whose objectives focus
on the habitat of the animals.
We can further divide studies of animals into those that focus on measuring
something about the individual animal (e.g., sex, mass, breeding status) and those
that focus on how many animals there are. Consider a study of a population of cot-
ton rats (Sigmodon hispidus) in an old field where there are two measures of inter-
est: the size of the population and its sex ratio. The sampling units would be
individual rats and the target population would include all the rats in the field
(assume the field is isolated enough that this is not part of a metapopulation). If
capture probabilities of each sex are the same (perhaps a big assumption), then by
placing a set of traps throughout the field one could trap a representative sample
and estimate the sex ratio. If the traps are distributed probabilistically, the sampled
population would match the target population (and in this case the target population
would coincide with a biological population) and therefore the estimated sex ratio
should be representative of the population sex ratio.
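A short simulation can illustrate why the equal-capture-probability assumption matters for the sex ratio estimate. All numbers below (population size, true sex ratio, capture probabilities) are hypothetical:

```python
import random
import statistics

random.seed(42)

def estimated_sex_ratio(n_rats=1000, true_female_frac=0.5,
                        p_female=0.3, p_male=0.3):
    """Simulate one trapping session: each rat is captured with a
    sex-specific probability, and the proportion of females is
    estimated from the captured animals only."""
    n_f = round(n_rats * true_female_frac)
    caught_f = sum(random.random() < p_female for _ in range(n_f))
    caught_m = sum(random.random() < p_male for _ in range(n_rats - n_f))
    return caught_f / (caught_f + caught_m)

# Equal capture probabilities: the estimate centers on the true 0.5.
equal = statistics.mean(estimated_sex_ratio() for _ in range(200))
# Unequal capture probabilities: the estimate is biased (here toward
# males, who are twice as trappable in this hypothetical scenario).
unequal = statistics.mean(
    estimated_sex_ratio(p_female=0.2, p_male=0.4) for _ in range(200))
print(f"{equal:.2f} {unequal:.2f}")
```

Probabilistic trap placement ensures every rat has some chance of capture, but as the second scenario shows, it cannot by itself remove bias caused by sex-specific trappability.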
The estimation of abundance is an atypical sampling problem. Instead of meas-
uring something about the sampling units, the objective is to estimate the total
number of units in the target population. Without a census, multiple samples and
capture–recapture statistical methodology are required to achieve an unbiased esti-
mate of the population size (see Sect. 2.5.4.). If traps are left in the same location
for each sample, it is important that there be enough traps so that each rat has some
chance of being captured during each trapping interval.
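As one concrete example of two-sample capture-recapture methodology, the sketch below uses Chapman's bias-corrected form of the Lincoln-Petersen estimator; this is a standard textbook choice offered for illustration, not necessarily the specific method the authors have in mind, and the trapping numbers are hypothetical:

```python
def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected form of the Lincoln-Petersen
    estimator: n1 animals marked in the first sample, n2 caught in
    the second sample, m2 of which were already marked."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# Hypothetical numbers: 50 rats marked and released, second sample
# of 60 containing 20 recaptures.
print(round(chapman_estimate(50, 60, 20), 1))  # 147.1
```

Intuitively, the smaller the fraction of marked animals in the second sample, the larger the implied population; more recaptures for the same sample sizes yield a smaller estimate.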
Estimates of abundance are not limited to the number of individual animals
in a population. The estimation of species richness involves the same design
considerations. Again, in the absence of a census of the species in a community
(i.e., probability of detecting at least one individual of each species is 1.0),
designs that allow the use of capture–recapture statistical methodologies might
be most appropriate (see reviews by Nichols and Conroy 1996; Nichols et al.
1998a,b; Williams et al. 2002). In this case, the target population is the set of all
the species in a community. We discuss accounting for detectability more fully
in Sect. 2.4.1.
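As one example of how incomplete detection can be handled when estimating species richness, the Chao1 estimator below is a common nonparametric lower bound; it is offered purely for illustration (the reviews cited above cover the capture-recapture approaches in depth), and the capture counts are hypothetical:

```python
from collections import Counter

def chao1(species_counts):
    """Chao1 lower-bound estimate of species richness from a list of
    per-species capture counts, using the bias-corrected form based
    on singletons (f1) and doubletons (f2)."""
    s_obs = len(species_counts)
    freq = Counter(species_counts)
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Seven species detected; three seen only once, two seen twice:
print(chao1([1, 1, 1, 2, 2, 3, 5]))  # 8.0
```

The logic is that many rarely detected species (singletons) suggest other species were missed entirely, so the estimate rises above the observed count.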
If wildlife is of ultimate interest, but the proximal source of interest is something
associated with the ecosystem of which wildlife is a part, then the target population
could be vegetation or some other aspect of the animals’ habitat (e.g., Morrison et al.
2006). For example, if the objective of the study is to measure the impact of deer
browsing on a given plant in a national park, the target population is not the deer,
but the collection of certain plants within the park. The researcher could separate
the range of the plant into experimental units consisting of plots; some plots could
be left alone but monitored, whereas exclosures could be built around others to
prevent deer from browsing. In this way, the researcher could determine the impact
of the deer on this food plant by comparing plant measurements on plots with
exclosures versus plots without exclosures.
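The exclosure comparison could be evaluated, for example, with a simple permutation test on plot-level means. The plant measurements below are hypothetical, and a permutation test is only one of several reasonable analyses for such a design:

```python
import random
import statistics

random.seed(0)

def permutation_p(group_a, group_b, n_perm=5000):
    """Two-sided permutation test for a difference in means between
    two sets of plot-level measurements."""
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        d = abs(statistics.mean(pooled[:k]) - statistics.mean(pooled[k:]))
        if d >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical mean stem heights (cm) on five exclosure plots and
# five open (browsed) plots:
p = permutation_p([42, 39, 45, 41, 44], [31, 35, 29, 33, 30])
print(p < 0.05)  # True: difference unlikely due to chance alone
```

Note that the plot, not the individual plant, is the experimental unit here; analyzing individual plants within plots as replicates would reintroduce the pseudoreplication problem discussed earlier in the chapter.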
1.6 Summary
Because wildlife scientists conduct research in the pursuit of knowledge, they must
understand what knowledge is and how it is acquired. We began Sect. 1.2 using “the
science wars” to highlight how different ontological, epistemological, and axiologi-
cal perspectives can lead to clashes grounded in fundamentally different philo-
sophical perspectives. This example also illustrates practical reasons why wildlife
scientists should become familiar with philosophy as it relates to natural science.
Differing perspectives on the nature of reality (ontology) explain part of this clash
of ideas. Most scientists, grounded in the empiricist tradition, hold that reality inde-
pendent of human thought and culture indeed exists. Conversely, many social sci-
entists and humanists argue that reality ultimately is socially constructed because it
is to some degree contingent upon human percepts and social interactions. Several
major perspectives toward the nature and scope of knowledge (epistemology) have
developed in Western philosophy. Influential approaches to knowledge acquisition
include empiricism, rationalism, pragmatism, logical positivism, postpositivism,
and social constructionism. Regardless of the epistemological perspective one
employs, however, logical thought, including inductive, deductive, and retroductive
reasoning (Table 1.1), remains an integral component of knowledge acquisition. At
least three aspects of value or quality (axiology) influence natural science. Ethical
behavior by scientists supports the integrity of the scientific enterprise, researchers
bring their own values into the scientific process, and both scientists and society
must determine the value and quality of scientific research.
As Sect. 1.2 illustrates, there is no single philosophy of science, and so there can
be no single method of science either. Regardless, natural science serves as a model
of human ingenuity. In Sect. 1.3, we addressed why natural science has proven such
a successful enterprise. Much of the reason relates to general steps commonly
employed (Table 1.2). These include (1) becoming familiar with the system of
interest, the question being addressed, and the related scientific literature, (2) con-
structing meaningful research hypotheses and/or conceptual models relating to the-
ory and objectives, (3) developing an appropriate study design and executing the
design and analyzing the data appropriately, (4) obtaining feedback from other
scientists at various stages in the process, such as through publication in refereed
outlets, and (5) closing the circle of science by going back to steps 3, 2, or 1 as
needed. Often, because of the complex nature of scientific research, multiple
researchers using a variety of methods address different aspects of the same general
research program. Impact assessment, inventorying, and monitoring studies provide
important data for decision making by natural resource policy makers and manag-
ers. The results of well-designed impact and survey studies often are suitable for
publication in refereed outlets, and other researchers can use these data in conjunc-
tion with data collected during similar studies to address questions beyond the
scope of a single study.
In Sect. 1.4, we discussed how wildlife scientists have honed their approaches to
research by studying influential critiques written by other natural scientists (e.g.,
Platt 1964; Romesburg 1981; Hurlbert 1984). Because ecological systems contain
too many components for a complete individual accounting (census), yet too few
for simple averages to stand in for those components, wildlife scientists typically
rely on statistical approaches or modeling to make sense of data. For this reason, numerous
critiques specifically addressing how wildlife scientists handle and mishandle data
analysis were published in recent decades. These publications continue to shape
and reshape how studies are designed, data analyzed, and publications written.
As Fig. 1.1 illustrates, wildlife science commonly employs a number of study
designs that do not follow Popper’s (1959, 1962) falsification approach to science.
Epistemologically, wildlife science probably is better described by Haack’s (2003)
pragmatic model of natural science, where research programs are conducted in
much the same way one completes a crossword puzzle, with warranted scientific
claims anchored by experiential evidence (analogous to clues) and enmeshed in
reasons (analogous to the matrix of completed entries). This pragmatic model
permits any study design that can provide reliable solutions to the scientific puzzle,
including various types of descriptive research, impact assessment, information–
theoretic approaches using model selection, replicated manipulative experiments
attempting to falsify retroductively derived research hypotheses, and qualitative
designs, to name just a few. Under this pragmatic epistemology, truth, knowledge,
and theory are inextricably connected with practical consequences, or real effects.
We ended the chapter by clarifying what it is that wildlife scientists study (Sect.
1.5). We did so by defining a number of statistical, biological, and social terms.
This is important because the same English word can describe different entities in each
of these three domains (e.g., significance). We hope that these common definitions
will make it easier for readers to navigate among chapters. Similarly, this chapter
serves as a primer on the philosophy and nature of natural science that should help
contextualize the more technical chapters that follow.
References
Allen, T. F. H., and T. B. Starr. 1982. Hierarchy: Perspectives for Ecological Complexity.
University of Chicago Press, Chicago, IL.
Anderson, D. R. 2001. The need to get the basics right in wildlife field studies. Wildl. Soc. Bull.
29: 1294–1297.
Anderson, D. R. 2003. Response to Engeman: index values rarely constitute reliable information.
Wildl. Soc. Bull. 31: 288–291.
Anderson, D. R., K. P. Burnham, and W. L. Thompson. 2000. Null hypothesis testing: problems,
prevalence, and an alternative. J. Wildl. Manag. 64: 912–923.
Arnqvist, G., and D. Wooster. 1995. Meta-analysis: synthesizing research findings in ecology and
evolution. Trends Ecol. Evol. 10: 236–240.
Ashman, K. M., and P. S. Barringer, Eds. 2001. After the Science Wars. Routledge, London.
Atkinson, J. W. 1985. Models and myths of science: views of the elephant. Am. Zool. 25:
727–736.
Begon, M., C. R. Townsend, and J. L. Harper. 2006. Ecology: From Individuals to Ecosystems,
4th Edition. Blackwell, Malden, MA.
Berger, P. L., and T. Luckmann. 1966. The Social Construction of Reality: A Treatise in the
Sociology of Knowledge. Doubleday, Garden City, NY.
Chamberlin, T. C. 1890. The method of multiple working hypotheses. Science 15: 92–96.
Cherry, S. 1998. Statistical tests in publications of The Wildlife Society. Wildl. Soc. Bull. 26:
947–953.
Committee on Science, Engineering, and Public Policy. 1995. On Being a Scientist: Responsible
Conduct in Research, 2nd Edition. National Academy Press, Washington, D.C.
Committee on the Conduct of Science. 1989. On being a scientist. Proc. Natl. Acad. Sci. USA 86:
9053–9074.
Costanza, R., R. d’Arge, R. de Groot, S. Farber, M. Grasso, B. Hannon, K. Limburg, S. Naeem,
R. V. Oneill, J. Paruelo, R. G. Raskin, P. Sutton, and M. van den Belt. 1997. The value of the
world’s ecosystem services and natural capital. Nature 387: 253–260.
Daniels, S. E., and G. B. Walker. 2001. Working Through Environmental Conflict: The
Collaborative Learning Approach. Praeger, Westport, CT.
Davis, R. H. 2006. Strong inference: rationale or inspiration? Perspect. Biol. Med. 49: 238–249.
Denzin, N. K., and Y. S. Lincoln, Eds. 2005. The Sage Handbook of Qualitative Research, 3rd
Edition. Sage Publications, Thousand Oaks, CA.
Depoe, S. P., J. W. Delicath, and M.-F. A. Elsenbeer, Eds. 2004. Communication and Public
Participation in Environmental Decision Making. State University of New York Press, Albany,
NY.
Diamond, J. M. 1972. Biogeographic kinetics: estimation of relaxation times for avifaunas of
southwest Pacific islands. Proc. Natl. Acad. Sci. USA 69: 3199–3203.
Diamond, J. M. 1975. The island dilemma: lessons of modern biogeographic studies for the design
of nature reserves. Biol. Conserv. 7: 129–146.
Diamond, J. M. 1976. Island biogeography and conservation: strategy and limitations. Science
193: 1027–1029.
Dillman, D. A. 2007. Mail and Internet Surveys: The Tailored Design Method, 2nd Edition. Wiley,
Hoboken, NJ.
Einstein, A. 1936. Physics and reality. J. Franklin Inst. 221: 349–382.
Feyerabend, P. 1975. Against Method: Outline of an Anarchistic Theory of Knowledge. NLB,
London.
Feyerabend, P. 1978. Science in a Free Society. NLB, London.
Ford, E. D. 2000. Scientific Method for Ecological Research. Cambridge University Press,
Cambridge.
Garton, E. O., J. T. Ratti, and J. H. Giudice. 2005. Research and experimental design, in C. E.
Braun, Ed. Techniques for Wildlife Investigations and Management, 6th Edition, pp. 43–71.
The Wildlife Society, Bethesda, MD.
Gerard, P. D., D. R. Smith, and G. Weerakkody. 1998. Limits of retrospective power analysis. J.
Wildl. Manag. 62: 801–807.
Gettier, E. L. 1963. Is justified true belief knowledge? Analysis 23: 121–123.
Gross, P. R., and N. Levitt. 1994. Higher Superstition: The Academic Left and its Quarrels With
Science. Johns Hopkins University Press, Baltimore, MD.
Gross, P. R., N. Levitt, and M. W. Lewis, Eds. 1997. The Flight From Science and Reason. New
York Academy of Sciences, New York, NY.
Gurevitch, J. A., and L. V. Hedges. 2001. Meta-analysis: combining the results of independent
experiments, in S. M. Scheiner, and J. A. Gurevitch, Eds. Design and Analysis of Ecological
Experiments, 2nd edition, pp. 347–369. Oxford University Press, Oxford.
Guthery, F. S. 2004. The flavors and colors of facts in wildlife science. Wildl. Soc. Bull. 32:
288–297.
Guthery, F. S., J. J. Lusk, and M. J. Peterson. 2001. The fall of the null hypothesis: liabilities and
opportunities. J. Wildl. Manag. 65: 379–384.
Guthery, F. S., J. J. Lusk, and M. J. Peterson. 2004. Hypotheses in wildlife science. Wildl. Soc.
Bull. 32: 1325–1332.
Guthery, F. S., L. A. Brennan, M. J. Peterson, and J. J. Lusk. 2005. Information theory in wildlife
science: critique and viewpoint. J. Wildl. Manag. 69: 457–465.
Haack, S. 1990. Rebuilding the ship while sailing on the water, in R. B. Gibson, and R. F. Barrett,
Eds. Perspectives on Quine, pp. 111–128. Blackwell, Oxford.
Haack, S. 1993. Evidence and Inquiry: Towards Reconstruction in Epistemology. Blackwell,
Oxford.
Haack, S. 2003. Defending Science – Within Reason: Between Scientism and Cynicism.
Prometheus Books, Amherst, NY.
Haack, S. 2006. Introduction: pragmatism, old and new, in S. Haack, and R. Lane, Eds.
Pragmatism, Old and New: Selected Writings, pp. 15–67. Prometheus Books, Amherst, NY.
Haack, S., and R. Lane, Eds. 2006. Pragmatism, Old and New: Selected Writings. Prometheus
Books, Amherst, NY.
Hacking, I. 1999. The Social Construction of What? Harvard University Press, Cambridge, MA.
Hafner, E. M., and S. Presswood. 1965. Strong inference and weak interactions. Science 149:
503–510.
Hall, L. S., P. R. Krausman, and M. L. Morrison. 1997. The habitat concept and a plea for standard
terminology. Wildl. Soc. Bull. 25: 173–182.
Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecol.
Monogr. 54: 187–211.
Jackson, J. B. C., M. X. Kirby, W. H. Berger, K. A. Bjorndal, L. W. Botsford, B. J. Bourque, R.
H. Bradbury, R. Cooke, J. Erlandson, J. A. Estes, T. P. Hughes, S. Kidwell, C. B. Lange, H. S.
Lenihan, J. M. Pandolfi, C. H. Peterson, R. S. Steneck, M. J. Tegner, and R. R. Warner. 2001.
Historical overfishing and the recent collapse of coastal ecosystems. Science 293: 629–638.
James, W. 1907. Pragmatism, a new name for some old ways of thinking: popular lectures on phi-
losophy. Longmans, Green, New York, NY.
James, W. 1912. Essays in Radical Empiricism. Longmans, Green, New York, NY.
Jasinoff, S., G. E. Markle, J. C. Petersen, and T. Pinch, Eds. 1995. Handbook of Science and
Technology Studies. Sage Publications, Thousand Oaks, CA.
Johnson, D. H. 1995. Statistical sirens: the allure of nonparametrics. Ecology 76: 1998–2000.
Johnson, D. H. 1999. The insignificance of statistical significance testing. J. Wildl. Manag. 63:
763–772.
Johnson, D. H. 2002. The importance of replication in wildlife research. J. Wildl. Manag. 66:
919–932.
Kennedy, D. 2006. Editorial retraction. Science 311: 335.
Kitcher, P. 2001. Science, Truth, and Democracy. Oxford University Press, New York, NY.
Koertge, N., Ed. 1998. A House Built on Sand: Exposing Postmodernist Myths About Science.
Oxford University Press, New York, NY.
Kuhn, T. S. 1962. The Structure of Scientific Revolutions. University of Chicago Press, Chicago,
IL.
Lakatos, I. 1970. Falsification and the methodology of scientific research programmes, in
I. Lakatos, and A. Musgrave, Eds. Criticism and the Growth of Knowledge, pp. 91–196.
Cambridge University Press, Cambridge.
Lakatos, I., and P. Feyerabend. 1999. For and Against Method: Including Lakatos’s Lectures on
Scientific Method and the Lakatos–Feyerabend Correspondence. M. Motterlini, Ed. University
of Chicago Press, Chicago, IL.
Lakatos, I., and A. Musgrave, Eds. 1970. Criticism and the Growth of Knowledge. Cambridge
University Press, London.
Latour, B. 1993. We Have Never Been Modern. C. Porter, translator. Harvard University Press,
Cambridge, MA.
Levins, R. 1969. Some demographic and genetic consequences of environmental heterogeneity for
biological control. Bull. Entomol. Soc. Am. 15: 237–240.
Levins, R. 1970. Extinction, in M. Gerstenhaber, Ed. Some Mathematical Questions in Biology,
pp. 77–107. American Mathematical Society, Providence, RI.
Lincoln, Y. S., and E. G. Guba. 1985. Naturalistic Inquiry. Sage Publications, Newbury Park,
CA.
Loehle, C. 1987. Hypothesis testing in ecology: psychological aspects and the importance of the-
ory maturation. Q Rev Biol 62: 397–409.
MacArthur, R. H. 1972. Geographical Ecology: Patterns in the Distribution of Species. Harper and
Row, New York, NY.
MacArthur, R. H., and E. O. Wilson. 1967. The Theory of Island Biogeography. Princeton
University Press, Princeton, NJ.
Macnab, J. 1985. Carrying capacity and related slippery shibboleths. Wildl. Soc. Bull. 13:
403–410.
McCullough, D. R. 1996. Introduction, in D. R. McCullough, Ed. Metapopulations and Wildlife
Conservation, pp. 1–10. Island Press, Washington, D.C.
Menzies, T. 1996. Applications of abduction: knowledge-level modelling. Int. J. Hum. Comput.
Stud. 45: 305–335.
Morrison, M. L., B. G. Marcot, and R. W. Mannan. 2006. Wildlife–Habitat Relationships: Concepts
and Applications, 3rd Edition. Island Press, Washington, D.C.
Myers, N., R. A. Mittermeier, C. G. Mittermeier, G. A. B. da Fonseca, and J. Kent. 2000.
Biodiversity hotspots for conservation priorities. Nature 403: 853–858.
Nichols, J. D., and M. J. Conroy. 1996. Estimation of species richness, in D. E. Wilson, F. R. Cole,
J. D. Nichols, R. Rudran, and M. Foster, Eds. Measuring and Monitoring Biological Diversity:
Standard Methods for Mammals, pp. 226–234. Smithsonian Institution Press, Washington,
D.C.
Nichols, J. D., T. Boulinier, J. E. Hines, K. H. Pollock, and J. R. Sauer. 1998a. Estimating rates of
local species extinction, colonization, and turnover in animal communities. Ecol. Appl. 8:
1213–1225.
Nichols, J. D., T. Boulinier, J. E. Hines, K. H. Pollock, and J. R. Sauer. 1998b. Inference methods
for spatial variation in species richness and community composition when not all species are
detected. Conserv. Biol. 12: 1390–1398.
O’Donohue, W., and J. A. Buchanan. 2001. The weaknesses of strong inference. Behav. Philos.
29: 1–20.
O’Neill, R. V., D. L. DeAngelis, J. B. Waide, and T. F. H. Allen. 1986. A Hierarchical Concept of
Ecosystems. Princeton University Press, Princeton, NJ.
Osenberg, C. W., O. Sarnelle, and D. E. Goldberg. 1999. Meta-analysis in ecology: concepts, sta-
tistics, and applications. Ecology 80: 1103–1104.
Peterson, M. N., T. R. Peterson, M. J. Peterson, R. R. Lopez, and N. J. Silvy. 2002. Cultural con-
flict and the endangered Florida Key deer. J. Wildl. Manag. 66: 947–968.
Peterson, M. N., S. A. Allison, M. J. Peterson, T. R. Peterson, and R. R. Lopez. 2004. A tale of
two species: habitat conservation plans as bounded conflict. J. Wildl. Manag. 68: 743–761.
Peterson, M. N., M. J. Peterson, and T. R. Peterson. 2005. Conservation and the myth of consen-
sus. Conserv. Biol. 19: 762–767.
Peterson, M. N., M. J. Peterson, and T. R. Peterson. 2006a. Why conservation needs dissent.
Conserv. Biol. 20: 576–578.
Peterson, T. R., and R. R. Franks. 2006. Environmental conflict communication, in J. Oetzel, and
S. Ting-Toomey, Eds. The Sage Handbook of Conflict Communication: Integrating Theory,
Research, and Practice, pp. 419–445. Sage Publications, Thousand Oaks, CA.
Peterson, T. R., M. N. Peterson, M. J. Peterson, S. A. Allison, and D. Gore. 2006b. To play the
fool: can environmental conservation and democracy survive social capital? Commun. Crit. /
Cult. Stud. 3: 116–140.
Plato. [ca. 369 B.C.] 1973. Theaetetus. J. McDowell, translator. Clarendon Press, Oxford.
Platt, J. R. 1964. Strong inference: certain systematic methods of scientific thinking may produce
much more rapid progress than others. Science 146: 347–353.
Popper, K. R. 1935. Logik der forschung: zur erkenntnistheorie der modernen naturwissenschaft.
Springer, Wien, Österreich.
Popper, K. R. 1959. The Logic of Scientific Discovery. Hutchinson, London.
Popper, K. R. 1962. Conjectures and Refutations: The Growth of Scientific Knowledge. Basic
Books, New York, NY.
Pulliam, H. R. 1988. Sources, sinks, and population regulation. Am. Nat. 132: 652–661.
Quinn, J. F., and A. E. Dunham. 1983. On hypothesis testing in ecology and evolution. Am. Nat.
122: 602–617.
Romesburg, H. C. 1981. Wildlife science: gaining reliable knowledge. J. Wildl. Manag. 45:
293–313.
Rosenberg, A. 2000. Philosophy of Science: A Contemporary Introduction. Routledge, London.
Roth, W.-M., and G. M. Bowen. 2001. ‘Creative solutions’ and ‘fibbing results’: enculturation in
field ecology. Soc. Stud. Sci. 31: 533–556.
Roughgarden, J. 1983. Competition and theory in community ecology. Am. Nat. 122: 583–601.
Salt, G. W. 1983. Roles: their limits and responsibilities in ecological and evolutionary research.
Am. Nat. 122: 697–705.
Shapin, S., and S. Schaffer. 1985. Leviathan and the Air-Pump: Hobbes, Boyle, and the
Experimental Life. Princeton University Press, Princeton, NJ.
Simberloff, D. 1976a. Experimental zoogeography of islands: effects of island size. Ecology 57:
629–648.
Simberloff, D. 1976b. Species turnover and equilibrium island biogeography. Science 194:
572–578.
Simberloff, D. 1983. Competition theory, hypothesis-testing, and other community ecological
buzzwords. Am. Nat. 122: 626–635.
Simberloff, D., and L. G. Abele. 1982. Refuge design and island biogeographic theory: effects of
fragmentation. Am. Nat. 120: 41–50.
Simberloff, D. S., and E. O. Wilson. 1969. Experimental zoogeography of islands: the colonization
of empty islands. Ecology 50: 278–296.
Sokal, A. D. 1996a. A physicist experiments with cultural studies. Lingua Franca 6(4): 62–64.
Sokal, A. D. 1996b. Transgressing the boundaries: toward a transformative hermeneutics of quan-
tum gravity. Soc. Text 46/47: 217–252.
Sokal, A. D., and J. Bricmont. 1998. Fashionable Nonsense: Postmodern Intellectuals’ Abuse of
Science. Picador, New York, NY.
Strong Jr., D. R., 1983. Natural variability and the manifold mechanisms of ecological communi-
ties. Am. Nat. 122: 636–660.
Swihart, R. K., and N. A. Slade. 1985. Testing for independence of observations in animal move-
ments. Ecology 66: 1176–1184.
Theocharis, T., and M. Psimopoulos. 1987. Where science has gone wrong. Nature 329:
595–598.
Vitousek, P. M., H. A. Mooney, J. Lubchenco, and J. M. Melillo. 1997. Human domination of
Earth’s ecosystems. Science 277: 494–499.
Wenner, A. M. 1989. Concept-centered versus organism-centered biology. Am. Zool. 29:
1177–1197.
Whittaker, R. J., M. B. Araujo, J. Paul, R. J. Ladle, J. E. M. Watson, and K. J. Willis. 2005.
Conservation biogeography: assessment and prospect. Divers. Distrib. 11: 3–23.
Wilcox, B. A. 1978. Supersaturated island faunas: a species–age relationship for lizards on post-
Pleistocene land-bridge islands. Science 199: 996–998.
The Wildlife Society. 1995. Journal news. J. Wildl. Manag. 59: 196–198.
Williams, B. K., J. D. Nichols, and M. J. Conroy. 2002. Analysis and Management of Animal
Populations. Academic Press, San Diego, CA.
Williamson, M. H. 1981. Island populations. Oxford University Press, Oxford.
Wilson, E. O., and D. S. Simberloff. 1969. Experimental zoogeography of islands: defaunation
and monitoring techniques. Ecology 50: 267–278.
Wondolleck, J. M., and S. L. Yaffee. 2000. Making Collaboration Work: Lessons From Innovation
in Natural Resource Management. Island Press, Washington, D.C.
Worster, D. 1994. Nature’s Economy: A History of Ecological Ideas, 2nd edition. Cambridge
University Press, Cambridge.
Chapter 2
Concepts for Wildlife Science: Design Application
2.1 Introduction
In this chapter, we turn our attention to the concept of basic study design. We begin
by discussing variable classification, focusing on the types of variables: explanatory,
disturbing, controlling, and randomized. We discuss how each of these variable types
is integral to wildlife study design, then detail the necessity of randomization and
replication, and relate these topics to variable selection.
We outline the three major types of designs in decreasing order of rigor (i.e.,
manipulative experiments, quasi-experiments, and observational studies) with respect
to controls, replication, and randomization, which we further elaborate in Chap. 3. We
provide a general summary on adaptive management and we briefly touch on survey
sampling designs for ecological studies, with a discussion on accounting for detecta-
bility, but leave detailed discussion of sampling design until Chap. 4.
We discuss the place of statistical inference in wildlife study design, focusing on
parameter estimation, hypothesis testing, and model selection. We do not delve into
specific aspects and applications of statistical models (e.g., generalized linear models
or correlation analysis), as these are inferential rather than design techniques.
We discuss the relationships between statistical inference and sampling distribu-
tions, covering the topics of statistical accuracy, precision, and bias. We provide an
outline for evaluating Type I and II errors as well as sample size determination. We
end this chapter with a discussion on integrating project goals with study design
and those factors influencing the design type used, and conclude with data storage
techniques and methods, programs for statistical data analysis, and approaches for
presenting results from research studies.
There are many things to be considered when designing a wildlife field study and
many pitfalls to be avoided. Pitfalls usually arise from unsuccessfully separating
sources of variation and relationships of interest from those that are extraneous or
Explanatory variables are the focus of most scientific studies. They include
response variables, dependent variables, or Y variables, and are the variables of
interest whose behavior we wish to predict on the basis of our research hypotheses.
Predictor variables are those variables that are purported by the hypothesis to cause
the behavior of the response variable. Predictors can be discrete or continuous,
ordered or unordered. When a predictor is continuous, such as in studies where
some type of regression analysis is used, it is often called a covariate, an independent
variable, an X variable, and sometimes an explanatory variable (adding confusion
in this case). When a predictor is discrete, it is often called a factor, as in analysis
of variance, or a class variable, as in some statistical programs (SAS Institute
Inc. 2000). Regardless of nomenclature, the goal of all studies is to identify
the relationship between predictors and response variables in an unbiased fashion,
with maximal precision. This is difficult to do in ecology, where the system com-
plexity includes many other extraneous sources of variation that are difficult to
remove or measure on meaningful spatial or temporal scales.
For many wildlife monitoring studies, the goal is to estimate the trend, or change
in population size, over time. For example, the response variable could be population
size for each species, with the predictor being time in years; the resulting analysis
is, in essence, measuring the effect that time has on populations.
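A minimal sketch of such a trend estimate (the counts are hypothetical; a log-linear fit is one common choice, because its slope converts directly into an annual rate of change):

```python
import math

# Hypothetical annual counts of one species over five years.
years = [1, 2, 3, 4, 5]
counts = [120, 114, 109, 101, 97]

# Least-squares slope of log(count) on year; exp(slope) - 1 is the
# estimated annual rate of population change.
n = len(years)
x_bar = sum(years) / n
logs = [math.log(c) for c in counts]
y_bar = sum(logs) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, logs))
         / sum((x - x_bar) ** 2 for x in years))
annual_change = math.exp(slope) - 1  # about a 5% decline per year here
```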
Extraneous variables, if not dealt with properly either through control or randomi-
zation, can bias the results of a study. Such variables potentially affect the behavior
of the response variable, but are more of a nuisance than of interest to the scientist
or manager. For example, consider that in some surveys, individuals at a survey
point are not counted (i.e., probability of detection is <1.0). Figure 2.1 illustrates
this point, using simulated data. Based on the raw count of 20 birds of a species in
years 1 and 2, an obvious but biased estimate for trend would be a 0% increase.
However, the actual abundance decreased by 20% from 50 to 40. What is the reason
for this bias? The probability of detection was 0.4 in year 1 and 0.6 in year 2. What
causes variation in detection probability? There are many possibilities, including
changing observers, change of skill for a given observer, variation in weather or
2.2 Variable Classification 39
Fig. 2.1 Comparison of a hypothetical actual trend across 2 years with an estimated trend based
on raw counts. The difference in trend results from different detection probabilities in year 1
(0.40) and year 2 (0.60). Reproduced from Morrison et al. (2001), with kind permission from
Springer Science + Business Media
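The arithmetic behind this kind of bias is easy to reproduce (the numbers below are illustrative values in the spirit of Fig. 2.1, not its exact data):

```python
# Illustrative values: true abundance falls 20% while detectability rises.
n1, p1 = 50, 0.40  # year 1: 50 individuals present, 40% detected
n2, p2 = 40, 0.50  # year 2: 40 individuals present, 50% detected

count1, count2 = n1 * p1, n2 * p2   # expected raw counts: 20 and 20
naive_trend = count2 / count1 - 1   # 0.0, i.e., "no change" (biased)
true_trend = n2 / n1 - 1            # -0.20, a 20% decline

# Dividing each count by an estimate of its detection probability
# recovers the underlying abundances (and hence the true trend).
n1_hat, n2_hat = count1 / p1, count2 / p2
```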
The best way to deal with disturbing variables, if feasible, is to make them controlled
variables, thus removing the potential bias and increasing precision. In the design
and analysis of variance literature, controlling is referred to as blocking; we usually
block in the design, but we can also block in the statistical analysis (Kuehl 2000). By
sampling the same locations each year, we can control for differences in locations.
We could control for observer differences by ensuring that the same observers con-
duct the surveys at a given location each year. However, using the same observers
at the same locations is not practical for many long-term surveys and sampling at
the same location ignores spatial variability; also, sites wear out. Thus, during our
analysis for a given location, we compute the average of the trend estimates over
all the observers who surveyed at that location. This should remove or negate the
bias due to observer differences (Verner and Milne 1990). However, the more fre-
quently observers change within a given location, the more variability will be
introduced. When the number of observers is equal to the number of years a loca-
tion has been surveyed, there is no basis for computing a trend for a given observer,
as (1) the change in observer is confounded with change over time and (2) the trend
may be confounded with changing observer ability.
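The observer-averaging idea can be sketched as follows (the per-observer trend estimates are invented for illustration):

```python
import statistics

# Trend estimates (annual rate of change) at one survey location, computed
# separately for each observer who surveyed there.
trend_by_observer = {
    "observer_A": -0.031,  # surveyed years 1-6
    "observer_B": -0.048,  # surveyed years 7-12
    "observer_C": -0.040,  # surveyed years 13-18
}

# Averaging over observers removes additive observer-level bias, provided
# each observer contributes enough years to support a trend estimate.
location_trend = statistics.mean(trend_by_observer.values())
```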
Some potentially disturbing variables cannot be controlled for through either design
or a posteriori analysis. This can occur in at least one of three ways:
1. The variable might simply be unrecognized, given that there are numerous
potential sources of variation in nature.
2. The investigator might recognize a potential problem conceptually, but either
does not know how to measure it or finds it impractical to do so.
3. Due to sample size, there is a limit to the number of disturbing variables that can
be accounted for in an analysis.
To minimize the effect of these remaining disturbing variables, we convert them to
randomized variables. Using random selection avoids bias due to any systematic
pattern in trend over space. We discuss randomization further in Sect. 2.3.
dependent on sample size but the variability of the average for a given treatment is
reduced as sample size increases, which makes the test for treatment effect more
powerful. The appropriate sample size for the study should be determined at the
design stage. We discuss sample size estimation in Sect. 2.5.8.
It is important to avoid confusing replication with pseudoreplication (Hurlbert
1984; Stewart-Oaten et al. 1986). In the impoundment example, suppose that you
measure density of invertebrates on a given impoundment at 20 different locations.
This is not a problem if these 20 samples are treated as subsamples of that single impoundment.
Pseudoreplication here would consist of treating these 20 measurements as being
taken from 20 independent experimental units for the purpose of evaluating the
effect of the treatment (and thus the variation therein as experimental error), when
in fact it is a sample of 20 from one experimental unit for the purpose of estimating
the effect of a particular drawdown treatment (the variation therein is sampling
error). The effect of pseudoreplication is to underestimate experimental error and
increase the probability of detecting a treatment effect that does not really exist (a Type I error).
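A small simulation makes the consequence concrete (the single-impoundment setup and all values are hypothetical):

```python
import random
import statistics

random.seed(1)

# One treated impoundment (the experimental unit) with a true mean
# invertebrate density of 100, measured at 20 locations within it.
subsamples = [random.gauss(100, 5) for _ in range(20)]

# Correct view: the 20 values estimate ONE experimental-unit mean; their
# spread is sampling error, not experimental error.
unit_mean = statistics.mean(subsamples)

# Pseudoreplicated view: pretending there are 20 independent units makes
# the standard error roughly sqrt(20) times too small, so treatment
# "effects" look far more precise than they really are.
bogus_se = statistics.stdev(subsamples) / len(subsamples) ** 0.5
```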
2.4.2 Quasi-Experiments
Fig. 2.2 Twenty impoundments drawn at random from five national wildlife refuges of the northeastern
U.S. for a hypothetical experiment to study the effect of drawdown speed on spring-migrating
shorebirds. They could be drawn completely at random, or one could block on refuge by randomly
choosing four impoundments from each refuge. Reproduced from Morrison et al. (2001), with
kind permission from Springer Science + Business Media
of the study. Even where the compromise is great, inference might still be required
(e.g., legal requirements to evaluate impacts). Figure 2.3 (Skalski and Robson 1992)
presents a general classification of studies based on the presence/absence of randomi-
zation and replication.
Stewart-Oaten et al. (1986) described a quasi-experiment that falls in the category
of impact assessment (see Fig. 2.3): they inferred the impact of a power plant
on the abundance of a polychaete with minimal compromise of rigor. They
called this a BACI (before–after/control–impact) design, which is equivalent to
Green’s (1979) “optimal impact study design.” We use their example here to
illustrate the principles behind this commonly used design, but treat impact
assessment in detail in Chap. 6.
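The core BACI contrast can be sketched as a difference of differences (the abundance values are purely hypothetical; the actual Stewart-Oaten et al. analysis works with a time series of impact–control differences):

```python
import statistics

# Hypothetical mean polychaete abundances at paired sampling dates.
before = {"control": [34, 31, 36, 33], "impact": [35, 33, 37, 34]}
after = {"control": [32, 35, 33, 34], "impact": [24, 27, 25, 26]}

def mean_diff(period):
    """Mean of the (impact - control) differences across sampling dates."""
    return statistics.mean(
        i - c for i, c in zip(period["impact"], period["control"]))

# BACI effect: the change in the impact-control difference from before to
# after the perturbation; a clearly negative value suggests an impact.
baci_effect = mean_diff(after) - mean_diff(before)
```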
Mensurative studies (Hurlbert 1984) represent the class of observational studies for
which the researcher suspects certain conditions apply, but where it is not practical
to conduct a manipulative or quasi-experiment. Typical in wildlife research, obser-
2.4 Major Types of Study Designs 45
“why” questions (Gavin 1991), but they can increase our knowledge. However,
understanding of causation under descriptive studies is limited or nonexistent, and
thus our inferences tend to be weaker than those derived from experimental manipulation.
To illustrate, consider Sinclair’s (1991) hypothetical study of black-tailed deer
(Odocoileus hemionus) diets during the winter in British Columbia. In this study, the
objective is to document diets of black-tailed deer in a specific location at a specific
time. Studies such as this do not allow us to make broader inferences to all black-
tailed deer in British Columbia because (1) we have not replicated across different
landscapes, (2) we have not randomized or applied multiple treatments (e.g., differ-
ent forage availability), and (3) we have not considered any other populations in
winter in British Columbia. Other descriptive studies include, for example, evalua-
tion of changes in habitat parameters over time based on GIS mapping techniques.
However, these mapping approaches are descriptive: there is little ability to infer
specific causation for changes in habitat structure other than time, because no
treatments are applied and areas are not randomly selected (although plots within
an area could be randomly selected if area-specific inferences are warranted).
Descriptive studies, however, can and do provide a wealth of information and
have been the foundation for ecological research for many years. Perhaps the most
useful studies of organisms have been descriptive work, as these studies provide the
foundation for future evaluation of hypotheses regarding mechanisms causing
changes in populations over time.
ARM (Holling 1978; Walters 1986; Williams 1996), sometimes called management
by experiment, is an approach to management that emphasizes continued learning
about a system in order to improve management in the future. Adaptive manage-
ment helps scientists learn about ecological systems by monitoring the results from
a suite of management programs (Gregory et al. 2006a). There are two basic types
of adaptive management: passive management, in which historical data and expert
opinion are combined into a single best-guess model built around one hypothesis,
and active management, in which systems are deliberately perturbed in several
ways and managers pose competing hypotheses about the impacts of these
perturbations (Walters and Holling 1990; Gregory et al. 2006a,b).
Although the process of adaptive management is often termed learning by doing,
it is important to recognize that the management process can be broken into several
elements (Johnson et al. 1993):
● Objective function
● Set of management options
● Set of competing system models
● Model weights
● Monitoring program
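The "model weights" element above is the machinery by which active adaptive management learns: as monitoring results arrive, the weight on each competing system model is updated by Bayes' rule. The sketch below is a hypothetical illustration; the two harvest models and all likelihood values are invented for the example, not taken from the text.

```python
# Sketch of the "model weights" element of ARM: weights on two competing
# system models are updated by Bayes' rule as monitoring data arrive.
# Both models and all likelihood values are hypothetical.

def update_weights(weights, likelihoods):
    """Posterior weight is proportional to prior weight times the
    likelihood of the observed monitoring outcome under each model."""
    posterior = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(posterior)
    return [p / total for p in posterior]

# Model A: harvest mortality is compensatory; Model B: it is additive.
weights = [0.5, 0.5]  # equal prior weights on the two models

# Hypothetical likelihoods of each year's observed monitoring outcome
# under (Model A, Model B), for three years of management:
annual_likelihoods = [(0.7, 0.3), (0.6, 0.4), (0.8, 0.2)]

for lik in annual_likelihoods:
    weights = update_weights(weights, lik)

# After three years the evidence favors Model A.
print([round(w, 3) for w in weights])
```

Under active adaptive management, deliberately contrasting management actions would make each year's likelihoods more informative, so the weights would separate faster.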
In essence, ARM brings design principles to bear on managing systems in the face of
uncertainty while allowing us to learn more about the system and thereby improve
future management.
With the possible exception of the inclusion of multiple models, we might view
ARM as simply reflecting what an astute manager might be doing regularly through
subjective assessment and reevaluation of his or her decisions. Unfortunately, many
managers subscribe to this incorrect view of adaptive management. Nevertheless,
as the scientific method promotes rigor and objectivity in studies of natural history,
ARM promotes rigor and objectivity in the management of natural resources. This
rigor becomes more useful as the size and complexity of the managed system
grows, and the number of stakeholders increases.
2.5 Sampling
Sampling in wildlife field studies is most often associated with observational studies
in which there is no control of the system under study. We sample when the ecological
unit of interest cannot be censused, that is, when it is impractical to measure
every element of interest within that unit. Typically, sampling consists of selecting
(based on some probabilistic scheme) a subset of a population, allowing one to
estimate something about the entire target population (Thompson 2002a). In
classical manipulative experimental design
applications, experimental units are frequently small enough for us to measure the
entire experimental unit. For example, when applying different fertilizer types to
small plots of ground, we can collect the entire biomass of the plot to measure bio-
mass differences between fertilizer types.
With ecological manipulative experiments, however, the scale of treatment (or
observation) could be too large to conduct a census of the target population. For
example, if we subject a pine (Pinus spp.) plantation of 40 ha to a controlled burn
to estimate the effect on subsequent understory growth within that plantation, we
must evaluate the number of new shoots and their subsequent growth and survival
for multiple years. However, enumeration of all new shoots in just the first year
across the 40-ha plantation would be logistically infeasible (and probably unneces-
sary); thus, we must take a sample of new shoots, perhaps among ten 5 m × 5 m
vegetation plots randomly placed throughout the plantation.
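Under stated assumptions, the plot-sampling arithmetic for this plantation example looks as follows; the ten shoot counts are hypothetical values invented for illustration.

```python
# The plantation example: estimate total new shoots in a 40-ha stand from
# ten randomly placed 5 m x 5 m plots. Shoot counts are hypothetical.
import statistics

plot_counts = [12, 7, 15, 9, 11, 4, 14, 8, 10, 6]  # shoots per 25-m2 plot

plot_area_m2 = 5 * 5
plantation_m2 = 40 * 10_000                 # 40 ha = 400,000 m2
n_units = plantation_m2 / plot_area_m2      # 16,000 plot-sized units

mean_per_plot = statistics.mean(plot_counts)
# Simple expansion estimator: mean per plot times number of plot-sized units
total_estimate = mean_per_plot * n_units

# Standard error of the estimated total (the finite population correction
# is negligible when only 10 of 16,000 units are sampled)
se_total = n_units * statistics.stdev(plot_counts) / len(plot_counts) ** 0.5

print(f"estimated total shoots: {total_estimate:.0f} (SE {se_total:.0f})")
```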
In probability sampling, each element (e.g., animal or plot of vegetation) in the
unit has some nonzero probability of selection. Each selected element is measured
for the variable of interest (e.g., number of new shoots), and summaries (e.g.,
means and variances) of these measurements serve as the measurement for the
sampling unit of interest. Using the summarized measurements, we extrapolate to
the ecological unit of interest.
Given that a complete count of each element is only rarely achieved in wildlife
studies, the purpose of sampling is to estimate the parameter (survival, abundance,
or recruitment) of interest while accounting for (1) spatial, (2) temporal, and (3)
sampling variations as well as accounting for imperfect detectability (Sect. 2.4.1).
The properties of a sample are a function of the design, and therefore the antici-
pated data analysis should have a bearing on the design. We suggest that research-
ers use pilot studies to (1) evaluate the data collection methodologies, (2) determine
necessary sample sizes to obtain estimates with some accepted level of variation
and minimal bias, and (3) allow for optimal allocation of sampling effort over space
and time.
Fig. 2.4 The abundance of “Species X” at the Impact and Control stations, and the difference of
the abundances, as functions of time, in three versions of impact assessment. (a) In the most naive
view, each station’s abundance is constant except for a drop in the Impact station’s abundance
when the power plant starts up. (b) In a more plausible but still naive view, the abundance fluctu-
ates (e.g., seasonally), but the difference still remains constant except at start-up of the power
plant. (c) In a more realistic view, the abundance fluctuates partly in synchrony and partly sepa-
rately; the former fluctuations disappear in the differences but the latter remain, and the power
plant effect must be distinguished from them. Reproduced from Stewart-Oaten et al. (1986), with
kind permission from Springer Science + Business Media
population status (Thompson 2002b). However, the above approach assumes that
detection rates remain constant among all survey sites, observers, weather condi-
tions, species, and time periods, a seemingly “absurd” assumption (Anderson 2001,
p. 1295).
Consider the case where N1 and N2 are the population sizes in years 1 and 2, and
the change in population size N2/N1 is of interest. A common approach is to esti-
mate this ratio with the ratio of an index to population size for the 2 years. If the
index is a count (C) from some standardized protocol (e.g., number of deer seen
along a road transect), it is appropriate to characterize the count at time t as
Ct = Nt·pt, where pt is the detection probability. The ratio of the counts would
thus be C2/C1 = (N2·p2)/(N1·p1). This estimator of the ratio N2/N1 is therefore
unbiased if and only if p1 = p2.
Given the above example, assume that a transect survey was used to evaluate a
deer population. Assume an average detection probability of 0.74 for the observers
during the initial survey and 0.44 for the observers during the second survey
(Thompson 2002b; Anderson 2003). In addition, assume that the population
remains closed between survey periods (i.e., no births, deaths, immigration, or
emigration) at 250 deer. The count during the first transect survey (C1) would be 185
(0.74 × 250), whereas the count during the second survey (C2) would be 110 (0.44
× 250). Using the above ratio estimator (N2/N1), these values would indicate a
decline (110/185) of approximately 41%, when actually no decline had occurred.
Additionally, this simple example provides no mechanism for evaluating which
disturbing variables (e.g., multiple sampling occasions or interobserver variability)
contributed to the differing detection probabilities.
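The deer-transect arithmetic above can be verified directly; only the numbers given in the example (N = 250, detection probabilities 0.74 and 0.44) are used.

```python
# The deer-transect example: a population closed at 250 animals, with
# detection probability 0.74 in the first survey and 0.44 in the second.

N = 250               # true population size in both years
p1, p2 = 0.74, 0.44   # detection probabilities

C1 = N * p1           # expected count, survey 1 (185)
C2 = N * p2           # expected count, survey 2 (110)

naive_ratio = C2 / C1               # = p2/p1, since N cancels
apparent_decline = 1 - naive_ratio  # roughly a 41% "decline" that never happened

# Dividing each count by its detection probability recovers the truth:
corrected_ratio = (C2 / p2) / (C1 / p1)   # = N2/N1 = 1.0

print(f"apparent decline: {apparent_decline:.1%}; corrected ratio: {corrected_ratio:.2f}")
```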
This example illustrates the problems that can result from this type of sampling.
We can deal with detection probability in at least three ways.
(1) Assume that detection probability varies randomly across time, space,
treatments, observers, and other factors of interest, and therefore, on average,
detection probabilities will be the same. This is the most common approach
to the issue of detection probability, but it is often questioned, and we do not
recommend it (Anderson 2001, 2003; Rosenstock et al. 2002;
Thompson 2002b).
(2) Identify the disturbing variables that cause detection probability to vary and
model them as predictors of the counts. This is the approach discussed earlier
(Sect. 2.2).
(3) Estimate the detection probabilities, and the factors influencing variation in
detection, directly (Williams et al. 2002). This is the most desirable option
because it relies on weaker assumptions about the attribute of interest (e.g.,
population size), but it typically requires substantially more effort than the
other approaches.
Option 1 is the most naive approach, but it is also the simplest and cheapest option.
Option 3 is the only reasonable choice if aspects of population dynamics are of
interest to the researchers.
The most commonly used methods that account for detectability fall into two
general categories: capture–recapture methods and distance sampling methods.

2.6 Statistical Inference
Wildlife field research requires that scientists ask challenging questions before,
during, and after the design process. In addition to the common question of “What
is my objective?” researchers must also ask questions like “How am I going to
collect data for this research?” and “What analyses will I do with the data I have
collected?" Too many researchers treat not only statistical design but also statistical
inference as an afterthought, only to discover that they cannot evaluate the research
question of interest because either the data collection methodology was flawed or
sample sizes were inadequate. We maintain that scientists can do a much better job if
they think out the entire study design thoroughly, from formulation of the general
scientific question to specific research hypotheses, through designs most likely to
allow strong inferences.
Investigators should consider several aspects of statistical inference during the
design stage to achieve their goals. Although these topics are covered in
undergraduate statistics courses, each is important and worth reviewing here.
There are two primary areas of statistical inference: testing of hypotheses and esti-
mation of parameters. We will limit our focus in this section to the classical
approach to statistical analysis (frequentist) rather than the approaches based on
Bayesian theory, acknowledging that both approaches have a place in wildlife
research. To illustrate the comparison of the two approaches, consider a study
comparing the average eggshell thickness of osprey (Pandion haliaetus)
exposed to the insecticide DDT (dichlorodiphenyltrichloroethane). Suppose the
overlying biological hypothesis is that DDT reduces the productivity of osprey and,
more specifically, that it thins the eggshell, thus making eggs more fragile. Under
a hypothesis-testing approach, the null statistical hypothesis might be H0: There is
no difference in average eggshell thickness between lakes with and without DDT
residues. The alternative hypothesis would likely be HA: Average eggshell thickness
is lower where there is DDT residue. The researcher would then collect a sample of eggs
from lakes both with and without DDT exposure, compute sample means and variances,
and perform a statistical test on the difference between the sample means
(e.g., a two-sample t-test) to determine whether the difference is statistically significant.
An estimation approach to address the same question would be to estimate the dif-
ference in average eggshell thickness for samples of eggs from lakes with and
without DDT exposure, construct a confidence interval around this difference, and
determine if the confidence interval includes 0. If the interval does not include 0,
then the difference would be statistically significant. Obviously, there is an intrinsic
link between hypothesis testing and estimation, and frequently the two approaches
give similar, if not identical, results.
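A minimal sketch of the two routes, using hypothetical eggshell-thickness data and a normal approximation in place of the t distribution so that the example is self-contained:

```python
# Hypothetical eggshell thicknesses (mm) from lakes without and with DDT.
# A normal approximation stands in for the t distribution so the sketch
# needs only the standard library.
import statistics
from statistics import NormalDist

no_ddt = [0.52, 0.49, 0.55, 0.51, 0.53, 0.50, 0.54, 0.52]
ddt = [0.44, 0.47, 0.42, 0.46, 0.45, 0.43, 0.48, 0.44]

m1, v1 = statistics.mean(no_ddt), statistics.variance(no_ddt)
m2, v2 = statistics.mean(ddt), statistics.variance(ddt)
se_diff = (v1 / len(no_ddt) + v2 / len(ddt)) ** 0.5

# Hypothesis-testing route: a standardized difference (Welch-style statistic)
z = (m1 - m2) / se_diff

# Estimation route: a ~95% confidence interval on the difference in means
crit = NormalDist().inv_cdf(0.975)  # about 1.96
ci = (m1 - m2 - crit * se_diff, m1 - m2 + crit * se_diff)

# The two routes agree: |z| exceeds crit exactly when the CI excludes 0.
print(f"diff = {m1 - m2:.3f}, z = {z:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```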
Model selection and inference procedures have become increasingly common in the
field of wildlife ecology since the early 1990s (Lebreton et al. 1992; Burnham et al.
1995; Burnham and Anderson 2002), primarily as an alternative to statistical null
hypothesis significance testing (Anderson et al. 2000). Estimation procedures in
wildlife ecology have slowly shifted toward evaluating “nuisance” parameters
(detectability, capture rates) as well as parameters of interest like survival and abun-
dance (Lebreton et al. 1992), creating a clearly defined break with hypothesis testing
(Sect. 2.5.1). Model-based inference has become more important in ecological stud-
ies, with its focus being primarily on the analysis of data collected from capture–
recapture studies (Lebreton et al. 1992; Burnham et al. 1995) as most programs used
for population parameter estimation use model selection criteria (Sect. 2.6.2; White
and Burnham 1999; Arnason and Schwarz 1999; Buckland et al. 2001). Design-
based inferences, which are the foundation of sampling literature, are more uncom-
mon in wildlife sciences, likely due to the logistical difficulties with replication and
randomization. Although design-based inferences are the most statistically power-
ful, and in many cases can justify the use of hypothesis-testing approaches, addi-
tional inference is necessary if stochastic processes determine the distribution,
detectability, or characteristics of a population of interest (Buckland et al. 2000).
Currently, there are four groups with respect to evaluation of hypotheses and use
of model selection in wildlife sciences. We suggest that the first two groups include
scientists who are uninterested in analytical or statistical ecology (probably the
largest group) and those scientists who are interested in a specific analytical cook-
book that suits their specific needs (probably the other large group). The other two
groups, in our opinion, represent a small but highly vocal minority (relative to all
scientists) who focus on the development and evaluation of different statistical
procedures. Neither group disputes that the "Immediate issue
is how to present useful and sensible results from field studies" (Eberhardt 2003,
p. 241), and both seem to agree that exorbitant use of silly null hypotheses and p
values is unnecessary (Cherry 1998; Johnson 1999). One group suggests that
model selection is superior to other analytic methods (Anderson et al. 2000;
Burnham and Anderson 2002; Lukacs et al. 2007), whereas the other group sug-
gests that wildlife ecologists might simply be substituting one rote statistical tech-
nique (model selection) for another (null hypothesis significance testing), while
losing track of the more fundamental biological questions and related research
hypotheses (Guthery et al. 2001, 2005; Stephens et al. 2007a,b).
One of the primary criticisms of hypothesis testing is that scientists take the
results from a single, unreplicated study and make wide-ranging management
suggestions based on estimates of statistical significance (Johnson 1999). This
differs considerably from Fisher's belief that hypotheses (and thus hypothesis tests)
were only validated across a series of experiments, which would confirm the size and
direction of effects through replication (Fisher 1929). However, this same issue holds true with
respect to statistical inference for model selection approaches to inference, in that
studies are frequently not replicated; thus, although the inference engine has
changed, the validity of the results should still be questioned until adequate meta-
replication (replication of the entire study) has been conducted (Johnson 2002b).
Information-theoretic approaches call for a priori (i.e., before data collection,
preferably during study design) specification of candidate models (Burnham and
Anderson 2002). We agree with this general approach to science (thinking before
you act) as it forces scientists to evaluate and justify data collection needs. Indeed,
this is the underlying motivation for this book in that wildlife studies should be
conceived beforehand rather than as an afterthought. Careful planning upfront
keeps scientists from using a shotgun approach to hypothesis creation (e.g., testing
all possible relationships); we suspect, however, that such planning is only rarely
accomplished in observational studies. For example, one author of this book published
a set of models he posited before study implementation; based on reviewer comments
about the data he presented, he was instructed to evaluate more than five additional
models before his work could be published. Thus, we suggest that although critical thinking before
study implementation is extremely important, most sets of candidate models should
be posited after preliminary data collection and evaluation using graphical displays
(Anscombe 1973), summary statistics, or some other supplementary method
(Eberhardt 2003) or after initial evaluation of an a priori set (Norman et al. 2004).
Either approach should reduce the frequency of vacuous candidate models in wildlife
studies (Guthery et al. 2005). However, we do not endorse detailed exploratory
data analysis or data mining, in which a researcher looks for relationships in the
data without considering biological plausibility.
Although there is a multitude of research extolling or deprecating many statistical
approaches in wildlife ecology, there seems to be little gray area in this discussion,
with some treating model selection as "… the alternative to null hypothesis
testing" (Franklin et al. 2001) while others question the usefulness of information-theoretic
approaches as a replacement for all other ecological statistics (Guthery et
al. 2001, 2005; Steidl 2006). Statistical hypothesis testing has several limitations in
observational studies, but under Fisher's (1929) model of multiple experimentation
can provide useful results. Model selection is a useful statistical tool for biologists
to use in observational studies for estimation and prediction, but does not substitute
for replicated experiments. There are numerous statistical tools available to wildlife
scientists, and we suggest that the use of many tools can assist with furthering our
understanding of ecological systems.
Statistical inference allows us to extend results from the specific to the general (Mood et al. 1974). As
discussed in Chap. 1, the purpose of a research study is to make valid inference to
the target population about some set of parameters that describe attributes of the
target population. In wildlife field studies, sampling from a population and then
drawing inference to the population based on the sample collected usually accom-
plishes this. Consider our osprey example: the parameter of interest was the aver-
age, or arithmetic mean, thickness of the shells of all osprey eggs found in nests
around a particular lake. However, it is often impractical to measure every egg
within a nest because the measurement would be invasive (i.e., the egg must be
broken unless samples were taken after hatch); thus the osprey population would be
negatively affected for the duration of the study because of the sampling protocols.
Therefore, we take a sample of size n from the N eggs in the population. We then
summarize the data collected from the sample into one or more statistics (e.g.,
mean, variance, range) and use these statistics to make inferences about the
population under study. An estimator is a statistic that serves to approximate a
parameter of interest. Because our interest is in the mean thickness (μ) of the osprey
eggs in that population, a logical estimator is the mean thickness (X̄) of the sampled
eggs. However, we cannot assume that X̄ = μ; we can only hope that it is close.
Assuming repeated samples from a probability distribution (see below) with mean μ
and variance σ², our expectation is that the expected value of X̄ equals μ (i.e.,
E[X̄] = μ), or that the average X̄ is equal to the μ we are interested in estimating
(Mood et al. 1974). Thus, in our osprey example, we cannot assume that X̄ = μ, nor
can we assume that another random sample of size n from the osprey eggs in the
same location will have X̄ = μ. This is because eggshell thickness varies across the
population, and thus so does the mean thickness (X̄) of samples of eggs drawn
from that population. The probability distribution formed by the variation in an
estimator across multiple samples is called a sampling distribution. This distribution
has a mean and a variance (measures of central tendency and variation).
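The sampling-distribution idea is easy to demonstrate by simulation; the population mean and standard deviation below are hypothetical values chosen for illustration.

```python
# Simulate the sampling distribution of the mean: repeated samples of n
# eggs from a hypothetical population of eggshell thicknesses.
import random
import statistics

random.seed(42)
mu, sigma = 0.50, 0.04   # hypothetical population mean and SD (mm)
n = 20                   # eggs per sample
reps = 5000              # number of repeated samples

sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

# The mean of the sampling distribution approximates mu, and its standard
# deviation approximates the standard error sigma / sqrt(n).
print(round(statistics.mean(sample_means), 3))
print(round(statistics.stdev(sample_means), 4), round(sigma / n ** 0.5, 4))
```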
The objective of statistical inference is to identify the sampling distribution of
the estimator in relation to the parameters of interest, because the properties of the
sampling distribution define the properties of the estimator. We would expect any
measurements taken from wildlife (e.g., offspring number, size, or weight) to
exhibit considerable variation from population to population, and within a population
because of differences in age, sex, or reproductive status. A researcher could
determine the most likely sampling distribution by taking repeated samples and
building a frequency diagram of the results. However, the effort necessary to fully
specify this distribution empirically would likely exceed the effort required to census
the population of eggshell thicknesses, and would be disruptive to the osprey
population under study.
The classical approach in ecological field studies is to assume a form for the
sampling distribution (e.g., the Normal distribution), and then use the data collected
during the study to estimate the parameters that specify that distribution. There is
considerable literature on the case where the parameter of interest is an arithmetic
mean μ and the estimator is the sample mean X̄, as in our osprey example.
In Sect. 2.6, we outlined several factors used for making valid inferences in wildlife
study design. The necessity for this should be apparent because the intention in
wildlife field studies is to make inferences from the sample collected to the popula-
tion. Thus, we require that our hypothesis tests or parameter estimates adequately
represent the population as a whole (Thompson et al. 1998). When using the results
from a sample to make population inferences, an estimator θ̂ is useful if it provides
a good approximation of the population parameter θ. How good the approximation
is depends on the amount of error associated with θ̂. There are two basic
desirable properties for an estimator: bias and precision (note that accuracy is a
combination of bias and precision). The first (bias) requires that the mean of the
estimator be as close to the parameter as possible, and the second (precision)
requires that the estimator not vary considerably over multiple samples.
Both bias and precision are well-defined properties of estimators. A useful
statistical concept for defining these terms is the expected value, which is nothing
more than the average of the values a variable x can take, weighted by the frequency
of occurrence of those values (Mood et al. 1974; Williams et al. 2002). For example,
the expected value of X̄, designated E(X̄), is equal to the population mean μ.
The expected value of a sample is simply the arithmetic average of the values of x,
where each value has equal weight. Thus, E(X̄) is the long-term limiting average
from independent repeated experiments: an average of averages.
The bias of an estimator θ̂ is the difference between its expected value and the
parameter of interest θ:

Bias(θ̂) = E(θ̂) − θ
The square root of the variance, √Var(θ̂), of any random variable (e.g., osprey
eggshell thickness) is called the standard deviation. However, when the random
variable is an estimator for a parameter (e.g., when the random variable is a sample
mean X̄), the standard deviation is more commonly called a standard error. Wildlife
science students commonly confuse standard deviation and standard error, and this
confusion has carried through to professional wildlife biologists. For a given
sampling distribution (estimated or assumed), the standard error is simply the
standard deviation of that distribution. Confusion regarding estimation of the
standard error is most relevant when a population mean is the parameter of interest.
Consider our
vant when considering population means as the parameter of interest. Consider our
osprey eggshell thickness example. A sample of n eggs is randomly collected and
we measured shell thickness (x) for each egg, and computed the sample mean − X and
sample variance s2:
−
X =兺
n
x /n
i=1 i
and
s2 = 兺i=1 (xi −
X )2 / (n 1)
n
We are interested in the sampling distribution of −X , and statistical theory shows that
the standard error of −X is s−
X
= s/÷n, where s is the standard deviation for the popu-
lation of egg thickness. Thus, a reasonable estimator for this parameter is s−X = s/÷n,
where s is an estimator for the standard deviation of the thickness of individual
eggshells in the population and s−X = s/÷n is an estimator for the standard deviation,
or standard error, of the mean thickness of a sample of n eggshells randomly chosen
from the population under study.
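Applying these formulas to a hypothetical sample of ten eggshell thicknesses:

```python
# The sample mean, sample variance, and standard error of the mean for a
# hypothetical sample of n = 10 eggshell thicknesses (mm).
import statistics

x = [0.47, 0.52, 0.49, 0.55, 0.50, 0.46, 0.53, 0.51, 0.48, 0.54]
n = len(x)

x_bar = sum(x) / n                                   # sample mean
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)    # sample variance
s = s2 ** 0.5                                        # sample standard deviation
se = s / n ** 0.5                                    # standard error of the mean

# The hand-rolled formulas agree with the standard library:
assert abs(x_bar - statistics.mean(x)) < 1e-12
assert abs(s - statistics.stdev(x)) < 1e-12

print(f"mean = {x_bar:.3f}, SD = {s:.4f}, SE = {se:.4f}")
```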
Here, we show the oft-seen bull's-eye graphic illustrating varying levels of bias
and precision (Fig. 2.6). Figure 2.6a indicates a precise but biased estimate;
Fig. 2.6b indicates an unbiased estimate with low precision. Figure 2.6c indicates
an estimate that is both imprecise and biased, whereas Fig. 2.6d is the optimal case
where the estimate is both unbiased and precise. Note that in the cases where bias
is shown (Fig. 2.6a, c), there is a systematic difference between the replicated
estimates θ̂ and the population parameter θ (Williams et al. 2002). The most
accurate estimator is usually one that minimizes bias and maximizes precision, and
typically represents a balance between the two. Estimator accuracy is commonly
measured using the mean squared error (MSE), the expected squared deviation of
the estimator θ̂ from the parameter θ:

MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + [Bias(θ̂)]²
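The standard decomposition of MSE into variance plus squared bias can be checked by simulation; the "shrunk mean" below is a deliberately biased estimator invented purely for the illustration.

```python
# Check MSE(theta_hat) = Var(theta_hat) + Bias(theta_hat)^2 by simulation.
# Two estimators of a population mean theta: the unbiased sample mean and
# a deliberately biased "shrunk" mean. All numbers are hypothetical.
import random
import statistics

random.seed(1)
theta, sigma, n = 100.0, 15.0, 25

def simulate(estimator, reps=5000):
    ests = [estimator([random.gauss(theta, sigma) for _ in range(n)])
            for _ in range(reps)]
    bias = statistics.mean(ests) - theta
    var = statistics.variance(ests)
    mse = statistics.mean((e - theta) ** 2 for e in ests)
    return bias, var, mse

for name, est in [("sample mean", statistics.mean),
                  ("shrunk mean", lambda xs: 0.95 * statistics.mean(xs))]:
    bias, var, mse = simulate(est)
    # Numerically, mse is (almost exactly) var + bias**2.
    print(f"{name}: bias={bias:.2f} var={var:.2f} "
          f"mse={mse:.2f} var+bias^2={var + bias ** 2:.2f}")
```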
2.6.5 Assumptions
Many statistical methods require assumptions regarding the form of the sampling
distribution of the variable of interest (e.g., that it is normally distributed). However,
in some cases, the central limit theorem allows us to relax those assumptions as long as sample sizes are
large. Model-based inferences (e.g., capture–mark–recapture studies) frequently
require more stringent adherence to certain assumptions about the relationship
between the predictands and predictors. Nonparametric methods still require
assumptions, contrary to the way many in the scientific community apply them
(Johnson 1995). Resampling methods (e.g., randomization, bootstrapping; Manly
1991), in which the shape of the sampling distribution is derived empirically from
repeated resampling of the data that were collected, also have requirements, such
as an initial random sample or observations that are exchangeable under the null
hypothesis; one consequence is that tests of differences in location require equal
variances.
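A minimal bootstrap sketch using the same kind of hypothetical eggshell data: the sampling distribution of the mean is approximated by resampling, with replacement, from the single sample actually collected.

```python
# Bootstrap the standard error of the mean by resampling, with
# replacement, from one hypothetical sample of eggshell thicknesses (mm).
import random
import statistics

random.seed(7)
sample = [0.47, 0.52, 0.49, 0.55, 0.50, 0.46, 0.53, 0.51, 0.48, 0.54]

boot_means = [
    statistics.mean(random.choices(sample, k=len(sample)))
    for _ in range(5000)
]

boot_se = statistics.stdev(boot_means)                      # bootstrap SE
analytic_se = statistics.stdev(sample) / len(sample) ** 0.5

print(f"bootstrap SE = {boot_se:.4f}, analytic SE = {analytic_se:.4f}")
```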
Although assumptions are ubiquitous in study design and statistical inference,
many methods are robust to moderate violations of assumptions. For example,
some methods requiring normality are robust to deviations from normality when
distributions are symmetric. Additionally, model-based inferences for capture–
recapture methods are in some cases robust to violations of the population closure
assumption (Kendall 1999). Thus, researchers should not lose heart: there are
numerous methods to choose from, with varying degrees of assumption complexity.
We recommend that investigators use available analytical tests and
graphical evaluations to verify whether violations of assumptions have occurred.
At the design and inference stages, we recommend that investigators identify
the assumptions necessary for the chosen approach and then ask the following
questions: (1) Are any assumptions likely to be severely violated?
(2) For assumptions that will be difficult to achieve, is there anything I can do to
meet them more closely? (3) Is the analytical method I will be employing
robust to the violations likely to occur? (4) If analytical problems such as bias are
likely to be an issue, are there alternative design-based or model-based approaches
I can implement that would provide results that are
more robust? Critical thinking about the question under investigation and the study
design at hand will greatly increase the probability that the study will produce bio-
logically and statistically meaningful results.
Under a classical (frequentist) design for statistical inference using hypothesis
testing, with a specific null hypothesis, an omnibus alternative hypothesis, and
a specified level of significance for the test, we can make two types of errors. A
Type I error, rejecting the null hypothesis when it is true, occurs with probability α,
which typically is set by the investigator (i.e., the α level of the test, often α = 0.05;
Cherry 1998). The p value is a related concept. Historically, scientists have viewed
p values as a measure of how much evidence we have against the null hypothesis
being evaluated. However, we prefer the definition given by Anderson et al.
Many of the formulas used for sample size estimation are straightforward and can be
found in most introductory statistical texts or statistical software. For other cases,
especially when the estimator of interest is a function of other random variables,
sample size is more easily
determined numerically through simulation.
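For the simple case of estimating a mean to within a chosen margin of error, the back-of-envelope computation is n ≈ (zσ/E)²; all numbers below are hypothetical.

```python
# Sample size to estimate a mean within margin of error E at ~95%
# confidence: n = ceil((z * sigma / E)**2). sigma is a guess, e.g. from
# a pilot study; all numbers are hypothetical.
import math

z = 1.96        # ~95% normal critical value
sigma = 0.03    # guessed SD of eggshell thickness (mm)
E = 0.01        # desired margin of error (mm)

n = math.ceil((z * sigma / E) ** 2)
print(n)  # -> 35 eggs under these assumptions
```

Even a rough number like this is useful for the planning purpose described above: if logistics allow only 10 samples when roughly 35 are needed, the study design must change.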
The assumption that an estimator follows some specified distribution is often
only approximate. Thus, some investigators feel that a priori computation of sample
size is not necessary. Rather, “getting as large a sample as you can” becomes the
prevailing philosophy. Although this approach can be advantageous, as increasing
sample size increases the likelihood that a statistical test will be significant (Johnson
1995), it is not good practice. First, eventually assumptions will be required to ana-
lyze the data. Second, although computed precision or power for a given sample
size is never exactly achieved, a rough estimate of sample size is useful for plan-
ning. For example, if the required sample size under modest assumptions indicates
that a sample size of 100 is necessary to meet study objectives, but the current
logistical plans allow for the collection of only 10 samples, then the process of
sample size determination was useful. Third, most of us do not have the luxury of
limitless budgets. As a result, we need to be as efficient as possible in conducting
research. This efficiency is possible when you define the sample sizes needed and
design your study accordingly.
The underlying reason for scientific inquiry in wildlife science is conservation and
maintenance of species, communities, and biodiversity over time and space.
Therefore, all wildlife research revolves around the development of methods to
assist with studying populations and evaluating those factors that influence popula-
tion trajectories. The first step in developing any research study, regardless of the
topic, is to clearly define the project’s goals (Thompson et al. 1998). Questions
should be worthwhile (of some conservation or management importance; MacKenzie
et al. 2006) and should be attainable (Sutherland 2006). Establishing general study
goals and framework is critical for determining information needs, the necessary
data, the time period of study, and the use of the data. In this section, we discuss
how to link project goals, study design, data collection, data interpretation, and data
presentation into a package that will result in meaningful conclusions.
Ecological research projects require well-thought-out questions (Chap. 1) and adequate
sampling and experimental designs (Chaps. 3 and 4 and this chapter) that ensure
the target population is identifiable. As an example, consider the question we
66 2 Concepts for Wildlife Science: Design Application
The emphasis of this book is on the design of ecological field studies; however,
design and inference are intimately related (Williams et al. 2002). Since in most
studies we are observing only a portion of the population, our usual interest is esti-
mation of population parameters (abundance, survival, recruitment, and move-
ment), which we hypothesize, based on our design, are characteristic of the entire
population (Thompson 2002).
Statistical methods available for analysis of ecological data are extensive; current
approaches range from classical frequentist testing and estimation methods to
information-theoretic and Bayesian approaches (Burnham and Anderson 2002; Link
et al. 2002; Ellison 2004; Steidl 2006). However, because these approaches are
tools, we should treat them as means rather than ends. The list of potential
statistical methodology used in wildlife sciences is considerable, and the choice of
approach depends on the species under study, questions of interest, study design,
and type of data collected. Thus, we will refrain from discussing the intricacies of
specific analytical methods (e.g., AIC for linear regression) and instead focus on a
2.7 Project Goals, Design, and Data Collection, Analysis, and Presentation 67
general discussion of analytical systems under which most wildlife scientists con-
duct analyses.
First, and we quote, “Let’s not kid ourselves: the most widely used piece of software
for statistics is Excel” (Ripley 2002). We use spreadsheets such as Microsoft
Excel for four primary purposes in wildlife studies: (1) data entry and storage, (2)
data manipulation, (3) statistical analysis, and (4) graphic creation (see Sect. 2.4.7).
In fact, Excel has become the “de rigueur” initial location where most data analyses
are conducted and graphics developed in the wildlife sciences. This is likely due to
Excel’s availability and simplicity. This simplicity, however, comes at a price –
considerable mathematical inaccuracies. Errors associated with statistical computa-
tions in Excel are common (McCullough and Wilson 1999, 2002, 2005), although
the ecological community has been slow to recognize the limitations of Excel for
anything other than data entry, storage, and manipulation.
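The class of error involved is easy to demonstrate outside of Excel. The sketch below illustrates the failure mode (it is not a reconstruction of Excel's internal code): the numerically unstable one-pass "computational" variance formula is contrasted with the careful algorithm in Python's statistics module. When values are large relative to their spread, the one-pass formula loses essentially all of its precision.

```python
import statistics

def one_pass_variance(xs):
    """Textbook one-pass formula (sum of squares minus squared sum over n);
    numerically unstable when values are large relative to their spread."""
    n = len(xs)
    s, ss = 0.0, 0.0
    for x in xs:
        s += x
        ss += x * x
    return (ss - s * s / n) / (n - 1)

# Nine significant digits ahead of a tiny spread: the one-pass formula
# suffers catastrophic cancellation, while the careful method does not.
data = [1e8 + d for d in (0.1, 0.2, 0.3, 0.4, 0.5)]
unstable = one_pass_variance(data)      # garbage: true variance is 0.025
stable = statistics.variance(data)      # uses numerically careful arithmetic
```

The true sample variance here is 0.025; shifting the data by its mean before squaring (or using a library routine, as above) avoids the cancellation.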
Databases are an alternative for data storage. A database is a collection of
records organized through a data model that describes how the data are represented
and manipulated. Databases come in many different forms, ranging from simple table
models or two-dimensional arrays of data, in which columns indicate similar values
and rows indicate individuals or groups, to hierarchical and relational models
(Codd 1970; Date 2003). Data in hierarchical models are organized into a tree-like
structure that allows for repeating information using a parent–child relationship.
For example, a study site could be the parent, and the birds radiomarked on the site
the children. Relational databases are databases that use a set of relations to order
and manipulate data. A well-designed relational database helps ensure data are
entered in the correct format, takes up considerably less disk space, and is much
less likely to be corrupted by user errors as compared with a spreadsheet containing
the same data. We use databases widely in wildlife sciences primarily for data stor-
age and manipulation, but databases are probably underutilized, given their great
flexibility and range of applications, including analysis and data reporting.
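The site-parent/bird-child example above can be sketched with Python's built-in sqlite3 module; the table layout, column names, and values are illustrative assumptions, not a prescribed schema.

```python
import sqlite3

# In-memory sketch of the parent-child (study site -> radiomarked bird)
# relationship described above; names and values are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("""CREATE TABLE site (
                   site_id INTEGER PRIMARY KEY,
                   name    TEXT NOT NULL)""")
con.execute("""CREATE TABLE bird (
                   band_id   INTEGER PRIMARY KEY,
                   site_id   INTEGER NOT NULL REFERENCES site(site_id),
                   frequency REAL)""")

con.execute("INSERT INTO site VALUES (1, 'North pasture')")
con.executemany("INSERT INTO bird VALUES (?, ?, ?)",
                [(101, 1, 150.21), (102, 1, 150.43)])

# The foreign key enforces correct entry: a bird cannot be assigned
# to a study site that does not exist.
rejected = False
try:
    con.execute("INSERT INTO bird VALUES (103, 99, 150.77)")
except sqlite3.IntegrityError:
    rejected = True

# A join reassembles parent and children for reporting.
rows = con.execute("""SELECT s.name, b.band_id
                      FROM site s JOIN bird b USING (site_id)
                      ORDER BY b.band_id""").fetchall()
```

This is the sense in which a well-designed relational database "helps ensure data are entered in the correct format": the constraint is enforced at entry, not discovered at analysis.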
Next, after data collection and transfer to some data storage format, wildlife
ecologists typically want to summarize and interpret the data using one or several
statistical procedures. Fortunately, a number of statistical programs exist for
analysis of ecological data. Nevertheless, these programs vary in functionality, ease
of use, and accuracy. Summary statistics (means, variances, and medians) are esti-
mated in nearly any program, and as such will not be discussed. Additionally,
approaches to link data in these formats with each of the statistics programs
discussed below are readily available, although certain programs require specific
data formats not outlined here (e.g., .inp files in MARK).
Some of the more common statistical environments used in wildlife science
include (this list is not comprehensive): JMP, SPSS (SPSS Inc. 1999), SAS (SAS
Institute Inc. 2000), SYSTAT (SYSTAT 2002), MINITAB (Minitab 2003),
STATISTICA (StatSoft 2003), STATA (StataCorp 2005), and GENSTAT (Payne
et al. 2006). Each of these programs has advantages and disadvantages. For exam-
ple, SAS efficiently conducts batch processing, simplifying data manipulation and
analysis for large datasets; GENSTAT, SPSS, and STATISTICA all have excellent
GUIs (graphical user interfaces). SPSS is taught as the primary undergraduate and
graduate statistics package in many universities across the United States while SAS
has a considerable presence in both the academic research and business worlds. The
downside to most of these, however, is cost, as most are expensive and some require
annual licensing, although student versions are inexpensive. Scientific program-
ming and statistical computing environments also include programming languages
like S (Venables and Ripley 2002) and SPlus (Chambers 1998), and programming
environments such as MATLAB (2005) and R (R Development Core Team 2006).
These systems have been at the forefront of nearly all statistical computing for the last
decade and have a wide group of active users involved with development and testing.
Because each of these four environments allows command-line, high-level programming,
they provide more flexibility with modeling and figure development. R is
open-source and free, while S, SPlus, and MATLAB are available for purchase.
Statistical programs designed to estimate population parameters are widely
available and have seen a dramatic increase in use by wildlife scientists since the
advent of powerful personal computers in the 1990s (Schwarz and Seber 1999;
Amstrup et al. 2005). The most recognizable, Program MARK (White and Burnham
1999), is used for estimation of parameters from “marked” individuals (hence the
name). MARK has become the standard engine for >100 different modeling
approaches ranging from survival estimation using telemetry data to abundance
estimation in closed systems from CMR data. However, other programs exist for
population parameter estimation, including RMark (an R-based system invoking
MARK for parameter estimation; Laake 2007), POPAN (open population mark–
recapture/resight models; Arnason and Schwarz 1999), and abundance estimation
using Distance (Buckland et al. 2001) and NOREMARK (White 1996) and occu-
pancy estimation using Presence (MacKenzie et al. 2006). Regularly updated, as
new methods become available, these programs represent state-of-the-art methods
for population parameter estimation. Users should note, however, that nearly all of
these “wildlife-specific” programs rely on information-theoretic approaches to
model selection and inference (Burnham and Anderson 2002), which require con-
siderable statistical background to ensure that resulting inferences are appropriately
developed, applied, and interpreted.
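As a bare-bones illustration of the information-theoretic machinery these programs rely on (a sketch, not code from any of the packages named above), the following compares an intercept-only model against a simple linear regression using the least-squares form of AIC, n ln(RSS/n) + 2k, on simulated data; the slope, noise level, and sample size are arbitrary.

```python
import math
import random

def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant:
    n*ln(RSS/n) + 2k, where k counts parameters including the variance."""
    return n * math.log(rss / n) + 2 * k

# Simulated data with a genuine linear trend (all values hypothetical).
rng = random.Random(1)
x = [float(i) for i in range(40)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Model 1: intercept only.
rss1 = sum((yi - ybar) ** 2 for yi in y)

# Model 2: simple linear regression by ordinary least squares.
beta = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        / sum((xi - xbar) ** 2 for xi in x))
alpha = ybar - beta * xbar
rss2 = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

aic1 = aic_least_squares(rss1, n, k=2)  # mean + variance
aic2 = aic_least_squares(rss2, n, k=3)  # intercept + slope + variance
delta = aic1 - aic2  # large positive values favor the linear model
```

The candidate set, the counting of parameters, and the interpretation of AIC differences are exactly the points that require the statistical background noted above (Burnham and Anderson 2002).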
Wildlife research is primarily descriptive; all that varies is the choice of methods
(e.g., summary statistics, hypothesis tests, estimation procedures, and model selec-
tion) used to describe the system of interest. Statistical methods used in wildlife
science range from simple data description to complex predictive models (Williams
et al. 2002). As shown in the previous section, statistical applications have become
increasingly important in the examination and interpretation of ecological data to
the extent that entire programs have been developed for estimation of specific popu-
lation parameters. Approaches to presenting data are unlimited and dependent upon
the context (e.g., oral presentation, peer-reviewed article), so we will limit our
discussion to a few key points. Note that Anderson et al. (2001) provided general
suggestions regarding (1) clarification of test statistic interpretation, (2) presenta-
tion of summary statistics for information-theoretic approaches, (3) discussion of
methods for Bayesian explanation, and (4) general suggestions regarding descrip-
tion of sample sizes and summary descriptive statistics (e.g., means).
Tables should be used to present numerical data that support specific conclusions.
A table should present relevant data efficiently and unambiguously, and each
table should be readily interpretable without reference to the text. Tables tend to outline specific
cases (e.g., number of mortalities due to harvest) while graphics (see below) are
used to describe relationships between parameters (Sutherland 2006). Table head-
ings, row labels, and footnotes should precisely define what the data in each row–
column intersection mean. Tables are amenable to a wide variety of data types,
from absolute frequency data on captured individuals to summary parameter esti-
mates (Figs. 2.7 and 2.8, respectively).
Graphics also are important for interpreting data as they allow the reader to visu-
ally inspect ecological parameter estimates. Graphics should display ecological data
efficiently and accurately, and there is a wide range of graphical options available to
researchers (Cleveland 1993; Maindonald and Braun 2003). Tufte (1983) suggests,
“Excellence in statistical graphics consists of complex ideas communicated with
clarity, precision, and efficiency.” Good graphs should (from Tufte 2001):
● Illustrate the data
● Induce the viewer to think about the substance rather than methodology, design,
or technology of graphic construction
Fig. 2.7 Example table showing summary data enumerating the number of individuals captured
during a research study. Reproduced from Lake et al. (2006), with kind permission from The
Wildlife Society
Fig. 2.8 Example table showing summary parameter estimates for all individuals captured dur-
ing a research study. Reproduced from Taylor et al. (2006), with kind permission from the Wildlife
Society
2.8 Summary
The emphasis of this book is on the design of ecological research, with this chapter
focusing on the relationship between study design and statistical inference. In Sect.
2.2, we discussed the different types of variables common to wildlife studies:
References
Buckland, S. T., I. B. J. Goudie, and D. L. Borchers. 2000. Wildlife population assessment: past
developments and future directions. Biometrics 56: 1–12.
Buckland, S. T., D. R. Anderson, K. P. Burnham, J. L. Laake, D. L. Borchers, and L. Thomas.
2001. Introduction to Distance Sampling. Oxford University, Oxford.
Burnham, K. P. and D. R. Anderson. 2002. Model selection and multimodel inference: a practical
information-theoretic approach, 2nd Edition. Springer-Verlag, New York.
Burnham, K. P., D. R. Anderson, G. C. White, C. Brownie, and K. H. Pollock. 1987. Design and
analysis methods for fish survival experiments based on release–recapture. Am. Fish. Soc.
Monogr. 5: 1–437.
Burnham, K. P., G. C. White, and D. R. Anderson. 1995. Model selection strategy in the analysis
of capture–recapture data. Biometrics 51: 888–898.
Chambers, J. M., W. S. Cleveland, B. Kleiner, and P. A. Tukey. 1983. Graphical methods for data
analysis. Wadsworth International Group, Belmont, CA, USA.
Chambers, J. M. 1998. Programming with data. A guide to the S language. Springer-Verlag,
New York.
Cherry, S. 1998. Statistical tests in publications of The Wildlife Society. Wildl. Soc. Bull. 26:
947–953.
Cleveland, W. S. 1993. Visualizing Data. Hobart, Summit, NJ.
Cleveland, W. S. 1994. The Elements of Graphing Data. Hobart, Summit, NJ.
Cochran, W. G. 1977. Sampling Techniques, 3rd Edition. John Wiley and Sons, New York.
Codd, E. F. 1970. A relational model of data for large shared data banks. Commun. ACM 13:
377–387.
Cohen, J. 1988. Statistical power analysis for the behavioral sciences, 2nd Edition. Lawrence
Erlbaum Associates, Inc., Mahwah, NJ.
Collier, B. A., S. S. Ditchkoff, J. B. Raglin, and J. M. Smith. 2007. Detection probability and
sources of variation in white-tailed deer spotlight surveys. J. Wildl. Manage. 71: 277–281.
Cook, R. D., and J. O. Jacobsen. 1979. A design for estimating visibility bias in aerial surveys.
Biometrics 35: 735–742.
Date, C. J. 2003. An Introduction to Database Systems, 8th Edition. Addison Wesley, Boston,
MA.
Dinsmore, S. J., G. C. White, and F. L. Knopf. 2002. Advanced techniques for modeling avian
nest survival. Ecology 83: 3476–3488.
Eberhardt, L. L. 2003. What should we do about hypothesis testing? J. Wildl. Manage. 67:
241–247.
Ellison, A. M. 2004. Bayesian inference in ecology. Ecol. Lett. 7: 509–520.
Fafarman, K. R., and C. A. DeYoung. 1986. Evaluation of spotlight counts of deer in south Texas.
Wildl. Soc. Bull. 14: 180–185.
Fisher, R. A. 1925. Statistical Methods for Research Workers. Oliver and Boyd, London.
Fisher, R. A. 1929. The statistical method in psychical research. Proc. Soc. Psychical Res. 39:
189–192.
Fisher, R. A. 1935. The Design of Experiments. Reprinted 1971 by Hafner, New York.
Franklin, A. B., T. M. Shenk, D. R. Anderson, and K. P. Burnham. 2001. Statistical model selection:
the alternative to null hypothesis testing, pp. 75–90, in T. M. Shenk and A. B. Franklin,
Eds. Modeling in Natural Resource Management. Island, Washington, DC.
Gavin, T. A. 1991. Why ask “Why”: the importance of evolutionary biology in wildlife science.
J. Wildl. Manage. 55: 760–766.
Gerard, P. D., D. R. Smith, and G. Weerakkody. 1998. Limits of retrospective power analysis. J.
Wildl. Manage. 62: 801–807.
Green, R. H. 1979. Sampling Design and Statistical Methods for Environmental Biologists. Wiley,
New York.
Gregory, R., D. Ohlson, and J. Arvai. 2006a. Deconstructing adaptive management: criteria for
applications in environmental management. Ecol. Appl. 16: 2411–2425.
Gregory, R., L. Failing, and P. Higgins. 2006b. Adaptive management and environmental decision
making: a case study application to water use planning. Ecol. Econ. 58: 434–447.
Guthery, F. S., J. J. Lusk, and M. J. Peterson. 2001. The fall of the null hypothesis: liabilities and
opportunities. J. Wildl. Manage. 65: 379–384.
Guthery, F. S., L. A. Brennan, M. J. Peterson, and J. J. Lusk. 2005. Information theory in wildlife
science: critique and viewpoint. J. Wildl. Manage. 69: 457–465.
Hayes, J. P., and R. J. Steidl. 1997. Statistical power analysis and amphibian population trends.
Conserv. Biol. 11: 273–275.
Holling, C. S. (ed.) 1978. Adaptive Environmental Assessment and Management. Wiley, London.
Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecol.
Monogr. 54: 187–211.
Johnson, D. H. 1995. Statistical sirens: the allure of nonparametrics. Ecology 76: 1998–2000.
Johnson, D. H. 1999. The insignificance of statistical significance testing. J. Wildl. Manage. 63:
763–772.
Johnson, D. H. 2002a. The role of hypothesis testing in wildlife science. J. Wildl. Manage. 66:
272–276.
Johnson, D. H. 2002b. The importance of replication in wildlife research. J. Wildl. Manage. 66:
919–932.
Johnson, F. A., B. K. Williams, J. D. Nichols, J. E. Hines, W. L. Kendall, G. W. Smith, and D. F.
Caithamer. 1993. Developing an adaptive management strategy for harvesting waterfowl in
North America. In Transactions of the North American Wildlife and Natural Resources
Conference, pp. 565–583. Wildlife Management Institute, Washington, DC.
Kendall, W. L. 1999. Robustness of closed capture–recapture methods to violations of the closure
assumption. Ecology 80: 2517–2525.
Kendall, W. L., B. G. Peterjohn, and J. R. Sauer. 1996. First-time observer effects in the North
American Breeding Bird Survey. Auk 113: 823–829.
Kish, L. 1987. Statistical Design for Research. Wiley, New York.
Kuehl, R. O. 2000. Design of Experiments: Statistical Principles of Research Design and Analysis,
2nd Edition. Brooks/Cole, Pacific Grove, California.
Laake, J. L. 2007. RMark, version 1.6.1. R package. http://nmml.afsc.noaa.gov/Software/marc/
marc.stm
Lebreton, J.-D., K. P. Burnham, J. Clobert, and D. R. Anderson. 1992. Modeling survival and
testing biological hypotheses using marked animals: a unified approach with case studies.
Ecol. Monogr. 62: 67–118.
Lehnen, S. E., and D. G. Krementz. 2005. Turnover rates of fall-migrating pectoral sandpipers in
the lower Mississippi Alluvial Valley. J. Wildl. Manage. 69: 671–680.
Link, W. A., and J. R. Sauer. 1998. Estimating population change from count data. Application to
the North American Breeding Bird Survey. Ecol. Appl. 8: 258–268.
Link, W. A., E. Cam, J. D. Nichols, and E. G. Cooch. 2002. Of BUGS and birds: Markov Chain
Monte Carlo for hierarchical modeling in wildlife research. J. Wildl. Manage. 66: 227–291.
Lukacs, P. M., W. L. Thompson, W. L. Kendall, W. R. Gould, P. F. Doherty Jr., K. P. Burnham,
and D. R. Anderson. 2007. Concerns regarding a call for pluralism of information theory and
hypothesis testing. J. Appl. Ecol. 44: 456–460.
MacKenzie, D. I., J. D. Nichols, J. A. Royle, K. H. Pollock, L. L. Bailey, and J. E. Hines. 2006.
Occupancy Estimation and Modeling. Academic, Burlington, MA.
Maindonald, J. H., and J. Braun. 2003. Data Analysis and Graphics Using R. Cambridge
University, Cambridge.
MATLAB. 2005. Learning MATLAB. The MathWorks, Inc., Natick, MA.
Manly, B. F. J. 1991. Randomization and Monte Carlo Methods in Biology. Chapman and Hall,
New York.
McCullough, B. D., and B. Wilson. 1999. On the accuracy of statistical procedures in Microsoft
Excel 97. Comput. Stat. Data Anal. 31: 27–37.
McCullough, B. D., and B. Wilson. 2002. On the accuracy of statistical procedures in Microsoft
Excel 2000 and Excel XP. Comput. Stat. Data Anal. 40: 713–721.
McCullough, B. D., and B. Wilson. 2005. On the accuracy of statistical procedures in Microsoft
Excel 2003. Comput. Stat. Data Anal. 49: 1224–1252.
Minitab. 2003. MINITAB Statistical Software, Release 14 for Windows. State College, Pennsylvania.
Mitchell, W. A. 1986. Deer spotlight census: Section 6.4.3, U.S. Army Corps of Engineers Wildlife
Resources Management Manual. Technical Report EL-86-53, U.S. Army Engineer Waterways
Experiment Station, Vicksburg, MS.
Mood, A. M., F. A. Graybill, and D. C. Boes. 1974. Introduction to the Theory of Statistics, 3rd
Edition, McGraw-Hill, Boston, MA.
Nichols, J. D., J. E. Hines, and K. H. Pollock. 1984. The use of a robust capture–recapture design
in small mammal population studies: a field example with Microtus pennsylvanicus. Acta
Theriologica 29: 357–365.
Nichols, J. D., J. E. Hines, J. R. Sauer, F. W. Fallon, J. E. Fallon, and P. J. Heglund. 2000. A double
observer approach for estimating detection probability and abundance from point counts. Auk
117(2): 393–408.
Norman, G. W., M. M. Conner, J. C. Pack, and G. C. White. 2004. Effects of fall hunting on sur-
vival of male wild turkeys in Virginia and West Virginia. J. Wildl. Manage. 68: 393–404.
Otis, D. L., K. P. Burnham, G. C. White, and D. R. Anderson. 1978. Statistical inference from
capture data on closed animal populations. Wildl. Monogr. 62: 1–135.
Payne, R. W., D. A. Murray, S. A. Harding, D. B. Baird, and D. M. Soutar. 2006. GenStat for
Windows, 9th Edition. Introduction. VSN International, Hemel Hempstead.
Peterjohn, B. G., J. R. Sauer, and W. A. Link. 1996. The 1994 and 1995 summary of the North
American Breeding Bird Survey. Bird Popul. 3: 48–66.
Pollock, K. H., J. D. Nichols, C. Brownie, and J. E. Hines. 1990. Statistical inference for capture–
recapture experiments. Wildl. Monogr. 107: 1–97.
R Development Core Team. 2006. R: a language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. ISBN 3–900051–07–0, URL http://
www.R-project.org
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University, Cambridge.
Ripley, B. D. 2002. Statistical methods need software: a view of statistical computing. Opening
Lecture, RSS Statistical Computing Section.
Robbins, C. S., D. Bystrack, and P. H. Geissler. 1986. The breeding bird survey: the first 15 years,
1965–1979. Resource Publication no. 157, U.S. Department of the Interior, Fish and Wildlife
Service, Washington, DC.
Rosenstock, S. S., D. R. Anderson, K. M. Giesen, T. Leukering, and M. F. Carter. 2002. Landbird
counting techniques: current practices and an alternative. Auk 119(1): 46–53.
Royle, J. A., and J. D. Nichols. 2003. Estimating abundance from repeated presence-absence data
or point counts. Ecology 84: 777–790.
SAS Institute Inc. 2000. SAS language reference: dictionary, version 8. SAS Institute, Inc., North
Carolina.
Sauer, J. R., B. G. Peterjohn, and W. A. Link. 1994. Observer differences in the North American
Breeding Bird Survey. Auk 111: 50–62.
Schwarz, C. J. and G. A. F. Seber. 1999. Estimating animal abundance: review III. Stat. Sci. 14:
427–456.
Sinclair, A. R. E. 1991. Science and the practice of wildlife management. J. Wildl. Manage. 55:
767–773.
Skalski, J. R., and D. S. Robson. 1992. Techniques for Wildlife Investigations: Design and
Analysis of Capture Data. Academic, San Diego, CA.
SPSS Inc. 1999. SPSS Base 10.0 for Windows User’s Guide. SPSS Inc., Illinois.
StataCorp. 2005. Stata Statistical Software: Release 9. Texas.
StatSoft. 2003. STATISTICA data analysis software system, version 6. Oklahoma.
Seber, G. A. F. 1982. The Estimation of Animal Abundance and Related Parameters, 2nd Edition.
Griffin, London.
Steidl, R. J. 2006. Model selection, hypothesis testing, and risks of condemning analytical tools.
J. Wildl. Manage. 70: 1497–1498.
Steidl, R. J., J. P. Hayes, and E. Schauber. 1997. Statistical power in wildlife research. J. Wildl.
Manage. 61: 270–279.
Stephens, P. A., S. W. Buskirk, and C. M. del Rio. 2007a. Inference in ecology and evolution.
Trends Ecol. Evol. 22: 192–197.
Stephens, P. A., S. W. Buskirk, G. D. Hayward, and C. M. Del Rio. 2007b. A call for statistical
pluralism answered. J. Appl. Ecol. 44: 461–463.
Stewart-Oaten, A., W. W. Murdoch, and K. R. Parker. 1986. Environmental impact assessment:
“Pseudoreplication” in time? Ecology 67: 929–940.
Sutherland, W. J. 2006. Planning a research programme, in W. J. Sutherland, Ed. Ecological
Census Techniques, 2nd Edition, pp. 1–10. Cambridge University, Cambridge.
SYSTAT. 2002. SYSTAT for Windows, version 10.2. SYSTAT software Inc., California.
Thompson, S. K. 2002. Sampling, 2nd Edition. John Wiley and Sons, New York.
Thompson, W. L. 2002. Towards reliable bird surveys: accounting for individuals present but not
detected. Auk 119(1): 18–25.
Thompson, S. K., and G. A. F. Seber. 1996. Adaptive Sampling. John Wiley and Sons, New
York.
Thompson, W. L., G. C. White, and C. Gowan. 1998. Monitoring vertebrate populations.
Academic, New York.
Tufte, E. R. 1983. The visual display of quantitative information. Graphics, Cheshire, CT.
Tufte, E. R. 2001. The visual display of quantitative information, 2nd Edition. Graphics, Cheshire,
CT.
Venables, W. N., and B. D. Ripley. 2002. Modern applied statistics with S, 4th Edition. Springer-
Verlag, New York.
Verner, J., and K. A. Milne. 1990. Analyst and observer variability in density estimates from spot
mapping. Condor 92: 313–325.
Walters, C. J. 1986. Adaptive Management of Renewable Resources. Macmillan, New York.
Walters, C. J., and C. S. Holling. 1990. Large-scale management experiments and learning by
doing. Ecology 71: 2060–2068.
White, G. C. 1996. NOREMARK: population estimation from mark-resighting surveys. Wildl.
Soc. Bull. 24: 50–52.
White, G. C., and K. P. Burnham. 1999. Program MARK: survival estimation from populations of
marked animals. Bird Study 46(Suppl.): 120–139.
Williams, B. K. 1996. Adaptive optimization and the harvest of biological populations. Math.
Biosci. 136: 1–20.
Williams, B. K., J. D. Nichols, and M. J. Conroy. 2002. Analysis and Management of Animal
Populations. Academic, San Diego, CA.
Chapter 3
Experimental Designs
3.1 Introduction
3.2 Principles
The relatively large geographic areas of interest, the amount of natural variability
(noise) in the environment, the difficulty of identifying the target population, the
difficulty of randomization, and the paucity of good controls make wildlife studies
challenging. Wildlife studies typically focus on harvestable species and relatively
scarce species of concern (e.g., threatened and endangered species) and factors that
influence their abundance (e.g., death, reproduction, and use). In wildlife studies,
the treatment is usually a management activity, land-use change, or other perturbation
(e.g., a contamination event) potentially affecting a wildlife population. Additionally,
this event could influence populations over an area much larger than the geographic
area of the treatment. In most instances, quantification of the magnitude and dura-
tion of the treatment effects necessarily requires an observational study, because
there usually is not a random selection of treatment and control areas. Early speci-
fication of the target population is essential in the design of a study. If investigators
can define the target population, then decisions about the basic study design and
sampling are much easier and the results of the study can be appropriately applied
to the population of interest.
Hurlbert (1984) divided experiments into two classes: mensurative and manipu-
lative. Mensurative studies involve making measurements of uncontrolled events at
one or more points in space or time, with space and time being the only experimental
variables or treatments. Mensurative studies are more commonly termed observational
studies, a convention we adopt. Observational studies can include a wide
range of designs including the BACI, line-transect surveys for estimating abun-
dance, and sample surveys of resource use. The important point here is that all these
studies are constrained by a specific protocol designed to answer specific questions
or address hypotheses posed prior to data collection and analysis. Manipulative
studies involve much more control of the experimental conditions: there are always
two or more treatments, different experimental units receive different treatments,
and treatments are applied at random.
Eberhardt and Thomas (1991), as modified by Manly (1992), provided a useful
and more detailed classification of study methods (Fig. 3.1). The major classes in
their scheme are studies where the observer has control of events (manipulative
experiments) and the study of uncontrolled events. Replicated and unreplicated
manipulative experiments follow the classical experimental approach described in
most statistics texts. Many of the designs we discuss are appropriate for these
experiments. Their other category of manipulative experiment, sampling for mode-
ling, deals with the estimation of parameters of a model hypothesized to represent
the investigated process (see Chap. 4).
Fig. 3.1 Classification scheme of the types of research studies as proposed by Eberhardt and
Thomas (1991) and modified by Manly (1992). Reproduced from Eberhardt and Thomas (1991) with
kind permission from Springer Science + Business Media
3.3 Philosophies
Scientific research is conducted under two broad and differing philosophies for mak-
ing statistical inferences: design/data-based and model-based. These differing phi-
losophies are often confused but both rely on current data to some degree and aim
to provide statistical inferences. There is a continuum from strict design/data-based
analysis (e.g., finite sampling theory [Cochran 1977] and randomization testing
[Manly 1991]) to pure model-based analysis (e.g., global climate models and habitat
evaluation procedures [HSI/HEP] using only historical data [USDI 1987]). A com-
bination of these two types of analyses is often employed in wildlife studies, result-
ing in inferences based on a number of interrelated arguments. For more detailed
discussion on design-based and model-based approaches see Chap. 4.
Inferences from wildlife studies often require mixtures of the strict design/data-
based and pure model-based analyses. Examples of analyses using mixtures of
study designs include:
1. Design/data-based studies conducted on a few target wildlife species
2. Manipulative tests using surrogate species to estimate the effect of exposure to
some perturbation on the species of concern (Cade 1994)
3.4 Replication, Randomization, Control, and Blocking
Fisher (1966) defined the traditional design paradigm for the manipulative experiment
in terms of replication, randomization, control, and blocking, introduced
in Chap. 2. Two additional methods are useful for increasing the precision of
studies in the absence of increased replication:
1. Grouping experimental units into homogeneous blocks within which treatments
are randomly allocated (blocking)
2. Using analysis of covariance (ANCOVA) when analyzing the response to a treatment,
to account for the added influence of variables having a measurable effect
on the dependent variable
3.4.1 Replication
3.4.2 Randomization
As with replication, an unbiased set of independent data is essential for estimating the
error variance and for most statistical tests of treatment effects. Although truly
unbiased data are unlikely, particularly in wildlife studies, a randomized sampling
method can help reduce bias and dependence of data and their effects on the accu-
racy of estimates of parameters. A systematic sample with a random start is one
type of randomization (Krebs 1989).
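A systematic sample with a random start can be sketched as follows; the plot numbering and sample size are hypothetical.

```python
import random

def systematic_sample(units, n, rng=None):
    """Systematic sample of roughly n units: choose a random start within
    the first sampling interval, then take every k-th unit thereafter."""
    rng = rng or random.Random()
    k = max(1, len(units) // n)   # sampling interval
    start = rng.randrange(k)      # the random start
    return units[start::k]

plots = list(range(1, 101))       # e.g., 100 numbered survey plots
sample = systematic_sample(plots, 10, rng=random.Random(7))
```

Every unit has the same inclusion probability, which is what distinguishes this from sampling "representative" or "typical" locations.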
Collecting data from representative locations or typical settings is not random
sampling. If landowners preclude collecting samples from private land within a
study area, then sampling is not random for the entire area. In studies conducted on
representative study areas, statistical inference is limited to the protocol by which
the areas are selected. If private lands cannot be sampled and public lands are sam-
pled by some unbiased protocol, statistical inference is limited to public lands. The
selection of a proper sampling plan (see Chap. 4) is a critical step in the design of
a project and may be the most significant decision affecting the utility of the data
when the project is completed. If the objective of the study is statistical inference
to the entire area, yet the sampling is restricted to a subjectively selected portion of
the area, then there is no way to meet the objective with the study design. The infer-
ence to the entire area is reduced from a statistical basis to expert opinion.
Replication can increase the precision of an experiment (see Chap. 2), although this
increased precision can be expensive. As discussed by Cochran and Cox (1957) and
Cox (1958), the precision of an experiment can also be increased through:
1. Use of experimental controls
2. Refinement of experimental techniques, including greater sampling precision
within experimental units
3. Improvement of experimental designs, including stratification and measurements
of nontreatment factors (covariates) potentially influencing the experiment
Good experimental design should strive to improve confidence in cause and effect
conclusions from experiments through the control (standardization) of related vari-
ables (Krebs 1989).
Analysis of covariance (ANCOVA) uses information measured on related variables as an alternative to
standardizing variables (Green 1979). For example, understanding differences in
predator use between areas improves when considered in conjunction with factors
influencing use, such as the relative abundance of prey in each area. These factors
are often referred to as concomitant variables or covariates. ANCOVA combines
analysis of variance (ANOVA) and regression to assist interpretation of data when
no specific experimental controls have been used (Steel and Torrie 1980). This
analysis method allows adjustment of variables measured for treatment effects for
differences in other independent variables also influencing the treatment response
variable. ANCOVA assists in controlling error and increasing precision of
experiments.
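The adjustment that ANCOVA performs can be sketched for the simplest case of a single covariate: group means of the response are adjusted using the pooled within-group regression slope. The Python sketch below uses invented prey-abundance (covariate) and predator-use (response) numbers purely for illustration:

```python
from statistics import mean

# Hypothetical data: predator use (y) in two areas, with prey abundance (x)
# as a covariate; all numbers are invented for illustration.
area_A = [(10, 4), (12, 5), (14, 6), (16, 7)]    # (prey x, predator y)
area_B = [(11, 7), (13, 8), (15, 9), (17, 10)]

def ancova_adjusted_means(groups):
    """Adjust group means of y for a covariate x using the pooled
    within-group regression slope (one-covariate ANCOVA)."""
    grand_x = mean(x for g in groups for x, _ in g)
    # Pooled within-group slope: sum of within-group Sxy / sum of within Sxx
    sxy = sxx = 0.0
    for g in groups:
        mx, my = mean(x for x, _ in g), mean(y for _, y in g)
        sxy += sum((x - mx) * (y - my) for x, y in g)
        sxx += sum((x - mx) ** 2 for x, _ in g)
    b = sxy / sxx
    return [mean(y for _, y in g) - b * (mean(x for x, _ in g) - grand_x)
            for g in groups]

adjusted = ancova_adjusted_means([area_A, area_B])
print(adjusted)
```

Here the raw difference in mean predator use (3.0) shrinks to 2.5 after adjusting for the difference in prey abundance, illustrating how the covariate absorbs part of the apparent treatment effect.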
Precision can also be improved using stratification, or assigning treatments (or
sampling effort) to homogeneous strata, or blocks, of experimental units.
Stratification can occur in space (e.g., units of homogeneous vegetation) and in
time (e.g., sampling by season). Strata should be small enough to maximize homo-
geneity, keeping in mind that smaller blocks may increase sample size require-
ments. For example, when stratifying an area by vegetation type, each stratum
should be small enough to ensure a relatively consistent vegetation pattern within
strata. Nevertheless, stratification requires some minimum sample size necessary to
make estimates of treatment effects within strata. It becomes clear that stratification
for a variable (e.g., vegetation type) in finer and finer detail will increase the mini-
mum sample size requirement for the area of interest. If additional related variables
are controlled for (e.g., treatment effects by season), then sample size requirements
can increase rapidly. Stratification also assumes the strata will remain relatively
consistent throughout the life of the study, an assumption often difficult to meet in
long-term field studies.
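A common way to spread sampling effort across strata is proportional allocation, in which each stratum receives samples in proportion to its size. The sketch below uses invented stratum sizes (numbers of plots per vegetation type); note that rounding can leave the allocated total slightly off the target in general:

```python
# Proportional allocation of sampling effort across strata;
# stratum sizes (plots per vegetation type) are hypothetical.
strata = {"meadow": 120, "shrub": 60, "forest": 20}
total_n = 50                                   # total sample size available

N = sum(strata.values())
allocation = {name: round(total_n * size / N) for name, size in strata.items()}
print(allocation)   # larger strata receive proportionally more samples
```

As the text warns, subdividing into finer and finer strata drives up the minimum sample size: each additional stratum must still receive enough samples to estimate treatment effects within it.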
Once the decision is made to conduct a wildlife study, several practical issues must
be considered:
1. Area of interest (area to which statistical and deductive inferences will be made).
Options include the study site(s), the region containing the study sites, the local
area used by the species of concern, or the population potentially affected (in
this case, population refers to the group of animals interbreeding and sharing
common demographics).
2. Time of interest. The period of interest may be, for example, diurnal, nocturnal,
seasonal, or annual.
3. Species of interest. The species of interest may be based on behavior, existing
theories regarding species and their response to the particular perturbation,
abundance, or legal/social mandate.
4. Potentially confounding variables. These may include landscape issues (e.g.,
large-scale habitat variables), biological issues (e.g., variable prey species abun-
dance), land use issues (e.g., rapidly changing crops and pest control), weather,
study area access, etc.
5. Time available to conduct studies. The time available to conduct studies given
the level of scientific or public interest, the timing of the impact in the case of
an accidental perturbation, or project development schedule in the case of a
planned perturbation will often determine how studies are conducted and how
much data can be collected.
6. Budget. Budget is always a consideration for potentially expensive studies.
Budget should not determine what questions to ask but will influence how they
are answered. Budget will largely determine the sample size, and thus the degree
of confidence one will be able to place in the results of the studies.
7. Magnitude of anticipated effect. The magnitude of the perturbation or the impor-
tance of the effect to the biology of the species will often determine the level of
concern and the required level of precision.
The remainder of this chapter is devoted to a discussion of some of the more com-
mon experimental designs used in biological studies. We begin with the simplest
designs and progress toward the more complex while providing examples of practi-
cal applications of these designs to field studies. These applications usually take
liberties with Fisher’s requirements for designs of true experiments and thus we
refer to them as quasiexperiments. Since the same design and statistical analysis
can be used with either observational or experimental data, we draw no distinction
between the two types of study. Throughout the remainder of this chapter, we refer
to treatments in a general sense in that treatments may be manipulations by the
experimenter or variables of interest in an observational study.
Experiments are often classified based on the number of types of treatments that are
applied to experimental units. A one-factor experiment uses one type of treatment
or one classification factor in the experimental units in the study, such as all the
animals in a specific area or all trees of the same species in a management unit. The
treatment may be different levels of a particular substance or perturbation.
The simplest form of a biological study is the comparison of the means of two pop-
ulations. An unpaired study design estimates the effect of a treatment by examining
the difference in the population mean for a selected parameter in a treated and con-
trol population. In a paired study design, the study typically evaluates changes in
study units paired for similarity. This may take the form of studying a population
before and after a treatment is applied, or by studying two very similar study units.
For example, one might study the effects of a treatment by randomly assigning
treatment and control designation to each member of several sets of twins or to the
right and left side of study animals, or study the effectiveness of two measurement
methods by randomly applying each method to subdivided body parts or plant
materials.
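The gain from pairing can be seen by computing both the unpaired (two-sample) and paired t statistics on the same data. The sketch below uses invented measurements, e.g., a response on the treated and control side of the same study animals; because pairing removes between-animal variation, the paired statistic is typically much larger:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements (numbers invented for illustration)
treated = [5.1, 6.0, 4.8, 5.9, 6.3, 5.5]
control = [4.6, 5.2, 4.5, 5.1, 5.8, 5.0]
n = len(treated)

# Unpaired (two-sample) t statistic with pooled variance
sp2 = (stdev(treated) ** 2 + stdev(control) ** 2) / 2
t_unpaired = (mean(treated) - mean(control)) / sqrt(sp2 * 2 / n)

# Paired t statistic: analyze the within-pair differences
diffs = [t - c for t, c in zip(treated, control)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

print(round(t_unpaired, 2), round(t_paired, 2))
```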
Comparison of population means is common in impact assessment. For exam-
ple, as a part of a study of winter habitat use of mule deer (Odocoileus hemionus)
in an area affected by gas exploration, development, and production, Sawyer et al.
(2006) conducted quadrat counts of deer using the winter range from 2001 to 2005
and estimated a 49% decline in deer density after development. As Underwood
(1997) points out, this is the classic “before–after” paired comparison where den-
sity is estimated before the treatment (gas development) and then compared to
density estimates after development. Even though this rather dramatic decline in
deer density is of concern, and represents a valid test of the null hypothesis that
density will not change after development has occurred, the attribution of the
change to development is not supported because of other influences potentially
acting on the population. These other potential factors are usually referred to as
confounding influences (Underwood 1997). In this case, other plausible explana-
tions for the decline in density might be a regional decline in deer density due to
weather or a response to competition with livestock for forage. Another approach
to designing a study to evaluate the impacts of gas development on this group of
deer is to measure density in both a treatment and a control area, where the com-
parison is the density in two independent groups of deer in the same region with
similar characteristics except for the presence (treatment) or absence (control) of
gas development.
While there is still opportunity for confounding, and any cause-and-effect conclusion remains strictly professional judgment since this is a mensurative study, the presence or absence of a similar decline in both the treatment and control groups of animals strengthens the assessment of whether an impact occurred. This example illustrates a common problem in wildlife studies: there is nothing statistically wrong with the study, and one can confidently reject the null hypothesis of no change in density after development. The dilemma is that there is no straightforward way of attributing the change to the treatment of interest (i.e., gas development). Fortunately, for Sawyer et al. (2006), contemporary estimates of habitat use
made before and after gas development illustrated a rather clear reduction of avail-
able habitat resulting from gas development, which provides support for the conclu-
sion that reduced density may be at least partially explained by development.
Another example of the value of paired comparisons is taken from the Coastal
Habitat Injury Assessment (CHIA) following the massive oil spill when the Exxon
Valdez struck Bligh Reef in Prince William Sound, Alaska in 1989 – the Exxon
Valdez oil spill (EVOS). Many studies evaluated the injury to marine resources fol-
lowing the spill of over 41 million liters of Alaska crude oil. Pairing of oiled and
unoiled sites within the area of impact of the EVOS was a centerpiece in the study
of shoreline impacts by the Oil Spill Trustees’ Coastal Habitat Injury Assessment
(Highsmith et al. 1993; McDonald et al. 1995; Harner et al. 1995). In this case,
beaches classified in a variety of oiled categories (none, light, moderate, and heavy)
were paired based on beach substrate type (exposed bedrock, sheltered bedrock,
boulder/cobble, and pebble/gravel). Measures of biological characteristics were
taken at each site (e.g., barnacles per square meter, macroinvertebrates per square
meter, intertidal fish, and algae per square meter) and comparisons were made
between pairs of sites. The results were summarized as p-values (probabilities of observing differences as large as those seen under the hypothesis that oiling had no effect), and the p-values were combined using a meta-analysis approach (Manly 2001).
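One standard way to combine independent p-values is Fisher's method, in which X = -2 Σ ln(p_i) follows a chi-square distribution with 2k degrees of freedom under the null hypothesis (whether this was the exact approach used in the CHIA analysis is not specified here; the p-values below are invented). For even degrees of freedom the chi-square tail probability has a closed form, so the sketch needs only the standard library:

```python
from math import exp, factorial, log

def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0.
    For even df the chi-square tail has a closed form, so no SciPy is needed."""
    k = len(pvalues)
    x = -2.0 * sum(log(p) for p in pvalues)
    # P(Chi2 with 2k df > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    return exp(-x / 2) * sum((x / 2) ** i / factorial(i) for i in range(k))

# Hypothetical per-pair p-values from several oiled/unoiled site comparisons
combined = fisher_combined_p([0.08, 0.12, 0.30, 0.05])
print(round(combined, 4))
```

Note that several individually nonsignificant comparisons can yield a small combined p-value, which is precisely the appeal of a meta-analysis across many paired sites.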
Truly random allocation of treatments is often difficult to achieve for experimental units in ecological studies, and spatial segregation of experimental units can lead to erroneous results arising from naturally occurring gradients (e.g., elevation and exposure effects on plant growth). This is especially problematic with the small sample sizes common in field studies. A systematic selection of experimental
units (see Chap. 4) may reduce the effects of spatial segregation of units for a
given sample size while maintaining the mathematical properties of randomness.
Regardless, the natural gradients existing in nature make application of the com-
pletely randomized design inappropriate for most field studies.
For a hypothetical example of the completely randomized design, assume the
following situation. A farmer in Wyoming is complaining about the amount of
alfalfa consumed by deer in his fields. Since the wildlife agency must pay for veri-
fied claims of damage by big game, there is a need to estimate the effect of deer use
on production of alfalfa in the field. The biologist decides to estimate the damage
by comparing production in plots used by deer vs. control plots not used by deer
and divides the farmer’s largest uniform field into a grid of plots of equal size.
A sample of plots is then chosen by some random sampling procedure (see Chap. 4).
A deer-proof fence protects half of the randomly selected plots, while the other half serves as unprotected controls. The effect of deer use is the difference between estimated alfalfa production in the control and protected plots, as measured either by comparing the two sample means with a simple t-test or by comparing the overall variation between the grazed and ungrazed plots with ANOVA (Mead et al. 1993).
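The analysis of such a completely randomized design can be sketched as a one-way ANOVA computed from first principles; the alfalfa yields below are invented for illustration:

```python
from statistics import mean

# Hypothetical alfalfa yields (kg/plot): fenced (protected) vs unfenced plots
fenced = [9.1, 8.7, 9.5, 9.0, 8.9]
grazed = [7.8, 8.2, 7.5, 8.0, 7.9]

def one_way_anova_F(*groups):
    """F statistic for a completely randomized (one-way) design."""
    grand = mean(y for g in groups for y in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_b = len(groups) - 1
    df_w = sum(len(g) for g in groups) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w)

F = one_way_anova_F(fenced, grazed)
print(round(F, 2))
```

With only two groups, this F statistic is the square of the two-sample t statistic, so the two analyses mentioned in the text are equivalent; adding a third treatment (e.g., plots fenced against small herbivores only) simply passes a third group to the same function.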
An astute biologist who wanted to pay only for alfalfa consumed by deer could
add an additional treatment to the experiment. That is, a portion of the plots could
be fenced to allow deer use but exclude rabbits and other small herbivores that are
not covered by Wyoming’s damage law, without altering the design of the experi-
ment. The analysis and interpretation of this expanded experiment also remains
relatively simple (Mead et al. 1993).
In a real-world example, Stoner et al. (2006) evaluated the effects of exploitation level on cougars (Puma concolor) in Utah. This study used a two-way factorial ANOVA in a completely randomized design, allowing unequal variances, to test for age differences among treatment groups (site and sex combinations) in analyses of demographic structure, population recovery, and metapopulation dynamics.
While the simplicity of the completely randomized design is appealing, the lack of
any restriction in allocation of treatments even when differences in groups of
experimental units are known seems illogical. In ecological experiments and even
most controlled experiments in a laboratory, it is usually desirable to take advantage
of blocking or stratification (see Chap. 4 for discussion) as a form of error control.
In the deer example discussed earlier, suppose the biologist realizes there is a gradi-
ent of deer use with distance from cover. This variation could potentially bias
estimates of deer damage, favoring the farmer if by chance a majority of the plots
is near cover or favoring the wildlife agency if a majority of the plots is toward the
center of the field. Dividing the field into strata or blocks and estimating deer use
in each may improve the study. For example, the biologist might divide the field
into two strata, one including all potential plots within 50 m of the field edge and
one including the remaining plots. This stratification of the field into two blocks
restricts randomization by applying treatments to groups of experimental units that are more similar, yielding better estimates of the effect of deer use and a more equitable damage settlement.
In the experiment where blocking is used and each treatment is randomly
assigned within each block, the resulting design is called a randomized complete
block design (Table 3.1). Blocking can be based on a large number of factors poten-
tially affecting experimental variation. In animal studies, examples of blocks
include things such as expected abundance, territoriality, individual animal weights,
vegetation, and topographical features. Plant studies block on soil fertility, slope
gradient, exposure to sunlight, individual plant parts, or past management. In eco-
logical studies, it is common to block on habitat and across time. This form of
grouping is referred to as local control (Mead et al. 1993). The typical analysis of
randomized block designs is by ANOVA following the linear additive model
with the block × treatment interaction serving as the error estimate for hypothesis tests.
With proper blocking, no single treatment gains or loses advantage when compared
with another because of the characteristics of the units receiving the treatment.
If the units within blocks are homogeneous compared to units within other blocks,
the blocking reduces the effects of random variation among blocks on the errors
involved in comparing treatments. Notwithstanding, poorly designed blocking
creates more problems than it solves (see Chap. 4 for a discussion of problems
associated with stratification).
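The randomized complete block ANOVA described above, with the block × treatment interaction serving as the error term, can be sketched directly from the sums of squares; the yields below are invented, with rows as blocks and columns as treatments:

```python
from statistics import mean

# Hypothetical responses: rows = blocks (strata), columns = treatments
data = [
    [8.2, 7.1, 6.5],   # block 1
    [9.0, 7.9, 7.2],   # block 2
    [7.5, 6.8, 6.0],   # block 3
    [8.8, 7.6, 7.1],   # block 4
]

b, t = len(data), len(data[0])
grand = mean(y for row in data for y in row)
block_means = [mean(row) for row in data]
trt_means = [mean(col) for col in zip(*data)]

ss_total = sum((y - grand) ** 2 for row in data for y in row)
ss_block = t * sum((m - grand) ** 2 for m in block_means)
ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
ss_error = ss_total - ss_block - ss_trt   # block x treatment interaction

F = (ss_trt / (t - 1)) / (ss_error / ((b - 1) * (t - 1)))
print(round(F, 1))
```

Because block-to-block variation is removed from the error term, the treatment F statistic here is far larger than it would be if the same data were analyzed as a completely randomized design.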
Volesky et al. (2005) provide an example of the randomized complete block design, used to determine use and herbage production of cool-season graminoids in response to spring livestock grazing date and stocking rate in the Nebraska Sandhills. The study used spring grazing date as the main-plot factor and stocking rate as the split-plot factor (see Sect. 3.8.2), with a nongrazed control; combinations of grazing date and stocking rate formed the treatments. The analysis combined treatments across years, with years as fixed effects and blocks as random effects.
Bates et al. (2005) also used the randomized complete block design in a long-term
study of the successional trends following western juniper cutting. This study estab-
lished four blocks with each block divided into two plots and one plot within each
block randomly assigned the cutting treatment (CUT) and the remaining plot left as
woodland (WOODLAND). ANOVA was used to test for treatment effect on herba-
ceous standing crop (functional group and total herbaceous), cover (species and
functional group), and density (species and functional group). Cover and density of
shrubs and juniper were analyzed by species, with response variables analyzed as a randomized complete block across time. The final model included blocks (four
blocks, df = 3), years (1991–1997 and 2003, df = 7), treatments (CUT, WOODLAND,
df = 1), and year by treatment interaction (df = 7; with the error term df = 45).
A characteristic of the randomized block design discussed earlier was that each
treatment was included in each block. In some situations, blocks or budgets may
not be large enough to allow all treatments to be applied in all blocks. The incom-
plete block design results when each block has less than a full complement of treat-
ments. In a balanced incomplete block experiment (Table 3.2), all treatment effects
and their differences are estimated with the same precision, as long as every pair of
treatments occurs together the same number of times (Manly 1992). However,
analysis of incomplete block designs is considerably more complicated than com-
plete block designs. It is important to understand the analysis procedures before
implementing an incomplete block design. Example design and analysis methods
are discussed in Mead et al. (1993).
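The balance condition, namely that every pair of treatments occurs together in the same number of blocks, is easy to check programmatically. The sketch below uses a textbook-style design of four treatments in blocks of size three (the block layout is illustrative, not taken from Table 3.2):

```python
from itertools import combinations

# A balanced incomplete block design: 4 treatments, blocks of size 3
blocks = [("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"), ("B", "C", "D")]

# Balance requires every pair of treatments to co-occur in the same number
# of blocks (here, each pair should appear together exactly twice).
pair_counts = {}
for block in blocks:
    for pair in combinations(sorted(block), 2):
        pair_counts[pair] = pair_counts.get(pair, 0) + 1

print(pair_counts)
assert len(set(pair_counts.values())) == 1   # all pairs co-occur equally often
```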
The randomized block design is useful when one source of local variation exists.
When additional sources of variation exist, then the randomized block design can
Table 3.3 A Latin square experiment with two blocking factors (X and Y) each with four blocks
and four treatments (A, B, C, D)
Blocking factor (Y)
Blocking factor (X) 1 2 3 4
1 A B C D
2 B C D A
3 C D A B
4 D A B C
Reproduced from Morrison et al. (2001), with kind permissions from Springer Science + Business
Media
be extended to form a Latin square (Table 3.3). For example, in a study of the
effectiveness of some treatment, variation may be expected among plots, seasons,
species, etc. In a Latin square, symmetry is required so that each row and column
in the square is a unique block. The basic model for the Latin square design is as
follows:
Observed outcome = row effect + column effect + treatment effect + random unit
variation.
The Latin square design allows separation of variation from multiple sources at the
expense of df, potentially reducing the ability of the experiment to detect effect.
The Latin square design is useful when multiple causes of variation are suspected
but unknown. However, caution should be exercised when adopting this design. As
an example of the cost of the design, a 3 × 3 Latin square must reduce the mean square error by approximately 40% relative to a randomized block design of the same experiment to detect a treatment effect of a given size.
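The layout in Table 3.3 is a cyclic Latin square, which can be constructed by shifting the treatment sequence one position per row; the sketch below builds such a square and verifies the defining property that each treatment occurs exactly once in every row and column:

```python
def cyclic_latin_square(treatments):
    """Cyclic Latin square: row r is the treatment list shifted r positions."""
    n = len(treatments)
    return [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]

square = cyclic_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))

# Each treatment appears exactly once in every row and every column:
assert all(sorted(row) == ["A", "B", "C", "D"] for row in square)
assert all(sorted(col) == ["A", "B", "C", "D"] for col in zip(*square))
```

In practice, one would also randomize rows, columns, and treatment labels of such a square before use, so that the design remains a legitimate randomization.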
While the Latin square design is not a common study design in wildlife studies
it can be useful in some situations. For example, with the aid of George Baxter and
Lyman McDonald, both professors at the University of Wyoming, the Wyoming
Game and Fish Department used the Latin square design on a commercial fisheries
project involving grass carp (Ctenopharyngodon idella) in Wyoming. The Department
wanted to determine the cause and frequency of “large” year classes and estimated
abundance of young fish by different methods at beach sites to help answer this
question. The study used three sites, three sampling periods separated by some time
to let the fish settle down, and three types of gear (minnow seining, wing traps, and
minnow traps). The design was set up as a balanced 3 × 3 Latin square and analyzed by ANOVA, with sites as rows, times as columns, and gear type as the treatment; the response variable was p, the proportion of young-of-the-year fish caught. The Latin square is complete in that each gear type occurs exactly once at each site and in each time period. In addition to estimating the abundance of young fish,
the Department was interested in correcting seining data collected elsewhere for
biases relative to the “best” sampling method or the pooled proportions if there
were significant differences.
3.6.6 Summary
Obviously, the different levels of a single treatment in these designs are assumed to be independent, and the treatment response is assumed to be unaffected by interactions among treatment levels or between the treatment and the blocking factor. This might not present a problem if the interaction is zero, an unlikely situation in ecological
experiments. Heterogeneity in experimental units and strata (e.g., variation in
weather, vegetation, and soil fertility) is common in the real world and results in the
confounding of experimental error and interaction of block with treatment effects
(Underwood 1997). This potential lack of independence with a corresponding lack
of true replication can make interpretation of experiments very difficult, increasing
the effect size necessary for significance (increase in Type II error).
The three levels of a micronutrient combined with the two amounts of total forage (treatment factors) in the first example, and the grouping by sex (classification factor) combined with the three levels of nutrients in the second example, both result in a 2 × 3 factorial experiment (Table 3.4).
Multiple-factor designs occur when one or more classes of treatments are combined
with one or more classifications of experimental units. Continuing the deer feeding
experiment, a multiple-factor experiment might include both classes of treatment
and the classification of deer by sex resulting in a 2 × 2 × 3 factorial experiment
(Table 3.5).
Classification factors, such as sex and age, are not random variables but are fixed
in the population of interest and cannot be manipulated by the experimenter. On the
other hand, the experimenter can manipulate treatment factors, usually the main
point of an experiment (Manly 1992). It is not appropriate to think in terms of a
random sample of treatments, but it is important to avoid bias by randomizing the
application of treatments to the experimental units available in the different classes
of factors. In the example above, a probabilistic sample of female deer, selected from all females available for study, receives the different levels of the treatment.
In the relatively simple experiments with unreplicated single-factor designs, the
experimenter dealt with treatment effects as if they were independent. In the real
world, one would expect that different factors often interact. The ANOVA of facto-
rial experiments allows the biologist to consider the effect of one factor on another.
In the deer example, it is reasonable to expect that lactating females might react
differently to a given level of a nutrient, such as calcium, than would male deer.
Thus, in the overall analysis of the effect of calcium in the diet, it would be instruc-
tive to separate the effects of calcium and sex on body condition (main effects) from
the effects of the interaction of sex and calcium. The linear model for the factorial
experiment allows the subdivision of treatment effects into main effects and interac-
tions, allowing the investigation of potentially interdependent factors. The linear
model can be characterized as follows:
Observed outcome = main effect variable A + main effect variable B+(A)(B) inter-
action + Random unit variation
Table 3.5 An example of a 2 × 2 × 3 factorial experiment where the three levels of a micronutrient (factor A) are applied to experimental deer grouped by sex (factor B), half of which are fed a different amount of forage (factor C)
Factor B, sex Factor C, forage level Factor A, micronutrient
B1 c1 a1b1c1 a2b1c1 a3b1c1
c2 a1b1c2 a2b1c2 a3b1c2
B2 c1 a1b2c1 a2b2c1 a3b2c1
c2 a1b2c2 a2b2c2 a3b2c2
Reproduced from Morrison et al. (2001), with kind permissions from Springer Science + Business
Media
Mead et al. (1993) considered this characteristic one of the major statistical contri-
butions from factorial designs.
When interactions appear negligible, factorial designs have a second major ben-
efit referred to as “hidden replication” by Mead et al. (1993). Hidden replication
allows the use of all experimental units involved in the experiment in comparisons
of the main effects of different levels of a treatment when there is no significant
interaction. Mead et al. (1993) illustrated this increase in efficiency with a series of
examples showing the replication possible when examining three factors, A, B, and
C, each with two levels of treatment:
1. In the case of three independent comparisons, (a0b0c0) with (a1b0c0), (a0b1c0), and (a0b0c1), four replications of each treatment are possible, involving 24 experimental units. The variance of the estimate of the difference between the two levels of A (or B or C) is 2s²/4, where s² is the variance per plot.
2. Some efficiency is gained by reducing the use of treatment (a0b0c0): combining the four treatments (a0b0c0), (a1b0c0), (a0b1c0), and (a0b0c1) into one experiment allows six replications of each. Thus, the variance of the estimate of the difference between any two levels is 2s²/6, two-thirds of the variance in case 1.
3. There are eight factorial treatments possible from combinations of the three fac-
tors with their two levels. When these treatments are combined with three replica-
tions, each comparison of two levels of a factor includes 12 replicates. All 24
experimental units are involved with each comparison of a factor’s two levels.
Thus, in the absence of interaction, the factorial experiment can be more economi-
cal, more precise, or both, than experiments looking at a single factor at a time.
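The efficiency gain from hidden replication is just arithmetic on the three variances above, which the following lines make explicit (s2 denotes the per-plot variance; all three schemes use 24 experimental units):

```python
# Variance of the estimated difference between the two levels of factor A
# under Mead et al.'s three schemes (s2 = per-plot variance, 24 units each):
s2 = 1.0
var_independent = 2 * s2 / 4    # case 1: 4 replicates per comparison
var_shared_ctrl = 2 * s2 / 6    # case 2: 6 replicates per comparison
var_factorial = 2 * s2 / 12     # case 3: hidden replication, 12 per level

print(var_independent, var_shared_ctrl, var_factorial)
```

The full factorial (case 3) cuts the variance of each main-effect comparison to one-third of that in case 1 with no increase in the number of experimental units, provided the interactions really are negligible.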
There is more at stake than simply an increase in efficiency when deciding whether
to select a factorial design over independent comparisons. The counterargument for
case 1 above is that the analysis becomes conditional on the initial test of interac-
tion, with the result that main effect tests of significance levels may be biased.
Perhaps the only situation where example 1 might be desirable is in a study where
sample sizes are extremely limited.
Multiple-factor designs can become quite complicated, and interactions are the
norm. Although there may be no theoretical limit to the number of factors that can be
included in an experiment, it is obvious that sample size requirements increase dra-
matically as experimental factors with interactions increase. This increases the cost of
experiments and makes larger factorial experiments impractical. Also, the more com-
plicated the experiment is, the more difficulty one has in interpreting the results.
Factorial designs are reasonably common in ecology studies. Mieres and Fitzgerald
(2006) used both two-factor and three-factor models in studying the monitoring and
management of the harvest of tegu lizards (Tupinambis spp.) in Paraguay. The study
applied general linear models (two-factor and three-factor ANOVA) to test the null
hypothesis of no significant differences in mean size of males and females of each
species among years and among check stations. To analyze data from tanneries, they
used separate two-factor ANOVAs, with interaction (year and sex as factors), for each
species to test the hypothesis that body size varied by year and sex. To test for size
variation in tegu skins sampled in the field, the study used three-factor ANOVAs, with
interaction (year, sex, and check station as factors), to test the hypothesis that body
size varied by year, sex, and check station.
In a study of bandwidth selection for fixed-kernel analysis of animal utilization
distributions, Gitzen et al. (2006) used mixtures of bivariate normal distributions to
model animal location patterns. The study varied the degree of clumping of simu-
lated locations to create distribution types that would approximate a range of real
utilization distributions. Simulations followed a 4 × 3 × 3 factorial design, with
factors of distribution type (general, partially clumped, all clumped, nest tree),
number of component normals (2, 4, 16), and sample size (20, 50, 150).
The desire to include a large number of factors in an experiment has led to the
development of complex experimental designs. For an illustration of the many
options for complex designs, the biologist should consult textbooks with details on
the subject (e.g., Montgomery 1991; Milliken and Johnson 1984; Mead et al. 1993;
Underwood 1997). The object of these more complex designs is to allow the study
of as many factors as possible while conserving observations. One such design is a
form of the incomplete block design known as confounding. Mead et al. (1993)
described confounding as the allocation of the more important treatments in a ran-
domized block design so that differences between blocks cancel out the same way
they do for comparisons between treatments in a randomized block design. The
remaining factors of secondary interest, including those assumed to have negligible interactions, are included as treatments in each block, allowing estimation of their main effects while sacrificing the ability to estimate their interactions.
Thus, block effects are confounded with the effects of interactions. The resulting
allocation of treatments becomes an incomplete block with a corresponding reduc-
tion in the number of treatment comparisons the experimenter must deal with.
Mead et al. (1993) provided two examples that help describe the rather complicated
blocking procedure. These complicated designs should not be attempted without
consulting a statistician and unless the experimenter is confident about the lack of
significant interaction in the factors of secondary interest.
Split-plot designs are a form of nested factorial design commonly used in agricul-
tural and biological experiments. The study area is divided into blocks following
the principles for blocking discussed earlier. The blocks are subdivided into rela-
tively large plots called main plots, which are then subdivided into smaller plots
called split plots, resulting in an incomplete block treatment structure. In a two-
factor design, one factor is randomly allocated to the main plots within each block.
The second factor is then randomly allocated to each split plot within each main
plot. The design allows some control of the randomization process within a legiti-
mate randomization procedure.
Table 3.6 illustrates a simple two-factor split-plot experiment. In this example,
four levels of factor A are allocated as if the experiment were a single-factor com-
pletely randomized design. The three levels of factor B are then randomly applied
to each level of factor A. It is possible to expand the split-plot design to include
multiple factors and to generalize the design by subdividing split plots, limited only
by the minimal practical size of units for measurements (Manly 1992).
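The two-stage randomization described above, factor A assigned to main plots and factor B assigned independently within each main plot, can be sketched as follows (level names are illustrative, mirroring the four-by-three structure of the Table 3.6 example):

```python
import random

# Sketch of split-plot randomization: factor A is randomized to main plots,
# factor B is randomized independently within each main plot.
random.seed(7)
levels_A = ["a1", "a2", "a3", "a4"]
levels_B = ["b1", "b2", "b3"]

random.shuffle(levels_A)              # main-plot randomization
layout = {}
for main_plot in levels_A:
    subplots = levels_B[:]
    random.shuffle(subplots)          # split-plot randomization
    layout[main_plot] = subplots

for main_plot, subplots in layout.items():
    print(main_plot, subplots)
```

Each main plot receives its own independent randomization of the split-plot factor, which is what distinguishes this layout from a single completely randomized assignment of all factor combinations.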
The ANOVA of the split-plot experiment also occurs at multiple levels. At the
main plot level, the analysis is equivalent to a randomized block experiment. At the
split-plot level, variation is divided into variation among split-plot treatments, inter-
action of split-plot treatments with main effects, and a second error term for split
plots (Mead et al. 1993). A thorough discussion of the analysis of split-plot experi-
ments is presented in Milliken and Johnson (1984). It should be recognized that in
the split-plot analysis, the overall precision of the experiment is the same as the
basic design.
The split-plot design is useful in experiments when the application of one or
more factors requires a much larger experimental unit than for others. For example,
in comparing the suitability of different species of grass for revegetation of clear-
cuts, the grass plots can be much smaller, e.g., a few square meters, as compared
with the clear-cuts that might need to be several acres to be practical. The design
can also be used when variation is known to be greater with one treatment vs.
another, with the potential for using less material and consequently saving money.
The design can be useful in animal and plant studies where litters of animals or
closely associated groups of individual plants can be used as main plots and the
individual animals and plants used as split plots.
Manly (1992) listed two reasons to use the split-plot design. First, it may be
convenient or necessary to apply some treatments to whole plots at the same time.
Second, the design allows good comparisons between the levels of the factor that is
applied at the subplot level at the expense of the comparisons between the main
plots, since experimental error should be reduced within main plots. However,
Mead et al. (1993) pointed out that there is actually a greater loss of precision at the
main plot level than is gained at the level of split-plot comparisons. They also indicated that there is a loss of replication in many of the comparisons of combinations of main-plot and split-plot treatments, resulting in a loss of precision. These authors recommend against the split-plot design except where practically necessary. Underwood
(1997) also warned against this lack of replication and the potential lack of inde-
pendence among treatments and replicates. This lack of independence results
because, in most layouts of split-plot designs, main plots and split plots tend to be
spatially very close.
Barrett and Stiling (2006) used a split-plot design in a study of Key deer
(Odocoileus virginianus clavium) impacts on hardwood hammocks near urban
areas in the Florida Keys. The study used a split-plot ANOVA model to test each
response variable (total basal area of large trees and percentage of canopy cover)
with deer density (low and high) and distance (urban and exurban) as factors with
island (Big Pine, No Name, Cudjoe, Sugarloaf) nested within levels of deer density.
The study found evidence that deer density interacted with distance indicating dif-
ferences in responses between urban and exurban hammock stands.
3.9 Analysis of Covariance

ANCOVA uses the concepts of ANOVA and regression (Huitema 1980; Winer et al.
1991; Underwood 1997) to improve studies by separating treatment effects on the
response variable from the effects of confounding variables (covariates). ANCOVA
can also be used to adjust response variables and summary statistics (e.g., treatment
means), to assist in the interpretation of data, and to estimate missing data (Steel
and Torrie 1980). It is appropriate to use ANCOVA in conjunction with most of the
previously discussed designs.
Earlier in this chapter, we introduced the concept of increasing the precision of
studies by the use of ANCOVA when analyzing the response to a treatment by considering additional variables that have a measurable influence on the dependent variable. For example, in the study of fatalities associated with different
wind turbines, Anderson et al. (1999) recommended measuring bird use and the rotor-
swept area as covariates. It seems logical that the more birds use the area around tur-
bines and the larger the area covered by the turbine rotor, the more likely that bird
collisions might occur. Figure 3.2 provides an illustration of a hypothetical example
of how analysis of bird fatalities associated with two turbine types can be improved
by the use of covariates. In the example, the average number of fatalities per turbine
is much higher in the area with turbine type A vs. turbine type B. However, when the
fatalities are adjusted for differences in bird use, the ratio of fatalities per unit of bird
use is the same for both turbine types, suggesting no true difference in risk to birds
from the different turbines. Normally, in error control, multiple regression is used to
assess the difference between the experimental and control groups resulting from the
treatment after allowing for the effects of the covariate (Manly 1992).
Fig. 3.2 Illustration of hypothetical example of bird fatalities associated with two turbine types
(A and B) where the mean fatalities are adjusted for differences in bird use. The average number
of fatalities per turbine is much higher associated with turbine type A vs. turbine type B, while the
ratio of fatalities per unit of bird use is the same for both turbine types. Reproduced from Morrison
et al. (2001) with kind permission from Springer Science + Business Media
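The adjustment illustrated in Fig. 3.2 can be reproduced numerically. In the sketch below the counts are invented: the raw fatality means differ sharply between turbine types, yet fatalities per unit of bird use are identical:

```python
# Hypothetical fatality counts and bird-use indices at individual turbines.
fatalities = {"A": [8, 10, 12], "B": [2, 3, 4]}
bird_use = {"A": [40, 50, 60], "B": [10, 15, 20]}

def mean(xs):
    return sum(xs) / len(xs)

# Unadjusted comparison: type A looks far more dangerous.
raw_means = {t: mean(xs) for t, xs in fatalities.items()}

# Covariate-adjusted comparison: fatalities per unit of bird use.
adjusted = {t: sum(fatalities[t]) / sum(bird_use[t]) for t in fatalities}

print(raw_means)   # {'A': 10.0, 'B': 3.0}
print(adjusted)    # {'A': 0.2, 'B': 0.2} -- no difference in per-use risk
```

A full ANCOVA would fit the covariate and treatment jointly in a linear model; this sketch shows only the essential idea of the adjustment.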
To this point, we have dealt with designs concerned with the effect of a treatment on one response variable (univariate methods). The point of multivariate analysis is
to consider several related random variables simultaneously, each one being consid-
ered equally important at the start of the analysis (Manly 1986). There is a great deal
of interest in the simultaneous analysis of multiple indicators (multivariate analysis)
to explain complex relationships among many different kinds of response variables
over space and time. This is particularly important in studying the impact of a pertur-
bation on the species composition and community structure of plants and animals
(Page et al. 1993; Stekoll et al. 1993). Multivariate techniques include multidimen-
sional scaling and ordination analysis by methods such as principal component analy-
sis and detrended canonical correspondence analysis (Gordon 1981; Dillon and
Goldstein 1984; Green 1984; Seber 1984; Pielou 1984; Manly 1986; Ludwig and
Reynolds 1988; James and McCulloch 1990; Page et al. 1993). If sampling units are
selected with equal probability by simple random sampling or by systematic sam-
pling (see Chap. 4) from treatment and control areas, and no quasiexperimental
design is involved (e.g., no pairing), then the multivariate procedures are applicable.
It is unlikely that multivariate techniques will directly yield indicators of effect
(i.e., combinations of the original indicators) that meet the criteria for determina-
tion of effect. Nevertheless, the techniques certainly can help explain and corrobo-
rate impact if analyzed properly within the study design. Data from many
recommended study designs are not easily analyzed by those multivariate tech-
niques, because, for example,
● In stratified random sampling, units from different strata are selected with une-
qual weights (unequal probability).
● In matched pair designs, the inherent precision created by the pairing is lost if
that pair bond is broken.
A complete description of multivariate techniques is beyond the scope of this book
and is adequately described in the sources referenced earlier. Multivariate analysis
has intuitive appeal to wildlife biologists and ecologists because it deals simultane-
ously with variables, which is the way the real world works (see Morrison et al.
2006). However, complexity is not always best when trying to understand natural
systems. We think it is worth repeating Manly’s (1986) precautions:
1. Use common sense when deciding how to analyze data and remember that the
primary objective of the analysis is to answer the questions of interest.
2. The complexity of multivariate analysis usually means that answers that are
produced are seldom straightforward because the relationship between the
observed variables may not be explained by the model selected.
3. As with any method of analysis, a few extreme observations (outliers) may
dominate the analysis, especially with a small sample size.
4. Finally, missing values can cause more problems with multivariate data than
with univariate data.
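As a minimal taste of the ordination methods mentioned above, the sketch below runs a principal component analysis on two hypothetical indicator variables, solving the 2 × 2 covariance matrix in closed form; real analyses with many variables would use a statistical package:

```python
import math

# Two hypothetical indicator variables measured on ten sampling units.
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x) / (n - 1)                   # var(x)
syy = sum((b - my) ** 2 for b in y) / (n - 1)                   # var(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)  # cov(x, y)

# Eigenvalues of the 2x2 covariance matrix (closed form): these are the
# variances along the first and second principal components.
tr, det = sxx + syy, sxx * syy - sxy ** 2
root = math.sqrt(tr ** 2 - 4 * det)
lam1, lam2 = (tr + root) / 2, (tr - root) / 2

explained = lam1 / (lam1 + lam2)
print(f"first principal component explains {explained:.1%} of total variance")
```

When two indicators are strongly correlated, as here, most of the total variance collapses onto a single component, which is exactly the data reduction that makes ordination attractive, and also the reason its output can be hard to interpret biologically.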
The following are examples of multivariate designs in wildlife studies. Miles et al.
(2006) used multivariate models to study the multiscale roost site selection by
evening bats on pine-dominated landscapes in southwest Georgia. The study devel-
oped 16 a priori multivariate models to describe day-roost selection by evening bats,
pooling data across gender and age classes. Model sets included all possible
additive combinations of categories that described tree, plot, stand, and landscape
scales. The study used logistic regression to create models and the second-order
Akaike’s Information Criterion (AICc) to identify the most parsimonious model and to
predict variable importance. Kristina et al. (2006) evaluated habitat use by sympatric
mule and white-tailed deer in Texas using multivariate analysis of variance
(MANOVA) to test for differences and interactions in habitat composition of home
ranges and core areas among years and between species for males, and among years,
seasons, and species for females. Cox et al. (2006) evaluated Florida panther habitat
use using a MANOVA to test the hypothesis that overall habitat selection did not dif-
fer from random with sex as a main effect and individual panthers as the experimental
unit. The study used the same procedure to test for differences in habitat selection
between Florida panthers and introduced Texas cougars. Lanszki et al. (2006) evalu-
ated feeding habits and trophic niche overlap between sympatric golden jackal (Canis
aureus) and red fox (Vulpes vulpes) in the Pannonian ecoregion (Hungary). They used
a MANOVA to compare the canids in consumption of fresh biomass of prey based on
the prey’s body mass as the dependent variable, carnivore species as the fixed factor,
and seasons and mass categories as covariates.
research dollars. Sequential designs are unique in that the sample size is not fixed before the study begins and there are three potential statistical decisions: accept, reject, or remain uncertain (more data are needed). After each sampling event, the
available data are analyzed to determine if conclusions can be reached without
additional sampling. The obvious advantage to this approach is the potential sav-
ings in dollars and time necessary to conclude a study.
Sequential sampling can be very useful when data are essentially nonexistent on a
study population and a priori sample size estimation is essentially a guess. As an exam-
ple, suppose in a regulatory setting the standard for water quality below a waste treat-
ment facility is survival time for a particular fish species (e.g., fathead minnow). The
null hypothesis is that mean survival time is less than the regulatory standard, and the alternative hypothesis is that it is greater than or equal to the standard. The primary decision criterion is the acceptable risk of Type I and Type II errors. Typically, in a regulatory setting the emphasis is placed on reducing Type I error (i.e., rejecting a true null
hypothesis). Sequential sampling continues until a decision regarding whether the facil-
ity is meeting the regulatory standard is possible within the acceptable risk of error.
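One concrete sequential procedure is Wald’s sequential probability ratio test (SPRT). The sketch below is an illustration only, assuming exponentially distributed survival times rather than any actual regulatory protocol; it returns one of the three possible decisions, continuing to sample while the cumulative log-likelihood ratio stays between the two Wald boundaries:

```python
import math
import random

def sprt_exponential(data, mean0, mean1, alpha=0.05, beta=0.10):
    """Wald SPRT for exponential survival times: H0 mean = mean0 vs H1 mean = mean1.

    Observations are examined one at a time; sampling stops as soon as
    the cumulative log-likelihood ratio crosses either Wald boundary.
    """
    upper = math.log((1 - beta) / alpha)    # cross upward   -> accept H1
    lower = math.log(beta / (1 - alpha))    # cross downward -> accept H0
    lam0, lam1 = 1 / mean0, 1 / mean1       # exponential rates
    llr = 0.0
    for n, x in enumerate(data, start=1):
        llr += math.log(lam1 / lam0) - (lam1 - lam0) * x
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "uncertain: continue sampling", len(data)

# Simulated survival times whose true mean equals the H0 value of 10.
rng = random.Random(42)
sample = [rng.expovariate(1 / 10.0) for _ in range(200)]
decision, n_used = sprt_exponential(sample, mean0=10.0, mean1=20.0)
print(decision, "after", n_used, "observations")
```

The sample size at which the test terminates is itself random, which is precisely the source of the potential savings in time and money discussed above.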
Biological studies commonly use computer-intensive methods (see Manly 1997).
Randomization tests, for example, involve the repeated sampling of a randomization
distribution (say 5,000 times) to determine if a sample statistic is significant at a cer-
tain level. Manly (1997) suggests that a sequential version of a randomization test
offers the possibility of reducing the number of randomizations necessary, potentially
saving time and reducing the required computing power. Nevertheless, Manly (1997)
advocates the use of a fixed number of randomizations to estimate the significance
level rather than determining if it exceeds some prespecified level.
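A fixed-number randomization test of the kind Manly advocates can be sketched as follows (the data are invented; 5,000 shuffles of the pooled observations):

```python
import random

def randomization_test(group1, group2, n_rand=5000, seed=1):
    """Estimate a two-sided significance level for a difference in means.

    Group labels are reshuffled n_rand times; the significance level is
    the proportion of arrangements whose absolute mean difference is at
    least as large as the observed one (the observed arrangement counts).
    """
    rng = random.Random(seed)
    observed = abs(sum(group1) / len(group1) - sum(group2) / len(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_rand):
        rng.shuffle(pooled)
        d = abs(sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / (len(pooled) - n1))
        if d >= observed:
            extreme += 1
    return (extreme + 1) / (n_rand + 1)

# Hypothetical counts from treated vs. reference plots.
p = randomization_test([12, 15, 11, 14, 16], [8, 9, 10, 7, 11])
print(f"estimated significance level: {p:.4f}")
```

A sequential version would stop shuffling once the estimate clearly exceeded or fell below the chosen level; the fixed-number version trades some computation for an estimate of the significance level itself.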
The above discussion of the sequential study design presumes there is comprehen-
sive knowledge of the biology of the population of interest. That is, we know which
variables are most important, the range of variables that should be studied, the proper
methods and metrics to use, and potential interactions. However, the sequential study
can also be thought of at a more global scale. That is, an investigation could begin
with a moderately sized experiment followed by reassessment after the first set of
results is obtained. The obvious advantage to this approach is that the a priori deci-
sions made regarding the biology of populations and the resulting initial study design
are modified based on new information. Adaptive resource management (Walters
1986; see Chap. 2) is popularizing this method of scientific study. Box et al. (1978)
advocate “the 25% rule”: no more than one quarter of the experimental effort (budget) should be invested in a first design. The bottom line is that when there is a great deal of uncertainty regarding any of the necessary components of the study, one should not put all of the proverbial eggs (budget and time) into one basket (study).
The crossover design is a close relative of the Latin square and in some instances
the analysis is identical (Montgomery 1991). Simply put, crossover designs involve
the random assignment of two or more treatments to a study population during the
first study period and then the treatments are switched during subsequent study
periods so that all study units receive all treatments in sequence. Contrast this with
the above designs, in which treatments are assigned to parallel groups: some subjects get the first treatment and different subjects get the second. The
crossover design is typically implemented with a single treatment and control, and
represents a special situation where there is not a separate comparison group. In
effect, each study unit serves as its own control. In addition, since the same study
unit receives both treatments, there is no possibility of covariate imbalance. That is,
by assigning all treatments to each of the units, crossover designs eliminate the effects of variation between experimental units (Williams et al. 2002).
The crossover design can be quite effective when spatially separated controls are
unavailable but temporal segregation of treatments is a possibility. However, a key
requirement is that the treatments must not have a lasting effect on the study units
such that the response in the second allocation of treatments is influenced by the
first. This potential for a carry-over effect limits to some extent the type of treat-
ments and study units that can be used in crossover experiments. Typically study
units are given some time for recovery (i.e., overcome any potential effects of the
first treatment application) before the second treatment phase begins. Williams et
al. (2002) describe an analysis procedure that includes a treatment effect, time
effect, carry-over effect, and two random terms, one for replication and one that
accounts for the sequencing of treatments.
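For the simple two-period, two-treatment case, the treatment and period effects can be recovered from within-unit period differences. The sketch below uses invented data and ignores carry-over, which the fuller model of Williams et al. (2002) would include:

```python
def crossover_effects(seq_ab, seq_ba):
    """Two-period, two-treatment crossover analysis from period differences.

    seq_ab: (period 1, period 2) responses for units receiving A then B.
    seq_ba: the same for units receiving B then A.
    Returns (treatment effect A - B, period effect 1 - 2); carry-over is
    assumed absent (i.e., washed out between periods).
    """
    d_ab = [p1 - p2 for p1, p2 in seq_ab]
    d_ba = [p1 - p2 for p1, p2 in seq_ba]
    m_ab = sum(d_ab) / len(d_ab)
    m_ba = sum(d_ba) / len(d_ba)
    treatment = (m_ab - m_ba) / 2   # each unit serves as its own control
    period = (m_ab + m_ba) / 2      # shared time effect separates out
    return treatment, period

# Hypothetical induction times (minutes) under two doses, three animals
# per sequence group.
ab = [(3.1, 2.2), (2.8, 2.0), (3.4, 2.5)]   # dose A in period 1, B in period 2
ba = [(2.1, 3.0), (2.4, 3.2), (1.9, 2.9)]   # dose B in period 1, A in period 2
t_eff, p_eff = crossover_effects(ab, ba)
print(f"treatment effect (A - B): {t_eff:.2f} min, period effect: {p_eff:.2f} min")
```

Because every unit receives both treatments, the treatment contrast is computed within units, which is the source of the design’s efficiency when between-unit variation is large.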
Wolfe et al. (2004) provide a straightforward example of the application of the
crossover design in the study of the immobilization of mule deer with the drug
Thiafentanil (A-3080). This study utilized a balanced crossover design where each
deer was randomly assigned one of two Thiafentanil dose treatments. One treat-
ment was the existing study protocol dose (0.1 mg kg−1), and the other treatment
was 2× the protocol dose (0.2 mg kg−1). Treatment assignments were switched for
the second half of the experiment so that each animal eventually received both treat-
ments. The first half of the crossover experiment occurred on day 0 of the study and
the second half occurred 14 days later to allow the mule deer to recover from the
application of the first treatment dose. Another example is a study currently being implemented at the Altamont Pass Wind Resource Area in central California, where a large number (>40 per year) of golden eagles are being killed by wind turbines.
The study uses a crossover design to determine if a seasonal shutdown of turbines
can be effective in reducing eagle fatalities. A set of turbines are operated during
the first half of the winter season while another set is shut down and eagle fatalities
are quantified; the on–off turbines are reversed for the second half of the season;
and the same protocol is followed for a second year. The objectives are to see if the
overall fatalities in the area decline because of a winter shutdown, to see if winter
fatalities decline due to partial shutdown, and to see if variation in fatalities occurs
within seasons of operation. Thus, the treatment has been “crossed-over” to the
other elements. Power remains low in such experiments, and the experimenter
draws conclusions using a weight of evidence approach (where “weight of evidence”
simply means you see a pattern in the response).
3.11.3 Quasiexperiments
To this point, we have concentrated on designs that closely follow the principles
Fisher (1966) developed for agricultural experiments where the observer can con-
trol the events. These principles are the basis for most introductory statistics
courses and textbooks. Such courses imply that the researcher will have a great deal of latitude in the control of experiments: that experimental controls are often possible, that blocking can commonly be used to partition sources of variance, and that the designs of experiments often become quite complicated. The principles provide an excellent foundation for the
study of uncontrolled events that include most wildlife studies. However, when
wildlife students begin life in the real world, they quickly learn that it is far messier
than their statistics professors led them to believe.
Wildlife studies are usually observational with few opportunities for the conduct
of replicated manipulative experiments. Studies usually focus on the impact of a
perturbation on a population or ecosystem, and fall into the category classified by
Eberhardt and Thomas (1991) as studies of uncontrolled events (see Fig. 3.1). The
perturbation may be a management method or decision with some control possible
or an environmental pollutant with no real potential for control. Even when some
control is possible, the ability to make statistical inference to a population is lim-
ited. The normal circumstance is for the biologist to create relatively simple models
of the real world, exercise all the experimental controls possible, and then, based
on the model-based experiments, make subjective conjecture (Eberhardt and
Thomas 1991) to the real world.
Regardless of the approach, most of the fundamental statistical principles still
apply, but the real world adds some major difficulties, increasing rather than diminishing the need for careful planning. Designing observational studies requires the same care as the design of manipulative experiments (Eberhardt and Thomas 1991).
Biologists should seek situations in which variables thought to be influential can be
manipulated and results carefully monitored (Underwood 1997). When combined
with observational studies of intact ecosystems, the results of these experiments
increase our understanding of how the systems work. The usefulness of the infor-
mation resulting from research is paramount in the design of studies and, if ecolo-
gists are to be taken seriously by decision-makers, they must provide information
useful for deciding on a course of action, as opposed to addressing purely academic
questions (Johnson 1995).
The need for quasiexperiments is illustrated by the ongoing controversy over the impact of wind power development on birds (Anderson et al. 1999). There
is a national desire by consumers for more environmentally friendly sources of
energy from so-called “Green Power.” Some industry analysts suggest that as much
as 20% of the energy needs in the United States could be met by electricity pro-
duced by wind plants. As with most technology development, power from wind
apparently comes with a cost to the environment. Early studies of the first large
wind resource areas in the Altamont Pass and Solano County areas of California by
the California Energy Commission (Orloff and Flannery 1992) found unexpectedly
high levels of bird fatalities. The resulting questions about the significance of these
fatalities to the impacted populations were predictable and led to independent
research on wind/bird interactions at these two sites and other wind plants through-
out the country (Strickland et al. 1998a,b; Anderson et al. 1996; Howell 1995; Hunt
1995; Orloff and Flannery 1992; Erickson et al. 2002). While these studies look at
project-specific impacts, the larger question is what these studies can tell us about
potential impacts to birds as this technology expands. The study of the impact of
wind power on birds is a classic example of the problems associated with study of
uncontrolled events.
First, the distribution of wind plants is nonrandom with respect to bird popula-
tions and windy sites. Four conditions are necessary for a wind project to be feasi-
ble. There must be a wind resource capable of producing power at rates attractive
to potential customers. There must be access to the wind. There must be a market
for the power, usually in the form of a contract. Finally, there must be distribution
lines associated with a power grid in close proximity. Hence, randomization of the
treatment is not possible. Wind plants are large and expensive, and sites with favo-
rable wind are widely dispersed. As a result, replication and contemporary controls
are difficult to achieve. Nevertheless, public concern will not allow the industry, its
regulators, or the scientific community to ignore the problem simply because
Fisher’s principles of experimental design are difficult to implement.
A second and more academic example of a quasiexperiment is illustrated by
Bystrom et al. (1998) in their whole-lake study of interspecific competition among
young predators and their prey. Before their study, most research on the issue
occurred on a much smaller scale in enclosures or ponds. Bystrom et al. sought to
evaluate the effect of competition from a prey fish (roach, Rutilus rutilus) on the
recruitment of a predatory fish (perch, Perca fluviatilis). The study introduced
roach to two of four small, adjacent unproductive lakes inhabited by natural popula-
tions of perch. After the introduction, the investigators collected data on diet,
growth, and survival of the newborn cohorts of perch during a 13-month period.
Several complications were encountered, including the incomplete removal of a
second and larger predator (pike, Esox lucius) in two of the four lakes and an unfor-
tunate die-off of adult perch in the roach-treatment lakes. A second unreplicated
enclosure experiment was conducted in one of the lakes to evaluate intraspecific vs.
interspecific competition.
Bystrom et al. (1998) attempted to follow good experimental design principles
in their study. The problems they encountered illustrate how difficult experiments
in nature really are. They were able to replicate both treatment and control environ-
ments and blocked treatment lakes. However, the experiment was conducted with a
bare minimum of two experimental units for each treatment. They attempted to
control for the effects of the pike remaining after the control efforts by blocking.
They also attempted to control for intraspecific competition, but with a separate
unreplicated study. It could be argued that a better study would have included
replications of the enclosure study in some form of nested design or a design that
considered the density of perch as a covariate in their blocked experiment. In spite
of a gallant effort, they are left with a study utilizing four subjectively selected lakes
from what is likely a very large population of oligotrophic lakes in Sweden and
somewhat arbitrary densities of prey and other natural predators. In addition, the
two “control” lakes were not true experimental controls and some of the differences
seen between the control and treatment conditions no doubt resulted from preexist-
ing differences. It is doubtful that a sample size of two is sufficient replication to
dismiss the possibility that differences attributed to the treatment could have
occurred by chance. Any extrapolation of the results of this study to other lakes and
other populations of perch is strictly a professional judgment; it is subject to the
protocols and unique environmental conditions of the original study and is not an
exercise of statistical inference.
The following discussion deals primarily with the study of a distinct treatment or
perturbation. These designs fall into the category of intervention analysis in
Eberhardt and Thomas’s (1991) classification scheme. Because these designs typi-
cally result in data collected repeatedly over time they are also called an interrupted
time series (Manly 1992). We do not specifically discuss designs for studies when
no distinct treatment or perturbation exists, as these depend on sampling and may
be characterized by the way samples are allocated over the area of interest.
Sampling plans are covered in detail in Chap. 4.
There are several alternative methods of observational study when estimating the
impact of environmental perturbations or the effects of a treatment. The following
is a brief description of the preferred designs, approximately in order of reliability
for sustaining confidence in the scientific conclusions. A more complete descrip-
tion of these designs can be found in Chap. 6 under the discussion of impact studies
and in Manly (1992) and Anderson et al. (1999).
Did the average difference in abundance between the reference area(s) and the treat-
ment area change after the treatment?
The BACI design is not always practical or possible. Adequate reference areas
are difficult to locate, the perturbation does not always allow enough time for study
before the impact, and multiple times and study areas increase the cost of study.
Additionally, alterations in land use or disturbance occurring before and after treat-
ment complicate the analysis of study results. We advise caution when employing
this method in areas where potential reference areas are likely to undergo significant changes that potentially influence the response variable of interest. If advance
knowledge of a study area exists, the area of interest is somewhat varied, and the
response variable of interest is wide ranging, then the BACI design is preferred for
observational studies for treatment effect.
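The BACI question posed above, whether the average treatment-minus-reference difference changed after the treatment, reduces to a difference of differences. A minimal sketch with invented abundance counts:

```python
def baci_effect(treat_before, treat_after, ref_before, ref_after):
    """BACI effect as a difference of differences in mean abundance.

    Computes how much the treatment-minus-reference difference changed
    from the before period to the after period; a value near zero is
    consistent with no treatment effect.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    before_diff = mean(treat_before) - mean(ref_before)
    after_diff = mean(treat_after) - mean(ref_after)
    return after_diff - before_diff

# Hypothetical abundance counts (three surveys per period and area).
effect = baci_effect(
    treat_before=[20, 22, 18], treat_after=[11, 9, 10],
    ref_before=[19, 21, 20], ref_after=[18, 20, 19],
)
print(f"BACI effect on mean abundance: {effect:.1f}")   # -9.0
```

The reference area absorbs region-wide temporal change, so the residual difference of differences is attributed, subject to the design caveats above, to the treatment.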
Matched pairs of study sites from treatment and reference areas often are subjec-
tively selected to reduce the natural variation in impact indicators (Skalski and
Robson 1992; Stewart-Oaten et al. 1986). Statistical analysis of this form of
quasiexperiment is dependent on the sampling procedures used for site selection
and the amount of information collected on concomitant site-specific variables. For
example, sites may be randomly selected from an assessment area and each subjec-
tively matched with a site from a reference area.
When matched pairs are used in the BACI design to study a nonrandom treatment
(perturbation), the extent of statistical inferences is limited to the assessment area, and
the reference pairs simply act as an indicator of baseline conditions. Inferences also
are limited to the protocol by which the matched pairs are selected. If the protocol for
selection of matched pairs is unbiased, then statistical inferences comparing the
assessment and reference areas are valid and repeatable. For example, McDonald et al.
(1995) used this design to evaluate the impacts of the Exxon Valdez oil spill on the
intertidal communities in Prince William Sound, Alaska. Since the assessment study
units were a random sample of oiled units, statistical inferences were possible for all
oiled units. However, since the reference units were subjectively selected to match the
oiled units, no statistical inferences were possible or attempted to nonoiled units. The
selection of matched pairs for extended study contains the risk that sites may change
before the study is completed, making the matching inappropriate (see discussion of
stratification in Chap. 4). The presumption is that, with the exception of the treatment,
the pairs remain very similar – a risky proposition in long-term studies.
quently lack “before” baseline data from the assessment area and/or a reference
area requiring an alternative to the BACI, such as the impact-reference design.
Assessment and reference areas are censused or randomly subsampled by an appro-
priate sampling design. Design and analysis of treatment effects in the absence of
preimpact data follow Skalski and Robson’s (1992) (see Chap. 6) recommendations
for accident assessment studies.
Differences between assessment and reference areas measured only after the
treatment might be unrelated to the treatment, because site-specific factors differ.
For this reason, differences in natural factors between assessment and reference
areas should be avoided as much as possible. Although the design avoids the added
cost of collecting preimpact data, reliable quantification of treatment effects must
include as much temporal and spatial replication as possible. Additional study
components, such as the measurement of other environmental covariates that might
influence response variables, may help limit or explain variation and the confound-
ing effects of these differences. ANCOVA may be of value to adjust the analysis of
a random variable to allow for the effect of another variable.
If one discovers that a gradient of response is absent but a portion of the study area
meets the requirements of a reference area, data analysis compares the response
variables measured in the treatment and control portions of the study area. The
impact-gradient design can be used in conjunction with BACI, impact reference,
and before–after designs.
When studies using reference areas are possible, the use of more than one reference
area increases the reliability of conclusions concerning quantification of a treatment
response in all designs (Underwood 1994). Multiple reference areas help deal with
the frequently heard criticism that the reference area is not appropriate for the treat-
ment area. Consistent relationships among several reference areas and the treatment
area will generate far more scientific confidence than if a single reference area is
used. In fact, scientific confidence is likely increased more than would be expected
given the increase in number of reference areas. This confidence comes from the
replication in space of the baseline condition. Multiple reference areas also reduce
the impact on the study if one reference area is lost, e.g., due to a change in land
use affecting response variables.
Collection of data on study areas for several time periods before and/or after the
treatment also will enhance reliability of results. This replication in time allows the
detection of convergence and divergence in the response variables among reference
and treatment areas. The data can be tested for interaction among study sites, time,
and the primary indicator of effect (e.g., mortality), assuming the data meet the
assumptions necessary for ANOVA of repeated measures. The specific test used
depends on the response variable of interest (e.g., count data, percentage data, con-
tinuous data, categorical data) and the subsampling plan used (e.g., point counts,
transect counts, vegetation collection methods, GIS [Geographic Information
System] data available, radio-tracking data, capture–recapture data). Often, classic
ANOVA procedures will be inappropriate and nonparametric, Bayesian, or other
computer-intensive methods will be required.
The conditions of the study may not allow a pure design/data-based analysis, particu-
larly in impact studies. For example, animal abundance in an area might be estimated
on matched pairs of impacted and reference study sites. However carefully the matching is conducted, uncontrolled factors always remain that may introduce too much
variation in the system to allow one to statistically detect important differences
between the assessment and reference areas. In a field study, there likely will be natu-
rally varying factors whose effects on the impact indicators are confounded with the
effects of the incident. Data for easily obtainable random variables that are correlated
with the impact indicators (covariates) will help interpret the gradient of response
observed in the field study. These variables ordinarily will not satisfy the criteria for
determining impact, but are often useful in model-based analyses for the prediction of
impact (Page et al. 1993; Smith 1979). For example, in the study of bird use on the
Wyoming wind plant site, Western Ecosystems Technology, Inc. (1995) developed
indices to prey abundance (e.g., prairie dogs [Cynomys], ground squirrels [Spermophilus],
and rabbits [Lagomorpha]). These ancillary variables are used in model-based analy-
ses to refine comparisons of avian predator use in assessment and reference areas.
Land use also is an obvious covariate that could provide important information when
evaluating differences in animal use among assessment and reference areas and time.
Indicators of degree of exposure to the treatment also should be measured
on sampling units. As in the response-gradient design, a clear effect–response
relationship between response variables and level of treatment will provide corrob-
orating evidence of effect. These indicators are also useful with other concomitant
variables in model-based analyses to help explain the “noise” in data from natural
systems. For example, in evaluating the effect of an oil spill, the location of the site
with regard to prevailing winds and currents or substrate of the oiled site are useful
indicators of the degree of oil exposure.
3.12 Meta-analyses
Meta-analyses have been proposed as a desirable alternative to writing highly subjective narrative reviews. Proponents make the
logical recommendation that meta-analysis of observational studies should follow
many of the principles of systematic reviews: a study protocol should be written in
advance, complete literature searches carried out, and studies selected and data
extracted in a reproducible and objective fashion. Following this systematic
approach exposes both differences and similarities of the studies, allows the explicit
formulation and testing of hypotheses, and allows the identification of the need for
future studies. Particularly with observational studies, a meta-analysis should carefully
consider the differences among studies and stratify the analysis to account for these
differences and for known biases.
Erickson et al. (2002) provide a nice example of a meta-analysis using pooled
data from a relatively large group of independent observational studies of the
impacts of wind power facilities on birds and bats. The meta-analysis evaluated
data on mortality, avian use, and raptor nesting for the purpose of predicting direct
impacts of wind facilities on avian resources, including the amount of study neces-
sary for those predictions. The authors considered approximately 30 available stud-
ies in their analysis of avian fatalities. In the end, they restricted the fatality and use
components of the meta-analysis to the 14 studies that were conducted consistent
with recommendations by Anderson et al. (1999). They also restricted their analysis
to raptors and waterfowl/waterbird groups because the methods for estimating use
appeared most appropriate for the larger birds.
Based on correlation analyses, the authors found that overall impact prediction
for all raptors combined would typically be similar after collection of one season
of raptor use data compared to a full year of data collection. The authors cau-
tioned that this was primarily the case in agricultural landscapes where use esti-
mates were relatively low, did not vary much among seasons, and mortality data
at new wind projects indicated absent to very low raptor mortality. Furthermore,
the authors recommended more than one season of data if a site appears to have
relatively high raptor use and in landscapes not yet adequately studied.
Miller et al. (2003) reviewed results of 56 papers and subjectively concluded
that current data (on roosting and foraging ecology of temperate insectivorous
bats) were unreliable due to small sample sizes, short-term nature of studies,
pseudoreplication, inferences beyond scale of data, study design, and limitations
of bat detectors and statistical analyses. To illustrate the value of a quantitative
meta-analysis, Kalcounis-Ruppell et al. (2005) used a series of meta-analyses on
the same set of 56 studies to assess whether data in this literature suggested gen-
eral patterns in roost tree selection and stand characteristics. The authors also
repeated their analyses with more recent data, and used a third and fourth series
of meta-analyses to separate the studies done on bat species that roost in cavities
from those that roost in foliage. The quantitative meta-analysis by Kalcounis-
Ruppell et al. (2005) provided a much more thorough and useful analysis of the
available literature compared to the more subjective analysis completed by Miller
et al. (2003).
3.13 Power and Sample Size Analyses
Effect size is the difference between the null and alternative hypotheses. That is, if a particular management action is expected to cause a change in abundance of an organism by 10%, then the effect size is 10%. Effect size is important in designing
experiments for obvious reasons. At a given α-level and sample size, the power of an experiment increases with effect size and, conversely, the sample size necessary to detect an effect typically increases with a decreasing effect size.
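This relationship can be made concrete with a standard normal-approximation calculation of required sample size per group. The function below is a sketch under a known-variance, two-sample z-test assumption; the particular numbers (effects of 10 and 5 units with a standard deviation of 20) are hypothetical:

```python
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sample z-test
    detecting a mean difference `delta` with common SD `sigma`."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # critical value for a two-sided test
    z_b = z.inv_cdf(power)           # quantile corresponding to target power
    return 2 * (z_a + z_b) ** 2 * (sigma / delta) ** 2

# Halving the effect size roughly quadruples the required sample size:
print(round(n_per_group(delta=10, sigma=20)))  # 63 per group
print(round(n_per_group(delta=5, sigma=20)))   # 251 per group
```

The quadrupling reflects the inverse-square dependence of sample size on effect size in this approximation.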
Given that detectable effect size decreases with increasing sample size, there comes a point in most studies at which a statistically significant difference has no biological meaning (for example, a difference in canopy cover of 5% over a sampling range of 30–80%; see Sect. 1.5.3). As such, setting a biologically meaningful effect size is the most difficult and challenging aspect of power analysis; this “magnitude of biological effect” is a hypothetical value based on the researcher’s biological knowledge. Nevertheless, choosing an effect size is an absolute necessity before it is possible to determine the power of an experiment or to design an experiment to have a predetermined power (Underwood 1997).
When the question of interest can be reduced to a single parameter (e.g., differences
between two population means or the difference between a single population and a
fixed value), establishing effect size is in its simplest form. There are three basic
types of simple effects:
● Absolute effect size is set when the values are in the same units; for example,
looking for a 10 mm difference in wing length between males and females of
some species.
● Relative effect size is used when temporal or spatial control measures are used and effects are expressed as the difference in the response variable between treatment and control. Relative effect sizes are expressed as percentages (e.g., the percent increase in a population due to a treatment relative to the control).
● Standardized effect sizes are measures of absolute effect size scaled by variance
and therefore combine these two components of hypothesis testing (i.e., effect
size and variance). Standardized effect sizes are unit-less and are thus compara-
ble across studies. They are, however, difficult to interpret biologically and it is
thus usually preferable to use absolute or relative measures of effect size and
consider the variance component separately.
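A minimal sketch of the three types, using hypothetical wing-length data; the values, and the crude pooling of the two samples for the standard deviation, are for illustration only:

```python
from statistics import mean, stdev

# Hypothetical wing lengths (mm) for males and females of some species:
males = [34.1, 36.5, 33.8, 37.2, 35.9]
females = [30.2, 31.8, 29.5, 32.1, 30.9]

absolute = mean(males) - mean(females)      # same units: mm
relative = 100 * absolute / mean(females)   # percent of the reference group
pooled_sd = stdev(males + females)          # crude pooling, for illustration
standardized = absolute / pooled_sd         # unit-less, comparable across studies

print(f"absolute: {absolute:.2f} mm")
print(f"relative: {relative:.1f}%")
print(f"standardized: {standardized:.2f}")
```

Note how only the standardized measure folds the variance into the effect size itself, which is why it is harder to interpret biologically.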
Setting an effect size when dealing with multiple factors or multiple levels of a single factor is a complex procedure involving an examination of the absolute effect size based on the variance of the population means:
σ² = (1/k) ∑ (mᵢ − m̄)², where m̄ is the mean of the k cell means mᵢ.
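This quantity is simple to compute; the cell means below (a 10 g control with treatment yields of 15, 20, and 25 g) are hypothetical illustrations:

```python
def between_group_variance(means):
    """sigma^2 = (1/k) * sum((m_i - m_bar)^2): the variance of the k cell means."""
    k = len(means)
    m_bar = sum(means) / k
    return sum((m - m_bar) ** 2 for m in means) / k

# Hypothetical cell means: a 10 g control and 15, 20, and 25 g treatments.
print(between_group_variance([10, 15, 20, 25]))  # 31.25
```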
Steidl and Thomas (2001) outlined four approaches for establishing effect size in
complex situations:
● Approach 1. Specify all cell means. In an experiment with three treatments and
a control, you might state that you are interested in examining power given a
control value of 10 g and treatment yields of 15, 20, and 25 g. Although these
statements are easy to interpret, they are also difficult to assign.
● Approach 2. Delineate a measure of effect size based on the population variances
through experimenting with different values of the means. That is, you experi-
ment with different values of the response variable and reach a conclusion based
on what a meaningful variance would be.
● Approach 3. Simplify the problem to one of comparing only two parameters. For
example, in a one-factor ANOVA you would define a measure of absolute effect
size (m_max − m_min), which places upper and lower bounds on power, each of which
can be calculated.
● Approach 4. Assess power at prespecified levels of standardized effect size for a
range of tests. In the absence of other guidance, it is possible to calculate power
at three levels as implied by the adjectives small, medium, and large. This
approach is seldom applied in ecological research and is mentioned here briefly
only for completeness.
In sum, power and sample size analyses are important aspects of study design, but
only so that we can obtain a reliable picture of the underlying distribution of the
biological parameters of interest. The statistical analyses that follow provide addi-
tional guidance for making conclusions. By setting effect size or just your expecta-
tion regarding results (e.g., in an observational study) a priori, the biology drives the
process rather than the statistics. That is, the proper procedure is to use statistics to
first help guide study design, and later to complement interpretations. The all too
common practice of collecting data, applying a statistical analysis, and then interpreting the outcome misses the biological guidance necessary for an adequate study. What you are doing, essentially, is agreeing a priori to accept whatever
guidance the statistical analyses provide and then trying to force a biological expla-
nation into that framework. Even in situations where you are doing survey work to
develop a list of species occupying a particular location, stating a priori what you
expect to find, and the relative order by abundance, provides a biological framework
for later interpretation (and tends to reduce the fishing expedition mentality).
Sensitivity analysis can be used to help establish an appropriate effect size. For
example, you can use the best available demographic information – even if it is
from surrogate species – to determine what magnitude of change in, say, reproductive success will force λ (the population rate of increase) above or below 1.0. This
value then sets the effect size for prospective power analysis or for use in guiding
an observational study (i.e., what difference in nest success for a species would be
of interest when studying reproduction along an elevation gradient?).
3.14 Prospective Power Analysis
a funding entity (e.g., agency) has developed a request for a study (i.e., Request
for Proposal or RFP) that includes a specific sampling location(s), sampling
conditions, and a limit to the amount of funding available. By accepting such
funding, you are in essence accepting the resulting power and effect size.
You often have the ability, however, to adjust the sampling protocol to ensure
that you can address at least part of the study objectives with appropriate rigor.
In this scenario you conduct power analysis in an iterative manner using different effect sizes and α-levels to determine what you can achieve with the sample
size limits in place (Fig. 3.3).
3. Scenario 3. Here you are determining what effect size you can achieve given a
target power, α-level, variance, and sample size. As discussed earlier, α can be
changed within some reasonable bounds (i.e., a case can usually be made for
α ≤ 0.15) and variance is set. Here you also are attempting to determine what role
sample size has in determining effect size.
In summary, the advantage of prospective power analysis is the insight you gain
regarding the design of your study. Moreover, even if you must conduct a study
given inflexible design constraints, power analysis provides you with knowledge of
the likely rigor of your results.
Fig. 3.3 The influence of number of replicates on statistical power to detect small (0.09),
medium (0.23), and large (0.36) effect sizes (differences in the probability of predation) between
six large and six small trout using a Wilcoxon signed-ranks test. Power was estimated using a
Monte Carlo simulation. Reproduced from Steidl and Thomas (2001) with kind permission from Springer Science + Business Media
3.16 Power Analysis and Wildlife Studies
As the name implies, retrospective power analysis is conducted after the study is
completed, the data have been collected and analyzed, and the outcome is known.
Statisticians typically dismiss retrospective power analysis as being uninformative
and perhaps inappropriate and its application is controversial (Gerard et al. 1998).
However, in some situations retrospective power analysis can be useful. For exam-
ple, if a hypothesis was tested and not rejected you might want to know the proba-
bility that a Type II error was committed (i.e., did the test have low power?). As
summarized by Steidl and Thomas (2001), retrospective power analysis is useful in
distinguishing between two reasons for failing to reject the null hypothesis:
● The true effect size was not biologically significant.
● The true effect size was biologically significant but you failed to reject the null
hypothesis (i.e., you committed a Type II error).
To make this distinction, you calculate the power to detect a minimally biologically significant effect size given the sample size, α, and variance used in the study. If the
resulting power at this effect size is large, then the magnitude of the minimum bio-
logically significant effect would likely lead to statistically significant results. Given
that the test was actually not significant, you can infer that the true effect size is likely
not this large. If, however, power was small at this effect size, you can infer that the
true effect size could be large or small and that your results are inconclusive.
Despite the controversy, retrospective power analysis can be a useful tool in
management and conservation. Nevertheless, retrospective power analysis should
never be used when power is calculated using the observed effect size. In such
cases, the resulting value for power is simply a reexpression of the p-value, where
low p-values lead to high power and vice versa.
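This one-to-one relationship is easy to demonstrate for a z-test. The formulas below are a sketch assuming a two-sided normal-theory test:

```python
from statistics import NormalDist

N = NormalDist()

def p_value(z_obs):
    """Two-sided p-value for an observed z statistic."""
    return 2 * (1 - N.cdf(abs(z_obs)))

def observed_power(z_obs, alpha=0.05):
    """'Retrospective' power evaluated at the observed effect size."""
    crit = N.inv_cdf(1 - alpha / 2)
    return (1 - N.cdf(crit - z_obs)) + N.cdf(-crit - z_obs)

# Smaller p-values map one-to-one onto larger observed power; a p-value of
# exactly alpha corresponds to an observed power near 0.5:
for z in (0.5, 1.0, 1.96, 3.0):
    print(f"z={z:.2f}  p={p_value(z):.3f}  observed power={observed_power(z):.3f}")
```

Because the observed power is a deterministic function of the test statistic, it carries no information beyond the p-value itself, which is why this use of retrospective power is discouraged.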
In practice, observational studies generally have low statistical power. In the case
of environmental impact monitoring, the H0 will usually be that there is no impact
to the variable of interest. Accepting a “no impact” result when an experiment has
low statistical power may give regulators and the public a false sense of security.
The α-level of the experiment is usually set by convention and the magnitude of the effect in an observational study is certainly not controllable. In the case of a regulatory study, the regulation may establish the α-level. Thus, sample size and estimates
of variance usually determine the power of observational studies. Many of the
methods discussed in this chapter are directed toward reducing variance in obser-
vational studies. In properly designed observational studies, the ultimate determi-
nant of statistical power is sample size.
The lack of sufficient sample size necessary to have reasonable power to detect
differences between treatment and reference (control) populations is a common
Fig. 3.4 An illustration of how means and variance stabilize with additional sampling. Note that in all four examples the means (horizontal solid and dashed lines) and variance (vertical solid
and dashed lines) stabilize with 20–30 plots. Knowledge of the behavior of means and variance
influences the amount of sampling in field studies
for a species of interest. You can justify ceasing sampling when the means and vari-
ance stabilize (i.e., asymptote; see Fig. 3.4). In a similar fashion, you can take
increasingly large random subsamples from a completed data set, calculate the
mean and variance, and determine if the values reached an asymptote.
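This subsampling check can be sketched as follows; the plot counts are simulated stand-ins for a completed data set:

```python
import random
from statistics import mean, variance

def running_stats(data, step=5, seed=42):
    """Means and variances of increasingly large random subsamples, used to
    judge whether the estimates have stabilized (reached an asymptote)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [(n, mean(shuffled[:n]), variance(shuffled[:n]))
            for n in range(step, len(shuffled) + 1, step)]

# Hypothetical plot counts; in practice these come from the completed data set.
rng = random.Random(0)
counts = [rng.gauss(12, 3) for _ in range(60)]
for n, m, v in running_stats(counts, step=10):
    print(f"n={n:>2}  mean={m:6.2f}  variance={v:6.2f}")
```

When successive rows change little, the subsample estimates have plausibly reached their asymptote, mirroring the behavior in Fig. 3.4.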
Much has been written criticizing null hypothesis significance testing, including applications to wildlife studies (see Sect. 1.4.1; Johnson 1999; Steidl and Thomas
2001). McDonald and Erickson (1994), and Erickson and McDonald (1995)
describe an alternative approach often referred to as bioequivalence testing.
Bioequivalence testing reverses the burden of proof so that a treatment is consid-
ered biologically significant until evidence suggests otherwise; thus the roles of the null and alternative hypotheses are switched. As summarized by Steidl and Thomas
(2001), a minimum effect size that is considered biologically significant is defined.
Then, the null hypothesis is stated such that the true effect size is greater than or equal to the minimum effect size that was initially selected, and the alternative hypothesis is that the true effect size is less than this minimum effect size.
Thus, Type I error occurs when the researcher concludes incorrectly that no biologi-
cally significant difference exists when one does. Recall that this is the type of error
addressed by power analysis within the standard hypothesis-testing framework.
Bioequivalence testing controls this error rate a priori by setting the a-level of the
test. Type II error, however, does remain within this framework when the researcher
concludes incorrectly that an important difference exists when one does not.
For a real-world example of the value of this alternative approach, consider testing for compliance with a regulatory standard for water quality. In the case of classic hypothesis testing, poor laboratory procedure resulting in wide
confidence intervals could easily lead to a failure to reject the null hypothesis that
a water quality standard had been exceeded. Conversely, bioequivalence testing
protects against this potentiality and is consistent with the precautionary principle.
While this approach appears to have merit, it is not currently in widespread use in
wildlife science.
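A minimal sketch of such an equivalence procedure (two one-sided tests, TOST) is given below. Normal quantiles are used for simplicity where a real analysis would use t quantiles, and all data are hypothetical:

```python
from statistics import NormalDist, mean, stdev

def equivalence_test(diffs, min_effect, alpha=0.05):
    """TOST sketch: H0 is |true effect| >= min_effect, H1 is
    |true effect| < min_effect, so the burden of proof is reversed.
    Uses normal quantiles for simplicity; a real analysis would use t."""
    se = stdev(diffs) / len(diffs) ** 0.5
    crit = NormalDist().inv_cdf(1 - alpha)       # one-sided critical value
    t_lower = (mean(diffs) + min_effect) / se    # test against -min_effect
    t_upper = (mean(diffs) - min_effect) / se    # test against +min_effect
    # Conclude "no biologically significant effect" only if BOTH reject:
    return t_lower > crit and t_upper < -crit

# Hypothetical paired treatment-reference differences near zero:
diffs = [0.3, -0.2, 0.5, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3]
print(equivalence_test(diffs, min_effect=2.0))   # effect shown to be < 2.0
print(equivalence_test(diffs, min_effect=0.05))  # cannot rule out an effect
```

Note how imprecise data (a large standard error) make it harder, not easier, to conclude "no effect" here, which is the property that protects against the wide-confidence-interval problem described above.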
Fig. 3.5 Hypothetical observed effects (circles) and their associated 100(1−α)% confidence intervals. The solid line represents zero effect, and dashed lines represent minimum biologically important effects. In case A, the confidence interval for the estimated effect excludes zero effect and includes only biologically important effects, so the study is both statistically and biologically important. In case B, the confidence interval excludes zero effect, so the study is statistically significant; however, the confidence interval also includes values below those thought to be biologically important, so the study is inconclusive biologically. In case C, the confidence interval includes zero effect and biologically important effects, so the study is both statistically and biologically inconclusive. In case D, the confidence interval includes zero effect but excludes all effects considered biologically important, so the “practical” null hypothesis of no biologically important effect can be accepted with 100(1−α)% confidence. In case E, the confidence interval excludes zero effect but does not include effects considered biologically important, so the study is statistically but not biologically important. Reproduced from Steidl and Thomas (2001) with kind permission from Springer Science + Business Media
As in much of life, things can and often do go wrong in the best-designed studies.
The following are a few case studies that illustrate adjustments that salvage a study
when problems occur.
Case 1 – As previously discussed, Sawyer (2006) conducted a study to deter-
mine the impact of gas development on habitat use and demographics of mule deer
in southwestern Wyoming. Although the study of habitat use clearly demonstrated
a decline in use of otherwise suitable habitat, the lack of a suitable control ham-
pered identification of the relationship of this impact to population demographics.
Sawyer (2006) established a reference area early in the study based on historical
data supplemented by aerial surveys during a pilot study period. While the impact
area boundary remained suitable over the course of the 4-year study, the boundary
around the control area turned out to be inadequate. That is, each year the deer dis-
tribution was different, resulting in the need for continually expanding the area
being surveyed as a control. Thus, even though the numbers of deer remained rela-
tively unchanged in the reference area, the fact that the boundaries continued to
change made a comparison of abundance and other demographic characteristics
between the control and impact area problematic. Demographic data for the deer
within the impact area did show declines in reproductive rate and survival, although
the reductions were not statistically different from 0. Additionally, emigration rates
did not satisfactorily explain the decline in deer numbers in the impact area. Finally,
simulations using the range of reproduction and survival measured in the impact area suggested that those declines, while not statistically significant, could, when combined with emigration rates, explain the decline in deer numbers. While there is
still opportunity for confounding and cause and effect is still strictly professional
judgment, the weight of evidence suggests that the loss in effective habitat caused
by the gas development may have resulted in a decline in deer abundance and sup-
ports a closer look at the impact of gas development on mule deer in this area.
Case 2 – McDonald (2004) surveyed statisticians and biologists, and reported
successes and failures in attempts to study rare populations. One of the survey
respondents, Lowell Diller (Senior Biologist, Green Diamond Resource Company,
Korbel, California, USA) suggested that “A rare population is one where it is diffi-
cult to find individuals, utilizing known sampling techniques, either because of small
numbers, secretive and/or nocturnal behavior, or because of clumped distribution
over large ranges, i.e., a lot of zeros occur in the data. Therefore, a rare population
is often conditional on the sampling techniques available.” Lowell provided an illus-
tration of his point by describing surveys conducted for snakes during the mid-1970s
on the Snake River Birds of Prey Area in southern Idaho. Surveys were being con-
ducted for night snakes (Hypsiglena torquata), which were thought to be one of the
rarest snakes in Idaho with only four known records for the state. His initial surveys,
using standard collecting techniques for the time (turning rocks and such along some
transect, or driving roads at night), confirmed that night snakes were very rare. In the
second year of his study, however, he experimented with drift fences and funnel traps
and suddenly began capturing numerous night snakes. They turned out to be the
most common snakes in certain habitats and were the third most commonly captured
snake within the entire study area. This case study illustrates two points: unsuccessful surveys may be the result of “common wisdom” being incorrect, and/or standard techniques may be ineffective for some organisms and/or situations.
Case 3 – The Coastal Habitat Injury Assessment started immediately after the
EVOS in 1989 with the selection of heavily oiled sites for determining the rate of
recovery. To allow an estimate of injury, the entire oiled area was divided into 16
strata based on the substrate type (exposed bedrock, sheltered bedrock, boulder/cob-
ble, and pebble/gravel) and degree of oiling (none, light, moderate, and heavy). Four
sites were then selected from each of the 16 strata for sampling to estimate the abun-
dance of more than a thousand species of animals and plants. The stratification and
site selection were all based on the information in a geographical information system
(GIS). Unfortunately, some sites were excluded from sampling because of their
proximity to active eagle nests, and more importantly, many of the oiling levels were
misclassified and some of the unoiled sites were under the influence of freshwater
dramatically reducing densities of marine species. So many sites were misclassified in the GIS that the initial study design was abandoned in 1990. Instead, researchers matched each of the moderately and heavily oiled sites sampled in 1989
with a comparable unoiled control site based on physical characteristics, resulting in
a paired comparison design. The Trustees of Natural Resources Damage Assessment,
the state of Alaska and the US Government, estimated injury by determining the
magnitude of difference between the paired oiled and unoiled sites (Highsmith et al.
1993; McDonald et al. 1995; Harner et al. 1995). Manley (2001) provides a detailed
description of the rather unusual analysis of the resulting data.
McDonald (2004) concluded that the most important characteristics of success-
ful studies are (1) they trusted in random sampling, systematic sampling with a
random start, or some other probabilistic sampling procedure to spread the initial
sampling effort over the entire study area and (2) they used appropriate field proce-
dures to increase detection and estimate the probability of detection of individuals
on sampled units. It seems clear that including good study design principles in the
initial study as described in this chapter increases the chances of salvaging a study
when things go wrong.
3.21 Retrospective Studies
As the name implies, a retrospective study is an observational study that looks backward in time. Retrospective studies can be an analysis of existing data or a study of
events that have already occurred. For example, we find data on bird fatalities from
several independent surveys of communications towers and we figure out why they
died. Similarly, we design a study to determine the cause of fatalities in an area that
has been exposed to an oil spill. A retrospective study can address specific statistical
hypotheses relatively rapidly, because data are readily available or already in hand;
all we need to do is analyze the data and look for apparent treatment effects and cor-
relations. In the first case, the birds are already dead; we just have to tabulate all the
results and look at the information available for each communications tower.
Numerous mensurative experiments used to test hypotheses are retrospective in nature (see Sinclair 1991; Nichols 1991), and medical research on human diseases is usually retrospective. Retrospective studies stand in contrast to prospective studies, which are designed on the basis of a priori hypotheses about events that have not yet occurred.
Retrospective studies are common in ecology and are the only option in most
post hoc impact assessments. Williams et al. (2002) offer two important caveats to
the interpretation of retrospective studies. First, inferences from retrospective stud-
ies are weak, primarily because response variables may be influenced by unrecog-
nized and unmeasured covariates. Second, patterns found through mining the data
collected during a retrospective study are often used to formulate a hypothesis that
is then tested with the same data. This second caveat brings to mind two comments
Lyman McDonald heard Wayne Fuller make at a lecture at Iowa State University.
The paraphrased comments are that “the good old data are not so good” and “more
will be expected from the data than originally designed.” In general, data mining should be avoided, or used only to develop hypotheses that are then tested with newly obtained empirical data. Moreover, all the above study design principles apply to
retrospective studies.
3.22 Summary
Single-factor designs are the simplest and include both paired and unpaired
experiments of two treatments or a treatment and control. Adding blocking, including randomized block, incomplete block, and Latin squares designs, further complicates the completely randomized design. Multiple-factor designs include factorial experiments, two-factor experiments, and multifactor experiments. Higher order
designs result from the desire to include a large number of factors in an experiment.
The object of these more complex designs is to allow the study of as many factors
as possible while conserving observations. Hierarchical designs, as the name implies, increase complexity by nesting experimental units, as in split-plot and repeated measures designs. The price of increased complexity is a
reduction in effective sample size for individual factors in the experiment.
ANCOVA uses the concepts of ANOVA and regression to improve studies by
separating treatment effects on the response variable from the effects of covari-
ates. ANCOVA can also be used to adjust response variables and summary statis-
tics (e.g., treatment means), to assist in the interpretation of data, and to estimate
missing data.
Multivariate analysis considers several related random variables simultaneously,
each one considered equally important at the start of the analysis. This is particu-
larly important in studying the impact of a perturbation on the species composition
and community structure of plants and animals. Multivariate techniques include
multidimensional scaling and ordination analysis by methods such as principal
component analysis and detrended canonical correspondence analysis.
Other designs frequently used to increase efficiency, particularly in the face of
scarce financial resources, or when manipulative experiments are impractical
include sequential designs, crossover designs, and quasiexperiments.
Quasiexperiments are designed studies conducted when control and randomization
opportunities are possible, but limited. The lack of randomization limits statistical
inference to the study protocol and inference beyond the study protocol is usually
expert opinion. The BACI study design is usually the optimum approach to
quasiexperiments. Meta-analysis of a relatively large number of independent stud-
ies improves the confidence in making extrapolations from quasiexperiments.
An experiment is statistically very powerful if the probability of concluding no effect when in fact an effect exists is very small. Four interrelated factors determine statistical power: power increases as sample size, α-level, and effect size increase; power decreases as variance increases. Understanding statistical power
requires an understanding of Type I and Type II error, and the relationship of these
errors to null and alternative hypotheses. It is important to understand the concept
of power when designing a research project, primarily because such understanding
grounds decisions about how to design the project, including methods for data col-
lection, the sampling plan, and sample size. To calculate power, the researcher must have established a hypothesis to test, understand the expected variability in the data to be collected, decide on an acceptable α-level, and, most importantly, set a biologically relevant response level. Retrospective power analysis occurs after the study is
completed, the data have been collected and analyzed, and with a known outcome.
Statisticians typically dismiss retrospective power analysis as being uninformative
References
Anderson, R. L., J. Tom, N. Neumann, and J. A. Cleckler. 1996. Avian Monitoring and Risk
Assessment at Tehachapi Pass Wind Resource Area, California. Staff Report to California
Energy Commission, Sacramento, CA, November, 1996.
Anderson, R. L., M. L. Morrison, K. Sinclair, and M. D. Strickland. 1999. Studying Wind Energy/
Bird Interactions: A Guidance Document. Avian Subcommittee of the National Wind
Coordinating Committee, Washington, DC.
Barrett, M. A., and P. Stiling. 2006. Key deer impacts on hardwood hammocks near urban areas.
J. Wildl. Manage. 70(6): 1574–1579.
Bates, J. D., R. F. Miller, and T. Svejcar. 2005. Long-term successional trends following western
juniper cutting. Rangeland Ecol. Manage. 58(5): 533–541.
Berenbaum, M. R., and A. R. Zangerl. 2006. Parsnip webworms and host plants at home and
abroad: Trophic complexity in a geographic mosaic. Ecology 87(12): 3070–3081.
Borgman, L. E., J. W. Kern, R. Anderson-Sprecher, and G. T. Flatman. 1996. The sampling theory
of Pierre Gy: Comparisons, implementation, and applications for environmental sampling, in
L. H. Keith, Ed. Principles of Environmental Sampling, 2nd Edition, pp. 203–221. ACS
Professional Reference Book, American Chemical Society, Washington, DC.
Box, G. E. P., and G. C. Tiao. 1975. Intervention analysis with applications to economic and envi-
ronmental problems. J. Am. Stat. Assoc. 70: 70–79.
Box, G. E. P., W. G. Hunter, and J. S. Hunter. 1978. Statistics for Experimenters, an Introduction
to Design, Data Analysis, and Model Building. Wiley, New York.
Bystrom, P., L. Persson, and E. Wahlstrom. 1998. Competing predators and prey: Juvenile bottle-
necks in whole-lake experiments. Ecology 79(6): 2153–2167.
Cade, T. J. 1994. Industry research: Kenetech wind power. In Proceedings of National Avian-Wind
Power Planning Meeting, Denver, CO, 20–21 July 1994, pp. 36–39. Rpt. DE95-004090. Avian
Subcommittee of the National Wind Coordinating Committee, % RESOLVE Inc., Washington,
DC, and LGL Ltd, King City, Ontario.
Cochran, W. G. 1977. Sampling Techniques, 3rd Edition. Wiley, New York.
Cochran, W. G., and G. Cox. 1957. Experimental Designs, 2nd Edition. Wiley, New York.
Cohen, J. 1973. Statistical power analysis and research results. Am. Educ. Res. J. 10: 225–229.
Cox, D. R. 1958. Planning of Experiments (Wiley Classics Library Edition published 1992).
Wiley, New York.
Cox, J. J., D. S. Maehr, and J. L. Larkin. 2006. Florida panther habitat use: New approach to an
old problem. J. Wildl. Manage. 70(6): 1778–1785.
Crowder, M. J., and D. J. Hand. 1990. Analysis of Repeated Measures. Chapman and Hall,
London.
Dallal, G. E. 1992. The 17/10 rule for sample-size determinations (letter to the editor). Am. Stat.
46: 70.
Dillon, W. R., and M. Goldstein. 1984. Multivariate analysis methods and applications. Wiley,
New York.
Eberhardt, L. L., and J. M. Thomas. 1991. Designing environmental field studies. Ecol. Monogr.
61: 53–73.
Egger, M., M. Schneider, and D. Smith. 1998. Meta-analysis: Spurious precision? Meta-analysis
of observational studies. Br. Med. J. 316: 140–144.
Erickson, W. P., and L. L. McDonald. 1995. Tests for bioequivalence of control media and test
media in studies of toxicity. Environ. Toxicol. Chem. 14: 1247–1256.
Erickson, W., G. Johnson, D. Young, D. Strickland, R. Good, M. Bourassa, K. Bay, and K. Sernka.
2002. Synthesis and Comparison of Baseline Avian and Bat Use, Raptor Nesting and Mortality
Information from Proposed and Existing Wind Developments. Prepared by Western
EcoSystems Technology, Inc., Cheyenne, WY, for Bonneville Power Administration, Portland,
OR. December 2002 [online]. Available: http://www.bpa.gov/Power/pgc/wind/Avian_and_
Bat_Study_12-2002.pdf
Fairweather, P. G. 1991. Statistical power and design requirements for environmental monitoring.
Aust. J. Mar. Freshwat. Res. 42: 555–567.
Fisher, R. A. 1966. The Design of Experiments, 8th Edition. Hafner, New York.
Fisher, R. A. 1970. Statistical Methods for Research Workers, 14th Edition. Oliver and Boyd,
Edinburgh.
Flemming, R. M., K. Falk, and S. E. Jamieson. 2006. Effect of embedded lead shot on body condi-
tion of common eiders. J. Wildl. Manage. 70(6): 1644–1649.
Folks, J. L. 1984. Combination of independent tests, in P. R. Krishnaiah and P. K. Sen, Eds.
Handbook of Statistics 4, Nonparametric Methods, pp. 113–121. North-Holland, Amsterdam.
Garton, E. O., J. T. Ratti, and J. H. Giudice. 2005. Research and experimental design, in C.
E. Braun, Ed. Techniques for Wildlife Investigation and Management, 6th Edition, pp. 43–71.
The Wildlife Society, Bethesda, Maryland, USA.
Gasaway, W. C., S. D. Dubois, and S. J. Harbo. 1985. Biases in aerial transect surveys for moose
during May and June. J. Wildl. Manage. 49: 777–784.
Gerard, P. D., D. R. Smith, and G. Weerakkody. 1998. Limits of retrospective power analysis.
J. Wildl. Manage. 62: 801–807.
Gilbert, R. O. 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand
Reinhold, New York.
Gilbert, R. O., and J. C. Simpson. 1992. Statistical methods for evaluating the attainment of
cleanup standards. Vol. 3, Reference-Based Standards for Soils and Solid Media. Prepared by
Pacific Northwest Laboratory, Battelle Memorial Institute, Richland, WA, for U.S.
Environmental Protection Agency under a Related Services Agreement with U.S. Department
of Energy, Washington, DC. PNL-7409 Vol. 3, Rev. 1/UC-600.
Gitzen, R. A., J. J. Millspaugh, and B. J. Kernohan. 2006. Bandwidth selection for fixed-kernel
analysis of animal utilization distributions. J. Wildl. Manage. 70(5): 1334–1344.
Glass, G. V., P. D. Peckham, and J. R. Sanders. 1972. Consequences of failure to meet assumptions
underlying the fixed effects analyses of variance and covariance. Rev. Educ. Res. 42: 237–288.
Gordon, A. D. 1981. Classification. Chapman and Hall, London.
Green, R. H. 1979. Sampling Design and Statistical Methods for Environmental Biologists. Wiley,
New York.
Green, R. H. 1984. Some guidelines for the design of biological monitoring programs in the
marine environment, in H. H. White, Ed. Concepts in Marine Pollution Measurements,
pp. 647–655. University of Maryland, College Park. MD.
Harner, E. J., E. S. Gilfillan, and J. E. O’Reilly. 1995. A comparison of the design and analysis
strategies used in assessing the ecological consequences of the Exxon Valdez. Paper presented
at the International Environmetrics Conference, Kuala Lumpur, December 1995.
Herring, G., and J. A. Collazo. 2006. Lesser scaup winter foraging and nutrient reserve acquisition
in east-central Florida. J Wildl. Manage. 70(6): 1682–1689.
Highsmith, R. C., M. S. Stekoll, W. E. Barber, L. Deysher, L. McDonald, D. Strickland, and
W. P. Erickson. 1993. Comprehensive assessment of coastal habitat, final status report. Vol. I,
Coastal Habitat Study No. 1A. School of Fisheries and Ocean Sciences, University of Alaska,
Fairbanks, AK.
Howell, J. A. 1995. Avian Mortality at Rotor Swept Area Equivalents. Altamont Pass and
Montezuma Hills, California. Prepared for Kenetech Windpower (formerly U.S. Windpower,
Inc.), San Francisco, CA.
Huitema, B. E. 1980. The Analysis of Covariance and Alternatives. Wiley, New York.
Hunt, G. 1995. A Pilot Golden Eagle population study in the Altamont Pass Wind Resource Area,
California. Prepared by Predatory Bird Research Group, University of California, Santa Cruz
CA, for National Renewable Energy Laboratory, Golden, CO. Rpt. TP-441-7821.
Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecol.
Monogr. 54: 187–211.
James, F. C., and C. E. McCulloch. 1990. Multivariate analysis in ecology and systematics:
Panacea or Pandora’s box? Annu. Rev. Ecol. Syst. 21: 129–166.
Johnson, D. H. 1995. Statistical sirens: The allure of nonparametrics. Ecology 76: 1998–2000.
Johnson, D. H. 1999. The insignificance of statistical significance testing. J. Wildl. Manage. 63(3):
763–772.
Johnson, B., J. Rogers, A. Chu, P. Flyer, and R. Dorrier. 1989. Methods for Evaluating the
Attainment of Cleanup Standards. Vol. 1, Soils and Solid Media. Prepared by WESTAT
Research, Inc., Rockville, MD, for U.S. Environmental Protection Agency, Washington, DC.
EPA 230/02-89-042.
Kalcounis-Ruppell, M. C., J. M. Psyllakis, and R. M. Brigham. 2005. Tree roost selection by bats:
An empirical synthesis using meta-analysis. Wildl. Soc. Bull. 33(3): 1123–1132.
Kempthorne, O. 1966. The Design and Analysis of Experiments. Wiley, New York.
Krebs, C. J. 1989. Ecological Methodology. Harper and Row, New York.
Brunjes, K. J., W. B. Ballard, M. H. Humphrey, F. Harwell, N. E. McIntyre, P. R. Krausman, and
M. C. Wallace. 2006. Habitat use by sympatric mule and white-tailed deer in Texas. J. Wildl.
Manage. 70(5): 1351–1359.
Lanszki, J. M., M. Heltai, and L. Szabo. 2006. Feeding habits and trophic niche overlap between
sympatric golden jackal (Canis aureus) and red fox (Vulpes vulpes) in the Pannonian ecoregion
(Hungary). Can. J. Zool. 84: 1647–1656.
Ludwig, J. A., and J. F. Reynolds. 1988. Statistical Ecology: A Primer on Methods and Computing.
Wiley, New York.
Manly, B. F. J. 1986. Multivariate Statistical Methods: A Primer. Chapman and Hall, London.
Manly, B. F. J. 1991. Randomization and Monte Carlo Methods in Biology. Chapman and Hall,
London.
Manly, B. F. J. 1992. The Design and Analysis of Research Studies. Cambridge University Press,
Cambridge.
Manly, B. F. J. 1997. Randomization, Bootstrap and Monte Carlo Methods in Biology, 2nd
Edition, 300 pp. Chapman and Hall, London (1st edition 1991, 2nd edition 1997).
Manly, B. F. J. 2001. Statistics for Environmental Science and Management. Chapman and Hall/
CRC, London.
Martin, L. M., and B. J. Wisley. 2006. Assessing grassland restoration success: Relative roles of
seed additions and native ungulate activities. J. Appl. Ecol. 43: 1098–1109.
McDonald, L. L. 2004. Sampling rare populations, in W. L. Thompson, Ed. Sampling Rare or
Elusive Species, pp. 11–42. Island Press, Washington, DC.
McDonald, L. L., and W. P. Erickson. 1994. Testing for bioequivalence in field studies: Has a dis-
turbed site been adequately reclaimed?, in D. J. Fletcher and B. F. J. Manly, Eds. Statistics in
Ecology and Environmental Monitoring, pp. 183–197. Otago Conference Series 2, Univ.
Otago Pr., Dunedin, New Zealand.
McDonald, L. L., W. P. Erickson, and M. D. Strickland. 1995. Survey design, statistical analysis,
and basis for statistical inferences in Coastal Habitat Injury Assessment: Exxon Valdez Oil
Spill, in P. G. Wells, J. N. Butler, and J. S. Hughes, Eds. Exxon Valdez Oil Spill: Fate and
Effects in Alaskan Waters. ASTM STP 1219. American Society for Testing and Materials,
Philadelphia, PA.
McKinlay, S. M. 1975. The design and analysis of the observational study – A review. J. Am. Stat.
Assoc. 70: 503–518.
Mead, R., R. N. Curnow, and A. M. Hasted. 1993. Statistical Methods in Agriculture and
Experimental Biology, 2nd Edition. Chapman and Hall, London.
Mieres, M. M., and L. A. Fitzgerald. 2006. Monitoring and managing the harvest of tegu lizards
in Paraguay. J. Wildl. Manage. 70(6): 1723–1734.
Miles, A. C., S. B. Castleberry, D. A. Miller, and L. M. Conner. 2006. Multi-scale roost site selec-
tion by evening bats on pine-dominated landscapes in southwest Georgia. J. Wildl. Manage.
70(5): 1191–1199.
Miller, D. A., E. B. Arnett, and M. J. Lacki. 2003. Habitat management for forest-roosting bats of
North America: A critical review of habitat studies. Wildl. Soc. Bull. 31: 30–44.
Milliken, G. A., and D. E. Johnson. 1984. Analysis of Messy Data. Van Nostrand Reinhold, New
York.
Montgomery, D. C. 1991. Design and Analysis of Experiments, 2nd Edition. Wiley, New York.
Morrison, M. L., G. G. Marcot, and R. W. Mannan. 2006. Wildlife–Habitat Relationships:
Concepts and Applications, 2nd Edition. University of Wisconsin Press, Madison, WI.
National Research Council. 1985. Oil in the Sea: Inputs, Fates, and Effects. National Academy,
Washington, DC.
Nichols, J. D. 1991. Extensive monitoring programs viewed as long-term population studies: The
case of North American waterfowl. Ibis 133(Suppl. 1): 89–98.
Orloff, S., and A. Flannery. 1992. Wind Turbine Effects on Avian Activity, Habitat Use, and
Mortality in Altamont Pass and Solano County Wind Resource Areas. Prepared by Biosystems
Analysis, Inc., Tiburon, CA, for California Energy Commission, Sacramento, CA.
Page, D. S., E. S. Gilfillan, P. D. Boehm, and E. J. Harner. 1993. Shoreline ecology program for
Prince William Sound, Alaska, following the Exxon Valdez oil spill: Part 1 – Study design and
methods [Draft]. Third Symposium on Environmental Toxicology and Risk: Aquatic, Plant,
and Terrestrial. American Society for Testing and Materials, Philadelphia, PA.
Peterle, T. J. 1991. Wildlife Toxicology. Van Nostrand Reinhold, New York.
Peterman, R. M. 1989. Application of statistical power analysis on the Oregon coho salmon prob-
lem. Can. J. Fish. Aquat. Sci. 46: 1183–1187.
Application. PB88-100151. U.S. Department of the Interior, CERCLA 301 Project, Washington,
DC.
Volesky, J. D., W. H. Schacht, P. E. Reece, and T. J. Vaughn. 2005. Spring growth and use of cool-
season graminoids in the Nebraska Sandhills. Rangeland Ecol. Manage. 58(4): 385–392.
Walters, C. 1986. Adaptive Management of Renewable Resources. Macmillan, New York.
Western Ecosystems Technology, Inc. 1995. Draft General Design, Wyoming Windpower
Monitoring Proposal. Appendix B in Draft Kenetech/PacifiCorp Windpower Project
Environmental Impact Statement. FES-95-29. Prepared by U.S. Department of the Interior,
Bureau of Land Management, Great Divide Resource Area, Rawlins, WY, and Mariah
Associates, Inc., Laramie, WY.
Williams, B. K., J. D. Nichols, and M. J. Conroy. 2002. Analysis and Management of Animal
Populations, Modeling, Estimation, and Decision Making. Academic, New York.
Winer, B. J. 1971. Statistical Principles in Experimental Design, 2nd Edition. McGraw-Hill, New
York.
Winer, B. J., D. R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design,
3rd Edition. McGraw-Hill, New York.
Wolfe, L. L., W. R. Lance, and M. W. Miller. 2004. Immobilization of mule deer with Thiafentanil
(A-3080) or Thiafentanil plus Xylazine. J. Wildl. Dis. 40(2): 282–287.
Zar, J. H. 1998. Biostatistical Analysis, 2nd Edition. Prentice-Hall, Englewood Cliffs, NJ.
Chapter 4
Sample Survey Strategies
4.1 Introduction
The goal of wildlife ecology research is to learn about wildlife populations and
their use of habitats. The objective of this chapter is to provide a description of the
fundamentals of sampling for wildlife and other ecological studies. We discuss a
majority of sampling issues from the perspective of design-based observational
studies where empirical data are collected according to a specific study design. We
end the chapter with a discussion of several common model-based sampling
approaches that combine collection of new data with parameters from the literature
or data from similar studies by way of a theoretical mathematical/statistical model.
This chapter draws upon and summarizes topics from several books on applied statistical
sampling and wildlife monitoring, and we encourage interested readers to consult
Thompson and Seber (1996), Thompson (2002b), Thompson et al. (1998), Cochran
(1977), and Williams et al. (2002).
Typically, the availability of resources is limited in wildlife studies, so research-
ers are unable to carry out a census of a population of plants or animals. Even in
the case of fixed organisms (e.g., plants), the amount of data may make it impossi-
ble to collect and process all relevant information within the available time. Other
methods of data collection may be destructive, making measurements on all indi-
viduals in the population infeasible. Thus, in most cases wildlife ecologists must
study a subset of the population and use information collected from that subset to
make statements about the population as a whole. This subset under study is called
a sample and is the focus of this section. We again note that there is a significant
difference between a statistical population and a biological population (Chap. 1).
All wildlife studies should involve random selection of units for study through
sample surveys. This will result in data that can be used to estimate the biological
parameters of interest. Studies that require a sample must focus on several different
factors. What is the appropriate method to obtain a sample of the population of
interest? Once the method is determined, what measurements will be taken on the
characteristics of the population? Collecting the sample entails questions of sam-
pling design, plot delineation, sample size estimation, enumeration (counting)
methods, and determination of what measurements to record (Thompson 2002b).
our interest is in estimating the mean number of individuals per plot. The sample
mean (X̄) will be an unbiased estimator for the population mean (μ), or the average
population size, for each randomly selected sample. While the population mean is
the average measurement over all N units in the population (after Cochran 1977;
Thompson 2002b), defined as

\mu = \frac{1}{N}(X_1 + X_2 + \cdots + X_N) = \frac{1}{N}\sum_{i=1}^{N} X_i ,

the sample mean X̄ is the average count from the n surveyed plots selected under
the simple random sampling design, estimated by

\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i .
In this situation, we assumed we had a finite population of known size N. Thus,
within a simple random sampling framework, the sample variance (s²) is an unbi-
ased estimator for the finite population variance σ²:

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 .
This approach holds true for estimation of subpopulation means (the mean of a statisti-
cal population based on stratification). A subpopulation mean is one where we wish
to estimate the mean of a subsample of interest. For example, consider the situation
where we want to estimate the abundance of mice (Mus or Peromyscus) across an
agricultural landscape. After laying out our sampling grid, however, we determine
that abundance of the two species should be estimated for both fescue (Family
Poaceae) and mixed warm-season grass fields. Thus, we are interested in both the
mean number of mice per sample plot and the mean number of mice per sample
plot within a habitat type, i.e., a subpopulation. For habitat type h, our sample
mean subpopulation estimate would be

\bar{X}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} X_{hi}

with sample variance

s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}\left(X_{hi} - \bar{X}_h\right)^2 .
As many ecological researchers wish to estimate the total population size based on
sample data, under a situation with no subpopulation estimates, our estimator for
the total population size (T) would be

\hat{T} = \frac{N}{n}\sum_{i=1}^{n} X_i .
Additional information on estimation of population total and means for more com-
plex designs can be found in Cochran (1977) and Thompson (2002b).
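As a minimal numerical illustration of these estimators (the plot counts and frame size below are hypothetical, not from any study cited here), the sample mean, sample variance, and estimated total for a simple random sample of n = 5 plots from N = 100 can be computed as:

```python
# Hypothetical counts of animals on n = 5 plots drawn by simple random
# sampling from a frame of N = 100 equal-sized plots.
counts = [3, 0, 5, 2, 4]
N, n = 100, len(counts)

xbar = sum(counts) / n                               # sample mean, X-bar
s2 = sum((x - xbar) ** 2 for x in counts) / (n - 1)  # sample variance, s^2
T_hat = (N / n) * sum(counts)                        # estimated total, (N/n) * sum(X_i)

print(xbar, round(s2, 2), T_hat)  # 2.8 3.7 280.0
```

The total is simply the per-plot mean scaled up by the number of plots in the frame, which is why it is only as good as the randomness of the plot selection.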
We use sampling designs to ensure that the data collected are as accurate as possible
for a given cost. Thus, plot construction requires that researchers evaluate the
impacts that different plot sizes and shapes have on estimator precision. Although
the importance of determining optimal sizes and shapes for sampling plots [for con-
sistency within the text, we are using "plots" rather than "quadrats" as defined by
Krebs (1999)] is obvious, there has been little research on plot construction in
wildlife science beyond the work of Krebs (1989, 1999) and the general discussion
by Thompson et al. (1998). Wildlife tend to be nonrandomly distributed across
the landscape and are influenced by inter- and intraspecific interactions (Fretwell
and Lucas 1970; Block and Brennan 1993). When developing a sampling design to
study a population, the researcher must decide what size of plots should be used
and what shape of plots would be most appropriate based on the study question and
the species life history (Thompson et al. 1998; Krebs 1999). Most frequently, plot
size and shape selection is based on statistical criteria (e.g., minimum standard
error), although in studies of ecological scale, the shape and size will be dependent
upon the process under study (Krebs 1999). Additionally, it is important to realize
that estimates of precision (variance) are dependent upon the distribution of the
target organism(s) in the plots to be sampled (Wiegert 1962).
Krebs (1999) listed three approaches to determine which plot shape and size
would be optimal for a given study:
1. Statistically, or the plot size which has the highest precision for a specific area
or cost
2. Ecologically, or the plot size which is most efficient for answering the question
of interest
3. Logistically, or the plot size which is the easiest to construct and use
Plot shape is directly related to both the precision of the counts taken within the plot
and potential coverage of multiple habitat types (Krebs 1999). Four primary factors
influence plot shape selection: (1) detectability of individuals, (2) distribution of
individuals, (3) edge effects, and (4) data collection methods. Shape relates to count
precision because of the edge effect, which causes the researcher to decide whether
an individual is within the sample plot or not, even when total plot size is equal
(Fig. 4.1). Given plots of equal area, long and narrow rectangular plots will have
greater edge effect than square or circular plots. Thompson (1992) concluded that
rectangular plots were more efficient than other plots for detecting individuals.
Note that, in general, long and narrow rectangular plots will have a greater chance
of intersecting species with a clumped distribution. Previous research in vegetation
science has shown that rectangular plots are more efficient (higher precision) than
square plots (Kalamkar 1932; Hasel 1938; Pechanec and Stewart 1940; Bormann
1953). Size is more related to efficiency in sampling (Wiegert 1962), in that we are
trying to estimate population parameters as precisely as possible at the lowest cost
(Schoenly et al. 2003). Generally, larger plots have a lower ratio of edge to interior,
Fig. 4.1 An example of three different types of plot shapes, each with the same area, but with
different edge (perimeter)-to-area ratios. Reproduced from Thompson et al. (1998) with kind
permission from Elsevier
limiting potential edge effects. Large plots, however, are typically more difficult to
survey based on cost and logistics. Thus, under a fixed budget, there is a general
trade-off between plot size and the number of plots to sample. Hendricks (1956)
developed a method based on the finding that variance declines as plot area
increases, but this approach is less flexible because it relies on several assumptions,
such as proportionality of sampling cost per unit area.
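The edge-versus-area trade-off can be made concrete with simple geometry. The sketch below (the 100 m² area is an arbitrary choice of ours) compares the perimeter of a circle, a square, and a long, narrow rectangle that all enclose the same area:

```python
import math

area = 100.0  # hypothetical fixed plot area, in square meters

# Perimeter ("edge") of three plot shapes that enclose the same area:
square = 4 * math.sqrt(area)            # 10 m x 10 m plot
rectangle = 2 * (2.0 + area / 2.0)      # 2 m x 50 m plot
circle = 2 * math.sqrt(math.pi * area)  # radius of about 5.64 m

# For equal areas, the circle has the least edge and the long, narrow
# rectangle the most -- hence the rectangle's greater edge effect.
assert circle < square < rectangle
print(round(circle, 1), square, rectangle)  # 35.4 40.0 104.0
```

More edge means more borderline individuals whose inclusion the observer must judge, which is the precision cost that offsets the rectangle's better coverage of clumped distributions.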
Fig. 4.2 Two normal distributions with the same mean and different variances
Methods for sample selection typically fall into two general categories: nonran-
dom sampling and random sampling. In random sampling, also called probability
sampling, the selection of units for inclusion in the sample has a known probability
of occurring. If the sample is selected randomly, then based on sample survey theory
(Cochran 1977) the sample estimates will be normally distributed. With normally
distributed estimates, knowledge of the sample mean and variance specifies the
shape of the normal distribution (Fig. 4.2). There is considerable literature justifying
the need for probabilistic sampling designs from a standpoint of statistical inference
(Cochran 1977; Thompson and Seber 1996; Thompson 2002b), but little evidence
exists that nonprobabilistic samples can be inferentially justified (Cochran 1977;
Anderson 2001; Thompson 2002a). In wildlife ecology, nonprobabilistic sampling
designs are likely to be divided into several (overlapping) categories which we gen-
eralize as convenience/haphazard sampling (hereafter convenience) or judgment
sampling/search sampling (hereafter judgment) while probabilistic sampling is the
other category used in wildlife ecology. For the rest of the chapter, we will discuss
these different sampling designs and their application to wildlife ecology research.
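The claim that estimates from randomly selected samples are approximately normally distributed can be checked by simulation. In this sketch the population values and the seed are hypothetical; it draws repeated simple random samples from a deliberately skewed (clumped) population of plot counts:

```python
import random
import statistics

random.seed(7)  # arbitrary seed, for a reproducible run

# A skewed, hypothetical population of plot counts: many empty plots
# and a few large aggregations, as in clumped wildlife distributions.
population = [0] * 60 + [1] * 20 + [2] * 10 + [10] * 10
mu = statistics.mean(population)  # true population mean, 1.4

# Sampling distribution of the mean over many repeated random samples:
means = [statistics.mean(random.sample(population, 20)) for _ in range(2000)]

# The sample means center on the population mean even though the
# population itself is far from normal.
assert abs(statistics.mean(means) - mu) < 0.1
```

With the mean and variance of this sampling distribution in hand, the shape of the normal curve in Fig. 4.2 is fully specified.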
Convenience sampling has historically been the most common approach to sam-
pling wildlife populations. A convenience sample is one where the samples chosen
are based on an arbitrary selection procedure, often based on accessibility, and jus-
tified because of constraints on time, budgets, or study logistics. Gilbert (1987, p. 19)
noted in discussion of haphazard sampling, that:
Haphazard sampling embodies the philosophy of “any sampling location will do.” This
attitude encourages taking samples at convenient locations (say near the road) or times,
which can lead to biased estimates of means and other population characteristics.
Haphazard sampling is appropriate if the target population is completely homogeneous.
This assumption is highly suspect in most wildlife studies.
While convenience sampling procedures are often justified on economic grounds (e.g.,
it is easier to sample roads than to contact landowners for access), the economy is
often illusory, because such samples do not allow wide-ranging inferences, thus
limiting their applicability. Probabilistic samples allow the researcher to design a
study and be confident that the results are sufficiently accurate and economical
(Cochran 1977). Nonprobabilistic sampling, while common, does not lend itself to
valid statistical inference or estimation of variability, and more cost is often incurred
attempting to validate convenience samples than would be spent developing and
applying probabilistic designs.
Random sampling is the process by which samples are selected from a set of n dis-
tinct sampling units, where each sample has a known likelihood of selection prede-
termined by the sampling methods chosen (Cochran 1977; Foreman 1991). Samples
selected probabilistically provide a basis for inference (estimation of means and
variances) from the data collected during the sampling process; samples from non-
probability designs do not have this characteristic.
The simplest form of random sampling is sampling at a single level or scale. That
is, the study area is divided into a set of potential units from which a sample is
taken. For example, a study area could be divided into a grid of sample plots all of
the same size from which a simple random sample is drawn (Fig. 4.3). The organ-
isms of interest in each cell of the selected sample are then counted. In its simplest
form, single-level sampling with a simple random sample assumes that we have n =
100 distinct sampling units, S1, S2, …, Sn, where each unit Si has a known probability
of selection (pi), i.e., the probability that the ith unit is taken (Cochran 1977).
Assuming that each sampling unit (plot) is of equal size, the probability that any
single plot is chosen to be sampled is 1/100, or pi = 0.01. In the application of single-
level probability sampling we assume that each unit in the population has the same
chance of being selected. Although this assumption may be modified by other
probabilistic sampling schemes (e.g., stratified sampling or unequal probability
sampling), the decisions regarding sample selection satisfy this assumption.
Sampling at more than one level, however, often is beneficial in wildlife studies.
Multilevel sampling can be simple, such as selecting subsamples of the original
probability sample for additional measurements as described in ranked set sam-
pling (Sect. 4.3.5). Multilevel sampling can be more complicated, such as double
sampling to estimate animal abundance (Sect. 4.3.6). In the correct circumstances,
multilevel sampling can increase the quality of field data, often at a lower cost.
Fig. 4.3 A simple sampling frame of 100 sample plots that can be used for selecting a simple
random sample
Although a simple random sample is the most basic method for sample selection,
there are others that are relevant to wildlife ecology studies, including stratified
random sampling, systematic sampling, sequential random sampling, cluster sam-
pling, adaptive sampling, and so on. These sampling plans (and others) can be
combined or extended to provide a large number of options for study designs,
which can include concepts like unequal probability sampling. Many sampling
designs are complicated, thus statistical guidance is suggested to select the appro-
priate design and analysis approaches. Below we discuss several sampling scales
and then appropriate designs for each scale.
In stratified sampling, the sampling frame is separated into different regions (strata)
comprising the population to be surveyed, and a sample of units within each stratum
is selected for study, usually by a random or systematic process. Ideally, strata should
be homogeneous with respect to the variable of interest itself (e.g., animal density),
but in practice, stratification is usually based on covariates that scientists hope are
highly correlated with the variable of interest (e.g., habitat type influences animal
density). Stratification may be used to increase the likelihood that the sampling
effort will be spread over important subdivisions or strata of the study area, popula-
tion, or study period (Fig. 4.4). Similarly, units might also be stratified for subsam-
pling. For example, when estimating the density of forest interior birds, the wildlife
biologist might stratify the study area into regions of high, medium, and low canopy
cover and sample each independently, perhaps in proportion to area size.
Stratification is common in wildlife studies, as it often is used to estimate param-
eters within strata and for contrasting parameters among strata. This type of analy-
sis is referred to using “strata as domains of study … in which the primary purpose
is to make comparisons between different strata” (Cochran 1977, p. 140). Under
Fig. 4.4 Stratification based on the density of a population. Reproduced from Krebs (1999) with
kind permission from Pearson Education
stratified designs, the formulas for analysis and for allocation of sampling effort
(Cochran 1977, pp. 140–141) are quite different from formulas appearing in intro-
ductory texts such as Scheaffer et al. (1990), where the standard objective is to
minimize the variance of summary statistics for all strata combined.
The primary objective of stratification is improved precision based on optimal
allocation of sampling effort into more homogeneous strata. In practice, it may be
possible to create homogeneous strata with respect to one or a few primary indicators,
but there are often many indicators measured, and it is not likely that the units within
strata will be homogeneous for all of them. For example, one could stratify a study
area based on vegetative characteristics and find that the stratification works well for
indicators of effect associated with trees. But, because of management (e.g., grazing),
the grass understory might be completely different and make the stratification
unsatisfactory for indicators of effect measured in the understory. Even for the
primary indicators, the hoped-for differences in variance among strata may not
occur, or stratification may not perform substantially better than simple random
sampling. Factors used to stratify an area should be based on the
spatial location of regions where the population is expected to be relatively homoge-
neous, the size of sampling units, and the ease of identifying strata boundaries. Strata
should be of obvious biological significance for the variables of interest.
A fundamental problem is that strata normally are of unequal sizes; therefore, units
from different strata have different weights in any overall analysis. The formulas for
computing an overall mean and its standard error based on stratified sampling are rela-
tively complex (Cochran 1977). Formulas for the analysis of subpopulations (subunits
of a study area) that belong to more than one stratum (Cochran 1977, pp. 142–144;
Thompson 2002b) are even more complex for basic statistics such as means and totals.
Samples can be allocated to strata in proportion to strata size or through some optimal
allocation process (Thompson 2002b). When using stratification with proportional allocation, the sample is self-weighting: estimates of the overall mean and proportion are the same as the corresponding estimates from a simple random sample. Although proportional allocation is straightforward, it may not make the most
efficient use of time and budget. If it is known that within strata variances differ, sam-
ples can be allocated to optimize sample size. Detailed methods for optimizing sample
size are described in Cochran (1977) and Thompson (2002b).
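As a rough sketch of these allocation ideas, the following Python computes a stratified estimate of the overall mean under proportional allocation, with the finite-population-corrected variance terms of Cochran (1977); the function name and example strata are hypothetical:

```python
import random

def stratified_mean(strata, n_total):
    """Stratified estimate of an overall mean with proportional allocation.

    strata: dict mapping stratum name -> list of all unit values (the frame).
    n_total: total sample size, allocated to strata in proportion to size.
    Returns (estimated overall mean, estimated variance of that mean).
    """
    N = sum(len(units) for units in strata.values())
    est_mean, est_var = 0.0, 0.0
    for units in strata.values():
        N_h = len(units)
        W_h = N_h / N                        # stratum weight
        n_h = max(2, round(n_total * W_h))   # proportional allocation (>= 2)
        sample = random.sample(units, min(n_h, N_h))
        n_h = len(sample)
        ybar = sum(sample) / n_h
        s2 = sum((y - ybar) ** 2 for y in sample) / (n_h - 1)
        est_mean += W_h * ybar
        est_var += W_h ** 2 * (1 - n_h / N_h) * s2 / n_h  # fpc-adjusted term
    return est_mean, est_var
```

With optimal (Neyman) allocation, n_h would instead be made proportional to N_h times the within-stratum standard deviation rather than to N_h alone.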
Stratification has some inherent problems. In any stratification scheme, some
potential study sites will be misclassified in the original classification (e.g., a dark
area classified as a pond on the aerial photo was actually a parking lot). Stratification
is often based on maps that are inaccurate, resulting in misclassification of sites that
have no chance of selection. Misclassified portions of the study area can be adjusted
once errors are found, but data analysis becomes much more complicated, primarily
because of differences in the probability of selecting study units in the misclassified
portions of the study area. Short-term studies usually lead to additional research
questions requiring longer term research and a more complicated analysis of sub-
populations (Cochran 1977, pp. 142–144) that cross strata boundaries. However,
strata may change over the course of a study. Typical strata for wildlife studies
include physiography/topography, vegetative community, land use, temporal
frame, or management action of interest. Note, however, that the temporal aspect
of a study is of particular significance when stratifying on a variable that will likely
change with time (e.g., land use). Stratified sampling works best when applied to
short-term studies, thus reducing the likelihood that strata boundaries will change.
In long-term studies, initial stratification procedures at the beginning of the study
are likely to be the most beneficial to the investigators.
In systematic sampling, the sampling frame is partitioned into primary units where
each primary unit consists of a set of secondary units (Thompson 2002b). Sampling
then entails selecting units spaced in some systematic fashion throughout the popula-
tion based on a random start (Foreman 1991). A systematic sample from an ordered
list would consist of sampling every kth item in the list. A spatial sample typically
utilizes a systematic grid of points. Systematic sampling distributes the locations of
samples (units) uniformly through the list or over the area (site). Mathematical prop-
erties of systematic samples are not as straightforward as for random sampling, but
the statistical precision generally is better (Scheaffer et al. 1990).
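The mechanics of selecting every kth unit after a random start can be sketched in a few lines of Python; the function name and frame are illustrative only:

```python
import random

def systematic_sample(frame, k, start=None):
    """Systematic sample: every k-th unit of an ordered frame after a random start.

    frame: ordered list of sampling units (a list, or plots along a baseline).
    k: sampling interval; start: optional fixed start in range(k), for testing.
    """
    if start is None:
        start = random.randrange(k)  # the single random start fixes the whole sample
    return frame[start::k]

# Example: a 10% systematic sample of 100 ordered plots.
sample = systematic_sample(list(range(100)), k=10)
```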
Systematic sampling has been criticized for two basic reasons. First, the arrange-
ment of points may follow some unknown cyclic pattern in the response variable.
This problem receives considerable theoretical attention but seldom arises in
4.3 Sampling Designs 149
practice. If there are known cyclic patterns in the area of interest, the patterns
should be used to advantage to design a better systematic sampling plan. For exam-
ple, in a study of the cumulative effects of proposed wind energy development on
passerines and shore birds in the Buffalo Ridge area of southwestern Minnesota,
Strickland et al. (1996) implemented a grid of sampling points resulting in observa-
tions at varying distances from the intersection of roads laid out on section lines.
Second, in classical finite sampling theory (Cochran 1977), variation is assessed in
terms of how much the result might change if a different random starting point could
be selected for the uniform pattern. For a single uniform grid of sampling points (or a
single set of parallel lines) this is impossible, and thus variation cannot be estimated in
the classical sense. Various model-based approximations have been proposed for the
elusive measure of variation in systematic sampling (Wolter 1984). Sampling variance
can be estimated by replicating the systematic sample. For example, in a study requir-
ing a 10% sample it would be possible to take multiple smaller samples (say a 1%
sample repeated ten times), each with a random starting point. Inference to the popula-
tion mean and total can be made in the usual manner for simple random sampling.
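The replicated-systematic idea (ten 1% samples rather than one 10% sample) can be sketched as follows; each replicate gets its own random start, and the replicate means are then treated as a simple random sample of estimates (names are hypothetical):

```python
import random

def replicated_systematic_means(frame, k, reps):
    """Draw `reps` systematic samples, each with its own random start, and
    return the per-replicate sample means; their spread estimates the
    sampling variance of the systematic design."""
    starts = random.sample(range(k), reps)  # distinct random starts; reps <= k
    return [sum(frame[s::k]) / len(frame[s::k]) for s in starts]

# Ten 1% replicates (k = 100) instead of a single 10% systematic sample:
# means = replicated_systematic_means(population, k=100, reps=10)
```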
Systematic sampling works very well in the following situations:
1. Analyses of observational data conducted as if random sampling had been con-
ducted (effectively ignoring the potential correlation between neighboring loca-
tions in the uniform pattern of a systematic sample)
2. Encounter sampling with unequal probability (Overton et al. 1991; Otis et al.
1993)
3. The model-based analysis commonly known as spatial statistics, wherein mod-
els are proposed to estimate treatment effects using the correlation between
neighboring units in the systematic grid (kriging)
The design and analysis in case 1 above is often used in evaluation of indicators of
a treatment response (e.g., change in density) in relatively small, homogeneous study
areas or small study areas where a gradient is expected in measured values of the
indicator across the area. Ignoring the potential correlation and continuing the analysis as if it were justified by random sampling can be defended (Gilbert and Simpson
1992), especially in situations where a conservative statistical analysis is desired
(e.g., impact assessment). Estimates of variance treating the systematic sample as a
random sample will tend to overestimate the true variance of the sample (Hurlbert
1984; Scheaffer et al. 1990; Thompson 2002). Thus, systematic sampling in rela-
tively small impact assessment study areas following Gilbert and Simpson’s (1992)
formulas for analysis makes a great deal of sense. This applies whether systematic
sampling is applied to compare two areas (assessment and reference), the same area
before and following the incident, or between strata of a stratified sample.
In wildlife studies, populations tend to be aggregated or clustered, thus sample units
closer to each other will be more likely to be similar. For this reason, systematic sampling
tends to overestimate the variance of parameter estimates. A uniform grid of points or
parallel lines may not encounter rare units. To increase the likelihood of capturing some
of these rare units, scientists may stratify the sample such that all units of each distinct
type are joined together into strata and simple random samples are drawn from each
stratum. Nevertheless, stratification works best if the study is short term, no units are
misclassified and no units change strata during the study. In longer term studies, such as
the US Environmental Protection Agency’s (EPA’s) long-term Environmental Monitoring
and Assessment Program (EMAP), as described by Overton et al. (1991), systematic
sampling has been proposed to counter these problems.
Cluster sampling is closely related to systematic sampling. A cluster sample is
a probabilistic sample in which each sampling unit is a collection, or cluster, of
elements such as groups of animals or plants (Scheaffer et al. 1990; Thompson
2002b). One of the most common uses of cluster sampling is the two-stage cluster
sample. First, the researcher selects a probabilistic sample of primary plots, each containing (in this example) eight secondary plots. Then, within each selected primary plot, the researcher either selects another probability sample from the eight secondary plots or retains the full cluster of eight secondary plots and conducts the enumeration method within each of those plots (Fig. 4.5). The selection of progressively smaller subsets of elements within the original set of sample clusters leads
to a multistage cluster sample. Cluster sampling methods can become considerably
complex, depending on sampling design, study question, and phenology of the spe-
cies under study (Christman 2000). For example, consider an ecologist interested
in estimating Greater Prairie-chicken (Tympanuchus cupido) lek numbers in the
Fig. 4.5 (a) Cluster sample of ten primary units with each primary unit consisting of eight
secondary units; (b) systematic sample with two starting points. Reproduced from Thompson
(2002) with kind permission from Wiley
plains during the breeding season. Lek sites are typically clustered spatially relative to the size of the grassland matrix these birds inhabit; thus, if a lek is located within a primary sample plot, we would expect other leks in the vicinity. For this
reason, the researcher would randomly sample primary plots across a landscape of
Greater Prairie-chicken habitat, then, within those large plots, conduct enumeration
of lek numbers within the secondary plots.
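The two-stage selection just described can be sketched as follows; the grid of primary plots, each holding eight secondary plots, and the function name are hypothetical:

```python
import random

def two_stage_cluster_sample(primary_units, n_primary, n_secondary=None):
    """Stage 1: randomly select n_primary primary units (clusters).
    Stage 2: within each, either census all secondary units (n_secondary=None)
    or draw a random subsample of n_secondary of them."""
    clusters = random.sample(primary_units, n_primary)
    if n_secondary is None:
        return [list(c) for c in clusters]            # enumerate the whole cluster
    return [random.sample(c, n_secondary) for c in clusters]
```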
Thompson (2002b, pp. 129–130) lists several features of systematic and cluster sampling that make these designs worth evaluating for ecological studies:
• In systematic sampling, it is not uncommon to have a sample size of 1, that is, a
single primary unit (see Fig. 4.5).
• In cluster sampling, the size of the cluster may serve as auxiliary information
that may be used either in selecting clusters with unequal probabilities or in
forming ratio estimators.
• The size and shape of clusters may affect efficiency.
4.3.4.1 Definitions
Fig. 4.6 A hypothetical example of adaptive sampling, illustrating a mule deer winter range that
is divided into 400 study units (small squares) with simple random sample of ten units selected
(dark squares in (a) ) potentially containing deer carcasses (black dots). Each study unit and all
adjacent units are considered a neighborhood of units. The condition of including adjacent units
in the adaptive sample is the presence of one or more carcasses (black dots) in the unit. Additional
searches result in the discovery of additional carcasses in a sample of 45 units in ten clusters (dark
squares in (b) ). (Thompson 1990. Reprinted with permission from the Journal of the American
Statistical Association, Copyright 1990 by the American Statistical Association. All rights
reserved)
following a severe winter, a survey for deer carcasses is conducted. An initial sim-
ple random sample of ten units is selected (see Fig. 4.6a). Each study unit and all
adjacent units are considered a neighborhood of units. The condition of including
adjacent units is the presence of one or more carcasses in the sampled unit. With
the adaptive design, additional searches are conducted in those units in the same
neighborhood of a unit containing a carcass in the first survey. Additional searches
are conducted until no further carcasses are discovered, resulting in a sample of 45
units in ten clusters (see Fig. 4.6b).
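The carcass example can be mimicked with a small sketch of the adaptive expansion rule; the grid layout and names are hypothetical, and neighborhoods here are the four rook-move neighbors:

```python
def adaptive_cluster_sample(grid, initial, condition=lambda v: v > 0):
    """Expand an initial sample adaptively: whenever a sampled cell satisfies
    `condition` (here: contains >= 1 carcass), add its four rook-move
    neighbours, and keep expanding until no new qualifying cells appear.

    grid: dict mapping (row, col) -> count in that unit.
    initial: iterable of (row, col) cells in the initial random sample.
    Returns the final set of sampled cells (clusters plus edge units)."""
    sampled = set(initial)
    frontier = [c for c in sampled if condition(grid.get(c, 0))]
    while frontier:
        r, c = frontier.pop()
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb in grid and nb not in sampled:
                sampled.add(nb)
                if condition(grid.get(nb, 0)):
                    frontier.append(nb)  # neighbour also meets the condition
    return sampled
```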
The potential benefits of adaptive sampling are obvious in the mule deer exam-
ple. The number of carcasses (point-objects in Fig. 4.6) is relatively small in the
initial sample. The addition of four or five more randomly selected sample units
probably would not have resulted in the detection of the number of carcasses con-
tained in the ten clusters of units. Thus, the precision of the estimates obtained from
the cluster sample of 45 units is greater than from a random sample of 45 units. This
increase in precision could translate into cost savings by reducing required samples
for a given level of precision. Cost savings also could result from reduced cost and
time for data collection given the logistics of sampling clusters of sampled units vs.
potentially a more widely spread random sample of units. This cost saving, how-
ever, is partially offset by increased record keeping and increased training costs.
Although there are numerous adaptive sampling options, design efficiency depends
upon several factors, including initial sample size, population distribution, plot
shape, and selection conditions (Smith et al. 1995; Thompson 2002b). Thus we
recommend that adaptive designs be pilot tested before implementation to ensure that estimate precision and sampling efficiency are increased over alternate designs.
The potential efficiencies of precision and cost associated with adaptive sampling
come with a price. Computational complexities are added because of sample size
uncertainty and unequal probability associated with the sample unit selection. Units
within the neighborhood of units meeting the condition enter the sample at a much
higher probability than the probability of any one unit when sampled at random,
resulting in potentially biased estimates of the variable of interest. For example, unit i is included if it is selected in the initial sample, if it is in the network of any selected unit, or if it is an edge unit of a selected network. In sampling with replacement,
repeat observations in the data may occur either due to repeat selections in the ini-
tial sample or due to initial selection of more than one unit in a cluster.
The Horvitz–Thompson (H–T) estimator (Horvitz and Thompson 1952) provides an unbiased estimate of the parameter of interest when the probability αi that unit i is included in the sample is known: the value for each unit in the sample is divided by the probability that the unit is included. Inclusion probabilities are seldom known in field studies, but a modified H–T estimator, in which estimates of the inclusion probabilities are obtained from the data as described by Thompson and Seber (1996), is also unbiased.
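The H–T estimator itself is one line once the inclusion probabilities are known; this sketch assumes they are supplied by the design:

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz–Thompson estimate of a population total: each sampled value is
    weighted by the inverse of its known inclusion probability."""
    return sum(y / p for y, p in zip(values, inclusion_probs))
```

For a simple random sample of n = 2 from N = 4, every unit has inclusion probability 2/4 = 0.5, and the estimator reduces to N times the sample mean.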
Thompson and Seber (1996) and Thompson (2002b) summarized a variety of other
adaptive sampling designs. Strip adaptive cluster sampling includes sampling an
initial strip(s) of a given width divided into units of equal lengths. Systematic adap-
tive cluster sampling may be used when the initial sampling procedure is based on
a systematic sample of secondary plots within a primary plot. Stratified adaptive
cluster sampling may be useful when the population is highly aggregated with dif-
ferent expectations of densities between strata. In this case, follow-up adaptive
sampling may cross strata boundaries (Thompson 2002b). Thompson and Seber
(1996) also discuss sample size determination based on initial observations within
primary units, strata, or observed values in neighboring primary units or strata.
Adaptive sampling has considerable potential in ecological research, particularly in
studies of rare organisms and organisms occurring in clumped distributions.
In Fig. 4.7, the initial sample consists of five randomly selected strips or primary
units. The secondary units are small, square plots. Whenever a target element is
located, adjacent plots are added to the sample, which effectively expands the width
of the primary strip. As depicted in the figure, because this is a probabilistic sampling procedure, not all target elements are located (in fact, you might not know they
exist). For systematic adaptive cluster sampling (Fig. 4.8) the initial sample is a
spatial systematic sample with two randomly selected starting points. Adjacent
plots are added to the initial sample whenever a target element is located. The
choice of the systematic or strip adaptive cluster design depends primarily on the a
priori decision to use a specific conventional sampling design to gather the initial
sample, such as the preceding example using aerial or line transects.
Stratified adaptive cluster sampling essentially works like the previous adaptive
designs, and is most often implemented when some existing information on which to base an initial stratification is available. In conventional (nonadaptive) stratified sampling, units that are thought to be similar are grouped a priori into strata based on prior
information. For example, in Fig. 4.9a, the initial stratified random sample of five
units in two strata is established. Then, whenever a sampling unit containing the
Fig. 4.7 An example of an adaptive cluster sample with initial random selection of five strip plots
with the final sample outlined. Reproduced from Thompson (1991a) with kind permission from
the International Biometric Society
Fig. 4.8 An example of an adaptive cluster sample with initial random selection of two system-
atic samples with the final sample outlined. Reproduced from Thompson (1991a) with kind
permission from the International Biometric Society
Fig. 4.9 (a) Stratified random sample of five units per strata. (b) The final sample, which results
from the initial sample shown in (a). Reproduced from Thompson et al. (1991b) with kind permis-
sion from Oxford University Press
desired element is encountered, the adjacent units are added. The final sample in
this example (Fig. 4.9b) shows how elements from one stratum can be included in a
cluster initiated in the other stratum (some units in the right-side stratum were
included in the cluster [sample] as a result of an initial selection in the left-side
stratum). Thompson (2002b, pp. 332–334) provides a comparison of this example
with conventional stratified sampling.
There are four challenges you will encounter when considering implementing an
adaptive cluster design (Smith et al. 2004, pp. 86–87):
Double sampling would normally be used with independent samples where an initial (relatively small) sample of size n1 is taken in which both y and x are measured. The mean of x in this sample is calculated or, if N is known, the corresponding total is estimated:

x̄₁ = ∑ xᵢ/n₁  or  X̂₁ = N x̄₁.

In a relatively large sample of size n2 (or a census), only the variable x is measured. Its mean is

x̄₂ = ∑ xᵢ/n₂  or  X̂₂ = N x̄₂.

In some situations the mean x̄₂ (or the total X̂₂) is known from census data, so its standard error is zero. As an example, suppose X₂ = total production for a farmer's native hay field and y = potential production without deer as measured in n1 = 10 deer-proof exclosures randomly located in the field. Two variables (xᵢ, yᵢ) are measured on the ith exclosure, where yᵢ is the biomass present on a plot inside the exclosure and xᵢ is the biomass present on a paired plot outside the exclosure.
The ratio of production inside the exclosures to production outside the exclosures is

R̂ = ȳ₁/x̄₁ = ∑ yᵢ / ∑ xᵢ.
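The ratio estimator above, and its use with a known field total, can be sketched as follows; the paired values and the example total are hypothetical:

```python
def ratio_estimate(pairs):
    """R-hat = sum(y_i) / sum(x_i) (equivalently ybar/xbar) for paired plots,
    with x measured outside and y inside each exclosure."""
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    return sum_y / sum_x

# With the field total X2 known from the farmer's records, the estimated
# potential production without deer is Y-hat = R-hat * X2.
```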
ing a pilot study, allowing one to identify species most likely affected by a treat-
ment or impact. In the second stage and with pilot information gained, the more
expensive and time-consuming indicators (e.g., the actual number of individuals)
might be measured on a subset of the units. If the correlation between the indicators
measured on the double-sampled units is sufficiently high, precision of statistical
analyses of the expensive and/or time-consuming indicator is improved.
Application of double sampling has grown in recent years, particularly for cor-
recting for visibility bias. Eberhardt and Simmons (1987) suggested double sam-
pling as a way to calibrate aerial observations. Pollock and Kendall (1987) included
double sampling in their review of the methods for estimating visibility bias in aerial
surveys. Graham and Bell (1969) reported an analysis of double counts made during
aerial surveys of feral livestock in the Northern Territory of Australia using a similar
method to Caughley and Grice (1982) and Cook and Jacobson (1979). Several stud-
ies have used radiotelemetered animals to measure visibility bias, including Packard
et al. (1985) for manatees (Trichechus manatus), Samuel et al. (1987) for elk, and
Flowy et al. (1979) for white-tailed deer (Odocoileus virginianus). McDonald et al.
(1990) estimated the visibility bias of sheep groups in an aerial survey of Dall sheep (Ovis dalli) in the Arctic National Wildlife Refuge (ANWR), Alaska, using this
technique. Strickland et al. (1994) compared population estimates of Dall sheep in
the Kenai Wildlife Refuge in Alaska using double sampling following the Gasaway
et al. (1986) ratio technique and double sampling combined with logistic regression.
Recently, Bart and Earnst (2002) outlined applications of double sampling to esti-
mate bird population trends. Double sampling shows great promise in field sampling
where visibility bias is considered a major issue.
First, wildlife studies are usually plagued with the need for a large sample size in the
face of budgetary and logistical constraints. Ranked set sampling provides an oppor-
tunity to make the best of available resources through what Patil et al. (1994) referred
to as observational economy. Ranked set sampling can be used with any sampling
scheme resulting in a probabilistic sample. A relatively large probabilistic sample of
units (N) is selected containing one or more elements (ni) of interest. The elements then
are ranked within each unit based on some obvious and easily discernible characteris-
tic (e.g., patch size, % cover type). The ranked elements are then selected in ascending
or descending order of rank – one per unit – for further analysis. The resulting rank-
ordered sample provides an unbiased estimator of the population mean superior in
efficiency to a simple random sample of the same size (Dell and Clutter 1972).
Ranked set sampling is a technique originally developed for estimating vegetation
biomass during studies of terrestrial vegetation; however, the procedure deserves
much broader application (Muttlak and McDonald 1992). The technique is best
explained by a simple illustration. Assume 60 uniformly spaced sampling units are
arranged in a rectangular grid on a big game winter range. Measure a quick,
economical indicator of plant forage production (e.g., plant crown diameter) on each
of the first three units, rank order the three units according to this indicator, and
measure an expensive indicator (e.g., weight of current annual growth from a sample
of twigs) on the highest ranked unit. Continue by measuring shrub crown diameter
on the next three units (numbers 4, 5, and 6), rank order them, and estimate the
weight of current annual growth on the second-ranked unit. Finally, rank order units
7, 8, and 9 by plant crown diameter and estimate the weight of current annual growth
on the lowest-ranked unit; then start the process over on the next nine units. After completion of all 60 units, a ranked set sample of 20 units will be available for estimates of the weight of current annual growth. This sample is not as good as a sample
of size 60 for estimating the weight of current annual growth, but should have con-
siderably better precision than a simple random sample of size 20. Ranked set sam-
pling is most advantageous when the quick, economical indicator is highly correlated
with the expensive indicator, and ranked set sampling can increase precision and
lower costs over simple random sampling (Mode et al. 2002). These relationships
need to be confirmed through additional research. Also, the methodology for estima-
tion of standard errors and allocation of sampling effort is not straightforward.
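The 60-unit walk-through can be sketched directly; the cheap indicator is passed in as a function, and the ranks cycle highest, second, lowest as in the text (names are hypothetical):

```python
def ranked_set_sample(units, cheap, set_size=3):
    """Ranked set sampling: walk the units in sets of `set_size`, rank each set
    by the cheap indicator, and keep the highest-ranked unit from the first
    set, the second-ranked from the next, ..., cycling through the ranks.
    Only the returned units receive the expensive measurement."""
    chosen = []
    for i in range(0, len(units) - set_size + 1, set_size):
        ranked = sorted(units[i:i + set_size], key=cheap, reverse=True)
        chosen.append(ranked[(i // set_size) % set_size])
    return chosen
```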
One of the primary functions of sampling design is to draw a sample that we hope
provides good coverage of the area of interest and allows for precise estimates of the
parameter of interest. The simple Latin square sampling +1 design can provide better
sample coverage than systematic or simple random sampling, especially when the dis-
tribution of the target species exhibits spatial autocorrelation (Munholland and
Borkowski 1996). A simple Latin square +1 design is fairly straightforward: a sampling frame is developed first (note that the design can be used irrespective of plot shape or size), then a random sample of plots is selected such that each row and each column of the frame contains exactly one selected plot (Fig. 4.10a), and then a single plot (the +1) is selected at random from the remaining plots (the six unselected plots in Fig. 4.10a). Simple Latin square +1 sampling frames need not be square; they could also be linear (Fig. 4.10b) or a range of other shapes (Thompson et al. 1998) so long as the sampling frame can be fully specified.
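For a square frame, the selection can be sketched as follows (plot indexing is hypothetical):

```python
import random

def latin_square_plus_one(n):
    """Simple Latin square +1 sample from an n x n grid of plots: one plot per
    row with all selected columns distinct (a random permutation of columns),
    plus one extra plot drawn at random from the remaining plots."""
    cols = random.sample(range(n), n)            # column chosen for each row
    picks = {(r, cols[r]) for r in range(n)}
    remaining = [(r, c) for r in range(n) for c in range(n) if (r, c) not in picks]
    plus_one = random.choice(remaining)          # the "+1" plot
    return picks, plus_one
```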
Fig. 4.10 (a) A simple Latin square sample of +1 drawn from a sampling frame consisting of nine
square plots. Those plots having an “X” were the initial randomly selected plots based for each
row–column; the plot having an “O” is the +1 plot, which was randomly selected from the remain-
ing plots. (b) The same sampling frame adapted to a population tied to a linear resource.
Reproduced from Thompson et al. (1998) with kind permission from Elsevier
4.4 Point and Line Sampling 163
In the application of probability sampling, as seen above, one assumes each unit in
the population has equal chance of being selected. Although this assumption may be
modified by some sampling schemes (e.g., stratified sampling), the decisions regard-
ing sample selection satisfy this assumption. In the cases where the probability of
selection is influenced in some predictable way by some characteristic of the object
or organism, this bias must be considered in calculating means and totals. Examples
include line intercept sampling of vegetation (McDonald 1980; Kaiser 1983), plot-
less techniques such as the Bitterlich plotless technique for the estimation of forest
cover (Grosenbaugh 1952), aerial transect methods for estimating big game numbers
(Steinhorst and Samuel 1989; Trenkel et al. 1997), and the variable circular plot
method for estimating bird numbers (Reynolds et al. 1980). If the probability of
selection is proportional to some variable, then equations for estimating the magni-
tude and mean for population characteristics can be modified by an estimate of the
bias caused by this variable. Size bias estimation procedures are illustrated where
appropriate in the following discussion of sample selection methods.
N̂ = N′/a,

where the numerator (N′) equals the number of organisms counted and the denominator (a) equals the proportion of the area sampled. In the case of a simple random sample, the variance is estimated as

var̂(x) = ∑ (xᵢ − x̄)²/(n − 1),

summing over the n plots, where n = the number of plots, xᵢ = the number of organisms counted on plot i, and x̄ = the mean number of organisms counted per sample plot.
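These two formulas translate directly into code; the counts and the sampled proportion below are hypothetical:

```python
def fixed_plot_estimates(counts, prop_area_sampled):
    """Fixed-plot abundance estimate N-hat = N'/a, where N' is the total count
    over sampled plots and a is the proportion of the area sampled, plus the
    simple-random-sample variance of the per-plot counts."""
    n = len(counts)
    total = sum(counts)
    n_hat = total / prop_area_sampled
    mean = total / n
    var = sum((x - mean) ** 2 for x in counts) / (n - 1)
    return n_hat, var
```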
Sampling by fixed plot is best done when organisms are sessile (e.g., plants) or
when sampling occurs in a short time frame such that movements from plots have no
effect (e.g., aerial photography). We assume, under this design, that counts are
made without bias and no organisms are missed. If counts have a consistent bias
and/or organisms are missed, then estimation of total abundance may be inappro-
priate (Anderson 2001). Aerial surveys are often completed under the assumption
that few animals are missed and counts are made without bias. However, as a rule,
total counts of organisms, especially when counts are made remotely such as with
aerial surveys, should be considered conservative. Biases are also seldom consist-
ent. For example, aerial counts are likely to vary depending on the observer, the
weather, ground cover, pilot, and type of aircraft.
Fig. 4.11 Parameters in line intercept sampling, including the area (A = L × W) of the study area,
the objects of interest (1–5), aerial coverage (a1,…,an) of the objects, the intercept lines and their
random starting point and spacing interval. Reproduced from McDonald (1991) with kind permission
from Lyman McDonald
Biologists often do not recognize that items have been sampled with unequal probability and that the resulting data are size biased; care should be taken to recognize and correct for this source of bias. Size bias can be accounted for by calculating the probability of encountering the ith object, in a study area of length L and baseline width W, with a line perpendicular to the baseline drawn from a single randomly selected point:
pᵢ = wᵢ/W;  i = 1, 2, 3, …, N,

where wᵢ is the width of the ith object in relation to the baseline. The estimate of the number of objects, N, is

N̂ = ∑ (1/pᵢ) = W ∑ (1/wᵢ),

with the sums taken over the n intercepted objects, and the total of the attribute y over all objects in the area sampled is estimated by

Ŷ = W ∑ (yᵢ/wᵢ).

The estimate of the cover of objects is unbiased and can be computed as the proportion of the line that is intersected by the objects (Lucas and Seber 1977) using the formula

Ĉ = ∑ vᵢ/L,
where vi is the length of the intersection of the ith object with a single replicate line
of length L. Again, replication of lines of intercept m times allows the estimate of
a standard error for use in making statistical inferences. Equal length lines can be
combined in the above formula to equal L. Weighted means are calculated when
lines are of unequal length.
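The three estimators for a single replicate line can be combined in one sketch; the object widths, attributes, and intercept lengths below are hypothetical:

```python
def line_intercept_estimates(objects, W, L):
    """Line-intercept estimators for one replicate line.

    objects: list of (w_i, y_i, v_i) for each intercepted object, where w_i is
    its width along the baseline, y_i an attribute of interest, and v_i the
    length of line intersected. W: baseline length; L: line length.
    Returns (N-hat, Y-hat, C-hat) per the size-biased formulas."""
    n_hat = W * sum(1.0 / w for w, _, _ in objects)   # number of objects
    y_hat = W * sum(y / w for w, y, _ in objects)     # total of the attribute
    c_hat = sum(v for _, _, v in objects) / L         # proportional cover
    return n_hat, y_hat, c_hat
```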
Line intercept methodologies often employ systematic sampling designs. In the sys-
tematic placement of lines, the correct determination of the replication unit and thus
the correct sample size for statistical inferences is an issue. If sufficient distance
between lines exists to justify an assumption of independence, then the proper sample
size is the number of individual lines and the data are analyzed as if the individual lines
are independent replications. However, if the assumption of independence is not justi-
fied (i.e., data from individual lines are correlated) then the set of correlated lines is
considered the replication unit. The set of m lines could be replicated m′ times using a new random starting point each time, with L′ = m(L) as the combined transect length of each replication, yielding m′ independent estimates of the parameters of interest. Statistical inferences would then follow the standard procedures.
The general objectives in systematic location of lines are to:
1. Provide uniform coverage over the study region, R
2. Generate a relatively large variance within the replication unit vs. a relatively
small variance from replication to replication
For example, the total biomass and cover by large invertebrates on tidal influenced
beaches may be estimated by line intercept sampling with lines perpendicular to the
tidal flow. Standard errors computed for systematically located lines should be
conservative (too large) if densities of the invertebrates are more similar at the same
tidal elevation on all transects vs. different tidal elevations on the same transect
(condition 2 above is satisfied). Even if individual lines cannot be considered inde-
pendent, when condition 2 is satisfied then standard computational procedures for
standard errors can be used (i.e., compute standard errors as if the data were inde-
pendent) to produce conservative estimates.
Often one or more long lines are possible but the number is not sufficient to provide
an acceptable estimate of the standard error. Standard errors can be estimated by
breaking the lines into subsets, which are then used in a jackknife or bootstrap pro-
cedure. A good example might be surveys along linear features such as rivers or
highways. Skinner et al. (1997) used bootstrapping for calculating confidence inter-
vals around estimates of moose density along a long transect zigzagging along the
Innoko River in Alaska. Each zigzag is treated as an independent transect. While
there may be some lack of independence where the segments join, it is ignored in favor of acquiring an estimate of variance for moose density along the line. This works
best with a relatively large sample size that fairly represents the area of interest.
Skinner et al. (1997) reported satisfactory results with 40–60 segments per stratum.
Generally, the jackknife procedure estimates a population parameter by repeat-
edly estimating the parameter after one of the sample values is eliminated from the
calculation, resulting in several pseudoestimates of the parameter. The pseudoesti-
mates of the parameter are treated as a random sample of independent estimates of
the parameter, allowing an estimate of variance and confidence intervals. The bootstrap
begins with the selection of a random sample of n values X1, X2,…, Xn from a population
and the use of that sample to estimate some population parameter. Then a large number
of random samples (usually >1,000) of size n are taken from the original sample.
The large number of bootstrap samples is used to estimate the parameter of interest,
its variance, and a confidence interval. Both methods involve a large number of
calculations and require a computer. For details on the jackknife, bootstrap, and other
computer-intensive methods, see Manly (1991).
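The two procedures described above can be sketched in a few lines of code; here the estimator is simply the mean of hypothetical per-segment moose counts, and the helper names are ours:

```python
import random
import statistics

def jackknife_se(values, estimator):
    """Jackknife: repeatedly drop one value, form pseudoestimates,
    and treat them as an independent sample to estimate the SE."""
    n = len(values)
    theta_all = estimator(values)
    pseudo = []
    for i in range(n):
        theta_i = estimator(values[:i] + values[i + 1:])
        pseudo.append(n * theta_all - (n - 1) * theta_i)
    return statistics.fmean(pseudo), statistics.stdev(pseudo) / n ** 0.5

def bootstrap_se(values, estimator, n_boot=1000, seed=1):
    """Bootstrap: resample n values with replacement many times and
    use the spread of the replicate estimates as the SE."""
    rng = random.Random(seed)
    n = len(values)
    reps = [estimator([rng.choice(values) for _ in range(n)])
            for _ in range(n_boot)]
    return statistics.fmean(reps), statistics.stdev(reps)

# Hypothetical moose counts per transect segment:
segments = [3, 0, 2, 5, 1, 4, 2, 0, 3, 2]
jk_mean, jk_se = jackknife_se(segments, statistics.fmean)
bs_mean, bs_se = bootstrap_se(segments, statistics.fmean)
```

For the sample mean, the jackknife pseudoestimates reduce exactly to the original observations, so the jackknife standard error equals the classical standard error of the mean; for more complex estimators the two differ.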
Line transects are similar to line intercept sampling in that the basic sampling unit
is a line randomly or systematically located on a baseline, perpendicular to the
baseline, and extended across the study region. Unlike line intercept sampling,
objects are recorded on either side of the line according to some rule of inclusion.
When a total count of objects is attempted within a fixed distance of the line,
transect sampling is analogous to sampling on a fixed plot (see Sect. 4.4.1). This
form of line transect, also known as a belt (strip) transect, has been used by the US
Fish and Wildlife Service (Conroy et al. 1988) in aerial counts of black ducks. As
with most attempts at total counts, belt transect surveys usually do not detect 100%
of the animals or other objects within the strip. When surveys are completed
according to a standard protocol, the counts can be considered an index. Conroy et
al. (1988) recognized ducks were missed and suggested that survey results should
be considered an index to population size.
Line-transect sampling wherein the counts are considered incomplete has been
widely applied for estimation of density of animal populations (Laake et al. 1979,
1993). Burnham et al. (1980) comprehensively reviewed the theory and applica-
tions of this form of line-transect sampling. Buckland et al. (1993) updated the
developments in line-transect sampling through the decade of the 1980s. Alpizar-
Jara and Pollock (1996), Beavers and Ramsey (1998), Manly et al. (1996), Quang
and Becker (1996, 1997), and Southwell (1994) developed additional theory and
application. The notation in this section follows Burnham et al. (1980).
Line-transect studies have used two basic designs and analytic methods depending
on the type of data recorded: (1) perpendicular distances (x), or (2) sighting distances
(r) and angles (θ) (Fig. 4.12). Studies based on sighting distances and angles are
generally subject to more severe biases and are not emphasized in this discussion.
There are several assumptions required in the use of line-transect surveys
(Buckland et al. 2001), including:
1. Objects on the line are detected with 100% probability.
2. Objects do not move in response to the observer before detection (e.g., animal
movements are independent of observers).
3. Objects are not counted twice.
4. Objects are fixed at the point of initial detection.
5. Distances are measured without errors.
6. Transect lines are probabilistically located in the study area.
Fig. 4.12 The types of data recorded for the two basic types of line-transect study designs, including perpendicular distances (x) or sighting distances (r) and angles (θ). The probability of detecting an object at a perpendicular distance x from the transect line is known as the object's detection function, g(x). Reproduced from Burnham et al. (1980) with kind permission from The Wildlife Society
The detection function may take a variety of forms depending on factors such as
weather, observer training, vegetation type, etc., so long as all such functions satisfy
the condition that the probability of detection is 100% at the origin, x = 0
(Burnham et al. 1980).
The average probability of detection for an object in the strip of width 2w is
estimated by
P̂w = 1/(wf̂(0)),
where f(x) denotes the relative probability density function of the observed right
angle distances, xi, i = 1, 2,…,n. The function f(x) is estimated by a curve fitted to
the (relative) frequency histogram of the right angle distances to the observed
objects, and f̂(0) is estimated by the intersection of f(x) with the vertical axis at
x = 0. Given P̂w = 1/(wf̂(0)) and detection of n objects in the strip of width 2w and
length L, the observed density is computed by
D̂ = n/(2Lw).
The observed density is corrected for visibility bias by dividing by the average
probability of detection of objects to obtain
D̂ = (n/(2Lw)) / (1/(wf̂(0))) = nf̂(0)/(2L).
The width of the strip drops out of the formula for estimation of density of objects
allowing line-transect surveys with no bound on w (i.e., w = ∞). However, at large
distances from the line, the probability of detection becomes very low and it is
desirable to set an upper limit on w such that 1–3% of the most extreme observa-
tions are truncated as outliers. Decisions on dropping outliers from the data set can
be made after data are collected.
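A minimal sketch of the estimator D̂ = nf̂(0)/(2L), assuming a half-normal detection function (one of the standard forms): for the half-normal, f(0) = 1/√(σ²π/2), and the maximum-likelihood estimate of σ² is the mean squared perpendicular distance. The distances and line length below are hypothetical:

```python
import math

def halfnormal_density(perp_distances, total_line_length):
    """Line-transect density estimate D = n * f(0) / (2L), assuming a
    half-normal detection function g(x) = exp(-x^2 / (2 sigma^2)).

    For the half-normal with no truncation, f(0) = 1/sqrt(sigma^2 * pi/2)
    and the MLE of sigma^2 is the mean squared perpendicular distance.
    """
    n = len(perp_distances)
    sigma2 = sum(x * x for x in perp_distances) / n
    f0 = 1.0 / math.sqrt(sigma2 * math.pi / 2.0)
    return n * f0 / (2.0 * total_line_length)

# Hypothetical survey: 24 detections along L = 5,000 m of transect,
# perpendicular distances in metres; result is animals per square metre.
x = [4, 11, 2, 25, 7, 13, 40, 3, 9, 17, 5, 22,
     1, 30, 8, 14, 6, 19, 2, 10, 27, 4, 12, 16]
d_hat = halfnormal_density(x, 5000.0)
```

In practice, several candidate detection functions would be fitted and compared, and the most extreme 1–3% of distances truncated first, as described above.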
4.5.2 Replication
Estimates of the variances and standard errors associated with line-transect sampling
are usually made under the assumption that the sightings are independent events and
the number of objects detected is a Poisson random variable. If there are enough data
(i.e., ≥40 detected objects) on independent replications of transect lines or system-
atic sets of lines, then a better estimate of these statistics can be made. Replications
must be physically distinct and be located in the study area according to a true prob-
ability sampling procedure providing equal chance of detection for all individuals.
Given independent lines, the density should be estimated on each line and the stand-
ard error of density estimated by the usual standard error of the mean density
(weighted by line length if lines vary appreciably in length).
If there are not enough detections on independent replications, then jackknifing
the lines should be considered (Manly 1991). For example, to jackknife the lines,
repeatedly leave one line out of the data set and obtain a pseudoestimate of density
by basing estimates on the remaining lines. The mean of the pseudoestimates
and the standard error of the pseudoestimates would then be computed. While jack-
knifing small samples will allow the estimation of variance, sample sizes are not
increased and the pseudovalues are likely to be correlated to some extent, resulting
in a biased estimate of variance. The significance of this bias is hard to predict and
should be evaluated by conducting numerous studies of a range of situations before
reliance is placed on the variance estimator (Manly 1991).
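The leave-one-line-out jackknife, applied to the length-weighted mean density across replicate lines, can be sketched as follows; the per-line densities and lengths are hypothetical:

```python
import statistics

def jackknife_line_se(densities, lengths):
    """Leave-one-line-out jackknife for the length-weighted mean
    density across k replicate transect lines."""
    k = len(densities)
    total_len = sum(lengths)
    d_all = sum(d * l for d, l in zip(densities, lengths)) / total_len
    pseudo = []
    for i in range(k):
        # Length-weighted mean density with line i left out:
        loo_len = total_len - lengths[i]
        loo_d = (d_all * total_len - densities[i] * lengths[i]) / loo_len
        pseudo.append(k * d_all - (k - 1) * loo_d)
    return statistics.fmean(pseudo), statistics.stdev(pseudo) / k ** 0.5

# Hypothetical per-line density estimates (animals per ha) and line
# lengths (km):
dens = [1.8, 2.4, 1.1, 2.0, 1.6]
lens = [3.0, 2.5, 4.0, 3.5, 2.0]
d_jk, se_jk = jackknife_line_se(dens, lens)
```

With equal line lengths this reduces to the ordinary standard error of the mean density; as noted above, correlated pseudovalues from small samples can still bias the variance estimate.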
Size bias is an issue when the probability of detecting subjects is influenced by size
(e.g., the subject’s width, area, etc.). In particular, animals typically occur in groups,
and the probability of detecting an individual increases with group size. Estimates
of group density and mean group size are required to estimate the density of indi-
viduals and an overestimate of mean group size will lead to an overestimate of true
density. Drummer and McDonald (1987) proposed bivariate detection functions
incorporating both perpendicular distance and group size. Drummer (1991) offered
the software package SIZETRAN for fitting size-biased data. Quang (1989) pre-
sented nonparametric estimation procedures for size-biased line-transect surveys.
Distance-based methods have been combined with aerial surveys (Guenzel
1997) to become a staple for some big game biologists in estimating animal abun-
dance. As pointed out earlier (Sect. 4.5.1), the probability of detecting objects dur-
ing line-transect surveys can influence parameter estimates. Quang and Becker
(1996) offered an approach for incorporating any appropriate covariate influencing
detection into aerial surveys using line-transect methodology by modeling scale
parameters as log-linear functions of covariates. Manly et al. (1996) used a double-
sample protocol during aerial transect surveys of polar bear. Observations by two
observers were analyzed using maximum likelihood methods combined with an
information criterion (AIC) to provide estimates of the abundance of polar bears.
Beavers and Ramsey (1998) illustrated the use of ordinary least-squares regression
analyses to adjust line-transect data for the influence of variables (covariates).
The line-transect method is also proposed for use with aerial surveys and other
methods of estimating animal abundance such as a form of capture–recapture
(Alpizar-Jara and Pollock 1996) and double sampling (Quang and Becker 1997;
Manly et al. 1996). Lukacs et al. (2005) investigated the efficiency of trapping web
designs, which can be combined with distance sampling to estimate density or
abundance (Lukacs et al. 2004) and provided software for survey design (Lukacs
2002). In addition, line-transect methods have been developed that incorporate
covariates (Marques and Buckland 2004), combine capture–mark–recapture data
(Burnham et al. 2004), and address a host of other topics (Buckland et al. 2004).
The field of abundance and density estimation from transect-based sampling
schemes is active, so additional methodologies are sure to be forthcoming.
The concept of plotless or distance methods was introduced earlier in our discus-
sion of the line intercept method (see Sect. 4.4.2). Plotless methods from sample
points using some probability sampling procedure are considered more efficient
than fixed area plots when organisms of interest are sparse and counting of individ-
uals within plots is time consuming (Ludwig and Reynolds 1988).
Fig. 4.13 The two T-square sampling points and the two distances measured at each point.
Reproduced from Morrison et al. (2001) with kind permission from Springer Science + Business
Media
Standard error of (1/N̂T) = √[8(z̄²sx² + 2x̄z̄sxz + x̄²sz²)/n],

where sx² is the variance of the point-to-organism distances, sz² is the variance of the
T-square organism-to-neighbor distances, sxz is the covariance of the x and z distances,
and x̄ and z̄ are the corresponding mean distances.
The critical point estimate is the intersection of the detection function with the vertical axis at the origin.
Burnham et al. (1980) suggested trimming data so that roughly 95% of the observed
distances are used in the analysis. The assumption is that the outer 5% of observa-
tions are outliers that may negatively affect density estimates.
The assumption that counts are independent may be difficult to satisfy, as subjects being
counted are seldom marked or obviously unique. Biologists may consider estimat-
ing use per unit area per unit time as an index to abundance. When subjects are rela-
tively uncommon, the amount of time spent within distance intervals can be
recorded. In areas with a relatively high density of subjects, surveys can be con-
ducted as instantaneous counts of animals at predetermined intervals of time during
survey periods.
Models and assumptions play a role in all study designs. The importance of models and assumptions in the analysis of
empirical data ranges from little effect in design-based studies to being a critical
part of data analysis in model-based studies. Design-based studies result in pre-
dicted values and estimates of precision as a function of the study design. Model-
based studies lead to predicted values and estimates of precision based on a
combination of study design and model assumptions that are often open to criticism. The
following discussion focuses on the most prevalent model-based studies that are
heavily dependent on assumptions and estimation procedures involving linear and
logistic regression for data analysis. These study methods are only slightly more
model-based than some previously discussed (e.g., plotless and line intercept)
involving estimates of nuisance parameters such as detection probabilities, proba-
bilities of inclusion, and encounter probabilities.
The Petersen–Lincoln model has been used for years by wildlife biologists to esti-
mate animal abundance and is considered a closed population model. The Petersen–
Lincoln model should be considered an index to abundance when a systematic bias
prevents one or more of the assumptions described below from being satisfied.
In a Petersen–Lincoln study, a sample n1 of the population is taken at time t1 and all
organisms are uniquely marked. A second sample n2 is taken at time t2 and the
organisms captured are examined for a mark and a count is made of the recaptures
(m2). Population size (N) is estimated as
N̂ = n1n2/m2.
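The estimator is a one-line computation; the capture numbers below are hypothetical:

```python
def petersen_lincoln(n1, n2, m2):
    """Petersen-Lincoln estimate of closed population size:
    N_hat = n1 * n2 / m2, where n1 animals are marked at t1, n2 are
    examined at t2, and m2 of those carry marks."""
    if m2 == 0:
        raise ValueError("no recaptures: estimate undefined")
    return n1 * n2 / m2

# Hypothetical study: 120 animals marked at t1, 150 examined at t2,
# 30 of which carried marks.
n_hat = petersen_lincoln(120, 150, 30)  # -> 600.0
```

The nearly unbiased Chapman modification, (n1 + 1)(n2 + 1)/(m2 + 1) − 1, is often preferred in practice, particularly when recaptures are few.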
as in Menkins and Anderson (1988), works reasonably well. Kendall (1999) also
discussed the implications of these and other types of closure violations for studies
involving greater than two samples of the population.
The second assumption is related to the first and implies that each sample is a
simple random sample from a closed population and that marked individuals have
the same probability of capture as the unmarked animals. If the probability of cap-
ture is different for different classes of animals (say young vs. adults) or for differ-
ent locations, then the sampling could follow a stratified random sampling plan.
It is common in studies of large populations that a portion of the animal’s range
may be inaccessible due to topography or land ownership. The estimate of abun-
dance is thus limited to the area of accessibility. This can be a problem for animals
that have large ranges, as there is no provision for animals being unavailable during
either of the sampling periods. The probability of capture can also be influenced by
the conduct of the study such that animals become trap happy (attracted to traps) or
trap shy (repulsed from traps). The fact that study design seldom completely satis-
fies this assumption has led to the development of models (discussed below) that
allow the relaxation of this requirement.
The third assumption depends on an appropriate marking technique. Marks must
be recognizable without influencing the probability of resighting or recapture.
Thus, marks must not make the animal more or less visible to the observer or more
or less susceptible to mortality. Marks should not be easily lost. If the loss of marks
is a problem, double marking (Caughley 1977; Seber 1982) can be used for correc-
tions to the recapture data. New methods of marking animals are likely to help
refine the design of mark–recapture observational studies and experiments (Lebreton
et al. 1992). This assumption illustrates the need for standardized methods and
good standard operating procedures so that study plans are easy to follow and data
are properly recorded.
An appropriate study design can help meet the assumptions of the Petersen–
Lincoln model, but the two trapping occasions do not allow a test of the assumptions
upon which the estimates are based. Lancia et al. (2005) suggested that in two-sample
studies, the recapture method be different and independent of the initial sample
method. For example, one might trap and neckband mule deer and then use observa-
tion as the recapture method. This recommendation seems reasonable and should
eliminate the concern over trap response and heterogeneous capture probabilities.
Otis et al. (1978) and White et al. (1982) offered a modeling strategy for making
density and population size estimates using capture data on closed animal popula-
tions. With a complete capture history of every animal caught, these models allow
relaxation of the equal catchability assumption (Pollock 1974; Otis et al. 1978;
Burnham and Overton 1978; White et al. 1982; Pollock and Otto 1983; Chao 1987,
1988, 1989; Menkins and Anderson 1988; Huggins 1989, 1991; Brownie et al. 1993;
Lee and Chao 1994). A set of eight models is selected to provide the appropriate
estimator of the population size. The models are M0, Mt, Mb, Mh, Mtb, Mth, Mbh, and
Mtbh, where the subscript “0” indicates the null case, and t, b, and h, are as follows:
● 0 – All individuals have the same probability of capture throughout the entire
study
● t – Time-specific changes in capture probabilities (i.e., the Darroch 1958 model
where probability of capture is the same for all individuals on a given occasion)
● b – Capture probabilities change due to behavioral response from first capture
(i.e., probability of capture remains constant until first capture, can change once,
and then remains constant for the remainder of the study)
● h – Heterogeneity of capture probabilities in the population (i.e., different sub-
sets of the individuals have different probabilities of capture, but probability of
capture does not change during the course of the study)
This series of eight models includes all possible combinations of the three factors,
including none and all of them (Table 4.1 and Fig. 4.14). Population estimates from
removal data can also be obtained because the estimators for the removal model of
Zippin (1958) are the same as the estimators under the behavioral model Mb.
Estimators for the eight models can be found in Rexstad and Burnham (1991).
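For the two-occasion case the removal (Zippin-type) estimator has a closed form, N̂ = n1²/(n1 − n2) with p̂ = (n1 − n2)/n1 (e.g., Seber 1982); a sketch with hypothetical removal counts:

```python
def removal_two_sample(n1, n2):
    """Two-occasion removal estimator assuming constant capture
    probability p: E[n1] = N*p and E[n2] = N*(1-p)*p, giving moment
    estimators p_hat = (n1 - n2)/n1 and N_hat = n1**2 / (n1 - n2).
    Requires n1 > n2 (the catch must decline)."""
    if n1 <= n2:
        raise ValueError("first-occasion catch must exceed the second")
    p_hat = (n1 - n2) / n1
    n_hat = n1 * n1 / (n1 - n2)
    return n_hat, p_hat

# Hypothetical removals: 60 animals on occasion 1, 36 on occasion 2.
n_hat, p_hat = removal_two_sample(60, 36)  # N_hat = 150.0, p_hat = 0.4
```

With more than two occasions, or under models other than Mb, the estimators lack closed forms and must be solved iteratively, as noted below.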
We suggest you also check the US Geological Survey Patuxent Wildlife Research
Center’s software archive (http://www.pwrc.usgs.gov) for additional information
and updated software for mark–recapture data. Since explicit formulas do not exist
for the estimators, they must be solved by iterative procedures requiring a compu-
ter. The design issues are essentially identical to the two-sample Petersen–Lincoln
study with the condition of assumption 2 met through the repeated trapping events
and modeling.
Table 4.1 The eight models summarized by symbol, sources of variation in capture probability,
and the associated estimator, if any
Model Sources of variation in capture probabilities Appropriate estimator
M0 None Null
Mt Time Darroch
Mb Behavior Zippin
Mh Heterogeneity Jackknife
Mtb Time, behavior None
Mth Time, heterogeneity None
Mbh Behavior, heterogeneity Generalized removal
Mtbh Time, behavior, heterogeneity None
The names provided are those used by programs CAPTURE and MARK for these estimators
Fig. 4.14 The series of eight closed population models proposed includes all possible combina-
tions of three factors, including none and all of them. Reproduced from Otis et al. (1978) with kind
permission from The Wildlife Society
using more complex applications of these models. If the study is being done at
multiple sites then multistate models (e.g., Brownie et al. 1993; Williams et al.
2002) can be used to estimate probabilities of movement between areas.
Supplemental telemetry could be used to estimate some of the movement. Band
recoveries can be combined with recapture information to separate philopatry from
survival (Burnham 1993). In age-dependent models, recruitment from a lower age
class can be separated from immigration (Nichols and Pollock 1983). There are
many different types of capture–recapture models including approaches outlined by
Burnham (1993), the super-population approach of Schwarz and Arnason (1996), a
host of models by Pradel (1996) which focus on survival and recruitment, as well
as the Link and Barker (2005) reparameterization of the Pradel (1996) model to
better estimate those recruitment parameters.
Lancia et al. (2005) pointed out that the distinction between open and closed popu-
lations is made to simplify models used to estimate population parameters of inter-
est. The simplifications are expressed as assumptions and study design must take
these simplifying assumptions into account. Pollock (1982) noted that long-term
studies often consist of multiple capture occasions for each period of interest. He
showed that the extra information from the individual capture occasions could be
exploited to reduce bias in Jolly–Seber estimates of abundance and recruitment
when there is heterogeneity in detection probabilities.
Under Pollock’s robust design, each sampling period consists of at least two subsam-
ples, ideally spaced closely together so that the population can be considered closed to
additions and deletions during that period. Kendall and Pollock (1992) summarized
other advantages of this basic design, in that abundance, survival rate, and recruitment
can be estimated for all time periods in the study, whereas with the classic design one
cannot estimate abundance for the first and last periods, survival rate to the last period,
and the first and last recruitment values; recruitment can be separated into immigration
and recruitment from a lower age class within the population when there are at least two
age classes, whereas the classic design requires three age classes (Nichols and Pollock
1990); abundance and survival can be estimated with less dependence, thereby lessen-
ing some of the statistical problems with density-dependent modeling (Pollock et al.
1990); and study designs for individual periods can be customized to meet specific
objectives, due to the second level of sampling. For instance, adding more sampling
effort in period i (e.g., more trapping days) should increase precision of the abundance
estimate for period i. However, adding more sampling effort after period i should
increase precision of survival rate from i to i + 1.
The additional information from the subsamples in the robust design allows one
to estimate the probability that a member of the population is unavailable for detec-
tion (i.e., a temporary emigrant) in a given period (Kendall et al. 1997). Depending
on the context of the analysis, this could be equivalent to an animal being a non-
breeder or an animal in torpor. Based on the advantages listed above, we recommend
that Pollock’s robust design be used for most capture–recapture studies. There are
no apparent disadvantages in doing so. Even the assumption of closure across sub-
samples within a period is not necessarily a hindrance (Schwarz and Stobo 1997;
Kendall 1999). Even where it turns out that it is not possible to apply sufficient
effort to each subsample to employ the robust design, the data still can be pooled
and traditional methods used. The advantages of the robust design derive from the
second source of capture information provided by the subsamples. Obviously, the
overall study design must recognize the desired comparisons using the open mod-
els, even though the distribution of the samples for the closed model (as long as it
is a probabilistic sample) is of relatively little consequence.
Survival analysis is a set of statistical procedures for which the outcome variable is
the time until an event occurs (Kleinbaum 1996). As such, survival analysis is con-
cerned with the distribution of lifetimes (Venables and Ripley 2002). In wildlife
research, survival analysis is used to estimate survival (Ŝ ), or the probability that an
individual survives a specified period (days, weeks, years). Because estimates of sur-
vival are used in population models, evaluations of changing population demography,
and as justification for altering management practices, approaches to survival analysis
have become increasingly common in wildlife research. Probably the most com-
mon approach to survival analysis in wildlife science is estimation using known fate
data based on radio-telemetry where individuals are relocated on some regular basis.
Another common application of time to event models has been recent work focused
on estimating survival of nests where the event of interest is the success or failure of
a nest (Stanley 2000; Dinsmore et al. 2002; Rotella et al. 2004; Shaffer 2004).
Generally, estimation of survival is focused on the amount of time until some event
occurs. Time-to-event models are not constrained to evaluating only survival, as the
event of interest could include not only death, but also recovery (e.g., regrowth after a
burn), return to a location (e.g., site fidelity), incidence (e.g., disease transmission or
relapse), or any experience of interest that happens to an individual (Kleinbaum 1996).
Typically, the time scale in time-to-event models is specified by the researchers
(e.g., days, months, seasons) based on knowledge of the species
of interest. In wildlife studies, the event of interest is usually death (failure).
One key point that must be addressed is censoring, including right and middle censor-
ing and left truncation. Censoring occurs when information on an individual's
survival is incomplete, thus we do not know the survival times exactly. There are
three types of censoring which influence survival modeling:
● Right censoring – when the dataset becomes incomplete on the right side of the
follow-up period
● Middle censoring – when, during the study, the probability of detecting an indi-
vidual is <1
● Left truncation – when the dataset is incomplete at the left side of the follow-up
period
Ŝ(t) = ∏ (1 − dj/rj), taken over all aj < t,

where the product is over all j terms for which aj < time t, given that the aj are the
discrete time points (j) at which deaths occur, dj is the number of deaths at the jth time
point, and rj is the number of animals at risk at the jth time point. Thus, the probability
of surviving from time 0 to a1 (the interval during which the first death occurs) is
estimated as

Ŝ(a1) = 1 − d1/r1
and the probability of surviving from time a1 to a2 is 1 − d2/r2, so Ŝ(a2), the product
of the first two terms, is given by

Ŝ(a2) = (1 − d1/r1)(1 − d2/r2), and so on.
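This running-product calculation (the Kaplan–Meier product-limit estimator) can be sketched directly; the telemetry data below are hypothetical:

```python
def kaplan_meier(event_times, risk_counts, death_counts):
    """Kaplan-Meier product-limit survival estimates.

    event_times: distinct times a_j at which deaths occur;
    risk_counts: r_j, the number at risk just before a_j;
    death_counts: d_j, the number of deaths at a_j.
    Returns (a_j, S_hat(a_j)) pairs, where S_hat is the running
    product of (1 - d_j / r_j).
    """
    s = 1.0
    out = []
    for a, r, d in zip(event_times, risk_counts, death_counts):
        s *= 1.0 - d / r
        out.append((a, s))
    return out

# Hypothetical radio-telemetry data: deaths at weeks 2, 5, and 9 with
# 20, 17, and 12 animals at risk (animals lost between occasions are
# censored and simply drop out of the risk counts).
curve = kaplan_meier([2, 5, 9], [20, 17, 12], [1, 2, 3])
# S(2) = 1 - 1/20 = 0.95; S(5) = 0.95*(1 - 2/17); S(9) = S(5)*(1 - 3/12)
```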
There are several general assumptions for time to event studies (see Pollock et al.
1989; Williams et al. 2002). First, we assume that radio-tagged individuals are a
random sample from the population of interest. This assumption can be satisfied by
using random location of trapping sites or perhaps stratifying trapping effort by
perceived density of the population. We also assume that survival times are inde-
pendent among different animals; violating this assumption leads to overdispersion.
For example, you catch a brood of quail (say 6 young) and radio-tag each, but a
predator finds the brood and predates the hen and all the young – thus survival time
between individuals was not independent. Additionally, we assume that radio trans-
mitters (or other marks) do not affect the survival of marked individuals and that
the censoring mechanism is random or that censoring is not related to the fate of the
individual (e.g., a radio destroyed during a predation or harvest event). For staggered-
entry studies, we also assume that newly marked individuals have the same survival
function as previously marked individuals.
Alldredge et al. (1998) reviewed the multitude of methods used in the study of
resource selection. Resource selection occurs in a hierarchical fashion from the geo-
graphic range of a species, to individual animal ranges within a geographic range, to
use of general features (habitats) within the individual’s range, to the selection of
particular elements (food items) within the feeding site (Manly et al. 1993). The first
design decision in a resource selection study is the scale of study (Johnson 1980).
Manly et al. (1993) suggested conducting studies at multiple scales. Additional
important decisions affecting the outcome of these studies include the selection of
the study area boundary and study techniques (Manly et al. 1993).
Resource selection probability functions give probabilities of use for resource
units of different types. This approach may be used when the resource being stud-
ied can be classified as a universe of N available units, some of which are used and
the remainder not used. Also, every unit can be classified by the values that it pos-
sesses for certain important variables (X = X1, X2, …, Xp) thought to affect use.
Examples include prey items selected by predators based on color, size, and age, or
plots of land selected by ungulates based on distance to water, vegetation type, dis-
tance to disturbance, and so on. Sampling of used and unused units must consider
the same issues as discussed previously for any probability sample.
Thomas and Taylor (1990) described three general study designs for evaluating
resource selection. In design I, measurements are made at the population level.
Units available to all animals in the population are sampled or censused and classi-
fied into used and unused. Individual animals are not identified. In design II, indi-
vidual animals are identified and the use of resources is measured for each while
availability is measured at the level available to the entire population. In design III,
individuals are identified or collected as in design II and at least two of the sets of
resource units (used resource units, unused resource units, available resource units)
are sampled or censused for each animal.
Manly et al. (1993) also offered three sampling protocols for resource selection stud-
ies. In the first, one randomly samples or censuses the available resource units and ran-
domly samples the used resource units. In the second, one randomly samples or censuses
the available resource units and randomly samples the unused resource units. In the third,
one takes independent samples of both used and unused resource units. It is also possible
in some situations to census both used and unused units. Erickson et al. (1998) described a moose
(Alces alces) study on the Innoko National Wildlife Refuge in Alaska that evaluated
habitat selection following Design I and sampling protocol A (Fig. 4.15).
Fig. 4.15 Schematic of design I and sampling protocol A (from Manly et al. 1993) as used in a
survey of potential moose use sites in a river corridor in Innoko National Wildlife Refuge in
Alaska. Reproduced from Erickson et al. (1998) with kind permission from the American
Statistical Association
The selection of a particular design and sampling protocol must consider the
study area, the habitats or characteristics of interest, the practical sample size, and
the anticipated method of analysis. The design of studies should also consider the
relationship between resource selection and the anticipated benefits of the selection
of good resources, such as increased survival rate, increased productivity, and/or
increased fitness (Alldredge et al. 1998).
Wildlife studies frequently are interested in describing the spatial pattern of resources
or contaminants. The application of spatial statistics offers an opportunity to evaluate
the precision of spatial data as well as improve the efficiency of spatial sampling
efforts. Spatial statistics combine the geostatistical prediction techniques of kriging
(Krige 1951) and simulation procedures such as conditional and unconditional simu-
lation (Borgman et al. 1984, 1994). Both kriging and simulation procedures are used
to estimate random variables at unsampled locations. Kriging produces best linear
unbiased predictions using available known data, while the simulation procedures
give a variety of estimates usually based on the data’s statistical distribution. Kriging
results in a smoothed version of the distribution of estimates, while with simulation
procedures the predicted variance, correlation structure, and natural variability of the
original process are preserved (Kern 1997). If the spatial characterization of interest
is the mean of the variable in each cell of a grid, for example, then kriging procedures
are satisfactory. However, if the spatial variability of the process is of importance,
simulation procedures are more appropriate. For a more complete treatment of
simulation techniques see Borgman et al. (1994) or Deutsch and Journel (1992).
Cressie (1991) gave a complete theoretical development of kriging procedures, while
Isaaks and Srivastava (1989) provided a more applied treatment appropriate for the
practitioner. For the original developments in geostatistics, we refer you to Krige
(1951), Matheron (1962, 1971), and Journel and Huijbregts (1978).
In a study using spatial statistics, data generally are gathered from a grid of points
and the spatial covariance structure of variables is used to estimate the variable of
interest at points not sampled. The data on the variable of interest at the sample locations
could be used to predict the distribution of the variable for management or conserva-
tion purposes. For example, suppose a wind plant is planned for a particular area and
there is concern regarding the potential for the development to create risk to birds. If
bird counts are used as an index of local use, then estimates of local mean bird use
could be used to design the wind plant to avoid high bird use areas. Preservation of
local variability would not be necessary, and kriging would provide a reasonable
method to predict locations where bird use is low and hence wind turbines should be
located. Using this sort of linear prediction requires sampling in all areas of interest.
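A minimal sketch of the ordinary kriging predictor described above: the exponential covariance model, its parameters, and the bird-use counts at four survey stations are all hypothetical assumptions for illustration; a real analysis would estimate the covariance structure (variogram) from the data, for example with the software described by Deutsch and Journel (1992).

```python
import math

def exp_cov(h, sill=1.0, scale=500.0):
    """Exponential covariance model: C(h) = sill * exp(-h / scale)."""
    return sill * math.exp(-h / scale)

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(points, values, target):
    """Best linear unbiased prediction at `target` from the sampled points.

    The kriging weights solve a linear system built from the covariances
    among sample points, with a Lagrange multiplier that forces the
    weights to sum to 1 (unbiasedness for an unknown constant mean).
    """
    n = len(points)
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    A = [[exp_cov(d(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [exp_cov(d(p, target)) for p in points] + [1.0]
    w = solve(A, b)[:n]  # drop the Lagrange multiplier
    return sum(wi * vi for wi, vi in zip(w, values))

# Hypothetical mean bird-use counts at four survey stations (x, y in m).
pts = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]
use = [12.0, 3.0, 10.0, 2.0]
print(ordinary_kriging(pts, use, (500, 500)))  # by symmetry, 6.75
```

Because kriging smooths, the predicted surface is suited to mapping mean use, as in the wind-plant example; preserving local variability would instead call for the simulation procedures.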
Geostatistical modeling, which considers both linear trends and correlated
random variables, can be more valuable in predicting the spatial distribution of a
variable of interest. These geostatistical simulation models are stochastic, and
4.10 Summary
The goal of wildlife ecology research is to learn about wildlife populations and the
habitats that they use. Thus, the objective of Chap. 4 was to provide a description
of the fundamental methods for sampling and making inferences in wildlife studies.
We began with a discussion of the basics of sample survey design, plot shape and
size, random and nonrandom sample survey selection as well as a description of
common definitions used in wildlife sample survey design. Within Sect. 4.1, we
detailed the necessity of clearly defining study objectives, the area of inference, and the
sampling unit(s) of importance. Additionally, we discussed the need for a clear
definition of the parameters to measure. In Sects. 4.2 and 4.3, we discussed numerous
methods for probability sampling, ranging from simple random sampling to strip
adaptive cluster sampling. Under this framework, we outlined the need for probabilistic
sampling procedures and how their use leads to strong inference. We outlined
several methods to sample populations, ranging from simple fixed area plots to
more complicated distance-based estimators under design-based inference.
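The design-based logic behind the simplest of these methods, fixed-area plots, can be sketched as follows. The plot counts, plot size, and study-area size are hypothetical, and the sketch assumes randomly placed plots with no detection error; the distance-based and mark–recapture methods discussed above replace the raw counts with detection-corrected estimates.

```python
import math

def plot_expansion_estimate(counts, plot_area, total_area):
    """Design-based abundance estimate from n randomly placed fixed-area plots.

    Expands the mean per-plot count to the whole study area; the standard
    error comes only from plot-to-plot variation under the sampling design,
    not from any population model.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    expansion = total_area / plot_area   # number of plot-sized units in the area
    total_hat = expansion * mean         # estimated abundance
    se = expansion * math.sqrt(var / n)  # standard error of the total
    return total_hat, se

# Hypothetical survey: 8 plots of 4 ha each within a 1,000-ha study area.
counts = [3, 0, 5, 2, 1, 4, 0, 2]
total_hat, se = plot_expansion_estimate(counts, plot_area=4.0, total_area=1000.0)
print(total_hat, se)  # estimated abundance and its standard error
```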
Next, we focused on model-based sampling (Sect. 4.7). We outlined the rationale
for using model-based techniques and discussed the differences between model-based
and design-based studies (also see Chap. 2). Often, as each wildlife study is unique,
decisions regarding the sampling plan will require use of a variety of methods. With
this in mind, we discussed several variants of capture–mark–recapture techniques,
analysis of presence–absence data, and time-to-event models, all of which are used for
model-based inferences. We concluded the chapter with a discussion of resource
selection and spatial statistics and their application to wildlife conservation.
References
Alldredge, J. R., D. L. Thomas, and L. McDonald. 1998. Survey and comparison of methods for
study of resource selection. J. Agric. Biol. Environ. Stat. 3: 237–253.
Alpizar-Jara, R., and K. H. Pollock. 1996. A combination line transect and capture–recapture
sampling model for multiple observers in aerial surveys. J. Environ. Stat. 3: 311–327.
Amstrup, S. C., T. L. McDonald, and B. F. J. Manly. 2005. Handbook of Capture–Recapture
Analysis. Princeton University Press, Princeton.
Anderson, D. R. 2001. The need to get the basics right in wildlife field studies. Wildl. Soc. Bull.
29: 1294–1297.
Bart, J. and S. Earnst. 2002. Double sampling to estimate density and population trends in birds.
Auk 119: 36–45.
Beavers, S. C., and F. L. Ramsey. 1998. Detectability analysis in transect surveys. J. Wildl.
Manage. 62(3): 948–957.
Block, W. M., and L. A. Brennan. 1992. The habitat concept in ornithology: Theory and applica-
tions. Curr. Ornithol. 11: 35–91.
Borchers, D. L., S. T. Buckland, and W. Zucchini. 2002. Estimating Animal Abundance. Springer,
Berlin Heidelberg New York.
Borgman, L. E., M. Taheri, and R. Hagan. 1984. Three-dimensional, frequency-domain simula-
tions of geologic variables, in G. Verly, M. David, A. G. Journel, and A. Marechal, Eds.
Geostatistics for Natural Resources Characterization, Part I. Reidel, Dordrecht.
Borgman, L. E., C. D. Miller, S. R. Signorini, and R. C. Faucette. 1994. Stochastic interpolation as
a means to estimate oceanic fields. Atmos.-Ocean. 32(2): 395–419.
Bormann, F. H. 1953. The statistical efficiency of sample plot size and shape in forest ecology.
Ecology 34: 474–487.
Brownie, C., J. E. Hines, J. D. Nichols, K. H. Pollock, and J. B. Hestbeck. 1993. Capture–recapture
studies for multiple strata including non-Markovian transitions. Biometrics 49: 1173–1187.
Buckland, S. T. 1987. On the variable circular plot method of estimating animal density.
Biometrics 43: 363–384.
Buckland, S. T., D. R. Anderson, K. P. Burnham, and J. L. Laake. 1993. Distance Sampling:
Estimating Abundance of Biological Populations. Chapman and Hall, London.
Buckland, S. T., D. R. Anderson, K. P. Burnham, J. L. Laake, D. L. Borchers, and L. Thomas.
2001. Introduction to Distance Sampling. Oxford University Press, Oxford.
Buckland, S. T., D. R. Anderson, K. P. Burnham, J. L. Laake, D. L. Borchers, and L. Thomas.
2004. Advanced Distance Sampling. Oxford University Press, Oxford.
Burnham, K. P. 1993. A theory for combined analysis of ring recovery and recapture data, in
J. D. Lebreton and P. M. North, Eds. Marked Individuals in the Study of Bird Population, pp.
199–214. Birkhäuser-Verlag, Basel, Switzerland.
Burnham, K. P., and W. S. Overton. 1978. Estimation of the size of a closed population when
capture probabilities vary among animals. Biometrika 65: 625–633.
Burnham, K. P., D. R. Anderson, and J. L. Laake. 1980. Estimation of density from line transect
sampling of biological populations. Wildl. Monogr. 72: 1–202.
Burnham, K. P., S. T. Buckland, J. L. Laake, D.L. Borchers, T. A. Marques, J. R. B. Bishop, and
L. Thomas. 2004. In S. T. Buckland, D. R. Anderson, K. P. Burnham, J. L. Laake, D. L.
Borchers, and L. Thomas, Eds. Advanced Distance Sampling, pp. 307–392. Oxford University
Press, Oxford.
Byth, K. 1982. On robust distance-based intensity estimators. Biometrics 38: 127–135.
Canfield, R. H. 1941. Application of the line intercept method in sampling range vegetation.
J. Forest. 39: 388–394.
Caughley, G. 1977. Sampling in aerial survey. J. Wildl. Manage. 41: 605–615.
Caughley, G., and D. Grice. 1982. A correction factor for counting emus from the air and its
application to counts in western Australia. Aust. Wildl. Res. 9: 253–259.
Chao, A. 1987. Estimating the population size for capture–recapture data with unequal catchabil-
ity. Biometrics 43: 783–791.
Chao, A. 1988. Estimating animal abundance with capture frequency data. J. Wildl. Manage. 52:
295–300.
Chao, A. 1989. Estimating population size for sparse data in capture–recapture experiments.
Biometrics 45: 427–438.
Christman, M. C. 2000. A review of quadrat-based sampling of rare, geographically clustered
populations. J. Agric. Biol. Environ. Stat. 5: 168–201.
Cochran, W. G. 1977. Sampling Techniques, 3rd Edition. Wiley, New York.
Collier, B. A., S. S. Ditchkoff, J. B. Raglin, and J. M. Smith. 2007. Detection probability and
sources of variation in white-tailed deer spotlight surveys. J. Wildl. Manage. 71: 277–281.
Conroy, M. J., J. R. Goldsberry, J. E. Hines, and D. B. Stotts. 1988. Evaluation of aerial transect
surveys for wintering American black ducks. J. Wildl. Manage. 52: 694–703.
Cook, R. D., and J. O. Jacobson. 1979. A design for estimating visibility bias in aerial surveys.
Biometrics 35: 735–742.
Cressie, N. A. C. 1991. Statistics for Spatial Data. Wiley, New York.
Darroch, J. N. 1958. The multiple recapture census: I. Estimation of a closed population.
Biometrika 45: 343–359.
Dell, T. R., and J. L. Clutter. 1972. Ranked set sampling theory with order statistics background.
Biometrics 28: 545–553.
Deutsch, C. V., and A. G. Journel. 1992. GSLIB Geostatistical Software Library and User’s
Guide. Oxford University Press, New York.
Diggle, P. J. 1983. Statistical Analysis of Spatial Point Patterns. Academic, London.
Dinsmore, S. J., G. C. White, and F. L. Knopf. 2002. Advanced techniques for modeling avian
nest survival. Ecology 83: 3476–3488.
Drummer, T. D. 1991. SIZETRAN: Analysis of size-biased line transect data. Wildl. Soc. Bull.
19(1): 117–118.
Drummer, T. D., and L. L. McDonald. 1987. Size bias in line transect sampling. Biometrics 43: 13–21.
Eberhardt, L. L. 1978. Transect methods for populations studies. J. Wildl. Manage. 42: 1–31.
Eberhardt, L. L., and M. A. Simmons. 1987. Calibrating population indices by double sampling.
J. Wildl. Manage. 51: 665–675.
Erickson, W. P., T. L. McDonald, and R. Skinner. 1998. Habitat selection using GIS data: A case
study. J. Agric. Biol. Environ. Stat. 3: 296–310.
Floyd, T. J., L. D. Mech, and M. E. Nelson. 1979. An improved method of censusing deer in
deciduous–coniferous forests. J. Wildl. Manage. 43: 258–261.
Foreman, E. K. 1991. Survey Sampling Principles. Marcel Dekker, New York.
Fretwell, S. D., and H. L. Lucas. 1970. On territorial behavior and other factors influencing habitat
distribution in birds. I. Acta Biotheoret. 19: 16–36.
Gaillard, J. M. 1988. Contribution a la Dynamique des Populations de Grands Mammiferes: l’Exemple
du Chevreuil (Capreolus capreolus). Dissertation. Universite Lyon I, Villeurbanne, France.
Gasaway, W. C., S. D. DuBois, D. J. Reed, and S. J. Harbo. 1986. Estimating Moose Population
Parameters from Aerial Surveys. Institute of Arctic Biology, Biological Papers of the
University of Alaska, Fairbanks, AK 99775, No. 22.
Gilbert, R. O. 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand
Reinhold, New York.
Gilbert, R. O., and J. C. Simpson. 1992. Statistical methods for evaluating the attainment of
cleanup standards. Vol. 3, Reference-Based Standards for Soils and Solid Media. Prepared by
Pacific Northwest Laboratory, Battelle Memorial Institute, Richland, WA, for U.S.
Environmental Protection Agency under a Related Services Agreement with U.S. Department
of Energy, Washington, DC. PNL-7409 Vol. 3, Rev. 1/UC-600.
Graham, A., and R. Bell. 1969. Factors influencing the countability of animals. East Afr. Agric.
For. J. 34: 38–43.
Grosenbaugh, L. R. 1952. Plotless timber estimates – new, fast, easy. J. For. 50: 532–537.
Guenzel, R. J. 1997. Estimating Pronghorn Abundance Using Aerial Line Transect Sampling.
Wyoming Game and Fish Department, Cheyenne, WY.
Hasel, A. A. 1938. Sampling error in timber surveys. J. Agric. Res. 57: 713–736.
Hendricks, W. A. 1956. The Mathematical Theory of Sampling. The Scarecrow Press, New
Brunswick.
Hornocker, M. G. 1970. An analysis of mountain lion predation upon mule deer and elk in the
Idaho Primitive Area. Wildl. Monogr. 21.
Horvitz, D. G., and D. J. Thompson. 1952. A generalization of sampling without replacement
from a finite universe. J. Am. Stat. Assoc. 47: 663–685.
Hosmer Jr., D. W., and S. Lemeshow. 1999. Applied Survival Analysis. Wiley, New York.
Huggins, R. M. 1989. On the statistical analysis of capture experiments. Biometrika 76: 133–140.
Huggins, R. M. 1991. Some practical aspects of a conditional likelihood approach to capture
experiments. Biometrics 47: 725–732.
Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecol.
Monogr. 54: 187–211.
Isaaks, E. H., and R. M. Srivastava. 1989. An Introduction to Applied Geostatistics, Oxford
University Press, New York.
Johnson, D. H. 1980. The comparison of usage and availability measurements for evaluating
resource preference. Ecology 61: 65–71.
Johnson, D. H., Ed. 1998. J. Agric. Biol. Environ. Stat. 3(3) Special Issue: Resource Selection
Using Data from Geographical Information Systems (GIS).
Johnson, D. H. 2002. The importance of replication in wildlife research. J. Wildl. Manage. 66:
919–932.
Journel, A. G., and C. J. Huijbregts. 1978. Mining Geostatistics. Academic, London.
Kaiser, L. 1983. Unbiased estimation in line-intercept sampling. Biometrics 39: 965–976.
Kalamkar, R. J. 1932. Experimental error and the field plot technique with potatoes. J. Agric. Sci.
22: 373–383.
Kaplan, E. L., and P. Meier. 1958. Nonparametric estimation from incomplete observations.
J. Am. Stat. Assoc. 53: 457–481.
MacKenzie, D. I., J. A. Royle, J. A. Brown, and J. D. Nichols. 2004. Occupancy estimation and
modeling for rare and elusive species. In W. L. Thompson, Ed. Sampling Rare or Elusive
Species, pp. 149–172. Island Press, Washington.
MacKenzie, D. I., J. D. Nichols, J. Andrew Royle, K. H. Pollock, L. L. Bailey, and J. E. Hines.
2006. Occupancy Estimation and Modeling. Academic Press, London.
Manly, B. F. J. 1991. Randomization and Monte Carlo Methods in Biology. Chapman and Hall, London.
Manly, B. F. J., L. McDonald, and D. Thomas. 1993. Resource Selection by Animals: Statistical
Design and Analysis for Field Studies. Chapman and Hall, London.
Manly, B. F. J., L. McDonald, and G. W. Garner. 1996. Maximum likelihood estimation for the double-
count method with independent observers. J. Agric. Biol. Environ. Stat. 1(2): 170–189.
Marques, F. C. C., and S. T. Buckland. 2004. In S. T. Buckland, D. R. Anderson, K. P. Burnham,
J. L. Laake, D. L. Borchers, and L. Thomas, Eds. Advanced Distance Sampling, pp. 31–47.
Oxford University Press, Oxford.
Matheron, G. 1962. Traite de Geostatistique Appliquee, Tome I. Memoires du Bureau de
Recherches Geologiques et Minieres, No. 14. Editions Technip, Paris.
Matheron, G. 1971. The theory of regionized variables and its applications. Cahiers du Centre de
Morphologie Mathematique, No. 5. Fontainebleau, France.
McCullagh, P., and J. A. Nelder. 1983. Generalized Linear Models. Chapman and Hall, London.
McDonald, L. L. 1980. Line-intercept sampling for attributes other than cover and density.
J. Wildl. Manage. 44: 530–533.
McDonald, L. L. 1991. Workshop Notes on Statistics for Field Ecology. Western Ecosystems
Technology, Inc. Cheyenne, WY.
McDonald, L. L. 2004. Sampling rare populations, in W. L. Thompson, Ed. Sampling Rare or
Elusive Species, pp. 11–42. Island Press, Washington.
McDonald, L. L., H. B. Harvey, F. J. Mauer, and A. W. Brackney. 1990. Design of aerial surveys
for Dall sheep in the Arctic National Wildlife Refuge, Alaska. Seventh Biennial Northern Wild
Sheep and Goat Symposium. May 14–17, 1990, Clarkston, Washington.
McDonald, L. L., W. P. Erickson, and M. D. Strickland. 1995. Survey design, statistical analysis, and
basis for statistical inferences in Coastal Habitat Injury Assessment: Exxon Valdez Oil Spill, in P. G.
Wells, J. N. Buther, and J. S. Hughes, Eds. Exxon Valdez Oil Spill: Fate and Effects in Alaskan
Waters. ASTM STP 1219. American Society for Testing and Materials, Philadelphia, PA.
Menkins Jr., G. E., and S. H. Anderson. 1988. Estimation of small-mammal population size.
Ecology 69: 1952–1959.
Miller, S. D., G. C. White, R. A. Sellers, H. V. Reynolds, J. W. Schoen, K. Titus, V. G. Barnes, R. B.
Smith, R. R. Nelson, W. B. Ballard, and C. C. Schwartz. 1997. Brown and black bear density estima-
tion in Alaska using radiotelemetry and replicated mark–resight techniques. Wildl. Monogr. 133.
Mode, N. A., L. L. Conquest, and D. A. Marker. 2002. Incorporating prior knowledge in environ-
mental sampling: ranked set sampling and other double sampling procedures. Environmetrics
13: 513–521.
Mood, A. M., F. A. Graybill, and D. C. Boes. 1974. Introduction to the Theory of Statistics, 3rd
Edition. McGraw-Hill, Boston.
Morrison, M. L., W. M. Block, M. D. Strickland, and W. L. Kendall. 2001. Wildlife Study Design.
Springer.
Munholland, P. L., and J. J. Borkowski. 1996. Simple latin square sampling +1: A spatial design
using quadrats. Biometrics 52: 125–136.
Muttlak, H. A., and L. L. McDonald. 1992. Ranked set sampling and the line intercept method: A
more efficient procedure. Biometrical J. 34: 329–346.
Nichols, J. D., and K. H. Pollock. 1983. Estimation methodology in contemporary small mammal
capture–recapture studies. J. Mammal. 64: 253–260.
Nichols, J. D., and K. H. Pollock. 1990. Estimation of recruitment from immigration versus in situ
reproduction using Pollock's robust design. Ecology 71: 21–26.
Noon, B. R., N. M. Ishwar, and K. Vasudevan. 2006. Efficiency of adaptive cluster sampling and
random sampling in detecting terrestrial herpetofauna in a tropical rainforest. J. Wildl. Manage.
34: 59–68.
Otis, D. L., K. P. Burnham, G. C. White, and D. R. Anderson. 1978. Statistical inference from
capture data on closed animal populations. Wildl. Monogr. 62.
Otis, D. L., L. L. McDonald, and M. Evans. 1993. Parameter estimation in encounter sampling
surveys. J. Wildl. Manage. 57: 543–548.
Overton, W. S., D. White, and D. L. Stevens. 1991. Design Report for EMAP: Environmental
Monitoring and Assessment Program. Environmental Research Laboratory, U.S. Environmental
Protection Agency, Corvallis, OR. EPA/600/3–91/053.
Packard, J. M., R. C. Summers, and L. B. Barnes. 1985. Variation of visibility bias during aerial
surveys of manatees. J. Wildl. Manage. 49: 347–351.
Patil, G. P., A. K. Sinha, and C. Taillie. 1994. Ranked set sampling, in G. P. Patil and C. R. Rao,
Eds. Handbook of Statistics, Environmental Statistics, Vol. 12. North-Holland, Amsterdam.
Pechanec, J. F., and G. Stewart. 1940. Sage brush-grass range sampling studies: size and structure
of sampling unit. Am. Soc. Agron. J. 32: 669–682.
Peterjohn, B. G., J. R. Sauer, and W. A. Link. 1996. The 1994 and 1995 summary of the North
American Breeding Bird Survey. Bird Popul. 3: 48–66.
Pollock, K. H. 1974. The Assumption of Equal Catchability of Animals in Tag-Recapture
Experiments. Ph.D. Thesis, Cornell University, Ithaca, NY.
Pollock, K. H. 1982. A capture–recapture design robust to unequal probability of capture. J. Wildl.
Manage. 46: 752–757.
Pollock, K. H. 1991. Modeling capture, recapture, and removal statistics for estimation of demo-
graphic parameters for fish and wildlife populations: past, present, and future. J. Am. Stat.
Assoc. 86: 225–238.
Pollock, K. H., and M. C. Otto. 1983. Robust estimation of population size in closed animal popu-
lations from capture–recapture experiments. Biometrics 39: 1035–1049.
Pollock, K. H., and W. L. Kendall. 1987. Visibility bias in aerial surveys: A review of estimation
procedures. J. Wildl. Manage. 51: 502–520.
Pollock, K. H., S. R. Winterstein, C. M. Bunck, and P. D. Curtis. 1989. Survival analysis in telemetry
studies: The staggered entry design. J. Wildl. Manage. 53: 7–15.
Pollock, K. H., J. D. Nichols, C. Brownie, and J. E. Hines. 1990. Statistical inferences for capture–
recapture experiments. Wildl. Monogr. 107.
Pradel, R. 1996. Utilization of capture–mark–recapture for the study of recruitment and popula-
tion growth rate. Biometrics 52: 703–709.
Quang, P. X. 1989. A nonparametric approach to size-biased line transect sampling. Draft Report.
Department of Mathematical Sciences, University of Alaska, Fairbanks, AK 99775. Biometrics
47(1): 269–279.
Quang, P. X., and E. F. Becker. 1996. Line transect sampling under varying conditions with appli-
cation to aerial surveys. Ecology 77: 1297–1302.
Quang, P. X., and E. F. Becker. 1997. Combining line transect and double count sampling tech-
niques for aerial surveys. J. Agric. Biol. Environ. Stat. 2: 230–242.
Ramsey, F. L., and J. M. Scott. 1979. Estimating population densities from variable circular plot
surveys, in R. M. Cormack, G. P. Patil and D. S. Robson, Eds. Sampling Biological
Populations, pp. 155–181. International Co-operative Publishing House, Fairland, MD.
Reed, D. J., L. L. McDonald, and J. R. Gilbert. 1989. Variance of the product of estimates. Draft
report. Alaska Department of Fish and Game, 1300 College Road, Fairbanks, AK 99701.
Rexstad, E., and K. Burnham. 1991. User's Guide for Interactive Program CAPTURE, Abundance
Estimation for Closed Populations. Colorado Cooperative Fish and Wildlife Research Unit,
Fort Collins, CO.
Reynolds, R. T., J. M. Scott, and R. A. Nussbaum. 1980. A variable circular-plot method for esti-
mating bird numbers. Condor 82(3): 309–313.
Rotella, J. J., S. J. Dinsmore, and T. L. Shaffer. 2004. Modeling nest-survival data: A comparison
of recently developed methods that can be implemented in MARK and SAS. Anim. Biodivers.
Conserv. 27: 187–205.
Royle, J. A. 2004a. Generalized estimators of avian abundance from count survey data. Anim.
Biodivers. Conserv. 27: 375–386.
Royle, J. A. 2004b. Modeling abundance index data from anuran calling surveys. Conserv. Biol.
18: 1378–1385.
Royle, J. A., and W. A. Link. 2005. A general class of multinomial mixture models for anuran
calling survey data. Ecology 86: 2505–2512.
Royle, J. A., and J. D. Nichols. 2003. Estimating abundance from repeated presence–absence data
or point counts. Ecology 84: 777–790.
Royle, J. A., J. D. Nichols, and M. Kery. 2005. Modelling occurrence and abundance of species
when detection is imperfect. Oikos 110: 353–359.
Samuel, M. D., E. O. Garton, M. W. Schlegel, and R. G. Carson. 1987. Visibility bias during aerial
surveys of elk in northcentral Idaho. J. Wildl. Manage. 51: 622–630.
Scheaffer, R. L., W. Mendenhall, and L. Ott. 1990. Elementary Survey Sampling. PWS-Kent,
Boston, MA.
Schoenly, K. G., I. T. Domingo, and A. T. Barrion. 2003. Determining optimal quadrat sizes for
invertebrate communities in agrobiodiversity studies: A case study from tropical irrigated rice.
Environ. Entomol. 32: 929–938.
Schwarz, C. J., and A. N. Arnason. 1996. A general methodology for the analysis of capture–
recapture experiments in open populations. Biometrics 52: 860–873.
Schwarz, C. J., and W. T. Stobo. 1997. Estimating temporary migration using the robust design.
Biometrics 53: 178–194.
Seber, G. A. F. 1982. The Estimation of Animal Abundance and Related Parameters, 2nd Edition.
Griffin, London.
Shaffer, T. L. 2004. A unified approach to analyzing nest success. Auk 121: 526–540.
Skinner, R., W. Erickson, G. Minick, and L. L. McDonald. 1997. Estimating Moose Populations
and Trends Using Line Transect Sampling. Technical Report Prepared for the USFWS, Innoko
National Wildlife Refuge, McGrath, AK.
Smith, W. 1979. An oil spill sampling strategy, in R. M. Cormack, G. P. Patil, and D. S. Robson,
Eds. Sampling Biological Populations, pp. 355–363. International Co-operative Publishing
House, Fairland, MD.
Smith, D. R., M. J. Conroy, and D. H. Brakhage. 1995. Efficiency of adaptive cluster sampling
for estimating density of wintering waterfowl. Biometrics 51: 777–788.
Smith, D. R., J. A. Brown, and N. C. H. Lo. 2004. Application of adaptive sampling to biological
populations, in W. L. Thompson, Ed. Sampling Rare or Elusive Species, pp. 77–122. Island
Press, Washington.
Southwell, C. 1994. Evaluation of walked line transect counts for estimating macropod density.
J. Wildl. Manage. 58: 348–356.
Stanley, T. R. 2000. Modeling and estimation of stage-specific daily survival probabilities of
nests. Ecology 81: 2048–2053.
Steinhorst, R. K., and M. D. Samuel. 1989. Sightability adjustment methods for aerial surveys of
wildlife populations. Biometrics 45: 415–425.
Stevens, D. L., and A. R. Olsen. 1999. Spatially restricted surveys over time for aquatic resources.
J. Agric. Biol. Environ. Stat. 4: 415–428.
Stevens, D. L., and A. R. Olsen. 2004. Spatially balanced sampling of natural resources. J. Am.
Stat. Assoc. 99: 262–278.
Strickland, M. D., L. McDonald, J. W. Kern, T. Spraker, and A. Loranger. 1994. Analysis of 1992
Dall's sheep and mountain goat survey data, Kenai National Wildlife Refuge. Bienn. Symp.
Northern Wild Sheep and Mountain Goat Council.
Strickland, M. D., W. P. Erickson, and L. L. McDonald. 1996. Avian Monitoring Studies: Buffalo
Ridge Wind Resource Area, Minnesota. Prepared for Northern States Power, Minneapolis,
MN.
Thomas, J. W. Ed. 1979. Wildlife Habitats in Managed Forests: The Blue Mountains of Oregon
and Washington. US Forest Service, Agriculture Handbook 553.
Thompson, S. K. 1990. Adaptive cluster sampling. J. Am. Stat. Assoc. 85: 1050–1059.
Thompson, S. K. 1991a. Adaptive cluster sampling: designs with primary and secondary units.
Biometrics 47: 1103–1115.
5.1 Introduction
We have now presented the philosophy and basic concepts of study design, experi-
mental design, and sampling. These concepts provide the foundation for design and
execution of studies. Once a general design is conceptualized, it needs to be
applied. A conceptual study design, however, is not always an executable design.
During the application of the planned study design, additional steps and considera-
tions are often necessary. These include ensuring appropriate sampling across space
and time, addressing sampling errors and missing data, identifying appropriate
response variables, applying appropriate sampling methodology, and establishing
sampling points. Jeffers (1980) provides very useful guidance in the form of a
checklist of factors to consider in developing and applying a sampling strategy.
These include (1) stating the objectives, (2) defining the population to which infer-
ences are to be made, (3) defining sampling units, (4) identifying preliminary
information to assist development and execution of the sampling design, (5) choos-
ing the appropriate sampling design, (6) determining the appropriate sample size,
and (7) recording and analyzing data. We have discussed many of these topics in
Chaps. 1–4; we further elaborate on some of these considerations here.
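Step (6) of this checklist, determining the appropriate sample size, is often approximated with the standard formula n ≈ (z · CV / ε)² for estimating a mean to within a relative error ε. A minimal sketch, assuming simple random sampling and a hypothetical pilot-study coefficient of variation:

```python
import math

def required_sample_size(cv, rel_error, z=1.96):
    """Plots needed so the estimated mean falls within +/- rel_error of the
    true mean with ~95% confidence (z = 1.96), given the coefficient of
    variation (SD/mean) from pilot data. Assumes simple random sampling
    and ignores the finite-population correction.
    """
    return math.ceil((z * cv / rel_error) ** 2)

# Hypothetical pilot data: CV of 0.8 among plot counts; target +/- 20%.
print(required_sample_size(cv=0.8, rel_error=0.20))  # 62 plots
# Halving the target error roughly quadruples the required effort:
print(required_sample_size(cv=0.8, rel_error=0.10))  # 246 plots
```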
Fig. 5.1 Approximate matching of spatial and temporal scales in ecological studies. Domains of
scale are represented by the dotted lines. Reproduced from Bissonette (1997) with kind permission
from Springer Science + Business Media
those observed at others (Wiens et al. 1986). For example, consider populations of
small mammals from ponderosa pine/Gambel oak (Pinus ponderosa/Quercus gambelii)
forests of north-central Arizona. These small mammals are typically reproductively
inactive during winter, initiating reproduction during spring and continuing
through summer and fall with the population gradually increasing over this period
(Fig. 5.2). As a result, estimates of population size depend largely on the time of
year when data were collected. Population estimates based on sampling during fall
would be greater than those derived using winter data or using a pooled data set
collected throughout the year. Not only do populations fluctuate within years, but
also they exhibit annual variation (see Fig. 5.2) as is often the case with r-selected
species. Thus, a study conducted during a given year may not be representative of
the population dynamics for the species studied. Depending on whether the objec-
tive of a study is to characterize “average or typical” population size, patterns of
population periodicity, or population trend, sampling should occur over the appro-
priate time period for unbiased point estimates or estimates of trend.
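The consequence of restricting sampling to one season can be shown with a small worked example; the monthly densities below are hypothetical values mimicking the seasonal pattern of Fig. 5.2.

```python
# Hypothetical monthly densities (animals/ha) for a population that is
# reproductively inactive in winter and peaks in fall, as in Fig. 5.2.
monthly = {
    "Jan": 2, "Feb": 2, "Mar": 3, "Apr": 4, "May": 6, "Jun": 8,
    "Jul": 10, "Aug": 12, "Sep": 14, "Oct": 13, "Nov": 9, "Dec": 4,
}

annual_mean = sum(monthly.values()) / len(monthly)                 # 7.25
fall_mean = sum(monthly[m] for m in ("Sep", "Oct", "Nov")) / 3     # 12.0
winter_mean = sum(monthly[m] for m in ("Dec", "Jan", "Feb")) / 3   # ~2.7

# A fall-only survey overstates, and a winter-only survey understates,
# the "average" population size for the year.
print(annual_mean, fall_mean, winter_mean)
```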
Wildlife populations and patterns of resource use vary spatially as well. Spatial
variations occur at different scales, ranging from within the home range of an indi-
vidual to the geographic range of the species. For example, Mexican spotted owls
(Strix occidentalis lucida) often concentrate activities within a small portion of
5.2 Spatial and Temporal Sampling
Fig. 5.2 Crude density estimates for the deer mouse (Peromyscus maniculatus), brush mouse
(P. boylii), and Mexican woodrat (Neotoma mexicana) from Arizona pine/oak forests illustrating
season and yearly variation in abundance. Reproduced from Block et al. (2005) with kind permis-
sion from The Wildlife Society
Thompson et al. (1998) distinguish two types of errors: sampling and nonsampling
errors. Sampling error is random in nature, often resulting from the selection of
sampling units. Sampling error is most appropriately addressed by developing and
implementing a sampling design that provides adequate precision. We refer you to
the preceding chapters for guidance on study design development. In contrast, non-
sampling error or sampling bias is typically a systematic bias where a parameter is
consistently under- or overestimated. Although considerable thought and planning
goes into development of a sampling design, execution of that design may result in
unaccounted biases. These include differences among observers, observer bias,
measurement error, missing data, and selection biases. Realization that bias can and
does occur and knowledge of the types of bias that will arise are crucial to any wild-
life study. By understanding potential biases or flaws in execution of a study design,
an investigator can try to avoid or minimize their effects (see Sect. 2.2) on data qual-
ity, results, and conclusions. Aspects of sampling bias are discussed below.
Many types of wildlife data are collected. Although some collection methods have
little sensitivity to interobserver variation, others can result in systematically biased
data dependent on observer. Differences among observers can result from the tech-
nique used to collect the data, differences in observer skill, or human error in col-
lecting, recording, or transcribing data.
Interobserver differences are unavoidable when collecting many types of wild-
life data. Not only will observers' results differ from one another, but deviations
from true values (i.e., bias) will often occur as well. For example, Block et al. (1987) found that
ocular estimates or guesses of habitat variables differed among the observers in
their study. Not only did these observers provide significantly different estimates
for many variables, but also their estimates differed unpredictably from measured
or baseline values (e.g., some overestimated while others underestimated). Given
the vast interobserver variation, Block et al. (1987) found that the number of samples
required for precise estimates was greater for ocular estimates than for systematic
measurements (Fig. 5.3). Verner (1987) and Nichols et al. (2000) found similar observer
variation when comparing bird count data among observers. Verner (1987) found
that one person might record fewer total detections than other observers, but would
detect more bird species. Observer variation is not limited to differences among
individuals; Collier et al. (2007) found that evidence detection rates by the same
observers varied between sampling occasions. Without an understanding of the
magnitude and direction of observer variation, one is restricted in the ability to cor-
rect or adjust observations and minimize bias.
If measurement errors are independent among observers such that they average to
0 (i.e., some observers overestimate and others underestimate a parameter), they
are reflected within standard estimates of precision. These errors can, however,
Fig. 5.3 Influence of sample size on the stability of estimates (dotted lines) and measurements
(solid lines). Vertical lines represent 1 SD for point estimates. Reproduced from Block et al.
(1987) with kind permission from the Cooper Ornithological Society
result in precision poorer than an acceptable level, in which case actions are
necessary to reduce the frequency and magnitude of such errors. This can be
accomplished through better training and quality control, or by increasing sample
sizes to improve the precision of the estimates.
If measurement errors are correlated among observations, then estimates of precision
may be biased low (Cochran 1977). Correlated measurement errors occur when an
observer consistently detects fewer or more individuals of a species than actually
exist, when a tool is calibrated incorrectly (e.g., a compass, altimeter, or rangefinder),
or when systematic errors are made during data transcription or data entry.
Resource selection studies commonly assume that (1) available resources remain constant during the course of a study, (2) the probability function
of resource selection remains constant within a season, (3) available resources have
been correctly identified, (4) used and unused resources are correctly identified, (5)
the variables that influence the probability of selection have been identified, (6)
individuals have free access to all available resource units, and (7) when sampling
resources, units are sampled randomly and independently. The degree to which
these assumptions are violated introduces bias into the study.
The assumptions of constant resource availability and a constant probability func-
tion of resource selection are violated in most field studies. Resources change over
the course of a study, both in quantity and quality. Further, populations’ patterns of
resource use and resource selection may also change with time. The severity of the
potential bias depends on the length of study and the amount of variation in the sys-
tem. Many studies are conducted within a season, most typically the breeding sea-
son. Consider the breeding chronology of many temperate passerine birds (Fig. 5.4).
The breeding season begins with establishment of territories and courtship behavior
Fig. 5.4 Transition diagram of the breeding history stage for male white-crowned sparrows
(Zonotrichia leucophrys) (from Wingfield and Jacobs 1999)
during early spring. Once pair bonds are established, they select or construct a nest
and then go through various stages of reproduction, such as egg laying, incubation,
and nestling and fledgling rearing before the juveniles disperse. Some species may
nest multiple times during a breeding season. Initiation of breeding behavior to dis-
persal of the young may encompass 3–4 months, a period that likely includes fluc-
tuations in food resources, vegetation structure, and in individual resource needs.
Thus, an investigator studying birds during the breeding season has two alternatives:
either examine “general patterns” of selection across the breeding season or partition
the breeding period into smaller units possibly corresponding to different stages in
the reproductive cycle (e.g., courtship, nest formation, egg laying, incubation, fledg-
ling, dispersal). Examining general patterns across the breeding season may provide
results that have no relevance to actual selection patterns because the average or
mean value for a given variable may not represent the condition used by the species
at any time during the season. Breaking the breeding season into smaller subunits
may reduce the variation in resource availability and resource selection to meet this
assumption more closely. A tradeoff here, however, is that if the period is too short,
the investigator may have insufficient time to obtain enough samples for analysis.
Thus, one must carefully weigh the gain obtained by shortening the study period
against the ability to obtain a large enough sample size.
A key aspect of any study is to determine the appropriate variables to measure.
In habitat studies, for example, investigators select variables they think are relevant
to the species studied. Ideally, the process of selecting variables should be based on
careful evaluation of the species studied including a thorough review of past work.
Whether the species is responding directly to the variable measured is generally
unknown; this could probably best be determined by a carefully controlled
experiment such as those Klopfer (1963) conducted on chipping sparrows (Spizella
passerina). Rarely are such experiments conducted, however.
Even if the appropriate variable is chosen for study, an additional problem is
determining if that resource is available to the species. That is, even though the
investigator has shown that a species uses a particular resource, this does not mean
that all units of that resource are equally available to the species. Numerous biotic
and abiotic factors may render otherwise suitable resources unavailable to the spe-
cies (Wiens 1985) (Fig. 5.5). Identifying which resources are and are not available
is a daunting task. Most frequently, the investigator uses some measure of occurrence
or relative occurrence of a resource to index availability. Implicit in such an
index is the assumption of a linear relation between occurrence and availability. The degree to which
this assumption is violated is usually unknown and rarely tested; effects of violating
the assumption on study results are perhaps less well understood.
An underlying goal of any study is to produce reliable data that provide a basis for
valid inferences. Thus, the investigator should strive to obtain the most accurate
data possible, given the organism or system studied, the methods available for
Fig. 5.5 Factors influencing habitat selection in birds. Reproduced from Wiens (1985) with kind
permission from Elsevier
collecting data, and constraints imposed by the people, time, and money at hand.
Many forms of bias can be minimized through careful planning and execution of a
study. Specific aspects that can and should be addressed are discussed below.
your field of research before continuing with the sampling investigation” (Jeffers
1980, p. 5). Thereafter, QA/QC should be invoked continuously during all stages of
a study (data collection, data entry, data archiving) to ensure acceptable data
quality, high standards for project personnel, a rigorous training program, and
periodic checks for consistency in data collection. The QA/QC process
should also be documented as part of the study files, and QC data retained to pro-
vide estimates of data quality.
The first step in addressing observer bias is to recognize that it exists. Once identified,
the investigator has various options to limit bias. Potential ways to control observer
bias are to (1) use methods that are repeatable with little room for judgment error, (2)
use skilled, qualified, and motivated observers, (3) provide adequate training and peri-
odic retraining, and (4) institute QC protocols to detect and address observer bias.
If the bias is uncorrelated such that it averages to zero but inflates estimates of
sampling variance, it can be addressed by tightening data-collection procedures to
limit that inflation, or by increasing sample sizes to improve precision. If the
bias is systematic and correlated, it may be possible to
develop a correction factor to reduce or possibly eliminate bias. Such a correction
factor could be developed through double sampling (see Sect. 4.3.5) and as discussed
later in Sect. 5.5.5. As an example, consider two ways to measure tree canopy cover:
one using a spherical densiometer and the other by taking ocular estimates. The den-
siometer is a tool commonly used to measure canopy cover and provides a less biased
estimate than ocular estimates. When field readings are compared between the two
methods, a relationship can be developed based on a curve-fitting algorithm, and that
algorithm can be used to adjust observations using the less reliable technique.
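A minimal sketch of such a correction, using hypothetical paired readings and a simple linear fit; the specific values and the choice of a straight-line calibration are illustrative assumptions, not from the text:

```python
import numpy as np

# Hypothetical paired canopy-cover readings (%) taken at the same plots:
# densiometer readings (less biased) and quick ocular estimates.
densiometer = np.array([22.0, 35.0, 41.0, 58.0, 63.0, 77.0, 85.0])
ocular      = np.array([30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0])

# Fit a simple linear correction: densiometer ~ a * ocular + b.
a, b = np.polyfit(ocular, densiometer, deg=1)

def adjust(ocular_reading):
    """Adjust an ocular estimate toward the densiometer scale."""
    return a * ocular_reading + b

print(round(adjust(65.0), 1))
```

In practice one would examine the residuals before settling on a linear form; a curved relationship would call for a higher-order fit.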
A step often ignored or simply conducted in a cursory fashion is training. Kepler
and Scott (1981) developed a training program for field biologists conducting bird
counts and found that training reduced interobserver variation. Training is needed not
only to ensure that observers have adequate skills to collect data, but also to ensure
that observers follow established protocols, to ensure that data forms are completed
correctly, and to ensure that data are summarized and stored correctly after they have
been collected. Training should not be regarded as a single event, but as a continuous
process. Periodic training should be done to verify that observers have maintained
necessary skill levels, and that they continue to follow established protocols.
Investigators use a variety of ways to collect and record data, including writing
observations on data sheets, dictating them into a tape recorder, and using hand-held
data loggers, video cameras, sound equipment, or traps to capture animals.
5.4 Missing Data
5.4.1 Nonresponses
Nonresponse error occurs when one fails to record or observe an individual or unit
that is part of the selected sample. For example, when conducting point or line-
transect counts of birds, an observer will undoubtedly fail to detect all individuals
present. This occurs for a variety of reasons, including the behavior or conspicuousness
of the individual, observer knowledge and skill levels, physical abilities of the
observer, environmental conditions (e.g., vegetation structure, noise levels,
weather), and type of method used. Likely, detectability of a given species will vary
in time and space. As the proportion of nonresponses increases, so does the width of
the confidence interval on the parameter estimated (Cochran 1977). Thus, it is important to
minimize nonresponses by using appropriate methods for collecting data and by
using only qualified personnel. Once appropriate steps have been made to minimize
nonresponses, techniques are available to adjust observations collected in the field
by employing a detectability correction factor (see Sect. 4.5.1 for an example from
line-transect sampling). Thompson et al. (1998, p. 83) caution that methods using
correction factors (1) are more costly and time consuming than simple indices
and (2) should only be used when the underlying assumptions of the method are
“reasonably” satisfied.
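The basic adjustment can be sketched as follows, with hypothetical numbers; in practice the detection probability would come from double sampling or distance methods as described above:

```python
# A raw count adjusted by an estimated detection probability (hypothetical numbers).
# If 40 birds are counted and an independent calibration suggests observers
# detect about 80% of the birds present, the adjusted abundance estimate is:
count = 40
p_hat = 0.8  # estimated detection probability (must come from calibration data)

n_hat = count / p_hat  # adjusted abundance estimate
print(n_hat)  # 50.0
```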
Deviations from sampling designs can add an unknown bias to the data set. For
example, if an observer decides to estimate rather than measure tree heights, it could
result in an underestimate of tree heights (see Fig. 5.3) (Block et al. 1987). Also, if
an observer records birds during point counts for only 6 min rather than 10 min as
established in the study design, the observer may record fewer species and fewer
individuals of each species (Fig. 5.6) (Thompson and Schwalbach 1995).
5.5 Selection of the Sampling Protocol

As detailed in Chap. 4, every wildlife study is unique, and the selection of a specific
sampling protocol depends on the experience and expertise of the investigator. For
many studies, there may be more than one valid approach.
The investigator has choices in specific sampling methods used to collect various
types of data. Sometimes the choice is obvious; other times the choice may be more
obscure. Critical factors that weigh into selection of a sampling protocol include (1)
the biology of the species studied, (2) its relative population size and spatial distri-
bution, (3) methods used to detect the species, (4) the study objectives, (5) resources
available to the study, (6) the size of the study area(s), (7) time of year, and (8) the
skill levels of the observers.
Consider, for example, different methods to sample population numbers of pas-
serine birds (Ralph and Scott 1981; Verner 1985). Four common methods include
spot mapping, point counts, line transects, and capture–recapture. Each has merits
and limitations in terms of feasibility, efficiency, and accuracy, and variations exist
even within a method. With point counts, the investigator must decide to use fixed-
or variable-radius plots, radius length for fixed plots, how much time to spend at a
point (usually varies between 4 and 10 min), how many times to count each point,
and how to array points within a study area. Many of the nuances that influence
these decisions are discussed more fully in the papers contained within Ralph et al.
(1995). Our point is that numerous details must be considered when selecting the
specific sampling protocol to be used in collecting data. Simply choosing a basic
procedure does not provide the level of specificity that must be defined in the
sampling protocol.
Sampling intensity refers to how many, how long, and how often units should be
sampled. Obviously, this depends on the study objectives and the information
needed to address them. In addition, the biology of the organism and attributes of
the process or system being studied will influence sampling intensity. In consider-
ing sampling intensity, one must realize that tradeoffs will be involved. For example,
the length of time spent at each sampling unit or number of repeat visits needed
may limit the overall number of plots visited. Deriving the optimal allocation of
sampling intensity should be a goal of a pilot study. In the absence of a pilot study,
the investigator might try to analyze a similar data set or consult with experts
knowledgeable in the species studied and the types of statistical analyses involved.
A third possibility is to conduct the study in a staged fashion, whereby preliminary
data can be evaluated to determine the appropriate sampling intensity. This proce-
dure can be repeated over a series of sampling sessions until the sampling intensity
meets the study needs.
How many plots are needed for a study is an issue of sample size. Adequate
sample sizes are needed for precise point estimates of the variable being measured,
to ensure adequate statistical power to detect a difference or trend should it indeed
occur, and to meet specific requirements of specialized analytical tools such as
program DISTANCE (Buckland et al. 2001, 2004), program MARK (Cooch and
White 2007), and others. As a rule, more plots are needed if a species is rare or has
a clumped distribution (Thompson et al. 1998). Sample size considerations have
been discussed in more detail in Sect. 2.6.7.
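For a rough planning calculation, the standard large-sample formula for estimating a mean can be sketched as follows; the SD and margin values are hypothetical:

```python
import math

def sample_size(sd, margin, z=1.96):
    """Approximate number of plots needed so that a 95% confidence interval
    on the mean has half-width `margin`, given a pilot estimate `sd` of the
    standard deviation (ignores the finite-population correction)."""
    return math.ceil((z * sd / margin) ** 2)

# Hypothetical pilot data: SD of 5 shrubs per plot, desired half-width of 1.
print(sample_size(sd=5.0, margin=1.0))  # 97
```

Halving the desired margin roughly quadruples the required sample, which is why precision targets should be set before fieldwork begins.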
Temporal considerations for sampling intensity involve the length of time spent
collecting observations during each visit to a sampling point, the number of
visits to each point, and the length of time needed to conduct the study. The amount
of time spent at a sampling point depends on the species studied and the probability
of detecting them during a certain period of time. Dawson et al. (1995) found that
the probability of detecting a species increased with time spent at a count station
(Table 5.1). However, more time spent at each point limits the number of points
that can be sampled; consequently, both factors must be considered in the study
design (Petit et al. 1995).
One visit to a plot may be inadequate to detect all individuals using an area
because of missed observations, behavioral differences that influence detectability,
Table 5.1 Probability of detecting 14 species of neotropical migratory birds within 5, 10,
15, and 20 min at points where they were known to be present

                                                Probability of detecting within
Species name               Number of points*    5 min   10 min   15 min   20 min
Yellow-billed cuckoo              258           0.465   0.655    0.812    0.922
Great-crested flycatcher          270           0.540   0.704    0.839    0.926
Eastern wood-pewee                294           0.611   0.752    0.854    0.920
Acadian flycatcher                176           0.747   0.820    0.896    0.936
Blue-gray gnatcatcher             112           0.580   0.728    0.862    0.931
Wood thrush                       323           0.784   0.882    0.939    0.971
Gray catbird                      171           0.615   0.779    0.893    0.936
Red-eyed vireo                    377           0.857   0.922    0.964    0.980
Worm-eating warbler                79           0.507   0.671    0.877    0.929
Ovenbird                          244           0.765   0.885    0.940    0.977
Kentucky warbler                   82           0.580   0.773    0.827    0.945
Common yellowthroat               125           0.606   0.740    0.852    0.950
Scarlet tanager                   295           0.718   0.833    0.910    0.948
Indigo bunting                    184           0.582   0.726    0.845    0.912

* Number of points is the number at which the species was detected
Source: Dawson et al. (1995)
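As an illustrative check on Table 5.1, one can ask how the 20-min probabilities would look if detection occurred at a constant rate in each 5-min interval; this constant-rate model is a simplifying assumption of ours, not Dawson et al.'s analysis:

```python
# Under a constant per-interval detection rate, the probability of detecting a
# species at least once within t minutes is P(t) = 1 - (1 - p5) ** (t / 5),
# where p5 is the observed 5-min detection probability.
p5 = 0.465  # yellow-billed cuckoo, Table 5.1

p20 = 1 - (1 - p5) ** (20 / 5)
print(round(p20, 3))  # 0.918, close to the observed 0.922
```

The near-agreement for this species suggests detections accumulate roughly independently across intervals, though species whose detectability changes with observer presence would depart from this model.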
Line intercept sampling was first introduced as a method for sampling vegetation
cover (Canfield 1941; Bonham 1989). It is widely used in wildlife habitat studies
when vegetation cover is anticipated as a strong correlate of use by the species of
interest. Effectively, one calculates the proportion or percentage of a line that has
vegetation directly above the line. Say, for example, a line intercept was 20-m long,
and 15 m of the line were covered by canyon live oak (Quercus chrysolepis). Thus,
percentage canopy cover would be 15/20 × 100 = 75%. There is some flexibility in
how line intercept methods are used. They are used to estimate overall canopy
cover, cover by plant species, cover at different height strata, or cover by different
plant forms (e.g., herbs, shrubs, trees), providing opportunities to examine both
structural and floristic habitat correlates.
A derivation of the line intercept technique is the point intercept method. This
methodology was developed primarily to sample grasses and forbs as part of range
studies (Heady et al. 1959). Generally, points are systematically arrayed along a
transect that is oriented in a random direction. The method entails noting whether
the object being sampled is on the point (commonly termed a hit) or not. An esti-
mate of cover would be the percentage of points where a “hit” was recorded.
Similar to line intercepts, point intercepts can be used in various ways to address
different study objectives.
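Both estimators reduce to simple proportions; a minimal sketch using the oak example from the text and hypothetical point-intercept numbers:

```python
def line_intercept_cover(intercepts, line_length):
    """Percent cover from summed intercept lengths along a transect."""
    return 100.0 * sum(intercepts) / line_length

def point_intercept_cover(hits, n_points):
    """Percent cover as the proportion of points recording a 'hit'."""
    return 100.0 * hits / n_points

# The canyon live oak example from the text: 15 m of a 20-m line covered.
print(line_intercept_cover([15.0], 20.0))  # 75.0
# Hypothetical: 12 hits out of 40 systematically placed points.
print(point_intercept_cover(12, 40))       # 30.0
```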
As noted in Sect. 4.4.2, intercepts can also be used as a method to sample
individuals for collecting additional information or attributes. For example,
suppose a study on secondary cavity-nesting birds wanted to estimate the relative
abundance of cavities on a study area. An intercept could be used to select trees to
sample, and then the numbers of cavities could be counted on each tree that was hit
to provide an abundance estimate for cavities.
Four primary factors influence the selection of the plot shape: detectability of indi-
viduals, distribution of individuals, edge effects (i.e., ratio of plot perimeter to plot
area), and the methods used to collect data (Thompson 2002; Thompson et al. 1998).
A plot with a large edge effect may lead to greater commission or omission of indi-
viduals during counts, that is, including individuals existing off the plot or excluding
individuals occurring on the plot. Given plots of equal area, long, narrow, rectangular
plots will have greater edge effect than square plots, which have more edge than cir-
cular plots. However, differences in edge effect can be outweighed by the sampling
method, dispersion pattern of individuals, and their detectability. For example, if
sampling occurs quickly, then animals may not move in and out of the plot, thereby
minimizing the errors of commission and omission. Thompson (2002) concluded that
rectangular plots were more efficient than square or round plots for detecting individ-
uals, which would reduce the magnitude of adjustments needed for simple counts and
provide less biased estimates of animal numbers. Most species of wildlife are not
randomly distributed; rather, they exhibit some degree of clumping. Thompson et al.
(1998) found that precision of estimates increased if few plots had no detections of the
species sampled. Thus, the shape of a plot must be chosen to maximize the probability
of including the species. Generally, a long and narrow rectangular plot would have
greater chance of encountering a species with a clumped distribution.
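The edge-to-area comparison for equal-area plot shapes can be sketched directly; the 1-ha plot dimensions are hypothetical:

```python
import math

area = 10_000.0  # three plot shapes, each 1 ha (m^2)

# Perimeter-to-area ratios for equal-area plots.
square_perim = 4 * math.sqrt(area)                     # 100 m x 100 m
rect_perim   = 2 * (10 + area / 10)                    # 10 m x 1000 m strip
circle_perim = 2 * math.pi * math.sqrt(area / math.pi)

for name, p in [("rectangle", rect_perim), ("square", square_perim),
                ("circle", circle_perim)]:
    print(f"{name}: perimeter {p:.0f} m, edge ratio {p / area:.4f}")
```

The narrow rectangle has roughly five times the edge of the square, illustrating why edge effects and the chance of intersecting clumps trade off against each other.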
Numerous factors influence plot size, including the biology of the species, its spatial
distribution, study objectives, edge effects, logistical considerations, and cost con-
straints. Certainly, larger species or top-level predators with large home ranges
require larger plots to include adequate numbers. For example, a 10-ha plot might
include only 1% of the home range of a spotted owl, whereas it could include the
entire home ranges of multiple deer mice (Peromyscus maniculatus). The spatial
arrangement of a species also influences plot size. If the distribution is strongly
aggregated, a large proportion of small plots may fail to include any individuals,
whereas larger plots would reduce the number of empty plots (Fig. 5.7). Another advantage of
larger plots, especially when studying highly vagile animals, is that larger plots are
more likely to include the entire home range of the species than smaller plots.
Effectively, larger plots have a lower ratio of edge to interior, thereby limiting poten-
tial edge effects. Disadvantages of larger plots are costs and limitations in the number
of plots that can be sampled. Larger plots require more effort to sample than smaller
plots. Given a fixed budget, one is faced with a tradeoff between plot size and the
number that can be sampled. This tradeoff must be carefully weighed during the plan-
ning process by carefully stating the study objectives or hypothesis, specifying the
differences, trends, or parameters to be estimated, and determining the sample size
needed for adequate statistical treatment of the data. A study that consists of one or
two large plots may be of limited value in drawing inferences regarding the species
studied in that results cannot be easily extrapolated beyond the study location. Thus,
an investigator must carefully weigh the resources at hand with the objectives or
information needs of a study to evaluate critically if enough plots of adequate size can
be sampled to ensure a valid study. If not, the study should not be done.
Fig. 5.7 A sampling frame with an underlying clumped spatial distribution of animals is divided
into four strata each containing 25 plots. A simple random sample of four plots is selected from
each stratum (Figure from Monitoring Vertebrate Populations by Thompson, White and Gowan,
Copyright © 1998 by Academic Press, reproduced by permission of Elsevier)
options. Those who skip this step because they do not have enough time usually end
up losing time.” A pilot study allows the investigator a chance to evaluate whether
data collection methodologies are effective, estimate sampling variances, establish
sample sizes, and adjust the plot shape and size and other aspects of the sampling
design. Conducting a pilot study often leads to greater efficiency in the long run
because it can ensure that you do not oversample and that the approach that you are
taking is the appropriate one. All too often investigators skip the pilot portion of a
study, going directly into data collection, only to conclude that their study fails to
achieve anticipated goals despite having collected volumes of data. Many biometri-
cians can recount horror stories of students entering their office with reams of field
data, only to find out that the data have very limited utility.
Double sampling entails designs in which two different methods are used to collect
data related to the same or a similar variable (Thompson 2002). A primary impetus
for double sampling is to find a quicker and cheaper way to get needed information.
For example, one may use line-transect methods (see Chap. 4) as the preferred way
to estimate population size of elk. Although previous investigators have demon-
strated that line-transect methods provide accurate and precise estimates of elk
population size, the costs of conducting the sampling are high, especially when
estimates are needed for large areas. An alternative may be to conduct aerial sur-
veys, although aerial surveys by themselves may not be sufficiently accurate to base
estimates on them alone. By using both methods for an initial sample, one can use
a simple ratio estimate of detectability to calibrate results from the aerial survey.
Ratio estimation is effective in situations in which the variable of interest is linearly
related to the auxiliary (quick and dirty) variable. The ideal proportion of the sam-
ple relative to the subsample depends on the relative costs of obtaining the informa-
tion and the strength of the relationship between the two (Thompson 2002). For a
detailed discussion of double sampling, see Sect. 4.3.5.
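A minimal sketch of the ratio estimator for the elk example; all unit counts are hypothetical:

```python
# Ratio estimation with double sampling (hypothetical elk counts per unit).
# Aerial counts (cheap) are flown on all sampled units; line-transect
# estimates (accurate but costly) are obtained on a calibration subsample.
aerial_all   = [120, 95, 140, 80, 110, 130, 70, 100]  # all sampled units
aerial_sub   = [120, 140, 110, 70]                    # calibration subsample
transect_sub = [150, 168, 140, 90]                    # paired accurate estimates

r_hat = sum(transect_sub) / sum(aerial_sub)  # correction ratio
mean_aerial = sum(aerial_all) / len(aerial_all)
y_hat = r_hat * mean_aerial  # calibrated mean elk per unit
print(round(r_hat, 3), round(y_hat, 1))
```

The estimator works well here because the two counts are assumed roughly proportional; a weak or nonlinear relationship between them would undermine it, as noted in the text.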
Double sampling also can be used for identifying strata and determining alloca-
tion of samples in the strata. For example, consider a study to determine the average
height of trees used by hairy woodpeckers for foraging. In collecting information
on tree height, the investigator also gathers information on age (adult or juvenile)
and sex of woodpeckers observed. Assuming a large enough sample, the investigator
can estimate the relative proportion of individuals in each age/sex stratum. A stratified
random sample is then selected from the initial sample to estimate the variable of
interest: in this case, tree height. The advantage of this approach is that it provides
an unbiased point estimate of tree height, but with greater precision than provided
by a simple random sample.
A third use of double sampling is to correct for nonresponses in surveys. Surveys
are often collected to gather information on consumptive or nonconsumptive use of
wildlife for a variety of reasons. Surveys done on hunting success are used in popula-
tion estimation models or to set future bag limits. Surveys conducted on recreational
bird-watching provide resource agencies with important information to manage people
and minimize potentially deleterious effects on rare or sensitive birds. Thus, surveys
are a useful tool in understanding the biology of species of interest and in developing
management options to conserve them. A problem arises, however, because only a
proportion of those surveyed actually respond for one reason or another. Assuming that
the sample of respondents is representative of the nonrespondents is tenuous. A solu-
tion is to consider the nonrespondents as a separate stratum and to take a subsample
from that. Thus, the double sample is needed to separately sample the strata defined
by whether or not people responded to the survey.
5.6.2 Categories
Once the investigator has selected a study area and designated sampling plots, he
or she must place stations in such a way as to sample adequately within and among
plots. Placement of stations is influenced by a number of factors including study
objectives, the sampling design, logistical considerations, and the organism(s) or
system(s) studied. Further, different designs for placing stations are needed depend-
ing on whether the ultimate goal is to detect all individuals (complete enumeration)
or to sample a portion of the population and estimate population size. As noted
below, a complete enumeration is rarely achieved, thus the more likely scenario is
for a portion of the population to be sampled.
Sampling stations can be placed in an array such as a grid where the array is con-
sidered the sampling unit. Grids are commonly used for sampling small mammals,
herpetofauna, or passerine birds. In this case, sampling stations are not generally
considered independent samples because individuals of the species studied may use
multiple stations, and the presence or absence of an animal at one sampling station
may influence the presence of animals at others. Using grids or similar arrays allows
an investigator to apply capture–recapture (Otis et al. 1978; White et al. 1982) or
capture–removal (Zippen 1958) methods to study population dynamics. Habitat relations
can be studied by comparing stations where a species was captured with those where
it was not or by assessing correlations across grids (Block et al. 1998).
Often, study objectives and statistical analyses require that sampling stations be
independent. Ad hoc definitions of independence could be (1) when the probability
of sampling the same individual at adjacent points is relatively small or (2) when
the species or number of a species present at one point does not influence what
species and how many are present at nearby points. An empirical way to test for
autocorrelation of observations is based on mixed model analysis of variance (Littel
et al. 1996) where one can test for correlations between sampling points at various
distances. An alternative testing approach might be the analysis provided by
Swihart and Slade (1985) for testing for autocorrelations among observations sepa-
rated in time. Both methods might allow the investigator to estimate the optimal
distance between points to ensure that they are independent observations. Once
investigators define that distance, they need to physically locate sampling points
within their sample plots. An efficient way to place points is to use a systematic
random sampling design (Cochran 1977). This entails randomly choosing the initial
starting point and a random direction, and then spacing subsequent points at the
defined distance along this bearing. Alternatively, one could randomly overlay a
grid with spacing between grid points greater than or equal to that needed for inde-
pendent observations and then collect data at all or a random sample of those grid
points. An advantage of systematic random sampling is that it provides for efficient
sampling in the field, whereas random grid points could require more time to
locate and reach. However, a disadvantage of systematic
random sampling is that it may not sample the variation within a sample plot as well
as random points distributed throughout the plot (Thompson et al. 1998). This
potential bias, however, is probably ameliorated given large enough (>50) sample
sizes (Cochran 1977; Thompson et al. 1998).
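The random start, random bearing, and fixed spacing described above can be sketched as follows; the coordinates, spacing, and seed are hypothetical:

```python
import math
import random

def systematic_points(n, spacing, seed=1):
    """Place n points at a fixed spacing along a random bearing from a
    random start; a minimal sketch of systematic random sampling."""
    rng = random.Random(seed)
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)  # random start (m)
    theta = rng.uniform(0, 2 * math.pi)              # random bearing
    return [(x + i * spacing * math.cos(theta),
             y + i * spacing * math.sin(theta)) for i in range(n)]

pts = systematic_points(n=5, spacing=250.0)
# Consecutive points are exactly one spacing apart.
print(round(math.dist(pts[0], pts[1]), 1))  # 250.0
```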
5.7 Sampling Small Areas

Many studies are restricted to small areas for a variety of reasons. For example, a
study could be conducted to understand population dynamics or habitat relations of
species occurring on a small, unique area such as an island or a patch of riparian
vegetation surrounded by uplands. Given the uniqueness of the area, the study must
be done at that particular location. Another reason a study might be done in a small
area is to understand potential effects of a planned activity or impact on the
species found there. These types of studies often fall under the
umbrella of impact assessment studies that are discussed in detail in Chap. 6. Given
a small area, the researcher should strive for a complete enumeration with the reali-
zation that he or she will probably miss some individuals (see below). However,
given that most of the population within the study area will be sampled, a correction
for detectability can be applied to parameter estimates, thereby increasing their
precision.
Suppose the primary objective of a study is to estimate population density of the yellow-
blotched ensatina (Ensatina eschscholtzii croceater), and a secondary objective is to
collect morphometric data on the animals located. This subspecies of ensatina is
patchily distributed within mesic oak woodlands of southern California (Block and
Morrison 1998). Using adaptive sampling, random walk surveys are used to locate
salamanders by searching on and under all suitable substrates. Once a salamander
is located, the researcher conducts an area-constrained survey (Heyer et al. 1994) to
sample that area intensively for salamanders. An area-constrained survey consists of
establishing a plot, and then searching for ensatinas on, under, and in all possible
substrates – typically rocks and logs for this species – within that plot. The researcher
would also collect morphological information on animals captured. Thus, both a
density sample (primary study goal) and morphometric data (ancillary study informa-
tion) would result from the efforts. Although it appears to have potential for many
wildlife studies, adaptive sampling has not been widely used. For a more complete
discussion of adaptive sampling and the variations thereof, see Thompson (2002),
Thompson and Seber (1996), and Smith et al. (2004).
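The expand-around-detections idea behind adaptive sampling can be sketched on a grid; the cell counts and the four-neighbor rule are illustrative assumptions, not the area-constrained protocol itself:

```python
# A minimal sketch of adaptive cluster sampling on a grid: whenever a selected
# cell contains the target species, its four neighbors are added and searched,
# repeating until no occupied cell has unsearched neighbors.
counts = {                           # hypothetical salamander counts per cell
    (2, 2): 3, (2, 3): 1, (3, 2): 2,  # a single clump
}

def adaptive_sample(initial, counts):
    searched, frontier = set(), list(initial)
    while frontier:
        cell = frontier.pop()
        if cell in searched:
            continue
        searched.add(cell)
        if counts.get(cell, 0) > 0:  # occupied: expand to the four neighbors
            r, c = cell
            frontier += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    return searched

plots = adaptive_sample([(0, 0), (2, 2)], counts)
print(len(plots), sum(counts.get(p, 0) for p in plots))
```

Because sampling effort concentrates where animals are found, the whole clump is enumerated while isolated empty cells receive only a single visit.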
5.10 When Things Go Wrong

However well you might plan, things do not always go as you wish. You might encounter
problems obtaining samples because of limited access or physical barriers. Weather
might limit field collection or influence behavior of the animal under study.
Equipment might fail. Field personnel may not follow protocols, leading to mistakes
or gaps in data collection. And yes, of course, funding might be reduced or even
eliminated. When these situations occur, what can you do? Ignore the problems and
continue blithely along? End the study and analyze the data you have? Evaluate the
status of the study and make mid-course corrections? Below we provide some guidance
for making the best of less-than-desirable sampling situations.
Hopefully, you have the wherewithal to stick with your study and make mid-project
changes. The severity and extent of the problems might dictate whether a simple
course correction will suffice or whether major changes are needed in the scope and
direction of the study. Timing also influences what changes are required. For example,
problems arising early in a study might require revising the study objectives and the
basis of the research, whereas problems toward the end might entail only minor
adjustments to ensure that you retain as much of your effort as possible. Changes may
be needed for a number of reasons, and those changes may differ depending on
whether the study is an experiment, quasiexperiment, or observational study. It is
almost impossible to envision all of the problems that can and do occur. In our
collective years of conducting research, we have continued to encounter new ways
for things to go wrong. Our bottom-line advice is: do not panic. With
some perseverance, you can likely salvage a study even if the nature and underlying
objectives are a bit different from what you set out to accomplish.
As we noted above, you can probably salvage something from a study even if your
best-laid plans go wrong. Do not panic, but systematically evaluate your options.
Some questions to ask: Can I still address my original objectives? How much of my
existing data can be used? How much more data must I gather? Can I follow the same
sampling design, or does it need to be adjusted? What are my options for data analysis?
Should I switch from an ANOVA or BACI analysis to more exploratory model selection
techniques? By considering these questions, you will better understand your options.
It may very well be that your final study bears little resemblance to your original plan.
It is better to get something out of your efforts than to waste all of your hard work.
Often you are able to address your main study objective but with less confidence
than would have been possible had you implemented the optimal design. For exam-
ple, losing several study plots due to unforeseen circumstances (e.g., permission for
access denied; treatments could not be applied) would result in a more limited
inference based on your results; but you would still have results and be able to
make a more qualified inference for application outside your immediate study
sites. Likewise, your initial plans to submit your work for publication to a national-
level journal might need to be changed to submit to a regional journal. But, in our
opinion, your duty as a scientist is to do the best work you can and get the work
published; where it is published is a consideration but of secondary importance.
5.11 Summary
The first rule for applying a sampling design is to recognize that the design itself is not
self-implementing. By that, we mean that people are needed actually to conduct the
study: for example, to locate and mark sample plots, lay out sample points, collect data,
and then document, transcribe, and handle the data through analysis and interpretation.
During the process of applying a conceived study plan, adjustments will more than
likely be necessary. Along this line, we offer a few salient points to keep in mind:
● Wildlife populations and ecologies typically vary in time and space. A study
design should account for these variations to ensure accurate and precise esti-
mates of the parameters under study.
● Various factors may lend bias to the data collected and study results. These
include observer bias, sampling and measurement bias, and selection bias.
Investigators must acknowledge that bias can and does occur, and take measures
to minimize or mitigate the effects of that bias.
● A critical aspect of any study is development of and adherence to a rigorous
QA/QC program.
● Study plans should be regarded as living documents that detail all facets of a
study, including any changes and modifications made during application of the
study design.
● Sampling intensity must be sufficient to provide information needed to address
the study objectives. Anything less may constitute a waste of resources.
● Plot size and shape are unique to each study.
● Pilot studies are critical as “Those who skip this step because they do not have
enough time usually end up losing time” (Green 1979, p. 31).
● Studying rare species or events requires special approaches such as adaptive
sampling, adaptive cluster sampling, sequential sampling, and two-phase adaptive
stratified sampling.
● Stuff happens. Even the best designed studies require mid-study adjustments.
References
Becker, E. F., H. N. Golden, and C. L. Gardner. 2004. Using probability sampling of animal tracks
in snow to estimate population size, in W. L. Thompson, Ed. Sampling Rare or Elusive
Species, pp. 248–270. Island Press, Covelo, CA.
Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecol.
Monogr. 54: 187–211.
Jeffers, J. N. R. 1980. Sampling. Statistical Checklist 2. Institute of Terrestrial Ecology, National
Environment Research Council, Cambridge, United Kingdom.
Karanth, K. U., J. D. Nichols, and N. S. Kumar. 2004. Photographic sampling of elusive mammals
in tropical forests, in W. L. Thompson, Ed. Sampling Rare or Elusive Species, pp. 229–247.
Island Press, Covelo, CA.
Kepler, C. B., and J. M. Scott. 1981. Reducing bird count variability by training observers. Stud.
Avian Biol. 6: 366–371.
Klopfer, P. H. 1963. Behavioral aspects of habitat selection: The role of early experience. Wilson
Bull. 75: 15–22.
Littell, R. C., G. A. Milliken, W. W. Stroup, and R. D. Wolfinger. 1996. SAS System for Mixed
Models. SAS Institute, Cary, NC.
Little, R. J. A., and D. B. Rubin. 1987. Statistical Analysis with Missing Data. Wiley, New York,
NY.
MacArthur, R. H., and J. W. MacArthur. 1961. On bird species diversity. Ecology 42: 594–598.
Manly, B. F. J. 2004. Two-phase adaptive stratified sampling, in W. L. Thompson, Ed. Sampling
Rare or Elusive Species, pp. 123–133. Island Press, Covelo, CA.
Manly, B. F. J., L. McDonald, and D. Thomas. 1993. Resource Selection by Animals. Chapman and
Hall, London.
Miles, D. B. 1990. The importance and consequences of temporal variation in avian foraging
behavior. Stud. Avian Biol. 13: 210–217.
Moir, W. H., B. Geils, M. A. Benoit, and D. Scurlock. 1997. Ecology of southwestern ponderosa
pine forests, in W. M. Block, and D. M. Finch, Tech. Eds. Songbird Ecology in Southwestern
Ponderosa Pine Forests: A Literature Review, pp. 3–27. Gen. Tech. Rpt. RM-GTR-292. USDA
Forest Service, Rocky Mountain Forest and Range Experiment Station, Fort Collins, CO.
Morrison, M. L. 1982. The structure of western warbler assemblages: Analysis of foraging behav-
ior and habitat selection in Oregon. Auk 98: 578–588.
Morrison, M. L. 1987. The design and importance of long-term ecological studies: Analysis of
vertebrates in the Inyo-White mountains, California, in R. C. Szaro, K. E. Severson, and D. R.
Patton, Tech. Coords. Management of Amphibians, Reptiles, and Small Mammals in North
America, pp. 267–275. Gen. Tech. Rpt. RM-166. USDA Forest Service, Rocky Mountain
Forest and Range Experiment Station, Fort Collins, CO.
Nichols, J. D., J. E. Hines, J. R. Sauer, F. W. Fallon, J. E. Fallon, and P. J. Heglund. 2000. A double
observer approach for estimating detection probability and abundance from point counts. Auk
117: 393–408.
Noon, B. R., and W. M. Block. 1990. Analytical considerations for study design. Stud. Avian Biol.
13: 126–133.
Otis, D. L., K. P. Burnham, G. C. White, and D. R. Anderson. 1978. Statistical inference from
capture data on closed animal populations. Wildl. Monogr. 62.
Petit, D. R., L. J. Petit, V. A. Saab, and T. E. Martin. 1995. Fixed-radius point counts in forests:
Factors influencing effectiveness and efficiency, in C. J. Ralph, J. R. Sauer, and S. Droege,
Tech. Eds. Monitoring Bird Populations by Point Counts, pp. 49–56. Gen. Tech. Rpt. PSW-
GTR-149. USDA Forest Service, Pacific Southwest Research Station, Albany, CA.
Ralph, C. J., and J. M. Scott, Eds. 1981. Estimating numbers of terrestrial birds. Stud. Avian Biol. 6.
Ralph, C. J., J. R. Sauer, and S. Droege, Tech. Eds. 1995. Monitoring Bird Populations by Point
Counts. Gen. Tech. Rpt. PSW-GTR-149. USDA Forest Service, Pacific Southwest Research
Station, Albany, CA.
Schneider, D. C. 1994. Quantitative Ecology. Academic, San Diego, CA.
Schwartz, M. K., G. Luikart, and R. S. Waples. 2006. Genetic monitoring as a promising tool for
conservation and management. Trends Ecol. Evol. 22: 25–33.
Smith, D. R., J. A. Brown, and N. C. H. Lo. 2004. Applications of adaptive sampling to biological
populations, in W. L. Thompson, Ed. Sampling Rare or Elusive Species, pp. 77–122. Island
Press, Covelo, CA.
6 Impact Assessment
6.1 Introduction
In this chapter we apply the concepts developed previously in this book to the spe-
cific issue of determining the effects of environmental impacts on wildlife. Impact
is a general term used to describe any change that perturbs the current system,
whether it is planned or unplanned, human induced, or an act of nature. Thus,
impacts include a 100-year flood that destroys a well-developed riparian woodland,
a disease that decimates an elk herd, or the planned or unplanned application of
fertilizer. Impacts also include projects that are intended to improve conditions for
animals such as ecological restoration. For example, removal of exotic salt cedar
from riparian areas to enhance cottonwood regeneration can substantially impact
the existing site conditions.
You have likely already encountered many situations that fall into the latter cat-
egory; namely, studies that are constrained by location and time. Such situations
often arise in environmental studies because the interest (e.g., funding agency) is
local, such as the response of plants and animals to treatments (e.g., fire, herbicide)
applied on a management area of a few hundred to a few thousand hectares. Often
these treatments are applied to small plots to evaluate one resource, such as plants, and you
have been funded to study animal responses. In such situations, the initial plots
might be too small to adequately sample many animal species. Or, there might be
no treatment involved, and the project focus is to quantify the ecology of some spe-
cies within a small temporal and spatial scale. It is important for students to note
that most resource management is applied locally; that is, on a small spatial scale
to respond to the needs of local resource managers. The suite of study designs that
falls under the general rubric of impact assessment is applicable even to studies that are
not viewed as having caused an environmental impact per se; designs that we cover
below, such as after-only gradient designs, are but one example.
A distinction should be made between a hypothesis developed within a manipulative
experiment framework and a hypothesis developed within an impact framework.
Randomizing treatments to experimental units and replicating the experiment leave
test conditions essentially free from the confounding influences of time and space, so
inferences can be stated clearly. Within an impact framework, however,
test conditions are usually outside the control of the investigator, which makes
inference problematic (see related discussion in Skalski and Robson 1992, pp.
161–162).
In this chapter we will concentrate on impacts that are seldom planned, and are
usually considered to be a negative influence on the environment. But this need not be
so, and the designs described herein have wide applicability to the study of wildlife.
6.2 Experimental Designs: Optimal and Suboptimal
In his classic book on study design, Green (1979) outlined the basic distinction
between an optimal and suboptimal study design. In brief, if you know what type of
impact will occur, when and where it will occur, and have the ability to gather pretreat-
ment data, you are in an “optimal” situation to design the study (Fig. 6.1, sequence 1).
Main Sequence 1 is, in essence, the classic manipulative study, although Green was
developing his outline in an impact assessment framework. That is, you might be
establishing control areas and gathering pretreatment data in anticipation of a likely
catastrophic impact such as a fire or flood. Thus, the "when" aspect of the optimal
design need not be known precisely, only that it lies in a future you can plan for.
As we step through the decision tree developed by Green (1979; Fig. 6.1), your
ability to plan aspects of the impact study declines. Unless we are concerned only
Fig. 6.1 The decision key to the “main sequence” categories of environmental studies. (From
Sampling Design and Statistical Methods for Environmental Biologists by Green, Copyright ©
1979, Wiley. Reprinted by permission of publisher)
with a small geographic area, we can seldom adequately anticipate where an impact
will occur, or even if we could, what the type and intensity of the impact would be.
This uncertainty is exacerbated by the population structure of many species. Because
animals are not distributed uniformly, even within a single vegetation type, we are
forced to sample intensively over an area in anticipation of something (the impact)
that might never occur – few budgets can allow such luxury (the topic of distribution
of plots described below under suboptimal designs applies in this situation).
That we cannot know the specific type of impact that will occur does not, however,
always preclude implementation of an optimal design. For example, personnel of
a wildlife refuge located downstream from a chemical factory could anticipate the
chemical, but perhaps not the toxicity, of a spill; or a refuge located in an area of
extensive human population growth might anticipate an increase in air pollution
and/or a decrease in the water table. Such impacts could likely be modeled using
the experience of similar regions in the recent past. Agencies charged with managing
game populations could anticipate, perhaps through modeling, some of the
major environmental changes resulting from future changes in key limiting factors
(e.g., effects on prey, effects on forage quality). Various measures could be prioritized for
study. Sensitivity analysis is also a useful tool to aid in understanding the behavior
of model parameters and to narrow the number of variables to be monitored. If
monitoring sites are distributed across the area of interest (e.g., study site, impacted
site), then it is also likely that not all sites would be impacted; thus, some would
serve as nonimpacted controls.
As noted by Green, an optimal design is thus an areas-by-times factorial design
in which evidence for an impact is a significant areas-by-times interaction. Given
that the prerequisites for an optimal design are met, the choice of a specific sampling
design and statistical analyses should be based on your ability to (1) test the null
hypothesis that any change in the impacted area does not differ statistically or bio-
logically from the control and (2) relate to the impact any demonstrated change
unique to the impacted area and to separate effects caused by naturally occurring
variation unrelated to the impact (Green 1979, p. 71). Locating appropriate control
sites is not a trivial matter. The selection of control sites has been more fully dis-
cussed in Chap. 2. It is important to make sure that your control sites are truly unim-
pacted, even in very subtle ways. For example, the impact could cause animals to
vacate the site and move onto the control, which could cause any number of behavioral
or density-dependent responses by the animals already residing there that you
would not perceive.
It is often not possible, however, to meet the criteria for development of an optimal
design. Impacts often occur unexpectedly; for example, an unplanned fire substantially
reduces the cover of grass or shrub over 35 ha of your management area or a flood
destroys most of the remaining native sycamores (Platanus wrightii) along the stream
crossing your property. In such cases, a series of suboptimal study designs have been
described. If no control areas are possible (see Fig. 6.1, sequence 2), then the signifi-
cance of the impact must be inferred from temporal changes alone (discussed below).
If the location and timing of the impact are not known (i.e., it is expected but cannot
be planned; e.g., fire, flood, disease), the study becomes a baseline or monitoring study
(see Fig. 6.1, sequence 3). If properly planned spatially, then it is likely that nonim-
pacted areas will be available to serve as controls if and when the impact occurs. This
again indicates why “monitoring” studies are certainly research, and might allow the
development of a rigorous experimental analysis if properly planned.
Unfortunately, impacts often occur without any preplanning by the land man-
ager. This common situation (see Fig. 6.1, sequence 4) means that impact effects
must be inferred from among areas differing in the degree of impact; study design
for these situations is discussed below. Finally, situations do occur (see Fig. 6.1,
sequence 5) where an impact is known to have occurred, but the time and location
are uncertain (e.g., the discovery of a toxin in an animal or plant). This most difficult
situation means that all direct evidence of the initial impact could be nonexistent.
For example, suppose a pesticide is detected in the soil, but at levels below those
known to cause death or obvious visual signs of harm to animals. The range of
concentration of pesticide known to harm animals varies widely depending on the
absolute amount applied, the distribution of application, and the environmental
conditions present both during and after application (e.g., soil condition, rainfall).
Thus, “backdating” the effect is problematic. Further, it is difficult to know how
the pesticide impacted the animal community (e.g., loss of a species) or if recovery
has occurred. Ways to evaluate impacts using various suboptimal designs are pre-
sented below.
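Green's decision key can be paraphrased as a small branching function. This is only a mnemonic sketch of the five main sequences as summarized in the text above, not a reimplementation of Fig. 6.1:

```python
def green_main_sequence(impact_has_occurred, when_where_known, control_possible):
    """Rough paraphrase of Green's (1979) decision key (Fig. 6.1).

    The real key has more branches; this captures only the five
    main sequences as described in the surrounding text.
    """
    if not impact_has_occurred:
        if when_where_known:
            # Pretreatment data can be gathered (sequences 1 and 2).
            return ("1: optimal before-after/control-impact design"
                    if control_possible
                    else "2: inference from temporal change alone")
        # Impact expected but cannot be planned for (sequence 3).
        return "3: baseline / monitoring study"
    if when_where_known:
        # Impact occurred with no preplanning (sequence 4).
        return "4: infer effects from areas differing in degree of impact"
    # Impact known to have occurred, time/location uncertain (sequence 5).
    return "5: most difficult case; backdating the impact is problematic"

print(green_main_sequence(False, True, True))
```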
6.3 Disturbances
Three primary types of disturbances occur: pulse, press, and those affecting temporal
variance (Bender et al. 1984; Underwood 1994). Pulse disturbances are those that are
not sustained after the initial event, although their effects may be long
lasting. Press disturbances are those that are sustained beyond
the initial disturbance. Both pulse and press disturbances can result from the
same general impact. For example, a pulse disturbance occurs when a flood
washes frog egg masses downstream. A press disturbance occurs subsequently
because of the changes in river banks and associated vegetation, which prevent
the successful placement of new egg masses. The magnitude of the pulse distur-
bance will determine our ability to even know that an impact has occurred. For
example, Fig. 6.2 depicts mild (B) and relatively severe (C) pulse disturbances;
the former would be difficult to detect if sampling was less frequent (i.e., if
sampling had not occurred between times 6 and 8) and/or the variance of each
disturbance event was high. Figure 6.3 depicts mild (C) and relatively severe (D)
press disturbances. The former would be difficult to distinguish from the variation
inherent in the control sites.
Disturbances affecting temporal variance are those that do not alter the mean
abundance, but change the magnitude of the oscillations between sampling periods.
These changes can increase (see Fig. 6.3a) or even decrease (see Fig. 6.3b) the variance
relative to predisturbance and/or control sites.
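The three disturbance types can be mimicked with a toy deterministic series. The numbers are invented, and the sine term simply stands in for natural oscillation in abundance:

```python
import numpy as np

t = np.arange(12)                 # sampling times; disturbance after t = 5
base = 50.0 + 2.0 * np.sin(t)     # undisturbed abundance, mild oscillation

pulse = base.copy()
pulse[6] -= 25.0                  # pulse: one-time drop, later samples recover

press = base.copy()
press[6:] *= 0.5                  # press: sustained reduction (cf. Fig. 6.3c, d)

var_change = base.copy()          # temporal-variance disturbance: roughly the
var_change[6:] = 50.0 + 10.0 * np.sin(t[6:])  # same mean, amplified oscillations
                                              # (cf. Fig. 6.3a)
```

Here the post-disturbance standard deviation of `var_change` is exactly five times that of `base`, while the press disturbance lowers the post-disturbance mean and the pulse disturbance affects only a single sampling time.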
Fig. 6.3 Simulated environmental disturbances in one location (•—•), with three controls, all
sampled six times before and after the disturbance (at the times indicated by the arrow). (a, b) The
impact is an alteration of temporal variance after the disturbance; temporal standard deviation ×5
in (a) and ×0.5 in (b). (c, d) A press reduction of abundance to 0.8 (c) and 0.2 (d) of the original
mean. Reproduced from Underwood (1994), with kind permission from Springer Science +
Business Media
a press disturbance usually recovers slowly as either the source of the impact lessens
(e.g., a chemical degrades) or the elements impacted slowly recover (e.g., plant
growth, animal recolonization).
Also note that an impact can change the temporal pattern of an element, such as
the pattern in fluctuation of numbers. A change in temporal patterning could be due
to direct effects on the element or through indirect effects that influence the ele-
ment. Direct effects might result from a change in activity patterns because the
impact agent modified sex or age ratios (i.e., differentially impacted survival of
different sex–age classes); indirect effects could result because the impact agent
severely impacted a predator, competitor, or food source of the element being
monitored.
Parker and Wiens (2005) provided the following basic definitions that should be
used in discussing impact assessment; we have modified some of these slightly to
match other concepts and terminology in this book:
● Biological resources: Quantifiable components of the system, such as organisms,
populations, species, and communities.
● Levels: Measures of a resource, such as abundance, diversity, community structure,
and reproductive rates. Hence, levels are quantifiable on an objective scale
and can be used to estimate means and variances and to test hypotheses.
● Natural factors: Physical and chemical features of the environment that affect
the level of a resource at a given time and location, such as temperature,
substrate, dissolved oxygen, and total organic carbon.
● Gradient analysis and dose–response regression: Terms often used synonymously,
where dose is a measure of exposure to the impact and response is a measure of
the biological system.
● Recovery: A temporal process in which impacts progressively lessen through
natural processes and/or active restoration efforts.
● Recovered: When natural factors have regained their influence over the biological
resource(s) being assessed.
As summarized by Parker and Wiens (2005), impact assessment requires making
assumptions about the nature of temporal and spatial variability of the system under
study. Of course, any ecological study makes such assumptions whether or not they
are acknowledged; the nature of this variability is critical in designing, analyzing,
and interpreting results of a study.
Parker and Wiens (2005; also Wiens and Parker 1995) categorized assumptions
about the temporal and spatial variability of a natural (nonimpacted) system as
steady-state, spatial, or dynamic equilibrium (Fig. 6.4). As the name implies, a
steady-state system is typified by levels of resources, and the natural factors
controlling them, showing a constant mean through time (a). Hence, the resource at a
given location has a single long-term equilibrium to which it will return following
perturbation (if it can, indeed, return). Such situations usually occur only in very
localized areas. In (a), the arrow denotes when the state of the system (solid line)
is perturbed to a lower level (the dashed line). Spatial equilibrium occurs when two
or more sampling areas, such as impact and reference, have similar natural factors
and, thus, similar levels of a resource (b). In the absence of a perturbation, then,
differences in means are due to sampling error and stochastic variation. Look
closely at the dashed line in (a) vs. the dashed line in (b); the primary difference
between panels is that multiple areas are considered in (b). Dynamic equilibrium
incorporates both temporal and spatial variation, where natural factors and levels of
resources usually differ between two or more areas being compared, but the
differences between mean levels of the resource remain similar over time (c). In such
systems, recovery occurs when the dynamics of the impacted area once again parallel
those of the reference area. Note in (c) that the reference (solid line) fluctuates
around its mean (also a solid line), while the impacted area (dashed line)
drops well below its natural (although lower than the reference) condition (lower
solid line).
Fig. 6.4 Ecological assumptions affecting the assessment of recovery from an environmental
accident. Reproduced from Parker and Wiens (2005), with kind permission from Springer Science
+ Business Media
Parker and Wiens (2005) also presented an example of when ignorance of the
underlying system dynamics can lead to erroneous conclusions about recovery. In
Fig. 6.4d, we see considerable natural variation around the long-term, steady-state
mean. In this panel the horizontal solid line represents the long-term mean
of the fluctuating solid line. If this long-term variation is not known or not con-
sidered, the perturbed system might erroneously be deemed recovered when it is
not; for example, at point ‘a’ in the figure. Conversely, the system might be
deemed to be impacted when in fact it has recovered (point ‘b’; note that the
dashed line is now tracking the solid line that represents the natural state).
The assumptions surrounding all three of these equilibrium scenarios also require
that the perturbation did not cause the resource to pass some threshold beyond
which it cannot recover. In such situations a new equilibrium will likely be
established, for example, when an event such as fire, overgrazing, or flooding
permanently changes the soil. The system would then recover to a different state.
6.4 Before–After Designs
As outlined above, Green (1979) developed many of the basic principles of envi-
ronmental sampling design. Most notably, his before–after/control–impact, or
BACI, design is the standard upon which many current designs are based. In the
BACI design, a sample is taken before and another sample is taken after a disturbance,
in each of the putatively disturbed (impacted) and undisturbed (control)
sites. If there is an environmental disturbance that affects a population, it would
appear as a statistical interaction between the difference in mean abundance of the
sampled populations in the control and impacted sites before the disturbance, and
that difference after the disturbance (Fig. 6.5a).
However, the basic BACI design is confounded because any difference from before
to after may occur between two times of sampling as a result of natural variation, and
not necessarily by the impact itself (Hurlbert 1984; Underwood 1994). To address this
problem, the basic BACI was expanded to include temporal replication, which involves
several replicated sampling times before and after the impact (see Fig. 6.5d).
Stewart-Oaten et al. (1986) discussed the advantage in impact assessment of taking
samples at nonregular time intervals rather than on a fixed schedule.
Sampling at nonregular times will help ensure that no cyclical differences unfore-
seen by the worker will influence the magnitude of the before–after difference.
Taking samples at regular intervals means that temporal variance might not be esti-
mated accurately and that the magnitude of the impact may be overestimated or
underestimated. For example, sampling rodents only during the fall postbreeding
period, which is a common practice, will obviously underestimate annual variance,
and overestimate the annual mean.
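A toy calculation illustrates the point about fall-only sampling; the seasonal curve and numbers are invented for illustration:

```python
import numpy as np

months = np.arange(12)
# Hypothetical rodent abundance peaking in fall (month 9) after breeding.
abundance = 100.0 + 40.0 * np.cos(2 * np.pi * (months - 9) / 12)

annual_mean = abundance.mean()         # true annual mean is 100
annual_var = abundance.var()

fall_only = abundance[8:11]            # sampling only Sep-Nov each year
print(fall_only.mean() > annual_mean)  # -> True: overestimates the mean
print(fall_only.var() < annual_var)    # -> True: underestimates variance
```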
Fig. 6.5 Common sampling designs to detect environmental impacts, with circles indicating
times of sampling: (a) a single sample in one location before and after an impact (at the time of
the arrow); (b) random samples in one location before and after an impact; (c) BACI design with
a single sample before and after the impact in each of a control (dashed line) and the putatively
impacted location (solid line); (d) modified BACI where differences between mean abundances in
the control and potentially impacted locations are calculated for random times before and after the
disturbance begins (vertical lines indicate difference); and (e) further modification of (d) to allow
sampling at different times in each location. Reproduced from Underwood (1991), with kind
permission from CSIRO Publishing
Analyses based on temporal replication must assume that each sampling date
represents an independent estimate of the true change (see also time-series analysis,
below). Osenberg et al. (1994) examined patterns of serial correlation from a long-
term study of marine invertebrates to gain insight into the frequency with which
samples could be collected without grossly violating the assumption of temporal
independence. They found that serial correlation was not a general problem for the
parameters they estimated; other analyses have produced similar results (Carpenter
et al. 1989; Stewart-Oaten et al. 1992). Many studies in the wildlife literature have
examined serial correlation in telemetry studies, and independence of observations
should not be assumed without appropriate analysis (White and Garrott 1990).
Underwood (1991, 1994) presented an excellent review of the development of
impact analysis, including basic and advanced statistical analyses appropriate to
different designs. Analyses of basic BACI designs are summarized, based on analysis
of variance (ANOVA), in Table 6.1, and are keyed to the patterns of disturbance
described in Fig. 6.5b–d.
The Stewart-Oaten et al. (1986) modification (of taking samples at nonregular
intervals) solved some of the problems of lack of temporal replication, but did not
address the problem of lack of spatial replication. The comparison of a single
impact site and a single control site is still confounded by differences between
the two sites that are not due to the identified impact. Remember that local
populations do not necessarily have the same trajectory of abundance and behavior, and
temporal interaction among sites is common. Stewart-Oaten et al. (1986) concluded
that such a temporal change in the difference between the two sites before the
impact would render the variable being used to assess the impact unusable; we can
only assume that such a temporal difference would continue after the impact.
Because many populations can be expected to vary in their patterns of abundance
across times and sites, this variation remains a serious restriction on the usefulness
of the basic BACI design, even with temporal replication (Underwood 1994).
Table 6.1 Statistical analyses for the detection of environmental impact using various sampling
designs^a

(a) Replicated before/after sampling at a single location; samples are taken at t random times
before and t times after the putative impact (see Fig. 6.5b)

Source of variation                df           F-ratio vs.  df of F-ratio
Before vs. after = B               1            T(B)         1, 2(t − 1)
Times (before vs. after) = T(B)    2(t − 1)
Residual                           2t(n − 1)
Total                              2tn − 1

(b) BACI: a single time of sampling at two locations, one control and one potentially impacted
(see Fig. 6.5c)

Source of variation                df           F-ratio vs.  df of F-ratio
Before vs. after = B               1
Locations: control vs. impact = L  1
Interaction B × L                  1            Residual     1, 4(n − 1)
Residual                           4(n − 1)
Total                              4n − 1

(c) BACI: replicated before/after sampling at two locations, one control and one potentially
impacted; samples are taken at t random times before and t times after the putative impact,
but at the same times in each site (see Fig. 6.5d)

Source of variation                df           F-ratio vs.  df of F-ratio
Before vs. after = B               1
Locations: control vs. impact = L  1
Interaction B × L                  1            L × T(B)     1, 2(t − 1)
Times (before vs. after) = T(B)    2(t − 1)     Residual     2(t − 1), 4t(n − 1)
Interaction L × T(B)               2(t − 1)     Residual     2(t − 1), 4t(n − 1)
  L × T(B) before                  t − 1        Residual     t − 1, 4t(n − 1)
  L × T(B) after                   t − 1        Residual     t − 1, 4t(n − 1)
Residual                           4t(n − 1)
Total                              4tn − 1

Source: Reproduced from Underwood (1991), with kind permission from CSIRO Publishing
^a In each case, analysis of variance is used to provide a standard framework for all designs. In all
cases, n replicate samples are taken at each time and site of sampling
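Design (a) in Table 6.1 can be illustrated numerically. The sketch below, with hypothetical counts, partitions the sums of squares for a single location sampled at t = 4 random times before and t = 4 times after an impact, with n = 5 replicate samples per time; following the table, B is tested against T(B), and T(B) against the residual.

```python
# Sketch of Table 6.1(a): replicated before/after ANOVA at a single
# location, with t sampling times nested in each period and n replicate
# samples per time.  The counts below are made up for illustration.
from statistics import mean

t, n = 4, 5  # sampling times per period, replicates per time

# counts[period][time] -> list of n replicate samples (hypothetical)
counts = {
    "before": [[12, 14, 11, 13, 12], [15, 14, 16, 13, 15],
               [11, 12, 10, 13, 11], [14, 13, 15, 14, 12]],
    "after":  [[9, 8, 10, 9, 7],     [8, 7, 9, 8, 8],
               [10, 9, 8, 11, 9],    [7, 8, 6, 9, 8]],
}

grand = mean(x for times in counts.values() for reps in times for x in reps)

# Before vs. after (df = 1)
ss_b = sum(t * n * (mean(x for reps in times for x in reps) - grand) ** 2
           for times in counts.values())
# Times nested in period (df = 2(t - 1))
ss_t = sum(n * (mean(reps) - mean(x for r in times for x in r)) ** 2
           for times in counts.values() for reps in times)
# Residual among replicates within times (df = 2t(n - 1))
ss_res = sum((x - mean(reps)) ** 2
             for times in counts.values() for reps in times for x in reps)

ms_b = ss_b / 1
ms_t = ss_t / (2 * (t - 1))
ms_res = ss_res / (2 * t * (n - 1))
f_b = ms_b / ms_t      # Before vs. after tested against Times(B)
f_t = ms_t / ms_res    # Times(B) tested against the residual
print(f_b, f_t)
```

Note that the degrees of freedom sum to 2tn − 1, the total in part (a) of the table.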
240 6 Impact Assessment
Fig. 6.6 Sampling to detect environmental impacts. (a) BACI design – replicated samples are
taken several times in a single control (dashed line) and in the potentially impacted location (solid
line) before and after a planned disturbance (at the time indicated by the arrow). (b) Sampling
three control locations to provide spatial replication. Reproduced from Underwood (1994), with
kind permission from Springer Science + Business Media
6.4 Before–After Designs 241
Osenberg et al. (1994) developed the BACI design with paired sampling, or BACIP.
The BACIP design requires paired (simultaneous or nearly so) sampling several
times before and after the impact at both the control and impacted site. The measure
Fig. 6.7 Frequency distribution of the sample size (number of independent sampling dates) for
parameters in each group that is required to achieve 80% power. Reproduced from Osenberg et al.
(1994), with kind permission from Springer Science + Business Media
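The paired (BACIP) comparison can be sketched in code: the quantity analyzed is the control-minus-impact difference at each paired sampling date, and the mean difference before the disturbance is compared with the mean difference after it. All values below are invented for illustration.

```python
# BACIP sketch: the variable analyzed is the control-minus-impact
# difference at each paired (simultaneous) sampling date; the mean
# difference before the disturbance is compared with the mean
# difference after it.  All numbers are hypothetical.
from statistics import mean, stdev
from math import sqrt

control_before, impact_before = [20, 22, 19, 21, 20, 23], [18, 21, 17, 20, 19, 21]
control_after,  impact_after  = [21, 20, 22, 19, 21, 20], [12, 11, 14, 10, 13, 12]

d_before = [c - i for c, i in zip(control_before, impact_before)]
d_after  = [c - i for c, i in zip(control_after, impact_after)]

# Two-sample t statistic (pooled variance) on the paired differences
n1, n2 = len(d_before), len(d_after)
sp2 = ((n1 - 1) * stdev(d_before) ** 2
       + (n2 - 1) * stdev(d_after) ** 2) / (n1 + n2 - 2)
t_stat = (mean(d_after) - mean(d_before)) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(t_stat)  # a large |t| suggests the control-impact difference changed
```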
Table 6.3 Asymmetrical analysis of variance of model data from a single impact (I) and three
control locations sampled at six times before and six times after a disturbance that causes no
impact

Source of variation          df    Mean square  F-ratio  F-ratio vs.
Before vs. after = B         1     331.5
Among locations = L          3     25,114.4
  Impact vs. controls = I^a  1     3,762.8
  Among controls = C^a       2     35,790.2
Times (B) = T(B)             10    542.0
B × L                        3     375.0
  B × I^a                    1     454.0        1.51     Residual
  B × C^a                    2     335.5        1.12     Residual
T(B) × L                     30    465.3
  Times (before) × L^a       15    462.2
    T(before) × I^b          5     515.6
    T(before) × C^b          10    435.9
  Times (after) × L^a        15    468.2
    T(after) × I^b           5     497.3
    T(after) × C^b           10    453.6        1.51     Residual
Residual                     192   300.0

Reproduced from Underwood (1991), with kind permission from CSIRO Publishing
^a, ^b Represent repartitioned sources of variation to allow analysis of environmental impacts as
specific interactions with time periods (B × I or T(after) × I)
Alternatively, if the impact is not sustained or is not sufficient to alter the mean
abundance in the impacted site over all times of sampling after the disturbance, it
should be detected in the pattern of statistical interaction between the time of sampling
and the contrast of the impacted and control sites (see Table 6.3, T(after) × I).
A more conservative approach would be to develop error terms based
on (1) B × L, averaging out time, or (2) T(B) × L, incorporating time.
Thus, a difference is sought between the time course in the putatively impacted site
and that in the controls. Such a difference would indicate an unusual event affecting
mean abundance of the population in the single disturbed site, at the time the distur-
bance began, compared with what occurred elsewhere in the undisturbed controls.
The impact will either be detected as a different pattern of interaction among the
times of sampling or at the larger time scale of before to after the disturbance.
The manner in which system dynamics interact with the design of an impact
study is summarized in Table 6.4. This table is largely self-explanatory. The col-
umn headed “Baseline” is defined as a study that compares pre- and postdata from
the impact area only. This is analogous to Green’s (1979) Main Sequence 2, where
the impact is inferred from temporal variation only. Recall that reference areas in the
classic (original) BACI design are not required. However, because natural factors
usually vary temporally, results from baseline studies are seldom sufficient to determine
Table 6.4 Three design strategies for assessing recovery from environmental impacts on biologi-
cal resources in temporally and spatially varying environments

The three strategies are Baseline, Single year, and Multiyear; the Multiyear strategy is split into
two cases: no reason to reject/suspect assumptions^a, and reason to reject/suspect assumptions^a.

When to use
  Baseline: temporally invariant taxa
  Single year: spatial equilibrium achievable, short recovery period
  Multiyear (both cases): temporally variant taxa, long recovery period, taxa on multiple
    recovery periods, information on recovery process desired

Data needs
  Baseline: pre- and postimpact only
  Single year: impact and reference sites, covariates
  Multiyear: time series for impact and reference areas or for gradient^b

Comparison
  Baseline: pre- vs. postimpact
  Single year: impact vs. reference, matched pairs, gradient^b
  Multiyear: impact vs. reference and gradient over time^b

Equilibrium assumption
  Baseline: steady-state
  Single year: spatial
  Multiyear: dynamic (first case); reject or suspect assumptions (second case)

Breakdown in assumptions
  Baseline: temporal variation confounds with recovery
  Single year: spatial variation confounds with recovery
  Multiyear: temporal variation differs for impact and reference categories (first case); NA
    (second case)

(continued)
if recovery has occurred. “Single-year studies” compare impact and reference areas
but within a single year. These designs approximate spatial equilibrium through the
use of multiple sampling areas, which requires a close matching of natural condi-
tions across sites (e.g., matched pairs design). Recovery occurs when impact and
reference means are similar. “Multiyear studies” reduce the effects of temporal and
spatial variation by removing (subtracting out) naturally varying temporal effects.
If the impact and reference areas are in a dynamic equilibrium, recovery occurs
when differences in annual means become constant (trend lines become parallel as
explained above for Fig. 6.4C).

6.5 Suboptimal Designs
Designs classified as suboptimal apply to the true impact situation; namely, where
you had no ability to gather preimpact (pretreatment) data or plan where the impact
was going to occur. Such situations are frequent and involve events such as chemi-
cal spills and natural catastrophic events. After-only impact designs also apply,
however, to planned events that resulted from management actions, such as timber
harvest, road building, and restoration activities, but were done without any moni-
toring plan. As noted by Parker and Wiens (2005), it is critical in impact assessment
to separate the recovery signal from natural variation and to verify the ecological
assumptions on which detecting recovery depends.
The assumption that the sampling interval is short enough so that no changes in site
conditions occurred during the sampling period is especially critical in all single-time
designs. You must also assume that natural forces were acting in a similar manner
across all treatment and reference sites; using multiple (replicated) sites enhances the
chance that interpretable data will be gathered. In all of the following designs, analysis
of covariance using nonimpact-related variables may help identify the effect such
variables are having on determination of the severity of the impact.
Matched pair designs reduce the confounding of factors across sites. Under this
design, sites within the impacted area are randomly selected and nonrandomly
matched with similar reference sites. Such matching reduces the variability between
pairs, thus statistically enhancing the difference between impacted and reference
pairs. These differences are then compared using paired t tests.
You must assume under this design that the matched sites do not vary systemati-
cally in some important manner that either masks an impact or falsely indicates that
an impact has occurred. As noted by Wiens and Parker (1995), this design is open
to criticism because the reference sites are chosen nonrandomly.
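A minimal sketch of the matched-pairs comparison, with hypothetical site values:

```python
# Matched-pairs sketch: each randomly chosen impact site is paired with
# a nonrandomly matched reference site, and the paired t test is run on
# the within-pair differences.  Values are hypothetical.
from statistics import mean, stdev
from math import sqrt

impact_sites    = [4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 4.5]
reference_sites = [5.2, 4.9, 5.8, 5.5, 5.1, 5.6, 5.0, 5.7]

diffs = [i - r for i, r in zip(impact_sites, reference_sites)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))  # df = n - 1
print(t_stat)  # strongly negative here: impact sites sit below their matches
```

Pairing removes the between-pair variability from the error term, which is exactly the statistical advantage described above.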
Gradient designs analyze an impact along a continuous scale and use regression
techniques to test for an association between level of impact and response by the
animal. For example, data on animal condition and impact intensity can be taken at
a series of points along a transect through a disturbed area, and the results regressed
to look for a significant association.
Under this design you must assume that the complete or nearly complete range of
impact, including none (which becomes a reference site embedded in the analysis), has
been sampled about evenly. This ensures that regression analysis can be run properly, and
increases the likelihood that other natural factors are balanced across sampling locations.
The gradient approach is especially applicable to localized impacts because it allows
you to quantify how the response of elements varies with distance from the impact. Further,
you might be able to identify the mechanism by which the impacted site recovers, such
as through a gradual lessening of the effect of the impact along the gradient from distal
to proximal ends. The ends of the gradient serve, in essence, as nonimpacted reference
sites. Figure 6.8 depicts a simple but effective gradient design.
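A gradient analysis of this kind reduces to a simple regression of the response on distance from the impact; a sketch with invented transect data:

```python
# Gradient-design sketch: regress an animal response on distance from
# the impact along a transect; a nonzero slope suggests the impact
# attenuates with distance.  Data are invented for illustration.
from statistics import mean

distance_m = [0, 100, 200, 300, 400, 500, 600, 700]    # distance from impact
condition  = [2.1, 2.6, 3.0, 3.3, 3.9, 4.2, 4.4, 4.6]  # e.g., condition index

xbar, ybar = mean(distance_m), mean(condition)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(distance_m, condition))
sxx = sum((x - xbar) ** 2 for x in distance_m)
slope = sxy / sxx
intercept = ybar - slope * xbar
r2 = sxy ** 2 / (sxx * sum((y - ybar) ** 2 for y in condition))
print(slope, r2)  # positive slope: condition improves away from the impact
```

Sampling about evenly along the gradient, as the text recommends, keeps the leverage of individual points balanced in a regression of this sort.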
Multiple-time designs are those where repeated sampling can occur following the
disturbance. Such a sampling regime allows evaluation of temporal dynamics that
could indicate both an impact and a subsequent recovery. Gathering samples repeat-
edly from the same locations usually requires that specialized statistical analyses
(e.g., repeated measures analysis) be used to interpret recovery rate.
Due to natural variation, the factors of interest may change in value from year to
year. A difference in magnitude of these annual changes between sites is evidence
that a change has occurred. If the change results from a human impact (e.g.,
logging), a natural catastrophe (e.g., fire, disease), or even an experimental treat-
ment, the use of multiple sites means that reference (“control”) sites will be
available for comparative purposes. Following the sites for an extended period of
time (likely 2–5 years) will reveal how the trajectory of recovery compares with
that on reference sites; interpretation of change will not be masked by the over-
riding ecological process (i.e., the assumption of steady-state dynamics is relaxed,
as described above). Factors can differ between sites, but temporal changes in the
resource are expected to be similar to reference sites in the absence of the impact.
It is thus assumed that a dynamic equilibrium exists between factors affecting the
resource and the state of the resource. It also assumes that some recovery occurs
during the course of the study.
The term “level” (level-by-time interaction) refers to the fact that specific
categories (levels) of the impact are designated. For example, a chemical spill could
directly kill animals or force them to leave an impacted area. Here the chemical
impacts specific locations at different general levels (e.g., light, moderate, and
heavy spills). As the chemical dissipates, or after it has been cleaned up, animals
might start returning to the site, or those that remained during the spill might begin
to reproduce. In either case, the abundance of animals on the impacted site should
recover to the pattern of abundance being shown by the reference sites during the
same time period, assuming no residual effects of the chemical. If a change in
abundance or other population parameter persists, then this change in pattern
between impact and reference sites indicates effect. The asymmetrical design of
Underwood (1994) described above is similar to this level-by-time design, except
that the BACI uses a before–after comparison.
6.6 Supplemental Approach to Impact Assessment
Because it is unlikely that powerful “before” sampling will be available in most cir-
cumstances, we need to select alternative approaches to understanding ecological
processes and the effects of planned and unplanned impacts on them. We need to
determine the rates of change and the magnitude of spatial differences for various
populations. An initial step could involve the selection of priority species and vege-
tation types. For example, we know that logging, agriculture, housing, recreation,
and hunting will continue. Thus, it makes sense to monitor examples of such systems
that are still in a relatively undisturbed state. A set of locations could be monitored
where effects of human disturbance are minimal. These sites would collectively
constitute baseline information that could then be used to contrast with perturbed
areas, when the opportunity or need arose in other geographic locations.
Thus, baseline information would already exist to address a number of specific needs.
This would lessen the need for specifically acquired “before” data. Such an analysis would
be useful because (1) many different locations would be used, (2) it would lower the over-
all cost of conducting environmental assessments, and (3) it would reduce the need for
location- and time-specific “before” data. Additionally, it would also improve our ability
to predict the likely impact of proposed developments through the accumulation of data
necessary to develop population models (Underwood 1994). Of course, it is difficult to
know exactly where to place such sampling sites. This stresses the need for careful devel-
opment of an appropriate sampling design, including assessment of the number and place-
ment of locations to be monitored. Such considerations are typically ignored, even though
Fig. 6.9 (a) Indication that an impact has occurred. (b) No indication of an impact occurring
most statistical tests emphasize adequate randomization and replication of the sampling
units (see Chap. 2). This type of design will be most effective at the broader spatial scales
of analysis, such as the landscape or macrohabitat scales, where we are concerned with
changes in presence–absence, or at most, general direction of trend.
This suggestion partially addresses the problem of implementing long-term
research programs by reducing the time- and location-specificity of data collection.
The multiple-time designs outlined above would benefit from such a program by
being placed in the context of estimating the variance of response variables and system
dynamics over long time frames. In addition, these designs can incorporate before–
after data (Green 1979; Osenberg et al. 1994; Underwood 1994; Wiens and Parker
1995). Very rare species, especially those that are state and federally threatened and
endangered, will likely require more intensive, site- and time-specific sampling
because of the relatively small margin of error involved in managing these groups.
In addition, there are often sample size problems with rare species because of their
small numbers. Also, there may be limits placed by regulatory agencies (US Fish
and Wildlife Service and the state wildlife agency) on the methodologies used to
study rare species.
6.7 Epidemiological Approaches
Attributable risk is defined as the proportional increase in the risk of injury or death
attributable to the external factor (e.g., wind turbine, pesticide, noise). It combines
the relative risk (risk of natural mortality) with the likelihood that a given individual
is exposed to the external factor. Attributable risk (AR) is calculated as

AR = (PD − PDUE)/PD,

where PD is the probability of death for the entire study population, and PDUE the
probability of death for the population not exposed to the risk. That is, PD incorpo-
rates all causes of death or injury that the study population is experiencing, be it
related to the impact of interest or another (natural or human-related) cause. The
PDUE, then, is simply PD without inclusion of the impact of interest.
For example, suppose that the probability of death for a randomly chosen
individual in the population is 0.01, and the probability of death in a control
area for a bird flying through a theoretical rotor plane without the presence of
blades is 0.0005. The AR is thus (0.01 − 0.0005)/0.01 = 0.95. Thus, about 95%
of the risk of dying while crossing the rotor plane is attributable to the presence
of blades. As noted by Mayer (1996), it is this large attributable risk that stimu-
lates the concern about the impact of wind development on birds, regardless of
the absolute number of bird deaths. Testing a preventive measure in a treat-
ment–control experiment allows us to determine the change in risk due to the
prevention.
The preventable fraction (PF) is calculated as

PF = (PD − PDI)/PD,

where PDI is the probability of injury or death given the preventive intervention.
For example, studies have shown that birds are killed by wind turbines. Thus, the
need arises to test various preventive measures, such as removing perches and
painting blades. If population mortality in the wind farm is 0.01, and mortality
for those using the area with the preventive intervention is 0.005, then the
preventable fraction is (0.01 − 0.005)/0.01 = 0.5. Thus, about 50% of the risk would
be removed if all of the perches were removed. Note that the attributable risk and
preventable fraction would be the same value if the intervention removed the
risk entirely.
Prevented fraction is the actual reduction in mortality that occurred because of the
preventive intervention. Prevented fraction (PFI) is calculated as

PFI = (PDAI − PD)/PDAI,

where PDAI is the probability of injury or death in the absence of intervention. For
example, suppose that 25% of the perches are removed in a treatment–control
experiment. Field studies determine that mortality is 0.01 for the population and
0.015 for those living without the prevention (e.g., perches). The prevented fraction
is (0.015 − 0.01)/0.015 = 0.33. Thus, about 33% of the risk has been removed by
removing 25% of the perches.
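The three measures can be collected into small helper functions; the sketch below uses the worked numbers from the text (the function names are ours):

```python
# The three epidemiological risk measures, using the chapter's numbers.
def attributable_risk(pd, pdue):
    """AR = (PD - PDUE) / PD: share of total risk due to the impact."""
    return (pd - pdue) / pd

def preventable_fraction(pd, pdi):
    """PF = (PD - PDI) / PD: share of risk an intervention would remove."""
    return (pd - pdi) / pd

def prevented_fraction(pdai, pd):
    """PFI = (PDAI - PD) / PDAI: share of risk an intervention did remove."""
    return (pdai - pd) / pdai

print(attributable_risk(0.01, 0.0005))    # ~0.95: rotor-blade example
print(preventable_fraction(0.01, 0.005))  # ~0.5: perch-removal example
print(prevented_fraction(0.015, 0.01))    # ~0.33: 25%-of-perches example
```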
The choice of the use factor, or denominator, is more important than the numerator. The
choice arises from the preliminary understanding of the process of injury or death. In
fact, the treatment effect is usually small relative to the variability that would arise from
allowing alternative measures of risk. For example, should the denominator be bird
abundance, bird flight time in the farm, bird passes through the rotor plane, or some
other measure of use? Unless these measures are highly intercorrelated – which is
unlikely – the measure selected will result in quite different estimates of mortality.
Further, the choice of denominator is important in that it should express the mechanism
causing the injury or mortality. If it does not, then we will not be able to accurately
measure the effectiveness of a risk reduction treatment.
For example, suppose that bird use or abundance is the denominator, bird deaths
are the numerator, and painted blades are the treatment. A treatment–control study
determines that death decreases from 10 to 7 following treatment, but use actually
decreases from 100 to 70 (arbitrary units). It thus appears that the treatment had no
effect because both ratios are 0.1 (10/100 and 7/70). In fact, this study is seriously
flawed and no conclusion should be drawn, because there is no direct link between
the number of birds using the area and flights near a turbine. There
are numerous reasons why bird use of a wind farm could change (up or down) that
are independent of the farm, for example, changes in prey availability, direction of
prevailing winds, environmental contaminants, and so on. In this case, recording
bird flights through the rotor plane of painted blades would have been a more cor-
rect measure of effect. In addition, the use of selected covariates can help focus the
analysis on the treatment effects. Naturally, the hypothetical study noted above
should be adequately replicated if implemented.
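The arithmetic of this flawed comparison is easy to verify:

```python
# The painted-blade example: with bird use as the denominator, the
# per-use mortality ratio is unchanged even though raw deaths fell,
# because use fell by the same proportion.
deaths_control, use_control = 10, 100
deaths_treated, use_treated = 7, 70

rate_control = deaths_control / use_control  # 0.1
rate_treated = deaths_treated / use_treated  # 0.1
print(rate_control == rate_treated)  # True: no apparent treatment effect
```

Had the denominator been flights through the rotor plane rather than overall use, the same raw counts could have supported a very different conclusion, which is the point of the passage above.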
It is, of course, prohibitive from a practical standpoint to record every passage of a
bird through a zone of risk (be it a rotor plane or the overall wind farm). Further, it is
usually prohibitive to accurately census the population and tally all deaths. As such,
we must usually rely on surrogate variables to use as indices of population size and
death. A surrogate variable is one that replaces the outcome variable without signifi-
cant loss in the validity or power of the study. An example would be using the number
of birds observed during 10-min point counts as a measure of utilization (of a treatment
or control). Utilization is an indicator of the level of at-risk behavior. Thus, adopting a
measure of utilization requires the assumption that the higher the utilization the higher
the risk. If feasible, assumptions should always be tested early in the study.
As outlined by Mayer (1996), there are four tasks that the investigator must accom-
plish when designing a study of impact assessment. The logic is sequential and
nested; each choice depends on the choice made before:
1. Isolate the hypothesis of mechanism that is being tested. For example, one might
be testing the hypothesis that birds strike blades when attempting to perch on a
turbine. The hypothesis should be simple and readily testable.
2. Choose a measure of injury–death frequency that best isolates the hypothesis
being tested. The two components of this choice are to choose an injury–death
count to use as a numerator and a base count (likely utilization) to use as a
denominator. It is critical that a relevant measure of use be obtained (e.g., passes
through the rotor plane; occurrence by flight-height categories).
3. Choose a measure of effect that uses the measure of injury–death frequency and
isolates the hypothesis being tested. Here, decide whether the relative risk (risk
ratio), attributable risk, or another measure of effect should be used.
4. Design a study that compares two or more groups using the measure of effect
applied to the measure of injury–death frequency chosen. The goal here is to
isolate the effect, control for confounding factors, and allow a test of the hypoth-
esis. Replication is essential.
The ideal denominator in epidemiology is the unit that represents a constant risk to
the animal. The unit might be miles of flight, hours spent in the farm, or years of life.
If the denominator is the total population number, then we are assuming that each bird
bears the same risk by being alive. In human epidemiological studies, the total popu-
lation size is usually used because we cannot estimate units of time or units of use.
Equation (6.1) is ideal, but as discussed above, usually impractical. Equation (6.2) is
feasible, but will vary widely depending upon the measure of bird use selected. In
addition, for (6.2), the background (control) mortality rate must also be determined
for comparative purposes. Thus, (6.2) should be the center of further discussion.
Consultations with personnel in the wind industry have led to the conclusion that
a measure of operation time that is easily standardized among wind farms and tur-
bine types would be preferable. It has been suggested that a measure that considers
differences in blade size and operation time would be most appropriate. As such,
the concept of “rotor swept area” has been developed, which is simply the circular
area that a turning blade covers. Rotor swept area is then converted to an index that
incorporates operation time as follows:
Here, “risk measure” could be flight passes through rotor plane or any other appro-
priate measure of use (as discussed above). Here again, we emphasize the need to
test assumptions. For example, an assumption here is that the probability of a bird
being struck is equal among all turbines, which may or may not be the case.
Numerous factors, such as placement of the turbine in a string of turbines or place-
ment of a turbine along a slope, could influence the probability of a bird collision.
One primary variable will usually drive the study design; thus, the initial sample
size should be aimed at that variable. It is thus assumed that at least a reasonable
sample size will be gathered for the other, secondary variables. Sampling can be
adjusted as data are collected (i.e., sequential analysis of sample size).
Designing treatment–control studies for inferences on measures of use is feasible.
Determination of mortality (using (6.2)) is possible, but statistical power to conclude
that treatment and control sites have different mortality rates will be low. For exam-
ple, in a randomized pairs design, most pairs are expected to result in 0 mortalities,
with tied values and no mortalities on either member of a pair. The high frequency of
zero values effectively reduces the sample size for most analyses.
Case studies have high utility in evaluating mortality. Here, one collects dead or
injured animals inside and outside the impacted area, and conducts blind analyses
to determine the cause of death. Unfortunately, from the standpoint of study design,
under most situations very few dead animals will be found outside the impacted
area. However, all dead animals found in a study should be subjected to blind analy-
ses because this information will assist with evaluation of observational data.
The case study approach suggests that epidemiological analysis can often be
combined with clinical analysis to extend the inferential power of a study. Here the
clinical analysis would be the necropsies of the animals. Suppose that we are suc-
cessful at finding dead birds inside a wind farm. If we look at proportional mortality
– the proportion of the birds killed by blunt trauma, sharp trauma, poisoning, hunt-
ing, and natural causes – then the proportions should differ significantly between
the farm and the control area. The assumption is that the differential bias in finding
dead birds within the two areas is uniform across the causes of mortality and thus
the proportions should be the same even if the counts differ (i.e., relatively few dead
birds found outside the farm). An inherent problem with this approach is the diffi-
culty in finding dead birds in the control area(s).
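A proportional-mortality comparison of this kind reduces to a chi-square test of homogeneity on the cause-of-death counts; a sketch with hypothetical carcass counts:

```python
# Proportional-mortality sketch: compare cause-of-death proportions
# between the wind farm and a control area with a chi-square test of
# homogeneity.  Carcass counts are hypothetical.
causes = ["blunt trauma", "sharp trauma", "poisoning", "hunting", "natural"]
farm    = [34, 22, 3, 2, 9]   # carcasses found in the farm
control = [4, 2, 3, 3, 8]     # carcasses found in the control area

chi2 = 0.0
total = sum(farm) + sum(control)
for f, c in zip(farm, control):
    for obs, row_total in ((f, sum(farm)), (c, sum(control))):
        expected = row_total * (f + c) / total  # expected count for this cell
        chi2 += (obs - expected) ** 2 / expected

df = (len(causes) - 1) * (2 - 1)  # (rows - 1) * (columns - 1) = 4
print(chi2, df)  # compare chi2 to the critical value 9.49 (df = 4, alpha = 0.05)
```

With expected counts this small in some cells, an exact test would be preferable in practice; the chi-square version is shown only to make the logic concrete.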
A central part of impact assessment is development of a model that shows the sur-
vival rates required to maintain a constant population. The strategy is to determine
survival rates required to sustain populations exhibiting various combinations of
other parameters governing population size. To be useful in a wide range of envi-
ronmental situations and useable for people with varying expertise, the model must
be based on simple mathematics.
The use of models (of all types) has soared in the past 20 years. In fact, modeling
is now a focus of much interest, research, and management action in wildlife and
conservation biology. But as in all aspects of science, models have certain assump-
tions and limitations that must be understood before results of the models can be
properly used. Modeling per se is neither “good” nor “bad”; it is the use of model
outputs that determines the value of the modeling approach.
The use of population models to make management decisions is fairly common.
For example, models play a role in management plans for such threatened and
endangered species as the spotted owl (Strix occidentalis, all subspecies), desert
tortoise (Gopherus agassizii), Kirtland’s warbler (Dendroica kirtlandii), various
kangaroo rats (Dipodomys spp.), and so forth. Models are valuable because they
analyze the effects of management proposals in ways that are usually not possible
using short-term data or professional opinion. Models can be used in impact assessment
to predict how a system (e.g., species, group of species, environmental measures)
should have behaved under nonimpacted conditions, and also how a system might
have behaved under various impact scenarios.
As well summarized elsewhere (e.g., Manly et al. 2002; Morrison et al. 2006), docu-
mentation of the resources used by animals is a cornerstone – along with quantifying
distribution and abundance – of animal ecology. Thus, much literature is available
on how to identify, quantify, and interpret the use of resources by animals. In this
section we briefly review some of the terminology associated with resource use, and
provide some guidance on how studies of resources have been categorized in the lit-
erature. The specific statistical procedures and models used in resource selection
studies are basically the same as those used in other studies of wildlife ecology, and
have been well presented by Manly et al. (2002).
The use of resources is obviously critical to all organisms, and resource use is
defined as the quantity of the resource that is used in a specific period of time.
Thus, estimates calculated from observations may be used to estimate parameters for
the population of animals, along with the variability of these estimates.
Virtually all classes of statistical techniques have been used to analyze use–
availability (or use–nonuse) data, depending upon the objectives of the researcher,
the structure of the data, and adherence to statistical assumptions (i.e., parametric
or nonparametric univariate comparisons, multivariate analyses,
Bayesian statistics, and various indices), and these techniques have been well
reviewed (see summary in Morrison et al. 2006, pp. 166–167 and Manly et al.
2002). Compositional analysis, only recently applied to habitat analysis, should be
considered for use in these studies (Aebischer et al. 1993).
6.8.2 Synthesis
The goal should be to present a realistic and unbiased evaluation of the model. It is
preferable to present both best- and worst-case scenarios for model outputs, so that
the range of values attainable by the model can be evaluated. For example, with a
basic Leslie matrix model of population growth, knowing whether the confidence
interval for the predicted (mean) value of λ (the finite rate of population growth)
includes 1.0 provides insight into the reliability of the predicted direction of popu-
lation growth.
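A minimal sketch of such a Leslie-matrix projection, with hypothetical vital rates; λ is approximated by repeated projection (power iteration):

```python
# Leslie-matrix sketch: project a hypothetical 3-age-class population
# and estimate lambda (the finite rate of increase) by power iteration.
fecundity = [0.0, 1.2, 2.0]   # offspring per female, by age class
survival  = [0.5, 0.7]        # survival from class i to class i + 1
leslie = [
    fecundity,                # top row: reproduction
    [survival[0], 0, 0],      # subdiagonal: survival
    [0, survival[1], 0],
]

pop = [100.0, 50.0, 20.0]
for _ in range(200):  # repeated projection converges to the stable structure
    new = [sum(leslie[i][j] * pop[j] for j in range(3)) for i in range(3)]
    lam = sum(new) / sum(pop)
    total = sum(new)
    pop = [x / total for x in new]  # renormalize to avoid overflow
print(lam)  # lambda > 1 means growth, lambda < 1 means decline
```

Repeating the projection with vital rates drawn from their confidence intervals gives the range of λ values discussed above.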
The process of model development and evaluation may show that the predictions
of the model are sufficiently robust to existing uncertainties about the animal’s
behavior and demography that high confidence can be placed in the model’s predic-
tions. Even a poor model does not mean that modeling is inappropriate for the situa-
tion under study. Rather, even a poor model (i.e., a model that does not meet study
objectives) will provide insight into how a population reacts to certain environmental
situations, and thus provide guidelines as to how empirical data should be collected
so that the model can be improved. Modeling is usually an iterative process.
6.9 Applications to Wildlife Research
In the study of wildlife we are constantly confronted with the need to examine eco-
logical relationships within a short timeframe – a few years – and within a small
spatial area. The restriction of studies to short temporal and spatial scales usually
arises because of limited funding, the specific needs of a funding agency to study a
localized situation (e.g., a wildlife management area), and the fact that much
research is conducted by graduate students who must complete thesis work within
2–3 years. Faculty as well as agency scientists are not immune from the need to
develop results from research within a few years.
Despite the temporal and spatial constraints confronted by most researchers, the
need to publish research results remains strong. Additionally, much of the research
that is conducted under time and space constraints will be used to guide manage-
ment of animals, including those hunted and those considered rare or endangered.
Thus, the research that is conducted must be rigorous. In Chap. 5 we discuss many
of the strategies that can be used to, in essence, make the best out of a bad set of
research constraints; here we synthesize some of the steps you can take related to
impact assessment studies.
Recall (Sect. 6.3.1) our definition and discussion of “recovery”: a temporal proc-
ess in which impacts progressively lessen through natural processes and/or active
restoration efforts. Using the concept of recovery and the assumptions about the tem-
poral and spatial variability of a natural (nonimpacted) system – steady state, spatial,
or dynamic equilibrium – Parker and Wiens (2005) outlined strategies for assessing
“recovery from environmental impacts.” Here we use the Parker and Wiens rationale
to specify a strategy for assessing the state or condition of a wildlife population (or
segment thereof ) over a short timeframe and in a limited area. Thus, as we noted
when we opened this chapter, the broad field of impact assessment can be applied to
virtually any research goal because all systems are under constant change.
Studies of recovery (impact assessment) are simply trying to separate the signal
from the noise; this is the same thing most researchers are trying to do. For exam-
ple, say you want to determine what food source(s) are used by deer on a wildlife
management area, and you want to maximize your ability to be confident your
results are not overly time and space constrained. Applying an impact assessment
strategy, you can view your area of interest as the “impacted site” and multiple
other, similar (nearby locations under similar environmental conditions) locations
as your “reference sites.” Under this impact–reference design you can compare, say,
feeding rates in several categories of vegetation types or by particular species of
plants on the impacted site with your reference (no impact) sites. Because deer are
not randomized spatially (i.e., you selected the impact site and locations to study
deer therein), random sampling alone can only reduce the confounding effects of
spatial variation. ANCOVA can be used to further reduce confounding effects given
you can identify factors influencing deer activity, such as distance from roads or
development, availability of water, and related key factors. By including reference site(s), you will maximize your ability to separate the signal – what the deer are using on your focal area – from the noise of the environment.
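The impact–reference adjustment described above can be sketched as a linear model with the site indicator and a covariate as predictors. This is a minimal illustration with simulated data; the covariate, coefficients, and noise level are hypothetical, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical feeding-rate data: plots on one focal ("impact") area and
# on reference areas, with distance to the nearest road as a covariate.
n = 60
dist_road = rng.uniform(0.0, 2.0, size=n)          # km (illustrative)
focal = (np.arange(n) % 2 == 0).astype(float)      # 1 = focal area, 0 = reference
rate = 5.0 - 1.2 * dist_road + 0.8 * focal + rng.normal(0.0, 0.5, size=n)

# ANCOVA expressed as a linear model: rate ~ intercept + focal + dist_road.
# The coefficient on `focal` is the focal-vs-reference difference after
# removing the confounding effect of road distance.
X = np.column_stack([np.ones(n), focal, dist_road])
beta, *_ = np.linalg.lstsq(X, rate, rcond=None)
print(f"adjusted focal-vs-reference difference: {beta[1]:.2f}")
```

With a covariate left out of the model, any focal-vs-reference difference would be confounded with differences in road distance between the areas; including it in the design matrix removes that shared variation.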
Of course, the longer you can study, the better for deciphering signal from noise. But this strategy can be applied to studies of even one season. Table 6.4
specifies many of the strengths and weaknesses of this approach. Single-year stud-
ies provide only a brief glimpse of environmental variation, and the results of such
studies must fully acknowledge such a limitation. But, applying the impact–reference
strategy strengthens what you can say about the ecology under study. Using multi-
ple study areas allows you to improve your knowledge on the amount of spatial
variation that exists in the environment under study.
Improving what you can say about temporal variation, without studying for multiple years, can be achieved by increasing the number of reference sites and the spread of the sites across space. By venturing into “less similar” areas, but ones that still harbor the species of interest, you begin to implement the gradient design.
6.10 Summary
The field of impact assessment is expanding rapidly as new study designs and ana-
lytical techniques are applied. Because most of the impacts are not planned, subop-
timal designs are usually required. As such, the need for replication and exploration
of confounding factors is critical. In many cases, statistical power will remain low.
In such cases, it is incumbent on the researcher to clearly acknowledge the weak-
nesses of the design and analyses, and fairly represent the available conclusions.
As summarized by Skalski and Robson (1992, p. 211), impact studies are among
the most difficult to properly design and analyze. Impact assessments typically must
include a temporal dimension to the design. Skalski and Robson (1992, p. 212–213)
offered the following considerations that are unique to designing impact assessments:
● Identification of constraints imposed by the investigation with regard to randomization and replication
● Incorporation of all prior knowledge as to where, when, and how the impact is
to occur (if known) into the design of the field investigation
● Expression of the impact hypothesis in statistical terms as a function of model
parameters
● Use of a preliminary survey that is consistent with the objective of the consummate field design to estimate variance components for sample size calculations
● Evaluation of economic and inferential costs of conducting a constrained inves-
tigation relative to other design options
● Establishment of a field design whose spatial and temporal dimensions permit
model-dependent estimates of effects of impact
● Where possible, conducting auxiliary investigations of stressors to provide
ancillary data for establishing cause–effect relationship
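The fourth consideration above – using a preliminary survey to estimate variance components for sample size calculations – can be sketched with a standard normal-approximation formula. This is an illustrative sketch, not Skalski and Robson's procedure, and the `sigma` and `delta` values are hypothetical:

```python
import math
from statistics import NormalDist

def replicates_needed(sigma, delta, alpha=0.05, power=0.80):
    """Replicates needed for a two-sided z-test to detect a true
    difference `delta`, given a standard deviation `sigma` estimated
    from a preliminary survey (normal approximation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # ≈ 1.96 for alpha = 0.05
    z_b = z.inv_cdf(power)           # ≈ 0.84 for power = 0.80
    return math.ceil(((z_a + z_b) * sigma / delta) ** 2)

print(replicates_needed(sigma=10.0, delta=5.0))  # → 32
```

Halving the detectable difference `delta` roughly quadruples the required number of replicates, which is why a realistic preliminary variance estimate matters so much when budgeting a constrained impact study.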
References
Aebischer, N. J., P. A. Robertson, and R.E. Kenward. 1993. Compositional analysis of habitat use
from animal radio-tracking data. Ecology 74: 1313–1325.
Ahlbom, A. 1993. Biostatistics for Epidemiologists. Lewis, Boca Raton, FL.
Barker, D. J., and A. J. Hall. 1991. Practical Epidemiology, 4th Edition. Churchill Livingstone,
London.
Bender, E. A., T. J. Case, and M. E. Gilpin. 1984. Perturbation experiments in community ecol-
ogy: Theory and practice. Ecology 65: 1–13.
Carpenter, S. R., T. M. Frost, D. Heisey, and T. K. Kratz. 1989. Randomized intervention analysis
and the interpretation of whole-ecosystem experiments. Ecology 70: 1142–1152.
DeMeo, Committee, c/o RESOLVE, Inc., Washington, DC.
Green, R. H. 1979. Sampling Design and Statistical Methods for Environmental Biologists. Wiley,
New York, NY.
Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecol.
Monogr. 54: 187–211.
Manly, B. F. J., L. L. McDonald, D. L. Thomas, T. L. McDonald, and W. P. Erickson. 2002.
Resource Selection by Animals: Statistical Design and Analysis for Field Studies, 2nd Edition.
Kluwer Academic, Dordrecht, The Netherlands.
Mayer, L. S. 1996. The use of epidemiological measures to estimate the effects of adverse factors
and preventive interventions. In Proceedings of National Avian-Wind Power Planning Meeting
II, pp. 26–39. Avian Subcommittee of the National Wind Coordinating Committee. National
Technical Information Service, Springfield, VA.
McDonald, T. L., W. P. Erickson, and L. L. McDonald. 2000. Analysis of count data from before–
after control–impact studies. J. Agric. Biol. Environ. Stat. 5: 262–279.
Morrison, M. L., B. G. Marcot, and R. W. Mannan. 2006. Wildlife–Habitat Relationships: Concepts and Applications, 3rd Edition. Island Press, Washington, DC.
Osenberg, C. W., R. J. Schmitt, S. J. Holbrook, K. E. Abu-Saba, and A. R. Flegal. 1994. Detection
of environmental impacts: Natural variability, effect size, and power analysis. Ecol. Appl. 4:
16–30.
Parker, K. R., and J. A. Wiens. 2005. Assessing recovery following environmental accidents:
Environmental variation, ecological assumptions, and strategies. Ecol. Appl. 15: 2037–2051.
Savereno, A. J., L. A. Savereno, R. Boettcher, and S. M. Haig. 1996. Avian behavior and mortality
at power lines in coastal South Carolina. Wildl. Soc. Bull. 24: 636–648.
Skalski, J. R., and D. S. Robson. 1992. Techniques for Wildlife Investigations: Design and Analysis of Capture Data. Academic Press, San Diego, CA.
Stewart-Oaten, A., W. W. Murdoch, and K. R. Parker. 1986. Environmental impact assessment:
Pseudoreplication in time? Ecology 67: 929–940.
Stewart-Oaten, A., J. R. Bence, and C. W. Osenberg. 1992. Assessing effects of unreplicated per-
turbations: No simple solutions. Ecology 73: 1396–1404.
Thomas, D. L., and E. Y. Taylor. 1990. Study designs and tests for comparing resource use and
availability. J. Wildl. Manage. 54: 322–330.
Underwood, A. J. 1991. Beyond BACI: Experimental designs for detecting human environmental
impacts on temporal variation in natural populations. Aust. J. Mar. Fresh. Res. 42: 569–587.
Underwood, A. J. 1992. Beyond BACI: The detection of environmental impacts on populations in
the real, but variable, world. J. Exp. Mar. Biol. Ecol. 161: 145–178.
Underwood, A. J. 1994. On beyond BACI: Sampling designs that might reliably detect environ-
mental disturbances. Ecol. Appl. 4: 3–15.
White, G. C., and R. A. Garrott. 1990. Analysis of Wildlife Radio-Tracking Data. Academic Press, San Diego, CA.
Wiens, J. A., and K. R. Parker. 1995. Analyzing the effects of accidental environmental impacts:
Approaches and assumptions. Ecol. Appl. 5: 1069–1083.
Chapter 7
Inventory and Monitoring Studies
7.1 Introduction
Inventory and monitoring are probably the most frequently conducted wildlife studies.
Not only are they conducted in the pursuit of new knowledge (e.g., to describe the
fauna or habitats [see Sect. 1.5 for definition of habitat and related terms] of a given
area, or understand trends or changes of selected parameters), but also they are corner-
stones in the management of wildlife resources. In general terms, inventories are con-
ducted to determine the distribution and composition of wildlife and wildlife habitats
in areas where such information is lacking, and monitoring is typically used to under-
stand rates of change or the effects of management practices on wildlife populations
and habitats. In application to wildlife, inventory and monitoring are typically applied
to species’ habitats and populations. Because sampling population parameters can be
costly, habitat is often monitored as a surrogate for monitoring populations directly.
This is possible, however, only if a clear and direct linkage has been established
between the two. By this, we mean that a close correspondence has been identified
between key population parameters and one or more variables that comprise a species’
habitat. Unfortunately, such clear linkages are lacking for most species.
The need for monitoring and inventory goes well beyond purely scientific pursuits. For example, requirements for monitoring are mandated by key legislation (e.g., National Forest Management Act [1976], National Environmental
Policy Act [1969], Endangered Species Act [1973]), thus institutionalizing the
need for conducting such studies. Even so, monitoring is embroiled in controversy. The controversy is not so much over the importance of or need for monitoring, but over the failure of many programs to implement scientifically credible monitoring (Morrison and Marcot 1995; White et al. 1999; Moir and Block 2001). Unfortunately, few inventory or monitoring studies are conducted at an appropriate level of rigor to precisely estimate the selected
parameters. Given that inventory and monitoring are key steps in the management
process and especially adaptive management (Walters 1986; Moir and Block,
2001), it is crucial to follow a credible, repeatable, and scientific process to pro-
vide reliable knowledge (cf. Romesburg 1981). The purpose of this chapter is to
outline basic steps that should be followed for inventory and monitoring studies.
Inventory and monitoring studies entail similar, but distinct processes. Although
some steps are clearly identical, others are specific to the type of study being done
(Fig. 7.1). The first step, which is universal to any study, is to clearly state the goals.
For example, why conduct the study? What information is needed? How will the
information be used in this or future management decisions? Clearly answering
these questions will help to define a study design that addresses them adequately.
That is, establishing inventory and monitoring goals is critical for defining what
will be monitored (e.g., selected species or all species, population variables or habi-
tat variables), setting target and threshold values, designing appropriate protocols
for collecting data, and determining the appropriate methods for data analysis.
7.3.1 Inventory
Fig. 7.1 Simplified sequence of steps involved with inventory and monitoring
that new species were being detected even after 6 years of sampling, likely because
it is extremely difficult to detect all rare and incidental species.
Inventories are typically done in areas or conditions for which data are lacking,
or across a range of areas or conditions to more clearly define ecological distribu-
tion of a species (e.g., define both presence and absence) (Heyer et al. 1994).
A typical goal of inventories is to assess the presence or infer absence of species
within an area prior to initiating a habitat-altering activity. Note that we state “infer
absence.” Verifying presence is straightforward; if you see or otherwise detect a
species then it uses the area. However, failure to detect a species does not necessarily mean that it was absent when you sampled or that it never uses the area.
This is where designing and implementing an appropriate study design is critical.
The study design must be such that the probability of detecting a species or indi-
viduals using the area is high. Some important associated components are use of
the proper field methodology and sampling during the appropriate period and with
adequate intensity. We cannot reiterate these points enough because proper study
design is the foundation of a valid wildlife study (cf. Chaps. 1 and 2).
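Under the simplifying assumption of a constant, independent per-visit detection probability (an assumption for illustration, not a method given in the text), the sampling intensity needed to make the probability of detecting a present species high can be sketched as:

```python
import math

def cumulative_detection(p_visit, n_visits):
    """Probability of detecting a species that is present at least once
    in n_visits independent visits, with constant per-visit
    detection probability p_visit."""
    return 1.0 - (1.0 - p_visit) ** n_visits

def visits_needed(p_visit, target=0.95):
    """Smallest number of visits at which cumulative detection
    probability reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_visit))

print(visits_needed(0.30))                      # → 9
print(round(cumulative_detection(0.30, 9), 3))  # → 0.96
```

The calculation makes the inference problem concrete: with a modest per-visit detection probability of 0.30, nine visits are needed before a nondetection supports inferring absence with 95% confidence, which is why sampling period and intensity must be planned rather than assumed.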
Thus, inventories are critical tools to aid in resource planning and species conservation. Even basic information on the distribution of species and habitats in an area can help in designing management to protect or enhance conditions for desired species, whether they are threatened or endangered or valued for consumptive or nonconsumptive reasons.
7.3.2 Monitoring
indeed. On the other hand, management plans often contain specific and measura-
ble criteria, such as the desired amount of forest in a given structural class (e.g.,
mature or old-growth forest), or the number of a given habitat element that should
occur across the landscape. For these criteria, establishing a valid monitoring study
is not nearly as challenging. Compliance monitoring is done when mandated by
statute (see Sect. 1.3.2). An example of compliance is monitoring established
within a biological opinion provided by the US Fish and Wildlife Service during
interagency consultation under the Endangered Species Act. Typically, this moni-
toring, referred to as take monitoring, assesses whether an activity adversely affects
the occupancy or habitat of a threatened or endangered species. If so, the action
agency is charged with a “take,” meaning that the activity had an adverse impact on
a specified number of the species. To illustrate further the different types of moni-
toring, we draw upon our theme, the Mexican spotted owl (Box 7.2).
Box 7.2 Monitoring for a Threatened Species: The Mexican Spotted Owl
Different monitoring goals are illustrated in the spotted owl example. The extent
to which management activities are actually applied on the ground and the
degree to which those activities are in accord with recovery plan guidelines
would be evaluated by implementation monitoring. For example, consider a
silvicultural prescription with the ultimate objective of creating owl nesting
habitat within 20 years (the criteria for nesting habitat were provided in the
recovery plan). The prescription entailed decreasing tree basal area by 15% and
changing the size class distribution of trees from one skewed toward smaller
trees to an equal distribution of size classes. Further, the recovery plan specifies
the retention of key correlates of owl habitat – trees >60 cm dbh, large snags,
and large downed logs – during active management practices such as logging
and prescribed burning. In this case, implementation monitoring must have two
primary objectives. One is to determine if losses of key habitat elements
exceeded acceptable levels, and the second is to determine if tree basal area was
reduced as planned and the resultant size class distribution of trees was even.
Recall that the ultimate objective of the treatment was to produce a stand in 20 years that had attributes of owl nesting habitat. Whether or not the prescription
achieved this objective is the goal of effectiveness monitoring.
The owl recovery plan (USDI Fish and Wildlife Service 1995) provided five
delisting criteria that must be met before the owl should be removed from the
list of threatened and endangered species. One criterion was to demonstrate that
the three “core populations” were stable or increasing, and another required
habitat stability across the range of the subspecies. The recovery plan became
official guidance for the US Fish and Wildlife Service, and then for the US
Forest Service as they amended Forest Plans for all forests in the southwestern
region to incorporate the recovery plan recommendations (USDA Forest Service
1996). For a little background, National Forests are mandated to develop Forest
Plans by the National Forest Management Act (NFMA) of 1976, thus making
Monitoring can be used to measure natural or intrinsic rates of change over time
or to understand effects of anthropogenic or extrinsic factors on population or habitat
change or trends. By intrinsic changes, we refer to those that might occur in the
absence of human impact, such as trends or changes resulting from natural processes
(e.g., succession) or disturbances (fire, weather, etc.) (Franklin 1989). Anthropogenic
factors are those that may alter or disrupt natural processes and disturbances and
potentially affect wildlife habitats or populations. In most management situations,
monitoring is conducted to understand effects of anthropogenic factors (e.g., water
diversions, livestock, logging, fire suppression) on wildlife. However, recognizing
trends even in the absence of anthropogenic factors is complicated by the dynamic
and often chaotic behavior of ecological systems (Allen and Hoekstra 1992).
Because intrinsic and extrinsic factors more often than not act synergistically to
influence trend or change, the effects of either may be difficult to distinguish (Noon
et al. 1999). Again, this is where application of an appropriate study design is critically important. A well-conceived and well-executed study may allow the investigator to partition sources of variation and narrow the list of possible factors influencing identified trends (see previous chapters).
7.4 Statistical Considerations
A premise underlying most of what we present in this volume is that study designs
must permit valid treatment of the data. For inventory studies, we must be able to
characterize accurately the species or habitat variables of interest. For monitoring,
we must know the effort needed to show a trend over time or to document a speci-
fied effect size in a parameter from time t1 to t2.
In this regard, the investigator should be well aware of concepts of statistical
power, effect size, and sample size, and how they interact with Type I and Type
II errors (see Chaps. 2 and 3 for detailed discussion of these concepts). Typically,
investigators focus on the Type I error rate or alpha. However, in the case of
sensitive, threatened, endangered, or rare species, consideration of Type II error
rate is equally, if not more, relevant. A Type II error would be failure to detect a
difference when it indeed occurred, an error that should be kept to a minimum.
With threatened, endangered, or rare species, overreaction and concluding a
negative impact or negative population trend when it is not occurring (Type I
error) may have no deleterious effects on the species because additional protec-
tions would be invoked to guard against any negative management actions. In
contrast, failing to conclude a significant decline in abundance when it is occur-
ring (Type II error) may allow management to proceed without change even
though some practices are deleterious to the species. The potential risk to the
species could be substantial.
Effect size and power go hand in hand when designing a monitoring study. Simply
stated, effect size is a measure of the difference between two groups. This difference
can be quantified a number of ways using various indices that measure the magni-
tude of a treatment effect. Steidl et al. (1997) regarded effect size as the absolute
difference between two populations in a select parameter. Typically, investigators
establish effect size a priori, and it should be the minimum level that makes a biological difference. For example, a population decline of 10% for a species of concern might
be biologically relevant, so you would need a study with adequate sensitivity to
show that decline when it occurs.
Three common measures of effect size are Cohen's d, Hedges' g, and Cohen's f² (Cohen 1988, 1992; Hedges and Olkin 1985). Cohen's d measures the effect size between two means, where d is defined as the difference between the two means divided by their pooled standard deviation. To interpret this index, Cohen (1992) suggested that d = 0.2 indicates a small, 0.5 a medium, and 0.8 a large effect size. Hedges' g incorporates sample size by computing the pooled standard deviation from the sample sizes of the respective groups and by applying a small-sample correction to the overall effect size. Cohen's f² is analogous to an F test for multiple correlation or multiple regression. With this index, f² of 0.02 is considered a small effect size, 0.15 is medium, and 0.35 is large (Cohen 1988).
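The two mean-difference indices can be computed directly from summary statistics. A minimal sketch; the input means, standard deviations, and sample sizes below are hypothetical:

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d: difference between two means divided by the pooled
    standard deviation (Cohen 1988)."""
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g: Cohen's d with a small-sample bias correction
    (Hedges and Olkin 1985)."""
    d = cohens_d(m1, s1, n1, m2, s2, n2)
    return d * (1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0))

# Equal SDs of 2 in both groups: pooled SD = 2, so d = (10 - 8) / 2 = 1.0.
print(cohens_d(10.0, 2.0, 20, 8.0, 2.0, 20))             # → 1.0
print(round(hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20), 3))   # → 0.98
```

Note that g is slightly smaller than d: the correction factor shrinks the estimate toward zero, and the shrinkage matters most for the small samples typical of wildlife studies.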
Simply stated, statistical power is the probability that you will correctly reject a false null hypothesis (Steidl et al. 1997). Recall from Chap. 2 that incorrectly failing to reject the null hypothesis is termed a Type II error. As power increases, the Type II error rate decreases.
Fig. 7.3 Power analysis for hairy woodpecker and chestnut-backed chickadee to evaluate number of replicates needed to detect population increases of 50, 100, 150, and 200%. Reproduced from Steidl et al. (1997), with kind permission from The Wildlife Society
Even the smallest change considered in Fig. 7.3, a 50% population increase, is severe. Thus, we are interested in more subtle population changes, which may go undetected given this experimental design.
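The relationship in Fig. 7.3 – power rising with the number of replicates and with the size of the population change – can be sketched with a normal-approximation power calculation. This is not Steidl et al.'s analysis; the coefficient of variation (`cv = 1.0`) is a hypothetical value chosen only for illustration:

```python
import math
from statistics import NormalDist

def power_z(pct_change, cv, n, alpha=0.05):
    """Approximate power of a two-sided z-test to detect a population
    change of `pct_change` percent with n replicate plots, where `cv`
    is the coefficient of variation of a single plot count.
    One-tail normal approximation; the far-tail term is negligible."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    noncentrality = (pct_change / 100.0) / (cv / math.sqrt(n))
    return z.cdf(noncentrality - z_a)

# Power for changes of 50–200% at 5, 10, and 20 replicates.
for pct in (50, 100, 150, 200):
    row = [round(power_z(pct, cv=1.0, n=n), 2) for n in (5, 10, 20)]
    print(pct, row)
```

Running the loop reproduces the qualitative pattern of Fig. 7.3: large changes are detectable with few replicates, while the subtle changes of real management interest demand many more.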
It has become common practice to conduct retrospective power analysis in situations where results of a test are nonsignificant. Such analyses are used mostly as a diagnostic tool to evaluate what effect size might have been detected given a certain power or, vice versa, what power might be achieved given a certain effect size or sample size. Steidl et al. (1997) cautioned against taking the results of retrospective power analyses too far. Effectively, their primary use is to evaluate hypothetical scenarios that may help to inform similar studies conducted sometime in the future. In some cases, they might also be used to test hypothesized effect sizes thought to be biologically relevant or to calculate confidence intervals around the observed effect size (Hayes and Steidl 1997; Thomas 1997).
The answer to the question of what makes inventorying and monitoring different
is basic. The difference between the two is largely a function of time; inventory
measures the status of a resource at a point in time, whereas monitoring assesses
change or trend over time in resource abundance or condition. Inventory and
monitoring follow different processes to meet their goals, especially the series of
feedback loops inherent to monitoring (see Fig. 7.1). Both require that you set
goals, identify what to measure, and, in the case of management, state a value that
when exceeded will result in a management decision. However, because the purpose of inventory is to assess resource state whereas that of monitoring is to assess resource dynamics, they will often require different study designs. For example, the sampling
design for a study to inventory Arizona to determine the distribution of spotted
owls would be much different from a study to monitor population trend. Each
study would be designed to estimate different parameters and would entail appli-
cation of different statistical procedures, thus requiring different approaches to
collect the relevant data. One basic principle common to both inventory and
monitoring is that both should be scientifically valid. Thus, concepts discussed in
Chaps. 1 and 2 regarding adequate sample sizes, randomization, replication, and
general study rigor are critically important to any inventory or monitoring study.
Failure to incorporate these considerations will result in misleading information,
Box 7.3 Inventory and Monitoring Goals for the Mexican Spotted Owl
Inventories are used in two basic ways for the Mexican spotted owl. One is
part of project planning and the other is to increase basic knowledge about
owl distribution. The Mexican Spotted Owl Recovery Plan requires that all
areas with any chance of occupancy by owls be inventoried prior to initiating
any habitat-altering activity. The reason is to determine whether owls are using the area and, if so, to modify the management activity as necessary to minimize impact to the bird. Thus the goal is straightforward: to determine occupancy
(or infer nonoccupancy) of owls to help guide the types and severity of habi-
tat-modifying management that might impact the owl. The second goal of
inventory is to understand the distribution of the owl better. Most inventories
for owls have been conducted in areas where management (typically timber
harvest and prescribed fire) is planned as part of the process described above.
These areas represent only a subset of the lands that the owl inhabits. Thus, to
increase knowledge of owl distribution and population size, the plan calls for
inventories in “holes in the distribution” or in potential habitats where no
records of owls exist.
The recovery plan also requires both population and habitat monitoring.
The reasons for monitoring are multifaceted. First, the owl was listed as
threatened based on loss of habitat and the concern that habitat would con-
tinue to be lost given current management practices. Although not explicitly
stated in listing documents, it was assumed that there was a population
decline concomitant with habitat decline. Thus, a very basic reason to moni-
tor is to evaluate whether or not these trends are indeed occurring and if they
are correlated. A second objective for monitoring is to evaluate whether or not implementation of management recommendations in the recovery plan was accomplishing the intended goal, namely recovering the subspecies. This
would entail (1) implementation monitoring to determine if management
activities were done as designed and (2) effectiveness monitoring to evaluate
whether following management recommendations is sustaining owl popula-
tions and habitats. This would be tested by examining both habitat and popu-
lation trends to ensure that owl populations persist into the future. A third
objective of monitoring, validation, would provide measurable, quantitative
benchmarks that when met would allow the bird to be removed from the list
of threatened species (i.e., delisted).
7.6 Selection of a Design
Once the study objective is established, the scale of resolution chosen by ecologists
is perhaps the most important decision in inventory and monitoring because it pre-
determines procedures, observations, and results (Green 1979; Hurlbert 1984).
A major step in designing an inventory or monitoring study is to establish clearly
the target population and the sampling frame. Defining the target population essen-
tially defines the area to be sampled. For example, if an area was to be inventoried
to determine the presence of Mexican spotted owls on a national forest, sampling
areas should include general areas that the owl uses (mature conifer forests and
slickrock canyons) but not include areas that the owl presumably would not use
(grasslands, desert scrub) based on previous studies. This first step establishes the
sampling universe from which samples can be drawn and the extent to which infer-
ences can be extrapolated. Thus, the results of these owl surveys apply only to the
particular national forest and not to all national forests within the geographic range
of the owl.
Although this seems rather straightforward, the mobility of wildlife can
muddle the inferences drawn from the established area. Consider, for example, the golden eagle case presented in Chap. 6. A somewhat arbitrary
decision was made to define the “population” potentially affected by wind turbine
mortality as the birds found within a fixed radius of 30 km of the wind farm. The
basis for this decision included information on habitat use patterns, range sizes,
movement patterns, and logistics of sampling a large area. The primary assump-
tion is that birds within this radius have the greatest potential of encountering
wind turbines and are the birds most likely to be affected. Not measured, how-
ever, were cascading effects that may impact eagles beyond the 30 km radius,
because eagles found within this radius were not a distinct population. Thus,
factors that influenced birds within this arbitrary boundary may have also affected
those outside of the boundary. The point here is that even though considerable
thought went into the decision of defining the sampling universe for this study,
the results of the monitoring efforts may be open to question because mortality
of birds within the 30 km may be affecting the larger population, including birds
found beyond the 30 km radius.
A key aspect in the design of any study is to identify when to collect data. There
are two parts to this aspect: the timing of data collection and the length of time over
which data should be taken. The choice of timing and length of study is influenced
by the biology of the organism, the objectives of the study, intrinsic and extrinsic
factors that influence the parameter(s) to be estimated, and resources available to
conduct the study. Overarching these considerations is the need to sample ade-
quately for precise estimates of the parameter of interest.
Timing refers to when to collect data and it depends on numerous considera-
tions. Obviously, studies of breeding animals should be conducted during the
breeding season, studies of migrating animals during the migration period, and so
on. Within a season, timing can be critically important because detectability of
individuals can change for different activities or during different phenological
phases. Male passerine birds, for example, are generally more conspicuous during the early part of the breeding season, when they are displaying as part of courtship and territorial defense activities. Detection probabilities for many species will be greater
during this period than at other times. Another consideration is that the very population under study can change within a season. For example, age class structures and
numbers of individuals change during the course of the breeding season as juve-
niles fledge from nests and become a more entrenched part of the population.
Population estimates for a species, therefore, may differ substantially depending on
when data are collected. Once the decision is made as to when to collect data, it is crucial that data are collected at the same time in the phenology of the species in subsequent years to control for some of the within-season variation.
Objectives of a study also dictate when data should be collected. If the study is
an inventory to determine the presence of species breeding in an area, sampling
should occur throughout the breeding season to account for asynchrony in breeding
cycles and heterogeneity in detectabilities among species. Sampling spread over the
course of the season would give a greater chance of recording most of the species
using the area. If a monitoring study is being conducted to evaluate population
trend of a species based on a demographic model, sampling should be done at the
appropriate time to ensure unbiased estimates of the relevant population parame-
ters. Demographic models typically require fecundity and survival data to estimate
the finite rate of population increase. Sampling for each of these parameters may
be necessary during distinct times to ensure unbiased estimates for the respective
measures (USDI Fish and Wildlife Service 1995).
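The role of fecundity and survival in estimating the finite rate of increase can be illustrated with a minimal Leslie-matrix sketch; the parameter values below are hypothetical and not taken from any study cited here.

```python
import math

# Hypothetical two-age-class Leslie model (illustrative values only).
f1, f2 = 0.3, 1.1   # fecundity: female offspring per female, by age class
s1 = 0.6            # survival probability from age class 1 to age class 2

# Leslie matrix L = [[f1, f2], [s1, 0]].  The finite rate of increase
# (lambda) is the dominant eigenvalue of L, i.e., the positive root of
#   lambda**2 - f1*lambda - f2*s1 = 0
lam = (f1 + math.sqrt(f1 ** 2 + 4 * f2 * s1)) / 2
print(f"finite rate of increase: {lam:.3f}")  # < 1 implies decline
```

Because the rate of increase combines fecundity and survival, bias in sampling either parameter propagates directly into the estimated trend, which is one reason each may need to be sampled at a distinct time.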
Length of the study refers to how long a study must be done to estimate the
parameter of interest. It depends on a number of factors including study objectives,
field methodology, ecosystem processes, biology of the species, budget, and feasi-
bility. A primary consideration for monitoring and inventory studies should be
temporal qualities of the ecological process or state being measured (e.g., popula-
tion cycles, successional patterns). Temporal qualities include frequency, magni-
tude, and regularity, which are influenced by both biotic and abiotic factors acting
286 7 Inventory and Monitoring Studies
both stochastically and deterministically (Franklin 1989). Further, animals are sub-
jected to various environmental influences during their lifetimes. A study should
engage in data collection over a sufficiently long period to allow the population(s)
under study to be subjected to a reasonable range of environmental conditions.
Consider two hypothetical wildlife populations that exhibit cyclic behavior: one
completes on average ten cycles per 20 years, whereas the other completes just one
cycle every 20 years (Fig. 7.4). The population cycles are the result of
various intrinsic and extrinsic factors that influence population growth and decline.
A monitoring program established to sample both populations over a 10-year
period may be adequate to understand population trends in the species with fre-
quent cycles, but may be misleading for the species with the long population cycle.
A longer timeframe would likely be needed to monitor the species with the
lower-frequency cycle.
However, considering only the frequency of population cycles may be inadequate,
as the amplitude or magnitude of population shifts may also influence the length of
study needed to separate within-year variation from between-year variation.
Consider two populations that exhibit ten cycles in 20 years, but now the
magnitude of the change for one is twice that of the other (see Fig. 7.4). Sampling
the population exhibiting greater variation would require a longer period to detect
a population trend or effect size, should one indeed occur.
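The risk of a monitoring window that is short relative to the cycle can be sketched numerically by fitting a linear trend to ten years of data from a stable but cyclic population; the periods and amplitude below are hypothetical.

```python
import math

def abundance(t, period, amplitude=100, mean=500):
    """Hypothetical stable population cycling around a fixed mean."""
    return mean + amplitude * math.sin(2 * math.pi * t / period)

def trend_slope(period, years=range(10)):
    """Ordinary least-squares slope of abundance over the monitoring window."""
    ys = [abundance(t, period) for t in years]
    n = len(ys)
    t_bar = sum(years) / n
    y_bar = sum(ys) / n
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(years, ys))
    den = sum((t - t_bar) ** 2 for t in years)
    return num / den

# A 10-year window spanning several short cycles yields a slope near zero,
# but the same window over a 20-year cycle suggests a spurious directional
# trend in a population that is stable over the long term.
print("short cycle slope:", round(trend_slope(period=4), 2))
print("long cycle slope: ", round(trend_slope(period=20), 2))
```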
Gathering abundance and demographic data can be costly, entail extensive field
sampling, and require highly skilled personnel. Costs are high largely because of the
number of samples needed for precise point estimates of the relevant parameters.
Often, these costs are beyond the budget of many funding agencies, thus
requiring more cost-effective approaches. Even if cost is not the primary constraint,
the feasibility of obtaining enough samples to estimate abundance for rare species
may be limiting.
New advances for estimating detection probabilities and using this information
to adjust occupancy rates are largely responsible for the renewed interest in occu-
pancy monitoring. In addition, one can model covariates as they relate to occupancy
rates. These models can serve as descriptive tools to explain variation in occupancy
rates. Although occupancy estimation is not new and is the basis for numerous
indices, it has gone through a recent resurgence as a viable monitoring approach,
especially for rare and elusive species (MacKenzie et al. 2004). Generally, occupancy
monitoring is cost-efficient, can employ various indirect signs of occupancy,
and does not always require highly skilled personnel. Occupancy can be useful
Fig. 7.5 Simple random samples of ten plots (gray plots) from sampling frames containing (a) a
random distribution of individuals and (b) a clumped distribution of individuals. Reproduced from
Thompson et al. (1998), with kind permission from Elsevier
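The detection adjustment behind occupancy monitoring can be sketched minimally as follows. This is a simplification, not the full MacKenzie et al. likelihood (which estimates detection and occupancy jointly): it assumes the per-visit detection probability is known, and the survey numbers are hypothetical.

```python
def adjusted_occupancy(sites_with_detection, sites_surveyed, p, visits):
    """Adjust a naive occupancy rate for imperfect detection, given a known
    per-visit detection probability p and a fixed number of repeat visits."""
    naive = sites_with_detection / sites_surveyed
    # Probability an occupied site is detected at least once across all visits
    p_star = 1 - (1 - p) ** visits
    return naive / p_star

# Hypothetical survey: species detected at 12 of 50 sites, 3 visits per site,
# per-visit detection probability 0.4 (assumed known from earlier work).
print(round(adjusted_occupancy(12, 50, p=0.4, visits=3), 3))
```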
The number of sample plots and placement of plots within the study area
depend on a number of sampling considerations, including sampling variances
and species distributions and abundances. Sample size should be defined by the
number of plots to provide precise estimates of the parameter of interest.
Allocation of sample plots should try to minimize sampling variances and can be
done a number of ways. Survey sampling textbooks are a good source of discus-
sion of the theoretical and practical considerations. Basic sampling designs
include simple random, systematic random, stratified random, cluster sampling,
two-stage cluster sampling, and ratio estimators (Thompson 2002; Cochran
1977). Chapters 4 and 5 presented some of these basic sampling designs with
examples of how they are typically applied.
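As a concrete sketch of one of these designs, stratified random sampling with proportional allocation might look like the following; the strata, frame, and plot counts are hypothetical.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Allocate n plots among strata in proportion to stratum size.
    (A simple sketch; real designs adjust allocations to sum exactly to n,
    or allocate by within-stratum variance rather than size.)"""
    total = sum(strata_sizes.values())
    return {k: round(n * size / total) for k, size in strata_sizes.items()}

def stratified_sample(strata_plots, n, seed=1):
    """Draw a simple random sample of plots within each stratum."""
    rng = random.Random(seed)
    alloc = proportional_allocation(
        {k: len(v) for k, v in strata_plots.items()}, n)
    return {k: rng.sample(plots, alloc[k]) for k, plots in strata_plots.items()}

# Hypothetical sampling frame: plot IDs grouped by vegetation stratum.
frame = {"riparian": list(range(20)), "upland": list(range(80))}
sample = stratified_sample(frame, n=10)
print({k: len(v) for k, v in sample.items()})
```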
Historically, wildlife biologists have made heavy use of indices as surrogates for
measuring populations. These can include raw counts, auditory counts, track surveys,
pellet counts, browse sign, catch per unit effort, and hunter success.
Indices are often used to address inventory and monitoring questions (see Sect.
7.1). Implicit in the use of indices is that they provide an unbiased estimate of the
relative abundance of the species under study. This, however, rests heavily on
the assumption that capture probabilities are homogeneous across time, places, and
observers (Anderson 2001).
Although indices are widely used, they are not widely accepted (Anderson 2001;
Engeman 2003). Primary criticisms are that they fail to account for heterogeneous
detection probabilities (Anderson 2001), employ convenience samples rather than
probabilistic samples (Anderson 2001, 2003), typically lack measures of precision
(Rosenstock et al. 2002), and, when measures of precision are provided, often have
large confidence intervals (Sharp et al. 2001).
However, few investigators have enough resources to feed the data-hungry analyses
that permit raw counts to be adjusted by detection probabilities (Engeman
2003), thereby relegating investigators to using indices. McKelvey and Pearson
(2001) noted that 98% of the small mammal studies published in a 5-year period
had too few data for valid mark–recapture estimation. Verner and Ritter (1985)
found that simple counts of birds were highly correlated with adjusted counts, but
simple counts were possible for all species, whereas adjusted counts were possible
only for common species with enough detections.
Index methods are efficient and their use will likely continue (Engeman 2003).
Engeman (2003) notes that the issue with indices is not so much the method as it is
with selecting and executing an appropriate study design and conducting data anal-
ysis to meet the study objective. Methods exist to calibrate indices by using ratio
estimation techniques (see Chap. 5; Eberhardt and Simmons 1987), double sampling
techniques (Bart et al. 2004), or detection probabilities (White 2005). These
calibration or correction tools may reduce bias associated with indices and render
indices more acceptable as inventory and monitoring tools.
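The ratio-estimation idea behind such calibration can be sketched as follows; all counts below are invented for illustration, and the intensive counts are assumed unbiased.

```python
# Double sampling: a cheap index is recorded on all plots, and an intensive,
# (assumed) unbiased count is made on a subsample of those plots.
index_all = [4, 7, 2, 9, 5, 6, 3, 8, 5, 7, 4, 6]   # index on every plot
index_sub = [4, 9, 3, 6]                           # index on the subsample
count_sub = [10, 21, 8, 15]                        # intensive counts there

# Ratio estimator: scale the mean index from the full sample by the
# count-to-index ratio observed on the subsample.
R = sum(count_sub) / sum(index_sub)
mean_index = sum(index_all) / len(index_all)
calibrated_mean = R * mean_index
print(round(R, 2), round(calibrated_mean, 2))
```

The calibrated mean inherits the low cost of the index while borrowing the (assumed) unbiasedness of the intensive counts, which is the essence of the double-sampling corrections cited above.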
7.7 Alternatives to Long-Term Studies
Four phenomena necessitate long-term studies: (1) slow processes, such as forest
succession or some vertebrate population cycles; (2) rare events, such as fires, floods,
and diseases; (3) subtle processes, where short-term variation exceeds the long-term
trend; and (4) complex phenomena, such as intricate ecological relationships (Strayer et al.
1986). Unfortunately, needs for timely answers, costs, changing priorities, and logisti-
cal considerations may preclude long-term studies. In such cases, alternative
approaches are sought to address inventory or monitoring objectives. Various alterna-
tives to long-term sampling have been proposed, such as retrospective sampling
(Davis 1989), substitution of space for time (Pickett 1989), the use of systems with
fast dynamics as analogies for those with slow dynamics (Strayer et al. 1986), mode-
ling (Shugart 1989), and genetic approaches (Schwartz et al. 2007).
Retrospective studies have been used to address many of the same questions as
long-term studies. A key use of retrospective studies is to provide baseline data for
comparison with modern observations. Further, they can characterize slow proc-
esses and disturbance regimes, and how they may have influenced selected ecosystem
attributes (Swetnam and Bettancourt 1998). Perhaps the greatest value of retrospective
studies is for characterizing changes to vegetation and wildlife habitats over time.
Dendrochronological studies provide information on frequencies and severities of
historical disturbance events (Swetnam 1990) (Fig. 7.6). This information can be
used to reconstruct ranges of variation in vegetation structure and composition at
various spatial scales. These studies can also be used to infer short- and long-term
effects of various management practices on habitats, as well as effects of disrup-
tions of disturbance regimes on habitats.
Other potential tools for retrospective studies include databases from long-term
ecological research sites, forest inventory databases, pollen studies, and sediment
cores. Retrospective approaches are also used in epidemiological and epizootiological studies. With any
of these studies, one must be aware of the underlying assumptions and limitations
of the methodology. For example, dendrochronological methods often fail to
account for small trees, which are consumed by fire and thus not sampled. This
limitation may result in a biased estimate of forest structure and misleading infer-
ences about historical conditions. If the investigator understands this idiosyncrasy,
then he or she can consider this during evaluation.
Fig. 7.6 Fire-area index computed as the number of sites recording fires per year for the period
1700–1900. Fires recorded by any tree within the sites are shown on the bottom plot, whereas fires
recorded by 10, 20, or 50% of the trees are shown above (from Swetnam 1990)
Substituting space for time is achieved by finding samples that represent the range
of variation for the variable(s) of interest in order to infer long-term trends (Pickett
1989; Morrison 1992). The assumption is that local areas are subjected to different
environments and different disturbance histories that result in different conditions
across the landscape. Thus, rather than following few samples over a protracted
period to understand effects of slow processes, random events, or systems with high
variances, more areas are sampled in the hope that they represent conditions that might
exist during different phases of these processes. For example, if you wanted to
understand the long-term effects of forest clear-cutting on wildlife, a logical
approach would be to locate a series of sites representing a chronosequence of condi-
tions rather than waiting for a recent clear-cut to go through succession. By chron-
osequence, we mean areas that were clear-cut at various times in the past (e.g., 5, 10,
20, 30, 50, 75, and 100 years ago). By sampling enough areas representative of veg-
etation structure and composition at different times following clear-cuts you could
draw inferences as to possible short- and long-term effects on wildlife. To provide
valid results using this approach requires that many sites with somewhat similar his-
tories and characteristics be used (Morrison 1992). If substantial sources of variation
between sampling units cannot be accounted for, then substituting space for time
will fail (Pickett 1989). Even if these sources can be accounted for, space-for-time
substitutions may fail to take into account mesoscale events (Swetnam and
Bettancourt 1998) that affect large regions and tend to mitigate or swamp local envi-
ronmental conditions. Pickett (1989) cautioned that studies that rely on spatial rather
than temporal sampling are best suited for providing qualitative trends or generating
hypotheses rather than for providing rigorous quantitative results. Even so, spatially
dispersed studies are preferred for inventory studies.
Clearly, an empirical basis is needed to support the use of space-for-time substitu-
tions in monitoring studies. By this, we mean that you should conduct a baseline
study to evaluate whether such an approach would provide unbiased estimates of the
variable(s) under study. This baseline study would require comparison of an existing
long-term data set collected as part of another study with a data set collected
from multiple locations at one time. If no significant differences are observed in
estimates of the variables of interest, then space-for-time substitutions may be justi-
fied. If a difference is observed, then one can explore methods to calibrate results of
one approach with the other. If the differences cannot be rectified by calibration, you
should reconsider the use of space-for-time substitutions in your study design.
Applying the results of a simple system with rapid generation times or accelerated
rates of succession can provide insights into how systems with inherently slower
processes might behave (Morrison 1992). For example, applying results of labora-
tory studies on rodents might provide some insight on population dynamics of
larger wild mammals. Obviously, extending results of captive animals to wild popu-
lations has obvious drawbacks, as does applying results from r-selected species
such as rodents to larger K-selected species such as carnivores. At best, such sub-
stitutions might provide a basis for development of hypotheses or theoretical con-
structs that can be subjected to empirical tests. These tests should be designed to
show the correspondence between the surrogate measure (e.g., that with fast
dynamics) and the variable that exhibits slow dynamics. If the relationship is
strong, then it might be acceptable to use behavior of the surrogate measure as an
index for the variable of interest.
7.7.4 Modeling
Use of models has gained wide application in studies of wildlife habitats (Verner
et al. 1986) and populations (McCullough and Barrett 1992). Models can be conceptual
or empirical (Shugart 1989). Conceptual models are generally used to
structure a scientific endeavor. As an example, one might ask, "How is the population
[…]?" The conceptual model provided in Fig. 7.7 is perhaps more detailed than most
conceptual models, but it does show how a system can be characterized as interactions
among many tractable and researchable components.
Fig. 7.7 Simplified schematic representation of some important ecological linkages associated
with California spotted owls (from Verner et al. 1992)
Quantitative forecasts from predictive models are used to provide wildlife managers
with realizations of ecological processes. When structuring any modeling exercise to
address population dynamics questions, an initial decision must be made concerning the
proposed model’s purpose (McCallum 2000). Empirical models are quantitative predic-
tions of how natural systems behave. Models for examining population dynamics exist
on a continuum from empirical models used to make predictions to abstract models that
attempt to provide general insights (Holling 1966; May 1974; McCallum 2000).
Predictive models require a larger number of parameters than abstract models,
increasing their predictive ability for the system of interest but reducing the generality
of the model and thus its ability to extend results to other systems.
Ecological modeling in wildlife studies encompasses a broad range of topics, but
most often relates to two: demographic modeling (parameter estimation) and
population modeling. Demographic modeling is directed toward developing a model
which best explains the behavior and characteristics of empirical data, and then
using that model to predict how that or similar systems will behave in the future
(Burnham and Anderson 2002). The use and sophistication of demographic mode-
ling has increased along with increases in personal computing power (White and
Nichols 1992) and development of statistical programs specifically for ecological
data (Sect. 2.7.2).
Population modeling is directed towards development of predictive models,
based on the aforementioned demographic parameters, which we use to forecast the
response of wildlife populations to perturbations. Population models come in many
forms, including population viability analysis, matrix population models, and
individual-based models (Caswell 2001; Boyce 1992; DeAngelis and Gross 1992), each
structured with the intent of describing and predicting population dynamics over
time and space (Lande et al. 2003). To be realistic, population models must include
simultaneous interactions between deterministic and stochastic processes (Lande et al.
2003), which lends uncertainty to predictions of population trajectories. Because
the fundamental unit in animal ecology is the individual (Dunham and Beaupre
1998), many population models incorporate individual variability (e.g., stochastic-
ity in estimates of demographic parameters).
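The interaction between deterministic growth and stochasticity described above can be sketched with a simple projection; the growth-rate distribution and starting size are hypothetical.

```python
import random

def project_final_sizes(n0, years, lam_mean=1.0, lam_sd=0.15,
                        reps=1000, seed=42):
    """Project a population with yearly growth rates drawn at random
    (environmental stochasticity); returns final sizes across replicates."""
    rng = random.Random(seed)
    finals = []
    for _ in range(reps):
        n = n0
        for _ in range(years):
            n *= max(0.0, rng.gauss(lam_mean, lam_sd))  # yearly noise
        finals.append(n)
    return finals

finals = sorted(project_final_sizes(n0=100, years=20))
# Even with a mean yearly growth rate of 1.0, outcomes spread widely, and
# the median trajectory declines (the geometric mean of the growth rates
# falls below their arithmetic mean), illustrating the uncertainty that
# stochasticity lends to projected trajectories.
print("median final size:", round(finals[len(finals) // 2], 1))
```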
7.7.5 Genetics
Genetic techniques represent a new and burgeoning field providing novel approaches
to monitoring. Schwartz et al. (2007) provide an insightful overview of these tech-
niques. They separated genetic monitoring into two categories: (1) markers used for
traditional population monitoring and (2) those used to monitor population genetics.
Most genetic materials are obtained through noninvasive samples – hair, scat,
feathers, and the like – thus obviating the need to capture or even observe the species
under study. Individual animals are identified using genetic markers, thus permitting
estimates of abundance and vital rates. For rare species, abundance indices are
possible, which can subsequently be adjusted for small population size or detection
probability (White 2005). For more abundant species, capture–recapture analyses
can be applied (see Chap. 4). These samples can also be used to estimate survival
and turnover rates. Survival rates are often difficult to estimate using traditional
mark–recapture techniques, especially when detection or capture rates vary with
time. For example, male northern goshawks are detected more easily using tradi-
tional techniques during years when they breed than in years when they do not
(Reynolds and Joy 2006). Survival estimates based on years when the goshawks do
not breed may be underestimates given lower capture probabilities. This bias might
be reduced using molted feathers and genetic markers to estimate survival.
Genetics can also be used to identify species, the presence of hybrids, and the
prevalence of disease or invasive species. For example, genetics has been used to
identify the historical geographical range of fisher (Martes pennanti) (Aubry et al.
2004; Schwartz 2007), the presence of Canada lynx (Lynx canadensis) (McKelvey
et al. 2006), hybridization between bobcats (Lynx rufus) and lynx (Schwartz et al.
2004), and hybridization between northern spotted owls (Strix occidentalis cau-
rina) and barred owls (Strix varia) (Haig et al. 2004).
Genetics can also be used to estimate effective population size and changes in
allele frequencies. This information is critical to understanding patterns of gene
flow and effects of habitat fragmentation on populations. The insight provided by
these approaches and others has tremendous implications for present and future
management of these species. Ultimately, the success of that management can only
be assessed with continued monitoring in the mode of adaptive management.
Fig. 7.8 A seven-step generalized adaptive management system illustrating the series of steps
and feedback loops. Reproduced from Moir and Block (2001), with kind permission from Oxford
University Press
Fig. 7.9 Hypothetical curve of the statistical power needed to detect a population trend in a
population (from USDI Fish and Wildlife Service 1995)
Thresholds and trigger points represent predetermined levels that when exceeded
will lead to an action or response. The action or response could be termination or
modification of a particular activity. For example, consider a prescribed fire project
to be conducted in spotted owl habitat. The plan calls for treating 5,000 ha, spread
across six separate fires. The fire plan calls for no special protection of trees, but
predicts that no trees >60 cm dbh will be killed by the fire. Two statistical
tests are developed, one to test the spatial extent of loss and the other to test for the
absolute magnitude of the percentage of large trees lost. A monitoring plan is devel-
oped following a standard protocol (see Box 7.4). Postfire monitoring determined
that too many large trees were lost, exceeding prefire predictions and
resulting in feedback into the system. In this case, the loss of any large trees
signified a threshold that was exceeded. If actions were then developed and initiated
to mitigate the loss of trees, then the threshold becomes a trigger point. Future
prescriptions may require removing litter and flammable debris from the base of
large trees to minimize the probability of tree mortality.
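One way to formalize the first of the two statistical tests described above is a one-sided binomial test on the proportion of monitored plots whose losses exceeded predictions; the plot counts and null proportion below are hypothetical.

```python
from math import comb

def binom_p_upper(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): a one-sided test of whether the
    observed proportion of plots exceeding predicted loss is larger than
    the null proportion p0."""
    return sum(comb(n, x) * p0 ** x * (1 - p0) ** (n - x)
               for x in range(k, n + 1))

# Hypothetical monitoring result: losses exceeded predictions on 7 of 25
# plots, tested against a null proportion of 0.10.
p_value = binom_p_upper(7, 25, 0.10)
print(round(p_value, 4))
```

A small p-value would indicate the threshold was exceeded, triggering the mitigation step described in the text.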
Table 7.2 specifies necessary minimum sample sizes for two null propor-
tions. Application of these sample sizes will depend on the particular Key
Habitat Component and the number of acres treated. See below for more
specific guidelines.
This analysis involves a two-step process to evaluate whether the treatment
was implemented correctly. The first step is to compare the observed proportion
of plots where losses exceeded predictions with the proportion expected under
the null hypothesis. If the observed proportion is less than the null proportion, then the project was
Fig. 7.10 Confidence limits plotted for a range of observed proportions with sample size
specified at (a) n = 25 and (b) n = 50
sc = su (1 − f),
Ganey et al. (2004) reported the results of a pilot study conducted in 1999.
The study occurred within the Upper Gila Mountains Recovery Unit on 25 quadrats
of 40–76 km2. Quadrats were stratified into high and low density, and field
sampling followed established mark–recapture protocols for this subspecies.
They concluded that the approach was technically feasible but impractical given
the costs and logistics of conducting field sampling. They also found that temporal
variation inherent to Mexican spotted owl populations was so large that the power
to detect a population trend was relatively low. They proposed occupancy monitoring
as a cost-effective alternative to mark–recapture, a proposal under serious
consideration by the Mexican Spotted Owl Recovery Team.
Many inventory and monitoring studies are short term, occur within a restricted
area, or both. Indeed, studies done to inventory an area for species of concern prior
to implementing a habitat-altering activity do not have the luxury of a long-term
study. To accommodate this situation, specific studies should be done or existing
data sets should be analyzed to establish the minimum amount of time for the study
to provide reliable information. For rare or elusive species, such studies would
focus on the amount of time and number of sampling points needed to detect a spe-
cies if it is present. For studies whose goal is to develop a list of the species present,
pilot studies or existing data could be used to develop species accumulation curves
that can help to define the amount of effort needed to account for most species
present (Fig. 7.2).
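A species accumulation curve of the kind described can be computed from repeat-visit detections; the visit data below are invented for illustration.

```python
import random

def accumulation_curve(visit_lists, reps=200, seed=0):
    """Mean number of unique species after k visits, averaged over random
    orderings of the visits (so the curve does not depend on visit order)."""
    rng = random.Random(seed)
    n = len(visit_lists)
    totals = [0.0] * n
    for _ in range(reps):
        order = visit_lists[:]
        rng.shuffle(order)
        seen = set()
        for k, visit in enumerate(order):
            seen.update(visit)
            totals[k] += len(seen)
    return [t / reps for t in totals]

# Hypothetical point counts: species detected on each of six visits.
visits = [{"A", "B"}, {"A", "C"}, {"B", "C", "D"},
          {"A"}, {"C", "E"}, {"B", "C"}]
curve = accumulation_curve(visits)
print([round(x, 1) for x in curve])  # flattens as few new species remain
```

Where the curve flattens suggests the level of effort beyond which additional visits add few new species.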
Often studies are restricted in the amount of area available for study. This may
occur in an island situation either in the traditional sense or when a patch of vegeta-
tion is surrounded by a completely different vegetation type (e.g., riparian habitats
in the southwest). Small areas also occur when the area of interest is restricted. An
example is when development is planned for a small parcel of land and the objec-
tive is to evaluate the species potentially affected within that parcel. In these situa-
tions, you are not so much faced with a sampling problem as you are with a sample
size problem. Given the small area, you should strive to detect every individual and
conduct a complete census. Even so, you may have too few data to permit rigorous
treatment of the data for many of the species encountered. Various tools such as
rarefaction and bootstrapping can be used to compensate for the small samples
encountered in small-area studies.
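A percentile-bootstrap sketch of the kind of small-sample compensation mentioned above; the counts are hypothetical.

```python
import random

def bootstrap_ci(data, reps=2000, alpha=0.05, seed=7):
    """Percentile bootstrap confidence interval for the mean of a small
    sample: resample with replacement, recompute the mean each time, and
    take the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    boots = sorted(
        sum(rng.choice(data) for _ in data) / len(data) for _ in range(reps))
    lo = boots[int(reps * alpha / 2)]
    hi = boots[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

counts = [3, 0, 5, 2, 7, 1, 4, 2]   # hypothetical counts from a small area
lo, hi = bootstrap_ci(counts)
print(f"mean = {sum(counts) / len(counts):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```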
Ideally, monitoring studies should occur over long periods. The objective of such
studies is to document trends that can help to inform predictions of future trajecto-
ries. Many of these monitoring programs also occur over wide geographic areas,
such as the Breeding Bird Survey, Christmas Bird Counts, and Forest Inventory and
Assessment. The logistics of implementing such large-scale, long-term monitoring
programs is daunting, and potentially compromises integrity of the data. Perhaps
the major hurdle of long-term, regional studies is to make sure that protocols are
followed consistently over time. For example, the Breeding Bird Survey is a long-term
monitoring program that occurs throughout the United States (Sauer et al. 2005). The
survey entails conducting point counts along established road transects. Unfortunately,
coverage of these transects varies from year to year, which reduces the effectiveness
of the monitoring program and necessitates innovative analyses to fill in the gaps.
Transects are sampled by a large number of people, with varying levels of expertise,
skill, and ability. A similar situation occurs with habitat monitoring and the US
Forest Service’s Forest Inventory and Assessment program. This program includes
vegetation plots on a 5,000-m grid with plots on lands regardless of ownership
(i.e., not just on Forest Service land). For various reasons the Forest Service has
altered the sampling design by changing the number of plots surveyed, revising
measurement protocols, and changing the frequency at which plots are sampled. These
changes effectively compromise the ability to examine long-term trends because of
the difficulty of sorting out variation ascribed to changes in sampling protocols
from variation resulting from vegetation change.
7.10 Summary
Inventory and monitoring are key aspects of wildlife biology and management; they
can be done in pursuit of basic knowledge or as part of the management process.
Inventory is used to assess the state or status of one or more resources, whereas
monitoring is typically done to assess change or trend. Monitoring can be classified
into four overlapping categories:
● Implementation monitoring is used to assess whether or not a directed manage-
ment action was carried out as designed.
● Effectiveness monitoring is used to evaluate whether a management action met
its desired objective.
● Validation monitoring is used to evaluate whether an established management
plan is working.
● Compliance monitoring is used to see if management is occurring according to
established law.
Selecting the appropriate variable to inventory or monitor is a key aspect of the
study design; direct measures, such as population numbers, are preferred over
indirect measures, such as indicator species. The length of monitoring studies depends
largely on the process or variable being studied. The appropriate length often
exceeds available resources, necessitating alternative approaches such as retrospec-
tive studies, modeling, genetic tools, substituting space for time, and substituting
fast for slow dynamics.
Time, cost, and logistics often influence the feasibility of what can be done. Use of
indices can be an effective way to address study objectives provided data are collected
following an appropriate study design and data are analyzed correctly. Indices can be
improved and calibrated using ratio estimation and double-sampling techniques.
Monitoring effects of management actions requires a clear and direct linkage
between study results and management activities, often expressed as a feedback
loop. Feedback is essential for assessing the efficacy of monitoring and for validat-
ing or changing management practices. Failure to complete the feedback process
negates the intent and value of monitoring.
References
Allen, T. F. H., and T. W. Hoekstra. 1992. Towards a Unified Ecology. Columbia University Press,
New York, NY.
Anderson, D. R. 2001. The need to get the basics right in wildlife field studies. Wildl. Soc. Bull.
29: 1294–1297.
Anderson, D. R. 2003. Response to Engeman: Index values rarely constitute reliable information.
Wildl. Soc. Bull. 31: 288–291.
Aubry, K., S. Wisely, C. Raley, and S. Buskirk. 2004. Zoogeography, spacing patterns and dispersal
in fishers: insights gained from combining field and genetic data, in D. J. Harrison, A. K.
Fuller, and G. Proulx, Eds. Martins and Fishers (Martes) in Human-Altered Environments: An
International Perspective, pp. 201–220. Springer Academic, New York, NY.
Bart, J., S. Droege, P. Geissler, B. Peterjohn, and C. J. Ralph. 2004. Density estimation in wildlife
surveys. Wildl. Soc. Bull. 32: 1242–1247.
Block, W. M., and L. A. Brennan. 1993. The habitat concept in ornithology: Theory and applica-
tions, in D. M. Power, Ed. Current Ornithology, vol. 11, pp. 35–91. Plenum, New York, NY.
Block, W. M., M. L. Morrison, J. Verner, and P. N. Manley. 1994. Assessing wildlife–habitat-
relationships models: A case study with California oak woodlands. Wildl. Soc. Bull. 22:
549–561.
Boyce, M. S. 1992. Population viability analysis. Annu. Rev. Ecol. Syst. 23: 481–506.
Burnham, K. P., and D. R. Anderson. 2002. Model Selection and Multimodel Inference: A
Practical Information-Theoretic Approach, 2nd Edition. Springer-Verlag, New York, NY.
Caswell, H. 2001. Matrix Population Models: Construction, Analysis, and Interpretation, 2nd
Edition. Sinauer Associates, Inc., Sunderland, MA.
Clements, F. E. 1920. Plant Indicators. Carnegie Institute, Washington, DC.
Cochran, W. G. 1977. Sampling Techniques, 3rd Edition. Wiley, New York, NY.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. Erlbaum,
Hillsdale, NJ.
Cohen, J. 1992. A power primer. Psychol. Bull. 112: 155–159.
Davis, M. B. 1989. Retrospective studies, in G. E. Likens, Ed. Long-Term Studies in Ecology:
Approaches and Alternatives, pp. 71–89. Springer-Verlag, New York, NY.
DeAngelis, D. L., and L. J. Gross. 1992. Individual-Based Models and Approaches in Ecology.
Chapman and Hall, London.
Dunham, A. E., and S. J. Beaupre. 1998. Ecological experiments: Scale, phenomenology, mecha-
nism, and the illusion of generality, in J. Bernardo and W. J. Resetarits Jr., Eds. Experimental
Ecology: Issues and Perspectives, pp. 27–49. Oxford University Press, Oxford.
Eberhardt, L. L., and M. A. Simmons. 1987. Calibrating population indices by double sampling.
J. Wildl. Manage. 51: 665–675.
Efron, B. 1982. The Jackknife, The Bootstrap, and Other Resampling Plans. Society for Industrial
and Applied Mathematics, Philadelphia, PA, 92 pp.
Engeman, R. M. 2003. More on the need to get the basics right: Population indices. Wildl. Soc.
Bull. 31: 286–287.
Fleiss, J. L. 1981. Statistical Methods for Rates and Proportions, 2nd Edition. Wiley, New York, NY.
Franklin, J. F. 1989. Importance and justification of long-term studies in ecology, in G. E. Likens,
Ed. Long-Term Studies in Ecology: Approaches and Alternatives, pp. 3–19. Springer-Verlag,
New York, NY.
Ganey, J. L., G. C. White, D. C. Bowden, and A. B. Franklin. 2004. Evaluating methods for moni-
toring populations of Mexican spotted owls: A case study, in W. L. Thompson, Ed. Sampling
Rare and Elusive Species: Concepts, Designs, and Techniques for Estimating Population
Parameters, pp. 337–385. Island Press, Washington, DC.
Gray, P. A., D. Cameron, and I. Kirkham. 1996. Wildlife habitat evaluation in forested ecosystems: Some
examples from Canada and the United States, in R. M. DeGraaf and R. I. Miller, Eds. Conservation
of Faunal Diversity in Forested Landscapes, pp. 407–536. Chapman and Hall, London.
Green, R. H. 1979. Sampling Design and Statistical Methods for Environmental Biologists. Wiley,
New York, NY.
Haig, S. M., T. D. Mullins, E. D. Forsman, P. W. Trail, and L. Wennerberg. 2004. Genetic identi-
fication of spotted owls, barred owls, and their hybrids: Legal implications of hybrid identity.
Conserv. Biol. 18: 1347–1357.
Hayek, L. C. 1994. Analysis of amphibian biodiversity data, in W. R. Heyer, M. A. Donnelly, R.
W. McDiarmid, L. C. Hayek, and M. S. Foster, Eds. Measuring and Monitoring Biological
Diversity: Standard Methods for Amphibians, pp. 207–269. Smithsonian Institution Press,
Washington, DC.
Hayes, J. P., and R. J. Steidel. 1997. Statistical power analysis and amphibian population trends.
Conserv. Biol. 11: 273–275.
Hedges, L. V., and Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic, San Diego, CA.
Heyer, W. R., M. A. Donnelly, R. W. McDiarmid, L. C. Hayek, and M. S. Foster. 1994. Measuring
and Monitoring Biological Diversity: Standard Methods for Amphibians. Smithsonian
Institution Press, Washington, DC.
Hohn, M. E. 1976. Binary coefficients: A theoretical and empirical study. J. Int. Assoc. Math.
Geol. 8: 137–150.
Holling, C. S. 1966. The functional response of invertebrate predators to prey density. Mem.
Entomol. Soc. Can. 48: 1–86.
Hunter Jr., M. L. 1991. Coping with ignorance: The coarse filter strategy for maintaining biodi-
versity, in K. A. Kohm, Ed. Balancing on the Brink of Extinction: The Endangered Species
Act and Lessons for the Future, pp. 266–281. Island Press, Washington, DC.
Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecol.
Monogr. 54: 187–211.
Jaccard, P. 1901. The distribution of the flora in the alpine zone. New Phytol. 11: 37–50.
Lande, R., S. Engen, and B. -E. Sæther. 2003. Stochastic Population Dynamics in Ecology and
Conservation. Oxford University Press, Oxford.
Landres, P. B., J. Verner, and J. W. Thomas, 1988. Ecological uses of vertebrate indicator species:
A critique. Conserv. Biol. 2: 316–328.
MacKenzie, D. I., J. A. Royle, J. A. Brown, and J. D. Nichols. 2004. Occupancy estimation and
modeling for rare and elusive populations, in W.L. Thompson, Ed. Sampling Rare or Elusive
Species. pp. 142–172. Island Press, Covelo, CA.
Marascuilo, L. A., and M. McSweeny. 1977. Non-Parametric and Distribution-Free Methods for
the Social Sciences. Brooks/Cole, Monterey, CA.
References 311
May, R. M. 1974. Stability and Complexity in Model Ecosystems, 2nd Edition. Princeton
University Press, Princeton.
McCallum, H. 2000. Population Parameters: Estimation for Ecological Models. Blackwell,
Malden, MA.
McCullough, D. R., and R. H. Barrett. 1992. Wildlife 2001: Populations. Elsevier, London.
McKelvey, K. S., and D. E. Pearson. 2001. Population estimation with sparse data: The role of
estimators versus indices revisited. Can. J. Zool. 79: 1754–1765.
McKelvey, K. S., J. von Kienast, K. B. Aubry, G. M. Koehler, B. T. Maletzke, J. R. Squires, E. L.
Lindquist, S. Loch, M. K. Schwartz. 2006. DNA Analysis of hair and scat collected along
snow tracks to document the presence of Canada lynx. Wildl. Soc. Bull. 34(2): 451–455.
Miller, R. I. 1996. Modern approaches to monitoring changes in forests using maps, in R. M.
DeGraaf and R. I. Miller, Eds. Conservation of Faunal Diversity in Forested Landscapes, pp.
595–614. Chapman and Hall, London.
Moir, W. H., and W. M. Block. 2001. Adaptive management on public lands in the United States:
Commitment or rhetoric? Environ. Manage. 28: 141–148.
Morrison, M. L. 1986. Birds as indicators of environmental change. Curr. Ornithol. 3: 429–451.
Morrison, M. L. 1992. The design and importance of long-term ecological studies: Analysis of
vertebrates in the Inyo-White Mountains, California, in R. C. Szaro, K. E. Severson, and D. R.
Patton, Tech. Coords. Management of Amphibians, Reptiles, and Small Mammals in North
America, pp. 267–275. USDA Forest Service, Gen. Tech. Rpt. RM-166. Rocky Mountain
Forest and Range Experiment Station, Fort Collins, CO.
Morrison, M. L., and B. G. Marcot. 1995. An evaluation of resource inventory and monitoring
programs used in national forest planning. Environ. Manage. 19: 147–156.
Morrison, M. L., B. G. Marcot, and R. W. Mannan. 1998. Wildlife–Habitat Relationships:
Concepts and Application, 2nd Edition. University of Wisconsin Press, Madison, WI.
Noon, B. R., T. A. Spies, and M. G. Raphael. 1999. Conceptual basis for designing an effectiveness
monitoring program, in B. S. Mulder, B. R. Noon, T. A. Spies, M. G. Raphael, C. J. Palmer, A.
R. Olsen, G. H. Reeves, and H. H. Welsh, Tech. Coords. The Strategy and Design of the
Effectiveness Monitoring Program in the Northwest Forest Plan, pp. 21–48. USDA Forest
Service, Gen. Tech. Rpt. PNW-GTR-437. Pacific Northwest Research Station, Portland, OR.
Pickett, S. T. A. 1989. Space-For-Time Substitutions as an Alternative to Long-Term Studies, in
G. E. Likens, Ed. Long-Term Studies in Ecology: Approaches and Alternatives, pp. 110–135.
Springer-Verlag, New York, NY.
Pielou, E. C. 1977. Mathematical Ecology, 2nd Edition. Wiley, New York, NY.
Reynolds, R. T., and S. M. Joy. 2006. Demography of Northern Goshawks in Northern ARIZONA,
1991–1996. Stud. Avian Biol. 31: 63–74.
Rinkevich, S. E., J. L. Ganey, W. H. Moir, F. P. Howe, F. Clemente, and J. F. Martinez-Montoya.
1995. Recovery units, in Recovey Plan for the Mexican Spotted Owl (Strix occidentalis lucida),
vol. I, pp. 36–51. USDI Fish and Wildlife Service, Southwestern Region, Albuquerque, NM.
Romesburg, H. C. 1981. Wildlife science: Gaining reliable knowledge. J. Wild. Manage. 45: 293–313.
Rosenstock, S. S., D. R. Anderson, K. M. Giesen, T. Leukering, and M. E. Carter. 2002. Landbird
counting techniques: Current practices and alternatives. Auk 119: 46–53.
Sauer, J. R., J. E. Hines, and J. Fallon. 2005. The North American Breeding Bird Survey, Results and
Analysis 1966–2005. Version 6.2.2006. USGS Patuxent Wildlife Research Center, Laurel, MD
Schwartz, M. K. 2007. Ancient DNA confirms native rocky mountain fisher Martes pennanti
avoided early 20th century extinction. J. Mam. 87: 921–92.
Schwartz, M. K., K. L. Pilgrim, K. S. McKelvey, E. L. Lindquist, J. J. Claar, S. Loch, and L. F.
Ruggiero. 2004. Hybridization between Canada lynx and bobcats: Genetic results and man-
agement implications. Conserv. Genet. 5: 349–355.
Schwartz, M. K., G. Luikart, and R. S. Waples. 2007. Genetic monitoring as a promising tool for
conservation and management. Trends Ecol. Evol. 22: 25–33.
Scott, J. M., F. Davis, B. Csuti, R. Noss, B. Butterfield, C. Grives, H. Anderson, S. Caicco, F.
D’Erchia, T. Edwards Jr., J. Ulliman, and R. G. Wright. 1993. Gap analysis: A geographical
approach to protection of biodiversity. Wildl. Monogr. 123.
312 7 Inventory and Monitoring Studies
Sharp, A., M. Norton, A. Marks, and K. Holmes. 2001. An evaluation of two indices of red fox
(Vulpes vulpes) abundance in an arid environment. Wildl. Res. 28: 419–424.
Shugart, H. H. 1989. The role of ecological models in long-term ecological studies, in G. E.
Likens, Ed. Long-Term Studies in Ecology: Approaches and Alternatives, pp. 90–109.
Springer-Verlag, New York, NY.
Sokal R. R., and C. D. Michener. 1958. A statistical method for evaluating systematic relation-
ships. Univ. Kansas Sci. Bull. 38: 1409–1438.
Sokal, R. R., and F. J. Rohlf. 1969. Biometry: The Principles and Practice of Statistics in
Biological Research. Freeman, San Francisco, CA.
Sorensen, T. 1948. A method for establishing groups of equal amplitude in plant sociology based
on similarity of species content, and its applications to analyses of the vegetation on Danish
commons. Det Kongelige Danske Viden-skkabernes Selskab, Biloogiske Skrifter 5: 1–34.
Spellerberg, I. F. 1991. Monitoring Ecological Change. Cambridge University Press, New York, NY.
Steidl, R. J., J. P. Hayes, and E. Schauber. 1997. Statistical power analaysis in wildlife research.
J. Wildl. Manage. 61: 270–279.
Strayer, D., J. S. Glitzenstein, C. G. Jones, J. Kolasa, G. Likens, M. J. McDonnell, G. G. Parker,
and S. T. A. Pickett. 1986. Long-Term Ecological Studies: An Illustrated Account of Their
Design, Operation, and Importance to Ecology. Occas Pap 2. Institute for Ecosystem Studies,
Millbrook, New York, NY.
Swetnam, T. W. 1990. Fire history and climate in the southwestern United States, in J. S.
Krammes, Tech. Coord. Effects of Fire Management of Southwestern Natural Resources, pp.
6–17. USDA Forest Service, Gen. Tech. Rpt. RM-191. Rocky Mountain Forest and Range
Experiment Station, Fort Collins, CO.
Swetnam, T. W., and J. L. Bettancourt. 1998. Mesoscale disturbance and ecological response to
decadal climatic variability in the American Southwest. J. Climate 11: 3128–3147.
Thomas, L. 1997. Retorspective power analysis. Conserv. Biol. 11: 276–280.
Thompson, S. K. 2002. Sampling, 2nd Edition. Wiley, New York, NY.
Thompson, W. L., G. C. White, and C. Gowan, 1998. Monitoring Vertebrate Populations.
Academic, San Diego, CA.
USDA Forest Service. 1996. Record of Decision for Amendment of Forest Plans: Arizona and
New Mexico. USDA Forest Service, Southwestern Region, Albuquerque, NM.
USDI Fish and Wildlife Service. 1995. Recovery Plan for the Mexican Spotted Owl (Strix occi-
dentalis lucida), vol. I. USDI Fish and Wildlife Service, Albuquerque, NM.
Verner, J. 1983. An integrated system for monitoring wildlife on the Sierra Nevada Forest. Trans.
North Am. Wildl. Nat. Resour. Conf. 48: 355–366.
Verner, J., and L. V. Ritter. 1985. A comparison of transects and point counts in oak-pine wood-
lands of California. Condor 87: 47–68.
Verner, J., M. L. Morrison, and C. J. Ralph. 1986. Wildlife 2000: Modeling Habitat Relationships
of Terrestrial Vertebrates. University of Wisconsin Press, Madison, WI.
Verner, J., R. J. Gutiérrez, and G. I. Gould Jr. 1992. The California spotted owl: General biology
and ecological relations, in J. Verner, K. S. McKelvey, B. R. Noon, R. J. Gutiérrez, G. I. Gould
Jr., and T. W. Beck, Tech. Coords. The California Spotted Owl: A Technical Assessment of Its
Current Status, pp. 55–77. USDA Forest Service, Gen. Tech. Rpt. PSW-GTR-133. Pacific
Southwest Research Station, Berkeley, CA.
Walters, C. J. 1986. Adaptive Management of Renewable Resources. Macmillan, New York, NY.
White, G. C. 2005. Correcting wildlife counts using detection probabilities. Wildl. Res. 32: 211–216.
White, G. C., and J. D. Nichols, 1992. Introduction to the methods section, in D. R. McCullough
and R. H. Barrett, Eds. Wildlife 2001: Populations, pp. 13–16. Elsevier, London.
White, G. C., W. M. Block, J. L. Ganey, W. H. Moir, J. P. Ward Jr., A. B. Franklin, S. L. Spangle,
S. E. Rinkevich, R. Vahle, F. P. Howe, and J. L. Dick Jr. 1999. Science versus reality in delist-
ing criteria for a threatened species: The Mexican spotted owl experience. Trans. North Am.
Wildl. Nat. Resour. Conf. 64: 292–306.
Chapter 8
Design Applications
8.1 Introduction
Our goal in this chapter is to provide guidance that will enhance the decision-
making process throughout a given study. The often-stated axiom about “the best
laid plans…” certainly applies to study design. The initial plan that any researcher,
no matter how experienced, develops will undergo numerous changes throughout
the duration of a study. As one’s experience with designing and implementing stud-
ies grows, he or she anticipates and resolves more and more major difficulties
before they have an opportunity to impact the smooth progression of the study. The
“feeling” one gains for what will and what will not work in the laboratory or espe-
cially the field is difficult to impart; there is truly no substitute for experience.
However, by following a systematic path of steps when designing and implement-
ing a given study, even the beginning student can avoid many of the most severe
pitfalls associated with research.
The guidelines we present below will help develop the thought process required
to foresee potential problems that may affect a project. We present these guidelines
in a checklist format to ensure an organized progression of steps. Perhaps a good
analogy is that of preparing an oral presentation or writing a manuscript; without a
detailed outline, it is too easy to leave out many important details. This is at least
partially because the presenter or writer is quite familiar with their subject (we
hope), and tends to take many issues and details for granted. The speaker, however,
can always go back and fill in the details, and the writer can insert text; these are
corrective measures typically impossible in a research project.
Throughout this chapter, we provide examples from our own experiences. We will
focus, however, on a “theme” example, which provides continuity and allows the
reader to see how one can accomplish a multifaceted study. The goal of the theme
study is development of strategies to reduce nest parasitism by the brown-headed
cowbird (Molothrus ater) on host species along the lower Colorado River (which
forms the border of California and Arizona in the desert southwest of the United
States). We provide a brief summary of the natural history of the cowbird in Box 8.1.
The theme example is multifaceted and includes impact assessment (influence
of cowbirds on hosts), treatment effects (response of cowbirds to control measures),
Box 8.1
The brown-headed cowbird is a small blackbird native to this region; it
increased in both abundance and geographic range during the late 1800s
and throughout the 1900s. This expansion is apparently due to the ability
of the cowbird to occupy farm and rangelands and other disturbed locations.
The cowbird does not build a nest. Rather, it lays its eggs in the nests of
other species, usually small, open-cup nesting songbirds such as warblers,
vireos, and sparrows. The host adults then raise the nestling cowbirds. Research
has shown that parasitized nests produce fewer host young than nonparasitized
nests. Although this is a natural occurrence, the population abundance of cer-
tain host species can be adversely impacted by parasitism. These effects are
exacerbated when the host species is already rare and confined to environmen-
tal conditions favored by cowbirds. In the Southwest, host species that require
riparian vegetation are the most severely impacted by cowbirds. This is because
most (>90% along the lower Colorado River) of the riparian vegetation has
been removed for river channelization and other management activities, thus
concentrating both cowbirds and hosts in a small area.
A study is designed around questions posed by the investigator (see Sect. 1.3).
Although this statement may sound trivial, crafting questions is actually a difficult
and critically important process. If a full list of questions is not produced before the
8.2 Sequence of Study Design, Analysis, and Publication 315
Fig. 8.1 Progression of steps recommended for planning, implementing, and completing a
research project. Reproduced from Morrison et al. (2001), with kind permission from Springer
Science + Business Media
study plan is developed, it is often problematic to either insert new questions into
an ongoing study or answer new questions not considered until the conclusion of
data collection. Thus, the first step should be listing all relevant questions that
should be asked during the study. Then, these questions should be prioritized based
on the importance of the answer to the study. Optimally, we then design the study
to answer the aforementioned questions, in a statistically rigorous fashion, one
question at a time. The investigator must resist the overpowering temptation to try
to address multiple questions with limited resources simply because “they are of
interest.” Such a strategy will, at a minimum, doom a study to mediocrity. We
would much rather see one question answered thoroughly than see a paper report-
ing the results of a series of partially or weakly addressed questions.
Guiding the selection of questions will be the availability of study locations,
time, personnel, and funding. All studies will have limitations for all of these
parameters, which place constraints on the number and types of questions that can
be addressed.
Questions are developed using literature, expert opinion, your own experiences,
intuition, and guesswork. A thorough literature review is an essential cornerstone
of all studies. There is no need to reinvent the wheel; we should attempt to advance
knowledge rather than simply repeating a study in yet another geographic location.
Likewise, we should critically evaluate published work and not repeat biased or
substandard research. We must be familiar with the past before we can advance into
the future. The insights offered by people experienced in the field of interest always
should be solicited, because these individuals have ideas and intuition often una-
vailable in their publications. You should synthesize all of these sources and
develop your own ideas, blended with intuition, to devise your questions.
Box 8.2
Here is our prioritization of the study questions for the theme example given
in text under “Questions.”
#1 – h. What is an adequate measure of project success (e.g., reduced
parasitism rates, increased reproductive success, fledgling success)?
This should be the obvious choice given the project goal of determining
ways of lowering parasitism and increasing breeding success of cowbird hosts.
The complicating issue here, however, is that simply lowering parasitism may
not increase reproductive success. This is because other factors, most notably
predation by a wide variety of predator species, could negate any beneficial
effects of adult or egg and young removal. Thus, while documenting a
reduction in parasitism is
essential, it alone is not sufficient to declare the study a success.
#2 – c. Effectiveness of removal of adults vs. removal of eggs and young
It is extremely time consuming to locate host nests. Thus, it would be eas-
ier to focus management on removal of adult cowbirds. It was deemed neces-
sary to examine both methods because of the worry that removal of adults
alone would be inadequate, and the necessity of rapidly determining an effec-
tive management strategy.
#3 – a. Effectiveness of a single trap; i.e., the area of influence a trap exerts
on parasitism and host productivity
The failure to detect a significant influence of adult cowbird removal on
host reproduction could be criticized as resulting from inadequate trapping
effort. Therefore, it was judged to be critical to ensure that an adequate trap-
ping effort be implemented. The decision was made to “overtrap” to avoid
this type of failure; future studies could refine trapping effort if trapping was
shown to be an effective procedure.
#4 – i. Population abundance of cowbirds and host species
The initial reason for concern regarding the influence of cowbird parasit-
ism on hosts was the increase in cowbird abundance and the decrease in many
host species abundance. Therefore, determining trends in cowbird and host
abundance will serve, over time, to determine the degree of concern that
should be placed on this management issue.
Not all questions are mutually exclusive, nor is it possible to produce a
perfect ordering. However, it is critical that a short list of prioritized items be
developed, and that this prioritization be based on the ultimate reason for
conducting the study.
our ability to make reliable predictions, in part because of the failure to follow
the H–D method rigorously (see Sect. 1.4). However, testing hypotheses does not
necessarily lessen the numerous ad hoc explanations that accompany studies that
fail to establish any formal and testable structure. A decent biologist can always
explain a result. Our point here is that simply stating a hypothesis in no way guarantees
Design of the project entails the remaining sections of this chapter. Indeed, even the
final section on publishing should be considered in designing your study (see Sect.
1.3.1). For example, a useful exercise when designing your study is to ask, “How
will I explain this method when I write this up?” If you do not have absolute confi-
dence in your answer, then a revision of methods is called for. Put more bluntly, “If
you cannot explain it, how can you expect a reader to understand it?” A well-written
study plan/proposal can become, in essence, the methods section of a manuscript.
Embedded in our previous discussion of question and hypothesis development are
issues related to delineation of the study population, spatial and temporal extent of
study, sex and age considerations, needed generalizability of results, and so forth.
Each of these separate issues must now be placed into a coherent study design. This
requires that we make decisions regarding allocation of available resources (person-
nel, funding, logistics, and time). At this step, it is likely that any previous failure to
reduce the number of questions being asked will be highlighted.
Delineation of the study population is a critical aspect of any study (see Sects.
1.3–1.5). Such delineation allows one to make definitive statements regarding how
widely study results can be extrapolated. Unfortunately, few research papers make
such a determination. A simple example: The willow flycatcher (Empidonax trail-
lii) is a rare inhabitant of riparian and shrub vegetation throughout much of the
western United States (Ehrlich et al. 1988). It is currently divided into three subspe-
cies, one of which (E. t. extimus) is classified as threatened/endangered and is
restricted to riparian vegetation in the arid Southwest. The other two subspecies
(E. t. brewsteri and E. t. adastus) occur in higher elevation, mountain shrub (espe-
cially willow, Salix) and riparian vegetation; they are declining in numbers but not
federally listed (Harris et al. 1987). Thus, it is unlikely that research findings for
E. t. brewsteri and E. t. adastus would be applicable to E. t. extimus. Further, it is
unlikely that results for studies of E. t. extimus from the lower Colorado River – a
desert environment – would be applicable to populations of this subspecies occur-
ring farther north in less arid regions. Subspecies are often divided into ecotypes
that cannot be separated morphologically. Indeed, the physiological adaptations of
ecotypes are little studied; differences among ecotypes (and thus populations) are
probably a leading reason why research results can seldom be successfully applied
to other geographic locations.
The distribution of sampling locations (e.g., plots, transects) is a critical aspect
of all studies (see Chap. 4). First, the absolute area available to place the plots is
usually limited by extent of the vegetation of interest, legal restrictions preventing
access to land, and differences in environmental conditions. If the goal of a study
is to determine effects of a treatment in riparian vegetation on a wildlife refuge,
then the study is constrained by the extent of that vegetation on a defined area (as
is the case in our theme example).
Second, the behavior of the animal(s) under study will constrain the placement
of sampling locations. Most studies would be severely biased by movement of
animals between what are designated as independent sampling locations. For example,
The optimal design for this study would have been based on at least 1 year of pre-
treatment data collection, followed by random assignment of treatments (adult and/
or egg–young removal). However, funding agencies required a more rapid assess-
ment of treatment effects because of pressure from regulatory agencies. Thus, the
decision was made to forego use of a BACI-type design, and instead use another
impact assessment approach (see Chap. 6). Specifically, pairs of plots were selected
based on similarity in environmental conditions and proximity to one another.
Treatments were then assigned randomly to one member of each pair. Paired plots
were placed in close proximity because of the lack of appropriate (riparian) vegeta-
tion. Because of limited riparian vegetation, it was not possible to have control plots
for adult cowbird removal that were within a reasonably close proximity to treated
plots. This is because previous research indicated that a single cowbird trap could
impact parasitism rates up to at least 0.8 km from the trap. The decision was made
to treat all plots for adult removal, but only one of the pairs would be treated for
egg and young removal. Plots receiving no treatments were placed upstream far
outside the influence of the cowbird traps. This design carries the assumption that
the upstream reference plots will be an adequate indicator of naturally occurring
parasitism rates in the area. Research on parasitism rates conducted several years
prior to the new study provided additional support for this assumption. Thus, the design implemented
for this study is nonoptimal, but typical of the constraints placed on wildlife
research. Textbook examples of experimental design are often impossible to imple-
ment in the real world of field biology (although this is not an excuse for a sloppy
design; see Sect. 8.3). Thus, a certain amount of ingenuity is required to design an
experiment that can still test the relevant hypotheses. Of course, some hypotheses
cannot be tested in the field regardless of the design (see Chap. 1).
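The pairing-and-randomization scheme described above can be sketched in code. This is a minimal illustration only; the plot names and pairs are hypothetical, and in practice pairs would be matched from field measurements of environmental similarity and proximity:

```python
import random

# Hypothetical pairs of riparian plots matched on environmental
# similarity and proximity; all plots receive adult cowbird removal.
pairs = [("A", "B"), ("C", "D"), ("E", "F")]

random.seed(42)  # fix the seed so the assignment is reproducible

assignment = {}
for p1, p2 in pairs:
    # Randomly assign egg/young removal to one member of each pair.
    treated = random.choice([p1, p2])
    control = p2 if treated == p1 else p1
    assignment[treated] = "adult + egg/young removal"
    assignment[control] = "adult removal only"

for plot in sorted(assignment):
    print(plot, "->", assignment[plot])
```

Upstream reference plots receiving no treatment would be tracked separately, outside the area of influence of the traps.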
Variables must be selected that are expected to respond to the treatment being
tested, or be closely linked to the relationship being investigated. Even purely
descriptive, or hypothesis generating, studies should focus sampling efforts on a
restricted set of measurements. A thorough literature review is an essential part of
any study, including development of a list of the variables measured by previous
workers (see Sect. 1.3). However, one must avoid the temptation of developing a
long “shopping list” of variables to measure: This only results in lowered precision
because of smaller sample sizes. Rather, a list of previous measurements should be
used to develop a short list of variables found previously to be of predictive value.
Measuring numerous variables often means that some will be redundant and meas-
ure the same quality (e.g., tree height and basal area). Also, lists of variables from
past studies may provide an opportunity to learn from past efforts; that is, there may
be better variables to measure than those measured before.
For example, since multivariate statistical tools became readily available and
easy to use, there has been a proliferation of studies that collected data on a massive
number of variables. It is not unusual to see studies that gathered data on 20–30
variables or more, or read analyses of all possible subsets of ten or more variables
(which can result in thousands to millions of candidate models); this tendency is especially apparent in
studies of wildlife–habitat relationships (see Morrison et al. 2006 for review).
Preliminary data collection (see Sect. 8.2.7) can aid in developing a short list of
variables. Most studies, including those using multivariate methods, identify only a
few variables that have the majority of predictive power. There is ample evidence
in the literature to justify concentrating on a minimum number of predictive or
response variables.
Each variable measured increases the time spent on each independent sample,
time that could be better spent on additional independent samples. Further,
researchers often tend to use rapid measurement techniques when confronted with
a long list of variables. For example, many workers visually estimate vegetation
variables (e.g., shrub cover, tree height) even though research indicates that visual
methods are inferior to more direct measurements (e.g., using a line intercept for
shrubs and a clinometer for tree height; see Block et al. 1987). Thus, there is ample
reason to concentrate on a few, carefully selected variables.
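One simple way to build such a short list is to screen candidate variables for redundancy before committing to them in the field. The sketch below uses simulated data and invented variable names; the point is only that a pairwise correlation screen flags near-duplicate measurements (such as tree height and basal area) so one member of each redundant pair can be dropped:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated measurements of four candidate habitat variables on 50 plots.
tree_height = rng.normal(15, 3, 50)
basal_area = 0.9 * tree_height + rng.normal(0, 1, 50)  # nearly redundant
shrub_cover = rng.uniform(0, 100, 50)
canopy_cover = rng.uniform(0, 100, 50)

names = ["tree_height", "basal_area", "shrub_cover", "canopy_cover"]
data = np.column_stack([tree_height, basal_area, shrub_cover, canopy_cover])
corr = np.corrcoef(data, rowvar=False)

# Flag pairs with |r| > 0.7 as candidates for dropping one member.
redundant = [(names[i], names[j])
             for i in range(len(names))
             for j in range(i + 1, len(names))
             if abs(corr[i, j]) > 0.7]
print(redundant)
```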
This is also the time to identify potential covariates (see Chap. 3). Analysis of
covariance is an indirect, or statistical, means of controlling variability due to
experimental error that increases precision and removes potential sources of bias.
As a reminder, statistical control is achieved by measuring one or more concomitant
variates in addition to the variate of primary interest (i.e., the response variable).
Measurements on the covariates are made for adjusting the measurements on the
primary variate. For example, in conducting an experiment on food preference,
previous experience will likely influence results. The variate in this experiment may
be a measure of the time taken to make a decision; the covariate may be a measure
associated with the degree of experience at the start of the trials. Thus, covariates
must be designed into the study, and should not be an afterthought. Covariates may
be used along with direct experimental design control. Care must be taken in the
use of covariates or misleading results can result (see Chap. 3 (see also Winer et al.
1991, Chap. 10). Step 5 (Sect. 8.2.5) incorporates further discussion of variable
selection.
One of our research interests is the movement of males and females by age within and
between riparian patches. The process of selecting variates to measure begins with
prioritization of the information needed to address project goals. Where and when the
birds are located and their activity (e.g., foraging, nest searching) is important because
this provides information useful in locating traps and searching for host nests that
have been parasitized. The sex composition of the birds (such as in flocks) is important
because it provides information on (1) how the sex ratio may be changing with
trapping and (2) where females may be concentrating their activities. The age com-
position also is important because it provides a measure of experimental success (i.e.,
how many cowbirds are being produced). Finally, the type of vegetation used by
cowbirds provides data on foraging and nest-searching preferences.
An important aspect of the cowbird study is the effectiveness of the removal of
adults in controlling parasitism. Confounding this analysis could be the location of
a host nest within a study plot. This is because research has shown that cowbirds
seem to prefer edge locations (e.g., Laymon 1987; Robinson et al. 1993; Morrison
et al. 1999), with a decreasing parasitism rate as you move farther into the interior
of a woodland. Thus, the variate may be a measure of host nesting success or para-
sitism rate, with the covariate being the distance of the nest from the plot edge.
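As a sketch of how such a covariate enters the analysis, the fragment below simulates nests whose parasitism declines with distance from the plot edge and is lowered further by trapping, then fits an ANCOVA-style linear model so the treatment effect is adjusted for the covariate. All values and effect sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100

# Simulated nests: parasitism declines with distance from the plot edge,
# and trapping (treatment = 1) lowers it further (hypothetical effects).
distance = rng.uniform(0, 200, n)      # meters from plot edge
treatment = rng.integers(0, 2, n)      # 0 = reference, 1 = trapped
parasitism = (0.6 - 0.002 * distance - 0.2 * treatment
              + rng.normal(0, 0.05, n))

# ANCOVA as a linear model: response ~ intercept + treatment + covariate.
X = np.column_stack([np.ones(n), treatment, distance])
coef, *_ = np.linalg.lstsq(X, parasitism, rcond=None)
print("adjusted treatment effect:", coef[1])
print("covariate (distance) slope:", coef[2])
```

Ignoring the covariate would fold the edge effect into the error term, and could bias the apparent treatment effect if treated plots happened to hold more edge nests.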
There are usually many methods available for recording data on a specific variable.
For example, birds can be counted using fixed area plots, variable distance meas-
ures, and transects; vegetation can be measured using points, plots of various sizes
and shapes, and line transects; and so forth (see Chap. 4). However, many of these
decisions will be guided by the objectives of the study and, thus, by the specific types
of data needed and the precision required of them. For example, if study objectives
require that nesting success be determined, then an intensive, site-specific
counting method (such as spot mapping) might be most appropriate. This same
method, however, would not necessarily be appropriate for a study of bird abun-
dance along an elevational gradient. Thus, there is no “best” method; rather, there
are methods that are most appropriate for the objectives of any study.
8.2 Sequence of Study Design, Analysis, and Publication 323
Once the information to be recorded is determined and the collection strategy estab-
lished, the next step is to group the items in some meaningful and efficient manner,
and order the items within the groups. Developing an efficient recording order
greatly simplifies data collection, thus reducing observer frustration and increasing
the quantity and quality of the data. For our theme example, some items refer to an
individual bird, some to the flock the birds may be in, some to the foraging sub-
strate, some to the foraging site, and so on. Items within each group are then
ordered in a manner that will maximize data collection. For example, items related
to the foraging individual include:
● Age
● Sex
● Plant species the bird is in or under
● Substrate the bird is directing foraging upon
● The behavior of the bird
● Rate of foraging
324 8 Design Applications
This is the “action plan” for data collection. This plan includes three specific parts:
1. Sampling protocol
2. Data form
3. Variable key
In Box 8.3 we present an example of these three steps that was developed for a
study of the distribution, abundance, and habitat affiliations of the red-legged frog
(Rana aurora draytonii).
Box 8.3
The material given below outlines an example of the three primary products necessary to properly organize a data-collection strategy: (1) sampling protocol, (2) variable key, and (3) data form.
1. Sampling Protocol
This protocol is designed to quantify the habitat use of the red-legged frog
(RLF) at three hierarchical scales. It is focused on aquatic environments,
although the species is known to use upland sites during certain periods of the
year. The hierarchy used is based on Welsh and Lind (1995). Terrestrial sight-
ings are accommodated in this protocol in a general fashion, although more
specific information would need to be added at the microhabitat scale than is
provided herein.
The landscape scale describes the general geographic relationship of each
reach sampled. Additional information that could be recorded in separate
notes includes distance from artificial water containments; the distance to,
and type of, human developments; and so forth. The appropriate measure of
the animal associated with this broad scale would be presence or absence of
the species. The macrohabitat scale describes individual segments or plots
stratified within each reach by general features of the environment. The
appropriate measure of the animal associated with this midscale is abundance
by life stage. The microhabitat scale describes the specific location of an egg
mass, tadpole, or adult.
All data should be recorded on the accompanying data form using the
indicated codes. Any changes to the codes, or addition to the codes, must be
indicated on a master code sheet and updated as soon as possible.
The following table summarizes the hierarchical arrangement of sampling
used to determine the distribution, abundance, and habitat affinities of red-legged
frogs, although the general format is applicable to a variety of species (see Welsh
and Lind 1996). Variables should be placed under one classification only.
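One way to make the three-scale hierarchy concrete is as a nested record in which each observation carries its landscape, macrohabitat, and microhabitat context. The sketch below is illustrative only; the field names and codes are assumptions, not taken from the actual data form.

```python
# A minimal sketch of one observation under the three-scale hierarchy.
# Field names and codes are hypothetical stand-ins for the protocol's variables.
observation = {
    "landscape": {"reach_id": "R03", "species_present": True},      # presence/absence
    "macrohabitat": {"segment": 2, "condition": "pool",
                     "sediment": "fine", "shading_pct": 40,
                     "rlf_code": 2},                                # code 2 = 5-10 individuals
    "microhabitat": {"life_stage": "adult", "location": "bank",
                     "water_depth_cm": 15, "dist_to_water_m": 0.5}, # specific location
}

print(observation["macrohabitat"]["rlf_code"])  # → 2
```

Keeping each variable under one scale, as the protocol requires, then amounts to ensuring every field appears in exactly one of the three nested blocks.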
2. Variable Key
The following is a measurement protocol and key for the data form given
below. Each variable on the data form is described on this key. In field sam-
pling, changes made to the sampling procedures would be entered on a mas-
ter key so that data forms can be revised and procedures documented (which
is especially important in studies using multiple observers).
I. Landscape Scale
A. Take measurements from a central, characteristic location in the reach.
II. Macrohabitat Scale
Each reach will be divided into sections or plots based on some readily iden-
tifiable landscape feature, or some feature that is easy to relocate (e.g., bridge,
road, house).
A. Take the mean of measurements needed to characterize the segment (1–3
measurements depending on size of segment)
B. 1. a. Average width of predominant condition given in 2 below
b. Length for nonstream condition
2. Predominant condition: record single predominant condition in the
segment
3. Sediment: coarse (rocks, boulders); medium (gravel, pebble); fine
(sand, silt)
C. Record the indicated “type” of predominant aquatic vegetation
D. Record the percentage of the stream within the segment that is obscured
(shaded) by live foliage of any plant species
E. The goal is to note if any life stage of RLF is present; approximate num-
bers can be recorded if time allows (code 1 = 1 or a few egg masses or
individuals; code 2 = 5–10; code 3 = >10).
F. The goal is to note the presence of any predator or competitor; approxi-
mate numbers can be recorded as for E above
III. Microhabitat Scale
Individual animals that are selected for detailed analysis, or all animals within
subsegments or subplots, can be analyzed for site-specific conditions.
A. The goal is to accurately count all RLF by life stage if working within a
subsegment or subplot; or record the specific life stage if sampling micro-
habitat for an individual.
B. Record the location of the life stage by the categories provided (1–4); also
record water depth (4.a.) if applicable.
C. Within the location recorded under B above, record the specific site as indicated (1.a. to c.); also include the distance from water if applicable (2.).
D. 1. Record the estimated percentage cover of the four predominant aquatic
plant species within the subsegment/subplot, or within a 1-m radius of
the individual RLF; also record the vigor (D.2.) and height (D.3.) of each
plant species.
Suppose we have indeed decided to record details on foraging birds, their movements, and related topics. The questions we might ask follow directly from Step 1 (Sect. 8.2.1), Questions.
The critical importance of determining adequate sample sizes and conducting power analyses was developed in Sects. 2.6.6 and 2.6.7, respectively (see also Thompson et al.
1998, Chap. 6). A priori power analyses are often difficult because of the lack of
sufficient data upon which to base estimates of variance. Nevertheless, power cal-
culations remove much of the arbitrariness from study design (power calculations
are performed again at various times during the study, as discussed in Sect. 8.2.7).
In addition, power calculations give a good indication of whether you are trying to collect data on too many variables, thus allowing you to refocus sampling efforts on the
priority variables (see Steidl et al. [1997] for a good example of application of
power analysis to wildlife research).
To determine the number of paired plots (treatment vs. control) needed to rigorously test the null hypothesis of no treatment effect, we used data on nesting biology collected during several previous years of study. Although the data
were not collected from plots, they were collected within the same general areas.
Thus, the data were collected in similar environmental conditions as would be
encountered during the new study. If these data had not been available, then infor-
mation from the literature on studies in similar conditions and with similar species
would have been utilized.
For example, the data available on nesting success of Bell’s vireos, a species
highly susceptible to cowbird parasitism, indicated that nesting success of parasitized
nests was 0.8 ± 1.3 (SD) fledglings/nest. Using this value, effect sizes of 0.5, 1.0, and 1.5 fledglings/nest were evaluated with power of 80% and α = 0.1. Assuming
equal variances between plots, and using calculations for one-sided tests (because
primary interest was in increased nest success with cowbird control), sample sizes
(number of plots) were calculated as >>20 (power = ~50% for n = 20), 15, and 7
for the three effect sizes. Thus, we concluded that time and personnel availability allowed us to identify an effect equal to 1.5 additional fledglings per nest following treatment at 80% power.
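As an illustration, the sample-size calculation described above can be sketched with the standard normal-approximation formula for a one-sided, two-sample comparison of means. The inputs (SD = 1.3, α = 0.1, 80% power) come from the text; the formula itself is a generic approximation, not necessarily the exact procedure used in the study.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.10, power=0.80):
    """Approximate per-group sample size for a one-sided two-sample
    comparison of means (normal approximation, equal variances)."""
    z_a = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_a + z_b) * sd / delta) ** 2)

for delta in (0.5, 1.0, 1.5):
    print(delta, n_per_group(delta, sd=1.3))
```

This approximation reproduces the reported 7 plots for the 1.5-fledgling effect and gives values near the other two reported figures (well over 20 plots for an effect of 0.5); the study's exact numbers presumably came from a slightly different (e.g., t-based) calculation.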
An additional critical step in development of the study plan is independent peer review (see Sect. 1.3.1). Included should be review by experts in the
field and experts in technical aspects of the study, particularly study design and sta-
tistical analyses. Review prior to actually collecting the data can help avoid, although
not eliminate, many problems and wasted effort. In this example, experts from man-
agement agencies helped confirm that the questions being asked were relevant to their
needs, several statisticians were consulted on the procedures being used, and several
individuals studying bird ecology and cowbirds reviewed the design.
All studies should begin with a preliminary phase during which observers are
trained to become competent in all sampling procedures. The development of rigid sampling protocols, as described above (see Sect. 8.2.5), improves the chances that observers will record data in a similar manner. Training of observers should
include:
1. Testing of visual and aural acuity. Much wildlife research involves the ability to
see and hear well. For example, birds can produce calls that are near the limits
of human hearing ability. Slight ear damage, however, can go unnoticed yet
result in an inability to hear high-frequency calls. Hearing tests for personnel
who will be counting birds using sight and sound should be conducted (e.g.,
Ramsey and Scott 1981).
2. Standardization of recording methods. Unfortunately, there is seldom a correct
value with which we can compare samples. Most of our data represent indices
of some “true” but unknown value (e.g., indices representing animal density or
vegetation cover). Further, field sampling usually requires that the observer interpret a behavior or make estimates of animal counts or plant cover. Thus, observers must standardize how such interpretations and estimates are recorded.
An essential part of the theme study was determining the abundance of cowbirds
and potential host species. Because the majority of birds encountered during a for-
mal bird count are heard but not seen, observers must possess advanced identifica-
tion skills and excellent hearing capabilities. Selecting experienced personnel eases
training of observers. Although a talented but inexperienced individual can usually
learn to identify by song the majority of birds in an area within 1–2 mo, it takes
many years of intensive study to be able to differentiate species by call notes,
including especially rare or transient species. Thus, observers should not be asked
to accomplish more than their skill level allows.
Even experienced observers must have ample time to learn the local avifauna;
this usually involves 3–4 weeks of review in the field and the use of song record-
ings. Regardless of the level of experience, all observers must standardize the
recording of data. In the theme study, observers went through the following steps:
1. During the first month of employment, new employees used tapes to learn songs
and call notes of local birds. This included testing each other through the use of
tapes. Additionally, they worked together in the field, making positive visual
identifications of all birds heard and learning flight and other behaviors.
2. While learning songs and calls, observers practiced distance estimation in the
field. Observers worked together, taking turns pointing out objects, with each
observer privately recording their estimate of the distance; the distance was then
measured. This allows observers to achieve both accurate and precise
measurements.
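The distinction between accuracy and precision in distance estimation can be made concrete with a small calculation: the mean error measures systematic bias (accuracy), while the spread of the errors measures consistency (precision). The paired estimates below are hypothetical, not data from the theme study.

```python
from statistics import mean, stdev

# Hypothetical paired data: one observer's estimated vs. taped distances (m).
estimated = [12, 25, 48, 60, 95, 110]
measured  = [10, 27, 50, 55, 100, 105]

errors = [e - m for e, m in zip(estimated, measured)]
bias = mean(errors)     # accuracy: systematic over- or under-estimation
spread = stdev(errors)  # precision: consistency of the estimates
print(f"bias = {bias:+.1f} m, SD of error = {spread:.1f} m")
```

An observer can be precise but inaccurate (small SD, large bias) or accurate but imprecise; training with measured distances, as described above, targets both.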
3. Practice bird counts are an essential part of proper training. When any activity
is restricted in time, such as a 5-min point count, inexperienced observers
become confused, panic, and fail to record reliable data. Having the ability to
identify birds (i.e., being a good bird watcher) does not mean a person can
conduct an adequate count. Thus, realistic practice sessions are necessary to
determine the capability of even good bird-watchers. In the theme study,
observers, accompanied by the experienced project leader, conducted “blind”
point counts during which each person independently recorded what they saw
and heard, as well as an estimate of the distance to the bird. These practice counts were then immediately compared, and discrepancies among observers were discussed and resolved.
The quality of data was enhanced and controlled throughout the study duration by
incorporating activities such as:
● Banding. Repeating of band numbers and colors, sex, age, and all other meas-
urements between observer and recorder. This serves as a field check of data
recording, and keeps field technicians alert.
● Band resightings. Color bands are difficult to see in the field, and many colors
are difficult to differentiate because of bird movements and shadows (e.g., dark
blue vs. purple bands). Observers can practice by placing bands on twigs at vari-
ous distances and heights and under different lighting conditions.
● Bird counts. Regular testing of species identification by sight and sound, dis-
tance estimation, and numbers counted. Technicians do not always improve their
identification abilities as the study proceeds. Numerous factors, such as fatigue,
forgetfulness, deteriorating attitudes, etc., can jeopardize data quality.
● Foraging behavior. Regular testing of assignment of behaviors to categories. As
described above for bird counts, many factors can negatively impact data quality.
Additionally, technicians need to communicate continually any changes in inter-
pretation of behaviors, additions and deletions to the variable list, and the like.
This becomes especially important in studies using distinct field teams operating
in different geographic locations.
● Data proofing. As described above, it is important to proof data after every sampling session.
There should be constant feedback between data collection and QA/QC. Probably one of the principal weaknesses of most studies is a failure to apply QA/QC on a continuing basis. The prevalent but seldom acknowledged problem of observer drift can affect all studies regardless of the precautions taken during observer selection and training. QA/QC should be ongoing, including analysis of sample sizes (are
too many or too few data being collected?). Electronic data loggers can be useful in
many applications. People can collect data in the field and the data loggers can be
programmed to accept only valid codes. Data can then be directly downloaded into
a computerized database for proofing and storage. The database then can be queried
and analyses made in the statistical program of choice. Previously, data loggers often were beyond the financial reach of many projects. This is no longer true, as prices have dropped precipitously and technology has advanced considerably.
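The kind of entry-time validation a data logger performs can be sketched in a few lines: each field value is checked against the master code key before a record is accepted. The field names and codes below are illustrative assumptions, not the study's actual variable key.

```python
# Minimal sketch of entry-time validation against a master code key.
# The codes here are illustrative (standard bird banding age/sex codes).
VALID_CODES = {
    "age": {"HY", "AHY", "SY", "ASY", "U"},
    "sex": {"M", "F", "U"},
    "abundance": {"1", "2", "3"},
}

def validate(record):
    """Return the (field, value) pairs that fail the code key."""
    return [(f, v) for f, v in record.items()
            if f in VALID_CODES and v not in VALID_CODES[f]]

errors = validate({"age": "AHY", "sex": "X", "abundance": "2"})
print(errors)  # the invalid sex code is flagged for correction in the field
```

Rejecting invalid codes at entry, rather than during later proofing, catches mistakes while the observer can still verify the observation.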
A weakness of most studies is failure to enter, proof, and analyze data on a con-
tinuing basis. Most researchers will quickly note that they have scarcely enough time to collect the data, let alone enter and analyze even a portion of it. This is, of course, a fallacious response, because a properly designed study would allocate time for these tasks.
When initiated, most studies have a relatively short (a few months to a few years)
time frame. As such, maintaining the sites usually only requires occasional replacing
of flagging or other markers. It is difficult to anticipate, however, potential future
uses of data and the locations from which they were gathered (i.e., they might become monitoring studies). In southeastern Arizona, for example, few anticipated the dramatic geographic spread of the exotic lovegrasses (Eragrostis spp.) when
they were introduced as cattle forage. However, several studies on the impacts of
grazing on wildlife included sampling of grass cover and species composition,
thereby establishing baseline information on lovegrasses “by accident.” By perma-
nently marking such plots, researchers can now return, relocate the plots, and con-
tinue the originally unplanned monitoring of these species. Unexpected fires,
chemical spills, urbanization, and any other planned or unplanned impact will undoubtedly affect all land areas at some time in the future. Although it is unlikely
that an adequate study design will be available serendipitously, it is likely that some
useful comparisons can be made using “old” data (e.g., to determine sample size for
a planned experiment). All that is required to ensure that all studies can be used in
the future is thoughtful marking and referencing of study plots. All management
agencies, universities, and private foundations should establish a central record-
keeping protocol. Unfortunately, even dedicated research areas often fail to do so;
no agency or university we are aware of has established such a protocol for all
research efforts.
It is difficult to imagine the amount of data that have been collected over time.
Only a tiny fraction of these data resides in any permanent databases. As such,
except for the distillation presented in research papers, these data are essentially
lost to scientists and managers of the future. Here again, it is indeed rare that any
organization requires that data collected by their scientists be deposited and main-
tained in any centralized database. In fact, few maintain any type of catalog that at
least references the location of the data, contents of the data records, and other per-
tinent information.
Perhaps one of the biggest scientific achievements of our time would be the centralization, or at least the central cataloging, of previously collected data (that which is not lost), although such an achievement is unlikely to occur. Each research management organization can, however, initiate a systematic plan for managing both study areas and data in the future.
Our example here is brief because the process initiated on the study was primarily
continuation, on a regular basis, of the routine established under Step 8, QA/QC
(Sect. 8.2.8). Regular testing of observer data gathering was conducted on a
bimonthly basis. Additionally, all study points and plots were located both on topo-
graphic maps of the study area, and by the use of Global Positioning System (GPS)
technology. These data were included in the final report of the study, thus ensuring
that the sponsoring agencies had a formal record of each sampling location that
could be cross-referenced with the original field data.
Testing of hypotheses and evaluating conceptual models are, of course, central fea-
tures of most wildlife studies (see Sect. 1.3). However, several steps should be taken
before formally testing hypotheses or evaluating models. These include:
1. Calculating descriptive statistics. As a biologist, the researcher should be famil-
iar with not only the mean and variance, but also the form of the distribution and
the range of values the data take on. The message: do not rush into throwing
your data into some statistical “black box” without first understanding the nature
of the data set. Of interest should be the distribution (e.g., normal, Poisson,
bimodal) and identification of outliers.
2. Sample size analyses. Ideally, sample size analysis will have been an ongoing process in the study. If not, then this is the time to determine if you did,
indeed, gather adequate samples. Many researchers have found, when attempt-
ing to apply multivariate statistics, that they can only use several of the 10, 20,
or even 100 variables they collected because of limited sample sizes.
3. Testing assumptions. Most researchers understand the need to examine their data
for adherence to test assumptions associated with specific statistical algorithms, such as equality of variances between groups and normality of data for each variable for a t-test, for example. In biological analyses, these assumptions are seldom
met, thus rendering the formal use of standard parametric tests inappropriate. To
counter violation of assumptions, a host of data transformation techniques is
available (e.g., log, square root). However, we offer two primary cautions. First,
remember that we are dealing with biological phenomena that may not corre-
spond to a normal statistical distribution; in fact, they usually do not. Thus, it is
usually far more biologically relevant to find a statistical technique that fits the
data, rather than trying to force your biological data to fit a statistical distribution.
For example, nonlinear regression techniques are available, thus there is little
biological justification for trying to linearize a biological distribution so you can
apply the more widely understood and simple linear regression procedures.
Second, if transformations are applied, they must be successful in forcing the data
into a form that meets assumptions of the test. Most often, researchers simply
state that the data were transformed, but no mention is made of the resulting dis-
tributions (i.e., were the data normalized or linearized?). Nonparametric and
nonlinear methods are often an appropriate alternative to forcing data (especially
when sample sizes are small) to meet parametric assumptions.
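The caution about verifying that a transformation actually worked can be illustrated with a simple skewness check: compute the diagnostic before and after transforming, rather than merely stating that the data were transformed. The counts below are hypothetical, and sample skewness is used here as a rough symmetry diagnostic, not a formal normality test.

```python
from math import log
from statistics import mean, stdev

def skewness(xs):
    """Sample skewness; values near 0 suggest a roughly symmetric distribution."""
    m, s, n = mean(xs), stdev(xs), len(xs)
    return (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in xs)

# Hypothetical right-skewed counts (e.g., cowbirds detected per plot).
counts = [1, 1, 2, 2, 3, 3, 4, 6, 9, 15]
logged = [log(x + 1) for x in counts]   # common log(x + 1) transformation

# Report the diagnostic before and after, to show whether the
# transformation actually reduced the skew.
print(round(skewness(counts), 2), round(skewness(logged), 2))
```

If the transformed data remain strongly skewed, a nonparametric or distribution-appropriate method is the better choice, as argued above.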
Once a specific α-value is set and the analysis performed, the P-value associated with the test is either significant or not significant relative to α. Technically, a result cannot be "highly significant" or "not significant but tending to show a difference." Many research articles provide specific α-values for testing a hypothesis, yet discuss test results as if α were a floating
value. Insightful papers by Cherry (1998) and Johnson (1999) address the issue of
null hypothesis significance testing, and question the historical concentration on
P-values (see Sects. 1.4.1 and 2.6.1). Although we agree with their primary theses,
our point here is that, once a specific analysis has been designed and implemented,
changing your evaluation criteria is not generally acceptable. The salient point is
that the P-value generated from the formal test must be compared directly with the
a-level set during study development; the null hypothesis is either rejected or not
rejected. Borderline cases can certainly be discussed (e.g., a P = 0.071 with an α of 5%). But, because there should have been a very good reason for setting the α-level
in the first place, the fact remains that, in this example, the hypothesis was not
rejected. You have no basis for developing management recommendations as if it
had been rejected. This is why development of a rigorous design, which includes
clear development of the expected magnitude of biological effect (the effect size),
followed by thorough monitoring of sampling methods and sample sizes must be
accomplished throughout the study. Statistical techniques such as power analysis
and model testing (where appropriate) help ensure that rigorous results were indeed
obtained. Many articles and books have been written on hypothesis testing, so we
will not repeat those details here (Winer et al. 1991; Hilborn and Mangel 1997).
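The fixed decision rule argued for here reduces to a one-line comparison between the computed P-value and the α chosen at the design stage. The α of 0.05 below is illustrative.

```python
# A sketch of the fixed decision rule: alpha is set a priori, with
# justification, and the computed P-value is compared with it once.
ALPHA = 0.05  # illustrative; chosen and justified during study design

def decide(p_value, alpha=ALPHA):
    """Apply the preset decision rule; no post hoc adjustment of alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.071))  # borderline, but still → "fail to reject H0"
```

A borderline case such as P = 0.071 may certainly be discussed, but the decision itself does not change after the fact.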
Field sampling indicated that removing cowbird adults from the vicinity of nesting hosts, in combination with removal of cowbird nestlings and addling of cowbird eggs, caused a 54% increase in the number of young fledged (2.0 ± 0.8 young/nest on
treatment plots vs. 1.3 ± 1.1 young/nest on nontreated plots; note that these data are
preliminary and should not be cited in the context of cowbird–host ecology and
management). According to our a priori calculations (see Sect. 8.2.6), we could not detect an effect size of <1 fledgling/nest with 80% power and α = 0.1 given our sample sizes.
However, these results for treatment effects did indicate that our findings were not
significantly different (P = 0.184) and had an associated power of about 61%. Thus,
the results we obtained were what we should have expected given our initial plan-
ning based on power analysis.
When evaluating such results, check to see if (1) sample sizes were analyzed, (2) power analysis was conducted, (3) α was adhered to, and (4) the interpretation of results is consistent with accepted statistical procedures (e.g., were assumptions tested?).
Comparisons of complementary groups with respect to the level of an estimated
characteristic. “The proportion of adult males 3–4 years of age who had a parasite
load was greater among residents of urban areas (47%) than among residents of non-
urban areas (22%).” For this type of statement to be justified by data, the two esti-
mates should meet minimum specifications of reliability (e.g., each estimate should
have a CV of less than 25%), and they should differ from each other statistically.
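The two reliability checks just described, the CV screen and the statistical comparison of the two proportions, can be sketched as follows. The standard error and sample sizes below are hypothetical, since the quoted statement does not report them.

```python
from math import sqrt
from statistics import NormalDist

def cv(estimate, se):
    """Coefficient of variation of an estimate, as a percentage."""
    return 100 * se / estimate

def two_prop_z(p1, n1, p2, n2):
    """Two-sample z-test for a difference in proportions (pooled SE);
    returns the two-sided P-value."""
    pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical SE and sample sizes for the urban vs. nonurban comparison.
print(round(cv(0.47, 0.05), 1))                  # reliability screen: CV < 25%?
print(round(two_prop_z(0.47, 100, 0.22, 100), 4))  # do the estimates differ?
```

Both conditions must hold before a comparative statement of this kind is justified: each estimate reliable in its own right, and the difference statistically supported.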
● Fifth tier. Similar to fourth tier, except that it accepts papers of very local interest, including distribution lists and anecdotal notes. Transactions of the Western Section of the Wildlife Society, Bulletin of the Texas Ornithological Society, Texas Journal of Science (or similar).
We again hasten to add that there are numerous seminal papers even in fifth tier jour-
nals. Virtually any paper might help expand our knowledge of the natural world.
Within reason, we should strive to get the results of all research efforts published.
Some journals emphasize publication of articles that have broad spatial applicability
or address fundamental ecological principles. In contrast, other journals publish arti-
cles that include those of relatively local (regional) and species-specific interest.
Contrary to what some people – including journal editors – apparently think, the spa-
tial or fundamental applicability of an article primarily concerns the service that a
journal is providing to its readers (or members of the supporting society) rather than
the quality of the article per se. Thus, publishing in a “lower tier” journal does not
mean your work is not of as high a quality as that published in a high tier outlet.
The publication process is frustrating. Manuscripts are often rejected because
you failed to explain adequately your procedures, expanded application of your
results far beyond an appropriate level, provided unnecessary detail, used inappro-
priate statistics, or wrote in a confusing manner. In many cases, the comments from
the editor and referees will appear easy to address, and you will be confused and
frustrated over the rejection of your manuscript. All journals have budgets, and
must limit the number of printed pages. The editor prioritizes your manuscript rela-
tive to the other submissions and considers how much effort must be expended in
handling revisions of your manuscript. Your manuscript might be accepted by a
lower tier journal, even though it may have received even more critical reviews than
your original submission to a higher tier outlet. This often occurs because the editor
has decided that your paper ranks high relative to other submissions, because the
editor or an associate editor has the time to handle several revisions of your manu-
script, and/or because the journal budget is adequate to allow publication.
The response you receive from an editor is usually one of the following:
1. Accept as is. Extraordinarily rare, but does occur.
2. Tentatively accepted with minor revision. This occurs more often, but is still relatively rare. The "tentative" is added because you still have a few details to attend to (probably minor clarifications and editorial matters).
3. Potentially acceptable but decision will be based on revision. This is usually the
way a nonrejected (see below) manuscript is handled. If the editor considers
your manuscript of potential interest, but cannot make a decision because of the amount of revision necessary, he or she will probably send your revision back to the
referees for further comment. The editor expects you to try to meet the criticisms
raised by the referees. It is in your best interest to try to accommodate the refe-
rees, be objective, and try to see their point. A rational, detailed cover letter
should explain to the editor how you addressed referees’ concerns and sugges-
tions, and why you did not follow certain specific suggestions. This is especially
important if your revision is being sent back to the same referees!
4. Rejected but would consider a resubmission. In this case, the editor feels that
your manuscript has potential, but the revisions necessary are too extensive to
warrant consideration of a revision. The revisions needed usually involve differ-
ent analyses and substantial shortening of the manuscript. In essence, your revi-
sion would be so extensive that you will be creating a much different manuscript.
The revision will be treated as a new submission and will go through a new
review process (probably using different referees).
5. Rejected and suggests submission to a regional journal. The editor is telling you
that you aimed too high and should try a lower tier outlet. Use the reviews in a
positive manner and follow the editor’s suggestion. Your work likely does not
have broad appeal but will interest a more local audience.
6. Returned without review. The editor of some journals might return your manuscript without sending it out for review, having decided that your manuscript is inappropriate for the journal. Returning without review happens
much more frequently in European ecology journals and in the first and second
tier North American journals. Although it is understandable that an editor wants
to keep the workload on the referees to a minimum, we think it is usually best
to allow the referees to make an independent judgment of all submissions; oth-
erwise, the editor functions as a one-person peer review.
Although the reviews you receive are independent of your study, they are not necessarily unbiased. We all have biases based on previous experience (see Sect.
1.2.4). Further, at times, personal resentment might sneak into the review (although
we think most people clearly separate personal from professional views).
Nevertheless, it is not uncommon for your initial submission to be rejected. It is not unusual for rejection rates of first- through third-tier journals to exceed 60%. Although disappointing, you should always take the reviews as constructive criticism, make a
revision, and submit to another, possibly lower-tier, journal. Your “duty” as a sci-
entist is to see that your results are published; you have no direct control over the
review process or the outlet your work achieves.
Once you have completed your field or lab study, there is nothing you can change about your study design per se. That is, if you sampled birds in three treated and three untreated 1-ha plots, you cannot suddenly have data on birds from 500-m-long transects or from larger plots. But, let us say you have determined (perhaps through
a hypercritical peer review) that you had insufficient samples to determine treat-
ment effects; you needed more than three treated plots. Is your study ruined? If you
are a student, what about completing your thesis? In the sections that follow, we
discuss some general approaches to getting something meaningful out of the data
you collected even when your design was inappropriate, or when a catastrophe
struck your study.
8.3 Trouble Shooting: Making the Best of a Bad Situation 341
As noted in Sect. 8.3, you cannot change your study design after the study is com-
pleted. You can, however, legitimately change the way you group and analyze your
data, which in essence changes your design. For example, say you have collected
data in a series of eight plots with the intent of determining the impacts of thinning
hardwoods on bird abundance. Because there were no pretreatment data, you were
forced to use plots that had been thinned over a period of 4 years; plot size varied
with size of the hardwood stands. These treatment plots were distributed over a
minor elevation gradient (say, 200 m), and were located on terrain of varying slope
and aspect. At once you will recognize several key problems with this design with
regard to treatment effects: the treatments are confounded by differences in age,
elevation, slope, and aspect. Perhaps analysis of covariance can assist with some of
these confounding issues, but problems with sample size and plot size remain. An
alternative to trying to move forward with a study of treatment effects would be to
turn the study around and look at the response of birds to variations in a gradient of
hardwood density and not the treatments per se. Yes, problems with confounding
variables remain, but you will be held to a different standard during the peer review
process by using a gradient approach rather than an experimental “treatment
effects” approach. You will still be able to talk about how birds respond to hard-
wood density, and make a few statements about how “my results might be applica-
ble to the design of hardwood treatments.” Thus, you have a posteriori changed
your method of data processing and analysis (from a two-group treatment vs. no
treatment to a gradient approach).
Most field biologists recognize that the season of study and the age and sex of
the study animals can have a profound influence on study results. As such, data are
often recorded in a manner to separate effects of season (time), age, and sex from
the desired response variable (e.g., foraging rate). However, dividing your sample
into many categories effectively lowers the sample size. That is, if a priori power
analysis indicates that you need a sample of 35 individuals (we discuss the issue of
independence below) to meet project goals, this usually means a sample of 35 each
of, say, adult males, subadult males, adult females, and subadult females. Thus, your required sample has just been increased to 140, and probably 140 per season.
A standard rule of thumb is to always collect data in reasonable categories because
you can always go back and lump, but often you cannot go back and split (i.e., if
you did not record age you cannot divide your data into age categories). Thus, while
lumping data certainly lessens your ability to tease out biologically meaningful
relationships, it does remain an option when sample sizes are too small in your
desired categories. You do run the risk of obscuring, say, age-specific activities: e.g., the adults show a positive reaction to something you are measuring and the subadults a negative reaction, which expresses itself in your data as “no reaction.” But here we are talking about ways to make the best of a bad situation; we cannot inject insight into data that were collected in a manner that obscures meaningful relationships.
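The sample-size arithmetic above can be sketched in a few lines of code (the per-category requirement of 35 follows the text’s example; the number of seasons is our hypothetical illustration, since the text gives only the per-season figure):

```python
# An a priori power analysis calls for 35 individuals per category, as in
# the text's example. The number of seasons below is a hypothetical
# illustration, not a figure from the text.
n_per_category = 35
age_sex_classes = ["adult M", "subadult M", "adult F", "subadult F"]
n_seasons = 2

per_season = n_per_category * len(age_sex_classes)
total = per_season * n_seasons
print(per_season, total)  # 140 per season, 280 overall
```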
8 Design Applications
Related to and an integral part of our discussion and examples in Sect. 8.3.1 is the
altering of your study goals in light of design inadequacies (or insufficient or inap-
propriate samples). Situations do arise that are outside of your reasonable control,
such as natural or human-caused catastrophes. Fires and floods could virtually
obliterate a study area. Perhaps serendipity will have left you with an ideal “treat-
ment effects” study (i.e., pre- and postfire). More likely, you will be left with ashes.
Given that a graduate student does not want to stay on for another 2–3 years to
complete a different study, about all one might be left with in such circumstances
is a brief “before the fire happened” look at the ecology of the animals that were
under study. Alternatively, the study can be altered to select different study sites,
abandon the initial study goals, and expand to look at the basic ecology of the target
animal(s) over a wider geographic area.
As described above (Sect. 8.2.12), there are numerous opportunities to locate a
journal that will welcome your manuscript. It might be that your study has been
compromised in some way that prevents you from generating the type of ecological
or management conclusions that had been intended when you began the work.
Thus, you will likely save yourself, as well as referees and a journal editor, a lot of
time by matching your manuscript with an appropriate journal (as described under
Sect. 8.2.12 regarding tiers of journals). Again, your duty as a scientist is to get
your work published. We will not dispute that a paper in Science or Nature would earn more accolades than a paper in the Southwestern Naturalist. Nevertheless, the fact remains that most of our work will be species- and/or site-specific. We recommend that you seek advice from well-published individuals in selecting a journal for submission of your work.
Another option to consider when things have not gone as you anticipated with
your study is conversion of your focus to that of a pilot study. If you are unable to
draw reasonable conclusions based on your data, then a focus on hypothesis gener-
ating rather than hypothesis testing can be a reasonable approach. For example, say
that your intended study goal was to make recommendations for management of a
rare salamander based on marking and tracking individuals. But, because of various
difficulties in first locating and then tracking the animals (e.g., marks could not be
read), you failed to gather an adequate sample of individuals to address your initial
goal when time and money expired. A reasonable approach, then, would be to focus
your study as more methodological and report on solutions to these difficulties.
Presentation of your ecological (e.g., habitat use) data would be appropriate, but
only as preliminary findings; management recommendations would not likely be
appropriate. While serving as co-Editors of The Journal of Wildlife Management,
Block and Morrison frequently recommended that manuscripts be sent to a regional
natural history journal because meaningful management recommendations could
not be developed from the study results.
Related to focusing on a pilot study is focusing your work as a case study. There
are situations in which your biological study was, perhaps, too localized or too brief in
duration to warrant a full research article. For example, all of us have been contracted
to conduct rather short-term (i.e., one season) assessments of the distribution of endan-
gered species on a wildlife management area, military base, or a site proposed for
development. Additionally, any study that results in a small data set collected over a short time frame might be appropriate to address as a case study. While these are valuable data for the issue at hand, they usually hold little interest for the general scientific readership of a journal. However, focusing on the issue underlying the reason for
the study rather than the data collected is a viable way to pursue publication. For
example, data collected on the distribution of endangered species on a military base
that is slated for closure and potential economic development could serve as the basis
for an article on the role of military bases in conservation; the story is the vehicle for
carrying the data. Likewise, data collected on a few water catchments could be used to
review and discuss the issue of adding water to the environment.
8.3.3 Leadership
The fundamental resource necessary for success in any scientific study is leadership. Regardless of the rigor of the design and the qualifications of your assistants, you must be able to train, encourage, and critique your team, to accept criticism and suggestions in turn, and to guide your study throughout its duration. Leadership skills are
required to develop and guide successful research teams. We have all read job
advertisements in which a requirement reads something like “a proven ability to
work well with others...”. Employers are looking for people they can work with.
8.4 Summary
References
Alldredge, M. W., T. R. Simons, and K. H. Pollock. 2007. Factors affecting aural detections of
songbirds. Ecol. Appl. 17: 948–955.
Block, W. M., K. A. With, and M. L. Morrison. 1987. On measuring bird habitat: Influence of
observer variability and sample size. Condor 89: 241–251.
Cherry, S. 1998. Statistical tests in publications of The Wildlife Society. Wildl. Soc. Bull. 26:
947–953.
Cochran, W. G. 1983. Planning and Analysis of Observational Studies. Wiley, New York, NY.
Cook, C. W., and J. Stubbendieck. 1986. Range Research: Basic Problems and Techniques.
Society for Range Management, Denver, CO.
Ehrlich, P. R., D. S. Dobkin, and D. Wheye. 1988. The Birder’s Handbook: A Field Guide to the
Natural History of North American Birds. Simon and Schuster, New York, NY.
Finch, D. M. 1983. Brood parasitism of the Abert’s towhee: Timing, frequency, and effects.
Condor 85: 355–359.
Garton, E. O., J. T. Ratti, and J. H. Giudice. 2005. Research and experimental design, in C. E.
Braun, Ed. Techniques for Wildlife Investigations and Management, pp. 43–71, 6th Edition.
The Wildlife Society, Bethesda, MD.
Harris, J. H., S. D. Sanders, and M. A. Flett. 1987. Willow flycatcher surveys in the Sierra Nevada.
West. Birds 18: 27–36.
Hilborn, R., and M. Mangel. 1997. The Ecological Detective: Confronting Models with Data.
Monographs in Population Biology 28. Princeton University Press, Princeton, NJ.
Johnson, D. H. 1999. The insignificance of statistical significance testing. J. Wildl. Manage. 63:
763–772.
Kepler, C. B., and J. M. Scott. 1981. Reducing bird count variability by training observers. Stud.
Avian Biol. 6: 366–371.
Kuhn, T. S. 1962. The Structure of Scientific Revolutions. University of Chicago Press, Chicago, IL.
Laymon, S. A. 1987. Brown-headed cowbirds in California: Historical perspectives and manage-
ment opportunities in riparian habitats. West. Birds 18: 63–70.
Lehner, P. N. 1996. Handbook of Ethological Methods, 2nd Edition. Cambridge University Press,
Cambridge.
Levy, P. S., and S. Lemeshow. 1999. Sampling of Populations: Methods and Applications, 3rd
Edition. Wiley, New York, NY.
Martin, T. E. 1992. Breeding productivity considerations: What are the appropriate habitat features
for management? in J. M. Hagan and D. W. Johnston, Eds. Ecology and Conservation of
Neotropical Migrant Landbirds, pp. 455–473. Smithsonian Institution Press, Washington, DC.
Martin, P., and P. Bateson. 1993. Measuring Behavior: An Introductory Guide, 2nd Edition.
Cambridge University Press, Cambridge.
Morrison, M. L., L. S. Hall, S. K. Robinson, S. I. Rothstein, D. C. Hahn, and J. D. Rich. 1999.
Research and management of the brown-headed cowbird in western landscapes. Stud. Avian
Biol. 18.
Morrison, M. L., B. G. Marcot, and R. W. Mannan. 2006. Wildlife–Habitat Relationships:
Concepts and Applications, 3rd Edition. Island Press, Washington, DC.
Ralph, C. J., and J. M. Scott, Eds. 1981. Estimating numbers of terrestrial birds. Stud. Avian Biol. 6.
Ramsey, F. L., and J. M. Scott. 1981. Tests of hearing ability. Stud. Avian Biol. 6: 341–345.
Robinson, S. K., J. A. Gryzbowski, S. I. Rothstein, M. C. Brittingham, L. J. Petit, and F. R.
Thompson. 1993. Management implications of cowbird parasitism on neotropical migrant
songbirds, in D. M. Finch and P. W. Stangel, Eds. Status and Management of Neotropical
Migratory Birds, pp. 93–102. USDA Forest Service Gen. Tech. Rpt. RM-229. Rocky Mountain
Forest and Range Experiment Station, Fort Collins, CO.
Romesburg, H. C. 1981. Wildlife science: Gaining reliable knowledge. J. Wildl. Manage. 45:
293–313.
Steidl, R. J., J. P. Hayes, and E. Schauber. 1997. Statistical power analysis in wildlife research.
J. Wildl. Manage. 61: 270–279.
Thompson, W. L., G. C. White, and C. Gowan. 1998. Monitoring Vertebrate Populations.
Academic, San Diego, CA.
Welsh Jr., H. H., and A. J. Lind. 1995. Habitat correlates of Del Norte Salamander, Plethodon
elongatus (Caudata: Plethodontidae), in northwestern California. J. Herpetol. 29: 198–210.
Winer, B. J., D. R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design,
3rd Edition. McGraw-Hill, New York, NY.
Chapter 9
Education in Study Design and Statistics
for Students and Professionals
9.1 Introduction
There is a fear of statistics among the public, state and federal officials, and even
among numerous scientists. The general fear appears to stem from the convoluted manner in which “statistics” is presented in the media and from the cursory introduction to statistics that most people receive in college. In the media, we often hear that “statistics can be used to support anything you want”; thus, statistics
(and perhaps statisticians by implication) become untrustworthy. Of course, noth-
ing could be further from the truth. It is not statistics per se that is the culprit.
Rather, it is usually the way in which the data were selected for analysis that results
in skepticism among the public.
Additionally, and as we have emphasized throughout this book, “statistics” and
“study design” are interrelated yet separate topics. No statistical analysis can repair
data gathered from a fundamentally flawed design, yet improperly conducted sta-
tistical analyses can easily be corrected if the design was appropriate. In this chap-
ter we outline the knowledge base we think all natural resource professionals
should possess, categorized by the primary role one plays in the professional field.
Students, scientists, managers, and yes, even administrators, must possess a funda-
mental understanding of study design and statistics if they are to make informed
decisions. We hope that the guidance provided below will help steer many of you
toward an enhanced understanding and appreciation of study design and statistics.
would look rather foolish blaming a failure to protect the species on his or her staff.
That is, how can you manage people if you do not know – at least fundamentally –
what they are doing? As noted by Sokal and Rohlf (1995), there appears to be a
very high correlation between success in biometry and success in the chosen field
of biological specialization.
Wildlife professionals must, at a minimum, be able to ask the proper questions
needed to interpret any report or paper. Such questions include issues of independence,
randomization, and replication; adequacy of sample size and statistical power; pseu-
doreplication and study design; and proper extrapolation of results (as we developed
in Chaps. 1 and 2). You do not, for example, need to know how to invert a matrix to
understand multivariate analyses (see Morrison et al. (2006) for some examples). In
legal proceedings, one must be clear on the reasons underlying a management or regu-
latory decision, but does not need to be able to create statistical software.
Thus, it is incumbent on all professionals to not only achieve an adequate under-
standing of study design and statistics (both basic and advanced), but also keep
current on methodological advances. The field of natural resource management is
becoming more analytically sophisticated (see Chap. 2). For example, it is now
common to use rather complicated population models to assist with evaluation of
the status of species of concern – simply plotting trends of visual counts on an X–Y
graph no longer suffices for either peer review or management planning. Below we
outline what we consider adequate training in design and statistics for natural
resource professionals, including university training, continuing education, and the
resources available to assist with learning.
It is an axiom that all education must rest on a solid foundation. Otherwise, any hope
of advancement of knowledge and understanding is problematic. Most universities
require that undergraduates in the sciences (e.g., biology, chemistry, physics, and geol-
ogy) have taken courses in mathematics through algebra, trigonometry, and often cal-
culus. Beyond these basic courses, universities vary in their requirements for students
specializing in natural resources and their management. Many popular biostatistics
textbooks are written so as not to require mathematical education beyond elementary algebra (e.g., Hoshmand 2006; Sokal and Rohlf 1995; Zar 1998), or are written in a “nonmathematical” manner (e.g., Motulsky 1995). These are good books that impart a solid foundation of basic statistics; mentioning their modest prerequisites implies no criticism.
Sokal and Rohlf (1995) noted that, in their experience, people with limited mathematical backgrounds are able to excel in biometry. They thought there was little
correlation between innate mathematical ability and capacity to understand biometric
methodologies. However, many more advanced mathematical and statistical methods
in natural resources require an understanding of more advanced mathematics, including
calculus (e.g., many modeling techniques, population estimators). All students planning
on later receiving graduate degrees should take at least an introductory course in calculus.
The heading for this subsection implies a dichotomy between biometric and other
approaches to statistics. In reality, textbooks and the way statistics courses are taught
vary widely, from very applied, “cookbook” approaches, to highly theoretical instruc-
tion into the underlying mathematics of statistical methods. As noted earlier in this
chapter, we think that all resource professionals require, at a minimum, a good knowl-
edge of the principles of study design and statistics. Thus, the applied approach,
which minimizes formulas and mathematics, is adequate in many cases for interpretation of research results. Knowing that ANOVA somehow looks for significant differences among two or more groups, that various rules of thumb are available to determine necessary sample size, and that pseudoreplication involves inappropriately counting nonindependent observations as independent samples will suffice for many biologists, managers, and administrators.
Thus, “biometrics” courses tend to sacrifice fundamentals of theory and
inference for applications and interpretation. This is appropriate if the course is
considered a self-contained survey of common procedures, and not considered a
prerequisite to more advanced statistics courses. However, if your expectation is
that you will be conducting independent research and writing and evaluating scien-
tific publications, then a better understanding of the mathematical underpinnings of
statistics is required. Using our previous examples, advancing from simple one-
way ANOVA to tests of interactions and blocking, properly anticipating sample
size requirements (e.g., through power analysis), and understanding the statistical
basis of pseudoreplication all require that the mathematics of the procedures be
understood at least in general terms. The ability to interpret an ANOVA computer
printout is much different from being able to explain how the residuals (error) were
calculated. The single-semester “biometrics” courses often offered in biology and
natural resource programs do not provide these fundamentals.
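As an illustration of anticipating sample-size requirements through power analysis, the following Monte Carlo sketch estimates the power of a one-way ANOVA at several per-group sample sizes. The simulation approach, group means, and standard deviation are our hypothetical illustration, not a method or numbers from the text:

```python
import numpy as np
from scipy import stats

def anova_power(group_means, sd, n_per_group, alpha=0.05, n_sims=2000, seed=1):
    """Estimate the power of a one-way ANOVA by Monte Carlo simulation."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Draw one simulated data set under the assumed effect sizes
        groups = [rng.normal(m, sd, n_per_group) for m in group_means]
        _, p = stats.f_oneway(*groups)
        if p < alpha:
            rejections += 1
    return rejections / n_sims

# Hypothetical mean bird counts for three treatments, common SD of 4
for n in (5, 10, 20):
    print(n, round(anova_power([10.0, 12.0, 14.0], sd=4.0, n_per_group=n), 2))
```

Power rises with the number of plots per group; running such a sketch at increasing n shows roughly when a target such as 0.8 would be reached.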
When graduate school is the goal, it is probably better to sacrifice application
(i.e., the single-semester biometrics course) for fundamentals. Many universities
offer a two-semester “fundamentals” course within the statistics department; many
also offer a version of these courses for nonstatistics majors. Such courses usually
require, for example, that each step in an ANOVA can be interpreted – calculation
of degrees of freedom, sources of variation, and interaction terms. Such understand-
ing is necessary to properly analyze and interpret complicated data, and is funda-
mental to more advanced parametric techniques (e.g., multivariate analyses). It is
unlikely that the undergraduate will have time to take additional statistics courses.
9.2 Basic Design and Statistical Knowledge
The first task of many new graduate students is to make up the courses they missed
(or avoided) at the undergraduate level. Many universities offer graduate level sta-
tistics courses in the Statistics Department aimed at nonmajors to fill these gaps.
Such courses often cover two semesters and offer a detailed coverage of the funda-
mentals of statistics. However, most frequently these courses focus only on applica-
tion of statistical approaches, rather than delving into the theory behind those
applications. This is where solid, fundamental mathematical and statistical training during one’s undergraduate years begins to show its advantages. Such courses usually allow graduate students to step directly into more
advanced courses such as sampling, nonparametric statistics, categorical data anal-
ysis, multivariate statistics, and experimental design.
There are often several tests available to analyze a given data set, and choosing the most appropriate one can be tricky. A fundamental decision that must be made, however,
involves choosing between the two families of tests: namely, parametric and non-
parametric tests (e.g., see Motulsky (1995) for a good discussion). Many sampling
problems in natural resources involve small populations and/or populations that do
not exhibit a normal distribution, i.e., they are skewed in some fashion. Large data
sets usually present no problem. At large sample size, nonparametric tests are ade-
quately powerful, and parametric tests are often robust to violations of assumptions
as expected based on the central limit theorem. It is the small data set that represents
the problem. It is difficult to determine the form of the population distribution, and
the choice of tests becomes problematic: nonparametric tests are not powerful and
parametric tests are not robust (Motulsky 1995, p. 300).
The researcher is presented with two major choices when dealing with samples that
do not meet parametric assumptions. The choice initially selected by most researchers
is to perform transformations of the original data such that the resulting variates meet
the assumptions for parametric tests; transformations, in essence, re-express the data on a scale where assumptions such as normality and homogeneity of variances are more nearly met.
To some, implementing transformations seems like “data grinding,” or manipulation
of data to try and force significance (Sokal and Rohlf 1981, p. 418). Further, most
people have a difficult time thinking about the distribution of the logarithm of tree
height, or the square root of canopy cover. Although it may take some getting used to,
there is no scientific necessity to use common linear or arithmetic scales. For example,
the square root of the surface area of an organism is often a more appropriate measure
of the fundamental biological variable subjected to physiological and evolutionary
forces than is the surface area itself (Sokal and Rohlf 1981, p. 418).
However, although transforming your data to meet the assumptions of parametric tests might be statistically sound, such actions can also obscure biological relationships. We return once again to the fundamental importance of viewing your data graphically before applying any statistical tests. Visual examination often reveals interesting biological properties of your data, such as nonlinear relationships and distinct thresholds in response variables. Further, applying transformations does not usually linearize biological data. Additionally, if data are transformed for analysis, they must be back-transformed if biological interpretations are to be valid.
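This transformation-and-back-transformation workflow can be sketched with simulated data; the lognormal “tree heights” and all parameter values are our hypothetical illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical right-skewed measurements, e.g., tree heights (lognormal)
heights = rng.lognormal(mean=2.0, sigma=0.6, size=60)

# Shapiro-Wilk normality test on the raw and the log-transformed data
w_raw, p_raw = stats.shapiro(heights)
w_log, p_log = stats.shapiro(np.log(heights))
print(f"raw p = {p_raw:.4f}, log p = {p_log:.4f}")

# Analyses done on the log scale must be back-transformed for biological
# interpretation; exp(mean of logs) is the geometric mean, which differs
# from the arithmetic mean of the raw data.
geo_mean = np.exp(np.log(heights).mean())
print(f"geometric mean = {geo_mean:.2f}, arithmetic mean = {heights.mean():.2f}")
```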
The second choice involves the use of nonparametric tests. Most common parametric tests have what we could call nonparametric equivalents, including multivariate analyses (Table 9.1). Nonparametric tests are gaining in popularity as researchers become more familiar with statistics and, concomitantly, as nonparametric tests are increasingly included in canned statistical packages. Because beginning and intermediate statistics courses spend little time on nonparametric statistics (concentrating primarily on chi-square tests), wildlife scientists are not as familiar with the assumptions of nonparametric tests, or with interpreting their results, as they are with the parametric equivalents. This engenders a resistance among many to using nonparametric tests.
So, how do researchers handle the difficulties of small sample size and data that
are in violation of assumptions of parametric tests? The procedures are many,
although not necessarily always appropriate. In virtually any issue of a major ecol-
ogy journal you can find studies that:
● Simply conduct parametric tests and say nothing about testing assumptions
● Conduct tests of assumptions but do not say if assumptions were met
● Conduct nonparametric tests without giving the rationale for their use or stating
whether these tests met relevant assumptions
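As an illustration, a parametric test and its common nonparametric equivalent can be run side by side on the same simulated skewed data, with the rationale for the choice stated explicitly; all values here are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical skewed samples, e.g., foraging rates on two plot types
treated = rng.exponential(scale=2.0, size=15)
untreated = rng.exponential(scale=4.0, size=15)

# A parametric test and its common nonparametric equivalent, side by side;
# with small, skewed samples the nonparametric result is the safer report
t_stat, t_p = stats.ttest_ind(treated, untreated)
u_stat, u_p = stats.mannwhitneyu(treated, untreated, alternative="two-sided")
print(f"t-test p = {t_p:.3f}, Mann-Whitney U p = {u_p:.3f}")
```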
Fortunately, by the 1990s, most journals insisted that statistical procedures be fully
explained and justified; today, few papers lack details on the testing of assumptions. However, it is still quite common to read that transformations were performed, with no mention of whether those transformations succeeded in normalizing the data. Simply performing transformations does not necessarily
justify using parametric tests. Thus, the graduate student would be well advised to take a course in nonparametric statistics. There is little doubt that all researchers will need to use these tests at some point, especially those listed in Table 9.1.
Perhaps the most relevant advanced statistical course graduate students in wildlife
sciences should consider is one that covers analysis of categorical data. Categorical
data analysis, or analyses of data categorized based on a measurement scale con-
sisting of a set of categories (Agresti 1996), has seen a considerable increase in
applications to wildlife research. These measurement scales typically are either
ordinal (the data have a natural ordering, such as age classes) or nominal (the data have no natural ordering, such as the names of different bird species located at a site). Thus,
categorical data analysis makes use of both parametric and nonparametric statisti-
cal procedures.
For many wildlife studies, we deal with data that are either distributed binomially (0, 1; died or survived) or placed into a categorical framework (counts of individuals within a plot). Thus, a fundamental understanding of the binomial, multinomial, Poisson, and exponential distributions is necessary for a majority of statistical analyses used in wildlife ecology. For example, estimation of survival is often conducted using logistic regression, a form of generalized linear model. Logistic
regression relies on the logit link function, based on the binomial distribution, so
that predictions of survival will be mapped to the range 0–1. Additionally, logit link
functions can be used to evaluate proportional odds for ranked data (Agresti 1996)
and underpin a host of the current capture–mark–recapture modeling approaches used in wildlife science (Williams et al. 2002).
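The logit link and its inverse can be sketched in a few lines; the coefficients and body-mass values below are hypothetical illustrations:

```python
import numpy as np

def logit(p):
    """Log-odds: maps probabilities in (0, 1) onto the whole real line."""
    return np.log(p / (1 - p))

def inv_logit(x):
    """Inverse logit: maps any linear predictor back into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical linear predictor for survival: intercept + slope * body mass
beta0, beta1 = -2.0, 0.15
mass = np.array([5.0, 10.0, 20.0, 40.0])
survival_prob = inv_logit(beta0 + beta1 * mass)
print(survival_prob)  # every value lies strictly between 0 and 1
```

Because the inverse logit is bounded by 0 and 1, predicted survival probabilities can never fall outside that range, whatever the value of the linear predictor.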
Frequently, many data of interest to wildlife ecologists are represented by discrete counts. The primary sampling model for such data is Poisson regression, which analyzes counts as a function of various predictor variables, most frequently as a log-linear model, i.e., a model in which the log link function is used (Mood et al. 1974). Categorical data analysis is a field of statistics that has seen
considerable research interest, ranging from simple contingency table analyses using
chi-square tests to methods for longitudinal data analysis for binary responses.
Although perhaps not obvious to many wildlife scientists, a majority of the statistical
approaches used in wildlife ecology rely on categorical data analysis theory, thus
highlighting its importance to wildlife students.
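A log-linear Poisson regression can be sketched by maximizing the Poisson likelihood directly. The “hardwood density” covariate, the coefficients, and the direct-optimization approach are our hypothetical illustration; a canned GLM routine would do the same job:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
# Hypothetical covariate (e.g., hardwood density) and simulated counts
density = rng.uniform(0, 10, size=100)
true_b0, true_b1 = 0.5, 0.2
counts = rng.poisson(np.exp(true_b0 + true_b1 * density))

def neg_log_lik(beta):
    # Log-linear model: log(lambda) = b0 + b1 * density (the log link)
    lam = np.exp(beta[0] + beta[1] * density)
    # Poisson log-likelihood up to a constant (the log y! term drops out)
    return -(counts * np.log(lam) - lam).sum()

fit = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
print(fit.x)  # estimates should land near the true values (0.5, 0.2)
```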
Most ecologists agree that the best decisions are those based on a solid database –
the real stuff. However, there are numerous circumstances where the issue of the
moment (e.g., management of endangered species) does not allow gathering of the
data everyone would desire; for example, the long-term persistence of a population in the face of development may need to be evaluated without the benefit of detailed demographic studies. Further, there are numerous situations where a good database exists, but the questions being asked concern the probability of population response to different management scenarios, such as the influence of different livestock grazing intensities on the fecundity of deer. Model-based analyses are
usually required to make such projections. Thus, we recommend that graduate stu-
dents become familiar with basic modeling and estimation procedures, including
analyses of population growth rates, and density estimators. These procedures
require an understanding of matrix algebra and calculus.
9.2.2.6 Priorities
Obviously, any person would be well served by taking all of the courses described
above. But, given the competing demands of other courses and fieldwork, what
should the graduate student prioritize? We would like to see every MS student take
a course in basic sampling design as well as the two-semester fundamental statistics
courses. PhD students, on the other hand, should not only have Master’s level
coursework in sampling design, but also have additional courses such as probability
theory, and a calculus-based math–stat course covering basic statistical theory and inference.
The natural resource manager must balance many competing issues when perform-
ing his or her duties. Many or most of the duties involve statistics, e.g., surveys of
user preferences for services, designing and implementing restoration plans, man-
aging and evaluating harvest records, monitoring the status of protected species,
evaluating research reports, and budgetary matters. The statistical preparation outlined above for the undergraduate also applies here: a minimum of a general biometrics course. For the manager it is probably preferable to obtain a solid grasp of applications rather than to take the more fundamental statistics courses. Obviously, the more the better!
In addition to a basic understanding of statistics, managers need to understand the importance of statistics in making decisions. Personal opinion and experience
certainly have a place in management decisions. [Note: we contrast personal opin-
ion with expert opinion. Personal opinion implies a decision based on personal
biases and experiences. In contrast, expert opinion can be formalized into a process
that seeks the counsel of many individuals with expertise in the area of interest.] However, managers must become sufficiently versed in study design and statistics to avoid falling into the “statistics can be used to support anything you want”
dogma. Falling back on personal opinion to render decisions because of statistical
ignorance is not a wise management action. Using sound analyses avoids the
appearance of personal bias in decision making, and provides documentation of the
decision-making process; this is quite helpful in a legal proceeding.
Managers should also have an appreciation of statistical modeling and manage-
ment science (e.g., adaptive resource management). We contend that every man-
ager builds models in that every manager makes predictions (at least mentally)
about how the system he or she is managing will respond to any given management
action. Knowing the principles of management science will assist the manager in
structuring the problem in his or her own thought processes, especially when the
problem becomes very complex or other parties (e.g., stakeholders) must be
brought into the decision process. These principles help to identify the sources of
uncertainty (e.g., environmental variation, competing theories about system
dynamics, partial controllability and observability of the system) that must be
addressed, and how to manage in the face of them.
Managers require the same formal statistical training as outlined above for
graduate students. Many students who were trained as researchers – and thus
received some statistical training – become managers by way of various job
changes and promotions. However, many managers either never proceeded beyond
the undergraduate level or completed nonthesis MS options. Unfortunately, most
nonthesis options require little in the way of statistics and experimental design.
Thus, as professionals, they are ill-prepared to handle the aspects of their profession
on which most management decisions are based (see also Garcia 1989; Schreuder
et al. 1993; Morrison and Marcot 1995).
Thus, managers should be sufficiently motivated to obtain advanced training in
statistics and design. This training can be gained through a variety of sources,
including self-training, college courses, and professional workshops. Further,
enlightened administrators could organize internal training workshops by contract-
ing with statistical and design consultants.
The duties of manager and administrator – and sometimes even scientist – are often
difficult to separate. Also, as discussed above for the manager, people often become
administrators after stints as a manager or researcher. However, others become
administrators of various natural resource programs through processes that involve
little or no ecological – and especially statistical – training. Such individuals, nev-
ertheless, need to be able to interpret the adequacy of environmental monitoring
plans, impact assessments, research papers, personal opinion, and a host of other
information. After all, it is the administrator who ultimately gives approval, and is
often called upon to justify that approval. It is true that administrators (and manag-
ers) can hire or consult with statisticians. However, they must still be able to
explain their decision-making process and answer questions that would challenge
anyone with only a rudimentary understanding of technical matters.
We recommend that natural resource administrators be at least as knowledgea-
ble as the managers under their supervision. Thus, administrators should possess
the knowledge of statistics and design as outlined above for MS students.
9.3 Resources
9.3.1 Books
All natural resource administrators, managers, and researchers should have a per-
sonal library of books that are readily available for reference. This costs money, but
the alternative is either ignorance or constant trips to a colleague’s office or the
library. Here, we provide some suggestions for assembling a small personal library
that provides references for common study designs and statistical analyses.
Fortunately, the basic designs and statistical procedures are relatively stable
through time. As such, one does not need to purchase the latest edition of every
text. In fact, the basic texts used in most readers' undergraduate and graduate
courses in statistics and study design certainly form the core of a personal library.
Kish (1987, p. vi) and Kish (2004) described the general area of statistical
design as “ill-defined and broad,” but identified three relatively well-defined and
specialized approaches: (1) experimental designs that deal mostly with symmetrical
designs for pure experiments, (2) survey sampling that deals mostly with descrip-
tive statistics, and (3) observational studies including controlled investigations and
quasiexperimental designs. There are numerous books that address each of these
topics. As one’s library grows and specific needs arise, we suspect that these spe-
cific topics will be added to the library.
Making recommendations for specific books is difficult because there are a
multitude of excellent books available. Below we list some of our favorites, catego-
rized by general analytical family. That we do not list a specific title in no way
indicates displeasure with its contents or approach; rather, the books below are
simply ones we have used and know to be useful. Each personal library should contain a
book that covers each of the major categories listed below. Topics indented as sub-
categories provide more detailed coverage of the more common topics listed in the
primary categories; these would be useful but not essential (i.e., could be reviewed
as needed in a library, or added later as the need becomes evident).
The books listed first are general overviews of two or more of the subtopics
below:
● Kish (1987). A well-written, although brief, review of the topics of experimental
design, survey sampling, and observational studies. A good introduction to these
topics. This book has been reprinted as Kish (2004). A related offering is Kish
(1995), which is a reprinting of his original 1965 edition.
● Manly (1992). An advanced coverage, emphasizing experimental designs, and
including linear regression and time series methods.
Experimental Design (ANOVA)
● Underwood (1997). This very readable book emphasizes application of ANOVA
designs to ecological experimentation. We highly recommend this book.
Survey Sampling
● Survey sampling can be considered a subclass of the next subtopic, observa-
tional studies, but is separated because of its common use.
● Levy and Lemeshow (1999). A popular book that presents methods in a step-by-
step fashion. A nice feature of this book is the emphasis on determining proper
sample sizes; also discusses statistical software.
Observational Studies (controlled investigations)
● Cochran (1983). A short book that begins with some very useful material on
planning observational studies and interpreting data.
These texts assume little or no mathematical knowledge; they are not recommended
as stepping stones to more advanced statistical procedures:
● Watt (1998). A beginning text that explains basic statistical methods and
includes descriptions of study design as applied to biology.
● Fowler et al. (1998). Another basic text that is easy to read and provides a good
foundation with little use of mathematics for the field biologist.
● Motulsky (1995). A basic text that uses a minimal amount of mathematics to
survey statistics from basics through more advanced ANOVA and regression.
This is a good text for those not likely to advance immediately to more sophisti-
cated procedures. It uses examples from the statistical software InStat (GraphPad
Software, San Diego, CA), a relatively inexpensive program. The book and
software would make a useful teaching tool for basic analyses of biological
data.
9.3.1.3 Fundamentals
These texts assume knowledge of college algebra and incorporate fairly detailed
descriptions of the formulas and structures of statistical procedures. This type of
knowledge is necessary before advancing to more complicated statistical
procedures:
● Sokal and Rohlf (1995). A widely used text that emphasizes biological applica-
tions. It covers primarily parametric tests from an elementary introduction up to
the advanced methods of ANOVA and multiple regression.
● Zar (1998). A widely used text that provides introductory yet detailed descrip-
tions of statistical techniques through ANOVA and multiple regression. Zar also
provides very useful chapters on analysis of circular distributions.
Nonparametric and Categorical Data Analysis:
● Agresti (2002). Concentrates on two-way contingency tables, log-linear and
logit models for two-way and multiway tables, and applications of analyses. Le
(1998) presents a similar coverage and is readable.
● Stokes et al. (2000) presents a thorough development of categorical methods
using SAS as the analytical system.
● Hollander and Wolfe (1998). A detailed and comprehensive coverage of non-
parametric statistics.
Dillon and Goldstein (1984), Manly (2004), and Afifi (2004) all provide very readable
and thorough coverage of standard multivariate methods. We particularly recommend
Afifi's text given the emphasis he places on interpretation of results. Included
are examples using the more commonly used statistical packages.
Hosmer and Lemeshow (2000) details the use of logistic regression, which has
become one of the most widely used multivariate procedures in wildlife science.
Kleinbaum (2005) is written in an understandable manner for the nonstatistician
and is aimed at graduate students.
Here we present some of the many resources available over the internet that focus
on design and statistical analyses. We usually provide the URL for the home page
of the organization sponsoring the Web page because the specific within-Web site
links often change through time. Only sites offering free access to programs are
provided; commercial sites (regardless of the quality of the products offered for
purchase) are not listed:
● USGS Patuxent Wildlife Research Center (http://www.mbr-pwrc.usgs.gov/
software.html): Contains an extensive list of programs focused on analyses of
animal populations, including survival estimation and capture probabilities. Also
contains or provides links to documentation of programs and literature sources.
● Illinois Natural History Survey (http://nhsbig.inhs.uiuc.edu): Manages the
Clearinghouse for Ecological Software, which provides programs for density
estimation, bioacoustics, home range analysis, estimating population parame-
ters, habitat analysis, and more. For habitat analysis, programs such as Fragstats
can be located.
● Colorado State University (http://www.warnercnr.colostate.edu): Offers the
widely used program MARK (developed and maintained by Dr. Gary White), as
well as other widely used programs such as CAPTURE and DISTANCE.
9.4 Summary
As we have emphasized throughout this book, “statistics” and “study design” are
interrelated yet separate topics. No statistical analysis can repair data gathered from
a fundamentally flawed design, yet improperly conducted statistical analyses can
easily be corrected if the design was appropriate. In this chapter we provided spe-
cific guidance regarding the knowledge that we think all resource professionals
should possess, including students, scientists, managers, and administrators. All
resource professionals must possess a fundamental understanding of study design
if they are to make informed decisions. Wildlife professionals must, at a minimum,
be able to ask the proper questions needed to interpret any report or paper. Such
questions include issues of independence, randomization, and replication; adequacy
of sample size and statistical power; pseudoreplication and study design; and
proper extrapolation of results.
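The interplay of sample size, α-level, effect size, and variance behind these power questions can be made concrete with a short sketch (our illustration, not from any cited text), approximating the power of a two-sided, two-sample z-test:

```python
from statistics import NormalDist

def power(delta, sd, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with n
    observations per group (normal approximation; all inputs below
    are illustrative, not from the text)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    effect = delta / (sd * (2 / n) ** 0.5)  # standardized effect at this n
    return 1 - z.cdf(z_alpha - effect)

# Power rises with sample size, alpha-level, and effect size,
# and falls as the variance grows:
print(round(power(delta=1, sd=2, n=20), 2))              # modest power
print(round(power(delta=1, sd=2, n=80), 2))              # larger n helps
print(round(power(delta=1, sd=2, n=20, alpha=0.10), 2))  # looser alpha helps
print(round(power(delta=1, sd=4, n=20), 2))              # more variance hurts
```

Evaluating the function at a few settings shows each of the four relationships directly, which is exactly the kind of back-of-the-envelope check a reviewer of any report or paper can perform.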
Because many of the more advanced mathematical and statistical methods in
natural resources require an understanding of more advanced mathematics, includ-
ing calculus, we recommend that students planning on receiving graduate degrees
should take at least a beginning course in calculus in preparation for advanced
methods in natural resources. Otherwise, you will be limited in the types of courses
you will be qualified to take in graduate school. Many “biometrics” courses tend to
sacrifice fundamentals for specific applications and interpretation. When graduate
school is the goal, it is probably better to sacrifice application (i.e., the single-
semester biometrics course) for fundamentals. Graduate students must obtain a
good understanding of experimental design, and take the opportunity to receive
advanced statistical training in topics such as nonparametric, categorical, and mul-
tivariate analyses.
In addition to a basic understanding of statistics, managers and administrators
need to understand the importance of study design and statistics in making decisions.
Personal opinion and experience certainly have a place in management decisions,
but all resource professionals must be able to grasp the strengths and
weaknesses of various sampling approaches. Using sound analyses avoids the
appearance of personal bias in decision making, and provides documentation of the
decision-making process; this would be quite helpful in a legal proceeding.
We also provide guidance on classes to take, books to own and use as reference
sources, and other ways in which you can obtain and maintain needed design and
analytical skills. We also provide a list of Web sites where you may obtain
extremely useful software to aid in ecological analyses.
References
Afifi, A. A. 2004. Computer-Aided Multivariate Analysis, 4th Edition. Chapman and Hall/CRC,
Boca Raton, FL.
Agresti, A. 1996. An Introduction to Categorical Data Analysis. Wiley, New York, NY.
Agresti, A. 2002. Categorical Data Analysis, 2nd Edition. Wiley, New York, NY.
Cochran, W. G. 1983. Planning and Analysis of Observational Studies. Wiley, New York, NY.
Conover, W. J. 1999. Practical Nonparametric Statistics, 3rd Edition. Wiley, New York, NY.
Dillon, W. R., and M. Goldstein. 1984. Multivariate Analysis: Methods and Applications. Wiley,
New York, NY.
Draper, N. R., and H. Smith. 1998. Applied Regression Analysis, 3rd Edition. Wiley, New York, NY.
Fowler, J., L. Cohen, and P. Jarvis. 1998. Practical Statistics for Field Biology, 2nd Edition. Wiley,
New York, NY.
Garcia, M. W. 1989. Forest Service experience with interdisciplinary teams developing integrated
resource management plans. Environ. Manage. 13: 583–592.
Hollander, M., and D. A. Wolfe. 1998. Nonparametric Statistical Methods, 2nd Edition. Wiley,
New York, NY.
Hoshmand, A. R. 2006. Design of Experiments for Agriculture and the Natural Sciences, 2nd
Edition. Chapman and Hall/CRC, Boca Raton, FL.
Hosmer Jr., D.W., and S. Lemeshow. 2000. Applied Logistic Regression, 2nd Edition. Wiley, New
York, NY.
Kish, L. 1987. Statistical Design for Research. Wiley, New York, NY.
Kish, L. 1995. Survey Sampling. Wiley, New York, NY (reprint of the 1965 edition).
Kish, L. 2004. Statistical Design for Research. Wiley, New York, NY (reprint of the 1987 edition).
Kleinbaum, D. G. 2005. Logistic Regression: A Self-Learning Text, 2nd Edition. Springer-Verlag,
New York, NY.
Le, C. T. 1998. Applied Categorical Data Analysis. Wiley, New York, NY.
Levy, P. S., and S. Lemeshow. 1999. Sampling of Populations: Methods and Applications, 3rd
Edition. Wiley, New York, NY.
Manly, B. F. J. 1992. The Design and Analysis of Research Studies. Cambridge University Press,
Cambridge.
Manly, B. F. J. 2004. Multivariate Statistical Methods: A Primer, 3rd Edition. Chapman and Hall,
Boca Raton, FL.
Mood, A. M., F. A. Graybill, and D. C. Boes. 1974. Introduction to the Theory of Statistics, 3rd
Edition. McGraw-Hill, Boston, MA.
Morrison, M. L., and B. G. Marcot. 1995. An evaluation of resource inventory and monitoring
program used in national forest planning. Environ. Manage. 19: 147–156.
Morrison, M. L., B. G. Marcot, and R. W. Mannan. 2006. Wildlife Habitat Relationships:
Concepts and Applications, 3rd Edition. Island Press, Washington, DC.
Motulsky, H. 1995. Intuitive Biostatistics. Oxford University Press, New York, NY.
Rosenbaum, P. R. 2002. Observational Studies, 2nd Edition. Springer-Verlag, New York, NY.
Schreuder, H. T., T. G. Gregoire, and G. B. Wood. 1993. Sampling Methods for Multiresource
Forest Inventory. Wiley, New York, NY.
Sokal, R. R., and F. J. Rohlf. 1981. Biometry, 2nd Edition. Freeman, New York, NY.
Sokal, R. R., and F. J. Rohlf. 1995. Biometry, 3rd Edition. Freeman, New York, NY.
Stokes, M. E., C. S. Davis, and G. G. Koch. 2000. Categorical Data Analysis in the SAS System,
2nd Edition. SAS Publishing, Cary, NC.
Thompson, S. K. 2002. Sampling, 2nd Edition, Wiley, New York, NY.
Underwood, A. J. 1997. Experiments in Ecology. Cambridge University Press, Cambridge.
Watt, T. A. 1998. Introductory Statistics for Biology Students, 2nd Edition. Chapman and Hall,
Boca Raton, FL.
Williams, B. K., J. D. Nichols, and M. J. Conroy. 2002. Analysis and Management of Animal
Populations. Academic, San Diego, CA.
Zar, J. H. 1998. Biostatistical Analysis, 4th Edition. Prentice-Hall, Englewood Cliffs, NJ.
Chapter 10
Synthesis: Advances in Wildlife Study Design
10.1 Introduction
In this chapter, we first briefly summarize our ideas on how to improve the way we
pursue wildlife field studies through study design. We hope that our ideas, devel-
oped through the pursuit of many types of studies conducted under many different
logistic and funding constraints, will serve to continue the discussion on improving
scientific knowledge, conservation, and management of natural resources. We then
provide the reader with a study guide for each chapter that serves as a reminder of
the major points raised therein.
The underlying basis for wildlife research is the pursuit of knowledge about eco-
logical systems. For this reason, researchers must understand the nature of the real-
ity they study (ontology), the characteristics and scope of knowledge (epistemology),
and what characterizes valuable and high quality research as well as value judg-
ments made during the research process (axiology). Although there is no single
prescriptive method of research in natural science, wildlife researchers employ cer-
tain intellectual and methodological approaches in common (see Chap. 1).
The goal of wildlife ecology research is to develop knowledge about wildlife popula-
tions and the habitats these populations use in order to benefit conservation. To attain
this goal, wildlife ecologists draw from the fields of molecular biology, animal physiol-
ogy, plant and animal ecology, statistics, computer science, sociology, public policy,
economics, law, and many other disciplines when developing wildlife research studies.
Using our knowledge of the species or system of interest, we ask important questions
and generate hypotheses or statements about how we think the system works. We then
draw on tools from many scientific disciplines to study, evaluate, and then refine our
hypotheses about how ecological systems work, generate new hypotheses, ask new
questions, and continue the learning process (see Table 1.2). It is critical that those
implementing conservation, such as natural resource managers, also clearly understand
the basics of sound methods of wildlife research; this knowledge is required to evaluate
the quality of information available to them for making decisions.
Our review of wildlife study design and statistical analyses leads us to the fol-
lowing conclusions and suggestions for change. First, the field of ecology will fail
to advance our knowledge of nature unless we ask important research questions and
follow rigorous scientific methods in the design, implementation, and analysis of
research and surveys. Natural resource management and conservation in general are
ill served by poorly designed studies that ignore the necessity of basic concepts
such as randomization and replication. More often than not, studies that ignore
sound design principles produce flawed results.
Scientists must clearly elucidate study goals, and the spatial and temporal applicabil-
ity of results, before initiating sampling. It is critical that managers determine how and
where they will use study results so that results match needs. Researchers should care-
fully evaluate required sample size for the study before initiation of field sampling.
Simple steps, such as sample size determination or power analysis, allow the researcher
to evaluate the likely precision of results before the study begins. In this manner,
researchers and natural resource managers alike can anticipate confidence in their deci-
sions based on study results. Wildlife scientists require probabilistic samples and replica-
tion for all studies so that there is less chance that the results are biased and a greater
likelihood that variation in the results can be attributed to treatment effects when they
exist. Establishing replicates is often difficult in field situations, but scientists can usually
achieve replication with planning. We must avoid pseudoreplication, however, so that
natural resource managers do not make unsound decisions based on erroneous interpreta-
tions of data. If pseudoreplication is unavoidable (e.g., such as is often the case with iso-
lated, rare groups of animals), we must acknowledge the implications of the sampling
and account for it when interpreting results. Finally, we must interpret studies that do not
employ probabilistic sampling and replication (all descriptive studies) critically. Although
descriptive research can provide reliable data on such characteristics as typical clutch
sizes for a given bird species, it generally cannot provide reliable data on more complex
phenomena such as key factors limiting abundance of an endangered species.
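As one illustration of such a pre-study step, the classical normal-approximation formula gives the per-group sample size needed to detect a difference between two means; this sketch is ours, and the clutch-size numbers are purely illustrative:

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means. delta: difference worth detecting;
    sd: common standard deviation. All values here are illustrative."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)

# e.g., to detect a 2-unit difference in a mean with sd = 4:
print(n_per_group(delta=2, sd=4))  # 63 per group
# halving the detectable difference roughly quadruples the required n:
print(n_per_group(delta=1, sd=4))  # 252 per group
```

Running such a calculation before fieldwork begins tells the researcher, and the manager awaiting the results, whether the planned effort can deliver the precision the decision requires.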
In Sect. 10.3, we briefly summarize the primary points made in each of the previous
chapters. We hope that these summaries will help flesh out the points made in Sect.
10.2 and refer readers back to the appropriate chapters for more details where needed.
10.3 Summaries
● Suggests that researchers must determine the best approach for each individ-
ual study given specific constraints; it does not provide or condone a rote
checklist for excellent wildlife research programs.
12. Wildlife scientists use biological and statistical terms to represent various
aspects of what they study. This sometimes can be confusing as the same word
often is used in multiple contexts. Key biological and statistical concepts dis-
cussed in Chap. 1 and used in subsequent chapters include:
● The term “significant” is particularly problematic as it can mean that some-
thing is biologically, statistically, or socially significant. Based on a particu-
lar study, not all statistically significant differences matter biologically, and
just because we cannot find statistically significant differences does not
imply that important biological differences do not exist. Further, that
wildlife scientists find something biologically significant does not imply that
society will reach the same conclusion (and vice versa). Researchers must
clearly stipulate what they mean by “significant.”
13. Finally, we can divide wildlife studies into (1) those where the objectives focus
on measuring something about individual animals or groupings of animals, and
(2) those where the objectives focus on the habitat of the animal or group. This
differentiation is critically important as appropriate study design hinges upon it.
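The statistical-versus-biological distinction in item 12 can be demonstrated numerically (a hypothetical illustration; the means, standard deviation, and sample sizes are invented): with a large enough sample, even a biologically trivial difference becomes statistically significant.

```python
import math
from statistics import NormalDist

def two_sample_p(mean1, mean2, sd, n):
    """Two-sided p-value for a two-sample z-test with n per group and a
    common sd (normal approximation; the numbers below are hypothetical)."""
    se = sd * math.sqrt(2 / n)          # standard error of the difference
    z = abs(mean1 - mean2) / se
    return 2 * (1 - NormalDist().cdf(z))

# The same 0.1-unit difference, which few biologists would call meaningful:
print(two_sample_p(10.0, 10.1, sd=2, n=50) < 0.05)    # False: not "significant"
print(two_sample_p(10.0, 10.1, sd=2, n=5000) < 0.05)  # True: "significant"
```

The effect did not change between the two calls; only the sample size did, which is why a statistically significant result must always be weighed against its biological meaning.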
1. Sound wildlife study design relies on the ability of the scientist to think critically
when developing a study plan. Critical thought about the question of interest, the
system under study, and potential methods for separating and evaluating sources
of variation is necessary to ensure that we successfully define the causal and
mechanistic relationships between variables of interest.
2. Disturbing variables limit our ability to examine the impacts of explanatory variables
on the response variables of interest. Disturbing variables should be removed from the
study by controlling for them in the design using appropriate probabilistic sampling
methods, or in the analysis by treating them as controlled variables or covariates.
3. Random selection of experimental study units permits us to use probability the-
ory to make statistical inferences that extend to the target population. Random
assignment of treatments to study units helps to limit or balance the impacts of
disturbing factors. Replication of experimental treatments is necessary to cap-
ture the full variability of treatment effects.
4. When developing a study, determine what type of design is most appropriate for
the ecological question of interest. Determine whether a true experiment or
quasiexperiment is feasible, whether a study is best suited to a mensurative
approach, whether adaptive resource management is more appropriate, or
whether the study is limited to description alone.
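Item 3 above, random selection of study units and random assignment of treatments with replication, can be illustrated with a minimal sketch (the sampling frame and plot names are invented for the example):

```python
import random

random.seed(42)  # fixed seed only so the illustration is reproducible

# Hypothetical sampling frame of candidate study plots
frame = [f"plot_{i:02d}" for i in range(1, 41)]

# Random selection: a probability sample of study units from the frame
units = random.sample(frame, k=8)

# Random assignment with replication: 4 treated and 4 control units
random.shuffle(units)
assignment = {u: ("treated" if i < 4 else "control")
              for i, u in enumerate(units)}
print(assignment)
```

Selection from the frame supports inference to the target population, while randomized assignment with replicates balances disturbing factors across treatment and control.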
possible. The important point here is that all these studies are constrained by a
specific protocol designed to answer specific questions or address hypotheses
posed prior to data collection and analysis.
2. Once a decision is made to conduct research, there are a number of practical
considerations, including the area of interest, time of interest, species of interest,
potentially confounding variables, time available to conduct studies, budget, and
the magnitude of the anticipated effect.
3. Single-factor designs are the simplest and include both paired and unpaired
experiments of two treatments or a treatment and a control. Adding blocking
(randomized block, incomplete block, and Latin squares designs) further
complicates the completely randomized design. Multifactor designs include
factorial experiments, both two-factor and multifactor. Higher-order designs
result from the desire to include a large number of factors in an experiment;
the object of these more complex designs is to allow the study of as many
factors as possible while conserving observations. Hierarchical designs, as the
name implies, increase complexity by nesting experimental units, as in
split-plot and repeated-measures designs.
4. ANCOVA uses the concepts of ANOVA and regression to improve studies by
separating treatment effects on the response variable from the effects of covari-
ates. ANCOVA can also be used to adjust response variables and summary sta-
tistics (e.g., treatment means), to assist in the interpretation of data, and to
estimate missing data.
5. Multivariate analysis considers several related random variables simultane-
ously, each one being considered equally important at the start of the analysis.
This is particularly important in studying the impact of a perturbation on the
species composition and community structure of plants and animals. Multivariate
techniques include multidimensional scaling and ordination analysis by meth-
ods such as principal component analysis and detrended canonical correspond-
ence analysis.
6. Other designs are frequently used to increase efficiency, particularly in the face
of scarce financial resources or when manipulative experiments are impractical.
Examples of these designs include sequential designs, crossover designs, and
quasiexperiments. Quasiexperiments are designed studies conducted when con-
trol and randomization opportunities are limited. The lack of randomization
limits statistical inference to the study protocol, and broader inference usually
relies on expert opinion. The BACI study design is usually the optimum approach to
quasiexperiments. Meta-analysis of a relatively large number of independent studies
improves the confidence in making extrapolations from quasiexperiments.
7. An experiment is considered very powerful if the probability of concluding no
effect when in fact an effect exists is very small. Four interrelated factors deter-
mine statistical power: power increases as sample size, α-level, and effect size
increase; power decreases as variance increases. Understanding statistical power
requires an understanding of Type I and Type II error, and the relationship of
these errors to null and alternative hypotheses. It is important to understand the
concept of power when designing a research project, primarily because such
1. Wildlife populations and ecologies typically vary in time and space. A study
design should account for these variations to ensure accurate and precise esti-
mates of the parameters under study.
2. Various factors may lend bias to the data collected and study results. These
include observer bias, sampling and measurement bias, and selection bias.
Investigators should acknowledge that bias can and does occur, and take meas-
ures to minimize or mitigate the effects of that bias.
3. A critical aspect of any study is development of and adherence to a rigorous
quality assurance/quality control program.
4. Study plans should be regarded as living documents that detail all facets of a
study, including any changes and modifications made during application of the
study design. As a rule of thumb, study plans should have sufficient detail to
allow independent replication of the study.
5. Sampling intensity should be sufficient to provide the information needed and
the precision desired to address the study objectives. Anything less may consti-
tute a waste of resources.
6. Plot size and shape are unique to each study.
7. Pilot studies are critical: “Those who skip this step because they do not have
enough time usually end up losing time” (Green 1979, p. 31).
1. “Impact” is a general term used to describe any change that perturbs the current
system, whether it is planned or unplanned, human induced or an act of nature
and positive or negative.
2. There are several prerequisites for an optimal study design:
● The impact must not have occurred, so that before-impact baseline data can
provide a temporal control for comparing with after-impact data.
● The type of impact and the time and place of occurrence must be known.
● Nonimpacted controls must be available.
3. Impact assessment requires making assumptions about the nature of temporal
and spatial variability of the system under study; assumptions about the tempo-
ral and spatial variability of a natural (nonimpacted) system can be categorized
as in steady-state, spatial, or dynamic equilibrium.
4. Three primary types of disturbances occur: pulse, press, and those affecting tem-
poral variance. Background variance caused by natural and/or undetected distur-
bances makes identifying the magnitude and duration of a disturbance difficult.
5. The “before–after/control–impact,” or BACI, design is the standard upon which
many current designs are based. In the BACI design, a sample is taken before
and another sample is taken after a disturbance, in each of the putatively
disturbed (impacted) sites and in the undisturbed (control) sites.
6. The basic BACI design has been expanded and improved to include both tem-
poral and spatial replication (multiple controls; use of matched pairs).
7. Designs classified under “suboptimal” are designs without pretreatment data and
most often apply to the impact situation where you had no ability to gather
preimpact (pretreatment) data or plan where the impact was going to occur.
After-only impact designs also apply to planned events that resulted from
management actions, but were done without any pretreatment data.
8. The gradient approach is especially applicable to localized impacts within homo-
geneous landscapes because it allows you to quantify the response of elements
at varying distances from the impact and each gradient provides a self-contained
control at the point beyond which impacts are detected.
9. A serious constraint in the design of wildlife impact studies is the limited oppor-
tunity to collect data before the disturbance. The before period is often short and
beyond the control of the researcher; that is, the biologist has no control over
where or when the disturbance will occur. In some cases, it may be possible to
improve our understanding of potential temporal variation without studying for
multiple years by increasing the number of reference sites and spatial distribution
of study sites such that the full range of impact response is sampled.
10. Because of the unplanned nature of most disturbances, pretreatment data are sel-
dom directly available. Thus, determining the effects of the disturbance on
wildlife and other resources is complicated by (1) natural stochasticity in the
environment and (2) the unreplicated nature of the disturbance. To some extent,
multiple reference areas can improve confidence in the attribution of impact by
allowing comparison of the condition in the impacted area to a distribution of
conditions in the unimpacted (control) population.
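The reference-distribution comparison in point 10 amounts to asking where the impacted area falls within the spread of reference-area conditions. A minimal sketch with hypothetical values (the measurements and their scale are invented for illustration):

```python
# Hypothetical post-disturbance values: one impacted area vs. a set of
# unimpacted reference areas (numbers invented for illustration).
impacted = 3.1
references = [5.2, 4.8, 6.0, 5.5, 4.4, 5.1, 4.9]

# Fraction of reference areas with values as low as the impacted area;
# a very small fraction places the impacted condition outside the range
# of natural variation represented by the references.
frac_as_low = sum(r <= impacted for r in references) / len(references)
print(frac_as_low)  # → 0.0
```

The more reference areas sampled, the better this empirical distribution represents the range of unimpacted conditions.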
11. Epidemiological approaches, by focusing on determining incidence rates, lend
themselves to applications in impact assessment. The choice of the use factor, or
denominator, is more important than the numerator. The choice arises from the
preliminary understanding of the process of injury or death. The ideal denomina-
tor in epidemiology is the unit that represents a constant risk to the animal.
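The denominator choice stressed in point 11 can be illustrated by expressing fatalities per unit of exposure rather than per site or per survey. The exposure measure used below (animal-use-days) and all numbers are assumptions for illustration, not figures or recommendations from the text.

```python
# Hypothetical numbers: carcass count and an exposure-based denominator
# (animal-use-days), chosen so each unit represents roughly constant risk.
fatalities = 12
animal_use_days = 4800.0

# Incidence per 1,000 animal-use-days rather than per site or per survey.
incidence_per_1000_use_days = 1000.0 * fatalities / animal_use_days
print(incidence_per_1000_use_days)  # → 2.5
```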
12. Obtaining information on the sensory abilities of animals is a key step in
designing potential risk-reduction strategies.
1. Inventory and monitoring are key steps in wildlife biology and management;
they can be done in pursuit of basic knowledge or as part of the management
process.
2. Inventory assesses the state or status of one or more resources, whereas monitor-
ing assesses population changes or trends.
3. Monitoring can be classified into four overlapping categories: (1) implementa-
tion monitoring is used to assess whether or not a directed management action
was carried out as designed; (2) effectiveness monitoring is used to evaluate
whether a management action met its desired objective; (3) validation monitor-
ing is used to evaluate whether an established management plan is working; and
(4) compliance monitoring is used to determine whether management is occurring
according to established law or regulation.
1. All wildlife professionals should, at a minimum, be able to ask the proper ques-
tions needed to interpret any report or journal article. Such questions include
Index

A
Abies spp. See Fir
Adaptive management system, processes in, 296–297
Adaptive resource management
  elements of, 46
  monitoring and research program, 47
  natural resources, 48
  objective function, 47
Adaptive sampling, 151, 220–221
  adaptive cluster sampling example, 152–154
  challenges in, 158
  cluster in, 152
  Horvitz–Thompson (H–T) estimator, 154–155
  neighborhood in, 152
  stopping rules, 158
  stratified adaptive cluster sampling, 155
  strip adaptive cluster sampling, 155
  systematic adaptive cluster sampling, 155
After-only suboptimal designs, 247
Against Method and Science in a Free Society, 9
Aix sponsa. See Wood ducks
Akaike’s Information Criteria (AICc), 103
Analysis of covariance (ANCOVA), 81–83, 99–102
Analysis of variance (ANOVA), 83, 238–239, 348, 350
Anthropogenic/extrinsic factors for change, monitoring in, 276
Aquila chrysaetos. See Golden eagles
Arctic National Wildlife Refuge (ANWR), 161
Asymmetrical analysis of variance, 243–244
Attributable risk (AR), assessment of, 254–255
Axiology, definition of, 12

B
BACI (before–after/control–impact) design, 44
Barred owls, 296
Before-after/control-impact design, 109–110, 237–242
Before-after/control-impact-pairs design (BACIP design), 242–243
Before–after design, 112
Belt (strip) transect, 168
Binomial test. See Chi-square test
Bioequivalence testing, 123–124
Biometry, 349
  courses in, 350
Biotic community, definition of, 27
Bird fatalities study, with turbine types, 100
Birth rate, 179
Black-tailed deer diets study, 46. See also Descriptive studies, wildlife
Bobcats, 296
Bootstrap, 168
Bootstrap technique, 272
Breeding Bird Survey, 308
Breeding season, and bird study, 205–206
British Columbia, black-tailed deer diets study in, 46
Brown-headed cowbird, 313
  natural history and occurrence of, 314
  researches and studies on, 316, 317, 320, 322, 331, 336–337
Brush mouse (P. boylii), 201

C
Canada lynx, 296
CAPTURE program, for statistical analyses, 360
Capture-mark-recapture methods
  model-based inferences for, 63
  monitoring marked individuals, 53
Incomplete block design, analysis of, 89
Indicator species analysis, in inventory and monitoring studies, 279–280
International Encyclopedia of Unified Science, 9
Interrupted time series, 109
Intervention analysis
  for study of environmental contamination, 79
  quasiexperimental designs for, 109
  time-series method of, 112
Intrinsic/natural factors of changes, monitoring in, 276
Inventory studies, in wildlife research, 267
  adaptive management system, 296–297
    microhabitat implementation monitoring, 299–307
    thresholds and trigger points, 298–299
  and monitoring, 281–283
  basic applications of, 271–273
  fast for slow dynamics substitutions and modeling, 293–295
  genetic techniques, 295–296
  goal of, 271
  long-term and regional applications of, 308
  retrospective studies, 291–292
  sampling design selection for
    indices used, 290–291
    occupancy vs. abundance, 287–288
    resources identification, 283–284
    sampling area selection, 284
    sampling strategies, 288–290
    study duration and design, 285–287
  short-term and small area applications of, 307
  space for time substitutions, 292–293
  statistical measures in, 277–281

J
Jackknife procedure, for estimating population parameter, 168
Junco hyemalis. See Dark-eyed junco
Juniperus spp., 269

K
Kangaroo rats, 260
Kaplan–Meier estimator, as extension of binomial estimator, 183
Kaplan–Meier survival curve, 354
Key deer (Odocoileus virginianus clavium), 97
Key Habitat Components, monitoring of, 300–302
Kirtland’s warbler, 260
Kruskal–Wallis test, 354

L
Latin square sampling +1 design, 162
Latin squares design, 89–90
Learning by doing. See Adaptive resource management
Lesser scaup (Aythya affinis), 102
Line intercept sampling, 164–165, 212–213
Line-transect sampling, 168–171
Linear regression, 348
Local extinction probability, definition of, 27
Log-linear model, for studying wildlife ecology, 353
Logistic regression
  based on binomial distribution, 353
  for creating models and second-order Akaike’s Information Criteria, 103
  for recording animals by habitat or behavior, 101
  ratio technique and double sampling, 161
Long-term studies, on wildlife populations, 201–202
Longitudinal experiment. See Repeated measures designs
Lynx canadensis. See Canada lynx
Lynx rufus. See Bobcats

M
Management by experiment. See Adaptive resource management
Manager
  duties of, 356
  statistical training for, 356–357
  wildlife study by, decision of, 364
Manatees (Trichechus manatus), 161
Manipulative experiments for ecological system
  experimental unit, 42
  impoundment example, 43, 44
  scale of treatment, 48
Manipulative studies, for surveys of resource use, 78
Mann–Whitney test, for comparing two unpaired groups, 354
MANOVA. See Multivariate analysis of variance
Marine environment, analysis of, 241
MARK program, for statistical analyses, 360
Slow processes. See Long-term studies, on wildlife populations
Small areas, sampling of, 219
Snowshoe hares, 201
Southwestern ponderosa pine forests, 202
Space-for-time substitution, in monitoring studies, 292–293
Spatial and Temporal Sampling, 199, 200
  spatial replication designs, 202
  time spans designs, 201–202
Spatial equilibrium
  multiyear studies for, 246
  occurrence of, 236
Spatial sampling, of wildlife population, 199–201
Spatial scales. See Scales, in ecological studies
Spatial statistics
  application to wildlife conservation, 188, 190
  model-based analysis, 149
Spearman correlation, for selecting statistical test, 354
Species richness, in inventory and monitoring studies, 281
Species sampling, inventory in, 272–273
Spizella passerina (chipping sparrows), 206
Split-plot designs
  used in agricultural and biological experiments, 95
  variable for impacts on hardwood hammocks, 97
Spotted owl (Strix occidentalis), 185
Spruce, 269
Statistical power analysis, in inventory and monitoring studies, 277–279
Statistical analyses, for environmental impact detection, 239
Statistical design, for studying wildlife ecology, 358
Statistical environments, in wildlife science, 67–68
Statistical knowledge, undergraduates, 347–349
Statistical programs
  designed to estimate population parameters, 68
  development of, 295
  explanatory variables for, 38
  for analysis of ecological data, 67
  used for estimation procedures, 179
Statistical test
  for estimating error variance, 82
  for hypothesis testing and estimation, 54–56
  procedure for selection of, 354
Steady-state system, categorized assumptions for, 235–236
Stopping rules, in adaptive sampling, 158
Stratified sampling
  application in short-term studies of wildlife, 148
  formulas for computing mean and its standard error based on, 147
  two-phase adaptive sampling, 221
Stressors, in inventory and monitoring studies, 280
Strix occidentalis. See Mexican Spotted Owl
Strix occidentalis caurina. See Northern spotted owls
Strix varia. See Barred owls
Study designs
  adjustments in, 126–127
  improvement in reliability of, 112–113
Suboptimal study designs, development of, 231–232
Subtle processes, affecting wildlife population, 202
System recovery, assumptions of, 235–237
Systematic random sampling design, 219
Systematic sampling
  for producing unbiased estimates of population total, mean, and variance, 49
  objectives in, 167
  probabilistic sampling procedure for, 130
  selection of sampling units with equal probability, 102

T
T-square procedure, 172–173
Tegu lizards (Tupinambis spp.), 94
Temporal sampling, of wildlife population, 199–201
Temporal scales. See Scales, in ecological studies
Temporal variance, disturbances in, 232
The Journal of Wildlife Management, 21, 343
The Logic of Scientific Discovery, 8
The Social Construction of Reality, 10
The Structure of Scientific Revolutions, 9
Thiafentanil dose treatments, on mule deer, 105
Time-series designs, 250
Time-to-event models, 182–184
Transcription error, 209