Statistical Methods For AM Studies
LAND MANAGEMENT HANDBOOK 42
1998
Ministry of Forests
Research Program
Citation
Sit, V. and B. Taylor (editors). 1998. Statistical Methods for Adaptive Management Studies. Res. Br.,
B.C. Min. For., Victoria, B.C. Land Manage. Handb. No. 42.
Prepared for:
B.C. Ministry of Forests
Research Branch
PO Box 9519, Stn Prov Govt
Victoria, BC V8W 9C2
Published by
B.C. Ministry of Forests
Forestry Division Services Branch
Production Resources
595 Pandora Avenue
Victoria, BC V8W 3E7
© 1998 Province of British Columbia
Copies of this and other Ministry of Forests
titles are available from:
Crown Publications Inc.
521 Fort Street
Victoria, BC V8W 1E7
Send comments to: Vera Sit, B.C. Ministry of Forests, Research Branch,
PO Box 9519, Stn Prov Govt, Victoria, BC V8W 9C2
Ministry of Forests Publications Internet Catalogue: www.for.gov.bc.ca/hfd
ACKNOWLEDGEMENTS
LIST OF CONTRIBUTORS
Judith L. Anderson
Wendy A. Bergerud
Bruce G. Marcot
Randall M. Peterman
Calvin N. Peters
William J. Reed
Richard D. Routledge
Carl J. Schwarz
Vera Sit
G. John Smith
Brenda Taylor
CONTENTS
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
1 Statistics and the Practice of Adaptive Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
J. Brian Nyberg
2 Design of Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Amanda F. Linnell Nemec
3 Studies of Uncontrolled Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Carl J. Schwarz
4 Retrospective Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
G. John Smith
5 Measurements and Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Richard D. Routledge
6 Errors of Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Judith L. Anderson
7 Bayesian Statistical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Wendy A. Bergerud and William J. Reed
8 Decision Analysis: Taking Uncertainties into Account
in Forest Resource Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Randall M. Peterman and Calvin N. Peters
9 Selecting Appropriate Statistical Procedures and Asking
the Right Questions: A Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Bruce G. Marcot
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
TABLES
3.1 Equivalencies between terms used in surveys and in experimental design . . . . . . . . . . . . . . . . 29
4.1 I/A ratio by colony, 1990-1995 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2 I/A ratios for 1996 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.1 Diversity measures for the abundance patterns in Figure 5.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.1 Four possible outcomes of a statistical test of a null hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.1 Numbers of previously sampled plots from both NSR and SR cutblocks . . . . . . . . . . . . . . . . 93
7.2 Probability parameters for p(θ), prior distribution of θ (cutblock is NSR or SR),
and p(X|θ), conditional probability of observing X, given θ (cutblock is NSR or SR) . . . 94
7.3 The likelihoods (probability that X plots out of 12 are US given θ) and the
posterior probability that the cutblock is NSR for all possible X values,
when the prior probability, π0 = 0.84 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.4 Suggested cutoff values for the Bayes factor when comparing two hypotheses . . . . . . . . . . 97
7.5 Hypothetical gains for each combination of action and state of nature . . . . . . . . . . . . . . . . . 98
7.6 The Bayes posterior gain and posterior Bayes decision for all possible numbers of
understocked plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1 A generalized decision table showing calculation of expected outcomes
for two potential management actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.2 Some possible arrangements that could be considered for a thinning experiment . . . . . . 117
8.3 Results of Sainsbury's (1991) calculations of the benefits of different designs
for an active adaptive management experiment on groundfish in Australia . . . . . . . . . . . 120
9.1 Stages of an adaptive management project and sources of information
appropriate for each stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
A1.1 Numbers of previously sampled plots (observed as US or S)
from both NSR and SR cutblocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
FIGURES
2.1 Relationship between the study units in a typical research experiment
and an adaptive management experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Design and analysis of an adaptive management experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1 A classification of the methods considered in this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Relationship between degree of control, strength of inference, and type of study design . . . 21
3.3 Simplified outcomes in a BACI design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 Problems with the simple BACI design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 The BACI-P design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.6 The enhanced BACI-P design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1 Comparing the development of a retrospective and prospective study . . . . . . . . . . . . . . . . . . . 46
4.2 I/A ratio vs colony size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.1 Examples of accuracy and precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2 Errors in x-values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3 Estimated numbers of chinook salmon spawning in the
Upper Fraser Watershed near Tête Jaune Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4 Four dominance patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.1 The relationship between scientific and statistical hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2 Variables influencing power to detect the difference between a
sample mean and a constant for the wood duck nest cavity example . . . . . . . . . . . . . . . . . . . . . 74
6.3 A posteriori power analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.1 Components of a Bayesian analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.2 Distribution of sample plots for the silviculture example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.3 Probability tree for the silviculture example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.4 Decision tree for the silviculture example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.1 Changes in estimates of various physical constants as new experimental
or measurement methods were developed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.2 A simple example of a generalized decision tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.3 Decision tree for the Tahoe National Forest example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.4 Possible models for hypothetical data on average volume per tree
at age 100 years as a function of initial density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
8.5 Posterior probabilities for different slopes of a linear model . . . . . . . . . . . . . . . . . . . . . . . . . . 114
8.6 An example sensitivity analysis of Cohan et al.'s (1984) decision analysis
on the Tahoe burning example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
8.7 Decision tree for the analysis of various management actions in
Sainsbury's (1988) large-scale fishing experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
9.1 Causes and correlates: four examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
1 STATISTICS AND THE PRACTICE OF ADAPTIVE MANAGEMENT
J. BRIAN NYBERG
Abstract
1.1 Introduction
The concept of adaptive management (Holling [editor] 1978) is steadily gaining wider acceptance in
forestry, especially in Canada and the United States
(e.g., Schmiegelow and Hannon 1993; Bormann et al.
1994; Nyberg and Taylor 1995; Covington and Wagner [technical coordinators] 1996; MacDonald et al.
1997). As a hybrid of scientific research and resource
management, adaptive management blends methods
of investigation and discovery with deliberate manipulations of managed systems. Through observation
and evaluation of the ways that human interventions
affect managed systems, new knowledge is gleaned
about system interactions and productive capacities.
This new knowledge is then applied to future decisions in a cycle of continuous improvement of
policies and field practices.
Adaptive management has somewhat different
goals from research and presents challenges that differ both in scope and nature from those posed by
typical forest research studies. Consequently, designing and analyzing adaptive management studies
involves more than simply transferring research techniques to management problems. Scientists can play
The sequence may need to be repeated in a continuing learning cycle if uncertainties remain unresolved
or new ones appear.
This report deals mainly with the second, fourth,
and fifth steps in the adaptive management process,
namely the design (thoughtful selection) of practices
to be studied, the measurement (monitoring) of responses, and the evaluation (analysis) of results.
as part of adaptive management, logging could proceed and at the same time yield important information on key indicators of stream conditions, fish and
wildlife habitat, and responses of trees and other vegetation. Through modified logging operations managers could apply several alternative treatments, such
as various widths of unharvested reserve zones or different degrees of partial cutting of riparian zones, or
both. Among other benefits, such a program might
reveal the cumulative effects on watershed dynamics
and wildlife habitat of large and widespread treatments: effects that could not be studied in a more
traditional, small-scale research experiment.
Because adaptive management has attributes common to scientific research and sometimes draws on
research techniques such as controlled experiments,
some may assume that an ambitious program of
adaptive management would obviate the need for research. They would be wrong, for intensive research
is far more suited than adaptive management to
answering some questions. Intensive research can
produce deeper knowledge of selected system
processes, such as mechanisms of physiological response in seedlings exposed to varying temperature
and moisture regimes, than could adaptive management. It may also be the only approach suitable for
sorting out the interacting effects of a number of factors on some dependent variable. This in-depth
knowledge may be crucial for building models used
to forecast how the overall system will respond to
management.
Adaptive management is most suited to selecting
amongst alternative courses of action, such as different partial cutting treatments that could be applied to
a particular site and stand type. It can also be helpful
for testing the modelled responses of managed systems against real-world results of management,
across a much wider range of conditions than could
any practical program of intensive research.
In fact, research and adaptive management complement each other, so that the application of both
approaches to a problem will almost certainly lead to
better results than use of either alone. Adaptive management can reveal management surprises;
research can help to explain them.
Adaptive management may be valuable to anyone
facing substantial uncertainty about the best course
of management action, as long as that person has or
1 It can be argued that adaptive management is the best way to resolve gridlock based on mistrust of resource managers by public groups or
other stakeholders. The same is true where the impasse arises from competing ideas and values, fear of consequences, or other concerns
rooted in lack of knowledge of the forest's responses to management.
______. 1992. Can fisheries agencies learn from experience? Fisheries 17:6-14.
Holling, C.S. (editor). 1978. Adaptive environmental
assessment and management. J. Wiley, London,
U.K.
Lancia, R.A., C.E. Braun, M.W. Collopy, R.D.
Dueser, J.G. Kie, C.J. Martinka, J.D. Nichols,
T.D. Nudds, W.R. Porath, and N.G. Tilghmann. 1996. ARM! for the future: adaptive
resource management in the wildlife profession. Wildl. Soc. Bull. 24:436-42.
Lee, K.N. 1993. Compass and gyroscope: integrating
science and politics for the environment. Island
Press, Washington, D.C.
McLain, R.J. and R.G. Lee. 1996. Adaptive management: promises and pitfalls. Environ. Manage.
20:437-48.
MacDonald, G.B., R. Arnup, and R.K. Jones. 1997.
Adaptive forest management in Ontario: a literature review and strategic analysis. Ontario
Min. Nat. Resour., For. Res. Info. Pap. 139.
Namkoong, G. 1997. A gene conservation plan for
loblolly pine. Can. J. For. Res. 27:433-7.
Nyberg, J.B. and B. Taylor. 1995. Applying adaptive
management in British Columbia's forests. In
Proc. FAO/ECE/ILO International Forestry
Seminar, Prince George, B.C., Sept. 9-15, 1995,
pp. 239-45. Can. For. Serv., Prince George, B.C.
Schmiegelow, F.K.A. and S.J. Hannon. 1993. Adaptive
management, adaptive science and the effects
of forest fragmentation on boreal birds in
northern Alberta. Trans. N. Am. Wildl. Nat.
Resour. Conf. 58:584-97.
Taylor, B., L. Kremsater, and R. Ellis. 1997. Adaptive
management of forests in British Columbia.
B.C. Min. For., For. Practices Br., Victoria, B.C.
Walters, C.J. 1986. Adaptive management of renewable resources. McGraw-Hill, New York, N.Y.
Walters, C.J. and C.S. Holling. 1990. Large-scale
management experiments and learning by
doing. Ecology 71:2060-8.
Walton, M. 1986. The Deming management method.
Perigee Books, New York, N.Y.
2 DESIGN OF EXPERIMENTS
AMANDA F. LINNELL NEMEC
Abstract
2.1 Introduction
Successful management of our forests is a dynamic
process in which current programs are continually
monitored and adapted as new information becomes
available and policies change. Because the outcome
of decisions is always uncertain, managers often experiment with new strategies to help determine the
best course of action. Although such tests do not
necessarily conform to the standards of a strictly controlled research experiment, many issues, such as
elimination of bias, repeatability of results, efficient
use of resources, and quantification of uncertainty,
are the same. For this reason, managers, as well as researchers, can benefit from a good understanding of
the principles of sound experimental design. This
chapter reviews the theory of classical experimental
design and the assumptions that provide a basis for
statistical inference from experimental data. Application of traditional theory to the design of adaptive
management experiments is considered.
The literature on the design of experiments is vast.
Classical books, such as The Design of Experiments by
Fisher (1935), The Design and Analysis of Experiments
by Kempthorne (1952), Experimental Designs by
Cochran and Cox (1957), and Planning of Experiments
by Cox (1958), remain valuable sources of guidance
Figure 2.1 Relationship between the study units in a typical research experiment and an adaptive management experiment (forest ecosystem, management unit, experimental unit).
2.3 Objectives
The first requirement in any experiment is a clear
statement of the goals. Paraphrasing Box et al. (1978,
p. 15), the purpose of an experiment must be clear
and agreed upon by all interested parties; there must
be agreement on what criteria will determine whether
the objectives have been accomplished; and, finally,
in the event that the objectives change during the
course of the experiment, an arrangement must be
made by which all interested parties are informed of
the change, and agreement on the revisions can be
reached. The importance of these points cannot be
overemphasized. Without clear objectives, the outcome of an experiment is predictable: ambiguous
results and wasted time, money, and valuable (possibly irreplaceable) resources. Unnecessary waste is
always unacceptable. However, when large management units are involved, the costs can be devastating.
Defining the objectives of an experiment requires
careful consideration of the components that make
up the system under study, the forces that drive the
system, and the best means of extracting information
about both. In a small-scale research experiment, attention might reasonably be restricted to relatively
simple questions: for instance, how does tree
growth differ under various controlled conditions?
The objectives of adaptive management experiments
typically concern more complicated issues, such as,
how is biodiversity affected by forest practices?
In both cases, general scientific concepts (e.g., tree
growth and biodiversity) must be stated in terms of
well-defined, measurable quantities (e.g., height or
diameter increment, number of species). These
quantities provide a concrete basis for planning
experiments and for analyzing the results.
The objectives of an experiment are often posed as
hypotheses to be tested or parameters to be estimated. Special care must be taken to ensure that the
hypotheses are sensible and that the parameters are
useful for making decisions. In classical hypothesis
testing, a so-called null hypothesis is retained unless
there is convincing evidence to the contrary. Based
on the outcome of the experiment, the null hypothesis is either retained or rejected in favour of a specific
alternative hypothesis. (See Anderson, this volume,
Chap. 6, for a discussion of the associated errors of
inference.) The null and alternative hypotheses
should be defined so that both outcomes (i.e., acceptance or rejection of the null hypothesis) represent
reasonable and informative conclusions that would
1 The term "treatment" will hereafter refer to a particular set of conditions, an action, or an entire management strategy.
experimental units have a variety of origins (e.g., different geographic regions) and therefore exhibit
considerable inherent variability. In such situations, a
substantial reduction in the experimental error can
often be achieved by separating the units into homogeneous groups, or blocks, according to origin, or
some other factor. Treatments are then randomly assigned within each block. This is a form of restricted
randomization because each treatment is constrained
to occur a fixed number of times in each block. In a
completely randomized design, the randomization is
unrestricted, which might result in a very uneven distribution of origins or other attributes among the
treatment groups. Many other designs with some
form of restricted randomization exist, including
split-plot designs (which impose constraints on the
assignment of treatments to two types of experimental units: main plots and subplots), Latin squares,
and lattice designs. Refer to Cochran and Cox (1957)
and Anderson and McLean (1974) for more information about these and other designs.
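The restricted randomization described above can be sketched in a few lines of code. This is only an illustration of the principle, not a procedure from this handbook; the block names and treatments below are hypothetical.

```python
import random

def assign_rcb(blocks, treatments, seed=0):
    """Randomized complete block design: each treatment occurs exactly
    once in every block, in an independently shuffled order."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # restricted randomization: shuffle within each block
        layout[block] = order
    return layout

# Hypothetical example: three riparian treatments in four geographic blocks.
design = assign_rcb(["north", "south", "east", "west"],
                    ["20 m reserve", "30 m reserve", "partial cut"])
```

Because each treatment is constrained to occur once per block, differences among blocks (e.g., geographic origin) are removed from the experimental error.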
Random sampling is another way of avoiding bias
when factors, such as species or age, cannot be assigned. Experimental units should be randomly
chosen from the experimental population and sampling units should likewise be selected at random
from the experimental units. Random samples, unlike haphazard samples or judgement samples
(samples judged to be representative by an expert
who makes the selection), have known statistical
properties. This allows the precision of a result to be
estimated from the sample, something that is not possible with non-random sampling. For further discussion of the topic, see Deming (1960, pp. 30-33)
and Schwarz (this volume, Chap. 3).
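The point that random samples have known statistical properties can be made concrete: the precision of a sample mean can be estimated from the sample itself. The sketch below uses a simulated population and standard formulas; it is illustrative only, and the numbers are hypothetical.

```python
import random
import statistics

def srs_estimate(population, n, seed=0):
    """Estimate the population mean from a simple random sample,
    with a standard error that includes the finite population correction."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    mean = statistics.fmean(sample)
    var = statistics.variance(sample)   # sample variance s^2
    fpc = 1 - n / len(population)       # finite population correction
    se = (fpc * var / n) ** 0.5
    return mean, se

# Hypothetical population of 300 plot measurements (mean 50, sd 10).
rng = random.Random(42)
pop = [rng.gauss(50, 10) for _ in range(300)]
mean, se = srs_estimate(pop, 20)
```

With a haphazard or judgement sample, no analogous standard error can be justified, because the selection probabilities are unknown.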
Randomization and random sampling are closely
related ideas leading, in many cases, to the same
mathematical models and equivalent data analyses
(Fienberg and Tanur 1987; Smith and Sugden 1988).
For instance, randomization with blocking is analogous to stratified random sampling, and split-plot
designs are comparable to cluster sampling. Despite
the parallels, randomization is traditionally discussed
in connection with experimental design, while issues
relating to sampling design (e.g., type and number of
sampling units, method of sampling) are reserved for
discussions of observational studies. For more information on the latter subject refer to Schwarz (this
volume, Chap. 3).
Randomization is one of the simplest ideals to
achieve in both small- and large-scale experiments.
light of present computing power. As the role of experiments continues to evolve, new criteria and
principles will undoubtedly emerge (see Section 2.6).
Successful experimentation depends on more than
the choice of factors and the use of appropriate randomization, replication, and blocking. Many practical details must also be considered. All pertinent
measurements of the experimental (sampling) units
must be identified, appropriate field procedures and
data collection forms must be developed, and provision must be made for adequate supervision of the
data collection. In large-scale studies, coordination
and optimization of procedures are especially important. For a checklist of these and other aspects of the
planning and execution of a study, refer to Sit (1993).
2.5 Statistical Inference for Experiments
Statistical inference and experimental design are
closely linked. Design determines the kind of statistical inferences that are possible, while consideration
of the proposed method of analysis almost always influences design (e.g., sample-size calculations depend
on the hypothesis testing procedure or estimation
method that will be used). In fact, failure to contemplate how the data will be analyzed invariably results
in a poor design.
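The sample-size calculations mentioned above can be illustrated with the usual normal-approximation formula for comparing two group means. This is a generic textbook sketch, not a method prescribed in this handbook, and the silvicultural numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size(sigma, delta, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-sample z-test
    to detect a mean difference of `delta` with the stated power."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_b = z.inv_cdf(power)          # quantile corresponding to the power
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

# Hypothetical: detect a 2 cm difference in height increment, sigma = 4 cm.
n_per_group = sample_size(sigma=4, delta=2)
```

Note how the required sample size depends on the planned test (alpha, power) as well as on the variability and the difference worth detecting, which is exactly why analysis must be contemplated at the design stage.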
All statistical inferences are based on a set of assumptions that links the data to the experimental
population via the design. Thus inferences are necessarily confined to the experimental population.
Because the relationship between experimental and
target populations is unknown, extrapolation to the
latter is more or less conjecture and should be viewed
with caution, particularly when the gap between
experimental and target populations is large. The
validity of statistical inference in an experimental setting is discussed more fully by Deming (1953, 1975),
Box et al. (1978, Chap. 1), and Hahn and Meeker
(1993).
Analysis of variance (ANOVA) methods are commonly applied to experimental data, so much so that
discussions of experimental design are often more
about ANOVA than fundamental issues of design.
Both are concerned with sources of variation and the
estimation of experimental error. The basic premise
of ANOVA is that the observed variability in experimental data can be attributed to a finite number of
identifiable sources, including factors under the control of the investigator, uncontrolled experimental
errors, and various interactions. A simple, one-way,
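The ANOVA premise described above, that observed variability can be partitioned among identifiable sources, can be sketched from first principles for the one-way case. The data below are hypothetical height increments, and this minimal function is an illustration rather than a recommended analysis routine.

```python
import statistics

def one_way_anova(groups):
    """One-way ANOVA F statistic: partition total variability into
    between-group (treatment) and within-group (error) components."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = statistics.fmean([x for g in groups for x in g])
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical height increments (cm) under three treatments.
f = one_way_anova([[12.1, 13.4, 11.8], [15.0, 14.2, 15.9], [12.5, 12.0, 13.1]])
```

A large F indicates that the between-treatment component is large relative to the experimental error; identical groups give F = 0.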
2.8 Summary
Sound experimental design is essential for adaptive
management of valuable forest resources. Adherence
to the principles of randomization, replication, and
blocking helps to ensure that an experiment meets
the basic requirements for success (Cox 1958): absence of systematic error, precision, validity for an
appropriate range of conditions, simplicity, and an
estimate of uncertainty. Failure to consider these issues leads to unnecessary waste and, in the worst
case, bad decisions resulting in serious damage to
sensitive forest ecosystems. Development and adoption of adaptive methods for carrying out large-scale
experiments has been slow, due partly to the barriers
created by excessively technical language, a lack of
analytical tools, and a limited number of successful
applications to serve as models. Overcoming these
obstacles by continuing efforts to educate and to inform (e.g., Biometrics Information series published
by Research Branch, B.C. Ministry of Forests) can
only help to improve matters in the future.
References
Anderson, V.L. and R.A. McLean. 1974. Design of experiments: a realistic approach. Marcel Dekker,
New York, N.Y.
Deming, W.E. 1953. On the distinction between enumerative and analytic surveys. J. Am. Statist.
Assoc. 48:244-55.
Kempthorne, O. 1952. The design and analysis of experiments. Krieger, Malabar, Fla.
Mandel, J. 1964. The statistical analysis of experimental data. Dover, Mineola, N.Y.
Montgomery, D.C. 1991. Design and analysis of experiments. 3rd ed. J. Wiley, New York, N.Y.
Peterman, R.M. and C. Peters. [n.d.] Decision analysis: taking uncertainties into account in forest
resource management. This volume.
Pike, D.J. 1984. Discussion (of paper by Steinberg and Hunter 1984). Technometrics 26:105-9.
Schwarz, C.J. [n.d.] Studies of uncontrolled events. This volume.
Sit, V. 1993. What do we look for in a working plan? B.C. Min. For., Res. Br., Victoria, B.C.
Biometrics Inf. Pamph. No. 44.
Steinberg, D.M. and W.G. Hunter. 1984. Experimental design: review and comment.
Technometrics 26:71-130.
3 STUDIES OF UNCONTROLLED EVENTS
CARL J. SCHWARZ
Abstract
3.1 Introduction
The rationale for carefully planned experiments in
ecology is well documented (Hurlbert 1984). A well-designed experiment will have a high probability of
detecting important, biologically meaningful differences among the experimental groups. Furthermore,
because the manager directly manipulated the experimental factors and randomly assigned the
experimental units to the particular combination of
experimental factors, the manager can infer a causal
relationship between the experimental factors and
the response variable. The manager who takes a similar approach and practices active adaptive
management can make the strongest possible inferences about the role of the experimental factors.
In many cases, controlled experiments are impractical or too expensive, and surveys of existing
ecological populations are performed, even though
the resulting inferences will be weaker than those
obtainable through controlled experimentation.
Figure 3.1 A classification of the methods considered in this chapter: experimental studies (designed experiments) versus non-experimental studies (observational, analytical, impact, and descriptive surveys).
Figure 3.2 Relationship between degree of control, strength of inference, and type of study design: designed experiments offer the greatest control and strongest inference, followed by control-impact, impact, analytical, observational, and descriptive surveys.
divided into 300 1-hectare plots, and a random sample of 20 plots was selected and analyzed using aerial
photos.
Pitfall: A simple random sample design is often hid-
the same fashion as a simple random sample. However, the true precision of an estimator from a
systematic sample can be either worse or better than
a simple random sample of the same size, depending
if units within the systematic sample are positively or
negatively correlated among themselves. For example, if a systematic samples sampling interval
happens to match a cyclic pattern in the population,
values within the systematic sample are highly positively correlated (the sampled units may all hit the
peaks of the cyclic trend), and the true sampling
precision is worse than a simple random sample of
the same size. What is even more unfortunate is that,
because the units are positively correlated within the
sample, the sample variance will underestimate the
true variation in the population, and, if the estimated
precision is computed using the formula for a simple
random sample, a double dose of bias in the estimated precision occurs (Krebs 1989, p. 227). On the other
hand, if the systematic sample is arranged perpendicular to a known trend to try to incorporate additional variability in the sample, the units within a sample are negatively correlated; the true precision is then better than that of a simple random sample of the same size, but the sample variance overestimates the population variance, and the formula for precision from a simple random sample will overstate the sampling error.
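The danger of a sampling interval that matches a cyclic pattern is easy to demonstrate by simulation. The population below is an artificial sinusoid constructed for illustration; it is not data from this handbook.

```python
import math
import random
import statistics

# Hypothetical cyclic population: 200 units with a period-10 cycle.
pop = [50 + 20 * math.sin(2 * math.pi * i / 10) for i in range(200)]
pop_var = statistics.pvariance(pop)

# Systematic sample with interval 10: every unit lands at the same
# phase of the cycle, so the sample variance grossly understates
# the true variation in the population.
sys_sample = pop[3::10]
sys_var = statistics.variance(sys_sample)

# A simple random sample of the same size reflects the variation.
rng = random.Random(0)
srs = rng.sample(pop, 20)
srs_var = statistics.variance(srs)
```

Here the systematic sample's variance is essentially zero even though the population varies widely, which is the "double dose of bias" described above: the estimate looks far more precise than it really is.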
While logistically simpler, a systematic sample is
only equivalent to a simple random sample of the
same size if the population units are in random
order to begin with (Krebs 1989, p. 227). Even worse,
no information in the systematic sample allows the
manager to check for hidden trends and cycles.
Nevertheless, systematic samples offer the following practical advantages over simple random sampling if the bias in the estimated precision can be corrected:
- make plot relocation for long-term monitoring easier;
- allow mapping to be carried out concurrently with the sampling effort because the ground is systematically traversed;
- avoid poorly distributed sampling units, which can occur with a simple random sample (this problem can also be avoided by judicious stratification).
error is too small and does not fully reflect the actual
imprecision in the estimate).
Solution: To be confident that the reported standard
error really reflects the uncertainty of the estimate,
the analytical methods must be appropriate for the
survey design. The proper analysis treats the clusters
as a random sample from the population of clusters.
The methods of simple random samples are applied
to the cluster summary statistics (Thompson 1992,
Chap. 12; Nemec 1993).
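Treating the clusters, rather than the individual plots, as the sampling units can be sketched as follows. For simplicity this sketch weights all cluster means equally, which is an assumption of the illustration, and the plot values are hypothetical.

```python
import statistics

def cluster_estimate(clusters):
    """Treat clusters as the sampling units: summarize each cluster first,
    then apply simple-random-sample formulas to the cluster means."""
    means = [statistics.fmean(c) for c in clusters]  # one summary per cluster
    m = len(means)
    est = statistics.fmean(means)
    se = (statistics.variance(means) / m) ** 0.5     # SRS formula on summaries
    return est, se

# Hypothetical: four sampled cutblocks (clusters), several plots each.
est, se = cluster_estimate([[10, 12, 11], [18, 17, 19, 18], [9, 10], [14, 15, 13]])
```

Applying the simple-random-sample variance formula to the pooled plots instead would treat correlated within-cluster plots as independent and understate the standard error.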
Multi-stage sampling
In many situations the population is naturally divided into several different sizes of units. For example,
a forest management unit consists of several stands,
each stand has several cutblocks, and each cutblock
can be divided into plots. These natural divisions can
be easily accommodated in a survey through the use
of multistage methods. Units are selected in stages.
For example, several stands could be selected from a
management area; then several cutblocks are selected
in each of the chosen stands; then several plots are selected in each of the chosen cutblocks. Note that in a
multistage design, units at any stage are selected at
random only from those larger units selected in previous stages.
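The staged selection can be sketched as follows; the stand, cutblock, and plot labels and the sample sizes at each stage are invented for illustration.

```python
import random

random.seed(42)

# Hypothetical population: stands contain cutblocks, which contain plots
# (all labels invented for illustration).
stands = {
    f"stand{i}": {f"cutblock{i}-{j}": [f"plot{i}-{j}-{k}" for k in range(20)]
                  for j in range(5)}
    for i in range(10)
}

# Units at each stage are selected only from the larger units chosen in
# previous stages: stands first, then cutblocks, then plots.
sample = []
for stand in random.sample(sorted(stands), 3):
    for cutblock in random.sample(sorted(stands[stand]), 2):
        sample.extend(random.sample(stands[stand][cutblock], 4))

print(len(sample))  # 3 stands x 2 cutblocks x 4 plots = 24 plots
```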
The advantages of multistage designs are that costs can be reduced compared to a simple random sample of the same size, primarily through improved logistics. The precision of the results is less than that of an equivalent simple random sample, but because costs are lower, a larger multistage survey can often be completed for the same cost as a smaller simple random sample. This approach often results in a more precise design for the same cost. However, because data from complex designs are easily misused, simple designs are often highly preferred and end up being more cost-efficient when the costs associated with incorrect decisions are incorporated.
Pitfall: Although random selections are made at each
a small area are visited. A simple, cheap characteristic that is used to predict the value of the tree is measured. A subsample of the trees is then selected with probability proportional to the predicted value and remeasured using a more expensive measuring device. The relationship between the cheap and expensive measurements in the second phase is used with the simple measurements from the first phase to obtain a more precise estimate for the entire area. This example illustrates two-phase sampling with unequal probability of selection.
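A sketch of the second-phase selection, with invented predictor values. For simplicity the draw here is with replacement; a field protocol would normally sample without replacement.

```python
import random

random.seed(7)

# First phase: a cheap predictor (say, an eyeball estimate of stem
# volume) recorded for every tree in the area; values are invented.
predicted = {f"tree{i}": random.uniform(10.0, 50.0) for i in range(200)}

# Second phase: a subsample drawn with probability proportional to the
# predicted value, to be remeasured with the expensive instrument.
trees = sorted(predicted)
subsample = random.choices(trees, weights=[predicted[t] for t in trees], k=20)

print(len(subsample))  # 20 trees flagged for expensive remeasurement
```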
Stratification
All survey methods can potentially benefit from stratification (also known as blocking in the experimental-design literature). Stratification groups survey units
into homogeneous groups before conducting the survey, and then conducts independent surveys in each
stratum. At the end of the survey, the stratum results
are combined and weighted appropriately. For example, a watershed might be stratified by elevation into
three strata, and separate surveys are conducted within each elevation stratum. The separate results would
be weighted proportionally to the size of the elevation strata. Stratification will be beneficial whenever
variability among the sampling units can be anticipated and strata can be formed that are more
homogeneous than the original population.
A major question with stratified surveys is the
allocation of sampling units among the strata. Depending upon the goals of the survey, an optimal
allocation of sampling units can be one that is equal
in all strata, that is proportional to the stratum size,
or that is related to the cost of sampling in each stratum (Thompson 1992, Chap. 11). Equal allocation
(where all strata have the same sample size) is preferred when equally precise estimates are required for
each stratum as well as for the overall population.
Proportional allocation (where the sample size in
each stratum is proportional to the population size)
is preferred when more precise estimates are required
in larger strata. If the costs of sampling vary among
the strata, then an optimal allocation that accounts
for costs would try to obtain the best overall precision at the lowest cost by allocating units among the
strata accounting for the costs of sampling in each
stratum.
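The three allocation rules can be sketched as follows; the stratum sizes, standard deviations, and per-unit costs below are invented for illustration.

```python
# Illustrative stratum sizes, standard deviations, and per-unit sampling
# costs; all numbers are invented.
sizes = {"low": 500, "mid": 300, "high": 200}    # stratum population sizes
sds   = {"low": 2.0, "mid": 3.0, "high": 5.0}    # anticipated variability
costs = {"low": 1.0, "mid": 1.0, "high": 4.0}    # cost per sampled unit
n_total = 60

# Equal allocation: the same sample size in every stratum.
equal = {h: n_total / len(sizes) for h in sizes}

# Proportional allocation: sample size proportional to stratum size.
N = sum(sizes.values())
prop = {h: n_total * sizes[h] / N for h in sizes}

# Cost-adjusted optimal allocation (Thompson 1992, Chap. 11):
# n_h proportional to N_h * S_h / sqrt(c_h).
weights = {h: sizes[h] * sds[h] / costs[h] ** 0.5 for h in sizes}
W = sum(weights.values())
optimal = {h: n_total * weights[h] / W for h in sizes}

# Fractional allocations would be rounded in practice.
print(equal, prop, optimal)
```

Note how the cost-adjusted allocation shifts effort away from the expensive "high" stratum despite its large variability.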
Stratification can be carried out prior to the survey
(pre-stratification) or after the survey (post-stratification). Pre-stratification is used if the stratum
variable is known in advance for every plot (e.g., elevation of a plot). Post-stratification is used if the
stratum variable can only be ascertained after measuring the plot (e.g., soil quality or soil pH). The
advantages of pre-stratification are that samples can
be allocated to the various strata in advance to optimize the survey and the analysis is relatively
straightforward. With post-stratification, there is no control over the sample size in each stratum, and the analysis is more complicated (the sample sizes in each stratum are now random). Post-stratification can result in significant gains in precision, but does not allow the finer control of sample sizes found in pre-stratification.
Auxiliary variables
An association between the measured variable of interest and a second variable of interest can be
exploited to obtain more precise estimates. For example, suppose that growth in a sample plot is
related to soil nitrogen content. A simple random
sample of plots is selected and the height of trees in
the sample plot is measured along with the soil nitrogen content in the plot. A regression model is fit
(Thompson 1992, Chap. 7 and 8) between the two
variables to account for some of the variation in tree
height as a function of soil nitrogen content. This approach can be used to make precise predictions of the
mean height in stands if the soil nitrogen content can
be easily measured. This method will be successful if
a direct relationship exists between the two variables.
The stronger the relationship, the more effective this
method will be. This technique is often called ratio estimation or regression estimation.
Notice that multiphase designs often use an auxiliary variable but this second variable is only
measured on a subset of the sample units.
Unit size
A typical concern with any of the survey methods occurs when the population does not have natural
discrete sampling units. For example, a large section
of land may be arbitrarily divided into 1 m2 plots, or
10 m2 plots. A natural question is: what is the best size of unit? There is no simple answer; the choice depends upon several factors, which must be addressed for each survey:
Cost: All else being equal, sampling many small
plots may be more expensive than sampling fewer
size requirements for each objective may also be difficult. In these and many other cases, sample sizes are
determined solely by the budget for the survey.
3.3 Analytical Surveys
In descriptive surveys, the objective is simply to obtain information about one large group. In observational surveys, two deliberately selected subpopulations are chosen and surveyed, but the results are not generalized to the whole population. In analytical surveys, subpopulations are selected and sampled so that the observed differences among the subpopulations can be generalized to this and other similar populations.
Analytical and observational surveys are thus similar to designed experiments. However, the primary difference is that, in experiments, the manager
controls the assignment of the explanatory variables
while measuring the response variables, whereas in
analytical and observational surveys, neither set of
variables is under the control of the manager. (Refer
to Section 3.1, Examples B, C, and D). The analysis of
complex surveys for analytical purposes can be very
difficult (Sedransk 1965a, 1965b, 1966; Rao 1973; Kish
1984, 1987).
The first step in analytical surveys is to identify potential explanatory variables (similar to factors in
experiments). At this point, analytical surveys can be
usually further subdivided into three categories depending on the type of stratification:
• the population is pre-stratified by the explanatory variables and surveys are conducted in each stratum to measure the outcome variables;
• the population is surveyed in its entirety, and post-stratified by the explanatory variables; and
• the explanatory variables can be used as auxiliary variables in ratio or regression methods.
In very complex surveys, all three types of stratification may take place.
The choice among these categories is usually based on the ease with which the population can be pre-stratified and on the strength of the relationship between
the response and explanatory variables. For example,
sample plots can be easily pre-stratified by elevation
or by exposure to the sun, but it would be difficult to
pre-stratify by soil pH.
Pre-stratification has the advantage that the manager controls the number of sample points collected
in each stratum. However, the numbers are not
higher elevations have less growth. In the second experiment, the explanatory variable is the amount of
fertilizer applied. Ten stands are randomly assigned
to each of two doses of fertilizer. The amount of
growth is measured and it appears that stands that
receive a higher dose of fertilizer have greater growth.
In the first experiment, the manager cannot say
whether the differences in growth are due to differences in elevation or amount of sun exposure or soil
quality as all three may be highly related. In the second experiment, all uncontrolled factors are present
in both groups and their effects will, on average, be
equal. Consequently, the assignment of cause to the
fertilizer dose is justified because it is the only factor
that differs (on average) among the groups.
As noted by Eberhardt and Thomas (1991), rigorous application of survey-sampling techniques is needed when conducting analytical surveys; otherwise, these surveys are likely to be subject to biases. Experience and judgement are very important in evaluating the prospects for bias and in finding ways to control and account for these biases. The
most common source of bias is the selection of survey units; the most common pitfall is to select units
based on convenience rather than on a probabilistic
sampling design. The potential problems that this
can lead to are analogous to those that occur when it
is assumed that callers to a radio phone-in show are
representative of the entire population.
[Table: survey terms and their experimental-design analogues.]
Cluster sampling: (a) clusters are random effects, with units within a cluster treated as subsamples; or (b) clusters treated as main plots and units within a cluster treated as subplots in a split-plot analysis.
Multi-stage sampling: (a) nested designs, with units at each stage nested in units at higher stages and the effects of units at each stage treated as random effects; or (b) split-plot designs, with factors operating at higher stages treated as main-plot factors and factors operating at lower stages treated as subplot factors.
Stratification: strata treated as blocks.
[Remaining rows (sampling unit, subsample) not recoverable.]
from the same flaw as the previous design. The repeated surveys are pseudoreplicates in time rather than true replicates (Hurlbert 1984). The observed
change may have occurred regardless of the clearcut
because of long-term trends over time. Again, decisions based on this design are difficult to justify.
BACI: Before-after-control-impact surveys
As Green (1979) pointed out, an optimal impact survey has several features:
• the type of impact, time of impact, and place of occurrence should be known in advance;
• the impact should not have occurred yet; and
• control areas should be available.
The BACI design compares the before and after samples for the control site with the before and after samples for the treatment sites. This contrast is known as the area-by-time interaction (see Figure 3.3).
This design allows for both natural stream-to-stream variation and coincidental time effects. If the
clearcut has no effect, then change in water quality
between the two time points should be the same (i.e.,
parallel lines in Figures 3.3a and b). On the other
hand, if the clearcut has an impact, the time trends
will not be parallel (Figures 3.3c, d, and e).
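The area-by-time interaction is simply a difference of differences; a minimal sketch with invented readings:

```python
# Hypothetical mean water-quality readings (arbitrary units) for a
# control stream and an impact stream, before and after a clearcut.
control = {"before": 7.0, "after": 6.8}
impact = {"before": 7.2, "after": 5.1}

# Area-by-time interaction: the change at the impact site minus the
# change at the control site.  A value near zero means parallel time
# trends (no evidence of impact); a large value suggests an effect.
interaction = (impact["after"] - impact["before"]) - (
    control["after"] - control["before"])

print(interaction)
```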
Pitfalls: Hurlbert (1984), Stewart-Oaten et al. (1986),
and Underwood (1991) discuss the simple BACI design and point out concerns with its application.
First, because impact to the sites was not randomly
assigned, any observed difference between control
and impact sites may be related solely to some other
factor that differs between the two sites. One could
argue that it is unfair to ascribe the effect to the impact. However, as Stewart-Oaten et al. (1986) point
out, the survey is concerned about a particular impact
[Figure 3.3: Water quality over time, before and after impact, at control and impact sites (panels a–e); parallel time trends (panels a and b) indicate no impact, non-parallel trends (panels c–e) indicate an impact.]
points as the single-impact stream. Then if the observed difference in the impact stream is much different from what could be expected based on the multiple control streams, the event is said to have caused
an impact. When several control sites are monitored,
the lack of randomization is less of a concern because
the replicated control sites provide some information
about potential effects of other factors.
The second and more serious concern with the
simple BACI design with a single sampling point before and after the impact is that it fails to recognize
that natural fluctuations in the characteristic of interest that are unrelated to any impact may occur
(Hurlbert 1984; Stewart-Oaten et al. 1986). For example, consider Figure 3.4. If there were no natural
fluctuations over time, the single samples before and
after the impact would be sufficient to detect the
effects of the impact. However, if the population also
has natural fluctuations over and above the long-term average, then distinguishing cases where there is no effect from those where there is an impact is impossible. In terms of our example, differences in the water quality may be artifacts of the sampling dates, and natural fluctuations may obscure differences or lead to the conclusion that differences are present when they are not.

[Figure 3.4: Water quality over time with single samples before and after impact (panels a–c).]

[Figure 3.5: Water quality at control and impact sites, and the control–impact difference, before and after impact (panels a and b).]
BACI-P: Before-after-control-impact
paired designs
Stewart-Oaten et al. (1986) extended the simple BACI
design by pairing surveys at several selected time
points before and after the impact. Both sites are
measured at the same time points. An analysis of how
the difference between the control and impact sites
changes over time would reveal if an impact has occurred (Figure 3.5). The rationale behind the design is
that repeated sampling before the development indicates the pattern of differences over several periods of
potential change between the two sites. This survey
design provides information both on the mean difference in the water quality before and after impact, and
on the natural variability of the water quality measurements. If the changes in the mean difference are
large relative to natural variability, the manager has
detected an effect.
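A sketch of the difference-based analysis, using invented paired readings (four surveys before and four after the impact):

```python
import statistics

# Hypothetical paired water-quality readings at matched time points;
# all numbers are invented for illustration.
control_before = [7.1, 6.9, 7.3, 7.0]
impact_before  = [7.4, 7.0, 7.5, 7.2]
control_after  = [7.0, 7.2, 6.8, 7.1]
impact_after   = [5.3, 5.6, 5.1, 5.5]

# BACI-P analyzes the impact-control difference at each paired time.
d_before = [i - c for c, i in zip(control_before, impact_before)]
d_after  = [i - c for c, i in zip(control_after, impact_after)]

# The effect estimate is the change in the mean difference; it is judged
# against the natural variability of the differences within each period
# (formally, a two-sample t test on the two sets of differences).
effect = statistics.mean(d_after) - statistics.mean(d_before)
sd_before = statistics.stdev(d_before)
sd_after = statistics.stdev(d_after)

print(round(effect, 2), round(sd_before, 3), round(sd_after, 3))
```

Here the change in the mean difference is large relative to the spread of the differences, so the manager would conclude an effect has been detected.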
The decision between random and regularly
will depend upon the actual levels. For example, suppose that the readings of water quality at two sites at
the first time point were 200 versus 100, which has an
arithmetic difference of 100; at the second time point,
the readings were 20 versus 10, which has an arithmetic difference of 10; but both pairs are in a 2:1 ratio
at both time points. The remedy is simple: a logarithmic transform of the raw data converts a multiplicative difference into a constant arithmetic
difference on the logarithmic scale. This problem is
commonly found when water quality measurements
are concentrations (e.g., pH).
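Using the readings from the example, a minimal sketch of the transform:

```python
import math

# The example readings from the text: the same 2:1 ratio at both time
# points, but very different arithmetic differences.
site1 = [200, 20]
site2 = [100, 10]

raw_diffs = [a - b for a, b in zip(site1, site2)]
log_diffs = [math.log(a) - math.log(b) for a, b in zip(site1, site2)]

# On the log scale, the multiplicative 2:1 difference becomes a constant
# arithmetic difference of log(2) at both time points.
print(raw_diffs, log_diffs)
```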
Underwood (1991) also considered two variations
on the BACI-P design. First, it may not be possible to
sample both sites simultaneously for technical or logistical reasons. Underwood (1991) discussed a
modification where sampling is done at different
times in each site before and after impact (i.e., sampling times are no longer paired), but notes that this
modification cannot detect changes in the two sites
that occurred before the impact. For example, differences in water quality may show a gradual change
over time in the paired design prior to impact. Without paired sampling, it would be difficult to detect
this change. Second, sampling only a single control
site still has the problems identified earlier of not
knowing if observed differences in the impact and
the control sites are site-specific. Again, Underwood
(1991) suggests that multiple control sites should be
monitored. In our example, more than one control
site would be measured at each time point. The variability in the difference between each control site and
the impact site provides information on generalization to other sites.
Enhanced BACI-P: Designs to detect acute versus
chronic effects or to detect changes in variation as
well as changes in the mean
As Underwood (1991) pointed out, the previous designs are suitable for detecting long-term (chronic)
effects in the mean level of some variable. In some
cases, the impact may have an acute effect (i.e., effects only last for a short while) or may change the
variability in response (e.g., seasonal changes become
more pronounced). Underwood's solution is to
modify the sampling schedule so that it occurs on
two temporal scales (Figure 3.6). For example, groups
of surveys could be conducted every 6 months with
three surveys 1 week apart randomly located within
each group. The analysis of such a design is presented
in Underwood (1991). Again, several control sites
should be used to counter the argument that detected differences are site-specific.
This design is also useful when there are different
objectives. For example, the objective for one variable
may be to detect a change in trend. The pairing of
sample points on the long time scale leads to efficient
detection of trend changes. The objectives for another variable may be to detect differences in the mean
level. The short time scale surveys randomly located
in time and space are efficient for detecting differences in the mean level.

[Figure 3.6: Sampling on two temporal scales: water quality at control and impact sites in periods 1 and 2, before and after impact.]
3.4.2 Issues in impact surveys
Time dependence
Many of the analyses proposed for the above surveys
(e.g., regression or ANOVA) have methodological
problems that need to be resolved before interpreting
the results.
In regression of the characteristics versus time, the
estimated slope is often used as evidence of a long-term change. However, data collected over time
violate the assumption of independence required for
ordinary regression. The estimate of the slope remains unbiased, but typically the estimated standard
error of the slope is too small. The results appear to
be statistically significant when, in fact, there is no
evidence of a change (Neter et al. 1990, Chap. 13) and
a Type I error would have been made.
Comparing the means before and after impact
using ANOVA methods also suffers from the same
problem of correlation among the measurements.
Again, a typical result is that the estimated standard
error of the difference is too small, and results are declared statistically significant when in fact they are
not, and a Type I error would have been made.
An alternative analysis is to use time-series methods that incorporate temporal correlation. The
analysis of time series is quite complex (Nelson 1973), particularly if the time points are unequally spaced. If
the data points are taken before and after the impact,
the time series analysis can be extended using intervention analysis to test if an impact changed the level
of the series (Rasmussen et al. 1993).
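The inflated Type I error rate can be illustrated with a small simulation; the AR(1) error model, sample size, and approximate critical value below are illustrative assumptions, not taken from the text.

```python
import random
import statistics

random.seed(1)

def slope_t_stat(y):
    """t statistic for the OLS slope of y regressed on time 0..n-1."""
    n = len(y)
    x = list(range(n))
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return b / (sse / (n - 2) / sxx) ** 0.5

def ar1_series(n, phi):
    """Autocorrelated AR(1) noise with no true trend."""
    e, out = 0.0, []
    for _ in range(n):
        e = phi * e + random.gauss(0, 1)
        out.append(e)
    return out

# With independent errors (phi = 0), a nominal 5% test rejects about 5%
# of the time; positive autocorrelation (phi = 0.6) makes the usual
# standard error too small and inflates the Type I error rate.
T_CRIT = 2.01  # approximate two-sided 5% critical value, 48 df
rates = {}
for phi in (0.0, 0.6):
    hits = sum(abs(slope_t_stat(ar1_series(50, phi))) > T_CRIT
               for _ in range(500))
    rates[phi] = hits / 500
print(rates)
```

Even though no trend exists in either case, the autocorrelated series is declared "significant" far more often than the nominal 5%.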
Temporary or permanent monitoring sites
A common question in monitoring surveys is the use
of temporary or permanent monitoring sites. For example, should permanent water quality sampling
sites that are remeasured over time, or temporary
sampling sites that are re-randomized at each time be
used? Many of the concerns are similar to those for
repeated sampling designs discussed earlier. Permanent plots give better estimates of change over time
because the extra plot-to-plot variability caused by
bringing in new plots each year is removed. However,
the costs of establishing permanent plots are higher
than for temporary sites, and the first randomization
may lead to a selection of plots that have some
Keith, L.H. (editor). 1988. Principles of environmental sampling. Am. Chem. Soc., New York, N.Y.
A series of papers on sampling mainly for environmental contaminants in ground and surface
water, soils, and air. A detailed discussion on
sampling for pattern.
Nemec, A.F.L. 1993. Standard error formulae for cluster sampling (unequal cluster sizes). B.C. Min.
For., Res. Br., Victoria, Biometric Inf. Pamph.
No. 43.
38
______. 1965b. Analytical surveys with cluster sampling. J. Royal Statist. Soc., B, 27:264–78.
______. 1966. An application of sequential sampling to analytical surveys. Biometrika 53:85–97.
Skalski, J.R. and D.S. Robson 1992. Techniques for
wildlife investigations: design and analysis of
capture data. Academic Press, New York, N.Y.
Presents methods for conducting experimental inference and mark-recapture statistical studies for
fish and wildlife investigations.
Smith, E.P., D.R. Orvos, and J. Cairns Jr. 1993. Impact assessment using the before-after-control-impact (BACI) model: concerns and comments. Can. J. Fisheries Aquatic Sci. 50:627–37.
Stewart-Oaten, A., J.R. Bence, and C.W. Osenberg. 1992. Assessing effects of unreplicated perturbations: no simple solutions. Ecology 73:1396–404.
Stewart-Oaten, A., W.M. Murdoch, and K. Parker. 1986. Environmental impact assessment: pseudoreplication in time? Ecology 67:929–40.
One of the first extensions of the BACI design discussed in Green (1979).
Thompson, S.K. 1992. Sampling. J. Wiley, New York,
N.Y.
A good companion to Cochran (1977). Has many
examples of using sampling for biological populations. Also has chapters on mark-recapture,
line-transect methods, spatial methods, and
adaptive sampling.
Underwood, A.J. 1991. Beyond BACI: experimental designs for detecting human environmental impacts on temporal variations in natural populations. Austr. Marine and Freshwater Res. 42:569–87.
A discussion of current BACI designs, and an enhanced BACI design to detect changes in
variability as well as in the mean response.
______. 1994. Things environmental scientists (and
statisticians) need to know to receive (and give)
better statistical advice. In Statistics in ecology
and environmental monitoring. D.J. Fletcher
and B.F. Manly (editors). Univ. Otago Press,
Dunedin, N.Z.
4 RETROSPECTIVE STUDIES
G. JOHN SMITH
Statistics are extremely important to resource management. The rigour of gathering and analyzing data for
a proper statistical analysis often conflicts with the
need to obtain the required information within a short
time frame and within a limited budget. Retrospective
studies are one alternative to a fully controlled, or
prospective, study. These studies offer a compromise,
which uses existing data or circumstances. This approach greatly shortens the time between the inception of the study and the presentation of the results,
and reduces the cost. A considerable degree of
methodological correctness can be maintained by
careful design, analytical techniques, and presentation
of results.
As with any compromise, retrospective studies must
be used carefully. In retrospective analyses, often the
results are preliminary, and sometimes do not allow
for quantitative model building, hypothesis testing, or
point estimation. However, by carefully presenting results and designing the study, and being aware of the
pitfalls inherent in individual analyses, a great deal of
useful information can be obtained. Even if the results
are interim, such efforts can be beneficial to the gathering of future information, and to decision-making
processes. Managers need all the tools available to
properly manage forest resources and adapt to changing conditions and priorities.
In this chapter, a definition and many examples are
presented to demonstrate the differences between
prospective and retrospective studies. Each example is
reviewed with an emphasis on contrasting retrospective and prospective studies, and pointing out the
strengths and weaknesses of the retrospective approach. Finally, some suggestions are given regarding
the design of retrospective studies and the analysis of
retrospective data.
4.1 Introduction
4.2 Definitions
In any study involving data, two values help determine the methodology to apply. The first is expedience: to complete the work as quickly and efficiently as possible to meet deadlines and minimize cost. The second is rigour: to scrupulously apply statistical methods and experimental controls to ensure
in all respects except for site density, so that a difference in site index can be directly attributed to the
density. He correctly points out the potential biases
in the selection of plots, and although the design has
some shortcomings, he finds compelling evidence to
suggest a density-dependent repression of site index. The dependence was most noticeable in very dense stands and hardly noticeable in stands of lower density. Thus, growth and yield models that do not take repression into account would not be applicable to very dense stands.
Here is an example where existing data have been
used considerably. The results were presented along
with a discussion of the potential flaws and biases,
and a great deal of information and knowledge was
gained. Researchers would have had to wait years for
the results from a prospective study.
4.3.7 Example F2: site index versus tree density
Thrower (1992) also conducted a study at Larson's
Bench, east of Chilliwack, B.C., on the relationship
between density and height-and-diameter growth in
coastal Douglas-fir stands. This study, too, tried to
mimic an experimental design by comparing a natural (unspaced) and a previously logged (spaced) area,
each with several similar ecological units. This study
used an existing thinning, which was not designed for
a research study. Some matching of units was possible. However, since the two areas did not have the
same conditions at the time of thinning, it was acknowledged that the comparison of the growth rates
in the two areas may be a combination of growth rate
and initial conditions.
4.4 Contrasting Data Collected from Prospective
and Retrospective Studies
Data collected from a prospective study can be used
and analyzed directly, and the interpretation can be
based on sound statistical design and analysis. By
contrast, data from a retrospective study have fewer
statistical controls, and often some components cannot be combined with other components of the data
unless additional assumptions are made about their
comparability.
Consider Example B (forest bird populations) in
the previous section. For a prospective study, the same
treatments would be used throughout 80 years of the
study. The analysis of trends for various species is relatively straightforward. In the retrospective study
there may be a variety of stands where clearcutting
procedures. Any difficulty in finding truly comparable data sources inhibits a rigorous statistical design.
The lack of control in the retrospective study means
that additional assumptions will be required to
perform the analysis, implying that more care is
needed in the interpretation of the results. For example, if in Example B (forest bird populations) we
were unable to match clearcut and natural stands, we
may find it necessary to compare a clearcut stand
with a natural one that has a different aspect and tree
species mix. We would assume that these factors in
the two stands do not make a difference to the
species present.
In Example F2 (site index versus tree density) the
comparability of thinned and unthinned stands was
somewhat compromised because of differences in the
conditions of the stands at the time of clearcutting.
Consequently, the results may be less widely applicable. The advantages, however, are the potential to
greatly shorten the study's time frame, and to reduce
the effort and resources required.
The previous comments do not imply that retrospective analyses are inadequate, but do indicate the
importance of additional diligence during their design, interpretation, and analysis. Furthermore, the
study of alternative data sources greatly enhances the
ability to design an efficient study using knowledge
already gained about the subject under investigation.
This study may lead to rejecting some scenarios or
considering others that might not otherwise be
obvious. Also, retrospective data provide advance
warnings of difficulties one might expect, which
could lead to the failure of an experimental design. A
retrospective study can often be used as a pilot study
to obtain qualitative information.
4.6 Studies with Significant Retrospective and
Prospective Components
Sometimes it is not clear whether to classify a study
as prospective or retrospective. The study may appear to be a prospective study, but after scrutiny may
be found to be more like a retrospective one. Thinking about which category the study belongs to will
help us understand how the assumptions might affect
the interpretation of results. The following examples
illustrate this point.
4.6.1 Example G: change in the measuring instrument

[Figure: common elements of retrospective and prospective studies. From a dynamic natural system, goals are set and the project, study, or analysis is planned. A retrospective study then searches for retrospective data or circumstances, determines the applicability of those data to the study, and determines methods for integrating the data; a prospective study proceeds through experimental design, field procedures, and data collection. Both continue through analysis, evaluation, inference, interpretation of results, dissemination of results and recommendations, and action.]

Consider the case where a questionnaire survey of
hunters has been done for many years, yielding information about hunter activity, number of animals or
birds killed, etc. With advancing technology and
more knowledge of the resource, the survey is
redesigned, new computer technology is used, the
questions are clearer, and perhaps one or two are
added or deleted. The same series of statistics is generated before and after the redesign.
Now suppose that later a trend analysis of the
number of days hunted and the number of birds of
various species killed is required over a time period
that spans both the old and the new methodology.
Because the same series of statistics has been generated throughout the period covered by the study, the
researcher might infer that it would be valid simply
to use the data without further consideration. However, changes in the way a question is asked or in the way the questions are edited may influence a person's response, and therefore an apparent trend may be due to the question rather than the activity. Any analysis would require assumptions about the comparability of the data. The interpretation of the results would need to acknowledge these assumptions and examine their potential implications. Cooch et al. (1978) describe some of the effects of survey changes on results.
[Table: I/A ratio by colony and year; colony row labels partially reconstructed.]
Colony     1990   1991   1992   1993   1994   1995
A          0.83   0.90   0.95   0.92   0.85   0.96
B          1.10   1.02   0.97   1.03   0.93   1.05
C          0.90   0.84   0.88   0.81   0.78   0.91
D          1.12   1.25   1.12   1.17   1.08   1.20
E          1.35   1.41   1.22   1.27   1.19   1.36
Average    1.06   1.08   1.03   1.04   0.97   1.10
[Table: nests observed, immatures, adults, I/A ratio, and colony size, by colony; labels A, B, and E reconstructed.]
Colony   Nests observed   Immatures   Adults   I/A ratio   Colony size
A        20               29          40       0.73        58
B        20               39          40       0.98        75
C        18               34          36       0.94        105
D        17               39          34       1.15        178
E        20               63          40       1.58        285
Total    95               148         190                  701
[Figure: I/A ratio vs. colony size.]
• Probability sampling
• Awareness and clear statement of assumptions
• Design considerations: controlling variation
• Use of direct measurements rather than proxies for the measurements
References
Goudie, J.W. 1996. The effect of stocking on estimated site index in the Morice, Lakes and Vanderhoof timber supply areas in central British Columbia. Presented at N. Int. Veg. Manage. Ann. Meet., Jan. 24–25, 1996, Smithers, B.C.
Greenberg, R.S. 1988. Retrospective studies (including case-control). Encycl. Statist. Sci. 8:120–4.
Gyug, L.W. and S.P. Bennett. 1995. Bird use of
wildlife tree patches 25 years after clearcutting.
Rep. prep. for B.C. Min. Environ. Penticton,
B.C.
Hahn, G.J. 1980. Retrospective studies versus planned experimentation. Chem. Techn. 10:372–3.
Hamilton, A.N., C.A. Bryden, and C.J. Clement. 1991.
Impacts of glyphosate application on grizzly
bear forage production in the coastal western
hemlock zone. For. Can. and B.C. Min. For.
FRDA Rep. No. 165.
Hartman, G.F. and M. Miles. 1995. Evaluation of fish
habitat improvement projects in B.C. and recommendations on the development of
guidelines for future work. B.C. Min. Environ.,
Lands and Parks, Fish. Br., Victoria, B.C.
Hoogstraten, J. and P. Koele. 1988. A method for analyzing retrospective pre-test / post-test designs:
II. Application. Bull. Psychometric Soc.
26:1245.
Hunter, M.L. 1990. Wildlife, forests, and forestry
principles of managing forests for biological diversity. Prentice-Hall, Englewood Cliffs, N.J.
Kelsall, J.P., E.S. Telfer, and T.D. Wright. 1977. The
effects of fires on the ecology of the boreal forest with particular reference to the Canadian
north. Can. Wildl. Serv., Occas. Pap. No. 32.
Koele, P. and J. Hoogstraten. 1988. A method for analyzing retrospective pre-test/post-test designs: I
Theory. Bull. Psychometric Soc. 26:514.
McAllister, M.K. and R.M. Peterman. 1992. Experimental design in the management of fisheries: a review. N. Am. J. Fish. Manage. 12:1-18.
Marcot, B.G. 1989. Putting data, experience and professional judgment to work in making land management decisions. In Proc. B.C.-U.S. Dep. Agric. For. Serv. Workshop, Oct. 16-20, 1989, pp. 140-61.
Maymin, Z. and S. Gutmann. 1992. Testing retrospective hypotheses. Can. J. Statistics 20:335-45.
Nemec, A.F.L. [n.d.]. Design of experiments. This volume.
Peck, J.E. and B. McClune. 1995. Remnant trees in relation to canopy lichen communities in western Oregon: a retrospective approach. Dep. Bot. Plant Path., Oreg. State Univ., Corvallis, Oreg. Tech. Rep.
Prentice, R. 1976. Use of the logistic model in retrospective studies. Biometrics 32:599-606.
Reed, W.J. 1995. Estimating the historic probability of stand-replacement fire using the age-class distribution of undisturbed forest. For. Sci. 40:104-19.
Rosner, B.A. 1986. Fundamentals of biostatistics. PWS Publishers, Boston, Mass.
Routledge, R., G.E.J. Smith, L. Sun, N. Dawe, E. Nygren, and J. Sedinger. [1998]. Estimating the size of a transient population. Biometrics (in press).
Schwarz, C.J. [n.d.]. Studies of uncontrolled events. This volume.
Sikkel, D. 1990. Retrospective questions and group differences. J. Official Statist. 6:165-77.
Wang, M.C. 1992. The analysis of retrospectively ascertained data in the presence of reporting delays. J. Am. Statist. Assoc. 87:397-406.
Abstract
5.1 Introduction
Do any of these statements describe your views?

- "It is a waste of time to worry about measurement errors. I have enough practice in my field to have reduced measurement errors to a negligible size."
- "If I know that my measurements are not perfect, then I should take several, and average them, maybe throwing out the odd one that is far from the others."
- "I have the resources only to make a subjective guess at the abundance of some minor species. Surely this will be adequate. After all, I am only looking for trends. If the measurement errors are large, and are consistently present, can't we ignore them when we are looking for trends?"
- "I don't have to worry about measurement errors. I always take repeated observations and use standard statistical techniques to deal with them. If my measurements do contain large errors, then can't I just take repeated measurements, do a routine statistical analysis, and quote a P-value to silence the pesky biometricians?"
- "I have an important job to do. I don't have the time or luxury of worrying about statistical niceties like academics and scientists. I need to get on with managing for forest production."

If you agree with any of these opinions, then you may find this chapter unsettling.
inspecting a list of measurements of the same quantity. Deviations from the average of this list will
estimate the chance errors. The average size of these
deviations (calculated formally through a root mean
square) is called the standard deviation.
For a list of measurements, x1, x2, x3, ..., xn, the average or mean is given by

    x̄ = (x1 + x2 + x3 + ... + xn)/n = (1/n) Σ xi,

and the standard deviation by

    SD(x) = sqrt[ Σ (xi - x̄)² / (n - 1) ],

where each sum runs over i = 1, ..., n.
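To make the formulas concrete, here is a small sketch in Python (the measurement values are hypothetical); `statistics.stdev` uses the same n - 1 divisor as the standard deviation formula above:

```python
import statistics

# Hypothetical repeated measurements (cm) of the same diameter
measurements = [30.1, 29.8, 30.4, 30.0, 29.7]

mean = statistics.fmean(measurements)  # (x1 + ... + xn) / n
sd = statistics.stdev(measurements)    # sqrt(sum((xi - mean)^2) / (n - 1))

print(round(mean, 2), round(sd, 2))
```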
3. They might be due to an inherent part of the natural variability that should not be ignored.
If aberrant values are routinely thrown out, the
resulting data set will give a false impression of the
structure of the system being measured. For example,
in a wildlife habitat study, denning habitat might be
predicted in an area based on the distribution of the
diameter at breast height for a stand. For a stand
dominated by small trees, the diameter measurements associated with the few mature trees could be
considered as outliers. Discarding these outliers
could lead to the conclusion that the stand contains
only small-diameter trees incapable of providing
valuable denning sites. Management errors could
then arise from this erroneous impression.
Furthermore, opportunities for identifying and
controlling large sources of error can be lost, and
clues to new discoveries may go undetected. For example, a few extraordinarily large-diameter trees in a
replanted stand may lead to valuable genetic insight;
a pocket of unusually small ones may betray the arrival of a new insect pest.
Outliers must be treated carefully for another reason: they can invalidate standard statistical inference
procedures (see section 5.3.1, last paragraph).
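The denning-habitat example can be put in numerical terms. In the sketch below (hypothetical DBH values; the 1.5 × IQR rule stands in for any routine outlier-rejection rule), trimming the "aberrant" large diameters erases exactly the trees capable of providing denning sites:

```python
import statistics

# Hypothetical DBH sample (cm): a stand dominated by small trees,
# plus two mature veterans capable of providing denning sites
dbh = [12, 13, 13, 14, 14, 15, 15, 16, 85, 92]

q1, _, q3 = statistics.quantiles(dbh, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)  # Tukey's rule for "outliers"

trimmed = [x for x in dbh if x <= upper_fence]

# The trimmed data suggest a stand of uniformly small trees
print(max(dbh), max(trimmed))
```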
5.2.5 Accuracy and precision
A measurement process is said to have high accuracy
if all errors are typically small. This requires that both
Figure. Examples of (a) high accuracy (both small bias and small chance errors); (b) low accuracy caused by poor precision (small bias but large chance errors); and (c) low accuracy but high precision (large bias but small chance errors).
relationship. Nigh (1995) proposes using the geometric mean regression line in this context. However,
this technique, promoted by Ricker (1973) in fisheries
analysis, is highly controversial and may itself
provide inaccurate slope estimates. The most appropriate technique will depend upon the specific
application and on the relative sizes of the variation
in the x- and y-directions. See Fuller (1987) for a
thorough discussion of the handling of estimation
errors in the x-variable.
Outliers are also particularly troublesome in regression analyses. Points that are far from the
regression line have considerable influence. Rumours
abound of practitioners routinely discarding such
points. See Section 5.2.4 on discarding aberrant measurements.
Figure. (a) No errors in x; (b) errors in x. Errors in the x-values typically not only increase the scatter in the picture, but also spread out the points in the x-direction. This result reduces the slope of the regression line. In this example, the slope is reduced from 1.95 to 1.10.
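This attenuation is easy to reproduce in a small simulation (illustrative only; the true slope of 2.0 and the error variances below are arbitrary choices, not figures from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)

true_slope = 2.0
x = rng.uniform(10, 13, size=200)              # true x-values
y = true_slope * x + rng.normal(0, 1, 200)     # y measured with error

x_err = x + rng.normal(0, 1.0, 200)            # x also measured with error

slope_clean = np.polyfit(x, y, 1)[0]
slope_noisy = np.polyfit(x_err, y, 1)[0]

# The slope fitted with error-contaminated x is systematically smaller
print(round(slope_clean, 2), round(slope_noisy, 2))
```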
5.4.1 Counting
Counting is a basic method for assessing the size of a
population. As discussed in the example on rhinoceros surveys, it is easy to overestimate the ability to
count animals. Be prepared for substantial undercounts. Counts of spawning salmon, for example, can
be out by a factor of 5 to 10.
Both over- and undercounting can have serious
management implications, and of course these implications extend to other forms of estimates. In a
silvicultural management experiment, for example,
the numbers of trees in sampled stands may be
counted to estimate stand density. If the stand is too
dense, then by law it must be thinned; if the stand is
understocked, then a remedial measure such as
planting must be taken. In this case, over- and underestimation would incur unnecessary cost to the forest
manager. The uncertainty in an estimate must always
be acknowledged, and quantified where possible
through estimates of bias and standard error, along
with confidence limits if appropriate.
5.4.2 Direct physical measurements
Measurements of simple physical attributes such as
length and mass are usually highly accurate. However, some situations may demand extraordinarily high
accuracy. For example, complex models will require
extremely accurate measurements if measurement
errors are compounded in these models.
5.4.3 Indirect physical measurements
Many measurements of physical quantities are indirect. A liquid thermometer, for example, displays the
height of a column of liquid, and measures temperature only insofar as this height is related to
temperature. The manufacturer must calibrate the
instrument by testing it out at known temperatures.
We cannot safely ignore the manufacturer's methodology if we push the instrument close to its limits.
These limits can be exceeded by:
1. demanding more accuracy than can be expected of
the instrument (e.g., trying to measure altitude to
the nearest metre when repeat readings at the
same elevation show chance variation over a range
of 20 m);
2. taking measurements outside of the range of values for which the instrument was designed
(e.g., using an altimeter calibrated for use up to
3000 m in a flight over Mount Waddington, whose
summit is at 4019 m); or
3. using the instrument under unusual conditions
Figure. Estimated numbers of chinook salmon spawning in the Upper Fraser Watershed near Tête Jaune Cache, 1950-2000 (vertical axis: spawning estimates, 0 to 10 000).
fraction killed. To do this, we need an unbiased estimate of abundance. By contrast, it is often feasible
only to obtain an index of abundance for much of
the population. Indices can be turned into unbiased
estimates through a sort of calibration process.
In resource management work, this goal is often
achieved through double sampling and ratio estimation. The Canadian Wildlife Service, for example,
conducts an annual survey of breeding waterfowl.
Aerial surveys are used to obtain rough abundance
estimates. These figures are then supplemented by
more thorough ground surveys. The results of the
ground surveys are used to adjust the less accurate,
but more extensive, aerial surveys for bias. In a sense,
the ground surveys are used to calibrate the aerial
surveys.
This technique of double sampling is described in
more detail by Cochran (1977, Chapter 12) and
Thompson (1992, Chapter 14). It is a valuable tool in
a wide variety of contexts. For example, in a management experiment involving the monitoring of grass
biomass over time, definitive estimates of biomass
could be obtained only through destructive sampling.
By contrast, extensive but imprecise information
could be obtained from subjective, visual estimates.
By sampling a small number of quadrats destructively, the results of a more extensive set of visual
estimates can be adjusted for bias.
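The calibration step can be sketched as follows (hypothetical numbers; Cochran 1977, Ch. 12 gives the full ratio estimator and its standard error):

```python
# Double sampling with ratio estimation (illustrative sketch).
# Visual estimates are cheap but biased; destructive measurements
# are accurate but available only for a few quadrats.

visual_all = [12.0, 8.5, 15.0, 9.0, 11.5, 14.0, 10.0, 13.5]  # all quadrats
visual_sub = [12.0, 9.0, 14.0]   # subsample: visual scores...
actual_sub = [10.2, 7.5, 11.9]   # ...and their destructive measurements

# Calibration ratio: actual biomass per unit of visual score
r = sum(actual_sub) / sum(visual_sub)

# Bias-adjusted estimate of mean biomass over all quadrats
mean_visual = sum(visual_all) / len(visual_all)
adjusted_mean = r * mean_visual
print(round(r, 3), round(adjusted_mean, 2))
```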
5.4.6 Quantitative measures of imprecise concepts
Forest management is closely linked to ecology,
which in turn is full of vague concepts such as niche
width, niche overlap, similarity, importance, competition, and diversity (Krebs 1994). Developing precise,
quantitative measures of these concepts is one of the
enduring challenges of the subject. The following discussion illustrates both the need for precise,
quantitative measures and common problems encountered in their construction and use. These are
illustrated in the context of diversity measures.
Biodiversity has recently received increasing attention in resource management. Yet it is not easy to
define and measure. The Biodiversity Guidebook in
the Forest Practices Code (B.C. Ministry of Forests
and B.C. Environment 1995) defines the concept as
follows:
Biological diversity (or biodiversity) is the diversity of plants, animals and other living organisms
in all their forms and levels of organization, and
includes the diversity of genes, species and
ecosystems, as well as the evolutionary and functional processes that link them.
This definition explicitly mentions the organisms and levels of organization to be considered, but leaves the word "diversity" undefined. The guidebook gives directions on how to manage forests to
maintain biodiversity. Some of these directions are
based on assumptions on how forest ecosystems
function, including:
The more that managed forests resemble the
forests that were established from natural
disturbances, the greater the probability that
all native species and ecological processes will be
maintained.
Although this assumption has considerable
intuitive and practical appeal, how can we be sure
that it is valid and that the strategy is working? We
need to have some way of quantifying biodiversity so
that we can monitor it directly. This in turn requires
a quantitative definition that pins down this vague
concept.
No single definition will be universally applicable.
A wildlife biologist will be interested in maintaining
wildlife diversity by maintaining the structural diversity in a forest. Thus, two concepts of diversity are
invoked: the species diversity of the wildlife and the
structural diversity of the forest. A fisheries biologist
will focus on maintaining the diversity of individual
fish stocks (not species), which in turn depends on
maintaining a diverse set of healthy fish habitats.
Indices are useful measures of abstract concepts.
However, a single measure may not capture the concept fully, and several different types of indices may
be needed to measure an imprecise concept. Consider an analogy of blind people trying to describe an
elephant. Each person examines a different part of
the elephant. No one person will obtain an accurate
overall impression of the elephant. Obviously, the
more of the elephant that you can include in the operational definition the better, but there will always
be limitations.
Now consider species diversity: its simplest definition is the number of species. But counting or
estimating the number of species in a community is
very difficult. An indefinite number of rare or cryptic
species may go undetected in any survey.
Furthermore, diversity depends not only on
the number of species, but also on the lack of
    N1 = exp( - Σ pi ln(pi) ); and

    N2 = 1 / ( Σ pi² ),

where pi is the relative abundance of the ith species and each sum runs over all species.
In each instance, if all species are equally abundant, the diversity reduces to the species count, N0.
It is customary to view N1 and N2 as describing the
number of equally common species that would
produce an equivalent impression of diversity.
Which index should be used? A choice of index
should in general depend on the properties of the
index in relation to the goals of a study. The example
portrayed in Figure 5.4 and Table 5.1 illustrates some
of the differences amongst these three indices.
Figure 5.4 displays the abundance patterns; Table 5.1,
the values of the diversity indices.
Table 5.1. Diversity indices for the four communities shown in Figure 5.4.

Community              a      b      c      d
Shannon-Wiener (N1)    8   4.81   2.19   1.34
Simpson's (N2)         8   4.38   1.53   1.11
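These indices are straightforward to compute from relative abundances (a sketch; the dominated community below is hypothetical, not one of those behind Table 5.1). For eight equally abundant species, all three indices reduce to the species count, as noted above:

```python
from math import exp, log

def diversity_indices(abundances):
    """Species count N0, Shannon-Wiener N1, and Simpson's N2."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    n0 = len(p)
    n1 = exp(-sum(pi * log(pi) for pi in p))  # N1 = exp(-sum pi ln pi)
    n2 = 1.0 / sum(pi * pi for pi in p)       # N2 = 1 / sum pi^2
    return n0, n1, n2

# Eight equally abundant species: all three indices equal 8
print(diversity_indices([10] * 8))
# A community dominated by one species: N1 and N2 drop well below N0
print(diversity_indices([93, 1, 1, 1, 1, 1, 1, 1]))
```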
4. Identify indices that have been developed to quantify this abstract concept.
5. Assess the sensitivity of the indices to the important aspects identified in step 3.
6. Examine the range and scale of the indices. Is it
easy to tell whether an observed value represents a
desirable level, or whether an observed change in
the index represents an important change in the
achievement of management objectives?
7. Ensure that the bias and standard error of all selected indices are well understood and predictable.
Each index should not be overly sensitive to small
Figure 5.4. Four dominance patterns. In each instance, eight species are present, but the communities are increasingly dominated by fewer species.
4. Visual estimates
Ensure that all visual estimates are conducted
according to rigorous protocols by well-trained
observers.
Pay particular attention to observer bias. When bringing a new observer into the program, ensure that his/her results are backed up by an experienced observer's.
If sites or times are to be selected as part of the
collection of visual estimates, eliminate selection bias by providing a protocol for site- or
time-selection. Do not, for example, let vegetation samplers pick modal sites.
5. Data handling
Record data directly into electronic form where
possible.
Back up all data frequently.
Use electronic data screening programs to
search for aberrant measurements that might be
due to a data handling error.
Design any manual data-recording forms and
electronic data-entry interfaces to minimize
data-entry errors. In the forms, include a field
for comments, encourage its use, and ensure
that the comments are not lost or ignored.
5.6 Summary
A century ago, British Columbia's renewable resources seemed so limitless that we asked very little of
our measurements of the resources and their support
systems. With new requirements imposed (e.g., by
the Forest Practices Code) and with increased harvesting capacity, we are escalating our demands on
the measurement systems. In recent years, our systems for estimating fish populations have let us
down. The recent controversy over the management
of Fraser River sockeye (Fraser et al. 1995) has not
been so much about a breakdown in the quality of
the measurement procedures, as about the fact that
our management expectations have increased beyond
the capacity of the measurement system.
Assess continually the adequacy of a measurement
system to improve it where possible and to point out
when its limitations may be exceeded. The assessment should include:
the choice of quantities to be measured;
the procedures and equipment for taking the measurements;
any associated sampling;
References

Bergerud, W.A. and W.J. Reed. [n.d.]. Bayesian statistical methods. This volume.
British Columbia Ministry of Forests and B.C. Environment. 1995. Biodiversity guidebook. Victoria, B.C. Forest Practices Code guidebook.
Caughley, G. 1974. Bias in aerial survey. J. Wildl. Manage. 36:135-40.
Cochran, W.G. 1977. Sampling techniques. 3rd ed. J. Wiley, New York, N.Y.
Fuller, W.A. 1987. Measurement error models. J. Wiley, New York, N.Y.
Goddard, J. 1967. The validity of censusing black rhinoceros populations from the air. East Afr. Wildl. J. 5:18-23.
Huang, S., S.J. Titus, and D.P. Wiens. 1992. Comparison of nonlinear height-diameter functions for major Alberta tree species. Can. J. For. Res. 22:1297-304.
Krebs, C.J. 1994. Ecology: the experimental analysis of distribution and abundance. 4th ed. Harper and Collins, New York, N.Y.
Nemec, A.F.L. [n.d.]. Design of experiments. This volume.
Nigh, G.D. 1995. The geometric mean regression line: a method for developing site index conversion equations for species in mixed stands. For. Sci. 41:84-98.
Ricker, W.E. 1973. Linear regressions in fisheries research. J. Fish. Res. Board Can. 30:409-34.
Routledge, R.D. 1979. Diversity indices: which ones are admissible? J. Theoret. Biol. 76:503-15.
______. 1980. Bias in estimating the diversity of large, uncensused communities. Ecology 61:276-81.
Wallace, A.R. 1895. A narrative of travels on the Amazon and Rio Negro: with an account of the native tribes, and observations on the climate, geology, and natural history of the Amazon Valley. Greenwood Press, New York, N.Y. Reprinted 1969.
6 ERRORS OF INFERENCE
JUDITH L. ANDERSON
Abstract
Figure. The cycle of scientific investigation: a scientific hypothesis leads, through deductive logic, to statistical hypotheses (H0, HA); development of measurement and sampling procedures supports data collection and parameter estimation; statistical inference (reject H0?) then feeds back, through inductive logic, to the scientific hypothesis.
A major goal in adaptive and experimental management is to improve our understanding of managed
biological systems by making reliable conclusions
(inferences) from experiments and monitoring programs. However, any experimental inference has a
chance of being incorrect, and these errors can result
in large economic and ecological costs. Therefore, experimenters must understand how errors of
inference occur and how to control them. This chapter discusses the following topics:
the relationship between the two types of errors of
inference in statistical tests, with a focus on the
category of error most often ignored: Type II
error, failure to reject a null hypothesis when it is
in fact false;
Table. Four possible outcomes of a statistical test of a null hypothesis. The probability of each outcome is given in parentheses, and the management decision that might proceed from each inference is indicated. Adapted from Toft and Shea (1983).

                                    State of nature
Inference                           H0 true                     H0 false
Do not reject H0                    Correct (1 - α):            Type II error (β):
(manage as though H0 were true)     correctly infer that no     fail to detect a real
                                    treatment effect exists     treatment effect
Reject H0                           Type I error (α):           Correct (1 - β):
(manage as though HA were true)     falsely infer a             correctly infer that a
                                    treatment effect            treatment effect exists
(x̄E - x̄C)/s, where x̄E is the mean of population density estimates in experimental (logged with 10-m
Figure. Sampling distributions for the power analysis example: five panels, (a) through (e) (vertical axis: probability, 0-0.4; horizontal axis: 15-30).
To get a feeling for small, medium, and large standardized effect sizes for the wood duck example, suppose the investigator really had no idea what deviation from the preferred mean of 25 cm would constitute an unsuitable habitat. He might decide to detect a medium effect size, reasoning that a difference perceptible to a human observer should be evident to the ducks. Cohen (1988) defines a medium standardized effect size for a t-test as (μ1 - μ2)/σ = 0.5. Using the standard deviation among cavity diameters (10 cm) in the formula, the experimenter would need to set HA: μ2 = 20 cm. If the experimenter felt that even a subtle difference in cavity size could be important to the ducks, he would choose Cohen's small standardized effect size (0.2) and set HA: μ2 = 23 cm. Finally, if he felt that only an obvious difference was worth detecting reliably, he could choose a large standardized effect size (0.8) and set HA: μ2 = 17 cm.
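As an aside (not part of the original text), the link between standardized effect size, sample size, and power can be sketched with a normal approximation; it slightly overstates the power of a t-test, for which Cohen's (1988) tables should be consulted:

```python
from scipy.stats import norm

def approx_power(d, n, alpha=0.05):
    """Normal-approximation power of a two-sided test of a
    standardized effect size d with n observations."""
    z_crit = norm.ppf(1 - alpha / 2)
    return float(norm.cdf(d * n ** 0.5 - z_crit))

# Cohen's "medium" effect (d = 0.5): about 32 cavities give
# roughly 80% power at alpha = 0.05
print(round(approx_power(0.5, 32), 2))
```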
Because of their importance and subtlety, effect
size concepts have received considerable attention.
Further discussion can be found in: Toft and Shea
(1983); Rotenberry and Wiens (1985); Tanke and
Bonham (1985); Stewart-Oaten et al. (1986); Kraemer
and Thiemann (1987); Millard (1987a); Cohen (1988,
1992); Forbes (1990); Parkhurst (1990); Peterman
(1990a); Fairweather (1991); Faith et al. (1991); Matloff (1991); McGraw and Wong (1992); Nicholson and
Fryer (1992); Schrader-Frechette and McCoy (1992);
Stewart-Oaten et al. (1992); Scheiner (1993); Osenberg et al. (1994); and Mapstone (1995).
6.6 How Should Power Analysis be Used in Experimental Adaptive Management?
6.6.1 Power considerations are an intrinsic part of
experimental design
The selection of experimental design is largely a question of managing the factors that influence Type II error rate: sample size, variance, effect size, and α. These variables affect statistical power in different ways. In the wood duck example, halving the sample variance and doubling the biologically significant effect size improved power more than did doubling sample size or doubling the critical level of α. Thus, while α is completely under the experimenter's control and there are good reasons to choose critical α-levels other than 0.05 (see Section 6.7), changing the critical α-level is not the most effective way to gain power (Lipsey 1990). Instead, as we will discuss, efficient design usually requires specific knowledge
quality, Osenberg et al. (1994) compared the sensitivity of population-based biological variables,
individual-based biological variables, and chemical-physical variables. They found that standardized
effect sizes (and hence power) were greatest for individual-based biological variables because they
responded most sensitively to impacts and exhibited
relatively low error variance.
Ecological variables often exhibit large variance, so
strategies for reducing variance are especially important for achieving high statistical power. The total
variance is the sum of error variance and variance
that can be accounted for with additional information. Therefore, it is useful to account for as much of
the variance as possible in the experimental design.
Examples include blocking (Hurlbert 1984; Krebs
1989), covariate analysis (Wiens and Parker 1995),
and controlling for spatial heterogeneity (Dutilleul
1993). Error variance can also be reduced by improving measurement precision and reliability (Williams
and Zimmerman 1989; Lipsey 1990).
How could these strategies be applied to the
salamander example from Section 6.2.2? The experimenter might suspect that population density is
affected by aspect, irrespective of the width of the riparian reserve. The sample watersheds could be
divided into three blocks (north-facing, south-facing,
and east- or west-facing), to control for the variance
associated with aspect. Similarly, the experimenter
could include, as covariates, information about the
quantity and decay state of coarse woody debris in
each sample plot to account for some of the variance.
The error variance in estimates of population density
might also be reduced by sampling more intensively.
Finally, the experimenter might decide to estimate
recruitment and death rates of salamanders in addition to population density in an effort to measure
more responsive variables.
Finally, the most sophisticated design is not always
the most powerful. In more complex experimental
designs, statistical power is really a function of degrees of freedom, rather than straightforward sample
size. Because degrees of freedom are influenced by
both the extent of replication and the number of parameters to be estimated, increasing the complexity
of the design can be counterproductive with respect
to power. For example, in analysis of variance
(ANOVA), increasing the number of factors increases the number of parameters (treatment means) to be
estimated and this decreases the effective number of
replicates per cell, reducing power (Cohen 1988).
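A small simulation (illustrative; the block and error variances below are invented) shows the payoff from blocking: when block-to-block variance is large, a blocked (paired) analysis detects a treatment effect far more often than an unblocked one at the same sample size:

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(1)
reps, n_blocks, effect = 500, 8, 1.0

hits_unpaired = hits_paired = 0
for _ in range(reps):
    block = rng.normal(0, 5, n_blocks)              # large block-to-block variance
    control = block + rng.normal(0, 1, n_blocks)
    treated = block + effect + rng.normal(0, 1, n_blocks)
    if ttest_ind(treated, control).pvalue < 0.05:   # ignores blocking
        hits_unpaired += 1
    if ttest_rel(treated, control).pvalue < 0.05:   # blocked (paired) analysis
        hits_paired += 1

# Estimated power without and with blocking
print(hits_unpaired / reps, hits_paired / reps)
```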
Figure. Power curve (vertical axis: power, 0-0.8; horizontal axis: 0-0.7).
All reports of nonsignificant results should mention the effect size and power of the experiment.
Where appropriate, a posteriori power analysis may
be used.
Where potential costs of the errors of inference to various stakeholders can be quantified, these costs should be included in decisions about acceptable levels of α and β.
Where currently available experimental designs
lack power, research should be directed toward developing new, powerful methodologies, such as
Before-After-Control-Impact paired designs (Underwood 1994).
Resources should be allocated to pilot studies that
will help to improve the power of large experiments.
A priori power analyses are often difficult because
not enough is known about potential response variables, biologically significant effect sizes, and spatial
and temporal variability. It would be useful to carry
out large-scale, long-term monitoring of these variables in forest ecosystems, with the express purpose of estimating them for use in future power
analyses and choices about experimental design
(Osenberg et al. 1994). Standard response variables
such as those proposed by Keddy and Drummond
(1996) for eastern deciduous forests could become
a starting point for this sort of database.
6.9 Relevant Literature and Software
6.9.1 A few key references guide experimenters
through power analysis for the most frequently
used statistical tests
The classic reference to statistical power is Cohen
(1988). Cohen provides clearly written instructions
for calculating standardized effect size and other
input parameters to power and sample size tables. He
provides these tables for t-tests, tests involving correlation coefficients, tests involving proportions, the
sign test, chi-square tests for goodness of fit and contingency, analysis of variance and covariance,
multiple regression and correlation, and set correlation and multivariate methods (e.g., canonical
correlation, MANOVA, and MANCOVA).
Zar (1996) presents a graph of power and sample
size for analysis of variance, as well as formulas for
calculating power and required sample size for a variety of other tests. While he does not include tabled
analysis. Some examples that may be useful in adaptive management experiments include:
- density dependence: power and sample size for tests designed to detect whether population parameters vary as a function of population density (Solow and Steele 1990; Dennis and Taper 1994);
- trend detection: power and sample size for tests designed to detect whether a variable is changing with time (Hinds 1984; Tanke and Bonham 1985; Harris 1986; Gerrodette 1987, 1991; Whysong and Brady 1987; Kendall et al. 1992; Loftis et al. 1989);
- detection of rare species: sample sizes necessary to detect rare species, based on the Poisson distribution (Green and Young 1993);
- resource selection: patterns of Type I and Type II errors for four tests of habitat/resource selection (Alldredge and Ratti 1986);
- home range independence: power of the Schoener statistic for independence of animal home ranges (Swihart and Slade 1986);
- environmental monitoring: power, sample size, and cost considerations for programs of environmental impact monitoring (Skalski and McKenzie 1982; Millard 1987b; Ferraro and Cole 1990; Ferraro et al. 1989; Parkhurst 1990; Smith and McBride 1990; Ferraro et al. 1994; Wiens and Parker 1995);
- analysis of covariance in environmental monitoring: analysis of Type I and Type II error for ANCOVA, with examples from water quality monitoring (Green 1986);
- environmental impact detection: unique cases and before-after-control-impact design issues (Bernstein and Zalinski 1983; Faith et al. 1991; Underwood 1991, 1994; Stewart-Oaten et al. 1992; Schroeter et al. 1993; Osenberg et al. 1994; Allison et al. 1997; Gorman and Allison 1997); and
- spatial patterns and heterogeneity: power analysis for experimental designs that take spatial patterns and heterogeneity into account (Andrew and Mapstone 1987; Downing and Downing 1992; Scharf and Alley 1993).
6.9.4 When analytic methods are not appropriate,
Monte Carlo simulation can be used to estimate
power
Many ecological analyses involve specialized statistics
or experimental designs for which no analytic formulas exist for calculating power. In such cases, Monte
Carlo simulation can be used to produce many simulated data sets generated from distributions with
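In outline, the Monte Carlo approach looks like this (an illustrative sketch: the Mann-Whitney U test and normal distributions below stand in for whatever specialized statistic and distributions a particular study requires). Simulate many data sets under a specified alternative, apply the test to each, and take the rejection rate as the power estimate:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def monte_carlo_power(effect, n, reps=1000, alpha=0.05, seed=0):
    """Estimate power of a Mann-Whitney U test by simulation."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(effect, 1.0, n)
        if mannwhitneyu(control, treated).pvalue < alpha:
            rejections += 1
    return rejections / reps

print(monte_carlo_power(effect=1.0, n=20))  # large effect: high power
print(monte_carlo_power(effect=0.0, n=20))  # no effect: rejection rate near alpha
```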
References
Alldredge, J.R. and J.T. Ratti. 1986. Comparison of
some statistical techniques for analysis of resource selection. J. Wildl. Manage. 50:15765.
Allison, D.B., J.M. Silverstein, and B.S. Gorman. 1997.
Power, sample size estimation, and early stopping rules. In Design and analysis of single case
research. R.D. Franklin, D.B. Allison, and B.S.
Gorman (editors). Erlbaum, Mahwah, N.J., pp.
33572.
Andrew, N.L. and B.D. Mapstone. 1987. Sampling
and the description of spatial pattern in marine
ecology. Oceanography and Marine Biology
Annual Reviews 25:3990.
Bergerud, W. 1992. A general description of hypothesis testing and power analysis. B.C. Min. For.,
Res. Br., Victoria, B.C. Biometrics Inf. Pamph.
No. 37.
______. 1995a. Power analysis and sample sizes for
completely randomized designs with subsampling. B.C. Min. For., Res. Br., Victoria, B.C.
Biometrics Inf. Pamph. No. 49.
______. 1995b. Power analysis and sample sizes for
randomized block designs with subsampling.
B.C. Min. For., Res. Br., Victoria, B.C. Biometrics Inf. Pamph. No. 50.
______. 1995c. Programs for power analysis/sample
size calculations for CR and RB designs. B.C.
Min. For., Res. Br., Victoria, B.C. Biometrics
Inf. Pamph. No. 51.
______. 1995d. Post-hoc power analysis for ANOVA
F-tests. B.C. Min. For., Res. Br., Victoria, B.C.
Biometrics Inf. Pamph. No. 52.
Bergerud, W.A. and W.J. Reed. [n.d.] Bayesian statistical methods. This volume.
Bernstein, B.B. and J. Zalinski. 1983. An optimum sampling design and power tests for environmental biologists. J. Environ. Manage. 16:35–43.
Bittman, R.M. and M.L. Carniello. 1990. The design of an experiment using statistical power with a startle chamber study as an example. J. Appl. Toxicology 10:125–8.
Friendly, M. 1996. Power analysis for ANOVA designs. York Univ., Toronto, Ont. Available on-line: <www.math.yorku.ca/SCS/Demos/power/> [January 1996].
Green, R.H. and R.C. Young. 1993. Sampling to detect rare species. Ecol. Applic. 3:351–6.
Millard, S.P. 1987a. Environmental monitoring, statistics, and the law: room for improvement. Am. Statist. 41:249–53.
______. 1990b. The importance of reporting statistical power: the forest decline and acidic deposition example. Ecology 71:2024–7.
Schrader-Frechette, K.S. and E.D. McCoy. 1992. Statistics, costs, and rationality in ecological inference. Trends in Ecology and Evolution 7:96–9.
Schroeter, S.C., J.D. Dixon, J. Kastendiek, R.O. Smith, and J.R. Bence. 1993. Detecting the ecological effects of environmental impacts: a case study of kelp forest invertebrates. Ecol. Applic. 3:331–50.
Schwarz, C.J. [n.d.]. Studies of uncontrolled events.
This volume.
Searcy-Bernal, R. 1994. Statistical power and aquacultural research. Aquaculture 127:371–88.
Sedlmeier, P. and G. Gigerenzer. 1989. Do studies of statistical power have an effect on the power of studies? Psychol. Bull. 105:309–16.
Sit, V. 1992. Power analysis and sample size determination for contingency table tests. B.C. Min. For., Res. Br., Victoria, B.C. Biometrics Inf. Pamph. No. 41.
Skalski, J.R. and D.H. McKenzie. 1982. A design for aquatic monitoring programs. J. Environ. Manage. 14:237–51.
Smith, D.G. and G.B. McBride. 1990. New Zealand's national water quality monitoring network design and first year's operation. Water Res. Bull. 26:767–75.
Solow, A.R. and J.H. Steele. 1990. On sample size, statistical power, and the detection of density dependence. J. Animal Ecol. 59:1073–6.
Steiger, J.H. and R.T. Fouladi. 1992. R2: A computer program for interval estimation, power calculations, sample size estimation, and hypothesis testing in multiple regression. Beh. Res. Methods, Instr. Comp. 24:581–2.
Stewart-Oaten, A., J.R. Bence, and C.W. Osenberg. 1992. Assessing effects of unreplicated perturbations: no simple solutions. Ecology 73:1396–1404.
Stewart-Oaten, A., W.W. Murdoch, and K.R. Parker. 1986. Environmental impact assessment: pseudoreplication in time? Ecology 67:929–40.
7 Bayesian Statistical Methods
Wendy A. Bergerud and William J. Reed
7.1 Introduction
In most introductory statistics courses, students encounter basic concepts of the familiar frequentist
paradigm, such as sampling distributions, significance tests, P-values, and confidence intervals. These
concepts, which are further developed in specialized
courses on regression, design of experiments, and
sampling methods, form the basis of the statistical
toolkit that most graduates carry with them into the
world of science and management. When faced with
problems involving experiments, data, and decisions,
most foresters, biologists, and forest managers will
naturally reach into this toolkit. However, the familiar statistical toolkit often proves inadequate for dealing with management problems. Managers make decisions in an environment of uncertainty, where the
best choice among several alternatives is unknown.
They often want to know the probability that a hypothesis is true, or the degree to which it is true: information that frequentist statistics does not directly
provide. Despite its limitations for management, the
frequentist framework is seldom questioned by practitioners, partly because the limitations of frequentist
1 An accessible discussion of some of these points can be found in Swindel (1972), Dennis (1996), and Edwards (1996), and in the Teacher's Corner of The American Statistician, Vol. 51(3), pp. 241–74 (several articles).
summarizes the data and can be used to answer questions or to calculate confidence limits for the parameters. Questions are posed as hypotheses, the most
frequent being the well-known null hypothesis (that
a particular parameter's unknown value is zero). By
considering many hypothetical replications of the experiment, statisticians can determine the behaviour
of fitted parameter values or of some function of the
parameters. This behaviour is described by a frequency or sampling distribution (such as the normal, t-,
and F-distributions) and is used to calculate the familiar P-values and confidence intervals.
One could argue that frequentist statistics is only
really applicable to the analysis of data that arise from
a procedure that is intrinsically repeatable. Such a restriction would severely limit its use. For example, it
would rule out almost completely its use in time
series analysis, time being universally recognized as
non-repeatable. This restriction would also rule out
applying frequentist statistics in many areas of
forestry, because trees generally take a long time to
grow and growing conditions could change over the
course of an experiment. For example, an experiment
involving growing various tree species would not really be repeatable, given the dependence of growth
upon weather over several decades and the possibilities of site changes. However, this objection is usually
overcome by observing that replication is regarded in
a hypothetical sense for the purpose of interpreting
results, and that even in truly repeatable experiments,
the experiment is seldom actually repeated. Rather,
in both cases one contemplates a universe of possible
replications for the purpose of comparing the actual
observed results.
In forest management, managers would like simple
answers to practical questions from sampling procedures and studies, whether experimental, observational, or a combination of the two. For example, an
assessment of the probability that one or more hypotheses are true, or the probability that an estimate
of a parameter is reasonably close to its unknown
true value, would be useful information when formulating decisions. Insofar as frequentist statistical
methods do not directly provide this information (although the results of frequentist statistical analysis
are often loosely misinterpreted in this way) they
may be of limited use to the forest manager. Bayesian
methods on the other hand can provide precisely this
sort of information.
2 An informal proof of Bayes theorem using the example in Section 7.3 is presented in Appendix 1.
3 This sort of stratum is known as a working group within the B.C. Ministry of Forests.
Table 7.1 Numbers of previously sampled plots, observed to be understocked (US) or stocked (S), from NSR and SR cutblocks.

Cutblock status     US      S     Total
NSR                 672    168      840
SR                   72     88      160
Total               744    256     1000
Figure 7.2 [Flow diagram: the prior distribution (840 of 1000 plots from NSR cutblocks; 160 from SR cutblocks) is combined, through Bayes theorem, with the data (672 of the 840 NSR plots observed to be understocked and 168 stocked; 72 of the 160 SR plots understocked and 88 stocked) to give the posterior distribution.]
p(θ = NSR|X = US) = 672/744 = 0.903.
With relevant prior information and the sampling
results of just one plot, the Bayesian approach allows
the determination of the posterior probability that
the cutblock is NSR. On the other hand, the frequentist approach formally ignores the prior information
and so could do little with just one plot. In any case,
regardless of approach, it is unwise to decide the
management of the cutblock on the basis of one plot.
In Section 7.3.1, we will extend the methodology to
samples of several plots. For the rest of this section,
we will fill in the steps just skipped by developing the
necessary statistical notation, the probability model
for the data, and the posterior probability distribution.
The two possible true states of the cutblock can be denoted by the random variable θ, which can have one of two values, either NSR or SR. Based on Table 7.1, the prior probability that the cutblock is NSR is denoted by p(θ = NSR) = π0 = 0.84, while the prior probability that the cutblock is SR is denoted by p(θ = SR) = 1 - π0 = 0.16. This prior distribution is a Bernoulli distribution with parameter π0 (a special case of the binomial distribution when the sample size is one).
The data, denoted by X, can have one of two values: either X = US or X = S. The probability model provides the probability of the data given the true state of nature. From Table 7.1,

p(X = US|θ = NSR) = 672/840 = 0.80,

while

p(X = US|θ = SR) = 72/160 = 0.45.

These probabilities are parameters of the probability models for the two states of nature and will be denoted by π1 and π2, respectively. They are also conditional probabilities because they are the probability that X = US given (or conditional on) the true state of nature. Furthermore, these probabilities are the likelihood6 functions, because they provide a measure of the likelihood of observing the data, X, given specific values for the state of nature, θ. The model parameters are summarized in Table 7.2 and in the tree diagram in Figure 7.3.
A derivation of Bayes theorem using this example is presented in Appendix 1. The conclusion from that appendix is that the posterior probability of a particular state of nature, θ, given the data X is given by:

p(θ|X) = p(θ) p(X|θ) / p(X).

For this example, the probability that the new cutblock is NSR given that the observed plot is US is:

p(θ = NSR|X = US) = (0.84 × 0.80) / 0.744 = 0.903.
Table 7.2 The probability parameters for p(θ), the prior distribution of θ (cutblock is NSR or SR), and p(X|θ), the conditional probability of observing X, given θ (cutblock is NSR or SR). All values were calculated from those in Table 7.1.

Cutblock status     Prior           X = US         X = S
θ = NSR             π0 = 0.84       π1 = 0.80      1 - π1 = 0.20
θ = SR              1 - π0 = 0.16   π2 = 0.45      1 - π2 = 0.55
Figure 7.3 [Tree diagram of the states of nature, θ, for the cutblock: the branch θ = NSR (prior π0 = 0.84) splits into X = US (π1 = 0.80) and X = S (1 - π1 = 0.20); the branch θ = SR (prior 1 - π0 = 0.16) splits into X = US (π2 = 0.45) and X = S (1 - π2 = 0.55).]
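With the prior and the two conditional probabilities in hand, the single-plot Bayes update can be sketched numerically (the helper name and default arguments are illustrative, not from this chapter):

```python
def posterior_nsr(prior_nsr=0.84, p_us_given_nsr=0.80, p_us_given_sr=0.45):
    """Posterior probability that the cutblock is NSR after one US plot,
    via Bayes theorem: p(NSR|US) = p(NSR) p(US|NSR) / p(US)."""
    # total probability of observing an understocked plot
    p_us = prior_nsr * p_us_given_nsr + (1 - prior_nsr) * p_us_given_sr
    return prior_nsr * p_us_given_nsr / p_us

# 0.84 * 0.80 / 0.744: a single understocked plot raises the prior
# probability of NSR from 0.84 to about 0.90
```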
Table 7.3 The likelihoods (probability that X plots out of 12 are US given θ) and the posterior probability that the cutblock is NSR for all possible X values, when the prior probability π0 = 0.84.

X = number of   Likelihood that X   Likelihood that X   Posterior probability   Cutblock is    Management
US plots        plots are US when   plots are US when   for θ = NSR             most likely    decision
observed        θ = NSR             θ = SR              p(θ = NSR|X)
                p(X|θ = NSR)        p(X|θ = SR)
0               0.000               0.001               0.000                   SR             not plant
1               0.000               0.008               0.000                   SR             not plant
2               0.000               0.034               0.001                   SR             not plant
3               0.000               0.092               0.003                   SR             not plant
4               0.001               0.170               0.016                   SR             not plant
5               0.003               0.222               0.073                   unclear        unclear
6               0.016               0.212               0.277                   unclear        unclear
7               0.053               0.149               0.652                   unclear        unclear
8               0.133               0.076               0.902                   unclear        unclear
9               0.236               0.028               0.978                   NSR            plant
10              0.283               0.007               0.995                   NSR            plant
11              0.206               0.001               0.999                   NSR            plant
12              0.069               0.000               1.000                   NSR            plant
Total           1.000               1.000
p(θ = NSR|X) = π0 p(X|θ = NSR) / [π0 p(X|θ = NSR) + (1 - π0) p(X|θ = SR)].   (2)
Table 7.3 gives the likelihoods and posterior probabilities that the cutblock is NSR for n = 12 sample plots (using a protocol of one plot per hectare of cutblock to determine that 12 plots are required) and for all possible X values (X = 0 to X = 12). The posterior probabilities are calculated using equation (2), with the prior probability being p(θ = NSR) = π0 = 0.84.
If four or fewer plots out of 12 are observed to be US, then deciding that the cutblock is SR seems clear because the posterior probability is less than 0.05. Also, if nine or more plots are observed to be US, then the NSR decision is clear because the posterior probability is greater than 0.95. But when 5, 6, 7, or 8 out of 12 plots are observed to be US, the management decision is not clear. These results depend on the model probability values of π1 and π2. If their values had been more widely separated, then the unclear decision would have occurred for fewer X-values; if their values had been more similar, then the unclear decision would have occurred for more X-values. Odds ratios provide another way to express these results.
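The likelihood and posterior columns of Table 7.3 can be reproduced with binomial likelihoods; a sketch (function names are illustrative):

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def posterior_nsr(x, n=12, prior=0.84, p1=0.80, p2=0.45):
    """Posterior p(NSR | x of n plots understocked), as in equation (2)."""
    like_nsr = binom_pmf(x, n, p1)   # likelihood under theta = NSR
    like_sr = binom_pmf(x, n, p2)    # likelihood under theta = SR
    return prior * like_nsr / (prior * like_nsr + (1 - prior) * like_sr)
```

Evaluating posterior_nsr(4) gives a value below 0.05 and posterior_nsr(9) a value above 0.95, matching the clear-decision regions of Table 7.3.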
7.3.4 Odds ratios
Another way to look at these results is to calculate the posterior odds that the cutblock is NSR (θ = NSR) given X plots observed to be understocked. This posterior odds is the ratio of the two posterior probabilities:

p(θ = NSR|X) / p(θ = SR|X) = [π0 / (1 - π0)] × [p(X|θ = NSR) / p(X|θ = SR)].   (3)

The first factor on the right-hand side is the prior odds, π0/(1 - π0) = 0.84/0.16 = 5.25, and the second factor is the ratio of the likelihoods, known as the Bayes factor. For a single plot observed to be US, the Bayes factor is

p(X = US|θ = NSR) / p(X = US|θ = SR) = 0.80 / 0.45 = 1.78,

and so the posterior odds are:

p(θ = NSR|X) / p(θ = SR|X) = π0 π1 / [(1 - π0) π2] = 5.25 × 1.78 = 9.33.
Table 7.4 Interpretation of the Bayes factor, BF01.

BF01         Evidence
< 1          Nothing to mention
1–3          Not worth more than a bare mention
3–20         Positive
20–150       Strong
> 150        Very strong
whereas frequentist P-values can be dramatically affected by unusually large or small sample sizes (Cox
and Hinkley 1974, Table 10.2; Ellison 1996).
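In odds form, the one-plot update is a single multiplication: posterior odds equal prior odds times the Bayes factor. A sketch with the numbers from this example:

```python
prior_odds = 0.84 / 0.16             # odds that the cutblock is NSR: 5.25
bayes_factor = 0.80 / 0.45           # likelihood ratio for one US plot, about 1.78
posterior_odds = prior_odds * bayes_factor

# converting odds back to a probability recovers the value obtained
# directly from Bayes theorem
posterior_prob = posterior_odds / (1 + posterior_odds)
```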
7.4 Bayesian Decision Theory
Both the inferential problems of estimation and hypothesis testing can be viewed as problems in
decision theory, for which a complete Bayesian theory has been developed. However, Bayesian decision
theory can also be used in applied problems of decision-making when information is obtained through
experience and experimentation. For instance, the
natural regeneration example previously discussed
could be formulated as a Bayesian decision theory
problem, as could many other questions relating to
forest management.
The basic framework of decision theory assumes a set of possible, but unknown, states of nature, Θ = {θ1, θ2, ...}, and a set of possible actions A = {a1, a2, ...} available to the decision-maker.7 If the decision-maker chooses an action, a1, when the state of nature is θ1, then an incurred loss can be calculated by a function denoted by L(θ1, a1). This loss could also be written as a gain G(θ1, a1) = -L(θ1, a1). For the natural regeneration example, the set Θ has two states of nature: Θ = {θ1 = NSR, θ2 = SR}. The two possible actions under consideration are A = {a1 = plant, a2 = not plant}.
not plant}. For illustration purposes,8 some arbitrary
numbers will be used for the gain function and are
presented in Table 7.5 and Figure 7.4. This figure
shows a simple decision-tree diagram often used in
decision analysis. This example will be used to illustrate the basic concepts in decision analysis, which
are developed in more detail by Peterman and Peters
(this volume, Chap. 8).
The decision-maker wants to keep losses as small as possible or, alternatively, the gains as high as possible. The difficulty lies in the fact that there is usually not a unique action, a*, for which the gain is maximized for all states of nature, θ. For some states of nature one action maximizes the gain, while for others a different action will provide a maximum. In such cases, since the state of nature is unknown, an unambiguously best action cannot be chosen. For example, planting a site when it is sufficiently regenerated is a waste of resources and may require further resources later, if, for instance, the stand is too dense
7 Θ and A are names used to represent sets of things, consisting of the possible states of nature θ1, θ2, ..., and the possible actions a1, a2, ..., respectively.
8 Although we have used the gain function when writing this section because of its more positive point of view, the literature mostly uses the
loss function.
Figure 7.4 [Decision-tree diagram of the gain for each management action and state of nature: action a1 (plant) leads to G(θ1,a1) = $200/ha if θ1 = cutblock is NSR (probability π0) or G(θ2,a1) = -$1200/ha if θ2 = cutblock is SR (probability 1 - π0); action a2 (not plant) leads to G(θ1,a2) = -$1800/ha if NSR or G(θ2,a2) = $500/ha if SR.]
Table 7.5 The gain function, G(θ, a), for the natural regeneration example.

Possible action     θ1 = NSR                  θ2 = SR
a1 = plant          G(θ1,a1) = $200/ha        G(θ2,a1) = -$1200/ha
a2 = not plant      G(θ1,a2) = -$1800/ha      G(θ2,a2) = $500/ha
Table 7.6 The Bayes posterior gain and posterior Bayes decision for all possible numbers of understocked plots.

Number of US    Posterior probability   Posterior gain   Posterior gain    Posterior    Prior
plots observed  for θ = NSR             for action       for action        Bayes        Bayes
(X)             (p = p(θ = NSR|X))      a1 = plant       a2 = not plant    decision     decision
0               0.000                   -1200            500               not plant    plant
1               0.000                   -1200            500               not plant    plant
2               0.001                   -1199            498               not plant    plant
3               0.003                   -1195            492               not plant    plant
4               0.016                   -1178            464               not plant    plant
5               0.073                   -1098            333               not plant    plant
6               0.277                   -812             -137              not plant    plant
7               0.652                   -287             -1000             plant        plant
8               0.902                   62               -1574             plant        plant
9               0.978                   169              -1750             plant        plant
10              0.995                   194              -1790             plant        plant
11              0.999                   199              -1798             plant        plant
12              1.000                   200              -1800             plant        plant
The condition for planting to be the Bayes decision can be written in terms of the posterior odds:

p(θ = NSR|X) / p(θ = SR|X) > [G(θ2, a2) - G(θ2, a1)] / [G(θ1, a1) - G(θ1, a2)].   (4)

The left-hand side is the posterior odds (see equation (3)). If it is greater than the ratio of gain differences on the right-hand side, then planting will be the Bayes decision. If this odds is less, then not planting would be the Bayes decision. Thus the condition (equation (4)) can be expressed as: plant if and only if the evidence for an NSR cutblock is sufficiently high. How high it has to be depends on the prior odds, and on the anticipated gains under all scenarios (via the right-hand side of equation (4)). For our example, the ratio of gains is:

[G(θ2, a2) - G(θ2, a1)] / [G(θ1, a1) - G(θ1, a2)] = [500 - (-1200)] / [200 - (-1800)] = 1700/2000 = 0.85.
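The posterior gains and decisions in Table 7.6, and the 0.85 odds threshold, can be checked with a short script (gain values from Table 7.5; the helper names are illustrative):

```python
GAINS = {                      # $/ha, from Table 7.5
    "plant": {"NSR": 200, "SR": -1200},
    "not plant": {"NSR": -1800, "SR": 500},
}

def posterior_gain(action, p_nsr):
    """Expected gain of an action given the posterior probability p(NSR | data)."""
    g = GAINS[action]
    return p_nsr * g["NSR"] + (1 - p_nsr) * g["SR"]

def bayes_decision(p_nsr):
    """Choose the action with the larger posterior expected gain."""
    return max(GAINS, key=lambda a: posterior_gain(a, p_nsr))

# The two expected gains are equal when p / (1 - p) = 1700 / 2000 = 0.85,
# i.e. at p = 1700 / 3700; planting wins for any larger posterior probability.
```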
Table A1.1 Numbers of previously sampled plots (observed as US or S) from both NSR and SR cutblocks. Parameters for the prior probability distribution and the two probability models are also shown.

                                            Joint distribution (probability model parameters)
Cutblock is     Prior probability p(θ)      X = US                     X = S
θ = NSR         840 plots (π0 = 0.84)       672 plots (π1 = 0.80)      168 plots (1 - π1 = 0.20)
θ = SR          160 plots (1 - π0 = 0.16)   72 plots (π2 = 0.45)       88 plots (1 - π2 = 0.55)
Total           1000                        744                        256
For a sample of n plots, with X the number observed to be understocked, the probability model is binomial:9

p(X|θ = SR) = (n choose X) π2^X (1 - π2)^(n-X).   (A2.1)

For a single plot, the total probability of observing X = US is

p(X = US) = π0 π1 + (1 - π0) π2,

so that, numerically,

p(X = US) = 0.84 × 0.80 + 0.16 × 0.45 = 0.744.

Thus, for the example, equation (A1.3) can be written as:

posterior probability = p(θ = NSR|X = US) = π0 π1 / [π0 π1 + (1 - π0) π2].

9 This distribution is described in most standard introductory statistical textbooks. (n choose X) is known as the binomial coefficient, where (n choose X) = n!/(X!(n - X)!). If X = 7 and n = 12 then this is equal to 792. When n = 1, the binomial distribution becomes the Bernoulli.
For forest resource managers, uncertainties are unavoidable because of natural ecological variability and
our imperfect knowledge of ecosystems. Nevertheless,
management decisions must be made and actions
must be taken. Decision analysis, a quantitative
method of evaluating management options, can
greatly assist that decision-making process because it
explicitly uses information on uncertainties. Although
widely used in business, decision analysis is particularly
useful for forest management because it accounts for
uncertainty about states of nature (e.g., current timber volume per hectare, the slope of the relationship
between survival rate of a rare bird species and size of
patches of mature stands of trees). Decision analysis,
in conjunction with Bayesian statistical methods, permits calculation of the potential outcomes of
management actions, considering each hypothesized
state of nature weighted by its probability of occurrence. Given a clear objective, managers can then
rank their management options. A sensitivity analysis
can determine how sensitive this ranked order of management options is to different assumptions or
parameter values. Sensitivity analysis can also identify
research priorities and help resolve conflicts between
interest groups about objectives or beliefs about how
a forest ecosystem works. Decision analysis is particularly appropriate for the planning stage of an active
adaptive management initiative because it can compare the expected performance of different proposed
experimental plans, taking into account various uncertainties. This procedure can help identify the best
experimental design for an adaptive management
plan, as well as its associated monitoring program.
8.1 Introduction
As noted in Nyberg (this volume, Chap. 1), uncertainties are pervasive in natural resource management.
Our knowledge of ecosystems is incomplete and imperfect, which creates imprecision and bias in data
used to quantitatively describe the dynamics of these
systems. Despite the presence of these uncertainties,
decisions must be made and regulations must be developed. One purpose of this chapter is to discuss
why it is important for decision-makers to explicitly
Figure 8.1 [Three panels of estimates plotted against year of estimate, roughly 1950–1975.] Changes in estimates of various physical constants as new experimental or measurement methods were developed. Data points are mean estimates of physical constants and vertical bars represent standard errors of the mean estimates. All values are in units of deviations from their 1973 estimates, in parts per million (ppm). (Adapted from Henrion and Fischhoff 1986.)
Decision analysis is becoming a popular tool in resource management (e.g., Lord 1976; Walters 1981,
1986; Cohan et al. 1984; Parkhurst 1984; Bergh and
Butterworth 1987; Holbert and Johnson 1989; Parma
and Deriso 1990; McAllister and Peterman 1992a; McDaniels 1992; Thompson 1992; Hilborn et al. 1994;
Maguire and Boiney 1994; Reckhow 1994; Adkison
and Peterman 1996). This popularity is due to several
reasons.
First, most problems in resource management are
too complex (with lags, nonlinearities, threshold
phenomena, and cumulative effects) to permit the
use of formal optimization techniques (see Clark
1990 for some exceptions). Second, decision analysis
can help managers rank proposed management actions based on quantitative assessments of
probabilities of uncertain events and the desirability
of possible outcomes (Keeney 1982; Howard 1988;
Clemen 1996). Decision analysis can be thought of as
one type of risk assessment in that it considers the
uncertainties that create risks. Although decision
analysis cannot guarantee that a correct decision will
be made each time, it will improve the quality of several similar decisions over the long term because it
explicitly takes uncertainties into account quantitatively (Von Winterfeldt and Edwards 1986). Similarly,
taking the optimal action identified by a decision
analysis does not guarantee a certain desired outcome, but it increases the probability of a desirable
outcome occurring. Finally, decision analysis can
combine Bayesian statistical analysis and stochastic
models (Monte Carlo simulations) into a structured,
systematic approach to making decisions. Complex
decision problems are broken down into smaller and
more manageable components; these components
are then recombined to determine the optimal action. This process makes decision analysis a useful
tool for decisions involving complex ecological and
human responses to management actions, which certainly characterize forest management.
8.4 Eight Components of Decision Analysis
To make a complex decision problem in forestry
more tractable, decision analysis breaks the problem
down into eight components:
1. management objectives;
2. management options;
3. uncertain states of nature;
4. probabilities on the uncertain states of nature;
5. model to calculate the outcome of each management action for each state of nature;
6. decision tree or decision table;
7. ranking of management actions; and
8. sensitivity analyses.

Table 8.1 A generalized decision table showing calculation of expected outcomes for two potential management actions, given two possible states of nature (Hypothesis 1 and 2) with their associated probabilities (P1 and P2). Compare with Figure 8.2.

Hypotheses or         Probabilities of      Potential management        Potential management
uncertain states      states of nature      action #1                   action #2
of nature
Hypothesis 1          Probability that      Consequence of action 1     Consequence of action 2
                      Hypothesis 1 is       if Hypothesis 1 is          if Hypothesis 1 is
                      correct (P1)          correct (C11)               correct (C21)
Hypothesis 2          Probability that      Consequence of action 1     Consequence of action 2
                      Hypothesis 2 is       if Hypothesis 2 is          if Hypothesis 2 is
                      correct (P2)          correct (C12)               correct (C22)
                                            Expected consequence        Expected consequence
                                            of action 1 =               of action 2 =
                                            (P1 × C11) + (P2 × C12)     (P1 × C21) + (P2 × C22)
A generalized decision table (e.g., Table 8.1) can be
used to structure the decision analysis of simple
problems. In this table, two alternative management
actions are listed across columns and alternative hypotheses or uncertain states of nature, with their
associated probabilities (P1 and P2), are placed in
rows. For each combination of action and hypothesis,
the consequences or outcomes (C11, C12, etc.) are calculated using a model. The expected value of the
consequence for a particular management action
(last row) is then calculated from the weighted average of all possible consequences for that action,
where the weighting is the probability of the hypothesis that gives rise to each consequence.
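The expected-consequence row of Table 8.1 is simply a probability-weighted average; a minimal sketch (all names are illustrative):

```python
def expected_consequence(probs, consequences):
    """Weighted average of one action's consequences over the hypotheses.

    probs        -- probability of each hypothesis (must sum to 1)
    consequences -- consequence of this action under each hypothesis
    """
    assert abs(sum(probs) - 1.0) < 1e-9, "hypothesis probabilities must sum to 1"
    return sum(p * c for p, c in zip(probs, consequences))

def rank_actions(probs, table):
    """Rank actions (dict of action -> consequence list), best expected value first."""
    return sorted(table, key=lambda a: expected_consequence(probs, table[a]),
                  reverse=True)
```

For instance, with probabilities [0.6, 0.4] and consequences {"action 1": [10, -5], "action 2": [2, 4]}, action 1 has expected consequence 4.0 and action 2 has 2.8, so action 1 ranks first.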
For more complex problems, a decision tree can be used to structure the analysis (Render and Stair 1988; Clemen 1996). The generalized decision tree in Figure 8.2 corresponds to the decision table in Table 8.1. Alternative management actions in Figure 8.2 are represented by branches emerging from a square decision node, and uncertain states of nature or hypotheses are represented as branches coming from the circular chance nodes. The probability of each uncertain state of nature is shown explicitly for each state-of-nature branch. Outcomes or consequences of each management action, given each state of nature, are shown on the right. Decision trees can accommodate much more complexity than a decision table by including numerous branches and uncertainty nodes.

Figure 8.2 [Decision tree: a decision node branches to Action 1 and Action 2; each leads to a chance node with branches Hypothesis 1 (P1) and Hypothesis 2 (P2), giving outcomes C11, C12, C21, and C22.] A simple example of a generalized decision tree showing two different management actions and two possible states of nature (Hypothesis 1 and 2) with their associated probabilities (P1 and P2). The square at the left is the decision node and the circles are chance nodes. The consequences associated with each combination of management action, i, and state of nature, j, are designated Cij. This decision tree is the graphical equivalent of the general decision table shown in Table 8.1.
We will use an application of decision analysis to
forest management in Tahoe National Forest, California (Cohan et al. 1984) to illustrate the eight
components of this method. The purpose of Cohan et al.'s particular decision analysis (referred to as the "Tahoe example") was to determine what treatment
should be applied before a prescribed burn on a
recently harvested forest site. Figure 8.3 shows the
decision tree for this problem; its components are
explained below.
8.4.1 Management objectives
Decision analysis requires a clearly defined management objective or goal so that the different
management actions can be ranked by how well they
are expected to attain the objective. The objective is
usually stated explicitly in terms of maximizing (or
minimizing) one or more quantitative measures of
performance (such as expected value of future timber
harvests). However, decision analysis can also accommodate situations in which the objective is to choose
an action that produces a performance measure, such
as abundance of some rare bird species, that is within
an acceptable range of values. In this case, actions
that do not lead to outcomes within this range can be
discarded, and some secondary criterion (such as
minimizing cost) can be used to choose from the remaining actions. As emphasized by Keeney (1992),
identifying objectives requires carefully applying various procedures to ensure, for instance, that
fundamental objectives are not confused with the
means needed to attain them.
In the Tahoe example (Figure 8.3), the management objective was to maximize the expected net
resource value of the forest following the prescribed
burn. That value took into account the value of the
timber harvested, as well as the cost of carrying out
the pre-burn treatment (if any), the cost of the prescribed broadcast burn, and the cost incurred from
an escaped fire (if one escaped).
In the case of British Columbia's forests, management objectives can involve timber value, recreational use, wildlife habitat, and quality of viewscapes in various combinations and with various relative importances. For example, a primary management objective in Clayoquot Sound is to maintain long-term productivity and natural diversity of the area. Subgoals include maintaining watershed integrity, biological diversity, and cultural, scenic, recreational, and tourism values (Scientific Panel for Sustainable Forest Practices in Clayoquot Sound 1995).
8.4.2 Management options
Managers need to define a list of alternative actions
from which to choose the best option. Considerable
thought should be put into developing innovative
options, as well as into identifying feasible ones
(Keeney 1982).
The Tahoe prescribed burn problem has two alternative management actions. These alternatives are shown in Figure 8.3 as two branches emerging from the square decision node. One choice was to conduct the prescribed broadcast burn without any pre-burn treatment of the site ("burn only"). The other alternative was to pile up timber slash from the clearcut harvest before the broadcast burn ("YUM and burn"). This latter treatment, referred to as yarding unmerchantable material (YUM), incurs additional costs but reduces the probability of fire escaping and increases the chances of a successful burn. Cohan et al.'s (1984) question was: is YUM worth the additional cost?
8.4.3 Uncertain states of nature
Uncertain states of nature are parameters or quantitative hypotheses that are treated explicitly as
uncertainties in an analysis, usually by considering a
range of values for one or more parameters in a
model (see Section 8.4.5). Such uncertain parameters
lead to a corresponding range of forecasts of outcomes of management actions. For instance, it may
be difficult to estimate the effect of different sizes of
leave patches in a retention harvesting strategy on
abundance of a bird population because of uncertainty about how the probability of blowdown is
affected by patch size (i.e., whether that probability is
a steeply rising function of patch size or a relatively
flat one). There is also considerable uncertainty
about the benefits of some requirements in the
British Columbia Forest Practices Code for meeting
objectives related to biodiversity or recreational use.
For example, it is unclear whether the survival rate of
juvenile coho salmon is greatly or only slightly affected by the width of riparian forest that the Code
requires to be left along stream banks.
Two major uncertainties in the Tahoe example (Figure 8.3) involved the fire behaviour and the magnitude of costs associated with what Cohan et al. (1984) referred to generally as "problem fires".
Uncertainty in fire behaviour was represented by
defining three types of fires: a successful burn, a
problem burn, and an escaped fire. The second uncertainty was the cost of a problem burn (high,
intermediate, or low cost). These uncertain states of
nature are shown as branches emerging from circular
chance nodes in Figure 8.3.
Figure 8.3 [Decision tree, summarized here as a table. Net value = resource value - treatment cost - problem/escape cost; all figures in dollars for the 14-acre site.]

Treatment       Fire behaviour       Probability   Resource   Treatment   Problem/       Net
                                                   value      cost        escape cost    value
YUM and burn    Successful burn      0.899         6 620      4 858       0              1 762
($1 559)        Problems (0.100):
                  High cost          0.25          6 620      4 858       3 010          -1 248
                  Int. cost          0.50          6 620      4 858       1 400          362
                  Low cost           0.25          6 620      4 858       700            1 062
                Escaped fire         0.001         6 620      4 858       40 000         -38 238
Burn only       Successful burn      0.8485        6 620      4 550       0              2 070
($1 713)        Problems (0.150):
                  High cost          0.25          6 620      4 550       3 360          -1 290
                  Int. cost          0.50          6 620      4 550       1 750          320
                  Low cost           0.25          6 620      4 550       1 050          1 020
                Escaped fire         0.0015        6 620      4 550       40 000         -37 930

Decision tree for the example described in the text for the Tahoe National Forest. The management options (treatments) are to "burn only" or "YUM and burn"; the latter refers to yarding unmerchantable material, where the slash material from the logging operation is piled up before burning. Outcomes are costs in dollars for a 14-acre site. The resulting expected net resource values for each management option are indicated next to the option. See text for details. (Adapted from Cohan et al. 1984.)
Figure 8.4. Possible models for hypothetical data on average volume per tree at age 100 years as a function of initial
density. The solid line is the best-fit regression line; dashed lines represent other possible, but less likely,
hypotheses about the true underlying relationship, including the null hypothesis, H0, of no relationship.
of these models for each alternative management action and for each uncertain state of nature are shown
on the right side of Figure 8.3. The net resource value
is the resource value minus the treatment cost and
the cost of a problem fire or escaped fire. For example, the simulated net resource value of the YUM
and burn option, if a successful burn resulted, was
$1762 on this 14-acre site. For the same action, but assuming that a problem fire occurred that had high
costs, their simulated net resource value was -$1248.
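The expected net value quoted for each option is the probability-weighted sum of the branch outcomes in Figure 8.3. A minimal sketch in Python reproduces the figure's expected values from its probabilities and simulated net values:

```python
# Expected net resource value for each treatment in the Tahoe example.
# Each branch pairs a probability with a net value ($ per 14-acre site);
# the problem-fire branch splits into high/intermediate/low cost sub-branches.

def expected_value(branches):
    """Sum of probability-weighted net values over all branches of the tree."""
    return sum(p * v for p, v in branches)

yum_and_burn = [
    (0.899, 1762),           # successful burn
    (0.100 * 0.25, -1248),   # problem fire, high cost
    (0.100 * 0.50, 362),     # problem fire, intermediate cost
    (0.100 * 0.25, 1062),    # problem fire, low cost
    (0.001, -38238),         # escaped fire
]

burn_only = [
    (0.8485, 2070),
    (0.150 * 0.25, -1290),
    (0.150 * 0.50, 320),
    (0.150 * 0.25, 1020),
    (0.0015, -37930),
]

print(round(expected_value(yum_and_burn)))  # 1559, as in Figure 8.3
print(round(expected_value(burn_only)))     # 1713
```

Although "burn only" has the higher expected net value, the "YUM and burn" option carries a smaller probability of an escaped fire (0.001 versus 0.0015); quantifying the tree makes that trade-off explicit for the decision-maker.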
Figure 8.5. Posterior probabilities for different slopes of a linear model for the hypothetical data shown in Figure 8.4.
Posterior probabilities were calculated using Bayesian statistics. The best-fit line shown in Figure 8.4 has the
highest posterior probability, but other lines with different slopes also have reasonably high probabilities. These
probabilities can be used in a decision analysis to represent the relative degree of belief in the different slopes.
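Posterior probabilities like those plotted above can be produced by evaluating, for each candidate slope on a discrete grid, the Gaussian likelihood of the data under a linear model, and then normalizing (Bayes' theorem with a uniform prior). A minimal sketch; the x, y data, error standard deviation, fixed intercept, and slope grid below are all hypothetical stand-ins, not the values behind the figures:

```python
import math

# Hypothetical volume-vs-density data (not the actual data behind Figure 8.4).
x = [500, 750, 1000, 1250, 1500]
y = [5.0, 4.6, 4.5, 4.0, 3.7]
sigma = 0.3       # assumed standard deviation of observation error
intercept = 5.5   # intercept assumed known, to keep the sketch one-dimensional

# Discrete grid of candidate slopes, each given the same prior probability.
slopes = [s / 10000 for s in range(-30, 1)]   # -0.0030 to 0.0000

def log_likelihood(b):
    """Gaussian log-likelihood of the data for slope b."""
    return sum(-((yi - (intercept + b * xi)) ** 2) / (2 * sigma ** 2)
               for xi, yi in zip(x, y))

# Posterior = likelihood x uniform prior, normalized to sum to 1.
weights = [math.exp(log_likelihood(b)) for b in slopes]
posterior = [w / sum(weights) for w in weights]

best = slopes[posterior.index(max(posterior))]
print(best)   # slope with the highest posterior probability
```

Neighbouring slopes also receive appreciable posterior probability, which is exactly the spread of belief that a decision analysis can carry forward instead of committing to the single best-fit line.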
[Figure: expected values ($1 400 to $1 750 on the y-axis) plotted against probabilities (0 to 0.009 on the x-axis); one curve is labelled "Burn only". The caption was not recovered.]
designed before their implementation (Hairston [editor] 1989). Well-designed experiments generate
rigorous new information about the relative effectiveness of each action or about the different
hypotheses about biological processes. Acting adaptively will tend to reduce future uncertainties and
thereby improve future management (Peterman and
McAllister 1993).
If decision-makers take this approach, they must
Table 8.2. Some possible arrangements that could be considered for a thinning experiment. Each arrangement
consists of a different number of replicates at various densities of trees, which might be necessary because
of logistical constraints.

Density                Option 1   Option 2   Option 3
250                    3          3          4
500                    2          3          4
750                    3          2          0
Control (unthinned)    2          2          2
inventory methods would produce a probability distribution of estimates of timber volumes at any given
time. The inventory methods differed in cost (high,
medium, or low) and precision (high, medium, or
low).
Stahl et al. (1994) found that, in general, several inexpensive and less precise inventories taken only a
few times during the life of a stand resulted in a higher expected net income than a single, expensive but
very precise inventory. In addition, the authors concluded that precise inventory information was more
valuable when the potential losses in income due to
incorrect decisions were large. This conclusion is perhaps intuitive, but Stahl et al. were able to
quantitatively estimate the relative value of different
methods of doing forest inventories by explicitly considering uncertainties in information.
In wildlife management, Maguire (1986) used
decision analysis to recommend an appropriate conservation strategy for Whooping Crane populations
to minimize their probability of extinction. Maguire
evaluated whether it is better from a biodiversity
standpoint to create a single large population or
several small ones, given that random catastrophic
events can occur (a common debate in conservation
biology; see Simberloff and Abele 1976). In the
Whooping Crane situation, when Maguire (1986)
took the uncertainties associated with severe storms
into account, the optimal action was to move some
of the Whooping Cranes and create two separate
populations. This approach was better than keeping
them as a single population that had a higher probability of extinction if a rare severe storm occurred in
that one location.
Decision analysis has also been applied to complex
land use issues, such as the decision whether to preserve and/or mine in the Tatshenshini area of
wilderness in northwestern British Columbia (McDaniels 1992). There, the major uncertainties
included the environmental values associated with
preserving the area, the tax revenue to be generated
by mining, the question of whether mining would
actually go ahead given the regulatory process, and
other uncertainties. The analysis suggested that
preservation of the entire area as a park would have
the greatest expected value, taking into account the
nonmarket value of the wilderness.
Within the field of natural resources, decision
analysis has been used most widely in fisheries management. For instance, several authors have used
decision analysis to identify optimal management actions for Pacific salmon (e.g., Lord 1976; Walters 1981,
1986). Decision analysis was also able to identify the
optimal precautionary safety margin to apply to harvest rates of other marine fish species, given
uncertainties in stock abundance and productivity
(Frederick and Peterman 1995).
A final fisheries example from the northwestern
shelf of Australia (Sainsbury 1988, 1991; Sainsbury et
al. 1997) demonstrates particularly well how decision
analysis can be used in the design phase of an experimental, or active adaptive management program.
Foresters can learn considerably from this case study
because it is one of the few large-scale active adaptive
management experiments ever implemented, as well
as one of the few to use formal decision analysis in
the planning stage (also see Walters 1986; McAllister
and Peterman 1992b). This case study is therefore
worth discussing in detail.
The objectives of this experiment were to determine why the abundances of two economically
valuable groups of groundfish species were declining
relative to less valuable species and to take appropriate management action (Sainsbury 1988). In 1985,
Sainsbury proposed four different hypotheses, or
states of nature, that could potentially explain the
historical decrease in abundance of the valuable
species relative to the less valuable ones. These hypotheses were an intraspecific mechanism that
inhibited the valuable species, two different interspecific interactions between the valuable and
less-valuable species that kept the former at low
abundances, and a mechanism in which the existing
trawl fishery disrupted the preferred ocean floor
habitat of the valuable species. Sainsbury proposed
five experimental, or active adaptive, management
regimes to distinguish among these hypotheses (see
WA to WE in Figure 8.7, which shows the major elements of Sainsbury's decision analysis). These
experimental management strategies ranged from
continuing the existing trawl fishery, to stopping the
trawl fishery for some period and using a trap fishery
only, to several activities in various spatial areas (including no fishing, trap fishing only, trawl fishing
only, or both). Sainsbury's decision analysis forecast the expected economic value of the fish catch for
each of these management strategies for each of the
four possible states of nature. These states of nature were weighted by their probability of occurrence
(P1 to P4), as derived from historical data and
Figure 8.7. Decision tree for the analysis of various management actions in Sainsbury's (1988) large-scale fishing
experiment. Management strategies (WA to WE), management time periods (t = 5, 10, or 20 years), hypotheses 1 to 4
(weighted by probabilities P1 to P4), and outcomes (values of catch C1 to C12, in millions of dollars) are described
in the text and Table 8.3. Only a subset of the complex branches is shown.
Table 8.3. Results of Sainsbury's (1991) calculations of the benefits of different designs for an active adaptive management
experiment on groundfish in Australia. Management strategies WA to WE are defined in the text; they differed in
how much fishing occurred and when, what type of gear was used, and whether the strategies were based only
on existing information as of 1985 (WA and WB) or on information resulting from the active adaptive experiment
(WC, WD, WE). WB,1 to WB,4 refer to four different long-term harvesting strategies; time period, t, is the duration
of the experiment in years. Expected values of catch are in millions of Australian dollars. See text for details.
(Adapted from Sainsbury 1991.)

Strategy    Expected value of catch
WA          9.96
WB,1        27.2
WB,2        35.4
WB,3        31.8
WB,4        9.96
WC,t = 5    35.6
WC,t = 10   29.7
WC,t = 20   21.2
WD,t = 5    37.4
WD,t = 10   37.2
WD,t = 20   36.3
WE,t = 5    40.6
WE,t = 10   40.5
WE,t = 20   38.6
improved understanding of which of the four hypotheses was responsible for the decline in
abundance of the valuable species. This approach allowed a more accurate decision to be made about
which long-term harvesting strategy was most likely
to reverse the problem and increase the value of the
catch. (Incidentally, Sainsbury et al. 1997 reported
that the experimental management strategy WE generated data by 1991 that strongly supported the fourth
hypothesis: that trawling detrimentally affected the habitat of the more valuable groundfish species.
Trawling was subsequently reduced.)
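The way experimental data shift the hypothesis weights P1 to P4 can be sketched as a discrete Bayesian update. The priors and likelihoods below are invented placeholders, not Sainsbury's actual values; only the mechanics are the point:

```python
# Discrete Bayesian update of the probabilities of four competing hypotheses.
# Priors and likelihoods are illustrative only.

priors = [0.25, 0.25, 0.25, 0.25]   # P1..P4 before the experiment

# Likelihood of the observed experimental data under each hypothesis
# (hypothetical values; in the real case these came from fitting the
# competing models to the monitoring data).
likelihoods = [0.02, 0.05, 0.08, 0.40]

# Bayes' theorem: posterior is proportional to prior times likelihood.
unnorm = [p * l for p, l in zip(priors, likelihoods)]
posteriors = [u / sum(unnorm) for u in unnorm]

for i, p in enumerate(posteriors, start=1):
    print(f"Hypothesis {i}: P = {p:.3f}")
# Most of the weight now falls on the hypothesis that best predicted the data.
```

With numbers such as these, the fourth hypothesis would dominate the posterior, mirroring how the data gathered under strategy WE shifted support toward the trawling-habitat hypothesis.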
8.7 Value of Information
By taking uncertainty into account quantitatively in
decision analyses, analysts can quantify the effects of
considering or reducing uncertainties when making
decisions. Several types of analyses are possible: expected value of including uncertainty (EVIU),
expected value of sample information (EVSI), expected value of perfect information (EVPI), and
expected value of experimental or adaptive management. (See Morgan and Henrion 1990 for more
details.)
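As an illustration of the last of these concepts, EVPI is the difference between the expected payoff achievable with a clairvoyant (choosing the best action after the state of nature is revealed) and the expected payoff of the best single action chosen under uncertainty. A sketch with a hypothetical two-action, two-state payoff table (all numbers invented):

```python
# Expected value of perfect information (EVPI) for a hypothetical problem.
# Each action maps to its payoffs in state 1 and state 2.

p_states = [0.6, 0.4]          # probabilities of the two states of nature
payoffs = {
    "action A": [10.0, 2.0],   # payoff of A in state 1, state 2
    "action B": [4.0, 8.0],
}

# Best expected payoff when the action must be chosen before the state is known.
ev_no_info = max(
    sum(p * v for p, v in zip(p_states, row)) for row in payoffs.values()
)

# Expected payoff if the true state were revealed before each choice.
ev_perfect = sum(
    p * max(row[i] for row in payoffs.values()) for i, p in enumerate(p_states)
)

evpi = ev_perfect - ev_no_info
print(evpi)   # here 2.4: an upper bound on what resolving the uncertainty is worth
```

Because real studies yield imperfect information, the expected value of sample information (EVSI) for any proposed study can never exceed the EVPI, which makes EVPI a quick screen for whether further data collection could possibly pay for itself.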
Not all circumstances warrant a full, formal quantitative decision analysis: justifiable usage of
decision analysis is case-specific. For example, decision analysis is more feasible if at least some
reliable data are available and clear management
objectives are stated. Furthermore, decision analysis is more appropriate when costs of incorrect
decisions are potentially large. First-time users of
this approach are encouraged to use the references
here and to discuss the approach with experts who
have previously used decision analysis. Regardless
of the specific situation, it is always worth at least
thinking about a decision-making problem in
terms of the components of decision analysis as
described, even if final calculations are never carried out due to limitations in data or other
problems. The mere process of describing each
component helps to clarify and organize the decision-making process and to identify research
needs.
Acknowledgements
We are grateful to Vera Sit, Brenda Taylor, Darrell
Errico, and Milo Adkison for many useful suggestions, comments, and discussions. Milo Adkison also
suggested the format for Table 8.1. Several reviewers
also provided helpful comments: Russ Horton,
Wendy Bergerud, Michael Stoehr, Jeff Stone, and
Peter Ott.
References
Adkison, M.D. and R.M. Peterman. 1996. Results of
Bayesian methods depend on details of implementation: an example of estimating salmon
escapement goals. Fish. Res. 25:155–70.
Bell, D.E., R.L. Keeney, and H. Raiffa (editors). 1977.
Conflicting objectives in decisions. J. Wiley,
New York, N.Y.
Berger, J.O. and D.A. Berry. 1988. Statistical analysis
and the illusion of objectivity. Am. Sci.
76:159–65.
Bergerud, W.A. and W.J. Reed. [n.d.]. Bayesian statistical methods. This volume.
Bergh, M.O. and D.S. Butterworth. 1987. Towards rational harvesting of the South African anchovy
considering survey imprecision and recruitment variability. S. African J. Marine Sci.
5:937–51.
Box, G.E.P. and G.C. Tiao. 1973. Bayesian inference
in statistical analysis. Addison-Wesley, Reading,
Mass.
Clark, C.W. 1990. Mathematical bioeconomics: the
optimal management of renewable resources. J.
Wiley, New York, N.Y.
Clemen, R.T. 1996. Making hard decisions: an introduction to decision analysis. 2nd ed. Duxbury
Press, Wadsworth Publ. Co., Belmont, Calif.
Cohan, D., S.M. Haas, D.L. Radloff, and R.F. Yancik.
1984. Using fire in forest management: decision
making under uncertainty. Interfaces 14:8–19.
Crome, F.H.J., M.R. Thomas, and L.A. Moore. 1996.
A novel Bayesian approach to assessing impacts
of rain forest logging. Ecol. Applic. 6:1104–23.
Edwards, A.W.F. 1992. Likelihood: expanded edition.
Johns Hopkins Univ. Press, Baltimore, Md.
Ellison, A.M. 1996. An introduction to Bayesian inference for ecological research and
environmental decision-making. Ecol. Applic.
6:1036–46.
Errico, D. 1989. Choosing stand density when spacing
lodgepole pine in the presence of risk of pest attack. B.C. Min. For., Victoria, B.C. Res. Rep.
89004-HQ.
Frederick, S.W. and R.M. Peterman. 1995. Choosing
fisheries harvest policies: when does uncertainty matter? Can. J. Fish. Aquat. Sci. 52:291–306.
Gobas, F.A.P.C. 1993. A model for predicting the
bioaccumulation of hydrophobic organic
chemicals in aquatic food-webs: application to
Lake Ontario. Ecol. Modeling 69:1–17.
Hairston, N.G. (editor). 1989. Ecological experiments: purpose, design, and execution.
Cambridge Univ. Press, Cambridge, U.K.
Henrion, M. and B. Fischhoff. 1986. Assessing uncertainty in physical constants. Am. J. Physics 54:791–7.
Holbert, D. and J.C. Johnson. 1989. Using prior information in fisheries management: a comparison of classical and Bayesian methods for estimating population parameters. Coastal Manage. 17:337–47.
Howard, R.A. 1988. Decision analysis: practice and promise. Manage. Sci. 34:679–95.
Ibrekk, H. and M.G. Morgan. 1987. Graphical communication of uncertain quantities to non-technical people. Risk Analysis 7:519–29.
Keeney, R.L. 1982. Decision analysis: an overview. Operations Res. 30:803–38.
______. 1992. Value-focused thinking. Harvard Univ. Press, Cambridge, Mass.
Keeney, R.L. and H. Raiffa. 1976. Decisions with multiple objectives: preferences and value trade-offs. J. Wiley, New York, N.Y.
Lindley, D.V. 1985. Making decisions. Wiley Interscience, New York, N.Y.
Lord, G.E. 1976. Decision theory applied to the simulated data acquisition and management of a salmon fishery. Fish. Bull. (U.S.) 74:837–46.
Maguire, L.A. 1986. Using decision analysis to manage endangered species populations. J. Environ. Manage. 22:345–60.
Maguire, L.A. and L.G. Boiney. 1994. Resolving environmental disputes: a framework incorporating decision analysis and dispute resolution techniques. J. Environ. Manage. 42:3–18.
Mapstone, B.D. 1995. Scalable decision rules for environmental impact studies: effect size, Type I, and Type II errors. Ecol. Applic. 5:401–10.
McAllister, M.K. and R.M. Peterman. 1992a. Decision analysis of a large-scale fishing experiment designed to test for a genetic effect of size-selective fishing on British Columbia pink salmon (Oncorhynchus gorbuscha). Can. J. Fish. Aquat. Sci. 49:1305–14.
______. 1992b. Experimental design in the management of fisheries: a review. N. Am. J. Fish. Manage. 12:1–18.
McDaniels, T. 1992. A multiple objective decision analysis of land use for the Tatshenshini-Alsek area. Appendix to report for B.C. Commission on Resources and Environment, Victoria, B.C.
Morgan, M.G. and M. Henrion. 1990. Uncertainty: a guide to dealing with uncertainty in quantitative risk and policy analysis. Cambridge Univ. Press, Cambridge, U.K.
Nyberg, J.B. [n.d.]. Statistics and the practice of adaptive management. This volume.
Osenberg, C.W., R.J. Schmitt, S.J. Holbrook, K.E. Abu-Saba, and A.R. Flegal. 1994. Detection of environmental impacts: natural variability, effect size, and power analysis. Ecol. Applic. 4:16–30.
Press, S.J. 1989. Bayesian statistics: principles, models, and applications. J. Wiley, New York, N.Y.
Simberloff, D.S. and L.G. Abele. 1976. Island biogeographic theory and conservation practice. Science 191:285–6.
9 SELECTING APPROPRIATE STATISTICAL PROCEDURES AND ASKING THE RIGHT QUESTIONS: A SYNTHESIS
BRUCE G. MARCOT
Abstract
9.1 Introduction
How should managers and researchers select an approach for designing an adaptive management study
and analyzing the results? The chapters in this report
provide some guidance; for example, Nemec (this
volume, Chap. 2), summarizes principles of experimental design, and Schwarz (this volume, Chap. 3)
lists types of nonexperimental and experimental designs. Other publications (e.g., Green 1979), while not
specific to adaptive management as defined in this
volume, also provide guidance on designing ecological studies. This chapter reviews issues to consider in
designing adaptive management studies, synthesizes
the methods discussed in preceding chapters of this
report, and summarizes the roles different types of
information can play in adaptive management.
Statistical approaches and study designs can be selected only after the management question has been
clearly articulated. In the first section of this chapter,
I review three types of monitoring, differentiated by
the types of question they each address, and then address how the spatial and temporal elements of a
management question can influence study design.
In the second section, I review the characteristics of
powerful studies and the principles of experimental
design. The third section summarizes various types of
information (including existing data, retrospective
studies, and nonexperimental studies) and experimental studies, and how they can contribute to
Issues of space
Five kinds of spatial effects can influence the design of a study as well as the interpretation
of its results.
1. What is the influence of on-site management
activities on off-site conditions? That is, local
management may influence remote conditions,
both directly and indirectly (Loehle 1990). An example is the downstream effect of stream
temperature or sedimentation on fish populations
due to local reduction, removal, or restoration of
riparian vegetation cover.
2. What is the relative influence of off-site management activities on on-site (desired) conditions?
On-site conditions can be influenced by other offsite activities. For example, despite protection of
old-growth forest groves, some arboreal lichens
might nonetheless decline because of degraded air
quality from industrial pollutants originating elsewhere in the airshed. The potential influence of
downstream dams and fish harvesting on the
abundance of local fish populations is another
example.
3. To what degree do local management activities influence the on-site (desired) conditions? That is,
to what extent do background noise and other environmental factors affect on-site conditions?
Local management may influence only a portion
of the total variation in local conditions. For example, providing local breeding habitat only
partially succeeds in conserving populations of
neotropical migratory birds, whose numbers may
still decline due to pesticide loads or habitat loss
encountered during wintering in the neotropics.
4. What is the relative influence of conditions and activities from different spatial scales, particularly the
effects on local stand-level conditions from broader landscape-level factors? That is, desired
conditions and management actions are best addressed at appropriate scales of geography. As
examples, effects of forest management on abundance of coarse woody debris are best assessed at
the stand level; effects of forest management on
vegetation conditions that affect visual quality or
goshawk (Accipiter gentilis) habitat are best assessed at the landscape level; and effects of overall
management policy and ownership patterns on
grizzly bear (Ursus arctos) populations are best
assessed at subregional or regional levels.
term. For example, annual non-monotonic variations in bird populations (both increases and
decreases) may belie truer long-term declines in some population counts (Thomas and Martin 1996).
3. What are the cumulative effects of a variable over
time? Some variables do not make a mark except
over time or until a particular threshold has been
exceeded. An example is the adverse effect of certain pesticides on wildlife reproduction. The
detrimental effect may not be apparent until the
pesticide concentrations reach a particular level of
toxicity (Tiebout and Brugger 1995).
The design of adaptive management studies and
selection of analysis methods are guided in part by
these considerations of space and time. For example,
replication is one major consideration in designing
studies. Given a large geographic area, as tends to be
the focus in ecosystem management, or a rare condition, such as a threatened species population, are
spatial replicates possible? That is, can landscapes or
threatened populations be replicated at all, or in
adequate numbers? If the conditions cannot be replicated, then pseudoreplication (e.g., dividing a single
area into smaller blocks) may be the only recourse
(Hurlbert 1984). Alternatively, other kinds of studies
(e.g., analytical surveys, expert testimony) might help
in assessing the impact of the treatments, although
they do not allow strong inference about cause.
Similarly, long response times and time lags make
temporal replication difficult. Retrospective studies
(see Smith, this volume, Chap. 4) provide one alternative for gaining insight into the long-term effects
of management actions. In cases where either spatial
or temporal replication is severely limited, a higher
probability of Type I and II errors might need to be
tolerated (see Anderson, this volume, Chap. 6).
In some cases, a powerful adaptive management
study may be possible but managers, decision-makers, industries, or other interested bodies may not be
willing to bear the cost, duration, and tight controls
on management activities. The consequences of not
using an optimum study must be explicitly considered and understood by all.
3. What is the relevance of the results? How representative is the study of other sites or conditions?
Some studies may reveal only local conditions and
the chance effects of unique site histories, rather
than overall effects, or they may pertain to only
one vegetation type or climatic condition. The
manager should know the contexts under which
results apply. For example, results of a forest thinning operation may apply to only a particular
initial stand density or forest type.
4. Were the effects truly a result of the management
activity? This question cuts to the heart of separating cause from noise, and determining what really
influenced the outcome. The experimental studies
that are central to adaptive management are designed to determine causality. Researchers and
managers should not assume that demonstration
of pattern and correlation constitutes valid evidence of causation.
9.3.3 Principles of experimental design
To help ensure success in evaluating management
actions, researchers should review adaptive management studies for the four main principles of
experimentation: randomization, replication, blocking, and representation (see Nemec, this volume,
Chap. 2). Randomization reduces bias. Replication
allows an estimation of variance, which is vital for
confirming observed differences. Blocking increases
precision and reduces cost and sample size. Representation helps to ensure study of the correct universe of
interest.
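These principles can be applied mechanically when laying out a study. A minimal sketch of a randomized complete block layout (block and treatment names are illustrative, not taken from this handbook): every treatment appears once within each block, and the order is randomized independently per block.

```python
import random

# Randomized complete block design: each treatment is assigned once per
# block, in an independently randomized order. Names are illustrative.

treatments = ["control", "250 stems/ha", "500 stems/ha", "750 stems/ha"]
blocks = ["block 1", "block 2", "block 3"]

def assign(blocks, treatments, seed=42):
    rng = random.Random(seed)   # fixed seed so the layout is reproducible
    layout = {}
    for block in blocks:
        order = treatments[:]   # copy, so the shared list is not reordered
        rng.shuffle(order)
        layout[block] = order   # plot j within this block receives order[j]
    return layout

for block, order in assign(blocks, treatments).items():
    print(block, order)
```

Blocking on site factors such as aspect or soil type, with randomization of treatments within each block, preserves the unbiasedness of the comparison while removing block-to-block variation from the error term.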
In the real world, these four principles cannot always be met and compromises are necessary. It is
often impossible to fully randomly allocate treatments, such as forest clearcuts or fire locations. In
such cases, study sites may be randomly selected
from existing clearcuts or fire locations, resulting in
nonexperimental studies (e.g., observational studies,
analytical surveys, retrospective studies, or impact
studies; see Schwarz, this volume, Chap. 3). When interpreting study results, researchers should account
for the site-specific characteristics leading to the initial nonrandom assignment of the treatment.
Furthermore, the researcher should recognize that
the altered study can no longer provide reliable
knowledge of cause, but only generates hypotheses
for validation when future management actions are
implemented.
When replication is not possible, suspected causal
effects can be masked by confounding hidden causes
1 Modifications addressed the need to adhere to the U.S. Federal Advisory Committee Act, by polling individual experts for basic ecological information and not reaching group consensus on specific management actions.
Anecdotes and expert judgement alone are not recommended for evaluating management actions because of their low reliability and unknown bias.
In the BC Forest Service, use of this source of information alone to evaluate management actions is not
considered adaptive management.
Retrospective studies
Sometimes the likely results of proposed management actions can be anticipated by measuring the
outcomes of similar actions taken in the past. Retrospective studies (evaluations of the outcomes of past actions) are
valuable for helping to predict the outcomes of future actions.
actions. These studies can provide some insights to
support or refute proposed hypotheses, and are particularly valuable for problems where some indicators take a long time to respond. However, because
the treatments might not have been randomly assigned, and the initial conditions and the details of
the treatments are often unknown, teasing out causal
factors may be challenging at best and misleading at
worst.
Nonexperimental (observational) studies
Nonexperimental studies (called observational
studies by some authors) are the most common kind
of field studies reported in wildlife journals. Like retrospective studies, nonexperimental studies are not
based on experimental manipulations. Although it
may be debatable whether nonexperimental studies
should entail hypothesis testing, they should
nonetheless meet statistical assumptions, including
adequacy of sample sizes and selection of study sites,
to ensure reliable results. Much can be learned from
taking advantage of existing conditions and unplanned disturbances (Carpenter 1990; Schwarz, this
volume, Chap. 3).
Nonexperimental studies usually entail analysis of
correlations among environmental and organism parameters, such as studying the correlations between
clearcutting and wildlife response. Causes are
inferred and corroborated through repeated observations under different conditions. Because results may
be confounded by uncontrolled (and unknown) factors, nonexperimental studies are best interpreted as
providing only insights to cause. These insights can
be valuable in predicting outcomes of actions, but
again, the veracity of such predictions and the effects
of management actions are best evaluated through
controlled experiments (McKinlay 1975, 1985). Of
2 Some authors suggest that Bayesian analyses also can be interpreted as the testing of null hypotheses, that is, the prior probabilities.
Figure. Causes and correlates: four examples. In all figures, S = wildlife species response; ? = unexplained variation due
to measurement error, experimental error, or effects of other environmental or species factors; solid arrows =
causal relations; dotted arrows = correlational relations that may or may not be causal. (a) In this simplest case,
some wildlife species response S, such as population presence or abundance, is assumed to be explained and
caused by some environmental factor E. (b) In a more complex case, we may be measuring one environmental
factor E1 when the real cause is another environmental factor E2. (c) Getting closer to the real world, a second
species response S2 may be part of the cause. (d) Most like the real world, with feedback relations among the
dependent (response) variables S. (Adapted from Morrison et al. 1998, Fig. 10.2.)
courses of action: selecting correct indicators, merging disparate lines of evidence, and using statistical
procedures that take advantage of prior knowledge or
that function adequately with small sample sizes.
Selecting correct indicators
Select indicators that are objective, repeatable measurements whose quality is documented
quantitatively. For adaptive management studies,
an indicator should (1) respond rapidly to changes,
(2) signal changes in other variables of interest, (3) be
monitored efficiently, and (4) be causally linked to
tions, the data from several studies might be combined into an overall regression. This regression
might suggest a significant correlation between
clearcutting and grizzly bear populations. However,
grizzly bears within individual study areas might respond differently to clearcutting because they come
from different geographic areas, latitudes, or forest
types. Thus the correlation may reflect these differences between populations, rather than any
treatment effect. The incorrect conclusion of correlation would arise because such an analysis violates an
assumption underlying regression: that the data
come from the same statistical population with the
same causal mechanisms. On the other hand, a formal meta-analysis approach would analyze results
from each study with differences among studies as an
explanatory factor. CI has great utility, especially
where powerful experimental studies are difficult.
However, managers and researchers must be careful
in its use, ensuring that studies are truly from the
same causal web.
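The pooling problem described above is easy to demonstrate numerically. In this sketch (entirely artificial data), each study shows a slope of exactly zero, yet the pooled regression yields a strong negative slope simply because the two studies differ in both their x-ranges and their mean responses:

```python
# Two hypothetical studies: within each, the response is flat (slope 0),
# but the studies differ in level and in x-range, so pooling them
# manufactures a spurious negative slope.

def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

study_a = ([1.0, 2.0, 3.0], [10.0, 10.0, 10.0])   # no within-study trend
study_b = ([6.0, 7.0, 8.0], [5.0, 5.0, 5.0])      # no within-study trend

pooled_x = study_a[0] + study_b[0]
pooled_y = study_a[1] + study_b[1]

print(slope(*study_a))                        # 0.0
print(slope(*study_b))                        # 0.0
print(round(slope(pooled_x, pooled_y), 3))    # -0.904
```

A meta-analysis that includes "study" as an explanatory factor would correctly report no within-study effect, whereas the naive pooled regression reports a strong apparent relationship that is purely a between-study difference.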
Table. Stages of an adaptive management (AM) project and sources of information appropriate for each stage. This
table can be used by managers as a decision tree to guide (1) the choice of study for each AM stage (reading across
rows), and (2) the use of existing information (reading down columns). In the original table, each cell rated a
source as recommended, not recommended, most recommended for that stage, not applicable, or suitable for a pilot
study only. See Chapter 1 for full descriptions of AM stages.

AM stages (rows):
1. Assess problem: identify patterns and trends; identify correlates; identify potential causes of suspected impact;a identify potential impacts of management actions and the potential reasons for them.
2. Design project: determine treatments to implement, sample size, effect size, power, etc.
3. Implement
4. Monitor
5. Evaluate (interpretation)c

Sources of information (columns): literature review, expert judgement, anecdote, retrospective study, nonexperimental study,b experimental study,b demonstration.

[The cell-by-cell ratings could not be recovered from the source.]

a Experimental and nonexperimental studies can provide information on patterns, correlates, etc., but typically these studies will not be done by taking advantage of management actions, but rather as part of applied research.
b All else being equal, if the cost of conducting a nonexperimental study is significantly less than that of an experimental study, choose the former.
c In the Evaluation stage, existing information based on literature, expert judgement, and retrospective analysis is updated using data collected from the management experiment to assess the effect or outcome of an action. It can also be used to determine the relative plausibility of suspected causes, and to estimate the prior probabilities in a Bayesian analysis.
Acknowledgements
My thanks to Brian Nyberg and Brenda Taylor for
suggesting topics to cover in this chapter and for
their technical reviews of the manuscript. Roger
Green, Rick Page, Martin Raphael, and Ian Thompson also provided technical reviews. Thanks to Vera
Sit and Brenda Taylor for their fine edits. My gratitude to Tim Max for his technical review and for
thoughtful discussions on the role of statistics in
adaptive management.
3 As expressed by one statistician, if predictive models are so complex that they become essentially untestable, then they are nothing more
than belief structures and their relation to science is questionable at best (T. Max, pers. comm., 1997).
References
Anderson, J.L. [n.d.]. Errors of inference. This volume.
Block, W.M., L.A. Brennan, and R.J. Gutiérrez. 1987. Evaluation of guild-indicator species for use in resource management. Environ. Manage. 11:265–269.
Carpenter, S.R. 1990. Large-scale perturbations: opportunities for innovation. Ecology 71:2038–2043.
Draper, D., D.P. Gaver, Jr., P.K. Goel, J.B. Greenhouse, L.V. Hedges, C.N. Morris, J.R. Tucker, and C.M. Waternaux. 1992. Combining information: statistical issues and opportunities for research. National Academy Press, Washington, D.C. Contemporary Statistics No. 1.
Gazey, W.J. and M.J. Staley. 1986. Population estimation from mark-recapture experiments using a sequential Bayes algorithm. Ecology 67:941–951.
Geiser, L.H., C.C. Derr, and K.L. Dillman. 1994. Air quality monitoring on the Tongass National Forest: methods and baselines using lichens. U.S. Dep. Agric. For. Serv., Alaska Region, Petersburg, Alaska. R10-TB-46.
Landres, P.B., J. Verner, and J.W. Thomas. 1988. Ecological uses of vertebrate indicator species: a critique. Cons. Biol. 2:316–328.
Ligon, J.D. and P.B. Stacey. 1996. Land use, lag times and the detection of demographic change: the case of the acorn woodpecker. Cons. Biol. 10:840–846.
Link, W.A. and D.C. Hahn. 1996. Empirical Bayes estimation of proportions with application to cowbird parasitism rates. Ecology 77:2528–2537.
Loehle, C. 1990. Indirect effects: a critique and alternate methods. Ecology 71:2382–2386.
McKinlay, S.M. 1975. Comprehensive review on design and analysis of observational studies. J. Am. Statist. Assoc. 70:503–520.
______. 1985. Observational studies. In Encyclopedia of statistical sciences. Vol. 6. J. Wiley, New York, N.Y., pp. 397–401.
Peterman, R.M. and C. Peters. [n.d.]. Decision analysis: taking uncertainties into account in forest resource management. This volume.
Richey, J.S., R.R. Horner, and B.W. Mar. 1985. The Delphi technique in environmental assessment. II. Consensus on critical issues in environmental monitoring program design. J. Environ. Manage. 21:147–159.
Richey, J.S., B.W. Mar, and R.R. Horner. 1985. The Delphi technique in environmental assessment. I. Implementation and effectiveness. J. Environ. Manage. 21:135–146.
Routledge, R.D. [n.d.]. Measurements and estimates. This volume.
Schemske, D.W. and C.C. Horvitz. 1988. Plant-animal interactions and fruit production in a neotropical herb: a path analysis. Ecology 69:1128–1138.
Schuster, E.G., S.S. Frissell, E.E. Baker, and R.S. Loveless. 1985. The Delphi method: application to elk habitat quality. U.S. Dep. Agric. For. Serv. Res. Pap. INT-353.
Schwarz, C.J. [n.d.]. Studies of uncontrolled events. This volume.
Smith, G.J. [n.d.]. Retrospective studies. This volume.
Stolte, K., D. Mangis, R. Doty, and K. Tonnessen. 1993. Lichens as bioindicators of air quality. U.S. Dep. Agric. For. Serv., Rocky Mtn. For. Range Exp. Sta., Fort Collins, Colo. Gen. Tech. Rep. RM-224.
Thomas, L. and K. Martin. 1996. The importance of analysis method for breeding bird survey population trend estimates. Cons. Biol. 10:479–490.
Tibell, L. 1992. Crustose lichens as indicators of forest continuity in boreal coniferous forests. Nord. J. Bot. 12:427–450.
Tiebout, H.M. III and K.E. Brugger. 1995. Ecological risk assessment of pesticides for terrestrial vertebrates: evaluation and application of the U.S. Environmental Protection Agency's quotient model. Cons. Biol. 9:1605–1618.
GLOSSARY
A posteriori: Referred to after the data have been collected and examined.
A priori: Referred to before the data are collected and
examined.
Accuracy: The nearness of a measurement to the
actual value of the variable being measured.
Active adaptive management: Management is designed as an experiment to compare alternative
actions (treatments) or discriminate among alternative hypotheses about how the system responds
to actions. Active adaptive management can involve deliberate probing of the system to
identify thresholds in response and clarify the
shape of the functional relationship between actions and response variables.
Alternative hypothesis: A claim or research hypothesis that is compared with another (usually null)
hypothesis.
Analysis of variance (ANOVA): A group of statistical procedures for analyzing continuous data
sampled from two or more populations, or from
experiments in which two or more treatments are
used. ANOVA procedures partition the variation
observable in a response variable into two basic
components: (1) variation due to assignable causes
and (2) uncontrolled or random variation. Assignable causes refer to known or suspected sources of
variation from variates that are controlled (experimental factors) or measured (covariates) during
an experiment. Random variation includes the effects of all other sources not controlled or
measured during the experiment.
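The partition of variation described above can be illustrated with a small one-way example; the treatment labels and response values are hypothetical:

```python
# One-way ANOVA partition of variation, as described in the entry above:
# total variation = variation due to the assignable cause (treatments)
#                 + uncontrolled (within-group) variation.
groups = {
    "control":  [4.1, 3.8, 4.4, 4.0],
    "thinning": [5.2, 5.6, 4.9, 5.4],
    "burning":  [4.6, 4.9, 4.3, 4.7],
}

all_obs = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)

# Between-group (treatment) sum of squares.
ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                 for ys in groups.values())
# Within-group (random) sum of squares.
ss_within = sum((y - sum(ys) / len(ys)) ** 2
                for ys in groups.values() for y in ys)
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

# The two components account for all of the total variation.
assert abs(ss_total - (ss_between + ss_within)) < 1e-9

df_between = len(groups) - 1
df_within = len(all_obs) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) df")
```

The F statistic is simply the ratio of the two mean squares: large values indicate that the assignable cause explains more variation than uncontrolled variation alone would produce.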
Analytical survey: A type of nonexperimental study
where groups sampled from a population of units
are compared.
Autocorrelation: The situation in which consecutive measurements in a series are not independent of one
another. Also called serial correlation.
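A minimal sketch of how serial correlation is estimated, using the lag-1 sample autocorrelation; the series of annual counts below is hypothetical:

```python
# Lag-1 (serial) autocorrelation of a series of consecutive measurements,
# estimated as the correlation between the series and itself shifted by
# one time step.
def lag1_autocorr(series):
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean)
              for t in range(n - 1))
    den = sum((y - mean) ** 2 for y in series)
    return num / den

# Hypothetical annual counts with an upward trend: consecutive values
# resemble one another, so the lag-1 autocorrelation is strongly positive.
counts = [12, 14, 15, 17, 16, 18, 21, 22, 21, 24]
print(f"lag-1 autocorrelation: {lag1_autocorr(counts):.2f}")
```

A series that alternates high and low instead yields a negative value, while an independent series yields a value near zero.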
Bayes decision: The optimal decision identified
when uncertainties are considered using a formal
decision analysis.
Bias: The deviation of a statistical estimate from the
quantity it estimates. Bias can be a systematic
error introduced into sampling or testing. Positive
bias will overestimate the parameter; negative bias
will underestimate it.
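A classic example of negative bias is the variance estimator that divides by n rather than n - 1. A small simulation (all settings hypothetical) shows the systematic underestimation:

```python
import random

# Illustration of negative bias: the variance estimator that divides by n
# systematically underestimates the true variance, while dividing by
# n - 1 is unbiased. Settings are hypothetical.
random.seed(42)
true_var = 4.0          # variance of the population being sampled
n, trials = 5, 20000    # small samples, many repetitions

biased_est, unbiased_est = [], []
for _ in range(trials):
    sample = [random.gauss(0, true_var ** 0.5) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    biased_est.append(ss / n)          # divides by n: negatively biased
    unbiased_est.append(ss / (n - 1))  # divides by n - 1: unbiased

print(f"true variance:              {true_var:.2f}")
print(f"mean of biased estimates:   {sum(biased_est) / trials:.2f}")
print(f"mean of unbiased estimates: {sum(unbiased_est) / trials:.2f}")
```

Averaged over many repeated samples, the biased estimator falls short of the true variance by the factor (n - 1)/n, here about 20%, while the unbiased estimator centres on the true value.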
Experimental unit: The entity to which one treatment (level of one or more factors) is applied.
Also called a treatment unit.
Homogeneous: Experimental units are homogeneous when they do not differ from one another
in any systematic fashion and are as alike as possible on all characteristics that might affect the
response.
Hypothesis: A tentative, testable assumption adopted to account for certain facts.
Hypothesis testing: A type of statistical inference for
assessing the validity of a hypothesis by determining whether it is consistent with the sample data.
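One simple way to carry out such a test without distributional assumptions is a permutation test, sketched below on hypothetical data: the observed difference between group means is compared with the differences obtained when the group labels are shuffled at random.

```python
import random

# Permutation test of the (null) hypothesis of no treatment effect:
# if the labels are arbitrary, shuffling them should often produce a
# difference as large as the one observed. Data are hypothetical.
random.seed(1)
treated = [5.2, 5.6, 4.9, 5.4, 5.1]
control = [4.1, 3.8, 4.4, 4.0, 4.3]

observed = sum(treated) / len(treated) - sum(control) / len(control)

pooled = treated + control
n_perm, extreme = 10000, 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = (sum(pooled[:len(treated)]) / len(treated)
            - sum(pooled[len(treated):]) / len(control))
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / n_perm
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
```

A small p-value means the sample data are inconsistent with the null hypothesis: shuffled labels almost never reproduce a difference as large as the one observed.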
Impact survey: A type of nonexperimental study
where one site affected by some planned or unplanned event is compared with one or more unaffected (reference) sites.
Prospective study: A study where actions (treatments) have not yet been applied, and data have
not yet been collected. Prospective studies may be
either experimental or nonexperimental. Contrast
with retrospective study.
Pseudoreplication: Refers to various violations of the
assumption that replicated treatments are independent. A common form of pseudoreplication
occurs when multiple subsamples from one treatment unit are treated as if they were independent replicates.
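The consequence of pseudoreplication, overstated precision, can be shown with a small simulation (all settings hypothetical): the standard error computed by treating subsamples as replicates is far smaller than the one computed from the true replicates, the treatment-unit means.

```python
import random
from statistics import mean, stdev

# Why pseudoreplication is a problem: subsamples within one treatment
# unit share that unit's effect, so treating them as independent
# replicates understates the uncertainty of the treatment mean.
random.seed(7)
n_units, n_sub = 4, 10       # 4 treatment units, 10 subsamples each
unit_sd, sub_sd = 2.0, 0.5   # between-unit and within-unit variation

units = []
for _ in range(n_units):
    unit_effect = random.gauss(0, unit_sd)  # shared by all subsamples
    units.append([random.gauss(unit_effect, sub_sd) for _ in range(n_sub)])

# Pseudoreplicated analysis: all 40 subsamples treated as replicates.
all_obs = [y for unit in units for y in unit]
se_naive = stdev(all_obs) / len(all_obs) ** 0.5

# Correct analysis: the treatment unit, not the subsample, is the replicate.
unit_means = [mean(unit) for unit in units]
se_correct = stdev(unit_means) / n_units ** 0.5

print(f"SE treating subsamples as replicates: {se_naive:.2f}")
print(f"SE based on treatment-unit means:     {se_correct:.2f}")
```

Because the between-unit variation dominates here, the pseudoreplicated standard error is several times too small, making treatment effects look far more precisely estimated than they really are.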