Stochastic Simulation and Power Analysis: ©2006 Ben Bolker August 3, 2007



Summary
This chapter introduces techniques and ideas related to simulating ecological patterns. Its main goals are: (1) to show you how to generate patterns you can use to sharpen your intuition and test your estimation tools; and (2) to introduce statistical power and related concepts, and show you how to estimate statistical power by simulation. This chapter and the supplements will also give you more practice working with R.

Introduction

Chapters ?? and ?? gave a basic overview of functions to describe deterministic patterns and probability distributions to describe stochastic patterns. This chapter will show you how to use stochastic simulation to understand and test your data. Simulation is sometimes called forward modeling, to emphasize that you pick a model and parameters and work forward to predict patterns in the data. Parameter estimation, or inverse modeling (the main focus of this book), starts from the data and works backward to choose a model and estimate parameters.

Ecologists often use simulation to explore the patterns that emerge from ecological models. Often they use theoretical models without accompanying data, in order to understand qualitative patterns and plan future studies. But even if you have data, you might want to start by simulating your system. You can use simulations to explore the functions and distributions you chose to quantify your data. If you can choose parameters that make the simulated output from those functions and distributions approximate your data, you can confirm that the models are reasonable and simultaneously find a rough estimate of the parameters.

You can also use simulated data from your system to test your estimation procedures. Chapters 6 to 8 will show you how to estimate parameters; in this chapter I'll work with more canned procedures like nonlinear regression. Since you never know the true answer to an ecological question (you only have imperfect measurements with which you're trying to get as close to the answer as possible), simulation is the only way to test whether you can correctly estimate the parameters of an ecological system. It's always good to test such a best-case scenario, where you know that the functions and distributions you're using are correct, before you proceed to real data.

Power analysis is a specific kind of simulation testing where you explore how large a sample size you would need to get a reasonably precise estimate of your parameters. You can also use power analysis to explore how variations in experimental design would change your ability to answer ecological questions.
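The forward/inverse idea described above can be sketched in a few lines of R. This is only an illustration under assumed values: the parameter values, the seed, and the variable names (true_a, true_b, true_sd) are made up here, and lm stands in for whatever estimation procedure you want to test.

```r
# Forward model: simulate data from known (assumed) parameters.
set.seed(101)                      # arbitrary seed, for reproducibility
true_a <- 2                        # intercept (illustrative value)
true_b <- 1                        # slope (illustrative value)
true_sd <- 2                       # error standard deviation (illustrative)
x <- 1:20
y <- rnorm(length(x), mean = true_a + true_b * x, sd = true_sd)

# Inverse model: estimate the parameters back from the simulated data.
fit <- lm(y ~ x)
est <- coef(fit)                   # estimated intercept and slope
ci <- confint(fit)                 # 95% confidence intervals

# Compare the estimates with the known truth.
print(rbind(true = c(true_a, true_b), estimate = est))
print(ci)
```

Because the true values are known, any systematic gap between the two rows points to a problem with the estimation procedure rather than with the data.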

Stochastic simulation

Static ecological processes, where the data represent a snapshot of some ecological system, are easy to simulate. For static data, we can use a single function to simulate the deterministic process and then add heterogeneity. Often, however, we will chain together several different mathematical functions and probability distributions representing different stages in an ecological process to produce surprisingly complex and rich descriptions of ecological systems. I'll start with three simple examples that illustrate the general procedure, and then move on to two slightly more in-depth examples.

2.1 Simple examples

2.1.1 Single groups

Figure 1 shows the results of two simple simulations, each with a single group and single continuous covariate. The first simulation (Figure 1a) is a linear model with normally distributed errors. It might represent productivity as a function of nitrogen concentration, or predation risk as a function of predator density. The mathematical formula is $Y \sim \mathrm{Normal}(a + bx, \sigma^2)$, specifying that $Y$ is a random variable drawn from a normal distribution with mean $a + bx$ and variance $\sigma^2$. The symbol $\sim$ means "is distributed according to". This model can also be written as $y_i = a + bx_i + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$, specifying that the $i$th value of $Y$, $y_i$, is equal to $a + bx_i$ plus a normally distributed error term with mean zero. I will always use the first form because it is more general: normally distributed error is one of the few kinds that can simply be added onto the deterministic model in this way. The two lines on the plot show both the theoretical relationship between y and x and the best-fit line (by linear regression, lm(y~x); Section ??). The lines differ slightly because of the randomness incorporated in the simulation.

A few lines of R code will run this simulation. Set up the values of x, and specify values for the parameters a and b:

> x = 1:20
> a = 2
> b = 1
Footnote (on static processes being easy to simulate): Dynamic processes are more challenging. See Chapter ??.

Figure 1: Two simple simulations: a linear function with normal errors ($Y \sim \mathrm{Normal}(a + bx, \sigma^2)$), and a hyperbolic function with negative binomial errors ($Y \sim \mathrm{NegBin}(\mu = ab/(b + x), k)$). Each panel plots y against x and shows the true relationship and the best-fit line.

Calculate the deterministic part of the model:

> y_det = a + b * x

Pick 20 random normal deviates with the mean equal to the deterministic equation and $\sigma = 2$:

> y = rnorm(20, mean = y_det, sd = 2)

(You could also specify this as y = y_det+rnorm(20,sd=2), corresponding to the additive model $y_i = a + bx_i + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$; the mean parameter is zero by default. However, the additive form works only for the Normal, and not for most of the other distributions we will be using.)

The second simulation uses a hyperbolic function ($y = ab/(b + x)$) with negative binomial error: in symbols, $Y \sim \mathrm{NegBin}(\mu = ab/(b + x), k)$. The function is parameterized so that $a$ is the intercept term (when $x = 0$, $y = ab/b = a$). This simulation might represent the decreasing fecundity of a species with increasing population density: the hyperbolic function is a natural expression of the decreasing quantity of a limiting resource per individual. In this case, we cannot express the model as the deterministic function plus error. Instead, we have to incorporate the deterministic model as a control on one of the parameters of the error distribution, in this case the mean $\mu$. (Although the negative binomial is a discrete distribution, its parameters $\mu$ and $k$ are continuous.) Ecological models typically describe the differences in the mean among groups or as covariates change, but we could also allow the variance or the shape of the distribution to change.

The R code for this simulation is easy, too. Define parameters:

> a = 20
> b = 1
> k = 5

How you simulate the x values depends on the experimental design you are trying to simulate. In this case, we choose 50 x values randomly distributed between 0 and 5 to simulate a study where the samples are chosen from naturally varying sites, in contrast to the previous simulation where x varied systematically (x=1:20), simulating an experimental or observational study that samples from a gradient in the predictor variable x.

> x = runif(50, min = 0, max = 5)

Now we calculate the deterministic mean y_det, and then sample negative binomial values with the appropriate mean and overdispersion:

> y_det = a * b/(b + x)
> y = rnbinom(50, mu = y_det, size = k)

2.1.2 Multiple groups

Ecological studies typically compare the properties of organisms in different groups (e.g. control and treatment, parasitized and unparasitized, high and low altitude). Figure 2 shows a simulation that extends the hyperbolic simulation above to compare the effects of a continuous covariate in two different groups (species in this case). Both groups have the same overdispersion parameter $k$, but the hyperbolic parameters $a$ and $b$ differ:

$$Y \sim \mathrm{NegBin}(\mu = a_i b_i/(b_i + x), k) \tag{1}$$

where $i$ is 1 or 2 depending on the species of an individual.

Suppose we still have 50 individuals, but the first 25 are species 1 and the second 25 are species 2. We use rep to set up a factor that describes the group structure (the R command gl is also useful for more complicated group assignments):

> g = factor(rep(1:2, each = 25))

Define vectors of parameters, each with one element per species, or a single parameter for k since the species are equivalent in this case:

> a = c(20, 10)
> b = c(1, 2)
> k = 5

Figure 2: Simulation results from a hyperbolic/negative binomial model with groups differing in both intercept and slope: $Y \sim \mathrm{NegBin}(\mu = a_i b_i/(b_i + x), k)$. Parameters: $a = \{20, 10\}$, $b = \{1, 2\}$, $k = 5$. The plot shows y against x, with the data, the true curve, and the best-fit curve for each species.

R's vectorization makes it easy to incorporate different parameters for different species into the formula, by using the group vector g to specify which element of the parameter vectors to use for any particular individual.

> y_det = a[g] * b[g]/(b[g] + x)
> y = rnbinom(50, mu = y_det, size = k)
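Putting the pieces of this section together, a self-contained sketch of the two-group simulation might look like the following. It follows Equation 1 for the mean; the seed, the n_per_group variable, and the data frame name sim are additions for illustration only.

```r
set.seed(102)                           # arbitrary seed, for reproducibility
n_per_group <- 25
g <- factor(rep(1:2, each = n_per_group))
a <- c(20, 10)                          # intercepts, one per species
b <- c(1, 2)                            # hyperbolic scale parameters, one per species
k <- 5                                  # shared overdispersion parameter
x <- runif(2 * n_per_group, min = 0, max = 5)

# Indexing the parameter vectors by the factor g gives one value per individual.
mu <- a[g] * b[g] / (b[g] + x)
y <- rnbinom(length(mu), mu = mu, size = k)

sim <- data.frame(x = x, g = g, mu = mu, y = y)
print(head(sim))
print(tapply(sim$y, sim$g, mean))       # average simulated count per species
```

Keeping the deterministic mean (mu) alongside the simulated counts makes it easy to plot the true curve next to the data later.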

2.2 Intermediate examples

2.2.1 Reef fish settlement

The damselfish settlement data from Schmitt et al. (1999) (p. ??) include random variation in settlement density (the density of larvae arriving on a given anemone) and random variation in density-dependent recruitment (number of settlers surviving for 6 months on an anemone). To simulate the variation in settlement density I took random draws from a zero-inflated negative binomial (p. ??), although a non-inflated binomial, or even a geometric distribution (i.e. a negative binomial with k = 1) might be sufficient to describe the data.

Schmitt et al. modeled density-dependent recruitment with a Beverton-Holt curve (equivalent to the Michaelis-Menten function). I have simulated this curve with binomial error (for survival of recruits) superimposed. The model is

$$R \sim \mathrm{Binom}(N = S, p = a/(1 + (a/b)S)). \tag{2}$$

(With the recruitment probability per settler $p$ given as the hyperbolic function $a/(1 + (a/b)S)$, the mean number of recruits is Beverton-Holt: $Np = aS/(1 + (a/b)S)$.) The settlement density $S$ is drawn from the zero-inflated negative binomial distribution shown in Figure 3a.

Figure 3: Damselfish recruitment: (a) distribution of settlers (frequency against number of settlers); (b) recruitment as a function of settlement density (recruits against settlers).

Set up the parameters, including the number of samples (N):

> N = 603
> a = 0.696
> b = 9.79
> mu = 25.32
> zprob = 0.123
> k = 0.932

Define a function for the recruitment probability:

> recrprob = function(S) {
+     a/(1 + (a/b) * S)
+ }

Now simulate the number of settlers and the number of recruits, using rzinbinom from the emdbook package:

> library(emdbook)
> settlers = rzinbinom(N, mu = mu, size = k, zprob = zprob)
> recr = rbinom(N, prob = recrprob(settlers), size = settlers)

2.2.2 Pigweed distribution and fecundity

Figure 4: Pigweed simulations. (a) Spatial pattern (Poisson cluster process). (b) Distribution of number of neighbors within 2 m (proportion against number of neighbors). (c) End-of-year biomass (g), based on a hyperbolic function of crowding index with a gamma error distribution, plotted against competition index. (d) Seed set, proportional to biomass with a negative binomial error distribution, plotted as 1 + seed set against mass.

Pacala and Silander (1990) did a series of experiments quantifying the strength and spatial scale of competition between the annual weeds velvetleaf (Abutilon theophrasti) and pigweed (Amaranthus retroflexus). They were interested in neighborhood competition among nearby plants. Local dispersal of seeds changes the distribution of the number of neighbors per plant. If plants were randomly distributed we would expect a Poisson distribution of neighbors within a given distance, but if seeds have a limited dispersal range so that plants are spatially aggregated, we expect a distribution with higher variance (and a higher mean number of neighbors for a given overall plant density) such as the negative binomial. Neighbors increase local competition for nutrients, which in turn decreases plants' growth rate, their biomass at the end of the growing season, and their fecundity (seed set). Thus differences in dispersal and spatial patterning within and among species can in theory change competitive outcomes (Bolker et al., 2003), although Pacala and Silander found that spatial structure had little effect in their system.

To explore the patterns of competition driven by local dispersal and crowding, we can simulate this spatial competitive process. Let's start by simulating a spatial distribution of plants in an $L \times L$ plot ($L = 30$ m below). We'll use a Poisson cluster process, where mothers are located randomly in space at points $\{x_p, y_p\}$ (called a Poisson process in spatial ecology), and their children are distributed nearby (only the children, and not the mothers, are included in the final pattern). The simulation includes N = 50 parents, for which we pick 50 x and 50 y values, each uniformly distributed between 0 and L. The distance of each child from its parent is exponentially distributed with rate $= 1/d$ (mean dispersal distance $d$), and the direction is random, that is, uniformly distributed between 0 and $2\pi$ radians. I use a little bit of trigonometry to calculate the offspring locations (Figure 4a). The formal mathematical definition of the model for offspring location is:

$$
\begin{aligned}
\text{parent locations:} \quad & x_p, y_p \sim U(0, L) \\
\text{distance from parent:} \quad & r \sim \mathrm{Exp}(1/d) \\
\text{dispersal angle:} \quad & \theta \sim U(0, 2\pi) \\
\text{offspring x:} \quad & x_c = x_p + r \cos\theta \\
\text{offspring y:} \quad & y_c = y_p + r \sin\theta
\end{aligned}
$$

Footnote (on radians): R, like most computer languages, works in radians rather than degrees; to convert from degrees to radians, multiply by $2\pi/360$. Since R doesn't understand Greek letters, use pi to denote $\pi$: radians=degrees*2*pi/360.

In R, set up the parameters:

> set.seed(1001)
> L = 30
> nparents = 50
> offspr_per_parent = 10
> noffspr = nparents * offspr_per_parent
> dispdist = 2

Pick locations for the parents:

> parent_x = runif(nparents, min = 0, max = L)
> parent_y = runif(nparents, min = 0, max = L)

Pick angles and distances for dispersal:

> angle = runif(noffspr, min = 0, max = 2 * pi)
> dist = rexp(noffspr, 1/dispdist)

Add the offspring displacements to the parent coordinates (using rep(...,each=offspr_per_parent)):

> offspr_x = rep(parent_x, each = offspr_per_parent) +
+     cos(angle) * dist
> offspr_y = rep(parent_y, each = offspr_per_parent) +
+     sin(angle) * dist

If you wanted to allow different numbers of offspring for each parent (for example, drawn from a Poisson distribution) you could use offspr_per_parent=rpois(nparents,lambda) and then rep(..., times=offspr_per_parent). Instead of specifying that each parent's coordinates should be repeated the same number of times, you would be telling R to repeat each parent's coordinates according to its number of offspring.

Next we calculate the neighborhood density, or the number of individuals within 2 m of each plant (not counting itself). Figure 4(b) shows this distribution, along with a fitted negative binomial distribution. This calculation reduces the spatial pattern to a simpler non-spatial distribution of crowding.

> pos <- cbind(offspr_x, offspr_y)
> ndist <- as.matrix(dist(pos, upper = TRUE, diag = TRUE))
> nbrcrowd = apply(ndist < 2, 1, sum) - 1

Next we use a relationship that Pacala and Silander found between end-of-year mass ($M$) and competition index ($C$). They fitted this relationship based on a competition index estimated as a function of the neighborhood density of conspecific (pigweed) and heterospecific (velvetleaf) competitors, $C = 1 + c_{pp} n_p + c_{vp} n_v$. For this example, I simply made up a proportionality constant to match the observed range of competition indices. Pacala and Silander found that biomass $M \sim \mathrm{Gamma}(\text{shape} = m/(1 + C), \text{scale} = \alpha)$, with $m = 2.3$ and $\alpha = 0.49$.

> ci = nbrcrowd * 3
> M = 2.3
> alpha = 0.49
> mass_det = M/(1 + ci)
> mass = rgamma(length(mass_det), scale = mass_det,
+     shape = alpha)

Finally, we simulate seed set as a function of biomass, again using a relationship estimated by Pacala and Silander. Seed set is proportional to mass, with negative binomial errors: $S \sim \mathrm{NegBin}(\mu = bM, k)$, with $b = 271.6$, $k = 0.569$.

> b = 271.6
> k = 0.569
> seed_det = b * mass
> seed = rnbinom(length(seed_det), mu = seed_det, size = k)

Figures 4c and 4d show mass and (1 + seed set) on a logarithmic scale, along with dashed lines showing the 95% confidence limits of the theoretical distribution.
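As a variation on the spatial step of this example, the Poisson cluster process with a different number of offspring per parent can be sketched as follows. The mean number of offspring (lambda = 10) and the seed are assumptions chosen for illustration; the other values repeat those used in the text.

```r
set.seed(103)                      # arbitrary seed, for reproducibility
L <- 30                            # plot side length (m)
nparents <- 50
dispdist <- 2                      # mean dispersal distance (m)
lambda <- 10                       # assumed mean number of offspring per parent

parent_x <- runif(nparents, min = 0, max = L)
parent_y <- runif(nparents, min = 0, max = L)

# Poisson-distributed offspring counts; some parents may have no offspring.
offspr_per_parent <- rpois(nparents, lambda)
noffspr <- sum(offspr_per_parent)

angle <- runif(noffspr, min = 0, max = 2 * pi)
r <- rexp(noffspr, rate = 1 / dispdist)

# times= repeats each parent's coordinates according to its own offspring count.
offspr_x <- rep(parent_x, times = offspr_per_parent) + r * cos(angle)
offspr_y <- rep(parent_y, times = offspr_per_parent) + r * sin(angle)
```

The total number of offspring is now random, so downstream vectors should be sized from noffspr rather than from a fixed product.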

The idea behind realistic static models is that they can link together simple deterministic and stochastic models of each process in a chain of ecological processes, in this case from spatial distribution to neighborhood crowding to biomass to seed set. (Pacala and Silander actually went a step further and computed the density-dependent survival probability. We could simulate this using a standard model like survival $\sim \mathrm{Binom}(N = 1, p = \mathrm{logistic}(a + bC))$, where the logistic function allows the survival probability to be an increasing function of competition index without letting it ever go above 1.) Thus, although it's hard to write down a simple function or distribution that describes the relationship between competition index and the number surviving, as shown here we can break the relationship down into stages in the ecological process and use a simple model for each stage.
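The survival step mentioned in the parenthesis above could be sketched like this. The coefficient values (surv_a, surv_b) and the example competition indices are invented for illustration and do not come from Pacala and Silander; the slope is set negative here so that survival declines as crowding rises, and flipping its sign gives an increasing curve.

```r
set.seed(104)                      # arbitrary seed, for reproducibility
ci <- c(0, 3, 6, 15, 30)           # example competition indices (made up)
surv_a <- 2                        # hypothetical intercept on the logit scale
surv_b <- -0.2                     # hypothetical slope on the logit scale

# plogis() is the logistic function, so p_surv always lies strictly between 0 and 1.
p_surv <- plogis(surv_a + surv_b * ci)

# One Bernoulli trial per plant: 1 = survived, 0 = died.
survived <- rbinom(length(ci), size = 1, prob = p_surv)

print(data.frame(ci = ci, p_surv = round(p_surv, 3), survived = survived))
```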

Power analysis

Power analysis in the narrow sense means figuring out the (frequentist) statistical power, the probability of rejecting the null hypothesis when it is false (Figure 5). Power analysis is important, but the narrow frequentist definition suffers from some of the problems that we are trying to move beyond by learning new statistical methods, such as a focus on p values and on the truth of a particular null hypothesis. Thinking about power analysis even in this narrow sense is already a vast improvement on the naive and erroneous "the null hypothesis is false if p < 0.05 and true if p > 0.05" approach. However, we should really be considering a much broader question: How do the quality and quantity of my data and the true properties (parameters) of my ecological system affect the quality of the answers to my questions about ecological systems?

For any real experiment or observation situation, we don't know what is really going on (the true model or parameters), so we don't have the information required to answer these questions from the data alone. But we can approach them by analysis or simulation. Historically, questions about statistical power could only be answered by sophisticated analyses, and only for standard statistical models and experimental designs such as one-way ANOVA or linear regression. Increases in computing power have extended power analyses to many new areas, and R's capability to run many repeated stochastic simulations is a great help. Paradoxically, the mathematical difficulty of deriving power formulas is a great equalizer: since even research statisticians typically use simulations to estimate power, it's now possible (by learning simulation, which is easier than learning advanced mathematical statistics) to work on an equal footing with even cutting-edge researchers.

The first part of the rather vague (but common-sense) question above is about quantity and quality of data and the true properties of the ecological system. These properties include:
- Number of data points (number of observations/sampling intensity)
- Distribution of data (experimental design):
  - Number of observations per site, number of sites
  - Temporal and spatial extent (distance between the farthest samples, controlling the largest scale you can measure) and grain (distance between the closest samples, controlling the smallest scale you can measure)
  - Even or clustered distribution in space and/or time. Blocking.
  - Balance (i.e., equal or similar numbers of observations in each treatment)
  - Distribution of continuous covariates: mimicking the natural distribution, or stratified to sample evenly across the natural range of values, or artificially extended to a wider range
- Amount of variation (measurement/sampling error, demographic stochasticity, environmental variation). Experimental control or quantification of variation.
- Effect size (small or large), or the distance of the true parameter from the null-hypothesis value.

Figure 5: The frequentist definition of power. In the left-hand plot, the type I (false positive) rate $\alpha$ is the area under the tails of the null hypothesis $H_0$; the type II error rate, $\beta$, is the area under the sampling distribution of the alternative hypothesis ($H_1$) between the tails of the null hypothesis; thus the power $1 - \beta$ is the gray area shown that lies above the upper critical value of the null hypothesis curve. (There is also a tiny area where $H_1$ overlaps the lower tail of $H_0$.) The right-hand plot shows power as a function of effect size (distance between the means) and standard deviation (curves for $\sigma = 0.25$ and $\sigma = 0.75$); the point shows the situation (effect size = 2, $\sigma = 0.75$) illustrated in the left figure.
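The frequentist definition of power illustrated in Figure 5 can be computed directly in a simple case. The sketch below assumes that the estimate is normally distributed with the same standard deviation under both hypotheses and that the test is two-sided with $\alpha = 0.05$; neither assumption is stated in the figure, and power_normal is a hypothetical helper name.

```r
# Power of a two-sided test when the estimate is normally distributed with
# standard deviation sigma under both H0 (mean 0) and H1 (mean delta).
power_normal <- function(delta, sigma, alpha = 0.05) {
  crit <- qnorm(1 - alpha / 2) * sigma   # upper critical value under H0
  # Area of the H1 distribution above the upper critical value ...
  upper <- pnorm(crit, mean = delta, sd = sigma, lower.tail = FALSE)
  # ... plus the (usually tiny) area below the lower critical value.
  lower <- pnorm(-crit, mean = delta, sd = sigma)
  upper + lower
}

power_normal(delta = 2, sigma = 0.75)    # the effect size and sd marked in Figure 5
power_normal(delta = 0, sigma = 0.75)    # equals alpha when the effect size is zero

# Power as a function of effect size for two standard deviations.
delta_vec <- seq(0, 4, by = 0.5)
pow_small_sd <- power_normal(delta_vec, sigma = 0.25)
pow_large_sd <- power_normal(delta_vec, sigma = 0.75)
```

For the same effect size, the smaller standard deviation always gives the higher power, which is the pattern the right-hand panel describes.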

These properties will determine how much information you can extract from your data. Large data sets are better than smaller ones; balanced data sets with wide ranges are better than unbalanced data sets with narrow ranges; data sets with large extent (maximum spatial and/or temporal range) and small grain (minimum distance between samples) are best; and larger effects are obviously easier to detect and characterize. There are obvious tradeoffs between effort (measured in person-hours or dollars) and the number of samples, and in how you allocate that effort. Would you prefer more information about fewer samples, or less information about more? More observations at fewer sites or fewer at more sites? Should you spend your effort increasing extent or decreasing grain?

Subtler tradeoffs also affect the value of an experiment. For example, controlling extraneous variation allows a more powerful answer to a statistical question, but how do we know what is extraneous? Variation actually affects the function of ecological systems (Jensen's inequality: Ruel and Ayres, 1999). Measuring a plant in a constant laboratory environment may turn out to answer the wrong question: we ultimately want to know how the plant performs in the natural environment, not in the lab, and variability is an important part of most environments. In contrast, performing unrealistic manipulations like pushing population densities beyond their natural limits may help to identify density-dependent processes that are real and important but undetectable at ambient densities (Osenberg et al., 2002). There is no simple answer to these questions, but they're important to think about.

The quality of the answers we get from our analyses is as multifaceted as the quality of the data. Precision specifies how finely you can estimate a parameter (the number of significant digits, or the narrowness of the confidence interval), while accuracy specifies how likely your answer is to be correct. Accurate but imprecise answers are better than precise but inaccurate ones: at least in this case you know that your answer is imprecise, rather than having misleadingly precise but inaccurate answers. But you need both precision and accuracy to understand and predict ecological systems. More specifically, I will show how to estimate the following aspects of precision and accuracy for the damselfish system:
- Bias (accuracy): bias is the expected difference between the estimate and the true value of the parameter. If you run a large number of simulations with a true value of $d$ and estimate a value of $\hat{d}$ for each one, then the bias is $E[\hat{d} - d]$. Most simple statistical estimators are unbiased, and so most of us have come to expect (wrongly) that statistical estimates are generally unbiased. Most statistical estimators are indeed asymptotically unbiased, which means that in the limit of a large amount of data they will give the right answer on average, but a surprisingly large number of common estimators are biased (Poulin, 1996; Doak et al., 2005).
- Variance (precision): variance, or $E[(\hat{d} - E[\hat{d}])^2]$, measures the variability of the point estimates ($\hat{d}$) around their mean value. Just as an accurate but imprecise answer is worthless, unbiased answers are worthless if they have high variance. With low bias we know that we get the right answer on average, but high variability means that any particular estimate could be way off. With real data, we never know which estimates are right and which are wrong.
- Confidence interval width (precision): the width of the confidence intervals, either in absolute terms or as a proportion of the estimated value, provides useful information on the precision of your estimate. If the confidence interval is estimated correctly (see coverage, below) then the confidence interval should be related to the variance among estimates.
- Mean squared error (MSE: accuracy and precision) combines bias and variance as (bias$^2$ + variance). It represents the total variation around the true value, rather than the average estimated value: $(E[\hat{d} - d])^2 + E[(\hat{d} - E[\hat{d}])^2] = E[(\hat{d} - d)^2]$. MSE gives an overall sense of the quality of the estimator.
- Coverage (accuracy): when we sample data and estimate parameters, we try to estimate the uncertainty in those parameters. Coverage describes how accurate those confidence intervals are, and (once again) can only be estimated via simulation. If the confidence limits (for a given confidence level $1 - \alpha$) are $d_{\text{low}}$ and $d_{\text{high}}$, then the coverage describes the proportion or percentage of simulations in which the confidence intervals actually include the true value ($\mathrm{Prob}(d_{\text{low}} < d < d_{\text{high}})$). Ideally, the observed coverage should equal the nominal coverage of $1 - \alpha$; values that are too high are pessimistic, overstating the level of uncertainty, while values that are too low are optimistic. (It often takes several hundred simulations to get a reasonably precise estimate of the coverage, especially when estimating the coverage for 95% confidence intervals.)
- Power (precision): finally, the narrow-sense power gives the probability of correctly rejecting the null hypothesis, or in other words the fraction of the times that the null-hypothesis value $d_0$ will be outside of the confidence limits: ($\mathrm{Prob}(d_0 < d_{\text{low}} \text{ or } d_0 > d_{\text{high}})$). In frequentist language, it is $1 - \beta$, where $\beta$ is the probability of making a type II error.

               H0 true            H0 false
accept H0      $1 - \alpha$       $\beta$
reject H0      $\alpha$           $1 - \beta$
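The measures listed above (bias, variance, confidence interval width, MSE, coverage, and power) can all be estimated from one set of simulations. The sketch below uses the slope of a linear regression as a stand-in for the parameter $d$; the parameter values, the seed, and the variable names are assumptions for illustration, not the damselfish analysis itself.

```r
set.seed(105)                        # arbitrary seed, for reproducibility
nsim <- 500
x <- 1:20
a <- 2                               # intercept (illustrative)
b_true <- 1                          # true slope, playing the role of d
sd_true <- 8                         # error standard deviation (illustrative)

est <- numeric(nsim)                 # point estimates of the slope
lower <- numeric(nsim)               # lower 95% confidence limits
upper <- numeric(nsim)               # upper 95% confidence limits
for (i in 1:nsim) {
  y <- rnorm(length(x), mean = a + b_true * x, sd = sd_true)
  m <- lm(y ~ x)
  est[i] <- coef(m)["x"]
  ci <- confint(m)["x", ]
  lower[i] <- ci[1]
  upper[i] <- ci[2]
}

bias <- mean(est) - b_true
variance <- mean((est - mean(est))^2)  # divides by nsim, so bias^2 + variance = MSE exactly
mse <- mean((est - b_true)^2)
ci_width <- mean(upper - lower)
coverage <- mean(lower < b_true & b_true < upper)
power <- mean(lower > 0 | upper < 0)   # null value d0 = 0 falls outside the interval

print(c(bias = bias, variance = variance, mse = mse,
        ci_width = ci_width, coverage = coverage, power = power))
```

For this well-behaved estimator the coverage should land near the nominal 95%; an estimator whose coverage falls well below that is producing intervals that are too narrow.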

Typically you specify an alternative hypothesis $H_1$, a desired type I error rate $\alpha$, and a desired power ($1 - \beta$) and then calculate the required sample size, or calculate ($1 - \beta$) as a function of sample size, for some particular $H_1$. When the effect size is zero (the difference between the null and the alternate hypotheses is zero, i.e. the null hypothesis is true), the power is undefined, but it approaches $\alpha$ as the effect size gets small ($H_1 \to H_0$). R has built-in functions for several standard cases (power of tests of difference between means of two normal populations [power.t.test], tests of difference in proportions [power.prop.test], and one-way, balanced ANOVA [power.anova.test]). For more discussion of these cases, or for other fairly straightforward examples, you can look in any relatively advanced biometry book (e.g. Sokal and Rohlf (1995)), or even find a calculator on the web (search for "statistical power calculator"). For more complicated and ecologically realistic examples, however, you'll probably have to find the answer through simulation, as demonstrated below.
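As a brief illustration of the built-in functions, power.t.test can be used in both directions: computing power for a given sample size, or the sample size needed for a given power. The effect size and standard deviation below are arbitrary example values.

```r
# Power of a two-sample t test with 20 observations per group,
# a true difference in means of 1, and a standard deviation of 1.
pw <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)
pw$power

# Sample size per group needed to reach 80% power for the same effect.
ss <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8)
ss$n

# Power as a function of sample size per group.
nvec <- seq(5, 50, by = 5)
powvec <- sapply(nvec, function(n) power.t.test(n = n, delta = 1, sd = 1)$power)
print(data.frame(n = nvec, power = round(powvec, 3)))
```

Exactly one of n, delta, sd, sig.level, and power is left unspecified in each call, and the function solves for that one.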

Footnote (on the power approaching $\alpha$ for small effect sizes): Not zero! Even when the null hypothesis is true, we reject it a proportion $\alpha$ of the time: thus we can expect to correctly reject the null hypothesis, even for very small effects, with probability at least $\alpha$.

Footnote (on R's built-in power functions): The Hmisc package, available on CRAN, has a few more power calculators.

3.1 Simple examples

3.1.1 Linear regression

Let's start by estimating the statistical power of detecting the linear trend in Figure 1a, as a function of sample size. In order to find out whether we can reject the null hypothesis in a single experiment, we simulate a data set with a given slope, intercept, and number of data points; run a linear regression; extract the p-value; and see whether it is less than our specified criterion (usually 0.05). For example (with x, a, and b defined as in the linear simulation above, N the number of data points, and sd the standard deviation of the errors):

> y_det = a + b * x
> y = rnorm(N, mean = y_det, sd = sd)
> m = lm(y ~ x)
> coef(summary(m))["x", "Pr(>|t|)"]

[1] 0.003615899

Extracting p-values from R analyses can be tricky. In this case, the coefficients of the summary of the linear fit are a matrix including the standard error, t statistic, and p-value for each parameter; I used matrix indexing to pull out the specific value I wanted. More generally, you will have to use the names and str commands to pick through the results of a test to find the p-value.

In order to estimate the probability of successfully rejecting the null hypothesis when it is false (the power), we have to repeat this procedure many times and calculate the proportion of the time that we reject the null hypothesis. Specify the number of simulations to run (400 is a reasonable number if we want to calculate a percentage; even 100 would do to get a crude estimate):

> nsim = 400

Set up a vector to hold the p-value for each simulation:

> pval = numeric(nsim)

Now repeat what we did above 400 times, each time saving the p-value in the storage vector:

> for (i in 1:nsim) {
+     y_det = a + b * x
+     y = rnorm(N, mean = y_det, sd = sd)
+     m = lm(y ~ x)
+     pval[i] = coef(summary(m))["x", "Pr(>|t|)"]
+ }

Calculate the power:

> sum(pval < 0.05)/nsim

[1] 0.87

However, we don't just want to know the power for a single experimental design. Rather, we want to know how the power changes as we change some aspect of the design such as the sample size or the variance. Thus we have to repeat the entire procedure above multiple times, each time changing some parameter of the simulation such as the slope, or the error variance, or the distribution of the x values. Coding this in R usually involves nested for loops. For example:

> bvec = seq(-2, 2, by = 0.1)
> power.b = numeric(length(bvec))
> for (j in 1:length(bvec)) {
+     b = bvec[j]
+     for (i in 1:nsim) {
+         y_det = a + b * x
+         y = rnorm(N, mean = y_det, sd = sd)
+         m = lm(y ~ x)
+         pval[i] = coef(summary(m))["x", "Pr(>|t|)"]
+     }
+     power.b[j] = sum(pval < 0.05)/nsim
+ }
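The nested loop above relies on values (a, N, sd, x) that were set earlier in the chapter. As a self-contained sketch, the same calculation can be wrapped in a function. The design values below (N = 20, x evenly spaced from 1 to 20, a = 2, error standard deviation 8) and the helper name sim_pval are assumptions chosen for illustration, not necessarily the values used elsewhere in the text; a coarser grid of slopes and fewer simulations keep the run time short.

```r
set.seed(1001)

## Assumed design values (illustrative only)
N <- 20                          # sample size
x <- seq(1, 20, length.out = N)  # predictor values
a <- 2                           # intercept
sd_err <- 8                      # error standard deviation
nsim <- 200                      # simulations per slope value

## Simulate one data set with slope b; return the p-value for the slope
sim_pval <- function(b) {
  y <- rnorm(N, mean = a + b * x, sd = sd_err)
  coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
}

## Power = proportion of simulations in which p < 0.05
bvec <- seq(-2, 2, by = 0.25)
power.b <- sapply(bvec, function(b) mean(replicate(nsim, sim_pval(b)) < 0.05))
round(power.b, 2)
```

Calling plot(bvec, power.b, type = "b") then draws the estimated power curve. Using replicate and sapply instead of explicit for loops is only a stylistic choice; the computation is the same.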

The results would resemble a noisy version of the right subfigure in Figure 5. The power equals α = 0.05 when the slope is zero, rising to 0.8 for slope ≈ ±1. You could repeat these calculations for a different set of parameters (e.g. changing the sample size, or the number of parameters). If you were feeling ambitious, you could calculate the power for many combinations of (e.g.) slope and sample size, using yet another for loop; saving the results in a matrix; and using contour or persp to plot the results.

3.1.2 Hyperbolic/negative binomial data

What about the power to detect the difference between the two groups shown in Figure 1b with hyperbolic dependence on x, negative binomial errors, and different intercepts and hyperbolic slopes? In order to estimate the power of the analysis, we have to know how to test statistically for a difference between the two groups. Jumping the gun a little bit (this topic will be covered in much greater detail in Chapter 6), we can define negative log-likelihood functions both for a null model that assumes the intercept is the same for both groups as well as for a more complex model that allows for differences in the intercept. The mle2 command in the bbmle package lets us fit the parameters of these models, and the anova command gives us a p-value for the difference between the models (p. ??):

> m0 = mle2(y ~ dnbinom(mu = a * b * x/(b + x), size = k),
+     start = list(a = 15, b = 1, k = 5))
> m1 = mle2(y ~ dnbinom(mu = a * b * x/(b + x), size = k),
+     parameters = list(a ~ g, b ~ g), start = list(a = 15,
+         b = 1, k = 5))
> anova(m0, m1)[2, "Pr(>Chisq)"]

Without showing the details, we now run a for loop that simulates the system above 200 times each for a range of sample sizes, uses anova to calculate the p-values, and calculates the proportion of p-values < 0.05 for each sample size. Figure 6 shows the results. For small sample sizes (< 20), the power is abysmal (≈ 0.2–0.4). Power then rises approximately linearly, reaching acceptable levels (0.8 and up) for sample sizes of 50–100 and greater. The variation in Figure 6 is due to stochastic variation. We could run more simulations per sample size to reduce the variation, but it's probably unnecessary since all power analysis is approximate anyway.

Figure 6: Statistical power to detect differences between two hyperbolic functions with intercepts a = {10, 20}, slopes b = {2, 1}, and negative binomial k = 5, as a function of sample size. Sample size is plotted on a logarithmic scale. [Axes: Power (0 to 1) against Sample size (10 to 500).]

3.1.3 Bias and variance in estimates of the negative binomial k parameter

For another simple example, one that demonstrates that there's more to life than p-values, consider the problem of estimating the k parameter of a negative binomial distribution. Are standard estimators biased? How large a sample do you need for a reasonably accurate estimate of aggregation? Statisticians have long been aware that maximum likelihood estimates of the negative binomial k and similar aggregation indices, while better than simpler method of moments estimates (p. ??), are biased for small sample sizes (Pieters et al., 1977; Piegorsch, 1990; Poulin, 1996; Lloyd-Smith, 2007). While you could delve into the statistical literature on this topic and even find special-purpose estimators that reduce the bias (Saha and Paul, 2005), it's empowering to be able to explore the problem yourself through simulation.

We can generate negative binomial samples with rnbinom, and the fitdistr command from the MASS package is a convenient way to estimate the parameters. fitdistr finds maximum likelihood estimates, which generally have good properties but are not infallible, as we will see shortly. For a single sample:

> x = rnbinom(100, mu = 1, size = 0.5)
> f = fitdistr(x, "negative binomial")
> f
      size           mu
  0.21908756    1.05996103
 (0.05712932)  (0.24875054)

(The standard deviations of the parameter estimates are given in parentheses.) You can see that for this example the value of k (size) is underestimated relative to the true value of 0.5, but how do the estimates behave in general?

In order to dig the particular values we want (estimated k and standard deviation of the estimate) out of the object that fitdistr returns, we have to use str(f) to examine its internal structure. It turns out that f$estimate["size"] and f$sd["size"] are the numbers we want.

Set up a vector of sample sizes (lseq is a function from the emdbook package that generates a logarithmically spaced sequence) and set aside space for the estimated k and its standard deviation:

> Nvec = round(lseq(20, 500, length = 100))
> estk = numeric(length(Nvec))
> estksd = numeric(length(Nvec))

Now pick samples and estimate the parameters:

> set.seed(1001)
> for (i in 1:length(Nvec)) {
+     N = Nvec[i]
+     x = rnbinom(N, mu = 1, size = 0.5)
+     f = fitdistr(x, "negative binomial")
+     estk[i] = f$estimate["size"]
+     estksd[i] = f$sd["size"]
+ }

Figure 7 shows the results: the estimate is indeed biased, and highly variable, for small sample sizes. For sample sizes below about 100, the estimate of k is biased upward by about 20% on average. The coefficient of variation (standard deviation divided by the mean) is similarly greater than 0.2 for sample sizes less than 100.
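The loop above draws a single sample at each sample size, so bias and variability have to be read off a smoothed trend. One way to estimate them more directly is to draw many replicate samples at a few sample sizes, as in the sketch below. The helper est_k, the chosen sample sizes, and the number of replicates are illustrative assumptions; failed fits are recorded as NA rather than stopping the loop.

```r
library(MASS)   # provides fitdistr
set.seed(1001)

true_k <- 0.5

## Estimate k from one simulated sample of size N; NA if the fit fails
est_k <- function(N, mu = 1, k = true_k) {
  x <- rnbinom(N, mu = mu, size = k)
  f <- tryCatch(suppressWarnings(fitdistr(x, "negative binomial")),
                error = function(e) NULL)
  if (is.null(f)) NA_real_ else unname(f$estimate["size"])
}

Nvals <- c(20, 50, 100, 200, 500)  # sample sizes to compare
nrep <- 100                        # replicate samples per sample size

## Matrix of estimates: one row per replicate, one column per sample size
kmat <- sapply(Nvals, function(N) replicate(nrep, est_k(N)))
colnames(kmat) <- Nvals

## Summaries by sample size
k_summary <- data.frame(
  N        = Nvals,
  mean_k   = colMeans(kmat, na.rm = TRUE),
  median_k = apply(kmat, 2, median, na.rm = TRUE),
  cv       = apply(kmat, 2, sd, na.rm = TRUE) / colMeans(kmat, na.rm = TRUE)
)
k_summary$rel_bias <- k_summary$mean_k / true_k - 1   # proportional bias of the mean
k_summary
```

The median is reported alongside the mean because occasional samples with little apparent overdispersion can produce very large estimates of k, which can dominate the mean at small sample sizes.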

Figure 7: Estimates of negative binomial k with increasing sample size. In the left-hand figure, the solid line is a loess fit and the horizontal dashed line is the true value. The y axis in the right-hand figure is logarithmic. [Left panel: Estimated k against Sample size; right panel: Estimated sd(k) against Sample size; sample sizes from 20 to 500.]

3.2 Detecting under- and overcompensation in fish data

Finally, we will explore a more extended and complex example: the difficulty of estimating the exponent d in the Shepherd function, $R = aS/(1 + (a/b)S^d)$ (Figure 5c). This parameter controls whether the Shepherd function is undercompensating (d < 1: recruitment increases indefinitely as the number of settlers grows), saturating (d = 1: recruitment reaches an asymptote), or overcompensating (d > 1: recruitment decreases at high settlement). Schmitt et al. (1999) set d = 1 in part because d is very hard to estimate reliably; we are about to see just how hard.

You can use the simulation approach described above to generate simulated data sets of different sizes whose characteristics match Schmitt et al.'s data: a zero-inflated negative binomial distribution of numbers of settlers and a Shepherd-function relationship (with a specified value of d) between the number of settlers and the number of recruits. For each simulated data set, use R's nls function to estimate the values of the parameters by nonlinear least squares. (Non-linear least-squares fitting assumes constant, normally distributed error, ignoring the fact that the data are really binomially distributed. Chapter ?? will present more sophisticated maximum likelihood approaches to this problem.) Then calculate the confidence limits on d (using confint) and record the estimated value of the parameter and the lower and upper confidence limits.

Figure 8 shows the point estimates ($\hat{d}$) and 95% confidence limits ($d_{low}$, $d_{high}$) for the first 20 out of 400 simulations with 1000 simulated observations and a true value of d = 1.2. The figure also illustrates several of the summary statistics discussed above: bias, variance, power, and coverage (see the caption for details). For this particular case (n = 1000, d = 1.2) I can compute the bias (0.0039), variance (0.003, or $\sigma_{\hat{d}}$ = 0.059), mean-squared-error (0.003), coverage (0.921), and power (0.986). With 1000 observations, things look pretty good, but 1000 observations is a lot and d = 1.2 represents a lot of overcompensation.

Figure 8: Simulations and power/coverage. Points and error bars show point estimates ($\hat{d}$) and 95% confidence limits ($d_{low}$, $d_{high}$) for the first 20 out of 400 simulations with a true value of d = 1.2 and 1000 samples. Horizontal lines show the mean value of $\hat{d}$, $E[\hat{d}]$ = 1.204; the true value for this set of simulations, d = 1.2; and the null value, $d_0$ = 1. The left-hand density in the figure represents the distribution of $\hat{d}$ for all 400 simulations. The right-hand density represents the distribution of the lower confidence limit, $d_{low}$. The distance between d (solid horizontal line) and $E[\hat{d}]$ (short-dashed horizontal line) shows the bias. The error bar showing the standard deviation of $\hat{d}$, $\sigma_{\hat{d}}$, shows the square root of the variance of $\hat{d}$. The coverage is the proportion of lower confidence limits that fall below the true value, area b + c in the lower-bound density. The power is the proportion of lower confidence limits that fall above the null value, area a + b in the lower-bound density. For simplicity, I have omitted the distribution of the upper bounds $d_{high}$. [Axes: $\hat{d}$ against Simulation (1 to 20); densities labelled "estimates" and "lower bounds".]

The real value of power analyses comes when we compare the quality of estimates across a range of sample sizes and effect sizes. Figure 9 gives a gloomier picture, showing the bias, precision, coverage, and power for a range of d values from 0.7 to 1.3 and a range of sample sizes from 50 to 2000. It takes sample sizes of at least 500 to obtain reasonably unbiased estimates with adequate precision, and even then the coverage may be low if d < 1.0 and the power low if d is close to 1 (0.9 ≤ d ≤ 1.1). Because of the upward bias in d at low sample sizes, the calculated power is actually higher at very low sample sizes, but this is not particularly comforting. The power of the analysis is slightly better for overcompensation than undercompensation. The relatively low power values are as expected from Fig. 9b, which shows wide confidence intervals. Low power would also be predictable from the high variance of the estimates, which I didn't even bother to show in Fig. 9a because it obscured the figure too much.

Another use for our simulations is to take a first look at the tradeoffs involved in adding complexity to models. Figure 10 shows estimates of b, the asymptote if d = 1, for different sample sizes and values of d. If d = 1, then the Shepherd model reduces to the Beverton-Holt model. In this case, you might think that it wouldn't matter whether you used the Shepherd or the Beverton-Holt model to estimate the b parameter, but there are serious disadvantages to the Shepherd function. First, even when d = 1, the Shepherd estimate of d is biased upwards for low sample sizes, leading to a severe upward bias in the estimate of b. Second, not shown on the graph because it would have obscured everything else, the variance of the Shepherd estimate is far higher than the variance of the Beverton-Holt estimate (e.g. for a sample size of 200, the Beverton-Holt estimate is 9.83 ± 0.78 (s.d.), while the Shepherd estimate is 14.16 ± 13.94 (s.d.)). On the other hand, if d is not equal to 1, the bias in the Beverton-Holt estimate of b is large and more or less independent of sample size. For reasonable sample sizes, if d = 0.9 the Beverton-Holt estimate is biased upward by 6; if d = 1.1 it is biased downward by 3.79. Since the Beverton-Holt model isn't flexible enough to account for the changes in shape caused by d, it has to modify b in order to compensate. This general phenomenon is called the bias-variance tradeoff (see p. ??): more complex models in general reduce bias at the price of increased variance. (The small-sample bias of the Shepherd is a separate, and slightly less general, phenomenon.)

Because it is fundamentally difficult to estimate parameters or test hypotheses with noisy data, and most ecological data sets are noisy, power analyses are often depressing. On the other hand, even if things are bad, it's better to know how bad they are than just to guess; knowing how much you really know is important. In addition, there are design decisions you can make (e.g. number of treatments vs. number of replicates per treatment) that optimize power given the constraints of time and money.
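The summary statistics used in this section (bias, variance, mean-squared error, coverage, and power for the estimate of d) can be computed from simulation output along the following lines. This is a rough sketch under several assumptions, not the code behind the figures: the true values (a = 0.7, b = 10), the settler distribution (a negative binomial shifted by one, standing in for the zero-inflated version), the sample size, and the helper name sim_fit are all hypothetical, and approximate Wald intervals from the nls standard errors replace the confint limits described in the text, purely for speed and robustness.

```r
set.seed(101)

## Assumed true values and design (illustrative only)
a_true <- 0.7; b_true <- 10; d_true <- 1.2
n <- 500      # observations per simulated data set
nsim <- 100   # number of simulated data sets

## Simulate one data set, fit the Shepherd function by nls, and return the
## estimate of d with approximate (Wald) 95% confidence limits
sim_fit <- function() {
  S <- 1 + rnbinom(n, mu = 25, size = 0.6)           # settlers (at least 1)
  p <- a_true / (1 + (a_true / b_true) * S^d_true)   # per-settler recruitment probability
  dat <- data.frame(S = S, R = rbinom(n, size = S, prob = p))
  fit <- tryCatch(
    nls(R ~ a * S / (1 + (a / b) * S^d), data = dat,
        start = list(a = a_true, b = b_true, d = d_true)),
    error = function(e) NULL)
  if (is.null(fit)) return(c(est = NA, low = NA, high = NA))
  ct <- coef(summary(fit))["d", ]
  half <- qt(0.975, df.residual(fit)) * ct[["Std. Error"]]
  c(est = ct[["Estimate"]], low = ct[["Estimate"]] - half,
    high = ct[["Estimate"]] + half)
}

sims <- t(replicate(nsim, sim_fit()))
sims <- sims[complete.cases(sims), , drop = FALSE]   # drop failed fits
est <- sims[, "est"]

bias     <- mean(est) - d_true
variance <- var(est)
mse      <- mean((est - d_true)^2)                   # roughly bias^2 + variance
coverage <- mean(sims[, "low"] <= d_true & d_true <= sims[, "high"])
power    <- mean(sims[, "low"] > 1 | sims[, "high"] < 1)   # reject null d0 = 1
round(c(bias = bias, variance = variance, mse = mse,
        coverage = coverage, power = power), 3)
```

Here coverage and power use both confidence limits (two-sided), whereas the caption of Figure 8 describes them in terms of the lower limit only for simplicity. Wrapping the simulation in loops over n and d_true would give the kind of grid summarized in Figure 9.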

Figure 9: Summaries of statistical accuracy, precision, and power for estimating the Shepherd exponent d, for a range of d values from undercompensation, d = 0.7 (line marked 7), to overcompensation, d = 1.3 (line marked 3). (a) Estimated d: the estimates are strongly biased upwards for sample sizes less than 500, especially for undercompensation (d < 1). (b) Confidence interval width: the confidence intervals are large (> 0.4) for sample sizes smaller than about 500, for any value of d. (c) Coverage of the nominal 95% confidence intervals is adequate for large sample size (> 250) and overcompensation (d > 1), but poor even for large sample sizes when d < 1. (d) For statistical power (1 − β) of at least 0.8, sample sizes of 500–1000 are required if d ≤ 0.7 or d ≥ 1.2; sample sizes of 1000 if d = 0.8; and sample sizes of at least 2000 if d = 0.9 or d = 1.1. When d = 1.0 (0 line), the probability of rejecting the null hypothesis is a little above the nominal value of α = 0.05. [Panels: (a) Estimated d, (b) Confidence interval width, (c) Coverage, (d) Power or α; x axis in all panels: Sample size, 0 to 2000.]


Figure 10: Estimates of b, using Beverton-Holt or Shepherd functions, for different values of d and sample sizes. [Axes: Estimate of b against Number of samples (0 to 2000); curves labelled "Shepherd est." and "BH est." for d = 0.9, 1.0, and 1.1.]
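The comparison summarized in Figure 10 can be explored with a short simulation: generate data from the Beverton-Holt model (d = 1), fit both models with nls, and compare the resulting estimates of b. The sketch below makes the same kind of assumptions as any quick simulation: the true values (a = 0.7, b = 10), the settler distribution, and the helper names fit_both and get_b are hypothetical, and fits that fail to converge are recorded as NA.

```r
set.seed(202)

## Assumed true values (illustrative); d = 1, so Beverton-Holt is the true model
a_true <- 0.7; b_true <- 10
n <- 200      # observations per simulated data set
nsim <- 100   # number of simulated data sets

## Fit both models to one simulated data set; return the two estimates of b
fit_both <- function() {
  S <- 1 + rnbinom(n, mu = 25, size = 0.6)       # settlers (at least 1)
  p <- a_true / (1 + (a_true / b_true) * S)      # Beverton-Holt recruitment probability
  dat <- data.frame(S = S, R = rbinom(n, size = S, prob = p))
  st <- list(a = a_true, b = b_true)
  ## Estimate of b from one nls fit, or NA if the fit fails
  get_b <- function(formula, start) {
    fit <- tryCatch(nls(formula, data = dat, start = start),
                    error = function(e) NULL)
    if (is.null(fit)) NA_real_ else coef(fit)[["b"]]
  }
  c(BH       = get_b(R ~ a * S / (1 + (a / b) * S), st),
    Shepherd = get_b(R ~ a * S / (1 + (a / b) * S^d), c(st, list(d = 1))))
}

bmat <- t(replicate(nsim, fit_both()))

## Mean, standard deviation, and number of successful fits for each model
b_summary <- rbind(mean = colMeans(bmat, na.rm = TRUE),
                   sd   = apply(bmat, 2, sd, na.rm = TRUE),
                   n_ok = colSums(!is.na(bmat)))
b_summary
```

Comparing the sd row for the two columns gives a rough view of the variance cost of the extra parameter; changing the data-generating exponent away from 1 would show the corresponding bias in the Beverton-Holt estimate.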


Remember that systematic biases, pseudo-replication, etc. (factors that are rarely accounted for in your experimental design or in your power analysis) are often far more important than the fussy details of your statistical design. While you should quantify the power of your experiment to make sure it has a reasonable chance of success, thoughtful experimental design (e.g. measuring and statistically accounting for covariates such as mass, rainfall, etc.; pairing control and treatment samples; or expanding the range of covariates tested) will make a much bigger difference than tweaking the details of your experiment to squeeze out a little bit more statistical power.

References
Bolker, B. M., S. W. Pacala, and C. Neuhauser. 2003. Spatial dynamics in model plant communities: what do we really know? American Naturalist 162:135–148.
Doak, D. F., K. Gross, and W. F. Morris. 2005. Understanding and predicting the effects of sparse data on demographic analyses. Ecology 86:1154–1163.
Lloyd-Smith, J. O. 2007. Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases. PLoS ONE 2:e180.
Osenberg, C. W., C. M. St. Mary, R. J. Schmitt, S. J. Holbrook, P. Chesson, and B. Byrne. 2002. Rethinking ecological inference: density dependence in reef fishes. Ecology Letters 5:715–721.
Pacala, S. and J. Silander, Jr. 1990. Field tests of neighborhood population dynamic models of two annual weed species. Ecological Monographs 60:113–134.
Piegorsch, W. W. 1990. Maximum likelihood estimation for the negative binomial dispersion parameter. Biometrics 46:863–867.
Pieters, E. P., C. E. Gates, J. H. Matis, and W. L. Sterling. 1977. Small sample comparison of different estimators of negative binomial parameters. Biometrics 33:718–723.
Poulin, R. 1996. Measuring parasite aggregation: defending the index of discrepancy. International Journal for Parasitology 26:227–229.
Ruel, J. J. and M. P. Ayres. 1999. Jensen's inequality predicts effects of environmental variation. Trends in Ecology and Evolution 14:361–366.
Saha, K. and S. Paul. 2005. Bias-corrected maximum likelihood estimator of the negative binomial dispersion parameter. Biometrics 61:179–185.
Schmitt, R. J., S. J. Holbrook, and C. W. Osenberg. 1999. Quantifying the effects of multiple processes on local abundance: a cohort approach for open populations. Ecology Letters 2:294–303.
Sokal, R. R. and F. J. Rohlf. 1995. Biometry. W. H. Freeman, New York. 3rd edition.
