Bayesian Methods
for Data Analysis
Third Edition
Bradley P. Carlin
University of Minnesota
Minneapolis, MN, U.S.A.
Thomas A. Louis
Johns Hopkins Bloomberg School of Public Health
Baltimore, MD, U.S.A.
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize
to copyright holders if permission to publish in this form has not been obtained. If any copyright material
has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, trans-
mitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter
invented, including photocopying, microfilming, and recording, or in any information storage or retrieval
system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.
com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the
CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Carlin, Bradley P.
Bayesian methods for data analysis / authors, Bradley P. Carlin and Thomas A.
Louis. -- 3rd ed.
p. cm. -- (Chapman & Hall/CRC texts in statistical science series ; 78)
Originally published: Bayes and Empirical Bayes methods for data analysis. 1st ed.
Includes bibliographical references and index.
ISBN 978-1-58488-697-6 (alk. paper)
1. Bayesian statistical decision theory. I. Louis, Thomas A., 1944- II. Carlin, Bradley
P. Bayes and Empirical Bayes methods for data analysis. III. Title. IV. Series.
QA279.5.C36 2008
519.5’42--dc22 2008019143
analytic techniques right away in Chapter 2, the basic Bayes chapter. While
the theory supporting the use of MCMC is only cursorily explained at this
point, the aim is to get the reader up to speed on the way that a great deal of
applied Bayesian work is now routinely done in practice. While a probabilist
might disagree, the real beauty of MCMC for us lies not in the algorithms
themselves, but in the way their power enables us to focus on statistical
modeling and data analysis in a way impossible before. As such, Chapter 2
is now generously endowed with data examples and corresponding R and
WinBUGS code, as well as several new homework exercises along these same
lines. The core computing and model criticism and selection material, for-
merly in Chapters 5 and 6, has been moved up to Chapters 3 and 4, in
keeping with our desire to get the key modeling tools as close to the front
of the book as possible. On a related note, new Sections 2.4 and 4.1 contain
explicit descriptions and illustrations of hierarchical modeling, now com-
monplace in Bayesian data analysis. The philosophically related material
on empirical Bayes and Bayesian performance formerly in Chapters 3 and
4 has been thinned and combined into new Chapter 5. Compensating for
this, the design of experiments material formerly (and rather oddly) tacked
onto Chapter 4 has been expanded into its own chapter (Chapter 6) that in-
cludes more explicit advice for clinical trialists and others requiring a basic
education in Bayesian sample size determination, as well as the frequentist
checks still often required of such designs (e.g., by regulatory agencies) be-
fore they are put into practice. Finally, the remaining chapters have been
updated as needed, including a completely revised and expanded Subsec-
tion 7.1 on ranking and histogram estimation, and a new Subsection 8.3
case study on infectious disease modeling and the 1918 flu epidemic.
As with the previous two editions, this revision presupposes no previ-
ous exposure to Bayes or empirical Bayes (EB) methods, but readers with
a master’s-level understanding of traditional statistics – say, at the level
of Hogg and Craig (1978), Mood, Graybill, and Boes (1974), or Casella
and Berger (1990) – may well find the going easier. Thanks to the rear-
rangements mentioned above, a course on the basics of modern applied
Bayesian methods might cover only Chapters 1 to 4, since they provide
all that is needed to do some pretty serious hierarchical modeling in stan-
dard computer packages. In the Division of Biostatistics at the University
of Minnesota, we do essentially this in a three-credit-hour, single-semester
(15 week) course aimed at master’s and advanced undergraduate students
in statistics and biostatistics, and also at master’s and doctoral students
in other departments who need to know enough hierarchical modeling to
analyze their data. For those interested in fitting advanced models be-
yond the scope of standard packages, or in doing methodological research
of their own, the material in the latter chapters may well be crucial. At
Minnesota, we also have a one-semester course of this type, aimed at
doctoral and advanced master’s students in statistics and biostatistics.
1.1 Introduction
 79  87  83  80  78
 90  89  92  99  95
 96 100   ? 110 115
101 109 105 108 112
 96 104  92 101  96
Table 1.1 An array of well-estimated rates per 10,000 with one estimate missing (shown here as "?").
With no direct information for the missing cell, what would you use for an estimate?
Does 200 seem reasonable? Probably not, since the unknown rate is sur-
rounded by estimates near 100. To produce an estimate for the missing
cell you might fit an additive model (rows and columns) and then use the
model to impute a value for it, or merely average the values in surround-
ing cells. These are two examples of borrowing information. Whatever your
approach, some number around 100 seems reasonable.
Now assume that we obtain data for the cell and the estimate is, in
fact, 200, based on 2 events in a population of 100 (200 = 10000 × 2/100).
Would you now estimate the rate by 200 (a very unstable estimate based on very
little information), when with no information a moment ago you used 100?
While 200 is a perfectly valid estimate (though its uncertainty should be
reported), some sort of weighted average of this direct estimate (200) and
the indirect estimate you used when there was no direct information (100)
seems intuitively more appealing. The Bayesian formalism allows just this
sort of natural compromise estimate to emerge.
Finally, repeat this mental exercise assuming that the direct estimate is
still 200 per 10,000, but now based on 20 events in a population of 1000,
and then on 2000 events in a population of 100,000. What estimate would
you use in each case? Bayes and empirical Bayes methods structure this
type of statistical decision problem, automatically giving increasing weight
to the direct estimate as it becomes more reliable.
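To make this intuition concrete, the following minimal R sketch (ours, not part of the original example; it assumes a conjugate gamma prior centered at the neighbors' rate of roughly 100 per 10,000, with prior weight chosen purely for illustration) shows a posterior mean moving from the indirect value toward the direct estimate of 200 as the amount of direct data grows:
R code rate.prior <- 100/10000                    # indirect (neighbors') rate
a <- 10; b <- a/rate.prior                        # illustrative Gamma(a,b) prior with mean a/b
events <- c(2, 20, 2000)                          # direct event counts
pop <- c(100, 1000, 100000)                       # populations at risk
post.rate <- (a + events)/(b + pop)               # Poisson-gamma posterior mean rate
round(post.rate * 10000, 1)                       # per 10,000: roughly 109, 150, 199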
1.2.3 Bioassay
Consider a carcinogen bioassay where you are comparing a control group
(C) and an exposed group (E) with 50 rodents in each (see Table 1.2). In
the control group, 0 tumors are found; in the exposed group, there are 3,
producing a non-significant, one-sided Fisher exact test p-value of approx-
imately 0.125. However, your colleague, who is a veterinary pathologist,
states, “I don’t know about statistical significance, but three tumors in 50
rodents is certainly biologically significant!”
Table 1.2
            C    E   Total
Tumor       0    3      3
No Tumor   50   47     97
Total      50   50    100
            C    E   Total
Tumor       0    3      3
No Tumor  450   47    497
Total     450   50    500
Table 1.3 Hypothetical bioassay results augmented by 400 historical controls, none
with a tumor; one-sided p = 0.001.
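Readers who wish to check these p-values can do so with a minimal R sketch (assuming the 2 × 2 layouts of Tables 1.2 and 1.3, rows tumor/no tumor and columns C/E):
R code t12 <- matrix(c(0, 50, 3, 47), nrow=2,
              dimnames=list(c("Tumor","No Tumor"), c("C","E")))
t13 <- matrix(c(0, 450, 3, 47), nrow=2,
              dimnames=list(c("Tumor","No Tumor"), c("C","E")))
fisher.test(t12, alternative="less")$p.value   # roughly 0.12 (the "0.125" above)
fisher.test(t13, alternative="less")$p.value   # roughly 0.001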
While probability has been the subject of study for hundreds of years (most
notably by mathematicians retained by rich noblemen to advise them on
how to maximize their winnings in games of chance), statistics is a relatively
young field. Linear regression first appeared in the work of Francis Galton
in the late 1800s, with Karl Pearson adding correlation and goodness-of-
fit measures around the turn of the last century. The field did not really
blossom until the 1920s and 1930s, when R.A. Fisher developed the notion
of likelihood for general estimation, and Jerzy Neyman and Egon Pearson
developed the basis for classical hypothesis testing. A flurry of research
activity was energized by World War II, which generated a wide variety
of difficult applied problems and the first substantive government funding
for their solution in the United States and Great Britain.
By contrast, Bayesian methods are much older, dating to the original
1763 paper by the Rev. Thomas Bayes, a minister and amateur mathe-
matician. The area generated some interest from Laplace, Gauss, and others
in the 19th century, but the Bayesian approach was ignored (or actively
opposed) by the statisticians of the early 20th century. Fortunately, dur-
ing this period several prominent non-statisticians, most notably Harold
Jeffreys (a physicist) and Arthur Bowley (an econometrician), continued
to lobby on behalf of Bayesian ideas (which they referred to as “inverse
probability”). Then, beginning around 1950, statisticians such as L.J. Sav-
age, Bruno de Finetti, Dennis Lindley, and many others began advocating
Bayesian methods as remedies for certain deficiencies in the classical ap-
proach. The following example discusses the case of interval estimation.
Example 1.1 Suppose Xi ∼iid N (θ, σ²), i = 1, . . . , n, where N denotes the
normal (Gaussian) distribution and iid stands for “independent and iden-
tically distributed.” We desire a 95% interval estimate for the population
mean θ. Provided n is sufficiently large (say, bigger than 30), a classical
approach would use the confidence interval
δ(x) = x̄ ± 1.96 s/√n ,
2. Negative binomial: Data collection involved flipping the coin until the
third tail appeared. Here, the random quantity X is the number of heads
required to complete the experiment, so that X ∼ NegBin(r = 3, θ),
Thus, using the “usual” Type I error level α = .05, we see that the two
model assumptions lead to two different decisions: we would reject H0 if X
were assumed negative binomial, but not if it were assumed binomial. But
there is no information given in the problem setting to help us make this
determination, so it is not clear which analysis the frequentist should regard
as “correct.” In any case, assuming we trust the statistical model, it does
not seem reasonable that how the experiment was monitored should have
any bearing on our decision; surely only its results are relevant! Indeed, the
likelihood functions tell a consistent story, since (1.2) and (1.3) differ only
by a multiplicative constant that does not depend on θ.
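As a concrete check, the following minimal R sketch (assuming the setup implied above: x = 9 heads observed, either with n = 12 flips fixed or with stopping at the r = 3rd tail, and a one-sided test of H0 : θ = 1/2 against larger θ) reproduces the two p-values:
R code x <- 9; n <- 12; r <- 3
p.binom <- sum(dbinom(x:n, size=n, prob=0.5))      # about 0.075: do not reject at .05
p.negbin <- 1 - pnbinom(x - 1, size=r, prob=0.5)   # about 0.033: reject at .05
c(p.binom, p.negbin)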
A Bayesian explanation of what went wrong in the previous example
would be that the Neyman-Pearson approach allows unobserved outcomes
to affect the rejection decision. That is, the probability of X values “more
extreme” than 9 (the value actually observed) was used as evidence against
H0 in each case, even though these values did not occur. More formally, this
is a violation of a statistical axiom known as the Likelihood Principle, a no-
tion present in the work of Fisher and Barnard, but not precisely defined
until the landmark paper by Birnbaum (1962). In a nutshell, the Likeli-
hood Principle states that once the data value x has been observed, the
likelihood function L(θ|x) contains all relevant experimental information
delivered by x about the unknown parameter θ. In the previous exam-
ple, L1 and L2 are proportional to each other as functions of θ, hence are
equivalent in terms of experimental information (recall that multiplying a
likelihood function by an arbitrary function h(x) does not change the MLE
θ̂). Yet in the Neyman-Pearson formulation, these equivalent likelihoods
lead to two different inferences regarding θ. Put another way, frequentist
test results actually depend not only on what x was observed, but on how
the experiment was stopped.
1. Bayesian methods provide the user with the ability to formally incorpo-
rate prior information.
Discussion
Statistical decision rules can be generated by any philosophy under any col-
lection of assumptions. They can then be evaluated by any criteria, even
those arising from an utterly different philosophy. We contend (and the
subsequent chapters will show) that the Bayesian approach is an excel-
lent “procedure generator,” even if one’s evaluation criteria are frequentist,
provided that the prior distributions introduce only a small amount of in-
formation. This agnostic view considers features of the prior (possibly the
entire prior) as “tuning parameters” that can be used to produce a decision
rule with broad validity. The Bayesian formalism will be even more effective
if one desires to structure an analysis using either personal opinion or ob-
jective information external to the current data set. The Bayesian approach
also encourages documenting assumptions and quantifying uncertainty. Of
course, no approach automatically produces broadly valid inferences, even
in the context of the Bayesian models. A procedure generated by a high-
information prior with most of its mass far from the truth will perform
poorly under both Bayesian and frequentist evaluations.
Statisticians need design and analysis methods that strike an effective
tradeoff between efficiency and robustness, irrespective of the underlying
philosophy. For example, in estimation, central focus should be on reduc-
tion of MSE and related performance measures through a tradeoff between
variance and bias. This concept is appropriate for both frequentists and
Bayesians. In this context, our strategy will be to use the Bayesian formal-
ism to reduce MSE even when evaluations are frequentist.
Importantly, the Bayesian formalism properly propagates uncertainty
through the analysis, enabling a more realistic (typically inflated) assess-
ment of the variability in estimated quantities of interest. Also, the for-
malism structures the analysis of complicated models where intuition may
produce faulty or inefficient approaches. This structuring becomes espe-
1.6 Exercises
1. Let θ be the true proportion of men in your community over the age of
40 with hypertension. Consider the following “thought experiment”:
(a) Though you may have little or no expertise in this area, give an initial
point estimate of θ.
(b) Now suppose a survey to estimate θ is established in your community,
and of the first 5 randomly selected men, 4 are hypertensive. How does
this information affect your initial estimate of θ?
(c) Finally, suppose that at the survey’s completion, 400 of 1000 men
have emerged as hypertensive. Now what is your estimate of θ?
What guidelines for statistical inference do your answers suggest?
2. Repeat the journal publication thought problem from Subsection 1.2.1
for the situation where
(a) you have won a lottery on your first try.
(b) you have correctly predicted the winner of the first game of the World
Series (professional baseball).
3. Assume you have developed predictive distributions of the length of time
it takes to drive to work, one distribution for Route A and one for Route
B. What summaries of these distributions would you use to select a route
(a) to maximize the probability that the drive takes less than 30 minutes?
(b) to minimize your average commuting time?
4. For predictive distributions of survival time associated with two medical
treatments, propose treatment selection criteria that are meaningful to
you (or if you prefer, to society).
5. Here is an example in a vein similar to that of Example 1.2, and orig-
inally presented by Berger and Berry (1988). Consider a clinical trial
established to study the effectiveness of vitamin C in treating the com-
mon cold. After grouping subjects into pairs based on baseline variables
such as gender, age, and health status, we randomly assign one member
of each pair to receive vitamin C, with the other receiving a placebo.
We then count how many pairs had vitamin C giving superior relief
after 48 hours. We wish to test H0 : P (vitamin C better) = 1/2 versus
Ha : P (vitamin C better) ≠ 1/2.
2.1 Introduction
We begin by reviewing the fundamentals introduced in Chapter 1. The
Bayesian approach begins exactly as a traditional frequentist analysis does,
with a sampling model for the observed data y = (y1 , . . . , yn ) given a vector
of unknown parameters θ. This sampling model is typically given in the
form of a probability distribution f (y|θ). When viewed as a function of θ
instead of y, this distribution is usually called the likelihood, and sometimes
written as L(θ; y) to emphasize our mental reversal of the roles of θ and
y. Note that L need not be a probability distribution for θ given y; that is,
∫ L(θ; y) dθ need not be 1; it may not even be finite. Still, given particular
data values y, it is very often possible to find the value θ̂ that maximizes
the likelihood function, i.e.,
θ̂ = argmax_θ L(θ; y) .
This value is called the maximum likelihood estimate (MLE) for θ. This
idea dates to Fisher (1922; see also Stigler, 2005) and continues to form
the basis for many of the most commonly used statistical analysis methods
today.
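As a tiny numerical illustration (a binomial likelihood with y = 7 successes in n = 10 trials, values chosen purely for this sketch), the MLE can be found in R with a one-dimensional optimizer:
R code L <- function(theta) dbinom(7, size=10, prob=theta)    # L(theta; y)
optimize(L, interval=c(0, 1), maximum=TRUE)$maximum           # about 0.7, the MLE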
In the Bayesian approach, instead of supposing that θ is a fixed (though
unknown) parameter, we think of it as a random quantity as well. This
is operationalized by adopting a probability distribution for θ that sum-
marizes any information we have about it not related to that provided by
the data y, called the prior distribution (or simply the prior). Just as the
likelihood had parameters θ, the prior may have parameters η; these are
often referred to as hyperparameters, in order to distinguish them from the
likelihood parameters θ. For the moment we assume that the hyperparam-
eters η are known, and thus write the prior as π(θ) ≡ π(θ|η). Inference
concerning θ is then based on its posterior distribution, given by
p(θ|y) = p(y, θ) / p(y) = p(y, θ) / ∫ p(y, θ) dθ = f (y|θ)π(θ) / ∫ f (y|θ)π(θ) dθ .   (2.1)
This formula is known as Bayes’ Theorem, and first appeared (in a some-
what simplified form) in Bayes (1763). Notice the contribution of both the
experimental data (in the form of the likelihood f ) and prior opinion (in
the form of the prior π) to the posterior in the last expression of equation
(2.1). The posterior is simply the product of the likelihood and the prior,
renormalized so that it integrates to 1 (and is thus itself a valid probability
distribution).
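A minimal R sketch of (2.1) on a grid (a beta prior and binomial likelihood chosen purely for illustration) makes the "multiply and renormalize" recipe explicit:
R code theta <- seq(0.0005, 0.9995, by=0.001)     # grid over the parameter
prior <- dbeta(theta, 2, 2)                       # pi(theta)
like <- dbinom(7, size=10, prob=theta)            # f(y|theta), here y = 7 successes in 10
post <- like * prior
post <- post/sum(post * 0.001)                    # renormalize so it integrates to 1
sum(post * 0.001)                                 # approximately 1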
Readers less comfortable with the probability calculus needed to handle
the continuous variables in (2.1) may still be familiar with a discrete, set
theoretic version of Bayes’ Theorem from a previous probability or statis-
tics course. In this simpler formulation, we are given an event of interest A
and a collection of events Bj , j = 1, . . . , J that are mutually exclusive and
exhaustive (that is, exactly one of them must occur). Given the probabili-
ties of each of these events P (Bj ), as well as the conditional probabilities
P (A|Bj ), from fundamental rules of probability, we have
P (Bj |A) = P (A, Bj ) / P (A) = P (A, Bj ) / Σ_{j=1}^{J} P (A, Bj )
         = P (A|Bj )P (Bj ) / Σ_{j=1}^{J} P (A|Bj )P (Bj ) ,   (2.2)
where P (A, Bj ) indicates the joint event where both A and Bj occur; many
textbooks write P (A∩Bj ) for P (A, Bj ). The reader will appreciate that all
four expressions in (2.2) are just discrete, finite versions of the correspond-
ing expressions in (2.1), with the Bj playing the role of the parameters θ
and A playing the role of the data y.
This simplified version of Bayes’ Theorem (referred to by many textbook
authors as Bayes’ Rule) may appear too simple to be of much practical
value, but interesting applications do arise:
Example 2.1 Ultrasound tests done near the end of the first trimester of a
pregnancy are often used to predict the sex of the baby. However, the errors
made by radiologists in reading ultrasound results are not symmetric, in
the following sense: girls are virtually always correctly identified as girls,
while boys are sometimes misidentified as girls (in cases where the penis is
not clearly visible, perhaps due to the child’s position in the womb). More
specifically, a leading radiologist states that
P (test + |G) = 1 and P (test + |B) = .25 ,
where “test +” denotes that the ultrasound test predicts the child is a girl.
Thus, we have a 25% false positive rate for girl, but no false negatives.
Suppose a particular woman’s test comes back positive for girl, and we
wish to know the probability she is actually carrying a girl. Assuming 48%
of babies are girls, we can use (2.2) where “boy” and “girl” provide the
J = 2 mutually exclusive and exhaustive cases Bj . Thus, with A being the event that the test predicts a girl, equation (2.2) gives P (G|A) = (1)(.48) / [(1)(.48) + (.25)(.52)] = .48/.61 ≈ .787.
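The same arithmetic in R (a minimal sketch; the object names are ours):
R code prior <- c(girl=0.48, boy=0.52)               # P(Bj)
like <- c(girl=1.00, boy=0.25)                       # P(A|Bj), A = "test predicts girl"
post <- prior * like/sum(prior * like)               # equation (2.2)
round(post, 3)                                       # girl about 0.787, boy about 0.213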
[Figure: prior, likelihood, and posterior densities plotted against θ.]
[Figure: density versus θ, with legend: prior, posterior with n = 1, posterior with n = 10.]
Figure 2.2 Prior and posterior distributions based on samples of size 1 and 10, normal/normal model.
where we request dashed (lty=2) or dotted (lty=3) line types for the two
posteriors, since we’ve already used a solid line (lty=1) for the prior. Fi-
nally, we can add a legend to the figure by typing
R code legend(-2, 1.3, legend=c("prior", "posterior with n=1",
"posterior with n=10"), lty=1:3, ncol=1)
This completes Figure 2.2.
The R language also allows the user to draw random samples from a
wide variety of probability distributions. This is sometimes called Monte
Carlo sampling, after the city known for its famous casinos (and presumably
rarely visited by probabilists who named the technique). In the previous
example, we may draw 2000 independent random samples directly from the
posterior (say, in the n = 1 case) using the rnorm command:
R code y1 <- rnorm(2000, mean=param1[1], sd=param1[2])
## param1: [1] 4.0000000 0.7071068
A histogram of these samples can be added to our previous plot using hist
and lines as follows:
R code r1 <- hist(y1, freq=F, breaks=20, plot=F)
lines(r1, lty=2, freq=F, col="gray90")
producing Figure 2.3. We remark that hist can be used without first stor-
ing its result in r1, but this will start the plot over again, erasing the
three existing curves. The empirical mean and standard deviation of our
Monte Carlo samples may be obtained simply by typing mean(y1) and
sd(y1) in R; for our sample we obtained 4.03 and 0.693, very close to the
true values of 4.0 and 0.707, respectively. These estimates could be made
more accurate by increasing the Monte Carlo sample size (say, from 2000
to 10,000 or 100,000); we defer details about the error inherent in Monte
Carlo estimation to Section 3.3.
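For reference, here is a minimal R sketch (using the normal/normal values that appear in the WinBUGS data file below: ȳ = 6, μ = 2, σ² = 1, τ² = 1, n = 1) computing the closed-form posterior and checking it against a larger Monte Carlo sample:
R code ybar <- 6; mu <- 2; sigma2 <- 1; tau2 <- 1; n <- 1
post.mean <- (sigma2*mu + n*tau2*ybar)/(sigma2 + n*tau2)   # 4.0
post.sd <- sqrt(sigma2*tau2/(sigma2 + n*tau2))             # 0.7071
y.big <- rnorm(100000, mean=post.mean, sd=post.sd)
c(mean(y.big), sd(y.big))          # Monte Carlo error shrinks as the sample grows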
Of course, Monte Carlo methods are not necessary in this simple nor-
mal/normal example, since the integral in the denominator of Bayes’ Theo-
rem can be evaluated in closed form. In this case, we would likely prefer the
resulting smooth curve and corresponding exact answers for the posterior
mean and variance to the bumpy histogram and the estimated mean and
variance produced by Monte Carlo. However, if the likelihood f and the
prior π do not permit evaluation of the denominator integral, Monte Carlo
methods are generally the preferred method for estimating the posterior.
The reason for this is the approach’s great generality: samples can typi-
cally be drawn from any posterior regardless of how high-dimensional the
parameter vector θ is. Thus, while we no longer get a smooth, exact func-
tional form for the posterior p(θ|y), we gain the ability to work problems
of essentially unlimited complexity.
Chapter 3 contains an extensive discussion of the various computational
methods useful in Bayesian data analysis; for the time being, we give only
[Figure: density versus θ, with legend: prior, posterior with n = 1, posterior with n = 10, plus a histogram of Monte Carlo draws.]
Figure 2.3 Prior and posterior distributions for the normal/normal model, with a histogram of Monte Carlo draws superimposed for the n = 1 case.
Notice the use of <- (arrow) for assignment and ~ (tilde) for “is distributed
as,” consistent with usual notation in R and statistical practice generally.
However, note that the second parameter in the normal distribution expres-
sion dnorm is the precision (reciprocal variance), not the variance itself, a
convention that many students find confusing initially. To be consistent
with our notation from Examples 2.2 and 2.3, the code above includes
assignment statements for the precision in the data (prec.ybar) and the
prior (prec.theta) that give the appropriate conversions from the corre-
sponding variances, σ²/n and τ².
The data file in this simple problem consists of the one line
BUGS code list(ybar=6, mu=2, sigma2=1, tau2=1, n=1)
Thus the data file includes not only the data ȳ, but also the sample size n
and the hyperparameters μ, σ², and τ². Finally, the initial values file takes
the form
BUGS code list(theta = 0)
which simply initializes the sampler at 0 (though in this univariate problem
the starting place actually turns out to be arbitrary).
Running the sampler for 30,000 iterations, we obtained a smoothed kernel
density estimate plotted in Figure 2.4, which looks very similar to the
histogram in Figure 2.3. The sample mean and standard deviation of the
30,000 draws were 3.997 and 0.704, very close to the true values of 4.0 and
0.707.
Again, we do not need all this MCMC power to solve this tiny little
problem, but the benefits of WinBUGS will become very apparent as this
chapter wears on and we begin tackling models with many more unknown
parameters and more complicated distributional structures.
That is, we can obtain the posterior for the full dataset (y1 , y2 ) by first
finding p(θ|y1 ) and then treating it as the prior for the second portion of
the data y2 . This easy algorithm for updating the posterior is quite natural
when the data arrive sequentially over time, as in a clinical trial or perhaps
a business or economic setting.
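A minimal R sketch (a normal/normal model with known variance; the data values and the helper update.norm are ours, chosen purely for illustration) confirms that the sequential recipe matches updating with all the data at once:
R code mu <- 2; tau2 <- 1; sigma2 <- 1            # prior N(mu, tau2), known variance sigma2
update.norm <- function(mu, tau2, y, sigma2) {
  prec <- 1/tau2 + length(y)/sigma2               # posterior precision
  m <- (mu/tau2 + sum(y)/sigma2)/prec             # posterior mean
  c(mean=m, var=1/prec)
}
y1 <- c(5.8, 6.3); y2 <- c(6.1, 5.9, 6.4)
step1 <- update.norm(mu, tau2, y1, sigma2)        # posterior given y1
seq.post <- update.norm(step1[["mean"]], step1[["var"]], y2, sigma2)
all.post <- update.norm(mu, tau2, c(y1, y2), sigma2)
rbind(seq.post, all.post)                         # identical posteriors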
Many authors (notably Geisser, 1993) have argued that concentrating
on inference for the model parameters is misguided, since θ is merely an
unobservable, theoretical quantity. Switching to a different model for the
data may result in an entirely different θ vector. Moreover, even a perfect
understanding of the model does not constitute a direct attack on the
problem of predicting how the system under study will behave in the future
– often the real goal of a statistical analysis. To this end, suppose that yn+1
is a future observation, independent of y given the underlying θ. Then the
predictive distribution for yn+1 is given by
p(yn+1 |y) = ∫ p(yn+1 , θ|y) dθ = ∫ f (yn+1 |θ, y) p(θ|y) dθ = ∫ f (yn+1 |θ) p(θ|y) dθ ,   (2.7)
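A minimal R sketch (reusing the posterior N(4, 0.7071²) and σ² = 1 from this chapter's running normal/normal example) approximates (2.7) by composition sampling, drawing θ from the posterior and then yn+1 given θ:
R code theta.draw <- rnorm(10000, mean=4, sd=0.7071)   # draws from p(theta|y)
ynew <- rnorm(10000, mean=theta.draw, sd=1)            # draws from f(y_new|theta)
c(mean(ynew), sd(ynew))                                # near 4 and sqrt(1 + 0.5) = 1.22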
content, so that the data from the current study will be the dominant
force in determining the posterior distribution. We address each of these
approaches in turn.
frequentist software packages (because now both the prior and likelihood
information can be input as data records in the program). In the main,
however, prior elicitation issues tend to be application- and elicitee-specific,
meaning that general purpose algorithms are typically unavailable; see e.g.
Greenland (2006, 2007b) for illustration of his prior data approach in the
2 × 2 table and linear regression settings, respectively.
As with many other areas of Bayesian endeavor, the difficulty of prior
elicitation has been ameliorated somewhat through the addition of interac-
tive computing, especially dynamic graphics and object-oriented computer
languages such as R. As of the current writing, the best source of informa-
tion on up-to-date elicitation software may be www.shef.ac.uk/beep/, the
website of the BEEP (Bayesian Elicitation of Expert Probabilities) project.
BEEP is the grant-funded research effort associated with the O’Hagan et
al. (2006) book. Currently available are fairly high-level programs by Paul
Garthwaite and David Jenkinson to elicit beliefs about relationships assum-
ing a particular kind of linear model. Another, more low-level approach
called SHELF, developed by Tony O’Hagan and Jeremy Oakley, is aimed
at eliciting a single distribution in as rigorous and defensible a framework
as possible. This R software includes templates for elicitors to follow that
will both guide them through the process and also provide a record of
the elicitation. Software by John Paul Gosling called ROBEO is also avail-
able for implementing the nonparametric elicitation method of Oakley and
O’Hagan (2007) and its extensions. The group also plans extensions to mul-
tivariate settings and to allow for uncertainty in the elicited probabilities
or quantiles.
Example 2.6 In the arena of monitoring clinical trials, Chaloner et al.
(1993) show how to combine histogram elicitation, matching a functional
form, and interactive graphical methods. Following the advice of Kadane
et al. (1980), these authors elicit a prior not on θ (in this case, an unob-
servable proportional hazards regression parameter), but on corresponding
observable quantities familiar to their medically-oriented elicitees, namely
the proportion of individuals failing within two years in a population of
control patients, p0 , and the corresponding two-year failure proportion in
a population of treated patients, p1 . Writing the survivor function at time
t for the controls as S(t), under the proportional hazards model we then
have that p0 = 1 − S(2) and p1 = 1 − S(2)^exp(θ). Hence, the equation
log[− log(1 − p1 )] = θ + log[− log(1 − p0 )] (2.9)
gives the relationship between p0 , p1 , and θ.
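As a small numerical illustration of (2.9) (the values and function names here are ours, for illustration only), one can move between an elicited pair (p0 , p1 ) and θ in R:
R code theta.from.p <- function(p0, p1) log(-log(1-p1)) - log(-log(1-p0))   # equation (2.9)
p1.from.theta <- function(p0, theta) 1 - (1-p0)^exp(theta)
theta.hat <- theta.from.p(p0=0.40, p1=0.25)     # illustrative two-year failure rates
p1.from.theta(0.40, theta.hat)                  # recovers 0.25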
Since the effect of a treatment is typically thought of by clinicians as
being relative to the baseline (control) failure probability p0 , they first
elicit a best guess for this rate, p̂0 . Conditional on this modal value, they
then elicit an entire distribution for p1 , beginning with initial guesses for
the upper and lower quartiles of p1 ’s distribution. These values determine
initial guesses for the parameters μ and σ of a smooth density function that
corresponds on the θ scale to an extreme value distribution. This smooth
functional form is then displayed on the computer screen, and the elicitee
is allowed to experiment with new values for μ and σ in order to obtain
an even better fit to his or her true prior beliefs. Finally, fine tuning of the
density is allowed via mouse input directly onto the screen. The density is
restandardized after each such change, with updated quantiles computed
for the elicitee’s approval. At the conclusion of this process, the final prior
distribution is discretized onto a suitably fine grid of points (p11 , . . . , p1K ),
and finally converted to a histogram-type prior on θ via equation (2.9), for
use in computing its posterior distribution.
f (x|θ) = e^(−θ) θ^x / x! ,   x ∈ {0, 1, 2, . . .}, θ > 0.
To effect a Bayesian analysis, we require a prior distribution for θ having
support on the positive real line. A reasonably flexible choice is provided