Lectures 5
5-1
Bayesian Inference
5-2
Bayes' Theorem
∗ Although the discipline did not emerge until many years after
Thomas Bayes' death, it is based on a theorem that bears his name.
5-3
Bayesian Statistics Process
5-4
Finding the Posterior Distribution
∗ Posterior distribution
π(θ | x1, . . . , xn) = π(θ)L(θ | x1, . . . , xn) / ∫ π(θ)L(θ | x1, . . . , xn) dθ
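The posterior formula can be sketched numerically: evaluate prior × likelihood on a grid and divide by a trapezoid-rule approximation of the normalizing integral in the denominator. The uniform prior and the Bernoulli data below are illustrative assumptions, not part of the lecture.

```python
# Minimal grid sketch of the posterior formula (illustrative choices
# of prior, likelihood, and data).
def posterior_on_grid(prior, likelihood, data, grid):
    unnorm = [prior(t) * likelihood(t, data) for t in grid]
    # Trapezoid-rule approximation of the normalizing integral.
    h = grid[1] - grid[0]
    z = h * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return [u / z for u in unnorm]

prior = lambda t: 1.0                      # Uniform(0, 1) prior

def lik(t, xs):                            # Bernoulli likelihood
    p = 1.0
    for x in xs:
        p *= t if x == 1 else (1 - t)
    return p

grid = [i / 1000 for i in range(1, 1000)]  # interior grid points
post = posterior_on_grid(prior, lik, [1, 0, 1, 1, 0], grid)
```

With three successes in five trials and a flat prior, the resulting posterior is (up to grid error) a Beta(4, 3) density, peaking at θ = 0.6.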
5-5
Bayesian Statistics and Sufficiency
5-6
Prior Distributions
∗ The inferences made using Bayesian methods will depend on
both the likelihood (as in frequentist statistics) and the prior
distribution.
5-7
Conjugate Priors
Definition 5.1
Given a family F of pdf's (or pmf's) f(x | θ) indexed by a parameter
θ, a family Q of prior distributions is said to be conjugate for the
family F if the posterior distribution of θ is in the family Q for
all f ∈ F, all priors π(θ) ∈ Q, and all possible data sets x.
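A standard instance of Definition 5.1 (an illustrative example, not from the slides): the Beta family is conjugate for the Bernoulli family, since a Beta(a, b) prior combined with 0/1 data gives a Beta posterior in closed form.

```python
# Sketch of Beta-Bernoulli conjugacy: the posterior stays in the
# Beta family, with the data entering only through the counts.
def beta_bernoulli_update(a, b, data):
    """Return the Beta posterior parameters after observing 0/1 data."""
    s = sum(data)
    return a + s, b + len(data) - s

a_post, b_post = beta_bernoulli_update(2, 2, [1, 0, 1, 1, 0])
```

Because the update only adds counts, processing the data sequentially gives the same posterior as processing it in one batch.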
5-8
Non-informative Priors
5-9
Jeffreys Prior
Definition 5.3
Suppose that X ∼ fX(x | θ). The usual Fisher Information
Matrix is

I(θ) = −E[ ∂²/(∂θ ∂θᵀ) log fX(x | θ) ]

The Jeffreys Prior for θ is

π(θ) ∝ √|I(θ)|.
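As a worked instance of Definition 5.3 (an illustrative example): for the Bernoulli(θ) model the Fisher information is I(θ) = 1/(θ(1 − θ)), so the Jeffreys prior is proportional to θ^(−1/2)(1 − θ)^(−1/2), i.e. a Beta(1/2, 1/2) density up to a constant.

```python
import math

# Fisher information for a single Bernoulli(theta) observation:
# E[-d^2/dtheta^2 log f] = E[X]/theta^2 + E[1 - X]/(1 - theta)^2
# with E[X] = theta, which simplifies to 1 / (theta (1 - theta)).
def fisher_info_bernoulli(theta):
    return theta / theta**2 + (1 - theta) / (1 - theta)**2

# Unnormalized Jeffreys prior: pi(theta) proportional to sqrt(I(theta)).
def jeffreys_prior_unnormalized(theta):
    return math.sqrt(fisher_info_bernoulli(theta))
```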
5-10
Hyperparameters
5-11
Hierarchical Bayes Models
∗ In many cases we consider the hyperparameters to be random
variables with their own prior distribution. We will usually
use a non-informative, parameter-free prior, which may be
improper, at this second stage.
5-14
Bayesian Decision Theory
5-15
The Risk Function
∗ Recall that the Risk Function is the average loss when using
the Decision Rule (estimator) θ̂, for a given value of θ.
R(θ, θ̂) = E[ L(θ, θ̂(X)) | θ ]
Definition 5.4
Suppose that θ̂1(X ) and θ̂2(X ) are two possible estimators of
a parameter θ ∈ Θ. Then θ̂2 is said to be an inadmissible
estimator of θ if
R(θ, θ̂1) ⩽ R(θ, θ̂2) for every θ ∈ Θ, and
R(θ0, θ̂1) < R(θ0, θ̂2) for some θ0 ∈ Θ.
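The risk function can be approximated by Monte Carlo. As an illustrative example (not from the slides): with X1, . . . , Xn iid N(θ, 1) and squared error loss, the sample mean has risk 1/n while the single observation X1 has risk 1 for every θ, so X1 is inadmissible in the sense of Definition 5.4.

```python
import random
import statistics

# Monte Carlo approximation of R(theta, that) under squared error
# loss, for estimators of a normal mean with unit variance.
def mc_risk(estimator, theta, n=10, reps=20000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(theta, 1) for _ in range(n)]
        total += (estimator(x) - theta) ** 2
    return total / reps

risk_mean = mc_risk(statistics.fmean, theta=2.0)   # approx 1/n = 0.1
risk_first = mc_risk(lambda x: x[0], theta=2.0)    # approx 1.0
```

The same comparison holds at every θ, which is exactly what dominance requires.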
5-16
Bayes Risk and Bayes Rules
Definition 5.5
Suppose that we wish to estimate a parameter θ ∈ Θ and we
have the prior distribution π(θ). Let θ̂ be an estimator of θ with
risk function (for a specified loss function) R(θ, θ̂). The Bayes
Risk is the average risk over all possible values of θ.
RB(θ̂) = E[R(θ, θ̂)] = ∫_Θ R(θ, θ̂) π(θ) dθ
The estimator θ̂ which minimizes the Bayes risk is known as the
Bayes Rule.
5-17
Finding the Bayes Rule
Theorem 5.2
The Bayes Rule is the estimator which minimizes the posterior
expected loss.
Theorem 5.3
1. When using squared error loss, the Bayes Rule is the posterior
expected value, E(θ | x).
2. When using absolute error loss, the Bayes Rule is the poste-
rior median.
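Theorem 5.3 can be illustrated with posterior draws. Assuming a hypothetical Beta(4, 3) posterior (e.g. from a conjugate update), the Bayes rule under squared error loss is approximated by the sample mean of the draws and the Bayes rule under absolute error loss by their sample median.

```python
import random
import statistics

# Draws from an assumed Beta(4, 3) posterior.
rng = random.Random(7)
draws = [rng.betavariate(4, 3) for _ in range(50000)]

bayes_rule_sq = statistics.fmean(draws)    # posterior mean (squared error loss)
bayes_rule_abs = statistics.median(draws)  # posterior median (absolute error loss)
```

For Beta(4, 3) the posterior mean is 4/7 ≈ 0.571 and the median is slightly larger, about 0.579; the two rules differ because the posterior is skewed.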
5-18
Bayesian Hypothesis Tests
5-19
Rejection Regions For Bayesian Tests
5-20
Priors for Bayesian Hypothesis Tests
∗ Usually we will use different priors for hypothesis testing than
we do for estimation.
5-21
Posterior Odds
∗ From Bayes' Theorem we have that

P(θ ∈ Θ0 | x) = f(x | θ ∈ Θ0) P(θ ∈ Θ0) / m(x)

P(θ ∉ Θ0 | x) = f(x | θ ∉ Θ0) P(θ ∉ Θ0) / m(x)
Definition 5.6
Suppose that X is a random sample from the joint distribution
f(x; θ) and we are testing

H0 : θ ∈ Θ0 vs H1 : θ ∉ Θ0.

Then the Bayes Factor for this test is defined as

B10 = f(x | θ ∉ Θ0) / f(x | θ ∈ Θ0) = P(θ ∉ Θ0 | x) / P(θ ∈ Θ0 | x)

(the second equality holds when the prior probabilities P(θ ∈ Θ0)
and P(θ ∉ Θ0) are equal).
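As a hypothetical worked example (not from the slides): testing H0 : θ = 0.5 against H1 : θ ≠ 0.5 for Bernoulli data, with a Uniform(0, 1) prior on θ under H1, the Bayes factor is the ratio of the marginal likelihoods. Under H1 the marginal likelihood is the Beta function B(s + 1, f + 1); under H0 it is 0.5^n.

```python
import math

# Bayes factor B10 for a point null theta = 0.5 vs a uniform
# alternative, for Bernoulli data with s successes and f failures.
def bayes_factor_10(successes, failures):
    n = successes + failures
    # Marginal likelihood under H1: Beta(s + 1, f + 1) via gammas.
    marg_h1 = (math.gamma(successes + 1) * math.gamma(failures + 1)
               / math.gamma(n + 2))
    marg_h0 = 0.5 ** n                     # marginal likelihood under H0
    return marg_h1 / marg_h0

b10 = bayes_factor_10(successes=8, failures=2)   # evidence against H0
```

For 8 successes in 10 trials, B10 = 1024/495 ≈ 2.07, mild evidence for H1; for a balanced 5-of-10 sample, B10 < 1, favouring H0.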
5-23
Bayes Factors
5-24
Bayesian Interval Estimation
5-25
Bayesian Credible Sets
Definition 5.7
Suppose that X ∼ f(x | θ) and π(θ | x) is the posterior density for
θ given the observed data x. A set I(x) which satisfies
∫_{I(x)} π(θ | x) dθ = 1 − α
is called a 100(1 − α)% Bayesian credible set for θ.
5-26
Optimal Bayesian Credible Sets
∗ Of course, there are many sets I(x) which satisfy the
definition of a Bayesian credible set.
Theorem 5.4
If the posterior density π(θ | x) is unimodal then, for a given
value of α, the shortest 1 − α credible interval for θ is given by
I(x) = {θ : π(θ | x) ⩾ k}
where
∫_{θ : π(θ | x) ⩾ k} π(θ | x) dθ = 1 − α.
5-27
Highest Posterior Density Sets
Definition 5.8
Given a data set x1, . . . , xn and a posterior density π(θ | x) for
a scalar parameter θ, a Bayesian 100(1 − α)% Highest Posterior
Density set for θ is the set of points
Ck (x) = {θ : π(θ | x) ⩾ k} .
The point k is chosen such that
P(θ ∈ Ck(x) | x) = ∫_{Ck(x)} π(θ | x) dθ = 1 − α.
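An HPD set can be approximated on a grid: sort the grid points by posterior density and keep the highest-density points until they hold probability 1 − α, which implicitly lowers the threshold k. The Beta(4, 3) posterior below is an illustrative assumption.

```python
import math

# Assumed posterior: Beta(4, 3), with normalizing constant 60.
def beta43_pdf(t):
    return 60.0 * t**3 * (1 - t)**2

# Grid approximation of the HPD set C_k(x) = {theta : pi >= k}.
def hpd_set(pdf, grid, alpha):
    h = grid[1] - grid[0]
    dens = sorted(((pdf(t), t) for t in grid), reverse=True)
    mass, kept = 0.0, []
    for d, t in dens:
        mass += d * h            # accumulate probability, highest density first
        kept.append(t)
        if mass >= 1 - alpha:
            break
    return min(kept), max(kept)  # an interval, since the density is unimodal

grid = [i / 2000 for i in range(1, 2000)]
lo, hi = hpd_set(beta43_pdf, grid, alpha=0.05)
```

For a unimodal posterior the kept points form an interval, matching the shortest-interval characterization of Theorem 5.4.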
5-28
Equi-tailed Bayesian Intervals
Definition 5.9
Given a data set x1, . . . , xn and a posterior density π(θ | x) for a
scalar parameter θ, a Bayesian 100(1 − α)% equi-tailed credible
interval for θ is the interval with end-points θl (x) < θu(x) such
that
P(θ < θl(x) | x) = ∫_{−∞}^{θl(x)} π(θ | x) dθ = α/2

P(θ > θu(x) | x) = ∫_{θu(x)}^{∞} π(θ | x) dθ = α/2
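In practice the equi-tailed interval is read off as the α/2 and 1 − α/2 quantiles of the posterior, here estimated from posterior draws; the Beta(4, 3) posterior is an illustrative assumption.

```python
import random

# Equi-tailed 100(1 - alpha)% interval from the empirical quantiles
# of posterior draws.
def equi_tailed_interval(draws, alpha):
    s = sorted(draws)
    lo = s[int(alpha / 2 * len(s))]
    hi = s[int((1 - alpha / 2) * len(s)) - 1]
    return lo, hi

rng = random.Random(3)
draws = [rng.betavariate(4, 3) for _ in range(40000)]
theta_l, theta_u = equi_tailed_interval(draws, alpha=0.05)
```

For Beta(4, 3) the 95% equi-tailed interval is roughly (0.22, 0.88); because the posterior is skewed, this interval is slightly longer than the HPD interval.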
5-29
Integration in Bayesian Analysis
5-30
Monte Carlo Bayesian Analysis
∗ Due to the importance of integration, Monte Carlo methods
are widely used in Bayesian statistics.
∗ The normalizing constant in the denominator of the posterior is
often difficult to compute. For this reason, methods that do not
need this constant are often used.
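One such method (a minimal sketch, not necessarily the one the lectures develop) is the random-walk Metropolis algorithm: its acceptance ratio uses only the unnormalized posterior π(θ)L(θ | x), so the normalizing constant cancels. Targeting an unnormalized Beta(4, 3) density is an illustrative assumption.

```python
import random
import statistics

# Unnormalized target: a Beta(4, 3) density without its constant.
def unnorm_post(t):
    return t**3 * (1 - t)**2 if 0 < t < 1 else 0.0

# Random-walk Metropolis: propose theta + N(0, step), accept with
# probability min(1, target(prop) / target(theta)).  The normalizing
# constant of the posterior cancels in this ratio.
def metropolis(target, start, step, n, seed=11):
    rng = random.Random(seed)
    theta, out = start, []
    for _ in range(n):
        prop = theta + rng.gauss(0, step)
        if rng.random() < target(prop) / target(theta):
            theta = prop
        out.append(theta)
    return out

chain = metropolis(unnorm_post, start=0.5, step=0.2, n=60000)
post_mean = statistics.fmean(chain[10000:])   # discard burn-in draws
```

The retained draws approximate the Beta(4, 3) posterior, so the chain's mean should be close to 4/7 even though the code never computes the normalizing integral.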