CH 10
This chapter considers Bayesian estimation and prediction for the multiple linear regression
model in which x variables are fixed constants.
By Bayes' theorem, the posterior distribution of $\theta$ given $y$ is
$$\pi(\theta \mid y) = \frac{f(y \mid \theta)\,p(\theta)}{\int f(y \mid \theta)\,p(\theta)\,d\theta} = \frac{1}{c(y)}\,f(y \mid \theta)\,p(\theta),$$
where $c(y) = \int f(y \mid \theta)\,p(\theta)\,d\theta$ is the normalizing constant of the posterior distribution.
Bayesian inference for the model is always based on the posterior distribution π(θ|y). For
example, let q(y0 |θ) denote a prediction function for y0 given θ. Then
$$r(y_0 \mid y) = \int q(y_0 \mid \theta)\,\pi(\theta \mid y)\,d\theta$$
is the predictive distribution of $y_0$ given the data $y$.
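To make these displays concrete, here is a minimal numerical sketch; the one-parameter normal-mean model and all numbers in it are illustrative assumptions, not the chapter's regression model:

```python
import numpy as np
from scipy import stats

# Toy model (illustrative only): y_i | theta ~ N(theta, 1), prior theta ~ N(0, 1)
y = np.array([0.8, 1.3, 0.2, 1.1])
theta = np.linspace(-5.0, 5.0, 2001)   # grid over theta
dtheta = theta[1] - theta[0]

# Likelihood f(y|theta) and prior p(theta) evaluated on the grid
lik = np.prod(stats.norm.pdf(y[:, None], loc=theta, scale=1.0), axis=0)
prior = stats.norm.pdf(theta, loc=0.0, scale=1.0)

c_y = np.sum(lik * prior) * dtheta     # normalizing constant c(y)
post = lik * prior / c_y               # posterior pi(theta|y) on the grid

# Predictive density r(y0|y) = integral of q(y0|theta) * pi(theta|y) dtheta
y0 = 1.0
r_y0 = np.sum(stats.norm.pdf(y0, loc=theta, scale=1.0) * post) * dtheta
print(c_y, r_y0)
```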
Theorem 2.1. Consider the Bayesian multiple regression model, for which the prior distributions
are as specified in (1). Then the joint prior distribution is conjugate, that is, π(β, τ |y) is of the
same form as π(β, τ ).
Theorem 2.2. Consider the Bayesian multiple regression model, for which the prior distributions
are as specified in (1). The marginal posterior distribution π(β|y) is a multivariate t distribution
with parameters (n + 2α, ϕ∗ , W ∗ ), where
$$\phi^* = (V^{-1} + X'X)^{-1}(V^{-1}\phi + X'y) \tag{2}$$
and
$$W^* = \frac{(y - X\phi)'(I + XVX')^{-1}(y - X\phi) + 2\delta}{n + 2\alpha}\,(V^{-1} + X'X)^{-1}. \tag{3}$$
Theorem 2.3. Consider the Bayesian multiple regression model, for which the prior distributions
are as specified in (1). The marginal posterior distribution π(τ |y) is a gamma distribution with
parameters $\alpha + n/2$ and $(-\phi^{*\prime}V^{*-1}\phi^* + \phi'V^{-1}\phi + y'y + 2\delta)/2$, where $V^* = (V^{-1} + X'X)^{-1}$ and $\phi^* = V^*(V^{-1}\phi + X'y)$.
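As a numerical companion to Theorems 2.2 and 2.3, the following sketch computes $\phi^*$, $V^*$, $W^*$, and the posterior gamma parameters; the simulated data and the hyperparameter values are illustrative assumptions, not part of the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data and illustrative hyperparameters
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k predictors
y = X @ np.array([1.0, 0.5, -0.3, 0.8]) + rng.normal(size=n)

phi = np.zeros(k + 1)        # prior mean of beta
V = 10.0 * np.eye(k + 1)     # prior scale matrix of beta
alpha, delta = 2.0, 2.0      # gamma prior parameters for tau

Vinv = np.linalg.inv(V)
V_star = np.linalg.inv(Vinv + X.T @ X)         # V* = (V^-1 + X'X)^-1
phi_star = V_star @ (Vinv @ phi + X.T @ y)     # phi* = V*(V^-1 phi + X'y), eq. (2)

# W* from eq. (3)
r = y - X @ phi
W_star = (r @ np.linalg.solve(np.eye(n) + X @ V @ X.T, r) + 2 * delta) \
         / (n + 2 * alpha) * V_star

# Posterior of tau (Theorem 2.3): gamma with the shape and rate below
shape = alpha + n / 2
rate = (-phi_star @ np.linalg.solve(V_star, phi_star)
        + phi @ Vinv @ phi + y @ y + 2 * delta) / 2
print(phi_star, np.diag(W_star), shape, rate)
```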
It follows from Theorem 2.2 that, for any constant vector $a$,
$$\frac{a'\beta - a'\phi^*}{\sqrt{a'W^*a}} \sim t(n + 2\alpha),$$
and, as an important special case,
$$\frac{\beta_i - \phi^*_i}{\sqrt{w^*_{ii}}} \sim t(n + 2\alpha),$$
where ϕ∗i is the ith element of ϕ∗ and w∗ii is the ith diagonal element of W ∗ . Thus a Bayesian
point estimate of βi is its posterior mean ϕ∗i and a 100(1 − ω)% Bayesian credible interval for βi
is
$$\phi^*_i \pm t_{\omega/2,\,n+2\alpha}\,\sqrt{w^*_{ii}}.$$
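A short sketch of this interval; the values of $\phi^*$, the diagonal of $W^*$, $n$, and $\alpha$ are placeholders (e.g. outputs of the snippet above):

```python
import numpy as np
from scipy import stats

# Placeholder inputs: posterior mean phi*, diagonal of W*, n, alpha, level omega
phi_star = np.array([1.02, 0.48, -0.31, 0.79])
w_diag = np.array([0.021, 0.019, 0.020, 0.018])
n, alpha, omega = 50, 2.0, 0.05

t_crit = stats.t.ppf(1 - omega / 2, df=n + 2 * alpha)   # t quantile with n + 2*alpha df
lower = phi_star - t_crit * np.sqrt(w_diag)
upper = phi_star + t_crit * np.sqrt(w_diag)
print(np.column_stack([lower, upper]))   # 95% credible intervals, one row per beta_i
```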
Hypothesis Test For example, to test the hypothesis $\beta_i > \beta_{i0}$, we can calculate the probability
$$P\left(t(n + 2\alpha) > \frac{\beta_{i0} - \phi^*_i}{\sqrt{w^*_{ii}}}\right).$$
The larger this probability, the more credible the hypothesis.
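This probability is a one-line survival-function evaluation; all numerical inputs below are illustrative:

```python
import numpy as np
from scipy import stats

phi_star_i, w_ii = 0.48, 0.019   # posterior mean and scale for beta_i (illustrative)
beta_i0 = 0.0                    # hypothesized lower bound for beta_i
n, alpha = 50, 2.0

# P(beta_i > beta_i0 | y) = P(t(n + 2*alpha) > (beta_i0 - phi*_i) / sqrt(w*_ii))
prob = stats.t.sf((beta_i0 - phi_star_i) / np.sqrt(w_ii), df=n + 2 * alpha)
print(prob)
```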
Special cases of Inference First, we consider the use of a diffuse prior. Let ϕ = 0, let V be
a diagonal matrix with all diagonal elements equal to a large constant (say, $10^6$), and let $\alpha$ and $\delta$ both be equal to a small constant (say, $10^{-6}$). In this case, $V^{-1}$ is close to 0, and so $\phi^*$, the Bayesian point estimate of $\beta$ in (2), is approximately equal to the least squares estimator
$$(X'X)^{-1}X'y.$$
Similarly,
$$W^* \approx \frac{y'(I - X(X'X)^{-1}X')y}{n}\,(X'X)^{-1} = \frac{n - k - 1}{n}\,s^2\,(X'X)^{-1}.$$
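A quick check of this diffuse-prior limit on simulated data (the data and the constant $10^6$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

# Diffuse prior: phi = 0, V = 1e6 * I, so V^-1 is negligible in (2)
phi_star = np.linalg.solve(np.eye(k + 1) / 1e6 + X.T @ X, X.T @ y)
ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.max(np.abs(phi_star - ols)))   # essentially zero: phi* ~ least squares
```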
The second special case of inference is the case in which ϕ = 0 and V is a diagonal matrix
with a constant on the diagonal. Thus V = aI, where a is a positive number, and the Bayesian
estimator of β becomes
$$\left(X'X + \frac{1}{a}I\right)^{-1}X'y,$$
which is known as the ridge estimator.
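A minimal sketch of this estimator (the helper ridge_estimator and the simulated data are illustrative assumptions):

```python
import numpy as np

def ridge_estimator(X, y, a):
    """Posterior mean of beta under phi = 0, V = a*I: (X'X + I/a)^-1 X'y."""
    return np.linalg.solve(X.T @ X + np.eye(X.shape[1]) / a, X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(size=30)

# Small a => strong shrinkage toward zero; large a => close to least squares
print(ridge_estimator(X, y, a=0.1), ridge_estimator(X, y, a=1e6))
```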
Next consider inference about $\sigma^2 = 1/\tau$. By Theorem 2.3, the posterior distribution of $\sigma^2$ is inverse gamma, so the Bayesian point estimate of $\sigma^2$ (its posterior mean) is
$$\frac{(-\phi^{*\prime}V^{*-1}\phi^* + \phi'V^{-1}\phi + y'y + 2\delta)/2}{\alpha + n/2 - 1},$$
and a $100(1 - \omega)\%$ Bayesian credible interval for $\sigma^2$ is given by the $\omega/2$ and $1 - \omega/2$ quantiles of this inverse gamma distribution.
Consider the special case in which $\alpha$ and $\delta$ are both close to 0, $\phi = 0$, and $V$ is a diagonal matrix with all diagonal elements equal to a large constant. Then the Bayesian point estimator of $\sigma^2$ is approximately
$$\frac{(y'y - \phi^{*\prime}V^{*-1}\phi^*)/2}{n/2 - 1} = \frac{y'y - y'X(X'X)^{-1}X'y}{n - 2} = \frac{n - k - 1}{n - 2}\,s^2.$$
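A numerical check of this approximation on simulated data (the data and the near-diffuse hyperparameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ rng.normal(size=k + 1) + rng.normal(size=n)

# Near-diffuse prior: phi = 0, V = 1e6 * I, alpha = delta = 1e-6
alpha = delta = 1e-6
V_star = np.linalg.inv(np.eye(k + 1) / 1e6 + X.T @ X)
phi_star = V_star @ (X.T @ y)

# Exact posterior mean of sigma^2: rate / (shape - 1)
rate = (y @ y - phi_star @ np.linalg.solve(V_star, phi_star) + 2 * delta) / 2
exact = rate / (alpha + n / 2 - 1)

# Approximation from the text: (n - k - 1)/(n - 2) * s^2
rss = y @ y - y @ X @ np.linalg.solve(X.T @ X, X.T @ y)
approx = (n - k - 1) / (n - 2) * (rss / (n - k - 1))
print(exact, approx)   # nearly identical
```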