Bayes Regression
Adam N. Smith
July 31, 2021
1 Model
The standard multiple linear regression model relates a response variable yᵢ to a
k-dimensional vector of predictor variables xᵢ = (xᵢ₁, . . . , xᵢₖ) for i = 1, . . . , n.
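Written out, this is the usual normal linear model (the error specification here is the standard one implied by the conjugate analysis below):
\[
y_i = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),
\]
or, stacking the observations into an n-vector y and an n × k matrix X with rows xᵢ′,
\[
y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n).
\]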
Prior. We choose conjugate priors for (β, σ²) to ensure an analytic expression for the posterior distribution. With both β and σ² unknown, the conjugate prior is specified as p(β, σ²) = p(β|σ²)p(σ²), where β|σ² has a normal prior and σ² has a scaled inverse chi-squared prior.
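In terms of the prior mean β̄ and prior precision matrix A that appear in the posterior expressions below, this prior is typically written as follows (the hyperparameter names ν and s² are illustrative, not taken from the text above):
\[
\beta \mid \sigma^2 \sim N\!\left(\bar\beta,\ \sigma^2 A^{-1}\right), \qquad
\sigma^2 \sim \frac{\nu s^2}{\chi^2_{\nu}}.
\]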
The last line uses the fact that y′Xβ and β′Aβ̄ are both scalars, so y′Xβ = (y′Xβ)′ and β′Aβ̄ = (β′Aβ̄)′. Now write (7) as
\[
\left[\, \beta'(X'X + A)\beta - \beta'(2X'y + 2A\bar\beta) \,\right] + y'y + \bar\beta' A \bar\beta. \tag{8}
\]
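For context, the quantity being rearranged is the sum of the quadratic forms from the likelihood and the prior kernels, which appears to be what (7) denotes; expanding it and combining the scalar terms noted above gives exactly the terms collected in (8):
\[
(y - X\beta)'(y - X\beta) + (\beta - \bar\beta)'A(\beta - \bar\beta)
= \beta'(X'X + A)\beta - \beta'(2X'y + 2A\bar\beta) + y'y + \bar\beta'A\bar\beta.
\]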
Step 2. Completing the Square The matrix version of completing the square is given by:
\[
\beta' M \beta + \beta' n + p = (\beta - h)' M (\beta - h) + k,
\qquad h = -\tfrac{1}{2} M^{-1} n,
\qquad k = p - \tfrac{1}{4}\, n' M^{-1} n. \tag{9}
\]
Matching terms with (8) gives
\[
\begin{aligned}
M &= X'X + A \\
n &= -2(X'y + A\bar\beta) \\
h &= (X'X + A)^{-1}(X'y + A\bar\beta) \\
k &= -(X'y + A\bar\beta)'(X'X + A)^{-1}(X'y + A\bar\beta) \\
p &= 0,
\end{aligned}
\]
so that, writing β̃ ≡ h = (X′X + A)⁻¹(X′y + Aβ̄),
\[
-k = (X'y + A\bar\beta)'(X'X + A)^{-1}(X'y + A\bar\beta) = \tilde\beta'(X'X + A)\tilde\beta.
\]
Therefore, using the results of equations (8), (10), and (11), (7) simplifies to
\[
(\beta - \tilde\beta)'(X'X + A)(\beta - \tilde\beta) + y'y + \bar\beta'A\bar\beta - \tilde\beta'(X'X + A)\tilde\beta,
\]
where β̃ = (X′X + A)⁻¹(X′y + Aβ̄) as above.
Computing β̃ requires evaluating (X′X + A)⁻¹ (the inverse of the posterior precision), which is the inverse of a k × k matrix. This type of matrix inverse regularly appears in the computation of posterior moments, especially in Bayesian regression models. When k is large, this matrix inverse becomes more computationally demanding and can be a bottleneck in a posterior sampling routine.
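As a rough illustration of this cost (the matrix dimensions and values below are arbitrary), one can compare a dense inverse with an inverse built from the Cholesky root; chol2inv() is base R's helper for the latter, while the sections below construct the same quantity with backsolve():

set.seed(1)
k = 1000
# an arbitrary k x k symmetric positive definite "posterior precision" matrix
M = crossprod(matrix(rnorm(2 * k * k), ncol = k)) + diag(k)
system.time(Minv1 <- solve(M))            # dense inverse
system.time(Minv2 <- chol2inv(chol(M)))   # inverse via the Cholesky root
all.equal(Minv1, Minv2)                   # both approaches give the same matrix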
Rossi et al. (2005) describe the Bayesian regression model with an eye towards efficient computation. The goal of this section is to provide the necessary background information to understand their approach. We start by defining the Cholesky decomposition, which is a common method for matrix factorization.
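In brief, the facts used below are standard: if Σ is a symmetric positive definite k × k matrix, it can be factored as
\[
\Sigma = U'U,
\]
where U is an upper triangular matrix called the Cholesky root of Σ, and consequently
\[
\Sigma^{-1} = (U'U)^{-1} = U^{-1}(U')^{-1} = U^{-1}\,(U^{-1})'.
\]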
This result shows that the inverse of Σ can be computed using only the inverse of the Cholesky root U. That is, we have replaced the problem of inverting Σ with the problem of inverting U. The fact that U is upper triangular leads to faster and more numerically stable inversion methods relative to a dense matrix like Σ. The following R code uses the previous result to compute Σ⁻¹.
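A minimal sketch (the construction of Sigma here is only for illustration):

# build an arbitrary symmetric positive definite matrix Sigma
Sigma = crossprod(matrix(rnorm(20), ncol = 4))
U = chol(Sigma)                    # upper triangular Cholesky root, Sigma = U'U
IR = backsolve(U, diag(nrow(U)))   # inverse of the Cholesky root, U^{-1}
Sigmainv = crossprod(t(IR))        # IR %*% t(IR) = Sigma^{-1}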
Here IR refers to the “inverse (Cholesky) root” of Σ. Also notice that backsolve() is used in place of solve() for computing IR. While solve(U) is equivalent to backsolve(U, diag(nrow(U))), backsolve() is preferred because it recognizes the special triangular structure of U and solves the corresponding triangular systems of equations more efficiently.
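A quick check of this equivalence (the small matrix here is arbitrary):

U = chol(crossprod(matrix(rnorm(12), ncol = 3)))   # a 3 x 3 upper triangular Cholesky root
all.equal(solve(U), backsolve(U, diag(nrow(U))))   # TRUE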
We can now return to the problem of sampling from the posterior defined in
(14). Using the results of the previous section, we first write
\[
\Sigma = (X'X + A) = U'U \tag{19}
\]
and so, since Σ⁻¹ = U⁻¹(U⁻¹)′ = (IR)(IR)′,
\[
\begin{aligned}
\tilde\beta &= (X'X + A)^{-1}(X'y + A\bar\beta) \\
&= (IR)(IR)'(X'y + A\bar\beta).
\end{aligned} \tag{21}
\]
The following R code generates one draw from the posterior of β|σ².
k = length(betabar)
U = chol(crossprod(X) + A)                       # Cholesky root of the posterior precision X'X + A
IR = backsolve(U, diag(k))                       # inverse of the Cholesky root
btilde = crossprod(t(IR)) %*% (crossprod(X, y) + A %*% betabar)  # posterior mean
beta = btilde + sqrt(sigmasq) * IR %*% rnorm(k)  # draw from N(btilde, sigmasq*(X'X + A)^{-1})
Rossi et al. (2005) take this a step further. Let A = U′U and define
\[
z = \begin{pmatrix} y \\ U\bar\beta \end{pmatrix}, \qquad
W = \begin{pmatrix} X \\ U \end{pmatrix}. \tag{22}
\]
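With these definitions, the regression of z on W reproduces the posterior quantities, since
\[
W'W = X'X + U'U = X'X + A, \qquad W'z = X'y + U'U\bar\beta = X'y + A\bar\beta,
\]
so that β̃ = (W′W)⁻¹W′z. The following R code generates one draw from the posterior of β|σ² using this representation.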
k = length(betabar)
RA = chol(A)                                     # Cholesky root of the prior precision, A = RA'RA
W = rbind(X, RA)                                 # augmented design matrix
z = c(y, RA %*% betabar)                         # augmented response
IR = backsolve(chol(crossprod(W)), diag(k))      # inverse Cholesky root of W'W = X'X + A
btilde = crossprod(t(IR)) %*% crossprod(W, z)    # posterior mean
beta = btilde + sqrt(sigmasq) * IR %*% rnorm(k)  # draw from N(btilde, sigmasq*(X'X + A)^{-1})
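As a usage sketch, the draw above can be embedded in a small simulated example (the data, prior values, and fixed sigmasq below are arbitrary choices for illustration):

set.seed(42)
n = 100; k = 3
X = cbind(1, matrix(rnorm(n * (k - 1)), ncol = k - 1))   # design matrix with an intercept
y = drop(X %*% c(1, -2, 0.5) + rnorm(n))                 # simulated responses
betabar = rep(0, k)                                      # prior mean
A = 0.01 * diag(k)                                       # diffuse prior precision
sigmasq = 1                                              # conditioning value for sigma^2
RA = chol(A)
W = rbind(X, RA)
z = c(y, RA %*% betabar)
IR = backsolve(chol(crossprod(W)), diag(k))
btilde = crossprod(t(IR)) %*% crossprod(W, z)            # posterior mean, close to c(1, -2, 0.5)
beta = btilde + sqrt(sigmasq) * IR %*% rnorm(k)          # one draw of beta | sigmasq, y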
References
Rossi, P. E. (2019). bayesm: Bayesian Inference for Marketing/Micro-Econometrics. R package version 3.1-4.
Rossi, P. E., Allenby, G. M., and McCulloch, R. (2005). Bayesian Statistics and
Marketing. John Wiley & Sons.