Normal Linear Regression Model via Gibbs Sampling; Gibbs Sampler Diagnostics
As mentioned previously, conjugate priors may be overly restrictive in many Bayesian applications. Here
we use a popular combination of independent priors for the regression model: a normal prior for β and an
inverse-gamma prior for σ².
The enhanced flexibility in prior modeling comes at the price of abandoning analytical results for the
posterior distribution. Instead, we will use posterior simulation via Gibbs Sampling to obtain draws from
the joint and marginal posteriors.
The structural model is the same as for the CLRM with conjugate priors.
p(y \mid \theta, X) = (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta) \right) \qquad (1)
p(\beta, \sigma^2) = p(\beta)\, p(\sigma^2), \quad \text{where} \quad \beta \sim n(\mu_0, V_0), \ \ \sigma^2 \sim ig(v_0, \tau_0)

p(\beta) = (2\pi)^{-k/2} |V_0|^{-1/2} \exp\left( -\tfrac{1}{2} (\beta - \mu_0)' V_0^{-1} (\beta - \mu_0) \right) \qquad (2)

p(\sigma^2) = \frac{\tau_0^{v_0}}{\Gamma(v_0)} (\sigma^2)^{-(v_0 + 1)} \exp\left( -\frac{\tau_0}{\sigma^2} \right), \quad \text{with} \quad E(\sigma^2) = \frac{\tau_0}{v_0 - 1}, \quad V(\sigma^2) = \frac{\tau_0^2}{(v_0 - 1)^2 (v_0 - 2)}
Note that σ² does not enter the prior density of β. As mentioned before, amongst the many possible
parameterizations of the inverse-gamma (ig) density, we choose the form given in Gelman et al. (2004),
where v₀ is the shape parameter and τ₀ is the scale parameter. For the density to have a well-defined mean
we need v₀ > 1, and for a well-defined variance we need v₀ > 2.
Combining the priors with the likelihood, and dropping all terms that are multiplicatively unrelated to our
parameters of interest yields the posterior kernel
p(\beta, \sigma^2 \mid y, X) \propto (\sigma^2)^{\frac{-n - 2v_0 - 2}{2}} \exp\left( -\frac{1}{2\sigma^2} (2\tau_0) \right) \exp\left( -\frac{1}{2} \left[ \frac{1}{\sigma^2} (y - X\beta)'(y - X\beta) + (\beta - \mu_0)' V_0^{-1} (\beta - \mu_0) \right] \right) \qquad (3)
We first aim to find the posterior density for β, conditional on σ² (i.e. treating σ² as a constant). Thus,
we first focus on the components of the posterior kernel that cannot be multiplicatively separated from
β. This leaves

p(\beta \mid \sigma^2, y, X) \propto \exp\left( -\frac{1}{2} \left[ \frac{1}{\sigma^2} (y - X\beta)'(y - X\beta) + (\beta - \mu_0)' V_0^{-1} (\beta - \mu_0) \right] \right) \qquad (4)
Note the conditioning on σ² on both sides of (4). Using the same algebraic manipulations and
reasoning as for the previous model, we obtain:

\beta \mid \sigma^2, y, X \sim n(\mu_1, V_1) \quad \text{with} \quad V_1 = \left( V_0^{-1} + \tfrac{1}{\sigma^2} X'X \right)^{-1} \quad \text{and} \quad \mu_1 = V_1 \left( V_0^{-1} \mu_0 + \tfrac{1}{\sigma^2} X'y \right) \qquad (5)
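For concreteness, a draw from (5) can be generated in a few lines of Matlab. This is only an illustrative sketch, not code from the course scripts; the variable names (y, X, mu0, V0, sig2) are assumptions, with sig2 holding the current value of σ².

    % Draw beta | sigma^2, y, X from n(mu1, V1) as in (5)
    k    = size(X, 2);
    V1   = inv(inv(V0) + (1/sig2) * (X' * X));          % conditional posterior variance
    mu1  = V1 * (inv(V0) * mu0 + (1/sig2) * (X' * y));  % conditional posterior mean
    beta = mu1 + chol(V1, 'lower') * randn(k, 1);        % one draw from n(mu1, V1)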
To derive the conditional posterior density for σ², we return to our original form for the joint posterior
given in (3). Ignoring terms that are not related to σ², we have

p(\sigma^2 \mid \beta, y, X) \propto (\sigma^2)^{\frac{-n - 2v_0 - 2}{2}} \exp\left( -\frac{1}{2\sigma^2} \left[ 2\tau_0 + (y - X\beta)'(y - X\beta) \right] \right) \qquad (6)
Comparing this expression to the kernel of the ig prior in (2), we recognize this as the kernel of another ig
density. Specifically:
\sigma^2 \mid \beta, y, X \sim ig(v_1, \tau_1) \quad \text{with} \quad v_1 = \frac{2v_0 + n}{2} \quad \text{and} \quad \tau_1 = \tau_0 + \frac{(y - X\beta)'(y - X\beta)}{2} \qquad (7)
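A draw from (7) can then be obtained by exploiting the fact that the reciprocal of a gamma variate with shape v₁ and scale 1/τ₁ follows an ig(v₁, τ₁) distribution in our parameterization. Again, a minimal sketch with assumed variable names (y, X, beta, v0, tau0, n); gamrnd requires Matlab's Statistics Toolbox.

    % Draw sigma^2 | beta, y, X from ig(v1, tau1) as in (7)
    e    = y - X * beta;               % residuals at the current beta draw
    v1   = (2*v0 + n) / 2;             % posterior shape
    tau1 = tau0 + 0.5 * (e' * e);      % posterior scale
    sig2 = 1 / gamrnd(v1, 1/tau1);     % inverse-gamma draw via reciprocal gamma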
Gibbs Sampler
The Gibbs Sampler (GS) has become the "workhorse" of Bayesian posterior simulation in recent years.
The general idea is simple: break the joint posterior into conditional posteriors for which the analytical
form of the density is known. Then sample sequentially and repeatedly from these conditionals. After a
number of draws, the joint sequence of conditional draws will converge to the desired joint posterior
density for all parameters. In addition, each individual sequence can be interpreted as draws from the
marginal posterior for a given parameter. The GS is an example of a "Markov Chain Monte Carlo", or MC²,
procedure.
In the simplest case, we split the full parameter vector θ into two components, θ₁ and θ₂. In more complex
applications, we may want to split θ into 3, 4, or even more components. The key notion is that we know
the analytical form of the resulting full conditional posterior distributions, i.e.

p(\theta_1 \mid y, \theta_2) \quad \text{and} \quad p(\theta_2 \mid y, \theta_1) \qquad (3.8)
All we need to get the GS started is an initial value for θ₂, call it θ₂⁰. This can be chosen arbitrarily, or
one can use OLS results or results from previous analyses. We assume this starting value comes directly
from the marginal posterior p(θ₂ | y). Next, we draw θ₁ conditional on θ₂⁰ from p(θ₁ | y, θ₂⁰); call this
draw θ₁¹. Next, we draw another value of θ₂ conditional on θ₁¹ from p(θ₂ | y, θ₁¹); call this draw θ₂¹.
We repeat this process R times. In essence, we use the basic rule of conditional probabilities, i.e.

p(\theta \mid y) = p(\theta_1 \mid y, \theta_2)\, p(\theta_2 \mid y) = p(\theta_2 \mid y, \theta_1)\, p(\theta_1 \mid y) \qquad (3.9)

over and over again. As a caveat we should note that, naturally, there is no guarantee that our starting
value θ₂⁰ really came from the marginal posterior p(θ₂ | y). However, under relatively weak conditions
(see Koop Ch. 4 for details) the starting value(s) will not matter and the GS will indeed converge to draws
from p(θ | y). To ensure that the effect of the starting value has truly "faded away", we usually discard
the first r₁ draws of the sequence, and keep only the remaining r₂ = R − r₁ draws. The discarded draws
are often referred to as "burn-ins".
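Putting the two conditional draws together yields the full sampler. The sketch below is a stripped-down two-block Gibbs Sampler for the normal regression model with independent priors; it is not taken from the course scripts, the data and prior settings (y, X, mu0, V0, v0, tau0) are assumed to be in memory, and the number of draws, burn-ins, and starting values are arbitrary choices for illustration.

    % Two-block Gibbs Sampler: beta | sigma^2 and sigma^2 | beta
    [n, k] = size(X);
    R  = 10000;  r1 = 1000;                      % total draws and burn-ins
    beta = zeros(k, 1);  sig2 = 1;               % arbitrary starting values
    betadraws = zeros(R, k);  sig2draws = zeros(R, 1);
    V0inv = inv(V0);                             % prior precision, computed once
    for r = 1:R
        % block 1: beta | sigma^2, y, X as in (5)
        V1   = inv(V0inv + (1/sig2) * (X' * X));
        mu1  = V1 * (V0inv * mu0 + (1/sig2) * (X' * y));
        beta = mu1 + chol(V1, 'lower') * randn(k, 1);
        % block 2: sigma^2 | beta, y, X as in (7)
        e    = y - X * beta;
        sig2 = 1 / gamrnd(v0 + n/2, 1 / (tau0 + 0.5 * (e' * e)));
        betadraws(r, :) = beta';  sig2draws(r) = sig2;
    end
    betadraws = betadraws(r1+1:end, :);          % discard burn-ins
    sig2draws = sig2draws(r1+1:end);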
The derivation of moments from this simulated posterior is accomplished via Monte Carlo Integration (MCI).
For example, the analytical expressions for the mean (or expectation) and variance of a given element of
β, say β_j, are given by

E(\beta_j) = \int \beta_j \, p(\beta_j \mid y, X)\, d\beta_j, \qquad V(\beta_j) = \int \left( \beta_j - E(\beta_j) \right)^2 p(\beta_j \mid y, X)\, d\beta_j

V(\beta_j) = E(\beta_j^2) - \left( E(\beta_j) \right)^2, \quad \text{where} \quad E(\beta_j^2) = \int \beta_j^2 \, p(\beta_j \mid y, X)\, d\beta_j
Also, the expectation and variance of any other function of β_j, say g(β_j), take the form

E\left( g(\beta_j) \right) = \int g(\beta_j)\, p(\beta_j \mid y, X)\, d\beta_j, \qquad V\left( g(\beta_j) \right) = \int \left( g(\beta_j) - E(g(\beta_j)) \right)^2 p(\beta_j \mid y, X)\, d\beta_j
For any of these expressions, MCI approximates the integral with averaging over draws. Thus

E(\beta_j) \approx \frac{1}{R} \sum_r \beta_{j,r}, \qquad E(\beta_j^2) \approx \frac{1}{R} \sum_r \beta_{j,r}^2, \qquad E\left( g(\beta_j) \right) \approx \frac{1}{R} \sum_r g(\beta_{j,r})
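In code, MCI amounts to simple averages over the retained draws. A minimal sketch, assuming the matrix betadraws from the sampler sketch above (one row per retained draw, one column per coefficient):

    % Posterior moments via Monte Carlo Integration
    postmean = mean(betadraws)';                   % E(beta_j | y, X) for all j
    postvar  = mean(betadraws.^2)' - postmean.^2;  % V(beta_j) = E(beta_j^2) - E(beta_j)^2
    poststd  = sqrt(postvar);
    % expectation of an arbitrary function g(.), e.g. g(beta_j) = exp(beta_j)
    Eg = mean(exp(betadraws(:, 2)));               % hypothetical example for the 2nd coefficient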
Convergence Plots
Convergence plots are a simple visual tool to examine if the simulator has converged to the posterior
distribution for a given parameter. The plot simply shows all draws of a given parameter in chronological
order, as generated by the sampler. "Convergence" usually implies that the draws are tightly clustered
around a flat line (the posterior mean), and do not wander widely. This is similar to assessing stationarity
for time series data.
Convergence plots can be used to assess the sensitivity of the algorithm to starting draws (the chain
should converge to the same distribution under different starting draws), the speed of convergence (and
thus the efficiency of the sampler), and the sufficiency of the chosen number of discarded draws (burn-
ins).
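A convergence plot is simply the sequence of draws plotted in iteration order. A minimal sketch, again assuming the betadraws matrix from the sampler sketch above:

    % Convergence (trace) plot for one parameter
    j = 2;                                   % pick a coefficient to inspect (assumption)
    plot(betadraws(:, j));
    xlabel('iteration');  ylabel('draw');
    title('Convergence plot for \beta_j');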
Script mod2_convergence_plots provides a few examples using the simulated data from mod2s1a.
There are three parts: Full data, data truncated to 100 observations, and data truncated to 10 observations.
For each case, we first use two sets of starting draws for β and σ². The first set comes from the OLS
output and is thus "right on target", i.e. close to the posterior mean (and the true parameter values
underlying our simulated data). The second set is deliberately located quite far from the true values.
You can see that with the full data set of 10,000 observations, the choice of starting draws virtually
doesn't matter – the chain essentially converges after one or two iterations. Furthermore, parameter draws
fluctuate very tightly around the posterior mean. A reduction in sample size implicitly assigns more
weight to our diffuse priors. As a result, the chain exhibits more "noise", i.e. wider fluctuations around
the mean. This will translate into a larger posterior standard deviation. However, even with just a
handful of observations, convergence is virtually immediate even with off-target starting draws. This is
an indication that the sampler itself is efficient, i.e. "mixes rapidly".
Posterior noise (i.e. the posterior standard deviation) is also driven by the information content of the data,
regardless of sample size. Highly collinear data implies poor information content and will generate wider
fluctuations around the posterior mean. This is illustrated in script mod2_convergence_plots2. If
you compare the plots from this script to the ones from before for any parameter and sample size, you will
notice the increase in variability around the posterior mean for the collinear data.
As in the conjugate prior case, it is instructive to plot and compare the prior and posterior distributions
for our parameters. This is accomplished in script mod2_plots. As before, the posterior densities are
much tighter than the priors in all cases.
Matlab script mod2_application implements the normal linear regression model with independent
priors using Mroz's (1987) labor data and provides posterior plots for selected parameters.
Numerical Standard Error (nse)

The nse captures simulation noise for a value of interest generated via posterior simulation. Usually this
value of interest is a measure of central tendency, such as the posterior mean. Consider the mean

\bar{\theta} = \frac{1}{M} \sum_{m=1}^{M} \theta_m

of a sequence of M draws of (some generic) parameter θ. Assume these draws were generated by a Gibbs
Sampler (GS) or some other Markov Chain Monte Carlo (MC²) algorithm. If these draws were perfectly
independently and identically distributed (i.i.d.) with sample variance s², we could quickly derive the nse
for this mean using the basic formula for the standard error of a sample mean, i.e.

nse(\bar{\theta}) = \sqrt{V(\bar{\theta})} = \sqrt{\frac{1}{M^2}\, M s^2} = \frac{s}{\sqrt{M}} \qquad (1)
However, as with all MC² procedures, we have to assume ex ante that these sequential draws will have a
considerable degree of correlation. This means we have to consider all covariance terms between all
draws of θ. After some straightforward analytical simplifications (shown in detail in KPT, p. 145), we
end up with a more general expression for the nse that allows for correlation across all draws:

nse(\bar{\theta}) = \sqrt{V(\bar{\theta})} = \sqrt{ \frac{s^2}{M} \left( 1 + 2 \sum_{j=1}^{M-1} \left( 1 - \frac{j}{M} \right) \rho_j \right) } \qquad (2)
where ρ_j = s_j / s² is the lag-correlation between some draw θ_i and the draw that was obtained j iterations
prior to θ_i, i.e. θ_{i−j}, with s_j denoting the associated sample covariance. It can easily be seen from (2)
that under perfect independence (ρ_j = 0 for all j) we arrive again at the basic formula given in (1). In most
MC² applications, lag-correlations will be positive but declining in magnitude. Thus, the second term
under the square root in (2) will exceed 1, and the nse under correlation will be larger than the nse under
independence. Note that in either case the nse can be made arbitrarily small by increasing M, the number
of draws. However, this can be very costly in terms of computation time. Thus, we are always interested
in devising a posterior sampler that is "efficient" in the sense of generating draws with low lag-
correlation. There are many "tricks" for accomplishing this – we'll touch upon a few in this course.
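The following sketch computes the correlation-adjusted nse in (2) for a single parameter. In practice the sum over lags is truncated once the ρ_j have died out; the cut-off L below is an arbitrary choice, and the draws are assumed to sit in the betadraws matrix from the sampler sketch above.

    % Numerical standard error allowing for lag-correlation, as in (2)
    theta = betadraws(:, j);                   % retained draws of one parameter (assumption)
    M   = length(theta);
    s2  = var(theta);
    L   = 100;                                 % lag truncation (assumption)
    rho = zeros(L, 1);
    for lag = 1:L
        c = cov(theta(1+lag:end), theta(1:end-lag));
        rho(lag) = c(1, 2) / s2;               % lag-correlation rho_j = s_j / s^2
    end
    w   = 1 - (1:L)' / M;                      % weights (1 - j/M)
    nse = sqrt( (s2/M) * (1 + 2 * sum(w .* rho)) );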
In summary, the first purpose of the nse is to provide a measure of “simulation error” or “simulation
noise” surrounding a posterior construct of interest, usually the posterior mean of a given parameter.
Thus, the nse has a similar function to the standard error (s.e.) in Classical Analysis. However, its
intuition is very different – it simply captures simulation noise, i.e. the penalty for having to approximate
the joint posterior via simulations (since its analytical form is unknown). The Classical s.e. conveys the
notion of sampling error – i.e. the variability of the statistical construct of interest under (hypothetical) re-
sampling. You can also think of this as the penalty from working with a small sample relative to a large
population.
The second purpose of the nse is to provide a summary measure of the "efficiency" of the
posterior simulator. An efficient simulator will have low correlation across draws, and thus will be able
to "tell the same story with fewer draws" – saving valuable computing time. As discussed in Chib (2001,
section 3.2), the ratio of the squared nse under correlation over the squared nse under independence can be
interpreted as an "inefficiency factor" (IEF), also known as "autocorrelation time", for a given parameter, i.e.

IEF(\bar{\theta}) = \frac{nse^2(\bar{\theta})}{nse^2(\bar{\theta};\ \rho_j = 0\ \forall j)} = 1 + 2 \sum_{j=1}^{M-1} \left( 1 - \frac{j}{M} \right) \rho_j \qquad (3)
A well-designed posterior simulator will generate sequences of parameter draws with low IEFs, the ideal
being an IEF close to 1. Sometimes the inverse of the IEF is used to measure posterior efficiency. This
quantity is called "numerical efficiency" (see Geweke, 1992). A third quantity to assess efficiency is the
"i.i.d.-equivalent number of iterations", labeled M* in the following, i.e. the number of i.i.d. draws that
contain the same amount of information about θ as the observed number of draws under correlation. It is
easily derived via

\frac{s^2}{M^*} = \frac{s^2}{M} \left( 1 + 2 \sum_{j=1}^{M-1} \left( 1 - \frac{j}{M} \right) \rho_j \right) \ \rightarrow\ M^* = \frac{M}{1 + 2 \sum_{j=1}^{M-1} \left( 1 - \frac{j}{M} \right) \rho_j} = \frac{M}{IEF} \qquad (4)
It follows that under perfect efficiency, we have M * = M , but usually we observe M * < M .
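Building on the nse sketch above (which left s2, M, and nse in memory), the IEF in (3) and the i.i.d.-equivalent number of draws in (4) follow in a few lines:

    % Inefficiency factor and i.i.d.-equivalent number of draws
    nse_iid = sqrt(s2 / M);          % nse under independence, as in (1)
    IEF     = nse^2 / nse_iid^2;     % inefficiency factor, as in (3)
    Mstar   = M / IEF;               % i.i.d.-equivalent draws, as in (4)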
It should be noted that a very high IEF (very low M*) can be indicative of identification problems in your
model. How high is "very high"? From personal experience, I would say that IEFs in the 1-5 range
indicate good efficiency, in the 6-20 range they're still "tolerable", but anything above 20 deserves closer
inspection. Certainly, IEFs of 100 and higher are almost a sure-bet indication of an identification or
specification problem in the underlying structural model.
Another important sampler diagnostic is Geweke's (1992) CD score. It is based on the simple intuition
that if the entire sequence of retained θ's (focusing on a single parameter for simplicity) can truly be
interpreted as random draws from the same posterior density p(θ | y), and we divide the sequence of R
draws into three segments, then the mean of the first segment of r = 1…R₁ draws should be "not too different"
from the mean of the last segment of r = R₂+1…R draws. As stated in Koop Ch. 4, setting R₁ = 0.1R and
R₂ = 0.6R produces adequate results for most applications. Define the two segment means as θ̄₁ and θ̄₂,
respectively. Then, asymptotically, the difference between these means, weighted by their respective
numerical standard errors, converges to a standard normal ("z") variate, i.e.
CD = \frac{\bar{\theta}_1 - \bar{\theta}_2}{\sqrt{nse_1^2 + nse_2^2}} \ \overset{a}{\sim}\ n(0, 1) \qquad (4.5)
Thus, a CD value that clearly exceeds 1.96 for a specific parameter θ would raise a flag – it indicates that
the sequence of posterior draws may not have converged to p(θ | y). In practice, a few CD values in
the 2-2.5 range for a model with many parameters would hardly raise concerns. However, if your
posterior simulator generates CD values of 3 or higher, an increase in the number of burn-in draws may
be warranted. As with IEFs, grossly inflated CD values may also indicate identification and/or mis-
specification problems in the underlying model.
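A sketch of the CD computation for one parameter is given below, reusing the theta vector and M from the nse sketch above. For brevity it uses the simple i.i.d. nse for the two segments; in practice the correlation-adjusted nse from (2) should be used.

    % Geweke CD score: compare the first 10% and the last 40% of retained draws
    R1   = floor(0.1 * M);   R2 = floor(0.6 * M);
    seg1 = theta(1:R1);      seg2 = theta(R2+1:end);
    nse1 = sqrt(var(seg1) / length(seg1));   % i.i.d. nse of first segment (simplification)
    nse2 = sqrt(var(seg2) / length(seg2));   % i.i.d. nse of last segment (simplification)
    CD   = (mean(seg1) - mean(seg2)) / sqrt(nse1^2 + nse2^2);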
The Matlab function "klausdiagnostics" automatically generates the following posterior statistics for
all model parameters: posterior mean, posterior standard deviation, nse, IEF, M*, and CD. This is
illustrated in our application of the normal regression model with independent priors. The Matlab
function "klausdiagnostics_greater0" produces, in addition, the posterior probability for a
parameter to exceed zero. This conveys a quick picture as to where the bulk of the posterior is located vis-
à-vis zero. The classically trained reader can relate to this quite well, as it provides "similarly flavored"
information to the standard t-statistic or p-value in a typical regression output.
There are two additional noteworthy diagnostics tools: autocorrelation plots (“AC plots”) and re-running
the posterior sampler with different starting values, as implemented in Session 6 of this course. An AC
plot provides a simple visual inspection of the lag-correlation terms (ρ_j in (2)) for a given parameter.
An example for AC plots is provided in the Matlab script mod2_ac_plots. A "well-behaved" AC plot
will exhibit small correlation effects that randomly fluctuate around zero. A plot with high (usually
positive) correlations that taper off only very slowly with increasing lag would be indicative of
inefficiencies in the posterior simulator.
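An AC plot can be generated directly from the lag-correlations ρ_j computed in the nse sketch above (or, alternatively, with the autocorr function from Matlab's Econometrics Toolbox):

    % Autocorrelation ("AC") plot for one parameter
    stem(1:L, rho, 'filled');
    xlabel('lag j');  ylabel('\rho_j');
    title('AC plot');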
Blocking
As discussed in previous Sessions of this course, a Gibbs Sampler operates by splitting the full set of
parameters into different groups or "blocks", which are then drawn sequentially and repeatedly,
conditional on all other blocks.
The main consideration in designing these blocks for a standard Gibbs Sampler is that the conditional
posterior density for each block is known, else we wouldn't be able to take any draws from it. However,
this requirement can be relaxed by employing other posterior simulation techniques, such as the
Metropolis Hastings (MH) algorithm, which does not require full knowledge of the conditional posterior
density for a given parameter or block of parameters. Thus, the "optimal blocking" of the full set of
parameters becomes a more general question.
As discussed inter alia in Chib (2001, section 7.1) it is generally recommended that parameters be drawn
in as few blocks as possible, and that parameters that tend to be highly correlated be collected in the same
block.
Identifying what constitutes a full-fledged block in a given posterior algorithm can be tricky. This is
because blocks can be combined by the method of composition, i.e. by exploiting the fact that partially
conditional posterior distributions may be known (or can be approximated at low computational cost) for
some parameters. For example, consider an initial blocking of the full parameter vector into three groups,
θ₁, θ₂, and θ₃. A "naïve" posterior sampler will then operate as follows:

1. Draw θ₁ from p(θ₁ | θ₂, θ₃, y)
2. Draw θ₂ from p(θ₂ | θ₁, θ₃, y)
3. Draw θ₃ from p(θ₃ | θ₁, θ₂, y)
Now suppose the partially conditional density p(θ₁ | θ₃, y) is known or can be approximated at low
computational cost. A (likely) more efficient posterior sampler would then collect θ₁ and θ₂ in a single
block and operate as follows:

1. Draw θ₁ from p(θ₁ | θ₃, y), then draw θ₂ from p(θ₂ | θ₁, θ₃, y)
2. Draw θ₃ from p(θ₃ | θ₁, θ₂, y)
Thus, even though step 1 involves 2 sub-steps, it is considered a single block. The key notion in
identifying the number of blocks is that for a set of parameters to constitute a self-standing block, it needs
to be drawn conditional on all other blocks, and vice versa, i.e. all other blocks also need to be
conditioned on the first block.
For the normal regression model we have considered so far, the parameter blocking was quite obvious
and, as it turns out, efficient: We grouped the constant term and all slope parameters into a single block
(β), which left the regression variance σ² as the only other (single-parameter) block. Our GS proceeded
as follows:

1. Draw β from p(β | σ², y, X)
2. Draw σ² from p(σ² | β, y, X)
Matlab script mod2_blocking proposes a different approach based on 4 blocks. Specifically, we split β
into three parts of equal length, labeled β₁, β₂, and β₃. We then proceed as follows, using function
gs_normal_blocked:

1. Draw β₁ from p(β₁ | β₂, β₃, σ², y, X)
2. Draw β₂ from p(β₂ | β₁, β₃, σ², y, X)
3. Draw β₃ from p(β₃ | β₁, β₂, σ², y, X)
4. Draw σ² from p(σ² | β, y, X)
In practice, draws of β_j, j = 1…3, can be obtained as follows. We know that for the basic linear regression model

y = X\beta + \varepsilon \qquad (6)

the full conditional posterior of β is

\beta \mid \sigma^2, y, X \sim n(\mu_1, V_1) \quad \text{with} \quad V_1 = \left( V_0^{-1} + \tfrac{1}{\sigma^2} X'X \right)^{-1} \quad \text{and} \quad \mu_1 = V_1 \left( V_0^{-1} \mu_0 + \tfrac{1}{\sigma^2} X'y \right) \qquad (7)
Now partition X and β into three parts corresponding to our new blocking, i.e.

y = X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + \varepsilon \qquad (8)

Conditional on β₂ and β₃, define the partial residual ỹ = y − X₂β₂ − X₃β₃, so that we can write

\tilde{y} = X_1\beta_1 + \varepsilon \qquad (9)

Applying (7) to this conditional model yields

\beta_1 \mid \beta_2, \beta_3, \sigma^2, y, X \sim n(\mu_1, V_1) \quad \text{with} \quad V_1 = \left( V_{01}^{-1} + \tfrac{1}{\sigma^2} X_1'X_1 \right)^{-1} \quad \text{and}

\mu_1 = V_1 \left( V_{01}^{-1} \mu_{01} + \tfrac{1}{\sigma^2} X_1'\tilde{y} \right) = V_1 \left( V_{01}^{-1} \mu_{01} + \tfrac{1}{\sigma^2} X_1'(y - X_2\beta_2 - X_3\beta_3) \right) \qquad (10)

where μ₀₁ and V₀₁ are the prior mean and variance for β₁. Draws of β₂ and β₃ can be obtained in
analogous fashion.
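As a sketch, a single draw of β₁ according to (10) could look as follows, where X1, X2, X3, mu01, and V01 are assumed to hold the partitioned regressors and the corresponding prior mean and variance (these names are illustrative, not necessarily those used in gs_normal_blocked):

    % Draw beta1 | beta2, beta3, sigma^2, y, X as in (10)
    ytilde = y - X2*beta2 - X3*beta3;                          % partial residual
    V1     = inv(inv(V01) + (1/sig2) * (X1' * X1));
    mu1    = V1 * (inv(V01) * mu01 + (1/sig2) * (X1' * ytilde));
    beta1  = mu1 + chol(V1, 'lower') * randn(size(X1, 2), 1);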
The posterior output clearly shows efficiency losses compared to the original version, as judged by IEF
and M* scores. This is depicted graphically in script mod2_ac_plots, which compares autocorrelation
plots for the original and the "excessively-blocked" version for selected parameters.
Also note the loss in speed – the inefficient sampler takes about twice as long to take the same number of
draws.
We can also compare the relative performance of the two samplers based on convergence plots, as
implemented in script mod2_convergence_plots3. The plotted chain of draws for "age" is especially
illustrative: the chain wanders widely, with a clear auto-correlation pattern. The main drawback of such a
highly correlated chain is that it takes much longer to "visit" the entire posterior distribution with
appropriate frequencies. This may result in misleading posterior inference, based on overly tight or
otherwise "incomplete" distributions. (In fact, the posterior standard deviations flowing from the
inefficient sampler are actually slightly smaller than those generated by the efficient sampler.)
As you may expect, the choice of priors becomes especially important under small sample sizes, where
the data will be much less dominant in shaping the posterior distribution.
As mentioned previously, this potentially pronounced effect of prior distributions and parameters on
"final results" is both a curse and a blessing. It requires great care in prior selection, but it also allows for
the introduction of pertinent information that is exogenous to the (small) data at hand.
With our diagnostic tools in hand, we are now well-equipped to take a closer look at the role of priors in
small-sample applications.
Matlab script mod2_wetlands applies the normal linear regression model with independent priors to a
small data set of 12 observations as used in Moeltner and Woodward (2009).
This is an example of a meta-dataset, compiled from results and summary statistics reported in existing
studies. This secondary data set is then processed via meta-regression to yield insights into potential
outcomes associated with a new policy site or setting. This strategy is commonly referred to as "Benefit
Transfer", and constitutes a low-cost alternative to primary data collection.
For our purposes, think of this application simply as a regression model with a very small sample. The
outcome variable of interest is the average willingness to pay (per year) across a group of residents (“sub-
population”) to preserve a specific wetland area. Explanatory variables are the percentage of active
wetland users in the sub-population, average annual household income (in log form), and wetland size (in
1000 acres).
Script mod2_wetlands estimates the model using our efficient 2-block Gibbs Sampler and vague priors
for all parameters. The posterior output suggests convergence (based on CD scores) and good efficiency
(based on IEF scores). As expected, the posterior standard deviations are relatively large (say, compared
to the posterior mean), a direct effect of the vague priors.
Script mod2_wetlands2 implements the same model with informed priors based on estimated
parameters reported in the literature, as described in Moeltner and Woodward (2009). Again, the
diagnostics point at convergence and good efficiency. The posterior standard deviations are smaller than
in the original model, which is unambiguously desirable.
However, the posterior means have changed as well, which leads to changed inference on expected
marginal effects. For example, the original model produces a posterior mean for income elasticity of
0.288. This increases to 0.388 in the refined model.
Script mod2_wetlands_plots compares the posterior and prior densities across the two models for a
selected set of parameters. Note that there are efficiency spillovers even for parameters that themselves
did not receive informed priors (such as the marginal effect of "users").
References
Chib, Siddhartha. 2001. "Markov Chain Monte Carlo Methods: Computation and Inference," in J. J.
Heckman and E. Leamer (eds.), Handbook of Econometrics. Elsevier.
Moeltner, K. and R. Woodward. 2009. "Meta-Functional Benefit Transfer for Wetland Valuation: Making
the Most of Small Samples." Environmental and Resource Economics 42, 89-109.