Introduction to Bayesian Econometrics and Decision Theory

Karsten T. Hansen

January 14, 2002
This is only a partial list. A few more (technical) reasons for considering a Bayesian
approach are that it

can easily accommodate inference in non-regular models,
allows for parameter uncertainty when forming predictions,
can test multiple non-nested models,
allows for automatic James-Stein shrinkage estimation using hierarchical models.
Probability theory as logic
Probability spaces are usually introduced in the form of the Kolmogorov axioms. A
probability space (Ω, F, P) consists of a sample space Ω, a set of events F consisting
of subsets of Ω, and a probability measure P with the properties

1. F is a σ-field,
2. P(A) ≥ 0 for all A ∈ F,
3. P(Ω) = 1,
4. for a disjoint collection {A_j ∈ F},  P(∪_j A_j) = Σ_j P(A_j).
These are axioms and hence taken as given. The classical interpretation of the
number P(A) is the relative frequency with which A occurs in a repeated random
experiment as the number of trials goes to infinity.
But why should we base probability theory on exactly these axioms? Indeed,
many have criticized these axioms as arbitrary. Can we derive them from deeper
principles that seem less arbitrary? Yes, and doing so also leads to an alternative
interpretation of the number P(A).
Let us start by noting that in a large part of our lives our human brains are engaged
in plausible reasoning. As an example of plausible reasoning, consider the following little
story from Jaynes' book:
Suppose some dark night a policeman walks down a street, apparently
deserted; but suddenly he hears a burglar alarm, looks across the street,
and sees a jewelry store with a broken window. Then a gentleman wearing a
mask comes crawling out through the broken window, carrying a bag which
turns out to be full of expensive jewelry. The policeman doesn't hesitate
at all in deciding that this gentleman is dishonest. But by what reasoning
process does he arrive at this conclusion?
The policeman's reasoning is clearly not deductive reasoning, which is based on
relationships like

If A is true, then B is true.

Deductive reasoning then proceeds as: A true ⟹ B true, and B false ⟹ A false.
The policeman's reasoning is better described by the following relationship:

If A is true, then B becomes more plausible.

Plausible reasoning then proceeds as:

B is true ⟹ A becomes more plausible.
How can one formalize this kind of reasoning? In chapters 1 and 2 of Jaynes' book it
is shown that given some basic desiderata that a theory of plausible reasoning should
satisfy one can derive the laws of probability from scratch. These desiderata are that
(i) degrees of plausibility are represented by real numbers, (ii) if a conclusion can be
reasoned out in more than one way, then every possible way must lead to the same
result (plus some further weak conditions requiring correspondence of the theory to
common sense).
So according to this approach probability theory is not a theory about limiting relative
frequencies in random experiments, but a formalization of the process of plausible
reasoning, and the interpretation of a probability is

P(A) = the degree of belief in the proposition A.
This subjective definition of probability can now be used to formalize the idea of learning
in an uncertain environment. Suppose my degree of belief in A is P(A). Then I learn
that the proposition B is true. If I believe there is some connection between A and B,
I have then also learned something about A. In particular, the laws of probability (or,
according to the theory above, the laws of plausible reasoning) tell me that

Pr(A|B) = Pr(A ∩ B)/Pr(B) = Pr(B|A)Pr(A)/Pr(B).                                  (1)

Applied to statistical inference, with data y and parameter θ, this updating rule becomes

p(θ|y) = p(y|θ)p(θ)/p(y).                                                        (2)

Here p(y|θ) is the sample distribution of the data given θ and p(θ) is the prior
distribution of θ.
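To make the updating rule concrete, here is a minimal numerical sketch of equation (2), not from the text, for a discrete parameter: a coin has bias θ in {0.2, 0.5, 0.8} with a uniform prior, and we observe 7 heads in 10 tosses.

```python
import numpy as np
from scipy.stats import binom

# Hypothetical discrete illustration of Bayes' rule (2): three candidate biases.
theta = np.array([0.2, 0.5, 0.8])
prior = np.array([1/3, 1/3, 1/3])          # p(theta)

heads, n = 7, 10
likelihood = binom.pmf(heads, n, theta)    # p(y|theta)

marginal = np.sum(likelihood * prior)      # p(y)
posterior = likelihood * prior / marginal  # p(theta|y), equation (2)

for t, post in zip(theta, posterior):
    print(f"theta = {t:.1f}: posterior = {post:.3f}")
```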
So Bayesian statistics is nothing more than a formal model of learning in an uncertain
environment applied to statistical inference. The prior expresses my beliefs about θ
before observing the data; the distribution p(θ|y) expresses my updated beliefs about
θ after observing the data.

Definition 1. p(θ) is the prior distribution of θ, p(θ|y) given in (2) is the posterior
distribution of θ, and

p(y) = ∫ p(y|θ)p(θ) dθ

is the marginal distribution of the data.
Carrying out a Bayesian analysis is deceptively simple and always proceeds as follows:

1. Formulate the sample distribution p(y|θ) and the prior p(θ).
2. Compute the posterior p(θ|y) according to (2).

That's it! All information about θ is now contained in the posterior. For example, the
probability that θ ∈ A is

Pr(θ ∈ A|y) = ∫_A p(θ|y) dθ.                                                     (3)

A couple of things to note:
Definition 4. Given a sample distribution, prior and loss function, a Bayes estimator
θ̂_B(y) is any function of y so that

θ̂_B(y) = argmin_{a ∈ A} ρ(a|y),

where ρ(a|y) = ∫ L(θ, a) p(θ|y) dθ is the posterior expected loss.

Some typical loss functions when θ is one dimensional are

L(θ, a) = (θ − a)²,                                    quadratic,                (4)
L(θ, a) = |θ − a|,                                     absolute error,           (5)
L(θ, a) = k2(θ − a) if θ > a,  k1(a − θ) otherwise.                              (6)

The corresponding Bayes estimators are

θ̂_B(y) = E[θ|y]                                        (posterior mean),         (7)
θ̂_B(y) = median of p(θ|y)                              (posterior median),       (8)
θ̂_B(y) = k2/(k1 + k2) fractile of p(θ|y).                                        (9)
Proof. Consider first the quadratic case. The posterior expected loss is

ρ(a|y) = ∫ (θ − a)² p(θ|y) dθ,

which is a continuous and convex function of a, so

∂ρ(a|y)/∂a = 0  ⟺  ∫ (θ − a*) p(θ|y) dθ = 0  ⟺  a* = ∫ θ p(θ|y) dθ ≡ E[θ|y].

For the generalized absolute error loss case we get

ρ(a|y) = ∫ L(θ, a) p(θ|y) dθ
       = k1 ∫_{−∞}^{a} (a − θ) p(θ|y) dθ + k2 ∫_{a}^{∞} (θ − a) p(θ|y) dθ        (10)
       = k1 ∫_{−∞}^{a} Pr(θ < x|y) dx + k2 ∫_{a}^{∞} Pr(θ > x|y) dx.             (11)

Setting the derivative with respect to a equal to zero gives

k1 Pr(θ < a*|y) − k2 Pr(θ > a*|y) = 0,

which shows that a* is the k2/(k1 + k2) fractile of the posterior. For k1 = k2 we get the
posterior median.
One can also construct loss functions which give the posterior mode as the optimal
Bayes estimator.
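As a quick illustration of these results (a sketch of my own, not from the text), the three Bayes estimators can be read off posterior draws: the mean minimizes quadratic loss (7), the median absolute-error loss (8), and the k2/(k1 + k2) quantile the asymmetric loss (6). The gamma posterior below is an arbitrary stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in posterior draws for theta (any posterior would do for this illustration).
draws = rng.gamma(shape=3.0, scale=2.0, size=50_000)

k1, k2 = 1.0, 3.0  # asymmetric loss (6): overshooting costs k1, undershooting costs k2

bayes_quadratic = draws.mean()                         # posterior mean, eq. (7)
bayes_absolute = np.median(draws)                      # posterior median, eq. (8)
bayes_asymmetric = np.quantile(draws, k2 / (k1 + k2))  # k2/(k1+k2) fractile, eq. (9)

print(f"quadratic loss  -> {bayes_quadratic:.3f}")
print(f"absolute loss   -> {bayes_absolute:.3f}")
print(f"asymmetric loss -> {bayes_asymmetric:.3f}")
```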
Here is a simple example of a Bayesian analysis.
Example 1. Suppose we have a sample y of size n where by assumption yi is sampled
from a normal distribution with mean θ and known variance σ²,

yi | θ ~ iid N(θ, σ²),   i = 1, . . . , n,                                       (12)

so that

p(y|θ) = (2πσ²)^{-n/2} exp{ −(1/(2σ²)) Σ_{i=1}^n (yi − θ)² }.                    (13)

The prior on θ is normal,

p(θ) = (2πσ₀²)^{-1/2} exp{ −(1/(2σ₀²)) (θ − μ₀)² },                              (14)

where μ₀ and σ₀² are the known prior mean and variance. So before observing any data
the best guess of θ is μ₀ (at least under squared error loss). Usually one would take
σ₀² large to express that little is known about θ before observing the data.
The posterior is

p(θ|y) = p(y|θ)p(θ)/p(y) = p(y|θ)p(θ) / ∫ p(y|θ)p(θ) dθ.

Note that

p(y|θ)p(θ) = (2π)^{-(n+1)/2} σ^{-n} σ₀^{-1} exp{ −(1/(2σ²)) Σ_{i=1}^n (yi − θ)² − (1/(2σ₀²))(θ − μ₀)² }.

Then, since

Σ_{i=1}^n (yi − θ)² = Σ_{i=1}^n (yi − ȳ)² + n(θ − ȳ)²                            (15)

and

(n/σ²)(θ − ȳ)² + (1/σ₀²)(θ − μ₀)² = (1/σ̄²)(θ − μ̄)² + (σ₀² + n^{-1}σ²)^{-1}(ȳ − μ₀)²,   (16)

where

μ̄ = [ (n/σ²) ȳ + (1/σ₀²) μ₀ ] / [ (n/σ²) + (1/σ₀²) ],                            (17)
σ̄² = 1 / [ (n/σ²) + (1/σ₀²) ],                                                   (18)
we can write

p(y|θ)p(θ) = p(y) (2π)^{-1/2} σ̄^{-1} exp{ −(1/(2σ̄²)) (θ − μ̄)² },                 (19)

where

p(y) = (2π)^{-n/2} σ^{-n} σ₀^{-1} σ̄ exp{−h(y)},                                  (20)

and

h(y) = (1/(2σ²)) Σ_{i=1}^n (yi − ȳ)² + [2(σ₀² + n^{-1}σ²)]^{-1} (ȳ − μ₀)².

(Equation (16) follows from the identity
a(x − b)² + c(x − d)² = (a + c)[x − (ab + cd)/(a + c)]² + [ac/(a + c)](b − d)².)
Then

p(θ|y) = p(y|θ)p(θ)/p(y) = (2π)^{-1/2} σ̄^{-1} exp{ −(1/(2σ̄²)) (θ − μ̄)² },        (21)

that is,

θ | y ~ N(μ̄, σ̄²).                                                               (22)
To derive this we did more calculations than we actually had to. Remember that when
deriving the posterior for θ we only need to include terms where θ enters. Hence,

p(θ|y) = p(y|θ)p(θ)/p(y)
       ∝ p(y|θ)p(θ)
       ∝ exp{ −(1/(2σ²)) Σ_{i=1}^n (yi − θ)² − (1/(2σ₀²)) (θ − μ₀)² }
       ∝ exp{ −(1/(2σ̄²)) (θ − μ̄)² },

that is,

p(θ|y) ∝ exp{ −(1/(2σ̄²)) (θ − μ̄)² }.                                             (23)
The optimal Bayes estimator is a convex combination of the usual estimator ȳ and the
prior expectation μ₀. When n is large and/or σ₀² is large, most weight is given to ȳ. In
particular,

E[θ|y] → ȳ   as n → ∞,
E[θ|y] → ȳ   as σ₀² → ∞.
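A short numerical sketch of Example 1 (illustrative only; the data and prior settings are my own choices): it computes μ̄ and σ̄² from (17)-(18) and shows the posterior mean moving toward ȳ as the prior variance grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: n observations from N(theta, sigma^2) with sigma^2 known.
theta_true, sigma2, n = 2.0, 4.0, 50
y = rng.normal(theta_true, np.sqrt(sigma2), size=n)
ybar = y.mean()

def posterior(ybar, n, sigma2, mu0, sigma02):
    """Posterior mean and variance from equations (17)-(18)."""
    prec = n / sigma2 + 1.0 / sigma02
    mu_bar = (n / sigma2 * ybar + mu0 / sigma02) / prec
    return mu_bar, 1.0 / prec

mu0 = 0.0
for sigma02 in [0.1, 1.0, 10.0, 1000.0]:
    mu_bar, var_bar = posterior(ybar, n, sigma2, mu0, sigma02)
    print(f"prior var {sigma02:7.1f}: E[theta|y] = {mu_bar:.3f} (ybar = {ybar:.3f})")
```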
In this example there was a close correspondence between the optimal Bayes estimator
and the classical estimator ȳ. But suppose now we had the knowledge that θ has to be
positive. Suppose also we initially use the prior

p(θ) = I(K > θ > 0) (1/K),

where K is a large positive number. Then we can compute the posterior for K < ∞
and then let K approach infinity. The posterior is then

p(θ|y) ∝ exp{ −(n/(2σ²)) (θ − ȳ)² } I(K > θ > 0) (1/K),

that is,

p(θ|y) = σ̄^{-1} φ((θ − ȳ)/σ̄) I(K > θ > 0) / [Φ((K − ȳ)/σ̄) − Φ(−ȳ/σ̄)]
       → σ̄^{-1} φ((θ − ȳ)/σ̄) I(θ > 0) / Φ(ȳ/σ̄)   as K → ∞,

where now σ̄² = σ²/n and φ and Φ denote the standard normal density and cdf. The
posterior mean is then the mean of a normal distribution truncated to (0, ∞),

E[θ|y] = ȳ + σ̄ φ(ȳ/σ̄) / Φ(ȳ/σ̄).                                                 (24)
Note that the unrestricted estimate is ȳ, which may be negative. Developing the
repeated sample distribution of ȳ under the restriction θ > 0 is a tricky matter. On
the other hand, the posterior analysis is straightforward and E[θ|y] is a reasonable and
intuitive estimator of θ.
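A small sketch (my own numbers, not from the text) that evaluates the restricted posterior mean (24) and compares it with the unrestricted estimate ȳ, cross-checking against scipy's truncated normal.

```python
import numpy as np
from scipy.stats import norm, truncnorm

# Made-up setting where the unrestricted estimate ybar is negative.
ybar, sigma, n = -0.3, 2.0, 25
scale = sigma / np.sqrt(n)          # sigma_bar = sigma / sqrt(n)

# Closed form (24): mean of N(ybar, scale^2) truncated to (0, infinity).
post_mean = ybar + scale * norm.pdf(ybar / scale) / norm.cdf(ybar / scale)

# Cross-check with scipy's truncated normal (a, b are standardized bounds).
a, b = (0.0 - ybar) / scale, np.inf
post_mean_check = truncnorm.mean(a, b, loc=ybar, scale=scale)

print(f"unrestricted estimate ybar = {ybar:.3f}")
print(f"restricted posterior mean  = {post_mean:.3f} (scipy check: {post_mean_check:.3f})")
```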
Models via exchangeability
In criticisms of Bayesian statistics one often meets statements like "This is too restrictive,
since you have to use a prior to do a Bayesian analysis, whereas in classical statistics
you don't." This is correct, but we will now show that under mild conditions there
always exists a prior.
Consider the following example. Suppose you wish to estimate the probability of
unemployment for a group of (similar) individuals. The only information you have
is a sample y = (y1 , . . . , yn ) where yi is one if individual i is employed and zero if
unemployed. Clearly the indices of the observations should not matter in this case.
The joint distribution of the sample p(y1 , . . . , yn ) should be invariant to permutations
of the indices, i.e.,
p(y1 , . . . , yn ) = p(yi(1) , . . . , yi(n) ),
where {i(1), . . . , i(n)} is a permutation of {1, . . . , n}. Such a condition is called exchangeability.
Definition 5. A finite set of random quantities z1 , . . . , zn are said to be exchangeable if
every permutation of z1 , . . . , zn has the same joint distribution as every other permutation. An infinite collection is exchangeable if every finite subcollection is exchangeable.
The relatively weak assumption of exchangeability turns out to have a profound
consequence as shown by a famous theorem by deFinetti.
Theorem 1 (deFinetti's representation theorem). Let z1, z2, . . . be a sequence of 0-1
random quantities. The sequence (z1, . . . , zn) is exchangeable for every n if and only if

p(z1, . . . , zn) = ∫₀¹ Π_{i=1}^n θ^{zi} (1 − θ)^{1−zi} dF(θ),

where

F(θ) = lim_{n→∞} Pr( (1/n) Σ_{i=1}^n zi ≤ θ ).
What does deFinetti's theorem say? It says that if the sequence z1, z2, . . . is considered
exchangeable, then it is as if the zi's are iid Bernoulli given θ,

zi | θ ~ iid Bernoulli(θ),   i = 1, . . . , n,

where θ is a random variable whose distribution F is the limit distribution of the
sample average (1/n) Σ_{i=1}^n zi.
So one way to motivate the Bayesian model

zi | θ ~ iid Bernoulli(θ),   i = 1, . . . , n,                                   (25)
θ ~ π(θ),                                                                        (26)

is to appeal to exchangeability and think about your beliefs about the limit of
(1/n) Σ_{i=1}^n zi when you pick the prior π(θ). The posterior is then computed in
the usual way,

p(θ|y) = p(y|θ)p(θ)/p(y).
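To illustrate the exchangeable Bernoulli model (25)-(26), here is a minimal sketch using a conjugate Beta prior for θ (the Beta choice and all numbers are mine, not the text's), so the posterior is available in closed form.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)

# Made-up 0-1 data (e.g., employment indicators).
z = rng.binomial(1, 0.7, size=40)
successes, n = z.sum(), z.size

# Beta(a0, b0) prior for theta; Beta is conjugate for Bernoulli data.
a0, b0 = 1.0, 1.0                      # uniform prior on (0, 1)
a_post, b_post = a0 + successes, b0 + n - successes

print(f"posterior: Beta({a_post:.0f}, {b_post:.0f})")
print(f"posterior mean of theta : {a_post / (a_post + b_post):.3f}")
print(f"sample average of z     : {z.mean():.3f}")
print(f"95% credible interval   : {beta.ppf([0.025, 0.975], a_post, b_post).round(3)}")
```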
Suppose now we also make the assumption that there is a population distribution f(y).
As a measure of the difference between the sample distribution used to compute the
posterior, p(y|θ), and the actual population distribution we can use the Kullback-Leibler
discrepancy,

H(θ) = ∫ log[ f(yi) / p(yi|θ) ] f(yi) dyi.                                       (27)

Let θ* be the value of θ that minimizes this distance. One can show that, under mild
conditions, the posterior p(θ|y) concentrates around θ* as n → ∞. In particular, if
f(yi) = p(yi|θ0), i.e. the sample distribution is correctly specified and the population is
indexed by θ0, then θ* = θ0. For proofs see the textbook by Schervish or Gelman et al.
on the reading list.
So in the case of correct specification, f(yi) = p(yi|θ0), the posterior will concentrate
around the true value θ0 asymptotically, as long as θ0 is contained in the support of
the prior. Under misspecification the posterior will concentrate around the value θ*
that minimizes the distance to the true model.
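The following sketch (all settings are my own, illustrative assumptions) finds the pseudo-true value θ* numerically for a misspecified example: the fitted family is N(θ, 1) while the population is a shifted t distribution with 5 degrees of freedom, so θ* minimizes the Kullback-Leibler discrepancy (27) over a grid.

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(3)

# Population distribution f: a t(5) shifted to 1.5; fitted family: N(theta, 1).
y_pop = 1.5 + t.rvs(df=5, size=100_000, random_state=rng)

# Up to a constant not involving theta, H(theta) = E_f[-log p(y|theta)] + const,
# so we can minimize the expected negative log likelihood over a grid of theta values.
grid = np.linspace(0.0, 3.0, 151)
expected_nll = [-norm.logpdf(y_pop, loc=th, scale=1.0).mean() for th in grid]

theta_star = grid[np.argmin(expected_nll)]
print(f"pseudo-true value theta* ~= {theta_star:.3f}")   # close to the population mean, 1.5
```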
Now we shall consider the frequentist risk properties of Bayes estimators. To this end,
define the frequentist risk of an estimator δ̂ as

r(θ, δ̂) = ∫ L(θ, δ̂(y)) p(y|θ) dy.                                                (28)

Note the difference between this risk measure and the Bayesian risk measure ρ(a|y): the
frequentist risk averages over the data for a given parameter, whereas the Bayesian risk
measure averages over the parameter space given the data. Furthermore, the frequentist
risk is a function of both θ and the proposed estimator δ̂, while the Bayesian risk measure
is only a function of the action a = δ̂(y).
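As an illustration (a sketch with arbitrary settings, not from the text), the frequentist risk (28) of the Bayes estimator from Example 1 can be approximated by simulating repeated samples for a fixed θ and comparing it with the risk of ȳ under squared error loss.

```python
import numpy as np

rng = np.random.default_rng(4)

def risk(theta, estimator, sigma2=1.0, n=10, reps=20_000):
    """Monte Carlo approximation of r(theta, delta) = E[(theta - delta(y))^2 | theta]."""
    y = rng.normal(theta, np.sqrt(sigma2), size=(reps, n))
    return np.mean((estimator(y) - theta) ** 2)

# Bayes estimator from Example 1 with prior N(mu0, sigma02) and sigma^2 = 1 known.
mu0, sigma02, sigma2, n = 0.0, 1.0, 1.0, 10
def bayes_estimator(y):
    ybar = y.mean(axis=1)
    return (n / sigma2 * ybar + mu0 / sigma02) / (n / sigma2 + 1.0 / sigma02)

for theta in [0.0, 0.5, 2.0]:
    r_bayes = risk(theta, bayes_estimator)
    r_ybar = risk(theta, lambda y: y.mean(axis=1))
    print(f"theta = {theta:3.1f}: risk(Bayes) = {r_bayes:.4f}, risk(ybar) = {r_ybar:.4f}")
```

The output shows the usual trade-off: the Bayes estimator has lower risk near the prior mean and higher risk far from it, while the risk of ȳ is constant at σ²/n.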
There are two popular ways to choose estimators optimally based on their frequentist
risk: minimaxity and admissibility. It turns out that there is a close relationship
between admissibility and Bayes estimators.
Definition 6. An estimator δ̂ is inadmissible if there exists an estimator δ̂1 which
dominates δ̂, i.e., such that for every θ,

r(θ, δ̂) ≥ r(θ, δ̂1),

and, for at least one value θ0,

r(θ0, δ̂) > r(θ0, δ̂1).

If δ̂ is not inadmissible it is admissible.
The idea behind admissibility is to reduce the number of potential estimators to
consider. Indeed, it seems hard to defend using an inadmissible estimator.

Under mild conditions Bayes estimators can be shown to be admissible. Under
somewhat stronger conditions one can in fact show the reverse: all admissible estimators
are Bayes estimators (or limits of Bayes estimators). A theorem proving this is called a
complete class theorem, and different versions of complete class theorems exist.
Comparisons between classical and Bayesian inference
The fundamental difference between classical frequentist inference and Bayesian inference is in the use of pre-data versus post-data probability statements.
The frequentist approach is limited to pre-data considerations. This approach answers questions of the following form:
(Q1) Before we have seen the data, what data do we expect to get?
(Q2) If we use the as yet unknown data to estimate parameters by some known algorithm, how accurate do we expect the estimates to be?
(Q3) If the hypothesis being tested is in fact true, what is the probability that we shall
get data indicating that it is true?
These questions can also be answered in the Bayesian approach. However, followers
of the Bayesian approach argue that these questions are not relevant for scientific
inference. What is relevant are post-data questions:
(Q1) After having seen the data, do we have any reason to be surprised by them?
(Q2) After we have seen the data, what parameter estimates can we now make, and
what accuracy are we entitled to claim?
(Q3) What is the probability conditional on the data, that the hypothesis is true?
These post-data questions (Q1)-(Q3) are only meaningful in a Bayesian framework.
In the frequentist approach one cannot talk about the probability of a hypothesis.
The marginal propensity to consume is either .92 or it is not. A frequentist 95 pct.
confidence interval (a, b) does not mean that the probability of a < θ < b is 95 pct.:
θ either belongs to the interval (a, b) or it does not.
Sometimes frequentist and Bayesian procedures give similar results although their
interpretations differ.
Example 2. In Example 1 we found the posterior

p(θ|y) = N(θ | μ̄, σ̄²),

where

μ̄ = [ (n/σ²) ȳ + (1/σ₀²) μ₀ ] / [ (n/σ²) + (1/σ₀²) ],     σ̄² = 1 / [ (n/σ²) + (1/σ₀²) ].

As σ₀² → ∞ this becomes

θ | y ~ N(ȳ, σ²/n),                                                              (29)

so the Bayes estimator is θ̂_B = ȳ. The corresponding frequentist statement is based on
the repeated sampling distribution of ȳ,

ȳ | θ ~ N(θ, σ²/n).                                                              (30)

Conceptually (29) and (30) are very different, but the final statements one would make
about θ would be nearly identical.

We stress once again the difference between (30) and (29). (30) answers the question

(Q1) How much would the estimate of θ vary over the class of all data sets that we
might conceivably get?
Another difference shows up in significance testing. Suppose we observe 9 heads and
3 tails in a sequence of coin tosses and wish to test H0: θ = 1/2. Under binomial
sampling the number of tosses, 12, was fixed in advance; under negative binomial
sampling we tossed until the third tail appeared, and the likelihood is

L2(θ) = C(11, 9) θ^9 (1 − θ)^3.

Suppose we use the test statistic X = number of heads and the decision rule "reject H0
if X ≥ c". The p-value is the probability of observing the data X = 9, or something
more extreme, under H0. This gives

α1 = Pr(X ≥ 9 | θ = 1/2) = Σ_{j=9}^{12} C(12, j) (1/2)^j (1/2)^{12−j} = .075,
α2 = Pr(X ≥ 9 | θ = 1/2) = Σ_{j=9}^{∞} C(2 + j, j) (1/2)^j (1/2)^3 = .0325.
So using a conventional Type I error level α = .05 the two model assumptions lead to
two different conclusions. But there is nothing in the situation that tells us which of
the two models we should use.
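For concreteness, the two tail probabilities can be recomputed directly. This is a small sketch of my own; it reports the exact sums to four decimals, so the last digits may differ slightly from the rounded figures quoted above.

```python
from math import comb

# Binomial scheme: 12 tosses fixed, X = number of heads, reject if X >= 9.
p_binomial = sum(comb(12, j) * 0.5 ** 12 for j in range(9, 13))

# Negative binomial scheme: toss until 3 tails, X = number of heads observed.
# Pr(X = j) = C(2 + j, j) (1/2)^j (1/2)^3; sum the upper tail via the complement.
p_negbinomial = 1.0 - sum(comb(2 + j, j) * 0.5 ** (j + 3) for j in range(0, 9))

print(f"p-value, binomial sampling         : {p_binomial:.4f}")
print(f"p-value, negative binomial sampling: {p_negbinomial:.4f}")
```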
What happens here is that the Neyman-Pearson test procedure allows unobserved
outcomes to affect the results: X values more extreme than 9 were used as evidence
against the null. The prominent Bayesian Harold Jeffreys described this situation as
"a hypothesis that may be true may be rejected because it has not predicted observable
results that have not occurred."
There is also an important difference between frequentist and Bayesian approaches
to the elimination of nuisance parameters. In the frequentist approach nuisance
parameters are usually eliminated by the plug-in method. Suppose we have an estimator
θ̂1 of a parameter θ1 which depends on another parameter θ2:

θ̂1 = θ̂1(y, θ2).

Typically one would get rid of the dependence on θ2 by plugging in an estimate of θ2:

θ̂1 = θ̂1(y, θ̂2(y)).

In the Bayesian approach one gets rid of nuisance parameters by integration. Suppose
the joint posterior distribution of θ1 and θ2 is p(θ1, θ2|y). Inference about θ1 is then
based on the marginal posterior

p(θ1|y) = ∫ p(θ1, θ2|y) dθ2.
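With simulation output the marginalization step is trivial: given joint posterior draws of (θ1, θ2), one simply ignores the θ2 column. A minimal sketch, with an arbitrary bivariate normal standing in for the joint posterior:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in joint posterior draws of (theta1, theta2): a correlated bivariate normal.
mean = np.array([1.0, -0.5])
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
draws = rng.multivariate_normal(mean, cov, size=50_000)

theta1 = draws[:, 0]   # marginal posterior draws of theta1: just drop theta2

print(f"E[theta1|y]  ~= {theta1.mean():.3f}")
print(f"sd(theta1|y) ~= {theta1.std(ddof=1):.3f}")
print(f"95% interval ~= {np.percentile(theta1, [2.5, 97.5]).round(3)}")
```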
Bayesian Mechanics
The normal linear regression model
The sampling distribution of the n-vector of observable data y is

p(y|X, β, τ) = N(y | Xβ, τ^{-1} I_n),                                            (31)

where X is the n × k matrix of regressors, β is the k-vector of coefficients and τ is the
error precision. We consider two priors,

p(β, τ) = N(β | β₀, Ω₀^{-1}) G(τ | ν₁, ν₂)    and    p(β, τ) ∝ τ^{-1}.            (32)

The first prior specifies that β and τ are a priori independent, with β having a multivariate
normal prior with mean β₀ and covariance Ω₀^{-1} and τ having a gamma prior
with shape parameter ν₁ and inverse scale parameter ν₂. (We could also have chosen to
work with the variance σ² = 1/τ; the implied prior on σ² would then be an inverse gamma
distribution.)

The second prior is a non-informative prior. This is a prior that you may want to
use if you don't have much prior information about (β, τ) available. (You may be
wondering why τ^{-1} represents a non-informative prior on (β, τ); this will become
clearer below.)
Consider the second prior first. Under p(β, τ) ∝ τ^{-1} the joint posterior is

p(β, τ|y) ∝ τ^{n/2 − 1} exp{ −(τ/2)(y − Xβ)'(y − Xβ) },

and integrating out τ gives the marginal posterior distribution of β,

p(β|y) ∝ [ s(y) + (β − β̂)'X'X(β − β̂) ]^{-n/2}                                    (33)
       ∝ [ 1 + (β − β̂)'X'X(β − β̂)/s(y) ]^{-n/2},                                 (34)

where

β̂ = (X'X)^{-1}X'y,    s(y) = (y − Xβ̂)'(y − Xβ̂).                                  (35)

This is the kernel of a multivariate t distribution with n − k degrees of freedom,

p(β|y) = t_{n−k}( β | β̂, s(y)(X'X)^{-1}/(n − k) ).                               (36)
The marginal posterior of τ is

p(τ|y) = ∫ p(β, τ|y) dβ ∝ τ^{n/2 − 1} exp{ −τ s(y)/2 } ∫ exp{ −(τ/2)(β − β̂)'X'X(β − β̂) } dβ   (37)
       ∝ τ^{(n−k)/2 − 1} exp{ −τ s(y)/2 },                                                     (38)

that is, τ | y ~ G( (n − k)/2, s(y)/2 ), with posterior mean

E[τ|y] = (n − k)/s(y) = 1/σ̂²,                                                    (39)

where σ̂² = s(y)/(n − k).
Now we can see one way in which the prior p(β, τ) ∝ τ^{-1} may be considered non-informative:
the marginal posterior distributions have properties closely resembling the corresponding
repeated sample distributions.
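A small sketch (simulated data and settings of my own) of the posterior quantities under the non-informative prior: it computes β̂, s(y) and σ̂², and draws from the marginal posteriors (36) and (38) by first drawing τ and then β given τ.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated data for y = X beta + e, e ~ N(0, tau^{-1} I).
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true, tau_true = np.array([1.0, 2.0, -0.5]), 1.0
y = X @ beta_true + rng.normal(scale=1 / np.sqrt(tau_true), size=n)

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)          # OLS estimate
s_y = float((y - X @ beta_hat) @ (y - X @ beta_hat))
sigma2_hat = s_y / (n - k)

# Posterior draws under p(beta, tau) propto 1/tau:
# tau | y ~ Gamma((n-k)/2, rate = s(y)/2), then beta | tau, y ~ N(beta_hat, (tau X'X)^{-1}).
ndraws = 10_000
tau_draws = rng.gamma(shape=(n - k) / 2, scale=2.0 / s_y, size=ndraws)
cov_chol = np.linalg.cholesky(np.linalg.inv(XtX))
beta_draws = beta_hat + (cov_chol @ rng.normal(size=(k, ndraws))).T / np.sqrt(tau_draws)[:, None]

print("posterior mean of beta :", beta_draws.mean(axis=0).round(3))
print("OLS estimate beta_hat  :", beta_hat.round(3))
print(f"E[tau|y] = {tau_draws.mean():.3f}, 1/sigma2_hat = {1 / sigma2_hat:.3f}")
```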
For the first prior we get the joint posterior

p(β, τ|y) ∝ τ^{n/2} exp{ −(τ/2)(y − Xβ)'(y − Xβ) } τ^{ν₁ − 1} exp{−ν₂ τ}
            × exp{ −(1/2)(β − β₀)'Ω₀(β − β₀) }                                   (40)
          = τ^{n/2 + ν₁ − 1} exp{ −(τ/2)[ s(y) + (β − β̂)'X'X(β − β̂) ] − ν₂ τ }
            × exp{ −(1/2)(β − β₀)'Ω₀(β − β₀) }.                                  (41)

This joint posterior of (β, τ) does not lead to convenient expressions for the marginals
of β and τ.
We can, however, derive analytical expressions for the conditional posteriors p(β|τ, y)
and p(τ|β, y). These conditional posteriors turn out to play a fundamental role when
designing simulation algorithms.
Let us first derive the conditional posterior for β given τ. Remember that we then
only need to include terms containing β. We get

p(β|τ, y) ∝ exp{ −(1/2)[ τ(β − β̂)'X'X(β − β̂) + (β − β₀)'Ω₀(β − β₀) ] }.           (42)

Using the identity

(z − a)'A(z − a) + (z − b)'B(z − b) = (z − c)'(A + B)(z − c) + (a − b)'A(A + B)^{-1}B(a − b),

where c = (A + B)^{-1}(Aa + Bb), this can be written as
p(β|τ, y) ∝ exp{ −(1/2)(β − β̄)'Ω̄^{-1}(β − β̄) },

where

Ω̄ = (τ X'X + Ω₀)^{-1},
β̄ = Ω̄ (τ X'y + Ω₀ β₀).

So

p(β|τ, y) = N(β | β̄, Ω̄),                                                         (43)

and, collecting the terms in τ in (41),

p(τ|β, y) = G( τ | n/2 + ν₁, (y − Xβ)'(y − Xβ)/2 + ν₂ ).                         (44)
We will see later that it is extremely easy to simulate draws from p(β|y) and
p(τ|y) using these conditional distributions.
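To preview that point, here is a minimal Gibbs sampler sketch (my own illustrative data and hyperparameters) that alternates between the conditional posteriors (43) and (44); the resulting draws of β and τ approximate the joint posterior (41).

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated data y = X beta + e, e ~ N(0, 1/tau).
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

# Prior hyperparameters (illustrative): beta ~ N(beta0, Omega0^{-1}), tau ~ G(nu1, nu2).
beta0, Omega0 = np.zeros(k), 0.01 * np.eye(k)
nu1, nu2 = 2.0, 1.0

XtX, Xty = X.T @ X, X.T @ y
ndraws, tau = 5_000, 1.0
beta_draws, tau_draws = np.empty((ndraws, k)), np.empty(ndraws)

for s in range(ndraws):
    # (43): beta | tau, y ~ N(beta_bar, Omega_bar)
    Omega_bar = np.linalg.inv(tau * XtX + Omega0)
    beta_bar = Omega_bar @ (tau * Xty + Omega0 @ beta0)
    beta = rng.multivariate_normal(beta_bar, Omega_bar)
    # (44): tau | beta, y ~ G(n/2 + nu1, rate = (y - X beta)'(y - X beta)/2 + nu2)
    resid = y - X @ beta
    tau = rng.gamma(shape=n / 2 + nu1, scale=1.0 / (resid @ resid / 2 + nu2))
    beta_draws[s], tau_draws[s] = beta, tau

burn = 500
print("posterior mean of beta:", beta_draws[burn:].mean(axis=0).round(3))
print(f"posterior mean of sigma^2 = 1/tau: {(1 / tau_draws[burn:]).mean():.3f}")
```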
The SURE model
Consider now the model

y_{ij} = x_{ij}' β_j + ε_{ij},   i = 1, . . . , n;  j = 1, . . . , J,            (45)

where the error vectors ε_i = (ε_{i1}, . . . , ε_{iJ})' satisfy

ε_i | Σ ~ iid N(0, Σ),   i = 1, . . . , n.                                       (46)

Stack the J observations for unit i as Y_i = (y_{i1}, . . . , y_{iJ})' and write the model as
Y_i = X_i β + ε_i, where

X_i = blockdiag( x_{i1}', x_{i2}', . . . , x_{iJ}' )    and    β = (β_1', β_2', . . . , β_J')'.

As before we will consider both a non-informative prior and an informative prior.
The usual non-informative prior for this model is

p(β, Σ) ∝ |Σ|^{-(J+1)/2}.                                                        (47)
Combined with the sampling distribution this gives the joint posterior

p(β, Σ|y) ∝ |Σ|^{-(J+1)/2} |Σ|^{-n/2} exp{ −(1/2) Σ_{i=1}^n (Y_i − X_i β)'Σ^{-1}(Y_i − X_i β) }
          = |Σ|^{-(n+J+1)/2} exp{ −(1/2) Σ_{i=1}^n (Y_i − X_i β)'Σ^{-1}(Y_i − X_i β) }.        (48)

For the conditional posterior of β, decompose

Σ_{i=1}^n (Y_i − X_i β)'Σ^{-1}(Y_i − X_i β)
  = Σ_{i=1}^n (Y_i − X_i β̂(Σ))'Σ^{-1}(Y_i − X_i β̂(Σ))
    + (β − β̂(Σ))' ( Σ_{i=1}^n X_i'Σ^{-1}X_i ) (β − β̂(Σ)),                                      (49)
where

β̂(Σ) = ( Σ_{i=1}^n X_i'Σ^{-1}X_i )^{-1} Σ_{i=1}^n X_i'Σ^{-1}Y_i.                 (50)

So

p(β|Σ, y) = N( β | β̂(Σ), ( Σ_{i=1}^n X_i'Σ^{-1}X_i )^{-1} ).                     (51)

The conditional posterior of β is normal, with mean equal to the efficient GLS estimator
(when Σ is known).
To get the conditional posterior for Σ, note that

Σ_{i=1}^n (Y_i − X_i β)'Σ^{-1}(Y_i − X_i β) = Tr[ Σ^{-1} M(β) ],

where

M(β) = Σ_{i=1}^n (Y_i − X_i β)(Y_i − X_i β)'.

Hence

p(Σ|β, y) ∝ |Σ|^{-(n+J+1)/2} exp{ −(1/2) Tr[ Σ^{-1} M(β) ] },                    (52)

so the precision matrix Σ^{-1} has a Wishart conditional posterior,

p(Σ^{-1}|β, y) = W( n, M(β)^{-1} ),                                              (53)

with

E[Σ^{-1}|β, y] = n M(β)^{-1}.

The inverse of this is n^{-1} M(β), which is the usual estimate of the covariance matrix
when β is known.
Next let us derive the conditional posteriors under the proper prior distributions,
β ~ N(β₀, Ω₀^{-1}) and Σ^{-1} ~ W(ν, S^{-1}), independent. The joint posterior is

p(β, Σ|y) ∝ |Σ|^{-(ν+J+1)/2} exp{ −(1/2) Tr[ S Σ^{-1} ] } exp{ −(1/2)(β − β₀)'Ω₀(β − β₀) }
            × |Σ|^{-n/2} exp{ −(1/2) Σ_{i=1}^n (Y_i − X_i β)'Σ^{-1}(Y_i − X_i β) }.            (54)

For the conditional posterior of β, keep only the terms involving β:

p(β|Σ, y) ∝ exp{ −(1/2)(β − β₀)'Ω₀(β − β₀) − (1/2) Σ_{i=1}^n (Y_i − X_i β)'Σ^{-1}(Y_i − X_i β) }
          ∝ exp{ −(1/2)(β − β̄(Σ))' ( Ω₀ + Σ_{i=1}^n X_i'Σ^{-1}X_i ) (β − β̄(Σ)) },

so that

p(β|Σ, y) = N( β | β̄(Σ), Ω̄(Σ) ),                                                 (55)

where

Ω̄(Σ) = ( Ω₀ + Σ_{i=1}^n X_i'Σ^{-1}X_i )^{-1},                                    (56)
β̄(Σ) = Ω̄(Σ) ( Σ_{i=1}^n X_i'Σ^{-1}Y_i + Ω₀ β₀ ).                                 (57)

For the conditional posterior of Σ we get

p(Σ|β, y) ∝ |Σ|^{-(n+ν+J+1)/2} exp{ −(1/2) Tr[ S Σ^{-1} ] − (1/2) Σ_{i=1}^n (Y_i − X_i β)'Σ^{-1}(Y_i − X_i β) }
          = |Σ|^{-(n+ν+J+1)/2} exp{ −(1/2) Tr[ S Σ^{-1} ] − (1/2) Tr[ Σ^{-1} M(β) ] }
          = |Σ|^{-(n+ν+J+1)/2} exp{ −(1/2) Tr[ (S + M(β)) Σ^{-1} ] },

so that

p(Σ^{-1}|β, y) = W( n + ν, (S + M(β))^{-1} ).                                    (58)
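As with the regression model, the conditionals (55)-(58) suggest a Gibbs sampler. A compact sketch follows; the data, hyperparameters and the use of scipy's wishart distribution are my own illustrative choices, not the text's.

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.stats import wishart

rng = np.random.default_rng(8)

# Simulated SUR data: J = 2 equations, each with an intercept and one regressor.
n, J = 100, 2
beta_true = np.array([1.0, 0.5, -1.0, 2.0])          # (beta_1', beta_2')'
Sigma_true = np.array([[1.0, 0.6], [0.6, 1.5]])
k = beta_true.size

X_list = [block_diag(*[np.array([1.0, rng.normal()]) for _ in range(J)]) for _ in range(n)]
Y = np.array([Xi @ beta_true + rng.multivariate_normal(np.zeros(J), Sigma_true)
              for Xi in X_list])

# Prior hyperparameters (illustrative): beta ~ N(beta0, Omega0^{-1}), Sigma^{-1} ~ W(nu, S^{-1}).
beta0, Omega0 = np.zeros(k), 0.01 * np.eye(k)
nu, S = J + 2, np.eye(J)

ndraws, burn = 2_000, 500
beta = np.zeros(k)
beta_draws = np.empty((ndraws, k))

for s in range(ndraws):
    # (58): Sigma^{-1} | beta, y ~ W(n + nu, (S + M(beta))^{-1})
    resid = Y - np.array([Xi @ beta for Xi in X_list])
    M = resid.T @ resid
    Sigma_inv = wishart.rvs(df=n + nu, scale=np.linalg.inv(S + M), random_state=rng)
    # (55)-(57): beta | Sigma, y ~ N(beta_bar(Sigma), Omega_bar(Sigma))
    A = sum(Xi.T @ Sigma_inv @ Xi for Xi in X_list)
    b = sum(Xi.T @ Sigma_inv @ Yi for Xi, Yi in zip(X_list, Y))
    Omega_bar = np.linalg.inv(Omega0 + A)
    beta_bar = Omega_bar @ (b + Omega0 @ beta0)
    beta = rng.multivariate_normal(beta_bar, Omega_bar)
    beta_draws[s] = beta

print("posterior mean of beta:", beta_draws[burn:].mean(axis=0).round(3))
print("true beta             :", beta_true)
```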
Readings

Bayesian foundations and philosophy

Jaynes, E.T. (1994), Probability: The Logic of Science, unpublished book.
Chapters may be downloaded from http://bayes.wustl.edu/etj/prob.html

Jeffreys, H. (1961), Theory of Probability, Oxford University Press.

Bayesian statistics and econometrics

Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995), Bayesian Data Analysis, Chapman and Hall.

Schervish, M.J. (1995), Theory of Statistics, Springer.

Zellner, A. (1971), An Introduction to Bayesian Inference in Econometrics, Wiley.

Bayesian statistics and decision theory

Berger, J.O. (1985), Statistical Decision Theory and Bayesian Analysis, Springer.

Robert, C.P. (1994), The Bayesian Choice, Springer.