Essay Draft

1 Formulation of the testing model

The unit root hypothesis is concerned with discriminating between a trend stationary (TS) model and a difference stationary (DS) model. For testing purposes these models can be conveniently embedded in the structural form:

    y_t = μ + δt + u_t,    A(L)u_t = ε_t    (1)

where A(L) is a polynomial of order p in the lag operator L and ε_t ~ N(0, σ²). This parametrisation is also labelled a components model, since it makes a clear distinction between the deterministic and stochastic components.
In the simple AR(1) process A(L) = 1 - ρL, and simplifying (1) gives:

    y_t = (1 - ρ)μ + ρδ + (1 - ρ)δt + ρy_{t-1} + ε_t    (2)

The unit root hypothesis is ρ = 1 and identifies a DS model. Under this null the intercept μ and the time trend δt disappear, with δ becoming the drift parameter. The alternative hypothesis is |ρ| < 1 and identifies a TS model. Under this alternative the process is stable and mean-reverts around a deterministic trend; provided y_t is in logarithms, δ then has a direct interpretation as the growth rate. For |ρ| > 1 the process is explosive. While asymptotic theory excludes the possibility of an explosive infinite series, |ρ| > 1 can be modelled in the Bayesian framework under the assumption of a finite time frame. The above components model has been adopted by ? and ? for unit root testing, and promoted by ?? and ?.
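To make the TS/DS distinction concrete, the short sketch below (an illustrative Python example of mine, with arbitrarily chosen parameter values) simulates the AR(1) components model in (2) under a stationary root, a unit root and an explosive root.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_components(rho, mu=0.0, delta=0.05, sigma=1.0, T=200):
        """Simulate y_t = mu + delta*t + u_t with u_t = rho*u_{t-1} + eps_t."""
        u = np.zeros(T + 1)
        for t in range(1, T + 1):
            u[t] = rho * u[t - 1] + rng.normal(scale=sigma)
        t = np.arange(T + 1)
        return mu + delta * t + u

    y_ts = simulate_components(rho=0.8)    # trend stationary: mean reverts around mu + delta*t
    y_ds = simulate_components(rho=1.0)    # difference stationary: delta acts as the drift
    y_ex = simulate_components(rho=1.05)   # explosive: only sensible over a finite time frame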
An alternative parametrisation that has been used in unit root testing is the reduced form of the components model:

    y_t = β_0 + β_1 t + ρy_{t-1} + ε_t    (3)

where the parameters in (1) and (3) are related by:

    β_0 = (1 - ρ)μ + ρδ,    β_1 = (1 - ρ)δ    (4)
However the convenience of a linear model in (3) is complicated by the non-linear dependence of β_0 and β_1 upon ρ in (4). Reproducing the dynamics of the unit root hypothesis from the components model requires the constraints β_0 = δ and β_1 = 0 under the null. If we instead treat β_0 and β_1 as independent of ρ, then y_t will not be of the same order of magnitude under the null and alternative hypotheses. In particular, if the time trend coefficient β_1 ≠ 0, y_t will be O(t) under the alternative hypothesis and O(t²) under the null (see ?, pp. 164-168). This behaviour, where a time trend is always present, is undesirably captured by the likelihood function based on the reduced form. Consequently, testing the null against the alternative would no longer discriminate between a DS model and a TS model.
Recall that in the components model the time trend disappears in the presence of a unit root, and whether δ = 0 then governs the order of magnitude of y_t. There is therefore no discrepancy in the behaviour of the data under the null and alternative hypotheses. This is one of the reasons for considering the components model as the better parametrisation under a Bayesian framework.
1.1 Generalising to incorporate an AR(p) process
The components model can incorporate a more sophisticated serial correlation structure by generalising A(L) = 1 - a_1 L - ... - a_p L^p. Stationarity of y_t is equivalent to the roots of the polynomial A(L) lying outside the unit circle. It is desirable to isolate the possibility of a unit root by a convenient representation (see ?, pp. 516-518):

    A(L) = (1 - ρL) - (1 - L)A*(L)    (5)

with

    ρ = 1 - A(1)
    A*(L) = a*_1 L + ... + a*_{p-1} L^{p-1}
    a*_j = -Σ_{i=j}^{p-1} a_{i+1},    j = 1, 2, ..., p - 1
The data process has a unit root if ρ = 1. Replacing A(L) in (1) by its new formulation (5) and simplifying obtains:

    y_t = (1 - ρ)μ + ρδ + (1 - ρ)δt + ρy_{t-1} + A*(L)(Δy_t - δ) + ε_t    (6)

Here one can see that the parameter ρ is concerned with the long-run dynamics of the series, while the a*_j parameters are concerned with the short run. When compared to the AR(1) model, the addition of p - 1 lags of Δy_t presents an extra non-linearity in δ.
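The decomposition in (5) is easy to check numerically. The sketch below (illustrative Python; function names are mine, and it assumes the sign convention used in (5) above) recovers ρ and the a*_j from a set of AR coefficients and verifies that both sides of (5) have identical coefficients.

    import numpy as np

    def decompose(a):
        """Given AR coefficients a_1, ..., a_p, return (rho, a_star) such that
        1 - a_1 L - ... - a_p L^p = (1 - rho L) - (1 - L) A*(L),
        with A*(L) = a*_1 L + ... + a*_{p-1} L^{p-1}."""
        a = np.asarray(a, dtype=float)
        rho = a.sum()                                   # rho = 1 - A(1)
        a_star = np.array([-a[j:].sum() for j in range(1, len(a))])  # a*_j = -sum_{i=j}^{p-1} a_{i+1}
        return rho, a_star

    def rebuild(rho, a_star):
        """Coefficients of (1 - rho L) - (1 - L) A*(L) as a polynomial in L."""
        p = len(a_star) + 1
        coef = np.zeros(p + 1)
        coef[0], coef[1] = 1.0, -rho
        for j, aj in enumerate(a_star, start=1):        # expand -(1 - L) * a*_j L^j
            coef[j] -= aj
            coef[j + 1] += aj
        return coef

    a = [0.5, 0.3, -0.1]                                # A(L) = 1 - 0.5L - 0.3L^2 + 0.1L^3
    rho, a_star = decompose(a)
    assert np.allclose(rebuild(rho, a_star), np.r_[1.0, -np.asarray(a)])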
? proposed to replace (Δy_t - δ) by (Δy_t - δ̂), where δ̂ is an estimate of δ obtained as the empirical mean of Δy_t. This simplification[1] can be theoretically justified as linearising A*(1)δ around A*(1) = 0 and δ = δ̂. The advantage comes from simplified calculations, and from the marginal likelihood function of ρ retaining the same structure irrespective of the autoregressive order. Writing out (6) with the simplification obtains:

    y_t = (1 - ρ)μ + ρδ + (1 - ρ)δt + ρy_{t-1} + A*(L)Δy*_t + ε_t    (7)

where Δy*_t = Δy_t - δ̂.

[1] By a first-order Taylor expansion of the product A*(1)δ around the point (A*(1), δ) = (0, δ̂): A*(1)δ ≈ δ̂·A*(1) + 0·(δ - δ̂) = A*(1)δ̂. Since a sufficient but not necessary condition for A*(1) = 0 is a_2 = a_3 = ... = a_p = 0, the linearisation around A*(1) = 0 can be thought of as restricting between an AR(1) and an AR(p) model.
2 Bayesian methodology of unit root testing

To test for unit roots under the Bayesian framework, one requires the posterior density of ρ, f(ρ|y), which offers a distributional summary of ρ conditional on the data and the prior beliefs. To obtain f(ρ|y) it is convenient first to derive the marginalised likelihood function of ρ, L(ρ; y). This is calculated as:

    L(ρ; y) = ∫_{Θ(ρ)} L(ρ, θ; y) π(θ | ρ) dθ
where L(ρ, θ; y) is the joint likelihood function, θ = (μ, δ, σ², a*_1, ..., a*_{p-1}) are the nuisance parameters, and π(θ|ρ) is some (conditional) prior with support Θ(ρ). By Bayes' theorem, f(ρ|y) is then

    f(ρ | y) ∝ L(ρ; y) π(ρ)

with π(ρ) the prior for ρ. The choice of π(ρ) will be the focus of an extended discussion; but first a re-parametrisation allowing matrix notation is convenient for the derivation of L(ρ; y).
2.1 Matrix formulation
Conditioning on the parameter of interest ρ, the reduced forms of the components model in (2) and (7) are linear in the nuisance parameters (μ, δ, σ², a*_1, ..., a*_{p-1}). We wish to obtain a regression format of the form:

    y_t(ρ) = x_t'(ρ)β + ε_t,    ε_t ~ N(0, σ²)

This is achieved in the simple AR(1) case by defining x_t' = [1, t] and β' = [μ, δ], and writing:

    y_t(ρ) = (1 - ρL)y_t
    x_t'(ρ) = (1 - ρL)x_t'
For the AR(p) model (p > 1), define x_t' = [1, t] and β' = [μ, δ, a*_1, a*_2, ..., a*_{p-1}]; the re-parametrisation is then:

    y_t(ρ) = (1 - ρL)y_t
    x_t'(ρ) = [ x_t' - ρx_{t-1}',  Δy*_{t-1},  Δy*_{t-2},  ...,  Δy*_{t-p+1} ]
Since an AR(p) model uses the first p observations as initial conditions, let the sample size be T + p; we can then stack the remaining T observations to obtain the regression format (conditional on ρ):

    y(ρ) = X(ρ)β + ε

where y(ρ) is a T×1 vector function of ρ, X(ρ) is a T×(p+1) matrix function of ρ, and ε ~ N(0, σ²I_T). This notation can be found in ?, ? and ?.
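As a concrete illustration of this stacking, the following sketch (illustrative Python for the simple AR(1) case; the function name and example series are my own) constructs y(ρ) and X(ρ) and computes the OLS estimate of β conditional on ρ.

    import numpy as np

    def regression_format_ar1(y, rho):
        """Return y(rho), X(rho) for the AR(1) components model.

        y is the full series y_0, ..., y_T; the first observation is used
        as the conditioning value, leaving T stacked rows.
        """
        T = len(y) - 1
        t = np.arange(len(y), dtype=float)         # deterministic trend 0, 1, ..., T
        x = np.column_stack([np.ones_like(t), t])  # x_t' = [1, t]
        y_rho = y[1:] - rho * y[:-1]               # (1 - rho L) y_t
        X_rho = x[1:] - rho * x[:-1]               # (1 - rho L) x_t'
        return y_rho, X_rho

    # example: OLS estimate of beta = (mu, delta) conditional on rho = 0.9
    y = np.cumsum(np.random.default_rng(1).normal(size=101))
    y_rho, X_rho = regression_format_ar1(y, rho=0.9)
    beta_hat = np.linalg.lstsq(X_rho, y_rho, rcond=None)[0]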
2.2 Deriving the marginal likelihood function of ρ
The marginal likelihood function of ρ is a summary of the data information regarding the parameter. To conduct any meaningful posterior analysis, we require this function to be finite and positive at ρ = 1 and integrable over the parameter space of ρ.
In order to derive the marginal likelihood function of ρ, I will first formulate the approximate likelihood function and proceed to integrate out the nuisance parameters (β, σ²) under a diffuse prior. The resulting marginalised approximate likelihood function has an infinite singularity at ρ = 1 and is therefore insufficient for unit root testing. The distribution of the initial observation y_0 is required, and I will show how this is modelled in the literature to explicitly allow for explosive values of ρ. Finally, the marginalised exact likelihood function will be bounded everywhere and can be used for posterior inference.
2.2.1 Marginalised approximate likelihood function
The approximate likelihood function of an AR(p) process is the joint data density obtained by treating the first p observations as fixed. Following the matrix format above, the conditional approximate likelihood function of (β, σ²) is given by the conditional data density:

    L_A(β, σ² | y, ρ) ∝ (σ²)^{-T/2} exp{ -(1/(2σ²)) [y(ρ) - X(ρ)β]'[y(ρ) - X(ρ)β] }    (8)
It can be shown that L_A(β, σ² | y, ρ) belongs to the class of normal-inverted gamma-2 density functions (see ?, p. 57):

    L_A(β, σ² | y, ρ) ∝ f_NIg( β, σ² | β̂(ρ), M(ρ), s(ρ), ν )

where:

    M(ρ) = X'(ρ)X(ρ)
    β̂(ρ) = [X'(ρ)X(ρ)]^{-1} X'(ρ)y(ρ)
    s(ρ) = y'(ρ) { I_T - X(ρ)[X'(ρ)X(ρ)]^{-1}X'(ρ) } y(ρ)
    ν = T - (p + 1) - 2
A diffuse prior for (β, σ² | ρ) can be used to express ignorance:

    π(β, σ² | ρ) ∝ 1/σ²    (9)

which is equivalent to flat priors for β and log(σ²), independent of ρ. It follows that the conditional joint posterior density of (β, σ²) is given by:

    f(β, σ² | y, ρ) = f_NIg( β, σ² | β̂(ρ), M(ρ), s(ρ), ν̄ )
where ν̄ = ν + 2. The conditional posterior density of β is Student t:

    f(β | y, ρ) = f_t( β | β̂(ρ), M(ρ), s(ρ), ν̄ )

and from the integrating constant of this density, the kernel of the marginalised approximate likelihood function is:

    L_A(ρ; y) ∝ |M(ρ)|^{-1/2} s(ρ)^{-ν̄/2}    (10)
? proves that the marginalised approximate likelihood function in (10) is unbounded at ρ = 1 when the constant term is present. The distribution of the initial observation y_0 is required for the marginal likelihood function of ρ to be bounded everywhere.
2.2.2 Modelling the initial observation
In the literature on economic time series it is often reasonable to assume that the observed sample is a segment of a data generating process which began long ago, or in the infinite past. The initial observation is then the closest reference to this unobservable past, and hence can bring in valuable information. As noted in ?, one can compare the distance of the initial observation from the time trend to form an impression of the size of the root. To see this, consider an AR(1) process and the components model in (1), and assume some starting date at t = -s such that u_{-s} = ū is fixed. The initial observation y_0 can then be shown to have the distribution:

    y_0 | (μ, σ², ū, s) ~ N( μ + ūρ^s, σ² q(ρ, s) )

    q(ρ, s) = Σ_{i=0}^{s} ρ^{2i} = (1 - ρ^{2(s+1)}) / (1 - ρ²)

Since q(ρ, s) is a geometric sum in the non-negative quantity ρ², it is well defined for all ρ ∈ ℝ, and there is no implicit truncation of the parameter space. Under the stationarity assumption and an infinite time horizon, y_0 ~ N( μ, σ²/(1 - ρ²) ); hence for small ρ the variance of y_0 is small and we should observe the initial observation close to the time trend.
In order to incorporate the density of the initial observation, let us simplify calculations by assuming ū = 0; then:

    y_0 | (μ, σ², ρ, s) ~ N( μ, σ² q(ρ, s) )    (11)

The exact likelihood function of an AR(p) process treats the first p observations as random. However this joint density is rather complex (see ?, pp. 123-125), and following the examples of ??, and ?, only the distribution of y_0 will be used, while (y_1, y_2, ..., y_{p-1}) remain fixed.
The conditional exact likelihood function of (β, σ²) is then calculated as the product of the conditional data density in (8) and the marginal density of y_0 in (11):

    L_E(β, σ² | y, ρ, s) ∝ q(ρ, s)^{-1/2} (σ²)^{-(T+1)/2} exp{ -(1/(2σ²)) [ (y(ρ) - X(ρ)β)'(y(ρ) - X(ρ)β) + (y_0 - μ)² / q(ρ, s) ] }
We can write the term in the exponential function more compactly as:

    -(1/(2σ²)) [ (y(ρ) - X(ρ)β)'(y(ρ) - X(ρ)β) + (y*_0 - x*_0'β)² ]

where y*_0 = y_0 q(ρ, s)^{-1/2} and x*_0' = [ q(ρ, s)^{-1/2}, 0 ].
Following the natural conjugate framework, the marginal exact likelihood function of ρ under the diffuse prior for (β, σ²) in (9) can be derived as:

    L_E(ρ; y, s) ∝ q(ρ, s)^{-1/2} |M̄(ρ)|^{-1/2} s̄(ρ)^{-ν/2}    (12)

where:

    M̄(ρ) = X'(ρ)X(ρ) + diag( q(ρ, s)^{-1}, 0, ..., 0 )
    β̄(ρ) = M̄(ρ)^{-1} [ X'(ρ)y(ρ) + ( y_0 q(ρ, s)^{-1}, 0, ..., 0 )' ]
    s̄(ρ) = y'(ρ)y(ρ) + y_0² q(ρ, s)^{-1} - β̄'(ρ) M̄(ρ) β̄(ρ)
    ν = T - p + 1
?, pp. 182-183 define a function q(ρ, ν) which behaves similarly to q(ρ, s) for reference values of s and ν, but is easier to calculate:

    q(ρ, ν) = (1 + ν) / (1 + ν - ρ²)    (13)

However, while both s and ν are subjective and affect the likelihood function, using q(ρ, ν) implicitly truncates the domain of ρ to (-√(1+ν), √(1+ν)). This is due to requiring the variance of y_0 in (11) to be non-negative. The choice of s is a starting date, but its direct impact on ρ for a given sample is less conspicuous. As ν allows a more direct interpretation, all subsequent calculations are based on replacing q(ρ, s) with q(ρ, ν). For testing, the authors recommend the value ν = 1/3, which gives a sufficiently wide domain.
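Equations (12) and (13) are straightforward to evaluate numerically. The sketch below (illustrative Python for the AR(1) case with x_t = [1, t]; all names and the simulated input series are my own) computes the log of the marginal exact likelihood kernel over a grid of ρ values.

    import numpy as np

    def q_nu(rho, nu):
        """q(rho, nu) = (1 + nu) / (1 + nu - rho^2), defined for rho^2 < 1 + nu."""
        return (1.0 + nu) / (1.0 + nu - rho ** 2)

    def log_kernel_exact(rho, y, nu):
        """Log kernel of the marginal exact likelihood (12), AR(1) with constant and trend."""
        T = len(y) - 1
        t = np.arange(len(y), dtype=float)
        x = np.column_stack([np.ones_like(t), t])
        y_rho = y[1:] - rho * y[:-1]
        X_rho = x[1:] - rho * x[:-1]
        q = q_nu(rho, nu)
        M_bar = X_rho.T @ X_rho + np.diag([1.0 / q, 0.0])
        b_bar = np.linalg.solve(M_bar, X_rho.T @ y_rho + np.array([y[0] / q, 0.0]))
        s_bar = y_rho @ y_rho + y[0] ** 2 / q - b_bar @ M_bar @ b_bar
        dof = T  # equals T - p + 1 with p = 1
        return (-0.5 * np.log(q) - 0.5 * np.linalg.slogdet(M_bar)[1]
                - 0.5 * dof * np.log(s_bar))

    # evaluate the kernel on a grid of rho values inside (-sqrt(1+nu), sqrt(1+nu))
    nu = 1.0 / 3.0
    rhos = np.linspace(0.5, np.sqrt(1 + nu) - 1e-3, 200)
    y = np.cumsum(np.random.default_rng(2).normal(size=51))
    log_L = np.array([log_kernel_exact(r, y, nu) for r in rhos])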
2.3 Testing the unit root hypothesis
The requirements for testing the unit root hypothesis are the marginal likelihood function L_E(ρ; y, ν) as defined in (12), with q(ρ, s) replaced by q(ρ, ν) from (13), and a prior π(ρ). The choice of π(ρ) here is essentially free, but first let us consider the standard testing procedures.
The usual procedure to test a point null hypothesis of the form:

    H_0: ρ = 1    vs    H_1: ρ ≠ 1

is through the posterior odds ratio. This requires assigning a probability mass to H_0: ρ = 1 and defining the prior as:

    π(ρ) = ε    if ρ = 1
    π(ρ) = (1 - ε) g(ρ)    if ρ ≠ 1

where ε ∈ [0, 1] and g(ρ) is required to be a proper density integrating to one. The posterior probability of H_0 is then given by:

    p(H_0 | y, ν) = [ 1 + ((1 - ε)/ε) (1/B) ]^{-1}

    B = L_E(ρ = 1; y, ν) / ∫_{ρ≠1} L_E(ρ; y, ν) g(ρ) dρ

where B is called the Bayes factor. For an objective weight (ε = 0.5) the posterior probability of H_0 depends only on the Bayes factor. We reject the unit root hypothesis if p(H_0 | y, ν) < 0.5, or equivalently B < 1. Posterior odds ratios for point null hypotheses have been considered by ?, ??, and ?.
Another testing procedure is to calculate a 95% posterior credible interval for ρ based on:

    ∫_{C_R} f(ρ | y, ν) dρ = 0.95,    f(ρ | y, ν) ∝ L_E(ρ; y, ν) π(ρ)

where it is usually desirable to minimise the size of C_R by choosing the Highest Posterior Density credible set. However this can lead to disjoint credible intervals for multi-modal distributions. Since we are interested in the unit root hypothesis, one can instead define C_R as the one-sided set satisfying:

    ∫_{-√(1+ν)}^{ρ_sup} f(ρ | y, ν) dρ = 0.95

and proceed to observe whether ρ = 1 lies in the interval (-√(1+ν), ρ_sup]. This is equivalent to calculating a non-stationarity probability:

    p(ρ ≥ 1 | y, ν) = ∫_{1}^{√(1+ν)} f(ρ | y, ν) dρ

and for p(ρ ≥ 1 | y, ν) < 0.05 the 95% credible interval does not encompass unity. Consequently the unit root hypothesis is rejected if p(ρ ≥ 1 | y, ν) < 0.05. This testing procedure has been adopted by ? and ?.
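Both decision rules reduce to one-dimensional integrals over ρ and can be approximated on a grid. The following is a minimal sketch (illustrative Python; the function names are mine and the kernel in the example is a toy stand-in, not the exact likelihood) of the Bayes factor and the non-stationarity probability.

    import numpy as np

    def unit_root_tests(log_kernel, prior, nu, grid_size=2000):
        """Approximate the Bayes factor B and p(rho >= 1 | y, nu) on a grid.

        log_kernel(rho) is the log of L_E(rho; y, nu) up to a constant,
        prior(rho) is a proper density on (-sqrt(1+nu), sqrt(1+nu)).
        """
        a = np.sqrt(1.0 + nu)
        rhos = np.linspace(-a + 1e-6, a - 1e-6, grid_size)
        log_L = np.array([log_kernel(r) for r in rhos])
        L = np.exp(log_L - log_L.max())               # rescale to avoid overflow
        w = L * np.array([prior(r) for r in rhos])
        m = np.trapz(w, rhos)                         # integral of L_E * prior
        B = np.exp(log_kernel(1.0) - log_L.max()) / m
        post = w / m                                  # posterior density f(rho | y, nu)
        p_nonstat = np.trapz(post[rhos >= 1.0], rhos[rhos >= 1.0])
        return B, p_nonstat

    # toy illustration with a stand-in kernel centred at 0.9 and a uniform prior
    nu = 1.0 / 3.0
    B, p = unit_root_tests(lambda r: -0.5 * ((r - 0.9) / 0.05) ** 2,
                           lambda r: 1.0 / (2.0 * np.sqrt(1.0 + nu)), nu)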
For any continuous and proper density g(ρ) defined over the domain of ρ, it is possible to calculate both the Bayes factor and a 95% credible interval. ? recommended this as the standard for conducting any point null hypothesis test under the Bayesian framework: the Bayes factor provides the data evidence against the point null, while the credible interval reports the size of the discrepancy. As conclusions reached under the two testing procedures may not always agree for the same set of observations, it is of interest to study this in more detail. But first let us consider the issue of the prior on ρ, which has been a central focus in the Bayesian literature on unit root testing.
2.4 The issue of the prior on ρ
The prior in Bayesian analysis serves as an elicitation of the information available to the decision maker before observing the data. When there is insufficient information, or in situations where objective inference is desired, the search has been for a non-informative prior that conveys some notion of ignorance or emphasises the data evidence.
One central focus of the Bayesian literature on unit root testing has been the appropriate non-informative prior for ρ. Since the early work of ?, the uniform distribution has been seen as non-informative for location parameters. A uniform prior for ρ was used by ? to illustrate that Bayesian procedures can reject the unit root hypothesis systematically more often than their classical counterparts. However this foundation was deeply questioned by ?, who raised the concern that ρ is not akin to a location parameter and that its impact upon sample moments (e.g. mean, variance, autocorrelation) changes drastically for different values. Phillips' proposal for an objective framework was the Jeffreys prior, based on the principles of ? and the arguments of ?.
In the former, Jeffreys conceptualised that any objective procedure for assigning a non-informative prior should be invariant under one-to-one reparametrisation. This is the property that if φ = g(θ) is an injective function and the procedure yields π_1(θ) and π_2(φ) based on the same statistical model (i.e. f(y|θ) and f(y|φ) respectively), then over corresponding regions:

    ∫ π_1(θ) dθ = ∫ π_2(φ) dφ

Based on the invariance principle, Jeffreys derived the implied prior from the Fisher information:

    π_J(θ) ∝ |H(θ)|^{1/2},    H(θ) = -∫_Y f(y|θ) (d²/dθ²) log f(y|θ) dy
Perks used the idea that high Fisher information is synonymous with a tight confidence interval for the maximum likelihood estimate to justify Jeffreys' prior. To see this, the maximum likelihood estimator θ̂_MLE of a fixed parameter θ_0 has (under regularity conditions implying asymptotic normality) the approximate distribution:

    θ̂_MLE ~ N( θ_0, H(θ_0)^{-1} )

Favouring values of θ for which the Fisher information is large is then interpretable as emphasising the data evidence.
Phillips derived the Jeffreys prior for ρ based on the approximate likelihood function of the reduced form model in (3), and found π_J(ρ) to be steeply increasing as ρ approaches unity and beyond. This finding is interpreted to mean that a process generated in these intervals is itself more informative, in the sense of high Fisher information. Within this view the flat uniform prior is subjective, since it implicitly down-weighs the sample information for large values of ρ and thereby imposes a bias towards stationarity.
However it is worth mentioning that the application of Jeffreys' rule is not without problems. One relevant case is that the invariance principle ensures that corresponding regions of the likelihood function under different multivariate parametrisations have the same volume; but regions can be of the same volume yet very different shape. To interpret information limits based on any single parameter within a multiparameter framework is to ignore this structure.
Following the debate over a non-informative prior for unit root testing, ? and ? have proposed alternative solutions. Below I offer a short introduction to four priors which have appeared in the non-informative debate.
Jeffreys prior

    f_J(ρ, ν, T) ∝ √{ [(1 + ν + ρ²)/(1 + ν - ρ²)] [ (1 + ν)(1 - ρ^{2T})/(1 - ρ²) - 1 ] + [ T - (1 - ρ^{2T})/(1 - ρ²) ] / (1 - ρ²) }    for ρ ≠ 1
    f_J(ρ, ν, T) ∝ √{ [(2 + ν)/ν] [ (1 + ν)T - 1 ] + T(T - 1)/2 }    for ρ = 1    (14)

where T is the sample size; the ρ = 1 branch is the continuous limit of the ρ ≠ 1 expression. This prior has an integrable infinite singularity at ρ = ±√(1+ν) and is therefore proper upon truncating ρ to (-√(1+ν), √(1+ν)). The Jeffreys prior in (14) is based on the exact likelihood function of an AR(1) process with no time trend and the structural form in (2); it is calculated in ?, pp. 195-196.
Lubrano's prior

    f_L(ρ, ν) = 1 / ( π √(1 + ν - ρ²) )    (15)

This is an Arcsine distribution proposed by ? after observing its similarity to the Jeffreys prior in (14), while removing the unintuitive property of a prior that depends on the sample size T.
Berger and Yang reference prior

    f_BY(ρ, ν) = 1 / ( 2π √(1 - ρ²) )    for |ρ| < 1
    f_BY(ρ, ν) = [ arcsec(√(1 + ν)) ]^{-1} / ( 4|ρ| √(ρ² - 1) )    for 1 < |ρ| < √(1 + ν)    (16)

This prior has an integrable infinite singularity at |ρ| = 1 and is the ? reference prior truncated to (-√(1+ν), √(1+ν)). A reference prior was first described by ? under the principle of maximising the missing information in an experiment, where the distance between densities is measured by the Kullback-Leibler divergence.
Uniform prior

    f_U(ρ, ν) = 1 / ( 2√(1 + ν) )    (17)

This is the standard uniform prior on ρ ∈ (-√(1+ν), √(1+ν)).
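For numerical work the four priors can be coded directly. Below is a minimal Python transcription (function names are mine; arcsec(x) is evaluated as arccos(1/x), and the Jeffreys kernel follows the two branches of (14)).

    import numpy as np

    def jeffreys_prior(rho, nu, T):
        """Kernel of the Jeffreys prior in (14) (up to a normalising constant)."""
        if np.isclose(rho, 1.0):
            info = (2 + nu) / nu * ((1 + nu) * T - 1) + T * (T - 1) / 2
        else:
            g = (1 - rho ** (2 * T)) / (1 - rho ** 2)
            info = ((1 + nu + rho ** 2) / (1 + nu - rho ** 2) * ((1 + nu) * g - 1)
                    + (T - g) / (1 - rho ** 2))
        return np.sqrt(info)

    def lubrano_prior(rho, nu):
        """Arcsine density in (15) on (-sqrt(1+nu), sqrt(1+nu))."""
        return 1.0 / (np.pi * np.sqrt(1 + nu - rho ** 2))

    def berger_yang_prior(rho, nu):
        """Truncated reference prior in (16)."""
        if abs(rho) < 1:
            return 1.0 / (2 * np.pi * np.sqrt(1 - rho ** 2))
        return 1.0 / (4 * np.arccos(1 / np.sqrt(1 + nu)) * abs(rho) * np.sqrt(rho ** 2 - 1))

    def uniform_prior(rho, nu):
        """Uniform density in (17) on (-sqrt(1+nu), sqrt(1+nu))."""
        return 1.0 / (2 * np.sqrt(1 + nu))

    # example: evaluate each prior kernel at rho = 0.9 with nu = 1/3 and T = 50
    vals = [jeffreys_prior(0.9, 1 / 3, 50), lubrano_prior(0.9, 1 / 3),
            berger_yang_prior(0.9, 1 / 3), uniform_prior(0.9, 1 / 3)]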
3 Monte Carlo simulation
A Monte Carlo simulation can be used to study the performance of the testing procedures, the choice of priors, and Bayesian methods in general. A desirable test should minimise the test size and maximise the test power, and it is of interest to examine these in the context of a small set of observations. The chosen models are the AR(1) process with a constant, with and without a time trend. These two models capture the dynamic properties of a random walk and a random walk with drift respectively. The true parameters used in the simulation are specified by:

    μ = 0,    σ² = 1,    δ ∈ {0, 1},    ρ ∈ [0.50 : 0.001 : 1.00]
    y_0 | ν ~ N( 0, q(ρ, ν) ),    ν ∈ {1/6, 1/3}
The sample size for each test is T = 50 and I use N = 1000 replications for each ρ value. Three rejection criteria for the unit root hypothesis are considered:

    (i)   B < 1
    (ii)  p(ρ ≥ 1 | y, ν) < 0.05
    (iii) B < 1 and p(ρ ≥ 1 | y, ν) < 0.05

where B is the Bayes factor. I consider the four priors (14) to (17) and benchmark them against the relevant Dickey-Fuller test for each model.
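A skeleton of such a Monte Carlo exercise is sketched below (illustrative Python; the decision rule shown is a naive stand-in for demonstration only, whereas the study itself would plug the three Bayesian criteria above into the reject argument).

    import numpy as np

    def rejection_rate(rho, reject, T=50, N=1000, mu=0.0, delta=0.0, sigma=1.0,
                       nu=1.0 / 3.0, seed=0):
        """Monte Carlo frequency with which reject(y) rejects the unit root when
        the data come from the AR(1) components model with root rho."""
        rng = np.random.default_rng(seed)
        q = (1.0 + nu) / (1.0 + nu - rho ** 2)
        count = 0
        for _ in range(N):
            u = np.empty(T + 1)
            u[0] = rng.normal(scale=sigma * np.sqrt(q))   # y_0 | nu ~ N(0, sigma^2 q(rho, nu))
            for t in range(1, T + 1):
                u[t] = rho * u[t - 1] + rng.normal(scale=sigma)
            y = mu + delta * np.arange(T + 1) + u
            count += bool(reject(y))
        return count / N

    # illustrative stand-in decision rule (NOT one of the Bayesian criteria above):
    # reject whenever the OLS AR(1) coefficient falls below 0.9
    naive_rule = lambda y: np.polyfit(y[:-1], y[1:], 1)[0] < 0.9
    size_at_unit_root = rejection_rate(1.0, naive_rule)   # empirical size
    power_at_0p9 = rejection_rate(0.9, naive_rule)        # empirical power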
AR(1) model with no time trend
Discussion
AR(1) model with time trend
Discussion
3.1 Bayesian Model Averaging
It is evident from the Monte Carlo simulations that the choice of prior has a large impact upon the size and power of the Bayesian tests. If a decision maker recognises uncertainty in the selection of a prior, one natural extension is to accommodate this uncertainty through Bayesian Model Averaging. Specifically, one may recognise four different Bayesian models, each described by the same marginal exact likelihood function of ρ in (12) and the same fixed belief about the domain ρ ∈ (-√(1+ν), √(1+ν)), but with the four different priors detailed in (14) to (17). Denote a model by M_i = { L_E(ρ; y, ν), π_i(ρ) }; the posterior probability of the model is then:

    p(M_i | y) = p(y | M_i) p(M_i) / p(y)

where

    p(y | M_i) = ∫_{-√(1+ν)}^{√(1+ν)} L_E(ρ; y, ν) π_i(ρ) dρ

    p(y) = Σ_{i=1}^{4} p(y | M_i) p(M_i)
Note that p(y | M_i) can be thought of as a weighted average of the marginal likelihood function under prior π_i(ρ). Under the assumption p(M_i) = 1/4, the calculations simplify to:

    p(y) = (1/4) Σ_{i=1}^{4} p(y | M_i)

    p(M_i | y) = p(y | M_i) / Σ_{i=1}^{4} p(y | M_i)    (18)
In (18) we see that Bayesian Model Averaging allocates p(M_i | y) so as to emphasise the priors giving the highest weighted average of the likelihood function. Using this framework, the Bayes factor and the posterior probability of non-stationarity are calculated as:

    B = L_E(ρ = 1; y, ν) / p(y)

    p(ρ ≥ 1 | y, ν) = Σ_{i=1}^{4} p(ρ ≥ 1 | y, ν, M_i) p(M_i | y)

The following two graphs exhibit the same Monte Carlo simulation conducted above but accounting also for Bayesian Model Averaging; the Dickey-Fuller tests are removed for better comparison.
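The model weights in (18) require only the four one-dimensional integrals p(y | M_i), which can again be approximated on a grid. A minimal sketch follows (illustrative Python; the kernel and the two priors in the example are stand-ins).

    import numpy as np

    def model_average(log_kernel, priors, nu, grid_size=2000):
        """Posterior model probabilities p(M_i | y) under equal prior model weights.

        log_kernel(rho): log of L_E(rho; y, nu) up to a constant;
        priors: list of proper densities pi_i(rho) on (-sqrt(1+nu), sqrt(1+nu)).
        """
        a = np.sqrt(1.0 + nu)
        rhos = np.linspace(-a + 1e-6, a - 1e-6, grid_size)
        log_L = np.array([log_kernel(r) for r in rhos])
        L = np.exp(log_L - log_L.max())
        # p(y | M_i) up to a common constant, as in the integrals above
        p_y_given_M = np.array([np.trapz(L * np.array([p(r) for r in rhos]), rhos)
                                for p in priors])
        return p_y_given_M / p_y_given_M.sum()        # equation (18)

    # toy illustration: a stand-in kernel with two arbitrary priors
    nu = 1.0 / 3.0
    weights = model_average(lambda r: -0.5 * ((r - 0.95) / 0.05) ** 2,
                            [lambda r: 1.0 / (2 * np.sqrt(1 + nu)),
                             lambda r: 1.0 / (np.pi * np.sqrt(1 + nu - r ** 2))],
                            nu)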
4 Implementing subjective beliefs over explosive values
In applications of unit root testing it is often sensible to assume that values of ρ implying an explosive series are unlikely. The Bayesian procedure described above can accommodate this subjective belief through two channels. The first is to reflect it in the prior distribution of ρ, such that the density is decreasing for ρ ≥ 1. However, given that the Jeffreys and Lubrano priors explode over the non-stationary interval, supposedly to reflect increasing sample information (if one is to believe the arguments of ?), it may not be of interest to change their fundamental structure. Furthermore, since the marginal likelihood function in (12) already encompasses our subjective belief by being constrained to ρ ∈ (-√(1+ν), √(1+ν)), adding an additional layer of subjectivity to the prior may well be double counting. The second channel instead calls for ways to best truncate the domain of ρ through ν, and may well be the more consistent approach.
However, eliciting an exact value of ν such that all values of ρ outside (-√(1+ν), √(1+ν)) have zero probability is difficult, although a good reference point is the value ν = 1/3 recommended by ?, pp. 182-183. I therefore consider ν as a random draw from an exponential distribution V ~ exp(λ) with mean E(V) = 1/λ, to reflect the uncertainty in the choice of the truncation point, and adopt λ = 3 in line with Lubrano's recommendation. This is a hierarchical model argument: the problem of what the domain of ρ should be (i.e. the uncertainty over ν) is treated as separate from the prior density of ρ over this uncertain domain. The justification for considering an underlying exponential distribution is given below.
Lack of memory

The exponential distribution is the only continuous distribution with the property:

    p( V > v_0 + h | V > v_0 ) = p( V > h )

otherwise known as the lack of memory property. It can be thought of as a constant failure rate: at any reference point v_0, the distribution ensures that the probability that V exceeds it by more than h is constant. This is a desirable property over the interval of explosive values we deem appropriate, since the process can be made to reflect probability statements such as:

    p( V > 0.1 + h | V > 0.1 ) = p( V > h ) = 0.05

for some small value h. But a constant failure rate is harder to justify as desirable once V reaches some tipping point (say v_0 = 0.75), which is not impossible.
Maximum entropy choice

Suppose one wishes to find a probability density f(v) on ℝ⁺ subject to the constraint that:

    ∫_{ℝ⁺} v f(v) dv = 1/λ    (19)

where λ is a constant. One choice of f(v) subject to (19) is to follow the maximum entropy principle, seeking the f(v) that also maximises the uncertainty as measured by its entropy:

    H(f) = -∫_{ℝ⁺} f(v) log f(v) dv

The exponential distribution is the solution of this maximisation, and can be interpreted as the distribution consistent with our knowledge that the mean is 1/λ while introducing no additional information. In this sense, the predictive power of an exponential distribution should deviate least from our expectations.
4.1 Examining the effect of a hierarchical model
We can study the effect of treating ν as a draw from an exponential random variable upon unit root testing through a repeated sampling context. The context is such that for each sample we conduct the Bayesian procedures based on the same set of observations y, but with a random draw v from V ~ exp(λ). In the limit of repeated sampling the posterior density converges to:

    f(ρ, v | y) ∝ L_E(ρ, v; y) π(ρ | v) π(v)    (20)

where L_E(ρ, v; y) is given by (12), π(ρ | v) is one of the priors (14) to (17), and π(v) is the exponential density. This is not a standard recognisable distribution, but we can analyse f(ρ, v | y) in (20) using a random-walk Metropolis-Hastings algorithm. Specifically, I simulate 10000 draws from the Metropolis-Hastings algorithm, discard the first 1000 as burn-in, and visually inspect the trace plots to ensure the chain is mixing well.
From the posterior sample draws, P(ρ ≥ 1 | y) is then approximated as:

    P(ρ ≥ 1 | y) = E_{(ρ,v)|y}[ 1(ρ ≥ 1) ] ≈ (1/N) Σ_{n=1}^{N} 1( ρ^{(n)} ≥ 1 )

where N = 9000 is the total number of retained draws, each draw has the form θ^{(n)} = (ρ^{(n)}, v^{(n)}), and 1(·) is the indicator function.
The marginal likelihood p(y) needed for the Bayes factor can be obtained through the Gelfand-Dey method, and a procedure similar to the one found in ? was used.
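A minimal random-walk Metropolis-Hastings sketch of the kind described above is given below (illustrative Python; the target density is a toy stand-in for (20), not the actual posterior, and all names are my own).

    import numpy as np

    def rw_metropolis(log_post, start, scale, n_draws=10000, burn_in=1000, seed=0):
        """Random-walk Metropolis-Hastings on theta = (rho, v).

        log_post(theta) must return the log of the (unnormalised) joint posterior
        f(rho, v | y) in (20), and -inf outside its support.
        """
        rng = np.random.default_rng(seed)
        theta = np.asarray(start, dtype=float)
        lp = log_post(theta)
        draws = []
        for _ in range(n_draws):
            prop = theta + rng.normal(scale=scale, size=theta.shape)
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
                theta, lp = prop, lp_prop
            draws.append(theta.copy())
        return np.array(draws[burn_in:])

    # toy target standing in for (20): rho roughly centred at 0.95, v ~ exp(3),
    # with the support restriction rho^2 < 1 + v
    def toy_log_post(theta):
        rho, v = theta
        if v <= 0 or rho ** 2 >= 1 + v:
            return -np.inf
        return -0.5 * ((rho - 0.95) / 0.05) ** 2 - 3.0 * v

    draws = rw_metropolis(toy_log_post, start=[0.9, 1.0 / 3.0], scale=[0.05, 0.1])
    p_nonstationary = np.mean(draws[:, 0] >= 1.0)      # Monte Carlo estimate of P(rho >= 1 | y)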